What does the option 'Include non-mapped src_compound_ids' mean with respect to whole source mapping?

The default option for whole source mapping is to 'Exclude' non-mapped 'From' src_compound_ids. The default output therefore lists only those currently assigned src_compound_ids in the 'From' source that share a structure with a src_compound_id in the 'To' source (one that is also currently assigned to that structure). This output is normally all that is required for routine mapping purposes, such as creating up-to-date hyperlinks between the two sources.

However, some users (typically data managers of resources containing smaller sets of src_compound_ids) have specifically asked for an option to extend the output so that, in addition to all the data provided by the 'Exclude' option, it includes all 'non-mapped' src_compound_ids from the 'From' source, both current and obsolete (see * below). Selecting the 'Include' option produces this output. It assists efforts to curate and manually maintain links from (usually smaller) sources to other sources, because the user can more conveniently see which src_compound_ids are currently not mapped (when perhaps they should be) and which may therefore require further curation within their own resource. In this way, UniChem is used as an aid to curation.

Since the output for the 'Include' option contains several different kinds of data, it is useful to clarify how each will appear in the output:

 Mapped src_compound_ids will appear as per the 'Exclude' option (i.e. with the corresponding 'To' src_compound_id in the second column).
 Non-mapped 'Current src_compound_ids' will appear with 'null' in the second column.
 Non-mapped 'Obsolete src_compound_ids' will appear with 'Obsolete src_compound_id' in the second column.

Note that although the list of 'From' src_compound_ids in the output is complete (i.e. all src_compound_ids are present), it is not necessarily non-redundant: a mapped src_compound_id appears once for each of its mappings to the 'To' source, so it may occur on several rows.
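
As an illustration of how the three kinds of row described above might be separated after download, here is a minimal Python sketch. It assumes the output is a two-column, tab-separated text file with no header line, and that the second-column markers are exactly the strings 'null' and 'Obsolete src_compound_id'; the delimiter, the absence of a header and the function name classify_mapping are assumptions made for this example, not part of UniChem itself.

```python
from collections import defaultdict

# Second-column markers as described in the text above (assumed to be exact strings).
NULL_MARKER = "null"
OBSOLETE_MARKER = "Obsolete src_compound_id"


def classify_mapping(lines, delimiter="\t"):
    """Split 'Include'-style rows into mapped, non-mapped current and obsolete IDs.

    `lines` is any iterable of text rows of the form
    '<From src_compound_id><delimiter><second column>'.
    The tab delimiter and the lack of a header row are assumptions.
    """
    mapped = defaultdict(list)   # 'From' ID -> list of 'To' IDs (may hold several)
    unmapped_current = []        # current 'From' IDs with no mapping
    obsolete = []                # obsolete 'From' IDs
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue             # skip blank rows
        from_id, to_value = line.split(delimiter, 1)
        to_value = to_value.strip()
        if to_value == NULL_MARKER:
            unmapped_current.append(from_id)
        elif to_value == OBSOLETE_MARKER:
            obsolete.append(from_id)
        else:
            # A 'From' ID can occur on several rows, one per 'To' mapping.
            mapped[from_id].append(to_value)
    return dict(mapped), unmapped_current, obsolete


# Hypothetical rows, invented purely to show the three cases.
rows = ["A1\tB7", "A1\tB9", "A2\tnull", "A3\tObsolete src_compound_id"]
mapped, unmapped_current, obsolete = classify_mapping(rows)
```

Grouping the mapped rows by 'From' src_compound_id handles the redundancy noted above, while the two lists give a curator the IDs that may need attention in their own resource.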

Unfortunately, the 'Include' option may produce mapping files that are very much larger than the equivalent files produced with the 'Exclude' option. For some very large sources the output would be extremely large and unlikely to be of value to any user. For this reason, the option is restricted to 'From' sources below a certain size. The current limit is set to '30000'. Queries using the 'Include' option with 'From' sources containing more than this number of currently assigned src_compound_ids are prohibited.

* Note that the terms 'Current/Obsolete src_compound_id' (defined here) are distinct from 'Current/Obsolete Assignments' (defined here).
