ChEBI Annotator Manual
1. Introduction
This manual is
designed to enable an annotator (curator) to follow the sequence of steps involved in
checking and amending entries in ChEBI. All operations are carried
out using the ChEBI annotator tool.
Entering the User Name and Password on the
login page takes the user to the 'Welcome' page.
2. Main Menu
Located on the left-hand side of the page.
2.1 Free Text Search
Enables the
annotator to search the complete content of the database. Annotators may enter
names or partial names, ChEBI IDs, other IDs (e.g. KEGG, Beilstein), synonyms,
InChI, SMILES, etc. Searches are case-insensitive unless the 'Case Sensitive?' box is checked.
The wildcard character is %.
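The effect of the wildcard can be illustrated with a short Python sketch. It assumes that '%' behaves like the SQL LIKE wildcard (matching any run of characters, including none) and that a pattern must match the whole field; neither assumption is stated by the tool, and `wildcard_match` is a hypothetical helper, not part of ChEBI.

```python
import re


def wildcard_match(pattern, text, case_sensitive=False):
    """Hypothetical model of a '%' wildcard search against one field value."""
    # Everything except '%' is literal; each '%' becomes '.*' (any run of characters).
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    flags = re.DOTALL if case_sensitive else re.DOTALL | re.IGNORECASE
    return re.fullmatch(regex, text, flags) is not None
```

For example, `wildcard_match("gluc%", "Glucose")` is true, but becomes false when `case_sensitive=True`, mirroring the 'Case Sensitive?' box.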
2.2 Search
Allows the
annotator to perform searches of unchecked entries only or to base the search
criteria on the classification within the ontology. Further options provide hyperlinked lists of pending, submitted and unsubmitted submissions. For the unchecked-entry
search, a drop-down menu allows the source of the entry (Chemical Ontology,
IntEnz, or KEGG COMPOUND) to be specified; the search is case-insensitive
unless the 'Case Sensitive' box is checked. For the Ontology Classification
search, the annotator is able to base the search on compound status (CHECKED or
OK), classification (CLASSIFIED or UNCLASSIFIED) and relation status (CHECKED
or OK).
2.3 Merge
Allows two
entities to be merged by inputting the IDs. The Merge Compounds screen requires
the annotator to select which of the original ChEBI names, definition, default
structures and ontology trees are to be retained. Failure to check any of the
options will result in an error notice. Full details of one or both entries
may be displayed by clicking on the relevant 'show compound' links. The merge
procedure can be cancelled at any time by use of the Cancel Changes tab. Care
must be taken when merging entries as there may be far-reaching consequences,
especially if one or more of the entries is already publicly visible. The
annotator must make absolutely certain by checking all data sources that the two
entries being merged relate to one and the same entity. If in doubt, or if for
example there are unproven stereochemical differences or ambiguities between
two entries, the annotator should not perform a merge but relate the entries to
one another through the ontology.
2.4 Demerge
Takes the
annotator to the Demerge Compounds screen, allowing selection of which children
are to be demerged from the parent. As for 'Merge', the demerge procedure can
be cancelled by use of the Cancel Changes tab. For the same reasons as given
above for merging, care must also be taken when demerging entries. Examples of
when demerging is justified are (1) when differences exist between names and
synonyms for entries that have been subject to a previous automatic merging and
(2) when it is desirable to distinguish between acids and their conjugate
bases. Caution: all demerged entries will inherit the ontology structure of the parent entry. Therefore the annotator will need to modify or delete relationships which are no longer valid.
2.5 Add Compound Entry
Allows addition
of a new entry to the database. Specifying the ChEBI name (see below) and
submitting causes the system to generate a new ChEBI ID on a Compound
Result screen, with the new entry having an initial status OK. Before adding a
new compound, the annotator should always conduct a thorough search of the
database for names, registry numbers and database links. If the compound is
found to exist already, the annotator should use the existing entry rather than
creating a new one.
2.6 Logs
Allows the
annotator to view logs relating to automated procedures (e.g. KEGG updates,
incorporation of new sources, automated merges and demerges).
2.7 Help
Leads to the 'Special Characters List', a list of xml tags used to
enhance the chemical mark-up.
2.8 Logout
Logs the annotator out of the annotator tool.
2.9 <<hide
Allows the side
menu to be hidden, maximising the view of the main screen, a particularly
useful feature when using a browser such as Mozilla Firefox which does not
allow line wraps. The side menu may be reinstated via the <<show link.
3. Compound Result screen
This is the main
screen on which the results for a ChEBI entity are displayed. It offers the
annotator six tabs: View, View SC, Edit Compound, Edit Ontology, Edit Structure,
Edit Comment.
3.1 View
Displays the following features:
3.1.1 General Information
ChEBI Name – The name recommended by the
annotator for use within the biological community.
ChEBI ID – The stable ID assigned by the
system. IDs are assigned in sequence and their absolute values have no inherent
meaning.
ChEBI ASCII Name – The ChEBI name with
any special characters rendered in ASCII format.
Definition – a definition of the entry,
especially relevant for classes of entities, less so for instances.
3.1.2 Default Structure
The structure of
an entity selected by the annotator as being of prime importance; this is
the main structure displayed on the public web interface. If there is more
than one graphical structure, the others may be viewed via a 'more>>' link.
The status of the structures (OK, CHECKED, DELETED, OBSOLETE) is indicated by a
colour-coded frame. Also displayed are the SMILES string and the InChI, both
derived from the MDL molfile corresponding to the default structure.
3.1.3 Status
The status of the
entry is indicated as OK, CHECKED, DELETED or OBSOLETE. Also supplied are the type of
merger (automatic or by an annotator), who created the entry and when, and who last modified it and when.
3.1.4 Formula
Assigned (manually by
the annotator) whenever possible; generally the molecular formula. The use of
subscripts is avoided. The source is stated, together with the status and an
indication of which child from a merged entry was the source or whether this
was from a parent.
3.1.5 Additional chemical data
Mass and charge, calculated automatically from the default structure, will appear here and should be checked.
3.1.6 ChEBI Ontology
Shows the
relationships relevant to the entity being viewed. Clicking on 'Tree View'
opens up a visual depiction of the tree with the different types of
relationship being indicated by different symbols. Entries and relationships
with status OK (i.e. unchecked) are shown in light grey, while those with
status CHECKED are in dark grey. The line for the entity currently being viewed
is shown in bold type. Clicking on 'Parents and Children View only' returns
the annotator to the default (textual) display of relationships.
3.1.7 IUPAC Name(s)
Shows one or
more names for an entity based on current IUPAC recommendations.
3.1.8 Synonym
Shows all synonyms
and their sources. Only those with status CHECKED will be viewable to the
public.
3.1.9 Database Accession
Provides
accession numbers and (if available) links to source databases. Only those with
status CHECKED will be viewable to the public.
3.1.10 Registry Numbers
Lists CAS, Beilstein
and Gmelin Registry Nos., where these are available.
3.1.11 Comments
Shows comments added
by an annotator, relating either to a single data entry or to a complete entity.
3.1.12 Submitter Remarks
For submitted entries, shows remarks added by the submitter or annotator and any subsequent follow-ups. All remarks are visible on the public website.
3.2 View SC
Displays the standard view but with the addition of xml tags around
special characters.
3.3 Edit Compound
The main screen used by annotators for editing the main text of an
entry.
3.4 Edit Ontology
The screen used for editing the ontology.
3.5 Edit Structure
The screen used for editing the various structural representations
of an entity.
3.6 Edit Comment
Allows the
annotator to add and edit a comment, either to a single data entry or to an
entity as a whole. For submitted entries, also allows the annotator to add or respond to Submitter Remarks.
4. Edit Compound screen
This is the
screen upon which an annotator can edit all details of a ChEBI entry except for
its structural data, comments and submitter remarks, and relationships within the ontology.
4.1 ChEBI Name
The recommended
name may be changed by the annotator to bring it into line with current usage
within the biological community. Although there is a limit on the number of
characters in a ChEBI Name, this is enormous (around 4000) and it is highly
desirable that such names are kept short – abbreviations (e.g. ATP, NAD) are
acceptable. A good maximum number of characters to work to is 50. Special
characters are encoded using the xml tags listed in the Help file. Care must be
taken to use the correct tags with characters that can be used in more than one
context, e.g. to distinguish between <stereo>alpha</stereo> and
<locant>alpha</locant>. To aid in selection of the correct tags a Special Character tool has been incorporated, accessible via a link next to the ChEBI Name field; similar links to this tool are found next to all those fields on the Edit Compound screen into which free text can be input.
Unless it is an abbreviation (e.g. ATP, NADPH), a ChEBI Name should start with a lower case letter, not a capital (unless this is a special character relating to stereochemistry or denoting an element).
Changing the ChEBI Name has consequences for other databases and resources which use the ChEBI Name as a reference and hence great care must be taken when making changes to ChEBI Names.
NB. In the case of IntEnz, the ChEBI Name may be used within the Reactions field if no
IntEnz Name exists. However, if an IntEnz Name exists, then changing the ChEBI Name will have no effect on IntEnz.
A singular name should always be used unless the entity is a class and the singular entity already exists within the database; in this case a plural can be used.
For example: porphyrin (CHEBI:8337) is an entity, porphyrins (CHEBI:26214) is a class.
Validations
The curator tool will produce validation errors if:
- Unicode characters are included in the name. All Unicode should be encoded with the Special Characters XML package.
- Brackets in a name are incorrectly nested; for example, [(1->4]-alpha-D-galacturonide)n will throw a validation error because the closing square bracket occurs before the closing round bracket.
- The ChEBI Name is not unique, i.e. the same ChEBI Name already exists in the database.
- The XML tags are not valid Special Characters XML or they are not closed properly.
- Newline or tab characters are present.
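The validation rules above can be approximated by the following Python sketch. It is illustrative only: `chebi_name_errors` is a hypothetical helper, the uniqueness test assumes an exact string comparison against a set of existing names, and "Unicode characters" is taken to mean any non-ASCII character.

```python
def chebi_name_errors(name, existing_names):
    """Return validation messages for a candidate ChEBI Name (sketch only)."""
    errors = []
    if not name.isascii():
        errors.append("Unicode characters present; use Special Characters XML tags")
    if "\n" in name or "\t" in name:
        errors.append("newline or tab character present")
    if name in existing_names:  # assumed: exact, case-sensitive comparison
        errors.append("ChEBI Name already exists")
    # Bracket nesting: every closing bracket must match the most recent opening one.
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in name:
        if ch in "([{":
            stack.append(ch)
        elif ch in closing_to_opening:
            if not stack or stack.pop() != closing_to_opening[ch]:
                errors.append(f"incorrectly nested brackets at '{ch}'")
                break
    else:  # runs only if the loop did not break
        if stack:
            errors.append("unclosed bracket")
    return errors
```

With this sketch, the example from the list, `[(1->4]-alpha-D-galacturonide)n`, fails at the `]`, while `[(1->4)-alpha-D-galacturonide]n` passes.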
4.2 Definition
A definition may
be added. This is especially relevant to classes of compounds which appear at
the higher levels of the ontology. Good sources of definitions for the
Chemical Entity Ontology are the IUPAC Gold Book (http://goldbook.iupac.org/ ) and
the various IUPAC documents on nomenclature and terminology (see http://www.chem.qmul.ac.uk/iupac/ ),
while for the Biological Function and Application Ontologies, (modified) definitions of MeSH terms can
be adopted. No sources of
definitions need to be cited.
Validations
The curator tool will produce validation errors if:
- Unicode characters are included in the definition. All Unicode should be encoded with the Special Characters XML package.
- The XML tags are not valid Special Characters XML or they are not closed properly.
- The length of the definition is less than 10 characters.
- Newline or tab characters are present.
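A sketch of the definition checks, including one way to test that Special Characters XML tags are properly closed. Only <stereo> and <locant> are named in this manual, so `ALLOWED_TAGS` below is a deliberately incomplete, hypothetical subset (the full list is in the Help file); the tag pattern also assumes that tags carry no attributes.

```python
import re

# Hypothetical subset of the Special Characters XML tags (full list: Help file).
ALLOWED_TAGS = {"stereo", "locant"}
# Matches <tag> and </tag>; group 1 is "/" for closing tags, group 2 the tag name.
TAG = re.compile(r"<(/?)([A-Za-z][\w-]*)>")


def definition_errors(text):
    """Return validation messages for a definition (sketch only)."""
    errors = []
    if len(text) < 10:
        errors.append("definition shorter than 10 characters")
    if "\n" in text or "\t" in text:
        errors.append("newline or tab character present")
    if not text.isascii():
        errors.append("Unicode characters present")
    open_tags = []  # stack of currently open tags
    for closing, tag in TAG.findall(text):
        if tag not in ALLOWED_TAGS:
            errors.append(f"unknown tag <{tag}>")
        elif not closing:
            open_tags.append(tag)
        elif not open_tags or open_tags.pop() != tag:
            errors.append(f"unexpected </{tag}>")
    if open_tags:
        errors.append("unclosed tag(s): " + ", ".join(open_tags))
    return errors
```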
4.3 Status
The annotator
should change the status of an entry to CHECKED only when all details,
including those relating to structure and the ontology, have been edited to the
annotator's satisfaction. An entry which has status CHECKED will be viewable on
the public web interface and included in the downloadable files at the next
release.
4.4 Formula
Any formula
derived from a primary source should be checked and, if correct, its status changed to
CHECKED. If a different formula is to be added, the status of the incorrect
formula should be changed to DELETED and the new formula added with status
CHECKED. Subscripts and hyphens should not be used. The order of atomic
elements within molecular formulae should follow the Hill system (http://en.wikipedia.org/wiki/Hill_system ).
The source must be specified using the dropdown menu; if the formula was
derived by the annotator, the source should be given as 'ChEBI'. If an entry cannot be assigned a formula (typically in the case of a class of compounds), then a dot '.' should be entered into the formula field and
its status kept as 'OK'.
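Under the Hill system, carbon comes first and hydrogen second when carbon is present, followed by all other elements in alphabetical order of symbol; when there is no carbon, all elements (including hydrogen) are listed alphabetically. A minimal Python sketch of that ordering for a single formula component, assuming the atom counts are already available as a dictionary (`hill_formula` is a hypothetical helper):

```python
def hill_formula(counts):
    """Render element counts in Hill order, e.g. {'O': 1, 'H': 6, 'C': 2} -> 'C2H6O'."""
    def term(element):
        n = counts[element]
        return element if n == 1 else f"{element}{n}"  # a count of 1 is not written

    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(el for el in counts if el not in ("C", "H"))
    else:
        order = sorted(counts)  # no carbon: everything alphabetical, H included
    return "".join(term(el) for el in order)
```

For instance, sodium chloride comes out as `ClNa` and sulfuric acid as `H2O4S`, which is the expected Hill order.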
The following conventions regarding ChEBI formulae should be followed:
- Unless immediately following a dot '.', any numeral refers to the preceding element in the formula. Example:
H2O means there are two hydrogen atoms and one oxygen atom.
- The dot '.' convention is used when dividing a formula into parts. Any numeral immediately following a dot refers to all the elements within that part of the formula that follow it. Example: C2H3O2.Na.3H2O (CHEBI:32138) means that after C2H3O2 there is one sodium (Na), six hydrogen and three oxygen atoms.
- Parentheses are used within ChEBI formulae to group elements; a numeral following the closing parenthesis multiplies every element within the parentheses. Example: (OH)2 denotes two oxygen and two hydrogen atoms.
- The 'n' convention is used to show an unknown quantity by which a formula is multiplied. For example: (C12H20O11)n from CHEBI:15443 means that a C12H20O11 unit is multiplied by an unknown quantity.
- A comma can be used to indicate that there is one or more of the elements divided by the comma but that the exact stoicheiometry can vary. For instance, actinolite is a mineral with the chemical formula Ca2(Mg,Fe)5Si8O22(OH)2, which means that it could be anything in the continuous series between Ca2Mg5Si8O22(OH)2 and Ca2Fe5Si8O22(OH)2.
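The numeral, dot and parenthesis conventions can be checked by expanding a formula into total atom counts. The Python sketch below is a hypothetical helper, not part of the curator tool; it deliberately does not handle the 'n', 'm', 'R' or comma conventions and raises ValueError for them.

```python
import re
from collections import Counter

_TOKEN = re.compile(r"[A-Z][a-z]*|\d+|[()]")


def _count_part(part):
    """Count atoms in one dot-separated part such as 'C2H3O2' or 'O22(OH)2'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]*|\d+|[()])*", part):
        raise ValueError(f"unsupported characters in {part!r}")
    tokens = _TOKEN.findall(part)
    stack = [Counter()]  # one counter per open parenthesis level
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == "(":
            stack.append(Counter())
            continue
        if tok.isdigit():
            raise ValueError(f"numeral {tok} does not follow an element or ')'")
        # A numeral straight after an element or ')' multiplies it.
        mult = 1
        if i < len(tokens) and tokens[i].isdigit():
            mult = int(tokens[i])
            i += 1
        if tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            for element, n in stack.pop().items():
                stack[-1][element] += n * mult
        else:
            stack[-1][tok] += mult
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def count_atoms(formula):
    """Total atom counts for a ChEBI-style formula with '.'-separated parts."""
    total = Counter()
    for index, part in enumerate(formula.split(".")):
        factor = 1
        if index > 0:
            # A numeral immediately after a dot multiplies the whole part.
            m = re.match(r"\d+", part)
            if m:
                factor = int(m.group())
                part = part[m.end():]
        for element, n in _count_part(part).items():
            total[element] += n * factor
    return dict(total)
```

Applied to the example above, `count_atoms("C2H3O2.Na.3H2O")` gives two carbon, nine hydrogen, five oxygen and one sodium atom, i.e. the six hydrogen and three oxygen atoms contributed by 3H2O are added to those of C2H3O2.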
Validations
The curator tool will produce validation errors if:
- Unicode characters are included in the formula.
- Brackets in a formula are incorrectly nested.
- A formula contains symbols outside the allowed set. The allowed set includes all element symbols from the periodic table, which are case-sensitive:
- H|Li|Na|K|Rb|Cs|Fr|Be|Mg|Ca|Sr|Ba|Ra|Sc|Y|La|Ce|Pr|Nd|Pm|Sm|Eu|Gd|Tb|Dy|Ho|
Er|Tm|Yb|Lu|Ac|Th|Pa|U|Np|Pu|Am|Cm|Bk|Cf|Es|Fm|Md|No|Lr|Ti|Zr|Hf|Rf|Ku|V|
Nb|Ta|Ha|Ns|Cr|Mo|W|Unh|Mn|Tc|Re|Uns|Fe|Ru|Os|Uno|Co|Rh|Ir|Une|Ni|Pd|Pt|Cu|
Ag|Au|Zn|Cd|Hg|B|Al|Ga|In|Tl|C|Si|Ge|Sn|Pb|N|P|As|Sb|Bi|O|Db|Sg|Bh|Hs|Mt|Ds|
Rg|S|Se|Te|Po|F|Cl|Br|I|At|He|Ne|Ar|Kr|Xe|Rn|Mu|D|T
In addition the following characters are allowed:
- Numerals 0 through 9
- Dot character '.'
- The opening and closing parentheses '(' and ')'
- The characters 'n', 'm' and 'R'
- Newline or tab characters are present.
- A formula item with a single dot as its formula has status 'CHECKED'. The single dot should only be used to indicate that a curator has verified that no formula can be added to an item, and it should have status 'OK'.
- A new formula is added that already exists in the entry with the same source.
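Several of these validations can be sketched in Python as follows. The element list is copied from above; the function name and messages are hypothetical, and the duplicate-formula check is omitted because it needs the entry's existing formulae and their sources.

```python
import re

# Element symbols copied from the allowed-symbol list above (case-sensitive).
ELEMENTS = (
    "H|Li|Na|K|Rb|Cs|Fr|Be|Mg|Ca|Sr|Ba|Ra|Sc|Y|La|Ce|Pr|Nd|Pm|Sm|Eu|Gd|Tb|Dy|Ho|"
    "Er|Tm|Yb|Lu|Ac|Th|Pa|U|Np|Pu|Am|Cm|Bk|Cf|Es|Fm|Md|No|Lr|Ti|Zr|Hf|Rf|Ku|V|"
    "Nb|Ta|Ha|Ns|Cr|Mo|W|Unh|Mn|Tc|Re|Uns|Fe|Ru|Os|Uno|Co|Rh|Ir|Une|Ni|Pd|Pt|Cu|"
    "Ag|Au|Zn|Cd|Hg|B|Al|Ga|In|Tl|C|Si|Ge|Sn|Pb|N|P|As|Sb|Bi|O|Db|Sg|Bh|Hs|Mt|Ds|"
    "Rg|S|Se|Te|Po|F|Cl|Br|I|At|He|Ne|Ar|Kr|Xe|Rn|Mu|D|T"
).split("|")
# Longer symbols first so alternatives like "Co" are tried before "C".
_SYMBOL = "|".join(sorted(ELEMENTS, key=len, reverse=True))
# Element symbols plus numerals, '.', parentheses and 'n', 'm', 'R'.
_ALLOWED = re.compile(rf"(?:{_SYMBOL}|[0-9.()nmR])*")


def formula_errors(formula, status):
    """Return validation messages for one formula item and its status."""
    errors = []
    if not formula.isascii():
        errors.append("Unicode characters present")
    if "\n" in formula or "\t" in formula:
        errors.append("newline or tab character present")
    if not _ALLOWED.fullmatch(formula):
        errors.append("symbol outside the allowed set")
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                break
    if depth != 0:
        errors.append("incorrectly nested brackets")
    if formula == "." and status == "CHECKED":
        errors.append("single-dot formula must keep status OK")
    return errors
```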
4.5 Synonym
Synonyms derived
from primary sources will be displayed along with details of the source and
status. The annotator should check the status of each synonym and amend if
necessary. NB. Annotators must take
extra care when contemplating deletion of any synonym derived from IntEnz that
has type NAME, as this will also in effect cause a similar deletion within
IntEnz.
4.6 Add Names
Any new synonym
which the annotator considers relevant should be added along with its source.
Cross-reference to the source should be via links in the Database Accession or
Registry Numbers sections (see below).
A synonym taken from an external source should not normally be altered when being entered into ChEBI. However, if there is a real need to make alterations (e.g. in order to rearrange an index style of presentation, or to correct errors in the nesting of brackets), then the 'Adapted' checkbox next to the synonym should be ticked.
IUPAC names should also be added here. An 'IUPAC name'
is a name based on current recommendations of IUPAC. It need not be fully
systematic as it can make use of 'retained' and 'preselected' names. Some relevant sources are:
- IUPAC 'Revised Blue Book' – to be published 2008, currently available as a draft document at http://www.iupac.org/reports/provisional/abstract04/favre_310305.html
- A Guide to IUPAC Nomenclature of Organic Compounds, Recommendations 1993
- Nomenclature of Organic Chemistry, Sections A, B, C, D, E, F and H, 1979 Edition ('The Blue Book') – largely superseded but still useful for class names and older trivial names.
- Compendium of Biochemical Nomenclature, 1993 Edition ('The White Book') – however, many sections have been superseded.
- Nomenclature of Inorganic Chemistry, Recommendations 1990 ('The Red Book')
- Nomenclature of Inorganic Chemistry II, Recommendations 2000
- Nomenclature of Inorganic Chemistry – IUPAC Recommendations 2005 ('The Revised Red Book'; largely supersedes the 1990 and 2000 editions)
- IUPAC Compendium of Chemical Terminology ('The Gold Book'), 1987. A revised version in electronic form is available at http://goldbook.iupac.org/index.html
- Compendium of Macromolecular Nomenclature, 1991 ('The Purple Book')
Further details of these and other IUPAC nomenclature
documents are available at http://www.iupac.org/publications/books/seriestitles/nomenclature.html
and http://www.chem.qmul.ac.uk/iupac/ .
Annotators should bear in mind the following points when entering IUPAC Names:
- An IUPAC Name should start with a lower case letter, not a capital (unless this is a special character relating to stereochemistry or denoting an element).
- Forward locants should not be used. Example: 'octane-1-sulfonic acid' is the correct form, not '1-octanesulfonic acid'.
- So-called 'systematic names' taken from external sources should not be assumed to have been constructed in strict adherence to IUPAC recommendations.
- The following spellings are recommended by IUPAC and should be used in ChEBI IUPAC Names: sulfur (not sulphur), aluminium (not aluminum), icosane (not eicosane). The alternative spellings may be used as synonyms.
Validations
The curator tool will produce validation errors if:
- Unicode characters are included in the name. All Unicode should be encoded with the Special Characters XML package.
- The XML tags are not valid Special Characters XML or they are not closed properly.
- Newline or tab characters are present.
- Brackets in the names are incorrectly nested.
- An entry would contain more than one IntEnz name of type 'NAME' with status 'CHECKED'.
- An IUPAC name is not unique, i.e. the same IUPAC name already exists in the database.
4.7 Database Accession
All database accessions listed should be checked and
amended if necessary. Status must be 'CHECKED' for lines to be viewable on the
public web interface. New entries are added using the 'Add
Database Accessions' facility (see below).
Validations
The curator tool will produce validation errors if:
- A Chemical Ontology accession has status "CHECKED".
- The database link does not conform to the specific database link format. The link formats for each database are listed in table 1.
Table 1: The list of database link validations for the curator application.
4.8 Registry Numbers
All numbers
listed should be checked and amended if necessary. Status
must be 'CHECKED' for lines to be viewable on the public web interface. New
entries are added using the 'Add Database Accessions' facility (see
below). Beilstein and Gmelin Registry Numbers can be added if known (but note that these numbers constitute the only
data that ChEBI can include from these two sources, owing to the databases not
being freely accessible).
Validations
The curator tool will produce validation errors if:
- The registry number does not conform to the specific registry number format. The registry number formats for each database are listed in table 2.
Table 2: The list of registry number validations for the curator application.
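Table 2 is not reproduced here, so the exact formats the tool enforces are not known from this text. As an illustration of one such check, a CAS Registry Number consists of up to ten digits in three hyphen-separated parts, the last being a check digit: each preceding digit is multiplied by its position counted from the right (1, 2, 3, ...) and the sum is taken modulo 10. A Python sketch (`cas_number_ok` is a hypothetical helper):

```python
import re


def cas_number_ok(rn):
    """Check the CAS Registry Number layout and its check digit."""
    m = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weight the digits 1, 2, 3, ... starting from the right-hand end.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))
```

For water, 7732-18-5: 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5, matching the check digit.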
4.9 Add Database Accessions
Used by the
annotator to enter new database accessions or registry numbers. The
type and source must be selected from the dropdown menus.
Changes may be
incorporated by clicking on 'Submit Changes'. Erroneous changes may be
cancelled at any time up to submission by clicking on 'Cancel Changes'.
5. Edit Ontology screen
Using this screen, an annotator can both edit existing relationships between entities and create new ones.
In the two sub-ontologies "Role" and "Subatomic Particle"
a singular ChEBI Name should always be used. A plural ChEBI Name is allowed within the
"Chemical Entity" sub-ontology if the entity is a class and the singular ChEBI Name already exists.
5.1 Parents and Children View
This view lists
all the parent and child relationships directly pertaining to an entity and
their status [CHECKED, OK, DELETED or OBSOLETE (the OBSOLETE status can be
created only by the system)]. Only relationships with status
CHECKED and OK will be included in the tree structure and be visible on the
public web interface. The annotator must check each existing relationship and
amend if necessary.
When
editing an existing entry, the annotator needs to check all its non-OBSOLETE
relationships and leave these with status CHECKED or DELETED. No relationships
may be deleted which would cause an entity to be separated from the tree: it is
necessary to create a new relationship prior to deleting the last unwanted
one.
Hint: When
creating and editing relationships, it is useful to open the annotator tool in
two or more tabs or separate windows to facilitate rapid copying and pasting of
ChEBI IDs.
5.2 Tree View
Displays in
graphical form the tree structure. All direct lines upwards are shown together
with downward lines only as far as immediate children. Checked entries are
shown in a darker grey with the line for the entity currently being viewed
being in bold type. Annotators may navigate around the ontology in the tree view
by clicking on any displayed line. A table of relationships and their
shorthand symbols is displayed at the right-hand side of the tree view. Brief
descriptions of the sub-ontologies and relationships in the ChEBI Ontology
are provided in Sections 5.4 and 5.5 respectively, with fuller descriptions
and examples being included in the ChEBI User Manual, accessible via the
public web interface.
5.3 Add Relationship
Allows the
annotator to add a new relationship. Dropdown menus are provided for selecting
the type of relationship while the ID for the entity to which the new relationship
refers is entered into the relevant box.
Changes may be
incorporated by clicking on 'Submit Changes'. Erroneous changes may be
cancelled at any time up to submission by clicking on 'Cancel Changes'.
Validations
The tool has general validations which apply to most relationship types.
In general when the term "enabled" is used to describe a relationship it means
that its relationship status is either "CHECKED" or "OK".
The curator tool will produce validation errors if:
- A cyclic relationship ("is tautomer of", "is enantiomer of", "is conjugate base of" or
"is conjugate acid of") does not have the same status in both directions of its pair.
For example, if A "is tautomer of" B with status "CHECKED", then B "is tautomer of" A should
also have status "CHECKED".
- A redundant relationship is entered, whether parent or child; that is, the same
relationship already exists in that entry.
- The graph would become disconnected.
Disconnected-graph validation is performed whenever any change is made to the ontology
and ensures that no entry is disconnected from the graph. If, for instance, all
relationships are deleted from an entry, the tool will try to create an "is a"
relationship with the "unclassifieds" domain. It will not do this if the entry has
children; in this case it will return errors. Entries in the "unclassifieds" domain
are not allowed to have children themselves.
- An attempt is made to create a cycle in the graph with a non-cyclic relationship.
Directed acyclic graph validation ensures that no change creates a cycle in the
ontology using relationships which are strictly non-cyclic; if this occurs, the
curator tool returns an error.
The non-cyclic relationships are "is a", "has part",
"is substituent group from", "has parent hydride" and "has functional parent".
- The compound status is neither "CHECKED" nor "OK"; it must be one of these for
modifications to be made to the ontology.
- A parent relationship which is not cyclic is not enabled in the entry.
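Two of the general validations, status consistency across cyclic pairs and the ban on cycles built from non-cyclic relationships, are sketched below in Python. Relationships are assumed to be (subject, type, object, status) tuples, and only enabled relationships (status CHECKED or OK) are assumed to count towards a cycle; the tool's actual data model is not described in this manual, and both functions are hypothetical.

```python
from collections import defaultdict

# Reverse type for each cyclic relationship (conjugate acid/base reverse each other).
REVERSE = {
    "is tautomer of": "is tautomer of",
    "is enantiomer of": "is enantiomer of",
    "is conjugate base of": "is conjugate acid of",
    "is conjugate acid of": "is conjugate base of",
}
NON_CYCLIC = {"is a", "has part", "is substituent group from",
              "has parent hydride", "has functional parent"}
ENABLED = {"CHECKED", "OK"}


def cyclic_status_errors(rels):
    """Report cyclic relationships whose reverse is missing or has another status."""
    status_of = {(s, t, o): st for s, t, o, st in rels}
    errors = []
    for (s, t, o), st in status_of.items():
        if t in REVERSE:
            back = status_of.get((o, REVERSE[t], s))
            if back != st:
                errors.append(f"{s} {t} {o} is {st}, reverse is {back}")
    return errors


def has_cycle(rels):
    """True if enabled non-cyclic relationships form a directed cycle (depth-first search)."""
    graph = defaultdict(list)
    for s, t, o, st in rels:
        if t in NON_CYCLIC and st in ENABLED:
            graph[s].append(o)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = defaultdict(int)

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # back edge: nxt is already on the current path
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))
```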
Validations when creating new ontology relationships:
- You try to create a parent relationship with the root of the ontology namely "CHEBI:23091 ChEBI Ontology".
- You try to create a child relationship with the following entities "CHEBI:23091 ChEBI Ontology",
"CHEBI:27189 unclassifieds", "CHEBI:24431 chemical entity ontology",
"CHEBI:24432 biological role ontology" and "CHEBI:33232 application ontology".
- You try to create a parent relationship with the entry itself.
- You try to enter an invalid ChEBI id format.
- You try to create a new relationship which already exists in the entry.
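The checks on new relationships can be sketched as follows. The ChEBI ID format is assumed to be 'CHEBI:' followed by digits, as in the IDs quoted in this manual, and "create a child relationship with" X is read as making X a child of the entry being edited; both are assumptions, and `new_relationship_errors` is hypothetical.

```python
import re

ROOT = "CHEBI:23091"  # ChEBI Ontology
# Entries that may not be added as a child of another entry (list above).
NO_NEW_CHILD = {"CHEBI:23091", "CHEBI:27189", "CHEBI:24431", "CHEBI:24432", "CHEBI:33232"}
CHEBI_ID = re.compile(r"CHEBI:\d+")  # assumed ID format


def new_relationship_errors(entry, other, direction, rel_type, existing):
    """direction: 'parent' if `other` becomes a parent of `entry`, else 'child'.
    existing: set of (direction, rel_type, other) already present in the entry."""
    if not CHEBI_ID.fullmatch(other):
        return ["invalid ChEBI ID format"]
    errors = []
    if direction == "parent" and other == ROOT:
        errors.append("cannot add the ontology root as a parent")
    if direction == "parent" and other == entry:
        errors.append("an entry cannot be its own parent")
    if direction == "child" and other in NO_NEW_CHILD:
        errors.append(f"{other} cannot be added as a child")
    if (direction, rel_type, other) in existing:
        errors.append("relationship already exists in the entry")
    return errors
```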
5.4 The Sub-ontologies
The ChEBI Ontology is subdivided into three separate
sub-ontologies:
5.4.1 Chemical Entity
Classifies chemical entities according to their structural features and properties.
5.4.2 Role
The role ontology encompasses three sub-ontologies:
Biological Role. Classifies entries on the basis of their roles played within a biological context, e.g. as antibiotics, antiviral agents, coenzymes, enzyme inhibitors.
Chemical Role. Classifies entries on the basis of their roles played within a chemical context, e.g. as acids or bases, solvents, ligands, surfactants.
Application. Classifies entities on the basis of their applications, e.g. as pesticides, detergents, healthcare products, fuel.
5.4.3 Subatomic Particle
Classifies particles which are smaller than atoms.
5.5 The Relationships
Relationships can be created between an entity and either a
parent or a child. To create a new relationship between two entries, open the
Edit Ontology feature for one of them and enter the ChEBI ID for the other in
the appropriate box, selecting the type of relationship from the dropdown menu.
The relationships used in ChEBI are:
5.5.1 Is a
Used to indicate that 'Entity A' is an instance of 'Entity B'
or that 'Class A' is an instance of 'Class B'. This is the chief hierarchical
non-cyclic relationship used throughout the ontologies.
Validations
- If A "is a" B then A cannot have the following relationships with B:
- is conjugate base of
- is conjugate acid of
- has part
- is enantiomer of
- is substituent group from
- is tautomer of
- If A "is a" B then B cannot have the following relationships with A:
- is conjugate base of
- is conjugate acid of
- is enantiomer of
- is tautomer of
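These constraints amount to a lookup of forbidden relationship types in each direction. A hypothetical Python sketch for "is a", with relationships given as (subject, type, object) triples; the other relationship types below could reuse the same pattern with their own sets.

```python
# Forbidden types taken from the "is a" validation lists above.
IS_A_FORBIDDEN_FORWARD = {"is conjugate base of", "is conjugate acid of", "has part",
                          "is enantiomer of", "is substituent group from", "is tautomer of"}
IS_A_FORBIDDEN_BACKWARD = {"is conjugate base of", "is conjugate acid of",
                           "is enantiomer of", "is tautomer of"}


def is_a_conflicts(a, b, rels):
    """Return existing triples that conflict with a proposed A "is a" B."""
    conflicts = []
    for s, t, o in sorted(rels):
        if (s, o) == (a, b) and t in IS_A_FORBIDDEN_FORWARD:
            conflicts.append((s, t, o))  # A already relates to B in a forbidden way
        elif (s, o) == (b, a) and t in IS_A_FORBIDDEN_BACKWARD:
            conflicts.append((s, t, o))  # B already relates to A in a forbidden way
    return conflicts
```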
5.5.2 Has part
Used to denote the relationship between the whole and a part, especially between a salt or an addition compound and its components, or between a class of compounds and a substituent group characteristic for that class. This relationship is the reverse of the formerly used relationship "is part of".
Validations
- If A "has part" B then A cannot have the following relationships with B:
- is a
- is conjugate acid of
- is conjugate base of
- is enantiomer of
- is tautomer of
- is substituent group from
- If A "has part" B then B cannot have the following relationships with A:
- is a
- is conjugate acid of
- is conjugate base of
- is enantiomer of
- is tautomer of
- is substituent group from
5.5.3 Is conjugate base of and Is conjugate acid of
Cyclic
relationships which are used mainly between acids and their conjugate bases.
When creating a new relationship, only one of these needs to be entered, as the
system will create the reverse relationship. Note that although the IUPAC definition of conjugate acid/base refers to a difference in charge of 1 unit only, for ChEBI this is relaxed to include multiple charge differences. This is especially relevant to di- and poly-carboxylic acids [e.g. ChEBI uses the relationship "succinic acid is_conjugate_acid_of succinate(2−)"].
Validations
- If A "is conjugate base of" B then A cannot have the following relationships with B:
- If A "is conjugate base of" B then B cannot have the following relationships with A:
- If A "is conjugate acid of" B then A cannot have the following relationships with B:
- If A "is conjugate acid of" B then B cannot have the following relationships with A:
- In addition, there should be only one pair of acid/base relationships between any two entries;
for example, if A "is conjugate base of" B then A cannot have another "is
conjugate base of" relationship with B.
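The automatic creation of the reverse relationship, together with the one-pair rule, might be modelled as below. This is a sketch of the described behaviour, not the system's code; `add_conjugate_pair` and the dictionary layout (mapping (subject, type, object) to status) are hypothetical.

```python
# The two acid/base relationship types and their reverses.
REVERSE = {"is conjugate base of": "is conjugate acid of",
           "is conjugate acid of": "is conjugate base of"}


def add_conjugate_pair(rels, a, rel_type, b, status="OK"):
    """Record A rel_type B and the reverse B ... A in `rels`, a dict mapping
    (subject, type, object) to status."""
    if rel_type not in REVERSE:
        raise ValueError(f"not an acid/base relationship: {rel_type}")
    # Only one acid/base relationship pair is allowed between the same two entries.
    if any({s, o} == {a, b} and t in REVERSE for s, t, o in rels):
        raise ValueError(f"{a} and {b} already have an acid/base relationship")
    rels[(a, rel_type, b)] = status
    rels[(b, REVERSE[rel_type], a)] = status  # the reverse gets the same status
```

Writing the same status to both directions also keeps the pair consistent with the cyclic-status validation listed in Section 5.3.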
5.5.4 Is tautomer of
A cyclic relationship used to show the
interrelationship between two tautomers, where the differences between the
structures are significant enough to warrant their separate inclusion in
ChEBI.
Validations
- If A "is tautomer of" B then A cannot have the following relationships with B:
- is a
- is conjugate acid of
- is conjugate base of
- has part
- has functional parent
- has parent hydride
- is enantiomer of
- is tautomer of
- If A "is tautomer of" B then B cannot have the same relationships above with A
except of course the "is tautomer of" relationship.
5.5.5 Is enantiomer of
A cyclic relationship used when two entities
are enantiomers of each other. An entity may have this relationship with only
one other entity.
Validations
- An entry can only have one "is enantiomer of" relationship with another entry.
- If A "is enantiomer of" B then A cannot have the following relationships with B:
- is a
- is conjugate acid of
- is conjugate base of
- has part
- has functional parent
- has parent hydride
- is tautomer of
- If A "is enantiomer of" B then B cannot have the same relationships above with A.
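A sketch of the one-enantiomer rule, assuming the relationships passed in are the enabled (subject, type, object) triples; `enantiomer_errors` is a hypothetical helper.

```python
from collections import defaultdict


def enantiomer_errors(relationships):
    """Report entries that have "is enantiomer of" with more than one other entry."""
    partners = defaultdict(set)
    for s, t, o in relationships:
        if t == "is enantiomer of":
            partners[s].add(o)
    # Each entry may be the enantiomer of at most one other entry.
    return sorted(
        f"{entry} is enantiomer of {len(others)} entries"
        for entry, others in partners.items()
        if len(others) > 1
    )
```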
5.5.6 Has functional parent
Used to denote the relationship between two
molecular entities (or classes of entities), one of which possesses one or more
characteristic groups from which the other can be derived by functional
modification. This relationship is especially useful to demonstrate the
relationships between a number of functionalised entities and a common
less-functionalised parent.
Validations
- If A "has functional parent" B then A cannot have the following relationships with B:
- is enantiomer of
- is substituent group from
- is tautomer of
- has part
- If A "has functional parent" B then B cannot have the same relationships above with A.
5.5.7 Has parent hydride
Used to denote the relationship between an entity
and its parent hydride (defined by IUPAC as "an unbranched acyclic or
cyclic structure or an acyclic/cyclic structure having a semisystematic or
trivial name to which only hydrogen atoms are attached").
Validations
- If A "has parent hydride" B then A cannot have the following relationships with B:
- is enantiomer of
- is substituent group from
- is tautomer of
- has part
- If A "has parent hydride" B then B cannot have the same relationships above with A.
5.5.8 Is substituent group from
Indicates the relationship between a substituent
group (or atom) and its parent molecular entity, from which it is formed by
loss of one or more protons or simple groups.
Validations
- If A "is substituent group from" B then A cannot have the following relationships with B:
- is conjugate acid of
- is conjugate base of
- has part
- is enantiomer of
- is tautomer of
- If A "is substituent group from" B then B cannot have the following relationships with A:
- is conjugate acid of
- is conjugate base of
- has part
- is enantiomer of
- is tautomer of
5.5.9 Has role
Indicates the relationship between a molecular entity and its role. For relationship A "has role" B to be valid,
A should belong to the chemical entity ontology while B should belong to the role ontology.
Validations
If A "has role" B then A cannot have any other relationship with B.
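A hypothetical sketch of the "has role" checks. The mapping from entries to sub-ontologies is an assumption standing in for the position of each entry in the ontology tree, and the entry names in the usage are illustrative only.

```python
def has_role_errors(a, b, sub_ontology, relationships):
    """Check a proposed A "has role" B relationship.

    sub_ontology: hypothetical mapping from entry to 'chemical entity' or 'role'.
    relationships: existing (subject, type, object) triples.
    """
    errors = []
    if sub_ontology.get(a) != "chemical entity":
        errors.append(f"{a} is not in the chemical entity ontology")
    if sub_ontology.get(b) != "role":
        errors.append(f"{b} is not in the role ontology")
    for s, t, o in relationships:
        # Any other relationship from A to B conflicts with "has role".
        if (s, o) == (a, b) and t != "has role":
            errors.append(f"{a} already has {t!r} with {b}")
    return errors
```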
6. Edit Structure screen
6.1 Graphics
Structures are
input and edited using the MarvinSketch applet. To open it, first open the
Edit Structure screen and click in the
box at the right-hand side. Structures
for inputting may be drawn manually or copied and pasted from other
applications, e.g. ACD/Name.
6.1.1 Stereochemistry
The applet
allows 2D structures to be drawn, with stereochemistry at chiral centres being
indicated by bold and dashed wedges, with the points of the wedges directed
towards the stereocentre. In cases where stereochemistry at a centre is
possible but not specified, a plain bond linking the stereocentre and the
substituent is generally used (although in certain cases a wavy bond may be
used to provide emphasis). Where stereochemistry across double bonds is not
defined, this is indicated by use of a wavy bond to H or, if fully substituted,
to one of the substituents.
Attention is
drawn to the document 'Graphical Representation of Stereochemical Configuration
(IUPAC Recommendations 2006)', published in Pure Appl. Chem. Vol. 78,
No. 10, pp 1897-1970, 2006, which gives recommendations on preferred and
acceptable ways of displaying 3D stereochemical information in 2D diagrams,
along with examples for all types of stereochemical configuration.
6.1.2 3D Structures
An interactive
3D view (e.g. ball-and-stick or wireframe) may be generated from the 2D
structure by use of the 3D viewer (go to View, Open 3D Viewer). Such structures
may be added to the compound information via an extra MarvinSketch applet on
the Edit Structure screen, but should not be used as the default structure.
If 3D
coordinates are available as a 3D molfile, e.g. from a crystal-structure
determination, these may also be added directly to the molfile box on the Edit
Structure screen to create an extra graphical structure.
6.1.3 Atom labels
It is possible
to generate simple integer atom labels on a structural diagram using the
MarvinSketch applet. Right-clicking on an atom and then selecting 'Map' will
allow an atom label between 1 and 99 to be selected and added to that atom.
However, such labelled structures must never be used as a default structure.
6.1.4 Group structures
The structure of a group contains at least one pseudoatom (attachment point), which is indicated with an asterisk, *. See Section 9, 'How to edit the entry for a group'.
6.1.5 Zero-valence atoms and isotopes
Annotators may have some difficulty entering structures for single atoms and isotopes because the Marvin applet tends to add charges and/or hydrogen atoms. To generate a zero-valence monatomic structure, the number 15 must appear in the valence field of the atom block, i.e. the sixth field after the atom symbol, as shown in the example for sodium-23 below. As stated in the molfile specification (see Appendix 1, Fig. 4), this field defines valences from 1 to 14, with a value of 15 indicating zero valence. If a single isotope is being defined, the mass number (23 in the example) appears as a positive integer in the subsequent isotope (M  ISO) line.
  Marvin  06230911292D
  1  0  0  0  0  0            999 V2000
    7.7076  -10.0151    0.0000 Na  0  0  0  0  0 15  0  0  0  0  0  0
M  ISO  1   1  23
M  END
Annotators may generate a zero-valence monatomic structure by entering a single atom symbol in the Marvin applet, generating the molfile and then manually adjusting the connection table so that it conforms to the above description. However, they may find it more convenient to generate the structure in an alternative drawing package and then transfer it by copy-and-paste to the Marvin applet.
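As an alternative to hand-editing, the connection table described above can be generated with a short script and pasted into the molfile box. The sketch below assumes the fixed-column V2000 layout (three header lines, counts line, atom block, properties block); the function name and its `program_line` parameter are hypothetical, and the result should still be checked in the Marvin applet.

```python
from typing import Optional


def zero_valence_atom_molfile(symbol: str, mass: Optional[int] = None,
                              x: float = 0.0, y: float = 0.0,
                              program_line: str = "") -> str:
    """Return a V2000 molfile for a single atom with zero valence.

    The valence field (vvv, the sixth field after the atom symbol) is set
    to 15; if `mass` is given, an 'M  ISO' line records the mass number.
    """
    lines = ["", program_line, ""]  # header: name, program/timestamp, comment
    # Counts line: 1 atom, 0 bonds, unused fields, then '999 V2000'.
    lines.append("  1  0  0  0  0  0            999 V2000")
    # Atom line: x, y, z (10.4 each), symbol (3 chars), mass difference (2 chars),
    # then charge, stereo parity, H count, stereo care box, valence (15 here)
    # and six further 3-character fields, all 0.
    fields = [0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0]
    atom = f"{x:10.4f}{y:10.4f}{0.0:10.4f} {symbol:<3}{0:>2}"
    atom += "".join(f"{v:>3}" for v in fields)
    lines.append(atom)
    if mass is not None:
        # 'M  ISO' + number of entries + atom number + mass number.
        lines.append(f"M  ISO{1:>3}{1:>4}{mass:>4}")
    lines.append("M  END")
    return "\n".join(lines) + "\n"
```

Calling `zero_valence_atom_molfile("Na", 23, 7.7076, -10.0151, "  Marvin  06230911292D")` should reproduce the counts, atom and isotope lines of the sodium-23 example, preceded by a blank name line and followed by a blank comment line in the header.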
6.2 Molfiles
The MDL molfile
for a structure is displayed in a window on the left-hand side of the screen.
Information is transferred between this window and the graphic display using
the left and right radio buttons. Molfiles may also be entered directly by
copy-and-paste from external databases, e.g. KEGG COMPOUND.
Every compound
entry which has a structure should be assigned a default structure. The InChI
and the SMILES will automatically be generated from this default structure if
possible. Tautomer handling has been switched on for InChI generation, so that
the InChIs generated for different tautomers are distinguishable.
7. Edit Comment screen
The annotator may
find it useful to add one or more comments, either for public viewing or for
internal use only. Such comments may be associated with a specific item of
data or with the entry as a whole. The text is keyed into the Add Comment box
and its association selected by checking the appropriate 'Select item' or
'General comment on compound' radio button. The comment is then incorporated
by clicking on 'Submit Changes'. Erroneous comments may be cancelled at any
time up to submission by clicking on 'Cancel Changes'.
Changes to
existing comments may also be made via the Edit Comment screen.
8. Minimal requirements for a ChEBI entry
The following
are the minimal requirements for an entity or class to be checked.
8.1 Entity
Obligatory
- IUPAC name
- Formula
- Classification
Optional
- Registry number(s)
- Cross-reference(s)
- Structure(s)
- Synonym(s)
8.2 Class
Obligatory
- Classification (non-cyclic, parent)
- Classification (non-cyclic, child)
Optional
- IUPAC name
- Definition
- Cross-reference(s)
- Structure(s)
- Synonym(s)
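The obligatory items above can be expressed as a simple checklist. The field names below are hypothetical, and reading 'non-cyclic' as 'must not introduce a cycle into the is_a hierarchy' is an interpretation, so treat this as an illustrative sketch rather than the tool's actual check.

```python
# Obligatory items from section 8; the dictionary keys are hypothetical.
OBLIGATORY = {
    "entity": ["iupac_name", "formula", "classification"],
    "class": ["parent_classification", "child_classification"],
}


def missing_fields(entry: dict, kind: str) -> list[str]:
    """Obligatory fields that are absent or empty for an 'entity' or 'class'."""
    return [name for name in OBLIGATORY[kind] if not entry.get(name)]


def would_create_cycle(is_a: dict[str, set[str]], child: str, parent: str) -> bool:
    """True if adding 'child is_a parent' would make the hierarchy cyclic,
    i.e. if child can already be reached by walking upwards from parent.

    `is_a` maps each node to the set of its direct parents.
    """
    stack, seen = [parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(is_a.get(node, ()))
    return False
```

An entry is ready on this reading when `missing_fields` returns an empty list and none of its new classification links makes `would_create_cycle` return True.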
9. How to edit the entry for a group
Example: sulfanediyl group (CHEBI:29830).
It has:
- one IUPAC name from the IUPAC Blue Book Appendix 2: 'sulfanediyl'
- one structural formula, entered as a SYNONYM with source IUPAC: -S- (each dash denotes a single bond)
- two synonyms, 'sulfenyl' and 'thio', which are not recommended by IUPAC and
therefore carry the comment: "This name is explicitly not recommended by IUPAC."
The ChEBI name is usually formed as 'name + group', where 'name' is not necessarily the IUPAC name.
A structure for a group must contain at least one pseudoatom (attachment point) which is indicated with an asterisk, *. It is important that the annotator tick the 'Validation Off' box since otherwise ChEBI will not accept the structure.
The group should be linked to its parent molecule by the 'is substituent group from' relationship, which is used only for groups:
sulfanediyl group (CHEBI:29830) is substituent group from hydrogen sulfide (CHEBI:16136)
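The conventions described in this section can be summarised as a few checks. The record layout is hypothetical, and the SMILES string '*S*' for the sulfanediyl group is an assumption used only for illustration ('*' being the usual SMILES notation for an unspecified atom or attachment point).

```python
from dataclasses import dataclass, field


@dataclass
class GroupEntry:
    """Hypothetical, simplified record for a ChEBI group entry."""
    chebi_id: str
    chebi_name: str
    smiles: str  # '*' marks an attachment point (pseudoatom)
    relations: list[tuple[str, str]] = field(default_factory=list)  # (relation, target ID)


def group_problems(entry: GroupEntry) -> list[str]:
    """List deviations from the group conventions described in this section."""
    problems = []
    if not entry.chebi_name.endswith(" group"):
        problems.append("ChEBI name is usually formed as 'name + group'")
    if "*" not in entry.smiles:
        problems.append("structure needs at least one '*' attachment point")
    if not any(rel == "is substituent group from" for rel, _ in entry.relations):
        problems.append("no 'is substituent group from' link to the parent molecule")
    return problems
```

The name check only reports a deviation, since the 'name + group' form is the usual rather than a mandatory convention.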
10. Editing procedure for focus datasets
10.1 ChEBI Drugs
The main data resource for annotating ChEBI Drugs is DrugBank.
It should be noted that individual DrugBank entries may be linked to multiple ChEBI entities, as different salts, hydrates and isomers of a given compound
share a common ID in DrugBank but are annotated as different entities in ChEBI. As DrugBank is our primary source, we initially imported all DrugBank
brand names into ChEBI but soon realised that most of them are deprecated. We tried to distinguish between
brand names currently in use (those authorised and/or commercialised in at least one country) and those which are deprecated
(indicated in ChEBI by a yellow triangle surrounding an exclamation mark). However, this is a very manual and time-consuming process,
so drug brand names are currently not annotated in ChEBI. RxNorm appears to be a potential resource from which
non-deprecated brand names could be fetched automatically. To date, RxNorm includes FDA-approved brand names but only a few approved by government drug
administrations in other countries. To our knowledge, brand names in RxNorm are assigned correctly to salts, hydrates and isomers of a given
compound, so in future we might use this resource to annotate brand names automatically to ChEBI Drugs.
International Nonproprietary Names (INNs) facilitate the identification of pharmaceutical substances or active pharmaceutical ingredients.
Each INN is a unique name that is globally recognized and is public property. The World Health Organization collaborates closely with
INN experts and national nomenclature committees to select a single name of worldwide acceptability for each active substance that is to
be marketed as a pharmaceutical. Nonproprietary names, also known as generic names, are annotated for all ChEBI Drugs from
WHO MedNet services.
A racemic mixture, or racemate, is one that has equal amounts of left- and right-handed enantiomers of a chiral molecule.
Pharmaceutical development is increasingly moving towards single isomers rather than racemates;
however, around 400 racemic drugs are currently approved. Any drug that is a racemic compound is annotated as three ChEBI entities:
(1) a non-stereospecified entity, (2) the left-handed enantiomer and (3) the right-handed enantiomer. Each of the two enantiomers is linked
to the non-stereospecified entity via an "is_a" ontological relationship and to the other enantiomer via an "is_enantiomer_of" relationship. A single DrugBank ID is
cross-referenced to all three entities except when a specific DrugBank ID exists for one or both enantiomers. The same applies to
the INN of the racemic compound: it is the same for all three ChEBI entities except when a specific INN exists for one or both enantiomers.
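The three-entity pattern for racemic drugs can be sketched as follows. The `Entity` record, the function name, and the drug names, INNs and DrugBank ID in the example are hypothetical placeholders; enantiomer-specific DrugBank IDs and INNs override the shared values, as described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """Hypothetical, simplified ChEBI Drugs record."""
    name: str
    drugbank_id: str
    inn: str
    relations: list[tuple[str, str]] = field(default_factory=list)  # (relation, target name)


def annotate_racemate(name: str, drugbank_id: str, inn: str,
                      left_name: str, right_name: str,
                      left_drugbank: Optional[str] = None,
                      right_drugbank: Optional[str] = None,
                      left_inn: Optional[str] = None,
                      right_inn: Optional[str] = None):
    """Create the non-stereospecified entity and its two enantiomers.

    Each enantiomer inherits the racemate's DrugBank ID and INN unless an
    enantiomer-specific value is supplied.
    """
    racemic = Entity(name, drugbank_id, inn)
    left = Entity(left_name, left_drugbank or drugbank_id, left_inn or inn)
    right = Entity(right_name, right_drugbank or drugbank_id, right_inn or inn)
    for this, other in ((left, right), (right, left)):
        this.relations.append(("is_a", racemic.name))            # link to racemate
        this.relations.append(("is_enantiomer_of", other.name))  # link to mirror image
    return racemic, left, right
```

The non-stereospecified entity carries no relationships of its own here; it is the target of both "is_a" links.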
11. Editing procedure for Submissions
11.1 Structures
Structures do not have a source. If a submission includes a structure, the annotator should, if possible, retain this
in its original form. If a diagram needs modifying or completely redrawing in order to remove ambiguities, to
match the appearance of related entries, to agree with IUPAC recommendations or merely for aesthetic reasons,
then this new diagram should become the default and the submitter's original retained as an additional structure
provided that this has no inaccuracies or ambiguities. If a significant change is made to a submitter's
original, then the annotator should add a remark explaining the reasons for the change.
11.2 Formula, mass and charge
The formula, mass and charge information are derived from the submitted structure and will have as their
source SUBMITTER. If a structure is modified then the source will become ChEBI.
If a default structure is added/modified in ChEBI then a formula with source ChEBI will be added automatically.
If the default structure has status=SUBMITTED and that is changed to status=OK/CHECKED/DELETED, then all chemical data
with source=SUBMITTER will automatically be changed to source=CHEBI and the status changed to the relevant status=OK/CHECKED/DELETED.
If the submitted default structure is changed to a secondary structure (because the annotator adds a new default
structure), then the tool will regenerate the chemical data from the new default structure, assign it the source ChEBI
and update its status accordingly.
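The rule for a SUBMITTED default structure whose status changes behaves like a small state transition, sketched below. The `ChemData` record and the function name are hypothetical, and the regeneration step for a new default structure is not modelled because it depends on a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class ChemData:
    """Hypothetical record for one item of chemical data."""
    kind: str    # e.g. "formula", "mass", "charge"
    value: str
    source: str  # "SUBMITTER" or "CHEBI"
    status: str  # "SUBMITTED", "OK", "CHECKED" or "DELETED"


def on_default_structure_status_change(data: list[ChemData], old: str, new: str) -> None:
    """When a SUBMITTED default structure becomes OK/CHECKED/DELETED, switch all
    SUBMITTER-sourced chemical data to source CHEBI with the new status (in place)."""
    if old != "SUBMITTED" or new not in {"OK", "CHECKED", "DELETED"}:
        return
    for item in data:
        if item.source == "SUBMITTER":
            item.source = "CHEBI"
            item.status = new
```

Data that already has source CHEBI is left untouched, matching the rule's restriction to source=SUBMITTER.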
11.3 Nomenclature
If the annotator modifies a submitted name or synonym, the 'Adapted' tickbox should be checked and the source
retained as SUBMITTER. A submitted IUPAC Name, if correct, should be left with source=SUBMITTER.
However, if changes need to be made, then the source of the corrected name should be changed to IUPAC.
11.4 Ontology
The submitter's classification must be checked carefully and modified if necessary.
If the submitter has selected from the Simple classification view in the submission tool (i.e. is_a
organic molecular entity, is_a inorganic molecular entity, is_a group, is_a biological role or is_a application),
then in almost all cases a new lower-level classification should be added and the status of the submitter's original changed to DELETED.
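Spotting submissions that used the Simple classification view can be automated as a first pass. This sketch only flags the five Simple-view choices listed above, encoded as hypothetical (relation, target) pairs; adding the lower-level classification and setting the original to DELETED remain annotator decisions, since the rule applies in almost all cases rather than all.

```python
# The five Simple classification view choices listed in section 11.4,
# encoded as hypothetical (relation, target) pairs.
SIMPLE_VIEW = {
    ("is_a", "organic molecular entity"),
    ("is_a", "inorganic molecular entity"),
    ("is_a", "group"),
    ("is_a", "biological role"),
    ("is_a", "application"),
}


def flag_simple_classifications(submitted: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the submitted classifications that came from the Simple view.

    Flagging only: the annotator still adds the lower-level classification
    and changes the status of each flagged original to DELETED.
    """
    return [c for c in submitted if c in SIMPLE_VIEW]
```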
11.5 Xrefs
If xrefs are correct, then these should remain as source=SUBMITTER. If a submitter includes a CAS Registry Number
with no external source, then if possible the annotator should provide an authoritative (open-access) source for it.
If none can be found, then source=SUBMITTER should be retained.
11.6 Deletion of Entries
If an entry is to be deleted then the annotator should always add a Remark with a clear reason why the entry is deleted. This Remark will be publicly visible.
Appendix 1: MDL mol formats in ChEBI
ChEBI follows the
MDL mol format
specification for its molfiles, and what follows is a summary of that specification.
Below is a list of the types of file
formats available from MDL.
In ChEBI we use the molfile format but, as
described below, it allows various properties from the other file types.
In the table below is a list of properties
allowed in the properties block of a connection table. The molfile format
allows all properties except the [Reaction] properties.
Please refer to page 15 of the format
specification for the complete properties table. All the properties
listed in the table under molfiles are allowed in molfiles, but some are
restricted as to when they can be used. For example, the RGroup attachment point
(APO) requires that an RGroup be present in the connection table.
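To see which properties a given molfile actually uses, the property codes can be listed by skipping the header, counts line, atom block and bond block. The helper below is a hypothetical sketch for V2000 molfiles; it reports only 'M  xxx' lines and ignores other property-block lines (such as atom alias 'A  ' lines).

```python
def property_codes(molfile: str) -> list[str]:
    """Return the three-letter codes of 'M  xxx' lines in a V2000 molfile's
    properties block, stopping at 'M  END'."""
    lines = molfile.splitlines()
    counts = lines[3]               # counts line follows the 3-line header
    n_atoms = int(counts[0:3])      # aaa field
    n_bonds = int(counts[3:6])      # bbb field
    codes = []
    for line in lines[4 + n_atoms + n_bonds:]:
        if line.startswith("M  END"):
            break
        if line.startswith("M  "):
            codes.append(line[3:6])  # e.g. "ISO", "CHG", "APO"
    return codes
```

For the sodium-23 molfile in section 6.1.5, this would return a single code, ISO.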