ChEBI User Manual
2. Data fields
2.1 ChEBI ID
2.4 Last Modified
2.5 3-Star status
2.14 Average Mass
2.16 IUPAC name(s)
2.19 Brand names
2.20 Database links
2.21 Registry Number(s)
2.24 Submitter remark(s)
2.25 Supplier Information
3. Data sources
3.2 Other sources
3.2.7 NIST Chemistry WebBook
3.2.15 KEGG GLYCAN
3.2.16 KEGG DRUG
3.2.18 LIPID MAPS
3.2.19 EBI Industry Programme
4. Automatically generated cross-references
5. ChEBI Ontology
5.1 The Ontologies
5.3 The Relationships
Chemical Entities of Biological Interest (ChEBI)
is a freely available dictionary of 'small molecular entities'.
The term 'molecular entity' encompasses any constitutionally or isotopically distinct atom, molecule,
ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately
distinguishable entity. The molecular entities in question are either products of nature or
synthetic products used to intervene in the processes of living organisms (either on purpose,
as for drugs, or by accident, as for chemicals in the environment). The qualifier 'small' implies the exclusion of
entities directly encoded by the genome, and thus as a rule nucleic acids, proteins and peptides derived from
proteins by cleavage are not included.
Classes of molecular entities and part-molecular entities (in the form of substituent groups
or atoms) are also included in ChEBI.
All data in the database is non-proprietary or is derived from a non-proprietary source. It is thus freely accessible and available to anyone. In addition, each data item is fully traceable and explicitly referenced to the original source.
2. Data fields
2.1 ChEBI ID
A unique and stable identifier for the entity, for example, CHEBI:16236. It has no chemical significance and may be cited by external users.
For more information see the Annotation Manual.
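As an illustration of the identifier format described in Section 2.1, the following minimal sketch checks whether a string looks like a ChEBI ID and extracts its numeric part. The pattern (the prefix 'CHEBI:' followed by a positive integer) is an assumption inferred from the example CHEBI:16236, and the function name parse_chebi_id is hypothetical; neither is part of any ChEBI software.

```python
import re

# Assumed pattern: the prefix "CHEBI:" followed by a positive integer,
# inferred from the example CHEBI:16236 (not an official specification).
CHEBI_ID = re.compile(r"CHEBI:([1-9][0-9]*)")


def parse_chebi_id(text):
    """Return the numeric part of a ChEBI ID, or None if text is not one."""
    match = CHEBI_ID.fullmatch(text.strip())
    return int(match.group(1)) if match else None


print(parse_chebi_id("CHEBI:16236"))  # 16236
print(parse_chebi_id("16236"))        # None (prefix missing)
```

Because the identifier has no chemical significance, the numeric part should be treated only as an opaque key.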
Wikipedia: In addition to a definition, for those compounds or classes for which ChEBI provides a database accession link to Wikipedia, the first paragraph of the Wikipedia entry is reproduced, along with a link to the full article.
MDL molfile format. One entity can have one or more connection tables.
ChemAxon MarvinBeans, while clicking on 'Applet' will open an interactive MarvinView applet which allows the structure to be manipulated. Clicking on 'Image' restores the static image view. A link is provided beneath a structure to the corresponding MDL molfile.
For more information see the Annotation Manual.
A very useful 'Unofficial InChI FAQ' is also accessible at http://wwmm.ch.cam.ac.uk/inchifaq.
The following conventions regarding ChEBI formulae are followed:
Section 5 below.
Example: The IUPAC Name for abietic acid (CHEBI:28987) is abieta-7,13-dien-18-oic acid, based on the retained name 'abietane', rather than the fully systematic name (1R,4aR,10aR)-1,4a-dimethyl-7-(propan-2-yl)-1,2,3,4,4a,5,6,10,10a-decahydrophenanthrene-1-carboxylic acid (which is cited in ChEBI within the list of synonyms for this compound).
In most cases, a single IUPAC Name is provided for a molecular entity or a group. For organic compounds this name will, if necessary, be amended when the IUPAC rules for providing a 'Preferred IUPAC Name' for any organic compound are published. For further information on IUPAC's preferred names project see the relevant web page: http://www.iupac.org/projects/2001/2001-043-1-800.html
For more information see the Annotation Manual.
'Data sources' below). Systematic names may also be included in this section. In addition to English-language synonyms, versions may be shown in French, German, Spanish and Latin, the language being indicated by a flag.
For more information see the Annotation Manual.
Chemical Abstracts Service (CAS) Registry Number is a unique numeric identifier assigned to a substance when it enters the CAS REGISTRY database. Registry Numbers have no chemical significance and are assigned in sequential order to unique, new substances identified by CAS scientists for inclusion in the database.
Two principles of ChEBI are that (1) nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution and/or availability and (2) every data item in the database should be fully traceable and explicitly referenced to the original source. As such, it is impossible for ChEBI to cite CAS as a source for Registry Numbers as this organization's products are not freely accessible. ChEBI therefore cites other reliable and freely accessible sources for CAS Registry Numbers which are always fully referenced.
Other registry numbers which may be displayed are Beilstein and Gmelin Registry Numbers. For more information see the Annotation Manual.
CiteXplore, a web application of the EBI for the exploration of literature related to biological research and bioinformatics. Clicking on the 'Show Abstract' link displays the abstract as contained within CiteXplore.
ZINC and/or the eMolecules databases of commercially available compounds. Note that these links are obtained by automatic matching of InChIKeys, so no Supplier Information will be shown for entities which do not have an associated structure in ChEBI.
3. Data sources
IntEnz is the master copy of the Enzyme Nomenclature, the recommendations of the NC-IUBMB on the Nomenclature and Classification of Enzyme-Catalysed Reactions.
LIGAND composite database, COMPOUND is a collection of biochemical compound structures.
service providing web access to the Chemical Component Dictionary of the wwPDB as this is loaded into the PDBe database at the EBI (previously known as chemPDB and MSDchem).
ChEMBL resources at the EBI.
ChemIDplus provides access to structure and nomenclature authority files used for the identification of chemical substances cited in National Library of Medicine (NLM) databases.
NC-IUBMB. Of particular relevance is Glossary of Chemical Names used in the Enzyme Nomenclature.
IUPAC-IUBMB Joint Commission on Biochemical Nomenclature, a body jointly responsible to both IUBMB and IUPAC, which deals with matters of biochemical nomenclature that have importance in both biochemistry and chemistry.
IUPAC-IUB Commission on Biochemical Nomenclature, the forerunner of JCBN, which was discontinued in 1977.
National Institute of Standards and Technology operates a Chemistry WebBook providing access to chemical and physical property data for chemical species. The data provided are from collections maintained by the NIST Standard Reference Data Program and outside contributors.
wwpdb.org) organisation. EMBL-EBI's Protein Data Bank in Europe (PDBe; pdbe.org) is one of the founding members of wwPDB.
The University of Minnesota Biocatalysis/Biodegradation Database maintains a list of compounds involved in microbial biocatalytic reactions and biodegradation pathways.
RESID Database of Protein Modifications at the EBI is a comprehensive collection of annotations and structures for protein modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link post-translational modifications.
COMe (Co-Ordination of Metals) at the EBI represents an ontology for bioinorganic and other small molecule centres in complex proteins, using a classification system based on the concept of a bioinorganic motif.
EMBL Nucleotide Sequence Database (also known as EMBL-Bank) constitutes Europe's primary nucleotide sequence resource. It is produced by the EBI in international collaboration with GenBank at the NCBI (National Center for Biotechnology Information, USA) and DDBJ (DNA Data Bank of Japan).
UniProt Knowledgebase is a central access point for extensive curated protein information, including function, classification, and cross-reference, created in 2002 by joining information contained in Swiss-Prot, TrEMBL, and PIR.
MolBase was constructed by Dr Mark Winter of the University of Sheffield with input from undergraduate students.
KEGG LIGAND database, GLYCAN is a collection of experimentally determined glycan structures.
KEGG LIGAND database, KEGG DRUG contains chemical structures of drugs and additional information such as therapeutic categories and target molecules.
WebElements is a high-quality web-based source of chemistry information relating to the periodic table.
LIPID MAPS) consortium.
EuroFir (European Food Information Resource Network), the world-leading European Network of Excellence on Food Composition Databank systems, is a partnership between 48 universities, research institutes and small-to-medium sized enterprises (SMEs) from 25 European countries.
esp@cenet service of the European Patent Office.
DrugBank database is a bio- and chemo-informatics resource that combines detailed drug data with comprehensive drug target information.
EBI Industry Programme is a forum through which the EBI can provide training and research of benefit to the European pharmaceutical, biotechnology, consumer-goods, chemical and agricultural industries. The membership comprises many of the world's leading pharmaceutical, biotechnology and consumer-goods companies.
Enhanced automatically generated cross-references to a number of external databases are provided on a separate viewing screen reached via a tab on the main results screen. At the time of writing, automatically generated cross-references are provided to the following databases:
UniProt (Universal Protein Resource) is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL and PIR. UniProtKB (UniProt Knowledgebase) is one component and is the central access point for extensive curated protein information, including function, classification, and cross-reference. The links from a ChEBI entry enable a user to view the UniProtKB entries for all proteins associated with that particular compound and are updated monthly.
IntAct provides a freely available, open source database system and analysis tools for protein interaction data. As for UniProtKB (see above), the links from a ChEBI entry enable a user to view the IntAct entries for all proteins associated with that particular compound.
BioModels Database is a data resource, developed by a consortium including EMBL-EBI and Caltech, that allows biologists to store, search and retrieve published mathematical models of biological interest. Models present in BioModels Database are annotated and linked to relevant data resources, such as publications, databases of compounds and pathways and controlled vocabularies.
Reactome project is a curated resource of core pathways and reactions in human biology, developed as a collaboration among Cold Spring Harbor Laboratory, EMBL-EBI, and the Gene Ontology Consortium.
PubChem is a database maintained by the National Center for Biotechnology Information (NCBI). It contains substance descriptions and information on small molecules with fewer than 1000 atoms and 1000 bonds.
SABIO-RK (System for the Analysis of Biochemical Pathways - Reaction Kinetics) is a database that contains information about biochemical reactions, the corresponding kinetic equations with their parameters, and the experimental conditions under which these parameters were measured.
ArrayExpress is a public repository for transcriptomics and related data, aimed at storing data compliant with MIAME (Minimum Information About a Microarray Experiment) specifications.
Rhea, a collaboration between EMBL-EBI and the Swiss Institute of Bioinformatics (SIB), is a manually annotated database of chemical reactions in which all reaction participants (reactants and products) are linked to ChEBI. While its main focus is enzymatic reactions, other biochemical reactions are included.
IntEnz (Integrated relational Enzyme database) is a freely available resource focused on enzyme nomenclature. A collaboration between EMBL-EBI and the Swiss Institute of Bioinformatics (SIB), it contains the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) on the nomenclature and classification of enzyme-catalysed reactions.
BRENDA (BRaunschweig ENzyme DAtabase) represents an information system containing a huge amount of biochemical and molecular information on all classified enzymes as well as software tools for querying the database and calculating molecular properties.
NMRShiftDB is an NMR database for organic structures and their nuclear magnetic resonance (NMR) spectra.
5. ChEBI Ontology
The ChEBI Ontology is a structured classification of the entities contained within ChEBI. Originally developed as 'Chemical Ontology' by Michael Ashburner and Pankaj Jaiswal, the initial alpha release was subsumed into ChEBI and is currently in the process of being refined and extended. Its structure is essentially that of a directed acyclic graph (DAG), which differs from a simple taxonomy in that a child term can have many parent terms. Additionally, a number of relationships are incorporated which are cyclic in nature.
5.3.1 is a
Implies that 'Entity A' is a subtype of 'Entity B'. E.g.
or, in words,
chloroform (CHEBI:23143) is a subtype of the class of chloromethanes (CHEBI:23148), which means that all instances of chloroform are also instances of chloromethane. Chloromethanes is itself a subtype of the class of chloroalkanes (CHEBI:23143), and so forth.
Definition: "C is_a C' if and only if: given any c that instantiates C at a time t, c instantiates C' at t."
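The chain of subtypes in the chloroform example can be sketched as a small graph traversal. This is only an illustration of how 'is a' relationships compose: the dictionary IS_A and the function ancestors are hypothetical names, the data are limited to the three classes named above, and nothing here reflects how ChEBI stores or queries its ontology.

```python
# Toy 'is a' edges taken from the chloroform example; names only.
# Each term maps to a list of parents, so a term may have several
# parents, which is what makes the structure a DAG rather than a tree.
IS_A = {
    "chloroform": ["chloromethanes"],
    "chloromethanes": ["chloroalkanes"],
    "chloroalkanes": [],
}


def ancestors(term, is_a=IS_A):
    """Collect every class reachable from term by following 'is a' edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


print(sorted(ancestors("chloroform")))  # ['chloroalkanes', 'chloromethanes']
```

The result contains both the direct parent and the more distant class, matching the statement that every instance of chloroform is also an instance of each class further up the chain.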
or, in words,
potassium tetracyanonickelate(2−) (CHEBI:30071) has part tetracyanonickelate(2−) (CHEBI:30025).
Definition: "C has_part C' if and only if: given any c that instantiates C at a time t, there is some c' such that c' instantiates C' at time t, and c has c' as a part at t."
Thus, the neutral pyruvic acid (CHEBI:32816) is the conjugate acid of the pyruvate anion (CHEBI:15361), while as a corollary pyruvate is the conjugate base of the acid.
Definition: "A is_conjugate_acid_of B if and only if, given any a, a instantiates A and has the disposition to be a Bronsted Acid, then there is some b, such that b instantiates B and has the disposition to be a Bronsted Base, such that b derives from a through the removal of a proton as the result of a chemical transformation process."
Thus, L-serine (CHEBI:17115) and its zwitterion (CHEBI:33384) are tautomers.
Definition: "A is_tautomer_of B if and only if, given any a which instantiates A and has composition ca and is described by a molecular graph ag, there is some b that instantiates B, has composition cb and is described by a molecular graph bg, such that ca equals cb, ag is different from bg and a derives from b as the result of an intramolecular chemical transformation process (i.e. a chemical transformation process which has only one participant), in which only bonds to hydrogen are broken or formed."
Each relationship shows that D-alanine (CHEBI:15570) is an enantiomer of L-alanine (CHEBI:16977) and vice versa.
Definition: "A is_enantiomer_of B if and only if, given any a that instantiates A, has molecular graph ag, there is some b such that b instantiates B, is described by molecular graph bg, such that ca is equal to cb and ag is transformed into bg through a C2 symmetric transform."
Or, in words, 16α-hydroxyprogesterone (CHEBI:15826) can be derived by functional modification (i.e. 16α-hydroxylation) of progesterone (CHEBI:17026).
Definition: "A has_functional_parent B if and only if given any a, a instantiates A, has molecular graph ag and a obo:has_part some functional group fg, then there is some b such that b instantiates B, has molecular graph bg and has functional group fg’ such that bg is the result of a graph transformation process on ag resulting in the conversion of fg into fg’."
Thus 1,4-naphthoquinone (CHEBI:27418) has as its parent hydride the cyclic hydrocarbon naphthalene (CHEBI:16482).
Definition: "A has_parent_hydride B if and only if given any a, a instantiates A, has molecular graph ag and a obo:has_part some functional group fg, then there is some b such that b instantiates B, has molecular graph bg such that bg is the result of a graph transformation process on ag resulting in the removal of fg and its replacement by a hydrogen atom."
The L-valino group (CHEBI:32854) is derived by a proton loss from the N atom of L-valine (CHEBI:16414).
Definition: "A is_substituent_group_from B if and only if A is a group and B is a molecular entity; given any a that instantiates A, a has molecular graph ag and specified attachment point agap, and there is some b that instantiates B and has molecular graph bg, then it is the case that bg is the result of a graph transformation process on ag resulting in the replacement of agap by some group bgg (which may be a hydrogen atom or a more complex group)."
Thus morphine (CHEBI:17303) has a role opioid analgesic (CHEBI:35482).
Definition: "Chemical entity C has_role role R if and only if: given any c that instantiates C at t, there exists some r that instantiates R at t, and c is the bearer of r at t."
Checked
Entries and relationships which have been checked by a curator are shown in blue in the tree view.
Unchecked
Entries and relationships which have not been checked by a curator are shown in grey in the tree view. Such entries and relationships must be regarded as preliminary. All unchecked entries accessed via the tree view carry a heading 'Preliminary ChEBI Entry'.
Closed classes that include:
As "methylbenzene" means benzene substituted with one or more methyl groups, the class is closed, i.e. limited in size.
Anything else is an open class. For instance, toluenes (CHEBI:27024) includes toluene and various substituted toluenes, e.g. hydroxytoluenes (CHEBI:24751).
6. Developer's Reference
See the ChEBI Developer Manual for further information.
7.1 Searching the ChEBI database
The ChEBI search interface comprises two parts: a quick text search as well as an Advanced Search. Text searching in both the quick and Advanced searches employs Lucene, a full-featured text search engine library written entirely in Java, while the structure search facility of the Advanced Search uses the new chemical structure search algorithm OrChem, an Oracle chemistry plug-in using the Chemistry Development Kit (CDK).
7.1.1 Quick text search
A text search box is provided on the home page. It accepts either a precise search term or one employing wildcards; the wildcard character is the asterisk (*). The search engine searches for the term in all fields of the ChEBI entries and lists the results by score, with the highest-scoring compound first. For each result, the table shows the structure (if one exists within the database), the ChEBI ID and Name, the Text Search Score and the '3-star' symbol (where appropriate; see Section 2.5). Clicking on the ChEBI ID takes the user directly to that entry, while hovering the cursor over the structure enlarges it.
7.1.2 Advanced Search
The chemical structure search algorithm OrChem allows substructure and similarity searching to be performed on an Oracle 11g database. It allows the user to search on groups and residues, as well as on complete molecular entities. OrChem works in combination with the JChemPaint applet, an editor and viewer included in CDK for 2D chemical structures, and converts chemical structures into fingerprints, each fingerprint representing the occurrence of particular structural features. It is important to remember that fingerprints have limitations: they are good at indicating that a particular structural feature is not present, but they can only indicate a structural feature's presence with some probability.
Fingerprints are used to eliminate candidates from further examination in substructure searching. For molecule A to be a substructure of molecule B, all bits set in the fingerprint of molecule A must also be set in the fingerprint of molecule B. Once this initial screening has been performed, the remaining candidates are subjected to a more rigorous inspection to determine whether molecule A is in fact a substructure of molecule B.
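The screening step can be sketched with fingerprints held as Python integers, one bit per structural feature. This is a minimal illustration of the bit-subset test only: the function name and the example bit patterns are made up, and it does not reproduce OrChem's fingerprint generation or its subsequent rigorous check.

```python
def may_be_substructure(fp_query, fp_target):
    """Fingerprint screen: False rules the target out; True only means
    that the target must still be checked by a full substructure match."""
    # Every bit set in the query must also be set in the target.
    return (fp_query & fp_target) == fp_query


# Hypothetical 4-bit fingerprints, for illustration only.
query = 0b0101
print(may_be_substructure(query, 0b1101))  # True: both query bits are present
print(may_be_substructure(query, 0b1001))  # False: one query bit is missing
```

A result of True is not a match. It only means the candidate survives the screen, which is why the more rigorous inspection is still required.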
To perform a substructure search in ChEBI draw your chemical structure using the MarvinSketch applet. Then select the 'Chemical Structure Search' option 'Substructure' and click 'Search'. If your substructure is found within the database the results will be displayed with relevant links to the entities found.
Similarity searching is performed by calculating the Tanimoto coefficient for each structure within the database against the query structure. The Tanimoto coefficient measures the proportion of structural features that two chemical structures have in common, based on the fingerprints described above. A Tanimoto score of 1.0 indicates that the two fingerprints are identical. However, because the fingerprints are calculated on a chemical structure path depth of eight, many structures will have similar fingerprints and very high similarity scores even though they might not be very structurally similar.
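For bit fingerprints, the Tanimoto coefficient is commonly computed as the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below assumes that definition and holds fingerprints as Python integers. The function name, the example bit patterns and the choice to return 0.0 for two empty fingerprints are illustrative assumptions, not a description of the ChEBI implementation.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two bit fingerprints held as integers."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    either = bin(fp_a | fp_b).count("1")  # bits set in at least one
    # Assumption: two empty fingerprints score 0.0 (conventions differ).
    return common / either if either else 0.0


# Hypothetical 4-bit fingerprints, for illustration only.
print(tanimoto(0b1101, 0b0111))  # 0.5: two shared bits out of four set overall
print(tanimoto(0b1101, 0b1101))  # 1.0: identical fingerprints
```

A score of 1.0 here says only that the fingerprints coincide. Two different structures can still produce the same fingerprint.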
Identity searching is performed using the InChI as a chemical identifier.
Advanced text search
The text search facility of the Advanced Search allows users to search all the data or to filter a search by category (see below). Mass and charge can be searched within ranges: for example, one can search for all entities with a mass of between 150 and 300 atomic mass units. Furthermore, searches can be filtered by database: for example, one can search for entities used in the NMRShiftDB or PubChem databases.
As in the quick search, the asterisk (*) is the wildcard character. It allows you to find compounds by typing in a partial name: the search engine returns the names matching the pattern you have specified. You can place wildcards in any of the search options and in any of the search combinations.
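The effect of the asterisk can be illustrated by translating a wildcard pattern into a regular expression and matching it against a list of names. This is a sketch of the general idea only. ChEBI's text search is performed by Lucene, whose tokenisation and scoring are not reproduced here, and the function names, the case-insensitive matching and the short name list are assumptions made for the example.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a pattern in which '*' means 'any characters' into a regex."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)


def search(pattern, names):
    """Return the names that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


# A few entity names mentioned elsewhere in this manual.
names = ["pyruvic acid", "pyruvate", "L-serine", "L-alanine", "D-alanine"]
print(search("pyruv*", names))    # ['pyruvic acid', 'pyruvate']
print(search("*alanine", names))  # ['L-alanine', 'D-alanine']
```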
Users can also filter on the ChEBI ontology. This functionality allows one to retrieve all the children of a specific entity based on the relationship given. For example, all cofactors (CHEBI:23357) can be retrieved by entering the term cofactors using the 'has role' relationship. This will retrieve not only its direct children, such as pantothenic acids (CHEBI:25848), but also further entities in the graph related via an 'is a' relationship, such as NADPH (CHEBI:16474). It also allows retrieval of only those entities with chemical structures by ticking a specific checkbox.
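The behaviour of the ontology filter can be sketched as a two-stage lookup: first collect the entities that carry the role directly, then add everything linked to them through 'is a'. The graph below uses placeholder names and is not ChEBI content; the function name and data layout are likewise hypothetical, and the real search may be implemented quite differently.

```python
# Placeholder graph, for illustration only (not ChEBI data).
HAS_ROLE = {"class P": {"cofactor"}}     # entity -> roles it bears
IS_A = {
    "entity E": {"class P"},             # child -> its 'is a' parents
    "entity F": {"entity E"},
    "entity G": {"unrelated class"},
}


def entities_with_role(role, has_role=HAS_ROLE, is_a=IS_A):
    """Entities bearing the role directly, plus their 'is a' descendants."""
    found = {entity for entity, roles in has_role.items() if role in roles}
    changed = True
    while changed:  # repeat until no new descendant is added
        changed = False
        for child, parents in is_a.items():
            if child not in found and parents & found:
                found.add(child)
                changed = True
    return found


print(sorted(entities_with_role("cofactor")))
# ['class P', 'entity E', 'entity F']
```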
All the above searches can be combined by using the logical operators AND, OR and BUT NOT, and there are options on the Results page for exporting the search results in either MDL SD file, tab delimited or XML format.
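One way to think about the logical operators is as set operations on the result lists of the individual searches: AND as intersection, OR as union and BUT NOT as difference. The sketch below assumes that reading. The three result sets are invented for the example (they reuse identifiers mentioned in this manual but are not real search results).

```python
# Hypothetical result sets of three separate searches.
mass_hits = {"CHEBI:32816", "CHEBI:15361", "CHEBI:17115"}
name_hits = {"CHEBI:32816", "CHEBI:15361"}
charged_hits = {"CHEBI:15361"}

both = mass_hits & name_hits        # mass AND name
either = name_hits | charged_hits   # name OR charged
excluded = both - charged_hits      # (mass AND name) BUT NOT charged

print(sorted(both))      # ['CHEBI:15361', 'CHEBI:32816']
print(sorted(either))    # ['CHEBI:15361', 'CHEBI:32816']
print(sorted(excluded))  # ['CHEBI:32816']
```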
As mentioned above, users can also search by category. This option allows searches to be narrowed down by selecting from the categories provided, a summary of which is below:
Categories can be used within any combination of the logical operators described above.
7.2 RSS Feed
You can subscribe to the ChEBI RSS feed by downloading and installing an RSS reader. Once you have installed the reader, you can copy and paste the address of the RSS feed into its subscription toolbar and save it. Click on the RSS icon to subscribe to the RSS feed.
Firefox users! You can subscribe to the ChEBI RSS feed by clicking on the RSS link on the top right corner of your address bar.
Once you have bookmarked the RSS feed you can view the most up-to-date news via your bookmarks folder.
7.3 Browser Search Plugins
You can install the ChEBI search engine into your web browser's search box. ChEBI uses the OpenSearch description document format, which is supported by web browsers such as Internet Explorer 7 and Mozilla Firefox.
Follow the steps as follows: