ChEBI User Manual
Page Contents
1. Introduction
2. Data fields
2.1 ChEBI ID
2.3 Definition
2.4 Last Modified
2.5 Stars
2.7 Submitter
2.10 InChIKey
2.11 SMILES
2.12 Formula
2.13 Charge
2.14 Average Mass
2.15 Ontology
2.16 IUPAC name(s)
2.17 INNs
2.19 Brand names
2.20 Database links
2.21 Registry Number(s)
2.22 Comment(s)
2.23 Citation(s)
2.24 Submitter remark(s)
3. Data sources
3.2 Other sources
3.2.1 ChEBI
3.2.2 ChemIDplus
3.2.3 IUBMB
3.2.4 IUPAC
3.2.5 JCBN
3.2.6 CBN
3.2.7 NIST Chemistry WebBook
3.2.8 PDB
3.2.9 UM-BBD
3.2.10 RESID
3.2.11 COMe
3.2.12 EMBL
3.2.13 UniProt
3.2.14 MolBase
3.2.15 KEGG GLYCAN
3.2.16 KEGG DRUG
3.2.17 WebElements
3.2.18 LIPID MAPS
3.2.19 EuroFir
3.2.20 Patent
3.2.21 DrugBank
3.2.22 EBI Industry Programme
4. Automatically generated cross-references
4.1 UniProtKB
4.2 IntAct
4.4 Reactome
4.5 PubChem
4.6 SABIO-RK
4.7 ArrayExpress
4.8 Rhea
4.9 IntEnz
4.10 BRENDA
4.11 NMRShiftDB
5. ChEBI Ontology
5.1 The Ontologies
5.3 The Relationships
5.3.1 is a
5.3.2 has part
5.3.4 is tautomer of
5.3.5 is enantiomer of
5.3.6 has functional parent
5.3.7 has parent hydride
5.3.9 has role
5.4 Status

1. Introduction
Chemical Entities of Biological Interest (ChEBI)
is a freely available dictionary of 'small molecular entities'.
The term 'molecular entity' encompasses any constitutionally or isotopically distinct atom, molecule,
ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately
distinguishable entity. The molecular entities in question are either products of nature or
synthetic products used to intervene in the processes of living organisms (either on purpose,
as for drugs, or by accident, as for chemicals in the environment). The qualifier 'small' implies the exclusion of
entities directly encoded by the genome, and thus as a rule nucleic acids, proteins and peptides derived from
proteins by cleavage are not included.
Classes of molecular entities and part-molecular entities (in the form of substituent groups
or atoms) are also included in ChEBI.
All data in the database is non-proprietary or is derived from a non-proprietary source. It is thus freely accessible and available to anyone. In addition, each data item is fully traceable and explicitly referenced to the original source.

2. Data Fields

2.1 ChEBI ID
A unique and stable identifier for the entity, for example, CHEBI:16236. It has no chemical significance and may be cited by external users.

2.2 ChEBI Names

2.2.1 ChEBI Name
The name for an entity recommended for use by the biological community. In general, traditional names have been retained by ChEBI, but these may have been modified to enhance clarity, avoid ambiguity and follow more closely current IUPAC recommendations on chemical nomenclature.
For more information see the Annotation Manual.

2.2.2 ChEBI ASCII Name
The ChEBI Name is also provided in ASCII format if the original includes special characters which require a Unicode presentation.

2.3 Definition
A short verbal definition is included in some entries (and for all new entries annotated after June 2009). For more information see the Annotation Manual.
Wikipedia: In addition to a definition, for those compounds or classes for which ChEBI provides a database accession link to Wikipedia, the first paragraph of the Wikipedia entry is reproduced, along with a link to the full article.

2.4 Last Modified
The date on which the entity was last modified by an annotator.

2.5 Stars
All entries are rated using a star system as follows:

2.6 Secondary ChEBI IDs
The IDs of any entries which may have been subsumed into the parent are listed here.

2.7 Submitter
If an entry is present by virtue of its having been submitted via the ChEBI Submissions Tool, the name of the submitter is displayed here (unless the submitter has elected to remain anonymous).

2.8 Structural diagrams

2.8.1 Connection tables
ChEBI stores the two-dimensional or three-dimensional structural diagrams as connection tables in MDL molfile format. One entity can have one or more connection tables.

2.8.2 Graphical representation
One or more structures may be displayed for an entity. Where there is more than one structure available, the additional ones may be viewed by clicking on the 'more structures' link beside the main displayed structure. By default, the diagrams are shown as static PNG images generated by ChemAxon MarvinBeans, while clicking on 'Applet' will open an interactive MarvinView applet which allows the structure to be manipulated. Clicking on 'Image' restores the static image view. A link is provided beneath a structure to the corresponding MDL molfile.
For more information see the Annotation Manual.

2.9 IUPAC International Chemical Identifier (InChI)
The InChI is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations. It expresses chemical structures in terms of atomic connectivity, tautomeric state, isotopes, stereochemistry and electronic charge in order to produce a sequence of machine-readable characters unique to the respective molecule. Further information on the InChI is available at http://www.iupac.org/inchi/.
A very useful 'Unofficial InChI FAQ' is also accessible at http://wwmm.ch.cam.ac.uk/inchifaq.

2.10 InChIKey
The InChIKey is a 25-character hashed version of the full InChI, designed to allow for easy web searches of chemical compounds.
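As a rough illustration of the 25-character layout detailed in the following paragraph (a 14-character block, a hyphen, an 8-character block, a version character and a checksum character), the Python sketch below splits a key into those four parts. The function name `split_inchikey`, the field names and the example key are hypothetical and not part of ChEBI; the restriction to uppercase letters A-Z is an assumption. Current standard InChIKeys use a longer, 27-character layout, which this sketch would reject.

```python
import re
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    connectivity_hash: str  # first block: 14 characters hashed from connectivity
    remainder_hash: str     # 8 characters hashed from the remaining InChI layers
    version_flag: str       # single character indicating the InChI version
    check_char: str         # single checksum character


# Assumption: every character other than the hyphen is an uppercase letter A-Z.
_LAYOUT = re.compile(r"([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])")


def split_inchikey(key: str) -> InChIKeyParts:
    """Split a key that follows the 25-character layout described in this section."""
    match = _LAYOUT.fullmatch(key)
    if match is None:
        raise ValueError(f"not in the 25-character InChIKey layout: {key!r}")
    return InChIKeyParts(*match.groups())


# Placeholder key, made up purely to show the layout; it is not a real compound.
example = "ABCDEFGHIJKLMN-OPQRSTUVAB"
parts = split_inchikey(example)
print(parts.connectivity_hash, parts.remainder_hash, parts.version_flag, parts.check_char)
```

Only the first field would be compared when looking for structures that share the same connectivity, which is the case the duplication estimate for the first block addresses.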
InChIKeys consist of 14 characters resulting from a hash of the connectivity information of the InChI, followed by a hyphen, followed by 8 characters resulting from a hash of the remaining layers of the InChI, followed by a single character indicating the version of InChI used, followed by a single checksum character. There is a finite, but very small, probability of finding two structures with the same InChIKey. However, the probability for duplication of only the first block of 14 characters has been estimated as one duplication in 75 databases each containing one billion unique structures; such duplication therefore appears unlikely at present. Further information on the InChIKey is available at http://old.iupac.org/inchi/release102.html.

2.11 SMILES
SMILES (Simplified Molecular Input Line Entry System) is a simple but comprehensive chemical line notation, created in 1986 by David Weininger and further extended by Daylight Chemical Information Systems, Inc. SMILES specifically represents a valence model of a molecule and is widely used as a data exchange format. Further information on SMILES is available at http://www.daylight.com/smiles/.

2.12 Formula
Where possible, formulae are assigned for entities and groups. For compounds consisting of discrete molecules, this is generally the molecular formula, a formula according with the relative molecular mass (or the structure). To facilitate searching and downloading of data from external sources, the use of subscripts to indicate multipliers is avoided.
The following conventions regarding ChEBI formulae are followed:

2.13 Net Charge
The charge is the sum of all the positive and negative charges shown in the structure. For ions the magnitude of the charge is given in arabic numerals preceded by the sign of the charge. For neutral molecules the charge is indicated as a numerical zero. For instance, the charge of 5,10,15,20-tetrakis(1-methylpyridinium-4-yl)porphyrin (CHEBI:37447) is +4; the charge of borate (CHEBI:22908) is -3.

2.14 Average Mass
Relative molecular, atomic and ionic masses are shown for molecular, atomic and ionic entities respectively. The relative masses are calculated from tables of relative atomic masses (atomic weights) published by IUPAC.

2.15 Ontology
See Section 5 below.

2.16 IUPAC name(s)
A name provided for an entity based on current recommendations of IUPAC. It need not be fully systematic as it makes use of 'retained names'.
Example: The IUPAC Name for abietic acid (CHEBI:28987) is abieta-7,13-dien-18-oic acid, based on the retained name 'abietane', rather than the fully systematic name (1R,4aR,10aR)-1,4a-dimethyl-
In most cases, a single IUPAC Name is provided for a molecular entity or a group. For organic compounds this name will, if necessary, be amended when the IUPAC rules for providing a 'Preferred IUPAC Name' for any organic compound are published. For further information on IUPAC's preferred names project see the relevant web page: http://www.iupac.org/projects/2001/2001-043-1-800.html
For more information see the Annotation Manual.

2.17 INN
In cases where an entity is a pharmaceutical substance, an International Nonproprietary Name (INN) may be shown. The INN is the official non-proprietary or generic name given to a pharmaceutical substance, as designated by the World Health Organisation (WHO).
INNs may appear in ChEBI in English, Latin, Spanish and French language versions.

2.18 Synonyms
Alternative names for an entity which either have been used in EBI or external sources or have been devised by the curators based on recommendations of IUPAC, NC-IUBMB or their associated bodies. The source of each synonym is clearly identified (see 'Data sources' below). Systematic names may also be included in this section. In addition to English-language synonyms, versions may be shown in French.
For more information see the Annotation Manual.

2.18.1 Adapted Synonyms
Synonyms are normally reproduced in the exact form in which they appear in the source. However, where changes have been made, e.g. to correct syntax or to convert from an index style of presentation, then this is indicated by .

2.19 Brand names
Where an entity is an active ingredient of a proprietary pharmaceutical preparation, the brand name of the preparation may be shown.

2.20 Database links
Direct links to the entries for an entity in the databases cited.

2.21 Registry Number(s)
The Chemical Abstracts Service (CAS) Registry Number is a unique numeric identifier assigned to a substance when it enters the CAS REGISTRY database. Registry Numbers have no chemical significance and are assigned in sequential order to unique, new substances identified by CAS scientists for inclusion in the database.
Two principles of ChEBI are that (1) nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution and/or availability and (2) every data item in the database should be fully traceable and explicitly referenced to the original source. As such, it is impossible for ChEBI to cite CAS as a source for Registry Numbers as this organization's products are not freely accessible. ChEBI therefore cites other reliable and freely accessible sources for CAS Registry Numbers, which are always fully referenced. Other registry numbers which may be displayed are Beilstein and Gmelin Registry Numbers.
For more information see the Annotation Manual.

2.22 Comment(s)
A free-text comment may be added to some terms, especially in cases where confusing terminology has been historically used. A comment may relate to a single term or to the entry as a whole.

2.23 Citation(s)
Publications which cite the entity are listed here, along with hyperlinks to the PubMed entry via CiteXplore, a web application of the EBI for the exploration of literature related to biological research and bioinformatics. Clicking on the 'Show Abstract' link displays the abstract as contained within CiteXplore.

2.24 Submitter remark(s)
For entries initiated via the ChEBI Submission tool, a record of any discussion had between the submitter and annotator.

3. Data sources

3.1 Main sources

3.1.1 IntEnz
The Integrated relational Enzyme database of the EBI. IntEnz is the master copy of the Enzyme Nomenclature, the recommendations of the NC-IUBMB on the Nomenclature and Classification of Enzyme-Catalysed Reactions.

3.1.2 KEGG COMPOUND
One part of the Kyoto Encyclopedia of Genes and Genomes LIGAND composite database, COMPOUND is a collection of biochemical compound structures.

3.1.3 PDBeChem
The service providing web access to the Chemical Component Dictionary of the wwPDB as this is loaded into the PDBe database at the EBI (previously known as chemPDB and MSDchem).

3.1.4 ChEMBL
A database of approximately 500,000 bioactive compounds, their quantitative properties and bioactivities, abstracted from the primary scientific literature. It is part of the ChEMBL resources at the EBI.

3.2 Other sources
These sources are manually entered into the database by a ChEBI curator.

3.2.1 ChEBI
Indicates entry initiated by a ChEBI curator.

3.2.2 ChemIDplus
A free, web-based search system, ChemIDplus provides access to structure and nomenclature authority files used for the identification of chemical substances cited in National Library of Medicine (NLM) databases.

3.2.3 IUBMB
Name based on the recommendations of the NC-IUBMB.
Of particular relevance is the Glossary of Chemical Names used in the Enzyme Nomenclature.

3.2.4 IUPAC
Name based on the recommendations of IUPAC.

3.2.5 JCBN
Name based on the recommendations of the IUPAC-IUBMB Joint Commission on Biochemical Nomenclature, a body jointly responsible to both IUBMB and IUPAC, which deals with matters of biochemical nomenclature that have importance in both biochemistry and chemistry.

3.2.6 CBN
Name based on the recommendations of the IUPAC-IUB Commission on Biochemical Nomenclature, the forerunner of JCBN, which was discontinued in 1977.

3.2.7 NIST Chemistry WebBook
The National Institute of Standards and Technology operates a Chemistry WebBook providing access to chemical and physical property data for chemical species. The data provided are from collections maintained by the NIST Standard Reference Data Program and outside contributors.

3.2.8 PDB
The Protein Data Bank (PDB) is a repository for 3D structural data on biological macromolecules and their complexes. It is maintained by the Worldwide PDB (wwPDB; wwpdb.org) organisation. EMBL-EBI's Protein Data Bank in Europe (PDBe; pdbe.org) is one of the founding members of wwPDB.

3.2.9 UM-BBD
The University of Minnesota Biocatalysis/Biodegradation Database maintains a list of compounds involved in microbial biocatalytic reactions and biodegradation pathways.

3.2.10 RESID
The RESID Database of Protein Modifications at the EBI is a comprehensive collection of annotations and structures for protein modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link post-translational modifications.

3.2.11 COMe
COMe (Co-Ordination of Metals) at the EBI represents an ontology for bioinorganic and other small molecule centres in complex proteins, using a classification system based on the concept of a bioinorganic motif.

3.2.12 EMBL
The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) constitutes Europe's primary nucleotide sequence resource.
It is produced by the EBI in international collaboration with GenBank at the NCBI (National Centre for Biotechnology Information, USA) and DDBJ (DNA Data Bank of Japan).

3.2.13 UniProt
The UniProt Knowledgebase is a central access point for extensive curated protein information, including function, classification, and cross-reference, created in 2002 by joining information contained in Swiss-Prot, TrEMBL, and PIR.

3.2.14 MolBase
An online database of inorganic compounds, MolBase was constructed by Dr Mark Winter of the University of Sheffield with input from undergraduate students.

3.2.15 KEGG GLYCAN
A part of the KEGG LIGAND database, GLYCAN is a collection of experimentally determined glycan structures.

3.2.16 KEGG DRUG
A part of the KEGG LIGAND database, KEGG DRUG contains chemical structures of drugs and additional information such as therapeutic categories and target molecules.

3.2.17 WebElements
Authored by Dr Mark Winter of the University of Sheffield, WebElements is a high-quality web-based source of chemistry information relating to the periodic table.

3.2.18 LIPID MAPS
A comprehensive classification system for lipids developed by the Lipid Metabolites and Pathways Strategy (LIPID MAPS) consortium.

3.2.19 EuroFir
EuroFir (European Food Information Resource Network), the world-leading European Network of Excellence on Food Composition Databank systems, is a partnership between 48 universities, research institutes and small-to-medium sized enterprises (SMEs) from 25 European countries.

3.2.20 Patent
Links to patent documents which either cite the preparation, properties or uses of an entity, or are the source of a synonym, are provided via the esp@cenet service of the European Patent Office.

3.2.21 DrugBank
Developed at the University of Alberta, the DrugBank database is a bio- and chemo-informatics resource that combines detailed drug data with comprehensive drug target information.

3.2.22 EBI Industry Programme
The EBI Industry Programme is a forum through which the EBI can
provide training and research of benefit to the European pharmaceutical, biotechnology, consumer-goods, chemical and agricultural industries. The membership comprises many of the world's leading pharmaceutical, biotechnology and consumer-goods companies.

4. Automatically generated cross-references
Enhanced automatically generated cross-references to a number of external databases are provided on a separate viewing screen reached via a tab on the main results screen. At the time of writing, automatically generated cross-references are provided to the following databases:

4.1 UniProtKB
UniProt (Universal Protein Resource) is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL and PIR. UniProtKB (UniProt Knowledgebase) is one component and is the central access point for extensive curated protein information, including function, classification, and cross-reference. The links from a ChEBI entry enable a user to view the UniProtKB entries for all proteins associated with that particular compound and are updated monthly.

4.2 IntAct
A service of EMBL-EBI, IntAct provides a freely available, open source database system and analysis tools for protein interaction data. As for UniProtKB (see above), the links from a ChEBI entry enable a user to view the IntAct entries for all proteins associated with that particular compound.

4.3 BioModels Database
BioModels Database is a data resource, developed by a consortium including EMBL-EBI and Caltech, that allows biologists to store, search and retrieve published mathematical models of biological interest.
Models present in BioModels Database are annotated and linked to relevant data resources, such as publications, databases of compounds and pathways and controlled vocabularies.

4.4 Reactome
The Reactome project is a curated resource of core pathways and reactions in human biology, developed as a collaboration among Cold Spring Harbor Laboratory, EMBL-EBI, and the Gene Ontology Consortium.

4.5 PubChem
PubChem is a database maintained by the National Center for Biotechnology Information (NCBI). It contains substance descriptions and information on small molecules with fewer than 1000 atoms and 1000 bonds.

4.6 SABIO-RK
The SABIO-RK (System for the Analysis of Biochemical Pathways - Reaction Kinetics) is a database that contains information about biochemical reactions, the corresponding kinetic equations with their parameters, and the experimental conditions under which these parameters were measured.

4.7 ArrayExpress
ArrayExpress is a public repository for transcriptomics and related data, aimed at storing data compliant with MIAME (Minimum Information About a Microarray Experiment) specifications.

4.8 Rhea
Rhea, a collaboration between EMBL-EBI and the Swiss Institute of Bioinformatics (SIB), is a manually annotated database of chemical reactions in which all reaction participants (reactants and products) are linked to ChEBI. While its main focus is enzymatic reactions, other biochemical reactions are included.

4.9 IntEnz
IntEnz (Integrated relational Enzyme database) is a freely available resource focused on enzyme nomenclature.
A collaboration between EMBL-EBI and the Swiss Institute of Bioinformatics (SIB), it contains the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) on the nomenclature and classification of enzyme-catalysed reactions.

4.10 BRENDA
BRENDA (BRaunschweig ENzyme DAtabase) represents an information system containing a huge amount of biochemical and molecular information on all classified enzymes as well as software tools for querying the database and calculating molecular properties.

4.11 NMRShiftDB
NMRShiftDB is an NMR database for organic structures and their nuclear magnetic resonance (NMR) spectra.

5. ChEBI Ontology
The ChEBI Ontology is a structured classification of the entities contained within ChEBI. Originally developed as 'Chemical Ontology' by Michael Ashburner and Pankaj Jaiswal, the initial alpha release was subsumed into ChEBI and is currently in the process of being refined and extended. Its structure is essentially that of a directed acyclic graph (DAG), which differs from a simple taxonomy in that a child term can have many parent terms. Additionally, a number of relationships are incorporated which are cyclic in nature.

5.1 The Ontologies
The ChEBI Ontology is subdivided into three separate sub-ontologies:

5.2 The Views
Two options for visualising the ontology relationships for an entry in ChEBI are provided:

5.2.1 Outgoing and Incoming View
The default view, which states in words the relationships between a ChEBI entry and its immediate related entities.

5.2.2 Tree View
A view, accessed via the link at the foot of the Outgoing and Incoming View, which by means of graphic illustration places a ChEBI entry into context within the ontology structure. All parents within the hierarchy are shown, as well as the immediate children. Adjacent is a key identifying the relationships used within the tree structure. Entries and relationships which have been checked by a curator are shown in blue while preliminary (unchecked) ones are in grey. Clicking on a node within the tree will take the user to the ChEBI entry for that node. Unchecked ChEBI entries accessed by this route will display the heading 'Preliminary ChEBI Entry'.

5.3 The Relationships
For each relationship a formal definition is included beneath the description.

5.3.1