ChEBI, EMBL-EBI’s database of Chemical Entities of Biological Interest, is a freely available, manually annotated database of small molecular entities (molecules not encoded by the genome, Figure 1). These could include any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, or anything else that is a separately distinguishable entity.
ChEBI focuses on chemical nomenclature and structures, and provides a wide range of related chemical data such as formulae, links to other databases and an ontology for the chemical space. It aims to bridge the gap between small molecules and the macromolecules with which they interact in living systems.
The ChEBI database combines chemical nomenclature, structures, synonyms and related chemical information from a number of freely accessible sources. All data are manually annotated to a high standard before public release, using nomenclature, symbolism and terminology endorsed by the International Union of Pure and Applied Chemistry (IUPAC) and the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB).
A major feature of ChEBI is that entries are related to each other using the ChEBI ontology. This represents the meaning of the data in a structured manner by creating relationships between entities and their parents (less specialised terms) and/or children (more specialised terms). ChEBI is probably the only chemistry database to include an ontology. The ChEBI ontology is used by a number of biological ontologies to manage their chemistry-related terms. For more information, see 'The ChEBI ontology' section of this Quick tour.
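The parent/child idea can be illustrated with a few lines of Python. This is only a minimal sketch: the four-term hierarchy below is a simplified toy example written for this illustration, not a copy of actual ChEBI records, and the function names `ancestors` and `children` are hypothetical.

```python
from collections import deque

# Toy hierarchy for illustration only; these are NOT actual ChEBI records.
# Each entity maps to its direct parents (less specialised terms).
PARENTS = {
    "cholesterol": ["sterol"],
    "sterol": ["steroid"],
    "steroid": ["lipid"],
    "lipid": [],
}


def ancestors(entity, parents=PARENTS):
    """Return every less specialised term reachable from `entity`, nearest first."""
    seen = []
    queue = deque(parents.get(entity, []))
    while queue:
        term = queue.popleft()
        if term not in seen:
            seen.append(term)
            queue.extend(parents.get(term, []))
    return seen


def children(entity, parents=PARENTS):
    """Return the direct children (more specialised terms) of `entity`."""
    return sorted(child for child, ps in parents.items() if entity in ps)


print(ancestors("cholesterol"))  # ['sterol', 'steroid', 'lipid']
print(children("steroid"))       # ['sterol']
```

Storing only the direct parents keeps the data small; the more general terms are recovered by walking the graph when they are needed.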
Data in ChEBI are divided into:
Fully annotated (‘three star’) entries.
Data curated elsewhere but not yet checked by ChEBI curators.
In addition, small molecule data from partner databases such as ChEMBL, a large database of drugs and drug-like molecules, can also be searched from ChEBI.
You can choose to search for 'three star' entries only, to search 'All in ChEBI' (three star entries plus data curated elsewhere), or to search over all of the data in ChEBI and the partner databases (Figure 2).
Find the correct chemical terminology using name, formula or registry numbers, including CAS, Beilstein/Reaxys and Gmelin Registry Numbers.
Visualise chemical structures and use the chemical substructure and similarity search powered by OrChem, an open source Oracle chemistry plug-in. The facility allows you to draw or upload a chemical structure and then perform exact, substructure, or similarity searches.
View the relationships between molecules using the ChEBI ontology, either from within a ChEBI entry or using the EBI’s Ontology Lookup Service.
Bridge the gap between small molecules and the macromolecules with which they interact. Biological databases such as the UniProt Knowledgebase and Reactome allow you to view cross-references to all entries featuring a particular chemical.
Download chemical structures in MDL Molfile format and manipulate them using a Java applet.
Simply type the term (e.g. cholesterol), formula (e.g. C6H12O6), registry number (e.g. 64-17-5), InChI (IUPAC International Chemical Identifier, e.g. InChI=1/H2O/h1H2) or ChEBI identifier (e.g. 30815) into the search box on the ChEBI home page, then click on the 'Search ChEBI' button or press 'Enter' on your keyboard.
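To make the different kinds of search term concrete, the sketch below guesses which kind a given string is. It is a heuristic written for this illustration and does not describe how the ChEBI search box works internally; the function names and the returned labels are hypothetical. The CAS check is the standard check-digit rule for CAS Registry Numbers.

```python
import re

CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")
FORMULA_PATTERN = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def is_valid_cas(number):
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.fullmatch(number)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... from right to left; the sum modulo 10
    # must equal the final check digit.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify_query(query):
    """Guess which kind of search term `query` is (illustrative heuristic)."""
    query = query.strip()
    if query.startswith("InChI="):
        return "inchi"
    if is_valid_cas(query):
        return "registry number"
    if re.fullmatch(r"(?:CHEBI:)?\d+", query):
        # A bare integer could also be a Beilstein/Reaxys or Gmelin number;
        # this sketch does not try to tell them apart.
        return "chebi id"
    if FORMULA_PATTERN.fullmatch(query):
        return "formula"
    return "name"


for term in ["cholesterol", "C6H12O6", "64-17-5", "InChI=1/H2O/h1H2", "30815"]:
    print(term, "->", classify_query(term))
```

The order of the tests matters: the most distinctive patterns (the InChI prefix, the CAS layout) are checked first, and anything that matches none of them is treated as a name.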
Wildcards (*) can be used to search using a partial name; e.g. searching for cholest* will find all those entities which have a name or synonym starting with 'cholest', such as cholesterol and cholesteryl β-D-glucoside.
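The behaviour of the wildcard can be mimicked locally on a list of names. The sketch below assumes that `*` stands for any run of characters and that matching is case-insensitive; both are assumptions made for this illustration, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern in which '*' matches any run of characters."""
    # Escape everything except '*', so that brackets and other symbols
    # common in chemical names are matched literally.
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts), re.IGNORECASE)


def wildcard_search(pattern, names):
    """Return the names that match `pattern` in full."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


names = ["cholesterol", "cholesteryl β-D-glucoside", "ethanol"]
print(wildcard_search("cholest*", names))
# ['cholesterol', 'cholesteryl β-D-glucoside']
```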
From the Advanced Search page (accessed from the menu in the top left-hand corner of any ChEBI page) searches can be performed using several terms at once or restricted to specific fields (e.g. ChEBI name, synonym, formula).
Structure-based searches can be performed by drawing or loading a chemical structure and the search can be further restricted by combining with a term-based search.
Retrieving data from ChEBI
The entire ChEBI database can be downloaded from ChEBI's ftp site in several formats including SDF, Oracle and generic database dumps, flat files and the Open Biomedical Ontologies (OBO) format.
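Once an SDF file has been downloaded, its records can be read with very little code. The sketch below relies only on the general layout of the SDF format: each record is a Molfile block ending in `M  END`, followed by named data items, and records are separated by a `$$$$` line. The sample text, the identifiers `EX:1` and `EX:2`, and the field names `ID` and `Name` are hand-written assumptions for this illustration and may not match the field names in the real ChEBI files.

```python
import re

# Hand-written two-record sample; identifiers and field names are hypothetical.
SAMPLE_SDF = """\
water
  hand-written example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ID>
EX:1

> <Name>
water

$$$$
ammonia
  hand-written example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ID>
EX:2

> <Name>
ammonia

$$$$
"""

FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def parse_sdf(text):
    """Yield (molblock, fields) for each '$$$$'-terminated record in `text`."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record delimiter
            if record:
                yield _parse_record(record)
            record = []
        else:
            record.append(line)


def _parse_record(lines):
    # The Molfile block ends with the 'M  END' line.
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END")) + 1
    molblock = "\n".join(lines[:end])
    fields, name, value = {}, None, []
    for line in lines[end:]:
        header = FIELD_HEADER.match(line)
        if name is None and header:
            name, value = header.group(1), []
        elif name is not None and line.strip() == "":
            fields[name] = "\n".join(value)  # a blank line closes the data item
            name = None
        elif name is not None:
            value.append(line)
    if name is not None:  # data item not followed by a blank line
        fields[name] = "\n".join(value)
    return molblock, fields


for molblock, fields in parse_sdf(SAMPLE_SDF):
    print(fields["ID"], fields["Name"])
```

For real work a cheminformatics toolkit is a better choice, since it also interprets the structure in the Molfile block; this sketch only separates the records and collects their data fields.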
ChEBI is funded by the European Commission under SLING, grant agreement number 226073 (Integrating Activity) within Research Infrastructures of the FP7 Capacities Specific Programme, and by the BBSRC, grant agreement number BB/G022747/1 within the Bioinformatics and Biological Resources Fund.
Gareth Owen is a member of the Cheminformatics and Metabolism group at the EBI, where he works as curator and project manager for the ChEBI database. Gareth obtained his PhD in synthetic organic chemistry from Leeds University. He continued practising bench chemistry in a collaborative project with the Biotechnology unit at Sheffield University, synthesising radioactive intermediates that were used as part of an effort to produce morphine from microorganisms. He subsequently moved into the area of cheminformatics, designing and building both reaction and molecule databases for ORAC Ltd and later for Synopsys and Accelrys, before joining EMBL-EBI in 2010.
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds (i.e. excluding biopolymers such as proteins and nucleic acids). http://www.ebi.ac.uk/chebi
ChEMBL is a chemogenomic database that brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs. www.ebi.ac.uk/chembl
Open Biomedical Ontologies
The Open Biomedical Ontologies (OBO) project is a coordinated international collaborative effort to define standards and methodologies for the development of ontologies and controlled vocabularies in the life sciences. The OBO project defines the OBO Format for ontology serialisation and the OBO Edit tool for ontology editing and visualisation. It also provides the OBO Foundry, a web registry for orthogonal and interoperable domain reference ontologies, which include the Gene Ontology and ChEBI.
The Oracle object-relational database management system (ORDBMS), produced and marketed by Oracle Corporation.
A database of the biological processes known to be associated with particular proteins in humans. It is a collaboration between the EBI, The Ontario Institute For Cancer Research, Cold Spring Harbor Laboratory and New York University Medical Center. www.reactome.org
Structure Data Format (SDF) files, also known as SD Files, are simple ASCII text files that adhere to a strict format for representing multiple chemical structure records and associated data fields. Originally developed and published by MDL (now part of Accelrys) the format is the most widely used public standard for exchange of chemical structure/data information.
Low molecular weight organic compounds which are not polymers.
UniProt is the central access point for extensive curated protein information, including function, classification and cross-references. See www.uniprot.org for more information.
A professional scientist who collects, annotates, and validates information that is disseminated by biological and model organism databases. The role of a biocurator encompasses quality control of primary biological research data intended for publication, extracting and organizing data from original scientific literature, and describing the data with standard annotation protocols and vocabularies that enable powerful queries and biological database inter-operability. Curators communicate with researchers to ensure the accuracy of curated information and to foster data exchanges with research laboratories.
A record of the table structure and/or the data from a database.
A file that contains one record per line. Individual fields within such a record are typically separated by delimiters such as commas or tabs.
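As a small illustration of this layout, the sketch below reads a tab-delimited flat file with Python's standard `csv` module. The sample content, the column names and the identifiers are hand-written assumptions for this example, not an excerpt from a ChEBI download, and `read_flat_file` is a hypothetical helper name.

```python
import csv
import io

# Hand-written sample; the column names and identifiers are hypothetical.
SAMPLE = "ID\tNAME\tFORMULA\nEX:1\twater\tH2O\nEX:2\tethanol\tC2H6O\n"


def read_flat_file(handle, delimiter="\t"):
    """Return one dict per line, keyed by the column names in the header line."""
    return list(csv.DictReader(handle, delimiter=delimiter))


rows = read_flat_file(io.StringIO(SAMPLE))
print(rows[1]["NAME"])  # ethanol
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.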