Scientific collaborations



We are actively involved in several aspects of the development, adoption and dissemination of ontologies as standards for the annotation of life-science data. Ontologies are structured controlled vocabularies that have several features that make them ideal for the standardisation of annotations, including hierarchical organisation for flexible aggregation, semantics-free stable identifiers, and a plug-in architecture without dependence on a fixed database schema. We develop the ChEBI database and ontology for chemical entities of biological interest. ChEBI is the chemical ontology of choice for many life science data annotation projects, and has been adopted by the OBO Foundry as the reference ontology for chemical entities. ChEBI is also used by the Gene Ontology to identify chemicals in chemical-involving processes and functions. 
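To make "hierarchical organisation for flexible aggregation" concrete, here is a minimal sketch of how annotations made with specific terms can be rolled up to broader classes. The identifiers (`EX:0001` and so on), the `IS_A` table and the sample names are hypothetical and chosen only for illustration; they are not real ChEBI entries, and this is not how ChEBI itself is implemented.

```python
from collections import Counter

# Hypothetical opaque identifiers and is_a links (child -> parents).
# These are illustrative only and do not correspond to real ChEBI entries.
IS_A = {
    "EX:0003": ["EX:0002"],  # a specific compound is_a a compound class
    "EX:0004": ["EX:0002"],  # another compound in the same class
    "EX:0002": ["EX:0001"],  # the compound class is_a the root class
    "EX:0005": ["EX:0001"],  # a compound directly under the root class
}


def ancestors(term, is_a):
    """Return the term itself plus every class reachable through is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen


def aggregate(annotations, is_a):
    """Count annotations per class, rolling each one up the hierarchy."""
    counts = Counter()
    for _record_id, term in annotations:
        for cls in ancestors(term, is_a):
            counts[cls] += 1
    return counts


# Each (hypothetical) record is annotated with its most specific term.
annotations = [
    ("sample1", "EX:0003"),
    ("sample2", "EX:0004"),
    ("sample3", "EX:0005"),
]
counts = aggregate(annotations, IS_A)
```

Because the identifiers carry no meaning themselves, a term can be renamed or moved within the hierarchy without invalidating the annotations that refer to it.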

We have developed the CHEMINF ontology for chemical information entities, such as descriptors, algorithms and toolkits. It is used to provide provenance and disambiguation for the properties of chemical entities that are made available as open data in the context of the Semantic Web.

Our group coordinator, Janna Hastings, is active in the OBO Foundry Operations Committee and in the International Association for Ontologies and their Applications (IAOA).


Our group leads the COSMOS effort, Coordination of Standards in MetabOlomicS, which aims to drive forward the definition and adoption of standards for data exchange and annotation in the field of metabolomics. Metabolomics is an important phenotyping technique for molecular biology and medicine. It assesses the molecular state of an organism or collection of organisms through the comprehensive quantitative and qualitative analysis of all small molecules in cells, tissues, and body fluids. Metabolic processes are at the core of physiology. Consequently, metabolomics is ideally suited as a medical tool to characterise disease states in organisms, and as a tool for assessing the suitability of organisms for, for example, renewable energy production or biotechnological applications in general.

We are now seeing the emergence of metabolomics databases and repositories in various subareas of metabolomics, as well as the emergence of large general e-infrastructures in the life sciences. In particular, the BioMedBridges project is set to link a variety of European Strategy Forum on Research Infrastructures (ESFRI) projects, such as ELIXIR and BBMRI. Metabolomics generates large and diverse sets of analytical data and therefore imposes significant challenges on the above-mentioned e-infrastructures. The COSMOS effort is designed to develop standards and policies to ensure that metabolomics data are:

  • Encoded in open standards to allow barrier-free and widespread analysis.
  • Tagged with a community-agreed, complete set of metadata (minimum information standard).
  • Supported by a communally developed set of open source data management and capturing tools.
  • Disseminated in open-access databases adhering to the above standards.
  • Supported by vendors and publishers, who require deposition upon publication.
  • Properly interfaced with data in other biomedical and life-science e-infrastructures (such as ELIXIR, BioMedBridges, EU-Openscreen).

COSMOS brings together leading European groups in metabolomics and will interface with all interested players in metabolomics and beyond, worldwide.

Standards in Engineered Nanomaterial Safety assessment

Our group is a partner in the eNanoMapper European project which aims to develop a comprehensive database and ontology for the nanomaterial safety assessment domain. We lead the work package on ontology development in this effort.

Standards in Chemical Biology

Our group was a partner in the EU-OPENSCREEN effort, the European Infrastructure of Open Screening Platforms for Chemical Biology, which aims to integrate high-throughput screening platforms, chemical libraries, chemical resources for hit discovery and optimisation, bio- and cheminformatics support, and a database containing screening results, assay protocols, and chemical information. We led the Standardisation work package, tasked with defining a core set of representational and transfer data standards for open data sharing and reproducible analysis in European chemical biology. As a part of this effort we are collaborating closely with the PubChem team for chemical data standardisation and the BioAssay Ontology team for biological assay description standardisation. We have also contributed to the development of the Minimum Information About a Bioactive Entity (MIABE) project.

Standards for Chemical Markup -- CML and CMLSpect

Chemical Markup Language (CML) is an XML language designed to facilitate the creation, interchange, and deposition of chemical information. CML covers many areas of mainstream chemistry including:

  • Molecules - structures and properties
  • Reactions, including properties and reaction schemes
  • Spectra, especially as found in chemical publications (CMLSpect)
  • Crystallography, especially the interplay of structure and chemistry
  • Computational chemistry

The Steinbeck group has been closely involved in the development of CML, especially CMLSpect. CMLSpect is heavily used in Bioclipse to handle spectral information.
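As a rough illustration of the molecule representation mentioned above, the following sketch reads a small CML fragment using only the Python standard library. The sample document (a two-atom formaldehyde-like molecule with implicit hydrogens) and the helper `read_molecule` are assumptions made for this example. Real CML files are often richer, and dedicated toolkits would normally be used instead of hand-written parsing.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by CML documents.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# A small hand-written CML fragment, used purely as example input.
CML_DOC = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C" hydrogenCount="2"/>
    <atom id="a2" elementType="O" hydrogenCount="0"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="2"/>
  </bondArray>
</molecule>"""


def read_molecule(cml_text):
    """Hypothetical helper: extract atoms and bonds from a CML molecule.

    Returns a dict mapping atom id to element symbol, and a list of
    (atom_id_1, atom_id_2, bond_order) tuples.
    """
    root = ET.fromstring(cml_text)
    atoms = {
        atom.get("id"): atom.get("elementType")
        for atom in root.findall("cml:atomArray/cml:atom", CML_NS)
    }
    bonds = []
    for bond in root.findall("cml:bondArray/cml:bond", CML_NS):
        first, second = bond.get("atomRefs2").split()
        bonds.append((first, second, bond.get("order")))
    return atoms, bonds


atoms, bonds = read_molecule(CML_DOC)
```

Because CML is plain XML, any standard XML parser can read it. The chemistry-specific meaning lives in the element and attribute names rather than in a custom file syntax.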

With thanks to our funders

  • EMBL
  • Continued development of ChEBI to increase usability for the systems biology and metabolic modelling communities (BBSRC, 3yr)
  • eNanoMapper - A comprehensive database and ontology for the domain of nanomaterial safety assessment (EU, 3yr)
  • EU-OPENSCREEN – A European chemical biology screening platform (EU, ESFRI, 3yr)
  • MetaboLights – Building the missing community resource for Metabolomics (BBSRC, 3yr)
  • Reconstruction of bacterial metabolic networks (BBSRC case studentship, 3yr)
  • Cheminformatics-based semiautomatic methods for improved curation of Chemical Entities of Biological Interest (EU grant SLING, 3yr)
  • Curation of compounds of relevance for allergy and immunology for ChEBI (NIH through LIAI, 2.5yr)
  • Continued development of the ChEBI ontology and database for improved interoperability with biomedical resources (BBSRC, 3yr)
  • Determination of Target Family Landscapes of Protein Binding Pockets (Orion Pharma, 1yr)
  • Cooperation on Chemical Markup Language (CML) with Prof. Peter Murray-Rust's group at Unilever Center for Molecular Informatics, University of Cambridge, UK (DAAD-ARC travel grant, 1yr)
  • Computational methods for the sequence determination of oligonucleotides by tandem mass spectrometry (Roche Diagnostics, Penzberg, Germany, 3yr)
  • NMRShiftDB – An open source, open content database for organic structures and their NMR data (German Research Council - Deutsche Forschungsgemeinschaft (DFG), 4yr)
  • SENECA – Stochastic optimization in constitution space for the structure elucidation of biological metabolites (German Research Council - Deutsche Forschungsgemeinschaft (DFG), 2yr)