ChEMBL: Quick tour

Louisa Bellis [1] | Chemical biology | Beginner | 0.5 hour

This quick tour provides a brief introduction to ChEMBL, the EBI's chemogenomics resource. For a more detailed walkthrough of ChEMBL, have a look at our ChEMBL: Exploring bioactive drug-like molecules [2] tutorial.

Learning objectives:
- Gain a basic understanding of ChEMBL and how it can help you understand the interactions between drugs or drug-like molecules and their targets
- Know where to find out more about ChEMBL

What is ChEMBL?

Why do we need ChEMBL [3]?
The human genome sequence provided a complete molecular 'parts list' for researchers interested in improving human health. A key task now is to catalogue how the gene products interact with drugs and drug-like molecules. ChEMBL is a 'chemogenomic' database that brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs. The resource is therefore of interest to drug discovery researchers in large pharmaceutical companies, as well as small companies and academic institutions.

About ChEMBL
ChEMBL [4] is a publicly available database of drugs, drug-like small molecules [5] and their targets. The database is unique because of its focus on all aspects of drug discovery and its size, containing information on more than 1.4 million compounds and over 12 million records of their effects on biological systems.
The database includes information about how small molecules bind to their targets, how these compounds affect cells and whole organisms, and their absorption, distribution, metabolism [6], excretion and toxicity (ADMET [7]). ChEMBL holds two-dimensional structures, calculated molecular properties (e.g. logP [8], molecular weight and Lipinski 'Rule of Five' parameters) and bioactivity data (such as binding constants and pharmacology). The bioactivity data is tagged to show the links between molecular targets and published assays, with varying levels of confidence. Additional data on the clinical progress of compounds is being integrated into ChEMBL. The database holds structure-activity relationship (SAR) data that has been manually extracted and curated from the primary medicinal chemistry and pharmacology literature.
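To illustrate how calculated properties such as molecular weight and logP feed into the Lipinski 'Rule of Five', here is a minimal Python sketch. It assumes the properties have already been calculated; the `MolecularProperties` class and its field names are hypothetical and are not ChEMBL's schema. It applies Lipinski's usual thresholds: a molecular weight above 500, a logP above 5, more than 5 hydrogen-bond donors or more than 10 hydrogen-bond acceptors each count as one violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolecularProperties:
    """Pre-computed properties of one compound (hypothetical container, not ChEMBL's schema)."""
    molecular_weight: float  # in daltons
    logp: float              # calculated octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(props: MolecularProperties) -> int:
    """Count Lipinski violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    checks = [
        props.molecular_weight > 500,
        props.logp > 5,
        props.h_bond_donors > 5,
        props.h_bond_acceptors > 10,
    ]
    return sum(checks)  # each True counts as 1


def passes_rule_of_five(props: MolecularProperties, max_violations: int = 1) -> bool:
    """Flag a compound only when it breaks more than `max_violations` rules."""
    return rule_of_five_violations(props) <= max_violations
```

With the default `max_violations=1`, a compound is flagged only when it breaks two or more rules, following Lipinski's original formulation. ChEMBL's own calculated values may come from different algorithms, so treat this as an illustration of the rule rather than a reimplementation of ChEMBL's calculations.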
ChEMBL data is also a core component of the SARfaris: integrated chemogenomics workbenches for drug discovery that focus on protein kinases (Kinase SARfari [9]) and G protein-coupled receptors (GPCR SARfari [9]). They are central resources for target-class knowledge, combining both biological and chemical information.
Protein kinases are often regulators of cell signalling and are therefore key candidates for drug discovery research. Kinase SARfari contains a reference alignment of each protein kinase family, three-dimensional structures, bound ligand [10] conformations and binding sites. GPCRs [11] are transmembrane receptors involved in processes such as inflammation and neurotransmission, and they are implicated in many diseases. Both SARfaris contain SAR data and clinical candidate data extracted from ChEMBL.

ChEMBL data

What data does ChEMBL contain?
The ChEMBL [3] data combines data extracted from the literature with data sets deposited by companies such as GSK and data from other public resources such as PubChem. The curated data is made up of targets, organisms, compounds and their associated bioactivities (Figure 3).
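The combination of targets, organisms, compounds and bioactivities can be pictured as a small relational model in which each activity record links one compound to one target. The Python sketch below is a hypothetical in-memory illustration of that idea. The class names, fields and identifiers are assumptions made for illustration and do not reflect ChEMBL's actual database schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Target:
    target_id: str  # hypothetical identifier, not a real ChEMBL ID
    name: str
    organism: str


@dataclass(frozen=True)
class Compound:
    compound_id: str  # hypothetical identifier
    name: str


@dataclass(frozen=True)
class Activity:
    """One measured effect of a compound on a target, e.g. an IC50 in nM."""
    compound_id: str
    target_id: str
    activity_type: str  # e.g. "IC50" or "Ki"
    value: float
    units: str          # e.g. "nM"


@dataclass
class BioactivityStore:
    """Hypothetical in-memory store linking targets, organisms, compounds and activities."""
    targets: dict[str, Target] = field(default_factory=dict)
    compounds: dict[str, Compound] = field(default_factory=dict)
    activities: list[Activity] = field(default_factory=list)

    def activities_for_target(self, target_id: str) -> list[Activity]:
        return [a for a in self.activities if a.target_id == target_id]

    def compounds_for_target(self, target_id: str) -> list[Compound]:
        # dict.fromkeys keeps each compound once, in order of its first recorded activity.
        ids = dict.fromkeys(a.compound_id for a in self.activities_for_target(target_id))
        return [self.compounds[cid] for cid in ids]

    def targets_in_organism(self, organism: str) -> list[Target]:
        return [t for t in self.targets.values() if t.organism == organism]
```

Keeping activities as separate records, rather than attaching a single value to each compound, reflects the fact that one compound can be measured many times, against many targets and with different activity types.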

Web interface
The ChEMBL web page [4] provides flexible and easy access to ChEMBL's core bioactivity data. You can perform searches using compound names, compound structures, target names, target sequences, bioactivity and target classes. Searching is via an encrypted and secure protocol.

Target search
You can browse all data relating to a particular protein target using the protein family [12] classification tree, by searching by keyword, or by using the sequence similarity search option. You can filter the resulting data (for example by IC50 [13] or Ki [14]), download bioactivity data and identify interesting molecules on the basis of substructures, physicochemical properties, potency and ligand [10] efficiency.
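Filtering downloaded data by activity type and ranking molecules by ligand efficiency can be sketched in a few lines of Python. This sketch assumes activity values are already in nanomolar units. It converts them to pIC50 or pKi (9 minus the log10 of the nM value) and uses the common approximation LE ≈ 1.37 × pActivity / heavy-atom count, in kcal/mol per heavy atom at around 298 K. The `ActivityRecord` fields and all values are hypothetical, and ChEMBL may calculate ligand efficiency differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    compound_id: str    # hypothetical identifier
    activity_type: str  # e.g. "IC50" or "Ki"
    value_nm: float     # measured value, assumed to be in nM
    heavy_atoms: int    # number of non-hydrogen atoms in the compound


def p_activity(value_nm: float) -> float:
    """Convert a nanomolar IC50 or Ki into pIC50/pKi, i.e. -log10 of the molar value."""
    return 9.0 - math.log10(value_nm)


def ligand_efficiency(record: ActivityRecord) -> float:
    """Approximate LE in kcal/mol per heavy atom (1.37 approximates 2.303RT near 298 K)."""
    return 1.37 * p_activity(record.value_nm) / record.heavy_atoms


def filter_and_rank(
    records: list[ActivityRecord], activity_type: str, max_value_nm: float
) -> list[tuple[str, float]]:
    """Keep one activity type at or below a potency cut-off, ranked by ligand efficiency."""
    kept = [
        r for r in records
        if r.activity_type == activity_type and r.value_nm <= max_value_nm
    ]
    ranked = [(r.compound_id, ligand_efficiency(r)) for r in kept]
    return sorted(ranked, key=lambda item: -item[1])  # most efficient first
```

Because ligand efficiency normalises potency by molecular size, a smaller compound with a weaker IC50 can outrank a larger, more potent one, which is why it is a useful criterion when choosing interesting molecules.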

Compound search
You can search for data on compounds of interest by drawing a molecular structure or substructure, or by entering keywords or compound IDs. The results list all molecules containing that substructure, or molecules similar to the input structure. You can then identify molecules with interesting properties (such as their target interactions, selectivity and ADMET [7] profile) and download the data for further analysis.

Published on EMBL-EBI Train online (http://www.ebi.ac.uk/training/online)
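Similarity searching generally compares molecular fingerprints, and the Tanimoto coefficient (the number of shared bits divided by the number of distinct bits set in either fingerprint) is a widely used measure. The Python sketch below assumes each fingerprint is represented as a set of 'on' bit positions and uses an arbitrary 0.7 threshold. The fingerprint type and cut-off used by ChEMBL's own similarity search are not described here, so both are assumptions, as are the compound IDs in any usage.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def similar_compounds(
    query_fp: set[int], library: dict[str, set[int]], threshold: float = 0.7
) -> list[tuple[str, float]]:
    """Return (compound_id, similarity) pairs at or above the threshold, most similar first."""
    hits = [(cid, tanimoto(query_fp, fp)) for cid, fp in library.items()]
    return sorted(
        (hit for hit in hits if hit[1] >= threshold),
        key=lambda hit: (-hit[1], hit[0]),  # ties broken alphabetically by ID
    )
```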

Analysis of key gene families
The SARfaris (Kinase SARfari [9], GPCR SARfari [15]) enable in-depth searching and analysis of target-class data. You can browse families, search for specific proteins or chemical (sub)structures, view binding sites, make comparisons with related sequences and retrieve 3D structural data.
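Comparing related sequences often starts from percent identity over an existing alignment, such as the reference alignment that Kinase SARfari provides for each kinase family. The following Python sketch assumes the sequences are already aligned (equal length, with '-' marking gaps). It excludes gapped columns from the count, which is only one of several common definitions of identity. The function names and any sequence fragments are hypothetical and are not part of the SARfari interface.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def rank_by_identity(query: str, family: dict[str, str]) -> list[tuple[str, float]]:
    """Rank aligned family members (name -> aligned sequence) by identity to the query."""
    scores = [(name, percent_identity(query, seq)) for name, seq in family.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))  # highest identity first
```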

Neglected Tropical Disease (NTD) archive
You can search data on neglected tropical diseases using the ChEMBL-NTD [16] interface. For instance, ChEMBL-NTD contains a data set of 13,500 compounds that are known to inhibit the growth of Plasmodium falciparum strain 3D7; P. falciparum is one of the deadliest parasites that cause malaria in humans. This data set was donated by GlaxoSmithKline [17] (GSK).

Figure 3 Searching for compound activity in the Neglected Tropical Disease archive