What is ChEMBL?

The human genome sequence provided a complete molecular ‘parts list’ for researchers interested in improving human health. A key task now is to catalogue how the gene products interact with drugs and drug-like molecules.

ChEMBL is a ‘chemogenomic’ database that brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs. The resource is therefore of interest to drug discovery researchers in large pharmaceutical companies as well as in smaller companies and academic institutions.

What data is included in ChEMBL?

ChEMBL is a manually curated, freely available database of bioactive molecules with drug-like properties. The database (Figure 1) is unique because of its focus on all aspects of drug discovery and its size, containing information on more than 1.8 million compounds and over 15 million records of their effects on biological systems.

ChEMBL includes information about how small molecules interact with their protein targets, how these compounds affect cells and whole organisms, and information on absorption, distribution, metabolism, excretion and toxicity (ADMET). ChEMBL holds two-dimensional structures, calculated molecular properties (e.g. logP, molecular weight, Lipinski ‘Rule of Five’ parameters) and bioactivity data (such as binding constants and pharmacology). The bioactivity data is tagged to show links between molecular targets and published assays.
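To make the Lipinski ‘Rule of Five’ parameters concrete, here is a minimal Python sketch of how a violation count can be derived from calculated properties like those listed above. The class, its field names and the two example compounds are hypothetical and are not ChEMBL records; the thresholds are Lipinski's published limits (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors).

```python
from dataclasses import dataclass


@dataclass
class CompoundProperties:
    """Hypothetical container for pre-calculated properties of one compound."""
    name: str
    mol_weight: float  # molecular weight in daltons
    alogp: float       # calculated logP (octanol/water partition coefficient)
    hbd: int           # number of hydrogen-bond donors
    hba: int           # number of hydrogen-bond acceptors


# Lipinski limits: a criterion is violated when the value exceeds its limit.
RO5_LIMITS = {"mol_weight": 500.0, "alogp": 5.0, "hbd": 5, "hba": 10}


def ro5_violations(props: CompoundProperties) -> int:
    """Count how many of the four Rule of Five criteria the compound breaks."""
    return sum(getattr(props, field) > limit for field, limit in RO5_LIMITS.items())


def passes_ro5(props: CompoundProperties) -> bool:
    """Lipinski's guideline tolerates at most one violation."""
    return ro5_violations(props) <= 1


# Usage with made-up property values (not real ChEMBL data):
small = CompoundProperties("hypothetical_a", mol_weight=350.0, alogp=2.5, hbd=2, hba=5)
large = CompoundProperties("hypothetical_b", mol_weight=620.0, alogp=6.1, hbd=3, hba=12)
print(ro5_violations(small), passes_ro5(small))  # 0 True
print(ro5_violations(large), passes_ro5(large))  # 3 False
```

Lipinski's rule flags a compound as more likely to show poor oral absorption or permeation when more than one criterion is violated, which is why `passes_ro5` accepts a single violation.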

The data consists primarily of structure-activity relationship (SAR) data, manually extracted and curated from the primary medicinal chemistry and pharmacology literature. In addition, ChEMBL contains data deposited by researchers and data extracted from other data sources.
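Because this SAR data is gathered from many publications, comparable measurements arrive in different forms. ChEMBL handles this with standardised activity fields and a pChEMBL value: the negative base-10 logarithm of a molar potency or affinity measurement (IC50, XC50, EC50, AC50, Ki, Kd or Potency). The Python function below is a simplified sketch of that idea, assuming values already standardised to nM; the function name and the example record tuples are hypothetical, and ChEMBL's own calculation applies further data-validity checks that are not modelled here.

```python
import math
from typing import Optional

# Activity types that receive a pChEMBL-style value in this sketch.
PCHEMBL_TYPES = {"IC50", "XC50", "EC50", "AC50", "Ki", "Kd", "Potency"}


def pchembl_value(standard_type: str, standard_relation: str,
                  standard_value_nm: Optional[float]) -> Optional[float]:
    """Simplified sketch: -log10 of a molar activity value, or None if ineligible."""
    # Only exact ("=") measurements of a dose-response type are converted.
    if standard_type not in PCHEMBL_TYPES or standard_relation != "=":
        return None
    # A missing or non-positive value has no logarithm.
    if standard_value_nm is None or standard_value_nm <= 0:
        return None
    # -log10(value_nM * 1e-9) simplifies to 9 - log10(value_nM).
    return round(9 - math.log10(standard_value_nm), 2)


# Made-up records in the shape (type, relation, value in nM):
records = [
    ("IC50", "=", 10.0),
    ("Ki", "=", 250.0),
    ("IC50", ">", 10000.0),
    ("Inhibition", "=", 45.0),
]
for record in records:
    print(record, pchembl_value(*record))
```

Putting activities on one logarithmic scale lets measurements from different papers be compared directly; a higher value means a more potent compound.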

Figure 1 Data included in ChEMBL.

Lastly, data on the clinical progress of compounds has been integrated into ChEMBL (Figure 2). Specifically, we have a highly curated set of “drug” molecules that includes marketed compounds and compounds that are, or have previously been, in clinical development. These compounds are annotated with their known therapeutic targets and their therapeutic indications.

Figure 2 Information on drug molecules included in ChEMBL.