What is ChEMBL?

Why do we need ChEMBL?

The human genome sequence provided a complete molecular ‘parts list’ for researchers interested in improving human health. A key task now is to catalogue how these gene products interact with drugs and drug-like molecules. ChEMBL is a ‘chemogenomic’ database that brings together chemical, bioactivity and genomic data to help translate genomic information into effective new drugs. The resource is therefore of interest to drug discovery researchers in large pharmaceutical companies, smaller companies and academic institutions alike.

About ChEMBL

ChEMBL is a publicly available database of drugs, drug-like small molecules and their targets. It stands out for its breadth, covering all aspects of drug discovery, and for its size: it contains information on more than 1.4 million compounds and over 12 million records of their effects on biological systems.

The database includes information about how small molecules bind to their targets, how these compounds affect cells and whole organisms, and how they behave in terms of absorption, distribution, metabolism, excretion and toxicity (ADMET). ChEMBL holds two-dimensional structures, calculated molecular properties (e.g. logP, molecular weight, Lipinski ‘Rule of Five’ parameters) and bioactivity data (such as binding constants and pharmacology). The bioactivity data is tagged to link published assays to their molecular targets, and each link carries a confidence level. Additional data on the clinical progress of compounds is being integrated into ChEMBL. The structure–activity relationship (SAR) data is manually extracted and curated from the primary medicinal chemistry and pharmacology literature.
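The Lipinski ‘Rule of Five’ parameters mentioned above reduce to four thresholds: molecular weight above 500 Da, calculated logP above 5, more than 5 hydrogen-bond donors and more than 10 hydrogen-bond acceptors. The following is a minimal sketch, assuming the four properties have already been calculated for a compound. The `MolecularProperties` record and the `ro5_violations`/`passes_ro5` helpers are hypothetical illustrations, not part of ChEMBL or any ChEMBL client library. The "at most one violation" cut-off is a common convention, not a ChEMBL-defined rule.

```python
from dataclasses import dataclass


@dataclass
class MolecularProperties:
    """Pre-calculated descriptors for one compound (hypothetical record layout)."""
    molecular_weight: float  # in Daltons
    logp: float              # calculated octanol/water partition coefficient
    hbd: int                 # number of hydrogen-bond donors
    hba: int                 # number of hydrogen-bond acceptors


def ro5_violations(props: MolecularProperties) -> int:
    """Count how many of Lipinski's four Rule of Five thresholds are exceeded."""
    rules = [
        props.molecular_weight > 500,
        props.logp > 5,
        props.hbd > 5,
        props.hba > 10,
    ]
    # True counts as 1 and False as 0, so the sum is the number of violations.
    return sum(rules)


def passes_ro5(props: MolecularProperties, max_violations: int = 1) -> bool:
    """Assumed convention: treat a compound as drug-like if it breaks at most one rule."""
    return ro5_violations(props) <= max_violations


if __name__ == "__main__":
    # Illustrative values only; these are not records taken from ChEMBL.
    small = MolecularProperties(molecular_weight=180.2, logp=1.2, hbd=1, hba=4)
    large = MolecularProperties(molecular_weight=850.0, logp=6.3, hbd=6, hba=14)
    print(ro5_violations(small), passes_ro5(small))  # 0 True
    print(ro5_violations(large), passes_ro5(large))  # 4 False
```

Because the thresholds are strict inequalities, a compound sitting exactly at a limit (for example, a molecular weight of exactly 500 Da) is not counted as a violation.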

ChEMBL data is also a core component of the SARfaris – integrated chemogenomics workbenches for drug discovery that focus on protein kinases (Kinase SARfari) and G protein-coupled receptors (GPCR SARfari). Each SARfari is a central resource for knowledge about its target class, combining biological and chemical information.

Protein kinases are key regulators of cell signalling and are therefore important candidates for drug discovery research. Kinase SARfari contains a reference alignment of each protein kinase family, three-dimensional structures, bound ligand conformations and binding sites. GPCRs are transmembrane receptors that mediate processes such as inflammation and neurotransmission and are implicated in many diseases. Both SARfaris contain SAR data and clinical candidate data extracted from ChEMBL.