ChEMBL data overview
What data does ChEMBL contain?
ChEMBL combines data extracted from the literature with deposited data sets from companies such as GSK and from public resources such as PubChem. The curated data comprise targets, organisms, compounds and their associated bioactivities (Figure 3).
Where does the data come from?
Most of the data are extracted from the literature, drawn from a selection of 47 journals. The most frequently used journals include the Journal of Medicinal Chemistry, Bioorganic and Medicinal Chemistry, and Bioorganic and Medicinal Chemistry Letters (Figure 4).
Extracted target types
All target types that are reported in the literature are stored in ChEMBL (Figure 5).
Figure 5 Target types in ChEMBL.
A subset of PubChem assays (confirmatory and panel assays with dose-response endpoints) has been loaded into ChEMBL. Assays from PubChem are clearly marked, both in the ChEMBL interface and in the database, so you can easily determine where data originated while still retrieving everything through a single point of access. Loading these assays added over 600,000 compounds and 7,000,000 bioactivities to the database (Figure 6).
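Because each assay carries a marker for where it came from, records can be separated by origin once they have been retrieved. The following is a minimal sketch of that idea only: the record layout, the field names `assay_id` and `source`, the source labels and the sample records are all hypothetical placeholders, not the actual ChEMBL schema or real identifiers.

```python
from collections import defaultdict

# Hypothetical assay records. The IDs, field names and source labels are
# illustrative assumptions, not real ChEMBL identifiers or column names.
ASSAYS = [
    {"assay_id": "A1", "source": "Scientific Literature"},
    {"assay_id": "A2", "source": "PubChem BioAssays"},
    {"assay_id": "A3", "source": "Scientific Literature"},
    {"assay_id": "A4", "source": "PubChem BioAssays"},
]


def group_by_source(assays):
    """Group assay IDs by the source label attached to each record."""
    grouped = defaultdict(list)
    for assay in assays:
        grouped[assay["source"]].append(assay["assay_id"])
    return dict(grouped)


def assays_from(assays, source):
    """Return the IDs of assays whose source label equals `source`."""
    return [a["assay_id"] for a in assays if a["source"] == source]
```

Calling `assays_from(ASSAYS, "PubChem BioAssays")` on the sample records picks out only the assays labelled as coming from PubChem, while `group_by_source(ASSAYS)` gives a per-source overview in one pass.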
As well as the PubChem data, we have also received depositions from companies and consortia, allowing us to expand the database further. One such deposition is the Neglected Tropical Disease (NTD) dataset, which was donated by companies such as GSK and Novartis (Figure 7).