

DrugEBIlity   Structure Based   Feature Based   Ligand Based    


  • What are the Druggability, Tractability and Ensemble scores?
  • What is SCOP?
  • How are the domains in the database classified?
  • I cannot find a PX number when I search the SCOP pages
  • How are structures mapped to full-length protein sequences?
  • How is average druggability/tractability calculated?

What are the Druggability, Tractability and Ensemble scores?

The Druggability score evaluates the suitability of a binding site for small molecules that obey Lipinski's Rule of 5 [Druggable=1/Undruggable=0]. The Tractability score is the same evaluation under more relaxed ligand criteria [Tractable=1/Intractable=0]. The Ensemble score is the average of the druggability scores calculated by several different models [ranging from Undruggable:-1.0 to Druggable:+1.0].

To calculate druggability, two decision trees were trained to classify binding pockets as Druggable=1/Undruggable=0 or Tractable=1/Intractable=0. The trees were trained on PDB structures known to bind ligands of suitable types. The following table summarises the ligand properties used to select the training set for each tree.

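Tree Version      | MW (Da)        | HBD | HBA  | CLogP | RotB | Others
Tractability tree | 200 < MW < 800 | ≤ 8 | ≤ 15 | ≤ 8   | ≤ 16 | No metal, sugar, carbohydrates
Druggability tree | 100 < MW < 550 | ≤ 5 | ≤ 10 | ≤ 5   | ≤ 10 | No metal, sugar, carbohydrates

MW: molecular weight; HBD: hydrogen bond donors; HBA: hydrogen bond acceptors; CLogP: calculated logP; RotB: rotatable bonds.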
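As a rough illustration, the ligand criteria in the table can be written as a simple filter. This is a minimal sketch, not the code used to build the training sets: the names `LigandCriteria` and `ligand_qualifies` are hypothetical, and the "Others" column is reduced to a single boolean flag that the caller must supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigandCriteria:
    """Ligand property limits for one tree (values taken from the table)."""
    mw_min: float     # exclusive lower bound on molecular weight
    mw_max: float     # exclusive upper bound on molecular weight
    hbd_max: int      # maximum hydrogen bond donors
    hba_max: int      # maximum hydrogen bond acceptors
    clogp_max: float  # maximum calculated logP
    rotb_max: int     # maximum rotatable bonds


TRACTABILITY = LigandCriteria(200, 800, 8, 15, 8, 16)
DRUGGABILITY = LigandCriteria(100, 550, 5, 10, 5, 10)


def ligand_qualifies(criteria, mw, hbd, hba, clogp, rotb, excluded_type=False):
    """Return True if a ligand meets every limit in `criteria`.

    `excluded_type` is an assumed flag for the "Others" column:
    set it to True for metals, sugars and carbohydrates.
    """
    return (
        not excluded_type
        and criteria.mw_min < mw < criteria.mw_max
        and hbd <= criteria.hbd_max
        and hba <= criteria.hba_max
        and clogp <= criteria.clogp_max
        and rotb <= criteria.rotb_max
    )
```

For example, a hypothetical ligand with MW 600, 6 donors, 12 acceptors, CLogP 6 and 12 rotatable bonds would satisfy the tractability criteria but not the druggability criteria.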

The Ensemble score was calculated by averaging several structure-based druggability predictions. Eleven individual prediction models (8 Decision Trees, 2 Support Vector Machines (SVM) and 1 Multilayer Perceptron (MLP)) were generated for this calculation and trained separately using 25 physico-chemical descriptors of the known ligand binding sites. The average of the outputs of the individual models is the Ensemble score, which ranges from Undruggable:-1.0 to Druggable:+1.0.
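The averaging step can be sketched as follows. This is only an illustration: it assumes that each of the 11 models reports a value between -1.0 (undruggable) and +1.0 (druggable), which the text implies but does not state, and the function name `ensemble_score` is hypothetical.

```python
def ensemble_score(model_outputs):
    """Average the individual model outputs into a single score.

    Assumes every output lies in [-1.0, +1.0], so the mean
    also lies in that range.
    """
    if not model_outputs:
        raise ValueError("at least one model output is required")
    for value in model_outputs:
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"model output out of range: {value}")
    return sum(model_outputs) / len(model_outputs)


# Hypothetical outputs for one binding site:
# 8 decision trees, 2 SVMs and 1 MLP.
trees = [1.0, 1.0, 1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
svms = [1.0, 1.0]
mlp = [-1.0]
score = ensemble_score(trees + svms + mlp)  # 8 votes for, 3 against: 5/11
```

A score near +1.0 means most models agree the site is druggable; a score near 0 means the models disagree.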

What is SCOP?

SCOP is the Structural Classification Of Proteins database, created and maintained by Chothia, Murzin et al. at the MRC LMB, Cambridge. It is a database of manually curated protein structural domains, organised into the established SCOP hierarchy, which has the following levels:
  1. Class: Major class determined mostly by secondary structure content of the domains, e.g. All-alpha, All-Beta, Alpha+Beta etc.
  2. Fold: Domains in the same fold share core secondary structures in the same topology. They may or may not be evolutionarily related.
  3. Superfamily: Domains in the same superfamily share major secondary structures in the same topology, often have related functions, and are almost certainly evolutionarily related even though they may lack any obvious sequence similarity.
  4. Family: Domains in the same family are evolutionarily related, often have a sequence similarity > 30% and have the same or highly similar functions.
See the SCOP Home Page for details and references.
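The four levels can be illustrated with SCOP's concise classification string (sccs), a dotted code such as a.1.1.2 whose parts are class, fold, superfamily and family. The sccs notation is not mentioned in the text above, so treat this as a sketch under that assumption; `parse_sccs` and `deepest_shared_level` are hypothetical helper names.

```python
from typing import NamedTuple, Optional

LEVELS = ("class", "fold", "superfamily", "family")


class ScopNode(NamedTuple):
    scop_class: str   # e.g. "a"
    fold: str         # e.g. "a.1"
    superfamily: str  # e.g. "a.1.1"
    family: str       # e.g. "a.1.1.2"


def parse_sccs(sccs: str) -> ScopNode:
    """Split an sccs string into its four cumulative hierarchy levels."""
    parts = sccs.split(".")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    prefixes = [".".join(parts[: i + 1]) for i in range(4)]
    return ScopNode(*prefixes)


def deepest_shared_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Return the most specific level two domains share, or None."""
    shared = None
    for level, a, b in zip(LEVELS, parse_sccs(sccs_a), parse_sccs(sccs_b)):
        if a != b:
            break
        shared = level
    return shared
```

Two domains that share only a fold may or may not be evolutionarily related, whereas a shared superfamily or family implies relatedness, as described in the list above.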

How are the domains in the database classified?

As SCOP is a manually curated database, it tends to lag behind PDB releases. To keep the structural classification up to date, we maintain a relational database of domain classifications that extends the SCOP classification to PDB structures not yet classified by SCOP. This is done using a number of sources and techniques, including ASTEROIDS and manual curation.

I cannot find a PX number when I search the SCOP pages

PDB domains that are classified within our system but not yet classified by SCOP are assigned a PX domain identifier by this database. These identifiers cannot be found in SCOP because SCOP has not yet classified the corresponding structures.

How are structures mapped to full-length protein sequences?

A sequence similarity-based pipeline has been implemented to map PDB structures to their parent protein(s). It compares the SEQRES sequence, and a consensus of the SEQRES and ATOM record sequences, against a database of highly curated proteins (SwissProt and TrEMBL). The pipeline also uses source organism matching and database cross-references.
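The idea of combining sequence similarity with source organism matching can be sketched as below. This is a toy illustration only: the alignment method used by the actual pipeline is not described here, so Python's `difflib.SequenceMatcher` stands in for it, and the function names, the coverage threshold and the example accessions and sequences are all hypothetical.

```python
from difflib import SequenceMatcher


def coverage(structure_seq, protein_seq):
    """Fraction of the structure sequence found in matching blocks
    of the candidate parent protein (1.0 = fully contained)."""
    if not structure_seq:
        return 0.0
    matcher = SequenceMatcher(None, structure_seq, protein_seq, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(structure_seq)


def best_parent(structure_seq, organism, candidates, min_coverage=0.9):
    """Pick the candidate protein from the same organism with the
    highest coverage, or None if none reaches `min_coverage`.

    `candidates` is an iterable of (accession, organism, sequence).
    """
    best = None
    for accession, cand_organism, cand_seq in candidates:
        if cand_organism != organism:
            continue  # source organism must match
        score = coverage(structure_seq, cand_seq)
        if score >= min_coverage and (best is None or score > best[1]):
            best = (accession, score)
    return best


# Made-up data: the structure covers a fragment of ACC_A.
candidates = [
    ("ACC_A", "Homo sapiens", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("ACC_B", "Homo sapiens", "GGGGGGGGGGGG"),
    ("ACC_C", "Mus musculus", "MKTAYIAKQRQISFVKSHFSRQ"),
]
match = best_parent("AYIAKQRQISFV", "Homo sapiens", candidates)
```

Measuring coverage of the structure sequence, rather than overall similarity, reflects the fact that a crystallised construct is often only a fragment of the full-length protein.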

How is average druggability/tractability calculated?

The average druggability is calculated over all the structures of a particular domain of a particular protein. The average tractability is calculated in the same way, using the Tractable/Intractable classification. For example, Human SRC kinase (protein accession P12931) has three structurally determined domains: an SH3 domain, an SH2 domain and a Kinase Catalytic domain. There are 4 structures available for the Kinase Catalytic domain, all of which are druggable.

Hence the average druggability for the Kinase Catalytic domain =

[Number of druggable domain structures] / [Number of domain structures]
= 4/4
= 1.0
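The same calculation can be written as a short function. This is a minimal sketch in which each structure of the domain is represented by its 0/1 druggability call; the function name `average_druggability` is hypothetical.

```python
def average_druggability(calls):
    """Fraction of a domain's structures that are classed as druggable.

    `calls` holds one value per structure: 1 = druggable, 0 = undruggable.
    """
    if not calls:
        raise ValueError("the domain has no structures")
    if any(call not in (0, 1) for call in calls):
        raise ValueError("each call must be 0 or 1")
    return sum(calls) / len(calls)


# The Kinase Catalytic domain example: 4 structures, all druggable.
kinase_average = average_druggability([1, 1, 1, 1])
```

If only two of the four structures were druggable, the average would be 2/4 = 0.5.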