Kinase SARfari

What is Kinase SARfari?

Kinase SARfari is an integrated chemogenomics research and discovery workbench for Protein Kinases. It is both biology and chemistry aware and provides a central resource for Protein Kinase knowledge.

Goals

Kinase SARfari was designed and implemented with the following goals in mind:

  • To integrate target, 3D structure, compound and screening data for Protein Kinases in one Chemogenomics knowledgebase
  • To provide a unique view on Kinases and Kinase chemistry by leveraging both public and proprietary data
  • To enable direct and easy linking of chemical and biological spaces by making all Kinase SARfari data accessible via:
    1. Compound-similarity and substructure searching
    2. Target keyword and sequence similarity searching
    3. Providing compound and screening data through target initiated queries
    4. Providing target and screening data through compound initiated queries
  • To enable selectivity and cross-reactivity analysis for Protein Kinase targets
  • To provide user-friendly access to the data via a web-based application, utilising familiar search tools such as chemical structure sketching and BLAST.
  • To integrate with internal tools and resources such as Spotfire

System

Kinase SARfari is an Oracle 11g database equipped with the Symyx/MDL Direct chemical cartridge (version 6.2) and is provided with a web-based interface.

Data in Kinase SARfari

Kinase SARfari contains protein, compound and screening data from a number of diverse sources. This section outlines the sources of the major data components in the database.

Kinase Sequences and Sequence Alignment

Kinase SARfari contains >950 Protein Kinase domain sequences, covering all parts of the Kinome. Most of the sequences are human kinases, including polymorphisms and splice variants, but the set also includes a number of orthologous sequences. Since the sequence set contains many versions and isoforms of the same genes, they are clustered into approximately 600 groups, each representing one sequence from the Kinome.

ChEMBL Data

ChEMBL is a curated database of SAR data from the literature. Compound structures and SAR tables were abstracted from the medicinal chemistry literature and curated by a team of experts. Kinase SARfari contains all compounds that have been screened against a protein kinase, the SAR data for the compound/kinase assays, and all other experimental data reported for these compounds, including ADMET data, functional data and screening data against non-kinases. As part of the curation, most data points were normalised to allow easier comparison of data from different assays (e.g. all IC50 and Ki data are converted to nM values). Publication references are provided for all data. ChEMBL currently contains data from the following journals:

  • Journal of Medicinal Chemistry: 1980 onwards
  • Bioorganic & Medicinal Chemistry Letters: 1990 onwards
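
The normalisation of IC50 and Ki values to nM can be pictured as a simple unit conversion. The sketch below is illustrative only: the unit labels and the to_nanomolar helper are hypothetical and are not taken from ChEMBL's actual curation pipeline, which also has to handle many non-concentration activity types.

```python
# Hypothetical factors for converting concentration units to nanomolar (nM).
# The unit labels are illustrative, not ChEMBL's actual unit vocabulary.
TO_NANOMOLAR = {
    "M": 1e9,
    "mM": 1e6,
    "uM": 1e3,
    "nM": 1.0,
    "pM": 1e-3,
}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration-type activity value (e.g. IC50, Ki) to nM."""
    try:
        factor = TO_NANOMOLAR[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return value * factor


print(to_nanomolar(2.5, "uM"))  # 2500.0
```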

The data in ChEMBL are classified into three major classes:

Biochemical Data (B) | Data measuring binding of a compound to a molecular target, e.g. Ki, IC50, Kd
Functional Data (F) | Data measuring the biological effect of a compound, e.g. % cell death in a cell line, rat weight
ADMET Data (A) | ADME and toxicity data, e.g. t1/2, oral bioavailability, LD50

Targets in ChEMBL

As part of the data curation for ChEMBL, the protein molecular targets associated with each assay were identified and their amino-acid sequences captured. For the Protein Kinase subset of the data, the Protein Kinase sequences were appended to the master alignment. Kinase compounds from ChEMBL may also have been screened against non-kinase targets; these targets are provided independently within the database.

Compounds in ChEMBL

ChEMBL compounds in Kinase SARfari are of interest for two major reasons: firstly, they provide an overview of the screening space for Protein Kinases in the medicinal chemistry literature, and secondly, they are all bioactive compounds that have been synthesized, and their synthesis and assay routes are directly traceable. This makes them a very high-value first-pass screening library.

Drugs: FDA approved drugs

FDA-approved Protein Kinase inhibitors are reported in Kinase SARfari, together with their trade names and chemical structures.

3D structures from the PDB

A snapshot of Protein Kinase experimental structures from the RCSB Protein Data Bank is provided and analyzed in Kinase SARfari. All Protein Kinase domains are extracted, together with bound ligands where available, and mapped onto the 'parent' Protein Kinase sequence in the master alignment.

Structural Superposition

For every snapshot of the PDB provided within Kinase SARfari, a complete all-by-all comparison of the Protein Kinase catalytic domain structures is calculated. This enables analysis and visualization of conformational variation within the Protein Kinase family.
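
An all-by-all comparison of this kind can be sketched as pairwise superposition of equivalent atoms followed by an RMSD, giving a symmetric n x n matrix for n structures. This guide does not state which superposition method Kinase SARfari uses; the sketch below assumes the Kabsch algorithm applied to C-alpha coordinates that have already been paired up through the master alignment, and the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Row i of P and row i of Q must be equivalent atoms (assumed here to be
    C-alpha atoms paired through the master alignment).
    """
    # Remove translation by centring each set on its centroid.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation (Kabsch).
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def all_by_all_rmsd(structures: list) -> np.ndarray:
    """Symmetric matrix of pairwise RMSDs for a list of (N, 3) arrays."""
    n = len(structures)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = kabsch_rmsd(structures[i], structures[j])
    return matrix


# Usage: a rotated and translated copy should superimpose with RMSD near 0.
rng = np.random.default_rng(0)
a = rng.normal(size=(20, 3))
t = 0.7
rot_z = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t), np.cos(t), 0.0],
                  [0.0, 0.0, 1.0]])
b = a @ rot_z.T + np.array([1.0, -2.0, 3.0])
c = a + rng.normal(scale=0.5, size=a.shape)
print(all_by_all_rmsd([a, b, c]).round(3))
```

Only rigid-body differences are removed by the superposition, so the remaining RMSD between two kinase domains reflects genuine conformational variation (for the toy data above, only the noisy copy c differs).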

Family Browsing

A phylogenetic tree of the target clusters was produced using the master kinase sequence alignment. The tree is provided to guide target navigation. Clicking on a group within the tree displays a table detailing the families and subfamilies of targets within that group. Selection can then be made from these tables by clicking on the relevant rows.

Family Analysis Column Explanations

Column Name | Explanation
+ | Click '+' to add the family to the User Selection table
Level 3, Level 4 | Kinome classification, where Level 4 is the most detailed level in the classification hierarchy
Domains | Number of kinase protein domains found at Level 4 of the kinase classification hierarchy
ChEMBL Targets | Number of kinase protein domains found at Level 4 of the kinase classification hierarchy that are associated with ChEMBL compounds
Drug Targets | Number of kinase protein domains found at Level 4 of the kinase classification hierarchy that are associated with FDA approved compounds

User Selection Column Explanations

Column Name | Explanation
x | Click 'x' to remove the family from the User Selection table
Level 2, Level 3, Level 4 | Kinome classification, where Level 4 is the most detailed level in the classification hierarchy
Domains | Number of kinase protein domains found at Level 4 of the kinase classification hierarchy
ChEMBL Targets | Number of kinase protein domains found at Level 4 of the kinase classification hierarchy that are associated with ChEMBL compounds
Drug Targets | Number of kinase protein domains found at Level 4 of the kinase classification hierarchy that are associated with FDA approved compounds

Protein Report Card

The target report is a one-page portal that collates information about a target and provides a single point for linking to all data associated with the target, including summary data, screening data, compounds, etc. An example report card is displayed below.

Keyword Search

Perform a text-based search against the names, accessions, synonyms and descriptions of all targets (using the 'All fields' option). The following workflow explains how to perform a text-based search:

When entering search terms, note that each new line represents a new search term. For example, "map kinase" on one line produces a more restrictive search than "map" and "kinase" on separate lines.
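
To make the newline rule concrete, the sketch below assumes that each line is treated as one phrase, matched case-insensitively as a substring, and that a target is returned if any line matches. These matching rules, the helper names and the example descriptions are assumptions for illustration; the actual search implementation is not described in this guide.

```python
def parse_terms(query: str) -> list:
    """Split a multi-line query into search terms: one per non-blank line."""
    return [line.strip().lower() for line in query.splitlines() if line.strip()]


def matches(record_text: str, query: str) -> bool:
    """True if any query line occurs, as a whole phrase, in record_text.

    Assumed rules (not the documented implementation): lines are OR-ed
    together and matching is case-insensitive substring matching.
    """
    text = record_text.lower()
    return any(term in text for term in parse_terms(query))


# Hypothetical target descriptions.
records = [
    "MAP kinase p38 alpha",
    "MAP2K1 dual specificity mitogen-activated protein kinase kinase 1",
    "Casein kinase II subunit alpha",
    "Tubulin alpha chain",
]

one_line = [r for r in records if matches(r, "map kinase")]
two_lines = [r for r in records if matches(r, "map\nkinase")]
print(len(one_line), len(two_lines))  # 1 3
```

Under these assumptions the single-line phrase matches only one description, while the two separate lines match any description containing either word.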

BLAST Search

Full BLAST searching is available against all kinase domains. The following workflow explains how to perform a BLAST search:

Protein Keyword Search Results Column Explanations

Column Name | Explanation
(check box) | Select domains to carry forward to another part of the system. To select all domains, check the box in the table header.
Name | Short name for kinase protein domain
Accession | UniProt accessions
Organism | Kinase protein domain organism
Level 2, Level 3, Level 4 | Kinome classification, where Level 4 is the most detailed level in the classification hierarchy
Drugs | Number of FDA approved compounds associated with kinase protein domain
Bioactivities | Number of bioactivity data points associated with kinase domain
Compounds | Number of compounds associated with kinase protein domain
Query Region (BLAST only) | Amino-acid region of the query sequence matched in the BLAST search
Target Region (BLAST only) | Amino-acid region of the Kinase SARfari sequence matched in the BLAST search
Identity (BLAST only) | % sequence identity as reported by BLAST
Aln (BLAST only) | Click on 'Aln' to see the BLAST alignment between your query and the target

Protein BLAST Search Results Column Explanations

Columns duplicated between Protein Keyword Search Results and Protein BLAST Search Results are not listed below.

Column Name | Explanation
Query Region | Amino-acid region of the query sequence matched in the BLAST search
Target Region | Amino-acid region of the Kinase SARfari sequence matched in the BLAST search
Identity | % sequence identity as reported by BLAST
Aln | Click on 'Aln' to see the BLAST alignment between your query and the target
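
The Identity value can be related to the alignment shown under 'Aln'. The sketch below assumes the usual BLAST convention of dividing the number of identical positions by the total alignment length, with gap columns counted in that length; percent_identity is a hypothetical helper and the fragments are toy sequences, not database entries.

```python
def percent_identity(aligned_query: str, aligned_target: str) -> float:
    """Percent identity of two gapped sequences from the same alignment.

    Identical, non-gap columns are divided by the full alignment length
    (gap columns included).
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("empty alignment")
    identical = sum(
        1 for q, t in zip(aligned_query, aligned_target) if q == t and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


# Toy fragments (illustrative, not real entries): 6 identical of 8 columns.
print(percent_identity("GKGSFGKV", "GRGSFG-V"))  # 75.0
```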

Binding Site Definitions

One of the uses of Kinase SARfari is to compare kinases based on the physicochemical properties of their binding sites, rather than their overall sequence homology. The original goals of this work are listed below:

  • To identify a set of alternative binding site definitions using empirical information derived from public Kinase/Ligand complexes
  • To map these definitions onto the complete master alignment of all Protein Kinases in Kinase SARfari
  • To apply these definitions for comparison and clustering of kinases based on binding-site physicochemical properties

A total of 505 Kinase/Ligand complexes from the PDB were independently assessed. For each complex, a profile of residue contribution to ligand binding was generated, recording whether each residue contributes to binding of the ligand and how much surface area is involved. The figure below shows the profiles for all 505 complexes: the more a residue contributes to the surface of interaction with the ligands, the darker the blue representing this residue.

Based on these binding site profiles, the complexes were clustered into 11 footprints using hierarchical clustering. The figure below shows the number of complexes adopting each of the 11 footprints. The 'canonical' cluster is large, containing 443 complexes, and represents a general definition of the canonical binding site. In a more detailed expert analysis, it can be split into a number of more specific sub-clusters.
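
The footprint clustering can be sketched as agglomerative clustering of the per-complex residue-contribution profiles. The linkage method and distance measure used for Kinase SARfari are not stated in this guide; the sketch below assumes Euclidean distance and average linkage, stops at a requested number of clusters, and uses toy profiles rather than real complex data.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Agglomerative (average-linkage) clustering of residue profiles.

    profiles:   list of equal-length numeric vectors, one per complex
    n_clusters: number of clusters (footprints) to stop at
    Returns a list of clusters, each a sorted list of profile indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            # Mean distance over all cross-cluster pairs of profiles.
            d = sum(euclidean(profiles[a], profiles[b]) for a in ci for b in cj)
            d /= len(ci) * len(cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy profiles: per-position contribution to ligand contact (0 = none).
profiles = [
    [1.0, 1.0, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [1.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.1, 0.0, 0.9, 1.0],
]
print(average_linkage(profiles, 2))  # [[0, 1, 2], [3, 4]]
```

This naive version recomputes all cluster distances at every merge, which is fine for a few hundred complexes but would need a proper linkage library for larger sets.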

The table below provides a description of each binding site.

Binding Site Name | Binding Site Definition
Canonical binding site | Represents the canonical binding footprint adopted by ATP and most 'typical' ATP-competitive inhibitors
Gleevec-like binding site | Represents the non-canonical binding footprint adopted by the drug Imatinib in complex with its target ABL kinase (see for example 1IEP). This footprint is also adopted by some P38-alpha inhibitors (see for example 1KV2)
Non-canonical P38-alpha binding site | Represents a unique footprint so far observed in only two P38-alpha complexes: P38-alpha with a dihydroquinolinone (1OVE) and with a pyridazine inhibitor (1YQJ)
MEK2-like non-competitive inhibitor binding site | Represents the non-ATP-competitive binding footprint adopted by non-competitive inhibitors that bind alongside ATP (see e.g. 1S9J)
PKA (complexed with Rho-Kinase Inhibitors and peptides)-like binding site | Represents the binding footprint adopted by small-molecule inhibitors and the PKI-alpha peptide (see e.g. 1Q8T and 1SVG)

Analyse Site Similarity and Neighbourhood

Neighbourhood Density (ND) is a measure that reflects the physicochemical space of a protein binding site compared with other proteins: if many proteins have binding sites highly similar to that of a particular target, this target will have a high Neighbourhood Density and is thus more likely to show cross-reactivity with other targets.
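
As an illustration of the idea, the sketch below scores each kinase by how many other kinases lie within a fixed binding-site distance of it. This is a hypothetical definition: the actual Neighbourhood Density formula, the cut-off, the nd_scores helper and the distances are all assumptions made up for the example.

```python
from itertools import combinations


def nd_scores(names, distance, radius):
    """Hypothetical Neighbourhood Density: for each kinase, count how many
    other kinases have a binding-site distance within `radius`.

    names:    kinase identifiers
    distance: function (a, b) -> symmetric binding-site distance
    radius:   neighbourhood cut-off (an assumed parameter)
    """
    scores = {name: 0 for name in names}
    for a, b in combinations(names, 2):
        if distance(a, b) <= radius:
            scores[a] += 1
            scores[b] += 1
    return scores


# Made-up pairwise site distances for four hypothetical kinases.
pairs = {
    ("K1", "K2"): 0.10, ("K1", "K3"): 0.20, ("K1", "K4"): 0.90,
    ("K2", "K3"): 0.15, ("K2", "K4"): 0.80, ("K3", "K4"): 0.95,
}


def lookup(a, b):
    """Symmetric lookup into the pairs table."""
    return pairs[(a, b)] if (a, b) in pairs else pairs[(b, a)]


print(nd_scores(["K1", "K2", "K3", "K4"], lookup, radius=0.3))
# {'K1': 2, 'K2': 2, 'K3': 2, 'K4': 0}
```

In this toy example K1 to K3 sit in a crowded neighbourhood and would be flagged as higher selectivity risks, while K4 has no close neighbours.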

When entering search terms, note that each new line represents a new search term. For example, "map kinase" on one line produces a more restrictive search than "map" and "kinase" on separate lines.

Binding Site Search Results Column Explanations

Column Name | Explanation
(check box) | Select domains to carry forward to another part of the system. To select all domains, check the box in the table header.
Name | Short name for kinase protein domain
Organism | Kinase protein domain organism
Level 2, Level 3, Level 4 | Kinome classification, where Level 4 is the most detailed level in the classification hierarchy
Approved Drugs | Number of FDA approved compounds associated with kinase protein domain
Bioactivity | Number of bioactivity data points associated with kinase domain
Compounds | Number of compounds associated with kinase protein domain
ND Score | Neighbourhood Density (ND) Score is an estimated measure of the likely inherent selectivity risk associated with a kinase, calculated from binding-site distances. The higher the ND score, the more likely a kinase is to have selectivity issues.
Pairwise Distances | Click on 'View' to see the pairwise binding-site distances between this kinase and other kinases

Pairwise Distance Search Results Column Explanations

Columns duplicated between Binding Site Search Results and Pairwise Distance Search Results are not listed below.

Column Name | Explanation
Distance | A physicochemical distance used to compare the binding sites of any two kinases, based on an internal physicochemical scoring matrix and the binding site definition
Site Alignment | Click 'View' to open the pairwise binding site alignment

Pairwise Alignment Column Explanations

Column Name | Explanation
Position | Position in master kinase sequence alignment
Weight | Significance of binding site position
Target 1 | Name of kinase domain target 1
Residue 1 | Residue of kinase domain target 1 found at this position in master kinase sequence alignment
Distance | Physicochemical distance between residues from target 1 and target 2 at this position in master kinase sequence alignment
Residue 2 | Residue of kinase domain target 2 found at this position in master kinase sequence alignment
Target 2 | Name of kinase domain target 2
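
The per-position columns suggest how a site-level Distance can be assembled: a weighted combination of residue-level distances over the binding-site positions. The internal scoring matrix is not given in this guide, so the sketch below substitutes absolute differences in Kyte-Doolittle hydropathy as a stand-in and takes a weighted mean; both choices, the helper names and the three-position site are assumptions.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in for the
# internal physicochemical scoring matrix, which is not given in this guide.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def residue_distance(r1: str, r2: str) -> float:
    """Per-position distance between two residues (stand-in metric)."""
    return abs(KD[r1] - KD[r2])


def site_distance(site1: dict, site2: dict, weights: dict) -> float:
    """Weighted mean residue distance over binding-site positions.

    site1, site2: master-alignment position -> residue at that position
    weights:      master-alignment position -> weight of that site position
    """
    total = sum(weights.values())
    weighted = sum(w * residue_distance(site1[p], site2[p]) for p, w in weights.items())
    return weighted / total


# Hypothetical three-position binding site.
weights = {10: 2.0, 20: 1.0, 30: 1.0}
kinase_1 = {10: "L", 20: "K", 30: "E"}
kinase_2 = {10: "L", 20: "R", 30: "D"}
# One row per position: Position, Weight, Residue 1, Distance, Residue 2.
for pos, w in weights.items():
    r1, r2 = kinase_1[pos], kinase_2[pos]
    print(pos, w, r1, round(residue_distance(r1, r2), 2), r2)
print(round(site_distance(kinase_1, kinase_2, weights), 3))  # 0.15
```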

User Guide: 3D Structure Pages

3D Structure Search

This section enables 3D structures of kinases to be retrieved, compared and viewed. The structures can be retrieved using the kinase name or specific structure codes.

Structure Search Results Column Explanations

Column Name | Explanation
Ref | Select one reference (Ref) kinase domain structure using the radio buttons. Use the drop-down menu above the table to view or download the structure.
Mob | Optionally select one or more mobile (Mob) kinase domain structures. Each mobile structure will be rotated and superimposed onto the reference structure. Use the drop-down menu above the table to view or download the structural superposition.
Px | Internal structural domain identifier
Code | PDB code and chain
Ligand | Compound ligands found in the structure of the kinase domain
Resolution (Å) | Resolution is a measure of the quality of the data collected on the crystal containing the protein; lower values indicate higher-quality data.
R-Value | R-value is a measure of the quality of the atomic model obtained from the crystallographic data; lower values indicate better agreement between model and data.
Name | Short name for kinase protein domain
Drugs | Number of FDA approved compounds associated with kinase protein domain
Bioactivities | Number of bioactivity data points associated with kinase domain
Compounds | Number of compounds associated with kinase protein domain

Protein Target Search

The protein target search allows a user to apply filters, using 'smart' menus, to the target-based bioactivity search. This is demonstrated in the workflow below:

Protein Target and Compound Search

It is possible to supply a list of targets and a list of compounds to find experimental activities between them. The following workflow explains how to do this:

Compound Search

Activities associated with a set of compounds can be returned. This is demonstrated in the workflow below:

Bioactivity Search Results Column Explanations

Column Name | Explanation
Activity ID | Internal activity identifier
Domain Name | Unique kinase protein domain identifier
Reg. No. | Compound registration number
Activity Type | Type of end-point measurement, e.g. IC50, LD50, % inhibition
Relation | Symbol constraining the activity value (e.g. >, <, =)
Value | Activity data point
Unit | Units of measurement
Comment | Additional comments about the activity
Details | Click 'Details' to open the underlying published data associated with the activity
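
Results from the target, target-and-compound and compound searches share the columns above. The sketch below models such rows and shows two common follow-up steps: restricting activities to a chosen set of targets and compounds, and converting nM potencies to the negative log molar scale (e.g. pIC50 = 9 - log10(IC50 in nM)). The Activity record, the helper names and the example identifiers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Activity:
    """One bioactivity row; field names mirror the columns above (hypothetical)."""
    activity_id: int
    domain_name: str
    reg_no: str
    activity_type: str
    relation: str
    value: float
    unit: str


def activities_between(acts, targets, compounds):
    """Activities measured between any of `targets` and any of `compounds`."""
    targets, compounds = set(targets), set(compounds)
    return [a for a in acts if a.domain_name in targets and a.reg_no in compounds]


def p_activity(act):
    """Convert an nM potency (e.g. IC50, Ki) to -log10(molar): 10 nM -> 8.0."""
    if act.unit != "nM" or act.value <= 0:
        raise ValueError("expects a positive, nM-normalised value")
    return 9.0 - math.log10(act.value)


def exact_potent(acts, types=("IC50", "Ki"), max_nm=100.0):
    """Keep exact ('=') nM measurements of the given types at or below max_nm.

    Censored values ('>' or '<') are skipped for simplicity, even though a
    '<' value below max_nm is also potent.
    """
    return [a for a in acts
            if a.relation == "=" and a.unit == "nM"
            and a.activity_type in types and a.value <= max_nm]


acts = [
    Activity(1, "KIN_A", "C1", "IC50", "=", 10.0, "nM"),
    Activity(2, "KIN_A", "C2", "IC50", ">", 10000.0, "nM"),
    Activity(3, "KIN_B", "C1", "Ki", "=", 250.0, "nM"),
    Activity(4, "KIN_C", "C1", "%inhibition", "=", 85.0, "%"),
]
subset = activities_between(acts, ["KIN_A", "KIN_B"], ["C1"])
print([x.activity_id for x in subset])                # [1, 3]
print([x.activity_id for x in exact_potent(subset)])  # [1]
print(p_activity(acts[0]))                            # 8.0
```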

Compound Report Card

The compound report is a one-page portal that collates information about a compound and provides a single point for linking to all data associated with the compound, including properties, representations, bioactivity data, etc. An example report card is displayed below:

Structure Search

The user can perform compound structure searches. The following workflow explains how a user can draw a chemical structure and return it to the webpage:

Once the structure has been drawn, a structure-based search can be initiated using the following workflow:

Text Search

It is possible to retrieve compounds by searching for their synonyms, names or identifiers. The following workflow explains how to perform a text-based search:

When entering search terms, note that each new line represents a new search term. For example, "map kinase" on one line produces a more restrictive search than "map" and "kinase" on separate lines.

Selected Compound Sets

A number of pre-selected compound sets can be retrieved. These currently are Approved Drug compounds (FDA approved kinase inhibitors), PDB Ligands and Clinical Candidate compounds.

Compound Search Results Column Explanations

Column Name | Explanation
Compound Structure (Table View) | Image of compound
Reg. No. | Compound registration number
Similarity (Similarity Search) | Tanimoto similarity score between query and target structures
Mol Weight | Molecular weight of compound
Synonyms | List of compound synonyms
ALogP | Calculated logP (ALogP) of compound
PSA | Polar Surface Area
HBA | Number of Hydrogen Bond Acceptors
HBD | Number of Hydrogen Bond Donors
#Ro5 Vio. | Number of Rule-of-Five violations
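
The Similarity and #Ro5 Vio. columns can be illustrated with two small helpers. The fingerprint used by the chemical cartridge is not specified in this guide, so the sketch assumes fingerprints are already available as sets of 'on' bits. It also assumes the standard Lipinski criteria (molecular weight > 500, logP > 5, more than 5 hydrogen bond donors, more than 10 acceptors), with ALogP standing in for logP. Function names, registration numbers and property values are hypothetical.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for sets of 'on' bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def similarity_search(query_fp, library, threshold=0.7):
    """(reg_no, score) pairs at or above threshold, most similar first."""
    hits = [(reg_no, tanimoto(query_fp, fp)) for reg_no, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


def ro5_violations(mol_weight, alogp, hbd, hba):
    """Count Lipinski Rule-of-Five violations (True counts as 1)."""
    return sum([mol_weight > 500, alogp > 5, hbd > 5, hba > 10])


query = {1, 2, 3, 4}
library = {"C1": {1, 2, 3, 4}, "C2": {1, 2, 3, 5}, "C3": {7, 8}}
print(similarity_search(query, library, threshold=0.5))  # [('C1', 1.0), ('C2', 0.6)]
print(ro5_violations(350.4, 3.2, 2, 6))    # 0
print(ro5_violations(620.0, 5.8, 3, 11))   # 3
```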

Frequently Asked Questions


Q: What browser does Kinase SARfari run on?

Kinase SARfari should work on most recent browsers. However, it has primarily been developed and tested on Internet Explorer 6.x/7.x, Mozilla Firefox 2.x/3.x and Safari 4.x.

Some users have reported issues with the compound structure applet when using Mac OSX 10.6 (Snow Leopard). We are currently looking into this issue.


Q: What plug-ins or other software do I need installed on my computer to use the Kinase SARfari interface fully?

The compound and 3D structure viewers are applets that require Java 1.5, with the Java plug-in installed in the browser.

Please enable cookies in your browser as these are used by the site, for example storing user drawn compound structures.


Q: Who do I contact if I have a question regarding data in the Kinase SARfari system?

If you have a data content related question please contact: sarfari-help


Q: Who do I contact if I want to report a bug in the Kinase SARfari system?

If you experience any problems or identify a bug please contact: sarfari-help
