GPCR SARfari

What is GPCR SARfari?

GPCR SARfari is an integrated chemogenomics research and discovery workbench for Class A G Protein Coupled Receptors. It is both biology and chemistry aware and provides a central resource for GPCR knowledge.

Goals

GPCR SARfari was designed and implemented with the following goals in mind:

  • To integrate target, 3D structure, compound and screening data for GPCRs in one Chemogenomics knowledgebase
  • To provide a unique view on GPCRs and GPCR chemistry by leveraging both public and proprietary data
  • To enable direct and easy linking of chemical and biological spaces by making all GPCR data accessible via:
    1. Compound-similarity and substructure searching
    2. Target keyword and sequence similarity searching
    3. Providing compound and screening data through target initiated queries
    4. Providing target and screening data through compound initiated queries
    5. Selectivity and cross-reactivity analysis for GPCR targets
  • To provide user-friendly access to the data via a web-based application, utilising familiar search tools such as chemical structure sketching and BLAST
  • To integrate with internal tools and resources such as Spotfire

System

GPCR SARfari is an Oracle 11g database equipped with the Symyx/MDL Direct chemical cartridge (version 6.2) and is provided with a web-based interface.

Data in GPCR SARfari

GPCR SARfari contains protein, compound and screening data from a number of diverse sources. This section outlines the sources of the major data components in the database.

GPCR Sequences and Sequence Alignment

GPCR SARfari contains >900 GPCR domain sequences. Most of the sequences are human GPCRs, including polymorphisms and splice variants, but the set also includes a number of orthologous sequences. Since the sequence set contains many versions and isoforms of the same genes, they are clustered into approximately 600 groups.

ChEMBL Data

ChEMBL is a curated database of SAR data from the literature. Compound structures and SAR tables were abstracted from the medicinal chemistry literature and curated by a team of experts. GPCR SARfari contains all compounds which have been screened against a GPCR, the SAR data for the compound/GPCR assays, and all other reported experimental data for these compounds, including ADMET data, functional data and screening data against non-GPCRs. As part of the curation, most data points were normalised to allow easier comparison of data from different assays (e.g. all IC50 and Ki data are converted to nM values). Publication references are provided for all data. ChEMBL currently contains data from the following journals:

  • Journal of Medicinal Chemistry: 1980 onwards
  • Bioorganic & Medicinal Chemistry Letters: 1990 onwards

The data in ChEMBL is classified into three major classes:

Biochemical Data (B) | Data measuring binding of a compound to a molecular target, e.g. Ki, IC50, Kd
Functional Data (F) | Data measuring the biological effect of a compound, e.g. % cell death in a cell line, rat weight
ADMET Data (A) | ADME and Tox data, e.g. t1/2, oral bioavailability, LD50
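
The conversion of IC50 and Ki values to nM mentioned above can be illustrated with a small sketch. The function name `to_nanomolar` and the list of units are assumptions made for illustration; the actual curation procedure is not described on this page.

```python
# Illustrative only: conversion factors from common molar units to nM.
# The set of units handled here is an assumption, not ChEMBL's actual list.
TO_NM = {
    "M": 1e9,
    "mM": 1e6,
    "uM": 1e3,
    "nM": 1.0,
    "pM": 1e-3,
}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration in the given molar unit to nM."""
    try:
        factor = TO_NM[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * factor
```

With these factors, an IC50 reported as 2.5 uM and a Ki reported as 2500 nM end up on the same scale and can be compared directly.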

Targets in ChEMBL

As part of the data curation for ChEMBL, protein molecular targets associated with an assay were identified and their amino-acid sequences captured. For the GPCR cut of the data, the GPCR sequences were appended to the master alignment. GPCR compounds from ChEMBL may also have been screened against non-GPCR targets. These are supplied independently within the database.

Compounds in ChEMBL

GPCR SARfari ChEMBL compounds are of interest for two major reasons: firstly, they provide an overview of the screening space for GPCRs in the medicinal chemistry literature, and secondly, they are all bioactive compounds which have been synthesized, and their synthesis and assay routes are directly traceable. This makes them a very high value first-pass screening library.

Drugs: FDA approved drugs

FDA-approved drugs acting on GPCRs are reported in GPCR SARfari, together with their trade names and chemical structures.

Family Browsing

A GPCR family classification wheel has been created based on the 3 levels of classification used to describe the GPCR protein sequences stored in the SARfari system. The wheel is provided to guide target navigation. Clicking on a segment within the wheel adds a row to the Family Analysis table and adds to the user selection all GPCR domains at that segment's position in the classification hierarchy and below.


Family Analysis Column Explanations

Column Name | Explanation
Name | GPCR classification
Domains | Number of GPCR protein domains found at the segment's position in the GPCR classification hierarchy and below
ChEMBL Targets | Number of GPCR protein domains found at the segment's position in the GPCR classification hierarchy and below that are associated with ChEMBL compounds
Drug Targets | Number of GPCR protein domains found at the segment's position in the GPCR classification hierarchy and below that are associated with FDA-approved compounds
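
The "at the segment's position and below" counts can be thought of as subtree counts over the classification hierarchy. The following is a minimal sketch of that idea, not the system's implementation; the function name and the example classification paths are made up for illustration.

```python
from collections import Counter


def count_domains_by_segment(domain_paths):
    """Count domains at each classification node and below.

    domain_paths maps a domain name to its classification path,
    a tuple ordered from the broadest level to the most detailed.
    """
    counts = Counter()
    for path in domain_paths.values():
        # A domain contributes to every prefix of its path, i.e. to its
        # own segment and to every ancestor segment in the wheel.
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts


# Hypothetical example data; names and paths are invented.
example = {
    "domain_a": ("Group 1", "Subgroup 1a", "Type 1a-i"),
    "domain_b": ("Group 1", "Subgroup 1a", "Type 1a-ii"),
    "domain_c": ("Group 1", "Subgroup 1b", "Type 1b-i"),
}
counts = count_domains_by_segment(example)
```

Here `counts[("Group 1",)]` covers all three domains, while `counts[("Group 1", "Subgroup 1a")]` covers only the two domains beneath that segment.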

Protein Report Card

The target report is a one page portal collating information about a target, and providing a single point for linking to all data associated with a target including summary data, screening data, compounds, etc. An example report card is displayed below.

GPCR PDB / Model Structures

GPCR SARfari provides 89 unique GPCR PDB structures, together with 3D model structures, on the protein report pages. Model structures were generated for all GPCR sequences in the database using multiple PDB structures and a carefully constructed alignment. First, a multiple alignment of the GPCR SARfari sequences and the Class A GPCR PDB structures was generated. Next, models were built for sequences of unknown structure with the MODELLER program, using a template structure set including rhodopsin, beta-1 and beta-2 adrenergic, adenosine A2a and chemokine CXCR4 receptors. Finally, 4 models for each sequence were randomly selected from the generated model clusters, and their intra- and extracellular loops were excluded.

Keyword Search

Perform a text based search against all names, accessions, synonyms and descriptions of all targets (using All fields option). The following workflow explains how to perform a text-based search:

When entering search terms, note that each new line represents a new search term. For example, "trace amine" on one line produces a more restrictive search than having "trace" and "amine" on separate lines.
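
The difference between one-line and multi-line input can be sketched as follows. This is an illustration only: the function names are hypothetical, and the matching rule (case-insensitive substring matching, with terms combined by OR) is an assumption, since the page does not describe how the search is implemented.

```python
def parse_terms(text_box: str) -> list[str]:
    """Split the search box content into one term per non-empty line."""
    return [line.strip() for line in text_box.splitlines() if line.strip()]


def matches(description: str, terms: list[str]) -> bool:
    """Return True if any term occurs in the description.

    Assumption: case-insensitive substring matching, with terms combined
    by OR. The real search back end may behave differently.
    """
    lowered = description.lower()
    return any(term.lower() in lowered for term in terms)
```

Under these assumptions, the single term "trace amine" does not match "Histamine H1 receptor", whereas the two separate terms "trace" and "amine" do, because "amine" occurs inside "Histamine".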

BLAST Search

Full BLAST searching is available against all GPCR domains. The following workflow explains how to perform a BLAST search:

Protein Keyword Search Results Column Explanations

Column Name | Explanation
(check box) | Selects domains to carry over to another part of the system. To select all domains, check the box in the table header.
Name | Short name for GPCR protein domain
Accession | UniProt accessions
Organism | GPCR protein domain organism
Level 2, Level 3, Level 4 | GPCR classification, where Level 4 is the most detailed level in the classification hierarchy
Drugs | Number of FDA-approved compounds associated with GPCR protein domain
Bioactivities | Number of bioactivity data points associated with GPCR domain
Compounds | Number of compounds associated with GPCR protein domain
Query Region (BLAST only) | Amino-acid region of the query sequence matched in the BLAST search
Target Region (BLAST only) | Amino-acid region of the GPCR SARfari sequence matched in the BLAST search
Identity (BLAST only) | % sequence identity as reported by BLAST
Aln (BLAST only) | Click on 'Aln' to see the BLAST alignment between your query and the target

Protein BLAST Search Results Column Explanations

Columns duplicated between Protein Keyword Search Results and Protein BLAST Search Results are not listed below.

Column Name | Explanation
Query Region | Amino-acid region of the query sequence matched in the BLAST search
Target Region | Amino-acid region of the GPCR SARfari sequence matched in the BLAST search
Identity | % sequence identity as reported by BLAST
Aln | Click on 'Aln' to see the BLAST alignment between your query and the target
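
For orientation, the Identity value can be reproduced from a pairwise alignment as the fraction of alignment columns with identical residues. The sketch below is a simplified illustration, not BLAST's own code; the function name is hypothetical and the input is assumed to be two already aligned strings with '-' marking gaps.

```python
def percent_identity(aligned_query: str, aligned_target: str) -> float:
    """Percentage of alignment columns where both sequences share a residue.

    Both strings must be the same length, with '-' marking gaps.
    The denominator is the full alignment length, gaps included.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for q, t in zip(aligned_query, aligned_target)
        if q == t and q != "-"
    )
    return 100.0 * identical / len(aligned_query)
```

Note that the value refers only to the matched Query Region and Target Region, not to the full-length sequences.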

Protein Target Search

The protein target search allows a user to apply filters, using 'smart' menus, to the target-based bioactivity search. This is demonstrated in the workflow below:

Protein Target and Compound Search

It is possible to supply a list of targets and a list of compounds to find experimental activities between them. The following workflow explains how to do this:

Compound Search

Activities associated with a set of compounds can be returned. This is demonstrated in the workflow below:

Bioactivity Search Results Column Explanations

Column Name | Explanation
Activity ID | Internal activity domain identifier
Domain Name | Unique GPCR protein domain identifier
Reg. No. | Compound registration number
Activity Type | Type of end-point measurement, e.g. IC50, LD50, % inhibition
Relation | Symbol constraining the activity value (e.g. >, <, =)
Value | Activity data point
Unit | Units of measurement
Comment | Additional comments about activity
Details | Click 'Details' to open up underlying published data associated with activity
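
When working with exported results, the Relation column matters for filtering: a value reported with ">" is only a lower bound. The sketch below shows one way to respect this. The `Activity` class, its field names and the example rows are hypothetical and merely mirror the columns above.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    domain_name: str
    reg_no: str
    activity_type: str
    relation: str  # one of ">", "<", "="
    value: float
    unit: str


def at_or_below(activities, activity_type, threshold, unit="nM"):
    """Keep activities of one type whose true value is at most threshold.

    A ">" relation only gives a lower bound on the true value, so such
    rows are excluded; "=" and "<" rows qualify when value <= threshold.
    """
    return [
        a for a in activities
        if a.activity_type == activity_type
        and a.unit == unit
        and a.relation in ("=", "<")
        and a.value <= threshold
    ]


# Invented example rows, for illustration only.
rows = [
    Activity("domain_a", "REG-1", "IC50", "=", 12.0, "nM"),
    Activity("domain_a", "REG-2", "IC50", ">", 50.0, "nM"),
    Activity("domain_b", "REG-3", "Ki", "<", 5.0, "nM"),
]
potent = at_or_below(rows, "IC50", 100.0)
```

In this example only the first row is kept: the second has a ">" relation, and the third is a Ki rather than an IC50.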

Compound Report Card

The compound report is a one page portal collating information about a compound, and providing a single point for linking to all data associated with a compound including properties, representations, bioactivity data, etc. An example report card is displayed below:

Structure Search

The user can perform compound structure searches. The following workflow explains how a user can draw a chemical structure and return it to the webpage:

Once the structure has been drawn a structure based search can be initiated using the following workflow:

Text Search

It is possible to retrieve compounds by searching for their synonyms, names or identifiers. The following workflow explains how to perform a text-based search:

When entering search terms, note that each new line represents a new search term. For example, "trace amine" on one line produces a more restrictive search than having "trace" and "amine" on separate lines.

Selected Compound Sets

A number of pre-selected compound sets can be retrieved. These are currently Drug compounds (FDA-approved drugs acting on GPCRs), PDB Ligands and Clinical Candidate compounds.

Compound Search Results Column Explanations

Column Name | Explanation
Compound Structure (Table View) | Image of compound
Reg. No. | Compound registration number
Similarity (Similarity Search) | Tanimoto similarity score between query and target structures
Mol Weight | Molecular weight of compound
Synonyms | List of compound synonyms
ALogP | ALogP value for compound
PSA | Polar Surface Area
HBA | Hydrogen Bond Acceptors
HBD | Hydrogen Bond Donors
#Ro5 Vio. | Number of Rule-of-Five Violations
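
Two of the columns above are derived values, and a short sketch may help in interpreting them. The code is illustrative only: the function names are hypothetical, fingerprints are represented simply as sets of "on" bit positions, and the Rule-of-Five thresholds are the commonly quoted Lipinski cut-offs. Whether GPCR SARfari applies exactly these cut-offs to the ALogP value is an assumption.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def ro5_violations(mol_weight: float, alogp: float, hba: int, hbd: int) -> int:
    """Count Rule-of-Five violations using the commonly quoted cut-offs."""
    return sum([
        mol_weight > 500,  # molecular weight above 500
        alogp > 5,         # lipophilicity above 5 (ALogP assumed here)
        hba > 10,          # more than 10 hydrogen bond acceptors
        hbd > 5,           # more than 5 hydrogen bond donors
    ])
```

A Tanimoto score of 1.0 means the two fingerprints are identical, and a compound with 0 violations satisfies all four criteria.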

Frequently Asked Questions


Q: What browser does GPCR SARfari run on?

GPCR SARfari should work on most recent browsers. However, it has primarily been developed and tested on Internet Explorer 6.x/7.x, Mozilla Firefox 2.x/3.x and Safari 4.x.

Some users have reported issues with the compound structure applet when using Mac OS X 10.6 (Snow Leopard). We are currently looking into this issue.


Q: What plug-ins or other software do I need installed on my computer to use the GPCR SARfari interface fully?

The compound and 3D structure viewers are applets which require Java 1.5 and a browser with the Java plug-in installed.

Please enable cookies in your browser, as these are used by the site, for example to store user-drawn compound structures.


Q: Who do I contact if I have a question regarding data in the GPCR SARfari system?

If you have a data content related question please contact: sarfari-help


Q: Who do I contact if I want to report a bug in the GPCR SARfari system?

If you experience any problems or identify a bug please contact: sarfari-help
