What does ADME SARfari do?
ADME SARfari enables you to:
- Predict likely ADME targets for an input molecule.
- Find ADME targets similar to an input FASTA sequence.
- Find ADME targets related to text terms.
- Find pharmacokinetic data relating to an input target, sequence or text term.
- Find activity/pharmacokinetic data for an input molecule or related compounds (via a similarity/substructure search).
- Match expression levels in human tissues for found targets.
There are two main modes of searching: molecule-driven and protein-driven. Once you have generated a set of results, you can move between the tabs of the application to see the results for each view that relate to your result set.
Using the molecule sketcher on the homepage, you can sketch a molecule, paste one in, or drag one in (ensuring the file is in an appropriate format). This then allows you to:
- Run a substructure search
- Run a similarity search
- Run a prediction based on a trained Multiclass Naive Bayesian Classifier.
Once you have submitted a structure via a search, you will automatically be directed to the Molecules page, which displays the current set of related compounds. Clicking on the other headers will then bring up results relating to that compound set. For example, clicking on the Orthologues page brings up the set of ADME targets that the current set of compounds has activity against.
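As a rough illustration of how a similarity cut-off narrows a compound set, the sketch below scores toy fingerprints against a query and keeps those at or above the cut-off. It assumes the Tanimoto coefficient on sets of fingerprint bits; this page does not state which fingerprint or similarity measure ADME SARfari uses, and the compound names and bit sets are made up.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query, library, cutoff):
    """Return (name, score) pairs scoring at or above the cut-off, best first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    kept = [(name, score) for name, score in scored if score >= cutoff]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints (an assumption for illustration): each set holds the
# indices of the bits that are switched on for that compound.
library = {
    "compound_a": frozenset({1, 2, 3, 4}),
    "compound_b": frozenset({1, 2, 3, 9}),
    "compound_c": frozenset({7, 8}),
}
query = frozenset({1, 2, 3, 4})

hits = similarity_search(query, library, cutoff=0.5)
print(hits)
```

Raising the cut-off returns fewer, more closely related compounds; lowering it widens the set that the other tabs then work from.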
By pasting a FASTA sequence into the right-hand side of the homepage, you can run a BLAST search against the sequences of known ADME targets to find targets with high sequence similarity.
When submitting a BLAST search you have the option of setting the following parameters:
- Low complexity filter, which masks out low complexity regions that may cause spurious or misleading results (see the NCBI site for more details: Low Complexity Filter).
- E-value cut-off, which sets the cut-off for the expected number of chance matches in a random model (see the NCBI site for more details: E-value cut-off).
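To make the effect of the E-value cut-off concrete, here is a minimal sketch that filters BLAST hits held in the standard 12-column tabular layout (the default of BLAST+ `-outfmt 6`). How ADME SARfari stores its BLAST results internally is not described here, so the tabular input is an assumption, and the sample rows and identifiers are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, evalue_cutoff):
    """Keep hits whose E-value is at or below the cut-off, lowest E-value first."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        row["evalue"] = float(row["evalue"])
        row["pident"] = float(row["pident"])
        if row["evalue"] <= evalue_cutoff:
            hits.append(row)
    return sorted(hits, key=lambda row: row["evalue"])


# Made-up example rows; identifiers and scores are illustrative only.
sample = (
    "query1\ttarget_A\t98.50\t490\t7\t0\t1\t490\t1\t490\t0.0\t1002\n"
    "query1\ttarget_B\t41.20\t470\t260\t9\t12\t478\t20\t480\t3e-95\t301\n"
    "query1\ttarget_C\t27.00\t100\t70\t3\t5\t102\t40\t138\t4.2\t28.1\n"
)

kept = filter_hits(sample, evalue_cutoff=1e-5)
print([hit["sseqid"] for hit in kept])
```

A stricter (smaller) cut-off discards weak matches such as the third row, which would be expected to occur by chance several times in a random model.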
If you don't have a FASTA sequence handy but know the name of your protein, enter it into the 'Lookup target' box and select the appropriate target. This will autopopulate the FASTA sequence box, enabling you to initiate a search.
You can also enter a search term into the text box at the top right. Suggestions will be offered, which can be selected and a search initiated.
Once a set of results has been generated by one of the search methods outlined above, you can switch between the result tabs (Orthologues, Tissues, etc.).
An overview of the tabs follows. All pages feature an export button which allows you to download the current data set (as tab-separated values, apart from the molecules table, which also offers the download in SDF format).
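An exported tab-separated file can be read with Python's standard `csv` module, as in the sketch below. The column names and rows are placeholders, not the actual headers of an ADME SARfari export; an in-memory string stands in for the downloaded file.

```python
import csv
import io

# Stand-in for the contents of an exported file.
# The column names and values are hypothetical placeholders.
exported = (
    "target_name\tspecies\tsource\n"
    "TARGET_1\tHuman\tCore\n"
    "TARGET_2\tHuman\tExtended\n"
    "TARGET_3\tHuman\tCore\n"
)


def load_export(handle):
    """Read a tab-separated export into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = load_export(io.StringIO(exported))
core_targets = [row["target_name"] for row in rows if row["source"] == "Core"]
print(core_targets)
```

For a real download, pass an open file instead, for example `open(path, newline="", encoding="utf-8")`.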
To load the last run query (sequence or molecule) or clear all session data, go to the toolbar on the homepage.
This view presents a scrollable table of Human ADME targets along with the orthologues of these targets in a selected subset of species. Each 'lozenge' is coloured according to the 'colour' drop-down menu option.
- The 'Target cluster per row' view colours orthologues according to protein sequence clustering of the row entries. The clustering was carried out using the NCBI blastclust tool (default parameters used). Orthologues in the same row that belong to the same cluster share the same colour, and orthologues that have not been clustered, i.e. singletons, have no colour.
- The 'BLAST identity' view colours orthologues according to their BLAST similarity to the Human ADME target sequence. If a sequence search has been initiated, the colouring is relative to the input sequence used.
- The 'Predicted Scores' view colours orthologues according to their predicted activity from the Naive Bayesian classifier.
These ADME targets are derived as follows:
- Core - ADME targets considered to be the most important to drug metabolism. The 'Core ADME Gene List' defined on the PharmaADME site (www.pharmaadme.org) was used to seed this list. Please note that this list does not correspond exactly to the PharmaADME 'Core ADME Gene List', as some targets have been promoted from the Extended List to the Core List in the ADME SARfari system.
- Extended - ADME targets considered to be closely associated with drug metabolism. The 'Extended ADME Gene List' defined on the PharmaADME site (www.pharmaadme.org) was used to seed this list. Please note that this list does not correspond exactly to the PharmaADME 'Extended ADME Gene List', as some targets have been promoted from the Extended List to the Core List and some targets have been removed because they represent pseudogenes or duplicates in the ADME SARfari system.
- Supplementary - ADME targets not included in the original PharmaADME Core and Extended lists, but believed to play a role in drug metabolism.
- ENSEMBL Mapping - ADME targets that share high sequence identity with targets that appear in the previous lists (based on mappings taken from ENSEMBL). It is not known whether they play a role in drug metabolism.
By default, the ADME SARfari system displays targets from all the sources listed above. You can hide or show sources using the 'Sources' button in the menu bar.
Each 'lozenge' contains the name of the target/protein. A number in a circle indicates the number of SNPs in the target sequence. A ChEMBL symbol indicates that there is ChEMBL data for this target. Hovering over the target will show a tooltip containing more data. Clicking on the target will take you to the ADME SARfari target page. If there is ChEMBL data for this target, a set of widgets will also be displayed on the target page.
There are a number of filters available to alter the number of targets displayed in the table. The filter text box allows you to filter by a text term.
At the end of each row is a button to take you to the alignment page for the sequences in the row. The currently selected species set is used to select the sequences. The alignments are shown in a basic viewer which interacts with a phylogenetic tree shown underneath. Hovering over a sequence highlights its branch in the tree and vice versa. There is a drop-down to select the feature colouration of the residues in the sequences. SNPs are represented as circles in the alignment viewer. If there has been a residue change in the SNP, the colour and letter will cycle between the two. Hovering over the alignment shows tooltips and offers the ability to click out to the SNP entry (dbSNP).
This view presents a scrollable table of the expression levels of the ADME targets in Human tissues and cells. There is an expandable tissue tree which allows you to select which tissues you wish to display in the table. Expression levels for cell types are shown in a miniature bar chart. Hovering over the bars shows more information. Semi-ubiquitous tissues are coloured with a gradient to show group membership. After selecting a set of tissues, click 'update' to render them. A text box is also available to filter the table.
This view presents a scrollable table of molecules with associated activities (mixed assays) used to train the Naive Bayesian model. There is full table sorting and text filtering. Clicking on a molecule image will take you to the relevant ChEMBL entry.
NB: Attempting to export the whole table will probably result in a time-out. Data downloads can be found here.
This view presents a scrollable table of molecules with associated calculated properties and descriptors. The table is fully sortable and offers exports in both 'comma separated values' and SDF formats. Clicking on a molecule image will take you to the activity subset for that molecule. A drop-down is present which switches the view from molecular properties to pharmacokinetic values. Values are binned into 'High', 'Medium' and 'Low' and coloured on a red gradient according to the following table:
This is a sparse matrix view due to limited PK data for all compounds.
This view shows a heatmap containing a condensed PK data set, i.e., only molecules from a subset that have PK data are included. Multiple data points are averaged and binned according to the table above. Data for primates has been condensed into one species category, as have Volume Distribution (VD) and Volume Distribution at Steady State (VDss). Hovering over the heatmap shows a repositionable tooltip containing the structure and relevant data. You can sort by the assay header or by the Y-axis category (via the drop-down menu). Clicking on a heatmap cell opens a new tab with the relevant ChEMBL molecule entry.
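The averaging and binning step can be sketched as follows. The thresholds are hypothetical placeholders, since the actual bin boundaries come from the binning table and are not reproduced here; the molecule names and values are made up. Folding VDss into VD follows the description above.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL thresholds for illustration only; the real boundaries
# are those given in the binning table for each PK parameter.
LOW_MAX = 10.0
MEDIUM_MAX = 100.0

# VD and VDss are condensed into a single category, as described above.
ASSAY_ALIASES = {"VDss": "VD"}


def bin_value(value):
    """Map an averaged value onto 'Low', 'Medium' or 'High'."""
    if value < LOW_MAX:
        return "Low"
    if value < MEDIUM_MAX:
        return "Medium"
    return "High"


def condense(measurements):
    """Average repeated (molecule, assay) measurements, then bin each mean."""
    grouped = defaultdict(list)
    for molecule, assay, value in measurements:
        assay = ASSAY_ALIASES.get(assay, assay)
        grouped[(molecule, assay)].append(value)
    return {key: bin_value(mean(values)) for key, values in grouped.items()}


# Made-up measurements: (molecule, assay, value).
measurements = [
    ("mol_1", "VD", 4.0),
    ("mol_1", "VDss", 8.0),
    ("mol_1", "assay_y", 150.0),
    ("mol_2", "VD", 30.0),
    ("mol_2", "VD", 90.0),
]

heatmap = condense(measurements)
print(heatmap)
```

Cells with no measurements are simply absent from the result, which is why the matrix is sparse.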
How do I?
Click on the button. A BLAST run will then be initiated and the application will take you to the Orthologues view.
Now, either click on or enter a Similarity cut-off and click
The application will automatically take you to the 'Molecules page'. From there you can navigate to the 'Pharmacokinetics' page and if any data exists against your compound set, it will be displayed in a heatmap.
Now click on
The compound will be run against the Naive Bayesian Multiclass classifier model (details below) and a set of predicted targets will then be depicted on the 'Orthologues' page. This distinct set of targets will be used to generate input for all other pages in ADME SARfari. Navigating to the 'Tissues' page will show expression levels in this target subset. Using the tissue/cell tree, the selection can be changed.
Now click on
The compound will be run against the Naive Bayesian Multiclass classifier model (details below) and a set of predicted targets will then be depicted on the 'Orthologues' page. This distinct set of targets will be used to generate input for all other pages in ADME SARfari. Navigating to the 'Molecules' page will show all other compounds associated with these targets.
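To give a feel for how a multiclass Bernoulli Naive Bayes model ranks targets for a compound, here is a minimal pure-Python sketch. It is not the ADME SARfari model, which is built with scikit-learn on far more data; the fingerprints, target names and smoothing value are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def train(fingerprints, labels, n_bits, alpha=1.0):
    """Estimate a log prior and smoothed per-bit probabilities for each class."""
    class_counts = defaultdict(int)
    bit_counts = defaultdict(lambda: [0] * n_bits)
    for bits, label in zip(fingerprints, labels):
        class_counts[label] += 1
        for i in bits:
            bit_counts[label][i] += 1
    total = len(labels)
    model = {}
    for label, count in class_counts.items():
        log_prior = math.log(count / total)
        probs = [(bit_counts[label][i] + alpha) / (count + 2 * alpha)
                 for i in range(n_bits)]
        model[label] = (log_prior, probs)
    return model


def rank_classes(model, bits, n_bits):
    """Return class labels ordered from most to least likely for one compound."""
    scores = {}
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for i in range(n_bits):
            # Bernoulli model: absent bits contribute evidence as well.
            score += math.log(probs[i] if i in bits else 1.0 - probs[i])
        scores[label] = score
    return sorted(scores, key=scores.get, reverse=True)


# Toy training set: sets of 'on' bits and the target each compound is active against.
fingerprints = [{0, 1}, {0, 1, 2}, {3, 4}, {2, 3, 4}]
labels = ["target_x", "target_x", "target_y", "target_y"]

model = train(fingerprints, labels, n_bits=5)
ranking = rank_classes(model, {0, 1}, n_bits=5)
print(ranking)
```

The top of such a ranking corresponds to the set of predicted targets that is then shown on the 'Orthologues' page.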
Naive Bayesian Model Validation
The following chart depicts the recall accuracy of the currently trained scikit-learn Naive Bayesian (Bernoulli) model. The chart shows the cumulative rank recall for the validation set of compounds (15%).
The model was built with 142345 compounds (training and validation) and features 135 learned classes.
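One plausible reading of cumulative rank recall is the fraction of validation compounds whose true target appears within the top k ranked predictions, for increasing k. The sketch below computes that curve under this assumption; the rankings and labels are made up and the function name is illustrative.

```python
def cumulative_rank_recall(ranked_predictions, true_labels, max_rank):
    """Fraction of compounds whose true class is within the top k, for k = 1..max_rank."""
    hits_at_rank = [0] * max_rank
    for ranking, truth in zip(ranked_predictions, true_labels):
        top = ranking[:max_rank]
        if truth in top:
            hits_at_rank[top.index(truth)] += 1
    total = len(true_labels)
    curve = []
    running = 0
    for count in hits_at_rank:
        running += count
        curve.append(running / total)
    return curve


# Made-up ranked predictions for four validation compounds (best guess first).
ranked_predictions = [
    ["class_a", "class_b", "class_c"],
    ["class_b", "class_a", "class_c"],
    ["class_c", "class_b", "class_a"],
    ["class_a", "class_c", "class_b"],
]
true_labels = ["class_a", "class_a", "class_a", "class_b"]

curve = cumulative_rank_recall(ranked_predictions, true_labels, max_rank=3)
print(curve)
```

The first value of the curve is the recall of the primary (top-ranked) result; later values show how much is recovered by also considering lower-ranked predictions.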
This chart shows the recall precision of the primary result.
The following chart shows the included targets and the number of actives included in the model.
The following chart shows the top 25 targets and the number of actives included in the model.
Software and resources used: