Resources and Software – Thornton Group

Resources

Our group has developed and maintains a number of freely available databases and tools, many in collaboration with researchers at other institutes. Some of the resources on this page were developed during the Thornton Group’s time at University College London.

ArchSchema web server

ArchSchema displays dynamic graphs of related Pfam domain architectures. Each node in the graph represents a specific architecture (i.e. a sequence of Pfam domains). Lines between nodes join the most closely related architectures. Satellite nodes can be switched on to show the associated UniProt protein sequence IDs, PDB codes or E.C. enzyme classification numbers. A stand-alone version of the program can be downloaded (see ArchSchema under Software below).
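To make the idea of "closely related architectures" concrete, here is a minimal Python sketch. It assumes, purely for illustration, that relatedness is the edit distance between ordered lists of Pfam domain names and that each architecture is linked to its nearest neighbour(s). ArchSchema's actual relatedness criterion may differ, and the architectures below are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two ordered domain lists."""
    prev = list(range(len(b) + 1))
    for i, da in enumerate(a, start=1):
        curr = [i]
        for j, db in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # a has an extra domain (deletion)
                            curr[j - 1] + 1,            # b has an extra domain (insertion)
                            prev[j - 1] + (da != db)))  # same position: match or swap
        prev = curr
    return prev[-1]

def nearest_neighbour_edges(architectures):
    """Link each architecture to every other architecture at minimum distance from it."""
    edges = set()
    for i, arch in enumerate(architectures):
        dists = [(edit_distance(arch, other), j)
                 for j, other in enumerate(architectures) if j != i]
        if not dists:
            continue  # a single architecture has no neighbours
        best = min(d for d, _ in dists)
        for d, j in dists:
            if d == best:
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)

# Hypothetical architectures (Pfam domains ordered N- to C-terminus)
archs = [("SH3", "SH2", "Pkinase"), ("SH2", "Pkinase"), ("Pkinase",), ("SH3", "SH2")]
print(nearest_neighbour_edges(archs))  # [(0, 1), (0, 3), (1, 2)]
```

Each returned pair corresponds to a line between two nodes; satellite nodes (UniProt IDs, PDB codes, E.C. numbers) would simply be extra nodes attached to each architecture.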

Cofactor database

The Cofactor database presents information on organic enzyme cofactors, together with their catalytic mechanisms, compiled from the scientific literature. The information is integrated with various compound and enzyme resources.

Covid-19 pages

The Covid-19 pages list the 3D structures of SARS-CoV-2 proteins in the PDB and annotate them with structural data, observed variants and Variants of Concern.

Enzyme Structures Database (EC->PDB)

The Enzyme Structures Database contains all enzyme structures deposited in the Protein Data Bank (PDB), categorized according to the E.C. numbering system. Each structure links to PDBsum for detailed structural analyses.

FunTree

FunTree is a resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. It allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily, as well as providing a means to analyse trends across many superfamilies. This is done in the context not only of each enzyme's sequence and structure but also of the relationships between the reactions the enzymes catalyse. It has been developed in tandem with the CATH database at UCL.

M-CSA

M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme reaction mechanisms, providing annotation on the proteins, catalytic residues, cofactors and reaction mechanisms of hundreds of enzymes.

Metal MACiE

The Metal MACiE database annotates both the properties and the functions of the metal ions involved in enzyme catalytic reactions. The database is an extension of the MACiE database and is the result of a collaborative project among three institutes: the Magnetic Resonance Center (University of Florence), the European Bioinformatics Institute and the Unilever Centre (University of Cambridge).

NetEffects

NetEffects analyses the effects of perturbing specific pathway components on the rest of the pathway, and especially on the biological processes it controls. Two ways of studying each pathway are provided: querying it by up- or down-regulating specific components, or importing gene expression data and phenotype information (lifespan extension or reduction) and inferring the effects on the pathway. The underlying methodology is Answer Set Programming (ASP), a form of declarative programming.
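The idea of querying a pathway by up- or down-regulating one component can be illustrated with qualitative sign propagation over a signed, directed graph. NetEffects itself uses Answer Set Programming; the plain-Python traversal below is a substitute technique for illustration only, and the pathway and component names are hypothetical.

```python
from collections import deque

def propagate(pathway, component, direction):
    """Infer qualitative effects of perturbing one component of a signed pathway.

    pathway:   dict mapping a component to a list of (target, +1 activates / -1 inhibits).
    direction: +1 to up-regulate the component, -1 to down-regulate it.
    Returns a dict of component -> +1 (up), -1 (down) or 0 (conflicting signals).
    """
    effects = {component: direction}
    queue = deque([component])
    while queue:
        node = queue.popleft()
        for target, sign in pathway.get(node, []):
            inferred = effects[node] * sign
            current = effects.get(target)
            if current is None:
                effects[target] = inferred
                queue.append(target)
            elif current != 0 and current != inferred:
                effects[target] = 0  # opposing influences: mark as ambiguous
                queue.append(target)
    return effects

# Toy pathway with hypothetical component names (not taken from NetEffects)
pathway = {
    "Receptor": [("KinaseA", +1)],
    "KinaseA": [("KinaseB", +1)],
    "KinaseB": [("TF", -1)],
    "TF": [("lifespan", +1)],
}
print(propagate(pathway, "Receptor", -1))
# {'Receptor': -1, 'KinaseA': -1, 'KinaseB': -1, 'TF': 1, 'lifespan': 1}
```

Each node can change state at most twice (unknown, then up or down, then ambiguous), so the traversal always terminates. A declarative ASP encoding expresses the same kind of rules as logical constraints rather than as an explicit search loop.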

PDBsum

PDBsum is a pictorial database providing an at-a-glance overview of the contents of each 3D structure deposited in the Protein Data Bank (PDB). It shows the molecule(s) that make up the structure (i.e. protein chains, DNA, ligands and metal ions) and schematic diagrams of their interactions. Extensive use is made of the freely available RasMol molecular graphics program to view the molecules and their interactions in 3D.

PoreLogo

PoreLogo is an automated tool for visualizing the sequence and residue conservation of pore-lining residues in transmembrane protein structures.

PoreWalker

PoreWalker is a fully automated method for detecting and characterizing transmembrane protein channels from their 3D structure.

ProFunc

ProFunc helps identify the likely biochemical function of a protein from its three-dimensional structure. It uses both sequence- and structure-based methods to try to provide clues as to the protein's likely or possible function. Often, where one method fails to provide any functional insight, another may be more helpful.

SAS

SAS is a sequence search program that uses FASTA to scan a given protein sequence against all the proteins of known 3D structure in the Protein Data Bank. The resultant multiple alignment can be coloured according to different structural features. The matching 3D structures can be superimposed and viewed in RasMol.

Scorecons

Scorecons quantifies residue conservation in a multiple sequence alignment. Given a multiple sequence alignment file, Scorecons calculates the degree of amino acid variability in each column of the alignment and returns this information to the user.
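As an illustration of per-column conservation scoring, the sketch below uses normalised Shannon entropy, a common and simple measure. This is a swapped-in technique: Scorecons provides its own scoring methods, which this code does not reproduce, and the toy alignment is hypothetical.

```python
import math
from collections import Counter

def column_conservation(alignment):
    """Return a conservation score in [0, 1] for each alignment column.

    Score = 1 - H / ln(20), where H is the Shannon entropy (natural log) of the
    residues in the column, ignoring gaps ('-'). 1 means fully conserved.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    scores = []
    for col in range(n_cols):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # an all-gap column carries no conservation signal
            continue
        total = len(residues)
        entropy = -sum((c / total) * math.log(c / total)
                       for c in Counter(residues).values())
        scores.append(1.0 - entropy / math.log(20))
    return scores

aln = ["MKVL", "MKIL", "MRVL", "MKV-"]
print([round(s, 2) for s in column_conservation(aln)])  # [1.0, 0.81, 0.81, 1.0]
```

Dividing by ln(20), the maximum entropy for 20 amino acids, keeps scores comparable across columns; more elaborate schemes also weight residues by physicochemical similarity.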

SurvCurv

SurvCurv is a database of manually curated and annotated survival data in model organisms. The database offers various functions, including plotting, mathematical models and statistical tests, facilitating, for example, reanalysis and cross-comparison of datasets.
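Survival curves of this kind are commonly estimated with the Kaplan-Meier product-limit estimator. The Python sketch below implements that standard estimator for illustration only; it is not SurvCurv's code, and the lifespan data are made up.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  time of death or censoring for each individual (e.g. days).
    events: 1 if death was observed at that time, 0 if the individual was censored.
    Returns a list of (time, survival probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all individuals whose event or censoring happened at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored  # both leave the risk set after time t
    return curve

days = [10, 12, 12, 15, 20, 20]
died = [1, 1, 0, 1, 1, 1]
print(kaplan_meier(days, died))
```

Censored individuals (e.g. animals lost from the experiment) reduce the number at risk without counting as deaths, which is why the estimator is preferred over a simple fraction-alive count.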

Transform-MinER

Transform-MinER is an interactive online tool that either (a) searches a query molecule for potential enzyme transformations (Molecule Search) or (b) joins a query's source and target molecules by a path of enzyme transformations (Path Search).
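Path Search can be pictured as finding a route through a graph whose nodes are molecules and whose edges are enzyme transformations. The breadth-first search below is an illustrative sketch under that assumption, returning a path with the fewest transformation steps. The graph representation and the toy transformations are hypothetical and are not Transform-MinER's data model.

```python
from collections import deque

def path_search(transformations, source, target):
    """Breadth-first search for a shortest chain of enzyme transformations.

    transformations: dict mapping molecule -> list of (product, enzyme label).
    Returns a list of (substrate, enzyme, product) steps, or None if unreachable.
    """
    previous = {source: None}  # molecule -> (parent molecule, enzyme) on the path
    queue = deque([source])
    while queue:
        mol = queue.popleft()
        if mol == target:
            steps = []
            while previous[mol] is not None:  # walk back to the source
                parent, enzyme = previous[mol]
                steps.append((parent, enzyme, mol))
                mol = parent
            return steps[::-1]
        for product, enzyme in transformations.get(mol, []):
            if product not in previous:
                previous[product] = (mol, enzyme)
                queue.append(product)
    return None

# Toy transformation graph (hypothetical)
toy = {
    "ethanol": [("acetaldehyde", "alcohol dehydrogenase")],
    "acetaldehyde": [("acetate", "aldehyde dehydrogenase")],
    "acetate": [("acetyl-CoA", "acetyl-CoA synthetase")],
}
for step in path_search(toy, "ethanol", "acetyl-CoA"):
    print(step)
```

Breadth-first search guarantees the fewest steps when every transformation counts equally; weighting transformations (e.g. by likelihood) would call for a weighted shortest-path algorithm instead.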

VarMap

VarMap is a web tool that maps a list of genomic variants, given as genomic coordinates, to protein sequence and 3D structure and retrieves structural annotations for the affected residues.

VarSite

VarSite is a database annotating known disease-associated variants in human genes with structural information from the 3D structures in the Protein Data Bank (PDB).

Software

Below is a list of software packages developed by the group that are available for download. All are free for academic use and require only completion of a Confidentiality Agreement. For non-academic institutions, some of the packages require an annual fee while others (e.g. PROCHECK) are free. Several of the programs were developed during the Thornton Group's time at University College London.

ArchSchema

ArchSchema is a stand-alone Java program that generates dynamic plots of related Pfam domain architectures. See the ArchSchema web server entry under Resources above for a brief description.

HBPLUS

HBPLUS is a hydrogen bond calculation program for protein 3D structures. It accepts input files in PDB format.
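Hydrogen-bond programs typically apply geometric criteria to donor (D), hydrogen (H) and acceptor (A) positions. The sketch below checks a single candidate D-H...A triple. The cutoffs (D-A at most 3.9 Å, H-A at most 2.5 Å, D-H-A angle at least 90°) are assumed, illustrative values rather than a statement of HBPLUS's exact defaults or full set of criteria, and the coordinates are made up.

```python
import math

def angle(a, b, c):
    """Angle a-b-c in degrees, measured at point b."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def is_hbond(donor, hydrogen, acceptor, max_da=3.9, max_ha=2.5, min_dha=90.0):
    """Geometric D-H...A test; default cutoffs (Å, degrees) are assumed, illustrative values."""
    return (math.dist(donor, acceptor) <= max_da
            and math.dist(hydrogen, acceptor) <= max_ha
            and angle(donor, hydrogen, acceptor) >= min_dha)

# Made-up coordinates (Å) for a near-linear N-H...O arrangement
n, h, o = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.2, 0.0)
print(is_hbond(n, h, o))  # True
```

Hydrogen positions are often missing from crystal structures, so in practice they must be placed before a test like this can be applied; that step is not shown here.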

LigPlot+

LigPlot⁺ is a Java-based GUI front-end to the original LIGPLOT program for automatic generation of 2D ligand-protein interaction diagrams. The GUI allows on-screen editing of the plots via mouse click-and-drag operations. It is freely available to academic institutions.

NUCPLOT

NUCPLOT automatically generates schematic diagrams of protein-nucleic acid interactions for a given PDB file.

PROCHECK

PROCHECK checks the stereochemical quality of a protein structure, producing a number of PostScript plots analysing its overall and residue-by-residue geometry.
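Residue-by-residue checks such as the Ramachandran plot rest on backbone dihedral angles (φ, ψ). The standard torsion-angle calculation from four atom positions is sketched below for illustration; it is not PROCHECK's code, and the test coordinates are made up.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# cis (0°), +90° and trans (180°) arrangements of four made-up points
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

For residue i, φ uses the atoms C(i−1), N(i), CA(i), C(i) and ψ uses N(i), CA(i), C(i), N(i+1); plotting the resulting (φ, ψ) pairs gives the Ramachandran plot.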

PROCHECK-NMR

PROCHECK-NMR is a version of PROCHECK (see above) specifically for analysing ensembles of protein structures, such as those solved by NMR.

Small Molecule Subgraph Detector

The Small Molecule Subgraph Detector (SMSD) is a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules. The MCS can be used to measure the similarity (or distance) between any two molecules. It is also used for screening drug-like compounds by finding molecules that share a common subgraph (substructure).
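To illustrate how an MCS yields a similarity score, the Python sketch below finds, by exhaustive backtracking, the largest common induced subgraph of two tiny labelled molecular graphs (hydrogens omitted) and converts it to a Tanimoto-style score, |MCS| / (|A| + |B| − |MCS|). This is not SMSD's algorithm or API: it does not require the common subgraph to be connected, ignores bond orders, and is only practical for very small molecules. The function names are hypothetical.

```python
def mcs_size(labels_a, bonds_a, labels_b, bonds_b):
    """Atom count of the maximum common induced subgraph (brute-force backtracking)."""
    adj_a = {frozenset(b) for b in bonds_a}
    adj_b = {frozenset(b) for b in bonds_b}
    n_a, n_b = len(labels_a), len(labels_b)
    best = 0

    def extend(i, mapping, used):
        nonlocal best
        if i == n_a:
            best = max(best, len(mapping))
            return
        if len(mapping) + (n_a - i) <= best:
            return  # prune: cannot beat the best mapping found so far
        for j in range(n_b):
            if j in used or labels_a[i] != labels_b[j]:
                continue
            # Induced subgraph: bonded in A exactly when bonded in B
            if all((frozenset((i, k)) in adj_a) == (frozenset((j, mapping[k])) in adj_b)
                   for k in mapping):
                mapping[i] = j
                used.add(j)
                extend(i + 1, mapping, used)
                del mapping[i]
                used.discard(j)
        extend(i + 1, mapping, used)  # also try leaving atom i unmatched

    extend(0, {}, set())
    return best

def mcs_similarity(labels_a, bonds_a, labels_b, bonds_b):
    k = mcs_size(labels_a, bonds_a, labels_b, bonds_b)
    return k / (len(labels_a) + len(labels_b) - k)

ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
methanol = (["C", "O"], [(0, 1)])
print(mcs_similarity(*ethanol, *methanol))
```

Here the common subgraph is the C-O fragment (two atoms), giving a score of 2 / (3 + 2 − 2). Production MCS tools rely on far more efficient algorithms than this exhaustive search.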

SURFNET

The SURFNET program generates surfaces and void regions between surfaces from coordinate data supplied in a PDB file.
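One way to locate void regions between atomic surfaces is to fit "gap spheres" between pairs of atoms. The sketch below is a simplified, assumed version of that idea for illustration only: for each atom pair it places a sphere midway between the two van der Waals surfaces, shrinks it until no other atom intrudes, and keeps it if its radius is at least an assumed minimum (1.0 Å), skipping pairs whose initial sphere would exceed an assumed maximum (4.0 Å). It is not SURFNET's implementation, it does not generate the surfaces themselves, and the atoms are made up.

```python
import math
from itertools import combinations

def gap_spheres(atoms, min_radius=1.0, max_radius=4.0):
    """Simplified gap-sphere search (illustrative; radius bounds are assumptions).

    atoms: list of ((x, y, z), van der Waals radius) tuples, in Å.
    Returns a list of (centre, radius) for retained gap spheres.
    """
    spheres = []
    for (i, (ci, ri)), (j, (cj, rj)) in combinations(enumerate(atoms), 2):
        d = math.dist(ci, cj)
        radius = (d - ri - rj) / 2.0
        if radius <= 0 or radius > max_radius:
            continue  # surfaces touch or overlap, or the atoms are too far apart
        # Centre sits midway between the two van der Waals surfaces
        t = (ri + radius) / d
        centre = tuple(ci[k] + t * (cj[k] - ci[k]) for k in range(3))
        # Shrink the sphere (centre fixed) until no other atom intrudes
        for m, (cm, rm) in enumerate(atoms):
            if m not in (i, j):
                radius = min(radius, math.dist(centre, cm) - rm)
        if radius >= min_radius:
            spheres.append((centre, radius))
    return spheres

# Made-up atoms: two carbons 6 Å apart, van der Waals radius 1.7 Å each
atoms = [((0.0, 0.0, 0.0), 1.7), ((6.0, 0.0, 0.0), 1.7)]
print(gap_spheres(atoms))
```

Adding a third atom close to the midpoint shrinks the sphere below the minimum radius, so that gap is no longer reported as a void.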