Our group has developed and maintains a number of freely available databases and tools, many in collaboration with researchers at other institutes. Some of the resources on this page were developed during the Thornton Group's time at University College London.

ArchSchema web server

ArchSchema displays dynamic graphs of related Pfam domain architectures. Each node in the graph represents a specific architecture (i.e. a sequence of Pfam domains). Lines between nodes join the most closely related architectures. Satellite nodes can be switched on to show the associated UniProt protein sequence IDs, PDB codes, or E.C. enzyme classification numbers. A stand-alone version of the program can be downloaded (see panel on the right).

Catalytic Site Atlas

The Catalytic Site Atlas is a database documenting enzyme active sites and the catalytic residues in enzymes of known 3D structure. We have defined a classification of catalytic residues which includes only those residues thought to be directly involved in some aspect of the reaction catalysed by the enzyme.

Cofactor database

The Cofactor database presents information and catalytic mechanisms, compiled from the scientific literature, relating to organic enzyme cofactors. The information is integrated with various compound and enzyme resources.

Enzyme Structures Database (EC->PDB)

The Enzyme Structures Database contains all enzymes of known 3D structure, deposited in the Protein Data Bank (PDB) and categorized according to the E.C. numbering system. Each structure links to PDBsum for detailed structural analyses.


FunTree is a resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. It allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily, and provides a means to analyse trends across many superfamilies. This is done not only in the context of the enzymes' sequences and structures but also of the relationships between their reactions. It has been developed in tandem with the CATH database at UCL.


The LigSearch server predicts ligands that might bind to a given protein target. Its chief aim is to help experimentalists who are planning to solve a specific protein structure to identify small molecules that might form a complex with the protein. The server uses resources such as ChEMBL, ChEBI and the structures in the PDB to find potential ligands.


MACiE - Mechanism, Annotation and Classification in Enzymes - is a database of fully annotated enzyme reaction mechanisms. It is a collaborative project with the Mitchell Group at the University of St Andrews and the Bertini Group at the Magnetic Resonance Center.

Metal MACiE

The Metal MACiE database annotates both the properties and functions of the metal ions involved in enzyme catalytic reactions. The database is an extension of the MACiE database (see above) and is the result of a collaborative project among three institutes: the Magnetic Resonance Center (University of Florence), the European Bioinformatics Institute and the Unilever Center (University of Cambridge).


NetEffects analyses the effects of perturbations to specific pathway components on the rest of the pathway, and especially on the biological processes it controls. Two different ways of studying each pathway are provided: querying it by up- or down-regulating specific components, or importing gene expression data and phenotype information (lifespan extension/reduction) and inferring the effects on the pathway. The underlying methodology is Answer Set Programming (ASP), a form of declarative programming.


PDBsum is a pictorial database providing an at-a-glance overview of the contents of each 3D structure deposited in the Protein Data Bank (PDB). It shows the molecule(s) that make up the structure (i.e. protein chains, DNA, ligands and metal ions) and schematic diagrams of their interactions. Extensive use is made of the freely available RasMol molecular graphics program to view the molecules and their interactions in 3D.


Pita suggests the most likely biological unit for a given X-ray crystal structure of a protein. It uses the crystal symmetry operators and a method of scoring each trial protein-protein interface. 


PoreLogo is an automated tool for visualizing the sequence and residue conservation of pore-lining residues in transmembrane protein structures.


PoreWalker is a fully automated method for detecting and characterizing transmembrane protein channels from their 3D structure. 


ProFunc helps identify the likely biochemical function of a protein from its three-dimensional structure. It uses both sequence- and structure-based methods to try to provide clues as to the protein's likely or possible function. Often, where one method fails to provide any functional insight, another may be more helpful.


SAS is a sequence search program that uses FASTA to scan a given protein sequence against all the proteins of known 3D structure in the Protein Data Bank. The resultant multiple alignment can be coloured according to different structural features. The matching 3D structures can be superimposed and viewed in RasMol.


Scorecons quantifies residue conservation in a multiple sequence alignment. Given a multiple sequence alignment file, Scorecons calculates the degree of amino acid variability in each column of the alignment and returns this information to the user.
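As a rough illustration of what a per-column variability score can look like, the Python sketch below scores each alignment column by normalised Shannon entropy. Entropy is used here purely as an assumption for illustration; it is not necessarily the scoring method that Scorecons itself applies, and the function names `column_entropy` and `conservation_scores` are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, ignore_gaps=True):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    residues = [r for r in column if not (ignore_gaps and r == "-")]
    if not residues:
        # Assumption: an all-gap column is treated as having no variability.
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_scores(alignment):
    """Return one score in [0, 1] per column; 1.0 means fully conserved."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for i in range(length):
        column = [seq[i].upper() for seq in alignment]
        scores.append(1.0 - column_entropy(column) / max_entropy)
    return scores


# Toy alignment (made up for illustration): four sequences, four columns.
alignment = ["MKVL", "MRVI", "MKAL", "MKV-"]
scores = conservation_scores(alignment)
for position, score in enumerate(scores, start=1):
    print(f"column {position}: {score:.3f}")
```

In this toy alignment the first column is invariant and so scores 1.0, while the remaining columns each contain one substitution and score lower.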


SurvCurv is a database of manually curated and annotated survival data in model organisms. The database offers various functions, including plotting, mathematical models and statistical tests, which facilitate tasks such as reanalysis and cross-comparison.


Below is a list of software packages developed by the group and available for download. All are free for academic use and require only completion of a Confidentiality Agreement. For non-academic institutions, some of the packages require an annual fee while others (e.g. PROCHECK) are free. Several of the programs were developed during the Thornton Group's time at University College London.


ArchSchema is a stand-alone Java program that generates dynamic plots of related Pfam domain architectures. See the ArchSchema web server (panel on the left) for a brief description.

Example ArchSchema graph


HBPLUS is a hydrogen bond calculation program for protein 3D structures. It accepts input files in PDB format.


LigPlot+ is a Java-based GUI front-end to the original LIGPLOT program for automatic generation of 2D ligand-protein interaction diagrams. The GUI allows on-screen editing of the plots via mouse click-and-drag operations. It is freely available to academic institutions.

LigPlot+ diagram of the PCP 301 ligand in PDB entry 1a95


NUCPLOT automatically generates schematic diagrams of protein-nucleic acid interactions for a given PDB file.

NUCPLOT diagram of protein-DNA interactions


PROCHECK checks the stereochemical quality of a protein structure, producing a number of PostScript plots analysing its overall and residue-by-residue geometry.

PROCHECK Ramachandran plot for PDB entry 1pro


PROCHECK-NMR is a version of PROCHECK (see above) specifically for analysing ensembles of protein structures, as solved by NMR.

Small Molecule Subgraph Detector

The Small Molecule Subgraph Detector (SMSD) is a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules. This helps find the similarity/distance between any two molecules. MCS is also used for screening drug-like compounds by finding molecules sharing a common subgraph (substructure).

Similarities and differences between ATP and ADP
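To make the Maximum Common Subgraph idea concrete, here is a minimal Python sketch that finds the largest atom-to-atom mapping between two tiny molecules by exhaustive backtracking and turns its size into a Tanimoto-style similarity. It is a toy under stated assumptions, not the algorithm SMSD implements: SMSD is a Java library, while the function names `max_common_subgraph` and `mcs_similarity`, the graph representation (atom labels plus index pairs for bonds) and the similarity formula are all chosen here for illustration. The search ignores bond orders, does not require the common subgraph to be connected, and is only practical for very small inputs.

```python
def max_common_subgraph(atoms_a, bonds_a, atoms_b, bonds_b):
    """Return the largest mapping {index_in_a: index_in_b} in which atom
    labels match and every mapped pair is bonded in both molecules or in
    neither (a maximum common induced subgraph, found by brute force)."""
    bonds_a = {frozenset(b) for b in bonds_a}
    bonds_b = {frozenset(b) for b in bonds_b}
    best = {}

    def compatible(mapping, i, j):
        if atoms_a[i] != atoms_b[j]:
            return False
        for p, q in mapping.items():
            if (frozenset((i, p)) in bonds_a) != (frozenset((j, q)) in bonds_b):
                return False
        return True

    def extend(i, mapping, used_b):
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        if i == len(atoms_a):
            return
        # Prune: mapping every remaining atom could not beat the best so far.
        if len(mapping) + (len(atoms_a) - i) <= len(best):
            return
        for j in range(len(atoms_b)):
            if j not in used_b and compatible(mapping, i, j):
                mapping[i] = j
                used_b.add(j)
                extend(i + 1, mapping, used_b)
                del mapping[i]
                used_b.remove(j)
        extend(i + 1, mapping, used_b)  # also try leaving atom i unmapped

    extend(0, {}, set())
    return best


def mcs_similarity(atoms_a, bonds_a, atoms_b, bonds_b):
    """Tanimoto-style score: |MCS| / (|A| + |B| - |MCS|), counted in atoms."""
    common = len(max_common_subgraph(atoms_a, bonds_a, atoms_b, bonds_b))
    return common / (len(atoms_a) + len(atoms_b) - common)


# Heavy-atom graphs only: ethanol (C-C-O) versus propane (C-C-C).
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propane = (["C", "C", "C"], [(0, 1), (1, 2)])

mapping = max_common_subgraph(*ethanol, *propane)
similarity = mcs_similarity(*ethanol, *propane)
print(len(mapping), similarity)  # prints "2 0.5"
```

The two molecules share a C-C fragment of two atoms, so the score is 2 / (3 + 3 - 2) = 0.5. A distance can be taken as one minus this value.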


The SURFNET program generates surfaces and void regions between surfaces from coordinate data supplied in a PDB file.

Protein surface (yellow) plus clefts (violet) in PDB entry 3er5: protein