Thornton Group

Computational biology of proteins and ageing

Our research uses protein 3D structural information to understand molecular evolution and how variants and small molecules can cause or modulate diseases and ageing. We have a strong focus on enzymes, the transformations they perform, their mechanisms, flexibility and how they evolve novel functions.

Please note: the Thornton Group is no longer able to take on new students, trainees or post-doctoral fellows.

We explore the structure, function and evolution of proteins. These basic studies help us to understand how proteins work and to interpret coding variations in humans and their impact on healthy ageing and disease. Our research is focused on three distinct but related areas:

  • We seek to understand how enzymes work and how they evolve to perform new enzyme functions, based on structural data. We have shown that most enzyme functions have evolved from other functions; this opens the path to rational design of novel enzymes with new functions and mechanisms. We also develop computational tools based on our analyses, to improve enzyme design. Our analysis of enzyme active sites reveals a high degree of structural conservation of the catalytic residues (shown by the grey outlines). This allows us to derive representative structural templates (black outlines, derived from PDB entry 1n4o) for each enzyme type. These can be used to search for related enzymes or pseudo-enzymes.
  • Our study of human coding variations aims to use protein structural knowledge to interpret their effects. Using the CATH domain database, we study variations occurring in related domains and explore how their genomic context influences the resultant phenotype.
  • Our work on enzymes and variants is ultimately related to human health, ageing and disease. Our goal is to trace the steps from the molecular protein variant or ligand, through its effect on the protein’s function, to the organismal ‘disease and ageing’ sub-phenotype. In our ageing studies, we aim to combine data over multiple organisms and multiple data types to define sub-phenotypes of ageing to improve our understanding of the molecular basis of ageing. We will explore why ageing makes us more susceptible to some diseases and use computational methods to identify small molecules that may have an impact on ageing.
Catalytic Templates for Class A Beta-lactamases. This shows a superposition of the active site residues taken from 244 related but ‘unique’ structures. Most residues superpose well, but the Ser/Thr residues (which occur in different members of the family) can be seen to adopt two alternative conformations in these structures.

Future projects

For our enzyme work, our central question is whether we can predict the evolution of enzyme function – both in terms of adapting to operate on new substrates and evolving new mechanisms. Can we relate changes in function to changes in the structure of the enzyme and changes in the environment? Can we automatically predict or validate enzyme catalytic mechanisms in silico from structural data? We will further develop our data resources (M-CSA) and websites (PDBsum) and develop novel methods to predict transformations and mechanisms using knowledge-based and deep-learning approaches.

For coding variants, we will enhance our web tool (VarSite) to relate variant, 3D structure and function to help non-experts understand the impact of coding variants and how they generate disease phenotypes. To address these questions, we plan to:

  • develop new methods to analyse the effects of mutations in ligand binding sites
  • explore variants in co-factor binding sites and their impact on function
  • apply our methods to specific genes of interest in collaboration with ‘domain’ experts
  • explore how the same mutations can cause many diseases, and how one disease can have many causes.

For ageing, we will develop tools to combine transcriptome data sets and analyse a small number of common diseases and the impact of ageing on their occurrence.

Selected publications

Laskowski RA, Stephenson JD, Sillitoe I, Orengo CA, Thornton JM. VarSite: Disease variants and protein structure (2020). Protein Science 29, 111-119

Ribeiro AJ, Tyzack JD, Borkakoti N, Thornton JM. Identifying pseudoenzymes using functional annotation: pitfalls of common practice (2020). The FEBS Journal 287, 4128-4140

Dönertaş HM, Fabian DK, Fuentealba Valenzuela MF, Partridge L, Thornton JM. Common genetic associations between age-related diseases (2020). Nature Ageing ref.

Laskowski RA, Thornton JM. PDBsum extras: SARS-CoV-2 and AlphaFold models (2022). Protein Science 31, 283-289

Thornton JM, Laskowski RA, Borkakoti N. AlphaFold heralds a data-driven revolution in biology and medicine (2021). Nature Medicine 27, 1666-1669

Data resources

ArchSchema

ArchSchema is a Java Web Start application that generates dynamic plots of related Pfam domain architectures. The protein sequences having each architecture can be displayed on the plot and separately listed. Where there is 3D structural information in the PDB, the relevant PDB codes can be shown on …

Atlas of Protein Side-Chain Interactions

This atlas depicts how amino acid side-chains pack against one another within the known protein structures. This packing, which is governed by the interactions between the 20 different types of side-chains, determines the structure, function, and stability of proteins.

Catalytic Site Atlas

CSA is a resource of catalytic sites and residues that have been identified in enzymes using structural data.

Cofactor database

Organic enzyme cofactors are involved in many enzyme reactions. Therefore, the analysis of cofactors is crucial to gain a better understanding of enzyme catalysis. To aid this, we have created the CoFactor database. It provides a web interface to access hand-curated data extracted from the literatur…

CSS

Searches a protein structure for likely catalytic sites

EC-PDB

This database contains the known enzyme structures that have been deposited in the Protein Data Bank (PDB).

FunTree

FunTree provides a range of data resources to detect the evolution of enzyme function within distant, structurally related clusters within domain superfamilies as determined by CATH. To access the resource, enter a specific CATH superfamily code or search for a structure / sequence / function (eithe…

LigSearch

Identifies small molecules likely to bind to a given protein

MACiE

Mechanism, Annotation and Classification in Enzymes. Query for an enzyme, and return enzyme mechanism.

PDBsum

This pictorial database provides an overview of macromolecular structures deposited in the Protein Data Bank archive.

PDBsum Generate

Protein structure analyses, including secondary structure determination, quality assessment, and protein-ligand, protein-protein and protein-DNA interactions.

PITA

Suggests the most likely biological unit for an X-ray structure of a protein

PoreLogo

Generates a logo showing conservation of pore-lining residues in transmembrane protein structures

PoreWalker

Detects and characterises transmembrane protein channels from their 3D structure

SAS

Annotation of a protein sequence with structural information from similar proteins in the PDB

SAS – Sequence Annotated by Structure

SAS is a tool for applying structural information to a given protein sequence. It uses FASTA to scan a given protein sequence against all the proteins of known 3D structure in the Protein Data Bank (PDB). The resultant multiple alignment can be coloured according to different structural features an…

Scorecons

Scores residue conservation based on a given multiple sequence alignment
