Proteins, structures & chemical biology
The work in Dr Alex Bateman's research group centres on the idea that there is a finite number of families of protein and RNA genes. The group aims to enumerate all of these families to understand how complex biological processes have evolved from a relatively small number of components. To collect and analyse these families of molecules, the Bateman group has produced a number of widely used biological database resources, such as Pfam, Rfam, TreeFam and MEROPS. It has also published many novel protein domains and families of particularly high interest.
The major goal of the Protein Data Bank in Europe (PDBe) is to provide integrated structural data resources that evolve with the needs of biologists. To that end, our team aims to: handle the deposition and annotation of structural data expertly; provide an integrated resource of high-quality macromolecular structures and related data; and maintain in-house expertise in all the major structure-determination techniques (X-ray crystallography, nuclear magnetic resonance spectroscopy and 3D electron microscopy). To transform the structural archives into a truly useful resource for biomedical and related disciplines, we focus our developments on five key areas: advanced services (e.g. PDBePISA, PDBeFold, PDBeMotif and the new PDB browsers); annotation, validation and visualisation of ligand data; integration with other biological and chemical data resources; validation and presentation of information about the quality and reliability of structural data; and exposing experimental data in ways that help all users understand the extent to which these data support the structural models and inferences.
The ChEMBL team, led by Dr John Overington, develops and manages EMBL-EBI’s database of quantitative small-molecule bioactivity data, focused on drug discovery. Although great progress has been made in developing biological drugs, synthetic small-molecule and natural-product-derived drugs still form the majority of novel life-saving drugs. The complexity and cost of discovering new drugs have recently risen to the point where public-private partnerships are being funded to investigate high-risk activities such as target validation. Central to such activities are data sharing and the wide availability of integrated structure, binding, functional and ADMET data. The ChEMBL database stores curated two-dimensional chemical structures and abstracted quantitative bioactivity data alongside calculated molecular properties. The majority of the ChEMBL data is derived by manual abstraction and curation from the primary scientific literature, and therefore covers a significant fraction of the structure–activity relationship (SAR) data for the discovery of modern drugs.
Our associated research focuses on mining ChEMBL data to address drug-discovery challenges.
Dr Christoph Steinbeck leads the Cheminformatics and metabolism service team, which runs a number of key services and develops algorithms to: process chemical information; predict metabolomes based on genomic and other information; determine the structure of metabolites by stochastic screening of large candidate spaces; and enable the identification of molecules with desired properties. This requires algorithms, based on machine learning and other statistical methods, that predict spectroscopic and other physicochemical properties of molecules represented as chemical graphs. Dr Steinbeck also leads a research group that focuses on understanding the small-molecule metabolism of living organisms. The group is interested in the analysis of metabolomics experiments, including methods for computer-assisted structure elucidation of biological metabolites and metabolic pathways. They develop and maintain chemistry-related databases of biological interest, and develop machine-learning methods for predicting mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectra for use in dereplication and structure elucidation. The methods and algorithms developed in the group are available through an open-source library for structural chemo- and bioinformatics.
Dr Sarah Teichmann's research group seeks to elucidate general principles of gene expression and protein complex assembly. They study protein complexes in terms of their three-dimensional structure, their structural evolution, and the principles underlying protein complex formation and organisation. Another major focus is understanding the regulation of gene expression during switches in cell state; in their wet lab at the Wellcome Trust Sanger Institute, the group uses mouse T-helper cells as a model of cell differentiation.
Prof Dame Janet Thornton's research group seeks to understand more about how biology works at the molecular level, with a particular focus on proteins and their 3D structure and evolution. They explore how enzymes perform catalysis by gathering relevant data from the literature and developing novel software tools to characterise enzyme mechanisms. In parallel, they investigate the evolution of these enzymes to discover how they can evolve new mechanisms and specificities. In close collaboration with colleagues at University College London (UCL), the group seeks to improve the prediction of function from sequence and structure, to enable the design of new proteins or small molecules with novel functions, and to understand more about the molecular basis of ageing in different organisms.
Dr Pedro Beltrao's group seeks to understand the function and evolution of cellular networks by exploring how genetic variability is propagated through molecules, structures and interaction networks to give rise to phenotypic variability. The group focuses on two areas: the evolution of chemical–genetic interactions in different species and individuals; and the function and evolution of post-translational regulatory networks. There is a strong emphasis on collaborating with experimental groups, both for data acquisition and for follow-up studies.
Dr John Marioni's research group develops statistical and computational methods for analysing the vast amounts of data generated by high-throughput genomics experiments, with the aim of gaining a deeper understanding of complex biological processes such as gene regulation. Their work focuses primarily on modelling variation in gene expression levels in different contexts: between individual cells from the same tissue; across different samples taken from the same tumour; and at the population level, where a single, large sample of cells is taken from the organism and tissue of interest. Working with experimental colleagues within and beyond EMBL, the group applies its methods to biological questions ranging from the regulation of mammalian gene expression levels to brain development in a marine annelid.
Dr Julio Saez Rodriguez's research group creates mathematical models that integrate a range of data (from genomic to biochemical) with various sources of prior knowledge, with an emphasis on both predicting the outcomes of new experiments and providing insights into how signalling networks function. Working closely with experimental colleagues, the group combines statistical methods with models that describe the mechanisms of signal transduction either as logical or as physicochemical systems. They develop new tools, integrate them with existing resources and use them to explore how signalling is altered in human disease, with the aim of predicting effective therapeutic targets.
Dr Oliver Stegle's research group uses computational approaches to unravel the genotype–phenotype map on a genome-wide scale. Their work focuses on developing and applying statistical methodology to dissect the causes of molecular variation. The group has shown how comprehensive modelling can greatly improve the statistical power to find genetic associations with gene expression levels, and can enhance the interpretation of the interplay between genetic variation, transcriptional regulation and molecular traits. They address these methodological questions through close collaborations with experimental groups, applying novel statistical tools to study molecular traits in model organisms, plant systems and biomedical applications.
Genes & gene expression
Dr Ewan Birney is co-Associate Director of EMBL-EBI and shares strategic oversight of bioinformatics services. His research group focuses on sequence algorithms and on using intra-species variation to explore elements of basic biology. The Birney group has a long-standing interest in developing sequencing algorithms, with considerable focus on theoretical and practical implementations of data compression techniques. Dr Birney's "blue skies" research includes collaborating with Dr Nick Goldman on a method to store digital data in DNA molecules. The Birney group continues to be involved in this area as new opportunities arise, including the application of new sequencing technologies.
We are also interested in the interplay of natural DNA sequence variation with cellular assays and basic biology. Over the past five years there has been a tremendous increase in the use of genome-wide association studies of human diseases. However, this approach is very general and need not be restricted to human disease. Association analysis can be applied to nearly any measurable phenotype in a cellular or organismal system for which an accessible, outbred population is available. We are pursuing association analysis for a number of molecular traits (e.g. RNA expression levels and chromatin levels) and basic biology traits in species for which favourable populations are available, including human and Drosophila. In the future we hope to expand this to a variety of other basic biological phenotypes in other species, including establishing the first vertebrate near-isogenic wild panel in the Japanese rice paddy fish (medaka, Oryzias latipes).
Dr Alvis Brazma's research group complements the Functional Genomics service team and focuses on developing new methods and algorithms and on integrating new types of data across multiple platforms. The group is particularly interested in cancer genomics and transcript isoform usage, and collaborates closely with the Marioni group and others throughout EMBL.
Dr Anton Enright's research group aims to predict and describe the functions of genes, proteins and regulatory RNAs, as well as their interactions in living organisms. Regulatory RNAs have recently entered the limelight, as the roles of a number of novel classes of non-coding RNAs have been uncovered. The group's work involves the development of algorithms, protocols and datasets for functional genomics, with a focus on determining the functions of regulatory RNAs, including microRNAs, piRNAs and long non-coding RNAs. The Enright group collaborates extensively with experimental laboratories, both on commissioning experiments and on analysing the resulting data.
Dr Paul Flicek's research group focuses on computational methods for genome annotation and evolution, using models that incorporate DNA–protein interactions, epigenetic modifications and the DNA sequence itself. The group is also interested in the large-scale infrastructure required for modern bioinformatics, including storage and access methods for high-throughput sequencing data.
Dr Nick Goldman's research group centres on three main research activities: developing new evolutionary models and methods; providing these methods to other scientists via stand-alone software and web services; and applying such techniques to tackle biological questions of interest. We participate in comparative genomic studies, both independently and in collaboration with others, including the analysis of next-generation sequencing (NGS) data. This vast source of new data promises great gains in understanding genomes and brings with it many new challenges.