Zerbino group

High-throughput machine learning and gene expression regulation

We aim to leverage the power of high-throughput processing and novel machine learning techniques to integrate large collections of available experiments and better understand the mechanisms of gene expression regulation. Better knowledge of regulatory elements along the genome is of huge importance both for molecular biology and translational biomedical applications. These genomic elements have been demonstrated to be critical in understanding development, common as well as Mendelian traits and diseases, and evolution. Having a detailed map of regulatory regions and networks will allow us to gain higher confidence in determining the aetiology of diseases and help accelerate medical research. We are therefore digging into the structure, evolution and function of gene regulatory networks, and investigating their role in the path from genotype to phenotype.

Ensembl Regulatory Build

Future plans

Defining regulatory elements

To locate and identify regulatory elements, we first need to define what constitutes one. Existing definitions of regulatory elements focus on different aspects, such as sequence patterns, evolutionary selection, gene expression, genetic causality on protein binding or chromatin marks. We are therefore looking into characterising regulatory elements and their possible functional components.

Defining cis-regulatory interactions

Beyond enumerating regulatory regions on the genome, it is necessary to determine their function, in particular the genes that they effectively regulate. Epigenomics provides a reductionist decomposition of cis-regulation, based on a chain of empirical measurements: transcription factor expression, openness of a regulatory element, existence of a transcription factor binding site, effective transcription factor binding to the regulatory element, interaction between the regulatory element and the target promoter, and expression of the target gene. Is the intersection of all these observations a better predictor of gene expression patterns than co-expression networks?
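As a purely illustrative sketch of what "intersecting" this chain of observations could mean, the Python snippet below treats each measurement as a yes/no call for one transcription factor, regulatory element and target gene in one tissue. The class and function names are hypothetical, and using the five upstream observations to predict the sixth (target expression) is an assumption made for the example, not a description of the group's method.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CisEvidence:
    """Hypothetical yes/no calls for one (TF, element, gene) triple in one tissue."""
    tf_expressed: bool
    element_open: bool
    binding_site_present: bool
    tf_bound: bool
    element_contacts_promoter: bool
    target_expressed: bool


def upstream_chain_holds(ev: CisEvidence) -> bool:
    """True only when every observation upstream of target expression is positive."""
    return all((
        ev.tf_expressed,
        ev.element_open,
        ev.binding_site_present,
        ev.tf_bound,
        ev.element_contacts_promoter,
    ))


def prediction_agrees(ev: CisEvidence) -> bool:
    """Assumed rule: predict the target is expressed iff the whole chain holds."""
    return upstream_chain_holds(ev) == ev.target_expressed


def agreement_rate(records: Iterable[CisEvidence]) -> float:
    """Fraction of records where the intersection rule matches observed expression."""
    records = list(records)
    if not records:
        raise ValueError("at least one record is required")
    return sum(prediction_agrees(r) for r in records) / len(records)
```

An agreement rate computed this way could then be set against the same score for a co-expression-based predictor on the same triples, which is the comparison the question above asks for.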

Functional annotation of the gene expression regulatory network

Having in effect characterised the function of each regulatory element as a connector between a set of transcription factors and a target gene in a particular tissue or developmental stage, we could test these annotations on practical questions, in genome-wide association studies as well as in comparative genomics.

Selected publications

Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters.

Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, Cairns J, Wingett SW, Várnai C, Thiecke MJ, Burden F, Farrow S, Cutler AJ, Rehnström K, Downes K, Grassi L, Kostadima M, Freire-Pritchett P, Wang F, BLUEPRINT Consortium, Stunnenberg HG, Todd JA, Zerbino DR, Stegle O, Ouwehand WH, Frontini M, Wallace C, Spivakov M, Fraser P. Cell Volume 167 (2016) p.1369-1384.e19

The International Human Epigenome Consortium: A Blueprint for Scientific Collaboration and Discovery.

Stunnenberg HG, International Human Epigenome Consortium, Hirst M. Cell Volume 167 (2016) p.1145-1149

The Ensembl regulatory build.

Zerbino DR, Wilder SP, Johnson N, Juettemann T, Flicek PR. Genome Biol Volume 16 (2015) p.56

WiggleTools: parallel processing of large collections of genome-wide datasets for visualization and statistical analysis.

Zerbino DR, Johnson N, Juettemann T, Wilder SP, Flicek P. Bioinformatics Volume 30 (2014) p.1008-1009