Computational and evolutionary genomics

Figure (Marioni group, EMBL-EBI): Classifying genes by their regulatory function. We used RNA-seq data generated from F0 mice and their F1 hybrids to classify genes into sets according to their regulatory mechanism. [Goncalves et al., Genome Research 2012.]

Gene expression levels play a critical role in evolution, developmental processes and disease progression. Variability in the transcriptional landscape can help explain phenotypic differences both between and within species; for example, differential expression of the Tan gene between North American Drosophila species underlies divergence of pigmentation, while variation in expression levels of the MMP3 gene within the human population alters both vascular tissue remodelling and risk of developing atherosclerosis. As a result, identifying and characterising the regulatory mechanisms responsible for changes in gene expression is critically important. Recently, the advent of next-generation sequencing technology has revolutionised our ability to do this. By facilitating the generation of unbiased, high-resolution maps of genomes, transcriptomes and regulatory features such as transcription factor binding sites, these new experimental techniques have given rise to a detailed view of gene expression regulation in both model and, importantly, non-model organisms.

To make the most of these technological developments, it is essential to develop effective statistical and computational methods for analysing the vast amounts of data generated. Only by combining experimental and computational biology will we be able to truly understand complex biological processes such as gene regulation. With this in mind, my group develops computational methods for interrogating high-throughput genomics data. Our work focuses primarily on modelling variation in gene expression levels in different contexts: between individual cells from the same tissue; across different samples taken from the same tumour; and at the population level, where a single, large sample of cells is taken from the organism and tissue of interest. We apply these methods to a range of biological questions, from the regulation of gene expression levels in a mammalian system to the development of the brain in a marine annelid. In all of these projects we collaborate with outstanding experimental groups, both within and outside EMBL. Together, we frame biological questions of interest, design studies, and analyse and interpret the data generated.

Future plans

We will work with our experimental collaborators to apply our methods to relevant and important biological questions. From a computational perspective, modelling single-cell transcriptomics data will increase in importance. Methods for storing, visualising, interpreting and analysing the data generated will be critical if we are to exploit these data to the fullest extent. We will also work on methods for analysing conventional next-generation sequencing data, building on work that we have performed previously.

Selected publications

Goncalves, A. et al. (2012) Extensive compensatory cis-trans regulation in the evolution of mouse gene expression. Genome Res 22, 2376-2384.

Fonseca, N.A. et al. (2012) Tools for mapping high-throughput sequencing data. Bioinformatics 28, 3169-3177.

Kim, J.K. and Marioni, J.C. (2013) Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol (in press).