Computational and evolutionary genomics
Gene expression levels play a critical role in evolution, developmental processes and disease progression. Variability in the transcriptional landscape can help explain phenotypic differences both between and within species. For example, differential expression of the tan gene between North American Drosophila species underlies divergence in pigmentation, while variation in expression of the MMP3 gene within the human population alters both vascular tissue remodelling and the risk of developing atherosclerosis. Identifying and characterising the regulatory mechanisms responsible for changes in gene expression is therefore of central importance. Next-generation sequencing technology has recently revolutionised our ability to do this. By enabling the generation of largely unbiased, high-resolution maps of genomes, transcriptomes and regulatory features such as transcription factor binding sites, these experimental techniques have provided a detailed view of gene expression regulation in model and, importantly, non-model organisms.
To make the most of these technological developments, it is essential to develop effective statistical and computational methods for analysing the vast amounts of data generated. Only by combining experimental and computational biology will we be able to truly understand complex biological processes such as gene regulation. With this in mind, my group develops computational methods for interrogating high-throughput genomics data. Our work focuses primarily on modelling variation in gene expression levels in three contexts: between individual cells from the same tissue; across different samples taken from the same tumour; and at the population level, where a single, large sample of cells is taken from the organism and tissue of interest. We apply these methods to a range of biological questions, from the regulation of gene expression levels in a mammalian system to the development of the brain in a marine annelid. In all of these projects we collaborate with outstanding experimental groups, both within and outside EMBL. Together, we frame biological questions of interest, design studies, and analyse and interpret the resulting data.
Future plans
We will work with our experimental collaborators to apply our methods to important biological questions. From a computational perspective, modelling single-cell transcriptomics data will become increasingly important. Methods for storing, visualising, analysing and interpreting these data will be critical if we are to exploit them to the fullest extent. We will also continue to develop methods for analysing conventional next-generation sequencing data, building on our previous work.

![Classification of genes according to their pattern of gene expression. The average log2 expression fold change between the alleles in the hybrids (F1 crosses between C57BL/6J and CAST/EiJ) and between the parental strains (F0s: CAST/EiJ and C57BL/6J) is plotted on the x- and y-axes, respectively. (A) Genes whose expression levels have not diverged between the two strains are classified as conserved (coloured black), while genes whose expression has diverged are classified as cis, trans, or cis and trans, according to whether the divergence is explained by at least one regulatory variant acting in cis (coloured yellow), at least one acting in trans (coloured red), or at least two regulatory variants, one in cis and one in trans (coloured purple). (B) Subdivision of the cis and trans category. The regulatory variants can change gene expression in the same direction, with the cis variant having a stronger effect on expression than the trans variant (blue), or the trans variant having a stronger effect than the cis variant (green). Expression changes can also be in opposite directions, with the cis variant having a stronger effect than the trans variant (brown), or the trans variant having a stronger effect than the cis variant (orange). [Taken from Goncalves et al., Genome Research 2012.] Marioni group figure (EMBL-EBI)](http://www.ebi.ac.uk/sites/ebi.ac.uk/files/images/Research/Marioni/Marioni_figure_220.jpg)
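The classification in the figure rests on a standard decomposition: in F1 hybrids both alleles share the same trans environment, so the allelic log2 ratio reflects cis effects, while the parental (F0) log2 ratio combines cis and trans effects; the trans component can therefore be estimated as the F0 ratio minus the F1 ratio. The Python sketch below illustrates that logic under loudly stated assumptions. It is not the method used by Goncalves et al.: it replaces statistical testing with a single hypothetical |log2 fold change| cut-off (`threshold=1.0`), assumes both ratios are oriented the same way (C57BL/6J over CAST/EiJ), calls a gene conserved when neither component passes the cut-off, and uses invented gene names and values.

```python
"""Simplified cis/trans classification of expression divergence.

Hypothetical sketch: a fixed log2 fold-change threshold stands in for
the statistical testing a real analysis would use.
"""
from dataclasses import dataclass


@dataclass
class GeneFoldChanges:
    name: str
    f0_log2fc: float  # parental log2 ratio (assumed C57BL/6J / CAST/EiJ): cis + trans
    f1_log2fc: float  # allelic log2 ratio in F1 hybrids (same orientation): cis only


def classify(gene: GeneFoldChanges, threshold: float = 1.0) -> str:
    """Assign a regulatory category using a hypothetical |log2FC| threshold."""
    cis = gene.f1_log2fc
    # Trans effect: the part of the parental divergence not explained by cis
    trans = gene.f0_log2fc - gene.f1_log2fc
    cis_detected = abs(cis) >= threshold
    trans_detected = abs(trans) >= threshold

    if not cis_detected and not trans_detected:
        return "conserved"
    if cis_detected and not trans_detected:
        return "cis"
    if trans_detected and not cis_detected:
        return "trans"

    # Both components detected: subdivide by direction and relative strength,
    # mirroring panel (B). Ties in magnitude are assigned to "trans stronger".
    direction = "same" if (cis > 0) == (trans > 0) else "opposite"
    stronger = "cis" if abs(cis) > abs(trans) else "trans"
    return f"cis and trans ({direction} direction, {stronger} stronger)"


if __name__ == "__main__":
    # Invented example values, for illustration only
    genes = [
        GeneFoldChanges("geneA", f0_log2fc=0.2, f1_log2fc=0.1),
        GeneFoldChanges("geneB", f0_log2fc=2.0, f1_log2fc=1.8),
        GeneFoldChanges("geneC", f0_log2fc=-1.5, f1_log2fc=0.0),
        GeneFoldChanges("geneD", f0_log2fc=4.0, f1_log2fc=2.5),
        GeneFoldChanges("geneE", f0_log2fc=0.5, f1_log2fc=2.0),
        GeneFoldChanges("geneF", f0_log2fc=3.0, f1_log2fc=1.0),
    ]
    for g in genes:
        print(g.name, classify(g))
```

For instance, geneD (cis 2.5, trans 1.5) is reported as "cis and trans (same direction, cis stronger)", the group shown in blue in panel (B), while geneE (cis 2.0, trans -1.5) lands in the opposite-direction, cis-stronger group (brown). Keeping the two fold changes as separate inputs makes it easy to swap the fixed threshold for a proper per-gene significance test.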