Linking genetic predictors of high-dimensional imaging phenotypes and disease outcomes

EBPOD 2017: Project 8

This is one of 11 joint postdoctoral fellowships offered by EMBL-EBI, the NIHR Cambridge Biomedical Research Centre and the University of Cambridge’s School of the Biological Sciences in 2017.

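Principal Investigators

Dr Stephen Burgess (MRC Biostatistics Unit, University of Cambridge) and Dr Ewan Birney (European Bioinformatics Institute)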

Genetic variants can be used as anchors to link phenotypic variables (for example, blood pressure, lipid levels) to disease outcomes (for example, heart attack rates). This approach, known as “Mendelian randomization”, enables causal inferences to be made from observational data. It has been used in a wide range of contexts, primarily in epidemiology, but also in psychology, social sciences, economics, and other fields. 

As an example, genetic variants in the HMGCR gene region are associated with reduced levels of low-density lipoprotein cholesterol (LDL-cholesterol, informally known as “bad cholesterol”). The biological pathway associated with the HMGCR gene region is the one targeted by statin drugs, so genetic variants in this region can be used as naturally occurring proxies for statin therapy. The impact of statin therapy on various diseases can then be assessed by testing the association of these genetic variants with disease outcomes. Variants in the HMGCR gene region that are associated with lower LDL-cholesterol are also associated with reduced risk of coronary heart disease, which suggests that taking statins will reduce the risk of coronary heart disease.

Figure 1 shows similar associations, scaled to a 1 mmol/L decrease in LDL-cholesterol, for variants in other lipid-related gene regions. These suggest that other LDL-cholesterol lowering mechanisms are associated with a similar reduction in coronary heart disease risk. The same variants are all also associated with Type 2 diabetes risk, but in the opposite direction. This suggests that statin drugs increase the risk of Type 2 diabetes (as has been demonstrated in trials), and that novel NPC1L1-inhibitors (such as ezetimibe) will increase diabetes risk to a greater extent per 1 mmol/L change in LDL-cholesterol.
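The scaling to a 1 mmol/L decrease in LDL-cholesterol described above can be illustrated with the ratio (Wald) estimate, a standard single-variant Mendelian randomization estimator. The sketch below is illustrative only: the function name and all numbers are hypothetical and are not taken from Figure 1 or from any real variant.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Ratio (Wald) estimate for a single genetic variant.

    beta_exposure: per-allele association with the risk factor (e.g. mmol/L LDL-C)
    beta_outcome:  per-allele association with the outcome (e.g. log odds of CHD)
    se_outcome:    standard error of beta_outcome

    Returns the estimated change in the outcome per unit increase in the
    risk factor, with a first-order standard error that ignores the
    uncertainty in beta_exposure.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical variant: each allele lowers LDL-C by 0.1 mmol/L and lowers
# the log odds of coronary heart disease by 0.04 (standard error 0.01).
estimate, se = wald_ratio(beta_exposure=-0.1, beta_outcome=-0.04, se_outcome=0.01)

# Rescale to an odds ratio per 1 mmol/L *decrease* in LDL-C.
odds_ratio_per_mmol_decrease = math.exp(-estimate)
print(f"log odds per 1 mmol/L increase: {estimate:.2f} (SE {se:.2f})")
print(f"odds ratio per 1 mmol/L decrease: {odds_ratio_per_mmol_decrease:.2f}")
```

Putting every variant on the same per-1-mmol/L scale is what allows variants from different gene regions, each with a different per-allele effect on LDL-cholesterol, to be compared directly.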

Recent advances in statistical methodology have focused on obtaining reliable causal inferences in scenarios where the genetic variants may not have specific associations with the trait under investigation. In order to implicate a putative risk factor as causal, one needs a genetic variant that is specifically associated with that risk factor and that does not have extraneous (pleiotropic) associations with alternative risk factors. Formally speaking, such a genetic variant needs to satisfy the assumptions of an instrumental variable. Recent theoretical work has concentrated on specifying less restrictive assumptions that can still provide consistent causal estimates when there are multiple genetic variants that influence the risk factor.

The UK Biobank (UKBB) is a fertile source of genetic variants that can be linked with disease outcomes in further datasets. UKBB is an open-access data resource consisting of 500 000 individuals, aged between 40 and 69 at recruitment, who have been “deeply phenotyped” as well as genotyped. Data are available on around 800 000 measured genetic variants and on large numbers of risk factors. Once genetic variants have been identified, their associations with disease outcomes can be looked up in publicly available datasets, such as those of the CARDIoGRAMplusC4D consortium (comprising around 60 000 coronary heart disease cases and 110 000 controls) and the DIAGRAM consortium (comprising around 22 000 Type 2 diabetes cases and 58 000 controls).
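As an illustration of combining multiple genetic variants from summary data of this kind, the sketch below contrasts the inverse-variance weighted (IVW) estimate, which assumes every variant is a valid instrument, with MR-Egger regression, one example of a method that relaxes the no-pleiotropy assumption by allowing a non-zero intercept. The source does not name specific methods, so the choice of these two is an assumption; the function names and the toy data are hypothetical.

```python
def ivw_estimate(bx, by, se_by):
    """Fixed-effect inverse-variance weighted estimate.

    Equivalent to a weighted regression of by on bx through the origin,
    with weights 1 / se_by**2. Returns (estimate, standard error).
    """
    num = sum(x * y / s**2 for x, y, s in zip(bx, by, se_by))
    den = sum(x**2 / s**2 for x, s in zip(bx, se_by))
    return num / den, den ** -0.5


def mr_egger(bx, by, se_by):
    """MR-Egger: the same weighted regression, but with an intercept.

    Variants are first oriented so that each exposure association is
    positive. A non-zero intercept indicates directional pleiotropy.
    Returns (intercept, slope).
    """
    sign = [1.0 if x >= 0 else -1.0 for x in bx]
    x = [s * v for s, v in zip(sign, bx)]
    y = [s * v for s, v in zip(sign, by)]
    w = [1.0 / s**2 for s in se_by]

    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    slope = (sw * swxy - swx * swy) / (sw * swxx - swx**2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Toy data: true causal effect 0.5, plus a constant pleiotropic effect of
# 0.02 on the outcome for every variant (all values hypothetical).
bx = [0.1, 0.2, 0.3, 0.4]
by = [0.02 + 0.5 * x for x in bx]
se_by = [0.01] * 4

ivw, ivw_se = ivw_estimate(bx, by, se_by)
egger_intercept, egger_slope = mr_egger(bx, by, se_by)
print(f"IVW estimate: {ivw:.3f}")
print(f"MR-Egger intercept: {egger_intercept:.3f}, slope: {egger_slope:.3f}")
```

In this toy setting the IVW estimate is biased away from 0.5 by the pleiotropic term, while the MR-Egger slope recovers it and the intercept absorbs the pleiotropy. Real analyses would also need standard errors for the MR-Egger terms and checks of the method's own assumptions.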

Project outline

As well as single risk factors, UKBB has also measured high-dimensional imaging phenotypes in a subset of individuals. For example, brain scans, heart scans and retinal scans have been taken for around 100 000 individuals. The aim of this project is to take existing methods for Mendelian randomization and extend them to the scenario of a high-dimensional phenotype. The Birney group has growing expertise in high-dimensional human imaging data for both hearts and retinas. The Burgess group has expertise in developing Mendelian randomization techniques to trace causal pathways.

A multi-dimensional imaging phenotype brings additional complications: multiple testing, the possibility of sparse modelling, and high correlation between individual measures of the trait. However, it could add considerable insight into which aspects of a complex phenotype are implicated in disease processes. Our goal would be to find robust image-based components that lie on causal pathways towards disease processes, thus explicitly implicating the relevant organ or substructure in disease outcomes.

A successful candidate for this project would be supervised by Dr Stephen Burgess (MRC Biostatistics Unit, University of Cambridge) and Dr Ewan Birney (European Bioinformatics Institute) and would divide their time between the two institutions. They would gain wide experience both in the development of statistical methodology and in its application to applied research.
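To illustrate the multiple-testing complication noted in the project outline, the sketch below applies the Benjamini-Hochberg false discovery rate procedure to causal estimates for several image-derived features. This is a generic illustration under stated assumptions, not the method the project proposes: the feature names, estimates and standard errors are all hypothetical, and the procedure's guarantees are weaker when tests are strongly correlated, as imaging measures typically are.

```python
import math


def two_sided_p(estimate, se):
    """Two-sided p-value for estimate / se under a standard normal."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2))


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Finds the largest rank k such that the k-th smallest p-value is at
    most (k / m) * fdr, then rejects the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical causal estimates (with standard errors) for four
# image-derived features; names and numbers are invented for illustration.
features = {
    "feature_a": (0.40, 0.10),
    "feature_b": (0.05, 0.10),
    "feature_c": (-0.30, 0.12),
    "feature_d": (0.10, 0.09),
}

p_values = [two_sided_p(est, se) for est, se in features.values()]
flags = benjamini_hochberg(p_values, fdr=0.05)
for name, p, flag in zip(features, p_values, flags):
    print(f"{name}: p = {p:.4f}, rejected = {flag}")
```

With thousands of correlated image features, a per-feature approach like this becomes less attractive, which is one motivation for working with a smaller set of image-based components or sparse models instead.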