
Multi-modal analysis of the evolution of genomically unstable pancreatic cancer in murine and 3D organoid models

EBPOD 2017: Project 5

This is one of 11 joint postdoctoral fellowships offered by EMBL-EBI, the NIHR Cambridge Biomedical Research Centre and the University of Cambridge’s School of the Biological Sciences in 2017.

Principal Investigators


The unprecedented (epi)genomic complexity of cancers, and the diversity of the mechanisms underlying their initiation and progression in different tissues, present major obstacles to both biological understanding and clinical intervention. In this context, genetically engineered murine models that autochthonously recapitulate spontaneous carcinogenesis in different tissues allow cancer evolution to be analysed longitudinally in an isogenic background, making it possible to identify the molecular or cellular events that occur during progression [1]. Dr. Venkitaraman's laboratory has created a series of novel genetically engineered murine models (10 new "mutator" strains, unpublished) in which pancreatic cancer develops in the context of different forms of genomic instability. In these models, cancer evolution proceeds in an accelerated manner that recapitulates key steps in the progression of the human disease. Such models offer a valuable interim technological platform for the discovery and clinical translation of new approaches to cancer, particularly when coupled to 3D organoid models derived from murine tissue, as well as from precancerous or malignant human tissue, which are being developed with collaborators. The murine and organoid systems compensate for each other's inherent limitations.

Understanding the steps in cancer evolution requires integrating information across biological scales, through multi-modal analyses spanning genomic, transcriptomic and proteomic events. Because pre-malignant and malignant tissue can be sampled longitudinally, these models offer a powerful resource for distinguishing rate-limiting events from “noise”. In proteomics in particular, novel techniques such as data-independent acquisition (DIA) can complement current genomic approaches, opening the way to personalized “multi-omics” analysis. Leveraging this capability requires the development of novel methods for data analysis, integration and visualization.


The main objective of this project is to substantially improve the understanding of steps in pancreatic cancer evolution by studying its longitudinal progression. To achieve this, we will perform an innovative multistage analysis of ‘multi-omics’ data generated from numerous murine and 3D organoid models, involving genomic, transcriptomic, and proteomic approaches. It is expected that the novel data analysis and data integration approaches developed in this work will be applicable to other types of cancer in the future.

Available datasets

Dr. Venkitaraman's group has begun to generate snapshots of the longitudinal progression of KRAS-driven pancreatic cancer in genetically engineered murine models (GEMMs). In these GEMMs, expression of mutant KRAS G12D, with or without mutant p53, is conditionally induced only in the pancreas. This has been combined with different alleles that wholly or partially inactivate tumour suppressors (e.g., BRCA2, PALB2) whose inactivation is known to drive genome instability in human pancreatic ductal adenocarcinoma (PDAC). In addition, strains with dysregulated expression of the mitotic kinase AURKA, which is overexpressed in more than 30% of human PDAC, are under development. Together, these strains offer a powerful and well-controlled model in which to identify key steps in PDAC evolution.

Samples of pre-malignant and malignant tissues from the GEMMs are available at different stages of evolution, and characterization of their genomic and transcriptomic changes has begun. Preliminary work in collaboration with Ruedi Aebersold’s group in Zurich is generating proteomic and phospho-proteomic profiles using DIA methods such as SWATH-MS. Completing these multi-modal analyses promises to provide a unique profile of the longitudinal progression of KRAS-driven PDAC.

In addition, steps are underway to develop human PDAC models using 3D organoid cultures derived from patient samples obtained by surgical colleagues working with Dr. Venkitaraman's laboratory, in collaboration with Mathew Garnett (Sanger Institute) and Raj Chopra (Institute of Cancer Research, London). These models will enable the validation of insights generated in murine models.

Work plan

Data analysis pipelines for proteomic DIA approaches (e.g. SWATH-MS) will be developed at EMBL-EBI using existing free-to-use (ideally open-source) software. First, proteomic data (total proteome and phospho-proteome) will be analysed in a multi-stage approach using the spectral library-based software OpenSWATH [2] (available via OpenMS), together with post-processing tools such as MSstats and SWATH2stats.
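To illustrate one kind of post-processing involved, the sketch below estimates q-values with a simple target-decoy approach: decoy peak groups are scored alongside real (target) ones, and the decoy/target ratio above a score threshold approximates the false discovery rate (FDR). This is a minimal sketch under stated assumptions, not the scoring used by OpenSWATH, MSstats or SWATH2stats; the `PeakGroup` record, the convention that higher scores are better, and one peak group per peptide are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class PeakGroup:
    """Hypothetical minimal record for one scored peak group (not an OpenSWATH format)."""
    peptide: str
    score: float     # assumed convention: higher score = more confident
    is_decoy: bool


def target_decoy_qvalues(groups: list[PeakGroup]) -> dict[str, float]:
    """Return a q-value for each target peptide.

    FDR at a threshold is estimated as (#decoys scoring >= threshold) /
    (#targets scoring >= threshold); a target's q-value is the smallest
    estimated FDR at any threshold that still accepts it.
    Assumes one peak group per peptide.
    """
    ranked = sorted(groups, key=lambda g: g.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for g in ranked:
        if g.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Running minimum from the lowest score upwards makes q-values monotone.
    qvals = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return {g.peptide: q for g, q in zip(ranked, qvals) if not g.is_decoy}


def filter_at_fdr(groups: list[PeakGroup], threshold: float = 0.01) -> list[str]:
    """Target peptides accepted at the given q-value threshold (sorted by name)."""
    qvals = target_decoy_qvalues(groups)
    return sorted(p for p, q in qvals.items() if q <= threshold)
```

In practice the scores would come from a semi-supervised scoring step rather than a single raw value; the sketch only shows how decoys turn scores into an FDR-controlled list.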

Spectral libraries will be built from shotgun proteomic data generated in parallel (analysed with existing in-house pipelines that take advantage of the corresponding genomic and transcriptomic data), using the high-performance PRIDE Cluster spectrum clustering algorithm [3]. Because the quality of the spectral library is a key determinant of the analysis, we will also explore using relevant public proteomic datasets available in the PRIDE database to improve it. In addition, SWATH-MS analysis pipelines using ‘data-centric’ approaches (e.g. DIA-Umpire [4]) and hybrid approaches (combining spectral library-based and data-centric strategies) will be explored. We expect this iterative analysis to substantially improve the coverage of the existing proteome data and therefore to enable the discovery of specific variant proteins in pancreatic cancer.
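Spectrum clustering groups repeated spectra of the same peptide so that a consensus spectrum can serve as a library entry. The sketch below is a deliberately simplified greedy clustering, assuming unit-width m/z bins, cosine similarity, a 0.5 m/z precursor tolerance and a 0.8 similarity threshold (all hypothetical parameters). It only illustrates the idea and is not the PRIDE Cluster implementation.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins, returning a sparse {bin: intensity} vector."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // bin_width)] += intensity
    return dict(vec)


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length (unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def greedy_cluster(spectra, threshold=0.8, precursor_tol=0.5, bin_width=1.0):
    """Greedily cluster (precursor_mz, peaks) spectra.

    Each spectrum joins the most similar existing cluster whose precursor m/z is
    within precursor_tol and whose consensus has cosine >= threshold; otherwise it
    starts a new cluster. The consensus is the sum of the members' unit-normalized
    binned spectra, and a cluster keeps the precursor m/z of its first member.
    """
    clusters = []
    for idx, (precursor_mz, peaks) in enumerate(spectra):
        vec = normalize(bin_spectrum(peaks, bin_width))
        best, best_score = None, threshold
        for cluster in clusters:
            if abs(cluster["precursor_mz"] - precursor_mz) > precursor_tol:
                continue
            score = cosine(vec, cluster["consensus"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"precursor_mz": precursor_mz, "members": [idx],
                             "consensus": dict(vec)})
        else:
            best["members"].append(idx)
            for k, v in vec.items():
                best["consensus"][k] = best["consensus"].get(k, 0.0) + v
    return clusters
```

Summing unit-normalized spectra keeps a single very intense spectrum from dominating the consensus; since cosine similarity ignores scale, the sum behaves like an average for matching purposes.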

Quantitative proteomic analysis will then be performed on the total proteome and phospho-proteome data using the pipelines described above. Having both data types will make it possible to identify proteins whose abundance and phosphorylation levels change unambiguously as a result of the disease, and hence potential drug targets. We will also study the correlation between gene expression levels (from the transcriptomic data) and the calculated protein abundances.
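The two comparisons described above could be sketched as follows: a per-gene Spearman rank correlation between transcript and protein levels across matched samples, and a log2 ratio of phosphosite to parent-protein intensity, a common way to separate changes in phosphorylation from changes in protein abundance. The function names, the input layout (gene mapped to values in a shared sample order) and the choice of Spearman over other correlation measures are assumptions for illustration.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def phospho_vs_protein_log2(site_intensity, protein_intensity):
    """log2(phosphosite / parent protein): phosphorylation change net of abundance change."""
    return math.log2(site_intensity / protein_intensity)


def mrna_protein_correlation(mrna, protein):
    """Per-gene Spearman correlation across matched samples.

    mrna, protein: hypothetical dicts gene -> list of values in the same sample order.
    Genes missing from either table are skipped.
    """
    return {g: spearman(mrna[g], protein[g]) for g in mrna.keys() & protein.keys()}
```

A rank-based measure is used here because it is insensitive to the different dynamic ranges of RNA-seq counts and MS intensities; other choices would be equally defensible.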

In all cases, comparative analyses will be performed across the different murine and organoid models, corresponding to different cancer stages, using innovative methods for integrating proteomic and genomic data and tools such as the Integrative Genomics Viewer (IGV) [5]. We will then use pathway analysis tools such as the Reactome Pathway Analysis Portal to analyse and visualize the results in a pathway context. We expect to identify pathways that are differentially regulated between normal and cancer samples, and to highlight those that may be disrupted by cancer-specific variant or phosphorylated proteins.
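Pathway portals such as Reactome report over-representation statistics for a submitted list of identifiers. As a generic sketch (not Reactome's implementation), the code below applies a one-sided hypergeometric test to the overlap between a set of differentially regulated identifiers and each pathway, followed by Benjamini-Hochberg correction. The input structures and the use of all measured identifiers as the background universe are hypothetical choices.

```python
from math import comb


def hypergeom_sf(k, universe, pathway_size, n_hits):
    """P(X >= k) for X ~ Hypergeometric(population=universe, successes=pathway_size, draws=n_hits)."""
    total = comb(universe, n_hits)
    upper = min(pathway_size, n_hits)
    return sum(comb(pathway_size, i) * comb(universe - pathway_size, n_hits - i)
               for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def overrepresentation(hits, pathways, universe):
    """Test each pathway for enrichment of hits.

    hits: set of differentially regulated IDs; pathways: dict name -> set of IDs;
    universe: set of all measured IDs. Returns name -> (overlap, p-value, adjusted p).
    """
    hits = hits & universe
    names = list(pathways)
    results = []
    for name in names:
        members = pathways[name] & universe
        k = len(hits & members)
        results.append((k, hypergeom_sf(k, len(universe), len(members), len(hits))))
    adjusted = benjamini_hochberg([p for _, p in results])
    return {name: (k, p, q) for name, (k, p), q in zip(names, results, adjusted)}
```

Restricting pathways and hits to the measured universe matters for proteomics, where coverage is far from complete; using the whole genome as background would inflate enrichment.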


Dissemination of the most relevant results to the community will be ensured via prominent EMBL-EBI resources (e.g. PRIDE, UniProt, Ensembl). Additionally, we will select a small number of the most promising candidate proteins for validation using targeted proteomic approaches. These will be reported back for biological follow-up.


1. Skoulidis, F. et al. Cancer Cell 18, 499-509 (2010).

2. Rost, H.L. et al. Nat Biotechnol 32, 219-223 (2014).

3. Griss, J. et al. Nat Methods 13, 651-656 (2016).

4. Tsou, C.C. et al. Nat Methods 12, 258-264 (2015).

5. Thorvaldsdottir, H., Robinson, J.T. & Mesirov, J.P. Brief Bioinform 14, 178-192 (2013).