Networks and pathways

Introduction

This project covers the typical bioinformatics analysis steps needed to place differentially expressed genes in a wider biological context. You will start from gene expression data (RNA-seq) and use it to build an initial interaction network. Next, you will learn to combine public network datasets, identify key regulators of biological pathways and explore biological function through network analysis. You will also gain first-hand experience of integrating and co-visualising the network with additional data, and of functional enrichment analysis. Together, these steps relate the initial results to existing knowledge and generate hypotheses for follow-up experiments. We will use Cytoscape, Expression Atlas, g:Profiler and StringDb, among other tools.
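One simple way to shortlist candidate key regulators in an interaction network is to rank genes by degree, i.e. their number of distinct interaction partners; Cytoscape can compute this alongside other centrality measures. The Python sketch below assumes the network has been exported as a plain list of undirected gene pairs. The gene names and edges are hypothetical placeholders, and degree is only one heuristic among several (betweenness centrality is another).

```python
from collections import defaultdict

# Hypothetical undirected interactions (gene A, gene B), e.g. as exported
# from a STRING or Cytoscape network. Placeholder names, not real data.
EDGES = [
    ("HUB", "G1"),
    ("HUB", "G2"),
    ("HUB", "G3"),
    ("G4", "G5"),
    ("G4", "HUB"),
    ("G5", "G2"),
]


def degree_ranking(edges):
    """Return (gene, degree) pairs sorted by degree, highest first.

    Degree is the number of distinct interaction partners. Ties are broken
    alphabetically so the output is deterministic.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sorted(
        ((gene, len(partners)) for gene, partners in neighbours.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    # Print the three most connected genes in the toy network.
    for gene, degree in degree_ranking(EDGES)[:3]:
        print(gene, degree)
```

Using a set of partners per gene means duplicated edges (common when merging several public network datasets) are not double-counted.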

Scenario

You have recently come across a paper, published some time ago, that uses RNA-seq to study hepatocellular carcinoma (e.g. Chan et al., 2014). Your research aims to identify biomarkers of hepatocellular carcinoma, so you have decided to investigate the data on this topic and explore the genes differentially expressed between tumour and non-tumour samples in a network and functional context.
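Placing genes in a functional context usually starts with an over-representation test: is a pathway or GO term annotated to more of your genes than expected by chance? Tools such as g:Profiler use a hypergeometric-type test of this kind and then correct for testing many terms at once. The sketch below computes only the single-term, uncorrected one-sided hypergeometric p-value; the counts in the usage example are hypothetical.

```python
from math import comb


def hypergeometric_pvalue(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    N: genes in the background (e.g. all genes measured)
    K: background genes annotated to the term
    n: genes in the query list (e.g. the differentially expressed genes)
    k: query genes annotated to the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("expected 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probabilities of observing k or more annotated genes in the query.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 3 of 4 query genes fall in a 5-gene term,
    # out of a 20-gene background.
    print(hypergeometric_pvalue(k=3, n=4, K=5, N=20))
```

The choice of background (N) matters: using all genes detected in the experiment, rather than the whole genome, is a common way to avoid inflated significance.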

You should already have some (at least theoretical) understanding of RNA-seq data analysis, from processing raw data and quality control to estimating gene-level expression values and identifying differentially expressed genes. This project aims to give you tools for the biological interpretation of such data and to illustrate how publicly available data can inform further experiments and ideas.
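Identifying differentially expressed genes usually means correcting p-values for testing thousands of genes at once and then applying thresholds. Tools such as DESeq2 and edgeR report adjusted p-values for you; the Python sketch below only illustrates the idea, using the Benjamini-Hochberg procedure and two commonly used but arbitrary cut-offs (adjusted p < 0.05 and |log2 fold change| >= 1). The input tuples are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(results, alpha=0.05, min_abs_lfc=1.0):
    """results: list of (gene, log2_fold_change, raw_pvalue) tuples."""
    padj = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), q in zip(results, padj)
        if q < alpha and abs(lfc) >= min_abs_lfc
    ]


# Hypothetical test results for four genes.
RESULTS = [
    ("G1", 2.5, 0.001),
    ("G2", -1.8, 0.01),
    ("G3", 0.3, 0.002),  # significant but small effect: filtered out
    ("G4", 3.0, 0.2),    # large effect but not significant: filtered out
]

if __name__ == "__main__":
    print(select_de_genes(RESULTS))
```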

Expression Atlas Quick Tour

https://www.ebi.ac.uk/training/online/courses/expression-atlas-quick-tour/

Functional genomics (II): Common technologies and data analysis methods

https://www.ebi.ac.uk/training/online/course/functional-genomics-ii-common-technologies-and-data-analysis-methods

This online course introduces the technologies and analysis methods used in functional genomics. Focus on the sections on RNA-seq and on Biological Interpretation of Gene Expression Data.

If you want to read about RNA-seq analysis in more detail, see:

Conesa et al., 2016. A survey of best practices for RNA-seq data analysis

http://europepmc.org/articles/PMC4728800

This pre-reading should give you a good background understanding of the data you are going to work with.

To get familiar with other relevant datasets and tools before the project, we suggest you watch this introductory video by Lee Larcombe (one of the mentors and organisers of the summer school), in which he introduces relevant databases and tools.

Dataset

Your first task is to find a public RNA-seq dataset from ArrayExpress that studies hepatocellular carcinoma.
For example, in one dataset RNA-seq was performed on three pairs of matched tumour and adjacent non-tumorous tissues from patients of Chinese origin with hepatocellular carcinoma. The raw and processed data for the six samples are available in ArrayExpress (https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-33294/).
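Because each tumour sample in this design has a matched non-tumour sample from the same patient, fold changes can be computed within each patient before they are summarised, which removes between-patient differences in baseline expression from the comparison. The sketch below assumes you already have normalised expression values for one gene (for example, taken from the processed data files); the numbers are hypothetical, and the pseudocount of 1 is an arbitrary choice that avoids taking the log of zero.

```python
import math
from statistics import mean


def paired_log2_fold_changes(tumour, normal, pseudocount=1.0):
    """Per-patient log2(tumour / normal) for one gene.

    tumour[i] and normal[i] must come from the same patient.
    """
    if len(tumour) != len(normal):
        raise ValueError("need exactly one normal value per tumour value")
    return [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumour, normal)
    ]


if __name__ == "__main__":
    # Hypothetical normalised values for one gene in three matched pairs.
    tumour = [31.0, 63.0, 15.0]
    normal = [7.0, 15.0, 3.0]
    lfcs = paired_log2_fold_changes(tumour, normal)
    print(lfcs, mean(lfcs))
```

With only three pairs, consistency of the direction of change across patients is often as informative as the mean fold change itself.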

This dataset has also been analysed by Expression Atlas:

(https://www.ebi.ac.uk/gxa/experiments/E-GEOD-33294/Results)

We would also like you to look for similar datasets in ArrayExpress.

Project mentors

Priit Adler | University of Tartu

Priit Adler is a researcher at the Institute of Computer Science at the University of Tartu. As a bioinformatician, he has broad experience in experimental data analysis and considers himself an expert in gene expression data analysis. In recent years, Priit has shifted his focus to studying antibody repertoires and their sequence analysis.


Priit has been the lead author of several bioinformatics toolsets (MEM, KEGGanim) and has contributed to several others (most notably g:Profiler).
Priit is also a trainer in ELIXIR-EE and a member of the ELIXIR Data Steward Interest group.

Lee Larcombe | Amphimatic

Lee Larcombe (PhD) provides expert bioinformatics and data analytics support to biotech, pharma and FMCG clients through APEXOMIC (founded in 2012). He has a research background spanning wet-lab work and bioinformatics, with particular expertise in functional genetics, biomolecular modelling and data integration for immunology and oncology. After spending 10 years in academia conducting collaborative research projects with industry, he has spent the last decade supporting in silico target discovery and (bio)therapeutic development in industry. Lee also has broad experience in teaching and training in bioinformatics, including the development of training strategy. He led the development of the UK Level 7 Bioinformatics Scientist Apprenticeship standard, has supported several consortium projects, and continues to support skills projects in industry, academia and national institutes. He is an invited organiser of the EMBL European Bioinformatics Institute international summer school and leads on bioinformatics apprenticeship assessment with the NHS National School of Health Care Science.