Single cell RNA-sequencing analysis using Python

Scenario

The study we have selected for this project provides an atlas of single-cell level gene expression patterns of the human retinal pigment epithelium (RPE) and choroidal cell types. The RPE and the choroid are complex tissues whose dysfunction can lead to visual impairment and loss. The human donors include healthy individuals and individuals with Age-related Macular Degeneration (AMD). Studying cell-type specific signatures using single-cell data can give further insights into the molecular mechanisms involved in cell apoptosis and macular degeneration in the RPE and choroidal tissue. The provided dataset allows for studying the role of endothelial cell types in the RPE and choroidal tissues of the eye and identifying AMD-specific populations and/or AMD-related genes when compared to normal samples.

Recommended materials to review before starting the project

Note: items 1-2 are essential, 3-4 are recommended, and 5-6 are optional.

  1. Basic Linux command-line operations:
  • A brief guide to working with directories and files, which covers most of what you will need. Review it to familiarise yourself with the commands:

https://support.cs.wm.edu/index.php/tips-and-tricks/basic-linux-commands

  2. Git and GitHub:
  3. Python basics and relevant libraries:

We will be analysing single-cell RNA-seq data using Python. Therefore, please familiarise yourself with basic Python programming and the following libraries:

  • Pandas: basic functions for data analysis
  • AnnData: the object that stores single-cell RNA-seq data
  • Scanpy: the package providing basic functionality for exploratory analysis of single-cell RNA-seq data using AnnData objects
  4. Jupyter Notebook and JupyterLab:

These provide a convenient, interactive interface for running the analysis code and visualising the outputs. We will be using these notebooks for the project.

  5. If you are interested in learning Python formally, you can start here: https://openbookproject.net/thinkcs/python/english3e/ and the interactive version: https://runestone.academy/ns/books/published/thinkcspy/index.html
  6. Literature reading: We will work with a human single-cell RNA-sequencing dataset from this study: https://www.pnas.org/doi/full/10.1073/pnas.1914143116. We recommend reading the paper to familiarise yourself with the dataset and the biological question.

Dataset

The single-cell RNA-sequencing dataset we will use in this project can be found here. It contains two single-cell experiments from macular and peripheral regions of the retinal pigment epithelium (RPE)-choroid from 7 human donor eyes (4 right eyes and 3 left eyes). Here we will work with the second experiment conducted on CD31-enriched RPE/choroid from donors 4-7. 

Useful links: 

Project aims

  • Understand the basic components of an AnnData object. Where are count matrices stored? Where are cell metadata and gene metadata stored? 
  • Perform a standard scRNA-seq data analysis using the Scanpy package. Starting from raw count matrices, find the relevant Scanpy functions to perform the following steps, and understand what these functions do to the data and why each step is necessary:
    • Quality control
    • Dimension reduction
    • Clustering analysis
    • Visualisation
  • Check the AnnData object after these processing steps. What information is added to the object?
  • Perform differential expression analysis between cell types, and between healthy and diseased (AMD) donors.
  • Perform batch correction or data integration between donors, and understand the different options and the levels at which corrections are performed.

Figure from: Rastoin O, Pagès G, Dufies M. Experimental Models in Neovascular Age Related Macular Degeneration. Int J Mol Sci. 2020 Jun 29;21(13):4627. doi: 10.3390/ijms21134627. PMID: 32610682; PMCID: PMC7370120.

Dictionary-Abbreviations 

Macular degeneration: a degenerative condition affecting the central part of the retina (the macula) and resulting in distortion or loss of central vision.

RPE: Retinal Pigment Epithelium

AMD: Age-related Macular Degeneration; "wet" AMD is the neovascular form

Subsetting

# Keep only the genes (columns) whose "gene_name" annotation is not missing
countmatrix1 = countmatrix[:, countmatrix.var["gene_name"].notna()]

Resources:

https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html

https://anndata.readthedocs.io/en/latest/concatenation.html

https://pandas.pydata.org/docs/user_guide/10min.html

Mentors

Anna Vathrakokoili-Pournara | EMBL-EBI

I am a predoctoral fellow in the Papatheodorou Group at EBI. During my PhD I have been working on cell-type deconvolution of bulk expression data, exploring the contribution of cell composition to various phenotypes. I focus on investigating the contribution of immune cells to cancer outcomes, and my goal is to apply machine learning models that use omics data to predict patient outcomes. I am passionate about making bioinformatic analyses and tools more reproducible and more accessible to the community. Beyond bioinformatics, I enjoy going for long walks with friends and exploring new cafés in town.

Yuyao Song | EMBL-EBI

Yuyao is a PhD candidate in the Papatheodorou group at EMBL-EBI. Her primary research lies in cross-species comparison of cell types using scRNA-seq data. She has been developing packages and workflows to study the conservation and divergence of cell type expression profiles across species, from the perspective of evolution and functional genomics. She is enthusiastic about theatre and about exploring nature and culture across the globe.