Pipeline overview
Trainer: Wendi Bacon
Overview: This activity gives an overview of scRNA-seq pipelines and helps you make careful, critical interpretations of scRNA-seq data.
Learning outcomes:
By the end of the session you will be able to:
- Identify and describe challenges and limitations in scRNA-seq analysis
Activity goals:
- Analyse the data to determine:
- Number of cells
- Number of cell clusters (generate a cluster map!)
- Disease-specific clusters
- Disease-specific transcript signatures
Activity steps:
1. Examine the data
- Make a copy of this activity template
- Key:

2. Demultiplex your data
- Both samples were run on the same sequencing lane with two sample indices from Index Read 1.
- Sample index N701 contained cancerous cells
- Sample index N702 contained only healthy cells
- Divide your reads into N701 and N702 (and keep separate!)
- Example:
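In code, this step could be sketched roughly as follows. This is a toy illustration only: the `reads` list and its `(sample_index, cell_barcode, gene_colour)` tuple layout are assumptions made for the example, not the format of a real sequencing file.

```python
from collections import defaultdict

# Hypothetical toy reads: (sample_index, cell_barcode, gene_colour).
reads = [
    ("N701", "🐱", "red"),
    ("N702", "🐶", "blue"),
    ("N701", "🐱", "green"),
    ("N702", "🐸", "red"),
]


def demultiplex(reads):
    """Split reads into one list per sample index, keeping samples separate."""
    by_sample = defaultdict(list)
    for sample_index, barcode, gene in reads:
        by_sample[sample_index].append((barcode, gene))
    return dict(by_sample)


samples = demultiplex(reads)
for sample_index in sorted(samples):
    print(sample_index, len(samples[sample_index]), "reads")
```

After this, `samples["N701"]` and `samples["N702"]` are processed independently in every later step.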

4. Generate a ‘cell matrix’
- A ‘cell matrix’ is like a ‘Digital Expression Matrix’: reads that contain the same cell barcode are stacked together so that cell-to-cell differences can be analysed
- Each emoji represents a cell barcode.
- Organise your ‘reads’ into cells by combining cell barcodes (keep N701 and N702 separate)
- Example:
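As a rough code analogue, stacking reads by cell barcode amounts to counting genes per barcode. The sketch below assumes the reads of a single sample are available as hypothetical `(cell_barcode, gene_colour)` pairs; the names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical reads from ONE sample (e.g. N701): (cell_barcode, gene_colour).
n701_reads = [
    ("🐱", "red"),
    ("🐱", "red"),
    ("🐱", "blue"),
    ("🐶", "green"),
    ("🐶", "red"),
]


def build_cell_matrix(reads):
    """Return {cell_barcode: {gene: count}} for one sample."""
    matrix = defaultdict(Counter)
    for barcode, gene in reads:
        matrix[barcode][gene] += 1
    return {barcode: dict(counts) for barcode, counts in matrix.items()}


matrix = build_cell_matrix(n701_reads)
```

Each key of `matrix` is one cell, and its inner dictionary is that cell's transcript signature.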

5. Filter the cells
- Remove any ‘cell barcodes’ (emojis) that appear fewer than 4 times. These likely represent background.
- Setting a cut-off point (i.e. the minimum number of genes or transcripts needed to define a cell) can be tricky. You may also consider putting a cap on the highest number of transcripts constituting a cell, since doublets may have more transcripts.
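A minimal sketch of this filter, assuming a cell matrix stored as a hypothetical `{cell_barcode: {gene: count}}` dictionary. The `max_transcripts` cap is optional and its value here is arbitrary, chosen only for illustration.

```python
def filter_cells(matrix, min_transcripts=4, max_transcripts=None):
    """Keep barcodes with at least `min_transcripts` transcripts.

    If `max_transcripts` is given, also drop barcodes above that cap
    (possible doublets).
    """
    kept = {}
    for barcode, counts in matrix.items():
        total = sum(counts.values())
        if total < min_transcripts:
            continue  # likely background
        if max_transcripts is not None and total > max_transcripts:
            continue  # possibly a doublet
        kept[barcode] = counts
    return kept


# Hypothetical toy matrix.
toy_matrix = {
    "🐱": {"red": 2, "blue": 2},    # 4 transcripts: kept
    "🐶": {"red": 1, "green": 1},   # 2 transcripts: background
    "🐸": {"red": 6, "blue": 6},    # 12 transcripts: above the cap of 10
}
filtered = filter_cells(toy_matrix, min_transcripts=4, max_transcripts=10)
```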

6. Filter the genes
- Remove any ‘genes’ (colours) that appear fewer than 3 times.
- If a gene appears this rarely in a sample, it is unlikely to be informative, and it is also mathematically difficult to compare expression for a gene that appears so rarely.
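The gene filter can be sketched in the same way. This example assumes that "appears" means the total transcript count of a gene across all cells in one sample; real pipelines often count the number of cells expressing the gene instead. The `{cell_barcode: {gene: count}}` layout is again a hypothetical one.

```python
from collections import Counter


def filter_genes(matrix, min_count=3):
    """Drop genes whose total count across all cells is below `min_count`."""
    totals = Counter()
    for counts in matrix.values():
        totals.update(counts)
    keep = {gene for gene, n in totals.items() if n >= min_count}
    return {
        barcode: {gene: n for gene, n in counts.items() if gene in keep}
        for barcode, counts in matrix.items()
    }


# Hypothetical toy matrix: 'purple' appears only twice in total.
toy_matrix = {
    "🐱": {"red": 2, "purple": 1},
    "🐶": {"red": 2, "purple": 1, "blue": 3},
}
gene_filtered = filter_genes(toy_matrix, min_count=3)
```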

7. Normalisation
- You don’t actually have to do this: in this specific activity, each cell now has the same number of transcripts. In a real sample, this would not be true. Imagine trying to compare transcript signatures between cells with drastically different numbers of transcripts! Normalisation puts cells on a comparable scale so that such comparisons are meaningful.
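For cells with unequal transcript counts, one simple form of normalisation rescales every cell to the same total. The sketch below assumes a hypothetical `{cell_barcode: {gene: count}}` matrix and an arbitrary `target_total`; real pipelines typically also apply a log transform afterwards, which is omitted here.

```python
def normalise(matrix, target_total=10.0):
    """Rescale each cell so that its counts sum to `target_total`."""
    normalised = {}
    for barcode, counts in matrix.items():
        total = sum(counts.values())
        if total == 0:
            continue  # an empty cell cannot be rescaled
        normalised[barcode] = {
            gene: n * target_total / total for gene, n in counts.items()
        }
    return normalised


# Hypothetical toy matrix: the second cell was sequenced five times deeper.
toy_matrix = {
    "🐱": {"red": 2, "blue": 2},
    "🐶": {"red": 15, "blue": 5},
}
normalised = normalise(toy_matrix)
```

After rescaling, the two cells can be compared by the proportion of each gene rather than by raw counts.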
8. Find Variable Genes
- Some genes don’t vary much between cells – and carrying forward a matrix of size cells x genes can make computation a bit of a nightmare! Standard pipelines only take into account genes that vary significantly.
- Remove all ‘yellow’ transcripts: according to the super intense algorithm of “I said so”, these transcripts have been found to not vary.
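In place of “I said so”, a pipeline needs some measure of variability. As a deliberately simplified sketch, the code below ranks genes by their plain variance across cells and keeps the top few. Real pipelines use more careful statistics (for example, accounting for the relationship between mean and variance), and the `{cell_barcode: {gene: count}}` layout and `n_top` value are assumptions for illustration.

```python
from statistics import pvariance


def variable_genes(matrix, n_top=2):
    """Return the `n_top` genes with the highest variance across cells."""
    genes = sorted({gene for counts in matrix.values() for gene in counts})
    variances = {
        # A gene missing from a cell counts as 0 in that cell.
        gene: pvariance([counts.get(gene, 0) for counts in matrix.values()])
        for gene in genes
    }
    ranked = sorted(genes, key=lambda gene: variances[gene], reverse=True)
    return ranked[:n_top]


# Hypothetical toy matrix: 'yellow' is identical in every cell.
toy_matrix = {
    "🐱": {"red": 5, "yellow": 2, "blue": 0},
    "🐶": {"red": 0, "yellow": 2, "blue": 4},
    "🐸": {"red": 1, "yellow": 2, "blue": 3},
}
top_genes = variable_genes(toy_matrix, n_top=2)
```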

9. Scale Data
- This step is not always performed, although it can help make it easier to compare different samples with different depths of sequencing. This step scales the variation between genes to make them more easily comparable (otherwise, genes with strong expression differences will dominate the analysis, hiding subtle differences from other genes). With this step, you can also optionally ‘regress’ genes, which is to say, their variation will not contribute to cluster calling.
- Green genes here have been found to contribute to cell cycling. We are not interested in this and don’t want it to obscure the genes driving cancer progression. Remove the green genes (‘cell cycle regression’).
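A minimal sketch of scaling, assuming a hypothetical `{cell_barcode: {gene: count}}` matrix: each gene is converted to a z-score across cells, so that strongly expressed genes no longer dominate. Following this activity's simplification, the green genes are simply dropped; in a real pipeline, regression removes their contribution statistically rather than deleting them.

```python
from statistics import mean, pstdev


def scale_genes(matrix, drop=("green",)):
    """Z-score each gene across cells, omitting any genes listed in `drop`."""
    cells = list(matrix)
    genes = sorted(
        {gene for counts in matrix.values() for gene in counts if gene not in drop}
    )
    scaled = {cell: {} for cell in cells}
    for gene in genes:
        values = [matrix[cell].get(gene, 0) for cell in cells]
        mu, sd = mean(values), pstdev(values)
        for cell, value in zip(cells, values):
            # A gene with no variation carries no signal; set it to 0.
            scaled[cell][gene] = (value - mu) / sd if sd > 0 else 0.0
    return scaled


# Hypothetical toy matrix: 'blue' has ten times the counts of 'red'.
toy_matrix = {
    "🐱": {"red": 1, "blue": 10, "green": 3},
    "🐶": {"red": 3, "blue": 30, "green": 1},
}
scaled = scale_genes(toy_matrix)
```

After scaling, ‘red’ and ‘blue’ contribute equally despite the tenfold difference in raw counts.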

10. Dimensionality Reduction
- Normally dimensionality reduction is a huge part of this protocol. There are only 3 dimensions (i.e. 3 genes) in this data, so you can skip this!
11. Identify cell clusters
- Group the cells by the ‘transcript signatures’.
- Example:
These cells would be in the same cluster
- Example:

But likely not in the same cluster as this cell:

12. Plot your cells
- Select your Cluster Plot here
- Plot the cells using the ‘cell clusters’ you identified in Step 11. Similar cells should be plotted close together. Put a circle around each cell cluster.
- Example:

13. Interpret the results
- Answer the following questions:
- Were there any cells you couldn’t classify?
- How many total cells did you find?
- How many cell types (clusters) are in your final map?
- How did you interpret the results?
14. Check the answer key here.