Differential gene expression analysis

Differential expression analysis takes the normalised read count data and applies statistical analysis to discover quantitative changes in expression levels between experimental groups. For example, statistical testing is used to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than would be expected from natural random variation alone.
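To make the idea of "greater than expected from random variation" concrete, here is a minimal sketch of an exact permutation test on the counts of a single gene. It is an illustration only: the function name and the example counts are hypothetical, and dedicated tools such as those discussed below use model-based tests rather than this simple approach.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Every way of splitting the pooled counts into two groups of the
    original sizes is enumerated; the p-value is the fraction of splits
    whose absolute mean difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical normalised counts for one gene in two conditions.
control = [10, 12, 11, 9]
treated = [30, 28, 35, 31]
p_value = permutation_p_value(control, treated)
```

With four replicates per group there are only 70 possible splits, so the smallest achievable two-sided p-value is 2/70. This is one reason why replicate number matters for statistical power.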

Methods for differential expression analysis

Several methods are available for differential expression analysis. edgeR and DESeq model read counts with negative binomial (NB) distributions, while baySeq and EBSeq are Bayesian approaches that are also based on a negative binomial model. It is important to consider the experimental design when choosing an analysis method: some differential expression tools can only perform pairwise comparisons, whereas others, such as edgeR, limma-voom, DESeq and maSigPro, can perform multiple comparisons.
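As a rough illustration of why a negative binomial model is a common choice for read counts, the sketch below compares the variance implied by a Poisson model with that of a negative binomial model parameterised by a mean and a dispersion. The function names and the example values are hypothetical; the relationship variance = mean + dispersion × mean² is the usual NB parameterisation for count data, not code taken from any of the tools named above.

```python
def poisson_variance(mean_count):
    """Under a Poisson model the variance equals the mean."""
    return mean_count


def negative_binomial_variance(mean_count, dispersion):
    """Under an NB model the variance grows quadratically with the mean.

    A dispersion of 0 reduces to the Poisson case.
    """
    return mean_count + dispersion * mean_count ** 2


# Hypothetical gene with a mean of 100 reads and a dispersion of 0.1.
poisson_var = poisson_variance(100)
nb_var = negative_binomial_variance(100, 0.1)
```

The extra dispersion term lets the model absorb the variability between biological replicates that a Poisson model would underestimate.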

In Figure 11, below, we outline the RNA-seq processing pipeline used to generate data for Expression Atlas.

Figure 11 RNA-seq processing pipeline used to generate gene expression data in Expression Atlas.

In this pipeline, raw reads (FASTQ files) undergo quality assessment and filtering. The quality-filtered reads are aligned to the reference genome using HISAT2. The mapped reads are summarised and aggregated over genes using HTSeq. For baseline expression, FPKMs are calculated from the raw counts by iRAP. These are averaged for each set of technical replicates and then quantile normalised within each set of biological replicates using limma. Finally, they are averaged across all biological replicates (if any).

For differential expression, genes expressed differentially between the test and the reference groups of each pairwise contrast are identified using DESeq2.
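The baseline-expression steps described above (FPKM calculation, quantile normalisation and averaging over replicates) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the iRAP or limma implementation: all function names and input values are hypothetical, the library size is taken to be the sum of the counts assigned to genes, and ties in quantile normalisation are broken by position rather than handled as a dedicated package would.

```python
def fpkm(counts, gene_lengths_bp):
    """Convert raw fragment counts for one sample to FPKM.

    FPKM = count * 1e9 / (gene length in bp * total mapped fragments).
    Assumption: total mapped fragments = sum of the per-gene counts.
    """
    total = sum(counts.values())
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total)
        for gene, count in counts.items()
    }


def quantile_normalise(samples):
    """Quantile normalise a list of equal-length expression vectors.

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalised = []
    for s in samples:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalised.append(out)
    return normalised


def average(vectors):
    """Element-wise mean of equal-length vectors (e.g. replicates)."""
    return [sum(values) / len(values) for values in zip(*vectors)]


# Hypothetical counts and gene lengths for one sample.
sample_fpkm = fpkm(
    {"geneA": 500, "geneB": 1500},
    {"geneA": 1000, "geneB": 3000},
)

# Hypothetical expression vectors for two biological replicates.
replicates = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalised = quantile_normalise(replicates)
baseline = average(normalised)
```

After quantile normalisation every replicate has the same distribution of values, so the final average is not dominated by a replicate with systematically higher expression values.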