The simplest approach to quantifying gene expression by RNA-seq is to count the number of reads that map (i.e. align) to each gene (read count) using programs such as HTSeq-count. This gene-level quantification approach utilises a gene transfer format (GTF) file containing gene models, with each model representing the structure of transcripts produced by a given gene.
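The counting idea can be illustrated with a toy sketch. It is not how HTSeq-count is implemented: the function name `count_reads_per_gene`, the coordinates, and the rule of skipping reads that overlap more than one gene are all assumptions made for illustration. Each gene model is also reduced to a single interval, whereas a real GTF file describes exons and transcripts.

```python
def count_reads_per_gene(alignments, gene_models):
    """Count aligned reads per gene (toy model, hypothetical helper).

    alignments:  list of (chromosome, start, end) tuples, half-open intervals
    gene_models: dict mapping gene name -> (chromosome, start, end)
    """
    counts = {gene: 0 for gene in gene_models}
    for chrom, start, end in alignments:
        # Find every gene whose interval overlaps this read.
        hits = [
            gene
            for gene, (g_chrom, g_start, g_end) in gene_models.items()
            if g_chrom == chrom and start < g_end and end > g_start
        ]
        # Assumption: only reads that overlap exactly one gene are counted;
        # reads overlapping no gene or several genes are skipped.
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Made-up gene models and alignments, for illustration only.
gene_models = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
}
alignments = [
    ("chr1", 100, 120),  # overlaps geneA only
    ("chr1", 160, 180),  # overlaps geneA and geneB: ambiguous, skipped
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 100, 120),  # overlaps no gene
]
counts = count_reads_per_gene(alignments, gene_models)
print(counts)
```

Real counting tools also have to deal with strandedness, spliced alignments and multi-mapping reads, which this sketch ignores.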
Raw read counts are affected by factors such as transcript length (at the same expression level, longer transcripts yield more reads) and the total number of reads sequenced (library size). Thus, if we want to compare expression levels between samples, we need to normalise the raw read counts. The measure RPKM (reads per kilobase of exon model per million mapped reads) and its derivative FPKM (fragments per kilobase of exon model per million mapped reads), which counts fragments rather than individual reads for paired-end data, account for both gene length and library size effects.
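As a minimal sketch of the RPKM definition above, the calculation can be written as follows. The function name `rpkm` and the example numbers are hypothetical; the gene length is assumed to be the total exon length in base pairs.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    Equivalent to: read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene,
# in a library of 10 million mapped reads.
example_rpkm = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(example_rpkm)  # 25.0
```

FPKM follows the same formula with fragment counts in place of read counts.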
Correcting for gene length is not necessary when comparing the expression of the same gene across samples, because the length is identical in every sample and cancels out. However, it is necessary for correctly ranking the expression levels of different genes within a sample, to account for the fact that longer genes accumulate more reads (at the same expression level).
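Both points can be checked on made-up numbers. In the sketch below the gene names, lengths and counts are hypothetical, and the two samples are assumed to have the same library size, so only the length correction is shown.

```python
# Hypothetical gene lengths (bp) and raw read counts in two samples.
gene_length_bp = {"short_gene": 1000, "long_gene": 4000}
sample1_counts = {"short_gene": 100, "long_gene": 200}
sample2_counts = {"short_gene": 200, "long_gene": 400}


def reads_per_kb(counts, lengths):
    """Divide each raw count by the gene length in kilobases."""
    return {gene: counts[gene] / (lengths[gene] / 1000) for gene in counts}


s1_per_kb = reads_per_kb(sample1_counts, gene_length_bp)
s2_per_kb = reads_per_kb(sample2_counts, gene_length_bp)

# Same gene across samples: the length cancels, so the fold change
# is the same with or without length correction.
raw_fold_change = sample2_counts["long_gene"] / sample1_counts["long_gene"]
per_kb_fold_change = s2_per_kb["long_gene"] / s1_per_kb["long_gene"]
print(raw_fold_change, per_kb_fold_change)

# Different genes within one sample: the ranking changes once
# gene length is taken into account.
rank_by_raw = sorted(sample1_counts, key=sample1_counts.get, reverse=True)
rank_by_per_kb = sorted(s1_per_kb, key=s1_per_kb.get, reverse=True)
print(rank_by_raw)
print(rank_by_per_kb)
```

With these numbers the long gene has more raw reads in sample 1, but the short gene has more reads per kilobase, so it ranks higher once length is corrected for.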