Quality control

This step pre-processes the data to remove:

  • adapter sequences (adapter trimming)
  • low-quality reads
  • uncalled bases
  • contaminants (sequences that do not derive from the source organism)

It is also important to check that sequence quality is similar across all samples and to discard outliers.
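
One simple way to compare samples is to summarise each one by its mean base quality and flag any that sit far from the rest. The sketch below is only an illustration: the function name, the sample names and values, and the 5-unit cutoff are all assumptions, not a standard criterion, and in practice the per-sample summaries would come from a QC report.

```python
from statistics import median


def flag_outlier_samples(mean_quality_by_sample, max_deviation=5.0):
    """Return the names of samples whose mean Phred quality differs from the
    median across samples by more than max_deviation (an arbitrary cutoff)."""
    centre = median(mean_quality_by_sample.values())
    return sorted(
        name
        for name, quality in mean_quality_by_sample.items()
        if abs(quality - centre) > max_deviation
    )


# Hypothetical per-sample mean qualities, for illustration only.
example = {"s1": 35.2, "s2": 34.8, "s3": 35.5, "s4": 24.1}
flagged = flag_outlier_samples(example)
print(flagged)
```

Flagged samples deserve a closer look before they are discarded, since a low mean can come from a small number of very poor reads.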

FastQC is a popular tool for quality assessment. As a general rule, base quality decreases towards the 3’ end of reads; if it becomes too low, those bases should be trimmed to improve mappability.
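
To make the idea of 3’ trimming concrete, here is a minimal sketch that removes trailing bases whose quality falls below a threshold. It assumes Phred+33 quality encoding and a threshold of 20; both are illustrative choices, and the function names are hypothetical. Dedicated trimming tools use more refined strategies.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Drop bases from the 3' end until the last base reaches min_quality."""
    scores = phred33_scores(quality_string)
    end = len(scores)
    # Walk backwards from the 3' end while the base quality is too low.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# 'I' decodes to 40, '#' to 2 and '!' to 0 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII#!#")
print(trimmed_seq, trimmed_qual)
```

A read whose bases all fall below the threshold comes back empty, so a minimum-length filter is usually applied afterwards.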

Trimmomatic can be used to remove adapter and PCR primer sequences, and to trim low-quality and uncalled (N) bases from reads.
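
As a rough illustration, a single-end Trimmomatic run could be assembled from Python as below. The jar location, the file names and the adapter FASTA are placeholders, and the step settings follow the example commonly shown in the Trimmomatic manual rather than values recommended by this text; they should be adjusted to the data.

```python
def trimmomatic_se_command(jar_path, fastq_in, fastq_out, adapters_fasta):
    """Build a single-end Trimmomatic command as an argument list."""
    return [
        "java", "-jar", jar_path, "SE", "-phred33",
        fastq_in, fastq_out,
        # Clip adapter sequences listed in the given FASTA file.
        f"ILLUMINACLIP:{adapters_fasta}:2:30:10",
        # Remove leading/trailing bases below quality 3 (includes N bases).
        "LEADING:3",
        "TRAILING:3",
        # Cut once the mean quality in a 4-base window drops below 15.
        "SLIDINGWINDOW:4:15",
        # Discard reads shorter than 36 bases after trimming.
        "MINLEN:36",
    ]


# Placeholder paths, for illustration only.
command = trimmomatic_se_command(
    "trimmomatic-0.39.jar",
    "sample.fastq.gz",
    "sample.trimmed.fastq.gz",
    "TruSeq3-SE.fa",
)
print(" ".join(command))
```

With Java and Trimmomatic installed, the list can be passed to `subprocess.run(command, check=True)` to execute the run.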