Data preprocessing

Quality control (QC)

Quality control is an essential first step in any genome assembly workflow. Sequencing data should be assessed before assembly because it may contain contamination, low-quality reads or overrepresented sequences, or consist of reads of inconsistent length.
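
Some of these properties can be illustrated with a minimal Python sketch that parses FASTQ records and reports the read count, the read-length range and the mean per-read quality. It assumes uncompressed four-line FASTQ records (no line-wrapped sequences) with Phred+33 quality encoding, as used by current Illumina instruments. The function names are hypothetical, and this is not how FastQC or any other QC tool is implemented.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    # Drop line endings and blank lines (assumes no empty reads).
    lines = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input is truncated: expected 4 lines per record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch at line {i + 1}")
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Convert an ASCII quality string to Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def summarise(records):
    """Return read count, read-length range and the mean of per-read mean qualities."""
    lengths, mean_quals = [], []
    for _, seq, qual in records:
        lengths.append(len(seq))
        mean_quals.append(mean(phred_scores(qual)))
    if not lengths:
        raise ValueError("No reads to summarise")
    return {
        "reads": len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_read_quality": mean(mean_quals),
    }


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
fastq_text = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTA
+
#####
"""
print(summarise(parse_fastq(fastq_text.splitlines())))
```

For a small file, `summarise(parse_fastq(open("reads.fastq")))` would work the same way (the file name is hypothetical). The parser holds every line in memory, so a dedicated QC tool is the better choice for full-sized datasets.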

Various tools can perform QC on sequencing data. One popular tool is FastQC, which generates detailed reports covering a range of quality metrics, such as per-base sequence quality, sequence length distribution and overrepresented sequences.
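
One of the checks in such reports looks for overrepresented sequences. A heavily simplified analogue in Python counts identical reads and flags any that make up at least a given fraction of the total. The function name and the 0.1% default threshold are illustrative assumptions; FastQC's own module is more involved than an exact count over all reads.

```python
from collections import Counter


def overrepresented(sequences, min_fraction=0.001):
    """Return (sequence, count, fraction) for every exact sequence that makes up
    at least min_fraction of all reads, most frequent first."""
    counts = Counter(sequences)
    total = sum(counts.values())
    hits = []
    for seq, n in counts.most_common():  # sorted by count, descending
        fraction = n / total
        if fraction < min_fraction:
            break  # every remaining sequence is rarer still
        hits.append((seq, n, fraction))
    return hits


reads = ["GGGG"] * 6 + ["AAAA"] * 3 + ["CCCC"]
for seq, n, fraction in overrepresented(reads, min_fraction=0.25):
    print(f"{seq}\t{n}\t{fraction:.1%}")
```

In real data, a highly overrepresented sequence often turns out to be an adapter or a contaminant, which is one reason QC is followed by trimming and filtering.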

Trimming and filtering

Once the quality of the sequencing data has been assessed, it is vital that contamination and low-quality reads are removed to avoid introducing bias or errors into downstream results.

Read trimming and filtering methods can be used to remove unwanted sequences. For example, trimming tools such as fastp or Trimmomatic can identify and remove adapter sequences left over from the library preparation step.
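
Adapter trimming can be sketched as a search for the adapter at the 3' end of each read. This simplified Python version removes a full adapter match, or a partial adapter that runs off the end of the read, using exact matching only; real tools such as fastp and Trimmomatic also tolerate sequencing errors in the adapter. The default adapter `AGATCGGAAGAGC` (a prefix shared by common Illumina adapters) and the minimum overlap of 3 bases are assumptions chosen for illustration.

```python
def trim_adapter(seq, qual, adapter="AGATCGGAAGAGC", min_overlap=3):
    """Remove a 3' adapter from a read using exact matching only.

    A full adapter match anywhere in the read is cut together with everything
    after it; otherwise the longest adapter prefix (at least min_overlap bases)
    found at the very end of the read is removed. Returns the trimmed (seq, qual).
    """
    cut = seq.find(adapter)
    if cut == -1:
        cut = len(seq)
        # Partial adapter running off the 3' end: try the longest prefix first.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    return seq[:cut], qual[:cut]


insert = "ACGTTGCA"  # hypothetical genomic insert
for read in (insert + "AGATCGGAAGAGC" + "TTTT", insert + "AGATC", insert):
    print(trim_adapter(read, "I" * len(read)))
```

All three example reads are trimmed back to the 8-base insert: the first contains the full adapter, the second ends in a 5-base adapter prefix, and the third has no adapter at all.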

Filtering methods can additionally remove reads that fall below a minimum length, as well as reads that come from contaminating sources, so that the data is ‘clean’.
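
Length and contamination filtering can be combined in one pass. The Python sketch below drops reads shorter than a minimum length and reads that share any k-mer with a known contaminant sequence on either strand. The contaminant sequence, `min_length` and `k` are hypothetical parameters, and reads are assumed to be uppercase; production pipelines typically screen against reference databases with dedicated tools rather than an exact k-mer set built from one sequence.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_reads(reads, contaminant, min_length=50, k=21):
    """Keep (name, seq) reads that meet min_length and share no k-mer with the
    contaminant on either strand."""
    contaminant_kmers = kmers(contaminant, k) | kmers(reverse_complement(contaminant), k)
    kept = []
    for name, seq in reads:
        if len(seq) < min_length:
            continue  # too short
        if not kmers(seq, k).isdisjoint(contaminant_kmers):
            continue  # shares a k-mer with the contaminant
        kept.append((name, seq))
    return kept


# Toy example with a hypothetical contaminant and a small k for readability.
reads = [
    ("r1", "ACGT"),            # too short
    ("r2", "CCCCGATTACCCCC"),  # contains a forward-strand contaminant k-mer
    ("r3", "CCCCTGTAACCCCC"),  # contains a reverse-complement contaminant k-mer
    ("r4", "CCCCCCCCCCCCCC"),
    ("r5", "ACACACACAC"),
]
print(filter_reads(reads, contaminant="GATTACAGGC", min_length=8, k=5))
```

`min_length` should be at least `k` in practice: a read shorter than `k` contains no k-mers and would always pass the contamination check.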