Data preprocessing
Quality control (QC)
Quality control is an essential first step in any genome assembly workflow. It is important to gauge the quality of your sequencing data because it may contain contamination, low-quality reads or overrepresented sequences, or the reads may be inconsistent in length.
There are various tools that can be used to perform QC on your sequencing data. One popular tool is FastQC, which generates detailed reports covering a range of quality metrics.
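To make the idea of a quality metric concrete, the sketch below computes a few simple summaries (read count, read length range, mean per-read quality) from FASTQ records. It is only an illustration and does not reproduce what FastQC does. The function names are hypothetical, and it assumes well-formed four-line FASTQ records with Phred+33 quality encoding and at least one read.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def phred_scores(quality, offset=33):
    """Convert a quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(c) - offset for c in quality]


def summarise(records):
    """Return basic QC metrics; assumes at least one read is present."""
    lengths = []
    mean_quals = []
    for _, seq, qual in records:
        lengths.append(len(seq))
        mean_quals.append(mean(phred_scores(qual)))
    return {
        "reads": len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_read_quality": mean(mean_quals),
    }


# Two made-up reads: one high quality ('I' = Q40), one very low ('!' = Q0).
example = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGT", "+", "!!!!",
]
report = summarise(parse_fastq(example))
print(report)
```

A large spread between the minimum and maximum read length, or a low mean quality, would be a cue to look more closely at the data before assembly.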
Trimming and filtering
Once the quality of the sequencing data has been assessed, it is vital to remove any contamination or low-quality reads so that they do not introduce bias or errors into downstream results.
Read trimming or filtering methods can be used to remove unwanted sequences. For example, trimming tools such as fastp or Trimmomatic can identify and remove adapter sequences left over from the library preparation step of the sequencing protocol.
Filtering methods can additionally be used to remove contaminating reads and reads that do not meet the minimum length requirement, to ensure the data is ‘clean’.
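As a rough illustration of adapter trimming, the sketch below cuts a read at the first exact occurrence of an adapter, or at a partial adapter match at the 3' end. It is a simplified stand-in, not the algorithm used by fastp or Trimmomatic, which also tolerate mismatches. The function name, the `min_overlap` parameter and the example adapter string are assumptions made for this example; use the adapter sequence that matches your own library preparation kit.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Remove an adapter (exact matches only) from the 3' end of a read.

    Returns the trimmed (sequence, quality) pair. Hypothetical helper for
    illustration; real trimmers also allow for sequencing errors.
    """
    # Case 1: the whole adapter appears somewhere in the read.
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]

    # Case 2: the read ends part-way through the adapter.
    # Try the longest possible overlap first, down to min_overlap bases.
    longest = min(len(adapter) - 1, len(seq))
    shortest = max(min_overlap, 1)
    for k in range(longest, shortest - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]

    # No adapter found: return the read unchanged.
    return seq, qual


ADAPTER = "AGATCGGAAGAGC"  # example value only

full = trim_adapter("ACGTACGTAGATCGGAAGAGCTT", "I" * 23, ADAPTER)
partial = trim_adapter("ACGTACGTAGATC", "I" * 13, ADAPTER)
clean = trim_adapter("ACGTACGT", "I" * 8, ADAPTER)
print(full, partial, clean)
```

The quality string is trimmed together with the sequence so that the two stay the same length, as the FASTQ format requires.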
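A length and quality filter can be sketched in a few lines. The thresholds below (`min_length=50`, `min_mean_quality=20`) and the function names are illustrative assumptions, not recommended settings, and Phred+33 quality encoding is assumed. Screening for contaminating reads is not shown, since it generally requires comparing reads against reference sequences.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_read(seq, qual, min_length=50, min_mean_quality=20):
    """Return True if a read passes the length and quality thresholds."""
    if not qual or len(seq) < min_length:
        return False
    return mean_phred(qual) >= min_mean_quality


# Made-up reads: (name, sequence, quality string).
reads = [
    ("read1", "ACGT" * 15, "I" * 60),  # long, high quality: kept
    ("read2", "ACGT" * 5, "I" * 20),   # too short: dropped
    ("read3", "ACGT" * 15, "#" * 60),  # long but low quality: dropped
]
kept = [name for name, seq, qual in reads if keep_read(seq, qual)]
print(kept)
```

Filtering after trimming is a sensible order for a sketch like this, because trimming can shorten a read to below the minimum length.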