
Assembly

Assembly is the process of reconstructing an organism’s complete genome from sequencing reads. The goal is to produce contiguous sequences, known as contigs, which can then be combined to represent the entire genome. There are two main approaches to generating genome assemblies from sequencing data:

  1. Reference-based assembly – The genome is assembled by aligning sequencing reads to an existing reference genome from the same strain or species, or from a closely related organism. This approach is efficient and accurate when a high-quality reference genome is available, and it is less computationally intensive than de novo assembly.
  2. De novo assembly – The genome is assembled from scratch, without a reference. Sequencing reads are pieced together where they overlap, allowing the original genome to be reconstructed. This approach is used when no reference genome exists for the organism, or when the organism’s genome is highly variable.
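
To make the first approach concrete, here is a minimal Python sketch of reference-based assembly on toy data. It is an illustration under simplifying assumptions, not how production aligners work: each read is placed at the reference offset with the fewest mismatches (a brute-force scan, no gaps or indels), and the consensus is a per-position majority vote. The function names `best_position` and `reference_based_consensus` and the example sequences are hypothetical.

```python
from collections import Counter


def best_position(read, reference):
    """Return the reference offset where the read has the fewest mismatches.

    Brute-force scan; ties go to the leftmost offset. Gaps are not handled.
    """
    best_pos, best_mismatches = 0, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos


def reference_based_consensus(reads, reference):
    """Map each read to the reference, then call the majority base per position.

    Positions that no read covers are reported as 'N'.
    """
    piles = [Counter() for _ in reference]
    for read in reads:
        pos = best_position(read, reference)
        for offset, base in enumerate(read):
            piles[pos + offset][base] += 1
    return "".join(
        pile.most_common(1)[0][0] if pile else "N" for pile in piles
    )


# Toy data: the sequenced sample differs from the reference at one position.
reference = "ATGGCGTACCTGAAT"
reads = ["ATGGCG", "GCGTTC", "GTTCCT", "TTCCTG", "CCTGAA", "CTGAAT"]
assembly = reference_based_consensus(reads, reference)
print(assembly)
```

Because the reference fixes where every read belongs, the work per read is independent of the other reads, which is one reason this approach tends to be cheaper than de novo assembly.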

In reference-based assembly, reads are mapped to a reference genome to generate the assembled genome. In de novo assembly, reads are joined at their overlapping regions to create the assembled genome.
Figure 6: The two main approaches used to assemble sequencing reads into genomes.
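
The second approach, de novo assembly, can be sketched in the same toy style. The example below uses a simple greedy strategy: repeatedly merge the two sequences with the longest suffix-to-prefix overlap until no overlap of at least `min_overlap` bases remains. This is only one possible illustration and assumes error-free reads from a single strand; the names `overlap_length` and `greedy_assemble`, the `min_overlap` threshold, and the reads are all hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Greedily merge the best-overlapping pair until no pair overlaps.

    Returns the remaining sequences (contigs) as a list.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        # Join the pair, keeping the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy data: four overlapping reads, no reference needed.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA", "CTGAAT"]
print(greedy_assemble(reads))
```

Every read is compared with every other read on each pass, so the cost grows quickly with the number of reads. Reads that share no sufficient overlap remain as separate contigs, which mirrors how an assembly can end up as multiple contigs rather than one complete sequence.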