How is the sequence assembled?

Sequence assembly is performed by the submitter, not by ENA curators (Figure 6). ENA sometimes receives the initial data as unassembled sequence reads, which the author then gradually supplements with additional layers of interpretation, such as assembly or annotation.

As a result, data is always changing. Sources of this change include:

  • Assembly of sequence into larger fragments;
  • Reducing the visibility of obsolete entries (once assembled);
  • Sequence modifications;
  • Daily updates;
  • Corrections submitted by the author.
Figure 6. An example where the resubmission of assembled sequence results in the creation of new entries and the transfer of the entries they replace to the Sequence Version Archive (see below). Sequence might first enter ENA as fragmented SRA (Sequence Read Archive) sequence reads; it might be re-submitted as assembled WGS (Whole Genome Shotgun) sequence overlap contigs; it might be re-submitted again with further assembly as CON (Constructed) sequence entries, with the older WGS entries being consigned to the Sequence Version Archive.
SRA is the Sequence Read Archive, WGS is the whole genome shotgun data class representing sequence overlap contigs, and CON is the data class for constructed sequences consisting of scaffolds, super-scaffolds and chromosomes.

Obsolete entries are accessible through the Sequence Version Archive (SVA). See the section on 'Finding old archived entries'.
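
The replacement step described in Figure 6 can be pictured with a small toy model. The sketch below is purely illustrative and is not how ENA is implemented: the class names, the `submit` method and the accessions ("reads-1", "contig-1" and so on) are hypothetical placeholders. It assumes only what the caption states, namely that a re-submission creates new entries and that the entries it replaces move to the Sequence Version Archive. The reads are left in place because the caption mentions only WGS entries being archived.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    accession: str   # hypothetical placeholder, not a real ENA accession
    data_class: str  # "SRA", "WGS" or "CON", as named in Figure 6


@dataclass
class Archive:
    live: dict = field(default_factory=dict)  # currently visible entries
    sva: dict = field(default_factory=dict)   # stand-in for the Sequence Version Archive

    def submit(self, entry, replaces=()):
        """Add a new entry and move any entries it replaces into the SVA."""
        for accession in replaces:
            self.sva[accession] = self.live.pop(accession)
        self.live[entry.accession] = entry


def run_example():
    archive = Archive()
    # Unassembled reads arrive first.
    archive.submit(Entry("reads-1", "SRA"))
    # The submitter later provides assembled contigs.
    archive.submit(Entry("contig-1", "WGS"))
    archive.submit(Entry("contig-2", "WGS"))
    # Further assembly yields a constructed entry that replaces the contigs.
    archive.submit(Entry("scaffold-1", "CON"), replaces=["contig-1", "contig-2"])
    return archive
```

After `run_example()`, the two contig entries are found only in `sva`, while the reads and the constructed entry remain in `live`. This mirrors the idea that obsolete entries are still retrievable even though they are no longer presented as current.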