Sequence assembly is performed by the submitter, not by ENA curators (Figure 6). ENA sometimes receives the initial data as unassembled sequence reads, which are gradually complemented with additional layers of interpretation, such as assembly or annotation, by the author.
As a result, the data held in ENA changes continually. Sources of this change include:
- Assembly of sequence into larger fragments;
- Reducing the visibility of obsolete entries (once assembled);
- Sequence modifications;
- Daily updates;
- Corrections submitted by the author.
For example, resubmitting assembled sequence creates new entries and moves the entries they replace to the Sequence Version Archive (see below). Sequence might first enter ENA as fragmented SRA (Sequence Read Archive) sequence reads; it might be resubmitted as assembled WGS (Whole Genome Shotgun) sequence overlap contigs; and it might be resubmitted again, with further assembly, as CON (Constructed) sequence entries, with the older WGS entries being consigned to the Sequence Version Archive.
SRA is the Sequence Read Archive; WGS is the Whole Genome Shotgun data class, representing sequence overlap contigs; and CON is the class of constructed sequences, consisting of scaffolds, super-scaffolds and chromosomes.
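The resubmission lifecycle described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Entry` and `Archive` classes, the `submit` method, and the accession strings are hypothetical names invented for illustration, and do not correspond to any ENA API or to real ENA accessions.

```python
from dataclasses import dataclass, field

# Data classes named in the text, ordered from least to most assembled.
ASSEMBLY_ORDER = ["SRA", "WGS", "CON"]


@dataclass
class Entry:
    accession: str   # hypothetical identifier, not a real ENA accession
    data_class: str  # one of ASSEMBLY_ORDER


@dataclass
class Archive:
    # Entries that are currently visible, keyed by accession.
    live: dict = field(default_factory=dict)
    # Stand-in for the Sequence Version Archive: entries that were replaced.
    version_archive: list = field(default_factory=list)

    def submit(self, entry, replaces=()):
        """Add a new entry and move the entries it replaces out of view."""
        if entry.data_class not in ASSEMBLY_ORDER:
            raise ValueError(f"unknown data class: {entry.data_class}")
        for accession in replaces:
            old = self.live.pop(accession)  # raises KeyError if not live
            self.version_archive.append(old)
        self.live[entry.accession] = entry


archive = Archive()
archive.submit(Entry("READS_1", "SRA"))    # initial unassembled reads
archive.submit(Entry("CONTIG_1", "WGS"))   # resubmitted as contigs
archive.submit(Entry("CONTIG_2", "WGS"))
# Further assembly: the CON entry replaces the older WGS entries.
archive.submit(Entry("SCAFFOLD_1", "CON"), replaces=["CONTIG_1", "CONTIG_2"])

print(sorted(archive.live))                            # ['READS_1', 'SCAFFOLD_1']
print([e.accession for e in archive.version_archive])  # ['CONTIG_1', 'CONTIG_2']
```

In this sketch, replaced entries are never deleted, only moved out of the visible set, which mirrors the idea that obsolete entries become less visible while remaining retrievable from the version archive.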