What is ENA?

The European Nucleotide Archive (ENA) is a comprehensive resource of primary nucleotide sequence information. ENA provides access to both assembled sequences and unassembled (raw) sequence reads, but stores them in separate databases to optimise accessibility and analysis (2). Figure 2 shows schematically how the data are stored in ENA.


Figure 2. ENA is composed of three databases: EMBL-Bank for assembled data, and the Sequence Read Archive and Trace Archive for raw data.


ENA consists of three databases:

(1) EMBL-Bank consists of:

    • Assembled sequence data, where the submitter has assembled the reads into a single contiguous sequence.
    • Annotation, provided by the submitter, that describes the biological function of specific regions of the sequence (such as protein-coding regions, exons and introns).

(2) Sequence Read Archive (SRA) consists of (3):

    • Raw reads: typically short, unassembled sequence fragments generated using next-generation sequencing (NGS) technology.

(3) Trace Archive consists of:

    • Raw sequence data (traces) generated by capillary sequencing platforms.

Note that the unassembled sequence data in the SRA and Trace Archive can be difficult to work with, because raw data inherently contain considerable sequence duplication and sequencing errors. Sequence reads can be downloaded (see the section 'How to export sequence and download data').
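
To make the point about duplication in raw data concrete, here is a minimal Python sketch that measures how many reads in a set are exact copies of another read. The function name `duplication_rate` and the toy reads are hypothetical and chosen for illustration only; they are not ENA data or part of any ENA tool. Real pipelines would also have to deal with near-duplicates and sequencing errors, which this sketch ignores.

```python
from collections import Counter


def duplication_rate(reads):
    """Return the fraction of reads that are exact copies of another read.

    Illustrative helper (an assumption of this sketch, not an ENA function).
    """
    if not reads:
        return 0.0
    unique_reads = Counter(reads)  # maps each distinct read to its count
    duplicates = len(reads) - len(unique_reads)
    return duplicates / len(reads)


# Hypothetical toy reads, not real sequencing data.
toy_reads = ["ACGTAC", "ACGTAC", "GGCATT", "ACGTAC", "TTAGGC"]

rate = duplication_rate(toy_reads)
print(f"{rate:.0%} of reads are exact duplicates")
```

In this toy set, three distinct sequences account for five reads, so two of the five reads are redundant. Assembled records such as those in EMBL-Bank have already had this kind of redundancy resolved by the submitter.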