The European Nucleotide Archive (ENA) provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation (5). The ENA is built from three main databases (Figure 1.4): the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. Its content is organised in three parts: ENA-Annotation, ENA-Assembly and ENA-Reads. ENA-Annotation contains detailed functional annotation, for example of individual, well-characterised coding sequences. ENA-Assembly is designed for efficient storage of sequence assemblies. Finally, ENA-Reads is optimised for the efficient storage of sequence trace information (6).
Data arrive at ENA from a variety of sources. These include submissions of raw data, assembled sequences and annotation from small-scale sequencing projects, data provided by the major European sequencing centres, and data exchanged with partners in the International Nucleotide Sequence Database Collaboration (INSDC).
You should use ENA to retrieve information on nucleotide sequences of interest if you want:
- to perform a comprehensive sequence-similarity search,
- raw data from electrophoresis-based sequencing machines (held in the European Trace Archive),
- raw data from next-generation (array-based) sequencing platforms.
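As an illustration of retrieving a sequence of interest, the following is a minimal Python sketch rather than a definitive client. It assumes that records can be downloaded in FASTA format from the ENA Browser API at the base URL shown. That URL is not given in this text and should be checked against the current ENA documentation. The function names (build_fasta_url, parse_fasta, fetch_fasta) are hypothetical and chosen only for this example.

```python
from typing import List, Tuple
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of the ENA Browser API FASTA endpoint.
# It is not stated in the text; verify it against the ENA documentation.
ENA_FASTA_URL = "https://www.ebi.ac.uk/ena/browser/api/fasta/"


def build_fasta_url(accession: str) -> str:
    """Return the (assumed) URL for one record in FASTA format."""
    return ENA_FASTA_URL + quote(accession.strip(), safe="")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_fasta(accession: str, timeout: float = 30.0) -> List[Tuple[str, str]]:
    """Download one record and parse it (requires network access)."""
    with urlopen(build_fasta_url(accession), timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling fetch_fasta with a valid accession number should return a list of (header, sequence) pairs. parse_fasta can also be used on its own for FASTA text obtained in any other way.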
If you have sequenced a gene or transcript, you can also submit your data to EMBL-Bank, which constitutes Europe's primary nucleotide sequence resource and is part of the ENA.
Figure 1.4 The three main databases that contribute to the European Nucleotide Archive.