Summary

  • Why do we need ENA? ENA is a comprehensive archive of original nucleotide sequence data and its annotation.

  • What is ENA? ENA provides a browser and underlying databank. ENA consists of three databases: EMBL-Bank for assembled sequence and its annotation; Sequence Read Archive (SRA) for unassembled (raw) data generated by next-generation sequencing methods; Trace Archive (TA) for unassembled data generated by capillary sequencing methods.

  • Where does the data come from? Data is submitted as unassembled or assembled sequence directly to one of the three INSDC databases: ENA, NCBI or DDBJ. All data is exchanged daily between these databases.

  • How is the sequence assembled? Assembly is performed by the submitter, not by ENA curators.

  • How is the sequence annotated? Annotation is usually submitted along with the sequence, but ENA also accepts third-party annotation, provided it is supported by a published reference.

  • How is the data structured? ENA is divided into data classes and taxonomic divisions. Each sequence is assigned to the most specific taxonomic division that applies to it. There are special divisions for sequences not associated with any conventional taxonomy.

  • How do you access ENA? You can access the ENA browser through the EBI homepage or by going directly to ENA.

  • How can you search ENA? You can search ENA through the ENA browser or the Sequence Search & Analysis tools. You can search with gene names, disease names, keywords, accession IDs or sequence.

  • How do I download data from ENA? EMBL-Bank sequences, taxonomy and annotation can be exported through the ENA browser REST URLs or by FTP. SRA and Trace Archive data can be downloaded by FTP or Aspera.

  • How do I submit data to ENA? Assembled and annotated sequences can be submitted to EMBL-Bank using the Webin template submission forms or, for complete genomes and long sequences, as pre-prepared EMBL-Bank flat files. SRA sequences can also be submitted to ENA.
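
As an illustration of the REST-style export mentioned in the download point above, the following is a minimal sketch in Python. It is not taken from this course: the base URL, the path layout (format name followed by accession) and the function names build_record_url and fetch_record are assumptions made for the example, and the accession A00001 is only a placeholder. Check the current ENA documentation for the exact URL pattern before relying on it.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL for REST-style record retrieval (not stated in this course).
ENA_BASE_URL = "https://www.ebi.ac.uk/ena/browser/api"

# Formats this sketch knows how to request; the real service may offer others.
SUPPORTED_FORMATS = ("fasta", "embl")


def build_record_url(accession, fmt="fasta", base_url=ENA_BASE_URL):
    """Build a retrieval URL for one accession in the requested format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must not be empty")
    # Percent-encode the accession so it is safe to place in a URL path.
    return f"{base_url}/{fmt}/{quote(accession, safe='')}"


def fetch_record(accession, fmt="fasta", timeout=30):
    """Download one record as text (performs a network request)."""
    with urlopen(build_record_url(accession, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_record() to download.
    print(build_record_url("A00001"))
```

Keeping URL construction separate from the download step means the URL can be inspected or pasted into a browser before any data is transferred. For bulk SRA or Trace Archive downloads, FTP or Aspera remains the route described above.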