Read domain formats

This page provides information on the read domain formats in ENA. If you are planning to submit read data into ENA, please refer first to Submitting read data.

Data format

Read data including bases, base qualities and alignments can be submitted in several different formats ... more information.

Metadata format

The metadata model consists of the following objects, each controlled by an XML Schema. The latest schema versions are available through FTP and are kept under version control.

XML Schema            Description
SRA.submission.xsd    A submission action to be performed by the archive.
SRA.sample.xsd        Detailed information about the sequenced sample. Samples can be used in any number of experiments.
SRA.study.xsd         A study groups together experiments or analyses for public data release purposes.
SRA.experiment.xsd    An experiment contains instrument and library preparation information and groups together one or more runs.
SRA.run.xsd           A run contains sequencing reads submitted in data files (e.g. BAM or CRAM).
SRA.analysis.xsd      An analysis contains secondary analysis results, for example read alignments (BAM or CRAM), sequence variations (VCF) or sequence annotations (TAB).
SRA.common.xsd        Common types used in other SRA XML schemas.
EGA.dac.xsd           A European Genome-phenome Archive (EGA) data access committee (DAC). Required for authorized access submissions.
EGA.policy.xsd        A European Genome-phenome Archive (EGA) data access policy. Required for authorized access submissions.
EGA.dataset.xsd       A European Genome-phenome Archive (EGA) data set. Required for authorized access submissions.
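
The relationships between the main objects can be pictured with a small data-structure sketch. This is only an illustration of how the descriptions above fit together, not the XML schemas themselves: the class layout, the field names (`alias`, `instrument`, `data_files`), the assumption that each experiment points at one sample, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    alias: str  # hypothetical identifier chosen by the submitter


@dataclass
class Run:
    alias: str
    data_files: List[str] = field(default_factory=list)  # e.g. BAM or CRAM files


@dataclass
class Experiment:
    alias: str
    sample: Sample   # assumption: one sample per experiment
    instrument: str  # stands in for instrument and library information
    runs: List[Run] = field(default_factory=list)  # one or more runs


@dataclass
class Study:
    alias: str
    experiments: List[Experiment] = field(default_factory=list)


def files_in_study(study: Study) -> List[str]:
    """Collect every data file reachable from a study via its experiments and runs."""
    return [f for e in study.experiments for r in e.runs for f in r.data_files]


# One sample reused by two experiments, both grouped under one study.
sample = Sample("sample_1")
exp_a = Experiment("exp_a", sample, "hypothetical instrument", [Run("run_1", ["reads_1.bam"])])
exp_b = Experiment("exp_b", sample, "hypothetical instrument", [Run("run_2", ["reads_2.cram"])])
study = Study("study_1", [exp_a, exp_b])
```

Here the same `Sample` instance is shared by both experiments, reflecting that a sample can be used in any number of experiments, while the study only groups the experiments for release.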

Accession number format

Each metadata object is assigned a unique accession number by the archive. The accession numbers can be used to retrieve data and metadata using the EB-eye search available at the top of all EBI web pages or using the free text search available on the ENA home page. The metadata is then retrieved and displayed through the ENA Browser, as in the examples in the table below.

Accession numbers associated with read data start with 'ER' when assigned by EBI, and with 'SR' and 'DR' when assigned by NCBI and DDBJ, respectively. The third letter of the accession number indicates the type of the metadata object. EGA accession numbers start with 'EGA', with the fourth letter indicating the type of the metadata object.

The accession numbers have a fixed number of digits after the letters: six for ENA and eleven for EGA.

Metadata object    Accession prefix    Number of digits    Example
Submission         ERA, SRA, DRA       6                   ERA000092
Sample             ERS, SRS, DRS       6                   ERS000081
Study              ERP, SRP, DRP       6                   ERP000016
Experiment         ERX, SRX, DRX       6                   ERX000398
Run                ERR, SRR, DRR       6                   ERR003990
Analysis           ERZ, SRZ, DRZ       6                   ERZ000001
EGA Submission     EGA                 11                  EGA00001000001
EGA Sample         EGAN                11                  EGAN00001000001
EGA Study          EGAS                11                  EGAS00001000001
EGA Experiment     EGAX                11                  EGAX00001000001
EGA Run            EGAR                11                  EGAR00001000001
EGA Analysis       EGAZ                11                  EGAZ00001000001
EGA DAC            EGAC                11                  EGAC00001000001
EGA Policy         EGAP                11                  EGAP00001000001
EGA Data Set       EGAD                11                  EGAD00001000001
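
The accession rules above translate fairly directly into a pattern match. The following is a minimal sketch, assuming the prefixes and fixed digit counts are exactly as listed in the table; the function name `parse_accession`, the `Accession` tuple and the lookup tables are illustrative and not part of any ENA tool.

```python
import re
from typing import NamedTuple, Optional

# First letter of a read accession -> assigning archive.
ARCHIVES = {"E": "EBI", "S": "NCBI", "D": "DDBJ"}

# Third letter of a read accession -> metadata object type.
READ_TYPES = {
    "A": "Submission", "S": "Sample", "P": "Study",
    "X": "Experiment", "R": "Run", "Z": "Analysis",
}

# Fourth letter of an EGA accession -> metadata object type.
# An EGA submission has no fourth letter, hence the empty key.
EGA_TYPES = {
    "": "Submission", "N": "Sample", "S": "Study", "X": "Experiment",
    "R": "Run", "Z": "Analysis", "C": "DAC", "P": "Policy", "D": "Data Set",
}

READ_RE = re.compile(r"([ESD])R([ASPXRZ])(\d{6})")   # six digits
EGA_RE = re.compile(r"EGA([NSXRZCPD]?)(\d{11})")     # eleven digits


class Accession(NamedTuple):
    archive: str
    object_type: str
    number: int


def parse_accession(acc: str) -> Optional[Accession]:
    """Return the parsed accession, or None if it fits neither pattern."""
    m = READ_RE.fullmatch(acc)
    if m:
        return Accession(ARCHIVES[m.group(1)], READ_TYPES[m.group(2)], int(m.group(3)))
    m = EGA_RE.fullmatch(acc)
    if m:
        return Accession("EGA", EGA_TYPES[m.group(1)], int(m.group(2)))
    return None
```

For example, `parse_accession("ERR003990")` yields an EBI-assigned Run, while a string with the wrong number of digits returns `None` because `fullmatch` requires the whole string to fit the pattern.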

Archive-generated FASTQ file format

Once made public, data submitted to ENA are available for download using FTP and Aspera. Detailed data download instructions, including details of the archive-generated FASTQ files and their organisation, are available separately.

