Read domain 1.3 XML metadata format

The SRA 1.3 XML metadata format replaced the SRA 1.2 XML metadata format in August 2011. It is designed to be backwards compatible with SRA 1.2, with only a few exceptions, which are listed below.


XML Schema            Description
SRA.submission.xsd    A submission contains submission actions to be performed by the archive.
SRA.sample.xsd        A sample contains information about the sample upon which the sequencing experiments are based. Samples can be used in any number of sequencing experiments. A study contains information about the sequencing project. Studies can contain any number of sequencing experiments and analyses.
SRA.experiment.xsd    An experiment contains instrument, library and spot information about the sequencing experiment. Experiments are associated with studies, samples and runs. Runs contain the actual sequencing reads. A run contains the sequencing reads from sequencing experiments. Each run can contain all or part of the results for a particular experiment.
SRA.analysis.xsd      An analysis contains secondary analysis results computed from the primary sequencing reads. Analyses are associated with studies.
SRA.common.xsd        Common types used in other SRA XML Schemas.
EGA.dac.xsd           A European Genome-phenome Archive (EGA) data access committee (DAC). Required for authorized access submissions.
EGA.policy.xsd        A European Genome-phenome Archive (EGA) data access policy. Required for authorized access submissions.
EGA.dataset.xsd       A European Genome-phenome Archive (EGA) data set. Required for authorized access submissions.

For examples of how to prepare the SRA XMLs, please refer to Preparing XMLs.

Removed instrument models (SRA.common.xsd)

  • 454 Titanium (use 454 GS FLX Titanium)
  • GS 20 (use 454 GS 20)
  • GS FLX (use 454 GS FLX)
  • Solexa 1G Genome Analyzer (use Illumina choices)
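
The renames above can be applied mechanically when updating older metadata. The following is a minimal sketch, not part of the SRA tooling: the function name and the two constants are hypothetical, and only the model names come from the list above. Because "Solexa 1G Genome Analyzer" has no single replacement, the sketch refuses to guess and asks for a manual choice.

```python
# Replacements taken from the removed-models list above.
REPLACEMENTS = {
    "454 Titanium": "454 GS FLX Titanium",
    "GS 20": "454 GS 20",
    "GS FLX": "454 GS FLX",
}

# Removed model with no single successor; the list only says "use Illumina choices".
NEEDS_MANUAL_CHOICE = {"Solexa 1G Genome Analyzer"}


def migrate_instrument_model(model):
    """Return the SRA 1.3 name for a model, or raise if a manual choice is needed.

    Hypothetical helper: names not in the removed list are returned unchanged.
    """
    name = model.strip()
    if name in NEEDS_MANUAL_CHOICE:
        raise ValueError(f"{name!r} was removed; pick a specific Illumina model")
    return REPLACEMENTS.get(name, name)
```

Models that were not removed pass through untouched, so the helper can be run over every instrument model value in an existing set of experiment XMLs.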

New instrument platforms (SRA.common.xsd) *1


New instrument models (SRA.common.xsd) *1

  • Illumina HiSeq 1000
  • Illumina MiSeq
  • AB SOLiD 5500xl
  • AB SOLiD 5500
  • Ion Torrent PGM
  • Complete Genomics
  • PacBio RS

New library sources (SRA.common.xsd)

  • METATRANSCRIPTOMIC has been added as a new library source

Removed library strategies (SRA.common.xsd)

  • Deprecated library strategy BARCODE has been removed

Gap descriptor (SRA.common.xsd) *1

A GapDescriptor element has been introduced for Experiment and Run to define the placement of gaps relative to a reference sequence. It was introduced in order to describe the Complete Genomics spot layout, and it is expected to become a replacement for LIBRARY_LAYOUT.

Changes to Study (

  • CENTER_PROJECT_NAME has been made optional
  • Deprecated RELATED_STUDIES/STUDY has been removed

Changes to Sample (SRA.sample.xsd)

  • TAXON_ID has been made mandatory
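
Since TAXON_ID is now mandatory, it can be useful to check sample XMLs before submitting them. The sketch below is an illustration only: the function name is hypothetical, and it assumes that samples are SAMPLE elements identified by an `alias` attribute and that TAXON_ID appears somewhere inside each SAMPLE. It does not replace validation against SRA.sample.xsd.

```python
import xml.etree.ElementTree as ET


def samples_missing_taxon_id(xml_text):
    """Return the aliases of SAMPLE elements that have no non-empty TAXON_ID.

    Assumption: TAXON_ID is a descendant of SAMPLE; its exact position
    is not checked here.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for sample in root.iter("SAMPLE"):
        taxon = sample.find(".//TAXON_ID")
        if taxon is None or not (taxon.text or "").strip():
            missing.append(sample.get("alias", "<no alias>"))
    return missing


example = """
<SAMPLE_SET>
  <SAMPLE alias="s1"><SAMPLE_NAME><TAXON_ID>9606</TAXON_ID></SAMPLE_NAME></SAMPLE>
  <SAMPLE alias="s2"><SAMPLE_NAME><SCIENTIFIC_NAME>Homo sapiens</SCIENTIFIC_NAME></SAMPLE_NAME></SAMPLE>
</SAMPLE_SET>
"""
print(samples_missing_taxon_id(example))
```

In this made-up example only the second sample is reported, because it gives a scientific name but no taxon identifier.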

Changes to Submission (SRA.submission.xsd)

  • Deprecated 'submission_id' attribute has been removed
  • Deprecated 'handle' attribute has been removed
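
Submission XMLs written for SRA 1.2 may still carry the two removed attributes. As a rough sketch, assuming the attributes sit on an element named SUBMISSION, they could be stripped as follows; the function name is hypothetical and the output should still be validated against SRA.submission.xsd.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the list above.
REMOVED_ATTRIBUTES = ("submission_id", "handle")


def strip_removed_submission_attributes(xml_text):
    """Return the XML with the removed attributes dropped from every SUBMISSION.

    Assumption: the attributes appear on SUBMISSION elements only.
    """
    root = ET.fromstring(xml_text)
    for submission in root.iter("SUBMISSION"):
        for name in REMOVED_ATTRIBUTES:
            # pop() with a default ignores attributes that are absent
            submission.attrib.pop(name, None)
    return ET.tostring(root, encoding="unicode")
```

All other attributes and child elements are left as they were.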

Changes to Experiment (SRA.experiment.xsd)

  • The SPOT_DESCRIPTOR element has been made optional and is no longer required for file formats which can be interpreted without external spot layout information *1
  • The PROCESSING element has been made optional
  • GAP_DESCRIPTOR is now available on the level of experiment *1

Changes to Run (

  • Only a single DATA_BLOCK is supported in a run
  • Optional 'unencrypted_checksum' attribute has been added to FILE element to contain the unencrypted file checksum for encrypted (EGA) files *1
  • SPOT_DESCRIPTOR is now supported on the level of run *1
  • GAP_DESCRIPTOR is now available on the level of run *1
  • Added new filetype option PacBio_HDF5 *1
  • Added CompleteGenomics_native file type *1
  • Deprecated _seq.txt, _prb.txt, _sig2.txt, _qhg.txt filetype options have been removed

*1 Also backported to SRA XML 1.2.
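
Two of the Run changes above can break existing XMLs: more than one DATA_BLOCK per run, and the removed filetype options. The following sketch flags both. It assumes that DATA_BLOCK is a direct child of RUN, that FILE elements carry a `filetype` attribute, and that runs are labelled with an `alias` attribute; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Filetype options listed above as removed.
REMOVED_FILETYPES = {"_seq.txt", "_prb.txt", "_sig2.txt", "_qhg.txt"}


def run_problems(xml_text):
    """Return human-readable problems found in RUN elements (empty list if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    for run in root.iter("RUN"):
        label = run.get("alias", "<no alias>")
        # Assumption: DATA_BLOCK elements are direct children of RUN.
        blocks = run.findall("DATA_BLOCK")
        if len(blocks) > 1:
            problems.append(
                f"{label}: {len(blocks)} DATA_BLOCK elements, only one is supported"
            )
        for file_el in run.iter("FILE"):
            filetype = file_el.get("filetype")
            if filetype in REMOVED_FILETYPES:
                problems.append(f"{label}: filetype {filetype!r} has been removed")
    return problems
```

A run with a single DATA_BLOCK and a supported filetype yields an empty list.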

Changes to Analysis (SRA.analysis.xsd)

  • DATA_BLOCK has been made optional to support updates without files
  • Only a single DATA_BLOCK is supported in an analysis
  • The PROCESSING elements have been removed
  • Removed data_block name attribute from RUN_LABELS and SEQ_LABELS
  • Removed gi attribute from SEQ_LABELS
  • Removed TARGET element and added SAMPLE_REF and RUN_REF elements
