Read domain 1.3 XML metadata format

The SRA 1.3 XML metadata format replaced the SRA 1.2 XML metadata format in August 2011. It is designed to be backwards compatible with SRA 1.2, with only a few exceptions.


XML Schema           Description
SRA.submission.xsd   A submission contains the submission actions to be performed by the archive.
SRA.sample.xsd       A sample contains information about the sample on which the sequencing experiments are based. A sample can be used in any number of sequencing experiments.
SRA.study.xsd        A study contains information about the sequencing project. A study can contain any number of sequencing experiments and analyses.
SRA.experiment.xsd   An experiment contains instrument, library and spot information about the sequencing experiment. Experiments are associated with studies, samples and runs. Runs contain the actual sequencing reads.
SRA.run.xsd          A run contains the sequencing reads from a sequencing experiment. Each run can contain all or part of the results for a particular experiment.
SRA.analysis.xsd     An analysis contains secondary analysis results computed from the primary sequencing reads. Analyses are associated with studies.
SRA.common.xsd       Common types used by the other SRA XML schemas.
EGA.dac.xsd          A European Genome-phenome Archive (EGA) data access committee (DAC). Required for authorized access submissions.
EGA.policy.xsd       A European Genome-phenome Archive (EGA) data access policy. Required for authorized access submissions.
EGA.dataset.xsd      A European Genome-phenome Archive (EGA) data set. Required for authorized access submissions.

For examples of how to prepare the SRA XMLs, please refer to Preparing XMLs.

Removed instrument models (SRA.common.xsd)

  • 454 Titanium (use 454 GS FLX Titanium)
  • GS 20 (use 454 GS 20)
  • GS FLX (use 454 GS FLX)
  • Solexa 1G Genome Analyzer (use Illumina choices)
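A 1.2 document that still uses one of the removed model names can be upgraded mechanically for the first three entries; the Solexa model has no one-to-one replacement and has to be resolved by hand. The sketch below uses Python's standard xml.etree.ElementTree and assumes, as a labelled assumption, that the model name is stored as the text of INSTRUMENT_MODEL elements; the function name upgrade_instrument_models is hypothetical.

```python
import xml.etree.ElementTree as ET

# Renames taken from the "Removed instrument models" list. Solexa 1G Genome
# Analyzer maps to several Illumina choices, so it is reported, not rewritten.
RENAMED_MODELS = {
    "454 Titanium": "454 GS FLX Titanium",
    "GS 20": "454 GS 20",
    "GS FLX": "454 GS FLX",
}
NEEDS_MANUAL_CHOICE = {"Solexa 1G Genome Analyzer"}


def upgrade_instrument_models(root):
    """Rewrite removed model names in place; return names needing review.

    Hypothetical helper: assumes the model is the text of INSTRUMENT_MODEL
    elements anywhere under root.
    """
    unresolved = []
    for elem in root.iter("INSTRUMENT_MODEL"):
        name = (elem.text or "").strip()
        if name in RENAMED_MODELS:
            elem.text = RENAMED_MODELS[name]
        elif name in NEEDS_MANUAL_CHOICE:
            unresolved.append(name)
    return unresolved


if __name__ == "__main__":
    doc = ET.fromstring(
        "<EXPERIMENT><PLATFORM><LS454>"
        "<INSTRUMENT_MODEL>454 Titanium</INSTRUMENT_MODEL>"
        "</LS454></PLATFORM></EXPERIMENT>"
    )
    print(upgrade_instrument_models(doc))        # []
    print(doc.find(".//INSTRUMENT_MODEL").text)  # 454 GS FLX Titanium
```

Returning the unresolved names instead of guessing keeps the choice of Illumina model with the submitter.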

New instrument platforms (SRA.common.xsd) *1

  • ION_TORRENT

New instrument models (SRA.common.xsd) *1

  • Illumina HiSeq 1000
  • Illumina MiSeq
  • AB SOLiD 5500xl
  • AB SOLiD 5500
  • Ion Torrent PGM
  • Complete Genomics
  • PacBio RS

New library sources (SRA.common.xsd)

  • METATRANSCRIPTOMIC has been added as a new library source

Removed library strategies (SRA.common.xsd)

  • Deprecated library strategy BARCODE has been removed
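Because BARCODE is no longer a valid library strategy, a local pre-submission check can flag experiments that still use it. This is a minimal sketch that assumes the strategy is the text of a LIBRARY_STRATEGY element; the helper name find_removed_strategies is hypothetical, and validation against SRA.common.xsd itself remains the authoritative check.

```python
import xml.etree.ElementTree as ET

# Library strategy values removed in SRA 1.3.
REMOVED_LIBRARY_STRATEGIES = {"BARCODE"}


def find_removed_strategies(root):
    """Return every LIBRARY_STRATEGY value under root that 1.3 no longer accepts.

    Hypothetical helper: assumes the strategy is stored as element text.
    """
    return [
        (elem.text or "").strip()
        for elem in root.iter("LIBRARY_STRATEGY")
        if (elem.text or "").strip() in REMOVED_LIBRARY_STRATEGIES
    ]
```

An empty list means no removed strategy values were found; it does not mean the document is otherwise valid.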

Gap descriptor (SRA.common.xsd) *1

A GapDescriptor element has been introduced for Experiment and Run to define the placement of gaps relative to a reference sequence. It was introduced to describe the Complete Genomics spot layout, and it is expected eventually to replace LIBRARY_LAYOUT.

Changes to Study (SRA.study.xsd)

  • CENTER_PROJECT_NAME has been made optional
  • Deprecated RELATED_STUDIES/STUDY has been removed

Changes to Sample (SRA.sample.xsd)

  • TAXON_ID has been made mandatory
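Since TAXON_ID is now mandatory, samples carried over from 1.2 without one will be rejected. The sketch below lists the offending samples by alias; it assumes, without relying on the exact path, that TAXON_ID appears somewhere inside each SAMPLE element, and the function name samples_missing_taxon_id is hypothetical.

```python
import xml.etree.ElementTree as ET


def samples_missing_taxon_id(root):
    """Return the alias of every SAMPLE lacking a non-empty TAXON_ID.

    Hypothetical helper: searches all descendants of each SAMPLE for
    TAXON_ID. Samples without an alias attribute are reported as "<no alias>".
    """
    missing = []
    for sample in root.iter("SAMPLE"):
        taxon = sample.find(".//TAXON_ID")
        if taxon is None or not (taxon.text or "").strip():
            missing.append(sample.get("alias", "<no alias>"))
    return missing
```

Searching with ".//TAXON_ID" rather than a fixed path keeps the check independent of where the element sits inside SAMPLE.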

Changes to Submission (SRA.submission.xsd)

  • Deprecated 'submission_id' attribute has been removed
  • Deprecated 'handle' attribute has been removed
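Older submission XMLs that still set the removed attributes can be cleaned before resubmission. This minimal sketch removes submission_id and handle from every SUBMISSION element and reports what it dropped; the function name strip_deprecated_submission_attributes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes removed from SUBMISSION in SRA 1.3.
REMOVED_SUBMISSION_ATTRIBUTES = ("submission_id", "handle")


def strip_deprecated_submission_attributes(root):
    """Remove attributes dropped in SRA 1.3 from every SUBMISSION element.

    Returns the names of the attributes that were actually present and removed.
    """
    removed = []
    for submission in root.iter("SUBMISSION"):
        for name in REMOVED_SUBMISSION_ATTRIBUTES:
            if submission.attrib.pop(name, None) is not None:
                removed.append(name)
    return removed
```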

Changes to Experiment (SRA.experiment.xsd)

  • The SPOT_DESCRIPTOR element has been made optional and is no longer required for file formats which can be interpreted without external spot layout information *1
  • The PROCESSING element has been made optional
  • GAP_DESCRIPTOR is now available at the experiment level *1

Changes to Run (SRA.run.xsd)

  • Only a single DATA_BLOCK is supported per run
  • Optional 'unencrypted_checksum' attribute has been added to FILE element to contain the unencrypted file checksum for encrypted (EGA) files *1
  • SPOT_DESCRIPTOR is now supported at the run level *1
  • GAP_DESCRIPTOR is now available at the run level *1
  • Added new filetype option PacBio_HDF5 *1
  • Added CompleteGenomics_native file type *1
  • Deprecated _seq.txt, _prb.txt, _sig2.txt, _qhg.txt filetype options have been removed

*1 Also backported to SRA XML 1.2.
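Two of the run-level restrictions above, the single DATA_BLOCK and the removed filetype options, can be checked locally before upload. This is a sketch under stated assumptions: it assumes each FILE element carries its type in a filetype attribute and its name in a filename attribute, and the function name check_run is hypothetical. Schema validation against SRA.run.xsd remains the authoritative check.

```python
import xml.etree.ElementTree as ET

# Filetype options removed in SRA 1.3.
REMOVED_FILETYPES = {"_seq.txt", "_prb.txt", "_sig2.txt", "_qhg.txt"}


def check_run(run):
    """Return a list of SRA 1.3 problems found in one RUN element.

    Hypothetical helper: counts DATA_BLOCK elements anywhere under the run
    and inspects the (assumed) filetype attribute of every FILE element.
    """
    problems = []
    blocks = list(run.iter("DATA_BLOCK"))
    if len(blocks) > 1:
        problems.append(f"{len(blocks)} DATA_BLOCK elements; only one is supported")
    for file_elem in run.iter("FILE"):
        filetype = file_elem.get("filetype")
        if filetype in REMOVED_FILETYPES:
            problems.append(f"{file_elem.get('filename')}: removed filetype {filetype}")
    return problems
```

Collecting all problems rather than stopping at the first one lets a submitter fix a run in a single pass.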

Changes to Analysis (SRA.analysis.xsd)

  • DATA_BLOCK has been made optional to support updates without files
  • Only a single DATA_BLOCK is supported per analysis
  • The PROCESSING elements have been removed
  • Removed DE_NOVO_ASSEMBLY and ABUNDANCE_MEASUREMENT
  • Removed data_block name attribute from RUN_LABELS and SEQ_LABELS
  • Removed gi attribute from SEQ_LABELS
  • Replaced ASSEMBLY/STANDARD/NAME with ASSEMBLY/STANDARD/@accession
  • Removed TARGET element and added SAMPLE_REF and RUN_REF elements
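Several of the analysis changes above remove elements outright, so a 1.2 analysis can be screened for them before conversion. The sketch below assumes, as a labelled assumption, that DE_NOVO_ASSEMBLY and ABUNDANCE_MEASUREMENT appear as element tags and that the old assembly name sits at ASSEMBLY/STANDARD/NAME; the function name check_analysis is hypothetical. It does not check the removed attributes on RUN_LABELS and SEQ_LABELS.

```python
import xml.etree.ElementTree as ET

# Element paths removed from ANALYSIS in SRA 1.3, each mapped to a hint.
REMOVED_PATHS = {
    "PROCESSING": "PROCESSING elements have been removed",
    "DE_NOVO_ASSEMBLY": "DE_NOVO_ASSEMBLY has been removed",
    "ABUNDANCE_MEASUREMENT": "ABUNDANCE_MEASUREMENT has been removed",
    "TARGET": "use SAMPLE_REF and RUN_REF instead of TARGET",
    "ASSEMBLY/STANDARD/NAME": "use the accession attribute on ASSEMBLY/STANDARD",
}


def check_analysis(analysis):
    """Return hints for 1.2 constructs that SRA 1.3 no longer accepts.

    Hypothetical helper. Zero DATA_BLOCK elements is fine (DATA_BLOCK is
    optional); more than one is reported.
    """
    problems = []
    blocks = list(analysis.iter("DATA_BLOCK"))
    if len(blocks) > 1:
        problems.append(f"{len(blocks)} DATA_BLOCK elements; at most one is supported")
    for path, hint in REMOVED_PATHS.items():
        if analysis.find(".//" + path) is not None:
            problems.append(hint)
    return problems
```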
