Read domain XML 1.4 metadata format

The SRA 1.4 XML metadata specification replaced SRA 1.3 XML on 13 July 2012. The SRA 1.4 metadata format is designed to be backwards compatible, with only a few exceptions.

 

XML schemas and their descriptions:

  • SRA.submission.xsd: A submission contains submission actions to be performed by the archive.
  • SRA.sample.xsd: A sample contains information about the sample upon which the sequencing experiments are based. Samples can be used in any number of sequencing experiments.
  • SRA.study.xsd: A study contains information about the sequencing project. Studies can contain any number of sequencing experiments and analyses.
  • SRA.experiment.xsd: An experiment contains instrument, library and spot information about the sequencing experiment. Experiments are associated with studies, samples and runs. Runs contain the actual sequencing reads.
  • SRA.run.xsd: A run contains the sequencing reads from sequencing experiments. Each run can contain all or part of the results for a particular experiment.
  • SRA.analysis.xsd: An analysis contains secondary analysis results computed from the primary sequencing reads. Analyses are associated with studies.
  • SRA.common.xsd: Common types used in other SRA XML Schemas.
  • EGA.dac.xsd: A European Genome-phenome Archive (EGA) data access committee (DAC). Required for authorized access submissions.
  • EGA.policy.xsd: A European Genome-phenome Archive (EGA) data access policy. Required for authorized access submissions.
  • EGA.dataset.xsd: A European Genome-phenome Archive (EGA) data set. Required for authorized access submissions.

For examples of how to prepare the SRA XMLs, please refer to Preparing SRA XML metadata.

New platforms

A new platform CAPILLARY has been added with the following instrument models:

  • AB 3730xL Genetic Analyzer
  • AB 3730 Genetic Analyzer
  • AB 3500xL Genetic Analyzer
  • AB 3500 Genetic Analyzer
  • AB 3130xL Genetic Analyzer
  • AB 3130 Genetic Analyzer
  • AB 310 Genetic Analyzer
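
As a rough illustration, the new platform could be written into an experiment's platform block as sketched below. The PLATFORM/CAPILLARY/INSTRUMENT_MODEL nesting is an assumption based on how other platforms are laid out in the experiment schema, and the helper name capillary_platform is hypothetical; check SRA.experiment.xsd before relying on either.

```python
import xml.etree.ElementTree as ET

# Instrument models listed above for the CAPILLARY platform.
CAPILLARY_MODELS = {
    "AB 3730xL Genetic Analyzer",
    "AB 3730 Genetic Analyzer",
    "AB 3500xL Genetic Analyzer",
    "AB 3500 Genetic Analyzer",
    "AB 3130xL Genetic Analyzer",
    "AB 3130 Genetic Analyzer",
    "AB 310 Genetic Analyzer",
}


def capillary_platform(model: str) -> ET.Element:
    """Build a PLATFORM element for a CAPILLARY instrument (hypothetical helper).

    The element nesting is assumed, not taken from the text above.
    """
    if model not in CAPILLARY_MODELS:
        raise ValueError(f"not a CAPILLARY instrument model: {model!r}")
    platform = ET.Element("PLATFORM")
    capillary = ET.SubElement(platform, "CAPILLARY")
    ET.SubElement(capillary, "INSTRUMENT_MODEL").text = model
    return platform


if __name__ == "__main__":
    element = capillary_platform("AB 3730xL Genetic Analyzer")
    print(ET.tostring(element, encoding="unicode"))
```

Rejecting unknown model names locally is a design choice of this sketch; it catches spelling differences such as 'xl' versus 'xL' before submission.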

New instrument models

The following instrument models have been added:

  • 454 FLX+
  • AB SOLiD 3.0 plus
  • Illumina HiSeq 2500
  • Illumina HiScanSQ
  • Ion Proton

Instrument model changes

The following changes have been made to instrument models:

  • Removed 'none' as an instrument model for Complete Genomics
  • Removed 'none' as an instrument model for PacBio
  • Added corrected names for 'AB 5500' and 'AB 5500xl' instrument models: AB 5500 Genetic Analyzer and AB 5500xl Genetic Analyzer. Uses of 'AB 5500' and 'AB 5500xl' are automatically converted to the corrected names upon submission.
  • Renamed the 'AB SOLiD System 3 Plus' instrument model to 'AB SOLiD 3 Plus System'. Uses of the old name are automatically converted to the corrected name upon submission.
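
The archive performs these name conversions itself, so submitters do not need to. For anyone who wants local metadata to match what the archive will store, the renames above can be sketched as a lookup table; the function name normalise_instrument_model is hypothetical and this is not the archive's own implementation.

```python
# Old instrument model names and the corrected names they are converted to,
# taken from the list of changes above.
CORRECTED_MODEL_NAMES = {
    "AB 5500": "AB 5500 Genetic Analyzer",
    "AB 5500xl": "AB 5500xl Genetic Analyzer",
    "AB SOLiD System 3 Plus": "AB SOLiD 3 Plus System",
}


def normalise_instrument_model(model: str) -> str:
    """Return the corrected instrument model name (hypothetical helper).

    Names without a listed correction are returned unchanged.
    """
    return CORRECTED_MODEL_NAMES.get(model, model)


if __name__ == "__main__":
    for name in ("AB 5500xl", "AB SOLiD System 3 Plus", "Ion Proton"):
        print(f"{name} -> {normalise_instrument_model(name)}")
```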

New library strategies

The following library strategies have been added:

  • WGA: whole genome amplification
  • miRNA-Seq: micro RNA and other small non-coding RNA sequencing
  • Tn-Seq: gene fitness determination through transposon seeding

New library selections

The following library selections have been added:

  • MDA: multiple displacement amplification
  • padlock probes capture method: to be used in conjunction with Bisulfite-Seq

New targeted loci

The following targeted loci have been added:

  • 18S ribosomal RNA
  • RBCL
  • matK
  • COX1
  • ITS1-5.8S-ITS2

FILES element

The FILES element will be removed from the SRA Submission XML Schema in SRA 1.5 XML. Submitters should use the FILES element only in the SRA Run and Analysis XML Schemas.

IDENTIFIERS element

An IDENTIFIERS element (IdentifierType) has been added to all SRA objects to capture secondary accessions and equivalent accessions in other databases. It will ultimately replace the 'NameGroup' and 'RefNameGroup' types. However, this change will not immediately affect any EBI SRA submitters.

The IDENTIFIERS element contains the following types of identifiers:

  • PRIMARY_ID: the primary accession number of the object. This is equivalent to the NameGroup 'accession' attribute and the RefNameGroup 'accession' attribute.
  • SECONDARY_ID: any secondary accession numbers of the object.
  • SUBMITTER_ID: a name given to the object by the submitter. This is equivalent to the NameGroup 'alias' attribute and the RefNameGroup 'refname' attribute. The 'namespace' attribute is equivalent to the NameGroup 'center_name' attribute and the RefNameGroup 'refcenter' attribute.
  • EXTERNAL_ID: a name or accession given to the object by an external database.
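
The equivalences listed above can be sketched as a small conversion from NameGroup attributes to an IDENTIFIERS block. This is a minimal illustration, not the archive's conversion code: the function name identifiers_from_name_group and the sample values (ERS000001, sample_1, EXAMPLE_CENTER) are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def identifiers_from_name_group(
    accession: Optional[str],
    alias: Optional[str],
    center_name: Optional[str],
) -> ET.Element:
    """Map NameGroup attributes onto an IDENTIFIERS element (hypothetical helper).

    accession   -> PRIMARY_ID
    alias       -> SUBMITTER_ID
    center_name -> 'namespace' attribute of SUBMITTER_ID
    """
    identifiers = ET.Element("IDENTIFIERS")
    if accession:
        ET.SubElement(identifiers, "PRIMARY_ID").text = accession
    if alias:
        submitter_id = ET.SubElement(identifiers, "SUBMITTER_ID")
        if center_name:
            submitter_id.set("namespace", center_name)
        submitter_id.text = alias
    return identifiers


if __name__ == "__main__":
    # Illustrative values only; they do not refer to real records.
    block = identifiers_from_name_group("ERS000001", "sample_1", "EXAMPLE_CENTER")
    print(ET.tostring(block, encoding="unicode"))
```

The same mapping applies to RefNameGroup by passing 'refname' as the alias and 'refcenter' as the center name.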

Please note that:

  • EBI SRA supports only a single submitter-provided name.
  • EBI SRA does not support labels within the IDENTIFIERS block. Any occurrences are automatically removed.
  • EBI SRA does not support UUIDs in the IDENTIFIERS block. Any occurrences are automatically removed.
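
To preview what an IDENTIFIERS block might look like after the removals described above, a submitter could apply something like the sketch below. It assumes that labels are carried as a 'label' attribute on each identifier and that UUIDs appear as UUID child elements; neither name is stated in the text above, so verify both against SRA.common.xsd. The function name strip_unsupported is hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_unsupported(identifiers: ET.Element) -> ET.Element:
    """Remove labels and UUIDs from an IDENTIFIERS element in place.

    Assumes 'label' is an attribute and UUID is a child element name.
    """
    # findall returns a list, so removing while looping over it is safe.
    for uuid_element in identifiers.findall("UUID"):
        identifiers.remove(uuid_element)
    # Drop the assumed 'label' attribute from every remaining identifier.
    for child in identifiers:
        child.attrib.pop("label", None)
    return identifiers


if __name__ == "__main__":
    # Illustrative input only; the values do not refer to real records.
    source = (
        "<IDENTIFIERS>"
        '<PRIMARY_ID label="main">ERS000001</PRIMARY_ID>'
        "<UUID>00000000-0000-0000-0000-000000000000</UUID>"
        "</IDENTIFIERS>"
    )
    cleaned = strip_unsupported(ET.fromstring(source))
    print(ET.tostring(cleaned, encoding="unicode"))
```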

Other changes

  • LIBRARY_NAME has been made optional
  • TITLE has been added to run
  • DEFAULT_MEMBER has been added to sample POOL
  • GapDescriptorType has been changed