SRA XML 1.2 metadata format

The SRA 1.2 XML metadata format replaced the SRA 1.1 XML metadata format in November 2010 and is designed to be backwards compatible. For examples of how to prepare SRA XML, please refer to Preparing SRA XML metadata.

The SRA 1.2 metadata is expressed with the following XML schemas:

XML Schema | XML Schema Document | Description
SRA.submission.xsd | SRA.submission.pdf | A submission contains submission actions to be performed by the archive. For small studies the submission accession number can be quoted in place of the study accession number.
SRA.sample.xsd | SRA.sample.pdf | A sample contains information about the sample upon which the sequencing experiments are based. Samples can be used in any number of sequencing experiments.
SRA.study.xsd | SRA.study.pdf | A study contains information about the sequencing project. Studies can contain any number of sequencing experiments and analyses.
SRA.experiment.xsd | SRA.experiment.pdf | An experiment contains information about the sequencing experiment. Experiments are associated with runs, which contain the actual sequencing results.
SRA.run.xsd | SRA.run.pdf | A run contains the sequencing results from sequencing experiments. Each run can contain all or part of the results for a particular experiment. For example, each Illumina Genome Analyzer lane is typically represented by a single run.
SRA.analysis.xsd | SRA.analysis.pdf | An analysis contains secondary analysis results computed from the primary sequencing results.
SRA.common.xsd | SRA.common.pdf | Common types.
EGA.dac.xsd | EGA.dac.pdf | A European Genome-phenome Archive (EGA) data access committee (DAC). Required for authorized access submissions.
EGA.policy.xsd | EGA.policy.pdf | A European Genome-phenome Archive (EGA) data access policy. Required for authorized access submissions.
EGA.dataset.xsd | EGA.dataset.pdf | A European Genome-phenome Archive (EGA) data set. Required for authorized access submissions.

Deprecated fields in SRA Experiment

  • EXPERIMENT.DESIGN.LIBRARY_DESCRIPTOR.LIBRARY_SOURCE value 'NON GENOMIC'
  • EXPERIMENT.DESIGN.LIBRARY_DESCRIPTOR.LIBRARY_STRATEGY value 'BARCODE'
  • EXPERIMENT.PROCESSING.BASE_CALLS element
  • EXPERIMENT.PROCESSING.QUALITY_SCORES element
  • EXPERIMENT.expected_number_spots attribute
  • EXPERIMENT.expected_number_reads attribute
  • EXPERIMENT.DESIGN.SPOT_DESCRIPTOR.SPOT_DECODE_METHOD element
  • EXPERIMENT.DESIGN.SPOT_DESCRIPTOR.SPOT_DECODE_SPEC.NUMBER_OF_READS_PER_SPOT element
  • EXPERIMENT.PLATFORM.ILLUMINA.CYCLE_SEQUENCE element
  • EXPERIMENT.PLATFORM.ILLUMINA.CYCLE_COUNT element
  • EXPERIMENT.PLATFORM.ILLUMINA.INSTRUMENT_MODEL value 'Solexa 1G Genome Analyzer'
  • EXPERIMENT.PLATFORM.ABI_SOLID.CYCLE_COUNT element
  • EXPERIMENT.PLATFORM.LS454.instrument_model values 'GS 20', 'GS FLX', '454 Titanium'

Deprecated fields in SRA Run

  • RUN.DATA_BLOCK.total_spots attribute
  • RUN.DATA_BLOCK.total_reads attribute
  • RUN.DATA_BLOCK.number_channels attribute
  • RUN.DATA_BLOCK.format_code attribute
  • RUN.instrument_model attribute
  • RUN.run_file attribute
  • RUN.total_data_blocks attribute

Deprecated fields in SRA Study

  • STUDY.DESCRIPTOR.CENTER_NAME element
  • STUDY.DESCRIPTOR.PROJECT_ID element: STUDY.DESCRIPTOR.RELATED_STUDIES.RELATED_STUDY should be used instead
  • STUDY.DESCRIPTOR.RELATED_STUDIES.STUDY element

Deprecated fields in SRA Submission

  • SUBMISSION.submission_id attribute
  • SUBMISSION.handle attribute
  • SUBMISSION.ACTIONS.ACTION.HOLD.HoldForPeriod attribute

Unsupported fields in SRA Run

The SPOT_DESCRIPTOR, PLATFORM and PROCESSING elements have been added to SRA 1.2 Run but are not currently supported by the SRA at EBI. Please specify this information in the SRA Experiment instead.

New mandatory fields in SRA Experiment

  • EXPERIMENT.PLATFORM.ILLUMINA.SEQUENCE_LENGTH element
  • EXPERIMENT.PLATFORM.ABI_SOLID.SEQUENCE_LENGTH element
  • EXPERIMENT.DESIGN.SPOT_DESCRIPTOR.SPOT_DECODE_SPEC.SPOT_LENGTH element (for the ILLUMINA and ABI_SOLID platforms)
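
As an illustration of the new mandatory fields, the sketch below reports which of them are absent from an experiment. It assumes the EXPERIMENT element is the parsed root and checks only for the presence of the elements named in the list, not their content. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Paths relative to EXPERIMENT, taken from the list above.
SPOT_LENGTH = "DESIGN/SPOT_DESCRIPTOR/SPOT_DECODE_SPEC/SPOT_LENGTH"
REQUIRED_BY_PLATFORM = {
    "ILLUMINA": ["PLATFORM/ILLUMINA/SEQUENCE_LENGTH", SPOT_LENGTH],
    "ABI_SOLID": ["PLATFORM/ABI_SOLID/SEQUENCE_LENGTH", SPOT_LENGTH],
}


def missing_mandatory(experiment_xml):
    """Return the newly mandatory paths that are absent from an EXPERIMENT."""
    exp = ET.fromstring(experiment_xml)
    missing = []
    for platform, paths in REQUIRED_BY_PLATFORM.items():
        if exp.find(f"PLATFORM/{platform}") is None:
            continue  # rule does not apply to this platform
        missing += [p for p in paths if exp.find(p) is None]
    return missing
```

An experiment on any other platform yields an empty list, since the list above names only ILLUMINA and ABI_SOLID.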

New platforms in SRA Experiment

  • COMPLETE_GENOMICS
  • PACBIO_SMRT

New instrument models in SRA Experiment

  • Illumina Genome Analyzer IIx
  • Illumina HiSeq 2000
  • AB SOLiD 4 System
  • AB SOLiD 4hq System
  • AB SOLiD PI System
  • 454 GS Junior
  • 454 GS FLX Titanium

New file types in SRA Run

  • bam: BAM file submissions are now supported

New library strategy and selection terms in SRA Experiment

  • Methylation-Sensitive Restriction Enzyme sequencing strategy:
    <LIBRARY_STRATEGY>MRE-Seq</LIBRARY_STRATEGY>
    <LIBRARY_SELECTION>Restriction Digest</LIBRARY_SELECTION>
    
  • Methylated DNA Immunoprecipitation sequencing strategy:
    <LIBRARY_STRATEGY>MeDIP-Seq</LIBRARY_STRATEGY>
    <LIBRARY_SELECTION>5-methylcytidine antibody</LIBRARY_SELECTION>
    
  • RNA-Seq strategy as a general choice for sequencing that targets total RNA:
    <LIBRARY_STRATEGY>RNA-Seq</LIBRARY_STRATEGY>
    <LIBRARY_SELECTION>CAGE</LIBRARY_SELECTION>
    
    <LIBRARY_STRATEGY>RNA-Seq</LIBRARY_STRATEGY>
    <LIBRARY_SELECTION>RAGE</LIBRARY_SELECTION>
    
    <LIBRARY_STRATEGY>RNA-Seq</LIBRARY_STRATEGY>
    <LIBRARY_SELECTION>size fractionation</LIBRARY_SELECTION>
    
  • Direct sequencing of methylated fractions sequencing strategy:
    <LIBRARY_STRATEGY>MBD-Seq</LIBRARY_STRATEGY>
    <LIBRARY_SELECTION>MBD2 protein methyl-CpG binding domain</LIBRARY_SELECTION>
    
  • Whole exome sequencing strategy:
    <LIBRARY_STRATEGY>WXS</LIBRARY_STRATEGY>
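
The strategy and selection terms above can be generated programmatically. This is a sketch only: the pairing check mirrors how the terms are grouped in this list, and the schema itself may well allow other combinations. The fragment it produces is also incomplete, since a real LIBRARY_DESCRIPTOR carries further elements such as LIBRARY_SOURCE. The names NEW_TERMS and library_descriptor are hypothetical.

```python
import xml.etree.ElementTree as ET

# Strategy -> selections shown together in the list above.
# WXS is listed without a selection term.
NEW_TERMS = {
    "MRE-Seq": ["Restriction Digest"],
    "MeDIP-Seq": ["5-methylcytidine antibody"],
    "RNA-Seq": ["CAGE", "RAGE", "size fractionation"],
    "MBD-Seq": ["MBD2 protein methyl-CpG binding domain"],
    "WXS": [],
}


def library_descriptor(strategy, selection=None):
    """Build a partial LIBRARY_DESCRIPTOR from one of the listed pairings."""
    if strategy not in NEW_TERMS:
        raise ValueError(f"not one of the new strategies: {strategy}")
    if selection is not None and selection not in NEW_TERMS[strategy]:
        raise ValueError(f"{selection!r} is not listed with {strategy}")
    desc = ET.Element("LIBRARY_DESCRIPTOR")
    ET.SubElement(desc, "LIBRARY_STRATEGY").text = strategy
    if selection is not None:
        ET.SubElement(desc, "LIBRARY_SELECTION").text = selection
    return ET.tostring(desc, encoding="unicode")
```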
    

New library source terms in SRA Experiment

  • TRANSCRIPTOMIC
  • METAGENOMIC

Made read label optional in SRA Experiment

EXPERIMENT.DESIGN.SAMPLE_DESCRIPTOR.POOL.MEMBER.READ_LABEL is now optional to support non-barcoded sample pools which don't require read labels.
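
A sample pool with and without read labels might be built as follows. This is a sketch under one notable assumption: that each MEMBER refers to its sample through a refname attribute, which the text above does not state. The sample names and the label value in the usage are placeholders, and build_pool is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_pool(members):
    """Build a POOL element from (sample_ref, read_label) pairs.

    read_label may be None for non-barcoded pools, in which case the
    READ_LABEL child is simply left out.
    ASSUMPTION: the sample reference is carried in a 'refname' attribute.
    """
    pool = ET.Element("POOL")
    for sample_ref, read_label in members:
        member = ET.SubElement(pool, "MEMBER", {"refname": sample_ref})
        if read_label is not None:
            ET.SubElement(member, "READ_LABEL").text = read_label
    return pool
```

For example, build_pool([("sample_a", None), ("sample_b", None)]) yields a pool whose members carry no READ_LABEL at all.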

Added targeted loci in SRA Experiment

New element EXPERIMENT.DESIGN.LIBRARY_DESCRIPTOR.TARGETED_LOCI.LOCUS should be used to describe which loci are targeted by the experiment:

  • 16S rRNA
  • exome
  • other

When other is used, the targeted locus should be described using the description attribute.
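
The rule for other can be enforced when the element is generated. In this sketch the description attribute comes from the text above, while the locus_name attribute used to carry the locus itself is an assumption and should be checked against SRA.experiment.xsd. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

LOCI = {"16S rRNA", "exome", "other"}


def targeted_loci(loci):
    """Build TARGETED_LOCI from (name, description) pairs.

    ASSUMPTION: the locus is named by a 'locus_name' attribute; only the
    'description' attribute is mentioned in the text.
    """
    parent = ET.Element("TARGETED_LOCI")
    for name, description in loci:
        if name not in LOCI:
            raise ValueError(f"unknown locus: {name}")
        if name == "other" and not description:
            raise ValueError("'other' needs a description")
        locus = ET.SubElement(parent, "LOCUS", {"locus_name": name})
        if description:
            locus.set("description", description)
    return ET.tostring(parent, encoding="unicode")
```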

Added library pooling strategy in SRA Experiment

New element EXPERIMENT.DESIGN.LIBRARY_DESCRIPTOR.POOLING_STRATEGY should be used to indicate the sample multiplexing intent of the submitter:

  • none: There is a one-to-one correspondence with sample and library (normal case).
  • simple pool: The sequencing is done on a pool of identified samples which cannot be distinguished in the sequencing result.
  • multiplexed samples: A library was prepared of multiplexed samples each of which can be distinguished in the sequencing result through a molecular barcode or other indicator.
  • multiplexed libraries: Multiple libraries were prepared each of which can be distinguished in the sequencing result through a molecular barcode or other indicator. Each library may be made from the same or different samples.
  • spiked library: One library is prepared with an oligonucleotide sequence included that when sequenced can help provide quality control for the library.
  • other
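
The pooling strategy of an experiment can be read back and checked against the values listed above. The sketch assumes that the EXPERIMENT element is the parsed root and that the value is carried as the text of POOLING_STRATEGY. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

POOLING_STRATEGIES = (
    "none", "simple pool", "multiplexed samples",
    "multiplexed libraries", "spiked library", "other",
)


def pooling_strategy(experiment_xml):
    """Return the declared pooling strategy, or None if the element is absent."""
    exp = ET.fromstring(experiment_xml)
    el = exp.find("DESIGN/LIBRARY_DESCRIPTOR/POOLING_STRATEGY")
    if el is None:
        return None
    # ASSUMPTION: the strategy is the element's text content.
    value = (el.text or "").strip()
    if value not in POOLING_STRATEGIES:
        raise ValueError(f"unknown pooling strategy: {value!r}")
    return value
```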

Added default expected base call length in SRA Experiment

Added default_length attribute to EXPECTED_BASECALL and EXPECTED_BASECALL_TABLE elements to define the default length for the expected base call.

Added processing pipeline in SRA Experiment

New element PIPELINE can be used to describe the data production pipeline by specifying the sequence of steps with program names and versions.

Changed study referencing in SRA Study

New element RELATED_STUDIES is intended to be used as a mechanism to associate SRA Studies with INSDC BioProjects and other resources that track studies. These include EGA and ArrayExpress at EBI and GEO and dbGaP at NCBI.

Added schema attribute to MODIFY element in SRA Submission

New attribute schema in the MODIFY element can be used to specify the type of SRA object being updated.
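
A MODIFY action carrying this attribute could be generated as sketched below. The attribute is written as schema, following the heading of this section. The source attribute, used here to name the file that holds the updated object, is an assumption, as are the values in the usage note. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def modify_action(schema, source):
    """Build an ACTION element holding one MODIFY child.

    ASSUMPTION: 'source' names the XML file with the updated object;
    only the 'schema' attribute is described in the text.
    """
    action = ET.Element("ACTION")
    ET.SubElement(action, "MODIFY", {"schema": schema, "source": source})
    return action
```

For instance, modify_action("experiment", "experiment.xml") returns an ACTION element that can be appended to SUBMISSION.ACTIONS.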