Read domain 1.1 XML metadata format

The SRA 1.1 XML metadata format was replaced by the SRA XML 1.2 metadata format in November 2010.

XML Schema | XML Schema Document | Description
SRA.submission.xsd | SRA.submission.pdf | A submission contains submission actions to be performed by the archive. For small studies the submission accession number can be quoted in place of the study accession number.
SRA.sample.xsd | SRA.sample.pdf | A sample contains information about the sample upon which the sequencing experiments are based. Samples can be used in any number of sequencing experiments.
SRA.study.xsd | SRA.study.pdf | A study contains information about the sequencing project. Studies can contain any number of sequencing experiments and analyses (new to SRA 1.1).
SRA.experiment.xsd | SRA.experiment.pdf | An experiment contains information about the sequencing experiment. Experiments are associated with runs which contain the actual sequencing results.
SRA.run.xsd | SRA.run.pdf | A run contains the sequencing results from sequencing experiments. Each run can contain all or part of the results for a particular experiment. For example, each Illumina Genome Analyzer lane is typically represented by a single run.
SRA.analysis.xsd | SRA.analysis.pdf | An analysis contains secondary analysis results computed from the primary sequence results.

Main changes to the SRA format since SRA 1.0

Addition of center_name and refcenter attributes in study, sample, experiment and run

Center name allows for establishing a namespace for the submitted objects; object aliases are required to be unique for each submitting center. The center_name should now be used together with alias when submitting SRA objects. When making references between SRA objects, e.g. when referring to a sample from an experiment, the refname should now be used together with refcenter.
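The namespace idea above can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal illustration, not the archive's own tooling: the attribute names alias, center_name, refname and refcenter come from the text, while the SAMPLE_DESCRIPTOR element name, the helper functions and the values "sample_001" and "EXAMPLE_CENTER" are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def make_sample(alias: str, center_name: str) -> ET.Element:
    """Build a SAMPLE element identified by its alias within a center's namespace."""
    return ET.Element("SAMPLE", {"alias": alias, "center_name": center_name})


def make_sample_reference(refname: str, refcenter: str) -> ET.Element:
    """Build a reference to a sample. SAMPLE_DESCRIPTOR is an assumed element name."""
    return ET.Element("SAMPLE_DESCRIPTOR", {"refname": refname, "refcenter": refcenter})


def resolves_to(reference: ET.Element, target: ET.Element) -> bool:
    """A reference matches only if both the name and the center agree."""
    return (
        reference.get("refname") == target.get("alias")
        and reference.get("refcenter") == target.get("center_name")
    )


# Hypothetical objects from one submitting center.
sample = make_sample("sample_001", "EXAMPLE_CENTER")
experiment = ET.Element("EXPERIMENT", {"alias": "exp_001", "center_name": "EXAMPLE_CENTER"})
experiment.append(make_sample_reference("sample_001", "EXAMPLE_CENTER"))

print(ET.tostring(experiment, encoding="unicode"))
```

Because the center is part of the lookup key, two centers can both use the alias "sample_001" without their references colliding.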

Addition of new study types in study

The following study types have been added to study:

  • RNASeq
  • Other

The Other study type can be used together with the new_study_type attribute to suggest new study types.
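As a rough sketch of how a suggestion might be attached, the Python snippet below builds a study type element. Only the Other value and the new_study_type attribute are taken from the text; the STUDY_TYPE element name, the existing_study_type attribute name, the rule that a suggestion is rejected for any type other than Other, and the suggested value "Exome Sequencing" are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_study_type(study_type: str, new_study_type: Optional[str] = None) -> ET.Element:
    """Build a STUDY_TYPE element, allowing a suggestion only alongside 'Other'."""
    if new_study_type is not None and study_type != "Other":
        # Assumed rule: a suggestion only makes sense with the Other study type.
        raise ValueError("new_study_type is only meaningful with the Other study type")
    attributes = {"existing_study_type": study_type}  # assumed attribute name
    if new_study_type is not None:
        attributes["new_study_type"] = new_study_type
    return ET.Element("STUDY_TYPE", attributes)


# Hypothetical suggestion of a study type that is not yet in the list.
element = make_study_type("Other", "Exome Sequencing")
print(ET.tostring(element, encoding="unicode"))
```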

Addition of new instrument models in experiment

New instrument values have been added to experiment:

  • 454 XLR Titanium
  • Illumina Genome Analyzer II
  • AB SOLiD System 2.0
  • AB SOLiD System 3.0

Deprecation of instrument model in run

The use of instrument model in run has been deprecated.

Addition of new library selection values in experiment

New library selection values have been added to experiment:

  • Hybrid Selection
  • DNAse

Addition of new library strategy values in experiment

New library strategy values have been added to experiment:

  • Bisulfite-Seq
  • DNAse-Hypersensitivity

Deprecation of BARCODE library strategy in experiment

BARCODE has been deprecated as a library strategy as it pertains to a pooling strategy rather than a library strategy.

Addition of new library selection values in experiment

A new library selection value has been added to experiment:

  • Reduced Representation

Addition of SCIENTIFIC_NAME and INDIVIDUAL_NAME in sample

The new SCIENTIFIC_NAME element is the name or synonym of the organism from the INSDC Taxonomy database. This field can be used to confirm the TAXON_ID or to suggest new organisms to be added to the INSDC Taxonomy database.

The new INDIVIDUAL_NAME element can be used to identify an individual sample where appropriate (this is usually NOT appropriate for human subjects). Example: "Glennie" the platypus.

Addition of TITLE in sample

Sample objects can now have a title which may be used for display purposes. For example: E. coli K-12 MG1665 genomic sample.

Deprecation of sample members in sample

Sample pools are now specified in experiment using the POOL element. The SAMPLE.MEMBERS element has now been deprecated in sample. Multiplexed sample experiments where each sample is distinguishable by a barcode are listed by sample and barcode. Pooled samples are listed by sample only.

FILES element no longer required in submission

Submission objects may now be created without the FILES element.

Addition of PROTECT action in submission

The new PROTECT action has been added to support submission to EGA. When using the PROTECT action, submitters must authenticate using the EGA user name and password. The PROTECT action cannot be used when submitting to SRA.

Deprecation of HoldUntilPublication in submission

The HoldUntilPublication attribute is no longer supported. Please use HoldUntilDate instead.

Deprecation of submission_id and addition of alias in submission

The new alias attribute has been added to submission to be consistent with other objects. The alias should be used in place of submission_id which is now deprecated.

Deprecation of run_file and total_data_blocks attributes in run

The run_file and total_data_blocks attributes have been deprecated. The run_file was never used effectively and the total_data_blocks attribute is not required.

Addition of serial attribute to DATA_BLOCK in run

The new DATA_BLOCK.serial attribute allows multiple DATA_BLOCKs to be loaded by indicating the load order. Loaders need this specification in order to work with multiple DATA_BLOCKs.
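The load-order idea can be sketched as follows. This is an illustration of sorting by the serial attribute, not the archive's loader: the RUN layout, the FILE child and the filename values are hypothetical, and the sketch assumes serial holds an integer.

```python
import xml.etree.ElementTree as ET

# Hypothetical run with three data blocks listed out of order.
RUN_XML = """
<RUN alias="run_001" center_name="EXAMPLE_CENTER">
  <DATA_BLOCK serial="2"><FILE filename="part2.dat"/></DATA_BLOCK>
  <DATA_BLOCK serial="3"><FILE filename="part3.dat"/></DATA_BLOCK>
  <DATA_BLOCK serial="1"><FILE filename="part1.dat"/></DATA_BLOCK>
</RUN>
"""


def data_blocks_in_load_order(run: ET.Element) -> list:
    """Return the run's DATA_BLOCK elements sorted by their integer serial."""
    return sorted(run.findall("DATA_BLOCK"), key=lambda block: int(block.get("serial")))


run = ET.fromstring(RUN_XML)
ordered = data_blocks_in_load_order(run)
for block in ordered:
    print(block.get("serial"), block.find("FILE").get("filename"))
```

Sorting on int(serial) rather than on the raw string keeps block 10 after block 9 when a run has many data blocks.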

Addition of new filetypes to DATA_BLOCK in run

Please refer to RUN XML for these file formats.

Addition of READ_LABEL element to DATA_BLOCK in run

The new DATA_BLOCK.FILE.READ_LABEL element makes it possible to submit different reads of the spot in separate files.

Addition of DATA_SERIES_LABEL element to DATA_BLOCK in run

The new DATA_BLOCK.FILE.DATA_SERIES_LABEL element makes it possible to submit different data series (e.g. base calls and quality) in separate files.

Addition of checksum attribute to DATA_BLOCK in run

The new DATA_BLOCK.FILE.checksum attribute can be used to specify the checksum of the file that will be presented to the data loader. Please note that we currently only support MD5 checksums. All submitters are advised to start using DATA_BLOCK.FILE.checksum from now on.
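One way a submitter might compute the MD5 value for a data file is shown below, using Python's standard hashlib module. The function name md5_checksum and the chunk size are choices made for this sketch; whether the archive expects the lowercase hexadecimal digest produced here is an assumption.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 digest of a file as a lowercase hexadecimal string.

    The file is read in chunks so that large sequence files do not have
    to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string could then be placed in the checksum attribute of the corresponding FILE element before the metadata is submitted.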

Addition of quality_scoring_system attribute to DATA_BLOCK in run

The new DATA_BLOCK.FILE.quality_scoring_system attribute allows the submitter to specify that the incoming data is in log-odds form. This will help the data loader to correctly process the base quality values.
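To illustrate why a loader needs to know the scoring system, the sketch below converts a log-odds quality value to the Phred scale. It assumes the log-odds score is defined as Q = -10 log10(p / (1 - p)), where p is the error probability, and that the Phred score is Q = -10 log10(p); the text does not state which definitions the loader uses, so treat this as an assumption.

```python
import math


def log_odds_to_phred(q_log_odds: float) -> float:
    """Convert a log-odds quality score to the Phred scale.

    From Q_lo = -10*log10(p / (1 - p)) it follows that
    p = 1 / (1 + 10**(Q_lo / 10)), so the Phred score
    -10*log10(p) equals 10*log10(1 + 10**(Q_lo / 10)).
    """
    return 10.0 * math.log10(10.0 ** (q_log_odds / 10.0) + 1.0)


def phred_to_error_probability(q_phred: float) -> float:
    """Return the error probability implied by a Phred score."""
    return 10.0 ** (-q_phred / 10.0)


for q in (-5, 0, 10, 40):
    print(q, round(log_odds_to_phred(q), 3))
```

The two scales nearly coincide for high scores and diverge for low ones, which is why reading one as the other mainly distorts low-quality bases.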

Addition of quality_encoding attribute to DATA_BLOCK in run

The new DATA_BLOCK.FILE.quality_encoding attribute indicates whether the quality string is ASCII character, decimal, or hexadecimal encoded. This will help the data loader to correctly process the base quality values.

Addition of ascii_offset attribute to DATA_BLOCK in run

The new DATA_BLOCK.FILE.ascii_offset attribute allows for the specification of the basis value (the zero) in the quality values. This will help the data loader to correctly process the base quality values.
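The interplay of quality_encoding and ascii_offset can be sketched as a small decoder. The literal attribute values "ascii", "decimal" and "hexadecimal", the use of a single character such as "!" as the offset, and whitespace-separated numbers for the two numeric encodings are all assumptions made for this example; the actual loader may differ.

```python
from typing import List, Optional


def decode_qualities(raw: str, quality_encoding: str,
                     ascii_offset: Optional[str] = None) -> List[int]:
    """Decode a quality string into integer quality values.

    quality_encoding: "ascii", "decimal" or "hexadecimal" (assumed spellings).
    ascii_offset: the character representing quality zero, e.g. "!" (assumed form).
    """
    if quality_encoding == "ascii":
        if ascii_offset is None or len(ascii_offset) != 1:
            raise ValueError("ascii encoding needs a single-character ascii_offset")
        zero = ord(ascii_offset)
        return [ord(character) - zero for character in raw]
    if quality_encoding == "decimal":
        return [int(token) for token in raw.split()]
    if quality_encoding == "hexadecimal":
        return [int(token, 16) for token in raw.split()]
    raise ValueError(f"unknown quality_encoding: {quality_encoding!r}")


# The same four quality values in three encodings.
print(decode_qualities("II5!", "ascii", "!"))
print(decode_qualities("40 40 20 0", "decimal"))
print(decode_qualities("28 28 14 00", "hexadecimal"))
```

The example shows why the offset matters: the string "II5!" decodes to different values if "@" rather than "!" is taken as the zero point.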

Addition of member_name attribute to DATA_BLOCK in run

The new DATA_BLOCK.member_name attribute allows an individual data block among several to be associated with a member of the sample pool.

Deprecation of SPOT_DECODE_METHOD and NUMBER_OF_READS_PER_SPOT in experiment

The SPOT_DECODE_METHOD and NUMBER_OF_READS_PER_SPOT elements are no longer supported in experiment.

Addition of READ_LABEL element to READ_SPEC in experiment

The new READ_SPEC.READ_LABEL element allows for the naming of individual reads in the spot. This makes it possible to submit different reads of the spot in separate files.

Addition of EXPECTED_BASECALL_TABLE to READ_SPEC in experiment

The new READ_SPEC.EXPECTED_BASECALL_TABLE specifies the expected base calls for a given read. The read_group_tag attribute can be used to associate a given expected base call with a particular sample in the sample pool.
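A possible use of the table is to assign a barcode read to a pool member, sketched below. EXPECTED_BASECALL_TABLE, READ_LABEL and read_group_tag are named in the text; the BASECALL child element, the exact-match lookup, and the barcode sequences and tag values are hypothetical and only illustrate the association.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical read specification for a barcode read.
READ_SPEC_XML = """
<READ_SPEC>
  <READ_LABEL>barcode</READ_LABEL>
  <EXPECTED_BASECALL_TABLE>
    <BASECALL read_group_tag="pool_member_A">ACGT</BASECALL>
    <BASECALL read_group_tag="pool_member_B">TGCA</BASECALL>
  </EXPECTED_BASECALL_TABLE>
</READ_SPEC>
"""


def basecall_lookup(read_spec: ET.Element) -> Dict[str, str]:
    """Map each expected base call sequence to its read_group_tag."""
    table = read_spec.find("EXPECTED_BASECALL_TABLE")
    return {
        basecall.text.strip(): basecall.get("read_group_tag")
        for basecall in table.findall("BASECALL")
    }


def assign_read_group(barcode_read: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the tag for an exactly matching barcode, or None if unexpected."""
    return lookup.get(barcode_read)


lookup = basecall_lookup(ET.fromstring(READ_SPEC_XML))
print(assign_read_group("TGCA", lookup))
print(assign_read_group("GGGG", lookup))
```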

Addition of RELATIVE_ORDER to READ_SPEC in experiment

The new READ_SPEC.RELATIVE_ORDER element can be used to specify that a read is to be found before or after the specified read. This choice is appropriate for example when specifying a read that follows a variable length expected sequence.

Changes to PLATFORM in experiment

The following elements should be used for 454: KEY_SEQUENCE, FLOW_SEQUENCE, FLOW_COUNT. The SEQUENCE_LENGTH element should be used for Illumina and SOLiD to represent the number of bases or colors in the raw sequence including all technical and application reads. The CYCLE_SEQUENCE and CYCLE_COUNT are now deprecated.
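As a small worked example of the SEQUENCE_LENGTH definition, the value is the sum of the lengths of every read in the raw sequence, technical and application alike. The read labels and lengths below are hypothetical.

```python
from typing import Dict


def sequence_length(read_lengths: Dict[str, int]) -> int:
    """Total number of bases or colors across all reads in the spot."""
    return sum(read_lengths.values())


# Hypothetical paired-end layout with a technical barcode read.
reads = {
    "forward": 36,  # application read
    "barcode": 6,   # technical read
    "reverse": 36,  # application read
}
print(sequence_length(reads))  # 36 + 6 + 36 = 78
```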

Addition of HELICOS to PLATFORM in experiment

The new instrument model choice HELICOS has been added to PLATFORM in experiment.

Deprecation of QUALITY_SCORES.NUMBER_OF_LEVELS and QUALITY_SCORES.MULTIPLIER elements in experiment

The QUALITY_SCORES.NUMBER_OF_LEVELS and QUALITY_SCORES.MULTIPLIER elements have been deprecated in experiment.