Preparing XMLs

Introduction

Sequence read submissions consist of metadata XML documents and read data files. The metadata XML documents are submitted using a programmatic REST interface, while the data files are uploaded using one of the supported file upload protocols. Please note that for a data file to be submitted into ENA, it must be referenced by an XML document after it has been uploaded.

The metadata XMLs can be created directly and submitted using the programmatic submission service. When using interactive Webin, the metadata XMLs are created transparently for the user from the information provided through Webin.

The goal of this document is to give sufficient information to submitters to be able to create the metadata XML documents required for programmatic submissions. Links to the latest XML schemas and brief descriptions of the metadata objects are available here.

Please find examples of the metadata XML documents below.

Any questions should be directed to datasubs@ebi.ac.uk.

Raw data submissions

A typical sequence read submission consists of five XMLs: Submission, Study, Sample, Experiment and Run. A submission does not have to contain all five XMLs. For example, it is possible to submit only samples or studies to be referenced in the future. Please note that whatever the submission scenario, you will always require a Submission XML.

When technical reads (e.g. barcodes, adaptors or linkers) are included in the submitted raw sequences, a spot descriptor must be submitted to describe the position of the technical reads so that they can be removed. The following data files can be submitted without providing spot descriptor information in the experiment/run XML:

  • BAM files
  • SFF files (single reads without barcodes)
  • Fastq files (single reads without any technical reads)
  • Complete Genomics files

Analysis data submissions

A typical sequence read analysis data submission consists of 4 XMLs: Submission, Study, Sample and Analysis XML. Currently, we accept two different types of analysis data submissions:

  • BAM files (for read alignments)
  • VCF files (for sequence variations)

In both cases samples must be created to refer to the samples used within the BAM and VCF files.

Identifying objects

Every object is uniquely identified within the submission account using the alias attribute. Once an object has been submitted, no other object of the same type can use the same alias within the submission account. The aliases are used in submissions to make references between different objects. One object references another object's alias using the refname attribute. For example, if a sample has the alias "sample1", an experiment can reference this sample by using refname="sample1".
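
The alias and refname mechanism can be checked locally before submission. The following is a minimal Python sketch and not part of the ENA tooling: the function name unresolved_sample_refs, the center name EXAMPLE_CENTER and the stripped-down XML fragments are assumptions made for illustration, and the fragments omit elements that the schemas require. It lists experiments whose SAMPLE_DESCRIPTOR refname does not match any sample alias in the same batch.

```python
import xml.etree.ElementTree as ET

# Stripped-down fragments for illustration only (not schema-complete).
SAMPLE_XML = """<SAMPLE_SET>
  <SAMPLE alias="sample1" center_name="EXAMPLE_CENTER"/>
</SAMPLE_SET>"""

EXPERIMENT_XML = """<EXPERIMENT_SET>
  <EXPERIMENT alias="exp1" center_name="EXAMPLE_CENTER">
    <DESIGN><SAMPLE_DESCRIPTOR refname="sample1"/></DESIGN>
  </EXPERIMENT>
  <EXPERIMENT alias="exp2" center_name="EXAMPLE_CENTER">
    <DESIGN><SAMPLE_DESCRIPTOR refname="sample2"/></DESIGN>
  </EXPERIMENT>
</EXPERIMENT_SET>"""


def unresolved_sample_refs(sample_xml, experiment_xml):
    """Map experiment alias -> refname for sample references with no matching alias."""
    aliases = {s.get("alias") for s in ET.fromstring(sample_xml).iter("SAMPLE")}
    missing = {}
    for experiment in ET.fromstring(experiment_xml).iter("EXPERIMENT"):
        for descriptor in experiment.iter("SAMPLE_DESCRIPTOR"):
            refname = descriptor.get("refname")
            if refname is not None and refname not in aliases:
                missing[experiment.get("alias")] = refname
    return missing


print(unresolved_sample_refs(SAMPLE_XML, EXPERIMENT_XML))
```

A reference reported by this check is not necessarily an error, since it may point to a sample that was submitted earlier under the same account.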

Identifying submitters

The center_name attribute defines the submitting institution. The center names are controlled acronyms provided to the account holders when the account is first generated for an institute. If the submitter is brokering a submission for another institute, the center name should reflect the institute where the data was generated. Brokers should request a special broker account and provide their center name acronym in the broker_name attribute. If the sequencing has been contracted to another party, the run_center or analysis_center attributes can be used to provide this information.

XML Examples

Submission XML

The submission XML is used to validate, submit or update any number of other objects. The submission XML refers to other XMLs and contains directions for public release.

New submissions use the ADD action to submit new objects. Object updates are done using the MODIFY action and objects can be validated using the VALIDATE action.

A submission can be made immediately public by using the RELEASE action or kept confidential for up to two years by using the HOLD action. If neither of these actions is used then the submission is made immediately public. The RELEASE action can also be used to publish objects that were previously submitted using the HOLD action. Read data is attached to run or analysis objects and is made public when the associated study object is released. Once a submission has been released it can be withdrawn from public access only by contacting us at datasubs@ebi.ac.uk.
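
The two-year limit on the HOLD action can be checked before the submission XML is written. The following is a minimal Python sketch, assuming that "two years" means two calendar years counted from the day of submission; the function names latest_hold_date and hold_action are hypothetical, and the server may apply the limit differently.

```python
from datetime import date
import xml.etree.ElementTree as ET


def latest_hold_date(today):
    """Two calendar years after `today` (29 Feb falls back to 28 Feb)."""
    try:
        return today.replace(year=today.year + 2)
    except ValueError:
        return today.replace(year=today.year + 2, day=28)


def hold_action(until, today):
    """Return an ACTION element holding data until `until`, or raise ValueError."""
    if not today < until <= latest_hold_date(today):
        raise ValueError("hold date must be in the future and within two years")
    action = ET.Element("ACTION")
    ET.SubElement(action, "HOLD", HoldUntilDate=until.isoformat())
    return ET.tostring(action, encoding="unicode")


print(hold_action(date(2011, 6, 30), today=date(2010, 1, 1)))
```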

An example of a submission XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<SUBMISSION_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.submission.xsd">
    <SUBMISSION alias="TODO: UNIQUE NAME FOR SUBMISSION"
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">
        <ACTIONS>
            <ACTION>
                <ADD source="TODO: STUDY XML FILENAME" schema="study"/>
            </ACTION>
            <ACTION>
                <ADD source="TODO: SAMPLE XML FILENAME" schema="sample"/>
            </ACTION>
            <ACTION>
                <ADD source="TODO: EXPERIMENT XML FILENAME" schema="experiment"/>
            </ACTION>
            <ACTION>
                <ADD source="TODO: RUN XML FILENAME" schema="run"/>
            </ACTION>
            <ACTION>
                <ADD source="TODO: ANALYSIS XML FILENAME" schema="analysis"/>
            </ACTION>
            <!-- Use ADD actions to refer to the XML files being submitted. -->
            <!-- Remove any ADD actions that are not required (e.g. when
                 referencing previously submitted studies or samples). -->
            <ACTION>
                <RELEASE/>
                <HOLD HoldUntilDate="TODO: hold until date 2010-01-01"/>
                <!-- Choose ONE action. RELEASE is for immediate public release.
                     HOLD is for an embargo (maximum 2 years). -->
            </ACTION>                            
        </ACTIONS>
    </SUBMISSION>
</SUBMISSION_SET>
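
A submission XML like the template above can also be generated programmatically. The following is a minimal Python sketch using the standard library: the function name build_submission, the alias sub1, the center name EXAMPLE_CENTER and the file names are assumptions made for illustration, and the sketch leaves out the XML declaration and the xsi schema location attributes shown in the template.

```python
import xml.etree.ElementTree as ET


def build_submission(alias, center_name, sources, hold_until=None):
    """Build a SUBMISSION_SET with one ADD action per (schema, filename) pair.

    Exactly one release action is written: HOLD when `hold_until`
    (a YYYY-MM-DD string) is given, RELEASE otherwise.
    """
    root = ET.Element("SUBMISSION_SET")
    submission = ET.SubElement(root, "SUBMISSION", alias=alias, center_name=center_name)
    actions = ET.SubElement(submission, "ACTIONS")
    for schema, filename in sources.items():
        action = ET.SubElement(actions, "ACTION")
        ET.SubElement(action, "ADD", source=filename, schema=schema)
    release = ET.SubElement(actions, "ACTION")
    if hold_until is None:
        ET.SubElement(release, "RELEASE")
    else:
        ET.SubElement(release, "HOLD", HoldUntilDate=hold_until)
    return ET.tostring(root, encoding="unicode")


print(build_submission("sub1", "EXAMPLE_CENTER",
                       {"study": "study.xml", "sample": "sample.xml"}))
```

Only the schemas that are actually being submitted need to appear in the sources mapping, which mirrors the instruction to remove ADD actions that are not required.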

To publish metadata and data previously submitted using the HOLD action, use the RELEASE action with a target pointing to each corresponding object. For example, the following XML publishes the samples ERS001835 and ERS003039:

<?xml version="1.0" encoding="UTF-8"?>
<SUBMISSION_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.submission.xsd">
    <SUBMISSION alias="ReleaseSubmissionUpdate" center_name="SC">
        <ACTIONS>
            <ACTION>
                <RELEASE target="ERS001835"/>
            </ACTION>
            <ACTION>
                <RELEASE target="ERS003039"/>
            </ACTION>
        </ACTIONS>
    </SUBMISSION>
</SUBMISSION_SET>

Releasing run objects also makes the associated data public.

Study XML

The study XML is used to describe the study in some detail. The study contains a title, a study type and an abstract as it would appear in a publication.

An example of a study XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<STUDY_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.study.xsd">
    <STUDY alias="TODO: UNIQUE NAME FOR STUDY"
        center_name="TODO: center name abbreviation">
        <DESCRIPTOR>
            <STUDY_TITLE>TODO: STUDY TITLE AS IT COULD APPEAR IN A PUBLICATION</STUDY_TITLE>
            <STUDY_TYPE existing_study_type="TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END 
                OF XML"/>
            <STUDY_ABSTRACT>TODO: STUDY ABSTRACT AS IT COULD APPEAR IN A
                PUBLICATION</STUDY_ABSTRACT>
        </DESCRIPTOR>
        <STUDY_ATTRIBUTES>
            <STUDY_ATTRIBUTE>
                <TAG>TODO: TAG NAME</TAG>
                <VALUE>TODO: TAG VALUE</VALUE>
            </STUDY_ATTRIBUTE>
            <STUDY_ATTRIBUTE>
                <TAG>TODO: TAG NAME</TAG>
                <VALUE>TODO: TAG VALUE</VALUE>
            </STUDY_ATTRIBUTE>
            <!-- You can generate your own fields and values here using STUDY_ATTRIBUTE 
                tag-value pairs. Please delete any unused attributes and add as many as 
                required. -->
        </STUDY_ATTRIBUTES>
    </STUDY>
    <!-- If you are submitting more than one study, replicate the block <STUDY> to </STUDY> 
        here, as many times as necessary. -->
</STUDY_SET>
<!-- Controlled vocabulary for existing_study_type:
    Whole Genome Sequencing
    Metagenomics
    Transcriptome Analysis
    Resequencing
    Epigenetics
    Synthetic Genomics
    Forensic or Paleo-genomics
    Gene Regulation Study
    Cancer Genomics
    Population Genomics
    RNASeq
    Exome Sequencing
    Pooled Clone Sequencing
    Other (if using "Other", please add the new_study_type="TODO: add own term" attribute)
-->
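
The choice between existing_study_type and new_study_type can be expressed as a small lookup. The following is a minimal Python sketch, assuming the controlled vocabulary is exactly the list given in the comment above; the function name study_type_attributes and the term "Single Cell Sequencing" are hypothetical and used only for illustration.

```python
# Terms copied from the controlled vocabulary listed above ("Other" is handled separately).
STUDY_TYPES = {
    "Whole Genome Sequencing", "Metagenomics", "Transcriptome Analysis",
    "Resequencing", "Epigenetics", "Synthetic Genomics",
    "Forensic or Paleo-genomics", "Gene Regulation Study", "Cancer Genomics",
    "Population Genomics", "RNASeq", "Exome Sequencing",
    "Pooled Clone Sequencing",
}


def study_type_attributes(term):
    """Return the attribute dictionary for a STUDY_TYPE element."""
    if term in STUDY_TYPES:
        return {"existing_study_type": term}
    # Terms outside the list go under "Other" with a free-text new_study_type.
    return {"existing_study_type": "Other", "new_study_type": term}


print(study_type_attributes("Metagenomics"))
print(study_type_attributes("Single Cell Sequencing"))
```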

Please use the following notation when including PubMed citations in Study XML:

<STUDY_LINKS>
    <STUDY_LINK>
        <XREF_LINK>
            <DB>PUBMED</DB>
            <ID>18987735</ID>
        </XREF_LINK>
    </STUDY_LINK>
</STUDY_LINKS>

Please use the following notation when referring to projects in Study XML:

<RELATED_STUDIES>
    <RELATED_STUDY>
        <RELATED_LINK>
            <DB>PROJECT</DB>
            <ID>149</ID>
        </RELATED_LINK>
        <IS_PRIMARY>true</IS_PRIMARY>
    </RELATED_STUDY>
</RELATED_STUDIES>

Sample XML

The sample XML is used to describe the sequenced samples. The mandatory fields are minimal and include information about the taxonomy of the sample. However, since the sample is one of the most important objects to be described biologically, it is highly recommended that “TAG-VALUE” pairs are generated to describe the sample in as much detail as possible. We recommend the adoption of GSC (Genomic Standards Consortium) terms for the TAG names where possible. For a full list of terms in the specific standards please visit the GSC wiki.

An example of a sample XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.sample.xsd">
    <SAMPLE alias="TODO: UNIQUE NAME FOR SAMPLE" 
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">                     
        <TITLE>TODO: A SHORT INFORMATIVE DESCRIPTION OF THE SAMPLE</TITLE>
        <SAMPLE_NAME>
            <TAXON_ID>TODO: PROVIDE NCBI TAXID FOR ORGANISM (e.g. 9606 for human)
                 </TAXON_ID>
            <!-- For complete prokaryotic genomes, a taxid should be generated for the strain.
                 Please contact us so we can generate this on your behalf. -->
            <SCIENTIFIC_NAME>TODO: SCIENTIFIC NAME AS APPEARS IN NCBI TAXONOMY FOR THE
                TAXON_ID (e.g. Homo sapiens)</SCIENTIFIC_NAME>
            <COMMON_NAME>TODO: OPTIONAL COMMON NAME AS APPEARS IN NCBI TAXONOMY FOR 
                THE TAXON_ID (e.g. human)</COMMON_NAME>
        </SAMPLE_NAME>
        <DESCRIPTION>TODO: A LONGER DESCRIPTION OF SAMPLE AND HOW IT DIFFERS FROM 
            OTHER SAMPLES</DESCRIPTION>
        <SAMPLE_ATTRIBUTES>
            <SAMPLE_ATTRIBUTE>
                <TAG>TODO: TAG NAME</TAG>
                <VALUE>TODO: TAG VALUE</VALUE>
                <UNITS>TODO: OPTIONAL UNIT</UNITS>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>TODO: TAG NAME</TAG>
                <VALUE>TODO: TAG VALUE</VALUE>
                <UNITS>TODO: OPTIONAL UNIT</UNITS>
            </SAMPLE_ATTRIBUTE>
            <!-- You can generate your own fields and values here using SAMPLE_ATTRIBUTE 
                tag-value pairs. An example tag could be "Isolation Source" and the value 
                could be "Seawater". You can also use the UNITS element to include 
                scientific units. E.g., TAG "Age" VALUE "5" UNITS "Years". Please refer
                to online documentation for further help with sample tag-value pairs.
                Please delete any unused attributes and add as many as required. -->
        </SAMPLE_ATTRIBUTES>
    </SAMPLE>
    <!-- If you are submitting more than one sample, replicate the block <SAMPLE> to </SAMPLE> 
         here, as many times as necessary. -->
</SAMPLE_SET>
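
When many samples share the same structure, the tag-value pairs can be filled in from tabular data. The following is a minimal Python sketch using the standard library: the function name build_sample, the alias sample1, the center name EXAMPLE_CENTER and the title are assumptions made for illustration, while the example tags are the ones suggested in the template comment. The sketch omits the optional name and description elements and the xsi schema location attributes.

```python
import xml.etree.ElementTree as ET


def build_sample(alias, center_name, title, taxon_id, attributes):
    """Build a SAMPLE_SET with one SAMPLE.

    `attributes` is a list of (tag, value, units) tuples; use None for
    units when the attribute has no unit.
    """
    sample_set = ET.Element("SAMPLE_SET")
    sample = ET.SubElement(sample_set, "SAMPLE", alias=alias, center_name=center_name)
    ET.SubElement(sample, "TITLE").text = title
    name = ET.SubElement(sample, "SAMPLE_NAME")
    ET.SubElement(name, "TAXON_ID").text = str(taxon_id)
    attribute_list = ET.SubElement(sample, "SAMPLE_ATTRIBUTES")
    for tag, value, units in attributes:
        attribute = ET.SubElement(attribute_list, "SAMPLE_ATTRIBUTE")
        ET.SubElement(attribute, "TAG").text = tag
        ET.SubElement(attribute, "VALUE").text = value
        if units:
            # UNITS is optional, so it is only written when provided.
            ET.SubElement(attribute, "UNITS").text = units
    return sample_set


sample_set = build_sample(
    "sample1", "EXAMPLE_CENTER", "Example sample", 9606,
    [("Isolation Source", "Seawater", None), ("Age", "5", "Years")],
)
print(ET.tostring(sample_set, encoding="unicode"))
```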

Experiment XML

The experiment XML is used to describe the experimental setup, including instrument platform and model details, library preparation details, and any additional information required to correctly interpret the submitted data. Where any of these values differ between runs, a new experiment object must exist. Each experiment references a study and a sample by alias or, if previously submitted, by accession. Pooled data must be demultiplexed by barcode for submission.

Example experiment XMLs are provided below.

Experiment XML: Illumina single reads

<?xml version="1.0" encoding="UTF-8"?>
<EXPERIMENT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.experiment.xsd">
    <EXPERIMENT alias="TODO: UNIQUE NAME FOR EXPERIMENT"
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">
        <TITLE>TODO: TITLE OF EXPERIMENT</TITLE>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <DESIGN>
            <DESIGN_DESCRIPTION>TODO: DETAILS ABOUT THE SETUP AND GOALS OF THE 
                EXPERIMENT AS SUPPLIED BY INVESTIGATOR</DESIGN_DESCRIPTION>
            <SAMPLE_DESCRIPTOR refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE 
                OBJECT"/>
            <LIBRARY_DESCRIPTOR>
                <LIBRARY_NAME>TODO: NAME OF LIBRARY</LIBRARY_NAME>
                <LIBRARY_STRATEGY>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT 
                    END OF XML</LIBRARY_STRATEGY>
                <LIBRARY_SOURCE>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT 
                    END OF XML </LIBRARY_SOURCE>
                <LIBRARY_SELECTION>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT 
                    END OF XML </LIBRARY_SELECTION>
                <LIBRARY_LAYOUT>
                    <SINGLE/>
                </LIBRARY_LAYOUT>
                <LIBRARY_CONSTRUCTION_PROTOCOL>TODO: PROTOCOL BY WHICH THE LIBRARY WAS
                    CONSTRUCTED</LIBRARY_CONSTRUCTION_PROTOCOL>
            </LIBRARY_DESCRIPTOR>
            <!-- Please note that the spot descriptor is no longer required for most formats. -->
            <SPOT_DESCRIPTOR>
                <SPOT_DECODE_SPEC>
                    <SPOT_LENGTH>TODO: Expected number of base calls or cycles per spot 
                    (raw sequence length including all application and technical tags and mate pairs)
                    </SPOT_LENGTH>
                    <READ_SPEC>
                        <READ_INDEX>0</READ_INDEX>
                        <READ_CLASS>Application Read</READ_CLASS>
                        <READ_TYPE>Forward</READ_TYPE>
                        <BASE_COORD>1</BASE_COORD>
                    </READ_SPEC>
                </SPOT_DECODE_SPEC>
            </SPOT_DESCRIPTOR>
        </DESIGN>
        <PLATFORM>
            <ILLUMINA>
                <INSTRUMENT_MODEL>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT 
                END OF XML </INSTRUMENT_MODEL>
            </ILLUMINA>
        </PLATFORM>
        <PROCESSING/>
    </EXPERIMENT>
    <!-- If you are submitting more than one experiment, replicate the block <EXPERIMENT> 
        to </EXPERIMENT> here, as many times as necessary. -->
</EXPERIMENT_SET>
<!-- Controlled vocabulary for INSTRUMENT_MODEL (description in parentheses):
    Illumina Genome Analyzer
    Illumina Genome Analyzer II
    Illumina Genome Analyzer IIx
    Illumina HiSeq 2500
    Illumina HiSeq 2000
    Illumina HiSeq 1500
    Illumina HiSeq 1000
    Illumina MiSeq
    Illumina HiScanSQ
    HiSeq X Ten
    NextSeq 500
    unspecified
-->
<!-- Controlled vocabulary for LIBRARY STRATEGY (description in parentheses):
    WGS (Random sequencing of the whole genome)
    WGA (whole genome amplification to replace some instances of RANDOM)
    WXS (Random sequencing of exonic regions selected from the genome)
    RNA-Seq (Random sequencing of whole transcriptome)
    miRNA-Seq (for micro RNA and other small non-coding RNA sequencing)
    ncRNA-Seq (Non-coding RNA)
    WCS (Random sequencing of a whole chromosome or other replicon isolated
    from a genome)
    CLONE (Genomic clone based (hierarchical) sequencing)
    POOLCLONE (Shotgun of pooled clones (usually BACs and Fosmids))
    AMPLICON (Sequencing of overlapping or distinct PCR or RT-PCR products)
    CLONEEND (Clone end (5', 3', or both) sequencing)
    FINISHING (Sequencing intended to finish (close) gaps in existing coverage)
    ChIP-Seq (Direct sequencing of chromatin immunoprecipitates)
    MNase-Seq (Direct sequencing following MNase digestion)
    DNase-Hypersensitivity (Sequencing of hypersensitive sites, or segments of
    open chromatin that are more readily cleaved by DNaseI)
    Bisulfite-Seq (Sequencing following treatment of DNA with bisulfite to
    convert cytosine residues to uracil depending on methylation status)
    EST (Single pass sequencing of cDNA templates)
    FL-cDNA (Full-length sequencing of cDNA templates)
    CTS (Concatenated Tag Sequencing)
    MRE-Seq (Methylation-Sensitive Restriction Enzyme Sequencing strategy)
    MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing strategy)
    MBD-Seq (Direct sequencing of methylated fractions sequencing strategy)
    Tn-Seq (for gene fitness determination through transposon seeding)
    VALIDATION (CGHub special request: Independent experiment to re-evaluate putative variants.
    Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements)
    FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
    SELEX (Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an
    in vitro strategy to analyze RNA sequences that perform an activity of interest, most
    commonly high affinity binding to a ligand)
    RIP-Seq (Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP))
    ChiA-PET (Direct sequencing of proximity-ligated chromatin immunoprecipitates)
    RAD-Seq (Restriction site Associated DNA Sequencing is a method for sampling the genomes of
    multiple individuals in a population using NGS)
    OTHER (Library strategy not listed)
-->
<!-- Controlled vocabulary for LIBRARY SOURCE (description in parentheses):
    GENOMIC (Genomic DNA (includes PCR products from genomic DNA))
    TRANSCRIPTOMIC (Transcription products or non genomic DNA (EST, cDNA, RT-PCR,
    screened libraries))
    METAGENOMIC (Mixed material from metagenome)
    METATRANSCRIPTOMIC (Transcription products from community targets)
    SYNTHETIC (Synthetic DNA)
    VIRAL RNA (Viral RNA)
    OTHER (Other, unspecified, or unknown library source material)
-->
<!-- Controlled vocabulary for LIBRARY SELECTION (description in parentheses):
    RANDOM (Random selection by shearing or other method)
    PCR (Source material was selected by designed primers)
    RANDOM PCR (Source material was selected by randomly generated primers)
    RT-PCR (Source material was selected by reverse transcription PCR)
    HMPR (Hypo-methylated partial restriction digest)
    MF (Methyl Filtrated)
    repeat fractionation (replaces: CF-S, CF-M, CF-H, CF-T)
    size fractionation
    MSLL (Methylation Spanning Linking Library)
    cDNA (complementary DNA)
    ChIP (Chromatin immunoprecipitation)
    MNase (Micrococcal Nuclease (MNase) digestion)
    DNAse (Deoxyribonuclease (MNase) digestion)
    Hybrid Selection (Selection by hybridization in array or solution)
    Reduced Representation (Reproducible genomic subsets, often generated by
    restriction fragment size selection, containing a
    manageable number of loci to facilitate re-sampling)
    Restriction Digest (DNA fractionation using restriction enzymes)
    5-methylcytidine antibody (Selection of methylated DNA fragments using an
    antibody raised against 5-methylcytosine or
    5-methylcytidine (m5C))
    MBD2 protein methyl-CpG binding domain (Enrichment by methyl-CpG binding domain)
    CAGE (Cap-analysis gene expression)
    RACE (Rapid Amplification of cDNA Ends)
    MDA (Multiple displacement amplification)
    padlock probes capture method (to be used in conjunction with Bisulfite-Seq)
    Inverse rRNA selection (Remove the ribosomal transcripts by inverse selection: you capture
    them by annealing with specific oligos, also bound to beads, and then discard that)
    Oligo-dT (Select primarily messenger RNA, which conveniently is polyadenylated so these
    transcripts can be captured with oligo-dT beads)
    other (Other library enrichment, screening, or selection process)
    unspecified (Library enrichment, screening, or selection is not specified)
-->

Experiment XML: Illumina paired reads

<?xml version="1.0" encoding="UTF-8"?>
<EXPERIMENT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.experiment.xsd">
    <EXPERIMENT alias="TODO: UNIQUE NAME FOR EXPERIMENT"
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">
        <TITLE>TODO: TITLE OF EXPERIMENT</TITLE>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <DESIGN>
            <DESIGN_DESCRIPTION>TODO: DETAILS ABOUT THE SETUP AND GOALS OF THE EXPERIMENT AS
                SUPPLIED BY INVESTIGATOR</DESIGN_DESCRIPTION>
            <SAMPLE_DESCRIPTOR refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT"/>
            <LIBRARY_DESCRIPTOR>
                <LIBRARY_NAME>TODO: NAME OF LIBRARY</LIBRARY_NAME>
                <LIBRARY_STRATEGY>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_STRATEGY>
                <LIBRARY_SOURCE>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SOURCE>
                <LIBRARY_SELECTION>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SELECTION>
                <LIBRARY_LAYOUT>
                    <PAIRED NOMINAL_LENGTH="TODO: EXPECTED INSERT SIZE"/>
                </LIBRARY_LAYOUT>
                <LIBRARY_CONSTRUCTION_PROTOCOL>TODO: PROTOCOL BY WHICH THE LIBRARY WAS
                    CONSTRUCTED</LIBRARY_CONSTRUCTION_PROTOCOL>
            </LIBRARY_DESCRIPTOR>
            <!-- Please note that the spot descriptor is no longer required for most formats. -->
            <SPOT_DESCRIPTOR>
                <SPOT_DECODE_SPEC>
                    <SPOT_LENGTH>TODO: Expected number of base calls or cycles per spot 
                    (raw sequence length including all application and technical tags and mate pairs)
                    </SPOT_LENGTH>
                    <READ_SPEC>
                        <READ_INDEX>0</READ_INDEX>
                        <READ_LABEL>F</READ_LABEL>
                        <READ_CLASS>Application Read</READ_CLASS>
                        <READ_TYPE>Forward</READ_TYPE>
                        <BASE_COORD>1</BASE_COORD>
                    </READ_SPEC>
                    <READ_SPEC>
                        <READ_INDEX>1</READ_INDEX>
                        <READ_LABEL>R</READ_LABEL>
                        <READ_CLASS>Application Read</READ_CLASS>
                        <READ_TYPE>Reverse</READ_TYPE>
                        <BASE_COORD>TODO: SINGLE READ LENGTH + 1</BASE_COORD>
                        <!-- The BASE_COORD is the coordinate for the read. For the 
                            second application read, this will be 1 + length of the 
                            first read. E.g., for first read of length 38 + second read of length 38 
                            this is 39.-->
                        <!-- READ_LABEL can be used to reference separate forward and 
                            reverse files. Delete this tag if both Forward and Reverse 
                            reads occur in the same file (e.g., srf, sff etc). -->
                    </READ_SPEC>
                </SPOT_DECODE_SPEC>
            </SPOT_DESCRIPTOR>
        </DESIGN>
        <PLATFORM>
            <ILLUMINA>
                <INSTRUMENT_MODEL>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </INSTRUMENT_MODEL>
            </ILLUMINA>
        </PLATFORM>
        <PROCESSING/>
    </EXPERIMENT>
    <!-- If you are submitting more than one experiment, replicate the block <EXPERIMENT> 
        to </EXPERIMENT> here, as many times as necessary. -->
</EXPERIMENT_SET>
<!-- Controlled vocabulary for INSTRUMENT_MODEL (description in parentheses):
    Illumina Genome Analyzer
    Illumina Genome Analyzer II
    Illumina Genome Analyzer IIx
    Illumina HiSeq 2500
    Illumina HiSeq 2000
    Illumina HiSeq 1500
    Illumina HiSeq 1000
    Illumina MiSeq
    Illumina HiScanSQ
    HiSeq X Ten
    NextSeq 500
    unspecified
-->
<!-- Controlled vocabulary for LIBRARY STRATEGY (description in parentheses):
    WGS (Random sequencing of the whole genome)
    WGA (whole genome amplification to replace some instances of RANDOM)
    WXS (Random sequencing of exonic regions selected from the genome)
    RNA-Seq (Random sequencing of whole transcriptome)
    miRNA-Seq (for micro RNA and other small non-coding RNA sequencing)
    ncRNA-Seq (Non-coding RNA)
    WCS (Random sequencing of a whole chromosome or other replicon isolated
    from a genome)
    CLONE (Genomic clone based (hierarchical) sequencing)
    POOLCLONE (Shotgun of pooled clones (usually BACs and Fosmids))
    AMPLICON (Sequencing of overlapping or distinct PCR or RT-PCR products)
    CLONEEND (Clone end (5', 3', or both) sequencing)
    FINISHING (Sequencing intended to finish (close) gaps in existing coverage)
    ChIP-Seq (Direct sequencing of chromatin immunoprecipitates)
    MNase-Seq (Direct sequencing following MNase digestion)
    DNase-Hypersensitivity (Sequencing of hypersensitive sites, or segments of
    open chromatin that are more readily cleaved by DNaseI)
    Bisulfite-Seq (Sequencing following treatment of DNA with bisulfite to
    convert cytosine residues to uracil depending on methylation status)
    EST (Single pass sequencing of cDNA templates)
    FL-cDNA (Full-length sequencing of cDNA templates)
    CTS (Concatenated Tag Sequencing)
    MRE-Seq (Methylation-Sensitive Restriction Enzyme Sequencing strategy)
    MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing strategy)
    MBD-Seq (Direct sequencing of methylated fractions sequencing strategy)
    Tn-Seq (for gene fitness determination through transposon seeding)
    VALIDATION (CGHub special request: Independent experiment to re-evaluate putative variants.
    Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements)
    FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
    SELEX (Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an
    in vitro strategy to analyze RNA sequences that perform an activity of interest, most
    commonly high affinity binding to a ligand)
    RIP-Seq (Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP))
    ChiA-PET (Direct sequencing of proximity-ligated chromatin immunoprecipitates)
    RAD-Seq (Restriction site Associated DNA Sequencing is a method for sampling the genomes of
    multiple individuals in a population using NGS)
    OTHER (Library strategy not listed)
-->
<!-- Controlled vocabulary for LIBRARY SOURCE (description in parentheses):
    GENOMIC (Genomic DNA (includes PCR products from genomic DNA))
    TRANSCRIPTOMIC (Transcription products or non genomic DNA (EST, cDNA, RT-PCR,
    screened libraries))
    METAGENOMIC (Mixed material from metagenome)
    METATRANSCRIPTOMIC (Transcription products from community targets)
    SYNTHETIC (Synthetic DNA)
    VIRAL RNA (Viral RNA)
    OTHER (Other, unspecified, or unknown library source material)
-->
<!-- Controlled vocabulary for LIBRARY SELECTION (description in parentheses):
    RANDOM (Random selection by shearing or other method)
    PCR (Source material was selected by designed primers)
    RANDOM PCR (Source material was selected by randomly generated primers)
    RT-PCR (Source material was selected by reverse transcription PCR)
    HMPR (Hypo-methylated partial restriction digest)
    MF (Methyl Filtrated)
    repeat fractionation (replaces: CF-S, CF-M, CF-H, CF-T)
    size fractionation
    MSLL (Methylation Spanning Linking Library)
    cDNA (complementary DNA)
    ChIP (Chromatin immunoprecipitation)
    MNase (Micrococcal Nuclease (MNase) digestion)
    DNAse (Deoxyribonuclease (MNase) digestion)
    Hybrid Selection (Selection by hybridization in array or solution)
    Reduced Representation (Reproducible genomic subsets, often generated by
    restriction fragment size selection, containing a
    manageable number of loci to facilitate re-sampling)
    Restriction Digest (DNA fractionation using restriction enzymes)
    5-methylcytidine antibody (Selection of methylated DNA fragments using an
    antibody raised against 5-methylcytosine or
    5-methylcytidine (m5C))
    MBD2 protein methyl-CpG binding domain (Enrichment by methyl-CpG binding domain)
    CAGE (Cap-analysis gene expression)
    RACE (Rapid Amplification of cDNA Ends)
    MDA (Multiple displacement amplification)
    padlock probes capture method (to be used in conjunction with Bisulfite-Seq)
    Inverse rRNA selection (Remove the ribosomal transcripts by inverse selection: you capture
    them by annealing with specific oligos, also bound to beads, and then discard that)
    Oligo-dT (Select primarily messenger RNA, which conveniently is polyadenylated so these transcripts can be captured with oligo-dT beads)
    other (Other library enrichment, screening, or selection process)
    unspecified (Library enrichment, screening, or selection is not specified)
-->
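
The BASE_COORD rule described in the comments of the paired-read example can be written as a small calculation. The following is a minimal Python sketch, assuming that the reads of a spot are stored back to back with no gaps between them; the function name read_coordinates is hypothetical.

```python
def read_coordinates(read_lengths):
    """Return (spot_length, base_coords) for reads laid out back to back.

    base_coords[i] is the 1-based BASE_COORD of read i, and spot_length
    is the total number of bases in the spot.
    """
    base_coords = []
    position = 1  # BASE_COORD values are 1-based
    for length in read_lengths:
        base_coords.append(position)
        position += length
    return position - 1, base_coords


# Two application reads of length 38, as in the template comment.
print(read_coordinates([38, 38]))  # prints (76, [1, 39])
```

The same function covers spots with technical reads, provided each technical read is listed with its length in the order it appears in the spot.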

Experiment XML: 454 unpooled single reads (SFF files)

<?xml version="1.0" encoding="UTF-8"?>
<EXPERIMENT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.experiment.xsd">
    <EXPERIMENT alias="TODO: UNIQUE NAME FOR EXPERIMENT"
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">
        <TITLE>TODO: TITLE OF EXPERIMENT</TITLE>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <DESIGN>
            <DESIGN_DESCRIPTION>TODO: DETAILS ABOUT THE SETUP AND GOALS OF THE EXPERIMENT AS
                SUPPLIED BY INVESTIGATOR</DESIGN_DESCRIPTION>
            <SAMPLE_DESCRIPTOR refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT"/>
            <LIBRARY_DESCRIPTOR>
                <LIBRARY_NAME>TODO: NAME OF LIBRARY</LIBRARY_NAME>
                <LIBRARY_STRATEGY>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_STRATEGY>
                <LIBRARY_SOURCE>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SOURCE>
                <LIBRARY_SELECTION>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SELECTION>
                <LIBRARY_LAYOUT>
                    <SINGLE/>
                </LIBRARY_LAYOUT>
                <LIBRARY_CONSTRUCTION_PROTOCOL>TODO: PROTOCOL BY WHICH THE LIBRARY WAS
                    CONSTRUCTED</LIBRARY_CONSTRUCTION_PROTOCOL>
            </LIBRARY_DESCRIPTOR>
        </DESIGN>
        <PLATFORM>
            <LS454>
                <INSTRUMENT_MODEL>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </INSTRUMENT_MODEL>
            </LS454>
        </PLATFORM>
        <PROCESSING/>
    </EXPERIMENT>
    <!-- If you are submitting more than one experiment, replicate the block <EXPERIMENT> 
        to </EXPERIMENT> here, as many times as necessary. -->
</EXPERIMENT_SET>
<!-- Controlled vocabulary for INSTRUMENT_MODEL (description in parentheses):
    454 GS
    454 GS 20
    454 GS FLX
    454 GS FLX+
    454 GS FLX Titanium
    454 GS Junior
    unspecified
-->
<!-- Controlled vocabulary for LIBRARY STRATEGY (description in parentheses):
    WGS (Random sequencing of the whole genome)
    WGA (whole genome amplification to replace some instances of RANDOM)
    WXS (Random sequencing of exonic regions selected from the genome)
    RNA-Seq (Random sequencing of whole transcriptome)
    miRNA-Seq (for micro RNA and other small non-coding RNA sequencing)
    ncRNA-Seq (Non-coding RNA)
    WCS (Random sequencing of a whole chromosome or other replicon isolated
    from a genome)
    CLONE (Genomic clone based (hierarchical) sequencing)
    POOLCLONE (Shotgun of pooled clones (usually BACs and Fosmids))
    AMPLICON (Sequencing of overlapping or distinct PCR or RT-PCR products)
    CLONEEND (Clone end (5', 3', or both) sequencing)
    FINISHING (Sequencing intended to finish (close) gaps in existing coverage)
    ChIP-Seq (Direct sequencing of chromatin immunoprecipitates)
    MNase-Seq (Direct sequencing following MNase digestion)
    DNase-Hypersensitivity (Sequencing of hypersensitive sites, or segments of
    open chromatin that are more readily cleaved by DNaseI)
    Bisulfite-Seq (Sequencing following treatment of DNA with bisulfite to
    convert cytosine residues to uracil depending on methylation status)
    EST (Single pass sequencing of cDNA templates)
    FL-cDNA (Full-length sequencing of cDNA templates)
    CTS (Concatenated Tag Sequencing)
    MRE-Seq (Methylation-Sensitive Restriction Enzyme Sequencing strategy)
    MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing strategy)
    MBD-Seq (Direct sequencing of methylated fractions sequencing strategy)
    Tn-Seq (for gene fitness determination through transposon seeding)
    VALIDATION (CGHub special request: Independent experiment to re-evaluate putative variants.
    Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements)
    FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
    SELEX (Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an
    in vitro strategy to analyze RNA sequences that perform an activity of interest, most
    commonly high affinity binding to a ligand)
    RIP-Seq (Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP))
    ChIA-PET (Direct sequencing of proximity-ligated chromatin immunoprecipitates)
    RAD-Seq (Restriction site Associated DNA Sequencing is a method for sampling the genomes of multiple
    individuals in a population using NGS)
    OTHER (Library strategy not listed)
-->
<!-- Controlled vocabulary for LIBRARY SOURCE (description in parentheses):
    GENOMIC (Genomic DNA (includes PCR products from genomic DNA))
    TRANSCRIPTOMIC (Transcription products or non genomic DNA (EST, cDNA, RT-PCR,
    screened libraries))
    METAGENOMIC (Mixed material from metagenome)
    METATRANSCRIPTOMIC (Transcription products from community targets)
    SYNTHETIC (Synthetic DNA)
    VIRAL RNA (Viral RNA)
    OTHER (Other, unspecified, or unknown library source material)
-->
<!-- Controlled vocabulary for LIBRARY SELECTION (description in parentheses):
    RANDOM (Random selection by shearing or other method)
    PCR (Source material was selected by designed primers)
    RANDOM PCR (Source material was selected by randomly generated primers)
    RT-PCR (Source material was selected by reverse transcription PCR)
    HMPR (Hypo-methylated partial restriction digest)
    MF (Methyl Filtrated)
    repeat fractionation (replaces: CF-S, CF-M, CF-H, CF-T)
    size fractionation
    MSLL (Methylation Spanning Linking Library)
    cDNA (complementary DNA)
    ChIP (Chromatin immunoprecipitation)
    MNase (Micrococcal Nuclease (MNase) digestion)
    DNAse (Deoxyribonuclease (DNase) digestion)
    Hybrid Selection (Selection by hybridization in array or solution)
    Reduced Representation (Reproducible genomic subsets, often generated by
    restriction fragment size selection, containing a
    manageable number of loci to facilitate re-sampling)
    Restriction Digest (DNA fractionation using restriction enzymes)
    5-methylcytidine antibody (Selection of methylated DNA fragments using an
    antibody raised against 5-methylcytosine or
    5-methylcytidine (m5C))
    MBD2 protein methyl-CpG binding domain (Enrichment by methyl-CpG binding domain)
    CAGE (Cap-analysis gene expression)
    RACE (Rapid Amplification of cDNA Ends)
    MDA (Multiple displacement amplification)
    padlock probes capture method (to be used in conjunction with Bisulfite-Seq)
    Inverse rRNA selection (Removes ribosomal transcripts by inverse selection: they are captured by
    annealing to specific oligos bound to beads, and then discarded)
    Oligo-dT (Selects primarily messenger RNA, which is polyadenylated and can therefore be captured with oligo-dT beads)
    other (Other library enrichment, screening, or selection process)
    unspecified (Library enrichment, screening, or selection is not specified)
-->
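
Each TODO placeholder for LIBRARY_STRATEGY, LIBRARY_SOURCE and LIBRARY_SELECTION has to be replaced with a value from the controlled vocabularies listed above. The following is a minimal Python sketch of a check that could be run before submission. It covers only LIBRARY_SOURCE, the function name and sample aliases are hypothetical, and it assumes that whitespace around the value (as produced by the template's line breaks) is not significant.

```python
import xml.etree.ElementTree as ET

# LIBRARY SOURCE values copied from the controlled vocabulary listed above.
LIBRARY_SOURCES = {
    "GENOMIC", "TRANSCRIPTOMIC", "METAGENOMIC", "METATRANSCRIPTOMIC",
    "SYNTHETIC", "VIRAL RNA", "OTHER",
}

def invalid_library_sources(xml_text):
    """Return (experiment alias, value) pairs whose LIBRARY_SOURCE is not in the vocabulary."""
    root = ET.fromstring(xml_text)
    problems = []
    for experiment in root.iter("EXPERIMENT"):
        for source in experiment.iter("LIBRARY_SOURCE"):
            # Collapse the line breaks and indentation the template leaves around the value.
            value = " ".join((source.text or "").split())
            if value not in LIBRARY_SOURCES:
                problems.append((experiment.get("alias"), value))
    return problems

# Cut-down, hypothetical experiment set: one filled-in value, one leftover placeholder.
SAMPLE = """<EXPERIMENT_SET>
  <EXPERIMENT alias="exp_ok"><DESIGN><LIBRARY_DESCRIPTOR>
    <LIBRARY_SOURCE>GENOMIC
    </LIBRARY_SOURCE></LIBRARY_DESCRIPTOR></DESIGN></EXPERIMENT>
  <EXPERIMENT alias="exp_bad"><DESIGN><LIBRARY_DESCRIPTOR>
    <LIBRARY_SOURCE>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
    </LIBRARY_SOURCE></LIBRARY_DESCRIPTOR></DESIGN></EXPERIMENT>
</EXPERIMENT_SET>"""

print(invalid_library_sources(SAMPLE))
```

The same pattern could be extended to LIBRARY_STRATEGY, LIBRARY_SELECTION and INSTRUMENT_MODEL by adding their vocabularies. Validation against the XML schema remains the authoritative check.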

Experiment XML: 454 unpooled paired reads (SFF files)

<?xml version="1.0" encoding="UTF-8"?>
<EXPERIMENT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.experiment.xsd">
    <EXPERIMENT alias="TODO: UNIQUE NAME FOR EXPERIMENT"
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">
        <TITLE>TODO: TITLE OF EXPERIMENT</TITLE>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <DESIGN>
            <DESIGN_DESCRIPTION>TODO: DETAILS ABOUT THE SETUP AND GOALS OF THE EXPERIMENT AS
                SUPPLIED BY INVESTIGATOR</DESIGN_DESCRIPTION>
            <SAMPLE_DESCRIPTOR refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT"/>
            <LIBRARY_DESCRIPTOR>
                <LIBRARY_NAME>TODO: NAME OF LIBRARY</LIBRARY_NAME>
                <LIBRARY_STRATEGY>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_STRATEGY>
                <LIBRARY_SOURCE>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SOURCE>
                <LIBRARY_SELECTION>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SELECTION>
                <LIBRARY_LAYOUT>
                    <PAIRED NOMINAL_LENGTH="TODO: EXPECTED INSERT SIZE"/>
                </LIBRARY_LAYOUT>
                <LIBRARY_CONSTRUCTION_PROTOCOL>TODO: PROTOCOL BY WHICH THE LIBRARY WAS
                    CONSTRUCTED</LIBRARY_CONSTRUCTION_PROTOCOL>
            </LIBRARY_DESCRIPTOR>
            <!-- Please note that the spot descriptor is no longer required for most formats. -->
            <SPOT_DESCRIPTOR>
                <SPOT_DECODE_SPEC>
                    <READ_SPEC>
                        <READ_INDEX>0</READ_INDEX>
                        <READ_CLASS>Technical Read</READ_CLASS>
                        <READ_TYPE>Adapter</READ_TYPE>
                        <BASE_COORD>1</BASE_COORD>
                    </READ_SPEC>
                    <READ_SPEC>
                        <READ_INDEX>1</READ_INDEX>
                        <READ_CLASS>Application Read</READ_CLASS>
                        <READ_TYPE>Forward</READ_TYPE>
                        <BASE_COORD>5</BASE_COORD>
                    </READ_SPEC>
                </SPOT_DECODE_SPEC>
            </SPOT_DESCRIPTOR>
        </DESIGN>
        <PLATFORM>
            <LS454>
                <INSTRUMENT_MODEL>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </INSTRUMENT_MODEL>
            </LS454>
        </PLATFORM>
        <PROCESSING/>
    </EXPERIMENT>
    <!-- If you are submitting more than one experiment, replicate the block <EXPERIMENT> 
         to </EXPERIMENT> here, as many times as necessary. -->
</EXPERIMENT_SET>
<!-- Controlled vocabulary for INSTRUMENT_MODEL (description in parentheses):
    454 GS
    454 GS 20
    454 GS FLX
    454 GS FLX+
    454 GS FLX Titanium
    454 GS Junior
    unspecified
-->
<!-- Controlled vocabulary for LIBRARY STRATEGY (description in parentheses):
    WGS (Random sequencing of the whole genome)
    WGA (whole genome amplification to replace some instances of RANDOM)
    WXS (Random sequencing of exonic regions selected from the genome)
    RNA-Seq (Random sequencing of whole transcriptome)
    miRNA-Seq (for micro RNA and other small non-coding RNA sequencing)
    ncRNA-Seq (Non-coding RNA)
    WCS (Random sequencing of a whole chromosome or other replicon isolated
    from a genome)
    CLONE (Genomic clone based (hierarchical) sequencing)
    POOLCLONE (Shotgun of pooled clones (usually BACs and Fosmids))
    AMPLICON (Sequencing of overlapping or distinct PCR or RT-PCR products)
    CLONEEND (Clone end (5', 3', or both) sequencing)
    FINISHING (Sequencing intended to finish (close) gaps in existing coverage)
    ChIP-Seq (Direct sequencing of chromatin immunoprecipitates)
    MNase-Seq (Direct sequencing following MNase digestion)
    DNase-Hypersensitivity (Sequencing of hypersensitive sites, or segments of
    open chromatin that are more readily cleaved by DNaseI)
    Bisulfite-Seq (Sequencing following treatment of DNA with bisulfite to
    convert cytosine residues to uracil depending on methylation status)
    EST (Single pass sequencing of cDNA templates)
    FL-cDNA (Full-length sequencing of cDNA templates)
    CTS (Concatenated Tag Sequencing)
    MRE-Seq (Methylation-Sensitive Restriction Enzyme Sequencing strategy)
    MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing strategy)
    MBD-Seq (Direct sequencing of methylated fractions sequencing strategy)
    Tn-Seq (for gene fitness determination through transposon seeding)
    VALIDATION (CGHub special request: Independent experiment to re-evaluate putative variants.
    Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements)
    FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
    SELEX (Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an
    in vitro strategy to analyze RNA sequences that perform an activity of interest, most
    commonly high affinity binding to a ligand)
    RIP-Seq (Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP))
    ChIA-PET (Direct sequencing of proximity-ligated chromatin immunoprecipitates)
    RAD-Seq (Restriction site Associated DNA Sequencing is a method for sampling the genomes of multiple
    individuals in a population using NGS)
    OTHER (Library strategy not listed)
-->
<!-- Controlled vocabulary for LIBRARY SOURCE (description in parentheses):
    GENOMIC (Genomic DNA (includes PCR products from genomic DNA))
    TRANSCRIPTOMIC (Transcription products or non genomic DNA (EST, cDNA, RT-PCR,
    screened libraries))
    METAGENOMIC (Mixed material from metagenome)
    METATRANSCRIPTOMIC (Transcription products from community targets)
    SYNTHETIC (Synthetic DNA)
    VIRAL RNA (Viral RNA)
    OTHER (Other, unspecified, or unknown library source material)
-->
<!-- Controlled vocabulary for LIBRARY SELECTION (description in parentheses):
    RANDOM (Random selection by shearing or other method)
    PCR (Source material was selected by designed primers)
    RANDOM PCR (Source material was selected by randomly generated primers)
    RT-PCR (Source material was selected by reverse transcription PCR)
    HMPR (Hypo-methylated partial restriction digest)
    MF (Methyl Filtrated)
    repeat fractionation (replaces: CF-S, CF-M, CF-H, CF-T)
    size fractionation
    MSLL (Methylation Spanning Linking Library)
    cDNA (complementary DNA)
    ChIP (Chromatin immunoprecipitation)
    MNase (Micrococcal Nuclease (MNase) digestion)
    DNAse (Deoxyribonuclease (DNase) digestion)
    Hybrid Selection (Selection by hybridization in array or solution)
    Reduced Representation (Reproducible genomic subsets, often generated by
    restriction fragment size selection, containing a
    manageable number of loci to facilitate re-sampling)
    Restriction Digest (DNA fractionation using restriction enzymes)
    5-methylcytidine antibody (Selection of methylated DNA fragments using an
    antibody raised against 5-methylcytosine or
    5-methylcytidine (m5C))
    MBD2 protein methyl-CpG binding domain (Enrichment by methyl-CpG binding domain)
    CAGE (Cap-analysis gene expression)
    RACE (Rapid Amplification of cDNA Ends)
    MDA (Multiple displacement amplification)
    padlock probes capture method (to be used in conjunction with Bisulfite-Seq)
    Inverse rRNA selection (Removes ribosomal transcripts by inverse selection: they are captured by
    annealing to specific oligos bound to beads, and then discarded)
    Oligo-dT (Selects primarily messenger RNA, which is polyadenylated and can therefore be captured with oligo-dT beads)
    other (Other library enrichment, screening, or selection process)
    unspecified (Library enrichment, screening, or selection is not specified)
-->
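
The spot descriptor in the template above places a technical adapter read at BASE_COORD 1 and the application read at BASE_COORD 5, so that the technical bases can be removed. The Python sketch below illustrates how such 1-based start coordinates partition a spot. It assumes each read runs up to the start of the next one, and the function name, dictionary keys and spot sequence are made up for the illustration.

```python
def split_spot(sequence, read_specs):
    """Split one spot into reads using 1-based BASE_COORD start positions."""
    ordered = sorted(read_specs, key=lambda spec: spec["base_coord"])
    reads = []
    for i, spec in enumerate(ordered):
        start = spec["base_coord"] - 1  # convert to a 0-based index
        # A read ends where the next read starts, or at the end of the spot.
        if i + 1 < len(ordered):
            end = ordered[i + 1]["base_coord"] - 1
        else:
            end = len(sequence)
        reads.append((spec["read_class"], spec["read_type"], sequence[start:end]))
    return reads

# Mirrors the two READ_SPEC elements of the template above.
READ_SPECS = [
    {"base_coord": 1, "read_class": "Technical Read", "read_type": "Adapter"},
    {"base_coord": 5, "read_class": "Application Read", "read_type": "Forward"},
]

spot = "TCAGGATTACAGGCT"  # made-up spot: 4 adapter bases followed by the biological read
reads = split_spot(spot, READ_SPECS)
application = [seq for read_class, _, seq in reads if read_class == "Application Read"]
print(application)  # ['GATTACAGGCT']
```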

Experiment XML: SOLiD single reads

<?xml version="1.0" encoding="UTF-8"?>
<EXPERIMENT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.experiment.xsd">
    <EXPERIMENT alias="TODO: UNIQUE NAME FOR EXPERIMENT"
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">
        <TITLE>TODO: TITLE OF EXPERIMENT</TITLE>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <DESIGN>
            <DESIGN_DESCRIPTION>TODO: DETAILS ABOUT THE SETUP AND GOALS OF THE EXPERIMENT AS
                SUPPLIED BY INVESTIGATOR</DESIGN_DESCRIPTION>
            <SAMPLE_DESCRIPTOR refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT"/>
            <LIBRARY_DESCRIPTOR>
                <LIBRARY_NAME>TODO: NAME OF LIBRARY</LIBRARY_NAME>
                <LIBRARY_STRATEGY>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_STRATEGY>
                <LIBRARY_SOURCE>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SOURCE>
                <LIBRARY_SELECTION>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SELECTION>
                <LIBRARY_LAYOUT>
                    <SINGLE/>
                </LIBRARY_LAYOUT>
                <LIBRARY_CONSTRUCTION_PROTOCOL>TODO: PROTOCOL BY WHICH THE LIBRARY WAS
                    CONSTRUCTED</LIBRARY_CONSTRUCTION_PROTOCOL>
            </LIBRARY_DESCRIPTOR>
            <!-- Please note that the spot descriptor is no longer required for most formats. -->
            <SPOT_DESCRIPTOR>
                <SPOT_DECODE_SPEC>
                    <SPOT_LENGTH>TODO: Expected number of base calls or cycles per spot
                    (raw sequence length including all application and technical tags and mate pairs)
                    </SPOT_LENGTH>
                    <READ_SPEC>
                        <READ_INDEX>0</READ_INDEX>
                        <READ_CLASS>Application Read</READ_CLASS>
                        <READ_TYPE>Forward</READ_TYPE>
                        <BASE_COORD>1</BASE_COORD>
                    </READ_SPEC>
                </SPOT_DECODE_SPEC>
            </SPOT_DESCRIPTOR>
        </DESIGN>
        <PLATFORM>
            <ABI_SOLID>
                <INSTRUMENT_MODEL>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </INSTRUMENT_MODEL>
            </ABI_SOLID>
        </PLATFORM>
        <PROCESSING/>
    </EXPERIMENT>
    <!-- If you are submitting more than one experiment, replicate the block <EXPERIMENT> 
        to </EXPERIMENT> here, as many times as necessary. -->
</EXPERIMENT_SET>
<!-- Controlled vocabulary for INSTRUMENT_MODEL (description in parentheses):
    AB SOLiD System
    AB SOLiD System 2.0
    AB SOLiD System 3.0
    AB SOLiD 3 Plus System
    AB SOLiD 4 System
    AB SOLiD 4hq System
    AB SOLiD PI System
    AB 5500 Genetic Analyzer
    AB 5500xl Genetic Analyzer
    unspecified
-->
<!-- Controlled vocabulary for LIBRARY STRATEGY (description in parentheses):
    WGS (Random sequencing of the whole genome)
    WGA (whole genome amplification to replace some instances of RANDOM)
    WXS (Random sequencing of exonic regions selected from the genome)
    RNA-Seq (Random sequencing of whole transcriptome)
    miRNA-Seq (for micro RNA and other small non-coding RNA sequencing)
    ncRNA-Seq (Non-coding RNA)
    WCS (Random sequencing of a whole chromosome or other replicon isolated
    from a genome)
    CLONE (Genomic clone based (hierarchical) sequencing)
    POOLCLONE (Shotgun of pooled clones (usually BACs and Fosmids))
    AMPLICON (Sequencing of overlapping or distinct PCR or RT-PCR products)
    CLONEEND (Clone end (5', 3', or both) sequencing)
    FINISHING (Sequencing intended to finish (close) gaps in existing coverage)
    ChIP-Seq (Direct sequencing of chromatin immunoprecipitates)
    MNase-Seq (Direct sequencing following MNase digestion)
    DNase-Hypersensitivity (Sequencing of hypersensitive sites, or segments of
    open chromatin that are more readily cleaved by DNaseI)
    Bisulfite-Seq (Sequencing following treatment of DNA with bisulfite to
    convert cytosine residues to uracil depending on methylation status)
    EST (Single pass sequencing of cDNA templates)
    FL-cDNA (Full-length sequencing of cDNA templates)
    CTS (Concatenated Tag Sequencing)
    MRE-Seq (Methylation-Sensitive Restriction Enzyme Sequencing strategy)
    MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing strategy)
    MBD-Seq (Direct sequencing of methylated fractions sequencing strategy)
    Tn-Seq (for gene fitness determination through transposon seeding)
    VALIDATION (CGHub special request: Independent experiment to re-evaluate putative variants.
    Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements)
    FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
    SELEX (Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an
    in vitro strategy to analyze RNA sequences that perform an activity of interest, most
    commonly high affinity binding to a ligand)
    RIP-Seq (Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP))
    ChIA-PET (Direct sequencing of proximity-ligated chromatin immunoprecipitates)
    RAD-Seq (Restriction site Associated DNA Sequencing is a method for sampling the genomes of multiple
    individuals in a population using NGS)
    OTHER (Library strategy not listed)
-->
<!-- Controlled vocabulary for LIBRARY SOURCE (description in parentheses):
    GENOMIC (Genomic DNA (includes PCR products from genomic DNA))
    TRANSCRIPTOMIC (Transcription products or non genomic DNA (EST, cDNA, RT-PCR,
    screened libraries))
    METAGENOMIC (Mixed material from metagenome)
    METATRANSCRIPTOMIC (Transcription products from community targets)
    SYNTHETIC (Synthetic DNA)
    VIRAL RNA (Viral RNA)
    OTHER (Other, unspecified, or unknown library source material)
-->
<!-- Controlled vocabulary for LIBRARY SELECTION (description in parentheses):
    RANDOM (Random selection by shearing or other method)
    PCR (Source material was selected by designed primers)
    RANDOM PCR (Source material was selected by randomly generated primers)
    RT-PCR (Source material was selected by reverse transcription PCR)
    HMPR (Hypo-methylated partial restriction digest)
    MF (Methyl Filtrated)
    repeat fractionation (replaces: CF-S, CF-M, CF-H, CF-T)
    size fractionation
    MSLL (Methylation Spanning Linking Library)
    cDNA (complementary DNA)
    ChIP (Chromatin immunoprecipitation)
    MNase (Micrococcal Nuclease (MNase) digestion)
    DNAse (Deoxyribonuclease (DNase) digestion)
    Hybrid Selection (Selection by hybridization in array or solution)
    Reduced Representation (Reproducible genomic subsets, often generated by
    restriction fragment size selection, containing a
    manageable number of loci to facilitate re-sampling)
    Restriction Digest (DNA fractionation using restriction enzymes)
    5-methylcytidine antibody (Selection of methylated DNA fragments using an
    antibody raised against 5-methylcytosine or
    5-methylcytidine (m5C))
    MBD2 protein methyl-CpG binding domain (Enrichment by methyl-CpG binding domain)
    CAGE (Cap-analysis gene expression)
    RACE (Rapid Amplification of cDNA Ends)
    MDA (Multiple displacement amplification)
    padlock probes capture method (to be used in conjunction with Bisulfite-Seq)
    Inverse rRNA selection (Removes ribosomal transcripts by inverse selection: they are captured by
    annealing to specific oligos bound to beads, and then discarded)
    Oligo-dT (Selects primarily messenger RNA, which is polyadenylated and can therefore be captured with oligo-dT beads)
    other (Other library enrichment, screening, or selection process)
    unspecified (Library enrichment, screening, or selection is not specified)
-->
 
Experiment XML: SOLiD paired reads

<?xml version="1.0" encoding="UTF-8"?>
<EXPERIMENT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.experiment.xsd">
    <EXPERIMENT alias="TODO: UNIQUE NAME FOR EXPERIMENT"
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">
        <TITLE>TODO: TITLE OF EXPERIMENT</TITLE>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <DESIGN>
            <DESIGN_DESCRIPTION>TODO: DETAILS ABOUT THE SETUP AND GOALS OF THE EXPERIMENT AS
                SUPPLIED BY INVESTIGATOR</DESIGN_DESCRIPTION>
            <SAMPLE_DESCRIPTOR refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT"/>
            <LIBRARY_DESCRIPTOR>
                <LIBRARY_NAME>TODO: NAME OF LIBRARY</LIBRARY_NAME>
                <LIBRARY_STRATEGY>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_STRATEGY>
                <LIBRARY_SOURCE>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SOURCE>
                <LIBRARY_SELECTION>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </LIBRARY_SELECTION>
                <LIBRARY_LAYOUT>
                    <PAIRED NOMINAL_LENGTH="TODO: EXPECTED INSERT SIZE"/>
                </LIBRARY_LAYOUT>
                <LIBRARY_CONSTRUCTION_PROTOCOL>TODO: PROTOCOL BY WHICH THE LIBRARY WAS
                    CONSTRUCTED</LIBRARY_CONSTRUCTION_PROTOCOL>
            </LIBRARY_DESCRIPTOR>
            <!-- Please note that the spot descriptor is no longer required for most formats. -->
            <SPOT_DESCRIPTOR>
                <SPOT_DECODE_SPEC>
                    <SPOT_LENGTH>TODO: Expected number of base calls or cycles per spot 
                    (raw sequence length including all application and technical tags and mate pairs)
                    </SPOT_LENGTH>
                    <READ_SPEC>
                        <READ_INDEX>0</READ_INDEX>
                        <READ_CLASS>Application Read</READ_CLASS>
                        <READ_TYPE>Forward</READ_TYPE>
                        <BASE_COORD>1</BASE_COORD>
                    </READ_SPEC>
                    <READ_SPEC>
                        <READ_INDEX>1</READ_INDEX>
                        <READ_CLASS>Application Read</READ_CLASS>
                        <READ_TYPE>Forward</READ_TYPE>
                        <BASE_COORD>TODO: SINGLE READ LENGTH + 1</BASE_COORD>
                        <!-- The BASE_COORD is the coordinate for the read. For 
                            the second application read, this will be 1 + length 
                            of the first read. E.g., for 38+38 this is 39.-->
                    </READ_SPEC>
                </SPOT_DECODE_SPEC>
            </SPOT_DESCRIPTOR>
        </DESIGN>
        <PLATFORM>
            <ABI_SOLID>
                <INSTRUMENT_MODEL>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML
                </INSTRUMENT_MODEL>
            </ABI_SOLID>
        </PLATFORM>
        <PROCESSING/>
    </EXPERIMENT>
    <!-- If you are submitting more than one experiment, replicate the block <EXPERIMENT> 
        to </EXPERIMENT> here, as many times as necessary. -->
</EXPERIMENT_SET>
<!-- Controlled vocabulary for INSTRUMENT_MODEL (description in parentheses):
    AB SOLiD System
    AB SOLiD System 2.0
    AB SOLiD System 3.0
    AB SOLiD 3 Plus System
    AB SOLiD 4 System
    AB SOLiD 4hq System
    AB SOLiD PI System
    AB 5500 Genetic Analyzer
    AB 5500xl Genetic Analyzer
    unspecified
-->
<!-- Controlled vocabulary for LIBRARY STRATEGY (description in parentheses):
    WGS (Random sequencing of the whole genome)
    WGA (whole genome amplification to replace some instances of RANDOM)
    WXS (Random sequencing of exonic regions selected from the genome)
    RNA-Seq (Random sequencing of whole transcriptome)
    miRNA-Seq (for micro RNA and other small non-coding RNA sequencing)
    ncRNA-Seq (Non-coding RNA)
    WCS (Random sequencing of a whole chromosome or other replicon isolated
    from a genome)
    CLONE (Genomic clone based (hierarchical) sequencing)
    POOLCLONE (Shotgun of pooled clones (usually BACs and Fosmids))
    AMPLICON (Sequencing of overlapping or distinct PCR or RT-PCR products)
    CLONEEND (Clone end (5', 3', or both) sequencing)
    FINISHING (Sequencing intended to finish (close) gaps in existing coverage)
    ChIP-Seq (Direct sequencing of chromatin immunoprecipitates)
    MNase-Seq (Direct sequencing following MNase digestion)
    DNase-Hypersensitivity (Sequencing of hypersensitive sites, or segments of
    open chromatin that are more readily cleaved by DNaseI)
    Bisulfite-Seq (Sequencing following treatment of DNA with bisulfite to
    convert cytosine residues to uracil depending on methylation status)
    EST (Single pass sequencing of cDNA templates)
    FL-cDNA (Full-length sequencing of cDNA templates)
    CTS (Concatenated Tag Sequencing)
    MRE-Seq (Methylation-Sensitive Restriction Enzyme Sequencing strategy)
    MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing strategy)
    MBD-Seq (Direct sequencing of methylated fractions sequencing strategy)
    Tn-Seq (for gene fitness determination through transposon seeding)
    VALIDATION (CGHub special request: Independent experiment to re-evaluate putative variants.
    Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements)
    FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
    SELEX (Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an
    in vitro strategy to analyze RNA sequences that perform an activity of interest, most
    commonly high affinity binding to a ligand)
    RIP-Seq (Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP))
    ChIA-PET (Direct sequencing of proximity-ligated chromatin immunoprecipitates)
    RAD-Seq (Restriction site Associated DNA Sequencing is a method for sampling the genomes of multiple
    individuals in a population using NGS)
    OTHER (Library strategy not listed)
-->
<!-- Controlled vocabulary for LIBRARY SOURCE (description in parentheses):
    GENOMIC (Genomic DNA (includes PCR products from genomic DNA))
    TRANSCRIPTOMIC (Transcription products or non genomic DNA (EST, cDNA, RT-PCR,
    screened libraries))
    METAGENOMIC (Mixed material from metagenome)
    METATRANSCRIPTOMIC (Transcription products from community targets)
    SYNTHETIC (Synthetic DNA)
    VIRAL RNA (Viral RNA)
    OTHER (Other, unspecified, or unknown library source material)
-->
<!-- Controlled vocabulary for LIBRARY SELECTION (description in parentheses):
    RANDOM (Random selection by shearing or other method)
    PCR (Source material was selected by designed primers)
    RANDOM PCR (Source material was selected by randomly generated primers)
    RT-PCR (Source material was selected by reverse transcription PCR)
    HMPR (Hypo-methylated partial restriction digest)
    MF (Methyl Filtrated)
    repeat fractionation (replaces: CF-S, CF-M, CF-H, CF-T)
    size fractionation
    MSLL (Methylation Spanning Linking Library)
    cDNA (complementary DNA)
    ChIP (Chromatin immunoprecipitation)
    MNase (Micrococcal Nuclease (MNase) digestion)
    DNAse (Deoxyribonuclease (DNase) digestion)
    Hybrid Selection (Selection by hybridization in array or solution)
    Reduced Representation (Reproducible genomic subsets, often generated by
    restriction fragment size selection, containing a
    manageable number of loci to facilitate re-sampling)
    Restriction Digest (DNA fractionation using restriction enzymes)
    5-methylcytidine antibody (Selection of methylated DNA fragments using an
    antibody raised against 5-methylcytosine or
    5-methylcytidine (m5C))
    MBD2 protein methyl-CpG binding domain (Enrichment by methyl-CpG binding domain)
    CAGE (Cap-analysis gene expression)
    RACE (Rapid Amplification of cDNA Ends)
    MDA (Multiple displacement amplification)
    padlock probes capture method (to be used in conjunction with Bisulfite-Seq)
    Inverse rRNA selection (Removes ribosomal transcripts by inverse selection: they are captured by
    annealing to specific oligos bound to beads, and then discarded)
    Oligo-dT (Selects primarily messenger RNA, which is polyadenylated and can therefore be captured with oligo-dT beads)
    other (Other library enrichment, screening, or selection process)
    unspecified (Library enrichment, screening, or selection is not specified)
-->
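
The comment in the SOLiD paired template states that the BASE_COORD of the second application read is 1 + the length of the first read (39 for a 38+38 run), and SPOT_LENGTH is described as the raw sequence length of the whole spot. The arithmetic can be sketched in Python as follows, assuming fixed-length reads stored back to back in the spot. The function names are hypothetical.

```python
from itertools import accumulate

def base_coords(read_lengths):
    """Return the 1-based BASE_COORD of each read, given read lengths in spot order."""
    # Running total of the bases that precede each read, starting from 0.
    offsets = list(accumulate(read_lengths, initial=0))[:-1]
    return [offset + 1 for offset in offsets]

def spot_length(read_lengths):
    """Return the total number of base calls per spot."""
    return sum(read_lengths)

lengths = [38, 38]  # the 38+38 case mentioned in the template comment
print(base_coords(lengths))  # [1, 39]
print(spot_length(lengths))  # 76
```

The `initial` argument of `accumulate` requires Python 3.8 or later.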

Experiment XML: 454 pooled single reads

<?xml version="1.0" encoding="UTF-8"?>
<EXPERIMENT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.experiment.xsd">
    <EXPERIMENT alias="TODO: UNIQUE NAME FOR EXPERIMENT"
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">
        <TITLE>TODO: TITLE OF EXPERIMENT</TITLE>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <DESIGN>
            <DESIGN_DESCRIPTION>TODO: DETAILS ABOUT THE SETUP AND GOALS OF THE EXPERIMENT AS
                SUPPLIED BY INVESTIGATOR</DESIGN_DESCRIPTION>
            <SAMPLE_DESCRIPTOR>
                <POOL>
                    <MEMBER refname="TODO: SAMPLE 1 ALIAS" member_name="TODO: SAMPLE 1 ALIAS">
                        <READ_LABEL read_group_tag="barcode1">Barcode Read</READ_LABEL>
                    </MEMBER>
                    <MEMBER refname="TODO: SAMPLE 2 ALIAS" member_name="TODO: SAMPLE 2 ALIAS">
                        <READ_LABEL read_group_tag="barcode2">Barcode Read</READ_LABEL>
                    </MEMBER>
                    <MEMBER refname="TODO: SAMPLE 3 ALIAS" member_name="TODO: SAMPLE 3 ALIAS">
                        <READ_LABEL read_group_tag="barcode3">Barcode Read</READ_LABEL>
                    </MEMBER>
                    <MEMBER refname="TODO: SAMPLE 4 ALIAS" member_name="TODO: SAMPLE 4 ALIAS">
                        <READ_LABEL read_group_tag="barcode4">Barcode Read</READ_LABEL>
                    </MEMBER>
                    <!-- For simplicity, it is useful to use the sample alias as the member_name but
                    it can be anything you wish it to be. The member_name is used as a reference 
                    between the experiment and the demultiplexed run file. -->
                </POOL>
            </SAMPLE_DESCRIPTOR>
            <LIBRARY_DESCRIPTOR>
                <LIBRARY_NAME>TODO: NAME OF LIBRARY</LIBRARY_NAME>
                <LIBRARY_STRATEGY>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML</LIBRARY_STRATEGY>
                <LIBRARY_SOURCE>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML</LIBRARY_SOURCE>
                <LIBRARY_SELECTION>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML</LIBRARY_SELECTION>
                <LIBRARY_LAYOUT>
                    <SINGLE/>
                </LIBRARY_LAYOUT>
                <LIBRARY_CONSTRUCTION_PROTOCOL>TODO: PROTOCOL BY WHICH THE LIBRARY WAS
                    CONSTRUCTED</LIBRARY_CONSTRUCTION_PROTOCOL>
            </LIBRARY_DESCRIPTOR>
            <!-- Please note that the spot descriptor is no longer required for most formats. -->
            <SPOT_DESCRIPTOR>
                <SPOT_DECODE_SPEC>
                    <READ_SPEC>
                        <READ_INDEX>0</READ_INDEX>
                        <READ_CLASS>Application Read</READ_CLASS>
                        <READ_TYPE>Forward</READ_TYPE>
                        <BASE_COORD>1</BASE_COORD>
                    </READ_SPEC>
                    <READ_SPEC>
                        <READ_INDEX>1</READ_INDEX>
                        <READ_LABEL>Barcode Read</READ_LABEL>
                        <READ_CLASS>Technical Read</READ_CLASS>
                        <READ_TYPE>BarCode</READ_TYPE>
                        <EXPECTED_BASECALL_TABLE>
                            <BASECALL read_group_tag="barcode1">TODO: e.g., acgt</BASECALL>
                            <BASECALL read_group_tag="barcode2">TODO: e.g., acgt</BASECALL>
                            <BASECALL read_group_tag="barcode3">TODO: e.g., acgt</BASECALL>
                            <BASECALL read_group_tag="barcode4">TODO: e.g., acgt</BASECALL>
                        </EXPECTED_BASECALL_TABLE>
                    </READ_SPEC>
                    <!-- In this example, the barcode is the second read -->
                </SPOT_DECODE_SPEC>
            </SPOT_DESCRIPTOR>
        </DESIGN>
        <PLATFORM>
            <LS454>
                <INSTRUMENT_MODEL>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF
                    XML</INSTRUMENT_MODEL>
            </LS454>
        </PLATFORM>
        <PROCESSING/>
    </EXPERIMENT>
    <!-- If you are submitting more than one experiment, replicate the block <EXPERIMENT> 
        to </EXPERIMENT> here, as many times as necessary. -->
</EXPERIMENT_SET>
<!-- Controlled vocabulary for INSTRUMENT_MODEL (description in parentheses):
    454 GS
    454 GS 20
    454 GS FLX
    454 GS FLX+
    454 GS FLX Titanium
    454 GS Junior
    unspecified
-->
<!-- Controlled vocabulary for LIBRARY STRATEGY (description in parentheses):
    WGS (Random sequencing of the whole genome)
    WGA (whole genome amplification to replace some instances of RANDOM)
    WXS (Random sequencing of exonic regions selected from the genome)
    RNA-Seq (Random sequencing of whole transcriptome)
    miRNA-Seq (for micro RNA and other small non-coding RNA sequencing)
    ncRNA-Seq (Non-coding RNA)
    WCS (Random sequencing of a whole chromosome or other replicon isolated
    from a genome)
    CLONE (Genomic clone based (hierarchical) sequencing)
    POOLCLONE (Shotgun of pooled clones (usually BACs and Fosmids))
    AMPLICON (Sequencing of overlapping or distinct PCR or RT-PCR products)
    CLONEEND (Clone end (5', 3', or both) sequencing)
    FINISHING (Sequencing intended to finish (close) gaps in existing coverage)
    ChIP-Seq (Direct sequencing of chromatin immunoprecipitates)
    MNase-Seq (Direct sequencing following MNase digestion)
    DNase-Hypersensitivity (Sequencing of hypersensitive sites, or segments of
    open chromatin that are more readily cleaved by DNaseI)
    Bisulfite-Seq (Sequencing following treatment of DNA with bisulfite to
    convert cytosine residues to uracil depending on methylation status)
    EST (Single pass sequencing of cDNA templates)
    FL-cDNA (Full-length sequencing of cDNA templates)
    CTS (Concatenated Tag Sequencing)
    MRE-Seq (Methylation-Sensitive Restriction Enzyme Sequencing strategy)
    MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing strategy)
    MBD-Seq (Direct sequencing of methylated fractions sequencing strategy)
    Tn-Seq (for gene fitness determination through transposon seeding)
    VALIDATION (CGHub special request: Independent experiment to re-evaluate putative variants.
    Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements)
    FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
    SELEX (Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an
    in vitro strategy to analyze RNA sequences that perform an activity of interest, most
    commonly high affinity binding to a ligand)
    RIP-Seq (Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP))
    ChiA-PET (Direct sequencing of proximity-ligated chromatin immunoprecipitates)
    RAD-Seq (Restriction site Associated DNA Sequencing is a method for sampling the genomes of
    multiple individuals in a population using NGS)
    OTHER (Library strategy not listed)
-->
<!-- Controlled vocabulary for LIBRARY SOURCE (description in parentheses):
    GENOMIC (Genomic DNA (includes PCR products from genomic DNA))
    TRANSCRIPTOMIC (Transcription products or non genomic DNA (EST, cDNA, RT-PCR,
    screened libraries))
    METAGENOMIC (Mixed material from metagenome)
    METATRANSCRIPTOMIC (Transcription products from community targets)
    SYNTHETIC (Synthetic DNA)
    VIRAL RNA (Viral RNA)
    OTHER (Other, unspecified, or unknown library source material)
-->
<!-- Controlled vocabulary for LIBRARY SELECTION (description in parentheses):
    RANDOM (Random selection by shearing or other method)
    PCR (Source material was selected by designed primers)
    RANDOM PCR (Source material was selected by randomly generated primers)
    RT-PCR (Source material was selected by reverse transcription PCR)
    HMPR (Hypo-methylated partial restriction digest)
    MF (Methyl Filtrated)
    repeat fractionation (replaces: CF-S, CF-M, CF-H, CF-T)
    size fractionation
    MSLL (Methylation Spanning Linking Library)
    cDNA (complementary DNA)
    ChIP (Chromatin immunoprecipitation)
    MNase (Micrococcal Nuclease (MNase) digestion)
    DNAse (Deoxyribonuclease (DNase) digestion)
    Hybrid Selection (Selection by hybridization in array or solution)
    Reduced Representation (Reproducible genomic subsets, often generated by
    restriction fragment size selection, containing a
    manageable number of loci to facilitate re-sampling)
    Restriction Digest (DNA fractionation using restriction enzymes)
    5-methylcytidine antibody (Selection of methylated DNA fragments using an
    antibody raised against 5-methylcytosine or
    5-methylcytidine (m5C))
    MBD2 protein methyl-CpG binding domain (Enrichment by methyl-CpG binding domain)
    CAGE (Cap-analysis gene expression)
    RACE (Rapid Amplification of cDNA Ends)
    MDA (Multiple displacement amplification)
    padlock probes capture method (to be used in conjunction with Bisulfite-Seq)
    Inverse rRNA selection (Remove the ribosomal transcripts by inverse selection: you capture
    them by annealing with specific oligos, also bound to beads, and then discard that)
    Oligo-dT (Select primarily messenger RNA, which conveniently is polyadenylated so these transcripts can be captured with oligo-dT beads)
    other (Other library enrichment, screening, or selection process)
    unspecified (Library enrichment, screening, or selection is not specified)
-->

Experiment XML: Complete Genomics

<?xml version="1.0" encoding="UTF-8"?>
<EXPERIMENT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.experiment.xsd">
    <EXPERIMENT alias="TODO: UNIQUE NAME FOR EXPERIMENT"
        center_name="TODO: ACCOUNT CENTER_NAME ACRONYM">
        <TITLE>TODO: TITLE OF EXPERIMENT</TITLE>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <DESIGN>
            <DESIGN_DESCRIPTION>TODO: DETAILS ABOUT THE SETUP AND GOALS OF THE EXPERIMENT AS
                SUPPLIED BY INVESTIGATOR</DESIGN_DESCRIPTION>
            <SAMPLE_DESCRIPTOR refname="TODO: SAMPLE ALIAS OR RELEVANT SAMPLE OBJECT"/>
            <LIBRARY_DESCRIPTOR>
                <LIBRARY_NAME>TODO: NAME OF LIBRARY</LIBRARY_NAME>
                <LIBRARY_STRATEGY>WGS</LIBRARY_STRATEGY>
                <LIBRARY_SOURCE>GENOMIC</LIBRARY_SOURCE>
                <LIBRARY_SELECTION>RANDOM</LIBRARY_SELECTION>
                <LIBRARY_LAYOUT>
                    <PAIRED NOMINAL_LENGTH="TODO: EXPECTED INSERT SIZE"/>
                </LIBRARY_LAYOUT>
                <LIBRARY_CONSTRUCTION_PROTOCOL>TODO: PROTOCOL BY WHICH THE LIBRARY WAS
                    CONSTRUCTED</LIBRARY_CONSTRUCTION_PROTOCOL>
            </LIBRARY_DESCRIPTOR>
        </DESIGN>
        <PLATFORM>
            <COMPLETE_GENOMICS>
                <INSTRUMENT_MODEL>Complete Genomics</INSTRUMENT_MODEL>
            </COMPLETE_GENOMICS>
        </PLATFORM>
        <PROCESSING/>
    </EXPERIMENT>
    <!-- If you are submitting more than one experiment, replicate the block <EXPERIMENT> 
        to </EXPERIMENT> here, as many times as necessary. -->
</EXPERIMENT_SET>

 <!-- Controlled vocabulary for LIBRARY STRATEGY (description in parentheses):
    WGS (Random sequencing of the whole genome)
    WGA (whole genome amplification to replace some instances of RANDOM)
    WXS (Random sequencing of exonic regions selected from the genome)
    RNA-Seq (Random sequencing of whole transcriptome)
    miRNA-Seq (for micro RNA and other small non-coding RNA sequencing)
    ncRNA-Seq (Non-coding RNA)
    WCS (Random sequencing of a whole chromosome or other replicon isolated
    from a genome)
    CLONE (Genomic clone based (hierarchical) sequencing)
    POOLCLONE (Shotgun of pooled clones (usually BACs and Fosmids))
    AMPLICON (Sequencing of overlapping or distinct PCR or RT-PCR products)
    CLONEEND (Clone end (5', 3', or both) sequencing)
    FINISHING (Sequencing intended to finish (close) gaps in existing coverage)
    ChIP-Seq (Direct sequencing of chromatin immunoprecipitates)
    MNase-Seq (Direct sequencing following MNase digestion)
    DNase-Hypersensitivity (Sequencing of hypersensitive sites, or segments of
    open chromatin that are more readily cleaved by DNaseI)
    Bisulfite-Seq (Sequencing following treatment of DNA with bisulfite to
    convert cytosine residues to uracil depending on methylation status)
    EST (Single pass sequencing of cDNA templates)
    FL-cDNA (Full-length sequencing of cDNA templates)
    CTS (Concatenated Tag Sequencing)
    MRE-Seq (Methylation-Sensitive Restriction Enzyme Sequencing strategy)
    MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing strategy)
    MBD-Seq (Direct sequencing of methylated fractions sequencing strategy)
    Tn-Seq (for gene fitness determination through transposon seeding)
    VALIDATION (CGHub special request: Independent experiment to re-evaluate putative variants.
    Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements)
    FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
    SELEX (Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an
    in vitro strategy to analyze RNA sequences that perform an activity of interest, most
    commonly high affinity binding to a ligand)
    RIP-Seq (Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP))
    ChiA-PET (Direct sequencing of proximity-ligated chromatin immunoprecipitates)
    RAD-Seq (Restriction site Associated DNA Sequencing is a method for sampling the genomes of
    multiple individuals in a population using NGS)
    OTHER (Library strategy not listed)
-->
<!-- Controlled vocabulary for LIBRARY SOURCE (description in parentheses):
    GENOMIC (Genomic DNA (includes PCR products from genomic DNA))
    TRANSCRIPTOMIC (Transcription products or non genomic DNA (EST, cDNA, RT-PCR,
    screened libraries))
    METAGENOMIC (Mixed material from metagenome)
    METATRANSCRIPTOMIC (Transcription products from community targets)
    SYNTHETIC (Synthetic DNA)
    VIRAL RNA (Viral RNA)
    OTHER (Other, unspecified, or unknown library source material)
-->
<!-- Controlled vocabulary for LIBRARY SELECTION (description in parentheses):
    RANDOM (Random selection by shearing or other method)
    PCR (Source material was selected by designed primers)
    RANDOM PCR (Source material was selected by randomly generated primers)
    RT-PCR (Source material was selected by reverse transcription PCR)
    HMPR (Hypo-methylated partial restriction digest)
    MF (Methyl Filtrated)
    repeat fractionation (replaces: CF-S, CF-M, CF-H, CF-T)
    size fractionation
    MSLL (Methylation Spanning Linking Library)
    cDNA (complementary DNA)
    ChIP (Chromatin immunoprecipitation)
    MNase (Micrococcal Nuclease (MNase) digestion)
    DNAse (Deoxyribonuclease (DNase) digestion)
    Hybrid Selection (Selection by hybridization in array or solution)
    Reduced Representation (Reproducible genomic subsets, often generated by
    restriction fragment size selection, containing a
    manageable number of loci to facilitate re-sampling)
    Restriction Digest (DNA fractionation using restriction enzymes)
    5-methylcytidine antibody (Selection of methylated DNA fragments using an
    antibody raised against 5-methylcytosine or
    5-methylcytidine (m5C))
    MBD2 protein methyl-CpG binding domain (Enrichment by methyl-CpG binding domain)
    CAGE (Cap-analysis gene expression)
    RACE (Rapid Amplification of cDNA Ends)
    MDA (Multiple displacement amplification)
    padlock probes capture method (to be used in conjunction with Bisulfite-Seq)
    Inverse rRNA selection (Remove the ribosomal transcripts by inverse selection: you capture
    them by annealing with specific oligos, also bound to beads, and then discard that)
    Oligo-dT (Select primarily messenger RNA, which conveniently is polyadenylated so these transcripts can be captured with oligo-dT beads)
    other (Other library enrichment, screening, or selection process)
    unspecified (Library enrichment, screening, or selection is not specified)
-->

Run XML

The Run XML is used to associate data files with experiments, and a run typically comprises a single data file. Please note that pooled samples should be de-multiplexed prior to submission and submitted as separate runs.
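
As a rough illustration of the note on pooled samples, a de-multiplexed submission for two members of a pooled experiment could be sketched as two runs in one RUN_SET. This is only a sketch under assumptions: the fastq filetype is chosen for illustration, all TODO values are placeholders, and each member_name must match a member_name given in the POOL of the referenced experiment. The fully annotated template, including optional attributes and RUN_TYPE, follows below.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<RUN_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.run.xsd">
    <!-- One run per de-multiplexed pool member; both runs reference the same pooled experiment. -->
    <RUN alias="TODO: UNIQUE NAME FOR RUN 1" center_name="TODO: ACCOUNT CENTER_NAME ACRONYM"
        run_date="2008-07-02T10:00:00">
        <EXPERIMENT_REF refname="TODO: ALIAS OF THE POOLED EXPERIMENT"/>
        <DATA_BLOCK member_name="TODO: SAMPLE 1 ALIAS">
            <FILES>
                <FILE filename="TODO: FILENAME FOR SAMPLE 1" filetype="fastq"
                    checksum_method="MD5" checksum="TODO: CHECKSUM1"/>
            </FILES>
        </DATA_BLOCK>
    </RUN>
    <RUN alias="TODO: UNIQUE NAME FOR RUN 2" center_name="TODO: ACCOUNT CENTER_NAME ACRONYM"
        run_date="2008-07-02T10:00:00">
        <EXPERIMENT_REF refname="TODO: ALIAS OF THE POOLED EXPERIMENT"/>
        <DATA_BLOCK member_name="TODO: SAMPLE 2 ALIAS">
            <FILES>
                <FILE filename="TODO: FILENAME FOR SAMPLE 2" filetype="fastq"
                    checksum_method="MD5" checksum="TODO: CHECKSUM2"/>
            </FILES>
        </DATA_BLOCK>
    </RUN>
</RUN_SET>
```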

<?xml version="1.0" encoding="UTF-8"?>
<RUN_SET  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.run.xsd">
    <RUN alias="TODO: UNIQUE NAME FOR RUN" center_name="TODO: ACCOUNT CENTER_NAME ACRONYM"
        run_center="TODO: ACCOUNT CENTER_NAME ACRONYM" run_date="2008-07-02T10:00:00">
        <!-- run_center is optional but should be used if sequencing was carried out 
             elsewhere. It should use an acronym/abbreviation from a controlled vocabulary. 
             Please contact us for this acronym if unsure. Delete the run_center attribute if 
             not needed. -->
        <!-- run_date uses a strict format. Please make sure the time stamp is included, 
             even if it is zeroes (i.e., T00:00:00). -->
        <EXPERIMENT_REF refname="TODO: EXPERIMENT ALIAS OF RELEVANT EXPERIMENT OBJECT"/>

        <!-- Reference assembly and sequences should be provided for BAM files with aligned reads. -->
        <RUN_TYPE>
            <REFERENCE_ALIGNMENT>
                <ASSEMBLY>
                    <STANDARD refname="TODO: INSDC assembly name (e.g. GRCh37 or GRCh37.p1)"
                        accession="TODO: INSDC assembly accession (e.g. GCA_000001405.1)"/>
                </ASSEMBLY>
                <SEQUENCE accession="TODO: INSDC sequence accession and version"
                    label="TODO: reference sequence name in the BAM file"/>
                <SEQUENCE accession="TODO: INSDC sequence accession and version"
                    label="TODO: reference sequence name in the BAM file"/>
            </REFERENCE_ALIGNMENT>
        </RUN_TYPE>

        <DATA_BLOCK member_name="TODO: FOR DEMULTIPLEXED DATA ONLY (see note below)">
            <!-- member_name should be the name (usually sample alias) given in the experiment
                 XML of a pooled experiment. For experiments without a pool, the member_name
                 attribute in the run XML should be removed. -->
            <FILES>
                <FILE filename="TODO: FILENAME1"
                    filetype="TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML"
                    checksum_method="MD5" checksum="TODO: CHECKSUM1"/>
            </FILES>
        </DATA_BLOCK>
    </RUN>
    <!-- If you are submitting more than one run, replicate the block <RUN> to </RUN> here,
        as many times as necessary. -->
</RUN_SET>
<!-- Controlled vocabulary for filetype:
    "srf"
    "sff"
    "fastq"
    "cram"
    "bam"
    "Illumina_native_qseq"
    "Illumina_native_scarf"
    "Illumina_native_fastq"
    "SOLiD_native_csfasta"
    "SOLiD_native_qual"
    "PacBio_HDF5"
    "CompleteGenomics_native"
-->

 

Run XML: CRAM files

When CRAM files are submitted without embedded reference sequences it is essential to provide RUN_TYPE/REFERENCE_ALIGNMENT/SEQUENCE information (please see above).
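
For illustration, a Run XML for a CRAM file without embedded reference sequences might look like the sketch below. It reuses only elements from the generic Run XML template above, with the filetype set to "cram". All TODO values are placeholders, and providing one SEQUENCE element per reference sequence used in the CRAM file is an assumption here rather than a documented rule.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<RUN_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.run.xsd">
    <RUN alias="TODO: UNIQUE NAME FOR RUN" center_name="TODO: ACCOUNT CENTER_NAME ACRONYM"
        run_date="2008-07-02T10:00:00">
        <EXPERIMENT_REF refname="TODO: EXPERIMENT ALIAS OF RELEVANT EXPERIMENT OBJECT"/>
        <RUN_TYPE>
            <REFERENCE_ALIGNMENT>
                <ASSEMBLY>
                    <STANDARD refname="TODO: INSDC assembly name (e.g. GRCh37 or GRCh37.p1)"
                        accession="TODO: INSDC assembly accession (e.g. GCA_000001405.1)"/>
                </ASSEMBLY>
                <!-- Assumption: repeat SEQUENCE for each reference sequence used in the CRAM file. -->
                <SEQUENCE accession="TODO: INSDC sequence accession and version"
                    label="TODO: reference sequence name in the CRAM file"/>
            </REFERENCE_ALIGNMENT>
        </RUN_TYPE>
        <DATA_BLOCK>
            <FILES>
                <FILE filename="TODO: FILENAME.cram" filetype="cram"
                    checksum_method="MD5" checksum="TODO: CHECKSUM"/>
            </FILES>
        </DATA_BLOCK>
    </RUN>
</RUN_SET>
```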

Run XML: Complete Genomics

One Run XML object should be created for each Complete Genomics Data Package.

<?xml version="1.0" encoding="UTF-8"?>
<RUN_SET  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_3/SRA.run.xsd">
    <RUN alias="TODO: UNIQUE NAME FOR RUN" center_name="TODO: ACCOUNT CENTER_NAME ACRONYM"
        run_date="2008-07-02T10:00:00">
        <!-- run_date uses a strict format. Please make sure the time stamp is included, 
             even if it is zeroes (i.e., T00:00:00). -->
        <EXPERIMENT_REF refname="TODO: EXPERIMENT ALIAS OF RELEVANT EXPERIMENT OBJECT"/>
        <DATA_BLOCK>
            <FILES>
                <FILE filename="TODO: DIRECTORY NAME FOR THE INDIVIDUAL GENOME"
                    filetype="CompleteGenomics_native"/> 
            </FILES>
        </DATA_BLOCK>
    </RUN>
    <!-- If you are submitting more than one run, replicate the block <RUN> to </RUN> here, 
        as many times as necessary. -->
</RUN_SET>
 

Read alignment (BAM) Analysis XML

The Analysis XML can be used to submit BAM alignments to ENA. Only one BAM file can be submitted in each analysis, and the samples used within the BAM read groups must be associated with Samples. In addition, the Analysis must be associated with a Study. Ideally, the BAM file is associated with an INSDC reference assembly and sequences, either by using accessions (as for the reference sequences in the example below) or by using commonly used labels (as for the reference assembly in the example below). The BAM index can be submitted together with the BAM file; if it is not submitted, ENA will create it. The MD5 checksums for the .bam and .bai files can be provided within the Analysis XML or in separate .bam.md5 and .bai.md5 files.

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
    <ANALYSIS alias="TODO: UNIQUE NAME FOR ANALYSIS" 
        center_name="TODO: center name abbreviation"
        broker_name="TODO: center name abbreviation"
        analysis_center="TODO: center name abbreviation" analysis_date="2011-11-18T10:10:10.0Z">
        <TITLE>TODO: a descriptive title for the analysis shown in search results</TITLE>
        <DESCRIPTION>TODO: a detailed description of the analysis</DESCRIPTION>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"
            refcenter="TODO: center name abbreviation"/>
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 1"
            refcenter="TODO: center name abbreviation"
            label="TODO: sample name in the BAM file"/>
        
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 2"
            refcenter="TODO: center name abbreviation"
            label="TODO: sample name in the BAM file"/>
        
        <ANALYSIS_TYPE>
            <REFERENCE_ALIGNMENT>
                <ASSEMBLY>
                    <STANDARD refname="TODO: INSDC assembly name (e.g. GRCh37 or GRCh37.p1)"
                        accession="TODO: INSDC assembly accession (e.g. GCA_000001405.1)"/>
                </ASSEMBLY>
                <SEQUENCE accession="TODO: INSDC sequence accession and version"
                    label="TODO: reference sequence name in the BAM file"/>
                <SEQUENCE accession="TODO: INSDC sequence accession and version"
                    label="TODO: reference sequence name in the BAM file"/>
            </REFERENCE_ALIGNMENT>
        </ANALYSIS_TYPE>
        <FILES>
            <FILE filename="TODO: FILENAME.bam" filetype="bam" checksum_method="MD5"
                checksum="TODO: CHECKSUM" unencrypted_checksum="TODO: CHECKSUM"/>
        </FILES>
        <ANALYSIS_ATTRIBUTES>
            <ANALYSIS_ATTRIBUTE>
                <TAG>TODO: add any tag and value pairs</TAG>
                <VALUE>TODO: add any tag and value pairs</VALUE>
            </ANALYSIS_ATTRIBUTE>
        </ANALYSIS_ATTRIBUTES>
    </ANALYSIS>
</ANALYSIS_SET>

Sequence variation (VCF) Analysis XML

The Analysis XML can be used to submit VCF files to ENA. Only one VCF file can be submitted in each analysis, and the samples used within the VCF file must be associated with Samples. In addition, the Analysis must be associated with a Study. Ideally, the VCF file is associated with an INSDC reference assembly and sequences, either by using accessions (as for the reference sequences in the example below) or by using commonly used labels (as for the reference assembly in the example below). The MD5 checksum for the .vcf file can be provided within the Analysis XML or in a separate .vcf.md5 file.

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
    <ANALYSIS alias="TODO: UNIQUE NAME FOR ANALYSIS" center_name="TODO: center name abbreviation"
        broker_name="TODO: center name abbreviation"
        analysis_center="TODO: center name abbreviation" analysis_date="2011-11-18T10:10:10.0Z">
        <TITLE>TODO: a descriptive title for the analysis shown in search results</TITLE>
        <DESCRIPTION>TODO: a detailed description of the analysis</DESCRIPTION>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"
            refcenter="TODO: center name abbreviation"/>
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 1"
            refcenter="TODO: center name abbreviation"
            label="TODO: the sample name in the VCF file"/>
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 2"
            refcenter="TODO: center name abbreviation"
            label="TODO: the sample name in the VCF file"/>
        <ANALYSIS_TYPE>
            <SEQUENCE_VARIATION>
                <ASSEMBLY>
                    <STANDARD refname="TODO: INSDC assembly name (e.g. GRCh37 or GRCh37.p1)"
                        accession="TODO: INSDC assembly accession (e.g. GCA_000001405.1)"/>
                </ASSEMBLY>
                <SEQUENCE accession="TODO: INSDC sequence accession and version"
                    label="TODO: the reference sequence name in the VCF file"/>
                <EXPERIMENT_TYPE>Use one of: "Whole genome sequencing", "Exome sequencing", "Genotyping by array"</EXPERIMENT_TYPE>
            </SEQUENCE_VARIATION>
        </ANALYSIS_TYPE>
        <FILES>
            <FILE filename="TODO: FILENAME.vcf" filetype="vcf" checksum_method="MD5"
                checksum="TODO: CHECKSUM" unencrypted_checksum="TODO: CHECKSUM"/>
        </FILES>
        <ANALYSIS_ATTRIBUTES>
            <ANALYSIS_ATTRIBUTE>
                <TAG>TODO: add any tag and value pairs</TAG>
                <VALUE>TODO: add any tag and value pairs</VALUE>
            </ANALYSIS_ATTRIBUTE>
        </ANALYSIS_ATTRIBUTES>
    </ANALYSIS>
</ANALYSIS_SET>