Identifying objects

Each object is uniquely identified within a submission account using the alias attribute. Once an object has been submitted no other object of the same type can use the same alias within the submission account.

Objects can refer to other objects within a submission account using the refname attribute. For example, if a sample has alias="sample1" an experiment can reference this sample by using refname="sample1".
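
As a minimal sketch of this mechanism, the fragment below shows an experiment pointing at a sample through refname. It assumes that a sample with alias="sample1" already exists in the same submission account; "experiment1" and "study1" are made-up aliases, and most experiment content is omitted.

```xml
<!-- Sketch only: assumes a sample with alias="sample1" and a study with
     alias="study1" (both illustrative aliases) exist in this account. -->
<EXPERIMENT alias="experiment1" center_name="TODO: CENTER NAME">
    <STUDY_REF refname="study1"/>
    <DESIGN>
        <!-- refname repeats the alias of the referenced sample -->
        <SAMPLE_DESCRIPTOR refname="sample1"/>
        <!-- remaining DESIGN content omitted -->
    </DESIGN>
</EXPERIMENT>
```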

Identifying submitters

The center_name attribute defines the submitting institution. The center name is required to match the one registered for the submission account.

If the submitter is brokering a submission for another institute, the center name should reflect the institute where the data was generated. Brokers should request a special broker account and provide their center name acronym in the broker_name attribute. Brokers can also point to an equivalent object in their resource and to their customer using the BROKER_OBJECT_ID and BROKER_CUSTOMER_NAME attributes. The submission tool used by the broker can be captured using the BROKER_SUBMISSION_TOOL attribute.
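
As a rough illustration of the brokering case, the fragment below sets both attributes on a sample. It assumes broker_name is written next to alias and center_name on the object element, as in the Analysis XML examples later on this page; the alias is made up and the sample content is omitted.

```xml
<!-- Sketch only: "brokered_sample_1" is an illustrative alias. -->
<SAMPLE alias="brokered_sample_1"
    center_name="TODO: CENTER NAME OF THE INSTITUTE WHERE THE DATA WAS GENERATED"
    broker_name="TODO: CENTER NAME ACRONYM OF THE BROKER">
    <!-- TITLE, SAMPLE_NAME and other sample content omitted -->
</SAMPLE>
```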

If the work has been contracted to another party, the run_center or analysis_center attributes can be used to provide this information.

Metadata XMLs

The metadata model consists of several objects, each represented using XML.

Read submissions

A typical read submission consists of five XMLs:

  • Submission XML
  • Study XML
  • Sample XML
  • Experiment XML
  • Run XML

A submission does not have to contain all five XMLs. For example, it is possible to submit only samples or studies to be referenced in the future. Please note that a Submission XML is always required, whatever the submission scenario.

Technical reads (e.g. barcodes, adaptors or linkers) should be removed prior to submission. If they are included, a spot descriptor is required so that the technical reads can be identified.

Supported file types in Run XML for read submissions:

Format                       Filetype
CRAM (recommended format)    cram
BAM (recommended format)     bam
Fastq                        fastq
SFF                          sff
PacBio format                PacBio_HDF5
Oxford Nanopore format       OxfordNanopore_native
Complete Genomics format     CompleteGenomics_native

Assembly submissions

A typical assembly submission consists of four XMLs:

  • Submission XML
  • Study XML
  • Sample XML
  • Analysis XML ( SEQUENCE_ASSEMBLY )

Supported file types in Analysis XML for assembly submissions:

Format                             Filetype
Contigs in Fasta format            contig_fasta
Contigs in Flat File format        contig_flatfile
Scaffolds in Fasta format          scaffold_fasta
Scaffolds in Flat File format      scaffold_flatfile
Scaffolds in AGP format            scaffold_agp
Chromosomes in Fasta format        chromosome_fasta
Chromosomes in Flat File format    chromosome_flatfile
Chromosomes in AGP format          chromosome_agp
List of chromosomes                chromosome_list
List of unlocalised contigs        unlocalised_contig_list
List of unlocalised scaffolds      unlocalised_scaffold_list

Alignment submissions

A typical re-aligned read submission consists of four XMLs:

  • Submission XML
  • Study XML
  • Sample XML
  • Analysis XML ( REFERENCE_ALIGNMENT )

Previously submitted studies and samples can also be referenced.

Supported file types in Analysis XML for alignment submissions:

Format    Filetype
BAM       bam

Annotation submissions

A typical annotation submission consists of four XMLs:

  • Submission XML
  • Study XML
  • Sample XML
  • Analysis XML ( SEQUENCE_ANNOTATION )

Previously submitted studies and samples can also be referenced.

Supported file types in Analysis XML for annotation submissions:

Format                 Filetype    File suffix
Tab separated table    tab         .tab, .tab.gz

Variation submissions

A typical variation submission consists of four XMLs:

  • Submission XML
  • Study XML
  • Sample XML
  • Analysis XML ( SEQUENCE_VARIATION )

Previously submitted studies and samples can also be referenced.

Supported file types in Analysis XML for variation submissions:

Format    Filetype
VCF       vcf
VCF       vcf_aggregate

 

Submission XML

The submission XML is used to submit, update, release or validate other objects. The XML documents that can be used in association with the submission, and their schemas, are listed below:

Document       Schema
Submission     SRA.submission.xsd
Study          SRA.study.xsd
Sample         SRA.sample.xsd
Experiment     SRA.experiment.xsd
Run            SRA.run.xsd
Analysis       SRA.analysis.xsd
EGA DAC        EGA.dac.xsd
EGA Policy     EGA.policy.xsd
EGA Dataset    EGA.dataset.xsd

Public release of objects

Data and other objects associated with a study are released to the public only when the study is made public. Please note that samples may also be made public independently of studies. After public release, withdrawal from public access is only possible by contacting us at datasubs@ebi.ac.uk.

Studies can be kept confidential for up to two years by using the HOLD action. If the HOLD action is not used, or if the RELEASE action is used, the submitted studies become public immediately, together with all associated data and other objects.

Submitting objects

New objects are submitted using the ADD action.

An example of a submission XML used to submit new objects using the ADD action is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<SUBMISSION_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.submission.xsd">
    <SUBMISSION alias="TODO: UNIQUE NAME FOR SUBMISSION"
        center_name="TODO: CENTER NAME">
        <ACTIONS>
            <ACTION>
                <ADD/>
            </ACTION>
            <ACTION>
                <!-- Choose either the RELEASE or the HOLD action and delete the other.
                     RELEASE is for immediate public release. HOLD is for an embargo
                     (maximum 2 years). By default RELEASE is assumed. -->
                <RELEASE/>
                <HOLD HoldUntilDate="TODO: hold until date 2010-01-01"/>
            </ACTION>
        </ACTIONS>
    </SUBMISSION>
</SUBMISSION_SET>
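
To make the RELEASE-or-HOLD choice concrete, here is a sketch of the same template with the HOLD option selected and the RELEASE element deleted. The alias and the date are illustrative values only; the date follows the yyyy-mm-dd form used in the template and would have to fall within the two-year limit.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "submission_2017_001" and the hold date are made-up example values. -->
<SUBMISSION_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.submission.xsd">
    <SUBMISSION alias="submission_2017_001" center_name="TODO: CENTER NAME">
        <ACTIONS>
            <ACTION>
                <ADD/>
            </ACTION>
            <ACTION>
                <!-- Keep the submitted studies confidential until this date -->
                <HOLD HoldUntilDate="2018-01-01"/>
            </ACTION>
        </ACTIONS>
    </SUBMISSION>
</SUBMISSION_SET>
```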

Updating objects

Object updates are done using the MODIFY action.

An example of a submission XML used to update existing objects is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<SUBMISSION_SET>
    <SUBMISSION alias="TODO: UNIQUE NAME FOR SUBMISSION" center_name="TODO: CENTER NAME">
        <ACTIONS>
            <ACTION>
                <MODIFY/>
            </ACTION>
        </ACTIONS>
    </SUBMISSION>
</SUBMISSION_SET>

Releasing objects

If studies have been previously submitted with HOLD action then they can be made immediately public by using the RELEASE action.

For example, the following XML publishes the study ERP001835 with associated data and metadata:

<?xml version="1.0" encoding="UTF-8"?>
<SUBMISSION_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.submission.xsd">
    <SUBMISSION alias="TODO: UNIQUE NAME FOR SUBMISSION" center_name="TODO: CENTER NAME">
        <ACTIONS>
            <ACTION>
                <RELEASE target="ERP001835"/>
            </ACTION>
        </ACTIONS>
    </SUBMISSION>
</SUBMISSION_SET>

Validating objects

Objects can be validated by using the VALIDATE action instead of the ADD action.

<?xml version="1.0" encoding="UTF-8"?>
<SUBMISSION_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.submission.xsd">
    <SUBMISSION alias="TODO: UNIQUE NAME FOR SUBMISSION" center_name="TODO: CENTER NAME">
        <ACTIONS>
            <ACTION>
                <VALIDATE/>
            </ACTION>
        </ACTIONS>
    </SUBMISSION>
</SUBMISSION_SET>

Study XML

The study XML is used to describe the sequencing study including a title, a study type and an abstract as it would appear in a publication.

An example of a study XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<STUDY_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.study.xsd">
    <STUDY alias="TODO: UNIQUE NAME FOR STUDY"
        center_name="TODO: CENTER NAME">
        <DESCRIPTOR>
            <STUDY_TITLE>TODO: STUDY TITLE AS IT COULD APPEAR IN A PUBLICATION</STUDY_TITLE>
            <STUDY_TYPE existing_study_type="TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML"/>
            <STUDY_ABSTRACT>TODO: STUDY ABSTRACT AS IT COULD APPEAR IN A
                PUBLICATION</STUDY_ABSTRACT>
        </DESCRIPTOR>
        <STUDY_ATTRIBUTES>
            <STUDY_ATTRIBUTE>
                <TAG>TODO: TAG NAME</TAG>
                <VALUE>TODO: TAG VALUE</VALUE>
            </STUDY_ATTRIBUTE>
            <STUDY_ATTRIBUTE>
                <TAG>TODO: TAG NAME</TAG>
                <VALUE>TODO: TAG VALUE</VALUE>
            </STUDY_ATTRIBUTE>
            <!-- You can generate your own fields and values here using STUDY_ATTRIBUTE 
                tag-value pairs. Please delete any unused attributes and add as many as 
                required. -->
        </STUDY_ATTRIBUTES>
    </STUDY>
    <!-- If you are submitting more than one study, replicate the block <STUDY> to </STUDY> 
        here, as many times as necessary. -->
</STUDY_SET>
<!-- Controlled vocabulary for existing_study_type:
Whole Genome Sequencing
Metagenomics
Transcriptome Analysis
Resequencing
Epigenetics
Synthetic Genomics
Forensic or Paleo-genomics
Gene Regulation Study
Cancer Genomics
Population Genomics
RNASeq
Exome Sequencing
Pooled Clone Sequencing
Other (if using "Other", please add a new_study_type="TODO: add own term" attribute)
-->
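
For instance, a study type outside the controlled vocabulary might be declared as sketched below. The value given for new_study_type is a made-up term used purely for illustration.

```xml
<!-- Sketch only: "Ancient DNA Survey" is an invented example term. -->
<STUDY_TYPE existing_study_type="Other" new_study_type="Ancient DNA Survey"/>
```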

Please use the following notation when including PubMed citations in Study XML:

<STUDY_LINKS>
    <STUDY_LINK>
        <XREF_LINK>
            <DB>PUBMED</DB>
            <ID>18987735</ID>
        </XREF_LINK>
    </STUDY_LINK>
</STUDY_LINKS>
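
The notation above does not say where the block goes inside a study. The sketch below assumes that STUDY_LINKS is placed after DESCRIPTOR and before any STUDY_ATTRIBUTES, which should be checked against SRA.study.xsd; "study1" is an illustrative alias.

```xml
<!-- Sketch only: the position of STUDY_LINKS is an assumption to verify
     against the study schema; "study1" is a made-up alias. -->
<STUDY alias="study1" center_name="TODO: CENTER NAME">
    <DESCRIPTOR>
        <STUDY_TITLE>TODO: STUDY TITLE</STUDY_TITLE>
        <STUDY_TYPE existing_study_type="Metagenomics"/>
        <STUDY_ABSTRACT>TODO: STUDY ABSTRACT</STUDY_ABSTRACT>
    </DESCRIPTOR>
    <STUDY_LINKS>
        <STUDY_LINK>
            <XREF_LINK>
                <DB>PUBMED</DB>
                <ID>18987735</ID>
            </XREF_LINK>
        </STUDY_LINK>
    </STUDY_LINKS>
</STUDY>
```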

Sample XML

The sample XML is used to describe the sequenced samples. The mandatory fields are minimal and include information about the taxonomy of the sample. However, since the sample is one of the most important objects to describe biologically, it is highly recommended that “TAG-VALUE” pairs are provided to describe the sample in as much detail as possible. We recommend the adoption of GSC (Genomic Standards Consortium) terms for the TAG names where possible. For a full list of terms in the specific standards, please visit the GSC wiki.

An example of a sample XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.sample.xsd">
    <SAMPLE alias="TODO: UNIQUE NAME FOR SAMPLE" 
        center_name="TODO: ACCOUNT CENTER_NAME">                     
        <TITLE>TODO: A SHORT INFORMATIVE DESCRIPTION OF THE SAMPLE</TITLE>
        <SAMPLE_NAME>
            <TAXON_ID>TODO: PROVIDE NCBI TAXID FOR ORGANISM (e.g. 9606 for human)
                 </TAXON_ID>
            <!-- For complete prokaryotic genomes, a taxid should be generated for the strain. 
                 Please contact us so we can generate this on your behalf. -->
            <SCIENTIFIC_NAME>TODO: SCIENTIFIC NAME AS APPEARS IN NCBI TAXONOMY FOR THE
                TAXON_ID (e.g. Homo sapiens)</SCIENTIFIC_NAME>
            <COMMON_NAME>TODO: OPTIONAL COMMON NAME AS APPEARS IN NCBI TAXONOMY FOR 
                THE TAXON_ID (e.g. human)</COMMON_NAME>
        </SAMPLE_NAME>
        <DESCRIPTION>TODO: A LONGER DESCRIPTION OF SAMPLE AND HOW IT DIFFERS FROM 
            OTHER SAMPLES</DESCRIPTION>
        <SAMPLE_ATTRIBUTES>
            <SAMPLE_ATTRIBUTE>
                <TAG>TODO: TAG NAME</TAG>
                <VALUE>TODO: TAG VALUE</VALUE>
                <UNITS>TODO: OPTIONAL UNIT</UNITS>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>TODO: TAG NAME</TAG>
                <VALUE>TODO: TAG VALUE</VALUE>
                <UNITS>TODO: OPTIONAL UNIT</UNITS>
            </SAMPLE_ATTRIBUTE>
            <!-- You can generate your own fields and values here using SAMPLE_ATTRIBUTE 
                tag-value pairs. An example tag could be "Isolation Source" and the value 
                could be "Seawater". You can also use the UNITS element to include 
                scientific units. E.g., TAG "Age" VALUE "5" UNITS "Years". Please refer
                to online documentation for further help with sample tag-value pairs.
                Please delete any unused attributes and add as many as required. -->
        </SAMPLE_ATTRIBUTES>
    </SAMPLE>
    <!-- If you are submitting more than one sample, replicate the block <SAMPLE> to </SAMPLE> 
         here, as many times as necessary. -->
</SAMPLE_SET> 
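
As an illustration of filled-in tag-value pairs, the sketch below describes a human sample using the "Age" example from the template comment. The alias, title and the "Tissue Type" tag are made-up illustrative values, not terms taken from a checklist; the second attribute simply shows that UNITS can be left out.

```xml
<!-- Sketch only: alias, title and the "Tissue Type" tag are invented examples. -->
<SAMPLE_SET>
    <SAMPLE alias="sample1" center_name="TODO: ACCOUNT CENTER_NAME">
        <TITLE>Blood sample from a five-year-old donor</TITLE>
        <SAMPLE_NAME>
            <TAXON_ID>9606</TAXON_ID>
            <SCIENTIFIC_NAME>Homo sapiens</SCIENTIFIC_NAME>
            <COMMON_NAME>human</COMMON_NAME>
        </SAMPLE_NAME>
        <SAMPLE_ATTRIBUTES>
            <!-- Numeric value with a unit -->
            <SAMPLE_ATTRIBUTE>
                <TAG>Age</TAG>
                <VALUE>5</VALUE>
                <UNITS>Years</UNITS>
            </SAMPLE_ATTRIBUTE>
            <!-- Free-text value; the optional UNITS element is omitted -->
            <SAMPLE_ATTRIBUTE>
                <TAG>Tissue Type</TAG>
                <VALUE>Blood</VALUE>
            </SAMPLE_ATTRIBUTE>
        </SAMPLE_ATTRIBUTES>
    </SAMPLE>
</SAMPLE_SET>
```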

Experiment XML

The experiment XML is used to describe the experimental setup, including instrument and library preparation details, and any additional information required to correctly interpret the submitted data. Where any of these values differ between runs, a separate experiment object must be created. Each experiment references a study and a sample. Please note that pooled data must be demultiplexed by barcode and any technical reads must be removed before submission.

An example of an experiment XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<EXPERIMENT_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.experiment.xsd">
    <EXPERIMENT alias="TODO: UNIQUE NAME FOR EXPERIMENT"
        center_name="TODO: ACCOUNT CENTER_NAME">
        <TITLE>TODO: TITLE OF EXPERIMENT</TITLE>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <DESIGN>
            <DESIGN_DESCRIPTION>TODO: DETAILS ABOUT THE SETUP AND GOALS OF THE 
                EXPERIMENT AS SUPPLIED BY INVESTIGATOR</DESIGN_DESCRIPTION>
            <SAMPLE_DESCRIPTOR refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE 
                OBJECT"/>
            <LIBRARY_DESCRIPTOR>
                <LIBRARY_NAME>TODO: NAME OF LIBRARY</LIBRARY_NAME>
                <LIBRARY_STRATEGY>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT 
                    END OF XML</LIBRARY_STRATEGY>
                <LIBRARY_SOURCE>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT 
                    END OF XML </LIBRARY_SOURCE>
                <LIBRARY_SELECTION>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT 
                    END OF XML </LIBRARY_SELECTION>
                <LIBRARY_LAYOUT>
                    <TODO: CHOOSE LIBRARY LAYOUT FROM CONTROLLED VOCABULARY AT END OF XML/>
                </LIBRARY_LAYOUT>
                <LIBRARY_CONSTRUCTION_PROTOCOL>TODO: PROTOCOL BY WHICH THE LIBRARY WAS
                    CONSTRUCTED</LIBRARY_CONSTRUCTION_PROTOCOL>
            </LIBRARY_DESCRIPTOR>         
        </DESIGN>
        <PLATFORM>
            <TODO: CHOOSE PLATFORM FROM CONTROLLED VOCABULARY AT END OF XML>
                <INSTRUMENT_MODEL>TODO: CHOOSE FROM CONTROLLED VOCABULARY AT 
                END OF XML </INSTRUMENT_MODEL>
            </TODO: CHOOSE PLATFORM FROM CONTROLLED VOCABULARY AT END OF XML>
        </PLATFORM>
        <PROCESSING/>
    </EXPERIMENT>
    <!-- If you are submitting more than one experiment, replicate the block <EXPERIMENT> 
        to </EXPERIMENT> here, as many times as necessary. -->
</EXPERIMENT_SET>
<!-- Controlled vocabulary for LIBRARY_LAYOUT
SINGLE
PAIRED
-->
<!-- Controlled vocabulary for PLATFORM
LS454
ILLUMINA
COMPLETE_GENOMICS
PACBIO_SMRT
ION_TORRENT
OXFORD_NANOPORE
CAPILLARY
-->
<!-- Controlled vocabulary for LS454 INSTRUMENT_MODEL:
454 GS
454 GS 20
454 GS FLX
454 GS FLX+
454 GS FLX Titanium
454 GS Junior
unspecified
-->
<!-- Controlled vocabulary for ILLUMINA INSTRUMENT_MODEL:
Illumina Genome Analyzer
Illumina Genome Analyzer II
Illumina Genome Analyzer IIx
Illumina HiSeq 2500
Illumina HiSeq 2000
Illumina HiSeq 1500
Illumina HiSeq 1000
Illumina MiSeq
Illumina HiScanSQ
HiSeq X Ten
NextSeq 500
unspecified
-->
<!-- Controlled vocabulary for COMPLETE_GENOMICS INSTRUMENT_MODEL:
Complete Genomics
unspecified
-->
<!-- Controlled vocabulary for PACBIO_SMRT INSTRUMENT_MODEL:
PacBio RS
PacBio RS II
unspecified
-->
<!-- Controlled vocabulary for ION_TORRENT INSTRUMENT_MODEL:
Ion Torrent PGM
Ion Torrent Proton
unspecified
-->
<!-- Controlled vocabulary for OXFORD_NANOPORE INSTRUMENT_MODEL:
MinION
GridION
unspecified
-->
<!-- Controlled vocabulary for CAPILLARY INSTRUMENT_MODEL:
AB 3730xL Genetic Analyzer
AB 3730 Genetic Analyzer
AB 3500xL Genetic Analyzer
AB 3500 Genetic Analyzer
AB 3130xL Genetic Analyzer
AB 3130 Genetic Analyzer
AB 310 Genetic Analyzer
-->
<!-- Controlled vocabulary for LIBRARY_STRATEGY (description in parentheses):
WGS (Random sequencing of the whole genome)
WGA (Whole genome amplification to replace some instances of RANDOM)
WXS (Random sequencing of exonic regions selected from the genome)
RNA-Seq (Random sequencing of whole transcriptome)
miRNA-Seq (For micro RNA and other small non-coding RNA sequencing)
ncRNA-Seq (Non-coding RNA)
WCS (Random sequencing of a whole chromosome or other replicon isolated from a genome)
CLONE (Genomic clone based (hierarchical) sequencing)
POOLCLONE (Shotgun of pooled clones (usually BACs and Fosmids))
AMPLICON (Sequencing of overlapping or distinct PCR or RT-PCR products)
CLONEEND (Clone end (5', 3', or both) sequencing)
FINISHING (Sequencing intended to finish (close) gaps in existing coverage)
ChIP-Seq (Direct sequencing of chromatin immunoprecipitates)
MNase-Seq (Direct sequencing following MNase digestion)
DNase-Hypersensitivity (Sequencing of hypersensitive sites, or segments of open chromatin that are more readily cleaved by DNaseI)
Bisulfite-Seq (Sequencing following treatment of DNA with bisulfite to convert cytosine residues to uracil depending on methylation status)
EST (Single pass sequencing of cDNA templates)
FL-cDNA (Full-length sequencing of cDNA templates)
CTS (Concatenated Tag Sequencing)
MRE-Seq (Methylation-Sensitive Restriction Enzyme Sequencing strategy)
MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing strategy)
MBD-Seq (Direct sequencing of methylated fractions sequencing strategy)
Tn-Seq (For gene fitness determination through transposon seeding)
VALIDATION (CGHub special request: Independent experiment to re-evaluate putative variants. Micro RNA sequencing strategy designed to capture post-transcriptional RNA elements and include non-coding functional elements)
FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements)
SELEX (Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an in vitro strategy to analyze RNA sequences that perform an activity of interest, most commonly high affinity binding to a ligand)
RIP-Seq (Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP))
ChiA-PET (Direct sequencing of proximity-ligated chromatin immunoprecipitates)
RAD-Seq (Restriction site Associated DNA Sequencing is a method for sampling the genomes of multiple individuals in a population using NGS)
OTHER (Library strategy not listed)
-->
<!-- Controlled vocabulary for LIBRARY_SOURCE (description in parentheses):
GENOMIC (Genomic DNA (includes PCR products from genomic DNA))
TRANSCRIPTOMIC (Transcription products or non genomic DNA (EST, cDNA, RT-PCR, screened libraries))
METAGENOMIC (Mixed material from metagenome)
METATRANSCRIPTOMIC (Transcription products from community targets)
SYNTHETIC (Synthetic DNA)
VIRAL RNA (Viral RNA)
OTHER (Other, unspecified, or unknown library source material)
-->
<!-- Controlled vocabulary for LIBRARY_SELECTION (description in parentheses):
RANDOM (Random selection by shearing or other method)
PCR (Source material was selected by designed primers)
RANDOM PCR (Source material was selected by randomly generated primers)
RT-PCR (Source material was selected by reverse transcription PCR)
HMPR (Hypo-methylated partial restriction digest)
MF (Methyl Filtrated)
repeat fractionation (replaces: CF-S, CF-M, CF-H, CF-T)
size fractionation
MSLL (Methylation Spanning Linking Library)
cDNA (complementary DNA)
ChIP (Chromatin immunoprecipitation)
MNase (Micrococcal Nuclease (MNase) digestion)
DNase (Deoxyribonuclease (DNase) digestion)
Hybrid Selection (Selection by hybridization in array or solution)
Reduced Representation (Reproducible genomic subsets, often generated by restriction fragment size selection, containing a manageable number of loci to facilitate re-sampling)
Restriction Digest (DNA fractionation using restriction enzymes)
5-methylcytidine antibody (Selection of methylated DNA fragments using an antibody raised against 5-methylcytosine or 5-methylcytidine (m5C))
MBD2 protein methyl-CpG binding domain (Enrichment by methyl-CpG binding domain)
CAGE (Cap-analysis gene expression)
RACE (Rapid Amplification of cDNA Ends)
MDA (Multiple displacement amplification)
padlock probes capture method (to be used in conjunction with Bisulfite-Seq)
Inverse rRNA selection (Remove the ribosomal transcripts by inverse selection: you capture them by annealing with specific oligos, also bound to beads, and then discard that)
Oligo-dT (Select primarily messenger RNA, which conveniently is polyadenylated so these transcripts can be captured with oligo-dT beads)
other (Other library enrichment, screening, or selection process)
unspecified (Library enrichment, screening, or selection is not specified)
-->
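
The LIBRARY_LAYOUT and PLATFORM placeholders in the template are not valid XML as written; each stands for an element named after the chosen vocabulary term. The sketch below shows one possible filled-in experiment using terms from the lists above (SINGLE, ILLUMINA, Illumina HiSeq 2000, WGS, GENOMIC, RANDOM). The aliases, title, library name and description are made-up illustrative values.

```xml
<!-- Sketch only: aliases, title, library name and description are invented;
     vocabulary terms are taken from the lists above. -->
<EXPERIMENT alias="experiment1" center_name="TODO: ACCOUNT CENTER_NAME">
    <TITLE>Whole genome shotgun sequencing of sample1</TITLE>
    <STUDY_REF refname="study1"/>
    <DESIGN>
        <DESIGN_DESCRIPTION>Single-end whole genome sequencing</DESIGN_DESCRIPTION>
        <SAMPLE_DESCRIPTOR refname="sample1"/>
        <LIBRARY_DESCRIPTOR>
            <LIBRARY_NAME>library1</LIBRARY_NAME>
            <LIBRARY_STRATEGY>WGS</LIBRARY_STRATEGY>
            <LIBRARY_SOURCE>GENOMIC</LIBRARY_SOURCE>
            <LIBRARY_SELECTION>RANDOM</LIBRARY_SELECTION>
            <LIBRARY_LAYOUT>
                <!-- The layout term becomes an empty element -->
                <SINGLE/>
            </LIBRARY_LAYOUT>
        </LIBRARY_DESCRIPTOR>
    </DESIGN>
    <PLATFORM>
        <!-- The platform term becomes the element wrapping INSTRUMENT_MODEL -->
        <ILLUMINA>
            <INSTRUMENT_MODEL>Illumina HiSeq 2000</INSTRUMENT_MODEL>
        </ILLUMINA>
    </PLATFORM>
    <PROCESSING/>
</EXPERIMENT>
```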

Run XML

The run XML is used to associate data files with experiments. Please note that pooled data must be demultiplexed by barcode and any technical reads must be removed before submission. The md5 checksums for the files can be provided within the Run XML or in files with the same name as the submitted files postfixed with '.md5'.

An example of a run XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<RUN_SET  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_5/SRA.run.xsd">
    <RUN alias="TODO: UNIQUE NAME FOR RUN" center_name="TODO: ACCOUNT CENTER_NAME"
        run_center="TODO: ACCOUNT CENTER_NAME" run_date="2008-07-02T10:00:00">
        <!-- run_center is optional but should be used if sequencing was carried out 
             elsewhere. It should use an acronym/abbreviation from a controlled vocabulary. 
             Please contact us for this acronym if unsure. Delete the run_center attribute 
             if not needed. -->
        <!-- run_date uses a strict format. Please make sure the time stamp is included, 
             even if it is zeroes (i.e., T00:00:00). -->
        <EXPERIMENT_REF refname="TODO: EXPERIMENT ALIAS OF RELEVANT EXPERIMENT OBJECT"/>


        <DATA_BLOCK member_name="TODO: FOR DEMULTIPLEXED DATA ONLY (see note below)">
            <!-- member_name should be the name (usually sample alias) given in the 
                 experiment xml of a pooled experiment. For experiments without a pool, 
                 the member_name attribute in run.xml should be removed. -->
            <FILES>
                <FILE filename="TODO: FILENAME1"
                    filetype="TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML"
                    checksum_method="MD5" checksum="TODO: CHECKSUM1"/>
            </FILES>
        </DATA_BLOCK>
    </RUN>
    <!-- If you are submitting more than one run, replicate the block <RUN> to </RUN> 
         here, as many times as necessary. -->
</RUN_SET>
<!-- Controlled vocabulary for filetype:
"srf"
"sff"
"fastq"
"cram"
"bam"
"PacBio_HDF5"
"OxfordNanopore_native"
"CompleteGenomics_native" (filename must point to the whole data package)
-->

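For a run that is not part of a pooled experiment, the template reduces to something like the sketch below: member_name and run_center are removed and a filetype is picked from the vocabulary. The aliases and filename are made up, and the checksum shown is the MD5 of empty input, included only to illustrate the 32-character hexadecimal form; a real submission needs the checksum of the actual file.

```xml
<!-- Sketch only: aliases and filename are invented; the checksum is a
     placeholder (MD5 of empty input), not the checksum of a real file. -->
<RUN_SET>
    <RUN alias="run1" center_name="TODO: ACCOUNT CENTER_NAME"
        run_date="2008-07-02T10:00:00">
        <EXPERIMENT_REF refname="experiment1"/>
        <!-- No member_name: the experiment is not pooled -->
        <DATA_BLOCK>
            <FILES>
                <FILE filename="reads.fastq.gz" filetype="fastq"
                    checksum_method="MD5"
                    checksum="d41d8cd98f00b204e9800998ecf8427e"/>
            </FILES>
        </DATA_BLOCK>
    </RUN>
</RUN_SET>
```
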
Analysis XML for assembly submissions

Analysis XML can be used to submit genome assemblies. Only one assembly can be submitted in each analysis. Each analysis can contain 1 or more data files and must be associated with a single study and a single sample. The md5 checksums for the files can be provided within the Analysis XML or in files with the same name as the submitted files postfixed with '.md5'.

An example of an analysis XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
    <ANALYSIS alias="TODO: UNIQUE NAME FOR ASSEMBLY" 
        center_name="TODO: CENTER NAME">
        <TITLE>TODO: a descriptive title for the analysis shown in search results</TITLE>
        <DESCRIPTION>TODO: a detailed description of the analysis</DESCRIPTION>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT"/>
        <ANALYSIS_TYPE>
            <SEQUENCE_ASSEMBLY>
                <NAME>TODO: UNIQUE NAME FOR ASSEMBLY</NAME>
                <PARTIAL>TODO: TRUE or FALSE</PARTIAL>
                <COVERAGE>TODO: NUMERIC SEQUENCING COVERAGE</COVERAGE>
                <PROGRAM>TODO: ASSEMBLY PROGRAM</PROGRAM>
                <PLATFORM>TODO: SEQUENCING PLATFORM</PLATFORM>
            </SEQUENCE_ASSEMBLY>
        </ANALYSIS_TYPE>
        <FILES>
            <FILE filename="TODO: FILENAME1"
                filetype="TODO: CHOOSE FROM CONTROLLED VOCABULARY AT END OF XML"
                checksum_method="MD5" checksum="TODO: CHECKSUM1"/>
        </FILES>
        <ANALYSIS_ATTRIBUTES>
            <ANALYSIS_ATTRIBUTE>
                <TAG>TODO: add any tag and value pairs</TAG>
                <VALUE>TODO: add any tag and value pairs</VALUE>
            </ANALYSIS_ATTRIBUTE>
        </ANALYSIS_ATTRIBUTES>
    </ANALYSIS>
</ANALYSIS_SET>
<!--
Controlled vocabulary for filetype:
"contig_fasta"
"contig_flatfile"
"scaffold_fasta"
"scaffold_flatfile"
"scaffold_agp"
"chromosome_fasta"
"chromosome_flatfile"
"chromosome_agp"
"chromosome_list"
"unlocalised_contig_list"
"unlocalised_scaffold_list"
-->

Analysis XML for alignment submissions

Analysis XML can be used to submit BAM alignments. Only one BAM file can be submitted in each analysis, and the samples used within the BAM must be associated with submitted samples. In addition, the analysis must be associated with a study. Ideally, the BAM file is associated with an INSDC reference assembly and sequences, either by using accessions (as for the reference sequences in the example below) or by using commonly used labels (as for the reference assembly in the example below). The BAM index can be submitted together with the BAM. If the BAM index file is not submitted, it will be created by ENA. The md5 checksums for the files can be provided within the Analysis XML or in files with the same name as the submitted files postfixed with '.md5'.

An example of an analysis XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
    <ANALYSIS alias="TODO: UNIQUE NAME FOR ANALYSIS" 
        center_name="TODO: CENTER NAME"
        broker_name="TODO: CENTER NAME"
        analysis_center="TODO: CENTER NAME" analysis_date="2011-11-18T10:10:10.0Z">
        <TITLE>TODO: a descriptive title for the analysis shown in search results</TITLE>
        <DESCRIPTION>TODO: a detailed description of the analysis</DESCRIPTION>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 1"
            label="TODO: sample name in the BAM file"/>        
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 2"
            label="TODO: sample name in the BAM file"/>        
        <ANALYSIS_TYPE>
            <REFERENCE_ALIGNMENT>
                <ASSEMBLY>
                    <STANDARD refname="TODO: INSDC assembly name (e.g. GRCh37 or GRCh37.p1)"
                        accession="TODO: INSDC assembly accession (e.g. GCA_000001405.1)"/>
                </ASSEMBLY>
                <SEQUENCE accession="TODO: INSDC sequence accession and version"
                    label="TODO: reference sequence name in the BAM file"/>
                <SEQUENCE accession="TODO: INSDC sequence accession and version"
                    label="TODO: reference sequence name in the BAM file"/>
            </REFERENCE_ALIGNMENT>
        </ANALYSIS_TYPE>
        <FILES>
            <FILE filename="TODO: FILENAME.bam" filetype="bam" checksum_method="MD5"
                checksum="TODO: CHECKSUM" unencrypted_checksum="TODO: CHECKSUM"/>
        </FILES>
        <ANALYSIS_ATTRIBUTES>
            <ANALYSIS_ATTRIBUTE>
                <TAG>TODO: add any tag and value pairs</TAG>
                <VALUE>TODO: add any tag and value pairs</VALUE>
            </ANALYSIS_ATTRIBUTE>
        </ANALYSIS_ATTRIBUTES>
    </ANALYSIS>
</ANALYSIS_SET>

Analysis XML for annotation submissions

Analysis XML can be used to submit annotations in tabulated files. Only one annotation file can be submitted in each analysis. Each analysis must be associated with a single study and a single sample. The md5 checksums for the files can be provided within the Analysis XML or in files with the same name as the submitted files postfixed with '.md5'.

An example of an analysis XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
    <ANALYSIS alias="TODO: UNIQUE NAME FOR ANALYSIS" 
        center_name="TODO: CENTER NAME"
        broker_name="TODO: CENTER NAME"
        analysis_center="TODO: CENTER NAME" analysis_date="2011-11-18T10:10:10.0Z">
        <TITLE>TODO: a descriptive title for the analysis shown in search results</TITLE>
        <DESCRIPTION>TODO: a detailed description of the analysis</DESCRIPTION>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 1"/>        
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 2"/>        
        <ANALYSIS_TYPE>
            <SEQUENCE_ANNOTATION/>
        </ANALYSIS_TYPE>
        <FILES>
            <FILE filename="TODO: FILENAME.tab" filetype="tab" checksum_method="MD5"
                checksum="TODO: CHECKSUM" unencrypted_checksum="TODO: CHECKSUM"
                checklist="TODO: CHECKLIST NAME"/>
        </FILES>
        <ANALYSIS_ATTRIBUTES>
            <ANALYSIS_ATTRIBUTE>
                <TAG>TODO: add any tag and value pairs</TAG>
                <VALUE>TODO: add any tag and value pairs</VALUE>
            </ANALYSIS_ATTRIBUTE>
        </ANALYSIS_ATTRIBUTES>
    </ANALYSIS>
</ANALYSIS_SET>

Analysis XML for variation submissions

Analysis XML can be used to submit VCF files to the European Variation Archive (EVA). Only one VCF file can be submitted in each analysis, and the samples used within the VCF file must be associated with submitted samples. In addition, the analysis must be associated with a study. Ideally, the VCF file is associated with an INSDC reference assembly and sequences, either by using accessions (as for the reference sequences in the example below) or by using commonly used labels (as for the reference assembly in the example below). The md5 checksums for the files can be provided within the Analysis XML or in files with the same name as the submitted files postfixed with '.md5'.

An example of an analysis XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
    <ANALYSIS alias="TODO: UNIQUE NAME FOR ANALYSIS" center_name="TODO: CENTER NAME"
        broker_name="TODO: CENTER NAME"
        analysis_center="TODO: CENTER NAME" analysis_date="2011-11-18T10:10:10.0Z">
        <TITLE>TODO: a descriptive title for the analysis shown in search results</TITLE>
        <DESCRIPTION>TODO: a detailed description of the analysis</DESCRIPTION>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 1"
            label="TODO: the sample name in the VCF file"/>
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT 2"
            label="TODO: the sample name in the VCF file"/>
        <ANALYSIS_TYPE>
            <SEQUENCE_VARIATION>
                <ASSEMBLY>
                    <STANDARD refname="TODO: INSDC assembly name (e.g. GRCh37 or GRCh37.p1)"
                        accession="TODO: INSDC assembly accession (e.g. GCA_000001405.1)"/>
                </ASSEMBLY>
                <SEQUENCE accession="TODO: INSDC sequence accession and version"
                    label="TODO: the reference sequence name in the VCF file"/>
                <EXPERIMENT_TYPE>Use one of: "Whole genome sequencing", "Exome sequencing", "Genotyping by array"</EXPERIMENT_TYPE>
            </SEQUENCE_VARIATION>
        </ANALYSIS_TYPE>
        <FILES>
            <FILE filename="TODO: FILENAME.vcf" filetype="vcf" checksum_method="MD5"
                checksum="TODO: CHECKSUM" unencrypted_checksum="TODO: CHECKSUM"/>
        </FILES>
        <ANALYSIS_ATTRIBUTES>
            <ANALYSIS_ATTRIBUTE>
                <TAG>TODO: add any tag and value pairs</TAG>
                <VALUE>TODO: add any tag and value pairs</VALUE>
            </ANALYSIS_ATTRIBUTE>
        </ANALYSIS_ATTRIBUTES>
    </ANALYSIS>
</ANALYSIS_SET>

Analysis XML for Bionano genome maps

Analysis XML can be used to submit Bionano genome maps to the European Nucleotide Archive (ENA). The submission consists of bnx, cmap, xmap, smap and coord files. The md5 checksums for the files can be provided within the Analysis XML or in files with the same name as the submitted files postfixed with '.md5'.

An example of an analysis XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
    <ANALYSIS alias="TODO: UNIQUE NAME FOR ANALYSIS" center_name="TODO: CENTER NAME"
        broker_name="TODO: CENTER NAME"
        analysis_center="TODO: CENTER NAME" analysis_date="2011-11-18T10:10:10.0Z">
        <TITLE>TODO: a descriptive title for the analysis shown in search results</TITLE>
        <DESCRIPTION>TODO: a detailed description of the analysis</DESCRIPTION>
        <STUDY_REF refname="TODO: STUDY ALIAS OF RELEVANT STUDY OBJECT"/>
        <SAMPLE_REF refname="TODO: SAMPLE ALIAS OF RELEVANT SAMPLE OBJECT"/>
        <ANALYSIS_TYPE>
            <GENOME_MAP>
                <PROGRAM>Irys</PROGRAM>
                <PLATFORM>BioNano</PLATFORM>
            </GENOME_MAP>
        </ANALYSIS_TYPE>
        <FILES>
            <FILE filename="TODO: FILENAME.bnx" filetype="BioNano_native"
                checksum_method="MD5" checksum="TODO: CHECKSUM"/>
            <!-- TODO: ADD ADDITIONAL FILES -->
        </FILES>
        <ANALYSIS_ATTRIBUTES>
            <ANALYSIS_ATTRIBUTE>
                <TAG>TODO: add any tag and value pairs</TAG>
                <VALUE>TODO: add any tag and value pairs</VALUE>
            </ANALYSIS_ATTRIBUTE>
        </ANALYSIS_ATTRIBUTES>
    </ANALYSIS>
</ANALYSIS_SET>
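
The "ADD ADDITIONAL FILES" placeholder above might be expanded along the lines of the sketch below, with one FILE entry per file in the submission. This assumes that the cmap, xmap, smap and coord files take the same BioNano_native filetype as the bnx file, which the example above does not confirm; the filenames are made up.

```xml
<!-- Sketch only: filenames are invented, and reusing filetype="BioNano_native"
     for every file is an assumption to confirm with ENA. -->
<FILES>
    <FILE filename="map1.bnx" filetype="BioNano_native"
        checksum_method="MD5" checksum="TODO: CHECKSUM"/>
    <FILE filename="map1.cmap" filetype="BioNano_native"
        checksum_method="MD5" checksum="TODO: CHECKSUM"/>
    <FILE filename="map1.xmap" filetype="BioNano_native"
        checksum_method="MD5" checksum="TODO: CHECKSUM"/>
    <FILE filename="map1.smap" filetype="BioNano_native"
        checksum_method="MD5" checksum="TODO: CHECKSUM"/>
    <FILE filename="map1.coord" filetype="BioNano_native"
        checksum_method="MD5" checksum="TODO: CHECKSUM"/>
</FILES>
```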

Analysis XML for CRAM reference sequence submissions

Analysis XML can be used to submit reference sequences into the CRAM reference registry. Only one Fasta file can be submitted in each analysis. The md5 checksums for the file can be provided within the Analysis XML or in files with the same name as the submitted files postfixed with '.md5'.

An example of an analysis XML is provided below:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
    <ANALYSIS alias="TODO: UNIQUE NAME FOR ANALYSIS" center_name="TODO: CENTER NAME"
        broker_name="TODO: CENTER NAME"
        analysis_center="TODO: CENTER NAME">
        <TITLE>TODO: a descriptive title for the analysis shown in search results</TITLE>
        <DESCRIPTION>TODO: a detailed description of the analysis</DESCRIPTION>
        <ANALYSIS_TYPE>
            <REFERENCE_SEQUENCE/>
        </ANALYSIS_TYPE>
        <FILES>
            <FILE filename="TODO: FILENAME.fasta.gz"
                filetype="fasta"
                checksum_method="MD5" checksum="TODO: CHECKSUM"/>
        </FILES>
    </ANALYSIS>
</ANALYSIS_SET>
