Submitting sequence reads

Introduction

The European Nucleotide Archive (ENA) accepts sequence reads and associated analyses into the Sequence Read Archive (SRA). Once public, data submitted to ENA are exchanged with the other International Nucleotide Sequence Database Collaboration (INSDC) partners: NCBI and DDBJ.

Public access data

ENA only accepts data submissions that are intended for public release. Controlled access data should be submitted to the European Genome-phenome Archive (EGA).

During the submission process, submitters are asked to state whether the submitted studies will become public immediately or should remain confidential for a period of no more than two years. Once data have been publicly released, they can be withdrawn from public access only in exceptional circumstances.
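The two-year limit on confidentiality can be checked before a release date is proposed. The following is a minimal sketch, not part of any ENA tool. It assumes that the two-year period is counted from the submission date, which this page does not state explicitly, and the function names are hypothetical.

```python
from datetime import date


def latest_release_date(submitted: date) -> date:
    """Return the last acceptable release date, two years after submission.

    Assumption: the two-year period starts on the submission date.
    """
    try:
        return submitted.replace(year=submitted.year + 2)
    except ValueError:
        # 29 February has no counterpart two years later; fall back to 28 February.
        return submitted.replace(year=submitted.year + 2, day=28)


def is_acceptable_release_date(submitted: date, release: date) -> bool:
    """True if the release date is on or after submission and within two years."""
    return submitted <= release <= latest_release_date(submitted)
```

A release date of exactly two years after submission is accepted by this sketch; anything later is rejected.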

Please contact datasubs@ebi.ac.uk to request an ENA submission account.

Controlled access data

Controlled access data should be submitted to the European Genome-phenome Archive (EGA). The decision on who will be granted access to the data rests with the Data Access Committee nominated by the submitter.

Please contact ega-helpdesk@ebi.ac.uk to request an EGA submission account.

Metadata model

The metadata model contains the following objects:

  • Study: information about the sequencing study
  • Sample: information about the sequenced samples
  • Experiment: information about the libraries and platform; associated with study, sample(s) and run(s)
  • Run: contains the raw data files
  • Analysis: contains the analysis data files; associated with study, sample and run objects
  • Submission: information about the submission actions, including the release date
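The relationships between these objects can be illustrated with a small data structure sketch. It only mirrors the associations listed above. The field names (alias, title, platform and so on) are illustrative assumptions and do not reproduce the actual SRA XML schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Study:
    alias: str   # hypothetical identifier chosen by the submitter
    title: str   # information about the sequencing study


@dataclass
class Sample:
    alias: str
    description: str  # information about the sequenced sample


@dataclass
class Run:
    alias: str
    data_files: List[str] = field(default_factory=list)  # raw data files


@dataclass
class Experiment:
    alias: str
    platform: str      # instrument platform
    library_name: str  # library information
    study: Study       # an experiment is associated with a study,
    samples: List[Sample] = field(default_factory=list)  # with sample(s),
    runs: List[Run] = field(default_factory=list)        # and with run(s)


@dataclass
class Analysis:
    alias: str
    study: Study
    samples: List[Sample] = field(default_factory=list)
    runs: List[Run] = field(default_factory=list)
    data_files: List[str] = field(default_factory=list)  # analysis data files


@dataclass
class Submission:
    alias: str
    actions: List[str] = field(default_factory=list)  # submission actions
    release_date: Optional[date] = None               # None means no date set
```

For example, an Experiment built with one Study, one Sample and one Run gives access to the raw file names through experiment.runs[0].data_files.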

Fair usage of data drop boxes

ENA provides a permanent and comprehensive data repository for public domain sequence and associated information. Data being submitted into this system are routed through private data drop boxes, into which users upload their data files. Upon instruction from the user, through Webin or REST, files are validated, moved out of the data drop boxes and loaded into the archive. The data drop boxes are provided as a temporary area in which data in transit are held; they are neither intended nor suitable for longer-term storage. Long-term storage is provided in the archive itself; data can be released immediately after loading or can be held confidential prior to analysis and literature publication if required.

Under normal circumstances, we expect any given data file to remain in a drop box for no longer than 2 months from upload before the instruction is given to submit the file. To use disk space efficiently, we limit usage of drop boxes as described above and reserve the right to routinely remove any data files that persist in them for more than 2 months.
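Submitters who track their own uploads can flag files that are approaching or past this limit. The sketch below is an illustration only: it approximates "2 months" as 60 days, which is an assumption, and the function name and the dictionary of upload times are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption: "2 months" is approximated as 60 days.
MAX_AGE = timedelta(days=60)


def files_past_limit(upload_times: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the names of files uploaded more than MAX_AGE before `now`."""
    return sorted(
        name
        for name, uploaded in upload_times.items()
        if now - uploaded > MAX_AGE
    )
```

Files returned by this function should either be submitted or be expected to be removed from the drop box.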

A weekly report of submitted read files is available in the data drop box: /report/submitted_run_files.txt.gz. More information about the weekly reports is available here.
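Once the report has been downloaded from the drop box, it can be read without unpacking it first. This is a minimal sketch that assumes the report is gzip-compressed UTF-8 text, as the file extension suggests. The column layout is not described on this page, so the lines are returned unparsed; the function name is hypothetical.

```python
import gzip
from typing import Iterator


def read_report_lines(path: str) -> Iterator[str]:
    """Yield the lines of a gzip-compressed text report, without line endings."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

For example, list(read_report_lines("submitted_run_files.txt.gz")) would load a locally saved copy of the report into memory.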

Please contact datasubs@ebi.ac.uk with any queries.

Submitting public access data using Webin

Webin is the recommended submission interface for most submitters:

>Login

Webin provides an intuitive way to submit the required metadata and the data files. Large sequencing centers should consider using the REST service, which can be integrated programmatically with LIMS systems.

A Webin video tutorial is available here:

Webin submission process

Submission process

Please contact datasubs@ebi.ac.uk to request a submission account. This grants you access to Webin:

>Login

Please note that data files must be uploaded to your drop box using FTP or Aspera. Information about acceptable data formats is available here.
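An FTP upload can be scripted with the Python standard library. The sketch below is one possible approach, not an official client. The function name is hypothetical, and the host name and credentials in the commented usage are placeholders, because the drop box address is not given on this page.

```python
import os
from ftplib import FTP
from typing import Iterable, List


def upload_files(ftp: FTP, paths: Iterable[str]) -> List[str]:
    """Upload each local file in binary mode and return the remote file names.

    `ftp` must already be connected and logged in to the drop box.
    """
    uploaded = []
    for path in paths:
        name = os.path.basename(path)  # store the file under its base name
        with open(path, "rb") as handle:
            ftp.storbinary(f"STOR {name}", handle)
        uploaded.append(name)
    return uploaded


# Hypothetical usage; the host and credentials are placeholders:
# with FTP("ftp.example.org") as ftp:
#     ftp.login("my-account", "my-password")
#     upload_files(ftp, ["reads_1.fastq.gz", "reads_2.fastq.gz"])
```

Passing the connection in as an argument keeps the upload logic separate from the login details.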

In Webin:

  • Go to the New Submission page
  • Choose sequence read submission and provide release date
  • Provide study information
  • Provide sample information
  • Provide instrument platform, library and data file information

Please do not use Webin to submit quantitative metagenomic studies. The system does not currently capture sufficient information to comply with the Genomic Standards Consortium (GSC) standards for reporting genomic and metagenomic sequences. Instead, the EBI Metagenomics team will broker your submission for you: http://www.ebi.ac.uk/metagenomics/.

Please do not use Webin to submit quantitative expression-based studies such as RNA-Seq and ChIP-Seq yet. The system does not currently capture sufficient information for MIAME compliance. ArrayExpress will broker your submission for you. Please use their MAGE-TAB submission system: http://www.ebi.ac.uk/cgi-bin/microarray/magetab.cgi.

Read data submission services are also available from third parties, including the myRDP SRA Prepkit (https://pyro.cme.msu.edu/sra/login.spr) and the ISA Infrastructure (http://isatab.sourceforge.net/).

For all questions and enquiries please contact datasubs@ebi.ac.uk.

Submitting public and controlled access data programmatically

We recommend that large-scale submitters integrate their LIMS systems directly with our submission service. We provide a RESTful submission tool which can be used to submit study, sample, experiment, run, analysis, EGA DAC, EGA policy and EGA dataset XML objects, together with data files. Advice on preparing the XML metadata files required by the service is available here.

The REST submission tool provides immediate validation and object accessioning and can be used for repeated regular submissions. All submitters with submission accounts can take advantage of this service. We also provide a simple web form which can be used to explore and use the REST submission interface.

Please note that data files must be uploaded to your drop box using FTP or Aspera. Information about acceptable data formats is available here.

It is also possible to update any object using the programmatic REST service. The only exceptions are some limitations on updating details related to data files. Advice on how to update objects is available here.

For all questions and enquiries please contact datasubs@ebi.ac.uk.

Submission validation

All submitted objects are validated prior to accessioning. Detailed information about the validations is available here.