Downloading read and analysis data

Sequencing read and analysis data are available for download over FTP and Aspera in their original submitted format. Read data are also available in the archive-generated fastq formats described below.

Submitted data files

Submitted read data files

Submitted read data files are organised by submission accession number under vol1/ directory in ftp.sra.ebi.ac.uk:

ftp://ftp.sra.ebi.ac.uk/vol1/<submission accession prefix>/<submission accession>

where <submission accession prefix> is the first six characters (letters and digits) of the SRA Submission accession.

For example, the files submitted in the SRA Submission ERA007448 are available at: ftp://ftp.sra.ebi.ac.uk/vol1/ERA007/ERA007448/.

Submitted analysis data files

Submitted analysis data are organised by analysis accession number under vol1/ directory in ftp.sra.ebi.ac.uk:

ftp://ftp.sra.ebi.ac.uk/vol1/<analysis accession prefix>/<analysis accession>

where <analysis accession prefix> is the first six characters (letters and digits) of the SRA Analysis accession.

For example, the files submitted in the SRA Analysis ERZ454001 are available at: ftp://ftp.sra.ebi.ac.uk/vol1/ERZ454/ERZ454001/.
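Submission and analysis directories follow the same rule, so the location can be derived from the accession alone. The following is a minimal sketch; the function name submitted_url is hypothetical and it assumes the accession is passed exactly as it appears in the archive.

```shell
# Hypothetical helper: print the vol1/ directory URL for a submission
# or analysis accession (e.g. ERA007448 or ERZ454001).
submitted_url() {
    acc=$1
    # The prefix is the first six characters of the accession.
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/%s/%s/\n' "$prefix" "$acc"
}

submitted_url ERA007448
submitted_url ERZ454001
```

The two calls print the example directories given above for ERA007448 and ERZ454001.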

Md5 manifest files for submitted data

ENA produces md5 manifest files so that users can confirm the integrity of data submitted to and downloaded from the archive. Manifest files are produced for submitted files associated with runs and analyses.

The manifest can be used to verify file integrity with the command:

md5sum -c <manifest file>

Please note that to validate the content of a run after downloading the data files the subfolder structure (organised by file format) should be preserved. The same is true for any analyses which contain subfolders.

For example, submitted file integrity for run ERR1438847 can be confirmed by doing the following actions.

1. Download manifest file:

ftp://ftp.sra.ebi.ac.uk/vol1/ERA645/ERA645809/ERR1438847.md5

The content of the manifest file is:

fafe406ac8d98725474048db7f617668 fastq/PHESPV0057.R2.fastq.gz
7d2062f9040e0282287162938f4d9276 fastq/PHESPV0057.R1.fastq.gz

2. Download the data files:

ftp://ftp.sra.ebi.ac.uk/vol1/ERA645/ERA645809/fastq/PHESPV0057.R1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/ERA645/ERA645809/fastq/PHESPV0057.R2.fastq.gz

Please note that the manifest refers to file format specific subfolders containing the data files. To run the md5sum command these files should be downloaded into a fastq subfolder next to the manifest file.

3. Execute the md5sum command:

md5sum -c ERR1438847.md5
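The steps above could be scripted roughly as follows. This is only a sketch: the helper name manifest_urls is hypothetical, the manifest is assumed to hold one "checksum path" pair per line, and wget and md5sum are assumed to be installed. The download commands are left as comments because they need network access.

```shell
# Hypothetical helper: print one download URL per manifest entry.
# $1 = URL of the submission directory, $2 = path to the .md5 manifest file.
manifest_urls() {
    base=${1%/}                          # tolerate a trailing slash
    while read -r checksum path || [ -n "$checksum" ]; do
        path=${path#\*}                  # drop md5sum's binary-mode marker, if any
        if [ -n "$path" ]; then
            printf '%s/%s\n' "$base" "$path"
        fi
    done < "$2"
}

# Possible usage for run ERR1438847 (requires network access):
#   base=ftp://ftp.sra.ebi.ac.uk/vol1/ERA645/ERA645809
#   wget "$base/ERR1438847.md5"
#   manifest_urls "$base" ERR1438847.md5 | wget -x -nH --cut-dirs=3 -i -
#   md5sum -c ERR1438847.md5
```

In the commented usage, wget -x -nH --cut-dirs=3 recreates only the fastq/ part of the remote path, which keeps the subfolder structure that md5sum -c expects.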

Archive generated fastq files

Archive generated fastq files are not available for the following data formats submitted to ENA:

  • BAM/CRAM files containing @PG:longranger
  • BAM/CRAM files containing @PG:cellranger
  • BAM/CRAM files containing CB:Z,CR:Z,CY:Z,RX:Z,QX:Z tags
  • Complete genomics native (data folder) submissions
  • PacBio native (HDF5) submissions
  • Many ONT native format submissions

Generated fastq files

Number of application reads: 1
Fastq files: <run_accession>.fastq.gz or <run_accession>_1.fastq.gz
Description: For experiments with a single application read, all reads are made available in one fastq file.

Number of application reads: 2
Fastq files: <run_accession>_1.fastq.gz, <run_accession>_2.fastq.gz, <run_accession>.fastq.gz
Description: For paired experiments with two application reads, reads are made available in 1-3 fastq files. If a paired experiment is submitted with both application reads then the first reads will be in the <run_accession>_1.fastq.gz file, the second reads in <run_accession>_2.fastq.gz, and any unpaired reads in <run_accession>.fastq.gz. If a paired experiment is submitted containing only unpaired reads then only a single file is created: <run_accession>.fastq.gz.

Number of application reads: > 2
Fastq files: <run_accession>_N.fastq.gz
Description: For experiments with more than two application reads (e.g. Complete Genomics) one fastq file is created for each application read. No empty fastq files are created.

Number of application reads: N/A
Fastq files: <run_accession>_consensus.fastq.gz
Description: PacBio consensus reads.

Number of application reads: N/A
Fastq files: <run_accession>_subreads.fastq.gz
Description: PacBio subreads.

Fastq file format

@<run accession>.<spot index> [<spot name>][/<read index>]
<bases>
+
<phred qualities, ASCII encoded starting with '!' (33)>

Field descriptions:

  • <run accession>: The Run accession.
  • <spot index>: A positive integer assigned to the spots in the order in which they appear in the run. A spot is identified uniquely by the combination of the Run accession and the spot index.
  • <spot name>: The spot name as it was provided by the submitter. In cases where the read name is missing or was removed by the archive this field is not present.
  • <read index>: A positive integer assigned to the application reads in the order in which they appear in the spot: /1 for the first application read and /2 for the second application read. In cases where the read name is missing or was removed by the archive this field is not present.

Examples

Single layout:

@ERR000017.1 IL6_554:7:1:249:322
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
??????????????????????????????>>>>>>

 

Paired (first read):

@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

 

Paired (second read):

@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/2
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

 

Single layout without read names:

@ERR000017.1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
??????????????????????????????>>>>>>

 

Paired without read names (first read):

@ERR005143.1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

 

Paired without read names (second read):

@ERR005143.1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
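
The header layout shown above can be split into its fields with plain shell parameter expansion. This is a rough sketch; the function name parse_fastq_header is hypothetical, and it assumes that a trailing /<digits> on the spot name is always the read index, which may not hold for unusual submitter-provided names.

```shell
# Hypothetical helper: split an archive-generated fastq header line
# into run accession, spot index, spot name and read index.
parse_fastq_header() {
    header=${1#@}                # drop the leading '@'
    id=${header%% *}             # "<run accession>.<spot index>"
    run=${id%%.*}
    spot=${id#*.}
    name=""
    read_index=""
    case $header in
        *" "*)
            name=${header#* }    # everything after the first space
            case $name in
                */*)
                    suffix=${name##*/}
                    case $suffix in
                        ''|*[!0-9]*) ;;   # not numeric: part of the name
                        *) read_index=$suffix; name=${name%/*} ;;
                    esac
                    ;;
            esac
            ;;
    esac
    printf 'run=%s spot=%s name=%s read=%s\n' "$run" "$spot" "$name" "$read_index"
}

parse_fastq_header '@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/1'
parse_fastq_header '@ERR000017.1'
```

For headers without read names the name and read fields are simply printed empty.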

Fastq file directory organisation

Archive generated fastq files are organised by run accession number under vol1/fastq directory in ftp.sra.ebi.ac.uk:

ftp://ftp.sra.ebi.ac.uk/vol1/fastq/<dir1>[/<dir2>]/<run accession>

<dir1> is the first six characters of the run accession (e.g. ERR000 for ERR000916).

<dir2> depends on the number of digits in the run accession:

  • Six digits: <dir2> does not exist. For example, fastq files for run ERR000916 are in directory: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/.
  • Seven digits: <dir2> is 00 followed by the last digit of the run accession. For example, fastq files for run SRR1016916 are in directory: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR101/006/SRR1016916/.
  • Eight digits: <dir2> is 0 followed by the last two digits of the run accession.
  • Nine digits: <dir2> is the last three digits of the run accession.
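
The directory rules above can be expressed as a small shell function. This is a minimal sketch rather than an official tool; the name fastq_dir is hypothetical, and it assumes run accessions consist of a three-letter prefix (such as ERR or SRR) followed by six to nine digits.

```shell
# Hypothetical helper: print the fastq directory URL for a run accession.
fastq_dir() {
    acc=$1
    dir1=$(printf '%s' "$acc" | cut -c1-6)   # first six characters
    digits=${acc#???}                        # assumes a three-letter prefix
    case ${#digits} in
        6) dir2="" ;;
        7) dir2="00$(printf '%s' "$digits" | cut -c7)" ;;
        8) dir2="0$(printf '%s' "$digits" | cut -c7-8)" ;;
        9) dir2=$(printf '%s' "$digits" | cut -c7-9) ;;
        *) echo "unsupported run accession: $acc" >&2; return 1 ;;
    esac
    if [ -n "$dir2" ]; then
        printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s/%s/\n' "$dir1" "$dir2" "$acc"
    else
        printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s/\n' "$dir1" "$acc"
    fi
}

fastq_dir ERR000916
fastq_dir SRR1016916
```

The two calls reproduce the ERR000916 and SRR1016916 example directories given above.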

Downloading files using FTP

Files can be downloaded through ftp.sra.ebi.ac.uk using any FTP client.

Example using wget:

wget  ftp://ftp.sra.ebi.ac.uk/vol1/ERA012/ERA012008/sff/library08_GJ6U61T06.sff

 

Example using ftp:

ftp ftp.sra.ebi.ac.uk
Name: anonymous
Password: enter your e-mail address
ftp> cd vol1/ERA012/ERA012008/sff
ftp> get library08_GJ6U61T06.sff

Downloading files using Globus GridFTP

Files can be downloaded through Globus ebi#public endpoint from 'ena' subfolder:

Globus ebi#public ENA endpoint

Downloading files using ENA FTP Downloader

The ENA FTP Downloader is a standalone application that you can download here.

You can download files for a given accession, or upload an Advanced Search or portal API report to perform a bulk download of all files for a given set of criteria.

 

Downloading files using Aspera

The Aspera ascp command line client can be downloaded here. Please select the correct operating system. The ascp command line client is distributed as part of the Aspera Connect high-performance transfer browser plug-in.

Your command should look similar to this on Unix:

ascp -QT -l 300m -P33001 -i <aspera connect installation directory>/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:<file or files to download> <download location>

 

and on Mac OSX:

ascp -QT -l 300m -P33001 -i <aspera connect installation directory>/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:<file or files to download> <download location>

 

On Windows please use quotes to avoid errors caused by spaces in file path:

"%userprofile%\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp" -QT -l 300m -P33001 -i "%userprofile%\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh" era-fasp@fasp.sra.ebi.ac.uk:<file or files to download> <download location>

Note: The asperaweb_id_dsa.openssh key file was introduced in Aspera Connect plugin 3.3.3. Earlier versions can still use asperaweb_id_dsa.putty.

Unix Examples:

ascp -QT -l 300m -P33001 -i /etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/ERA012/ERA012008/sff/library08_GJ6U61T06.sff .

ascp -QT -l 300m -P33001 -i /etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR036/ERR036000/ERR036000_1.fastq.gz .

ascp -QT -l 300m -P33001 -i /etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/ERA012/ERA012008/sff/ .
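
As the examples suggest, the Aspera source path mirrors the FTP path under vol1/. A minimal sketch of that conversion follows; the function name ftp_to_aspera is hypothetical, and it assumes the same directory layout is exposed over both protocols, as in the examples in this document.

```shell
# Hypothetical helper: turn an ftp.sra.ebi.ac.uk URL into the
# user@host:path source argument used in the ascp examples.
ftp_to_aspera() {
    case $1 in
        ftp://ftp.sra.ebi.ac.uk/*)
            printf 'era-fasp@fasp.sra.ebi.ac.uk:/%s\n' "${1#ftp://ftp.sra.ebi.ac.uk/}"
            ;;
        *)
            echo "not an ftp.sra.ebi.ac.uk URL: $1" >&2
            return 1
            ;;
    esac
}

ftp_to_aspera ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR036/ERR036000/ERR036000_1.fastq.gz
```

The printed value can be used as the source argument of an ascp command such as the Unix examples above.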