How to export sequences and download data
Exporting sequences and annotation
You can download single or multiple sequences, with or without their annotation, from any of the ENA databases. The options include:
- Downloading a single EMBL-Bank sequence or full entry;
- Downloading multiple EMBL-Bank sequences or full entries;
- Downloading sequences or full entries from the taxonomy portal;
- Downloading SRA sequences and data using SRA-DataDownloader;
- Bulk downloads using ftp.
Exporting single EMBL-Bank sequences and annotation
Once you have found the EMBL-Bank entry you want, you can use the download links at the top right of the entry page (Figure 44) to download either the sequence alone or the full entry with its annotation.
Figure 44. EMBL-Bank entry for BN000065 showing the download links at the top of every ENA entry page.
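The same two downloads can also be scripted. The Python sketch below assumes the current ENA Browser API, where `https://www.ebi.ac.uk/ena/browser/api/fasta/<accession>` returns the sequence only and `https://www.ebi.ac.uk/ena/browser/api/embl/<accession>` returns the full entry with annotation. These endpoints are an assumption and may not match the interface shown in Figure 44; the function names `entry_url` and `download_entry` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the ENA Browser API (may differ from the page in Figure 44).
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"


def entry_url(accession: str, fmt: str = "fasta") -> str:
    """Build a download URL for a single entry (hypothetical helper).

    fmt="fasta" -> sequence only; fmt="embl" -> full entry with annotation.
    """
    if fmt not in ("fasta", "embl"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession)}"


def download_entry(accession: str, fmt: str = "fasta") -> str:
    """Fetch the entry over HTTPS and return it as text (needs network access)."""
    with urlopen(entry_url(accession, fmt)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # BN000065 is the example entry shown in Figure 44.
    print(entry_url("BN000065"))          # sequence only
    print(entry_url("BN000065", "embl"))  # full entry with annotation
```

Only `download_entry` touches the network; `entry_url` just builds the address, so it can be checked offline.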
Exporting multiple EMBL-Bank sequences and annotation
You can also download multiple EMBL-Bank sequences or full entries by:
- Following the links from the search results page;
- Uploading a file of accession numbers.
You can select the range of entries to download, which is particularly useful when a search returns a large number of results (Figure 45).
Figure 45. Results page from a text search on 'human' displaying the download options.
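Downloading from a file of accessions and choosing a range of entries can be sketched in Python as follows. It assumes that the ENA Browser API `fasta` endpoint accepts a comma-separated list of accessions; the file format (one accession per line), the batch size of 100 and all function names are hypothetical choices made for illustration.

```python
from pathlib import Path

# Assumed ENA Browser API endpoint accepting comma-separated accessions.
ENA_FASTA_API = "https://www.ebi.ac.uk/ena/browser/api/fasta"


def read_accessions(path: str) -> list[str]:
    """Read one accession per line, skipping blank lines and '#' comments."""
    accessions = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            accessions.append(line)
    return accessions


def select_range(accessions: list[str], start: int, end: int) -> list[str]:
    """Keep entries start..end (1-based, inclusive), like choosing a result range."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return accessions[start - 1:end]


def batch_urls(accessions: list[str], batch_size: int = 100) -> list[str]:
    """Split accessions into batches (hypothetical size) and build one URL per batch."""
    return [
        f"{ENA_FASTA_API}/{','.join(accessions[i:i + batch_size])}"
        for i in range(0, len(accessions), batch_size)
    ]


if __name__ == "__main__":
    ids = ["A00145", "A00146", "A00147", "A00148"]  # illustrative accession list
    for url in batch_urls(select_range(ids, 2, 4), batch_size=2):
        print(url)
```

Batching keeps each request URL to a manageable length; each URL can then be fetched with `urllib.request.urlopen`.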
Exporting SRA sequences and data
The ENA browser offers a range of options for downloading raw sequence data from the Sequence Read Archive (SRA) (Figure 47). SRA data can be downloaded in normalised fastq format or in the original format supplied by the submitter. The data are organised by study, sample, experiment, run or submission, and each group of sequences can be downloaded separately. Because these files are often very large, the ENA browser supports downloads using the high-speed file transfer software Aspera in addition to ftp. Alternatively, you can upload SRA data into the Galaxy platform.
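To show how the study, sample, experiment and run levels map onto downloadable files, the Python sketch below groups file locations by any one of those levels. It assumes the ENA Portal API `filereport` endpoint and its `fastq_ftp` (normalised fastq) and `submitted_ftp` (original format) fields, with several paths per cell separated by `;` and given without an `ftp://` prefix; treat all of these as assumptions to check against the current API documentation. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint and field names.
FILEREPORT_API = "https://www.ebi.ac.uk/ena/portal/api/filereport"
FIELDS = ["study_accession", "sample_accession", "experiment_accession",
          "run_accession", "fastq_ftp", "submitted_ftp"]


def filereport_url(accession: str) -> str:
    """URL of a tab-separated report listing the read files under an accession."""
    query = urlencode({"accession": accession, "result": "read_run",
                       "fields": ",".join(FIELDS), "format": "tsv"})
    return f"{FILEREPORT_API}?{query}"


def group_files(report_tsv: str, level: str = "sample_accession",
                original: bool = False) -> dict[str, list[str]]:
    """Group file URLs by study, sample, experiment or run accession.

    original=False uses the normalised fastq files; original=True uses the
    files in the format originally submitted.
    """
    column = "submitted_ftp" if original else "fastq_ftp"
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(report_tsv), delimiter="\t"):
        # A cell may hold several ';'-separated paths (e.g. paired reads) or be empty.
        for path in filter(None, (row.get(column) or "").split(";")):
            groups[row[level]].append(f"ftp://{path}")
    return dict(groups)
```

Grouping on the report rather than on the files themselves means the same data can be regrouped by a different level without downloading anything again.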
Bulk downloads using ftp
Large volumes of data can be downloaded directly from the ENA ftp site. For example, the EMBL-Bank data available from the ftp site include:
- the entire EMBL-Bank release;
- new and updated entries made available after the latest release;
- specific data classes, such as Coding Sequence (CDS), Whole Genome Shotgun (WGS), Mass Genome Annotation (MGA) or Construct (CON) entries.
Figure 49. ftp site for the download of data from the EMBL-Bank full release.
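A bulk ftp download can be scripted with Python's standard `ftplib` module. In this sketch the host `ftp.ebi.ac.uk` and the directory `/pub/databases/embl/release` are assumptions about the ftp site layout shown in Figure 49 and should be checked before use; matching file names against a shell-style pattern (for example `*wgs*`) is a hypothetical way of picking out one data class.

```python
import fnmatch
from ftplib import FTP
from pathlib import Path

# Hypothetical defaults: check the current ENA ftp layout before use.
FTP_HOST = "ftp.ebi.ac.uk"
RELEASE_DIR = "/pub/databases/embl/release"


def select_files(names: list[str], pattern: str) -> list[str]:
    """Keep file names matching a shell-style pattern, sorted for stable order."""
    return sorted(n for n in names if fnmatch.fnmatch(n, pattern))


def bulk_download(pattern: str, dest: str = ".", host: str = FTP_HOST,
                  directory: str = RELEASE_DIR) -> list[Path]:
    """Download every file in `directory` whose name matches `pattern`."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(directory)
        for name in select_files(ftp.nlst(), pattern):
            target = out_dir / name
            with open(target, "wb") as fh:
                # Stream the file in binary mode straight to disk.
                ftp.retrbinary(f"RETR {name}", fh.write)
            saved.append(target)
    return saved
```

For example, `bulk_download("*.dat.gz", dest="embl_release")` would fetch every file matching `*.dat.gz` in the chosen directory. Release files are large, so check free disk space first.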