Downloading assembled and annotated sequences

Assembled and annotated sequences can be downloaded through the ENA Browser or using FTP.

This document provides instructions for FTP downloads only. The data classes referred to in this document are described here.

Downloading assembled and annotated sequences using FTP

Assembled and annotated sequences are available for download in flat file format through FTP at: ftp://ftp.ebi.ac.uk/pub/databases/embl/. The directory structure and the file name conventions are described below.
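As a minimal sketch, a directory under this location could be listed with Python's standard ftplib. This assumes the server accepts anonymous logins, which this document does not state. The function names are illustrative only.

```python
from ftplib import FTP
from urllib.parse import urlparse

BASE_URL = "ftp://ftp.ebi.ac.uk/pub/databases/embl/"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, path)."""
    parts = urlparse(url)
    return parts.hostname, parts.path


def list_directory(url, subdir=""):
    """Return the entry names in url + subdir.

    Assumption: anonymous login is permitted by the server.
    """
    host, path = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(path + subdir)
        return ftp.nlst()


# Example call (requires network access):
# names = list_directory(BASE_URL, "release")
```

Nothing is contacted until list_directory is called, so the URL handling can be checked offline.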

Directory                                         Definition
release

A full release of entries is made every March, June, September and December. Genomic and transcriptomic contigs are available in their own subdirectories (see below).

The data files in this directory use the following naming convention:

rel_<data class>_<taxonomic division>_<number>_r<release number>.dat.gz
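The convention above can be split into its fields with a regular expression. The sketch below assumes that the data class and taxonomic division are short lowercase codes without underscores; the sample file name used with it is hypothetical.

```python
import re

# Assumption: <data class> and <taxonomic division> are lowercase letter codes.
RELEASE_NAME = re.compile(
    r"^rel_(?P<data_class>[a-z]+)_(?P<division>[a-z]+)"
    r"_(?P<number>\d+)_r(?P<release>\d+)\.dat\.gz$"
)


def parse_release_name(filename):
    """Return the fields of a release file name, or None if it does not match."""
    match = RELEASE_NAME.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    fields["number"] = int(fields["number"])
    fields["release"] = int(fields["release"])
    return fields


# Hypothetical file name, for illustration only.
info = parse_release_name("rel_std_hum_01_r143.dat.gz")
```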

release/tsa

Transcriptomic contigs included in the release are in the release/tsa directory. 

The data files in this directory use the following naming convention:

tsa_<tsa prefix>_<taxonomic division>[_<number>].dat.gz

The optional <number> is used to divide large sets into several smaller files.
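Because <number> is optional, a parser has to accept names with and without it. The following sketch treats each field as a run of characters without underscores, which is an assumption; the prefix and division values shown are hypothetical.

```python
import re

# The trailing "_<number>" group is optional, mirroring the [_<number>] notation.
TSA_NAME = re.compile(
    r"^tsa_(?P<prefix>[^_]+)_(?P<division>[^_]+)(?:_(?P<number>\d+))?\.dat\.gz$"
)


def parse_tsa_name(filename):
    """Return prefix, division and number (None when absent), or None on mismatch."""
    match = TSA_NAME.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    if fields["number"] is not None:
        fields["number"] = int(fields["number"])
    return fields


# Hypothetical file names, for illustration only.
single = parse_tsa_name("tsa_GAAA_inv.dat.gz")
split = parse_tsa_name("tsa_GAAA_inv_2.dat.gz")
```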

release/wgs

Genomic contigs included in the release are organised into subdirectories using the first two letters of the accession prefix under the release/wgs directory. 

The data files in this directory use the following naming convention:

wgs_<accession prefix>_<taxonomic division>[_<number>].dat.gz

The optional <number> is used to divide large sets into several smaller files.
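The subdirectory rule and the naming convention can be combined to build a relative path for a WGS set. This is a sketch only: it keeps the prefix in whatever case it is given and writes the number without padding, neither of which is specified here, and the example values are hypothetical.

```python
import posixpath


def wgs_release_path(accession_prefix, division, number=None):
    """Build the path of a WGS file relative to the FTP base directory."""
    # Subdirectory is the first two letters of the accession prefix.
    subdir = accession_prefix[:2]
    name = f"wgs_{accession_prefix}_{division}"
    if number is not None:
        # Assumption: the optional number is appended as-is, without zero padding.
        name += f"_{number}"
    return posixpath.join("release", "wgs", subdir, name + ".dat.gz")


# Hypothetical accession prefix and division, for illustration only.
path = wgs_release_path("abcd", "pro")
```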

new

This directory contains entries created or updated after the latest release. Please note the symbolic links to the wgs directory.

The main data files in this directory use the following naming convention:

cum_<data class>_<taxonomic division>_<number>_r<release number>.dat.gz

wgs

This directory contains genomic contigs created or updated after the latest release. 

The data files in this directory use the following naming convention: 

wgs_<wgs prefix>_<taxonomic division>[_<number>].dat.gz

The optional <number> is used to divide large WGS sets into several smaller files.

wgs/masters

This directory contains all whole genomic or transcriptomic assembly master entries in a single file:

wgs_masters.dat.gz

con

This directory contains scaffolds (built from genomic or transcriptomic contigs) created or updated after the latest release.

The data files in this directory use the following naming convention:

cum_con_<taxonomic division>_<number>_r<release number>.dat.gz

expanded_con/release

This directory contains all scaffolds (built from genomic or transcriptomic contigs, with sequences and annotation extracted from the contigs) included in the latest release.

The data files in this directory use the following naming convention:

rel_con_<taxonomic division>_<number>_r<release number>.dat.gz

expanded_con/new

This directory contains all scaffolds (built from genomic or transcriptomic contigs, with sequences and annotation extracted from the contigs) created or updated after the latest release.

The data files in this directory use the following naming convention:

cum_exp_con_<taxonomic division>_<number>_r<release number>.dat.gz

Downloading coding, non-coding and rRNA sequences using FTP

Coding, non-coding and rRNA sequences are available for download in flat file and fasta formats through FTP at: ftp://ftp.ebi.ac.uk/pub/databases/ena/. The directory structure and the file name conventions are described below.

Directory                                         Definition
coding/release

This directory contains all protein coding features that are part of the latest release. Entries are available in both flat file and fasta formats.

Flat files use the following naming convention:

rel_<dataclass>_<taxonomic division>_<number>_r<release number>.cds.gz

Fasta files use the following naming convention:

rel_<dataclass>_<taxonomic division>_<number>_r<release number>.cds.fasta.gz

coding/update

This directory contains all protein coding features created or updated after the latest release. The same file naming conventions apply as above.

non-coding/release

This directory contains all non-coding features that are part of the latest release. Entries are available in both flat file and fasta formats.

Flat files use the following naming convention:

rel_<dataclass>_<taxonomic division>_<number>_r<release number>.ncr.gz

Fasta files use the following naming convention:

rel_<dataclass>_<taxonomic division>_<number>_r<release number>.ncr.fasta.gz 

non-coding/update 

This directory contains all non-coding features created or updated after the latest release. The same file naming conventions apply as above.

rRNA/release

This directory contains all rRNA features that are part of the latest release. Entries are available in both flat file and fasta formats.

Flat files use the following naming convention:

rel_<dataclass>_<taxonomic division>_<number>_r<release number>.rRNA.gz

Fasta files use the following naming convention:

rel_<dataclass>_<taxonomic division>_<number>_r<release number>.rRNA.fasta.gz 

rRNA/update 

This directory contains all rRNA features created or updated after the latest release. The same file naming conventions apply as above.

spacer/release

This directory contains all spacer (ITS, IGS, ETS) features that are part of the latest release. Entries are available in both flat file and fasta formats.

Flat files use the following naming convention:

rel_<dataclass>_<taxonomic division>_<number>_r<release number>.spacer.gz

Fasta files use the following naming convention:

rel_<dataclass>_<taxonomic division>_<number>_r<release number>.spacer.fasta.gz 

spacer/update 

This directory contains all spacer features created or updated after the latest release. The same file naming conventions apply as above.
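The four feature directories share one naming pattern and differ only in the file suffix. The sketch below collects the suffixes listed above in a lookup table and builds flat file or fasta names for the release directories. The function name and the example field values are hypothetical, and the number is written without padding, which is an assumption.

```python
# Suffix used by each feature directory, taken from the conventions above.
FEATURE_SUFFIX = {
    "coding": "cds",
    "non-coding": "ncr",
    "rRNA": "rRNA",
    "spacer": "spacer",
}


def feature_file_name(directory, data_class, division, number, release, fasta=False):
    """Build a release file name for one of the feature directories."""
    suffix = FEATURE_SUFFIX[directory]  # raises KeyError for unknown directories
    extension = f"{suffix}.fasta.gz" if fasta else f"{suffix}.gz"
    return f"rel_{data_class}_{division}_{number}_r{release}.{extension}"


# Hypothetical field values, for illustration only.
flat = feature_file_name("coding", "std", "hum", 1, 143)
fasta = feature_file_name("coding", "std", "hum", 1, 143, fasta=True)
```

Keeping the suffixes in a single table means a new feature directory needs only one extra entry.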

FTP mirror sites

Country URL
UK ftp://ftp.ebi.ac.uk/pub/databases/embl/release
France ftp://ftp-bips.u-strasbg.fr/pub/ebi/pub/databases/embl/release
Finland ftp://ftp.funet.fi/pub/sci/molbio/embl_release
USA ftp://bio-mirror.net/biomirror/embl/release
Japan ftp://bio-mirror.jp.apan.net/pub/biomirror/embl/release
China ftp://ftp.cbi.pku.edu.cn./pub/databases/embl/release
Australia ftp://biomirror.aarnet.edu.au/biomirror/embl/release