Getting data from EBI Metagenomics portal


You can retrieve raw sequence read files and analysis results from the Download page, which is accessible from the Project, Sample and Run Overview pages (Figure 4a). The available data are presented in a table, organised into categories such as ‘Sequence data’, ‘Functional analysis’ and ‘Taxonomic analysis’. Each data type is represented by its own icon, and its file format is also indicated.

If you click the ‘Submitted nucleotide reads’ link, you will be redirected to the ENA website, where the data can be downloaded via FTP or Galaxy.

You can download processed nucleotide reads, rRNA sequences, and annotated and unannotated predicted protein-coding sequences (in FASTA format) directly from the Download page.
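Once downloaded, these FASTA files can be read with a few lines of code. The following is a minimal sketch, not a portal-provided tool: the function name read_fasta is hypothetical, and it assumes a standard FASTA layout (header lines starting with ‘>’, followed by one or more sequence lines) and treats a file as gzip-compressed only if its name ends in ‘.gz’.

```python
import gzip
from typing import Iterator, List, Optional, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file.

    Hypothetical helper: files whose name ends in '.gz' are opened with
    gzip, all others as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    header: Optional[str] = None
    chunks: List[str] = []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                # Sequence lines are joined into a single string.
                chunks.append(line)
        if header is not None:
            yield header, "".join(chunks)
```

For example, `sum(1 for _ in read_fasta("reads.fasta.gz"))` would count the sequences in a (hypothetically named) downloaded file without loading it all into memory.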

InterPro matches, GO term annotations and taxonomic assignments (in tab- or comma-separated format) are also available for download. The taxonomic assignments are also provided in the machine-readable BIOM format, which can be imported directly into external packages such as MEGAN. A phylogenetic tree file, compatible with freely available tree-viewing software such as FigTree or TreeView, is also available.
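To give a feel for what the BIOM file contains, the sketch below sums the counts for each row (taxon) of a table. It assumes the downloaded file uses the JSON-based BIOM 1.0 layout (a ‘rows’ list of objects with an ‘id’, a ‘matrix_type’ of ‘sparse’ or ‘dense’, and a ‘data’ field); HDF5-based BIOM 2.x files would instead need a library such as the biom-format package. The function name taxon_totals is hypothetical.

```python
import json
from typing import Dict


def taxon_totals(biom_text: str) -> Dict[str, float]:
    """Sum the counts in each row (taxon) of a BIOM 1.0 (JSON) table.

    Hypothetical helper; assumes the JSON BIOM 1.0 layout.
    """
    table = json.loads(biom_text)
    row_ids = [row["id"] for row in table["rows"]]
    totals = {row_id: 0.0 for row_id in row_ids}
    if table["matrix_type"] == "sparse":
        # Sparse data: a list of [row_index, column_index, value] triples.
        for row_index, _col_index, value in table["data"]:
            totals[row_ids[row_index]] += value
    else:
        # Dense data: one list of per-column values for each row.
        for row_index, values in enumerate(table["data"]):
            totals[row_ids[row_index]] += sum(values)
    return totals
```

Tools such as MEGAN read this format directly, so a helper like this is only useful for quick checks or custom scripts.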

To avoid timeout issues during download, large files are compressed with gzip and split into chunks of ~500 MB. To rebuild a complete file, concatenate its chunks in their original order before decompressing.
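The rebuild step is equivalent to running `cat` on the chunks in order and then `gunzip` on the result. A minimal Python sketch of the same two steps follows; the function name rebuild_and_decompress and any chunk file names are hypothetical, and the caller is assumed to pass the chunks already in the correct order (note that a plain alphabetical sort would place a chunk ending in ‘_10’ before one ending in ‘_2’).

```python
import gzip
import shutil
from typing import List


def rebuild_and_decompress(chunk_paths: List[str], output_path: str) -> None:
    """Concatenate gzip chunks in order, then decompress the result.

    Hypothetical helper: chunk_paths must already be in split order.
    """
    combined_path = output_path + ".gz"
    # Step 1: byte-for-byte concatenation of the chunks into one .gz file.
    with open(combined_path, "wb") as combined:
        for chunk_path in chunk_paths:
            with open(chunk_path, "rb") as chunk:
                shutil.copyfileobj(chunk, combined)
    # Step 2: decompress the rebuilt archive into output_path.
    with gzip.open(combined_path, "rb") as compressed, open(output_path, "wb") as out:
        shutil.copyfileobj(compressed, out)
```

Streaming with shutil.copyfileobj keeps memory use low, which matters for files that were large enough to be split in the first place.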

Figure 4a  EBI Metagenomics Download page.

As described in the section ‘Searching and visualising data in EBI Metagenomics portal’, functional and taxonomic results files summarising abundance counts for all runs within a project are provided as tab-separated files on the Project Analysis summary pages (Figure 4b).
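Because these summary files are plain tab-separated tables, they are easy to process with a script. The sketch below totals the counts for each run. Its column layout is an assumption, not a documented format: it expects a header row, one or more leading label columns (for example a GO term or taxon, with the number set by the hypothetical label_columns parameter) and one count column per run. The function name run_totals is also hypothetical.

```python
import csv
import io
from typing import Dict


def run_totals(tsv_text: str, label_columns: int = 1) -> Dict[str, float]:
    """Sum abundance counts per run in a tab-separated summary table.

    Assumed layout: a header row naming the columns, `label_columns`
    leading columns describing each feature, then one count column per run.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    runs = header[label_columns:]
    totals = {run: 0.0 for run in runs}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        for run, value in zip(runs, row[label_columns:]):
            # Empty cells are treated as zero counts.
            totals[run] += float(value) if value else 0.0
    return totals
```

Adjust label_columns to match the actual file once you have inspected its header.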

Figure 4b  EBI Metagenomics project-level summary page.