EBI metagenomics


About EBI Metagenomics

EBI Metagenomics service



Metagenomics is the study of all genomes present in any given environment without the need for prior individual identification or amplification. For example, in its simplest form, a metagenomic study might consist of the direct sequencing of DNA extracted from a bucket of sea water.

The EBI Metagenomics service is an automated pipeline for the analysis and archiving of metagenomic data that aims to provide insights into the phylogenetic diversity as well as the functional and metabolic potential of a sample. You can freely browse all the public data in the repository.

Please take your time to explore and tell us what you think about our website. We welcome your feedback on look and feel, functionality or scientific content. If you want to be kept informed about updates to the website, please subscribe to our mailing list. Also note that, as we are constantly working to improve the site, it may change from time to time.

Why choose EBI Metagenomics?

Easy submission


The EBI Metagenomics service offers a manually-assisted submission route, with help available to ensure data and metadata formatting comply with the Sequence Read Archive (SRA) data schema and the Genomic Standards Consortium (GSC) sample metadata guidelines respectively, allowing harmonisation of analysis efforts across the wider genomics community.

Powerful analysis


The service identifies rRNA sequences, using rRNASelector, and performs taxonomic analysis upon 16S rRNAs using Qiime. The remaining reads are submitted for functional analysis of predicted protein coding sequences using the InterPro sequence analysis resource. InterPro uses diagnostic models to classify sequences into families and to predict the presence of functionally important domains and sites. By utilising this resource, the service offers a powerful and sophisticated alternative to BLAST-based functional metagenomic analyses.
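The routing described above (rRNA reads go to taxonomic analysis, the remaining reads go to functional analysis) can be illustrated with a minimal sketch. It assumes the set of rRNA read IDs has already been produced by an external rRNA detector such as rRNASelector; that step is not reproduced here, and the function name `partition_reads` is hypothetical. The sketch also does not separate 16S reads from other rRNA reads.

```python
from typing import Dict, Set, Tuple


def partition_reads(reads: Dict[str, str],
                    rrna_ids: Set[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split reads into rRNA reads (taxonomic branch) and the rest (functional branch).

    rrna_ids is assumed to be the set of read IDs flagged by an external rRNA
    detector; producing it is outside the scope of this sketch.
    """
    rrna: Dict[str, str] = {}
    remaining: Dict[str, str] = {}
    for read_id, seq in reads.items():
        # Each read goes to exactly one branch, so no read is analysed twice.
        target = rrna if read_id in rrna_ids else remaining
        target[read_id] = seq
    return rrna, remaining


if __name__ == "__main__":
    rrna, rest = partition_reads({"a": "ACGT", "b": "GGCC", "c": "TTAA"}, {"b"})
    print(rrna)  # {'b': 'GGCC'}
    print(rest)  # {'a': 'ACGT', 'c': 'TTAA'}
```

Keeping the two branches disjoint mirrors the description: only the reads left over after rRNA detection are passed on to protein-level analysis.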

Data archiving


Data submitted to the EBI Metagenomics service is automatically archived in the SRA, which is part of the European Nucleotide Archive (ENA). Accession numbers are supplied for sequence data as part of the archiving process, which is a prerequisite for publication in many journals. The SRA only accepts data that is intended for public release. However, any data submitted to us can be kept confidential (by secure user login) for a period of up to 2 years to allow time for the data producer to publish their findings. It should be noted that ALL data must eventually be suitable for public release.

How to use EBI Metagenomics for data analysis

Register with us

Registration is required to submit data for analysis. Please note that the registration system is shared with ENA, so if you have previously submitted sequences to EMBL-Bank you will already have a valid account.

Check your data format

The service accepts all NGS shotgun sequence reads, including Roche 454, Illumina and IonTorrent sequences, from metagenomic or metatranscriptomic samples. Amplicon marker gene studies may be included, particularly if they are associated with other meta-omic data for a sample.
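Before submitting, it can help to run a quick structural check on your read files. The page does not state which file formats are accepted, so the sketch below assumes plain 4-line FASTQ records and only performs generic FASTQ sanity checks. It is not the service's own validation, and the function name `parse_fastq` and the allowed base alphabet are assumptions.

```python
from typing import List, Tuple

# Assumed alphabet: unambiguous bases plus N for undetermined positions.
VALID_BASES = set("ACGTN")


def parse_fastq(text: str) -> List[Tuple[str, str, str]]:
    """Parse 4-line FASTQ records into (read_id, sequence, quality) tuples.

    Raises ValueError on the first structural problem found.
    Wrapped (multi-line) sequences are not supported in this sketch.
    """
    lines = text.splitlines()
    while lines and not lines[-1].strip():  # tolerate trailing blank lines
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError("file does not contain complete 4-line records")
    records = []
    for start in range(0, len(lines), 4):
        header, seq, plus, qual = lines[start:start + 4]
        n = start // 4 + 1  # 1-based record number for error messages
        if not header.startswith("@") or not header[1:].strip():
            raise ValueError(f"record {n}: header must start with '@' and name the read")
        if not plus.startswith("+"):
            raise ValueError(f"record {n}: separator line must start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {n}: sequence and quality lengths differ")
        if not set(seq.upper()) <= VALID_BASES:
            raise ValueError(f"record {n}: unexpected characters in sequence")
        # The read ID is the first whitespace-separated token after '@'.
        records.append((header[1:].split()[0], seq, qual))
    return records


if __name__ == "__main__":
    fq = "@read1 sample=A\nACGTN\n+\nIIIII\n@read2\nGGCC\n+\nIIII\n"
    print(parse_fastq(fq))  # [('read1', 'ACGTN', 'IIIII'), ('read2', 'GGCC', 'IIII')]
```

Records are indexed by position rather than by looking for `@`, because quality strings may themselves begin with `@`.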

If your dataset does not fit these descriptions, please contact us to help us better understand your needs.

Filter any human-associated samples

Human-associated samples (e.g., human gut samples) must be filtered prior to submission to remove any human contaminants.

How we analyse the data

Data processing steps:

  1. Reads submitted
  2. Nucleotide sequences processed
    2.1. Clipped - low-quality ends trimmed and adapter sequences removed using the Biopython SeqIO package
    2.2. Quality filtered - sequences with > 10% undetermined nucleotides removed
    2.3. Read length filtered - short sequences removed (the length threshold depends on the sequencing platform)
    2.4. Duplicate sequences removed - sequences clustered at 99% identity (UCLUST v 1.1.579) and a representative sequence chosen for each cluster
    2.5. Repeat masked - RepeatMasker (open-3.2.2); reads with 50% or more nucleotides masked removed
  3. rRNA reads filtered using rRNASelector (rRNASelector v 1.0.0)
  4. Taxonomy analysis performed on 16S rRNA using Qiime (Qiime v 1.5)
  5. CDS predicted (FragGeneScan v 1.15)
  6. Matches generated against predicted CDS with InterProScan 5.0 (beta release), using a subset of databases from InterPro release 31.0 (databases used for analysis: Pfam, TIGRFAM, PRINTS, PROSITE patterns, Gene3D). The Gene Ontology term summary was generated using the following GO slim: goslim_goa
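The filtering rules in steps 2.2 to 2.5 can be sketched as simple per-read predicates. This is a minimal illustration, not the pipeline's code. Assumptions: the platform-dependent minimum length is a caller-supplied parameter (the page gives no values); masked positions are marked with `X` (adjust `mask_char` to match your masking output); and step 2.4 is replaced by exact-duplicate removal, which is a simplification and does not reproduce UCLUST clustering at 99% identity. All function names are hypothetical.

```python
from typing import Dict


def passes_n_filter(seq: str, max_n_fraction: float = 0.10) -> bool:
    """Step 2.2: keep a read only if at most 10% of its bases are undetermined (N)."""
    if not seq:
        return False
    return seq.upper().count("N") / len(seq) <= max_n_fraction


def passes_length_filter(seq: str, min_length: int) -> bool:
    """Step 2.3: keep reads at least min_length long (threshold is platform-dependent)."""
    return len(seq) >= min_length


def dereplicate_exact(reads: Dict[str, str]) -> Dict[str, str]:
    """Simplified stand-in for step 2.4: keep the first read of each identical sequence.

    UCLUST clusters at 99% identity; this exact-match version does not.
    """
    seen = set()
    kept: Dict[str, str] = {}
    for read_id, seq in reads.items():
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            kept[read_id] = seq  # first occurrence acts as the representative
    return kept


def passes_mask_filter(masked_seq: str, mask_char: str = "X",
                       max_masked_fraction: float = 0.5) -> bool:
    """Step 2.5: drop reads with 50% or more of their bases masked."""
    if not masked_seq:
        return False
    masked = masked_seq.upper().count(mask_char.upper())
    return masked / len(masked_seq) < max_masked_fraction


def filter_reads(reads: Dict[str, str], min_length: int) -> Dict[str, str]:
    """Apply steps 2.2-2.4 in order; step 2.5 needs masked sequences from a separate tool."""
    passed = {
        read_id: seq
        for read_id, seq in reads.items()
        if passes_n_filter(seq) and passes_length_filter(seq, min_length)
    }
    return dereplicate_exact(passed)


if __name__ == "__main__":
    reads = {"r1": "ACGTACGTAC", "r2": "ACGTNNNNAC", "r3": "ACGTACGTAC", "r4": "ACG"}
    print(filter_reads(reads, min_length=5))  # {'r1': 'ACGTACGTAC'}
    print(passes_mask_filter("ACGTAXXXXX"))   # False (exactly 50% masked)
```

Note the two boundary conventions taken from the step descriptions: a read with exactly 10% undetermined bases is kept (only "> 10%" is removed), while a read with exactly 50% masked bases is removed ("50% or more").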

How to cite

To cite EBI Metagenomics, please refer to the following publication:

Sarah Hunter, Matthew Corbett, Hubert Denise, Matthew Fraser, Alejandra Gonzalez-Beltran, Christopher Hunter, Philip Jones, Rasko Leinonen, Craig McAnulla, Eamonn Maguire, John Maslen, Alex Mitchell, Gift Nuka, Arnaud Oisel, Sebastien Pesseat, Rajesh Radhakrishnan, Philippe Rocca-Serra, Maxim Scheremetjew, Peter Sterk, Daniel Vaughan, Guy Cochrane, Dawn Field and Susanna-Assunta Sansone (2013).
EBI metagenomics - a new resource for the analysis and archiving of metagenomic data. Nucleic Acids Research (2013) doi: 10.1093/nar/gkt961

Planned features

We intend to make frequent updates to the interfaces and services provided. The following features are planned for future releases of the resource:

  • Comparative analysis of results
  • Human contaminant filtering



The EBI metagenomics resource was initiated by funding from EMBL. It continues to be developed with support from EMBL and additional funding has been gratefully received from the Biotechnology and Biological Sciences Research Council (BBSRC grant BB/I02612X/1) and the EU's Seventh Framework Programme for Research (FP7 grant MICROB3).

Mailing list

If you would like to be kept informed of further developments with the EBI metagenomics resources please sign up for the EBI metagenomics mailing list.


We would like to thank our beta testers for providing valuable feedback.

The team

This new resource for metagenomics is supported by the EBI Metagenomics Team, in collaboration with the ENA and UniProt groups.


We would like to thank the following authors for their contribution related to the pictures used on the website:
