MGnify Genomes

Below we provide easy access to, and visualization of, 4,644 species-level prokaryotic genomes from the Unified Human Gastrointestinal Genome (UHGG) catalogue. These species clusters represent a total of 286,997 metagenome-assembled and isolate genomes from the human gut microbiome. Species phylogeny and taxonomic annotations were generated with the Genome Taxonomy Database (GTDB). Each genome and its functional annotations can be explored interactively in the genome browser. Assemblies, annotations and pan-genome results are also available as a separate download and on our FTP server.

Version 1.0 released July 2020

Taxonomy tree

The tree viewer is a derivative of the GTDB Tree viewer, used under the GNU General Public License, version 3.

The Unified Human Gastrointestinal Protein catalogue

The Unified Human Gastrointestinal Protein (UHGP) catalogue was generated from all coding sequences of the 286,997 human gut genomes.

A total of 625 million protein sequences were clustered with MMseqs2 linclust into:

  • 170,602,708 representative sequences at 100% amino acid identity (UHGP-100).
  • 20,239,340 representative sequences at 95% amino acid identity (UHGP-95).
  • 13,907,849 representative sequences at 90% amino acid identity (UHGP-90).
  • 4,735,546 representative sequences at 50% amino acid identity (UHGP-50).
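
To put these numbers in proportion, the short Python sketch below computes the fraction of the input that remains as representative sequences at each identity threshold. It uses only the counts listed above. The 625 million total is the rounded figure quoted in the text, so the percentages are approximate. The names TOTAL_PROTEINS, REPRESENTATIVES and fraction_retained are illustrative and not part of any MGnify tool.

```python
# Approximate input size quoted in the text (rounded to 625 million).
TOTAL_PROTEINS = 625_000_000

# Representative sequence counts per catalogue, as listed above.
REPRESENTATIVES = {
    "UHGP-100": 170_602_708,
    "UHGP-95": 20_239_340,
    "UHGP-90": 13_907_849,
    "UHGP-50": 4_735_546,
}


def fraction_retained(representatives, total):
    """Return, per catalogue, the share of input sequences kept as representatives."""
    return {name: count / total for name, count in representatives.items()}


fractions = fraction_retained(REPRESENTATIVES, TOTAL_PROTEINS)

for name, fraction in fractions.items():
    count = REPRESENTATIVES[name]
    print(f"{name}: {count:>11,} representatives ({fraction:.2%} of input)")
```

As expected, the retained fraction shrinks as the identity threshold is lowered, because more sequences collapse into each cluster.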

High-quality (HQ) subsets of UHGP-95, UHGP-90 and UHGP-50 were also generated, consisting of protein clusters in which at least two proteins from independent genomes were retrieved from the same species.
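
The HQ criterion can be illustrated with a minimal Python sketch. It reads the rule as: a cluster qualifies if, for at least one species, its member proteins come from two or more distinct genomes. That reading, the (protein, genome, species) tuple layout, and all identifiers in the toy data are assumptions made for illustration. They do not reflect the format of the released files or the pipeline actually used.

```python
from collections import defaultdict


def is_high_quality(members):
    """Check the HQ rule for one cluster.

    members: iterable of (protein_id, genome_id, species_id) tuples
    (a hypothetical layout chosen for this sketch).
    """
    genomes_by_species = defaultdict(set)
    for _protein_id, genome_id, species_id in members:
        genomes_by_species[species_id].add(genome_id)
    # HQ if any single species contributes proteins from >= 2 distinct genomes.
    return any(len(genomes) >= 2 for genomes in genomes_by_species.values())


# Toy clusters with made-up identifiers.
clusters = {
    "cluster_A": [("p1", "genome_1", "species_X"), ("p2", "genome_2", "species_X")],
    "cluster_B": [("p3", "genome_3", "species_X"), ("p4", "genome_4", "species_Y")],
    "cluster_C": [("p5", "genome_5", "species_Z"), ("p6", "genome_5", "species_Z")],
}

hq_clusters = {cid for cid, members in clusters.items() if is_high_quality(members)}
print(sorted(hq_clusters))
```

In the toy data only cluster_A passes: cluster_B spans two genomes but from different species, and cluster_C has two proteins from the same genome.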

All these files and their functional annotations can be downloaded from our public FTP website: ftp://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/
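
For programmatic access, the following sketch lists the top-level contents of that FTP directory with Python's standard ftplib module. It assumes the server accepts anonymous logins and makes no assumption about which files or subdirectories exist there. The helper names split_ftp_url and list_directory are illustrative.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP location given in the text above.
CATALOGUE_URL = "ftp://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parsed.hostname, parsed.path


def list_directory(url, timeout=30):
    """Return the entry names in the directory that the FTP URL points to."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # anonymous login (assumed to be permitted)
        ftp.cwd(path)
        return ftp.nlst()
```

Calling list_directory(CATALOGUE_URL) requires network access and returns whatever entries the server currently exposes. Individual files could then be fetched with ftp.retrbinary or any other FTP client.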