Below we provide easy access to, and visualization of, the 4,644 species-level prokaryotic genomes that make up the Unified Human Gastrointestinal Genome (UHGG) catalogue. These species clusters represent a total of 286,997 metagenome-assembled and isolate genomes from the human gut microbiome. Species phylogeny and taxonomic annotations were generated with the Genome Taxonomy Database (GTDB). Each genome and its functional annotations can be explored interactively in the genome browser. Assemblies, annotations and pan-genome results are also available as a separate download and on our FTP server.
Version 1.0 released July 2020
The Unified Human Gastrointestinal Protein catalogue
The Unified Human Gastrointestinal Protein (UHGP) catalogue was generated from all coding sequences of the 286,997 human gut genomes.
A total of 625 million protein sequences were clustered with MMseqs2 linclust into:
- 170,602,708 representative sequences at 100% amino acid identity (UHGP-100).
- 20,239,340 representative sequences at 95% amino acid identity (UHGP-95).
- 13,907,849 representative sequences at 90% amino acid identity (UHGP-90).
- 4,735,546 representative sequences at 50% amino acid identity (UHGP-50).
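As a rough illustration of how strongly each identity threshold compresses the catalogue, the Python sketch below computes the share of input proteins kept as representatives. It treats the input as exactly 625,000,000 sequences, which is only the rounded figure quoted above, so the resulting percentages are approximate.

```python
# Representative sequence counts quoted above for each clustering threshold.
UHGP_COUNTS = {
    "UHGP-100": 170_602_708,
    "UHGP-95": 20_239_340,
    "UHGP-90": 13_907_849,
    "UHGP-50": 4_735_546,
}

# Assumption: "625 million" is taken as exactly 625,000,000 input proteins.
TOTAL_PROTEINS = 625_000_000


def retained_fraction(representatives: int, total: int = TOTAL_PROTEINS) -> float:
    """Return the fraction of input proteins kept as cluster representatives."""
    return representatives / total


for name, count in UHGP_COUNTS.items():
    share = retained_fraction(count)
    print(f"{name}: {count:,} representatives ({share:.1%} of input)")
```

Lower identity thresholds merge more sequences into each cluster, so the retained share falls from UHGP-100 to UHGP-50.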
High-quality (HQ) subsets of UHGP-95, UHGP-90 and UHGP-50 were also generated. These consist of protein clusters in which at least two proteins were retrieved from independent genomes of the same species.
All of these files and their functional annotations can be downloaded from our public FTP site: ftp://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/
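For scripted access, the directory can be listed with Python's standard `ftplib` module. This is a minimal sketch, assuming the server accepts anonymous FTP logins and is reachable from your network; the helper names `split_ftp_url` and `list_catalogue_root` are hypothetical, and the directory layout below the root is not assumed here.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP location quoted above.
FTP_URL = "ftp://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/"


def split_ftp_url(url: str) -> tuple[str, str]:
    """Split an ftp:// URL into (host, remote path)."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.path


def list_catalogue_root(url: str = FTP_URL, timeout: float = 30.0) -> list[str]:
    """Return the entry names found in the remote directory."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # assumption: anonymous login is accepted
        ftp.cwd(path)  # move to the catalogue directory
        return ftp.nlst()
```

Calling `list_catalogue_root()` opens a connection and returns the names at the top level of the directory; nothing is downloaded.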
Search DNA fragments in the UHGG catalogue
This is a BIGSI-based search engine designed to query short sequence fragments (50-5,000 bp in length) against 4,644 representative genomes from the human gut microbiome.
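Before submitting a query, it can help to check that the fragment falls within the accepted length range. The sketch below is a hypothetical client-side check and not part of the search engine itself. The function name `is_valid_query` and the restriction to the unambiguous bases A, C, G and T are assumptions made for illustration; the service may accept other symbols.

```python
MIN_LEN = 50      # shortest accepted fragment, in bp (from the text above)
MAX_LEN = 5_000   # longest accepted fragment, in bp (from the text above)
ALLOWED_BASES = set("ACGT")  # assumption: only unambiguous bases are allowed


def is_valid_query(sequence: str) -> bool:
    """Check fragment length and alphabet, ignoring whitespace and case."""
    cleaned = "".join(sequence.split()).upper()
    in_range = MIN_LEN <= len(cleaned) <= MAX_LEN
    return in_range and set(cleaned) <= ALLOWED_BASES


print(is_valid_query("ACGT" * 20))  # 80 bp fragment
print(is_valid_query("ACGT" * 5))   # 20 bp fragment, below the minimum
```

Whitespace is removed before counting, so a sequence pasted across several lines is measured by its bases alone.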