Software – Iqbal Group

Mykrobe

Tool for rapid, lightweight analysis of Mycobacterium tuberculosis, Staphylococcus aureus and Shigella sonnei, giving species/lineage information and drug resistance predictions.

Code: https://github.com/Mykrobe-tools/mykrobe

Papers: DOI: 10.1038/ncomms10063, https://doi.org/10.12688/wellcomeopenres.15603.1

Pandora

Bacterial genomes can be remarkably variable even within a species, leading to the concept of a pan-genome. With standard tools, it is only possible to study SNP/mutation variation in the parts of the genome that are shared across all samples in a cohort (the “core”). Using a new genome graph implementation, we developed pandora, a tool for joint analysis of SNP and gene-presence information across entire bacterial pan-genomes. Pandora supports Nanopore and Illumina data.

Code: https://github.com/rmcolq/pandora

Paper: DOI: 10.1186/s13059-021-02473-1

Gramtools

Tool for joint analysis of SNP/indel variation in cohorts, allowing analysis of mutations on different haplotypes and on alternate backgrounds to long deletions. The underlying data structure is a generalised BWT. Application has focussed primarily on surface antigens in P. falciparum. Gramtools supports Illumina data.

Code: https://github.com/iqbal-lab-org/gramtools

Paper: DOI: 10.1186/s13059-021-02474-0
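
For readers unfamiliar with the BWT, the following is a minimal Python sketch of a plain Burrows-Wheeler transform of a single string, with backward search to count pattern occurrences. It is not gramtools code and it is not the generalised BWT that gramtools uses; the function names and the "$" sentinel are assumptions made for illustration.

```python
from collections import Counter


def bwt(text, sentinel="$"):
    """Return the BWT of text, built naively from its sorted rotations."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel")
    # The sentinel must sort before every other character ("$" < "A" in ASCII).
    text += sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count occurrences of pattern in the original text by backward search."""
    # first[c] = number of characters in the text that sort before c.
    first, total = {}, 0
    for char, count in sorted(Counter(bwt_string).items()):
        first[char] = total
        total += count
    lo, hi = 0, len(bwt_string)  # current interval of sorted rotations
    for char in reversed(pattern):
        if char not in first:
            return 0
        lo = first[char] + bwt_string[:lo].count(char)
        hi = first[char] + bwt_string[:hi].count(char)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("ACGTACGA")
print(count_occurrences(transformed, "ACG"))
```

The rank computation here rescans the string at every step, which keeps the sketch short; practical implementations precompute rank tables instead.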

make_prg

Python implementation of the Recursive-Cluster-Collapse algorithm described in the Pandora paper. It builds genome graphs from either an MSA or a VCF, and is used by both gramtools and pandora.

Code: https://github.com/iqbal-lab-org/make_prg

Minos

Tool for combining multiple callsets (VCF) made for the same sample using different variant callers (e.g. samtools, freebayes), using a genome graph to adjudicate where the callsets disagree. Used heavily in the CRyPTIC project to analyse tens of thousands of M. tuberculosis genomes.

Code: https://github.com/iqbal-lab-org/minos

Preprint: https://doi.org/10.1101/2021.09.15.460475
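
As a rough illustration of the first step in combining callsets, the Python sketch below splits two callsets into sites where the callers agree and sites that need adjudication. It is not minos code: the function name and the dictionary representation of a callset are hypothetical, and the genome-graph adjudication that minos performs is not implemented here.

```python
def partition_calls(callset_a, callset_b):
    """Split two callsets into agreed calls and calls needing adjudication.

    Each callset maps a site (chrom, pos, ref) to its ALT allele. This
    representation is an assumption for the sketch, not a VCF parser.
    """
    agreed, disputed = {}, {}
    for site in sorted(set(callset_a) | set(callset_b)):
        alt_a = callset_a.get(site)  # None if caller A made no call here
        alt_b = callset_b.get(site)  # None if caller B made no call here
        if alt_a == alt_b:
            agreed[site] = alt_a
        else:
            disputed[site] = (alt_a, alt_b)
    return agreed, disputed


caller_a = {("chr1", 100, "A"): "G", ("chr1", 250, "C"): "T"}
caller_b = {("chr1", 100, "A"): "G", ("chr1", 250, "C"): "A", ("chr1", 400, "G"): "C"}
agreed, disputed = partition_calls(caller_a, caller_b)
print(agreed)
print(disputed)
```

In this toy input, the site at position 100 is agreed, while positions 250 (conflicting ALT alleles) and 400 (called by only one caller) are left for adjudication.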

Varifier

Tool (introduced in the minos paper) for evaluating a VCF file of calls when you have a high-quality truth assembly (as is common with bacteria, where there are no issues with phasing calls). A probe is constructed for each record in the VCF, with flanking sequence from the reference genome (with nearby variants applied), and this probe is then mapped to the truth genome, which allows varifier to measure precision. Measuring recall depends on having reliable true variants. Varifier can use minimap and nucmer to compare the reference genome and truth assembly and find a conservative “truth set” of variants, using the same probe method to filter errors out of the minimap+nucmer calls; it then uses that truth set to measure recall.

Code: https://github.com/iqbal-lab-org/varifier/
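
The probe idea described for varifier can be illustrated with a minimal Python sketch. It is not varifier's implementation: the function names, the flank length and the exact substring match are assumptions made for illustration, whereas varifier maps probes to the truth genome and also applies nearby variants to the flanks.

```python
def build_probe(ref, pos, ref_allele, alt_allele, flank=5):
    """Return the ALT allele flanked by reference sequence (pos is 0-based)."""
    if ref[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("REF allele does not match the reference at pos")
    left = ref[max(0, pos - flank):pos]
    end = pos + len(ref_allele)
    right = ref[end:end + flank]
    return left + alt_allele + right


def call_is_supported(probe, truth):
    """Stand-in for mapping: require an exact match of the probe in the truth."""
    return probe in truth


def precision(ref, truth, calls, flank=5):
    """Fraction of calls (pos, ref_allele, alt_allele) supported by the truth."""
    if not calls:
        raise ValueError("no calls to evaluate")
    supported = sum(
        call_is_supported(build_probe(ref, pos, ref_allele, alt, flank), truth)
        for pos, ref_allele, alt in calls
    )
    return supported / len(calls)


reference = "ACGTACGTTAGCCGATAGGCTTAAC"
truth = "ACGTACGTTAGCTGATAGGCTTAAC"  # differs from the reference by C>T at position 12
calls = [(12, "C", "T"), (17, "G", "A")]  # one true call, one false call
print(precision(reference, truth, calls))  # 0.5
```

Because the flanks here come straight from the reference, a true call sitting next to another real variant would fail the exact match, which is why applying nearby variants to the probe matters in practice.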

BIGSI

Tool for creating a k-mer index of large sets of microbial sequence data.

Code: https://github.com/iqbal-lab-org/BIGSI

Paper: https://pubmed.ncbi.nlm.nih.gov/30718882/

Note that if you want to build a BIGSI of many samples, the method outlined in the paper is quite memory intensive. We have a better method of merging indexes, documented on the wiki: https://github.com/iqbal-lab-org/BIGSI/wiki/Merging-BIGSIs
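
To give a feel for what a k-mer index of many samples does, here is a toy Python sketch that keeps one Bloom filter per sample and reports which samples appear to contain every k-mer of a query. It is not the BIGSI API: the class name, k-mer size, filter size and SHA-256-based hashing are all assumptions for illustration, and the storage layout of the real tool is not reproduced.

```python
import hashlib


def kmers(seq, k):
    """Return the set of k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bit_positions(kmer, num_bits, num_hashes):
    """Derive num_hashes bit positions for a k-mer from seeded SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{seed}:{kmer}".encode()).digest()[:8], "big") % num_bits
        for seed in range(num_hashes)
    ]


class ToyKmerIndex:
    """One Bloom filter per sample, each stored as a Python int used as a bitmask."""

    def __init__(self, k=5, num_bits=1024, num_hashes=3):
        self.k, self.num_bits, self.num_hashes = k, num_bits, num_hashes
        self.filters = {}  # sample name -> bitmask

    def _mask(self, seq):
        mask = 0
        for kmer in kmers(seq, self.k):
            for pos in bit_positions(kmer, self.num_bits, self.num_hashes):
                mask |= 1 << pos
        return mask

    def add_sample(self, name, seq):
        self.filters[name] = self._mask(seq)

    def query(self, seq):
        """Return samples whose filter has every bit set for all query k-mers.

        Bloom filters never miss a k-mer that was added, but they can report
        false positives.
        """
        if len(seq) < self.k:
            raise ValueError("query is shorter than k")
        needed = self._mask(seq)
        return sorted(name for name, bits in self.filters.items() if bits & needed == needed)


index = ToyKmerIndex()
index.add_sample("sample_a", "ACGTACGTTAGCCGATAGGCTTAAC")
index.add_sample("sample_b", "TTGACCATGGCAATCGGATCCAGTA")
print(index.query("GTTAGCCGA"))  # always includes "sample_a"
```

The query sequence is a substring of sample_a, so sample_a is guaranteed to be reported; whether any other sample is reported depends on the false-positive rate of the chosen filter size.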

COBS

High-performance C++ reimplementation of BIGSI, with new ideas, that is faster and uses less disk space.

Code: https://github.com/iqbal-lab-org/cobs

Paper: https://arxiv.org/abs/1905.09624

Cortex

This rather venerable tool builds coloured de Bruijn graphs and uses them to detect variation between a sample and a reference, or between different samples.

Code: https://github.com/iqbal-lab/cortex

Paper: https://www.nature.com/articles/ng.1028

Cortex is no longer actively developed, but it is still heavily used. In particular, our group have analysed large cohorts of M. tuberculosis by using both cortex and samtools, and then combining the results with minos.
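
The idea of a coloured de Bruijn graph can be sketched in a few lines of Python: each k-mer is a node, and each node records the colours (samples) that contain it. This is a simplified illustration rather than Cortex's data structure or calling algorithm; the function names and the choice of k are assumptions, edges are left implicit (k-mers overlapping by k-1 bases), and variation is reported only as k-mers private to one colour.

```python
from collections import defaultdict


def build_coloured_graph(sequences, k):
    """Map each k-mer (node) to the set of colours whose sequence contains it."""
    graph = defaultdict(set)
    for colour, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            graph[seq[i:i + k]].add(colour)
    return graph


def private_kmers(graph, colour, other):
    """Return k-mers seen in colour but not in other."""
    return {kmer for kmer, colours in graph.items() if colour in colours and other not in colours}


sequences = {
    "reference": "ATGCCGTAGGCTA",
    "sample": "ATGCCGAAGGCTA",  # differs from the reference by one SNP (T>A)
}
graph = build_coloured_graph(sequences, k=4)
print(sorted(private_kmers(graph, "sample", "reference")))
# ['AAGG', 'CCGA', 'CGAA', 'GAAG']
```

With k=4, the single SNP produces four k-mers unique to each colour, which is the branch that a bubble in the graph would trace between the shared flanking k-mers.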