Computational microbial genomics
Computational models are important tools for understanding the key forces that shape evolution, establishing how genotype affects biological traits, and exploring concepts such as ancestry and infection. The standard approach to analysing a species is to take a random individual, reconstruct its genome and assume that all other genomes from that species are mostly the same. This reduces sequence analysis to a simple string-matching problem (“read mapping”). Where the assumption holds, the method works well; for many regions of genomes, however (e.g. the human MHC, surface antigens in P. falciparum, the non-core part of any bacterial genome), it does not. Our group works on this problem, developing computational methods for representing genetic variation and using them to study bacteria and parasites. We work closely with species experts and draw on concrete examples of challenging cases to fine-tune our models and methods.
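To make the “read mapping” idea concrete, here is a minimal sketch, assuming exact matching against a single reference string. The function name map_read and the toy sequences are hypothetical, chosen only for illustration; real read mappers use indexed, error-tolerant search.

```python
def map_read(reference: str, read: str) -> list[int]:
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further on, so overlapping hits are found too.
        start = reference.find(read, start + 1)
    return positions


# Toy data (hypothetical): one reference and two reads from a different individual.
reference = "ACGTACGTTAGC"
conserved_read = "CGTTAG"   # region shared with the reference
divergent_read = "CGTGGG"   # region where the sample differs from the reference

print(map_read(reference, conserved_read))   # one hit
print(map_read(reference, divergent_read))   # no hits: the read cannot be placed
```

The second read illustrates the failure mode described above: when the sampled genome diverges from the single reference, the read has nowhere to map, and the variation it carries goes unseen.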
Our methods include Cortex and, more recently, a modified Burrows-Wheeler Transform approach that naturally models the impact of recombination by finding mosaic paths through a reference panel of genomes. Our translational work focuses on applying whole-genome sequencing to pathogens in a clinical setting.
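For readers unfamiliar with the Burrows-Wheeler Transform, the following is a minimal sketch of the standard, unmodified transform and the backward search it enables, assuming a single linear reference. It is not the modified encoding described above, which additionally represents variation across a panel of genomes; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def bwt(text: str) -> tuple[str, list[int]]:
    """Return the BWT of text (with a '$' terminator) and its suffix array."""
    text += "$"
    # Naive suffix array construction: fine for a toy example, too slow for genomes.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each suffix in sorted order.
    transformed = "".join(text[i - 1] for i in suffix_array)
    return transformed, suffix_array


def find(text: str, pattern: str) -> list[int]:
    """Return sorted 0-based positions of pattern in text via backward search."""
    transformed, suffix_array = bwt(text)
    # first_row[c] = number of characters in the text that sort before c.
    counts = Counter(transformed)
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]
    # Maintain a half-open interval [lo, hi) of suffixes prefixed by the
    # part of the pattern processed so far, extending it right to left.
    lo, hi = 0, len(transformed)
    for ch in reversed(pattern):
        if ch not in first_row:
            return []
        lo = first_row[ch] + transformed[:lo].count(ch)
        hi = first_row[ch] + transformed[:hi].count(ch)
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


print(bwt("banana")[0])
print(find("ACGTACGTTAGC", "ACG"))
```

The slice-and-count step is linear in the text length here; practical indexes replace it with precomputed rank structures so that each pattern character costs roughly constant time.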
We have developed a rapid, lightweight app (called Mykrobe Predictor) for predicting antibiotic resistance given sequence data from a sample of S. aureus or M. tuberculosis, and are working on testing, updating and extending this to other species. To enable strain and resistance surveillance, we are building online genome graph databases of pathogen variation.
Our group will work on improving our modified BWT encoding; on nanopore sequence analysis (including real-time strand filtering in-pore) for clinical applications such as TB diagnostics; and on bacterial pan-genome analysis and plasmid dynamics. As part of the global Pf3k project, we will build detailed variation maps of P. falciparum vaccine targets. As part of the CRYPTIC consortium, we are carrying out the genome analysis of 100,000 M. tuberculosis genomes.
We are currently recruiting through the EMBL International PhD Programme. Projects could be either quite “pure” bioinformatics (e.g. succinct data structures, genome graphs, nanopore analysis) or more applied (e.g. analysis of genetic variation in 100,000 M. tuberculosis genomes, or complex variation in P. falciparum). We will soon also be recruiting postdocs and a bioinformatician to work on CRYPTIC.
Iqbal Z, et al. (2012) De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics 44:226–232.
Bradley P, et al. (2015) Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nature Communications 6:10063.
Maciuca S, et al. (2016) A natural encoding of genetic variation in a Burrows-Wheeler Transform to enable mapping and genome inference. (preprint)
Bradley P, et al. (2017) Real-time search of all bacterial and viral genomic data. bioRxiv doi: https://doi.org/10.1101/234955. (preprint)