Computational comparative genomics
The order Rodentia underwent an extraordinary adaptive radiation during the Cenozoic and, with over 2,000 species, accounts for nearly half of all known mammalian diversity. Rodents have spread over almost every landmass on Earth and occupy almost all terrestrial ecosystems, from rainforests and deserts to arctic tundra. Murid rodents have a rate of chromosomal rearrangement that has been estimated to be between three times and hundreds of times faster than that of primates.
For over a century, mice have been used to model human disease, leading to many fundamental discoveries about mammalian biology and the development of new therapies. Mouse genetics research has been further catalysed by a plethora of genomic resources developed in the last 20 years, including the genome sequence of C57BL/6J and more recently the first draft reference genomes for 16 additional laboratory strains. Collectively, the comparison of these genomes highlights the extreme diversity that exists at loci associated with the immune system, pathogen response, and key sensory functions, which form the foundation for dissecting phenotypic traits in vivo.
The Keane group is interested in the structure, function, and mechanisms of gene regulation in the most diverse loci in the mouse genome. The selective pressures driving diversity and copy number variants (CNVs) at these loci include host-pathogen coevolution (e.g., the Red Queen hypothesis), kin selection, mating preference, and even selective sweeps due to strong positive selection. Many of the genes at these loci have direct orthologues in the human genome and are therefore important for understanding health and disease, drug development, and vaccine development.
In the coming years, our research will focus on two areas:
1. Genomic structure, regulation, and origins of the most diverse regions of the mouse genome
Wild-derived mouse strains have hundreds of thousands of structural differences and novel haplotypes compared to C57BL/6J. As a result, studies that rely on the C57BL/6J reference genome to analyse these strains are blind to many non-reference haplotypes. The Keane group identified almost all previously known divergent loci, along with hundreds of novel loci, and found that these strain-specific diversity regions (SSDRs) are enriched for genes associated with immunity, sensory function, sexual reproduction, and behaviour. Only 3.1% of gene families in SSDRs have complete sequences (including introns and intergenic regions) for multiple mouse strains, and 9.8% of those have published coding sequence from multiple mouse strains. It is not currently known to what extent non-reference haplotype variation in SSDRs contributes to biomedically relevant phenotypes. In its work, the group highlighted specific examples, including the Nlrp1 locus (NOD-like receptors, pyrin domain-containing), which encodes inflammasome components that sense endogenous microbial products and metabolic stresses, thereby stimulating innate immune responses. The Keane group will complete deep long-read sequencing and create reference-quality assemblies for sixteen mouse strains by mid-2020. Early indications are that these genomes will have complete representation for the majority of SSDR loci. This will provide a unique opportunity to fully understand the contribution of diversity in these loci to unresolved genetic associations, and make it possible to design functional experiments to test these associations.
A few recent studies have revealed the extent of genetic diversity in wild mice, which has enabled the study of the evolutionary histories and population genetics of natural house mouse populations. It is largely unknown whether SSDR non-reference haplotypes, particularly those involved in immune and pathogen defence, are fixed in wild mouse populations, or whether these populations contain many more novel haplotypes or allele combinations that mediate adaptation and resistance to infection. We could expect many SSDR loci to be subject to continued selective pressure across the population to adapt to environmental challenges that laboratory mice do not encounter. Characterising the population structure of SSDR loci in wild populations will show whether functional studies using laboratory mice (e.g. of mechanisms of disease resistance) are informative and generalisable within the Mus species, or whether additional functional alleles exist.
2. Exploring rodent genetic and phenotypic diversity for biomedical and environmental applications
Advances in second- and third-generation sequencing have made it possible to generate chromosome-scale genome sequences. The Keane group is sequencing the genomes of several distinct rodent species groups with potential biomedical and climate change applications. For example, the Algerian mouse is nocturnal, can drink two-thirds less water, and can tolerate much higher temperatures than the domestic mouse. Acomys (African spiny mice) can shed their dorsal skin to escape predators and fully regenerate the lost tissue without fibrosis or tissue overgrowth. A. russatus is of particular interest for adaptive studies as it thrives in the desert and can ingest seawater while still maintaining kidney function. Within Onychomys, the southern grasshopper mouse (O. torridus) is insensitive to one of the most painful stings in the animal kingdom, that of the bark scorpion, having evolved the ability to block voltage-gated pain transmission.
Lilue J, et al. Mouse protein coding diversity: What’s left to discover? PLoS Genetics. 14 Nov 2019.
Lilue J, et al. Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nature Genetics. 01 Oct 2018.
Keane TM, Goodstadt L, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 14 Sep 2011.