Genomics reboots deep learning
About the study
- New deep learning technique offers a more accurate approach to single-cell genomics
- Deep learning methods are making it easier for researchers to integrate genomic and epigenomic data
- Advances in deep learning are accelerating epigenetics research, which has implications for understanding health and disease
A new deep-learning method, DeepCpG, helps scientists better understand the epigenome – the biochemical activity around the genome. Published in Genome Biology by researchers at EMBL-EBI, the Babraham Institute and the Sanger Institute, DeepCpG leverages ‘deep neural networks’, multi-layered machine-learning models inspired by the brain, to gain new insights into health and disease.
Nice book. But what does it mean?
Deep learning is one of the most active areas of machine learning. It has driven recent advances in computer image classification, text translation and speech recognition, but it also has major potential in computational biology, particularly for regulatory genomics and cellular imaging.
“We now have this amazing ‘book’ of the human genome, thanks to projects like 1000 Genomes, divided up nicely into chapters and annotated in parts. But what does it mean? If we want to really understand how life works, we need to decipher both the genome – the set of instructions repeated in every cell – and the epigenome, the part that varies wildly between cells,” explains Oliver Stegle, Research Group Leader at EMBL-EBI.
To better understand how DNA sequences relate to biological changes, the genomics community is turning to artificial neural networks – a class of machine learning methods first introduced in the 1980s and inspired by the wiring of the brain. More recently, these models have been rebranded as ‘deep neural networks’, which form the field of deep learning.
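To make "multi-layered" concrete, here is a minimal Python sketch of a forward pass through a tiny two-layer network. The weights are made up for illustration (a real network learns them from data), and the function names are hypothetical, not taken from DeepCpG or any library.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, zeroes negative ones."""
    return max(0.0, z)


def sigmoid(z):
    """Squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Feed x through each (weights, biases, activation) layer in turn."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical toy weights: 3 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
print(forward([1.0, 0.0, 1.0], layers))
```

Stacking more layers like this is what makes a network "deep". Training, which this sketch leaves out, adjusts the weights so the final output matches known examples.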
A recent review of deep learning, published in Molecular Systems Biology, provides a ‘user guide’ to applying deep learning in genomics – an area of rapid technological change.
“Single-cell genomics allows us to generate a huge amount of highly detailed information about the genome and all the activity happening around it, in many different types and subtypes of cells. The complexity is simply staggering, and the idea of explicitly probing each of these potential interactions individually is not really workable,” says Stegle.
“Most existing methods require you to know a lot up front, for example which patterns in a DNA sequence are informative for a specific task. However, there is a huge number of possible patterns in the genome that we could explore, so these existing methods are not practical for genomics,” adds Christof Angermueller, PhD candidate at EMBL-EBI. “With deep learning, you do not have to spend your time manually crafting features that capture these patterns. Instead, the model uses raw DNA sequences as input and discovers relevant patterns itself.”
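What "raw DNA sequences as input" can look like in practice: a common approach is to one-hot encode each base and let a convolutional layer slide small filters along the sequence, so that each filter acts as a pattern detector. The Python sketch below shows only this scanning step, with a single hand-set filter; the names one_hot, scan and cg_filter are hypothetical, and this is not DeepCpG's code.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as 4-element indicator vectors in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan(encoded, filt):
    """Slide a (width x 4) filter along the sequence, one score per start position.

    This is the core operation of a 1D convolutional layer on one-hot DNA.
    """
    width = len(filt)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        scores.append(sum(w * x
                          for pos_w, pos_x in zip(filt, window)
                          for w, x in zip(pos_w, pos_x)))
    return scores


# Hypothetical hand-set filter rewarding the dinucleotide "CG".
# In a trained network, filter weights are learned from data instead.
cg_filter = [
    [-1.0, 1.0, -1.0, -1.0],  # position 1: prefers C
    [-1.0, -1.0, 1.0, -1.0],  # position 2: prefers G
]
scores = scan(one_hot("TACGGA"), cg_filter)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, "best match starts at index", best)
```

The point of deep learning is that nobody writes cg_filter by hand: the network starts with random filters and tunes them until they pick out whatever patterns help its predictions.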
Accelerating single-cell genomics
The team leveraged the capacity of deep learning to fill in the gaps in single-cell genomics, an emerging technology that offers a close-up view of epigenetics.
DeepCpG was designed to help scientists learn about the connections between DNA sequences and DNA methylation – a biochemical modification of the genome sequence that can act like an off-switch for individual genes. Methylation plays a key part in important biological processes, including cell development, ageing and cancer progression.
The new method uses genomic and epigenomic data to make predictions about DNA methylation in single cells. This is important because current single-cell technologies measure methylation at only a fraction of the relevant sites in each cell, leaving gaps in the data. With DeepCpG, researchers can fill in those gaps and obtain a more complete picture of DNA methylation. The model can also yield new biological insights, for example into the connection between DNA sequence and methylation.
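In mammals, DNA methylation occurs mostly at CpG sites, where a C is followed by a G; DeepCpG takes its name from them. To illustrate what filling in missing single-cell measurements means, the Python sketch below uses a deliberately simple stand-in: it estimates each unmeasured CpG from its nearest measured neighbours in the same cell. This nearest-neighbour average is not DeepCpG's method, which learns from both DNA sequence and methylation data with neural networks. The function names and example data are hypothetical.

```python
def cpg_sites(seq):
    """Return the 0-based position of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def impute(observed, targets, k=2):
    """Estimate missing methylation states from the k nearest measured CpGs.

    observed: dict mapping position -> 0 (unmethylated) or 1 (methylated),
              as measured in one cell.
    targets:  CpG positions that were not measured in that cell.
    Returns a dict mapping each target position -> estimated methylation level.
    """
    if not observed:
        raise ValueError("need at least one measured CpG")
    estimates = {}
    for pos in targets:
        nearest = sorted(observed, key=lambda p: abs(p - pos))[:k]
        estimates[pos] = sum(observed[p] for p in nearest) / len(nearest)
    return estimates


# Hypothetical single-cell data: three of the four CpGs were measured.
sites = cpg_sites("ACGTTCGAACGTCGA")
observed = {1: 0, 5: 1, 12: 1}
missing = [p for p in sites if p not in observed]
print(sites, impute(observed, missing))
```

Averaging neighbours works because methylation states of nearby CpGs tend to be correlated, but it ignores the DNA sequence entirely; a learned model can also exploit sequence patterns and information from other cells.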
“DeepCpG actually learns meaningful features in a data-driven manner,” continues Angermueller. “It has major advantages over previous methods, including the ability to more accurately predict DNA methylation and to study intercellular differences. By studying the wiring of the learnt network, we can understand how the biology of DNA methylation works. This has allowed us to recover known DNA sequence motifs that are important for methylation changes, as well as to discover new motifs, which are the starting point for future studies.”
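One common way to read a motif out of a trained convolutional network is to collect the sequence windows that most strongly activate a filter and line them up into a position frequency matrix. The Python sketch below illustrates that general idea with made-up activation values; it is not taken from the DeepCpG paper, and all names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def top_windows(seq, scores, width, threshold):
    """Collect the sequence windows whose filter activation reaches the threshold."""
    return [seq[i:i + width] for i, s in enumerate(scores) if s >= threshold]


def position_frequency_matrix(windows):
    """Count how often each base occurs at each position of the aligned windows."""
    width = len(windows[0])
    return [Counter(w[i] for w in windows) for i in range(width)]


def consensus(pfm):
    """Most frequent base per position (ties go to the earlier base in ACGT)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in pfm)


seq = "TACGATTCGAGG"
# Hypothetical activations of one width-4 filter, one per start position.
scores = [0.1, 2.3, 0.2, 0.0, 0.4, 0.3, 2.1, 0.1, 0.0]
windows = top_windows(seq, scores, width=4, threshold=2.0)
pfm = position_frequency_matrix(windows)
print(windows, consensus(pfm))
```

In practice the windows come from many sequences, and the resulting matrix is compared against databases of known motifs to decide whether a filter has rediscovered a known pattern or found a new one.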
“We have demonstrated that DeepCpG can accurately predict and analyse DNA methylation in single cells. However, DeepCpG is just one example of how we can apply deep learning to genomics and single-cell technologies,” says Stegle. “It is exciting to see the versatile applications deep learning has already found in genomics. I am looking forward to seeing more deep learning techniques come online. I believe it will make a big difference to how we study biology and has the potential to yield new answers about how life works.”
“Single cell epigenomics methods provide exciting insights into cell heterogeneity in development, ageing and disease. However, if you are just dealing with two genomes in a single cell, bits of information are often lost during the experiment,” explains Wolf Reik of the Babraham Institute and Associate Faculty member at the Sanger Institute. “This new method recognises patterns of the epigenome in single cells and then reconstructs lost information, returning a data-rich single cell epigenome.”
“Deep learning is now the state of the art in many fields. We are exploring its utility for making sense of large-scale biological data. Pioneering studies such as the one by Angermueller and colleagues prove that there is a lot to be gained by using deep-learning methods in computational biology,” concludes Leopold Parts, Group Leader at the Sanger Institute.
Deep learning in genomics: landmark review
In a review of deep learning for computational biology, Angermueller, Stegle and their colleagues present different applications of deep neural networks in computational biology. These range from models for understanding the impact of disease mutations to methods for localising and classifying cancer cells in microscopy images.
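A standard way to use a trained sequence model to study the impact of mutations is in silico mutagenesis: change each base in turn and record how the model's prediction shifts. The Python sketch below assumes a hypothetical stand-in model that simply counts CpG dinucleotides; with a real trained network, you would call its prediction function instead.

```python
def cpg_count(seq):
    """Hypothetical stand-in for a trained model: number of CpG dinucleotides."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def mutation_effects(seq, model):
    """Score every single-base substitution as model(mutant) - model(reference).

    Returns a dict mapping (position, reference base, alternative base) -> effect.
    """
    reference = model(seq)
    effects = {}
    for i, base in enumerate(seq):
        for alt in "ACGT":
            if alt != base:
                mutant = seq[:i] + alt + seq[i + 1:]
                effects[(i, base, alt)] = model(mutant) - reference
    return effects


effects = mutation_effects("ACGT", cpg_count)
for (pos, ref, alt), delta in sorted(effects.items()):
    print(pos, ref, ">", alt, delta)
```

Positions where most substitutions produce large changes are the ones the model relies on, which is how such scans can flag potentially harmful variants for follow-up.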
However, they also point out that deep learning is not the ultimate Swiss Army knife. Instead, the choice of whether to apply deep learning or conventional models depends on the nature of the data and the problem to be solved. Read more about publicly available deep-learning software in the Molecular Systems Biology review.
Angermueller C, et al. (2017). DeepCpG: Accurate prediction of single-cell DNA methylation states using deep learning. Genome Biology (in press). Published online 11 April. DOI: 10.1186/s13059-017-1189-z
Angermueller C, et al. (2016). Deep learning for computational biology. Molecular Systems Biology 12:878. Published online 19 July.
Thanks to Nikhil Buduma for an excellent explanation of deep learning on his blog. [http://nikhilbuduma.com/2014/12/29/deep-learning-in-a-nutshell/]
Oliver Stegle is supported by EMBL, Wellcome and the European Union. Wolf Reik is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC), Wellcome and the European Union. This project has received funding from the European Union’s Horizon 2020 research and innovation programme (grant agreement No 635290).