ML in drug discovery: why now?

ML is not a new field. Take, for example, artificial neural networks (ANNs), one of the earliest and best-known families of machine learning algorithms. The concept of the ANN was first introduced in 1943, and the “perceptron”, the basic unit of an ANN, was developed in 1958 [1].

Several factors explain the current interest in ML in general, and in its healthcare applications in particular:

1. Availability of large-scale data in the life sciences

  • Biological data: parallel sequencing of thousands of genomes and transcriptomes (e.g. UK Biobank), genome-wide association studies (GWAS) in cases and controls, mass spectrometry data, etc. [2-7]
  • Chemical data: millions of chemical structures and compounds, with their physical properties and other relevant information [8]
  • Imaging data: medical images such as MRI and CT scans [9, 10]

2. Compute power

The ideas behind deep learning date back to around 1970, but the approach became widely popular only recently, after advances in GPU processing made it practical to train large networks.

3. Massive R&D investments in machine learning from the pharmaceutical, biotech, and technology industries

4. Algorithmic advancements

5. Availability of open-source code and tools

Reusing existing code speeds up the development of ML models. Researchers without programming experience can also use existing libraries or tools to build ML models from their own data.