Metabolomics in the cloud: scaling computational tools to big data
Metabolomic datasets are becoming increasingly large and complex, and multiple types of algorithms and workflows are needed to process and analyse them. This makes it difficult for researchers to make sense of their data without access to extensive computational and bioinformatics support. A cloud infrastructure with portable software tools can provide much-needed resources, enabling the processing of far larger datasets than would be possible at any individual lab, thus resolving bottlenecks and enabling new discoveries. The PhenoMeNal project has developed such an infrastructure, allowing users to run analyses on local or commercial cloud platforms.
To show how a typical analysis may benefit from scaling up to a cloud solution, we took a conventional NMR tool, BATMAN, and examined how it performs on differing levels of compute resource. We carried out tests at three different levels:
1) a high-end standalone desktop machine (8 cores),
2) a medium-scale cluster (50 cores), and
3) a large-scale cluster (more than 1000 cores).
In each case we used BATMAN to quantify 9 metabolites in 2000 1H NMR spectra of blood serum from the Multi-Ethnic Study of Atherosclerosis. Initial tests show that a dataset which takes 3 days to process on the desktop could be processed in just 6 hours on the medium-scale cluster, suggesting that further gains can be expected from increasing the number of cores again. Overall, this investigation demonstrates the benefits, but also the limitations, of large-scale compute infrastructures in processing large metabolomic datasets.
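As a rough illustration of the timing comparison above, the short Python sketch below works out the speed-up implied by the quoted figures. It is not part of the webinar material: the function names are hypothetical, and "3 days" is assumed to mean exactly 72 hours.

```python
def speedup(baseline_hours: float, scaled_hours: float) -> float:
    """Return how many times faster the scaled run is than the baseline."""
    return baseline_hours / scaled_hours


def relative_efficiency(observed_speedup: float,
                        baseline_cores: int,
                        scaled_cores: int) -> float:
    """Return the observed speed-up divided by the increase in core count."""
    return observed_speedup / (scaled_cores / baseline_cores)


# Figures quoted in the text; "3 days" is assumed to be exactly 72 hours.
desktop_hours = 3 * 24
cluster_hours = 6
desktop_cores = 8
cluster_cores = 50

observed = speedup(desktop_hours, cluster_hours)
efficiency = relative_efficiency(observed, desktop_cores, cluster_cores)

print(f"Speed-up: {observed:.2f}x")
print(f"Increase in cores: {cluster_cores / desktop_cores:.2f}x")
print(f"Speed-up relative to core increase: {efficiency:.2f}")
```

With these figures the run is 12 times faster for a 6.25-fold increase in cores. Because the quoted timings are approximate and the two machines may differ in more than core count, the ratio is best read as indicative rather than as a measured scaling efficiency.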
This webinar was recorded on 7 February 2018 and was presented by Dr Timothy Ebbels, Reader in Computational Bioinformatics, Imperial College London. The slides from this webinar can be downloaded below.
You can learn more about PhenoMeNal in our Train online course, PhenoMeNal: accessing metabolomics workflows in Galaxy.
See the EMBL-EBI training pages for a list of upcoming webinars.
This webinar is for scientists with an interest in metabolomics. No prior bioinformatics experience is needed, but some familiarity with metabolomics workflows is recommended.
About this course
- Outline the challenges of big data
- Describe the PhenoMeNal infrastructure for cloud computing
- Outline advantages of cloud computing compared with desktop computing