Metabolomic datasets are becoming increasingly large and complex, and multiple types of algorithms and workflows are needed to process and analyse them. This makes it difficult for researchers to make sense of their data without access to extensive computational and bioinformatics support. A cloud infrastructure with portable software tools can provide much-needed resources, enabling the processing of much larger datasets than would be possible at any individual lab, thus resolving bottlenecks and enabling new discoveries. The PhenoMeNal project has developed such an infrastructure, allowing users to run analyses on local or commercial cloud platforms. To show how a typical analysis may benefit from up-scaling to a cloud solution, we took a conventional NMR tool, BATMAN, and examined how it performs on differing levels of compute resource. We carried out tests at three different levels: 1) a high-end standalone desktop machine (8 cores), 2) a medium-scale cluster (50 cores), and 3) a large-scale cluster (>1000 cores). In each case we used BATMAN to quantify 9 metabolites in 2000 1H NMR spectra of blood serum from the Multi-Ethnic Study of Atherosclerosis. Initial tests show that a dataset which takes 3 days to process on a desktop could be processed in just 6 hours on the medium-scale cluster, suggesting that further improvements can be expected from increasing the number of cores. Overall, this investigation demonstrates the benefits, but also the limitations, of large-scale compute infrastructures in processing large metabolomic datasets.
This webinar will be presented by Dr Timothy Ebbels, Reader in Computational Bioinformatics, Imperial College London.
This webinar is for scientists with an interest in metabolomics. No prior bioinformatics experience is needed, but some familiarity with metabolomics workflows is recommended.