Statistical Workflows in PhenoMeNal


Statistical analysis offers powerful approaches to mine complex, high-dimensional datasets, find significant features, and build models with high predictive performance. Because of the large number of available methods, their complex mathematical background, and the pitfalls of bias and overfitting, a good understanding of each step, together with an environment for managing the whole workflow efficiently, is of major importance.

In this webinar, we will see how to build a statistical workflow within the user-friendly Galaxy environment, including: normalization (signal drift and batch effect), quality control, univariate hypothesis testing, multivariate modelling with (Orthogonal) Partial Least Squares, and feature selection (with Partial Least Squares, Random Forest and Support Vector Machines).
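As a rough illustration of what typically follows the univariate hypothesis testing step, the sketch below adjusts one p-value per feature for multiple testing using the Benjamini-Hochberg procedure. This is an assumption for illustration only: the webinar description does not name a correction method, the function name is hypothetical, and the p-values are made up rather than taken from MTBLS404.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five features (illustrative, not from MTBLS404).
p_values = [0.001, 0.04, 0.03, 0.2, 0.5]
adjusted = benjamini_hochberg(p_values)

# Keep the features whose adjusted p-value passes a 5% threshold.
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(significant)
```

With thousands of metabolite features tested at once, a correction of this kind limits the share of false discoveries among the features reported as significant.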

Our example dataset (MTBLS404), which can be downloaded from the MetaboLights repository, comes from the Sacurine study, which aims to discover physiological variations of the human urine metabolome with age, body mass index, and gender.

We will be using the PhenoMeNal platform. We will also see how data from the MetaboLights repository can be uploaded directly into Galaxy workflows. Additional statistical modules and public analyses are available on the Workflow4Metabolomics platform.

This webinar was recorded on 18th April 2018 and was presented by Etienne Thévenot. It is best viewed in full screen mode using Google Chrome. The slides from this webinar can be downloaded below.

See the EMBL-EBI training pages for a list of upcoming webinars.

This webinar is aimed at metabolomics researchers and bioinformaticians.

No prior knowledge of bioinformatics is required, but an understanding of metabolomics would be an advantage.

About this course

Etienne Thévenot
Learning objectives: 
  • List the statistical steps in the analysis of metabolomics data in PhenoMeNal
  • Identify the statistical parameters and options for the analysis of metabolomics data in PhenoMeNal