Key stages of a metabolomics study

Overview of complete analysis workflow

Both targeted and untargeted metabolomics studies follow a similar pipeline. An example of this pipeline for mass spectrometry-based metabolomics studies is shown in Figure 4 below.

Figure 4 A flowchart showing the main steps typically involved in a mass spectrometry (MS)-based metabolomics study.

Next, we will go through each of the main steps in more detail.

Design experiment

The study design, also known as 'experimental design', is of paramount importance for every study. It is essential to make sure that the samples collected reflect and represent the biology in question. In order to determine and examine the most influential factors that are relevant for the hypothesis under investigation, external factors that can affect the experiment have to be eliminated or identified so that they can be accounted for during data analysis.

In the study design, factors like sample size, randomisation, and storage must all be taken into account to guarantee reproducible and successful experiments that minimise erroneous variability, and yet highlight the metabolites of interest and their potential interactions (Figure 5).

Figure 5 Some important considerations when designing a metabolomics study.
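Randomisation, one of the design factors mentioned above, is often applied to the order in which samples are measured, so that instrument drift is not confounded with the biological groups. The following is a minimal sketch of that idea, not a prescribed protocol. The sample names, the group sizes and the function name randomised_run_order are hypothetical and chosen only for illustration.

```python
import random


def randomised_run_order(sample_ids, seed=None):
    """Return a shuffled copy of sample_ids to use as the acquisition order."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    order = list(sample_ids)   # copy, so the caller's list is left untouched
    rng.shuffle(order)
    return order


# Hypothetical study: six control and six treated samples.
samples = [f"control_{i}" for i in range(1, 7)] + [f"treated_{i}" for i in range(1, 7)]
run_order = randomised_run_order(samples, seed=42)

for position, sample in enumerate(run_order, start=1):
    print(position, sample)
```

Recording the seed alongside the study metadata lets the same run order be regenerated later.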

Noise (or error) is an important consideration to factor in because it distorts the signals in your data. There are two types of noise:

  • Random noise - this results from contaminants and general technological limitations. It produces signal spikes and discontinuous data that could be mistaken for meaningful data.
  • Systematic noise - this results from external factors that are not relevant for the study. Baseline drift is one example of systematic noise and is a common problem in liquid chromatography-mass spectrometry (LC-MS) where the gradient of the mobile phase causes the chromatographic baseline to be irregular.

 

Sample preparation

Sample preparation usually involves the following steps (Figure 6):

  • collection;
  • storage;
  • extraction;
  • preparation;
  • custom preparation for individual measurement systems, e.g. derivatisation for gas chromatography.

Figure 6 The main steps involved in sample preparation.

Extraction techniques

Solid-phase extraction (SPE)

SPE is a process whereby compounds which are dissolved or suspended in a liquid mixture are separated from other compounds according to their chemical and physical properties. SPE is often used in metabolomics laboratories to concentrate and purify a sample.

Chromatography

Chromatography is an important step in metabolomics experiments, used to separate individual metabolites from a mixture. The chromatographic techniques most commonly coupled to mass spectrometry are gas and liquid chromatography. Through interactions of analytes with a mobile and a stationary phase, compounds are separated and elute from the chromatographic column at different time points, depending on their physicochemical properties.

Mass spectrometry

Mass spectrometry (MS) is an analytical technique used to measure small molecules. The small molecules may be introduced into the mass spectrometer either directly (direct infusion) or through a coupled chromatographic system. The analytes are ionised at an ion source before they can be detected in a coupled mass detector (Figure 7). The resulting data typically consist of mass-to-charge (m/z), time, and intensity triplets that describe, for every detected ion mass, the strength of the ion beam and the time at which it is detected by the spectrometer.
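To make the triplet structure concrete, here is a small sketch that stores a handful of (m/z, time, intensity) points and pulls out the intensity-over-time trace for one ion mass. All values are made up, and the names Point and extracted_ion_chromatogram, as well as the 0.01 m/z tolerance, are assumptions made for this example rather than part of any particular file format or software package.

```python
from collections import namedtuple

# One detected data point: mass-to-charge ratio, time (s) and intensity.
Point = namedtuple("Point", ["mz", "time", "intensity"])

# Made-up data for illustration only.
data = [
    Point(180.063, 60.0, 1200.0),
    Point(180.064, 61.0, 5400.0),
    Point(180.062, 62.0, 2100.0),
    Point(255.232, 60.0, 800.0),
    Point(255.233, 61.0, 950.0),
]


def extracted_ion_chromatogram(points, target_mz, tol=0.01):
    """Return (time, summed intensity) pairs for points within tol of target_mz."""
    by_time = {}
    for p in points:
        if abs(p.mz - target_mz) <= tol:
            by_time[p.time] = by_time.get(p.time, 0.0) + p.intensity
    return sorted(by_time.items())


xic = extracted_ion_chromatogram(data, target_mz=180.063)
for time, intensity in xic:
    print(time, intensity)
```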

Find out more about mass spectrometry, including an interactive animation of how a mass spectrometer works (4).

Figure 7 Modules of a simple mass spectrometer.
Steps

 

1. Sample inlet: The port through which samples enter the mass spectrometer. A mass spectrometer can be combined with a chromatographic technique or used via direct infusion without prior separation of analytes.

2. Ion source: Ionisation techniques are grouped into hard and soft. Hard ionisation, such as electron impact ionisation (EI), heavily fragments a compound by creating high-energy electrons that interact with an analyte. In contrast, soft ionisation, such as electrospray ionisation (ESI), ionises a compound but creates only a few fragments.

3. Mass analyser: Generated ions are separated by their m/z ratio in the mass analyser where, for simplicity, the charge is often assumed to be equal to one. Consequently, an m/z ratio approximately equals the molecular mass of an ion. All mass analysers exploit the mass and electrical charge properties of ions but use different separation methods.

4 & 5. Detector and recorder: Separated ions are detected by a mass detector that scans a pre-defined mass range at close intervals. The chromatographic profile of an ion, i.e. the generated continuous ion beam, is recorded across multiple scans at discrete time intervals.
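The relationship between mass, charge and m/z described in step 3 can be sketched as a short calculation. This example assumes positive-mode ionisation by protonation, which is only one of several possible ion types and is not specified in the text above. The function name mz_protonated is hypothetical, and glucose (monoisotopic mass about 180.06339 Da) is used purely as an illustrative analyte.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons (Da)


def mz_protonated(neutral_mass, charge=1):
    """Return the m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


glucose_mass = 180.06339  # monoisotopic mass of glucose in Da

# With a single charge, m/z is close to the molecular mass of the ion.
print(round(mz_protonated(glucose_mass, charge=1), 4))
# With two charges, the same molecule appears at roughly half that m/z.
print(round(mz_protonated(glucose_mass, charge=2), 4))
```

The second call illustrates why the single-charge assumption matters: a doubly charged ion would otherwise be mistaken for a much lighter compound.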

 

Nuclear magnetic resonance (NMR)

NMR is an analytical technique that is used to measure organic and some inorganic compounds inside biological samples (as solid tissue or extracted metabolites). When a sample is exposed to a magnetic field and a radio frequency (rf) pulse, the nuclei absorb and re-emit this electromagnetic radiation. The emitted energy has a specific resonance frequency, which depends on several factors, including the magnetic properties of the isotope and the strength of the magnetic field. Small differences in this frequency caused by the chemical environment of a nucleus are referred to as chemical shifts. In metabolomics, the hydrogen nuclei (protons) of small molecules are usually investigated (1H-NMR). For further information and animations on NMR concepts, see this website (4).

NMR-based metabolomics is a non-invasive and non-destructive technique with high reproducibility, making it a powerful tool for searching for novel biomarkers. For NMR, we measure the resulting signal from small molecules’ protons resonating within a magnetic field. One of the first uses of NMR was to detect metabolites in unmodified biological samples (5).

Find out more about different analytical techniques used in metabolomics studies.

Comparison of NMR and MS

The two most common techniques used in data acquisition are nuclear magnetic resonance and mass spectrometry. Table 1 shows some of the key differences between the two techniques.

Table 1 Contrasting some of the advantages and disadvantages of NMR and MS.

 

Property | Nuclear magnetic resonance (NMR) | Mass spectrometry (MS)
Sensitivity | Low | High
Reproducibility | Very high | Average
Number of detectable metabolites | 30-100 | 300-1000+ (depending on whether GC-MS or LC-MS is used)
Targeted analysis | Not optimal for targeted analysis | Better for targeted analysis than NMR
Sample preparation | Minimal sample preparation required | More complex sample preparation required
Tissue extraction | Not required. Tissues can be analysed directly | Requires tissue extraction
Sample analysis time | Fast. The whole sample can be analysed in one measurement | Takes longer than NMR. Requires different chromatography techniques depending on the type of metabolites analysed
Instrument cost | More expensive and occupies more space than MS | Cheaper and occupies less space than NMR
Sample cost | Low cost per sample | High cost per sample

Data processing

Data processing aims to extract biologically relevant information from the acquired data. It includes many steps that are similar for MS and NMR. A good understanding of the steps involved is important in order to minimise the risk of skewed or false results. Typically, the endpoint of MS and NMR metabolomics studies is an (annotated) feature matrix (Figure 8). A feature is typically a peak or signal that represents a chemical compound. Thus, a feature matrix contains the intensities or (relative) abundances of relevant signals for every sample, describing the metabolomics fingerprint. Ultimately, this feature matrix becomes a list of identified metabolites with semi-quantified or quantified values.

Figure 8 Example of an MS feature matrix. Transpositions of the matrix are also common.

To compile a feature matrix, noise reduction and background correction are essential before features can be extracted via peak picking (Figure 8). This process greatly tidies up the data. Extracted features of individual samples are then aligned across samples to compensate for drifts in the chemical shift (NMR) or retention time (MS) (see Figure 9). Aligned features can then be aggregated in a feature matrix: each feature has a characteristic chemical shift (NMR) or mass (MS) that can be used as a column header, and the rows represent individual samples.
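The aggregation step can be sketched as follows for MS data. This is a deliberately simplified illustration, not the method of any particular software package: it assumes peak picking has already been done, it groups peaks across samples by m/z only (ignoring retention time), and it treats a missing peak as zero intensity. The function name build_feature_matrix, the 0.01 m/z tolerance and all data values are hypothetical.

```python
def build_feature_matrix(peak_lists, tol=0.01):
    """Group peaks with neighbouring m/z values into features and tabulate them.

    peak_lists maps a sample name to a list of (mz, intensity) pairs.
    Returns (feature_mzs, matrix), where matrix[sample] holds one intensity
    per feature, in the same order as feature_mzs.
    """
    all_peaks = sorted(
        (mz, intensity, sample)
        for sample, peaks in peak_lists.items()
        for mz, intensity in peaks
    )
    groups = []
    for peak in all_peaks:
        # Extend the current group while the gap to the previous peak is small.
        if groups and peak[0] - groups[-1][-1][0] <= tol:
            groups[-1].append(peak)
        else:
            groups.append([peak])
    # Use the mean m/z of each group as the column header.
    feature_mzs = [round(sum(p[0] for p in g) / len(g), 4) for g in groups]
    matrix = {sample: [0.0] * len(groups) for sample in peak_lists}
    for col, group in enumerate(groups):
        for _, intensity, sample in group:
            matrix[sample][col] += intensity
    return feature_mzs, matrix


# Made-up peak lists for three samples.
peaks = {
    "sample_A": [(180.063, 5400.0), (255.232, 950.0)],
    "sample_B": [(180.065, 4800.0)],
    "sample_C": [(180.061, 5100.0), (255.234, 1200.0)],
}
feature_mzs, matrix = build_feature_matrix(peaks)
print("m/z:", feature_mzs)
for sample, row in matrix.items():
    print(sample, row)
```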


Figure 9 A summary of components contributing to signal distortions. (a) Random noise adds variation to a signal around the mean (zero). (b) Systematic noise, e.g. baseline drifts, introduces a systematic drift or bias in the data that needs to be removed before data analysis. Systematic noise can impact heavily on signal intensities and derived signal areas. (c) The actual signal follows – in theory – a Gaussian distribution. Deviations from this distribution reflect external factors. (d) Overlay of components (a), (b), and (c), and the resulting 'measured' signal (black line).
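The components in Figure 9 can be imitated with a small simulation: an idealised Gaussian peak, zero-mean random noise and a baseline drift are generated separately and then summed into a 'measured' signal. This is only a sketch. The peak position, width and height, the noise level and the linear shape of the drift are arbitrary assumptions, and real baseline drift is rarely this simple.

```python
import math
import random


def gaussian_peak(t, centre, width, height):
    """Idealised peak shape: a Gaussian with the given centre, width and height."""
    return height * math.exp(-((t - centre) ** 2) / (2 * width ** 2))


def simulate_measured_signal(times, seed=0):
    """Return the true signal, random noise, drift and their sum for each time point."""
    rng = random.Random(seed)
    signal = [gaussian_peak(t, centre=50.0, width=5.0, height=100.0) for t in times]
    noise = [rng.gauss(0.0, 2.0) for _ in times]  # random noise around zero
    drift = [0.2 * t for t in times]              # assumed linear baseline drift
    measured = [s + n + d for s, n, d in zip(signal, noise, drift)]
    return signal, noise, drift, measured


times = [float(t) for t in range(0, 101)]
signal, noise, drift, measured = simulate_measured_signal(times)

# The drift inflates the apparent peak height unless it is removed first.
print("true apex:", round(signal[50], 2))
print("measured apex:", round(measured[50], 2))
print("after subtracting the drift:", round(measured[50] - drift[50], 2))
```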

Both types of noise described under 'Design experiment' are illustrated in Figure 9: random noise in panel (a) and systematic noise, such as the baseline drift common in LC-MS, in panel (b).

Analysis and interpretation

In the context of metabolomics, the most common statistical analysis approaches are grouped into univariate and multivariate methods. Each approach offers different insights into the data structure. Multivariate analysis works on a matrix of variables and highlights characteristics based on the relationships between all variables. Univariate analysis considers only one variable at a time, so each variable is assessed independently of the others.
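As an illustration of the univariate approach, the sketch below compares two groups one feature at a time using Welch's t statistic. The choice of this particular test is an assumption made for the example, since the text does not name a specific univariate method, and the feature names and intensity values are made up.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two groups."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))


# Made-up intensities for two features in control and treated samples.
control = {"feature_1": [10.0, 11.0, 9.0, 10.0], "feature_2": [5.0, 7.0, 6.0, 6.0]}
treated = {"feature_1": [15.0, 16.0, 14.0, 15.0], "feature_2": [6.0, 5.0, 7.0, 6.0]}

# Each feature is tested on its own, ignoring all other features.
t_values = {name: welch_t(control[name], treated[name]) for name in control}
for name, t in t_values.items():
    print(name, round(t, 3))
```

When many features are tested in this way, the resulting p-values normally need a correction for multiple testing.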

The goal of statistical analysis is the categorisation and prediction of sample properties through generation of models that capture the information contained in data matrices. In mass spectrometry, the m/z ratio and signal intensity are the two most important variables. In NMR we select integrated signals of interest for data analysis.

Figure 10 (a) Example PCA plot of three batches (red, green, blue). The red batch exhibits a strong batch effect. (b) Pearson's correlation heatmap.

Without venturing too much into the area of statistics, principal component analysis (PCA, Figure 10a) and partial least squares (PLS) are established methods for multivariate analysis of metabolomics data. PCA is a method that enables us to reduce the dimensionality of our data into a small number of inferred variables (principal components), thus helping us to identify major trends and features.
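To show what an inferred variable looks like in practice, the following sketch computes only the first principal component of a tiny feature matrix. It uses power iteration on the covariance matrix so that it runs without third-party libraries. This is an illustrative shortcut rather than how statistics packages usually implement PCA, and the function name and data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) of the first principal component.

    rows is a list of samples, each a list of feature values.
    Assumes at least two samples and some variance in the data.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the features (p x p).
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centred sample onto the component to obtain its score.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores


# Made-up matrix: four samples (rows) by three features (columns).
rows = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.0, 0.6],
    [4.0, 7.9, 0.5],
]
loadings, scores = first_principal_component(rows)
print("loadings:", [round(x, 3) for x in loadings])
print("scores:", [round(s, 3) for s in scores])
```

In this example the first two features rise together across samples, so they dominate the loadings, and the four samples are reduced from three values each to a single score.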

These dimensionality-reduction methods can be used in classification, regression, and prediction exercises. The quality of the statistical models that we infer depends significantly on the data pre-processing, scaling and normalisation methods used. Therefore, successful data analysis requires careful investigation of multiple models for consensus building (i.e. don't rely on a single model!).
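The effect of scaling can be seen in a short sketch. It shows two commonly used options, autoscaling (dividing each mean-centred feature by its standard deviation) and Pareto scaling (dividing by the square root of the standard deviation). These two methods are chosen here only as examples, since the text above does not recommend a specific one, and the function name scale_columns and the data are hypothetical.

```python
import math
import statistics


def scale_columns(matrix, method="auto"):
    """Mean-centre each column, then divide by SD ("auto") or sqrt(SD) ("pareto").

    matrix is a list of samples (rows), each a list of feature values.
    Assumes every column has a non-zero standard deviation.
    """
    if method not in ("auto", "pareto"):
        raise ValueError("method must be 'auto' or 'pareto'")
    scaled_columns = []
    for col in zip(*matrix):
        mean = statistics.mean(col)
        sd = statistics.stdev(col)
        divisor = sd if method == "auto" else math.sqrt(sd)
        scaled_columns.append([(x - mean) / divisor for x in col])
    # Transpose back so that rows are samples again.
    return [list(row) for row in zip(*scaled_columns)]


# Made-up matrix: one high-abundance and one low-abundance feature.
matrix = [
    [1000.0, 1.0],
    [2000.0, 2.0],
    [3000.0, 3.0],
]
autoscaled = scale_columns(matrix, method="auto")
pareto_scaled = scale_columns(matrix, method="pareto")
print(autoscaled)
print(pareto_scaled)
```

After autoscaling both features carry equal weight, whereas Pareto scaling leaves the high-abundance feature with more influence. A multivariate model built on each version can therefore look quite different.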

Figure 10b shows a correlation heatmap of a feature matrix, typically used in metabolomics analysis.
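The numbers behind such a heatmap can be sketched as a matrix of pairwise Pearson correlation coefficients. This example correlates the features (columns) of a small made-up feature matrix. Correlating samples instead would work the same way on the transposed matrix, and which of the two Figure 10b shows is not stated here, so the choice is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(matrix):
    """Pairwise Pearson correlations between the columns (features) of matrix."""
    columns = list(zip(*matrix))
    return [[pearson(a, b) for b in columns] for a in columns]


# Made-up matrix: four samples (rows) by three features (columns).
feature_matrix = [
    [1.0, 2.0, 9.0],
    [2.0, 4.0, 7.0],
    [3.0, 6.0, 5.0],
    [4.0, 8.0, 3.0],
]
corr = correlation_matrix(feature_matrix)
for row in corr:
    print([round(r, 2) for r in row])
```

Each cell of the heatmap is then coloured according to the corresponding coefficient, which ranges from -1 to 1.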

More information about data analysis and interpretation of metabolomics studies can be found in the materials from the first three days of our 2014 EMBO Practical Course on Metabolomics Bioinformatics for Life Scientists.