
Bioinformatics analysis

Downstream analysis of data generated from sequencing eDNA is required to quantify or assess biodiversity. General steps for analysing metabarcoding data are:

  1. QC (quality control): Examine the quality of the raw sequence data and check that reads are of the expected length, using a tool such as FastQC.
  2. Primer removal: Primer sequences are identical regardless of the template sequence, so they carry no information about the sample and need to be removed before downstream analysis, using a tool such as cutadapt.
  3. Filtering: Remove low-quality reads and residual adapter sequences, using a tool such as Trimmomatic or DADA2.
  4. Sequence clustering or denoising: Recover accurate biological sequences from the filtered data. Many analysis tools are available; commonly used ones are discussed below.
  5. Chimera removal: Filter out chimeras, which are artefact sequences formed during PCR when fragments of two different templates join. This step is usually included in the same analysis tools as step 4.
  6. Assigning taxonomy: Compare your sequences to a reference database to assign taxonomy. This is discussed in the ‘Taxonomic classification’ section.
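To make steps 1 to 3 more concrete, the sketch below shows in plain Python the kind of checks these tools perform. It is a minimal illustration under stated assumptions, not a replacement for FastQC, cutadapt, Trimmomatic or DADA2: it assumes uncompressed FASTQ with four lines per record and Phred+33 quality encoding, uses a hypothetical 4-base primer and toy reads, requires an exact primer match at the start of each read (cutadapt also tolerates mismatches and indels), and filters on expected errors, the sum of each base's error probability 10^(-Q/10), which is the idea behind DADA2's `maxEE` parameter. All function names and the threshold value are illustrative.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterator, Optional


@dataclass
class Read:
    header: str
    seq: str
    qual: str  # quality string, assumed Phred+33 encoded


def parse_fastq(lines: list[str]) -> Iterator[Read]:
    """Yield reads from 4-line FASTQ records: header, sequence, '+', quality."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.strip() for line in lines[i:i + 4])
        yield Read(header, seq, qual)


def phred_scores(qual: str) -> list[int]:
    """Convert a Phred+33 quality string to integer Q scores."""
    return [ord(c) - 33 for c in qual]


def expected_errors(qual: str) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over its bases."""
    return sum(10 ** (-q / 10) for q in phred_scores(qual))


def trim_primer(read: Read, primer: str) -> Optional[Read]:
    """Remove an exactly matching primer from the 5' end; return None if absent."""
    if not read.seq.startswith(primer):
        return None
    n = len(primer)
    return Read(read.header, read.seq[n:], read.qual[n:])


# Toy FASTQ data (hypothetical); 'I' = Q40, '#' = Q2
raw = """@read1
GTCAACGTTGCA
+
IIIIIIIIIIII
@read2
GTCAACGTTGCA
+
IIII########
@read3
TTTTACGTTGCA
+
IIIIIIIIIIII""".splitlines()

PRIMER = "GTCA"  # hypothetical forward primer
MAX_EE = 2.0     # illustrative threshold, analogous to DADA2's maxEE

reads = list(parse_fastq(raw))

# Step 1: QC summary, read length and mean quality per read
for r in reads:
    print(f"{r.header}\tlength={len(r.seq)}\tmeanQ={mean(phred_scores(r.qual)):.1f}")

# Step 2: primer removal (reads without the primer are discarded)
trimmed = [t for r in reads if (t := trim_primer(r, PRIMER)) is not None]

# Step 3: keep reads whose expected errors do not exceed MAX_EE
kept = [r for r in trimmed if expected_errors(r.qual) <= MAX_EE]
print("kept:", [r.header for r in kept])
```

Here `@read1` passes, `@read2` is removed because its low-quality tail gives it roughly 5 expected errors after trimming, and `@read3` is dropped because it lacks the primer.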

Commonly used open-source tools and packages that handle several of the above steps include:

  • DADA2 – An open-source R package for modelling and correcting errors in amplicon data from Illumina sequencing. It infers exact sample sequences and can resolve variants that differ by as little as a single nucleotide (Callahan BJ et al, 2016)
  • MOTHUR – Mothur is designed as an all-in-one software package that enables users to analyse community sequence data using a single platform. It integrates and expands on earlier tools, offering a flexible and robust solution for sequencing data analysis (Schloss PD et al, 2009)
  • QIIME 2 – A platform for modular analysis of microbiome data (Bolyen E et al, 2019)
  • UPARSE – A pipeline for de novo construction of operational taxonomic units (OTUs) from next-generation sequencing reads, which recovered biological sequences with high accuracy and gave more accurate richness estimates on mock communities (Edgar R, 2013)
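One key difference among the tools above is how they carry out step 4: UPARSE clusters reads into OTUs, whereas DADA2 resolves exact sequence variants. As a rough illustration of the clustering idea, the Python sketch below first dereplicates reads (collapsing identical sequences and counting them), then processes unique sequences from most to least abundant, adding each to the first existing OTU centroid within 97% identity or otherwise starting a new OTU. This is a simplified sketch under stated assumptions, not UPARSE's actual algorithm: identity is counted position by position on equal-length sequences instead of from an alignment, 97% is simply the conventional threshold, chimera checking is omitted, and all sequences and function names are hypothetical.

```python
from collections import Counter


def dereplicate(seqs: list[str]) -> list[tuple[str, int]]:
    """Collapse identical reads into (sequence, count) pairs, most abundant first."""
    return Counter(seqs).most_common()


def identity(a: str, b: str) -> float:
    """Fraction of identical positions. Simplification: assumes equal-length,
    already-aligned sequences and returns 0.0 otherwise (real tools align)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(uniques: list[tuple[str, int]], threshold: float = 0.97) -> dict[str, int]:
    """Greedy abundance-sorted clustering: each unique sequence joins the first
    centroid it matches at >= threshold identity, otherwise it founds a new OTU."""
    otus: dict[str, int] = {}  # centroid sequence -> total read count
    for seq, count in uniques:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:  # no existing centroid was close enough
            otus[seq] = count
    return otus


def substitute(seq: str, positions: list[int]) -> str:
    """Toy-data helper: change the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


base = "ACGT" * 25                           # hypothetical 100 bp amplicon
near = substitute(base, [10])                # 1 difference: 99% identity to base
far = substitute(base, [0, 20, 40, 60, 80])  # 5 differences: 95% identity to base

reads = [base] * 50 + [far] * 20 + [near] * 3  # toy read set
uniques = dereplicate(reads)
otus = greedy_otus(uniques)

print(len(uniques), "unique sequences")                          # 3 unique sequences
print(len(otus), "OTUs:", sorted(otus.values(), reverse=True))  # 2 OTUs: [53, 20]
```

Three distinct sequences collapse into two OTUs: the single-difference variant `near` is absorbed into the `base` OTU, while `far` stays separate. An exact-sequence method such as DADA2 could in principle keep `near` as its own variant; whether it does depends on its error model and the reads' quality scores, which this sketch does not model.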

How these tools work is explained in detail in a published review that you may find worth exploring (Prodan A et al, 2020).

After reviewing the eDNA metabarcoding workflow, take a moment to design your own workflow. Choose the sampling methods, DNA extraction kit, sequencing technology, and data analysis package that best fit your biodiversity project needs.

In the next section you will have an opportunity to use the DADA2 package for data analysis by performing a short practical.