
Why create workflows?

On the previous page, we saw that workflows can include a range of tools, including tools written in different programming languages. The aim of a workflow is to bring these tools together in an automated way, making the analysis steps easier to run, easier to share and more reproducible.

What is reproducibility?

Reproducibility is about being able to reproduce data analyses and their results. This could simply be you reproducing your own analysis in a few months’ time, or someone you don’t know reproducing an analysis that you have published in a journal article. Either way, reproducibility is a key aim for life sciences researchers, because it allows both you and others to build on your work.

For example, when planning an analysis, we need to consider and keep track of:

  • The format of the data at all stages
  • Any pre-processing steps
  • Tools/software used, including which release/version
  • Parameters we select when using tools

Workflows go a step beyond simply keeping track of these elements: they let us build the decisions we make, such as the parameters we select, into the analysis itself, which helps to ensure that our analyses are reproducible.
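As a rough illustration of what building decisions into an analysis can mean, here is a minimal Python sketch. It is not how Nextflow, Snakemake or Galaxy work internally. It simply runs a fixed sequence of steps with fixed parameters and records, for each step, the tool version and parameters used, plus a checksum of the input data. The step names, the functions `to_upper` and `filter_short`, and the version strings are all hypothetical, chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One analysis step, with the details needed to reproduce it."""
    name: str
    func: Callable
    tool_version: str  # hypothetical version string for the tool used
    params: dict = field(default_factory=dict)


def to_upper(seqs):
    """Hypothetical pre-processing step: normalise sequences to upper case."""
    return [s.upper() for s in seqs]


def filter_short(seqs, min_length):
    """Hypothetical filtering step: keep sequences of at least min_length."""
    return [s for s in seqs if len(s) >= min_length]


def run_workflow(data, steps):
    """Run the steps in order and return the result plus a provenance log."""
    log = {
        # Checksum of the input, so we can tell later exactly what data was used
        "input_sha256": hashlib.sha256(
            json.dumps(data, sort_keys=True).encode()
        ).hexdigest(),
        "steps": [],
    }
    for step in steps:
        # The parameters are fixed in the workflow definition, not typed by hand
        data = step.func(data, **step.params)
        log["steps"].append({
            "name": step.name,
            "tool_version": step.tool_version,
            "params": dict(step.params),
            "n_records_out": len(data),
        })
    return data, log


if __name__ == "__main__":
    steps = [
        Step("normalise_case", to_upper, "1.0"),
        Step("filter_short", filter_short, "1.0", {"min_length": 4}),
    ]
    result, log = run_workflow(["acgt", "ac", "ggccta"], steps)
    print(result)
    print(json.dumps(log, indent=2))
```

Running the same workflow on the same input always produces the same result and the same log, which is the property that dedicated workflow tools provide in a far more complete and robust way.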

To help us think about the different aspects that contribute to making our analyses and data reproducible, we have the FAIR principles (findable, accessible, interoperable and reusable). We will consider how to apply the FAIR principles to workflows later in this tutorial. For more detail on applying the FAIR principles to data management, see the section ‘Is it FAIR?’ in our online tutorial Bringing data to life: Data management for the biomolecular sciences.

What tools are available for creating workflows?

There are many different platforms and tools for creating workflows. Over the next few pages, we’ll look briefly at three platforms/tools:

  • Nextflow
  • Snakemake
  • Galaxy