Parallel processing, version control and open, reproducible science

Scaling things up

Trainers: Sean Laidlaw and Raheleh Rahbari

Overview: This lecture provides an overview of processing multiple biological datasets through a variety of methods, such as sequential and parallel computing. The practical provides training on parallelizing processes for genomic data processing and analysis.

Learning outcomes:

By the end of this session you will be able to:

  • List the steps required to conduct sequential and parallel computing.
  • Apply sequential and parallel computing to the processing and analysis of genomic data.

Materials:


Introduction to GitHub

Trainers: Sean Laidlaw and Raheleh Rahbari

Overview: This lecture provides a theoretical and historical overview of version control using Git, and gives technical guidance on Git, GitHub and the open-source GitLab. The practical provides training on making Git commits on the command line.

Learning outcomes:

By the end of this session you will be able to:

  • Define the concepts within version control using Git.
  • Apply version control using Git on the command line.

Materials:


The Turing Way and reproducible research aspects of data science

Trainers: Malvika Sharan

Overview: This lecture provides an introduction to reproducible, ethically-led open science and the Turing Way.

Learning outcomes:

By the end of this session you will be able to:

  • List steps scientists can take to make their research more reproducible and ethically-led in open science frameworks.

Materials: