Genome bioinformatics: from short- to long-read sequencing

Course at EMBL-EBI


A guide to the technology, analysis workflows, tools, and resources for next-generation sequencing data analysis.

This course will provide training in how biological knowledge can be derived from genomics experiments and explain different approaches to analysing such data. The main focus will be on sequence informatics, re-sequencing, the differences between short- and long-read sequencing, and variant calling in the analysis of higher eukaryotes, with an emphasis on human genetic research. Throughout the week, more advanced topics will introduce the creation of pipelines, automation, and the scaling-up of analyses.

Practical sessions will be run on datasets prepared by the trainers, not on personal research data. Participants will learn how to process these training datasets and to apply appropriate statistical methods in their analyses.

Who is this course for?

The course is aimed at PhD students and post-doctoral researchers who are starting to use high-throughput sequencing technologies and bioinformatics methods in their research. The content is most applicable to those working with eukaryotic genomes, especially in the area of human genomics.

Participants will require a basic knowledge of the Unix command line to complete the practical sessions. A short pre-course session will be offered. Additionally, we recommend this free tutorial or a similar one:

The Unix Shell: https://swcarpentry.github.io/shell-novice/

Please note that participants without this basic knowledge will have difficulty completing the practical sessions.

What will I learn?

Learning outcomes

After the course participants will be able to:

State the advantages and limitations of short- and long-read sequencing technologies
Apply appropriate quality control (QC) methods and aligners to unassembled short and long reads
Perform variant calling analysis and annotation
Scale up and automate simple genomics pipelines
Access genomic datasets from online public resources

Course content

During this course you will learn about:

Quality control methods for cleaning raw sequencing data
Alignment of reads to a reference genome
File format conversion and processing
Tools for variant calling (both single nucleotide and copy number analysis)
Approaches for scaling up and reproducible research
Methodologies for variant annotation
Resources for genomic data
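
As a rough illustration of how the first few topics above fit together, the sketch below assembles the commands for a minimal short-read workflow in Python. The tools named (FastQC, BWA-MEM, SAMtools, BCFtools) are assumptions chosen as common examples, not necessarily the ones taught on the course, and the function name build_pipeline and all file names are hypothetical. The function only builds command strings; nothing is executed.

```python
from pathlib import PurePosixPath


def build_pipeline(reference, reads_1, reads_2, sample, outdir="results"):
    """Return, in order, the shell commands for a minimal short-read workflow.

    Assumes paired-end reads and a reference that has already been indexed
    (for example with `bwa index`). All names here are illustrative.
    """
    out = PurePosixPath(outdir)
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"
    vcf = out / f"{sample}.vcf.gz"
    return [
        # 1. Quality control: one report per raw read file
        f"fastqc -o {out} {reads_1} {reads_2}",
        # 2. Alignment of reads to the reference genome (SAM output)
        f"bwa mem {reference} {reads_1} {reads_2} > {sam}",
        # 3. File format conversion: SAM to coordinate-sorted, indexed BAM
        f"samtools sort -o {bam} {sam}",
        f"samtools index {bam}",
        # 4. Variant calling: SNPs and indels, written as compressed VCF
        f"bcftools mpileup -f {reference} {bam} | bcftools call -mv -Oz -o {vcf}",
    ]


if __name__ == "__main__":
    commands = build_pipeline(
        "ref.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample"
    )
    for command in commands:
        print(command)
```

Keeping the steps as a plain list of strings makes the order of the workflow explicit and easy to inspect before anything is run; in practice a workflow manager would usually take over this role once analyses are scaled up.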

Trainers

Chiara Batini
University of Leicester

Kayesha Coley
University of Leicester

Erik Garrison
University of Tennessee Health Science Center

Mohab Helmy Abdelfattah Mostafa Elbishbishy
EMBL-EBI

Maira Ihsan
EMBL-EBI

Sean Laidlaw
Wellcome Sanger Institute

Louisse Mirabueno
EMBL-EBI

Raheleh Rahbari
Wellcome Sanger Institute

Dona Shaju
EMBL-EBI

Malvika Sharan
The Alan Turing Institute

Charles Solomon
University of Leicester

Maxime Tarabichi
Institut de Recherche Interdisciplinaire en Biologie Humaine et Moléculaire (IRIBHM)

Programme

Please note that the programme is still subject to minor changes.
All times in the programme are listed in GMT.

Time	Topic	Trainer
Pre-course session (virtual) – Thursday 16 November 2023
12:00 – 15:00	Introduction to Unix and BASH	Kayesha Coley

Time	Topic	Trainer
Day one – Monday 20 November 2023
11:00 – 11:30	Arrival and registration
11:30 – 12:00	Welcome and introduction to EMBL-EBI	Daniel Thomas Lopez
12:00 – 13:00	Overview of NGS technologies	Chiara Batini
13:00 – 14:00	Lunch
14:00 – 14:30	Quality control – lecture	Charles Solomon
14:30 – 15:30	Quality control – practical	Charles Solomon
15:30 – 16:00	Break
16:00 – 18:00	Read mapping	Chiara Batini and Charles Solomon
18:00	End of day one
18:00 – 18:30	Check-in at Hinxton Hall Conference Centre, and free time
18:30	Dinner at Hinxton Hall Conference Centre
Day two – Tuesday 21 November 2023
09:00 – 09:30	Recap of day one	Participants
09:30 – 11:00	SAM/BAM file formats – BAM refinement, QC, and visualisation, part one	Chiara Batini
11:00 – 11:30	Break
11:30 – 12:30	SAM/BAM file formats – BAM refinement, QC, and visualisation, part two	Chiara Batini
12:30 – 14:00	Lunch and group one poster session
14:00 – 15:30	Variant calling (SNPs and indels)	Chiara Batini
15:30 – 16:00	Break
16:00 – 18:00	Variant calling and filtering – practical	Chiara Batini
18:00	End of day two
18:00 – 18:30	Free time
18:30	Dinner at Hinxton Hall
Day three – Wednesday 22 November 2023
09:00 – 09:30	Recap of day two	Participants
09:30 – 11:00	Variant calling (SVs and CNVs) – lecture	Maxime Tarabichi
11:00 – 11:30	Break
11:30 – 13:00	Variant calling (SVs and CNVs) – practical	Maxime Tarabichi
13:00 – 14:00	Lunch
14:00 – 15:30	Long-read sequencing, part one	Mohab Helmy
15:30 – 16:00	Break
16:00 – 18:00	Long-read sequencing, part two	Mohab Helmy
18:00 – 18:30	Free time
18:30	Coach departs for The Tickell Arms, Whittlesford from Hinxton Hall Conference Centre Reception
18:45 – 21:00	Dinner at The Tickell Arms, Whittlesford
21:00	Coach returns to Hinxton Hall Conference Centre
Day four – Thursday 23 November 2023
09:00 – 09:30	Recap of day three	Participants
09:30 – 11:30	Scaling things up: Genome bioinformatics on clusters & parallel computing – lecture and practical	Sean Laidlaw
11:30 – 12:00	Break
12:00 – 13:00	Introduction to GitHub and sharing research code – lecture	Sean Laidlaw
13:00 – 14:00	Lunch
14:00 – 16:00	Introduction to GitHub and sharing research code – practical	Sean Laidlaw
16:00 – 17:00	Break and group two poster session
17:00 – 18:00	Open and reproducible research	Yo Yehudi
18:00	End of day four
18:00 – 18:30	Free time
18:30	Dinner at Hinxton Hall
Day five – Friday 24 November 2023
09:00 – 09:30	Recap of day four	Participants
09:30 – 11:00	Ensembl genome browser and VEP	Louisse Mirabueno
11:00 – 11:30	Break
11:30 – 12:30	European Nucleotide Archive	Maira Ihsan
12:30 – 13:00	Data management	Daniel Thomas Lopez
13:00 – 14:00	Lunch
14:00 – 15:00	European Variation Archive	Dona Shaju
15:00 – 15:45	Keynote lecture (virtual): Human Pangenome Consortium	Erik Garrison
15:45 – 16:15	Course wrap-up and feedback	Daniel Thomas Lopez
16:15	End of course
16:30	Shuttle to Cambridge train station

This course has ended

20 – 24 November 2023

European Bioinformatics Institute

United Kingdom

£825.00, inclusive of four nights' accommodation and catering, including dinner

Contact
Sophie Spencer

Organisers

Chiara Batini
University of Leicester
Raheleh Rahbari
Wellcome Sanger Institute
Daniel Thomas Lopez
EMBL-EBI
