Course at EMBL-EBI
Genome bioinformatics: from short- to long-read sequencing
A guide to the technology, analysis workflows, tools, and resources for next-generation sequencing (NGS) data analysis.
This course will provide insight into, and training in, how biological knowledge can be derived from genomics experiments, and will explain different approaches to analysing such data. The main focus will be on sequence informatics, re-sequencing, the differences between short- and long-read sequencing, and variant calling in higher eukaryotes, with an emphasis on human genetic research. Later in the week, more advanced topics will cover building pipelines, automation, and scaling up analyses.
Practical sessions will be run on datasets prepared by the trainers, not on personal research data. Participants will learn how to process these training datasets and to apply appropriate statistical methods in their analyses.
Who is this course for?
The course is aimed at PhD students and post-doctoral researchers who are starting to use high-throughput sequencing technologies and bioinformatics methods in their research. The content is most applicable for those working with eukaryotic genomes, especially in the area of human genomics.
Participants will need a basic knowledge of the Unix command line to complete the practical sessions. A short pre-course session will be offered. We also recommend the following free tutorial, or a similar one:
- The Unix Shell: https://swcarpentry.github.io/shell-novice/
Please note that participants without basic knowledge of the Unix command line will have difficulty completing the practical sessions.
What will I learn?
Learning outcomes
After the course, participants will be able to:
- State the advantages and limitations of short- and long-read sequencing technologies
- Apply appropriate quality control (QC) methods and aligners to unassembled short and long reads
- Perform variant calling analysis and annotation
- Scale up and automate simple genomics pipelines
- Access genomic datasets from online public resources
Course content
During this course you will learn about:
- Quality control methods for cleaning raw sequencing data
- Alignment of reads to a reference genome
- File format conversion and processing
- Tools for variant calling (both single-nucleotide and copy-number variants)
- Approaches for scaling up and reproducible research
- Methodologies for variant annotation
- Resources for genomic data
Trainers
- Chiara Batini, University of Leicester
- Kayesha Coley, University of Leicester
- Erik Garrison, University of Tennessee Health Science Center
- Mohab Helmy Abdelfattah Mostafa Elbishbishy, EMBL-EBI
- Maira Ihsan, EMBL-EBI
- Sean Laidlaw, Wellcome Sanger Institute
- Malvika Sharan, The Alan Turing Institute
- Charles Solomon, University of Leicester
- Maxime Tarabichi, Institut de Recherche Interdisciplinaire en Biologie Humaine et Moléculaire (IRIBHM)
Programme
Please note that the programme is still subject to minor changes.
All times in the programme are listed in GMT.
| Time | Topic | Trainer |
|---|---|---|
| Pre-course session (virtual) – Thursday 16 November 2023 | ||
| 12:00 - 15:00 | Introduction to Unix and BASH | Kayesha Coley |
| Day one – Monday 20 November 2023 | ||
| 11:00 – 11:30 | Arrival and registration | |
| 11:30 – 12:00 | Welcome and introduction to EMBL-EBI | Daniel Thomas Lopez |
| 12:00 – 13:00 | Overview of NGS technologies | Chiara Batini |
| 13:00 – 14:00 | Lunch | |
| 14:00 – 14:30 | Quality control – lecture | Charles Solomon |
| 14:30 – 15:30 | Quality control – practical | Charles Solomon |
| 15:30 – 16:00 | Break | |
| 16:00 – 18:00 | Read mapping | Chiara Batini and Charles Solomon |
| 18:00 | End of day one | |
| 18:00 – 18:30 | Check-in at Hinxton Hall Conference Centre and free time | |
| 18:30 | Dinner at Hinxton Hall Conference Centre | |
| Day two – Tuesday 21 November 2023 | ||
| 09:00 – 09:30 | Recap of day one | Participants |
| 09:30 – 11:00 | SAM/BAM file formats – BAM refinement, QC, and visualisation, part one | Chiara Batini |
| 11:00 – 11:30 | Break | |
| 11:30 – 12:30 | SAM/BAM file formats – BAM refinement, QC, and visualisation, part two | Chiara Batini |
| 12:30 – 14:00 | Lunch and group one poster session | |
| 14:00 – 15:30 | Variant calling (SNPs and indels) | Chiara Batini |
| 15:30 – 16:00 | Break | |
| 16:00 – 18:00 | Variant calling and filtering – practical | Chiara Batini |
| 18:00 | End of day two | |
| 18:00 – 18:30 | Free time | |
| 18:30 | Dinner at Hinxton Hall | |
| Day three – Wednesday 22 November 2023 | ||
| 09:00 – 09:30 | Recap of day two | Participants |
| 09:30 – 11:00 | Variant calling (SVs and CNVs) – lecture | Maxime Tarabichi |
| 11:00 – 11:30 | Break | |
| 11:30 – 13:00 | Variant calling (SVs and CNVs) – practical | Maxime Tarabichi |
| 13:00 – 14:00 | Lunch | |
| 14:00 – 15:30 | Long-read sequencing, part one | Mohab Helmy |
| 15:30 – 16:00 | Break | |
| 16:00 – 18:00 | Long-read sequencing, part two | Mohab Helmy |
| 18:00 – 18:30 | Free time | |
| 18:30 | Coach departs from Hinxton Hall Conference Centre reception for The Tickell Arms, Whittlesford | |
| 18:45 – 21:00 | Dinner at The Tickell Arms, Whittlesford | |
| 21:00 | Coach returns to Hinxton Hall Conference Centre | |
| Day four – Thursday 23 November 2023 | ||
| 09:00 – 09:30 | Recap of day three | Participants |
| 09:30 – 11:30 | Scaling things up: Genome bioinformatics on clusters & parallel computing – lecture | Sean Laidlaw |
| | Scaling things up: Genome bioinformatics on clusters & parallel computing – practical | Sean Laidlaw |
| 11:30 – 12:00 | Break | |
| 12:00 – 13:00 | Introduction to GitHub and sharing research code – lecture | Sean Laidlaw |
| 13:00 – 14:00 | Lunch | |
| 14:00 – 16:00 | Introduction to GitHub and sharing research code – practical | Sean Laidlaw |
| 16:00 – 17:00 | Break and group two poster session | |
| 17:00 – 18:00 | Open and reproducible research | Yo Yehudi |
| 18:00 | End of day four | |
| 18:00 – 18:30 | Free time | |
| 18:30 | Dinner at Hinxton Hall | |
| Day five – Friday 24 November 2023 | ||
| 09:00 – 09:30 | Recap of day four | Participants |
| 09:30 – 11:00 | Ensembl genome browser and VEP | Louisse Mirabueno |
| 11:00 – 11:30 | Break | |
| 11:30 – 12:30 | European Nucleotide Archive | Maira Ihsan |
| 12:30 – 13:00 | Data management | Daniel Thomas Lopez |
| 13:00 – 14:00 | Lunch | |
| 14:00 – 15:00 | European Variation Archive | Dona Shaju |
| 15:00 – 15:45 | Keynote lecture (virtual): Human Pangenome Reference Consortium | Erik Garrison |
| 15:45 – 16:15 | Course wrap-up and feedback | Daniel Thomas Lopez |
| 16:15 | End of course | |
| 16:30 | Shuttle to Cambridge train station | |
Please read our application support page before starting your application. In order to be considered for a place on this course, you must do the following:
- Complete the online application form.
- Ensure you add relevant information to the ‘submission details’ section where you are asked to provide information on your:
  - prerequisite skills and knowledge
  - current work and course expectations
  - data availability
- Upload one letter of support from your supervisor or a senior colleague detailing reasons why you should be selected for the course.
Please submit all documents during the application process by 23:59 BST on 7 August 2023. Items marked * in the application are mandatory. Incomplete applications will not be processed.
All applicants will be informed of the status of their application (successful, waiting list, unsuccessful) by 4 September 2023. If you have any questions regarding the application process, please contact Sophie Spencer.
The registration fee of £825.00 includes:
- Catering as detailed on the course programme
- Accommodation for four nights (20, 21, 22 and 23 November) at Hinxton Hall Conference Centre
- Bespoke course handbook with links to the course materials
- Use of a computer in the EMBL-EBI training suite throughout the course
- Shuttle bus on the final course day to Cambridge train station
Accommodation
Hotel rooms will be provided onsite at Hinxton Hall Conference Centre. Please contact the centre directly if you wish to stay additional nights around the course dates.
Catering
The course includes catering as detailed in the programme. Successful applicants will be asked for any dietary requirements and allergies upon registration.
Posters
All participants are expected to present a poster that will be displayed during the course outside the training room. Successful applicants will be asked to submit their poster upon registration. We will print these for you and have them available when you arrive on site.
All posters should:
- be A2 in size (420 mm × 594 mm)
- be in portrait orientation
- include your photograph and contact information
We expect the posters to act as a talking point between you, other participants and the trainers on the course. The posters will be displayed throughout the week so people can view them during breaks and lunch. They should give the reader an idea of the work you are engaged in, what you are planning to do next, and anything else that might be of interest to the other participants.