Computational challenges and performance optimizations in NGS data analyses - London, UK
Date: Tuesday 3 September 2013
Application deadline: Monday 15 July 2013
Registration closed
Overview
Next-generation sequencing datasets are continuing to increase in size, and even small genomic projects now generate terabytes of data. The sheer scale of these datasets now poses a great computational challenge: how can we improve software pipelines to analyse data more efficiently?
Designed jointly by Intel and the Francis Crick Institute, this workshop will tackle some of these challenges and train participants in the principles and practicalities of optimizing NGS analysis pipelines.
Is this course right for me?
The aim of this course is to familiarize participants with high-performance computing (HPC) methodologies and to provide hands-on training on how to optimize a next-generation sequencing (NGS) analysis pipeline. The workshop is aimed at bioinformaticians who are actively involved in NGS data analysis projects and want to learn how to use HPC solutions to run their analytical pipelines in an efficient and reproducible manner. DNA and RNA sequencing analysis workflows will be used to explore bottlenecks and demonstrate solutions.
What will I learn?
Lectures will outline the computational challenges and bottlenecks associated with the analysis of NGS data and present HPC optimization approaches to overcome such challenges. Practicals will consist of computer exercises that will enable the participants to compare optimized vs. non-optimized software code for the analysis of NGS data, under the guidance of the lecturers and teaching assistants.
Prerequisites: A high degree of familiarity with Linux/UNIX operating systems and knowledge of the R programming language. Applicants will also need to demonstrate their current involvement in high-throughput sequencing data analysis projects.
What will it cover?
Topics will include:
- How to optimize NGS analysis workflows through HPC best practices
- Optimal use of software tools for short read alignment, with emphasis on Bowtie 2 and TopHat2
- HPC concepts including parallelization, single- and multi-process execution, shared and distributed memory, and CPU, memory and I/O constraints
- Diagnostic tools for debugging and monitoring of parallel programs
- Benchmarking of various technology and system architecture approaches
- Cloud-based analytics
- Scaling up a workflow to cope with a production-scale environment and increasingly large datasets
Additional information
Instructors:
Kristina Kermanshahche (Intel)
Clay Breshears (Intel)
Ketan Paranjape (Intel)
Vincent Plagnol (UCL)
Robert Sugar (Cancer Research UK London Research Institute)
Ernest Turro (Department of Haematology, University of Cambridge (tbc))
Kathi Zarnack (Cancer Research UK London Research Institute)
Full details regarding this event can be found here: http://crick.ac.uk/news/events/2013/09/03/intel-crick-ngs-workshop/
Programme
The detailed programme will be announced shortly.
