Course at EMBL-EBI

Data science for life scientists

This course will introduce life scientists to practical data science topics, such as data handling and visualisation, statistical analysis, the application of AI, and the use of publicly available databases.

You will initially be introduced to data science theory and practice, including best practices for undertaking analyses, data management, and reproducibility. 

The course will provide hands-on training in tools and resources appropriate to your research, including an introduction to using Python for handling and visualising data, statistical analysis, and machine learning.

Group projects

This course includes group projects, in which you will be placed in small groups to work together on a challenge set by trainers from EMBL-EBI. This allows you to explore the data science methods and resources you will learn about during the course and apply them to a set problem, giving you hands-on experience. The group work will culminate in a flash talk session involving everyone on the final day of the course.

Groups are mentored and supported by the trainers who set the initial challenge, but the groups will be responsible for driving their projects forward, with all members expected to take an active role. 

There are two group project topics: gene expression and protein structure. Both projects will give participants the opportunity to apply the knowledge and skills learnt during the other sessions of the course, including data handling in Python, data visualisation, statistics and machine learning. The projects will also allow participants to gain experience using EMBL-EBI data resources: Expression Atlas and the Single Cell Expression Atlas for the gene expression project, and PDBe and AlphaFold for the protein structure project. During your application, you will be asked to select the group project topic that would benefit you most.

The projects use mammalian data sets; however, in many cases the methods and approaches taught are transferable to data from other species.

Who is this course for?

Applicants are expected to be at an early stage of using data science in their research and to need to develop their knowledge and skills further. The course is best suited to PhD students who are ready to start analysing their own data. No prior programming knowledge is required; however, participants will be asked to complete some pre-course learning. We recommend this free tutorial to start learning Python: http://swcarpentry.github.io/python-novice-gapminder/.

What will I learn?

Learning outcomes

After the course you should be able to:

  • Use Python to handle and visualise biological data
  • Describe and access data using EMBL-EBI data services 
  • Apply statistical methods to analyse biological data
  • Discuss applications of machine learning in life sciences 
  • Use Cytoscape to explore networks

Course content

During this course you will learn about: 

  • Using Python for biological data handling and visualisation
  • Accessing data from EMBL-EBI data services 
  • Statistical analysis of life sciences data
  • Uses of machine learning for analysis of life sciences data
  • Network analysis using Cytoscape

Trainers

Sarah Kaspar
EMBL Heidelberg
Loïc Lannelongue
University of Cambridge
Jiawei Wang
University of Bath
Andrian Yang
EMBL-EBI
Holly Joynes
EMBL Heidelberg
Daianna Gonzalez-Padilla
Wellcome Sanger Institute
Nathanael Sheehan
Technical University of Munich & University of Exeter
This course has ended

16 – 20 June 2025
European Bioinformatics Institute
United Kingdom
£900 (academia) / £1200 (industry)
Contact
Meredith Willmott
