Course at EMBL-EBI

Data science for life scientists

2026

This course introduces life scientists to practical data science methods increasingly used in research, focusing on statistics and machine learning, with Python as the working language. Participants will learn how to prepare, process, and visualise key biological data types, and will gain hands-on experience in training and evaluating machine learning models using publicly available data.

The course will combine morning lectures (supported by short practical exercises where relevant) with afternoons working on group projects, providing independent, hands-on experience in applying what is learnt to research questions. Trainers and mentors will be available throughout to guide participants and offer advice and feedback.
 

Group projects

A central part of the course is the group project. You will work in small groups, applying machine learning approaches to a biological question of interest. The projects follow a framework suggested by the mentors but are driven by participants, with as much mentor support and input as you need.

During the application process, you will be asked to indicate your preferences from a list of biological applications for the group project work. During the course, each group will finalise its project aims, collect and process data, and apply machine learning approaches to address its research question. Groups will receive regular feedback from mentors and are expected to take an active role in driving their projects forward collaboratively.

There will be daily opportunities to present progress so far and receive feedback, and the projects will end with a flash-talk session on the last day, giving participants the opportunity to present the data, methods, and insights they have developed during the training.

Who is this course for?

This course is aimed at life scientists who are beginning to integrate data science approaches into their research, and who wish to develop practical skills in programming, data handling, and machine learning. It will be particularly suitable for PhD students and early-career researchers who are ready to start analysing their own data.

While no prior experience in machine learning is expected, beginner-level Python is required. We recommend working through this free online training: https://swcarpentry.github.io/python-novice-gapminder/. Further materials may be sent to selected applicants before the course.

What will I learn?

Learning outcomes

By the end of the course, you will be able to:

  • Use Python to collect, handle, and visualise biological data
  • Identify analysis methods suitable for particular datasets
  • Apply preprocessing pipelines and good practices for analysis and reproducibility
  • Apply statistical methods to biological data
  • Train and evaluate simple machine learning models
  • Discuss applications of large language models in the life sciences

Course content

During this course you will learn about:

  • APIs and Python for data collection 
  • Data preprocessing and visualisation in Python
  • Core machine learning methods and evaluation
  • Deep learning approaches, including neural networks and transformers
  • Large language models and their potential applications in the life sciences

Applications close: 01 March 2026
Dates: 15 – 19 June 2026
Venue: European Bioinformatics Institute, United Kingdom
Fee: £925.00 academia / £1,225.00 industry
Contact: Juanita Riveros
Application: Open application with selection
Places: 30
