Course at EMBL-EBI
Data science for life scientists
Hands-on machine learning for biological data
This course introduces you to practical data-science skills that are increasingly essential in modern life-science research. Using Python, you will learn how to prepare, process, and visualise key biological data types, and you will gain hands-on experience training and evaluating machine-learning models with real, publicly available datasets.
An essential part of the course is the group project work: working in small teams, you will have the opportunity to apply what you have learnt during the lectures. Group projects are supported by mentors who provide feedback, advice, and a clear project framework.
This course is ideal for life scientists beginning to integrate data science into their work. Beginner‑level Python is required, but no prior machine‑learning experience is expected.
'Data science for life scientists 2027' is currently in development. Sign up for alerts to receive updates as the course develops.
Who is this course for?
This course is for you if you are a life scientist beginning to integrate data-science approaches into your research and want to develop practical skills in programming, data handling, and machine learning. It is particularly suitable for PhD students and early-career researchers who are ready to start analysing their own data.
While no prior experience in machine learning is expected, a beginner level in Python is required. We recommend going through this free online training. If we think it would be helpful, we may also send further preparatory materials to selected applicants before the course begins.
What will I learn?
Learning outcomes
By the end of the course, you will be able to:
- Use Python to collect, handle, and visualise biological data
- Identify analysis methods suitable for particular datasets
- Apply preprocessing pipelines and good practices for analysis and reproducibility
- Apply statistical methods to biological data
- Train and evaluate simple machine learning models
- Discuss applications of large language models in the life sciences
Course content
During this course you will learn about:
- APIs and Python for data collection
- Data preprocessing and visualisation in Python
- Core machine learning methods and evaluation
- Deep learning approaches, including neural networks and transformers
- Large language models and their potential applications in the life sciences
Applications for this course open on 30 November 2026. To receive updates as the course develops, register your interest using the button on the right-hand side of this page.