Protein Function Prediction with Machine Learning and Interactive Analytics


Wednesday 29 – Thursday 30 May 2019


European Bioinformatics Institute (EMBL-EBI) - Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom

Application opens: Friday 15 February 2019

Application deadline: Friday 10 May 2019


First come, first served


Marina Pujol

Registration fee: 


Registration closed



Do you want to learn how to develop models to predict protein function? Do you want to analyse and exploit the growing volume of biological data? Do you want to develop basic skills in novel machine learning approaches and big data technologies?

This workshop explores how to carry out functional annotation of proteins using machine learning (ML) approaches. Participants will gain insight into existing public protein data resources and into how novel approaches can be used to analyse and explore these data to improve our understanding of protein function. The workshop will introduce Apache Spark and Apache Zeppelin, technologies for fast data processing and for integrated, interactive analytics, respectively.
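ML-based function prediction usually begins by turning each protein sequence into a fixed-length numeric feature vector that a model can learn from. The sketch below is written in plain Python rather than Spark. It shows one common, simple choice of feature: amino-acid composition, meaning the fraction of each of the 20 standard residues. The function name `aa_composition` and this choice of features are illustrative assumptions and are not taken from the workshop material.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration. Letters outside the 20 standard
    residues (e.g. X, B, U) are ignored; an empty or fully non-standard
    sequence yields an all-zero vector.
    """
    # Count only standard residues, case-insensitively.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    # Counter returns 0 for residues that never occur.
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real UniProt entry.
    vector = aa_composition("MKKA")
    print({aa: f for aa, f in zip(AMINO_ACIDS, vector) if f > 0})
```

In a Spark setting, a function like this could be applied to each row (for example, wrapped as a user-defined function) to build a feature column for an MLlib classifier. How the workshop itself structures this step is not specified here.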


This workshop is aimed at researchers and bioinformaticians from industry and academia who want to apply machine learning approaches to protein function prediction. It will guide participants through using big data technologies to build analytical workflows on publicly available biological data.

Participants will require prior experience in the use of the command line interface and confidence in a programming language to fully benefit from the workshop. Please contact us if you have any questions about the course's suitability before you apply.

Syllabus, tools and resources

The workshop will cover the following topics:


After this course you should be able to:

  • Search and locate protein data of interest
  • Conduct interactive analytics and data transformation using machine learning approaches
  • Create simple analytical workflows using publicly available data
  • Discern new biological insights about protein function
  • Develop models for predicting protein function

Additional information

The registration fee covers your lunch, refreshments and a shuttle between Cambridge Station and the Wellcome Genome Campus. Accommodation is not included and you will need to make your own arrangements.

Further learning

A short webinar summarising the main content of this workshop is available here:


Day 1 – Wednesday 29 May 2019
08:30 Shuttle from Cambridge Station
09:00-09:15 Arrival and registration
09:15-09:45 Welcome and introduction to the workshop (Tom Hancocks)
09:45-10:45 Functional annotation of proteins in UniProt (Hermann Zellner)
10:45-11:00 Break
11:00-12:30 Protein data retrieval (Hermann Zellner & Rabie Saidi)
12:30-13:00 Lunch
13:00-15:00 Introduction to machine learning and big data (Rabie Saidi)
15:00-15:15 Break
15:15-16:00 Introduction to Spark & Zeppelin (Rabie Saidi)
16:00-17:00 Spark & Zeppelin practical (Rabie Saidi)
17:00 End of day
17:15 Shuttle to Cambridge Station

Day 2 – Thursday 30 May 2019
08:30 Shuttle from Cambridge Station
09:00-09:15 Arrival
09:15-10:30 Data transformation (Rabie Saidi)
10:30-10:45 Break
10:45-12:30 Exploratory data analysis (Rabie Saidi)
12:30-13:00 Lunch
13:00-15:00 Creation and application of prediction models with Spark/MLlib (Rabie Saidi)
15:00-15:30 Break
15:30-16:45 Creation and application of prediction models with Spark/MLlib (Rabie Saidi)
16:45-17:00 Course wrap-up and feedback (Tom Hancocks)
17:00 End of day
17:15 Shuttle to Cambridge Station