Course at EMBL-EBI

Protein function prediction with machine learning and interactive analytics

Do you want to learn how to develop models to predict protein function? Do you want to analyse and exploit the growing volume of biological data? Do you want to develop basic skills in novel machine learning approaches and big data technologies?

This workshop explores how to conduct functional annotation of proteins through machine learning (ML) approaches. Participants will gain an insight into existing public protein data resources and how novel approaches can be used to analyse and explore these data to gain new understanding of protein function. The workshop will introduce Apache Spark and Apache Zeppelin, technologies for fast data processing and for integrating analytics, respectively.

Who is this course for?

This workshop is aimed at researchers and bioinformaticians from across industry and academia who are looking to leverage machine learning approaches in protein function prediction. It will guide participants through the use of big data to build analytical workflows on publicly available biological data.

Participants will require prior experience in the use of the command line interface and confidence in a programming language to fully benefit from the workshop. Please contact us if you have any questions about the course's suitability before you apply.

What will I learn?

Learning outcomes

After this course you should be able to:

  • Search and locate protein data of interest
  • Conduct interactive analytics and data transformation using machine learning approaches
  • Create simple analytical workflows using publicly available data
  • Discern new biological insights about protein function
  • Develop models for predicting protein function

Course content

The workshop will cover the following topics:


Tom Hancocks
Rabie Saidi
Hermann Zellner


Day 1 – Wednesday 29 May 2019
08:30 Shuttle from Cambridge Station  
09:00-09:15 Arrival and registration  
09:15-09:45 Welcome and introduction to workshop Tom Hancocks
09:45-10:45 Functional annotation of proteins in UniProt Hermann Zellner
10:45-11:00 Break
11:00-12:30 Protein data retrieval Hermann Zellner & Rabie Saidi
12:30-13:00 Lunch
13:00-15:00 Introduction to machine learning and big data Rabie Saidi
15:00-15:15 Break  
15:15-16:00 Introduction to Spark & Zeppelin Rabie Saidi
16:00-17:00 Spark & Zeppelin practical Rabie Saidi
17:00 End of day  
17:15 Shuttle to Cambridge Station  
Day 2 – Thursday 30 May 2019
08:30 Shuttle from Cambridge Station  
09:00-09:15 Arrival  
09:15-10:30 Data transformation Rabie Saidi
10:30-10:45 Break
10:45-12:30 Exploratory data analysis Rabie Saidi
12:30-13:00 Lunch
13:00-15:00 Creation and application of prediction models with Spark/MLlib Rabie Saidi
15:00-15:30 Break  
15:30-16:45 Creation and application of prediction models with Spark/MLlib Rabie Saidi
16:45-17:00 Course wrap-up and feedback Tom Hancocks
17:00 End of day  
17:15 Shuttle to Cambridge Station  

Attendance at this workshop is allocated on a first come, first served basis.

Registration closes two weeks prior to the course, so please register as soon as you can. Once you have registered and we have received payment, we can provide a letter of support should you require a visa to travel to the UK. Applying for a visa can take several weeks, and you may not be able to obtain one in time if you register just before the closing date. If you are unable to attend, please notify us as quickly as possible so that we can offer your place to someone else.

Once you have registered, please send Marina Pujol a picture of yourself and a Microsoft Word (.docx) document containing three short paragraphs with a biography, work history and description of your current research interests; each paragraph should be no more than 100 words.

The registration fee covers your lunch, refreshments and a shuttle between Cambridge Station and the Wellcome Genome Campus. Accommodation is not included and you will need to make your own arrangements.

Further learning

A short webinar summarising the main content of this workshop is available here:

This course has ended

29 - 30 May 2019
European Bioinformatics Institute United Kingdom
Marina Pujol

  • Tom Hancocks
  • Rabie Saidi
