Course at EMBL-EBI

Practical biocuration

This three-day course will introduce concepts vital to biocuration, including metadata and ontologies, the role of identifiers, and extracting data from literature. 

During the course, there will be a focus on practical skills required by biocurators, including programming for handling large biological datasets, and querying databases. Participants will have the opportunity to use these skills in a real-life practical biocuration exercise during the group projects. 

There will also be an opportunity to network with biocurators and learn more about their roles and data resources.

Group projects

A major element of this course is a group project, where participants will be placed in small groups to work together on a challenge set by trainers from EMBL-EBI and external institutes. This allows participants to explore the role of biocurators and gain hands-on experience in biocuration. The group work will culminate in a presentation session involving all participants on the final day of the course, giving an opportunity for wider discussion in the whole course group.

Groups are mentored and supported by the trainers who set the initial challenge, but the groups will be responsible for driving their projects forward, with all members expected to take an active role. Groups are organised before the course.

Basic outlines of the projects on offer this year are given below. In your registration you must indicate your first and second choice of project. Some projects may not run: final decisions on which projects are offered during the course will be based on the number of applicants per project.

Group project: Curating a protein complex

In this project, a new protein complex has been published and it is your role to find out as much information as possible about the complex. Using literature and bioinformatics resources, you will provide an overview of the protein complex, including a full list of the complex members, a description of the role of each protein and an overview of the role of the complex. You will use resources such as UniProt, PDB and Europe PMC.

Project mentor: Michele Magrane

Group project: Exploring data and metadata from a genome-wide association study

The GWAS Catalog is a richly annotated database of human genome-wide association studies, which analyse associations between genetic variants and a disease or other trait of interest in a sample of individuals from a particular population. In this mini project you will examine a GWAS publication in detail to extract information about the traits and samples under investigation, and decide how to represent these metadata using standardised vocabularies and ontologies. You will also use simple command line tools to look at a GWAS dataset and explore the role of standard formats and quality control in biocuration.

Project mentor: Elliot Sollis
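To give a flavour of what quality control on a GWAS dataset can involve, here is a minimal sketch. It is written in Python rather than the command line tools used in the project, and it is not taken from the course materials. The column names (variant_id, effect_allele, p_value), the example rows and the two checks are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical example data: the column names and values are invented for
# illustration and do not come from the GWAS Catalog or any real study.
EXAMPLE_TSV = (
    "variant_id\teffect_allele\tp_value\n"
    "rs0000001\tA\t3e-9\n"
    "rs0000002\tN\t0.04\n"
    "rs0000003\tG\t1.7\n"
    "rs0000004\tT\tnot_reported\n"
)

# Assumption: only single-nucleotide alleles are expected in this toy file.
VALID_ALLELES = {"A", "C", "G", "T"}


def check_row(row):
    """Return a list of problems found in one row (empty if the row passes)."""
    problems = []
    if row["effect_allele"] not in VALID_ALLELES:
        problems.append("unexpected allele")
    try:
        p = float(row["p_value"])
    except ValueError:
        problems.append("p-value is not a number")
    else:
        # A p-value is a probability, so it must lie between 0 and 1.
        if not 0.0 <= p <= 1.0:
            problems.append("p-value outside [0, 1]")
    return problems


def run_checks(tsv_text):
    """Map each failing variant_id to its list of problems."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    report = {}
    for row in reader:
        problems = check_row(row)
        if problems:
            report[row["variant_id"]] = problems
    return report


if __name__ == "__main__":
    for variant, problems in run_checks(EXAMPLE_TSV).items():
        print(variant, "->", "; ".join(problems))
```

Agreeing on a standard format up front is what makes checks like these possible: once the expected columns and value ranges are fixed, every submitted file can be validated in the same way.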

Group project: Using alignments to improve Rfam families: a case of curating non-coding RNA families

Rfam is a database of non-coding RNA (ncRNA) families. Each family consists of a sequence alignment, metadata describing the family and an Infernal model. Families are built by curating alignments taken from publications or submitted by users. In this project we will demonstrate the process of updating an existing family with an improved alignment. We will examine the reports generated during curation, along with the phylogenetic distribution, sequence alignment and model results, to determine how the family should be updated to reflect the improved alignment. Additionally, we will show how to connect this information to resources like Wikipedia.

Project mentor: Nancy Ontiveros

 

Who is this course for?

This course is for anyone interested in becoming a biocurator, for new biocurators, and for biologists planning to use biocuration skills in their research.

What will I learn?

Learning outcomes

After the course you should be able to:

  • Discuss the varied role of biocurators and different types of biocuration
  • Describe how metadata and ontologies are used by biological data resources
  • Extract biological data from literature 
  • Explain some challenges that arise in biocuration 
  • Use some tools and techniques for the handling of biological data

Course content

During this course you will learn about: 

  • What biocuration is 
  • The different types of biological databases and how they can relate to each other
  • The role of metadata and ontologies 
  • Literature mining for biocuration
  • Handling datasets with SQL and Python
  • Tools and techniques used by biocurators for biological data handling
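
As an illustration of how SQL and Python can be combined to handle a small dataset, the sketch below uses Python's built-in sqlite3 module with an in-memory database. It is not taken from the course materials: the table name, column names, study identifiers and term identifiers are all hypothetical placeholders.

```python
import sqlite3

# Hypothetical records: study IDs, trait labels and ontology term IDs are
# placeholders invented for this sketch, not real database entries.
ROWS = [
    ("study_1", "trait A", "TERM:0001"),
    ("study_2", "trait A", "TERM:0001"),
    ("study_3", "trait B", "TERM:0002"),
    ("study_4", "trait C", None),  # not yet mapped to an ontology term
]


def build_database(rows):
    """Create an in-memory SQLite database holding the example annotations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        "study_id TEXT PRIMARY KEY, "
        "trait_label TEXT NOT NULL, "
        "ontology_term TEXT)"
    )
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def studies_per_term(conn):
    """Count how many studies are annotated with each ontology term."""
    cursor = conn.execute(
        "SELECT ontology_term, COUNT(*) FROM annotation "
        "WHERE ontology_term IS NOT NULL "
        "GROUP BY ontology_term ORDER BY ontology_term"
    )
    return cursor.fetchall()


def unmapped_studies(conn):
    """List studies whose trait has not been mapped to an ontology term."""
    cursor = conn.execute(
        "SELECT study_id FROM annotation "
        "WHERE ontology_term IS NULL ORDER BY study_id"
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    connection = build_database(ROWS)
    print(studies_per_term(connection))
    print(unmapped_studies(connection))
    connection.close()
```

Queries of this kind are one way a curator might spot gaps in annotation, such as records that still lack an ontology mapping.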

Trainers

  • Dayane Rodrigues Araujo
    EMBL-EBI
  • Sarah Dyer
    EMBL-EBI
  • George Georghiou
    Novartis
  • Matthew Jeffryes
    EMBL-EBI
  • Antonia Nilsson Lock
    EMBL-EBI
  • Michele Magrane
    EMBL-EBI
  • Aleena Mushtaq
    EMBL-EBI
  • Claire O'Donovan
    EMBL-EBI
  • Nancy Ontiveros Palacios
    EMBL-EBI
  • Pedro Raposo
    EMBL-EBI
  • Elena Speretta
    EMBL-EBI
  • Elliot Sollis
    EMBL-EBI
  • Anna Swan
    EMBL-EBI
  • Krishna Kumar Tiwari
    EMBL-EBI
  • Simone Weyand
    EMBL-EBI
  • Aleix Puig
    EMBL-EBI

This course has ended

16 – 18 May 2023
European Bioinformatics Institute
United Kingdom
£75.00 inclusive of catering
Contact
Sophie Spencer

Organisers
  • Dayane Rodrigues Araujo
    EMBL-EBI
  • Michele Magrane
    EMBL-EBI
  • Anna Swan
    EMBL-EBI
