Finding Evidence in Research Publications


Tuesday 10 - Wednesday 11 September 2019


European Bioinformatics Institute (EMBL-EBI) - Training Room 2 - Wellcome Genome Campus, Hinxton, Cambridge,  CB10 1SD, United Kingdom

Thursday 21 March 2019

Friday 23 August 2019


Meredith Willmott

This workshop introduces tools and approaches used to discover biologically relevant data in the research literature. Participants will be introduced to the basics of programmatic analysis of scientific literature and explore the principles of dictionary-based text-mining, explained using relevant case studies. The workshop has a strong focus on practical exercises and group project work to give participants hands-on experience, tackling biologically relevant problems based on what they have learnt in this workshop.

Do you want to identify publicly available datasets cited in the research literature and use them in your future analyses? Do you want to develop basic skills to programmatically access and analyse scientific literature? Do you want to learn the basics of text analytics to make the most of your literature review?

Published research contains a wealth of new data and evidence. Automating literature review and text analysis can help extract valuable knowledge from millions of research publications. Modern tools for literature analysis can help you identify potential drug targets, predict host-pathogen interactions, or infer growth regulators for crops, based on published findings.


This workshop is aimed at life science researchers who are interested in extracting data and evidence from the research literature. It will help those who want to identify cited datasets for reuse, further analysis, background research, or as supporting data for their own hypotheses. The workshop will also be of interest to those who are applying, or planning to apply, literature analysis or text mining in their own research projects.

Participants will benefit from an undergraduate-level knowledge of biology. Ideally, participants should have some bioinformatics experience and/or a basic understanding of programmatic access. Please note that this workshop requires no prior knowledge of text analytics and no computer programming skills.

Regardless of your current knowledge, we encourage you to explore this short series of recorded webinars introducing programmatic access.

Syllabus, tools and resources

During this workshop you will learn about:

  • How researchers share and cite data
  • How to search for data cited in the literature
  • Basics of dictionary-based text mining
  • Basics of ontologies
  • Programmatic tools to find data and evidence in the literature
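To give a flavour of the dictionary-based text mining listed above, here is a minimal sketch in Python. It is an illustration only, not the workshop's own material: the dictionary entries, the concept identifiers and the function name `tag_terms` are all made up for the example, and real dictionaries are typically built from curated resources and ontologies.

```python
import re

# Hypothetical toy dictionary: surface term -> made-up concept identifier.
DICTIONARY = {
    "tp53": "GENE:0001",
    "p53": "GENE:0001",
    "breast cancer": "DISEASE:0001",
}


def tag_terms(text, dictionary):
    """Return (start, end, matched_text, concept_id) for each dictionary hit."""
    # Try longer terms first so multi-word terms win over shorter overlaps.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sentence = "Mutations in TP53 are frequent in breast cancer."
annotations = tag_terms(sentence, DICTIONARY)
for start, end, matched, concept in annotations:
    print(start, end, matched, concept)
```

Matching is case-insensitive and restricted to whole words, which is about the simplest heuristic possible. It will miss spelling variants and cannot resolve ambiguous terms, which is one reason the limitations of text mining are also on the syllabus.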

Tools and resources covered include: 

Please note that this workshop does not cover text mining.


At the end of the workshop you should be able to:

  • Search for and locate publicly available open-access datasets in the scientific literature
  • Use programmatic tools to search for data in the literature
  • Appreciate the value and limitations of text-mining approaches
  • Develop basic sets of heuristics for text analytics
  • Apply the information you have discovered in the context of your own research


Day 1 – Tuesday 10 September 2019
Finding data cited in research literature
08:30        Shuttle from Bus Stop 5, Cambridge Station
09:00-09:15  Arrival and registration
09:15-09:30  Welcome and introduction to the workshop (Tom Hancocks)
09:30-10:30  Introduction to EMBL-EBI data resources (Tom Hancocks)
10:30-10:45  Break
10:45-11:15  How researchers share and cite data (Michelle Magrane)
11:15-12:30  Search tools for data cited in the literature (Dayane Araujo)
12:30-13:00  Lunch
13:00-13:45  Searching for datasets on a particular research topic (Dayane Araujo)
13:45-14:00  Introduction to REST API (Dayane Araujo)
14:00-14:30  Introduction to Europe PMC API (Dayane Araujo)
14:30-14:45  REST API data module (Dayane Araujo)
14:45-15:00  PDBe: WESTLIFE project (Nurul Nadzirin)
15:00-15:15  Break
15:15-17:00  Programmatic search for datasets in research articles - project (Dayane Araujo & Maaly Nassar)
17:00        End of day
17:15        Shuttle to Bus Stop 5, Cambridge Station
Day 2 – Wednesday 11 September 2019
Scientific literature as big data: an introduction to text analysis
08:30        Shuttle from Bus Stop 5, Cambridge Station
09:00-09:15  Arrival and registration
09:15-09:20  Scientific literature as big data - the value of text and data mining (Dayane Araujo)
09:20-09:35  Finding protein-protein interaction evidence in the literature (Kalpana Paneerselvam)
09:35-10:00  Europe PMC Annotations platform (Dayane Araujo)
10:00-10:45  Annotation reviewing (Dayane Araujo)
10:45-11:00  Break
11:00-11:30  Introduction to Annotations API (Dayane Araujo)
11:30-12:30  Project work and feedback (All)
12:30-13:00  Lunch
13:00-13:10  Limitations of text mining (Xiao Yang & Dayane Araujo)
13:10-13:40  Basics of dictionary-based text mining (Xiao Yang & Dayane Araujo)
13:40-14:10  Introduction to ontologies (Aravind Venkatesan)
14:10-14:30  Open Targets case study (Zoe Pendlington)
14:30-15:00  DIY text mining project (Xiao Yang & Dayane Araujo)
15:15-15:30  Break
15:30-16:45  DIY text mining project (Xiao Yang & Dayane Araujo)
16:45-17:00  Feedback and wrap-up (Tom Hancocks)
17:00        End of workshop
17:15        Shuttle to Bus Stop 5, Cambridge Station