Europe PMC and Open Targets Platform
Europe PMC uses machine learning, specifically text mining, to provide articles with annotations:
“Europe PMC is an open science platform that enables access to a worldwide collection of life science publications and preprints from trusted sources around the globe.” One of the tools provided by Europe PMC is the annotation API. “Annotations are biological terms or relations, such as diseases, chemicals or protein interactions, which can be highlighted for readers on abstracts and full text articles. These terms are identified by text mining algorithms, developed by a variety of text mining groups.” (from the Europe PMC website: https://europepmc.org/Annotations).
The scientific literature is a unique source of information for identifying and prioritising targets; taken as a whole, it represents the accumulated scientific knowledge that drives therapeutic innovation. Open Targets uses the Europe PMC dataset in two ways:
To extract evidence for target-disease associations. A pipeline developed by Europe PMC and Open Targets identifies target-disease co-occurrences in the literature and assesses the confidence of each relationship. The pipeline uses deep-learning-based Named Entity Recognition (NER) to identify targets (usually genes or proteins) and diseases mentioned in publications or preprints.
Every sentence in which a target and a disease are both mentioned is treated as evidence (Figure 30).

Figure 30 Text mining evidence for the ACE2-COVID-19 association in the Open Targets Platform (release 23.12).
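The same-sentence rule can be illustrated with a minimal sketch. It assumes a simple dictionary lookup in place of the deep-learning NER step, a naive sentence splitter, and no confidence scoring; the term lists, function names and example text are hypothetical and are not taken from the Europe PMC or Open Targets code.

```python
import re
from itertools import product

# Hypothetical toy dictionaries standing in for the deep-learning NER step.
TARGETS = {"ACE2", "TMPRSS2"}
DISEASES = {"COVID-19", "hypertension"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_mentions(sentence, vocabulary):
    """Return the vocabulary terms that occur as whole tokens in the sentence."""
    found = set()
    for term in vocabulary:
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(term)
    return found


def cooccurrences(text):
    """Collect (target, disease, sentence) triples for same-sentence mentions."""
    evidence = []
    for sentence in split_sentences(text):
        targets = find_mentions(sentence, TARGETS)
        diseases = find_mentions(sentence, DISEASES)
        # Every target-disease pair within one sentence counts as evidence.
        for target, disease in product(sorted(targets), sorted(diseases)):
            evidence.append((target, disease, sentence))
    return evidence


text = (
    "ACE2 is the entry receptor implicated in COVID-19. "
    "TMPRSS2 primes the viral spike protein. "
    "Hypertension is a common comorbidity."
)
for target, disease, sentence in cooccurrences(text):
    print(f"{target} / {disease}: {sentence}")
```

Only the first sentence yields evidence here, because it is the only one that mentions both a target and a disease. Mentions spread across neighbouring sentences are deliberately ignored.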
To provide context for entities and identify similar entities in the literature. In the Open Targets Platform, users can browse the available literature for the entity (target, disease, drug) of their choice. An additional feature, built on a Word2Vec machine-learning model, identifies similar entities in the literature, suggesting connections that users may not be aware of (Figure 31). Two entities are considered similar when they tend to appear in the same contexts, that is, surrounded by the same entities in specific sections of publications across the whole corpus of scientific literature.
For a deep-dive into how this pipeline was developed, read the post on the Open Targets blog.
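The idea that similar entities share similar contexts can be illustrated without training a Word2Vec model. The sketch below is a much simpler stand-in: it counts which entities share a section with each entity and compares those count vectors by cosine similarity. The toy corpus, the function names and the choice of raw counts are all assumptions made for illustration, not the Open Targets implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: each inner list holds the entities recognised in
# one publication section.
SECTIONS = [
    ["ACE2", "COVID-19", "TMPRSS2"],
    ["ACE2", "SARS", "TMPRSS2"],
    ["ACE", "hypertension", "captopril"],
    ["ACE2", "COVID-19", "remdesivir"],
    ["TMPRSS2", "COVID-19", "remdesivir"],
]


def context_vectors(sections):
    """For every entity, count the other entities that share a section with it."""
    vectors = defaultdict(Counter)
    for entities in sections:
        for entity in entities:
            for other in entities:
                if other != entity:
                    vectors[entity][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(entity, vectors, top_n=3):
    """Rank all other entities by how closely their contexts match the query's."""
    query = vectors.get(entity, Counter())
    scores = [
        (other, cosine(query, vector))
        for other, vector in vectors.items()
        if other != entity
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top_n]


vectors = context_vectors(SECTIONS)
for name, score in most_similar("ACE2", vectors):
    print(f"{name}: {score:.2f}")
```

Unlike these sparse counts, a Word2Vec model learns dense vectors, but the ranking step is the same in spirit: entities whose vectors point in similar directions are suggested as related.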

Find out more in the webinar run jointly by Europe PMC and Open Targets.