Mining PDBe and PDBe-KB using a graph database


Tuesday 18 to Thursday 20 February 2020


European Bioinformatics Institute (EMBL-EBI) - Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom

Application opening date: Friday 08 November 2019

Application deadline: Friday 31 January 2020


Participation: Open application with selection


Contact: Meredith Willmott

This workshop covers the use of the PDBe graph database to extract data for solving complex structural biology queries. It will introduce the PDBe graph database and how to write Cypher queries to retrieve data of interest. Workshop participants will be able to use the graph database to explore data relevant to their own research with support and guidance from the development team at PDBe.

The graph database integrates annotations provided by PDBe-KB partners and is implemented in Neo4j. In this graph, each PDB entry is represented as a tree whose root is the PDB entry; the root is connected to chains and entities, which are in turn connected to residues. Each of the more than 150 million PDB residues is linked to the available annotations (for example, whether the residue is part of a catalytic site or lies on a macromolecular interaction interface) and is also directly connected to its corresponding UniProt residue. Storing PDBe-KB data as a graph is particularly useful because it allows annotations to be transferred straightforwardly between PDB entries that map to the same UniProt accession, as well as to UniProt accessions with high sequence identity.
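To make the tree-plus-UniProt idea concrete, here is a minimal in-memory Python sketch. It is not how the PDBe graph database is implemented: entities are omitted for brevity, and the class and function names (`PDBResidue`, `build_tree`, `transfer_annotations`), entry IDs, accession `UNP_X` and annotation labels are all hypothetical. It groups residues as entry → chain → residues and pools annotations across every PDB residue that maps to the same UniProt residue, which is the kind of annotation transfer described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class PDBResidue:
    entry: str                                  # PDB entry ID (root of the tree)
    chain: str                                  # chain identifier
    number: int                                 # residue number within the chain
    uniprot: Optional[Tuple[str, int]] = None   # (accession, position), if mapped
    annotations: Set[str] = field(default_factory=set)


def build_tree(residues):
    """Group residues as entry -> chain -> [residues], mirroring the tree."""
    tree = defaultdict(lambda: defaultdict(list))
    for res in residues:
        tree[res.entry][res.chain].append(res)
    return tree


def transfer_annotations(residues):
    """Pool annotations over all PDB residues mapped to the same UniProt
    residue, so each residue inherits what other entries record about it.
    Unmapped residues keep only their own annotations."""
    pooled = defaultdict(set)
    for res in residues:
        if res.uniprot is not None:
            pooled[res.uniprot] |= res.annotations
    return {
        (res.entry, res.chain, res.number):
            set(pooled[res.uniprot]) if res.uniprot is not None
            else set(res.annotations)
        for res in residues
    }


# Hypothetical example: two entries that cover the same UniProt residue.
example = [
    PDBResidue("ent1", "A", 45, ("UNP_X", 50), {"catalytic_site"}),
    PDBResidue("ent2", "B", 12, ("UNP_X", 50), {"interface"}),
    PDBResidue("ent2", "B", 13, ("UNP_X", 51)),
]
tree = build_tree(example)
merged = transfer_annotations(example)
print(merged[("ent2", "B", 12)])  # set display order may vary
```

Here residue 12 of chain B in `ent2` inherits the catalytic-site annotation from `ent1` purely because both map to position 50 of the same UniProt accession.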

This workshop is aimed at bioinformaticians with experience of analysing data from the PDB, either by processing archive files or via API access. We encourage applications from individuals with specific questions relating to PDB data that are difficult to solve using existing data queries. Programming experience is required, with a preference for those familiar with Python, although this is not an absolute requirement.

An example use case might involve research into a specific drug molecule, where protein structure is relevant to drug specificity. The graph database would allow the analysis of all common interaction sites in the PDB at the residue level, with the potential to extend this search to ligands containing similar fragments. Additional searches could analyse the protein-protein interaction sites between different isoforms of the same protein and cross-reference them with sequence conservation data and predicted functional annotations.
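As a minimal sketch of the residue-level analysis in this use case, assume the ligand contacts have already been retrieved from the graph as (entry, UniProt accession, UniProt position) tuples. The function name `common_binding_sites`, that tuple layout and the example data are hypothetical. The sketch counts in how many distinct entries each UniProt residue is seen contacting the ligand and keeps the residues that recur.

```python
from collections import Counter


def common_binding_sites(contacts, min_entries):
    """Return {(accession, position): n_entries} for UniProt residues that
    contact the ligand in at least `min_entries` distinct PDB entries.

    `contacts` holds hypothetical (entry_id, accession, position) tuples,
    one per ligand-contacting residue. Duplicates within one entry (e.g. the
    same site observed in several chains) are counted only once.
    """
    unique = {(entry, acc, pos) for entry, acc, pos in contacts}
    counts = Counter((acc, pos) for _, acc, pos in unique)
    return {site: n for site, n in counts.items() if n >= min_entries}


# Hypothetical contacts for one ligand across three entries.
contacts = [
    ("ent1", "UNP_X", 50), ("ent1", "UNP_X", 51), ("ent1", "UNP_X", 50),
    ("ent2", "UNP_X", 50),
    ("ent3", "UNP_X", 50), ("ent3", "UNP_X", 88),
]
sites = common_binding_sites(contacts, min_entries=2)
print(sites)  # {('UNP_X', 50): 3}
```

Counting on UniProt positions rather than PDB residue numbers is what lets contacts from different entries be compared directly, since the same UniProt residue can carry different numbers in different PDB entries.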

When applying, researchers should submit a 200-word abstract describing their work and potential queries related to PDB data. This should include details of how they have previously accessed PDB data and the types of questions they are trying to answer.

Syllabus, tools and resources

This course will cover:


At the end of this workshop participants will be able to:

  • Access the PDBe graph database using Neo4j
  • Query the database using Cypher queries
  • Find complex data connections
  • Answer complex questions about protein structures
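A minimal Python sketch of querying the graph with Cypher, assuming the official `neo4j` Python driver. The node labels (`Entry`, `Entity`, `PDBResidue`, `UNPResidue`), relationship types and the `ID` property below are assumptions about the PDBe graph schema, not details given on this page, so check them against the schema introduced in the workshop. The hypothetical helper `fetch_residue_mappings` takes an already-open session, so it can be tried without a live database.

```python
# Cypher query mapping one entry's residues to UniProt residues.
# ASSUMPTION: labels, relationship types and property names are illustrative
# guesses at the PDBe graph schema and may differ from the real database.
RESIDUE_MAPPING_QUERY = """
MATCH (entry:Entry {ID: $entry_id})-[:HAS_ENTITY]->(:Entity)
      -[:HAS_PDB_RESIDUE]->(res:PDBResidue)
      -[:MAP_TO_UNIPROT_RESIDUE]->(unp:UNPResidue)
RETURN res.ID AS pdb_residue, unp.ID AS uniprot_residue
"""


def fetch_residue_mappings(session, entry_id):
    """Run the query in an open session and return a list of
    (pdb_residue, uniprot_residue) pairs.

    `entry_id` is passed as a Cypher parameter ($entry_id) rather than
    pasted into the query string, which avoids injection and lets the
    server reuse the query plan.
    """
    result = session.run(RESIDUE_MAPPING_QUERY, entry_id=entry_id)
    return [(record["pdb_residue"], record["uniprot_residue"])
            for record in result]
```

With the driver installed, a session would typically come from `GraphDatabase.driver(uri, auth=(user, password))` followed by `with driver.session() as session:`; the URI and credentials are placeholders here, not the address of a PDBe server.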

Additional information

The workshop fee covers your catering, refreshments, two nights' accommodation and a shuttle between Station Road, Cambridge and the Wellcome Genome Campus.


Day 1 – Tuesday 18 February 2020
13:00-14:00 Welcome, introductions and networking Tom Hancocks
14:00-15:30 Overview of PDBe David Armstrong
15:30-16:00 Break  
16:00-18:30 Introduction to the graph database Mihaly Varadi
Day 2 – Wednesday 19 February 2020
09:00-10:30 Utilising the graph database  
10:30-11:00 Break  
11:00-12:30 Utilising the graph database  
13:30-15:30 Project work All
16:00-18:30 Project work All
Day 3 – Thursday 20 February 2020
09:00-10:30 Project work All
11:00-12:30 Project work All
13:30-14:30 Project discussion All
14:30-14:45 Wrap-up and feedback Tom Hancocks
