Interpreting functional information from large scale protein structure data
Introduction
With the vast array of structures available in the PDB archive, it can be difficult to narrow these down to the ones most likely to support your research. Furthermore, once you have narrowed them down, you could still be missing vital information from other related structures.
This project will introduce you to the wealth of publicly available data in the Protein Data Bank (PDB) and give you the opportunity to investigate how large subsets of structure data can be used to analyse protein features and determine function. In the project you will learn how to: identify relevant protein structures, collate and interpret functional information, and implement this process programmatically.
Scenario
In the given biological scenario, you are working in a drug design team. You are trying to design a drug that is complementary in shape and charge to a key protein associated with a specific disease. You would like to gather all the structural information available for this protein (biological target).
In this project, you will start from a specific protein (biological target) and find all the structural information available for it in the PDBe and PDBe-KB resources. You will then focus on the various small molecules that bind to this protein and try to identify the common binding site. You will also group the small molecules based on their binding site and check whether similar small molecules bind to the same residues. Once you have an understanding of the binding sites of this protein, you will look at other proteins that are similar to it. You will then use superposition to structurally align these proteins and interpret common features of the related binding sites.
Recommended preparation material
Basic Python tutorials
Before starting with the project please familiarise yourself with the basics of Python programming. Linked below are some repositories you can download to your local machine and work through at your own pace.
https://github.com/Joseph-Ellaway/python-basics-tutorial/tree/main
Optional:
- Python for bioinformatics: https://github.com/IC-Computational-Biology-Society/Central_Dogma_session
- Data handling and visualisation with Pandas and Matplotlib: https://github.com/IC-Computational-Biology-Society/Pandas_Matplotlib_session
To download the notebooks from GitHub, click on the green ‘Code’ button and a drop-down menu will appear. You can either download the files as a zip folder or type the following into your computer’s terminal (command line):
git clone <repository>
where <repository> is the URL shown in the ‘Code’ button’s drop-down menu. Either approach will give you the exact same set of files.
Running Jupyter-Notebooks: Once you have downloaded one or all of the repositories (GitHub folders) to your computer, you can begin executing Python code through your browser. For now, we recommend using Google Colaboratory as it is completely cloud-based and runs in your browser! You can access Google Colaboratory at: https://colab.research.google.com/. Once Colaboratory is running in your browser tab, click on ‘File’ and ‘Open’ one of the notebooks (ending in .ipynb) from the GitHub repository you downloaded. Notebooks are organised into blocks. Code blocks can be run by clicking the arrow on the left of the box.
Webinars
The following webinars will be useful for you to watch:
- PDBe KB Aggregated Views – providing biological insights into 3D structures
- Introduction to PDBe programmatic access
- For further information on programmatic access please watch our recorded webinar:
Dataset
Here you will look at the Human Acetylcholinesterase (AChE), which is essential for neurotransmission in the brain. Due to the critical role it has in brain function, it is the primary target of many lethal neurotoxins, including VX and snake venom. However, it is also a clinically relevant target for drug discovery. In particular, it is the target for Donepezil, which was at one point the most commonly prescribed drug for Alzheimer’s disease. Unfortunately, this drug has some common and severe side effects that limit its clinical potential.
Recently, there have been significant efforts to develop new AChE inhibitors with improved binding kinetics. The aim of these efforts is to reduce the adverse effects of Alzheimer’s treatment (1, 2). By analysing the structural features of AChE, you will try to understand how scientists designed the drugs to target this protein.
Studies into AChE inhibition have identified the modes of action of neurotoxins, snake venom and Alzheimer’s drugs. You will try to compare these classes of AChE inhibitors and propose possible reasons to explain why their clinical effects differ so greatly.
You will also compare the structure of AChE with Butyrylcholinesterase (BChE). Both are major drug targets in the cholinesterase enzyme family and many drugs can inhibit both enzymes. Try to spot some similarities in these two structures and any differences which might affect how drugs bind to them.
In addition, a broader exploration of acetylcholine signalling is important for understanding why AChE is a promising target for drug discovery. You will identify structures of the acetylcholine receptor complexes and obtain functional details about them.
References:
- https://journals.sagepub.com/doi/pdf/10.1177/11795735211029113
- https://pubs.acs.org/doi/10.1021/acs.jmedchem.0c01863
Project aims
- Searching the PDBe
- Search PDBe to find all the structures of Acetylcholinesterase (AChE) from Homo sapiens.
Hint: Use the Advanced Search feature to select specific organisms.
- Find all the macromolecules that are described by this search.
Hint: Click on the macromolecules tab in the search results.
- Find all the ligands/compounds that bind with this protein.
Hint: Click on the compound tab in the search results.
- Using the PDBe Knowledge Base
- Repeat step 1a by using your same search via the PDBe Knowledge Base instead.
Hint: In order to perform your search, you will need your protein’s UniProt Accession (e.g. A12345) or a specific PDB ID. Alternatively, you can access the PDBe-KB page for your search results by clicking on one of the PDB IDs and searching for the term ‘Canonical’ on the page. Click here to go straight to the PDBe-KB page.
- Repeat 1b using the PDBe-KB page and navigate to the ‘Interactions’ tab. Identify the macromolecules from the Eastern green mamba (Dendroaspis angusticeps) that interact with AChE. Visualise the interaction interface in a 3D viewer.
- Repeat 1c using the PDBe-KB page and navigate to the ‘Ligands’ tab. Identify the various ligands that interact with this protein.
- Can you locate the common binding site(s)?
- Can you group together any of the ligands based on their binding site(s)?
Hint: You can use the sequence-based ProtVista viewer to find patterns in the binding site. To see the ligands in 3D, click the ‘3D View of Superposed Ligands’ button.
- Are the ligands you grouped together in 2c structurally similar? In other words, do they have a common scaffold?
Hint: You can look at the 2D image or 3D view of the ligands to visually assess their structure (or shape). In the section on programmatic access, you will also learn how to partially automate this analysis, bringing data into your own code.
- Compare the binding modes of neurotoxin VX (HET-CODE: VX) and drug Donepezil (HET-CODE: E20).
- How do their binding modes differ?
- How might this determine their biological effects?
Hint: After clicking the ‘3D View of Superposed Ligands’ button, you can hide all the ligands and carbohydrates by clicking ‘None’. Then click the ‘eye’ icon to show only the ligands you are interested in.
- Related proteins
- Using the Similarity tab on the PDBe-KB page, find all the proteins that have sequence similarity to AChE. How many low-sequence similarity matches do you find?
- Head to UniProt protein BLAST and identify all the reviewed (Swiss-Prot) human proteins that have 50% or more sequence similarity to AChE.
Hint: Use the “Restrict by Taxonomy” box to search only human proteins and the “Target Database” box to search only the UniProtKB Swiss-Prot database.
- Using Mol*, superpose the structures of AChE and BChE proteins and compare the drug target sites identified in 2c between these sets of proteins.
- Protein complexes
- Search PDBe to find all the structures of nicotinic acetylcholine receptors from Homo sapiens.
Hint: Use the Advanced Search feature to search for complex name and organism name.
- What are the differences between the assemblies returned by this search? Why might the authors have solved structures with antibodies bound?
- Identify an entry which contains a receptor complex without any antibodies bound. Using the Complex Portal, can you find functional details about this complex?
Hint: From the entry page of this structure, the Complex Portal ID can be found on the “Details” page of the “Structure analysis” section.
- Programmatic access
- Use the provided Jupyter Notebooks on GitHub (https://github.com/PDBeurope/pdbe-api-training/) and learn how to do the above aims programmatically:
- pdbe_tutorial_2024/1_running_a_search.ipynb
- pdbe_tutorial_2024/2_ligand_interactions.ipynb
- pdbe_tutorial_2024/3_macromolecular_interactions.ipynb
- pdbe_tutorial_2024/4_similar_proteins.ipynb
- pdbe_tutorial_2024/5_predicted_models.ipynb
- pdbe_tutorial_2024/6_complexes.ipynb
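As a rough idea of what the first aim (searching for human AChE structures) might look like in code, here is a minimal sketch using only the Python standard library. It is not taken from the notebooks. The endpoint URL, the Solr-style field names (`molecule_name`, `organism_scientific_name`, `pdb_id`) and the shape of the JSON response are assumptions about the PDBe search service and should be checked against the notebooks and the PDBe API documentation. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed PDBe search endpoint (Solr-style); verify against the PDBe API docs.
SEARCH_URL = "https://www.ebi.ac.uk/pdbe/search/pdb/select"


def build_query(terms):
    """Join field:"value" pairs into a single query string using AND."""
    return " AND ".join(f'{field}:"{value}"' for field, value in terms.items())


def build_search_url(terms, fields=("pdb_id",), rows=1000):
    """Build the full request URL for the given search terms."""
    params = {
        "q": build_query(terms),
        "fl": ",".join(fields),  # fields to return for each hit
        "rows": rows,            # maximum number of hits to return
        "wt": "json",            # ask for a JSON response
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def unique_pdb_ids(docs):
    """Collapse search hits (one per molecule) into a sorted list of PDB IDs."""
    return sorted({doc["pdb_id"] for doc in docs if "pdb_id" in doc})


def run_search(terms, fields=("pdb_id",), rows=1000):
    """Send the request and return the list of hits (needs internet access)."""
    with urlopen(build_search_url(terms, fields, rows), timeout=30) as response:
        payload = json.load(response)
    # Assumed response layout: {"response": {"docs": [...]}}
    return payload["response"]["docs"]


# Search terms mirroring aim 1a; the field names are assumptions.
ACHE_TERMS = {
    "molecule_name": "Acetylcholinesterase",
    "organism_scientific_name": "Homo sapiens",
}
```

Calling `unique_pdb_ids(run_search(ACHE_TERMS))` in a notebook cell should then return the PDB IDs matching the search, provided the assumptions above hold. Keeping the URL construction separate from the network call makes it easy to inspect the query before sending it.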
Cheatsheet: Superposing on browser: details here
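For the second aim (grouping ligands by their binding site), one possible approach is to compare the sets of residues each ligand contacts and group together ligands whose residue sets overlap strongly. The sketch below illustrates this with a Jaccard overlap score. The function names, the 0.5 threshold and the example ligand codes and residue numbers are all made up for illustration and are not real AChE data. In practice, the residue sets would come from the ‘Ligands’ tab of the PDBe-KB page or from the ligand interactions notebook.

```python
def jaccard(a, b):
    """Overlap between two residue sets: shared residues / all residues."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_binding_site(ligand_residues, threshold=0.5):
    """Group ligands whose binding residues overlap by at least `threshold`.

    `ligand_residues` maps a ligand code to the set of residue numbers it
    contacts. A ligand joins (and merges) every existing group that contains
    at least one sufficiently similar ligand (single-linkage grouping).
    """
    groups = []
    for ligand in sorted(ligand_residues):
        residues = ligand_residues[ligand]
        # Existing groups with at least one member similar to this ligand
        matches = [
            group for group in groups
            if any(jaccard(residues, ligand_residues[m]) >= threshold for m in group)
        ]
        merged = [ligand]
        for group in matches:
            merged.extend(group)
            groups.remove(group)
        groups.append(sorted(merged))
    return sorted(groups)


# Made-up example data: placeholder ligand codes and residue numbers.
example = {
    "LIG_A": {10, 11, 12, 13},
    "LIG_B": {10, 11, 12, 20},
    "LIG_C": {50, 51},
}
print(group_by_binding_site(example))
# prints [['LIG_A', 'LIG_B'], ['LIG_C']]
```

Single-linkage grouping is a deliberately simple choice here: it is easy to follow, but it can chain loosely related ligands into one group, so the threshold is worth experimenting with.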
Project mentors
Joseph Ellaway | EMBL-EBI

Hi all! My name is Joseph and I’m a bioinformatician for the Protein Data Bank in Europe. I joined EMBL-EBI in 2022 to work on classifying protein conformational states, developing new algorithms to identify and characterise distinct protein conformers. I’ve been involved with several training events and conferences since joining EMBL-EBI and I’m looking forward to meeting you all! My background is in bioinformatics and biochemistry: I graduated with a master’s degree from Imperial College London in 2021, where I worked on cryoEM and X-ray model refinement, solving the structures of several protein complexes. Between 2018 and 2019, I joined a group at UCL as a research assistant during a placement year, performing protein biochemistry and molecular biology research into the mechanisms of neuropathic pain. My post-graduate research spanned investigations of differential equation models of Turing patterns in development, 3D chromosomal interactions in the nucleus and agent-based models for studying aberrant stem cell growth. I have also been involved with Imperial College’s Computational Biology Society, helping to found it in 2020 and organising seminars, an online conference and numerous educational programming sessions. Please don’t hesitate to ask for assistance!
Marcus Bage | EMBL-EBI
the University of Dundee, studying the mammalian mRNA Capping enzymes using molecular dynamics simulations. Marcus has a background in structural bioinformatics and holds a Masters degree in Biochemistry from University College London.
