Protein Data Bank in Europe - Knowledge Base

PDBe-KB (Protein Data Bank in Europe - Knowledge Base) is a community-driven resource managed by the PDBe team, collating functional annotations and predictions for structure data in the PDB archive. PDBe-KB is a collaborative effort between PDBe and a diverse group of bioinformatics resources and research teams.

PDBe-KB currently includes projects such as SIFTS and FunPDBe, which aim to place structures from the PDB in their biological context.

PDBe-KB is funded by EMBL and BBSRC.

Projects

SIFTS

Structure Integration with Function, Taxonomy and Sequence (SIFTS) is a project in the PDBe-KB resource that provides residue-level mapping between UniProt and PDB entries. SIFTS also provides residue-level annotation from the IntEnz, GO, Pfam, InterPro, SCOP, CATH, PubMed, Ensembl and HomoloGene resources. The information is updated and released every week, at the same time as the release of new PDB entries, and is widely used by resources such as RCSB, PDBsum, Pfam, SCOP and InterPro.
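As an illustration of what a residue-level mapping allows, the following minimal sketch converts a PDB residue number into a UniProt sequence position. It assumes the mapping is available as a list of contiguous segments; the field names (chain_id, pdb_start, pdb_end, unp_start, unp_end), the sample values and the function pdb_to_uniprot are hypothetical and are not the actual SIFTS file or API format.

```python
from typing import Optional

# Hypothetical segment records: each one aligns a contiguous run of PDB
# residues on one chain to a contiguous run of UniProt positions.
# Field names and values are illustrative, not the real SIFTS format.
SEGMENTS = [
    {"chain_id": "A", "pdb_start": 1, "pdb_end": 50, "unp_start": 24, "unp_end": 73},
    {"chain_id": "A", "pdb_start": 60, "pdb_end": 100, "unp_start": 83, "unp_end": 123},
]


def pdb_to_uniprot(segments, chain_id: str, pdb_residue: int) -> Optional[int]:
    """Return the UniProt position for a PDB residue, or None if unmapped."""
    for seg in segments:
        if seg["chain_id"] != chain_id:
            continue
        if seg["pdb_start"] <= pdb_residue <= seg["pdb_end"]:
            # Within a segment the two numberings differ by a constant offset.
            return seg["unp_start"] + (pdb_residue - seg["pdb_start"])
    # The residue lies in a gap between segments or on an unmapped chain.
    return None


if __name__ == "__main__":
    print(pdb_to_uniprot(SEGMENTS, "A", 10))
    print(pdb_to_uniprot(SEGMENTS, "A", 55))
```

Residues that fall outside every segment (for example, expression tags or unobserved loops) return None rather than a guessed position, which keeps unmapped regions explicit.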

FunPDBe

The FunPDBe project integrates and makes available structural and functional annotations for macromolecular structure data in the Protein Data Bank (PDB). It is a collaboration between the Protein Data Bank in Europe (PDBe) and world-leading providers of structural bioinformatics data.

Cofactors

Cofactors are essential for many enzyme reactions. The Protein Data Bank (PDB) contains many enzyme structures, a large number of which are bound to cofactors or cofactor-like molecules. In collaboration with the Thornton team, we have implemented a semi-automated annotation process that identifies such molecules in the PDB. The information is updated weekly with each PDB release and is stored in the PDBe database. The up-to-date information is made available via the PDBe REST API and query system, and can be visualised on the PDBe entry pages.

Expanding Genome3D

The goal of the Genome3D project is to provide predicted macromolecular structures for structurally uncharacterised protein sequences, thereby helping biologists exploit structural data to understand protein functions. One current objective of the project is to increase the volume of predicted structural data (i.e. map CATH and SCOP protein superfamilies, determine evolutionarily conserved residues) and to integrate it into the major central data repositories InterPro and PDBe.

Rfam mapping

Finding RNA molecules in the PDB has previously been difficult due to the lack of standard naming and classification, compared to proteins. Many RNA molecules are classified by Rfam, a database of non-coding RNA families based at the European Bioinformatics Institute (EMBL-EBI). We’ve worked together with our colleagues at Rfam to make use of their mappings of RNA molecules present in the PDB. As part of PDBe-KB, we are continuing to work on other features that will improve representation of RNA molecules at PDBe.

Integrating M-CSA data

M-CSA is a database of enzyme reaction mechanisms, maintained by the Thornton team, that is being integrated with PDBe-KB. M-CSA provides annotation on the protein, catalytic residues, cofactors, and the reaction mechanisms of hundreds of enzymes. There are two kinds of entries in M-CSA. 'Detailed mechanism' entries are more complete and show the individual chemical steps of the mechanism as schemes with electron flow arrows. 'Catalytic Site' entries annotate the catalytic residues necessary for the reaction, but do not show the mechanism.

Ligand component

Protein-ligand interactions are recorded and interactively displayed for all bound molecules found in a structural assembly. All structures are pre-processed with ChimeraX to add hydrogens at chemically favourable positions, so that the interactions can be identified using the Arpeggio software. Arpeggio follows the nomenclature established by CREDO to identify molecular interactions between pairs of atoms, including steric clashes and van der Waals, hydrogen-bond, polar, ionic and hydrophobic interactions, as well as atom-plane and plane-plane interactions.
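To give a feel for how a distance-based contact between a pair of atoms can be labelled, the sketch below classifies a pair as a steric clash or a van der Waals contact by comparing the interatomic distance with the sum of the van der Waals radii. This is a simplified illustration and not Arpeggio's actual rule set: the function classify_contact, the label names and the 0.1 Å tolerance are assumptions made for the example, and real interaction typing also depends on atom types, hydrogens and geometry.

```python
import math

# Approximate van der Waals radii in angstroms (Bondi values).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

# Illustrative tolerance in angstroms; an assumption, not Arpeggio's setting.
VDW_TOLERANCE = 0.1


def classify_contact(elem_a, xyz_a, elem_b, xyz_b):
    """Label an atom pair using only distance and van der Waals radii."""
    distance = math.dist(xyz_a, xyz_b)
    radii_sum = VDW_RADII[elem_a] + VDW_RADII[elem_b]
    if distance < radii_sum:
        # The van der Waals spheres overlap.
        return "steric_clash"
    if distance <= radii_sum + VDW_TOLERANCE:
        # The spheres touch within the chosen tolerance.
        return "vdw"
    return "none"


if __name__ == "__main__":
    # A carbon and an oxygen atom placed 3.25 angstroms apart.
    print(classify_contact("C", (0.0, 0.0, 0.0), "O", (3.25, 0.0, 0.0)))
```

Directional interactions such as hydrogen bonds or plane-plane stacking would need angle checks in addition to distances, which this sketch deliberately leaves out.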

PepVEP

The PepVEP project aims to use existing services from UniProt, the EBI Variation team, the Thornton team and PDBe to implement an integrated platform for interpreting the functional effects of variants. The project concentrates mainly on developing a user interface that depicts variation data on protein sequence and structure. PDBe is adding new calls to its existing API to provide structural residue information for the platform, and aims to extend the sequence feature viewer to show these data on PDBe pages.