
Protein Data Bank in Europe Group Background

The PDBe (formerly MSD) group was set up with the aim of creating an autonomous capability in Europe to manage the collection, storage, analysis and dissemination of data on macromolecular structures. This is done in collaboration with database groups from the USA, Japan and elsewhere.

Up until 1998 the repository for macromolecular structure data, the Protein Data Bank (PDB), was maintained by Brookhaven National Laboratory (BNL) in the USA. The PDB franchise was retendered in 1998 and as a result US government support for the PDB was transferred to the Research Collaboratory for Structural Biology (RCSB) during 1999.

Since its inception in 1996, the PDBe project has been planned in distinct phases. The two main priorities in each phase have been developing procedures to aid the deposition of structures to the PDB, and developing a relational database implementation of the PDB that will overcome numerous problems inherent in the existing format.

Phase I

BNL mirror and design of database - Autumn 1996 to Spring 1998

The first staff were appointed to the EBI Macromolecular Structure Database Group at the end of 1996 with funding from the European Union and UK Wellcome Trust. The project has been planned in three phases.

  • A full mirror of the BNL PDB web site was established at EBI.
  • The BNL 'AutoDep' software was adapted at EBI to allow a site remote from BNL to receive depositions of new macromolecular structures.
  • The Data Harvesting concept to simplify deposition of macromolecular structures was developed, and demonstration code written.
  • Collaboration with the EU validation network was established and the Biotech server set up.
  • PQS and Unpublished References services set up.

In addition, the design phase for a relational database version of the PDB was started.

Phase II

Processing PDB entries and building database - Summer 1998 to Winter 1999

As the PDB franchise moved from BNL to the RCSB, the EBI-PDBe group became responsible for processing those entries deposited at the EBI using AutoDep. From 1998 to 1999 the preliminary database design was consolidated and the first implementations drawn up.

  • Processing PDB entries deposited at EBI using AutoDep.
  • First implementations of relational database built. Client software written to populate it from legacy PDB data.
  • Crystallographic software modified to support Data Harvesting.
  • Protocols established for exchange of data with the RCSB.

Phase III

Coupling of processing to database - Spring 2000 to present

Initially, processing PDB entries after deposition via AutoDep was a relatively slow and painstaking process, typically taking half a day per structure. The processing procedure was therefore modularised, and individual modules were developed to use the relational database built during Phase II.

  • AutoDep 3.0 built and released. This version implements data harvesting for CNS and REFMAC.
  • Protocols for exchange of data with the RCSB implemented.
  • Large staff expansion funded to develop advanced clients for the new database.

Document maintained by: Gaurav Sahni