The Protein Data Bank contains a significant number of protein structures that have ligands bound. One of the most important uses of macromolecular structure data is the study of the interactions between macromolecules and bound ligands. Because these interactions are local in nature, ligand binding sites are often more highly conserved across a functional family than the overall structure and fold of the macromolecule. As such, a catalogue of ligand-macromolecule interactions can be invaluable in the wider study of structure-function relationships. The specific three-dimensional environments of the protein binding sites for different ligands have been derived by parsing the PDB entries and loading them into a relational database. Binding site descriptors have been derived and a powerful search and retrieval system has been developed.

   MSDsite is the database and its query interface located at: http://www.ebi.ac.uk/msd-srv/msdsite.

   This project is funded by the EU TEMBLOR award.

Database

   Information about hetero groups is stored in the reference database that is part of MSD.

Data

   MSDsite allows access to the ligand environment in a particular PDB entry and contains coordinates with respect to this entry. The derived bound molecule environments take into account the symmetry in PDB X-ray entries. All possible symmetry operations and translations are considered in order to list all possible protein-ligand contacts. Water coordinates are treated in a similar manner.
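The symmetry treatment described above can be illustrated with a minimal sketch. This is not the MSD implementation. It assumes fractional coordinates, symmetry operators given as a rotation matrix plus a translation, an orthogonal unit cell (so that converting to angstroms is a per-axis scaling), a search limited to the neighbouring unit cells, and a hypothetical 4.0 angstrom contact cutoff. All function names and the example coordinates are made up for illustration.

```python
import itertools
import math

def apply_op(rot, trans, frac):
    """Apply one symmetry operator (3x3 rotation, translation) to fractional coordinates."""
    return tuple(
        sum(rot[i][j] * frac[j] for j in range(3)) + trans[i]
        for i in range(3)
    )

def symmetry_contacts(ligand_atoms, protein_atoms, sym_ops, cell, cutoff=4.0):
    """Return (ligand atom, protein atom, operator index, cell shift, distance) tuples.

    Assumptions: coordinates are fractional, the cell is orthogonal with edge
    lengths cell = (a, b, c) in angstroms, and cutoff is a hypothetical value.
    """
    contacts = []
    # Whole-cell translations to the 26 neighbouring cells plus the home cell.
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    for p_name, p_frac in protein_atoms:
        for op_index, (rot, trans) in enumerate(sym_ops):
            image = apply_op(rot, trans, p_frac)
            for shift in shifts:
                moved = tuple(image[i] + shift[i] for i in range(3))
                for l_name, l_frac in ligand_atoms:
                    # Orthogonal cell: scale each fractional difference by the edge length.
                    d = math.sqrt(sum(((moved[i] - l_frac[i]) * cell[i]) ** 2
                                      for i in range(3)))
                    if d <= cutoff:
                        contacts.append((l_name, p_name, op_index, shift, round(d, 3)))
    return contacts

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
# Made-up example: identity operator only, 10 angstrom cubic cell.
sym_ops = [(IDENTITY, (0.0, 0.0, 0.0))]
ligand = [("ZN", (0.05, 0.5, 0.5))]
protein = [("HIS NE2", (0.90, 0.5, 0.5))]
contacts = symmetry_contacts(ligand, protein, sym_ops, cell=(10.0, 10.0, 10.0))
```

In this example the two atoms are 8.5 angstroms apart within the home cell, so the only contact found is with the copy of the protein atom translated by one cell along the first axis. A contact list built without translations would miss it.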

Bonds



     The following figure shows relations between database tables for storing interactions between small molecules and macromolecules:

Geometry

    Metal coordination geometries are calculated and stored in MSD:
  • tetrahedral
  • trigonal bipyramidal
  • square-based pyramidal
  • octahedral
  • undefined

The metal sites were classified and geometry types assigned based on all valid symmetry-related ligand positions. The angles, distances and deviations that define each metal site geometry are taken from [6].
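A simple way to assign one of these geometry types is sketched below. It is an illustration only: the ideal angles are textbook values, the sorted-angle comparison and the 15 degree RMS tolerance are assumptions made for this example, and the actual definitions from [6] used by MSD are not reproduced here. The function names are hypothetical.

```python
import itertools
import math

# Textbook ideal donor-metal-donor angles in degrees (assumed, not taken from [6]).
IDEAL_ANGLES = {
    "tetrahedral": [109.47] * 6,
    "trigonal bipyramidal": [90.0] * 6 + [120.0] * 3 + [180.0],
    "square-based pyramidal": [90.0] * 8 + [180.0] * 2,
    "octahedral": [90.0] * 12 + [180.0] * 3,
}

def angle(metal, a, b):
    """Angle a-metal-b in degrees for Cartesian coordinates."""
    va = [a[i] - metal[i] for i in range(3)]
    vb = [b[i] - metal[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))

def classify_metal_site(metal, donors, max_rms=15.0):
    """Return the best-matching geometry name, or "undefined".

    max_rms is a hypothetical tolerance on the RMS angle deviation.
    """
    observed = sorted(angle(metal, a, b)
                      for a, b in itertools.combinations(donors, 2))
    best_name, best_rms = "undefined", max_rms
    for name, ideal in IDEAL_ANGLES.items():
        if len(ideal) != len(observed):
            continue  # coordination number does not match this geometry
        rms = math.sqrt(sum((o - i) ** 2
                            for o, i in zip(observed, sorted(ideal))) / len(ideal))
        if rms < best_rms:
            best_name, best_rms = name, rms
    return best_name

metal = (0.0, 0.0, 0.0)
octahedral_donors = [(2, 0, 0), (-2, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 2), (0, 0, -2)]
print(classify_metal_site(metal, octahedral_donors))  # octahedral
```

Comparing sorted angle lists avoids having to decide which donor atom is axial or equatorial, at the cost of ignoring distances. A fuller treatment would also check the metal-donor distances.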

Sequences and ProSite patterns

   ProSite patterns were loaded into PDBe, a matching algorithm was run for all PDB entries, and the matches were stored in the database.

   For all non-standard patterns (those that are not in the ProSite list but comply with the ProSite pattern format), a fast search algorithm was developed and embedded into the Oracle database.
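To show what matching a pattern in ProSite format involves, here is a minimal sketch that translates the pattern syntax into a regular expression. It is not the algorithm embedded in the Oracle database, which is not described here. The function names are hypothetical, the example sequences are made up, and the rare case of an end anchor inside square brackets is not handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a ProSite-format pattern into a Python regular expression."""
    regex = pattern.rstrip(".")            # a trailing period ends the pattern
    regex = regex.replace("-", "")         # '-' only separates pattern elements
    regex = regex.replace("x", ".")        # 'x' matches any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P}: any residue except P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4): repetition count
    regex = regex.replace("<", "^").replace(">", "$")   # N- and C-terminal anchors
    return regex

def find_matches(pattern, sequence):
    """Return (1-based start position, matched text) pairs, overlaps included."""
    # The lookahead lets matches that start inside an earlier match be reported too.
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]

print(prosite_to_regex("[AG]-x(4)-G-K-[ST]."))            # [AG].{4}GK[ST]
print(find_matches("[AG]-x(4)-G-K-[ST].", "MAGESGSGKSTLL"))  # [(3, 'GESGSGKS')]
```

Because the conversion is purely textual, any user-supplied pattern that complies with the format can be searched in the same way as the patterns in the ProSite list.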

Search concept

     The MSDsite search interface starts from the search form. Once a result is found, the search can be amended to retrieve a family of similar PDB entries. Each detail page about a particular PDB entry is therefore also a search form, and so are the statistics pages with their interactive charts. Because every result set can be amended, the front page was designed so that any question that can be asked from the statistics pages or the detail pages can also be specified on the front page form, giving the same answer.

     The second search concept is that search and statistics forms have two parts: ligand search fields and a structure filter. The structure filter restricts the search set by PDB header information such as title, keywords, header class, resolution, release year and author.
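The two-part form can be pictured as a ligand criterion combined with an optional structure filter. The sketch below is purely illustrative: the class names, field names and entry records are hypothetical and do not reflect the MSDsite schema or its query interface, and only three of the header fields are modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entry:
    """A made-up, simplified record for one PDB entry."""
    pdb_id: str
    title: str
    resolution: float
    release_year: int
    ligands: List[str] = field(default_factory=list)

@dataclass
class StructureFilter:
    """Header-based restrictions; a field left as None is not applied."""
    title_contains: Optional[str] = None
    max_resolution: Optional[float] = None
    release_year: Optional[int] = None

    def accepts(self, entry):
        if self.title_contains and self.title_contains.lower() not in entry.title.lower():
            return False
        if self.max_resolution is not None and entry.resolution > self.max_resolution:
            return False
        if self.release_year is not None and entry.release_year != self.release_year:
            return False
        return True

def search(entries, ligand_code, structure_filter=None):
    """Return IDs of entries that contain the ligand and pass the structure filter."""
    structure_filter = structure_filter or StructureFilter()
    return [e.pdb_id for e in entries
            if ligand_code in e.ligands and structure_filter.accepts(e)]

# Placeholder entries with invented identifiers.
entries = [
    Entry("0aaa", "Example hydrolase with zinc", 1.8, 2001, ["ZN"]),
    Entry("0bbb", "Example kinase with ATP", 2.5, 2002, ["ATP", "MG"]),
    Entry("0ccc", "Example hydrolase, low resolution", 3.1, 2001, ["ZN"]),
]
hits = search(entries, "ZN", StructureFilter(max_resolution=2.0))
```

Keeping the filter separate from the ligand criterion means the same filter can be reused unchanged when the ligand part of the query is amended.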

Presentation concept

     We now consider the user interface: what can be presented, and what is the relative weight of each item? In its early days the internet was very sensitive to data volume, but since the high-tech revolution of 1995 - 1996 we can benefit from very fast internet connections, at approximately the same speed as an intranet. Most users of MSDsite are probably academic institutes, which have fast internet access but do not always have enough computing power, so we can include many pregenerated images in the web interface without slowing it down. Presenting ligands requires 3D graphics. Applets could help here, but their use is becoming more and more restricted: most modern browsers come without Java, which has to be installed separately. The only solution is to support several 3D visualization tools, so MSDsite provides Rasmol [10] script output and the EBI-Astex Viewer [11] applet. In summary, the MSDsite interface presents HTML tables, the most popular format on the web, with many pop-up images and links to several 3D visualization tools.

In more detail:

Interactive sortable list of PDB entries

    The list presents descriptions of PDB entries and has the following functionality:
  1. Links to the detail pages for an entry
    Sequence view
    presents the parts of the chain sequence that match ProSite patterns or the searched pattern, or that interact with a ligand.
    Ligands view
    presents ligands and their interactions with their environment. The ProSite or searched pattern is also shown where the matched part of the sequence interacts with a ligand.
    Atomic interaction view
    presents interactions at the atomic level: bond type, distance, angle between planes, and angles for a single-atom ligand.
  2. Links to 3D visualization tools
  3. Link to the integrated resource: atlas pages (MSDlite, MSDPro)
  4. The list can be browsed and sorted interactively.

Interactive statistics charts

Ligand binding statistics
Provides a way to view statistics on the interactions of particular ligand(s) with the residues that form the active site environment. The environment is the set of neighbouring residues that interact with the ligand through a number of interaction types. Two sets of ligand information can be viewed on the same graph, allowing the environments of these ligands to be compared.
Environment binding statistics
Provides a way to view statistics on the environment of a ligand with respect to ligands. Queries can therefore be defined by residue type to find which ligands are bound to this environment.
Atomic bonds statistics
A search is available to present and compare statistics for the interaction of a ligand - macromolecule residue pair at the atomic level.
Sequence patterns binding statistics
Provides a way to search PDBe for statistics on the interaction of a sequence pattern with ligands. Queries can therefore be defined by pattern to find which ligands are bound to this pattern.
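
The ligand and environment statistics are two views of the same contact data, counted in opposite directions. The sketch below illustrates this under stated assumptions: contacts are given as (ligand code, residue name, interaction type) rows, the rows shown are invented for the example, and the function names are hypothetical rather than part of MSDsite.

```python
from collections import Counter

def residue_counts(contact_rows, ligand):
    """Ligand binding view: how often each residue type contacts the given ligand."""
    return Counter(res for lig, res, _ in contact_rows if lig == ligand)

def ligand_counts(contact_rows, residue):
    """Environment binding view: how often each ligand contacts the given residue type."""
    return Counter(lig for lig, res, _ in contact_rows if res == residue)

# Invented rows: (ligand code, residue name, interaction type).
contact_rows = [
    ("ZN", "HIS", "coordination"),
    ("ZN", "HIS", "coordination"),
    ("ZN", "CYS", "coordination"),
    ("MG", "ASP", "coordination"),
    ("MG", "HIS", "coordination"),
    ("ATP", "LYS", "hydrogen bond"),
]

zn_env = residue_counts(contact_rows, "ZN")
mg_env = residue_counts(contact_rows, "MG")
# Side-by-side comparison of two ligands, as on a shared chart.
comparison = {res: (zn_env[res], mg_env[res])
              for res in sorted(set(zn_env) | set(mg_env))}
his_ligands = ligand_counts(contact_rows, "HIS")
```

A Counter returns zero for a residue type it has not seen, which keeps the side-by-side comparison simple when one ligand never contacts a residue that the other does.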

Interactive list of ligands from different PDB entries

The purpose of this page is to provide functionality for the 3D alignment of ligands with their environment. The detail pages for a PDB entry link to this page, so any ligand can be submitted here and its geometry, environment and interacting patterns compared with those of another ligand.