Chemical Components in the PDB

pdbe.org/chem

PDBeChem,
PDB Chemical Components

Introduction

The Chemical Component Dictionary service provides web access to the wwPDB "Chemical Component Dictionary" as it is loaded in the PDBe database at EBI.
This dictionary is part of the core "reference" information of the PDBe relational database and is consistently referenced by all macromolecular structures for all bound molecules as well as standard and modified amino acids.
Since every residue and every atom in the PDBe database references a ligand and an atom in this dictionary, this is the repository that defines the link between proteins and chemistry.

Chemical component in PDBeChem

The term chemical component (or ligand) refers to the distinct chemical entity of a stereoisomer of a small molecule or monomer.
This means that structural isomers, geometric isomers, and enantiomers (but not conformational isomers) are distinct chemical components (ligands) in PDBeChem. The properties that define the chemical identity of a ligand are:
  • atoms (including hydrogens) and atom elements,
  • bonds and bond orders, and
  • atom and bond stereo descriptors.
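
The three identity properties listed above can be pictured as a small record type. The sketch below is only an illustration, not the PDBe schema: the `Component` class, its field layout, and the `same_connectivity` helper are hypothetical, and the alanine atom names are used purely as example data. It shows why two enantiomers count as distinct components even though their atoms and bonds are identical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical record holding the properties that define chemical identity."""
    atoms: FrozenSet[Tuple[str, str]]       # (atom name, element), hydrogens included
    bonds: FrozenSet[Tuple[str, str, int]]  # (atom name, atom name, bond order)
    stereo: FrozenSet[Tuple[str, str]]      # (atom or bond label, stereo descriptor)

    def same_connectivity(self, other: "Component") -> bool:
        # Compare atoms and bonds only, ignoring the stereo descriptors.
        return self.atoms == other.atoms and self.bonds == other.bonds


# Example data: alanine, C3 H7 N O2 (atom names are illustrative).
heavy = {"N": "N", "CA": "C", "C": "C", "O": "O", "OXT": "O", "CB": "C"}
hydrogens = ["H", "H2", "HA", "HB1", "HB2", "HB3", "HXT"]
ATOMS = frozenset(heavy.items()) | frozenset((name, "H") for name in hydrogens)
BONDS = frozenset({
    ("N", "CA", 1), ("N", "H", 1), ("N", "H2", 1),
    ("CA", "C", 1), ("CA", "CB", 1), ("CA", "HA", 1),
    ("C", "O", 2), ("C", "OXT", 1), ("OXT", "HXT", 1),
    ("CB", "HB1", 1), ("CB", "HB2", 1), ("CB", "HB3", 1),
})

# The two enantiomers differ only in the descriptor of the alpha carbon.
l_alanine = Component(ATOMS, BONDS, frozenset({("CA", "S")}))
d_alanine = Component(ATOMS, BONDS, frozenset({("CA", "R")}))

print(l_alanine == d_alanine)                  # False: distinct chemical components
print(l_alanine.same_connectivity(d_alanine))  # True: same atoms and bonds
```

Comparing with `same_connectivity` corresponds to a comparison that ignores stereochemistry, under which a molecule also matches its stereoisomers.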

Ligands in wwPDB

The ligand dictionary is not an isolated effort of the PDBe group. The fundamental parts of the dictionary are exchanged on a weekly basis with the collaborators of the international wwPDB (RCSB, PDBj) in the form of mmCIF chem_comp_group files and are kept in sync with the PDB archive. During this process, new ligands are processed manually and semi-automatically by the wwPDB members before they receive official 3-letter code identifiers in the PDB.

How to search with PDBeChem

There is a wide range of possibilities for searching and exploring the dictionary.
  • Code: This is the PDB 3-letter code for the ligand (e.g. ATP). You may also select the "like" operator while searching for ligand codes. This search field is auto-completed with codes as you type.
  • Molecule name: An expression or word that is part of any known molecule name (standard name, common name, systematic name). Examples:
    • 'amino',
    • 'galactose',
    • 'Ethyl'.
    You may use the '=' operator for an exact match, or the 'like' operator to search for your input as a substring. This field is auto-completed with molecule names, starting with a minimum of two input characters.
  • Formula: An expression that sets range constraints on the number of atoms of each element. The value that you have to provide is of the form [<E><n>-<m> ]*, where <E> is an element, <n> is the minimum number and <m> is the maximum number of times that the element may appear in the formula. The order in which the elements are given is not important. For example, if you want to find ligands that have between 10 and 15 carbons, 3 nitrogens and one oxygen, you should give 'C10-15 N3 O1'. Other examples:
    • 'CL3 N0' finds molecules with exactly 3 chlorines and no nitrogens,
    • 'C40-50 N5-10 O5-15 S1-10' finds molecules with 40-50 carbons, 5-10 nitrogens, 5-15 oxygens and 1-10 sulfurs.
    By clicking on the button next to the item, you may use the formula range editor to build your formula expression interactively. You may also use the '=' operator for an exact formula match.
  • Non-stereo SMILES: For structure-based searches. By clicking on the edit button, a form appears that allows you to specify a molecule or a molecule segment using one of three options:
    • Draw the molecule using the JME Molecular editor.
    • Upload a standard chemical file such as Mol2, SDF or PDB into the JME editor. You may use any file type and format accepted by ChemAxon MolConvert.
    • Give the standard code (e.g. ATP) of a ligand that already exists in the database in order to load it into the JME editor. This field is auto-completed as you type.
    After you load a ligand you may also modify it. For example, if you are looking for ligands similar to ATP, you may load ATP into the JME editor and then remove some atoms and bonds, keeping just the substructure you are interested in. As soon as a molecule or molecule segment is specified, you may use it to search the dictionary. All these search operations ignore stereochemistry, so a molecule will also match its stereoisomers. Additionally, aromatic bonds are treated as single-double, so in the case of aromatic rings there may also be some false positives. Finally, you can choose to discard bond orders, or to treat your input as ring-strict, by using the check-boxes on the editor page.
  • Fragments: Similar to the formula search, but the search items are chemical fragments rather than chemical elements. An expression sets range constraints on the number of occurrences of each fragment. The value that you have to provide is of the form [<E>:<n>-<m> ]*, where <E> is a fragment, <n> is the minimum number and <m> is the maximum number of times that the fragment may be contained in the molecule. The order in which the fragments are given is not important. For example, if you want to find ligands that have one or two adenine groups and a furan ring, you should give 'adenine:1-2 furan:1'. The library of chemical fragments is predefined and includes about 84 fragments, and the fragment expression search is quite fast. By clicking on the button next to the item, you may use the fragment pattern editor, which is in practice the easiest way to build a fragment expression.
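
The range expressions used by the Formula and Fragments searches can be sketched in a few lines of Python. This is a minimal illustration of the syntax described above, not the PDBeChem implementation: the names `parse_ranges` and `satisfies` are hypothetical, ranges are assumed to be inclusive, names are assumed to be case-insensitive, and elements or fragments that are not mentioned in the expression are assumed to be unconstrained.

```python
import re
from typing import Dict, Tuple

# One term: a name, an optional ':' separator, a minimum, and an optional '-maximum'.
TERM = re.compile(r"^([A-Za-z]+):?(\d+)(?:-(\d+))?$")


def parse_ranges(expression: str) -> Dict[str, Tuple[int, int]]:
    """Turn 'C10-15 N3 O1' into {'C': (10, 15), 'N': (3, 3), 'O': (1, 1)}."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for term in expression.split():
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"cannot parse term {term!r}")
        name, low, high = match.groups()
        minimum = int(low)
        # A term without '-<m>' means an exact count.
        maximum = int(high) if high is not None else minimum
        if minimum > maximum:
            raise ValueError(f"empty range in term {term!r}")
        ranges[name.upper()] = (minimum, maximum)
    return ranges


def satisfies(counts: Dict[str, int], ranges: Dict[str, Tuple[int, int]]) -> bool:
    """True if every constrained name occurs within its range (missing names count as 0)."""
    normalised = {name.upper(): n for name, n in counts.items()}
    return all(lo <= normalised.get(name, 0) <= hi for name, (lo, hi) in ranges.items())


# ATP has the formula C10 H16 N5 O13 P3.
atp = {"C": 10, "H": 16, "N": 5, "O": 13, "P": 3}
print(satisfies(atp, parse_ranges("C10-15 N5 P3")))  # True
print(satisfies(atp, parse_ranges("CL3 N0")))        # False
```

The same parser accepts the fragment form, so `parse_ranges("adenine:1-2 furan:1")` yields a range of 1 to 2 for adenine and exactly 1 for furan.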

Functionality of PDBeChem

The PDBeChem service offers a generic browsing interface to all areas of the ligand dictionary. You may follow the links that are available from every record in order to navigate through the relationships of the dictionary.
For example, you may follow a relationship link to view the atoms of a ligand and then, for a particular atom, its bonds and energy types, and so on.
The "complete" link provides a single page with all the information available (including coordinates and energy types).
There is additional functionality provided for ligands. From a ligand page you may also:
  • 3-D View: Choose the set of coordinates you want to use (i.e. idealised or PDB) to view in Jmol.
  • File Export: Choose the set of coordinates you want to use (i.e. idealised or PDB) and the export format (PDB, SDF, mmCIF or CML).
  • PDB entries: Follow links to the atlas pages of the entries that include this ligand.
  • Stereoisomers: This will provide all the stereoisomers of the current molecule, if any are available.

PDBeChem back-end

The database that is accessible by the service is the PDBe database, which is based on the wwPDB archive. Additionally, the dictionary classifies the atoms of the ligands into energy types and associates them with the energy types reference dictionary for different sets of libraries (different classification sets).
Please contact the PDBe group with suggestions, comments or problem reports. Your input is very helpful.

Derived information

Several external programs (CACTVS, VEGA) are also used for the ligand dictionary in order to provide derived information such as:
  • GIF images
  • Atom energy types

The derivation of this information is performed by the PDBe group.

Citing PDBeChem

  • UNIT 14.3: Using PDBeChem to Search the PDB Ligand Dictionary
    Dimitropoulos, D., Ionides, J. and Henrick K. (2006) In Current Protocols in Bioinformatics
    (A.D. Baxevanis, R.D.M. Page, G.A. Petsko, L.D. Stein, and G.D. Stormo, eds.) pp 14.3.1-14.3.3 John Wiley & Sons, Hoboken, N. J. ISBN: 978-0-471-25093-7

  • MSDsite: behind the scene: The technology used in database searching and retrieval for the analysis and viewing of bound ligands and active sites.
    Golovin, A., Dimitropoulos, D., Oldfield, T. and Henrick, K. (2004) The eCheminfo 2004 Conference "Applications of Cheminformatics and Modelling to Drug Discovery", 8-19 November.

  • MSD database and MSD database services
    A. Golovin, T. J. Oldfield, J. G. Tate, S. Velankar, G. J. Barton, H. Boutselakis, D. Dimitropoulos, J. Fillon, A. Hussain, J. M. C. Ionides, M. John, P. A. Keller, E. Krissinel, P. McNeil, A. Naim, R. Newman, A. Pajon, J. Pineda, A. Rachedi, J. Copeland, A. Sitnov, S. Sobhany, A. Suarez-Uruena, J. Swaminathan, M. Tagari, S. Tromm, W. Vranken and K. Henrick (2004) E-MSD: an integrated data Nucleic Acids Research, 32 (Database issue), D211-D216. 2004

The following methods and packages have also been used for PDBeChem