Until around 1990, when there were fewer than 1000 entries in the Protein Data Bank (PDB), many structural biologists were intimately familiar with many of the structures in the PDB. In those early days, the 4-character PDB ID code often had "meaning" - for instance, 1hhb was easily recognized as containing the structure of human haemoglobin, 1frd was ferredoxin and 1tim was triosephosphate isomerase. Nowadays, with tens of thousands of entries in the PDB, the more or less arbitrarily assigned PDB ID codes no longer carry any meaning. However, there are many cases where it would be very useful to have a way of obtaining at-a-glance information about the salient features of one or more PDB entries. This is the case when you find cross-references in other databases to PDB entries, when you view the hits obtained in a database search of the PDB, or when a fold-comparison program returns a list of PDB entries that contain proteins with a similar fold to your structure. This is where PDBlogos and PDBprints come into play.
What are PDBlogos?
PDBlogos are stylised icons that convey important information about a PDB entry. For example, the PDBlogos below signify that the biomacromolecule in an entry derives from a fungus and that the structure was determined by X-ray crystallography.
By default, PDBlogos are shown on a (colour-blind-friendly) "EMBL-green" background (although this may be a different colour on external websites that use them). However, sometimes the background will be grey - this signifies either that the feature symbolised by the PDBlogo is absent, or that the underlying data are not available, not published, or not deposited. For instance, the PDBlogos below show that an entry does not contain any nucleic acid molecules and that the structure has not (yet) been published:
What are PDBprints?
In order to make the interpretation of the information conveyed by PDBlogos easier and to provide consistent information when a number of PDB entries are compared, PDBe has also developed PDBprints. A PDBprint for a PDB entry is a collection of PDBlogos displayed in a specific order, where each icon represents a well-defined category of information. In the first release of PDBprints (summer 2010) the following categories are included (in this order):
- Primary citation: has the PDB entry been published?
- Taxonomy: what is the source organism of the biomacromolecule(s) in the entry?
- Sample-production technique: how was the sample of the biomacromolecule(s) obtained?
- Structure-determination method: which experimental technique(s) were used to determine the structure, and were the experimental data deposited?
- Protein content: does the entry contain any protein molecules?
- Nucleic acid content: does the entry contain any nucleic acid molecules (DNA, RNA or a hybrid)?
- Heterogen content: does the entry contain any ligands (such as inhibitors, cofactors, ions, metals, etc.)?
To give an example, the PDBprint for PDB entry 1cbs looks as follows:
This PDBprint shows immediately that this is a published crystal structure of a heterologously expressed human protein in complex with a ligand, for which the experimental diffraction data were deposited.
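The fixed category order and the green/grey background rule described above can be pictured as a small data structure. The following Python sketch is purely illustrative and is not PDBe code: the names `Category`, `Logo` and `make_print` are hypothetical, the value strings are free-text paraphrases of the 1cbs description rather than official PDBlogo values, and the nucleic acid category is assumed to be absent for 1cbs because the description mentions only a protein and a ligand.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Category(Enum):
    """The seven PDBprint categories, defined in their display order."""
    PRIMARY_CITATION = "Primary citation"
    TAXONOMY = "Taxonomy"
    SAMPLE_PRODUCTION = "Sample-production technique"
    STRUCTURE_DETERMINATION = "Structure-determination method"
    PROTEIN_CONTENT = "Protein content"
    NUCLEIC_ACID_CONTENT = "Nucleic acid content"
    HETEROGEN_CONTENT = "Heterogen content"


@dataclass(frozen=True)
class Logo:
    """One icon in a PDBprint (hypothetical model, not the PDBe implementation)."""
    category: Category
    value: Optional[str]  # None: feature absent, or data not available/published/deposited

    @property
    def background(self) -> str:
        # Grey signals "absent or unavailable"; green is the default background.
        return "grey" if self.value is None else "green"


def make_print(values: Dict[Category, Optional[str]]) -> List[Logo]:
    """Return one Logo per category, always in the fixed category order."""
    # Iterating over an Enum yields its members in definition order.
    return [Logo(category, values.get(category)) for category in Category]


# Values paraphrased from the description of 1cbs in the text.
# Nucleic acid content is left out on purpose (assumed absent).
print_1cbs = make_print({
    Category.PRIMARY_CITATION: "published",
    Category.TAXONOMY: "human",
    Category.SAMPLE_PRODUCTION: "heterologously expressed",
    Category.STRUCTURE_DETERMINATION: "X-ray crystallography, data deposited",
    Category.PROTEIN_CONTENT: "protein present",
    Category.HETEROGEN_CONTENT: "ligand present",
})

for logo in print_1cbs:
    shown = logo.value if logo.value is not None else "absent or unavailable"
    print(f"{logo.category.value}: {shown} ({logo.background} background)")
```

Because every PDBprint built this way has the same seven slots in the same order, two entries can be compared position by position, which is the point of displaying the icons in a fixed sequence.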
What can you tell about PDB entry 2kvy from its PDBprint?
Check your interpretation using the summary page for 2kvy.
In any PDBprint, the individual PDBlogos, the PDB ID code and the PDBe icon are all clickable links that will take you to a page that contains further information about the corresponding entry. If you move your mouse over a PDBlogo without clicking on it, a tooltip will be displayed that explains what information the icon conveys, for instance:
You can try out these features with the PDBprints for 1cbs and 2kvy shown above.
To appreciate how useful PDBprints are when viewing a list of database hits, click here to see the results of a live search of the PDB for the term "lambda repressor" (and click the "+ Other entries" button), or here for all PDB chains mapping to repressor protein CI in lambda phage. It is immediately obvious which entries in the results list are NMR structures, which entries contain protein-DNA complexes, which entries contain human proteins, and so on. The underlying information is retrieved in real time from the PDBe database and is therefore always up-to-date. For instance, once the paper describing an entry is published, the first PDBlogo will change from a grey to a coloured background as soon as the publication details have been updated in the PDB.
The idea for PDBlogos and PDBprints was inspired by the logos used by John Overington on his ChEMBLog to convey information about newly approved drugs.
PDBprints are already incorporated in popular web-based resources such as the Electron Density Server (EDS) at Uppsala, Pfam at the Sanger Institute, and the PDBe atlas pages.
Further information
Below are some links providing more information and examples:
- A table containing a detailed description of each of the categories of information in a PDBprint, the values it can assume and the corresponding PDBlogos.
- Instructions for people who want to include PDBprints in their own webpages.
- A sample page showing some of the rendering options for PDBprints (developers may wish to have a look at the HTML source of this page).
- A set of 100 random PDBprints in various image-sizes: 36px*36px, 48px*48px, 64px*64px and 128px*128px.
- PDBprints flyer (pdf).