UniPDB - a UniProt-PDB sequence-coverage widget
The UniPDB (pdbe.org/unipdb) widget graphically brings together the sequence information from UniProt, protein families (if any) from Pfam and 3D structures from the PDB. This helps biologists assess the availability and extent of 3D structural coverage of a protein of interest.
Check the structural coverage of your favourite protein:
Provide a UniProt code (by name, e.g. NGF_MOUSE, or by accession, e.g. P01139) and hit the "enter" key or click the "Go!" button.
Or click on the following links to try some representative examples:
- P38398 (human BRCA1 protein)
- P01031 (human complement C5)
- P03023 (Lac repressor from E. coli)
- P29373 (human cellular retinoic acid-binding protein type 2)
- P01139 (beta-nerve growth factor from mouse)
- P22364 (amicyanin from P. denitrificans)
- Q07412 (triosephosphate isomerase from P. falciparum)
- Q16576 (human histone-binding protein RBBP7)
- P01308 (human insulin)
- A5YV76 (fatty acid synthase from pig)
- P03372 (human estrogen receptor)
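The examples above are all given as accessions, but a UniProt name such as NGF_MOUSE works as well. As a small illustration of the difference, the sketch below tells the two forms apart. The accession pattern is the one published in the UniProt help pages; the entry-name pattern, the case normalisation and the helper name classifyUniProtCode are assumptions made for this example and are not part of UniPDB.

```javascript
// Accession pattern as published in the UniProt help pages.
const ACCESSION = /^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$/;
// Assumed shape of an entry name such as NGF_MOUSE:
// a mnemonic, an underscore, then a species code.
const ENTRY_NAME = /^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$/;

// Hypothetical helper, not part of UniPDB.
function classifyUniProtCode(code) {
  const c = code.trim().toUpperCase(); // assumption: case is not significant
  if (ACCESSION.test(c)) return "accession";
  if (ENTRY_NAME.test(c)) return "name";
  return "unknown";
}

console.log(classifyUniProtCode("P01139"));    // accession
console.log(classifyUniProtCode("NGF_MOUSE")); // name
```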
The UniPDB widget can be used from this page, or it can be included in your own web pages.
The PDBe Atlas pages for many PDB entries contain a table with all the UniProt sequences present in that entry. For each UniProt entry, a button labelled "UniProt coverage" will launch UniPDB for that sequence. You can also access this page from the "PDBe Tools" menu on the PDBe front page.
Why UniPDB?
UniProt is the primary resource for information about protein sequence and function and the PDB is the global archive of 3D structures of biomacromolecules and their complexes. Many PDB entries contain proteins that are also archived in UniProt. As a biologist interested in a particular protein, you may want to find out which entries in the PDB (if any) contain structures for (parts of) your favourite protein, and how these structures map to the sequence.
Structural biologists often work on partial sequences (e.g., stably folded domains) and sometimes have to modify the natural sequence to facilitate expression or crystallisation or to allow investigation of the effect of a mutation on the behaviour of the protein (such as catalytic activity or ligand-binding specificity). In addition, the same structure can be determined many times, e.g. in different laboratories, under different conditions, with different ligands, etc. For these reasons, it is not always easy to do sequence-based searches of the PDB and synthesise the results into an overview of what structural information is available for which parts of your favourite protein.
The UniPDB widget addresses this problem. It uses the mapping data between UniProt sequences and PDB entries provided by the SIFTS resource (a collaboration between the UniProt and PDBe teams at the EBI). SIFTS also contains mappings from PDB entries to other bioinformatics resources, including Pfam (sequence-based protein domains), CATH and SCOP (both structural fold classifications).
How to use UniPDB
The only piece of information you need to supply as a user is a UniProt code such as Q16576 or RBBP7_HUMAN. If you don't know the UniProt code of your protein, you can use the UniProt search tools.
The widget has an information and control bar at the top which contains:
- The name, species and UniProt code of the selected protein.
- A button labelled "Related PDB sequences" - if you click on it, a new tab will open showing the results of a sequence search against the entire PDB. Using the PDBeXplore browser, you can analyse in detail all the PDB entries that contain a protein with a sequence similar to that of your favourite protein. This is a quick way of finding orthologous and paralogous sequences.
- A button labelled "Download results" - this will open a new tab that contains all the mapping information in textual form, with all items tab-separated. This is useful if you want to include the details in a report or carry out further analysis of the PDB entries.
- On the right there is an input box in which you can type a new UniProt code and then hit the "enter" key to launch UniPDB for that new protein.
- On the far right is a question mark icon - click on it to get to this very page.
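Because the "Download results" output is tab-separated, it is easy to process further. The following is a minimal sketch of splitting such text into rows and fields. The helper name parseTabSeparated and the sample text are placeholders invented for this example; they do not reflect the actual columns of the UniPDB output, which are not described here.

```javascript
// Hypothetical helper, not part of UniPDB:
// splits tab-separated text into an array of rows, each an array of fields.
function parseTabSeparated(text) {
  return text
    .split(/\r?\n/)                  // one row per line
    .filter((line) => line !== "")   // skip empty lines (e.g. trailing newline)
    .map((line) => line.split("\t"));
}

// Placeholder input, NOT real UniPDB output.
const sample = "fieldA\tfieldB\tfieldC\nvalue1\tvalue2\tvalue3\n";
const rows = parseTabSeparated(sample);

console.log(rows.length); // 2
console.log(rows[1][2]);  // value3
```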
Below the top bar is a graphical representation of the entire sequence of the selected UniProt entry, including a ruler to indicate residue numbering. The UniProt code is on the left; clicking on it takes you to the UniProt page for that protein. Hovering over the green information icon opens a pop-up window that provides more information about the sequence and any Pfam domains that occur in it. The Pfam domains are also shown graphically, and clicking on one takes you to the corresponding Pfam page for that domain.
The rest of the widget contains the graphical display of the coverage of the UniProt sequence by PDB entries. There is one row per entry which contains, from left to right:
- The PDB code (e.g., 1cbs) - clicking on it will take you to the Atlas page for that PDB entry.
- Four small PDBlogos that convey more information about the PDB entry, namely:
  - Is a publication describing this structure available? (green icon means 'yes', grey icon means 'no')
  - Which structure-determination technique was used (e.g., X-ray or NMR) and is the experimental data available? (green icon means 'yes', grey icon means 'no')
  - Is there any DNA or RNA in the entry? (green icon means 'yes', grey icon means 'no')
  - Are there any small molecules present in the entry? (green icon means 'yes', grey icon means 'no')
- A green information icon - if you hover over it, more information about the mapping will be displayed.
- A green download icon - click on this icon to download the text file for this UniProt or PDB entry. Depending on your browser settings, the file may open in a new tab or a new window.
- One or more coloured bars show which part(s) of the UniProt sequence were observed in the PDB entry. Each bar has one or more capital letters in it - these are the names of the chains in the PDB entry that map (in part) to the UniProt sequence.
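To make the idea of "coverage" concrete, the sketch below computes how many residues of a UniProt sequence are covered by a set of observed segments, merging overlapping or adjacent segments so that residues seen in several chains or entries are counted once. It only illustrates the concept and is not the widget's own code: the function names and the [start, end] segment representation (1-based, inclusive residue numbers) are assumptions made for this example.

```javascript
// Illustrative only. Each segment is [start, end] in UniProt residue
// numbering (1-based, inclusive). Returns the number of distinct residues
// covered by at least one segment.
function coveredResidues(segments) {
  const sorted = segments
    .map(([start, end]) => [start, end])
    .sort((a, b) => a[0] - b[0]);
  let total = 0;
  let curStart = null;
  let curEnd = null;
  for (const [start, end] of sorted) {
    if (curEnd === null || start > curEnd + 1) {
      // Gap before this segment: close the current merged run, start a new one.
      if (curEnd !== null) total += curEnd - curStart + 1;
      curStart = start;
      curEnd = end;
    } else if (end > curEnd) {
      // Overlapping or adjacent: extend the current merged run.
      curEnd = end;
    }
  }
  if (curEnd !== null) total += curEnd - curStart + 1;
  return total;
}

// Fraction of the full sequence that has structural coverage.
function coverageFraction(segments, sequenceLength) {
  return coveredResidues(segments) / sequenceLength;
}

// Made-up segments for a hypothetical 100-residue protein.
console.log(coveredResidues([[1, 10], [5, 20], [30, 40]])); // 31
```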
How to include UniPDB in your own web pages
It is easy to include the UniPDB widget in your own web pages. Just add the following code:
In the HTML body of your page, add a div or span at a suitable place with a unique id to host the UniPDB widget:
Call UniPDBwidget() at a suitable place in your webpage, e.g. in the window.onload function:
The arguments to UniPDBwidget() are straightforward and should all be enclosed in a hashmap:
- hostelem is the id of the HTML element (e.g. div or span) in which the widget is to be displayed.
- height is the height in pixels of the widget. The height is reduced if the content needs less space. It should be at least 200 for the widget to render properly.
- width is the width in pixels of the widget. It should be at least 900 for the widget to render properly.
- uniprot is the UniProt accession code (e.g. P01139) or name (e.g. NGF_MOUSE) for which the PDB coverage is to be displayed.
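As a rough sketch of how these arguments fit together, the example below builds the hashmap and checks the minimum sizes stated above before handing it to the widget. It assumes the UniPDB scripts have already been loaded in the page and that a div or span with the id unipdb_host exists. That id and the helper buildUniPDBOptions are made up for illustration; only hostelem, height, width and uniprot come from the description above.

```javascript
// Hypothetical helper, not part of UniPDB: builds the argument hashmap
// and enforces the documented minimum sizes.
function buildUniPDBOptions(hostelem, uniprot, width = 900, height = 200) {
  if (typeof hostelem !== "string" || hostelem.length === 0) {
    throw new Error("hostelem must be the id of a div or span in the page");
  }
  if (width < 900) throw new Error("width should be at least 900 pixels");
  if (height < 200) throw new Error("height should be at least 200 pixels");
  return { hostelem, height, width, uniprot };
}

// "unipdb_host" is an assumed element id chosen for this example.
const options = buildUniPDBOptions("unipdb_host", "P01139", 950, 400);

// In a browser page that has loaded the UniPDB scripts, start the widget
// once the page has loaded. The guard lets this snippet run elsewhere too.
if (typeof window !== "undefined" && typeof UniPDBwidget === "function") {
  window.onload = function () {
    UniPDBwidget(options);
  };
}
```

Checking the sizes up front gives a clear error message instead of a widget that fails to render properly.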
Acknowledgements
In the development of UniPDB, we have made use of the following:
- The Pfam domain graphics library for canvas-based 2D-drawings of domains
Feedback
The first release of UniPDB is a no-frills widget that we hope will be easy to use and invaluable in terms of the information and links it provides. We welcome suggestions for additional features - let us know what we could do to make the widget even more useful for you. Please use the feedback button at the top of the page for any feature requests or other comments.
If you should encounter a bug, it would be very helpful if you could provide us with a brief description in the feedback form or, even better, e-mail us a screenshot at firstname.lastname@example.org.