Q. 1. Which amino acids contain chiral carbon atoms? Are there any amino acids that contain more than one chiral carbon atom? If so, which one(s)?
The CA atom of all residues except glycine (Gly) is chiral (L-amino acids, remember?). In addition, the CB atoms of threonine (Thr) and isoleucine (Ile) are chiral, so these two amino acids each contain two chiral carbon atoms.
Q. 2. Currently there are 22 (rather than 20) known naturally occurring, genetically encoded amino acids. Number 21 has one-letter code U and number 22 has one-letter code O. What are the names and three-letter codes of these two amino acids? How many PDB entries contain at least one of them?
Amino acid number 21 is selenocysteine (SEC, U) and number 22 is pyrrolysine (PYL, O). In October 2019 there were 53 entries containing SEC and 5 entries containing PYL.
Q. 3. Do you expect the CB atom of a tyrosine residue to lie in the same plane as the aromatic ring?
Yes. The carbon atoms in the aromatic ring are in one plane with the atoms bonded to them, which includes the CB and OH atoms.
Q. 4. Using your favourite graphics program or web-based 3D viewer, have a look at residue TRP D67 in PDB entry 7GPB. Does anything strike you as odd?
Yes. The two rings of the tryptophan sidechain are not co-planar, but are essentially perpendicular to each other! The wwPDB validation report shows that there are geometry issues with this residue. In fact, there are many validation issues. 7GPB is a relatively old structure published in 1991. Unfortunately, deposition of structure factors was not mandatory at that time, so it is not possible for the structure to be cleaned up by PDB_REDO and there are no electron-density maps.
Q. 5. The three most-densely populated areas in the Ramachandran plot are called the alpha, the beta, and the left-handed helical region. Where are these three regions located approximately in the Ramachandran plot?
- Alpha: the only region that is dark blue (near -70, -50).
- Beta: top-left (near -100, 130).
- Left-handed helical: region with positive phi (near +50, +30).
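As a rough illustration of these locations, the sketch below assigns a (phi, psi) pair to whichever of the three approximate centres listed above is closest. This is only a crude nearest-centre rule, not how validation software defines favoured regions; the names `CENTRES`, `angle_diff` and `nearest_region` are made up for this example.

```python
# Approximate region centres (phi, psi) in degrees, taken from the answer above.
CENTRES = {
    "alpha": (-70.0, -50.0),
    "beta": (-100.0, 130.0),
    "left-handed helical": (50.0, 30.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_region(phi, psi):
    """Return the region whose approximate centre is closest to (phi, psi).

    Assumption: a simple Euclidean distance on periodic angles; real
    Ramachandran regions have irregular shapes.
    """
    def dist(centre):
        d_phi = angle_diff(phi, centre[0])
        d_psi = angle_diff(psi, centre[1])
        return (d_phi ** 2 + d_psi ** 2) ** 0.5

    return min(CENTRES, key=lambda name: dist(CENTRES[name]))


print(nearest_region(-60.0, -45.0))
print(nearest_region(-120.0, 140.0))
print(nearest_region(60.0, 40.0))
```

Because torsion angles are periodic, the distance is computed with wrap-around, so a point such as (-120, -175) is still assigned to the beta region.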
Q. 6. Why do glycine residues have an atypical distribution? And proline residues?
- Glycines have only a hydrogen atom as a "side-chain" and are therefore much less "sterically disadvantaged", allowing them to assume a much wider range of phi, psi combinations than other residue types.
- Prolines, on the other hand, are much more restricted due to the ring that includes CA and N.
The original Ramachandran analysis excluded glycine and proline residues. Recent implementations typically use separate analyses for glycine and proline. MolProbity extends this by separating out cis-Pro and trans-Pro and adding a pre-proline category.
Q. 7. Which regions would you expect to be most favourable in the Ramachandran plot of a protein that consists entirely of D-amino acids?
L- and D-amino acids are each other's mirror image (as far as the main-chain geometry goes). This means that the phi and psi torsion angles in all-D proteins will have the opposite signs compared to all-L proteins. In other words, the Ramachandran plot (and its most favourable regions, too) is inverted through the central point (phi = 0, psi = 0). In effect, this rotates the Ramachandran plot by 180 degrees about its centre.
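The sign flip can be written down in a couple of lines. The sketch below applies it to the approximate L-amino-acid region centres quoted in the answer to Q. 5; the function name `mirror_phi_psi` and the dictionary are illustrative only.

```python
def mirror_phi_psi(phi, psi):
    """Map a (phi, psi) pair of an L-residue to its D-residue mirror image."""
    return (-phi, -psi)


# Approximate favourable-region centres for L-amino acids (from Q. 5).
l_centres = {
    "alpha": (-70, -50),
    "beta": (-100, 130),
    "left-handed helical": (50, 30),
}

# Expected centres for an all-D protein: every angle changes sign.
d_centres = {name: mirror_phi_psi(*pp) for name, pp in l_centres.items()}
for name, (phi, psi) in d_centres.items():
    print(f"{name}: phi = {phi}, psi = {psi}")
```

So the most populated helical region of an all-D protein is expected near (+70, +50), where L-proteins have very few residues.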
Q. 8. What are the three conformations that give rise to local maxima in the chi-1 distribution called?
Just as with simple organic molecules, we find local maxima (i.e., energetically favoured conformations) at +60 (gauche +), 180 (anti or trans) and -60 (or +300; gauche -) degrees. (Note: sometimes the definitions of gauche-plus and gauche-minus are switched.)
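A minimal sketch of how a chi-1 value could be binned into these three conformations is shown below. It assumes 120-degree-wide bins centred on +60, 180 and -60 degrees and uses the naming convention given above (which, as noted, is sometimes switched); the function name `chi1_rotamer` is hypothetical.

```python
def chi1_rotamer(chi1):
    """Classify a chi-1 torsion angle (degrees) as gauche+, trans or gauche-.

    Assumption: each conformation owns a 120-degree bin centred on
    +60, 180 and -60 degrees respectively.
    """
    # Wrap the angle into the range [-180, 180).
    chi = (chi1 + 180.0) % 360.0 - 180.0
    if 0.0 <= chi < 120.0:
        return "gauche+"
    if -120.0 <= chi < 0.0:
        return "gauche-"
    return "trans"


for angle in (60.0, 180.0, -60.0, 300.0):
    print(angle, chi1_rotamer(angle))
```

Note that +300 and -60 degrees end up in the same bin, as they describe the same conformation.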
Q. 9. Can you explain why there appear two "humps" in the chi-1 distribution near +35 and -35 degrees?
These are due to proline residues. The fact that the sidechain of proline is a ring that also includes CA and N severely restricts the conformational freedom of the sidechain torsions.
Q. 10. How many clear rotamer conformations exist for leucine residues?
From the distribution of chi-1 and chi-2 for leucine residues we can clearly see that there are two conformations that are favoured strongly. These are near -60, 180 (54% of all leucines), and near 180, +60 (27%). The third peak, near the unusual angle combination of -90, +40, represents ~9% of all leucines in the plot, but is an artefact due to modelling errors.
Q. 11. In the plot of the chi-2 distribution, where do you expect to find most of the proline residues? What are the two most favourable rotamers that you expect to find for proline?
As we saw earlier, proline chi-1 torsion angles are constrained to values near -35 and +35 degrees due to the severe restrictions that the ring structure imposes. The same is true for the chi-2 torsion. However, out of the four possible combinations of chi-1 and chi-2 torsion angles, only two are compatible with an undistorted, closed ring structure, namely those in which chi-1 and chi-2 have opposite signs. Hence, the two rotamers we expect to be most favourable are (roughly) [+35,-35] and [-35,+35] for [chi-1,chi-2]. (Another way to look at this is that there are two allowed puckers for proline rings.)
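The opposite-sign rule can be checked by simply enumerating the four combinations. This is only a sketch of the argument above, using the rough values of +35 and -35 degrees; `ring_compatible` is a made-up helper name.

```python
from itertools import product


def ring_compatible(chi1, chi2):
    """True if chi-1 and chi-2 have opposite signs (closed, undistorted ring)."""
    return chi1 * chi2 < 0


# All four combinations of the two rough torsion values.
candidates = list(product((-35, 35), repeat=2))
allowed = [pair for pair in candidates if ring_compatible(*pair)]
print(allowed)
```

Only the two mixed-sign pairs survive, corresponding to the two allowed ring puckers.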
Q. 12. What are your answers to the above questions for 1CBS, based on the validation report?
What is your impression of the quality of this entry? The slider graphic shows the structure is of good quality.
How does it compare to other entries in the PDB and to other crystal structures at similar resolution? Compared to structures at similar resolution most categories are in the blue, “better than average” region. Only sidechain outliers are in the average (white) region. Side-chain rotamer analysis is now routinely performed so it is more likely that new structures have good rotamers even for residues with weak electron density. (Although this structure was built using a smaller chi-1/chi-2 rotamer library available at the time, i.e. 1994.)
Are there any residues with a poor fit to the density? No, the summary graphic does not have any red dots that would indicate a poor fit to electron density.
Are there any consecutive stretches of residues with many outliers? No, there are only isolated residues and "stretches" of two residues with geometric outliers.
Is the geometry of the ligand in order? The retinoic acid ligand has one bond length and two bond angles that Mogul considers to be “unusual” compared to small-molecule structures in the CSD. The Z-scores of these are just above 2 and less than 2.5 so they are only slightly unusual.
Does it appear to fit the density well? The REA ligand has an RSR of 0.10 and an RSCC of 0.96 so the fit to the electron density is excellent.
Q. 13. What are the values of Rfree and of (Rfree - R) for 1CBS? Are these good or bad?
From the validation report the reported value of R is 0.200 and Rfree is 0.237 and thus Rfree-R is 0.037 - these are reasonably good. The DCC recalculation in the validation process lowers R and Rfree to 0.184 and 0.189 and their difference thus reduces to 0.005. However, such a small difference does not mean that the model is that good: it arises because the final round of refinement of 1CBS was carried out against all the data, including the test set of reflections, so Rfree is no longer an independent check. Note that modern software does produce a lower value of R than was attainable at the time (in 1994), even though the coordinates and temperature factors haven't changed. This is due to better procedures for scaling and bulk-solvent correction.
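For completeness, the arithmetic behind the two differences is sketched below; the helper name `r_gap` is invented for this example, and the values are those quoted from the validation report.

```python
def r_gap(r_work, r_free):
    """Return Rfree - R, rounded to three decimals as R values are reported."""
    return round(r_free - r_work, 3)


reported = r_gap(0.200, 0.237)      # values reported by the depositors
recalculated = r_gap(0.184, 0.189)  # values recalculated by DCC

print(f"reported:     Rfree - R = {reported:.3f}")
print(f"recalculated: Rfree - R = {recalculated:.3f}")
```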
Q. 14. Are the interactions between the protein in 1CBS and its ligand "sensible"?
The retinoic acid contains a long aliphatic tail, which is surrounded by apolar atoms (e.g. from Thr, Val, Leu and Pro residues). The negatively charged carboxylate group forms a “salt-bridge” (charge-charge interaction) with the guanidinium group of arginine 132 as well as a hydrogen bond with tyrosine 134. All these interactions make perfect sense. (If you inspect the 3D structure, you may find that the carboxylate interacts with an additional arginine residue, but since this interaction is mediated by a water molecule it is not shown in the 2D diagram.)
Q. 15. How good is the electron density for the ligand in 1CBS, compared to that of the protein?
The electron density for the REA ligand looks good. At a level of 0.35 e-/A^3 (~1 "sigma"), and even at 0.55 e-/A^3, the 2mFo-DFc density covers the whole molecule with distinct bumps for the methyl groups and the carboxylate moiety. There is a small positive difference density peak (with a maximum of ~3.5 "sigma"). This could be noise but could also indicate that the ring pucker might benefit from readjustment or that there may be an alternative conformation for the ring. Re-refinement with good restraints would be necessary to test these hypotheses.
Q. 16. It has been suggested (reference) that residues 20, 29 and 30 change their sidechain conformations upon ligand-binding to form a non-sequential/spatial Nuclear Localisation Signal. Inspect the density for these three residues in both the apo structure (1XCA) and the holo structure (1CBS). Are the changes in conformation supported by the data (density)?
Looking at 1CBS, there is some electron density for Lys 20 but also some difference density. Arg 29 has density, but there is also difference density around its current placement, and there are some validation issues indicating clashes. There is no electron density for Lys 30 beyond the delta carbon, and some difference density indicates that there might be an alternative conformation.
Looking at 1XCA (a lower resolution structure), the electron density in this region is rather broken for the A chain but better in chain B. However, Arg 29 is involved in a salt bridge to a symmetry mate in the B chain. As the hypothesis involves a subtle conformational change the evidence appears weak!
Q. 17. Of all the X-ray crystal structures of proteins released in the period 1990-1995 that are still in the PDB, and with resolution between 2.5 and 3.0 Å, which one is the best and which one is the worst? (Hint: use the PDBe search facility to select all X-ray protein crystal structures that satisfy the resolution and release date criteria, and sort by quality.)
The answer may differ depending on when you do this, but the procedure is as follows: first, go to the PDBe website and enter an asterisk (*) in the search box, and hit the "Search" button - this will select all structures in the archive (including unreleased and removed entries) - in September 2018, there were 162,314 entries. Then use the facets on the left to select the appropriate set of entries: molecule type "protein" (140,550 entries), experimental method "X-ray diffraction" (126,866 entries), entry status "REL" (still 126,866 entries), resolution distribution "[2.5 TO 3.0]" (24,216 entries), and finally release year distribution "[1990 TO 1995]" (640 entries). Sort the results by quality (descending), so the first entry shown has the best quality (1LBT). Now, sort the results by quality (ascending) - the first listed entry has the poorest quality (1NRP). Or, in one single URL: https://www.ebi.ac.uk/pdbe/entry/search/index/?searchParams=%7B%22text%22:%5B%7B%22value%22:%22*%22,%22condition1%22:%22AND%22,%22condition2%22:%22Contains%22%7D%5D,%22q_molecule_type%22:%5B%7B%22value%22:%22Protein%22,%22condition1%22:%22AND%22,%22condition2%22:%22Equal%20to%22%7D%5D,%22q_experimental_method%22:%5B%7B%22value%22:%22X-ray%20diffraction%22,%22condition1%22:%22AND%22,%22condition2%22:%22Equal%20to%22%7D%5D,%22q_resolution%22:%5B%7B%22value%22:%222.5%20-%203%22,%22condition1%22:%22AND%22,%22condition2%22:%22%3D%20range%22%7D%5D,%22q_release_year%22:%5B%7B%22value%22:%221990%20-%201995%22,%22condition1%22:%22AND%22,%22condition2%22:%22%3D%20range%22%7D%5D,%22resultState%22:%7B%22tabIndex%22:0,%22paginationIndex%22:1,%22perPage%22:%2210%22,%22sortBy%22:%22Sort%20by%22%7D%7D
You can also build up the query by clicking on the green box marked Advanced search near the top of the page.
Update October 2019: the query still returns 640 entries, but now the entry with the lowest quality score is 1DPR and the one with the highest score is still 1LBT.
Q. 18. What is your opinion of the quality of PDB entry 2GN5?
2GN5 is not in any danger of winning any beauty contests.
Q. 19. PDB entry 1RIP is an NMR structure. What is your impression of its quality?
Oops…
Q. 20. 2HHB, 3HHB and 4HHB are all crystal structures of human hemoglobin. In fact, all three structures were derived from the same crystallographic data. Nevertheless, the quality of the three entries differs rather dramatically. Compare and contrast the three entries and discuss their quality. How would you rank the three models?
Let me know what you find! (You may want to read the original paper to understand how these entries are related.)
Q. 21. In 1993, the 1.74 Å structure of a complex of a mutant of intestinal fatty-acid binding protein (IFABP) with oleic acid was reported (reference). The density for the carboxylate group was ambiguous and the model as deposited in the PDB (1ICN) contains three alternate conformations for this moiety. In a later study, this structure was used by Klebe and co-workers (reference) to validate their docking program and scoring function. The docking calculations indicated that the "observed" binding mode of the oleic acid was not particularly favourable. Instead, their method suggests that a different orientation of the entire ligand (in essence, swapping the head and the tail) is much more favourable. Inspect the density for the oleic acid ligand in the structure of 1ICN. Is the model with three alternative conformations of the carboxylate group credible in terms of (a) density, and (b) stabilising interactions? Is there support in the density for the alternative orientation, with the oleate's head and tail reversed, and with hydrogen bonds between the carboxylate oxygen atoms and an amide group in the protein? What is your conclusion?
Let me know what you find!
Q. 22. PDB entries 1KEL (solved at 1.9 Å) and 1FL6 (solved at 2.8 Å) both contain a ligand with an excruciatingly long name that we shall refer to as simply AAH. For both these structures, assess how much you trust the (a) presence, (b) orientation, (c) conformation, and (d) coordinate precision of the AAH ligand.
Let me know what you find!
Q. 23. Quite a few structures contain one or a few D-amino acids. These may be either genuine D-amino acids or artefacts due to model-building or refinement errors. PDB entry 1A7S is a 1.1 Å structure, in which valine 50 is a D-amino acid. Is this a genuine D-amino acid or an artefact? And how about residue E115 in the 2 Å structure with PDB code 1AN1?
- Valine 50 in 1A7S looks very much like an artefact. It sits in a region of poor density and obviously has been distorted into an almost planar configuration around the CA atom so as to more or less fit the density, and still it fits poorly. It is unclear from the PDB/mmCIF file if the authors have even noticed the distortion.
- Aspartate E115 in 1AN1 on the other hand looks much more convincing. In this case the authors have obviously given the issue some thought, since the residue has been given type DAS (=D-aspartate); see also the REMARK 999 and MODRES records in the PDB/mmCIF file.
Q. 24. Read this short paper (2 pages) if you have access to it. Describe in your own words what the authors are trying to say. Confirm your suspicions by inspecting the electron density in the binding site of PDB entry 2GWX and by comparing it to that in entry 2BAW.
Let me know what you find!
Q. 25. There are hundreds of structures for hen egg-white lysozyme in the PDB. Which one has the highest quality according to the PDBe search facility? Is this also the one with the highest resolution data?
Again, your answer may be different depending on when you do this, but the procedure is: go to the PDBe website and start typing lysozyme into the search box. From the search-completion box, select "Lysozyme" in the Enzyme category. From the Organism facet, select "Gallus gallus" (chicken; 793 entries in March 2019). Sort by quality (descending) to find the entry with the highest quality (in this case, the 1.2 Å structure 6F9Z). Sort by resolution (ascending, since the highest resolution corresponds to the smallest value in Å) to find the highest resolution structure (2VB1 at 0.65 Å). In one URL: https://www.ebi.ac.uk/pdbe/entry/search/index/?searchParams=%7B%22q_ec_hierarchy_name%22:%5B%7B%22value%22:%22Lysozyme%22,%22condition1%22:%22AND%22,%22condition2%22:%22Equal%20to%22%7D%5D,%22q_organism_name%22:%5B%7B%22value%22:%22Gallus%20gallus%22,%22condition1%22:%22AND%22,%22condition2%22:%22Contains%22%7D%5D,%22resultState%22:%7B%22tabIndex%22:0,%22paginationIndex%22:1,%22perPage%22:%2210%22,%22sortBy%22:%22resolution%20asc%22%7D%7D
Update October 2019: the query now returns 837 entries; the entry with the highest score is still 6F9Z and the one with the highest resolution is still 2VB1.
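The long search URLs in these answers are easier to read once the searchParams value is decoded. The sketch below does this for the lysozyme query using only the Python standard library; it merely unpacks the string copied from the URL above and does not contact the PDBe server. The names `ENCODED_PARAMS` and `decode_search_params` are chosen for this example.

```python
import json
from urllib.parse import unquote

# The searchParams value copied from the lysozyme search URL above.
ENCODED_PARAMS = (
    "%7B%22q_ec_hierarchy_name%22:%5B%7B%22value%22:%22Lysozyme%22,"
    "%22condition1%22:%22AND%22,%22condition2%22:%22Equal%20to%22%7D%5D,"
    "%22q_organism_name%22:%5B%7B%22value%22:%22Gallus%20gallus%22,"
    "%22condition1%22:%22AND%22,%22condition2%22:%22Contains%22%7D%5D,"
    "%22resultState%22:%7B%22tabIndex%22:0,%22paginationIndex%22:1,"
    "%22perPage%22:%2210%22,%22sortBy%22:%22resolution%20asc%22%7D%7D"
)


def decode_search_params(encoded):
    """Percent-decode the searchParams string and parse it as JSON."""
    return json.loads(unquote(encoded))


params = decode_search_params(ENCODED_PARAMS)
print(json.dumps(params, indent=2))
```

The decoded structure shows the two facets (enzyme name and organism) and, under resultState, the sort order "resolution asc".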
Q. 26. There are modified amino acids in which the hydrogen atom that is normally attached to the alpha carbon atom has been replaced by a methyl group. An example is shown in the figure above. Please answer the following questions: (a) what is the common amino acid from which this 2-methyl-variant is derived? (b) is it D or L? (c) what is its three-letter code? (d) how many PDB entries contain this modified amino acid? (e) can you think of another name for 2-methyl-glycine? (f) can you find another type of 2-methyl-variant of a regular amino acid that occurs in the PDB? (g) do you expect 2-methylated amino acids to have larger, smaller or roughly the same favourable regions in the Ramachandran plot?
(a) The amino acid contains two methyl substituents on the CA carbon, so the non-2-methylated form would have one methyl group and the common amino acid that fits the bill is (L-)alanine.
(b) The CA carbon is not chiral, so the D/L distinction does not apply. Yes, I know: a trick question. Go figure!
(c) If you use the PDBe search system and start typing "2-methyl", the Ligand category will show suggestions. Near the top you will find 2-methyl-alanine, which has the three-letter code AIB.
(d) Execute the search that you started in (c) to find the up-to-date answer (in October 2019 the answer was 64).
(e) Another trick(y) question - 2-methyl-glycine has one hydrogen and one methyl group. In other words, it is (D- or L-) alanine!
(f) Using the same "trick" as in (c), you may find MGN (2-methyl-glutamine), 2ML (2-methyl-leucine) and several others.
(g) The methyl group is considerably bulkier than a hydrogen atom. Thus it will introduce additional limitations on the conformational freedom of such residues. We therefore expect their favourable regions in the Ramachandran plot to be smaller than those of the corresponding non-methylated amino acids.