Structure validation practical (5)

All amino-acid residues whose side chain extends beyond the CB atom have one or more side-chain torsion angles, termed chi-1 (N-CA-CB-XG), chi-2 (CA-CB-XG-XD), etc. Here X may be carbon, sulfur, or oxygen, depending on the residue type. If there are two G atoms, the chi-1 torsion is calculated with reference to the atom with the lowest numerical identifier (e.g., OG1 for threonine residues).
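To illustrate how a chi torsion is obtained from coordinates, the following minimal Python sketch computes the signed torsion angle defined by four atoms (e.g., N, CA, CB, and XG for chi-1). The function name `torsion_deg` and the toy coordinates are assumptions made for this example only; they do not come from any particular validation program.

```python
import math


def _sub(a, b):
    """Vector difference a - b for 3D tuples."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3D tuples."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1 = _sub(p2, p1)          # bond p1 -> p2
    b2 = _sub(p3, p2)          # central bond p2 -> p3
    b3 = _sub(p4, p3)          # bond p3 -> p4
    n1 = _cross(b1, b2)        # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)        # normal of the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not a real residue): the outer bonds are perpendicular
# when viewed along the central bond.
chi1 = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

For chi-1 of a threonine, for example, the four points would be the coordinates of N, CA, CB, and OG1, in that order.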

Validation potential of chi torsions: moderate.

Chi-1 distribution

The figure (from 1998) shows the distribution of chi-1 torsion-angle values for more than 67,000 residues in the PDB at the time. Similar figures for chi-2 to chi-4 can be found here.

Q. 8. What are the three conformations that give rise to local maxima in the chi-1 distribution called?

Q. 9. Can you explain why there appear two "humps" in the chi-1 distribution near +35 and -35 degrees?

Early on, it was found that the values that side-chain torsion angles assume in proteins are similar to those expected on the basis of simple energy calculations and that, in addition, certain combinations of chi-1 and chi-2 values are clearly preferred (so-called rotamers, or preferred rotamer conformations). Analogous to Ramachandran plots, chi-1, chi-2 scatter plots can be produced that show how well a protein's side-chain conformations conform to known preferences. Alternatively, for each residue, a score can be computed that shows how similar its side-chain conformation is to that of the most similar known rotamer for that residue type. This score can be calculated as an RMS distance between corresponding side-chain atoms, or it can be expressed as an RMS difference of side-chain torsion-angle values from those of the most similar rotamer.
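The torsion-angle variant of such a score can be sketched in a few lines of Python. This is only an illustration: the rotamer table below contains idealised staggered values (-60, 60, 180 degrees) with made-up labels, not entries from a real rotamer library, and the function names are hypothetical.

```python
import math


def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def rms_chi_deviation(chis, rotamer):
    """RMS difference (degrees) between observed chi angles and one rotamer."""
    squared = [angle_diff(c, r) ** 2 for c, r in zip(chis, rotamer)]
    return math.sqrt(sum(squared) / len(squared))


def nearest_rotamer(chis, library):
    """Return (label, rms) for the rotamer in `library` closest to `chis`."""
    scored = ((label, rms_chi_deviation(chis, rot)) for label, rot in library.items())
    return min(scored, key=lambda item: item[1])


# ASSUMPTION: idealised (chi-1, chi-2) values for illustration only;
# a real check would use a library derived from high-quality structures.
TOY_LIBRARY = {
    "A": (-60.0, 180.0),
    "B": (180.0, 60.0),
    "C": (60.0, 180.0),
}

label, score = nearest_rotamer((-65.0, -175.0), TOY_LIBRARY)
```

Note that the angle difference is wrapped around the circle, so that a chi-2 of -175 degrees counts as only 5 degrees away from 180 degrees rather than 355 degrees.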

Validation potential of chi combinations: excellent.

Rotamers

The figure (from 1998) shows the distribution of chi-1 (horizontal), chi-2 (vertical) torsion-angle combinations for more than 47,000 residues in the PDB at the time. Densely populated regions are shown with contours. Similar figures for individual amino-acid residue types can be found here.

Q. 10. How many clear rotamer conformations exist for leucine residues?

Q. 11. In the plot of the chi-2 distribution, where do you expect to find most of the proline residues? What are the two most favourable rotamers that you expect to find for proline?

In the past (and occasionally still today), "hot", preliminary, or very-low-resolution models were sometimes deposited as a "CA-only model" (i.e., only the coordinates of the CA atoms were deposited). However, few validation tools can handle such models. The CA backbone can be characterised by CA-CA distances (~2.9 Å for a cis-peptide, and ~3.8 Å for a trans-peptide), CA-CA-CA pseudo-angles, and CA-CA-CA-CA pseudo-torsion angles. The pseudo-angles and pseudo-torsion angles turn out to assume certain preferred value combinations, much like the backbone phi and psi torsions, and this can be employed for the validation of CA-only models.
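As a rough illustration, the Python sketch below checks consecutive CA-CA distances against the two values quoted above and computes a CA-CA-CA pseudo-angle. The tolerance of 0.2 Å and the "suspect" label are assumptions chosen for this example, not published cutoffs, and the function names are hypothetical.

```python
import math

CIS_CA_CA = 2.9    # approximate CA-CA distance across a cis-peptide (Å)
TRANS_CA_CA = 3.8  # approximate CA-CA distance across a trans-peptide (Å)


def classify_ca_ca(d, tolerance=0.2):
    """Label a consecutive CA-CA distance; `tolerance` is an assumed cutoff."""
    if abs(d - TRANS_CA_CA) <= tolerance:
        return "trans"
    if abs(d - CIS_CA_CA) <= tolerance:
        return "cis"
    return "suspect"


def pseudo_angle_deg(a, b, c):
    """CA-CA-CA pseudo-angle (degrees) at the middle atom b."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    cos_t = sum(p * q for p, q in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def check_trace(ca_coords):
    """Return a label for every consecutive CA-CA pair in a CA-only trace."""
    return [classify_ca_ca(math.dist(p, q))
            for p, q in zip(ca_coords, ca_coords[1:])]


# Toy trace (not from a real model): one normal step, then one that is too long.
labels = check_trace([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 4.6, 0.0)])
```

A full check would also tabulate pseudo-angle and pseudo-torsion combinations and compare them with the preferred regions, in the same spirit as a Ramachandran plot.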

Validation potential of CA-only tests: good. But they provide little in the way of error diagnostics.

Hydrophobic, electrostatic, and hydrogen-bonding interactions are the main stabilising forces of protein structure. This leads to packing arrangements where hydrophobic residues tend to interact with each other, where charged residues tend to be involved in salt links, and where hydrophilic residues prefer to interact with each other or to point out into the bulk solvent. Serious model errors will often lead to violations of such simple rules of thumb and introduce non-physical interactions (e.g., a charged arginine residue located inside a hydrophobic pocket) that serve as good indicators of model errors. Directional atomic contact analysis (DACA) is a method in which these empirical notions have been formalised through database analysis. For every group of atoms in a protein, it yields a score which, in essence, expresses how "comfortable" that group is in its environment in the model under scrutiny (compared to the expectations derived from the database). If a region in a model (or the entire model) has consistently low scores, this is a very strong indication of model errors.

Validation potential of DACA analysis: excellent.

Sometimes a protein crystallises with more than one independent copy whose structure needs to be determined separately; this phenomenon is called non-crystallographic symmetry (NCS). Since all copies have the same sequence and chemical composition, we expect that the models of the various independent copies should be quite similar. During model refinement, a careful crystallographer might either constrain the various copies to be identical, or restrain them to be very similar in terms of their structure. This reduces the effective number of parameters (degrees of freedom) in the model and tends to result in better-determined models. Large, random differences between models related by NCS are often indicative of poor refinement practices, and sometimes result in poor models. Hence, the similarity of the NCS-related models can be used as a validation criterion. This similarity can be expressed in terms of RMS distances between equivalent atoms in the two (or more) copies of the molecule. Alternatively, differences between corresponding phi, psi and chi torsion angles can be used.
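The torsion-angle comparison has the practical advantage that it needs no superposition of the copies. A minimal Python sketch follows, assuming each copy is given as a dictionary from residue number to a tuple of torsion angles in degrees. The data layout, the function names, and the 30-degree cutoff are all assumptions made for this illustration.

```python
import math


def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def ncs_torsion_report(copy_a, copy_b, cutoff=30.0):
    """Compare torsions of two NCS-related copies.

    Returns (rms_difference_in_degrees, residues_with_any_difference_above_cutoff).
    Only residues present in both copies are compared.
    """
    shared = sorted(set(copy_a) & set(copy_b))
    squared = []
    outliers = []
    for res in shared:
        diffs = [angle_diff(x, y) for x, y in zip(copy_a[res], copy_b[res])]
        squared.extend(d * d for d in diffs)
        if max(abs(d) for d in diffs) > cutoff:
            outliers.append(res)
    rms = math.sqrt(sum(squared) / len(squared)) if squared else 0.0
    return rms, outliers


# Toy (phi, psi) values for two copies; residue 2 differs strongly in phi.
chain_a = {1: (-60.0, -45.0), 2: (-120.0, 130.0), 3: (175.0, 60.0)}
chain_b = {1: (-62.0, -43.0), 2: (-60.0, 130.0), 3: (-178.0, 62.0)}
rms, outliers = ncs_torsion_report(chain_a, chain_b)
```

Residues flagged in this way are not necessarily wrong, since genuine differences between copies do occur (e.g., at crystal contacts), but they are sensible places to start inspecting the model.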

Validation potential of NCS checks: moderate. NCS constraints and restraints are so powerful that it is usually better to impose them during refinement (especially at low resolution) than to use NCS as an a posteriori validation criterion.  

In crystallographic refinement, Atomic Displacement Parameters (ADPs; often referred to as temperature factors or B-factors) model the effects of thermal vibration of the atoms. Except at high resolution (typically, better than ~1.5 Å) where there are sufficient observations to warrant refinement of anisotropic temperature factors, ADPs are usually constrained to be isotropic. The isotropic temperature factor B of an atom is related to the atom's mean-square displacement. Compared to the atomic coordinates, there are usually few restraints on temperature factors during refinement. Therefore, particularly at low resolution, temperature factors can function as "error sinks". They "absorb" not only the effects of thermal vibration but also of static and dynamic disorder and of various kinds of model error.

Compared to the wealth of statistics that can be used to check and validate coordinates, there are relatively few methods available to assess how reasonable a model's temperature factors are. One should keep in mind that a low average B-factor, per se, is not necessarily an indication of high model quality. For instance, a backwards-traced protein model can have a considerably lower average B-factor than a correct model at a similar resolution.

Average (and minimum and maximum) temperature-factor values are sometimes listed separately for various groups of atoms (e.g., individual protein or nucleic acid molecules, ligands, solvent molecules). A simple plot of residue-averaged temperature factors as a function of residue number may reveal regions of the molecule that have consistently high B-factors, which may be due to problems in the model. Other statistics pertain to the RMS differences in B-factors between atoms that are somehow related, for example through a chemical bond or by NCS. Sometimes these statistics are calculated separately for main-chain and side-chain atoms. If the B-factors of such related atoms have been restrained to be similar during refinement, these checks do not provide a convincing indication of the quality of the model.

Given experimental data (preferably to better than 3 Å resolution) and some knowledge of the contents of the unit cell, an overall temperature factor can be calculated that is known as the Wilson B-factor. In practice, there is a good correlation between the model and the Wilson B-factors, so very large discrepancies between them could suggest that the B-factors of the model need to be taken with a grain of salt.
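The simple B-factor statistics mentioned above can be sketched in Python as follows. The conversion uses the standard relation for isotropic B-factors, B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement. The input layouts and function names are assumptions made for this illustration.

```python
import math
from collections import defaultdict


def mean_square_displacement(b_iso):
    """Mean-square displacement <u^2> (Å^2) from an isotropic B-factor (Å^2)."""
    return b_iso / (8.0 * math.pi ** 2)


def residue_average_b(atoms):
    """Average B-factor per residue.

    `atoms` is an iterable of (residue_number, b_factor) pairs (assumed layout).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for res, b in atoms:
        sums[res][0] += b
        sums[res][1] += 1
    return {res: total / n for res, (total, n) in sorted(sums.items())}


def rms_delta_b_bonded(b_by_atom, bonds):
    """RMS difference in B-factor over pairs of bonded atoms.

    `b_by_atom` maps atom name -> B-factor; `bonds` lists (name, name) pairs.
    """
    squared = [(b_by_atom[i] - b_by_atom[j]) ** 2 for i, j in bonds]
    return math.sqrt(sum(squared) / len(squared))


# Toy values, not taken from a real model.
averages = residue_average_b([(1, 10.0), (1, 20.0), (2, 30.0)])
rms_db = rms_delta_b_bonded({"N": 10.0, "CA": 12.0, "C": 16.0},
                            [("N", "CA"), ("CA", "C")])
```

As an example of the conversion, a B-factor of 20 Å² corresponds to a mean-square displacement of roughly 0.25 Å², i.e., an RMS displacement of about 0.5 Å.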

Validation potential of temperature-factor tests: poor.
