For X-ray crystallography structures

R-factor and R-free 

The R-factor is a measure that quantifies the overall disagreement between the observed diffraction data and the calculated diffraction data from the atomic coordinates of the model. A lower R-factor indicates a better agreement between the model and the experimental data.
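As a rough illustration of the definition above, the conventional R-factor can be written as the sum of the absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The sketch below assumes the two sets of amplitudes are already on a common scale; the function name and the reflection values are hypothetical and chosen only for illustration.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), assuming both sets share a scale."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for four reflections (illustrative values only).
observed = [100.0, 50.0, 80.0, 20.0]
calculated = [90.0, 55.0, 85.0, 20.0]

print(round(r_factor(observed, calculated), 3))  # 20 / 250 -> 0.08
```

A perfect model would give 0; the larger the disagreement between the two sets of amplitudes, the larger the value.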

However, the R-factor can be misleading if a model is “overfitted” to the data. Overfitting occurs when the model is adjusted to match noise or minor fluctuations in the specific dataset used for refinement, rather than the true underlying structure. The result is a model with a low R-factor that is nevertheless not a good general representation of the structure.

To identify and guard against overfitting, the free R-factor (R-free) was introduced (Brünger, A., 1992). A small, randomly selected subset of the experimental diffraction data (typically 5-10%) is set aside and never used during model building and refinement. R-free is then calculated by comparing the amplitudes calculated from the model with this unused subset. In essence, R-free is a measure of how well the model predicts “new” data.
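The bookkeeping behind R-free can be sketched as follows: a random 5% of reflections is flagged as the free set, and the same R formula is evaluated separately on the working reflections and on the free reflections. This is only a sketch on synthetic data. No refinement takes place here, so the two values come out similar by construction. The helper names, the seed, and the amplitudes are all assumptions made for illustration.

```python
import random


def r_value(f_obs, f_calc):
    """Sum of absolute amplitude differences divided by the sum of observed amplitudes."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def pick_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly choose the indices of reflections to exclude from refinement."""
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * fraction))
    return set(rng.sample(range(n_reflections), n_free))


# Synthetic amplitudes: "calculated" values are the observed ones with up to 20% error.
data_rng = random.Random(1)
f_obs = [data_rng.uniform(10.0, 100.0) for _ in range(1000)]
f_calc = [f * data_rng.uniform(0.8, 1.2) for f in f_obs]

free_idx = pick_free_set(len(f_obs))
work = [i for i in range(len(f_obs)) if i not in free_idx]
free = sorted(free_idx)

# R-factor on the working set (often called R-work) and on the free set (R-free).
r_work = r_value([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_value([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```

In a real refinement, only the working reflections would drive changes to the model, which is why the free set can act as an independent check.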

For a well-refined model that is not overfitted, R-free will be similar to the R-factor, typically slightly higher (often by ~0.02 to 0.05). A much larger gap suggests that the model has been fitted to noise in the data used for refinement. Lower R-free values generally indicate a more accurate and reliable X-ray model.
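The comparison described above can be expressed as a small check on the gap between the two values. The 0.05 limit below is just the upper end of the typical range quoted in the text, used here as a rule of thumb rather than a strict cutoff. The function name and the example values are hypothetical.

```python
def describe_gap(r_work, r_free, typical_max=0.05):
    """Compare R-free with the R-factor on the working data (rule-of-thumb only)."""
    gap = r_free - r_work
    if gap > typical_max:
        return "gap larger than typical: possible overfitting"
    return "gap within the typical range"


# Hypothetical values for two models.
print(describe_gap(0.20, 0.23))
print(describe_gap(0.18, 0.28))
```

The first model has a gap of about 0.03 and is reported as typical; the second has a gap of about 0.10 and is flagged.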

RSRZ outliers

The real-space R-value (RSR) is a measure of the quality of fit between a part of an atomic model (in this case, one residue) and the electron density map.
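One common way to write RSR is as the sum of absolute differences between observed and calculated density, divided by the sum of their absolute sums, taken over the map grid points around the residue. The sketch below assumes the density values at those grid points have already been extracted into two lists on the same scale; the function name and the numbers are hypothetical.

```python
def real_space_r(rho_obs, rho_calc):
    """RSR = sum(|rho_obs - rho_calc|) / sum(|rho_obs + rho_calc|) over grid points."""
    if len(rho_obs) != len(rho_calc):
        raise ValueError("density lists must have the same length")
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    denominator = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    return numerator / denominator


# Hypothetical density values at four grid points around one residue.
rho_obs = [0.8, 1.2, 1.0, 0.5]
rho_calc = [1.0, 1.0, 1.0, 0.5]

print(round(real_space_r(rho_obs, rho_calc), 3))  # 0.4 / 7.0
```

Identical observed and calculated density gives 0, and the value grows as the two diverge.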

The RSR Z-score (RSRZ) is a normalisation of RSR specific to a residue type and a resolution bin. RSRZ is calculated only for standard amino acids and nucleotides in protein, DNA, and RNA chains. A residue is considered an RSRZ outlier if its RSRZ value is greater than 2. The global RSRZ outlier score (percentage of outliers) tells you the overall proportion of residues that don’t fit the density well.
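The normalisation and the outlier count can be sketched as a Z-score against reference statistics for the same residue type and resolution bin, followed by a count of residues above 2. The reference means and standard deviations below are invented purely for illustration, and the bin label and function names are likewise hypothetical.

```python
def rsrz(rsr, ref_mean, ref_std):
    """Z-score of a residue's RSR against reference values for its type and resolution bin."""
    return (rsr - ref_mean) / ref_std


def percent_outliers(rsrz_values, cutoff=2.0):
    """Percentage of residues whose RSRZ exceeds the cutoff."""
    n_outliers = sum(1 for z in rsrz_values if z > cutoff)
    return 100.0 * n_outliers / len(rsrz_values)


# Hypothetical reference statistics: (residue type, resolution bin) -> (mean RSR, std).
reference = {
    ("ALA", "2.0-2.2"): (0.10, 0.04),
    ("LYS", "2.0-2.2"): (0.14, 0.05),
}

# Hypothetical per-residue RSR values from one model in that resolution bin.
residues = [("ALA", 0.11), ("ALA", 0.22), ("LYS", 0.15), ("LYS", 0.20)]

z_scores = [rsrz(value, *reference[(name, "2.0-2.2")]) for name, value in residues]
print(percent_outliers(z_scores))  # one of four residues exceeds 2 -> 25.0
```

Only the second alanine, with an RSRZ of about 3, counts as an outlier here.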

A lower percentage of RSRZ outliers is better. The percentage is a key indicator of potential issues, especially at lower resolutions, where fitting the model to the density is more ambiguous. There is often a trade-off between having fewer RSRZ outliers and building a more complete model in the parts of the map that are more difficult to interpret. Different model makers have different preferences.

Ligand fit to experimental data (RSCC and RSR)

For structures containing bound ligands (small molecules), it’s crucial to assess how well the ligand model fits the experimental data. Two key metrics for this are the Real-Space Correlation Coefficient (RSCC) and the Real-Space R-value (RSR).

  • Real-Space Correlation Coefficient (RSCC): This metric quantifies the correlation between the electron density calculated from the ligand model and the experimental electron density map in the region of the ligand. A value closer to 1.0 indicates a very good fit, meaning the ligand model aligns well with the observed electron density. A value around 0.90 is generally acceptable, but values around or below 0.80 may indicate a poor fit, suggesting that the experimental data does not strongly support the ligand’s placement.
  • Real-Space R-value (RSR): This metric measures the disagreement between the observed and calculated electron densities for the ligand. Lower values indicate better agreement. RSR values approaching or above 0.4 typically suggest a poor fit and/or low data resolution, indicating potential issues with the ligand’s modelling or its presence in the experimental data.
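RSCC can be illustrated as a Pearson correlation between the observed and calculated density values at the grid points covering the ligand. This is a minimal sketch under the assumption that those values have already been sampled into two lists; the function name and the numbers are hypothetical.

```python
import math


def real_space_cc(rho_obs, rho_calc):
    """Pearson correlation between observed and calculated density values."""
    n = len(rho_obs)
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    return cov / math.sqrt(var_o * var_c)


# Hypothetical density values at four grid points around a ligand.
rho_obs = [1.0, 2.0, 3.0, 4.0]
well_fitted = [2.0, 4.0, 6.0, 8.0]   # same shape as the observed density
poorly_fitted = [3.0, 1.0, 4.0, 2.0]  # little resemblance to the observed density

print(round(real_space_cc(rho_obs, well_fitted), 2))
print(round(real_space_cc(rho_obs, poorly_fitted), 2))
```

Because a correlation ignores overall scale, RSCC rewards density of the right shape, whereas RSR also penalises differences in magnitude; this is one reason the two metrics are read together.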

Graphical depiction of the model fit to the experimental electron density of all instances of the Ligand of Interest (LOI)