Other important considerations
While global and local quality metrics, along with visual inspection of experimental data, provide a strong foundation for assessing a macromolecular structure, several other factors can significantly influence a model’s overall reliability, completeness, and relevance to your research. Understanding these considerations will help you make more informed decisions when selecting and interpreting structural data.
Missing regions / structural coverage of the protein chain
When you look at a 3D structure in the PDB, you might notice that parts of the expected protein or nucleic acid sequence are not included in the atomic model. This means that the deposited structure model may not represent the entire molecule studied experimentally.
These missing regions occur for several reasons:
- The most common reason is that these parts of the molecule are disordered or highly flexible in the crystal (X-ray) or in the frozen sample (cryo-EM). Rather than adopting a single, fixed 3D conformation, they move substantially. Because techniques like X-ray diffraction and cryo-EM capture an average picture over time and across many molecules, highly flexible regions often do not produce clear, well-defined experimental data, so the atomic positions for these parts cannot be determined or modelled from the experiment. In the PDB, these missing segments are typically indicated in the entry’s details and are sometimes shown as dashed lines in molecular viewers.
- Sometimes, even if a region isn’t completely disordered, the experimental data quality might be too low in that specific area to confidently build a model. Occasionally, researchers might intentionally omit parts of a chain if the data is ambiguous or if they are only focused on modelling a specific, well-ordered domain.
In contrast, structures derived from NMR typically do not have this “missing region” problem in the same way. Instead of simply omitting flexible parts, an NMR ensemble usually contains several models that show different conformations for flexible regions. You can choose one model or use them all, and the spread between models gives insight into the molecule’s dynamic nature.
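In PDB-format files, each member of an NMR ensemble is enclosed between `MODEL` and `ENDMDL` records. The minimal sketch below, which assumes plain PDB-format text (not mmCIF) and uses a hypothetical helper name `split_models`, splits an ensemble so you can pick one conformer or iterate over all of them.

```python
def split_models(pdb_text):
    """Return a list of models, each a list of ATOM/HETATM lines.

    Files without MODEL/ENDMDL records (typical of X-ray and cryo-EM entries)
    are returned as a single model.
    """
    models = []
    current = None  # lines of the model currently being read, if any
    loose = []      # coordinate lines found outside any MODEL/ENDMDL block
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # the record name occupies columns 1-6
        if record == "MODEL":
            current = []
        elif record == "ENDMDL" and current is not None:
            models.append(current)
            current = None
        elif record in ("ATOM", "HETATM"):
            (current if current is not None else loose).append(line)
    if not models and loose:
        models.append(loose)
    return models


# Toy two-model ensemble; coordinates are omitted because only record names matter here.
ensemble = """MODEL        1
ATOM      1  N   GLY A   1
ATOM      2  CA  GLY A   1
ENDMDL
MODEL        2
ATOM      1  N   GLY A   1
ATOM      2  CA  GLY A   1
ENDMDL
END"""

models = split_models(ensemble)
print(len(models))       # 2 conformers in this toy ensemble
first_model = models[0]  # e.g. pick a single conformer to work with
```

Keeping every model, rather than only the first, lets you examine how much the flexible regions vary across the ensemble.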
Unliganded SIV Protease structure in an “open” conformation (PDB ID: 1AZ5)
This PDB entry illustrates missing regions in a structural model: the unmodelled segments appear as breaks in the chain, which some molecular viewers draw as dashed lines.

The presence of missing regions is important to consider. If the unmodelled part lies in an area critical to your research (e.g., a binding site, a site of post-translational modification, or an interaction interface), the incomplete model might not be sufficient for your purposes. This problem can be significant, because flexible loops are often involved in the active site or binding site of the protein. Unfortunately, there is no simple solution apart from modelling coordinates for the missing portions, which molecular modelling programs can help with.
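The authoritative list of unobserved residues is part of the entry itself (REMARK 465 in PDB-format files, the `_pdbx_unobs_or_zero_occ_residues` category in mmCIF). As a quick heuristic check, the sketch below assumes you have already extracted the residue numbers of one chain with a parser of your choice; it finds internal gaps in the numbering and reports which of your residues of interest have no coordinates. The function names and the example chain are hypothetical.

```python
def find_gaps(resnums):
    """Return inclusive (start, end) ranges missing between modelled residues.

    Heuristic only: insertion codes, non-sequential author numbering and
    missing termini are not detected.
    """
    ordered = sorted(set(resnums))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps


def unmodelled_sites(resnums, sites_of_interest):
    """Return the residues of interest that have no coordinates in the model."""
    modelled = set(resnums)
    return sorted(r for r in sites_of_interest if r not in modelled)


# Hypothetical chain: residues 1-40 and 48-99 modelled, 41-47 disordered.
modelled = list(range(1, 41)) + list(range(48, 100))
print(find_gaps(modelled))                           # [(41, 47)]
print(unmodelled_sites(modelled, [25, 43, 45, 87]))  # [43, 45]
```

If the second call returns anything, the model lacks coordinates exactly where your analysis needs them.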
Although it can be tempting, using computational methods such as homology modelling to “fill in” missing regions calls for caution. The accuracy of such modelled loops or termini is highly dependent on the quality of the template used and may not reflect the true conformation in the experimental system, especially if the region is genuinely flexible. There are also cases where a missing loop can be added using the crystallographic data plus homologous structures (van Beusekom et al., 2018).
Structural determination of large, multi-domain proteins
For very large proteins, particularly those with multiple movable parts, determining the complete structure can be challenging or even impossible using a single experimental approach. In such cases, researchers often employ a piecewise approach. This involves breaking down the protein into smaller, more manageable fragments or individual domains, and the structure of each isolated piece is then determined independently using experimental methods. To understand the full protein, these individual pieces must then be reassembled in their correct relative orientations.
A key challenge with this approach is that there isn’t a comprehensive resource that automatically helps you piece together the complete functional molecule from its individual solved fragments. To understand the overall form and how these domains interact, you will typically consult sequence data, along with reports from molecular biology and biochemical studies that describe the full protein’s architecture and function.
This is where PDBe-KB (Protein Data Bank in Europe – Knowledge Base) becomes an invaluable tool. As an open, collaborative consortium, PDBe-KB is dedicated to integrating and enriching 3D-structure data with functional annotations, offering a unique aggregated view of protein information crucial for basic and translational research. A key feature of PDBe-KB is its organisation around UniProt IDs. This means that for any given protein, PDBe-KB serves as an excellent central hub where you can find all available annotations and structures related to that specific UniProt entry. While it doesn’t automatically “piece together” fragments, by grouping entries by UniProt ID, PDBe-KB helps you contextualise individual domain structures by providing a holistic and easily searchable overview of a protein’s overall structural and functional landscape. This includes experimental models from the PDB, comparisons with AlphaFold models, and a comprehensive gallery of small molecules known to interact with the protein of interest, all under a single protein identifier.
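Once individual structures are mapped to UniProt numbering (for example via the residue-level mappings that PDBe-KB and SIFTS provide), a simple interval merge shows which parts of the full-length sequence have experimental coverage and which do not. This is a minimal sketch under that assumption: it does not reassemble the 3D structure, and the entry IDs, ranges and sequence length below are placeholders.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive (start, end) residue ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_report(fragments, seq_length):
    """Return (covered ranges, uncovered ranges, covered fraction).

    fragments: (entry ID, first residue, last residue) in UniProt numbering,
    assumed to lie within 1..seq_length.
    """
    covered = merge_ranges([(s, e) for _, s, e in fragments])
    n_covered = sum(e - s + 1 for s, e in covered)
    uncovered = []
    pos = 1  # first residue not yet accounted for
    for s, e in covered:
        if s > pos:
            uncovered.append((pos, s - 1))
        pos = e + 1
    if pos <= seq_length:
        uncovered.append((pos, seq_length))
    return covered, uncovered, n_covered / seq_length


# Hypothetical domain structures of a 500-residue protein.
fragments = [("XXX1", 1, 120), ("XXX2", 110, 260), ("XXX3", 300, 450)]
covered, uncovered, fraction = coverage_report(fragments, seq_length=500)
print(covered)             # [(1, 260), (300, 450)]
print(uncovered)           # [(261, 299), (451, 500)]
print(round(fraction, 2))  # 0.82
```

The uncovered ranges are the regions where you will have to rely on sequence data and biochemical studies, as described above, rather than on experimental coordinates.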

Hydrogen atoms
When evaluating a macromolecular model, you might notice that not all atoms are explicitly present in the coordinate file, particularly hydrogen atoms. The inclusion of hydrogen atoms depends heavily on the experimental method used.
Most crystallographic experiments do not resolve hydrogen atoms; therefore, most crystallographic coordinate files in the PDB archive only include positions for the non-hydrogen (heavy) atoms. NMR-determined structures, on the other hand, most often include all of the hydrogen atoms in the structure. This is because much of the experimental information obtained in NMR experiments consists of distances and angles involving these hydrogen atoms.
Understanding which atoms are explicitly modelled and why is crucial for the accurate interpretation of hydrogen bonding networks and other fine-detail interactions.
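To check quickly whether a coordinate file contains explicit hydrogens, you can count atoms whose element symbol is H (or D for deuterium). The sketch below assumes PDB-format text, where the element symbol sits in columns 77-78; in mmCIF the equivalent field is `_atom_site.type_symbol`. The helper `atom_line` exists only to build well-formed test records and is hypothetical.

```python
def count_hydrogens(pdb_text):
    """Return (hydrogen count, total atom count) for ATOM/HETATM records.

    Reads the element symbol from columns 77-78. If that field is blank, it
    falls back to the atom name with leading digits removed (a heuristic that
    can misclassify, e.g., a mercury atom named "HG").
    """
    n_h = n_total = 0
    for line in pdb_text.splitlines():
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue
        n_total += 1
        element = line[76:78].strip().upper()
        if not element:
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element in ("H", "D"):  # count deuterium as well
            n_h += 1
    return n_h, n_total


def atom_line(serial, name, resname, chain, resseq, element):
    """Build a minimal fixed-column ATOM record (atom names up to 3 characters, zero coordinates)."""
    return (f"ATOM  {serial:>5}  {name:<3} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{0.0:>6.2f}"
            f"          {element:>2}")


sample = "\n".join([
    atom_line(1, "N", "GLY", "A", 1, "N"),
    atom_line(2, "CA", "GLY", "A", 1, "C"),
    atom_line(3, "H", "GLY", "A", 1, "H"),
])
n_h, n_total = count_hydrogens(sample)
print(f"{n_h} of {n_total} atoms are hydrogens")  # 1 of 3 atoms are hydrogens
```

A count of zero for a crystallographic entry is expected and is not by itself a sign of a problem with the model.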
Conformational variability
Biological macromolecules are not rigid, static objects; they are dynamic and can undergo significant changes in their 3D shape, or conformation, as part of their function. The functional state of the target protein is therefore an important factor to consider when selecting a PDB structure. For example, an enzyme might change shape upon binding a substrate, or a transporter protein might alternate between inward-facing and outward-facing conformations.
When you obtain a structure from the PDB, it represents the molecule in a specific state captured under the experimental conditions used (e.g., bound to a ligand, in the presence of a specific ion, or under particular buffer conditions). It’s important to ask: Does this structure represent the biological state I am interested in?
Conformational change upon calcium binding in Calmodulin (PDB IDs: 1CLL and 1CFD)
The orange structure (PDB ID: 1CLL) represents Calmodulin in its calcium-bound state, while the green structure (PDB ID: 1CFD) shows Calmodulin in its apo (calcium-free) state. In this example, the binding of calcium ions induces a notable conformational change in Calmodulin, which highlights the importance of selecting a structural model that represents the specific biological state relevant to your research.
If you are interested in the protein’s active state or active site, you should choose a PDB structure of the protein in complex with its natural ligand or a similar molecule. If you are studying an inhibitor binding mode, you should not select a structure in the apo form (without a ligand) or one bound to an agonist, because these forms are likely to adopt conformations that differ from the inhibitor-bound state. Instead, look for a structure bound to an inhibitor, preferably one with a binding mode similar to that of your ligand(s).
In some cases, the only available structures are in the apo form rather than bound to an inhibitor. You can still use an apo structure, but with caution: you may need to employ a flexible docking protocol or run a molecular dynamics simulation to account for the possible conformational differences between the unbound (apo) and bound (holo) forms of your target structure.
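The selection rules above (prefer inhibitor-bound structures, avoid agonist-bound ones, fall back to apo forms with caution) can be written down as a small filter. This is a sketch only: the `Entry` record is hypothetical, ranking by resolution is an added assumption rather than a rule from this course, and judging whether a binding mode resembles your ligand still requires inspecting the structures.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical summary of one PDB entry of the target protein."""
    pdb_id: str                 # placeholder IDs in the example below
    ligand_role: Optional[str]  # e.g. "inhibitor" or "agonist"; None means apo
    resolution: float           # in Å; lower is better


def pick_for_inhibitor_study(entries: List[Entry]) -> Tuple[List[Entry], bool]:
    """Rank candidate structures for studying an inhibitor binding mode.

    Inhibitor-bound entries are preferred. If there are none, apo entries are
    returned instead and the flag is True, signalling that flexible docking or
    molecular dynamics may be needed. Agonist-bound entries are never returned.
    """
    inhibitor_bound = [e for e in entries if e.ligand_role == "inhibitor"]
    if inhibitor_bound:
        return sorted(inhibitor_bound, key=lambda e: e.resolution), False
    apo = [e for e in entries if e.ligand_role is None]
    return sorted(apo, key=lambda e: e.resolution), True


entries = [
    Entry("XXX1", None, 1.8),
    Entry("XXX2", "agonist", 2.0),
    Entry("XXX3", "inhibitor", 2.4),
    Entry("XXX4", "inhibitor", 2.1),
]
ranked, needs_flexibility = pick_for_inhibitor_study(entries)
print([e.pdb_id for e in ranked], needs_flexibility)  # ['XXX4', 'XXX3'] False
```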
Comparing structures of the same molecule determined in different states or by different experimental methods can provide valuable insights into conformational changes and molecular dynamics.
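A common way to quantify such a comparison is the RMSD of matched atoms (for example, the same C-alpha atoms in both structures) after optimal superposition, e.g. with the Kabsch algorithm. The sketch below swaps in a simpler, superposition-free measure, the distance-matrix RMSD, which compares all internal pairwise distances and therefore ignores overall position and orientation. Its values are not directly comparable to superposition-based RMSD, and the coordinates are toy data.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two conformations of the same atoms.

    coords_a and coords_b are equal-length lists of (x, y, z) tuples for
    matched atoms. Only internal distances are compared, so no superposition
    step is needed.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equal-length lists of at least two atoms")
    pairs = list(combinations(range(len(coords_a)), 2))
    total = 0.0
    for i, j in pairs:
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        total += (d_a - d_b) ** 2
    return math.sqrt(total / len(pairs))


# Toy C-alpha positions (Å) for three matched residues in two hypothetical states.
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
moved = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in extended]  # rigid shift only

print(f"{distance_rmsd(extended, moved):.2f}")  # 0.00: same shape, different position
print(f"{distance_rmsd(extended, bent):.2f}")   # 1.29
```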
Mutations
When selecting a structure from the PDB, it’s crucial to consider whether any mutations were introduced to the protein’s native (wild type) sequence. If your goal is to study the wild type (WT) protein, you will want to avoid mutated structures. This is especially important if the mutation is located within a key region of interest, such as a ligand binding site or an active site, as it could alter the structure and function in ways that are not representative of the native protein. If you are interested in a mutant form of the protein, you should deliberately choose a PDB structure of that specific mutant.
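PDB entries usually flag engineered mutations explicitly (SEQADV records in PDB-format files, the `_struct_ref_seq_dif` category in mmCIF), so check those first. As a cross-check, the sketch below compares a structure’s sequence with the wild-type UniProt sequence and lists substitutions in the conventional wild-type/position/mutant notation. It assumes the two sequences are already aligned to the same length (the alignment step is not shown); the function name and sequences are hypothetical.

```python
def point_mutations(reference, observed, offset=1):
    """List substitutions between two aligned sequences of equal length.

    reference: wild-type sequence; observed: sequence from the structure.
    Positions are numbered along the reference, starting at `offset`;
    alignment gaps ("-") in either sequence are skipped, not reported.
    """
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    pos = offset - 1
    for wt, obs in zip(reference, observed):
        if wt != "-":
            pos += 1  # advance only on reference residues
        if wt == "-" or obs == "-" or wt == obs:
            continue
        mutations.append(f"{wt}{pos}{obs}")
    return mutations


print(point_mutations("MKTAYIAKQR", "MKTAYNAKQR"))             # ['I6N']
print(point_mutations("MKTAYIAKQR", "MKTAYNAKQR", offset=20))  # ['I25N']
```

Any substitution that falls inside your binding site or active site is a reason to look for a different entry if you need the wild type.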