Where do the data come from?

There are three major methods to determine structures of biomacromolecules: X-ray crystallography, NMR spectroscopy and cryo-electron microscopy.

Some data deposited in the PDB were obtained using electron crystallography, fibre diffraction and neutron diffraction.

Next, we will summarise the three main methods.


X-ray crystallography

X-ray crystallography uses the diffraction of X-rays to determine the structure of macromolecules. Over 85% of the structures in the PDB derive from this experimental method. Many macromolecules and macromolecular complexes form crystals under the right conditions. The crystals are routinely cooled to cryogenic temperatures and exposed to synchrotron X-ray radiation to give a diffraction pattern (see Figure 21). Once the electron density has been calculated from the diffraction pattern, a structural model can be fitted into it. Both the measured diffraction data and the structural model are archived in the PDB.

Figure 21 Workflow for X-ray crystallography. From Wikipedia.

Nuclear magnetic resonance spectroscopy

Nuclear magnetic resonance (NMR) spectroscopy is the second most common method of structure determination, providing ~14% of all entries in the PDB. It relies on the fact that some atomic nuclei are magnetically active and emit radiofrequency signals when placed in a strong external magnetic field (on the order of 10-20 tesla, which is hundreds of thousands of times stronger than the Earth's magnetic field at its surface). Typical data collection may take 2-3 weeks for a small soluble protein, but can take substantially longer for larger systems.
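To give a feel for the numbers, the sketch below estimates the radiofrequency at which each of the commonly observed nuclei resonates in a given field, and compares that field with the Earth's. The gyromagnetic ratios are rounded reference values, the Earth's surface field is assumed to be about 50 microtesla (it varies between roughly 25 and 65 microtesla depending on location), and the function names are purely illustrative.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (rounded reference values).
GAMMA_BAR_MHZ_PER_T = {
    "1H": 42.577,
    "13C": 10.708,
    "15N": -4.316,  # the negative sign only indicates the sense of precession
}

# Assumption: a typical value for the Earth's surface field, in tesla.
EARTH_FIELD_T = 50e-6


def larmor_frequency_mhz(nucleus: str, field_tesla: float) -> float:
    """Return the magnitude of the resonance (Larmor) frequency in MHz."""
    return abs(GAMMA_BAR_MHZ_PER_T[nucleus]) * field_tesla


def ratio_to_earth_field(field_tesla: float) -> float:
    """Return how many times stronger the magnet is than the assumed Earth field."""
    return field_tesla / EARTH_FIELD_T


if __name__ == "__main__":
    for field in (10.0, 14.1, 20.0):
        freqs = ", ".join(
            f"{nucleus}: {larmor_frequency_mhz(nucleus, field):.1f} MHz"
            for nucleus in GAMMA_BAR_MHZ_PER_T
        )
        print(f"{field:4.1f} T ({ratio_to_earth_field(field):,.0f}x Earth) -> {freqs}")
```

A 14.1 tesla magnet gives a 1H frequency of about 600 MHz, which is why such an instrument is usually referred to as a "600 MHz spectrometer".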

The measurements in NMR spectroscopy are a number of different complex spectra that report, among other things, on the chemical environment for the magnetically active nuclei (most commonly 1H, 13C and 15N), on chemical bond connections between nuclei and on short distances between specific atoms. These short distances constitute constraints for molecular dynamics (MD) simulation software, which attempts to satisfy as many of them as possible. The outcome of MD simulation is an ensemble of structures (usually 10-20) which, when combined, best satisfy the experimental data. The whole ensemble is deposited in the PDB.
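The idea of selecting structures that satisfy distance constraints can be illustrated with a toy example. The sketch below is not how structure calculation software works internally (real programs run restrained simulations with full force fields); it only shows the final check of counting how many upper-bound distance constraints each model in an ensemble violates. The coordinates, atom labels and constraint values are all hypothetical.

```python
import math

# Hypothetical toy ensemble: each model maps an atom label to (x, y, z) in angstroms.
models = [
    {"A": (0.0, 0.0, 0.0), "B": (3.0, 0.0, 0.0), "C": (0.0, 4.0, 0.0)},
    {"A": (0.0, 0.0, 0.0), "B": (6.0, 0.0, 0.0), "C": (0.0, 4.5, 0.0)},
]

# Hypothetical constraints: (atom_i, atom_j, upper distance bound in angstroms).
constraints = [("A", "B", 5.0), ("A", "C", 5.0), ("B", "C", 5.5)]


def count_violations(model, constraints, tolerance=0.0):
    """Count constraints whose atom pair is further apart than the allowed bound."""
    violations = 0
    for atom_i, atom_j, upper_bound in constraints:
        distance = math.dist(model[atom_i], model[atom_j])
        if distance > upper_bound + tolerance:
            violations += 1
    return violations


if __name__ == "__main__":
    for index, model in enumerate(models, start=1):
        print(f"model {index}: {count_violations(model, constraints)} violation(s)")
```

In this toy data the first model satisfies all three constraints, while the second violates two of them and would be a poorer candidate for the final ensemble.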

On Line Database of Ensemble Representatives And DOmains (OLDERADO) provides analysis of clustering and domain composition for NMR structure ensembles. Vivaldi provides visualisation and validation display for NMR ensembles.



 In order to provide you with a real case scenario, we illustrate an example below (Figure 22):

Figure 22 15N-HSQC (heteronuclear single quantum coherence) spectrum of a protein. It is a two-dimensional spectrum in which each peak corresponds to an N-H (amide) group and essentially labels one residue of the protein. The HSQC spectrum is therefore often called the "fingerprint" experiment, as each protein has a unique pattern of peaks. The horizontal axis gives the chemical shifts of the hydrogens, while the vertical axis gives those of the nitrogens. Chemical shift is a parameter that is very sensitive to the exact chemical environment of a particular atom, and can therefore act as a "label" or "reporter" for that atom. The protein sample is enriched with the 15N isotope of nitrogen, which interacts with the magnetic field more strongly than the more common 14N isotope.


Cryo-electron microscopy

Electron microscopy uses beams of electrons to form images of macromolecules and complexes. Electron microscopes can reveal smaller details than light microscopes because electrons have a much shorter wavelength than the photons of visible light. However, the resolution is typically lower than that of X-ray crystallography or NMR spectroscopy. Images of macromolecules can be obtained in artificially stained states or frozen in layers of glassy 'vitreous' ice (Figure 23). Images of a macromolecule in a large number of orientations can be combined computationally to give a 3D reconstruction of the electron density. The reconstructed 3D density map is deposited in the Electron Microscopy Data Bank (EMDB). In many cases a higher-resolution X-ray or NMR structure can be positioned in this electron density volume to allow its interpretation. It is these positioned 'fitted' structures that are then made available in the PDB.
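The wavelength difference between electrons and visible light can be estimated with a short calculation. The sketch below uses the relativistic de Broglie relation for an electron accelerated through a given voltage. The 300 kV accelerating voltage and the 500 nm figure for visible light are assumptions chosen for illustration, not values from this course. Note that the resolution achieved in practice is limited by many other factors (lens aberrations, radiation damage, sample movement), so it is far worse than the wavelength alone would suggest.

```python
import math

# Physical constants in SI units (exact or CODATA 2018 values).
PLANCK = 6.62607015e-34            # J s
ELECTRON_MASS = 9.1093837015e-31   # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
SPEED_OF_LIGHT = 299792458.0       # m / s


def electron_wavelength_m(accelerating_voltage_v: float) -> float:
    """Relativistic de Broglie wavelength (metres) of an accelerated electron."""
    kinetic_energy = ELEMENTARY_CHARGE * accelerating_voltage_v
    rest_energy = ELECTRON_MASS * SPEED_OF_LIGHT ** 2
    # p^2 = 2 m E (1 + E / (2 m c^2)), the relativistic momentum-energy relation.
    momentum = math.sqrt(
        2.0 * ELECTRON_MASS * kinetic_energy * (1.0 + kinetic_energy / (2.0 * rest_energy))
    )
    return PLANCK / momentum


if __name__ == "__main__":
    voltage = 300e3            # assumption: 300 kV accelerating voltage
    visible_light_m = 500e-9   # assumption: green light, about 500 nm
    wavelength = electron_wavelength_m(voltage)
    print(f"electron wavelength at {voltage / 1e3:.0f} kV: {wavelength * 1e12:.2f} pm")
    print(f"visible light is roughly {visible_light_m / wavelength:,.0f} times longer")
```

At 300 kV the electron wavelength comes out at about 2 picometres, several orders of magnitude shorter than visible light.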

Figure 23 CryoEM image of GroEL suspended in vitreous ice at 50,000× magnification. Image from Wikipedia.