What information does a PDB entry contain?
A typical PDB entry consists of the 3D coordinates of proteins and/or nucleic acids (e.g. DNA, RNA) along with bound molecules (e.g. sugars, lipids, inhibitors, metals) and solvent molecules (e.g. water).
Each PDB entry is an experimentally determined macromolecular structure. That is, it reflects a sample generated in the ‘real world’, and a scientist or group of scientists has taken experimental steps to determine what that sample looks like in 3D, described at the atomic level.
Key insight useful for navigating a PDB entry
Polymer vs. non-polymer
The contents of each entry can be broadly separated into two categories, polymers and non-polymers:
- Biological macromolecules such as proteins, DNA, RNA and polysaccharides are categorised as polymers because they are composed of multiple linked units:
  - Proteins are polymers of amino acids.
  - DNA strands are polymers of deoxyribonucleotides.
  - RNA molecules are polymers of ribonucleotides.
  - Polysaccharides are carbohydrates composed of multiple monosaccharide units. The smallest case is a disaccharide: for example, sucrose (table sugar) is composed of one glucose unit and one fructose unit.
- Water and other bound small molecules, in contrast, fall in the non-polymer category. A single amino acid, a single nucleotide, or a single monosaccharide (e.g. an individual glucose or fructose unit) bound in a structure is also classed as a non-polymer.
Entity vs. instance
In one PDB entry, the same polymer or non-polymer can occur more than once. Each occurrence is an instance (often labelled A, B, C, etc.), and all instances with an identical chemical composition are grouped together and identified as the same entity.
- Multiple instances of a protein chain with an identical amino acid sequence are one entity.
  - Each protein chain instance has a unique identifier (e.g. A, B, aa, 1), but if it is the same entity as another protein chain, both have the same name and the same associated sequence.
- Multiple instances of a small molecule with an identical chemical formula, bonds and stereochemistry are one entity.
  - Each small molecule instance has a unique identifier (e.g. A101, B201, B202), but if it is the same entity as another small molecule, both have the same name, chemical formula, molecular weight, etc.
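The grouping of instances into entities can be illustrated with a minimal Python sketch. It assumes the chain identifiers and sequences are already available as a plain dictionary; the function name `group_into_entities` and the toy sequences are hypothetical and chosen only for illustration, and the sketch compares sequences alone rather than the full chemical composition.

```python
from collections import defaultdict


def group_into_entities(chains):
    """Group chain instances that share an identical sequence into entities.

    `chains` maps an instance identifier (e.g. "A") to its sequence.
    Returns a dict mapping a 1-based entity number to a sorted list of
    the instance identifiers that belong to that entity.
    """
    by_sequence = defaultdict(list)
    for chain_id, sequence in chains.items():
        by_sequence[sequence].append(chain_id)
    # Number the entities in order of first appearance.
    return {n: sorted(ids) for n, ids in enumerate(by_sequence.values(), start=1)}


# Hypothetical toy data: chains A and B share a sequence, chain C differs.
chains = {"A": "MKTAYIAK", "B": "MKTAYIAK", "C": "GSHMLEDP"}
entities = group_into_entities(chains)
```

Here chains A and B form one entity with two instances, while chain C is a second entity with a single instance.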
Aspects to explore
The 3-dimensional (3D) atomic coordinates in a PDB entry can be explored for a variety of information:
- interactive 3D visualisation of the structure
- secondary structure, domains and folds present in the proteins
- biological assembly or quaternary structures for the proteins and DNA/RNA
- sequence information for all the proteins and nucleic acids that are present in the entry along with their mapping to UniProt (protein) or GenBank (RNA)
- bound molecules or ligands and their environment
- source and expression system of the proteins/nucleic acids
- quality of the structure and experimental information
- publication information
Unique codes
All PDB entries are assigned a unique accession code. Currently this is a four-character code that always starts with a number, e.g. 1xyz.
The composition of the PDB code is planned to change in the future to handle the amount of data in the PDB database: the number of entries is expected to exceed the number of unique four-character codes available by the end of 2029. Information about this change can be found here:
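As a rough illustration, the current four-character format can be checked with a regular expression. This is a sketch under stated assumptions, not an official validator: the text above only says that the code starts with a number, so the leading digit range of 1 to 9 and the restriction of the remaining three characters to letters or digits are assumptions, and the name `is_four_char_pdb_id` is hypothetical.

```python
import re

# Assumption: a leading digit 1-9 followed by exactly three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_four_char_pdb_id(code):
    """Return True if `code` looks like a current four-character PDB code."""
    return PDB_ID_PATTERN.fullmatch(code) is not None


looks_valid = is_four_char_pdb_id("1xyz")   # starts with a digit, four characters
wrong_start = is_four_char_pdb_id("xyz1")   # starts with a letter
```

A check like this only tests the shape of the code; it does not confirm that an entry with that code exists.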
https://www.wwpdb.org/documentation/new-format-for-pdb-ids