EMDB data model

Current map and core data model formats


Dictionary development for the Electron Microscopy Data Bank

New deposition and annotation (D&A) system


The Worldwide Protein Data Bank (wwPDB) and EMDataBank are jointly developing a new deposition and annotation (D&A) system. Its aims are to streamline the deposition of biomacromolecular structure data and to provide tools for validation. Because the new D&A system is expected to remain in service for at least 10 years, the underlying data model used to describe EM experiments must capture the important aspects of the various EM methodologies and be flexible enough to adapt to the changes and new developments that are bound to occur in this rapidly evolving field.

The new data model has been implemented, and will be maintained, as an XML schema. For the purposes of the D&A system, it will be translated into mmCIF.
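As a rough illustration of what translating an XML representation into mmCIF involves, the sketch below flattens the child elements of one XML element into mmCIF "_category.item value" lines. The element names, the example_imaging category, and the helper functions are hypothetical and do not reflect the actual EMDB XML schema or mmCIF dictionary. A real translation would be driven by an explicit item-by-item mapping between the two.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: the element names are illustrative only and
# are not taken from the real EMDB XML schema.
XML_FRAGMENT = """
<microscopy>
  <microscope>FEI TITAN KRIOS</microscope>
  <accelerating_voltage units="kV">300</accelerating_voltage>
  <nominal_cs units="mm">2.7</nominal_cs>
</microscopy>
"""


def quote_cif(value: str) -> str:
    """Wrap a value in single quotes if it is empty or contains whitespace.

    Values that themselves contain quote characters are not handled here.
    """
    if value == "" or any(ch.isspace() for ch in value):
        return f"'{value}'"
    return value


def xml_to_mmcif(xml_text: str, category: str) -> list[str]:
    """Flatten the child elements of the root into mmCIF key-value lines.

    Attributes such as units are dropped; a real mapping would route them
    to the appropriate mmCIF items.
    """
    root = ET.fromstring(xml_text.strip())
    lines = []
    for child in root:
        value = (child.text or "").strip()
        lines.append(f"_{category}.{child.tag} {quote_cif(value)}")
    return lines


for line in xml_to_mmcif(XML_FRAGMENT, "example_imaging"):
    print(line)
```

Keeping the XML schema as the maintained source and generating mmCIF from it means only one representation has to be edited by hand.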


Core data model


Macromolecule and complex description: Macromolecules correspond to basic types (protein, DNA, RNA, saccharide, lipid, ligand, EM label) and follow the wwPDB representation. A complex is any combination of macromolecules. This multilevel organization can be used to describe samples in single-particle EM, macromolecular tomography, and cellular tomography.
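The multilevel organization can be pictured as a tree in which complexes group macromolecules. The class and field names below (Macromolecule, Complex, entity_id, mol_type) are hypothetical and chosen for illustration rather than taken from the EMDB schema, and allowing a complex to contain other complexes is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator


class MoleculeType(Enum):
    """Basic macromolecule types named in the core data model."""
    PROTEIN = "protein"
    DNA = "DNA"
    RNA = "RNA"
    SACCHARIDE = "saccharide"
    LIPID = "lipid"
    LIGAND = "ligand"
    EM_LABEL = "EM label"


@dataclass
class Macromolecule:
    entity_id: str
    name: str
    mol_type: MoleculeType


@dataclass
class Complex:
    entity_id: str
    name: str
    # Assumption: components may be macromolecules or nested complexes.
    components: list[Macromolecule | Complex] = field(default_factory=list)

    def macromolecules(self) -> Iterator[Macromolecule]:
        """Yield every macromolecule, descending into nested complexes."""
        for component in self.components:
            if isinstance(component, Complex):
                yield from component.macromolecules()
            else:
                yield component


# Example: a ribosome described as a complex containing a subunit complex.
rrna = Macromolecule("1", "16S rRNA", MoleculeType.RNA)
us12 = Macromolecule("2", "ribosomal protein uS12", MoleculeType.PROTEIN)
small_subunit = Complex("C1", "30S subunit", [rrna, us12])
ribosome = Complex("C0", "70S ribosome", [small_subunit])

print([m.name for m in ribosome.macromolecules()])
```

The same structure scales from a single purified complex in single-particle EM to the many components of a cellular tomogram.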


Better connection to other biological databases: EMDB entries are currently linked to external resources such as Gene Ontology, InterPro, PubMed, Digital Object Identifiers (DOIs), NCBI Taxonomy and wwPDB. As part of a continuing effort to increase the value of EMDB entries, we are assessing links to other biological databases:

  • Citations: PubMed, Digital Object Identifier (DOI)
  • Ligands: PubChem, DrugBank, ChEBI, ChEMBL
  • Lipids: LIPID MAPS Structure Database (LMSD)
  • Complexes: Gene Ontology
  • Proteins: InterPro, UniProt, Enzyme Commission (EC) classification, wwPDB
  • DNA/RNA: RefSeq, GenBank
  • Carbohydrates: CarbBank
  • Taxonomy: NCBI Taxonomy
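One way to organize these links is a table mapping each kind of entry component to the databases it may reference, which a deposition tool could use to flag unexpected cross-references. The short database labels (for example GO and EC), the component keys, and the validate_xrefs helper are hypothetical; the accessions in the usage lines are example values only.

```python
from dataclasses import dataclass

# Databases each entry component may link to, following the list above.
XREF_TARGETS: dict[str, set[str]] = {
    "citation": {"PubMed", "DOI"},
    "ligand": {"PubChem", "DrugBank", "ChEBI", "ChEMBL"},
    "lipid": {"LMSD"},
    "complex": {"GO"},
    "protein": {"InterPro", "UniProt", "EC", "wwPDB"},
    "nucleic_acid": {"RefSeq", "GenBank"},
    "carbohydrate": {"CarbBank"},
    "taxonomy": {"NCBI Taxonomy"},
}


@dataclass(frozen=True)
class CrossReference:
    database: str
    accession: str


def validate_xrefs(component: str, xrefs: list[CrossReference]) -> list[CrossReference]:
    """Return the cross-references whose database is not expected for this component."""
    allowed = XREF_TARGETS.get(component, set())
    return [x for x in xrefs if x.database not in allowed]


xrefs = [CrossReference("UniProt", "P69905"), CrossReference("ChEBI", "CHEBI:15422")]
print(validate_xrefs("protein", xrefs))
```

Keeping the allowed targets in data rather than code makes it easy to add a resource once its linkage has been assessed.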

Better support for advances in the EM field:

  • Helical processing with IHRSR (iterative helical real-space reconstruction)
  • Subtomogram averaging processing
  • Tomography processing
  • New microscopy devices, etc.

Segmentation model

Segmentation is the process of dividing a map into sub-regions. The new schema allows segments to be represented hierarchically, and each region can be biologically annotated as well as spatially defined.
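A hierarchical segmentation can be sketched as a tree of segments, each carrying an identifier that points at a macromolecule or complex and a set of voxels. The Segment class, its fields, and the identifiers in the example are hypothetical; a plain voxel set stands in for whatever raster or vector encoding the schema actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional

Voxel = tuple[int, int, int]  # (z, y, x) index into the map


@dataclass
class Segment:
    segment_id: str
    # Unique identifier of the annotated macromolecule or complex (hypothetical IDs).
    annotation_ref: Optional[str]
    voxels: set[Voxel] = field(default_factory=set)
    children: list[Segment] = field(default_factory=list)

    def region(self) -> set[Voxel]:
        """Voxels of this segment together with those of all its descendants."""
        result = set(self.voxels)
        for child in self.children:
            result |= child.region()
        return result

    def walk(self, depth: int = 0) -> Iterator[tuple[int, Segment]]:
        """Depth-first traversal yielding (depth, segment) pairs."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


head = Segment("S2", "C1", {(0, 0, 0), (0, 0, 1)})
tail = Segment("S3", "C2", {(0, 1, 0)})
phage = Segment("S1", "C0", children=[head, tail])

for depth, seg in phage.walk():
    print("  " * depth + f"{seg.segment_id} -> {seg.annotation_ref}")
print(len(phage.region()))
```

Deriving a parent's region from its children avoids storing the same voxels twice at different levels of the hierarchy.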


  • Proposed segmentation model

    1. Biological annotation: each segment is biologically annotated and linked to either a macromolecule or a complex through a unique identifier.
    2. Spatial definition: the new schema will allow raster- and vector-based representations of each segment:
      • Run-length encoding
      • Basic vector primitives
      • Links to pre-existing map formats
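Run-length encoding stores a label volume as a sequence of (value, run length) pairs, which is compact when segments occupy contiguous stretches of voxels. The sketch assumes the map has already been flattened into one dimension in a fixed voxel order, and it uses 0 for background and other integers as hypothetical segment labels; the exact encoding used by the schema may differ.

```python
from itertools import groupby


def rle_encode(labels: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive equal labels into (label, run_length) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]


def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (label, run_length) pairs back into a flat label list."""
    return [label for label, length in runs for _ in range(length)]


# 0 = background; 2 and 3 are example segment labels.
flat_labels = [0, 0, 2, 2, 2, 0, 3, 3]
runs = rle_encode(flat_labels)
print(runs)  # [(0, 2), (2, 3), (0, 1), (3, 2)]
assert rle_decode(runs) == flat_labels
```

Because decoding depends on the voxel order, any stored encoding would also need to record that order and the map dimensions.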

Fourier shell correlation (FSC) model

Fourier shell correlation (FSC) is the most widely used method for assessing the resolution of maps deposited in the EMDB archive.
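For reference, the FSC between two independently determined maps (typically half-maps) with Fourier transforms F_1 and F_2 is evaluated shell by shell over the set S(k) of Fourier components at spatial frequency k:

$$
\mathrm{FSC}(k) = \frac{\sum_{\mathbf{k}_i \in S(k)} F_1(\mathbf{k}_i)\, F_2^{*}(\mathbf{k}_i)}{\sqrt{\sum_{\mathbf{k}_i \in S(k)} \left|F_1(\mathbf{k}_i)\right|^{2} \; \sum_{\mathbf{k}_i \in S(k)} \left|F_2(\mathbf{k}_i)\right|^{2}}}
$$

For real-valued maps the numerator is real, because Friedel mates within each shell contribute complex-conjugate terms.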


  • Current FSC model
  • Proposed FSC model
  • The FSC curve depends on several parameters, such as the mask and the symmetry applied, and its interpretation depends on the threshold criterion used (e.g., 0.143 or 0.5). The new FSC data model provides the means to describe all of these factors comprehensively.
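The dependence of the curve on the mask and on the threshold can be sketched with NumPy. The functions fsc and resolution_at_threshold are hypothetical helpers, not part of any EMDB tool. The sketch assumes cubic maps with isotropic voxels, applies the optional mask by simple multiplication in real space, ignores symmetry, and reports the first shell whose FSC falls below the threshold without interpolating between shells.

```python
from typing import Optional

import numpy as np


def fsc(map1: np.ndarray, map2: np.ndarray, mask: Optional[np.ndarray] = None) -> np.ndarray:
    """Fourier shell correlation of two cubic maps, one value per integer shell."""
    assert map1.shape == map2.shape and map1.ndim == 3 and len(set(map1.shape)) == 1
    if mask is not None:
        # Real-space masking: both maps are multiplied by the same mask.
        map1, map2 = map1 * mask, map2 * mask
    f1, f2 = np.fft.fftn(map1), np.fft.fftn(map2)

    n = map1.shape[0]
    freq = np.fft.fftfreq(n) * n  # integer frequency indices along one axis
    kz, ky, kx = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2
    inside = shell < n_shells  # keep shells 0 .. n//2 - 1; corners are dropped
    idx = shell[inside]
    num = np.bincount(idx, weights=np.real(f1 * np.conj(f2))[inside], minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[inside], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[inside], minlength=n_shells)

    denom = np.sqrt(p1 * p2)
    return np.divide(num, denom, out=np.zeros(n_shells), where=denom > 0)


def resolution_at_threshold(curve: np.ndarray, box_size: int, pixel_size: float,
                            threshold: float = 0.143) -> float:
    """Resolution (units of pixel_size) at the first shell where FSC < threshold."""
    for k in range(1, len(curve)):
        if curve[k] < threshold:
            # Shell k corresponds to spatial frequency k / (box_size * pixel_size).
            return box_size * pixel_size / k
    # No crossing: resolution is at least as good as the last shell computed.
    return box_size * pixel_size / max(len(curve) - 1, 1)


rng = np.random.default_rng(0)
signal = rng.normal(size=(32, 32, 32))
half1 = signal + 0.5 * rng.normal(size=signal.shape)
half2 = signal + 0.5 * rng.normal(size=signal.shape)
curve = fsc(half1, half2)
print(np.round(curve[:4], 2))
print(resolution_at_threshold(curve, box_size=32, pixel_size=1.0))
```

Changing the mask argument or the threshold changes the reported resolution for the same pair of maps, which is why both need to be recorded alongside the curve.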