Documentation
Summary
EMDB map data model
The EM Data Bank (EMDB) accepts and distributes 3D map volumes derived from several types of EM reconstruction methods, including single particle averaging, helical averaging, 2D crystallography, and tomography. Since its inception in 2002, the EMDB map distribution format has followed the CCP4 definition (CCP4 map format), which is widely recognized by software packages used by the structural biology community. The CCP4 map format is closely related to the MRC map format used in the 3DEM community (MRC map format); CCP4 is slightly more restrictive, in that voxel positions are limited to a grid that includes the Cartesian coordinate origin (0,0,0). Further details can be found here.
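As a rough illustration of how such a map header can be inspected, the sketch below decodes a few leading fields of the 1024-byte CCP4/MRC header using only the Python standard library. It assumes a little-endian file, an orthogonal cell, and the default axis order; the names `parse_map_header` and `voxel_position` are illustrative and not part of any EMDB tool.

```python
import struct

HEADER_SIZE = 1024  # fixed header length, in bytes, of a CCP4/MRC map


def parse_map_header(header: bytes) -> dict:
    """Decode a few leading header fields (little-endian is an assumption)."""
    if len(header) < HEADER_SIZE:
        raise ValueError("expected at least 1024 header bytes")
    # Words 1-10: grid dimensions, data mode, start offsets, cell sampling.
    nx, ny, nz, mode, xs, ys, zs, mx, my, mz = struct.unpack("<10i", header[:40])
    # Words 11-16: cell lengths (angstroms) and cell angles (degrees).
    cell = struct.unpack("<6f", header[40:64])
    # Words 17-19: which cell axis maps to columns, rows and sections.
    axis_order = struct.unpack("<3i", header[64:76])
    return {
        "dimensions": (nx, ny, nz),
        "mode": mode,
        "start": (xs, ys, zs),
        "sampling": (mx, my, mz),
        "cell_lengths": cell[:3],
        "cell_angles": cell[3:],
        "axis_order": axis_order,
        # Word 53 holds the 4-character tag 'MAP '.
        "has_map_tag": header[208:212] == b"MAP ",
    }


def voxel_position(index: int, start: int, sampling: int, cell_length: float) -> float:
    """Coordinate of a voxel along one axis, assuming an orthogonal cell.

    Because start offsets are whole grid steps, the grid always contains
    the point where (start + index) == 0, i.e. the coordinate origin.
    """
    return (start + index) * cell_length / sampling
```

With a start offset of -2, for example, the voxel at index 2 along that axis sits exactly at coordinate 0, which is the restriction described above.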
EMDB header data model
Every EMDB entry has a header file containing metadata (e.g., sample, detector, microscope, image processing) describing the experiment. The header file is an XML file whose structure and content are described by an XSD data model. In a field as dynamic as cryo-EM, there is a constant need to adapt and modify the schema to keep it up to date with the most recent developments. We consult extensively with the EM community regarding such issues and version the schema according to the policy described here.
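A minimal sketch of reading such a header with Python's standard XML parser is shown below. The sample document is invented for illustration: the element and attribute names used here (`emd`, `emdb_id`, `version`, `admin`, `title`) are assumptions and should be checked against the schema documentation, and `summarise_header` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented sample; real header files are far richer, and the names used
# here are assumptions rather than a statement of the actual schema.
SAMPLE_HEADER = """
<emd emdb_id="EMD-0000" version="3.0.0.0">
  <admin>
    <title>Hypothetical example entry</title>
  </admin>
</emd>
"""


def summarise_header(xml_text: str) -> dict:
    """Return the root attributes and top-level sections of a header."""
    root = ET.fromstring(xml_text.strip())
    return {
        "root_tag": root.tag,
        "entry_id": root.get("emdb_id"),
        "schema_version": root.get("version"),
        "title": root.findtext("admin/title"),
        "top_level_sections": [child.tag for child in root],
    }


summary = summarise_header(SAMPLE_HEADER)
```

Reading the schema version from the header first is one way for client code to decide which data model it is dealing with before touching version-specific elements.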
Data model version 1.9
This has been a long-term stable version of the data model. It was replaced in 2018 with an updated model, but XML header files in version 1.9 continue to be distributed in parallel for at least one year to give EMDB users ample time to switch. Note that version 1.9 header files are generated on a best-effort basis by back translation from more recent versions that are richer in content, and will therefore not contain all the information found in those versions.
Download schema
Browse schema documentation
Download Python code to facilitate reading and writing XML version 1.9 header files
Data model version 3.0 (current model)
This data model replaced version 1.9. Header files corresponding to both data models will, however, be distributed in parallel, with a view to stopping the distribution of the version 1.9 files in 2019 once users have had a chance to adopt version 3.0.
This version adds a number of features including:
- An improved description of direct electron detectors, specimen preparation and tomography experiments.
- A hierarchical description of the overall sample composition in combination with a low-level description of the macromolecular composition to allow the description of both molecular and cellular samples.
- Specific data items describing the half-maps and segmentations included with the entry.
Download schema
Browse schema documentation
Download Python code to facilitate reading and writing XML version 1.9 header files
EMDB segmentation data model
Segmentation is the decomposition of 3D volumes into regions that can be associated with defined objects. Following several consultations with the EM community (Patwardhan et al., 2012; Patwardhan et al., 2014; Patwardhan et al., 2017), the EMDB is in the process of developing tools to support deposition of volume segmentations with structured biological annotation, which is here defined as the association of data with identifiers (e.g., accession codes from UniProt) and ontologies taken from well-established bioinformatics resources. To our knowledge, none of the segmentation formats widely used in electron microscopy and related fields currently supports structured biological annotation. Third-party use of segmentations is further impeded by the prevalence of segmentation file formats and their lack of interoperability. EMDB therefore proposed an open segmentation file format called EMDB-SFF to capture basic segmentation data from application-specific segmentation file formats and provide the means for structured biological annotation. In this way, EMDB-SFF will not only enable depositions of segmentations but also act as a file interchange format between different applications and facilitate analysis of 3D reconstructions. Furthermore, EMDB-SFF supports the description of multiple transforms for a segment, thus allowing a segment to be used to describe the placement of a sub-tomogram average onto a tomographic reconstruction.
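To make the idea of multiple transforms per segment concrete, the sketch below applies a list of transforms to one point of a sub-tomogram average, yielding one placement per transform. It assumes each transform is a 3x4 affine matrix (rotation plus translation) stored as rows; that representation and the helper names `apply_transform` and `place_copies` are assumptions for illustration, not the EMDB-SFF API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix3x4 = Sequence[Sequence[float]]  # assumed layout: 3 rows of 4 values


def apply_transform(matrix: Matrix3x4, point: Point) -> Point:
    """Apply a 3x4 affine transform (rotation part plus translation column)."""
    if len(matrix) != 3 or any(len(row) != 4 for row in matrix):
        raise ValueError("expected a 3x4 matrix")
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)


def place_copies(transforms: Sequence[Matrix3x4], point: Point) -> List[Point]:
    """Map one point of the average into the tomogram, once per transform."""
    return [apply_transform(m, point) for m in transforms]


# Two hypothetical placements of the same average.
shift_only = [[1, 0, 0, 10.0], [0, 1, 0, 20.0], [0, 0, 1, 30.0]]
rotate_z_90 = [[0, -1, 0, 0.0], [1, 0, 0, 0.0], [0, 0, 1, 5.0]]
placements = place_copies([shift_only, rotate_z_90], (1.0, 0.0, 0.0))
```

Storing the geometry once and referencing several transforms avoids duplicating the averaged volume for every occurrence in the tomogram.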
Model
EMDB-SFF files have the following features:
- Segmentation metadata:
  - name
  - version (of schema)
  - details (free-form text)
  - global external references, e.g. specimen scientific identifier
  - bounding box
  - primary descriptor contained, i.e. one of ‘three_d_volume’, ‘mesh_list’, or ‘shape_primitive_list’ (see schema documentation)
  - list of software used to create the segmentation (name, version, processing details)
  - list of transforms referenced by segments, e.g. transform to place the sub-tomogram average in the tomogram
- Hierarchical ordering of segments through the use of segment IDs and parent IDs;
- Four geometrical representations of segments (volumes, contours, meshes, shapes);
- Can store subtomogram averages and how they map into the parent tomogram through the use of transforms;
- List of associated external references per segment;
- List of associated complexes and macromolecules in a related EMDB entry
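The hierarchical ordering through segment IDs and parent IDs listed above can be sketched as follows. The `Segment` class and both helper functions are hypothetical and do not reflect the sfftk or sfftk-rw API; treating a parent ID of 0 as "no parent" is an assumption made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    segment_id: int
    parent_id: int  # 0 is taken to mean "no parent" (assumption)
    name: str = ""
    external_references: List[str] = field(default_factory=list)


def children_by_parent(segments: List[Segment]) -> Dict[int, List[int]]:
    """Group segment IDs under the ID of their parent."""
    tree = defaultdict(list)
    for seg in segments:
        tree[seg.parent_id].append(seg.segment_id)
    return dict(tree)


def ancestors(segment_id: int, segments: List[Segment]) -> List[int]:
    """Walk parent IDs upwards, nearest ancestor first."""
    parents = {seg.segment_id: seg.parent_id for seg in segments}
    chain: List[int] = []
    current = parents[segment_id]
    while current in parents:
        if current in chain:
            raise ValueError("cycle in parent IDs")
        chain.append(current)
        current = parents[current]
    return chain


segments = [
    Segment(1, 0, "cell"),
    Segment(2, 1, "mitochondrion"),
    Segment(3, 2, "crista"),
    Segment(4, 1, "nucleus"),
]
```

A flat list with parent pointers keeps the file format simple while still letting a reader rebuild the tree in a single pass.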
Each segment in a segmentation can consist of two types of descriptors:
- textual descriptors;
- geometric descriptors.
Textual descriptors consist of either free-form text or standardised terms. Standardised terms should be taken from a published ontology or list of identifiers.
Geometric descriptors can take one or more of the following representations:
- ‘three_d_volume’ for 3D volumes;
- ‘mesh_list’ for lists of meshes each of which consists of a set of vertices and polygons;
- lists of shape primitives (ellipsoid, cuboid, cone, cylinder).
Documentation
Download
The current schema (version 0.8.0.dev1) is available here.
Documentation
Complete documentation of the schema is available here.
Auxiliary Tools
sfftk-rw
sfftk-rw is a Python toolkit for reading and writing EMDB-SFF files only. It is part of a family of tools designed to work with EMDB-SFF files.
sfftk-rw has the following utilities:
- convert - interconvert between XML, HDF5 and JSON file formats of the EMDB-SFF data model;
- view - view a file summary
The full documentation is available at readthedocs.
Download
The latest version (0.7.1) runs only on Python 3 and may be installed using pip install sfftk-rw. Alternatively, the source code may be obtained from GitHub.
sfftk
sfftk provides a shell command and a Python API to process EMDB-SFF files.
The following utilities are available using sfftk:
- convert - Conversion of application-specific segmentation file formats to EMDB-SFF. Currently, sfftk supports the following formats:
  - AmiraMesh (.am)
  - Amira HyperSurface (.surf)
  - Segger (.seg)
  - EMDB Map masks (.map)
  - Stereolithography (.stl)
  - IMOD (.mod)
- notes - Annotation of EMDB-SFF files.
- view - Brief summaries of segmentation files.
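Before handing a file to the converter, a script might check whether its extension is among the input formats listed above. The sketch below does this with a plain lookup table; `guess_input_format` is a hypothetical helper and not part of sfftk, which performs its own format detection.

```python
from pathlib import Path
from typing import Optional

# Extensions and format names exactly as listed above.
SUPPORTED_INPUT_FORMATS = {
    ".am": "AmiraMesh",
    ".surf": "Amira HyperSurface",
    ".seg": "Segger",
    ".map": "EMDB Map masks",
    ".stl": "Stereolithography",
    ".mod": "IMOD",
}


def guess_input_format(filename: str) -> Optional[str]:
    """Return the format name for a filename, or None if it is not listed."""
    return SUPPORTED_INPUT_FORMATS.get(Path(filename).suffix.lower())
```

An extension check is only a first filter: a file with a listed extension can still fail conversion if its contents do not match the format.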
Read the full documentation here.
Download
The latest development version (version 0.5.5.dev1) of sfftk may be downloaded/installed from PyPI or the source may be obtained from GitHub.
Publications
- Patwardhan, Ardan, Robert Brandt, Sarah J. Butcher, Lucy Collinson, David Gault, Kay Grünewald, Corey Hecksel et al. Building bridges between cellular and molecular structural biology. eLife 6 (2017).
- Patwardhan, Ardan, Alun Ashton, Robert Brandt, Sarah Butcher, Raffaella Carzaniga, Wah Chiu, Lucy Collinson et al. A 3D cellular context for the macromolecular world. Nature Structural & Molecular Biology 21, no. 10 (2014): 841-845.
- Patwardhan, Ardan, José-Maria Carazo, Bridget Carragher, Richard Henderson, J. Bernard Heymann, Emma Hill, Grant J. Jensen et al. Data management challenges in three-dimensional EM. Nature Structural & Molecular Biology 19, no. 12 (2012): 1203-1207.