Segmentation data model prototype

Segmentation is the decomposition of 3D volumes into regions that can be associated with defined objects. Following several consultations with the EM community (Patwardhan et al., 2012; Patwardhan et al., 2014; Patwardhan et al., 2017), the EMDB developed a prototype to explore supporting the deposition of volume segmentations with structured biological annotation, defined here as the association of data with identifiers (e.g., accession codes from UniProt) and ontologies taken from well-established bioinformatics resources. To our knowledge, none of the segmentation formats widely used in electron microscopy and related fields currently supports structured biological annotation. Third-party use of segmentations is further impeded by the large number of segmentation file formats and their lack of interoperability. EMDB therefore proposed an open segmentation file format, EMDB-SFF, to capture basic segmentation data from application-specific segmentation file formats and to provide the means for structured biological annotation. In this way, a file format like EMDB-SFF could not only enable the deposition of segmentations but also act as an interchange format between applications and facilitate the analysis of 3D reconstructions. Furthermore, EMDB-SFF prototypes the description of multiple transforms for a segment, which allows a segment to describe the placement of a sub-tomogram average onto a tomographic reconstruction.
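To illustrate the idea of several transforms per segment, the following is a minimal sketch of how one sub-tomogram average could be placed at two positions in a tomogram. It assumes each transform is a 3x4 matrix (rotation plus translation) written as a row-major, whitespace-separated string; the function names, placement names and matrix values are hypothetical and are not taken from the schema.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Matrix = List[List[float]]


def parse_transform(data: str, rows: int = 3, cols: int = 4) -> Matrix:
    """Parse a whitespace-separated, row-major matrix string into nested lists."""
    values = [float(v) for v in data.split()]
    if len(values) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(values)}")
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


def apply_transform(matrix: Matrix, point: Point) -> Point:
    """Apply a 3x4 [rotation | translation] matrix to a single 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


# Two hypothetical placements of the same sub-tomogram average.
placements = {
    # translation only
    "copy_1": parse_transform("1 0 0 10  0 1 0 20  0 0 1 30"),
    # 90 degree rotation about the z axis, followed by a translation
    "copy_2": parse_transform("0 -1 0 50  1 0 0 60  0 0 1 70"),
}

corner = (1.0, 2.0, 3.0)  # a point in the average's own coordinate frame
placed = {name: apply_transform(m, corner) for name, m in placements.items()}
```

The geometry of the average is stored once; only the list of transforms grows with the number of copies found in the tomogram.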

Model

EMDB-SFF files have the following features:

  • Segmentation metadata:
    • name
    • version (of schema)
    • details (free-form text)
    • global external references, e.g. specimen scientific identifier
    • bounding box
    • primary descriptor contained, i.e. one of ‘three_d_volume’, ‘mesh_list’, or ‘shape_primitive_list’ (see schema documentation)
    • list of software used to create the segmentation (name, version, processing details)
    • list of transforms referenced by segments, e.g. the transform that places a sub-tomogram average in the tomogram
  • Hierarchical ordering of segments through the use of segment IDs and parent IDs;
  • Four geometrical representations of segments (volumes, contours, meshes, shapes);
  • Storage of sub-tomogram averages and of how they map into the parent tomogram through the use of transforms;
  • List of associated external references per segment;
  • List of associated complexes and macromolecules in a related EMDB entry.
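The hierarchical ordering through segment IDs and parent IDs can be sketched as follows. This is an illustration only: the class and field names are hypothetical, and the convention that a parent ID of 0 marks a top-level segment is an assumption rather than something stated here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    segment_id: int
    parent_id: int  # assumption: 0 means "no parent" (top-level segment)
    name: str


def children_by_parent(segments: List[Segment]) -> Dict[int, List[int]]:
    """Group segment IDs under the ID of their parent."""
    tree: Dict[int, List[int]] = defaultdict(list)
    for seg in segments:
        tree[seg.parent_id].append(seg.segment_id)
    return dict(tree)


def lineage(segments: List[Segment], segment_id: int) -> List[str]:
    """Return names from the top-level ancestor down to the given segment."""
    by_id = {seg.segment_id: seg for seg in segments}
    names: List[str] = []
    seen = set()  # guards against accidental cycles in the parent links
    current = by_id.get(segment_id)
    while current is not None and current.segment_id not in seen:
        seen.add(current.segment_id)
        names.append(current.name)
        current = by_id.get(current.parent_id)
    return list(reversed(names))


# Hypothetical example segments.
segments = [
    Segment(1, 0, "cell"),
    Segment(2, 1, "mitochondrion"),
    Segment(3, 2, "crista"),
    Segment(4, 1, "nucleus"),
]

tree = children_by_parent(segments)
path = lineage(segments, 3)
```

Storing only a parent ID per segment keeps the file flat while still allowing a viewer to rebuild the full tree.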

Each segment in a segmentation can consist of two types of descriptors:

  • textual descriptors;
  • geometric descriptors.

Textual descriptors consist of either free-form text or standardised terms. Standardised terms should be drawn from a published ontology or list of identifiers.
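As a rough illustration of the difference between free-form text and standardised terms, the sketch below pairs a free-text description with a reference into a published ontology. The class and field names are hypothetical and are not the schema's own; the Gene Ontology term GO:0005739 (mitochondrion) is used purely as an example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ExternalReference:
    resource: str   # name of the ontology or identifier list
    accession: str  # identifier within that resource


@dataclass
class TextualDescriptor:
    name: str
    description: str = ""  # free-form text
    external_references: List[ExternalReference] = field(default_factory=list)

    def is_structured(self) -> bool:
        """True when at least one standardised term is attached."""
        return bool(self.external_references)


# A descriptor carrying both free text and a standardised term.
descriptor = TextualDescriptor(
    name="mitochondrion",
    description="Manually traced from the tomogram.",
    external_references=[ExternalReference("GO", "GO:0005739")],
)

# A descriptor with free text only.
free_text_only = TextualDescriptor(name="unassigned density")
```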

Geometric descriptors can take one or more of the following representations:

  • ‘three_d_volume’ for 3D volumes;
  • ‘mesh_list’ for lists of meshes each of which consists of a set of vertices and polygons;
  • ‘shape_primitive_list’ for lists of shape primitives (ellipsoid, cuboid, cone, cylinder).
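To make the mesh representation concrete, here is a minimal sketch of a mesh as a set of vertices plus polygons that index into it. The class, its methods and the tetrahedron are hypothetical illustrations and do not reproduce the schema's actual encoding.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Polygon = Tuple[int, ...]  # indices into the vertex list


@dataclass
class Mesh:
    vertices: List[Vertex]
    polygons: List[Polygon]

    def validate(self) -> None:
        """Check that every polygon has at least 3 corners and valid indices."""
        count = len(self.vertices)
        for polygon in self.polygons:
            if len(polygon) < 3:
                raise ValueError(f"polygon {polygon} has fewer than 3 vertices")
            if any(i < 0 or i >= count for i in polygon):
                raise ValueError(f"polygon {polygon} references a missing vertex")

    def bounding_box(self) -> Tuple[Vertex, Vertex]:
        """Return the (minimum corner, maximum corner) enclosing all vertices."""
        xs, ys, zs = zip(*self.vertices)
        return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


tetrahedron = Mesh(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    polygons=[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)],
)
tetrahedron.validate()

# A segment's mesh descriptor is then simply a list of such meshes.
mesh_list = [tetrahedron]
box = tetrahedron.bounding_box()
```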

Documentation
Download

The current schema (version 0.8.0.dev1) is available here.

Documentation

Complete documentation of the schema is available here.

Auxiliary Tools
sfftk-rw

sfftk-rw is a Python toolkit for reading and writing EMDB-SFF files only. It is part of a family of tools designed to work with EMDB-SFF files.

sfftk-rw has the following utilities:

  • convert - interconvert between XML, HDF5 and JSON file formats of the EMDB-SFF data model;
  • view - display a summary of a file.

The full documentation is available at readthedocs.

Download

The latest version (0.7.1) runs only on Python 3 and may be installed using pip install sfftk-rw. Alternatively, the source code may be obtained from GitHub.
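After installation, one way to confirm which version is present is to query the package metadata from Python. This is a minimal sketch using only the standard library (Python 3.8 or later); it assumes the distribution name is the same as the one passed to pip, and the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "sfftk-rw") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found is not None else "sfftk-rw is not installed")
```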

sfftk

sfftk provides a shell command and a Python API to process EMDB-SFF files.

The following utilities are available using sfftk:

  • convert - Conversion of application-specific segmentation file formats to EMDB-SFF. Currently, sfftk supports the following formats:
    • AmiraMesh (.am)
    • Amira HyperSurface (.surf)
    • Segger (.seg)
    • EMDB Map masks (.map)
    • Stereolithography (.stl)
    • IMOD (.mod)
  • notes - Annotation of EMDB-SFF files.
  • view - Brief summaries of segmentation files.
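The list of supported input formats can be captured in a small lookup table, for example to check a batch of files before attempting conversion. The sketch below relies on the file extension alone and uses hypothetical names; it says nothing about how sfftk itself recognises formats.

```python
from pathlib import Path
from typing import Optional

# Extensions and format names as listed for the convert utility.
SUPPORTED_INPUTS = {
    ".am": "AmiraMesh",
    ".surf": "Amira HyperSurface",
    ".seg": "Segger",
    ".map": "EMDB Map masks",
    ".stl": "Stereolithography",
    ".mod": "IMOD",
}


def source_format(filename: str) -> Optional[str]:
    """Return the format name for a supported extension, otherwise None."""
    return SUPPORTED_INPUTS.get(Path(filename).suffix.lower())


# Hypothetical file names used only to exercise the lookup.
candidates = ["cell.SEG", "emd_1234.map", "notes.txt"]
convertible = [name for name in candidates if source_format(name) is not None]
```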

Read the full documentation here.

Download

The latest development version (version 0.5.5.dev1) of sfftk may be downloaded/installed from PyPI or the source may be obtained from GitHub.