
EMPIAR Policies and Processing Procedures

This is a draft and comments from the community are invited.

Authored by the EMPIAR team (empiar-help@ebi.ac.uk)

Version: 0.3 (draft)

Date: 2019/04/12

Valid from: 2019/07/01

Valid until: superseded by a revised document

Preface


This document details the policies and procedures governing the Electron Microscopy Public Image Archive (EMPIAR), a public repository for raw two-dimensional image data from 3D bioimaging experiments as well as certain 3D bioimaging datasets. The repository organises its data into EMPIAR entries, and this document covers the policies regarding what constitutes an EMPIAR entry, the requirements for entry depositions into the repository, the underlying entry data processing and ultimately data provision to the public. Any policy issues that may arise should refer to the date and version provided above, and be raised by contacting the EMPIAR team at empiar-help@ebi.ac.uk. These queries could pertain to issues not currently covered by this document, inconsistencies inside this document, requests for clarification, suggestions for improvement, and discussions of potential exceptions.

Table of contents


  I. Introduction
  II. Definitions
    A. Abbreviations
    B. Terms
  1. Entry requirements
    1.1 Entry acceptance
    1.2 Entry association
    1.3 Entry auxiliary data
  2. Entry ownership and authorship
    2.1 User roles and permissions
    2.2 Deposition owner and authors
  3. Deposition requirements
    3.1 Deposition acceptance
    3.2 Data file requirements
  4. Deposition and accession code assignment
    4.1 EMPIAR deposition code
    4.2 EMPIAR accession code
  5. Processing procedures
    5.1 Deposition process
    5.2 Processing procedure
    5.3 Release process
      5.3.1 Release instruction
      5.3.2 Status codes
      5.3.3 Procedure for release
  6. Modification of entries
    6.1 Entry changes before release
    6.2 Entry changes after release
  7. Removal of entries
    7.1 Entry obsoletion
    7.2 Entry withdrawal
    7.3 Entry removal in unusual circumstances

I. Introduction


The Electron Microscopy Public Image Archive (EMPIAR) is a public resource for raw, 2D images from molecular and cellular 3D bioimaging experiments, as well as some 3D datasets, obtained using transmission or scanning electron microscopy and electron or soft X-ray tomography. All data in EMPIAR is freely and publicly available to the global community under the CC0 license (https://creativecommons.org/share-your-work/public-domain/cc0/).

As part of EMBL-EBI, EMPIAR is committed to placing all primary and derived data in the public domain. The stated mission of EMBL-EBI includes the provision of freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress.

This document outlines current policies governing EMPIAR deposition, processing and release. The EMPIAR staff will continue to update annotation practices in line with evolving structure determination techniques and annotation methods.

II. Definitions


A. Abbreviations

Abbreviation | Meaning
3DSEM | Three-Dimensional Scanning Electron Microscopy, e.g., Focussed Ion Beam (FIB) SEM and Serial Block Face (SBF) SEM
CLEM | Correlative Light and Electron Microscopy
CLXM | Correlative Light and X-ray Microscopy
EM | Electron Microscopy, usually understood to be transmission cryo-electron microscopy or tomography
EMDB | Electron Microscopy Data Bank
EMPIAR | Electron Microscopy Public Image Archive
PDB | Protein Data Bank
PDB-Dev | A prototype deposition and archiving system for structural models obtained through integrative/hybrid (I/H) methods, together with the associated experimental data and metadata
PI | Principal Investigator
SXT | Soft X-ray Tomography

B. Terms

Term | Description
Owner | The (scientific) owner of a deposition must be a PI who is scientifically responsible for the study that generated the data. More than one PI may be designated as an owner. Owners do not need to have an account on the EMPIAR deposition system (although they may).
Depositor | The person who holds the EMPIAR account associated with a specific deposition and who has permission to modify it. The depositor may also be an owner, but need not be. Each deposition has exactly one depositor.
Author | Any person, designated by the owner(s), who has contributed in any way to the data in an EMPIAR entry. Authors do not need to have an account on the EMPIAR deposition system (although they may).
Public (EMPIAR) archive | The authoritative public copy of the EMPIAR archive, maintained by and at EMBL-EBI, from which any and all public EMPIAR files can be downloaded. There may be official mirror sites of the public archive.
Mirror (EMPIAR) site | An official replica of the main public (EMPIAR) archive, established by agreement. A mirror site may or may not distribute all of the data. Example: https://empiar.pdbj.org/
Operators of the (EMPIAR) archive | EMBL-EBI, the organisation that acts as keeper of the public archive. The operators can be contacted by email at empiar-help@ebi.ac.uk.

1. Entry requirements


1.1 Entry acceptance. An EMPIAR entry is a set of logically related microscopy files. The main unifying characteristic of these files can be a publication, an EMDB map that has been obtained from them, or a specific experiment during which they were obtained. A single EMPIAR entry may contain multiple image sets, so it can refer to, for example, multiple EMDB entries or multiple publications. Alternatively, an EMPIAR entry may be tied to a single publication or EMDB entry.

An entry is composed of one or more image sets. Each image set is one complete set of images used, for example, to obtain one of the associated EMDB entries or to make a single 3D reconstruction from an SBF-SEM experiment. Examples of an image set include a collection of multi-frame micrographs, a stack of particle images, and a tomogram obtained with SXT. Each image set should reside in a separate directory.
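The following minimal Python sketch shows this entry/image-set structure. The class and field names (`Entry`, `ImageSet`, `directory`, `structure_problems`) are hypothetical and are not part of any EMPIAR software. The sketch checks only the two structural rules stated above: an entry has at least one image set, and each image set resides in its own directory.

```python
from dataclasses import dataclass, field


@dataclass
class ImageSet:
    """One complete set of images, e.g. multi-frame micrographs or an SXT tomogram."""
    name: str
    directory: str  # hypothetical field: upload directory holding this image set


@dataclass
class Entry:
    """An EMPIAR entry made up of one or more image sets."""
    title: str
    image_sets: list[ImageSet] = field(default_factory=list)

    def structure_problems(self) -> list[str]:
        """Return a list of problems; an empty list means the layout looks acceptable."""
        problems = []
        if not self.image_sets:
            problems.append("an entry must contain at least one image set")
        owner_of_dir: dict[str, str] = {}
        for image_set in self.image_sets:
            # Treat "data/a" and "data/a/" as the same directory.
            directory = image_set.directory.rstrip("/")
            if directory in owner_of_dir:
                problems.append(
                    f"image sets {owner_of_dir[directory]!r} and {image_set.name!r} "
                    f"share directory {directory!r}"
                )
            else:
                owner_of_dir[directory] = image_set.name
        return problems
```

For example, an entry with image sets in `data/micrographs` and `data/particles` reports no problems. Two image sets that both point at `data/micrographs/` are flagged.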

1.2 Entry association. All EMPIAR entries (with certain exceptions, see below) are required to be associated with one or more EMDB entries. “Associated” in this context means that the EMPIAR entry contains the image data used to obtain the 3D reconstruction(s) deposited as one or more EMDB entries. In such cases, depositors are encouraged to inform EMDB and PDB (as appropriate) about the EMPIAR accession.

EMPIAR will accept data that is not associated with an EMDB entry in the following cases:

  • 2D/3D data from 3D imaging modalities not covered by EMDB (e.g. 3DSEM and SXT);
  • 2D EM data used in integrative/hybrid methods, associated with a structure deposited in the PDB or PDB-Dev archive;
  • Certain reference and benchmark datasets (to be decided on a case-by-case basis)*;
  • Datasets used for certain community challenges (such as the 2015 Map Validation Challenge, see: “The first single particle analysis Map Challenge: A summary of the assessments,” J. Struct. Biol. 204 (2018), 291-300, https://doi.org/10.1016/j.jsb.2018.08.010)*

* We are keen to support community challenges and archival of reference data sets. Please contact the operators of the EMPIAR archive prior to deposition.

In cases not covered above, please contact the operators of the EMPIAR archive prior to deposition to discuss the potential suitability of EMPIAR for your data.
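The association rules can be summarised as a simple decision function. This Python sketch is illustrative only and does not reproduce EMPIAR's actual acceptance logic. The function and parameter names are hypothetical, and the modality labels are free-text assumptions. The case-by-case categories (reference, benchmark and challenge data) are reduced to a single flag meaning "already agreed with EMPIAR".

```python
from typing import Optional

# Modalities that the policy names as not covered by EMDB (examples only, not exhaustive).
NON_EMDB_MODALITIES = {"3DSEM", "SXT"}


def acceptance_route(
    emdb_ids: list[str],
    modality: str,
    integrative_structure_ids: Optional[list[str]] = None,
    agreed_exception: bool = False,
) -> tuple[bool, str]:
    """Return (acceptable_without_prior_contact, reason) for a planned deposition.

    Hypothetical parameters: `integrative_structure_ids` stands for PDB or PDB-Dev
    IDs of an integrative/hybrid structure that used the 2D EM data, and
    `agreed_exception` marks reference/benchmark or challenge data that EMPIAR
    has already agreed to accept.
    """
    if emdb_ids:
        return True, "associated with EMDB entries: " + ", ".join(emdb_ids)
    if modality in NON_EMDB_MODALITIES:
        return True, f"{modality} data from a modality not covered by EMDB"
    if modality == "EM" and integrative_structure_ids:
        return True, "2D EM data used in integrative/hybrid modelling"
    if agreed_exception:
        return True, "reference, benchmark or challenge data agreed with EMPIAR"
    return False, "contact the operators of the EMPIAR archive before depositing"
```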

1.3 Entry auxiliary data. Auxiliary data such as files containing particle coordinates, segmentations, gain references, video guides through the dataset in orthogonal planes, Blender projects, and motion correction and gain normalisation scripts may also be included as part of the entry. If in doubt, please contact the operators of the EMPIAR archive.

2. Entry ownership and authorship


2.1 User roles and permissions. The EMPIAR deposition system is user-based. Individual users sign on to the system, and can create and handle multiple depositions. Users may also invite other users to share access to a deposition with varying degrees of access privileges.

2.2 Deposition owner and authors. The PI(s) scientifically responsible for the study are designated as the owner(s) of the deposition. The owner can delegate the upload and entry of data and metadata to another person. The EMPIAR entry and the associated EMDB entry (or entries) must have the same PI/owner. The owner of the deposition is ultimately responsible for making sure that the information provided during the deposition process is correct. The owner must also ensure that all publication (citation) authors, EMDB entry authors and EMPIAR entry authors have consented to the deposition of the data in EMPIAR and to the owner acting on their behalf.

All communication from the EMPIAR annotation team regarding a deposition will be addressed to the owner(s) and the depositor. It is the owner's responsibility to pass this information on to the appropriate authors. The names and ORCID iDs* of all authors associated with the EMPIAR entry are collected and made public upon release. Additional information about the corresponding author and PI/owner (including institutional address, email and phone numbers) is collected during the deposition process but not made public; it is, however, stored to enable future communication regarding the entry. The corresponding author of the paper must be one of the authors associated with the deposition, but does not have to be the owner of the deposition. The owner(s) (provided they have an account) and the depositor can invite an anonymous reviewer to inspect an EMPIAR deposition.

* https://orcid.org/; ORCID iDs are persistent, unique digital identifiers for researchers
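The roles in sections 2.1 and 2.2 can be pictured as a small access model. This document does not specify EMPIAR's real privilege levels, so the Python sketch below is hypothetical. The "view"/"edit" levels are illustrative, as is the assumption that owners with accounts can view the deposition. The sketch does encode two stated rules. First, the single depositor can modify the deposition. Second, only the depositor or an owner with an account can invite an anonymous reviewer.

```python
from dataclasses import dataclass, field


@dataclass
class DepositionAccess:
    """Hypothetical access model; privilege names "view"/"edit" are illustrative only."""
    depositor: str                                          # the single depositor account
    owner_accounts: set[str] = field(default_factory=set)  # owners who have accounts (optional)
    shared: dict[str, str] = field(default_factory=dict)   # invited user -> "view" or "edit"
    reviewers: list[str] = field(default_factory=list)     # anonymous reviewer invitations

    def can_edit(self, account: str) -> bool:
        return account == self.depositor or self.shared.get(account) == "edit"

    def can_view(self, account: str) -> bool:
        # Assumption: owners with accounts and all invited users can view.
        return self.can_edit(account) or account in self.owner_accounts or account in self.shared

    def invite_reviewer(self, inviter: str, reviewer_label: str) -> None:
        # Only the depositor or an owner with an account may invite an anonymous reviewer.
        if inviter != self.depositor and inviter not in self.owner_accounts:
            raise PermissionError(f"{inviter} may not invite reviewers")
        self.reviewers.append(reviewer_label)
```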

3. Deposition requirements


3.1 Deposition acceptance. We strongly recommend that the full raw data be deposited, with each image set in a separate directory. For 3D bioimaging depositions not related to an EMDB entry, we also recommend depositing the 3D reconstruction. Every EMPIAR entry must have a thumbnail image representative of the entry; the depositor is required to provide such an image (not subject to any copyright restrictions) in PNG, JPEG, TIFF or GIF format, with a minimum size of 400 x 400 pixels.
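A depositor might check a thumbnail against these requirements before uploading. The Python sketch below makes three assumptions. It expects the image dimensions to be known already (for example, read with an imaging library). It treats `.jpg`/`.jpeg` and `.tif`/`.tiff` as accepted spellings. It interprets "400 x 400 pixels" as a minimum for each side. The function name is hypothetical.

```python
from pathlib import Path

# Assumed accepted spellings of PNG, JPEG, TIFF and GIF file names.
ALLOWED_THUMBNAIL_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".gif"}
MIN_SIDE_PX = 400  # assumption: both width and height must be at least 400 px


def thumbnail_problems(filename: str, width_px: int, height_px: int) -> list[str]:
    """Return a list of problems with a proposed thumbnail; empty means it looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_THUMBNAIL_SUFFIXES:
        problems.append(f"unsupported format {suffix or '(none)'}; use PNG, JPEG, TIFF or GIF")
    if width_px < MIN_SIDE_PX or height_px < MIN_SIDE_PX:
        problems.append(f"image is {width_px} x {height_px} px; minimum is 400 x 400 px")
    return problems
```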

3.2 Data file requirements. Data files should preferably be deposited in one of the following formats:

File type | File extension(s) | Can contain
Image data
MRC individual images | .mrc | Tilt series, micrographs, picked particles
MRC stacks | .mrcs | Micrographs, picked particles
DM4 stacks | .dm4 | Micrographs, class averages
TIFF individual images | .tiff | Micrographs
SPIDER individual images | .spi | Micrographs
SPIDER stacks | .spi | Picked particles
IMAGIC stacks | .hed and .img (both must be provided) | Picked particles
Auxiliary data
EMDB-SFF | .xml | Segmentations and annotations
Amira | .am | Segmentations
VTK | .vtk | Segmentations
VTP | .vtp | Segmentations
STL | .stl | Segmentations
EMX | .emx | Electron microscopy exchange format metadata (reference)
SCIPION workflow | .json | Integration, reproducibility and analysis workflow files for Scipion, an image processing framework for obtaining 3D models of macromolecular complexes by electron microscopy
JPEG | .jpg/.jpeg | Figures, etc.
PNG | .png | Representative thumbnail, paper figures, etc.
TIFF | .tiff | Representative thumbnail, paper figures, etc.
TXT | .txt | Additional details, data-collection information, defocus parameters, etc.
SHELL | .sh | Shell scripts, e.g., for gain normalisation with MotionCor2 and similar tasks
AVI | .avi | Video guides through a dataset
MPEG | .mpg/.mpeg/.mp4 | Video guides through a dataset
BLENDER | .blend | Blender project of segmented objects
ImageJ | .ijm | ImageJ macro language scripts
OBJ | .obj | Segmented 3D objects in a plain text format with basic geometry and material support, e.g., as produced by Blender
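The formats above are preferred rather than mandatory, so a pre-upload check should flag files for attention rather than reject them. The hypothetical Python sketch below lists two kinds of file: those whose extension is not in the table, and IMAGIC stacks that lack either the `.hed` or the `.img` half. Matching on extension alone is a simplification. For example, `.spi` covers both SPIDER images and stacks, and `.img` is also used by formats other than IMAGIC.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Extensions taken from the preferred-format table above.
PREFERRED_EXTENSIONS = {
    ".mrc", ".mrcs", ".dm4", ".tiff", ".spi", ".hed", ".img",
    ".xml", ".am", ".vtk", ".vtp", ".stl", ".emx", ".json",
    ".jpg", ".jpeg", ".png", ".txt", ".sh", ".avi", ".mpg", ".mpeg", ".mp4",
    ".blend", ".ijm", ".obj",
}


def check_upload(paths: list[str]) -> list[str]:
    """Return notes about non-preferred formats and incomplete IMAGIC pairs."""
    notes = []
    imagic_parts: dict[str, set[str]] = defaultdict(set)
    for path in map(PurePosixPath, paths):
        ext = path.suffix.lower()
        if ext not in PREFERRED_EXTENSIONS:
            notes.append(f"{path}: not a preferred format")
        if ext in (".hed", ".img"):
            # Group the two IMAGIC halves by their path without extension.
            imagic_parts[str(path.with_suffix(""))].add(ext)
    for stem, parts in sorted(imagic_parts.items()):
        for missing in sorted({".hed", ".img"} - parts):
            notes.append(f"{stem}{missing}: missing IMAGIC counterpart")
    return notes
```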

4. Deposition and accession code assignment


4.1 EMPIAR deposition code. An EMPIAR deposition code is assigned as soon as a user creates a new deposition. This code is used internally by the EMPIAR deposition and annotation system and should not be used publicly.

4.2 EMPIAR accession code. A public EMPIAR accession code is assigned immediately after the user submits the deposition. The accession code is sent in an automatic email to the depositor and the owner(s). Also, anyone with permission to view the deposition will be able to see the accession code on the main deposition page. The EMPIAR accession code may be used, e.g., in publications, to refer to the entry.
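EMPIAR accession codes in public entries take the form `EMPIAR-` followed by digits (e.g., EMPIAR-10025). The Python sketch below validates and extracts such codes, for instance when checking that a manuscript cites the correct entry. The pattern of five or more digits is inferred from existing public codes and is an assumption, not a published specification. The function names are hypothetical.

```python
import re

# Assumption: "EMPIAR-" followed by five or more digits, as in existing public codes.
ACCESSION_RE = re.compile(r"EMPIAR-\d{5,}")
ACCESSION_IN_TEXT_RE = re.compile(r"\bEMPIAR-\d{5,}\b")


def is_accession_code(text: str) -> bool:
    """True if the whole string (ignoring surrounding whitespace) is an accession code."""
    return ACCESSION_RE.fullmatch(text.strip()) is not None


def find_accession_codes(text: str) -> list[str]:
    """Return all accession codes mentioned in a piece of text, in order."""
    return ACCESSION_IN_TEXT_RE.findall(text)
```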

5. Processing procedures


5.1 Deposition process. Data deposition to EMPIAR consists of the following steps:

  • General metadata and information about the deposition (EMDB accession ID, title, authors, etc.) are provided by the depositor.
  • General metadata is validated by the system.
  • Data consisting of image sets in separate directories is uploaded by the depositor.
  • A description of each image set (image set metadata) is provided by the depositor and linked to the corresponding image set directory.
  • Image set metadata is validated by the system.
  • The depositor submits the entry for further processing by EMPIAR staff. Note: the depositor will not be able to submit the entry unless all validation steps have been successful.
  • Immediately upon submission, the depositor and the owner of the entry receive notification of successful submission and are given the EMPIAR accession code that can be used in a publication to link to the EMPIAR entry.
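The Python sketch below expresses the ordering of these steps, in particular the rule that submission is blocked until every validation step has succeeded. The `Deposition` record and the `blocking_issues`/`submit` functions are hypothetical illustrations, not the EMPIAR deposition system's API. Setting `locked` on submission reflects section 5.2.

```python
from dataclasses import dataclass, field


@dataclass
class Deposition:
    """Hypothetical record of a deposition's progress through the deposition steps."""
    general_metadata_valid: bool = False
    uploaded_dirs: set[str] = field(default_factory=set)
    # image set directory -> True if its image set metadata passed validation
    image_set_metadata: dict[str, bool] = field(default_factory=dict)
    submitted: bool = False
    locked: bool = False


def blocking_issues(dep: Deposition) -> list[str]:
    """List every validation problem that prevents submission."""
    issues = []
    if not dep.general_metadata_valid:
        issues.append("general metadata has not passed validation")
    if not dep.uploaded_dirs:
        issues.append("no image set directory has been uploaded")
    for directory in sorted(dep.uploaded_dirs):
        if directory not in dep.image_set_metadata:
            issues.append(f"{directory}: no image set metadata linked")
        elif not dep.image_set_metadata[directory]:
            issues.append(f"{directory}: image set metadata failed validation")
    for directory in sorted(set(dep.image_set_metadata) - dep.uploaded_dirs):
        issues.append(f"{directory}: metadata linked to a directory that was not uploaded")
    return issues


def submit(dep: Deposition) -> None:
    """Submit only when all validation steps have succeeded; submission locks the deposition."""
    issues = blocking_issues(dep)
    if issues:
        raise ValueError("cannot submit: " + "; ".join(issues))
    dep.submitted = True
    dep.locked = True  # the accession code would be assigned and emailed at this point
```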

5.2 Processing procedure. EMPIAR staff process the deposited data (a step also referred to as annotation or curation). This involves the following steps:

  • Checking the consistency of the deposited data and metadata with the information specified in the deposition forms (EMDB accession ID, title, authors, etc.), and with any related EMDB entries.
  • Ensuring that image set metadata is provided and correct, and that it is correctly associated with the corresponding image set directory.
  • If any major changes have to be made, then the entry is unlocked for further editing by depositors. The entire entry is checked again after re-submission.

The deposition is locked (i.e., not editable by the depositor) from the moment it is submitted. It will be unlocked again only if the processing procedure identifies changes that need to be made. Communication between the depositors and EMPIAR staff can take place within the deposition system or by regular email.
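The lock/unlock cycle described above behaves like a two-state machine. In the hypothetical Python sketch below, staff can unlock a submitted deposition only by identifying at least one required change. Re-submission locks it again and triggers a full re-check. The class and method names are illustrative assumptions.

```python
class CurationCycle:
    """Hypothetical model of the lock/unlock cycle during processing."""

    def __init__(self) -> None:
        self.locked = True  # a submitted deposition starts out locked
        self.requested_changes: list[str] = []
        self.full_checks = 1  # the whole entry is checked after the first submission

    def request_changes(self, changes: list[str]) -> None:
        """Staff unlock the entry only when they have identified changes to be made."""
        if not changes:
            raise ValueError("unlocking requires at least one identified change")
        self.requested_changes = list(changes)
        self.locked = False

    def resubmit(self) -> None:
        """The depositor re-submits; the entry is locked and checked again in full."""
        if self.locked:
            raise RuntimeError("nothing to resubmit: the deposition is still locked")
        self.locked = True
        self.requested_changes = []
        self.full_checks += 1
```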

5.3 Release process. The EMPIAR entry can progress to the release process phase only when the following conditions have been met:

  • All submitted deposition forms have been correctly completed and successfully validated by EMPIAR staff;
  • Representative images inspected by EMPIAR staff appear consistent with descriptions of the data;
  • Annotation has finished.

Once these conditions have been fulfilled, the EMPIAR entry is released to the public in accordance with the release instruction provided by the depositor (see 5.3.1 below for available options).

5.3.1 Release instruction. EMPIAR depositions are released (made available to the public) in accordance with the release instruction provided during deposition. Release instruction options are summarised in the table below. (Note that the physical release of large entries is not instantaneous. Synchronisation with mirror sites may lead to additional delays before an entry is shown on such sites.)

Release instruction | Description
REL | The release procedure will be initiated as soon as the annotation procedure is complete and the depositor has approved the entry.
EMDBPUB | Release after the associated EMDB entry has been released. If the associated EMDB entry has not been released one year after the deposition date, the EMPIAR entry will be deleted and will never be publicly released (a later release will require the data to be deposited anew). The EMPIAR accession code will not be recycled. A one-time extension of no more than 6 months will be considered if (one of) the owner(s) requests this and provides a reasonable explanation.
HPUB | Release after the primary citation for the dataset becomes available. The same procedure as for EMDBPUB applies if the publication is not available one year after the deposition date.
HOLD | Release after a specified period, not to exceed one year. This option is only available if there is no related EMDB entry or publication. A one-time extension of no more than 6 months will be considered if (one of) the owner(s) requests this and provides a reasonable explanation.
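The deadlines in this table involve simple calendar arithmetic. The Python sketch below computes the latest release date for an unreleased entry. It assumes that "one year" means twelve calendar months from the deposition date. It also assumes that the optional extension adds six further months, counted from the deposition date so that month-end clamping is applied only once. For HOLD it returns the one-year upper limit, not the specific date chosen by the depositor. Function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month (e.g. Aug 31 + 6 -> Feb 28/29)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_deadline(instruction: str, deposition_date: date, extended: bool = False) -> Optional[date]:
    """Latest date by which an unreleased entry must be released, or None if no deadline applies (REL)."""
    if instruction == "REL":
        return None
    if instruction not in {"EMDBPUB", "HPUB", "HOLD"}:
        raise ValueError(f"unknown release instruction: {instruction}")
    # Assumption: 12 months from deposition, or 18 months with the one-time 6-month extension.
    return add_months(deposition_date, 18 if extended else 12)
```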

Information about the entry will not be made public until the entry is released, except when requested by a journal considering the related manuscript for publication. In that case, the journal must provide the EMPIAR accession code, the manuscript title and the manuscript's author list for verification purposes. We will provide information about the status of the deposition prior to release only if the manuscript's author list and the EMPIAR entry's author list have at least one PI in common.
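The verification rule for journal requests can be sketched as follows. This Python example is illustrative: the dictionary keys are hypothetical. Matching PIs by normalised name is a simplification; persistent identifiers such as ORCID iDs would be more robust.

```python
def _normalise(name: str) -> str:
    """Collapse whitespace and case so that 'Jane  DOE' matches 'jane doe'."""
    return " ".join(name.split()).casefold()


def may_disclose_status(request: dict, entry: dict) -> bool:
    """Decide whether pre-release status may be shared with a journal.

    `request` holds what the journal supplied; `entry` holds the EMPIAR record.
    Both dictionaries and their keys are hypothetical.
    """
    required = ("accession_code", "manuscript_title", "authors")
    if any(not request.get(key) for key in required):
        return False  # the journal must supply all three items
    if request["accession_code"].strip() != entry["accession_code"]:
        return False
    journal_authors = {_normalise(a) for a in request["authors"]}
    entry_pis = {_normalise(p) for p in entry["pis"]}
    return not journal_authors.isdisjoint(entry_pis)  # at least one PI in common
```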

5.3.2 Status codes. Depositions can have one of a number of status codes that are described in the following table.

Status code | Description
PROC | The entry has been submitted.
REL | The entry has been publicly released.
WAIT | The entry has been reviewed, and EMPIAR staff are waiting for additional information from the depositor.
OBS | The entry has been released and then obsoleted. The data is no longer part of the active archive but is still publicly available. See section 7.1.
WDRN | The entry was withdrawn before release and has never been publicly available. See section 7.2.
UNARCH | The entry has been removed due to unusual circumstances. Depending on the case, the entry may still be publicly available. See section 7.3.
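For software that reads EMPIAR status codes, the table maps naturally onto an enumeration. In the Python sketch below, the `Status` enum and `publicly_available` helper are hypothetical. The helper returns `None` for UNARCH because the public availability of such entries depends on the case.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    """EMPIAR status codes and their meanings, as listed in the table."""
    PROC = "submitted"
    REL = "publicly released"
    WAIT = "waiting for additional information from the depositor"
    OBS = "released, then obsoleted"
    WDRN = "withdrawn before release"
    UNARCH = "removed due to unusual circumstances"


def publicly_available(status: Status) -> Optional[bool]:
    """True if the entry's data can be downloaded, False if not, None if it depends on the case."""
    if status in (Status.REL, Status.OBS):
        return True
    if status is Status.UNARCH:
        return None
    return False  # PROC, WAIT and WDRN entries have not been made public
```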

5.3.3 Procedure for release. In order to release an entry and make it publicly available, it is necessary to synchronise the uploaded data to the public archive at EMBL-EBI (and subsequently to any official mirror sites), and to create and update a number of files and database records. This involves various processes including:

  • The depositor is sent an email (copied to the owner(s)) with a request to approve the release (for all release instruction cases). If the depositor approves (or once fourteen days have elapsed since the request and no reply has been received), then the entry release is initiated.
  • Header files are created in the deposition upload directory.
  • The synchronisation process of the upload directory and the public archive is initiated (i.e., all files are copied over).
  • The representative thumbnail image(s) uploaded by the depositor or the image(s) of the corresponding EMDB entry are copied to the public archive.
  • A request is submitted to Crossref (www.crossref.org) to assign a DOI to the EMPIAR entry (the DOI will resolve to the EMPIAR web page for that entry at EMBL-EBI).
  • The depositor and the owner(s) are notified that the entry has been released.
  • Once the public archive at EMBL-EBI has been updated, any official EMPIAR mirror sites can begin their synchronisation process.
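The approval step at the start of this list is the only time-dependent part of the release procedure. The Python sketch below decides whether release may begin. It assumes that the fourteen days are counted from the moment the approval request is sent. It also assumes that release may begin as soon as exactly fourteen days have elapsed. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

APPROVAL_WINDOW = timedelta(days=14)


def release_may_start(
    request_sent: datetime,
    now: datetime,
    approved_at: Optional[datetime] = None,
) -> bool:
    """True once the depositor has approved, or 14 days after the request with no reply."""
    if approved_at is not None and approved_at <= now:
        return True
    return now - request_sent >= APPROVAL_WINDOW
```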

6. Modification of entries


6.1 Entry changes before release. Before public release, most issues can be resolved by communicating with the depositor via the deposition system. To delete, rename or move uploaded files, the depositor must contact the operators of the EMPIAR archive. In some cases after submission (which locks the deposition against further modification), additional data or information may be required from the depositor, making it necessary to unlock the deposition. This option is used sparingly to avoid creating inconsistencies in the metadata.

6.2 Entry changes after release. In some cases, it may be necessary to modify an entry after release. This may apply to both the data and the metadata. A common example of a metadata change is the addition of publication details. Occasionally, additional data files may need to be added or unintentionally uploaded files may need to be removed. Such changes should be discussed with the operators of the EMPIAR archive and should be justified by the depositor. Changes after release should be made only when strictly necessary. Depending on the nature of the changes, it may take some time for them to be reflected in the public archive and on any official mirror sites. A version history is maintained and distributed in the public archive.

7. Removal of entries


7.1 Entry obsoletion. An EMPIAR entry can only be obsoleted after it has been released. Obsoleting an entry changes the status code of the deposition to OBS and moves its files out of the active part of the public archive into a separate area of the public archive (called the "obsolete archive"). Note that the entry will still be publicly accessible. Obsoletion may be requested by the owner(s) or depositor of an entry and must be justified. A confirmation email will be sent to the depositor and owner(s).

7.2 Entry withdrawal. If an EMPIAR entry has not yet been released, it can be withdrawn either because the depositor and owner(s) request it before release or because the hold period plus any extension has lapsed. An entry that is withdrawn will never be released and there will be no public record of the deposition ever having been made. In internal records, the status code will be changed to WDRN.
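The difference between obsoletion and withdrawal depends on the entry's status when removal is requested. The hypothetical Python sketch below encodes that rule, assuming PROC and WAIT are the only pre-release status codes. Removal in unusual circumstances (UNARCH) is decided case by case and is not modelled.

```python
# Assumption: PROC and WAIT are the only pre-release status codes.
PRE_RELEASE = {"PROC", "WAIT"}


def obsolete(status: str) -> str:
    """Obsoletion (section 7.1) is only possible for a released entry."""
    if status != "REL":
        raise ValueError(f"only released (REL) entries can be obsoleted, not {status}")
    return "OBS"


def withdraw(status: str) -> str:
    """Withdrawal (section 7.2) is only possible before release."""
    if status not in PRE_RELEASE:
        raise ValueError(f"only unreleased entries can be withdrawn, not {status}")
    return "WDRN"
```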

7.3 Entry removal in unusual circumstances*. Circumstances may arise in which the integrity, correctness, ownership or provenance of data is called into question. In such unusual circumstances, the operators of the EMPIAR archive may decide to make an entry, in whole or in part, obsolete (moved out of the active archive, but still publicly accessible) or to remove it entirely from the public record. Examples of cases in which this might occur are:

  • The publication describing a dataset is retracted by (some of) the author(s), their home institute, or the journal in which it was published, and the retractor(s) request that the corresponding data/entries in EMPIAR are removed from the scientific record as well.
  • An official investigative body (for example, the Office of Research Integrity in the USA) recommends that data or entries in EMPIAR should be removed from the scientific record.

In all such cases, the operators of the EMPIAR archive will endeavour to ascertain if (parts of) one or more entries need to be removed and, if so, in which way (obsoletion or withdrawal). Such unusual cases will be documented on the website of the operators of the EMPIAR archive (and any sites that mirror it). The status code of affected entries will be changed to UNARCH.

* The wording of this section is preliminary. Input and views from the community on this issue are welcome.