EMPIAR deposition manual

1 Introduction

EMPIAR, the Electron Microscopy Public Image Archive, is a public resource for raw electron microscopy images related to 3DEM experiments. The EMPIAR Deposition System can be used to deposit data to EMPIAR. However, the associated 3DEM reconstructions must be deposited to EMDB before any data are deposited to EMPIAR; this is the only firm requirement. You may deposit various types of data to EMPIAR, including multi-frame micrographs, frame-averaged micrographs, particle stacks and tilt series, as well as auxiliary files describing, for example, particle selection coordinates. You can also deposit additional data to EMPIAR (and multiple datasets in the same entry), so you could, for example, deposit class averages and metadata.

2 Pre-deposition preparations

To make the deposition process run smoothly, we request that you make certain preparations before deposition:

  • Make a note of the accession code (in the form EMD-####, e.g., EMD-1001) of the related EMDB deposition. Not only is this a requirement, but the accession code can also be used to automatically fill in fields (if the EMDB entry has been released).
  • Organize your data
    • If you are planning to upload multiple datasets (e.g., micrographs and particle stacks) we recommend that you create one sub-directory for each dataset.
    • Please name your subdirectories so that it is easy to understand the organization. For example "micrographs" for micrographs and "particles" for particles.
    • Having more than 10000 files in a single directory tends to slow down access considerably. In this case, we recommend that you sub-divide the directory into subdirectories with no more than 10000 files each.
  • Make a note of the details describing each dataset that you will be asked for during the deposition process. These include:
    • Are the images processed or raw?
    • Are they multi-frame images? If so, which frames have been used?
    • Image format
    • Number of images (or tilt series)
    • Image width and height
    • Pixel size
    • Pixel type (unsigned byte, byte, 32-bit float, etc.)
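
If a directory does exceed the 10000-file limit, the split can be scripted. The Python sketch below is one way to do it; it is not an EMPIAR tool. It assumes the files sit directly in one flat directory and uses hypothetical subdirectory names (part_000, part_001, ...), so please check the result before uploading.

```python
from pathlib import Path


def split_directory(directory: str, max_files: int = 10000,
                    prefix: str = "part") -> list[Path]:
    """Move the regular files in `directory` into subdirectories
    prefix_000, prefix_001, ... holding at most `max_files` files each.

    Returns the subdirectories created. The naming scheme is a
    hypothetical choice; any descriptive names work equally well.
    """
    root = Path(directory)
    # Sort by name so that consecutively numbered files stay together.
    files = sorted(p for p in root.iterdir() if p.is_file())
    created = []
    for start in range(0, len(files), max_files):
        subdir = root / f"{prefix}_{start // max_files:03d}"
        subdir.mkdir()  # raises FileExistsError rather than merging into an existing directory
        for f in files[start:start + max_files]:
            f.rename(subdir / f.name)
        created.append(subdir)
    return created
```

For example, `split_directory("micrographs")` would move the files of a flat micrographs/ directory into micrographs/part_000/, micrographs/part_001/ and so on.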

3 Data transfer technologies

To upload data to EMPIAR, we provide two alternatives, Aspera (http://asperasoft.com) and Globus (https://www.globus.org), both of which can handle the large data volumes associated with EMPIAR efficiently and robustly. Both technologies require you to install some software on your machine but are free at the point of use.

3.1 Aspera

An easy way to check for and install the Aspera plugins is to go to the EMPIAR website. If Aspera is not installed, the website will prompt you and guide you through installing the relevant plugins, Figure 1.

Figure 1 Follow the "Aspera Connect" link to install the relevant plugins.

Once this is completed, you can test that Aspera is working by trying to download one of the datasets, Figure 2 (you can cancel the download once the transfer has started, Figure 3).

Figure 2 Click on the icon to start downloading the dataset.

Figure 3 Press "Abort" to cancel the download.

Aspera uses UDP-based transfer technology. Some institutes block the UDP port it needs (port 33001) by default, and it may not be possible to get the port opened. If this is the case for you, we recommend that you use Globus, which relies on GridFTP.

3.2 Globus

Please go to https://www.globus.org and register (this is free), Figure 4.

Figure 4 Globus is a free service for high performance data transfers.

Once you are logged in to Globus, click on "Quick Links" at the top and then on "Transfer Files", Figure 5.

Figure 5 Select "Transfer Files" from the "Quick Links" menu to go to a page that will allow you to select and transfer files and directories.

Before you can transfer files, you will need to make your computer (where the files are stored) a Globus endpoint. This can be done for free: the Transfer Files page has a link to "Globus Connect Personal", which sets up your computer as an endpoint. To test a download, try using the ebi#pub endpoint with the directory set to /pub/databases/empiar/archive/, select an entry and see whether you can transfer it to your computer, Figure 6.

Figure 6 Set the endpoint and path and press "Go" to see all the entry directories present in the EMPIAR archive.

4 User accounts and deposition landing page

You need a user account to use the deposition system; to register please proceed to the registration page, Figure 7.

Figure 7 Registration page for the deposition system.

Once logged in, the "Edit profile" option in the left menu allows you to update your profile and change your password, Figure 8.

Figure 8 Edit profile to change your details or password.

We recommend that you keep your profile up-to-date as you can use it to automatically fill in form fields in a deposition.

By default, when you log in you will be taken to the landing page, Figure 9, which presents the option to create new depositions ("Create a new deposition") and a table with depositions that you can access.

Figure 9 Landing page for the EMPIAR deposition system.

One user can create multiple depositions, and multiple users can share access to one deposition. For any deposition, only one user is considered the "owner". The owner can grant access rights for the deposition to other users (view only; view and edit; or view, edit and submit) or transfer ownership to another user by following the "Change ownership/grant rights" link in the depositions table.

5 Deposition process

5.1 Overview

The deposition process consists of three parts, Figure 10:

  1. providing the general metadata about the deposition: EMDB accession code, title, authors, etc.;
  2. uploading the data — the transfer can take some time, so the next step most likely would not be undertaken immediately;
  3. associating the uploaded data with the corresponding image sets — that is, identifying the image sets present and describing them.
Figure 10 Navigate between the three parts of the deposition using the left menu.

Once these steps have been completed, the deposition can be submitted. This will lock the deposition (make it uneditable) while it is being checked by the EMPIAR annotation team. They will communicate with the user regarding any issues and may choose to unlock the deposition if complementary details or data are required. Once this process is completed, the entry will be released to the public following the instructions provided (see below for options).

5.2 Form basics

5.2.1 Deposition locking

As multiple users can work on the same deposition and more than one user can have edit rights, a locking mechanism has been implemented to prevent simultaneous editing by multiple users.

Figure 11 The editing lock on a deposition will automatically expire after 30 minutes.

Whenever you open a form page, the whole deposition becomes locked to you for 30 minutes and you have exclusive rights to edit it, Figure 11. It is possible to release the lock before the expiration time by closing all pages or by pressing the "Release Lock" button.

5.2.2 "Save", "Save + validation", "Submit entry" and the traffic light system

Changes made to a form will be lost unless they are saved by pressing the "Save" or "Save + validation" buttons. The former is for a temporary save of the page in case it is not possible to fill in all the mandatory information on the page in one go. However, to proceed with the submission it is necessary to have the information on the page validated by our system with "Save + validation", Figure 12.

Figure 12 "Save" saves the form without checking it. "Save + validation" also performs a validation check. The "Submit entry" button appears when all the forms have been validated.

The state of each page is shown in the left-hand menu, Figure 13. When a page is first opened, the circle next to its link is empty; when the page is saved, the circle turns yellow; when it is validated with errors, the circle turns red; and when it passes validation successfully, the circle turns green.

Figure 13 A traffic light system is employed in the left menu to indicate the validation status of deposition forms.

When all forms have been filled out and validated successfully, a "Submit entry" button will appear next to the "Save + validation" button. Please press it to send the deposition for review by the EMPIAR annotation team. From this stage onwards, the deposition will be locked from further editing unless an annotator requires you to fill in or change any of the information.

5.2.3 Mandatory fields and the "N/A" button

Mandatory fields are marked by a red asterisk "*". You will also find many fields that have an "N/A" (not available/not applicable) button next to them, Figure 14. Not all of these fields are mandatory, but we expect you to at least press "N/A" to confirm explicitly that the requested information cannot be provided. Pressing "N/A" will automatically erase any existing information in the form field and fill it with the special marker for N/A information.

Figure 14 Some fields display an "N/A" button next to them. If you do not have a meaningful value for such a field, you must press this button to specify that the value is not available.

5.2.4 Form field help and examples

Most form fields have a question mark symbol "?" next to them and an example value below them, Figure 15. Hovering over the question mark symbol will bring up a pop-up box with help.

Figure 15 Example values are shown below the fields and help can be accessed by hovering over the help icon.

5.3 Deposition overview page

5.3.1 Deposition image

This image will be used to represent your entry on the EMPIAR website, Figure 16. If no image is provided, we will use the image provided when the associated EMDB entry was deposited. The image should be at least 400 × 400 pixels, in PNG or GIF format.

Figure 16 The depositor can upload a picture that will be used to publicly represent the entry on the EMPIAR pages. If no image is provided then an attempt is made to retrieve the image from the EMDB deposition.
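
To check a candidate image locally before uploading it, the PNG and GIF headers can be read directly. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it reads only the declared width and height (for a GIF, the logical screen size), not whether the rest of the file is valid.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_size(path: str) -> tuple[str, int, int]:
    """Return (format, width, height) from a PNG or GIF header."""
    with open(path, "rb") as fh:
        head = fh.read(24)
    # PNG: 8-byte signature, then the IHDR chunk (length, type, width, height).
    if len(head) >= 24 and head[:8] == PNG_SIGNATURE and head[12:16] == b"IHDR":
        width, height = struct.unpack(">II", head[16:24])  # big-endian uint32
        return "png", width, height
    # GIF: 6-byte version string, then logical screen width and height.
    if len(head) >= 10 and head[:6] in (b"GIF87a", b"GIF89a"):
        width, height = struct.unpack("<HH", head[6:10])  # little-endian uint16
        return "gif", width, height
    raise ValueError(f"{path}: not a PNG or GIF file")


def check_entry_image(path: str, min_side: int = 400) -> list[str]:
    """Return a list of problems; an empty list means the image looks acceptable."""
    try:
        fmt, width, height = image_size(path)
    except ValueError as err:
        return [str(err)]
    if width < min_side or height < min_side:
        return [f"{fmt} image is {width} x {height}; "
                f"expected at least {min_side} x {min_side}"]
    return []
```

An empty list from `check_entry_image("entry_image.png")` (a hypothetical filename) means the file is a PNG or GIF of at least 400 × 400 pixels.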

5.3.2 Harvesting information from the related EMDB entry and user profiles

You can specify multiple EMDB accession codes, but please note that there are separate boxes for released and unreleased entries, Figure 17.

Figure 17 You can specify related EMDB accession codes; use the "Add More" button to specify more than one.

If the entry has been released, you can copy authors from the related EMDB entry by filling in the "EMDB accession code" and then pressing the "Copy Authors from EMDB entry" button. You can also automatically populate the corresponding author and principal author fields from the user profiles of any users associated with the deposition. Once the corresponding author fields have been populated, you may also copy them over to the principal author fields.
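
If you have several accession codes to enter, a quick local check of their form can catch typos. The sketch below assumes that codes are "EMD-" followed by at least four digits (EMD-1001 is the example given in Section 2; newer EMDB codes have five digits). It checks the format only, not whether the entry exists, and the function name is hypothetical.

```python
import re

# Assumed format: "EMD-" followed by four or more digits.
EMDB_CODE = re.compile(r"EMD-(\d{4,})")


def normalise_emdb_codes(raw: str) -> list[str]:
    """Split a comma/semicolon/whitespace separated list of codes,
    upper-case them and drop duplicates, keeping the original order."""
    codes: list[str] = []
    for token in re.split(r"[\s,;]+", raw.strip()):
        if not token:
            continue
        match = EMDB_CODE.fullmatch(token.upper().replace("_", "-"))
        if match is None:
            raise ValueError(f"not an EMDB accession code: {token!r}")
        code = f"EMD-{match.group(1)}"
        if code not in codes:
            codes.append(code)
    return codes
```

For example, `normalise_emdb_codes("emd-1001, EMD-1001")` returns a single canonical code, `["EMD-1001"]`.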

5.3.3 Citation information

Please provide the information about the citation related to your deposition, Figure 18.

Figure 18 Citation information form.

You can fill in the citation information automatically using the DOI or PubMed ID.
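
The two identifier types look quite different: a DOI starts with "10." followed by a registrant code and a slash, whereas a PubMed ID is a plain integer. The following sketch, with a hypothetical function name and a deliberately loose DOI pattern, shows one way to tell them apart and to strip common prefixes before pasting the value into the form.

```python
import re

DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")  # loose check, not a full DOI grammar
PMID_PATTERN = re.compile(r"\d{1,8}")
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def classify_identifier(text: str) -> tuple[str, str]:
    """Return ("doi", value) or ("pmid", value) with any prefix removed."""
    value = text.strip()
    for prefix in DOI_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):].strip()
            break
    if DOI_PATTERN.fullmatch(value):
        return "doi", value
    if value.lower().startswith("pmid:"):
        value = value[len("pmid:"):].strip()
    if PMID_PATTERN.fullmatch(value):
        return "pmid", value
    raise ValueError(f"neither a DOI nor a PubMed ID: {text!r}")
```

For example, `classify_identifier("https://doi.org/10.1000/182")` returns `("doi", "10.1000/182")`.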

If information about the editors is not available, please press the corresponding "Remove editor" button, Figure 19.

Figure 19 Editor information form.

5.3.4 Release instruction

You have four options for how the entry should be released (made available to the public) once it has been submitted and processed successfully, Figure 20:

  1. REL: The release procedure will be initiated as soon as processing has finished. For datasets in the TB range, this could take several days.
  2. EMDBPUB: Wait until the associated EMDB entry has been released before releasing the EMPIAR entry. This requires the public release by EMDB of both the header and the map.
  3. HPUB: Wait until the primary citation for the associated EMDB entry has been published, or for a maximum of 1 year (whichever comes first).
  4. HOLD: Wait for 1 year.

Figure 20 Release instruction specifying how the entry should be released once the deposition has been successfully processed.
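
The four options can be read as a rule for the date on which the release procedure is triggered. The sketch below is an interpretation, not EMPIAR's implementation. It assumes that the one-year periods are counted from a single reference date (for example, the date processing finished), which this manual does not specify, and it ignores the time the release itself takes for large datasets. The function names are hypothetical.

```python
from datetime import date
from typing import Optional


def one_year_after(day: date) -> date:
    """Same calendar date one year later (29 February maps to 28 February)."""
    try:
        return day.replace(year=day.year + 1)
    except ValueError:
        return day.replace(year=day.year + 1, day=28)


def release_trigger_date(policy: str, reference: date,
                         emdb_release: Optional[date] = None,
                         citation_published: Optional[date] = None) -> Optional[date]:
    """Return the date on which release would be triggered, or None while
    it still depends on an event that has not happened (EMDBPUB)."""
    if policy == "REL":
        return reference  # assumed: reference = date processing finished
    if policy == "EMDBPUB":
        return emdb_release  # None until EMDB releases both header and map
    if policy == "HPUB":
        deadline = one_year_after(reference)
        if citation_published is None:
            return deadline
        return min(citation_published, deadline)  # whichever comes first
    if policy == "HOLD":
        return one_year_after(reference)
    raise ValueError(f"unknown release policy: {policy!r}")
```

Under these assumptions, an HPUB entry with a reference date of 15 January 2024 whose citation appears on 1 June 2024 would be triggered on 1 June 2024; without a citation, it would be triggered on 15 January 2025.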

Please note that although we have automated checks in place to detect, for example, when the citation is published, these checks will sometimes fail to detect such events. We therefore recommend that you contact the EMPIAR annotation team to let them know when associated entries or citations are released or published.

5.4 Upload data

You will not be able to proceed to the "Upload data" page until the "Deposition overview" page has been completed and validated. Three upload options are provided: Aspera via the command line, Aspera via the web client, or Globus. Once the upload has finished, please check your data on the "Associate image sets with data" page, as described below.

When using the web client, please keep in mind that the web browser and the operating system limit how many files can be chosen in the browser's file selection dialogue. Usually the most you can select is about 300 files at a time (the exact limit depends on the length of the filenames and paths). If you intend to upload more files than that in a single go, rather than in batches of about 300, or if your dataset is 400 GB or larger, we recommend using the Aspera command line client.
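
To decide which client to use, you can count the files and total their size locally. This Python sketch applies the two thresholds quoted above (about 300 files, and 400 GB taken here as 400 × 10^9 bytes, which is an assumption); the function names are hypothetical and the result is only a suggestion.

```python
import os


def summarise_upload(directory: str) -> tuple[int, int]:
    """Return (number of files, total size in bytes) below `directory`."""
    n_files = 0
    n_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            n_files += 1
            n_bytes += os.path.getsize(os.path.join(dirpath, name))
    return n_files, n_bytes


def suggest_client(n_files: int, n_bytes: int,
                   max_web_files: int = 300,
                   max_web_bytes: int = 400 * 10**9) -> str:
    """Rule of thumb: more than ~300 files or 400 GB+ -> command line client."""
    if n_files > max_web_files or n_bytes >= max_web_bytes:
        return "command line client"
    return "web client"
```

For example, `suggest_client(*summarise_upload("micrographs"))` (with a hypothetical local directory) returns one of the two suggestions.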

Data transfers commonly proceed at 50 to 200 GB per hour, so TB-sized datasets may take days to upload in some cases. If you are using the command line client or Globus, the transfer can run asynchronously without you being logged in to the deposition system. However, you need tokens to initiate the transfers; these are provided on the "Upload data" page.
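
As a worked example of these rates, the expected duration is the dataset size divided by the transfer rate. The sketch below assumes sizes in GB and the 50 to 200 GB per hour range quoted above; actual rates depend on your network.

```python
def transfer_hours(size_gb: float, slow_gb_per_h: float = 50.0,
                   fast_gb_per_h: float = 200.0) -> tuple[float, float]:
    """Return (best case, worst case) transfer time in hours."""
    if size_gb < 0 or slow_gb_per_h <= 0 or fast_gb_per_h < slow_gb_per_h:
        raise ValueError("size must be >= 0 and 0 < slow rate <= fast rate")
    return size_gb / fast_gb_per_h, size_gb / slow_gb_per_h


best, worst = transfer_hours(2000)  # a 2 TB dataset, taking 1 TB = 1000 GB
print(f"{best:.0f} to {worst:.0f} hours")  # prints "10 to 40 hours"
```

So a 2 TB dataset would need roughly 10 to 40 hours, that is, up to about 1.7 days.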

5.5 Associate image sets with data page

Because we do not prescribe how uploaded data should be organized, the purpose of this page is to allow the depositor to identify and describe the datasets present in the uploaded data. For example, one could have three datasets (raw multi-frame micrographs, frame-averaged micrographs and particle stacks) that have to be associated with the directories "micrographs/multiframe/", "micrographs/singleframe/" and "particles/", respectively. As data upload may proceed asynchronously, you may proceed to this page even though the upload has not completed.

5.5.1 Checking the uploaded data
  • The "Refresh directory structure" button will build a logical representation of the directory tree and determine the size of the upload, Figure 21.
    Figure 21 Any zero-sized files or folders are highlighted. The file tree can be expanded and shows the size of all the directories; this is a good quick first check that the data has been uploaded correctly.
  • It will also check for zero-sized files and provide warnings if any are found. Together, these are good initial checks that the upload has completed successfully.
  • A more detailed check can be done by comparing the MD5 checksums of all the uploaded files with those of the files on your local disk. To make this check possible, we provide a JSON file and a Python script that you can download and run to check that the files match. More detailed instructions are provided in the "Check that the uploaded data in EMPIAR is the same as the one on your disk" section on the image set association page.
  • This page also offers an option to download the list of uploaded files.
  • We recommend that you run all these checks to make sure that the data has been uploaded correctly.
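
EMPIAR's own JSON file and Python script are the recommended way to run the MD5 comparison. To illustrate what such a check involves, here is a minimal sketch that also reports zero-sized local files. It assumes a hypothetical manifest format mapping each relative path to its hexadecimal MD5 checksum, which may differ from the file EMPIAR actually provides; the function names are also hypothetical.

```python
import hashlib
import json
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so large files fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_with_manifest(local_root: str, manifest_path: str) -> dict[str, list[str]]:
    """Compare local files with an assumed {"relative/path": "md5hex"} manifest."""
    root = Path(local_root)
    with open(manifest_path) as fh:
        remote = json.load(fh)
    local = {p.relative_to(root).as_posix(): p
             for p in root.rglob("*") if p.is_file()}
    report = {"zero_size": [], "mismatch": [], "missing_remote": [], "missing_local": []}
    for rel, path in sorted(local.items()):
        if path.stat().st_size == 0:
            report["zero_size"].append(rel)
        if rel not in remote:
            report["missing_remote"].append(rel)  # on your disk, not in the upload
        elif md5sum(path) != remote[rel].lower():
            report["mismatch"].append(rel)
    report["missing_local"] = sorted(set(remote) - set(local))  # uploaded only
    return report
```

Every list in the returned report should be empty for a complete, uncorrupted upload.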

5.5.2 Associating datasets
  • You need to define at least one dataset.
  • Use the directory tree browser to select the directory corresponding to the dataset, Figure 22. Clicking on the directory will automatically populate the corresponding field in the form.
    Figure 22 The directory tree browser can be used to select the data directory for an image set.
  • Fill out the form fields describing the dataset. A descriptive name is useful, especially when the deposition consists of several datasets. The "Details" section is also useful for describing auxiliary data and how it relates to the image data.
  • You can fill in some of the fields automatically by clicking on one of the image set files: if the file is readable by IMOD or BSOFT, its header will be displayed in a pop-up, where you can click a button to populate all possible fields in the corresponding form, Figure 23.
    Figure 23 Automatically populate all possible fields in the form with the information from the file's header.
  • To add another dataset, press the "Add more" button at the bottom of the page, Figure 24.
    Figure 24 You can specify more than one image set.
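
For MRC-format files, the values requested in the form can also be checked locally before deposition. The sketch below reads a few fields of the standard 1024-byte MRC2014 header with the Python standard library. It assumes the pixel size can be derived as the cell length divided by the sampling (in Å) and lists only some data modes; the function and dictionary names are hypothetical. For routine use, a maintained reader such as the mrcfile package is more robust.

```python
import struct

# Subset of MRC2014 data modes, as they might be entered under "Pixel type".
MRC_MODES = {
    0: "signed byte (int8)",  # some software writes unsigned bytes in mode 0
    1: "signed 16-bit integer",
    2: "32-bit float",
    3: "complex 16-bit integers",
    4: "complex 32-bit floats",
    6: "unsigned 16-bit integer",
    12: "16-bit float",
}


def read_mrc_header(path: str) -> dict:
    """Return image dimensions, pixel type and pixel size from an MRC header."""
    with open(path, "rb") as fh:
        header = fh.read(1024)
    if len(header) < 1024:
        raise ValueError(f"{path}: too short for an MRC header")
    # Byte 212 starts the machine stamp; 0x11 marks big-endian data.
    endian = ">" if header[212] == 0x11 else "<"
    nx, ny, nz, mode = struct.unpack_from(endian + "4i", header, 0)
    mx, my, _mz = struct.unpack_from(endian + "3i", header, 28)  # sampling
    cella_x, cella_y, _cella_z = struct.unpack_from(endian + "3f", header, 40)  # cell in Å
    return {
        "width": nx,
        "height": ny,
        "sections": nz,  # images in a stack or frames in a movie
        "pixel_type": MRC_MODES.get(mode, f"unknown mode {mode}"),
        "pixel_size_x": cella_x / mx if mx else None,  # Å per pixel
        "pixel_size_y": cella_y / my if my else None,
    }
```

For a movie or particle stack, "sections" corresponds to the number of frames or images in the file.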

6 Helpdesk

Figure 25 Help options available from the left menu.

There are three help options available from the left menu, Figure 25. This manual can be accessed from "Deposition manual". The "Helpdesk" link in the left menu can be used to ask a question or review previous communications with the annotation staff. To ask questions specifically about a deposition that is being edited, we recommend that you use the "Deposition help" button. The helpdesk system allows you to add attachments to your communications. If you have trouble registering an EMPIAR account or using the helpdesk system, please send us an e-mail.

7 Invite reviewers

Figure 26 Inviting reviewers to examine your entries.

You may be asked by editors or referees to provide access to your data before publication. To facilitate this, we provide the owner of the entry with an option to generate links that can be sent to the editor to distribute among the referees, Figure 26. When a referee opens such a link, they will be provided with randomly generated credentials for an anonymous user, which can be used to log into the EMPIAR deposition system, review your metadata, and download and check your data.