EMPIAR, the Electron Microscopy Public Image Archive, is a public resource for raw electron microscopy images related to 3DEM experiments.
The EMPIAR Deposition System can be used to deposit data to EMPIAR. However, prior to depositing data to EMPIAR, the associated 3DEM reconstructions
must be deposited to EMDB. You may deposit various types of data to EMPIAR including multi-frame micrographs, frame-averaged micrographs,
particle stacks and tilt series, as well as auxiliary files describing, for example, particle selection coordinates.
You can deposit additional data to EMPIAR (and multiple datasets in the same entry). So you could, for example, deposit class averages and
metadata to EMPIAR. The one requirement we do have, however, is that the 3D reconstructions are deposited to EMDB.
2 Pre-deposition preparations
In order to make the deposition process run smoothly we request that you make certain preparations prior to deposition:
Make a note of the accession code (in the form EMD-####, e.g., EMD-1001) of the related EMDB deposition. Not only is this a requirement,
but the accession code can also be used to automatically fill in fields (if the EMDB entry has been released).
Organize your data
If you are planning to upload multiple datasets (e.g., micrographs and particle stacks) we recommend that you create one sub-directory
for each dataset.
Please name your subdirectories so that it is easy to understand the organization. For example "micrographs" for micrographs and
"particles" for particles.
Having more than 10000 files in a single directory tends to slow down access considerably. In this case we recommend
that you sub-divide the directory into subdirectories with no more than 10000 files each.
Make a note of the details describing each dataset that you will be asked for during the deposition process. These include:
Are the images processed or raw?
Are they multi-frame images? If so which frames have been used?
Number of images (or tilt series)
Image width and height
Pixel type (unsigned byte, byte, 32-bit float, etc.)
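The recommendation to keep directories to no more than 10000 files can be applied with a short script before upload. The following is a minimal sketch only: the function name and the "part_0001" style subdirectory names are hypothetical conventions, not something EMPIAR requires, and you may prefer to group files in a way that reflects your experiment.

```python
import shutil
from pathlib import Path

MAX_FILES_PER_DIR = 10000  # limit recommended in the text above


def split_directory(directory, max_files=MAX_FILES_PER_DIR, prefix="part_"):
    """Move the files in `directory` into numbered subdirectories.

    Subdirectory names such as part_0001 are a hypothetical convention.
    Returns the list of subdirectories created (empty if none were needed).
    """
    directory = Path(directory)
    files = sorted(p for p in directory.iterdir() if p.is_file())
    if len(files) <= max_files:
        return []  # already within the limit, nothing to do

    created = []
    for start in range(0, len(files), max_files):
        subdir = directory / f"{prefix}{start // max_files + 1:04d}"
        subdir.mkdir()
        for path in files[start:start + max_files]:
            shutil.move(str(path), str(subdir / path.name))
        created.append(subdir)
    return created
```

For example, split_directory("micrographs") would leave a directory of 25000 files as three subdirectories holding 10000, 10000 and 5000 files. Please try it on a copy of your data first, as the files are moved rather than copied.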
3 Data transfer technologies
In order to upload data to EMPIAR we provide two alternatives which are both capable of dealing efficiently and robustly with the large data volumes
associated with EMPIAR — Aspera (http://asperasoft.com) and Globus (https://www.globus.org). Both technologies require you to install some software on your machine but are free at the
point of use.
An easy way to check for and install the Aspera plugins is to go to the EMPIAR website. If Aspera
is not installed, the site will prompt you and guide you through installing the relevant plugins, Figure 1.
Once this is completed you can test that Aspera is working by trying to download, Figure 2,
one of the datasets (you can cancel the download once the transfer has started, Figure 3).
Aspera makes use of UDP transfer technology. Some institutes block the UDP port (port 33001) by default and it may not be possible to have it opened. If this is
the case for you then we recommend that you use Globus, which relies on GridFTP.
Once you are logged in to Globus, click on "Quick Links" at the top and "Transfer Files", Figure 5.
Before you can transfer files you will need to have made your computer (where you have the files) an endpoint. This can be done for free and
there is a link on the Transfer Files page to "Globus Connect Personal" which sets your computer up as an endpoint. In order to test a download, try
using the ebi#pub endpoint and set the directory to /pub/databases/empiar/archive/, then select an entry and see if you can transfer it over to your computer, Figure 6.
4 User accounts and deposition landing page
You need a user account to use the deposition system; to register please proceed to the
registration page, Figure 7.
Once logged in, the "Edit profile" option in the left menu allows you to update your profile and change your password, Figure 8.
We recommend that you keep your profile up-to-date as you can use it to automatically fill in form fields in a deposition.
By default, when you log in you will be taken to the landing page, Figure 9, which presents the option to create new depositions ("Create a new deposition") and a
table with depositions that you can access.
One user can create multiple depositions and multiple users can share access to one deposition. For any deposition,
only one user is considered the "owner". The owner can grant access rights for the deposition to other users (view only; view and edit; or view, edit and submit)
or can transfer ownership to another user by following the "Change ownership/grant rights" link in the depositions table.
5 Deposition process
The deposition process consists of three parts, Figure 10:
providing the general metadata about the deposition: EMDB accession code, title, authors, etc.;
uploading the data: the transfer can take some time, so the next step will most likely not be undertaken immediately;
associating the uploaded data with the corresponding image sets, that is, identifying the image sets present and describing them.
Once these steps have been completed, the deposition can be submitted. This will lock the deposition (make it uneditable) while it is being checked by
the EMPIAR annotation team. They will communicate with the user regarding any issues and may choose to unlock the deposition if complementary details or data
are required. Once this process is completed, the entry will be released to the public following the instructions provided (see below for options).
5.2 Form basics
5.2.1 Deposition locking
As multiple users can work on the same deposition and more than one user can have edit rights, a locking mechanism has been implemented to prevent
simultaneous editing by multiple users.
Whenever you open a form page, the whole deposition becomes locked to you for 30 minutes and you have exclusive
rights to edit it, Figure 11. It is possible to release the lock before the expiration time by closing all pages or by pressing the "Release Lock" button.
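To make the locking behaviour easier to picture, the sketch below models a lock that is held by one user for 30 minutes and can be released early. It is an illustration only and not the code used by the deposition system: the class name, the method names and the assumption that re-opening a page renews your own lock are all hypothetical.

```python
import time


class DepositionLock:
    """Illustrative time-limited lock; not the EMPIAR implementation."""

    LOCK_SECONDS = 30 * 60  # 30 minutes, as stated in the text above

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable clock so the logic can be tested
        self._holder = None
        self._expires_at = 0.0

    def acquire(self, user):
        """Return True if `user` holds the lock after this call."""
        now = self._clock()
        if self._holder not in (None, user) and now < self._expires_at:
            return False  # another user holds a lock that has not expired
        # Assumption: opening a page again renews the user's own lock.
        self._holder = user
        self._expires_at = now + self.LOCK_SECONDS
        return True

    def release(self, user):
        """Give up the lock early, like pressing "Release Lock"."""
        if self._holder == user:
            self._holder = None
```

In this model a second user who calls acquire() while the first user's lock is still valid is refused, and succeeds once the 30 minutes have passed or the first user has released the lock.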
5.2.2 "Save", "Save + validation", "Submit entry" and the traffic light system
Changes made to a form will be lost unless they are saved by pressing the "Save" or "Save + validation" buttons. The former is for a temporary save of
the page in case it is not possible to fill in all the mandatory information on the page in one go. However, to proceed with the submission it is necessary
to have the information on the page validated by our system with "Save + validation", Figure 12.
The state of the page is shown in the left-hand side menu, Figure 13. When the page is first opened, there is an empty circle next to its link. When the page is saved
the circle is filled with yellow; when it is validated with errors the circle turns red; and when it passes validation the circle turns green.
When all forms have been filled out and validated successfully, a "Submit" button will appear next to the "Save + validation" button. Please press "Submit"
to send the deposition for review by the EMPIAR annotation team. The deposition will be locked from further editing from this stage onwards unless an annotator
requires you to fill in or change any of the information.
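The traffic light system described in this section can be summarised as a small state model. The sketch below is only a way of reading the rules above; the names PageState, state_after and can_submit are hypothetical and do not correspond to anything in the deposition system itself.

```python
from enum import Enum


class PageState(Enum):
    UNSAVED = "empty circle"  # page opened but not yet saved
    SAVED = "yellow"          # "Save" pressed, not validated
    INVALID = "red"           # "Save + validation" found errors
    VALID = "green"           # "Save + validation" succeeded


def state_after(action, has_errors=False):
    """Map a button press to the resulting page state."""
    if action == "save":
        return PageState.SAVED
    if action == "save_and_validate":
        return PageState.INVALID if has_errors else PageState.VALID
    raise ValueError(f"unknown action: {action!r}")


def can_submit(page_states):
    """The "Submit" button appears only when every page is green."""
    states = list(page_states)
    return bool(states) and all(s is PageState.VALID for s in states)
```

A deposition with one yellow page and two green pages would therefore not yet offer the "Submit" button.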
5.2.3 Mandatory fields and the "N/A" button
Mandatory fields are marked by a red asterisk "*". You will also find many fields that have an "N/A" (not available/not applicable)
button next to them, Figure 14. Not all of these fields are mandatory but we expect the user to at least provide an "N/A" to explicitly confirm that the information requested
cannot be provided. Pressing "N/A" will automatically erase the existing information in the form field and fill it with the special marker for N/A information.
5.2.4 Form field help and examples
Most form fields have a question mark symbol "?" next to them and an example value below them, Figure 15. Hovering over the question mark symbol will bring up a pop-up box
with help text for that field.
5.3 Deposition overview page
5.3.1 Deposition image
This image will be used for representative purposes on the EMPIAR website alongside your entry, Figure 16. If no image is provided, we will use the image provided when
depositing the associated EMDB entry. The image should be at least 400 x 400 pixels, in PNG or GIF format.
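If you would like to check an image before uploading it, the size can be read from the first few bytes of a PNG or GIF file using only the Python standard library. This is a minimal sketch under the assumption that "400 x 400" refers to pixels; the function names are hypothetical and the deposition system may apply its own checks.

```python
import struct

MIN_SIDE = 400  # minimum width and height given in the text above


def image_size(data):
    """Return (format, width, height) from the leading bytes of a PNG or GIF."""
    if data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR":
        # PNG: width and height are big-endian 32-bit integers in IHDR.
        width, height = struct.unpack(">II", data[16:24])
        return "png", width, height
    if data[:6] in (b"GIF87a", b"GIF89a"):
        # GIF: width and height are little-endian 16-bit integers.
        width, height = struct.unpack("<HH", data[6:10])
        return "gif", width, height
    raise ValueError("not a PNG or GIF file")


def is_acceptable(data, min_side=MIN_SIDE):
    """True if the data is a PNG or GIF of at least min_side x min_side."""
    try:
        _, width, height = image_size(data)
    except (ValueError, struct.error):
        return False
    return width >= min_side and height >= min_side
```

Reading the first 24 bytes is sufficient, for example is_acceptable(open("entry_image.png", "rb").read(24)), where the file name is a placeholder.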
5.3.2 Harvesting information from the related EMDB entry and user profiles
You can specify multiple EMDB accession codes but please note that there are separate boxes for released and unreleased entries, Figure 17. If the entry has been released,
you can copy authors from the related EMDB entry by filling in the "EMDB accession code" and then pressing the "Copy Authors from EMDB entry" button. You can
also automatically populate the corresponding author and principal author fields from the user profiles of any users that are associated with the deposition. Once
the corresponding author fields have been populated, you may also copy these over to the principal author fields.
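When keeping track of several accession codes it can help to check that each one has the expected form before typing it into the form. The sketch below assumes the EMD-#### form given earlier in this manual; accepting more than four digits is our assumption for illustration, and the function name is hypothetical.

```python
import re

# "EMD-" followed by digits. The manual shows four digits (e.g. EMD-1001);
# accepting more than four digits is an assumption.
EMDB_CODE = re.compile(r"EMD-\d{4,}")


def normalize_emdb_code(text):
    """Return the code in upper case, or None if it does not match."""
    candidate = text.strip().upper()
    return candidate if EMDB_CODE.fullmatch(candidate) else None
```

This only checks the format of the code. It does not tell you whether the entry exists or whether it has been released.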
5.3.3 Citation information
Please provide the information about the citation related to your deposition, Figure 18.
You can automatically fill in the citation information using DOI or PubMed ID.
If the information regarding editors is not available, please press the corresponding "Remove editor" button, Figure 19.
5.3.4 Release instruction
You have four options for how the entry should be released (made available to the public) once it has been submitted and processed successfully, Figure 20:
REL: As soon as the processing has finished the release procedure will be initiated. For datasets in the TB range this could take several days.
EMDBPUB: Wait until the associated EMDB entry has been released before releasing the EMPIAR entry. This requires the public release by EMDB of both the header and the map.
HPUB: Wait until the primary citation for the associated EMDB entry has been published or for a maximum of 1 year (whichever comes first).
HOLD: Wait for 1 year.
Please note that although we have automated checks in place to find out, for example, when the citation is published, these checks will sometimes fail to detect one
of these events. We therefore recommend that you contact the EMPIAR annotation team to let them know when associated entries or citations are released/published.
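The four release options can be compared by working out the date each one implies. The sketch below is an illustration only and not how releases are actually scheduled. In particular, it assumes that the one-year periods are counted from the date processing finished, which this manual does not state, and all names in it are hypothetical.

```python
from datetime import date
from enum import Enum


class Release(Enum):
    REL = "REL"
    EMDBPUB = "EMDBPUB"
    HPUB = "HPUB"
    HOLD = "HOLD"


def one_year_after(day):
    """Same calendar day one year later (29 Feb maps to 28 Feb)."""
    try:
        return day.replace(year=day.year + 1)
    except ValueError:
        return day.replace(year=day.year + 1, day=28)


def release_date(option, processed, emdb_released=None, published=None):
    """Date implied by a release option, or None if it cannot be known yet.

    Assumption: one-year limits count from `processed`, the date on which
    processing finished; the manual does not state the starting point.
    """
    limit = one_year_after(processed)
    if option is Release.REL:
        return processed
    if option is Release.EMDBPUB:
        # Unknown until the EMDB entry (header and map) is released.
        return max(processed, emdb_released) if emdb_released else None
    if option is Release.HPUB:
        # Publication or the one-year limit, whichever comes first.
        return max(processed, min(published, limit)) if published else limit
    return limit  # Release.HOLD
```

For HPUB with no publication date known yet, the function returns the one-year limit, which is the latest date the option allows.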
5.4 Upload data
You will not be able to proceed to the Upload data page until the Deposition overview page has been completed and validated. You are provided with three options:
Aspera via command line, Aspera via web-client or Globus. Once the upload has finished, please check your data on the "Associate image sets with data" page as described below.
When using the web-client please keep in mind that there is a limitation set by the web-browser and the operating system for the selection dialogue in the web-browser.
Usually the most you can select is about 300 files at a time (depends on the length of filenames and paths to files). If you intend to upload more than that in a single go (as opposed to uploading 300 files at a time) or if
your dataset is 400 GB+ in size, we recommend using the Aspera command line client.
Data transfers commonly proceed at 50 - 200 GB per hour so expect TB+ sized datasets to take days in some cases. If you are using the command-line client or Globus
the transfer can proceed asynchronously without you being logged in to the deposition system. However, you need tokens to initiate the transfers, which are provided on the Upload data page.
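The figures quoted in this section can be turned into a rough planning aid. The sketch below simply applies the 50 - 200 GB per hour range and the 300 file / 400 GB rule of thumb from the text; the function names are hypothetical and actual transfer rates depend on your network.

```python
def transfer_hours(size_gb, slow_gb_per_hour=50, fast_gb_per_hour=200):
    """Return (best_case_hours, worst_case_hours) for an upload of size_gb."""
    if size_gb < 0:
        raise ValueError("size must be non-negative")
    return size_gb / fast_gb_per_hour, size_gb / slow_gb_per_hour


def prefers_command_line(n_files, size_gb, max_web_files=300, max_web_gb=400):
    """Apply the rule of thumb above for choosing the Aspera command line client."""
    return n_files > max_web_files or size_gb >= max_web_gb
```

For example, transfer_hours(2000) gives (10.0, 40.0), so a 2 TB dataset (taking 1 TB as 1000 GB) would be expected to take between about 10 and 40 hours.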
5.5 Associate image sets with data page
Because we do not prescribe the organization of the data being uploaded, the purpose of this page is to allow the depositor to identify and describe the
datasets present in the uploaded data. As an example, one could have three datasets (raw multi-frame micrographs, frame-averaged micrographs and particle stacks) that
have to be associated with the directories "micrographs/multiframe/", "micrographs/singleframe/" and "particles/" respectively. As data upload may proceed asynchronously,
you may proceed to this page even though the upload has not completed.
5.5.1 Checking the uploaded data
The "Refresh directory structure" button will build a logical representation of the directory tree structure and determine the size of the upload, Figure 21.
It will also check for zero sized files and provide warnings if any are found. These are all good initial checks to see whether the upload has completed.
A more detailed check can be done by comparing the MD5 sums of all the uploaded files with the MD5 sums of the files on your local disk. To make this
check possible we provide a JSON file and a Python script that you can download and run to check that the files match. More detailed instructions are provided
under "Check that the uploaded data in EMPIAR is the same as the one on your disk" on the image set association page.
Also on this page you have an option to download the list of the uploaded files.
We recommend that you run all these checks to make sure that the data has been uploaded correctly.
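To show what the MD5 comparison involves, here is a minimal sketch of such a check. It is not the script that we provide, which you should use for the real check. The layout of the JSON file is assumed here to be a simple mapping from relative file path to MD5 sum, which may differ from the actual file, and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_with_manifest(manifest_path, local_root):
    """Compare local files with a {relative_path: md5} JSON manifest.

    The manifest layout is hypothetical; the JSON file provided by EMPIAR
    may be organized differently. Returns (missing, mismatched, empty).
    """
    expected = json.loads(Path(manifest_path).read_text())
    missing, mismatched, empty = [], [], []
    for relative_path, checksum in sorted(expected.items()):
        local_file = Path(local_root) / relative_path
        if not local_file.is_file():
            missing.append(relative_path)
            continue
        if local_file.stat().st_size == 0:
            empty.append(relative_path)  # zero sized files deserve a look
        if md5_of(local_file) != checksum.lower():
            mismatched.append(relative_path)
    return missing, mismatched, empty
```

Three empty lists indicate that every file named in the manifest is present locally, is not empty and has a matching MD5 sum.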
5.5.2 Associating datasets
You need to define at least one dataset
Use the directory tree browser and select the directory corresponding to the dataset, Figure 22. Click on the directory — this will automatically populate the
corresponding field in the form
Fill out the form fields describing the dataset. Please note that a descriptive name is useful especially when the deposition consists of several datasets. The
"Details" section is also useful to describe auxiliary data and how it may be related to the image data
You can fill in some of the fields automatically by clicking on one of the image set files, and, if it is readable by IMOD or BSOFT, you will see its header
displayed in a popup. There you can click a button to populate all possible fields in the corresponding form, Figure 23
To add another dataset, please press the "Add more" button at the bottom of the page, Figure 24
There are three help options available from the left menu, Figure 24. This manual can be accessed from "Deposition manual". The "Helpdesk" link in the left
menu can be used to pose a question or review previous communications with the annotation staff.
To pose questions specifically about a deposition that is being edited, we recommend that you use the "Deposition help" button. The help desk system allows
you to add attachments to your communications. If you have trouble with registering an EMPIAR account or using the helpdesk system, please
send us an e-mail.
7 Invite reviewers
You may be requested by editors or referees to provide access to your data before publication. To facilitate this we provide the owner of the entry with
an option to generate links that can be sent to the editor to distribute among the referees, Figure 25. When a referee opens such a link, they will be provided with
randomly generated credentials for an anonymous user that can be used to log in to the EMPIAR deposition system, review your metadata, and download and check your data.