EMPIAR supports "Method of the Year"

Detail of the webpage for entry EMPIAR-10055
21 March 2016

The official announcement of EMPIAR in Nature Methods prompts Gerard Kleywegt, Head of the Protein Data Bank in Europe, to reflect on the origins and future directions of this essential resource for raw biomolecular imaging data.

EMPIAR is the public archive for raw electron-microscopy (EM) image data and the most recent member of the family of archives that store the results and data from molecular and cellular structural biology research. It was founded at the EMBL-EBI in 2014, originally as a complement to EMDB (Electron Microscopy Data Bank), which is the single, global archive for 3DEM volume maps and tomograms. EMPIAR is part of a larger project at PDBe called “MOL2CELL”, which aims to integrate 3D structural information on different length scales (“from molecules to cells”) so that it can be used by biologists even if they are not experts in structural biology.

Nature Methods recently named EM its Method of the Year 2015. As the EM field expands and matures rapidly, the raw image data archived in EMPIAR is very useful for software and methods development, for testing new approaches to validation, for making available data related to controversial studies, for distributing data for community challenges (such as the ongoing EMDataBank Map Challenge), and for teaching and training newcomers and students in the 3DEM field. At the same time, it provides a useful test bed to investigate the requirements and technical aspects of archiving and distributing large amounts of bioimaging data. EMPIAR took off properly in 2015 and contains 48 released entries as of March 2016. It was designed to handle large datasets: the average size of an EMPIAR entry is ~700 GB, with the largest one currently taking up over 6 TB.
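To give a feel for the scale these figures imply, here is a back-of-the-envelope sketch in Python. It simply multiplies the two rounded numbers quoted above (48 released entries, an average of ~700 GB each). The function name and the decimal conversion (1 TB = 1000 GB) are assumptions made for illustration, and the result is a rough estimate, not an official figure for the archive.

```python
# Rough estimate only: both inputs are the rounded figures quoted in the text.
RELEASED_ENTRIES = 48    # released entries as of March 2016
AVERAGE_ENTRY_GB = 700   # approximate average entry size ("~700 GB")
GB_PER_TB = 1000         # assumption: decimal units (1 TB = 1000 GB)


def estimated_total_tb(entries: int, average_gb: float) -> float:
    """Return the approximate total archive size in terabytes."""
    return entries * average_gb / GB_PER_TB


total_tb = estimated_total_tb(RELEASED_ENTRIES, AVERAGE_ENTRY_GB)
print(f"Estimated total: about {total_tb:.1f} TB")
```

On these rounded inputs the archive would hold a few tens of terabytes, with the single largest entry (over 6 TB) accounting for a sizeable share of that.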

The origins of EMPIAR

As is often the case with new initiatives, the genesis of EMPIAR has not been a straightforward process (although it did fortunately have a happy ending). In 2011, PDBe and the Dundee OME team organised a workshop on data-management challenges in 3DEM. At that meeting, there was a consensus that routine deposition of raw data to EMDB was premature, but that it would be useful to develop a pilot archive of images used in single-particle cryo-EM processing and of tilt series used in electron tomography.

PDBe and OME took up this challenge and applied for funding from the UK’s BBSRC research council for a project to develop a volume browser for 3D imaging data and to provide archiving support for soft X-ray tomography (SXT) data and for 3D scanning electron microscopy (3DSEM) reconstructions. The research council did not fund the proposal, as it was deemed “premature” (although we thought it was “visionary”) and possibly too ambitious, and the potential user base was considered unclear. We then decided to organise another workshop in 2012 on the 3D cellular context of the macromolecular world, specifically to discuss archiving needs and opportunities in this area. The recommendations from this workshop were very much in line with our earlier grant application, and expanded on it. The workshop also discussed compelling use cases, which would help with future grant applications.

Figure 1 - The MOL2CELL project team at PDBe. From left to right: Andrii Iudin (working on the EMPIAR archive, including the website and the deposition system), Ardan Patwardhan (leading the project), Carlos Lugo (working on the integrated structure browser, building on the volume slicer developed by his predecessor, José Salavert Torres), and Paul Korir (focusing on segmentations and adding support for other experimental methods). An undergraduate trainee, Irene Solanes Valero, did a project on 3D segmentation that laid some of the groundwork for Paul’s work and that was also used in the December 2015 workshop.

Armed with the experts’ recommendations and with strong support from the community, PDBe submitted a new grant application, this time to the UK’s Medical Research Council (MRC). This application was successful, and with BBSRC co-funding, we were awarded two programmer posts for three years to work on the MOL2CELL project (of which the image archive was one component). Once the leader of the project, Ardan Patwardhan, started the preliminary work, we had to come up with a name and decided on EMPIAR (“Electron Microscopy Pilot Image Archive”), which is fairly unique in Europe PMC searches (and even Google). Once two programmers had been recruited and started at the EMBL-EBI, the real work began, and the rest, as they say, is history. Figure 1 shows the people who are currently working on the MOL2CELL project.

What does EMPIAR have to offer now?

The EMPIAR home page provides a portal to the archived data and related services, including the FAQ list and the deposition system. The individual entry pages (for example, for entry EMPIAR-10028) give more information about the entry, provide access to individual data files, and offer the option to download partial or complete datasets using a variety of technologies (Figure 2). Thumbnail images of various files can be viewed inside your web browser. As part of the wider MOL2CELL project, a volume-slice viewer has also been developed, which at present is offered for all EMDB entries (for example, for entry EMD-2363; this will be discussed in a separate blog post soon).

Figure 2 - The EMPIAR archive of raw 3DEM image data, established by PDBe in 2014, is developing rapidly in terms of archived entries, functionality, tools, and recognition by the community. The volume slicer displayed in the bottom-left panel allows for easy visualisation of 3DEM maps and tomograms, enabling even non-specialist users to inspect very large datasets, in three different orientations. The bottom-right panel shows details of an EMPIAR entry page, including on-the-fly downloads and display of thumbnail images of individual files.

3DSEM entries: significance

On 20 January 2016, four related 3D Scanning EM (more precisely, Serial Block-Face or SBF-SEM) datasets were released, showing different stages of infection of a red blood cell by a malaria parasite (EMPIAR entries 10052, 10053, 10054 and 10055; Figure 3). These were the first experimental entries in EMPIAR that were not derived from any of the Transmission EM methods that are archived in EMDB. We expect other imaging modalities to become more and more important in the next few years (in the first instance 3DSEM, but also Correlative Light and Electron Microscopy, or CLEM, and Soft X-ray Tomography, or SXT), and EMPIAR is the perfect “playground” to investigate what is involved in archiving the results of these new techniques.

Figure 3 - EMPIAR entry 10055 is a 3DSEM dataset of a late schizont-stage malaria-parasite-infected red blood cell, deposited and described by Sakaguchi et al.

Imaging the future

The field of bioimaging is developing very rapidly, and there are several initiatives in the UK and Europe to set up appropriate infrastructure. This includes facilities to archive (some of) the data that are generated, and EMPIAR is a well-timed early exemplar of such an archive.

The MOL2CELL project extends beyond the EMPIAR archive. We will also develop a tool for annotation of segments (“regions-of-interest”) in 3D datasets, as well as a tool for visualisation and analysis of integrated structural data, from molecules and complexes (in PDB and EMDB) all the way to cells and small samples (in EMDB and EMPIAR), linked through references to other archives (e.g., UniProt, Pfam and GO). To get detailed input from community experts, PDBe and BioMedBridges recently organised a third workshop discussing 3D segmentations (and their annotation) as well as 3D transformations.

Reference

The EMPIAR announcement was published as: A. Iudin, P.K. Korir, J. Salavert-Torres, G.J. Kleywegt & A. Patwardhan. “EMPIAR: A public archive for raw electron microscopy image data.” Nature Methods 13 (2016). http://dx.doi.org/10.1038/nmeth.3806. See also the EMBL-EBI press release of 21 March 2016.

Acknowledgements

We are very grateful for the support from our funders, in particular MRC (grant MR/L007835, co-funded by BBSRC), BBSRC (grant BB/M018423), the Wellcome Trust (grant 104948) and EMBL-EBI. Besides the people working on the MOL2CELL project (Figure 1), many colleagues inside PDBe, EMBL-EBI and elsewhere have contributed to the development of EMPIAR. We are particularly grateful to the EM specialists in the field who have participated in our expert workshops, contributed data to EMPIAR, helped us with user-experience testing, and spread the word about EMPIAR to colleagues and journal editors.