BioImage Archive Policies

Version 1.1

Use of transient storage

The BioImage Archive is a permanent repository for biological image data. While a submission is being prepared, uploaded data is held transiently in the user's private data upload area. When the user includes data files in a submission, those files are moved from the data upload area into the BioImage Archive.

The data upload areas are provided as a temporary place in which data is held while a submission is in preparation. As such, they are neither intended nor suitable for any longer-term storage of data. Such storage is provided in the BioImage Archive itself. Once in the BioImage Archive, data can be released immediately following submission or can be held confidential prior to literature publication if required.

We expect any given data file to remain in a data upload area for no longer than 3 months before it forms part of a submission. We reserve the right to routinely delete any data files that persist in a data upload area for more than 3 months.
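As a rough illustration of the three-month rule, the sketch below flags a file whose time in an upload area exceeds a retention period. The function name, the approximation of "3 months" as 90 days, and the use of an upload timestamp are assumptions made for this example; they do not describe how the BioImage Archive actually implements deletion.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "3 months" is approximated as 90 days for this sketch.
RETENTION = timedelta(days=90)


def is_eligible_for_deletion(uploaded_at, now, retention=RETENTION):
    """Return True if a file has been in the upload area longer than `retention`."""
    return now - uploaded_at > retention


# Hypothetical usage: a file uploaded on 1 January, checked on 1 May.
uploaded = datetime(2024, 1, 1, tzinfo=timezone.utc)
checked = datetime(2024, 5, 1, tzinfo=timezone.utc)
print(is_eligible_for_deletion(uploaded, checked))
```

A file that has been present for exactly the retention period is not flagged, since the policy refers to files that persist for more than 3 months.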

Dataset submission, updates and persistence

  1. All datasets submitted to the BioImage Archive will remain permanently accessible as part of the scientific record. Authors are welcome to correct errors and update their datasets. Erroneous submissions may be hidden from search, but all submissions will remain permanently accessible by accession number.
  2. Submitters are advised that the information displayed on the BioImage Archive and BioStudies websites is fully disclosed to the public. It is the responsibility of the submitters to ascertain that they have the right to submit the data.
  3. Beyond limited editorial control and some internal integrity checks, the quality and accuracy of the submission are the responsibility of the submitting author, not of the archive. The BioImage Archive will work with submitters and users of the archive to achieve the best quality resource possible.
  4. When a submission is made, a release date is set. This date may be up to two years from the time of submission, and can be changed after the submission is completed, for example to ensure data release at the same time as a publication occurs.
  5. All new data directly submitted to the BioImage Archive will be made available under a CC0 licence. Datasets brokered or imported from other resources may have other licences (see below).

Some accessions in the BioImage Archive are available under licences other than CC0 (e.g. CC-BY-4.0). Where this is the case, the licence is indicated as an attribute on the dataset.
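The release-date rule in item 4 above can be sketched as a simple date check. This is a minimal illustration only: the function names are hypothetical, and the assumptions that a release date cannot precede the submission date and that a 29 February submission maps to 28 February two years later are ours, not stated policy.

```python
from datetime import date

# From the policy: a release date may be up to two years after submission.
MAX_HOLD_YEARS = 2


def latest_release_date(submitted):
    """Return the latest permitted release date for a given submission date."""
    target_year = submitted.year + MAX_HOLD_YEARS
    try:
        return submitted.replace(year=target_year)
    except ValueError:
        # Assumption: 29 February in a non-leap target year falls back to 28 February.
        return submitted.replace(year=target_year, day=28)


def is_valid_release_date(submitted, release):
    """Check that `release` lies between submission and the two-year limit."""
    # Assumption: a release date earlier than the submission date is not allowed.
    return submitted <= release <= latest_release_date(submitted)


# Hypothetical usage: a release date aligned with a planned publication.
print(is_valid_release_date(date(2023, 5, 1), date(2024, 11, 15)))
```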

Dataset removal by archive staff

Circumstances may arise in which the integrity, correctness, ownership or provenance of data is called into question. In such unusual circumstances, the BioImage Archive may decide to make an entry, in whole or in part, obsolete (moved out of the active archive, but still publicly accessible) or to remove it entirely from the public record. Examples of cases in which this might occur are:

  • The publication describing a dataset is retracted by (some of) the author(s), their home institute, or the journal in which it was published, and the retractor(s) request that the corresponding data/entries in the BioImage Archive are removed from the scientific record as well.
  • An official investigative body (for example, the Office of Research Integrity in the USA) recommends that data or entries in the archive should be removed from the scientific record.

BioImage Archive position on clinical data

The BioImage Archive is EMBL-EBI’s deposition database for biological imaging datasets. The goal of the archive is to enhance life sciences research through the provision of open access image data. To this end, the archive accepts depositions of imaging datasets that are either linked to publications, or show value beyond the immediate experiment, and makes those datasets available for reuse.

At present, medical datasets that are linked to an identifiable individual and/or have a primary use in medical treatment are not in scope for deposition. In the case of human image data where consent for publication is demonstrated, publication in the BioImage Archive will be considered on a case-by-case basis.

We recognise that there is considerable demand for deposition and reuse of personally identifiable imaging data. Developing the technical and policy infrastructure necessary to archive and share this type of data securely adds significant complexity and is not the primary focus of the BioImage Archive. The European Genome-phenome Archive (EGA) provides a service for permanent archiving and sharing of personally identifiable, phenotypic, and clinical data.

Summary

In scope:

  • Non-human biological imaging data associated with publications
  • Non-human reference imaging datasets
  • Images of human cell lines and tissues that are not linked to an individual or do not have phenotypic data allowing for identification

Out of scope:

  • Datasets linked to an identifiable individual
  • Datasets with a primary use in medical treatment (e.g. diagnostic imaging)

All data submitted to the BioImage Archive must have consent for public release, and the submitter self-certifies that they have the right to submit such data to a public archive.

BioImage Archive support for structured projects

The BioImage Archive aims to support the archival and dissemination of FAIR biological image data. In its role as a primary deposition database, the BioImage Archive accepts any biological image datasets that support results in peer-reviewed publications, with exceptions for patient-identifiable medical data (see more on Archive scope).

The Archive also supports deposition of datasets which do not have a linked publication, when those datasets have clear value beyond a single experiment (sometimes referred to as reference datasets). One category of reference datasets is those that are generated by structured projects. These projects may generate only imaging data, or multimodal datasets with an imaging component that needs to be cross-linked to other biomolecular data (for example, spatial transcriptomics, or chemogenomic screening).

Project datasets are currently considered on a case-by-case basis and evaluated by balancing resource use against predicted scientific impact. This impact may be realised through different routes, for example datasets that:

  • Have high lasting reuse potential (e.g. cell atlases, reference developmental time course studies for model organisms).
  • Support image analysis technology development (e.g. images with ground truth annotations for machine learning).
  • Add value as part of a coherent whole (e.g. screening datasets with comprehensive compound libraries).

Where significant curation support, storage resources, or development work on specific metadata specifications are likely to be needed, supporting those projects will require dedicated funding. Involving the BioImage Archive at the planning stage is key to success; please get in contact at bioimage-archive@ebi.ac.uk.