Submitting spatial transcriptomics data

This guide explains how to deposit spatial transcriptomics (ST) data into the BioImage Archive (BIA), paying special attention to the challenges specific to these datasets.

1. Register an account

You will need to register for a BioStudies account. You can sign up here and then return to this guide.

2. What data types can I submit to the BIA?

ST data is generally multimodal, so there are several data types to archive. Generally, you should include everything needed to reproduce your results. This might be raw data, intermediate results, or processed data like segmentation masks. For any program code accompanying your study, it is best to upload that code to a specialised repository (e.g. GitHub or GitLab) and then add a link to it during the submission process.

There are two approaches BIA users can take to ST data deposition:

Deposition of different data types in different archives:

This is the preferred method for data obtained using sequencing-based methods. In this instance the images are submitted to the BIA and other data types such as gene expression matrices are submitted to other relevant repositories.

Here is an example of how to deposit multimodal data:

Submit bulk transcriptomic data and single-cell omics data to a specialist repository such as the European Nucleotide Archive (ENA) or ArrayExpress.
Afterwards, submit the images belonging to the multimodal ST dataset to the BioImage Archive.
Link both submissions. Instructions on how to do this can be found in the section Provide Study level metadata below and here.

Deposition of all data types in the BIA:

This is the preferred method for data obtained using imaging-based spatial transcriptomics technologies. In this case, all data, except for single cell sequencing, can be deposited into the BioImage Archive. Importantly, for these datasets to be reusable they need to contain a minimum set of files:

Expression matrix - containing transcriptional profiles (or necessary files to construct it such as SpaceRanger output files)
Cell metadata file - cell information such as quality control measures, or metrics like aspect ratio or area
Cell polygons - polygons determining cell boundaries
Immunofluorescence images - raw and/or preprocessed fluorescence images (used for segmentation)
Segmentation masks - raster images with segmentation masks encoded
Transcript positions - coordinates of each detected transcript

Apart from the minimum set of files we encourage deposition of any additional files that may be important to reproduce the submitted experiments. For example, field of view positions, transformation data, gene panel metadata, analysis summary files, etc.

3. File formats and file size considerations

You can submit your data in any format that you have. To increase the reuse potential of the dataset, we encourage open file formats that do not require proprietary software. For images, it is generally best to use widely supported community standard formats (e.g. OME-TIFF, OME-NGFF/OME-Zarr, or SpatialData Object) if possible. If you are submitting SpatialData Objects, please see the section Submitting SpatialData Objects for more instructions. For image annotations and supplementary data (such as tables of features or measurements), use standardised formats, such as CSV, where possible. For large tabular data files (such as transcript positions, which can have millions of rows), we recommend Parquet to facilitate downstream analysis. If you are submitting Zarr or OME-Zarr files, please check this page for additional considerations.

We do not have restrictions on the total size of the datasets submitted to the BIA. If your dataset is larger than a few TB, please get in touch with us so we can prepare for your submission. Regarding file sizes, we cannot support single files larger than 2TB, and would recommend keeping individual files under 100GB as that will make the upload process faster and less prone to errors.
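Before uploading, it can help to scan your data directory for files that exceed these sizes. The following is a minimal sketch, not an official BIA tool: it assumes decimal units (1 TB = 10**12 bytes), which this guide does not specify, and the directory name my_submission is a hypothetical placeholder.

```python
import os

# Assumption: decimal units (1 TB = 10**12 bytes); the guide does not say TB vs TiB.
GB = 10**9
TB = 10**12

HARD_LIMIT = 2 * TB    # single files above this size cannot be supported
SOFT_LIMIT = 100 * GB  # recommended upper bound for individual files


def classify_size(size_bytes: int) -> str:
    """Classify a file size against the 2TB limit and the 100GB recommendation."""
    if size_bytes > HARD_LIMIT:
        return "too large"
    if size_bytes > SOFT_LIMIT:
        return "above recommended size"
    return "ok"


def oversized_files(root: str) -> dict[str, str]:
    """Return {relative path: classification} for every file under root that is not 'ok'."""
    report = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            status = classify_size(os.path.getsize(full))
            if status != "ok":
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                report[rel] = status
    return report


if __name__ == "__main__":
    # Hypothetical directory name; point this at your own data directory.
    print(oversized_files("my_submission"))
```

An empty result means every file is within the recommended size. Files reported as "above recommended size" can still be uploaded, but splitting them may make the upload more reliable.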

4. Prepare your data

Now, organise your data for submission. Your data files should be in a single directory tree with a logical hierarchical file structure. Please note that the file paths will not be directly visible on the entry’s web page, so if the directory structure encodes information, please record that information as metadata in the file list (see section Provide File level metadata).

Please note that some special characters are not allowed in directory and file names. Please only use the following allowed characters to name your files:

Any alphanumeric character: a-z, A-Z, 0-9
Any of the following special characters: ! - _ . * ' ( ) and space
- Exclamation point !
- Hyphen -
- Underscore _
- Period .
- Asterisk *
- Single quote '
- Open parenthesis (
- Close parenthesis )
- Space
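You may want to check your file and directory names against the allowed characters before uploading. The sketch below is one way to do that, under the assumption that the single quote in the list means the plain ASCII apostrophe ('). The function name and example paths are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Allowed: alphanumerics, the special characters ! - _ . * ' ( ) and space.
# Assumption: "single quote" means the ASCII apostrophe, not a typographic quote.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9!\-_.*'() ]+")


def invalid_path_parts(relative_path: str) -> list[str]:
    """Return the directory and file names in a relative path that contain disallowed characters."""
    parts = PurePosixPath(relative_path).parts
    return [part for part in parts if not _ALLOWED_NAME.fullmatch(part)]


if __name__ == "__main__":
    # Hypothetical example paths, written with forward slashes.
    for path in ["sample_01/slide (2).ome.tiff", "sample#1/image@2x.tiff"]:
        print(path, "->", invalid_path_parts(path) or "ok")
```

Each component of the path is checked separately, so the result tells you which directory or file needs renaming.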

5. Upload your data

We support data upload through multiple methods, each recommended depending on the size of your submission. You can find detailed information about how to upload your data in section 3 here.

6. Prepare your file list

You will need a “table of contents” for your image files, called a “file list” in the submission tool here. The purpose of the file list is to link specific files to your submission and to provide information about the individual files (see section File level metadata).

You can find further information about the file list and how to create it at the file list help page.
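If your dataset has many files, you may prefer to generate a first draft of the file list with a script and then edit it. The sketch below assumes a tab-separated file whose first column is named Files and holds paths relative to the top of your data directory. Please confirm the exact layout on the file list help page. The Channel column and the example paths are hypothetical.

```python
import csv
import io
import os


def list_files(root: str) -> list[str]:
    """Return sorted file paths under root, relative to root, with forward slashes."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            paths.append(rel.replace(os.sep, "/"))
    return sorted(paths)


def file_list_tsv(paths: list[str], metadata: dict[str, dict[str, str]]) -> str:
    """Render a TSV file list: one row per file, one column per metadata key."""
    # Collect every metadata key used by any file so all rows share the same columns.
    columns = sorted({key for row in metadata.values() for key in row})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumption: the first column header is "Files".
    writer.writerow(["Files", *columns])
    for path in paths:
        row = metadata.get(path, {})
        writer.writerow([path, *(row.get(col, "") for col in columns)])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical paths and metadata; in practice use list_files("your_directory").
    example_paths = ["slide1/img1.ome.tiff", "slide1/img2.ome.tiff"]
    example_metadata = {"slide1/img1.ome.tiff": {"Channel": "DAPI"}}
    print(file_list_tsv(example_paths, example_metadata))
```

Files without an entry in the metadata mapping get empty cells, which you can fill in by hand afterwards.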

7. Submission

Provide Study level metadata

To provide general metadata about the study, images, and annotations, you can use the web form in our submission tool. First, you will need to create a submission here. Make sure you select the BioImages template when creating a new submission. You will have to fill in all of the required metadata fields before submitting.

Please make sure you specify what ST technology was used for both acquisition and analysis (including versions). If any data has been pre-processed, please describe your pre-processing pipeline in the Image Analysis metadata module.

If you are submitting images to the BIA and other data types (such as gene expression matrices) to other relevant repositories, you must link any submission made to other archives in the Linked Information section of our submission tool. Submission S-BIAD1415 is an example of how to link depositions of other data types with a BIA submission. You can find more information about linking information here.

Linked information section from S-BIAD1415. The authors include links to related data depositions in ArrayExpress and PRIDE.

Provide File level metadata

Some metadata that changes among the files of a study component, or of an annotations section, needs to be given at the file level. This metadata should be added in the relevant row for a given file in the file list.

For example, the file list of submission S-BIAD1182 lists the fluorescent labels and RNA targets used in each image.

Example file list from S-BIAD1182. The authors provide the fluorescent labels imaged on each channel, and the corresponding RNA targets. Channel and target information varies for each file.

Please use ontologies and controlled vocabularies for the values of the additional metadata where possible. This will improve the findability and interoperability of your data.

If you are submitting annotations, e.g. segmentation masks, point annotations etc., you will need to add one or more Annotations sections to your submission. Each section needs a separate file list that contains the paths to the annotation files and includes a “source image” column referencing the original image that has been annotated. You can find a template for the annotations file list here.
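As an illustration of the “source image” column, the sketch below builds annotation file list rows from a mapping of annotation file to annotated image, and reports any source image that is missing from your image file list. The Files header, the function names and the example paths are assumptions made for this example. Use the headers from the template when preparing a real submission.

```python
def annotation_rows(source_of: dict[str, str]) -> list[dict[str, str]]:
    """One row per annotation file, with the image it annotates in 'source image'."""
    return [
        {"Files": annotation, "source image": image}
        for annotation, image in sorted(source_of.items())
    ]


def unresolved_sources(source_of: dict[str, str], image_paths: set[str]) -> list[str]:
    """Return source images that are referenced but absent from the image file list."""
    return sorted({image for image in source_of.values() if image not in image_paths})


if __name__ == "__main__":
    # Hypothetical paths: two masks that annotate images in the Study Component.
    source_of = {
        "masks/slide1_mask.tiff": "images/slide1.ome.tiff",
        "masks/slide2_mask.tiff": "images/slide2.ome.tiff",
    }
    image_paths = {"images/slide1.ome.tiff"}
    print(annotation_rows(source_of))
    print("missing source images:", unresolved_sources(source_of, image_paths))
```

Running the check before submission helps catch annotation rows that point at an image path that was renamed or not uploaded.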

Submitting SpatialData Objects

A SpatialData Object is a Zarr-based file format with JSON metadata at the top level. It usually contains multiple subdirectories:

images/ - one or more raster images (e.g. H&E, staining)
labels/ - raster images (segmentation masks)
points/ - point coordinates (e.g. transcripts)
shapes/ - geometric shapes (e.g. cell segmentations, ROIs)
tables/ - AnnData objects (gene expression, cell types, etc.)
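To see which of these subdirectories a given object contains, you can inspect the Zarr directory directly. The sketch below uses only the Python standard library and the subdirectory names listed above. It does not use the spatialdata library and does not validate the object. The function name and the demo tree are hypothetical.

```python
import os
import tempfile

ELEMENT_GROUPS = ("images", "labels", "points", "shapes", "tables")


def list_elements(zarr_root: str) -> dict[str, list[str]]:
    """Map each element group found under zarr_root to the sorted names of its subdirectories."""
    found = {}
    for group in ELEMENT_GROUPS:
        group_dir = os.path.join(zarr_root, group)
        if os.path.isdir(group_dir):
            found[group] = sorted(
                entry
                for entry in os.listdir(group_dir)
                if os.path.isdir(os.path.join(group_dir, entry))
            )
    return found


if __name__ == "__main__":
    # Build a tiny, empty directory tree that only mimics the layout (not a valid Zarr store).
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "mysdata.zarr")
        os.makedirs(os.path.join(root, "images", "he_image"))
        os.makedirs(os.path.join(root, "points", "transcripts"))
        print(list_elements(root))
```

The resulting summary can be a convenient starting point when describing the contents of a SpatialData object in your file list.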

Submission of SpatialData objects

SpatialData objects should be zipped at the top level, e.g. mysdata.zarr.zip. You can use the following command to zip your SpatialData object with no compression (replace mysdata.zarr.zip and mysdata.zarr/ with the names for your own object):

zip -0 -r mysdata.zarr.zip mysdata.zarr/
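If the zip command is not available on your system, a similar archive can be produced with Python's standard zipfile module. This is a sketch of an alternative, not the officially recommended route: it stores files without compression and keeps the top-level folder name, but unlike zip -r it does not add entries for empty directories. The function names are hypothetical.

```python
import os
import tempfile
import zipfile


def zip_stored(zarr_dir: str, zip_path: str) -> None:
    """Write zarr_dir into zip_path without compression, keeping the top-level folder name."""
    parent = os.path.dirname(os.path.abspath(zarr_dir))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for dirpath, _dirnames, filenames in os.walk(zarr_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Store paths relative to the parent so they start with the folder name.
                archive.write(full, arcname=os.path.relpath(full, parent))


def is_uncompressed(zip_path: str) -> bool:
    """Return True if every entry in the archive is stored without compression."""
    with zipfile.ZipFile(zip_path) as archive:
        return all(info.compress_type == zipfile.ZIP_STORED for info in archive.infolist())


if __name__ == "__main__":
    # Demonstrate on a tiny stand-in directory (not a valid Zarr store).
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "mysdata.zarr")
        os.makedirs(os.path.join(source, "images"))
        with open(os.path.join(source, "images", "chunk"), "w") as handle:
            handle.write("x" * 1000)
        target = os.path.join(tmp, "mysdata.zarr.zip")
        zip_stored(source, target)
        print("stored without compression:", is_uncompressed(target))
```

The is_uncompressed check also works on archives created with the zip command, so you can use it to confirm that an existing archive was written with no compression.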

The submission must have:

A Study Component that includes the biosample, specimen and imaging metadata (see REMBI Lab Guide and REMBI Model Reference for more guidance)
One or more Annotation components holding the annotation metadata (segmentation, labels, etc.; see MIFA metadata for more guidance). A single Annotation component can cover different annotation types if they were produced with the same method/software and the same criteria, coverage, etc.
A column in the Study Component file list indicating which files are SpatialData objects
Additional columns in the file lists describing the contents of each SpatialData object, i.e. its components

If you have a Vitessce config file, you can also upload and submit it by including it in the Study Component file list.

Examples of File Lists for Study and Annotations component for SpatialData Object submissions

Filelist template for Study Component (please delete the examples when submitting): Spatialdata_filelist_example
Filelist template for Annotation component (please delete the examples when submitting): Spatialdata_annotation_filelist_example

8. Complete your submission

Once you have filled in all the necessary fields in the web forms, the validation panel on the left should indicate “all ok”. When that is the case, you can click on “Submit”.

After your submission is loaded into the BioImage Archive database, it will be assigned a unique BioImage Archive accession number (i.e., dataset identifier).

It is important to note that your submission will be private, and other people will not be able to see it, until the release date you have set. The accession number will be the same before and after making the dataset public and will not change even if revisions are necessary. Details on how to access and share your study with others will be available in a confirmation window and will also be sent via email. A private link will be available for you to share your dataset with reviewers and collaborators before the release date.

Further information

The BioImage Archive Online Tutorial
The BioImage Archive – Building a Home for Life-Sciences Microscopy Data: the paper describing the BioImage Archive

A paper detailing the need for the BioImage Archive was featured in Nature Methods in November 2018.