Single-cell submission guide

  1. Sample information
  2. Library and sequencing information
  3. File formats
  4. Droplet-based technologies
  5. A few more tips

Start your single-cell submission using the sequencing template in Annotare. Follow the guides below for information about data formats and the metadata to include. Note that large experiments with more than 1000 raw data files are started in Annotare but completed manually with the help of a curator.

1. Sample information

Required sample attributes

Minimum sample attributes for primary cells from different species and cell lines

Human Vertebrates Plants Cell lines Comment
organism organism organism organism
strain or breed cultivar or ecotype
age age age
developmental stage developmental stage developmental stage developmental stage
sex sex
disease disease
genotype genotype genotype genetic modification
organism part organism part organism part
cell type* cell type* cell type * if known, see comment below
individual individual donor or animal ID
cell line name of commercial cell line

Mandatory information that may be applicable

  • Unknown cell type

    If your experiment involves discovering new cell types, or you work with pooled cells from a whole organ or organism, the precise cell type might be undefined at the start of the experiment and only inferred later during the analysis. In that case, annotate the samples with the inferred cell type using the attribute "inferred cell type". The attribute "cell type" should refer to the input cell type that is known at the start of the experiment. Example: E-MTAB-5953

  • Experimental variable

    If the purpose of your experiment is to compare the heterogeneity within one cell population, you might not have any other experimental variables. In this case, you can use "single cell identifier" as the experimental variable and enter the cell IDs as values. Example: E-MTAB-6142

  • Quality control pre-analysis

    If your single-cell experiment included visual inspection of the cells before lysis, you can include this information in the sample attributes as "single cell well quality". The values for this attribute should be "OK" or "not OK" (the latter meaning that the well should be discarded from data analysis). If you have more detailed information about the "bad" wells, use the following terms:

    • "debris"
    • "multiple cells" or number of cells if known
    • "dead cell"
    • "no cell"

    Example: E-MTAB-5522

  • Quality control post-analysis

    You might have quality information about the dataset, e.g. which cells were excluded from your analysis due to insufficient sequencing quality. You can use the attribute "post analysis single cell quality" and annotate each cell with "pass" or "fail". Example: E-MTAB-5522

Examples of additional single-cell sample attributes

Example | single cell identifier (experimental variable) | inferred cell type | single cell well quality | post analysis single cell quality
Sample 1 | cell 1 | cell type A | OK | pass
Sample 2 | cell 2 | cell type B | OK | pass
Sample 3 | cell 3 | not applicable | OK | fail
Sample 4 | cell 4 | cell type C | OK | pass
Sample 5 | cell 5 | not applicable | 2 cells | fail
Sample 6 | cell 6 | not applicable | debris | fail
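The controlled values described above for the two quality attributes lend themselves to a simple check before upload. The following is a minimal Python sketch, not an Annotare validator: each sample row is a plain dict keyed by the attribute names used in this guide, and the pattern accepted for a cell count such as "2 cells" is an assumption.

```python
import re

# Terms listed in this guide for "single cell well quality".
WELL_QUALITY_TERMS = {"OK", "not OK", "debris", "multiple cells", "dead cell", "no cell"}
# Terms listed in this guide for "post analysis single cell quality".
POST_ANALYSIS_TERMS = {"pass", "fail"}
# The guide also allows a number of cells, e.g. "2 cells".
# This regular expression for that form is an assumption of the sketch.
CELL_COUNT = re.compile(r"^\d+ cells?$")


def check_row(row: dict) -> list:
    """Return the problems found in one sample row (an empty list if none)."""
    problems = []
    well = row.get("single cell well quality")
    if well is not None and well not in WELL_QUALITY_TERMS and not CELL_COUNT.match(well):
        problems.append(f"unexpected well quality: {well!r}")
    post = row.get("post analysis single cell quality")
    if post is not None and post not in POST_ANALYSIS_TERMS:
        problems.append(f"unexpected post-analysis quality: {post!r}")
    return problems


# Three rows taken from the example table above.
rows = [
    {"sample": "Sample 1", "single cell identifier": "cell 1",
     "single cell well quality": "OK", "post analysis single cell quality": "pass"},
    {"sample": "Sample 5", "single cell identifier": "cell 5",
     "single cell well quality": "2 cells", "post analysis single cell quality": "fail"},
    {"sample": "Sample 6", "single cell identifier": "cell 6",
     "single cell well quality": "debris", "post analysis single cell quality": "fail"},
]

for row in rows:
    print(row["sample"], check_row(row) or "no problems")
```

All three example rows pass; a value such as "broken" in "single cell well quality" would be reported.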

Sample collection protocol

  • Method of cell singularisation

    Please describe how the cells were treated and separated into single cells, e.g. by FACS or microfluidics (Fluidigm).

2. Library and sequencing information

Library construction protocol

  • Single-cell library construction

    Please name the single-cell library construction method that was used (e.g. Smart-seq2, 10x, Drop-seq) and give any relevant literature references.

  • Library construction kit

    Please include the name, manufacturer and catalogue number of the library preparation kit(s) that were used.

Sequencing protocol

  • Technical replicates

    Please include details about technical replicates, e.g. if the same libraries were sequenced multiple times or across several lanes.

Spike-in RNAs

  • Commercial spike-ins

    If you have used commercially available spike-ins, please include the kit name, catalogue number and dilution in the library construction protocol.

  • Custom spike-ins

    If you have used a non-commercial spike-in set, describe it in the library construction protocol. Additionally, upload a table (tab-delimited text) with the names and concentrations of the spike-ins, and a FASTA file containing their nucleotide sequences. Example:

  • Spike-in annotation

    You can include spike-in information at the cell level in the sample annotation under "spike in" and "spike in dilution".

    Example of spike-in annotation

    Example | spike in | spike in dilution
    Sample 1 | ERCC mix1 | 1:40000
    Sample 2 | ERCC mix2 | 1:40000
    Sample 3 | ERCC mix1 and mix2 | 1:40000
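A custom spike-in set needs two extra files: a tab-delimited table of names and concentrations, and a FASTA file of sequences. The sketch below writes both with the Python standard library. The spike-in names, sequences, concentration unit and output file names are hypothetical placeholders; replace them with your own set and state the unit in the library construction protocol.

```python
import csv
from pathlib import Path

# (name, concentration, nucleotide sequence): hypothetical placeholder values.
SPIKE_INS = [
    ("custom_spike_01", 150.0, "ACGTACGTTAGCCTAGGATCCA"),
    ("custom_spike_02", 37.5, "TTGACCGATCGGATCCAAGTTC"),
]


def write_spike_in_files(spike_ins, table_path: Path, fasta_path: Path,
                         line_width: int = 60) -> None:
    """Write a tab-delimited concentration table and a FASTA file of sequences."""
    # Table: header row, then one row per spike-in (the unit is an assumption).
    with table_path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["name", "concentration (attomoles/ul)"])
        for name, concentration, _sequence in spike_ins:
            writer.writerow([name, concentration])
    # FASTA: a ">name" header line, then the sequence wrapped at line_width.
    with fasta_path.open("w") as handle:
        for name, _concentration, sequence in spike_ins:
            handle.write(f">{name}\n")
            for start in range(0, len(sequence), line_width):
                handle.write(sequence[start:start + line_width] + "\n")


if __name__ == "__main__":
    write_spike_in_files(SPIKE_INS, Path("spike_in_concentrations.txt"),
                         Path("spike_in_sequences.fa"))
```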

Unique molecular identifiers

  • If your single-cell protocol uses unique molecular identifiers (UMIs), include all relevant information about the UMI barcodes to enable re-analysis of the data. We suggest including the following attributes:

    • UMI barcode read (the file that contains the UMI). Values: read1/read2/index1/index2
    • UMI barcode offset (start position of UMI barcode in the sequence). Values: (number, 0 for start of read)
    • UMI barcode size (length of UMI barcode in bp). Values: (number)
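These three attributes are enough to locate the UMI in every read, which is what makes re-analysis possible. A minimal Python sketch, assuming reads are held as plain strings keyed by read1/read2/index1/index2; the read sequences and the offset and size values in the example are hypothetical:

```python
from dataclasses import dataclass

VALID_READS = {"read1", "read2", "index1", "index2"}


@dataclass(frozen=True)
class UmiSpec:
    read: str    # "UMI barcode read": which file holds the UMI
    offset: int  # "UMI barcode offset": 0 means the start of the read
    size: int    # "UMI barcode size": UMI length in bp

    def __post_init__(self):
        if self.read not in VALID_READS:
            raise ValueError(f"UMI barcode read must be one of {sorted(VALID_READS)}")
        if self.offset < 0 or self.size <= 0:
            raise ValueError("offset must be >= 0 and size must be > 0")


def extract_umi(reads: dict[str, str], spec: UmiSpec) -> str:
    """Return the UMI substring of the read named by spec.read."""
    sequence = reads[spec.read]
    if spec.offset + spec.size > len(sequence):
        raise ValueError("read is shorter than offset + size")
    return sequence[spec.offset:spec.offset + spec.size]


# Hypothetical example: an 8 bp UMI starting 6 bp into read1.
spec = UmiSpec(read="read1", offset=6, size=8)
reads = {"read1": "AAAAAACCGGTTAATTTT", "read2": "GATTACAGATTACA"}
print(extract_umi(reads, spec))  # CCGGTTAA
```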

3. File formats

  • Raw data

    Raw read data should be submitted in fastq.gz format. Prepare one file per cell (or two if you have used paired-end sequencing), following the general recommendations of the European Nucleotide Archive (ENA). The exception is droplet-based technologies such as 10x and Drop-seq; for these protocols, please follow the guide below.

  • Processed data

    Processed data is welcome in addition to raw data. Most commonly this is a raw or normalised read count matrix, and any matrix file should be tab-delimited text. Other analysis result files, e.g. alignment files or annotation tracks, can also be included.
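A count matrix in tab-delimited text can be written with the Python standard library alone. This is a minimal sketch; the gene and cell names and the counts are hypothetical, and the orientation (genes as rows, cell IDs in the header row) is an assumption, so describe whichever layout you use in your protocol.

```python
import csv
import io


def write_count_matrix(handle, genes, cells, counts) -> None:
    """Write counts[i][j] (count of genes[i] in cells[j]) as tab-delimited text."""
    if len(counts) != len(genes) or any(len(row) != len(cells) for row in counts):
        raise ValueError("counts must have one row per gene and one column per cell")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header row: a label for the gene column followed by the cell IDs.
    writer.writerow(["gene_id", *cells])
    for gene, row in zip(genes, counts):
        writer.writerow([gene, *row])


buffer = io.StringIO()
write_count_matrix(buffer, ["gene_a", "gene_b"], ["cell_1", "cell_2", "cell_3"],
                   [[0, 5, 2], [12, 0, 7]])
print(buffer.getvalue(), end="")
```

For a real file, pass a handle opened with `open("counts.txt", "w", newline="")` (the file name is only an example).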

4. Droplet-based technologies

For droplet-based technologies such as 10x and Drop-seq, we accept submission of multiplexed data. A few extra rules apply.

Sample annotation

  • Create and annotate one sample per library (often containing several thousand individual cells) instead of one sample per cell.

Library layout

  • For 10x technology, include the version of the 10x chemistry, e.g. v1 or v2, and provide the exact name of the library construction kit and catalogue number in the "library construction protocol".

  • For other large-scale single-cell sequencing methods, or where the standard 10x protocol was modified, please specify the multiplexing and barcodes. We suggest including the following attributes to specify the position and size of each read and barcode:

    Attribute | Description | Possible values
    cDNA read | the file that contains the cDNA read | index1/index2/read1/read2
    cDNA read offset | offset in sequence for cDNA read (in bp) | number, 0 for start of read
    cDNA read size | length of cDNA read (in bp) | number
    UMI barcode read | the file that contains the UMI barcode read | index1/index2/read1/read2
    UMI barcode offset | offset in sequence for UMI barcode read (in bp) | number, 0 for start of read
    UMI barcode size | length of UMI barcode read (in bp) | number
    cell barcode read | the file that contains the cell barcode read | index1/index2/read1/read2
    cell barcode offset | offset in sequence for cell barcode read (in bp) | number, 0 for start of read
    cell barcode size | length of cell barcode read (in bp) | number
    sample barcode read | the file that contains the sample barcode read | index1/index2/read1/read2
    sample barcode offset | offset in sequence for sample barcode read (in bp) | number, 0 for start of read
    sample barcode size | length of sample barcode read (in bp) | number
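The table above amounts to a read layout: for each component, which file it is in, where it starts and how long it is. The Python sketch below flattens such a layout into those twelve attributes and checks that no two segments overlap within one read. The example values follow the commonly documented layout of 10x Chromium Single Cell 3' v2 (16 bp cell barcode followed by a 10 bp UMI in read 1, cDNA in read 2, 8 bp sample index in index 1); the cDNA read size of 98 bp is only an example and depends on your sequencing run, so check your own kit documentation.

```python
from dataclasses import dataclass

VALID_READS = ("index1", "index2", "read1", "read2")

# Attribute names for each component, exactly as listed in the table.
ATTRIBUTE_NAMES = {
    "cDNA": ("cDNA read", "cDNA read offset", "cDNA read size"),
    "UMI": ("UMI barcode read", "UMI barcode offset", "UMI barcode size"),
    "cell": ("cell barcode read", "cell barcode offset", "cell barcode size"),
    "sample": ("sample barcode read", "sample barcode offset", "sample barcode size"),
}


@dataclass(frozen=True)
class Segment:
    read: str    # which file holds this segment
    offset: int  # start position in bp, 0 for start of read
    size: int    # length in bp


def layout_attributes(layout: dict[str, Segment]) -> dict[str, str]:
    """Flatten a read layout into attribute name -> value pairs."""
    attributes = {}
    for component, segment in layout.items():
        if segment.read not in VALID_READS:
            raise ValueError(f"{component}: read must be one of {VALID_READS}")
        read_attr, offset_attr, size_attr = ATTRIBUTE_NAMES[component]
        attributes[read_attr] = segment.read
        attributes[offset_attr] = str(segment.offset)
        attributes[size_attr] = str(segment.size)
    return attributes


def check_overlaps(layout: dict[str, Segment]) -> None:
    """Raise ValueError if two segments in the same read overlap."""
    by_read = {}
    for component, segment in layout.items():
        end = segment.offset + segment.size
        by_read.setdefault(segment.read, []).append((segment.offset, end, component))
    for read, spans in by_read.items():
        spans.sort()
        # After sorting by start, each span must begin at or after the previous end.
        for (_, prev_end, prev_name), (start, _, name) in zip(spans, spans[1:]):
            if start < prev_end:
                raise ValueError(f"{prev_name} and {name} overlap in {read}")


# Illustrative 10x Chromium 3' v2 layout; the cDNA size depends on the run.
tenx_v2 = {
    "cell": Segment("read1", 0, 16),
    "UMI": Segment("read1", 16, 10),
    "cDNA": Segment("read2", 0, 98),
    "sample": Segment("index1", 0, 8),
}
check_overlaps(tenx_v2)
for name, value in layout_attributes(tenx_v2).items():
    print(f"{name}\t{value}")
```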

File format for droplet-based technologies

  • For 10x technology, please provide the fastq.gz files as generated by the Cell Ranger software from BCL files. There are usually 3-4 fastq.gz files per library, containing the cDNA read and several barcode reads in known positions.

    We can currently validate files that follow the 10x file naming conventions:

    10x version 1 | 10x version 2
    R1 + R2 + R3 + I1, or RA + I1 + I2 | R1 + R2 + I1

  • Processed data: because there is no information about individual cells at the sample annotation or file level, we recommend including processed data, e.g. a read count matrix, that describes analysis results such as the "inferred cell type" (see above).

  • Additional files: we encourage you to upload any additional files that facilitate data analysis, e.g. text files listing the known barcodes in the library. These files will be linked to the experiment and don't need to be associated with any specific sample in the submission. Please describe such files in an appropriate protocol, e.g. the normalisation data transformation protocol. Example: E-MTAB-6153_sample_barcodes.txt
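Before uploading a 10x library, you can check that its fastq.gz files form one of the read-type sets listed in the 10x file naming table. This Python sketch is not the archive's validator. It assumes each file name carries its read type as a token preceded by "_" or "-" and followed by "_" or ".", as in Illumina bcl2fastq names such as "sample_S1_L001_R1_001.fastq.gz" or names like "read-RA_si-..._lane-001-chunk-001.fastq.gz"; the file names in the example are hypothetical.

```python
import re

# Accepted read-type sets per 10x chemistry version, as listed in the naming table.
LAYOUTS = {
    "10x version 1": [{"R1", "R2", "R3", "I1"}, {"RA", "I1", "I2"}],
    "10x version 2": [{"R1", "R2", "I1"}],
}

# A read-type token preceded by "_" or "-" and followed by "_" or "." (assumed naming).
READ_TOKEN = re.compile(r"[_-](R[123]|RA|I[12])(?=[_.])")


def read_type(filename: str) -> str:
    """Return the last read-type token in a file name, e.g. "R1" or "I1"."""
    matches = READ_TOKEN.findall(filename)
    if not matches:
        raise ValueError(f"no read type found in {filename!r}")
    return matches[-1]


def match_layout(filenames: list[str]) -> list[str]:
    """Return the 10x versions whose read-type set matches this library's files."""
    found = {read_type(name) for name in filenames}
    return [version for version, options in LAYOUTS.items() if found in options]


# Hypothetical file names for one 10x v2 library sequenced on a single lane.
library = [
    "pbmc_S1_L001_R1_001.fastq.gz",
    "pbmc_S1_L001_R2_001.fastq.gz",
    "pbmc_S1_L001_I1_001.fastq.gz",
]
print(match_layout(library))  # ['10x version 2']
```

Because the read types are collected into a set, files from several lanes of the same library still match as long as each lane has the same read types.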

5. A few more tips

  • Use experiment type "RNA-seq of coding RNA from single cells" for single-cell RNA-sequencing experiments.

  • Check the library strand information for your library protocol or kit and enter the relevant strand (see the guide to sequencing library information).

  • If an attribute or protocol parameter differs between files, please include this information at the level of sample annotation. Alternatively, create multiple protocols of the same type and assign them to individual samples (e.g. if your files come from two different sequencer models, create two sequencing protocols).