Single-cell submission guide
- Sample information
- Library and sequencing information
- File formats
- Droplet-based technologies
- A few more tips
Start your single-cell submission using the sequencing template in Annotare. Follow the guides below for information about data formats and what metadata to include. Note that large experiments with more than 1000 raw data files are started in Annotare but will be completed manually with the help of a curator.
Required sample attributes
Minimum sample attributes for primary cells from different species and cell lines
|strain or breed||cultivar or ecotype|
|developmental stage||developmental stage||developmental stage||developmental stage|
|organism part||organism part||organism part|
|cell type*||cell type*||cell type||* if known, see comment below|
|individual||individual||donor or animal ID|
|cell line||name of commercial cell line|
Mandatory information that may be applicable
Unknown cell type
If your experiment involves discovering new cell types or you work with pooled cells from a whole organ or organism, the precise cell type might be undefined at the start of the experiment but inferred later during the analysis. In that case, annotate the samples with the inferred cell type using the attribute "inferred cell type". The attribute "cell type" should refer to the input cell type that is known at the start of the experiment. Example: E-MTAB-5953
If the purpose of your experiment is to compare the heterogeneity within one cell population, you might not have any other experimental variables. In this case, you can use the term "single cell identifier" as the experimental variable and enter the cell IDs as values. Example: E-MTAB-6142
Quality control pre-analysis
If your single-cell experiment included visual inspection of the cells before lysis, you can include this information in the sample attributes as "single cell well quality". The values for this attribute should be "OK" or "not OK" (i.e. the well should be discarded from data analysis). If you have more detailed information about the "bad" wells, the following terms should be used:
- "multiple cells" or number of cells if known
- "dead cell"
- "no cell"
Quality control post-analysis
You might have quality information about the dataset, e.g. which cells were excluded from your analysis due to insufficient sequencing quality. You can use the attribute "post analysis single cell quality" and annotate each cell with "pass" or "fail". Example: E-MTAB-5522
Examples of additional single-cell sample attributes
|Example||single cell identifier (experimental variable)||inferred cell type||single cell well quality||post-analysis single cell quality|
|Sample 1||cell 1||cell type A||OK||pass|
|Sample 2||cell 2||cell type B||OK||pass|
|Sample 3||cell 3||not applicable||OK||fail|
|Sample 4||cell 4||cell type C||OK||pass|
|Sample 5||cell 5||not applicable||2 cells||fail|
|Sample 6||cell 6||not applicable||debris||fail|
Sample collection protocol
Method of cell singularisation
Please include a description of how the cells were treated and separated into single cells, e.g. FACS or microfluidics (Fluidigm).
2. Library and sequencing information
Library construction protocol
Single-cell library construction
Please mention the type of single-cell library method that was used (e.g. Smart-seq2, 10x, Drop-seq) and give any relevant literature references.
Library construction kit
Please include the name, manufacturer and catalogue number of the library preparation kit(s) that were used.
Please include details about technical replicates, e.g. if the same libraries were sequenced multiple times or across several lanes.
If you have used spike-ins that are commercially available, please include the kit name, catalogue number and dilution in the library construction protocol.
If you have used a non-commercial spike-in set, describe this in the library construction protocol. Additionally, upload a table (tab-delimited text) with the names and concentrations of the spike-ins, and a fasta file containing the nucleotide sequences. Example: E-MTAB-3624.additional.1.zip
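The two files described above could be produced along these lines. This is only a minimal sketch: the spike-in names, concentrations and sequences are made up for illustration, the column headers are an assumption (the guide does not prescribe them), and the concentration unit should be stated in your protocol or in the header.

```python
def spike_in_table(spike_ins):
    """Return a tab-delimited table with one row per spike-in."""
    lines = ["name\tconcentration"]  # assumed header, not prescribed by the guide
    for spike in spike_ins:
        lines.append(f"{spike['name']}\t{spike['concentration']}")
    return "\n".join(lines) + "\n"


def spike_in_fasta(spike_ins, width=60):
    """Return FASTA text with sequences wrapped to a fixed line width."""
    records = []
    for spike in spike_ins:
        sequence = spike["sequence"]
        wrapped = "\n".join(
            sequence[i:i + width] for i in range(0, len(sequence), width)
        )
        records.append(f">{spike['name']}\n{wrapped}")
    return "\n".join(records) + "\n"


# Hypothetical spike-in set, for illustration only.
spike_ins = [
    {"name": "spike_A", "concentration": "100", "sequence": "ACGT" * 20},
    {"name": "spike_B", "concentration": "25", "sequence": "GGCCTTAA"},
]

table_text = spike_in_table(spike_ins)
fasta_text = spike_in_fasta(spike_ins)
print(table_text)
print(fasta_text)
```

Writing `table_text` and `fasta_text` to disk gives the two files to upload alongside the raw data.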
You can include spike-in information at the cell level in the sample annotation under "spike in" and "spike in dilution".
Example of spike-in annotation
|Example||spike in||spike in dilution|
|Sample 1||ERCC mix1||1:40000|
|Sample 2||ERCC mix2||1:40000|
|Sample 3||ERCC mix1 and mix2||1:40000|
Unique molecular identifiers
If your single-cell protocol uses unique molecular identifiers (UMIs), all relevant information about the UMI barcodes should be included to enable re-analysis of the data. We suggest including the following:
- UMI barcode read (the file that contains the UMI). Values: read1/read2/index1/index2
- UMI barcode offset (start position of UMI barcode in the sequence). Values: (number, 0 for start of read)
- UMI barcode size (length of UMI barcode in bp). Values: (number)
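To illustrate how these three values are used in re-analysis, the sketch below cuts a UMI out of a read sequence. The annotation values and the read sequence are hypothetical, and the function name `extract_umi` is ours; the offset is 0-based, matching "0 for start of read" above.

```python
def extract_umi(read_sequence: str, offset: int, size: int) -> str:
    """Return the UMI located at a 0-based offset with the given length in bp."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    if offset + size > len(read_sequence):
        raise ValueError("UMI extends past the end of the read")
    return read_sequence[offset:offset + size]


# Hypothetical annotation: the UMI sits in read1, after a 16 bp barcode.
umi_annotation = {
    "UMI barcode read": "read1",
    "UMI barcode offset": 16,
    "UMI barcode size": 10,
}

# Made-up read1 sequence: 16 bp, then the 10 bp UMI, then 4 bp of padding.
example_read1 = "ACGTACGTACGTACGT" + "TTGCAAGGCC" + "TTTT"

umi = extract_umi(
    example_read1,
    umi_annotation["UMI barcode offset"],
    umi_annotation["UMI barcode size"],
)
print(umi)  # TTGCAAGGCC
```

If the offset or size is missing or wrong, a re-analysis would silently pick up the wrong bases, which is why all three values are requested.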
Raw read data should be submitted in fastq.gz format. Prepare one file per cell (or two if you have used paired-end sequencing), following the general recommendations from the European Nucleotide Archive (ENA). Droplet-based technologies like 10x and Drop-seq are an exception; for those protocols please follow the guide below.
Processed data is welcome in addition to raw data. Most commonly this would be a raw or normalised read count matrix. Any matrix file should be in tab-delimited text format. Other analysis result files, e.g. alignment files or annotation tracks, can also be included.
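As a rough illustration of a tab-delimited count matrix, the sketch below writes genes as rows and cells as columns using Python's standard `csv` module. The orientation, the `gene` header label and all values are assumptions made for the example; the guide only requires tab-delimited text.

```python
import csv
import io


def write_count_matrix(handle, genes, cells, counts):
    """Write a genes x cells count matrix as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + list(cells))
    for gene, row in zip(genes, counts):
        if len(row) != len(cells):
            raise ValueError(f"row for {gene} does not match the number of cells")
        writer.writerow([gene] + list(row))


# Made-up example data.
genes = ["geneA", "geneB"]
cells = ["cell 1", "cell 2", "cell 3"]
counts = [[5, 0, 12], [0, 3, 1]]

buffer = io.StringIO()  # replace with open("counts.txt", "w", newline="") for a real file
write_count_matrix(buffer, genes, cells, counts)
matrix_text = buffer.getvalue()
print(matrix_text)
```

Using the same cell names in the column header as in the sample annotation makes it easier to link the matrix back to the submitted samples.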
For droplet-based technologies, like 10x and Drop-seq, we allow submission of multiplexed data. A few extra rules apply here.
- Create and annotate 1 sample per library (often containing several thousands of individual cells), instead of 1 sample per cell.
For 10x technology, include the version of the 10x chemistry, e.g. v1 or v2, and provide the exact name of the library construction kit and catalogue number in the "library construction protocol".
For other large-scale single-cell sequencing methods or where modifications were made to the standard 10x protocol, please include specifications about the multiplexing and barcodes. We suggest including the following to specify the positions and sizes of the barcodes:
|Attribute||Description||Possible values|
|cDNA read||the file that contains the cDNA read||index1/index2/read1/read2|
|cDNA read offset||offset in sequence for cDNA read (in bp)||(number, 0 for start of read)|
|cDNA read size||length of cDNA read (in bp)||(number)|
|UMI barcode read||the file that contains the UMI barcode read||index1/index2/read1/read2|
|UMI barcode offset||offset in sequence for UMI barcode read (in bp)||(number, 0 for start of read)|
|UMI barcode size||length of UMI barcode read (in bp)||(number)|
|cell barcode read||the file that contains the cell barcode read||index1/index2/read1/read2|
|cell barcode offset||offset in sequence for cell barcode read (in bp)||(number, 0 for start of read)|
|cell barcode size||length of cell barcode read (in bp)||(number)|
|sample barcode read||the file that contains the sample barcode read||index1/index2/read1/read2|
|sample barcode offset||offset in sequence for sample barcode read (in bp)||(number, 0 for start of read)|
|sample barcode size||length of sample barcode read (in bp)||(number)|
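Before entering these attributes, it can help to check them for obvious mistakes. The sketch below is one possible self-check, not part of Annotare: the function name `validate_layout` and the example values are hypothetical, and the rule that a size must be at least 1 is our assumption.

```python
ALLOWED_READS = {"index1", "index2", "read1", "read2"}

# (read attribute, offset attribute, size attribute), named as in the table above.
SEGMENTS = [
    ("cDNA read", "cDNA read offset", "cDNA read size"),
    ("UMI barcode read", "UMI barcode offset", "UMI barcode size"),
    ("cell barcode read", "cell barcode offset", "cell barcode size"),
    ("sample barcode read", "sample barcode offset", "sample barcode size"),
]


def _is_int(value):
    """True for plain integers (booleans are rejected)."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_layout(attributes):
    """Return a list of problems found in a barcode layout annotation."""
    problems = []
    for read_key, offset_key, size_key in SEGMENTS:
        if read_key not in attributes:
            continue  # this segment is not used by the protocol
        if attributes[read_key] not in ALLOWED_READS:
            problems.append(f"{read_key}: expected one of {sorted(ALLOWED_READS)}")
        offset = attributes.get(offset_key)
        if not _is_int(offset) or offset < 0:
            problems.append(f"{offset_key}: expected a number >= 0")
        size = attributes.get(size_key)
        if not _is_int(size) or size < 1:  # assumption: sizes are at least 1 bp
            problems.append(f"{size_key}: expected a number >= 1")
    return problems


# Illustrative values only; use the layout of your own protocol.
example_layout = {
    "cell barcode read": "read1", "cell barcode offset": 0, "cell barcode size": 16,
    "UMI barcode read": "read1", "UMI barcode offset": 16, "UMI barcode size": 10,
    "cDNA read": "read2", "cDNA read offset": 0, "cDNA read size": 98,
    "sample barcode read": "index1", "sample barcode offset": 0, "sample barcode size": 8,
}

layout_problems = validate_layout(example_layout)
print(layout_problems)  # []
```

An empty list means every annotated segment names one of the four allowed files and has a sensible offset and size.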
File format for droplet-based technologies
For 10x technology, please provide fastq.gz files as generated by CellRanger software from bcl files. These are usually 3-4 fastq.gz files per library containing the cDNA read and several barcode reads in known positions.
We can currently validate files that follow the 10x file naming conventions:
|10x version 1||10x version 2|
|R1 + R2 + R3 + I1 or RA + I1 + I2||R1 + R2 + I1|
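You can check a library's file set against these combinations yourself before uploading. The sketch below is an informal pre-check, not the validation that Annotare performs. The file names and the regular expression that pulls the read label out of a file name are assumptions; adapt them to how your files are actually named.

```python
import re

# Read combinations per 10x version, as listed in the table above.
TENX_READ_SETS = {
    "10x version 1": [{"R1", "R2", "R3", "I1"}, {"RA", "I1", "I2"}],
    "10x version 2": [{"R1", "R2", "I1"}],
}

# Assumed pattern: the read label is the last or second-to-last "_" field.
READ_LABEL = re.compile(r"_(RA|R[123]|I[12])(?:_\d+)?\.fastq\.gz$")


def read_label(filename):
    """Return the read label (e.g. 'R1') from a file name, or None."""
    match = READ_LABEL.search(filename)
    return match.group(1) if match else None


def matching_versions(read_labels):
    """Return the 10x versions whose expected read set equals the given labels."""
    labels = set(read_labels)
    return [
        version
        for version, combinations in TENX_READ_SETS.items()
        if labels in combinations
    ]


# Hypothetical file names for one library.
library_files = [
    "lib1_S1_L001_R1_001.fastq.gz",
    "lib1_S1_L001_R2_001.fastq.gz",
    "lib1_S1_L001_I1_001.fastq.gz",
]

labels = [read_label(name) for name in library_files]
versions = matching_versions(labels)
print(labels, versions)
```

An empty result from `matching_versions` suggests that a file is missing from the library or that the naming deviates from the conventions listed above.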
Processed data: Since there is no information about the individual cells at the sample annotation or file level, it makes sense to include processed data, e.g. a read count matrix, to describe the analysis results like the "inferred cell types" (see above).
Additional files: It is encouraged to upload any additional files that facilitate data analysis, e.g. text files containing lists of known barcodes in the library. These files will be linked to the experiment and don't need to be associated with any specific sample in the submission. Please leave a description of such files in an appropriate protocol, e.g. the normalisation data transformation protocol. Example: E-MTAB-6153_sample_barcodes.txt
A few more tips
Use experiment type "RNA-seq of coding RNA from single cells" for single-cell RNA-sequencing experiments.
Make sure to check the library strand information of your library protocol or kit and enter the relevant strand (see guide to sequencing library information).
If an attribute or parameter in a protocol is different for different files, please include this information at the level of sample annotation. Alternatively, create multiple protocols of the same type and assign them to individual samples (e.g. if you have included files from different sequencer models, create two sequencing protocols).