{"id":277,"date":"2022-08-04T15:22:47","date_gmt":"2022-08-04T15:22:47","guid":{"rendered":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-r-handbook-2022\/programme\/day-1\/pipeline-overview\/"},"modified":"2026-03-10T11:22:20","modified_gmt":"2026-03-10T11:22:20","slug":"pipeline-overview","status":"publish","type":"page","link":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/planing-your-experiment\/pipeline-overview\/","title":{"rendered":"Dry-lab overview"},"content":{"rendered":"\n<p><strong>Trainers:<\/strong> Marisa Loach, Kayleigh Smith, Wendi Bacon<\/p>\n\n\n\n<p id=\"block-4b8a7371-48dc-40ae-ac2a-1c59f562f909\"><strong>Overview:<\/strong>&nbsp;In this session, you will identify and describe challenges and limitations in scRNA-seq analysis.<\/p>\n\n\n\n<p><strong>Why does it matter?<\/strong><\/p>\n\n\n\n<p>scRNA-seq is a fantastic tool for answering scientific questions, but it is not the be-all and end-all \u2013 careful and critical interpretation is required.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Activity goals<\/strong>:<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Analyse the data to determine:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Number of cells<\/li>\n\n\n\n<li>Number of cell clusters (generate a cluster map!)<\/li>\n\n\n\n<li>Disease-specific clusters<\/li>\n\n\n\n<li>Disease-specific transcript signatures<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Activity steps<\/strong>:<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Examine the data<\/strong><\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Make a copy of this\u00a0<a href=\"https:\/\/ftp.ebi.ac.uk\/pub\/training\/2026\/Single_cell_rna_seq_analysis_with_python_2026\/Presentations\/Dry-lab_Activity\/Dry-lab_Activity_1.pptx\">activity template<\/a>, and the <a 
href=\"https:\/\/ftp.ebi.ac.uk\/pub\/training\/2026\/Single_cell_rna_seq_analysis_with_python_2026\/Presentations\/Dry-lab_Activity\/Dry-lab_Activity_2-Plot.pptx\">plot<\/a>\n<ul class=\"wp-block-list\">\n<li>Key:<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-15.png\" alt=\"1 Read \nCell Barcode: \nSample Index: N701 \nTranscript: Pink (i.e. GAPDH) \" class=\"wp-image-184\"\/><\/figure>\n\n\n\n<p><strong>2. Demultiplex your data<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Both samples were run on the same sequencing lane with two sample indices from Index Read 1.\n<ul class=\"wp-block-list\">\n<li>Sample index N701 contained cancerous cells<\/li>\n\n\n\n<li>Sample index N702 contained only healthy cells<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Divide your reads into N701 and N702 (and keep separate!)\n<ul class=\"wp-block-list\">\n<li>Example:<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-6-1024x249.png\" alt=\"N702 \nN702 \nN702 \nN702 \nN702 \nN702 \nN702 \nN701 \nN701 \nN701 \nN701 \nN701 \nN701 \nN701 \nN701 \" class=\"wp-image-164\"\/><\/figure>\n\n\n\n<p><strong>4. 
Generate a \u2018cell matrix\u2019<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A \u201ccell matrix\u201d is like a \u201cdigital expression matrix\u201d: reads that contain the same cell barcode are stacked together so that cell-to-cell differences can be analysed.<\/li>\n\n\n\n<li>Each emoji represents a cell barcode.<\/li>\n\n\n\n<li>Organise your \u2018reads\u2019 into cells by grouping reads that share a cell barcode (keep N701 and N702 separate)\n<ul class=\"wp-block-list\">\n<li>Example:<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-7-1024x292.png\" alt=\"TOlN \nTOLN \n\u03c4\u03bf\u0399\u039d \n\u03c4\u03bf\u0399\u039d \n\u03c4\u03bf\u0399\u039d \nTOLN \nTOLN \nTOLN \nTOLN \nTOLN \nTOlN \n\u03a4.0\u0399\u039d \nzolN \n\u03b5\u03bf\u0399\u039d \n\u0396\u039f\u0399\u039d \nTOLN \nTOlN \" class=\"wp-image-166\"\/><\/figure>\n\n\n\n<p><strong>5. Filter the cells<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Remove any \u2018cell barcodes\u2019 (emojis) that appear fewer than 4 times. These likely represent background. Setting a cut-off point (i.e. the minimum number of genes or transcripts needed to define a cell) can be tricky.<\/li>\n\n\n\n<li>You may also consider capping the maximum number of transcripts allowed per cell, since doublets may have more transcripts.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-8.png\" alt=\"\u03a4\u039f\u0399\u039d \n\u0396O\u0399\u039d \nT.OLN \nTOlN \n\u03c4\u03bf\u03b9 \n\u03c4\u03bf \" class=\"wp-image-168\"\/><\/figure>\n\n\n\n<p><strong>6. 
Filter the genes<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Remove any \u2018genes\u2019 (colours) that appear fewer than 3 times.<\/li>\n\n\n\n<li>If a gene appears this rarely in a sample, it\u2019s unlikely to be informative, and it is mathematically difficult to compare expression for such a gene.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-9-1024x720.png\" alt=\"Machine generated alternative text:\n\n\" class=\"wp-image-170\"\/><\/figure>\n\n\n\n<p><strong>7. Normalisation<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You don\u2019t actually have to do this here: in this activity, each cell now has the same number of transcripts. In a real sample, this would not be true \u2013 imagine trying to compare transcript signatures between cells with drastically different transcript counts! Normalisation corrects for these differences.<\/li>\n<\/ul>\n\n\n\n<p><strong>8. Find Variable Genes<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Some genes don\u2019t vary much between cells, and carrying forward a full matrix of size cells x genes can make computation a bit of a nightmare! Standard pipelines therefore only take into account genes that vary substantially between cells.<\/li>\n\n\n\n<li>Remove all \u2018yellow\u2019 transcripts \u2013 according to the super intense algorithm of \u201cI said so\u201d, these transcripts have been found to not vary.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-10.png\" alt=\"Machine generated alternative text:\n\n\" class=\"wp-image-172\"\/><\/figure>\n\n\n\n<p><strong>9. 
Scale Data<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This step is not always performed, although it can make it easier to compare samples sequenced at different depths. It scales the variation of each gene so that genes are more easily comparable (otherwise, genes with strong expression differences will dominate the analysis, hiding subtler differences in other genes). In this step, you can also optionally \u2018regress out\u2019 genes, which is to say that their variation will not contribute to cluster calling.<\/li>\n\n\n\n<li>Green genes here have been found to contribute to the cell cycle. We are not interested in this and don\u2019t want it to obscure the genes driving cancer progression. Remove the green genes (\u2018cell cycle regression\u2019).<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-11.png\" alt=\"7\uc774 \" class=\"wp-image-174\"\/><\/figure>\n\n\n\n<p><strong>10. Dimensionality Reduction<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Normally, dimensionality reduction is a huge part of this protocol. There are only 3 dimensions (i.e. 3 genes) in this data, so you can skip it!<\/li>\n<\/ul>\n\n\n\n<p><strong>11. 
Identify cell clusters<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Group the cells by their \u2018transcript signatures\u2019.\n<ul class=\"wp-block-list\">\n<li>Example:<br>These cells would be in the same cluster<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-12.png\" alt=\"Z0LN \" class=\"wp-image-176\"\/><\/figure>\n\n\n\n<p>But likely not in the same cluster as this cell:<\/p>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-13.png\" alt=\"Machine generated alternative text:\n\n\" class=\"wp-image-178\"\/><\/figure>\n\n\n\n<p><strong>12. Plot your cells<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Select your Cluster Plot here<\/li>\n\n\n\n<li>Plot the cells using the \u2018cell clusters\u2019 you identified in Step 11. Similar cells should be plotted close together. Put a circle around each cell cluster.\n<ul class=\"wp-block-list\">\n<li>Example:<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-materials\/wp-content\/uploads\/sites\/30\/2021\/07\/image-14.png\" alt=\"Machine generated alternative text:\n\n\" class=\"wp-image-180\"\/><\/figure>\n\n\n\n<p><strong>13. Interpret the results<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Answer the following questions:\n<ul class=\"wp-block-list\">\n<li>Were there any cells you couldn\u2019t classify?<\/li>\n\n\n\n<li>How many total cells did you find?<\/li>\n\n\n\n<li>How many cell types (clusters) are in your final map?<\/li>\n\n\n\n<li>How did you interpret the results?<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p><strong>14. 
Check the <a href=\"https:\/\/ftp.ebi.ac.uk\/pub\/training\/2026\/Single_cell_rna_seq_analysis_with_python_2026\/Presentations\/Dry-lab_Activity\/Dry-lab_Overview_Instructions_and_Answers.pdf\">answer key\u00a0here<\/a><\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><a href=\"https:\/\/ftp.ebi.ac.uk\/pub\/training\/2026\/Single_cell_rna_seq_analysis_with_python_2026\/Presentations\/dry-lab_overview.pptx\">Ending slide deck<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Trainer: Marisa Loach, Kayleigh Smith, Wendi Bacon Overview:&nbsp;In this session, you will identify and describe challenges and limitations in scRNA seq analysis Why does it matter? scRNA seq is a fantastic tool for answering scientific questions, but it is not the be-all\/end-all \u2013 careful and critical interpretation is required. Activity goals: Activity steps: 2. Demultiplex&#8230;<\/p>\n","protected":false},"author":6,"featured_media":0,"parent":1937,"menu_order":1,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-277","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/wp-json\/wp\/v2\/pages\/277","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/wp-json\/wp\/v2\/comments?post=277"}],"version-history":[{"count":22,"href":"https:\/\/www.ebi.a
c.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/wp-json\/wp\/v2\/pages\/277\/revisions"}],"predecessor-version":[{"id":2407,"href":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/wp-json\/wp\/v2\/pages\/277\/revisions\/2407"}],"up":[{"embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/wp-json\/wp\/v2\/pages\/1937"}],"wp:attachment":[{"href":"https:\/\/www.ebi.ac.uk\/training\/materials\/single-cell-rna-seq-analysis-using-python-materials\/wp-json\/wp\/v2\/media?parent=277"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}