Technical help sheet
Technical requirements for data analysis
The practicals in this course were run on virtual machines running Ubuntu 22.04 Linux, with 32 GB of RAM, 14 vCPUs, and 400 GB of storage capacity.
You will need the following tools to run the practicals in this course:
Alignment and SNV analysis
Tools
- bwa 0.7.17: https://github.com/lh3/bwa https://sourceforge.net/projects/bio-bwa/files/
- FASTQC 0.12.1: https://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc
- samtools 1.20
- gatk 4.5.0.0: https://github.com/broadinstitute/gatk/releases
- strelka-2.9.10: https://github.com/Illumina/strelka
- ensembl-vep release 112.0: https://github.com/Ensembl/ensembl-vep
- vcf2maf: https://github.com/mskcc/vcf2maf
- trimmomatic 0.39: http://www.usadellab.org/cms/?page=trimmomatic
- Firefox: https://www.mozilla.org/es-MX/firefox/new/
- R 4.3: https://www.r-project.org/
- maftools 3.18 (R package): https://bioconductor.org/about/release-announcements/
- picard 3.1.1: https://github.com/broadinstitute/picard/releases/tag/3.1.1
- python2: https://www.python.org/download/releases/2.0/
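Before starting the practicals it can help to confirm that the command-line tools above are on your PATH. The following is a minimal sketch, not an official course script: the function name check_tools is made up for illustration, and the executable names passed to it are assumptions that depend on how each tool was installed (for example, Trimmomatic and Picard are distributed as jar files and only have a wrapper command with some installers).

```shell
# Report which of the given executables can be found on PATH.
# check_tools is a hypothetical helper; it returns non-zero if any tool is missing.
check_tools() {
    n_missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            n_missing=$((n_missing + 1))
        fi
    done
    [ "$n_missing" -eq 0 ]
}

# Executable names below are assumptions; adjust them to your installation.
check_tools bwa fastqc samtools gatk vep trimmomatic R python2 || true
```

A "missing" line only means the name was not found on PATH; the tool may still be installed under a different name or location.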
Datasets
- vep 112 cache: https://ftp.ensembl.org/pub/release-112/variation/indexed_vep_cache/
- Trimmomatic adaptors: https://github.com/timflutre/trimmomatic/blob/master/adapters/TruSeq3-PE-2.fa
- hg38 chr7.fa.gz, COLO829BL.R1.fastq.gz, COLO829BL.R2.fastq.gz, COLO829T.R1.fastq.gz, COLO829T.R2.fastq.gz
- https://console.cloud.google.com/storage/browser/genomics-public-data/references/GRCh38?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false
- http://ftp.ebi.ac.uk/pub/training/2023/Cancer_genomics_transcriptomics_July_2023/Day1/CancerGenomicsCourse_EMBL-EBI/data/fastq_files/
- Homo_sapiens_assembly38.dbsnp138.vcf.gz: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0;tab=objects?pli=1&prefix=&forceOnObjectsSortingFiltering=false
- 1000G_phase1.snps.high_confidence.hg38.vcf.gz: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false
- 1000g_pon.hg38.vcf.gz: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false
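Several of the datasets above are large gzip-compressed files, so it is worth checking that each download completed. The sketch below assumes the files were saved under the names listed above in a single directory; DATA_DIR and the function name check_gz are hypothetical and only used for illustration. It relies on gzip -t, which tests the integrity of a compressed file without unpacking it.

```shell
# Check that each file exists, is non-empty, and passes a gzip integrity test.
# check_gz is a hypothetical helper; it returns non-zero if any file fails.
check_gz() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "absent or empty: $f"
            rc=1
        elif gzip -t "$f" 2>/dev/null; then
            echo "ok: $f"
        else
            echo "corrupt: $f"
            rc=1
        fi
    done
    return "$rc"
}

# DATA_DIR is an assumed download location; defaults to the current directory.
DATA_DIR="${DATA_DIR:-.}"
check_gz "$DATA_DIR/chr7.fa.gz" \
         "$DATA_DIR/COLO829BL.R1.fastq.gz" "$DATA_DIR/COLO829BL.R2.fastq.gz" \
         "$DATA_DIR/COLO829T.R1.fastq.gz" "$DATA_DIR/COLO829T.R2.fastq.gz" || true
```

The same function can be pointed at the .vcf.gz resources, since block-compressed VCF files are also valid gzip streams.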
SNV and CNV analysis practicals
- Installation:
apt-get update && apt-get install -y autoconf build-essential cmake g++ git libcurl4-gnutls-dev libbz2-dev libdeflate-dev libgl1-mesa-dev libncurses-dev liblzma-dev pkg-config zlib1g-dev && git clone --recursive https://github.com/tobiasrausch/vc && cd vc && make all
- After installation:
https://github.com/tobiasrausch/vc
- All the course data is packaged at https://github.com/tobiasrausch/vc, together with a Makefile that installs the tools and downloads the tutorial data (as described above)
Mutational signatures and clonal population structure analysis in cancer genomes
Tools
- SeqPurge
- bwa
- bwamem
- samtools
- gatk
- igv
- ANNOVAR
- VEP
- Strelka2
- MuTect2
- fastqc
R Libraries
- library(data.table)
- library(ggplot2)
- library(mobster)
- library(GenomicRanges)
- library(dmr.util)
- library(ccube)
- library(BSgenome)
- library(ref_genome, character.only = TRUE), where ref_genome is a variable holding the name of the BSgenome reference package to load
- library(MutationalPatterns)
- library(plyr)
- library(NMF)
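To see which of the R libraries above are already installed, one option is to ask R from the command line. This is a minimal sketch under stated assumptions: check_r_packages is a made-up helper name, the RSCRIPT variable is a hypothetical override for the Rscript executable, and the package names are taken from the library() calls above (ref_genome is left out because it is a variable, not a package). If Rscript itself is not installed, every package is reported as not installed.

```shell
# For each package name, ask R whether its namespace can be loaded.
# check_r_packages is a hypothetical helper; RSCRIPT may point to another Rscript.
check_r_packages() {
    rscript="${RSCRIPT:-Rscript}"
    for pkg in "$@"; do
        # The R expression exits with status 1 when the package is unavailable.
        if "$rscript" -e "if (!requireNamespace('$pkg', quietly = TRUE)) quit(status = 1)" >/dev/null 2>&1; then
            echo "installed: $pkg"
        else
            echo "not installed: $pkg"
        fi
    done
}

check_r_packages data.table ggplot2 mobster GenomicRanges dmr.util ccube \
                 BSgenome MutationalPatterns plyr NMF
```

How a missing package should be installed depends on where it is hosted (CRAN, Bioconductor, or GitHub), so check each package's own documentation.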
CRISPR-Cas9 screen practicals
- RStudio – https://www.rstudio.com/products/rstudio/download/
- DNAcopy – https://bioconductor.org/packages/release/bioc/html/DNAcopy.html
- CRISPRcleanR – https://github.com/francescojm/CRISPRcleanR
- Python – https://www.python.org/downloads/
- MAGeCK (v0.5.3 or higher) – https://sourceforge.net/projects/mageck/files/0.5/
- To check that everything is in order and works properly please run this script: https://www.dropbox.com/s/607ehxcdne6m86k/SlideScript.R?dl=0
Short-read and long-read RNA-seq practicals
Each entry below gives the software, a link to its web page or source, and any notes:
- conda https://conda.io/projects/conda/en/latest/user-guide/install/index.html
- STAR v2.7.8a or newer https://github.com/alexdobin/STAR
- STAR-Fusion v1.10.0 or newer https://github.com/STAR-Fusion/STAR-Fusion
- CTAT lib (~31GB) https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.10/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz – This is a tarball with resources (~31GB)
- IGV https://software.broadinstitute.org/software/igv/ Requires Java
- htslib http://www.htslib.org/download/
- samtools http://www.htslib.org/download/
- minimap2 https://github.com/lh3/minimap2
- salmon https://github.com/COMBINE-lab/salmon There is a precompiled Linux binary : https://github.com/COMBINE-lab/salmon/releases/download/v1.8.0/salmon-1.8.0_linux_x86_64.tar.gz
- bamseek https://code.google.com/archive/p/bamseek Just a jar file; needs java
- NanoPack https://github.com/wdecoster/nanopack May conflict with Pychopper; to avoid conflicts install in a separate conda environment
- Pychopper https://github.com/epi2me-labs/pychopper May conflict with Nanopack; to avoid conflicts install in a separate conda environment
- Docker https://www.docker.com/ This is an EPI2ME Desktop dependency. Singularity could be installed instead of Docker
- Nextflow https://www.nextflow.io/ This is an EPI2ME Desktop dependency; requires Java
- EPI2ME Desktop https://labs.epi2me.io/about/ Although it's called EPI2ME Desktop, I have only used it on the command line (on clusters). I hope it has a nice GUI too 🙂
- wf-transcriptomes pipeline https://github.com/epi2me-labs/wf-transcriptomes This is one of the pipelines supported by EPI2ME; only this one is needed, not the others.
- R https://cran.r-project.org/index.html
- RStudio https://www.rstudio.com
- CRAN R-packages:
- dplyr https://cran.r-project.org/web/packages/dplyr/index.html
- ggplot2 https://cran.r-project.org/web/packages/ggplot2/index.html
- gplots https://cran.r-project.org/web/packages/gplots/index.html
- rjson https://cran.r-project.org/web/packages/rjson/index.html
- knitr https://cran.r-project.org/web/packages/knitr/index.html
- rmarkdown https://cran.r-project.org/web/packages/rmarkdown/index.html
- Bioconductor R-packages:
- chimeraviz https://www.bioconductor.org/packages/release/bioc/html/chimeraviz.html
- DESeq2 https://www.bioconductor.org/packages/release/bioc/html/DESeq2.html
- edgeR https://bioconductor.org/packages/release/bioc/html/edgeR.html
- rtracklayer https://www.bioconductor.org/packages/release/bioc/html/rtracklayer.html
- tximport https://www.bioconductor.org/packages/release/bioc/html/tximport.html
- tximeta https://bioconductor.org/packages/release/bioc/html/tximeta.html
- Rsubread https://bioconductor.org/packages/release/bioc/html/Rsubread.html
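The notes for NanoPack and Pychopper above recommend separate conda environments to avoid conflicts. A minimal sketch of that setup follows. It assumes both tools can be installed from the bioconda channel under the package names nanopack and pychopper, which should be checked against each project's README; make_env and DRY_RUN are hypothetical names used only here. By default the script prints the conda commands instead of running them.

```shell
# Create one conda environment per tool so their dependencies cannot clash.
# make_env is a hypothetical helper. With DRY_RUN=1 (the default) it only
# prints the command; set DRY_RUN=0 to actually run conda.
make_env() {
    env_name="$1"
    package="$2"
    # Channel and package names are assumptions; verify them before running.
    set -- conda create --yes --name "$env_name" \
        --channel conda-forge --channel bioconda "$package"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_env nanopack nanopack
make_env pychopper pychopper
```

Once the environments exist, each tool is used after activating its own environment, for example with conda activate nanopack.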
Deconvolution of the clonality of a tumour using single-cell transcriptomic data
Tools
- BayesPrism: https://github.com/Danko-Lab/BayesPrism
- MuSiC: https://github.com/xuranw/MuSiC?tab=readme-ov-file
- Immunedeconv: https://github.com/omnideconv/immunedeconv
- Seurat: https://github.com/satijalab/seurat
- R 4.4, RStudio, R Markdown 2.26: https://www.r-project.org/
- Tidyverse: https://cran.r-project.org/web/packages/tidyverse/index.html
- ComplexHeatmap: https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html
- Firefox: https://www.mozilla.org/es-MX/firefox/new/
- BiocManager
- RColorBrewer: https://cran.r-project.org/web/packages/RColorBrewer/index.html
- Bisque: https://github.com/cozygene/bisque
- CIBERSORTx docker: https://github.com/omnideconv/omnideconv
Datasets
- GSE115978 downsampled: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115978