This page has been deprecated and may be removed without further notice.

CRAM tutorial

CRAM is a framework technology comprising a file format and a toolkit. It combines highly efficient, tunable, reference-based compression of sequence data with a data format that is directly available for computational use. In support of CRAM, we also provide the CRAM reference registry.

This is a step-by-step tutorial for converting NGS data to and from the CRAM format with the CRAM toolkit. It also explains some of the toolkit's other options.

Part 1: Convert CRAM to BAM

  1. Open a terminal, create a directory, and change into it:
    mkdir CRAM
    cd CRAM
  2. Download the cramtools jar file:
    wget -O cramtools-2.0.jar
  3. Create alias for convenience:
    alias cramtools='java -jar cramtools-2.0.jar'
  4. Try the alias; this should print the cramtools usage message:
  5. Download a public CRAM file from the ENA FTP archive:
  6. Convert the downloaded CRAM file to BAM format:
    cramtools bam -I 9233_8#168_1.cram -O 9233_8#168_1.cram.bam
  7. Peek into the CRAM file by printing its first reads in FASTQ format:
    cramtools fastq -I 9233_8#168_1.cram | head
  8. Peek into the BAM file using samtools:
    samtools view 9233_8#168_1.cram.bam | head
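
The conversion in step 6 can be repeated for several files with a short script. The sketch below is illustrative only: the CRAMTOOLS_JAR variable and the helper names bam_name_for and convert_dir are assumptions, not part of the CRAM toolkit. It wraps the jar in a shell function instead of an alias, because bash does not expand aliases in non-interactive scripts by default. It reuses only the "cramtools bam -I ... -O ..." invocation shown in step 6, and it quotes file names so that characters such as # are passed through unchanged.

```shell
# Sketch only: CRAMTOOLS_JAR and the helper names below are illustrative assumptions.
CRAMTOOLS_JAR="${CRAMTOOLS_JAR:-cramtools-2.0.jar}"

# Function form of the alias from step 3; unlike an alias, it also works in scripts.
cramtools() {
    java -jar "$CRAMTOOLS_JAR" "$@"
}

# Derive the BAM file name following the convention of step 6: <input>.bam
bam_name_for() {
    printf '%s.bam\n' "$1"
}

# Convert every *.cram file in a directory using the same options as step 6.
convert_dir() {
    for cram_file in "${1:-.}"/*.cram; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$cram_file" ] || continue
        cramtools bam -I "$cram_file" -O "$(bam_name_for "$cram_file")"
    done
}
```

With these definitions loaded, "convert_dir ." would convert each CRAM file in the current directory, writing for example 9233_8#168_1.cram.bam next to 9233_8#168_1.cram.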

Part 2: Convert BAM to CRAM

  1. Download reference sequence used in the CRAM file:
    cramtools getref -I 9233_8#168_1.cram -F ref.fasta
  2. Index the reference fasta file:
    samtools faidx ref.fasta
  3. Compress the BAM file, preserving read names and binning the quality scores of non-matching bases into 8 values:
    cramtools cram -I 9233_8#168_1.cram.bam -R ref.fasta --preserve-read-names -L N8 -O
  4. Peek into the resulting CRAM file:
    cramtools fastq -I | head -20
  5. Compare the file sizes:
    ls -la
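
To put a number on the comparison in step 5, the size of one file can be expressed as a percentage of another. This is a minimal sketch using only POSIX shell features; the function names file_size and size_percent are assumptions made for illustration and are not part of cramtools or samtools.

```shell
# Sketch only: file_size and size_percent are illustrative helper names.

# Print the size of a file in bytes. wc -c is POSIX; the arithmetic
# expansion strips the padding that some wc implementations add.
file_size() {
    echo $(( $(wc -c < "$1") ))
}

# Print the size of the first file as a whole-number percentage of the second.
size_percent() {
    if [ "$(file_size "$2")" -eq 0 ]; then
        echo "size_percent: $2 is empty" >&2
        return 1
    fi
    # Integer arithmetic: the result is rounded down.
    echo $(( $(file_size "$1") * 100 / $(file_size "$2") ))
}
```

For example, calling size_percent with the CRAM file written in step 3 as the first argument and 9233_8#168_1.cram.bam as the second prints how large the CRAM file is relative to the BAM file.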