CRAM tutorial

CRAM is a framework technology comprising a file format and a toolkit. It combines highly efficient, tunable reference-based compression of sequence data with a data format that is directly available for computational use. In support of CRAM, we also provide the CRAM reference registry.

This step-by-step tutorial shows how to convert NGS data to and from CRAM format using the CRAM toolkit (cramtools), and explains some of the toolkit's other options.
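
The commands in this tutorial call java, wget and samtools, so it can save time to confirm they are installed before starting. The following is a minimal sketch of such a check; the function name check_tools is an illustrative assumption and is not part of cramtools.

```shell
# Hypothetical helper (not part of cramtools): report which of the given
# tools are missing from PATH. Prints one line per missing tool and
# returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools invoked by the tutorial steps; adjust the list to your setup.
check_tools java wget samtools || echo "Install the missing tools before continuing."
```

If nothing is printed, all three tools were found on PATH.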

Part 1: Convert CRAM to BAM

  1. Open a terminal, create a directory and change into it:
    mkdir CRAM
    cd CRAM
  2. Download the cramtools jar file:
    wget 'https://github.com/enasequence/cramtools/blob/master/cramtools-2.0.jar?raw=true' -O cramtools-2.0.jar
  3. Create an alias for convenience:
    alias cramtools='java -jar cramtools-2.0.jar'
  4. Try the alias; this should print the cramtools usage:
    cramtools
  5. Download a public CRAM file from the ENA FTP archive:
    wget ftp://ftp.era.ebi.ac.uk/vol1/ERA209/ERA209803/cram/9233_8%23168_1.cram
  6. Convert the downloaded CRAM file to BAM format:
    cramtools bam -I 9233_8#168_1.cram -O 9233_8#168_1.cram.bam
  7. Peek into the CRAM file using FASTQ format:
    cramtools fastq -I 9233_8#168_1.cram | head
  8. Peek into the BAM file using samtools:
    samtools view 9233_8#168_1.cram.bam | head
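
Step 5 requests 9233_8%23168_1.cram while the later steps refer to 9233_8#168_1.cram. %23 is the URL encoding of '#', and wget is assumed to decode it when it names the local file, as the steps above imply. The sketch below shows that mapping and how quoting keeps the '#' literal; the helper url_encode_hash is hypothetical and is not part of wget or cramtools.

```shell
# Hypothetical helper (not part of wget or cramtools): percent-encode '#'
# in a file name so that the name can be placed in a URL.
url_encode_hash() {
    printf '%s\n' "$1" | sed 's/#/%23/g'
}

cram='9233_8#168_1.cram'      # local file name used in the steps above
url_encode_hash "$cram"       # prints 9233_8%23168_1.cram

# Quoting the name keeps '#' literal regardless of shell or glob settings.
bam="${cram}.bam"
echo "$bam"                   # prints 9233_8#168_1.cram.bam
```

When scripting the steps, passing the quoted variables (for example "$cram" and "$bam") to cramtools and samtools avoids surprises with the '#' character.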

Part 2: Convert BAM to CRAM

  1. Download the reference sequence used in the CRAM file:
    cramtools getref -I 9233_8#168_1.cram -F ref.fasta
  2. Index the reference FASTA file:
    samtools faidx ref.fasta
  3. Compress the BAM file, preserving read names and binning non-matching quality scores to 8 values:
    cramtools cram -I 9233_8#168_1.cram.bam -R ref.fasta --preserve-read-names -L N8 -O 9233_8#168_1.cram.bam.cram
  4. Peek into the resulting CRAM file:
    cramtools fastq -I 9233_8#168_1.cram.bam.cram | head -20
  5. Compare the file sizes:
    ls -la
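
Reading sizes off the ls listing works, but a single number is easier to compare. The following sketch prints the size of one file as a percentage of another; the helper size_percent and the file names in the usage comment are illustrative placeholders, not part of cramtools.

```shell
# Hypothetical helper: print the size of file $2 as a percentage of file $1.
size_percent() {
    # wc -c reports bytes; $(( )) strips any padding some wc versions add.
    orig=$(($(wc -c < "$1")))
    comp=$(($(wc -c < "$2")))
    if [ "$orig" -eq 0 ]; then
        echo "empty file: $1" >&2
        return 1
    fi
    # awk performs the floating-point division.
    awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.1f\n", 100 * c / o }'
}

# Usage with placeholder names; substitute the BAM and CRAM files from above:
# size_percent original.bam compressed.cram
```

A value below 100 means the second file is smaller than the first.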
