CRAM usage

Download

The CRAM Java toolkit can be downloaded as follows:

CRAM 2.0 current production toolkit

CRAM 1.0 previous toolkit

CRAM 2.0 is also supported in the latest Staden io_lib package.

Format specifications

The CRAM format specification is available as follows:

CRAM 2.0 current format specification

CRAM 1.0 previous format specification

For detailed information about how data from the source BAM file is preserved in CRAM format, please refer to the ArchiveCRAM 1.0 specification.

Reporting bugs

Please report any bugs, issues and problems as follows:

CRAM 2.0 issues

CRAM 1.0 issues

Mailing list

Join the mailing list for discussions and announcements relating to CRAM tools. Mailing list archives are available here.

Download the executable

Download the .jar executables as follows:

CRAM 2.0 current production toolkit

CRAM 1.0 previous toolkit

The program requires a Java 1.6 runtime or higher:

http://openjdk.java.net/

or

http://www.oracle.com/us/technologies/java/index.html

Download and build from source

To build the program from source: 

To check out the source code from GitHub you will need a git client: http://git-scm.com/

Make sure you have the Java 1.6 SDK or higher:

http://openjdk.java.net/

or

http://www.oracle.com/us/technologies/java/index.html

Make sure you have Ant version 1.7 or higher:

http://ant.apache.org/

Run one of the following commands:

git clone git://github.com/enasequence/cramtools.git (to retrieve CRAM 2.0)

git clone git://github.com/vadimzalunin/crammer.git (to retrieve CRAM 1.0)

and then each of the following commands:

ant -f build/build.xml runnable

java -jar cramtools.jar

To run unit tests:

ant -f build/build.xml test

Accessing command line help

General options:

java -jar cramtools-<version>.jar

BAM to CRAM conversion options:

java -jar cramtools-<version>.jar cram

CRAM to BAM conversion options:

java -jar cramtools-<version>.jar bam

Convert a BAM file to a CRAM file

A typical command to convert a BAM file into a CRAM file is:

java -jar cramtools-<version>.jar cram --input-bam-file <bam file> --reference-fasta-file <reference fasta file> [--output-cram-file <output cram file>]

Input files:

  1. <bam file>: BAM file sorted by reference coordinates
  2. <bam file>.bai: BAM index file created using samtools index <bam file>
  3. <reference fasta file>: Reference sequence in fasta format
  4. <reference fasta file>.fai: Reference sequence index file created using samtools faidx <reference fasta file>
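
Because the conversion depends on all four input files being present, a quick pre-flight check can save a failed run. The following is only a minimal sketch: the function name missing_cram_inputs is hypothetical and is not part of the CRAM toolkit, and it assumes the index files sit next to the files they index, with the .bai and .fai suffixes listed above.

```shell
# Hypothetical helper (not part of cramtools): report which of the four
# input files listed above are missing for a given BAM file and reference.
# Prints one "missing: <path>" line per absent file and returns non-zero
# if anything is missing.
missing_cram_inputs() {
    bam=$1
    ref=$2
    status=0
    for f in "$bam" "$bam.bai" "$ref" "$ref.fai"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

If the helper reports a missing .bai or .fai file, the samtools index and samtools faidx commands mentioned in the list above will create it.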

Convert a CRAM file to a BAM file

java -jar cramtools-<version>.jar bam --input-cram-file <input cram file> --reference-fasta-file <reference fasta file> --output-bam-file <output bam file>

Input files:

  1. <input cram file>: CRAM file
  2. <reference fasta file>: Uncompressed reference sequence in fasta format
  3. <reference fasta file>.fai: Reference sequence index file

Using CRAM files with Picard

Some tools using the Picard API are expected to work for reading CRAM files. For example:

java -cp cramtools-<version>.jar net.sf.picard.sam.ValidateSamFile INPUT=data.cram

will validate the CRAM file using the Picard ValidateSamFile tool.

Please note that the CRAM toolkit replaces some of the Picard API classes. Therefore, the following will not work:

java -cp cramtools-<version>.jar -jar ValidateSamFile.jar INPUT=data.cram

Reference sequence discovery
 
For tools that use the Picard API, the following rules describe how the reference sequence file is discovered:
  1. Given an input file '<some name>.cram' search for a '<some name>.fa' file in the same directory.
  2. Given an input file '<some name>.cram' search for a '<some name>.fa' file in the same directory, which should contain a full path to the reference file.
  3. Use the Java property 'reference=<path to ref file>'. Usage: java -Dreference=<path to ref file> -cp cramtools.jar ...

List of CRAM tools commands

Command   Description
bam       Converts a CRAM file into a BAM file.
cram      Converts a BAM file into a CRAM file.

List of bam command options

Option                   Description                                         Default
--calculate-md-tag       Compute MD tag.                                     false
--calculate-nm-tag       Compute NM tag.                                     false
--decrypt                Decrypt the file.                                   false
--default-quality-score  Use this quality score (decimal representation      63
                         of ASCII symbol) as a default value when the
                         original quality score was lost due to
                         compression. Minimum is 33.
--input-cram-file, -I    The path to the CRAM file.
--output-bam-file, -O    The path to the output BAM file.
--reference-fasta-file   Path to the reference fasta file, it must be
                         uncompressed and indexed (use 'samtools faidx'
                         for example).
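
The --default-quality-score value is given as a decimal ASCII code rather than as a quality score, which is easy to misread. The sketch below converts such a code into its symbol and score. It assumes the Phred+33 encoding used for quality strings in SAM text, under which the default of 63 corresponds to the symbol '?' and a score of 30. The function name describe_quality and the upper bound of 126 (the last printable ASCII character) are assumptions made for illustration, not toolkit behaviour.

```shell
# Hypothetical helper (not part of cramtools): print the ASCII symbol and
# the quality score for a decimal value as accepted by
# --default-quality-score. Assumes Phred+33 encoding.
describe_quality() {
    code=$1
    # 33 is the documented minimum; 126 is an assumed upper bound
    # (last printable ASCII character).
    if [ "$code" -lt 33 ] || [ "$code" -gt 126 ]; then
        echo "out of range: $code" >&2
        return 1
    fi
    # Convert the decimal code to octal, then let printf %b emit that byte.
    symbol=$(printf '%b' "\\0$(printf '%03o' "$code")")
    echo "$symbol $((code - 33))"
}
```

For example, describe_quality 63 prints the symbol followed by the score for the default value.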

List of cram command options

Option                            Description                                       Default
--capture-all-tags                Capture all tags.                                 false
--capture-tags                    Capture the tags listed, for example 'OQ:XA:XB'.
--encrypt                         Encrypt the CRAM file.                            false
--ignore-tags                     Ignore the tags listed, for example 'OQ:XA:XB'.
--illumina-quality-score-binning  Use Illumina quality score binning scheme.
--input-bam-file, -I              Path to a BAM file to be converted to CRAM. Omit
                                  if standard input (pipe).
--lossy-quality-score-spec, -L    A string specifying which quality scores should
                                  be preserved.
--max-records                     Stop after compressing this many records.         2147483647
--output-cram-file, -O            The path for the output CRAM file. Omit if
                                  standard output (pipe).
--preserve-read-names             Preserve all read names.                          false
--reference-fasta-file, -R        The reference fasta file, uncompressed and
                                  indexed (.fai file, use 'samtools faidx').

Lossy compression

The CRAM Java toolkit lets you specify a lossy compression model with the --lossy-quality-score-spec (-L) option. The model is composed of one or more words separated by '-'. Each word is a read or base selector with a quality score treatment, which is currently either Illumina 8 bins (see below) or full scale (40 values).

Examples:

  • N40-D8 - preserve quality scores for non-matching bases with full precision, and bin quality scores (8 bins) for positions flanking deletions.
  • m5 - preserve quality scores for reads with mapping quality score lower than 5
  • m5_8 - bin quality scores (Illumina 8 bins) for reads with mapping quality score lower than 5 
  • R40X10-N40 - preserve non-matching quality scores and those matching with coverage lower than 10
  • *8 - bin all quality scores (Illumina 8 bins)

Selectors:

  • R - bases matching the reference sequence
  • N - aligned bases mismatching the reference, this only applies to 'M', '=' (EQ) or 'X' BAM cigar elements
  • U - unmapped read
  • Pn - pileup: capture all bases at a given position on the reference if there are at least n mismatches
  • D - read positions flanking a deletion
  • Mn - reads with mapping quality score higher than n
  • mn - reads with mapping quality score lower than n
  • I - insertions
  • * - all

By default no quality scores will be preserved.
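
Since a model is a sequence of words separated by '-', it can help to look at the words one at a time when composing or reviewing a spec. The following is a minimal sketch of that splitting step only. The function name split_lossy_spec is hypothetical, it does not validate selectors or treatments, and it is not how the toolkit itself parses the option.

```shell
# Hypothetical helper (not part of cramtools): print each word of a lossy
# quality score spec on its own line, splitting on '-'.
split_lossy_spec() {
    old_ifs=$IFS
    IFS='-'
    set -f    # disable filename expansion so a '*' selector stays literal
    for word in $1; do
        printf '%s\n' "$word"
    done
    set +f    # note: re-enables expansion even if the caller had disabled it
    IFS=$old_ifs
}
```

For the spec R40X10-N40 this prints R40X10 and N40 on separate lines. When passing a spec such as *8 on the command line, it is safest to quote it (for example -L '*8'), because an unquoted '*' may be expanded by the shell into matching file names.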

Illumina 8-bin scheme

0, 1, 6, 6, 6, 6, 6, 6, 6, 6, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 22, 22, 22, 22, 22, 27, 27, 27, 27, 27, 33, 33, 33, 33, 33, 37,
37, 37, 37, 37, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40
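
Reading the list above as a lookup table, where the value at position q (counting from 0) is the binned score for an original score q, the scheme reduces to a handful of ranges. That reading is an assumption, although it matches the run lengths in the list. The sketch below expresses the same mapping as a function; the name illumina_8bin is hypothetical and the input is assumed to be a non-negative integer.

```shell
# Hypothetical helper (not part of cramtools): map an original quality
# score to its Illumina 8-bin value, following the table above.
illumina_8bin() {
    q=$1
    if   [ "$q" -le 1 ];  then echo "$q"   # 0 and 1 are kept as they are
    elif [ "$q" -le 9 ];  then echo 6      # 2-9
    elif [ "$q" -le 19 ]; then echo 15     # 10-19
    elif [ "$q" -le 24 ]; then echo 22     # 20-24
    elif [ "$q" -le 29 ]; then echo 27     # 25-29
    elif [ "$q" -le 34 ]; then echo 33     # 30-34
    elif [ "$q" -le 39 ]; then echo 37     # 35-39
    else                       echo 40     # 40 and above
    fi
}
```

For instance, an original score of 12 falls in the 10-19 range and is stored as 15, while any score of 40 or more is stored as 40.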