CRAM usage
Download
The CRAM Java toolkit can be downloaded as follows:
CRAM 2.0 current production toolkit
CRAM 2.0 is also supported in the latest Staden io_lib package.
Format specifications
The CRAM format specification is available as follows:
CRAM 2.0 current format specification
CRAM 1.0 previous format specification
For detailed information about the preservation of source BAM data in CRAM format, please refer to the ArchiveCRAM 1.0 specification.
Reporting bugs
Please report any bugs, issues and problems as follows:
Mailing list
Join the mailing list for discussions and announcements relating to CRAM tools. Mailing list archives are available here.
Download the executable
Download the .jar executables as follows:
CRAM 2.0 current production toolkit
The program requires the Java 1.6 runtime or higher:
or
http://www.oracle.com/us/technologies/java/index.html
Download and build from source
To build the program from source:
To check out the source code from GitHub you will need a git client: http://git-scm.com/
Make sure you have the Java 1.6 SDK or higher:
or
http://www.oracle.com/us/technologies/java/index.html
Make sure you have Ant version 1.7 or higher:
Run one of the following commands:
git clone git://github.com/enasequence/cramtools.git (to retrieve CRAM 2.0)
git clone git://github.com/vadimzalunin/crammer.git (to retrieve CRAM 1.0)
and then each of the following commands:
ant -f build/build.xml runnable
java -jar cramtools.jar
To run unit tests:
ant -f build/build.xml test
Accessing command line help
General options:
java -jar cramtools-<version>.jar
BAM to CRAM conversion options:
java -jar cramtools-<version>.jar cram
CRAM to BAM conversion options:
java -jar cramtools-<version>.jar bam
Convert a BAM file to a CRAM file
A typical command to convert a BAM file into a CRAM file is:
java -jar cramtools-<version>.jar cram --input-bam-file <bam file> --reference-fasta-file <reference fasta file> [--output-cram-file <output cram file>]
Input files:
- <bam file>: BAM file sorted by reference coordinates
- <bam file>.bai: BAM index file created using samtools index <bam file>
- <reference fasta file>: Reference sequence in fasta format
- <reference fasta file>.fai: Reference sequence index file created using samtools faidx <reference fasta file>
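The input requirements above can be checked before the conversion is launched. The following Python sketch is illustrative only: the helper names `missing_inputs` and `cram_command` are hypothetical and not part of the toolkit, and only the options shown in the typical command above are used.

```python
from pathlib import Path


def missing_inputs(bam, fasta):
    """Return the required input files that are not present on disk."""
    required = [bam, bam + ".bai", fasta, fasta + ".fai"]
    return [name for name in required if not Path(name).is_file()]


def cram_command(jar, bam, fasta, out_cram=None):
    """Build the argument list for a BAM to CRAM conversion."""
    argv = ["java", "-jar", jar, "cram",
            "--input-bam-file", bam,
            "--reference-fasta-file", fasta]
    if out_cram is not None:
        # The output option is optional in the typical command.
        argv += ["--output-cram-file", out_cram]
    return argv
```

Once `missing_inputs` returns an empty list, the result of `cram_command` could be passed to `subprocess.run(argv, check=True)`.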
Convert a CRAM file to a BAM file
Input files:
- <cram file>: CRAM file
- <reference fasta file>: Uncompressed reference sequence in fasta format
- <reference fasta file>.fai: Reference sequence index file
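By analogy with the BAM to CRAM case, a conversion command can be assembled from the bam command options listed in the options table on this page. The Python sketch below is an illustration under stated assumptions: the helper name `bam_command` is hypothetical, and the MD and NM options are assumed to be simple switches that take no value.

```python
def bam_command(jar, cram, fasta, out_bam, md_tag=False, nm_tag=False):
    """Build the argument list for a CRAM to BAM conversion."""
    argv = ["java", "-jar", jar, "bam",
            "--input-cram-file", cram,
            "--reference-fasta-file", fasta,
            "--output-bam-file", out_bam]
    # Assumption: these options are boolean switches without a value.
    if md_tag:
        argv.append("--calculate-md-tag")
    if nm_tag:
        argv.append("--calculate-nm-tag")
    return argv
```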
Using CRAM files with Picard
Some tools using the Picard API are expected to work for reading CRAM files. For example:
java -cp cramtools-<version>.jar net.sf.picard.sam.ValidateSamFile INPUT=data.cram
will validate the CRAM file using the ValidateSamFile Picard tool.
Please note that the CRAM toolkit replaces some of the Picard API classes. Therefore, the following will not work:
java -cp cramtools-<version>.jar -jar ValidateSamFile.jar INPUT=data.cram
The reference sequence for an input CRAM file can be supplied in one of the following ways:
- Given an input file '<some name>.cram' search for a '<some name>.fa' file in the same directory.
- Given an input file '<some name>.cram' search for a '<some name>.fa' file in the same directory, which should contain a full path to the reference file.
- Use the Java property 'reference=<path to ref file>', for example: java -Dreference=<path to ref file> -cp cramtools.jar ...
List of CRAM tools commands
| Command | Description |
| bam | Converts a CRAM file into a BAM file. |
| cram | Converts a BAM file into a CRAM file. |
List of bam command options
| Option | Description | Default |
| --calculate-md-tag | Compute MD tag. | false |
| --calculate-nm-tag | Compute NM tag. | false |
| --decrypt | Decrypt the file. | false |
| --default-quality-score | Use this quality score (decimal representation of ASCII symbol) as a default value when the original quality score was lost due to compression. Minimum is 33. | 63 |
| --input-cram-file -I | The path to the CRAM file. | |
| --output-bam-file -O | The path to the output BAM file. | |
| --reference-fasta-file | Path to the reference fasta file, it must be uncompressed and indexed (use 'samtools faidx' for example). | |
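The --default-quality-score value is an ASCII code rather than a Phred score. Assuming the common Phred+33 text encoding (an assumption suggested by the stated minimum of 33, not confirmed on this page), the default of 63 can be decoded as in the Python sketch below; the function name is hypothetical.

```python
def decode_quality_code(ascii_code, offset=33):
    """Return (symbol, phred) for an ASCII quality code, assuming Phred+33."""
    if ascii_code < offset:
        raise ValueError("quality code must be at least %d" % offset)
    return chr(ascii_code), ascii_code - offset


print(decode_quality_code(63))  # ('?', 30)
```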
List of cram command options
| Option | Description | Default |
| --capture-all-tags | Capture all tags. | false |
| --capture-tags | Capture the tags listed, for example 'OQ:XA:XB'. | |
| --encrypt | Encrypt the CRAM file. | false |
| --ignore-tags | Ignore the tags listed, for example 'OQ:XA:XB'. | |
| --illumina-quality-score-binning | Use Illumina quality score binning scheme. | |
| --input-bam-file -I | Path to a BAM file to be converted to CRAM. Omit if standard input (pipe). | |
| --lossy-quality-score-spec, -L | A string specifying which quality scores should be preserved. | |
| --max-records | Stop after compressing this many records. | 2147483647 |
| --output-cram-file -O | The path for the output CRAM file. Omit if standard output (pipe). | |
| --preserve-read-names | Preserve all read names. | false |
| --reference-fasta-file -R | The reference fasta file, uncompressed and indexed (.fai file, use 'samtools faidx'). | |
Lossy compression
The CRAM Java Toolkit lets you specify a lossy compression model using the --lossy-quality-score-spec (-L) option. The model is composed of one or more words separated by '-'. Each word is a read or base selector with a quality score treatment, currently either Illumina 8 bins (see below) or full scale (40 values).
Examples:
- N40-D8 - preserve quality scores for non-matching bases with full precision, and bin quality scores (8 bins) for positions flanking deletions.
- m5 - preserve quality scores for reads with mapping quality score lower than 5
- m5_8 - bin quality scores (Illumina 8 bins) for reads with mapping quality score lower than 5
- R40X10-N40 - preserve non-matching quality scores and those matching with coverage lower than 10
- *8 - bin all quality scores (Illumina 8 bins)
Selectors:
- R - bases matching the reference sequence
- N - aligned bases mismatching the reference, this only applies to 'M', '=' (EQ) or 'X' BAM cigar elements
- U - unmapped read
- Pn - pileup: capture all bases at a given position on the reference if there are at least n mismatches
- D - read positions flanking a deletion
- Mn - reads with mapping quality score higher than n
- mn - reads with mapping quality score lower than n
- I - insertions
- * - all
By default no quality scores will be preserved.
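The word structure of a specification can be illustrated with a small Python sketch. It only splits a specification into '-'-separated words and separates the leading selector character from the rest of each word. It does not interpret parameters or treatments, since the full grammar is not given here. The function name `split_lossy_spec` is hypothetical and not part of the toolkit.

```python
def split_lossy_spec(spec):
    """Split a lossy spec into (selector, remainder) pairs, one per word."""
    words = [word for word in spec.split("-") if word]
    if not words:
        raise ValueError("empty specification")
    # The first character of each word is taken as the selector; the
    # remainder is returned uninterpreted.
    return [(word[0], word[1:]) for word in words]


print(split_lossy_spec("N40-D8"))  # [('N', '40'), ('D', '8')]
print(split_lossy_spec("m5_8"))    # [('m', '5_8')]
```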
Illumina 8-bin scheme
0, 1, 6, 6, 6, 6, 6, 6, 6, 6, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 22, 22, 22, 22, 22, 27, 27, 27, 27, 27, 33, 33, 33, 33, 33, 37,
37, 37, 37, 37, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40
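Reading the table above as a lookup in which the value at position i (counting from 0) is the binned score for original score i, the scheme can be sketched in Python as follows. The function names are hypothetical, and the string helper assumes Phred+33 encoded quality strings.

```python
def illumina_8bin(q):
    """Map a Phred quality score to its bin, following the table above."""
    if q < 0:
        raise ValueError("quality score must be non-negative")
    if q < 2:
        return q  # scores 0 and 1 are kept unchanged
    # Pairs of (exclusive upper bound, binned value).
    bins = ((10, 6), (20, 15), (25, 22), (30, 27), (35, 33), (40, 37))
    for upper, binned in bins:
        if q < upper:
            return binned
    return 40  # scores of 40 and above


def bin_quality_string(qualities, offset=33):
    """Apply the binning to every symbol of a quality string (assumes Phred+33)."""
    return "".join(chr(illumina_8bin(ord(c) - offset) + offset) for c in qualities)
```

For example, `illumina_8bin(23)` returns 22 and `illumina_8bin(37)` returns 37.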

