CRAM

 

CRAM is a sequencing read file format that achieves high space efficiency through reference-based compression of sequence data, and it offers both lossless and lossy compression modes. Building on an early proof of principle for reference-based compression (Hsi-Yang Fritz et al. (2011). Genome Res. 21:734-740), the CRAM format balances usability with compression efficiency.
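To illustrate the idea behind reference-based compression, here is a minimal sketch in Python. It is a toy model only, not the CRAM encoding itself: it assumes each read aligns to the reference without insertions or deletions, and it ignores quality scores, read names and the entropy coding that a real implementation uses. The function names `encode_read` and `decode_read` are hypothetical and exist only for this example.

```python
from typing import List, Tuple

# A mismatch is stored as (offset within the read, base observed in the read).
Diff = Tuple[int, str]
# A toy "record": alignment start, read length, and the list of mismatches.
Record = Tuple[int, int, List[Diff]]


def encode_read(reference: str, pos: int, read: str) -> Record:
    """Describe a read by where it aligns and how it differs from the reference."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, base) for i, (ref_base, base) in enumerate(zip(window, read))
             if ref_base != base]
    return pos, len(read), diffs


def decode_read(reference: str, record: Record) -> str:
    """Rebuild the original read from the reference plus the stored differences."""
    pos, length, diffs = record
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGGCTA"
read = "CGTTAGACGA"  # aligns at position 5 with a single substitution

record = encode_read(reference, 5, read)
print(record)  # (5, 10, [(6, 'A')])
print(decode_read(reference, record) == read)  # True
```

Because most reads match the reference closely, the stored record is usually much smaller than the read itself. The trade-off is that decoding requires access to exactly the same reference sequence that was used for encoding.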

The format specification is maintained by the Global Alliance for Genomics and Health (GA4GH) Large Scale Genomics workstream, whose members provide multiple implementations and coordinate future specification changes. In support of CRAM, the ENA provides the CRAM reference registry for serving reference sequences to users of the CRAM format.
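As a rough sketch of how a tool might locate a reference sequence in such a registry, the Python example below computes a checksum for a sequence and builds a lookup URL from it. Several things here are assumptions that this page does not state: that sequences are identified by an MD5 checksum computed over the normalised sequence (characters outside the printable range `!` to `~` removed, then upper-cased, as the SAM specification describes for the `M5` header tag), and that the registry exposes them under the URL pattern shown in `REGISTRY_MD5_URL`. Check both against the current registry documentation before relying on them. The helper names are hypothetical.

```python
import hashlib

# Assumed URL pattern for the registry; verify before use.
REGISTRY_MD5_URL = "https://www.ebi.ac.uk/ena/cram/md5/{md5}"


def sequence_md5(sequence: str) -> str:
    """MD5 of a sequence after dropping non-printable characters and upper-casing."""
    normalized = "".join(ch for ch in sequence if "!" <= ch <= "~").upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def registry_url(sequence: str) -> str:
    """Build the (assumed) registry URL for a reference sequence."""
    return REGISTRY_MD5_URL.format(md5=sequence_md5(sequence))


# Line breaks and letter case do not change the checksum.
print(sequence_md5("acgt\nACGT") == sequence_md5("ACGTACGT"))  # True
```

Normalising before hashing means that the same sequence yields the same identifier regardless of how the FASTA file that contained it was wrapped or cased.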

The latest CRAM version is CRAM 3.0.

*Note: see the ENA policy on data compression.

CRAM 3.0 implementations

SAMtools/htslib

GATK/htsjdk

Integrative Genomics Viewer (IGV)

GMOD/cram-js

Format specification

CRAM 3.0 specification

CRAM 2.1 specification

CRAM 1.0 specification

Mailing lists

We encourage you to join the samtools developers mailing list and the GA4GH Large Scale Genomics workstream.