CRAM toolkit

CRAM is a framework technology comprising a file format and a toolkit, in which we combine highly efficient and tunable reference-based compression of sequence data with a data format that is directly available for computational use. In support of CRAM, we also provide the CRAM reference registry.
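The core idea of reference-based compression is that a read aligned to a known reference need not store every base; it can store its alignment position and only the bases that differ from the reference. The Python sketch below illustrates that idea in its simplest form. It is a toy, not the CRAM encoding: the `EncodedRead`, `encode` and `decode` names are hypothetical, and it assumes gapless alignments with substitutions only. A real format must also handle insertions, deletions, clipping, quality scores and read names, and then entropy-code the result.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EncodedRead:
    """Hypothetical toy record: alignment position plus substitutions only."""
    position: int                          # 0-based offset of the read on the reference
    length: int                            # read length in bases
    substitutions: List[Tuple[int, str]]   # (offset within read, read base) where it differs


def encode(read: str, reference: str, position: int) -> EncodedRead:
    """Keep only the bases that differ from the reference at the aligned position.

    Assumes a gapless alignment (no insertions or deletions).
    """
    ref_slice = reference[position:position + len(read)]
    if len(ref_slice) != len(read):
        raise ValueError("read extends past the end of the reference")
    subs = [(i, base) for i, (base, ref_base) in enumerate(zip(read, ref_slice))
            if base != ref_base]
    return EncodedRead(position, len(read), subs)


def decode(record: EncodedRead, reference: str) -> str:
    """Rebuild the read by copying the reference and re-applying substitutions."""
    bases = list(reference[record.position:record.position + record.length])
    for offset, base in record.substitutions:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAACG"
read = "ACGTTAGGCGAT"  # aligns at position 4 with one mismatch

record = encode(read, reference, 4)
print(record)  # EncodedRead(position=4, length=12, substitutions=[(7, 'G')])
assert decode(record, reference) == read
```

A 12-base read is reduced to a position, a length and a single substitution. Decoding is only possible with access to the same reference sequence, which is why the exact reference used must be retrievable later.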

Building on an early proof of principle for reference-based compression (Hsi-Yang Fritz et al. (2011). Genome Res. 21:734-740), our approach has been to balance usability with compression efficiency. CRAM supports production pipelines for the European Nucleotide Archive*. Current work includes improvements to functionality and broader integration with third-party tools. We remain involved in the community discussion around the application of CRAM to different types of data (Cochrane G. et al. (2012). GigaScience 1:2).

The latest CRAM version is CRAM 2.1. Please note that minor revisions to the format are possible during the two-month review period.

*Note the ENA policy on data compression.

Download

The CRAM Java toolkit can be downloaded as follows:

CRAM 2.1 current production toolkit

CRAM 1.0 previous toolkit

CRAM is also supported by SAMtools.

Format specification

The CRAM format specification is available as follows:

CRAM 2.1 current production specification

CRAM 1.0 previous specification

For detailed information about the preservation of data from source BAM files in CRAM format, please refer to the ArchiveCRAM 1.0 specification.

Mailing list

We encourage membership of the CRAM developers mailing list.
