CRAM toolkit

CRAM is a framework technology, comprising a file format and a toolkit, that combines highly efficient and tunable reference-based compression of sequence data with a data format that is directly available for computational use. In support of CRAM, we also provide the CRAM reference registry.

Building on early proof-of-principle work on reference-based compression (Hsi-Yang Fritz et al. (2011). Genome Res. 21:734-740), our approach has been to balance usability with compression efficiency. CRAM supports production pipelines for the European Nucleotide Archive*. Current work includes improvements to functionality and broader integration with third-party tools. We remain involved in the community discussion around the application of CRAM to different types of data (Cochrane G. et al. (2012). GigaScience 1:2).
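To illustrate the basic idea behind reference-based compression, the sketch below stores an aligned read as its position on the reference plus only the bases that differ from it. This is a simplified illustration under stated assumptions, not how CRAM lays out records: the names `EncodedRead`, `encode_read` and `decode_read` are hypothetical, and the sketch assumes ungapped alignments and ignores quality scores, read names and the further entropy coding a real format would apply.

```python
from dataclasses import dataclass, field


@dataclass
class EncodedRead:
    """Hypothetical record: a read stored as a position plus its mismatches."""
    position: int  # 0-based start of the read on the reference
    length: int  # number of bases in the read
    # (offset within the read, base found in the read) for each mismatch
    substitutions: list[tuple[int, str]] = field(default_factory=list)


def encode_read(reference: str, read: str, position: int) -> EncodedRead:
    """Keep only the bases where the read differs from the reference.

    Assumes an ungapped alignment (no insertions, deletions or clipping).
    """
    if position < 0 or position + len(read) > len(reference):
        raise ValueError("read does not fit inside the reference")
    subs = [
        (i, base)
        for i, base in enumerate(read)
        if reference[position + i] != base
    ]
    return EncodedRead(position, len(read), subs)


def decode_read(reference: str, encoded: EncodedRead) -> str:
    """Rebuild the original read from the reference and the stored differences."""
    bases = list(reference[encoded.position : encoded.position + encoded.length])
    for offset, base in encoded.substitutions:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    read = "GTTAGCAGAT"  # differs from the reference at one base
    encoded = encode_read(reference, read, 6)
    print(encoded)  # EncodedRead(position=6, length=10, substitutions=[(6, 'A')])
    assert decode_read(reference, encoded) == read
```

Because reads from resequencing experiments usually match the reference closely, the list of stored differences is typically short, which is where the space saving comes from; the trade-off is that the same reference must be available again at decoding time.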

The latest CRAM version is CRAM 2.1. Please note that minor revisions to the format are possible during the two-month review period.

*See the ENA policy on data compression.

Download

The CRAM Java toolkit can be downloaded as follows:

CRAM 2.1 current production toolkit

CRAM 1.0 previous toolkit

Beta support for CRAM 2.1 is also available in the latest Staden io_lib package.

Format specification

The CRAM format specification is available as follows:

CRAM 2.1 current production specification

CRAM 1.0 previous specification

For detailed information about the preservation of data from source BAM files in CRAM format, please refer to the ArchiveCRAM 1.0 specification.

Mailing list

We encourage you to join the CRAM developers mailing list.

Latest ENA News

20 Aug 2014: Read data through Globus GridFTP
Read data can now be downloaded using Globus GridFTP through the ebi#ena Globus Online public endpoint.

18 Aug 2014: Changes to SRA XML 1.5
Small changes to Experiment XML, Analysis XML, EGA Dataset XML, EGA DAC XMLs were deployed on 11th of August 2014.

1 Jul 2014: ENA release 120
Release 120 of ENA's assembled/annotated sequences is now available.

23 May 2014: Change to date format for advanced search
From 16th June 2014, the date format used in the advanced search will be changed to ISO format (YYYY-MM-DD).

20 May 2014: Update to the ENA SAMPLE checklist
From 10th June 2014, the ENA SAMPLE checklist XML will be updated and the older version will be deprecated.