ENA policy relating to compression of submitted data

The European Nucleotide Archive (ENA) is committed to safeguarding the world's public domain nucleic acid sequencing data for the future.

To provide economically sustainable archiving, the ENA team is actively developing CRAM, a technology for compressing raw sequence read data. This technology offers both lossless compression, in which read sequence and per-base quality information are faithfully preserved, and lossy models, in which data are selectively reduced to reach an optimal balance between data preservation and compression.

Our aim with CRAM is to provide a flexible technological framework in which data producers, the broad scientific community that consumes ENA data, and funding agencies are empowered to decide what level of compression can appropriately be applied to different data sets.

ENA does not currently apply CRAM compression to incoming data and will not in the future apply lossy compression to submitted data without prior announcement and prior consultation with principal stakeholders. In addition, for legacy data already submitted and loaded into ENA, we will not seek to apply lossy compression without discussion with data owners.

Users may be aware that we currently preserve original submitted data files. Once data are loaded, these files duplicate information already integrated into ENA. We have therefore never committed to preserving these submitted files and will, in due course, cease to store them.
