ENA policy relating to compression of submitted data
The European Nucleotide Archive (ENA) is committed to safeguarding the world's public-domain nucleic acid sequencing data for the future.
To provide economically sustainable archiving, the ENA team is actively developing CRAM, a technology for compressing raw sequence read data. This technology offers both lossless compression, in which read sequence and per-base quality information are faithfully preserved, and lossy models, in which data are selectively reduced to reach an optimal balance between data preservation and compression.
Our aim with CRAM is to provide a flexible technological framework in which data producers, the broad scientific community that consumes ENA data, and funding agencies are empowered to decide what level of compression can appropriately be applied to different data sets.
ENA does not currently apply CRAM compression to incoming data and will not in the future apply lossy compression to submitted data without prior announcement and prior consultation with principal stakeholders. In addition, for legacy data already submitted and loaded into ENA, we will not seek to apply lossy compression without discussion with data owners.
Users may be aware that we currently preserve the original submitted data files. Once data are loaded, the information in these files is redundant with that integrated into ENA. As such, we have never committed to preserving these submitted files and will, in due course, cease to store them.