The following files relate to the experiment described in the manuscript "Toward practical high-capacity low-maintenance storage of digital information in synthesised DNA" by Goldman et al.
The five files encoded in DNA and decoded again can be downloaded from this directory. Please read the README.html file.
The encoded file View_huff3.cd.new contained two minor errors. These did not affect our experiment in any way (the errors were present in identical form in the file before encoding in DNA and in the decoded version), but they mean that the file does not give a full specification of our base-3 Huffman code as intended. The corrected version of this file is available as View_huff3.cd.new.correct.
Each file is also available as a single long string of DNA. This is the intermediate stage of the encoding procedure at which the file-length information has been added, the result has been padded to a length that is a multiple of 25, and it has then been converted to DNA nucleotides such that no two adjacent nucleotides are identical. These files can be downloaded from this directory.
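A minimal sketch of the last two steps, assuming a "rotating" code in which each base-3 digit (trit) selects one of the three nucleotides that differ from the previous one, so that repeated bases can never occur. The lookup table NEXT_BASE, the starting context "A", the helper names and the choice to append zero trits at the end are all illustrative assumptions; the authors' exact table and padding position may differ.

```python
# Hypothetical rotating-code table: for each previous nucleotide, trits
# 0, 1, 2 select (in order) the three nucleotides that differ from it.
NEXT_BASE = {
    "A": "CGT",
    "C": "GTA",
    "G": "TAC",
    "T": "ACG",
}


def pad_to_multiple(trits, block=25):
    """Append zero trits until the length is a multiple of `block`.

    Illustrative only: where the padding sits in the real scheme may differ.
    """
    return list(trits) + [0] * (-len(trits) % block)


def trits_to_dna(trits, previous="A"):
    """Convert trits (0, 1, 2) to DNA with no two identical adjacent bases.

    `previous` is an assumed starting context, not taken from the source.
    """
    bases = []
    for t in trits:
        if t not in (0, 1, 2):
            raise ValueError(f"not a trit: {t!r}")
        previous = NEXT_BASE[previous][t]  # never equal to the old `previous`
        bases.append(previous)
    return "".join(bases)


def dna_to_trits(dna, previous="A"):
    """Invert trits_to_dna, given the same starting context.

    str.index raises ValueError on a repeated base, which the code forbids.
    """
    trits = []
    for base in dna:
        trits.append(NEXT_BASE[previous].index(base))
        previous = base
    return trits
```

For example, trits_to_dna([2, 0, 1, 1, 0]) gives "TAGAC", and dna_to_trits("TAGAC") recovers [2, 0, 1, 1, 0]. Because each output base depends on the one before it, decoding must also walk the string left to right from the same starting context.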
The 153,335 designed DNA strings ("features") are available in the file features.txt (28 MB). The order of the features is randomised. Every feature includes a 33 bp adapter sequence at each end. The adapters carry no information; they were included to facilitate amplification and sequencing. The file is also available in compressed form as features.txt.gz (5 MB).
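A minimal sketch for recovering the information-carrying core of each feature by trimming the 33 bp adapter from both ends. It assumes, hypothetically, that features.txt (or features.txt.gz) holds one bare sequence per line; if the file also carries identifiers or other columns, the line parsing would need adjusting. The function names are illustrative.

```python
import gzip

ADAPTER_LEN = 33  # adapter length at each end of a feature, as stated above


def strip_adapters(feature, adapter_len=ADAPTER_LEN):
    """Return the part of one feature that lies between its two adapters."""
    feature = feature.strip()
    if len(feature) < 2 * adapter_len:
        raise ValueError("feature is shorter than its two adapters")
    # Explicit end index (not -adapter_len) so adapter_len=0 also works.
    return feature[adapter_len:len(feature) - adapter_len]


def read_cores(path, adapter_len=ADAPTER_LEN):
    """Yield adapter-stripped features from a plain or gzip-compressed file.

    Assumes one feature per line; blank lines are skipped.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield strip_adapters(line, adapter_len)
```

Reading line by line keeps memory use small even for the uncompressed 28 MB file, and the same function works on either download.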
The base calls from the AYB analysis of the Illumina HiSeq 2000 sequencing run are available in the files s_2_PE1.fq (FASTQ format, first paired-end reads) and s_2_PE2.fq (FASTQ format, second paired-end reads). These are large files (approx. 18 GB each), so you will probably want to right-click and save them rather than left-click and view them in your browser. The files are also available in compressed form as PE1_AYB.fq.gz and PE2_AYB.fq.gz (5 GB each).
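Given the file sizes, the reads are best processed as a stream rather than loaded into memory. A minimal sketch of a FASTQ reader, assuming the common four-line record layout (header, sequence, "+", quality) without wrapped sequence lines; read_fastq and FastqRecord are illustrative names, not part of any distributed tool.

```python
import gzip
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(path: str) -> Iterator[FastqRecord]:
    """Stream records from a plain or gzip-compressed FASTQ file.

    Assumes four lines per record with no line wrapping.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:  # end of file
                return
            sequence = handle.readline().rstrip("\n")
            plus = handle.readline()
            quality = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(sequence) != len(quality):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)
```

Because only one record is held at a time, memory use does not grow with file size. The two paired-end files could be walked together with zip(read_fastq("PE1_AYB.fq.gz"), read_fastq("PE2_AYB.fq.gz")), assuming mates appear in the same order in both files.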
The base calls from the Bustard analysis of the same sequencing run are available in the files PE1.fq and PE2.fq, and in compressed form as PE1.fq.gz and PE2.fq.gz.
These data are also available in the Sequence Read Archive (SRA) with accession number ERP002040: http://www.ebi.ac.uk/ena/data/view/ERP002040. The AYB calls have Run reference number ERR215679: http://www.ebi.ac.uk/ena/data/view/ERR215679. The Bustard calls have Run reference number ERR207813: http://www.ebi.ac.uk/ena/data/view/ERR207813.