Why do we need ENA?

It is important to maintain a record of all the original (primary) sequence data produced, not only because they form an invaluable resource for biological research, but also because these data act as reference material from which a wealth of secondary data are derived, such as consensus genomic sequences and translated protein sequences.

The volume of sequencing data being produced is increasing as sequencing technology advances. Not only do we require complete genomes, but we also want access to all the variation seen between individuals, for example disease alleles. Such volumes of data present a challenge in terms of organisation and accessibility (1).

Understanding how sequence databases archive and annotate their data can greatly improve your ability to find what you want quickly and to interpret the information you find correctly. ENA provides access to all nucleotide sequence data, including assembled and annotation-enriched data, as well as raw data as soon as they become available, regardless of the sequencing technology used (Figure 1).

[Image: sequence data pipeline]

Figure 1. ENA provides access to raw, assembled and annotated sequence data.