How is ENA structured?

All ENA data is structured around a robust, intuitive metadata model, visualised in Figure 1. In addition to submitting the sequence data itself, users must register a Study/Project to contain the data and describe its purpose. Most sequence data also require the registration of Samples, which describe the origins of the biomaterial that was sequenced.

Figure 1 The ENA metadata model provides a framework to ensure that data always includes plenty of metadata to contextualise it
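The relationships described above can be sketched as a small data structure. This is only an illustrative sketch, not ENA's actual schema or submission API: the class names (`Study`, `Sample`, `SequenceData`), their fields, and the `describe` helper are all hypothetical. It assumes only what the text states: every piece of sequence data belongs to a Study/Project, and most (but not all) also reference a Sample.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Study:
    """Hypothetical stand-in for a registered Study/Project."""
    alias: str
    description: str  # the purpose of the data


@dataclass(frozen=True)
class Sample:
    """Hypothetical stand-in for a registered Sample."""
    alias: str
    origin: str  # where the sequenced biomaterial came from


@dataclass(frozen=True)
class SequenceData:
    """Sequence data always points at a Study; a Sample is optional."""
    name: str
    study: Study                     # required: no default value
    sample: Optional[Sample] = None  # most, but not all, data has one


def describe(data: SequenceData) -> str:
    """Summarise the metadata that contextualises a piece of sequence data."""
    sample_part = data.sample.origin if data.sample else "no sample registered"
    return f"{data.name} | study: {data.study.alias} | sample: {sample_part}"


# Illustrative values only; none of these are real ENA records.
study = Study(alias="demo_study", description="Illustrative example only")
sample = Sample(alias="demo_sample", origin="soil isolate")
reads = SequenceData(name="demo_reads", study=study, sample=sample)
summary = describe(reads)
print(summary)
```

Making `study` a required field mirrors the rule that data cannot exist without a containing Study/Project, while the optional `sample` reflects that only most sequence data require one.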

The majority of sequence data can be divided into three tiers, which build upon one another (Figure 2):

  • Reads: the raw output of sequencing machines
  • Assembly: reconstructions of replicons (or fragments thereof) made from raw reads
  • Annotation: functional information projected onto assemblies at coordinate-defined locations

Figure 2 The ENA’s three-tiered data architecture. Reads may be used to reassemble chromosomes or other replicons, and assemblies may have functional annotation applied to them.
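The way the three tiers build on one another can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how ENA stores or validates data: the classes `Reads`, `Assembly`, and `Annotation` are hypothetical, the sequences are made up, and the 1-based inclusive coordinate convention is an assumption chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reads:
    """Tier 1: raw output of a sequencing machine (toy sequences)."""
    sequences: List[str]


@dataclass(frozen=True)
class Assembly:
    """Tier 2: a replicon (or fragment) reconstructed from raw reads."""
    sequence: str
    source: Reads  # the reads this assembly was built from


@dataclass(frozen=True)
class Annotation:
    """Tier 3: functional information placed at coordinates on an assembly."""
    assembly: Assembly
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive
    function: str

    def __post_init__(self) -> None:
        # An annotation only makes sense inside the assembly it refers to.
        if not (1 <= self.start <= self.end <= len(self.assembly.sequence)):
            raise ValueError("annotation coordinates fall outside the assembly")

    def annotated_sequence(self) -> str:
        """Return the stretch of the assembly covered by this annotation."""
        return self.assembly.sequence[self.start - 1:self.end]


# Two made-up overlapping reads and the assembly they would reconstruct.
reads = Reads(sequences=["ACGTAC", "TACGGA"])
assembly = Assembly(sequence="ACGTACGGA", source=reads)
gene = Annotation(assembly=assembly, start=4, end=9, function="hypothetical protein")
print(gene.annotated_sequence())
```

Each tier holds a reference to the one beneath it, so an annotation can always be traced back through its assembly to the reads it came from.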