What are metadata and why are they so important?
Metadata are the in-depth, controlled description of the sample that your sequence was taken from. Essentially, they record the ‘what, where, how, and when’ of your study from collection to sequence generation, plus contextual data such as environmental conditions (latitude, longitude, temperature) or clinical observations.
It is essential to describe your samples with such data in order to carry out a meaningful comparison with other samples or projects. For this to happen, all submitters must describe their metadata using a common, constrained set of terms.
Using a controlled vocabulary to describe metadata
To help describe data, ENA uses sets of controlled vocabularies such as the Environment Ontology (ENVO), which can be accessed on the ENVO page and explored using the Ontology Lookup Service (OLS) (Figure 1).

Here are some of the terms that could be used to describe it:
Can you see the confusion that could arise by not using a controlled vocabulary? However, by using the ENVO term freshwater lake (ENVO:00000021), there is no ambiguity.
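The effect of a controlled vocabulary can be illustrated with a minimal Python sketch. The sample records and free-text labels below are hypothetical and were invented for this illustration; only the term freshwater lake (ENVO:00000021) comes from the text above. The identifier pattern (the prefix ENVO: followed by eight digits) is an assumption based on that example, not an official validation rule.

```python
import re
from collections import Counter

# Assumption: ENVO identifiers look like "ENVO:" followed by eight digits,
# as in ENVO:00000021. This is a format check only; it does not confirm
# that the term actually exists in the ontology.
ENVO_ID = re.compile(r"^ENVO:\d{8}$")

# Hypothetical sample records: three submitters describe the same kind of
# environment with different free-text labels but the same ENVO term.
samples = [
    {"sample": "S1", "free_text": "lake", "envo_id": "ENVO:00000021"},
    {"sample": "S2", "free_text": "Freshwater Lake", "envo_id": "ENVO:00000021"},
    {"sample": "S3", "free_text": "fresh water", "envo_id": "ENVO:00000021"},
]


def group_by(records, key):
    """Count how many records share each distinct value of `key`."""
    return Counter(record[key] for record in records)


def is_valid_envo_id(value):
    """Return True if `value` matches the assumed ENVO identifier format."""
    return bool(ENVO_ID.match(value))


free_text_groups = group_by(samples, "free_text")
envo_groups = group_by(samples, "envo_id")

print("Groups by free text:", len(free_text_groups))
print("Groups by ENVO term:", len(envo_groups))
```

Grouping by free text splits the three samples into three apparent environments, while grouping by the ENVO identifier places them in one, which is what makes comparison across projects possible.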