How to search ENA with taxonomy
How to use the taxonomy portal
To find out what sequence data are available for a species, use the taxonomy portal. It lets you navigate a taxonomic tree and shows a summary of the sequences available at each taxonomic node (see 'Navigating the taxonomy tree'), so you can check the total coverage for any organism or for any node in the tree.
When doing a text search by taxonomy, it is best to use the scientific name, because it is more precise. You can also search with a common name, but you are more likely to get results for several different taxa. For example, Figure 22 shows a query on the common name 'honey bee', which returns results for four different taxa.
Figure 22. Results of an ENA browser text search on 'Honey bee'; taxonomy results are found in the 'Other' section.
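The same distinction applies if you look up taxa programmatically. The sketch below assumes the ENA taxonomy REST service at `https://www.ebi.ac.uk/ena/taxonomy/rest`, with a `scientific-name` endpoint for exact scientific names and an `any-name` endpoint that also matches common names; the function names are illustrative, and the JSON response shape should be checked against the current ENA documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the ENA taxonomy REST service.
TAXONOMY_REST = "https://www.ebi.ac.uk/ena/taxonomy/rest"


def taxonomy_lookup_url(name: str, exact_scientific: bool = True) -> str:
    """Build a lookup URL for a taxon name.

    exact_scientific=True uses the (assumed) scientific-name endpoint, which is
    precise; False uses any-name, which may return several taxa for a common name.
    """
    endpoint = "scientific-name" if exact_scientific else "any-name"
    # Percent-encode the name so that spaces become %20 in the URL path.
    return f"{TAXONOMY_REST}/{endpoint}/{quote(name)}"


def fetch_taxa(url: str) -> list:
    """Fetch and decode the JSON list of matching taxa (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    print(taxonomy_lookup_url("Apis mellifera"))
    print(taxonomy_lookup_url("honey bee", exact_scientific=False))
```

Comparing the number of records returned by the two URLs for 'honey bee' is a quick way to see how much broader a common-name search is.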
A closer look at the taxonomy portal
By expanding the 'Taxa results' section, you can see a summary of the nucleotide information available for a taxon (Figure 23):
Figure 23. Taxonomy portal detailing the nucleotide information available for Apis mellifera (honey bee).
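A similar per-taxon summary can be assembled from counts. This is a minimal sketch, assuming the ENA Portal API `count` endpoint, its `tax_eq()` query function (records assigned exactly to one taxon) and the result type names listed in `RESULT_TYPES`; it only builds the URLs, and 7460 is the NCBI taxonomy ID for Apis mellifera.

```python
from urllib.parse import urlencode

# Assumed base URL of the ENA Portal API.
PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"

# Result types to summarise (names assumed from the Portal API's result list).
RESULT_TYPES = ("sequence", "coding", "assembly", "read_run")


def count_url(result: str, tax_id: int) -> str:
    """URL counting records of one result type assigned exactly to tax_id."""
    params = {"result": result, "query": f"tax_eq({tax_id})"}
    return f"{PORTAL_API}/count?{urlencode(params)}"


def taxon_summary_urls(tax_id: int) -> dict:
    """One count URL per result type, loosely mirroring the portal summary."""
    return {result: count_url(result, tax_id) for result in RESULT_TYPES}


if __name__ == "__main__":
    for result, url in taxon_summary_urls(7460).items():
        print(result, url)
```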
Navigating the taxonomy tree
The taxonomy tree also provides an easy way to explore what nucleotide information is available for related taxonomic groups (Figure 24):
Figure 24. Navigation tab showing the taxonomic tree for Apis mellifera (honey bee).
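Exploring the tree amounts to walking from a node down to everything beneath it. The toy model below uses a small, simplified fragment of the Apis tree, written as a hypothetical parent-to-children dictionary, and collects a node together with all of its descendants in breadth-first order; it illustrates the idea and is not how the portal itself is implemented.

```python
from collections import deque

# Simplified, illustrative fragment of a taxonomy tree (parent -> children).
TREE = {
    "Apis": ["Apis mellifera", "Apis cerana"],
    "Apis mellifera": ["Apis mellifera ligustica", "Apis mellifera carnica"],
}


def subtree(tree: dict, root: str) -> list:
    """Return root followed by all of its descendants, breadth-first."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        # Leaf nodes have no entry in the dictionary, so default to no children.
        queue.extend(tree.get(node, []))
    return order


if __name__ == "__main__":
    print(subtree(TREE, "Apis mellifera"))
```

Moving up one level (from a species to its genus) widens the set of related taxa you see, which is what browsing the tree lets you do interactively.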
Restricting your search by taxonomy is a good way of cutting out unwanted data, especially if all you need are sequences from one or a few related species. However, you need to be careful that you don't exclude relevant data from your search. There are several points to consider (Figures 25-28):
How specific is the taxonomy you require?
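ENA contains information on strains, varieties and breeds for many taxonomic groups, whether or not the sequences vary between them. A search restricted to one strain, variety or breed can therefore miss sequences recorded against the species as a whole.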
Figure 25. There are several dog sequences in EMBL-Bank; this one is for the Alsatian breed.
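When querying programmatically, the choice between an exact taxon and a taxon plus everything below it is explicit. This sketch assumes the ENA Portal API `search` endpoint and its `tax_eq()` and `tax_tree()` query functions, where `tax_tree()` is taken to include descendant taxa that have their own taxonomy IDs; the `fields` values are assumptions, and 9615 is the NCBI taxonomy ID for Canis lupus familiaris (dog).

```python
from urllib.parse import urlencode

# Assumed base URL of the ENA Portal API.
PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def taxon_query(tax_id: int, include_descendants: bool) -> str:
    """tax_tree() covers the taxon and all taxa below it; tax_eq() only the taxon."""
    func = "tax_tree" if include_descendants else "tax_eq"
    return f"{func}({tax_id})"


def sequence_search_url(tax_id: int, include_descendants: bool = True,
                        fields: tuple = ("accession", "scientific_name")) -> str:
    """Build a sequence search URL returning tab-separated results."""
    params = {
        "result": "sequence",
        "query": taxon_query(tax_id, include_descendants),
        "fields": ",".join(fields),
        "format": "tsv",
    }
    return f"{PORTAL_API}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(sequence_search_url(9615))                             # broad
    print(sequence_search_url(9615, include_descendants=False))  # exact taxon
```

Defaulting to the broader query and narrowing afterwards is the safer choice if you are unsure how specific the taxonomy of the records you need will be.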
What if the sequence you require has no taxonomy associated with it?
ENA contains sequences for which no species has been identified, such as those from environmental studies, synthetic constructs, transgenics and patents. These are found under special taxonomic divisions (see 'How is the data structured').
Figure 26. EMBL-Bank entry displaying the source information for an unknown bacterium.
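If you already hold a set of records with their division codes, it is worth separating out the ones from these special divisions before filtering by species. A minimal sketch, assuming the division codes ENV, SYN and TGN for environmental samples, synthetic constructs and transgenics; the record layout and accessions are hypothetical.

```python
# Assumed division codes for sequences without an identified species.
SPECIAL_DIVISIONS = {
    "ENV": "environmental sample",
    "SYN": "synthetic construct",
    "TGN": "transgenic",
}


def partition_special(records: list) -> tuple:
    """Split records into (special-division records, all other records)."""
    special, other = [], []
    for rec in records:
        target = special if rec["division"] in SPECIAL_DIVISIONS else other
        target.append(rec)
    return special, other


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        {"accession": "HYPO1", "division": "ENV"},
        {"accession": "HYPO2", "division": "MAM"},
        {"accession": "HYPO3", "division": "SYN"},
    ]
    special, other = partition_special(records)
    print([r["accession"] for r in special])
```

A species-restricted search would silently drop everything in the first list, so it is worth inspecting those records separately.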
Could your sequence be classified in a different way?
Some sequences are difficult to classify, and you need to search carefully so as not to miss valuable data. For example, endogenous viruses are usually classified under their host organism if they were sequenced as part of the host genome, or as viral if they were isolated and sequenced on their own. Endogenous viruses should therefore be searched under both the VRL (virus) division and the taxonomic division of the host organism.
Figure 27. EMBL-Bank entry of the endogenous virus gamma-3, which is classified as being from the organism Canis lupus familiaris (dog) in the MAM (mammal) division, because it was sequenced as part of the dog genome.
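The rule above can be written as a small filter. This is a sketch over hypothetical records carrying a `division` field: it searches VRL plus the host's division (MAM for the dog example) and keeps records from either.

```python
def divisions_to_search(host_division: str) -> list:
    """VRL plus the host organism's division, without duplicates."""
    return list(dict.fromkeys(["VRL", host_division]))


def endogenous_virus_candidates(records: list, host_division: str) -> list:
    """Keep records that fall in any division worth searching."""
    wanted = set(divisions_to_search(host_division))
    return [rec for rec in records if rec["division"] in wanted]


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        {"accession": "HYPO1", "division": "MAM"},
        {"accession": "HYPO2", "division": "VRL"},
        {"accession": "HYPO3", "division": "PLN"},
    ]
    print(endogenous_virus_candidates(records, "MAM"))
```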
Are you looking to compare sequences from a group of related organisms?
Taxonomic divisions help divide the data into manageable chunks, but be careful which search engine you use:
The ENA browser merges divisions to give results that are complete for a taxonomic group; for example, a search on 'Rodents' covers both the ROD and MUS divisions.
The Sequence Search & Analysis tools keep the divisions separate to allow more flexibility when searching; for example, a search on 'EMBL Rodent' covers only the ROD division (to search ROD + MUS you must select both).
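The difference between the two behaviours can be modelled in a few lines. In this simplified, hypothetical model, `GROUP_DIVISIONS` records which divisions a group spans (only the rodent example from the text), the browser-style search covers all of them, the SSS-style search covers only what you select, and `missing_divisions` reports what a selection leaves out.

```python
# Simplified, hypothetical mapping from a taxonomic group to its divisions.
GROUP_DIVISIONS = {"Rodents": ("ROD", "MUS")}


def browser_divisions(group: str) -> list:
    """Browser style: a group search covers every division the group spans."""
    return list(GROUP_DIVISIONS.get(group, ()))


def sss_divisions(selected: list) -> list:
    """SSS style: only the explicitly selected divisions are searched."""
    return list(dict.fromkeys(selected))


def missing_divisions(group: str, selected: list) -> list:
    """Divisions in the group that a given SSS selection would skip."""
    return sorted(set(browser_divisions(group)) - set(sss_divisions(selected)))


if __name__ == "__main__":
    print(missing_divisions("Rodents", ["ROD"]))
```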
Does the organism you are interested in have any alternative names?
Organisms can have more than one accepted name (synonyms), either because they have been referred to differently in the literature or because different resources use different taxonomic classifications. This matters when you link out to external resources.
ENA uses NCBI taxonomy for all its taxonomic classifications.
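Because synonyms in NCBI taxonomy are names attached to the same taxon, working with taxonomy IDs rather than names avoids missing records filed under a synonym. A minimal sketch: given a hypothetical mapping from names to taxonomy IDs (the names and IDs below are placeholders, not real taxa), it groups together all names that resolve to the same ID.

```python
def group_names_by_tax_id(name_to_tax_id: dict) -> dict:
    """Collect every name that resolves to the same taxonomy ID."""
    groups = {}
    for name, tax_id in name_to_tax_id.items():
        groups.setdefault(tax_id, []).append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical names and IDs for illustration only.
    lookup = {"Genus alpha": 1001, "Genus beta": 1001, "Genus gamma": 1002}
    print(group_names_by_tax_id(lookup))
```

Any ID with more than one name is a set of synonyms; searching by that ID covers them all.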
Are you certain the taxonomic name for your organism is unique?
Sometimes different organisms can share the same taxonomic name (homonyms).
Figure 28. ENA taxonomy search reveals two organisms with the same genus and species name: Agathis montana de Laub is a conifer, while Agathis montana Shest is a wasp.
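Homonyms show up as one scientific name linked to more than one taxonomy ID. The sketch below flags them in a list of taxonomy records; the record layout is hypothetical and `TAXON_A`/`TAXON_B` are placeholders rather than the real IDs of the two Agathis montana taxa.

```python
def find_homonyms(records: list) -> dict:
    """Map each scientific name shared by several taxa to its sorted IDs."""
    by_name = {}
    for rec in records:
        by_name.setdefault(rec["scientific_name"], set()).add(rec["tax_id"])
    return {name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical records; the IDs are placeholders.
    records = [
        {"scientific_name": "Agathis montana", "authority": "de Laub", "tax_id": "TAXON_A"},
        {"scientific_name": "Agathis montana", "authority": "Shest", "tax_id": "TAXON_B"},
        {"scientific_name": "Apis mellifera", "authority": "", "tax_id": "7460"},
    ]
    print(find_homonyms(records))
```

When a name is flagged, check the authority or lineage of each taxon and search by the appropriate taxonomy ID rather than by name.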