Taxonomy

The classification system for source biological organisms for all INSDC records is the NCBI Taxonomy and is available from the ENA browser (see here for an example). ENA curators work alongside taxonomists at the NCBI to ensure that all ENA records display the accepted organism name and classification hierarchy. NCBI Taxonomy covers the complete tree of life and also includes other types, such as synthetic constructs and environmental samples. However, it is an incomplete classification system in that it only considers taxa for data that are represented in INSDC records. Users should note that taxa are only displayed if at least one associated ENA record is available.

Submittable Organism Names at ENA

Submitted organism names must be at ‘species’ rank. This rank type does not automatically mean the name is a published binomen; it is simply a rank, which differentiates the sequenced organism from another. For example, unidentified strains of the same bacterial genus should be kept as separate species, rather than binned together under the same genus name.

Taxon Checks

The ‘Taxonomy Check/Request’ tool in Webin is a new option that allows submitters to check that a specified taxon is valid and submittable. Submitters are advised to use this tool to check their taxon, or to register unclassified taxa for use in their submission, before registering samples for read and genome assembly submissions and before submitting targeted assembled and annotated sequences. Please use the scientific name. If it is recognised, the tool will return a taxid (a unique identifier for the taxonomic record of that species).

If the organism name is not recognised, it could be for one of several reasons. Firstly, you may have provided a taxonomic rank that is not submittable, e.g., a genus or family name only. If you do not have enough information to designate the species, or if it is a novel species, you should create an ‘informal name’; instructions are set out below (see Unidentified or Novel Organisms). Secondly, your organism may not yet be classified in NCBI Taxonomy: only taxa that have been sequenced are available there, and the taxonomists continually add published species and informal names as their respective sequences are deposited into the nucleotide archives.
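For submitters who prefer to script this check, the following is a minimal Python sketch. It assumes that ENA's public taxonomy REST service accepts a scientific name at the URL shown and returns JSON records with the fields taxId, scientificName and submittable; neither the URL nor the field names are given in this document, so please verify them against the current ENA documentation before relying on them. The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of ENA's taxonomy REST service (not stated in this guide).
ENA_TAXONOMY_URL = "https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/"


def lookup_url(scientific_name: str) -> str:
    """Build the lookup URL, percent-encoding spaces in the name."""
    return ENA_TAXONOMY_URL + quote(scientific_name.strip())


def summarise(records: list) -> list:
    """Reduce decoded JSON records to (taxid, name, submittable) tuples.

    Assumption: each record carries 'taxId', 'scientificName' and
    'submittable' keys, with 'submittable' given as the string "true"/"false".
    """
    return [
        (
            str(r.get("taxId")),
            r.get("scientificName"),
            str(r.get("submittable")).lower() == "true",
        )
        for r in records
    ]


def check_taxon(scientific_name: str) -> list:
    """Query the service and summarise the result (no error handling here)."""
    with urlopen(lookup_url(scientific_name), timeout=30) as response:
        return summarise(json.load(response))
```

Calling check_taxon("uncultured bacterium") would be expected to return a taxid together with a submittable flag if the assumptions above hold. An unrecognised name may raise an HTTP error or return no records, so a real script should handle both cases.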

Taxon Requests

If you wish to request a new taxonomic name, simply fill in the details within the tool form and select the 'Send details' button. We will send the details to taxonomists to review and notify you when the taxon name has been added to the taxonomy database. You can continue your submission once the taxonomic name has been indexed within our submission tool (usually within two working days).

Note that organism names often have alternative names (synonyms, teleomorph/anamorph names, acronyms, etc.). If one of the alternative names is used in the tool, it will return the Preferred Name. The Preferred Name must be used in submissions. If you strongly disagree with the Preferred Name and can provide published literature to argue your case, please contact us (datasubs@ebi.ac.uk) and we will pass the details on to NCBI Taxonomy; the taxonomists there will review the case and update the Preferred Name if appropriate.

Author citations should not be included within the scientific name. There are very few exceptions to this, and they arise from ambiguities between different nomenclature codes (cf. Agathis montana Shest. 1932 and Agathis montana de Laub. 1969).

The following text describes how to provide taxonomic names for sequence sources that do not follow traditional nomenclature, have not yet been published, or are unidentified to species level. Note that this document is by no means exhaustive; there are many other cases and exceptions. If you are having difficulties in generating a name, it is always worth checking for similar names within the NCBI Taxonomy browser (http://www.ncbi.nlm.nih.gov/taxonomy); this may help to provide the format required. Otherwise, do not hesitate to contact us via the Helpdesk button.

Organisms which have not been isolated prior to sequencing (environmental samples)

If you are submitting sequences that have been identified taxonomically from homology alone, with no culturing or isolation of the organism beforehand, then we consider these to be environmental samples. A typical example would be a dataset of 16S rRNA gene amplicon sequences from a mixed DNA sample (i.e., a metagenome). There are exceptions in this group: for example, organisms which can be reliably recovered from their diseased host (e.g., endosymbionts, phytoplasmas) and organisms from samples which are readily identifiable by other means (e.g., cyanobacteria); organisms such as these are not treated in the way described here.

Environmental samples are usually prefixed with the term uncultured (exceptions exist) and are not allowed to have a species epithet. Some examples of basic organism names that can be used include:

uncultured bacterium  (taxid:77133)

uncultured archaeon  (taxid:115547)

uncultured cyanobacterium  (taxid:1211)

uncultured prokaryote  (taxid:198431)

uncultured fungus  (taxid:175245)

uncultured eukaryote  (taxid:100272)

 More granular identification is preferred, up to Genus level. For example, for prokaryotes, the format is:

uncultured <Rank> sp.

e.g. uncultured Bacillus sp.

e.g. uncultured Thermococcus sp.

 For Fungi, the ‘sp.’ is dropped:

uncultured <Rank>

e.g., uncultured Glomus

e.g., uncultured Saccharomycetes
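The two formats above can be sketched as a small Python helper. This is illustrative only: the function name and the fungal flag are hypothetical, and the check for a single-word taxon is a simplification of the rule that environmental samples may not carry a species epithet.

```python
def uncultured_name(taxon: str, fungal: bool = False) -> str:
    """Format an environmental-sample organism name for a sequence record.

    Prokaryotes: 'uncultured <Rank> sp.'; fungi: 'uncultured <Rank>'.
    """
    taxon = taxon.strip()
    # A space would suggest a binomial, i.e. a species epithet, which is not allowed.
    if not taxon or " " in taxon:
        raise ValueError("expected a single taxon name without a species epithet")
    if fungal:
        return f"uncultured {taxon}"
    return f"uncultured {taxon} sp."


print(uncultured_name("Bacillus"))            # uncultured Bacillus sp.
print(uncultured_name("Glomus", fungal=True)) # uncultured Glomus
```

NCBI Taxonomy may still adjust a proposed name, so the output should be treated as a suggestion rather than a guaranteed final name.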

 Note that these names are used for individual sequence records. If, however, you are naming a Sample record, you will need to describe the complete metagenome, rather than an individual sequence.  A few examples are:

air metagenome  (taxid:655179)

marine metagenome  (taxid:408172)

fish gut metagenome  (taxid:1602388)

root metagenome  (taxid:1118232)

 The full list of available metagenomes (ecological and organismal) can be found here: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=408169

Please note that new metagenome taxonomic records are rarely added, particularly those that add granularity. Please use the closest available choice, even if this is a less granular option. Only request a new term if you are sure you are unable to use anything in the lists.

Cyanobacteria

Cyanobacteria taxonomy is very complex and so the strain or culture collection identifier is always captured as part of the organism name, whether or not it is identified at species level.

Nostoc punctiforme PCC 73102

Chroococcidiopsis sp. SAG 2025

 

Synthetic Sequences

Synthetic sequences, such as cloning and expression vectors, can use one of the relevant taxa:

synthetic construct (taxid:32630) [uses translation table 11]

eukaryotic synthetic construct (taxid:111789) [uses translation table 1]

synthetic construct (code 6) [uses translation table 6]

Alternatively, a unique name can be requested. In such cases, the name is formed from the type of construct and a unique identifier. Here are some real examples:

Cloning vector pNICO

Expression vector pTEV5

Site-specific excision vector pFLPe4

Viruses

Viruses do not fit well into biological classification systems and do not follow the format of binomial nomenclature. Instead, descriptive names, usually involving the host or disease, are formed. NCBI Taxonomy will add virus names if they are not already in their database.

Certain viruses, specifically those involved in human health, should be named in accordance with known standards, where metadata such as strain, host and serotype are included within the taxon name. Here are some examples:

Influenza A virus (A/chicken/Fujian/10567/2005(H5N1))

Influenza B virus (B/Acre/129920-IEC/2014)

HIV-1 CRF02_AG:08GQ032

Norovirus 13-BH-1/2013/GII.17

Norovirus 16-G0188/Ger/2016

Norovirus groundwater/GII.17/61/2010/KOR

Sapovirus Sewage/Toyama/Fu-Feb/2010/JP

Sapovirus Hu/Toyama/Jan3519/2013/JP

Endosymbionts

Endosymbionts live within the cells of their host organisms. They cannot usually be cultured outside of the host. Although such organisms are technically ‘uncultured’ in our terminology, they are exempt from the treatment of other environmental samples. Naming is usually in the format: “<type> endosymbiont of <host>”. Here are some real examples:

endosymbiont of Acharax sp. [taxid:568145]

bacterium endosymbiont of Donacia thallassina [taxid:742888]

Wolbachia endosymbiont of Drosophila recens [taxid:214475]

Rickettsia endosymbiont of Camponotus sayi [taxid:359403]
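As a rough illustration of the “<type> endosymbiont of <host>” pattern, the Python sketch below assembles such a name. The function and parameter names are hypothetical, and the type is treated as optional because the first example above omits it.

```python
def endosymbiont_name(host: str, symbiont_type: str = "") -> str:
    """Build '<type> endosymbiont of <host>', omitting <type> when it is empty."""
    parts = [symbiont_type.strip(), "endosymbiont of", host.strip()]
    # Drop empty parts so that no leading space appears when the type is unknown.
    return " ".join(part for part in parts if part)


print(endosymbiont_name("Acharax sp."))                    # endosymbiont of Acharax sp.
print(endosymbiont_name("Drosophila recens", "Wolbachia")) # Wolbachia endosymbiont of Drosophila recens
```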

Unidentified or Novel Organisms

An informal name is used when the organism being submitted is unidentified to a specific name, or if a novel species has not yet been published. Such a name will have a species rank. If you are submitting further data from an informally-named sample, you should re-use the informal name.

Creating an informal name is generally a simple task. The instructions below should be taken as guidelines. There are many caveats and exclusions, and NCBI Taxonomy may rename a suggested informal name to suit a more specific format for that organism group.

Prokaryotes

If the genus is published but the species is novel or unidentified, please use the following format, where the identifier is something unique to the culture (e.g., strain or isolate name):

<Genus> sp. <identifier>

e.g. Bacillus sp. ABC123

e.g. Thermococcus sp. DEF456

If the genus is unknown, please use the name of the highest known taxonomic rank, followed by the relevant descriptor (bacterium or archaeon), followed by the identifier.

<Rank> <bacterium or archaeon> <identifier>

e.g. Bacillaceae bacterium ABC123

e.g. Firmicutes bacterium ABC123

e.g. Thermococcales archaeon DEF456

e.g. Euryarchaeota archaeon DEF456
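The two prokaryote formats above could be generated with a helper along the following lines. This is a minimal Python sketch; the function name and its parameters are hypothetical, and it does not check that the genus or rank actually exists in NCBI Taxonomy.

```python
from typing import Optional


def prokaryote_informal_name(
    identifier: str,
    genus: Optional[str] = None,
    rank: Optional[str] = None,
    descriptor: str = "bacterium",
) -> str:
    """Build an informal name for an unidentified or novel prokaryote.

    Known genus:   '<Genus> sp. <identifier>'
    Unknown genus: '<Rank> <bacterium or archaeon> <identifier>'
    """
    identifier = identifier.strip()
    if not identifier:
        raise ValueError("a unique identifier (e.g. strain or isolate name) is required")
    if genus:
        return f"{genus.strip()} sp. {identifier}"
    if not rank:
        raise ValueError("give either a genus or the highest known taxonomic rank")
    if descriptor not in ("bacterium", "archaeon"):
        raise ValueError("descriptor must be 'bacterium' or 'archaeon'")
    return f"{rank.strip()} {descriptor} {identifier}"


print(prokaryote_informal_name("ABC123", genus="Bacillus"))
# Bacillus sp. ABC123
print(prokaryote_informal_name("DEF456", rank="Thermococcales", descriptor="archaeon"))
# Thermococcales archaeon DEF456
```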

Note that if you are publishing a novel species, each strain of the proposed species should be given a unique informal name. Only after publication will the individual records be merged and renamed with the formal name.

Cyanobacteria should always be submitted with the strain appended, even when the species epithet is provided.

Eukaryotes

Higher organisms are treated in a similar way to prokaryotes. However, the term “sp.” is added no matter which taxonomic rank is used. Also, if multiple strains/isolates/samples are identified as belonging to the same unidentified or novel species, they should be grouped under a single taxonomic name. For example, if three strains of Candida (ABC, DEF and GHI) are identified as being from the same species, they should be given a single informal name. The identifier used for species groups can be anything that acts as a placeholder, but the recommended form is a number followed by the author's initials and an abbreviated year (e.g., 1 RG-2016). This can be summarised:

<Rank> sp. <identifier>

 where identifier can be a strain or isolate name, a specimen voucher or culture collection accession, or any unique term for the species or phylogenetic grouping.

 

Informal Name          Informal Groupings
Candida sp. ABC        Candida sp. 1 RG-2016
Candida sp. DEF        Candida sp. 1 RG-2016
Candida sp. GHI        Candida sp. 1 RG-2016
Candida sp. JKL        Candida sp. 2 RG-2016
Candida sp. MNO        Candida sp. 2 RG-2016
Candida sp. PQR        Candida sp. 3 RG-2016
Candida sp. STU        Candida sp. 3 RG-2016
Candida sp. VWX        Candida sp. 3 RG-2016
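The grouping of strains under shared informal names can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the submitter has already decided which strains belong to the same putative species. It simply numbers the groups in the order given and appends the author tag.

```python
def eukaryote_group_names(rank: str, groups: list, author_tag: str) -> dict:
    """Map each strain to a shared informal name '<Rank> sp. <n> <author_tag>'.

    groups is a list of lists; each inner list holds the strain identifiers
    judged to belong to one unidentified or novel species.
    """
    names = {}
    for number, strains in enumerate(groups, start=1):
        group_name = f"{rank} sp. {number} {author_tag}"
        for strain in strains:
            names[strain] = group_name
    return names


mapping = eukaryote_group_names(
    "Candida",
    [["ABC", "DEF", "GHI"], ["JKL", "MNO"]],
    "RG-2016",
)
print(mapping["ABC"])  # Candida sp. 1 RG-2016
print(mapping["MNO"])  # Candida sp. 2 RG-2016
```

A strain that stands alone could instead keep a name of the form '<Rank> sp. <strain>', since the identifier may also be a strain or isolate name.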

 Please note that cultivars should not form part of the scientific name of plants or fungi. A cultivar name should be added within the sample information during submission.

 
