The classification system for source biological organisms for all INSDC records is the NCBI Taxonomy, which is available from the ENA browser (see here for an example). ENA curators work alongside taxonomists at the NCBI to ensure that all ENA records display the accepted organism name and classification hierarchy. NCBI Taxonomy is an incomplete classification system in that it only covers taxa that are represented in INSDC records. Users should note that a taxon is only displayed in the ENA browser if at least one associated ENA record is available.
As part of the curation of incoming data, ENA curators check the source organisms. Many organisms will already be classified and no further action will be necessary. Curators check for misspellings and alternative names, including but not limited to acronyms (for viruses), synonyms, teleomorphs, anamorphs and misnomers, and revise these to the accepted scientific name from the NCBI Taxonomy. Organisms that are published but unclassified require a simple taxonomy referral: a standard letter is sent to the NCBI Taxonomy team with details of the organism name and source metadata. The taxonomists then carry out the research needed to add the new taxon and apply the correct classification, which may include adding further taxa at higher ranks.
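The name check described above can be pictured as a lookup against accepted names and their known alternatives. The following is a minimal sketch, assuming a small in-memory table. The table entries, the `resolve_name` function and the outcome labels are hypothetical illustrations and are not part of any ENA or NCBI tool; real curation is done against the NCBI Taxonomy itself and involves human judgement.

```python
# Illustrative data only: a real check would consult the NCBI Taxonomy,
# not a hard-coded table.
ACCEPTED_NAMES = {"Escherichia coli", "Aspergillus nidulans"}

# Hypothetical map of alternative names (lower-cased) to accepted names.
ALTERNATIVE_NAMES = {
    "eschericia coli": "Escherichia coli",          # misspelling
    "emericella nidulans": "Aspergillus nidulans",  # teleomorph name
}


def resolve_name(submitted, published):
    """Return an (outcome, name) pair for a submitted organism name.

    Outcomes (labels are assumptions made for this sketch):
      "accept"       - already the accepted scientific name
      "revise"       - a known alternative; name is the accepted form
      "refer"        - published but unclassified; send a taxonomy referral
      "needs_review" - anything else; a curator has to look at it
    """
    # Collapse runs of whitespace so spacing differences do not matter.
    name = " ".join(submitted.split())
    if name in ACCEPTED_NAMES:
        return ("accept", name)
    accepted = ALTERNATIVE_NAMES.get(name.lower())
    if accepted is not None:
        return ("revise", accepted)
    if published:
        return ("refer", name)
    return ("needs_review", name)
```

For instance, `resolve_name("Emericella nidulans", published=True)` yields a `"revise"` outcome pointing at `Aspergillus nidulans`, while a published name that appears in neither table yields `"refer"`.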
A novel unpublished organism, or one that has not been fully identified, requires the creation of an informal name. This contains a placeholder element that is used until a formalised name is accepted under the appropriate nomenclature code. For prokaryotes, the placeholder is generally a strain or isolate name but can be any identifier that makes the organism unique in the database. For example, Escherichia sp. ES13 (Taxon:640049) is an informal name for an Escherichia strain that has not been identified or published at species level.
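As an illustration of the placeholder idea, the sketch below builds an informal name of the form "Genus sp. identifier", a pattern inferred from the Escherichia sp. ES13 example above. The function name, the `existing_names` argument and the uniqueness check are assumptions made for this sketch; in practice informal names are agreed between curators, submitters and the NCBI Taxonomy team, and other name patterns exist.

```python
def informal_name(genus, identifier, existing_names=()):
    """Build a placeholder organism name such as 'Escherichia sp. ES13'.

    genus          - the lowest rank that has been identified (assumed: genus)
    identifier     - strain, isolate or other identifier supplied by the submitter
    existing_names - names already in use (hypothetical stand-in for the database)
    """
    genus = genus.strip()
    identifier = identifier.strip()
    if not genus or not identifier:
        raise ValueError("both a genus and an identifier are required")

    name = f"{genus} sp. {identifier}"

    # The identifier has to make the organism unique in the database.
    if name in existing_names:
        raise ValueError(f"informal name already in use: {name}")
    return name
```

Calling `informal_name("Escherichia", "ES13")` produces the string `Escherichia sp. ES13`; passing a collection that already contains that name raises a `ValueError` instead.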
ENA curators will communicate at length with the submitter to obtain all the information required to correctly classify the source organism of the sequence entry. Curators will also liaise between NCBI Taxonomy and the submitters on nomenclatural changes or updates, such as when an organism with an informal name is published under a formal name. ENA records are only loaded into the archive once the corresponding organism records are up to date in NCBI Taxonomy.