Programmatically Accessing Taxonomic Information

Every ENA sample object should have a taxonomic classification. The INSDC maintains a database of all unique taxonomic classifications known to us. Each classification has a unique ID (the tax ID), which is expanded to show the scientific name and common name of the organism when the sample is viewed in the ENA Browser.

There are several ways that you can access information about taxonomy or related records using the REST APIs available through the ENA taxonomy services.

Finding General Information about Taxa

If you know the scientific name of the organism you can find the tax ID with this endpoint:

www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/

Simply append the scientific name to the URL:

> curl "https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/Leptonycteris%20nivalis"
[
  {
    "taxId": "59456",
    "scientificName": "Leptonycteris nivalis",
    "commonName": "Mexican long-nosed bat",
    "formalName": "true",
    "rank": "species",
    "division": "MAM",
    "lineage": "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Chiroptera; Microchiroptera; Phyllostomidae; Glossophaginae; Leptonycteris; ",
    "geneticCode": "1",
    "mitochondrialGeneticCode": "2",
    "submittable": "true"
  }
]
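The same lookup can be scripted. The following is a minimal Python sketch, assuming only the standard library: it builds the request URL by percent-encoding the name and splits the `lineage` string of a returned record into a list. The helper names `scientific_name_url`, `fetch_json` and `lineage_names` are illustrative and not part of any ENA client library; only the endpoint and the JSON field names are taken from the example above.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the taxonomy REST service, as used in the curl example.
BASE = "https://www.ebi.ac.uk/ena/taxonomy/rest"


def scientific_name_url(name):
    """Build the scientific-name lookup URL, percent-encoding the name."""
    return f"{BASE}/scientific-name/{urllib.parse.quote(name, safe='')}"


def fetch_json(url):
    """Fetch a URL and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def lineage_names(record):
    """Split the semicolon-separated 'lineage' field into a list of names."""
    return [part.strip() for part in record["lineage"].split(";") if part.strip()]
```

With network access, `fetch_json(scientific_name_url("Leptonycteris nivalis"))` should return a list shaped like the response shown above, and `lineage_names` can then be applied to its first element.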

You can also do this with the common name using this endpoint:

www.ebi.ac.uk/ena/taxonomy/rest/any-name/

Simply append the common name to the URL:

> curl "https://www.ebi.ac.uk/ena/taxonomy/rest/any-name/golden%20arrow%20poison%20frog"
[
  {
    "taxId": "377316",
    "scientificName": "Atelopus zeteki",
    "commonName": "golden arrow poison frog",
    "formalName": "true",
    "rank": "species",
    "division": "VRT",
    "lineage": "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia; Batrachia; Anura; Neobatrachia; Hyloidea; Bufonidae; Atelopus; ",
    "geneticCode": "1",
    "mitochondrialGeneticCode": "2",
    "submittable": "true"
  }
]

Please note that not all taxa have a common name.
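Because a common name may be missing, code that reads these records should not rely on it. The sketch below formats a record for display either way. It assumes that the `commonName` key is simply absent (or empty) when a taxon has no common name, which is not confirmed here; `describe_taxon` is an illustrative name.

```python
def describe_taxon(record):
    """Return 'taxId: scientificName (commonName)', omitting a missing common name."""
    label = f"{record['taxId']}: {record['scientificName']}"
    # Assumption: 'commonName' is absent or empty when no common name exists.
    common = record.get("commonName")
    return f"{label} ({common})" if common else label
```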

If you know neither the scientific name nor the common name but have a partial name or keyword in mind, you can use the suggest endpoint:

www.ebi.ac.uk/ena/taxonomy/rest/suggest-for-submission/

See this example using the search term “curry”:

> curl "https://www.ebi.ac.uk/ena/taxonomy/rest/suggest-for-submission/curry"
[
  {
    "taxId": "159030",
    "scientificName": "Murraya koenigii",
    "displayName": "curry leaf"
  },
  {
    "taxId": "261786",
    "scientificName": "Helichrysum italicum",
    "displayName": "curry plant"
  }
]
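A suggest response can contain several candidates, so a script usually needs to pick one. As a rough sketch, the Python below indexes the suggestions by their `displayName`, using the two records from the example above as sample data. The function name `suggestions_by_display_name` is illustrative.

```python
def suggestions_by_display_name(suggestions):
    """Map each suggestion's displayName to a (taxId, scientificName) pair."""
    return {s["displayName"]: (s["taxId"], s["scientificName"]) for s in suggestions}


# The two suggestions returned for the search term "curry" in the example above.
sample = [
    {"taxId": "159030", "scientificName": "Murraya koenigii", "displayName": "curry leaf"},
    {"taxId": "261786", "scientificName": "Helichrysum italicum", "displayName": "curry plant"},
]
lookup = suggestions_by_display_name(sample)
```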

Accessing Taxon XML Records and Full Lineage

You can access the XML record of any public taxon using the Browser API. For example, to access the record of the ant fungus garden metagenome taxon, we can provide the Browser API XML endpoint with the tax ID 797283.

https://www.ebi.ac.uk/ena/browser/api/xml/797283

You can also choose to download this directly from the API by specifying “download=true”:

https://www.ebi.ac.uk/ena/browser/api/xml/797283?download=true

This XML record provides general taxonomic information such as rank or translation genetic code as well as the scientific names and tax IDs of the parent and child taxa related to the record. This allows full exploration of the lineage of the taxon.
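To retrieve a record from a script, something like the following Python sketch could be used. It only builds the two URL forms shown above and saves the response body unchanged. The helper names `taxon_xml_url` and `save_taxon_xml` are illustrative, and no assumption is made about the XML schema.

```python
import urllib.parse
import urllib.request

# Browser API XML endpoint, as used in the examples above.
BROWSER_XML = "https://www.ebi.ac.uk/ena/browser/api/xml"


def taxon_xml_url(tax_id, download=False):
    """Build the Browser API URL for a taxon, optionally with download=true."""
    url = f"{BROWSER_XML}/{tax_id}"
    if download:
        url += "?" + urllib.parse.urlencode({"download": "true"})
    return url


def save_taxon_xml(tax_id, path):
    """Fetch the XML record and write it to 'path' (requires network access)."""
    with urllib.request.urlopen(taxon_xml_url(tax_id)) as response, open(path, "wb") as out:
        out.write(response.read())
```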

Finding Associated Records

For a report of all records associated with a taxon, you can use the Discovery Portal API. This API can provide a table of record counts and, when given a ‘result’ data type, a list of record IDs and descriptions.

For example, to get a report of all records that link to the ‘ant fungus garden metagenome’ taxon, we could provide the Portal API with the tax ID using the ‘links/taxon’ endpoint. The output can be in tsv or json format:

https://www.ebi.ac.uk/ena/portal/api/links/taxon?accession=797283&format=tsv

Result:

result_id    description     entry_cnt       base_cnt        subtree_entry_cnt       subtree_base_cnt
read_experiment      Experiment      236     12253983418     236     12253983418
sequence_update      Sequence (Update)       0       0       0       0
sample       Sample  236     0       236     0
analysis_study       Study   0       0       0       0
analysis     Analysis        0       0       0       0
study        Study   15      0       15      0
assembly     Assembly        4       340564769       4       340564769
sequence_release     Sequence (Release)      10      2048    10      2048
wgs_set      Genome assembly contig set      4       0       4       0
noncoding_release    Non-coding (Release)    10      2048    10      2048
noncoding    Non-coding      10      2048    10      0
coding_update        Coding (Update) 0       0       0       0
tsa_set      Transcriptome assembly contig set       0       0       0       0
read_run     Read    236     12253983418     236     12253983418
read_study   Study   7       12253983418     7       12253983418
sequence     Sequence        10      2048    10      0
coding_release       Coding (Release)        4       144     4       144
noncoding_update     Non-coding (Update)     0       0       0       0

From this summary, we can see that this taxon has 15 studies associated with it. To then see a report of the study IDs and descriptions, we can specify this with the addition of ‘result=study’:

https://www.ebi.ac.uk/ena/portal/api/links/taxon?accession=797283&format=tsv&result=study

Result:

study_accession      description
PRJNA258031  Atta colombica refuse dump Targeted Locus (Loci)
PRJNA336974  Cyphomyrmex longiscapus fungus garden microbial communities from Gamboa, Panama metagenome
PRJNA336975  Apterostigma dentigerum fungus garden microbial communities from Gamboa, Panama metagenome
PRJNA336982  Leaf cutter ant microbial communities from the University of Wisconsin-Madison, USA, from External Dump - Dump Bottom metagenome
PRJNA336984  Leaf cutter ant microbial communities from the University of Wisconsin-Madison, USA, from External Dump - Dump Top metagenome
PRJNA336998  Leaf cutter ant microbial communities from the University of Wisconsin-Madison, USA, from fungus growing ant-garden - Acromyrmex fungus garden Combined metagenome
PRJNA336999  Leaf cutter ant microbial communities from the University of Wisconsin-Madison, USA, from fungus growing ant-garden - Atta cephalotes fungus garden Combined metagenome
PRJNA337000  Atta colombica fungus garden Top metagenome
PRJNA337001  Atta colombica fungus garden Bottom metagenome
PRJNA337002  Atta texana Internal Dump Top metagenome
PRJNA337003  Atta texana Internal Dump Bottom metagenome
PRJNA39805   Atta colombica Fungus Garden Metagenome
PRJNA62039   Atta cephalotes Fungus Garden Metagenome
PRJNA62041   Atta colombica Fungus Garden Top Metagenome
PRJNA62043   Atta colombica Fungus Garden Bottom Metagenome
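Reports requested with ‘format=tsv’ are tab-separated, so they can be read with a standard CSV parser. The Python sketch below parses a report into one dictionary per row and then pulls the entry count for studies out of the summary. The sample text is three rows copied from the count report above with tab separators restored; `parse_report` is an illustrative name.

```python
import csv
import io


def parse_report(tsv_text):
    """Parse a tab-separated links/taxon report into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


# Header and three rows from the count report above, with tab separators.
sample = (
    "result_id\tdescription\tentry_cnt\tbase_cnt\tsubtree_entry_cnt\tsubtree_base_cnt\n"
    "sample\tSample\t236\t0\t236\t0\n"
    "study\tStudy\t15\t0\t15\t0\n"
    "assembly\tAssembly\t4\t340564769\t4\t340564769\n"
)
rows = parse_report(sample)

# Index the entry counts by result_id for convenient lookup.
entry_counts = {row["result_id"]: int(row["entry_cnt"]) for row in rows}
print(entry_counts["study"])  # 15
```

The same function should work for the ‘result=study’ report, where the columns are `study_accession` and `description` instead.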

When exploring links to taxon records, you can also specify a taxonomic node, such as a genus or family rank taxon, and request all links in that subtree. For example, if you would like a report of all studies associated with taxa under the ‘ecological metagenomes’ tax node, you could request it with the addition of “subtree=true”:

https://www.ebi.ac.uk/ena/portal/api/links/taxon?accession=410657&result=study&subtree=true
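The query strings used in this section differ only in which optional parameters are present, so they can be assembled programmatically. Here is a minimal Python sketch, assuming just the four parameters seen above (`accession`, `format`, `result`, `subtree`); `links_taxon_url` is an illustrative helper, not an ENA-provided function.

```python
import urllib.parse

# Portal API links/taxon endpoint, as used in the examples above.
LINKS_TAXON = "https://www.ebi.ac.uk/ena/portal/api/links/taxon"


def links_taxon_url(accession, result=None, subtree=False, fmt=None):
    """Build a links/taxon URL, adding only the optional parameters requested."""
    params = {"accession": str(accession)}
    if fmt:
        params["format"] = fmt
    if result:
        params["result"] = result
    if subtree:
        params["subtree"] = "true"
    return f"{LINKS_TAXON}?{urllib.parse.urlencode(params)}"
```

For instance, `links_taxon_url(410657, result="study", subtree=True)` reproduces the subtree request shown above.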

Downloading Taxonomy Data via FTP

Taxonomy data is available for bulk download through FTP at ftp://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/.

File               Definition
taxonomy.xml.gz    Full release of the taxonomy data in ENA taxonomy format.
sdwca              Full release of taxonomy data in Darwin Core Archive format.
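A bulk download can be scripted with Python's standard `ftplib`. The sketch below is hedged: it assumes the server accepts anonymous FTP logins and that the file name passed in (for example `taxonomy.xml.gz`) sits directly in the directory given above. The helper names `split_ftp_url` and `download_file` are illustrative.

```python
import ftplib
import urllib.parse

# FTP location of the taxonomy data, as given above.
TAXONOMY_FTP = "ftp://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/"


def split_ftp_url(url):
    """Return the (host, directory) pair of an ftp:// URL."""
    parsed = urllib.parse.urlparse(url)
    return parsed.hostname, parsed.path


def download_file(filename, dest_path, url=TAXONOMY_FTP):
    """Download one file over FTP (requires network access)."""
    host, directory = split_ftp_url(url)
    with ftplib.FTP(host) as ftp, open(dest_path, "wb") as out:
        ftp.login()  # Assumption: anonymous login is permitted.
        ftp.cwd(directory)
        ftp.retrbinary(f"RETR {filename}", out.write)
```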

GBIF and the Darwin Core Archive

The Global Biodiversity Information Facility (GBIF) aims to make the world’s biodiversity data freely and universally available, providing an essential global informatics infrastructure for biodiversity research and applications worldwide. Read about the Darwin Core Standard (DwC) on the GBIF website.

The Darwin Core Archive comprises three files: a tab-delimited data file, an XML file listing the descriptors used in the data file, and another XML file holding metadata about the data itself, the data supplier, and the person who created the archive.