ENA Browser REST URLs
This document describes the ENA Browser REST URL syntax for programmatic users. The majority of ENA data classes and formats are supported by the ENA Browser.
Full details about ENA data classes and formats are available here.
Basics
- Retrieval using single identifiers
- Retrieval using multiple identifiers
- Retrieval using organism names
- Free text search
- Display options
- Pagination options
- Download options
- Taxonomy portal options
- Using the data warehouse
- Using the marker portal
Examples
- Retrieve SRA metadata in XML format
- Retrieve EMBL-Bank sequences in fasta format
- Retrieve EMBL-Bank subsequences in fasta format
- Retrieve EMBL-Bank subsequences in HTML format
- Retrieve EMBL-Bank records in flat file format
- Retrieve EMBL-Bank records in XML format
- Retrieve EMBL-Bank expanded CON records
- Retrieve EMBL-Bank header in flat file format
- Retrieve EMBL-Bank header in XML format
- Retrieve EMBL-Bank records using sequence versions
- Retrieve EMBL-Bank graphical image
- Retrieve Taxon records in XML format
- Retrieve Taxon records in Darwin Core XML format
- Retrieve Trace sequences in fasta format
- Retrieve Trace sequences in fastq format
- Retrieve Trace metadata in XML format
Retrieval using single identifiers
To retrieve a record by a single identifier, for all records other than Taxon and Project, please use the following URL syntax:
http://www.ebi.ac.uk/ena/data/view/<identifier>
For example, the following URL returns the EMBL-Bank entry with accession number A00145 in HTML format: http://www.ebi.ac.uk/ena/data/view/A00145.
For Taxon and Project records, respectively, please use the following URL syntax:
http://www.ebi.ac.uk/ena/data/view/Taxon:<identifier>
and
http://www.ebi.ac.uk/ena/data/view/Project:<identifier>
For example, the following URL returns the Taxon record with identifier 9606:
http://www.ebi.ac.uk/ena/data/view/Taxon:9606
and the following URL returns the Project record with identifier 20:
http://www.ebi.ac.uk/ena/data/view/Project:20.
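The URL patterns above can be captured in one small helper. The following is a minimal Python sketch that only builds the URL strings; the name view_url and its data_class argument are illustrative and are not part of the ENA service.

```python
BASE_VIEW_URL = "http://www.ebi.ac.uk/ena/data/view/"


def view_url(identifier, data_class=None):
    """Build a single-record view URL.

    data_class is a hypothetical argument: pass "Taxon" or "Project" for the
    two record types that need a prefix, and leave it as None otherwise.
    """
    if data_class is not None:
        return f"{BASE_VIEW_URL}{data_class}:{identifier}"
    return f"{BASE_VIEW_URL}{identifier}"


print(view_url("A00145"))
print(view_url(9606, "Taxon"))
print(view_url(20, "Project"))
```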
Retrieval using multiple identifiers
To retrieve results by ranges and lists of identifiers please use the following URL syntax:
http://www.ebi.ac.uk/ena/data/view/<identifier1>-<identifierN>
http://www.ebi.ac.uk/ena/data/view/<identifier1>,<identifier2>,<identifier3>,...,<identifierN>
For example, the following URL returns all SRA submissions in the range ERA000010 to ERA000020 in HTML format: http://www.ebi.ac.uk/ena/data/view/ERA000010-ERA000020. The following URL returns the two taxa 4235 and 6543 in HTML format: http://www.ebi.ac.uk/ena/data/view/Taxon:4235,Taxon:6543.
Please note that accession number ranges cannot span different data classes.
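As a rough illustration, ranges and lists can be assembled with plain string joins. This Python sketch assumes nothing beyond the two syntaxes shown above; the function names range_url and list_url are made up for this example.

```python
BASE_VIEW_URL = "http://www.ebi.ac.uk/ena/data/view/"


def range_url(first, last):
    """URL for an accession range; both ends must be in the same data class."""
    return f"{BASE_VIEW_URL}{first}-{last}"


def list_url(identifiers):
    """URL for a comma-separated list of identifiers."""
    return BASE_VIEW_URL + ",".join(identifiers)


print(range_url("ERA000010", "ERA000020"))
print(list_url(["Taxon:4235", "Taxon:6543"]))
```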
Retrieval using organism names
Taxonomy retrieval (Taxon:) is supported using scientific names, common names and synonyms. Examples include:
http://www.ebi.ac.uk/ena/data/view/Taxon:Bacillus subtilis
http://www.ebi.ac.uk/ena/data/view/Taxon:turkey
http://www.ebi.ac.uk/ena/data/view/Taxon:Human,Taxon:Cat,Taxon:Mouse,Taxon:Zebrafish
All possible matches are always returned, for example:
http://www.ebi.ac.uk/ena/data/view/Taxon:Bacillus
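Names such as "Bacillus subtilis" contain a space, which a program normally has to percent-encode before sending the request. The Python sketch below shows one way to do that with the standard library; it assumes the service accepts %20 for a space, and the function name taxon_name_url is illustrative only.

```python
from urllib.parse import quote

BASE_VIEW_URL = "http://www.ebi.ac.uk/ena/data/view/"


def taxon_name_url(names):
    """Build a Taxon: URL for one or more organism names.

    Assumption: percent-encoded spaces (%20) are accepted by the service.
    """
    parts = ["Taxon:" + quote(name) for name in names]
    return BASE_VIEW_URL + ",".join(parts)


print(taxon_name_url(["Bacillus subtilis"]))
print(taxon_name_url(["Human", "Cat", "Mouse", "Zebrafish"]))
```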
Free text search
Free text search against all ENA records, excluding trace sequences, is available using the following syntax:
http://www.ebi.ac.uk/ena/data/search?query=<term>
For example, the following query returns a summary of all ENA records containing the term histone: http://www.ebi.ac.uk/ena/data/search?query=histone.
Display options
The display parameter is used to specify the display format:
| Option | Description | Example |
| display=html | Results are displayed in HTML format. Supported by all ENA data classes. HTML is the default display format if no other display option has been specified. | http://www.ebi.ac.uk/ena/data/view/ERA000092&display=html |
| display=xml | Results are displayed in XML format. Supported by all ENA data classes. | |
| display=text | Results are displayed in text format. Supported only by EMBL-Bank data classes. | http://www.ebi.ac.uk/ena/data/view/A00145&display=text |
| display=fasta | Results are displayed in fasta format. Supported by EMBL-Bank and Trace data classes. | http://www.ebi.ac.uk/ena/data/view/A00145&display=fasta |
| display=fastq | Results are displayed in fastq format. Supported only by Trace data class. | http://www.ebi.ac.uk/ena/data/view/TI1941166100&display=fastq |
Additional options affecting the HTML format display:
| Option | Description | Example |
| view_full_sequence | The full EMBL-Bank sequence is displayed in the HTML view. | http://www.ebi.ac.uk/ena/data/view/BN000121?view_full_sequence |
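A client may want to reject an unsupported format before sending a request. The following Python sketch encodes the support rules from the display table; the data class labels ("EMBL-Bank", "Trace", "SRA") and the function name display_param are assumptions made for this example, not identifiers used by the service.

```python
# Formats mapped to the data classes that support them (None means all classes).
DISPLAY_SUPPORT = {
    "html": None,
    "xml": None,
    "text": {"EMBL-Bank"},
    "fasta": {"EMBL-Bank", "Trace"},
    "fastq": {"Trace"},
}


def display_param(fmt, data_class):
    """Return the '&display=...' suffix, or raise if the combination is unsupported."""
    if fmt not in DISPLAY_SUPPORT:
        raise ValueError(f"unknown display format: {fmt}")
    supported = DISPLAY_SUPPORT[fmt]
    if supported is not None and data_class not in supported:
        raise ValueError(f"display={fmt} is not supported for {data_class}")
    return f"&display={fmt}"


print("http://www.ebi.ac.uk/ena/data/view/A00145" + display_param("fasta", "EMBL-Bank"))
```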
Pagination options
The pagination in the HTML view is controlled by the page parameter with up to 10 records displayed on a single page. For example, http://www.ebi.ac.uk/ena/data/view/A00001-A00145&page=2, shows the second page of records (11-20) from the EMBL-Bank accession range A00001-A00145. If the page parameter has not been defined then the first page of the results will be displayed (1-10).
For all other display options, pagination is controlled by two URL parameters: the offset parameter defines the first record and the length parameter defines the number of records to retrieve. For example, to retrieve 30 EMBL-Bank records in fasta format from the range A00001-A00145, starting from the fifth record, use the URL http://www.ebi.ac.uk/ena/data/view/A00001-A00145&offset=5&length=30&display=fasta.
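The arithmetic behind both pagination schemes can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that offset counts records from 1, as the "fifth record" example suggests.

```python
PAGE_SIZE = 10  # records per page in the HTML view


def page_record_range(page):
    """Return the (first, last) record numbers shown on a given HTML page."""
    first = (page - 1) * PAGE_SIZE + 1
    return first, first + PAGE_SIZE - 1


def batch_urls(base_url, total, length, display="fasta"):
    """Split `total` records into consecutive offset/length requests."""
    urls = []
    for offset in range(1, total + 1, length):
        # The final batch only asks for the records that remain.
        size = min(length, total - offset + 1)
        urls.append(f"{base_url}&offset={offset}&length={size}&display={display}")
    return urls


print(page_record_range(2))
for url in batch_urls("http://www.ebi.ac.uk/ena/data/view/A00001-A00145", 25, 10):
    print(url)
```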
Download options
The download parameter is used to specify that records are to be saved in a file:
| Option | Description | Example |
| download=gzip | Results are compressed using gzip and saved into a file. | http://www.ebi.ac.uk/ena/data/view/A00001-A00145&offset=5&length=30&display=fasta&download=gzip |
| download=txt | Results are saved into a file without any compression. | http://www.ebi.ac.uk/ena/data/view/A00001-A00145&display=fasta&download=txt |
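Since the view URLs chain every option with an ampersand, a generic builder is enough to combine pagination, display and download options. This is a Python sketch with an invented function name (build_view_url); it checks only the two download values listed in the table above.

```python
BASE_VIEW_URL = "http://www.ebi.ac.uk/ena/data/view/"
DOWNLOAD_MODES = {"gzip", "txt"}


def build_view_url(identifiers, **params):
    """Append options as '&name=value' in the order they are passed."""
    download = params.get("download")
    if download is not None and download not in DOWNLOAD_MODES:
        raise ValueError(f"unsupported download mode: {download}")
    suffix = "".join(f"&{name}={value}" for name, value in params.items())
    return f"{BASE_VIEW_URL}{identifiers}{suffix}"


print(build_view_url("A00001-A00145", offset=5, length=30,
                     display="fasta", download="gzip"))
```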
Taxonomy portal options
The taxonomy portal is displayed on the ENA Browser taxonomy page in the Navigation panel (e.g. http://www.ebi.ac.uk/ena/data/view/display=html&Taxon:2759). The content of the taxonomy portal table can also be retrieved in text format, e.g. http://www.ebi.ac.uk/ena/data/stats/taxonomy/2759.
To retrieve records associated with a taxon identifier please use the following URL syntax:
http://www.ebi.ac.uk/ena/data/view/Taxon:<taxon identifier>&portal=<result>[&subtree=true][Pagination options][Display options][Download options]
The following results are supported:
| Result | Description |
| sequence_release | Nucleotide Sequences (EMBL-Bank Release) |
| sequence_update | Nucleotide Sequences (EMBL-Bank Update) |
| sequence_coding | Protein-coding sequences in EMBL-Bank |
| sample | Samples in ENA |
| study | Studies |
| analysis | Nucleotide sequence analyses in SRA |
| analysis_study | Nucleotide sequence analyses in SRA (grouped by study) |
| read_run | Raw reads in SRA |
| read_experiment | Raw reads in SRA (grouped by experiment) |
| read_study | Raw reads in SRA (grouped by study) |
| read_trace | Capillary Traces in Trace Archive |
For example, the first thousand EMBL-CDS records associated with Taxon:9606 are returned in text format using the following URL: http://www.ebi.ac.uk/ena/data/view/Taxon:9606&portal=sequence_coding&offset=1&length=1000&limit=1000&display=txt
The following thousand EMBL-CDS records can be returned using the following URL: http://www.ebi.ac.uk/ena/data/view/Taxon:9606&portal=sequence_coding&offset=1001&length=1000&limit=1000&display=txt
By default, only the records directly associated with the <taxon identifier> are returned. All records associated with the taxon and any of its subspecies or strains can be returned by setting the subtree option to true.
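The portal syntax can be assembled programmatically. The Python sketch below follows the parameter order given in the syntax line (portal, subtree, pagination, display, download); the function name portal_url and its keyword arguments are illustrative.

```python
BASE_VIEW_URL = "http://www.ebi.ac.uk/ena/data/view/"
PORTAL_RESULTS = {
    "sequence_release", "sequence_update", "sequence_coding", "sample",
    "study", "analysis", "analysis_study", "read_run", "read_experiment",
    "read_study", "read_trace",
}


def portal_url(tax_id, result, subtree=False, offset=None, length=None,
               display=None, download=None):
    """Build a taxonomy portal URL for one of the supported results."""
    if result not in PORTAL_RESULTS:
        raise ValueError(f"unsupported portal result: {result}")
    url = f"{BASE_VIEW_URL}Taxon:{tax_id}&portal={result}"
    if subtree:
        url += "&subtree=true"
    optional = (("offset", offset), ("length", length),
                ("display", display), ("download", download))
    for name, value in optional:
        if value is not None:
            url += f"&{name}={value}"
    return url


print(portal_url(9606, "sequence_coding", subtree=True, offset=1, length=1000))
```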
Using the data warehouse
While this documentation focuses on the full functionality that is offered by the RESTful interface to the ENA Advanced Search, it can also serve to provide assistance to users of the Query Builder web interface to this service, which offers cut-down functionality (such as supporting only Boolean 'AND' operations). We will provide more specific documentation for the Query Builder interface soon.
A search against ENA content using the ENA data warehouse requires the definition of a domain, a pre-determined partition of ENA content, and one or more search conditions expressed using a query string. A domain comprises a number of results that are deeper partitions of the ENA content. For queries based on these more granular results, display/download format and pagination options are available. While a domain is a partition of content based on the conceptual nature of content (e.g. raw sequence reads vs. annotated assembled sequences), a result is a partition that also takes into account the structure of the underlying content. Because diverse structures are used in ENA for managing different data, it is only at the level of results that some format options are made available.
The URL syntax for retrieving records from the ENA data warehouse is:
http://www.ebi.ac.uk/ena/data/warehouse/search?query=<query string>[&domain=<domain>]
or
http://www.ebi.ac.uk/ena/data/warehouse/search?query=<query string>&result=<result>[Pagination options][Display options][Download options]
By default, the whole ENA is searched using the query string. If a domain or result is specified then only this sub-section of ENA is subject to the search. Please note that the Pagination options, Display options and Download options are only supported when the result parameter is specified.
Examples
Return coding sequences from the STD dataclass for all members of the order Diptera (Taxon ID 7147):
Download a compressed flat file representing sequences from in and around the Galapagos Islands:
Discover all paired RNA-seq data from Hi-Seq platforms:
Show genome assemblies for the house mouse (Mus musculus, Taxon ID 10090):
http://www.ebi.ac.uk/ena/data/warehouse/search?query="tax_eq(10090)"&domain=assembly
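A program calling the warehouse would normally percent-encode the query string, including its surrounding double quotes. The Python sketch below reproduces the mouse assembly example in encoded form; the function name warehouse_url is an assumption of this example.

```python
from urllib.parse import quote

WAREHOUSE_URL = "http://www.ebi.ac.uk/ena/data/warehouse/search"


def warehouse_url(query, domain=None, result=None):
    """Build a warehouse search URL with a percent-encoded, double-quoted query."""
    url = f"{WAREHOUSE_URL}?query={quote(chr(34) + query + chr(34))}"
    if domain is not None:
        url += f"&domain={domain}"
    if result is not None:
        url += f"&result={result}"
    return url


print(warehouse_url("tax_eq(10090)", domain="assembly"))
```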
The available domains and results are listed below (please note that some results may be associated with several domains):
| Domain | Result | Description |
| assembly | assembly | Genome Assemblies in EMBL-Bank |
| sequence | sequence_release | Nucleotide Sequences (EMBL-Bank Release) |
| sequence | sequence_update | Nucleotide Sequences (EMBL-Bank Update) |
| coding | sequence_coding | Protein-coding sequences in EMBL-Bank |
| sample | sample | Samples in ENA |
| sample | sequence_release | Nucleotide Sequences (EMBL-Bank Release) |
| sample | sequence_update | Nucleotide Sequences (EMBL-Bank Update) |
| sample | sequence_coding | Protein-coding sequences in EMBL-Bank |
| study | study | Studies |
| analysis | analysis | Nucleotide sequence analyses in SRA |
| analysis | analysis_study | Nucleotide sequence analyses in SRA (grouped by study) |
| read | read_run | Raw reads in SRA |
| read | read_experiment | Raw reads in SRA (grouped by experiment) |
| read | read_study | Raw reads in SRA (grouped by study) |
| taxon | taxon | Taxonomic classification |
| trace | read_trace | Capillary Traces in Trace Archive |
Query string
The query string is made up of filtering conditions, joined by logical ANDs, ORs and NOTs and bound by double quotes. The use of parentheses is also supported. For example, the following query string could be used:
query="<filter1> AND (<filter2> OR <filter3>) OR NOT <filter4>"
For ease of reading, query strings have not been URL encoded in the examples below.
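The logical structure of a query string can be composed from small helpers before it is wrapped in double quotes. This is a Python sketch only: the helper names are invented, and the combination of filters is illustrative rather than a tested query.

```python
def all_of(*filters):
    """Join filters with AND."""
    return " AND ".join(filters)


def any_of(*filters):
    """Join filters with OR, wrapped in parentheses to keep precedence explicit."""
    return "(" + " OR ".join(filters) + ")"


def negate(condition):
    """Prefix a filter with NOT."""
    return f"NOT {condition}"


def as_query_value(expression):
    """Bind the whole expression in double quotes, as the syntax requires."""
    return f'"{expression}"'


expression = all_of(
    "tax_tree(2759)",
    any_of("base_count > 4000000", negate("environmental_sample=true")),
)
print("query=" + as_query_value(expression))
```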
Filter types
The following filter types are supported:
- boolean filter
- controlled vocabulary filter
- date filter
- number filter
- text filter
- geospatial filter
- taxonomy filter
Boolean filter
| Operator | = |
| Value | yes, true, no, false |
| Example | environmental_sample=true |
Controlled vocabulary filter
| Operator | =, != |
| Value | A text value from the controlled vocabulary enclosed in double quotes |
| Example | library_source="GENOMIC" |
Date filter
| Operator | =, !=, <, <=, >, >= |
| Value | A date in the format DD-MM-YYYY or DD-MON-YYYY |
| Example | first_public > 01-01-2012 |
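Dates held as Python date objects need to be rendered in one of the accepted formats. The following sketch produces the DD-MM-YYYY form; the function name date_condition is hypothetical.

```python
from datetime import date


def date_condition(column, operator, when):
    """Render a date filter using the DD-MM-YYYY format."""
    return f"{column} {operator} {when.strftime('%d-%m-%Y')}"


print(date_condition("first_public", ">", date(2012, 1, 1)))
```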
Number filter
| Operator | =, !=, <, <=, >, >= |
| Value | Any integer |
| Example | base_count > 4000000 |
Text filter
| Operator | =, != |
| Value | Any text value enclosed in double quotes. Wildcard (*) can be used at the start and/or end of the text value. |
| Example | library_name="*HUM*" |
Geospatial filter
| Function | Description | Parameters | Example |
| geo_box1 | All locations within a box defined by the lower left (SW) and upper right (NE) points. | south-west latitude, south-west longitude, north-east latitude, north-east longitude | geo_box1(-20, 10, 20, 50) |
| geo_box2 | All locations within a box defined by a centre point and a radius in km. | latitude, longitude, radius (km) | geo_box2(35, 100, 300) |
| geo_circ | All locations within a circle defined by a centre point and a radius in km. | latitude, longitude, radius (km) | geo_circ(35, 100, 300) |
| geo_lat | All locations within a latitude range given by a latitude and a radius in km. | latitude, radius (km) | geo_lat(0, 100) |
| geo_north | All locations north of a given latitude (inclusive). | latitude | geo_north(80) |
| geo_south | All locations south of a given latitude (inclusive). | latitude | geo_south(-80) |
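Geospatial filters are easy to mistype, so it can help to generate them from validated coordinates. The Python sketch below covers two of the functions; the builders share their names with the filters they produce, and the range checks (latitude between -90 and 90, longitude between -180 and 180) are ordinary coordinate limits, not rules stated by the service.

```python
def _check_latitude(lat):
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")


def _check_longitude(lon):
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")


def geo_box1(sw_lat, sw_lon, ne_lat, ne_lon):
    """Box defined by its south-west and north-east corners."""
    for lat in (sw_lat, ne_lat):
        _check_latitude(lat)
    for lon in (sw_lon, ne_lon):
        _check_longitude(lon)
    return f"geo_box1({sw_lat}, {sw_lon}, {ne_lat}, {ne_lon})"


def geo_circ(lat, lon, radius_km):
    """Circle defined by a centre point and a radius in km."""
    _check_latitude(lat)
    _check_longitude(lon)
    if radius_km <= 0:
        raise ValueError("radius must be positive")
    return f"geo_circ({lat}, {lon}, {radius_km})"


print(geo_box1(-20, 10, 20, 50))
print(geo_circ(35, 100, 300))
```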
Taxonomy filter
| Function | Description | Parameters | Example |
| tax_eq | All records that match the given NCBI taxonomy identifier | NCBI taxonomy identifier | tax_eq(9606) |
| tax_tree | All records that match the given NCBI taxonomy identifier or are descendants of it | NCBI taxonomy identifier | tax_tree(2759) |
Filter conditions
The geospatial and taxonomy filters are function based. All other filters use the following syntax:
<filter column> <operator> <value>
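The column/operator/value syntax can be generated with a small function that also takes care of quoting. In this Python sketch the function name condition and its quoted flag are hypothetical, and it is assumed that spaces around the operator are optional, since the examples in this document appear both with and without them.

```python
OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def condition(column, operator, value, quoted=False):
    """Render '<filter column> <operator> <value>'.

    Set quoted=True for text and controlled vocabulary values, which must be
    enclosed in double quotes.
    """
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    rendered = f'"{value}"' if quoted else str(value)
    return f"{column} {operator} {rendered}"


print(condition("base_count", ">", 4000000))
print(condition("library_source", "=", "GENOMIC", quoted=True))
```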
Filter columns
A full list of filter columns is available below:
| Domain | Result(s) | Filter column | Filter type | Description |
| assembly, coding, sequence, trace | assembly, coding, sequence, trace | accession | text | |
| analysis | analysis, analysis_sample, analysis_study | analysis_accession | text | |
| analysis | analysis, analysis_sample, analysis_study | analysis_title | text | brief sequence analysis description |
| analysis | analysis, analysis_sample, analysis_study | analysis_type | controlled vocabulary | type of sequence analysis |
| assembly | assembly | assembly_description | text | detailed genome assembly description |
| assembly | assembly | assembly_name | text | genome assembly name |
| assembly | assembly | assembly_title | text | brief genome assembly description |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | base_count | number | number of base pairs |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | bio_material | text | identifier for biological material including institute and collection code |
| read, analysis | analysis, analysis_sample, analysis_study, read_run, read_experiment, read_sample, read_study | center_name | text | |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | cell_line | text | cell line from which the sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | cell_type | text | cell type from which the sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | collected_by | text | name of the person who collected the specimen |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | collection_date | date | date that the specimen was collected |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | country | text | locality of sample isolation: country names, oceans or seas, followed by regions and localities |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | cultivar | text | cultivar (cultivated variety) of plant from which sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | culture_collection | text | identifier for the sample culture including institute and collection code |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | dataclass | controlled vocabulary | sequence data class |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | description | text | brief sequence description |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | dev_stage | text | sample obtained from an organism in a specific developmental stage |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | ecotype | text | a population within a given species displaying traits that reflect adaptation to a local habitat |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | environmental_sample | boolean | identifies sequences derived by direct molecular isolation from an environmental DNA sample |
| read | read_run, read_experiment, read_sample, read_study | experiment_accession | text | |
| read | read_run, read_experiment, read_sample, read_study | experiment_title | text | |
| sequence, sequence_coding, analysis, read | sequence_release, sequence_update, sequence_coding, analysis, analysis_sample, analysis_study, read_run, read_experiment, read_sample, read_study | first_public | date | date when made public |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | germline | text | the sample is an unrearranged molecule that was inherited from the parental germline |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | host | text | natural (as opposed to laboratory) host to the organism from which sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | identified_by | text | name of the taxonomist who identified the specimen |
| read | read_run, read_experiment, read_sample, read_study | instrument_model | controlled vocabulary | instrument model used in sequencing experiment |
| read | read_run, read_experiment, read_sample, read_study | instrument_platform | controlled vocabulary | instrument platform used in sequencing experiment |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | isolate | text | individual isolate from which sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | isolation_source | text | describes the physical, environmental and/or local geographical source of the sample |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | keywords | text | keywords associated with the sequence |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | lab_host | text | scientific name of the laboratory host used to propagate the source organism for the sample |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | last_updated | date | date when last updated |
| read | read_run, read_experiment, read_sample, read_study | library_layout | controlled vocabulary | sequencing library layout |
| read | read_run, read_experiment, read_sample, read_study | library_name | text | sequencing library name |
| read | read_run, read_experiment, read_sample, read_study | library_selection | controlled vocabulary | |
| read | read_run, read_experiment, read_sample, read_study | library_source | controlled vocabulary | |
| read | read_run, read_experiment, read_sample, read_study | library_strategy | controlled vocabulary | |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | mating_type | text | |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | mol_type | text | in vivo molecule type of the sequence |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | organelle | text | membrane-bound intracellular structure from which the sequence was obtained |
| study | study | project_description | text | detailed project description |
| study | study | project_name | text | sequencing project name |
| study | study | project_title | text | brief sequencing project description |
| read | read_run, read_experiment, read_sample, read_study | read_count | number | number of reads |
| read | read_run, read_experiment, read_sample, read_study | run_accession | text | |
| read, analysis | read_run, read_experiment, read_sample, read_study, analysis, analysis_sample, analysis_study | sample_accession | text | |
| read, analysis | read_run, read_experiment, read_sample, read_study, analysis, analysis_sample, analysis_study | sample_title | text | brief sample description |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | serotype | text | serological variety of a species characterized by its antigenic properties |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | serovar | text | serological variety of a species (usually a prokaryote) characterized by its antigenic properties |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | sex | text | sex of the organism from which the sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | specimen_voucher | text | identifier for the sample culture including institute and collection code |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | strain | text | strain from which sample was obtained |
| assembly, read, analysis, study | assembly, read_run, read_experiment, read_sample, read_study, analysis, analysis_sample, analysis_study, study | study_accession | text | |
| read, analysis | read_run, read_experiment, read_sample, read_study, analysis, analysis_sample, analysis_study | study_title | text | brief sequencing study description |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | sub_species | text | name of sub-species of organism from which sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | sub_strain | text | name or identifier of a genetically or otherwise modified strain from which sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | tax_division | text | taxonomic division |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | tissue_lib | text | tissue library from which sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | tissue_type | text | tissue type from which the sample was obtained |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | topology | controlled vocabulary | sequence topology: circular or linear |
| sequence, sequence_coding | sequence_release, sequence_update, sequence_coding | variety | text | variety (varietas, a formal Linnaean rank) of organism from which sample was derived |
Retrieve tabulated data from the data warehouse
In addition to the formats listed above, a tab separated report of data can be returned for each result (that is, this report cannot be returned if searching by domain rather than result). The URL format for retrieving these reports is:
http://www.ebi.ac.uk/ena/data/warehouse/search?query=<query string>&result=<result>&fields=<fields>&display=report[&sortfields=<sortfields>][&download=txt][Pagination options]
Each result has a default accession column. This is returned as the first column of the report regardless of whether or not it was listed in the fields to be retrieved. The report is sorted by this accession column unless sortfields is provided, in which case it is sorted in the order of the listed columns. The additional fields that can be returned in these reports are listed below. Note that many of these are the same as the searchable fields listed above; however, there are generally more returnable than searchable fields for a given result. Taxonomic information can be returned via tax_id or scientific_name, and geospatial information is returned using location.
| Result | Returnable fields |
| analysis | analysis_accession, analysis_alias, analysis_title, analysis_type, center_name, first_public, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id |
| analysis_study | analysis_accession, analysis_alias, analysis_title, analysis_type, center_name, first_public, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id |
| assembly | accession, assembly_description, assembly_name, assembly_title, scientific_name, strain, study_accession, tax_id |
| read_experiment | base_count, center_name, experiment_accession, experiment_alias, experiment_title, fastq_aspera, fastq_bytes, fastq_ftp, fastq_galaxy, fastq_md5, first_public, instrument_model, instrument_platform, library_layout, library_name, library_selection, library_source, library_strategy, project_accession, read_count, run_accession, run_alias, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submission_accession, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id |
| read_run | base_count, center_name, experiment_accession, experiment_alias, experiment_title, fastq_aspera, fastq_bytes, fastq_ftp, fastq_galaxy, fastq_md5, first_public, instrument_model, instrument_platform, library_layout, library_name, library_selection, library_source, library_strategy, project_accession, read_count, run_accession, run_alias, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submission_accession, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id |
| read_study | base_count, center_name, experiment_accession, experiment_alias, experiment_title, fastq_aspera, fastq_bytes, fastq_ftp, fastq_galaxy, fastq_md5, first_public, instrument_model, instrument_platform, library_layout, library_name, library_selection, library_source, library_strategy, project_accession, read_count, run_accession, run_alias, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submission_accession, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id |
| read_trace | accession, scientific_name, tax_id, trace_name |
| sample | accession, bio_material, cell_line, cell_type, collected_by, collection_date, country, cultivar, culture_collection, description, dev_stage, ecotype, environmental_sample, first_public, germline, host, host_status, identified_by, isolate, isolation_source, location, mating_type, sample_alias, scientific_name, serotype, serovar, sex, specimen_voucher, strain, sub_species, sub_strain, tax_id, tissue_lib, tissue_type, variety |
| sequence_coding | accession, allele, artificial_location, base_count, bio_material, cell_line, cell_type, codon_start, collected_by, collection_date, country, cultivar, culture_collection, dataclass, db_xref, description, dev_stage, ec_number, ecotype, environmental_sample, exception, experiment, first_public, function, gene, gene_synonym, germline, host, identified_by, inference, isolate, isolation_source, keywords, lab_host, last_updated, location, locus_tag, map, mating_type, mol_type, note, old_locus_tag, operon, organelle, partial, product, protein_id, pseudo, pseudo_gene, ribosomal_slippage, scientific_name, serotype, serovar, sex, specimen_voucher, standard_name, strain, sub_species, sub_strain, tax_division, tax_id, tissue_lib, tissue_type, topology, trans_splicing, transl_except, transl_table, variety |
| sequence_release | accession, base_count, bio_material, cell_line, cell_type, collected_by, collection_date, country, cultivar, culture_collection, dataclass, description, dev_stage, ecotype, environmental_sample, first_public, germline, host, identified_by, isolate, isolation_source, keywords, lab_host, last_updated, location, mating_type, mol_type, organelle, scientific_name, serotype, serovar, sex, specimen_voucher, strain, sub_species, sub_strain, tax_division, tax_id, tissue_lib, tissue_type, topology, variety |
| sequence_update | accession, base_count, bio_material, cell_line, cell_type, collected_by, collection_date, country, cultivar, culture_collection, dataclass, description, dev_stage, ecotype, environmental_sample, first_public, germline, host, identified_by, isolate, isolation_source, keywords, lab_host, last_updated, location, mating_type, mol_type, organelle, scientific_name, serotype, serovar, sex, specimen_voucher, strain, sub_species, sub_strain, tax_division, tax_id, tissue_lib, tissue_type, topology, variety |
| study | breed, cultivar, isolate, scientific_name, strain, study_accession, study_description, study_name, study_title, tax_id |
| taxon | scientific_name, tax_id, taxon_accession, taxon_title |
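A tab-separated report lends itself to scripted processing. The Python sketch below builds a report URL and parses report text with the standard csv module. It rests on several assumptions: that multiple fields are separated by commas, that the report starts with a header row, and that the sample text (with made-up accessions RUN1 and RUN2) resembles real output. The function names are illustrative.

```python
import csv
import io
from urllib.parse import quote

WAREHOUSE_URL = "http://www.ebi.ac.uk/ena/data/warehouse/search"


def report_url(query, result, fields, sortfields=None):
    """Build a display=report URL (assumes comma-separated field lists)."""
    encoded = quote('"' + query + '"')
    url = (f"{WAREHOUSE_URL}?query={encoded}&result={result}"
           f"&fields={','.join(fields)}&display=report")
    if sortfields:
        url += f"&sortfields={','.join(sortfields)}"
    return url


def parse_report(text):
    """Parse tab-separated report text into a list of dicts (assumes a header row)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


print(report_url("read_count > 1000000", "read_run", ["run_accession", "read_count"]))

# Made-up sample text standing in for a downloaded report.
sample = "run_accession\tread_count\nRUN1\t100\nRUN2\t250\n"
rows = parse_report(sample)
print(rows[1]["read_count"])
```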
Using the marker portal
The marker portal is a special section of the data warehouse, with a new domain id "marker". There are currently 12 coding markers that have been defined and linked to the CDS data table:
| Marker symbol | Marker description |
| COX1 | Cytochrome C oxidase subunit I |
| CYTB | Cytochrome B |
| EEF1A | Elongation factor 1-alpha |
| EEF2K | Eukaryotic elongation factor-2 kinase |
| MATK | Maturase K |
| NAD1 | NADH dehydrogenase subunit 1 |
| NAD2 | NADH dehydrogenase subunit 2 |
| NAD4 | NADH dehydrogenase subunit 4 |
| NAD5 | NADH dehydrogenase subunit 5 |
| RAG2 | Recombination activating gene 2 |
| RBCL | Ribulose bisphosphate carboxylase |
| RHOD | Rhodopsin |
The marker domain used by this portal allows the search of a single marker (using a marker symbol listed above) and any taxonomic search to narrow the results to the taxa of interest. No other fields are currently searchable for this domain.
Downloading or displaying reports, sequences and taxonomic coverage information requires the result to be set. For coding markers, this is always sequence_coding. The examples below show how to perform these queries.
Examples
Download all CDS and parent accessions for mammalian COX1 sequences
Display FASTA sequences for all mammalian COX1 sequences
Display the mammalian taxonomic coverage (with sequence counts) for COX1
Note: to download the taxonomic coverage above, append &download=marker to the end of the URL.
Retrieve SRA metadata in XML format
SRA metadata XMLs can be retrieved using the display=xml parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/ERA000092&display=xml will return the SRA Submission ERA000092 object in XML format.
Retrieve EMBL-Bank sequences in fasta format
EMBL-Bank sequences can be returned in fasta format using the display=fasta parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/A00145&display=fasta will return the EMBL-Bank sequence A00145 in fasta format.
Retrieve EMBL-Bank subsequences in fasta format
The range parameter can be used in combination with the display=fasta option to return EMBL-Bank subsequences in fasta format. For example, a subsequence of EMBL-Bank entry A00145 from bases 3 to 63 is retrieved using the URL: http://www.ebi.ac.uk/ena/data/view/A00145&display=fasta&range=3-63.
Retrieve EMBL-Bank subsequences in HTML format
The range parameter can be used in combination with the display=html (default) option to return EMBL-Bank subsequences in HTML format. For example, a subsequence of EMBL-Bank entry A00145 from bases 3 to 63 is retrieved using the URL: http://www.ebi.ac.uk/ena/data/view/A00145&range=3-63.
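Both subsequence examples follow the same pattern and can be generated by one function. In this Python sketch the name subsequence_url is hypothetical; the check that positions start at 1 and that the start does not exceed the end is a sanity check added here, not a documented rule of the service.

```python
BASE_VIEW_URL = "http://www.ebi.ac.uk/ena/data/view/"


def subsequence_url(accession, start, end, display=None):
    """Build a URL for bases start..end; display=None leaves the HTML default."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    url = f"{BASE_VIEW_URL}{accession}"
    if display is not None:
        url += f"&display={display}"
    return f"{url}&range={start}-{end}"


print(subsequence_url("A00145", 3, 63, display="fasta"))
print(subsequence_url("A00145", 3, 63))
```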
Retrieve EMBL-Bank records in flat file format
EMBL-Bank records can be retrieved in flat file format using the display=text parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/A00145&display=text will return the EMBL-Bank entry A00145 in flat file format.
Retrieve EMBL-Bank records in XML format
EMBL-Bank records can be retrieved in XML format using the display=xml parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/A00145&display=xml will return the EMBL-Bank entry A00145 in XML format.
Retrieve EMBL-Bank expanded CON records
To retrieve EMBL-Bank expanded CON records please use the expanded=true parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/AL513382&display=text&expanded=true will return the expanded CON entry AL513382 in flat file format.
Expanded CON records are different from CON records in two ways. Firstly, the expanded CON records contain the full sequence in addition to the contig assembly instructions. Secondly, if a CON record contains only source or gap features the expanded CON records will also display all features from the segment records.
Retrieve EMBL-Bank header in flat file format
To retrieve EMBL-Bank header in flat file format please use the header=true parameter, e.g.:
http://www.ebi.ac.uk/ena/data/view/BN000065&display=text&header=true
Retrieve EMBL-Bank header in XML format
To retrieve EMBL-Bank header in XML format please use the header=true parameter, e.g.:
http://www.ebi.ac.uk/ena/data/view/BN000065&display=xml&header=true
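The expanded=true and header=true flags shown above are appended like any other parameter. A minimal sketch, assuming a hypothetical helper record_url (not an ENA-provided function), that reproduces the example URLs from this document:

```python
BASE = "http://www.ebi.ac.uk/ena/data/view/"


def record_url(identifier, display, header=False, expanded=False):
    """Build a record URL with optional header/expanded flags.

    Hypothetical helper. A flag is only added when it is requested,
    because this document shows only the '=true' spelling.
    """
    url = f"{BASE}{identifier}&display={display}"
    if expanded:
        url += "&expanded=true"
    if header:
        url += "&header=true"
    return url


# Header of BN000065 in flat file and XML format.
print(record_url("BN000065", "text", header=True))
print(record_url("BN000065", "xml", header=True))

# Expanded CON entry AL513382 in flat file format.
print(record_url("AL513382", "text", expanded=True))
```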
Retrieve EMBL-Bank records using sequence versions
Specific EMBL-Bank sequence versions can be retrieved by appending the sequence version to the identifier:
<identifier>.<sequence version>
In the html view (display=html), if the requested sequence version is below the current sequence version, a warning is shown with a link to historical sequence versions of the record, e.g. http://www.ebi.ac.uk/ena/data/view/AM407889.1. If the requested sequence version is higher than the current sequence version, a warning is also shown, e.g. http://www.ebi.ac.uk/ena/data/view/AM407889.12. Note that in the html view, links to historical versions of EMBL-Bank records are always provided in the Navigation panel.
In the xml view (display=xml), the latest sequence version is always displayed, but an analogous warning can be constructed because both the requested and the returned sequence versions are presented, e.g. http://www.ebi.ac.uk/ena/data/view/AM407889.1&display=xml.
In fasta and text view (display=fasta and display=text), the requested sequence version is returned directly, e.g.: http://www.ebi.ac.uk/ena/data/view/AM407889.1&display=fasta and http://www.ebi.ac.uk/ena/data/view/AM407889.1&display=text.
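The <identifier>.<sequence version> form can be composed and split with plain string handling. The sketch below assumes hypothetical helpers (versioned_url, split_versioned_id) and assumes that sequence versions are positive integers; neither is defined by the ENA Browser itself.

```python
BASE = "http://www.ebi.ac.uk/ena/data/view/"


def versioned_url(accession, version, display=None):
    """URL for a specific sequence version (hypothetical helper)."""
    # Assumption: sequence versions are integers starting at 1.
    if version < 1:
        raise ValueError("sequence version must be a positive integer")
    url = f"{BASE}{accession}.{version}"
    if display is not None:
        url += f"&display={display}"
    return url


def split_versioned_id(identifier):
    """Split 'AM407889.1' into ('AM407889', 1); version is None if absent."""
    accession, dot, version = identifier.partition(".")
    if not dot:
        return accession, None
    return accession, int(version)


# display=fasta and display=text return the requested version directly.
print(versioned_url("AM407889", 1, display="fasta"))
print(versioned_url("AM407889", 1, display="text"))
print(split_versioned_id("AM407889.12"))
```

Splitting the identifier is useful in the xml view, where the requested version has to be compared with the version actually returned.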
Retrieve EMBL-Bank graphical image
A graphical view is available for EMBL-Bank records that have feature/assembly annotation or a sequence. These views contain two components, feature/assembly and sequence, for which ranges to be viewed are defined separately using the featureRange and sequenceRange parameters.
For example, the URL http://www.ebi.ac.uk/ena/data/view/graphics/BN000065&featureRange=10-300000&sequenceRange=103-166 returns an image created from EMBL-Bank entry BN000065 showing feature and assembly annotation from bases 10-300000 and sequence from bases 103-166.
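Because the two ranges are set independently, it can help to pass them as separate (start, end) pairs. A minimal sketch, assuming a hypothetical helper graphics_url that only assembles the URL and does not fetch the image:

```python
GRAPHICS_BASE = "http://www.ebi.ac.uk/ena/data/view/graphics/"


def graphics_url(identifier, feature_range, sequence_range):
    """Build a graphical-view URL from two (start, end) tuples.

    Hypothetical helper: feature_range maps to featureRange and
    sequence_range maps to sequenceRange.
    """
    f_start, f_end = feature_range
    s_start, s_end = sequence_range
    return (
        f"{GRAPHICS_BASE}{identifier}"
        f"&featureRange={f_start}-{f_end}"
        f"&sequenceRange={s_start}-{s_end}"
    )


print(graphics_url("BN000065", (10, 300000), (103, 166)))
```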
Retrieve Taxon records in XML format
Taxon records can be retrieved in XML format using the display=xml parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/Taxon:2759&display=xml returns the taxonomy record for Eukaryota.
Retrieve Taxon records in Darwin Core XML format
Taxon records can be retrieved in Darwin Core XML format using the display=dwc parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/Taxon:2759&display=dwc returns the taxonomy record for Eukaryota in Darwin Core XML.
Retrieve Trace sequences in fasta format
Trace sequences can be returned in fasta format using the display=fasta parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/TI1&display=fasta will return the Trace sequence TI1 in fasta format.
Retrieve Trace sequences in fastq format
Trace sequences can be returned in fastq format using the display=fastq parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/TI1&display=fastq will return the Trace sequence TI1 in fastq format.
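Once a display=fastq response has been read as text, its records can be split into groups of four lines. The sketch below assumes the response follows the conventional four-line FASTQ layout (a header starting with "@", the sequence, a "+" line, and a quality string); this document does not confirm the exact layout. The function parse_fastq is hypothetical, and the sample text is invented for illustration and is not real TI1 data.

```python
def parse_fastq(text):
    """Yield (name, sequence, quality) tuples from four-line FASTQ text."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    for i in range(0, len(lines), 4):
        header, sequence, plus, quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at non-empty line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Drop the leading '@' from the header to get the record name.
        yield header[1:], sequence, quality


# Invented sample record, for illustration only.
SAMPLE = "@example_read\nACGT\n+\nIIII\n"
records = list(parse_fastq(SAMPLE))
print(records)
```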
Retrieve Trace metadata in XML format
Trace metadata XMLs can be retrieved using the display=xml parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/TI1&display=xml will return the Trace TI1 metadata in XML format.

