ENA Browser REST URLs

This document describes the ENA Browser REST URL syntax for programmatic users. The majority of ENA data classes and formats are supported by the ENA Browser.

Full details about ENA data classes and formats are available here.

Basics

Examples

Retrieval using single identifiers

To retrieve a record by a single identifier, for all record types other than Taxon and Project, please use the following URL syntax:

http://www.ebi.ac.uk/ena/data/view/<identifier>

For example, the following URL returns the EMBL-Bank entry with accession number A00145 in HTML format: http://www.ebi.ac.uk/ena/data/view/A00145.

For Taxon and Project records, respectively, please use the following URL syntax:

http://www.ebi.ac.uk/ena/data/view/Taxon:<identifier>

and 

http://www.ebi.ac.uk/ena/data/view/Project:<identifier>

For example, the following URL returns the Taxon record with identifier 9606:

http://www.ebi.ac.uk/ena/data/view/Taxon:9606 

and the following URL returns the Project record with identifier 20:

http://www.ebi.ac.uk/ena/data/view/Project:20
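The single-identifier patterns above can be wrapped in a small helper. This is a minimal Python sketch that assumes the base URL shown in this section; the function name view_url and its record_type argument are hypothetical and not part of any ENA library.

```python
from typing import Optional
from urllib.parse import quote

# Base URL of the ENA Browser view service, as given in this document.
VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"


def view_url(identifier: str, record_type: Optional[str] = None) -> str:
    """Build a single-record view URL (hypothetical helper).

    record_type must be "Taxon" or "Project" for those record types and
    None for every other record, which uses the bare identifier.
    """
    if record_type is not None:
        if record_type not in ("Taxon", "Project"):
            raise ValueError("record_type must be 'Taxon', 'Project' or None")
        identifier = f"{record_type}:{identifier}"
    # Percent-encode unsafe characters but keep the ':' prefix separator literal.
    return VIEW_BASE + quote(identifier, safe=":")


print(view_url("A00145"))          # EMBL-Bank entry
print(view_url("9606", "Taxon"))   # Taxon record
print(view_url("20", "Project"))   # Project record
```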

Retrieval using multiple identifiers

To retrieve records by ranges or lists of identifiers, please use the following URL syntax:

http://www.ebi.ac.uk/ena/data/view/<identifier1>-<identifierN>

http://www.ebi.ac.uk/ena/data/view/<identifier1>,<identifier2>,<identifier3>,...,<identifierN>

For example, the following URL returns all SRA submissions in the range ERA000010 to ERA000020 in HTML format: http://www.ebi.ac.uk/ena/data/view/ERA000010-ERA000020. Similarly, the following URL returns the two Taxon records 4235 and 6543 in HTML format: http://www.ebi.ac.uk/ena/data/view/Taxon:4235,Taxon:6543.

Please note that accession number ranges cannot span different data classes.
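A Python sketch of the range and list forms follows. The names range_url and list_url are hypothetical, and the prefix comparison is only a rough heuristic for the rule that ranges cannot span data classes: this document does not say that matching accession prefixes guarantee the same data class.

```python
import re
from typing import Iterable

VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"


def _alpha_prefix(accession: str) -> str:
    """Return the leading letters of an accession, e.g. 'ERA' for 'ERA000010'."""
    match = re.match(r"[A-Za-z]*", accession)
    return match.group(0) if match else ""


def range_url(first: str, last: str) -> str:
    """Build a URL for an accession range such as ERA000010-ERA000020."""
    # Heuristic only: differing letter prefixes usually indicate different
    # data classes, which a single range cannot span.
    if _alpha_prefix(first) != _alpha_prefix(last):
        raise ValueError(f"{first} and {last} appear to belong to different data classes")
    return f"{VIEW_BASE}{first}-{last}"


def list_url(identifiers: Iterable[str]) -> str:
    """Build a URL for a comma-separated list of identifiers."""
    ids = list(identifiers)
    if not ids:
        raise ValueError("at least one identifier is required")
    return VIEW_BASE + ",".join(ids)
```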

Retrieval using organism names

Taxonomy retrieval (Taxon:) is supported using scientific names, common names and synonyms. Examples include:

http://www.ebi.ac.uk/ena/data/view/Taxon:Bacillus subtilis

http://www.ebi.ac.uk/ena/data/view/Taxon:turkey

http://www.ebi.ac.uk/ena/data/view/Taxon:Human,Taxon:Cat,Taxon:Mouse,Taxon:Zebrafish

All possible matches are always returned, for example:

http://www.ebi.ac.uk/ena/data/view/Taxon:Bacillus
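The organism-name examples above are shown unencoded; in an actual request, names containing spaces such as "Bacillus subtilis" need to be percent-encoded. A minimal sketch, assuming standard percent-encoding is accepted by the service; taxon_name_url is a hypothetical helper name.

```python
from urllib.parse import quote

VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"


def taxon_name_url(*names: str) -> str:
    """Build a Taxon view URL from one or more organism names.

    Spaces inside names are percent-encoded, while the ':' and ','
    separators used by the ENA Browser syntax are kept literal.
    """
    if not names:
        raise ValueError("at least one name is required")
    terms = ",".join(f"Taxon:{name}" for name in names)
    return VIEW_BASE + quote(terms, safe=":,")


print(taxon_name_url("Bacillus subtilis"))
print(taxon_name_url("Human", "Cat", "Mouse", "Zebrafish"))
```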

Free text search

Free text search against all ENA records, excluding trace sequences, is available using the following syntax:

http://www.ebi.ac.uk/ena/data/search?query=<term>

For example, the following query returns a summary of all ENA records containing the term histone: http://www.ebi.ac.uk/ena/data/search?query=histone.
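Free-text terms should be URL-encoded before they are placed in the query parameter. The sketch below uses the endpoint from the histone example above; text_search_url is a hypothetical name, and encoding spaces as %20 (rather than '+') is a conservative choice, not a documented requirement.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the "histone" example above.
SEARCH_BASE = "http://www.ebi.ac.uk/ena/data/search"


def text_search_url(term: str) -> str:
    """Build a free text search URL, encoding spaces as %20."""
    return SEARCH_BASE + "?" + urlencode({"query": term}, quote_via=quote)


print(text_search_url("histone"))
```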

Display options

The display parameter is used to specify the display format:

Option Description Example
display=html Results are displayed in HTML format. Supported by all ENA data classes. HTML is the default display format if no other display option has been specified. http://www.ebi.ac.uk/ena/data/view/ERA000092&display=html
display=xml Results are displayed in XML format. Supported by all ENA data classes. http://www.ebi.ac.uk/ena/data/view/A00145&display=xml
display=text Results are displayed in text format. Supported only by EMBL-Bank data classes. http://www.ebi.ac.uk/ena/data/view/A00145&display=text
display=fasta Results are displayed in fasta format. Supported by EMBL-Bank and Trace data classes. http://www.ebi.ac.uk/ena/data/view/A00145&display=fasta
display=fastq Results are displayed in fastq format. Supported only by Trace data class. http://www.ebi.ac.uk/ena/data/view/TI1941166100&display=fastq
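The support matrix in the table above can be encoded as a lookup so that unsupported combinations are caught before a request is sent. This is a sketch only: the data_class labels ("EMBL-Bank", "Trace", ...) are strings the caller supplies, not a parameter of the service, and the mapping covers just the five options in this table (Taxon records also accept display=dwc, described further below).

```python
VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"

# Data classes supporting each display option, transcribed from the table above.
# None means "supported by all ENA data classes".
DISPLAY_SUPPORT = {
    "html": None,
    "xml": None,
    "text": {"EMBL-Bank"},
    "fasta": {"EMBL-Bank", "Trace"},
    "fastq": {"Trace"},
}


def display_url(identifier: str, display: str, data_class: str) -> str:
    """Build a view URL after checking the display option against the table."""
    if display not in DISPLAY_SUPPORT:
        raise ValueError(f"unknown display option: {display}")
    supported = DISPLAY_SUPPORT[display]
    if supported is not None and data_class not in supported:
        raise ValueError(f"display={display} is not supported for {data_class} records")
    return f"{VIEW_BASE}{identifier}&display={display}"
```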

Additional options affecting the HTML format display:

Option Description Example
view_full_sequence The full EMBL-Bank sequence is displayed in the HTML view. http://www.ebi.ac.uk/ena/data/view/BN000121?view_full_sequence

 

Pagination options

Pagination in the HTML view is controlled by the page parameter, with up to 10 records displayed per page. For example, http://www.ebi.ac.uk/ena/data/view/A00001-A00145&page=2 shows the second page of records (11-20) from the EMBL-Bank accession range A00001-A00145. If the page parameter is not defined, the first page of results (1-10) is displayed.

For all other display options, pagination is controlled by two URL parameters: offset defines the first record to retrieve and length defines the number of records to retrieve. For example, the URL http://www.ebi.ac.uk/ena/data/view/A00001-A00145&offset=5&length=30&display=fasta returns 30 EMBL-Bank records in fasta format, starting from the fifth record in the range A00001-A00145.
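Both pagination schemes reduce to simple arithmetic. The sketch below assumes offset is 1-based (offset=5 selects the fifth record, and the portal examples below start at offset=1), and that the caller already knows the total number of records; paged_urls and html_page_for_record are hypothetical helper names.

```python
from typing import Iterator

VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"
HTML_PAGE_SIZE = 10  # records per page in the HTML view


def html_page_for_record(n: int) -> int:
    """Return the value of the page parameter that shows the n-th record (1-based)."""
    return (n - 1) // HTML_PAGE_SIZE + 1


def paged_urls(selection: str, total: int, block: int, display: str = "fasta") -> Iterator[str]:
    """Yield offset/length URLs that together cover records 1..total."""
    for offset in range(1, total + 1, block):
        length = min(block, total - offset + 1)
        yield f"{VIEW_BASE}{selection}&offset={offset}&length={length}&display={display}"


for url in paged_urls("A00001-A00145", total=145, block=50):
    print(url)
```

Requesting fixed-size blocks keeps each response bounded; the last block is shortened so that no offset points past the end of the result set.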

Download options

The download parameter is used to specify that records are to be saved in a file:

Option Description Example
download=gzip Results are compressed using gzip and saved into a file. http://www.ebi.ac.uk/ena/data/view/A00001-A00145&offset=5&length=30&display=fasta&download=gzip
download=txt Results are saved into a file without any compression. http://www.ebi.ac.uk/ena/data/view/A00001-A00145&display=fasta&download=txt
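When downloading programmatically, the gzip option produces compressed bytes that must be decompressed client-side. A minimal sketch using only the Python standard library; the helper names are hypothetical, and UTF-8 text is an assumption. Detecting gzip by its magic bytes means the same code handles both download=gzip and download=txt responses.

```python
import gzip
import urllib.request


def with_download(url: str, compress: bool) -> str:
    """Append the download parameter to an existing view URL."""
    return url + ("&download=gzip" if compress else "&download=txt")


def decode_payload(payload: bytes, encoding: str = "utf-8") -> str:
    """Decompress the payload if it is gzip data, then decode it to text."""
    if payload[:2] == b"\x1f\x8b":  # gzip magic number
        payload = gzip.decompress(payload)
    return payload.decode(encoding)


def fetch_text(url: str, timeout: float = 60.0) -> str:
    """Download a URL and return its (decompressed) text content."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return decode_payload(response.read())
```

For example, fetch_text(with_download("http://www.ebi.ac.uk/ena/data/view/A00001-A00145&offset=5&length=30&display=fasta", compress=True)) would request the compressed file from the table above and return it as text.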

Taxonomy portal options

The taxonomy portal is displayed in the Navigation panel of the ENA Browser taxonomy page (e.g. http://www.ebi.ac.uk/ena/data/view/Taxon:2759&display=html). The content of the taxonomy portal table can also be retrieved in text format, e.g. http://www.ebi.ac.uk/ena/data/stats/taxonomy/2759.

To retrieve records associated with a taxon identifier please use the following URL syntax:

http://www.ebi.ac.uk/ena/data/view/Taxon:<taxon identifier>&portal=<result>[&subtree=true][Pagination options][Display options][Download options]

The following results are supported:

Result Description
sequence_release Nucleotide Sequences (EMBL-Bank Release)
sequence_update Nucleotide Sequences (EMBL-Bank Update)
sequence_coding Protein-coding sequences in EMBL-Bank
sample Samples in ENA
study Studies
analysis Nucleotide sequence analyses in SRA
analysis_study Nucleotide sequence analyses in SRA (grouped by study)
read_run Raw reads in SRA
read_experiment Raw reads in SRA (grouped by experiment)
read_study Raw reads in SRA (grouped by study)
read_trace Capillary Traces in Trace Archive

For example, the first thousand EMBL-CDS records associated with Taxon:9606 are returned in text format using the following URL: http://www.ebi.ac.uk/ena/data/view/Taxon:9606&portal=sequence_coding&offset=1&length=1000&limit=1000&display=txt

The next thousand EMBL-CDS records can be returned using the URL: http://www.ebi.ac.uk/ena/data/view/Taxon:9606&portal=sequence_coding&offset=1001&length=1000&limit=1000&display=txt

By default, only the records directly associated with the <taxon identifier> are returned. Records associated with the taxon and any of its subspecies or strains can be returned by setting the subtree option to true (&subtree=true).
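The portal syntax above can be assembled from its optional parts as in the following sketch. portal_url is a hypothetical helper; it appends parameters in the order shown in the syntax line and omits the limit parameter that appears in the examples, since its meaning is not described here.

```python
from typing import Optional

VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"

# Results supported by the taxonomy portal, from the table above.
PORTAL_RESULTS = {
    "sequence_release", "sequence_update", "sequence_coding", "sample", "study",
    "analysis", "analysis_study", "read_run", "read_experiment", "read_study",
    "read_trace",
}


def portal_url(tax_id: int, result: str, subtree: bool = False,
               offset: Optional[int] = None, length: Optional[int] = None,
               display: Optional[str] = None, download: Optional[str] = None) -> str:
    """Build a taxonomy portal URL following the syntax above."""
    if result not in PORTAL_RESULTS:
        raise ValueError(f"unsupported portal result: {result}")
    parts = [f"{VIEW_BASE}Taxon:{tax_id}", f"portal={result}"]
    if subtree:
        # Also include records of the taxon's subspecies or strains.
        parts.append("subtree=true")
    for name, value in (("offset", offset), ("length", length),
                        ("display", display), ("download", download)):
        if value is not None:
            parts.append(f"{name}={value}")
    return "&".join(parts)
```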

Using the data warehouse

While this documentation focuses on the full functionality offered by the RESTful interface to the ENA Advanced Search, it can also help users of the Query Builder web interface to this service, which offers reduced functionality (for example, it supports only Boolean 'AND' operations). We will provide more specific documentation for the Query Builder interface soon.

A search against ENA content using the ENA data warehouse requires the definition of a domain (a predetermined partition of ENA content) and one or more search conditions expressed as a query string. A domain comprises a number of results, which are finer-grained partitions of ENA content; display/download format and pagination options are available for queries made against these results. While a domain partitions content by its conceptual nature (e.g. raw sequence reads vs. annotated assembled sequences), a result also takes into account the structure of the underlying content. Because ENA uses diverse structures to manage different data, some format options are only available at the level of results.

The URL syntax for retrieving records from the ENA data warehouse is:

http://www.ebi.ac.uk/ena/data/warehouse/search?query=<query string>[&domain=<domain>]

or

http://www.ebi.ac.uk/ena/data/warehouse/search?query=<query string>&result=<result>[Pagination options][Display options][Download options]

By default, the whole of ENA is searched using the query string. If a domain or result is specified, only that subsection of ENA is searched. Please note that the Pagination options, Display options and Download options are only supported when the result parameter is specified.
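The rule that pagination, display and download options require a result can be enforced while building the URL. In this sketch (warehouse_url is a hypothetical name), the query is enclosed in double quotes and percent-encoded, leaving '(', ')', '=' and ',' literal; with that choice it reproduces the encoded Diptera example below exactly.

```python
from typing import Optional
from urllib.parse import quote

WAREHOUSE = "http://www.ebi.ac.uk/ena/data/warehouse/search"


def warehouse_url(query: str, domain: Optional[str] = None,
                  result: Optional[str] = None, **options: object) -> str:
    """Build a data warehouse search URL.

    options holds pagination/display/download parameters (e.g. offset,
    length, display, download); they are rejected unless result is given.
    """
    if options and result is None:
        raise ValueError("pagination, display and download options require result")
    # Enclose the query in double quotes, then percent-encode it.
    encoded = quote('"' + query + '"', safe="()=,")
    url = f"{WAREHOUSE}?query={encoded}"
    if domain is not None:
        url += f"&domain={domain}"
    if result is not None:
        url += f"&result={result}"
    for name, value in options.items():
        url += f"&{name}={value}"
    return url


print(warehouse_url('tax_tree(7147) AND dataclass="STD"', domain="coding"))
```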

Examples

Return coding sequences from the STD data class for all members of the order Diptera (Taxon ID 7147):

http://www.ebi.ac.uk/ena/data/warehouse/search?query=%22tax_tree(7147)%20AND%20dataclass=%22STD%22%22&domain=coding

Download a compressed flat file of sequences with locations in and around the Galapagos Islands:

http://www.ebi.ac.uk/ena/data/warehouse/search?query="geo_circ(-0.587,-90.5713,170)"&result=sequence_release&display=text&download=gzip

Find all paired-end RNA-seq (transcriptomic) data from Illumina HiSeq platforms:

http://www.ebi.ac.uk/ena/data/warehouse/search?query=%22(instrument_model=%22Illumina%20HiSeq%202000%22%20OR%20instrument_model=%22Illumina%20HiSeq%201000%22%20OR%20instrument_model=%22Illumina%20HiSeq%202500%22)%20AND%20library_layout=%22PAIRED%22%20AND%20library_source=%22TRANSCRIPTOMIC%22%22&domain=read

Show genome assemblies for the house mouse (Mus musculus, Taxon ID 10090):

http://www.ebi.ac.uk/ena/data/warehouse/search?query="tax_eq(10090)"&domain=assembly

Domains and results

The available domains and results are listed below (please note that some results may be associated with several domains):

Domain Result Description
assembly assembly Genome Assemblies in EMBL-Bank
sequence sequence_release Nucleotide Sequences (EMBL-Bank Release)
sequence sequence_update Nucleotide Sequences (EMBL-Bank Update)
coding sequence_coding Protein-coding sequences in EMBL-Bank
sample sample Samples in ENA
sample sequence_release Nucleotide Sequences (EMBL-Bank Release)
sample sequence_update Nucleotide Sequences (EMBL-Bank Update)
sample sequence_coding Protein-coding sequences in EMBL-Bank
study study Studies
analysis analysis Nucleotide sequence analyses in SRA
analysis analysis_study Nucleotide sequence analyses in SRA (grouped by study)
read read_run Raw reads in SRA
read read_experiment Raw reads in SRA (grouped by experiment)
read read_study Raw reads in SRA (grouped by study)
taxon taxon Taxonomic classification
trace read_trace Capillary Traces in Trace Archive

Query string

The query string is made up of filter conditions joined by the logical operators AND, OR and NOT, and enclosed in double quotes. Parentheses are also supported. For example, the following query string could be used:

query="<filter1> AND (<filter2> OR <filter3>) OR NOT <filter4>"

For ease of reading, query strings have not been URL encoded in the examples below.
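Compound query strings can be assembled from individual filters with a few helpers. This sketch (all_of, any_of and negate are hypothetical names) parenthesises every compound sub-expression so the result does not depend on operator precedence, which this document does not specify. It rebuilds the unencoded form of the HiSeq RNA-seq example above; the result still has to be URL-encoded before use.

```python
def _group(expr: str) -> str:
    """Parenthesise compound expressions so that precedence is explicit."""
    compound = " AND " in expr or " OR " in expr or expr.startswith("NOT ")
    return f"({expr})" if compound else expr


def all_of(*exprs: str) -> str:
    return " AND ".join(_group(e) for e in exprs)


def any_of(*exprs: str) -> str:
    return " OR ".join(_group(e) for e in exprs)


def negate(expr: str) -> str:
    return "NOT " + _group(expr)


hiseq = any_of(
    'instrument_model="Illumina HiSeq 2000"',
    'instrument_model="Illumina HiSeq 1000"',
    'instrument_model="Illumina HiSeq 2500"',
)
query = all_of(hiseq, 'library_layout="PAIRED"', 'library_source="TRANSCRIPTOMIC"')
print(query)
```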

Filter types

The following filter types are supported:

  • boolean filter
  • controlled vocabulary filter
  • date filter
  • number filter
  • text filter
  • geospatial filter
  • taxonomy filter 

Boolean filter

Operator =
Value yes, true, no, false
Example environmental_sample=true

Controlled vocabulary filter 

Operator =, !=
Value A text value from the controlled vocabulary enclosed in double quotes
Example library_source="GENOMIC"

Date filter

Operator =, !=, <, <=, >, >=
Value A date in the format DD-MM-YYYY or DD-MON-YYYY
Example first_public > 01-01-2012

Number filter

Operator =, !=, <, <=, >, >=
Value Any integer
Example base_count > 4000000

Text filter

Operator =, !=
Value Any text value enclosed in double quotes. Wildcard (*) can be used at the start and/or end of the text value.
Example library_name="*HUM*"

Geospatial filter

Function Description Parameters Example
geo_box1 All locations within a box defined by the lower left (SW) and upper right (NE) points. south-west latitude, south-west longitude, north-east latitude, north-east longitude geo_box1(-20, 10, 20, 50)
geo_box2 All locations within a box defined by a centre point and a radius in km. latitude, longitude, radius (km) geo_box2(35, 100, 300)
geo_circ All locations within a circle defined by a centre point and a radius in km. latitude, longitude, radius (km) geo_circ(35, 100, 300)
geo_lat All locations within a latitude range given by a latitude and a radius in km. latitude, radius (km) geo_lat(0, 100)
geo_north All locations north of a given latitude (inclusive). latitude geo_north(80)
geo_south All locations south of a given latitude (inclusive). latitude geo_south(-80)

Taxonomy filter

Function Description Parameters Example
tax_eq All records that match the given NCBI taxonomy identifier NCBI taxonomy identifier tax_eq(9606)
tax_tree All records that match the given NCBI taxonomy identifier or are descendants of it NCBI taxonomy identifier tax_tree(2759)
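The function-based filters can be generated with light input checking. In this sketch the helper names mirror the filter functions, but the latitude/longitude range checks and the positive-identifier check are assumptions of the sketch rather than documented service rules, and geo_box1 does not handle boxes that cross the antimeridian.

```python
def _check_lat(lat: float) -> None:
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")


def _check_lon(lon: float) -> None:
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")


def geo_circ(lat: float, lon: float, radius_km: float) -> str:
    """Locations within radius_km of the centre point (lat, lon)."""
    _check_lat(lat)
    _check_lon(lon)
    if radius_km <= 0:
        raise ValueError("radius must be positive")
    return f"geo_circ({lat},{lon},{radius_km})"


def geo_box1(sw_lat: float, sw_lon: float, ne_lat: float, ne_lon: float) -> str:
    """Locations within the box given by its south-west and north-east corners."""
    for lat in (sw_lat, ne_lat):
        _check_lat(lat)
    for lon in (sw_lon, ne_lon):
        _check_lon(lon)
    if sw_lat > ne_lat:
        raise ValueError("south-west latitude must not exceed north-east latitude")
    return f"geo_box1({sw_lat},{sw_lon},{ne_lat},{ne_lon})"


def tax_tree(tax_id: int) -> str:
    """Records for the NCBI taxon tax_id or any of its descendants."""
    if tax_id <= 0:
        raise ValueError("taxonomy identifiers are positive integers")
    return f"tax_tree({tax_id})"
```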

Filter conditions

The geospatial and taxonomy filters are function-based. All other filters use the following syntax:

<filter column> <operator> <value>
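The operator and value rules from the filter tables above can be checked while formatting a condition. This is a sketch with a hypothetical condition helper: it follows the generic syntax line, which shows spaces around the operator (both spaced and unspaced forms appear in this document's examples), and renders dates as DD-MM-YYYY.

```python
from datetime import date
from typing import Union

# Operators allowed per filter type, transcribed from the filter tables above.
OPERATORS = {
    "boolean": {"="},
    "controlled vocabulary": {"=", "!="},
    "date": {"=", "!=", "<", "<=", ">", ">="},
    "number": {"=", "!=", "<", "<=", ">", ">="},
    "text": {"=", "!="},
}


def condition(column: str, filter_type: str, operator: str,
              value: Union[bool, int, str, date]) -> str:
    """Format '<filter column> <operator> <value>' for a non-function filter."""
    allowed = OPERATORS.get(filter_type)
    if allowed is None:
        raise ValueError(f"unknown filter type: {filter_type}")
    if operator not in allowed:
        raise ValueError(f"operator {operator} is not valid for {filter_type} filters")
    if filter_type == "boolean":
        rendered = "true" if value else "false"
    elif filter_type == "date":
        if not isinstance(value, date):
            raise TypeError("date filters need a datetime.date value")
        rendered = value.strftime("%d-%m-%Y")  # DD-MM-YYYY
    elif filter_type == "number":
        rendered = str(int(value))
    else:
        # Text and controlled vocabulary values are enclosed in double quotes.
        rendered = f'"{value}"'
    return f"{column} {operator} {rendered}"
```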

Filter columns 

A full list of filter columns is available below:

Domain Result(s) Filter column Filter type Description

assembly, coding, sequence, trace assembly, coding, sequence, trace accession text
analysis analysis, analysis_sample, analysis_study analysis_accession text  
analysis analysis, analysis_sample, analysis_study analysis_title text brief sequence analysis description
analysis analysis, analysis_sample, analysis_study analysis_type controlled vocabulary type of sequence analysis
assembly assembly assembly_description text detailed genome assembly description
assembly assembly assembly_name text genome assembly name
assembly assembly assembly_title text brief genome assembly description
sequence, sequence_coding sequence_release, sequence_update, sequence_coding base_count number number of base pairs
sequence, sequence_coding sequence_release, sequence_update, sequence_coding bio_material text identifier for biological material including institute and collection code
read, analysis analysis, analysis_sample, analysis_study, read_run, read_experiment, read_sample, read_study center_name text  
sequence, sequence_coding sequence_release, sequence_update, sequence_coding cell_line text cell line from which the sample was obtained
sequence, sequence_coding sequence_release, sequence_update, sequence_coding cell_type text cell type from which the sample was obtained
sequence, sequence_coding sequence_release, sequence_update, sequence_coding collected_by text name of the person who collected the specimen
sequence, sequence_coding sequence_release, sequence_update, sequence_coding collection_date date date that the specimen was collected
sequence, sequence_coding sequence_release, sequence_update, sequence_coding country text locality of sample isolation: country names, oceans or seas, followed by regions and localities
sequence, sequence_coding sequence_release, sequence_update, sequence_coding cultivar text cultivar (cultivated variety) of plant from which sample was obtained
sequence, sequence_coding sequence_release, sequence_update, sequence_coding culture_collection text identifier for the sample culture including institute and collection code
sequence, sequence_coding  sequence_release, sequence_update, sequence_coding dataclass controlled vocabulary sequence data class
sequence, sequence_coding sequence_release, sequence_update, sequence_coding description text brief sequence description
sequence, sequence_coding sequence_release, sequence_update, sequence_coding dev_stage text sample obtained from an organism in a specific developmental stage
sequence, sequence_coding sequence_release, sequence_update, sequence_coding ecotype text a population within a given species displaying traits that reflect adaptation to a local habitat
sequence, sequence_coding sequence_release, sequence_update, sequence_coding environmental_sample boolean identifies sequences derived by direct molecular isolation from an environmental DNA sample
read read_run, read_experiment, read_sample, read_study experiment_accession text  
read read_run, read_experiment, read_sample, read_study experiment_title text  
sequence, sequence_coding, analysis, read sequence_release, sequence_update, sequence_coding, analysis, analysis_sample, analysis_study, read_run, read_experiment, read_sample, read_study first_public date date when made public
sequence, sequence_coding sequence_release, sequence_update, sequence_coding germline text the sample is an unrearranged molecule that was inherited from the parental germline
sequence, sequence_coding sequence_release, sequence_update, sequence_coding host text natural (as opposed to laboratory) host to the organism from which sample was obtained
sequence, sequence_coding sequence_release, sequence_update, sequence_coding identified_by text name of the taxonomist who identified the specimen
read read_run, read_experiment, read_sample, read_study instrument_model controlled vocabulary instrument model used in sequencing experiment
read read_run, read_experiment, read_sample, read_study instrument_platform controlled vocabulary instrument platform used in sequencing experiment
sequence, sequence_coding sequence_release, sequence_update, sequence_coding isolate text individual isolate from which sample was obtained
sequence, sequence_coding sequence_release, sequence_update, sequence_coding isolation_source text describes the physical, environmental and/or local geographical source of the sample
sequence, sequence_coding sequence_release, sequence_update, sequence_coding keywords text keywords associated with the sequence
sequence, sequence_coding sequence_release, sequence_update, sequence_coding lab_host text scientific name of the laboratory host used to propagate the source organism for the sample
sequence, sequence_coding sequence_release, sequence_update, sequence_coding last_updated date date when last updated
read read_run, read_experiment, read_sample, read_study library_layout controlled vocabulary sequencing library layout
read read_run, read_experiment, read_sample, read_study library_name text sequencing library name
read read_run, read_experiment, read_sample, read_study library_selection controlled vocabulary  
read read_run, read_experiment, read_sample, read_study library_source controlled vocabulary  
read read_run, read_experiment, read_sample, read_study library_strategy controlled vocabulary  
sequence, sequence_coding sequence_release, sequence_update, sequence_coding mating_type text  
sequence, sequence_coding sequence_release, sequence_update, sequence_coding mol_type text in vivo molecule type of the sequence
sequence, sequence_coding sequence_release, sequence_update, sequence_coding  organelle text membrane-bound intracellular structure from which the sequence was obtained 
study study project_description text detailed project description
study  study project_name text sequencing project name
study study project_title text brief sequencing project description
read  read_run, read_experiment, read_sample, read_study  read_count number number of reads
read read_run, read_experiment, read_sample, read_study run_accession text
read, analysis read_run, read_experiment, read_sample, read_study, analysis, analysis_sample, analysis_study sample_accession text
read, analysis read_run, read_experiment, read_sample, read_study, analysis, analysis_sample, analysis_study sample_title text brief sample description
sequence, sequence_coding sequence_release, sequence_update, sequence_coding serotype text  serological variety of a species characterized by its antigenic properties
sequence, sequence_coding sequence_release, sequence_update, sequence_coding serovar text  serological variety of a species (usually a prokaryote) characterized by its antigenic properties
sequence, sequence_coding sequence_release, sequence_update, sequence_coding sex text  sex of the organism from which the sample was obtained
sequence, sequence_coding sequence_release, sequence_update, sequence_coding specimen_voucher text identifier for the specimen from which the sample was obtained
sequence, sequence_coding sequence_release, sequence_update, sequence_coding strain text strain from which sample was obtained
assembly,read,analysis,study assembly,read_run, read_experiment, read_sample, read_study, analysis, analysis_sample, analysis_study,study study_accession text  
read, analysis read_run, read_experiment, read_sample, read_study, analysis, analysis_sample, analysis_study study_title text brief sequencing study description
sequence, sequence_coding sequence_release, sequence_update, sequence_coding  sub_species text  name of sub-species of organism from which sample was obtained 
sequence, sequence_coding sequence_release, sequence_update, sequence_coding  sub_strain text  name or identifier of a genetically or otherwise modified strain from which sample was obtained
sequence, sequence_coding sequence_release, sequence_update, sequence_coding  tax_division text  taxonomic division
sequence, sequence_coding sequence_release, sequence_update, sequence_coding  tissue_lib text  tissue library from which sample was obtained
sequence, sequence_coding sequence_release, sequence_update, sequence_coding  tissue_type text  tissue type from which the sample was obtained 
sequence, sequence_coding sequence_release, sequence_update, sequence_coding  topology controlled vocabulary  sequence topology: circular or linear
sequence, sequence_coding sequence_release, sequence_update, sequence_coding  variety text  variety (varietas, a formal Linnaean rank) of organism from which sample was derived

Retrieve tabulated data from the data warehouse

In addition to the formats listed above, a tab-separated report can be returned for each result (this report cannot be returned when searching by domain rather than by result). The URL format for retrieving these reports is:

http://www.ebi.ac.uk/ena/data/warehouse/search?query=<query string>&result=<result>&fields=<fields>&display=report[&sortfields=<sortfields>][&download=txt][Pagination options]

Each result has a default accession column, which is always returned as the first column of the report whether or not it is listed in fields. The report is sorted by this accession column unless sortfields is provided, in which case it is sorted by the listed columns in order. The additional fields that can be returned in these reports are listed below. Many of these are the same as the searchable fields listed above; however, there are generally more returnable than searchable fields for a given result. Taxonomic information can be returned via tax_id or scientific_name, and geospatial information is returned using location.
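A report request and a parser for the tab-separated response might look like the sketch below. The helper names are hypothetical; the sketch assumes that multiple fields and sortfields are comma-separated and that the first line of the report holds the column names, neither of which is stated explicitly above.

```python
import csv
import io
from typing import Dict, List, Optional, Sequence
from urllib.parse import quote

WAREHOUSE = "http://www.ebi.ac.uk/ena/data/warehouse/search"


def report_url(query: str, result: str, fields: Sequence[str],
               sortfields: Optional[Sequence[str]] = None,
               download: bool = False) -> str:
    """Build a display=report URL (field lists assumed comma-separated)."""
    encoded = quote('"' + query + '"', safe="()=,")
    url = (f"{WAREHOUSE}?query={encoded}&result={result}"
           f"&fields={','.join(fields)}&display=report")
    if sortfields:
        url += "&sortfields=" + ",".join(sortfields)
    if download:
        url += "&download=txt"
    return url


def parse_report(tsv_text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated report, assuming the first line holds column names."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]
```

Disabling CSV quoting treats double quotes in field values as ordinary characters, which suits plain tab-separated text.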

Result Returnable fields
analysis analysis_accession, analysis_alias, analysis_title, analysis_type, center_name, first_public, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id
analysis_study analysis_accession, analysis_alias, analysis_title, analysis_type, center_name, first_public, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id
assembly accession, assembly_description, assembly_name, assembly_title, scientific_name, strain, study_accession, tax_id 
read_experiment base_count, center_name, experiment_accession, experiment_alias, experiment_title, fastq_aspera, fastq_bytes, fastq_ftp, fastq_galaxy, fastq_md5, first_public, instrument_model, instrument_platform, library_layout, library_name, library_selection, library_source, library_strategy, project_accession, read_count, run_accession, run_alias, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submission_accession, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id 
read_run base_count, center_name, experiment_accession, experiment_alias, experiment_title, fastq_aspera, fastq_bytes, fastq_ftp, fastq_galaxy, fastq_md5, first_public, instrument_model, instrument_platform, library_layout, library_name, library_selection, library_source, library_strategy, project_accession, read_count, run_accession, run_alias, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submission_accession, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id 
read_study base_count, center_name, experiment_accession, experiment_alias, experiment_title, fastq_aspera, fastq_bytes, fastq_ftp, fastq_galaxy, fastq_md5, first_public, instrument_model, instrument_platform, library_layout, library_name, library_selection, library_source, library_strategy, project_accession, read_count, run_accession, run_alias, sample_accession_list, scientific_name, study_accession, study_alias, study_title, submission_accession, submitted_aspera, submitted_bytes, submitted_ftp, submitted_galaxy, submitted_md5, tax_id 
read_trace accession, scientific_name, tax_id, trace_name 
sample accession, bio_material, cell_line, cell_type, collected_by, collection_date, country, cultivar, culture_collection, description, dev_stage, ecotype, environmental_sample, first_public, germline, host, host_status, identified_by, isolate, isolation_source, location, mating_type, sample_alias, scientific_name, serotype, serovar, sex, specimen_voucher, strain, sub_species, sub_strain, tax_id, tissue_lib, tissue_type, variety 
sample_coding accession, allele, artificial_location, base_count, bio_material, cell_line, cell_type, codon_start, collected_by, collection_date, country, cultivar, culture_collection, dataclass, db_xref, description, dev_stage, ec_number, ecotype, environmental_sample, exception, experiment, first_public, function, gene, gene_synonym, germline, host, identified_by, inference, isolate, isolation_source, keywords, lab_host, last_updated, location, locus_tag, map, mating_type, mol_type, note, old_locus_tag, operon, organelle, partial, product, protein_id, pseudo, pseudo_gene, ribosomal_slippage, scientific_name, serotype, serovar, sex, specimen_voucher, standard_name, strain, sub_species, sub_strain, tax_division, tax_id, tissue_lib, tissue_type, topology, trans_splicing, transl_except, transl_table, variety 
sequence_release accession, base_count, bio_material, cell_line, cell_type, collected_by, collection_date, country, cultivar, culture_collection, dataclass, description, dev_stage, ecotype, environmental_sample, first_public, germline, host, identified_by, isolate, isolation_source, keywords, lab_host, last_updated, location, mating_type, mol_type, organelle, scientific_name, serotype, serovar, sex, specimen_voucher, strain, sub_species, sub_strain, tax_division, tax_id, tissue_lib, tissue_type, topology, variety
sequence_update accession, base_count, bio_material, cell_line, cell_type, collected_by, collection_date, country, cultivar, culture_collection, dataclass, description, dev_stage, ecotype, environmental_sample, first_public, germline, host, identified_by, isolate, isolation_source, keywords, lab_host, last_updated, location, mating_type, mol_type, organelle, scientific_name, serotype, serovar, sex, specimen_voucher, strain, sub_species, sub_strain, tax_division, tax_id, tissue_lib, tissue_type, topology, variety
study breed, cultivar, isolate, scientific_name, strain, study_accession, study_description, study_name, study_title, tax_id 
taxon scientific_name, tax_id, taxon_accession, taxon_title 

Using the marker portal

The marker portal is a special section of the data warehouse with its own domain id, "marker". There are currently 12 coding markers that have been defined and linked to the CDS data table:

Marker symbol Marker description
COX1 Cytochrome C oxidase subunit I
CYTB Cytochrome B
EEF1A Elongation factor 1-alpha 
EEF2K Eukaryotic elongation factor-2 kinase 
MATK Maturase K 
NAD1 NADH dehydrogenase subunit 1 
NAD2 NADH dehydrogenase subunit 2 
NAD4 NADH dehydrogenase subunit 4 
NAD5 NADH dehydrogenase subunit 5 
RAG2 Recombination activating gene 2 
RBCL Ribulose bisphosphate carboxylase
RHOD Rhodopsin 

The marker domain used by this portal allows searching for a single marker (using one of the marker symbols listed above), combined with any taxonomy filter to narrow the results to the taxa of interest. No other fields are currently searchable in this domain.

Downloading or displaying reports, sequences and taxonomic coverage information all require the result to be set. For coding markers, this is always sequence_coding. The examples below show how to perform these queries.

Examples

Download all CDS and parent accessions for mammalian COX1 sequences

http://www.ebi.ac.uk/ena/data/warehouse/search?query="marker="COX1"%20AND%20tax_tree(40674)"&domain=marker&result=sequence_coding&fields=parent_accession&display=report

Display FASTA sequences for all mammalian COX1 sequences

http://www.ebi.ac.uk/ena/data/warehouse/search?query="marker="COX1"%20AND%20tax_tree(40674)"&domain=marker&result=sequence_coding&display=fasta

Display the mammalian taxonomic coverage (with sequence counts) for COX1

http://www.ebi.ac.uk/ena/data/warehouse/search?query="marker="COX1"%20AND%20tax_tree(40674)"&domain=marker&result=sequence_coding&display=marker

Note: to download the taxonomic coverage above, append &download=marker to the end of the URL.
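The three marker examples above differ only in their display and optional fields/download parameters, so they can share one builder. marker_url is a hypothetical helper; it fixes result=sequence_coding as stated above and percent-encodes the nested double quotes, which the examples leave unencoded.

```python
from typing import Optional, Sequence
from urllib.parse import quote

WAREHOUSE = "http://www.ebi.ac.uk/ena/data/warehouse/search"

# Coding markers listed in the table above.
MARKERS = {"COX1", "CYTB", "EEF1A", "EEF2K", "MATK", "NAD1",
           "NAD2", "NAD4", "NAD5", "RAG2", "RBCL", "RHOD"}


def marker_url(marker: str, taxonomy_filter: str, display: str,
               fields: Optional[Sequence[str]] = None,
               download: Optional[str] = None) -> str:
    """Build a marker portal URL; result is always sequence_coding."""
    if marker not in MARKERS:
        raise ValueError(f"unknown marker symbol: {marker}")
    query = f'"marker="{marker}" AND {taxonomy_filter}"'
    url = (f"{WAREHOUSE}?query={quote(query, safe='()=')}"
           "&domain=marker&result=sequence_coding")
    if fields:
        url += "&fields=" + ",".join(fields)
    url += f"&display={display}"
    if download:
        url += f"&download={download}"
    return url


# Mammalian (tax_tree(40674)) COX1 sequences in FASTA format.
print(marker_url("COX1", "tax_tree(40674)", "fasta"))
```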

Retrieve SRA metadata in XML format

SRA metadata XMLs can be retrieved using the display=xml parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/ERA000092&display=xml will return the SRA Submission ERA000092 object in XML format.

Retrieve EMBL-Bank sequences in fasta format

EMBL-Bank sequences can be returned in fasta format using the display=fasta parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/A00145&display=fasta will return the EMBL-Bank sequence A00145 in fasta format.
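Once fasta output has been retrieved, it can be split into records locally. This is a generic FASTA reader sketch (parse_fasta is a hypothetical name, not an ENA tool): it joins wrapped sequence lines and ignores blank lines and any text before the first header.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```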

Retrieve EMBL-Bank subsequences in fasta format

The range parameter can be used in combination with the display=fasta option to return EMBL-Bank subsequences in fasta format. For example, retrieval of a subsequence from EMBL-Bank entry A00145 from bases 3 to 63 is done using the URL: http://www.ebi.ac.uk/ena/data/view/A00145&display=fasta&range=3-63.

Retrieve EMBL-Bank subsequences in HTML format

The range parameter can be used in combination with the display=html [default] option to return EMBL-Bank subsequences in HTML format. For example, retrieval of a subsequence from EMBL-Bank entry A00145 from bases 3 to 63 is done using the URL: http://www.ebi.ac.uk/ena/data/view/A00145&range=3-63.
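The two subsequence URLs above differ only in whether a display parameter is added. The sketch below (hypothetical helper names) assumes the range is 1-based and inclusive, which "bases 3 to 63" suggests but does not state; local_subsequence shows the equivalent slice on a sequence already held in memory.

```python
VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"


def subsequence_url(accession: str, start: int, end: int, display: str = "fasta") -> str:
    """Build a URL for bases start..end of an EMBL-Bank entry."""
    if not 1 <= start <= end:
        raise ValueError("range must satisfy 1 <= start <= end")
    url = VIEW_BASE + accession
    if display != "html":  # HTML is the default, so no display parameter is needed
        url += f"&display={display}"
    return url + f"&range={start}-{end}"


def local_subsequence(sequence: str, start: int, end: int) -> str:
    """Cut the same bases from a sequence string (1-based, inclusive)."""
    return sequence[start - 1:end]
```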

Retrieve EMBL-Bank records in flat file format

EMBL-Bank records can be retrieved in flat file format using the display=text parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/A00145&display=text will return the EMBL-Bank entry A00145 in flat file format.

Retrieve EMBL-Bank records in XML format

EMBL-Bank records can be retrieved in XML format using the display=xml parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/A00145&display=xml will return the EMBL-Bank entry A00145 in XML format.

Retrieve EMBL-Bank expanded CON records

To retrieve EMBL-Bank expanded CON records please use the expanded=true parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/AL513382&display=text&expanded=true will return the expanded CON entry AL513382 in flat file format.

Expanded CON records differ from CON records in two ways. First, expanded CON records contain the full sequence in addition to the contig assembly instructions. Second, if a CON record contains only source or gap features, the expanded CON record will also display all features from the segment records.

Retrieve EMBL-Bank header in flat file format

To retrieve an EMBL-Bank record header in flat file format, please use the header=true parameter, e.g.:

http://www.ebi.ac.uk/ena/data/view/BN000065&display=text&header=true

Retrieve EMBL-Bank header in XML format

To retrieve an EMBL-Bank record header in XML format, please use the header=true parameter, e.g.:

http://www.ebi.ac.uk/ena/data/view/BN000065&display=xml&header=true

Retrieve EMBL-Bank records using sequence versions

Specific EMBL-Bank sequence versions can be retrieved by appending the sequence version to the identifier:

<identifier>.<sequence version>

In the HTML view (display=html), if the requested sequence version is lower than the current sequence version, a warning is shown with a link to the historical sequence versions of the record, e.g. http://www.ebi.ac.uk/ena/data/view/AM407889.1. If the requested sequence version is higher than the current sequence version, a warning is also shown, e.g. http://www.ebi.ac.uk/ena/data/view/AM407889.12. Note that in the HTML view, links to historical versions of EMBL-Bank records are always provided in the Navigation panel.

In the XML view (display=xml), the latest sequence version is always displayed, but an equivalent warning can be constructed because both the requested and the returned sequence versions are presented, e.g. http://www.ebi.ac.uk/ena/data/view/AM407889.1&display=xml.

In fasta and text view (display=fasta and display=text), the requested sequence version is returned directly, e.g.: http://www.ebi.ac.uk/ena/data/view/AM407889.1&display=fasta and http://www.ebi.ac.uk/ena/data/view/AM407889.1&display=text.
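Client code often needs to split or build the <identifier>.<sequence version> form. In this sketch the accession pattern (letters, digits and underscores) is a simplifying assumption, and split_version and versioned_url are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"
# Simplified pattern: an accession followed by '.' and a numeric version.
_VERSIONED = re.compile(r"^(?P<acc>[A-Za-z0-9_]+)\.(?P<version>\d+)$")


def split_version(identifier: str) -> Tuple[str, Optional[int]]:
    """Split '<identifier>.<sequence version>' (version is None if absent)."""
    match = _VERSIONED.match(identifier)
    if match is None:
        return identifier, None
    return match.group("acc"), int(match.group("version"))


def versioned_url(accession: str, version: int, display: str = "html") -> str:
    """Build a URL for a specific sequence version of an EMBL-Bank record."""
    url = f"{VIEW_BASE}{accession}.{version}"
    return url if display == "html" else f"{url}&display={display}"
```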

Retrieve EMBL-Bank graphical image

A graphical view is available for EMBL-Bank records that have feature/assembly annotation or a sequence. These views contain two components, feature/assembly and sequence, for which ranges to be viewed are defined separately using the featureRange and sequenceRange parameters.

For example, the URL http://www.ebi.ac.uk/ena/data/view/graphics/BN000065&featureRange=10-300000&sequenceRange=103-166 returns an image created from EMBL-Bank entry BN000065 showing feature and assembly annotation from bases 10-300000 and sequence from bases 103-166.
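Because the feature/assembly and sequence components take independent ranges, a builder can accept each as an optional (start, end) pair. graphics_url is a hypothetical helper, and the start <= end check is an assumption of this sketch.

```python
from typing import Optional, Tuple

GRAPHICS_BASE = "http://www.ebi.ac.uk/ena/data/view/graphics/"


def _range(name: str, bounds: Tuple[int, int]) -> str:
    start, end = bounds
    if not 1 <= start <= end:
        raise ValueError(f"{name} must satisfy 1 <= start <= end")
    return f"&{name}={start}-{end}"


def graphics_url(accession: str,
                 feature_range: Optional[Tuple[int, int]] = None,
                 sequence_range: Optional[Tuple[int, int]] = None) -> str:
    """Build a graphical view URL; each component's range is set independently."""
    url = GRAPHICS_BASE + accession
    if feature_range is not None:
        url += _range("featureRange", feature_range)
    if sequence_range is not None:
        url += _range("sequenceRange", sequence_range)
    return url


print(graphics_url("BN000065", (10, 300000), (103, 166)))
```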

Retrieve Taxon records in XML format

Taxon records can be retrieved in XML format using the display=xml parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/Taxon:2759&display=xml returns the taxonomy record for Eukaryota.

Retrieve Taxon records in Darwin Core XML format

Taxon records can be retrieved in Darwin Core XML format using the display=dwc parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/Taxon:2759&display=dwc returns the taxonomy record for Eukaryota in Darwin Core XML.

Retrieve Trace sequences in fasta format

Trace sequences can be returned in fasta format using the display=fasta parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/TI1&display=fasta will return the Trace sequence TI1 in fasta format.

Retrieve Trace sequences in fastq format

Trace sequences can be returned in fastq format using the display=fastq parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/TI1&display=fastq will return the Trace sequence TI1 in fastq format.
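fastq output can be parsed into names, bases and quality scores locally. This is a generic sketch (parse_fastq is a hypothetical name) assuming unwrapped four-line records and Phred+33 quality encoding; this document does not state which quality encoding the Trace fastq output uses.

```python
from typing import List, Tuple


def parse_fastq(text: str) -> List[Tuple[str, str, List[int]]]:
    """Parse four-line FASTQ records into (name, sequence, Phred scores)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        scores = [ord(ch) - 33 for ch in quality]  # Phred+33 offset (assumed)
        records.append((header[1:], sequence, scores))
    return records
```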

Retrieve Trace metadata in XML format

Trace metadata XMLs can be retrieved using the display=xml parameter. For example, the URL http://www.ebi.ac.uk/ena/data/view/TI1&display=xml will return the metadata for Trace TI1 in XML format.