REST URLs to search ENA data

This page describes how to use free text and advanced search programmatically to search data in ENA.

This documentation covers the full functionality of the RESTful interface to the ENA Advanced Search. It may also help users of the Query Builder web interface to this service, which offers reduced functionality (for example, it supports only Boolean 'AND' operations). We will provide more specific documentation for the Query Builder interface soon.


In the ENA browser, searches are performed and/or displayed by domain, but programmatic access is only available when a result is declared. A domain is a partition of ENA content based on the conceptual nature of the content (e.g. raw sequence reads vs. annotated assembled sequences). Each domain comprises a number of results, which are finer partitions that also take into account the structure of the underlying content. Queries based on these more granular results can use the display/download format and pagination options. Because ENA uses diverse structures to manage different data, some format options are only available at the level of results.

Free text search

The URL syntax for retrieving records from ENA via free text search is:

http://www.ebi.ac.uk/ena/data/search?query=<query string>&result=<result>[Pagination options][Display options][Download options]

The query string is made up of terms joined with "+".  For example, to search for human kinase sequences, the search query would be "kinase+homo+sapiens".  To fetch these sequences in FASTA format, the following URL could be used:
http://www.ebi.ac.uk/ena/data/search?query=kinase+homo+sapiens&result=sequence_release&display=fasta
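
The URL above can also be assembled in code. The following is a minimal Python sketch, not an official client: the function name free_text_url is made up for illustration, and the sketch only builds the URL string without contacting the service.

```python
from urllib.parse import quote

BASE_URL = "http://www.ebi.ac.uk/ena/data/search"

def free_text_url(terms, result, display=None):
    """Build a free text search URL; the terms are joined with '+'."""
    # Percent-encode each term on its own so the '+' separators stay literal.
    query = "+".join(quote(term, safe="") for term in terms)
    url = f"{BASE_URL}?query={query}&result={result}"
    if display is not None:
        url += f"&display={display}"
    return url

print(free_text_url(["kinase", "homo", "sapiens"], "sequence_release", display="fasta"))
```

With these arguments the sketch reproduces the FASTA example URL shown above.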

By default, the first 100,000 records are returned. If you wish to download more than this, you will need to use the pagination options. To determine how many results are available for your search, add the resultcount parameter to your query:
http://www.ebi.ac.uk/ena/data/search?query=<query string>&result=<result>&resultcount

Using the data warehouse

The URL syntax for retrieving records from the ENA data warehouse programmatically is:

http://www.ebi.ac.uk/ena/data/warehouse/search?query=<query string>&result=<result>[Pagination options][Display options][Download options]

By default, the first 100,000 records are returned.  If you wish to download more than this, you will need to use the pagination options.  To determine how many results are available for your search, add the resultcount parameter to your query:
http://www.ebi.ac.uk/ena/data/warehouse/search?query=<query string>&result=<result>&resultcount
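
As a rough illustration, a count URL of this form could be built as follows. This is a sketch under assumptions: the helper name count_url is hypothetical, the query is wrapped in double quotes and percent-encoded in the same way as the encoded examples on this page, and nothing is assumed about the format of the response.

```python
from urllib.parse import quote

WAREHOUSE_URL = "http://www.ebi.ac.uk/ena/data/warehouse/search"

def count_url(query, result):
    """Build a warehouse search URL that ends with the resultcount parameter."""
    # Wrap the query in double quotes, then percent-encode it.
    # '(', ')' and '=' are left literal, as in the encoded examples on this page.
    encoded = quote('"' + query + '"', safe="()=")
    return f"{WAREHOUSE_URL}?query={encoded}&result={result}&resultcount"

print(count_url("tax_eq(10090)", "assembly"))
```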

Examples

Return coding sequences, in FASTA format, from the STD dataclass for all members of the order Diptera (Taxon ID 7147):
http://www.ebi.ac.uk/ena/data/warehouse/search?query=%22tax_tree(7147)%20AND%20dataclass=%22STD%22%22&result=coding_release&display=fasta
http://www.ebi.ac.uk/ena/data/warehouse/search?query=%22tax_tree(7147)%20AND%20dataclass=%22STD%22%22&result=coding_update&display=fasta
Note that both the coding_release and coding_update results are required to get all coding sequences.

Download a compressed flat file representing sequences from in and around the Galapagos Islands:
http://www.ebi.ac.uk/ena/data/warehouse/search?query="geo_circ(-0.587,-90.5713,170)"&result=sequence_release&display=text&download=gzip

Download all paired RNA-seq reads from HiSeq platforms in XML format:
http://www.ebi.ac.uk/ena/data/warehouse/search?query=%22(instrument_model=%22Illumina%20HiSeq%202000%22%20OR%20instrument_model=%22Illumina%20HiSeq%201000%22%20OR%20instrument_model=%22Illumina%20HiSeq%202500%22)%20AND%20library_layout=%22PAIRED%22%20AND%20library_source=%22TRANSCRIPTOMIC%22%22&result=read_run&display=xml&download=xml

Retrieve genome assemblies for the house mouse (Mus musculus, Taxon ID 10090):
http://www.ebi.ac.uk/ena/data/warehouse/search?query="tax_eq(10090)"&result=assembly&display=xml

Domains and results

The available domains and results are listed here.

Query string

The query string is made up of filtering conditions, joined by logical ANDs, ORs and NOTs and bound by double quotes. The use of parentheses is also supported. For example, the following query string could be used: query="<filter1> AND (<filter2> OR <filter3>) OR NOT <filter4>"

For ease of reading, query strings have not been URL encoded in the examples below.
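
Because a real request needs the query string to be URL encoded, the sketch below shows one way to compose filters and encode the result in Python. The helper names any_of, all_of and encode_query are made up for illustration. Leaving the parentheses and equals signs unencoded is a choice made here to mirror the encoded example URLs earlier on this page, not a documented requirement.

```python
from urllib.parse import quote

def any_of(*filters):
    """Join filters with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(filters) + ")"

def all_of(*filters):
    """Join filters with AND."""
    return " AND ".join(filters)

def encode_query(expression):
    """Wrap the expression in double quotes, then percent-encode it."""
    # '(', ')' and '=' stay literal; spaces and double quotes are encoded.
    return quote('"' + expression + '"', safe="()=")

models = ["Illumina HiSeq 2000", "Illumina HiSeq 1000", "Illumina HiSeq 2500"]
expression = all_of(
    any_of(*(f'instrument_model="{m}"' for m in models)),
    'library_layout="PAIRED"',
    'library_source="TRANSCRIPTOMIC"',
)
print(expression)                # readable form
print(encode_query(expression))  # form to place after "query=" in the URL
```

The encoded output corresponds to the query part of the paired RNA-seq example above.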

Filter types

The following filter types are supported:

  • boolean filter
  • controlled vocabulary filter
  • date filter
  • number filter
  • text filter
  • geospatial filter
  • taxonomy filter

Boolean filter
  Operator: =
  Value: yes, true, no, false
  Example: environmental_sample=true

Controlled vocabulary filter
  Operator: =, !=
  Value: A text value from the controlled vocabulary enclosed in double quotes
  Example: library_source="GENOMIC"

Date filter
  Operator: =, !=, <, <=, >, >=
  Value: A date in the format YYYY-MM-DD
  Example: first_public > 2012-01-01

Number filter
  Operator: =, !=, <, <=, >, >=
  Value: Any integer
  Example: base_count > 4000000

Text filter
  Operator: =, !=
  Value: Any text value enclosed in double quotes. Wildcard (*) can be used at the start and/or end of the text value.
  Example: library_name="*HUM*"

Geospatial filter

  geo_box1
    Description: All locations within a box defined by the lower left (SW) and upper right (NE) points.
    Parameters: south-west latitude, south-west longitude, north-east latitude, north-east longitude
    Example: geo_box1(-20, 10, 20, 50)

  geo_box2
    Description: All locations within a box defined by a centre point and a radius in km.
    Parameters: latitude, longitude, radius (km)
    Example: geo_box2(35, 100, 300)

  geo_circ
    Description: All locations within a circle defined by a centre point and a radius in km.
    Parameters: latitude, longitude, radius (km)
    Example: geo_circ(35, 100, 300)

  geo_lat
    Description: All locations within a latitude range given by a latitude and a radius in km.
    Parameters: latitude, radius (km)
    Example: geo_lat(0, 100)

  geo_north
    Description: All locations north of a given latitude (inclusive).
    Parameters: latitude
    Example: geo_north(80)

  geo_south
    Description: All locations south of a given latitude (inclusive).
    Parameters: latitude
    Example: geo_south(-80)

  geo_point
    Description: An exact lat/lon position.
    Parameters: latitude, longitude
    Example: geo_point(9.12,-79.7)

Taxonomy filter

  tax_eq
    Description: All records that match the given NCBI taxonomy identifier.
    Parameters: NCBI taxonomy identifier
    Example: tax_eq(9606)

  tax_tree
    Description: All records that match the given NCBI taxonomy identifier or are descendants of it.
    Parameters: NCBI taxonomy identifier
    Example: tax_tree(2759)

  tax_name
    Description: All records that match the given NCBI scientific name.
    Parameters: NCBI scientific name
    Example: tax_name("Homo%20sapiens")

Filter conditions

The geospatial and taxonomy filters are function based. All other filters use the following syntax: <filter column> <operator> <value>
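
The two kinds of filter condition can be sketched in Python as follows. The helper names format_value, condition and function_filter are hypothetical. The sketch writes conditions without spaces around the operator, as in the example URLs on this page, and it does not check that a column or operator is valid for a given result.

```python
from datetime import date

def format_value(value):
    """Render a Python value in the form the filter types on this page describe."""
    if isinstance(value, bool):           # check first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, date):
        return value.isoformat()          # YYYY-MM-DD
    if isinstance(value, (int, float)):
        return str(value)
    return f'"{value}"'                   # text and controlled vocabulary values

def condition(column, operator, value):
    """Column-based filter: <filter column> <operator> <value>."""
    return f"{column}{operator}{format_value(value)}"

def function_filter(name, *args):
    """Function-based filter, e.g. geo_circ(...) or tax_tree(...)."""
    return f"{name}({','.join(format_value(arg) for arg in args)})"

print(condition("environmental_sample", "=", True))
print(condition("first_public", ">", date(2012, 1, 1)))
print(condition("library_source", "=", "GENOMIC"))
print(function_filter("geo_circ", -0.587, -90.5713, 170))
print(function_filter("tax_name", "Homo sapiens"))
```

The resulting strings are not URL encoded; they would still need to be joined into a query string, wrapped in double quotes and encoded before use in a URL.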

Filter columns

A full list of filter columns is available here.

Retrieve tabulated data from the data warehouse

In addition to the formats listed above, a tab-separated report of data can be returned for each result. This report is not available when searching by domain rather than result. The URL format for retrieving these reports is:
http://www.ebi.ac.uk/ena/data/warehouse/search?query=<query string>&result=<result>&fields=<fields>&display=report[&sortfields=<sortfields>][&download=txt][Pagination options]

Each result has a default accession column. This is returned as the first column of the report regardless of whether or not it was listed in the fields to be retrieved. The report is sorted by this accession column unless sortfields is provided, in which case it is sorted in the order of the listed columns. The additional fields that can be returned in these reports are listed here. Many of these are the same as the searchable fields listed above, but there are generally more returnable than searchable fields for a given result. Taxonomic information can be returned via tax_id or scientific_name, and geospatial information is returned using location.
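
A report URL could be built, and the returned text parsed, along the following lines. This is a sketch under several assumptions that the text above does not confirm: multiple fields are passed as a comma-separated list, the report begins with a header row of column names, and the helper names report_url and parse_report are made up. The sample text in the usage lines is a placeholder, not real ENA output.

```python
import csv
import io
from urllib.parse import quote

WAREHOUSE_URL = "http://www.ebi.ac.uk/ena/data/warehouse/search"

def report_url(query, result, fields, sortfields=None, download=False):
    """Build a tabulated report URL (display=report)."""
    encoded = quote('"' + query + '"', safe="()=")
    # Assumption: fields and sortfields are comma-separated lists.
    url = (f"{WAREHOUSE_URL}?query={encoded}&result={result}"
           f"&fields={','.join(fields)}&display=report")
    if sortfields:
        url += f"&sortfields={','.join(sortfields)}"
    if download:
        url += "&download=txt"
    return url

def parse_report(text):
    """Parse tab-separated report text, assuming the first line is a header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

print(report_url("tax_eq(10090)", "read_run", ["tax_id", "scientific_name"]))

# Placeholder text standing in for a downloaded report (not real data).
sample = "accession\ttax_id\tscientific_name\nEXAMPLE1\t10090\tMus musculus\n"
for row in parse_report(sample):
    print(row["accession"], row["tax_id"], row["scientific_name"])
```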

Note: as of 17th June 2014, the format of the date in the tabulated report changed to ISO format. We support single dates (YYYY-MM-DD) and date ranges (YYYY-MM-DD/YYYY-MM-DD).
