Accessing ENA data programmatically

Programmatic access using ENA browser REST URLs

ENA browser REST URLs give access to a wide range of data in several formats. One or more identifiers (including ranges) can be supplied to retrieve up to 100,000 records at a time, either gzip-compressed or uncompressed. It is also possible to request specific taxonomy records, or to retrieve archived versions of the data.

Examples

Here are some examples of what can be achieved through ENA browser REST URLs:

  • Retrieve EMBL-Bank sequences in FASTA format;

  • Retrieve EMBL-Bank records in XML or flat file formats;

  • Retrieve EMBL-Bank records using sequence versions;

  • Retrieve EMBL-Bank graphical images;

  • Retrieve Taxon records in XML or Darwin Core XML formats;

  • Retrieve a list of SRA submitted files or FASTQ files;

  • Retrieve SRA metadata in XML format;

  • Retrieve Trace sequences in FASTA or FASTQ formats;

  • Retrieve Trace metadata in XML format.
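
To make the idea of a REST URL concrete, here is a minimal Python sketch that composes a retrieval URL from one or more accessions, a display format and an optional compression flag. The base URL and the parameter names (display, download) are assumptions modelled on the legacy ENA browser URL pattern and do not come from this text; the function name build_ena_view_url is hypothetical. Check the exact syntax against the ENA help pages before relying on it.

```python
from urllib.parse import quote

# ASSUMPTION: base of the legacy ENA browser "view" endpoint; verify in the ENA help pages.
ENA_VIEW_BASE = "https://www.ebi.ac.uk/ena/data/view"


def build_ena_view_url(accessions, display="fasta", gzip=False):
    """Compose a retrieval URL for one or more accessions or accession ranges.

    `display` and the `download=gzip` flag are assumed parameter names.
    """
    if isinstance(accessions, str):
        accessions = [accessions]
    if not accessions:
        raise ValueError("at least one accession is required")
    # Keep '-', '.' and ':' unescaped so ranges, versions and prefixes stay readable.
    ids = ",".join(quote(acc, safe="-.:") for acc in accessions)
    url = f"{ENA_VIEW_BASE}/{ids}&display={quote(display)}"
    if gzip:
        url += "&download=gzip"
    return url


if __name__ == "__main__":
    # A single record in FASTA format.
    print(build_ena_view_url("A00145"))
    # Several identifiers, including a range, as compressed XML.
    print(build_ena_view_url(["A00145", "A00146-A00150"], display="xml", gzip=True))
```

The sketch only builds the URL string; the result can then be passed to a browser, wget or any HTTP client.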

 

Help

More information on programmatic access to ENA data, and on ENA browser REST URLs in particular, can be obtained from the ENA help pages.

 

Programmatic access using Dbfetch

Dbfetch provides an easy way to retrieve data from multiple databases, including ENA, in a consistent manner (Figure 50). Dbfetch can be used from any web browser, as well as within a web-aware scripting tool that uses wget, lynx or similar.

Figure 50. EBI Dbfetch tool showing the range of databases and data formats available.

Notes

[A] Databases within Dbfetch include EMBL-Bank, the EMBL-Bank data classes CDS (coding sequence) and CON (constructed), EMBL-SVA (sequence version archive) and Ensembl, as well as a variety of protein and immunoglobulin databases.

[B] The Search items field accepts EMBL-Bank accessions, including accession ranges.

[C] Formats available for EMBL-Bank include EMBL, FASTA and EMBL-XML formats; other formats are available for different databases.

[D] Click Retrieve to obtain ENA sequence or entry data.
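
For scripted use, the same three inputs from the form (database, search items and format) can be placed in a URL. The Python sketch below builds such a URL and, optionally, downloads the result. The base URL, the query parameter names (db, id, format, style) and the database name embl are assumptions based on commonly documented Dbfetch usage and are not given in this text; the function names are hypothetical. Consult the Dbfetch URL syntax documentation for the authoritative form.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: Dbfetch endpoint; verify against the Dbfetch documentation.
DBFETCH_BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def build_dbfetch_url(db, ids, fmt="fasta", style="raw"):
    """Compose a Dbfetch URL for one or more identifiers.

    The parameter names db, id, format and style are assumed.
    """
    if isinstance(ids, str):
        ids = [ids]
    if not ids:
        raise ValueError("at least one identifier is required")
    params = {"db": db, "id": ",".join(ids), "format": fmt, "style": style}
    # Leave commas unescaped so a list of identifiers stays readable.
    return f"{DBFETCH_BASE}?{urlencode(params, safe=',')}"


def fetch_text(url, timeout=30):
    """Download the given URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    url = build_dbfetch_url("embl", ["J00231"], fmt="fasta")
    print(url)
    # Uncomment to perform the request (requires network access):
    # print(fetch_text(url))
```

Keeping URL construction separate from the download step makes it easy to inspect the URL in a browser first, or to hand it to wget instead.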

 

Help

More information on Dbfetch, including its URL syntax, is available from the Dbfetch help pages.