Advanced search tutorial

ENA advanced search can be used either via the browser or via URLs. Some examples of what this search service can be used for are listed below. For more information on building advanced queries, please read the data warehouse documentation.

Download the Bos taurus complete genome

  1. Go to the advanced search page: http://www.ebi.ac.uk/ena/data/warehouse/search
  2. Choose the Assembly domain
  3. In the Taxon name field (with the NCBI option), type in Bos taurus
  4. Hit Search
  5. Choose view all results
  6. Select the link for the "Genome assembly for Bos taurus from the University of Maryland"
  7. Scroll down to the Chromosomes table and choose the TEXT, FASTA or XML download link to download all sequences in the format required.

Download all reads for study PRJEB2045

The SRA File downloader, available on the data page in the browser, is a good way to download raw read data. However, if you wish to use wget or aspera to download the files as per the instructions on our download page, you can get a tab-separated report of the file locations using the following advanced search URL.

  1. Choose the type of data to search for, in this case study: query="study_accession="PRJEB2045""
  2. Define which files you are interested in. To download the read files, use result=read_run (for the analysis files, use result=analysis)
  3. Choose the information to be returned. This can be the submitted or fastq files and can be ftp or aspera format, e.g. fields=fastq_ftp. Note that by default the run_accession will be added to the output as the first column for a read_run result (and analysis_accession for an analysis result)
  4. Choose the format of the result (this can only be report if you want the requested fields returned): display=report
  5. Build the URL: http://www.ebi.ac.uk/ena/data/warehouse/search?query="study_accession="PRJEB2045""&result=read_run&fields=fastq_ftp&display=report
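
The steps above can be sketched in Python as follows. This is a minimal sketch, not an official client: the function name build_report_url is hypothetical, and it assumes the service accepts the percent-encoded form of the quoted query (the quotes and the inner = sign are encoded, whereas the tutorial URL shows them unencoded).

```python
from urllib.parse import urlencode, quote

# Advanced search endpoint, as given in the tutorial.
BASE_URL = "http://www.ebi.ac.uk/ena/data/warehouse/search"


def build_report_url(study_accession, result="read_run", fields="fastq_ftp"):
    """Assemble the report URL from the four parameters described above.

    Hypothetical helper; assumes percent-encoded quotes are accepted.
    """
    # Quotes around the accession and around the whole query, as in the tutorial.
    query = '"study_accession="{}""'.format(study_accession)
    params = {
        "query": query,
        "result": result,      # read_run or analysis
        "fields": fields,      # e.g. fastq_ftp
        "display": "report",   # required for the requested fields to be returned
    }
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_report_url("PRJEB2045")
print(url)
```

The resulting URL can then be passed to wget or any HTTP client to fetch the report.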

Additional things to note:

  1. The resulting report is a tab-separated file, and the first row contains the headers.
  2. If you want the result to be saved in a file rather than to be processed directly from the URL stream, add &download=txt to the end of the above URL.
  3. If there is more than one file for a run/analysis accession, they will be provided as a semicolon-separated list.
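
To illustrate the report layout described in these notes, the following sketch parses a tab-separated report with a header row and splits the semicolon-separated file list. The function name parse_report is hypothetical, and the sample text uses placeholder accessions and paths, not real ENA output.

```python
import csv
import io


def parse_report(report_text, accession_column="run_accession",
                 files_column="fastq_ftp"):
    """Map each accession to its list of file locations.

    Hypothetical helper: the first row is read as headers, and the
    files column is split on semicolons.
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    files_by_accession = {}
    for row in reader:
        value = row.get(files_column) or ""
        # Drop empty entries, e.g. when a run has no file listed.
        files_by_accession[row[accession_column]] = [
            path for path in value.split(";") if path
        ]
    return files_by_accession


# Placeholder report text for illustration only (not real ENA data).
sample = (
    "run_accession\tfastq_ftp\n"
    "RUN_A\texample.org/a_1.fastq.gz;example.org/a_2.fastq.gz\n"
    "RUN_B\texample.org/b.fastq.gz\n"
)
files = parse_report(sample)
print(files)
```

For an analysis result, the accession column would presumably be analysis_accession, matching the default first column described earlier.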

Download all viral sequences in fasta format

Note that the sequence domain is divided into two results: update and release. Fasta sequences therefore need to be fetched in at least two batches, one for each result.

To download these sequences via the browser:

  1. Go to the advanced search page: http://www.ebi.ac.uk/ena/data/warehouse/search
  2. Choose the Sequence domain
  3. In the Taxonomic division drop down menu, select VRL
  4. Hit the search button
  5. On the left hand side of the screen, the two results will be listed, each with a count of how many sequences are available. Select Sequence (Update)
  6. To download all fasta sequences, click on the Download FASTA link located above the sequence listing, on the right hand side, without changing any of the numbers in the Download boxes.
  7. After the file has been downloaded, select the Sequence (Release) result from the left hand panel and repeat step 6.

To download these sequences via URL:

  1. Define the query: query="tax_division="VRL""
  2. Set the result to fetch: result=sequence_update or result=sequence_release
  3. Choose the display and download formats: display=fasta&download=fasta
  4. If you are expecting more than 100,000 sequences, set the limit. By default, if no limit parameter is supplied, a maximum of 100,000 sequences will be returned. If you want to download a very large number of sequences, you may want to break up the query into several batches using the offset and limit parameters. These are described in more detail in our help documentation.
  5. Build the URL: http://www.ebi.ac.uk/ena/data/warehouse/search?query="tax_division="VRL""&result=sequence_update&display=fasta&download=fasta
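
The batching mentioned in step 4 could be sketched as below. This is an illustration under stated assumptions: the function name fasta_batch_urls is hypothetical, the total of 250,000 is an invented count (the real one is shown next to each result in the browser), and the numbering of the first record (first_offset=0 here) is an assumption that should be checked against the help documentation. As with any encoded URL, it also assumes the service accepts percent-encoded quotes.

```python
from urllib.parse import urlencode, quote

BASE_URL = "http://www.ebi.ac.uk/ena/data/warehouse/search"
# The sequence domain is split into these two results; fetch each separately.
RESULTS = ("sequence_update", "sequence_release")


def fasta_batch_urls(tax_division, result, total,
                     batch_size=100000, first_offset=0):
    """Return one FASTA download URL per batch of at most batch_size records.

    Hypothetical helper; first_offset=0 is an assumption about how the
    service numbers records.
    """
    query = '"tax_division="{}""'.format(tax_division)
    urls = []
    for start in range(0, total, batch_size):
        params = {
            "query": query,
            "result": result,
            "display": "fasta",
            "download": "fasta",
            "offset": first_offset + start,
            "limit": min(batch_size, total - start),
        }
        urls.append(BASE_URL + "?" + urlencode(params, quote_via=quote))
    return urls


# Hypothetical count of 250,000 sequences for the update result.
urls = fasta_batch_urls("VRL", "sequence_update", total=250000)
for u in urls:
    print(u)
```

The same call would be repeated with result="sequence_release" and that result's own count to cover both halves of the sequence domain.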

Investigate the coverage of COX1 sequences for Mus musculus and subspecies

  1. Go to the advanced search page: http://www.ebi.ac.uk/ena/data/warehouse/search
  2. Choose the Marker domain
  3. In the Marker group drop down menu, select Cytochrome C oxidase subunit I
  4. In the Taxon name field (with the NCBI option), type in Mus musculus and select "include subordinate taxa"
  5. Hit the search button
  6. Choose view all results
  7. Select the Marker tab
  8. You will see a graphical representation of the taxonomic tree from the Mus musculus node. Grey nodes indicate that there are no COX1 sequences for that taxon; otherwise, the darker the blue, the more sequences are available. Hovering over a node will display the full scientific name and the number of sequences to the right hand side of the plot.
  9. Select the appropriate TEXT link to view or download a report of the number of sequences for each taxon.
