How to search ENA with a sequence
Only the EMBL-Bank database of ENA can be searched with a nucleotide sequence; the SRA and TA databases contain raw data composed of very short redundant sequences that make them unsuitable for sequence searching.
The ENA browser allows you to search the entire EMBL-Bank database using either a DNA or RNA query sequence. Searching with a sequence is useful if you:
- have a sequence but are not sure of the gene name;
- have an unknown sequence you want to identify;
- want to find orthologues of a gene in other species, or paralogues within a species;
- want to identify sequence variants for a gene, including disease or mutant alleles;
- want to check whether you have identified a novel sequence.
The sequence search box on the ENA browser accepts either an EMBL-Bank accession number (in which case it automatically inserts the sequence from that accession) or a nucleotide sequence.
The nucleotide sequence can be in plain text (i.e. the bare sequence with no header) or in FASTA format (a header line starting with '>' followed by the sequence).
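To make the two accepted input formats concrete, here is a minimal Python sketch that normalises a query given either as plain text or as a single FASTA record. The function name `parse_query` and the set of accepted characters (IUPAC nucleotide codes, with both T and U allowed) are assumptions made for illustration; this is not how the ENA browser itself processes its input.

```python
import re

# IUPAC nucleotide codes for DNA and RNA (T and U both allowed). This is an
# assumption for illustration, not ENA's documented validation rule.
NUCLEOTIDE_RE = re.compile(r"^[ACGTURYSWKMBDHVN]+$", re.IGNORECASE)


def parse_query(text):
    """Return (header, sequence) from plain-text or single-record FASTA input.

    header is None when the input is a bare sequence with no FASTA header.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty query")

    header = None
    if lines[0].startswith(">"):
        # FASTA: the first line is the header, the rest is sequence.
        header = lines[0][1:].strip()
        lines = lines[1:]

    # Join the sequence lines, drop internal whitespace, and upper-case.
    sequence = "".join("".join(line.split()) for line in lines).upper()
    if not NUCLEOTIDE_RE.match(sequence):
        raise ValueError("query is not a nucleotide sequence")
    return header, sequence
```

For example, `parse_query(">my_seq\nACGT\nNNAC")` returns the header `my_seq` together with the joined sequence, while `parse_query("acgu")` returns no header and the upper-cased sequence.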
Accessing the advanced sequence search
Advanced sequence search
The advanced search page offers several options that allow you to refine your search (Figure 18).
Figure 18. Advanced sequence search page; at the foot of the page you can change the search method.
Sequence search results page
Figure 19. Example of a results page for a sequence search using EMBL-Bank CDS accession ‘AAB07223.1’ against both EMBL-Bank and Ensembl.
A closer look at the sequence search results page
Taking a closer look at the search results, you will see that they are ordered by E-value, starting with the lowest E-value, which indicates the most significant match (Figure 20).
Figure 20. A close-up of the Ensembl table in the advanced search results.
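The ordering described above can be reproduced on any list of hits you have exported or collected yourself. The following Python sketch sorts hits by ascending E-value; the `Hit` type, the `rank_hits` function and the hit identifiers and values are hypothetical placeholders, not real search results.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str  # placeholder identifier, not a real record
    e_value: float


def rank_hits(hits):
    """Order hits from most to least significant (lowest E-value first)."""
    return sorted(hits, key=lambda hit: hit.e_value)


# Made-up values, used only to show the ordering.
hits = [Hit("HIT_B", 3e-10), Hit("HIT_A", 1e-50), Hit("HIT_C", 0.5)]
for hit in rank_hits(hits):
    print(hit.accession, hit.e_value)
```

The hit with the smallest E-value (`HIT_A` in this made-up list) is printed first, matching the order in which the results table presents matches.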
Additional sequence search tools
Additional search engines are available on the Sequence Similarity Searching page. These include BLAST, PSI-BLAST, FASTA, SSEARCH and specialised search programs. If you wish to search the entire EMBL-Bank and/or Ensembl database, then the fast Exonerate search on the ENA browser is your best option.
However, you might want to consider trying one of the search programs on the Sequence Similarity Searching page if you want to:
- search a subsection of EMBL-Bank not available through the ENA browser search (Figure 21);
- search other databases with nucleotide information, for example patent, structure or immunoglobulin databases;
- carry out a specialised type of search, for example searching a protein database with a DNA sequence (FASTX or BLASTX), or searching with a set of short oligonucleotide sequences (FASTM);
- search with a very short nucleotide sequence, where a true Smith-Waterman program such as SSEARCH would perform better;
- adjust search parameters to fine-tune your search query.
Figure 21. Sequence Similarity & Analysis search page detailing the selection of databases available, including EMBL-Bank subdivisions.
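The guidance in this section can be summarised as a small lookup. The Python sketch below maps a description of your search to the programs named in the text; the function name `suggest_search_tool`, its parameters and the order in which the conditions are checked are assumptions for illustration, and the result is a rough guide rather than an official recommendation.

```python
def suggest_search_tool(protein_target=False, oligo_set=False,
                        very_short_query=False, whole_database=True):
    """Return the programs the text suggests for a described search.

    All parameters are hypothetical flags describing the search:
    protein_target    - searching a protein database with a DNA sequence
    oligo_set         - searching with a set of short oligonucleotides
    very_short_query  - searching with a very short nucleotide sequence
    whole_database    - searching the entire EMBL-Bank and/or Ensembl
    """
    if protein_target:
        return ["FASTX", "BLASTX"]
    if oligo_set:
        return ["FASTM"]
    if very_short_query:
        return ["SSEARCH"]
    if whole_database:
        return ["Exonerate (ENA browser)"]
    # Subsections and other databases: use the wider tool selection.
    return ["Sequence Similarity Searching page"]
```

For instance, `suggest_search_tool(very_short_query=True)` returns `["SSEARCH"]`, reflecting the advice that a true Smith-Waterman program performs better on very short queries.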