In March 2015, ENA introduced a new sequence search service built on EBI's central BLAST search service. Our interface lets users easily select which subset of INSDC sequences to search against, including the ability to limit searches by dataclass or taxonomic division.
Programmatic users should use the central EBI BLAST SOAP or REST web services. For guidance on which databases to use to reproduce the searches offered through our interface, please contact email@example.com.
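As a rough illustration of programmatic access, here is a minimal Python sketch that builds, but does not send, a job submission request for EBI's NCBI BLAST REST service using only the standard library. The endpoint URL and the `email`, `program`, `stype`, `sequence` and `database` parameter names reflect our understanding of EBI's Job Dispatcher service and should be checked against its current documentation. `build_run_request` is a hypothetical helper, and the `database` value is a placeholder: which EBI database names correspond to ENA's sequence sets is exactly the question to put to ENA.

```python
from urllib.parse import urlencode
from urllib.request import Request

# EBI Job Dispatcher REST endpoint for NCBI BLAST (assumed; verify against
# EBI's documentation for the current URL and parameter list).
BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest/ncbiblast"


def build_run_request(email, program, stype, sequence, database):
    """Build, without sending, a POST request that submits a BLAST job.

    `database` is a placeholder: ask ENA which EBI database names match
    the sequence sets offered in the web interface.
    """
    form = {
        "email": email,        # contact address requested by EBI services
        "program": program,    # "blastn", "tblastx" or "tblastn"
        "stype": stype,        # query type: "dna", "rna" or "protein"
        "sequence": sequence,  # raw or FASTA-formatted query
        "database": database,  # hypothetical value, see docstring
    }
    body = urlencode(form).encode("ascii")
    return Request(
        f"{BASE_URL}/run",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending the request with `urllib.request.urlopen(req)` should, as we understand the service, return a plain-text job identifier that is then used to check the job's status and fetch its results.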
The query sequence can be pasted into the text area provided or uploaded from a file. Alternatively, if the sequence has a public INSDC accession, you can enter the accession and the sequence will be fetched from ENA. By default, the entire length of the query sequence is used in the search; however, you can also limit the search to a fragment of the sequence provided.
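The two query-handling steps described above (accepting a pasted sequence and optionally restricting it to a fragment) can be sketched in Python as follows. This is an illustration, not ENA's implementation: the helper names are hypothetical, and we assume the fragment is given as 1-based, inclusive start and end positions and that only the first record of a pasted FASTA text is used.

```python
def parse_query(text):
    """Return the residues of the first record in pasted query text.

    Accepts a raw sequence or FASTA; header lines ('>') and whitespace are
    dropped, and anything after a second header is ignored (assumption).
    """
    residues = []
    seen_header = False
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header or any(residues):
                break  # a second record starts here
            seen_header = True
            continue
        residues.append("".join(line.split()))
    return "".join(residues).upper()


def query_fragment(sequence, start=None, end=None):
    """Return positions start..end of sequence (1-based, inclusive).

    Omitting start/end uses the whole sequence, the default behaviour.
    """
    start = 1 if start is None else start
    end = len(sequence) if end is None else end
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"invalid range {start}-{end} for length {len(sequence)}")
    return sequence[start - 1:end]
```

For example, `query_fragment(parse_query(">q\nacgtacgtaa"), 3, 6)` returns `"GTAC"`.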
Sequences to search against
|Sequence set|Description|
|---|---|
|Assembled and annotated sequences|Records that have been generated from raw sequencing data and contain functional annotation. They have usually undergone various steps of quality control, and their functional annotation may have been experimentally validated through wet lab work. Note that WGS sequences are excluded from the searchable set, as are all CON sequences that are not prokaryotic.|
|Geo-referenced sequences|A subset of assembled and annotated sequences consisting of records with latitude and longitude coordinates.|
|Barcode sequences|A subset of assembled and annotated sequences consisting of records that conform to Consortium for the Barcode of Life (CBoL) standards.|
|Non-coding sequences|Annotated non-coding sequences, derived from the assembled and annotated records that contain specific non-coding features.|
|Coding sequences|Annotated coding sequences, derived from the assembled and annotated records that contain coding features.|
|Vectors (Emvec)|A reference set of plasmid vectors, tag sequences, etc., which can be used to screen and filter data for analysis and submission.|
Searches against assembled and annotated, coding and non-coding sequences can be limited by taxonomic division or dataclass. However, doing so also restricts the search to sequences included in the most recent release, because this filtering is not currently available for new and updated sequences.
Three different BLAST programs are available. Note which query sequence type each program requires: running a program with the wrong type of query sequence will result in an error.
|Program|Description|Query sequence type|
|---|---|---|
|blastn|Compares a nucleotide query sequence (DNA/RNA) against a nucleotide sequence database.|DNA/RNA|
|tblastx|Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Note that tblastx is extremely slow and CPU-intensive.|DNA/RNA|
|tblastn|Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.|Protein|
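Because a program/query-type mismatch only surfaces as an error after submission, a client can check it up front. The Python sketch below encodes the program table above and guesses the query type with a simple heuristic of our own (a query counts as nucleotide if at least 90% of its letters are A, C, G, T, U or N). The threshold and helper names are assumptions, not the rule ENA or EBI applies.

```python
# Query type each program expects (from the program table above).
PROGRAM_QUERY_TYPE = {
    "blastn": "nucleotide",
    "tblastx": "nucleotide",
    "tblastn": "protein",
}

NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_query_type(sequence, threshold=0.9):
    """Heuristic (assumed): call the query nucleotide if at least
    `threshold` of its letters are A, C, G, T, U or N; otherwise protein."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty query sequence")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def check_program(program, sequence):
    """Raise ValueError if the query looks like the wrong type for program."""
    expected = PROGRAM_QUERY_TYPE[program]  # KeyError for other programs
    found = guess_query_type(sequence)
    if found != expected:
        raise ValueError(f"{program} expects a {expected} query; this looks like {found}")
```

A heuristic like this can misclassify short or unusual sequences, so it is best used as a warning rather than a hard block.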
Once you have selected a program, you can submit the search using that program's default settings. To tailor the number of results returned and how alignments are scored, click the "More options" link under the program selection and adjust any of the parameters listed in the following table.
|Parameter|Description|Programs|
|---|---|---|
|Maximum scores|Maximum number of match score summaries reported in the result output.|blastn, tblastx, tblastn|
|Maximum alignments|Maximum number of match alignments reported in the result output.|blastn, tblastx, tblastn|
|Expect threshold|Limits the scores and alignments reported to those whose expectation value (E-value), the number of matches with that score expected to occur by chance, is at or below this threshold.|blastn, tblastx, tblastn|
|Alignment views|Formatting of the alignments. See the table below for the options available.|blastn, tblastx, tblastn|
|Filter low complexity regions|Filter out regions of low sequence complexity. This avoids matches that are found because of sequence composition rather than meaningful sequence similarity. However, filtering can also mask regions of interest, so use it with caution.|blastn, tblastx, tblastn|
|Match/mismatch scores|The match score is the bonus added to the alignment score for each pair of identical bases; the mismatch score is the penalty for each pair of non-identical bases.|blastn|
|Drop off|The amount a score can drop before gapped extension of word hits is halted.|blastn, tblastx, tblastn|
|Gap existence cost|Penalty subtracted from the score when a gap is opened in a sequence. Increasing the gap existence cost decreases the number of gaps in the final alignment.|blastn, tblastx, tblastn|
|Gap extension cost|Penalty subtracted from the score for each base or residue in a gap. Increasing the gap extension cost favours short gaps in the final alignment; conversely, decreasing it favours long gaps.|blastn, tblastx, tblastn|
|Matrix|The substitution matrix used to score alignments when searching the database.|tblastx, tblastn|
|Composition-based adjustments|Whether to use composition-based adjustments and, if so, which kind.|tblastn|
|Align using gaps|If selected, the program performs gapped alignment. Otherwise, it reports only individual HSPs (high-scoring segment pairs), the ungapped regions where the two sequences match, and so produces no alignments with gaps.|blastn, tblastx, tblastn|
|Translation table|Genetic code used to translate the query sequence.|tblastx|
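To make the match/mismatch and gap cost parameters concrete, here is a small Python sketch that scores an already-aligned pair of nucleotide strings. It assumes the affine convention implied by the table, where a run of k gap positions costs the existence cost plus k times the extension cost; the function name and the example values are illustrative, not the service defaults.

```python
def score_alignment(query, subject, match, mismatch, gap_open, gap_extend):
    """Score two aligned nucleotide strings ('-' marks a gap position).

    Identical bases add `match`, differing bases add `mismatch` (a negative
    number), and a run of k gap positions costs gap_open + k * gap_extend.
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    score = 0
    open_gap = None  # which sequence the current gap run is in, if any
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            side = "query" if q == "-" else "subject"
            if side != open_gap:
                score -= gap_open  # a new gap run starts
                open_gap = side
            score -= gap_extend  # every gap position costs gap_extend
        else:
            open_gap = None
            score += match if q == s else mismatch
    return score
```

For instance, `score_alignment("AC--GT", "ACTTGT", 1, -2, 5, 2)` returns `-5`: four matches score 4, and one gap of length 2 costs 5 + 2 × 2 = 9. Splitting the same two gap positions into two separate gaps would cost 18 instead, which is why raising the gap existence cost leads to fewer gaps.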
There are several options for presenting the alignments in the search results, each of which is described in the table below.
|Alignment view|Description|
|---|---|
|Pairwise|The query and match are output as a pairwise alignment with a consensus line between the two sequences. In the consensus line, identical matches are shown as the base or residue itself, similarities as '+' and mismatches as a space.|
|Query-anchored identities|Matches are shown relative to the ungapped query sequence, as differences from the query. Identities appear as dots (.), similarities in upper case, mismatches in lower case and gaps as dashes (-). Insertions are indicated by a line pointing to the insertion site, with the inserted sequence on a separate line.|
|Query-anchored non-identities|Matches are shown relative to the ungapped query sequence, as differences from the query. Identities and similarities appear in upper case, mismatches in lower case and gaps as dashes (-). Insertions are indicated by a line pointing to the insertion site, with the inserted sequence on a separate line.|
|Flat query-anchored identities|Matches are shown relative to the gapped query sequence, as differences from the query. Identities appear as dots (.), similarities in upper case, mismatches in lower case and gaps as dashes (-).|
|Flat query-anchored non-identities|Matches are shown relative to the gapped query sequence, as differences from the query. Identities and similarities appear in upper case, mismatches in lower case and gaps as dashes (-).|
|BLASTXML|Output NCBI BLAST XML instead of a plain-text report.|
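To show how the symbol conventions in the alignment view table fit together, this Python sketch rebuilds two lines from an aligned query/subject pair: the consensus line of the pairwise view and the subject line of the flat query-anchored identities view. It is a simplified illustration only (real BLAST reports add coordinates and line wrapping), the helper names are hypothetical, and `is_similar` stands in for a substitution-matrix lookup; by default only identities count, as for nucleotides. Showing a subject residue opposite a query gap in lower case is our own assumption.

```python
def never_similar(a, b):
    """Default similarity test: only identities count (nucleotide case)."""
    return False


def consensus_line(query, subject, is_similar=never_similar):
    """Middle line of the pairwise view: the residue where identical,
    '+' where similar, and a space for mismatches and gaps."""
    out = []
    for q, s in zip(query, subject):
        if q == s and q != "-":
            out.append(q)
        elif q != "-" and s != "-" and is_similar(q, s):
            out.append("+")
        else:
            out.append(" ")
    return "".join(out)


def flat_identities_line(query, subject, is_similar=never_similar):
    """Subject line of the flat query-anchored identities view: '.' for
    identities, upper case for similarities, lower case for mismatches,
    '-' for gaps."""
    out = []
    for q, s in zip(query, subject):
        if s == "-":
            out.append("-")
        elif q == s:
            out.append(".")
        elif q != "-" and is_similar(q, s):
            out.append(s.upper())
        else:
            out.append(s.lower())  # mismatch, or residue opposite a query gap
    return "".join(out)
```

For example, with query `ACGT` and subject `ACCT`, the consensus line is `"AC T"` and the flat identities line is `"..c."`.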
Submitting your search
Some searches can take several minutes, especially if you are searching against a large set of sequences (e.g. all assembled and annotated sequences). While a search is running, the submit button is replaced by an ENA "loading" icon and the search form is locked for editing. When the search completes, the icon disappears, the submit button returns and the search results open in a new window. If your browser's pop-up blocker is enabled, you will need to disable it to see the results window. Alternatively, you can choose to receive an email with a link to the results once the search is complete.
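The web page waits for the search on your behalf; programmatic users have to poll for completion themselves. Below is a minimal Python sketch of such a polling loop. The `status/{jobId}` endpoint and the `QUEUED`/`RUNNING`/`FINISHED` status strings are our understanding of EBI's Job Dispatcher REST service and should be checked against its documentation. `fetch_status` and `wait_for_job` are hypothetical helpers, and the status function is injectable so that the loop can be exercised without network access.

```python
import time
from urllib.request import urlopen

# Assumed EBI Job Dispatcher REST endpoint for NCBI BLAST.
BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest/ncbiblast"


def fetch_status(job_id):
    """Ask the REST service for a job's status (e.g. RUNNING, FINISHED)."""
    with urlopen(f"{BASE_URL}/status/{job_id}") as resp:
        return resp.read().decode().strip()


def wait_for_job(job_id, get_status=fetch_status, interval=5.0, timeout=1800.0):
    """Poll until the job is no longer QUEUED or RUNNING; return that status."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status not in ("QUEUED", "RUNNING"):
            return status  # e.g. FINISHED, or an error status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout} s")
        time.sleep(interval)  # avoid hammering the service
```

Once the status is `FINISHED`, results are fetched by job ID (as we understand it, from a `result/{jobId}/{type}` URL such as `.../result/{jobId}/out` for the text report). A polling interval of a few seconds keeps the load on the shared service reasonable.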
Your results remain available for 7 days, so we suggest downloading them if you might need them for longer. The results page is divided into several tabs, each of which is described below. The matches found are summarized in a table with the following columns:
|Column|Description|
|---|---|
|Align|Checkboxes to select results for further actions, e.g. view in an alignment, download, or send to a multiple sequence alignment.|
|DB:ID|The BLAST "database" and the INSDC accession of the database sequence. The database name is an internal BLAST service grouping of sequences and can be ignored; the accession links to the record in ENA.|
|Source|Information about the record, including references to other resources across EBI.|
|Length|The length of the sequence.|
|Score|The alignment score.|
|Identities %|The percentage of identical bases or residues aligned between the query and database sequences.|
|Positives %|The percentage of aligned bases or residues that score positively in the substitution matrix (similar bases or residues).|
|E()|The Expect value (E-value) for the alignment: the number of alignments scoring at least this well that would be expected by chance in a database of this size. The lower the E-value, the less likely the match is to be a chance hit.|
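For reference, the E() column follows the standard Karlin-Altschul statistics on which BLAST is based. In the textbook form (the exact effective-length corrections applied by EBI's service are not described here), for a raw alignment score $S$, query length $m$ and total database length $n$, with $\lambda$ and $K$ determined by the scoring system:

$$
E = K\,m\,n\,e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\quad\Longrightarrow\quad
E = m\,n\,2^{-S'}
$$

where $S'$ is the normalized bit score. Each extra bit of score therefore halves $E$, and doubling the size of the database searched doubles it. Assuming the percentage columns use the alignment length $L$ (gaps included), as NCBI BLAST's own text report does, they are

$$
\text{Identities \%} = 100 \times \frac{N_{\text{identical}}}{L},
\qquad
\text{Positives \%} = 100 \times \frac{N_{\text{positive}}}{L}.
$$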
The tool output tab displays the raw output from BLAST, which can be downloaded as either a text or an XML file.
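If you download the XML version of the tool output, you can rebuild summary-style rows from it yourself. The Python sketch below uses the standard library's `xml.etree.ElementTree` and the element names of NCBI BLAST XML (`Hit`, `Hit_accession`, `Hit_len`, `Hsp_score`, `Hsp_evalue`, `Hsp_identity`, `Hsp_positive`, `Hsp_align-len`). It keeps only each hit's first HSP and computes the percentages over the alignment length. Both are our assumptions about how the summary table is derived, and `summarize_blast_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def summarize_blast_xml(xml_text):
    """Return one summary row per hit from NCBI BLAST XML output,
    using each hit's first HSP (assumed to be its best)."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.iter("Hit"):
        hsp = hit.find("Hit_hsps/Hsp")
        if hsp is None:
            continue  # a hit without HSPs has nothing to summarize
        align_len = int(hsp.findtext("Hsp_align-len"))
        rows.append({
            "accession": hit.findtext("Hit_accession"),
            "length": int(hit.findtext("Hit_len")),
            "score": int(float(hsp.findtext("Hsp_score"))),
            "identities_pct": 100.0 * int(hsp.findtext("Hsp_identity")) / align_len,
            "positives_pct": 100.0 * int(hsp.findtext("Hsp_positive")) / align_len,
            "evalue": float(hsp.findtext("Hsp_evalue")),
        })
    return rows
```

Sorting the returned rows by `evalue` gives roughly the ordering of the summary table, although the service may apply its own ordering rules.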
Another tab gives a visual representation of the alignments, showing which portions of the sequences align and colour-coding them by E-value. Clicking any accession on the left-hand side takes you to that alignment in the tool output tab.
Although you can select your preferred alignment format before running your BLAST search, all of the NCBI BLAST output formats are available to download from the result summary tab.
A further tab lists all of the original parameters used for your search, with links to the query sequence and the output results. This information is useful if you want to rerun exactly the same query, run it programmatically or with the command-line tool, or contact EBI for help with your sequence search.
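If you would rather rerun a blastn search locally with NCBI BLAST+, the recorded parameters can be turned into a command line. The Python sketch below is only an illustration: pairing each web-form parameter with a BLAST+ option (`-evalue`, `-reward`, `-penalty`, `-gapopen`, `-gapextend`, `-dust`, `-num_descriptions`, `-num_alignments`, `-max_target_seqs`, `-outfmt`) is our own assumption. `blastn_command` is a hypothetical helper, and `db` must name a BLAST database you have built locally, since this page does not describe ENA's sequence sets as downloadable BLAST databases.

```python
def blastn_command(query_file, db, evalue, reward, penalty, gapopen, gapextend,
                   max_scores, max_alignments, filter_low_complexity=True,
                   xml_output=False):
    """Return an argument list for a local NCBI BLAST+ blastn run.

    The mapping from web-form parameters to BLAST+ options is assumed;
    `db` must name a locally built BLAST database.
    """
    cmd = [
        "blastn",
        "-query", query_file,
        "-db", db,
        "-evalue", str(evalue),        # Expect threshold
        "-reward", str(reward),        # match score
        "-penalty", str(penalty),      # mismatch score (negative)
        "-gapopen", str(gapopen),      # gap existence cost
        "-gapextend", str(gapextend),  # gap extension cost
        "-dust", "yes" if filter_low_complexity else "no",
    ]
    if xml_output:
        # For XML output (-outfmt 5) BLAST+ limits hits with -max_target_seqs
        # rather than -num_descriptions/-num_alignments.
        cmd += ["-outfmt", "5", "-max_target_seqs", str(max_alignments)]
    else:
        cmd += ["-num_descriptions", str(max_scores),
                "-num_alignments", str(max_alignments)]
    return cmd
```

The list can be run with `subprocess.run(cmd, check=True)` or printed for a shell with `shlex.join(cmd)`. Note that BLAST+ accepts only certain combinations of reward, penalty and gap costs, so values copied from the web form may need adjusting.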