BLAST similarity search - Introduction
WU-BLAST2 stands for Washington University Basic Local Alignment Search Tool Version 2.0. It compares a query sequence with the sequences held in nucleotide databases by aligning the query against previously characterised genes, which helps to identify the gene that the query represents. The emphasis of this tool is on finding regions of sequence similarity. These can yield clues about the structure and function of a novel sequence, and about its evolutionary history and homology with other sequences in the database. Regions of similarity detected by alignment tools can be either local, where the similarity is confined to one region of sequences that may otherwise be unrelated, or global, where the alignment spans the full length of both sequences; BLAST, as its name indicates, searches for local alignments. Dr Warren Gish at Washington University released the first "gapped" version of BLAST, known as WU-BLAST2 (Washington University version), which allows gapped alignments and the associated statistics.
The fundamental unit of BLAST algorithm output is the High-scoring Segment Pair (HSP). An HSP consists
of two sequence fragments of arbitrary but equal length whose alignment is locally maximal and for
which the alignment score meets or exceeds a threshold or cutoff score. A set of HSPs is thus defined by
two sequences, a scoring system, and a cutoff score; this set may be empty if the cutoff score is sufficiently
high. In the programmatic implementations of the BLAST algorithm described here, each HSP consists of a
segment from the query sequence and one from a database sequence. The sensitivity and speed of the programs
can be adjusted via the standard BLAST algorithm parameters W, T, and X (Altschul et al., 1990);
selectivity of the programs can be adjusted via the cutoff score.
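The HSP definition above can be illustrated with a minimal Python sketch. It assumes a simple match/mismatch scoring system (+5 and -4 are illustrative values chosen here, not taken from this page) and only examines a single ungapped diagonal, so it is not the BLAST search heuristic itself, which relies on the W, T, and X parameters. The function names are hypothetical.

```python
from typing import Optional, Tuple

MATCH = 5      # assumed reward for identical bases (illustrative value)
MISMATCH = -4  # assumed penalty for differing bases (illustrative value)


def best_segment(query: str, subject: str) -> Tuple[int, int, int]:
    """Return (score, start, end) of the highest-scoring ungapped segment
    when the two sequences are compared position by position.
    The end index is exclusive."""
    best_score, best_start, best_end = 0, 0, 0
    run_score, run_start = 0, 0
    for i, (q, s) in enumerate(zip(query, subject)):
        if run_score <= 0:
            # A non-positive running score cannot help any later segment,
            # so start a fresh candidate segment here.
            run_score, run_start = 0, i
        run_score += MATCH if q == s else MISMATCH
        if run_score > best_score:
            best_score, best_start, best_end = run_score, run_start, i + 1
    return best_score, best_start, best_end


def find_hsp(query: str, subject: str, cutoff: int) -> Optional[Tuple[int, int, int]]:
    """Return the best segment pair only if its score meets the cutoff."""
    hit = best_segment(query, subject)
    return hit if hit[0] >= cutoff else None


if __name__ == "__main__":
    # Made-up fragments: positions 2 to 9 are identical, the flanks differ.
    print(find_hsp("TTACGTACGTTT", "GGACGTACGTCC", cutoff=30))
    print(find_hsp("TTACGTACGTTT", "GGACGTACGTCC", cutoff=50))
```

Raising the cutoff in the second call empties the result, which mirrors the remark that the set of HSPs may be empty if the cutoff score is sufficiently high.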
The approach to similarity searching taken by the BLAST programs is first to look for similar segments
(HSPs) between the query sequence and a database sequence, then to evaluate the statistical significance of
any matches that were found, and finally to report only those matches that satisfy a user-selectable threshold
of significance. Findings of multiple HSPs involving the query sequence and a single database
sequence may be treated statistically in a variety of ways. By default the programs use "Sum" statistics
(Karlin and Altschul, 1993). As such, the statistical significance ascribed to a set of HSPs may be higher
than that ascribed to any individual member of the set. Only when the ascribed significance satisfies the
user-selectable threshold (E parameter) will the match be reported to the user.
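To make the role of the E threshold concrete, here is a minimal Python sketch of the final reporting step. It uses the Karlin-Altschul expectation for a single HSP, E = K·m·n·e^(−λS), which is simpler than the Sum statistics the programs apply by default. The values of λ and K, the default threshold of 10, and all function names are assumptions made for illustration; real values depend on the scoring system and the database.

```python
import math
from typing import List, Tuple


def expect_value(score: float, query_len: int, db_len: int,
                 lam: float = 0.19, k: float = 0.18) -> float:
    """Expected number of chance HSPs scoring at least `score`.
    lam and k are placeholder statistical parameters (assumed)."""
    return k * query_len * db_len * math.exp(-lam * score)


def reportable(hits: List[Tuple[str, float]], query_len: int, db_len: int,
               e_threshold: float = 10.0) -> List[Tuple[str, float]]:
    """Keep only the hits whose expectation satisfies the E threshold."""
    return [(name, score) for name, score in hits
            if expect_value(score, query_len, db_len) <= e_threshold]


if __name__ == "__main__":
    # Hypothetical hit names and scores, for illustration only.
    hits = [("hit_a", 200.0), ("hit_b", 40.0)]
    for name, score in reportable(hits, query_len=500, db_len=10**9):
        print(name, score)
```

A higher score gives a smaller E, so strong matches pass the threshold while low-scoring ones, which are expected to occur often by chance in a large database, are not reported.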
We will take a test sequence, sequence 3, and search the EMBL Nucleotide Sequence Database for similar sequences. Because this sequence is a real entry in the database, we expect to find a perfect match to it. We also expect to find similar sequences, perhaps from closely related animals, or nucleotide sequences coding for closely related proteins.
Consider the following WU-BLAST2 submission form:
- The sequence, sequence 3, is entered into the text box in FASTA format, which consists of a one-line header starting with a ">" symbol followed by the sequence name, with the sequence itself entered on the following line(s). You can find out more about sequence formats here.
- "interactive" is chosen so that the results are delivered to the browser as soon as they are available. Alternatively, you can choose "email" and fill in your email address to have the results delivered by email.
- The title of the search is left as "_Sequence", although you can give your search any title you wish to help you identify the results.
- The WU-BLASTn program is chosen, which searches a nucleotide query sequence against a DNA databank; in this case EMBL Standard has been selected.
- The number of scores (hits to the database) is limited to 10 and the number of alignments of these against the query sequence is limited to 5. This limits the size of the output.
- Other options have been left on "default".
- You now can either go to the WU-BLAST2 page and run the search yourself or view the sample results for sequence 3.
- Which protein does this sequence code for?
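The FASTA layout described in the first bullet of the list above (a ">" header line followed by one or more sequence lines) can be checked before submission with a small Python sketch like the following. The function name and the example record are hypothetical; the record is a made-up placeholder, not sequence 3.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (name, sequence) pairs.
    Sequence lines belonging to one record are joined and upper-cased."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("FASTA input must start with a '>' header line")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Placeholder record for illustration only.
    example = ">example_sequence\nacgtacgt\nTTGGCCAA\n"
    for name, seq in parse_fasta(example):
        print(name, len(seq), seq)
```

Input that lacks the ">" header raises an error, which is the most common formatting mistake when pasting a sequence into a submission form.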
See an explanation of the results of this BLAST search.