2Can Support Portal - Nucleotide Analysis: Checking for vector contamination (3/12)
Interpreting the results of BLAST2 EVEC (1)
The results page presents a summary table of the best 5 hits. Click the "BLAST Result" button to see the results.
You should have noticed that sequence 1 and sequence 2 appear to be contaminated with vectors. A closer look is required to see if these are genuine hits, or a result of finding a loose match to a functional domain similar to one found in the vectors database. Loose matches of this kind arise from the contents of the database and from the default E-value threshold, which allows these hits to pass through.
Summary for top 5 hits for sequence 1
Summary for top 5 hits for sequence 2
In the case of sequence 1, you will see that the parameter "Value" (the E value) for all these hits is zero, so all these hits are exact matches and are definitely a result of vector contamination. You can confirm this by looking at the alignment below and seeing that it extends over a long distance without any gaps being inserted, for example:
In the case of sequence 2, you will see that the parameter "Value" for all these hits is always greater than zero. As a rule of thumb, for a first look at these, if the value is not expressed "to the power of" a negative exponent (e.g. 1.2 x 10^-7), it is not likely to be significant. Now let us look at the alignment for L8318BY, which has a "Value" of 0.007 and so requires a little further investigation.
Here you will see that the alignments are very short and also contain gaps (positions with no "|" symbol joining the upper and lower sequences). This is clearly not a good match and can be ignored; it is probably a result of similar functional domains being present in the query sequence and in the entry in the vector database.
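The checks described above (a very low E value, a long alignment, no gaps) can be sketched in Python. This is only an illustration of the rule of thumb, not part of BLAST2 EVEC: the `Hit` class, the `looks_like_vector` function and both thresholds are assumptions chosen for the example, and the alignment lengths and gap counts are made up.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One line of a hit summary (hypothetical structure for this sketch)."""
    name: str
    e_value: float
    align_length: int  # number of aligned positions
    gaps: int          # number of gap characters in the alignment


def looks_like_vector(hit, max_e_value=1e-5, min_length=50):
    """Apply the rule of thumb: tiny E value, long alignment, no gaps.

    Both thresholds are assumed values for illustration only.
    """
    return (
        hit.e_value <= max_e_value
        and hit.align_length >= min_length
        and hit.gaps == 0
    )


# Illustrative hits: only the E values (0 and 0.007) come from the text.
exact_hit = Hit("exact match", e_value=0.0, align_length=300, gaps=0)
weak_hit = Hit("weak match", e_value=0.007, align_length=20, gaps=2)

for hit in (exact_hit, weak_hit):
    verdict = "likely vector" if looks_like_vector(hit) else "probably not vector"
    print(f"{hit.name}: {verdict}")
```

A filter like this is only a first pass; borderline hits still need the alignment itself to be inspected by eye, as done above.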
You should now be convinced that sequence 1 is contaminated with vectors and sequence 2 is clean of vectors. Sequence 1 will need to have the vector sequence trimmed off before it becomes useful.
With sequence 1 there are at least 5 perfect "hits" against the emvec database; if the number of scores had not been limited to 5, more might have been found.
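As a rough illustration of what trimming means, the Python sketch below removes a vector-matching region from a sequence and keeps the longer remaining flank. The function `trim_vector`, its 1-based inclusive coordinates and the toy sequence are all assumptions made for this example; real trimming would take the coordinates from the alignment and may need to handle vector at both ends.

```python
def trim_vector(sequence, hit_start, hit_end):
    """Remove the region hit_start..hit_end (1-based, inclusive).

    Returns the longer of the two flanking pieces, on the assumption
    that the vector lies at one end of the read.
    """
    left = sequence[:hit_start - 1]   # everything before the vector match
    right = sequence[hit_end:]        # everything after the vector match
    return left if len(left) >= len(right) else right


# Toy example: the first four bases stand in for vector sequence.
read = "AAAACCCCGGGGTTTT"
print(trim_vector(read, 1, 4))
```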
Let us now consider a score:
- EM_VEC: This is the name of the database, i.e. EMBL VECtors.
- L8318BY: This is the EMBL accession number and link to the database entry, retrievable from SRS.
- Leishmania major Friedlin cosmid L8318... This is the start of the entry description "Leishmania major Friedlin cosmid L8318.2 t7 end-sequence, similar to CV59231 U59231 Cloning vector".
- Bit Score: The higher the score, the better the match with the database.
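- E value: The number of matches with a score at least this good that would be expected to be found by chance. The lower the value, the more significant the hit. A value of zero means the match is too strong to be explained by chance; in this example it corresponds to a perfect match/alignment in the EMBL Vectors database. This is useful because a short sequence may produce a large number of hits without any of them being a significant alignment.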
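The relationship between the bit score and the E value can be made concrete with the standard BLAST approximation E = m x n x 2^(-S), where S is the bit score, m the query length and n the total database length. The Python sketch below applies that formula. The function name `expected_hits` and the lengths used are assumptions for illustration, and BLAST itself uses corrected "effective" lengths, so reported values will differ somewhat.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E value: chance hits expected at this bit score or better."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical sizes: a 500-base query against a 1,000,000-base database.
for score in (20, 40, 80, 600):
    e_value = expected_hits(score, 500, 1_000_000)
    print(f"bit score {score:>3}: E value about {e_value:.3g}")
```

Each extra bit of score halves the expected number of chance matches, which is why long exact matches against a vector end up with an E value so small that it is reported as zero.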
At the bottom of the page are statistical results as follows:
We will now consider the rest of the results page.