
2Can Support Portal - Nucleotide Analysis


Interpreting the results of BLAST2 EVEC (1)



The results page presents a summary table of the best five hits; click the "BLAST Result" button to see the full results.

You should have noticed that sequence 1 and sequence 2 both appear to be contaminated with vector sequence. A closer look is needed to decide whether these are genuine hits or loose matches to a sequence feature, such as a functional domain, that the query happens to share with an entry in the vectors database. Such loose matches are reported because of the contents of the database and because the default E-value cutoff is permissive enough to let them through.

Summary for top 5 hits for sequence 1

							  
Score    E Sequences producing significant alignments:              (bits)  Value
EM_VEC:CVPJB8 X98612.1 Artificial cloning vector pjb8                1707   0.0
EM_VEC:CVBETLACT X94336.1 Cloning vector beta-lactamase gene         1707   0.0
EM_VEC:ASPBAD18 X81838.1 E.coli DNA for pBAD18 cloning vector        1707   0.0
EM_VEC:APGD57 X67019.1 Artificial DNA sequence (pGD57) of pBR322...  1707   0.0
EM_VEC:CVU78874 U78874.1 pGEX-6P-3 cloning vector, complete sequ...  1707   0.0

Summary for top 5 hits for sequence 2

 
Score    E Sequences producing significant alignments:               (bits) Value
EM_VEC:L8318BY AL354421.1 Leishmania major Friedlin cosmid L8318...    42   0.007
EM_VEC:AF338825 AF338825.1 Cloning vector pHLH/int(+), complete ...    34   1.7
EM_VEC:AY142483 AY142483.1 Cloning vector pSM565, complete seque...    32   6.6
EM_VEC:AY091640 AY091640.1 Cloning vector pRAF800, complete sequ...    32   6.6
EM_VEC:AB105370 AB105370.1 Cloning vector RCAS-L14 DNA, complete...    32   6.6
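Because the default E-value cutoff lets weak hits through, it can help to re-filter the summary lines with a stricter cutoff of your own. The sketch below assumes summary lines in the layout shown above (database:entry name, accession.version, truncated description, bit score, E value), parses them, and keeps hits at or below a cutoff. The names `parse_summary_line`, `Hit` and `significant`, and the default cutoff of 1e-5, are illustrative assumptions, not part of BLAST2 EVEC.

```python
import re
from dataclasses import dataclass

# Assumed layout of one summary line:
#   DB:ENTRY ACCESSION.VERSION description...   bits   evalue
SUMMARY_LINE = re.compile(
    r"^(?P<db>\w+):(?P<entry>\S+)\s+(?P<acc>\S+)\s+(?P<desc>.*?)\s+"
    r"(?P<bits>\d+(?:\.\d+)?)\s+(?P<evalue>\S+)\s*$"
)


@dataclass
class Hit:
    db: str
    entry: str
    accession: str
    description: str
    bits: float
    evalue: float


def parse_summary_line(line: str) -> Hit:
    """Split one summary-table line into its fields."""
    m = SUMMARY_LINE.match(line)
    if m is None:
        raise ValueError(f"not a summary line: {line!r}")
    evalue_text = m["evalue"]
    # Some BLAST versions print very small values without a leading digit (e.g. "e-120")
    if evalue_text.startswith("e"):
        evalue_text = "1" + evalue_text
    return Hit(m["db"], m["entry"], m["acc"], m["desc"],
               float(m["bits"]), float(evalue_text))


def significant(hits, cutoff=1e-5):
    """Keep hits whose E value is at or below a (hypothetical) stricter cutoff."""
    return [h for h in hits if h.evalue <= cutoff]


if __name__ == "__main__":
    table = [
        "EM_VEC:CVPJB8 X98612.1 Artificial cloning vector pjb8                1707   0.0",
        "EM_VEC:L8318BY AL354421.1 Leishmania major Friedlin cosmid L8318...    42   0.007",
        "EM_VEC:AF338825 AF338825.1 Cloning vector pHLH/int(+), complete ...    34   1.7",
    ]
    hits = [parse_summary_line(line) for line in table]
    for hit in significant(hits):
        print(hit.entry, hit.evalue)  # prints: CVPJB8 0.0
```

With this cutoff only the sequence 1 hits survive; the sequence 2 hits, with E values of 0.007 and above, are dropped.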

In the case of sequence 1, the E value ("Value" column) of every hit is 0.0 and the bit scores are high (1707). An E value of 0.0 means that a match this good is, for practical purposes, never expected by chance, so these hits are almost certainly the result of vector contamination. You can confirm this by looking at the alignments further down the page: each one runs over a long distance, with identical bases at every position and no gaps inserted, for example:

Query: 1    atgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcct 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 5197 atgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcct 5138

                                                                        
Query: 61   gtttttgctcacccagaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgca 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 5137 gtttttgctcacccagaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgca 5078
.
.
.
etc  
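The checks described above (long alignment, every position joined by "|", no gaps) can be made mechanically. A minimal sketch, assuming one block of BLAST pairwise text in the `Query:` / match / `Sbjct:` layout shown here, where gaps appear as "-" in a sequence line; `alignment_stats` is a hypothetical helper, not a BLAST tool.

```python
def alignment_stats(query_line: str, match_line: str, sbjct_line: str) -> dict:
    """Count identities, mismatches and gaps in one block of a BLAST pairwise alignment.

    Each sequence line has the form "Query: <start> <bases> <end>"; the "|"
    characters in the middle line sit in the same columns as the bases they join.
    """
    query = query_line.split()[2]
    sbjct = sbjct_line.split()[2]
    start = query_line.index(query)  # column where the bases begin
    # Pad in case trailing spaces were stripped from the match line
    bars = match_line[start:start + len(query)].ljust(len(query))
    identities = bars.count("|")
    # A "-" in either sequence is a gap inserted by the alignment
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    mismatches = len(query) - identities - gaps
    return {
        "length": len(query),
        "identities": identities,
        "mismatches": mismatches,
        "gaps": gaps,
        "percent_identity": round(100 * identities / len(query)),
    }


if __name__ == "__main__":
    q = "Query: 1    atgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcct 60"
    m = " " * 12 + "|" * 60
    s = "Sbjct: 5197 atgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcct 5138"
    print(alignment_stats(q, m, s))
```

For the first block of sequence 1 this reports 60 identities out of 60, with no mismatches and no gaps.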

In the case of sequence 2, the E values of all the hits are greater than zero. As a rule of thumb for a first look, an E value that is not written with an exponent (e.g. 1.2 x 10^-7, written 1.2e-07) is unlikely to be significant. The alignment for L8318BY, whose E value of 0.007 calls for a little further investigation, is shown below.
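The first-look rule of thumb for sequence 2 can be written as a one-line test on the E value exactly as BLAST prints it. A minimal sketch; `worth_a_closer_look` is a hypothetical name, and the rule is only the heuristic stated in the text, not a statistical threshold.

```python
def worth_a_closer_look(evalue_text: str) -> bool:
    """Apply the first-look rule of thumb to an E value as printed by BLAST.

    0.0 and values written with a negative exponent (e.g. "1.2e-07", "e-120")
    pass; plain decimals such as "0.007" or "6.6" do not.
    """
    text = evalue_text.strip().lower()
    if "e-" in text:
        return True
    return float(text) == 0.0


if __name__ == "__main__":
    for value in ["0.0", "1.2e-07", "0.007", "1.7", "6.6"]:
        print(value, worth_a_closer_look(value))
```

Under this rule the L8318BY hit (0.007) would be set aside, which is consistent with what its alignment shows.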

							  
>EM_VEC:L8318BY AL354421.1 Leishmania major Friedlin cosmid L8318.2 
t7 end-sequence, similar to CV59231 U59231 Cloning vector cLHYGpk, complete sequence...., N=2277, Prob=3.2e-192. Length = 650 Score = 42.1 bits (21), Expect = 0.007 Identities = 33/37 (89%) Strand = Plus / Minus Query: 3786 gtgtgcgtgtgtgtgcgtgtgcgtgcgtgcgtgcgtg 3822 ||||| ||||||||| ||||| ||| ||||||||||| Sbjct: 222 gtgtgtgtgtgtgtgtgtgtgtgtgtgtgcgtgcgtg 186 Score = 38.2 bits (19), Expect = 0.11 Identities = 31/35 (88%) Strand = Plus / Minus Query: 3792 gtgtgtgtgcgtgtgcgtgcgtgcgtgcgtgcgtg 3826 ||||||||| ||||| ||| ||| ||||||||||| Sbjct: 220 gtgtgtgtgtgtgtgtgtgtgtgtgtgcgtgcgtg 186

Here the alignments are very short (37 and 35 bases) and contain mismatches: positions where no "|" symbol joins the upper (query) and lower (subject) base. This is not a convincing match and can be ignored. It is probably the result of a similar sequence feature, here a simple GT-rich repeat, being present both in the query sequence and in the entry in the vector database, rather than of vector contamination.
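The L8318BY alignments are both short and made almost entirely of GT/GC repeats. A crude way to flag such hits is to check alignment length and how repetitive the aligned query is, measured here as the fraction of distinct 3-base words. This is a rough illustrative stand-in, not the low-complexity filtering BLAST itself can apply (for example DUST for nucleotides), and the thresholds `min_length=50` and `min_complexity=0.5` are hypothetical values chosen for the example.

```python
def distinct_kmer_fraction(seq: str, k: int = 3) -> float:
    """Fraction of the k-mer windows in seq that are distinct; low values mean repetitive sequence."""
    seq = seq.lower()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not windows:
        return 0.0
    return len(set(windows)) / len(windows)


def looks_spurious(aligned_query: str, min_length: int = 50, min_complexity: float = 0.5) -> bool:
    """Flag an alignment as suspect if it is short or its query bases are low-complexity.

    Both thresholds are hypothetical and would need tuning for real data.
    """
    return (len(aligned_query) < min_length
            or distinct_kmer_fraction(aligned_query) < min_complexity)


if __name__ == "__main__":
    repeat = "gtgtgcgtgtgtgtgcgtgtgcgtgcgtgcgtgcgtg"  # L8318BY query bases
    vector = "atgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcct"  # sequence 1
    print(looks_spurious(repeat), looks_spurious(vector))
```

The L8318BY query bases use only 5 distinct 3-base words across 35 windows, while the 60-base vector match from sequence 1 is long and varied, so only the former is flagged.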

You should now be convinced that sequence 1 is contaminated with vector sequence and that sequence 2 is free of it. Sequence 1 will need to have its vector sequence trimmed before it is useful.

Sequence 1 has at least five perfect hits against the EM_VEC database; had the summary not been limited to the top five, more might have been listed.
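One way to trim the vector from sequence 1 is sketched below, assuming the vector hits are given as query coordinates (the numbers on the `Query:` lines, 1-based and inclusive). Overlapping hits are merged, and the longest stretch of the query that no hit covers is kept. This is only one possible trimming policy; `merge_intervals` and `trim_vector` are hypothetical helpers, and replacing the vector region with N characters (masking) is a common alternative.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    # min/max guards against hits given with start > end
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def trim_vector(seq, vector_hits):
    """Return the longest stretch of seq not covered by any vector hit."""
    best = ""
    pos = 1  # first 1-based position not yet covered by a hit
    for start, end in merge_intervals(vector_hits):
        if start > pos:
            candidate = seq[pos - 1:start - 1]
            if len(candidate) > len(best):
                best = candidate
        pos = max(pos, end + 1)
    tail = seq[pos - 1:]
    if len(tail) > len(best):
        best = tail
    return best


if __name__ == "__main__":
    read = "AAAAACCCCCGGGGGTTTTT"  # toy sequence, not real data
    print(trim_vector(read, [(1, 5), (16, 20)]))  # prints: CCCCCGGGGG
```

Keeping the longest clean stretch discards short fragments left between vector hits, which are rarely useful on their own.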

Let us now consider one line of the summary table in detail:


                                                                   Score    E
Sequences producing significant alignments:                        (bits) Value
EM_VEC:L8318BY AL354421.1 Leishmania major Friedlin cosmid L8318...    42   0.007
.
.
etc.

  • EM_VEC: This is the name of the database, i.e. EMBL VECtors.

  • L8318BY: This is the EMBL entry name (ID), which links to the database entry; the entry can be retrieved from SRS. It is followed by the accession number and version, AL354421.1.

  • Leishmania major Friedlin cosmid L8318...: This is the start of the entry description "Leishmania major Friedlin cosmid L8318.2 t7 end-sequence, similar to CV59231 U59231 Cloning vector".

  • Bit score: A normalized score for the alignment; the higher the score, the better the match to the database entry.

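  • E value: The number of matches at least this good that would be expected by chance in a search of a database of this size. The lower the E value, the more significant the match; a value of 0.0 means the expected number is so small that it rounds to zero, as for the long, perfect vector matches of sequence 1. This is useful because a short sequence may produce many hits, none of which is a significant alignment; such hits have high E values.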
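Bit score and E value are linked: under the Karlin-Altschul statistics used by BLAST, a hit with bit score S' found with a query of length m against a database of total length n has, approximately, E = m x n x 2^(-S'). The sketch below evaluates that relationship; the lengths passed in are hypothetical, and real BLAST output uses effective (edge-corrected) lengths, and in some versions sum statistics, so printed values will not match exactly.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value for a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def bits_needed(evalue: float, query_len: int, db_len: int) -> float:
    """Bit score needed to reach a given E value (inverse of evalue_from_bits)."""
    return math.log2(query_len * db_len / evalue)


if __name__ == "__main__":
    m, n = 500, 10**9  # hypothetical query and database lengths
    print(evalue_from_bits(42.1, m, n))
    print(evalue_from_bits(42.1, m, 2 * n))  # twice the database, twice the E value
    print(bits_needed(1e-5, m, n))
```

Each extra bit of score halves the E value, and doubling the database size doubles it, which is why the same short alignment can look significant against a small database yet not against a large one.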

At the bottom of the page are statistical results as follows:


We will now consider the rest of the results page.

