2Can Support Portal - About Optimal Alignments
An optimal alignment is the one that is best under a defined set of rules and parameter values for comparing different alignments. There is no such thing as the single best alignment, since optimality always depends on the assumptions on which the alignment is based. For example, what penalty should gaps carry? All sequence alignment procedures make some such assumptions.
Global alignment
An alignment that assumes that the two proteins are broadly similar over their entire length. The alignment attempts to match them to each other from end to end, even if parts of the alignment are not very convincing. A tiny example:
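End-to-end alignment of this kind is classically computed by dynamic programming (the Needleman-Wunsch approach). The following is a minimal sketch rather than the method used by any particular tool: the function name `global_align` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, whereas real tools normally use a substitution matrix and separate gap-open and gap-extend penalties. The two sequences are the 21-residue fragments that appear in the example on this page.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an end-to-end alignment.

    Scoring values are illustrative assumptions, not defaults of any tool.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] against nothing: all gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # pair a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


seq1 = "NLGPSTKDFGKISESREFDNQ"
seq2 = "QLNQLERSFGKINMRLEDALV"
# The "optimal" result depends on the chosen parameters: compare two gap penalties.
for gap_penalty in (-2, -1):
    s, top, bottom = global_align(seq1, seq2, gap=gap_penalty)
    print(gap_penalty, s)
    print(top)
    print(bottom)
```

Running the loop with different gap penalties is a quick way to see that the optimal alignment is only optimal relative to the chosen parameters.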
Local alignment
An alignment that searches for segments of the two sequences that match well. There is no attempt to force entire sequences into an alignment, just those parts that appear to have good similarity, according to some criterion. Using the same sequences as above, one could get:
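Searching for the best-matching segments is classically done with a variant of dynamic programming in which scores are never allowed to fall below zero (the Smith-Waterman approach). The sketch below is illustrative only: the function name `local_align` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions, not the settings of any specific program. The sequences are the two fragments that appear in the example on this page.

```python
def local_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, segment_a, segment_b) for the best local alignment.

    Scoring values are illustrative assumptions, not defaults of any tool.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score of an alignment ending at a[i-1], b[j-1];
    # a value of 0 means "start a fresh alignment here".
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, seg1, seg2 = local_align("NLGPSTKDFGKISESREFDNQ", "QLNQLERSFGKINMRLEDALV")
print(score)
print(seg1)
print(seg2)
```

Unlike an end-to-end alignment, the result covers only the best-matching segment of each sequence; the flanking residues are simply left out.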
It may seem that one should always use local alignments. However, if one uses only local alignment it may be difficult to spot an overall similarity, as opposed to just a domain-to-domain similarity, so global alignment is useful in some cases. You can produce a global or a local alignment with the EMBOSS Pairwise global and local alignment tool. You can search sequence databases, producing local alignments of your query sequence against known sequences, with the programs BLAST and FASTA. ClustalW2 is a general-purpose multiple sequence alignment program for DNA or proteins; use it if you wish to compare your sequences against each other. If you do not like the positions chosen by the programs, you can edit the resulting alignments with the tool Jalview.
Example alignment with ClustalW2
e.g. A multiple sequence alignment was done with ClustalW2 using FOSB_MOUSE vs FOSB_HUMAN. Sequences were input in the FASTA format:
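In FASTA format each record consists of a header line beginning with ">" followed by one or more lines of sequence. As a rough sketch of how such input can be read, the hypothetical helper `parse_fasta` below splits FASTA text into (header, sequence) pairs. The two records in `example` are short placeholders invented for illustration; they are not the FOSB_MOUSE or FOSB_HUMAN sequences.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder records (hypothetical, for illustration only).
example = """>seq1 hypothetical placeholder record
MKDELA
GTR
>seq2 hypothetical placeholder record
MRNEAI
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```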
Output was in the format:
Example alignments with DbClustal
DbClustal addresses the important problem of the automatic multiple alignment of the top scoring full-length sequences detected by a database similarity search. By combining the advantages of both local and global alignment algorithms into a single system, DbClustal is able to provide accurate global alignments of highly divergent, complex sequence sets. Local alignment information is incorporated into a ClustalW2 global alignment in the form of a list of anchor points between pairs of sequences.
Reference:
J. D. Thompson, F. Plewniak, J.-C. Thierry and O. Poch (2000).
DbClustal: rapid and reliable global multiple alignments of protein sequences detected by database searches.
Nucleic Acids Research, Vol. 28, No. 15, 2919-2926.
Example:
The sequence for FOSB_MOUSE was queried with NCBI-BLASTp (or WU-BLAST2) against UniProt. The alignments are shown with DbClustal against identical or similar proteins found in the NCBI BLAST output.
(a) Query (FOSB_MOUSE) vs FOSB_MOUSE (identical)
(b) Query (FOSB_MOUSE) vs FOSB_HUMAN (similar)
(c) Query (FOSB_MOUSE) vs FOS_RAT (less similar)
Example alignment in Edinburgh Format
Edinburgh format places the two aligned sequences directly on top of each other, with `*' to show identities and `.' to show conservative replacements on a line above the aligned pair.
Example: FOSB_MOUSE (Qy) vs FOSB_HUMAN (Db)
Example alignment in Intelligenetics Format
Intelligenetics format uses `|' to show identities and `:' to show conservative replacements and places these indicators between the two aligned sequences.
Example: FOSB_MOUSE (Qy) vs FOSB_HUMAN (Db)
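The Edinburgh and Intelligenetics layouts differ only in which marker characters are used and where the marker line is placed, so both can be generated from the same pair of aligned strings. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the residue groups in `CONSERVATIVE_GROUPS` are an invented stand-in for whatever definition of "conservative replacement" a real program uses (typically derived from a substitution matrix). The input strings are short placeholders, not the FOSB sequences.

```python
# ASSUMPTION: hypothetical residue groups used only to illustrate the idea of
# a "conservative replacement"; real tools usually consult a scoring matrix.
CONSERVATIVE_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ")


def is_conservative(x, y):
    """True if x and y fall in the same illustrative residue group."""
    return any(x in group and y in group for group in CONSERVATIVE_GROUPS)


def marker_line(top, bottom, identity, conservative):
    """Build the marker line for two already-aligned, equal-length strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            marks.append(" ")             # gap column: no marker
        elif x == y:
            marks.append(identity)
        elif is_conservative(x, y):
            marks.append(conservative)
        else:
            marks.append(" ")
    return "".join(marks)


def edinburgh(top, bottom):
    """Markers ('*' and '.') go on a line above the aligned pair."""
    return "\n".join([marker_line(top, bottom, "*", "."), top, bottom])


def intelligenetics(top, bottom):
    """Markers ('|' and ':') go between the two aligned sequences."""
    return "\n".join([top, marker_line(top, bottom, "|", ":"), bottom])


print(edinburgh("MKDE-L", "MRNEAI"))
print()
print(intelligenetics("MKDE-L", "MRNEAI"))
```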
Other Alignment Formats
pairwise
Aligns your query sequence and database matches in pairs. Matches are connected with a "|" symbol. Mismatches are shown with a space. Gaps are introduced with a "-" symbol.
e.g.
M/S with identities
The database alignments are anchored to (shown in relation to) your query sequence.
Identities are displayed as dots (.).
Mismatches are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Gaps are introduced with a "-" symbol.
e.g.
M/S without identities
The database alignments are anchored to (shown in relation to) your query sequence.
Identities are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Mismatches are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Gaps are introduced with a "-" symbol.
e.g.
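The two query-anchored displays differ only in how identical positions are drawn. As a small sketch, the hypothetical function `query_anchored` below takes a query row and database rows that are assumed to be already aligned to equal length, and either replaces identities with dots or leaves the letters in place. The nucleotide strings are invented placeholders.

```python
def query_anchored(query, hits, identities_as_dots=True):
    """Return display rows: the query first, then each hit relative to it.

    All rows are assumed to be pre-aligned and of equal length.
    """
    rows = [query]
    for hit in hits:
        if len(hit) != len(query):
            raise ValueError("rows must be aligned to the same length")
        if identities_as_dots:
            # Identical positions become '.', mismatches keep their letter,
            # and gaps stay as '-'.
            row = "".join("." if h == q and h != "-" else h
                          for q, h in zip(query, hit))
        else:
            row = hit                     # identities shown as letters
        rows.append(row)
    return rows


for line in query_anchored("acgtacgt", ["acgaac-t", "acgtacgt"]):
    print(line)
```

Calling the function with `identities_as_dots=False` gives the "without identities" display, in which every position is shown as a letter.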
Flat Query-anchored with identities
The 'flat' display shows inserts as deletions on the query.
Identities are displayed as dots.
Mismatches are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Gaps are introduced with a "-" symbol.
e.g.
Flat Query-anchored without identities
The 'flat' display shows inserts as deletions on the query.
Identities are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Mismatches are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Gaps are introduced with a "-" symbol.
e.g.
EDITING AN ALIGNMENT
You can edit the alignment using Jalview.
PHYLOGENETIC TREE
A phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, in which branch lengths are proportional to the amount of inferred evolutionary change. A cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny in which the branches are of equal length; cladograms therefore show common ancestry but do not indicate the amount of evolutionary "time" separating taxa. Tree distances can be shown; click on the diagram to get a menu of options.
example:
NLGPSTKDFGKISESREFDNQ || || || QLNQLERSFGKINMRLEDALV