Basic usage
Given an existing alignment in a file "data" in "plain" format, the minimal
use might be:
mview -in plain data > data.out
Or you might attach MView on the end of a pipeline:
some_process | mview -in plain > data.out
To change the input format to scan a FASTA run, also in "data", use:
mview -in fasta data > data.out
Basic usage: adding HTML
To add some HTML markup a few extra options are needed, for example:
mview -in fasta -html body data > data.html
produces a page of HTML wrapped inside <BODY> </BODY> tags with a
coloured background, which you can load into your Web browser with a URL
like "file://your_path/data.html".
If you want a complete Web page, you can use -html full
(gives MIME-type, <HTML>, <BODY>
tags) or -html head (gives <HTML>,
<BODY> tags).
To get just the alignment block without these tags use -html
data.
Adding some colour is simple. To colour all the residues:
mview -in fasta -html head -coloring any data > data.html
and this looks better in my Netscape if the residues are emboldened, so
mview -in fasta -html head -coloring any -bold data > data.html
Now try colouring by identity to the first sequence:
mview -in fasta -html head -coloring identity -bold data > data.html
and then make the non-identical residues and gaps grey, instead of
black:
mview -in fasta -html head -coloring identity -bold -symcolor gray \
      -gapcolor gray data > data.html
Now try using an internal style sheet to get blocked
colouring. The -bold option is no longer needed:
mview -in fasta -html head -css on -coloring identity -symcolor gray \
      -gapcolor gray data > data.html
The -in option isn't always necessary. If the filename
extension, or the filename itself minus any directory path begins with or
contains the first few letters of the valid -in options (eg.,
mydata.msf or mydata.fasta or
tfastx_run1.dat), MView tries to choose a
sensible input format, allowing multiple files in mixed formats to be
supplied on the command line. The -in option will always
override this mechanism but requires that all input files be of the same
format.
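The filename-based guess described above could be sketched roughly as follows. This is a Python illustration only, not MView's own code: KNOWN_FORMATS holds just the format names mentioned in this manual, the three-letter prefix is an assumption standing in for "the first few letters", and guess_format is a hypothetical helper name.

```python
import os

# Only the format names mentioned in this manual; the real -in list is longer.
KNOWN_FORMATS = ["plain", "fasta", "msf", "tfastx", "blast"]

def guess_format(path, prefix_len=3):
    """Return a guessed input format for path, or None if nothing matches."""
    name = os.path.basename(path).lower()          # drop any directory path
    ext = os.path.splitext(name)[1].lstrip(".")    # "" when there is no extension
    prefixes = [(fmt, fmt[:prefix_len]) for fmt in KNOWN_FORMATS]
    # 1. The filename extension begins with the first letters of a format.
    for fmt, prefix in prefixes:
        if ext.startswith(prefix):
            return fmt
    # 2. The filename itself begins with the first letters of a format.
    for fmt, prefix in prefixes:
        if name.startswith(prefix):
            return fmt
    # 3. The filename contains the first letters of a format.
    for fmt, prefix in prefixes:
        if prefix in name:
            return fmt
    return None

for path in ["mydata.msf", "mydata.fasta", "/runs/tfastx_run1.dat"]:
    print(path, "->", guess_format(path))
```

Checking "begins with" before "contains" is a design choice of this sketch: it stops tfastx_run1.dat from being taken for plain FASTA just because the name also contains "fas".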
Rulers
Add a ruler along the top, with -ruler on. Only one kind of
ruler is currently provided, numbering the columns of the final alignment
from M to N (incrementing) or N to M (decrementing) based on the input
sequence numbering, if any. This defaults to 1 to the length of the
alignment for multiple alignments. TBLASTX rulers differ slightly in that
the native query numbering is given in nucleotide units, but
MView reports amino acid units instead (using modulo 3
arithmetic).
Alignment colouring modes
There are several ways to colour the alignment:
-coloring any, will colour every residue according to the
currently selected palette.
-coloring identity, will colour only those residues that are
identical to some reference sequence (usually the
query or first row).
-coloring consensus, will colour only those residues that
belong to a specified physicochemical class that is conserved in at least
a specified percentage of all rows for a given column. This defaults to 70%
and may be set to another threshold, eg., -coloring
consensus -threshold 80 would specify 80%. Note that the physicochemical
classes in question can be confined to individual residues.
-coloring group, is like -coloring consensus,
but colours residues by the colour of the class to which they belong.
By default, the consensus computation counts gap characters, so that
sections of the alignment may be uncolored where the presence of gaps
prevents the non-gap count from reaching the threshold. Setting
-con_gaps off prevents this, allowing sequence-only based
consensus thresholding.
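To see why counting gaps can leave a region uncoloured, here is a small Python sketch of the percentage calculation. It is an illustration under assumptions rather than MView's implementation: the gap symbols '-' and '.' and the helper name conserved_percent are hypothetical.

```python
GAP_CHARS = "-."  # assumed gap symbols

def conserved_percent(column, members, count_gaps=True):
    """Percentage of counted rows in a column whose residue is in members."""
    residues = [c.upper() for c in column]
    if not count_gaps:
        # Analogous to -con_gaps off: gaps are left out of the row count.
        residues = [c for c in residues if c not in GAP_CHARS]
    if not residues:
        return 0.0
    hits = sum(1 for c in residues if c in members)
    return 100.0 * hits / len(residues)

column = "LLLIV-----"            # five aliphatic residues and five gaps
aliphatic = {"I", "L", "V"}
print(conserved_percent(column, aliphatic, count_gaps=True))   # 50.0
print(conserved_percent(column, aliphatic, count_gaps=False))  # 100.0
```

With gaps counted, this column falls below the default 70% threshold and would stay uncoloured; with gaps ignored it passes.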
The default palette assumes the input alignment is of protein sequences
and sets their colours according to amino acid physicochemical properties:
another palette should be selected for DNA or RNA alignments.
Consensus colouring is more involved and requires some understanding of
palettes and consensus patterns, so these are described first, before
alignment consensus colouring itself.
Colour palettes
Palettes have (arbitrary) names. By default, MView assumes a
protein alignment and uses the palette P1. For nucleotide
alignments the default palette is D1, which is selected by changing
the default molecule type with -dna. Other palettes are selected
explicitly using the -colormap option. For example, to select one
of the built-in palettes for viewing nucleotide sequences, use
-colormap D1.
The built-in palettes can be listed from the command line with
-listcolors, and new colour schemes can be loaded from a file
using -colorfile in exactly the same format as produced by
-listcolors. Palette names are case-insensitive, while symbols
to be coloured are case-sensitive. Lines can contain comments beginning
with a hash '#' character. Colours are specified as hexadecimal RGB codes
prefixed with hash '#', exactly as used in HTML markup (named colours may
not be supported equally by all browsers). Here are the default palettes:
[P1]
#protein: highlight amino acid physicochemical properties
#symbols => color [#comment]
* => #666666 #mismatch (dark grey)
? => #999999 #unknown (light grey)
Aa => #33cc00 #hydrophobic (bright green)
Bb => #666666 #D or N (dark grey)
Cc => #ffff00 #cysteine (yellow)
Dd => #0033ff #negative charge (bright blue)
Ee => #0033ff #negative charge (bright blue)
Ff => #009900 #large hydrophobic (dark green)
Gg => #33cc00 #hydrophobic (bright green)
Hh => #009900 #large hydrophobic (dark green)
Ii => #33cc00 #hydrophobic (bright green)
Kk => #cc0000 #positive charge (bright red)
Ll => #33cc00 #hydrophobic (bright green)
Mm => #33cc00 #hydrophobic (bright green)
Nn => #6600cc #polar (purple)
Pp => #33cc00 #hydrophobic (bright green)
Qq => #6600cc #polar (purple)
Rr => #cc0000 #positive charge (bright red)
Ss => #0099ff #small alcohol (dull blue)
Tt => #0099ff #small alcohol (dull blue)
Vv => #33cc00 #hydrophobic (bright green)
Ww => #009900 #large hydrophobic (dark green)
Xx => #666666 #any (dark grey)
Yy => #009900 #large hydrophobic (dark green)
Zz => #666666 #E or Q (dark grey)
[D1]
#DNA: highlight purine versus pyrimidine
#symbols => color [#comment]
* => #666666 #mismatch (dark grey)
? => #999999 #unknown (light grey)
Aa => #0033ff #purine (bright blue)
Bb => #666666 #C or G or T; not A (dark grey)
Cc => #0099ff #pyrimidine (dull blue)
Dd => #666666 #A or G or T; not C (dark grey)
Gg => #0033ff #purine (bright blue)
Hh => #666666 #A or C or T; not G (dark grey)
Kk => #666666 #G or T (dark grey)
Mm => #666666 #A or C (dark grey)
Nn => #666666 #A or C or G or T (dark grey)
Rr => #666666 #A or G (dark grey)
Ss => #666666 #C or G (dark grey)
Tt => #0099ff #pyrimidine (dull blue)
Uu => #0099ff #pyrimidine (dull blue)
Vv => #666666 #A or C or G; not T (dark grey)
Ww => #666666 #A or T (dark grey)
Xx => #666666 #any (dark grey)
Yy => #666666 #C or T (dark grey)
In these examples, both lower and uppercase versions of each residue are
given with their associated colour to ensure that either case is coloured
the same.
The arrow separating the symbols from the colour codes can be double
=> or single ->. When style
sheets have been selected with -css on, a double arrow means
that the colour should be applied to the background of the symbol while a
single arrow means that only the letter should be coloured. When style
sheets are off, only letters can be coloured anyway and the arrows are
equivalent.
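As an illustration of the palette file format just described, the following Python sketch reads palette text into a dictionary. It is not part of MView. The function name parse_palettes and the strict six-digit hexadecimal colour pattern are assumptions, and the foreground/background distinction it records only matters when style sheets are on.

```python
import re

# symbols, an arrow, a hexadecimal RGB colour, then an optional trailing comment
ENTRY = re.compile(
    r"^(?P<symbols>\S+?)\s*(?P<arrow>=>|->)\s*(?P<colour>#[0-9A-Fa-f]{6})"
    r"(?:\s+#.*)?$"
)

def parse_palettes(text):
    """Return {PALETTE_NAME: {symbol: (colour, target)}} from palette text."""
    palettes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        if line.startswith("[") and line.endswith("]"):
            # Palette names are case-insensitive, so store them uppercased.
            current = palettes.setdefault(line[1:-1].upper(), {})
            continue
        match = ENTRY.match(line)
        if match is None or current is None:
            raise ValueError(f"cannot parse palette line: {raw!r}")
        # Double arrow: colour the background; single arrow: colour the letter.
        target = "background" if match.group("arrow") == "=>" else "foreground"
        for symbol in match.group("symbols"):  # symbols are case-sensitive
            current[symbol] = (match.group("colour").lower(), target)
    return palettes

sample = """\
[p1]
#symbols => color [#comment]
Aa => #33cc00  #hydrophobic (bright green)
Cc -> #ffff00  #cysteine (yellow)
"""
palettes = parse_palettes(sample)
print(palettes["P1"]["a"])  # ('#33cc00', 'background')
print(palettes["P1"]["C"])  # ('#ffff00', 'foreground')
```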
Consensus patterns
A block of consensus lines can be added beneath the alignment using
-consensus on. By default, this adds 4 extra lines giving
consensus patterns computed at thresholds of 100,90,80,70%.
Consensus patterns are based on residue equivalence classes, that is,
sets of residues that share some physicochemical property. There are two
default consensus group definitions for protein P1 and
nucleotide D1 alignments, the latter being selected with the
-dna option.
At a given percentage threshold, the most discriminating equivalence
class is chosen to represent the residues in a given column and an
associated symbol is displayed. For example, the default protein and
nucleotide consensus groups define the following symbols and equivalence
class mappings:
[P1]
#protein consensus: report conserved physicochemical classes, derived from
#the Venn diagrams of:
# Taylor W. R. (1986). The classification of amino acid conservation.
# J. Theor. Biol. 119:205-218.
#as used in:
# Bork, P., Brown, N.P., Hegyi, H., Schultz, J. (1996). The protein
# phosphatase 2C (PP2C) superfamily: Detection of bacterial homologues.
# Protein Science. 5:1421-1425.
#description => symbol members
* => .
A => A { A }
C => C { C }
D => D { D }
E => E { E }
F => F { F }
G => G { G }
H => H { H }
I => I { I }
K => K { K }
L => L { L }
M => M { M }
N => N { N }
P => P { P }
Q => Q { Q }
R => R { R }
S => S { S }
T => T { T }
V => V { V }
W => W { W }
Y => Y { Y }
alcohol => o { S, T }
aliphatic => l { I, L, V }
aromatic => a { F, H, W, Y }
charged => c { D, E, H, K, R }
hydrophobic => h { A, C, F, G, H, I, K, L, M, R, T, V, W, Y }
negative => - { D, E }
polar => p { C, D, E, H, K, N, Q, R, S, T }
positive => + { H, K, R }
small => s { A, C, D, G, N, P, S, T, V }
tiny => u { A, G, S }
turnlike => t { A, C, D, E, G, H, K, N, Q, R, S, T }
[D1]
#DNA consensus: report conserved ring types
#description => symbol members
* => .
A => A { A }
C => C { C }
G => G { G }
T => T { T }
U => U { U }
purine => r { A, G }
pyrimidine => y { C, T, U }
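The choice of a consensus symbol for one column can be sketched in Python using the protein classes listed above. This is an illustration, not MView's algorithm: it assumes that "most discriminating" means the class with the fewest members that still reaches the threshold, it counts gaps as non-members, and it breaks ties between classes of equal size arbitrarily. The name consensus_symbol is hypothetical.

```python
# Property classes copied from the [P1] consensus group listing above.
CLASSES = {
    "o": set("ST"), "l": set("ILV"), "a": set("FHWY"), "c": set("DEHKR"),
    "h": set("ACFGHIKLMRTVWY"), "-": set("DE"), "p": set("CDEHKNQRST"),
    "+": set("HKR"), "s": set("ACDGNPSTV"), "u": set("AGS"),
    "t": set("ACDEGHKNQRST"),
}
# Each residue is also a class containing only itself.
for residue in "ACDEFGHIKLMNPQRSTVWY":
    CLASSES[residue] = {residue}

def consensus_symbol(column, threshold=70.0):
    """Return the symbol of the smallest class covering >= threshold % of rows."""
    residues = [c.upper() for c in column]
    if not residues:
        return "."
    best = None
    for symbol, members in CLASSES.items():
        percent = 100.0 * sum(c in members for c in residues) / len(residues)
        if percent >= threshold:
            if best is None or len(members) < len(CLASSES[best]):
                best = symbol
    return best if best is not None else "."  # '.' when nothing is conserved

print(consensus_symbol("LLLLLLLLLL"))        # L
print(consensus_symbol("LLLIIVVLLL"))        # l
print(consensus_symbol("STSTSTSTDE", 80))    # o
print(consensus_symbol("STSTSTSTDE", 100))   # p
```

The last two lines show how raising the threshold pushes the result from a narrow class (alcohol) to a broader one (polar).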
Alternative equivalence classes can be selected using
-con_groupmap, the available list of built-ins can be seen
with -listgroups, and new groups can be defined in the same
format and read in from a file using -groupfile.
Alternative thresholds to be displayed can be specified as a
comma-separated list using the -con_threshold option.
Tip: A useful capability is to control whether only consensus properties
(-con_ignore singleton) or just the conserved residues
themselves (-con_ignore class) are displayed in consensus
lines. The default is to show both using whichever equivalence class is
most specific.
By default, the consensus computation counts gap characters, so that
sections of the alignment may have gaps as the consensus. Setting
-con_gaps off prevents this, producing consensus patterns based only on
sequence.
You can specify a colour scheme for the consensus lines using
-con_coloring and -con_colormap to change the
default palette (PC1 for protein or DC1 for
nucleotide). These options are analogous to those for controlling the
alignment colouring and follow the same naming scheme.
Alignment consensus colouring
This section assumes an understanding of palettes and consensus patterns.
Colouring of an alignment by consensus determines which residues to
colour and the colours to use based on (1) the consensus threshold chosen
for the colouring operation (covered in the section on alignment colouring modes), (2) a
consideration of the common physicochemical properties of the residues in
that column, and (3) the chosen colour scheme:
Given the most specific equivalence class describing the column
using the prevailing consensus equivalence classes, any residues in the
column belonging to that class will be coloured using the prevailing
palette.
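A minimal Python sketch of that rule for a single column follows. It assumes the equivalence class for the column has already been chosen, and it uses a few entries copied from the P1 palette. The helper name colour_column is hypothetical and this is not MView's code.

```python
def colour_column(column, class_members, palette):
    """Return (residue, colour) pairs; colour is None for uncoloured residues."""
    coloured = []
    for residue in column:
        if residue.upper() in class_members:
            # The residue belongs to the conserved class: use its palette colour.
            coloured.append((residue, palette.get(residue)))
        else:
            coloured.append((residue, None))
    return coloured

# A few entries taken from the P1 palette shown earlier.
palette = {"I": "#33cc00", "L": "#33cc00", "V": "#33cc00", "F": "#009900"}
aliphatic = {"I", "L", "V"}
for residue, colour in colour_column("LIVF-", aliphatic, palette):
    print(residue, colour)
```

Here F has a palette colour of its own but stays uncoloured because it is outside the conserved class.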
In practice, for the default situation of a protein alignment and no
special selection of palettes or consensus groups from the command line,
the P1 equivalence classes and the P1 colour palette
will be used. With the -dna option, the D1 equivalence
classes and the D1 colour palette are used instead.
Tip: If you want to see only the conserved residues above the threshold
(ie., only one type of conserved residue per column), add the option
-ignore class.
Alternative consensus classes and palettes can be specified using
-groupmap and -colormap. Note that these are
distinct from any settings used to control displayed consensus lines,
although the option naming is similar.
Sequence numbering or ranking
One can colour and compute identities with respect to a sequence other
than the first/query sequence using the -reference
option. This takes either the sequence identifier or an integer argument
corresponding to the ranking or ordering of a sequence. For multiple
alignment input formats, sequences are numbered from 1, while for searches
the hits are numbered from 1, but the query itself is 0, so
beware.
The labelling information can be too broad, so you can switch some of it
off. Labels at the left of the alignment are in blocks numbered from zero:
(0) rank, (1) identifier, (2) description, (3) score block, (4) percent
identities. Each of these can be switched off with an option like
-label2, which removes descriptions.
The default layout is a single unbroken horizontal band of alignment -
fine if scrolling inside Netscape. However, you may prefer to break the
alignment into vertically stacked panes. For panes, for example, 80 columns
wide, set -width 80. Widths refer to the alignment, not to the
descriptor information at left.
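The effect of -width on the layout can be pictured with a short Python sketch that cuts each aligned row into stacked panes. It is an illustration only. The function name split_into_panes is hypothetical, and as in the text the width applies to the alignment columns, not to the labels.

```python
def split_into_panes(rows, width):
    """Split (label, aligned_sequence) rows into vertically stacked panes."""
    if width <= 0:
        raise ValueError("width must be positive")
    length = max((len(seq) for _, seq in rows), default=0)
    panes = []
    for start in range(0, length, width):
        # Every pane repeats the labels and shows the next slice of columns.
        panes.append([(label, seq[start:start + width]) for label, seq in rows])
    return panes

rows = [("seq1", "ABCDEFGHIJ"), ("seq2", "KLMNOPQRST")]
for pane in split_into_panes(rows, 4):
    for label, chunk in pane:
        print(label, chunk)
    print()
```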
It is possible to narrow (or expand!) the displayed sequence range, for
example, -range 10:78 would select only that column range of
the alignment using the numbering scheme reported when -ruler
on is set (see Rulers). The order of the
numbers is unimportant making it simpler to state interest in a region of
the alignment that might actually be reversed in the output (eg., a BLASTN
search hit matching the reverse complement of the query strand).
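Because the order of the two numbers does not matter, a range argument can be normalised before use. This Python sketch shows one way to do that; the helper name parse_range is hypothetical and this is not MView's own parser.

```python
def parse_range(text):
    """Parse 'M:N' into (low, high), whichever order the numbers come in."""
    first, last = (int(part) for part in text.split(":"))
    return (min(first, last), max(first, last))

print(parse_range("10:78"))  # (10, 78)
print(parse_range("78:10"))  # (10, 78)
```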
Filtering alignments
Usually, specifying a limited number of hits to view from a long search
alignment speeds things up a lot as there's less parsing and less
formatting to be generated, so to get the best 10 hits, use the option
-top 10.
You also can squeeze more out of a deep alignment and get a less biased
view if a threshold on the pairwise sequence identity is set using
-maxident N, where N is some value between 0 and 100.
Other filters specific to BLASTP, FASTA, etc., input formats allow
cutoffs on scores or p-values, etc. In particular, it is possible to apply
some control over the selection of HSPs used in
building the MView alignment using the -hsp
filtering option.
Of interest to anyone using PSI-BLAST, you can display alignments for
any/all iterations of a PSI-BLAST run using, say:
mview -in blast -cycle 1,5,10,20 mydata
to get just those iterations. The default is to display only the last
iteration. If you want all output, use -cycle '*'.
Rows can be dropped explicitly using the -disc option. This
can be supplied a comma separated list of row identifiers, rank numbers
(see above for an explanation of sequence rank numbers), rank number
ranges, or regular expressions (case insensitive, enclosed between '/'
characters) to match against row identifiers.
Likewise, the -keep option specifies a list of rows to keep
in the alignment. The -keep option overrides
-disc whenever a row is common to both.
For example, the options
-disc '/.*/' -keep '2,3,6..10,/^pdb/'
would discard everything except rows 2, 3, 6 through 10 inclusive, and any
hits beginning with the string 'pdb'.
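The interplay of the two lists in this example can be sketched in Python. The code is an illustration under assumptions, not MView's implementation: the helper names matches and select_rows and the row identifiers are hypothetical, and splitting on commas means a regular expression that itself contains a comma is not handled.

```python
import re

def matches(selector, rank, identifier):
    """True if a comma separated selector list matches this row."""
    for item in selector.split(","):
        item = item.strip()
        if not item:
            continue
        if len(item) >= 2 and item.startswith("/") and item.endswith("/"):
            # Regular expression, matched case-insensitively against the identifier.
            if re.search(item[1:-1], identifier, re.IGNORECASE):
                return True
        elif re.fullmatch(r"\d+\.\.\d+", item):
            # Rank number range such as 6..10, inclusive at both ends.
            low, high = sorted(int(n) for n in item.split(".."))
            if low <= rank <= high:
                return True
        elif item.isdigit():
            if int(item) == rank:
                return True
        elif item == identifier:
            return True
    return False

def select_rows(rows, disc="", keep=""):
    """Drop rows matching disc unless they also match keep."""
    return [
        (rank, identifier)
        for rank, identifier in rows
        if not matches(disc, rank, identifier) or matches(keep, rank, identifier)
    ]

rows = [(1, "sp|P1"), (2, "sp|P2"), (3, "sp|P3"), (4, "sp|P4"),
        (5, "PDB|1abc"), (6, "sp|P6"), (11, "sp|P11")]
kept = select_rows(rows, disc="/.*/", keep="2,3,6..10,/^pdb/")
print([rank for rank, _ in kept])  # [2, 3, 5, 6]
```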
Note: the currently set reference row is still
used for percent identity and colouring operations, even though the row may
have been dropped from display by the -disc list.
Another control option can be used to prevent MView
from using rows for colouring or for calculation of percent identities
although these rows will still be displayed. Use -nop to
specify a list (comma separated as usual) of id's or row numbers to flag
for 'NO Processing'. This is useful for displaying non-alignment data (eg.,
secondary structure predictions) alongside the alignment.
SRS (Sequence Retrieval System) links
If HTML markup is produced it is possible to embed SRS links in sequence
identifiers if they conform to the following patterns:
database|accession|identifier
database:identifier
as produced by some BLAST and FASTA servers. Such links will be to
the EBI and EMBL SRS services and will only be constructed if the database
names are listed in the SRS.pm library with this software. This library can
be modified for your site if you know some Perl and a little SRS syntax.
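The two identifier patterns can be recognised with a few lines of Python. This sketch only splits the identifier. It does not model the database lookup in SRS.pm or the construction of the link itself, and both the function name split_identifier and the sample identifiers are hypothetical.

```python
def split_identifier(identifier):
    """Return (database, accession, name), or None if neither pattern fits.

    accession is None for the two-part database:identifier form.
    """
    parts = identifier.split("|")
    if len(parts) == 3 and all(parts):
        return tuple(parts)                      # database|accession|identifier
    database, sep, name = identifier.partition(":")
    if sep and database and name and "|" not in identifier:
        return (database, None, name)            # database:identifier
    return None

print(split_identifier("sw|P12345|ABC_HUMAN"))  # ('sw', 'P12345', 'ABC_HUMAN')
print(split_identifier("pdb:1abc"))             # ('pdb', None, '1abc')
print(split_identifier("plainname"))            # None
```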
Using Cascading Style Sheets
Release 1.40 added cascading style sheets allowing more specific control
of HTML elements. In particular, this enables selective colouring of text
fore/backgrounds allowing alignments to use coloured blocks instead of just
coloured lettering.
This is enabled with the -css on option in combination with
the -html option to switch HTML processing on generally. It is
disabled with -css off. You can refer to an external style
file with -css URL, where the URL gives a valid path for the Web
server to find the file (ie., file:/some/path or
http://server/path).
Having loaded your own colour schemes into MView with
the -colorfile option, you can write these out as a style
file with -html css, which sends the style sheet to
standard output for redirection to a file.
Controlling coloured fore/backgrounds for alignment lettering is handled
in the colour scheme definition mechanism.