
PRANKSTER: Graphical aligner and alignment browser

Introduction


PRANKSTER is a graphical front-end to the multiple sequence alignment program PRANK. It provides an easy-to-use interface to PRANK and displays the resulting alignments in a browsable window. It also reads and writes several alignment formats, prints to a printer or to a PostScript file and, for the native HSAML format, supports filtering and selection of data according to the alignment posterior probabilities. For those using PRANK on the command line, PRANKSTER provides a GUI to browse the result files.

Executable binaries are provided for Linux, MacOSX and Windows.


Using PRANKSTER


Disclaimer

PRANKSTER has been developed and best tested on Linux. It is known to run on MacOSX and Windows, but its correct behaviour on those platforms is not confirmed. The author takes no responsibility for any damage that the software may cause to your computer, scientific career, family life or anything else.


Download

PRANKSTER is written in C++ and Qt4. The code is © Ari Loytynoja and distributed under the GPL; the exceptions are the eigen routines and the sequence input/output functions, which come from the PAML and readseq packages and are © Ziheng Yang and Don Gilbert, respectively. A snapshot of the current development source code can be obtained by request from Ari Loytynoja.

PRANKSTER binaries can be downloaded from here.


Installation

Linux

Due to licensing issues, PRANKSTER is written using the new Qt 4 graphics libraries. Although the Qt libraries are the basis of any Linux computer running KDE, version 4 is not yet included in any major distribution, so compiling the source code requires installing these libraries first. To avoid this time-consuming step, the software is provided as a precompiled binary.

Download the file prankster.linux.*.tgz, place it in the preferred directory and type

tar xzf prankster.linux.*.tgz
(chmod +x prankster)
./prankster

or open a file browser and click the icon.

Different Linux distributions may not include the same versions of libraries, or may place these libraries in slightly different locations. If the precompiled binary does not work on your system, check whether the libraries required by PRANKSTER (use the command ldd prankster) are present and, if needed, create symbolic links between the actual and expected locations. Contact your systems support for more help.
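
The library check described above can be automated. The following is a minimal sketch, not part of PRANKSTER: it assumes the usual "name => not found" line format of ldd output, and the function names and the sample library names are hypothetical.

```python
import subprocess


def missing_libraries(ldd_output: str) -> list:
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # An unresolved entry typically looks like: "\tlibQtGui.so.4 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str = "./prankster") -> list:
    """Run ldd on the binary (Linux only) and list its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)


# Canned ldd output for illustration; the library names are made up.
sample = "\tlibQtGui.so.4 => not found\n\tlibc.so.6 => /lib/libc.so.6 (0xb7d5e000)\n"
print(missing_libraries(sample))
```

Each name reported this way is a candidate for a symbolic link from the location where the library actually lives.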


MacOSX

PRANKSTER has been compiled on MacOSX Tiger (10.4.4; PPC). Download the file prankster.osx.*.tgz, extract it to the preferred location and click the PRANKSTER icon.


Windows

PRANKSTER has been compiled on Windows XP. Download the file prankster.win.*.zip, extract it to the preferred location and click the PRANKSTER icon. You may want to move the file mingwm10.dll to C:\WINDOWS\SYSTEM32.


Source code

Assuming that Qt 4 is installed to /usr/lib/qt4, the software can be compiled as follows:

tar xzf prankster.src.*.tgz
cd src
export QTDIR=/usr/lib/qt4
# on csh, use instead: setenv QTDIR /usr/lib/qt4
$QTDIR/bin/qmake
make
../bin/prankster


Using PRANKSTER

PRANKSTER can be used with or without a mouse: every feature except the window panel settings (relative width of the tree/name/data panels) can be changed using shortcut keys or key combinations. In the PRANK XML format (HSAML), a node can be selected either with the mouse (by clicking the node in the tree panel) or by using the key combination Ctrl+N and typing the number of the node. All other functions can be found in the pull-down menus.

Opening and printing large files may take time. The program GUI may freeze until the task is finished.


Different menus

File menu

Open file: Input sequence data in various formats (format detection should be automatic). PRANKSTER uses code from Don Gilbert's program readseq, so data input and output have the same strengths and weaknesses as that software. In addition to standard formats, the PRANK HSAML format is supported.

Open URL: Download a file from a URL.

Reload data: Reload the active sequence file. Any modification is lost.

Load tree: Import a tree in Newick format to be used as a guide tree for the alignment. The names in the tree have to match those of the sequences exactly; if the tree contains only a subset of the sequences, the sequence set will be reduced.
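
To illustrate the name-matching rule, here is a minimal sketch (not PRANKSTER's own code) that extracts leaf names from a Newick string and reduces a sequence set to the leaves of the tree. It assumes unquoted names without white space; raising an error for tree names that have no sequence is an assumption of this sketch, and all names are hypothetical.

```python
import re


def newick_leaf_names(newick: str) -> list:
    """Extract leaf labels from a Newick string (unquoted names only)."""
    # Leaf labels directly follow "(" or ","; internal node labels follow ")".
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def reduce_to_tree(sequences: dict, newick: str) -> dict:
    """Keep only the sequences whose names appear as leaves in the tree."""
    leaves = newick_leaf_names(newick)
    unknown = [name for name in leaves if name not in sequences]
    if unknown:
        # Names must match exactly, so a leaf without a sequence is an error here.
        raise ValueError(f"tree names without a matching sequence: {unknown}")
    return {name: sequences[name] for name in leaves}


sequences = {"human": "ACGT", "chimp": "ACGA", "mouse": "ACTA"}
tree = "(human:0.1,chimp:0.1);"
reduced = reduce_to_tree(sequences, tree)
print(sorted(reduced))
```

With this tree, the sequence named mouse is dropped because it does not appear among the leaves.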

Save: Save alignment in one of the various formats. The format is chosen from the file type menu. In most formats, only the currently selected sites (see below) are exported; in HSAML, the selection is stored.

Print: Print the data as shown on the screen: font size, colouring, panel setting/hiding etc. will be replicated in the printout. Page selection and ordering have not been implemented.

Quit: Quit.

Settings menu

Font: Set font, colouring and space. Some systems may support fewer font types than shown on the list.

Plot: Set parameters for the display of posterior probabilities (HSAML format only). Show/hide, height and width are obvious; for each state (left-most pull-down menu), colour, style, offset and show/hide can be individually selected. The names of different structure states are defined in the model; postprob refers to the reliability of the alignment solution.

Sites: Select alignment sites (HSAML format only). Alignment sites can be filtered according to their posterior probabilities (e.g., probability of belonging to a certain structure state, or the reliability of the given solution). All sites can be selected or unselected, and then removed from or added back to the selection according to chosen criteria. A selection rule can be limited to a certain range of sites, and either to the currently selected node or to all nodes below that node. Selection criteria are saved in the HSAML output and will be recovered when re-opening the file.
For example, if you select all sites and then unselect postprob <= 80% in nodes below the root node, only those sites that have a reliability of more than 0.80 in all the pairwise alignments are kept selected. Sites that are currently unselected are not coloured, and only the selected sites will be exported in non-native alignment formats.
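
The example rule above can be expressed in a few lines. This is a sketch of the logic only, not of how PRANKSTER stores its data: the per-node lists of posterior probabilities, the node names and the function name are all hypothetical.

```python
def select_reliable_sites(postprob_by_node: dict, threshold: float = 0.80) -> list:
    """Return indices of sites whose postprob exceeds the threshold at every node."""
    n_sites = len(next(iter(postprob_by_node.values())))
    selected = set(range(n_sites))             # step 1: select all sites
    for probs in postprob_by_node.values():    # step 2: apply the rule at each node
        for site, p in enumerate(probs):
            if p <= threshold:                 # "unselect postprob <= 80%"
                selected.discard(site)
    return sorted(selected)


# Two internal nodes, four alignment sites (made-up values).
postprob = {
    "node1": [0.95, 0.80, 0.99, 0.60],
    "node2": [0.90, 0.85, 0.81, 0.99],
}
print(select_reliable_sites(postprob))
```

Note that a site with a value of exactly 0.80 at any node is unselected, because the rule uses <= rather than <.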

Model: Specify the alignment model. For both DNA and protein, the gap opening rate and gap extension probability can be defined, the option '+F' (keep gaps open; see LG05) can be set, or an external PRANK alignment model can be imported. For DNA, base frequencies can be defined (by default, empirical frequencies are used) and kappa and rho can be set; coding sequences (reading frame 1 assumed) can be translated into codons and the empirical codon substitution model used. Unchecking 'use log values' may give a 2-3 fold speed-up in the alignment but may also cause an underflow error and a program crash with large datasets (>>50 sequences).
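
The underflow risk of working without log values can be seen in a small numerical experiment. This sketch only illustrates the general floating-point effect; the per-column probability and the number of columns are made-up values, and PRANK's actual computation is not shown.

```python
import math

site_probability = 1e-5   # hypothetical probability contributed by one column
n_columns = 100           # hypothetical number of columns multiplied together

product = 1.0
log_sum = 0.0
for _ in range(n_columns):
    product *= site_probability            # raw probabilities: shrinks towards zero
    log_sum += math.log(site_probability)  # log values: stays well within range

print(product)   # the true value 1e-500 cannot be represented as a double
print(log_sum)
```

Multiplying raw probabilities is cheaper than summing logarithms, which is consistent with the speed-up mentioned above, but the product silently becomes zero once it falls below the smallest representable double.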

Input/output: PRANKSTER is foremost an alignment method and thus manipulation of a guide tree (even when inferred by the program) is considered as "input" (or the name of the menu item is wrong -- also possible). Short names reads each name only up to the first white space; Truncate branches and Fixed branches use the branch lengths value below and do what they say; Scale branches has its own value. Output with Dots and dashes (requires +F) makes the alignment more readable by marking insertions and deletions differently. Unfortunately, few downstream methods understand this sort of output.

Translate to protein: As it says, translates a DNA data set to proteins and allows back-translating these sequences to DNA after the alignment, maintaining the gap structure. Can also be considered as "Save DNA sequences into memory" function (see below). Sequences should be ungapped or the back-translation may fail.

Back-translate to DNA: Back-translates protein sequences to DNA if the names in the two data sets (DNA "saved into memory" -- see above -- and proteins shown on the screen) match and the DNA sequences produce the given protein sequences when translated. Note that the protein alignment does not need to have been produced with PRANKSTER, so you can use this function as a general-purpose back-translation tool.
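
The idea behind back-translation is to thread each ungapped DNA sequence through its aligned protein, writing three gap characters for every protein gap. The sketch below is an illustration under simplifying assumptions, not PRANKSTER's implementation: it matches sequences by name and checks lengths, but it omits the check that the DNA actually translates to the given protein. The function name and the data are hypothetical.

```python
def back_translate(protein_alignment: dict, dna: dict) -> dict:
    """Map an aligned protein set back to codons, preserving the gap structure."""
    result = {}
    for name, aligned in protein_alignment.items():
        if name not in dna:
            raise KeyError(f"no DNA sequence stored for '{name}'")
        seq = dna[name].replace("-", "")   # the stored DNA should be ungapped
        n_residues = sum(1 for c in aligned if c != "-")
        if len(seq) < 3 * n_residues:
            raise ValueError(f"DNA for '{name}' is too short for its protein")
        codons = []
        pos = 0
        for c in aligned:
            if c == "-":
                codons.append("---")       # one protein gap becomes three DNA gaps
            else:
                codons.append(seq[pos:pos + 3])
                pos += 3
        result[name] = "".join(codons)
    return result


protein_alignment = {"a": "M-K", "b": "MAK"}
dna = {"a": "ATGAAA", "b": "ATGGCTAAA"}
aligned_dna = back_translate(protein_alignment, dna)
print(aligned_dna)
```

Because the gap structure comes from the protein alignment alone, the same procedure works for alignments produced by any program.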

Alignment menu

Make guide tree: Generate an alignment guide tree using the Neighbor Joining algorithm. If the data is unaligned, approximate pairwise alignments are generated and distances are estimated from those; if the data is aligned, distance estimates are generally improved if the given alignment is used and gaps are not removed (Remove gaps from input? No.).

If you don't know the correct phylogeny, it is recommended that you run the alignment (at least) twice: (1) generate a tree from unaligned data (Alignment/Make guide tree), (2) make a multiple alignment (Alignment/Make alignment), (3) generate a new guide tree based on the given alignment (Alignment/Make guide tree; Remove existing tree? OK; Remove gaps from input? No), and (4) make an improved multiple alignment (Alignment/Make alignment). You may repeat steps 3-4 or, even better, export an alignment, use phylogeny software to infer a tree, import that (rooted) tree into PRANK, and realign the data.

If you know the correct phylogeny, import the tree with branch lengths and use it for alignment. The PRANK algorithm uses insertion-deletion events as phylogenetic information and the results may be very sensitive to the given topology.

Make alignment: A multiple alignment is generated using PRANK.

Shortcut key combinations

The shortcut keys and key combinations have been tested on a Linux system with a UK keyboard. Correct function on other systems is not guaranteed.

  • Alt+F/S/A/B opens the top menus; Right/Left moves between them and Esc closes them.

  • Ctrl+O: open file

  • Ctrl+U: open URL

  • Ctrl+R: reload data

  • Ctrl+L: load tree

  • Ctrl+S: save

  • Ctrl+P: print

  • Ctrl+Q: quit

  • Ctrl+T: make guide tree

  • Ctrl+A: make alignment

  • Ctrl+G: goto site

  • Ctrl+N: select node (HSAML)

  • Ctrl+W: show plot (HSAML)

  • Ctrl+I: hide plot (HSAML)

  • Ctrl+=: bigger font

  • Ctrl+-: smaller font

  • Ctrl++: more space

  • Ctrl+_: less space

  • Right/Left/Up/Down: scroll

  • Shift+Right/Left/Up/Down: scroll more

  • Ctrl+Shift+Right/Left/Up/Down: scroll even more

  • Alt+Right/Left/Up/Down: move window

  • Ctrl+Alt+Right/Left/Up/Down: resize window


Methods

PRANKSTER is a front-end to the multiple sequence alignment program PRANK. For homogeneous models (default), the method corresponds to that published in (LG05). For DNA alignments, the substitution model is Tamura-Nei (TN93) or a subset of it (Hasegawa-Kishino-Yano (HKY85) by default); for protein-coding DNA data, an empirical codon model can also be used (substitution model kindly provided by Carolin Kosiol); for protein, the default model is WAG (WG01). For any data type, external models given in PRANK HMM format can also be used. Such a model can either be homogeneous (one structure state) or define a structure. You can build some models here.
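
For readers unfamiliar with the default DNA model, the sketch below builds the textbook form of the HKY85 rate matrix: the rate from base i to base j is the frequency of j, multiplied by kappa when the change is a transition (A<->G or C<->T). This is the standard definition, shown without rate normalisation; how PRANK parameterises or scales the model internally is not described here, and the function name is hypothetical.

```python
def hky85_rate_matrix(freqs: dict, kappa: float) -> dict:
    """Build an unnormalised HKY85 rate matrix as nested dicts q[i][j]."""
    bases = "ACGT"
    purines = {"A", "G"}
    q = {}
    for i in bases:
        row = {}
        for j in bases:
            if i == j:
                continue
            # A transition stays within purines (A, G) or within pyrimidines (C, T).
            is_transition = (i in purines) == (j in purines)
            row[j] = freqs[j] * (kappa if is_transition else 1.0)
        row[i] = -sum(row.values())   # diagonal makes each row sum to zero
        q[i] = row
    return q


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
q = hky85_rate_matrix(uniform, kappa=2.0)
print(q["A"])
```

TN93 generalises this by allowing different rates for the two transition types, which is why HKY85 is described above as a subset of it.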


References

HKY85. Hasegawa M, Kishino H, Yano T. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. JME 22:160-174.
TN93. Tamura K, Nei M. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. MBE 10:512-526.
WG01. Whelan S, Goldman N. 2001. A General Empirical Model of Protein Evolution Derived from Multiple Protein Families Using a Maximum-Likelihood Approach. MBE 18:691-699.
LG05. Loytynoja A, Goldman N. 2005. An algorithm for progressive multiple alignment of sequences with insertions. PNAS 102:10557-10562.



Comments? E-mail ari@ebi.ac.uk.
