PRANK: Probabilistic Alignment Kit
PRANK is developed on Linux but also works on Mac OS X and Windows.
PRANK has been developed and best tested on Linux. Precompiled binaries are provided for Mac OS X and Windows; it should also compile fine on these platforms if the necessary tools are available. The author takes no responsibility for any damage that the software may cause to your computer, scientific career, family life or anything else.
PRANK is written in C++. The code is © Ari Loytynoja and distributed under the GPL; the exceptions are the eigen routines and the sequence input/output functions, which come from the PAML and readseq packages and are © Ziheng Yang and Don Gilbert, respectively.
The PRANK source code and precompiled binaries can be downloaded from here.
On Linux/Unix systems, the code can be unpacked and compiled using the commands:
tar xvzf prank.src*.tgz
See here for some instructions for compiling the code on Mac OS X.
The software is still under development and, in addition to lacking much error checking, may contain bugs.
If you wish to use the alignment software for your own studies, please send an email to Ari Loytynoja to be kept up to date with the bug fixes and improvements.
The minimal command is
For DNA data, PRANK by default uses the HKY model with empirical base frequencies and kappa=2. With optional command parameters, it supports the TN (TN93) model and the models nested within it (JC, K2P, FEL, HKY). For example, the JC model is defined as
Simulation studies with nucleotide sequences containing high numbers of insertions and deletions showed that the option '-F' (i.e. the model "+F" as defined in the paper) gives the most accurate results and should generally be used.
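How the nucleotide models mentioned above nest inside TN93 can be illustrated with a small Python sketch. This is not PRANK's code: the function names, the unnormalized rate-matrix parameterization and the toy sequences are assumptions made for illustration only. Setting both transition rate ratios to the same kappa gives HKY, and additionally using equal base frequencies with kappa=1 gives JC.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def empirical_frequencies(sequences):
    """Base frequencies counted from the sequences, ignoring gaps and other symbols."""
    counts = {b: 0 for b in BASES}
    for seq in sequences:
        for ch in seq.upper():
            if ch in counts:
                counts[ch] += 1
    total = sum(counts.values())
    return [counts[b] / total for b in BASES]


def tn93_rate_matrix(freqs, kappa_r=2.0, kappa_y=2.0):
    """Unnormalized TN93-style rate matrix in the order A, C, G, T.

    kappa_r scales purine transitions (A<->G), kappa_y scales pyrimidine
    transitions (C<->T); transversions have relative rate 1.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i == j:
                continue
            if x in PURINES and y in PURINES:
                rate = kappa_r
            elif x not in PURINES and y not in PURINES:
                rate = kappa_y
            else:
                rate = 1.0
            q[i][j] = rate * freqs[j]
        # The diagonal is still 0.0 here, so this makes the row sum to zero.
        q[i][i] = -sum(q[i])
    return q


# Hypothetical toy data, only to show the nesting of the models.
toy_sequences = ["AACG", "A-GT"]
freqs = empirical_frequencies(toy_sequences)

hky = tn93_rate_matrix(freqs, kappa_r=2.0, kappa_y=2.0)       # HKY, kappa=2
jc = tn93_rate_matrix([0.25] * 4, kappa_r=1.0, kappa_y=1.0)   # JC

for row in hky:
    print(["%.3f" % v for v in row])
```

Under the same assumptions, K2P would correspond to equal frequencies with a shared kappa, and a model with empirical frequencies and kappa=1 to the Felsenstein-type model (presumably what "FEL" refers to).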
Progressive alignment requires a guide tree, and PRANK can construct one using the Neighbor Joining algorithm and evolutionary distances estimated from fast pairwise alignments. If you don't specify a guide tree, PRANK runs the alignment twice: (1) it generates a tree from the unaligned data, (2) makes a multiple alignment, (3) generates a new guide tree based on that alignment, and (4) makes an improved multiple alignment. The alignments produced at stages (2) and (4) are named e.g. as
If you know the correct phylogeny, import the tree with branch lengths and use it for alignment. The PRANK algorithm uses insertion-deletion events as phylogenetic information and the results may be very sensitive to the given topology.
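For readers unfamiliar with how a guide tree is derived from pairwise distances, the following Python sketch shows the textbook Neighbor Joining procedure. It is a generic illustration and not PRANK's implementation; the function name, the internal node labels (`node0`, `node1`, ...) and the small distance matrix are all hypothetical.

```python
def neighbor_joining(names, dist):
    """Textbook Neighbor Joining.

    names: list of taxon labels; dist: symmetric matrix (list of lists).
    Returns a list of (parent, child, branch_length) edges of the unrooted tree.
    """
    nodes = list(names)
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair that minimizes the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = "node%d" % counter
        counter += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2.0 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((u, a, len_a))
        edges.append((u, b, len_b))
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c in (a, b):
                continue
            d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Connect the last two nodes with the remaining distance.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical additive distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
edges = neighbor_joining(names, matrix)
for parent, child, length in edges:
    print(parent, child, length)
```

In a two-pass scheme like the one described above, the distance matrix would first come from fast pairwise alignments and then be re-estimated from the first multiple alignment before the tree is rebuilt.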
The standard PRANK algorithm is based on an exhaustive search for the best pairwise solution, and for long sequences this soon becomes too time-consuming. The command-line version of the algorithm includes an experimental anchoring option that may radically reduce the computation time and allow aligning sequences up to hundreds of kb long. This option uses the anchoring algorithm chaos from the lagan package and requires the program
If this software is installed on your system, the option
The PRANK algorithm infers the insertion-deletion events while aligning the sequences, and this information and the inferred ancestral sequences can be output in two formats. With the option
You may notice that in the file
By default, the PRANK algorithm doesn't use log-space for the likelihood calculation. This makes the program run faster but may cause underflow problems with larger datasets (>>100 rather distant DNA sequences, fewer protein sequences). By using the log-space (with the flag
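The underflow problem and the log-space remedy can be demonstrated with a few lines of Python. This is a generic numerical sketch, not PRANK's likelihood code; the function names and the per-site probabilities are made up for illustration.

```python
import math


def likelihood_plain(site_probs):
    """Multiply probabilities directly; fast, but the product can underflow to 0.0."""
    total = 1.0
    for p in site_probs:
        total *= p
    return total


def likelihood_log(site_probs):
    """Sum log-probabilities instead; slower per step but numerically safe."""
    return sum(math.log(p) for p in site_probs)


def log_add(log_a, log_b):
    """Return log(exp(log_a) + exp(log_b)) without leaving log-space."""
    m = max(log_a, log_b)
    if m == float("-inf"):
        return m
    return m + math.log(math.exp(log_a - m) + math.exp(log_b - m))


# Hypothetical input: 2000 factors of 0.001 each.
site_probs = [1e-3] * 2000
plain = likelihood_plain(site_probs)
logged = likelihood_log(site_probs)
print("plain product:", plain)
print("log-space sum:", logged)
```

The plain product falls below the smallest representable double and becomes zero, whereas the log-space value remains usable. The price is the extra `log` and `exp` calls, in particular whenever probabilities have to be added, which is consistent with the speed trade-off described above.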
For homogeneous models (default), the method corresponds to that published in (LG05). However, PRANK can also use complex models for alignment and infer sequence structure along with the alignment. You can build some models here.
TN93. Tamura K, Nei M. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. MBE 10:512-526.