A quick start: type 'prank sequence_file_name' to align, or 'prank -help' to learn more.

---

NEW: PRANK development has moved to Google Code. The new prank-msa site (at http://code.google.com/p/prank-msa) contains the latest version of the program source code and allows entering comments and bug reports. The new version of PRANK available there includes bug fixes and significant speed improvements. The files available here are not up to date.

Recent changes:

18.10.10
Added the file 'libgcc_s_dw2-1.dll' to the zip packages containing the DOS version of PRANK. This file is required by both PRANK and PRANKSTER on Windows systems but had accidentally been left out of the PRANK packages. This change doesn't affect PRANK/PRANKSTER on OSX and Linux. Thanks to Adam for reporting this.

28.06.10
After many complaints, I have finally rewritten the sequence input/output routines. The supported formats are now FASTA, Phylip (interleaved and sequential), Paml and Nexus only. These should work with the corresponding software; if they do not, please let me know and I'll do my best to fix them. PRANK can also read files in these formats as written by PRANK itself; there are no guarantees that it will read files produced with other software.

As in webPRANK, the Nexus files exported by PRANK now contain additional information, namely the guide tree and the column-wise minimum reliability score. These are somewhat experimental features and may contain bugs.

PRANK accepts input sequence files that contain sequences not included in the guide tree; the additional sequences are simply ignored. This has now been extended to also accept guide trees that contain more leaves than there are sequences. With the new option '-prunetree', such guide trees are trimmed to include the shared taxa only, maintaining the relative distances of the sequences.
This feature is not enabled by default, as that could cause unintended pruning in cases where the sequence and tree labels don't match due to typos or nonstandard characters.

--------------------------------------------------------------------------------------

07.10.09
There have been several attempts to clean up the code lately, but this latest update also brings a rather significant change: the earlier version of the program did not "truncate" the branches as intended. Now it does. It is not clear whether the intended behaviour gives better alignments than the buggy one, but at least it now works (or should work) as it's meant to. This should not affect alignments of closely-related sequences but does significantly affect alignments of diverged sequences.

02.12.08
This version now completes the Balibase benchmark with default options (i.e., including the guide tree estimation that used to be a problem). It doesn't score as well as some other methods, but that has never been the aim. The latest fix was for a minor bug that made the program jam on some platforms (at least AMD64) but not others (at least Intel Pentium). This had no effect on the results.

28.11.08
As pointed out by Bob and Eyal, the PRANK-made guidetrees for distantly-related datasets are just awful. This fix should improve them. I'm not sure which versions are affected; at least the previous one is. PRANKSTER has the same bug and will be fixed soon.

04.09.08
PRANK now has an experimental DNA translation/back-translation feature. This translates a protein-coding DNA data set into protein, aligns these sequences as proteins and back-translates the alignment into DNA such that the gaps are maintained. Both the DNA and the protein alignments are output. The translation is enabled with the parameters "-translate" and "-mttranslate" (the latter for mt DNA).

20.08.08
Changed the way ambiguous characters (e.g. Y, R and N in DNA, X in proteins) are treated.
Now ambiguous characters match perfectly with all the characters they code for; earlier they were "divided" between the characters and made imperfect matches. This mostly affects cases where missing protein sequence is projected from homologs and represented by long strings of X's. It may also affect the alignment of strings of N's in DNA. Thanks to Albert for pointing out the problem.

Also, the warnings given during the alignment have been clarified. Hopefully they make sense!

15.07.08
Fixed a bug causing some crashes with unrooted, user-defined guidetrees. Thanks to Sindhu for the dataset.

09.07.08
Added two new command line parameters:
-tree="treestring" takes a guidetree as a string instead of a file path. The tree has to be in newick format (including the semicolon) and in double quotes.
-shortnames truncates the sequence names at the first space; the rest is ignored. The truncated names have to be unique.
Thanks to Martin for the original suggestions!

08.07.08
Minor fixes:
- for consistency with our recent paper, the option can now also be specified as '+F'. The old flag '-F' works, too.
- the substitution scoring (based on evolutionary modelling) "collapses" with very distant sequences. I've set an arbitrary upper limit of 0.5 subst/site for the pairwise distance between any two sequences to be aligned. This limit can be overridden with the flag '-realbranches'.

07.07.08
Some (significant) bug fixes and new features:
- special characters in sequence names are now replaced with underscores. This should prevent some odd crashes.
- output of codon alignments is fixed (it was truncated to one third of its true length).
- sequence input file can contain gap signs; these will be removed. Previously the removal wasn't done correctly and may have caused errors with pre-aligned input data. (Thanks to Derek for all of these!)
- an underflow error in the guidetree computation is fixed. This affected *very* long and *highly* diverged sequences only. (Thanks to Pablo!)
- added a new option '-dots' (which requires option '-F') that prints gaps caused by insertions as dots (instead of dashes). (Thanks to Esko for the suggestion and his persistence in getting the message through!)

The program is distributed as source code (prank.src.080707.tgz) that should compile on any Linux/OSX/Windows machine equipped with a standard C++ compiler such as GCC. Precompiled executables are provided for MacOSX and Windows. These have been compiled on Mac Mini (PowerPC, OSX 10.4.11), Power Book Pro (Intel, OSX 10.5.3) and Windows XP (using the MinGW compiler).

20.06.08
Added a check for the uniqueness of sequence names. They still need to be unique, but duplicates are now detected and promptly reported. If you are using the older, precompiled binaries and have mysterious crashes, please check the sequence names before reporting a bug.

18.06.08
Bug fixes and additional command line options. If iteration is used, the alignment (and guidetree) are written for each round. The naming of output files has also changed.

24.04.07
This is a rather massive re-write of the program. The main changes are in the storage of the data (making it significantly faster and typically giving a smaller run size) and, finally, a fix for the bug that made it crash on Windows. As many things have changed, there's naturally a chance that something has got broken.

In addition to the source code, executables for Mac (osx) and Windows (dos) are provided. They were compiled on OSX 10.4.9 (PowerPC) and Windows XP, and may not work on all other systems. The compilation was clean, so getting the program to run on other systems may be possible.
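
Checking sequence names by hand:
The notes of 20.06.08 and 09.07.08 require sequence names to be unique, also after truncation with '-shortnames'. The following is a minimal Python sketch, not part of PRANK, for listing duplicated names in a FASTA file before alignment. The function names are hypothetical, and the truncation rule (cut at the first space) is assumed from the description above; PRANK's own check may differ in details.

```python
from collections import Counter


def fasta_names(lines, shortnames=False):
    """Yield sequence names from FASTA header lines (lines starting with '>')."""
    for line in lines:
        if line.startswith(">"):
            name = line[1:].strip()
            if shortnames:
                # Assumed behaviour of '-shortnames': keep the text before the first space.
                name = name.split(" ", 1)[0]
            yield name


def duplicate_names(lines, shortnames=False):
    """Return a sorted list of names that occur more than once."""
    counts = Counter(fasta_names(lines, shortnames))
    return sorted(name for name, n in counts.items() if n > 1)


# Small in-memory example; a real file can be passed as an open file handle.
example = [">human chr1", "ACGT", ">human chr2", "ACGA", ">mouse", "ACGG"]
print(duplicate_names(example))                   # full names are all distinct
print(duplicate_names(example, shortnames=True))  # 'human' collides after truncation
```

With full names the example has no duplicates, but after truncation at the first space the two 'human' entries clash, which is the situation the '-shortnames' note warns about.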