webPRANK alignment server

webPRANK is an easy-to-use web interface to the PRANK alignment algorithm. It supports all the main features of the command-line program PRANK and includes a powerful alignment browser with features similar to those found in the graphical interface PRANKSTER. Note that webPRANK can also be used to upload and display your existing PRANK alignments. Test webPRANK now or read the paper describing it.

Introduction

PRANK is a probabilistic multiple alignment program for DNA, codon and amino-acid sequences. It is based on a novel algorithm that treats insertions correctly and avoids over-estimation of the number of deletion events. In addition, PRANK borrows ideas from maximum likelihood methods used in phylogenetics and correctly takes into account the evolutionary distances between sequences. Lastly, PRANK allows for defining a potential structure for the sequences to be aligned and then, simultaneously with the alignment, predicts the locations of structural units in the sequences.

PRANK is a command-line program for Unix-style environments, but the same sequence alignment engine is implemented in the graphical program PRANKSTER. In addition to providing a user-friendly interface for those not familiar with Unix systems, PRANKSTER is an alignment browser for alignments saved in the HSAML format. The novel format allows for storing all the information generated by the aligner, and the alignment browser is a convenient way to analyse and manipulate the data.

PRANK aims at an evolutionarily correct sequence alignment, and the result often looks different from those generated with other alignment methods. There are, however, cases where the different look is caused by violations of the method's assumptions. To understand why things may go wrong and how to avoid that, read this explanation of differences between PRANK and traditional progressive alignment methods.

The reconstruction of evolutionary homology -- including the correct placement of insertion and deletion events -- is only feasible for rather closely related sequences. PRANK is not meant for the alignment of very diverged protein sequences. If sequences are very different, the correct homology cannot be reconstructed with confidence and PRANK may simply refuse to match them. There are several methods developed for structural matching of very distant protein sequences.
One should not consider the resulting alignment a proper inference of evolutionary homology, though.

Often there are ties in the alignment, i.e. positions with many equally good solutions. The common practice among sequence aligners is to always pick the same solution among the many possible ones and so produce consistent alignments. We believe that this gives the user a wrong impression and too much confidence in the resulting alignment (for example, 10 iterations of the alignment always produce the same result, so the alignment must be correct). Our decision is to break the ties randomly. This means that different runs of the program with the same data may give different alignments, and a PRANK alignment may not be reproducible. However, the practice we have chosen also tells the user that the regions not consistently aligned in a similar manner simply cannot be resolved reliably. Be aware that the reproducibility of many other methods does not mean higher confidence -- they also reproduce exactly the same errors!

PRANK and PRANKSTER include an option for DNA translation/back-translation. This allows an automatic translation of a protein-coding DNA data set into protein sequences, alignment of these sequences as proteins, and back-translation of the resulting alignment into DNA such that the gaps are maintained. PRANKSTER can also back-translate protein alignments produced with external alignment software. Note, however, that both PRANK and PRANKSTER can also align codon sequences (i.e., using a codon substitution matrix), and this should (theoretically) produce even better alignments for protein-coding sequences than alignment of protein sequences and back-translation of these alignments into DNA.

Software

PRANK is a command-line program that contains the latest features and the complete set of options.
It is ideal for scripting and non-interactive work, and as it supports the HSAML format for output, the resulting alignments can always be browsed using the graphical front-end. PRANK is distributed as C++ source code and binaries for MacOSX and Windows. You can find PRANK here.

PRANKSTER is a graphical front-end to PRANK and a browser for the PRANK HSAML-formatted files. It supports several different alignment formats for input and output, formatted printing, and complex rules for the filtering of alignment sites. PRANKSTER is distributed as pre-compiled binaries for Linux, MacOSX and Windows. A snapshot of the current source code can be obtained from the author. You can find PRANKSTER here.

Answers to some of the frequently asked questions are given here.

Alignment models

The alignment programs PRANK and PRANKSTER can simultaneously model sequence structure and evolution. Evolutionary relations between sequences are automatically taken into account by the progressive algorithm, whereas a potential sequence structure has to be specified in advance in the form of an alignment model. The structure can be simple, such as alternating regions of fast and slowly evolving sites, or complex, such as a gene with start and stop codons, splice sites and codons. Locations of structural units within the sequences are not specified but are inferred from the data along with the alignment.

An alignment model is defined in a structured flat file that is then imported into PRANK/PRANKSTER during the analysis. Defining a complex model manually can be frustrating, so simple forms for building the most typical models for the alignment of DNA sequences are provided. One can modify some of the basic parameters or leave them at their default values and press the button at the bottom of the form. This will generate a model that can be saved locally and used for the alignment. You can make models here.

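To make the idea of a simple structure model concrete, the sketch below shows how the locations of fast and slowly evolving regions could be inferred from data with a two-state hidden Markov model and the Viterbi algorithm. It is only an illustration under stated assumptions: it is not PRANK's model file format or its algorithm, it works on the columns of an already fixed alignment rather than inferring structure simultaneously with the alignment, and the state names, the conserved/variable column coding and all probabilities are made up for the example.

```python
import math

# Hypothetical two-state model (not PRANK's format): "slow" sites are mostly
# conserved, "fast" sites are mostly variable. All numbers are illustrative.
STATES = ("slow", "fast")
START = {"slow": 0.5, "fast": 0.5}
TRANS = {
    "slow": {"slow": 0.9, "fast": 0.1},
    "fast": {"slow": 0.1, "fast": 0.9},
}
EMIT = {
    "slow": {"C": 0.9, "V": 0.1},  # C = conserved column, V = variable column
    "fast": {"C": 0.3, "V": 0.7},
}


def viterbi(columns):
    """Return the most probable state path for a string of 'C'/'V' columns."""
    if not columns:
        return []
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][columns[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at column i + 1
    for symbol in columns[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][symbol])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def regions(path):
    """Collapse a state path into (state, start, end) runs, end exclusive."""
    runs = []
    for i, state in enumerate(path):
        if runs and runs[-1][0] == state:
            runs[-1] = (state, runs[-1][1], i + 1)
        else:
            runs.append((state, i, i + 1))
    return runs


if __name__ == "__main__":
    # Six conserved columns followed by six variable ones.
    print(regions(viterbi("CCCCCCVVVVVV")))
```

The region boundaries are not given as input. They fall out of the most probable state path, which is the sense in which the locations of structural units are inferred from the data rather than specified.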
HSAML format

An alignment generated with PRANK contains more information than can be stored using the traditional sequence alignment formats. In addition to supporting several well-known formats, PRANK and PRANKSTER can store the results in the HSAML format. A description of the HSAML format and examples of how to manipulate HSAML-formatted data using other software can be found here. The XML schema for the novel format is available here.

Supplementary material

The simulation data used in the paper "Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis" (Science 320:1632--1635) can be found here.

Comments? E-mail ari@ebi.ac.uk.