Frequently asked questions
Answers to some frequently asked questions are collected here. Before contacting the author, please check whether a solution to your problem can be found here.
Q: What are these bars (or dots, lines) at the bottom of the prankster screen, or the probabilities in the XML file?
A: These are posterior probabilities for different structure states and the reliability of the given solution. Unfortunately, the method isn't published yet and is too complex to describe in detail here.
In brief, the algorithm can align sequences using an alignment model that consists of multiple processes each describing a class of differently evolving sites. The probabilities (or bars, dots, lines) are then the relative posterior probabilities of these alternative processes at the given site (i.e., probability of this site belonging to a given class). Normally, a pattern gets stronger as the alignment includes more sequences, i.e., the corresponding node has more descendants.
Q: But the probabilities don't sum to one!
A: Yes, one of them is called "postprob" and doesn't belong to the group. "postprob" is computed differently and can be considered a reliability score for the given solution. If the score is low, there are alternative solutions that are as good, or nearly as good, as the chosen solution. In other words, the alignment at that site may be wrong.
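As a minimal sketch of this distinction, the Python snippet below separates the reliability score from the structure-state probabilities of one site and checks that only the latter sum to one. The dictionary layout and the state names "slow" and "fast" are hypothetical and chosen for illustration; they are not the actual prank output format.

```python
import math

# Hypothetical values for one site at one node. The state names
# "slow" and "fast" are placeholders, not names from the prank output.
site = {"slow": 0.7, "fast": 0.3, "postprob": 0.85}


def split_site_scores(site, reliability_key="postprob"):
    """Separate structure-state probabilities from the reliability score."""
    states = {k: v for k, v in site.items() if k != reliability_key}
    return states, site[reliability_key]


states, reliability = split_site_scores(site)

# Only the structure-state probabilities are expected to sum to one;
# the reliability score is a separate quantity.
states_sum_to_one = math.isclose(sum(states.values()), 1.0)
```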
Q: How should I read these values, and what can I use them for?
A: Multiple processes are meant to better describe the data and thus improve the obtained solution. However, one may also use them as a structure prediction and search for regions that evolve in a particular way. The reliability score ("postprob") can be used as an objective criterion to remove noise from the data before other analyses. For example:
(1) In order to select only the most reliably aligned sites for a phylogenetic analysis, one may initially include all the sites in the selection and then exclude sites for which the alignment reliability is below a certain threshold in any internal node.
(2) To search for conserved patterns that are shared by all the species, one can initially exclude all the sites and then add back only those that show a high probability of belonging to a slowly evolving structure state (see here) at the root node.
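The two selection strategies above can be sketched in Python as follows. This is only an illustration under stated assumptions: the scores are assumed to have been read from the output already and stored in plain lists, the node names and threshold values are hypothetical, and sites without a score (see the question about "-1" values) are not handled here.

```python
def reliable_sites(postprob_by_node, threshold):
    """Strategy (1): start with all sites, then drop any site whose
    reliability is below the threshold in any internal node."""
    if not postprob_by_node:
        return []
    n_sites = len(next(iter(postprob_by_node.values())))
    selected = set(range(n_sites))  # initially include every site
    for scores in postprob_by_node.values():
        for i, score in enumerate(scores):
            if score < threshold:
                selected.discard(i)
    return sorted(selected)


def conserved_sites(root_slow_prob, threshold):
    """Strategy (2): start with no sites, then add those with a high
    probability of the slowly evolving state at the root node."""
    return [i for i, p in enumerate(root_slow_prob) if p >= threshold]


# Hypothetical reliability scores for two internal nodes, four sites each.
postprob = {
    "node1": [0.90, 0.40, 0.95, 0.80],
    "node2": [0.85, 0.90, 0.60, 0.90],
}
# Hypothetical root-node probabilities of a slowly evolving state.
root_slow = [0.95, 0.20, 0.80, 0.50]

kept_for_phylogeny = reliable_sites(postprob, threshold=0.7)   # [0, 3]
kept_as_conserved = conserved_sites(root_slow, threshold=0.75)  # [0, 2]
```

Suitable thresholds depend on the data, so the values 0.7 and 0.75 are placeholders rather than recommendations.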
Q: The XML output of prank contains something called "tamura-nei". What is that?
A: If you use a structure model with multiple processes (see here), the output file will report the relative probabilities of different structure states at each site. By default, prank/prankster uses a model with a single state that, naturally, has a probability of 1 throughout the sequence. For DNA, the single-state model is called "tamura-nei" (in reality, one of its nested models), and, for proteins, "wag".
Q: For some sites I obtained posterior probabilities of "-1". What does this mean?
A: That site is inferred as an insertion and skipped over without penalty (see LG05 for details). In other words, the algorithm thinks that this site didn't exist in the ancestral sequence and, thus, no score can be assigned to it.
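When processing the scores with a script, it may help to convert "-1" into an explicit missing value so that it is not mistaken for a very low probability. A minimal Python sketch, assuming the values have been read as strings (the function name and the example values are hypothetical):

```python
def parse_score(raw):
    """Convert a score string to a float, mapping -1 to None (no score)."""
    value = float(raw)
    return None if value == -1.0 else value


# Hypothetical values for three sites at one node.
raw_scores = ["0.92", "-1", "0.40"]
scores = [parse_score(r) for r in raw_scores]
# scores is now [0.92, None, 0.4]
```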
Comments? E-mail email@example.com.