Goldman Group Projects
The Different Versions of the Dayhoff Rate Matrix
Phylogenetic inference methods require Markov models of sequence evolution expressed in terms of instantaneous rate matrices (Q), but some models, most notably the PAM model of Dayhoff et al. (1978), were originally published only in terms of probability matrices (P). Previous methods for deriving Q from P, based on inverting the relationship P(t) = exp(tQ), have used eigen-decomposition of Dayhoff et al.'s PAM1 matrix. We have found that PAM1 is not a close-enough approximation to the required matrix P to ensure convergence of the estimates of the elements of Q.
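To make the relationship between Q and P concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the paper: the 3-state matrix Q is an invented toy example (not the 20-state Dayhoff matrix), the row convention (each row of Q sums to zero, each row of P sums to one) is an assumption, and the helper names mat_mul, identity and mat_exp are hypothetical. The sketch computes P(t) = exp(tQ) by a truncated Taylor series with scaling and squaring, then shows that for small t the crude estimate (P(t) - I)/t only approximates Q.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_exp(q, t, squarings=10, terms=20):
    """Approximate exp(t*q): Taylor series on t*q / 2**squarings, then square."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[scale * x for x in row] for row in q]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        # term holds a**k / k! after this step
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Toy instantaneous rate matrix (assumed values): off-diagonals >= 0, rows sum to 0.
Q = [[-0.3, 0.2, 0.1],
     [0.1, -0.4, 0.3],
     [0.2, 0.2, -0.4]]

t = 0.01
P = mat_exp(Q, t)                      # transition probabilities over time t
row_sums = [sum(row) for row in P]     # each should be very close to 1

# First-order estimate of Q from P; its error is of order t.
Q_approx = [[(P[i][j] - (1.0 if i == j else 0.0)) / t for j in range(3)]
            for i in range(3)]

print(row_sums)
print(Q_approx)
```

The gap between Q_approx and Q shrinks only as t goes to zero, which is one way to see why a probability matrix for a fixed, non-zero amount of change need not be a good enough stand-in when recovering Q.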
We have written a paper (see below) which describes the above findings, and introduces two simple methods to derive Q from the information published by Dayhoff et al. which require neither eigen-decomposition nor consideration of the limit t → 0. We identify the methods used to derive various existing implementations of the Dayhoff matrix in current software, and analyze 200 protein sequence alignments to test these against the two new methods. We conclude with the recommendation that one of the new methods, denoted DCMut, be used as a 'standard' implementation of the Dayhoff et al. (1978) model, to facilitate agreement amongst scientists using supposedly identical methods. Files described in the paper giving our implementation of this model (and others) are available below.
Our paper has been published in Molecular Biology and Evolution; click here to be directed to the journal's web page.
On p.195 of this paper, the claim is made that "the PAM1 matrix cannot be generated as exp(t*Q*) for any valid IRM Q* and time t* >= 0". Click here to download a short note that proves this claim.
Files described in the paper (all in a format suitable for inclusion in the PAML program Codeml):