Goldman Group Projects

The Different Versions of the Dayhoff Rate Matrix

Phylogenetic inference methods require Markov models of sequence evolution expressed in terms of instantaneous rate matrices (Q), but some models, most notably the PAM model of Dayhoff et al. (1978), were originally published only in terms of probability matrices (P). Previous methods for deriving Q from P, based on inverting the relationship P(t) = exp(tQ), have used eigen-decomposition of Dayhoff et al.'s PAM1 matrix. We have found that PAM1 is not a close enough approximation to the required matrix P to ensure convergence of the estimates of the elements of Q.
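The relationship between Q and P can be illustrated with a toy example. The sketch below is not the method of the paper and does not use Dayhoff et al.'s data: it assumes a hypothetical 3-state rate matrix, computes P(t) = exp(tQ) with a truncated Taylor series, and shows that Q is only approximately recovered from P by the first-order formula (P(t) - I)/t. All names and values here are illustrative assumptions.

```python
# Toy 3-state illustration of P(t) = exp(tQ); not the 20-state amino acid model.

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, terms=30):
    """Approximate exp(t*Q) by summing the first `terms` Taylor terms."""
    n = len(q)
    tq = [[t * x for x in row] for row in q]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term (tQ)^k / k!, starting at k = 0
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, tq)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Hypothetical rate matrix: non-negative off-diagonals, each row sums to zero.
Q = [[-0.3, 0.2, 0.1],
     [0.1, -0.4, 0.3],
     [0.2, 0.2, -0.4]]

t = 0.01
P = mat_exp(Q, t)

# Each row of a transition probability matrix should sum to 1.
row_sums = [sum(row) for row in P]

# First-order recovery of Q from P; the error shrinks only as t becomes small.
Q_approx = [[(P[i][j] - (1.0 if i == j else 0.0)) / t for j in range(3)]
            for i in range(3)]
max_err = max(abs(Q_approx[i][j] - Q[i][j])
              for i in range(3) for j in range(3))
```

Increasing `t` in this sketch makes `max_err` grow, which gives some intuition for why a published probability matrix that is not close enough to the required P can yield poor estimates of Q.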

We have written a paper (see below) which describes the above findings, and introduces two simple methods to derive Q from the information published by Dayhoff et al. which require neither eigen-decomposition nor consideration of the limit t → 0. We identify the methods used to derive various existing implementations of the Dayhoff matrix in current software, and analyze 200 protein sequence alignments to test these against the two new methods. We conclude with the recommendation that one of the new methods, denoted DCMut, be used as a 'standard' implementation of the Dayhoff et al. (1978) model, to facilitate agreement amongst scientists using supposedly identical methods. Files described in the paper giving our implementation of this model (and others) are available below.

Our paper has been published in Molecular Biology and Evolution; click here to be directed to the journal's web page.

On p.195 of this paper, the claim is made that "the PAM1 matrix cannot be generated as exp(t*Q*) for any valid IRM Q* and time t* >= 0". Click here to download a short note that proves this claim.

Files described in the paper (all in a format suitable for inclusion in the PAML program Codeml):

Implementations of the Dayhoff (1978) matrix:
dayhoff-dcmut.dat new implementation based on Dayhoff et al.'s raw data and amino acid mutabilities
dayhoff-dcfreq.dat new implementation based on Dayhoff et al.'s raw data and amino acid frequencies
dayhoff-paml.dat implementation used in PAML software
dayhoff-kmh.dat uses Kishino et al. (1990) method, but without rounding
dayhoff-molphy.dat implementation used in MOLPHY and TREE-PUZZLE software
dayhoff-proml.dat implementation used in PHYLIP software

Implementation of the JTT (1992) matrix:
jtt-dcmut.dat our new, recommended, implementation