Goldman Group Projects

The 'WAG' matrix: a new amino acid replacement matrix for globular proteins

This page describes a project by Simon Whelan that uses a novel method to estimate empirical amino acid replacement matrices from large databases of aligned protein sequence families. A maximum likelihood procedure is used (with some approximations and short cuts), and the evolutionary relationships within each family are accounted for by using a phylogeny for that family.

The method has been used to estimate an amino acid replacement matrix for globular proteins. This matrix was derived from 3905 sequences in 182 protein families, kindly provided by David Jones. A paper was published in Molecular Biology and Evolution in 2001; see this link for details.

You can download the WAG matrix ('Whelan And Goldman' matrix) via this link.

[You can download the WAG* matrix, a variant described in the paper mentioned above, via this link.]

These files are in the format required for incorporation in Ziheng Yang's PAML package. First is the symmetric part of the matrix ($s_{ij} = s_{ji}$), and then the amino acid equilibrium frequencies ($\pi_j$) as estimated from the entire database from which the WAG and WAG* matrices were estimated. The instantaneous rate of replacement from amino acid $i$ to amino acid $j$ ($i \neq j$) is then given by $Q_{ij} = s_{ij}\pi_j$, where you can use either the $\pi_j$ from the files or replace them with the amino acid frequencies from the data you are analysing. Diagonal elements $Q_{ii}$ are set so that each row sums to zero, i.e. $Q_{ii} = -\sum_{j \neq i} Q_{ij}$.
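The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from PAML: the function name `build_rate_matrix` is hypothetical, and the 3-state values are made up for the example (the real matrices have 20 states and should be read from the downloaded files).

```python
def build_rate_matrix(s, pi):
    """Return Q with Q[i][j] = s[i][j] * pi[j] for i != j and rows summing to zero.

    s  : symmetric part of the matrix (square list of lists; diagonal ignored)
    pi : equilibrium frequencies, one per state
    """
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = s[i][j] * pi[j]
        # The diagonal element is minus the sum of the off-diagonal
        # elements in the same row, so that the row sums to zero.
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


# Toy 3-state example (hypothetical values, NOT taken from the WAG files).
s = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 0.5],
     [2.0, 0.5, 0.0]]
pi = [0.5, 0.3, 0.2]
Q = build_rate_matrix(s, pi)
```

Because the first part of the matrix is symmetric, a matrix built this way satisfies pi[i] * Q[i][j] == pi[j] * Q[j][i] for every pair of states, whichever set of frequencies is used.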
Rates of replacement should be normalised so that, for the given amino acid frequencies, the mean rate of replacement is 1. If the amino acid frequencies you use are $\pi_j$, then the mean rate of replacement is given by $\sum_i \sum_{j \neq i} \pi_i Q_{ij}$; if you divide the $s_{ij}$ by this value you get a matrix with mean rate 1. Programs in PAML do this normalisation automatically for you.
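For anyone using the matrix outside PAML, the normalisation step might look like the following sketch. The function names `mean_rate` and `normalise` and the 3-state toy values are assumptions made for illustration only; they are not part of PAML or of the WAG files.

```python
def mean_rate(s, pi):
    """Mean rate of replacement: sum over i != j of pi[i] * Q[i][j],
    where Q[i][j] = s[i][j] * pi[j]."""
    n = len(pi)
    return sum(pi[i] * s[i][j] * pi[j]
               for i in range(n) for j in range(n) if i != j)


def normalise(s, pi):
    """Divide every element of s by the mean rate, so that the
    rescaled matrix has mean rate 1 for the frequencies pi."""
    rate = mean_rate(s, pi)
    return [[value / rate for value in row] for row in s]


# Toy 3-state example (hypothetical values, NOT taken from the WAG files).
s = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 0.5],
     [2.0, 0.5, 0.0]]
pi = [0.5, 0.3, 0.2]

rate_before = mean_rate(s, pi)
s_scaled = normalise(s, pi)
rate_after = mean_rate(s_scaled, pi)
```

Note that the scaling factor depends on the frequencies, so the normalisation has to be redone if the frequencies from the files are replaced by frequencies from your own data.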