Goldman Group Projects
The 'WAG' matrix: a new amino acid replacement matrix for globular proteins
This page describes a project by Simon Whelan that uses a novel method to estimate empirical amino acid replacement matrices from large databases of aligned protein sequence families. A maximum likelihood procedure is used (with some approximations and shortcuts), and the evolutionary relationships within the families are accounted for by using a phylogeny for each family.
The method has been used to estimate an amino acid replacement matrix for globular proteins. This matrix was derived from 3905 sequences in 182 protein families, kindly provided by David Jones. A paper was published in Molecular Biology and Evolution in 2001; see this link for details.
These files are in the format required for incorporation in Yang's PAML package. First is the symmetric part of the matrix ($s_{ij} = s_{ji}$), and then the amino acid equilibrium frequencies ($\pi_j$) as estimated from the entire database from which the WAG and WAG* matrices were estimated. The instantaneous rate of replacement from amino acid $i$ to amino acid $j$ ($i \neq j$) is then given by $Q_{ij} = s_{ij}\pi_j$, where you can use either the $\pi_j$ from the files or replace them with the amino acid frequencies from the data you are analysing. Diagonal elements $Q_{ii}$ are set so that each row sums to zero, that is, $Q_{ii} = -\sum_{j \neq i} Q_{ij}$.
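
As an illustration of the construction just described, the following is a minimal Python sketch that builds a rate matrix from a symmetric matrix and a frequency vector. The function name `build_rate_matrix` and the 3-state values are hypothetical, chosen only to keep the arithmetic easy to follow. They are not taken from the WAG files, and the sketch does not read the PAML file format or rescale the matrix.

```python
def build_rate_matrix(s, pi):
    """Return Q with Q[i][j] = s[i][j] * pi[j] for i != j.

    Each diagonal entry is set to minus the sum of the off-diagonal
    entries in its row, so every row of Q sums to zero.
    """
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = s[i][j] * pi[j]
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


# Toy 3-state example (hypothetical values, not from the WAG files).
s_toy = [
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 4.0],
    [1.0, 4.0, 0.0],
]
pi_toy = [0.5, 0.25, 0.25]

q_toy = build_rate_matrix(s_toy, pi_toy)
for row in q_toy:
    print(row)
```

For the real matrix, `s` would be the 20 x 20 symmetric part and `pi` the 20 equilibrium frequencies, taken either from the files or from the data being analysed. Because `s` is symmetric, the resulting matrix satisfies $\pi_i Q_{ij} = \pi_j Q_{ji}$.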