## Goldman Group Projects

## The 'WAG' matrix: a new amino acid replacement matrix for globular proteins

This page contains information about a project by Simon Whelan that uses a novel method to estimate empirical amino acid replacement matrices from large databases of aligned protein sequence families. A maximum likelihood procedure is used (with some approximations and shortcuts), and the evolutionary relationships within each family are taken into account by using a phylogeny for that family.

The method has been used to estimate an amino acid replacement
matrix for globular proteins. This matrix was derived from 3905
sequences in 182 protein families, kindly provided by David Jones.
A paper describing this work has been published. You can download the WAG matrix, and also the WAG* matrix (a variant described in that paper), from the links on this page. These files are in the format required for incorporation in
Ziheng Yang's PAML package. Each file gives first the symmetric part of the matrix ($s_{ij}$) and then the amino acid equilibrium frequencies ($p_j$), as estimated from the entire database from which the WAG and WAG* matrices were estimated. The instantaneous rate of replacement from amino acid $i$ to amino acid $j$ ($j \neq i$) is then given by

$$Q_{ij} = s_{ij}\, p_j,$$

where you can either use the $p_j$ from the files or replace them with the amino acid frequencies from the data you are analysing. The diagonal elements $Q_{ii}$ are set so that each row sums to zero:

$$Q_{ii} = -\sum_{j \neq i} Q_{ij}.$$

Rates of replacement should be normalised so that, for the given amino acid frequencies, the mean rate of replacement is 1. If the amino acid frequencies you use are $p_i$, then the mean rate of replacement is

$$\sum_i \sum_{j \neq i} p_i\, Q_{ij};$$

if you divide the $Q_{ij}$ by this value, you get a matrix with mean rate 1. Programs in PAML do this normalization automatically for you.
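The recipe above (build $Q_{ij} = s_{ij} p_j$, fill the diagonal so rows sum to zero, then divide by the mean rate) can be sketched in plain Python as follows. This is a minimal, hypothetical sketch, not code from PAML or from the authors: all function names are illustrative. It assumes that a PAML-format file holds the 190 lower-triangle entries of $s_{ij}$, row by row, followed by the 20 frequencies, in the residue order A R N D C Q E G H I L K M F P S T W Y V, and that the frequencies you pass in sum to 1.

```python
from typing import List, Sequence, Tuple

# Assumed residue order of PAML-format files (hypothetical constant name).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
N = len(AMINO_ACIDS)  # 20 amino acids


def parse_paml_matrix(text: str) -> Tuple[List[List[float]], List[float]]:
    """Return (s, p) from the leading numbers of a PAML-style file.

    Assumes 190 lower-triangle entries of s_ij, row by row, then 20 frequencies;
    anything after those 210 numbers (e.g. comments) is ignored.
    """
    n_tri = N * (N - 1) // 2
    tokens = text.split()[: n_tri + N]
    if len(tokens) != n_tri + N:
        raise ValueError(f"expected {n_tri + N} numbers, found {len(tokens)}")
    values = [float(t) for t in tokens]
    s = [[0.0] * N for _ in range(N)]
    k = 0
    for i in range(1, N):  # row i holds s_i0 ... s_i,i-1
        for j in range(i):
            s[i][j] = s[j][i] = values[k]  # fill both halves: s is symmetric
            k += 1
    return s, values[n_tri:]


def empirical_frequencies(seqs: Sequence[str]) -> List[float]:
    """Amino acid frequencies from your own data; gaps and ambiguity codes are skipped."""
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    counts = [0] * N
    for seq in seqs:
        for ch in seq.upper():
            k = index.get(ch)
            if k is not None:
                counts[k] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [c / total for c in counts]


def rate_matrix(s: Sequence[Sequence[float]], p: Sequence[float]) -> List[List[float]]:
    """Q_ij = s_ij * p_j for j != i, and Q_ii = -sum_{j != i} Q_ij (rows sum to zero)."""
    n = len(p)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i:
                q[i][j] = s[i][j] * p[j]
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def normalise(q: Sequence[Sequence[float]], p: Sequence[float]) -> List[List[float]]:
    """Divide Q by sum_i sum_{j != i} p_i Q_ij so that the mean rate of replacement is 1."""
    n = len(p)
    mean_rate = sum(p[i] * q[i][j] for i in range(n) for j in range(n) if j != i)
    return [[x / mean_rate for x in row] for row in q]
```

For example, after `s, p = parse_paml_matrix(open(path).read())`, where `path` is wherever you saved the downloaded file, `normalise(rate_matrix(s, p), p)` gives a matrix with mean rate 1. To use the frequencies of your own alignment instead, pass `empirical_frequencies(your_sequences)` as `p` to both calls; the same `p` must be used for building and for normalising $Q$.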