SSM vs. others: General conclusions
Materials from this page cannot be reproduced without permission from the authors.
Comparisons were made in November 2002, using the then-current versions of VAST, CE, DALI, DEJAVU and SSM v1.22 of 20/11/2002.
In our comparative analysis, we tried to evaluate the quality of SSM results
in comparison with a few, probably the best known, public resources of
similar functionality:
VAST (NCBI/NIH, USA),
CE (SDSC, USA),
DALI (EBI, UK) and
DEJAVU (Uppsala, Sweden). The comparison was done
for the following 9 structures from the PDB archive:
1GPU:A, 1TIG and
4DFR:A. The structures were chosen at
random; the only two requirements were to cover a wide range of protein sizes
and to sample different protein families. Certainly, the chosen structures
do not represent all possible situations. It is not feasible to perform an
exhaustive analysis, and therefore our results should be taken only as
very general indications that need not apply in every
particular case. We are nevertheless convinced that many general regularities
have been touched upon.
We were interested mainly in two aspects of SSM results: the quality of its 3D alignments and the quality of statistical significance evaluation for the matches. Below we summarize our findings, which, we hope, indicate the place of SSM among other servers, as well as the degree of their agreement with each other.
Quality of 3D alignment
3D alignments are characterized by the alignment length, RMSD and, sometimes,
by the number of gaps or fragments. Since the last two characteristics are
not always available, we did not consider them in our analysis. Although
it is possible to get the lists of matched residues from the servers, we
have not studied those systematically either, as such data are
difficult to retrieve automatically.
Unless the matched structures are highly similar, there is always a compromise between the alignment length (the number of corresponding pairs of Ca-atoms) and RMSD. Formally speaking, for structures that are not very similar it is (almost) always possible to add a pair of Ca-atoms to the set of corresponding atoms at the expense of an increased RMSD. Therefore, an alignment algorithm needs a set of rules for matching the Ca-atoms, based on RMSD and probably other criteria (such as not matching Ca-atoms from different types of secondary structure elements, chain bends etc.). It appears that different servers employ different criteria for resolving the compromise between alignment length and RMSD. Moreover, these criteria are complex enough that the servers show little monotonicity in the comparison. That is, it is not always possible to say that, e.g., one server systematically favours longer alignments at higher RMSDs than another. For example, VAST produces significantly shorter alignments than SSM in the case of 1SAR:A (cf. Figure 1SAR:A-1). For the same structure, CE gives systematically longer alignments than SSM (cf. Figure 1SAR:A-6). However, in the case of 1GTH:A SSM agrees equally well with both VAST and CE (cf. Figures 1GTH:A-1 and 1GTH:A-6). A similar non-monotonicity is often seen in alignments made by the same server. E.g., VAST offers longer alignments than SSM for highly similar structural neighbours of 4DFR:A, but gives considerably shorter alignments for the remote ones (cf. Figure 4DFR:A-1). Looking through the results of the comparisons, one can pick out many examples of non-monotonicity. There are also examples of good agreement (see, e.g., Figure 1COL:A-1).
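The compromise between alignment length and RMSD can be illustrated with a minimal sketch. It assumes a fixed superposition and a made-up list of Ca-Ca distances (real servers re-superpose the structures whenever the set of matched pairs changes, so this is only a simplification); the function name and the numbers are hypothetical.

```python
import math

def rmsd(distances):
    """RMSD over corresponding Ca pairs, given their distances in angstroms."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# Hypothetical Ca-Ca distances after superposition, sorted from best to worst pair.
pair_distances = [0.4, 0.6, 0.9, 1.3, 2.0, 3.1, 4.5]

# Grow the alignment one pair at a time: the length goes up, and so does RMSD.
tradeoff = [(n, rmsd(pair_distances[:n])) for n in range(1, len(pair_distances) + 1)]
for n, r in tradeoff:
    print(f"N_align={n}  RMSD={r:.2f}")
```

Where a server stops adding pairs along such a curve is precisely the server-specific rule discussed above.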
Bearing in mind the above, we nevertheless conclude that one can expect SSM alignments to be somewhat longer than those from VAST, somewhat shorter than CE's, approximately the same as DALI's and substantially longer than those produced by DEJAVU. As a rule, servers also seem to agree better with each other for larger structures (compare results for 1TIG, 88 residues, and 1GTH:A, 1019 residues). Comparison with DEJAVU is often confusing. As a rule, DEJAVU does not recognize the input structure, finds just a few or none of its closest structural neighbours, and instead offers a number of matches with remote structures, giving short alignments with low RMSDs. DALI and CE also do not always recognize the input, which is probably due to outdated databases. DALI sometimes shows quite specific problems, such as not fully aligning a structure to itself (see Figure 1LDC:A-10).
The root mean square deviations (RMSDs) given by different servers show a similar degree of variance and non-monotonicity as the alignment lengths (follow the references above). However, we found that RMSDs correlate remarkably well with the alignment lengths: the longer the alignment, the larger the corresponding RMSD. The dependence between RMSD and alignment length should be server-specific because of the differences in the servers' alignment algorithms. In order to show the correlation between alignment length and RMSD, we calculated the match index for all alignments that we analysed. The match index is a quality characteristic of a 3D alignment, used by SSM for choosing better alignments. It is a monotonic function of both alignment length and RMSD: it increases with increasing alignment length and/or decreasing RMSD, and ranges from 0+ (very large RMSDs or short alignments) to 1 for an ideal alignment (all residues of both structures are matched at zero RMSD; clearly, the match index reaches 1 only for identical structures). The match index is therefore an indicator of the principal quality of an alignment as derived from the alignment length and RMSD. It shows the balance between those two; the higher the match index, the higher the quality of the 3D alignment (longer alignments at similar RMSDs).
It appears that all servers produce 3D alignments with remarkably close match indexes. In many instances, match indexes almost coincide despite considerable deviations in alignment lengths and RMSDs (compare, e.g., Figures 1COL:A-12, 1COL:A-10 and 1COL:A-11; another example: Figures 4DFR:A-8, 4DFR:A-6 and 4DFR:A-7). Although different servers show certain deviations in match indexes as well, the overall agreement between servers on this particular characteristic is so good that it cannot be purely circumstantial. We therefore conclude that the quality of a 3D Ca-alignment cannot be measured by the alignment length or RMSD alone. It can be measured only by a composite characteristic that takes both of them into account, and the match index seems to be a good candidate for that.
With respect to the match index, SSM agrees best with VAST, CE and DALI. Looking through all the examples, one may conclude that SSM gives a slightly higher match index on average, especially for the remote structural neighbours. The latter may be a reason for the poorer agreement with DEJAVU, because it offers mostly remote structures in every output we have from it. The fact that SSM produces alignments with somewhat higher match indexes should not necessarily be taken as an indication of SSM's superiority in quality. The match index is simply what SSM is tuned to maximize, while other servers are, effectively, optimizing different, although (apparently) similar functionals.
It should be noted that the match index does not measure structural similarity, and, in general, neither does alignment length or RMSD alone. Being essentially a ratio of alignment length and RMSD, the match index may, hypothetically, take similar values for a longer alignment with a larger RMSD and for a shorter alignment with a smaller RMSD. It is probably correct to say that higher structural similarity is indicated by a longer alignment at a higher match index.
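This page does not give the formula of the match index. The sketch below therefore assumes one form that has all the properties described above (the form later published for SSM's Q-score, with a scaling constant R0 of 3 angstroms); whether SSM v1.22 used exactly this expression is an assumption, and the function name and example numbers are hypothetical.

```python
def match_index(n_align, rmsd, n1, n2, r0=3.0):
    """Assumed form: N_align^2 / ((1 + (RMSD/R0)^2) * N1 * N2).

    n1 and n2 are the residue counts of the two structures; r0 (angstroms)
    is an assumed scaling constant, not a value taken from this page.
    """
    if n_align > min(n1, n2):
        raise ValueError("alignment cannot be longer than the shorter structure")
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n1 * n2)

# Identical structures: everything matched at zero RMSD gives the maximum value.
ideal = match_index(100, 0.0, 100, 100)

# A longer, looser alignment and a shorter, tighter one can score almost alike.
long_loose = match_index(90, 3.0, 100, 100)
short_tight = match_index(64, 0.3, 100, 100)
print(ideal, long_loose, short_tight)
```

The last two calls illustrate why the match index alone does not rank structural similarity: two quite different alignments land on nearly the same value.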
Statistical significance of individual matches is traditionally measured
by the P-value, that is, the probability of getting the same or better quality
of 3D alignment when looking through the database at random, i.e. simply by chance.
Another representation of statistical significance is given by the
Z-score. The Z-score is calculated from the assumption that the quality of 3D
alignment, when picking structures from the database at random, obeys
Gaussian statistics. The Z-score is then associated with the P-value by the
following kind of relation: P = erfc(Z/2),
which may differ in details. Obviously, different servers differ in the way
they calculate P-values and Z-scores. For Combinatorial Extension (CE),
that is clearly explained in Ilya Shindyalov and P. Bourne (1998),
Protein Engng. 11, N9, pp.739-747. We are grateful to Dr. Stephen H. Bryant
(NCBI/NIH) for explaining this matter in relation to VAST in a personal
communication. Technical details of DALI, including the scoring, may be
found in Liisa Holm and Chris Sander (1993), J.Mol.Biol. 233, 123-138.
DEJAVU is described in G.J. Kleywegt & T.A. Jones (1997), Methods in
Enzymology 277, 525-545. SSM does not copy any of the above for its scoring;
however, it borrows ideas from CE and VAST; the corresponding description
is in preparation.
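The relation between Z-score and P-value quoted above can be sketched as follows. The first function reproduces the relation exactly as written on this page; the second is the textbook one-sided tail of a standard Gaussian, shown only to illustrate how the "details" may differ. Both function names are hypothetical, and neither is claimed to be what any particular server computes.

```python
import math

def p_from_z(z, scale=2.0):
    """P = erfc(Z / scale); scale=2 gives the relation as written on this page."""
    return math.erfc(z / scale)

def gaussian_upper_tail(z):
    """Textbook one-sided tail of a standard normal: 0.5 * erfc(Z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

for z in (0.0, 2.0, 4.0, 8.0):
    print(f"Z={z:4.1f}  P(page)={p_from_z(z):.3e}  P(tail)={gaussian_upper_tail(z):.3e}")
```

Whatever the exact form, P decreases monotonically as Z grows, so either quantity gives the same ordering of matches within one server.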
In our experiments comparing SSM with other servers, we discovered considerable diversity in the estimates of statistical significance. Just as discussed above for the alignment length and RMSD, P-values and Z-scores compare in a very non-monotonic way. For example, Figures 1COL:A-4 and 1COL:A-5 show the best agreement between SSM and VAST in P-values and Z-scores that we obtained. Figures 1GPU:A-4 and 1GPU:A-5 show a good agreement in P-values but disagreement in the absolute values of Z-scores (the Z-scores do, however, agree nicely in trends). In the case of 1SAR:A, both P-values and Z-scores given by SSM and VAST differ in absolute values (cf. Figures 1SAR:A-4 and 1SAR:A-5). This situation is very common. The reasons for it should be sought in the different definitions of P-values and Z-scores employed by different servers, as well as in the different datasets used for calibrating the score functions.
Nevertheless, from our comparison we conclude that, relative to VAST, SSM can be expected to give lower P-values for more similar structures and about the same or lower P-values for dissimilar ones. On average, SSM's Z-scores are lower than those from VAST. It is probably VAST that SSM compares best with in the evaluation of statistical significance of the matches. Z-scores produced by CE correlate better with those from SSM if the former are multiplied by 2. In comparison with DALI, SSM gives lower Z-scores for more similar structures and higher Z-scores for remote structures. Interestingly enough, in most cases Z-scores from DALI compare much better with the negative logarithm of SSM's P-values. On average, SSM gives higher P-values than those from DEJAVU, while DEJAVU produces higher Z-scores. We find it confusing that in many instances (cf., e.g., Figure 1GTH:A-17) DEJAVU assigns zero or very small P-values to matches with remote structures. A zero P-value means that a match of better quality cannot be obtained; such a match, by definition, should be the one with the structure itself.
Given the range of disagreements between all the servers, it seems that no standard can be offered. It is, however, possible to come to a common basis in the statistical evaluations if one neglects the absolute figures for P-values and Z-scores and concentrates on their relative changes. We found that the very general trend (increasing P-values and decreasing Z-scores with increasing dissimilarity of the structures) is always there, so that P-values and Z-scores may be used for ranking the matches. We also found that all servers differ in small details of this ranking, and agree only in the identification of very similar and very dissimilar structures, sometimes also allowing the identification of intermediate ones (see, e.g., Figures 4DFR:A-4 and 4DFR:A-5).
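One way to compare servers on ranking alone, ignoring the absolute figures, is a rank correlation. The sketch below computes Spearman's coefficient for two lists of Z-scores; the scores are invented for illustration and do not come from any of the servers, and this statistic is offered as an example rather than as the method used in our comparison.

```python
import math

def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical Z-scores from two servers for the same five matches.
z_server_a = [25.1, 14.0, 9.3, 6.2, 3.8]
z_server_b = [40.5, 12.7, 15.9, 5.0, 4.1]
rho = spearman(z_server_a, z_server_b)
print(f"Spearman rho = {rho:.2f}")
```

In this made-up example the absolute Z-scores differ considerably, yet the two rankings disagree only in the order of two intermediate matches, which is the kind of agreement described above.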
In principle, it seems that statistical significance should be a better
indicator of structural similarity than the quality of 3D alignment as given
by the match index or
similar quality characteristics. The match index essentially expresses the
balance between alignment length and RMSD, which may take similar
values for long and short alignments, depending on the corresponding
RMSDs. The P-value and Z-score also depend on the quality of the 3D alignment.
If calculated correctly, statistical significance should go down with
decreasing alignment length and/or increasing RMSD. On top
of that, however, a match becomes less surprising if a smaller part of the structures
is identified as the common one. This means that P-values and Z-scores
penalize short alignments additionally, which should make them more
appropriate measures of structural similarity. We do, however, see a problem with
the lack of uniformity in statistical significance evaluations between different
servers (probably concepts) and their dependence on the calibration database.
Therefore we conclude that structural similarity has a relatively vague
definition, if any, which remains up to the context of a particular