SSM (MSDfold) beginner tutorial
Introduction
This tutorial explains the purpose and functionality of the MSD web service SSM (Secondary Structure Matching). It describes the server structure and all basic control elements. After reading this tutorial you will be able to use SSM to examine a given structure for similarity with another structure or structures, either found in the PDB or SCOP archives or uploaded from your desktop. You will also understand the basic scores SSM uses to evaluate structure similarity, and how to identify similar parts in the compared structures. Reading this tutorial should allow you to start using SSM successfully; more specific information on different aspects of working with the server is found in the on-line help, available from every web page displayed.
Technical preconditions
To work with SSM successfully, you need a modern web browser (we recommend Netscape 6.XX or later, or Microsoft IE 5.X or later). It is necessary to make sure that your browser does not use its cache (see instructions here). It is also necessary to have JavaScript enabled (which is a standard setting). In order to visualize structures, you will need Rasmol or Rastop installed on your desktop as a plug-in or helper application to your browser. Please see the instructions on using visualization software in SSM.
Purpose and limitations
SSM is a generic tool for alignment of protein structures in 3D. It allows for
The service is limited to proteins with expressed secondary structure motifs, therefore it cannot be applied to
Server structure and general procedure
The server structure is presented in Figure 1. It has two main streams: a) pairwise alignment in 3D and database searches, and b) multiple 3D alignment. In pairwise 3D alignment, a query structure is aligned to a target structure or to the structures from a database. This alignment is used for finding and analysing structural homologs. SSM first returns the list of found matches, that is, alignments with a similarity score above a threshold set in advance. Details of matches are available on separate pages linked from the result list. Selected structures from the result list may be submitted to multiple 3D alignment. In multiple 3D alignment, more than 2 structures are aligned to each other simultaneously. This alignment is used for identifying common structural motifs in sets of structures or in structural families. Submission to multiple 3D alignment may be done from a special submission page or from the list of results of a pairwise alignment. Details of a multiple alignment are returned in a single page.
Submission of a query to SSM initiates costly calculations, which are performed in parallel on several CPUs. During the calculation (normally less than a minute for a typical query), SSM displays a self-updating wait page, which is automatically replaced by the result page upon completion. In rare instances, when a query takes considerable time (as indicated by the progress indicator in the wait page), you may bookmark the wait page and switch your browser to other tasks. Calling the bookmarked wait page later on will bring you directly to your results. SSM keeps results for up to 2 days or 4 hours since the last access, whichever is sooner. Each submission is identified by a unique session ID, which is assigned automatically by the server. Session IDs are displayed in the wait page and, at repeated submissions, in the submission pages. You may submit new queries in the same session, which saves space on our server but erases all previous results in that session. In order to keep the results of a particular session, new queries should be submitted with new session IDs (this option is found in the submission pages). Individual session results are accessible only through bookmarked wait pages, so we advise you to bookmark them whenever you anticipate working in more than 1 session simultaneously. Results of pairwise and multiple alignments do not interfere with each other: performing a multiple alignment in a session does not erase the existing results of a pairwise alignment, and vice versa.
Getting started
Click button
A detailed description of all control elements may be obtained from
SSM on-line help (follow the link
Default parameters in the submission page are good for most of the
typical queries. Suppose we'd like to find structural neighbours
in PDB archive for PDB entry
Reading the result list
Figure 3 shows an example of the result list obtained
for the query described above. A detailed description of all figures in
the table is available from the SSM on-line help (follow the link
The list of matches is ordered by one of several available similarity
scores, so that matches with higher similarity are found at the top
of the list (you may reorder the list using the "Sort by" selector at
the bottom of the page). The first position in the list is always occupied by an
identical match, if such a match has been found in the database. By
default, matches are ordered by decreasing Q-score (cf.
SSM on-line help). This score ranges from 0 ("completely dissimilar"
structures) to 1 (identical structures). In general, the higher Q is,
the higher the similarity. Values of
Q ≥ 0.1 correspond to a reasonable similarity,
easily identifiable by visual inspection. All matches to
P-score represents
Root-mean-square deviation at best structure superposition (RMSD) and the number of aligned residues (Nalgn) are traditional characteristics for evaluating structural similarity. In general, they are not very indicative, as it is almost always possible to achieve longer alignments at higher RMSD. We therefore recommend using the Q-score for measuring structural similarity: it takes both RMSD and alignment length into account and depends less on the particular compromise between them accepted by the alignment algorithm (see more details in the SSM on-line documentation).
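To make the trade-off between RMSD and alignment length concrete, the sketch below evaluates a Q-score for two made-up alignments. The formula is not given in this tutorial. It follows the definition published for SSM, Q = Nalgn² / ((1 + (RMSD/R0)²) · N1 · N2), with R0 = 3 Å and N1, N2 the numbers of residues in the two structures, and it should be checked against the SSM on-line documentation. The function name and all numbers are illustrative assumptions.

```python
def q_score(n_align: int, rmsd: float, n_res_1: int, n_res_2: int,
            r0: float = 3.0) -> float:
    """Q-score sketch: Nalgn^2 / ((1 + (RMSD/R0)^2) * N1 * N2).

    r0 = 3.0 (angstroms) is assumed from the published SSM description,
    not taken from this tutorial.
    """
    if n_res_1 <= 0 or n_res_2 <= 0:
        raise ValueError("both structures must contain at least one residue")
    if n_align < 0 or n_align > min(n_res_1, n_res_2):
        raise ValueError("aligned length must lie between 0 and min(N1, N2)")
    return (n_align ** 2) / ((1.0 + (rmsd / r0) ** 2) * n_res_1 * n_res_2)


# Two hypothetical alignments of the same pair of 100-residue structures:
# a shorter one at low RMSD and a longer one at high RMSD.
short_tight = q_score(n_align=50, rmsd=3.0, n_res_1=100, n_res_2=100)
long_loose = q_score(n_align=80, rmsd=6.0, n_res_1=100, n_res_2=100)
identical = q_score(n_align=100, rmsd=0.0, n_res_1=100, n_res_2=100)
```

In this made-up case the two alignments differ strongly in both RMSD and Nalgn, yet their Q-scores come out nearly equal (0.125 and 0.128), which illustrates why a combined score is easier to compare than either number alone.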
Control elements at the bottom of the page allow you (please find more details in the SSM on-line documentation)
Details of structure alignment
Highlighted match numbers in the leftmost column of the result list (cf. Figure 3) are links to pages with more details on individual matches. Following such a link, you get to a page similar to the one shown in Figure 4. The page starts and ends with navigation buttons that let you move up and down the results without going back to the result list. The first table presents an extended version of the summary data shown in the result list: scoring, titles, the total number of residues and SSEs, and their matched percentage for both structures.
The set of buttons following the summary table provides visualization
and download of the structures. The left and right
Below the row of visualization and download buttons comes the data on secondary structure alignment. The first two lines give the match topology, which allows you to quickly identify which secondary structure elements have been aligned (i.e. occupy geometrically equivalent positions). The following table contains details of all matched SSEs. Entry
2|H1 13|A|LEU 24 |LEU 36 |
means "2nd (from the beginning of the chain) SSE in the structure,
helix of class 1 (in PDB notation), 13 residues long, chain A,
running from Leu24 to Leu36". Strands are denoted by "SD".
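If you copy such entries out of the page, the reading given above can be sketched as a small parser. This is only an illustration: the exact spacing and separators of the entry are assumed from the single example shown here, the form of a strand entry ("SD" with no class number) is a guess, and the names `SSEEntry` and `parse_sse_entry` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern assumed from the one example in the text: "2|H1 13|A|LEU 24 |LEU 36 |"
_SSE_ENTRY = re.compile(
    r"^\s*(?P<index>\d+)\s*\|\s*(?P<kind>H|SD)(?P<cls>\d*)\s+(?P<length>\d+)\s*\|"
    r"\s*(?P<chain>\w)\s*\|\s*(?P<start_name>[A-Z]{3})\s*(?P<start_num>-?\d+)\s*\|"
    r"\s*(?P<end_name>[A-Z]{3})\s*(?P<end_num>-?\d+)\s*\|?\s*$"
)


@dataclass
class SSEEntry:
    index: int                  # position of the SSE counted from the chain start
    kind: str                   # "helix" or "strand"
    helix_class: Optional[int]  # PDB helix class, None when absent
    length: int                 # number of residues in the SSE
    chain: str                  # chain identifier
    start: tuple                # (residue name, residue number) of the first residue
    end: tuple                  # (residue name, residue number) of the last residue


def parse_sse_entry(text: str) -> SSEEntry:
    """Parse one SSE table entry; raises ValueError if the layout differs."""
    match = _SSE_ENTRY.match(text)
    if match is None:
        raise ValueError(f"unrecognised SSE entry: {text!r}")
    cls = match.group("cls")
    return SSEEntry(
        index=int(match.group("index")),
        kind="helix" if match.group("kind") == "H" else "strand",
        helix_class=int(cls) if cls else None,
        length=int(match.group("length")),
        chain=match.group("chain"),
        start=(match.group("start_name"), int(match.group("start_num"))),
        end=(match.group("end_name"), int(match.group("end_num"))),
    )


entry = parse_sse_entry("2|H1 13|A|LEU 24 |LEU 36 |")
```

For the example entry, the residue range agrees with the stated length: 36 - 24 + 1 = 13 residues.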
The SSE alignment table is followed by links to available web resources for the query and target structures, and by the visualization and sequence download buttons. The links are based on the annotation found in the PDB or uploaded files, so they are absent if you upload bare coordinates. The download buttons download sequences for the query and target structures in FASTA format. The visualization buttons act similarly to those described above, except that here the colouring is applied to secondary structure elements rather than individual residues.
The SSE alignment table is followed by rotation-translation matrix
of best structure superposition and download buttons for the page
content. The structure superposition matrix is that applied to
all
The match page ends with the table of C-alpha (residue)
alignment. It shows the correspondence between query and target
residues at best structure superposition, as given by the
superposition matrix above. Unmatched pairs (too distant or without
a partner) are shown in black, pairs of matched identical residues
in red, and matched non-identical residues in cyan. The middle
column shows the distance between matched C-alphas together with a mnemonic
equivalent for a quick grasp. Additional columns to the left of the
residue IDs show the respective SSE ("
Multiple alignment in 3D
Multiple alignment in 3D is used for identifying common structural
motifs in more than 2 structures. The task often arises after
performing a search for structural neighbours in databases like
the one described above on the example of PDB entry
Alternatively, you may proceed directly to the submission page
for multiple alignment in 3D (Figure 6)
by setting a checkmark on the "
After you submit to multiple alignment, SSM takes you to a self-updating wait page showing the calculation progress. Typical submissions of up to 10 structures normally finish in under a minute. In general, the calculation time is proportional to the square of the number of structures. Although we do not impose a fixed limit on the number of multiply aligned structures, it is limited by computer memory and CPU cost. In our experience, SSM copes well with submissions of as many as 40-60 structures, but it then produces a bulk of data (see below) that is often difficult for a human to rationalize. It is therefore advisable to keep submissions as sensible as possible. Note that multiple alignment does not reduce to joining the results of pairwise alignments. In many instances where structure similarity is not high, you may discover that SSM aligns structures differently in pairwise and multiple alignments. Keep in mind that, with default settings, SSM delivers only the best matches in pairwise alignment. However, the best-matching parts of any two structures may be non-optimal match candidates for other structures in the list. SSM calculates an optimal multiple match using carefully chosen criteria, which are described in the respective publication.
Results of multiple alignment in 3D
Results of multiple alignment are similar to those of pairwise
alignment (described above). The results
are presented in a single page. For convenience of further discussion,
results of multiple alignment of structures
The summary of the results (Figure 7) shows general
characteristics of the structures and the alignment: the number of total/aligned
residues and SSEs, and the overall and consensus RMSD and Q-score. "Consensus"
refers to a virtual structure which, if it existed, would be the best
approximation of all aligned structures. Consensus scores give
you an idea of whether there is a core group of structures that are
closest to each other. In our example,
Overall scores represent a mere average of all n(n-1)/2 pairwise scores (where n stands for the number of structures). These scores indicate how diverse the structures are in general. Two download buttons at the top of the page let you get the page content in plain text or XML format. The XML file is recommended for data exchange between different software applications, because it preserves backward compatibility should SSM be modified in the future.
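The averaging behind the overall scores can be sketched as follows. The matrix layout (a symmetric table of pairwise scores with an arbitrary diagonal), the function name `overall_score` and the numbers are assumptions made for illustration; they are not SSM output.

```python
from itertools import combinations
from typing import Sequence


def overall_score(pairwise: Sequence[Sequence[float]]) -> float:
    """Plain average of the n(n-1)/2 scores above the diagonal."""
    n = len(pairwise)
    if n < 2:
        raise ValueError("at least two structures are needed")
    pairs = list(combinations(range(n), 2))  # every unordered pair (i, j), i < j
    assert len(pairs) == n * (n - 1) // 2
    return sum(pairwise[i][j] for i, j in pairs) / len(pairs)


# Hypothetical pairwise Q-scores for three structures (symmetric matrix).
q_table = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.4],
    [0.6, 0.4, 1.0],
]
overall_q = overall_score(q_table)  # mean of 0.8, 0.6 and 0.4
```

Because every pair contributes equally, a single distant structure lowers the overall score even when the remaining structures are very close to each other, which is why the consensus scores are reported alongside it.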
The
The
Multiple alignment of secondary structures is given in the next part
of the page (Figure 9). Only schematic alignment
is given, all secondary structure details are available by pushing
the
More details on structural similarity may be obtained from the tables of
pairwise scores (Figure 10). Analysis of the
tables shows that indeed, first three structures in the list are close
to each other, with other two staying apart. The tables also show
that the consensus-remote
Next part of the result page presents matrices of best structure
superposition. The matrices are applied automatically when SSM
sends Rasmol scripts or downloads the structures in response to
pushing the
The results end with the residue alignment table
(Figure 12). The table shows unmatched
(not aligned) residues in black and identical aligned residues
in red; other residues are shown in green if the residue type is
identical in most of the structures, and in cyan otherwise.
The columns to the left of the residue ID indicate the secondary
structure element, "