SSM (MSDfold) beginner tutorial

Introduction

This tutorial explains the purpose and functionality of the MSD web service SSM (Secondary Structure Matching). It describes the server structure and all basic control elements. After reading this tutorial you will be able to use SSM to examine a given structure for similarity with one or more other structures, either found in the PDB or SCOP archives or uploaded from your desktop. You will also understand the basic scores that SSM uses to evaluate structure similarity, and how to identify similar parts in the compared structures. This tutorial should be enough to start using SSM successfully; more specific information on different aspects of working with the server is found in the on-line help, available from every web page displayed.

  1.  Technical preconditions
  2.  Purpose and limitations
  3.  Server structure and general procedure
  4.  Getting started
  5.  Reading the result list
  6.  Details of structure alignment
  7.  Multiple alignment in 3D
  8.  Results of multiple alignment in 3D

Technical preconditions

For working with SSM successfully, you need a modern web browser (we recommend Netscape 6.XX or later, Microsoft IE 5.X or later). It is necessary to make sure that your browser does not use cache (see instructions here). It is also necessary to have JavaScript enabled (which is a standard setting).

In order to visualize structures, you will need Rasmol or Rastop installed on your desktop as a plug-in or helper application to your browser. Please see instructions on using visualization software in SSM.


Purpose and limitations

SSM is a generic tool for alignment of protein structures in 3D. It allows for

  • pairwise comparison and 3D alignment of protein structures
  • multiple comparison and 3D alignment of protein structures
  • examination of a protein structure for similarity with the PDB, SCOP or user-supplied archives
  • best Ca-alignment of compared structures
  • best superposition of aligned structures
  • download and visualization of best-superposed structures using RasMol (Unix/Linux platforms) or RasTop (MS Windows machines)
  • linking the results to other services - OCA, SCOP, GeneCensus, FSSP, 3Dee, CATH, PDBSum, SWISS-PROT and ProtoMap.

The service is limited to proteins with expressed secondary structure motifs, therefore it cannot be applied to

  • small molecules or ligands
  • non-protein objects such as DNA chains
  • short protein chains or short selections from protein chains that do not contain secondary structure elements
  • protein structures represented only by backbone Ca-atoms


Server structure and general procedure

The server structure is presented in Figure 1. It has two main streams: a) pairwise alignment in 3D and database searches, and b) multiple 3D alignment.

In pairwise 3D alignment, a query structure is aligned to a target structure or to the structures from a database. This alignment is used for finding and analysing structural homologs. SSM first returns the list of found matches, that is, alignments with a similarity score above a threshold set in advance. Details of matches are available on separate pages linked from the result list. Selected structures from the result list may be submitted to multiple 3D alignment.

In multiple 3D alignment, more than 2 structures are aligned simultaneously to each other. This alignment is used for identifying common structural motifs in sets of structures, or in structural families. Submission to multiple 3D alignment may be done from a special submission page or from the list of results of pairwise alignment. Details of multiple alignment are returned in a single page.

 
 
Figure 1. SSM server structure

Submission of a query to SSM initiates costly calculations, which are performed in parallel on several CPUs. During the calculation (normally less than a minute for a typical query), SSM displays a self-updating wait page, which is automatically replaced by the result page upon completion. In rare instances, when a query takes considerable time (as shown by the progress indicator in the wait page), you may bookmark the wait page and switch your browser to other tasks. Opening the bookmarked wait page later will bring you directly to your results. SSM keeps results for up to 2 days, or for 4 hours since the last access, whichever is sooner.
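
The retention rule can be written down as a small calculation. The following is only a sketch, assuming that the 2-day limit is counted from the moment of submission (the tutorial does not say so explicitly); the function name and the example dates are hypothetical.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=2)    # "up to 2 days" (assumed: since submission)
MAX_IDLE = timedelta(hours=4)  # "4 hours since the last access"

def results_expire_at(submitted: datetime, last_access: datetime) -> datetime:
    """Return the earlier of the two retention limits."""
    return min(submitted + MAX_AGE, last_access + MAX_IDLE)

# Hypothetical session: submitted at 09:00, last opened at 15:30 the same day.
submitted = datetime(2024, 1, 1, 9, 0)
last_access = datetime(2024, 1, 1, 15, 30)
print(results_expire_at(submitted, last_access))  # 2024-01-01 19:30:00
```

In this example the 4-hour idle limit is reached first, so reopening the bookmarked wait page within that window is what keeps the results alive.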

Each submission is identified by a unique session ID, which is assigned automatically by the server. Session IDs are displayed in the wait page and, at repeated submissions, in the submission pages. You may submit new queries in the same session, which saves space on our server but erases all previous results in that session. In order to keep the results of a particular session, submit new queries with new session IDs (this option is found in the submission pages). Individual session results are accessible only through bookmarked wait pages, so we advise bookmarking them whenever you anticipate working in more than one session simultaneously. Results of pairwise and multiple alignments do not interfere with each other: performing multiple alignment in a session does not erase existing results of pairwise alignment, and vice versa.


Getting started

Click the SUBMISSION button in the SSM start page. It takes you to the submission page for pairwise 3D alignment (see Figure 2).

A detailed description of all control elements may be obtained from SSM on-line help (follow the link explanation of input in the page). Two radiobuttons at the top of the page switch between submission pages for pairwise and multiple alignments. The main part of the page is divided into two halves. The left half is used for the specification of the query structure (a PDB or SCOP entry, or an uploaded coordinate file in PDB or mmCIF format). The right half describes the target structure or structures. The type of query/target is set using the drop-down lists in the corresponding parts of the form.

 
 
Figure 2. Submission page for pairwise 3D alignment

Default parameters in the submission page are good for most of the typical queries. Suppose we'd like to find structural neighbours in PDB archive for PDB entry 4dfr. A typical course of actions is as follows:

  1. Select "PDB entry" as type of query.
  2. Set up "4dfr" in the "PDB code" input field (cf. Figure 2). The structure can now be visualized in Rasmol using the view link next to the field.
  3. If necessary, set up chains of the query structure (as a comma-separated list) to be examined in the "Chains" input field for the query. PDB entry 4dfr contains 2 virtually identical chains A and B, therefore it may make sense to examine only one of them. Pushing the button Find chains will fill the field with the IDs of all chains found in the entry, like "A,B,''" ("''" stands for a chain without a chain ID). For simplicity, a wildcard "*" may be given, which means "all chains". There are alternative ways to select a query substructure of interest, cf. SSM on-line documentation.
  4. Select "All PDB archive" as type of target.
  5. If necessary, adjust the similarity thresholds for query and target structures. Since some similarity may be found between any two structures, SSM needs to know what level of similarity is meaningful for your purposes in order to filter out meaningless matches. This makes the list of results shorter and more to the point; in addition, similarity-limited searches are considerably (a hundred times) faster. Similarity thresholds are set up separately for query and target structures as "Lowest acceptable match", in percent of overlapping secondary structure. For instance, thresholds of 70% for query and 50% for target mean that only matches where at least 70% of the query's SSEs are aligned to at least 50% of those of the target(s) are returned. It may be difficult to guess the right figures at the first try, and a repeat search may be needed. In some cases, SSM will advise you about the similarity thresholds.
  6. Check the state of remaining controls in the page. In particular, put check on "match individual chains" if you want SSM to align chains of each structure individually (thus, aligning structure A of 2 chains to structure B of 3 chains would produce 6 matches), otherwise leave it empty (then each structure is treated as a sole entity regardless of the number of chains in it). Other controls are rarely used with non-default settings. Please consult SSM on-line help in order to learn about them.
  7. Push "Submit" button. After a short while, you will be taken to self-updating wait page, which is automatically replaced by a result list when ready.
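
The effect of the similarity thresholds (step 5) and of the "match individual chains" option (step 6) can be illustrated with a short sketch. It is not SSM's implementation: the function names are hypothetical, and it assumes that a match pairs SSEs one-to-one, so that the same number of matched SSEs counts towards both the query and the target percentage.

```python
from itertools import product

def passes_thresholds(matched_sses, query_sses, target_sses,
                      query_min_pct=70.0, target_min_pct=50.0):
    """True if the match reaches the 'Lowest acceptable match' on both sides."""
    query_pct = 100.0 * matched_sses / query_sses
    target_pct = 100.0 * matched_sses / target_sses
    return query_pct >= query_min_pct and target_pct >= target_min_pct

def chain_pairs(query_chains, target_chains):
    """Chain-to-chain comparisons made when 'match individual chains' is checked."""
    return list(product(query_chains, target_chains))

# 7 of 10 query SSEs matched: 70% of the query and about 58% of a 12-SSE target.
print(passes_thresholds(7, 10, 12))   # True
# The same 7 SSEs cover only 35% of a 20-SSE target, so the match is filtered out.
print(passes_thresholds(7, 10, 20))   # False
# A 2-chain structure against a 3-chain structure gives 6 chain pairs.
print(len(chain_pairs(["A", "B"], ["A", "B", "C"])))  # 6
```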

Reading the result list

Figure 3 shows an example of the result list obtained for the query described above. A detailed description of all figures in the table is available from SSM on-line help (follow the link explanation of output in the page). The list in Figure 3 shows structural matches obtained for chain A of PDB entry 4dfr. To see results for other chains, or for all chains together, simply select the appropriate line in the drop-down chain list at the top of the page.

The list of matches is ordered by one of several available similarity scores, such that matches with higher similarity are found at the top of the list (you may reorder the list using the "Sort by" selector at the bottom of the page). The first list position is always occupied by an identical match, if such a match has been found in the database. By default, matches are ordered by decreasing Q-score (cf. SSM on-line help). This score ranges from 0 ("completely dissimilar" structures) to 1 (identical structures). In general, the higher the Q, the higher the similarity. Values of Q ≥ 0.1 correspond to a reasonable similarity, easily identifiable by visual inspection. All matches to 4dfr:A shown in Figure 3 are highly similar, which may be seen from data in individual match pages (cf. next section).

P-score represents -lg(Pv), where Pv is the P-value showing how surprising the similarity is. The P-value is estimated as the probability of finding a better match simply by chance when screening the database at random. The P-value has been calibrated on the non-redundant set of SCOP folds, which contains just under 1000 structures, and therefore P-scores ≤ 3 indicate matches that may be purely accidental. Z-score is a different representation of the P-value, cf. on-line help.
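
The relation between P-score and P-value is a base-10 logarithm, as in the short sketch below (the function names are illustrative only). A P-score of 3 corresponds to a P-value of 0.001, i.e. about one chance hit per thousand structures screened, which is the size of the calibration set mentioned above.

```python
import math

def p_score(p_value):
    """P-score as defined in the text: -lg(Pv), with lg the base-10 logarithm."""
    return -math.log10(p_value)

def p_value_from_score(score):
    """Inverse relation: Pv = 10 ** (-P-score)."""
    return 10.0 ** (-score)

print(round(p_score(1e-3), 6))     # 3.0
print(p_value_from_score(12.0))    # a P-score of 12 means Pv = 1e-12
```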

The root-mean-square deviation at best structure superposition (RMSD) and the number of aligned residues (Nalgn) are traditional characteristics for evaluating structural similarity. In general, they are not very indicative, as it is almost always possible to achieve longer alignments at higher RMSD. We therefore recommend using Q-score for measuring structural similarity: it takes both RMSD and alignment length into account and is less sensitive to the particular compromise between them accepted by the alignment algorithm (see more details in the SSM on-line documentation).
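
This tutorial does not give the Q-score formula. The sketch below uses the form published for SSM (Krissinel and Henrick, 2004), Q = Nalgn^2 / ((1 + (RMSD/R0)^2) * N1 * N2), where N1 and N2 are the numbers of residues in the two structures and R0 = 3 Angstrom. Both the formula and the value of R0 are taken from that publication, not from this page, so treat them as assumptions and check the on-line documentation for the authoritative definition.

```python
def q_score(n_aligned, rmsd, n_res_1, n_res_2, r0=3.0):
    """Q-score sketch: rewards long alignments and penalizes high RMSD.

    r0 (in Angstrom) is an assumed empirical parameter, see the text above.
    """
    return n_aligned ** 2 / ((1.0 + (rmsd / r0) ** 2) * n_res_1 * n_res_2)

# Identical structures: everything aligned at zero RMSD gives Q = 1.
print(q_score(159, 0.0, 159, 159))   # 1.0

# Two hypothetical alignments of the same pair of 150-residue structures:
# a shorter, tighter one and a longer, looser one.
print(q_score(100, 1.5, 150, 150))
print(q_score(120, 3.0, 150, 150))
```

With these made-up numbers the shorter alignment at lower RMSD receives the higher Q, which illustrates why Q is a more balanced measure than RMSD or Nalgn taken alone.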

 
Figure 3. List of results of the pairwise 3D alignment

Control elements at the bottom of the page allow you to do the following (more details are in the SSM on-line documentation):

  • Resort the results using one of the similarity scores.
  • Group matches by SCOP family of target structures (sorting order will be kept within the families and between them using best matches).
  • Scroll the result list to a particular match number (useful for long result lists).
  • Download the list as a plain text or XML file.
  • Download sequences of target structures in FASTA format.
  • Download aligned sequences in FASTA format.
  • Bring selected (checked in column "×") matches on the top of the list (useful for picking potentially sensible matches in long lists).
  • Submit target structures of selected matches, optionally with the query structure, for multiple 3D alignment (cf. below).
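
Downloaded FASTA files can be read with a few lines of code. The following is a minimal sketch of a generic FASTA reader; the header text and sequences in the example are made up, and the exact header format written by SSM is not described in this tutorial.

```python
def read_fasta(text):
    """Parse FASTA text into a dict {header: sequence}.

    Header lines start with '>'; sequence lines are concatenated.
    Gap characters ('-') in aligned sequences are kept as they are.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}

# Made-up example of two aligned sequences.
example = """>query
MISLI-AALA
VDRV
>target
MVSLIWA-LA
"""
for name, seq in read_fasta(example).items():
    print(name, seq)
```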


Details of structure alignment

Highlighted match numbers in the leftmost column of the result list (cf. Figure 3) are links to pages with more details on individual matches. Following such a link, you get to pages similar to one shown in Figure 4.

The page starts and ends with navigation buttons that let you move up and down the results without going back to the result list. The first table presents an extended version of the summary data shown in the result list: scoring, titles, total number of residues and SSEs, and their matched percentage for both structures.

The set of buttons following the summary table provides visualization and download of the structures. The left and right view buttons send Rasmol scripts for visualization of the query and target structures, respectively. The structures come best-superposed and coloured in gray for unmatched residues and in cyan for matched residues (the query structure comes in darker colours). The central view superposed button combines these scripts into one, so that you see the structures superposed as in Figure 5. The left download button downloads the original PDB file of the query structure; the right download button downloads the target structure superposed over the query.

 
Figure 4. Details of pairwise 3D alignment for individual matches

Below the row of visualization and download buttons comes the data on secondary structure alignment. The first two lines give the match topology, which makes it easy to see which secondary structure elements have been aligned (i.e. occupy geometrically equivalent positions). The following table contains details of all matched SSEs. Entry

    2|H1  13|A|LEU  24 |LEU  36 |
means "2nd (from the beginning of the chain) SSE in the structure, helix of class 1 (in PDB notations), 13 residues long, chain A, starting from Leu24 to Leu36". Strands are denoted by "SD".
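
If you need to process such entries in your own scripts, they can be split on the "|" separator. The sketch below parses only the example entry shown above; the field names are hypothetical, and the handling of other variants (for example residue numbers with insertion codes, or the exact layout of strand entries) is not covered because the tutorial does not describe them.

```python
from typing import NamedTuple

class SSE(NamedTuple):
    number: int      # serial number of the SSE from the beginning of the chain
    kind: str        # e.g. "H1" (helix of class 1) or "SD" (strand)
    length: int      # number of residues in the SSE
    chain: str       # chain ID
    start_res: str   # name of the first residue
    start_num: int   # sequence number of the first residue
    end_res: str     # name of the last residue
    end_num: int     # sequence number of the last residue

def parse_sse_entry(entry: str) -> SSE:
    """Parse an entry such as '2|H1  13|A|LEU  24 |LEU  36 |'."""
    fields = [field.strip() for field in entry.split("|")]
    kind, length = fields[1].split()
    start_res, start_num = fields[3].split()
    end_res, end_num = fields[4].split()
    return SSE(int(fields[0]), kind, int(length), fields[2],
               start_res, int(start_num), end_res, int(end_num))

sse = parse_sse_entry("    2|H1  13|A|LEU  24 |LEU  36 |")
print(sse)
# Consistency check: residues 24..36 inclusive make 13 residues.
print(sse.end_num - sse.start_num + 1 == sse.length)  # True
```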

The SSE alignment table is followed by links to available web resources for the query and target structures, and by the visualization and sequence download buttons. The links are based on the annotation found in the PDB or uploaded files, therefore they are absent if you upload bare coordinates. The download buttons download sequences for the query and target structures in FASTA format. The visualization buttons act similarly to those described above, but here the colouring is made for secondary structure elements rather than individual residues.

 
Figure 5. Aligned and superposed 4dfr:A and 1s3u:A, visualized in Rasmol.
The picture was obtained by pushing the view superposed button in page shown in Figure 4.

The SSE alignment table is followed by the rotation-translation matrix of best structure superposition and by download buttons for the page content. The structure superposition matrix is the one applied to all XYZ coordinates of the target structure (including the translation, as shown in the page) in order to superpose it over the query. When you visualize or download the target structure from the match page, SSM does that automatically for you; the matrix is shown only so that it can be used in other viewers or software. The download buttons to the left of the matrix download the match page in either plain text or XML format. For data exchange between different software applications, we recommend using the XML file, because parsing it is immune to possible future modifications of SSM.
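
Applying the matrix in your own software amounts to a rotation followed by a translation of every atom of the target structure. The sketch below assumes that the matrix is laid out as three rows of four numbers, with the rotation in the first three columns and the translation in the fourth; check this layout against the match page before relying on it. The matrix values used here are made up for illustration.

```python
def apply_superposition(matrix, xyz):
    """Transform one atom: new = R * old + T.

    matrix: three rows [Rx, Ry, Rz, T] (assumed layout);
    xyz: a tuple (x, y, z) of target-structure coordinates.
    """
    x, y, z = xyz
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3]
                 for row in matrix)

# Made-up example: a 90-degree rotation about Z plus a translation.
matrix = [
    [0.0, -1.0, 0.0, 10.0],
    [1.0,  0.0, 0.0,  0.0],
    [0.0,  0.0, 1.0, -5.0],
]
print(apply_superposition(matrix, (1.0, 2.0, 3.0)))  # (8.0, 1.0, -2.0)
```

The same function is applied to every atom of the target; the query coordinates are left unchanged.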

The match page ends with the table of C-alpha (residue) alignment. It shows the correspondence between query and target residues at best structure superposition, as given by the superposition matrix above. Unmatched pairs (too distant or without a partner) are shown in black, pairs of matched identical residues in red, and matched non-identical residues in cyan. The middle column shows the distance between matched C-alphas, with a mnemonic equivalent for a quick grasp. Additional columns to the left of the residue IDs show the respective SSE ("S" for strands and "H" for helices) and hydrophobicity ("+" for hydrophilic, "-" for hydrophobic and "." for neutral residues). The column "SI" shows a measure of chemical equivalence of matched residues as a number of dots from 0 (no chemical equivalence) to 5 (highly equivalent or identical residues). Please see more details in SSM on-line documentation (follow the link "notations" just above the table).


Multiple alignment in 3D

Multiple alignment in 3D is used for identifying common structural motifs in more than 2 structures. The task often arises after performing a search for structural neighbours in databases like the one described above on the example of PDB entry 4dfr. Having found a few structural neighbours to the query of interest, one may ask which structural elements are common for all of them. The simplest way of getting them aligned to each other is to put checkmarks on all structures of interest in the result list (cf. Figure 3) and push button Submit for Multiple Alignment in the bottom of the page. If the additional checkbox "include query" is checked (as shown in the Figure), the query structure is added into the list of multiply aligned structures.

Alternatively, you may proceed directly to the submission page for multiple alignment in 3D (Figure 6) by selecting the "multiple" radiobutton at the top of the submission page for pairwise alignment (see Figure 2). Specification of structures in the multiple alignment submission form is similar to that described above for pairwise alignment. The data on the right of the form correspond to the highlighted entry in the list of structures on the left, see Figure 6. A new entry in the list of structures is created by pushing the button New entry; input of data must be followed by pushing the button Actualize. To move up and down the list of structures, highlight the corresponding item in the list; unwanted entries may be deleted using the button Delete entry. Figure 6 shows a list of a few structures of varying remoteness from PDB entry 4dfr:A, picked from the result list shown in Figure 3.

 
Figure 6. Submission page for multiple alignment in 3D.

After submission to multiple alignment, SSM takes you to a self-updating wait page showing the calculation progress. Typical submissions of up to 10 structures normally finish in under a minute. In general, the calculation time is proportional to the square of the number of structures. Although we do not impose any limit in principle on the number of multiply aligned structures, it is limited by computer memory and CPU cost. In our experience, SSM copes well with submissions of as many as 40-60 structures; however, it then produces a bulk of data (see below) that is often difficult for a human being to make sense of. Therefore it is advisable to keep submissions as sensible as possible.
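
The quadratic scaling gives a rough way to anticipate waiting times. The sketch below is only an extrapolation: it assumes a reference point of 10 structures taking 60 seconds (the upper end of "under a minute") and ignores server load, structure size and memory limits, so real times may differ considerably.

```python
def estimated_seconds(n_structures, ref_structures=10, ref_seconds=60.0):
    """Scale an assumed reference time with the square of the number of structures."""
    return ref_seconds * (n_structures / ref_structures) ** 2

for n in (5, 10, 20, 40):
    print(n, "structures: about", estimated_seconds(n), "seconds")
```

Under these assumptions, a submission of 40 structures would take roughly 16 times longer than one of 10 structures.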

It should be noted that multiple alignment does not reduce to joining the results of pairwise alignments. In many instances where structure similarity is not high, you may discover that SSM aligns structures differently in pairwise and multiple alignments. Keep in mind that with default settings, SSM delivers only the best matches in pairwise alignment. However, the best-matching parts of any two structures may be non-optimal match candidates for other structures in the list. SSM calculates an optimal multiple match using carefully chosen criteria, which are described in the respective publication.


Results of multiple alignment in 3D

Results of multiple alignment are similar to those of pairwise alignment (described above). The results are presented in a single page. For convenience of further discussion, results of multiple alignment of structures 4dfr:A, 1dhi:B, 1dre:, 1dis: and 1j3i:B, corresponding to the submission shown in Figure 6, are presented in 5 parts below (Figures 7, 9-12).

 
Figure 7. Results of multiple alignment in 3D.
Part I: Summary.

The summary of the results (Figure 7) shows general characteristics of the structures and the alignment: number of total/aligned residues and SSEs, overall and consensus RMSD and Q-score. "Consensus" refers to a virtual structure which, if it existed, would be the best approximation of all aligned structures. Consensus scores give you an idea of whether there is a core group of structures that are closest to each other. In our example, 4dfr:A, 1dhi:B and 1dre: have lower RMSD and higher Qs, therefore they form such a core, or cluster, while 1dis: and 1j3i:B stay somewhat apart from them in terms of similarity distance. The extent of the common structure similarity identified in multiple alignment is limited by consensus-remote structures. As a rule, removing one of the core structures does not change the alignment significantly. On the contrary, removing a consensus-remote structure increases the identifiable common similarity. These considerations should be taken into account when multiple alignment is used as a tool for composing structure families.

Overall scores represent a simple average of all n(n-1)/2 pairwise scores (where n stands for the number of structures). These scores tell you how diverse the structures are in general.
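
The averaging can be sketched as follows. The structure names and pairwise values in the example are made up for illustration; they are not taken from the result page. At least two structures are needed, otherwise there are no pairs to average.

```python
from itertools import combinations

def overall_score(names, pairwise):
    """Plain average over all n(n-1)/2 unordered pairs of structures.

    pairwise maps frozenset({a, b}) to the pairwise score of a and b.
    """
    pairs = list(combinations(names, 2))
    total = sum(pairwise[frozenset(pair)] for pair in pairs)
    return total / len(pairs)

# Five structures give 5 * 4 / 2 = 10 pairs.
print(len(list(combinations(range(5), 2))))  # 10

# Made-up pairwise Q-scores for three structures.
names = ["s1", "s2", "s3"]
pairwise = {
    frozenset(("s1", "s2")): 0.9,
    frozenset(("s1", "s3")): 0.6,
    frozenset(("s2", "s3")): 0.3,
}
print(round(overall_score(names, pairwise), 3))  # 0.6
```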

Two download buttons on the top of the page are for getting the page content in plain text or XML format. XML file is recommended for data exchange between different software applications, because using it preserves backward compatibility at possible modifications of SSM in the future.

The view and download buttons in each row of the summary table send Rasmol script and download a PDB file for the respective structure, correspondingly. The structures are automatically oriented into best superposition with each other, as found by the multiple alignment procedure. Matched residues of the structures are colored in Rasmol script (SSM tries to pick a unique colour for each structure), unmatched residues are left in gray.

The view superposed button below the summary table sends a Rasmol script for visualizing selected structures (selection is done by setting the checkmark in the corresponding row of the summary table). Figure 8 shows the superposition of multiply aligned structures obtained in this example. Common parts of the structures are coloured, others are left in gray. The remaining button, download FASTA alignment, downloads (structurally) aligned sequences of structures selected in the summary table.

 
Figure 8. Aligned and superposed 4dfr:A, 1dhi:B, 1dre:, 1dis: and 1j3i:B, visualized in Rasmol.
The picture was obtained by pushing the view superposed button in page shown in Figure 7.

Multiple alignment of secondary structures is given in the next part of the page (Figure 9). Only a schematic alignment is given; all secondary structure details are available by pushing the SSE details button to the left of the table. As seen from the results, secondary structure motifs of 4dfr:A, 1dhi:B, 1dre:, 1dis: show higher similarity than that of 1j3i:B to any of them. As we saw from the consensus scores above, the first 3 structures form the alignment core. Figure 9 also suggests that 1dis: should closely resemble the core structures in general, and that 1j3i:B represents basically the same structure with a few additional elements.

 
Figure 9. Results of multiple alignment in 3D.
Part II: Secondary structure alignment.

More details on structural similarity may be obtained from the tables of pairwise scores (Figure 10). Analysis of the tables shows that, indeed, the first three structures in the list are close to each other, with the other two staying apart. The tables also show that the consensus-remote 1dis: and 1j3i:B do not form a separate structural group, as the distance between them is comparable to the distances between them and the core structures. Comparison of sequence identities (i.e. the fractions of identical residue types in matched pairs) shows that 1dis: has a distinctly different composition from the core structures, which share more than 98% of identical residues. This explains the lower consensus scores for this structure despite its overall good resemblance to the first three structures. It is also seen that 1j3i:B has a composition that is nearly equally distant from that of 1dis: and of the core structures. One can see from these results that only 25-30% sequence identity may be enough for a good structure similarity.
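
Sequence identity, as defined above, is simply the fraction of matched residue pairs in which both residues are of the same type. A minimal sketch follows; the function name and the example pairs are hypothetical.

```python
def sequence_identity(matched_pairs):
    """Fraction of matched pairs whose two residues have the same type.

    matched_pairs: list of (residue_a, residue_b) names taken from the
    aligned (matched) rows only; unmatched residues are not included.
    """
    if not matched_pairs:
        return 0.0
    identical = sum(1 for res_a, res_b in matched_pairs if res_a == res_b)
    return identical / len(matched_pairs)

# Made-up example: 2 identical pairs out of 4 matched pairs.
pairs = [("LEU", "LEU"), ("ALA", "GLY"), ("SER", "SER"), ("VAL", "ILE")]
print(sequence_identity(pairs))  # 0.5
```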

 
Figure 10. Results of multiple alignment in 3D.
Part III: Pairwise scores.

Next part of the result page presents matrices of best structure superposition. The matrices are applied automatically when SSM sends Rasmol scripts or downloads the structures in response to pushing the download buttons in the summary table (Figure 7).

 
Figure 11. Results of multiple alignment in 3D.
Part IV: Matrices of best superposition.

The results end with the residue alignment table (Figure 12). The table shows unmatched (not aligned) residues in black and identical aligned residues in red; other aligned residues are shown in green if the residue type is identical in most of the structures, and in cyan otherwise. The columns to the left of the residue ID indicate the secondary structure element, "S" for strands and "H" for helices.

 
Figure 12. Results of multiple alignment in 3D.
Part V: Residue alignment.