The Scorecons Server

Instructions

Scorecons is a program that quantifies residue conservation in a multiple sequence alignment. Given a multiple sequence alignment file, Scorecons calculates the degree of amino acid variability in each column of the alignment and returns this information to the user. Read more about the different scores used in my paper:

Valdar WSJ (2002). Scoring residue conservation. Proteins: Structure, Function, and Genetics. 48(2): 227-241.



Walkthrough example 1

Suppose we have an alignment of haemoglobin sequences. Intuitively, some positions are more conserved than others. To analyse this family more rigorously, however, we would like to put a number on how conserved each position is. This is where The Scorecons Server comes in.

  1. First, we need our alignment in a format The Scorecons Server can read: either FASTA or PIR. Most multiple alignment programs can write at least one of these formats.
  2. Tell The Scorecons Server where your alignment file is using the "Browse" button in the "Alignment input" box.
  3. Skip to the "Action" box at the bottom of the page and press the "Score conservation" button.
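Before uploading, it can be worth checking locally that a FASTA file really is an alignment, i.e. that every sequence has the same length once gap characters are counted. The sketch below is not part of The Scorecons Server: `read_fasta_alignment` is a hypothetical helper, it assumes each sequence starts with a header line beginning with ">", and it does not handle PIR files.

```python
def read_fasta_alignment(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs and check that all
    sequences have the same length, as an alignment must.
    Hypothetical helper for illustration only."""
    records: list[tuple[str, str]] = []
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # a sequence may be split over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return records


example = """>Seq_A
MV-LS
>Seq_B
MVHLT
"""
print(read_fasta_alignment(example))  # [('Seq_A', 'MV-LS'), ('Seq_B', 'MVHLT')]
```

An unequal-length file raises `ValueError`, which usually means the file holds unaligned sequences rather than an alignment.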
Provided the alignment file was correctly specified and in an appropriate format, you should see a results page. This presents information about the alignment under two headings: "Diversity assessment" and "Conservation scores". First, let's look at the "Conservation scores". These are presented in three columns:
  1. The conservation score. This ranges from 0 for unconserved to 1 for highly conserved.
  2. The focus residue identifier (see later).
  3. The column of residues from which the score was calculated, read from the top sequence to the bottom. To fit on one line, the column is shown on its side.
0.000           #              ---S--------
0.000           #              ---T--------
0.000           #              ---S--------
0.000           #              ---T--------
0.000           #              ---S--------
0.024           #              ---TM-M-----
0.320           #              MMMSV-VVMM--
0.278           #              VVVDHMHHGGG-
0.500           #              LLLYLLWWLLL-
0.603           #              SSSSTTTSSSS-
0.391           #              PAAAPAAEDDD-
0.424           #              AAAAEEEVGGQT
0.809           #              DDDDEEEEEEEE

The first few columns are almost entirely gaps, so they have the lowest possible conservation score. The last column shown contains no gaps and only two closely related amino acids, aspartate (D) and glutamate (E); its score is correspondingly high.
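The strings in the third field are simply the alignment columns read vertically. A small sketch of how those column strings, and how gappy each one is, can be obtained from a list of aligned sequences; `alignment_columns` and `gap_fraction` are hypothetical names, and "-" is assumed to be the gap character.

```python
def alignment_columns(sequences: list[str]) -> list[str]:
    """Return each alignment column as a string, top sequence first."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*sequences) steps through every sequence in parallel, one position at a time
    return ["".join(column) for column in zip(*sequences)]


def gap_fraction(column: str, gap: str = "-") -> float:
    """Fraction of the column occupied by gap characters."""
    return column.count(gap) / len(column)


cols = alignment_columns(["--MV", "--MV", "-SMD"])
print(cols)                             # ['---', '--S', 'MMM', 'VVD']
print([gap_fraction(c) for c in cols])  # first column is all gaps, last two have none
```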

This alignment is informative: almost every position has a different score. This diversity of scores is recorded in the "Diversity assessment" part of the results. The diversity of position scores (or "Dops") takes into account the number of different scores in the alignment and the relative frequency of each score. The Dops for this alignment is 95.2%. If no two positions had the same score, the Dops would be 100%; if all positions had the same score, it would be 0%. By contrast, an alignment for PDB entry 1af5 does not contain much information.

All the sequences are virtually identical, as are the conservation scores. The Dops for this alignment is 8.3%, reflecting this.
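The exact Dops formula is not given here. One measure that matches the behaviour described (100% when every score is distinct, 0% when all scores are equal) is the Shannon entropy of the score distribution divided by its maximum possible value, the logarithm of the number of positions. The sketch below assumes that formula, and also assumes scores are compared after rounding to three decimal places as displayed; treat it as an illustration, not the server's code.

```python
import math
from collections import Counter


def dops(scores: list[float], decimals: int = 3) -> float:
    """Diversity of position scores as a percentage (0-100).

    ASSUMPTION: normalized Shannon entropy of the score frequencies,
    with scores rounded to `decimals` places before counting.
    """
    n = len(scores)
    if n < 2:
        return 0.0  # a single position carries no diversity
    counts = Counter(round(s, decimals) for s in scores)
    entropy = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return 100.0 * entropy / math.log(n)  # ln(n) is the all-distinct maximum


print(round(dops([0.0, 0.0, 0.5, 1.0]), 1))  # 75.0
```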
 

Walkthrough example 2

Consider again the haemoglobin alignment.

Suppose we are interested only in the cow alpha chain (Cow_alpha_Hb in the alignment) and in how conserved its residues are. In this case, we can get The Scorecons Server to score only those columns in which the cow alpha chain has a residue.

  1. As before, make sure the alignment is in the right format and use the "Browse" button to locate the appropriate file.
  2. In the "Output options" box, click the "only positions relating to sequence" option.
  3. Cow alpha haemoglobin is the second sequence in the alignment. The Scorecons Server numbers sequences from zero (the top sequence is sequence "0"), so cow alpha haemoglobin is sequence "1". Type "1" in the associated box.
  4. Click on "Score conservation" in the "Action" box.
The results now look slightly different from those in the previous example. The "Conservation scores" section should start something like this:

0.320           M              MMMSV-VVMM--
0.278           V              VVVDHMHHGGG-
0.500           L              LLLYLLWWLLL-
0.603           S              SSSSTTTSSSS-
0.391           A              PAAAPAAEDDD-
0.424           A              AAAAEEEVGGQT
0.809           D              DDDDEEEEEEEE
0.453           K              KKKRKKKLWWWW
0.409           G              TGNASAQHQQQE

Notice that information is given only for positions at which cow alpha haemoglobin has a residue rather than a gap. Notice also the middle column. When we scored every position in the alignment indiscriminately, this column contained only "#" signs. Now it contains the residue type of the target (or "focus") sequence, so reading the middle column from top to bottom gives the entire unbroken cow alpha chain sequence. The Dops is also slightly higher than in the previous example, at 95.9%, because it now measures the diversity of only those scores pertaining to the cow alpha sequence.
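The focus-sequence option can be pictured as a simple filter over alignment columns. The sketch below uses hypothetical names, assumes "-" is the gap character, and follows the server's zero-based sequence numbering; it keeps only the columns where the focus sequence has a residue, so joining the focus residues recovers that sequence without gaps.

```python
from collections.abc import Iterator


def focus_positions(sequences: list[str], focus: int,
                    gap: str = "-") -> Iterator[tuple[int, str, str]]:
    """Yield (column_index, focus_residue, column) for every column in which
    sequence number `focus` (0-based: the top sequence is 0) is not a gap."""
    focus_seq = sequences[focus]
    for i, residue in enumerate(focus_seq):
        if residue != gap:
            column = "".join(seq[i] for seq in sequences)  # read top to bottom
            yield i, residue, column


seqs = ["--MVLS", "-MVL-S", "MM-LTS"]
for index, residue, column in focus_positions(seqs, 1):
    print(index, residue, column)
```

Joining the residues from the second field gives "MVLS", the second sequence with its gaps removed, just as the middle column of the results spells out the cow alpha chain.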
 

Scoring methods

Method name     Type of score                               Description
basicmdm        Sum-of-pairs (SP), matrix score             Simplest SP score possible
entropynorm7    Entropic                                    Normalized Shannon entropy with 7 symbol types
entropynorm21   Entropic                                    Normalized Shannon entropy with 21 symbol types
trident         Entropic, matrix score, sequence-weighted   Mixed-model score
valdar01        SP, matrix score, sequence-weighted         Score used in Valdar & Thornton 2001
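To make two of the score types above concrete, here is a minimal sketch under stated assumptions; it is not the server's implementation and will not reproduce its numbers. `entropy_norm21` reads "normalized Shannon entropy with 21 symbol types" as entropy divided by ln 21 over the 20 amino acids plus the gap, turned into a conservation score as 1 minus that value so that 1 means fully conserved. `sum_of_pairs` averages a pair score over every pair of residues in a column; the `identity` pair score is a toy stand-in for the substitution matrix a real SP method such as basicmdm would use. Sequence weighting is not attempted. Note that this plain entropy version rates a column of nothing but gaps as perfectly conserved, which the walkthrough shows the server does not do, so the server must treat gaps differently.

```python
import math
from collections import Counter
from itertools import combinations

AMINO_ACIDS_AND_GAP = "ACDEFGHIKLMNPQRSTVWY-"  # 20 residues + gap = 21 symbols


def entropy_norm21(column: str) -> float:
    """ASSUMED reading: 1 - Shannon entropy / ln(21); 1.0 = a single symbol."""
    unknown = set(column) - set(AMINO_ACIDS_AND_GAP)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    n = len(column)
    counts = Counter(column)
    entropy = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(len(AMINO_ACIDS_AND_GAP))


def identity(a: str, b: str) -> float:
    """Toy pair score (NOT a real substitution matrix): 1 for identical residues."""
    return 1.0 if a == b and a != "-" else 0.0


def sum_of_pairs(column: str, pair_score=identity) -> float:
    """Mean pair score over all unordered pairs of sequences in the column."""
    pairs = list(combinations(column, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)


print(entropy_norm21("DDDD"))  # 1.0
print(sum_of_pairs("DDDE"))    # 0.5: three of the six pairs are identical
```

Swapping `identity` for a function that looks up scores in a substitution matrix would give a matrix-based SP score in the spirit of the SP entries in the table.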

Score parameters

Tips