The Scorecons Server
Scorecons is a program that quantifies residue conservation in a multiple
sequence alignment. Given a multiple sequence alignment file, Scorecons
calculates the degree of amino acid variability in each column of the alignment
and returns this information to the user. Read more about the different scores used in my paper:
Valdar WSJ (2002). Scoring residue conservation. Proteins: Structure, Function, and Genetics 48(2): 227-241.
Walkthrough example 1
Suppose we have an alignment of haemoglobin sequences (below). Intuitively,
some positions are more conserved than others. But to analyse this family
more rigorously, we would like to put a number on how conserved each
position is. This is where The Scorecons Server comes in.
1. First, we need to have our alignment in a format The Scorecons Server
can read. It must be in either FASTA format or PIR format. Most multiple
alignment programs can be made to write at least one of these formats.
2. Tell The Scorecons Server where your alignment file is using the
"Browse" button in the "Alignment input" box.
3. Skip to the "Action" box at the bottom of the page and press the "Score
conservation" button.
Provided the alignment file was correctly specified and in an appropriate
format, you should see a results page. This provides information about
the alignment under two headings: "Diversity assessment" and "Conservation
scores". First, let's look at the "Conservation scores". The conservation
scores are presented in three columns:
1. The conservation score. This ranges from 0 for unconserved to 1 for highly
conserved.
2. The focus residue identifier (see later).
3. The column of residues from which the score was calculated. To fit on
one line, this column is shown on its side.
The first few columns are almost entirely gaps, and so have the lowest
possible conservation. The last column shown contains only very similar
types of amino acids and no gaps. Its score is correspondingly high.
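To make the idea of an alignment "column" concrete, here is a minimal Python sketch (not the server's own code) that reads a FASTA-format alignment and splits it into columns. The function names `read_fasta_alignment` and `columns` are hypothetical, the three toy sequences are invented for illustration, and the sketch assumes "-" marks a gap and that every aligned sequence has the same length.

```python
def read_fasta_alignment(text):
    """Parse FASTA alignment text into (names, sequences).

    Hypothetical helper: headers start with '>', and a sequence may be
    split over several lines.
    """
    names, chunks = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:].strip())
            chunks.append([])
        elif chunks:
            chunks[-1].append(line)
        else:
            raise ValueError("sequence data found before the first '>' header")
    seqs = ["".join(parts).upper() for parts in chunks]
    # In an alignment every row must span the same number of columns.
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return names, seqs


def columns(seqs):
    """Return one string per alignment position, read top to bottom."""
    return ["".join(col) for col in zip(*seqs)]


example = """>seq_a
MV-LS
>seq_b
MVHLT
>seq_c
-VHLS
"""
names, seqs = read_fasta_alignment(example)
print(columns(seqs))  # ['MM-', 'VVV', '-HH', 'LLL', 'STS']
```

Each string in the output is one column as Scorecons would score it; the first and third columns here contain a gap.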
This alignment contains lots of information. Almost every position score
is different. This diversity of scores is recorded in the "Diversity assessment"
part of the results. Diversity of position scores (or "Dops") considers
the number of different scores in the alignment and the relative frequency
of each score. The Dops for this alignment is 95.2%. If no two positions
had the same score, the Dops would be 100%. If all positions had the same
score, Dops would be 0%. By contrast, the alignment below, of the PDB
sequence 1af5, does not contain much information: all the sequences are
virtually identical, as are the conservation scores. The Dops for this
alignment is 8.3%, reflecting this.
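The Dops figure can be sketched as follows, assuming it is the Shannon entropy of the distribution of position scores divided by its largest possible value, log N, for N scored positions. That assumption reproduces the two endpoints described above (100% when all scores differ, 0% when all are equal), but the rounding step (`decimals`) and the function name `dops` are hypothetical, and the server's exact definition may differ in detail.

```python
import math
from collections import Counter


def dops(scores, decimals=3):
    """Diversity of position scores, as a percentage (assumed definition).

    Scores are rounded first so that values printed identically count as
    the same score; `decimals` is a hypothetical choice.
    """
    n = len(scores)
    if n < 2:
        return 0.0  # with one position there is no diversity to measure
    counts = Counter(round(s, decimals) for s in scores)
    # Shannon entropy written as sum p * log(1/p), which is never negative.
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())
    return 100.0 * entropy / math.log(n)


print(dops([0.12, 0.48, 0.75, 0.91]))  # all different: close to 100
print(dops([0.95, 0.95, 0.95, 0.95]))  # all identical: 0.0
```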
Walkthrough example 2
Consider again the haemoglobin alignment.
Suppose we are interested only in the cow alpha chain (Cow_alpha_Hb
above) and how conserved its residues are. In this case, we can get The
Scorecons Server to score only those columns in which the cow alpha
chain has a residue.
1. As before, make sure the alignment is in the right format and use the
"Browse" button to locate the appropriate file.
2. In the "Output options" box, click the "only positions relating to
sequence" option.
3. Cow alpha haemoglobin is the 2nd sequence in the alignment. Since The
Scorecons Server numbers the top sequence "0", the cow alpha haemoglobin
sequence is sequence "1", so type "1" in the associated box.
4. Click on "Score conservation" in the "Action" box.
The results now look slightly different from those in the previous example,
starting with the "Conservation scores" section.
Notice that information is given only for positions that are non-gap for
cow alpha haemoglobin. Also, notice the middle column. When we scored every
position in the alignment indiscriminately, this column contained only
"#" signs. Now it contains the residue type of the target (or "focus")
sequence. Reading the middle column from top to bottom gives the entire
unbroken cow alpha chain sequence. The Dops is also slightly higher than in
the previous example, at 95.9%, because it now measures the diversity of
scores pertaining only to the cow alpha sequence.
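Scoring only the positions relating to one sequence amounts to keeping the columns in which that sequence has a residue. A minimal sketch, assuming "-" and "." mark gaps and using the same 0-based numbering as the server (top sequence = 0); `focus_positions` is a hypothetical name and the three rows are toy data:

```python
GAP_CHARS = set("-.")


def focus_positions(seqs, focus_index):
    """Return (column_index, residue) pairs for every column in which the
    focus sequence has a residue. focus_index is 0-based: the top sequence
    in the alignment is 0, the second is 1, and so on."""
    focus = seqs[focus_index]
    return [(i, aa) for i, aa in enumerate(focus) if aa not in GAP_CHARS]


alignment = ["MV-LS", "-VHL-", "MVHLT"]
kept = focus_positions(alignment, 1)  # second sequence, as with cow alpha
print(kept)                           # [(1, 'V'), (2, 'H'), (3, 'L')]
print("".join(aa for _, aa in kept))  # VHL, the unbroken focus sequence
```

Joining the kept residues recovers the focus sequence without gaps, which mirrors reading the middle column of the results from top to bottom.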
Sum-of-Pairs scores sum all possible pairwise match scores between
amino acids in an aligned column; entropic scores use Shannon's
information-theoretic entropy to measure the diversity of symbols (amino
acids) in a column; matrix scores employ a substitution matrix to evaluate
stereochemical diversity in a column; sequence-weighted scores normalize
against redundancy of sequences in the alignment.
Type of score:
- Sum-of-Pairs (SP), matrix score
- Simplest SP score possible
- Normalized Shannon entropy with 7 symbol types
- Normalized Shannon entropy with 21 symbol types
- Entropic, matrix score, sequence weighted
- Mixed model score
- SP, matrix score, sequence weighted
- Score used in Valdar & Thornton 2001
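Two of the simplest scores in the table can be sketched directly. This is an illustration under stated assumptions, not the server's implementation: the identity sum-of-pairs score averages 1 (identical) or 0 (different) over all pairs of residues in a column, scoring any pair involving a gap as `gap_score`; the entropy score is reported as 1 minus the Shannon entropy normalized by log(min(N, 21)), with the gap counted as the 21st symbol type, so that 1 means highly conserved. Both function names are hypothetical. A 7-symbol variant would first map amino acids onto groups, which this sketch does not attempt.

```python
import math
from collections import Counter
from itertools import combinations

GAP = "-"


def identity_sp(column, gap_score=0.0):
    """Simplest sum-of-pairs score (assumed form): the average over all
    pairs of residues of 1 for identical, 0 for different, and gap_score
    for any pair involving a gap."""
    pairs = list(combinations(column, 2))
    if not pairs:
        return 1.0  # a single residue is trivially conserved
    total = 0.0
    for a, b in pairs:
        if a == GAP or b == GAP:
            total += gap_score
        elif a == b:
            total += 1.0
    return total / len(pairs)


def entropy_conservation(column, n_symbols=21):
    """1 minus normalized Shannon entropy; the gap is one of the 21 symbol
    types. Assumed normalization: log(min(N, n_symbols)), the largest
    entropy that N residues can reach."""
    n = len(column)
    max_entropy = math.log(min(n, n_symbols))
    if max_entropy == 0:
        return 1.0
    counts = Counter(column)
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())
    return 1.0 - entropy / max_entropy


print(identity_sp("LLLL"))           # 1.0
print(identity_sp("LLIV"))           # one matching pair out of six
print(entropy_conservation("LLLL"))  # 1.0
```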
Fractional rank option
Click the fractional rank checkbox to show each conservation score
as its fractional rank for the alignment. For example, the most conserved
position will score 1, the median will score 0.5, and so on. This
artificially stretches the distribution of conservation scores evenly
between 0 and 1. It involves a loss of information but may be useful for
making vivid images of proteins coloured by conservation, since spreading
the scores out in this way maximizes contrast.
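A fractional rank can be sketched as below, assuming rank r (0 for the least conserved position) is divided by N - 1 so that the scores run exactly from 0 to 1, and that tied scores share their average rank. The server may handle ties and endpoints differently, and `fractional_rank` is a hypothetical name.

```python
def fractional_rank(scores):
    """Map each score to its rank divided by N - 1 (assumed convention),
    giving tied scores the average of the ranks they span."""
    n = len(scores)
    if n == 0:
        return []
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of equal scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return [r / (n - 1) for r in ranks]


print(fractional_rank([0.2, 0.9, 0.5]))  # [0.0, 1.0, 0.5]
```

However the raw scores are distributed, the output is spread evenly over [0, 1], which is what makes it useful for maximizing colour contrast.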
Substitution matrix choice
Some scores use substitution matrices to measure the diversity of residue
types (see table of scores). The substitution
matrix choice box lets you select which matrix is used. "Modified PET91"
is the PET91 matrix of Jones, Taylor & Thornton (1992) with the diagonal
(e.g., A vs A) set to a constant value.
Matrix transformation choice
Before a substitution matrix can be used, it must first be transformed
into a convenient range. A linear transformation is the simplest way of
achieving this. The more sophisticated "Karlinlike" transformation ensures
that matches between identical amino acid types consistently score highest.
When not using the modified PET91 matrix, prefer the Karlinlike
transformation to the linear one.
Both transformations take the original log odds mutation data matrix and
produce a normalized matrix. The Karlinlike transformation follows Karlin &
Brocchieri (1996, J Bacteriol 178:1881-1894).
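The sketch below is an assumption, not the server's code: the linear transformation is taken to rescale all entries into [0, 1] (min-max), and the Karlinlike transformation to divide each entry m(a,b) by sqrt(m(a,a) m(b,b)), the normalization used by Karlin & Brocchieri, which gives every identical-type match a score of exactly 1. The function names and the toy 2 x 2 matrix are invented for illustration.

```python
import math


def linear_transform(m):
    """Rescale every entry of square matrix m into [0, 1] (assumed
    min-max form of the linear transformation)."""
    values = [v for row in m for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("matrix entries are all equal")
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def karlinlike_transform(m):
    """Divide m[a][b] by sqrt(m[a][a] * m[b][b]), in the style of Karlin &
    Brocchieri (1996), so that every diagonal entry becomes 1."""
    diag = [m[i][i] for i in range(len(m))]
    if any(d <= 0 for d in diag):
        raise ValueError("diagonal entries must be positive")
    return [[v / math.sqrt(diag[i] * diag[j]) for j, v in enumerate(row)]
            for i, row in enumerate(m)]


toy = [[4, -1], [-1, 9]]
print(linear_transform(toy))      # [[0.5, 0.0], [0.0, 1.0]]
print(karlinlike_transform(toy))  # diagonal is 1.0 for both types
```

With the linear version, an identical match for the first type (0.5) scores lower than one for the second (1.0); the Karlinlike version scores both 1.0. It can leave negative off-diagonal values, so further rescaling may be needed in practice.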
In sum-of-pairs type scores (see table of scores), gaps are treated as
outlying amino acid types in a substitution matrix. To penalize gaps, the
match scores m(a,gap), where a is an amino acid, and m(gap,gap) are set
low. This option allows the gap match score to be set explicitly.
Acceptable values range from 0 (gaps are strongly penalized) to 1 (gaps
are favoured).
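Treating the gap as an extra, outlying symbol can be sketched as appending a gap row and column to an already-transformed matrix (entries assumed to lie in [0, 1]), with every m(a,gap) and m(gap,gap) set to the chosen gap score. The function name `add_gap_symbol` and the toy two-letter matrix are hypothetical.

```python
def add_gap_symbol(matrix, alphabet, gap_score, gap="-"):
    """Return (new_matrix, new_alphabet) in which the gap is an extra
    symbol whose match scores m(a, gap) and m(gap, gap) all equal
    gap_score (0 = strongly penalized, 1 = favoured)."""
    if not 0.0 <= gap_score <= 1.0:
        raise ValueError("gap score must lie between 0 and 1")
    n = len(alphabet)
    new = [list(row) + [gap_score] for row in matrix]
    new.append([gap_score] * (n + 1))
    return new, alphabet + gap


m = [[1.0, 0.2], [0.2, 1.0]]
new, alpha = add_gap_symbol(m, "AB", 0.0)
print(alpha)  # AB-
print(new)    # [[1.0, 0.2, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, 0.0]]
```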
Trident score parameters
Trident parameters can take any value between 0 and infinity (although
"infinity" will neither be accepted nor is it recommended).
The default values of 1, 0.5 and 3 were chosen so that the trident score
most closely resembles the valdar01 score.
Diversity: set the extent to which symbol diversity is penalized.
Chemistry: set the extent to which variability in the physical and
chemical properties of amino acids is penalized.
Gaps: set the extent to which gaps are penalized.
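One way to see how the three parameters interact is the sketch below. It assumes the trident score multiplies three terms, one each for symbol diversity, stereochemical variability and gaps, each raised to its parameter: (1 - d)^a (1 - r)^b (1 - g)^c, where d, r and g are per-column values in [0, 1] with 0 meaning fully conserved, and a, b, c correspond to Diversity, Chemistry and Gaps in the order listed. The argument names are hypothetical, and computing d, r and g is left out. Under this form a parameter of 0 ignores its term, and a larger value penalizes that term more strongly.

```python
import math


def trident(diversity, chemistry, gaps, a=1.0, b=0.5, c=3.0):
    """Assumed trident form:
    (1 - diversity)**a * (1 - chemistry)**b * (1 - gaps)**c.

    Each input lies in [0, 1], 0 meaning fully conserved; a, b and c are
    the Diversity, Chemistry and Gaps parameters (defaults 1, 0.5 and 3).
    """
    for p in (a, b, c):
        if p < 0 or math.isinf(p):
            raise ValueError("parameters must be finite and non-negative")
    return (1 - diversity) ** a * (1 - chemistry) ** b * (1 - gaps) ** c


print(trident(0.2, 0.36, 0.0))        # close to 0.64 (0.8 * 0.8 * 1)
print(trident(0.0, 0.0, 0.5, c=0.0))  # 1.0: with c = 0 gaps are ignored
```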
Although The Scorecons Server tolerates non-standard amino acids, such
as B, Z and X, it does not model them consistently. X is treated as if it were
a gap. B and Z are present in substitution matrices and so are modeled
in matrix scores; however, since they are artifacts of experimental uncertainty
and have no biological meaning, they cannot be considered extra types in
the entropy scores. Instead, entropy scores convert B and Z to gaps. Therefore,
although a few non-standard amino acids shouldn't cause too many problems,
for best results either avoid alignments containing them or convert them
to standard amino acids or gaps.
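The rules above can be written as a small preprocessing step. This sketch (hypothetical function name, "-" assumed as the gap character) converts X to a gap for every score, and B and Z to gaps only when the column is destined for an entropy score:

```python
GAP = "-"


def prepare_column(column, for_entropy):
    """Apply the non-standard residue rules to one alignment column.

    X always becomes a gap. B and Z are kept for matrix scores (they appear
    in substitution matrices) but become gaps for entropy scores.
    """
    to_gap = {"X", "B", "Z"} if for_entropy else {"X"}
    return "".join(GAP if aa in to_gap else aa for aa in column.upper())


print(prepare_column("AXBZ", for_entropy=True))   # A---
print(prepare_column("AXBZ", for_entropy=False))  # A-BZ
```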