
Structure models for sequence alignment

Introduction

The alignment program PRANK and its graphical counterpart PRANKSTER can import alignment models from structured flat files, but defining complex models by hand can be error-prone and frustrating. The forms below provide a convenient way to build the most typical models for aligning DNA sequences. These include the Two-state model, which describes regions of fast- and slowly evolving sites, and the Codon model, which describes the fast and slow regions as well as the periodicity of the three codon positions.

To build an alignment model using the forms below, modify the parameters as needed and click the button at the bottom of the form. Depending on your browser, you will either be directed to a new page or offered a text file to save. In either case, save the result locally under a name you will remember. You can then use the model for alignment in one of two ways. In PRANK, add the parameter -m=model_name. In PRANKSTER, select Settings->Model->Import model and choose the file.
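If you run PRANK from a script, the saved model file is simply passed with the -m= option described above. The sketch below only assembles the argument list. The -d= option for the input sequence file and the file names are assumptions for illustration, so check them against your PRANK version.

```python
import shlex


def build_prank_command(seq_file, model_file, prank="prank"):
    """Return an argument list for running PRANK with a saved model file.

    -m=<model> is the option described on this page; -d=<sequences> for the
    input file is an assumption about the PRANK command line.
    """
    return [prank, f"-d={seq_file}", f"-m={model_file}"]


if __name__ == "__main__":
    # Hypothetical file names: replace with your own sequences and model file.
    cmd = build_prank_command("seqs.fas", "two_state.model")
    # Print a shell-quoted version; pass cmd to subprocess.run() to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell-quoting problems when file names contain spaces.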

An alignment generated using the two-state model (left) and the codon model (right).

       

A description of a typical model file can be found in this example.

Two-state model



The simplest possible structure model consists of two distinct processes. As defined below, the model describes sequences with regions of fast- and slowly evolving sites (F and S, respectively). The fast regions are, on average, slightly longer (length) and have more gaps (gap rate), and their gaps tend to be longer (gap length). As long as the relative substitution rate is below 1, S is the slower process, and the rate for F is adjusted so that the average rate equals 1. The base frequencies and the transition/transversion rate ratio kappa can be set separately for each process. Equal base frequencies and kappa=1 correspond to the Jukes-Cantor model.
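To make the per-process parameters concrete, here is a minimal sketch of a nucleotide rate matrix built from base frequencies and kappa. It assumes an HKY85-style parameterization, which is an assumption since PRANK's internal form may differ. It also shows that equal frequencies with kappa=1 reduce to Jukes-Cantor, where every off-diagonal rate is equal.

```python
def hky_rate_matrix(freqs, kappa):
    """HKY85-style instantaneous rate matrix for one process (e.g. F or S).

    freqs: base frequencies in the order A, C, G, T (summing to 1).
    kappa: transition/transversion rate ratio.
    The matrix is scaled so the expected substitution rate is 1.
    """
    bases = "ACGT"
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(bases):
        for j, y in enumerate(bases):
            if i != j:
                # Rate to base y is proportional to its frequency,
                # multiplied by kappa for transitions (A<->G, C<->T).
                q[i][j] = freqs[j] * (kappa if (x, y) in transitions else 1.0)
        # Diagonal makes each row sum to zero (q[i][i] is still 0.0 here).
        q[i][i] = -sum(q[i])
    # Normalise so that sum_i pi_i * (-q_ii) = 1 substitution per unit time.
    mu = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[v / mu for v in row] for row in q]


if __name__ == "__main__":
    jc = hky_rate_matrix([0.25] * 4, 1.0)  # equal freqs, kappa=1: Jukes-Cantor
    for row in jc:
        print([round(v, 3) for v in row])
```

Normalizing each matrix to unit mean rate keeps the base frequencies and kappa separate from the speed of a process. The speed is then set only by the relative substitution rate.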


F: base frequencies (A, C, G, T); kappa; length, gap rate, gap length

S: base frequencies (A, C, G, T); kappa; length, gap rate, gap length; relative substitution rate
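The length and gap length fields are mean lengths. In hidden Markov model aligners, a mean length is commonly turned into a per-site "stay" probability so that lengths follow a geometric distribution. The sketch below assumes that convention, which this page does not confirm for PRANK. The numbers in the usage part are hypothetical, not the form defaults.

```python
def geometric_from_mean(mean_length):
    """Per-site probabilities (stay, leave) giving geometric lengths with the
    given mean. Applies equally to region length and to gap length."""
    if mean_length < 1:
        raise ValueError("mean length must be at least 1")
    stay = 1.0 - 1.0 / mean_length
    return stay, 1.0 - stay


def expected_length(stay):
    """Mean of a geometric length on {1, 2, ...} with continue probability stay."""
    return 1.0 / (1.0 - stay)


if __name__ == "__main__":
    # Hypothetical values: F regions slightly longer than S, F gaps longer.
    for name, mean in [("F region", 20.0), ("S region", 15.0), ("F gap", 4.0)]:
        stay, leave = geometric_from_mean(mean)
        print(f"{name}: stay={stay:.3f} leave={leave:.3f}")
```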


Codon model



The Codon model extends the Two-state model with three additional processes representing the three positions of a codon. As defined below, F, S and 1-3 describe regions of fast-evolving sites, slowly evolving sites and codons, respectively. The new parameters are the length of the non-coding region (F + S) and two codon parameters: the intensity and direction of selection (omega) and optional weighting of the base frequencies according to empirical amino-acid frequencies (WAG). Small values of omega mean that the codons are under purifying selection and that substitutions at the first and second codon positions are rare, which generates the typical periodicity of "slow, slow, fast". A value of omega=1 means that the sequences are evolving neutrally.
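Why a small omega produces the "slow, slow, fast" pattern can be checked with the standard genetic code. The sketch counts, for each codon position, the fraction of single-base changes that are synonymous. It then gives a crude relative rate in which synonymous changes occur at rate 1 and non-synonymous ones at rate omega. This illustrates the reasoning only and is not PRANK's codon model: it ignores base frequencies, kappa and the WAG weighting, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_fractions():
    """Fraction of single-base changes that are synonymous, per codon position
    (sense codons only; changes into stop codons are ignored)."""
    syn, total = [0, 0, 0], [0, 0, 0]
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == "*":
                    continue
                total[pos] += 1
                syn[pos] += CODE[mutant] == aa
    return [s / t for s, t in zip(syn, total)]


def relative_position_rates(omega):
    """Crude relative rate per codon position: synonymous changes at rate 1,
    non-synonymous changes at rate omega."""
    return [f + omega * (1.0 - f) for f in synonymous_fractions()]


if __name__ == "__main__":
    print([round(r, 2) for r in relative_position_rates(0.1)])  # purifying selection
    print([round(r, 2) for r in relative_position_rates(1.0)])  # neutral evolution
```

With omega well below 1, the third position comes out fastest and the second slowest. With omega=1, all three positions get the same rate.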

Because coding sequence is typically preceded and followed by highly conserved splice regions, and states 1-3 can only be reached through S (see the figure), it is recommended that the relative substitution rate be kept below 1, making S the slower process.
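The constraint that states 1-3 are reached only through S can be written as a table of allowed transitions. The structure below is a hypothetical reading of the description, and the page's figure may differ. F and S exchange freely, the codon cycle 1 -> 2 -> 3 is entered only from S, and after position 3 the process either starts a new codon or returns to S; the exit to S is an assumption.

```python
# Hypothetical transition structure for the Codon model (assumed, not taken
# from a PRANK model file): each state maps to the states it may move to.
ALLOWED = {
    "F": {"F", "S"},
    "S": {"S", "F", "1"},
    "1": {"2"},
    "2": {"3"},
    "3": {"1", "S"},  # assumed exit from the coding region back to S
}
CODON_STATES = {"1", "2", "3"}


def is_valid_path(states):
    """True if every consecutive pair of states is an allowed transition."""
    return all(b in ALLOWED[a] for a, b in zip(states, states[1:]))


def codon_entry_states():
    """Non-codon states from which the codon cycle can be entered."""
    return {a for a, nxt in ALLOWED.items() if "1" in nxt and a not in CODON_STATES}


if __name__ == "__main__":
    print(is_valid_path(list("FSS123123S")))  # codons flanked by S
    print(is_valid_path(list("F12")))         # F cannot enter a codon directly
    print(codon_entry_states())
```

Because every coding stretch is flanked by S, the S process models the conserved sequence around exons. This is why the recommendation above keeps S the slower process.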


F: base frequencies (A, C, G, T); kappa; length, gap rate, gap length

S: base frequencies (A, C, G, T); kappa; length, gap rate, gap length; relative substitution rate

Non-coding region (F + S): length

1-3: base frequencies (A, C, G, T); omega, WAG amino-acid frequencies; length, gap rate, gap length


Four-state model



The Four-state model is a simple extension of the basic model. It defines four separate single-character states, with equal probabilities of moving from any state to any other state. The rate of the first state cannot be set directly. Instead, it is scaled so that the overall rate, given the time spent in the different states, equals 1. See the description of the Two-state model for more details.
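The scaling of the first state's rate can be sketched as a short calculation, and the same arithmetic covers F in the Two-state model. The sketch makes two assumptions. First, each "relative substitution rate" is taken to be a multiple of the first state's rate. Second, the share of time spent in a state is taken to be proportional to its mean length, which holds when states are left symmetrically: two alternating states, or four states with equal transition probabilities. The numbers in the usage part are hypothetical.

```python
def scale_rates(mean_lengths, relative_rates):
    """Absolute substitution rate per state so the time-weighted mean is 1.

    mean_lengths:   mean region length of each state.
    relative_rates: rate of each state relative to the first (first = 1.0).
    Assumes time share is proportional to mean length (symmetric switching).
    """
    total = sum(mean_lengths)
    weights = [length / total for length in mean_lengths]
    mean_relative = sum(w * r for w, r in zip(weights, relative_rates))
    first_rate = 1.0 / mean_relative  # the first state's rate is not a free parameter
    return [first_rate * r for r in relative_rates]


if __name__ == "__main__":
    # Two-state case: S evolves at half the rate of F, equal mean lengths.
    print([round(r, 3) for r in scale_rates([10, 10], [1.0, 0.5])])
    # Four-state case with hypothetical lengths and relative rates.
    print([round(r, 3) for r in scale_rates([10, 10, 20, 20], [1.0, 0.5, 0.8, 2.0])])
```

In the two-state example, F ends up at 4/3 and S at 2/3, so F is faster than 1 and S slower. This is the behavior described for the Two-state model.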


1: base frequencies (A, C, G, T); kappa; length, gap rate, gap length

2: base frequencies (A, C, G, T); kappa; length, gap rate, gap length; relative substitution rate

3: base frequencies (A, C, G, T); kappa; length, gap rate, gap length; relative substitution rate

4: base frequencies (A, C, G, T); kappa; length, gap rate, gap length; relative substitution rate


Acknowledgements


The code generating the alignment models relies heavily on Simon Whelan's software multiscaleq.


Comments? E-mail ari@ebi.ac.uk.
