Help - ClustalW2 FAQ




Why use ClustalW2?
Multiple alignments of protein sequences are important tools in studying sequences. The basic information they provide is the identification of conserved sequence regions. This is very useful in designing experiments to test and modify the function of specific proteins, in predicting the function and structure of proteins and in identifying new members of protein families.



How can I use ClustalW2?
The ClustalW2 web form is available at http://www.ebi.ac.uk/Tools/clustalw2/. There are two ways to use this service at the EBI: interactively (the default) or by email. When you use it interactively, you must wait for the results to be displayed in the browser window. With the email option, the results are not displayed in the browser window but are sent to you by email; this is the better choice when submitting large amounts of data.

ClustalW is also available from within SRS (http://srs.ebi.ac.uk).




What can I do with ClustalW2?

The program ClustalW2 can be used for two purposes:

1. It can be used to produce a multiple sequence alignment. Using the web form the user need only input or upload a file of the sequences that they want to align in an accepted format. The other options on the form are set to the default values for producing a multiple alignment. The user can use the defaults or they can make some changes on the form to customise their run. A multiple sequence alignment of the sequences submitted will be returned to the user (.aln file).

2. It can be used to produce a true phylogenetic tree. To use this option, the user must input or upload a multiple alignment of sequences in one of the standard multiple alignment formats (.aln file). Then, in the phylogenetic tree section of the form, they must choose one of the tree type options: NJ, Phylip or Dist. The tree itself is calculated by neighbor-joining; these options determine which tree output files are produced. This time the user will retrieve a .ph file (always) and .dst and/or .nj files (depending on the options chosen).

By default, the form is set to produce a multiple alignment.




What type of sequences can ClustalW2 align?

ClustalW2 can align either nucleotide or protein sequences. Nucleotide sequences are aligned exactly as they are input; the program does not provide an option for specifying DNA strands. The EMBOSS tool revseq can be used to reverse and/or complement nucleotide sequences.
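If some nucleotide sequences are on the opposite strand, they have to be reverse-complemented before submission. The Python sketch below only illustrates what that transformation does; it is not EMBOSS revseq, and it assumes plain A/C/G/T/N sequences (IUPAC ambiguity codes are not handled).

```python
# Illustrative reverse complement; handles only A, C, G, T and N, in upper
# or lower case. IUPAC ambiguity codes are deliberately left out.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCGT"))  # ACGCAT
```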




What input formats does ClustalW2 accept?

The program accepts sequences in the following formats:

NBRF/PIR, EMBL/UniProt, Pearson (FASTA), GDE, ALN/ClustalW, GCG/MSF, RSF (see the Clustal help pages for details about formats).

The sequences can either be pasted into the web form or uploaded to it as a file. Each sequence must have a unique name; if it does not, the program will fail. The program will also fail if there are empty lines, whitespace or control characters between sequences or at the top of the file.
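A quick local check before uploading can catch these problems. The function below is a hypothetical helper, not part of ClustalW2; it assumes Pearson (FASTA) input, where the sequence name is the first word after `>`, and reports duplicate names, empty or whitespace-only lines, and control characters.

```python
from typing import List


def check_fasta(text: str) -> List[str]:
    """List problems that, according to this FAQ, make ClustalW2 fail.

    Hypothetical helper: flags empty or whitespace-only lines, control
    characters, and duplicate sequence names (first word after '>').
    An empty list means no problems were found.
    """
    problems = []
    seen = set()
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single newline at the very end of the file is fine
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: empty line")
            continue
        if any(ord(ch) < 32 for ch in line):
            problems.append(f"line {number}: control character")
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            if not name:
                problems.append(f"line {number}: sequence without a name")
            elif name in seen:
                problems.append(f"line {number}: duplicate name {name!r}")
            else:
                seen.add(name)
    return problems


print(check_fasta(">seq1\nACGT\n>seq1\nACGA\n\n>seq2\nAC\n"))
```

For that input the helper reports the repeated name on line 3 and the empty line on line 5.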




What output formats does ClustalW2 produce?

There are a number of options provided as output for the user:

aln with numbers (default), aln without numbers, gcg MSF, phylip, pir and gde.

The user can specify which of these they want on the web form in the OUTPUT section. There is also an option to specify the order that the sequences appear in the alignment: aligned (default) or in the order in which they were input. The alignment will appear on the results page along with details of scores and guide trees. The alignment can be obtained on its own by clicking on the alignment file option at the top (.aln). This file can be opened in a separate window and/or saved to a file. 




How can I save my alignment to a file?

The alignment will appear on the results page along with details of scores and guide trees. The alignment can be obtained on its own by clicking on the alignment file option at the top (.aln). This file can be opened in a separate window or saved to a file.

Is there a limit on the number of sequences or the size of the file that I submit to ClustalW2?


The input for ClustalW2 is limited to a maximum of 500 sequences or a 10MB file, whichever limit is reached first. When the input file or the number of sequences is large, ClustalW2 can run for days and in some cases may not finish at all. If you plan to submit large amounts of data, use the "RESULTS: email" option.
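A hypothetical pre-submission check for these two limits is sketched below. It assumes FASTA input, where each sequence starts with a `>` header line, and it reads "10MB" as 10,000,000 bytes; both are assumptions rather than documented details of the service.

```python
# Limits quoted in this FAQ. Reading "10MB" as 10,000,000 bytes is an
# assumption made for this sketch.
MAX_SEQUENCES = 500
MAX_BYTES = 10_000_000


def within_limits(fasta_text: str) -> bool:
    """Return True if FASTA input stays within both submission limits.

    Assumes one '>' header line per sequence (FASTA/Pearson format).
    """
    size = len(fasta_text.encode("utf-8"))
    n_sequences = sum(1 for line in fasta_text.splitlines() if line.startswith(">"))
    return size <= MAX_BYTES and n_sequences <= MAX_SEQUENCES


print(within_limits(">s\nACGT\n" * 501))  # False: 501 sequences
```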

Email jobs are allowed to run for more than 24 hours and the results are kept for a week.




What do the file extensions mean that I get in my results?

On our ClustalW2 submission page, when you submit a number of sequences using the default parameters, you retrieve a .aln and a .dnd file. The .aln file is the alignment and the .dnd file is a guide tree - it is not a phylogenetic tree.

To get an accurate phylogenetic tree, put the .aln file back into the ClustalW2 form as input. This time you need to choose one of the tree options: nj, phylip or dist (these select the tree output files; the tree itself is calculated by neighbor-joining). You will then retrieve a .ph file (always) and .dst and/or .nj files (depending on the options chosen).

The .input file contains your input and the .output file contains the program's output.




How are the pairwise alignment scores generated?

A pairwise score is calculated for every pair of sequences to be aligned, and the scores are presented in a table in the results. Each score is the number of identities in the best alignment divided by the number of residues compared (gap positions are excluded). Scores from both the fast and the slow pairwise methods are initially calculated as percent identity scores, and are converted to distances by dividing by 100 and subtracting from 1.0, which gives the number of differences per site. We do not correct for multiple substitutions in these initial distances.
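The conversion from percent identity to distance is simple arithmetic. The sketch below assumes the two sequences have already been aligned to equal length with `-` as the gap character; the function name is illustrative.

```python
def pairwise_distance(a: str, b: str) -> float:
    """Distance between two already-aligned sequences of equal length.

    Positions where either sequence has a gap ('-') are skipped. Identity is
    a percentage of the residues compared, and the distance is
    1 - identity / 100, with no correction for multiple substitutions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = identities = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap positions are excluded
        compared += 1
        if x.upper() == y.upper():
            identities += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    percent_identity = 100.0 * identities / compared
    return 1.0 - percent_identity / 100.0


print(pairwise_distance("ACGT-A", "ACGAGA"))
```

Here 5 gap-free positions are compared and 4 are identical, so identity is 80% and the distance is about 0.2 (floating-point rounding may show it as 0.19999999999999996).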

As the pairwise score is calculated independently of the matrix and gaps chosen, it will always be the same value for a particular pair of sequences.

The pairwise alignments from which these scores are calculated can be produced in two modes: slow (more accurate), which uses dynamic programming, or fast, which uses the method of Wilbur and Lipman (extremely fast but approximate).
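To show what dynamic programming means here, the sketch below is a textbook Needleman-Wunsch global alignment with a simple match/mismatch score and a linear gap penalty. These scores are assumptions for illustration: ClustalW2's own scoring (substitution matrices and the gap penalties set on the form) is not reproduced, and the fast Wilbur and Lipman method is not shown.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps. The default
    scores are illustrative, not ClustalW2's matrices or gap penalties.
    """
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))  # ('ACGT', 'A-GT')
```

The aligned pair can then be scored by counting identities over the gap-free positions, as described above.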





How can I get the colour version of the alignment?

The alignment appears on the results page in black and white. The 'show colours' option displays the alignment in colour according to the physicochemical properties of the amino acids.




Why can I not see the guide tree in my browser?

You must have Java enabled to see the guide tree, because guide trees are displayed by a Java applet (provided by the Java runtime plugin). To check that Java applets are enabled, go to Preferences, Advanced, and select "Enable Java".




What is Jalview?

Jalview is a multiple alignment editor written entirely in Java. It is provided as an option when you retrieve a multiple alignment from ClustalW2; to use it, just click on the Jalview image. For more information about Jalview, see http://www.jalview.org/




Why does Jalview not work for me?

If Jalview is not available with your results, this is because the program requires that your browser supports Java applets (provided by Java runtime plugin). To check that you have enabled Java applets go to Preferences, Advanced, and "Enable Java".




How can I save the guide tree image or the jalview alignment?
The only way to do this is to take a screenshot, because applets are not printed with the HTML page.



What is the difference between a cladogram and a phylogram?

A phylogram is a branching diagram (tree) that is assumed to be an estimate of a phylogeny. The branch lengths are proportional to the amount of inferred evolutionary change. A cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny where the branches are of equal length. Therefore, cladograms show common ancestry, but do not indicate the amount of evolutionary "time" separating taxa. It is possible to see the tree distances by clicking on the diagram to get a menu of options. The options available allow you to do things like changing the colours of lines and fonts and showing the distances.
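In New Hampshire (Newick) notation, the difference between the two kinds of diagram comes down to whether branch lengths are written. The small Python sketch below serialises the same tree both ways; the `Node` class and the function names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node; leaves have a name and no children."""
    name: str = ""
    length: Optional[float] = None  # branch length to the parent, if known
    children: List["Node"] = field(default_factory=list)


def _format(node: Node, with_lengths: bool) -> str:
    text = node.name
    if node.children:
        inner = ",".join(_format(child, with_lengths) for child in node.children)
        text = f"({inner}){node.name}"
    if with_lengths and node.length is not None:
        text += f":{node.length:g}"
    return text


def to_newick(root: Node, with_lengths: bool = True) -> str:
    """Phylogram-style Newick by default; with_lengths=False gives a cladogram."""
    return _format(root, with_lengths) + ";"


tree = Node(children=[
    Node("A", 0.1),
    Node("B", 0.2),
    Node(children=[Node("C", 0.3), Node("D", 0.4)], length=0.05),
])
print(to_newick(tree))                      # (A:0.1,B:0.2,(C:0.3,D:0.4):0.05);
print(to_newick(tree, with_lengths=False))  # (A,B,(C,D));
```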




Which method is used to draw the guide tree?

The option named PHYLIP corresponds to the New Hampshire (Newick) tree format. All ClustalW phylogenetic calculations are based on the neighbor-joining method of Saitou and Nei.
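To make the neighbor-joining idea concrete, here is a compact Python sketch of the Saitou and Nei algorithm. It is a textbook version, not ClustalW2's code: it assumes unique sequence names, at least three sequences and a symmetric distance matrix with a zero diagonal, and it returns an unrooted tree as a New Hampshire (Newick) string without applying any rooting step.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor joining (Saitou and Nei).

    names: unique labels; dist: symmetric matrix (list of lists) with a zero
    diagonal. Returns an unrooted tree as a Newick string.
    """
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    # dm[x][y]: current distance between clusters x and y, keyed by Newick text.
    dm = {x: {y: dist[i][j] for j, y in enumerate(names)} for i, x in enumerate(names)}
    while len(dm) > 3:
        n = len(dm)
        keys = list(dm)
        r = {x: sum(dm[x].values()) for x in keys}  # total distance from each cluster
        # Join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y).
        a, b = min(
            ((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
            key=lambda p: (n - 2) * dm[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the new internal node to a and b.
        la = dm[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dm[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distance from the new node to every other remaining cluster.
        dm[u] = {u: 0.0}
        for k in keys:
            if k not in (a, b):
                dm[u][k] = dm[k][u] = (dm[a][k] + dm[b][k] - dm[a][b]) / 2
        for gone in (a, b):
            del dm[gone]
        for row in dm.values():
            row.pop(a, None)
            row.pop(b, None)
    # Three clusters left: join them at a single central node.
    x, y, z = list(dm)
    lx = (dm[x][y] + dm[x][z] - dm[y][z]) / 2
    ly = (dm[x][y] + dm[y][z] - dm[x][z]) / 2
    lz = (dm[x][z] + dm[y][z] - dm[x][y]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, matrix))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

This small matrix is additive, so the recovered branch lengths reproduce the input distances exactly; with real sequence data, neighbor-joining can also produce slightly negative branch lengths.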



Is there an alternative tree drawing program that can be used for large numbers of sequences?

If the number of sequences is very large, the default tree drawing program can generate an image that is too large to capture with print screen. In these situations a number of other programs may help to scale the image down:
http://pearl.cs.pusan.ac.kr/phylodraw/
http://pfaat.sourceforge.net/
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
You will need to install the application on your PC. Then save your ClustalW tree file (.ph) and use it with the application.




Is it possible to perform bootstrap analysis or to get the Bootstrap values along with the nodes of each branch of the phylogram?

No. Bootstrap analysis is too CPU-intensive, so we do not offer it via the website.

If you do wish to do this you will need to download the software (available from the clustal page) and run this locally.



Can I get a distance scale bar, based on the substitution rates, along with the phylogram?

We do not produce a scale bar, but distances can be displayed by right-clicking on the Java applet.



What are the default parameters used by ClustalW2?

When "def" values are used, we let ClustalW (1.82) use its own default values:



How can I see the parameter values that are used by the ClustalW2 program?

If you submit your job by email, you will receive two emails. The first one is a confirmation mail that lists the parameters that you have chosen. The second mail contains a link to the ClustalW2 result page. It is not possible to show the submission parameters on the result page, because ClustalW2 does not include them in the ClustalW2 output.




How long are ClustalW2 results stored at the EBI?

If you run an interactive job, the results will be available for 24 hours. The results of an email job are available for 2 weeks. Some big files are removed after 15 minutes due to space constraints.




Does my job have an identifier?

Yes. You will find it on the results page. The job identifier has the job name, the date and a random number. If you want to report any errors or queries about a job, please tell us the job identifier.




Where can I download the ClustalW2 software from?

Both ClustalW2 and ClustalX can be downloaded from the EBI ftp site.
We do not distribute the CGI script for the web interface. To build a CGI interface for the command-line version of ClustalW2, you will need the CGI.pm Perl module (available from many Perl sites on the internet).




How do I reference use of this service?


Larkin M.A., Blackshields G., Brown N.P., Chenna R., McGettigan P.A., McWilliam H.*, Valentin F.*, Wallace I.M., Wilm A., Lopez R.*, Thompson J.D., Gibson T.J. and Higgins D.G. (2007)
ClustalW and ClustalX version 2.
Bioinformatics 2007 23(21): 2947-2948.



What is the difference between a guide tree and a true phylogenetic tree?

A guide tree is calculated from the distance matrix that is generated from the pairwise scores; it is written to the .dnd file. A phylogenetic tree is calculated from the multiple alignment supplied as input: the distances between the aligned sequences are calculated and used to build the tree, which is written to the .ph file and, depending on the options chosen (nj, phylip, dist), to .nj and/or .dst files.




Where can I find detailed information about ClustalW2?

There are a number of places:

For additional help on ClustalW2 also see:





How does ClustalW2 Work (very simple explanation)?
1. Determine all pairwise alignments between sequences and the degree of similarity between them.

2. Construct a similarity tree.

3. Combine the alignments from 1 in the order specified in 2, using the rule "once a gap, always a gap".

In stage 1:
1.1. ClustalW2 aligns every pair of sequences, using either the slow (dynamic programming) or the fast (Wilbur and Lipman) pairwise method.

1.2. Using the alignments from 1.1 it computes a distance.

1.2.1. The distance is commonly calculated by looking at the non-gapped positions and counting the number of mismatches between the two sequences, then dividing this count by the number of non-gapped pairs. Once the distances for all pairs have been calculated, they are put into a matrix, which is used in stage 2.

2. Using the matrix from 1.2.1 and neighbor-joining, ClustalW2 constructs the similarity tree. The root is placed at the middle of the longest chain of consecutive edges.

3. Combine the alignments, starting from the most closely related groups (going from the tips of the tree towards the root).
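The order used in stage 3 can be illustrated by walking the guide tree from the tips towards the root. In this hypothetical Python sketch, a rooted binary guide tree is written as nested pairs of sequence names; the function only lists which groups are merged and in what order (a post-order traversal). It does not perform the profile alignments themselves or the "once a gap, always a gap" gap handling.

```python
def merge_order(tree):
    """Return the group merges of a progressive alignment, tips first.

    `tree` is a rooted binary guide tree written as nested pairs, e.g.
    (("A", "B"), "C"). Each merge is recorded as (left group, right group).
    """
    merges = []

    def visit(node):
        if isinstance(node, str):
            return [node]  # a single sequence is a group of one
        left, right = node
        left_group = visit(left)    # everything below the left child first
        right_group = visit(right)  # then everything below the right child
        merges.append((left_group, right_group))
        return left_group + right_group

    visit(tree)
    return merges


for left, right in merge_order(((("A", "B"), "C"), ("D", "E"))):
    print(left, "+", right)
# ['A'] + ['B']
# ['A', 'B'] + ['C']
# ['D'] + ['E']
# ['A', 'B', 'C'] + ['D', 'E']
```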
