Sequence Alignment Help
Using the Sequence Alignment Tool
The sequence alignment form contains the following options:
Select Locus - this option allows the user to choose which of the HLA or HLA-related genes to align from the drop-down menu. The drop-down menu also includes a number of special choices, like multiple sequence alignments for all the DRB1, 3, 4 & 5 alleles or all the DRB pseudogene alleles.
Select the feature to align - this option provides a list of alignments available for the locus selected. The alignments available include CDS alignments, individual exon alignments or alignments of combined regions. If an option listed in the drop-down menu for one locus is not listed for another locus, then that alignment is either not possible or currently unavailable for that locus.
Enter any specific sequences required - this allows the user to view alignments of specific sequences by entering either common nomenclature or by listing allele names. For example, to align DRB1*01:01, DRB1*01:02:01 and DRB1*01:02:02 the user could enter 01 or 01:0 into this box as the common nomenclature and this will match the desired alleles. Alternatively, the user could enter "01:01, 01:02:01, 01:02:02" in the box, separating each allele name with either a comma or a new line, and this will also match the desired alleles. The wildcard character (*) may also be used in the allele name.
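The matching behaviour described above can be pictured with a short Python sketch. It is an illustration only, not the tool's actual code: the function names `parse_query` and `select_alleles` are hypothetical, and it assumes that a term without a wildcard is matched as a prefix of the numerical part of the allele name, while a term containing * is treated as a shell-style pattern.

```python
import fnmatch


def parse_query(text):
    """Split the box contents on commas or new lines, dropping blank entries."""
    parts = text.replace("\n", ",").split(",")
    return [part.strip() for part in parts if part.strip()]


def select_alleles(alleles, query_text):
    """Return the alleles whose numerical code matches any query term.

    Assumption (not confirmed by the help text): a term without '*' matches
    as a prefix; a term containing '*' is used as a wildcard pattern.
    """
    terms = parse_query(query_text)
    selected = []
    for allele in alleles:
        # Keep only the part after the locus, e.g. "01:02:01" from "DRB1*01:02:01".
        code = allele.split("*", 1)[1] if "*" in allele else allele
        for term in terms:
            pattern = term if "*" in term else term + "*"
            if fnmatch.fnmatchcase(code, pattern):
                selected.append(allele)
                break
    return selected


alleles = ["DRB1*01:01", "DRB1*01:02:01", "DRB1*01:02:02", "DRB1*03:01"]
print(select_alleles(alleles, "01:0"))
print(select_alleles(alleles, "01:01, 01:02:01\n01:02:02"))
```

Under these assumptions, both calls select the three DRB1*01 alleles and leave DRB1*03:01 out.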
Enter the reference sequence - the alignment tool allows the user to select an alternative reference sequence. This is optional and, if not selected or altered, the tool uses the default sequence as listed below per locus. To use an alternative reference sequence, simply enter the full numerical code in the box provided. Please note that incorrect codes will cause errors in the alignment, e.g. 01:01 is not a valid code for specifying A*01:01:01:01 as the reference sequence; the full numerical code must be entered. Alternatively, typing "CONSENSUS" into the reference box uses a consensus sequence derived from the alleles specified for the alignment.
|Official Reference Sequences|
Note - in the DQB1 alignments, the DQB1*05 alleles are displayed first.
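As a rough illustration of what a consensus reference could look like, the Python sketch below takes the most common base at each position across the selected, already aligned sequences. This is an assumption made for illustration: the help text does not state how the tool derives its consensus, and the function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(sequences):
    """Build a consensus by majority vote at each aligned position.

    Assumption: all sequences are already aligned and of equal length.
    Ties are resolved in favour of the base seen first at that position.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


print(consensus(["CGGA", "CGGC", "CGTC"]))  # prints CGGC
```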
Alignment Display Options
Mismatches - this option selects whether to display the full sequence for all alleles in the alignment, or to only display those bases that mismatch the reference, e.g.:
- Show mismatches between sequences:
A*01:01:01:01 CGGGGGCCCT GGCCCTGACC
A*01:02       ---------- -------C--
- Show all bases:
A*01:01:01:01 CGGGGGCCCT GGCCCTGACC
A*01:02       CGGGGGCCCT GGCCCTGCCC
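The relationship between the two display modes above can be sketched in Python: each base that agrees with the reference is replaced by a dash, and only the differing bases are kept. The function name `mask_matches` is hypothetical and this is not the tool's own code.

```python
def mask_matches(reference, sequence, match_char="-"):
    """Replace bases identical to the reference with match_char.

    Spaces separating the blocks are kept as they are.
    """
    if len(reference) != len(sequence):
        raise ValueError("reference and sequence must have the same length")
    return "".join(
        match_char if (ref == base and base != " ") else base
        for ref, base in zip(reference, sequence)
    )


reference = "CGGGGGCCCT GGCCCTGACC"  # A*01:01:01:01
allele = "CGGGGGCCCT GGCCCTGCCC"     # A*01:02
print(mask_matches(reference, allele))  # prints ---------- -------C--
```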
Numbering - depending on the alignment type, different numbering formats can be selected. For nucleotide sequences the alignments can be displayed in blocks of 10 nucleotides or in blocks of 3 nucleotides to represent the amino acid codons. Genomic alignments are only displayed in blocks of 10 nucleotides and protein alignments are always displayed in blocks of 10 amino acids. For either format, it may be necessary to increase the width of your browser window (or zoom out) to fully view the alignment. Full details of how sequences are numbered are explained here.
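The two block formats can be illustrated with a small Python sketch that splits a sequence into space-separated groups. The function name `group_in_blocks` is hypothetical, and the codon view assumes the sequence starts at the first base of a codon.

```python
def group_in_blocks(sequence, block_size=10):
    """Split a sequence into space-separated blocks of block_size characters."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return " ".join(
        sequence[start:start + block_size]
        for start in range(0, len(sequence), block_size)
    )


sequence = "CGGGGGCCCTGGCCCTGACC"
print(group_in_blocks(sequence, 10))  # prints CGGGGGCCCT GGCCCTGACC
print(group_in_blocks(sequence, 3))   # prints CGG GGG CCC TGG CCC TGA CC
```

The final block is simply shorter when the sequence length is not a multiple of the block size.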
Alleles unsequenced in selected region - the user can omit the alleles that are not sequenced over the region of interest from their alignment. This will reduce the time taken to perform the alignment. For some loci, genomic alignments can contain over 1.5 million bases if all sequences are selected. When non-coding regions are selected, all alleles which contain unsequenced regions are removed from the alignment by default. Where possible, select only the sequences needed as this will reduce the loading time and make the alignments easier to view.
Output - to aid printing of the alignments, the user can select a text-only version of the output. This removes all interactive tags and is easier to cut and paste into applications like Microsoft Word.