Sequence Alignment Help

Using the Sequence Alignment Tool

The sequence alignment form contains the following options:

Select Locus - this option allows the user to choose which of the HLA or related genes to align. The locus is selected from the drop-down menu, which also includes a number of specialist choices, such as multiple alignments for all the DRB1, DRB3, DRB4 and DRB5 alleles or the DRB pseudogenes. The selection of a locus automatically determines the type of sequences available to align.

Select the feature to align - this option provides a list of the alignments available for the locus selected. The types of alignment include CDS alignments, individual exons or combined regions. If an option does not appear in the list, that alignment is either not possible or currently unavailable.

Enter any specific sequences required - allows the user to perform specific sequence alignments either by entering common nomenclature or by listing allele names. For example, to align DRB1*01:01, DRB1*01:02:01 and DRB1*01:02:02, you could enter 01 or 01:0 in the box as the common nomenclature, or you could enter 01:01, 01:02:01, 01:02:02 in the box provided, separating each allele name with a comma. Wildcards (*) may be used in the allele name.
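The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name select_alleles, the prefix-matching rule and the extra allele in the example list are assumptions made for the sketch.

```python
from fnmatch import fnmatchcase


def select_alleles(query, allele_names):
    """Return the allele names matched by any comma-separated term in `query`.

    Assumptions (not confirmed by the tool): each term is matched as a
    prefix of the numerical part of the allele name, and '*' inside a
    term matches any run of characters.
    """
    terms = [t.strip() for t in query.split(",") if t.strip()]
    selected = []
    for name in allele_names:
        # "DRB1*01:02:01" -> "01:02:01" (drop the locus and the separator)
        code = name.split("*", 1)[-1]
        if any(fnmatchcase(code, term + "*") for term in terms):
            selected.append(name)
    return selected


# Illustrative allele list: the first three names come from the text above.
alleles = ["DRB1*01:01", "DRB1*01:02:01", "DRB1*01:02:02", "DRB1*04:01:01"]
by_prefix = select_alleles("01:0", alleles)
by_list = select_alleles("01:01, 01:02:01, 01:02:02", alleles)
```

Under these assumptions both calls select the same three DRB1*01 alleles and leave the fourth name out.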

Enter the reference sequence - the alignment tool allows the user to select an alternative reference sequence. This is optional; if no reference is entered, the tool uses the default sequence for the locus, shown below. The alternatives are a user-specified sequence or a consensus sequence. To use an alternative reference sequence, enter its numerical code in full in the box provided. Please note that incorrect codes will cause errors in the alignment. For example, 01:01 is not a valid code for A*01:01:01:01; the full numerical code must be entered. A consensus sequence can be used by typing "consensus" into the reference box. The consensus sequence is derived only from the alleles selected for the alignment, not from all alleles at the locus selected.

Official Reference Sequences
Locus      Allele        Acc. No.
HLA-A      01:01:01:01   HLA00001
HLA-B      07:02:01      HLA00132
HLA-C      01:02:01      HLA00401
HLA-E      01:01:01:01   HLA00934
HLA-F      01:01:01:01   HLA01096
HLA-G      01:01:01:01   HLA00939
HLA-H      01:01:01:01   HLA02546
HLA-J      01:01:01:01   HLA02626
HLA-K      01:01:01:01   HLA02654
HLA-L      01:01:01:01   HLA02655
HLA-P      01:01:01:01   HLA02742
HLA-V      01:01:01:01   HLA02801
HLA-DMA    01:01:01:01   HLA00485
HLA-DMB    01:01:01:01   HLA00489
HLA-DOA    01:01:01      HLA00494
HLA-DOB    01:01:01:01   HLA01098
HLA-DPA1   01:03:01:01   HLA00499
HLA-DPB1   01:01:01      HLA00514
HLA-DQA1   01:01:01      HLA00601
HLA-DQB1   05:01:01:01   HLA00638
HLA-DRA    01:01:01:01   HLA00662
HLA-DRB1   01:01:01      HLA00664
HLA-DRB2   01:01         HLA01028
HLA-DRB3   01:01:01      HLA00886
HLA-DRB4   01:01:01:01   HLA00905
HLA-DRB5   01:01:01      HLA00915
HLA-DRB6   01:01         HLA00929
HLA-DRB7   01:01:01      HLA00932
HLA-DRB8   01:01         HLA01029
HLA-DRB9   01:01         HLA01030
MICA       001           HLA01013
MICB       001           HLA02033
TAP1       01:01:01:01   HLA00953
TAP2       01:01:01:01   HLA00959

Note - For the DQB1 alleles the DQB1*05 sequences are displayed first.
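The way the reference box is interpreted can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's code: the function names, the majority-rule consensus and the toy sequences are all hypothetical, and only a few rows of the reference table are copied in.

```python
from collections import Counter

# Default reference alleles for a few loci, copied from the table above.
DEFAULT_REFERENCE = {
    "HLA-A": "01:01:01:01",
    "HLA-B": "07:02:01",
    "HLA-C": "01:02:01",
    "HLA-DQB1": "05:01:01:01",
}


def consensus(sequences):
    """Majority-rule consensus of aligned sequences (an assumed rule)."""
    if not sequences:
        raise ValueError("no sequences selected")
    if any(len(s) != len(sequences[0]) for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # Take the most frequent symbol in each column; ties go to the
    # symbol encountered first.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sequences))


def resolve_reference(locus, entry, locus_sequences, selected_codes):
    """Return (label, sequence) for the reference row of the alignment.

    `locus_sequences` maps full numerical codes to aligned sequences and
    `selected_codes` lists the alleles chosen for the alignment.
    """
    entry = entry.strip()
    if entry.lower() == "consensus":
        # Built from the selected alleles only, not the whole locus.
        return "consensus", consensus([locus_sequences[c] for c in selected_codes])
    code = entry or DEFAULT_REFERENCE[locus]  # empty box -> default reference
    if code not in locus_sequences:
        raise KeyError(f"{code!r} is not a full allele code at {locus}")
    return code, locus_sequences[code]


# Toy aligned sequences, not real HLA data.
toy = {"01:01:01:01": "ATGC", "02:01:01:01": "ATGA", "03:01:01:01": "ATCA"}
default_ref = resolve_reference("HLA-A", "", toy, ["02:01:01:01"])
consensus_ref = resolve_reference("HLA-A", "consensus", toy, list(toy))
```

In this sketch a partial code such as 01:01 raises an error instead of being expanded to A*01:01:01:01, which mirrors the requirement that the numerical code be entered in full.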

Select how you wish to view any mismatches - this option selects whether to display the full sequence or to highlight only the mismatches. The full sequence option shows every base of every sequence. The mismatch option shows only the bases that differ from the reference sequence used, with matching positions drawn as a dash (-). Examples of both options are shown below.

Show mismatches between sequences:

A*01:02       ---------- -------C--
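The mismatch display can be reproduced with a short Python sketch. The function name show_mismatches and the two 20-base sequences are assumptions chosen for illustration; they are toy data, not the real A*01:01:01:01 and A*01:02 sequences.

```python
def show_mismatches(reference, sequence, block=10):
    """Render `sequence` against `reference`.

    Positions that agree are drawn as '-', positions that differ show the
    sequence's own base, and the row is split into space-separated blocks.
    """
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    marks = "".join("-" if r == s else s for r, s in zip(reference, sequence))
    return " ".join(marks[i:i + block] for i in range(0, len(marks), block))


# Toy sequences that differ at a single position (the 18th base).
reference = "ATGGCCGTCATGGCGCCTCG"
sequence = "ATGGCCGTCATGGCGCCCCG"
row = show_mismatches(reference, sequence)
```

For these toy inputs the rendered row is `---------- -------C--`, the same shape as the example line above.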

Show all bases:


Select how the alignment will be numbered - depending on the type of sequence selected, different numbering styles are available. Nucleotide sequences can be displayed in blocks of 10 nucleotides or as amino acid codons. Genomic sequences are only displayed in blocks of 10 nucleotides. Protein sequences are always displayed in blocks of 10 amino acids. For both formats it may be necessary to increase the width of your browser window to view the full sequence. Full details of how sequences are numbered are given below.
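The display rules in this option can be sketched as a small Python function. The names format_row, seq_type and style are hypothetical, and the sketch assumes a nucleotide sequence that starts on a codon boundary.

```python
def format_row(sequence, seq_type, style="blocks"):
    """Split a sequence into display groups.

    seq_type: "nucleotide", "genomic" or "protein" (hypothetical labels).
    style:    "blocks" for groups of 10, "codons" for groups of 3.
    """
    if style not in ("blocks", "codons"):
        raise ValueError(f"unknown style: {style!r}")
    if style == "codons" and seq_type != "nucleotide":
        # Genomic and protein sequences are only shown in blocks of 10.
        raise ValueError(f"{seq_type} sequences can only be shown in blocks")
    size = 3 if style == "codons" else 10
    return " ".join(sequence[i:i + size] for i in range(0, len(sequence), size))


cds = "ATGGCCGTCATG"  # toy coding sequence
in_blocks = format_row(cds, "nucleotide")            # groups of 10
in_codons = format_row(cds, "nucleotide", "codons")  # groups of 3
```

The codon layout produces more groups per line than the block layout, which is one reason a wider browser window may be needed.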

Do you want to omit alleles unsequenced for this region - because some alignments contain a very large number of alleles, you can omit those alleles that are not sequenced over the region of interest. This reduces the time taken to perform the alignment and the space required to display the output. For some loci, genomic alignments can contain over 1.5 million bases if all sequences are selected. When non-coding regions are selected, all alleles that contain unsequenced regions are removed from the alignment by default. Where possible, select only the sequences needed; this reduces the time taken and makes the alignments easier to view.
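As a rough sketch of this filter, the Python function below drops any allele with an unsequenced position inside the chosen region. It rests on assumptions not stated on this page: unsequenced positions are marked with an asterisk (*) in the aligned sequence, the region is given as a column range, and the allele names and sequences are toy data.

```python
def omit_unsequenced(alignment, start, end, unknown="*"):
    """Keep alleles with no unsequenced position in columns [start, end).

    `alignment` maps allele names to aligned sequences; `unknown` is the
    symbol assumed to mark an unsequenced position.
    """
    return {
        name: seq
        for name, seq in alignment.items()
        if unknown not in seq[start:end]
    }


# Toy alignment: two of the alleles are only partially sequenced.
alignment = {
    "allele_1": "ATGGCC",
    "allele_2": "ATG***",
    "allele_3": "***GCC",
}
first_half = omit_unsequenced(alignment, 0, 3)
whole_row = omit_unsequenced(alignment, 0, 6)
```

Narrowing the region keeps more alleles: over the first three columns only allele_3 is dropped, while over the whole row only allele_1 survives.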

Select type of output - to aid printing of the alignments, you can select a text-only version of the output. This removes all interactive tags and is easier to cut and paste into applications like Microsoft Word.

Sequence Alignment Conventions

Full details of the conventions used for numbering and displaying the alignments can be found here.



