Sequence Alignment Help

Using the Sequence Alignment Tool

The sequence alignment form contains the following options:

  • Select Locus - this option allows the user to choose which of the HLA or HLA-related genes to align from the drop-down menu. The drop-down menu also includes a number of special choices, such as multiple sequence alignments for all the DRB1, DRB3, DRB4 and DRB5 alleles or for all the DRB pseudogene alleles.

  • Select the feature to align - this option provides a list of alignments available for the locus selected. The alignments available include CDS alignments, individual exon alignments and alignments of combined regions. If an option listed in the drop-down menu for one locus is not listed for another locus, then that alignment is either not possible or currently unavailable for the second locus.

  • Enter any specific sequences required - this allows the user to view alignments of specific sequences, either by entering common nomenclature or by listing allele names. For example, to align DRB1*01:01, DRB1*01:02:01 and DRB1*01:02:02, the user could enter 01 or 01:0 into this box as the common nomenclature, and this will match the desired alleles. Alternatively, the user could enter "01:01, 01:02:01, 01:02:02" in the box, separating each allele name with either a comma or a new line, and this will also match the desired alleles. The wildcard character (*) may also be used in the allele name.

  • Enter the reference sequence - the alignment tool allows the user to select an alternative reference sequence. This is optional and, if not selected or altered, the tool uses the default sequence for the locus, as listed below. To use an alternative reference sequence, simply enter the full numerical code in the box provided. Please note that incorrect codes will cause errors in the alignment, e.g. 01:01 is not a valid code for specifying A*01:01:01:01 as the reference sequence - the full numerical code must be entered. A consensus sequence can be used as the reference by typing "CONSENSUS" into the reference box. The consensus sequence will be derived from the alleles specified for the alignment.
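
The matching rules described for the sequence box can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name match_alleles, the prefix-matching rule for common nomenclature, the use of fnmatchcase for the wildcard character, and the short list of DRB1 codes are all assumptions made for the example.

```python
import re
from fnmatch import fnmatchcase


def match_alleles(query, allele_codes):
    """Return the allele codes selected by a query string.

    Assumed behaviour (illustrative only):
      - terms are separated by commas or new lines;
      - a term without '*' selects every code that starts with it;
      - a term containing '*' is treated as a wildcard pattern.
    """
    terms = [t.strip() for t in re.split(r"[,\n]", query) if t.strip()]
    selected = []
    for code in allele_codes:
        for term in terms:
            if "*" in term:
                hit = fnmatchcase(code, term)
            else:
                hit = code.startswith(term)
            if hit:
                selected.append(code)
                break  # avoid listing the same allele twice
    return selected


# Hypothetical subset of DRB1 numerical codes (locus prefix omitted).
drb1 = ["01:01", "01:02:01", "01:02:02", "03:01:01", "04:01:01"]

print(match_alleles("01", drb1))                         # common nomenclature
print(match_alleles("01:01, 01:02:01, 01:02:02", drb1))  # explicit list
print(match_alleles("*:01:01", drb1))                    # wildcard
```

Under these assumptions, the first two queries select the same three DRB1*01 alleles, which mirrors the example given above. Matching on a prefix also explains why a short entry such as 01 can return many more alleles than intended when the full allele list is used.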


Official Reference Sequences
Locus     Allele       Acc. No.
HLA-A     01:01:01:01  HLA00001
HLA-B     07:02:01     HLA00132
HLA-C     01:02:01     HLA00401
HLA-E     01:01:01:01  HLA00934
HLA-F     01:01:01:01  HLA01096
HLA-G     01:01:01:01  HLA00939
HLA-H     01:01:01:01  HLA02546
HLA-J     01:01:01:01  HLA02626
HLA-K     01:01:01:01  HLA02654
HLA-L     01:01:01:01  HLA02655
HLA-P     01:01:01:01  HLA02742
HLA-V     01:01:01:01  HLA02801
HLA-Y     01:01        HLA13320
HLA-DMA   01:01:01:01  HLA00485
HLA-DMB   01:01:01:01  HLA00489
HLA-DOA   01:01:01     HLA00494
HLA-DOB   01:01:01:01  HLA01098
HLA-DPA1  01:03:01:01  HLA00499
HLA-DPB1  01:01:01     HLA00514
HLA-DPB2  01:01:01     HLA14837
HLA-DQA1  01:01:01     HLA00601
HLA-DQB1  05:01:01:01  HLA00638
HLA-DRA   01:01:01:01  HLA00662
HLA-DRB1  01:01:01     HLA00664
HLA-DRB2  01:01        HLA01028
HLA-DRB3  01:01:01     HLA00886
HLA-DRB4  01:01:01:01  HLA00905
HLA-DRB5  01:01:01     HLA00915
HLA-DRB6  01:01        HLA00929
HLA-DRB7  01:01:01     HLA00932
HLA-DRB8  01:01        HLA01029
HLA-DRB9  01:01        HLA01030
HFE       001:01:01    HLA14067
MICA      001          HLA01013
MICB      001          HLA02033
TAP1      01:01:01:01  HLA00953
TAP2      01:01:01:01  HLA00959

Note - in the DQB1 alignments, the DQB1*05 alleles are displayed first.

Alignment Display Options

  • Mismatches - this option selects whether to display the full sequence for all alleles in the alignment, or to only display those bases that mismatch the reference, e.g.:

    - Show mismatches between sequences:

    A*01:01:01:01 CGGGGGCCCT GGCCCTGACC
    A*01:02       ---------- -------C--

    - Show all bases:

    A*01:01:01:01 CGGGGGCCCT GGCCCTGACC
    A*01:02       CGGGGGCCCT GGCCCTGCCC

  • Numbering - depending on the alignment type, different numbering formats can be selected. Nucleotide alignments can be displayed in blocks of 10 nucleotides or in blocks of 3 nucleotides to represent the amino acid codons. Genomic alignments are only displayed in blocks of 10 nucleotides, and protein alignments are always displayed in blocks of 10 amino acids. For any of these formats, it may be necessary to increase the width of your browser window (or zoom out) to view the alignment in full. Full details of how sequences are numbered are explained here.

  • Alleles unsequenced in selected region - the user can omit the alleles that are not sequenced over the region of interest from their alignment. This will reduce the time taken to perform the alignment. For some loci, genomic alignments can contain over 1.5 million bases if all sequences are selected. When non-coding regions are selected, all alleles which contain unsequenced regions are removed from the alignment by default. Where possible, select only the sequences needed as this will reduce the loading time and make the alignments easier to view.

  • Output - to aid printing of the alignments, the user can select a text-only version of the output. This removes all interactive tags and is easier to copy and paste into applications like Microsoft Word.
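
The mismatch and block display options can be illustrated with a short Python sketch. It is not the code used by the alignment tool: the helper names group, mask_matches and render are hypothetical, and the sketch assumes pre-aligned sequences of equal length that differ only by substitutions. The two sequences are the ones used in the Mismatches example above.

```python
def group(seq, size):
    """Split a sequence into space-separated blocks of `size` characters."""
    return " ".join(seq[i:i + size] for i in range(0, len(seq), size))


def mask_matches(reference, allele):
    """Replace every base that is identical to the reference with '-'."""
    return "".join("-" if a == r else a for r, a in zip(reference, allele))


def render(alignment, reference_name, mismatches_only=True, block=10):
    """Format each sequence as one text line: padded name, then blocks."""
    ref = alignment[reference_name]
    width = max(len(name) for name in alignment)
    lines = []
    for name, seq in alignment.items():
        # The reference itself is always shown in full.
        if mismatches_only and name != reference_name:
            seq = mask_matches(ref, seq)
        lines.append(f"{name:<{width}} {group(seq, block)}")
    return lines


alignment = {
    "A*01:01:01:01": "CGGGGGCCCTGGCCCTGACC",
    "A*01:02": "CGGGGGCCCTGGCCCTGCCC",
}

# Mismatches only, blocks of 10 nucleotides.
print("\n".join(render(alignment, "A*01:01:01:01")))

# All bases, blocks of 3 nucleotides (codon-style grouping).
print("\n".join(render(alignment, "A*01:01:01:01",
                       mismatches_only=False, block=3)))
```

The first call reproduces the two-line mismatch display shown above. The second shows how the same data reads when every base is displayed and grouped in threes; whether the first block corresponds to a complete codon depends on where the selected region starts, which this sketch does not model.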



Further Information

For more information about the database, for queries (including queries about the website), or to subscribe to the IPD mailing lists, please contact IPD Support.

Please see our licence for our terms of use.