The IPD-IMGT/HLA Database provides an FTP site for the retrieval of sequences in a number of pre-formatted files. The sequences are provided in FASTA, PIR and MSF formats, together with an archive of the sequence alignments and a flat-file formatted copy of the database. A description of each file type is given below.
The FTP directory is available at the following address:
Previous releases are archived as a git repository, available at https://github.com/jrob119/IMGTHLA. This repository contains one branch for each database release and a Latest branch, which holds the most recent files as well as all compressed archives.
The files within the FTP directory are copyrighted by the IPD-IMGT/HLA Database (see http://www.ebi.ac.uk/imgt/hla/licence.html) and are distributed under the Creative Commons Attribution-NoDerivs License.
FASTA, PIR and MSF Formats
The FTP directory contains files in a number of file formats. Further information on these formats can be found at http://www.ebi.ac.uk/help/formats.html. All PIR and MSF files have been generated using "ReadSeq", a freely available sequence format conversion program written by D. Gilbert.
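As an illustration of the simplest of these formats, the following is a minimal sketch of a FASTA reader in Python. It assumes only the general FASTA layout (a header line starting with ">" followed by one or more sequence lines); the function name read_fasta and the example records are hypothetical and are not taken from the database files.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Illustrative input only: these headers and sequences are made up.
example = """>seq1 example header
ATGGCC
GTCATG
>seq2 another header
ATGCGG
""".splitlines()

records = read_fasta(example)
```

To read a downloaded file, the same function could be given an open file handle in place of the example lines.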
A text file version of all the sequence alignments, at both the nucleotide and protein level, is provided as a zip file. Zip files are compressed archives that can be opened with many utilities, such as WinZip (Windows), StuffIt (Mac) and Gunzip (Unix). The files in the archive use the following naming conventions:
- locus_nuc.txt - nucleotide CDS alignment
- locus_gen.txt - genomic nucleotide alignment
- locus_prot.txt - protein alignment
- ClassI_nuc.txt - nucleotide alignment of HLA-A, -B and -C alleles
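The naming conventions above can be applied programmatically. The following Python sketch lists the members of a zip archive and classifies each one by its suffix. It assumes that "locus" in the file names stands for a locus or group name (as in ClassI_nuc.txt); the function names and the demonstration file names are hypothetical, and the demonstration archive is built in memory rather than downloaded.

```python
import io
import re
import zipfile
from typing import BinaryIO, Dict, Optional, Tuple, Union

# Suffixes taken from the naming conventions listed above.
KINDS = {
    "nuc": "nucleotide CDS alignment",
    "gen": "genomic nucleotide alignment",
    "prot": "protein alignment",
}
PATTERN = re.compile(r"^(?P<locus>.+)_(?P<kind>nuc|gen|prot)\.txt$")


def classify_member(name: str) -> Optional[Tuple[str, str]]:
    """Return (locus, alignment type) for a member name, or None if it does not match."""
    base = name.rsplit("/", 1)[-1]  # drop any directory prefix inside the archive
    match = PATTERN.match(base)
    if match is None:
        return None
    return match.group("locus"), KINDS[match.group("kind")]


def list_alignments(archive: Union[str, BinaryIO]) -> Dict[str, Tuple[str, str]]:
    """Map each matching member of a zip archive to its (locus, alignment type)."""
    found: Dict[str, Tuple[str, str]] = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            info = classify_member(name)
            if info is not None:
                found[name] = info
    return found


# Demonstration with an in-memory archive; the member names are assumptions.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for member in ("A_nuc.txt", "A_gen.txt", "ClassI_nuc.txt", "README.txt"):
        zf.writestr(member, "placeholder content\n")
buffer.seek(0)

found = list_alignments(buffer)
```

Members that do not follow the convention, such as the README.txt in the demonstration, are simply skipped.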
Flat Files (hla.dat)
Please read the documentation provided in the FTP directory or the online documentation for a description of the flat file format.
All files released under the pre-2010 nomenclature designations and previously available from the FTP directory can now be found in the archive sub-directory.