KIR Downloads

The IPD-KIR Database provides an FTP site for the retrieval of sequences. The sequences are provided in FASTA, PIR and MSF formats. A description of each file type is given below.

The FTP directory is available at the following address:

File Formats

The following descriptions detail the sequence formats available on the FTP site. The FASTA and PIR files contain only the raw sequence: all inserts (.) and spaces (*) have been removed. All files have been generated using "ReadSeq", a freely available sequence format conversion program written by D. Gilbert.

FASTA

Sequences in FASTA/Pearson format are represented by two main line types. The first line always begins with a "greater than" (>) sign and contains sequence information. In the files provided, the sequence information line includes the unique IPD accession number and the allele name. The remaining lines contain plain text representing the nucleotide or protein sequence. There can be any number of these sequence lines, of any length, to represent the sequence. Please note that FASTA files contain no alignment information. This means the first base of each sequence may not correspond to the same position when the sequences are aligned.

Example KIR2DL1*001 in FASTA format:

>IPD:KIR00001 KIR2DL1*001, 1047 bases, 9EB285B5 checksum.
ATGTCGCTCTTGGTCGTCAGCATGGCGTGTGTTGGGTTCTTCTTGCTGCAGGGGGCCTGG
CCACATGAGGGAGTCCACAGAAAACCTTCCCTCCTGGCCCACCCAGGTCCCCTGGTGAAA
TCAGAAGAGACAGTCATCCTGCAATGTTGGTCAGATGTCATGTTTGAACACTTCCTTCTG
CACAGAGAGGGGATGTTTAACGACACTTTGCGCCTCATTGGAGAACACCATGATGGGGTC
TCCAAGGCCAACTTCTCCATCAGTCGCATGACGCAAGACCTGGCAGGGACCTACAGATGC
TACGGTTCTGTTACTCACTCCCCCTATCAGGTGTCAGCTCCCAGTGACCCTCTGGACATC
GTGATCATAGGTCTATATGAGAAACCTTCTCTCTCAGCCCAGCCGGGCCCCACGGTTCTG
GCAGGAGAGAATGTGACCTTGTCCTGCAGCTCCCGGAGCTCCTATGACATGTACCATCTA
TCCAGGGAAGGGGAGGCCCATGAACGTAGGCTCCCTGCAGGGCCCAAGGTCAACGGAACA
TTCCAGGCTGACTTTCCTCTGGGCCCTGCCACCCACGGAGGGACCTACAGATGCTTCGGC
TCTTTCCATGACTCTCCATACGAGTGGTCAAAGTCAAGTGACCCACTGCTTGTTTCTGTC
ACAGGAAACCCTTCAAATAGTTGGCCTTCACCCACTGAACCAAGCTCCAAAACCGGTAAC
CCCCGACACCTGCACATTCTGATTGGGACCTCAGTGGTCATCATCCTCTTCATCCTCCTC
TTCTTTCTCCTTCATCGCTGGTGCTCCAACAAAAAAAATGCTGCGGTAATGGACCAAGAG
TCTGCAGGAAACAGAACAGCGAATAGCGAGGACTCTGATGAACAAGACCCTCAGGAGGTG
ACATACACACAGTTGAATCACTGCGTTTTCACACAGAGAAAAATCACTCGCCCTTCTCAG
AGGCCCAAGACACCCCCAACAGATATCATCGTGTACACGGAACTTCCAAATGCTGAGTCC
AGATCCAAAGTTGTCTCCTGCCCATGA
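
The two line types described above can be read with a few lines of Python. The following is a minimal sketch rather than an official tool: the names `parse_fasta` and `FastaRecord` are hypothetical, and the header layout (accession, allele name, a comma, then the length in bases) is an assumption inferred from the single example shown here.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Assumed header layout, inferred from the example above:
# >ACCESSION ALLELE, LENGTH bases, CHECKSUM checksum.
HEADER_RE = re.compile(
    r"^>(?P<accession>\S+)\s+(?P<allele>[^,]+),\s*(?P<length>\d+)\s+bases"
)


class FastaRecord(NamedTuple):
    accession: str
    allele: str
    declared_length: int
    sequence: str


def _build(header: str, chunks: List[str]) -> FastaRecord:
    """Turn one header line and its sequence lines into a record."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unexpected header layout: {header!r}")
    return FastaRecord(
        match["accession"],
        match["allele"].strip(),
        int(match["length"]),
        "".join(chunks),  # sequence lines may be of any length and number
    )


def parse_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield one record per '>' line, joining the sequence lines after it."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield _build(header, chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _build(header, chunks)
```

Any iterable of lines can be passed in, including an open file handle. Comparing `len(record.sequence)` with `record.declared_length` gives a simple check that a record was read completely.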

MSF

The MSF file format is the only format provided that includes the alignment information. These files have been provided for use in the GeneDoc program.

The file may begin with as many lines of comment or description as required. This can be seen in MSF files which have been saved in GeneDoc. The first mandatory line that is recognised as part of the MSF file is the line containing "MSF:". This line also includes the sequence length, type and date, plus an internal checksum value. The next line is a mandatory blank line inserted before the sequence names.

There then follows one line per sequence giving the sequence name, length, checksum and a weight value. Only one name per line is allowed; the qualifier "Name: " is followed by the sequence name. Names are restricted to 10 characters or fewer. Extra characters between the sequence name and "Len: " are acceptable if they contain no blank characters. Another blank line is added, followed by a line starting with two slashes "//", which indicates the end of the name list. There then follows another blank line. Some MSF formats contain two lines at this point, with the second line containing the positions of the sequence elements.

The sequence lines follow. Each starts with the sequence name, followed by two spaces and 50 bases of the sequence. Each block of 10 elements has to be separated by a space. This is repeated for every sequence. Between each block of sequences is a blank line; again, a second line may be added to include positional numbering. The last block of sequences may contain fewer than 50 elements. It is important that all the sequences have the same length, including gaps. To conform to this file format, all inserts and spaces are marked by a period (.).

Example KIR2DL1 MSF file:

temp.msf1  MSF: 1047  Type: N  January 01, 1776  12:00  Check: 7515 ..

Name: KIR2DL1*001      Len:  1047  Check:  9282  Weight:  1.00
Name: KIR2DL1*002      Len:  1047  Check:  9451  Weight:  1.00
Name: KIR2DL1*00301    Len:  1047  Check:  9879  Weight:  1.00
Name: KIR2DL1*00302    Len:  1047  Check:  9759  Weight:  1.00

//

     KIR2DL1*001  ATGTCGCTCT TGGTCGTCAG CATGGCGTGT GTTGGGTTCT TCTTGCTGCA
     KIR2DL1*002  ATGTCGCTCT TGTTCGTCAG CATGGCGTGT GTTGGGTTCT TCTTGCTGCA
   KIR2DL1*00301  ATGTCGCTCT TGGTCGTCAG CATGGCGTGT GTTGGGTTCT TCTTGCTGCA
   KIR2DL1*00302  ATGTCGCTCT TGGTCGTCAG CATGGCGTGT GTTGGGTTCT TCTTGCTGCA
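
The layout described above (a name list closed by "//", then blocks of sequence lines) can be read back into one aligned string per sequence. The following is a minimal sketch under stated assumptions, not a full MSF reader: the function name `parse_msf` is hypothetical, the checksum and weight values are ignored, and any line after "//" whose first word is not a listed name (for example, positional numbering) is skipped.

```python
from typing import Dict, Iterable, List


def parse_msf(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence name: aligned sequence} from the lines of an MSF file."""
    names: List[str] = []
    parts: Dict[str, List[str]] = {}
    in_body = False  # becomes True once the "//" line has been seen

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not in_body:
            if line.startswith("//"):
                in_body = True
            elif line.startswith("Name:"):
                name = line.split()[1]  # the word after the "Name:" qualifier
                names.append(name)
                parts[name] = []
            continue  # comments and the "MSF:" line are skipped here
        tokens = line.split()
        if tokens[0] in parts:
            # Join the space-separated blocks of 10 elements back together.
            parts[tokens[0]].append("".join(tokens[1:]))

    aligned = {name: "".join(parts[name]) for name in names}
    if len({len(seq) for seq in aligned.values()}) > 1:
        raise ValueError("aligned sequences differ in length")
    return aligned


def differing_columns(a: str, b: str) -> List[int]:
    """Return the 0-based alignment columns at which two sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Because every sequence in the result has the same length, columns can be compared directly, which is not possible with the FASTA or PIR files. Removing the periods from an aligned sequence, for example with `seq.replace(".", "")`, should give the ungapped sequence, on the assumption that periods are used only to mark inserts and spaces.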

PIR

The format of sequences in PIR/NBRF format is more complex. The first line of each sequence entry begins with a "greater than" (>) sign. This is immediately followed by a two-character sequence type specifier; for these sequences this is "DL", meaning DNA linear. The fourth character must be a semicolon. The sequence name or identification code begins at the fifth character. The second line of each sequence entry contains a brief description including the accession number, allele name, sequence length and an internal checksum for PIR files. The nucleic acid sequence begins on the third line. The sequence is free format; however, to aid in reading the sequences, the nucleotides have been arranged in blocks of 10 bases. The last character is an asterisk (*), which acts as a termination character.

Example KIR2DL1*001 in PIR format:

>DL;IPD:KIR00001
IPD:KIR00001 KIR2DL1*001, 1047 bases, 9EB285B5 checksum.
 ATGTCGCTCT TGGTCGTCAG CATGGCGTGT GTTGGGTTCT TCTTGCTGCA
 GGGGGCCTGG CCACATGAGG GAGTCCACAG AAAACCTTCC CTCCTGGCCC 
 ACCCAGGTCC CCTGGTGAAA TCAGAAGAGA CAGTCATCCT GCAATGTTGG
 TCAGATGTCA TGTTTGAACA CTTCCTTCTG CACAGAGAGG GGATGTTTAA
 CGACACTTTG CGCCTCATTG GAGAACACCA TGATGGGGTC TCCAAGGCCA
 ACTTCTCCAT CAGTCGCATG ACGCAAGACC TGGCAGGGAC CTACAGATGC
 TACGGTTCTG TTACTCACTC CCCCTATCAG GTGTCAGCTC CCAGTGACCC
 TCTGGACATC GTGATCATAG GTCTATATGA GAAACCTTCT CTCTCAGCCC
 AGCCGGGCCC CACGGTTCTG GCAGGAGAGA ATGTGACCTT GTCCTGCAGC
 TCCCGGAGCT CCTATGACAT GTACCATCTA TCCAGGGAAG GGGAGGCCCA
 TGAACGTAGG CTCCCTGCAG GGCCCAAGGT CAACGGAACA TTCCAGGCTG
 ACTTTCCTCT GGGCCCTGCC ACCCACGGAG GGACCTACAG ATGCTTCGGC
 TCTTTCCATG ACTCTCCATA CGAGTGGTCA AAGTCAAGTG ACCCACTGCT
 TGTTTCTGTC ACAGGAAACC CTTCAAATAG TTGGCCTTCA CCCACTGAAC
 CAAGCTCCAA AACCGGTAAC CCCCGACACC TGCACATTCT GATTGGGACC
 TCAGTGGTCA TCATCCTCTT CATCCTCCTC TTCTTTCTCC TTCATCGCTG
 GTGCTCCAAC AAAAAAAATG CTGCGGTAAT GGACCAAGAG TCTGCAGGAA
 ACAGAACAGC GAATAGCGAG GACTCTGATG AACAAGACCC TCAGGAGGTG
 ACATACACAC AGTTGAATCA CTGCGTTTTC ACACAGAGAA AAATCACTCG
 CCCTTCTCAG AGGCCCAAGA CACCCCCAAC AGATATCATC GTGTACACGG
 AACTTCCAAA TGCTGAGTCC AGATCCAAAG TTGTCTCCTG CCCATGA*
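
The PIR entry structure described above (type specifier and identifier, a description line, then sequence blocks ending in an asterisk) can be read as follows. This is a minimal sketch with hypothetical names (`parse_pir`, `PirRecord`); it assumes each entry is well formed and that the asterisk is the last character of the final sequence line.

```python
from typing import Iterable, Iterator, List, NamedTuple


class PirRecord(NamedTuple):
    seq_type: str      # two-character specifier, e.g. "DL"
    identifier: str    # text after the semicolon on the first line
    description: str   # the whole second line
    sequence: str      # sequence with spaces and the trailing '*' removed


def parse_pir(lines: Iterable[str]) -> Iterator[PirRecord]:
    """Yield one PirRecord per entry found in the given lines."""
    rows = [line.rstrip("\n") for line in lines]
    i = 0
    while i < len(rows):
        first = rows[i].strip()
        if not first.startswith(">"):
            i += 1  # skip blank or unrelated lines between entries
            continue
        if len(first) < 5 or first[3] != ";":
            raise ValueError(f"fourth character must be ';': {first!r}")
        seq_type, identifier = first[1:3], first[4:].strip()
        if i + 1 >= len(rows):
            raise ValueError("entry has no description line")
        description = rows[i + 1].strip()
        i += 2

        chunks: List[str] = []
        terminated = False
        while i < len(rows) and not terminated:
            text = "".join(rows[i].split())  # drop the spaces between blocks
            i += 1
            if text.endswith("*"):
                text = text[:-1]  # the asterisk only marks the end
                terminated = True
            chunks.append(text)
        if not terminated:
            raise ValueError(f"no '*' terminator found for {identifier!r}")
        yield PirRecord(seq_type, identifier, description, "".join(chunks))
```

The description line is kept as plain text here. In the files shown above it has the same layout as the FASTA information line without the leading ">", so the accession number and allele name could be split out in the same way.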
