IPD-MHC Database
Downloads
The IPD-MHC Database provides an FTP
site for the retrieval of sequences. The sequences are provided in
FASTA and PIR formats; a description of each file type is given
below. The FTP directory is available at the following address:
The FTP directory is further divided into separate
directories for each group.
The files follow this naming convention:
- species.locus.type.format
For example, to obtain a file of FASTA-formatted
Gorilla-A nucleotide sequences, you would select gogo.a.nuc.fasta
from the nhp directory.
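As an illustration, a minimal Python sketch of splitting a file name into its four fields. The field labels (`species`, `locus`, `seq_type`, `fmt`) and the function name `parse_file_name` are our own, and the sketch assumes that no field itself contains a dot; only `gogo.a.nuc.fasta` comes from the example above.

```python
from typing import NamedTuple


class SeqFileName(NamedTuple):
    """Fields of a name following the species.locus.type.format convention."""
    species: str
    locus: str
    seq_type: str
    fmt: str


def parse_file_name(name: str) -> SeqFileName:
    """Split a download file name into its four dot-separated fields.

    Assumption: no field contains a dot, so a valid name has exactly
    four parts.
    """
    parts = name.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected species.locus.type.format, got {name!r}")
    return SeqFileName(*parts)


print(parse_file_name("gogo.a.nuc.fasta"))
```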
IPD FTP Server
The following descriptions detail the types
of sequence formats available at the FTP site. The FASTA and PIR
files contain raw sequence only: all inserts (.) and spaces (*)
have been removed from the sequence.
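For illustration only, the removal described above amounts to deleting those two characters from an aligned sequence. In this Python sketch the function name `to_raw_sequence` and the aligned input string are made up; it also assumes whitespace should be dropped so that line-wrapped input can be passed in directly.

```python
def to_raw_sequence(aligned: str) -> str:
    """Remove insert (.) and space (*) characters, leaving raw sequence.

    Whitespace is also dropped (an assumption of this sketch) so that
    line-wrapped input can be passed in directly.
    """
    return "".join(ch for ch in aligned if ch not in ".*" and not ch.isspace())


print(to_raw_sequence("ATG..TCG**CTC"))  # ATGTCGCTC
```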
FASTA - Sequences
in FASTA/Pearson format are represented by two main line types.
The first line always begins with a "greater than" (>) sign and
contains sequence information; in the files provided, this
information includes the allele name. The remaining lines contain
plain text representing the nucleotide sequence. There can be any
number of these sequence lines, of any length, to represent the
nucleotide sequence.
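Because the sequence may span any number of lines of any length, a reader has to join lines until the next ">" line. A minimal Python sketch, assuming the input is plain FASTA text; the function name `read_fasta` is our own, and lines before the first header are simply ignored.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA/Pearson-format text.

    The header is the text after '>' on the first line; the sequence is
    the concatenation of all following lines up to the next '>' line.
    Blank lines and lines before the first header are skipped.
    """
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = ">Allele*0101\nATGTCGCTCTTGG\nTCGTCAGC\n"
for name, seq in read_fasta(io.StringIO(example)):
    print(name, len(seq))  # Allele*0101 21
```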
Example FASTA format:
>Allele*0101
ATGTCGCTCTTGGTCGTCAGCATGGCGTGTGTTGGGTTCTTCTTGCTGCAGGGGGCCTGG
CCACATGAGGGAGTCCACAGAAAACCTTCCCTCCTGGCCCACCCAGGTCCCCTGGTGAAA
TCAGAAGAGACAGTCATCCTGCAATGTTGGTCAGATGTCATGTTTGAACACTTCCTTCTG
CACAGAGAGGGGATGTTTAACGACACTTTGCGCCTCATTGGAGAACACCATGATGGGGTC
TCCAAGGCCAACTTCTCCATCAGTCGCATGACGCAAGACCTGGCAGGGACCTACAGATGC
TACGGTTCTGTTACTCACTCCCCCTATCAGGTGTCAGCTCCCAGTGACCCTCTGGACATC
GTGATCATAGGTCTATATGAGAAACCTTCTCTCTCAGCCCAGCCGGGCCCCACGGTTCTG
GCAGGAGAGAATGTGACCTTGTCCTGCAGCTCCCGGAGCTCCTATGACATGTACCATCTA
TCCAGGGAAGGGGAGGCCCATGAACGTAGGCTCCCTGCAGGGCCCAAGGTCAACGGAACA
TTCCAGGCTGACTTTCCTCTGGGCCCTGCCACCCACGGAGGGACCTACAGATGCTTCGGC
TCTTTCCATGACTCTCCATACGAGTGGTCAAAGTCAAGTGACCCACTGCTTGTTTCTGTC
ACAGGAAACCCTTCAAATAGTTGGCCTTCACCCACTGAACCAAGCTCCAAAACCGGTAAC
CCCCGACACCTGCACATTCTGATTGGGACCTCAGTGGTCATCATCCTCTTCATCCTCCTC
TTCTTTCTCCTTCATCGCTGGTGCTCCAACAAAAAAAATGCTGCGGTAATGGACCAAGAG
TCTGCAGGAAACAGAACAGCGAATAGCGAGGACTCTGATGAACAAGACCCTCAGGAGGTG
ACATACACACAGTTGAATCACTGCGTTTTCACACAGAGAAAAATCACTCGCCCTTCTCAG
AGGCCCAAGACACCCCCAACAGATATCATCGTGTACACGGAACTTCCAAATGCTGAGTCC
AGATCCAAAGTTGTCTCCTGCCCATGA
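The example above wraps the sequence at 60 characters per line; since any line length is valid, this is a layout choice rather than a requirement. A minimal Python sketch of writing a record in that layout (the function name `format_fasta` and its `width` parameter are our own):

```python
def format_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record with the sequence wrapped at `width` characters.

    The default of 60 matches the line length in the example; any
    positive width produces valid FASTA.
    """
    lines = [">" + name]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


print(format_fasta("Allele*0101", "ACGT" * 20), end="")
```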
PIR - Sequences in PIR/NBRF format have a more complex layout.
The first line of each sequence entry begins with a "greater than"
sign (>). This is immediately followed by a two-character sequence
type specifier; for these sequences it is "DL", meaning DNA linear.
Position four must contain a semicolon, and the sequence name or
identification code begins in position five. The second line of
each sequence entry contains a brief description, including the
accession number, allele name, sequence length, and an internal
checksum. The nucleic acid sequence begins on the third line. The
sequence is free format; however, to aid reading, the nucleotides
have been arranged in blocks of 10 bases. The last character is an
asterisk (*), which acts as a termination character.
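A minimal Python sketch of reading one PIR entry laid out as described above. The function name `parse_pir`, the returned dictionary keys, and the short example entry are our own; the sketch handles a single entry only and does not verify the checksum, whose algorithm is not described here.

```python
def parse_pir(text: str) -> dict[str, str]:
    """Parse a single PIR/NBRF entry.

    Line 1: '>' + two-character type code (e.g. 'DL') + ';' in
    position four + identifier from position five.
    Line 2: free-text description (checksum is NOT verified here).
    Remaining lines: sequence, spaces allowed, ending in '*'.
    """
    lines = text.strip().splitlines()
    if len(lines) < 3:
        raise ValueError("a PIR entry needs a header, description and sequence")
    header = lines[0]
    # Position four (1-based) is index 3 (0-based).
    if not header.startswith(">") or len(header) < 5 or header[3] != ";":
        raise ValueError(f"not a PIR header: {header!r}")
    # Drop the spaces between 10-base blocks and join the sequence lines.
    body = "".join("".join(ln.split()) for ln in lines[2:])
    if not body.endswith("*"):
        raise ValueError("sequence is missing the terminating '*'")
    return {
        "type": header[1:3],
        "name": header[4:],
        "description": lines[1],
        "sequence": body[:-1],
    }


entry = """>DL;Example
Example, 25 bases.
ATGTCGCTCT TGGTCGTCAG
CATGG*
"""
print(parse_pir(entry))
```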
Example PIR format:
>DL;Allele*0101
Allele*0101, 1047 bases, 9EB285B5 checksum.
ATGTCGCTCT TGGTCGTCAG CATGGCGTGT GTTGGGTTCT TCTTGCTGCA
GGGGGCCTGG CCACATGAGG GAGTCCACAG AAAACCTTCC CTCCTGGCCC
ACCCAGGTCC CCTGGTGAAA TCAGAAGAGA CAGTCATCCT GCAATGTTGG
TCAGATGTCA TGTTTGAACA CTTCCTTCTG CACAGAGAGG GGATGTTTAA
CGACACTTTG CGCCTCATTG GAGAACACCA TGATGGGGTC TCCAAGGCCA
ACTTCTCCAT CAGTCGCATG ACGCAAGACC TGGCAGGGAC CTACAGATGC
TACGGTTCTG TTACTCACTC CCCCTATCAG GTGTCAGCTC CCAGTGACCC
TCTGGACATC GTGATCATAG GTCTATATGA GAAACCTTCT CTCTCAGCCC
AGCCGGGCCC CACGGTTCTG GCAGGAGAGA ATGTGACCTT GTCCTGCAGC
TCCCGGAGCT CCTATGACAT GTACCATCTA TCCAGGGAAG GGGAGGCCCA
TGAACGTAGG CTCCCTGCAG GGCCCAAGGT CAACGGAACA TTCCAGGCTG
ACTTTCCTCT GGGCCCTGCC ACCCACGGAG GGACCTACAG ATGCTTCGGC
TCTTTCCATG ACTCTCCATA CGAGTGGTCA AAGTCAAGTG ACCCACTGCT
TGTTTCTGTC ACAGGAAACC CTTCAAATAG TTGGCCTTCA CCCACTGAAC
CAAGCTCCAA AACCGGTAAC CCCCGACACC TGCACATTCT GATTGGGACC
TCAGTGGTCA TCATCCTCTT CATCCTCCTC TTCTTTCTCC TTCATCGCTG
GTGCTCCAAC AAAAAAAATG CTGCGGTAAT GGACCAAGAG TCTGCAGGAA
ACAGAACAGC GAATAGCGAG GACTCTGATG AACAAGACCC TCAGGAGGTG
ACATACACAC AGTTGAATCA CTGCGTTTTC ACACAGAGAA AAATCACTCG
CCCTTCTCAG AGGCCCAAGA CACCCCCAAC AGATATCATC GTGTACACGG
AACTTCCAAA TGCTGAGTCC AGATCCAAAG TTGTCTCCTG CCCATGA*
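The sequence layout in the example (blocks of 10 bases, five blocks per line, and the "*" appended directly to the last block) can be reproduced with a short Python sketch. The function `format_pir_sequence` is hypothetical and formats only the sequence lines; it does not produce the header, the description line, or the ReadSeq checksum.

```python
def format_pir_sequence(sequence: str, block: int = 10, blocks_per_line: int = 5) -> str:
    """Lay out a sequence as in the PIR example above.

    Splits the sequence into blocks of `block` bases, joins
    `blocks_per_line` blocks per line with single spaces, and appends
    the '*' terminator to the final line.
    """
    blocks = [sequence[i:i + block] for i in range(0, len(sequence), block)]
    lines = [
        " ".join(blocks[i:i + blocks_per_line])
        for i in range(0, len(blocks), blocks_per_line)
    ]
    return "\n".join(lines) + "*"


# First 60 bases of the example allele.
print(format_pir_sequence("ATGTCGCTCTTGGTCGTCAGCATGGCGTGTGTTGGGTTCTTCTTGCTGCAGGGGGCCTGG"))
```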
All PIR files have been generated using "ReadSeq",
a freely available sequence format conversion program written by
D. Gilbert.
Further Information
For information regarding IPD, please contact IPD Support.