The IPD-MHC Database provides an FTP site for the retrieval of sequences. The sequences are provided in FASTA and PIR formats. A description of each file type is available below. The FTP directory is available at the following address:
The FTP directory is further divided into separate directories for each group.
The files use the following naming conventions:
For example, for a file containing FASTA-formatted Gorilla-A nucleotide sequences, you would select gogo.a.nuc.fasta from the nhp directory.
IPD FTP Server
The following descriptions detail the sequence formats available at the FTP site. The FASTA and PIR files contain raw sequence only: all inserts (.) and spaces (*) have been removed from the sequence.
FASTA - Sequences in FASTA/Pearson format are represented by two main line types. The first line of each entry always begins with a "greater than" (>) sign and contains the sequence information. In the files provided, the sequence information includes the allele name. The remaining lines contain plain text representing the nucleotide sequence. There can be any number of these sequence lines, of any length, to represent the nucleotide sequence.
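The two line types described above can be read with a few lines of Python. The following is a minimal sketch rather than an official IPD tool: the function name read_fasta and the sample allele names are hypothetical, and it assumes the whole header line after the ">" sign is used as the record key.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Header line: everything after ">" is the sequence information.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence line: any number of lines, of any length, are joined.
            records[header] += line
    return records


# Hypothetical sample data, not taken from the FTP site.
sample = """>Example-A*01:01
ATGTCGCTCT
TGGTCGTCAG
>Example-A*01:02
ATGTCGCTCA
""".splitlines()

records = read_fasta(sample)
for name, sequence in records.items():
    print(name, len(sequence))
```

To read a downloaded file instead of the sample, an open file handle can be passed in place of the list of lines, for example read_fasta(open("gogo.a.nuc.fasta")).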
Example FASTA format:
PIR - The format of sequences in PIR/NBRF format is more complex. The first line of each sequence entry begins with a "greater than" (>) sign. This is immediately followed by a two-character sequence type specifier; for these sequences this is "DL", meaning DNA linear. The fourth character must be a semicolon. The sequence name or identification code begins at the fifth character. The second line of each sequence entry contains a brief description including the accession number, allele name, sequence length, and an internal checksum for PIR files. The nucleic acid sequence begins on the third line. The sequence is free format; however, to aid in reading the sequences, the nucleotides have been arranged in blocks of 10. The last character is an asterisk (*), which acts as a termination character.
Example PIR format:
Allele*0101, 1047 bases, 9EB285B5 checksum.
ATGTCGCTCT TGGTCGTCAG CATGGCGTGT GTTGGGTTCT TCTTGCTGCA
GGGGGCCTGG CCACATGAGG GAGTCCACAG AAAACCTTCC CTCCTGGCCC
ACCCAGGTCC CCTGGTGAAA TCAGAAGAGA CAGTCATCCT GCAATGTTGG
TCAGATGTCA TGTTTGAACA CTTCCTTCTG CACAGAGAGG GGATGTTTAA
CGACACTTTG CGCCTCATTG GAGAACACCA TGATGGGGTC TCCAAGGCCA
ACTTCTCCAT CAGTCGCATG ACGCAAGACC TGGCAGGGAC CTACAGATGC
TACGGTTCTG TTACTCACTC CCCCTATCAG GTGTCAGCTC CCAGTGACCC
TCTGGACATC GTGATCATAG GTCTATATGA GAAACCTTCT CTCTCAGCCC
AGCCGGGCCC CACGGTTCTG GCAGGAGAGA ATGTGACCTT GTCCTGCAGC
TCCCGGAGCT CCTATGACAT GTACCATCTA TCCAGGGAAG GGGAGGCCCA
TGAACGTAGG CTCCCTGCAG GGCCCAAGGT CAACGGAACA TTCCAGGCTG
ACTTTCCTCT GGGCCCTGCC ACCCACGGAG GGACCTACAG ATGCTTCGGC
TCTTTCCATG ACTCTCCATA CGAGTGGTCA AAGTCAAGTG ACCCACTGCT
TGTTTCTGTC ACAGGAAACC CTTCAAATAG TTGGCCTTCA CCCACTGAAC
CAAGCTCCAA AACCGGTAAC CCCCGACACC TGCACATTCT GATTGGGACC
TCAGTGGTCA TCATCCTCTT CATCCTCCTC TTCTTTCTCC TTCATCGCTG
GTGCTCCAAC AAAAAAAATG CTGCGGTAAT GGACCAAGAG TCTGCAGGAA
ACAGAACAGC GAATAGCGAG GACTCTGATG AACAAGACCC TCAGGAGGTG
ACATACACAC AGTTGAATCA CTGCGTTTTC ACACAGAGAA AAATCACTCG
CCCTTCTCAG AGGCCCAAGA CACCCCCAAC AGATATCATC GTGTACACGG
AACTTCCAAA TGCTGAGTCC AGATCCAAAG TTGTCTCCTG CCCATGA*
All PIR files have been generated using "ReadSeq", a freely available sequence format conversion program written by D. Gilbert.
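The PIR layout described above (header line, description line, then blocked sequence ending in an asterisk) can be parsed along the following lines. This is a minimal sketch under stated assumptions, not the ReadSeq implementation: the function name read_pir, the entry name EXAMPLE01 and the checksum value in the sample are hypothetical, and the checksum is not verified because its algorithm is not described here.

```python
import re
from typing import Dict, Iterable, List


def read_pir(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse PIR entries into dictionaries of type, name, description, sequence."""
    entries: List[Dict[str, str]] = []
    entry = None
    state = "skip"  # one of "description", "sequence", "skip"
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Header: ">" + two-character type + ";" + sequence name.
            if len(line) < 4 or line[3] != ";":
                raise ValueError(f"malformed PIR header: {line!r}")
            entry = {"type": line[1:3], "name": line[4:].strip(),
                     "description": "", "sequence": ""}
            entries.append(entry)
            state = "description"
        elif state == "description":
            # Second line of the entry: the brief description.
            entry["description"] = line.strip()
            state = "sequence"
        elif state == "sequence":
            # Remove the spaces that separate the blocks of 10.
            chunk = "".join(line.split())
            if "*" in chunk:
                # The asterisk terminates the sequence.
                chunk = chunk[:chunk.index("*")]
                state = "skip"
            entry["sequence"] += chunk
    return entries


# Hypothetical sample entry; name and checksum are placeholders.
sample = """>DL;EXAMPLE01
Allele*0101, 27 bases, 00000000 checksum.
ATGTCGCTCT TGGTCGTCAG
CCCATGA*
""".splitlines()

entries = read_pir(sample)
first = entries[0]

# Compare the length stated in the description with the parsed sequence.
match = re.search(r"(\d+) bases", first["description"])
declared_length = int(match.group(1)) if match else None
print(first["name"], declared_length, len(first["sequence"]))
```

Comparing the declared length with the parsed length is a cheap sanity check that no sequence lines were lost during download or conversion.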