Downloads

The IPD-MHC Database provides an FTP site for the retrieval of sequences. The sequences are provided in FASTA and PIR formats. A description of each file format is given below. The FTP directory is available at the following address:


Group FTP Directory
IPD-MHC ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/

The FTP directory is further divided into separate directories for each group.


Group FTP Directory
Canines ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/dla
Cattle ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/bola
Felines ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/fla
Fish ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/fish
Non-Human Primates ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/nhp
Rats ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/rt1
Sheep ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/ovar
Swine ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/sla

The file names follow this convention:

  • species.locus.type.format

For example, for a file containing FASTA-formatted Gorilla-A nucleotide sequences, you would select gogo.a.nuc.fasta from the nhp directory.
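As a rough illustration of the naming convention, the sketch below assembles a file name and full FTP address from its parts. The function name build_ftp_url and its parameter names are hypothetical, and the sketch assumes that every file follows the species.locus.type.format pattern in lowercase, as in the gogo.a.nuc.fasta example. The actual directory listing should be checked before relying on a constructed address.

```python
# Base address of the IPD-MHC FTP directory, as listed on this page.
BASE_URL = "ftp://ftp.ebi.ac.uk/pub/databases/ipd/mhc/"


def build_ftp_url(group_dir, species, locus, seq_type, file_format):
    """Return the FTP address for one sequence file (hypothetical helper).

    Assumes the species.locus.type.format convention with lowercase parts.
    """
    parts = [species, locus, seq_type, file_format]
    file_name = ".".join(part.lower() for part in parts)
    return f"{BASE_URL}{group_dir}/{file_name}"


# The Gorilla-A nucleotide FASTA file from the nhp directory.
print(build_ftp_url("nhp", "gogo", "a", "nuc", "fasta"))
```

The function only builds the address. Whether a given combination of species, locus, type and format exists on the server is not checked.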

IPD FTP Server

The following descriptions detail the sequence formats available at the FTP site. The FASTA and PIR files contain raw sequence only: all inserts (.) and spaces (*) have been removed from the sequence.

FASTA - Sequences in FASTA/Pearson format are represented by two main line types. The first line of each entry always begins with a "greater than" (>) sign and contains sequence information. In the files provided, the sequence information includes the allele name. The remaining lines contain plain text representing the nucleotide sequence. There can be any number of these sequence lines, of any length.

Example FASTA format:

>Allele*0101
ATGTCGCTCTTGGTCGTCAGCATGGCGTGTGTTGGGTTCTTCTTGCTGCAGGGGGCCTGG
CCACATGAGGGAGTCCACAGAAAACCTTCCCTCCTGGCCCACCCAGGTCCCCTGGTGAAA
TCAGAAGAGACAGTCATCCTGCAATGTTGGTCAGATGTCATGTTTGAACACTTCCTTCTG
CACAGAGAGGGGATGTTTAACGACACTTTGCGCCTCATTGGAGAACACCATGATGGGGTC
TCCAAGGCCAACTTCTCCATCAGTCGCATGACGCAAGACCTGGCAGGGACCTACAGATGC
TACGGTTCTGTTACTCACTCCCCCTATCAGGTGTCAGCTCCCAGTGACCCTCTGGACATC
GTGATCATAGGTCTATATGAGAAACCTTCTCTCTCAGCCCAGCCGGGCCCCACGGTTCTG
GCAGGAGAGAATGTGACCTTGTCCTGCAGCTCCCGGAGCTCCTATGACATGTACCATCTA
TCCAGGGAAGGGGAGGCCCATGAACGTAGGCTCCCTGCAGGGCCCAAGGTCAACGGAACA
TTCCAGGCTGACTTTCCTCTGGGCCCTGCCACCCACGGAGGGACCTACAGATGCTTCGGC
TCTTTCCATGACTCTCCATACGAGTGGTCAAAGTCAAGTGACCCACTGCTTGTTTCTGTC
ACAGGAAACCCTTCAAATAGTTGGCCTTCACCCACTGAACCAAGCTCCAAAACCGGTAAC
CCCCGACACCTGCACATTCTGATTGGGACCTCAGTGGTCATCATCCTCTTCATCCTCCTC
TTCTTTCTCCTTCATCGCTGGTGCTCCAACAAAAAAAATGCTGCGGTAATGGACCAAGAG
TCTGCAGGAAACAGAACAGCGAATAGCGAGGACTCTGATGAACAAGACCCTCAGGAGGTG
ACATACACACAGTTGAATCACTGCGTTTTCACACAGAGAAAAATCACTCGCCCTTCTCAG
AGGCCCAAGACACCCCCAACAGATATCATCGTGTACACGGAACTTCCAAATGCTGAGTCC
AGATCCAAAGTTGTCTCCTGCCCATGA
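
The two line types described above can be read with a few lines of code. The following is a minimal sketch, not the database's own tooling: the function name parse_fasta is hypothetical, the sample record is a shortened stand-in for a real entry, and the whole header line after the ">" sign is assumed to serve as the record name.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: everything after '>' is taken as the record name.
            name = line[1:]
            records[name] = []
        elif name is not None:
            # Sequence line: may be any length, so collect and join later.
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


# Shortened, illustrative record (not a complete allele sequence).
sample = """>Allele*0101
ATGTCGCTCTTGGTCGTCAGC
ATGGCGTGTGTTGGG
"""
records = parse_fasta(sample)
print(records["Allele*0101"])
```

Because sequence lines may be of any length, the sketch joins them without assuming a fixed line width.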

PIR - The format of sequences in PIR/NBRF format is more complex. The first line of each sequence entry begins with a "greater than" (>) sign. This is immediately followed by a two-character sequence type specifier; for these sequences this is "DL", meaning DNA linear. The fourth character must be a semicolon. The sequence name or identification code begins at the fifth character. The second line of each sequence entry contains a brief description including the accession number, allele name, sequence length, and an internal checksum for PIR files. The nucleic acid sequence begins on the third line. The sequence is free format; however, to aid in reading the sequences, the nucleotides have been arranged in blocks of 10. The last character is an asterisk (*), which acts as a termination character.

Example PIR format:

>DL;Allele*0101
Allele*0101, 1047 bases, 9EB285B5 checksum.
ATGTCGCTCT TGGTCGTCAG CATGGCGTGT GTTGGGTTCT TCTTGCTGCA
GGGGGCCTGG CCACATGAGG GAGTCCACAG AAAACCTTCC CTCCTGGCCC
ACCCAGGTCC CCTGGTGAAA TCAGAAGAGA CAGTCATCCT GCAATGTTGG
TCAGATGTCA TGTTTGAACA CTTCCTTCTG CACAGAGAGG GGATGTTTAA
CGACACTTTG CGCCTCATTG GAGAACACCA TGATGGGGTC TCCAAGGCCA
ACTTCTCCAT CAGTCGCATG ACGCAAGACC TGGCAGGGAC CTACAGATGC
TACGGTTCTG TTACTCACTC CCCCTATCAG GTGTCAGCTC CCAGTGACCC
TCTGGACATC GTGATCATAG GTCTATATGA GAAACCTTCT CTCTCAGCCC
AGCCGGGCCC CACGGTTCTG GCAGGAGAGA ATGTGACCTT GTCCTGCAGC
TCCCGGAGCT CCTATGACAT GTACCATCTA TCCAGGGAAG GGGAGGCCCA
TGAACGTAGG CTCCCTGCAG GGCCCAAGGT CAACGGAACA TTCCAGGCTG
ACTTTCCTCT GGGCCCTGCC ACCCACGGAG GGACCTACAG ATGCTTCGGC
TCTTTCCATG ACTCTCCATA CGAGTGGTCA AAGTCAAGTG ACCCACTGCT
TGTTTCTGTC ACAGGAAACC CTTCAAATAG TTGGCCTTCA CCCACTGAAC
CAAGCTCCAA AACCGGTAAC CCCCGACACC TGCACATTCT GATTGGGACC
TCAGTGGTCA TCATCCTCTT CATCCTCCTC TTCTTTCTCC TTCATCGCTG
GTGCTCCAAC AAAAAAAATG CTGCGGTAAT GGACCAAGAG TCTGCAGGAA
ACAGAACAGC GAATAGCGAG GACTCTGATG AACAAGACCC TCAGGAGGTG
ACATACACAC AGTTGAATCA CTGCGTTTTC ACACAGAGAA AAATCACTCG
CCCTTCTCAG AGGCCCAAGA CACCCCCAAC AGATATCATC GTGTACACGG
AACTTCCAAA TGCTGAGTCC AGATCCAAAG TTGTCTCCTG CCCATGA*
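
The PIR layout described above (type specifier, semicolon, name, description line, blocked sequence, terminating asterisk) can be read along the following lines. This is a minimal sketch under stated assumptions: the function name parse_pir and the dictionary keys are hypothetical, the sample entry is a made-up, shortened record, and each entry is assumed to have exactly one description line.

```python
def parse_pir(text):
    """Return a list of dicts with keys: type, name, description, sequence."""
    entries = []
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    i = 0
    while i < len(lines):
        header = lines[i]
        # Header: '>', two-character type specifier, ';', then the name.
        if not header.startswith(">") or len(header) < 4 or header[3] != ";":
            raise ValueError(f"not a PIR header line: {header!r}")
        entry = {
            "type": header[1:3],
            "name": header[4:],
            "description": lines[i + 1] if i + 1 < len(lines) else "",
        }
        i += 2
        chunks = []
        while i < len(lines):
            # Remove the spaces that separate the blocks of 10 nucleotides.
            chunk = "".join(lines[i].split())
            i += 1
            if chunk.endswith("*"):
                chunks.append(chunk[:-1])  # drop the terminating asterisk
                break
            chunks.append(chunk)
        entry["sequence"] = "".join(chunks)
        entries.append(entry)
    return entries


# Made-up, shortened entry for illustration only.
sample = """>DL;Allele*0101
Shortened example entry, 20 bases.
ATGTCGCTCT TGGTCGTCAG*
"""
for entry in parse_pir(sample):
    print(entry["type"], entry["name"], len(entry["sequence"]))
```

The asterisk in an allele name such as Allele*0101 is not mistaken for the terminator, because the header and description lines are consumed before the sequence lines are scanned.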

All PIR files have been generated using "ReadSeq", a freely available sequence format conversion program written by D. Gilbert.