
IPI - International Protein Index - Gene Cross-References File Format



IPI gene cross-reference (ipi.genes.*.xrefs) files are convenient tab-delimited files that map protein entries from the IPI protein source databases to gene entries, and their chromosomal locations, from Ensembl. The data is produced as part of the process used to make IPI. It draws on mappings between the IPI protein source databases and Ensembl, Entrez Gene, Vega (when available) and species-specific resources such as Genew (the nomenclature database of the HGNC), MGI, RGD, ZFIN, TAIR (proteins and genes for A. thaliana) and UniGene. The files are available for all species covered by IPI and can be downloaded for the current release from the IPI FTP site.

Each file represents all the chromosomes of a single genome.

The columns are:

  • Column 1 - Chromosome. Values: 1-n (where n = 22 for human, 19 for mouse, 20 for rat, etc.), W, X, Y, Z, "M" for mitochondrion, "C" for chloroplast, "Un" for unknown.
  • Column 2 - Cosmid. For some Ensembl genes, the location on the complete chromosome is not known because the underlying sequence assembly is incomplete. In these cases the co-ordinates provided are merely local co-ordinates within a single cosmid, and the cosmid name is given in this column.
  • Column 3 - Start co-ordinate of gene (on Ensembl assembly). Given in base pairs, as global chromosomal co-ordinates except in the cosmid cases described for column 2. Data taken from Ensembl.
  • Column 4 - End co-ordinate of gene (on Ensembl assembly). Given in base pairs, as global chromosomal co-ordinates except in the cosmid cases described for column 2. Data taken from Ensembl.
  • Column 5 - Strand of gene (on Ensembl assembly) corresponding to these co-ordinates: 1 for FORWARD or SENSE, -1 for REVERSE or ANTISENSE.
  • Column 6 - Gene location. Taken from the species-specific gene resource (e.g. HUGO) if available, otherwise from Entrez Gene.
  • Column 7 - Ensembl gene ID.
  • Column 8 - Gene ID, gene symbol. Taken from HUGO, MGD, RGD, ZFIN or TAIR Gene, as appropriate for the species.
  • Column 9 - Gene ID, gene symbol. Taken from Entrez Gene.
  • Column 10 - IPI ACs. All IPI accession numbers (ACs) associated with this gene.
  • Column 11 - UniProtKB/Swiss-Prot ACs. All UniProtKB/Swiss-Prot ACs associated with this gene.
  • Column 12 - UniProtKB/TrEMBL ACs. All UniProtKB/TrEMBL ACs associated with this gene.
  • Column 13 - Ensembl peptide IDs. All Ensembl peptide IDs associated with this gene. IDs of Havana-curated transcripts are preceded by the key HAVANA: (e.g. ENSP00000341800;HAVANA:ENSP00000317992;).
  • Column 14 - RefSeq STATUS:ID couples. All RefSeq STATUS:ID couples associated with this gene, where STATUS gives the revision status of the RefSeq entry.
  • Column 15 - TAIR protein IDs. All TAIR protein IDs associated with this gene.
  • Column 16 - H-InvDB protein IDs. All H-InvDB protein IDs associated with this gene.
  • Column 17 - UniGene gene IDs. All UniGene gene IDs associated with this gene.
  • Column 18 - CCDS IDs. All CCDS IDs associated with this gene.
  • Column 19 - RefSeq GI protein IDs. All GI protein IDs associated with this gene.
  • Column 20 - Vega gene ID.
  • Column 21 - Vega peptide IDs. All Vega peptide IDs associated with this gene.
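Because every field is identified only by its position, a reader has to map positions to meanings itself. The minimal Python sketch below does this for one line of the 21-column layout. The field names in `COLUMNS` and the helper names are hypothetical labels chosen for illustration, not part of the IPI format. Padding short lines assumes that trailing empty columns may be missing.

```python
# Hypothetical labels for the 21 columns listed above (not defined by IPI).
COLUMNS = [
    "chromosome", "cosmid", "start", "end", "strand", "gene_location",
    "ensembl_gene_id", "species_gene", "entrez_gene", "ipi_acs",
    "swissprot_acs", "trembl_acs", "ensembl_peptide_ids", "refseq_status_ids",
    "tair_protein_ids", "hinvdb_protein_ids", "unigene_ids", "ccds_ids",
    "refseq_gi_ids", "vega_gene_id", "vega_peptide_ids",
]


def parse_record(line: str) -> dict:
    """Split one tab-delimited line into a dict keyed by COLUMNS."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) > len(COLUMNS):
        raise ValueError(f"expected at most {len(COLUMNS)} columns, got {len(fields)}")
    # Assumption: pad lines whose trailing empty columns were omitted.
    fields += [""] * (len(COLUMNS) - len(fields))
    record = dict(zip(COLUMNS, fields))
    # Start, end and strand are integers; keep None when a field is empty.
    for key in ("start", "end", "strand"):
        record[key] = int(record[key]) if record[key] else None
    return record


def split_gene_couple(value: str) -> tuple:
    """Split a 'gene id,gene symbol' couple (columns 8 and 9)."""
    gene_id, _, symbol = value.partition(",")
    return gene_id, symbol
```

Keeping the list-valued columns as raw strings at this stage leaves their splitting to a separate step.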

If a column contains a list of values, the values are separated by semicolons (';'). As the examples below show, the last value in a list is also followed by a semicolon.
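The following minimal Python sketch splits such list columns. It assumes the trailing semicolon seen in the examples (e.g. IPI00000012;IPI00464958;), so empty items are dropped. The HAVANA: handling follows the column 13 description. Both function names are hypothetical.

```python
def split_list(value: str) -> list:
    """Split a semicolon-separated column, ignoring empty items such as
    the one left after a trailing ';'."""
    return [item for item in value.split(";") if item]


HAVANA_PREFIX = "HAVANA:"


def split_ensembl_peptides(value: str) -> list:
    """Return (peptide_id, is_havana) pairs for the Ensembl peptide column.

    Havana-curated entries carry the key 'HAVANA:' before the ID.
    """
    pairs = []
    for item in split_list(value):
        if item.startswith(HAVANA_PREFIX):
            pairs.append((item[len(HAVANA_PREFIX):], True))
        else:
            pairs.append((item, False))
    return pairs
```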

Revised format from 4 April 2006



From 4 April 2006 onwards (corresponding to the human/mouse/rat 3.16, zebrafish 3.15, arabidopsis 3.14, chicken 3.10 and cow 3.02 releases), the Ensembl end co-ordinate and genomic strand are inserted as columns 4 and 5, after the start co-ordinate (column 3). Subsequent columns shift to the right.
Specifically:

  • Column 4 - End co-ordinate of gene (on Ensembl assembly) given in base pairs, global (these are chromosomal co-ordinates). Data taken from Ensembl.
  • Column 5 - Strand of gene (on Ensembl assembly) corresponding to these co-ordinates. 1 for FORWARD or SENSE and -1 for REVERSE or ANTISENSE.
  • Subsequent columns shifted to the right.
Previously:
                            1 216475253 1q41 ENSG00000196660 25355,SLC30A10 55532,SLC30A10 IPI00000012;IPI00464958;
                            Q49AL9;Q6XR72;Q9NPW0; ENSP00000349018; VALIDATED:NP_001004433;VALIDATED:NP_061183;
                            HIT000015673;HIT000022321;HIT000026337; Hs.519812; GI:52351208;GI:52351218;
                            OTTHUMG00000037434 OTTHUMP00000035563;
Revised format:
                            1 216475253 216489910	-1 1q41 ENSG00000196660
                            25355,SLC30A10 55532,SLC30A10 IPI00000012;IPI00464958;
                            Q49AL9;Q6XR72;Q9NPW0; ENSP00000349018; VALIDATED:NP_001004433;VALIDATED:NP_061183;
                            HIT000015673;HIT000022321;HIT000026337; Hs.519812; GI:52351208;GI:52351218;
                            OTTHUMG00000037434 OTTHUMP00000035563;
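When files from before this change are read together with newer ones, one option is to pad the old rows so that their later columns line up with the revised layout. The following is a minimal sketch under that assumption. Old rows contain no end co-ordinate or strand, so the two inserted fields are left empty. The function name is hypothetical.

```python
def upgrade_pre_april_2006(fields: list) -> list:
    """Align a row split from a pre-April-2006 file with the revised layout.

    Columns 1-3 (chromosome, cosmid, start) keep their positions. Empty
    placeholders are inserted for the new column 4 (end) and column 5
    (strand), which shifts every later column two places to the right.
    """
    return fields[:3] + ["", ""] + fields[3:]
```

After padding, the gene location that was in column 4 appears in column 6 (index 5), as in the revised example above.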



Revised format from 22 February 2006



From 22 February 2006 onwards (corresponding to the human/mouse/rat 3.15, zebrafish 3.14, arabidopsis 3.13, chicken 3.09 and cow 3.01 releases), Ensembl peptide IDs are moved from column 13 to column 11, and a single column is used for all RefSeq entries (subsequent columns shift to the left).
Entry status information is given for each referenced RefSeq entry.
Specifically:

  • Column 11 - All Ensembl peptide IDs associated with this gene (previously in column 13).
  • Column 12 - List of RefSeq STATUS:ID couples (separated by semicolons, ';') associated with this gene, where STATUS gives the revision status of the RefSeq entry.
  • Column 13 removed; subsequent columns shifted to the left.
Previously:
                            7 151579364 7q36.1 155100,LOC155100 IPI00045628;
                            NP_001025037;	XP_379977;	ENSP00000021776;[...]
Revised format:
                            7 151579364 7q36.1 155100,LOC155100 IPI00045628;
                            ENSP00000021776;	INFERRED:NP_001025037;MODEL:XP_379977;[...]
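The combined RefSeq column can be split back into its status and ID parts. A minimal Python sketch follows. `parse_refseq_couples` is a hypothetical helper, and status values such as INFERRED or MODEL are returned as found rather than checked against a fixed vocabulary.

```python
def parse_refseq_couples(value: str) -> list:
    """Split e.g. 'INFERRED:NP_001025037;MODEL:XP_379977;' into
    [('INFERRED', 'NP_001025037'), ('MODEL', 'XP_379977')]."""
    couples = []
    for item in value.split(";"):
        if not item:
            continue  # skip the empty string left by the trailing ';'
        status, sep, refseq_id = item.partition(":")
        if not sep:
            raise ValueError(f"not a STATUS:ID couple: {item!r}")
        couples.append((status, refseq_id))
    return couples
```

Splitting on the first colon only keeps the status separate even if an ID were ever to contain a colon itself.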