IPI - International Protein Index - UniProt Format
IPI is released in a pseudo-UniProt format to supplement the
original FASTA format file. The UniProt format file contains extra
cross reference information linking IPI to CleanEx, EPD, HGNC, GO,
InterPro, Entrez Gene, MGI, ReAlSplice, RGD, RZPD, S/MARt DB,
Transfac, UniParc, UTRdb, and ZFIN, and identifies the chromosome on which the
gene encoding each IPI entry is found. To avoid potential
contradictions, additional cross references are taken only from the
master entry behind each IPI entry.
A sample entry is shown below:
ID IPI00003881.5 IPI; PRT; 415 AA.
AC IPI00003881;
DT 01-OCT-2001 (IPI Human rel. 2.00, Created)
DT 06-OCT-2005 (IPI Human rel. 3.11, Last sequence update)
DE SIMILAR TO HETEROGENEOUS NUCLEAR RIBONUCLEOPROTEIN H.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
CC -!- GENE_LOCATION: Chr. 10:43201071-43224620:-1.
DR UniProtKB/Swiss-Prot; P52597; HNRPF_HUMAN; -.
DR Vega; OTTHUMP00000019482; OTTHUMG00000018029; M.
DR Vega; OTTHUMP00000043413; OTTHUMG00000018029; -.
DR Vega; OTTHUMP00000043414; OTTHUMG00000018029; -.
DR ENSEMBL_HAVANA; ENSP00000348345; ENSG00000169813; -.
DR ENSEMBL_HAVANA; ENSP00000349573; ENSG00000169813; -.
DR ENSEMBL_HAVANA; ENSP00000363572; ENSG00000169813; -.
DR REFSEQ_REVIEWED; NP_004957; GI:4826760; -.
DR UniProtKB/TrEMBL; Q5T0N2; Q5T0N2_HUMAN; -.
DR UniProtKB/TrEMBL; Q8NI96; Q8NI96_HUMAN; -.
DR UniProtKB/TrEMBL; Q96AU2; Q96AU2_HUMAN; -.
DR ENSEMBL; ENSP00000338477; ENSG00000169813; -.
DR ENSEMBL; ENSP00000348345; ENSG00000169813; -.
DR H-InvDB; HIT000003838; HIX0008779; -.
DR H-InvDB; HIT000030409; HIX0008779; -.
DR H-InvDB; HIT000031821; HIX0008779; -.
DR H-InvDB; HIT000037199; HIX0008779; -.
DR H-InvDB; HIT000037659; HIX0008779; -.
DR UniParc; UPI0000000C5C; -; -.
DR HGNC; 5039; HNRPF; -.
DR Entrez Gene; 3185; HNRPF; -.
DR UniGene; Hs.808; -; -.
DR CCDS; CCDS7204.1; -; -.
DR ReAlSplice protein; SL0000062; hnRNPF; factor involved in alternative splicing.
DR trome; HTR002991; -; -.
DR RZPD; Hs.808; -; Clones and other research material.
DR CleanEx; HS_HNRPF; -; -.
DR InterPro; IPR012677; a_b_plait_nuc_bd.
DR InterPro; IPR000504; RNP1_RNA_bd.
DR InterPro; IPR012996; Znf_CHHC.
DR Pfam; PF00076; RRM_1; 3.
DR Pfam; PF08080; zf-RNPHF; 1.
DR SMART; SM00360; RRM; 3.
DR PROSITE; PS50102; RRM; 2.
DR GENE3D; G3D.3.30.70.330; Nucl_bd_a/b_plat; 3.
SQ SEQUENCE 415 AA; 45672 MW; D14E170631FB1F31 CRC64;
MMLGPEGGEG FVVKLRGLPW SCSVEDVQNF LSDCTIHDGA AGVHFIYTRE GRQSGEAFVE
LGSEDDVKMA LKKDRESMGH RYIEVFKSHR TEMDWVLKHS GPNSADSAND GFVRLRGLPF
GCTKEEIVQF FSGLEIVPNG ITLPVDPEGK ITGEAFVQFA SQELAEKALG KHKERIGHRY
IEVFKSSQEE VRSYSDPPLK FMSVQRPGPY DRPGTARRYI GIVKQAGLER MRPGAYSTGY
GGYEEYSGLS DGYGFTTDLF GRDLSYCLSG MYDHRYGDSE FTVQSTTGHC VHMRGLPYKA
TENDIYNFFS PLNPVRVHIE IGPDGRVTGE ADVEFATHEE AVAAMSKDRA NMQHRYIELF
LNSTTGASNG AYSSQVMQGM GVSAAQATYS GLESQSVSGC YGAGYSGQNS MGGYD
//
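The entry above is line-oriented: every line except the sequence starts with a two-letter line code, and a line containing only `//` ends the entry. A minimal Python sketch of splitting a file into entries on that basis might look like this. The names `FlatEntry` and `parse_entries` are hypothetical, and the sketch assumes, as in the sample, that every line between the SQ line and `//` is a sequence line.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlatEntry:
    # Line bodies grouped by their two-letter line code (ID, AC, DR, ...).
    lines: Dict[str, List[str]] = field(default_factory=dict)
    # Residues from the lines following SQ, with the block-separating spaces removed.
    sequence: str = ""


def parse_entries(text: str) -> List[FlatEntry]:
    """Split IPI UniProt-format text into entries (hypothetical helper)."""
    entries: List[FlatEntry] = []
    current = FlatEntry()
    in_sequence = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":
            # End of entry: store it and start a fresh one.
            entries.append(current)
            current = FlatEntry()
            in_sequence = False
        elif in_sequence:
            # Sequence lines carry no line code.
            current.sequence += "".join(line.split())
        else:
            code, _, body = line.partition(" ")
            current.lines.setdefault(code, []).append(body.strip())
            if code == "SQ":
                in_sequence = True
    return entries
```

For the sample entry, `parse_entries(text)[0].lines["AC"]` would be `["IPI00003881;"]`.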
An explanation of the line types is as follows:
ID lines
IPI identifier with version number; Data Class = 'IPI'.
AC lines
Current IPI accession number, followed by secondary identifiers.
DE lines
Description line, taken from the master sequence for this IPI entry.
OS, OC and OX lines
Taxonomic classification (organism name, lineage and NCBI taxonomy identifier).
CC lines
In IPI, the comment line is used to give the genomic location of the
gene(s) to which an IPI entry has been mapped. The location information
is based on the latest Ensembl assembly build.
- "-!- GENE_LOCATION: ", to be followed by a description
of a genomic location on which this gene is believed to be located.
The description of a genomic location contains the chromosome, start
coordinate, end coordinate and strand, with the following structure:
Chr. <Chromosome Location>:<start coordinate>-<end coordinate>:<strand>
for example:
Chr. 10:43201071-43224620:-1
- "Chromosome Location": the name of the chromosome
on which this gene is believed to be located.
- "start coordinate": the lowest genomic coordinate
(based on the latest assembly build used by Ensembl) among all
transcripts expressed by the Ensembl gene mapped to this IPI entry.
- "end coordinate": the highest genomic coordinate
(based on the latest assembly build used by Ensembl) among all
transcripts expressed by the Ensembl gene mapped to this IPI entry.
- "strand": the strand from which these transcripts
are expressed (1 for FORWARD and -1 for REVERSE).
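The GENE_LOCATION layout described above can be read with a single regular expression. The following is a hedged sketch rather than a reference parser: `GeneLocation` and `parse_gene_locations` are hypothetical names, and the pattern assumes only the `Chr. <chromosome>:<start>-<end>:<strand>` structure shown here. Since an entry may carry more than one GENE_LOCATION comment, it returns a list.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Matches e.g. "CC -!- GENE_LOCATION: Chr. 10:43201071-43224620:-1."
# The chromosome is matched as any non-space text, since it need not be numeric.
_GENE_LOCATION_RE = re.compile(
    r"GENE_LOCATION:\s*Chr\.\s*(\S+?):(\d+)-(\d+):(-?1)\.?$"
)


@dataclass(frozen=True)
class GeneLocation:
    chromosome: str
    start: int   # lowest coordinate over the gene's transcripts
    end: int     # highest coordinate over the gene's transcripts
    strand: int  # 1 = forward, -1 = reverse


def parse_gene_locations(cc_lines: Iterable[str]) -> List[GeneLocation]:
    """Collect every GENE_LOCATION comment; other CC comments are skipped."""
    locations = []
    for line in cc_lines:
        match = _GENE_LOCATION_RE.search(line.strip())
        if match:
            chrom, start, end, strand = match.groups()
            locations.append(GeneLocation(chrom, int(start), int(end), int(strand)))
    return locations
```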
DR lines
Cross references in IPI can point to any of the constituent databases. The master entry of each IPI
entry (the entry that supplies the IPI entry with its sequence and
description line) is indicated by an 'M' in the
fourth field of its cross reference. Additional cross references are
added to a number of other databases (usually by inference from the
master entry).
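Finding the master entry therefore amounts to splitting each DR line on semicolons and checking the fourth field. A minimal sketch, assuming the field layout seen in the sample entry; `CrossRef`, `parse_dr` and `find_master` are hypothetical names. Lines with only three fields (such as the InterPro lines in the sample) are padded with '-'.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CrossRef:
    database: str
    primary_id: str
    secondary_id: str
    fourth: str  # 'M' marks the master entry; other lines hold '-' or, e.g. for Pfam, a count


def parse_dr(line: str) -> CrossRef:
    """Split one DR line into its semicolon-separated fields (hypothetical helper)."""
    body = line[2:] if line.startswith("DR ") else line
    fields = [f.strip() for f in body.strip().rstrip(".").split(";")]
    # Pad short lines (three fields) so every CrossRef has four.
    fields += ["-"] * (4 - len(fields))
    # Keep any free text beyond the fourth field together.
    return CrossRef(fields[0], fields[1], fields[2], "; ".join(fields[3:]))


def find_master(refs: Iterable[CrossRef]) -> Optional[CrossRef]:
    """Return the cross reference flagged 'M' (the master entry), if present."""
    return next((ref for ref in refs if ref.fourth == "M"), None)
```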
In more detail, the individual cross reference types are:
SQ lines
Display the sequence of an IPI record, taken from its master entry.
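The SQ header line in the sample also records the sequence length, molecular weight and a CRC64 checksum. The sketch below checks a reassembled sequence against the length and checksum. It assumes IPI follows the SWISS-PROT/UniProt CRC64 convention (the ISO 3309 polynomial processed bit-reflected, initial value zero, no final XOR); `crc64` and `check_sq` are hypothetical helpers, and the molecular weight is not checked.

```python
import re
from typing import Iterable, List, Tuple

# SQ header layout assumed from the sample entry:
#   SQ SEQUENCE 415 AA; 45672 MW; D14E170631FB1F31 CRC64;
_SQ_RE = re.compile(
    r"^SQ\s+SEQUENCE\s+(\d+)\s+AA;\s+(\d+)\s+MW;\s+([0-9A-F]{16})\s+CRC64;$"
)


def _make_crc64_table() -> List[int]:
    # Table for the bit-reflected ISO 3309 polynomial (0xD800000000000000).
    table = []
    for i in range(256):
        value = i
        for _ in range(8):
            if value & 1:
                value = (value >> 1) ^ 0xD800000000000000
            else:
                value >>= 1
        table.append(value)
    return table


_CRC64_TABLE = _make_crc64_table()


def crc64(sequence: str) -> str:
    """CRC64 of a sequence as 16 upper-case hex digits (assumed UniProt convention)."""
    crc = 0  # initial value 0; no final XOR is applied
    for byte in sequence.encode("ascii"):
        crc = _CRC64_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"


def check_sq(header: str, sequence_lines: Iterable[str]) -> Tuple[bool, bool]:
    """Return (length matches, checksum matches) for an SQ header and its sequence lines."""
    match = _SQ_RE.match(header.strip())
    if match is None:
        raise ValueError(f"not an SQ header: {header!r}")
    length, _mw, checksum = match.groups()
    sequence = "".join("".join(line.split()) for line in sequence_lines)
    return len(sequence) == int(length), crc64(sequence) == checksum
```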
Format change notices
From 7 August 2007 onwards, which corresponds
to the human 3.32, mouse 3.32, rat 3.32, zebrafish 3.31, arabidopsis 3.30,
chicken 3.26 and cow 3.18 releases,
CC (comment) lines have been changed to allow more
than one location to be given in an entry (in cases where a single
protein maps to more than one gene).
e.g.
Before:
CC -!- CHROMOSOME: 6.
CC -!- START CO-ORDINATE: 31853274.
CC -!- END CO-ORDINATE: 31871565.
CC -!- STRAND: -1.
Now:
CC -!- GENE_LOCATION: Chr. 6:31892491-31896096:-1.
CC -!- GENE_LOCATION: Chr. 6:31853274-31871565:-1.
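Because the old layout held at most one location per entry, old-style comments can be converted to the new form mechanically. A hedged sketch; `convert_old_location` is a hypothetical helper that rebuilds one GENE_LOCATION comment from the four old CC lines and raises an error if any of them is missing.

```python
import re
from typing import Iterable

# Old-style comment keys (see "Before" above) mapped to neutral field names.
_OLD_KEYS = {
    "CHROMOSOME": "chromosome",
    "START CO-ORDINATE": "start",
    "END CO-ORDINATE": "end",
    "STRAND": "strand",
}
_OLD_RE = re.compile(
    r"-!-\s*(CHROMOSOME|START CO-ORDINATE|END CO-ORDINATE|STRAND):\s*(\S+?)\.?$"
)


def convert_old_location(cc_lines: Iterable[str]) -> str:
    """Rebuild a GENE_LOCATION comment from the pre-2007 CC lines (hypothetical helper)."""
    values = {}
    for line in cc_lines:
        match = _OLD_RE.search(line.strip())
        if match:
            values[_OLD_KEYS[match.group(1)]] = match.group(2)
    missing = set(_OLD_KEYS.values()) - set(values)
    if missing:
        raise ValueError(f"missing old-style fields: {sorted(missing)}")
    return (
        "CC -!- GENE_LOCATION: Chr. "
        f"{values['chromosome']}:{values['start']}-{values['end']}:{values['strand']}."
    )
```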
From 22 February 2006 onwards, which
corresponds to the human/mouse/rat 3.15, zebrafish 3.14, arabidopsis
3.13, chicken 3.09 and cow 3.01 releases, the REFSEQ_NP and REFSEQ_XP
database codes have been replaced by REFSEQ_STATUS, where 'STATUS'
represents the revision status of the RefSeq entry (or UNKNOWN_STATUS if no
status is available).
e.g.
Previously:
DR REFSEQ_NP; NP_061183; GI:52351208; -.
DR REFSEQ_NP; NP_001324; GI:23110960; -.
DR REFSEQ_NP; NP_005617; GI:21361282; -.
DR REFSEQ_NP; NP_001034191; GI:84993245; -.
DR REFSEQ_XP; XP_114618; GI:41148435; -.
DR REFSEQ_NP; NP_001025036; GI:71274178; -.
Revised format:
DR REFSEQ_VALIDATED; NP_061183; GI:52351208; -.
DR REFSEQ_REVIEWED; NP_001324; GI:23110960; -.
DR REFSEQ_PROVISIONAL; NP_005617; GI:21361282; -.
DR REFSEQ_PREDICTED; NP_001034191; GI:84993245; -.
DR REFSEQ_MODEL; XP_114618; GI:41148435; -.
DR REFSEQ_INFERRED; NP_001025036; GI:71274178; -.
DR REFSEQ_UNKNOWN_STATUS; AP_000639; GI:58615663; -.
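With the revised format, the RefSeq status can be read straight from the database code. A minimal sketch written against the examples above; `refseq_status` is a hypothetical helper that returns `None` for the legacy REFSEQ_NP and REFSEQ_XP codes, which carried no status.

```python
from typing import Optional

# Database codes used before February 2006, which carried no revision status.
_LEGACY_CODES = {"REFSEQ_NP", "REFSEQ_XP"}


def refseq_status(dr_line: str) -> Optional[str]:
    """Return the RefSeq status from a DR line, e.g. 'REVIEWED' (hypothetical helper).

    Returns None for the legacy REFSEQ_NP / REFSEQ_XP codes.
    """
    body = dr_line[2:] if dr_line.startswith("DR ") else dr_line
    code = body.split(";", 1)[0].strip()
    if not code.startswith("REFSEQ_"):
        raise ValueError(f"not a RefSeq cross reference: {code!r}")
    if code in _LEGACY_CODES:
        return None
    return code[len("REFSEQ_"):]
```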
From April 2004 onwards, which corresponds
to the human 2.31, mouse 1.24 and rat 1.14 releases, secondary IPI accession numbers have
been added (after the current accession number) to the AC lines of the UniProt format files, and the entry
version number has been moved to the ID line.
e.g.
Before:
ID IPI00013881 IPI; PRT; 449 AA.
AC IPI00013881.4;
Now:
ID IPI00013881.4 IPI; PRT; 449 AA.
AC IPI00013881; IPI00155062; IPI00334833;
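A parser that has to read files from both sides of this change can look for the version number on either line. A minimal sketch; `IpiAccession` and `parse_id_ac` are hypothetical names, and the patterns assume only the ID and AC layouts shown in the two examples above.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# ID line with or without a version: "ID IPI00013881 IPI; ..." or "ID IPI00013881.4 IPI; ..."
_ID_RE = re.compile(r"^ID\s+(IPI\d+)(?:\.(\d+))?\s+IPI;")
# Any IPI accession on an AC line, optionally with a version suffix.
_AC_RE = re.compile(r"IPI\d+(?:\.\d+)?")


@dataclass
class IpiAccession:
    accession: str
    version: int
    secondary: List[str] = field(default_factory=list)


def parse_id_ac(id_line: str, ac_lines: Iterable[str]) -> IpiAccession:
    """Read accession, version and secondary accessions from either layout (hypothetical helper)."""
    id_match = _ID_RE.match(id_line.strip())
    if id_match is None:
        raise ValueError(f"not an IPI ID line: {id_line!r}")
    accession, id_version = id_match.groups()
    tokens = [tok for line in ac_lines for tok in _AC_RE.findall(line)]
    if not tokens:
        raise ValueError("no accession found on the AC lines")
    # Before April 2004 the version sat on the AC line; from then on, on the ID line.
    primary, _, ac_version = tokens[0].partition(".")
    if primary != accession:
        raise ValueError(f"ID/AC mismatch: {accession} vs {primary}")
    version = id_version or ac_version
    if not version:
        raise ValueError("no version number found")
    return IpiAccession(accession, int(version), tokens[1:])
```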