
IPI - International Protein Index - IPI History File Format



We make every effort to maintain stable IPI identifiers and to propagate them between releases. However, IPI is built from multiple data sources, many of which are themselves unstable, and this instability is partly reflected in IPI. IPI history files (e.g. ipi.HUMAN.history.gz) record the creation and deletion of IPI IDs. They also give successor IDs for entries that have become secondary, and the reasons why deleted IDs became invalid. History files for the current release can be downloaded from the IPI FTP site. Each line in a history file describes one IPI ID, and lines are ordered with the most recently created IDs first. The file is tab-delimited and consists of the following fields:

  1. IPI ID
  2. Release version in which the ID was created
  3. Release version in which the ID was deleted, if available, or '-' if not
  4. Successor ID, if available, or '-' if not
  5. Comments, if available, or '-' if not. These comments can be of the following types:
    • Propagated means that the deleted ID has been propagated to another IPI entry (given in field 4) as a secondary accession number. For more details, see the IPI documentation on identifier propagation.
    • Master (P) defunct means that the master source database entry (identified by its accession number P) of the IPI entry with the deleted ID was deleted from the source database, and that this IPI ID could not be propagated to any successor entry in the next IPI release. This happens frequently when the gene prediction algorithms that source databases use to predict protein sequences are significantly revised.
    • Master (P) now invalid means that the master P of the former IPI cluster still exists in the source database but is no longer used in the construction of IPI. This usually happens as a consequence of an annotation update (e.g. if a UniProt curator realizes that an entry was wrongly assigned to human and changes its species to a virus).
    • Source entry (P) defunct applies when the master of an IPI entry came from a supplementary database and previously mapped to an entry from a non-supplementary database as well, but no longer maps to such an entry in the latest release. Entries from some source databases (considered 'supplementary') are considered for inclusion in IPI only if they match an entry from another source database or map to a known gene. Such entries can be chosen as the masters of IPI entries. However, if the supplementary entry no longer meets these inclusion criteria in a subsequent IPI release, the IPI entry is deleted, even if the supplementary entry itself continues to exist. More details about supplementary data sets are available in the IPI documentation.
    • Mapping to known gene now invalid was used when an IPI entry had been created, although it was linked only to entries from supplementary databases, because it could also be mapped to a known gene. However, the link to the known gene was not confirmed in a subsequent release, leading to the deletion of the IPI entry. Following changes in the way supplementary data sets are handled (see the IPI documentation on supplementary data sets), this comment does not apply to IPI entries removed in releases from May 2006 onwards. The following, more appropriate comment is used instead:
    • Unsupported hypothetical protein is used when an IPI entry had been created, although it was linked only to entries from supplementary databases, because the master entry had support for its validity (see the IPI documentation on supplementary data sets). However, this support was not confirmed in a subsequent release, leading to the deletion of the IPI entry.
    • Source entry (P) now invalid applies when an entry has been dropped from IPI because the entries from non-supplementary source databases previously mapped to it are no longer used in the construction of IPI (as with Master (P) now invalid).
    • Identified as putative MHC allele means that an IPI entry from a previous release has now been identified as an MHC allele and is excluded from the final IPI data sets.
    • Master (P) rejected as short fragmentary sequence means that an IPI entry from a previous release has now been identified as a fragmentary sequence shorter than 100 amino acids and is therefore excluded from the final IPI data sets.
    • Master (P) lost #, under investigation or Under investigation means that these cases have not yet been resolved and are being investigated. We hope to provide more information about the fate of these IPI entries in subsequent releases.
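The field layout above is simple enough to read with a few lines of code. What follows is a minimal Python sketch, not an official IPI tool. It assumes each line holds exactly the five tab-separated fields listed above, with '-' marking an empty field, and it treats release versions as opaque strings. The names HistoryRecord, parse_history, load_history, resolve and classify_comment are hypothetical. resolve assumes that successor IDs may chain across releases and follows them until it reaches an ID with no deletion release. The comment patterns assume each comment appears literally as named above, with the source accession in place of P. The exact wording in real files may differ, so treat the patterns as a starting point.

```python
import gzip
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class HistoryRecord:
    """One line of an IPI history file (five tab-separated fields)."""
    ipi_id: str
    created: str              # release in which the ID was created (opaque string)
    deleted: Optional[str]    # None when the field is '-'
    successor: Optional[str]  # None when the field is '-'
    comment: Optional[str]    # None when the field is '-'


def _opt(value: str) -> Optional[str]:
    # Assumption: '-' marks an empty field.
    return None if value == "-" else value


def parse_history(lines: Iterable[str]) -> Iterator[HistoryRecord]:
    """Yield a HistoryRecord for each non-blank line; raise on malformed lines."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t", 4)  # maxsplit=4: any extra tab stays in the comment
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        ipi_id, created, deleted, successor, comment = fields
        yield HistoryRecord(ipi_id, created, _opt(deleted), _opt(successor), _opt(comment))


def load_history(path: str) -> Dict[str, HistoryRecord]:
    """Index a gzipped history file (e.g. ipi.HUMAN.history.gz) by IPI ID."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return {rec.ipi_id: rec for rec in parse_history(handle)}


def resolve(ipi_id: str, history: Dict[str, HistoryRecord]) -> Optional[str]:
    """Follow successor IDs until reaching an ID that has not been deleted.

    Returns None if the chain ends in a deletion with no successor,
    hits an ID missing from the file, or loops back on itself.
    """
    seen = set()
    current = ipi_id
    while current not in seen:
        seen.add(current)
        rec = history.get(current)
        if rec is None:
            return None
        if rec.deleted is None:
            return current
        if rec.successor is None:
            return None
        current = rec.successor
    return None


# Hypothetical literal forms of the comment types; the group "acc" captures P.
_ACC = r"\((?P<acc>[^)]*)\)"
_COMMENT_PATTERNS = [(kind, re.compile(pattern)) for kind, pattern in [
    ("propagated", "Propagated"),
    ("master_defunct", "Master " + _ACC + " defunct"),
    ("master_now_invalid", "Master " + _ACC + " now invalid"),
    ("source_defunct", "Source entry " + _ACC + " defunct"),
    ("source_now_invalid", "Source entry " + _ACC + " now invalid"),
    ("known_gene_invalid", "Mapping to known gene now invalid"),
    ("unsupported_hypothetical", "Unsupported hypothetical protein"),
    ("putative_mhc_allele", "Identified as putative MHC allele"),
    # Accept both "fragmentary" and "fragmentory" spellings.
    ("short_fragment", "Master " + _ACC + " rejected as short fragment[ao]ry sequence"),
    ("under_investigation", "(?:Master " + _ACC + " lost .*, )?[Uu]nder investigation"),
]]


def classify_comment(comment: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (kind, accession); kind is None for '-' or unrecognised comments."""
    if comment is None:
        return None, None
    for kind, pattern in _COMMENT_PATTERNS:
        match = pattern.match(comment)
        if match:
            return kind, match.groupdict().get("acc")
    return None, None
```

For example, resolve(old_id, load_history("ipi.HUMAN.history.gz")) would return the ID that an older identifier was ultimately propagated to, or None if it was deleted without a successor. Unrecognised comments return (None, None) rather than raising, so an unexpected comment type does not stop the file from being processed.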
