
IPI -  International Protein Index - Old News


December 2005
TAIR6 release available in IPI

TAIR has released a new version (TAIR6) of the Arabidopsis thaliana genome annotation. TAIR IDs, which were previously in mixed case, are now consistently in upper case in this release. IPI reflects these changes.

September 2005
Integration of Vega in IPI

The Vertebrate Genome Annotation (VEGA) database has been added to IPI. Vega is a central repository for high quality, frequently updated, manual annotation of vertebrate finished genome sequence.

Affected species are human, mouse and zebrafish.

August 2005
IPI history data searchable in SRS
It is now easy to track deleted and secondary IPI IDs through the IPI Quick Search tool or the EBI's SRS server. You can still access this data by downloading the IPI history files from our FTP server.
May 2005
IPI data sets released for Chicken

We are pleased to announce the release of the first IPI data sets for Gallus gallus (chicken). The first release of IPI chicken contains 33799 protein sequences, with cross-references to UniProtKB, Ensembl, RefSeq and Entrez Gene. As with the data for other species, the chicken datasets are available in UniProt or FASTA format, and additional summary files are also available (click here for more information about available file formats).

April 2005
Cross References to the Consensus CDS (CCDS) project added in Human IPI
The Consensus CDS (CCDS) project is a collaborative effort to identify a core set of human protein coding regions that are consistently annotated and of high quality.

Consequently the file formats of the protein and gene xrefs files have been modified.

UniProt DAS service now serving IPI cross references
The UniProt DAS service is now serving IPI cross references for all IPI IDs. Example URL: http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/features?segment=IPI00000012
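As a minimal sketch, the request address for a given IPI ID can be assembled from the example URL in this announcement. The function name `das_features_url` is hypothetical, and the sketch only builds the string; it does not assume the service is still reachable.

```python
from urllib.parse import urlencode

# Base address copied from the example URL in the announcement.
DAS_FEATURES_URL = "http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/features"


def das_features_url(ipi_id):
    """Build the DAS 'features' request URL for one IPI identifier.

    Hypothetical helper: it only formats the query string and
    performs no network access.
    """
    return f"{DAS_FEATURES_URL}?{urlencode({'segment': ipi_id})}"


print(das_features_url("IPI00000012"))
```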
March 2005
Publication of IPI history files
IPI IDs are propagated between releases wherever possible. In some cases, IDs are lost (usually because the master entry is dead in the source database).

From now on, we will make IPI history files available to allow users to track deleted IPI IDs.

File format is described here.

Cross References to the Transcriptome database (trome) added
The Transcriptome database (trome) uses alignments of EST data to HTG or full genomes to generate virtual transcripts and coding sequences. This database is in a Swiss-Prot-like format, updated every 3 months and can be downloaded by anonymous ftp from ftp://ftp.isrec.isb-sib.ch/pub/databases.

Cross-references to trome have now been added to IPI UniProt-like format files (.dat files in IPI FTP site).

February 2005
Integration of H-InvDB in Human IPI

The H-Invitational Database (H-InvDB) has been added to IPI as a supplementary data set.

H-InvDB is a human gene database, with integrative annotation of 41,118 full-length cDNA clones currently available from six high throughput cDNA sequencing projects. This database represents 21,037 cDNA clusters...(more).

UniGene Cross-References added

Following the integration of H-InvDB in Human IPI and the addition of UniGene Cross-References to all IPI datasets, the file formats of the protein and gene xrefs files have been modified.

January 2005
IPI data sets released for Arabidopsis

We are pleased to announce the release of the first IPI data sets for Arabidopsis thaliana (mouse-ear cress). The first release of IPI arabidopsis contains 32954 protein sequences, with cross-references to UniProtKB, TAIR, RefSeq and Entrez Gene. As with the data for other species, the Arabidopsis datasets are available in UniProt or FASTA format, and additional summary files are also available (click here for more information about available file formats).

Entrez Gene replaces LocusLink as cross-referenced database

IPI now has cross-references to Entrez Gene. Entrez Gene is the successor database to LocusLink. For species covered by LocusLink, data can still be accessed using the Entrez Gene identifiers (more information...).

Cross-References file format modified

Following the addition of Arabidopsis to IPI datasets, the file formats of the protein and gene xrefs files have been modified.

IPI announcements mailing list created

See details here.

December 2004
IPI data sets released for zebrafish

We are pleased to announce the release of the first IPI data sets for the zebrafish, Brachydanio rerio. The first release of IPI zebrafish contains 32802 protein sequences, which cross-reference UniProtKB entries, RefSeq entries and Ensembl entries, with additional cross-references to genes defined in LocusLink and ZFIN. As with other species, the zebrafish datasets are available in UniProt or FASTA format, and additional summary files are also available (click here for more information about available file formats).

November 2004
IPI 3.0 released
The latest IPI releases (3rd November 2004) are the first releases of IPI version 3. IPI version 3 has been produced with a revised code base, designed to support the extension of IPI to additional species and the incorporation of additional data sources. These developments should become available within the next few months.

Associated with this, there have been some slight changes to the IPI algorithm. The merging of entries from protein sequence databases into IPI entries has been made more rigorously dependent on the compatibility of the genes encoding them (more...).

Future releases of IPI for human, mouse and rat will be numbered 3.00, 3.01, etc. Intermediate '.dat' file updates between full releases will be documented on the FTP server (3.0.1, 3.0.2, etc.). For an overview of the version numbering systems used for previous releases of IPI, click here.

August 2004
Cross References to CleanEx database added
CleanEx is a database which provides access to public gene expression data via unique approved gene symbols and which represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons.

Cross-references to CleanEx data have been added to IPI UniProt format files.

July 2004
Cross References to Transfac database added
Transfac® is the database on eukaryotic transcription factors, their genomic binding sites and DNA-binding profiles.

Cross-references to all types of Transfac data (factor, gene, and site) from the public version have been added to IPI UniProt format files.

June 2004
Cross References to EPD and RZPD databases added
EPD, the Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally.

RZPD Deutsches Ressourcenzentrum für Genomforschung is a not-for-profit service center for genomics and proteomics research. Based on one of the largest clone collections world-wide, it provides high quality research material, high throughput technology and automation solutions for academic institutions as well as for industry.

Cross-references to EPD and RZPD have been added to IPI UniProt format files.

May 2004
Cross References to UTRdb added
UTRdb is a specialized database of sequences and functional elements of 5' and 3' untranslated regions of eukaryotic mRNAs.
Apr 2004
Secondary identifiers now available (ATTN: FORMAT CHANGE)
Secondary IPI numbers have been added (after the current accession number) to the AC lines of the UniProt format files, to make it easier to track IPI identifiers across releases. The entry version number has been moved to the ID line.

More about secondary IPI numbers can be found here.
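A minimal sketch of how the current and secondary numbers might be read from an AC line follows. It assumes the usual UniProt flat-file convention of a two-letter line code followed by semicolon-terminated accessions; the sample line and the function name `parse_ac_line` are hypothetical and are not taken from an IPI release.

```python
def parse_ac_line(line):
    """Split an AC line into (current accession, [secondary accessions]).

    Assumption: accessions are separated by ';', with the current
    accession first, as described in the announcement above.
    """
    if not line.startswith("AC"):
        raise ValueError("not an AC line")
    # Drop the two-letter line code, then split on the ';' separators.
    accessions = [part.strip() for part in line[2:].split(";") if part.strip()]
    if not accessions:
        raise ValueError("AC line contains no accession numbers")
    return accessions[0], accessions[1:]


# Hypothetical AC line with one current and two secondary numbers.
current, secondary = parse_ac_line("AC   IPI00000001; IPI00000002; IPI00000003;")
print(current, secondary)
```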

Cross References to UniParc and S/MARt DB added
UniParc cross-references have been added to the UniProt format files, and the cross-reference files.

S/MARt DB collects information about scaffold/matrix attached regions and the nuclear matrix proteins that are thought to be involved in the interaction of these elements with the nuclear matrix.

Cross-references to S/MARt have been added to IPI UniProt format files.

Mar 2004
Cross References to ReAlSplice added
ReAlSplice collects information on the splicing sites which will be "glued" together during the splicing event, on additional sites in the pre-mRNA which are regarded to influence alternative splicing and on factors involved in alternative splicing (i.e. splicing factors).

Cross-references to ReAlSplice have now been added to IPI UniProt-like format files (.dat files in IPI FTP site).

Feb 2004
Gene Association and Cross Reference Files now available
Gene Association (.goa) files (comprising protein annotations taken from the GO controlled vocabulary) are now available for the complete IPI sets of human, mouse and rat data, released as part of the GOA project. In addition, convenient tab-delimited (.xrefs) files of the major cross-references in each IPI data set are also now available.

Download the Gene Association files for IPI from the GOA FTP site (see the HUMAN, MOUSE and RAT subdirectories).

Download the cross reference files from the IPI FTP site.

A description of the file format of the cross-reference files can be found here.

Oct 2003
Integration of Mouse and Rat RefSeq XPs
RefSeq XPs (automatic protein predictions) have been incorporated into the new IPI releases for mouse (v1.17) and rat (v1.7). RefSeq XPs have been present in the human IPI since v2.0.
May 2003
Improvement in procedure for identifier transfer
The procedure for propagating identifiers between IPI releases has been modified to increase identifier stability. Click here for more details.
April 2003
Splice variant identifiers introduced into Swiss-Prot
Swiss-Prot has introduced stable identifiers for known splice variants (see this page for more details). Following this, Swiss-Prot isoform IDs and sequences have now been incorporated into IPI. Where one Swiss-Prot entry describes many alternative sequences, each sequence is indicated within Swiss-Prot with its own identifier (consisting of the accession number of the parent entry, a hyphen, and a number e.g. P12345-1); and each of these sequences is now represented separately and uniquely within IPI, with its own mappings to Ensembl and RefSeq. Swiss-Prot entries describing only one splice isoform are still represented in IPI by use of their accession number, as before.
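The identifier layout described above (parent accession, a hyphen, and a number) can be illustrated with a short sketch. The function name `split_isoform_id` and the regular expression are assumptions made for illustration, not part of any IPI or Swiss-Prot tool.

```python
import re

# Parent accession, a hyphen, then the isoform number (e.g. P12345-1).
ISOFORM_RE = re.compile(r"^([A-Z0-9]+)-(\d+)$")


def split_isoform_id(identifier):
    """Return (parent_accession, isoform_number).

    isoform_number is None when the identifier is a plain accession,
    i.e. an entry describing only one splice isoform.
    """
    match = ISOFORM_RE.match(identifier)
    if match:
        return match.group(1), int(match.group(2))
    return identifier, None


print(split_isoform_id("P12345-1"))
print(split_isoform_id("P12345"))
```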

At the same time, some minor changes have been made to the IPI algorithm. Pre-filtering of RefSeq XPs (a part of the human IPI procedure) has been dropped; and additional cross references to certain redundant TrEMBL entries (previously not referred to) have now been added.
March 2003
Rat IPI launched
Version 1.0 of rat IPI has been released. Rat IPI is assembled from Swiss-Prot, TrEMBL, Ensembl and RefSeq NPs, using the same algorithm as used for the human and mouse IPI sets. Rat IPI will subsequently be released on the same schedule as human and mouse IPI, i.e. rat version 1.1 will be released at the same time as human version 2.18 and mouse version 1.11, in early April 2003.
May 2002
Mouse IPI launched
Version 1.0 of mouse IPI has been released. Mouse IPI is assembled from Swiss-Prot, TrEMBL, Ensembl and RefSeq NPs, using the same algorithm as used for the human IPI set. Mouse IPI will subsequently be released on the same schedule as human IPI, i.e. mouse version 1.1 will be released at the same time as human version 2.8.

With the launch of the mouse IPI set, the format of the header lines of ipi files in FASTA format is slightly changed, with effect from mouse v1.0 (released now) and human v2.8 (release due June 1st 2002). The current human release, v2.7, remains unchanged. A new item, "Tax_Id" (i.e. "Tax_Id=9606" for human, "Tax_Id=10090" for mouse), is added between the database identifiers and the entry description. The DT lines of the Swiss-Prot format files have also been changed. See below for more details of the FASTA and Swiss-Prot formats.
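As a rough sketch, the new "Tax_Id" item can be picked out of a FASTA header line with a regular expression. The sample headers below are hypothetical and only mimic the layout described above (database identifiers, then Tax_Id, then the entry description); the function name `tax_id_from_header` is likewise an assumption.

```python
import re

# Matches the "Tax_Id=<digits>" item described in the announcement.
TAX_ID_RE = re.compile(r"\bTax_Id=(\d+)\b")


def tax_id_from_header(header):
    """Return the taxonomy ID from a FASTA header line as an int.

    Returns None for headers without a Tax_Id item, such as those
    in releases made before the format change.
    """
    match = TAX_ID_RE.search(header)
    return int(match.group(1)) if match else None


# Hypothetical header lines, for illustration only.
new_style = ">IPI:IPI00000001.1|SWISS-PROT:P12345 Tax_Id=9606 Example protein"
old_style = ">IPI:IPI00000001.1|SWISS-PROT:P12345 Example protein"
print(tax_id_from_header(new_style), tax_id_from_header(old_style))
```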

April 2002
IPI algorithm modified
To reflect recent developments in RefSeq, the treatment of RefSeq XPs and RefSeq NPs in IPI has been modified. Overlap between RefSeq XPs and NPs has been reduced, resulting in a small increase in the size of the IPI set.
November 2001
Interpro hits for IPI now available
A file describing Interpro matches for the complete IPI set is now available for download. For IPI v2.1, there are matches for 19526 protein sequences (i.e. 60% coverage of the complete IPI set). The latest version is available here (the IPI version for which this set of matches was prepared is given in the first line of the file; click here for a fuller description of the format).
October 2001
IPI now available in Swiss-Prot format
IPI has now been released in a pseudo-Swiss-Prot format to supplement the original FASTA format file. The Swiss-Prot format file contains extra cross reference information linking IPI to GO, HUGO, LocusLink and Interpro, and identifies the chromosome on which the gene encoding each IPI entry is found. More details here.
