TAIR has released a new version (TAIR6) of the Arabidopsis thaliana genome annotation.
The TAIR IDs, previously in mixed case, are now consistently upper case in this release.
IPI reflects these changes.
September 2005
Integration of Vega in IPI
The Vertebrate Genome Annotation (Vega) database has been added to IPI.
Vega is a central repository for high-quality, frequently updated, manual
annotation of finished vertebrate genome sequence.
It is now easy to track deleted and secondary IPI IDs through the
IPI Quick Search tool or the EBI's SRS server.
You can still access this data by downloading the IPI history files from our
FTP server.
May 2005
IPI data sets released for Chicken
We are pleased to announce the release of the first IPI data sets for
Gallus gallus (chicken). The first release of IPI chicken contains
33799 protein sequences, with cross-references to UniProtKB, Ensembl, RefSeq and Entrez
Gene.
As with the data for other species, the Chicken datasets are available in UniProt or
FASTA format, and additional summary files are also available
(click here for more information
about available file formats).
April 2005
Cross References to the Consensus CDS (CCDS) project added
in Human IPI
The Consensus CDS (CCDS) project
is a collaborative effort to identify a core set of human protein coding regions
that are consistently annotated and of high quality.
Consequently the file formats of the
protein and
gene
xrefs files have been modified.
UniProt DAS service now serving IPI cross references
IPI IDs are propagated between releases wherever possible.
In some cases, IDs are lost (usually because the master
entry is dead in the source database).
From now on, we will make IPI history files available
to allow users to track deleted IPI IDs.
Cross-references to trome have now been added to
IPI UniProt-like format files (.dat files in
IPI FTP
site).
February 2005
Integration of H-InvDB in Human IPI
The H-Invitational Database (H-InvDB) has been added to IPI as a
supplementary data set.
H-InvDB is a human gene database, with integrative annotation of 41,118
full-length cDNA clones currently available from six high throughput cDNA
sequencing projects. This database represents 21,037 cDNA
clusters...(more).
UniGene Cross-References added
Following the integration of
H-InvDB
in Human IPI and the addition of
UniGene
Cross-References to all IPI datasets, the file formats of the
protein and
gene
xrefs files have been modified.
January 2005
IPI data sets released for Arabidopsis
We are pleased to announce the release of the first IPI data sets for
Arabidopsis thaliana (mouse-ear cress). The first release of IPI Arabidopsis contains
32954 protein sequences, with cross-references to UniProtKB, TAIR, RefSeq and Entrez Gene.
As with the data for other species, the Arabidopsis datasets are available in UniProt or
FASTA format, and additional summary files are also available
(click here for more information
about available file formats).
Entrez Gene replaces LocusLink as cross-referenced database
IPI now has cross-references to Entrez Gene. Entrez Gene is the successor database
to LocusLink. For species covered by LocusLink, data can still be accessed using
the Entrez Gene identifiers
(more
information...).
Cross-References file format modified
Following the addition of Arabidopsis to IPI datasets, the file formats of the
protein and
gene
xrefs files have been modified.
We are pleased to announce the release of the first IPI data sets for the
zebrafish, Brachydanio rerio. The first release of IPI zebrafish contains
32802 protein sequences, which cross-reference UniProtKB entries, RefSeq entries and
Ensembl entries, with additional cross-references to genes defined in LocusLink
and ZFIN. As with other species, the zebrafish datasets are available in UniProt
or FASTA format, and additional summary files are also available
(click here for more information
about available file formats).
November 2004
IPI 3.0 released
The latest IPI releases (3rd November 2004) represent the first release of IPI version 3. IPI
version 3 has been produced through the use of a revised code base, designed to support the
extension of IPI to additional species and the incorporation of additional data sources. These
developments should become available within the next few months.
Associated with this, there have been some slight changes to the IPI
algorithm. The merging of entries from protein sequence databases into IPI entries has been made more
rigorously dependent on the compatibility of the genes encoding them (more...).
Future releases of IPI for human, mouse and rat will be numbered 3.00,
3.01, etc., and intermediate '.dat' file updates between full releases will be documented on the FTP
server (3.0.1, 3.0.2, etc.). For an overview of the version numbering systems used for previous
releases of IPI, click here.
August 2004
Cross References to CleanEx database added
CleanEx is a database which provides
access to public gene expression data via unique approved gene symbols and which represents
heterogeneous expression data produced by different technologies in a way that facilitates joint
analysis and cross-dataset comparisons.
Transfac®
is the database on eukaryotic transcription factors, their genomic binding sites and DNA-binding
profiles.
Cross-references to all types of Transfac data (factor, gene, and site)
from the public version have been added to IPI UniProt format
files.
June 2004
Cross References to EPD and RZPD databases added
EPD, the Eukaryotic Promoter Database is an
annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start
site has been determined experimentally.
RZPD Deutsches Ressourcenzentrum für
Genomforschung is a not-for-profit service center for genomics and proteomics research. Based on
one of the largest clone collections world-wide, it provides high quality research material,
high throughput technology and automation solutions for academic institutions as well as for
industry.
Secondary IPI numbers have been added (after the current accession number) to the AC lines of the UniProt format files, to make it easier to track IPI
identifiers across releases. The entry version number has been moved to the ID line.
More about secondary IPI numbers can be found here.
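As a rough illustration of how the current and secondary numbers might be read back from an entry, the sketch below assumes that AC lines follow the usual UniProt flat-file convention (a two-letter line code followed by semicolon-separated accessions). The function name and the identifiers in the usage note are hypothetical and are not taken from the IPI documentation.

```python
from typing import List, Tuple


def parse_ac_lines(entry_lines: List[str]) -> Tuple[str, List[str]]:
    """Return (current accession, secondary accessions) for one entry.

    Assumption: accessions on AC lines are separated by semicolons and the
    current accession comes first, as in UniProt flat files.
    """
    accessions: List[str] = []
    for line in entry_lines:
        if line.startswith("AC "):
            # Drop the line code, then split the rest on semicolons.
            for token in line[2:].split(";"):
                token = token.strip()
                if token:
                    accessions.append(token)
    if not accessions:
        raise ValueError("entry has no AC line")
    return accessions[0], accessions[1:]
```

For example, with a made-up entry whose AC line reads `AC   IPI00000001; IPI00000002; IPI00000003;`, the function would return `IPI00000001` as the current accession and the other two as secondary numbers.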
S/MARt DB collects information about
scaffold/matrix attached regions and the nuclear matrix proteins that are supposed to be involved
in the interaction of these elements with the nuclear matrix.
ReAlSplice collects
information on the splicing sites which will be "glued" together during the splicing
event, on additional sites in the pre-mRNA which are thought to influence alternative splicing and
on factors involved in alternative splicing (i.e. splicing factors).
Cross-references to ReAlSplice have now been added to IPI UniProt-like
format files (.dat files in IPI FTP site).
Feb 2004
Gene Association and Cross Reference Files now available
Gene Association (.goa) files (comprising protein annotations taken from the GO controlled
vocabulary) are now available for the complete IPI sets of human, mouse and rat data, released as
part of the GOA project. In addition, convenient
tab-delimited (.xrefs) files of the major cross-references in each IPI data set are also now
available.
Download the Gene Association files for IPI from the GOA FTP site (see the
HUMAN, MOUSE and RAT subdirectories).
Download the cross reference files from the IPI FTP site.
A description of the file format of the cross-reference files can be
found here.
Oct 2003
Integration of Mouse and Rat RefSeq XPs
RefSeq XPs (automatic protein predictions) have been incorporated into the new IPI releases for
mouse (v1.17) and rat (v1.7). RefSeq XPs have been present in the human IPI since v2.0.
May 2003
Improvement in procedure for identifier transfer
The procedure for propagating identifiers between IPI releases has been modified to increase
identifier stability. Click here for more details.
April 2003
Splice variant identifiers introduced into Swiss-Prot
Swiss-Prot has introduced stable identifiers for known splice variants (see this page for more
details). Following this, Swiss-Prot isoform IDs and sequences have now been incorporated into IPI.
Where one Swiss-Prot entry describes many alternative sequences, each sequence is indicated within
Swiss-Prot with its own identifier (consisting of the accession number of the parent entry, a
hyphen, and a number e.g. P12345-1); and each of these sequences is now represented separately and
uniquely within IPI, with its own mappings to Ensembl and RefSeq. Swiss-Prot entries describing only
one splice isoform are still represented in IPI by use of their accession number, as before.
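The identifier scheme described above (parent accession, hyphen, isoform number) can be split mechanically. The following is a minimal sketch, assuming that accessions consist only of upper-case letters and digits; the function name is illustrative and not part of any Swiss-Prot or IPI tool.

```python
import re
from typing import Optional, Tuple

# Assumption: an accession is a run of upper-case letters and digits,
# optionally followed by "-<isoform number>".
_ISOFORM_RE = re.compile(r"^([A-Z0-9]+)(?:-(\d+))?$")


def split_isoform_id(identifier: str) -> Tuple[str, Optional[int]]:
    """Split an identifier into (parent accession, isoform number or None)."""
    match = _ISOFORM_RE.match(identifier.strip())
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    accession, isoform = match.groups()
    return accession, int(isoform) if isoform is not None else None
```

With this sketch, `P12345-1` splits into the parent accession `P12345` and isoform number 1, while a plain accession such as `P12345` (an entry describing only one isoform) yields no isoform number.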
At the same time, some minor changes have been
made to the IPI algorithm. Pre-filtering of RefSeq XPs (a part of the human IPI procedure) has been
dropped; and additional cross references to certain redundant TrEMBL entries (previously not
referred to) have now been added.
March 2003
Rat IPI launched
Version 1.0 of rat IPI has been released. Rat IPI is assembled from Swiss-Prot, TrEMBL, Ensembl and
RefSeq NPs, using the same algorithm as used for the human and mouse IPI sets. Rat IPI will
subsequently be released on the same schedule as human and mouse IPI, i.e. rat version 1.1 will be
released concurrently with human version 2.18 and mouse version 1.11 in early April 2003.
May 2002
Mouse IPI launched
Version 1.0 of mouse IPI has been released. Mouse IPI is assembled from Swiss-Prot, TrEMBL, Ensembl
and RefSeq NPs, using the same algorithm as used for the human IPI set. Mouse IPI will subsequently
be released on the same schedule as human IPI, i.e. mouse version 1.1 will be released
concurrently with human version 2.8.
With the launch of the mouse IPI set, the format of the header lines of
IPI files in FASTA format has changed slightly, with effect from mouse v1.0 (released now) and
human v2.8 (release due June 1st 2002). The current human release, v2.7, remains unchanged. A
new item, "Tax_Id" (i.e. "Tax_Id=9606" for human, "Tax_Id=10090"
for mouse), is added between the database identifiers and the entry description. The DT lines of
the Swiss-Prot format files have also been changed. See below for more details of the FASTA and Swiss-Prot formats.
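For scripts that read the FASTA files, the new item can be picked out of a header line without relying on the exact layout of the other fields. The sketch below assumes only that the item appears as `Tax_Id=<number>` somewhere in the header; the function name and the sample header in the usage note are hypothetical.

```python
import re
from typing import Optional

# Matches the "Tax_Id=<number>" item described above.
_TAX_ID_RE = re.compile(r"\bTax_Id=(\d+)\b")

# The two taxonomy identifiers quoted in the announcement.
SPECIES_BY_TAX_ID = {9606: "human", 10090: "mouse"}


def tax_id_from_header(header: str) -> Optional[int]:
    """Return the Tax_Id value from a FASTA header, or None if it is absent.

    Headers written before the format change carry no Tax_Id item, so
    callers should be prepared for None.
    """
    match = _TAX_ID_RE.search(header)
    return int(match.group(1)) if match else None
```

A made-up header such as `>EXAMPLE_ID Tax_Id=9606 Example protein` would give 9606, which `SPECIES_BY_TAX_ID` maps to human. Returning `None` rather than raising keeps files from human v2.7 and earlier readable by the same code.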
April 2002
IPI algorithm modified
To reflect recent developments in RefSeq, the treatment of RefSeq XPs and RefSeq NPs in IPI has been
modified. Overlap between RefSeq XPs and NPs has been reduced, resulting in a small increase in the
size of the IPI set.
November 2001
Interpro hits for IPI now available
A file describing Interpro matches for the complete IPI set is now available for download. For IPI
v2.1, there are matches for 19526 protein sequences (i.e. 60% coverage of the complete IPI set).
The latest version is available here
(the IPI version for which this set of matches was prepared is given in the first line of the file;
click here for a fuller description of the format).
October 2001
IPI now available in Swiss-Prot format
IPI has now been released in a pseudo-Swiss-Prot format to supplement the original FASTA format
file. The Swiss-Prot format file contains extra cross reference information linking IPI to GO, HUGO,
LocusLink and Interpro, and identifies the chromosome on which the gene encoding each IPI entry is found. More details here.