Release Notes

Release 8.1, Monday November 29 2004

Acknowledgements

R.Apweiler (1), T.K.Attwood (4), A.Bairoch (2), A.Bateman (5), D.Binns (1), P.Bradley (1,4), P.Bork (8), P.Bucher (3), L.Cerutti (3), R.Copley (13), E.Courcelle (6), U.Das (1), L.Daugherty (1), R.Durbin (5), W.Fleischmann (1), J.Gough (11), D.Haft (9), N.Harte (1), N.Hulo (2), D.Kahn (6), A.Kanapin (1), M.Krestyaninova (1), D.Lonsdale (1), R.Lopez (1), I.Letunic (8), M.Madera (12), J.Maslen (1), J.McDowall (1), N.Mulder (1), A.N. Nikolskaya (10), S.Orchard (1), M.Pagni (3), D.Peyruc (6), C.Ponting (7), E.Quevillon (1), C.Sigrist (2), V.Silventoinen (1), D.J.Studholme (5), R.Vaughan (1), C.H.Wu (10).

(1) EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(2) Swiss Institute for Bioinformatics, Geneva, Switzerland;
(3) Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland;
(4) School of Biological Sciences, The University of Manchester, Manchester, UK;
(5) The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(6) CNRS/INRA, Toulouse, France;
(7) MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, UK;
(8) Biocomputing Unit, EMBL-Heidelberg, Germany;
(9) The Institute for Genomic Research, Maryland, USA;
(10) Protein Information Resource, Georgetown University Medical Center, Washington, D.C., USA;
(11) Genomic Sciences Centre, RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Japan;
(12) MRC Laboratory of Molecular Biology, Cambridge, UK;
(13) Wellcome Trust Centre for Human Genetics, Oxford, UK.

Current Release

Introduction

The databases UniProt, PROSITE, PRINTS, Pfam, and ProDom joined forces to launch an Integrated Documentation Resource of Protein Families, Domains and Functional Sites, abbreviated InterPro. SMART, TIGRFAMs, PIRSF and, more recently, SUPERFAMILY have since joined InterPro. A detailed description of the project can be found in the user manual.

Changes since last release

The protein matches have been updated to reflect the latest update of UniProt, and new methods from Pfam, PROSITE, and TIGRFAMs have been integrated.

SWISS-MODELs, with hyperlinks to the SWISS-MODEL Repository, are displayed in the Graphical Views as red/white striped bars where there is no overlapping experimentally determined structure. SWISS-MODELs are annotated three-dimensional comparative protein structure models generated by the fully automated homology-modelling pipeline SWISS-MODEL.

New links have been added to the following databases:

PANDIT, a database of multiple sequence alignments and phylogenetic trees based on Pfam signatures.

MSDsite, which links to the PROSITE ligand statistics page of MSD.

Contents and coverage of the current release

InterPro protein matches are now calculated for all UniProt proteins, which are a combination of UniProt/Swiss-Prot, UniProt/TrEMBL and PIR proteins. For more information see UniProt.

InterPro release 8.1 contains 11330 entries (last entry: IPR011995), representing 2933 domains, 8126 families, 222 repeats, 27 active sites, 21 binding sites and 20 post-translational modification sites. Overall, there are 6177236 InterPro hits from 1295396 UniProt protein sequences. A complete list is available from the ftp site.

DATABASE              VERSION    ENTRIES
SWISS-PROT            45.1        164201
PRINTS                37.0          1850
TrEMBL                28.1       1503829
Pfam                  15.0          7426
PROSITE patterns      18.36         1748
PROSITE preprofiles   N/A            125
ProDom                2004.1        1522
InterPro              8.1          11330
Smart                 4.0            663
TIGRFAMs              4.0           2251
PIRSF                 2.41           549
Superfamily           1.65          1160
GO Classification     N/A          18059

Coverage within the protein sequence databases:

90.6% of UniProt/Swiss-Prot - 148807 of 164201 proteins
76.2% of UniProt/TrEMBL - 1146589 of 1503829 proteins
77.7% of UniProt - 1295396 of 1668030 proteins
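
The coverage figures above follow directly from the matched and total protein counts, and the combined UniProt line is the sum of the Swiss-Prot and TrEMBL lines. The short Python sketch below recomputes them as a consistency check. The names COVERAGE, percent and combined are illustrative only and are not part of any InterPro tool; the counts are copied from the list above.

```python
# Counts copied from the coverage list above: (matched proteins, total proteins).
# The dictionary and function names are hypothetical, chosen for this sketch.
COVERAGE = {
    "UniProt/Swiss-Prot": (148807, 164201),
    "UniProt/TrEMBL": (1146589, 1503829),
}


def percent(matched, total):
    """Return coverage as a percentage rounded to one decimal place."""
    return round(100.0 * matched / total, 1)


def combined(coverage):
    """Sum matched and total counts across all listed databases."""
    matched = sum(m for m, _ in coverage.values())
    total = sum(t for _, t in coverage.values())
    return matched, total


if __name__ == "__main__":
    for name, (matched, total) in COVERAGE.items():
        print(f"{percent(matched, total)}% of {name} - {matched} of {total} proteins")
    matched, total = combined(COVERAGE)
    print(f"{percent(matched, total)}% of UniProt - {matched} of {total} proteins")
```

Run as a script, this prints the same three lines as the coverage list, which confirms that the combined UniProt total (1668030) is the sum of the Swiss-Prot and TrEMBL entry counts.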

18315 publications in PubMed are referenced from InterPro.

Forthcoming changes

The next release of InterPro will be release 8.2, scheduled for April 2005. It will include new data from the member databases and further improvements to the Taxonomy servlet and to the InterPro Domain Architecture tool. In addition, work is underway to integrate two new member databases: PANTHER, a database of protein families based on HMMs developed against curated protein families associated with specific ontology terms; and Gene3D, which is supplementary to the CATH database and provides extended predictions of protein structures through HMMs.