Release Notes
Release 9.0, Thursday February 17 2005
R.Apweiler (1), T.K.Attwood (4), A.Bairoch (2), A.Bateman (5), D.Binns (1), P.Bradley (1,4), P.Bork (8), P.Bucher (3), M.J.Campbell (14), L.Cerutti (3), R.Copley (13), E.Courcelle (6), U.Das (1), L.Daugherty (1), M.Dibley (7), R.Durbin (5), W.Fleischmann (1), J.Gough (11), D.Haft (9), N.Harte (1), N.Hulo (2), D.Kahn (6), A.Kanapin (1), M.Krestyaninova (1), D.Lonsdale (1), R.Lopez (1), I.Letunic (8), M.Madera (12), J.Maslen (1), J.McDowall (1), J.Mistry (5), N.Mulder (1), A.N. Nikolskaya (10), S.Orchard (1), C.Orengo (7), M.Pagni (3), D.Peyruc (6), E.Quevillon (1), C.Sigrist (2), V.Silventoinen (1), P.D.Thomas (14), R.Vaughan (1), C.H.Wu (10).
(1) EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(2) Swiss Institute for Bioinformatics, Geneva, Switzerland;
(3) Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland;
(4) School of Biological Sciences, The University of Manchester, Manchester, UK;
(5) The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(6) CNRS/INRA, Toulouse, France;
(7) Biochemistry and Molecular Biology Department, University College London, University of London, UK;
(8) Biocomputing Unit, EMBL-Heidelberg, Germany;
(9) The Institute for Genomic Research, Maryland, USA;
(10) Protein Information Resource, Georgetown University Medical Center, Washington, D.C., USA;
(11) Genomic Sciences Centre, RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Japan;
(12) MRC Laboratory of Molecular Biology, Cambridge, UK;
(13) Wellcome Trust Centre for Human Genetics, Oxford, UK;
(14) Computational Biology, Applied Biosystems, 850 Lincoln Center Drive, Foster City, CA 94404, USA.
The databases UniProt, PROSITE, PRINTS, Pfam, and ProDom joined forces to launch an Integrated Documentation Resource of Protein Families, Domains and Functional Sites, abbreviated InterPro. SMART, TIGRFAMs, PIRSF, and SUPERFAMILY have since joined InterPro, and with this release PANTHER and Gene3D join as well. A detailed description of the project can be found in the user manual.
Changes since last release
The protein matches have been updated to reflect the latest UniProt update, and new methods from PIRSF have been integrated.
PIRSF protein classification now includes subfamilies that reflect the evolutionary relationships of full-length proteins and domains.
Two new databases, Gene3D and PANTHER, have been incorporated:
- Gene3D is a supplement to the CATH structural database and provides extended predictions of protein structures through HMMs. The seed alignments for the models are derived from the proteins found at the homologous superfamily (H) level of the CATH classification, which groups together domains that are considered to have evolved from a common ancestor.
- PANTHER HMMs define protein families and their subfamilies, which are modelled on the divergence of specific functions within each family; this permits more accurate association with function (ontology terms and pathways), as well as inference of amino acids important for functional specificity.
Dynamic links to the CluSTr database are now shown where available:
- An InterPro entry is linked to a CluSTr protein cluster only where at least 70% of the cluster's members are matched by that entry.
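The 70% linking rule can be expressed as a simple membership test. The sketch below is illustrative only: the function name `should_link` and the representation of clusters and entries as sets of protein accessions are assumptions, not the InterPro or CluSTr implementation.

```python
from typing import AbstractSet


def should_link(cluster_members: AbstractSet[str],
                entry_proteins: AbstractSet[str],
                min_percent: int = 70) -> bool:
    """Hypothetical check: link an InterPro entry to a CluSTr cluster only if
    at least min_percent of the cluster's members are matched by the entry."""
    if not cluster_members:
        return False  # an empty cluster gives no basis for a link
    shared = len(cluster_members & entry_proteins)
    # Integer comparison avoids floating-point edge cases exactly at the threshold.
    return shared * 100 >= min_percent * len(cluster_members)


# Hypothetical accessions: a 10-member cluster.
cluster = {f"C{i}" for i in range(10)}
print(should_link(cluster, {f"C{i}" for i in range(7)} | {"X1"}))  # 7 of 10 -> True
print(should_link(cluster, {f"C{i}" for i in range(6)}))           # 6 of 10 -> False
```

Comparing `shared * 100` with `min_percent * len(cluster_members)` keeps the test exact, so a cluster sitting precisely on 70% is linked as the rule states.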
The single protein match view now shows predicted orthologues, restricted to the most similar protein in other proteomes.
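Restricting predicted orthologues to the most similar protein in each other proteome amounts to keeping one best hit per proteome. The sketch below assumes precomputed similarity scores in a hypothetical `(accession, proteome, score)` form; these release notes do not say how similarity is scored, so the scoring and all names here are placeholders.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical hit record: (protein_accession, proteome_id, similarity_score)
Hit = Tuple[str, str, float]


def best_hit_per_proteome(query_proteome: str,
                          hits: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """Keep only the single highest-scoring hit in each proteome other than
    the query's own proteome."""
    best: Dict[str, Tuple[str, float]] = {}
    for accession, proteome, score in hits:
        if proteome == query_proteome:
            continue  # hits from the query's own proteome are not candidate orthologues
        current = best.get(proteome)
        if current is None or score > current[1]:
            best[proteome] = (accession, score)
    return best


# Hypothetical accessions, proteome labels and scores.
hits = [
    ("P1", "yeast", 310.0),
    ("P2", "yeast", 450.0),
    ("Q1", "fly", 120.0),
    ("H1", "human", 999.0),
]
print(best_hit_per_proteome("human", hits))
# {'yeast': ('P2', 450.0), 'fly': ('Q1', 120.0)}
```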
Contents and coverage of the current release
InterPro protein matches are now calculated for all UniProt proteins; UniProt combines the UniProt/Swiss-Prot, UniProt/TrEMBL and PIR protein collections. For more information see UniProt.
InterPro release 9.0 contains 11605 entries (last entry IPR012306), representing 2982 domains, 8373 families, 222 repeats, 27 active sites, 21 binding sites and 20 post-translational modification sites. Overall, there are 6781584 InterPro hits from 1387363 UniProt protein sequences. A complete list is available from the ftp site.
InterPro entries match:
- 90.9% of UniProt/Swiss-Prot - 154641 of 170140 proteins
- 76.4% of UniProt/TrEMBL - 1232722 of 1614107 proteins
- 77.8% of UniProt - 1387363 of 1784247 proteins
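The overall UniProt figure follows directly from the two component databases: matched and total counts are summed before the percentage is taken, rather than the two percentages being averaged. A short sketch using only the counts listed in this release (the function name `percent` is illustrative):

```python
# Counts from the coverage figures for this release: (matched, total)
coverage = {
    "UniProt/Swiss-Prot": (154641, 170140),
    "UniProt/TrEMBL": (1232722, 1614107),
}


def percent(matched: int, total: int) -> float:
    """Percentage of proteins matched, rounded to one decimal place."""
    return round(100 * matched / total, 1)


# The combined UniProt figure sums counts, not percentages.
matched_all = sum(m for m, _ in coverage.values())
total_all = sum(t for _, t in coverage.values())

for name, (m, t) in coverage.items():
    print(f"{percent(m, t)}% of {name} - {m} of {t} proteins")
print(f"{percent(matched_all, total_all)}% of UniProt - {matched_all} of {total_all} proteins")
```

This reproduces the three lines above: 90.9%, 76.4% and 77.8%, with 1387363 of 1784247 proteins matched overall.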
19044 publications in PubMed are referenced from InterPro.
The next release of InterPro will be release 10.0, scheduled for March 2005.