Acknowledgements

R.Apweiler (1), T.K.Attwood (4), A.Bairoch (2), A.Bateman (5), D.Binns (1), M.Biswas (1), P.Bradley (1,4), P.Bork (8), P.Bucher (3), R.Copley (8), E.Courcelle (6), R.Durbin (5), L.Falquet (5), W.Fleischmann (1), J.Gouzy (6), S.Griffiths-Jones (5), D.Haft (9), N.Hulo (2), D.Kahn (6), A.Kanapin (1), M.Krestyaninova (1), R.Lopez (1), I.Letunic (8), N.Mulder (1), S.Orchard (1), M.Pagni (3), D.Peyruc (6), C.Ponting (7), F.Servant (1), C.Sigrist (2)

(1) EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(2) Swiss Institute for Bioinformatics, Geneva, Switzerland;
(3) Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland;
(4) School of Biological Sciences, The University of Manchester, Manchester, UK;
(5) The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(6) CNRS/INRA, Toulouse, France;
(7) MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, UK;
(8) Biocomputing Unit, EMBL-Heidelberg, Germany;
(9) The Institute for Genomic Research, Maryland, USA.

Introduction

The databases Swiss-Prot, TrEMBL, PROSITE, PRINTS, Pfam, and ProDom joined forces to launch an Integrated Documentation Resource of Protein Families, Domains and Functional Sites, abbreviated InterPro. SMART and TIGRFAMs have subsequently joined InterPro. A detailed description of the project can be found in the user manual.

Changes since last major release

The protein matches have been updated to reflect the latest releases of Swiss-Prot and TrEMBL, and additional methods from new releases of the member databases have been added. Although the data from Pfam 7.7 are in InterPro, not all new Pfam HMMs have been assigned to InterPro entries yet. A large number of PRINTS false-positive hits have been eliminated by implementing family-specific thresholds in the InterProScan package. InterProScan has also been updated to include new member database signatures. The number of mappings to the Gene Ontology (GO) classification system has increased since the last release.

The InterPro entry pages have direct links to the sequence and text search pages. Where applicable, there are database cross-references to the CArbohydrate-Active EnZymes (CAZy) site, which describes families of related catalytic and carbohydrate-binding modules of enzymes that act on glycosidic bonds. The condensed graphical view of protein matches has been extended to facilitate retrieval of all proteins sharing a common architecture. The proteins can be retrieved in FASTA format, or their sequence alignments can be viewed using DisplayFam or Jalview.

Contents of current release

InterPro release 5.3 contains 6725 entries, representing 1453 domains, 5121 families, 136 repeats, and 15 post-translational modification sites. Overall, there are 2932939 InterPro hits from 850953 Swiss-Prot + TrEMBL protein sequences. A complete list is available from the FTP site.

The release was built using the following database versions:

DATABASE    VERSION   ENTRIES  DATE
Swiss-Prot  40.33     118357   09-NOV-2002
PRINTS      33.0      1650     24-JAN-2002
TrEMBL      22.2      732596   15-NOV-2002
Pfam        7.7       4832     17-OCT-2002
PROSITE     17.25     1587     04-NOV-2002
PREFILE     N/A       162      04-NOV-2002
ProDom      20001.3   1346     28-JAN-2002
SMART       3.4       654      07-OCT-2002
TIGRFAMs    2.1       1614     12-SEP-2002
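
The figures quoted in this section can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, not part of the InterPro distribution: all variable and function names are illustrative, and the numbers are copied from the text and table above. It checks that the per-type entry counts add up to the stated total, that the Swiss-Prot and TrEMBL entry counts add up to the stated number of protein sequences, and it derives the mean number of hits per sequence.

```python
# Figures copied from the release 5.3 notes; all names here are illustrative.
ENTRY_COUNTS = {
    "domains": 1453,
    "families": 5121,
    "repeats": 136,
    "ptm_sites": 15,
}
TOTAL_ENTRIES = 6725

SEQUENCE_COUNTS = {
    "Swiss-Prot": 118357,
    "TrEMBL": 732596,
}
TOTAL_SEQUENCES = 850953
TOTAL_HITS = 2932939


def check_release_figures():
    """Return the two sums, whether each matches its stated total,
    and the mean number of InterPro hits per protein sequence."""
    entry_sum = sum(ENTRY_COUNTS.values())
    sequence_sum = sum(SEQUENCE_COUNTS.values())
    return {
        "entry_sum": entry_sum,
        "entries_consistent": entry_sum == TOTAL_ENTRIES,
        "sequence_sum": sequence_sum,
        "sequences_consistent": sequence_sum == TOTAL_SEQUENCES,
        "hits_per_sequence": TOTAL_HITS / TOTAL_SEQUENCES,
    }


result = check_release_figures()
for name, value in result.items():
    print(f"{name}: {value}")
```

Both sums agree with the stated totals, and the ratio works out to roughly 3.4 hits per sequence, averaged over all Swiss-Prot and TrEMBL sequences in the release.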

Forthcoming changes

Release 6.0, the sixth production release, is scheduled for March 2003. For this release, we aim to integrate more databases, with links to PIR Superfamilies coming next. A new entry type, "site", will be introduced to replace PTM; it will have three subtypes: binding site, active site and PTM. Release 6.0 will also introduce a new-look web interface with extended capabilities, including a field that describes the taxonomic range of each entry. We also plan to include information on protein secondary and tertiary structure where available.