InterPro documentation

Release 4.0, November 2001

InterPro has been prepared by:

R.Apweiler (1), T.K.Attwood (4), A.Bairoch (2), A.Bateman (5), E.Birney (1), M.Biswas (1), P.Bucher (3), R.Copley (8), M.D.R.Croning (1,4), R.Durbin (5), W.Fleischmann (1), J.Gouzy (6), D.Haft (9), H.Hermjakob (1), N.Hulo (2), D.Kahn (6), A.Kanapin (1), R.Lopez (1), N.Mulder (1), T.Oinn (1), C.Ponting (7), F.Servant (1), C.Sigrist (2), E.Zdobnov (1).

(1) EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(2) Swiss Institute for Bioinformatics, Geneva, Switzerland;
(3) Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland;
(4) School of Biological Sciences, The University of Manchester, Manchester, UK;
(5) The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(6) CNRS/INRA, Toulouse, France;
(7) MRC Functional Genetics Unit, Department of Human Anatomy & Genetics, University of Oxford, UK;
(8) Biocomputing Unit, EMBL-Heidelberg, Germany;
(9) The Institute for Genomic Research, Maryland, USA.

  Contents
  • 1 - Introduction
  • 2 - Changes since last major release
  • 3 - Contents of current release
  • 4 - Forthcoming changes
  • 5 - Feedback

1. Introduction
The databases SWISS-PROT, TrEMBL, PROSITE, PRINTS, Pfam and ProDom joined forces to launch an Integrated Resource of Protein Families, Domains and Sites, abbreviated InterPro. SMART joined InterPro earlier this year, and the most recent member to join is TIGRFAMs. A detailed description of the project can be found in the InterPro user manual.

2. Changes since last major release
The first 814 TIGRFAMs HMMs have been integrated into InterPro. TIGRFAMs are models based on grouping of functionally equivalent proteins. The number of mappings to the Gene Ontology (GO) classification system has increased since the last release. The protein matches have been updated according to the latest updates of SWISS-PROT and TrEMBL, and additional methods from new releases of the member databases have been added. InterProScan has also been updated to include new member database signatures and the TIGRFAMs HMMs.

The InterPro entry pages now have direct links to the sequence and text search pages. Where applicable there are links to the Oxford University Press (OUP)/EBI Protein Profiles website, which contains protein alignments used for the production of the Protein Profile series of books by OUP. There are also database cross-references to the CArbohydrate-Active EnZymes (CAZy) site, which describes families of related catalytic and carbohydrate-binding modules of enzymes that act on glycosidic bonds. The condensed graphical view of protein matches has been extended to facilitate retrieval of all proteins sharing a common architecture. The proteins can be retrieved in FASTA format or their sequence alignments can be viewed using DisplayFam or Jalview.

3. Contents of current release
InterPro release 4.0 contains 4691 entries, representing 1068 domains, 3532 families, 74 repeats, and 15 post-translational modification sites. Overall there are 2141621 InterPro hits from 586124 SWISS-PROT + TrEMBL protein sequences. A complete list is available from the ftp site.

The release was built using the following database versions:


DATABASE      VERSION    ENTRIES    DATE
SWISS-PROT    40.1       101737     24-OCT-2001
TREMBL        18.1       484387     26-OCT-2001
PROSITE       16.37      1474       05-MAY-2001
PREFILE       N/A        252        18-JUL-2001
PFAM          6.6        3071       06-AUG-2001
PRINTS        31.0       1550       10-JUL-2001
PRODOM        20001.2    1346       30-SEP-2001
SMART         3.1        509        16-NOV-2000
TIGRFAMs      1.2        814        03-AUG-2001

The SWISS-PROT and TrEMBL data used include updates to these versions.
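
As a rough cross-check of the figures in this section, the short Python sketch below adds up the SWISS-PROT and TrEMBL entry counts from the table and compares the result with the number of protein sequences quoted for the release. It also derives the average number of InterPro hits per protein. The variable and function names are illustrative assumptions and are not part of any InterPro tool; the numbers are copied from this document.

```python
# Figures copied from this release document; names are illustrative only.
PROTEIN_DB_ENTRIES = {
    "SWISS-PROT": 101737,
    "TREMBL": 484387,
}
TOTAL_INTERPRO_HITS = 2141621   # InterPro hits quoted for release 4.0
TOTAL_PROTEINS = 586124         # SWISS-PROT + TrEMBL sequences quoted


def total_protein_entries(counts):
    """Sum the entry counts of the protein sequence databases."""
    return sum(counts.values())


def mean_hits_per_protein(hits, proteins):
    """Average number of InterPro hits per protein sequence."""
    return hits / proteins


if __name__ == "__main__":
    combined = total_protein_entries(PROTEIN_DB_ENTRIES)
    print("Combined protein entries:", combined)
    print("Matches quoted total:", combined == TOTAL_PROTEINS)
    mean_hits = mean_hits_per_protein(TOTAL_INTERPRO_HITS, TOTAL_PROTEINS)
    print("Mean hits per protein: %.2f" % mean_hits)
```

With the figures above, the combined count equals the quoted 586124 sequences, and the average comes to about 3.65 hits per protein.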

4. Forthcoming changes
The fifth production release, 5.0, is scheduled for March 2002. For Release 5.0, we aim to integrate more databases, with links to PIR Superfamilies coming next. A new entry type, "site", will be introduced to replace PTM. Entries of this type will fall into one of three categories: binding site, active site or PTM. We hope to develop a more advanced text search facility for this next release, and to include a field which describes the taxonomic range of each entry. We also plan to let users display the scores of InterProScan sequence search results, and the scores for protein matches in each entry will also be made available.

5. Feedback
We need your help and would welcome any feedback. If you find errors or omissions, please let us know.
You can contact us at: Interhelp@ebi.ac.uk.