InterPro documentation

Release 3.2, July 2001

Acknowledgments
InterPro has been prepared by:

R.Apweiler (1), T.K.Attwood (4), A.Bairoch (2), A.Bateman (5), E.Birney (1), M.Biswas (1), P.Bucher (3), R.Copley (8), M.D.R.Croning (1,4), R.Durbin (5), W.Fleischmann (1), J.Gouzy (6), H.Hermjakob (1), N.Hulo (2), D.Kahn (6), A.Kanapin (1), R.Lopez (1), N.Mulder (1), T.Oinn (1), C.Ponting (7), F.Servant (1), C.Sigrist (2), E.Zdobnov (1).

(1) EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(2) Swiss Institute for Bioinformatics, Geneva, Switzerland;
(3) Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland;
(4) School of Biological Sciences, The University of Manchester, Manchester, UK;
(5) The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK;
(6) CNRS/INRA, Toulouse, France;
(7) MRC Functional Genetics Unit, Department of Human Anatomy & Genetics, University of Oxford, UK;
(8) Biocomputing Unit, EMBL-Heidelberg, Germany.

1. Introduction
The databases SWISS-PROT, TrEMBL, PROSITE, PRINTS, Pfam, SMART and ProDom joined forces to launch an Integrated Resource of Protein Families, Domains and Sites, abbreviated InterPro. A detailed description of the project can be found in the InterPro user manual.

2. Changes since last major release
Over 500 SMART signatures have been integrated into InterPro. Each InterPro entry has been assigned a short name, which is also applied to the ProDom signatures within the entry. The Gene Ontology (GO) classification system has been incorporated for all InterPro entries. The GO mappings to InterPro entries are displayed in new fields, one for each of the three ontologies: molecular function, biological process and cellular component. The protein matches have been updated according to the latest updates of SWISS-PROT and TrEMBL, and additional methods from new releases of the member databases have been added.

There is a new procedure for producing the XML file containing the database. The database, excluding matches, is dumped to disk. From this stage on, the information flow is strictly one-way, so any modifications we make beyond this point are not reflected in Oracle.
Any layout tags in the abstracts are escaped using XML entity references, i.e. <p> becomes &lt;p&gt;.
In addition, anything within <pre> tags in the abstract is enclosed in a CDATA block (see the XML 1.0 standard for more information). Currently the file is then edited by hand to ensure compliance with the XML schema.
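
The escaping rules above can be illustrated with a short Python sketch. This is not the script used to build the release; the function name escape_abstract is hypothetical, and it assumes that the <pre> tags themselves are escaped like any other layout tag while only their content goes into the CDATA block.

```python
import re
from xml.sax.saxutils import escape

# Matches a <pre>...</pre> block, including line breaks inside it.
PRE_BLOCK = re.compile(r"<pre>(.*?)</pre>", re.DOTALL | re.IGNORECASE)


def escape_abstract(text):
    """Escape layout tags and wrap the content of <pre> blocks in CDATA.

    Assumption: the <pre> tags are escaped like other layout tags and
    only the text between them is placed in the CDATA block.
    """
    parts = []
    pos = 0
    for match in PRE_BLOCK.finditer(text):
        # Text before the <pre> block: escape &, < and >.
        parts.append(escape(text[pos:match.start()]))
        # A literal "]]>" would end the CDATA block early, so split it.
        body = match.group(1).replace("]]>", "]]]]><![CDATA[>")
        parts.append(
            escape("<pre>") + "<![CDATA[" + body + "]]>" + escape("</pre>")
        )
        pos = match.end()
    # Remaining text after the last <pre> block.
    parts.append(escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(escape_abstract("<p>Kringle domain</p>"))
    print(escape_abstract("Alignment:<pre>A < B</pre>"))
```

An XML parser reading the result back returns the original abstract text, with the layout tags as plain characters.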

2.1 Changes since release 3.1
A new attribute has been introduced in the interpro.xml file for the interpro and db_xref elements. The protein_count attribute contains the number of proteins that have matches for the corresponding InterPro entry or signature.
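
As a rough illustration, the new attribute could be read with Python's standard XML parser as sketched below. Only the element names interpro and db_xref and the attribute protein_count come from this document. The root element, the member_list wrapper, the id, db and dbkey attributes, and all counts in the sample are invented for the example and may not match the real interpro.xml layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the surrounding layout and all numbers are
# made up for illustration; only interpro, db_xref and protein_count
# are taken from the release notes.
SAMPLE = """<interprodb>
  <interpro id="IPR000001" protein_count="120">
    <member_list>
      <db_xref db="PFAM" dbkey="PF00051" protein_count="95"/>
      <db_xref db="SMART" dbkey="SM00130" protein_count="80"/>
    </member_list>
  </interpro>
</interprodb>"""


def protein_counts(xml_text):
    """Return a dict mapping entry or signature accession to protein_count."""
    root = ET.fromstring(xml_text)
    counts = {}
    for entry in root.iter("interpro"):
        # Count for the InterPro entry itself.
        counts[entry.get("id")] = int(entry.get("protein_count"))
        # Counts for each member database signature in the entry.
        for xref in entry.iter("db_xref"):
            counts[xref.get("dbkey")] = int(xref.get("protein_count"))
    return counts


if __name__ == "__main__":
    for accession, count in protein_counts(SAMPLE).items():
        print(accession, count)
```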

3. Contents of current release
InterPro release 3.2 contains 3939 entries, representing 1009 domains, 2850 families, 65 repeats, and 15 post-translational modification sites. Overall there are 1947322 InterPro hits from 571892 SWISS-PROT + TrEMBL protein sequences. A complete list is available from the ftp site.
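
The figures quoted in this section can be cross-checked with a few lines of Python. The sketch below uses only numbers stated in this document (the SWISS-PROT and TrEMBL entry counts are those listed in the database versions table of this section); the variable and function names are chosen for the example.

```python
# Figures quoted in the release notes.
entry_types = {"domains": 1009, "families": 2850, "repeats": 65, "PTM sites": 15}
total_entries = 3939
hits = 1947322
proteins = 571892
swissprot_entries, trembl_entries = 98387, 473505


def check_release_figures():
    """Cross-check the totals and compute the mean number of hits per protein."""
    return {
        # The four entry types should add up to the total entry count.
        "entry_types_sum_ok": sum(entry_types.values()) == total_entries,
        # SWISS-PROT plus TrEMBL should equal the number of protein sequences.
        "protein_sum_ok": swissprot_entries + trembl_entries == proteins,
        # Average number of InterPro hits per matched protein sequence.
        "hits_per_protein": round(hits / proteins, 2),
    }


if __name__ == "__main__":
    print(check_release_figures())
```

Both totals agree, and the release averages roughly 3.4 InterPro hits per protein sequence.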

The release was built using the following database versions:

DATABASE    VERSION  ENTRIES  DATE
SWISS-PROT  39.21    98387    13-JUN-2001
TREMBL      17.0     473505   15-JUN-2001
PROSITE     16.37    1474     05-MAY-2001
PREFILE     N/A      236      25-SEP-2000
PFAM        6.2      2773     01-APR-2001
PRINTS      30.0     1500     30-MAR-2001
PRODOM      20001.1  1310     30-MAR-2001
SMART       3.1      509      16-NOV-2000

The SWISS-PROT and TrEMBL data used include updates to these versions.

4. Forthcoming changes
The fourth production release, 4.0, is scheduled for September 2001. For release 4.0, we aim to integrate more databases, with TIGRFAMS to come next. There will be a new graphical user interface for viewing protein matches. A new entry type, "site", will be introduced to replace PTM. Entries of this type will be one of three kinds: binding site, active site or PTM.

5. We need your help
We welcome any feedback. If you find errors or omissions please let us know. You can contact us at: Interhelp@ebi.ac.uk.

6. Copyright Notice
InterPro - Integrated Resource Of Protein Domains And Functional Sites
Copyright © 2001 The InterPro Consortium.

This manual and the accompanying database may be copied and redistributed freely, without advance permission, provided that this Copyright statement is reproduced with each copy.