What is InterPro?


Why do we need InterPro?

Classifying proteins into families and identifying important domains and sites is invaluable for helping biologists to identify distantly related proteins and to predict their functions. A daunting array of resources, each with different strengths and weaknesses, is now available for searching genomes and proteomes with ‘protein signatures’ – diagnostic entities that are used to recognise a particular domain or family. InterPro combines protein signatures from multiple, diverse databases into a single searchable resource, capitalising on their individual strengths to produce a powerful integrated database and diagnostic tool for protein sequence classification.

InterPro applications

InterPro is used in the large-scale analysis of whole proteomes, genomes and metagenomes, as well as for the characterisation of individual proteins. Within the EBI, we use InterPro to help us annotate protein sequences in UniProtKB.

InterPro data

InterPro has 13 member databases, each of which uses a different method to classify proteins. InterPro curators manually integrate protein signatures from member databases, merging signatures that represent the same protein family, domain or site into single InterPro entries. Where possible, they also trace biological relationships between entries. They check the biological accuracy of the individual signatures and add pertinent information, including consistent entry names, descriptive abstracts, links to the biomedical literature and Gene Ontology terms. Links are also made to various other databases, such as MEROPS, IntAct, ENZYME and PDB. Figure 1 provides an overview of the data sources used to construct InterPro.

Figure 1. An overview of the data sources used to construct InterPro.
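The integration step described above can be pictured as a simple grouping: signatures from different member databases that describe the same family, domain or site end up under one InterPro entry. The sketch below is only an illustration of that data model, not how InterPro is built. In practice the merging is done manually by curators, whereas here a shared `target` label stands in for their judgement. The class names, the `integrate` function, the `SIG…` accessions and the example targets are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Signature:
    accession: str  # hypothetical identifier, not a real accession
    database: str   # member database that supplied the signature
    target: str     # family, domain or site the signature represents


@dataclass
class Entry:
    name: str
    signatures: list = field(default_factory=list)

    def databases(self):
        """Return the member databases contributing to this entry."""
        return sorted({s.database for s in self.signatures})


def integrate(signatures):
    """Group signatures that represent the same target into one entry.

    Assumption: a shared 'target' label replaces the manual curation
    step, purely for illustration.
    """
    grouped = defaultdict(list)
    for sig in signatures:
        grouped[sig.target].append(sig)
    return {t: Entry(name=t, signatures=sigs) for t, sigs in grouped.items()}


# Made-up signatures; only the database names come from the member list.
signatures = [
    Signature("SIG0001", "Pfam", "Example kinase domain"),
    Signature("SIG0002", "SMART", "Example kinase domain"),
    Signature("SIG0003", "PROSITE", "Example kinase domain"),
    Signature("SIG0004", "PANTHER", "Example transporter family"),
]

entries = integrate(signatures)
for entry in entries.values():
    print(entry.name, entry.databases())
```

Each resulting entry keeps its contributing signatures, so a single entry can draw on the strengths of several member databases at once.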

Member databases



The following databases contribute data to InterPro:

  • CDD at NCBI, Bethesda, USA
  • CATH-Gene3D at University College London, UK
  • PANTHER at University of Southern California, CA, USA
  • PIRSF at the Protein Information Resource, Georgetown University Medical Center, Washington DC, USA
  • Pfam at EMBL-EBI, Hinxton, UK
  • PRINTS at the University of Manchester, UK
  • ProDom at PRABI Villeurbanne, France
  • PROSITE and HAMAP at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland
  • SFLD at the University of California, San Francisco, USA
  • SMART at EMBL, Heidelberg, Germany
  • SUPERFAMILY at the University of Bristol, UK
  • TIGRFAMs at the J. Craig Venter Institute, Rockville, MD, USA