What is InterPro?
Why do we need InterPro?
Classifying proteins into families and identifying important domains and sites is invaluable for helping biologists to identify distantly related proteins and to predict their functions. A daunting array of resources, each with different strengths and weaknesses, is now available for searching genomes and proteomes with ‘protein signatures’ – diagnostic entities that are used to recognise a particular domain or family. InterPro combines protein signatures from multiple, diverse databases into a single searchable resource, capitalising on their individual strengths to produce a powerful integrated database and diagnostic tool for protein sequence classification.
InterPro is used in the large-scale analysis of whole proteomes, genomes and metagenomes, as well as for the characterisation of individual proteins. Within the EBI, we use InterPro to help us annotate protein sequences in UniProtKB.
InterPro has 11 member databases, each of which uses a different method to classify proteins. InterPro curators manually integrate protein signatures from member databases, merging signatures that represent the same protein family, domain or site into single InterPro entries. Where possible, they also trace biological relationships between entries. They check the biological accuracy of the individual signatures and add pertinent information, including consistent entry names, descriptive abstracts, links to the biomedical literature and Gene Ontology terms. Links are also made to various other databases, such as MEROPS, IntAct, ENZYME and PDB. Figure 1 provides an overview of the data sources used to construct InterPro.
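The integration step described above can be pictured as a simple grouping: signatures from different member databases that represent the same family, domain or site end up under one InterPro entry. The sketch below is only an illustration of that idea, not how InterPro is implemented. The names `Signature` and `group_by_entry`, and all accessions and database names in it, are hypothetical placeholders. The assignment of a signature to an entry stands in for the curators' manual decision.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Signature:
    """A member-database signature (all values here are made-up placeholders)."""
    accession: str          # hypothetical signature accession
    member_db: str          # hypothetical member database name
    entry: Optional[str]    # entry chosen by a curator, or None if not yet integrated


def group_by_entry(signatures: List[Signature]) -> Dict[str, List[Signature]]:
    """Collect integrated signatures under the entry they were assigned to."""
    entries: Dict[str, List[Signature]] = defaultdict(list)
    for sig in signatures:
        if sig.entry is not None:  # unintegrated signatures are skipped
            entries[sig.entry].append(sig)
    return dict(entries)


# Placeholder data: two databases describe the same domain (ENTRY_1),
# a third describes something else, and one signature is not integrated.
signatures = [
    Signature("SIG_A1", "DatabaseA", "ENTRY_1"),
    Signature("SIG_B7", "DatabaseB", "ENTRY_1"),
    Signature("SIG_C3", "DatabaseC", "ENTRY_2"),
    Signature("SIG_A9", "DatabaseA", None),
]

grouped = group_by_entry(signatures)
for entry, sigs in grouped.items():
    print(entry, [f"{s.member_db}:{s.accession}" for s in sigs])
```

In this toy model a search that matches either `SIG_A1` or `SIG_B7` would be reported under the same entry, which is the practical benefit of merging equivalent signatures.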