InterPro is used to classify proteins into families and to predict the presence of domains and functionally important sites. The project integrates signatures from 14 major protein signature databases: CATH-Gene3D, CDD, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE (patterns and profiles), SFLD, SMART, SUPERFAMILY and TIGRFAMs. The diversity of databases helps to ensure that annotations are as comprehensive as possible. Furthermore, the different databases offer complementary levels of protein classification, from broad-level (e.g., a protein is a member of a superfamily) to more fine-grained assignments (e.g., a protein is a member of a specific family, or possesses a particular type of domain). InterPro uses these different levels of granularity to produce a hierarchical classification system: one or more member database signatures are integrated into an InterPro entry and, where appropriate, relationships are highlighted between different entries, identifying those that represent smaller, functionally specific subsets of a broader entry.
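The hierarchy described above can be pictured as a small data structure. The following is a minimal sketch only, not InterPro's actual data model or API: the `Entry` class, its fields, and all accessions and names (`ENTRY_1`, `SIG_A1`, and so on) are hypothetical placeholders chosen for illustration. It shows one entry grouping several member database signatures and a more specific entry pointing to a broader parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for an integrated entry (not InterPro's real schema)."""
    accession: str                      # placeholder identifier, not a real accession
    name: str
    signatures: List[str] = field(default_factory=list)  # member database signatures
    parent: Optional["Entry"] = None    # broader entry this one is a subset of, if any

    def lineage(self) -> List[str]:
        """Return accessions from the broadest ancestor down to this entry."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.accession)
            node = node.parent
        return list(reversed(chain))


# A broad entry built from two signatures (made-up identifiers).
superfamily = Entry("ENTRY_1", "Example superfamily", ["SIG_A1", "SIG_B1"])

# A narrower, functionally specific entry that is a subset of the broad one.
family = Entry("ENTRY_2", "Example family", ["SIG_C1"], parent=superfamily)

print(family.lineage())
```

Walking the `parent` links from the specific entry upwards recovers the path from broad to fine-grained classification, which is the relationship the text describes between entries.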
InterPro has a number of important applications, including the automatic annotation of proteins for UniProtKB/TrEMBL and genome annotation projects. InterPro is used by Ensembl and in the GOA project to provide large-scale mapping of proteins to GO terms. InterProScan, the software that scans sequences against the InterPro member database signatures, also forms a core component of the EBI Metagenomics analysis pipeline.