InterPro is used to classify proteins into families and predict the presence of domains and functionally important sites. The project integrates signatures from 11 major protein signature databases: CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE (patterns and profiles), SMART, SUPERFAMILY and TIGRFAMs. During the integration process, InterPro rationalises instances where more than one protein signature describes the same protein family or domain, uniting these into a single InterPro entry and noting relationships between entries where applicable.
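The idea of uniting several member-database signatures under one entry can be illustrated with a minimal sketch. It is not InterPro's actual integration procedure, which involves manual curation. All identifiers (`SIG_A1`, `ENTRY_1`, and so on) and the function name `group_by_entry` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records: (signature_id, member_database, entry_id).
# The IDs are placeholders, not real accessions.
SIGNATURES = [
    ("SIG_A1", "Pfam", "ENTRY_1"),
    ("SIG_B7", "SMART", "ENTRY_1"),
    ("SIG_C3", "PROSITE", "ENTRY_1"),
    ("SIG_D9", "PANTHER", "ENTRY_2"),
]


def group_by_entry(signatures):
    """Collect signatures that describe the same family or domain
    under the single entry they have been assigned to."""
    entries = defaultdict(list)
    for signature_id, database, entry_id in signatures:
        entries[entry_id].append((database, signature_id))
    return dict(entries)


entries = group_by_entry(SIGNATURES)
for entry_id, members in sorted(entries.items()):
    print(entry_id, members)
```

In this toy data set, three signatures from different member databases end up under one entry, while the fourth stands alone in its own entry.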
InterPro adds biological annotation and links to external databases such as GO, PDB, SCOP and CATH. It precomputes all matches of its signatures to UniProt Archive (UniParc) proteins using the InterProScan software, making the data available in a variety of machine-readable formats and via web-based graphical interfaces. This data is updated and incorporated into each UniProtKB release.
InterPro has a number of important applications, including the automatic annotation of proteins for UniProtKB/TrEMBL and genome annotation projects. InterPro is used by Ensembl and in the GOA project to provide large-scale mapping of proteins to GO terms. InterProScan also forms a core component of the EBI Metagenomics analysis pipeline.
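Large-scale mapping of proteins to GO terms can be pictured as a two-step lookup: a protein matches one or more entries, and each entry may carry GO terms. The sketch below assumes that simplified model. The protein, entry and GO identifiers, and the function name `infer_go_terms`, are hypothetical placeholders rather than real accessions or an actual InterPro or GOA interface.

```python
# Hypothetical entry -> GO term associations (placeholder IDs).
ENTRY_TO_GO = {
    "ENTRY_1": {"GO_TERM_X", "GO_TERM_Y"},
    "ENTRY_2": {"GO_TERM_Y"},
}

# Hypothetical protein -> matched entries, as a precomputed scan might report.
PROTEIN_MATCHES = {
    "PROT_1": ["ENTRY_1", "ENTRY_2"],
    "PROT_2": ["ENTRY_3"],  # this entry has no GO terms attached
}


def infer_go_terms(protein_matches, entry_to_go):
    """Return, for each protein, the union of GO terms attached to
    the entries it matches. Entries without GO terms contribute nothing."""
    result = {}
    for protein, matched_entries in protein_matches.items():
        terms = set()
        for entry_id in matched_entries:
            terms |= entry_to_go.get(entry_id, set())
        result[protein] = terms
    return result


annotations = infer_go_terms(PROTEIN_MATCHES, ENTRY_TO_GO)
for protein, terms in sorted(annotations.items()):
    print(protein, sorted(terms))
```

A protein whose matched entries carry no GO terms simply receives an empty set, so absence of annotation is distinguishable from absence of the protein.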