Protein classification allows functional and structural properties to be inferred for novel proteins that have not been experimentally characterised.

Proteins can be classified according to the family to which they belong, and/or the domains and features they contain:

  • A protein family is a group of proteins that share a common evolutionary origin reflected by their related functions and similarities in sequence and/or structure.
  • Domains are distinct functional and/or structural units in a protein that can exist in a variety of biological contexts.
  • Sequence features include active sites, binding sites, post-translational modification sites and repeats.

Signatures are mathematical models constructed from multiple sequence alignments that can be used to classify proteins.
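
As a rough illustration of how a model can be derived from an alignment, the sketch below builds a simple position-specific scoring matrix (one kind of profile) and uses it to scan a sequence. The toy alignment, the uniform background frequencies, the pseudocount and the function names are all assumptions made for this example; real signature databases use gapped alignments and more sophisticated statistics.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per column of an ungapped alignment."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, segment):
    """Sum the column scores for a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))

def best_hit(profile, sequence):
    """Return (score, start) of the best-scoring window in the sequence."""
    width = len(profile)
    return max(
        (score(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical, hand-made alignment of a short conserved region.
toy_alignment = ["GDSGG", "GDSGG", "GDSGA", "GESGG"]
profile = build_profile(toy_alignment)
hit_score, hit_start = best_hit(profile, "MKTAGDSGGPLV")
print(hit_start, round(hit_score, 2))
```

Because every column contributes to the score, a sequence that differs from the alignment at one position can still score well, which is one reason such models can detect more distant relatives than a single pairwise comparison.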

Using protein signatures is often a more sensitive way of identifying protein function than pairwise sequence similarity searches, such as BLAST.

Different types of signatures use different methods, focussing on single motifs (patterns), multiple motifs (fingerprints) or the whole alignment (profiles and hidden Markov models, HMMs). Each method offers distinct advantages for protein sequence analysis, and signatures can be used to classify proteins into families or to identify domains or sequence features.
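
The simplest of these signature types, a single-motif pattern, behaves much like a regular expression. The sketch below translates a simplified PROSITE-style pattern into a Python regular expression and searches a sequence with it. The pattern, the test sequence and the helper names are illustrative assumptions, and only a subset of the pattern syntax is handled (residues, `[allowed]` sets, `{forbidden}` sets, `x` wildcards and repeat counts).

```python
import re

# One pattern element: a residue, an [allowed] set, a {forbidden} set or x (any
# residue), optionally followed by a repeat count such as (3) or (2,4).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def pattern_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                          # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # any residue except these
        if low and high:
            residue += "{" + low + "," + high + "}"
        elif low:
            residue += "{" + low + "}"
        parts.append(residue)
    return "".join(parts)

def find_matches(pattern, sequence):
    """Return (start, end) spans of non-overlapping matches in the sequence."""
    regex = pattern_to_regex(pattern)
    return [(m.start(), m.end()) for m in re.finditer(regex, sequence)]

# Illustrative pattern and made-up sequence, not taken from any database entry.
PATTERN = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
SEQUENCE = "MK" + "CAAC" + "AAA" + "L" + "A" * 8 + "HAAAH" + "GG"

print(pattern_to_regex(PATTERN))
print(find_matches(PATTERN, SEQUENCE))
```

A pattern either matches or it does not, so a single substitution at a constrained position removes the hit entirely. Fingerprints, profiles and HMMs relax this by combining evidence across several motifs or across the whole alignment.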

The EBI provides InterPro, a resource for protein family classification and for domain and site prediction using protein signatures. InterPro combines signatures from multiple, diverse source databases into a single searchable resource.