Clustering of entries
Structural properties of proteins are often more conserved than sequence. Therefore, a single profile HMM is often insufficient to model an entire, diverse superfamily of structurally related proteins. Pfam therefore provides a hierarchical level of classification that groups evolutionarily related entries into sets, termed Clans.
The relationship between entries in a Clan may be defined by:
- sequence similarity (whilst still originating from a common ancestor)
- similarity of known three-dimensional structures
- functional similarity
- and/or similarity between their profile HMMs, as determined by algorithms such as HHsearch
While the majority of Pfam Clans are groupings of domains and families, they have also been used to group Repeat-type entries. Clans such as the Pentapeptide repeats include profile HMMs (Pfam families) composed of different numbers of repeat units that are known to share a common structural topology (see Figure 6). In other cases, Clans can include repeat units as diverse as the Tetratricopeptide-like repeats (TPR), which differ not only in sequence length but also in overall structure (whilst maintaining the same fold). Consequently, Pfam entries representing repeats within a Clan may be very different from one another.

Figure 6. Diverse Pfam families from the same clan, CL0020, composed of different numbers of repeat units. A) PF03377 represents a single repeat unit in B2SU53; as seen in the structure 3ugm, multiple copies of the Pfam entry cover most of the protein. B) PF14853 represents the C-terminal region of Q9Y3D6, as exemplified in 1pc2. C) PF12688 represents most of the length of the protein sequence U5W915, as illustrated in its AlphaFold structure prediction.
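The many-to-one relationship between families and a Clan can be pictured as a simple lookup. The following is a minimal sketch, assuming a hand-written mapping built only from the three accessions named in the Figure 6 caption; the dictionary layout and the helper names `group_by_clan` and `same_clan` are hypothetical and do not reflect Pfam's own file formats or API.

```python
from collections import defaultdict

# Family-to-clan assignments copied from the Figure 6 caption.
# The layout is an illustrative assumption, not a Pfam data format.
FAMILY_TO_CLAN = {
    "PF03377": "CL0020",
    "PF14853": "CL0020",
    "PF12688": "CL0020",
}


def group_by_clan(family_to_clan):
    """Invert a family -> clan mapping into clan -> sorted list of families."""
    clans = defaultdict(list)
    for family, clan in sorted(family_to_clan.items()):
        clans[clan].append(family)
    return dict(clans)


def same_clan(family_a, family_b, family_to_clan):
    """Return True only if both families are assigned to the same clan."""
    clan_a = family_to_clan.get(family_a)
    return clan_a is not None and clan_a == family_to_clan.get(family_b)


if __name__ == "__main__":
    for clan, families in group_by_clan(FAMILY_TO_CLAN).items():
        print(clan, families)
```

A family that is absent from the mapping is treated as belonging to no clan, so `same_clan` returns `False` for it rather than raising an error.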
| As you may be aware, Pfam is a member of the InterPro member database consortium, and Pfam families are integrated into InterPro entries. However, the type of the InterPro entry into which a Pfam family is integrated may differ from the Pfam entry type, because the definitions of the entry types differ between Pfam and InterPro in some cases. Examples are PF04078 (integrated into a Family InterPro entry) and PF02259 (integrated into a Domain InterPro entry). More information is available in the InterPro entry type definitions. |