What are profiles?

Profiles are used to model protein families and domains. They are built by converting multiple sequence alignments into position-specific scoring matrices (PSSMs). Each amino acid at each position in the alignment is scored according to how frequently it occurs at that position, as represented in Figure 14. Substitution matrices (such as the BLOSUM matrices) can be used to add evolutionary distance weighting to these scores.
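
As a rough illustration of the frequency-based scoring described above, the sketch below builds a small PSSM from a toy alignment. It is a simplified example, not the method used by any particular database: the function names (`build_pssm`, `score_sequence`), the toy alignment, the uniform background frequency and the pseudocount of 1 are all assumptions made for illustration. Real profile methods typically also weight sequences and incorporate substitution-matrix information.

```python
import math
from collections import Counter

# The 20 amino acids commonly found in proteins (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length aligned sequences.

    Returns a list with one dict per alignment column, mapping each
    amino acid to a score. Assumptions (not from the original text):
    a uniform background frequency and a simple additive pseudocount.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for col in range(length):
        # Count residues in this column; gaps and unknown symbols are ignored.
        counts = Counter(
            seq[col] for seq in alignment if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Positive score: seen more often than background expectation.
            column_scores[aa] = math.log2(freq / background)
        pssm.append(column_scores)
    return pssm


def score_sequence(pssm, sequence):
    """Sum the per-position scores of an ungapped sequence of profile length."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the profile length")
    return sum(column[aa] for column, aa in zip(pssm, sequence))


# Hypothetical toy alignment: four sequences, four columns.
toy_alignment = ["MKVL", "MKIL", "MRVL", "MKVI"]
pssm = build_pssm(toy_alignment)
```

With this toy profile, a sequence resembling the alignment (for example `score_sequence(pssm, "MKVL")`) scores higher than an unrelated one such as `"WWWW"`, because residues that occur often in a column receive positive scores while unobserved residues receive negative ones.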


Figure 14. Representation of a scoring matrix based on a multiple sequence alignment. Each of the 20 amino acids commonly found in proteins is given a score for each position in the sequence according to the frequency with which it occurs at that position in the original alignment. Other factors, such as evolutionary distances, can also be considered.


Examples of databases that use profiles to classify proteins include CDD (Marchler-Bauer A. et al. 2015), HAMAP (Lima T. et al. 2009) and PROSITE, which produces profiles as well as patterns (Sigrist CJ. et al. 2010). The PRODOM database (Servant F. et al. 2002) uses a related approach, creating its profiles with PSI-BLAST. You can find out more about profiles by reading Gribskov M. et al. 1987.