What are protein signatures?
To classify proteins into families and to predict the presence of important domains or sequence features, we need computational tools. One such set of tools is the group of predictive models known as protein signatures.
There are different types of signatures, built using different computational approaches. However, their common starting point is a multiple sequence alignment of proteins sharing a set of characteristics (e.g. belonging to the same family or sharing a domain) (see Figure 10 below). When the initial model is built, the level of amino acid conservation at each position in the alignment is taken into account. The model is then used to search a protein database iteratively, and it is refined as more distantly related sequences in the database are identified. Once the model is mature, the signature is ready and can be used for protein sequence analysis.
Figure 10. The process of building a protein signature starts with a multiple sequence alignment, which is used to build a predictive model. By searching a protein database in an iterative way, more distantly related sequences can be identified. This information is used to create a final mature model.
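The build-search-refine cycle described above can be illustrated with a toy sketch. It is not how any real signature database builds its models: real signatures use more sophisticated methods and handle gaps, sequence weighting and statistical significance. The sketch assumes an ungapped seed alignment, a uniform background frequency, a simple log-odds score per column, and a fixed score threshold. All names (`build_profile`, `best_window`, `refine_signature`) and the example sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for col in range(len(alignment[0])):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(s[col] for s in alignment if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def best_window(profile, sequence):
    """Slide the profile along a sequence; return (best score, best window)."""
    width = len(profile)
    best = (float("-inf"), "")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Symbols outside the 20 standard residues contribute a score of 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if score > best[0]:
            best = (score, window)
    return best

def refine_signature(seed_alignment, database, threshold, max_rounds=5):
    """Search the database repeatedly, adding new hits to the model each round."""
    members = list(seed_alignment)
    for _ in range(max_rounds):
        profile = build_profile(members)
        new_hits = []
        for seq in database:
            score, window = best_window(profile, seq)
            is_new = window not in members and window not in new_hits
            if score >= threshold and is_new:
                new_hits.append(window)
        if not new_hits:
            break  # no new sequences recruited: treat the model as mature
        members.extend(new_hits)
    return build_profile(members), members

# Hypothetical seed alignment and database, for illustration only.
seed = ["ACDE", "ACDE", "ACDQ"]
database = ["GGACDEGG", "WWWWWWWW", "KKACNEKK"]
final_profile, members = refine_signature(seed, database, threshold=4.0)
print(members)
```

In this example the more divergent match `ACNE` scores above the threshold in the first round and is added to the model, while the unrelated sequence is never recruited. The second round finds nothing new, so the loop stops and the final profile is returned.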