
2Can Support Portal - Protein Function


Introduction


When we carry out sequence analysis, our aim is to find out as much as we can about our mystery sequence. The first step is always a primary database search (BLAST, FASTA) to identify sequence similarities. However, the output can be difficult to interpret, and these searches cannot answer some of the more sophisticated questions of sequence analysis. Our next step is therefore to search the secondary (also known as pattern, derived or signature) databases, which have become vital resources for predicting protein function.

These signature databases contain the results of analysing the sequences in the primary databases. Each analyses the primary data in a different way and therefore contains different information. Examples of signature databases include PROSITE, Pfam, PRINTS and InterPro. These resources have arisen from a common principle: homologous sequences are gathered together into multiple alignments, within which are conserved regions that show little or no variation between the constituent sequences. These conserved motifs usually reflect some vital biological role, and they have been exploited in different ways to build diagnostic patterns for particular protein families. For example, PROSITE stores information as regular expressions or patterns, PRINTS as aligned motifs or fingerprints, and Pfam as hidden Markov models.
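To make the idea of a regular-expression pattern concrete, here is a minimal sketch in Python that translates a PROSITE-style pattern into an ordinary regular expression and scans a sequence with it. The pattern used, [AG]-x(4)-G-K-[ST], is the PROSITE ATP/GTP-binding site motif A (P-loop). The function names prosite_to_regex and find_motif and the short test sequence are assumptions made for illustration only; this is not how the PROSITE scanning tools are implemented, and it handles only the basic pattern syntax.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string.

    Handles: x (any residue), [ABC] (allowed residues), {ABC} (forbidden
    residues), repeats such as x(4) or x(2,4), and the < and > anchors.
    """
    pattern = pattern.rstrip(".")          # patterns end with a full stop
    parts = []
    for element in pattern.split("-"):     # elements are separated by '-'
        prefix = suffix = ""
        if element.startswith("<"):        # '<' anchors to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # '>' anchors to the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (4) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                     # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # anything except these residues
        # [ABC] and single letters are already valid regex syntax.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motif(sequence: str, pattern: str):
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    p_loop = "[AG]-x(4)-G-K-[ST]."
    query = "MTEYKLVVVGAGGVGKSALTIQ"      # short illustrative sequence
    print(prosite_to_regex(p_loop))        # [AG].{4}GK[ST]
    print(find_motif(query, p_loop))       # [(10, 'GAGGVGKS')]
```

A short pattern like this one can match by chance in unrelated proteins, which is one reason the other databases encode the same conserved regions as fingerprints or hidden Markov models instead.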

We can search with our query sequence against these secondary databases, looking for matches to the patterns that they contain. Matches allow us to assign our query protein to a particular family. If the structure and function of the family are known, searches of the secondary databases can offer us a fast track for inferring biological function. Because the secondary databases are derived from multiple alignments, searches of them are often better able to identify distant relationships than the corresponding searches of the primary databases. However, the pattern databases are not complete and should therefore be used to augment primary database searches rather than to replace them.

