
2Can Support Portal - Protein Function


FingerPRINTScan - Introduction


FingerPRINTScan is a tool that classifies sequences using the family definitions that are present in the PRINTS database. It is particularly good at detecting distant evolutionary relationships.

PRINTS is a database that uses multiple conserved motifs to create signatures of family membership. Within a multiple alignment, it is usual to find not one but many motifs that characterise a family, and as many of these conserved regions as possible are used to build a family signature, or fingerprint. Searching against multiple motifs is inherently more reliable than searching against single motifs, because matching a motif in the context of its natural neighbours increases the confidence you can have in that match. Searching against these multiple-motif databases therefore improves the chance of identifying a distant relative. Distant relatives may not have conserved all of the motifs that characterise a particular family. However, if they contain some of the motifs in the correct order, and the distances between those motifs are consistent with the spacing expected of true neighbouring motifs, they can be identified as a true (if distant) member of that family.

The ability of multiple-motif methods to tolerate mismatches, both at the level of individual residues within a motif and at the level of whole motifs, significantly increases their sensitivity. Closely related sequences will match all the constituent motifs of a given fingerprint. Fragments and more distant relatives will often lack parts of the fingerprint and make only partial matches, because they fail to match one or more of the motifs significantly. It is important to be able to distinguish these genuine partial matches from false-positive matches.

How FingerPRINTScan works

The question we are looking to answer with FingerPRINTScan is: "Does my query sequence belong to a family described by a pre-defined fingerprint in the PRINTS database?" The algorithm identifies fragments within a query sequence that match a database of motifs. A motif, or motif profile, consists of a sequential stack of amino acid residues that have been excised from a multiple alignment.
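The idea of a motif profile as a stack of aligned residues can be sketched in a few lines of Python. This is a minimal illustration only. It assumes a simple log-odds scheme with a uniform background (1/20 per amino acid) and add-one pseudocounts. The function names `build_profile` and `score_fragment` are hypothetical, and PRINTS and FingerPRINTScan do not necessarily score motifs this way.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(fragments, pseudocount=1.0):
    """Build a position-specific log-odds profile from equal-length motif fragments.

    Hypothetical scheme: uniform background frequency and additive pseudocounts.
    """
    length = len(fragments[0])
    if any(len(f) != length for f in fragments):
        raise ValueError("motif fragments must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(fragments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        # One column of the stack: the residues found at this motif position.
        counts = Counter(f[pos] for f in fragments)
        column = {
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        profile.append(column)
    return profile


def score_fragment(profile, fragment):
    """Sum per-position log-odds scores; residues outside the 20-letter alphabet score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, fragment))


if __name__ == "__main__":
    # Hypothetical fragments excised from the same motif in three aligned sequences.
    profile = build_profile(["GASGS", "GVSGS", "GASGA"])
    print(score_fragment(profile, "GASGS"), score_fragment(profile, "WWWWW"))
```

A fragment that resembles the excised residues scores above zero, and an unrelated one scores below zero. The pseudocounts keep residues that were never observed from producing an infinite penalty.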

The method
  • A sliding window is used to interrogate a query sequence and each fragment revealed is scored against the motif profile.

  • This scoring process is repeated for every motif in the database, generating a list of matches.

  • The list is then turned into a model that describes the relative positions of the matches within the sequence. An important characteristic of motifs is that they occur in a specific order, defined by the parent alignment. Therefore, to match a fingerprint, the order of its constituent motifs must be preserved.

  • Checking the order is straightforward when a query matches all the motifs.

  • Partial matches, however, match only some of the motifs.

  • The strict ordering rule is not relaxed for these partial matches: the motifs that do match must still appear in the order defined by the fingerprint.

  • The distances between the various motifs are also measured. These intervals are naturally flexible, but upper and lower bounds can be derived from the parent alignments. These limits can be used to filter out matches with unlikely distances between adjacent motifs.

  • Additional criteria are used to discard matches with forbidden overlaps.

  • This model identifies the fingerprint most likely to match a query sequence, using both the magnitude of the scores and the available biological context. The significance of a match is expressed as the product of the P values of its individual motif matches. This combined P value, p, is then converted to an E value by multiplying it by the size of the primary database, D (E = pD).
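The steps above can be strung together in a short Python sketch. It is not the FingerPRINTScan implementation. To keep it self-contained, it makes several hypothetical assumptions:

- Each motif is reduced to a consensus string.
- A window's score is its number of identical residues.
- The chance probability of that score follows a binomial model with a 1/20 match probability per position.

The interval bounds (`gaps`) are the minimum and maximum start-to-start distances between neighbouring motifs. All function names here are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    motif: int      # index of the motif in the fingerprint's alignment order
    start: int      # 0-based start of the matching window in the query
    p_value: float  # chance probability of a score at least this good


def window_p_value(identities, length, p_match=0.05):
    """P(>= identities matches by chance), assuming independent residues that
    each match the consensus with probability p_match (a simplification)."""
    return sum(math.comb(length, i) * p_match**i * (1 - p_match) ** (length - i)
               for i in range(identities, length + 1))


def scan(query, motifs, max_p=1e-3):
    """Slide a window of each motif's length along the query; keep significant windows."""
    hits = []
    for m, consensus in enumerate(motifs):
        width = len(consensus)
        for start in range(len(query) - width + 1):
            window = query[start:start + width]
            identities = sum(a == b for a, b in zip(window, consensus))
            p = window_p_value(identities, width)
            if p <= max_p:
                hits.append(Hit(m, start, p))
    return hits


def compatible(a, b, motifs, gaps):
    """b may follow a only if motif order is preserved, the two matches do not
    overlap, and the start-to-start distance lies within the summed bounds."""
    if b.motif <= a.motif or b.start < a.start + len(motifs[a.motif]):
        return False
    lo = sum(gaps[i][0] for i in range(a.motif, b.motif))
    hi = sum(gaps[i][1] for i in range(a.motif, b.motif))
    return lo <= b.start - a.start <= hi


def best_chain(hits, motifs, gaps):
    """Dynamic programming: find the ordered, distance-consistent chain of hits
    with the smallest product of p-values (largest sum of -log p).
    Partial chains that skip motifs are allowed."""
    hits = sorted(hits, key=lambda h: (h.motif, h.start))
    best = {}  # hit -> (sum of -log p, chain ending at this hit)
    for h in hits:
        own = -math.log(h.p_value)
        score, chain = own, [h]
        for g in hits:
            if g in best and compatible(g, h, motifs, gaps):
                prev_score, prev_chain = best[g]
                if prev_score + own > score:
                    score, chain = prev_score + own, prev_chain + [h]
        best[h] = (score, chain)
    return max(best.values(), key=lambda sc: sc[0])[1] if best else []


def e_value(chain, db_size):
    """E = p * D, where p is the product of the individual motif p-values."""
    return math.prod(h.p_value for h in chain) * db_size


if __name__ == "__main__":
    motifs = ["GASGS", "KWDLE", "HPYRC"]  # hypothetical three-motif fingerprint
    gaps = [(8, 15), (10, 20)]            # hypothetical start-to-start bounds
    query = "MM" + "GASGS" + "TTTTTT" + "KWDLE" + "AAAAAAAA" + "HPYRC" + "LL"
    chain = best_chain(scan(query, motifs), motifs, gaps)
    print([(h.motif, h.start) for h in chain], e_value(chain, db_size=100_000))
```

Dynamic programming over the hits is one simple way to enforce the strict ordering rule while still accepting partial matches. A chain may skip a motif, but every motif it keeps must follow its predecessor in fingerprint order, without overlap and within the summed distance bounds.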

We will next look at an example of using the FingerPRINTScan tool.


