Sequence motifs

     Searching a structural database by sequence is a major challenge. Many tools perform fast sequence matching; however, few support ProSite patterns, and most are optimized for long sequences and perform poorly on short ones.

     This search complements the ligand environment search and can be combined with it. The pattern itself is a sequence motif that characterises properties of a protein chain. Matching and searching for these motifs can bring new content to familiar PDB entries.

    The pattern search is intended to answer the following questions:
  1. Which PDB entries contain protein chains whose sequence matches the given signature?
  2. Which PDB entries contain proteins that interact with a small molecule through residues that are part of the pattern?

In the MSDsite search interface, the pattern can be specified either as a ProSite code (e.g. PS00075) or as a motif written in ProSite pattern syntax.

Motif example:

P-x-[STA]-x-[LIV]-[IVT]-x-[GS]-G-Y-S-[QL]-G
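To show how a ProSite pattern such as the one above corresponds to an ordinary regular expression, here is a minimal Python sketch. The function name `prosite_to_regex` is our own, not part of MSDsite. It assumes the common ProSite elements (x, [..] and {..} classes, repeat counts such as x(3) or x(4,5), and the < and > terminal anchors) and does not cover every corner of the syntax, for example > inside brackets.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a ProSite pattern into a Python regular expression (sketch).

    prosite_to_regex is a hypothetical helper, not part of MSDsite.
    Supported elements: x (any residue), [ABC] (any of), {ABC} (none of),
    repeat counts x(3) or x(4,5), and the '<' / '>' terminal anchors.
    """
    pattern = pattern.strip().rstrip(".")  # ProSite patterns may end with '.'
    parts = []
    for element in pattern.split("-"):
        # Split e.g. "x(4,5)" into the residue part "x" and repeat bounds 4 and 5.
        m = re.fullmatch(r"(<?)(.+?)(?:\((\d+)(?:,(\d+))?\))?(>?)", element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        n_term, residue, lo, hi, c_term = m.groups()
        if residue == "x":
            regex = "."
        elif residue.startswith("{") and residue.endswith("}"):
            regex = "[^" + residue[1:-1] + "]"
        else:
            regex = residue  # a single letter or an [ABC] class is already valid regex
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if n_term else "") + regex + ("$" if c_term else ""))
    return "".join(parts)


motif = "P-x-[STA]-x-[LIV]-[IVT]-x-[GS]-G-Y-S-[QL]-G"
regex = prosite_to_regex(motif)
print(regex)                                    # P.[STA].[LIV][IVT].[GS]GYS[QL]G
print(bool(re.search(regex, "PASALIAGGYSQG")))  # True (synthetic test sequence)
```

`re.search` finds the motif anywhere in a chain; only the < and > anchors tie a match to the chain ends.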

A short sequence can be specified instead of a pattern. The only requirement is that residues be separated by dashes (-). A search by a short sequence can thus reveal its binding properties.
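Converting a copied sequence into this dash-separated form is easy to script. A minimal sketch in Python; `sequence_to_pattern` is a hypothetical helper and assumes one-letter residue codes:

```python
def sequence_to_pattern(seq: str) -> str:
    """Turn a one-letter sequence into the dash-separated form used for pattern search.

    sequence_to_pattern is a hypothetical helper, not part of MSDsite.
    """
    residues = "".join(seq.split()).upper()  # drop spaces/newlines left over from copy-paste
    if not residues.isalpha():
        raise ValueError("expected one-letter residue codes only")
    return "-".join(residues)


print(sequence_to_pattern("LPAKVRPLPGR"))  # L-P-A-K-V-R-P-L-P-G-R
```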

Below we give two examples, one for each of these questions, followed by one more complex example.

Browsing the results of the ligand search from the "Ligand search" chapter of the tutorial, where the ligand is the anticancer drug residue MTX (methotrexate), we find that the interacting ProSite pattern is PS00075. See the ProSite entry PS00075 for more information about this pattern.
In short, it is the dihydrofolate reductase signature.
Motif: [LVAGC]-[LIF]-G-x(4)-[LIVMF]-P-W-x(4,5)-[DE]-x(3)-[FYIV]-x(3)-[STIQ]
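The repeat counts in PS00075, the fixed x(4) and the variable-length x(4,5), map directly onto regex quantifiers. The sketch below writes the pattern out by hand as a Python regex and scans a chain for hits, reporting 1-based positions. The helper `find_motif` and the demo sequence are made up for illustration: the sequence is synthetic, not a real dihydrofolate reductase chain, and positions count from the start of the given string, which need not coincide with PDB residue numbering.

```python
import re

# PS00075 (dihydrofolate reductase signature) translated by hand into a Python regex:
# [LVAGC]-[LIF]-G-x(4)-[LIVMF]-P-W-x(4,5)-[DE]-x(3)-[FYIV]-x(3)-[STIQ]
PS00075_REGEX = re.compile(
    "[LVAGC][LIF]G"  # [LVAGC]-[LIF]-G
    ".{4}"           # x(4): exactly four residues of any type
    "[LIVMF]PW"      # [LIVMF]-P-W
    ".{4,5}"         # x(4,5): four or five residues of any type
    "[DE].{3}"       # [DE]-x(3)
    "[FYIV].{3}"     # [FYIV]-x(3)
    "[STIQ]"         # [STIQ]
)


def find_motif(seq: str, regex: re.Pattern = PS00075_REGEX) -> list[tuple[int, int, str]]:
    """Return (start, end, matched residues) per non-overlapping hit, 1-based and inclusive.

    find_motif is a hypothetical helper; positions count from the start of `seq`.
    """
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(seq)]


# Synthetic sequence built to contain one PS00075 hit (not a real DHFR chain).
demo = "MK" + "LLGAAAALPWAAAAADAAAFAAAS" + "KK"
print(find_motif(demo))  # [(3, 26, 'LLGAAAALPWAAAAADAAAFAAAS')]
```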

Now we can learn more about the proteins whose sequences match the pattern.

Which PDB entries contain protein chains whose sequence matches PS00075?

    Steps:
  1. In your browser go to: http://www.ebi.ac.uk/msd-srv/msdsite/
  2. Type PS75 in the "ProSite pattern/code" field of the search form.
  3. Click the "Search..." button to get back a list of PDB entries.
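Here PS75 is a short form of the accession PS00075 (ProSite accessions are "PS" followed by five digits). If you prepare codes in a script, a hypothetical helper like the one below expands the short form; it assumes, as the tutorial implies, that MSDsite treats both forms alike.

```python
import re


def normalize_prosite_code(code: str) -> str:
    """Expand a short ProSite code such as 'PS75' into the full accession 'PS00075'.

    normalize_prosite_code is a hypothetical helper, not an MSDsite function.
    """
    m = re.fullmatch(r"PS0*(\d{1,5})", code.strip().upper())
    if m is None:
        raise ValueError(f"not a ProSite pattern code: {code!r}")
    return f"PS{int(m.group(1)):05d}"  # pad the number to five digits


print(normalize_prosite_code("PS75"))  # PS00075
```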

Which PDB entries contain protein chains whose sequence matches PS00075 and whose matched residues interact with the anticancer drug residue MTX?

    Steps:
  1. In your browser go to: http://www.ebi.ac.uk/msd-srv/msdsite/
  2. Type MTX in the "Hetero" field of the search form.
  3. Type PS75 in the "ProSite pattern/code" field of the search form.
  4. Select the "Interacts" checkbox, located to the right of the "ProSite pattern/code" field of the search form.

  5. Click the "Search..." button to get back a list of PDB entries.
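The "Interacts" option restricts the result to entries whose matched residues are in contact with the hetero group. Our reading of that logic (an assumption, not MSDsite's actual implementation) is sketched below: an entry passes if at least one residue contacting the ligand lies inside a pattern match. `ChainHit`, `interacts`, `filter_interacting` and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChainHit:
    """Hypothetical record: one chain's pattern matches plus its ligand-contacting residues."""
    entry: str
    match_spans: list[tuple[int, int]]  # 1-based inclusive (start, end) of each pattern match
    ligand_contacts: set[int]           # residue positions in contact with the ligand (e.g. MTX)


def interacts(hit: ChainHit) -> bool:
    """True if at least one ligand-contacting residue lies inside a pattern match."""
    return any(start <= pos <= end
               for start, end in hit.match_spans
               for pos in hit.ligand_contacts)


def filter_interacting(hits: list[ChainHit]) -> list[str]:
    """Keep only the entries that pass the interaction test."""
    return [h.entry for h in hits if interacts(h)]


# Toy data for illustration only (not real PDB contacts).
hits = [
    ChainHit("entryA", [(27, 50)], {30, 57}),  # contact 30 falls inside the match
    ChainHit("entryB", [(27, 50)], {5, 57}),   # all contacts lie outside the match
    ChainHit("entryC", [], {30}),              # no pattern match at all
]
print(filter_interacting(hits))  # ['entryA']
```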

Which PDB entries contain proteins that interact with a small molecule through residues matched by a pattern that is specific to the anticancer drug residue MTX?

Assume that we want to devise a pattern that is specific to a ligand type such as MTX or similar ligands. In general this is a complex task that demands a lot of work, so here we consider a simplified version to demonstrate the help a researcher can get from MSDsite.

    To solve the task, we go through the following steps:
  1. We use the results of the previous example: the list of PDB entries that match the PS00075 pattern and in which some of the matched residues interact with MTX.
  2. In this list, we choose PDB entry 1df7 by clicking the link in the "ID" column.
  3. From the "sequence" detail page, we take the part of the sequence that belongs to chain A.

    The part of interest starts after sequence number 40. Within this part of the sequence, MTX interacts with two protein residues, proline and arginine. We select the part of the sequence starting one residue before the proline and ending at the arginine: LPAKVRPLPGR.
    We make the selection by dragging the mouse with the left button held down.

    Now we have a short sequence of interest: LPAKVRPLPGR

    To search by this sequence in MSDsite, we have to transform it into the ProSite-compliant pattern format:
    L-P-A-K-V-R-P-L-P-G-R

    Next we search by this pattern: we enter it into the "ProSite pattern/code" field of the start search form and press the "Search..." button.

    The result shows that the pattern is rather strict: it was found in only a few PDB entries.

      To generalize the pattern, we look for similar interactions of MTX with protein sequences:
    • Within the pattern, MTX interacts with arginine and proline. So we go to the MSDsite start page and search PDBe with "Hetero" = MTX, "Environment" = ARG PRO and "ProSite pattern/code" = PS00075.
    • The search returns 6 PDB entries: 1df7, 1dra, 3cd2, 3dfr, 3drc, 4dfr

    • In the result list, we select PDB entry 4dfr and open its details by clicking the link in the "ID" column.
    • On the detail page, the part of interest starts after sequence number 41. Within this part of the sequence, MTX interacts with three protein residues: arginine, proline and arginine. We select the part of the sequence that is similar to the previous one, WESIGRPLPGR, and copy it into a separate file (or write it down on paper) below the sequence from 1df7.
    • We do the same with the other 4 PDB entries and then convert the sequences into ProSite pattern format by inserting dashes between the letters. The result looks like this:
      MTX     *       *     *   *
      1df7: L-P-A-K-V-R-P-L-P-G-R
      1dra: W-E-S-I-G-R-P-L-P-G-R
      3cd2: I-P-L-Q-F-R-P-L-K-G-R
      3dfr: E-S-F-P-K-R-P-L-P-E-R
      3drc: W-E-S-I-G-R-P-L-P-G-R
      4dfr: W-E-S-I-G-R-P-L-P-G-R
      		
    • Generalizing these sequences, we get the pattern:
      [LIWE]-[PES]-x(3)-R-P-L-[PK]-[GE]-R
  4. Now let's explore the properties of the newly devised pattern:

    [LIWE]-[PES]-x(3)-R-P-L-[PK]-[GE]-R

    To better understand the binding properties of the pattern, we use the "ProSite pattern statistics" page.

    • Point your browser to the start page of MSDsite: http://www.ebi.ac.uk/msd-srv/msdsite/
    • In the top left corner there is a "ProSite pattern statistics" link.

      Use this link to load the "ProSite pattern statistics" page into your browser.
    • Copy and paste the pattern:
      [LIWE]-[PES]-x(3)-R-P-L-[PK]-[GE]-R
      into the "Pattern/code" field of the form.

    • Press the "Search statistics" button to get the result charts.

    In terms of numbers, MTX is the ligand that most often interacts with this pattern. Analyzing the charts, we see that most of the ligands that interact with the new pattern are structurally very similar to MTX. For example, the table at the bottom of the page reports 99% interaction with the pattern for FOL/DDF/FA (folic acids). This is a sign of the pattern's quality.
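In step 3 the aligned fragments were generalized by hand. The sketch below automates a simplified version of that idea under a rule of our own choosing (not MSDsite's, and not necessarily the tutorial's): per column, a conserved residue keeps its letter, up to `max_class` residue types become an [..] class, and more variable columns become x, with runs of x merged into x(n). With `max_class=3` it yields x-[PES]-x(3)-R-P-L-[PK]-[GE]-R. This agrees with the tutorial's pattern except at the first position, which the tutorial kept as the class [LIWE] (four residue types) while columns 3 to 5, with four types each, became x(3). The helpers `generalize` and `pattern_matches` are hypothetical; `pattern_matches` only understands letters, [..] classes, x and x(n).

```python
import re

# Aligned fragments from the table in step 3 (ProSite dashes removed).
ALIGNED = {
    "1df7": "LPAKVRPLPGR",
    "1dra": "WESIGRPLPGR",
    "3cd2": "IPLQFRPLKGR",
    "3dfr": "ESFPKRPLPER",
    "3drc": "WESIGRPLPGR",
    "4dfr": "WESIGRPLPGR",
}


def generalize(aligned: list[str], max_class: int = 3) -> str:
    """Collapse equal-length, ungapped sequences into a ProSite-style pattern (assumed rule)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    merged = []  # [element, run length] pairs
    for column in zip(*aligned):
        kinds = list(dict.fromkeys(column))  # distinct residues in first-seen order
        if len(kinds) == 1:
            element = kinds[0]
        elif len(kinds) <= max_class:
            element = "[" + "".join(kinds) + "]"
        else:
            element = "x"
        if element == "x" and merged and merged[-1][0] == "x":
            merged[-1][1] += 1  # extend a run of x into x(n)
        else:
            merged.append([element, 1])
    return "-".join(el if n == 1 else f"{el}({n})" for el, n in merged)


def pattern_matches(pattern: str, seq: str) -> bool:
    """True if the whole of `seq` fits a simple pattern (letters, [..], x, x(n) only)."""
    regex = "".join(
        re.sub(r"^x(?:\((\d+)\))?$", lambda m: ".{" + (m.group(1) or "1") + "}", el)
        for el in pattern.split("-")
    )
    return re.fullmatch(regex, seq) is not None


print(generalize(list(ALIGNED.values())))  # x-[PES]-x(3)-R-P-L-[PK]-[GE]-R
tutorial_pattern = "[LIWE]-[PES]-x(3)-R-P-L-[PK]-[GE]-R"
print(all(pattern_matches(tutorial_pattern, s) for s in ALIGNED.values()))  # True
```

The final check only confirms that the hand-made pattern covers the six fragments it was derived from; how selective it is across other proteins is what the "ProSite pattern statistics" page is used for.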