Abdelkrim RACHEDI


Contents

  1.  Introduction
  2.  Getting Started
    1.  Task 1
  3.  Search SG-targets by sequence similarity
    1.  Paste a sequence
      1.  Task 2
    2.  Upload sequences
      1.  Task 3
      2.  Task 4
  4.  Search/Track SG-targets
    1.  General Query
      1.  Task 5
    2.  Target ID
      1.  Task 6
    3.  Protein Name & Species
      1.  Task 7a
      2.  Task 7b
      3.  Task 7c
      4.  Task 8
    4.  Status
      1.  Task 9
    5.  Structural Genomics Centers
      1.  Task 10

Introduction

Target Information Server: Structural Genomics

The Target Information Server is a tool for searching and tracking Structural Genomics (SG) targets. These include public-domain SG, PDB and pre-released PDB sequences. Targets are mainly protein sequences; however, the results of a search query also contain links to related structural data (PDB).

The data in our database has been extracted from the following sites:

  • The Berkeley Structural Genomics Center
  • The Joint Center for Structural Genomics
  • The Midwest Center for Structural Genomics
  • The New York Structural Genomics Research Consortium
  • The Northeast Structural Genomics Consortium
  • The Southeast Collaboratory for Structural Genomics
  • The TB Structural Genomics Consortium
  • The S2F Structure 2 Function Project
  • The BSGI Bacterial Structural Genomics Initiative
  • The SGPP Structural Genomics for Pathogenic Protozoa
  • RIKEN Group
  • Yeast Structural Genomics (France)
  • BNL Group

  Note that the SGT sequence database is created from the public XML files. The XML differs somewhat between the sites and, in addition, not all site identifiers are unique. Until these inconsistencies are resolved, the data will not be updated routinely.

    1.   Target data will be described using the XML syntax proposed by the International Task Force.

    Target data will be described according to the skeleton DTD (updated 25-July-2001).

    2.   The protocol for exchanging target data with the registration site will follow the Task Force recommendations:

    The targets file is a concatenation of individual target entries. Target data files are updated weekly. Each target entry represents a single protein, not a family. Target entries will not be deleted; abandoned targets will be identified with a "work stopped" status code.

    Tracking targets with this server is done either by searching for sequence similarity between the user's sequence(s) and SG targets, or by a direct detailed search (i.e. searching for targets by their status, protein name, organism source, etc.).

    Getting Started

    The Target Information Server (MSDtarget) offers two main options to search and track targets (Fig.1):

      Search SG-targets by sequence similarity: (option A)

        This is done either by pasting or typing a sequence, or by loading a local file containing one or more sequences in FASTA format.

        The server scans for similar target sequences and outputs a hit list, which in turn provides links to the target data (status and sequence alignment with the user's sequence).

      Search/Track SG-targets: (option B)

        Here, users can search the targets database for specific data through several options, such as the status of targets (Status option), the target's name (Protein Name option), etc.

        Users can also use the option General Query to do a general search, such as finding all targets that have been submitted to SWISS-PROT, or finding target information based on a date (e.g. 27-aug-2002), etc.

      Task 1:> Click the link "public targets DBase targetDB" to launch the MSDtarget service page.

    Search SG-targets by sequence similarity (option A)

      Paste a sequence

      A sequence such as the following can be pasted into the Sequence-Box (see Fig.1):
      MGSSHHHHHHDYDIPTTENLYFQGHMKVKILVDSTADVPFSWMEKYDIDSIPLYVVWEDG
      RSEPDEREPEEIMNFYKRIREAGSVPKTSQPSVEDFKKRYLKYKEEDYDVVLVLTLSSKL
      SGTYNSAVLASKEVDIPVYVVDTLLASGAIPLPARVAREMLENGATIEEVLKKLDERMKN
      KDFKAIFYVSNFDYLVKGGRVSKFQGFVGNLLKIRVCLHIENGELIPYRKVRGDKKAIEA
      LIEKLREDTPEGSKLRVIGVHADNEAGVVELLNTLRKSYEVVDEIISPMGKVITTHVGPG
      TVGFGIEVLERKR

      After the button "Submit" is clicked, a results page is displayed. At the top is the submitted sequence, and below it a hit-list summary of the targets that share sequence similarity with it. The summary is presented as a table (Fig.2) with the following columns:

        Column Matching Targets: Shows the type of target, which is either an SGT (Structural Genomics Target), a PDB (Protein Data Bank) or a PRE (pre-released PDB target). This is followed by the hyperlinked accession code of the target. Click the link to see the target details and the alignment of the target's sequence with the submitted sequence (Fig.3).

        Column Last Status: This concerns SGT targets only and shows the last step the target has reached.

        Column Seq. Ln: Shows the length of the target's sequence.

        Column Similarity: Shows the level of similarity between the target's sequence and the submitted sequence.

        Column DBase name: Shows the name of the database where the target's sequence resides.

        Column DBase ID: Shows the accession identifier (hyperlinked) of the target in the database above. Click the link to see the detailed entry for the target in the relevant database.

      Task 2:> With the MSDtarget page open, copy and paste the above sequence into the Sequence-Box and then click the button "Submit". (Depending on the load MSDtarget is handling, the results page may take some time to appear.)

      Upload sequences

      If one or more sequences are to be loaded from disk, use the second option, Upload FASTA sequence(s) file:. The sequence(s) should be in FASTA format (FASTA example), a simple format in which each sequence is written as amino acids in single-letter code (AA Codes), preceded by a single line that starts with ">" followed by the sequence identifier, title and/or description.
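
      The format described above can be read with a few lines of code. The following is a minimal sketch, not the server's own parser: the function name parse_fasta and the two example records are hypothetical, and the sketch assumes that a record's residues may be wrapped over several lines.

```python
from typing import Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Residue lines are concatenated into one sequence string.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record file content, for illustration only.
example = """>seq1 hypothetical example protein
MGSSHHHHHHDYDIPTTENLYFQG
HMKVKILVDS
>seq2 another hypothetical entry
MKTAYIAKQR
"""

records = list(parse_fasta(example))
for header, sequence in records:
    print(header, len(sequence))
```

      The number of records returned shows whether a file holds a single sequence or several, which is what decides how the upload is handled.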

      When sequences are uploaded, MSDtarget behaves according to one of the following two cases:

        Case1: If the uploaded file contains only one sequence, the procedure is the same as for pasting a sequence (see above).

        Case2: If the uploaded file contains more than one sequence, then after the "Submit" button is clicked an intermediate page displays the sequences loaded from the file and lets the user choose which sequence to search with, one at a time, by clicking that sequence's button (Fig.4).
        The rest is the same as before.

      Task 3:> (for Case1) Go to the MSDtarget main page, click the button "Browse...", use the dialogue window to load the file named "1.fasta", then click the button "Submit".

      Task 4:> (for Case2) Go to the MSDtarget main page, click the button "Browse...", use the dialogue window to load the file named "2.fasta", then click the button "Submit". On the intermediate page, choose one of the sequences, then click the button "Submit".

    Search/Track SG-targets (option B)

    This search option allows targets to be searched directly and offers six ways of querying (Fig.5). Note that the result page of any query is a hit list with links to the details of each target. However, if a query finds only one hit, the target details are shown immediately (see Task 6:>).

      General Query:

      General queries can be entered here, such as "swiss-prot", which means "get all targets that have links to the SWISS-PROT database".
      Other queries can use keywords such as "hydrolase", which attempts to get all targets related in one way or another to the "enzyme family hydrolase" (see Fig.6).

      Task 5:> Use the textbox General Query: and type oct-2003 then click the button "Submit".
      The result page will show the targets that the producing laboratory has dated October 2003.

      Target ID:

      Every target in the database has an identification number or code. Querying this way requires prior knowledge of the target ID of the target of interest.

      Task 6:> Use the textbox Target ID: and type the target_id 281950 then click the button "Submit".
      The result page will give the target data as shown in Fig.7.

      Protein Name & Species:

      As above, every target has a protein name and comes from a species (or Organism Source). Prior knowledge of the protein name or species is necessary to query the database for relevant data.

      It is not necessary to enter the whole protein name or organism source; however, the search outcome will differ. For example, consider the following tasks.

      Using the textbox Protein Name:, do the following:

      Task 7a:> type D-mannonate hydrolase then click the button "Submit".
      The result page will show one (or very few) targets, as shown in Fig.7.

      Task 7b:> type D-mannonate then click the button "Submit".
      The result page will show several targets, as shown in Fig.9.

      Task 7c:> type hydrolase then click the button "Submit".
      The result page will show a large list of target hits, as shown in Fig.10.

      Thus, the size of the result list depends on whether the query is specific or loose. The same applies when querying by Species:

      Task 8:> Use the textbox Species: and type ESCHERICHIA COLI then click the button "Submit".
      The result page will display a hit list of targets for which the source is the bacterium ESCHERICHIA COLI.

      Note the difference between General Query: and Protein Name:. Using a general name such as hydrolase as a General Query: yields more hits, since the system searches for the word hydrolase in the entire data of each target and not only in the Protein Name data category (compare Fig.10 and Fig.6).
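
      The difference between the two query types can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the three records, their field names, and the case-insensitive substring matching are hypothetical and do not describe how MSDtarget is actually implemented.

```python
from typing import Dict, List

# Hypothetical target records; field names and values are illustrative only.
TARGETS: List[Dict[str, str]] = [
    {"target_id": "T1", "protein_name": "D-mannonate hydrolase", "annotation": ""},
    {"target_id": "T2", "protein_name": "hypothetical protein",
     "annotation": "putative hydrolase fold"},
    {"target_id": "T3", "protein_name": "ribosomal protein L7", "annotation": ""},
]


def search_protein_name(term: str) -> List[str]:
    """Match the term against the protein name field only."""
    needle = term.lower()
    return [t["target_id"] for t in TARGETS
            if needle in t["protein_name"].lower()]


def search_general(term: str) -> List[str]:
    """Match the term against every field of each record."""
    needle = term.lower()
    return [t["target_id"] for t in TARGETS
            if any(needle in value.lower() for value in t.values())]


name_hits = search_protein_name("hydrolase")
general_hits = search_general("hydrolase")
print(len(name_hits), len(general_hits))
```

      In this toy data the general search also returns the record whose annotation mentions the word, so it can never return fewer hits than the field-restricted search for the same term.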

      Status:

      The database keeps track of the successive stages that targets go through. For example, the database could show at one time that a set of targets has the status "Selected", and later the same targets would have moved to a different status such as "Cloned" or "Expressed", up to "In PDB" or "Work Stopped" (check the pull-down list Status: and the DTD for the different stages a target can go through).

      The search option Status: finds the targets that are at a particular stage.
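
      As a rough sketch of the idea, a status search can be thought of as a filter on the most recent stage recorded for each target. The records below are hypothetical, use only status names quoted in this tutorial, and the assumption that the search matches the last recorded status (rather than any past status) is not confirmed by the server documentation.

```python
from typing import Dict, List

# Hypothetical status histories, oldest stage first. Only status names
# quoted in this tutorial are used; the real list is in the Status: pull-down.
TARGET_HISTORY: Dict[str, List[str]] = {
    "T1": ["Selected", "Cloned", "Expressed", "In PDB"],
    "T2": ["Selected", "Cloned"],
    "T3": ["Selected", "Work Stopped"],
}


def last_status(target_id: str) -> str:
    """Return the most recent stage recorded for a target."""
    return TARGET_HISTORY[target_id][-1]


def targets_with_status(status: str) -> List[str]:
    """Return the IDs of targets whose most recent stage equals `status`."""
    return sorted(tid for tid in TARGET_HISTORY if last_status(tid) == status)


in_pdb = targets_with_status("In PDB")
print(in_pdb)
```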

      Task 9:> Use the pull-down list Status: and select "In PDB" then click the button "Submit".
      The result page will display a hit list of targets for which a 3D structure is available (Fig.11 and Fig.12).

      Structural Genomics Centers:

      As mentioned in the introduction, several centers around the world contribute target data. This search option lists the targets provided by each center.

      Task 10:> Use the pull-down list Structural Genomics Centers: and select "Berkeley SG Center" then click the button "Submit".
      The result page will display a hit list of targets provided by the center.

    Remark: Users are encouraged to try out different inputs as search arguments and examine the results.