
Abdelkrim RACHEDI
Contents
- Introduction
- Getting Started
- Task 1
- Search SG-targets by sequence similarity
- Paste a sequence
- Task 2
- Upload sequences
- Task 3
- Task 4
- Search/Track SG-targets
- General Query
- Task 5
- Target ID
- Task 6
- Protein Name & Species
- Task 7a
- Task 7b
- Task 7c
- Task 8
- Status
- Task 9
- Structural Genomics Centers
- Task 10
Target Information Server: Structural Genomics
The Target Information Server is a tool for searching and tracking Structural Genomics (SG)
targets. These include public-domain SG, PDB and pre-released PDB sequences.
Targets are mainly protein sequences; however, the results of a search query may also contain
links to related structural data (PDB).
The data in our database has been extracted from the following sites:
The Berkeley Structural Genomics Center
The Joint Center for Structural Genomics
The Midwest Center for Structural Genomics
The New York Structural Genomics Research Consortium
The Northeast Structural Genomics Consortium
The Southeast Collaboratory for Structural Genomics
The TB Structural Genomics Consortium
The S2F Structure 2 Function Project
The BSGI Bacterial Structural Genomics Initiative
The SGPP Structural Genomics for Pathogenic Protozoa
RIKEN Group
Yeast Structural Genomics (France)
BNL Group
It is worth noting that the SGT database of sequences is created from the sites' public XML files.
The XML differs somewhat between sites, and not all site identifiers are unique. Until these
inconsistencies are resolved, the data will not be updated routinely.
1. Target data will be described using the XML syntax proposed by the International Task Force,
according to the skeleton DTD (updated 25 July 2001).
2. The protocol for exchanging target data with the registration site will follow the Task Force recommendations:
The targets file is a concatenation of individual target entries, and target data files are updated weekly.
Each target entry represents a single protein, not a family. Target entries will never be deleted;
abandoned targets will be identified by a "work stopped" status code.
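Because a targets file is a plain concatenation of entries, it has no single root element and cannot be parsed as one XML document as-is. The sketch below shows one way to read such a file and pick out abandoned targets. The element names (<target>, <targetId>, <status>) and the sample entries are hypothetical placeholders; the real names are defined by the Task Force DTD and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two concatenated entries with made-up element names.
SAMPLE = """
<target><targetId>T1</targetId><status>Cloned</status></target>
<target><targetId>T2</targetId><status>work stopped</status></target>
"""

def parse_target_entries(text):
    """Parse a file that is a plain concatenation of <target> entries.

    The entries share no common root, so they are wrapped in a
    synthetic <targets> root element before parsing.
    """
    root = ET.fromstring("<targets>" + text + "</targets>")
    entries = []
    for target in root.findall("target"):
        entries.append({
            "id": target.findtext("targetId"),
            "status": target.findtext("status"),
        })
    return entries

def abandoned(entries):
    """Entries are never deleted; abandoned ones carry a 'work stopped' status."""
    return [e["id"] for e in entries
            if (e["status"] or "").lower() == "work stopped"]

entries = parse_target_entries(SAMPLE)
print(abandoned(entries))  # ['T2']
```

Wrapping in a synthetic root only works if the individual entries carry no XML declaration of their own; if they do, those lines would have to be stripped first.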
Targets can be tracked with this server either by searching for sequence similarity between
the user's sequence(s) and SG targets, or by a direct, detailed search (e.g. searching for targets by status,
protein name or organism source).
The Target Information Server (MSDtarget) offers two main options to search and track targets (Fig.1):
Search SG-targets by sequence similarity: (option A)
This is done either by pasting or typing a sequence, or by uploading a local file containing one or more sequences in FASTA format. The server scans for similar target sequences and outputs a hit list, which in turn provides links to the targets' data
(status and sequence alignment with the user's sequence).
Search/Track SG-targets: (option B)
Here, users can search the targets database for specific data through several options, such
as the status of targets (Status option) or the target's name (Protein Name option).
Users can also use the General Query option for a general search, such as finding all targets
that have been submitted to SWISS-PROT, or retrieving target information by date (e.g. 27-aug-2002).
Task 1:> Click the public targets database link (targetDB) to launch the MSDtarget service page.
Search SG-targets by sequence similarity
Paste a sequence
A sequence such as the following can be pasted into the Sequence-Box (see Fig.1):
MGSSHHHHHHDYDIPTTENLYFQGHMKVKILVDSTADVPFSWMEKYDIDSIPLYVVWEDG
RSEPDEREPEEIMNFYKRIREAGSVPKTSQPSVEDFKKRYLKYKEEDYDVVLVLTLSSKL
SGTYNSAVLASKEVDIPVYVVDTLLASGAIPLPARVAREMLENGATIEEVLKKLDERMKN
KDFKAIFYVSNFDYLVKGGRVSKFQGFVGNLLKIRVCLHIENGELIPYRKVRGDKKAIEA
LIEKLREDTPEGSKLRVIGVHADNEAGVVELLNTLRKSYEVVDEIISPMGKVITTHVGPG
TVGFGIEVLERKR
After clicking the button "Submit", a results page displays the submitted sequence at the top and, below it, a hit-list summary of the
targets that share sequence similarity with the submitted sequence.
The summary is presented as a table (Fig.2) with the following columns:
Column Matching Targets: Shows the type of target, either an SGT (Structural Genomics Target),
a PDB (Protein Data Bank) entry or a PRE (pre-released PDB target), followed by the hyperlinked accession code of
the target. Click the link to see the target details and the alignment of the target's sequence with the submitted sequence
(Fig.3).
Column Last Status: Applies to SGT targets only; shows the latest stage the target has reached.
Column Seq. Ln: Shows the length of the target's sequence.
Column Similarity: Shows the level of similarity between the target's sequence and the submitted sequence.
Column DBase name: Shows the name of the database in which the target's sequence resides.
Column DBase ID: Shows the (hyperlinked) accession identifier of the target in that database. Click the link
to see the detailed entry of the target in the relevant database.
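This guide does not say how the Similarity column is computed. As an illustration of one common measure only, the hypothetical function below computes percent identity over two sequences that have already been aligned (gaps written as "-"). It is a sketch, not MSDtarget's method; percent identity also has several conventions, differing mainly in which columns go in the denominator.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two equal-length aligned sequences.

    Columns where either sequence has a gap ('-') are skipped; identity
    is the share of identical residues among the remaining columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# 7 gap-free columns, 6 of them identical (L vs I differs).
print(round(percent_identity("MKV-ILVD", "MKVAIIVD"), 1))  # 85.7
```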
Task 2:> With the MSDtarget page open, copy and paste the above sequence into the
Sequence-Box and then click the button "Submit". (Depending on the server load, the results page may take some time to appear.)
Upload sequences
If one or more sequences are to be loaded from disk, the second option (Upload FASTA sequence(s) file:) should
be used. The sequence(s) must be in FASTA format (FASTA example), a simple format
in which each sequence is written as amino acids in single-letter code (AA Codes), preceded
by a single line that starts with ">" followed by the sequence identifier, title and/or description.
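A minimal sketch of reading the FASTA format described above: each ">" line opens a new record, and the sequence lines that follow (which may be split over several lines) are joined together. The function name parse_fasta is hypothetical and not part of MSDtarget.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text.

    Each record starts with a line beginning with '>' (identifier, title
    and/or description); the following lines, up to the next '>', hold
    the sequence in one-letter amino-acid code and are joined together.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

records = parse_fasta(">seq1 example\nMKVKIL\nVDST\n>seq2\nMGSS\n")
print(len(records))  # 2
```

The number of records returned corresponds to the two upload cases: one record means the search proceeds directly, more than one means a sequence must be chosen first.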
When loading sequences, MSDtarget behaves in one of two ways:
Case1: If the uploaded file contains only one sequence, the procedure is the same as for pasting a sequence (see above).
Case2: If the uploaded file contains more than one sequence, then after the "Submit" button is clicked, an intermediate page
displays the sequences loaded from the file and lets the user
choose which sequence to submit as the query by clicking that sequence's
button, one at a time (Fig.4).
The rest of the procedure is the same as before.
Task 3:> (for Case1) Go to the MSDtarget main page, click the button "Browse..." and,
using the dialogue window, load the file named "1.fasta", then click the button "Submit".
Task 4:> (for Case2) Go to the MSDtarget main page, click the button "Browse..." and,
using the dialogue window, load the file named "2.fasta", then click the button "Submit".
On the intermediate page, choose one of the sequences, then click the button "Submit".
Search/Track SG-targets
This search option allows targets to be searched directly and offers six ways of querying (Fig.5).
Note that the result page of any query is a hit list with links to each target's details. However, if a query finds only one hit, the target details are shown immediately (see Task 6:>).
General Query:
A general type of query can be used here, such as "swiss-prot", which means "get all targets that have links to the SWISS-PROT database".
Other queries can use keywords such as "hydrolase", which retrieves all targets related in one way or another
to the hydrolase enzyme family (see Fig.6).
Task 5:> Use the textbox General Query:, type oct-2003, then click the button "Submit".
The result page will show the targets that the producing laboratory has dated to October 2003.
Target ID:
Every target in the database has an identification number or code. Querying this way requires prior
knowledge of the ID of the target of interest.
Task 6:> Use the textbox Target ID: and type the target_id 281950 then click the button "Submit".
The result page will give the target data as shown in Fig.7.
Protein Name & Species:
Similarly, every target has a protein name and comes from a species (or organism source). Prior
knowledge of the protein name or species is needed to query the database for the relevant data.
It is not necessary to use the whole protein name or organism source; however,
the search outcome will differ, as the following tasks show.
Using the textbox Protein Name:, try the following:
Task 7a:> type D-mannonate hydrolase then click the button "Submit".
The result page will show one (or very few) targets, as shown in Fig.7.
Task 7b:> type D-mannonate then click the button "Submit".
The result page will show several targets, as shown in Fig.9.
Task 7c:> type hydrolase then click the button "Submit".
The result page will show a large list of target hits, as shown in Fig.10.
Thus, the more specific the query, the fewer the hits; a looser query returns a longer hit list.
The same applies when querying by Species:
Task 8:> Use the textbox Species: and type ESCHERICHIA COLI then click the button "Submit".
The result page will display a hit list of targets whose source is the bacterium ESCHERICHIA COLI.
Note the difference between General Query: and Protein Name:.
When a general term such as hydrolase is used as a General Query:, it
yields more hits, because the system searches for the word hydrolase in all
of the targets' data, not only in the Protein Name field
(compare Fig.10 with Fig.6).
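The effect of query specificity, and the difference between a single-field search (Protein Name:) and an all-fields search (General Query:), can be sketched with case-insensitive substring matching. The records, field names and matching rule below are hypothetical; the guide does not describe how the server actually matches terms.

```python
# Hypothetical in-memory records; ids, names and fields are illustrative only.
TARGETS = [
    {"id": "A1", "protein_name": "D-mannonate hydrolase", "description": ""},
    {"id": "A2", "protein_name": "D-mannonate oxidoreductase", "description": ""},
    {"id": "A3", "protein_name": "Glycoside hydrolase", "description": ""},
    {"id": "A4", "protein_name": "Uncharacterized protein",
     "description": "putative hydrolase activity"},
]

def matches(value, term):
    """Case-insensitive substring test."""
    return term.lower() in str(value).lower()

def search_field(records, field, term):
    """Protein Name:-style query: match the term against one field only."""
    return [r["id"] for r in records if matches(r[field], term)]

def search_all(records, term):
    """General Query:-style query: match the term against every field."""
    return [r["id"] for r in records
            if any(matches(v, term) for v in r.values())]

print(search_field(TARGETS, "protein_name", "D-mannonate hydrolase"))  # ['A1']
print(search_field(TARGETS, "protein_name", "D-mannonate"))            # ['A1', 'A2']
print(search_field(TARGETS, "protein_name", "hydrolase"))              # ['A1', 'A3']
print(search_all(TARGETS, "hydrolase"))                                # ['A1', 'A3', 'A4']
```

Shorter terms match more names, and searching all fields also finds A4, whose name does not contain the term at all.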
Status:
The database keeps track of the successive stages that targets go through.
For example, at one time the database may show that a set of targets has
the status "Selected"; later, the same targets move to a different
status such as "Cloned" or "Expressed", up to "In PDB" or "Work Stopped"
(check the pull-down list Status: and the DTD for the different
stages a target can go through).
The search option Status: finds the targets that are at a particular stage.
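The status history of a target, and a Status:-style search over the latest stage of each target, can be sketched as follows. Only the stages named in this guide are included; the other stages listed in the Status: pull-down list and the DTD are omitted. The target IDs and histories are hypothetical.

```python
from enum import Enum

class Status(Enum):
    # Only the stages named in this guide; other stages from the
    # Status: pull-down list and the DTD are omitted from this sketch.
    SELECTED = "Selected"
    CLONED = "Cloned"
    EXPRESSED = "Expressed"
    IN_PDB = "In PDB"
    WORK_STOPPED = "Work Stopped"

# Hypothetical histories, oldest status first.
HISTORY = {
    "T1": [Status.SELECTED, Status.CLONED],
    "T2": [Status.SELECTED, Status.CLONED, Status.EXPRESSED, Status.IN_PDB],
    "T3": [Status.SELECTED, Status.WORK_STOPPED],
}

def last_status(history):
    """The most recent stage a target has reached (cf. the Last Status column)."""
    return history[-1]

def targets_at(histories, status):
    """IDs of targets whose most recent stage equals `status`."""
    return sorted(t for t, h in histories.items() if last_status(h) == status)

print(targets_at(HISTORY, Status.IN_PDB))  # ['T2']
```

Keeping the whole history rather than only the current value mirrors the rule that entries are never deleted: an abandoned target simply gains a final "Work Stopped" stage.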
Task 9:> Use the pull-down list Status: and select "In PDB" then click the button "Submit".
The result page will display a hit list of targets for which a 3D structure is available (Fig.11 and Fig.12).
Structural Genomics Centers:
As mentioned in the introduction, several centers around the world
contribute target data. This search option lists the
targets provided by each center.
Task 10:> Use the pull-down list Structural Genomics Centers:, select "Berkeley SG Center", then click the button "Submit".
The result page will display a hit list of the targets provided by that center.
Remark: Users are encouraged to try out different inputs as search arguments and examine the results.