Macromolecular Structure Database Group Projects - Targets for NIH Structural Genomics Projects
|
A search for sequences that are targets of the NIGMS-supported research centers
in its Protein Structure Initiative is now available.
The public XML target files have been converted to a FASTA file.
Download the current version in FASTA format.
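The downloaded file can be read with any FASTA parser. The following is a minimal Python sketch, assuming the usual FASTA layout (a header line starting with ">" followed by one or more sequence lines); the file name sgt.fasta mentioned in the comment is hypothetical.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream.

    The header is returned without its leading '>'; the sequence lines
    that follow it are joined into one string.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # last record in the file


# Usage with an in-memory sample; for the real download, open the file
# instead, e.g. open("sgt.fasta") (hypothetical file name).
sample = io.StringIO(">SGT:HR4 Northeast Cloned\nMNNGGKAEKE\nNTPSEANLQE\n")
records = list(read_fasta(sample))
# records == [("SGT:HR4 Northeast Cloned", "MNNGGKAEKENTPSEANLQE")]
```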
|
SCOP structural domain predictions for the structural genomics target list
have been carried out by Julian Gough using the SUPERFAMILY HMM library and
genome assignments server, and are available here.
See Gough et al. (2001), "Assignment of Homology to Genome Sequences using a
Library of Hidden Markov Models that Represent all Proteins of Known Structure",
J. Mol. Biol. 313(4), 903-919.
|
The data has been extracted from the following sites:
The Berkeley Structural Genomics Center
The Joint Center for Structural Genomics
The Midwest Center for Structural Genomics
The New York Structural Genomics Research Consortium
The Northeast Structural Genomics Consortium
The Southeast Collaboratory for Structural Genomics
The TB Structural Genomics Consortium
The Structure 2 Function Project (S2F)
The Bacterial Structural Genomics Initiative (BSGI)
Structural Genomics of Pathogenic Protozoa (SGPP)
RIKEN Group
Yeast Structural Genomics (France)
BNL Group
|
The target sequences can be searched with BLAST at
http://www.ebi.ac.uk/blast2/ by choosing the database sgt
from the pull-down menu.
For example, a search hit looks like this:
>SGT:HR4 Northeast Cloned Expressed Soluble HSQC NMR Assigned NMR Structure
        Crystallized In PDB RBP_MS
        Length = 111

 Score = 560 (202.2 bits), Expect = 4.9e-56, P = 4.9e-56
 Identities = 111/111 (100%), Positives = 111/111 (100%)

Query:     1 MNNGGKAEKENTPSEANLQEEEVRTLFVSGLPLDIKPRELYLLFRPFKGYEGSLIKLTSK 60
             MNNGGKAEKENTPSEANLQEEEVRTLFVSGLPLDIKPRELYLLFRPFKGYEGSLIKLTSK
Sbjct:     1 MNNGGKAEKENTPSEANLQEEEVRTLFVSGLPLDIKPRELYLLFRPFKGYEGSLIKLTSK 60

Query:    61 QPVGFVSFDSRSEAEAAKNALNGIRFDPEIPQTLRLEFAKANTKMAKNKLV 111
             QPVGFVSFDSRSEAEAAKNALNGIRFDPEIPQTLRLEFAKANTKMAKNKLV
Sbjct:    61 QPVGFVSFDSRSEAEAAKNALNGIRFDPEIPQTLRLEFAKANTKMAKNKLV 111
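The Identities line is the number of identical aligned positions divided by the alignment length. A minimal Python sketch of that arithmetic, assuming two already-aligned rows with "-" for gaps; BLAST's Positives count (pairs with a positive substitution score) is not computed here.

```python
from typing import Tuple


def percent_identity(query: str, subject: str) -> Tuple[int, int, float]:
    """Return (identities, alignment length, percent identity).

    Both strings must be aligned rows of equal length. A position counts
    as identical when both residues match and are not gaps ('-'); gap
    columns still count towards the alignment length.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    if not query:
        raise ValueError("empty alignment")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identities, len(query), 100.0 * identities / len(query)


# The HR4 hit aligns the full 111-residue sequence with itself.
hr4 = (
    "MNNGGKAEKENTPSEANLQEEEVRTLFVSGLPLDIKPRELYLLFRPFKGYEGSLIKLTSK"
    "QPVGFVSFDSRSEAEAAKNALNGIRFDPEIPQTLRLEFAKANTKMAKNKLV"
)
print(percent_identity(hr4, hr4))  # (111, 111, 100.0)
```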
where, in the header line
  >SGT:HR4 Northeast Cloned Expressed Soluble HSQC NMR Assigned NMR Structure
the fields are:
  HR4                    centre-id
  Northeast              centre
  Cloned ... Structure   status/remark/name data fields
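The header layout can be split into its parts programmatically. This is a minimal Python sketch, assuming every sgt header follows the pattern ">SGT:<id> <centre> <field> ..." with a one-word centre name, as in the HR4 example; because the data fields are only separated by spaces, multi-word values such as "NMR Structure" come back as separate words.

```python
from typing import List, NamedTuple


class SgtHeader(NamedTuple):
    centre_id: str      # identifier after "SGT:", e.g. "HR4"
    centre: str         # centre name, e.g. "Northeast"
    fields: List[str]   # remaining status/remark/name words


def parse_sgt_header(header: str) -> SgtHeader:
    """Split an sgt header line into centre-id, centre and data fields.

    Assumes the layout '>SGT:<id> <centre> <field> ...' with a one-word
    centre name. The data fields are split on whitespace only.
    """
    parts = header.lstrip(">").split()
    if len(parts) < 2 or not parts[0].startswith("SGT:"):
        raise ValueError(f"unexpected sgt header: {header!r}")
    return SgtHeader(parts[0][len("SGT:"):], parts[1], parts[2:])


h = parse_sgt_header(
    ">SGT:HR4 Northeast Cloned Expressed Soluble HSQC NMR Assigned NMR Structure"
)
print(h.centre_id, h.centre, h.fields[:2])  # HR4 Northeast ['Cloned', 'Expressed']
```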
|
See also NCBI-BLAST2 at http://www.ebi.ac.uk/blastall/
or FASTA3 at http://www.ebi.ac.uk/fasta33/,
again choosing the database sgt from the selection menu.
In a FASTA search you can type a database:id in the sequence box instead of
a sequence. For example, typing uniprot:ybfg_haein
gives two 100% hits:
SGT:APC387 Midwest Cloned Expressed accession code (228 aa)
  initn: 1554 init1: 1554 opt: 1554 Z-score: 2958.0 expect() 3.5e-159
  Smith-Waterman score: 1554; 100.000% identity in 228 aa overlap (1-228:1-228)

SGT:HI0374 S2F Selected Cloned Expressed Purified (228 aa)
  initn: 1554 init1: 1554 opt: 1554 Z-score: 2958.0 expect() 3.5e-159
  Smith-Waterman score: 1554; 100.000% identity in 228 aa overlap (1-228:1-228)
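The two 100% hits above show the same protein registered as a target by two different centres. A minimal Python sketch for finding such exact duplicates locally, assuming the sgt FASTA file has already been read into (target ID, sequence) pairs; the sequences in the example are hypothetical toy values. It detects only identical full-length sequences and is no substitute for a BLAST or FASTA similarity search.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each upper-cased sequence to the target IDs that carry it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for target_id, sequence in records:
        index[sequence.upper()].append(target_id)
    return dict(index)


def identical_targets(query: str, index: Dict[str, List[str]]) -> List[str]:
    """Return target IDs whose sequence equals the query exactly.

    Whitespace in the query is removed and case is ignored; a copy of the
    ID list is returned so callers cannot modify the index.
    """
    return list(index.get("".join(query.split()).upper(), []))


# Hypothetical toy sequences, not the real target sequences.
index = build_index([
    ("SGT:APC387", "MKVLA"),
    ("SGT:HI0374", "MKVLA"),
    ("SGT:HR4", "MNNGG"),
])
print(identical_targets("mkv la", index))  # ['SGT:APC387', 'SGT:HI0374']
```

Building the index once makes each later lookup a single dictionary access, which suits repeated queries against the whole target set.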
|
Up-to-date target information is available from the following groups, which are
not included in the EBI BLAST/FASTA service:
  Berlin Group
  Structural Proteomics IN Europe (SPINE), just starting up (October 2002)
|
Please note: the sgt database of sequences is created from the public XML files.
The XML differs somewhat between sites, and not all site identifiers are unique.
Until these inconsistencies are resolved, the data will not be updated routinely.
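Because not all site identifiers are unique, a merged database needs a key that includes the centre. A minimal Python sketch, assuming each entry has already been reduced to a (centre, target ID) pair; the IDs in the example are hypothetical, and this is not a description of how the sgt database is actually built.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def shared_ids(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return target IDs used by more than one centre.

    `records` holds (centre, target_id) pairs taken from the XML files.
    """
    centres_by_id: Dict[str, Set[str]] = defaultdict(set)
    for centre, target_id in records:
        centres_by_id[target_id].add(centre)
    return {tid: c for tid, c in centres_by_id.items() if len(c) > 1}


def composite_key(centre: str, target_id: str) -> str:
    """Build a key that is unique as long as IDs are unique within a centre."""
    return f"{centre}:{target_id}"


# Hypothetical IDs: "T1" is reused by two centres.
records = [("Midwest", "APC387"), ("Northeast", "T1"), ("Midwest", "T1")]
dup = shared_ids(records)
print({tid: sorted(c) for tid, c in dup.items()})  # {'T1': ['Midwest', 'Northeast']}
print([composite_key(c, t) for c, t in records])
```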
|
1. Target data will be described using the XML syntax proposed by the
International Task Force, according to the skeleton DTD
(updated 25 July 2001).
2. The protocol for exchanging target data with the registration site will
follow the Task Force recommendations: the targets file is a concatenation
of individual target entries, and targets data files will be updated weekly.
Each target entry represents a single protein, not a family. Target entries
will not be deleted; abandoned targets will be identified with a "work
stopped" status code. Each center will provide an FTP address from which its
target data can be downloaded. Centers will prepare targets files and
FTP sites by 15 June 2001.
3. The registration site will provide FTP download of the concatenated set
of targets from all NIH centers.
4. A query interface will be provided at the registration site, allowing
searches on any of the attributes in the data specification (e.g. lab, name,
status, sequence). The query interface will be a simple web form, and search
results will be presented as HTML tables.
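The exchange protocol and query interface in items 1 to 4 can be sketched in Python. This is a minimal illustration under stated assumptions, not the registration site's implementation: each entry is assumed to be a `<target>` element, and the child element names lab, name, status and sequence are hypothetical stand-ins for those in the Task Force DTD, which is not reproduced here.

```python
import re
import xml.etree.ElementTree as ET
from html import escape
from typing import Dict, List

# Hypothetical element names: the real ones are defined by the Task Force DTD.
FIELDS = ("lab", "name", "status", "sequence")


def parse_targets(concatenated_xml: str) -> List[Dict[str, str]]:
    """Parse a concatenation of <target> entries into a list of dicts.

    A concatenation of entries is not one well-formed XML document, so
    XML declarations are stripped and the entries are wrapped in a
    synthetic <targets> root before parsing.
    """
    body = re.sub(r"<\?xml[^>]*\?>", "", concatenated_xml)
    root = ET.fromstring(f"<targets>{body}</targets>")
    return [
        {f: (entry.findtext(f) or "").strip() for f in FIELDS}
        for entry in root.iter("target")
    ]


def active_targets(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop abandoned targets, which keep a "work stopped" status instead of being deleted."""
    return [r for r in records if r["status"].lower() != "work stopped"]


def query(records: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Keep records whose fields contain every given value (case-insensitive)."""
    return [
        r for r in records
        if all(v.lower() in r.get(k, "").lower() for k, v in criteria.items())
    ]


def to_html_table(records: List[Dict[str, str]]) -> str:
    """Render records as a simple HTML table, escaping all values."""
    head = "".join(f"<th>{escape(f)}</th>" for f in FIELDS)
    rows = "".join(
        "<tr>" + "".join(f"<td>{escape(r[f])}</td>" for f in FIELDS) + "</tr>"
        for r in records
    )
    return f"<table><tr>{head}</tr>{rows}</table>"


# Hypothetical sample entries using the assumed element names.
sample = """<?xml version="1.0"?>
<target><lab>Northeast</lab><name>HR4</name><status>In PDB</status><sequence>MNNGG</sequence></target>
<?xml version="1.0"?>
<target><lab>Midwest</lab><name>APC387</name><status>work stopped</status><sequence>MKVLA</sequence></target>
"""
records = parse_targets(sample)
print(to_html_table(query(active_targets(records), lab="north")))
```

Because entries are never deleted, abandoned targets are filtered by their "work stopped" status rather than by their absence from the file.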