
Macromolecular Structure Database Group Projects - Targets for NIH Structural Genomics Projects



A SEARCH for sequences that are targets of the NIGMS-supported research centers in the Protein Structure Initiative is now available. The public XML target files have been converted to a FASTA file.

Current version: DOWNLOAD (FASTA format)


SCOP structural domain predictions for the set of structural genomics targets have been carried out by Julian Gough using the SUPERFAMILY HMM library and genome assignments server, and are available from here.

See Gough et al. 2001. "Assignment of Homology to Genome Sequences using a Library of Hidden Markov Models that Represent all Proteins of Known Structure." J. Mol. Biol., 313(4), 903-919.


The data has been extracted from the following sites:

The Berkeley Structural Genomics Center
The Joint Center for Structural Genomics
The Midwest Center for Structural Genomics
The New York Structural Genomics Research Consortium
The Northeast Structural Genomics Consortium
The Southeast Collaboratory for Structural Genomics
The TB Structural Genomics Consortium
The S2F Structure 2 Function Project
The BSGI Bacterial Structural Genomics Initiative
The SGPP Structural Genomics for Pathogenic Protozoa
RIKEN Group
Yeast Structural Genomics (France)
BNL Group


To search with BLAST, go to http://www.ebi.ac.uk/blast2/ and choose DATABASE sgt from the pull-down menu.

For example, a search hit will look like this:


>SGT:HR4 Northeast Cloned Expressed Soluble HSQC NMR Assigned NMR Structure
        Crystallized In PDB     RBP_MS
        Length = 111


Score = 560 (202.2 bits), Expect = 4.9e-56, P = 4.9e-56
Identities = 111/111 (100%), Positives = 111/111 (100%)


Query:     1 MNNGGKAEKENTPSEANLQEEEVRTLFVSGLPLDIKPRELYLLFRPFKGYEGSLIKLTSK 60
             MNNGGKAEKENTPSEANLQEEEVRTLFVSGLPLDIKPRELYLLFRPFKGYEGSLIKLTSK
Sbjct:     1 MNNGGKAEKENTPSEANLQEEEVRTLFVSGLPLDIKPRELYLLFRPFKGYEGSLIKLTSK 60


Query:    61 QPVGFVSFDSRSEAEAAKNALNGIRFDPEIPQTLRLEFAKANTKMAKNKLV 111
             QPVGFVSFDSRSEAEAAKNALNGIRFDPEIPQTLRLEFAKANTKMAKNKLV
Sbjct:    61 QPVGFVSFDSRSEAEAAKNALNGIRFDPEIPQTLRLEFAKANTKMAKNKLV 111


where:
         centre
         ^^^^^^^^^
>SGT:HR4 Northeast Cloned Expressed Soluble HSQC NMR Assigned NMR Structure
     ^^^           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |                                       |
  centre-id                    status/remark/name data fields
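
The header layout described above can be split programmatically. The following is a minimal sketch, assuming the first token is always `SGT:` followed by the centre-id and the second token is the centre; the names `SgtHeader` and `parse_sgt_header` are hypothetical and not part of any EBI tool. Because status values such as "NMR Assigned" span several words, the remaining text is kept as one string rather than guessed apart.

```python
from typing import NamedTuple


class SgtHeader(NamedTuple):
    centre_id: str   # identifier assigned by the centre, e.g. "HR4"
    centre: str      # centre name, e.g. "Northeast"
    remainder: str   # status/remark/name fields, left unsplit


def parse_sgt_header(line: str) -> SgtHeader:
    """Split an sgt description line into centre-id, centre and the rest."""
    text = line.strip()
    if text.startswith(">"):
        text = text[1:]
    # Split on whitespace at most twice: id token, centre, everything else.
    tokens = text.split(None, 2)
    if len(tokens) < 2 or not tokens[0].startswith("SGT:"):
        raise ValueError(f"not an sgt header: {line!r}")
    centre_id = tokens[0][len("SGT:"):]
    remainder = tokens[2] if len(tokens) > 2 else ""
    return SgtHeader(centre_id, tokens[1], remainder)


header = parse_sgt_header(
    ">SGT:HR4 Northeast Cloned Expressed Soluble HSQC NMR Assigned NMR Structure"
)
print(header.centre_id, header.centre)
```

A centre name that contains a space would break the second-token assumption, so this sketch should be checked against the real file before use.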



See also:
NCBI-BLAST2 at http://www.ebi.ac.uk/blastall/
FASTA3 at http://www.ebi.ac.uk/fasta33/
In either service, choose DATABASE sgt from the selection menu.

In a FASTA search, one can type a database:id identifier in the sequence box rather than a sequence. For example, typing uniprot:ybfg_haein

gives two hits with 100% identity:
SGT:APC387 Midwest Cloned Expressed accession code (228 aa)
initn: 1554 init1: 1554 opt: 1554 Z-score: 2958.0 expect() 3.5e-159
Smith-Waterman score: 1554; 100.000% identity in 228 aa overlap (1-228:1-228)

SGT:HI0374 S2F Selected Cloned Expressed Purified (228 aa)
initn: 1554 init1: 1554 opt: 1554 Z-score: 2958.0 expect() 3.5e-159
Smith-Waterman score: 1554; 100.000% identity in 228 aa overlap (1-228:1-228)
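
Hit summaries like the two above can be collected into records for further filtering. This is a rough sketch whose regular expressions are derived only from the sample lines shown here; real FASTA3 reports contain more line types and may differ between versions, and the function name `parse_hits` is hypothetical.

```python
import re

# Hit line, e.g. "SGT:APC387 Midwest Cloned Expressed accession code (228 aa)"
HIT_RE = re.compile(
    r"^SGT:(?P<id>\S+)\s+(?P<centre>\S+)\s+(?P<rest>.*)\((?P<length>\d+) aa\)\s*$"
)
# Identity line, e.g. "... 100.000% identity in 228 aa overlap (1-228:1-228)"
IDENT_RE = re.compile(
    r"(?P<identity>\d+(?:\.\d+)?)% identity in (?P<overlap>\d+) aa overlap"
)


def parse_hits(report: str) -> list:
    """Return one dict per SGT hit, with identity details when present."""
    hits = []
    for raw in report.splitlines():
        line = raw.strip()
        m = HIT_RE.match(line)
        if m:
            hits.append({
                "id": m.group("id"),
                "centre": m.group("centre"),
                "status": m.group("rest").strip(),
                "length": int(m.group("length")),
                "identity": None,
                "overlap": None,
            })
            continue
        m = IDENT_RE.search(line)
        if m and hits:
            # Attach the identity line to the most recent hit.
            hits[-1]["identity"] = float(m.group("identity"))
            hits[-1]["overlap"] = int(m.group("overlap"))
    return hits


REPORT = """\
SGT:APC387 Midwest Cloned Expressed accession code (228 aa)
initn: 1554 init1: 1554 opt: 1554 Z-score: 2958.0 expect() 3.5e-159
Smith-Waterman score: 1554; 100.000% identity in 228 aa overlap (1-228:1-228)

SGT:HI0374 S2F Selected Cloned Expressed Purified (228 aa)
initn: 1554 init1: 1554 opt: 1554 Z-score: 2958.0 expect() 3.5e-159
Smith-Waterman score: 1554; 100.000% identity in 228 aa overlap (1-228:1-228)
"""

for hit in parse_hits(REPORT):
    full_length = hit["identity"] == 100.0 and hit["overlap"] == hit["length"]
    print(hit["id"], hit["centre"], "full-length identical:", full_length)
```

Comparing the overlap length with the target length shows whether a 100% hit covers the whole target, as it does for both centres in this sample.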


Up-to-date target information is available from:

Not included in EBI BLAST/FASTA service:
  • Berlin Group

Just starting up, October 2002:
  • Structural Proteomics IN Europe (SPINE)




  • Please note: The sgt database of sequences is created from the public XML files. There are some differences in the XML between the sites. In addition, not all the site identifiers are unique. Until these inconsistencies are resolved, the data will not be updated routinely.



    1. Target data will be described using the XML syntax proposed by the International Task Force.

    Target data will be described according to the following skeleton DTD (updated 25 July 2001).

    2. The protocol for exchanging target data with the registration site will follow the Task Force recommendations:

    The targets file is a concatenation of individual target entries. Targets data files will be updated weekly. Each target entry represents a single protein, not a family. Target entries will not be deleted. Abandoned targets will be identified with a "work stopped" status code. Each center will provide an FTP address to be used to download target data. Centers will prepare targets files and FTP sites by 15 June 2001.

    3. The registration site will provide for FTP download of the concatenated set of targets from all NIH centers.

    4. A query interface will be provided at the registration site allowing search on any of the attributes in the data specification (i.e. lab, name, status, sequence, ...). The query interface will consist of a simple web form. Search results will be presented in the form of HTML tables.
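
The exchange and query steps in the list above can be illustrated with a short sketch. It is not the registration site's implementation: the element names `target`, `id`, `lab`, `name`, `status` and `sequence` are assumptions, since the skeleton DTD is not reproduced here, and the two sample entries are invented for illustration. The sketch reads a concatenation of target entries, searches on an attribute, and writes a FASTA record whose header mimics the sgt layout.

```python
import xml.etree.ElementTree as ET

# Invented sample data; tag names are assumptions, not the Task Force DTD.
SAMPLE = """
<target><id>T0001</id><lab>ExampleCenter</lab><name>hypothetical protein</name>
<status>cloned</status><sequence>MKTAYIAKQR</sequence></target>
<target><id>T0002</id><lab>ExampleCenter</lab><name>example kinase</name>
<status>work stopped</status><sequence>MNNGGKAEKE</sequence></target>
"""


def read_targets(concatenated: str) -> list:
    """Parse concatenated <target> entries into a list of dicts."""
    # A bare concatenation has no single root element, so wrap it in one.
    root = ET.fromstring("<targets>" + concatenated + "</targets>")
    return [
        {child.tag: (child.text or "").strip() for child in entry}
        for entry in root.findall("target")
    ]


def search(targets: list, **criteria) -> list:
    """Return targets whose attributes equal every given criterion."""
    return [t for t in targets
            if all(t.get(key) == value for key, value in criteria.items())]


def to_fasta(target: dict, width: int = 60) -> str:
    """Format one target as a FASTA record with an sgt-style header."""
    seq = target["sequence"]
    header = f">SGT:{target['id']} {target['lab']} {target['status']}"
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + chunks)


targets = read_targets(SAMPLE)
for stopped in search(targets, status="work stopped"):
    print(stopped["id"], stopped["name"])
print(to_fasta(targets[0]))
```

Wrapping the concatenation in a synthetic root only works if individual entries carry no XML declaration of their own; files that do would need to be split before parsing.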
