Structural genomics tutorial
Sameer Velankar

Find All Structural Genomics Entries

First, let's find all entries in the PDBe deposited by Structural Genomics Projects, using a crude search in MSDlite.

Put "Structural Genomics" in the Text Search field and hit Start search. You should get many pages of results: some 5000+ entries for this search.

 

Let us select one entry from the result list for further analysis. Go to Page 7 of the results by clicking on the page number tab of the results page.  Find the result line for the entry 1J6P and click on the ID code link on the left of the line. This will take you to the MSD atlas page for this entry.


Look through the various atlas pages for 1J6P. These pages contain as much information as possible about the structure, sequence, ligands, citations, etc. for this entry. There are also links to various related databases, such as SwissProt and PubMed.

In the similarity page of the 1J6P atlas pages, there are various links to related entries in the SCOP, CATH and PDBsum databases. Using MSDfold you can also find proteins in the PDB that are structurally similar with respect to their secondary structure.


Click on the compare link on the SSM line on this page. This will execute the MSDfold query, which can take up to several minutes to run, depending on the size of the search molecule. While it runs you can see the number of CPUs that the system is using to run your particular query.



MSDfold result list

Once the search is complete, you will be taken automatically to the results page:


The MSDfold query for PDB entry 1J6P returns only 4 structures that are likely to have a similar arrangement of secondary structure elements. The most meaningful value in the results table (for our purposes) is the Query % sse. This shows the percentage similarity, in terms of secondary structure elements, between the query structure and each result. The first two results in this list refer to the same protein (PDB entries 1J6P and 1P1M), which has no functional annotation in the PDB or SwissProt. Interestingly, the third and the fourth hits both correspond to cytosine deaminase (EC: 3.5.4.1) from E. coli. Both these results have only 15% sequence identity with 1J6P (the query molecule), but appear to be over 70% similar in secondary structure.
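The sequence identity figure quoted above can be illustrated with a small calculation. The following is a minimal sketch, assuming identity is counted over aligned, gap-free columns; MSDfold may define it differently, and the function name `percent_identity` and the toy fragments are hypothetical, not taken from 1J6P or 1K70.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free aligned columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where neither sequence has a gap character.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy aligned fragments (hypothetical, for illustration only).
TOY_A = "HAH-DLE"
TOY_B = "HSHG-LK"
print(f"{percent_identity(TOY_A, TOY_B):.1f}%")
```

A low value from a calculation like this says nothing about how well the secondary structure elements superpose, which is why the Query % sse column is the more useful one here.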

Look down the results and click on the link in the result line for PDB entry 1K70, the structure of E. coli cytosine deaminase. If you have RasMol installed, you can look at the superposition of this result with the search structure (the hypothetical protein, 1J6P). Alternatively, you can look down the table of results for this match and see the match schematically.

   


The superposition clearly shows that the fold of our hypothetical protein (1J6P) is very similar to that of E. coli cytosine deaminase. Similarity of fold does not necessarily translate into analogous function (the classic example is the TIM barrel fold, which is exploited for many different functions). To understand more about our hypothetical protein we need to investigate it in more detail at the residue level, because, as we can see from the MSDfold results, the sequence similarity is rather low. Taking a cue from the proteases subtilisin and chymotrypsin, where the folds are very different but the functional residues have the same three-dimensional disposition, we will now investigate whether the structural similarity of the hypothetical protein and cytosine deaminase translates into analogous function.

Let us try to find out more about the PDB entry 1K70. Go to the PDBe home page, enter 1K70 in the Get PDB by ID field and click go.


This will take you to the atlas page for the entry 1K70.


As we can see from the summary page, this protein is very well characterized and its structure has been determined in complex with the substrate analogue "4-HYDROXY-3,4-DIHYDRO-1H-PYRIMIDIN-2-ONE (HPY)" and "IRON (FE)", the latter being required for the hydrolase activity.

Analysis of binding sites

In the Ligand page of the 1K70 atlas pages, there are schematic diagrams of the various heterogens that are bound to the molecule. You can see the environment in which each ligand is found by clicking the link marked View the interactions of HPY with 1K70. This takes you to the MSDsite search tool. You can also look at the binding environment for iron (Fe) by clicking on the View the interactions of Fe with 1K70 link.

The MSDsite database catalogues the interactions between ligands in the PDB and the proteins to which they are bound. You can see all of the bonds, non-bonding interactions, etc. that the ligand makes with the protein, and you can upload your own structure and use the system to compare it against the whole PDB.


You can look at the details of the binding site using the two leftmost icon links which link  to different viewing programs:
AstexViewer@EBI-MSD - this is a simple Java applet that should work on any system which has Java installed
RasMol - this requires your machine to have RasMol already installed (may not work on all machines).

You can also look at the statistics for each type of site throughout the PDBe database by clicking on the small chart icons next to each hit. The red chart shows the database statistics for each kind of ligand, whilst the blue chart links to the statistics for environments similar to the one surrounding this particular instance of the ligand. The bonds link shows you the details of the interactions between the ligand and the protein.

It is now important to see whether any of the residues in the binding environment for HPY in PDB entry 1K70 are conserved in the hypothetical protein (PDB entry 1J6P). These residues, if conserved, may not show up in a sequence alignment, because the sequence identity between the two proteins is very low. Moreover, having the same geometrical arrangement of functional residues is more important for protein function than conserving their positions in the one-dimensional polypeptide sequence.

Therefore, while analysing the functional residues we will use the structural alignment obtained from MSDfold rather than a sequence alignment. The MSDfold service also provides a residue-by-residue 3D structural alignment for the query and target proteins. We need to check whether the residues given in the binding environment for HPY in 1K70 have identical or similar residues in a structurally equivalent location in the hypothetical protein 1J6P.
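How close two structurally equivalent residues are after superposition is usually summarised as a root-mean-square deviation (RMSD). The following is a minimal sketch of that calculation, assuming the two structures have already been superposed and that matching atoms are supplied in the same order; the function name `rmsd` and the toy coordinates are hypothetical and are not MSDfold's implementation.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between each pair of matched atoms.
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Toy coordinates (hypothetical): the second set is shifted by 1 unit along z.
SITE_A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
SITE_B = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(SITE_A, SITE_B))
```

Small values indicate that the matched residues occupy nearly the same position in both structures; larger values suggest the equivalence is looser.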

The residues we are interested in are (from the MSDsite page for HPY):

GLN156,GLU217,HIS246,ASP313,HIS63,LEU81,PHE154,HIS214,TRP319

Scrolling through the results from MSDfold we get the following:

PDB entry 1K70                  RMSD (Å)    PDB entry 1J6P
(E. coli cytosine deaminase)                (hypothetical protein)
His 63                          0.73        His 57
Leu 81                          4.52        Glu 73
Phe 154                         1.09        Gly 137
Gln 156                         -           No equivalent residue
His 214                         1.22        His 200
Glu 217                         1.54        Glu 203
His 246                         1.21        His 228
Asp 313                         0.29        Asp 279
Trp 319                         3.26        Ser 283
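The comparison listed above can be summarised by sorting each 1K70 residue into one of three groups: same residue type in 1J6P, different residue type, or no structural equivalent. The sketch below does this with the values transcribed from the MSDfold results; the names `ALIGNED_SITE` and `classify` are illustrative only.

```python
# Rows transcribed from the MSDfold comparison listed above:
# (1K70 residue, RMSD as listed, structurally equivalent 1J6P residue or None).
ALIGNED_SITE = [
    ("His 63", 0.73, "His 57"),
    ("Leu 81", 4.52, "Glu 73"),
    ("Phe 154", 1.09, "Gly 137"),
    ("Gln 156", None, None),
    ("His 214", 1.22, "His 200"),
    ("Glu 217", 1.54, "Glu 203"),
    ("His 246", 1.21, "His 228"),
    ("Asp 313", 0.29, "Asp 279"),
    ("Trp 319", 3.26, "Ser 283"),
]


def classify(rows):
    """Group binding-site residues by whether the residue type is kept in the target."""
    groups = {"conserved": [], "substituted": [], "unmatched": []}
    for query_res, _rmsd, target_res in rows:
        if target_res is None:
            groups["unmatched"].append(query_res)
        elif query_res.split()[0] == target_res.split()[0]:
            # Same three-letter residue type on both sides.
            groups["conserved"].append((query_res, target_res))
        else:
            groups["substituted"].append((query_res, target_res))
    return groups


GROUPS = classify(ALIGNED_SITE)
for label, members in GROUPS.items():
    print(f"{label}: {len(members)}")
```

Five of the nine residues keep their residue type, three are substituted, and Gln 156 has no equivalent. The conserved pairs are also the ones with the lowest listed RMSD values.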


The residues involved in the binding environment for iron (Fe) in E. coli cytosine deaminase are:

His61, His63, His214, Asp313


Interestingly, in the MSDfold residue-by-residue comparison list, His 61 from 1K70 (cytosine deaminase) is structurally equivalent to His 55 from our hypothetical protein (1J6P), with an RMSD of 0.65 Å.
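Putting the iron-binding residues together with the structural equivalences reported in this tutorial gives a quick check of whether the metal site is preserved. This is a minimal sketch: the His 61 / His 55 pair comes from the paragraph above and the other three pairs from the HPY binding-site comparison; the names `FE_LIGANDS_1K70`, `EQUIVALENT_IN_1J6P` and `metal_site_conserved` are hypothetical.

```python
# Iron-coordinating residues of 1K70, as listed in this tutorial.
FE_LIGANDS_1K70 = ["His 61", "His 63", "His 214", "Asp 313"]

# Structurally equivalent residues in 1J6P, as reported by MSDfold in this tutorial.
EQUIVALENT_IN_1J6P = {
    "His 61": "His 55",
    "His 63": "His 57",
    "His 214": "His 200",
    "Asp 313": "Asp 279",
}


def residue_type(label: str) -> str:
    """Return the three-letter residue type from a label such as 'His 61'."""
    return label.split()[0]


def metal_site_conserved(ligands, mapping) -> bool:
    """True if every ligand has an equivalent residue of the same type."""
    return all(
        res in mapping and residue_type(mapping[res]) == residue_type(res)
        for res in ligands
    )


print(metal_site_conserved(FE_LIGANDS_1K70, EQUIVALENT_IN_1J6P))
```

All four iron ligands have an equivalent of the same residue type in 1J6P, which is consistent with the hypothetical protein also binding a metal at this site.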

Conclusions

From the published literature, it is known that the cytosine deaminase active site has a striking similarity to the active site of adenosine deaminase (Ireton et al., J. Mol. Biol. 2002). The main differences are at positions Gln 156 and Trp 319 in cytosine deaminase, which are replaced by Gly and Phe respectively in adenosine deaminase. In the hypothetical protein, Asn 142 may be in proximity to the position of Gln 156 of E. coli cytosine deaminase. Gln 156 is important in the simultaneous coordination of atoms N1 and O2 of the cytosine substrate; the equivalent residue in adenosine deaminase is a glycine. Although adenosine deaminase has no structural equivalent to Trp 319 of cytosine deaminase, there are two phenylalanine residues in proximity. This is similar to the case of the hypothetical protein, which also has no structural equivalent for Trp 319 but has two phenylalanine residues (290 and 291) in the same area. This analysis shows that the hypothetical protein is likely to be a deaminase, although further structural analysis could shed light on the specificity of this protein.

A similar analysis has been done by the Joint Center for Structural Genomics for this hypothetical protein (by the authors of the PDB entry 1J6P). This analysis can be found here.

 


Last modified: Wed Sep 3 20:16:36 BST 2003