
PDBsum entry 1ymz

Unknown function
PDB id: 1ymz
Contents: Protein chain (37 a.a.)

References listed in PDB file
Key reference
Title Evolutionary information for specifying a protein fold.
Authors M.Socolich, S.W.Lockless, W.P.Russ, H.Lee, K.H.Gardner, R.Ranganathan.
Ref. Nature, 2005, 437, 512-518. [DOI no: 10.1038/nature03991]
PubMed id 16177782
Abstract
Classical studies show that for many proteins, the information required for specifying the tertiary structure is contained in the amino acid sequence. Here, we attempt to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information. Experimental testing of libraries of artificial WW domain sequences shows that a simple statistical energy function capturing coevolution between amino acid residues is necessary and sufficient to specify sequences that fold into native structures. The artificial proteins show thermodynamic stabilities similar to natural WW domains, and structure determination of one artificial protein shows excellent agreement with the WW fold at atomic resolution. The relative simplicity of the information used for creating sequences suggests a marked reduction to the potential complexity of the protein-folding problem.
Figure 1.
Figure 1: SCA-based protein design. a, Structure of a representative WW domain (Nedd4.3, Protein Data Bank 1I5H) in complex with a target peptide (in stick representation). The two canonical tryptophans are shown as space-filling side chains. The figure was prepared using PyMol (ref. 51). b, SCA conservation scores for each position in the WW alignment in arbitrary units of statistical energy (ref. 12). Position numbers (x axis) and the secondary structure diagram at the top coincide with matrix columns in c-e. c, A matrix representation of statistical coupling values from perturbation analysis of five positions (rows) in the WW domain MSA. d, The matrix for an alignment of IC sequences, built by randomly selecting amino acids at each site from the observed frequency distributions in the natural alignment. e, The matrix for an alignment of CC sequences, derived from a design algorithm where both the conservation pattern and the pattern of statistical couplings in the natural alignment are preserved. Scale bar shows the SCA coevolution score, ranging from 0 (blue) to 2 (red).
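The caption for panel d describes how IC sequences are generated: each site is filled by drawing an amino acid from that column's observed frequency distribution in the natural alignment. A minimal sketch of that sampling step is given below, assuming the alignment is a list of equal-length strings and that gap characters are treated as just another symbol. The function names `column_frequencies` and `sample_ic_sequences` and the toy alignment are hypothetical illustrations, not the authors' code or real WW domain sequences.

```python
import random
from collections import Counter


def column_frequencies(msa):
    """Return, for each alignment column, a dict mapping residue -> observed frequency.

    Assumption: gaps ('-') are counted like any other symbol.
    """
    if not msa or len({len(seq) for seq in msa}) != 1:
        raise ValueError("alignment must be a non-empty list of equal-length strings")
    freqs = []
    for i in range(len(msa[0])):
        counts = Counter(seq[i] for seq in msa)
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()})
    return freqs


def sample_ic_sequences(msa, n, seed=None):
    """Draw n sequences, sampling every position independently from its column distribution.

    Single-site frequencies are reproduced (up to sampling noise), but any
    correlation between positions in the natural alignment is discarded.
    """
    rng = random.Random(seed)
    freqs = column_frequencies(msa)
    sequences = []
    for _ in range(n):
        residues = []
        for col in freqs:
            symbols = list(col)
            weights = [col[s] for s in symbols]
            residues.append(rng.choices(symbols, weights=weights, k=1)[0])
        sequences.append("".join(residues))
    return sequences


# Toy alignment (hypothetical, for illustration only).
toy_msa = ["GWEEA", "GWEKA", "AWTEA", "GWTKS"]
for seq in sample_ic_sequences(toy_msa, 5, seed=0):
    print(seq)
```

Because positions are drawn independently, a fully conserved column (here the W at position 2) is always reproduced, while the pairing between residues at different positions is lost. That loss of inter-site correlation is what separates the IC alignment in panel d from the CC alignment in panel e.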
Figure 4.
Figure 4: Summary of experiments on all natural and artificial WW sequences. a, A pie chart showing the outcomes of folding studies for natural (n = 42), CC (n = 43), IC (n = 43), or random (n = 19) WW sequences. Red, natively folded; blue, soluble but unfolded; yellow, insoluble; grey, poor expressing. b, Melting temperatures (Tm) and van't Hoff enthalpies of unfolding for all folded WW sequences. Open circles indicate natural sequences and filled circles indicate the 12 folded CC sequences. The artificial sequences show thermodynamic parameters that fall into the same range as that of natural WW domains.
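Panel b characterizes each folded domain by two numbers, Tm and the van't Hoff enthalpy of unfolding. As a hedged illustration of what these parameters describe, the sketch below assumes the textbook two-state unfolding model with zero heat-capacity change, in which the unfolding free energy is dH_vH * (1 - T/Tm). This is an assumed model for illustration, not the fitting procedure reported by the authors, and the numeric values in the example are hypothetical rather than taken from the paper.

```python
import math

# Gas constant in kJ / (mol K).
R = 8.314462618e-3


def fraction_folded(T, Tm, dH_vH):
    """Fraction of molecules folded at temperature T under a two-state model.

    Assumptions: two-state F <-> U equilibrium, heat-capacity change of zero,
    T and Tm in kelvin, dH_vH (van't Hoff enthalpy of unfolding) in kJ/mol.
    """
    dG_unfold = dH_vH * (1.0 - T / Tm)          # zero at T == Tm
    K_unfold = math.exp(-dG_unfold / (R * T))   # [U]/[F]
    return 1.0 / (1.0 + K_unfold)


# Hypothetical parameters, for illustration only.
Tm, dH = 330.0, 150.0
for T in (300.0, 320.0, 330.0, 340.0, 360.0):
    print(f"T = {T:.0f} K  fraction folded = {fraction_folded(T, Tm, dH):.3f}")
```

In this model Tm fixes the midpoint of the melting curve (exactly half the molecules are folded there), while a larger van't Hoff enthalpy makes the transition sharper around that midpoint.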
The above figures are reprinted by permission from Macmillan Publishers Ltd: Nature (2005, 437, 512-518) copyright 2005.