PDBsum entry 1ymz

Contents
Protein chain: 37 a.a.
PDB id: 1ymz
Name: Unknown function
Title: CC45, an artificial WW domain designed using statistical coupling analysis
Structure: CC45. Chain: A. Engineered: yes
Source: Synthetic: yes. Other details: computational design
NMR struc: 10 models
Authors: M.Socolich, S.W.Lockless, W.P.Russ, H.Lee, K.H.Gardner, R.Ranganathan
Key ref: M.Socolich et al. (2005). Evolutionary information for specifying a protein fold. Nature, 437, 512-518. PubMed id: 16177782 DOI: 10.1038/nature03991
Date: 22-Jan-05     Release date: 27-Sep-05

Protein chain
No UniProt id for this chain
Struc: 37 a.a.
 
DOI no: 10.1038/nature03991 Nature 437:512-518 (2005)
PubMed id: 16177782  
 
 
Evolutionary information for specifying a protein fold.
M.Socolich, S.W.Lockless, W.P.Russ, H.Lee, K.H.Gardner, R.Ranganathan.
 
  ABSTRACT  
 
Classical studies show that for many proteins, the information required for specifying the tertiary structure is contained in the amino acid sequence. Here, we attempt to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information. Experimental testing of libraries of artificial WW domain sequences shows that a simple statistical energy function capturing coevolution between amino acid residues is necessary and sufficient to specify sequences that fold into native structures. The artificial proteins show thermodynamic stabilities similar to natural WW domains, and structure determination of one artificial protein shows excellent agreement with the WW fold at atomic resolution. The relative simplicity of the information used for creating sequences suggests a marked reduction to the potential complexity of the protein-folding problem.
 
  Selected figure(s)  
 
Figure 1: SCA-based protein design. a, Structure of a representative WW domain (Nedd4.3, Protein Data Bank 1I5H) in complex with a target peptide (in stick representation). The two canonical tryptophans are shown as space-filling side chains. The figure was prepared using PyMol (ref. 51). b, SCA conservation scores for each position in the WW alignment in arbitrary units of statistical energy (ref. 12). Position numbers (x axis) and the secondary structure diagram at the top coincide with matrix columns in c-e. c, A matrix representation of statistical coupling values from perturbation analysis of five positions (rows) in the WW domain MSA. d, The matrix for an alignment of IC sequences, built by randomly selecting amino acids at each site from the observed frequency distributions in the natural alignment. e, The matrix for an alignment of CC sequences, derived from a design algorithm where both the conservation pattern and the pattern of statistical couplings in the natural alignment are preserved. Scale bar shows the SCA coevolution score, ranging from 0 (blue) to 2 (red).
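The construction of IC sequences described for panel d (drawing an amino acid independently at each site from the frequencies observed in the natural alignment) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy alignment and the use of Python's standard random module are all assumptions made for illustration, and gap handling and sequence weighting are ignored.

```python
import random
from collections import Counter


def site_frequencies(alignment):
    """Return one Counter of residue counts per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(seq[i] for seq in alignment) for i in range(length)]


def sample_ic_sequences(alignment, n, rng=None):
    """Draw n sequences, sampling every site independently.

    Each position is drawn from the residue frequencies observed in the
    corresponding column, so per-site conservation is preserved on average
    while any coupling between positions is discarded.
    """
    rng = rng or random.Random()
    columns = site_frequencies(alignment)
    sequences = []
    for _ in range(n):
        residues = [
            rng.choices(list(col.keys()), weights=list(col.values()))[0]
            for col in columns
        ]
        sequences.append("".join(residues))
    return sequences


# Hypothetical four-residue toy alignment, not real WW domain data.
toy_alignment = ["WEKR", "WEMR", "WTKR", "WEKK"]
ic = sample_ic_sequences(toy_alignment, 5, rng=random.Random(0))
```

Because the first toy column contains only W, every sampled sequence starts with W, while the variable columns mix freely. CC sequences, by contrast, would additionally require a search that matches the pairwise coupling pattern, which this sketch does not attempt.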
Figure 4: Summary of experiments on all natural and artificial WW sequences. a, A pie chart showing the outcomes of folding studies for natural (n = 42), CC (n = 43), IC (n = 43), or random (n = 19) WW sequences. Red, natively folded; blue, soluble but unfolded; yellow, insoluble; grey, poor expressing. b, Melting temperatures (Tm) and van't Hoff enthalpies of unfolding for all folded WW sequences. Open circles indicate natural sequences and filled circles indicate the 12 folded CC sequences. The artificial sequences show thermodynamic parameters that fall into the same range as that of natural WW domains.
 
  The above figures are reprinted by permission from Macmillan Publishers Ltd: Nature (2005, 437, 512-518) copyright 2005.  
  Figures were selected by an automated process.  

Literature references that cite this PDB file's key reference

  PubMed id Reference
23041932 R.N.McLaughlin, F.J.Poelwijk, A.Raman, W.S.Gosal, and R.Ranganathan (2012).
The spatial architecture of protein function and adaptation.
  Nature, 491, 138-142.  
21365689 D.Armenta-Medina, E.Pérez-Rueda, and L.Segovia (2011).
Identification of functional motions in the adenylate kinase (ADK) protein family by computational hybrid approaches.
  Proteins, 79, 1662-1671.  
20529650 A.Chakravarti (2010).
ASHG Awards and Addresses. 2008 Presidential address: Principia genetica: our future science.
  Am J Hum Genet, 86, 302-308.  
21124869 A.D.van Dijk, G.Morabito, M.Fiers, R.C.van Ham, G.C.Angenent, and R.G.Immink (2010).
Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction.
  PLoS Comput Biol, 6, e1001017.  
20979667 A.D.van Dijk, and R.C.van Ham (2010).
Conserved and variable correlated mutations in the plant MADS protein network.
  BMC Genomics, 11, 607.  
20714644 A.Ernst, D.Gfeller, Z.Kan, S.Seshagiri, P.M.Kim, G.D.Bader, and S.S.Sidhu (2010).
Coevolution of PDZ domain-ligand interactions analyzed by high-throughput phage display and deep sequencing.
  Mol Biosyst, 6, 1782-1790.  
20862353 A.Kowarsch, A.Fuchs, D.Frishman, and P.Pagel (2010).
Correlated mutations: a hallmark of phenotypic amino acid substitutions.
  PLoS Comput Biol, 6, 0.  
20159780 C.L.Kleinman, N.Rodrigue, N.Lartillot, and H.Philippe (2010).
Statistical potentials for improved structurally constrained evolutionary models.
  Mol Biol Evol, 27, 1546-1560.  
20975933 M.Lunzer, G.B.Golding, and A.M.Dean (2010).
Pervasive cryptic epistasis in molecular evolution.
  PLoS Genet, 6, e1001162.  
20463972 M.Schmidt Am Busch, A.Sedano, and T.Simonson (2010).
Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition.
  PLoS One, 5, e10410.  
20949088 Q.S.Du, C.H.Wang, S.M.Liao, and R.B.Huang (2010).
Correlation analysis for protein evolutionary family based on amino acid position mutations and application in PDZ domain.
  PLoS One, 5, e13207.  
20865007 R.G.Smock, O.Rivoire, W.P.Russ, J.F.Swain, S.Leibler, R.Ranganathan, and L.M.Gierasch (2010).
An interdomain sector mediating allostery in Hsp70 molecular chaperones.
  Mol Syst Biol, 6, 414.  
20596526 R.J.Dickson, L.M.Wahl, A.D.Fernandes, and G.B.Gloor (2010).
Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation.
  PLoS One, 5, e11082.  
20551042 S.C.Lovell, and D.L.Robertson (2010).
An integrated view of molecular coevolution in protein-protein interactions.
  Mol Biol Evol, 27, 2567-2575.  
21168766 S.M.Lippow, T.S.Moon, S.Basu, S.H.Yoon, X.Li, B.A.Chapman, K.Robison, D.Lipovšek, and K.L.Prather (2010).
Engineering enzyme specificity using computational design of a defined-sequence library.
  Chem Biol, 17, 1306-1315.  
20212159 T.Mora, A.M.Walczak, W.Bialek, and C.G.Callan (2010).
Maximum entropy models for antibody diversity.
  Proc Natl Acad Sci U S A, 107, 5405-5410.  
20377457 W.Zheng, K.E.Griswold, and C.Bailey-Kellogg (2010).
Protein fragment swapping: a method for asymmetric, selective site-directed recombination.
  J Comput Biol, 17, 459-475.  
19628501 B.C.Lee, and D.Kim (2009).
A new method for revealing correlated mutations under the structural and functional constraints in proteins.
  Bioinformatics, 25, 2506-2513.  
20007785 C.T.Wong Po Foo, J.S.Lee, W.Mulyasasmita, A.Parisi-Amon, and S.C.Heilshorn (2009).
Two-component protein-engineered physical hydrogels for cell encapsulation.
  Proc Natl Acad Sci U S A, 106, 22067-22072.  
19415757 F.A.Buske, R.Their, E.M.Gillam, and M.Bodén (2009).
In silico characterization of protein chimeras: relating sequence and function within the same fold.
  Proteins, 77, 111-120.  
19368895 H.J.Chang, H.J.Hsu, C.F.Chang, H.P.Peng, Y.K.Sun, H.M.Yu, H.C.Shih, C.Y.Song, Y.T.Lin, C.C.Chen, C.H.Wang, and A.S.Yang (2009).
Molecular evolution of cystine-stabilized miniproteins as stable proteinaceous binders.
  Structure, 17, 620-631.  
19905153 J.D.Fitzgerald, and T.O.Sharpee (2009).
Maximally informative pairwise interactions in networks.
  Phys Rev E Stat Nonlin Soft Matter Phys, 80, 031914.  
19525973 J.Gao, D.A.Bosco, E.T.Powers, and J.W.Kelly (2009).
Localized thermodynamic coupling between hydrogen bonding and microenvironment polarity substantially stabilizes proteins.
  Nat Struct Mol Biol, 16, 684-690.  
19730675 J.L.Lahti, A.P.Silverman, and J.R.Cochran (2009).
Interrogating and predicting tolerated sequence diversity in protein folds: application to E. elaterium trypsin inhibitor-II cystine-knot miniprotein.
  PLoS Comput Biol, 5, e1000499.  
19644177 J.Thomas, N.Ramakrishnan, and C.Bailey-Kellogg (2009).
Protein design by sampling an undirected graphical model of residue constraints.
  IEEE/ACM Trans Comput Biol Bioinform, 6, 506-516.  
19565466 M.Jäger, M.Dendle, and J.W.Kelly (2009).
Sequence determinants of thermodynamic stability in a WW domain--an all-beta-sheet protein.
  Protein Sci, 18, 1806-1813.  
19703402 N.Halabi, O.Rivoire, S.Leibler, and R.Ranganathan (2009).
Protein sectors: evolutionary units of three-dimensional structure.
  Cell, 138, 774-786.  
19816556 O.N.Demerdash, M.D.Daily, and J.C.Mitchell (2009).
Structure-based predictive models for allosteric hot spots.
  PLoS Comput Biol, 5, e1000531.  
20011105 O.Noivirt-Brik, A.Horovitz, and R.Unger (2009).
Trade-off between positive and negative design of protein stability: from lattice models to real proteins.
  PLoS Comput Biol, 5, e1000592.  
18837035 P.C.Whitford, J.K.Noel, S.Gosavi, A.Schug, K.Y.Sanbonmatsu, and J.N.Onuchic (2009).
An all-atom structure-based potential for proteins: bridging minimal models with all-atom empirical forcefields.
  Proteins, 75, 430-441.  
19321008 R.Alves, E.Vilaprinyo, A.Sorribas, and E.Herrero (2009).
Evolution based on domain combinations: the case of glutaredoxins.
  BMC Evol Biol, 9, 66.  
19193735 S.G.Williams, and S.C.Lovell (2009).
The effect of sequence evolution on protein structural divergence.
  Mol Biol Evol, 26, 1055-1065.  
19274733 S.Lukman, and G.H.Grant (2009).
A network of dynamically conserved residues deciphers the motions of maltose transporter.
  Proteins, 76, 588-597.  
19262747 S.N.Fatakia, S.Costanzi, and C.C.Chow (2009).
Computing highly correlated positions using mutual information and graph theory for G protein-coupled receptors.
  PLoS One, 4, e4681.  
19541616 S.W.Lockless, and T.W.Muir (2009).
Traceless protein splicing utilizing evolved split inteins.
  Proc Natl Acad Sci U S A, 106, 10999-11004.  
19054790 V.Pabuwal, and Z.Li (2009).
Comparative analysis of the packing topology of structurally important residues in helical membrane and soluble proteins.
  Protein Eng Des Sel, 22, 67-73.  
19282960 W.Stacklies, M.C.Vega, M.Wilmanns, and F.Gräter (2009).
Mechanical network in titin immunoglobulin from force distribution analysis.
  PLoS Comput Biol, 5, e1000306.
PDB code: 1waa
19424487 Y.Roudi, S.Nirenberg, and P.E.Latham (2009).
Pairwise maximum entropy models for studying large biological systems: when they can work and when they can't.
  PLoS Comput Biol, 5, e1000380.  
18675276 A.S.Nascimento, S.Krauchenco, A.M.Golubev, A.Gustchina, A.Wlodawer, and I.Polikarpov (2008).
Statistical coupling analysis of aspartic proteinases based on crystal structures of the Trichoderma reesei enzyme and its complex with pepstatin A.
  J Mol Biol, 382, 763-778.
PDB codes: 3c9x 3c9y 3emy
18190694 B.Adam, B.Charloteaux, J.Beaufays, L.Vanhamme, E.Godfroid, R.Brasseur, and L.Lins (2008).
Distantly related lipocalins share two conserved clusters of hydrophobic residues: use in homology modeling.
  BMC Struct Biol, 8, 1.  
18275083 B.C.Lee, K.Park, and D.Kim (2008).
Analysis of the residue-residue coevolution network and the functionally important residues in proteins.
  Proteins, 72, 863-872.  
18818697 F.Pazos, and A.Valencia (2008).
Protein co-evolution, co-adaptation and interactions.
  EMBO J, 27, 2648-2655.  
18004776 G.J.Bartlett, and W.R.Taylor (2008).
Using scores derived from statistical coupling analysis to distinguish correct and incorrect folds in de-novo protein structure prediction.
  Proteins, 71, 950-959.  
18247351 G.Morra, and G.Colombo (2008).
Relationship between energy distribution and fold stability: Insights from molecular dynamics simulations of native and mutant proteins.
  Proteins, 72, 660-672.  
18765810 G.S.Chang, Y.Hong, K.D.Ko, G.Bhardwaj, E.C.Holmes, R.L.Patterson, and D.B.van Rossum (2008).
Phylogenetic profiles reveal evolutionary relationships within the "twilight zone" of sequence similarity.
  Proc Natl Acad Sci U S A, 105, 13474-13479.  
18555780 J.M.Skerker, B.S.Perchuk, A.Siryaporn, E.A.Lubin, O.Ashenberg, M.Goulian, and M.T.Laub (2008).
Rewiring the specificity of two-component signal transduction systems.
  Cell, 133, 1043-1054.  
18451428 J.Thomas, N.Ramakrishnan, and C.Bailey-Kellogg (2008).
Graphical models of residue coupling in protein families.
  IEEE/ACM Trans Comput Biol Bioinform, 5, 183-197.  
18056067 K.Y.Yip, P.Patel, P.M.Kim, D.M.Engelman, D.McDermott, and M.Gerstein (2008).
An integrated system for studying residue coevolution in proteins.
  Bioinformatics, 24, 290-292.  
17957766 M.D.Daily, T.J.Upadhyaya, and J.J.Gray (2008).
Contact rearrangements form coupled networks from local motions in allosteric proteins.
  Proteins, 71, 455-466.  
18844292 M.Jäger, S.Deechongkit, E.K.Koepf, H.Nguyen, J.Gao, E.T.Powers, M.Gruebele, and J.W.Kelly (2008).
Understanding the mechanism of beta-sheet folding from a chemical and biological perspective.
  Biopolymers, 90, 751-758.  
17972288 O.Rahat, A.Yitzhaky, and G.Schreiber (2008).
Cluster conservation as a novel tool for studying protein-protein interactions evolution.
  Proteins, 71, 621-630.  
18650393 S.Gosavi, P.C.Whitford, P.A.Jennings, and J.N.Onuchic (2008).
Extracting function from a beta-trefoil folding motif.
  Proc Natl Acad Sci U S A, 105, 10384-10389.  
17588526 T.R.Jahn, and S.E.Radford (2008).
Folding versus aggregation: polypeptide conformations on competing pathways.
  Arch Biochem Biophys, 469, 100-117.  
17905840 T.R.Weikl (2008).
Transition states in protein folding kinetics: modeling phi-values of small beta-sheet proteins.
  Biophys J, 94, 929-937.  
17197416 A.D.Ferguson, C.A.Amezcua, N.M.Halabi, Y.Chelliah, M.K.Rosen, R.Ranganathan, and J.Deisenhofer (2007).
Signal transduction pathway of TonB-dependent transporters.
  Proc Natl Acad Sci U S A, 104, 513-518.
PDB codes: 1zzv 2a02
18065429 A.Fuchs, A.J.Martin-Galiano, M.Kalman, S.Fleishman, N.Ben-Tal, and D.Frishman (2007).
Co-evolving residues in membrane proteins.
  Bioinformatics, 23, 3312-3319.  
17243158 E.Eyal, M.Frenkel-Morgenstern, V.Sobolev, and S.Pietrokovski (2007).
A pair-to-pair amino acids substitution matrix and its applications for protein structure prediction.
  Proteins, 67, 142-153.  
17477839 J.R.Banavar, and A.Maritan (2007).
Physics of proteins.
  Annu Rev Biophys Biomol Struct, 36, 261-280.  
17332019 J.S.Papadopoulos, and R.Agarwala (2007).
COBALT: constraint-based alignment tool for multiple protein sequences.
  Bioinformatics, 23, 1073-1079.  
17704159 J.Yu, T.Ha, and K.Schulten (2007).
How directional translocation is regulated in a DNA helicase motor.
  Biophys J, 93, 3783-3797.  
17563360 M.A.Wright, P.Kharchenko, G.M.Church, and D.Segrè (2007).
Chromosomal periodicity of evolutionarily conserved gene pairs.
  Proc Natl Acad Sci U S A, 104, 10559-10564.  
17586778 M.Jäger, H.Nguyen, M.Dendle, M.Gruebele, and J.W.Kelly (2007).
Influence of hPin1 WW N-terminal domain boundaries on function, protein stability, and folding.
  Protein Sci, 16, 1495-1501.  
17686488 M.Meiyappan, G.Birrane, and J.A.Ladias (2007).
Structural basis for polyproline recognition by the FE65 WW domain.
  J Mol Biol, 372, 970-980.
PDB codes: 2ho2 2idh 2oei
17915013 R.Gouveia-Oliveira, and A.G.Pedersen (2007).
Finding coevolving amino acid residues using row and column weighting of mutual information and multi-dimensional amino acid representation.
  Algorithms Mol Biol, 2, 12.  
17510961 S.Höfinger, B.Almeida, and U.H.Hansmann (2007).
Parallel tempering molecular dynamics folding simulation of a signal peptide in explicit water.
  Proteins, 68, 662-669.  
17240343 S.Meier, P.R.Jensen, C.N.David, J.Chapman, T.W.Holstein, S.Grzesiek, and S.Ozbek (2007).
Continuous molecular evolution of protein-domain structures by single amino acid changes.
  Curr Biol, 17, 173-178.
PDB codes: 2hm4 2hm5 2hm6
17360527 T.Liu, S.T.Whitten, and V.J.Hilser (2007).
Functional residues serve a dominant role in mediating the cooperativity of the protein ensemble.
  Proc Natl Acad Sci U S A, 104, 4347-4352.  
17179210 T.P.Treynor, C.L.Vizcarra, D.Nedelcu, and S.L.Mayo (2007).
Computationally designed libraries of fluorescent proteins evaluated by preservation and diversity of function.
  Proc Natl Acad Sci U S A, 104, 48-53.  
17922750 T.S.Wong, D.Roccatano, and U.Schwaneberg (2007).
Steering directed protein evolution: strategies to manage combinatorial complexity of mutant libraries.
  Environ Microbiol, 9, 2645-2659.  
16477649 A.M.Marcelino, R.G.Smock, and L.M.Gierasch (2006).
Evolutionary coupling of structural and functional sequence information in the intracellular lipid-binding protein family.
  Proteins, 63, 373-384.  
16843652 A.M.Poole, and R.Ranganathan (2006).
Knowledge-based potentials in protein design.
  Curr Opin Struct Biol, 16, 508-513.  
16914055 E.Roberts, J.Eargle, D.Wright, and Z.Luthey-Schulten (2006).
MultiSeq: unifying sequence and structure data for evolutionary analysis.
  BMC Bioinformatics, 7, 382.  
16789811 G.Grigoryan, F.Zhou, S.R.Lustig, G.Ceder, D.Morgan, and A.E.Keating (2006).
Ultra-fast evaluation of protein energies directly from sequence.
  PLoS Comput Biol, 2, e63.  
17005391 H.B.Fraser (2006).
Coevolution, modularity and human disease.
  Curr Opin Genet Dev, 16, 637-644.  
17154716 J.Zhang, and J.S.Liu (2006).
On side-chain conformational entropy of proteins.
  PLoS Comput Biol, 2, e168.  
16807295 M.Jäger, Y.Zhang, J.Bieschke, H.Nguyen, M.Dendle, M.E.Bowman, J.P.Noel, M.Gruebele, and J.W.Kelly (2006).
Structure-function-folding relationship in a WW domain.
  Proc Natl Acad Sci U S A, 103, 10648-10653.
PDB codes: 1zcn 2f21
16680196 P.Wong, and D.Frishman (2006).
Fold designability, distribution, and disease.
  PLoS Comput Biol, 2, e40.  
17381287 S.R.Eddy (2006).
Computational analysis of RNAs.
  Cold Spring Harb Symp Quant Biol, 71, 117-128.  
16704729 T.P.Roosild, M.Vega, S.Castronovo, and S.Choe (2006).
Characterization of the family of Mistic homologues.
  BMC Struct Biol, 6, 10.  
16737532 U.Bastolla, M.Porto, H.E.Roman, and M.Vendruscolo (2006).
A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank.
  BMC Evol Biol, 6, 43.  
16281353 A.Doerr (2005).
Tackling biology's big question.
  Nat Methods, 2, 803.  
16177774 J.W.Kelly (2005).
Structural biology: form and function instructions.
  Nature, 437, 486-487.  
The most recent references are shown first. Citation data come partly from CiteXplore and partly from an automated harvesting procedure. Note that this is likely to be only a partial list, as not all journals are covered by either method. However, we are continually building up the citation data, so more references will be included with time. Where a reference describes a PDB structure, the PDB code is shown beneath the entry.

 
