Transformers: The science, not the fiction

Can a single sequence of amino acids have more than one three-dimensional fold to achieve more than one physiological function? In the past the answer to this question might have been ‘no’, but over the last decade or two the identification of proteins with variant structures, such as metamorphic proteins and prions, has challenged this view. More recently, research on the bacterial transcription factor RfaH has shown that it undergoes such an extreme conformational transformation that it has been called a ‘transformer protein’. It is able to switch between two completely different secondary and tertiary structures, each with a specific physiological function.

NusG family: Avoiding premature termination

RfaH controls the transcription of a set of genes with a DNA sequence called the operon polarity suppressor (ops) element in the leader region rather than the more usual Shine-Dalgarno element. In E. coli, these genes are involved in the production of cell-surface antigens and toxins, so they are potentially of medical interest. RfaH is a member of the NusG-like family of transcription proteins which, in bacteria, characteristically have two conserved domains connected by a flexible linker (e.g. PDB entry 1npr). The globular N-terminal domain (e.g. PDB entry 2k06) is of mixed α/β topology with a central four-stranded antiparallel β-sheet (view-1). It binds to RNA polymerase via a highly nonpolar surface, increasing the processivity of the RNA polymerase and thus helping to prevent premature termination of mRNA transcription. A five-stranded β-barrel (view-1) containing the KOW motif forms the C-terminal domain (e.g. PDB entry 2jvv) that is known to bind to the ribosomal protein S10 (also called NusE). The interaction with S10 means that the elongating polymerase complex is attached to a ribosome, so the nascent mRNA can be translated immediately. Thus, NusG proteins couple transcription and translation. NusG proteins from some organisms, such as that from the bacterium Aquifex aeolicus shown in (view-1), contain a non-conserved sequence inserted in the N-terminal domain.

Some prime questions

Determination of the crystal structure of RfaH from E. coli (PDB entry 2oug) yielded an unexpected result. Its N-terminal domain has the same topology as other NusG proteins and has a hydrophobic RNA polymerase-binding surface (view-2). As in all other known NusG-type proteins, the linker between the two domains is flexible, but the C-terminal domain was found to have a radically different topology. In stark contrast to the β-barrel fold of all known NusG C-terminal domains, the C-terminal domain of RfaH consists of two antiparallel α-helices forming a coiled-coil structure which packs against the RNA polymerase-binding surface of the N-terminal domain.

The interface between the two domains is mostly hydrophobic and buries 900 Å² of each domain’s surface. Only one polar residue per domain is present in the contact region: in the C-terminal domain, an arginine that forms a salt bridge with a glutamate in the N-terminal domain. Two backbone carbonyl oxygen atoms in the N-terminal domain also form hydrogen bonds with this arginine (view-3). The domain interaction occludes the RNA polymerase-binding site on the N-terminal domain and so prevents RfaH binding to the polymerase. This state is called autoinhibited because the conformation of the molecule itself prevents it from performing its function as a transcription factor.

The RfaH crystal structure raised some questions. Is the different fold a curious artifact or physiologically relevant? RfaH specifically interacts with a small number of genes containing the ops element, so have the function and fold of the C-terminal domain changed as RfaH has evolved this specificity? Is it possible that the C-terminal domain can exist in two drastically different conformations, α-helices and a β-barrel?

Resonant answers

Using nuclear magnetic resonance (NMR) spectroscopy, Paul Rösch and his group addressed these questions. First, the Rösch group solved the structure of the isolated RfaH C-terminal domain by NMR (PDB entry 2lcl). Surprisingly, this revealed a five-stranded β-barrel, the same fold as for other NusG C-terminal domains (view-4), but totally different to the structure seen in the crystallised full-length protein. This suggested that the C-terminal domain is α-helical only when in contact with the N-terminal domain, but assumes a completely different fold when not bound to it.

A structural switch

Next, the Rösch group were able to demonstrate the α-helix-to-β-barrel transition in full-length RfaH using two engineered variants. In the first variant, a glutamate (highlighted in view-3) was replaced by a serine, which weakens the polar interaction across the domain interface. NMR experiments showed that this variant exists in two conformational states, with the C-terminal domain present in nearly equal amounts of the α-helical and β-barrel form. In the second variant, a TEV protease cleavage site was introduced into the flexible linker between the two RfaH domains. Changes in this protein were monitored by NMR spectroscopy after addition of trace amounts of TEV protease which cleaved the linker and thus separated the domains. The conversion of the C-terminal domain from the α-helical into the β-barrel form could be clearly observed as more of the RfaH was cleaved and the domains dissociated. Both experiments demonstrate that the C-terminal domain spontaneously refolds from the α-helical into the β-barrel form upon dissociation from the N-terminal domain.

Getting to the bottom of the barrel

NMR experiments showed that the free C-terminal domain was able to bind to the ribosomal S10 protein and so, like NusG, RfaH can couple transcription and translation. By observing which amino acids showed changes in the chemical shift of their amide protons and nitrogens on addition of S10, the binding site for S10 could be mapped onto RfaH. When compared to the known complex of NusG bound to S10 from E. coli (PDB entry 2kvg), it was apparent that the RfaH residues which showed chemical-shift changes were equivalent to NusG residues involved in S10 binding (view-5). It seems likely therefore that the C-terminal domains of NusG and RfaH bind S10 in the same manner.
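The mapping step described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the procedure used in the original study: the function names, the nitrogen weighting factor (0.14) and the cut-off (0.05 ppm) are assumptions chosen for the example, and the shift values are made up.

```python
import math


def combined_shift_change(delta_h_ppm, delta_n_ppm, n_scale=0.14):
    """Combine amide 1H and 15N shift changes (in ppm) into one number.

    n_scale down-weights the nitrogen dimension; 0.14 is an assumed value.
    """
    return math.sqrt(delta_h_ppm ** 2 + (n_scale * delta_n_ppm) ** 2)


def perturbed_residues(shifts_free, shifts_bound, threshold=0.05):
    """Return residue numbers whose combined shift change exceeds threshold.

    Both arguments map residue number -> (1H shift, 15N shift) in ppm.
    Only residues present in both data sets are compared.
    """
    hits = []
    for res in sorted(shifts_free.keys() & shifts_bound.keys()):
        h_free, n_free = shifts_free[res]
        h_bound, n_bound = shifts_bound[res]
        change = combined_shift_change(h_bound - h_free, n_bound - n_free)
        if change > threshold:
            hits.append(res)
    return hits


# Made-up illustrative values, not measured data.
free = {10: (8.20, 120.0), 11: (7.95, 118.5), 12: (8.60, 125.0)}
bound = {10: (8.21, 120.1), 11: (8.10, 119.5), 12: (8.60, 125.0)}

print(perturbed_residues(free, bound))  # [11]
```

Residues flagged in this way can then be highlighted on the structure to outline the likely binding surface.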

The two different folds of the C-terminal domain enable RfaH to accomplish two functions. In the α-helical form, RfaH is autoinhibited. In this state it is unable to bind to RNA polymerase and so cannot act as a transcription factor. It seems that the two domains dissociate only if RfaH encounters the RNA polymerase bound to an ops element. Thus, the autoinhibited state prevents RfaH binding to the RNA polymerase when other genes are being transcribed and makes RfaH an ops element-specific transcription factor. Once the N-terminal and C-terminal domains dissociate, the β-barrel conformation of the C-terminal domain can interact with ribosomal protein S10 while the N-terminal domain interacts with RNA polymerase. This links the RNA polymerase that produces the mRNA to the ribosome, thereby coupling transcription and translation (Fig. 1).

Figure 1. In the autoinhibited state, the α-helical C-terminal domain of RfaH (magenta) binds to the N-terminal domain (blue) and blocks the RNA polymerase-binding site. On encountering RNA polymerase (green) bound to ops element DNA, the C-terminal domain of RfaH dissociates and refolds to a β-barrel, enabling the N-terminal domain to interact with the RNA polymerase and prevent premature termination of the mRNA transcript (red). The refolded C-terminal domain recruits a ribosome (orange) to the complex.

Transformers: Transforming our understanding

RfaH uses the switch between two alternative folds to transform from a transcription regulator to a translation activator. Because of this, RfaH has been called a transformer protein, showing that it is not necessary to change a protein sequence to alter both fold and function. It would be truly amazing if nature, be it only for reasons of economy, did not use the potential of such proteins more extensively to shuttle between alternate structures and functions. The NusG family is universally conserved, and is found in bacteria, archaea, plants and animals, including humans. Perhaps there are transformers inside you!

Acknowledgement

This Quips article was developed in collaboration with Stefan Knauer and Paul Rösch at the University of Bayreuth, with helpful comments by Irina Artsimovitch from Ohio State University. Read more about the RfaH transformer in their reviews in RNA Biology or in Cell Cycle. You can read the original paper in Cell describing this work.

Further exploration: From sequence to structure

The study of the relationship between amino acid sequence and three-dimensional structure goes back a long way. In 1972, Anfinsen was awarded the Nobel Prize in Chemistry "for his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation." Anfinsen realised what many have since forgotten: the influence of external factors. In his Nobel lecture, he said that the “native conformation is determined... by the amino acid sequence, in a given environment.” (his italics).

In 1994, Creamer and Rose posed the Paracelsus Challenge. A prize of $1000 would be awarded to the first group to transform a globular protein’s fold to that of another by changing no more than 50% of the residues in its sequence. The prize was won three years later when Dalal et al. converted a predominantly β-sheet protein to an α-helical protein by changing 50% of the residues. You can read more about this in the challengers’ news and views article.
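The sequence side of the challenge is simple arithmetic, as this minimal Python sketch shows. It assumes the two sequences are already aligned with no insertions or deletions; the function names and the toy sequences are invented for illustration and are not the proteins used by Dalal et al. The sketch says nothing about the harder condition, that the redesigned protein actually adopts the new fold.

```python
def fraction_changed(original, redesigned):
    """Fraction of positions that differ between two equal-length sequences."""
    if len(original) != len(redesigned):
        raise ValueError("sequences must be aligned to the same length")
    if not original:
        raise ValueError("sequences must not be empty")
    changed = sum(1 for a, b in zip(original, redesigned) if a != b)
    return changed / len(original)


def within_challenge_limit(original, redesigned, limit=0.5):
    """True if no more than `limit` of the residues were changed."""
    return fraction_changed(original, redesigned) <= limit


# Toy ten-residue sequences, invented for this example.
start = "MKVLAAGITE"
redesign = "MKVLEEGLSD"

print(fraction_changed(start, redesign))        # 0.5
print(within_challenge_limit(start, redesign))  # True
```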

The C-terminal domain of the transformer protein described in this Quips can exist in two different conformations. How do secondary structure prediction servers handle this sequence? Jpred predicts that this structure would have four β-strands and PSIPRED predicts four short β-strands and an α-helix. There are many other secondary structure prediction servers listed on Wikipedia. Do any predict an all α-helical structure?

There are some sequences of amino acids, so-called chameleon sequences, which are found in both α-helices and β-strands. For instance, the sequence VNHFIAEF forms a β-strand in PDB entry 2ieb but in other entries, for example PDB entry 1atr, it forms an α-helix. Research summarised by Ghozlane et al. indicates that, as Anfinsen stated and transformer proteins illustrate, their environment dictates which form they take. You can find out if a given sequence is present in the PDB archive in different conformations using PDBeMotif, as explained in the accompanying mini-tutorial.
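The idea behind such a search can be sketched in Python. This is not how PDBeMotif works internally; it is a toy version that assumes you already have, for each entry, the sequence and a per-residue secondary structure string (H for helix, E for strand, C for anything else). The entry names, flanking residues and structure strings below are invented and are not taken from PDB entries 2ieb or 1atr.

```python
def motif_conformations(motif, entries):
    """Find every occurrence of motif and return its secondary structure.

    entries maps an entry name to (sequence, secondary_structure), where
    both strings have the same length.
    """
    found = {}
    for name, (sequence, structure) in entries.items():
        if len(sequence) != len(structure):
            raise ValueError(f"{name}: sequence and structure lengths differ")
        start = sequence.find(motif)
        while start != -1:
            segment = structure[start:start + len(motif)]
            found.setdefault(name, []).append(segment)
            start = sequence.find(motif, start + 1)
    return found


def is_chameleon(motif, entries):
    """True if the motif is mainly helical in one place and mainly strand in another."""
    states = set()
    for segments in motif_conformations(motif, entries).values():
        for segment in segments:
            # Take the most frequent state in the segment as its conformation.
            states.add(max("HEC", key=segment.count))
    return "H" in states and "E" in states


# Invented toy entries for illustration only.
toy_entries = {
    "toy_a": ("GSVNHFIAEFKT", "CCEEEEEEEECC"),
    "toy_b": ("ADVNHFIAEFLG", "CHHHHHHHHHHC"),
}

print(motif_conformations("VNHFIAEF", toy_entries))
print(is_chameleon("VNHFIAEF", toy_entries))
```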