Introduction to Protein Structures

(Expected Time for completion: 1 hour)

This exercise covers the basic types of protein structures as represented in the Protein Data Bank (PDB). For a detailed explanation of protein structure components, please see this excellent introduction in Wikipedia. Fold refers to the global arrangement of a structure, such as a helix bundle or a beta-barrel. Although there are now over 50,000 experimentally determined structures in the PDB, the number of unique folds that these proteins adopt is limited, and all proteins can be classified into one or more fold categories, which are annotated in databases like CATH and SCOP. Proteins with the same fold often have similar functions, so fold classification serves as an important tool in understanding the possible function of a protein.


Alpha-helix proteins: There are many different families of proteins which are composed only of alpha-helices. Please see extra details here. Some examples to explore are given below.

PDB Entry: 1IRD

Start at the PDBe home page at http://www.ebi.ac.uk/pdbe. In the space provided for Get PDB by id, type in 1ird and click on “Go”. The browser will take you to the Summary page of the PDBe 'Atlas' pages for this particular structure. The sidebar has links to all of its Atlas pages, which provide information about various facets of the deposited structure, including links to external sites.

Coming back to 1ird, this is the structure of human hemoglobin bound to carbon monoxide. Choose the visualization link on the sidebar. You can also view the structure interactively by choosing the Astex viewer, Jmol or Open Astex links on the visualization page. You will notice that this protein is composed only of alpha-helices.

To look at the sequence section of the atlas pages for this entry, click 'Structure' and then 'Primary' in the sidebar, which will show you the sequence alignment with UniProt. Also look at the GO (Gene Ontology) entry for this protein on the Sequence page; it lists both the processes and the functions this protein is involved in. You can view the Pfam classification for this family of proteins by choosing the 'Tertiary' link in the sidebar, and the structural classification of this protein can be reached from the Pfam links in the atlas pages. This protein belongs to the globin type of all-alpha-helix proteins. Links to all other cross-referenced external databases are listed under 'Cross references' in the sidebar.

In the ligands section of the atlas pages, you can view more details of the compounds that this protein is associated with in this entry. The ligands present in this structure are carbon monoxide and heme. You can now view the interactions the heme group makes with the protein by choosing the "interactions" link. This will take you to our PDBeMotif service and will show that HEM in this structure interacts with histidine, CMO (carbon monoxide), tyrosine and others.
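The ligands shown on the atlas pages correspond to HETATM records in the PDB-format coordinate file. As a rough illustration only, the sketch below lists the ligands found in a few HETATM lines. The function name list_ligands is hypothetical, and the sample records are toy lines with made-up serial and residue numbers (coordinate columns omitted), not an excerpt from the real 1IRD file.

```python
from collections import Counter

# Toy HETATM records in PDB fixed-width layout. Serial numbers and residue
# numbers are made up, and the coordinate columns are left out.
SAMPLE_RECORDS = """\
HETATM 4387 FE   HEM A 142
HETATM 4388  CHA HEM A 142
HETATM 4430  C   CMO A 143
HETATM 4431  O   CMO A 143
HETATM 4500  O   HOH A 201
"""


def list_ligands(pdb_text, skip=("HOH",)):
    """Count HETATM atoms per (residue name, chain, residue number).

    Residue names listed in `skip` (water by default) are ignored.
    """
    counts = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()  # columns 18-20: residue name
        chain_id = line[21:22].strip()  # column 22: chain identifier
        res_seq = int(line[22:26])      # columns 23-26: residue number
        if res_name in skip:
            continue
        counts[(res_name, chain_id, res_seq)] += 1
    return counts


ligands = list_ligands(SAMPLE_RECORDS)
for (name, chain, number), n_atoms in sorted(ligands.items()):
    print(name, chain, number, n_atoms)
```

Run on a full coordinate file, the same loop would report every non-water hetero group in the entry, which can be compared with what the ligands page shows.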



Beta-sheet proteins: These proteins are composed only of beta-sheets. This group is fairly large and comprises proteins with widely varying functions, from sugar-binding to metabolic transport to antibodies. Some examples are given below for you to explore. In each example, look at the structure as above, and pay attention to the Pfam, CATH and SCOP classification of each entry to get a feel for the structure.

PDB entry: 1A0S

This protein is a beta-barrel protein and is involved in the transport of maltodextrin across the outer membrane of gram-negative bacteria. Other proteins share a similar topology. More information about related proteins can be found in the Pfam entry. Essentially, this family comprises proteins that are collectively called porins. Look at the GO and Pfam entries for this protein.

Question 1: What compound(s) is this protein associated with, and what are the interactions between the compound(s) and the protein? (Hint: Look at the ligands page!)

Question 2: Which other entries in the PDB contain the same protein? (Hint: Look at the UniProt page: 3D Structure databases: MSD.)

PDB entry: 1BKZ

1BKZ is the structure of a protein called galectin-7. This protein belongs to a specific family of proteins called galectins (or S-lectins), almost all of which bind to galactoside sugars. This protein belongs to a different family of all-beta-sheet proteins (Link to SCOP entry). Look at the various links on the right side of the atlas page for this entry for more information and try to answer the questions below.

Question 1: What is the general function of this protein as annotated in the GO hierarchy?

Question 2: This structure is not bound to any sugars or other compounds, but are there any other PDB entries for the same protein which have ligands bound? Which sugars do the other entries for the same protein associate with? (Hint: Look at the Related PDB entries and their summary pages!)

Question 3: Look at the summary page for PDB entry 1w6o. Is there anything in common between the interactions of GAL in 2gal and the interactions of LAT in 1w6o? (Hint: Look at the interactions of GAL in PDB entry 2gal and the interactions of LAT in 1w6o.)



Alpha-Beta proteins: This is the most populous category in protein fold classification (Link to SCOP (a/b) and Link to SCOP (a+b)). SCOP has a total of 415 classes of proteins which are composed of alpha-helices and beta-sheets in different topologies. Most enzymes fall into one of these families. Let us look at a few examples.

PDB entry: 1AFL

1AFL is the structure of pancreatic ribonuclease from Bos taurus (bovine). Ribonucleases make up a large family of proteins with similar enzymatic functions and structures, and include members that are implicated in angiogenesis (blood vessel growth in cancers). Ribonucleases essentially cleave RNA. Read more about the function of ribonucleases in the InterPro and Pfam entries for this structure, reached from “Cross references” on the sidebar. Over 150 structures of pancreatic ribonuclease have been determined in complex with various enzymatic inhibitors.

Look at the Ligands page for this entry. This protein is bound to a compound called ATR, which is a modified ribonucleotide that binds to the active site and inhibits the activity of the enzyme. View the interactions of ATR with 1AFL from the ligands page.

Under the “Links” sidebar, click on “PDBe SSM” to go to the SSM (PDBeFold) website, and then click “Start SSM”. On the next page, type in 1afl in the “PDB code” box. Make sure that Target is “All PDB archive” before you click “Submit your query”. This will start the PDBeFold service, which will compare the structure of 1AFL with the whole PDB and present a page of results. At the bottom of the results page, choose "Sort by Seq%" and wait for the page to be regenerated. At the bottom right of the page, choose the "Last Page" button to go to the last page. Click on result 300, which has 29% sequence identity and 78% identity in structure, and choose View superpose. This will open a graphics window showing the structural alignment of 1AFL with 2I5S (an onconase). You can rotate the aligned structures. The two structures are very similar in fold but share very low sequence identity. Now look at the atlas pages for 2I5S by clicking here. Close the graphics window and the PDBeFold windows.
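The sequence identity figure quoted for a structural match is, in essence, the fraction of aligned residue pairs that are the same amino acid. A minimal sketch of that calculation is given below, assuming the alignment is supplied as two equal-length rows with '-' for gaps. The function name percent_identity and the toy alignment are illustrative; the rows are not the real 1AFL/2I5S alignment, and PDBeFold's own calculation may differ in detail.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of gap-free alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # a column with a gap is not an aligned residue pair
        aligned += 1
        if res_a == res_b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


# Made-up toy alignment rows, for illustration only.
row_a = "KETAAAKF-ERQH"
row_b = "KDWLTFQKKHIT-"
print(round(percent_identity(row_a, row_b), 1))
```

Two structures can superpose well while this number stays low, which is the situation the 1AFL and 2I5S comparison illustrates.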

Question 1: What are the similarities in function between 1AFL and 2I5S? (Hint: Look at the GO entries for both structures!)

Question 2: What interactions does ATR make with 1AFL?

Question 3: What is the structural characteristic of this class of proteins? (Hint: Look at the SCOP entry for this protein.)

Question 4: What is the EC number for this class of proteins? (Hint: UniProt)


PDB entry: 1KWW

Look at 1KWW, the structure of a mannose-binding protein from rat. These sugar-binding proteins are characterized by binding sugars (usually mannose) in the presence of metal ions such as calcium. Look over the structure carefully and try to answer the questions below.

Question 1: What best describes this family of proteins? (Hint: Look at the Pfam and GO entries for this protein.)

Question 2: Which residues are generally involved in the interaction between calcium (CA) and the protein, and what is the predominant nature of this interaction?

Question 3: Compare the binding site of MFU in 1KWW with that of FUC in PDB entry 3KMB. Can you identify the binding site for sugars by looking at the residues which interact with the ligand in both cases? (Hint: Look at the ligand interactions for both MFU in 1KWW and FUC in 3KMB on the atlas pages for these entries.)
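The comparison asked for in Question 3 can also be thought of as a set operation: collect the residues close to the ligand in each entry, then intersect the two sets. The sketch below is one simple way to do this with a plain distance cutoff. The function name residues_near_ligand, the 4.0 Å cutoff and all coordinates and residue numbers are assumptions made for illustration. They are not taken from 1KWW or 3KMB, and PDBeMotif uses its own interaction criteria.

```python
import math


def residues_near_ligand(atoms, ligand_name, cutoff=4.0):
    """Return (residue name, residue number) pairs close to a ligand.

    `atoms` is a list of (res_name, res_seq, x, y, z) tuples. A residue is
    reported if any of its atoms lies within `cutoff` angstroms of any atom
    of the residue called `ligand_name`.
    """
    ligand_atoms = [a for a in atoms if a[0] == ligand_name]
    near = set()
    for res_name, res_seq, x, y, z in atoms:
        if res_name == ligand_name:
            continue
        for _, _, lx, ly, lz in ligand_atoms:
            if math.dist((x, y, z), (lx, ly, lz)) <= cutoff:
                near.add((res_name, res_seq))
                break
    return near


# Made-up atoms for two hypothetical entries (one atom per residue).
ENTRY_A = [
    ("MFU", 1, 0.0, 0.0, 0.0),
    ("GLU", 101, 2.5, 0.0, 0.0),
    ("ASN", 103, 0.0, 3.0, 0.0),
    ("LEU", 50, 10.0, 10.0, 10.0),
]
ENTRY_B = [
    ("FUC", 1, 5.0, 5.0, 5.0),
    ("GLU", 101, 5.0, 7.6, 5.0),
    ("HIS", 105, 8.0, 5.0, 5.0),
    ("ALA", 60, 20.0, 5.0, 5.0),
]

site_a = residues_near_ligand(ENTRY_A, "MFU")
site_b = residues_near_ligand(ENTRY_B, "FUC")
shared = site_a & site_b  # residues contacting the sugar in both entries
print(sorted(shared))
```

Residues that appear in both sets are candidates for a common sugar-binding site; the interaction listings on the atlas pages remain the reference for answering the question.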


Document maintained by: Gaurav Sahni