TATA-box Binding Proteins


Family Ties


Several TATA-box binding proteins have been identified, including TBP1, TBP2, TBP3 and TBPL (TATA-box binding protein like). All of these proteins are related in sequence and structure. TBP is composed of an N-terminal region that varies in both length and sequence, and a highly conserved C-terminal region that binds to the TATA box. The C-terminal region contains two 77-amino acid repeats that produce a saddle-shaped structure that straddles the DNA. The C-terminal core also interacts with a variety of transcription factors and regulatory proteins. The N-terminal region appears to modulate the DNA binding of TBP, in addition to other, more specific functions.


What InterPro Tells Us


P20226 Human TATA-box binding protein


InterPro Domain Architecture: [match table for P20226, graphics not reproduced; columns: InterPro Entry, Method Accession, Graphical Match, Method Name; followed by PDB Chain/Domain ID and PDB Chain/Structural Domains]

The graphical match above shows that the signatures (method accessions) for human TBP are integrated into two InterPro entries. Together, these signatures describe the family relationships and domain architecture of the protein.

The InterPro entry IPR000814 groups four signatures representing the transcription factor TFIID family: PF00352 from the Pfam database, PR00686 from the PRINTS database, PS00351 from the PROSITE database, and PTHR10126 from the PANTHER database. These signatures describe the TBP core, chiefly the highly conserved C-terminal region that binds to the TATA box. The IPR000814 entry page also gives details of the taxonomic distribution of the different TATA-box binding proteins.

The InterPro entry IPR012295 has one signature, representing the C-terminal domain of TATA-box binding protein and beta2-adaptin: G3D.3.30.310.10 from the Gene3D database. The conserved C-terminal domain contains two 77-amino acid repeats that together produce a saddle-shaped structure straddling the DNA, which is why the Gene3D signature appears split in two in the graphical match.
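To illustrate how member-database signatures are integrated into InterPro entries, the sketch below groups the five P20226 signatures named above under their entries and infers each signature's source database from its accession prefix. The prefix table and the hard-coded signature-to-entry mapping are assumptions taken from this page, not a query against InterPro; note that current InterPro releases write Gene3D accessions with a "G3DSA:" prefix rather than the "G3D." form shown here. The function names are hypothetical.

```python
from collections import defaultdict

# Accession prefixes for the member databases named in the text.
# Assumption: prefixes as written on this page ("G3D." rather than "G3DSA:").
# Longer prefixes are listed first so "PTHR" is tested before shorter ones.
MEMBER_DB_PREFIXES = [
    ("PTHR", "PANTHER"),
    ("G3D", "Gene3D"),
    ("PF", "Pfam"),
    ("PR", "PRINTS"),
    ("PS", "PROSITE"),
]

# Signature -> InterPro entry for human TBP (P20226), as described above.
SIGNATURES = {
    "PF00352": "IPR000814",
    "PR00686": "IPR000814",
    "PS00351": "IPR000814",
    "PTHR10126": "IPR000814",
    "G3D.3.30.310.10": "IPR012295",
}


def member_database(accession: str) -> str:
    """Return the member database implied by a signature's accession prefix."""
    for prefix, database in MEMBER_DB_PREFIXES:
        if accession.startswith(prefix):
            return database
    return "unknown"


def group_by_entry(signatures: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Group (accession, database) pairs under the InterPro entry they belong to."""
    entries: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for accession, entry in signatures.items():
        entries[entry].append((accession, member_database(accession)))
    return dict(entries)


if __name__ == "__main__":
    for entry, members in sorted(group_by_entry(SIGNATURES).items()):
        print(entry)
        for accession, database in members:
            print(f"  {accession:<16} {database}")
```

Running the script lists IPR000814 with its four family signatures and IPR012295 with the single Gene3D signature, mirroring the two entries in the match table.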

The remaining five entries in the table above come from the structural database PDB (green stripe) and from the structural classification databases CATH (pink stripe) and SCOP (black stripe). Domain names such as d1cdwa2 are derived from the PDB entry on which they are based: here, PDB entry 1cdw, chain A, region 2. The graphical match for PDB entry 1d3b shows the extent of the original PDB entry. CATH (1cdwA1 and 1cdwA2) and SCOP (d1cdwa1 and d1cdwa2) each divide the C-terminal region into two structural units and give information on the classification of these sub-domains.
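The domain naming scheme can be sketched as a small parser. This is a minimal illustration, assuming SCOP identifiers of the form "d" + four-character PDB code + chain + domain number (for example d1cdwa2) and CATH identifiers of the form PDB code + chain + one- or two-digit domain number (for example 1cdwA1). The type and function names are hypothetical, and the patterns do not cover every special case either database uses.

```python
import re
from typing import NamedTuple


class DomainId(NamedTuple):
    pdb_id: str
    chain: str
    domain: str


# Assumed SCOP form: "d" + PDB code + lowercase chain ("_" or "." allowed)
# + domain digit ("_" for a whole chain), e.g. d1cdwa2.
SCOP_PATTERN = re.compile(r"^d([0-9][a-z0-9]{3})([a-z0-9_.])([0-9_])$")
# Assumed CATH form: PDB code + chain + 1- or 2-digit domain number, e.g. 1cdwA1.
CATH_PATTERN = re.compile(r"^([0-9][a-z0-9]{3})([A-Za-z0-9])([0-9]{1,2})$")


def parse_domain_id(name: str) -> DomainId:
    """Split a SCOP or CATH domain name into PDB entry, chain and domain."""
    match = SCOP_PATTERN.match(name) or CATH_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised domain identifier: {name!r}")
    pdb_id, chain, domain = match.groups()
    # Upper-case the chain so SCOP ("a") and CATH ("A") names compare equal.
    return DomainId(pdb_id, chain.upper(), domain)


if __name__ == "__main__":
    for name in ["d1cdwa1", "d1cdwa2", "1cdwA1", "1cdwA2"]:
        print(name, parse_domain_id(name))
```

Under these assumptions, d1cdwa2 and 1cdwA2 both resolve to PDB entry 1cdw, chain A, domain 2, which is why the SCOP and CATH rows point back to the same PDB chain.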


What the Structure Tells Us


Structures of the core of TATA-box binding proteins from a variety of species have been determined. They can be viewed with AstexViewer®, which is linked from the Match Table via the AstexViewer® logo on the InterPro page for P20226 (this page does not link directly to AstexViewer®, so use the link on the InterPro page). AstexViewer® displays the PDB structure with the selected CATH or SCOP domain highlighted in yellow. The Protein Data Bank (PDB) contains many structures of TATA-box binding proteins, and the PDB ‘Molecule of the Month’ gives a detailed description and visualisation of their structural features, with insights into how these molecules take part in the transcription of genes.

