NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Brown TA. Genomes. 2nd edition. Oxford: Wiley-Liss; 2002.

Chapter 9. Assembly of the Transcription Initiation Complex

Learning outcomes

When you have read Chapter 9, you should be able to:

  • Outline the various techniques that are used to locate the position at which a DNA-binding protein attaches to a DNA molecule
  • Explain how a DNA-binding protein is purified and how its structure can be determined
  • Describe the key structural motifs that enable proteins to make sequence-specific attachments to DNA molecules
  • Discuss the features of the double helix that are important in interactions between DNA and its binding proteins, and give details of the chemical events that underlie the interaction
  • Identify the key features of the various prokaryotic and eukaryotic RNA polymerases and describe the structures of the promoter sequences that they recognize
  • Give a detailed description of how the Escherichia coli transcription initiation complex is assembled, and discuss the various ways in which this process can be regulated
  • Give a detailed description of the assembly of the RNA polymerase II transcription initiation complex, and explain how assembly of this complex is influenced by proteins that activate or silence gene expression
  • Outline the processes of transcription initiation by eukaryotic RNA polymerases I and III

Once upon a time the first stage in genome expression was described as ‘transcription’ or ‘DNA makes RNA’ (see Figure 3.2A), but we now realize that the process that leads from the genome to the transcriptome is much more complex than simply the synthesis of RNA. This part of genome expression is now divided into two key stages (Figure 9.1):

Figure 9.1. The two stages in the process that leads from genome to transcriptome.


1. Initiation of transcription, in which the complex of proteins that will subsequently copy the gene into an RNA transcript, including the RNA polymerase enzyme and its various accessory proteins, is assembled upstream of the gene. Inherent in this step are the events that determine whether or not the gene is actually expressed.

2. Synthesis and processing of RNA, which begins when the RNA polymerase leaves the initiation region and starts to make an RNA copy of the gene, and ends after completion of the processing and modification events that convert the initial transcript into a functional RNA.

This chapter deals with the initiation of transcription, and Chapter 10 covers RNA synthesis and processing. But before we move on to these topics we must do a little groundwork. The central players in many areas of molecular biology, including transcription, are DNA-binding proteins that attach to the genome in order to perform their biochemical functions. Histones are examples of DNA-binding proteins, and we will encounter many others later in this chapter when we look at assembly of the initiation complexes of prokaryotes and eukaryotes. There are also DNA-binding proteins that are involved in DNA replication, repair, and recombination, as well as a large group of related proteins that bind to RNA rather than DNA (Table 9.1). Many DNA-binding proteins recognize specific nucleotide sequences and bind predominantly to these target sites, whereas others bind non-specifically at various positions in the genome.

Table 9.1. Functions of DNA- and RNA-binding proteins.


The mode of action of DNA-binding proteins is central to the initiation of transcription, and without a knowledge of how they function we can never hope to understand how the information in the genome is utilized. We will therefore spend some time examining what is known about DNA-binding proteins and how they interact with the genome.

9.1. The Importance of DNA-binding Proteins

As in all areas of molecular biology and genetics, the amount we know about a topic depends on the range and effectiveness of the methods available for its study. With regard to DNA-binding proteins we are fortunate in having a number of powerful techniques that can provide information on the interaction between a protein and the DNA sequence or sequences that it binds to. These techniques can be divided into three categories:

  • Methods for identifying the region(s) of a DNA molecule to which a protein binds;
  • Methods for purifying a DNA-binding protein;
  • Methods for studying the tertiary structure of a DNA-binding protein, including the complex formed when the protein is bound to DNA.

9.1.1. Locating the positions of DNA-binding sites in a genome

Often the first thing that is discovered about a DNA-binding protein is not the identity of the protein itself but the features of the DNA sequence that the protein recognizes. This is because genetic and molecular biology experiments, which we will deal with later in this chapter, have shown that many of the proteins that are involved in genome expression bind to short DNA sequences immediately upstream of the genes on which they act (Figure 9.2). This means that the sequence of a newly discovered gene, assuming that it includes both the coding DNA and the regions upstream of it, provides immediate access to the binding sites of at least some of the proteins responsible for expression of that gene. Because of this, a number of methods have been developed for locating protein binding sites within DNA fragments up to several kb in length, these methods working perfectly well even if the relevant DNA-binding proteins have not been identified.

Figure 9.2. Attachment sites for DNA-binding proteins are located immediately upstream of a gene. See Sections 9.2 and 9.3 for more information on the location and function of these protein attachment sites.

Gel retardation identifies DNA fragments that bind to proteins

The first of these methods makes use of the substantial difference between the electrophoretic properties of a ‘naked’ DNA fragment and one that carries a bound protein. Recall that DNA fragments are separated by agarose gel electrophoresis because smaller fragments migrate through the pore-like structure of the gel more quickly than do larger fragments (see Technical Note 2.1). If a DNA fragment has a protein bound to it then its mobility through the gel will be impeded: the DNA-protein complex therefore forms a band at a position nearer to the starting point (Figure 9.3). This is called gel retardation (Garner and Revzin, 1981). In practice the technique is carried out with a collection of restriction fragments that span the region thought to contain a protein binding site. The digest is mixed with an extract of nuclear proteins (assuming that a eukaryote is being studied) and retarded fragments are identified by comparing the banding pattern obtained after electrophoresis with the pattern for restricted fragments that have not been mixed with proteins. A nuclear extract is used because at this stage of the project the DNA-binding protein has not usually been purified. If, however, the protein is available then the experiment can be carried out just as easily with the pure protein as with a mixed extract.
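The lane comparison described above can be sketched in Python. This is only an illustration under simplifying assumptions: the fragment names, the migration distances (in mm from the well) and the matching tolerance are all invented, and a real gel would be read by eye or by densitometry rather than from exact numbers.

```python
def compare_lanes(control_bands, extract_bands, tolerance=0.5):
    """Compare a control lane with a lane run after adding nuclear extract.

    control_bands: dict mapping fragment name -> migration distance (mm)
    extract_bands: list of migration distances (mm) seen with extract added
    Returns (retarded, new_bands): fragments whose normal band has vanished,
    and bands that appear only when the extract is present.
    """
    def has_band_near(position, lane):
        return any(abs(position - other) <= tolerance for other in lane)

    retarded = sorted(
        name for name, position in control_bands.items()
        if not has_band_near(position, extract_bands)
    )
    new_bands = sorted(
        position for position in extract_bands
        if not has_band_near(position, control_bands.values())
    )
    return retarded, new_bands


# Hypothetical digest: three restriction fragments, smaller ones run further.
control = {"fragment A": 12.0, "fragment B": 25.0, "fragment C": 41.0}
with_extract = [12.1, 17.5, 40.8]

retarded, new_bands = compare_lanes(control, with_extract)
print("Retarded fragments:", retarded)
print("New bands nearer the well (mm):", new_bands)
```

In this made-up example the band for fragment B is missing from its usual position and a new band appears closer to the starting point, which is the pattern expected when a protein in the extract has bound that fragment.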

Figure 9.3. Gel retardation analysis. A nuclear extract has been mixed with a DNA restriction digest and a DNA-binding protein in the extract has attached to one of the restriction fragments. The DNA-protein complex has a larger molecular mass than the ‘naked’ ...

Protection assays pinpoint binding sites with greater accuracy

Gel retardation gives a general indication of the location of a protein binding site in a DNA sequence, but does not pinpoint the site with great accuracy. Often the retarded fragment is several hundred bp in length, compared with the expected length of the binding site of a few tens of bp at most, and there is no indication of where in the retarded fragment the binding site lies. Also, if the retarded fragment is long then it might contain separate binding sites for several proteins, or if it is quite small then there is the possibility that the binding site also includes nucleotides on adjacent fragments, ones that on their own do not form a stable complex with the protein and so do not lead to gel retardation. Retardation studies are therefore a starting point but other techniques are needed to provide more accurate information.

Modification protection assays can take over where gel retardation leaves off. The basis of these techniques is that if a DNA molecule carries a bound protein then part of its nucleotide sequence will be protected from modification. There are two ways of carrying out the modification:

  • By treatment with a nuclease, which cleaves all phosphodiester bonds except those protected by the bound protein;
  • By exposure to a methylating agent, such as dimethyl sulfate, which adds methyl groups to G nucleotides. Any Gs protected by the bound protein will not be methylated.

The practical details of these two techniques are shown in Figures 9.4 and 9.5. Both utilize an experimental approach called footprinting. In nuclease footprinting (Galas and Schmitz, 1978), the DNA fragment being examined is labeled at one end, complexed with binding protein (as a nuclear extract or as pure protein), and treated with deoxyribonuclease I (DNase I). Normally, DNase I cleaves every phosphodiester bond, leaving only the DNA segment protected by the binding protein. This is not very useful because it can be difficult to sequence such a small fragment. It is quicker to use the more subtle approach shown in Figure 9.4. The nuclease treatment is carried out under limiting conditions, such as a low temperature and/or very little enzyme, so that on average each copy of the DNA fragment suffers a single ‘hit’ - meaning that it is cleaved at just one position along its length. Although each fragment is cut just once, in the entire population of fragments all bonds are cleaved except those protected by the bound protein. The protein is now removed, the mixture electrophoresed, and the labeled fragments visualized. Each of these fragments has the label at one end and a cleavage site at the other. The result is a ladder of bands corresponding to fragments that differ in length by one nucleotide, the ladder broken by a blank area in which no labeled bands occur. This blank area, or ‘footprint’, corresponds to the positions of the protected phosphodiester bonds, and hence of the bound protein, in the starting DNA.
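The logic of the single-hit experiment can be sketched in Python. This is a toy model rather than a description of real data: it assumes an idealized population in which every unprotected bond is cut in at least one copy of the fragment, and the fragment length and protected region are invented for the example.

```python
def single_hit_ladder(fragment_length, protected_bonds):
    """Lengths of end-labeled fragments seen after limited DNase I digestion.

    Bond i joins nucleotide i to nucleotide i + 1 (counting from the labeled
    end), so a cut at bond i leaves a labeled piece i nucleotides long.
    """
    return [bond for bond in range(1, fragment_length)
            if bond not in protected_bonds]


def footprint(ladder, fragment_length):
    """Fragment lengths missing from the ladder, i.e. the protected bonds."""
    return sorted(set(range(1, fragment_length)) - set(ladder))


# Hypothetical 20-nucleotide fragment with a protein covering bonds 8 to 12.
protected = set(range(8, 13))
ladder = single_hit_ladder(20, protected)
print("Bands:", ladder)
print("Footprint:", footprint(ladder, 20))
```

The gap in the list of band lengths plays the role of the blank area on the gel: the missing lengths are exactly the bonds that the bound protein shielded from the nuclease.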

Figure 9.4. DNase I footprinting. The technique is described in the text. The restriction fragments used at the start of the procedure must be labeled at just one end. This is usually achieved by treating a set of longer restriction fragments with an enzyme that ...

Figure 9.5. The dimethyl sulfate (DMS) modification protection assay. The technique is similar to DNase I footprinting (see Figure 9.4). Instead of DNase I digestion, the fragments are treated with limited amounts of DMS so that a single guanine base is methylated ...

Modification interference identifies nucleotides central to protein binding

Modification protection should not be confused with modification interference, a different technique with greater sensitivity in the study of protein binding (Hendrickson and Schleif, 1985). Modification interference works on the basis that if a nucleotide critical for protein binding is altered, for example by addition of a methyl group, then binding may be prevented. One of this family of techniques is illustrated in Figure 9.6. The DNA fragment, labeled at one end, is treated with the modification reagent, in this case dimethyl sulfate, under limiting conditions so that just one guanine per fragment is methylated. Now the binding protein or nuclear extract is added, and the fragments electrophoresed. Two bands are seen, one corresponding to the DNA-protein complex and one containing DNA without bound protein. The latter contains molecules that have been prevented from attaching to the protein because the methylation treatment has modified one or more Gs that are crucial for the binding. To identify which Gs are modified, the fragment is purified from the gel and treated with piperidine, a compound that cleaves DNA at methylguanine nucleotides. The result of this treatment is that each fragment is cut into two segments, one of which carries the label. The length(s) of the labeled fragment(s), determined by a second round of electrophoresis, tells us which nucleotide(s) in the original fragment were methylated and hence identifies the position in the DNA sequence of Gs that participate in the binding reaction. Equivalent techniques can be used to identify the A, C and T nucleotides involved in binding.
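The final step, reading nucleotide positions from the lengths of the labeled fragments, can be sketched in Python. The sketch rests on assumptions that are not stated in the text: the label is taken to be at the 5′ end, piperidine cleavage is taken to remove the methylated nucleotide so that the labeled piece ends one position before it, and the sequence and fragment lengths are invented.

```python
def critical_guanines(sequence, labeled_lengths):
    """Infer which Gs were methylated in the unbound DNA band.

    sequence: the strand carrying the end label, written from the labeled end
    labeled_lengths: lengths of labeled pieces seen after piperidine cleavage
    Returns 1-based positions of the Gs whose methylation blocked binding.
    """
    positions = []
    for length in sorted(labeled_lengths):
        position = length + 1          # the cleaved nucleotide itself is lost
        base = sequence[position - 1]  # convert to 0-based index
        if base != "G":
            raise ValueError(
                f"length {length} implies position {position}, "
                f"which is {base}, not G"
            )
        positions.append(position)
    return positions


# Hypothetical fragment and hypothetical band lengths from the second gel.
fragment = "ATGCCGTAGGCTA"
print(critical_guanines(fragment, [5, 8]))
```

The check that each inferred position really is a G acts as a sanity test on the band lengths: dimethyl sulfate followed by piperidine should only ever cut at guanines.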

Figure 9.6. Dimethyl sulfate (DMS) modification interference assay. The method is described in the text. See the legend to Figure 9.4 for a description of the procedure used to obtain DNA fragments labeled at just one end.

9.1.2. Purifying a DNA-binding protein

Once a binding site has been identified in a DNA molecule, this sequence can be used to purify the DNA-binding protein, as a prelude to more detailed structural studies. The purification techniques utilize the ability of the protein to bind to its target site. One possibility is to use a form of affinity chromatography (Figure 9.7A). A DNA fragment or synthetic oligonucleotide that contains a protein binding site is immobilized in a chromatography column, usually by attaching one end of the DNA to a silica particle (Kadonaga, 1991). The protein extract is then passed through the column in a low-salt buffer, which promotes binding of proteins to their target sites. The binding protein specific for the immobilized sequence is retained in the column while all other proteins pass through. Once these unwanted proteins have been completely washed out, the column is eluted with a high-salt buffer, which destabilizes the DNA-protein complex. The pure binding protein can then be collected.

Figure 9.7. Two ways of purifying a DNA-binding protein. (A) Affinity chromatography. DNA fragments or synthetic oligonucleotides containing the attachment site for the binding protein are attached to silica beads and these packed into a chromatography column. The ...

An alternative is to screen a cloning library (Singh et al., 1988). A library of cDNA clones, each synthesizing a different cloned protein from the organism being studied, is needed. These clones are blotted onto a nylon membrane in such a way that the protein content of each clone is retained (Figure 9.7B). The DNA fragment or oligonucleotide containing the protein binding site is labeled, and washed over the membrane. The DNA attaches to a blotted clone only if that clone has been synthesizing the appropriate DNA-binding protein. These clones are identified by detecting where the labeled DNA is located on the membrane. Samples of the clones can then be recovered from the master library and used to produce larger quantities of the binding protein.

9.1.3. Studying the structures of proteins and DNA-protein complexes

The availability of a pure sample of a DNA-binding protein makes possible the analysis of its structure, in isolation or attached to its DNA-binding site. This provides the most detailed information on the DNA-protein interaction, enabling the precise structure of the DNA-binding part of the protein to be determined, and allowing the identity and nature of the contacts with the DNA helix to be elucidated. Two techniques - X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy - are central to this area of research.

X-ray crystallography has broad applications in structure determination

X-ray crystallography is a long-established technique whose pedigree stretches back to the late 19th century. Indeed, Nobel prizes were awarded as early as 1915 to William and Lawrence Bragg, father and son, for working out the basic methodology and using it to determine the crystal structures of salts such as sodium chloride and zinc sulfide. The technique is based on X-ray diffraction. X-rays have very short wavelengths - between 0.01 and 10 nm - which are up to several thousand times shorter than those of visible light (a 0.1 nm X-ray is 4000 times shorter than 400 nm violet light) and comparable with the spacings between atoms in chemical structures. When a beam of X-rays is directed onto a crystal, some of the X-rays pass straight through, but others are diffracted and emerge from the crystal at an angle different from the one at which they entered (Figure 9.8A). If the crystal is composed of many copies of the same molecule, all positioned in a regular array, then different X-rays are diffracted in similar ways, resulting in overlapping circles of diffracted waves which interfere with one another. An X-ray-sensitive photographic film or electronic detector placed across the beam reveals a series of spots (Figure 9.8B), an X-ray diffraction pattern, from which the structure of the molecule in the crystal can be deduced.
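Why the wavelength has to be comparable with the atomic spacing can be illustrated with Bragg's law, n × wavelength = 2 × d × sin(θ), the standard relation between wavelength, the spacing d of the diffracting planes and the diffraction angle θ. The law is not given in the text, and the numbers below are assumptions chosen for illustration: a wavelength of 0.154 nm (a common laboratory X-ray source), a plane spacing of 0.282 nm (roughly that of sodium chloride), and 400 nm for violet light.

```python
import math


def bragg_angle_degrees(wavelength_nm, spacing_nm, order=1):
    """Solve n * wavelength = 2 * d * sin(theta) for theta, in degrees."""
    sine = order * wavelength_nm / (2 * spacing_nm)
    if sine > 1:
        # sin(theta) cannot exceed 1, so no diffracted beam can form
        raise ValueError("wavelength is too long for this plane spacing")
    return math.degrees(math.asin(sine))


XRAY_NM = 0.154     # assumed X-ray wavelength
VIOLET_NM = 400.0   # assumed short-wavelength end of visible light
SPACING_NM = 0.282  # assumed spacing between planes of atoms

print("Visible / X-ray wavelength ratio:", round(VIOLET_NM / XRAY_NM))
print("Bragg angle for X-rays (degrees):",
      round(bragg_angle_degrees(XRAY_NM, SPACING_NM), 1))

try:
    bragg_angle_degrees(VIOLET_NM, SPACING_NM)
except ValueError as error:
    print("Visible light:", error)
```

Under these assumptions the X-ray beam has a solution for θ, whereas visible light does not, which is one way of seeing why only radiation with a wavelength on the scale of atomic spacings gives a diffraction pattern from a crystal.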

Figure 9.8. X-ray crystallography. (A) An X-ray diffraction pattern is obtained by passing a beam of X-rays through a crystal of the molecule being studied. (B) The diffraction pattern obtained with crystals of ribonuclease. (C) Part of the electron-density map derived ...

The challenge with X-ray crystallography lies with the complexity of the methodology used to deduce the structure of a molecule from its diffraction pattern. The basic principles are that the relative positioning of the spots indicates the arrangement of the molecules in the crystal, and their relative intensities provide information on the structure of the molecule. The problem is that the more complex the molecule, the greater the number of spots and the larger the number of comparisons that must be made between them. Even with computational help the analysis is difficult and time consuming. If successful, the result is an electron density map (Figure 9.8C and D) which, with a protein, provides a chart of the folded polypeptide from which the positioning of structural features such as α-helices and β-sheets can be determined. If sufficiently detailed, the R groups of the individual amino acids in the polypeptide can be identified and their orientations relative to one another established, allowing deductions to be made about the hydrogen bonding and other chemical interactions occurring within the protein structure. With luck, these deductions lead to a detailed three-dimensional model of the protein (Rhodes, 1999).

The first protein structures to be determined by X-ray crystallography were for myoglobin and hemoglobin, resulting in further Nobel prizes, for Perutz and Kendrew in 1962. It still takes several months or longer to complete an X-ray crystallography analysis with a new protein, and there are many pitfalls that can prevent a successful conclusion being reached. In particular, it can often be difficult to obtain a suitable crystal of the protein. Despite these problems, the number of completed structures has gradually increased and now includes more than 50 DNA-binding proteins. An important innovation has been to crystallize DNA-binding proteins in the presence of their target sequences, the resulting protein-DNA structures revealing the precise positioning of the proteins relative to the double helix. It is from this type of information that most of our knowledge about the mode of action of DNA-binding proteins has been obtained.

NMR gives detailed structural information for small proteins

Like X-ray crystallography, NMR traces its origins to the early part of the 20th century, first being described in 1936 with the relevant Nobel prizes awarded in 1952. The principle of the technique is that rotation of a charged chemical nucleus generates a magnetic moment. When placed in an applied electromagnetic field, the spinning nucleus orientates in one of two ways, called α and β (Figure 9.9), the α-orientation (which is aligned with the magnetic field) having a slightly lower energy. In NMR spectroscopy the magnitude of this energy separation is determined by measuring the frequency of the electromagnetic radiation needed to induce the transition from α to β, the value being described as the resonance frequency of the nucleus being studied. The critical point is that although each type of nucleus (e.g. 1H, 13C, 15N) has its own specific resonance frequency, the measured frequency is often slightly different from the standard value (typically by less than 10 parts per million) because electrons in the vicinity of the rotating nucleus shield it to a certain extent from the applied magnetic field. This chemical shift (the difference between the observed resonance energy and the standard value for the nucleus being studied) enables the chemical environment of the nucleus to be inferred, and hence provides structural information. Particular types of analysis (called COSY and TOCSY) enable atoms linked by chemical bonds to the spinning nucleus to be identified; other analyses (e.g. NOESY) identify atoms that are close to the spinning nucleus in space but not directly connected to it.
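The two quantities described here, the energy separation and the chemical shift, can be computed with a short Python sketch. It uses the standard relations ΔE = h × frequency and shift (ppm) = (observed − reference) / reference × 10^6; the 500 MHz reference frequency and the observed frequency are hypothetical values chosen for the example, not measurements from the text.

```python
PLANCK_J_S = 6.62607015e-34  # Planck constant in joule seconds


def transition_energy_joules(frequency_hz):
    """Energy gap between the alpha and beta spin states, from E = h * f."""
    return PLANCK_J_S * frequency_hz


def chemical_shift_ppm(observed_hz, reference_hz):
    """Chemical shift: fractional offset from the reference, in parts per million."""
    return (observed_hz - reference_hz) / reference_hz * 1e6


# Hypothetical 1H measurement on a spectrometer with a 500 MHz reference.
reference_hz = 500_000_000.0
observed_hz = 500_003_650.0

print("Energy separation (J):", transition_energy_joules(reference_hz))
print("Chemical shift (ppm):", round(chemical_shift_ppm(observed_hz, reference_hz), 2))
```

Expressing the shift as a ratio rather than as a frequency difference makes it independent of the field strength of the instrument, which is why values of a few parts per million can be compared between spectrometers.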

Figure 9.9. The basis of nuclear magnetic resonance (NMR) spectroscopy. A rotating nucleus can take up either of two orientations in an applied electromagnetic field. The energy separation between the α and β spin states is determined by measuring ...

Not all chemical nuclei are suitable for NMR. Most protein NMR projects are 1H studies, the aim being to identify the chemical environments and covalent linkages of every hydrogen atom, and from this information to infer the overall structure of the protein. These studies are frequently supplemented by analyses of substituted proteins in which at least some of the carbon and/or nitrogen atoms have been replaced with the rare isotopes 13C and 15N, these also giving good results with NMR.

When successful, NMR results in the same level of resolution as X-ray crystallography and so provides very detailed information on protein structure (Evans, 1995). The main advantage of NMR is that it works with molecules in solution and so avoids the problems that sometimes occur when attempting to obtain crystals of a protein for X-ray analysis. Solution studies also offer greater flexibility if the aim is to examine changes in protein structure, for example during protein folding or in response to addition of a substrate. The disadvantage of NMR is that it is only suitable for relatively small proteins. There are several reasons for this, one being the need to identify the resonance frequencies for each, or as many as possible, of the 1H or other nuclei being studied. This depends on the various nuclei having different chemical shifts so that their frequencies do not overlap. The larger the protein, the greater the number of nuclei and the greater the chances that frequencies overlap and structural information is lost. Although this limits the applicability of NMR, the technique is still very valuable. There are many interesting proteins that are small enough to be studied by NMR, and important information can also be obtained by structural analysis of peptides which, although not complete proteins, can act as models for aspects of protein activity such as nucleic acid binding.

9.1.4. The special features of DNA-binding proteins

Now that we have examined the methods used to study DNA-binding proteins, we can turn our attention to the proteins themselves. Our main interest lies with those proteins that are able to target a specific nucleotide sequence and hence bind to a limited number of positions on a DNA molecule, this being the type of interaction that is most important in expression of the genome. To bind in this specific fashion a protein must make contact with the double helix in such a way that the nucleotide sequence can be recognized, which generally requires that part of the protein penetrates into the major and/or minor grooves of the helix (see Figures 1.11A and 1.12) in order to achieve direct readout of the sequence (Section 9.1.5). This is usually accompanied by more general interactions with the surface of the molecule, which may simply stabilize the DNA-protein complex or which may be aimed at accessing indirect information on nucleotide sequence that is provided by the conformation of the helix.

When the structures of sequence-specific DNA-binding proteins are compared, it is immediately evident that the family as a whole can be divided into a limited number of different groups on the basis of the structure of the segment of the protein that interacts with the DNA molecule (Table 9.2; Luisi, 1995). Each of these DNA-binding motifs is present in a range of proteins, often from very different organisms, and at least some of them probably evolved more than once. We will look at two in detail - the helix-turn-helix (HTH) motif and the zinc finger - and then briefly survey the others.

Table 9.2. DNA-binding motifs.

Table 9.2

DNA-binding motifs.

The helix-turn-helix motif is present in prokaryotic and eukaryotic proteins

The HTH motif was the first DNA-binding structure to be identified (Harrison and Aggarwal, 1990). As the name suggests, the motif is made up of two α-helices separated by a turn (Figure 9.10). The latter is not a random conformation but a specific structure, referred to as a β-turn, made up of four amino acids, the second of which is usually glycine. This turn, in conjunction with the first α-helix, positions the second α-helix on the surface of the protein in an orientation that enables it to fit inside the major groove of a DNA molecule. This second α-helix is therefore the recognition helix that makes the vital contacts which enable the DNA sequence to be read. The HTH structure is usually 20 or so amino acids in length and so is just a small part of the protein as a whole. Some of the other parts of the protein form attachments with the surface of the DNA molecule, primarily to aid the correct positioning of the recognition helix within the major groove.

Figure 9.10. The helix-turn-helix motif. The drawing shows the orientation of the helix-turn-helix motif (in blue) of the Escherichia coli bacteriophage 434 repressor in the major groove of the DNA double helix. ‘N’ and ‘C’ indicate ...

Many prokaryotic and eukaryotic DNA-binding proteins utilize an HTH motif. In bacteria, HTH motifs are present in some of the best studied regulatory proteins, which switch on and off the expression of individual genes. An example is the lactose repressor, which regulates expression of the lactose operon (Sections 2.3.2 and 9.3.1). The various eukaryotic HTH proteins include many whose DNA-binding properties are important in the developmental regulation of genome expression, such as the homeodomain proteins, whose roles we will examine in Section 12.3.3. The homeodomain is an extended HTH motif possessed by each of these proteins. It is made up of 60 amino acids which form four α-helices, numbers 2 and 3 separated by a β-turn, with number 3 acting as the recognition helix and number 1 making contacts within the minor groove (Figure 9.11). Other versions of the HTH motif found in eukaryotes include:

  • The POU domain, which is usually found in proteins that also have a homeodomain, the two motifs probably working together by binding different regions of a double helix. The name ‘POU’ comes from the initial letters of the names of the first proteins found to contain this motif (Herr et al., 1988).
  • The winged helix-turn-helix motif, which is another extended version of the basic HTH structure, this one with a third α-helix on one side of the HTH motif and a β-sheet on the other side.

Figure 9.11. The homeodomain motif. The first three helices of a typical homeodomain are shown with helix 3 orientated in the major groove and helix 1 making contacts in the minor groove. Helices 1–3 run in the N→C direction along the motif. Reprinted ...

Many proteins, prokaryotic and eukaryotic, possess an HTH motif, but the details of the interaction of the recognition helix with the major groove are not exactly the same in all cases. The length of the recognition helix varies, generally being longer in eukaryotic proteins, the orientation of the helix in the major groove is not always the same, and the position within the recognition helix of those amino acids that make contacts with nucleotides is different.

Zinc fingers are common in eukaryotic proteins

The second type of DNA-binding motif that we will look at in detail is the zinc finger, which is rare in prokaryotic proteins but very common in eukaryotes (Mackay and Crossley, 1998). There appear to be more than 500 different zinc-finger proteins in the worm Caenorhabditis elegans, out of a total of 19 000 proteins (Clarke and Berg, 1998), and it is estimated that 1% of all mammalian genes code for zinc-finger proteins.

There are at least six different versions of the zinc finger. The first to be studied in detail was the Cys2His2 finger, which comprises a series of 12 or so amino acids, including two cysteines and two histidines, which form a segment of β-sheet followed by an α-helix. These two structures, which form the ‘finger’ projecting from the surface of the protein, hold between them a bound zinc atom, coordinated to the two cysteines and two histidines (Figure 9.12). The α-helix is the part of the motif that makes the critical contacts within the major groove, its positioning within the groove being determined by the β-sheet, which interacts with the sugar-phosphate backbone of the DNA, and the zinc atom, which holds the sheet and helix in the appropriate positions relative to one another. Other versions of the zinc finger differ in the structure of the finger, some lacking the sheet component and consisting simply of one or more α-helices, and the precise way in which the zinc atom is held in place also varies. For example, the multicysteine zinc fingers lack histidines, the zinc atom being coordinated between four cysteines.
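Because the Cys2His2 finger is defined by two cysteines followed by two histidines at fairly regular spacings, candidate fingers can be looked for in a protein sequence with a simple pattern search. The Python sketch below uses a simplified spacing rule (C, 2 to 4 residues, C, 12 residues, H, 3 to 5 residues, H) that is an assumption based on a commonly quoted consensus and is not taken from the text; the protein sequence is invented, and a match only suggests a possible finger rather than proving that one exists.

```python
import re

# Simplified, assumed consensus: C-x(2,4)-C-x(12)-H-x(3,5)-H
CYS2_HIS2 = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_candidate_fingers(protein_sequence):
    """Return (1-based start position, matched segment) for each candidate."""
    return [
        (match.start() + 1, match.group())
        for match in CYS2_HIS2.finditer(protein_sequence)
    ]


# Invented sequence in one-letter amino acid code, for illustration only.
example = "MAYKCPECGKSFSQKSNLQKHQRTHTGEK"
for start, segment in find_candidate_fingers(example):
    print(start, segment)
```

A search like this finds only non-overlapping matches and ignores the other zinc-finger versions mentioned above, such as the multicysteine fingers, which would need their own patterns.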

Figure 9.12. The Cys2His2 zinc finger. This particular zinc finger is from the yeast SWI5 protein. The zinc atom is held between two cysteines within the β-sheet of the motif and two histidines in the α-helix. The solid green lines indicate the R groups ...

An interesting feature of the zinc finger is that multiple copies of the finger are sometimes found on a single protein. Several have two, three or four fingers, but there are examples with many more than this - 37 for one toad protein. In most cases, the individual zinc fingers are thought to make independent contacts with the DNA molecule, but in some cases the relationship between different fingers is more complex. In one particular group of proteins - the nuclear or steroid receptor family - two α-helices containing six cysteines combine to coordinate two zinc atoms in a single DNA-binding domain, larger than a standard zinc finger (Figure 9.13). Within this motif it appears that one of the α-helices enters the major groove whereas the second makes contacts with other proteins.

Figure 9.13. The steroid receptor zinc finger. The R groups of the amino acids involved in the interactions with the zinc atoms are shown as solid green lines. ‘N’ and ‘C’ indicate the N- and C-termini of the motif, respectively. Reprinted ...

Box 9.1. RNA-binding motifs. RNA-binding proteins also have specific motifs that form the attachment with the RNA molecule. The most important of these are as follows: The ribonucleoprotein (RNP) domain comprises four β-strands and two α-helices (more...)

Other DNA-binding motifs

The various other DNA-binding motifs that have been discovered in different proteins include:

  • The basic domain, in which the DNA recognition structure is an α-helix that contains a high number of basic amino acids (e.g. arginine and lysine). A peculiarity of this motif is that the α-helix only forms when the protein interacts with DNA: in the unbound state the helix has a disorganized structure. Basic domains are found in a number of eukaryotic proteins involved in transcription of DNA into RNA.
  • The ribbon-helix-helix motif, which is one of the few motifs that achieves sequence-specific DNA binding without making use of an α-helix as the recognition structure. Instead, the ribbon (i.e. two strands of a β-sheet) makes contact with the major groove (Figure 9.14). Ribbon-helix-helix motifs are found in some gene-regulatory proteins in bacteria.
  • The TBP domain has so far only been discovered in the TATA-binding protein (Section 9.2.3), after which it is named (Kim et al., 1993). As with the ribbon-helix-helix motif, the recognition structure is a β-sheet, but in this case the main contacts are with the minor, not major, groove of the DNA molecule.

Figure 9.14. The ribbon-helix-helix motif. The drawing is of the ribbon-helix-helix motif of the Escherichia coli MetJ repressor, which consists of a dimer of two identical proteins, one shown in blue and the other in green. The β-strands at the left of the (more...)

9.1.5. The interaction between DNA and its binding proteins

In recent years our understanding of the part played by the DNA molecule in the interaction with a binding protein has begun to change. It has always been accepted that proteins that recognize a specific sequence as their binding site can locate this site by forming contacts with chemical groups attached to the nitrogenous bases that are exposed within the major and minor grooves that spiral around the double helix (see Figure 1.11A). It is now recognized that the nucleotide sequence also influences the precise conformation of each region of the helix, and that these conformational features represent a second, less direct way in which the DNA sequence can influence protein binding.

Direct readout of the nucleotide sequence

It was clear from the double helix structure described by Watson and Crick (Section 1.1.3) that although the nucleotide bases are on the inside of the DNA molecule, they are not entirely buried, and some of the chemical groups attached to the purine and pyrimidine bases are accessible from outside the helix. Direct readout of the nucleotide sequence should therefore be possible without breaking the base pairs and opening up the molecule.

In order to form chemical bonds with groups attached to the nucleotide bases, a binding protein must make contacts within one or both of the grooves on the surface of the helix. With the B-form of DNA, the identity and orientation of the exposed parts of the bases within the major groove is such that most sequences can be read unambiguously, whereas within the minor groove it is possible to identify if each base pair is A-T or G-C but difficult to know which nucleotide of the pair is in which strand of the helix (Figure 9.15; Kielkopf et al., 1998). Direct readout of the B-form therefore predominantly involves contacts in the major groove. With other DNA types there is much less information on the contacts formed with binding proteins, but the picture is likely to be quite different. In the A-form, for example, the major groove is deep and narrow and less easily penetrated by any part of a protein molecule (see Table 1.1). The shallower minor groove is therefore likely to play the main part in direct readout. With Z-DNA, the major groove is virtually non-existent and direct readout is possible to a certain extent without moving beyond the surface of the helix.
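The difference in information content between the two grooves can be made concrete with a small sketch. It uses a widely used shorthand for the groups presented at the edge of each base pair (A = hydrogen-bond acceptor, D = hydrogen-bond donor, M = methyl group, H = non-polar hydrogen). The encoding and the names `MAJOR_GROOVE`, `MINOR_GROOVE` and `distinguishable` are illustrative assumptions and are not taken from Figure 9.15.

```python
# Chemical groups presented by each base pair, read across the groove from
# the first-named base to its partner (an assumed, simplified encoding).
# A = hydrogen-bond acceptor, D = donor, M = methyl group, H = non-polar hydrogen
MAJOR_GROOVE = {
    "A-T": "ADAM",
    "T-A": "MADA",
    "G-C": "AADH",
    "C-G": "HDAA",
}
MINOR_GROOVE = {
    "A-T": "AHA",
    "T-A": "AHA",
    "G-C": "ADA",
    "C-G": "ADA",
}


def distinguishable(groove):
    """Group together base pairs that present the same pattern in a groove."""
    classes = {}
    for pair, pattern in groove.items():
        classes.setdefault(pattern, []).append(pair)
    return sorted(classes.values())


print(distinguishable(MAJOR_GROOVE))  # every base pair forms its own class
print(distinguishable(MINOR_GROOVE))  # A-T/T-A and G-C/C-G cannot be separated
```

In this simplified picture the major groove yields four distinct patterns, whereas the minor groove yields only two, which is why the orientation of a base pair is hard to read from the minor groove.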

Figure 9.15. Recognition of an A-T base pair in the B-form double helix. An A-T base pair is shown in outline (see Figure 1.11B), with arrows indicating the chemical features that can be recognized by accessing the base pair via the major groove (above) and minor (more...)

The nucleotide sequence has a number of indirect effects on helix structure

The recent change in our view of DNA structure concerns the influence of the nucleotide sequence on the conformation of the helix at different positions along its length. Originally it was thought that cellular DNA molecules have fairly uniform structures, made up mainly of the B-form of the double helix. Some short segments might be in the A-form, and there might be some Z-DNA tracts, especially near the ends of a molecule, but the majority of the length of a double helix would be unvarying B-DNA. We now recognize that DNA is much more polymorphic, and that it is possible for the A-, B- and Z-DNA configurations, and intermediates between them, to coexist within a single DNA molecule, different parts of the molecule having different structures. These conformational variations are sequence dependent, being largely the result of the base-stacking interactions that occur between adjacent base pairs. As well as being responsible, along with base-pairing, for the stability of the helix, the base-stacking also influences the amount of rotation that occurs around the covalent bonds within individual nucleotides and hence determines the conformation of the helix at a particular position. The rotational possibilities in one base pair are influenced, via the base-stacking interactions, by the identities of the neighboring base pairs. This means that the nucleotide sequence indirectly affects the overall conformation of the helix, possibly providing structural information that a binding protein can use to help it locate its appropriate attachment site on a DNA molecule. At present this is just a theoretical possibility as no protein that specifically recognizes a non-B form of the helix has been identified, but many researchers believe that helix conformation is likely to play some role in the interaction between DNA and protein.

A second type of conformational change is DNA bending (Travers, 1995). This does not refer to the natural flexibility of DNA which enables it to form circles and supercoils, but instead to localized positions where the nucleotide sequence causes the DNA to bend. Like other conformational variations, DNA bending is sequence dependent. In particular, a DNA molecule in which one polynucleotide contains two or more groups of repeated adenines, each group comprising 3–5 As, with individual groups separated by 10 or 11 nucleotides, will bend at the 3′ end of the adenine-rich region (Young and Beveridge, 1998). As with helix conformation, it is not yet known to what extent DNA bending influences protein binding, although protein-induced bending at flexible sites has a clearly demonstrated function in the regulation of some genes (e.g. Falvo et al., 1995; Section 9.3.2).
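The sequence rule for bending described above lends itself to a simple scan. The following is a minimal sketch, not a validated predictor of DNA curvature: the function name `phased_a_tracts` is hypothetical, and it assumes that ‘separated by 10 or 11 nucleotides’ means the number of nucleotides lying between the end of one group of adenines and the start of the next.

```python
import re


def phased_a_tracts(seq, min_len=3, max_len=5, gaps=(10, 11)):
    """Return pairs of neighbouring A-tracts whose separation matches `gaps`.

    Assumption: the separation is counted as the nucleotides between the end
    of one tract and the start of the next. Positions are 0-based, end-exclusive.
    """
    seq = seq.upper()
    # Maximal runs of A whose length falls in the permitted range (3-5 by default).
    tracts = [m.span() for m in re.finditer(r"A+", seq)
              if min_len <= m.end() - m.start() <= max_len]
    hits = []
    for (s1, e1), (s2, e2) in zip(tracts, tracts[1:]):
        if s2 - e1 in gaps:
            hits.append(((s1, e1), (s2, e2)))
    return hits


# Hypothetical test sequence: two AAAA groups separated by 10 nucleotides.
example = "GC" + "AAAA" + "CGCGTCGCTG" + "AAAA" + "GC"
print(phased_a_tracts(example))
```

A sequence that returns one or more pairs satisfies the spacing rule quoted in the text; whether it actually bends would have to be established experimentally.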

Contacts between DNA and proteins

The contacts formed between DNA and its binding proteins are non-covalent. Within the major groove, hydrogen bonds form between the nucleotide bases and the R groups of amino acids in the recognition structure of the protein, whereas in the minor groove hydrophobic interactions are more important. On the surface of the helix, the major interactions are electrostatic, between the negative charges on the phosphate component of each nucleotide and the positive charges on the R groups of amino acids such as lysine and arginine, although some hydrogen bonding also occurs. In some cases, hydrogen bonding on the surface of the helix or in the major groove is direct between DNA and protein; in others it is mediated by water molecules. Few generalizations can be made: at this level of DNA-protein interaction each example has its own unique features and the details of the bonding have to be worked out by structural studies rather than by comparisons with other proteins.

Most proteins that recognize specific sequences are also able to bind non-specifically to other parts of a DNA molecule. In fact it has been suggested that the amount of DNA in a cell is so large, and the numbers of each binding protein so small, that the proteins spend most, if not all, of their time attached non-specifically to DNA (Stormo and Fields, 1998). The distinction between the non-specific and specific forms of binding is that the latter is more favorable in thermodynamic terms. As a result, a protein is able to bind to its specific site even though there are literally millions of other sites to which it could attach non-specifically. To achieve this thermodynamic favorability, the specific binding process must involve the greatest possible number of DNA-protein contacts, which explains in part why the recognition structures of many DNA-binding motifs have evolved to fit snugly into the major groove of the helix, where the opportunity for DNA-protein contacts is greatest. It also explains why some DNA-protein interactions result in conformational changes to one or other partner, increasing still further the complementarity of the interacting surfaces, and allowing additional bonding to occur.
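The thermodynamic argument can be illustrated with a deliberately simple equilibrium model: one specific site competing with a large number of equivalent non-specific sites for a single protein molecule. The function name, the number of non-specific sites and the energy differences used below are all assumptions chosen for illustration, and the model ignores protein concentration, cooperativity and kinetics.

```python
import math

R = 8.314e-3  # gas constant in kJ per mol per K


def fraction_at_specific_site(ddg_kj_per_mol, n_nonspecific, temperature_k=298.0):
    """Equilibrium fraction of time spent at the single specific site.

    ddg_kj_per_mol is how much more favorable specific binding is than
    non-specific binding (positive means more favorable).
    """
    # Statistical weight of the specific site relative to one non-specific site.
    weight = math.exp(ddg_kj_per_mol / (R * temperature_k))
    return weight / (weight + n_nonspecific)


# Hypothetical genome offering several million non-specific sites.
n_sites = 4_000_000
for ddg in (0.0, 20.0, 40.0, 60.0):
    print(ddg, fraction_at_specific_site(ddg, n_sites))
```

With no energetic advantage the protein is almost never found at its specific site, and occupancy rises steeply as the advantage grows. In this toy model, that is why maximizing the number of favorable contacts matters.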

The need to maximize contacts in order to ensure specificity is also one of the reasons why many DNA-binding proteins are dimers, consisting of two proteins attached to one another. This is the case for most HTH proteins and many of the zinc-finger type. Dimerization occurs in such a way that the DNA-binding motifs of the two proteins are both able to access the helix, possibly with some degree of cooperativity between them, so that the resulting number of contacts is greater than twice the number achievable by a monomer. As well as their DNA-binding motifs, many proteins contain additional characteristic domains that participate in the protein-protein contacts that result in dimer formation. One of these is the leucine zipper, which is an α-helix that coils more tightly than normal and presents a series of leucines on one of its faces. These can form contacts with the leucines of the zipper on a second protein, forming the dimer (Figure 9.16). A second dimerization domain is, rather unfortunately, called the helix-loop-helix motif, which is distinct from, and should not be confused with, the helix-turn-helix DNA-binding motif.

Figure 9.16. A leucine zipper. This is a bZIP type of leucine zipper. The blue and green structures are parts of different proteins. Each set of spheres represents the R group of a leucine amino acid. Leucines in the two helices associate with one another via hydrophobic (more...)

9.2. DNA-Protein Interactions During Transcription Initiation

Now that we have established that DNA-protein interactions are the key to understanding the initiation of transcription, we can move on to begin our examination of the events involved in the assembly of the initiation complex. We will do this in two stages. First, we will study the DNA-protein interactions that are involved in transcription initiation. Then, in Section 9.3, we will investigate how assembly of the initiation complex, and its ability to initiate transcription, can be controlled by various additional proteins that respond to stimuli from inside or outside the cell and ensure that the correct genes are transcribed at the appropriate times.

9.2.1. RNA polymerases

In Section 3.2.2 we learnt that the enzymes responsible for transcription of DNA into RNA are called DNA-dependent RNA polymerases. Transcription of eukaryotic nuclear genes requires three different RNA polymerases: RNA polymerase I, RNA polymerase II and RNA polymerase III. Each is a multi-subunit protein (8–12 subunits) with a molecular mass in excess of 500 kDa. Structurally, these polymerases are quite similar to one another, the three largest subunits being closely related and some of the smaller ones being shared by more than one enzyme; functionally, however, they are quite distinct. Each works on a different set of genes, with no interchangeability (Table 9.3). Most research attention has been directed at RNA polymerase II, as this is the one that transcribes genes that code for proteins. It also works on a set of genes specifying the small nuclear RNAs that are involved in RNA processing. RNA polymerase III transcribes other genes for small RNAs, including those for transfer RNAs (tRNAs). RNA polymerase I transcribes the multicopy repeat units containing the 28S, 5.8S and 18S rRNA genes. The functions of all these RNAs were summarized in Section 3.2.1 and are described in detail in Chapters 10 and 11.

Table 9.3. Functions of the three eukaryotic nuclear RNA polymerases.

Archaea possess a single RNA polymerase that is very similar to the eukaryotic enzymes (Bult et al., 1996). But this is not typical of the prokaryotes in general because the bacterial RNA polymerase is very different, consisting of just five subunits, described as α2ββ′σ (two α subunits, one each of β and the related β′, and one of σ). The α, β and β′ subunits are equivalent to the three largest subunits of the eukaryotic RNA polymerases, but the σ subunit has its own special properties, both in terms of its structure and, as we will see in the next section, its function.

Box 9.3. Mitochondrial and chloroplast RNA polymerases. The RNA polymerases that transcribe organelle genes are unlike their counterparts in the nucleus, reflecting the bacterial origins of mitochondria and chloroplasts (Section 2.2.2). The mitochondrial RNA polymerase (more...)

9.2.2. Recognition sequences for transcription initiation

It is essential that transcription initiation complexes are constructed at the correct positions on DNA molecules. These positions are marked by target sequences that are recognized either by the RNA polymerase itself or by a DNA-binding protein which, once attached to the DNA, forms a platform to which the RNA polymerase binds (see Figure 3.6).

Bacterial RNA polymerases bind to promoter sequences

In bacteria, the target sequence for RNA polymerase attachment is called the promoter. This term was first used by geneticists in 1964 to describe the function of a locus immediately upstream of the three genes in the lactose operon (Figure 9.17). When this locus was inactivated by mutation, the genes in the operon were not expressed; the locus therefore appeared to promote expression of the genes. We now know that this is because the locus is the binding site for the RNA polymerase that transcribes the operon.

Figure 9.17. The promoter for the lactose operon of Escherichia coli. The promoter is located immediately upstream of lacZ, the first gene in the operon. The DNA sequence shows the positions of the -35 and -10 boxes, the two distinct sequence components of the promoter. (more...)

The sequences that make up the Escherichia coli promoter were first identified by comparing the regions upstream of over 100 genes. It was assumed that promoter sequences would be very similar for all genes and so should be recognizable when the upstream regions were compared. These analyses showed that the E. coli promoter consists of two segments, both of six nucleotides, described as follows (see Figure 9.17):

-35 box   5′-TTGACA-3′
-10 box   5′-TATAAT-3′

These are consensus sequences and so describe the ‘average’ of all promoter sequences in E. coli; the actual sequences upstream of any particular gene might be slightly different (Table 9.4). The names of the boxes indicate their positions relative to the point at which transcription begins. The nucleotide at this point is labeled ‘+1’ and is anything between 20 and 600 nucleotides upstream of the start of the coding region of the gene. The spacing between the two boxes is important because it places the two motifs on the same face of the double helix, facilitating their interaction with the DNA-binding component of the RNA polymerase (Section 9.2.3).
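The idea of a consensus as the ‘average’ of many individual sequences can be sketched in a few lines. The hexamers below are invented for illustration and are not taken from Table 9.4; the function names `consensus` and `mismatches` are likewise hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Most common nucleotide at each position of equal-length aligned sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def mismatches(site, reference):
    """Number of positions at which a site departs from the reference sequence."""
    return sum(a != b for a, b in zip(site, reference))


# Hypothetical -10 boxes from five imaginary genes (illustration only).
boxes = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATAAA"]
reference = consensus(boxes)
print(reference)
print([mismatches(box, reference) for box in boxes])
```

In this made-up set only one of the five sequences matches the consensus exactly, which mirrors the point that the sequence upstream of any particular gene may differ slightly from the consensus.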

Table 9.4. Sequences of Escherichia coli promoters.

Eukaryotic promoters are more complex

In eukaryotes, the term ‘promoter’ is used to describe all the sequences that are important in initiation of transcription of a gene. For some genes these sequences can be numerous and diverse in their functions, including not only the core promoter, sometimes called the basal promoter, which is the site at which the initiation complex is assembled, but also one or more upstream promoter elements which, as their name implies, lie upstream of the core promoter. Assembly of the initiation complex on the core promoter can usually occur in the absence of the upstream elements, but only in an inefficient way. This indicates that the proteins that bind to the upstream elements include at least some that are activators of transcription, and which therefore ‘promote’ gene expression. Inclusion of these sequences in the ‘promoter’ is therefore justified.

Each of the three types of eukaryotic RNA polymerase recognizes a different type of promoter sequence; indeed, it is the difference between the promoters that defines which genes are transcribed by which polymerases. The details for vertebrates are as follows (Figure 9.18):

Figure 9.18. Structures of eukaryotic promoters. Promoter regions are indicated in blue. The RNA polymerase III promoter structure refers to the 5S rRNA genes. Other genes transcribed by RNA polymerase III (see Table 9.3) have different promoter structures, including (more...)

  • RNA polymerase I promoters consist of a core promoter spanning the transcription start point, between nucleotides -45 and +20, and an upstream control element about 100 bp further upstream.
  • RNA polymerase II promoters are variable and can stretch for several kilobases upstream of the transcription start site. The core promoter consists of two segments: the -25 or TATA box (consensus 5′-TATAWAW-3′, where W is A or T) and the initiator (Inr) sequence (consensus 5′-YYCARR-3′, where Y is C or T, and R is A or G) located around nucleotide +1. Some genes transcribed by RNA polymerase II have only one of these two components of the core promoter, and some, surprisingly, have neither. The latter are called ‘null’ genes. They are still transcribed, possibly through interactions between the RNA polymerase and a sequence called MED-1 which lies within the gene (Novina and Roy, 1996), although the start position for transcription is more variable than for a gene with a TATA and/or Inr sequence. As well as the core promoter, genes recognized by RNA polymerase II have various upstream promoter elements, the functions of which are described in Section 9.3.2.
  • RNA polymerase III promoters are variable, falling into at least three categories. Most are unusual in that they are located within the genes whose transcription they promote; in these cases the core promoter usually spans 50–100 bp and comprises two sequence boxes. One category of RNA polymerase III promoter is instead similar to those for RNA polymerase II, having a TATA box and a range of upstream promoter elements. Interestingly, this arrangement is seen with the U6 gene, which is one of a family of genes for small nuclear RNAs, all the other members of which are transcribed by RNA polymerase II.
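The degenerate symbols in the TATA box and Inr consensus sequences (W, Y and R) can be translated directly into a pattern search. This is a minimal sketch: the function name `find_sites` and the test sequence are hypothetical, only the symbols used in the text are included, and a match to a consensus does not by itself show that a sequence functions as a promoter.

```python
import re

# Degenerate nucleotide symbols used in the consensus sequences quoted above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "Y": "[CT]", "R": "[AG]"}


def find_sites(consensus, seq):
    """0-based start positions of every match, overlapping matches included."""
    body = "".join(IUPAC[symbol] for symbol in consensus)
    # A lookahead makes each match zero-width so overlapping sites are reported.
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


TATA_BOX = "TATAWAW"
INR = "YYCARR"

# Hypothetical sequence, invented for illustration.
fragment = "GGCTATAAAAGGCGCTCAGACC"
print(find_sites(TATA_BOX, fragment))
print(find_sites(INR, fragment))
```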

9.2.3. Assembly of the transcription initiation complex

In a general sense, initiation of transcription operates along the same lines with each of the four types of RNA polymerase that we have been considering (Figure 9.19). The bacterial polymerase and the three eukaryotic enzymes all begin by attaching, directly or via accessory proteins, to their promoter or core promoter sequences. Next this closed promoter complex is converted into an open promoter complex by breakage of a limited number of base pairs around the transcription initiation site. Finally, the RNA polymerase moves away from the promoter. This last step is more complicated than it might appear because some attempts by the polymerase to achieve promoter clearance are unsuccessful and lead to truncated transcripts that are degraded soon after they are synthesized. The true completion of the initiation stage of transcription is therefore the establishment of a stable transcription complex that is actively transcribing the gene to which it is attached.

Figure 9.19. Generalized scheme for the events occurring during initiation of transcription. The core promoter is shown in blue and the transcription initiation site is indicated by a green dot. After RNA polymerase attachment, the closed complex is converted into (more...)

Although the scheme shown in Figure 9.19 is correct in outline for all four polymerases, the details are different for each one. We will begin with the more straightforward events occurring in E. coli and other bacteria, and then move on to the ramifications of initiation in eukaryotes.

Transcription initiation in E. coli

In E. coli, a direct contact is formed between the promoter and RNA polymerase. The sequence specificity of the polymerase resides in its σ subunit: the ‘core enzyme’, which lacks this component, can only make loose and non-specific attachments to DNA.

Mutational studies of E. coli promoters have shown that changes to the sequence of the -35 box affect the ability of RNA polymerase to bind, whereas changes to the -10 box affect the conversion of the closed promoter complex into the open form. These results led to the model for E. coli initiation shown in Figure 9.20, where recognition of the promoter occurs by an interaction between the σ subunit and the -35 box, forming a closed promoter complex in which the RNA polymerase spans some 60 bp from upstream of the -35 box to downstream of the -10 box. This is followed by breaking of the base pairs within the -10 box to produce the open complex. The model is consistent with the fact that the -10 boxes of different promoters are composed mainly or entirely of A-T base pairs, which are weaker than G-C pairs, being linked by just two hydrogen bonds as opposed to three (see Figure 1.11B).
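The argument about base-pair strength reduces to simple counting, as the sketch below shows. It compares the commonly quoted -10 consensus with a hypothetical GC-only hexamer of the same length; the function name `hydrogen_bonds` is an assumption, and counting hydrogen bonds is only a rough proxy for duplex stability because base stacking is ignored.

```python
# Hydrogen bonds contributed by the base pair formed by each nucleotide:
# A-T pairs have two, G-C pairs have three.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(seq):
    """Total base-pairing hydrogen bonds across a double-stranded segment."""
    return sum(H_BONDS[base] for base in seq.upper())


minus_10_consensus = "TATAAT"  # commonly quoted E. coli -10 consensus
gc_only_hexamer = "GCGCGC"     # hypothetical comparison sequence

print(hydrogen_bonds(minus_10_consensus))
print(hydrogen_bonds(gc_only_hexamer))
```

On this count the AT-only hexamer is held together by a third fewer hydrogen bonds than a GC-only hexamer of the same length.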

Figure 9.20. Initiation of transcription in Escherichia coli. The E. coli RNA polymerase recognizes the -35 box as its binding sequence. After attachment to the DNA, the transition from closed to open complex is initiated by breakage of base pairs in the AT-rich -10 (more...)

Opening up of the helix involves contacts between the polymerase and the non-template strand (i.e. the one that is not copied into RNA), again with the σ subunit playing a central role (Marr and Roberts, 1997). However, the σ subunit is not all-important because it dissociates soon after initiation is complete, converting the holoenzyme to the core enzyme which carries out the elongation phase of transcription (Section 10.1.1).

Transcription initiation with RNA polymerase II

How does the easily understandable series of events occurring in E. coli compare with the equivalent processes in eukaryotes? A look at RNA polymerase II will show us that eukaryotic initiation involves more proteins and has added complexities.

The first difference between initiation of transcription in E. coli and eukaryotes is that eukaryotic polymerases do not directly recognize their core promoter sequences. For genes transcribed by RNA polymerase II, the initial contact is made by the general transcription factor (GTF) TFIID, which is a complex made up of the TATA-binding protein (TBP) and at least 12 TBP-associated factors or TAFs. TBP is a sequence-specific protein that binds to DNA via its unusual TBP domain (Section 9.1.4), which makes contact with the minor groove in the region of the TATA box. X-ray crystallography studies of TBP show that it has a saddle-like shape that wraps partially around the double helix (Chasman et al., 1993), forming a platform onto which the remainder of the initiation complex can be assembled. The TAFs are intriguing proteins that appear to play a variety of roles during initiation of transcription and also during other events that involve assembly of multiprotein complexes onto the genome. Five of the yeast TAFs are also present in SAGA, one of the histone acetyltransferase complexes that we met in Section 8.2.1 (Grant et al., 1998), and TAFs have also been implicated in control of the cell cycle in various eukaryotes (Green, 2000) and in regulation of the developmental changes that result in formation of gametes in animals (Verrijzer, 2001). During transcription, TAFs assist in attachment of TFIID to the TATA box and, in conjunction with other proteins called TAF- and initiator-dependent cofactors (TICs), possibly also participate in recognition of the Inr sequence, especially at those promoters that lack a TATA box. A clue as to how TAFs carry out their multifarious roles has been provided by structural studies which have shown that at least three of them contain a histone fold - a DNA-binding domain which, as the name suggests, is also present in histone proteins (Table 9.2; Research Briefing 9.1). It has been proposed that these TAFs might be able to form a DNA-binding structure resembling a nucleosome (Burley and Roeder, 1996), but this idea may not be entirely correct because the TAFs lack certain amino acids that are looked on as essential for stabilizing the contacts between real nucleosomes and DNA (Gangloff et al., 2001).

Box 9.1. Similarities between TFIID and the histone core octamer. An intriguing insight into the interaction between TFIID and the RNA polymerase II promoter was provided by the discovery that several TAFs have structural similarities with histones. A key step (more...)

After TFIID has attached to the core promoter, the pre-initiation complex (PIC) is formed by attachment of the remaining GTFs (Table 9.5). Test-tube experiments suggest that these GTFs bind to the complex in the order TFIIA, TFIIB, TFIIF/RNA polymerase II, TFIIE and TFIIH (Figure 9.21) but it is now thought that in vivo assembly involves a more complex set of interactions than indicated by this step-by-step sequence (Lee and Young, 2000). Within the overall process, three events are particularly important:

Table 9.5. Functions of human general transcription factors (GTFs).

Figure 9.21. Assembly of the RNA polymerase II pre-initiation complex. The first step in assembly of the pre-initiation complex is recognition of the TATA box and possibly the Inr sequence by the TATA-binding protein (TBP), in conjunction with the TBP-associated factors (more...)

  1. Attachment of TBP induces formation of a bend in the DNA in the region of the TATA box.
  2. The bend provides a recognition structure for TFIIB, which ensures correct positioning of RNA polymerase II relative to the transcription start site.
  3. The disruption to the base pairing needed to form the open promoter complex is brought about by TFIIH (Kim et al., 2000).

The final step in assembly of the initiation complex is the addition of phosphate groups to the C-terminal domain (CTD) of the largest subunit of RNA polymerase II. In mammals, this domain consists of 52 repeats of the seven-amino-acid sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser. Two of the three serines in each repeat unit can be modified by addition of a phosphate group, causing a substantial change in the ionic properties of the polymerase. Once phosphorylated, the polymerase is able to leave the pre-initiation complex and begin synthesizing RNA. Phosphorylation might be carried out by TFIIH, which has the appropriate protein kinase capability, or it might be the function of the mediator (Section 9.3.2), which transduces signals from activator proteins that regulate expression of individual genes (Lee and Young, 2000). After departure of the polymerase, at least some of the GTFs detach from the core promoter, but TFIID, TFIIA and TFIIH remain, enabling re-initiation to occur without the need to rebuild the entire assembly from the beginning (Yudkovsky et al., 2000). Re-initiation is therefore a more rapid process than primary initiation, which means that once a gene is switched on, transcripts can be initiated from its promoter with relative ease until such a time as a new set of signals switches the gene off.
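The scale of the modification can be appreciated from some simple arithmetic on the figures given above. The sketch assumes an idealized CTD made of 52 perfect copies of the repeat; the variable names are hypothetical.

```python
# Idealized mammalian CTD: 52 perfect copies of the seven-amino-acid repeat.
HEPTAD = ["Tyr", "Ser", "Pro", "Thr", "Ser", "Pro", "Ser"]
REPEATS = 52
MODIFIABLE_SERINES_PER_REPEAT = 2  # two of the three serines, as stated in the text

ctd_length = len(HEPTAD) * REPEATS                        # amino acids in the idealized CTD
total_serines = HEPTAD.count("Ser") * REPEATS             # all serines in the domain
max_phosphates = MODIFIABLE_SERINES_PER_REPEAT * REPEATS  # upper limit on phosphate groups

print(ctd_length, total_serines, max_phosphates)
```

Under these assumptions a domain of 364 amino acids could carry up to 104 phosphate groups, which gives a sense of why phosphorylation causes a substantial change in the ionic properties of the polymerase.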

Box 9.4. Initiation of transcription in the archaea. Initiation of transcription is one of the key areas in which the archaea differ from the bacteria, emphasizing the major distinction between these two types of prokaryote. The archaeal RNA polymerase is made (more...)

Transcription initiation with RNA polymerases I and III

Initiation of transcription at RNA polymerase I and III promoters involves similar events to those seen with RNA polymerase II, but the details are different. One of the most striking similarities is that TBP, first identified as the key sequence-specific DNA-binding component of the RNA polymerase II pre-initiation complex, is also involved in initiation of transcription by the two other eukaryotic RNA polymerases.

The RNA polymerase I initiation complex involves four protein complexes in addition to the polymerase itself. One of these, UBF, is a dimer of identical proteins that interacts with both the core promoter and the upstream control element (see Figure 9.18). UBF is another protein which, like some of the RNA polymerase II TAFs, resembles a histone and may form a nucleosome-like structure in the promoter region (Wolffe, 1994). A second protein complex, called SL1 in humans and TIF-IB in mice, contains TBP and, together with UBF, directs RNA polymerase I and the last two complexes, TIF-IA and TIF-IC, to the promoter. Originally it was thought that the initiation complex was built up in a stepwise fashion, but recent results suggest that RNA polymerase I binds the four protein complexes before promoter recognition, the entire assembly attaching to the DNA in a single step (Seither et al., 1998).

RNA polymerase III promoters are variable in structure (see Figure 9.18) and this is reflected by a non-uniformity of the processes through which they are recognized. Initiation at the different categories of RNA polymerase III promoter requires different sets of GTFs, but each type of initiation process involves TFIIIB, one of whose subunits is TBP. With promoters of the type seen with the U6 gene, which contain a TATA sequence, TBP probably binds directly to the DNA. At other RNA polymerase III promoters, which have no TATA sequence, binding is probably via a second protein, the latter making the direct DNA contact.

9.3. Regulation of Transcription Initiation

As we progress through the next few chapters we will encounter a number of strategies that organisms use to regulate expression of individual genes. We will discover that virtually every step in the pathway from genome to proteome is subject to some degree of control. Of all these regulatory systems, it appears that transcription initiation is the stage at which the critical controls over the expression of individual genes (i.e. those controls that have greatest impact on the biochemical properties of the cell) are exerted. This is perfectly understandable. It makes sense that transcription initiation, being the first step in genome expression, should be the stage at which ‘primary’ regulation occurs, this being the level of regulation that determines which genes are expressed. Later steps in the pathway might be expected to respond to ‘secondary’ regulation, the function of which is not to switch genes on or off but to modulate expression by making small changes to the rate at which the protein product is synthesized, or possibly by changing the nature of the product in some way (Figure 9.22).

Figure 9.22. Primary and secondary levels of gene regulation. According to this scheme, ‘primary’ regulation of genome expression occurs at the level of transcription initiation, this step determining which genes are expressed in a particular cell ...

In Chapter 8 we looked at how chromatin structure can influence gene expression by controlling the accessibility of promoter sequences to RNA polymerase and its associated proteins. This is just one way in which initiation of transcription can be regulated. To obtain a broader picture we will establish some general principles with bacteria, and then examine the events in eukaryotes.

9.3.1. Strategies for controlling transcription initiation in bacteria

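In bacteria such as E. coli, we recognize two distinct ways in which transcription initiation is controlled:

  • Constitutive control, which depends on the structure of the promoter and determines the basal rate of transcription initiation.
  • Regulatory control, which depends on the activity of regulatory proteins.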

Promoter structure determines the basal level of transcription initiation

The consensus sequence for the E. coli promoter (Section 9.2.2) is quite variable, with a range of different motifs being permissible at both the -35 and -10 boxes (see Table 9.4). These variations, together with less well-defined sequence features around the transcription start site and in the first 50 or so nucleotides of the transcription unit, affect the efficiency of the promoter. Efficiency is defined as the number of productive initiations that are promoted per second, a productive initiation being one that results in the RNA polymerase clearing the promoter and beginning synthesis of a full-length transcript. The exact way in which the sequence of the promoter affects initiation is not known, but from our discussion of the events involved in transcription initiation (Section 9.2.3) we might, intuitively, expect that the precise sequence of the -35 box would influence recognition by the σ subunit and hence the rate of attachment of RNA polymerase, that the transition from the closed to open promoter complex might be dependent on the sequence of the -10 box, and that the frequency of abortive initiations (ones that terminate before they progress very far into the transcription unit) might be influenced by the sequence at, and immediately downstream of, nucleotide +1. All this is speculation but it is a sound ‘working hypothesis’. What is clear is that different promoters vary 1000-fold in their efficiencies, the most efficient promoters (called strong promoters) directing 1000 times as many productive initiations as the weakest promoters. We refer to these as differences in the basal rate of transcription initiation.
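The idea that promoters differ in how closely their -35 and -10 boxes match the consensus can be illustrated with a short sketch. It assumes the standard textbook consensus hexamers TTGACA (-35 box) and TATAAT (-10 box); the promoter sequences and the function names are invented for illustration. Because the exact way in which promoter sequence affects initiation is not known, a match count of this kind is only a crude descriptive score, not a predictor of efficiency.

```python
# Assumed consensus hexamers for the standard E. coli promoter
# (standard textbook values, not copied from Table 9.4).
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def matches(box: str, consensus: str) -> int:
    """Count the positions at which a box agrees with the consensus."""
    if len(box) != len(consensus):
        raise ValueError("box and consensus must have the same length")
    return sum(a == b for a, b in zip(box.upper(), consensus))


def consensus_score(box_35: str, box_10: str) -> int:
    """Total number of matches (0 to 12) across the -35 and -10 boxes."""
    return matches(box_35, CONSENSUS_35) + matches(box_10, CONSENSUS_10)


# Hypothetical promoters, invented purely for illustration.
promoters = {
    "perfect_match": ("TTGACA", "TATAAT"),
    "one_mismatch_per_box": ("TTGACT", "TATGAT"),
    "several_mismatches": ("CTGCCA", "GATACT"),
}

for name, (box_35, box_10) in promoters.items():
    print(name, consensus_score(box_35, box_10))
```

A score of this kind ignores the sequence around nucleotide +1 and the spacing between the two boxes, both of which may also influence the basal rate of initiation.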

Note that the basal rate of transcription initiation for a gene is preprogrammed by the sequence of its promoter and so, under normal circumstances, cannot be changed. It could be changed by a mutation that alters a critical nucleotide in the promoter, and undoubtedly this happens from time to time, but it is not something that the bacterium has control over. The bacterium can, however, determine which promoter sequences are favored by changing the σ subunit of its RNA polymerase. The σ subunit is the part of the polymerase that has the sequence-specific DNA-binding capability (Section 9.2.3), so replacing one version of this subunit with a different version with a slightly different DNA-binding motif, and hence an altered sequence specificity, would result in a different set of promoters being recognized. In E. coli, the standard σ subunit, which recognizes the consensus promoter sequence described in Section 9.2.2 and hence directs transcription of most genes, is called σ70 (its molecular mass is approximately 70 kDa). E. coli also has a second σ subunit, σ32, which is made when the bacterium is exposed to a heat shock. During a heat shock, E. coli, in common with other organisms, switches on a set of genes coding for special proteins that help the bacterium withstand the stress (Figure 9.23). These genes have special promoter sequences, ones specifically recognized by the σ32 subunit. The bacterium is therefore able to switch on a whole range of different genes by making one simple alteration to the structure of its RNA polymerase. This system is common in bacteria: for example, Klebsiella pneumoniae uses it to control expression of genes involved in nitrogen fixation, this time with the σ54 subunit, and Bacillus species use a whole range of different σ subunits to switch on and off groups of genes during the changeover from normal growth to formation of spores (Section 12.3.1).
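The way in which a change of σ subunit switches on a whole group of genes can be sketched as a simple lookup. The gene names below are hypothetical, and the model makes two simplifying assumptions that are not statements from the text: each promoter is recognized by exactly one σ subunit, and σ70 remains available during a heat shock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    promoter_class: str  # the sigma subunit that recognizes this gene's promoter


# Hypothetical genes; only the sigma subunit names come from the text.
GENES = [
    Gene("housekeeping_gene_1", "sigma70"),
    Gene("housekeeping_gene_2", "sigma70"),
    Gene("heat_shock_gene_1", "sigma32"),
    Gene("heat_shock_gene_2", "sigma32"),
]


def transcribed(genes, available_sigmas):
    """Return the names of genes whose promoters are recognized by an available sigma subunit."""
    return sorted(g.name for g in genes if g.promoter_class in available_sigmas)


# Normal growth: only the standard subunit is present.
print(transcribed(GENES, {"sigma70"}))

# Heat shock (assumption: sigma32 is made in addition to sigma70).
print(transcribed(GENES, {"sigma70", "sigma32"}))
```

The point of the sketch is that nothing about the individual genes changes between the two calls: the set of transcribed genes is altered solely by changing which σ subunits are available.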

Figure 9.23. Recognition of an Escherichia coli heat shock gene by the σ32 subunit. (A) The sequence of the heat-shock promoter is different from that of the normal E. coli promoter (compare with Table 9.4). (B) The heat-shock promoter is not recognized by ...

Regulatory control over bacterial transcription initiation

Promoter structure determines the basal level of transcription initiation for a bacterial gene but, with the exception of alternative σ subunits, does not provide any general means by which the expression of the gene can respond to changes in the environment or to the biochemical requirements of the cell. Other types of regulatory control are needed.

The foundation of our understanding of regulatory control over transcription initiation in bacteria was laid in the early 1960s by François Jacob, Jacques Monod, and other geneticists who studied the lactose operon and other model systems (Burian and Gayon, 1999). We have already seen how this work led to discovery of the promoter for the lactose operon (Section 9.2.2). It also resulted in identification of the operator, a region adjacent to the promoter that regulates initiation of transcription of the operon (Figure 9.24A). The original model envisaged that a DNA-binding protein - the lactose repressor - attached to the operator and prevented the RNA polymerase from binding to the promoter, simply by denying it access to the relevant segment of DNA (Figure 9.24B). Whether the repressor binds depends on the presence in the cell of allolactose, an isomer of lactose, the latter being the substrate for the biochemical pathway carried out by the enzymes coded by the three genes in the operon. Allolactose is an inducer of the lactose operon. When allolactose is present it binds to the lactose repressor, causing a slight structural change which prevents the HTH motifs of the repressor from recognizing the operator as a DNA-binding site. The allolactose-repressor complex therefore cannot bind to the operator, enabling the RNA polymerase to gain access to the promoter. When the supply of lactose is used up and there is no allolactose left to bind to the repressor, the repressor re-attaches to the operator and prevents transcription. The operon is therefore expressed only when the enzymes coded by the operon are needed.

Figure 9.24. Regulation of the lactose operon of Escherichia coli. (A) The operator sequence lies immediately downstream of the promoter for the lactose operon. (B) In the original model for lactose regulation, the lactose repressor is looked on as a simple blocking ...

Most of the original scheme for regulation of the lactose operon has been confirmed by DNA sequencing of the control region and by structural studies of the repressor bound to its operator. The one complication has been the discovery that the repressor has three potential binding sites at nucleotide positions -82, +11 and +412, and attachment at only one of these, +11, would be expected to prevent access of the polymerase to the promoter. The repressor is a tetramer of four identical proteins which work in pairs to attach to a single operator, so it is possible that the repressor has the capacity to bind to two of the three operator sites at once. It is also possible that the repressor can bind to an operator sequence in such a way that it does not block attachment of the polymerase to the promoter, but does prevent a later step in initiation, such as formation of the open promoter complex.

The lactose operon illustrates the basic principle of regulatory control of transcription initiation: attachment of a DNA-binding protein to its specific recognition site can influence the events involved in assembly of the transcription initiation complex and/or initiation of productive RNA synthesis by an RNA polymerase. Several variations on this theme are seen with other bacterial genes:

  • Some repressors respond not to an inducer but to a co-repressor. An example is provided by the tryptophan operon of E. coli, which codes for a set of genes involved in synthesis of tryptophan (see Figure 2.20B). In contrast to the lactose operon, the regulatory molecule for the tryptophan operon is not a substrate for the relevant biochemical pathway, but the product, tryptophan itself (Figure 9.25). Only when tryptophan is attached to the tryptophan repressor can the latter bind to the operator. The tryptophan operon is therefore switched off in the presence of tryptophan, and switched on when tryptophan is needed.
  • Some DNA-binding proteins are activators rather than repressors of transcription initiation. The best studied example in E. coli is the catabolite activator protein, which binds to sites upstream of several operons, including the lactose operon, and increases the efficiency of transcription initiation, probably by forming a direct contact with the RNA polymerase. The biological role of the catabolite activator protein is described in Section 12.1.1.
  • Other DNA-binding proteins work singly or together to increase or repress transcription of genes to which they are not closely linked. Their binding sites, called enhancers and silencers, are not common in bacteria but a few examples are known, including an enhancer that acts on the E. coli heat-shock genes whose promoters are recognized by the σ32 version of the RNA polymerase. Because these sites are so far from the genes that they control, the bound proteins can only form a contact with the RNA polymerase if the DNA forms a loop. A characteristic feature is that a single enhancer or silencer can control expression of more than one gene.

Figure 9.25. Regulation of the tryptophan operon of Escherichia coli. Regulation occurs via a repressor-operator system in a similar way to that described for the lactose operon (Figure 9.24) but with the difference that the operon is repressed by the regulatory molecule, ...
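The contrast between an inducer and a co-repressor can be written out as two small truth functions. This is a minimal sketch of the original repressor-operator model only: it treats each repressor as a simple on/off block, and it deliberately ignores the catabolite activator protein, the additional lactose operator sites and any residual transcription from a repressed operon. The function names are invented for illustration.

```python
def lac_operon_on(allolactose_present: bool) -> bool:
    """Lactose operon: the free repressor binds the operator; allolactose (the inducer) releases it."""
    repressor_bound = not allolactose_present
    return not repressor_bound


def trp_operon_on(tryptophan_present: bool) -> bool:
    """Tryptophan operon: the repressor binds the operator only when tryptophan (the co-repressor) is attached."""
    repressor_bound = tryptophan_present
    return not repressor_bound


for present in (False, True):
    print(
        f"regulatory molecule present={present}: "
        f"lac on={lac_operon_on(present)}, trp on={trp_operon_on(present)}"
    )
```

The two functions differ only in the sign of the relationship between the regulatory molecule and repressor binding, which is the essential difference between the two operons: the lactose operon responds to the substrate of its pathway, the tryptophan operon to the product.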

Box 9.5. Cis and trans. ‘Cis’ and ‘trans’ are two important terms relevant to the genetic study of gene regulation in bacteria and other organisms. A locus is cis-acting on a second locus if it must be on the same DNA molecule in ...

9.3.2. Control of transcription initiation in eukaryotes

With bacteria, it is possible to make a clear distinction between constitutive and regulatory forms of control over transcription initiation. The former depends on promoter structure and determines the basal rate of transcription initiation; the latter depends on the activity of regulatory proteins and changes the rate of transcription initiation if the basal rate is inappropriate for the prevailing conditions. With eukaryotes, categorization of different types of control system is less easy. This is because of a fundamental difference in transcription initiation between bacteria and eukaryotes. In bacteria, the RNA polymerase has a strong affinity for its promoter and the basal rate of transcription initiation is relatively high for all but the weakest promoters. With most eukaryotic genes, the reverse is true. The RNA polymerase II and III pre-initiation complexes do not assemble efficiently and the basal rate of transcription initiation is therefore very low, regardless of how ‘strong’ the promoter is. In order to achieve effective initiation, formation of the complex must be activated by additional proteins. Some of these could be defined as ‘constitutive’ activators, in that they work on many different genes and seem not to respond to any external signals; others could be termed ‘regulatory’ activators because they target a limited number of genes and do respond to external signals. But there are gray areas between the two types and it is unwise to use this categorization as anything other than a guide to the types of event that occur.

Activators of eukaryotic transcription initiation

Any protein that stimulates transcription initiation is called an activator. Initially it was imagined that all activators were sequence-specific DNA-binding proteins, some recognizing upstream promoter elements and influencing transcription initiation only at the promoter to which these elements are attached, and others targeting sites within enhancers and influencing transcription of several genes at once (Figure 9.26). As with bacteria, eukaryotic enhancers can be some distance from their genes; their target specificity is ensured by the presence of insulators at either side of each functional domain, preventing the enhancers within that domain from influencing gene expression in adjacent domains (Section 8.1.2). Whether bound to an upstream promoter element or to a more distant enhancer, the activator, according to the traditional view, stabilizes the pre-initiation complex by making contact with it.

Figure 9.26. Activators of eukaryotic transcription initiation. The blue activator is attached to a regulatory module upstream of a gene, and influences transcription initiation only at that single gene. The green activator is attached to a site within an enhancer ...

This traditional view still holds for the majority of activators that have been identified but cannot be looked upon as all-encompassing. We have already seen that some proteins that were initially identified as activators are now recognized as components of chromatin modification complexes such as SAGA and Swi/Snf (Section 8.2.1). Other proteins classed as activators influence gene expression by introducing bends and other distortions into DNA (Thomas and Travers, 2001), possibly as a prelude to chromatin modification, or possibly to bring together proteins attached to non-adjacent sites, enabling the bound factors to work together in a structure that has been called an enhanceosome. An example of an activator that works in this way is SRY, which is the primary protein responsible for determining sex in mammals (Wolffe, 1995). Still other activators have no DNA-binding properties and they stimulate transcription simply by forming protein-protein contacts with the pre-initiation complex. As more and more activators are discovered, our appreciation of their diversity will undoubtedly grow (Lee and Young, 2000).

Activators have been looked upon as important in initiation by RNA polymerases II and III, but their role at RNA polymerase I promoters has been less well defined. RNA polymerase I is unusual in that it transcribes just a single set of genes: the multiple copies of the transcription unit containing the 28S, 5.8S and 18S rRNA sequences (Section 3.2.1). These genes are expressed continuously in most cells, but the rate of transcription varies during the cell cycle and is subject to a certain amount of tissue-specific regulation. The regulatory mechanism has not been described in detail but recent research has suggested a role for the RNA polymerase I termination factor. This factor, called TTF-1 in mice and Reb1p in Saccharomyces cerevisiae, was first identified as an activator of RNA polymerase II transcription. It appears that the termination factor may also activate RNA polymerase I transcription, a binding site for it having been located immediately upstream of the promoter for the rRNA transcription unit (Reeder and Lang, 1997).

Contacts between activators and the pre-initiation complex

A critical feature of the ‘traditional’ type of activator - those that bind to upstream promoter elements or to enhancers - is the contact that is formed with the pre-initiation complex. The part of the activator that makes this contact is called the activation domain. Structural studies have shown that although activation domains are variable, most of them fall into one of three categories:

  • Acidic domains, which are relatively rich in acidic amino acids (aspartic acid and glutamic acid).
  • Glutamine-rich domains.
  • Proline-rich domains.

Details of the interaction between activators and the pre-initiation complex were obscure for several years, with apparently conflicting evidence coming from work with different organisms. A number of protein-protein interaction studies had suggested that direct contacts could be made between different activators and various parts of the complex, with TBP, various TAFs, TFIIB, TFIIH and RNA polymerase II all implicated as partners in different interactions. An alternative possibility was raised when a large protein complex called the mediator was identified in yeast. The mediator forms a physical contact between activators and the C-terminal domain of RNA polymerase II (Figure 9.27; Thompson et al., 1993; Kim et al., 1994), suggesting that rather than direct interaction between an activator and the pre-initiation complex, the signal is transduced by the mediator. This hypothesis was strengthened when it was shown that the mediator possesses a protein kinase activity that enables it to phosphorylate the CTD of RNA polymerase II, stimulating promoter clearance (Section 9.2.3). The importance of the mediator in yeast transcription initiation was further underlined by the discovery that several of its components were previously looked upon as coactivators, proteins that are needed for full activation of the pre-initiation complex but which do not themselves respond directly to any of the external signals that modulate genome expression (as described in the last section in this chapter).

Figure 9.27. The role of the mediator.


For a few years it appeared that the mediator might not be a common feature of eukaryotic pre-initiation complexes in general, but eventually an equivalent structure was identified in mammalian cells (Kingston, 1999; Malik and Roeder, 2000). Subsequent work has shown that there are several different versions of the mammalian mediator, each one responding to a different, although possibly overlapping, set of activators. Current opinion tends to the view that a mediator is an obligatory component of the RNA polymerase II pre-initiation complex, and that the stimulatory effects of all activators pass through the mediator. The possibility that some activators bypass the mediator and have a direct effect on one or other part of the pre-initiation complex cannot, however, be discounted.

Repressors of eukaryotic transcription initiation

Most of the research on regulation of transcription initiation in eukaryotes has concentrated on activation, partly because the low level of basal initiation occurring at RNA polymerase II and III promoters suggests that the repression of initiation, which is so important in bacteria (Section 9.3.1), is unlikely to play a major part in control of eukaryotic transcription. This view is probably incorrect because a growing number of DNA-binding proteins that repress transcription initiation are being discovered, these proteins binding to upstream promoter elements or to more distant sites in silencers. Some influence genome expression in a general way through histone deacetylation or DNA methylation (Section 8.2.2), but others have more specific effects at individual promoters. The yeast repressors called Mot1 and NC2, for example, inhibit assembly of the pre-initiation complex by binding directly to TBP and disrupting its activity. Mot1 causes TBP to dissociate from the DNA, and NC2 prevents further assembly of the complex on the bound TBP (Lee and Young, 2000). Both of these repressors have a broad spectrum of activity, inactivating a large set of genes, as does the Ssn6-Tup1 repressor, which is one of the main gene silencers in the yeast Saccharomyces cerevisiae, and which has homologs in many other eukaryotes (Smith and Johnson, 2000).

Another indication of the importance of repression in eukaryotic transcription comes from the demonstration that some proteins can exert both activating and repressing effects, depending on the circumstances. NC2, for example, represses initiation of transcription from promoters with a TATA box but has an activating effect on promoters that lack the TATA sequence (Willy et al., 2000). Pit-1, which is the first of the three proteins after which the POU domain is named (Section 9.1.4), activates some genes and represses others, depending on the sequence of its DNA-binding site (Scully et al., 2000). The presence in this site of two additional nucleotides induces a change in the conformation of Pit-1, enabling it to interact with a second protein called N-CoR and repress transcription of the target gene (Figures 9.28 and 9.29).

Figure 9.28. Conformation of the POU domains of the Pit-1 activator bound to its target sites upstream of the prolactin and growth-hormone genes. Pit-1 is a dimer, and each monomer has two POU domains (Section 9.1.4). The two domains of one monomer are shown in red ...

Figure 9.29. Pit-1 can activate or repress transcription initiation depending on the sequence of its DNA-binding site. Pit-1 activates transcription of the prolactin gene but represses transcription of the growth hormone gene. The drawing shows the contacts made between ...

Relatively little is known about the precise interactions occurring between repressors and the pre-initiation complex. A variety of inhibition domains (the converse of an activation domain) have been identified in eukaryotic repressors, several of which are rich in prolines, but no general patterns have emerged (Hanna-Rose and Hansen, 1996). The direct interactions with TBP displayed by Mot1 and NC2 argue against the involvement in repression of a complex equivalent to the mediator that is required for gene activation.

Box 9.6. The modular structures of RNA polymerase II promoters. The promoter for a gene transcribed by RNA polymerase II can be looked on as a series of modules, each comprising a short sequence of nucleotides and each acting as the binding site for a protein ...

Controlling the activities of activators and repressors

The operation of individual activators and repressors must be controlled in order to ensure that the appropriate set of genes is expressed by a cell. We will return to this topic in Chapter 12, when it will form the central theme of our study of the ways in which genome activity is regulated in response to extracellular signals and during differentiation and development.

There are several ways in which an activator or repressor could be regulated. One possibility is to control its synthesis, but this does not permit rapid changes in genome expression because it takes time to accumulate an activator or repressor in the cell, or to destroy it when it is not needed. This type of control is therefore associated with activators and repressors responsible for maintaining stable patterns of genome expression, for example those underlying cellular differentiation and some aspects of development. An alternative way of controlling an activator or repressor is by chemical modification, for example by phosphorylation, or by inducing a change in its conformation. These changes are much more rapid than de novo synthesis, and enable the cell to respond to extracellular signaling compounds that induce transient changes in genome expression. We will examine the details of these various regulatory mechanisms in Chapter 12.

Study Aids For Chapter 9

Key terms

Give short definitions of the following terms:

  • β-turn
  • κ-homology domain
  • Acidic domain
  • Activation domain
  • Activator
  • Affinity chromatography
  • Basal promoter
  • Basal promoter element
  • Basal rate of transcription initiation
  • Basic domain
  • CAAT box
  • Catabolite activator protein
  • Cell-specific module
  • Chemical shift
  • Closed promoter complex
  • Coactivator
  • Constitutive control
  • Core promoter
  • Co-repressor
  • C-terminal domain (CTD)
  • Cys2His2 finger
  • Direct readout
  • DNA bending
  • DNA-binding motif
  • DNA-binding protein
  • Double-stranded RNA binding domain (dsRBD)
  • Enhanceosome
  • Enhancer
  • Footprinting
  • GC box
  • Gel retardation
  • General transcription factor (GTF)
  • Glutamine-rich domain
  • Helix-loop-helix
  • Helix-turn-helix
  • Homeodomain
  • Inducer
  • Inhibition domain
  • Initiator (Inr) sequence
  • Lactose repressor
  • Leucine zipper
  • Mediator
  • Modification interference
  • Modification protection
  • Multicysteine zinc finger
  • Nuclear magnetic resonance (NMR) spectroscopy
  • Octamer module
  • Open promoter complex
  • Operator
  • POU domain
  • Pre-initiation complex (PIC)
  • Proline-rich domain
  • Promoter
  • Promoter clearance
  • Recognition helix
  • Regulatory control
  • Response module
  • Ribbon-helix-helix motif
  • Ribonucleoprotein (RNP) domain
  • RNA polymerase I
  • RNA polymerase II
  • RNA polymerase III
  • Silencer
  • Strong promoter
  • TAF and initiator-dependent cofactor (TIC)
  • TATA box
  • TATA-binding protein (TBP)
  • TBP domain
  • TBP-associated factor (TAF)
  • Termination factor
  • Upstream control element
  • Upstream promoter element
  • Winged helix-turn-helix
  • X-ray crystallography
  • X-ray diffraction
  • X-ray diffraction pattern
  • Zinc finger

Self study questions

  1. Explain why DNA-binding proteins are central to genome expression.
  2. Describe how gel retardation can be used to study DNA-protein interactions. What are the limitations of this technique?
  3. Draw diagrams to illustrate the modification protection and modification interference techniques. Indicate the key differences and describe how these differences underlie the specific applications of the two techniques.
  4. Explain how affinity chromatography is used to purify a DNA-binding protein.
  5. Write short essays on (a) X-ray crystallography, and (b) nuclear magnetic resonance spectroscopy, emphasizing the use of these techniques in the study of DNA-binding proteins.
  6. Describe, with examples, how proteins that contain the helix-turn-helix motif bind to DNA. List, again with examples, the modified versions of the helix-turn-helix motif that are found in eukaryotic proteins.
  7. Using examples, distinguish between two or more types of zinc finger.
  8. Compare and contrast the structures used by proteins to bind to DNA and/or RNA molecules.
  9. What features of the double helix are important in determining the nature of the interaction between DNA and a binding protein?
  10. Describe the types of contact made between DNA and a binding protein. Why are many DNA-binding proteins dimeric?
  11. Distinguish between the three nuclear RNA polymerases of eukaryotes. How is the Escherichia coli RNA polymerase similar to or different from the eukaryotic enzymes?
  12. Define the term ‘promoter’. Draw annotated diagrams to illustrate the structures of the promoters for the three eukaryotic RNA polymerases and for the Escherichia coli enzyme.
  13. Explain the roles of the two components of the Escherichia coli promoter during initiation of transcription. Be sure to make clear the difference between the closed and open versions of the promoter-RNA polymerase complex.
  14. Write an essay on ‘Assembly of the RNA polymerase II initiation complex’. As part of your essay, compile a table giving the names of the main proteins or groups of proteins involved in assembly of this complex, along with a summary of the role of each one.
  15. What is the importance of the C-terminal domain of the largest subunit of RNA polymerase II?
  16. How does the TATA-binding protein provide a link between the initiation processes of all three eukaryotic RNA polymerases?
  17. Describe how promoter structure influences gene expression in Escherichia coli.
  18. Using examples, outline how the use of alternative σ subunits enables a bacterium to alter its pattern of genome expression.
  19. Draw a series of diagrams to show how initiation of transcription is regulated at the lactose and tryptophan operons of Escherichia coli. Indicate the key differences between these two control mechanisms.
  20. What is an activator? How do activators influence assembly of the RNA polymerase II initiation complex?
  21. Explain why the discovery of a mammalian mediator was looked upon as a critical breakthrough in understanding the control of transcription initiation.
  22. Describe our current knowledge of proteins that repress eukaryotic transcription initiation.

Problem-based learning

  1. The methods for locating the positions of protein binding sites described in Section 9.1.1 assume that these sites are located in the region upstream of a gene. Is this assumption justified?
  2. Use your knowledge of DNA chip and microarray technologies (Technical Note 5.1) to devise a method for identifying the attachment sites for a DNA-binding protein across the entire genome, as opposed to just within the region upstream of a single gene.
  3. Write a report that elaborates on, and extends, the discussion presented in Box 9.2 (page 252), concerning the possibility that the amino acid sequence of a recognition helix can be used to deduce the nucleotide sequence of the DNA-binding site for a protein that contains that helix.
  4. Construct a hypothesis to explain why eukaryotes have three RNA polymerases. Can your hypothesis be tested?
  5. A model for control of transcription of the lactose operon in Escherichia coli was first proposed by François Jacob and Jacques Monod in 1961 (Jacob F and Monod J [1961] Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol., 3, 318–356). Explain the extent to which their work, which was based almost entirely on genetic analysis, provided an accurate description of the molecular events that are now known to occur.
  6. To what extent is E. coli a good model for the regulation of transcription initiation in eukaryotes? Justify your opinion by providing specific examples of how extrapolations from E. coli have been helpful and/or unhelpful in the development of our understanding of equivalent events in eukaryotes.
  7. Assess the accuracy and usefulness of the module concept for the structure of an RNA polymerase II promoter.

Box 9.2. Can sequence specificity be predicted from the structure of a recognition helix? An intriguing question is whether the specificity of DNA binding can be understood in sufficient detail for the sequence of a protein's target site to be predicted from examination ...

References

  1. Bult CJ, White O, Olsen GJ. et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science. (1996);273:1058–1073. [PubMed: 8688087]
  2. Burian RM, Gayon J. The French school of genetics: from physiological and population genetics to regulatory molecular genetics. Ann. Rev. Genet. (1999);33:313–349. [PubMed: 10690411]
  3. Burley SK, Roeder RG. Biochemistry and structural biology of transcription factor IID (TFIID). Ann. Rev. Biochem. (1996);65:769–799. [PubMed: 8811195]
  4. Chasman DI, Flaherty KM, Sharp PA, Kornberg RD. Crystal structure of yeast TATA-binding protein and model for interaction with DNA. Proc. Natl Acad. Sci. USA. (1993);90:8174–8178. [PMC free article: PMC47311] [PubMed: 8367480]
  5. Choo Y, Klug A. Physical basis of a protein-DNA recognition code. Curr. Opin. Struct. Biol. (1997);7:117–125. [PubMed: 9032060]
  6. Clarke ND, Berg JM. Zinc fingers in Caenorhabditis elegans: finding families and probing pathways. Science. (1998);282:2018–2022. [PubMed: 9851917]
  7. Evans JNS (1995) Biomolecular NMR Spectroscopy. Oxford University Press, Oxford.
  8. Falvo JV, Thanos D, Maniatis T. Reversal of intrinsic DNA bends in the IFNβ gene enhancer by transcription factors and the architectural protein HMG I(Y). Cell. (1995);83:1101–1111. [PubMed: 8548798]
  9. Fierro-Monti I, Mathews MB. Proteins binding to duplexed RNA; one motif, multiple functions. Trends Biochem. Sci. (2000);25:241–246. [PubMed: 10782096]
  10. Galas D, Schmitz A. DNase footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res. (1978);5:3157–3170. [PMC free article: PMC342238] [PubMed: 212715]
  11. Gangloff YG, Romier C, Thuault S, Werten S, Davidson I. The histone fold is a key structural motif of transcription factor TFIID. Trends Biochem. Sci. (2001);26:250–257. [PubMed: 11295558]
  12. Garner MM, Revzin A. A gel electrophoretic method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system. Nucleic Acids Res. (1981);9:3047–3060. [PMC free article: PMC327330] [PubMed: 6269071]
  13. Grant PA, Schieltz D, Pray-Grant MG. et al. A subset of TAFIIs are integral components of the SAGA complex required for nucleosome acetylation and transcriptional stimulation. Cell. (1998);94:45–53. [PubMed: 9674426]
  14. Green MR. TBP-associated factors (TAFIIs): multiple, selective transcriptional mediators in common complexes. Trends Biochem. Sci. (2000);25:59–63. [PubMed: 10664584]
  15. Hanna-Rose W, Hansen U. Active repression mechanisms of eukaryotic transcription repressors. Trends Genet. (1996);12:229–234. [PubMed: 8928228]
  16. Harrison SC, Aggarwal AK. DNA recognition by proteins with the helix-turn-helix motif. Ann. Rev. Biochem. (1990);59:933–969. [PubMed: 2197994]
  17. Hendrickson W, Schleif R. A dimer of AraC protein contacts three adjacent major groove regions at the Ara I DNA site. Proc. Natl Acad. Sci. USA. (1985);82:3129–3133. [PMC free article: PMC397728] [PubMed: 3858809]
  18. Herr W, Sturm RA, Clerc RG. et al. The POU domain: a large conserved region in the mammalian pit-1, oct-1, oct-2 and Caenorhabditis elegans unc-86 gene products. Genes Develop. (1988);2:1513–1516. [PubMed: 3215510]
  19. Kadonaga JT. Purification of sequence-specific DNA binding proteins by DNA affinity chromatography. Methods Enzymol. (1991);208:10–23. [PubMed: 1779831]
  20. Kielkopf CL, White S, Szewczyk JW. et al. A structural basis for recognition of A·T and T·A base pairs in the minor groove of B-DNA. Science. (1998);282:111–115. [PubMed: 9756473]
  21. Kim T-K, Ebright RH, Reinberg D. Mechanism of ATP-dependent promoter melting by transcription factor IIH. Science. (2000);288:1418–1421. [PubMed: 10827951]
  22. Kim YC, Geiger JH, Hahn S, Sigler PB. Crystal structure of a yeast TBP/TATA-box complex. Nature. (1993);365:512–520. [PubMed: 8413604]
  23. Kim YJ, Bjorklund S, Li Y, Sayre MH, Kornberg RD. A multiprotein mediator of transcriptional activation and its interaction with the C-terminal repeat domain of RNA polymerase II. Cell. (1994);77:599–608. [PubMed: 8187178]
  24. Kingston RE. A shared but complex bridge. Nature. (1999);399:199–200. [PubMed: 10353236]
  25. Lee TI, Young RA. Transcription of eukaryotic protein-coding genes. Ann. Rev. Genet. (2000);34:77–137. [PubMed: 11092823]
  26. Luisi B (1995) DNA-protein interaction at high resolution. In: DMJ Lilley, ed. DNA-Protein: Structural Interactions, pp. 1–48. IRL Press, Oxford.
  27. Mackay JP, Crossley M. Zinc fingers are sticking together. Trends Biochem. Sci. (1998);23:1–4. [PubMed: 9478126]
  28. Malik S, Roeder RG. Transcriptional regulation through mediator-like coactivators in yeast and metazoan cells. Trends Biochem. Sci. (2000);25:277–283. [PubMed: 10838567]
  29. Marr MT, Roberts JW. Promoter recognition as measured by binding of polymerase to nontemplate strand oligonucleotide. Science. (1997);276:1258–1260. [PubMed: 9157885]
  30. Novina CD, Roy AL. Core promoters and transcriptional control. Trends Genet. (1996);12:351–355. [PubMed: 8855664]
  31. Reeder RH, Lang WH. Terminating transcription in eukaryotes: lessons learned from RNA polymerase I. Trends Biochem. Sci. (1997);22:473–477. [PubMed: 9433127]
  32. Rhodes G (1999) Crystallography Made Crystal Clear, 2nd edition. Academic Press, London.
  33. Scully KM, Jacobson EM, Jepsen K. et al. Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science. (2000);290:1127–1131. [PubMed: 11073444]
  34. Seither P, Iben S, Grummt I. Mammalian RNA polymerase I exists as a holoenzyme with associated basal transcription factors. J. Mol. Biol. (1998);275:43–53. [PubMed: 9451438]
  35. Singh H, LeBowitz JH, Baldwin AS, Sharp PA. Molecular cloning of an enhancer binding protein: isolation by screening of an expression library with a recognition site DNA. Cell. (1988);52:415–423. [PubMed: 2964277]
  36. Smith RL, Johnson AD. Turning genes off by Ssn6-Tup1: a conserved system of transcriptional repression in eukaryotes. Trends Biochem. Sci. (2000);25:325–330. [PubMed: 10871883]
  37. Stormo GD, Fields DS. Specificity, free energy and information content in protein–DNA interactions. Trends Biochem. Sci. (1998);23:109–113. [PubMed: 9581503]
  38. Thomas JO, Travers AA. HMG1 and 2, and related ‘architectural’ DNA-binding proteins. Trends Biochem. Sci. (2001);26:167–172. [PubMed: 11246022]
  39. Thompson CM, Koleske AJ, Chao DM, Young RA. A multisubunit complex associated with the RNA polymerase II CTD and TATA-binding protein in yeast. Cell. (1993);73:1361–1375. [PubMed: 8324825]
  40. Travers AA (1995) DNA bending by sequence and proteins. In: DMJ Lilley ed. DNA-Protein: Structural Interactions, pp. 49–75. IRL Press, Oxford.
  41. Verrijzer CP. Transcription factor IID – not so basal after all. Science. (2001);293:2010–2011. [PubMed: 11557865]
  42. Willy PJ, Kobayashi R, Kadonaga JT. A basal transcription factor that activates or represses transcription. Science. (2000);290:982–984. [PubMed: 11062130]
  43. Wolffe AP. Architectural transcription factors. Science. (1994);264:1100–1101. [PubMed: 8178167]
  44. Wolffe AP (1995) Genetic effects of DNA packaging. Sci. Am., Nov/Dec, 68–77.
  45. Young MA, Beveridge DL. Molecular dynamics simulations of an oligonucleotide duplex with adenine tracts phased by a full helix turn. J. Mol. Biol. (1998);281:675–687. [PubMed: 9710539]
  46. Yudkovsky N, Ranish JA, Hahn S. A transcription reinitiation intermediate that is stabilized by activator. Nature. (2000);408:225–229. [PubMed: 11089979]

Further Reading

  1. Adhya S (1996) The lac and gal operons today. In: ECC Lin and AS Lynch, eds. Regulation of Gene Expression in Escherichia coli, pp. 181–200. Chapman & Hall, New York. —A comprehensive description of the lactose operon.
  2. Geiduschek EP, Kassavetis GA. The RNA polymerase III transcription apparatus. J. Mol. Biol. (2001);310:1–26. [PubMed: 11419933]
  3. Kornberg RD. Eukaryotic transcriptional control. Trends Cell Biol. (1999);9:M46–M49. An excellent overview. [PubMed: 10611681]
  4. Latchman DS (1995) Gene Regulation: A Eukaryotic Perspective. Stanley Thorne, Cheltenham. —This is the best general text on this subject.
  5. Latchman DS (1998) Eukaryotic Transcription Factors, 3rd edition. Academic Press, London. —This is also the best general text on this subject.
  6. Latchman DS. Transcription factors: bound to activate or repress. Trends Biochem. Sci. (2001);26:211–213. Short review of proteins that combine activation with repression. [PubMed: 11295539]
  7. Lilley DMJ (ed) (1995) DNA-Protein: Structural Interactions. IRL Press, Oxford. —Research-level description of the subject.
  8. Myers LC, Kornberg RD. Mediator of transcriptional regulation. Ann. Rev. Biochem. (2000);69:729–749. A detailed review of this topic. [PubMed: 10966474]
  9. Nagai K and Mattaj IW (eds) (1994) RNA-Protein Interactions. IRL Press, Oxford. —Complements the material in this chapter by providing details of RNA-binding proteins.
  10. Neidle S (1994) DNA Structure and Recognition: In Focus. IRL Press, Oxford. —Easy-to-digest information on DNA-binding proteins.
  11. Schleif R. Regulation of the L-arabinose operon of Escherichia coli. Trends Genet. (2000);16:559–565. Gives details of one example of bacterial gene regulation. [PubMed: 11102706]
  12. Travers A (1993) DNA-Protein Interactions. Chapman & Hall, London. —The most accessible of the various books on this topic.
Copyright © 2002, Garland Science.
Bookshelf ID: NBK21115