Synthesis and Processing of RNA

Terence A Brown

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Brown TA. Genomes. 2nd edition. Oxford: Wiley-Liss; 2002.

Chapter 10. Synthesis and Processing of RNA

Learning outcomes

When you have read Chapter 10, you should be able to:

Describe the elongation and termination phases of transcription in Escherichia coli, and explain how these are regulated by antitermination and attenuation
Give details of elongation and termination of eukaryotic transcripts, including the processes responsible for capping and polyadenylation of eukaryotic mRNAs
Distinguish between the splicing pathways of different types of intron, and in particular give a detailed description of splicing of GU-AG introns, including examples of alternative splicing
Describe the cutting events involved in processing of bacterial and eukaryotic pre-rRNA and pre-tRNA
Define the term ‘ribozyme’, and give examples of ribozymes
Explain how eukaryotic rRNAs are chemically modified at specific nucleotide positions
Give examples of mRNA editing in mammals and outline the more complex types of RNA editing that occur in various other eukaryotes
Describe the mRNA degradation processes of bacteria and eukaryotes
Outline the events involved in transport of eukaryotic RNAs from the nucleus to the cytoplasm

Initiation of transcription, culminating with the RNA polymerase leaving the promoter and beginning synthesis of an RNA molecule, is simply the first step in the genome expression pathway. In this chapter and the next we will follow the process onwards and examine how transcription and translation eventually result in synthesis of the proteome.

10.1. Synthesis and Processing of mRNA

We begin our detailed study of transcription by looking at the synthesis and processing of mRNAs, the molecules that make up the transcriptome and which specify the protein content of the cell. As the central players in genome expression, mRNAs have received the greatest attention from researchers and we now have a detailed picture of how they are produced. Events in bacteria are different in many respects from those in eukaryotes and so we will deal with the two types of organism in different sections. One aspect of eukaryotic mRNA processing - intron splicing - is so important that it requires a section of its own.

10.1.1. Synthesis of bacterial mRNAs

Bacterial mRNAs do not undergo any significant forms of processing: the primary transcript that is synthesized by the RNA polymerase is itself the mature mRNA, and its translation usually begins before transcription is complete (Figure 10.1). This coupling of transcription and translation is important in that it allows special types of control to be applied to the regulation of bacterial mRNA synthesis, as will be described later in this section in the discussion of attenuation.

Figure 10.1

In bacteria, transcription and translation are often coupled.

The elongation phase of bacterial transcription

Because there is just one bacterial RNA polymerase (Section 9.2.1), the general mechanism of transcription is the same for all bacterial genes. The following descriptions of elongation and termination, given in the context of mRNA synthesis, therefore apply equally well to the synthesis of non-coding RNA.

The chemical basis of the template-dependent synthesis of RNA was shown in Figure 3.5. Ribonucleotides are added one after another to the growing 3′ end of the RNA transcript, the identity of each nucleotide specified by the base-pairing rules: A base-pairs with T or U; G base-pairs with C. During each nucleotide addition, the β- and γ-phosphates are removed from the incoming nucleotide (see Figure 1.6), and the hydrogen is lost from the 3′-hydroxyl group of the nucleotide present at the end of the chain, whose oxygen forms the new phosphodiester bond.
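The base-pairing rules can be illustrated with a minimal Python sketch. The function name `transcribe` and the example template are hypothetical, chosen only for illustration; the template strand is assumed to be written 3′→5′ so that the RNA comes out 5′→3′.

```python
# Pairing between a DNA template base and the incoming ribonucleotide
# (A pairs with U in RNA; T in the template pairs with A; G pairs with C).
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5: str) -> str:
    """Return the RNA (5'->3') specified by a DNA template read 3'->5'."""
    rna = []
    for base in template_3to5.upper():
        # Each new ribonucleotide is added to the growing 3' end.
        rna.append(PAIRING[base])
    return "".join(rna)

print(transcribe("TACGGATTC"))  # AUGCCUAAG
```

The sketch models only the choice of nucleotide, not the chemistry of bond formation.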

At this stage of transcription, the bacterial RNA polymerase is in its core enzyme form, comprising four proteins, two relatively small (approximately 35 kDa) α subunits, and one each of the related subunits β and β′ (both approximately 150 kDa): the σ subunit that has the key role in initiation has now left the complex (Section 9.2.3). The RNA polymerase covers about 30 bp of the template DNA, including the transcription bubble of 12–14 bp, within which the growing transcript is held to the template strand of the DNA by approximately eight RNA-DNA base pairs (Figure 10.2). The RNA polymerase has to keep a tight grip on both the DNA template and the RNA that it is making in order to prevent the transcription complex from falling apart before the end of the gene is reached. However, this grip must not be so tight as to prevent the polymerase from moving along the DNA. To understand how these apparently contradictory requirements are met, the interactions between the polymerase, the DNA template and the RNA transcript are being examined by X-ray crystallography studies (Section 9.1.3), combined with crosslinking experiments in which covalent bonds are formed between the DNA or RNA and the polymerase, these bonds enabling the amino acids that are closest to the DNA and RNA to be identified. In the current model the DNA template lies between the β and β′ subunits, within a trough on the enclosed surface of β′. The active site for RNA synthesis also lies between these two subunits, with the non-template strand of DNA held within the β subunit and the RNA transcript extruded from the complex via a channel formed partly by the β and partly by the β′ subunit (Research Briefing 10.1; Korzheva et al., 2000).

Figure 10.2

Schematic representation of the Escherichia coli transcription elongation complex. The RNA polymerase covers approximately 30 bp of DNA, including a transcription bubble of 12–14 bp, with the RNA attached to the template strand of the DNA by eight (more...)

Research Briefing 10.1

The structure of the bacterial RNA polymerase. Structural studies have provided insights into the mechanism for transcription elongation and termination in bacteria. One of the most important developments in molecular biology in recent years has been (more...)

Termination of bacterial transcription

Exactly how termination occurs is not known. Current thinking views transcription as a stepwise nucleotide-by-nucleotide process, with the polymerase pausing at each position and making a ‘choice’ between continuing elongation by adding another ribonucleotide to the transcript, or terminating by dissociating from the template. Which choice is selected depends on which alternative is more favorable in thermodynamic terms (von Hippel, 1998). This model emphasizes that, in order for termination to occur, the polymerase has to reach a position on the template where dissociation is more favorable than continued RNA synthesis.

Bacteria appear to use two distinct strategies for transcription termination. About half the positions in Escherichia coli at which transcription terminates correspond to DNA sequences where the template strand contains an inverted palindrome followed by a run of deoxyadenosine nucleotides (Figure 10.3). These intrinsic terminators have been thought to promote dissociation of the polymerase by destabilizing the attachment of the growing transcript to the template, in two ways. First, when the inverted palindrome is transcribed, the RNA sequence folds into a stable hairpin, this RNA-RNA base pairing being favored over the DNA-RNA pairing that normally occurs within the transcription bubble. This reduces the number of contacts made between the template and transcript, weakening the overall interaction and favoring dissociation. The interaction is further weakened when the run of As in the template is transcribed, because the resulting A-U base pairs have only two hydrogen bonds each, compared with three for each G-C pair. The net result is that termination is favored over continued elongation (von Hippel, 1998). This model is easy to rationalize with the known properties of DNA-RNA hybrids, but an alternative hypothesis has been prompted by the result of crosslinking experiments, which have shown that the RNA hairpin makes contact with a flap structure on the outer surface of the RNA polymerase β subunit, adjacent to the exit point of the channel through which the RNA emerges from the complex. Although the flap structure is quite distant (some 6.5 nm) from the active site of the polymerase, a direct connection is made between the two by a segment of β-sheet within the β subunit. Movement of the flap could therefore affect the positioning of amino acids within the active site, possibly leading to breakage of the DNA-RNA base pairs and termination of transcription (Research Briefing 10.1). Additional evidence in support of this model comes from the demonstration that the protein called NusA, which enhances termination at intrinsic terminators, interacts with the hairpin loop and flap structure and may stabilize the contact between the two (Toulokhonov et al., 2001).
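The sequence signature of an intrinsic terminator (an inverted palindrome followed by a run of As in the template, seen in the transcript as a hairpin followed by a run of Us) can be searched for with a rough Python sketch. The function names and the thresholds for stem length, loop size and U-run length are assumptions made for illustration, not values from the text, and the sketch demands a perfectly paired stem, which real terminators need not have.

```python
def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def find_intrinsic_terminators(rna, stem=6, loop_range=(3, 8), u_run=4):
    """Return start positions of a perfect hairpin followed directly by Us.

    stem, loop_range and u_run are hypothetical illustration values.
    """
    hits = []
    for i in range(len(rna) - 2 * stem):
        left = rna[i:i + stem]                      # 5' arm of the stem
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop
            right = rna[j:j + stem]                 # 3' arm of the stem
            tail = rna[j + stem:j + stem + u_run]   # run of Us after the hairpin
            if (len(right) == stem and right == revcomp_rna(left)
                    and tail == "U" * u_run):
                hits.append(i)
    return hits

transcript = "CC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUU" + "AA"
print(find_intrinsic_terminators(transcript))  # [2]
```

A search of this kind identifies candidates only; whether termination actually occurs depends on the stability considerations described above.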

Figure 10.3

Termination at an intrinsic terminator. The presence of an inverted palindrome in the DNA sequence results in formation of a hairpin loop in the transcript. See Research Briefing 10.1 for a more accurate description of transcription termination.

The second type of bacterial termination signal is Rho dependent. These signals usually retain the hairpin feature of intrinsic terminators, although the hairpin is less stable and there is no run of As in the template. Termination requires the activity of a protein called Rho, which attaches to the transcript and moves along the RNA towards the polymerase. If the polymerase continues to synthesize RNA then it keeps ahead of the pursuing Rho, but at the termination signal the polymerase stalls (see Figure 10.4). Exactly why has not been explained - presumably the hairpin loop that forms in the RNA is responsible in some way - but the result is clear: Rho is able to catch up. Rho is a helicase, which means that it actively breaks base pairs, in this case between the template and transcript, resulting in termination of transcription.
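The 'pursuit' aspect of Rho-dependent termination can be pictured with a toy Python simulation. Everything numerical here is a hypothetical illustration (step sizes, the head start of the polymerase, the length of the stall); the sketch shows only the logic that Rho catches up if, and only if, the polymerase stalls for long enough.

```python
def rho_catches_polymerase(terminator_pos, stall_steps, template_len,
                           pol_rate=1, rho_rate=1, head_start=20):
    """Return the position where Rho reaches the polymerase, or None.

    All rates and distances are arbitrary units chosen for illustration;
    head_start should not exceed terminator_pos.
    """
    pol, rho, stalled = head_start, 0, 0
    while pol < template_len:
        if pol == terminator_pos and stalled < stall_steps:
            stalled += 1          # polymerase is held at the termination signal
        else:
            pol += pol_rate       # polymerase continues to synthesize RNA
        rho += rho_rate           # Rho keeps moving along the transcript
        if rho >= pol:
            return pol            # Rho has caught up: termination
    return None                   # polymerase reached the end first

print(rho_catches_polymerase(50, 30, 100))  # 50
print(rho_catches_polymerase(50, 5, 100))   # None
```

With a long stall Rho reaches the polymerase at the termination signal; with a short stall the polymerase stays ahead and transcription runs to the end of the template.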

Figure 10.4

Rho-dependent termination. Rho is a helicase that follows the RNA polymerase along the transcript. When the polymerase stalls at a hairpin, Rho catches up and breaks the RNA-DNA base pairs, releasing the transcript. Note that the diagram is schematic (more...)

Control over the choice between elongation and termination

In bacteria, two mechanisms have evolved for influencing the repeated choice that the polymerase has to make between elongation and termination when copying a template. Both mechanisms are important in regulating the expression of genes contained within operons.

The first process is called antitermination. This occurs when the RNA polymerase ignores a termination signal and continues elongating its transcript until a second signal is reached (Figure 10.5). It provides a mechanism whereby one or more of the genes at the end of an operon can be switched off or on by the polymerase recognizing or not recognizing a termination signal located upstream of those genes. Antitermination is controlled by an antiterminator protein, which attaches to the DNA near the beginning of the operon and then transfers to the RNA polymerase as it moves past en route to the first termination signal. The presence of the antiterminator protein causes the enzyme to ignore the termination signal, presumably by countering the destabilizing properties of an intrinsic terminator or by preventing stalling at a Rho-dependent terminator. Although the mechanics of the process are unclear, the impact that antitermination can have on gene expression has been described in detail, especially during the infection cycle of bacteriophage λ (Box 10.1).

Figure 10.5

Antitermination. The antiterminator protein attaches to the DNA and transfers to the RNA polymerase as it moves past, subsequently enabling the polymerase to continue transcription through termination signal number 1, so the second of the pair of genes (more...)

Box 10.1

Antitermination during the infection cycle of bacteriophage λ. Bacteriophage λ provides the best studied example of the use of antitermination as a means of regulating gene expression (Friedman et al., 1987). Immediately after entering (more...)

The second type of termination control is called attenuation. This system operates primarily with operons that code for enzymes involved in amino acid biosynthesis, but a few other examples are also known. The tryptophan operon of E. coli (Section 9.3.1) illustrates how it works. In this operon, two hairpin loops can form in the region between the start of the transcript and the beginning of trpE. The smaller of these loops acts as a termination signal, but the larger hairpin loop, which is closer to the start of the transcript, is more stable. The larger loop overlaps with the termination hairpin, so only one of the two hairpins can form at any one time. Which loop forms depends on the relative positioning between the RNA polymerase and a ribosome which attaches to the 5′ end of the transcript as soon as it is synthesized in order to translate the genes into protein (Figure 10.6). If the ribosome stalls so that it does not keep up with the polymerase, then the larger hairpin forms and transcription continues. However, if the ribosome keeps pace with the RNA polymerase then it disrupts the larger hairpin by attaching to the RNA that forms part of the stem of this hairpin. When this happens the termination hairpin is able to form, and transcription stops. Ribosome stalling can occur because upstream of the termination signal is a short open reading frame (ORF) coding for a 14-amino-acid peptide that includes two tryptophans. If the amount of free tryptophan is limiting, then the ribosome stalls as it attempts to synthesize this peptide, while the polymerase continues to make its transcript. Because this transcript contains copies of the genes coding for the biosynthesis of tryptophan, its continued elongation addresses the requirement that the cell has for this amino acid. When the amount of tryptophan in the cell reaches a satisfactory level, the attenuation system prevents further transcription of the tryptophan operon, because now the ribosome does not stall while making the short peptide, and instead keeps pace with the polymerase, allowing the termination signal to form.
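The chain of cause and effect in attenuation can be summarized as a small decision function. This is a deliberately simplified sketch: the function name and the all-or-nothing treatment of tryptophan availability are assumptions for illustration, whereas in the cell the outcome varies continuously with the tryptophan level.

```python
def trp_attenuation(tryptophan_limiting: bool) -> dict:
    """Toy decision model of attenuation at the E. coli tryptophan operon."""
    # The ribosome stalls at the tryptophan codons of the short ORF
    # only when free tryptophan is limiting.
    ribosome_stalls = tryptophan_limiting
    if ribosome_stalls:
        # The polymerase runs ahead of the ribosome, so the larger hairpin
        # forms and the overlapping termination hairpin cannot.
        hairpin = "larger hairpin"
        transcription_continues = True
    else:
        # The ribosome keeps pace and disrupts the larger hairpin,
        # leaving the termination hairpin free to form.
        hairpin = "termination hairpin"
        transcription_continues = False
    return {
        "ribosome_stalls": ribosome_stalls,
        "hairpin": hairpin,
        "transcription_continues": transcription_continues,
    }

print(trp_attenuation(tryptophan_limiting=True))
print(trp_attenuation(tryptophan_limiting=False))
```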

Figure 10.6

Attenuation at the tryptophan operon. See the text for details.

The E. coli tryptophan operon is controlled not only by attenuation but also by a repressor (Section 9.3.1). Exactly how attenuation and repression work together to regulate expression of the operon is not known, but it is thought that repression provides the basic on-off switch and attenuation modulates the precise level of gene expression that occurs. Other E. coli operons, such as those for biosynthesis of histidine, leucine and threonine, are controlled solely by attenuation. Interestingly, in some bacteria, including Bacillus subtilis, the tryptophan operon is one of those that does not have a repressor system and so is regulated entirely by attenuation. In these bacteria, attenuation is mediated not by the speed at which the ribosome tracks along the mRNA, but by an RNA-binding protein called trp RNA-binding attenuation protein (TRAP) which, in the presence of tryptophan, attaches to the mRNA in the region equivalent to the short ORF of the E. coli transcript (Figure 10.7). Attachment of TRAP leads to formation of the termination signal and cessation of transcription (Antson et al., 1999).

Figure 10.7

Regulation of the tryptophan operon of Bacillus subtilis. (A) Regulation centers on the protein called TRAP, which attaches to the leader region of the transcript when tryptophan levels are adequate. When attached, TRAP prevents formation of the large (more...)

10.1.2. Synthesis of eukaryotic mRNAs by RNA polymerase II

At the most fundamental level, transcription is similar in bacteria and eukaryotes. The chemistry of RNA polymerization is identical in all types of organism, and the three eukaryotic RNA polymerases are all structurally related to the E. coli RNA polymerase, their three largest subunits being equivalent to the α, β and β′ subunits of the bacterial enzyme. The contacts between the eukaryotic polymerase II, the template DNA and the RNA transcript, as revealed by X-ray crystallography and crosslinking studies (Klug, 2001), are similar to the interactions described for bacterial transcription (see Research Briefing 10.1), and the basic principle that transcription is a step-by-step competition between elongation and termination (see the discussion of termination of bacterial transcription in Section 10.1.1) also holds.

Despite this equivalence, the overall processes for mRNA synthesis in bacteria and eukaryotes are quite different. The most striking dissimilarity is the extent to which eukaryotic mRNAs are processed during transcription. In bacteria, the transcripts of protein-coding genes are not processed at all: the primary transcripts are mature mRNAs. In contrast, all eukaryotic mRNAs have a cap added to the 5′ end, most are also polyadenylated by addition of a series of adenosines to the 3′ end, many contain introns and so undergo splicing, and a few are subject to RNA editing. A function has been assigned to capping, but the reason for polyadenylation largely remains a mystery. With splicing and editing we can appreciate why the events occur - the former removes introns that block translation of the mRNA; the latter changes the coding properties of the mRNA - but we do not understand why these mechanisms have evolved. Why do genes have introns in the first place? Why edit an mRNA rather than encoding the desired sequences in the DNA?

Eukaryotic mRNAs are processed while they are being synthesized. The cap is added as soon as transcription has been initiated, splicing and editing begin while the transcript is still being made, and polyadenylation is an inherent part of the termination mechanism for RNA polymerase II. To deal with all of these events together would be confusing, with too many different things being described at once. We will therefore postpone editing until the end of the chapter, which means it can be dealt with in tandem with similar forms of chemical modification occurring during rRNA and tRNA processing, and we will consider splicing after we have studied capping, elongation and polyadenylation.

Capping of RNA polymerase II transcripts occurs immediately after initiation

Although phosphorylation of the C-terminal domain (CTD) of the largest subunit of RNA polymerase II is the final step in initiation of transcription of mRNA-encoding genes in eukaryotes (Section 9.2.3), it is not immediately followed by the onset of elongation. A somewhat gray area exists in our understanding of the events that distinguish promoter clearance, which refers to the transition from the pre-initiation complex to a complex that has begun to synthesize RNA, and promoter escape, during which the polymerase moves away from the promoter region and becomes committed to making a transcript (Figure 10.8). The opposing effects of negative and positive elongation factors influence the ability of the polymerase to begin productive RNA synthesis, and if the negative factors predominate then transcription halts before the polymerase has moved more than 30 nucleotides from the initiation point. Promoter escape could therefore be an important control point, but how regulation is applied at this stage is not yet known (Lee and Young, 2000).

Figure 10.8

Promoter clearance and promoter escape. Promoter clearance is the transition from the pre-initiation complex to a complex that has begun to synthesize RNA. Promoter escape occurs when the polymerase moves away from the promoter region and becomes committed (more...)

Successful promoter escape could be linked with capping, this processing event being completed before the transcript reaches 30 nucleotides in length. The first step in capping is addition of an extra guanosine to the extreme 5′ end of the RNA. Rather than occurring by normal RNA polymerization, capping involves a reaction between the 5′ triphosphate of the terminal nucleotide and the triphosphate of a GTP nucleotide. The γ-phosphate of the terminal nucleotide (the outermost phosphate) is removed, as are the β and γ phosphates of the GTP, resulting in a 5′-5′ bond (Figure 10.9). The reaction is carried out by the enzyme guanylyl transferase. The second step of the capping reaction converts the new terminal guanosine into 7-methylguanosine by attachment of a methyl group to nitrogen number 7 of the purine ring, this modification catalyzed by guanine methyltransferase. The two capping enzymes make attachments with the CTD and it is possible that they are intrinsic components of the RNA polymerase II complex during promoter clearance (Proudfoot, 2000).

Figure 10.9

Capping of eukaryotic mRNA. The top part of the diagram shows the capping reaction in outline. A GTP molecule (drawn as Gppp) reacts with the 5′ end of the mRNA to give a triphosphate linkage. In the second step of the process, the terminal G (more...)

The 7-methylguanosine structure is called a type 0 cap and is the commonest form in yeast. In higher eukaryotes, additional modifications occur (see Figure 10.9):

A second methylation replaces the hydrogen of the 2′-OH group of what is now the second nucleotide in the transcript. This results in a type 1 cap.
If this second nucleotide is an adenosine, then a methyl group might be added to nitrogen number 6 of the purine.
Another 2′-OH methylation might occur at the third nucleotide position, resulting in a type 2 cap.

All RNAs synthesized by RNA polymerase II are capped in one way or another. This means that as well as mRNAs, the snRNAs that are transcribed by this enzyme are also capped (see Table 9.3). The cap may be important for export of mRNAs and snRNAs from the nucleus (Section 10.5), but its best defined role is in translation of mRNAs, which is covered in Section 11.2.2.

Elongation of eukaryotic mRNAs

As mentioned above, the fundamental aspects of transcript elongation are the same in bacteria and eukaryotes. The one major distinction concerns the length of transcript that must be synthesized. The longest bacterial genes are only a few kb in length and can be transcribed in a matter of minutes by the bacterial RNA polymerase, which has a polymerization rate of several hundred nucleotides per minute. In contrast, RNA polymerase II can take hours to transcribe a single gene, even though it can work at up to 2000 nucleotides per minute. This is because the presence of multiple introns in many eukaryotic genes (Section 10.1.3) means that considerable lengths of DNA must be copied. For example, the pre-mRNA for the human dystrophin gene is 2400 kb in length and takes about 20 hours to synthesize.
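The dystrophin figure follows directly from the transcript length and the maximum polymerization rate quoted above, as this short calculation shows. The function name is hypothetical, and the calculation assumes a constant rate with no pausing.

```python
def transcription_time_hours(length_kb: float, rate_nt_per_min: float) -> float:
    """Time to copy a transcript of the given length at a constant rate."""
    minutes = length_kb * 1000 / rate_nt_per_min
    return minutes / 60

# Human dystrophin pre-mRNA: 2400 kb at up to 2000 nucleotides per minute.
print(transcription_time_hours(2400, 2000))  # 20.0
```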

The extreme length of eukaryotic genes places demands on the stability of the transcription complex. RNA polymerase II on its own is not able to meet these demands: when the purified enzyme is studied in vitro its polymerization rate is less than 300 nucleotides per minute because the enzyme pauses frequently on the template and sometimes stops altogether. In the nucleus, pausing and stopping are reduced because of the action of a series of elongation factors, proteins that associate with the polymerase after it has cleared the promoter and left behind the transcription factors involved in initiation (Conaway et al., 2000). Thirteen elongation factors are currently known in mammalian cells, displaying a variety of functions (Table 10.1). Their importance is shown by the effects of mutations that disrupt the activity of one or other of the factors (Conaway and Conaway, 1999). Inactivation of CSB, for example, results in Cockayne syndrome, a disease characterized by developmental defects such as mental retardation, and disruption of ELL causes acute myeloid leukemia.

Table 10.1

Examples of elongation factors for mammalian RNA polymerase II.

A second difference between bacterial and eukaryotic elongation is that RNA polymerase II, as well as the other eukaryotic nuclear polymerases, has to negotiate the nucleosomes that are attached to the template DNA that is being transcribed. At first glance it is difficult to imagine how the polymerase can elongate its transcript through a region of DNA wound around a nucleosome (see Figure 2.5). The solution to this problem is probably provided by elongation factors that are able to modify the chromatin structure in some way. In mammals, the elongation factor FACT has been shown to interact with histones H2A and H2B, possibly influencing nucleosome positioning, and less well defined interactions have been demonstrated for other factors (Orphanides and Reinberg, 2000). Yeast possesses a factor called elongator, which has tentatively been assigned a role in chromatin modification because it contains a subunit that has histone acetyltransferase activity (Section 8.2.1; Wittschieben et al., 1999), but so far a homolog of this complex has not been identified in mammals. An intriguing question is whether the first polymerase to transcribe a particular gene is a ‘pioneer’ with a special elongation factor complement that opens up the chromatin structure, with subsequent rounds of transcription being performed by standard polymerase complexes that take advantage of the changes induced by the pioneer.

Termination of mRNA synthesis is combined with polyadenylation

Virtually all eukaryotic mRNAs have a series of up to 250 adenosines at their 3′ ends. These As are not specified by the DNA and are added to the transcript by a template-independent RNA polymerase called poly(A) polymerase (Bard et al., 2000). This polymerase does not act at the extreme 3′ end of the transcript, but at an internal site which is cleaved to create a new 3′ end to which the poly(A) tail is added.

The basic features of polyadenylation have been understood for some time. In mammals, polyadenylation is directed by a signal sequence in the mRNA, almost invariably 5′-AAUAAA-3′. This sequence is located between 10 and 30 nucleotides upstream of the polyadenylation site, which is often immediately after the dinucleotide 5′-CA-3′ and is followed 10–20 nucleotides later by a GU-rich region. Both the poly(A) signal sequence and the GU-rich region are binding sites for multi-subunit protein complexes, which are, respectively, the cleavage and polyadenylation specificity factor (CPSF) and the cleavage stimulation factor (CstF). Poly(A) polymerase and at least two other protein factors must associate with bound CPSF and CstF in order for polyadenylation to occur (Figure 10.10). These additional factors include polyadenylate-binding protein (PADP), which helps the polymerase to add the adenosines, possibly influences the length of the poly(A) tail that is synthesized, and appears to play a role in maintenance of the tail after synthesis. In yeast, the signal sequences in the transcript are slightly different, but the protein complexes are similar to those in mammals and polyadenylation is thought to occur by more or less the same mechanism (Guo and Sherman, 1996; Manley and Takagaki, 1996).
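The arrangement of sequence elements around a mammalian polyadenylation site can be expressed as a simple search. The following Python sketch is an illustration under stated assumptions: distances are measured from the end of the 5′-AAUAAA-3′ signal, the site is required to follow a 5′-CA-3′ dinucleotide (the text says only that it often does), and the 'GU-rich region' is defined arbitrarily as an 8-nucleotide window containing at least 75% G or U. The function names and the window parameters are hypothetical.

```python
SIGNAL = "AAUAAA"

def gu_fraction(window: str) -> float:
    """Fraction of G and U nucleotides in a window."""
    return sum(b in "GU" for b in window) / len(window) if window else 0.0

def candidate_polya_sites(rna, gu_window=8, gu_min=0.75):
    """Return (signal_start, cleavage_position) pairs matching the layout.

    gu_window and gu_min are arbitrary illustration values.
    """
    sites = []
    start = rna.find(SIGNAL)
    while start != -1:
        signal_end = start + len(SIGNAL)
        # Polyadenylation site 10-30 nucleotides downstream of the signal.
        for cut in range(signal_end + 10, signal_end + 31):
            if rna[cut - 2:cut] != "CA":
                continue
            # GU-rich region beginning 10-20 nucleotides after the site.
            for gap in range(10, 21):
                window = rna[cut + gap:cut + gap + gu_window]
                if len(window) == gu_window and gu_fraction(window) >= gu_min:
                    sites.append((start, cut))
                    break
        start = rna.find(SIGNAL, start + 1)
    return sites

rna = "GGG" + "AAUAAA" + "C" * 13 + "CA" + "C" * 12 + "GUGUGUGU" + "CC"
print(candidate_polya_sites(rna))  # [(3, 24)]
```

In the cell these elements are recognized by CPSF and CstF rather than by exact string matching, so a scan like this can only suggest candidate sites.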

Figure 10.10

Polyadenylation of eukaryotic mRNA. See the text for details. Note that the diagram is schematic and is not intended to indicate the relative sizes and shapes of the various protein complexes, nor their precise positioning, although CPSF and CstF are (more...)

Polyadenylation was once looked on as a ‘posttranscriptional’ event but it is now recognized that the process is an inherent part of the mechanism for termination of transcription by RNA polymerase II. CPSF is known to interact with TFIID and is recruited into the polymerase complex during the initiation stage. By riding along the template with RNA polymerase II, CPSF is able to bind to the poly(A) signal sequence as soon as it is transcribed, initiating the polyadenylation reaction (Figure 10.11). Both CPSF and CstF form contacts with the CTD of the polymerase. It has been suggested that the nature of these contacts changes when the poly(A) signal sequence is located, and that this change alters the properties of the elongation complex so that termination becomes favored over continued RNA synthesis. As a result, transcription stops soon after the poly(A) signal sequence has been transcribed (Bentley, 1999).

Figure 10.11

The link between polyadenylation and termination of transcription by RNA polymerase II. CPSF is shown attached to the RNA polymerase II elongation complex that is synthesizing RNA. CPSF binds to the polyadenylation signal sequence as soon as it is transcribed. (more...)

Even though polyadenylation can be identified as an inherent part of the termination process, this does not explain why it is necessary to add a poly(A) tail to the transcript. A role for the poly(A) tail has been sought for several years, but no convincing evidence has been found for any of the various suggestions that have been made. These suggestions include an influence on mRNA stability, which seems unlikely as some stable transcripts have very short poly(A) tails, and a role in initiation of translation. The latter proposal is supported by research showing that poly(A) polymerase is repressed during those periods of the cell cycle when relatively little protein synthesis occurs (Colgan et al., 1996).

10.1.3. Intron splicing

The existence of introns was not suspected until 1977 when DNA sequencing was first applied to eukaryotic genes and it was realized that many of these contain ‘intervening sequences’ that separate different segments of the coding DNA from one another (Figure 10.12). We now recognize seven distinct types of intron in eukaryotes, and additional forms in the archaea (Table 10.2). Two of these types - the GU-AG and AU-AC introns - are found in eukaryotic protein-coding genes and are dealt with in this section; the other types will be covered later in the chapter.

Figure 10.12

Introns. The structure of the human β-globin gene is shown. This gene is 1423 bp in length and contains two introns, one of 131 bp and one of 851 bp, which together make up 69% of the length of the gene.
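The 69% figure can be checked from the lengths given in the caption. The variable names are chosen for illustration only.

```python
gene_length_bp = 1423
intron_lengths_bp = [131, 851]

intron_total = sum(intron_lengths_bp)             # 982 bp of intron
non_intron_total = gene_length_bp - intron_total  # 441 bp remaining
intron_percent = 100 * intron_total / gene_length_bp

print(intron_total, non_intron_total, round(intron_percent))  # 982 441 69
```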

Table 10.2

Types of intron.

Box 10.2

Other types of intron. There are eight different types of intron (see Table 10.2). Four types are described in the text: the nuclear pre-mRNA introns of the GU-AG and AU-AC classes, the self-splicing Group I introns, and the introns in eukaryotic pre-tRNA (more...)

Few rules can be established for the distribution of introns in protein-coding genes, beyond the fact that introns are less common in lower eukaryotes: the 6000 genes in the yeast genome contain only 239 introns in total, whereas many individual mammalian genes contain 50 or more introns. When the same gene is compared in related species, we usually find that some of the introns are in identical positions but that each species has one or more unique introns. This implies that some introns remain in place for millions of years, retaining their positions while species diversify, whereas others appear or disappear during this same period. This leads to two competing hypotheses for the evolution of introns:

‘Introns late’ is the hypothesis that introns evolved relatively recently and are gradually accumulating in eukaryotic genomes.
‘Introns early’ is the alternative hypothesis, that introns are very ancient and are gradually being lost from eukaryotic genomes.

These are issues that we will return to in Section 15.3.2 when we study molecular evolution. For the time being, what is important is that a eukaryotic pre-mRNA may contain many introns, perhaps over 100, taking up a considerable length of the transcript (Table 10.3), and that these introns must be excised and the exons joined together in the correct order before the transcript can function as a mature mRNA.

Table 10.3

Introns in human genes.

Conserved sequence motifs indicate the key sites in GU-AG introns

With the vast bulk of pre-mRNA introns, the first two nucleotides of the intron sequence are 5′-GU-3′ and the last two 5′-AG-3′. They are therefore called ‘GU-AG’ introns and all members of this class are spliced in the same way. These conserved motifs were recognized soon after introns were discovered and it was immediately assumed that they must be important in the splicing process. As intron sequences started to accumulate in the databases it was realized that the GU-AG motifs are merely parts of longer consensus sequences that span the 5′ and 3′ splice sites. These consensus sequences vary in different types of eukaryote; in vertebrates they can be described as:

5′ splice site 5′-AG↓GUAAGU-3′

3′ splice site 5′-PyPyPyPyPyPyNCAG↓-3′

In these designations, ‘Py’ is one of the two pyrimidine nucleotides (U or C), ‘N’ is any nucleotide, and the arrow indicates the exon-intron boundary. The 5′ splice site is also known as the donor site and the 3′ splice site as the acceptor site.
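As a rough illustration, the two vertebrate consensus sequences can be written as regular expressions and used to scan an RNA sequence for candidate splice sites. This is a minimal sketch under stated assumptions: the function name and the example pre-mRNA are invented for illustration, and an exact match to the consensus is demanded, whereas real splice sites often deviate from it.

```python
import re

# Vertebrate consensus motifs from the text, written as regular expressions.
# Lookaheads are used so that overlapping candidates would all be reported.
DONOR = re.compile(r"(?=AGGUAAGU)")              # 5'-AG|GUAAGU-3'
ACCEPTOR = re.compile(r"(?=[UC]{6}[ACGU]CAG)")   # 5'-PyPyPyPyPyPyNCAG|-3'

def candidate_splice_sites(rna):
    """Return (donor_cuts, acceptor_cuts) as 0-based cut positions.

    A cut position p means the bond between rna[p-1] and rna[p].
    """
    donors = [m.start() + 2 for m in DONOR.finditer(rna)]         # cut after AG
    acceptors = [m.start() + 10 for m in ACCEPTOR.finditer(rna)]  # cut after CAG
    return donors, acceptors

# Hypothetical pre-mRNA: exon 1, a short intron, exon 2.
exon1 = "AUGGCCAAG"
intron = "GUAAGUAUCGGAUACUAACGAUCCUUUUCUACAG"
exon2 = "GUCUGA"
pre_mrna = exon1 + intron + exon2

donors, acceptors = candidate_splice_sites(pre_mrna)
print(donors, acceptors)
```

A search of this kind reports every sequence that resembles a splice site, so on a real transcript it would flag sites that are never used as well as genuine ones.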

Other conserved sequences are present in some but not all eukaryotes. Introns in higher eukaryotes usually have a polypyrimidine tract, a pyrimidine-rich region located just upstream of the 3′ end of the intron sequence (Figure 10.13). This tract is less frequently seen in yeast introns, but these have an invariant 5′-UACUAAC-3′ sequence, located between 18 and 140 nucleotides upstream of the 3′ splice site, which is not present in higher eukaryotes. The polypyrimidine tract and the 5′-UACUAAC-3′ sequence are not functionally equivalent, as described in the next two sections.

Figure 10.13

Conserved sequences in vertebrate introns. The longer consensus sequences around the splice sites are given in the text. Abbreviation: Py, pyrimidine nucleotide (U or C).

Outline of the splicing pathway for GU-AG introns

The conserved sequence motifs indicate important regions of GU-AG introns, regions that we would anticipate either acting as recognition sequences for RNA-binding proteins involved in splicing, or playing some other central role in the process. Early attempts to understand splicing were hindered by technical problems (in particular difficulties in developing a cell-free splicing system with which the process could be probed in detail), but during the 1990s there was an explosion of information. This work showed that the splicing pathway can be divided into two steps (Figure 10.14):

Figure 10.14

Splicing in outline. Cleavage of the 5′ splice site is promoted by the hydroxyl (OH) attached to the 2′-carbon of an adenosine nucleotide within the intron sequence. This results in the lariat structure and is followed by the 3′-OH (more...)

Cleavage of the 5′ splice site occurs by a transesterification reaction promoted by the hydroxyl group attached to the 2′ carbon of an adenosine nucleotide located within the intron sequence. In yeast, this adenosine is the last one in the conserved 5′-UACUAAC-3′ sequence. The result of the hydroxyl attack is cleavage of the phosphodiester bond at the 5′ splice site, accompanied by formation of a new 5′-2′ phosphodiester bond linking the first nucleotide of the intron (the G of the 5′-GU-3′ motif) with the internal adenosine. This means that the intron has now been looped back on itself to create a lariat structure.
Cleavage of the 3′ splice site and joining of the exons result from a second transesterification reaction, this one promoted by the 3′-OH group attached to the end of the upstream exon. This group attacks the phosphodiester bond at the 3′ splice site, cleaving it and so releasing the intron as the lariat structure, which is subsequently converted back to a linear RNA and degraded. At the same time, the 3′ end of the upstream exon joins to the newly formed 5′ end of the downstream exon, completing the splicing process.
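The two steps can be followed on a simple string model of a pre-mRNA. The sketch below is illustrative only: the function name, the argument names and the example sequence are assumptions made for this example, and the chemistry is reduced to cutting and joining text. The branch adenosine is taken to be the last A of a yeast-style UACUAAC sequence.

```python
def splice_gu_ag(pre_mrna, donor_cut, acceptor_cut, branch_index):
    """Mimic the two transesterifications on a string model of a pre-mRNA.

    donor_cut: 0-based position of the first intron nucleotide.
    acceptor_cut: 0-based position of the first downstream-exon nucleotide.
    branch_index: 0-based position of the branch adenosine in the pre-mRNA.
    """
    intron = pre_mrna[donor_cut:acceptor_cut]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("not a GU-AG intron")
    if not donor_cut < branch_index < acceptor_cut or pre_mrna[branch_index] != "A":
        raise ValueError("branch nucleotide must be an adenosine inside the intron")

    # Step 1: the 2'-OH of the branch A attacks the 5' splice site. The
    # upstream exon is released with a free 3'-OH, and the first G of the
    # intron becomes linked 5'-2' to the branch A (the lariat).
    upstream_exon = pre_mrna[:donor_cut]
    lariat_intermediate = pre_mrna[donor_cut:]       # intron plus downstream exon
    branch_offset = branch_index - donor_cut

    # Step 2: the 3'-OH of the upstream exon attacks the 3' splice site,
    # joining the exons and releasing the intron as a lariat.
    downstream_exon = lariat_intermediate[len(intron):]
    mrna = upstream_exon + downstream_exon
    lariat = {"sequence": intron, "branch_offset": branch_offset}
    return mrna, lariat

# Hypothetical pre-mRNA: exon 1, a short intron containing UACUAAC, exon 2.
exon1 = "AUGGCCAAG"
intron = "GUAAGUAUCGGAUACUAACGAUCCUUUUCUACAG"
exon2 = "GUCUGA"
pre_mrna = exon1 + intron + exon2
branch = pre_mrna.index("UACUAAC") + 5               # last A of UACUAAC

mrna, lariat = splice_gu_ag(pre_mrna, len(exon1), len(exon1) + len(intron), branch)
print(mrna)
print(lariat)
```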

In a chemical sense, intron splicing is not a great challenge for the cell. It is simply a double transesterification reaction, no more complicated than many other biochemical reactions that are dealt with by individual enzymes. Why then has such a complex machinery evolved to deal with it? The difficulty lies with the topological problems. The first of these is the substantial distance that might lie between splice sites, possibly a few tens of kb, representing 100 nm or more if the mRNA is in the form of a linear chain. A means is therefore needed of bringing the splice sites into proximity. The second topological problem concerns selection of the correct splice site. All splice sites are similar, so if a pre-mRNA contains two or more introns then there is the possibility that the wrong splice sites could be joined, resulting in exon skipping - the loss of an exon from the mature mRNA (Figure 10.15A). Equally unfortunate would be selection of a cryptic splice site, a site within an intron or exon that has sequence similarity with the consensus motifs of real splice sites (Figure 10.15B). Cryptic sites are present in most pre-mRNAs and must be ignored by the splicing apparatus.

Figure 10.15

Two aberrant forms of splicing. (A) In exon skipping the aberrant splicing results in an exon being lost from the mRNA. (B) When a cryptic splice site is selected, part of an exon might be lost from the mRNA, as shown here, or if the cryptic site lies (more...)

snRNAs and their associated proteins are the central components of the splicing apparatus

The central components of the splicing apparatus for GU-AG introns are the snRNAs called U1, U2, U4, U5 and U6. These are short molecules (between 106 nucleotides [U6] and 185 nucleotides [U2] in vertebrates) that associate with proteins to form small nuclear ribonucleoproteins (snRNPs) (Figure 10.16). The snRNPs, together with other accessory proteins, attach to the transcript and form a series of complexes, the last one of which is the spliceosome, the structure within which the actual splicing reactions occur (Smith and Valcárcel, 2000). The process operates as follows (Figure 10.17):

Figure 10.16

Structure of U1-snRNP. The mammalian U1-snRNP comprises the 165-nucleotide U1-RNA plus ten proteins. Three of these (U1-70K, U1-A and U1-C) are specific to this snRNP, the other seven are Sm proteins that are found in all the snRNPs involved in splicing. (more...)

Figure 10.17

The roles of snRNPs and associated proteins during splicing. See the text for details. There are several unanswered questions about the series of events occurring during splicing and it is unlikely that the scheme shown here is entirely accurate. The (more...)

The commitment complex initiates a splicing activity. This complex comprises U1-snRNP, which binds to the 5′ splice site, partly by RNA-RNA base-pairing, and the protein factors SF1, U2AF⁶⁵ and U2AF³⁵, which make protein-RNA contacts with the branch site, the polypyrimidine tract and the 3′ splice site, respectively.
The pre-spliceosome complex comprises the commitment complex plus U2-snRNP, the latter attached to the branch site. At this stage, an association between U1-snRNP and U2-snRNP brings the 5′ splice site into close proximity with the branch point.
The spliceosome is formed when U4/U6-snRNP (a single snRNP containing two snRNAs) and U5-snRNP attach to the pre-spliceosome complex. This results in additional interactions that bring the 3′ splice site close to the 5′ site and the branch point. All three key positions in the intron are now in proximity and the two transesterifications occur as a linked reaction, possibly catalyzed by U6-snRNP, completing the splicing process.

The series of events shown in Figure 10.17 provides no clues about how the correct splice sites are selected so that exons are not lost during splicing, and cryptic sites are ignored. This aspect of splicing is still poorly understood but it has become clear that a set of splicing factors called SR proteins are important in splice-site selection. The SR proteins - so-called because their C-terminal domains contain a region rich in serine (abbreviation S) and arginine (R) - were first implicated in splicing when it was discovered that they are components of the spliceosome. They appear to have several functions, including the establishment of a connection between bound U1-snRNP and bound U2AF in the commitment complex (Valcárcel and Green, 1996). This is perhaps the clue to their role in splice-site selection, formation of the commitment complex being the critical stage of the splicing process, as this is the event that identifies which sites will be linked.

SR proteins also interact with exonic splicing enhancers (ESEs), which are purine-rich sequences located in the exon regions of a transcript (Blencowe, 2000). We are still at an early stage in our understanding of ESEs and their counterparts, the exonic splicing silencers (ESSs; Del Gatto-Konczak et al., 1999), but their importance in controlling splicing is clear from the discovery that several human diseases, including one type of muscular dystrophy, are caused by mutations in ESE sequences. The location of ESEs and ESSs indicates that assembly of the spliceosome is driven not simply by contacts within the intron but also by interactions with adjacent exons. In fact, it is possible that an individual commitment complex is not assembled within an intron as shown in Figure 10.17, but initially bridges an exon (Figure 10.18). This model is attractive not only because it provides a means by which contact between an ESE or ESS and an SR protein could influence splicing, but also because it takes account of the large disparity between the lengths of exons and introns in vertebrate genes. In the human genome, for example, the exons have an average length of 145 bp compared with 3365 bp for introns (IHGSC, 2001). Initial assembly of a commitment complex across an exon might therefore be a less difficult task than assembly across a much longer intron.

Figure 10.18

An alternative model for assembly of the commitment complex. In this model, each individual commitment complex (one shown in orange and one in blue) is built up across an exon, bringing the complex into close association with an exonic splicing enhancer (more...)

There is one final aspect of SR proteins that we should address. This is the possibility that a subset of these SR proteins, called CASPs (CTD-associated SR-like proteins) or SCAFs (SR-like CTD-associated factors), form a physical connection between the spliceosome and the CTD of the RNA polymerase II transcription complex, and hence provide a link between transcript elongation and processing. As with some of the polyadenylation proteins (Section 10.1.2), it is probable that these splicing factors ride with the polymerase as it synthesizes the transcript, and are deposited at their appropriate positions at intron splice sites as soon as these are transcribed. Electron microscopy studies have shown that transcription and splicing occur together, and the discovery of splicing factors that have an affinity for RNA polymerase provides a biochemical basis for this observation (Corden and Patturajan, 1997).

Alternative splicing is common in many eukaryotes

When introns were first discovered it was imagined that each gene always gives rise to the same mRNA: in other words, that there is a single splicing pathway for each primary transcript (Figure 10.19A). This assumption was found to be incorrect in the 1980s, when it was shown that the primary transcripts of some genes can follow two or more alternative splicing pathways, enabling a single transcript to be processed into related but different mRNAs and hence to direct synthesis of a range of proteins (Figure 10.19B). In some organisms alternative splicing is uncommon, only three examples being known in Saccharomyces cerevisiae, but in higher eukaryotes it is much more prevalent. This first became apparent when the draft Drosophila sequence was examined (Adams et al., 2000), and it was realized that fruit flies have fewer genes than the microscopic worm Caenorhabditis elegans (see Table 2.1), despite the obviously greater physical complexity of Drosophila, which should be reflected in a more diverse proteome. The most likely explanation for the lack of congruence between the number of genes in the Drosophila genome and the number of proteins in its proteome is that a substantial number of the genes give rise to multiple proteins via alternative splicing. At about the same time, the first human chromosome sequences were obtained and it was recognized that rather than having 80 000–100 000 genes, as suggested by the size of the human proteome, humans have only 35 000 or so genes. It is now believed that at least 35% of the genes in the human genome undergo alternative splicing (Graveley, 2001): the principle ‘one gene, one protein’, biological dogma since the 1940s, has been completely overthrown.

Figure 10.19

The assumption that each pre-mRNA follows a single splicing pathway was shown to be incorrect when alternative splicing was discovered.

Alternative splicing is now looked on as a crucial innovation in the genome expression pathway. Two examples will suffice to illustrate its importance. The first of these concerns sex, a fundamental aspect of the biology of any organism, and which in Drosophila is determined by an alternative splicing cascade (Chabot, 1996). The first gene in this cascade is sxl, whose transcript contains an optional exon which, when spliced to the one preceding it, results in an inactive version of protein SXL. In females the splicing pathway is such that this exon is skipped so that functional SXL is made (Figure 10.20). SXL promotes selection of a cryptic splice site in a second transcript, tra, by directing U2AF⁶⁵ away from its normal 3′ splice site to a second site further downstream. The resulting female-specific TRA protein is again involved in alternative splicing, this time by interacting with SR proteins to form a multifactor complex that attaches to an ESE within an exon of a third pre-mRNA, dsx, promoting selection of a secondary, female-specific splice site in this transcript. The male and female versions of the DSX proteins are the primary determinants of Drosophila sex.

Figure 10.20

Regulation of splicing during expression of genes involved in sex determination in Drosophila. (A) The cascade begins with sex-specific alternative splicing of the sxl pre-mRNA. In males all exons are present in the mRNA, but this means that a truncated (more...)

The second example of alternative splicing illustrates the multiplicity of mRNAs synthesized from some primary transcripts. The human slo gene codes for a membrane protein that regulates the entry and exit of potassium ions into and out of cells (Graveley, 2001). The gene has 35 exons, eight of which are involved in alternative splicing events (Figure 10.21). The alternative splicing pathways involve different combinations of the eight optional exons, leading to over 500 distinct mRNAs, each specifying a membrane protein with slightly different functional properties. What are the biological consequences of this example of multiple splicing? The human slo genes are active in the inner ear and determine the auditory properties of the hair cells on the basilar membrane of the cochlea. Different hair cells respond to different sound frequencies between 20 and 20 000 Hz, their individual capabilities determined in part by the properties of their Slo proteins. Alternative splicing of slo genes in cochlear hair cells therefore determines the auditory range of humans.
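To see how a handful of optional exons multiplies the number of possible mRNAs, the simplest model treats each optional exon as either included or skipped, with exon order unchanged. The sketch below enumerates the products under that assumption. The function name and the exon labels are hypothetical, and the model does not attempt to reproduce the actual pathways or the counts reported for slo.

```python
from itertools import product

def isoforms(exons, optional):
    """Enumerate mRNAs made by including or skipping each optional exon.

    exons: exon names in transcript order.
    optional: the subset of names that may be skipped.
    """
    optional_in_order = [e for e in exons if e in optional]
    result = []
    for choice in product([True, False], repeat=len(optional_in_order)):
        keep = dict(zip(optional_in_order, choice))
        # Constitutive exons are always kept; optional ones follow the choice.
        result.append(tuple(e for e in exons if keep.get(e, True)))
    return result

# Toy transcript with five exons, two of them optional.
toy = isoforms(["E1", "E2", "E3", "E4", "E5"], optional={"E2", "E4"})
for mrna in toy:
    print("-".join(mrna))

print(len(toy))   # number of isoforms from two optional exons
print(2 ** 8)     # include-or-skip combinations for eight optional exons
```

Each additional optional exon doubles the number of combinations in this model, which is why a small number of alternative splicing events can yield a large family of related mRNAs.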

Figure 10.21

The human slo gene. The gene comprises 35 exons, shown as boxes, eight of which (in blue) are optional and appear in different combinations in different slo mRNAs. There are 8! = 40 320 possible splicing pathways and hence 40 320 possible mRNAs, but only (more...)

At present we do not understand how alternative splicing is regulated and cannot describe the process that determines which of several splicing pathways is followed by a particular transcript. The players are thought to be the SR proteins in conjunction with ESEs and ESSs, but the way in which they control splice site selection is not known.

AU-AC introns are similar to GU-AG introns but require a different splicing apparatus

One of the more surprising events of recent years has been the discovery of a few introns in eukaryotic pre-mRNAs that do not fall into the GU-AG category, having different consensus sequences at their splice sites. These are the AU-AC introns which, to date, have been found in approximately 20 genes in organisms as diverse as humans, plants and Drosophila (Nilsen, 1996; Tarn and Steitz, 1997).

As well as the sequences at their splice sites, AU-AC introns have a conserved (though not invariant) branch site sequence with the consensus 5′-UCCUUAAC-3′, the last adenosine in this motif being the one that participates in the first transesterification reaction. This points us towards the remarkable feature of AU-AC introns: their splicing pathway is very similar to that for GU-AG introns, but involves a different set of splicing factors. Only the U5-snRNP is involved in the splicing mechanisms of both types of intron. The roles of U1-snRNP and U2-snRNP are taken by a previously discovered complex that had never been assigned a function, U11/U12-snRNP, and an entirely new U4atac/U6atac-snRNP has subsequently been isolated, completing the picture.

The splicing pathways for the ‘major’ and ‘minor’ types of intron are not identical but many of the interactions between the transcript and the snRNPs and other splicing proteins are remarkably similar. This means that AU-AC introns, rather than simply being a curiosity, are proving useful in testing models for interactions occurring during GU-AG intron splicing. The argument is that a predicted interaction between two components of the GU-AG spliceosome can be checked by seeing if the same interaction is possible with the equivalent AU-AC components. This has already been informative in helping to define a base-paired structure formed between the U2 and U6 snRNAs in the GU-AG spliceosome (Tarn and Steitz, 1996).

10.2. Synthesis and Processing of Non-coding RNAs

In bacteria, the same RNA polymerase synthesizes all types of RNA. The issues that we have already discussed regarding elongation and termination of bacterial mRNA (Section 10.1.1) therefore also hold for rRNA and tRNA synthesis, and the only outstanding areas that we have to cover are the processing of the pre-rRNAs and pre-tRNAs into the mature molecules. This processing involves cutting events and chemical modifications, both types of reaction being similar to equivalent processing events for eukaryotic rRNAs and tRNAs: we will therefore deal with bacterial and eukaryotic processing together, cutting events in Section 10.2.2 and chemical modifications in Section 10.3. The distinctive feature of eukaryotic rRNA and tRNA processing is the presence in some eukaryotic pre-RNAs of introns, different from the pre-mRNA introns described above; these will be covered in Section 10.2.3. First, however, there are issues regarding transcript elongation and termination by RNA polymerases I and III that we must address.

10.2.1. Transcript elongation and termination by RNA polymerases I and III

In general, we know less about transcript elongation and termination by RNA polymerases I and III than we do about equivalent processes for RNA polymerase II. The interaction of the polymerase with the template and transcript during elongation appears to be similar with all three enzymes, a reflection of the structural relatedness of the three largest subunits in each polymerase. One difference is the rate of transcription - RNA polymerase I, for example, being much slower than RNA polymerase II, managing a polymerization rate of only 20 nucleotides per minute, compared with 2000 per minute for mRNA synthesis. A second difference is that neither RNA polymerase I nor RNA polymerase III transcripts are capped. Various proteins that might act as elongation factors for RNA polymerase I or III have been isolated, including SGS1 and SRS2 of yeast, which are two related DNA helicases. Mutations in the genes for SGS1 and SRS2 cause a reduction in RNA polymerase I transcription as well as DNA replication (Lee et al., 1999). SGS1 is interesting because it is a homolog of a pair of human proteins that are defective in the growth disorders Bloom's and Werner's syndromes (Section 7.4.2) but the exact involvement of SGS1 and SRS2, and other putative elongation factors, in transcription by RNA polymerases I and III is not known.

The major differences between the three polymerases are seen when the termination processes are compared. The polyadenylation system for RNA polymerase II termination (Section 10.1.2) is unique to that enzyme and no equivalent has been described for the other two polymerases. Termination by RNA polymerase I involves a DNA-binding protein, called Reb1p in Saccharomyces cerevisiae and TTF-I in mice, which attaches to the DNA at a recognition sequence located 12–20 bp downstream of the point at which transcription terminates (Figure 10.22). Exactly how the bound protein causes termination is not known, but a model in which the polymerase becomes stalled because of the blocking effect of Reb1p/TTF-I has been proposed (Reeder and Lang, 1997). A second protein, PTRF (polymerase I and transcript release factor), is thought to induce dissociation of the polymerase and the transcript from the DNA template (Jansa et al., 1998). Even less is known about RNA polymerase III termination: a run of adenosines in the template is implicated but the process does not involve a hairpin loop and so is not analogous to termination in bacteria.

Figure 10.22

A possible scheme for termination of transcription by RNA polymerase I.

10.2.2. Cutting events involved in processing of bacterial and eukaryotic pre-rRNA and pre-tRNA

Bacteria synthesize three different rRNAs, called 5S rRNA, 16S rRNA and 23S rRNA, the names indicating the sizes of the molecules as measured by sedimentation analysis (see Technical Note 2.2). The three genes for these rRNAs are linked into a single transcription unit (which is usually present in multiple copies, seven for E. coli) and so the pre-rRNA contains copies of all three rRNAs. Cutting events are therefore needed to release the mature rRNAs. These cuts are made by various ribonucleases, at positions specified by double-stranded regions formed by base-pairing between different parts of the pre-rRNA (Figure 10.23). The cut ends are subsequently trimmed by exonucleases.

Figure 10.23

Pre-rRNA processing in Escherichia coli. The pre-rRNA, containing copies of the 16S, 23S and 5S rRNAs, is cleaved by ribonucleases III, P and F, and the resulting molecules trimmed by ribonucleases M16, M23 and M5 to give the mature rRNAs. Eukaryotic (more...)

In eukaryotes there are four rRNAs. One of these, the 5S rRNA, is transcribed by RNA polymerase III and does not undergo processing. The remaining three (the 5.8S, 18S and 28S rRNAs) are transcribed by RNA polymerase I from a single unit, producing a pre-rRNA which, as with the bacterial pre-rRNAs, is processed by cutting and end-trimming. Several nucleases are required, including the multifunctional ribonuclease MRP which, as well as 5.8S rRNA processing, is involved in replication of mitochondrial DNA and control of the cell cycle. MRP is particularly interesting because the various subunits of the enzyme include one that is made of RNA rather than protein (Clayton, 2001). RNA subunits are present in several enzymes involved in RNA processing, including RNase P (see Figure 10.24), and RNAs are important components of the snRNPs that participate in splicing of GU-AG and AU-AC introns (Section 10.1.3). The presence of RNA molecules in these processing enzymes and complexes is thought to be a relic of the RNA world, the early period of evolution when all biological reactions centered around RNA (Section 15.1.1).

Figure 10.24

Processing of an Escherichia coli pre-tRNA. The example shown results in synthesis of tRNA^Tyr. The tRNA sequence in the primary transcript adopts its base-paired cloverleaf structure (see Figure 11.2) and two additional hairpin structures form, one on (more...)

In both bacteria and eukaryotes, tRNA genes occur singly and as multigene transcription units, or, in bacteria, as infiltrators within the rRNA transcription unit. The pre-tRNAs are also processed by a series of ribonucleases, as illustrated in Figure 10.24. All mature tRNAs must end with the trinucleotide 5′-CCA-3′. Some tRNAs have this sequence already; those that do not, or from which the 5′-CCA-3′ has been removed by the processing ribonucleases, have the motif added by tRNA nucleotidyltransferase.
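The rule for the 3′ end can be stated very compactly. The sketch below is a deliberately minimal string model: the function name and the example sequences are hypothetical, and a partially trimmed 3′ end is not treated as a special case.

```python
def ensure_cca(trna):
    """Return the sequence with a 3'-terminal CCA, appending it if absent."""
    return trna if trna.endswith("CCA") else trna + "CCA"

# Hypothetical processed tRNA 3' regions.
print(ensure_cca("GCGGAUUUAGCUC"))      # lacks CCA, so the motif is added
print(ensure_cca("GCGGAUUUAGCUCCCA"))   # already ends in CCA, left unchanged
```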

10.2.3. Introns in eukaryotic pre-rRNA and pre-tRNA

Some eukaryotic pre-rRNAs and pre-tRNAs contain introns which must be spliced during the processing of these transcripts into mature RNAs. Neither type of intron is similar to the GU-AG and AU-AC introns of pre-mRNA.

Eukaryotic pre-rRNA introns are self-splicing

Introns are quite uncommon in eukaryotic pre-rRNAs but a few are known in microbial eukaryotes such as Tetrahymena. These introns are members of the Group I family (see Table 10.2) and are also found in mitochondrial and chloroplast genomes, where they occur in pre-mRNA as well as pre-rRNA. A few isolated examples are known in bacteria, for instance in a tRNA gene of the cyanobacterium Anabaena and in the thymidylate synthase gene of the E. coli bacteriophage T4.

The splicing pathway for Group I introns is similar to that of pre-mRNA introns in that two transesterifications are involved. The first is induced not by a nucleotide within the intron but by a free nucleoside or nucleotide, any one of guanosine or guanosine mono-, di- or triphosphate (Figure 10.25). The 3′-OH of this cofactor attacks the phosphodiester bond at the 5′ splice site, cleaving it, with transfer of the G to the 5′ end of the intron. The second transesterification involves the 3′-OH at the end of the exon, which attacks the phosphodiester bond at the 3′ splice site, causing cleavage, joining of the two exons, and release of the intron. The released intron is linear, rather than the lariat seen with pre-mRNA introns, but may undergo additional transesterifications, leading to circular products, as part of its degradation process.

Figure 10.25

The splicing pathway for the Tetrahymena rRNA intron.

The remarkable feature of the Group I splicing pathway is that it proceeds in the absence of proteins and hence is autocatalytic, the RNA itself possessing enzymatic activity. This was the first example of an RNA enzyme or ribozyme to be discovered, back in the early 1980s. Initially this caused quite a stir but it is now realized that, although uncommon, there are several examples of ribozymes (Table 10.4). The self-splicing activity of Group I introns resides in the base-paired structure taken up by the RNA. This structure was first described in two-dimensional terms by comparing the sequences of different Group I introns and working out a common base-paired arrangement that could be adopted by all versions. This resulted in a model comprising nine major base-paired regions (Figure 10.26; Burke et al., 1987). More recently, the three-dimensional structure has been solved by X-ray crystallography (Cate et al., 1996; Golden et al., 1998). The ribozyme consists of a catalytic core made up of two domains, each one comprising two of the base-paired regions, with the splice sites brought into proximity by interactions between two other parts of the secondary structure. Although this RNA structure is sufficient for splicing, it is possible that with some introns the stability of the ribozyme is enhanced by non-catalytic protein factors that bind to it. This has long been suspected with the Group I introns in organelle genes, many of these containing an ORF coding for a protein called a maturase that appears to play a role in splicing.

Table 10.4

Examples of ribozymes.

Figure 10.26

The base-paired structure of the Tetrahymena rRNA intron. The sequence of the intron is shown in capital letters, with the exons in lower case. Additional interactions fold the intron into a three-dimensional structure that brings the two splice sites (more...)

Eukaryotic tRNA introns are variable but all are spliced by the same mechanism

Introns in eukaryotic pre-tRNA are relatively short and are usually found at the same position in the mature tRNA sequence, within the anticodon arm and loop (see Figure 11.2). Their sequences are variable, and no common motifs can be identified. Unlike splicing of all other intron types, splicing of pre-tRNA introns does not involve transesterifications. Instead, the two splice sites are cut by ribonuclease action, leaving a cyclic phosphate structure attached to the 3′ end of the upstream exon, and a hydroxyl group at the 5′ end of the downstream exon (Figure 10.27). These ends are held in proximity by the natural base-pairing adopted by the tRNA sequence and are ligated by an RNA ligase.

Figure 10.27

Splicing of the Saccharomyces cerevisiae pre-tRNA^Tyr. See the text for details. In the second stage of the splicing pathway, the 2′,3′-P terminus is converted to a 3′-OH end by a phosphodiesterase, and the 5′-OH terminus (more...)

10.3. Processing of Pre-RNA by Chemical Modification

The processing events that we have studied so far have been either chemical modifications that affect the ends of transcripts (capping, polyadenylation) or physical changes to the lengths of transcripts (splicing, cutting events). The final type of processing that occurs with pre-RNAs is the chemical modification of nucleotides within the transcript. This occurs with pre-rRNAs and pre-tRNAs of both bacteria and eukaryotes and, to a much lesser extent, with pre-mRNAs of eukaryotes. Equivalent events in the archaea are poorly understood.

A broad spectrum of chemical changes has been identified with different pre-RNAs: over 50 different modifications are known in total (Table 10.5). Most of these are carried out directly on an existing nucleotide within the transcript but two modified nucleotides, queuosine and wyosine, are put in place by cutting out an entire nucleotide and replacing it with the modified version. Many of these modifications were first identified in tRNAs, within which approximately one in ten nucleotides becomes altered. These modifications are thought to mediate the recognition of individual tRNAs by the enzymes that attach amino acids to these molecules (Section 11.1.1), and to increase the range of the interactions that can occur between tRNAs and codons during translation, enabling a single tRNA to recognize more than one codon (Section 11.1.2).

Table 10.5

Examples of chemical modifications occurring with nucleotides in rRNA and tRNA.

We know relatively little about how tRNA modifications are carried out, beyond the fact that there are a number of enzymes that catalyze these changes. How the enzymes are directed to the correct nucleotides on which they must act has not been explained. With the rRNA and mRNA modifications we know rather less about the reasons for the chemical alterations but rather more about how the alterations are carried out. These issues are covered in the next two sections.

10.3.1. Chemical modification of pre-rRNAs

Ribosomal RNAs are modified in two ways: by addition of methyl groups to, mainly, the 2′-OH group on nucleotide sugars, and by conversion of uridine to pseudouridine (see Table 10.5). The same modification occurs at the same position on all copies of an rRNA, and these modified positions are, to a certain extent, the same in different species. Some similarities in modification patterns are even seen when bacteria and eukaryotes are compared, although bacterial rRNAs are less heavily modified than eukaryotic ones. Functions for the modifications have not been identified, although most occur within those parts of rRNAs thought to be most critical to the activity of these molecules in ribosomes (Section 11.2.1). Modified nucleotides might, for example, be involved in rRNA-catalyzed reactions such as synthesis of peptide bonds.

It is not easy, simply by intuition, to imagine how specificity of rRNA modification can be ensured. Human pre-rRNA, for example, undergoes 106 methylations and 95 pseudouridinylations, each alteration at a specified position, with no obvious sequence similarities that can be inferred as target motifs for the modifying enzymes. Not surprisingly, progress in understanding rRNA modification was slow to begin with. The breakthrough came when it was shown that in eukaryotes the short RNAs called snoRNAs are involved in the modification process. These molecules are 70–100 nucleotides in length and are located in the nucleolus, the region within the nucleus where rRNA processing takes place. The initial discovery was that by base-pairing to the relevant region, snoRNAs pinpoint positions at which the pre-rRNA must be methylated. The base-pairing involves only a few nucleotides, not the entire length of the snoRNA, but these nucleotides are always located immediately upstream of a conserved sequence called the D box (Figure 10.28A). The base pair involving the nucleotide that will be modified is five positions away from the D box. The hypothesis is that the D box is the recognition signal for the methylating enzyme, which is therefore directed towards the appropriate nucleotide (Bachellerie and Cavaillé, 1997). After these initial discoveries with regard to methylation, it was shown that a different family of snoRNAs carries out the same guiding role in conversion of uridines to pseudouridines (Maden, 1997). These snoRNAs do not have D boxes but still have conserved motifs that could be recognized by the modifying enzyme, and each is able to form a specific base-paired interaction with its target site, specifying the nucleotide to be modified.
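
The guiding rule described above can be illustrated with a minimal sketch. It assumes several things that are not stated in the text: that the D box has the sequence CUGA, that the guide region is ten nucleotides long, and that only standard A-U and G-C pairs count. The function name and the toy sequences are hypothetical and do not represent the real U24 snoRNA or 25S rRNA.

```python
# Toy model of snoRNA-guided methylation site selection.
# Assumptions (not from the text): D box = "CUGA", guide length = 10,
# Watson-Crick pairs only, and the sequences below are invented.

PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_methylation_site(rrna, snorna, d_box="CUGA", guide_len=10, offset=5):
    """Return the 0-based rRNA index predicted to be methylated, or None.

    The guide is the stretch of snoRNA immediately upstream of the D box.
    The rRNA nucleotide paired with the snoRNA position `offset` nucleotides
    upstream of the D box is taken as the modification target.
    """
    d_start = snorna.find(d_box)
    if d_start < guide_len:  # no D box, or not enough sequence upstream of it
        return None
    guide = snorna[d_start - guide_len:d_start]
    # Base-paired strands are antiparallel, so the rRNA target site is the
    # reverse complement of the guide.
    target = "".join(PAIRS[base] for base in reversed(guide))
    hit = rrna.find(target)
    if hit == -1:
        return None
    # guide[-offset] pairs with target[offset - 1] in the antiparallel duplex.
    return hit + offset - 1


snorna = "UUUU" + "ACGUACGGAU" + "CUGA" + "UU"   # invented snoRNA
rrna = "GGGG" + "AUCCGUACGU" + "CCCC"             # invented rRNA fragment
site = find_methylation_site(rrna, snorna)
print(site, rrna[site])
```

The point of the sketch is that the enzyme needs no sequence motif in the rRNA itself: the position is fixed entirely by the base-pairing and by counting from the D box.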

Figure 10.28

Methylation of rRNA by a snoRNA. (A) This example shows methylation of the C at position 1436 in the Saccharomyces cerevisiae 25S rRNA (equivalent to the 28S rRNA of vertebrates), directed by U24 snoRNA. The D box of the snoRNA is highlighted.

The implication is that there is a different snoRNA for each modified position in a pre-rRNA, except possibly for a few sites that are close enough together to be dealt with by a single snoRNA. This means that there must be a few hundred snoRNAs per cell. At one time this seemed unlikely because very few snoRNA genes could be located, but now it appears that only a fraction of all the snoRNAs are transcribed from these standard genes, most being specified by sequences within the introns of other genes and released by cutting up the intron after splicing (Figure 10.28B).

The snoRNA system provides an elegant solution to site-specific chemical modification but it applies only to eukaryotic rRNAs. In contrast, the modifications made to bacterial rRNAs are carried out by enzymes that directly recognize the sequence and/or structures of the regions of RNA that contain the nucleotides to be modified. Often two or more nucleotides in the same region are modified at once. Bacterial rRNA modification is therefore similar to the systems for modifying tRNAs in both bacteria and eukaryotes.

10.3.2. RNA editing

Because rRNAs and tRNAs are non-coding, chemical modifications to their nucleotides affect only the structural features and, possibly, catalytic activities of the molecules. With mRNAs the situation is very different: chemical modification has the potential to change the coding properties of the transcript, resulting in an equivalent alteration in the amino acid sequence of the protein that is specified. A notable example of RNA editing occurs with the human mRNA for apolipoprotein B. The gene for this protein codes for a 4563-amino-acid polypeptide, called apolipoprotein B100, which is synthesized in liver cells and secreted into the bloodstream where it transports lipids around the body. A related protein, apolipoprotein B48, is made by intestinal cells. This protein is only 2153 amino acids in length and is synthesized from an edited version of the mRNA for the full-length protein (Figure 10.29). In intestinal cells this mRNA is modified by deamination of a cytosine, converting this into a uracil. This changes a CAA codon, specifying glutamine, into a UAA codon, which causes translation to stop, resulting in the truncated protein. The deamination is carried out by an RNA-binding enzyme which, in conjunction with a set of auxiliary protein factors, binds to a sequence immediately downstream of the modification position within the mRNA (Smith and Sowden, 1996).
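
The effect of this single C-to-U change on the protein can be shown with a short sketch. The mRNA below is an invented 18-nucleotide toy sequence, not the real apolipoprotein B transcript, and the codon table holds only the few codons that the toy sequence uses. The function names are hypothetical.

```python
# Toy illustration of C-to-U editing creating a termination codon.
# The sequence is invented; only the CAA -> UAA change reflects the text.

CODON_TABLE = {
    "AUG": "Met",
    "GCU": "Ala",
    "CAA": "Gln",
    "AAA": "Lys",
    "UAA": "Stop",
}


def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


def deaminate_c_to_u(mrna, position):
    """Return a copy of the mRNA with the cytosine at `position` changed to uracil."""
    if mrna[position] != "C":
        raise ValueError("editing site must be a cytosine")
    return mrna[:position] + "U" + mrna[position + 1:]


unedited = "AUGGCUCAAAAAGCUUAA"          # Met-Ala-Gln-Lys-Ala-Stop
edited = deaminate_c_to_u(unedited, 6)   # first base of the CAA codon

print(translate(unedited))
print(translate(edited))
```

In the toy example the unedited message gives a five-residue peptide and the edited message gives only the first two residues, mirroring the way the intestinal form of the protein is a truncated version of the liver form.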

Figure 10.29

Editing of the human apolipoprotein B mRNA. Conversion of a C to a U creates a termination codon, resulting in a shortened form of apolipoprotein B being synthesized in intestinal cells.

Although not common, RNA editing occurs in a number of different organisms and includes a variety of different nucleotide changes (Table 10.6 and Box 10.3). Some editing events have a significant impact on the organism: in humans, editing is partly responsible for the generation of antibody diversity (Neuberger and Scott, 2000; Section 12.2.1) and has also been implicated in control of the HIV-1 infection cycle (Bourara et al., 2000). One particularly interesting type of editing is the deamination of adenosine to inosine, which is carried out by enzymes called adenosine deaminases acting on RNA (ADARs) (Reenan, 2001). Some of the target mRNAs for these enzymes are selectively edited at a limited number of positions. These positions are apparently specified by double-stranded segments of the pre-mRNA, formed by base-pairing between the modification site and sequences from adjacent introns. This type of editing occurs, for example, during processing of the mRNAs for mammalian glutamate receptors (Scott, 1997). Selective editing contrasts with the second type of modification carried out by ADARs, in which the target molecules become extensively deaminated, over 50% of the adenosines in the RNA becoming converted to inosines. Hyperediting has so far been observed mainly, but not exclusively, with viral RNAs and is thought to occur by chance, these RNAs adopting base-paired structures that fortuitously act as substrates for ADAR. It may, however, have physiological importance in the etiology of diseases caused by the edited viruses. This possibility is raised by the discovery that viral RNAs associated with persistent measles infections (as opposed to the more usual transient version of the disease) are hyperedited (Bass, 1997).

Table 10.6

Examples of RNA editing in mammals.

Box 10.3

More complex forms of RNA editing. The examples of RNA editing described in the text and in Table 10.6 are relatively straightforward events that lead to nucleotide changes at a single or limited number of positions in selected mRNAs.

10.4. Degradation of mRNAs

So far this chapter has concentrated on synthesis of RNAs. Their degradation is equally important, especially with regard to mRNAs whose presence or absence in the cell determines which proteins will be synthesized. Degradation of specific mRNAs could be a powerful way of regulating genome expression.

The rate of degradation of an mRNA can be estimated by determining its half-life in the cell. The estimates show that there are considerable variations between and within organisms. Bacterial mRNAs are generally turned over very rapidly, their half-lives rarely being longer than a few minutes, a reflection of the rapid changes in protein synthesis patterns that can occur in an actively growing bacterium with a generation time of 20 minutes or so. Eukaryotic mRNAs are longer lived, with half-lives of, on average, 10–20 minutes for yeast and several hours for mammals. Within individual cells the variations are almost equally striking: some yeast mRNAs have half-lives of only 1 minute whereas for others the figure is more like 35 minutes (Tuite, 1996). These observations raise two questions: what are the processes for mRNA degradation, and how are these processes controlled?
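
To give a feel for what these half-lives mean, the following sketch calculates the fraction of an mRNA population that survives after a given time. It assumes simple first-order (exponential) decay with no new synthesis, which is a modelling assumption rather than something stated in the text. The function name is hypothetical.

```python
# Fraction of an mRNA population remaining after a given time,
# assuming first-order decay and no new transcription (an assumption).


def fraction_remaining(minutes, half_life_minutes):
    """Each half-life that elapses halves the amount of mRNA remaining."""
    return 0.5 ** (minutes / half_life_minutes)


# Compare the two extremes quoted for yeast mRNAs after 10 minutes.
for half_life in (1, 35):
    print(half_life, fraction_remaining(10, half_life))
```

Under this assumption, an mRNA with a 1-minute half-life has all but disappeared after 10 minutes, whereas most of a population with a 35-minute half-life is still present, which is why differences in half-life could have such a strong effect on protein synthesis patterns.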

10.4.1. Bacterial mRNAs are degraded in the 3′→5′ direction

Studies of mutant bacteria whose mRNAs have extended half-lives have identified a range of ribonucleases and other RNA-degrading enzymes that are thought to be involved in mRNA degradation. These include (Carpousis et al., 1999):

RNase E and RNase III, which are endonucleases that make internal cuts in RNA molecules;
RNase II, which is an exonuclease that removes nucleotides in the 3′→5′ direction;
Polynucleotide phosphorylase (PNPase), which also removes nucleotides sequentially from the 3′ end of an mRNA but, unlike true nucleases, requires inorganic phosphate as a co-substrate.

No enzyme capable of degrading RNA in the 5′→3′ direction has yet been isolated from bacteria. This absence leads to the assumption that the main degradative process for bacterial mRNAs is removal of nucleotides from the 3′ end. This is not possible under normal circumstances because most mRNAs have a hairpin structure near the 3′ end, the same hairpin that induced termination of transcription (see Figures 10.3 and 10.4). This structure blocks the progress of RNase II and PNPase, preventing them from gaining access to the coding part of the transcript (Figure 10.30). The model for mRNA degradation therefore begins with removal of the 3′ terminal region, including the hairpin, by one of the endonucleases, exposing a new end from which RNase II and PNPase can enter the coding region, destroying the functional activity of the mRNA. Polyadenylation may also have a role. Although looked on primarily as a feature of eukaryotic mRNAs (Section 10.1.2), it has been known since 1975 that many bacterial transcripts have poly(A) tails at some stage in their existence, but that these tails are rapidly degraded. At present it is not clear whether polyadenylation precedes degradation of an mRNA, or whether it occurs at various intermediate stages after degradation has begun (Carpousis et al., 1999).

Figure 10.30

Degradation of bacterial RNA. The termination hairpin blocks the exonuclease activities of RNase II and PNPase, and so must be removed by endonuclease action (RNase E and/or RNase III) before degradation can proceed.

In the cell, RNase E and PNPase are located within a multiprotein complex called the degradosome. Other components of the degradosome include an RNA helicase, which is thought to aid degradation by unwinding the double-helix structure of the stems of RNA stem-loops. Fragments of rRNA occasionally co-purify with the degradosome, suggesting that the complex might be involved in both rRNA and mRNA degradation. But the exact role of the degradosome is still not clear and a few researchers are sceptical about its actual existence. They point out that proteins not obviously involved in mRNA degradation, such as the glycolysis enzyme enolase, appear to be components of the degradosome, which might indicate that the complex is an artefact produced during extraction of proteins from bacterial cells. A more significant gap in our knowledge concerns the way in which degradation is specifically targeted at individual mRNAs. We know that specific degradation occurs because mRNA degradation has been implicated in the regulation of several sets of bacterial genes, such as the pap operon of E. coli, which codes for proteins involved in synthesis of the cell surface pili (Baga et al., 1988). Unfortunately, the process by which such control is exerted remains a mystery.

10.4.2. Eukaryotes have more diverse mechanisms for RNA degradation

Among eukaryotes, most progress in understanding mRNA degradation has been made with yeast. At least four pathways have been identified. One of these involves a multiprotein complex called the exosome, which degrades transcripts in the 3′→5′ direction and contains nucleases related to the enzymes of the bacterial degradosome. Exosomes are probably also present in mammalian cells and are clearly important, but they are not particularly well studied. Their role may not be in mRNA degradation per se, but in monitoring polyadenylation and ensuring that transcripts that are about to leave the nucleus have an appropriate poly(A) tail (Hilleren et al., 2001).

Rather more is known about two other eukaryotic mRNA degradation processes. The first of these is deadenylation-dependent decapping (Figure 10.31), which is triggered by removal of the poly(A) tail, possibly by exonuclease cleavage or possibly by loss of the polyadenylate binding protein which stabilizes the tail (Section 10.1.2). Poly(A) tail removal is followed by cleavage of the 5′ cap by the decapping enzyme Dcp1p. Decapping prevents the mRNA from being translated (Section 11.2.2) and so ends its functional life. The mRNA then undergoes rapid exonuclease digestion from its 5′ end. Whether or not an individual mRNA is degraded is probably determined by the ability of Dcp1p to gain access to the cap structure, which in turn depends on the association between the cap and the proteins that bind to it in order to initiate translation (Section 11.2.2). Degradation is also influenced, at least with some yeast mRNAs, by sequences called instability elements, located within the transcript. The importance of these sequences has been demonstrated by experiments in which an element is artificially deleted, which leads to increased translation and reduced degradation of the mRNA (Tucker and Parker, 2000).

Figure 10.31

The deadenylation-dependent decapping pathway for degradation of an mRNA.

The second well studied system for degradation of eukaryotic mRNAs is called nonsense-mediated RNA decay (NMD) or mRNA surveillance. The first of these names gives a clue to its function, because in molecular biology jargon a ‘nonsense’ sequence is a termination codon. NMD results in the specific degradation of mRNAs that have a termination codon at an incorrect position, either because the gene has undergone a mutation or as a result of incorrect splicing. The incorrect codon is thought to be detected by a ‘surveillance’ mechanism that involves a complex of proteins which scans the mRNA and somehow is able to distinguish between the correct termination codon, located at the end of the coding region of the transcript, and one that is in the wrong place (Figure 10.32A; Culbertson, 1999). There are a number of conceptual difficulties with this model because it is not easy to imagine how the surveillance complex could discriminate between correct and incorrect termination codons. Current hypotheses are based on the demonstration that the correct termination codon is recognized as aberrant if the transcript is engineered so that an exon-intron boundary is placed downstream of this termination codon (Figure 10.32B). The surveillance enzymes may therefore use exon-intron boundaries as orientation positions in order to distinguish the correct termination codon, which is usually downstream from the last intron (Kim et al., 2001; Lykke-Andersen et al., 2001). Alternative schemes have also been proposed, in which importance is placed not on the position of the termination codon but on the precise nature of the events involved in termination of translation at a premature stop codon compared with one that is at its correct position (Hilleren and Parker, 1999). 
Whatever the mechanism, identification of an incorrect termination codon induces cap cleavage and 5′→3′ exonuclease degradation, without prior removal of the poly(A) tail, by proteins different from those involved in deadenylation-dependent decapping. Although NMD acts primarily to degrade mRNAs that have become altered by mutation or have been incorrectly spliced, there is evidence that the pathway is also responsible for degradation of normal mRNAs, but probably not in a way that leads to control over expression of any individual gene.
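
The positional hypothesis can be written as a simple rule. The sketch below is a deliberately reduced model: it assumes that the only information used is the position of the termination codon relative to the positions in the transcript that mark former exon-intron boundaries, and it ignores the alternative schemes based on the events of translation termination. The function name and the coordinates are hypothetical.

```python
# Reduced model of the positional rule for mRNA surveillance.
# Assumption: a termination codon is treated as aberrant whenever at least
# one boundary mark lies downstream of it. Coordinates are invented.


def looks_premature(stop_codon_position, boundary_positions):
    """Return True if any boundary mark lies downstream of the stop codon."""
    return any(boundary > stop_codon_position for boundary in boundary_positions)


boundaries = [120, 450]   # hypothetical boundary marks in a transcript

print(looks_premature(900, boundaries))   # stop codon after the last boundary
print(looks_premature(300, boundaries))   # stop codon with a boundary downstream
```

The same rule accounts for the engineered transcripts described above: moving a boundary to a position downstream of the normal termination codon makes that codon satisfy the test for being premature.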

Figure 10.32

mRNA surveillance.

The systems described above represent the eukaryotic processes for controlled degradation of endogenous mRNAs. Eukaryotes also possess other RNA degradation mechanisms that have evolved largely to protect the cell from attack by foreign RNAs such as the genomes of viruses. An example is the pathway called RNA interference, a name that will be familiar because RNA interference has been adopted by genome researchers as a means of inactivating selected genes in order to study their function (Section 7.2.2). The target RNA for RNA interference must be double stranded, which excludes cellular mRNAs but encompasses many viral genomes. The double-stranded RNA is cleaved by a ribonuclease called Dicer into short interfering RNAs (siRNAs) of 21–25 nucleotides in length (Ambros, 2001). This inactivates the virus genome, but what if the virus genes have already been transcribed? If this has occurred then the harmful effects of the virus will already have been initiated and RNA interference would appear to have failed in its attempt to protect the cell from damage. One of the more remarkable discoveries of recent years has revealed a second stage of the interference process that is directed specifically at the viral mRNAs. The siRNAs produced by cleavage of the viral genome are separated into individual strands, one strand of each siRNA subsequently base-pairing to any viral mRNAs that are present in the cell. The double-stranded regions that are formed are target sites for the RDE-1 nuclease, which destroys the mRNAs (see Figure 7.16).
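
The two stages of the interference process can be sketched as follows. This is a toy model under several assumptions that are not in the text: Dicer is represented as cutting one strand into consecutive 21-nucleotide pieces, only perfect A-U and G-C pairing is considered, and the sequences are invented. All function names are hypothetical.

```python
# Toy model of the two stages of RNA interference.
# Assumptions: fixed 21-nucleotide siRNAs, perfect base-pairing only,
# invented sequences.

PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the strand that would base-pair with `rna`, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(rna))


def dice(strand, size=21):
    """Stage 1: cut a strand into consecutive fragments of `size` nucleotides."""
    return [strand[i:i + size] for i in range(0, len(strand) - size + 1, size)]


def targeted_regions(mrna, sirna_strands):
    """Stage 2: find mRNA regions that would pair with a single siRNA strand."""
    regions = []
    for sirna in sirna_strands:
        site = reverse_complement(sirna)
        start = mrna.find(site)
        if start != -1:
            regions.append((start, start + len(site)))
    return sorted(regions)


sense = "AUGGCUAGCUUAGGCAUCGAUCCGAUUAGCGCAUAUGCCGAU"  # invented viral strand
antisense = reverse_complement(sense)
sirnas = dice(antisense)                 # single strands after separation
viral_mrna = "GG" + sense + "AA"         # invented viral transcript
regions = targeted_regions(viral_mrna, sirnas)
print(regions)
```

Each region returned is a stretch of the viral mRNA that would become double stranded after pairing with an siRNA strand, and so would be a candidate target site for the nuclease in the second stage.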

10.5. Transport of RNA Within the Eukaryotic Cell

In a typical mammalian cell, about 14% of the total RNA is present in the nucleus (Alberts et al., 1994). About 80% of this nuclear fraction is RNA that is being processed before leaving for the cytoplasm. The other 20% is snRNAs and snoRNAs, playing an active role in the processing events, at least some of these molecules having already been to the cytoplasm where they were coated with protein molecules before being transported back into the nucleus. In other words, eukaryotic RNAs are continually being moved from nucleus to cytoplasm and possibly back to the nucleus again.
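
The percentages quoted above are fractions of different totals, so it can help to express them all relative to the total RNA of the cell. The short calculation below does this using only the figures given in the text; the variable names are arbitrary.

```python
# Express the nuclear RNA classes as fractions of total cellular RNA,
# using only the percentages quoted in the text.

nuclear_fraction = 0.14                  # share of total RNA in the nucleus
being_processed = 0.80 * nuclear_fraction
sn_and_sno_rna = 0.20 * nuclear_fraction
cytoplasmic_fraction = 1.0 - nuclear_fraction

print(f"being processed: {being_processed:.1%} of total RNA")
print(f"snRNA and snoRNA: {sn_and_sno_rna:.1%} of total RNA")
print(f"outside the nucleus: {cytoplasmic_fraction:.1%} of total RNA")
```

On these figures, roughly 11% of the total RNA of the cell is in the course of being processed in the nucleus, and a little under 3% is nuclear snRNA and snoRNA.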

The only way for RNAs to leave or enter the nucleus is via one of the many nuclear pore complexes that cover the nuclear membrane (Figure 10.33). Initially looked upon as little more than a hole in the membrane, pore complexes are now regarded as complex structures that play an active role in movement of molecules into and out of the nucleus (Wente, 2000). Small molecules can move unimpeded through a pore complex but RNAs and most proteins are too large to diffuse through unaided and so have to be transported across by an energy-dependent process. As in many biochemical systems, the energy is obtained by hydrolysis of one of the high-energy phosphate-phosphate bonds in a ribonucleotide triphosphate, in this case by converting GTP to GDP (other processes use ATP→ADP). Energy generation is carried out by a protein called Ran, and transport requires receptor proteins called karyopherins, or exportins and importins depending on the direction of their transport activity. There are at least 20 different human karyopherins, each responsible for the transport of a different class of molecule - mRNA, rRNA, etc. An example is exportin-t, which has been identified as the karyopherin for export of tRNAs in yeasts and mammals. Transfer RNAs are directly recognized by exportin-t, but other types of RNA are probably exported by protein-specific karyopherins which recognize the proteins bound to the RNA, rather than the RNA itself. This also appears to be the case for import of snRNA from cytoplasm to nucleus, which makes use of importin β, a component of one of the protein transport pathways (Nigg, 1997; Weis, 1998).

Figure 10.33

Eukaryotic RNAs must be transported through the nuclear pore complexes. In eukaryotes, rRNAs, tRNAs and mRNAs are transported from the nucleus to the cytoplasm, where these molecules carry out their cellular functions.

Export of mRNAs is triggered by completion of the splicing pathway, possibly through the action of the protein called Yra1p in yeast and Aly in animals (Zhou et al., 2000; Keys and Green, 2001). Once outside the nucleus, there are mechanisms that ensure that mRNAs are transported to their appropriate places in the cell. It is not known to what extent protein localization within the cell is due to translation of an mRNA at a specific position or to movement of the protein after it has been synthesized, but it is clear that at least some mRNAs are translated at defined places. For example, those mRNAs coding for proteins that are to be transferred into a mitochondrion are translated by ribosomes located on the surface of the organelle. It is assumed that protein ‘address tags’ are attached to mRNAs in order to direct them to their correct locations after they are transported out of the nucleus, but very little is known about this process.

Study Aids For Chapter 10

Key terms

Give short definitions of the following terms:

Acceptor site
Adenosine deaminase acting on RNA (ADAR)
Alternative splicing
Antitermination
Antiterminator protein
Attenuation
AU-AC intron
Cleavage and polyadenylation specificity factor (CPSF)
Cleavage stimulation factor (CstF)
Commitment complex
Cryptic splice site
Cryptogene
CTD-associated SR-like protein (CASP)
Deadenylation-dependent decapping
Degradosome
Dicer
Donor site
Elongation factor
Elongator
Exon skipping
Exonic splicing enhancer (ESE)
Exonic splicing silencer (ESS)
Exosome
Exportin
Group I intron
Group II intron
Group III intron
GU-AG intron
Guanine methyltransferase
Guanylyl transferase
Guide RNA
Hammerhead
Helicase
Importin
Insertional editing
Instability element
Intrinsic terminator
Karyopherin
Lariat
Maturase
mRNA surveillance
Nonsense-mediated RNA decay (NMD)
Nuclear pore complex
Pan-editing
Poly(A) polymerase
Polyadenylate-binding protein (PADP)
Polyadenylation editing
Polypyrimidine tract
Pre-spliceosome complex
Promoter clearance
Promoter escape
Rho-dependent terminator
Ribonuclease MRP
Ribozyme
RNA editing
RNA interference
RNA world
Sedimentation analysis
Short interfering RNA (siRNA)
Small nuclear ribonucleoprotein (snRNP)
Spliceosome
Splicing pathway
SR protein
SR-like CTD-associated factor (SCAF)
Transcription bubble
tRNA nucleotidyltransferase
trp RNA-binding attenuation protein (TRAP)
Twintron
Type 0 cap
Type 1 cap
Type 2 cap

Self study questions

1.: Outline the important features of the elongation phase of transcription in Escherichia coli.
2.: Describe how transcription is terminated in Escherichia coli.
3.: Using diagrams and specific examples, indicate how the processes called antitermination and attenuation influence transcription in bacteria.
4.: Describe the series of events that result in capping of a eukaryotic mRNA.
5.: Name and outline the functions of three different elongation factors for mammalian RNA polymerase II.
6.: Draw a series of diagrams to illustrate how a eukaryotic mRNA becomes polyadenylated.
7.: What are the key sequence features of a GU-AG intron?
8.: Give a detailed description of the series of events involved in splicing a GU-AG intron.
9.: What processes are thought to ensure that the correct splice sites are selected during splicing of a GU-AG intron?
10.: Give two examples to illustrate the importance of alternative splicing in genome expression.
11.: Why are AU-AC introns remarkable?
12.: Outline our current knowledge regarding elongation and termination of transcription by RNA polymerases I and III.
13.: Describe the cutting events involved in processing of bacterial and eukaryotic pre-rRNA and pre-tRNA.
14.: What is meant by ‘self-splicing’? Give details of the types of intron that display self-splicing. In your answer, distinguish between those introns that self-splice in vivo and those that only display this property in vitro.
15.: What is a ribozyme? Compile an annotated list of known ribozymes.
16.: List six types of chemical modification that occur with nucleotides in rRNA and tRNA. In each case, draw the structure of an example of a nucleotide resulting from the modification.
17.: Write an essay on ‘The role of snoRNAs in rRNA processing’.
18.: Give details of two examples of mRNA editing that occur in mammals.
19.: Outline the more complex forms of RNA editing that are known in various eukaryotes.
20.: Describe the processes of mRNA degradation in bacteria. How does the bacterial degradosome compare with the eukaryotic exosome?
21.: Distinguish between deadenylation-dependent decapping and nonsense-mediated RNA decay.
22.: What is Dicer and what does it do?
23.: Outline how eukaryotic RNAs are transported from the nucleus to the cytoplasm.

Problem-based learning

1.: ‘Current thinking views transcription as a stepwise nucleotide-by-nucleotide process, with the polymerase pausing at each position and making a “choice” between continuing elongation by adding another ribonucleotide to the transcript, or terminating by dissociating from the template. Which choice is selected depends on which alternative is more favorable in thermodynamic terms.’ Evaluate this view of transcription.
2.: Explore the introns-early and introns-late hypotheses. Is it possible to devise an analysis that will distinguish which of these two hypotheses is correct?
3.: Figure 10.18 shows a model for splicing of GU-AG introns in which individual commitment complexes are assembled across exons rather than within introns. According to this model, how might alternative splicing be regulated? Devise experiments to test your ideas.
4.: To what extent has the study of AU-AC introns provided insights into the details of GU-AG intron splicing?
5.: The existence of ribozymes is looked upon as evidence that RNA evolved before proteins and therefore at one time, during the earliest stages of evolution, all enzymes were made of RNA. Assuming that this hypothesis is correct, explain why some ribozymes persist to the present day.
6.: Using the current information on RNA degradation, devise a hypothesis to explain how specific mRNAs could be individually degraded. Can your hypothesis be tested?

References

Adams MD, Celniker SE, Holt RA. et al. The genome sequence of Drosophila melanogaster. Science. (2000);287:2185–2195. [PubMed: 10731132]
Alberts B, Bray D, Lewis J, Raff M, Roberts K and Watson JD (1994) Molecular Biology of the Cell, 3rd edition. Garland Publishing, New York.
Ambros V. Dicing up RNAs. Science. (2001);293:811–813. [PubMed: 11486075]
Antson AA, Dodson EJ, Dodson G. et al. Structure of the trp RNA-binding attenuation protein, TRAP, bound to RNA. Nature. (1999);401:235–242. [PubMed: 10499579]
Bachellerie J-P, Cavaillé J. Guiding ribose methylation of rRNA. Trends Biochem. Sci. (1997);22:257–261. [PubMed: 9255067]
Baga M, Goransson M, Normark S, Uhlin BE. Processed mRNA with differential stability in the regulation of E. coli pilin gene expression. Cell. (1988);52:197–206. [PubMed: 2449283]
Bard J, Zhelkovsky AM, Helmling S, Earnest TN, Moore CL, Bohm A. Structure of yeast poly(A) polymerase alone and in complex with 3′-dATP. Science. (2000);289:1346–1349. [PubMed: 10958780]
Bass BL. RNA editing and hypermutation by adenosine deamination. Trends Biochem. Sci. (1997);22:157–162. [PubMed: 9175473]
Bentley D. Coupling RNA polymerase II transcription with pre-mRNA processing. Curr. Opin. Cell. Biol. (1999);11:347–351. [PubMed: 10395561]
Blencowe BJ. Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem. Sci. (2000);25:106–110. [PubMed: 10694877]
Bonen L, Vogel J. The ins and outs of group II introns. Trends Genet. (2001);17:322–331. [PubMed: 11377794]
Bourara K, Litvak S, Araya A. Generation of G-to-A and C-to-U changes in HIV-1 transcripts by RNA editing. Science. (2000);289:1564–1566. [PubMed: 10968794]
Burke JM, Belfort M, Cech TR. et al. Structural conventions for Group I introns. Nucleic Acids Res. (1987);15:7217–7221. [PMC free article: PMC306243] [PubMed: 3658691]
Carpousis AJ, Vanzo NF, Raynal LC. mRNA degradation: a tale of poly(A) and multiprotein machines. Trends Genet. (1999);15:24–28. [PubMed: 10087930]
Cate JH, Gooding AR, Podell E. et al. Crystal structure of a Group I ribozyme domain: principles of RNA packing. Science. (1996);273:1678–1685. [PubMed: 8781224]
Chabot B. Directing alternative splicing: cast and scenarios. Trends Genet. (1996);12:472–478. [PubMed: 8973158]
Clayton DA. A big development for a small RNA. Nature. (2001);410:29–31. [PubMed: 11242026]
Colgan DF, Murthy KGK, Prives C, Manley JL. Cell cycle regulation of poly(A) polymerase by phosphorylation. Nature. (1996);384:282–285. [PubMed: 8918882]
Conaway JW, Conaway RC. Transcription elongation and human disease. Ann. Rev. Biochem. (1999);68:301–309. [PubMed: 10872452]
Conaway JW, Shilatifard A, Dvir A, Conaway RC. Control of elongation by RNA polymerase II. Trends Biochem. Sci. (2000);25:375–380. [PubMed: 10916156]
Copertino DW, Hallick RB. Group II and Group III introns of twintrons: potential relationships with nuclear pre-mRNA introns. Trends Biochem. Sci. (1993);18:467–471. [PubMed: 8108859]
Corden JL, Patturajan M. A CTD function linking transcription to splicing. Trends Biochem. Sci. (1997);22:413–416. [PubMed: 9397679]
Culbertson MR. RNA surveillance: unforeseen consequences for gene expression, inherited genetic disorders and cancer. Trends Genet. (1999);15:74–80. [PubMed: 10098411]
Del Gatto-Konczak F, Olive M, Gesnel MC, Breathnach R. hnRNP A1 recruited to an exon in vivo can function as an exon splicing silencer. Mol. Cell. Biol. (1999);19:251–260. [PMC free article: PMC83883] [PubMed: 9858549]
Doherty EA, Doudna JA. Ribozyme structures and mechanisms. Ann. Rev. Biochem. (2000);69:597–615. [PubMed: 10966470]
Friedman DI, Imperiale MJ, Adhya SL. RNA 3′ end formation in the control of gene expression. Ann. Rev. Genet. (1987);21:453–488. [PubMed: 2450522]
Golden BL, Gooding AR, Podell ER, Cech TR. A preorganized active site in the crystal structure of the Tetrahymena ribozyme. Science. (1998);282:259–264. [PubMed: 9841391]
Graveley BR. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. (2001);17:100–107. [PubMed: 11173120]
Guo Z, Sherman F. 3′-end forming signals of yeast mRNA. Trends Biochem. Sci. (1996);21:477–481. [PubMed: 9009831]
Hilleren P, Parker R. Mechanisms of mRNA surveillance in eukaryotes. Ann. Rev. Genet. (1999);33:229–260. [PubMed: 10690409]
Hilleren P, McCarthy T, Rosbash M, Parker R, Jensen TH. Quality control of mRNA 3′ end processing is linked to the nuclear exosome. Nature. (2001);413:538–542. [PubMed: 11586364]
IHGSC (International Human Genome Sequencing Consortium). Initial sequencing and analysis of the human genome. Nature. (2001);409:860–921. [PubMed: 11237011]
Jansa P, Mason SW, Hoffman-Rohrer U, Grummt I. Cloning and functional characterization of PTRF, a novel protein which induces dissociation of paused ternary transcription complexes. EMBO J. (1998);17:2855–2864. [PMC free article: PMC1170626] [PubMed: 9582279]
Keys RA, Green MR. The odd coupling. Nature. (2001);413:583–585. [PubMed: 11595932]
Kim VN, Kataoka N, Dreyfuss G. Role of nonsense-mediated decay factor hUpf3 in the splicing-dependent exon-exon junction complex. Science. (2001);293:1832–1836. [PubMed: 11546873]
Klug A. A marvellous machine for making messages. Science. (2001);292:1844–1846. [PubMed: 11397933]
Korzheva N, Mustaev A, Kozlov M. et al. A structural model of transcription elongation. Science. (2000);289:619–625. [PubMed: 10915625]
Lee S-K, Johnson RE, Yu S-L, Prakash L, Prakash S. Requirement of yeast SGS1 and SRS2 genes for replication and transcription. Science. (1999);286:2339–2342. [PubMed: 10600744]
Lee TI, Young RA. Transcription of eukaryotic protein-coding genes. Ann. Rev. Genet. (2000);34:77–137. [PubMed: 11092823]
Lykke-Andersen J, Aagaard C, Semionenkov M, Garrett RA. Archaeal introns: splicing, intercellular mobility and evolution. Trends Biochem. Sci. (1997);22:326–331. [PubMed: 9301331]
Lykke-Andersen J, Shu M-D, Steitz JA. Communication of the position of exon-exon junctions to the mRNA surveillance machinery by the protein RNPS1. Science. (2001);293:1836–1839. [PubMed: 11546874]
Maden BEH. Eukaryotic ribosomal RNA: guides to 95 new angles. Nature. (1997);389:129–131. [PubMed: 9296485]
Manley JL, Takagaki Y. The end of the message - another link between yeast and mammals. Science. (1996);274:1481–1482. [PubMed: 8966619]
Neuberger MS, Scott J. RNA editing AIDs antibody diversification. Science. (2000);289:1705–1706. [PubMed: 11001738]
Newman A. RNA enzymes for RNA splicing. Nature. (2001);413:695–696. [PubMed: 11607017]
Nigg EA. Nucleocytoplasmic transport: signals, mechanisms and regulation. Nature. (1997);386:779–787. [PubMed: 9126736]
Nilsen TW. A parallel spliceosome. Science. (1996);273:1813. [PubMed: 8815545]
Orphanides G, Reinberg D. RNA polymerase II elongation through chromatin. Nature. (2000);407:471–475. [PubMed: 11028991]
Proudfoot N. Connecting transcription to messenger RNA processing. Trends Biochem. Sci. (2000);25:290–293. [PubMed: 10838569]
Reeder RH, Lang WH. Terminating transcription in eukaryotes: lessons learned from RNA polymerase I. Trends Biochem. Sci. (1997);22:473–477. [PubMed: 9433127]
Reenan RA. The RNA world meets behavior: A→I pre-mRNA editing in animals. Trends Genet. (2001);17:53–56. [PubMed: 11173098]
Scott J. RNA editing: message change for a fat controller. Nature. (1997);387:242–243. [PubMed: 9153385]
Smith CWJ, Valcárcel J. Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem. Sci. (2000);25:381–388. [PubMed: 10916158]
Smith HC, Sowden MP. Base-modification mRNA editing through deamination - the good, the bad and the unregulated. Trends Genet. (1996);12:418–424. [PubMed: 8909139]
Stark H, Dube P, Lührmann R, Kastner B. Arrangement of RNA and proteins in the spliceosomal U1 small nuclear ribonucleoprotein particle. Nature. (2001);409:539–542. [PubMed: 11206553]
Strachan T, Read AP. Human Molecular Genetics. 2nd edition. Oxford: BIOS Scientific Publishers; 1999.
Tarn W-Y, Steitz JA. Highly diverged U4 and U6 small nuclear RNAs required for splicing rare AT-AC introns. Science. (1996);273:1824–1832. [PubMed: 8791582]
Tarn W-Y, Steitz JA. Pre-mRNA splicing: the discovery of a new spliceosome doubles the challenge. Trends Biochem. Sci. (1997);22:132–137. [PubMed: 9149533]
Tollervey D. Small nucleolar RNAs guide ribosomal RNA methylation. Science. (1996);273:1056–1057. [PubMed: 8711484]
Toulokhonov I, Artsimovitch I, Landick R. Allosteric control of RNA polymerase by a site that contacts nascent RNA hairpins. Science. (2001);292:730–733. [PubMed: 11326100]
Tucker M, Parker R. Mechanisms and control of mRNA decapping in Saccharomyces cerevisiae. Ann. Rev. Biochem. (2000);69:571–595. [PubMed: 10966469]
Tuite MF. RNA processing: death by decapitation for mRNA. Nature. (1996);382:577–579. [PubMed: 8757122]
Turner PC, McLennan AG, Bates AD, White MRH. Instant Notes in Molecular Biology. Oxford: BIOS Scientific Publishers; 1997.
Valcárcel J, Green MR. The SR protein family: pleiotropic functions in pre-mRNA splicing. Trends Biochem. Sci. (1996);21:296–301. [PubMed: 8772383]
von Hippel PH. An integrated model of the transcription complex in elongation, termination and editing. Science. (1998);281:660–665. [PubMed: 9685251]
Weis K. Importins and exportins: how to get in and out of the nucleus. Trends Biochem. Sci. (1998);23:185–189. [PubMed: 9612083]
Wente SR. Gatekeepers of the nucleus. Science. (2000);288:1374–1377. [PubMed: 10827939]
Wittschieben BO, Otero G, de Bizemont T. et al. A novel histone acetyltransferase is an integral subunit of elongating RNA polymerase II holoenzyme. Mol. Cell. (1999);4:123–128. [PubMed: 10445034]
Zhou Z, Luo M-J, Straesser K, Katahira J, Hurt E, Reed R. The protein Aly links pre-messenger RNA splicing to nuclear export in metazoans. Nature. (2000);407:401–405. [PubMed: 11014198]