NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Mattick J, Amaral P. RNA, the Epicenter of Genetic Information: A new understanding of molecular biology. Abingdon (UK): CRC Press; 2022 Sep 20. doi: 10.1201/9781003109242-5

RNA, the Epicenter of Genetic Information: A new understanding of molecular biology.

Chapter 5: Strange Genomes, Strange Genetics

By the time of the lac operon, it was evident that the genetics of complex eukaryotes is different to that of bacteria. Plant and animal genomes were found to be orders of magnitude larger and to contain large amounts of differentially expressed ‘repetitive’ sequences, leading to the suggestion that gene regulatory information might occupy the bulk of the genomic information in complex organisms. The repetitive sequences were found to be derived from transposable elements, some of which were shown to act as ‘controlling elements’ affecting phenotypic characteristics in maize, similar to position-effect variegation in Drosophila, neither of which could be explained in conventional terms. Other equally inexplicable genetic phenomena were also observed, including paramutation (‘rogue’ non-Mendelian patterns of inheritance), imprinting (parental allele-specific expression) and transvection (allelic crosstalk), the first indications of epigenetic inheritance and RNA-mediated gene regulation. Homeotic genes controlling body plan formation were identified in Drosophila, along with adjacent non-protein-coding loci that modulate their expression by interaction with genes encoding repressive and activating epigenetic modifiers, notably Polycomb and Trithorax, which later turned out to be ubiquitous in animals and plants and to encode proteins that modify histones. Brave attempts were made to integrate the observations of heterogeneous nuclear RNAs and repetitive sequences into a network model of gene regulation during development, but such models were considered too speculative and unnecessary, then overrun in the stampede of gene cloning.

The genomes of ‘higher organisms’ are hundreds to thousands of times larger than that of E. coli. 1 Viral genomes range from ~1 kb to 2.5 Mb. 2 , 3 Bacterial and archaeal genomes range from ~300 kb to 11 Mb (the upper limit, apparently, see Chapter 15), protozoan and fungal genomes from ~8 to 100 Mb (most less than 50 Mb), 4 animal genomes from ~100 Mb to 5 Gb (average 1.3 Gb) and plant genomes a from ~100 Mb to 20 Gb (average 1.5 Gb). 8 By comparison, the combined length of the two largest human genes, dystrophin (2.4 Mb; required for muscle integrity) 9 , 10 and the neurexin CNTNAP2 (2.3 Mb; which functions in the nervous system), 11 is approximately the same as the entire E. coli genome (~4.6 Mb).

In 1964, Friedrich Vogel estimated that the human genome (~3.3 Gb) encodes around 7 million genes, based on the size of hemoglobins and the assumption that “most of the DNA works as genetic material”. 12 Similar numbers were obtained using similar logic by others – in 1969 the theoretical biologist Stuart Kauffman extrapolated James Watson’s estimate of 2,000 genes in E. coli in the first (1965) edition of his book ‘Molecular Biology of the Gene’ 13 to predict that there are 2 million genes in humans. 14

Vogel recognized that his estimate was disturbingly high and speculated that “the systems of higher order which are connected with structural genes in operons and regulate their activity might occupy a much larger part of the genetic material than the structural genes”. 12
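The arithmetic behind an estimate of this kind can be reconstructed roughly as follows. This is an illustrative sketch, not necessarily Vogel’s exact calculation: it assumes a globin-sized polypeptide of about 150 amino acids, three base pairs per codon, and (following his premise) that essentially all of the ~3.3 Gb haploid genome is coding.

```latex
N_{\text{genes}} \approx \frac{3.3 \times 10^{9}\ \text{bp}}{150\ \text{codons} \times 3\ \text{bp/codon}}
  = \frac{3.3 \times 10^{9}}{450} \approx 7.3 \times 10^{6}
```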

On the other hand, the existence of upward exceptions, notably in some amoebae, plants and arthropods, to an otherwise consistent increase in the amount of DNA in cells of eukaryotes of differing developmental complexity (Figure 5.1) led to the so-called C-value enigma, 8 which was invoked to support the growing idea that the genomes of higher organisms carry a lot of superfluous DNA (Chapter 7).

Figure 5.1. Britten and Davidson's 1969 graph of the increase in the minimum amount of DNA that had been recorded for “species at various grades of organization”. (Reproduced from Britten and Davidson with permission of the American Association for …)

Repetitive DNA

In the late 1960s, Roy Britten and colleagues analyzed the patterns of DNA renaturation 16 , 17 and concluded that, unlike in bacteria, “in general, more than one-third of the DNA of higher organisms is made up of sequences which recur anywhere from a thousand to a million times per cell”, and that these sequences were likely to have “arisen from large-scale precise duplication of selected sequences, with subsequent divergence caused by mutation and the translocation of segments of certain member sequences” (Figure 5.2).

Figure 5.2. Britten and Kohne's 1968 graph of renaturation hybridization kinetics of denatured calf thymus DNA (circles and triangles) showing ~40% rapidly renaturing DNA, indicating the existence of high copy numbers (estimated to be an average of 100,000) of repetitive sequences (left side of graph) and 50%–60% single copy sequences (right side).
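Renaturation curves like this one are conventionally read through ideal second-order (‘Cot’) kinetics, in which a sequence present in n copies per genome finds its partner n times faster than a single-copy sequence. The Python sketch below assumes two kinetic classes loosely matching the caption’s estimates (40% of the DNA at ~100,000 copies, 60% single copy); the rate constant `k` is an arbitrary illustrative value, not a measured one, and the sketch ignores fragment length, salt and temperature.

```python
def fraction_single_stranded(cot: float, components: list[tuple[float, int]], k: float) -> float:
    """Fraction of denatured DNA still single-stranded at a given Cot.

    cot: initial DNA concentration (mol nucleotide/L) multiplied by time (s).
    components: (fraction_of_genome, copy_number) for each kinetic class.
    k: assumed second-order rate constant, L/(mol*s), for a single-copy sequence.
    Each class follows C/C0 = 1 / (1 + k * n * Cot): a sequence repeated n times is
    n times more concentrated than a unique one, so it reassociates n times faster.
    """
    return sum(f / (1.0 + k * n * cot) for f, n in components)


def cot_half(k: float, copy_number: int) -> float:
    """Cot at which half of a kinetic class has reassociated."""
    return 1.0 / (k * copy_number)


if __name__ == "__main__":
    # Hypothetical two-class genome: 40% highly repeated, 60% single copy.
    genome = [(0.4, 100_000), (0.6, 1)]
    k = 1e-3  # illustrative value only
    for cot in (1e-4, 1e-2, 1.0, 1e3, 1e6):
        print(f"Cot={cot:g}: {fraction_single_stranded(cot, genome, k):.3f} single-stranded")
    print("Cot1/2 repeated:", cot_half(k, 100_000), "single copy:", cot_half(k, 1))
```

With these assumed values the curve falls in two steps: the repeated class reassociates around Cot ≈ 0.01, leaving about 60% of the DNA single-stranded near Cot ≈ 1, and the single-copy class reassociates around Cot ≈ 1000, a hundred-thousand-fold later.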

They suggested that these sequences and events have important evolutionary implications:

The range of frequency of repetition is very wide, and there are many degrees of precision of repetition in the DNA of individual organisms. During evolution the repeated DNA sequences apparently change slowly and thus diverge from each other. There appears to be some mechanism which, from time to time, extensively reduplicates certain segments of DNA, replenishing the redundancy. 17

Britten and colleagues also cited many studies that had observed changes in the types of hybridizable RNA synthesized during embryonic development and liver regeneration (noting that the hybridization conditions used selectively favored repetitive sequences). These studies provided “direct evidence for the genetic function of at least some of the repeated DNA sequences” and showed “that during the course of differentiation different families of repeated sequences are expressed at different stages”. 17 , 18 Many other studies have since confirmed that ‘repeat sequences’ are a prevalent component of the genomes of multicellular eukaryotes generally, 19 and that they are transcribed in a highly regulated manner, especially in the early stages of embryonic development (Chapter 16).

These results also explained the earlier observation that eukaryotic DNA often produced ‘satellite’ bands in equilibrium density gradients, indicating a substantial fraction with a different nucleotide composition. Heterogeneity of sequence composition was subsequently found throughout the mammalian genome, reflecting regional variation in G+C content; Giorgio Bernardi termed the large, compositionally homogeneous regions ‘isochores’. 20 , 21

However, because such sequences did not fit the conventional conception of a ‘gene’, the presence of large amounts of ‘repetitive’ DNA (a term that became pejorative) reinforced the idea that complex eukaryotes could tolerate a great deal of nonfunctional DNA, rather than stimulating new thinking, with the notable exception of Britten himself and a few others (see below).

The repetitive sequences discovered by Britten turned out to be largely derived from transposable elements: short and long interspersed nuclear elements (SINEs and LINEs), which are mobilized by reverse transcription, and DNA segments mobilized by a ‘cut-and-paste’ mechanism. These elements range in length from a few hundred to 30 thousand base pairs and can be excised, or transcribed and copied, and reinserted into genomes at other places, conveying cassettes of genetic information. 19 , 22

Controlling Elements

Transposons were discovered in 1948 by Barbara McClintock, b who observed unexpected position-dependent effects of mobile DNA segments in maize that changed, among other things, the colors and color patterns of corn kernels. These were the mobile ‘Dissociation-Activator’ (Ds-Ac) elements and another that she called ‘Suppressor-mutator’ (Spm). Her findings were published well before Jacob and Monod’s work on the regulation of the lac operon in E. coli, 25 , 26 with which she later drew parallels 27 (Figure 5.3).

Figure 5.3. Ears of corn kept by McClintock to illustrate the transposition of ‘controlling elements’ – the ‘standard’ position of the transposable element Ds (Dissociation) located after C (colorless), Sh (shrunken) and wx (waxy) on chromosome 9 (upper ear, from 1947) and the transposed position of Ds now positioned before Sh, Bz (bronze) and wx (lower ear, from 1949).

McClintock’s work was revolutionary. Rather than focusing on mutations that affect viability, she developed and used cytogenetic analysis of X-ray-induced chromosomal breakage-fusion-bridge cycles for creating and mapping mutations. 28 , 29 She was the first to show that genetic recombination involved physical exchange of chromosomal segments, 30 the first to observe ‘jumping genes’ and the first to show that the position of their insertion alters the expression of nearby genes (for example at the bronze locus, which is involved in the biosynthetic pathway for the pigment anthocyanin) 31 – quite different from normal conceptions of regulatory sequences.

McClintock reported her data on transposition in the formal scientific literature in 1950, 33 and in person in 1951 at the influential Cold Spring Harbor Symposium on Quantitative Biology, 25 an occasion that geneticist Evelyn Witkin later recalled as follows:

In her own [McClintock’s] words, the response was puzzlement, and in some instances hostility. Certainly, as I remember it, there was baffled silence after her talk and little or no discussion of her densely documented evidence and argument for transposable elements and their effects on gene expression. The audience seemed embarrassed by the lack of response, understandably, because McClintock was respected and admired as a great geneticist. c Here, her conclusions were too radically in conflict with the entrenched genetic concept of a stable genome, and her data too complex, to allow for rapid or easy acceptance, although a small number of geneticists who had come to know her work well believed it to be profoundly important. 35

Highlighting the parallels with position-effect variegation in Drosophila and heterochromatin (and also citing the work of Ed Lewis, see below), McClintock said:

The behavior of these new mutable loci in maize cannot be considered peculiar to this organism. The author believes that the mechanism underlying the phenomenon of variegation is basically the same in all organisms … Is it usually an orderly mechanism, which is related to the control of the processes of differentiation? 33

McClintock’s observations were confirmed by Royal Alexander Brink and Robert Nilan, who studied another mobile element called ‘Modulator’ and concluded that their findings “cannot be explained in conventional genetic terms”. They speculated that Modulator “may belong to the obscure category of germinal substances, termed heterochromatin, which has been associated … with different variegated phenotypes in Drosophila”. 36

The genetic behavior of these families of elements is complex but, in general, each family consists of a small number of transposition-competent sequences and a larger number of derived, transposition-defective sequences d whose mobility depends on a factor supplied by an active member of the family. 39 Indeed, Ac and Ds are autonomous and non-autonomous transposable elements, respectively, 40 which explains the dependence of the latter on the former. Other elements, such as Suppressor-mutator, are more complicated. 41 , 42

McClintock eventually received the Nobel Prize in Physiology or Medicine in 1983 for her discovery of transposition. The award coincided with demonstrations by Nina Fedoroff and others that Ac encodes a transposase responsible for the transposition of both itself and Ds, 39 , 43 the identification of TEs (transposable elements, see below) in Drosophila 44–46 and bacteria, 47 , 48 and the growing use of TEs as mutagenic tools, especially ‘P-elements’ in Drosophila. 49 , 50

McClintock insisted that her most important finding was that such ‘controlling elements’ – which others such as Brink preferred to call ‘transposable elements’, as the term was “less interpretative” 51 – played a role in normal development, as both the timing and frequency of transposition and associated chromosomal rearrangements are developmentally regulated and have developmental consequences. 25 , 26 , 52–54 She also showed that some of these elements exhibited reversible mutations. 55

McClintock conceptualized the signatures of epigenetic control of gene expression decades before it was formally recognized and studied. She identified genetic elements that were either active or inactive only in the crowns of corn kernels, as well as elements that were active only in lower side branches (called ‘tillers’) or exhibited a different pattern of activity in cobs from the tillers and primary stalk. 52 , 54 This implied the existence of an additional regulatory mechanism that determines the pattern of TE expression/activity during development. McClintock correctly speculated that this mechanism involved heritable modifications of the gene that are not caused by changes to its DNA sequence, later shown by Rob Martienssen and colleagues to be due to DNA methylation and histone modifications that regulate heterochromatin formation and are imposed by enzymes directed by RNA 56–60 (Chapters 12, 14 and 16).

McClintock intuited that her controlling elements did not encode regulatory proteins but rather functioned as regulatory modules. However, when transposable elements were discovered in bacteria by James Shapiro in 1969, 47 and bacterial transposons were subsequently found to carry protein-coding genes such as those conferring antibiotic resistance, her theory that TEs are involved in regulating gene expression during development was largely ignored, to her ongoing frustration. 51 Nor did it help that the genetic phenomena involved were complex and not easily understood, unlike the simple operator-repressor model of transcription factor action exemplified by the lac operon. 32

Nonetheless, she pioneered the concept of the dynamic genome, including its responses to stress, 61 and her ideas were well ahead of those of her contemporaries. We now know that transposon-derived sequences are major sites of epigenetic modification, regulate many aspects of gene expression during plant and animal development, and are mobilized in evolutionary time for phenotypic diversification and in real time in the brain (Chapters 10, 16 and 17).

Paramutation, Imprinting and Transinduction

In 1915, Bateson and Caroline Pellew reported unusual “rogue” non-Mendelian patterns of inheritance in peas, 62 , 63 which were occasionally observed thereafter in other species, but were not studied in a systematic way until Brink coined the term ‘paramutation’ in the 1950s to describe the atypical inheritance of traits displayed by some genes in maize, tomato and other species. 64–68 While initially thought to be restricted to plants, later studies showed that paramutation and related phenomena also occur in animals, including mammals, likely to a much greater extent than has been appreciated, and that paramutation involves the intergenerational transmission of epigenetic information (Chapter 17).

Waddington, who pioneered the concept that epigenetic mechanisms control the trajectories of development, 69 reported, also in the 1950s, that exposure of Drosophila eggs to ether for a few generations resulted in the inheritance of a bithorax-like phenotype (see below) in subsequent (untreated) generations. 70 He also found similar inheritance in later generations of another phenotype, ‘crossveinless’ wings, induced by heat shock of pupae. 71 Waddington concluded that:

All phenotypes are modified, to a greater or lesser extent, by the environment. All genotypes under natural conditions will be subject to selection pressures relating to the manner in which their development is modified by the environment. The phenotypic effect of any new gene mutation must therefore be to some extent influenced by the kind of developmental flexibility which has been built into the rest of the genotype by selection for its response to the environment. 70

Such ‘genetic assimilation’ has profound implications (Chapter 18), but was waved away by most theoretical and molecular biologists of the time. 72 It is difficult, in any case, to disentangle the interplay of genetic variation with epigenetic inheritance, 73 , 74 and recent evidence suggests that, at least in some instances, the cause is stress-induced mutations. 75 , 76

In 1974, D. R. Johnson reported that an allele of the brachyury gene (which encodes a transcription factor required for mammalian gastrulation and early organogenesis 77 ) has a lethal phenotype if inherited from the mother, but not if inherited from the father. 78 In 1977, Mary Lyon described a similar phenomenon at a translocated locus affecting the relative number of male and female offspring. 79 This ‘parental imprinting’ was characterized independently in 1984 by the groups of Azim Surani and Davor Solter, who showed that development requires both the maternal and the paternal genome, because of ‘marked’ loci (‘epialleles’) whose expression differs depending on the parent of origin. 80–83

Among vertebrates, parental imprinting occurs only in mammals and is thought to be an adaptation to placental biology driven by sexual antagonism, maternal-offspring coadaptation and/or kinship interest. 84–87 A high proportion of the imprinted genes have arisen through retrotransposition 88–90 (although they are depleted of SINE elements 91 ) and imprinting appears to have evolved with the invasion of particular classes of repeats at the marsupial-eutherian interface. 86 , 92 Interestingly, non-coding RNAs are at the heart of the regulation of genomic imprinting and X-chromosome inactivation in mammals (Chapters 9, 13 and 16).

Dysregulated imprinting has since been associated with disorders such as Angelman and Prader-Willi syndromes in humans, 93–95 as well as with the callipyge (‘beautiful buttocks’) ‘polar overdominance’ phenotype in sheep involving cis- and trans-epiallelic interactions 96–98 and non-coding RNAs. 99 Similar epigenetic mechanisms that regulate transposon activity for genome defense and gene regulation were found to underlie the phenomena (discovered in the 1980s and 1990s) of ‘repeat-induced’ and ‘homology-dependent’ gene silencing e in fungi, plants and animals 100–105 (Chapter 12).

Another odd phenomenon, termed ‘transinduction’, was described in 1997 by Hilary Ashe, Nick Proudfoot and colleagues, who found that transient transfection of cells with a beta-globin gene construct induced the expression of non-coding RNAs from the regulatory ‘locus control region’ and intergenic regions of the globin locus (but not the endogenous beta-globin mRNA), which required transcription of the transgene but not its translation. 106 That is, ectopic expression of an mRNA has regional effects on the gene expression profile of its locus in the absence of the encoded protein, which indicates that transcription itself and/or mRNAs also convey regulatory signals.

The Bithorax ‘Complex Locus’

Clues pointing to epigenetic control of development and the involvement of regulatory RNAs were also emerging in the 1940s and 1950s from Ed Lewis’ studies on the genetics of Drosophila body segment specification, based on mutant and recombination phenotypes, particularly of ‘homeotic’ genes, mutations in which transform the identity of body segments. 107–109

The term ‘homeosis’ was coined by Bateson in 1894 to describe phenotypic variations in which “something is changed into the likeness of something else”. 110 , 111 The first homeotic gene was identified in 1915 by Calvin Bridges, 107 who found a mutation (‘bithorax’) that converted the third thoracic segment into the second, producing an additional pair of wings, f and mapped it to a genomic region later named by Lewis the ‘bithorax complex’ (Figure 5.4). 113–119

Figure 5.4. (a) Wild-type and (b) a bithorax mutant Drosophila melanogaster. (Reproduced from Akbari et al. with permission from Elsevier.)

Mutations in the bithorax complex cause spectacular perversions of development. The bithorax phenotype is caused by loss-of-function mutations in the protein-coding gene Ultrabithorax (Ubx). Other mutations convert antennae into legs (Antennapedia, Antp) or genitalia into legs or antennae (abdominal-A, abd-A, and Abdominal-B, Abd-B), all likewise caused by mutations affecting ‘homeotic’ proteins (the antenna-to-leg transformation of Antennapedia arises from dominant gain-of-function alleles). 113–119 It was subsequently realized, after the gene cloning revolution (Chapter 6), that duplicated paralogous clusters of homeotic genes, with the same spatial order of paralogs, occur in vertebrates. 121

Lewis found that the spatial expression patterns of the homeotic genes and their phenotypes are modulated by mutations in nearby ‘cis’-regulatory loci called anterobithorax (abx), bithorax (bx), bithoraxoid (bxd), contrabithorax (Cbx) and postbithorax (pbx), as well as infra-abdominal (iab1–4, modulating abd-A; and iab5–8, modulating Abd-B), which he termed ‘pseudoalleles’. 122 Some of these loci turned out to reside in the introns of the protein-coding genes, but most could be easily separated by recombination, indicating that they are discrete, separate genetic elements. 123

These genes also exhibited intriguing position effects: trans-heterozygotes were more abnormal than cis-heterozygotes, indicating a local interaction. In trans-heterozygotes, one chromosome carries defective regulation and the other a defective protein, whereas in cis-heterozygotes one chromosome retains a normal pattern of expression of a functional protein. 122 , 124
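The cis-trans reasoning can be made concrete with a toy model in Python. It assumes, hypothetically, that each regulatory element acts strictly in cis, so a chromosome makes functional protein in the normal pattern only if both its own regulatory element and its own coding sequence are intact; all names are illustrative. Under this strict-cis assumption the trans-heterozygote is fully mutant, which is the baseline against which the pairing-dependent effects described below were detected.

```python
# A chromosome copy is modelled as (regulator_ok, coding_ok).
Chromosome = tuple[bool, bool]


def functional_output(chrom: Chromosome) -> bool:
    """Strict-cis assumption: a copy needs its own regulator AND its own coding region."""
    regulator_ok, coding_ok = chrom
    return regulator_ok and coding_ok


def phenotype(homolog_a: Chromosome, homolog_b: Chromosome) -> str:
    """Wild type if at least one homolog makes functional protein in the normal pattern."""
    if functional_output(homolog_a) or functional_output(homolog_b):
        return "wild type"
    return "mutant"


WILD: Chromosome = (True, True)
# cis-heterozygote: both mutations on one chromosome, a fully wild-type homolog opposite.
CIS_HET = ((False, False), WILD)
# trans-heterozygote: regulatory mutation on one homolog, coding mutation on the other.
TRANS_HET = ((False, True), (True, False))

if __name__ == "__main__":
    print("cis:", phenotype(*CIS_HET))      # cis: wild type
    print("trans:", phenotype(*TRANS_HET))  # trans: mutant
```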

Transvection

In 1954, Lewis discovered that a structural rearrangement that moved Ubx to a different chromosomal location resulted in a mutant phenotype in flies heterozygous but not those homozygous for the rearrangement, which he attributed to disruption of the pairing of the two alleles. 125

Based on the ‘cis-trans position effect’, Lewis proposed that pseudoallelic regions control the hierarchical expression relations (or ‘polarity’) observed in the cluster either by “cis-vection”, in which the substances generated by the pseudoalleles act on adjacent genes, or by “trans-vection”, in which the proximity (somatic pairing) of homologous sequences explains the influence of alleles on one chromosome over the alleles on the other. 122 , 124

Transvection, or ‘allelic cross-talk’, has been observed at many other loci in Drosophila and, among other things, regulates the sexually dimorphic expression of X-linked genes. 126–128 The phenomenon usually appears to be proximity-dependent, but not always. 129 , 130 Some interactions can occur at a distance and may not strictly depend on homolog pairing, as translocations and mutations in the gene zeste, which normally disrupt transvection, have limited or no effects on pairing. 131

To explain transvection, Lewis initially invoked the operon model, postulating that pseudoallelic loci contained genes responsible for the synthesis of substances that are coordinately regulated by an operator element 108 , 132 and that a locally produced diffusible substance would activate a cascade of reaction steps involving the other substances (the “sequential reaction model”). 122 , 124 However, if these substances were proteins, their production had to occur close to their sites of action because of the proximity effect, i.e., in the nucleus, whereas translation occurred in the cytoplasm.

Consequently, as with transposition, the ‘pairing-dependent’ transvection phenomenon was difficult to explain in conventional terms, unless “homologous chromosomes cooperate with one another in the transcription process”. 108 Lewis then floated the idea that substances “used only briefly in development” are synthesized “at the chromosomal level”. 108

While the obvious candidate was RNA, Lewis suggested that the substances involved might be RNAs, polypeptides, or products of the enzymatic activity of such polypeptides, 132 , 133 and was careful not to make assumptions, g famously saying that “The laws of genetics had never depended upon knowing what the genes were chemically and would hold true even if they were made of green cheese”, perhaps a frustrated reaction and pointed riposte to colleagues who told him that he “was simply dealing with missense and nonsense mutants within a protein and that all we were doing was mapping sites within a single protein coding unit!”. 132 , 134

Decades later, in the 1980s, David Hogness (who cloned the first eukaryotic transposon), Mike Akam and colleagues found that the bithorax complex produces multiple overlapping and intergenic RNAs, many of which do not encode proteins, 135–138 including those from the (presumed) cis-regulatory sequences encompassing Polycomb and Trithorax ‘response elements’ 126 , 139 (see below). For example, the bxd locus produces a 27 kb transcript whose expression is highly regulated during embryogenesis, in a pattern that partially reflects the expression pattern of Ubx. 135–138 The regulatory elements of abd-A and Abd-B are also transcribed. 140–142

With this evidence in mind, others also suggested that transvection involves RNA transactions, 143–145 concluding in one case that “the genetic analysis of these transvection effects suggests that the transcription of the CbxIRM and Cbx2 alleles depends on RNAs of short radius of action from the homologous Ubx gene”. 143 Subsequent studies showed that transvection occurs by sequences on one homolog directing the expression of the cognate transcription unit on the other, 146 and that these sequences are ‘enhancers’, 147–150 genes that control the spatial expression patterns of nearby and distal protein-coding genes during development and produce non-coding RNAs in the cells in which they are active (Chapters 14 and 16).

Like other genes and genetic phenomena affecting development that were first discovered in maize or Drosophila and thought to be idiosyncratic, transvection was subsequently found to occur in fungi, plants and mammals and to be a general feature of multicellular eukaryotes, although it remains poorly understood. 151 , 152

Epigenetic Modifiers

The other major finding of this period, whose importance was also not broadly appreciated until much later, was the identification of genes that had a global effect on the expression of homeotic genes, termed the Polycomb group (PcG) and Trithorax group (TrxG). The first PcG gene, Polycomb (Pc), was discovered in 1947 by Pamela Lewis. 153 It was so named because the most common effect of PcG mutations is the appearance of extra pairs of sex combs on the legs of the second and third thoracic segments in males, a feature normally found only on the legs of the first thoracic segment. These mutants are lethal when homozygous but cause mild homeotic transformations when one copy is incapacitated, a semi-dominant ‘haplo-insufficient’ phenotype.

Related genes with similar effects, such as Extra sex combs, Sex combs on midleg and Additional sex combs, were subsequently identified. 154–156 In 1978, Ed Lewis observed that Pc “seems to code for a repressor of the complex”, as it represses expression of Ubx in anterior segments, which are transformed into more posterior ones when Pc is mutated. 113

Mutants that had the opposite effect were also identified, notably by Phil Ingham in the 1980s. These mutants, including Trithorax and others such as Enhancer of zeste, cause embryonic segments to be transformed into more anterior ones, because the corresponding wild-type proteins counteract PcG repression. 123 , 157–161 Many TrxG genes were subsequently identified through genetic screens for mutations that suppress the phenotypes of PcG mutants. 162–164

The observation that PcG and TrxG factors are required to maintain homeotic gene expression gave rise to the hypothesis that PcG (repressive) and TrxG (activating) proteins act as a “cellular memory system”. 165 , 166 Orthologs of PcG and TrxG genes were later found to be ubiquitous in plants and animals 167 and to encode histone-modifying proteins and ATP-dependent chromatin-remodeling complexes, or repressors thereof, for the epigenetic control of gene expression during development (Chapter 14).

Indeed, if all this sounds complicated, it is, and it was too much for the operon crowd to digest. Max Delbrück, the physicist-turned-bacteriophage geneticist, wrote in a 1963 letter to a former member of Lewis’ laboratory: “I then plunged into the bithorax saga for which Lewis very kindly sent me his latest manuscript … I must say I am puzzled … [and] … strongly suspect that there is something wrong here in the analysis.” 132

The Britten and Davidson Model

In 1969, Britten and Eric Davidson h published a paper entitled ‘Gene regulation for higher cells: a theory’, 15 which attempted to integrate the findings of the preceding decades, including the prevalence of repetitive sequences in higher organisms, Davidson’s observations on ‘informational RNAs’ in amphibian and sea urchin embryos, 170 and the data on embryo development that Davidson had assembled the year before in his book ‘Gene activity in early development’. 171 This paper devoted special attention to the enormous size of the genomes of higher organisms, the diversity of transcripts in the nucleus and the abundance of repetitive sequences that are transcribed in a cell-specific fashion. It was the first serious consideration of gene networks (“gene batteries”) and regulatory circuits in the evolution and development of higher eukaryotes. 172

Britten and Davidson’s theory was partly but not entirely influenced by Jacob and Monod’s principles (with elements analogous to operators, regulator and structural genes), but was distinct in that it encompassed the difference in genomic organization and transcriptional output between eukaryotes and prokaryotes, and emphasized the importance of genomic structure in the cascade of gene expression during development. An important proposition was that “the (normal) state of the higher cell genome is histone-mediated repression and that regulation is accomplished by specific activation”. 15

Britten and Davidson proposed that genes in eukaryotes were differentiated into functional classes: structural (or “producer”) genes, which encode proteins, and “integrator genes”, which encode regulatory molecules. The integrator-gene concept was influenced by McClintock’s model of gene regulation and was developed before it was known that transposable elements were the main source of repeated sequences.

As in Jacob and Monod’s original model, Britten and Davidson posited that regulatory genes produce RNAs. They extended this notion to suggest that such RNAs would connect regulatory networks to activate the gene batteries that specify the phenotypes of different cell types. Focusing on regulation at the level of transcription, they again saw the advantage of RNAs as regulatory molecules in their base-pairing ability, which would allow genomic regulation through recognition of specific “receptor sequences”. 15

A major factor in their proposal was the literature describing chromatin-associated RNAs and the huge heterogeneous nuclear RNAs that contain repetitive sequences. According to the model, RNAs bind to DNA in a sequence-specific manner to provide the basis of specific gene regulation, which would explain the greater complexity of RNA populations in the nucleus compared with the cytoplasm. It would also explain the different spectra of RNAs transcribed from repeated sequences during early development and cell differentiation. 170 , 173

Britten and Davidson thought that, although either RNA or proteins could be the diffusible regulatory molecules, RNA represented the “simpler alternative”. 15 In addition, they argued that the potential for forming new gene batteries (acting in a specific cellular program) would differ greatly between the two alternatives, because RNA specificity would depend only on target sequence complementarity, with implications for the modularity and co-evolution of these sequences (Chapter 16).

They were also motivated by early measurements of cellular DNA content in different species, which showed that the amount of DNA per cell was greater in “higher” than in “primitive” forms of invertebrates. 174 Britten and Davidson plotted the amount of DNA in a vast range of organisms (from viruses and bacteria to mammals) and showed that the minimum genome size increased greatly with increasing organizational complexity. They then reasoned that most of the known biosynthetic pathways were already present in unicellular organisms, and that a significant increase in the number of ‘structural genes’ in higher organisms was unlikely. 15 Therefore, they posited that the difference between sponges and mammals would lie in the increased complexity of regulation, embodied in the “integrator genes” and “receptor sequences”. The expansion of non-protein-coding sequences, but not coding sequences, in animal genomes was later confirmed by genome sequence data (Chapter 10). 175–177

Finally, Britten and Davidson proposed that the repetitive sequences played a role in coordinating the expression of gene networks during development and differentiation. In fact, Britten and David Kohne, in their original paper published the previous year, perhaps aware of the growing consensus that repetitive and other non-protein-coding sequences in the huge genomes of higher organisms are junk (Chapter 7), stated: “A concept that is repugnant to us is that about half of the DNA of higher organisms is trivial or permanently inert (on an evolutionary time scale). Furthermore, at least some of the members of DNA families find expression as RNA. We therefore believe that the organization of DNA into families of related sequences will ultimately be found important to the phenotype.” 17 In their model, the repeat elements comprise “receptor genes” adjacent to “producer genes”, functioning as target sequences for regulatory RNAs (Figure 5.5).

Figure 5.5. The 1969 Britten and Davidson gene regulatory network model. (Reproduced from Britten and Davidson with permission from the American Association for the Advancement of Science.)
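The recognition logic of the model, in which a regulatory RNA activates every producer gene carrying a complementary receptor sequence, can be made concrete with a toy sketch. All gene names and sequences below are hypothetical and chosen only for illustration; the 1969 paper did not specify sequences, and exact-match recognition is a simplifying assumption (real base pairing tolerates mismatches, and the sketch ignores the double-stranded state of DNA).

```python
from typing import Dict, List

# Hypothetical toy rendering of the Britten-Davidson (1969) recognition logic.
# Gene names and sequences are invented for illustration, not taken from the paper.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")

# Each producer gene is flanked by one or more receptor sequences (DNA, 5'->3').
PRODUCER_RECEPTORS: Dict[str, List[str]] = {
    "producer_1": ["GATTACA", "CCGGAAT"],  # carries receptors for two RNAs
    "producer_2": ["GATTACA"],
    "producer_3": ["CCGGAAT"],
    "producer_4": ["TTTAAAG"],             # no integrator RNA below targets this one
}

# Each integrator gene yields a regulatory RNA (5'->3').
INTEGRATOR_RNAS: Dict[str, str] = {
    "integrator_X": "UGUAAUC",
    "integrator_Y": "AUUCCGG",
}


def dna_target_of(rna: str) -> str:
    """Return the DNA sequence (5'->3') that the RNA can base pair with (antiparallel)."""
    return rna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def battery(rna: str) -> List[str]:
    """Producer genes with a receptor sequence exactly complementary to the RNA."""
    target = dna_target_of(rna)
    return sorted(gene for gene, receptors in PRODUCER_RECEPTORS.items()
                  if target in receptors)


if __name__ == "__main__":
    for integrator, rna in INTEGRATOR_RNAS.items():
        print(integrator, "->", battery(rna))
```

Here producer_1 is switched on by either integrator RNA, illustrating how, in a scheme of this kind, a single gene with several receptor sequences can belong to more than one battery.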

These ideas were extended in subsequent publications during the 1970s, including proposed molecular mechanisms for gene-by-gene rewiring in development and evolution. 178–182 They were also supported by Nina Fedoroff’s observation that “large RNAs in the mammalian cell nucleus contain elements complementary to ones in other large nuclear RNAs”, 183 although it was later noted by Pederson that “this intriguing finding was never pursued by this or any other group, at least as the extant literature speaks”. 184

Importantly, Britten and Davidson developed the idea that multicellular development was dependent on an extensive, interwoven regulatory architecture embedded in the structural organization of the genome, and that developmental ‘novelty’ – phenotypic diversification – is achieved mainly by variation in the regulatory architecture, 172 , 185–187 not the repertoire of encoded proteins (although there are lineage-specific expansions of protein families, such as homeotic genes in vertebrates and olfactory receptors in mice, and some novel proteins that appear from time to time in evolution). The concept that phenotypic diversity in plants and animals is largely achieved through mutations altering regulatory circuits i was extended by Mary-Claire King and Allan Wilson in their 1975 paper ‘Evolution at two levels in humans and chimpanzees’ 188 and Jacob’s 1977 paper ‘Evolution and tinkering’, 189 and is now well accepted in the field referred to as ‘Evo-Devo’, 190 , 191 even extending to the evolution of enzyme activity. 192

Waddington commented that the Britten-Davidson model of the mechanisms controlling embryogenesis was “the first … to make sense”. 193

Boolean Models of Combinatorial Control

In parallel with Britten and Davidson, a Boolean network model for the regulation of gene activity was advanced by Kauffman. 14 , 194–196 Kauffman invoked bacterial operon promoter-repressor systems as the model 195 and suggested that combinations of binary (‘on-off’) DNA-protein interactions could produce stable gene activity patterns from small numbers of variables, an approach later embraced by Davidson. 185 , 197–199 Such approaches (which did not consider, but do not preclude, regulatory RNAs) have been used to correctly predict some gene expression profiles; 198 , 200 , 201 they also gave comfort to the generally accepted idea that combinations of protein interactions are sufficient to account for ontogeny, although they have otherwise had little impact on the consciousness of molecular biologists traditionally concerned with nuts and bolts.
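The core idea of such Boolean models can be sketched in a few lines of Python. The three-gene wiring below is hypothetical and chosen only for illustration (Kauffman's own analyses used randomly wired networks). Each gene is either on (1) or off (0), all genes update synchronously from the current state, and every trajectory eventually falls into an attractor, either a fixed point or a repeating cycle, which Kauffman interpreted as a stable gene activity pattern, that is, a cell type.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]

# Hypothetical three-gene network; each rule computes one gene's next value.
GENES = ("A", "B", "C")
RULES: Dict[str, Callable[[Dict[str, int]], int]] = {
    "A": lambda s: int(not s["B"]),              # B represses A
    "B": lambda s: int(not s["A"]),              # A represses B
    "C": lambda s: int(s["A"] and not s["B"]),   # C needs A on and B off
}


def step(state: State) -> State:
    """Synchronously update every gene from the current state."""
    named = dict(zip(GENES, state))
    return tuple(RULES[g](named) for g in GENES)


def attractor(state: State) -> List[State]:
    """Iterate until a state repeats; return the cycle the trajectory falls into."""
    seen: Dict[State, int] = {}
    trajectory: List[State] = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[seen[state]:]


def all_attractors() -> List[Tuple[State, ...]]:
    """Collect the distinct attractors reached from every possible start state."""
    found = set()
    for start in product((0, 1), repeat=len(GENES)):
        cycle = attractor(start)
        # Rotate so the same cycle reached from different starts is counted once.
        k = cycle.index(min(cycle))
        found.add(tuple(cycle[k:] + cycle[:k]))
    return sorted(found)


if __name__ == "__main__":
    for cyc in all_attractors():
        print(" -> ".join("".join(map(str, s)) for s in cyc))
```

Run as a script, it reports a two-state cycle (000 -> 110) and two fixed points (010 and 101). With mutual repression between A and B, the two fixed points are alternative, self-maintaining patterns reached from different starting states.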

In any case, by the beginning of the 1970s, two major problems identified in the 1960s were becoming generally recognized: for what purpose is so much of the DNA in a cell transcribed, and why does only a proportion of nuclear hnRNA contain the poly(A) sequence typical of mRNAs? A commentary in the journal Nature at the time remarked that “all the attention has focused on proving the existence of the messenger proportion of the HnRNAs”, and then predicted that “In the future it may be the other sequences, among which controlling elements might be found, that will command more interest”. 202

Processed RNAs as Global Regulators

Explicitly influenced by the ideas of Britten and Davidson, as well as of Vogel, 12 Crick (later a proponent of selfish and junk DNA, Chapter 7) ventured in 1971 that most DNA of ‘higher organisms’ does not encode proteins and proposed a general model for the chromosomes of higher organisms. 203 Crick’s model was based primarily on cytogenetic studies of Drosophila polytene chromosomes, in which dense bands contained high concentrations of DNA and histones separated by less dense ‘interband’ areas (see Chapter 14 for modern views on chromatin and genome organization). He suggested that the protein-coding sequences were found in the small fraction of fibrous DNA characterizing the interbands. On the other hand, the dense chromosome bands corresponded to “globular structures”, which comprised most of the genome and which, he proposed, would contain “unpaired DNA” available for gene control, the ‘Unpairing Postulate’. Crick speculated that “this postulate has even more force if single-stranded RNA is also used to recognize the control sequences on DNA elements”, and that repetitive sequences may be the specific interaction sites regulated by interactions with histones. 203

Based on the properties of RNA, Gerald Kolodny proposed in the early 1970s that regulatory RNAs originating from the breakdown of hnRNAs constituted major drivers of cell differentiation during development. Kolodny hypothesized that the derived “short activator RNAs” could base pair with unique single-stranded DNA sequences in control regions of the target genes (such as in areas of the chromatin where the DNA is exposed and accessible to experimental cleavage by DNases j ) and promote regulation at the transcriptional level. 204 , 205 According to this model, hnRNA processing could occur either in the nucleus or in the cytoplasm, as there was evidence from Lester Goldstein and colleagues that short RNAs are shuttled back to the nucleus, including RNAs that then would become associated with chromatin; 206–209 the short RNAs would act as “primers” for transcription of one or several mRNA species containing matching sequences. 204

In addition, Kolodny and colleagues showed that RNAs are secreted by mammalian cells, confirming Pierre Mandel and Pierre Métais’ 1948 report of RNAs circulating in plasma and transmitted between cells. 210–212 In Kolodny’s model, activator RNAs are derived from stored maternal RNAs (thus having a role in inheritance) and initiate an unfolding developmental program during embryogenesis. 204 , 205 , 211 , 212

Burke Judd and Michael Young suggested in 1974 that individual chromosomal subunits (or ‘chromomeres’) corresponded to cistrons and were modules of gene regulation in eukaryotes. 213 This idea was based on evidence that large proportions of chromomeres are transcribed into very large hnRNAs 214 and likely processed into one or several mRNAs. Judd and Young hypothesized that these segments contained much more information than protein-coding sequences, and yet acted as a single operational unit. This theory of ‘one cistron – one chromomere’ considered that the large proportion of DNA that was transcribed, but apparently not translated, might be “coding for regulatory functions”, conveying information for regulating transcription and post-transcriptional maturation and translation of that unit. Finally, they proposed that some of the “extra RNA” released during the processing of the large transcripts could activate other cistrons of a (related) biosynthetic or developmental pathway, in which mutations might manifest in a pleiotropic (i.e., multilateral) fashion. 213

In 1975, Stuart Heywood and colleagues proposed mRNA regulation by “translational-control RNA”, short RNAs hypothesized to be generated by processing of hnRNAs in the nucleus, 215 , 216 conceptually presaging the biogenesis and action of microRNAs discovered 30 years later (Chapter 12).

In 1976, George Brawerman developed an RNA-based model for the control of transcription during multicellular development “distinct from those operating in bacteria”, mediated by “primer RNAs” that could bind by complementary base pairing to target sites in the genome, including repetitive sequences, and be abnormally expressed in cancer. He hedged his bets however, noting that recent studies had also indicated “that unique proteins associated with the DNA in chromatin appear to be directly responsible for the specificity of transcription … [which does] not seem to leave any room for a specific role of primer RNA”. 217

In the same year, Elizabeth Dickson and Hugh Robertson published an article entitled ‘Potential regulatory roles for RNA in cellular development’. 218 , 219 They proposed that RNAs may represent “informed signals” that regulate gene expression by interacting either “directly or indirectly” with DNA. 218 They used examples of emerging cellular phenomena involving RNAs (such as the existence of ‘non-coding’ infectious RNA molecules, called ‘viroids’, in plants; Chapter 8) to highlight the biological capacities of RNAs, and canvassed the features that made RNAs “prime candidates” as regulatory molecules: rapid turnover by nuclease degradation; the ability to fold in complex ways, with globular regions “reminiscent of protein structure” that are ideal for interaction with proteins; base pairing specificity for interaction with nucleic acids; and, finally, high (informational) “coding” capacity.

They highlighted that an RNA sequence of just 17 bases is sufficient to specify a unique region in the human genome, compared to the information required to produce a regulatory protein such as the lac repressor (~1,000 nucleotides). Thus, they suggested, besides the protein regulatory systems, “the use of RNA as an additional control element would add flexibility, efficiency, and elegance to a logical system of gene control”. 218
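One way to arrive at the figure of 17 bases is to ask how long a sequence must be before a chance match becomes unlikely in a genome of random sequence. The sketch below assumes a haploid human genome of about 3.1 × 10^9 bp and counts both DNA strands; neither figure is stated in the source, and Dickson and Robertson's own calculation may have differed.

```python
def expected_random_matches(k: int, genome_bp: float, strands: int = 2) -> float:
    """Expected chance occurrences of one specific k-mer in a random-sequence genome."""
    return strands * genome_bp / 4 ** k


def min_unique_length(genome_bp: float, strands: int = 2) -> int:
    """Smallest k whose expected number of chance matches falls below one."""
    k = 1
    while expected_random_matches(k, genome_bp, strands) >= 1:
        k += 1
    return k


# Assumed approximate haploid human genome size (not from the source).
HUMAN_GENOME_BP = 3.1e9

if __name__ == "__main__":
    print(min_unique_length(HUMAN_GENOME_BP))
    print(round(expected_random_matches(16, HUMAN_GENOME_BP), 2))
    print(round(expected_random_matches(17, HUMAN_GENOME_BP), 2))
```

Under these assumptions a 16-base sequence would still be expected to occur about 1.4 times by chance, whereas a 17-base sequence gives about 0.36 expected matches. Real genomes are far from random, repetitive sequences being the obvious exception, so this is an average-case estimate.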

Dickson and Robertson speculated about the possible sources of regulatory RNAs and their mechanisms of action, including the regulation of transcription, translation, target degradation and even DNA replication. Like earlier proposals, they suggested that hnRNAs could be processed in the nucleus in a cell-type-specific manner into mRNAs and “extra RNAs”, generating a “stable RNA molecule from a previously unstable RNA (precursor) species” that could act, for example, as trans-acting transcriptional primers that coordinately promoted expression of specific genes. Finally, they also proposed that RNAs could be utilized as signals to transfer external information to the genome during cellular development, resulting in a change in the state of differentiation. 218

This model not only explored the potential of RNA regulators, but integrated it with the broader spectrum of regulatory options in eukaryotic cells, involving both proteins and RNAs. It was remarkably prescient.

Out on a Limb

Receptive reactions, however, were not common, and most contemporaries preferred the perceived simplicity of a protein-based regulatory schema, believing that RNA regulation was unnecessary, and that such models were flights of fancy.

As Ellen Rothenberg, who worked with Davidson, wrote in her obituary of him: 220

Now, by the 1970s, most regulatory biologists in my own molecular biology orbit (at Harvard, MIT, University of California San Francisco, and the Salk Institute) had been massively influenced by Jacob and Monod’s work, by models of bacterial operon regulation, and by the precedents for elegant λ phage regulation of lytic vs. lysogenic growth by a mini network of mutually antagonistic activator/repressor proteins. 221–226 How could these be skimmed over so lightly in a book about differential gene regulation as the foundation for development?

It was not just this particular work of Eric’s that failed to draw upon Jacob and Monod. Interestingly, one of the most controversial predictions in the 1969 Britten and Davidson paper was that regulatory RNAs rather than regulatory proteins might be responsible for complex gene regulation. 15 Yet this was presented without regard for the clear evidence already in hand at the time that gene regulatory molecules were proteins in these bacterial systems. Why? Asked about this many years later, Eric often explained that for him in the 1960’s, the evident differences between bacterial gene regulation and complex eukaryotic gene regulation in development completely dwarfed the similarities. Hybridization kinetic analyses of bacterial and multicellular eukaryotic genomes had already showed these to have vastly different kinds of sequence organization, with a severe paucity of repeat sequences in the bacterial genomes compared to the multicellular eukaryotes. If these were regulatory sites, then bacteria were missing this kind of regulation.

Also, Eric’s view of development was that this irreversible, hierarchical process of increasing complexity that he was interested in was so different from the reversible, physiological nutrient responses of bacteria that there was no reason to posit the same kinds of molecular mechanisms. In this way, Eric and Roy were indeed charting their own course. But were they actually solving developmental mechanisms? 220

It seems that many felt the same way, and the thoughtful models of Britten and Davidson, and Dickson and Robertson, and their ideas of regulatory RNA, although they attracted some attention at the time, were ultimately sidelined and overrun in the excitement of the emerging gene cloning revolution.

Brow-beaten by the orthodoxy, k Davidson spent the rest of his highly successful and influential career studying the role of transcription factors and gene regulatory networks in sea urchin development 172 , 185–187 , 197 , 198 , 220 , 228–233 (Chapter 15). Davidson did however report that “nontranslatable transcripts containing interspersed repetitive sequence elements constitute a major fraction of the poly(A) RNA stored in the cytoplasm of both the sea urchin egg and the amphibian oocyte”, 234 one of the first explicit descriptions of long ‘non-coding’ RNAs beyond those in the ribosome (Chapter 9).

Roy Britten continued to study repetitive DNA, mainly from an evolutionary perspective rather than specifically as regulatory cassettes, although he did acknowledge their importance as sources of variation, 235–237 a theme that would be picked up later by others.

Further Reading

  1. Britten R.J. and Davidson E.H. (1969) Gene regulation for higher cells: A theory. Science 165: 349–357. [PubMed: 5789433]
  2. Chomet P. and Martienssen R. (2017) Barbara McClintock’s final years as nobelist and mentor: A memoir. Cell 170: 1049–1054. [PubMed: 28886375]
  3. Comfort N.C. (2003) The Tangled Field: Barbara McClintock’s Search for the Patterns of Genetic Control (Harvard University Press, Cambridge, MA).
  4. Crow J.F. and Bender W. (2004) Edward B. Lewis, 1918–2004. Genetics 168: 1773. [PMC free article: PMC1448758] [PubMed: 15611154]
  5. Davidson E.H. (2006) The Regulatory Genome: Gene Regulatory Networks In Development And Evolution (Academic Press, New York).
  6. Deichmann U. (2016) Interview with Eric Davidson. Developmental Biology 412: S20–S29. [PubMed: 26825396]
  7. Kauffman S. (1971) In: Current Topics in Developmental Biology, Vol. 6 (eds A.A. Moscona and A. Monroy), pp. 145–182 (Academic Press, New York).
  8. Lewis E.B. (2007) Genes, Development and Cancer, The Life and Work of Edward B. Lewis (Springer, New York).

Footnotes

a

The smallest genome known is that of the parasitic microsporidian Encephalitozoon intestinalis (haploid 2.3 Mb); 5 the largest known are the marbled lungfish Protopterus aethiopicus (ca. 130 Gb) 6 and the monocot plant Paris japonica (ca. 150 Gb). 7

b

McClintock also made many cytological observations of subcellular structures, most notably the “nucleolar organizing region”, 23 now known to contain tandem arrays of ribosomal RNA genes. 24

c

On more conventional grounds, McClintock was elected to the National Academy of Sciences in 1944 at the relatively young age of 42, and in 1945 she was elected the first female president of the Genetics Society of America. 34

d

The human genome, for example, has over 1 million copies of the Alu element (comprising over 10% of the genome), only ~5% of which are active. 37 The name ‘Alu’ derives from the fact that these elements were first recognized as short (~300 nt) repeated DNA sequences that commonly contain a cleavage site for the AluI restriction enzyme from Arthrobacter luteus. 38

e

These phenomena have also been referred to as ‘quelling’ and ‘co-suppression’.

f

The evolutionary significance of this transformation was not lost on Lewis, who noted the similarity of bithorax mutations to (and therefore the ease of generating) the additional set of wings on a dragonfly. 112

g

Lewis later capitulated, but not completely, to the conventional view of gene regulation, stating in 1994 that: “The remaining regions are thought to include enhancer-like sequences to which regulatory proteins bind, thereby conferring spatial- and temporal specific production of the proteins encoded by the complexes.” 111

h

Davidson was a colorful character. He played American football at a senior level, rode a Harley-Davidson, and was lead singer for an Appalachian folk music ensemble and played banjo in the Iron Mountain String Band. 168 , 169 He had worked with Mirsky, and he and Roy Britten were part of the Caltech group that included Delbrück and Leroy Hood (who later invented the first automated DNA sequencer, Chapter 10), one of a number of overlapping schools of thought that influenced research and concepts throughout the early period of molecular biology. McClintock was an outsider.

i

A feature especially amenable to transposon insertion and associated epigenetic control.

j

‘DNase hypersensitivity sites’ occur at different chromosomal positions in different cell types and were later exploited to map the positions of transcription initiation and transcription factor binding sites in genomes (Chapters 11 and 14).

k

As Davidson recalled later: “I read it [Jacob and Monod’s 1961 lac operon paper], but it didn’t penetrate ... As I recall, everything they did was about repression … We knew that there were all kinds of RNAs that bacteria didn’t make. And nuclear RNA had been discovered, but nobody knew what it did. So the relevance of the Jacob & Monod ideas was definitely not obvious to me… We built [our] model on RNA reading the sequences because it was simpler to deal with complementarity than with what was completely unknown, namely how proteins can read DNA sequences. … It was known that some proteins bind to DNA, and we knew about viral proteins that read DNA sequences. In the ‘71 application of that model to evolution we said it could be either RNA or protein, but the logic is going to be the same. And the logic is the same, but the first model was basically built on RNA recognition. Well, now it turns out that there are a number of regulatory functions that do use complementary sequence recognition by regulatory RNAs. But the main heavy lifting of regulation is done by protein–DNA interaction, of the kind that had already been found in bacteria. I suppose if I had paid more attention to Jacob and Monod we probably would not have built the model using regulatory RNA. We discussed whether we were going to talk about protein or RNA, and RNA was neater and easier to deal with because of natural complementarity. But that was wrong, from the standpoint of how it works.” 227

© 2023 John Mattick and Paulo Amaral.

Open Access: This content is Open Access under the Creative Commons license CC-BY-NC-ND.

Bookshelf ID: NBK595943; DOI: 10.1201/9781003109242-5
