NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Coffin JM, Hughes SH, Varmus HE, editors. Retroviruses. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; 1997.

Structural Classes of Retroelements and Replication Strategies

For convenience, the retroelements to be discussed in this chapter are presented in seven groups in a somewhat arbitrary order, and these groups do not always correspond to “natural” taxonomic groupings. These groups are (1) the vertebrate long terminal repeat (LTR)-containing elements, beginning with endogenous retroviruses; (2) the invertebrate LTR elements; (3) the poly(A) retrotransposons (often referred to as the non-LTR elements); (4) the retroplasmids and retrointrons; (5) the prokaryotic “retrons” or msDNAs; (6) the hepadnaviruses (mammalian pararetroviruses); and (7) the caulimoviruses (plant pararetroviruses). In this section, the structural features of each class of element are summarized, as well as the overall life cycle and replication strategy, to the extent that it is known.

Endogenous Retroviruses and Other Retrovirus-like LTR-containing Elements

One of the chief characteristics of the retroviral life cycle is integration of viral genetic information into the genome of the host cell, forming the provirus. The provirus embodies a permanent association between cell and retrovirus. Integration into a chromosome of a germ cell provides the means for retroviruses to colonize the germ line of their hosts, where they can persist as stable integrated proviruses for multiple generations. Such elements are known as endogenous retroviruses. Analysis of genomic DNA reveals the presence of many thousands of retroviruses or retroviral-like elements, indicating that the products of reverse transcription have played a major part in shaping the eukaryotic genome.

A few general conclusions can be drawn from analysis of these elements. It should be stressed, however, that these rules are not absolute and that some of the most important pieces of information come from the exceptional cases. First, endogenous retroviral elements tend to be present in groups with individual members showing a high degree of sequence similarity but differing in their specific integration sites within the genome. Second, the different groups can vary widely in sequence and structural organization but all resemble the products of retroviral integration reactions, with two LTRs and cis-acting sequences necessary for reverse transcription and packaging. Intact versions of endogenous proviruses are colinear with exogenous proviruses. Third, although closely resembling exogenous viruses, endogenous proviruses are not simple copies but are subject to constraints and have undergone adaptations rendering their lifestyle compatible with their hosts. In most cases, viruses induced from endogenous proviruses are relatively nonpathogenic and will often replicate less well than their exogenous counterparts. Fourth, many endogenous proviral elements are transcriptionally silent; this inactivity is associated with host-directed methylation of CG sequences in the provirus. Finally, endogenous proviruses are often defective, many carrying deletions or point mutations, so that they are incapable of yielding infectious virus. In addition, genetic changes in the host may have caused the loss of functional copies of receptors for specific viruses, preventing reinfection by germ line viruses, a phenomenon known as xenotropism (see Chapter 3).

This section describes the discovery of endogenous retroviruses, compares the structures of the endogenous elements with one another and with exogenous viruses, and examines their movement and spread. Later sections consider the specific phenomena associated with individual proviruses, the potential pathogenic effects of proviral expression, and the use of endogenous retroviruses in developmental studies. Finally, speculations are presented on the evolutionary consequences for the virus and host of this particular form of genetic parasitism.

Structure and Classification of Endogenous Retroviruses and Retrovirus-like Elements

Endogenous retroviruses and retroviral elements have been found in all vertebrates investigated. They are usually present in the genome of the host in numbers ranging from a few to a few thousand, although occasional single member groups have been identified. As a general rule, the number of groups of viral sequence found within a given vertebrate species is proportional to the effort spent searching in that species. Members of a given group show a high degree of sequence conservation. A much lower degree of homology is observed between different groups, although most are clearly related to contemporary exogenous viruses. The internal sequences of several groups of elements lack any detectable homology with reverse transcriptase; these have been termed mammalian apparent LTR-retrotransposons (MaLRs) (Smit 1993).

Groups of endogenous viruses appear to have colonized the germ line at different times. It is therefore quite helpful to think of endogenous proviruses as ancient or modern (Coffin 1990, 1993). Ancient proviruses were inserted into the germ line prior to speciation as judged from the presence of a provirus at the same location in all individuals of a species and sometimes related species. Ancient proviruses show signs of their age, and none have been found to yield infectious virus. All show sequence differences between their LTRs, which were presumably identical at the time of integration (see Chapter 4 but also see below) and carry a variety of mutations in their coding sequences, although a few retain sufficient protein-coding capacity to yield noninfectious, RT-negative, viral particles (Tönjes et al. 1996).
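The point that the two LTRs of a provirus were presumably identical at integration and then diverged can be made concrete with a small calculation. The following is a minimal sketch, not a method described in this chapter: it assumes the two LTR sequences have already been aligned to equal length, and the function names and the substitution rate used in the example are hypothetical placeholders chosen only for illustration.

```python
def ltr_divergence(ltr5, ltr3):
    """Fraction of mismatched positions between two pre-aligned LTRs.

    Positions where either sequence has an alignment gap ('-') are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def estimated_age_years(divergence, rate_per_site_per_year):
    """Rough age estimate; the rate is an assumption supplied by the caller.

    Both LTRs accumulate changes independently after integration,
    so the observed divergence is divided by twice the rate.
    """
    return divergence / (2 * rate_per_site_per_year)


# Toy example with one mismatch in ten aligned positions.
d = ltr_divergence("ACGTACGTAC", "ACGAACGTAC")
# 5e-9 substitutions per site per year is a hypothetical rate, not a value from the text.
age = estimated_age_years(d, 5e-9)
```

A divergence of zero would be expected for a freshly integrated provirus; any age derived this way depends entirely on the assumed rate and on the absence of later events that alter one LTR.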

Groups of modern proviruses have been introduced into the germ line of a few species much more recently and therefore show considerable genetic heterogeneity in insertion site from individual to individual. They have closely related exogenous cousins. One or more members of recently introduced groups will often spontaneously release infectious virus or can be induced to do so, for example, by agents that reverse DNA methylation. Recent endogenous viruses have not been identified in all studied vertebrates. For example, despite the presence of infectious endogenous retroviruses in mice, chickens, and some primates (Todaro et al. 1974), such viruses have not been identified in humans. Significant efforts have been, and continue to be, devoted to this task (Weiss 1982; Wilkinson et al. 1994).

Discovery of Endogenous Retroviral Elements

Studies of radiation- and carcinogen-induced cancers in the 1950s and 1960s showed that C-type retroviruses were frequently expressed in the tumors (Gross 1970). At the same time, genetic studies with GR mice led to the demonstration of genetically transmitted, virus-associated mammary adenocarcinomas (Muhlbock 1955). Taken together, these observations suggested the involvement of genetically inherited latent proviruses, analogous to bacterial prophages, in tumor formation (Lwoff 1960). It was subsequently shown that normal, nontransformed cells express viral antigens. First came the demonstration that leukosis-free chicken embryos contain an antigen that reacted in immunological tests for group-specific (gs) antigens of the avian leukemia viruses (ALVs) (Dougherty et al. 1967). Then it was shown that expression of this gag-encoded protein is controlled by a single dominant autosomal locus (Payne and Chubb 1968). Independently, it was found that some normal uninfected chick cells can complement RSV defective in env expression. This activity, called chick helper factor (chf), was also shown to map to a single locus (Weiss and Payne 1971).

Simultaneously, liquid hybridization experiments showed that proviral genomes are present in uninfected cells (Rosenthal et al. 1971). Next came the isolation of infectious virus either spontaneously released from uninfected cells (Vogt and Friis 1971) or following X-irradiation-mediated activation (Weiss et al. 1971). At the same time, evidence for the existence of endogenous copies of murine leukemia virus (MLV) accumulated in a similar fashion. Virus-like sequences in the DNA of uninfected murine cells were described (Harel et al. 1967), then gag (Huebner et al. 1970) and env-encoded (Stockert et al. 1971) antigens in normal mouse strains were identified, and the demonstration of spontaneous (Aaronson et al. 1969) or induced (by treatment with halogenated analogs of thymidine; Lowy et al. 1971) MLV production by previously nonproducing cells followed. Simultaneously, single genes, which presumably correspond to endogenous copies of mouse mammary tumor virus (MMTV), controlling the incidence of virus-producing mammary tumors in different strains of mice were identified (Bentvelzen et al. 1970; Nandi and McGrath 1973). These and related experiments, reviewed in more detail in the initial version of this chapter (Coffin 1982), provided convincing evidence for the presence of endogenous retroviruses in the chromosomes of normal uninfected cells. The presence of such elements in association with the relationship between retroviruses and tumorigenesis stimulated the search for novel proviruses, particularly of human origin.

A variety of strategies have been employed to look for endogenous retroviral elements. The simplest approach is to try to isolate replicating virus by cocultivating induced cells (Weiss et al. 1971) with appropriate indicator cells derived from a different species. This method has succeeded for most modern endogenous viruses, but it has led to some embarrassing moments after accidental cross-contaminations occurred (for examples, see Weiss 1982). High- and low-stringency hybridizations using cloned exogenous and endogenous viral genomes as probes have also been successful, and a variety of ancient retroviruses have been isolated in this way (Dunwiddie et al. 1986; Wilkinson et al. 1994). More recently, polymerase chain reaction (PCR) strategies designed to amplify conserved regions of pol have been employed (Medstrand and Blomberg 1993; Shih et al. 1989; Tristem et al. 1996). Other elements have been detected by analysis of genomic sequence databases (Smit 1993). A number of endogenous elements have also been discovered by serendipity in ways that illustrate the biology of the elements (see below). It remains to be seen how many groups of endogenous sequences have been missed in well-studied species such as mice and humans; a definitive answer to this question will presumably be provided by the sequencing studies carried out under the auspices of the Human Genome Project.

Structure of Endogenous Retroviral Elements in Different Animal Species

Endogenous proviruses identified in the germ lines of a number of the most-studied species are listed in Table 2, along with key references, an indication of the structure of each element, and sequence database accession numbers. A brief description of each element follows. For more details concerning the isolation of replication-competent viruses from various species, see chapters in the previous edition of this book on virus taxonomy (Teich 1982) and endogenous viruses (Coffin 1982).

Table 2. Endogenous Viruses in Different Species.

CHICKENS

Certain lines of chickens spontaneously release a virus called Rous-associated virus (RAV-0) (Vogt and Friis 1971). A single gene controls viremia in susceptible birds; this gene was later identified as the RAV-0 provirus itself. This virus represents the prototype modern endogenous virus of birds and belongs to a relatively simple group that has served as a model for studies of all endogenous retroviruses. Much of the picture that emerged (Coffin 1982) is equally applicable to other more complex systems of endogenous retroviral elements. The viruses are closely related to exogenous avian sarcoma/leukosis viruses (ASLVs) (Coffin et al. 1978). The main difference, which apparently accounts for differences in growth rate and pathogenicity, lies in the U3 region of the LTR (Coffin et al. 1981; Tsichlis and Coffin 1980). Structural studies indicate that these viruses are essentially indistinguishable from proviruses obtained by normal infection (Hughes et al. 1981); they have two LTRs, colinear genomes (although a number have obvious deletions), and are associated with a characteristic 6-bp duplication of host sequences (Hishinuma et al. 1981). Individual proviruses vary greatly in expression levels, with phenotypes determined in large part by their structures (see below). Individual proviral loci (ev genes) behave as stable Mendelian genes residing at a variety of sites in the genome (Astrin et al. 1980; Boulliou et al. 1991; Iraqi et al. 1991). They are found only in domestic chickens and the related red jungle fowl but not in other members of the genus Gallus, suggesting a relatively recent germ line infection (Frisby et al. 1979). However, closely related proviruses can also be found in other birds such as pheasants, suggesting at least two independent germ line infections (Frisby et al. 1979). As far as is known, the ASLV group represents the only modern endogenous proviruses of birds.
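The 6-bp duplication of host sequences that flanks these proviruses can be checked mechanically once the flanking DNA is known. The sketch below assumes the caller supplies the host sequence immediately upstream of the 5′ LTR and immediately downstream of the 3′ LTR, both read on the same strand; the function name and the example sequences are hypothetical.

```python
def has_target_site_duplication(upstream_flank, downstream_flank, size=6):
    """Return True if the host bases flanking a provirus are a direct repeat.

    Compares the last `size` bases before the 5' LTR with the first
    `size` bases after the 3' LTR. Returns False if either flank is
    shorter than `size`.
    """
    if len(upstream_flank) < size or len(downstream_flank) < size:
        return False
    return upstream_flank[-size:].upper() == downstream_flank[:size].upper()


# Invented flanks: both sides carry the same 6-bp host sequence.
flanked = has_target_site_duplication("TTGACCGATTAC", "GATTACCCGT")
```

The default of 6 bp matches the duplication size stated above for the avian elements; other retroviral groups would need a different `size` value.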

More recently, at least two families of ancient endogenous retrovirus-like elements have been cloned, using ALV probes under low-stringency hybridization conditions, from a line of chickens lacking RAV-0 (Dunwiddie et al. 1986; Boyce-Jacino et al. 1992). The endogenous avian retrovirus (EAV-0) and E51 families both show weak homology with RAV-0 and a similar structural organization, although many EAV-0 elements are deleted for env. The LTRs of EAV-0 are unusually short, 243 bp, but they contain all of the regulatory features of the longer RAV-0 LTR and are fully capable of driving transcription (Boyce-Jacino et al. 1989). Both families appear to have entered the germ line prior to Gallus speciation (Resnick et al. 1990) and are therefore classified as ancient, although some polymorphism in insertion sites can be observed in chicken DNA. Infectious viruses corresponding to the two families have not been seen, although it is noteworthy that the env gene of the subgroup J ASLV appears to be derived, by recombination, from a member of the E51 family (Bai et al. 1995).

A fourth family of endogenous elements found in chickens, termed ART-CH (avian retrotransposon from chicken genome) elements, has recently been characterized (Nikiforov and Gudkov 1994). These elements are 3.3 kb long, contain two LTRs, and show a few short patches of sequence similar to those of the gag and pol genes of ALVs and EAVs, buried in a sequence of unknown origin. They have apparently lost the capacity to encode proteins. Like the virus-like 30S RNA (VL30) elements of mice (see below), these elements seem to contain all the cis-acting sequences needed for transmission as a retroviral passenger. About 50 closely related (<3% divergence) ART-CH elements are present in chickens, suggesting a fairly recent entry into the chicken germ line. Their distribution within other members of the Gallus genus has not been investigated.

MICE

At least eight different groups of endogenous retroviral elements have been described in inbred strains of mice (Stoye and Coffin 1985; Keshet et al. 1991; Kozak and Ruscetti 1992). The structures of these elements are illustrated in Figure 1. Two groups, the C type and the B type, have closely related exogenous relatives, the MLVs and the MMTVs, respectively, and therefore have been subjected to considerable scrutiny as potential tumorigenic agents for many years (Lilly and Pincus 1973; Nandi and McGrath 1973). Recently, the association between mammary tumor viruses and superantigens has focused considerable attention on endogenous MMTVs (discussed below). The remaining groups do not seem to have close exogenous relatives, although some still show signs of ongoing movement in the genome. Additional groups of endogenous elements, at least one of which is infectious (Miller et al. 1996), are undoubtedly present in the genome of Mus (Stavenhagen and Robins 1988; Cordonnier et al. 1995); these, however, have yet to be characterized in detail and are not considered further in this section.

Figure 1. Families of murine endogenous retroviral elements. The relative size and genomic organization of typical retroviral elements found in the mouse germ line are shown. It is likely that only the MLV, MMTV, and IAP groups can encode functional proteins, although ...

MLV

The MLV-like elements are very similar to one another, but they can be divided into distinct classes on the basis of the host ranges specified by their env genes and a number of other linked structural characteristics. The host ranges (Table 3 and see Chapter 3) are termed ecotropic (replicate only on murine cells), xenotropic (grow only on nonmurine cells), and polytropic (replicate on murine and nonmurine cells). The provirus groups are named accordingly even though many members of these groups are not replication-competent; assignment of individual proviruses to these groups is usually based on sequence criteria alone. The amphotropic class of MLV, widely used in retroviral vectors, has not been identified in the murine germ line (O'Neill et al. 1987).
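The three host-range definitions given above amount to a simple two-way decision on whether a virus replicates on murine cells and on nonmurine cells. The sketch below encodes only those definitions; the function name is hypothetical, and it does not reflect how proviruses are assigned in practice, which, as noted, usually relies on sequence criteria alone.

```python
def host_range_class(replicates_on_murine, replicates_on_nonmurine):
    """Map replication phenotype to the endogenous MLV host-range terms.

    ecotropic:  murine cells only
    xenotropic: nonmurine cells only
    polytropic: both murine and nonmurine cells
    """
    if replicates_on_murine and replicates_on_nonmurine:
        return "polytropic"
    if replicates_on_murine:
        return "ecotropic"
    if replicates_on_nonmurine:
        return "xenotropic"
    # Replication-defective elements cannot be classified by phenotype.
    return "unclassified"


label = host_range_class(replicates_on_murine=False, replicates_on_nonmurine=True)
```

The "unclassified" branch corresponds to the many group members that are not replication-competent. The function covers only the three endogenous classes and makes no attempt to separate the amphotropic class from the polytropic class.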

Table 3. Host Ranges of Endogenous MLVs.

The MLV sequences are extremely homogeneous; however, detailed structural analyses have revealed characteristic features of the different classes of MLVs endogenous to laboratory mice (Fig. 2). The major difference between the classes lies in their env genes. In particular, the ecotropic class shows only 40% homology with the nonecotropic classes in the amino-terminal two thirds of the surface (SU)-coding region, which makes it possible to generate an ecotrope-specific env probe (see below). The xenotropic and polytropic classes are much more closely related (10–15% divergence in env) but show a number of characteristic differences (e.g., the absence of a BamHI site at the start of the xenotropic env gene). Sequence studies showed that the polytropic class can be subdivided to include members with a characteristic 27-bp deletion in SU called modified polytropic (Stoye and Coffin 1987). Endogenous polytropic and modified polytropic proviruses contain characteristic insertions in their LTRs of 150 and 190 bp, respectively (Khan and Martin 1983; Stoye and Coffin 1987; Ch'ang et al. 1989). These insertions are derived, presumably by recombination, from the murine retrovirus-related sequence (MuRRS) group of elements (Schmidt et al. 1984).

Figure 2. Structures of endogenous MLVs. Features indicated are the restriction sites for EcoRI (E), BamHI (B), and HindIII (H). The relative sizes of the LTRs with the 150-bp and 190-bp inserts in polytropic and modified polytropic proviruses (color) are shown; ...

Endogenous proviruses encoding replication-competent ecotropic and xenotropic viruses are present in some but not all inbred mice (for review, see Coffin 1982). Infectious endogenous polytropic viruses have not been observed, although recombination with infecting ecotropic viruses allows the host range specified by the env gene of the endogenous element to be revealed. It should be noted that the expanded host range of polytropic viruses, compared to ecotropic and xenotropic viruses, does not result from recombination between ecotropic and xenotropic env genes but is an inherent property of the polytropic SU protein (Stoye and Coffin 1987). Analysis of the distribution of different classes of viruses in geographically separate and taxonomically distinct wild mouse populations indicates that the germ line sequences were acquired independently and that the pattern seen in laboratory mice results from the interbreeding of distinct taxonomic groups (Kozak and O'Neill 1987). The nonecotropic sequences are more widely distributed than the ecotropic proviruses, implying a more recent origin for the ecotropic sequences (Kozak and O'Neill 1987). This would be consistent with the suggestion, made on the basis of sequence comparisons, that the ecotropic class arose by substitution of a region of the env gene of the xenotropic class by the analogous region of another unknown virus which has not been fixed in the germ line (Stoye and Coffin 1987). Such an event may have occurred more than once, since the env gene of different ecotropic isolates may show considerable sequence variation (Kozak et al. 1984; Kozak and O'Neill 1987; Voytek and Kozak 1989). Phylogenetic studies indicate that MLV entered the Mus germ line within the last 1.5 million years (Lander and Chattopadhyay 1984; Kozak and O'Neill 1987; Bonhomme and Guénet 1989).

MMTV

MMTV was the first mammalian retrovirus isolated (Bittner 1936). Inbred strains of mice contain up to ten polymorphic MMTV proviruses (Cohen and Varmus 1979). More than 50 endogenous MMTVs scattered over the mouse genome have now been identified (Kozak et al. 1987; Tomonari et al. 1993). At least two loci, Mtv1 and Mtv2, can be expressed as infectious virus (Van Nie and Verstraeten 1975; Van Nie et al. 1977). Additional polymorphic MMTV proviruses are present in some but not all wild mice (Callahan et al. 1982; Siracusa et al. 1991; Imai et al. 1994), but systematic studies of MMTV distribution in different species of Mus have not been performed.

IAP elements

The remaining classes of endogenous murine retroviral agents are not known to encode replication-competent viruses. Intracisternal A-type particle (IAP) elements are present in mice at approximately 1000–2000 copies per cell (Kuff and Lueders 1988). Often expressed at high levels in plasma cell tumors, IAPs are assembled on the endoplasmic reticulum (ER) and bud into the cisternae but are not released from the cell (Dalton et al. 1961). These particles can be extracted from tumor cells and used for the purification of genomic RNA (Kuff et al. 1968), which can then be used as a probe for the identification of IAP-element-containing clones (Lueders and Kuff 1980). These cloned elements comprise “full-length” forms (type I) or a series of deleted derivatives (type II) (Shen-Ong and Cole 1982). More recent studies indicate that there are at least four subclasses of type I and three subclasses of type II (Kuff and Lueders 1988). The full-length forms contain obvious gag and pol genes, but most seem to lack intact env genes (Mietz et al. 1987). Comparisons of RT sequences indicate that IAPs are related to the B- and D-type taxonomic groups of retroviral families (Xiong and Eickbush 1990; see Appendix 1).

In some cell types, particularly plasmacytomas, amplification of IAP proviruses occurs. The novel proviruses apparently represent the reverse-transcribed and integrated progeny of one or a few heavily transcribed proviruses (Shen-Ong and Cole 1984). Surprisingly, several of the novel progeny do not have identical LTRs (Kuff and Lueders 1988; Michaud et al. 1994), possibly reflecting a relatively high frequency of recombination between copackaged IAP RNAs with nonidentical LTRs. Such events, which would require intermolecular plus-strand DNA transfers (see Chapter 4), have not been observed with exogenous retroviruses (Jones et al. 1994), but they are not uncommon with Ty1 elements (Boeke et al. 1985, 1988). Recently, a subset of IAP-related proviruses has been identified that do encode an env-like protein (Reuss and Schaller 1991; Fennelly et al. 1996). These are termed IAPE elements.

IAP sequences appear to be widely distributed in nature (Kuff and Lueders 1988). Different subspecies of Mus contain the same IAP types but in different proportions (Shen-Ong and Cole 1982). Repetitive sequences related to murine IAPs can be detected in most, but not all, rodents and some other mammals (Lueders and Kuff 1981, 1983). Large-scale amplification of IAP sequences has apparently occurred independently in the mouse, rat, and Syrian hamster (Lueders and Kuff 1983).

VL30 elements

VL30 elements were originally discovered as contaminants of MLV particles grown on mouse or rat cells. They can be transmitted from cell to cell as a passenger within MLV virions, sometimes representing the majority of genomes present (Besmer et al. 1979; Scolnick et al. 1979). These elements have two LTRs and MLV virion encapsidation sequences (Torrent et al. 1994), as well as plus- and minus-strand priming sites. There are considerable variations in the U3 sequences and promoter properties of different elements (Nilsson and Bohm 1994). VL30 elements seem to lack coding capacity, although some show a distant relationship to MLV (Adams et al. 1988); they can be detected in all Mus species at copy numbers ranging from a few to several hundred, with a certain degree of polymorphism between inbred strains (Courtney et al. 1982). Given their ability to be packaged by MLV and the extraordinarily high rate of recombination observed with retroviruses, it is perhaps not surprising that VL30s can recombine with MLVs; this has apparently taken place during the generation of ras-containing sarcoma viruses (Makris et al. 1993).

MuRRS and GLN elements

Two further groups, the MuRRSs and GLNs (after the glutamine tRNA primer-binding site), were originally identified by analysis of proviruses that seem to result from recombination between different groups of elements. For example, polytropic MLVs contain a characteristic 150–190-bp insertion within the U3 region when compared to ecotropic or xenotropic proviruses (Khan and Martin 1983). This sequence is present in the mouse germ line at much higher abundance than are MLVs (Wirth et al. 1983), suggesting that it might represent part of another group of repeat elements. The inserted sequence was then used to clone and define this novel group, which was dubbed MuRRS (Schmidt et al. 1984, 1985). These analyses indicate that the 150–190-bp sequence found in the polytropic MLV LTR is derived originally from the MuRRS LTR. Similarly, GLNs were initially identified using the LTR from an atypical VL30 as probe (Itin and Keshet 1986). Subsequently, an example of a mosaic provirus containing MLV and GLN sequences was also identified (Obata and Khan 1988).

Both groups have members with recognizable gag, pol, or env genes, but intact proviruses are rare. For instance, it is possible to recognize long stretches of amino acid homology between MuRRS and MLV separated by several sizable deletions (Schmidt et al. 1985). The majority of these elements are present as solo LTRs, there being approximately 50 intact MuRRS and GLN elements in the germ line and at least 20-fold more solo LTRs (Keshet et al. 1991). Examples of solo LTRs, which are apparently generated by homologous recombination between the two LTRs of an integrated provirus (Hughes et al. 1981; Copeland et al. 1983; Stoye et al. 1988), are found for many classes of endogenous retroviruses. However, the proportion of solo LTRs varies among the different groups of endogenous elements, there being relatively few solo LTRs in the MLV and ALV groups. Whether this is due to inherent differences in recombination frequency or simply reflects different periods of germ line occupancy remains to be determined. Hybridization analyses show that MuRRS sequences are present in all Mus subspecies but not in other genera (Schmidt et al. 1985), whereas there is some evidence for the presence of GLN sequences in Rattus as well as in Mus (Itin and Keshet 1986).
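The generation of a solo LTR by homologous recombination between the two LTRs of an integrated provirus can be pictured with a toy model in which a locus is a list of named segments. This is only an illustrative sketch: the segment labels and the function name are hypothetical, and the model ignores everything about the recombination mechanism except its outcome.

```python
def solo_ltr_after_recombination(segments):
    """Collapse LTR-internal-LTR into a single LTR, keeping host flanks.

    `segments` is a list of labels such as
    ["host_5", "LTR", "gag", "pol", "env", "LTR", "host_3"].
    """
    ltr_positions = [i for i, name in enumerate(segments) if name == "LTR"]
    if len(ltr_positions) != 2:
        raise ValueError("expected exactly two LTRs in the locus")
    first, second = ltr_positions
    # Everything between the two LTRs is lost, along with one LTR copy.
    return segments[:first] + ["LTR"] + segments[second + 1:]


provirus = ["host_5", "LTR", "gag", "pol", "env", "LTR", "host_3"]
solo = solo_ltr_after_recombination(provirus)

# Copy-number arithmetic from the figures quoted above:
intact_elements = 50          # approximate number of intact MuRRS and GLN elements
min_solo_ltrs = intact_elements * 20   # "at least 20-fold more" solo LTRs
```

With the approximate figures quoted above, the lower bound on solo LTRs works out to about 1000 copies.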

MuRVY elements

MuRVYs (murine repeated virus on the Y chromosome) are present in about 500 copies on the Y chromosome of laboratory mice (Phillips et al. 1982). Initial sequence analyses were confined to the proviral LTRs which show only weak homology with the LTRs of other murine endogenous elements (Hutchison and Eicher 1989). More recent studies have revealed extensive regions of similarity to the gag, pol, and env regions of other C-type retroviruses (Fennelly et al. 1996). Phylogenetic studies suggest that MuRVYs entered the Mus germ line early during its evolution (5–10 million years ago). Approximately 1–2 million years ago, a provirus and its associated flanking DNA, which includes an env-containing IAPE (Fennelly et al. 1996), was amplified by a nonretroviral mechanism to reach the current copy number (Eicher et al. 1989).

ETn elements

A final class of endogenous elements are the ETns (early transposons). They were originally cloned by differential hybridization and are expressed at high levels in undifferentiated embryonal carcinoma cells and preimplantation embryos but not in more differentiated tissues (Brûlet et al. 1983, 1985). Nucleotide sequencing reveals LTRs with an appropriately positioned primer-binding site and polypurine tract but no open reading frames (ORFs) or homology with retroviruses (Kaghad et al. 1985; Sonigo et al. 1987). About 200 ETns, present in two subgroups (Shell et al. 1990), can be found in the genome of inbred mice (Sonigo et al. 1987). ETns are widely distributed among the different species of the genus Mus but are not found in the Rattus, Pyromys, or Coelomys genera which diverged between 5 and 10 million years ago (Sonigo et al. 1987).

HAMSTER

Due to the extensive use of Chinese hamster ovary (CHO) cells for production of recombinant proteins by the biotechnology industry, there has been a recent surge of interest in the endogenous viruses of hamsters. Budding C-type viruses in CHO cells can frequently be seen by electron microscopy (Lieber et al. 1973), but, with one possible exception (Russell et al. 1979), no infectious viruses have been identified. Approximately 100 genomic proviruses related to murine C-type viruses have been detected, some of which appear to be transcriptionally active (Anderson et al. 1991). A full-length genomic clone has been isolated and sequenced; despite obvious similarity to MLVs (65% similarity), the clone carries numerous nucleotide substitutions plus small insertions and deletions in the potential coding sequence (Lie et al. 1994). The evolutionary origins of this virus group are unclear.

IAP elements are present in the genomes of both Syrian (Mesocricetus auratus) and Chinese hamsters (Cricetulus griseus) (Kuff and Lueders 1988). Examples of both kinds have been cloned and sequenced (Ono et al. 1985; Anderson et al. 1990; Dorner et al. 1991). Interestingly, the Syrian hamster element appears to contain an ORF in the region between the pol gene and the 3′LTR with some similarity to the MMTV env gene (Ono et al. 1985; Aota et al. 1987), an observation that may shed some light on the evolution of the IAP class of element.

CAT

Two groups of endogenous cat viruses, each present at about 10–20 copies per genome equivalent, have been described in molecular detail. Both are present in the domestic cat and a few close relatives (Benveniste and Todaro 1974). One group, the endogenous feline leukemia virus (FeLV), is very similar to the exogenous FeLVs. As is the case with the ALVs, the main difference between the endogenous viruses and their exogenous counterparts lies in the U3 region of the LTR (Casey et al. 1981; Berry et al. 1988). No infectious endogenous FeLVs have been isolated, although one or more of the proviruses are expressed, and recombination with infecting exogenous FeLVs to give rise to virus with altered host range is common (Stewart et al. 1986; Overbaugh et al. 1988; Neil et al. 1991; Sheets et al. 1993). Analysis of cloned endogenous FeLVs reveals the presence of many deleted proviruses in the cat germ line (Soe et al. 1985).

The other group of cat endogenous proviruses, the RD114 class, is very similar in sequence to that of an infectious endogenous virus from baboons, BaEV (Kato et al. 1987; Spodick et al. 1988). RD114 was originally isolated from human rhabdomyosarcoma (hence RD) cells passaged through fetal kittens and was originally thought to be a human virus (McAllister et al. 1972), but it was subsequently shown to represent an infectious xenotropic virus of cats (Todaro et al. 1973). Again, molecular analysis of endogenous RD114 proviruses from cat DNA showed that many contained obvious lesions, and it is likely that a single locus encodes the infectious virus (Reeves and O'Brien 1984). The homology with BaEV suggests that cross-species infection by the ancestor of the baboon virus has occurred (Benveniste and Todaro 1974). Liquid hybridization data imply that cats may contain a third group of endogenous element, distantly related to the primate virus MAC-1 (Bonner and Todaro 1979); however, no further studies have been reported.

PIG

Until now, the endogenous proviruses of pigs have been little studied, although at least one replication-competent virus has been isolated (Teich 1982). The virus has C-type morphology, is immunologically related to MLV, and appears to be ecotropic in host range, although it is able to rescue low levels of MSV from heterologous nonproducer cell lines (Lieber et al. 1975). In light of the potential use of pig organs in xenotransplantations (Cozzi and White 1995) and the theoretical hazards associated with endogenous proviruses (Stoye and Coffin 1995), porcine viruses are currently beginning to receive considerable attention. Very recently, the production of xenotropic virus, presumably endogenous in origin, by one porcine cell line has been reported (Patience et al. 1997).

BABOON

Baboon endogenous virus (BaEV) is a xenotropic endogenous type-C virus isolated from a number of baboon species (Teich 1982). Originally estimated to be present at approximately 50–200 copies per genome, more recent studies suggest that the number of intact proviruses is about tenfold lower (van der Kuyl et al. 1995a). Phylogenetic studies are not consistent with inheritance of proviral genomes from a single early primate, but they suggest that cross-species transmissions between animals with shared habitats, followed by germ line fixation, have occurred on several occasions in the recent past (van der Kuyl et al. 1995b). Endogenous type-D proviruses are also present in the baboon genome (van der Kuyl et al. 1997). Since the baboon is also a potential source of tissues for xenotransplantation (Rowe 1996), the potential pathogenicity of endogenous baboon viruses for humans must be carefully evaluated.

SHEEP

Studies of jaagsiekte sheep retrovirus, the suspected etiological agent of ovine pulmonary carcinoma, have revealed the presence of more than 20 related endogenous proviruses in sheep. These elements are transcribed in a variety of sheep tissues, complicating studies of the role of the exogenous virus in disease (Palmarini et al. 1996). Sequence studies reveal similarities to both B- and D-type viruses (York et al. 1992). Assortment of the majority of these proviruses can be seen in sheep families (Hecht et al. 1994). Similar endogenous proviruses can also be detected in goats and other ungulates (Hecht et al. 1996).

SQUIRREL MONKEY

Squirrel monkey retrovirus (SMRV) is a xenotropic endogenous type-D virus present in multiple copies in the New World squirrel monkey (Teich 1982). SMRV was first isolated from squirrel monkey lung cells by cocultivation with canine cells. Sequence studies have revealed a virtually identical provirus in several permanent human cell lines (Chiu and Skuntz 1986; Oda et al. 1988), presumably reflecting the ease with which accidental infection of human cells by SMRV can occur. SMRV apparently utilizes the same receptor on human cells as BaEV (Sommerfelt et al. 1990), a finding consistent with the suggestion that SMRV and BaEV share a common ancestor (van der Kuyl et al. 1997).

HUMAN

The human genome contains a variety of ancient endogenous retroviruses (Wilkinson et al. 1994), but no infectious elements have been isolated. They are commonly referred to as HERV (for human endogenous retrovirus) followed by a letter indicating the amino acid specificity of the tRNA which is predicted to act as minus-strand primer. (This nomenclature has certain problems, e.g., in cases where the primer-binding site sequence has not been determined or where two different groups of viruses share the same primer-binding site; thus, the original names are still frequently encountered.) Sequence homologies within the pol gene have been used to divide HERVs into two classes. HERVs that have homology with the mammalian type-C retroviruses comprise class I, whereas those that have homology with mammalian A- and B-type viruses are termed class II (Callahan 1988). Spumavirus-related endogenous proviruses have been described recently (see Cordonnier et al. 1995), but endogenous viruses closely related to the visna or bovine leukemia virus/human T-cell leukemia virus (BLV/HTLV) groups seem not to be present in the human germ line. Phylogenetic analyses of specific HERVs in different primates have provided conclusive evidence for the existence of retroviruses in modern form more than 25 million years ago.

The first HERVs to be cloned were identified by hybridization under conditions of relaxed stringency using probes from other species. For example, the prototypic class II element, HERV-K, was cloned because it cross-hybridizes with probes prepared from the pol region of MMTV (Callahan et al. 1982; Westley and May 1984) and Syrian hamster IAP (Ono 1986). The first class I element identified, HERV-E, was cloned using a probe from an African green monkey which was originally isolated using an MLV probe (Martin et al. 1981). Further groups of class I elements were cloned in a similar fashion using a chimpanzee genomic clone (Bonner et al. 1982; O'Connell et al. 1984) or probes to different exogenous retroviruses (Leib-Mösch et al. 1986; Perl et al. 1989). Other families of class I sequences have been identified during searches for endogenous viruses that might resemble the infectious viruses HTLV-1 and HTLV-2 in utilizing tRNAPro as primer for minus-strand synthesis (Harada et al. 1987; Kröger and Horak 1987); by examination of repeat regions in the neighborhood of the haptoglobin (Maeda and Kim 1990) and hemoglobin (Mager and Henthorn 1984) genes; or through identification of sequences up- or downregulated in teratocarcinoma cells differentiating in response to retinoic acid (Kannan et al. 1991; La Mantia et al. 1991). With the exception of the HERV-K elements, class II sequences have not been studied in such detail. However, PCR and hybridization studies strongly suggest the presence of further groups of MMTV-related elements awaiting discovery and characterization (Franklin et al. 1988; Medstrand and Blomberg 1993).

Sequence studies imply that the HERV elements are of considerable antiquity. Assuming that the two LTRs are identical at the time of integration and that they show mutation rates similar to those seen in introns (even though it has been suggested that newly introduced foreign DNA can initially accumulate mutations at an accelerated rate; Maeda and Kim 1990), the approximate time of insertion into the genome can be calculated. Retroviral insertions can in this way provide molecular clocks for studies of gene evolution (Dangel et al. 1995). Typically, the 5′ and 3′ LTRs of a number of cloned proviruses show 5–12% divergence (Mager and Henthorn 1984; O'Connell and Cohen 1984; Steele et al. 1984; Samuelson et al. 1990). Given an evolution rate of 5 × 10⁻⁹ per site per year (Hayashida and Miyata 1983), this divergence corresponds to a germ line occupancy of 10–25 million years. One exception concerns a pair of HERV-K LTRs (Ono 1986) which shows only two nucleotide changes out of 969 bp (0.2%), implying a fairly recent integration event, although another pair differed by 4.3%.
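The dating arithmetic in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the calculation simply divides the observed LTR divergence by the quoted rate, which is the convention that reproduces the 10–25 million year figure given in the text.

```python
# Substitution rate quoted in the text (Hayashida and Miyata 1983),
# in substitutions per site per year.
RATE_PER_SITE_PER_YEAR = 5e-9


def divergence_from_counts(mismatches, aligned_length):
    """Fractional divergence between two aligned LTRs (hypothetical helper)."""
    return mismatches / aligned_length


def ltr_age_years(divergence, rate=RATE_PER_SITE_PER_YEAR):
    """Approximate time since integration, assuming identical LTRs at integration.

    Uses age = divergence / rate, the convention that matches the
    figures in the text.
    """
    return divergence / rate


examples = [
    ("5% divergence", 0.05),
    ("12% divergence", 0.12),
    ("4.3% divergence", 0.043),
    ("2 changes in 969 bp", divergence_from_counts(2, 969)),
]
for label, d in examples:
    print(f"{label}: about {ltr_age_years(d) / 1e6:.1f} million years")
```

If the rate were instead applied independently to each LTR, the divisor would be twice the rate and every estimate would be halved; that alternative is mentioned only as a caveat and is not the calculation the quoted figures reflect.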

Like the LTRs, the coding sequences of these proviruses are riddled with mutations, including deletions and in-frame stop codons (Repaske et al. 1985; Ono et al. 1986; Mager and Freeman 1987; Maeda and Kim 1990). Interpretation of the original reading frames is an exercise in paleontology, but one that can be achieved by comparison among related elements and with modern retroviruses. It appears extremely unlikely that any of the HERVs will yield infectious, mobile virus (with the possible exception of the HERV-K family discussed below). Despite their mutated coding sequences, the overall organization of the HERV groups is remarkably constant, and remnants of gag, pol, and env genes can be identified in most cases. One group originally thought to lack an env gene (Mager and Freeman 1987) is the HERV-H group. More recently, a relatively small number of HERV-H proviruses encoding undeleted pol- and env-like genes were identified (Hirose et al. 1993; Wilkinson et al. 1993). Loss of the env gene appears to be quite a common theme among endogenous elements, occurring in the murine IAP and the avian E51 elements, as well as in the HERV-H family, and perhaps reflects the dispensability of the env gene once cross-species infection has been achieved. Solo LTRs are common in the human genome: LTRs derived from several of the HERV groups are present at 1000 copies or more (Wilkinson et al. 1994).

Of all the known families of endogenous sequences in the human genome, the HERV-K family apparently comes closest to encoding infectious virus (Löwer et al. 1996). Although first introduced into the germ line more than 25 million years ago, as judged from their distribution in different primates (Mariani-Costantini et al. 1989), there seem to be some HERV-K proviruses that have not suffered the ravages of time. These contain essentially identical LTRs and possess ORFs potentially capable of encoding Gag, Pol, and Env proteins (Löwer et al. 1993). In addition, they encode a small protein synthesized from a multiply spliced message with certain similarities to the lentivirus Rev protein (Löwer et al. 1995). In some teratocarcinoma cells, one or more of these proviruses are expressed, leading to production of functional core proteins capable of packaging HERV-K RNA into viral particles (Boller et al. 1993; Mueller-Lantzsch et al. 1993). These particles appear to be noninfectious; however, it is possible that, like their closest relatives MMTV and jaagsiekte virus of sheep, they have a very narrow host range.

The antiquity of human endogenous retroviruses is also revealed by phylogenetic analysis. Hybridization studies indicate that closely related proviruses, from several different groups of HERV, are widely distributed among primates. For example, HERV-E, HERV-H, and HERV-K proviruses have all been found in Old World monkeys as well as in apes (Steele et al. 1986; Mariani-Costantini et al. 1989; Goodchild et al. 1993). These groups of primates diverged 25–30 million years ago (Gingerich 1984), implying that the different proviral families, distantly but unmistakably related to modern-day viruses, entered the germ line more than 25 million years ago. Most of these proviruses are not present in New World monkeys, which diverged from Old World monkeys approximately 45 million years ago (Gingerich 1984), although more recent studies have indicated the presence of some HERV-H sequences in New World monkeys (Mager and Freeman 1995). In several cases, proviruses have been found at identical chromosomal sites in Old World monkeys and apes, implying that those specific integration events occurred before the Old World monkey/ape divergence more than 25 million years ago.

Proviral amplification appears to have continued over millions of years. For example, HERV-H elements with three different types of LTRs have been described. The first two classes are present in Old World monkeys and apes. The third, which appears to be a recombinant between the first two, is found only in apes; indeed, analysis of specific proviruses indicates that integration occurred less than 15 million years ago (Goodchild et al. 1993). Similarly, despite the presence of HERV-K elements in the primate germ line for at least 25 million years, there is at least one HERV-K solo LTR present in chimpanzees but not in humans (Craig et al. 1991), implying the movement of these elements within the last 5 million years and supporting the LTR sequence analysis data. Unfortunately, it is impossible to gauge how the rate of amplification has changed during evolution or if there are low levels of continuing movement at the current time. One example of an endogenous retrovirus-associated polymorphism identified in humans represents the loss of an HERV-H element, leaving behind a solo LTR in a pair of siblings (Mager and Goodchild 1989); a second involves the apparent gain of HERV-K env sequences in one individual, although it is unclear whether this gain corresponds to proviral amplification (Steinhuber et al. 1995).

Characterization and Genetic Mapping of Specific Endogenous Retroviruses

An important step in studying the biological properties of endogenous proviruses has been the development of strategies for using Southern blotting technology to permit the unambiguous physical identification of individual proviruses. Southern analysis with viral probes reveals two classes of restriction fragments; the first class, containing only viral sequences, is termed internal fragments, whereas the second class, containing viral and adjoining cellular DNAs, is termed junction fragments. Internal fragments are useful for structural comparisons between proviruses, but they yield relatively little information identifying individual proviruses (although proviruses may contain characteristic deletions). In contrast, junction fragments provide little information about proviral structure, but they do allow identification of individual proviruses, since each provirus is integrated within a specific cellular sequence. Thus, each provirus will yield a characteristic profile of junction fragments with different restriction enzymes. For certain studies, it may be advantageous to generate specific unique sequence probes for cellular DNA flanking proviral loci; these will give different-sized reactive fragments, depending on the presence or absence of the provirus, and will facilitate an analysis of the distribution of that provirus. Knowledge of the flanking sequence also allows ready detection of specific insertions by PCR.
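The distinction between internal and junction fragments can be made concrete with a toy digest. The sketch below is an illustration under simplifying assumptions: coordinates, site positions, and function names are all hypothetical, the genome is treated as a linear sequence that already contains the provirus, and a whole-virus probe is assumed to detect any fragment that includes viral sequence.

```python
PROVIRUS_LENGTH = 9000                    # hypothetical provirus size in bp
VIRAL_SITE_OFFSETS = [1000, 4000, 8000]   # hypothetical cut sites inside the provirus


def classify_fragments(cut_sites, provirus_start, provirus_end):
    """Return (internal, junction) fragment lengths seen with a whole-virus probe."""
    sites = sorted(cut_sites)
    internal, junction = [], []
    for left, right in zip(sites, sites[1:]):
        # Fragments with no viral sequence are not detected by the probe.
        if right <= provirus_start or left >= provirus_end:
            continue
        if left >= provirus_start and right <= provirus_end:
            internal.append(right - left)   # viral DNA only
        else:
            junction.append(right - left)   # viral plus flanking cellular DNA
    return internal, junction


def digest_locus(insert_at, flanking_sites):
    """Digest one proviral locus given the nearest cut sites in the flanking DNA."""
    viral_sites = [insert_at + off for off in VIRAL_SITE_OFFSETS]
    return classify_fragments(flanking_sites + viral_sites,
                              insert_at, insert_at + PROVIRUS_LENGTH)


# The same provirus at two different integration sites.
locus_a = digest_locus(10000, [7500, 21000])
locus_b = digest_locus(50000, [49000, 65000])
print("locus A internal/junction:", locus_a)
print("locus B internal/junction:", locus_b)
```

In this toy case the internal fragments are the same at both loci, whereas the junction fragments differ because they depend on where the nearest site lies in the flanking cellular DNA.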

Early studies with endogenous avian viruses (Astrin et al. 1980; Hughes et al. 1979, 1981) showed that endogenous proviruses have the same basic structure as proviruses acquired by exogenous infection and, furthermore, that each locus is correlated with a unique SacI junction fragment (Table 4). Different endogenous loci of chickens are associated with different viral phenotypes characterized by the presence or absence of viral production, either spontaneously or after induction, and by the expression of protein related to the gag gene (gs) or the env gene (chf). A good correlation can be seen between the proviral structure and phenotype.

Table 4. Properties of Some Endogenous ALVs.

Individual birds in commercial flocks of layer and broiler chickens contain up to ten endogenous ALVs, although few proviral loci are widely distributed in different flocks. Exceptions include ev-1, which is virtually ubiquitous in White Leghorn chickens (Tereba and Astrin 1980), although frequently absent in other populations (Boulliou et al. 1991; Iraqi et al. 1991), and ev-21, which is associated with the sex-linked gene for the slow-feathering phenotype (Bacon et al. 1988). This phenotype is widely used by breeders in the sexing of poultry; consequently, ev-21 is found in many stocks of birds. Unfortunately, although slow feathering provides a cost-effective method for sexing birds, the presence of ev-21 is associated with increased susceptibility to exogenous ALV, resulting in lower productivity and higher mortality rates. Some rapid-feathering revertants of slow feathering retain ev-21 junction fragments, indicating that ev-21 cannot be the sole determinant of slow feathering (Smith and Levin 1991).

The Southern blot approach works well with probes that detect relatively small groups of endogenous proviruses. Thus, whole-virus probes can be used to study endogenous ALVs and MMTVs, but they are less helpful when greater numbers of endogenous sequences are encountered. For example, Southern blots of mouse DNA, probed with the relatively nonspecific MLV probes originally available, revealed a bewildering complexity of bands, indicating many more proviruses than could readily be distinguished or enumerated (Chattopadhyay et al. 1980). This complexity hindered tests of the hypothesis that the strain-specific differences observed in spontaneous expression of ecotropic MLV (Risser et al. 1983) were due to the inheritance of different proviruses in the different strains. The development of probes specific for ecotropic MLVs (Chattopadhyay et al. 1980) made it possible to examine the distribution and inheritance of specific ecotropic proviruses. For example, Jenkins and her co-workers (1982) identified 29 different proviruses, each located at a unique site in the genome, in the 54 inbred strains examined. Upon digestion with PvuII, each provirus gives a unique virus-host junction fragment, depending on the position of the nearest PvuII site in the flanking DNA sequence. The endogenous ecotropic MLV loci have been designated Emv (for ecotropic murine leukemia virus locus) followed by a number. A strict correlation has been observed between inheritance of specific proviral loci, only a few of which are fully functional (Copeland et al. 1984), and the virological status of the animals. Most inbred strains contain either no endogenous ecotropic provirus or a single one; a few contain five or six (Jenkins et al. 1982). To date, more than 50 Emv loci have been described and most have been mapped to specific chromosomal locations (Fig. 3).

Figure 3. Chromosomal distribution of endogenous MMTV and MLV-related proviral sequences.

Shown are the chromosomal map locations of endogenous MMTVs (Mtv loci), ecotropic MLVs (Emv), polytropic MLVs (Pmv), modified polytropic MLVs (Mpmv), and xenotropic MLVs (Xmv).

Genetic studies of the nonecotropic proviruses, normally present at 40–80 copies per genome equivalent, followed a similar pattern. The high degree of nucleotide sequence identity initially complicated the analysis. Detailed structural analyses of the nonecotropic proviruses allowed their subdivision into three different groups: polytropic, modified polytropic, and xenotropic (Stoye and Coffin 1987). Oligonucleotide probes specific for these groups were prepared and used to show that they were present in approximately equal proportions in numbers which could be resolved by Southern hybridization analysis (Stoye and Coffin 1988). With these probes, it was possible to map genetically essentially all of the endogenous nonecotropic MLVs present in inbred strains of mice (Frankel et al. 1989a,b, 1990, 1992). More than 160 proviruses have now been mapped (Fig. 3). These proviral loci are termed Xmv (xenotropic), Pmv (polytropic), and Mpmv (modified polytropic). It is important to remember that the names given to these loci reflect reactivity with specific hybridization probes; in most cases, host range and infectivity have not been tested. More recently, LTR probes specific for the different nonecotropic MLVs have been used in mapping studies. Although the complexity of the patterns generated by probes that recognize each provirus twice precluded the comprehensive studies possible with the env probes, it still proved possible to draw the important conclusion that solo LTRs do not constitute a significant fraction (<10%) of the MLV group (Frankel and Coffin 1994).

A number of general conclusions can be drawn from these mapping studies of mice (Frankel et al. 1990). First, each strain has a unique and characteristic set of fixed endogenous proviruses. All mice from a given inbred strain have the same set of proviruses. This pattern can provide a useful fingerprint in cases of mixed strain identity (Naggert et al. 1995). Very few, if any, proviruses are shared by all strains of mice, implying that germ line colonization occurred after speciation. However, there are few proviruses unique to one strain, suggesting a relatively limited pool of animals, in which most proviruses were already present, as precursors to modern inbred strains of mice. Second, virtually all polymorphisms result from distinct integration events rather than restriction site polymorphisms, again consistent with a relatively recent origin for the MLVs. Third, proviruses are randomly scattered across the mouse genome. With the possible exception of a group of four proviruses tightly linked to the retroviral restriction gene Fv1 (see below), there is no evidence for clustering or other nonrandom patterns of integration. Fourth, the proviruses are quite stable, showing rather low frequencies of gain or loss (see below).

Mapping the elements present at still higher numbers presents a yet more formidable challenge. One approach, analogous to that followed with the MLVs, has been to use subfamily-specific oligonucleotide probes. Approximately 100 IAP proviruses have been mapped in this way (Lueders and Frankel 1994; Lueders et al. 1993b). Alternatively, efforts have been made to improve the resolution of electrophoretic systems. Examples of this approach include running very long wedge-shaped gels (Brilliant et al. 1991) and using two-dimensional gel systems (Sheppard et al. 1991). Recently, a PCR-based approach, termed REVEAL-PCR (for repeat element viral element amplified locus-PCR), has been developed to examine the inheritance of IAP proviruses (Kaushik and Stoye 1994). This technique identifies the fraction of IAPs with genomic locations close enough to SINEs (short interspersed nuclear elements) to yield PCR products primed by conserved sequences in the provirus LTR and the SINE. One advantage of this approach is that it allows the facile cloning of individual fragments for preparation of locus-specific probes. The appropriate choice of primers may facilitate the analysis of retrovirus-like families present in the genomes of other vertebrates including humans. REVEAL analysis of the IAP content of a limited number of strains demonstrated a significant degree of IAP element polymorphism between inbred strains (20–40%) and yielded conclusions concerning the distribution and stability of these elements very similar to those drawn in the MLV studies (Kaushik and Stoye 1994).
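The logic of the REVEAL approach, in which only IAP LTRs lying close to a SINE yield a product, can be illustrated with a toy model. Everything here is an assumption made for illustration: the coordinates, the 2000-bp upper limit on product size, and the function names are hypothetical, and primer orientation is ignored.

```python
def reveal_products(ltr_positions, sine_positions, max_product=2000):
    """Return (ltr, sine, size) for each LTR whose nearest SINE is within range.

    max_product is an assumed upper limit on amplifiable distance in bp.
    """
    products = []
    for ltr in ltr_positions:
        nearest = min(sine_positions, key=lambda s: abs(s - ltr), default=None)
        if nearest is not None and 0 < abs(nearest - ltr) <= max_product:
            products.append((ltr, nearest, abs(nearest - ltr)))
    return sorted(products)


def polymorphic_fraction(bands_a, bands_b):
    """Fraction of all observed bands that are not shared by two strains."""
    union = set(bands_a) | set(bands_b)
    if not union:
        return 0.0
    shared = set(bands_a) & set(bands_b)
    return 1 - len(shared) / len(union)


# Hypothetical positions on one chromosome (bp).
ltrs = [1000, 50000, 120000]
sines = [1800, 70000, 120500]
products = reveal_products(ltrs, sines)
print("predicted product sizes:", [size for _, _, size in products])

# Hypothetical band sizes from two strains.
print("polymorphic fraction:", polymorphic_fraction({800, 500, 1200}, {800, 500, 300}))
```

The LTR at position 50000 yields no product in this model because its nearest SINE lies beyond the assumed size limit, which mirrors the point that the technique samples only a fraction of the IAP loci.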

Experimentally Induced Endogenous Proviruses

Retroviral genomes can be introduced experimentally into the germ line of mice. In initial studies, performed by Jaenisch and collaborators, a number of substrains of mice, each carrying a single copy of Moloney murine leukemia virus (Mo-MLV) as an endogenous provirus at unique chromosomal sites (Mov loci), were bred after exposing embryos at different stages of development to Mo-MLV (Jaenisch 1976, 1980; Jaenisch et al. 1981). About half of the animals born following exposure as preimplantation embryos contain novel proviruses, and many of these are transmitted through the germ line (Soriano and Jaenisch 1986). Exposure of postimplantation embryos is much less efficient (Soriano et al. 1987). Alternatively, it is possible to obtain germ line transmission of proviruses formed following in vitro infection of embryonic stem cells, a procedure that facilitates introduction of multiple viruses (Robertson et al. 1986) or selection of integrations into specific sites (Kuehn et al. 1987).

Experimentally generated endogenous proviruses have found a number of applications (Jaenisch 1988), including acting as chromosomal or lineage markers (Soriano and Jaenisch 1986) and as vectors for introducing novel genes into the germ line (van der Putten et al. 1985). Study of these proviruses has yielded important clues about the role of methylation in gene control (Jähner et al. 1982; Jähner and Jaenisch 1985), the effect of integration site on the level of viral gene expression (Jaenisch et al. 1981; Barklis et al. 1986), and the role of point mutations in controlling viral replication (Schnieke et al. 1983b). In addition, analysis of the Mov13 mouse, which carries a Mo-MLV insertion that prevents α-collagen expression, has provided a paradigm for studies of insertional mutagenesis in mice (Barker et al. 1991).

More recently, recombinant ALVs have been introduced into the chicken germ line by injecting virus into fertilized eggs (Salter et al. 1987; Crittenden et al. 1989). In contrast to the results obtained with Mov transgenic mice (Jaenisch et al. 1981), where expression of the introduced provirus is tightly controlled (see below), experimentally introduced ALV proviruses are expressed at high levels in the avian system (Brisken et al. 1991; Federspiel et al. 1991).

Movement of Endogenous Retroviruses

Since the original discovery of endogenous proviruses, there has been considerable interest in the factors controlling the stability and movement of these elements. Such information might provide the basis for a reasonable description of the origin and current pattern of endogenous proviruses.

Observing Germ Line Movement

On the whole, endogenous proviruses are remarkably stable, with little apparent change from generation to generation. However, by using laboratory mice, with their high degree of inbreeding, defined pedigrees, and well-characterized complements of MLVs, it is possible to carry out systematic studies of proviral stability. These studies show that provirus acquisition and loss are ongoing processes. The most extensive of these studies concern the ecotropic MLV loci of the different substrains of AKR mice (Buckler et al. 1982; Herr and Gilbert 1982; Moore and Chan 1982; Quint et al. 1982; Steffen et al. 1982). At least 17 different substrains, all descendants from the original AKR strain, have been identified and their proviral content examined by Southern analysis. All substrains express high levels of ecotropic MLV and contain between two and six independently segregating ecotropic MLVs, but only one, Emv11, is shared among all strains. Other proviruses are shared to an extent correlated with the genealogical relationship between the substrains. Within given colonies of mice, extra unfixed proviruses could be detected. These data are most easily interpreted by asserting that the original stock contained Emv11 and that this provirus can give rise to novel germ line insertions (see below). These loci are fixed at a certain frequency during inbreeding. AKR mice appear to gain one ecotropic MLV provirus per 50–100 generations of inbreeding. SWR/J-RF/J hybrid mice show a higher frequency of germ line MLV acquisition with up to 75% of the progeny carrying novel proviruses (Jenkins and Copeland 1985). RF/J mice carry two ecotropic proviruses, Emv16 and Emv17. Both are capable of generating new insertions (Szabo et al. 1993) but are normally not expressed in RF mice because milk-transmitted maternal antibody is present in neonates and a restrictive allele of Fv1 inhibits viral replication (Chen et al. 1980). SWR/J provides a susceptible strain for amplification, and cross-breeding allows expression of ecotropic MLV in the absence of maternal antibody.

The nonecotropic proviruses are also increasing in copy number but more slowly than the ecotropic viruses. For example, analysis of the proviral content of common recombinant inbred strains revealed only two MLVs (one xenotropic and one polytropic) that appear not to be present in the parental strains (Frankel et al. 1990). The recombinant inbred mice examined correspond to a survey covering 7000 meiotic generations, implying that the current rate of nonecotropic provirus acquisition is about one per 3500 meioses (i.e., 35–70-fold slower than for ecotropic viruses). Analysis of the different AKR substrains using nonecotropic MLV probes gives a similar estimate of reinsertion frequency (J.P. Stoye, unpubl.).
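The comparison of acquisition rates can be checked with a short calculation. The variable names are hypothetical, and the sketch follows the comparison in the text in treating generations of inbreeding and meioses as interchangeable units.

```python
# Figures quoted in the text.
novel_nonecotropic_proviruses = 2
meioses_surveyed = 7000
ecotropic_generations_per_gain = (50, 100)   # AKR: one gain per 50-100 generations

# Meioses per nonecotropic acquisition.
nonecotropic_meioses_per_insertion = meioses_surveyed / novel_nonecotropic_proviruses

# How many times slower nonecotropic acquisition is than ecotropic acquisition.
fold_slower = tuple(sorted(
    nonecotropic_meioses_per_insertion / g for g in ecotropic_generations_per_gain
))

print(f"one nonecotropic provirus per {nonecotropic_meioses_per_insertion:.0f} meioses")
print(f"{fold_slower[0]:.0f}- to {fold_slower[1]:.0f}-fold slower than ecotropic")
```

With only two novel proviruses observed, the estimate of one per 3500 meioses carries wide uncertainty; the calculation shows where the quoted range comes from and should not be read as a precise rate.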

Evidence for the continued germ line movement of other murine retroviral elements comes from the analysis of mutations. Several of these mutations are caused by insertion of retroviral elements belonging to classes different from those of the MLVs (see below).

In several cases, loss of germ line proviruses has been observed. However, there are no known examples of precise proviral excision. Most commonly, provirus loss occurs by homologous recombination between the two LTRs, since in each case examined, a solo LTR remains at the integration site. The best documented case concerns the ecotropic provirus, Emv3, which caused the dilute (d) mutation of mice (Copeland et al. 1983; Hutchison et al. 1984). Loss of the provirus causes a readily identifiable phenotypic reversion of d. Since the mutation is carried in DBA/2 and other popular strains of mice that are bred in large numbers, it is possible to make a good estimate of the excision frequency. In 1.1 million mice observed, 10 revertants were seen, giving an excision rate of 4.5 × 10⁻⁶ per meiosis (Seperack et al. 1988). A similar estimate was made for nonecotropic MLVs in recombinant inbred mice by following the loss from recombinant inbred strains of proviral junction fragments shared by the progenitor strains (Frankel et al. 1990). Germ line excision rates significantly exceed those found in somatic cells (Seperack et al. 1988). It should be remembered that thousands of solo LTRs of ancient proviruses (but not MLVs) can be found scattered across the genome of both mice and humans (Keshet et al. 1991; Wilkinson et al. 1994), suggesting that proviral loss in this manner may play an important part in relieving the proviral load in the course of evolution.
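The excision rate quoted for Emv3 can be reproduced as follows. The function name is hypothetical, and the sketch assumes that each mouse scored represents two meioses (one for each parental gamete); this assumption is inferred from the quoted figure and is not stated explicitly in the text.

```python
def excision_rate_per_meiosis(revertants, mice_observed, meioses_per_mouse=2):
    """Excision events per meiosis.

    meioses_per_mouse=2 is an assumption: one meiosis per parental gamete.
    """
    return revertants / (mice_observed * meioses_per_mouse)


rate = excision_rate_per_meiosis(revertants=10, mice_observed=1_100_000)
print(f"excision rate: about {rate:.1e} per meiosis")
```

Counting one event per mouse instead (meioses_per_mouse=1) would give a rate twice as high, so the per-meiosis figure depends on this bookkeeping choice.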

Mechanisms of Proviral Acquisition

How do increases in copy number occur? A number of investigators have pointed out apparent structural similarities of proviruses to both prokaryotic and eukaryotic transposable elements. This analogy raises the possibility that new proviruses might arise by direct DNA transposition rather than by a process involving reverse transcription of an RNA intermediate. However, all the available evidence favors a mechanism of proviral acquisition involving reverse transcription.

The best studied case concerns the increases in the number of ecotropic MLV proviruses in the mouse germ line. For a number of years, it has been known that proviral amplification occurs in the progeny of viremic females but not in progeny of viremic males (Rowe and Kozak 1980; Jenkins and Copeland 1985), suggesting that amplification involved the infection of the female germ line and/or embryos. To study this effect more closely, genetically tagged ovaries from mice lacking endogenous ecotropic proviruses were transplanted into the ovarian bursas of hosts carrying Emv16 and Emv17. These mice were then crossed to virus-negative males, and the progeny were analyzed for novel proviruses. Mice derived from the donor ovary contained novel proviruses at a frequency comparable with that of nontransplanted animals (Lock et al. 1988). As ecotropic proviral acquisition is occurring in cells which themselves do not inherit ecotropic proviruses, this result can only be explained by a mechanism involving infection. The only germ cells present in the donor ovaries are oocytes, implying that these are the target for infection. Experiments involving the acquisition of novel proviruses in mice whose mothers did not contain germ line copies of the acquired viruses reinforce the conclusion that the novel proviruses result from retroviral replication. In one case, ecotropic MLV was introduced into the germ line of mice whose mothers were infected as newborns with ecotropic MLV (Lock et al. 1988; Panthier et al. 1989). A second case concerns a substrain of AKR that now carries a germ line copy of the virus responsible for tumors in AKR mice (Quint et al. 1982). Normally, the leukemia virus is generated by a series of recombination events in somatic cells (see below). Finally, in situ hybridization experiments show that oocytes from SWR mice carrying Emv16 and Emv17 do not express detectable levels of endogenous viral RNA (Lock et al. 1988). If these are the cells in which novel proviruses are generated, the virus must originate from other cells. Similar conclusions were reached from analogous studies of the Drosophila retrovirus, gypsy (Pélisson et al. 1994; Song et al. 1994).

Taken together, these data imply that the initial germ line colonization as well as proviral amplification can take place by a mechanism of extracellular infection essentially no different from that seen with exogenous viruses. However, many endogenous proviruses are known to be xenotropic, i.e., do not recognize a receptor on cells of their hosts. How could these viruses enter the germ line by infection if there is no receptor? The most likely answer is that loss of functional receptors has occurred subsequent to germ line infection, quite possibly selected by the desirability of preventing further proviral amplification (see below). Alternatively, different receptors may be used. Limited amplification of xenotropic proviruses can apparently occur even in animals lacking the corresponding receptor (Frankel et al. 1990; J.P. Stoye, unpubl.). In these cases, it seems likely that the SU function is provided in trans by another retrovirus or perhaps even by another cellular gene with fusion activity (Blobel et al. 1992). Structural studies of endogenous proviruses provide ample evidence, discussed above (see also Keshet et al. 1991), for recombination between different groups of viruses. Similar data exist for recombination between different MLVs; for example, the provirus that caused the hairless mutation (see below) is a recombinant between an Xmv and a Pmv element (J.P. Stoye, unpubl.). Recombination between retroviruses implies copackaging as a result of coexpression. Under conditions of coexpression, pseudotype formation can occur. Thus, polytropic MLV might provide env genes, allowing amplification of xenotropic MLV, and xenotropic virus might complement possible functional defects in the polytropic provirus.

This concept can be taken further to provide a retroviral explanation of the movement of elements such as IAPs, most of which lack env genes, or VL30s, which apparently lack all protein-coding capacity but possess the cis-acting signals necessary for replication by gene products encoded by other retroviral elements. Thus, env function for IAP movement might be supplied in trans by, for example, the IAPE group, and VL30 elements might travel between host genomes as molecular hitchhikers.

Although it seems virtually certain that all cross-species transmission events must entail an infection event, it is by no means certain that all novel insertions require an extracellular infectious phase for viral replication. Indeed, a substantial body of evidence from tissue culture studies is consistent with the notion that increases in proviral copy number could occur via intracellular retrotransposition. These studies use proviral constructs with reporter genes that are activated only after a round of replication through RNA intermediates (analogous to those used to show a reverse transcription step in the replication of Ty elements in yeast; see below). Similar results have been obtained with constructs derived from both IAPs (Heidmann et al. 1988; Heidmann and Heidmann 1991) and defective human elements (Tchénio and Heidmann 1991). However, the efficiency of transfer in the absence of an extracellular phase is many orders of magnitude less than transfer by infection. It remains to be determined whether the large number of germ line proviruses that lack env genes were in fact inserted by such an intracellular retrotransposition mechanism. Oocyte expression of IAP RNA has been reported (for review, see Kuff and Lueders 1988) that would be consistent with this possibility. Nevertheless, the major IAP species amplified in plasmacytomas appears to be deleted in gag, and if Gag function is required, then it must be supplied by complementation (Lueders and Kuff 1989). If Gag proteins are supplied in trans, there would appear to be no reason why Env function should not be as well.

In a few cases, for example, the MuRVY elements discussed above, proviral elements have been coamplified with their associated flanking sequences, thereby implying that proviral copy number can increase by a mechanism involving gene duplication. To date, such examples appear to be restricted to integrations on the Y chromosome and may reflect the properties of this chromosome, which mainly consists of highly repeated DNA sequences of no obvious function (Bishop 1992).

Establishment of Current Patterns of Endogenous Retroviruses

Combining the “fossil” data provided by studies of the structures and locations of different endogenous elements with our understanding of proviral movement derived from studies of proviral amplification, the following picture of the history of endogenous retroviruses emerges. In each species, a collection of clearly distinguishable groups of endogenous proviruses is present. Each proviral group results from an initial cross-species infection event, requiring an extracellular replication phase, by a virus usually of unknown origin, rather than by coevolution within their hosts. Phylogenetic data show that these infections have occurred over tens of millions of years, implying that members of some of the major genera of retroviruses have existed for at least this period of time. We have no means of estimating the frequency of these cross-species infections; viruses that cannot coexist with a particular host—for example, if they are very tumorigenic in that host—will not become fixed in that population.

Following the initial cross-species infection, proviral amplification can occur by reinfection or, alternatively, by intracellular retrotransposition. Different proviruses of a given group often form a discrete and homogeneous set separated only by a few rounds of replication as a retrovirus (King et al. 1987; Stoye and Coffin 1987). This implies that progeny of the initial provirus should mainly be regarded as siblings, rather than representing successive generations. The degree of amplification will vary, reflecting the selective forces imposed by the virus on the host and vice versa. Proviral amplification will cease when the founder provirus has accumulated sufficient random mutations to make it noninfectious. Judged from the examples of HERV-H and HERV-K discussed above, this process may take many millions of years. Alternatively, amplification will cease when the founder provirus is lost due to homologous recombination between its LTRs or if the host develops mechanisms to limit integrative bombardment of its genome.

The best characterized mechanisms for acquisition of host resistance involve different ways of preventing virus binding to cell surface receptors. They include selection for proviruses expressing env, for example, Fv4 (see below), which interfere with receptor binding, as well as changes to the host-encoded receptor that make it unable to be recognized by the virus. Thus, the receptor for xenotropic MLV seems to be absent in inbred strains of mice, although it is present in many wild mice (Kozak 1985). However, it is unlikely that the resistant mice have lost the ability to make the receptor protein altogether, since the viral receptors identified to date have important physiological roles in the cell (see Chapter 3). In fact, the xenotropic receptor is most probably allelic with the polytropic viral receptor (Kozak 1985) which is certainly present on murine cells. It is much more likely that point mutations are involved. For instance, a single amino acid change in the ecotropic receptor on cells from Mus dunni mice is sufficient to render them resistant to Mo-MLV but not to other ecotropic viral subtypes (Eiden et al. 1993). Alternatively, alterations in posttranslational processing may result in masking of the receptor, as apparently occurs with the feline receptor for the xenotropic RD114 virus (Dunn et al. 1993).

Although attention has focused on surface events, these are not the only level at which host-mediated inhibition of retroviral infection can occur. As illustrated by the case of Fv1, described below, resistance appears to be manifested at a number of other levels including integration and expression. It is tempting to speculate that understanding these naturally occurring processes might suggest novel approaches to combating retroviral infections.

Expression of Endogenous Retroviruses

The study of endogenous proviral expression has attracted considerable attention during the past 30 years. Individual endogenous proviruses are usually transcriptionally silent. Even so, expression of endogenous retroviruses has been recorded in every species examined. In some cases, expression leads to replication of infectious virus and can manifest itself as viremia; in others, it corresponds to production of single genes and their products in specific cell types. Expression can have a number of consequences, from virus-induced tumor formation to protection from viral infection. Without expression, there can be no movement of elements and no insertional activation or inactivation of genes.

Control of Expression

It is often difficult to explain the patterns of endogenous viral expression in the animal in terms of the properties of individual proviruses, because the high copy number of some families makes it very difficult to distinguish which members of a group are expressed and whether a pattern of expression is a property of the whole group or one or more members. Expression of some elements appears ubiquitous, whereas other families show spatially and temporally restricted patterns of expression (Keshet et al. 1991). For example, VL30 expression is observed in all late embryonic tissues examined (Norton and Hogan 1988), whereas ETns are expressed only in very early embryos and adult lymphoid cells (Brûlet et al. 1983, 1985; Shell et al. 1987). Human elements also show characteristic patterns of expression in different tissues (Wilkinson et al. 1994). Regulation of certain groups of elements is very tightly controlled with regard to differentiation. Thus, ERV-9 is expressed in undifferentiated human teratocarcinoma cells but is downregulated after retinoic-acid-induced differentiation (La Mantia et al. 1991), whereas a sequence termed RRHERV-I shows a reciprocal pattern of expression (Kannan et al. 1991). Other elements are inducible by a wide variety of physiological stimuli, for example, agents that stimulate the differentiation of embryonal carcinoma cells, with the VL30s representing one of the most promiscuous families (Keshet et al. 1991). Another property of some families of endogenous sequences is enhanced activity in tumors or transformed cells (Kuff and Lueders 1988). These differentiation-linked and transformation-linked changes of expression have resulted in the frequent cloning of retroviral genes in differential cDNA screening experiments.

A variety of studies have shown that members of different families are extremely heterogeneous with respect to transcription and that different subsets of a given family of elements are expressed in different tissues at different times. Comparison of ETn cDNAs isolated from embryos and plasmacytomas reveals that different subclasses of proviruses are transcribed (Shell et al. 1990). Comparison of IAP cDNAs from lipopolysaccharide-stimulated normal lymphocytes and plasmacytomas (Mietz et al. 1992; Lueders et al. 1993a) and during myelomonocytic differentiation (Takayama et al. 1991) gives similar results. Nuclease S1 mapping experiments of VL30 transcripts show that distinct VL30 elements are independently regulated in different tissues during mouse development (Norton and Hogan 1988).

What controls the differential expression of these elements: sequences inherent to the elements themselves or factors and cis-acting sequences from the host? Studies with chimeric exogenous viruses have pointed to the role of the LTR in determining the tissue tropism for viral replication (see Chapter 6); unambiguous evidence indicates that LTR sequences also have a key role in controlling the expression of endogenous elements. These include studies showing a correlation between specific LTR sequences and specific expression patterns (Mietz et al. 1992; Lueders et al. 1993a; Nilsson and Bohm 1994) as well as gene transfer experiments (by transfection or using pseudotyping of VL30 sequences), indicating that LTR sequences alone can drive tissue-specific inducer-responsive expression (Rodland et al. 1987; Morgan et al. 1988; Schiff et al. 1991; Nilsson and Bohm 1994).

Host factors can also control expression of endogenous proviruses. One classic example concerns the GIX antigen of strain 129 mice (Stockert et al. 1971). The GIX-positive phenotype, which results from expression of MLV SU proteins on thymocytes, is controlled by a locus, Gv1, which coordinately regulates the expression of multiple endogenous MLV proviruses (Levy et al. 1982, 1985).

It is highly likely that one of the most important factors controlling expression of a provirus is its position within the chromosome. Transgenes integrated in heterochromatic regions, for example, near the centromeres of mouse chromosomes, are often transcriptionally silent or show position-effect variegation (Butner and Lo 1986; Festenstein et al. 1996). Sequence elements known as silencers can lead to local repression embedded within a region of euchromatin (Loo and Rine 1994). Studies of retroviral expression in embryonal carcinoma cells apparently provide a striking example of the role of cis-acting sequences in controlling retroviral expression. The viral promoter does not work in these cells, most probably as the result of a silencer that overlaps the viral primer-binding site (Yamauchi et al. 1995), but rare integration events allow expression of virally encoded genes (Barklis et al. 1986). However, they do not reflect activation of the viral promoter but rather the generation of spliced messages initiated within the 5′-flanking region of the viruses (Peckham et al. 1989; Bonnerot et al. 1992). Unfortunately, there is little unambiguous data showing effects of cis-acting sequences on endogenous proviruses, because point mutations in the viral genome that might have occurred during replication prior to integration complicate the interpretation of many experiments. Nevertheless, studies of the Mov proviruses provide reasonably convincing evidence for differential, presumably position-dependent, activation of different loci (Berleth et al. 1987).

Host-mediated DNA methylation apparently plays an important part in controlling proviral expression. Treatment of cells with the methylation inhibitor 5-azacytidine has been shown to induce the expression of endogenous ALV (Groudine et al. 1981), endogenous MLV (Niwa and Sugahara 1981), and IAP genes (Lasneret et al. 1983) in vitro. Injection of azacytidine into mice activates silent retroviral proviruses (Jaenisch et al. 1985). An inverse correlation between the level of DNA methylation of an Mo-MLV genome and its activity in transfection assays has been observed (Simon et al. 1983). Estimates of the number of transcribed VL30 sequences coincide with estimates of the number of unmethylated proviruses (Carter et al. 1986); IAP transcripts found in differentiated lymphocytes are of the same sequence as hypomethylated proviruses (Mietz et al. 1992; Lueders et al. 1993a).

Despite all this, there is considerable uncertainty about the precise role of methylation in regulating proviral expression because methylation levels have a tendency to react to the transcription level (Enver et al. 1988; Sullivan et al. 1989). It is therefore hard to determine whether transcription controls methylation or vice versa, and the question of why specific proviruses are expressed remains unresolved. Part of this uncertainty stems from gaps in our understanding of the importance of methylation in the control of eukaryotic gene regulation (Bird 1992; Tate and Bird 1993). We know that methylation of CpG residues is associated with repression of gene activity and that preventing methylation by ablation of DNA methyltransferase prevents complete embryonic development (Li et al. 1992). However, the precise nature of the requirement remains something of a mystery. One suggestion is that DNA methylation evolved from the immune recognition system in bacteria in order to repress newly acquired transposable elements and retroviruses (Bestor 1990). Consistent with methylation as a defense mechanism is the observation that a wave of de novo methylation occurs shortly after the implantation of the embryo (Kafri et al. 1992). This would act on all novel proviruses resulting from the infection of the oocyte/early embryo but not on those resulting from later infections, a picture that fits observed patterns of Mo-MLV methylation following pre- and postimplantation infection (Jähner et al. 1982). Subsequent reactivation of specific loci must reflect the influence of sequences flanking the provirus. How this takes place remains an important open question.

Interaction with Infecting Exogenous Viruses

Expression of endogenous retroviruses can influence the outcome of infection by exogenous viruses in a number of different ways both beneficial and detrimental to the host. These include modulation of immune responses to exogenous viruses, provision of a source of novel viral genes for recombination with exogenous virus, blocking cellular receptors for virus, and ablating potential viral target cells.

An example of opposing effects is illustrated by an analysis of chickens responding to ALV challenge. This study showed that the expression of endogenous viruses is directly correlated with a reduced titer of virus-neutralizing antibodies (Crittenden et al. 1982, 1984). This reduced immune response, which presumably results from immunological tolerance to antigens shared between the endogenous and exogenous viruses, results in delayed clearance of the infectious virus. On the other hand, birds expressing endogenous proviruses are much less likely to develop a wasting syndrome. This syndrome apparently results from immunological destruction of infected cells and represents one of the major pathogenic consequences of ALV infection. It is not known whether expression of endogenous elements modulates immune responses to infection in other systems.

As discussed in Chapter 4, expression of two proviruses in the same cell can lead to phenotypic mixing and viral RNA copackaging. Numerous examples involving endogenous viruses have been observed, some of considerable practical importance. The most straightforward cases are those following exogenous infection of cells or tissues expressing endogenous proviral sequences by avian, murine, or feline leukemia viruses. A somewhat more complicated example is provided by the passage of human tumors through nude or immunosuppressed mice. Under these conditions, efficient infection of the human cells by endogenous xenotropic MLV takes place (Tralka et al. 1983; Lusso et al. 1990). If the human cells are infected with human immunodeficiency virus type 1 (HIV-1), phenotypic mixing between the murine and human viruses can result (Lusso et al. 1990). The presence of such viruses, especially if polytropic viruses are simultaneously mobilized, may seriously complicate the analysis of HIV-1 in animal model systems such as the SCID-Hu mice (see Chapter 11).

Infection by virions containing copackaged RNA can result in recombination and the generation of viruses with altered biological properties. For example, recombination events within env leading to viruses with altered host ranges and, sometimes, enhanced pathogenic potential (Khan 1984; Stewart et al. 1986) are frequently observed. In mice, different exogenous viruses seem to recombine with progeny of different endogenous proviral loci, perhaps reflecting the tissue specificity of viral replication (Evans and Cloyd 1985). Several studies have shown that defective viruses carrying deletions in essential genes can be repaired by recombination with endogenous retroviruses to yield replication-competent viruses (Martinelli and Goff 1990; Murphy and Goff 1994). This is of particular concern in the preparation of vector products designed for therapeutic use, especially when the replication-competent viruses are capable of tumor induction (Vanin et al. 1994). This problem was particularly acute with vectors prepared using the first generation of packaging cells; recently developed lines, although less dangerous, still pose problems (see Chapter 9).

A further complication for vector production results from copackaging of unwanted RNA passengers with unanticipated potential side effects. For example, all virions prepared from rodent cells contain VL30 sequences (Besmer et al. 1979; Scolnick et al. 1979). Indeed, levels of VL30 RNA may exceed the titer of the desired vector construct (Meric and Goff 1989; Torrent et al. 1994). Expression of both VL30 (Chakraborty et al. 1994) and defective MLV (Scadden et al. 1990) RNAs has been detected in human cells transduced with retroviral vectors. VL30 and defective MLV might both act as insertional mutagens for either cellular proto-oncogenes or tumor suppressor genes (Largaespada et al. 1995). Alternatively, their gene products might represent novel targets for the immune system. To date, only well-characterized VL30 sequences have been found in rodent cells, so the use of nonrodent cells for vector production may circumvent this problem. However, it seems quite likely that more elements showing the same form of viral parasitism as the VL30s (i.e., elements containing cis-acting signals for expression, reverse transcription, packaging, and integration by gene products encoded by other elements) will be described. Both chicken ART-CH (Nikiforov and Gudkov 1994) and human THE-1 (Paulson et al. 1985) elements might fall into this category. Whether such elements will present practical complications to the use of retroviral vectors remains to be determined, but clearly, the packaging of unanticipated hitchhikers represents a potentially serious danger.

The presence of certain endogenous retroviruses can benefit the host by conferring resistance to superinfection with exogenous viruses. env gene expression from endogenous proviruses apparently blocks interaction of exogenous virus with its receptor. Examples of this phenomenon can be found in chickens (Payne et al. 1971), mice (Buller et al. 1987; Limjoco et al. 1993), and, apparently, cats (McDougall et al. 1994). Mice contain two genes, Fv4 and Rmcf, that inhibit the replication of two host-range classes, ecotropic and polytropic viruses, respectively. Fv4 has been shown to correspond to a partial provirus expressing an ecotropic env gene (Ikeda and Sugimura 1989). Rmcf may represent an analogous polytropic provirus, but the situation is still not perfectly clear (Frankel et al. 1990).

Studies of a natural population of feral mice with an ongoing ecotropic viral infection (Gardner et al. 1991) as well as mice carrying the Fv4 provirus inserted as a transgene (Limjoco et al. 1993) strongly suggest that Fv4 can provide an adaptive strategy for protecting its carriers. It should, however, be noted that Fv4, despite a lengthy presence in the mouse genome, has not been fixed in wild populations; persistent interference may have negative effects if it inhibits the normal function of a receptor (see Chapter 3). The expression of endogenous env sequences (as well as receptor mutations) may also represent an important selective pressure for viral evolution by forcing the use of alternative receptors and consequent changes in the env gene.

A further example of the involvement of endogenous retroviral sequences in resistance to retroviral infection is provided by the Fv1 restriction gene (Lilly 1970), which has an important role in determining the susceptibility of mice to MLV infection (see Chapter 10). The Fv1 gene product interacts with the capsid (CA) component present on the preintegration complex of incoming MLV (see Chapter 5). Fv1 has recently been cloned by positional means (Best et al. 1996). Sequence comparisons show that it shares 60% identity with the presumptive matrix (MA) and CA genes of the human endogenous virus HERV-L (Cordonnier et al. 1995). Although no other retroviral sequences (e.g., LTR or pol) can be detected adjacent to the Fv1 locus, it is presumably derived from a murine copy of the HERV-L family. Fv1 shows no apparent sequence similarity to the gag gene of MLV; despite this, it seems likely that the Fv1 protein and its target share functional similarities, perhaps suggesting that Fv1 acts in a dominant-negative fashion. In light of the precedents set by Fv1 and Fv4, it does not seem unreasonable to suppose that some of the proviral remnants present in other species, including humans, might have played a part in protecting their hosts from lethal viral epidemics.

Endogenous MMTVs can determine the outcome of infection by exogenous viruses in a different way. The sag gene in the LTR of all MMTVs encodes a superantigen (for review, see Coffin 1992). It is believed to be of importance during in vivo infection by virtue of its ability to stimulate target cells for viral replication (Held et al. 1993). Endogenous superantigens were first identified as genetic elements, known as minor lymphocyte-stimulating (mls) antigens, which, when expressed on B cells expressing major histocompatibility complex (MHC) class II molecules, could stimulate a large proportion of T cells to proliferate (Festenstein 1973). mls superantigen expression early in life leads to clonal deletion of responsive T cells expressing specific T-cell receptor Vβ gene subsets in the thymus (Simpson et al. 1993). Different mls genes at different chromosomal locations can stimulate or delete different T-cell subsets (Table 5). T cells responding to a given mls antigen are characterized by expression of specific T-cell receptor (TCR) Vβ genes. Four lines of evidence showed that the mls antigens are encoded by endogenous MMTVs. First, several groups showed perfect genetic linkage between mls genes and endogenous MMTVs (Dyson et al. 1991; Frankel et al. 1991; Woodland et al. 1991). Second, infection of newborn animals by foster nursing on viremic mothers leads to deletion of cells expressing specific Vβ subsets (Marrack et al. 1991). Third, transgenic animals carrying new MMTVs (or just sag) delete predicted subsets of T cells (Acha-Orbea et al. 1991, 1993). Fourth, cloned proviruses can induce sag activity after transfection into B-cell lines (Choi et al. 1991; Beutner et al. 1992).

Table 5. Specificity of Viral Superantigens Encoded by Endogenous MMTVs.


Deletion of certain T-cell subsets does not seem to be deleterious for the mouse as evidenced by the fact that some mice carry large genomic deletions removing half the Vβ region without an apparent adverse effect (Huppi et al. 1988). One case of a specific effect on pathogenesis by a nonretroviral agent has been reported. Mice carrying Mtv7 show greatly increased susceptibility to oncogenesis by polyomavirus, apparently due to deletion of the Vβ6-expressing subset necessary for recognition and killing of cells expressing a polyomavirus antigen (Lukacher et al. 1995). With this exception, no specific ill effects of endogenous superantigens have been reported. On the contrary, a strong case can be made that the presence of endogenous MMTV proviruses has beneficial effects. There is considerable experimental evidence that T-cell deletion due to the presence of expressed sag genes prevents infection by exogenous MMTV of the same sag gene specificity (Golovkina et al. 1992; Held et al. 1993). This might be expected to provide strong pressure favoring mice with endogenous viruses, since they would be protected against infection by milk-borne exogenous virus (Golovkina et al. 1993). This in turn might select for novel exogenous viruses specific to other Vβ genes. Relatively small changes in the carboxy-terminal region of sag should be sufficient to generate fresh specificities (Choi et al. 1991; Beutner et al. 1992; Brandt-Carlson et al. 1993). Endogenous sag genes seem so far to be confined to mice. There is no evidence that other genera of retroviruses have acquired such genes, although strong sequence similarity between sag and a protein encoded by Herpesvirus saimiri has been noticed (Thomson and Nicholas 1991).

Endogenous Retroviruses and Disease

Many retroviral infections can lead to tumors; infections by other retroviruses can cause symptoms of a variety of autoimmune diseases. Evidence indicating an association between endogenous viruses and cancer or autoimmunity in mice has led to the idea that endogenous elements might have a widespread role as agents of disease and has stimulated much research in this area. In retrospect, this rationale seems naive. Only in a few murine systems is there conclusive evidence for a pathological role of endogenous elements. In all other cases, associations between expressed endogenous proviruses and the disease state remain speculative. For a description of some of the pitfalls encountered during early attempts to implicate retroviruses in human diseases, including the rescue of host xenotropic viruses while passaging human tumor tissue in vivo, see Weiss et al. (1982).

The first unambiguous case of involvement of endogenous proviruses in disease is the spontaneous leukemias afflicting various strains of mice. Several strains, most notably AKR (Furth et al. 1933), but also HRS and C58, were originally bred by selecting for a high incidence of thymic lymphoma. Studies during a period of 20 years revealed that the tumors have a viral etiology and probably result from enhancer-mediated insertional activation of one of a number of proto-oncogenes (Kung et al. 1991) (see Chapter 10). Inheritance of an endogenous ecotropic virus is one prerequisite for leukemia (Lilly et al. 1975), but the virus is not the ultimate oncogenic agent (Cloyd et al. 1980). Rather, this part is played by recombinant viruses known as MCF (for mink cell focus forming) viruses (Hartley et al. 1977). Leukemogenic MCF viruses from AKR mice all have very similar structures, consisting of an ecotropic backbone with two areas of substitution by a nonecotropic viral sequence (Stoye et al. 1991). These viruses are derived from at least three endogenous proviruses by a complex series of events (Thomas and Coffin 1982; Khan 1984; Stoye et al. 1991) illustrated in Figure 4. These steps include the following:

1. Expression and widespread replication of ecotropic virus in the embryo before or around birth (Lilly et al. 1975; Thomas and Coffin 1982).

2. Recombination with a specific endogenous xenotropic virus encoded by the Bxv1 locus to acquire new sequences, including those encoding the carboxyl terminus of the transmembrane (TM) as well as most of U3 (Thomas and Coffin 1982; Stoye et al. 1991). Bxv1 was originally defined as a mouse genetic locus controlling induction of xenotropic virus by halogenated pyrimidines (Kozak and Rowe 1980). The newly acquired LTR sequences are thought to allow increased levels of viral replication in the thymus; the role of the change in TM is unclear (Holland et al. 1985, 1989).

3. Duplication of the enhancer region within the newly acquired LTR by nonhomologous recombination, presumably providing still greater replication capacity in the target organ (Holland et al. 1989; Stoye et al. 1991).

4. Recombination with an endogenous polytropic provirus to acquire altered env sequences. A specific polytropic provirus donating env sequences has not been identified; it therefore remains an open question as to whether the env donor corresponds to one or more genetic loci (Frankel et al. 1989b). The env gene change may be important not because it contributes novel receptor specificity but because it provides novel growth-stimulating activity by interaction with cytokine receptors (Li and Baltimore 1991).

Figure 4. Virological events leading to leukemia in AKR mice. (A) Expression of an endogenous ecotropic virus. (B) Recombination with Bxv1 to generate a virus with an alteration in U3 (and the carboxyl terminus of TM). (C) Recombination with an endogenous polytropic ...

Despite the complexity of this process, at least 90% of all AKR mice die of an MCF-induced thymoma, implying that recombinant viruses of a precisely defined structure are generated in a very consistent and reproducible fashion. Since there is no reason to believe that recombination between retroviruses occurs in a directed fashion, this implies that certain products of recombination are strongly selected on the basis of the biological properties conferred by their component parts. A similar pattern of virological events has been observed in HRS and C58 as well as AKR mice (Thomas et al. 1984). However, in other strains of mice, with different types of tumors, different recombinant viruses are formed from much the same starting pool of endogenous viruses, supporting the notion that nonvirological factors have a crucial role in selecting viral recombinants (see Coppola and Thomas 1990; Gilbert et al. 1993).

The second case where there is firm evidence for a causal role of endogenous proviruses in oncogenesis concerns mammary adenocarcinomas of mice. It has long been known that such tumors can arise following horizontal transmission of MMTV via the milk from mother to pup (see Chapter 10). However, females from GR mice, even if foster-nursed on virus-free mice, also show a high incidence of early mammary tumors (Muhlbock 1955; Bentvelzen et al. 1970). Similarly, certain other strains, for example, DBA/2 and C3H, still sometimes develop mammary tumors late in life, even if reared on virus-free milk (Nandi and McGrath 1973; Van Nie and Verstraeten 1975). The proviral loci Mtv2 and Mtv1 have been implicated in GR (Bentvelzen et al. 1970) and C3H (Van Nie and Verstraeten 1975) tumors, respectively. It seems highly likely that they are playing a part analogous to that of exogenous MMTV. Expression of these proviruses in lactating mammary glands leads to release of infectious virus and reinfection of the same tissue, followed by rare transformation events and tumor formation. Spontaneous tumors in foster-nursed GR mice show a high frequency of novel proviral insertions near the same genes, Wnt1 and Fgf3 (formerly int-1 and int-2, respectively), commonly activated in exogenous tumors (Gray et al. 1986; Nusse 1988; Marchetti et al. 1991), an observation seemingly confirming the idea that endogenous and exogenous viruses cause disease in the same manner (see Chapter 10). Unlike the case with MLV, which requires extensive reengineering by recombination to make it tumorigenic, MMTV can apparently cause mammary tumors while still in its germ line form.

Somewhat unexpectedly, MMTV has also been associated with T-cell lymphomas in mice. Approximately 20% of adult male GR mice develop thymic lymphomas, and these contain novel MMTV proviruses (Michalides et al. 1982). T-cell lymphomas from a number of strains show amplification of their MMTV content (Dudley and Risser 1984). B-type viruses capable of inducing thymic leukemia have been isolated (Ball et al. 1988). Interestingly, all of these viruses show rearrangements in their LTRs, including changes in most of the glucocorticoid-responsive elements and deletion of a negative regulatory element. Studies with cloned proviruses provide convincing evidence that the LTR change is responsible for altering the tissue specificity of disease from the mammary gland to the T cell (Yanagawa et al. 1993). The rearranged LTRs show much greater transcriptional activities in T-cell lines than wild-type LTRs. By analogy with MLV models, the efficient expression of MMTV in thymus-derived cells might be a crucial factor in making it oncogenic for that cell type. However, in addition to altering cis-acting control sequences, the LTR deletion also truncates the Sag protein, and it has been suggested that this protein has a role in preventing T-cell tumor development during the lymphocyte-mediated transit of MMTV from the gut (the site of primary infection by exogenous virus) to the mammary gland (Coffin 1992). So far, this idea has not been tested.

Other classes of endogenous proviral elements greatly outnumber the MLVs and the MMTVs, but there is no evidence to suggest that they represent a significant cause of spontaneous tumors. However, as has been shown, the transformed phenotype is associated with the increased transcription of certain classes of endogenous elements. At a certain frequency, reinsertion of these elements can occur in immortalized cell lines; the novel proviruses may act as insertional mutagens and contribute to tumor progression by relieving growth factor requirements for growth (Dührsen et al. 1990; Leslie et al. 1991; Algate and McCubrey 1993; Hirsch et al. 1993).

Expression of endogenous retroviruses has been observed in a number of mouse strains genetically predisposed to autoimmune disease, thereby raising the possibility that retroviral antigens may act as triggers for autoimmune diseases (Krieg and Steinberg 1990; Krieg et al. 1992). Examples include NZB mice, which develop an autoimmune disease resembling systemic lupus erythematosus, and NOD mice, which show a high incidence of insulin-dependent diabetes mellitus. NZB mice express xenotropic virus from birth (Levy 1973) and make large amounts of autoantibody to endogenous SU protein. Immune complexes between autoantibody and SU contribute to the glomerular nephritis that represents the main pathological lesion in these animals (Izui et al. 1979). Although the expression of xenotropic virus might contribute to nephritis, it is not the primary cause since there is no genetic correlation among the presence of infectious virus, titers of antiviral antibodies, and disease (Datta et al. 1978, 1982). In NOD mice, a clear correlation exists between the presence of retrovirus-like particles in pancreatic β cells and the development of insulitis and diabetes (Suenaga and Yoon 1988; Gaskins et al. 1992), but the role, if any, of the expressed retroviral sequences in the T-cell-mediated autoimmune destruction of β cells remains to be determined (Leiter 1989). Identification of the several genes contributing to diabetes in NOD mice may resolve this issue (Todd et al. 1991).

The question of retroviral involvement in the etiology of human autoimmune diseases (Krieg and Steinberg 1990; Krieg et al. 1992; Urnovitz and Murphy 1996) remains a controversial topic (Talal et al. 1992; Fox 1994; Garry 1994). Patients with HIV-1 or HTLV-1 infection can develop symptoms similar to those of autoimmune disease (Talal 1991; Wilder 1994). A fraction of patients with certain autoimmune diseases, including Sjögren's syndrome and multiple sclerosis, show low levels of antibodies to Gag antigens in the absence of the serological responses (Brookes et al. 1992) or positive PCR data (Nelson et al. 1994) that would demonstrate infection by known exogenous human viruses. The anti-Gag antibodies may result from immune responses to poorly characterized, cross-reactive retroviruses of endogenous or exogenous origin (Garry et al. 1990; Banki et al. 1992; Lagaye et al. 1992). The significance of these immune responses and their viral targets in the induction and progression of autoimmune disease is unclear. Complicating the issue further is the observation that certain endogenous proviruses may encode TM proteins with immunosuppressive properties (Cianciolo et al. 1985; Haraguchi et al. 1995). Further progress in this area requires the identification and molecular characterization of candidate viruses followed by a genetic investigation of their relationship to disease. Given the number of endogenous proviruses present in humans and their general lack of genetic polymorphism, this appears to be a formidable task.

LTR-containing Retrotransposons in Nonvertebrates

General Features

The LTR-containing retrotransposons comprise a large family of elements that have been identified in all well-studied eukaryotic nuclear genomes (Table 6). Structurally, their DNA genome includes many retrovirus-like features. They almost always contain two identical (or nearly so) LTR sequences that enclose a single gag-pol or two separate gag and pol ORFs (Fig. 5). In nearly all cases, the LTR tips have the retroviral consensus terminal sequences TG...CA. These extreme termini are usually embedded within a somewhat longer inverted repeat sequence. Other conserved sequences that are usually found are a primer-binding site, usually complementary to the last 8–18 nucleotides of a specific cellular tRNA, adjacent to the 5′LTR, and a polypurine tract, resembling the retroviral site of plus-strand priming, adjacent to the 3′LTR. Thus, the overall structure is essentially that of a provirus, except that no env or accessory ORFs are present. (A number of interesting exceptions to these generalizations are discussed below.) Retrotransposon LTRs, like their retroviral counterparts, contain promoter elements and elements specifying 3′-end formation. The structure of the major retrotransposon RNA is similar to that of retroviral RNA (i.e., nearly full-length), allowing the U3, R, and U5 sequences of the LTRs to be defined for many elements.
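The structural hallmarks just listed lend themselves to a simple mechanical check. The Python sketch below is illustrative only: the function names, the toy sequence, and the simplifying assumption that the two LTRs are exactly identical (the text notes they may be only nearly so) are all hypothetical choices made for the example, not part of any published tool.

```python
def ltr_has_consensus_termini(ltr: str) -> bool:
    """Return True if an LTR begins with TG and ends with CA (TG...CA)."""
    ltr = ltr.upper()
    return len(ltr) >= 4 and ltr.startswith("TG") and ltr.endswith("CA")


def split_element(element: str, ltr_len: int):
    """Split an element into (5' LTR, internal region, 3' LTR).

    Hypothetical helper: assumes the caller already knows the LTR length.
    """
    if ltr_len <= 0 or 2 * ltr_len > len(element):
        raise ValueError("LTR length incompatible with element length")
    return element[:ltr_len], element[ltr_len:-ltr_len], element[-ltr_len:]


def looks_like_ltr_element(element: str, ltr_len: int) -> bool:
    """Check for two identical flanking LTRs that carry TG...CA termini.

    Simplification (an assumption of this sketch): real LTR pairs may differ
    at a few positions, and a fuller check would also look for a
    primer-binding site and a polypurine tract.
    """
    five, _internal, three = split_element(element.upper(), ltr_len)
    return five == three and ltr_has_consensus_termini(five)


# Toy sequence invented for illustration; not a real element.
toy_ltr = "TGTTGGAATACA"
toy_element = toy_ltr + "GGTTCCAAGGTTAA" + toy_ltr
print(looks_like_ltr_element(toy_element, len(toy_ltr)))
```

A practical annotation pipeline would tolerate mismatches between the two LTRs and search for the LTR boundaries rather than take their length as input; the sketch fixes both to keep the logic visible.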

Table 6. Structural and Organizational Features of Retrotransposons and Their Nucleotide Sequences.


Figure 5. Structures of some major retrotransposon types. (Wavy lines) Host DNA; (double lines) transposon sequences; (colored boxes) transposon ORFs; (arrows) RNA transcripts; (shaded dots and ovals) target site duplications; (bracketed ovals) variable-length ...

LTR retrotransposons share many functional similarities with retroviruses, including aspects of gene expression and the more mechanistic aspects of element replication/transposition and associated genome rearrangements. Conserved aspects of gene expression include the presence of enhancer elements controlled by cellular regulatory circuits, the ability to enhance or inhibit the expression of adjacent cellular genes, and mechanisms to ensure the maintenance of appropriate stoichiometry of gag and pol gene products (most often by ribosomal frameshifting). Conserved aspects of replication include transposition via reverse transcription of an RNA intermediate, virus-like particles (VLPs) that appear to be transposition intermediates, and the retention of essential enzymatic functions required for transposition such as the protease (PR), RT, RNase H (RH), and integrase (IN) functions. The genomic rearrangements associated with both proviruses and LTR retrotransposons include LTR-LTR recombination to generate solo LTRs and deletion or inversion of cellular sequences interposed between two or more element copies. The latter types of events have generally been studied in much greater mechanistic detail for retrotransposons than for retroviruses because many retrotransposons inhabit the genomes of genetically tractable hosts, such as yeast and Drosophila.

What then is the critical distinction, if one must be drawn, between retroviruses and LTR retrotransposons? Unlike retroviruses, LTR retrotransposons normally carry out their replicative cycle within a single cell, i.e., their transposition (replication) process is noninfectious. As will be shown, even this distinguishing line becomes blurry at times.

Two Major Groups of LTR Retrotransposons

The very large number of LTR retrotransposons isolated thus far segregate phylogenetically rather neatly into two groups, called the Ty1-copia family and the Ty3-gypsy family. The families can be readily differentiated both in terms of gross structure and by phylogenetic comparisons of conserved (mostly RT) protein sequences. The most striking distinguishing feature is that the Ty1-copia family has an inversion in the order of the domains encoded within pol (PR, IN, RT), whereas the Ty3-gypsy family has the more familiar arrangement (PR, RT, IN). Phylogenetic distinctions among poly(A) retrotransposons or retroviruses themselves are generally more subtle.
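The gross-structure criterion described above, the order of the domains encoded within pol, can be written as a small lookup. The following Python sketch is a hedged illustration: the function name and the list-of-labels input format are assumptions of this example, and a real classification would also rest on phylogenetic comparison of RT sequences, as the text notes.

```python
# Domain orders within pol as stated in the text.
TY1_COPIA_ORDER = ("PR", "IN", "RT")
TY3_GYPSY_ORDER = ("PR", "RT", "IN")


def classify_by_pol_order(domains):
    """Classify an element from its pol domain labels, listed 5' to 3'.

    Labels other than PR, IN, and RT (for example RH) are ignored.
    Hypothetical helper for illustration only.
    """
    core = tuple(d for d in domains if d in {"PR", "IN", "RT"})
    if core == TY1_COPIA_ORDER:
        return "Ty1-copia"
    if core == TY3_GYPSY_ORDER:
        return "Ty3-gypsy"
    return "unclassified"


print(classify_by_pol_order(["PR", "IN", "RT", "RH"]))  # Ty1-copia
print(classify_by_pol_order(["PR", "RT", "RH", "IN"]))  # Ty3-gypsy
```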

Both LTR retrotransposon families are widely distributed among fungi, plants, and invertebrates (for a list of retrotransposons and their hosts, see Table 6). Whether true members of either of these families are found in the genomes of vertebrates is not yet entirely clear, although evidence is mounting for the Ty1-copia family. There are now reports of Ty1-copia elements from fish (Flavell and Smith 1992) and amphibians and reptiles (Flavell et al. 1995), although complete nucleotide sequences are not yet available. A fair degree of similarity exists between the Ty3-gypsy family and certain retroviruses, especially Mo-MLV, suggesting the possibility that the Ty3-gypsy group and the vertebrate retroviruses might be considered as one large superfamily. Of particular relevance to this possibility are recent findings regarding the subset of Ty3-gypsy family elements; these contain a third ORF resembling env, and transposition is infectious (see below).

The Ty3-gypsy Group

The Ty3-gypsy family is broadly distributed and structurally diverse. LTR lengths range from the longest known for all retroelements (2.1 kb for the Ulysses element of Drosophila [Evgen'ev et al. 1992] and 3.6 kb for the Tribolium Woot element [Beeman et al. 1996]) to the shortest (77 bp for the mag element of Bombyx; Michaille et al. 1990). Functional analyses indicate very strong similarities to the retroviral replication process; an overview of the nucleic acid transactions that occur during eukaryotic retroelement replication is given in Figure 6.

Ty3 from yeast and the gypsy element of Drosophila are the archetypes of this diverse family, and these two elements exemplify just a few of the many variations that are found in this eclectic group. Both show highly specific forms of transcriptional regulation: Ty3 transcription is induced by mating pheromone treatment, and gypsy is expressed in a developmental stage- and tissue-specific manner. Both of these elements demonstrate transposition target specificity with very distinct mechanisms: sequence and probably regional specificity in the case of gypsy, and position specificity, determined by RNA polymerase III transcription factors, for Ty3 (see below).

Recent evidence implies that some members of this group are endogenous retroviruses. A few LTR retrotransposons have additional ORFs beyond those corresponding to gag and pol. At least five insect retrotransposons have a third, env-like ORF (ORF3). The gypsy ORF3 is obviously conserved, as it is present in both the D. melanogaster and D. virilis gypsy elements, in which the DNA sequence has diverged to 70% identity but the ORF3 protein has been maintained at 80% amino acid identity. The genomic position (downstream from pol in all cases), length, and presence of a putative carboxy-terminal membrane-spanning domain in all of these proteins suggest that they may have an env-like function. Furthermore, all have putative N-linked glycosylation sites, and potential cleavage sites separating possible surface and transmembrane proteins can be identified in some cases; the gypsy ORF3 protein is, in fact, glycosylated and cleaved (Song et al. 1994, 1997). Studies on the ORF3 protein of the TED element of Lepidoptera indicate that it too is a glycoprotein (M. Szatkowski and P. Friesen, pers. comm.).
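The contrast between 70% DNA identity and 80% amino acid identity reflects the fact that nucleotide changes can accumulate without altering the encoded protein. The Python sketch below illustrates the arithmetic on a toy example; the sequences are invented for the illustration (they are not gypsy ORF3 sequences), the codon table is deliberately partial, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Partial codon table covering only the codons used below (standard genetic code).
CODON_TABLE = {
    "CTT": "L", "CTC": "L",  # leucine
    "GGA": "G", "GGT": "G",  # glycine
    "AAA": "K", "AAG": "K",  # lysine
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the partial table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Toy sequences that differ only at third codon positions (synonymous changes).
dna_a = "CTTGGAAAA"
dna_b = "CTCGGTAAG"

print(round(percent_identity(dna_a, dna_b), 1))                 # DNA identity
print(percent_identity(translate(dna_a), translate(dna_b)))     # protein identity
```

In this toy case the DNA identity is 6 of 9 positions while the protein sequences are identical, the extreme form of the pattern that suggests selection to maintain the protein.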

A number of recent findings make the case quite strong that gypsy-like elements are endogenous retroviruses. Studies of the biosynthesis of ORF3 protein from Tom (Tanda et al. 1994) and gypsy (Pélisson et al. 1994) suggest that these ORFs are expressed from spliced subgenomic RNAs, just like their vertebrate retroviral counterparts. Both the expression of gypsy ORF3 mRNA and high-frequency mobilization of gypsy (Prud'homme et al. 1995) depend on the presence of the flamenco mutation in the host strain. flamenco is a cellular gene on the X chromosome which has been mapped but not yet cloned. Mother flies must be homozygous for flamenco for gypsy activity in their germ line. The flamenco mutation apparently controls both the overall gypsy mRNA level and the splicing event that produces gypsy ORF3 RNA (Pélisson et al. 1994). Kim et al. (1994) were able to transfer gypsy from flies carrying flamenco by microinjection or feeding of extracts to a strain that lacked active gypsy elements. Transfer of gypsy could be assayed by its propensity to insert into the ovoD1 locus on the X chromosome. The dominant ovoD1 mutation results in female sterility and is readily “reverted” by gypsy insertional inactivation of the ovoD1 locus (Mével-Ninio et al. 1989).

Viral particles derived from gypsy have been isolated from adult female flies of a flamenco strain bearing a large number of gypsy elements and partially purified using sucrose gradient centrifugation (Song et al. 1994). The fractions contain viral particles that have proteins encoded by ORF3. These fractions were fed to larvae of a strain lacking active gypsy elements which was then crossed to an ovoD1 strain. Fertile female progeny were obtained, and the fertility of some of these strains was shown to result from insertion of gypsy into ovoD1. This experiment shows the direct transfer of gypsy elements via an extracellular viral particle.

The Ty1-copia Group

This group of elements consists of a small number of fungal and insect elements, such as Ty1 and copia themselves, and large numbers of element families identified by PCR techniques in plants, ranging from the tiny Volvox to trees and including monocot and dicot species (Flavell et al. 1992; Voytas et al. 1992). A survey of a 40-kb region of the maize genome revealed that half of this region consisted of retrotransposons, mostly of the Ty1-copia class (SanMiguel et al. 1996). The plant Ty1-copia elements can for the most part be divided into two types: multicopy elements (ten to thousands per host genome) that include transpositionally active members, e.g., the tobacco Tnt1 element (Grandbastien et al. 1989), and inactive low-copy elements (1–2 copies). No fewer than ten families with the latter characteristics have been isolated from the genetically tractable Arabidopsis thaliana genome (Konieczny et al. 1991). Horizontal introduction by as yet unknown means, but potentially via insect vectors, of these elements into their host genomes may account for the low-copy-number plant Ty1-copia element families. Lacking the ability to use host factors provided by the new host, they cannot proliferate and instead gradually decay in their new host genome.

Although Ty1 has distinct gag and pol genes separated by a frameshift, most of these elements have a single gag/pol ORF. How then are Gag and Pol proteins produced in the appropriate ratio? The answer to this question is not yet known for the abundant plant elements. For the Drosophila copia element, it is apparent that there are two major transcripts, one of which is full-length and the other of which is a spliced derivative lacking most (but not all) of the internal pol sequences. The spliced mRNA thus encodes Gag as well as PR, whereas the full-length RNA encodes Gag/Pol (Yoshioka et al. 1990). Presumably, only the long RNA contains an encapsidation (ψ) sequence. Nevertheless, defective elements appear to be generated occasionally by reverse transcription of the spliced form of the mRNA (Yoshioka et al. 1991). The mechanism for generating appropriate ratios of Gag to Pol proteins has also been studied with Ty1, one of a small number of elements which require ribosomes to shift in the +1 direction to read from the gag into the pol frame, unlike the –1 frameshift required by retroviruses (Chapter 7). Ty4, a more recently isolated element, appears to use the same frameshift mechanism as Ty1 (Janetzky and Lehle 1992; Stucka et al. 1992).

Recent cloning/sequencing studies of a number of new transposons from this family indicated some unusual features. A significant subset of these elements, including Ty5, appear to use a primer tRNA fragment, as was shown many years ago for copia (Kikuchi et al. 1986; Voytas and Boeke 1992). In addition, some of the plant elements, such as the wheat Wis2 element and barley BARE-1, have rather long (1.8-kb) LTR sequences (Lucas et al. 1992; Manninen and Schulman 1993). Functional studies of a limited number of transposons in the family suggest that the elements have become very well adapted to their hosts. Like retroviruses, these elements probably depend on a variety of different host transcriptional circuits to regulate expression.

Ty1 and copia deserve special mention for their historical importance. These two elements were the first to be intensively studied and shown to be mutagenic transposons (Cameron et al. 1979; Chaleff and Fink 1980; Farabaugh and Fink 1980; Roeder and Fink 1980; Bingham and Judd 1981; Zachar and Bingham 1982). They also provided the first evidence that LTR retrotransposons transpose by a mechanism akin to retroviral replication (Elder et al. 1983; Shiba and Saigo 1983; Boeke et al. 1985; Garfinkel et al. 1985; Mellor et al. 1985). The biology of these elements has been extensively reviewed (Bingham and Zachar 1989; Boeke 1989; Boeke and Sandmeyer 1991; Garfinkel 1992).

LTR Retrotransposition Mechanism

The mechanism of retrotransposition is believed to be similar for all LTR retrotransposons and retroviruses (Fig. 6). Therefore, this discussion focuses on the differences and points of uncertainty.

Figure 6. Replication mechanisms: Eukaryotic retroelements. A sketch of each element's RNA and DNA forms, drawn so as to indicate the mechanisms used for replication. (A) LTR retrotransposon and retrovirus; (B) poly(A) retrotransposon; (C) hepadnavirus; (D) caulimovirus. ...

Like the retroviral life cycle, the LTR retrotransposon life cycle can be divided into a few separate phases: transcription and gene expression, RNA encapsidation and particle assembly, reverse transcription, nuclear entry, target site recognition, and integration. The broad outlines of the important events as well as many of the mechanistic details are quite similar for the two groups. It must be kept in mind, however, that there are some key differences between the biology of retroviruses and retroelements that must be reflected in the mechanism of their replication. First, retroelement replication must be regulated in a manner different from retroviral replication; the numbers of elements must not be allowed to become more than the host genome can accommodate. Second, transposable elements, by definition, do not require an extracellular phase (i.e., a virion) to complete the transposition cycle. This property simplifies the multiplication process by removing the necessity of release and subsequent re-entry of the retroelement genome, but it complicates assembly and maturation (or at least our understanding of them) by not providing the cues for processing that retroviruses probably obtain from membrane association and release or receptor interaction. Third, many of these elements exist in cells that have compact genomes lacking large intragenic or intergenic DNA regions that could provide relatively safe havens for integrated DNA and that, in some cases, also spend a significant part of their lives in a haploid state. Thus, the probability of collateral genetic damage as a consequence of integration is likely to be much higher than with retroviruses (Craigie 1992). Fourth, retrotransposons do not need interaction of Env proteins with membrane glycoproteins to make new transposon copies; hence, their transposition intermediates may differ with regard to mode and intracellular site(s) of assembly relative to the true viruses. A final issue is that all fungi (including yeast) undergo “closed” mitosis, in which the nuclear membrane never breaks down, eliminating a portal of entry used by the preintegration complex of many retroviruses.

Because these elements usually have strictly intracellular transposition intermediates, retrotransposon researchers do not have the luxury of viral stocks as a starting biochemical reagent to study individual steps in the life cycle. Furthermore, the multicopy nature of the elements in the host genome complicates traditional genetic analyses. For these reasons, the ability to control the transcriptional activity of a single, active copy of a retrotransposon has been a tremendously useful technique. For example, the Saccharomyces cerevisiae GAL promoter system has been used to drive Ty1, Ty2, Ty3, and Ty5 retrotransposition (Boeke et al. 1985; Garfinkel et al. 1988; Chalker and Sandmeyer 1990; Zou et al. 1996), and the Schizosaccharomyces pombe nmt promoter has been used to drive Tf1 (Levin et al. 1993).

Expression

The life cycle begins with transcription of the element by RNA polymerase II. With many LTR retrotransposons, and unlike exogenous retroviruses, initiation of transcription is expected to be a key step limiting the timing and extent of retrotransposition to levels tolerable to the host. However, this is by no means universally true, as surprisingly, the transcripts of some elements, such as Ty1 (Curcio et al. 1990) and copia (Flavell et al. 1980), are among the most abundant cellular poly(A)+ RNAs (in fact, copia is named for the abundance of its RNA). Nevertheless, both of these elements, like almost all transposons, regulate their transposition very tightly.

Regulation of expression is as varied among retrotransposons as among retroviruses. In a number of elements, transcription is controlled by cellular regulatory circuits; some of these have been extensively studied in yeast and in D. melanogaster. Three of these are the mating-type and mating pheromone regulation of yeast Ty elements (Boeke and Sandmeyer 1991), developmental/tissue-specific regulation of gypsy (Corces and Geyer 1991), and steroid hormone (ecdysone) regulation of insect element 1731 (Fourcade-Peronnet et al. 1988). The last is reminiscent of hormonal regulation of MMTV expression. Some unusual developmental expression patterns have been observed for retrotransposons; perhaps one of the most unusual is that of the Drosophila LTR element 412. No evidence for germ line expression of this element has been obtained; rather, there is specific expression in the gonadal mesoderm. This tissue- and developmental-specific expression is presumably mediated by host factors (Brookman et al. 1992).

In addition to developmental stage, transcription of LTR retrotransposons may also be regulated by DNA damaging agents (Rolfe et al. 1986; McEntee and Bradshaw 1988), high temperature (Cappello et al. 1985; Strand and McDonald 1985; Chen and Fonzi 1992), and tissue type. Surprisingly, there are very few instances in which these agents have been directly correlated with the frequency of transposition of the element. A correlation between transcript level and retrotransposition frequency has been demonstrated for Ty1, which is repressed by the a/α mating type (Paquin and Williamson 1986) and requires the host SPT3 gene for efficient retrotransposition (Boeke et al. 1986). Although it is certainly likely that relatively large amounts of retrotransposon RNA are required for retrotransposition, it is also true that high transcript levels are not necessarily sufficient for frequent transposition, as already mentioned for Ty1 and copia.

An important advantage of the Drosophila and yeast systems for studying retrotransposon expression is the ability to select or screen for factors that suppress or enhance expression by their effects on element-driven expression of neighboring genes (see below). Genetic suppressor (or enhancer) analysis has led to the identification of a large number of specific transcription factors as well as general components of the transcription machinery, such as the TATA-box-binding protein (Eisenmann et al. 1989; Hahn et al. 1989). In brief, this analysis is usually performed by beginning with a retrotransposon insertion mutation that results in a complete or partial loss of function of an easily scorable reporter gene. The mutant strain is then mutagenized, and progeny are screened for suppression of the mutant phenotype (i.e., restoration of function) or enhancement of the mutant phenotype (i.e., changes that result in the further decrease in gene expression). For example, the spt3 mutations that were used to define this transcription factor gene were first isolated as suppressors of solo LTR insertions upstream of the HIS4 gene, giving rise to a cold-sensitive His⁻ phenotype. The His⁻ phenotype results from a promoter competition between the LTR promoter (which produces a HIS4 transcript that cannot be efficiently translated due to the presence of several out-of-frame ATGs) and the native HIS4 promoter (Fig. 7). Loss of SPT3 function leads to loss of expression of the LTR-driven transcript and higher levels of the native HIS4 transcript. Similar selections have been carried out in Drosophila. This topic has been the subject of a number of reviews (Boeke and Sandmeyer 1991; Corces and Geyer 1991; Winston and Carlson 1992).

Figure 7. Mechanism of suppression of spt3 mutations. The his4-912LTR mutation consists of a solo LTR upstream of the HIS4-coding region inserted between the HIS4 upstream activating sequence (UAS) and the TATA box; this results in a His⁻ phenotype. The ...

Binding sites for the trans-acting regulatory proteins affecting transcription are of at least three types: TATA-box-binding factors, those binding upstream of the TATA box at upstream activating sites (UASs) or enhancers, and those binding at downstream activating sites (DASs). As with transcription of cellular genes, the regulatory regions bind multiple proteins, which act together in a combinatorial fashion to effect properly regulated element transcription (see Chapter 6). However, the downstream elements tend to be the most important ones in terms of their effects on promoter strength. Yeast Ty elements have important DAS sequences (see, e.g., Yu and Elder 1989; Farabaugh et al. 1993b). A number of LTR retrotransposons from insects contain DAS sequences similar to those found in the TATA-independent internal promoters of poly(A) retrotransposons and of certain cellular genes (Arkhipova and Ilyin 1991). These downstream activators are common in various LTR elements, perhaps because they are an adaptation to a size constraint on the element; this strategy may also “insulate” the retrotransposon's transcriptional regulatory machinery from effects of flanking sequences. The frequent use of DAS elements raises the possibility that LTR promoters may have evolved from the internal promoters used by poly(A) retrotransposons (Mizrokhi et al. 1988).

The process of retrotransposon protein synthesis is simpler than that used by retroviruses due to the absence of an env gene. However, the problem of synthesizing appropriate ratios of Gag and Pol proteins is similar, although the strategies are often different, even among elements in the same group. For most retrotransposons, as for many retroviruses, the first AUG codon in the mRNA is upstream of the first ORF and must be bypassed by the translational machinery for correct initiation at a subsequent AUG. However, functional studies of initiation codon selection have not been carried out for retrotransposons. Many elements related to gypsy have separate gag and pol ORFs arranged in the same fashion as most retroviruses, with a single –1 frameshift required to synthesize the Gag-Pol precursor, presumably by a simultaneous slip mechanism (Chapter 7). The yeast elements are different: Both Ty1 and Ty3 use a +1 frameshift, but they accomplish the frameshift in different ways.

The mechanism of Ty1 +1 frameshifting is totally different from the simultaneous slip mechanism. As is the case for all programmed translational infidelity mechanisms, a key component is that the translation apparatus must pause (Atkins et al. 1990). In the retroviruses, pausing is thought to be effected by the secondary structures in the template RNA. In Ty1, the seven-nucleotide sequence CUU-AGG-C is necessary and sufficient for frameshifting, indicating no requirement for secondary structure. Pausing is effected instead by the rare AGG codon at the slip site, causing the ribosome to stall with an empty A site. Frameshifting can be abolished by overproduction of the appropriate tRNAArg that recognizes this codon. The frameshift itself is mediated by a tRNALeu that can decode either of two overlapping Leu codons within the frameshift site (Belcourt and Farabaugh 1990).

Ty3 also undergoes +1 frameshifting, but the mechanism is different. The frameshift occurs within a critical heptanucleotide GCG-AGU-U; but unlike the Ty1 case, this sequence is not sufficient for efficient frameshifting. This frameshift event is also thought to require stalled translation, mediated by low availability of the host tRNASer recognizing the AGU codon. While translation is arrested, a tRNAVal is proposed to recognize the out-of-frame GUU codon either by binding in the inappropriate reading frame or via a 4-bp codon-anticodon interaction. The latter event is presumably stimulated (in an as yet unspecified way) by the nucleotide sequence immediately distal to the frameshift site; this sequence very strongly stimulates but is not essential for frameshifting. This frameshifting mode is unique in that tRNA slippage does not appear to be involved (Farabaugh et al. 1993a).
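The reading-frame arithmetic behind the two slip sites can be checked with a short sketch. The code below is only an illustration: the function names are hypothetical, the codon table is deliberately restricted to the codons that occur in the two heptanucleotides quoted above, and nothing here models pausing, tRNA abundance, or the stimulatory downstream sequence.

```python
# Standard genetic code, restricted to the codons present in the two slip sites.
CODONS = {
    "CUU": "Leu", "UUA": "Leu", "AGG": "Arg", "GGC": "Gly",
    "GCG": "Ala", "AGU": "Ser", "CGA": "Arg", "GUU": "Val",
}

def codons_in_frame(rna, offset):
    """Split rna into the complete codons that start at the given offset."""
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]

def translate(codons):
    """Map each codon to its amino acid using the restricted table above."""
    return [CODONS[c] for c in codons]

# Heptanucleotide slip sites as quoted in the text.
SLIP_SITES = {"Ty1": "CUUAGGC", "Ty3": "GCGAGUU"}

for element, site in SLIP_SITES.items():
    zero = codons_in_frame(site, 0)      # codons of the upstream (gag) frame
    plus_one = codons_in_frame(site, 1)  # codons of the +1 (pol) frame
    print(element, "frame 0:", list(zip(zero, translate(zero))))
    print(element, "frame +1:", list(zip(plus_one, translate(plus_one))))
```

For Ty1, the zero frame contains CUU followed by the rare AGG codon, and the +1 frame begins with the overlapping Leu codon UUA, consistent with a single tRNA-Leu decoding either codon. For Ty3, the +1 frame contains the GUU codon that the incoming tRNA-Val is proposed to recognize.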

The Tf1 element of fission yeast has a single gag/pol ORF, so it has no apparent mechanism for down-modulating pol expression at the level of biosynthesis (Levin et al. 1993). Recent results suggest that selective degradation of Pol proteins may effect an appropriate ratio of Gag to Pol proteins (Atwood and Levin 1996). The apparent avoidance of –1 frameshifting by yeast LTR retrotransposons is not due to the inability of the yeast translation machinery to perform a –1 frameshift; the yeast “killer” virus L-A uses a –1 frameshift (Dinman et al. 1991) and, furthermore, HIV-1 gag-pol and MMTV gag-pro frameshift in yeast (Wilson et al. 1988; Lee et al. 1995).

For copia, the solution to the Gag-Pro-Pol expression problem is quite different. This element is unusual among LTR retrotransposons because it produces two major transcripts. One transcript is full length, encoding the Gag-Pol fusion protein; the other is a spliced derivative in which most (but not all) of the internal pol sequences are removed in frame. The spliced mRNA thus encodes a Gag-Pro fusion protein (Miller et al. 1989; Yoshioka et al. 1990). As with retroviruses (but not most retroelements), copia has the problem of positioning packaging signals to avoid encapsidation and transposition of the spliced form of the mRNA. The mechanism appears to be imperfect, because intronless elements have been isolated from the Drosophila genome (Yoshioka et al. 1991).

In most retroelements, the Gag precursor is cleaved to yield an amino-terminal CA protein and a carboxy-terminal protein analogous to NC, with Gag-Pro-Pol yielding PR, RT, and IN (Garfinkel et al. 1991; Kirchner and Sandmeyer 1993; Merkulov et al. 1996). The absence of membrane assembly and budding apparently obviates the requirement for an MA-like protein, as none are present among most known retrotransposons. Even gypsy Gag protein, which might be a candidate to contain an MA protein, lacks a myristylation signal at the amino terminus and also lacks homology with retroviral MA proteins. The CA protein is indeed the amino-terminal peptide in Ty1, as indicated by the presence of a blocking group (presumably acetyl) on its amino terminus as well as epitope mapping studies.

The products of Gag are rather diverse. Ty3 apparently generates an amino-terminal CA of 26 kD and a 9-kD carboxy-terminal NC (complete with zinc-finger-like motif). Ty1 Gag processing results in a very large amino-terminal CA (estimated to be 45 kD), with a small carboxy-terminal peptide of approximately 4 kD. This peptide has never been observed directly and would contain no zinc-binding motif. It is interesting that a large fraction of retrotransposon gag genes lack a zinc-finger-like sequence, present in most but not all retroviruses (Covey 1986). This peptide of unknown function is released by Ty1 PR; the same cleavage site in Gag-Pol probably defines the amino terminus of PR (Merkulov et al. 1996). Another posttranslational modification that has been reported is phosphorylation of the Ty1 CA protein (Dobson et al. 1984). An increase in the amount of phosphorylated Ty1 CA protein is associated with mating pheromone treatment, a condition that blocks Ty1 transposition (Xu and Boeke 1991).

PR, RT, and IN are the three major products of pro and pol, and they are synthesized from the same reading frame in a variety of retrotransposons that have been analyzed. These proteins are produced from the primary translation products by proteolytic cleavages within the virus-like particle, much as with retroviruses. In addition to the cleavages specifying the three major Pol proteins, there are some minor proteolytic cleavages whose significance is uncertain (Garfinkel et al. 1991; Kirchner and Sandmeyer 1993).

Assembly and Maturation

As with retroviruses, the assembly phase is one of the least well understood parts of the retrotransposon life cycle, but there are many similarities suggested by what is known. Assembly has been studied primarily with Ty1 and Ty3 using GAL-Ty overexpression strategies and wild-type and mutant elements (Garfinkel et al. 1985; Mellor et al. 1985; Hansen et al. 1992). Virus-like particles (VLPs) are then isolated and examined biochemically for defects. Assembly defects have also been noted under various stress conditions and in certain host mutants, such as ssb1/ssb2 mutants (Stone and Craig 1990) lacking Hsp70 proteins so that they mimic “stressed” cells; this defect can be overcome by overexpressing UBP3, a ubiquitin-removing protease (Menees and Sandmeyer 1996). An alternative strategy used to study some Drosophila elements has been to search for particles or VLP-like structures by expression of Gag proteins in heterologous cells (Miyake et al. 1987; Yoshioka et al. 1991, 1992; Lerch and Friesen 1992; Kim et al. 1993).

For both retroviruses and retrotransposons, VLP assembly absolutely requires the Gag protein, and there is element-specific packaging of genomic RNA and primer tRNA, although certain other cellular tRNAs and mRNAs are also sometimes packaged. In the case of Ty1, either full-length Gag, without Pro or Pol proteins, or a version of Gag prematurely truncated at its carboxyl terminus is capable of assembling VLPs in yeast (Adams et al. 1987; Müller et al. 1987; Burns et al. 1992). Further studies suggest that a relatively small portion of the Ty1 Gag protein is sufficient for assembly (Brookman et al. 1995; Brachmann and Boeke 1997). An important difference from retroviruses is the intracellular compartment of assembly. Retroviruses always wind up at the plasma membrane, where the capsid often first becomes visible, whereas retrotransposon VLPs invariably assemble elsewhere. Although B- and D-type retroviruses assemble into A particles at intracellular locations, the particles subsequently associate with the plasma membrane, where budding occurs.

There are interesting differences among retroelements in the site of assembly of VLPs. In the case of copia expressed in yeast, the VLP-like protein structure is found in the nucleus (Yoshioka et al. 1992), as are copia VLPs in Drosophila tissue culture cells (Miyake et al. 1987). This observation suggests that copia Gag-PR contains a nuclear localization signal. It would follow then that for copia, VLP assembly takes place in the nucleus, and thus reverse transcription probably occurs there as well. This might well be the solution to the problem of packaging spliced mRNAs in copia. Presumably, the spliced mRNAs would be efficiently transferred to the cytoplasm by the splicing/export machinery and be unavailable for packaging. Ty1 and Ty3 VLPs are observed only in the cytoplasm, and such VLPs are competent to carry out integration in vitro (Eichinger and Boeke 1988). In yeast, as in all fungi, mitosis is “closed”; i.e., it lacks a nuclear membrane breakdown step. Nuclear entry for copia and the yeast elements must be accomplished by mechanisms different from one another and from that proposed for MLV, namely, entry during mitosis, when the nuclear membrane breaks down (Roe et al. 1993). On the basis of these data, it has been tentatively concluded that Ty assembly and reverse transcription take place in the cytoplasm and that Ty DNA is transported to or through a nuclear entry site. This transport of DNA might occur with the DNA in the form of a VLP or some subassembly thereof, such as a preintegration complex. However, it is important to remember that VLP numbers exceed the number of transposition events; it is always dangerous to base conclusions on the behavior of bulk particles. Thus, an alternative that must be considered is that a small number of Ty VLPs assemble in the nucleus and that these give rise to the observed transposition events, whereas those that assemble in the cytoplasm might be unable to access genomic DNA. 
However, recent results in two laboratories suggest that Ty1 integrase contains a carboxy-terminal nuclear localization signal that is essential for retrotransposition, consistent with transport of a preintegration complex (S. Moore and D.J. Garfinkel, pers. comm.; M. Kenna et al., unpubl.).

The difference in site of assembly implies an important difference in the maturation of the VLPs and of retroviruses. Some aspect of membrane association and budding activates PR and subsequent maturation events in retroviruses (Chapter 7), but it is not known what triggers the retrotransposon proteolytic cleavages that give rise to the “mature” (and presumed active) VLPs. A popular hypothesis for both retroviruses and retrotransposons is that PR dimerization, required for PR activity, can only occur efficiently within the confines of an assembled particle in which the precursor concentration is very high. This would not occur when Gag/Pol protein is free in solution or in partially assembled particles. Experiments to test this point remain to be carried out with retrotransposons.

LTR retrotransposon proteases belong to the same aspartyl protease family as their retroviral relatives, and they share the proposed catalytic triad sequence D(T/S)G. They also appear (from size and sequence) to require dimerization for activity, unlike their cellular counterparts, the renin-like proteases. Furthermore, the retrotransposon proteolytic cleavage sites in Ty1 and Ty3 fit relatively well with the rather degenerate sequences recognized by retroviral PRs, except that hydrophilic residues are more common at the P1 and P1′ positions for the retrotransposon proteases than for the retroviral proteases (Pettit et al. 1991; Kirchner and Sandmeyer 1993; Moore and Garfinkel 1994; Merkulov et al. 1996; see also Chapter 7).
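A motif such as D(T/S)G is easy to locate in a translated ORF with a regular expression. The sketch below is a minimal illustration; the function name and the example peptide are hypothetical and are not taken from any real element. A tripeptide match is only a candidate active site, since so short a motif also occurs by chance.

```python
import re

# Lookahead pattern so that overlapping matches would also be reported.
DTG_MOTIF = re.compile(r"(?=D[TS]G)")

def find_dtg_motifs(protein):
    """Return the 0-based start position of every D(T/S)G tripeptide."""
    return [m.start() for m in DTG_MOTIF.finditer(protein.upper())]

# Hypothetical peptide used only to exercise the function.
example = "MKLLDTGAAQWDSGPL"
for pos in find_dtg_motifs(example):
    print(pos, example[pos:pos + 3])
```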

There are no published reports of assembly of retrotransposon particles in vitro. Thus, it is possible that their assembly in cells requires more than simply the RNA and retrotransposon-encoded protein components. Expression of the Gag-PR protein encoded by the copia subgenomic mRNA in both bacteria and yeast results in the formation of a curious lamellar structure rather than the VLPs found in Drosophila cells; the structures generated in the bacteria and yeast are morphologically different. Although there can be many explanations for the generation of such potentially artifactual structures, Yoshioka et al. (1992) have suggested that assembly may require the participation of a cellular factor such as a chaperonin; the use of inappropriate chaperonins, such as those present in yeast and bacteria, would lead to the assembly of the lamellar structures. An in vitro system for studying Ty1 VLP assembly has recently been developed (J.L. Brookman, pers. comm.). Reticulocyte lysates programmed with synthetic Ty1 RNA are allowed to synthesize protein. A fraction of the newly synthesized Ty1 Gag protein sediments similarly to that of authentic Ty1 VLP Gag in sucrose gradients. Such a system may be very promising to begin in vitro dissection of this process.

Reverse Transcription

Once VLPs have been assembled, the intricately orchestrated steps of reverse transcription (Chapter 4) can take place. As with retroviruses, proteolytic cleavage may activate RT in such a way that it can effectively access its own primer/template complex. In Ty1, PR mutants produce an RT that is capable of utilizing exogenous substrates effectively, but they cannot carry out the endogenous reaction. Some of this apparent inability to utilize internal primer/template RNA may be an artifact of inefficient packaging or retention of genomic RNA, but the defect in RNA packaging is less dramatic than the defect in the endogenous RT reaction (Youngren et al. 1988).

Both Ty1 and a number of Drosophila elements use specific cellular tRNAs to prime reverse transcription and, like retroviruses, generate minus-strand strong stop DNAs of the appropriate sizes. The yeast system offers special genetic advantages for studying the interaction between primer tRNA and retrotransposon. Strains that depend on a single gene encoding a given tRNA have been constructed and can be used to evaluate primer tRNA genetics in detail, because any mutation of interest can be studied in isolation in an in vivo system (Byström and Fink 1989; Chapman et al. 1992). A recent study of the role of tRNA in retrotransposition of Ty1 and Ty3 in yeast concluded that the acceptor stem and TψC loop regions of the tRNA are critical for efficient transposition. Mutations elsewhere in the tRNA, including many alterations within the anticodon arm, have little or no effect on transposition, even though they render the tRNA completely nonfunctional for accurate translation (Keeney et al. 1995). An unexpected finding was that a modification of the tRNA backbone unique to the eukaryotic cytoplasmic initiator tRNAs appears to have an important role in retrotransposition (Åström and Byström 1994; V. Lauermann et al., in prep.).

The finding of plus-strand strong stop DNAs for some LTR retrotransposons (Arkhipova et al. 1984; Müller et al. 1991; Chapman et al. 1992) implies a mechanism of priming and transfer of the plus strand similar to that of retroviruses. Indeed, a plus-strand strong stop DNA has been identified for Ty1 (Heyman et al. 1995; Lauermann et al. 1995). However, Ty1 appears to differ in some critical aspect of its mechanism of plus-strand reverse transcription; the inheritance pattern of mutations in the primer-binding site suggests that the primer-binding site sequence is not inherited from the tRNA as would be predicted by the classical retroviral reverse transcription model (Lauermann and Boeke 1994). Ty1 RT may make strong stop “jumps” before reaching the end of the template; the short (10-bp) primer-binding site length suggests that the distance between the two active sites (polymerase and RNase H) might be much shorter for Ty1 than for retroviral RTs, resulting in a higher propensity to jump to homologous template regions (V. Lauermann and J.D. Boeke, in prep.). An additional major site of initiation of plus-strand DNA synthesis in the middle of the genome, analogous to that seen in HIV-1 (Charneau et al. 1992), has been reported for Ty1 (Müller et al. 1991; Pochart et al. 1993b; Heyman et al. 1995). Initiation at this site leads to a gap and overhanging single strand in the resulting double-stranded DNA molecule. However, this second plus-strand priming site, unlike the case for HIV, appears to be dispensable for Ty1 replication (Xu and Boeke 1990b; Heyman et al. 1995).

Priming of minus-strand synthesis in retrotransposons differs in interesting ways from that in retroviruses. In retroviruses, 18 nucleotides at the 3′ terminus of primer tRNA are complementary to the template. In contrast, only 10 nucleotides of Ty1 are complementary to its primer tRNA (Chapman et al. 1992), and this number drops to 8 nucleotides in Ty3 (Keeney et al. 1995; V. Lauermann and J.D. Boeke, in prep.). In the case of copia, a fragment of tRNA is used as the primer (Kikuchi et al. 1986); this fragment may be generated by an RNase P cleavage (Kikuchi et al. 1990). The tRNAs selected as primers by retrotransposons and retroviruses are quite diverse, and it is difficult to discern any special pattern of usage. That being said, the Ty1-copia family appears to have a special affinity for initiator methionine tRNA, which is also used as a primer by caulimoviruses and certain gypsy family elements but not by any known retrovirus. The features of a tRNA that specify its encapsidation in VLPs are not known, but the complementarity between primer tRNA and primer-binding site appears to be dispensable for packaging (Chapman et al. 1992). A number of nonprimer tRNAs are packaged at high levels into VLPs; their function in retrotransposition, if any, is unknown (Pochart et al. 1993a).

Recently, a radically different type of priming has been shown for the S. pombe retrotransposons Tf1 and Tf2 (Levin 1995). These elements lack complementarity to cellular tRNAs, but their primer-binding site has perfect 11-bp complementarity with the 5′ end of the retrotransposon transcript. Point mutations in either the primer-binding site or the 5′-end complementary region of Tf1 eliminate formation of minus-strand strong stop DNA and Tf1 transposition. If complementary point mutations in the primer-binding site and 5′ end are combined into a single Tf1 RNA, strong stop DNA formation and transposition are restored to a near wild-type level. These results support a model in which 5′-end sequences fold back and base pair with the primer-binding site. More extensive mutagenesis has revealed a complex “pretzel” secondary structure required for priming (Lin and Levin 1997). An 11-nucleotide 5′ fragment is released from the RNA by Tf1 RNase H, and this is used as the primer for minus-strand synthesis (Levin 1996). Note that removal of the 5′ segment of 11 nucleotides still leaves behind a terminal redundancy (R region), so that the remaining steps in the reverse transcription pathway are retrovirus-like.
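The logic of the compensatory-mutation experiment can be sketched in a few lines. The example below is a toy model under stated assumptions: the transcript is invented (it is not the Tf1 sequence), the helper names are hypothetical, and pairing is reduced to perfect Watson-Crick complementarity over 11 nucleotides, ignoring the more complex secondary structure reported for the real RNA.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def self_primer_pairs(transcript, pbs_start, length=11):
    """True if the first `length` nt can pair perfectly with the PBS."""
    five_prime = transcript[:length]
    pbs = transcript[pbs_start:pbs_start + length]
    return pbs == reverse_complement(five_prime)

def mutate(rna, pos, base):
    """Return a copy of rna with a single base substituted at pos."""
    return rna[:pos] + base + rna[pos + 1:]

# Invented transcript: 11-nt 5' end, a spacer, then a perfectly matching PBS.
FIVE_PRIME = "GAUCCGUAACG"
SPACER = "AUAUAUAUAUAUAUAUAUAU"
PBS_START = len(FIVE_PRIME) + len(SPACER)
wild_type = FIVE_PRIME + SPACER + reverse_complement(FIVE_PRIME)

pbs_mutant = mutate(wild_type, PBS_START + 2, "G")  # PBS point mutation
end_mutant = mutate(wild_type, 8, "C")              # 5'-end point mutation
double_mutant = mutate(pbs_mutant, 8, "C")          # complementary pair

for name, rna in [("wild type", wild_type), ("PBS mutant", pbs_mutant),
                  ("5'-end mutant", end_mutant), ("double mutant", double_mutant)]:
    print(name, self_primer_pairs(rna, PBS_START))
```

In this toy model either single mutation breaks the pairing and the double mutant restores it, which mirrors the pattern of results described above.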

Integration

The integration step of LTR retrotransposons has been studied with Ty1 and Ty3 and resembles retroviral integration (Eichinger and Boeke 1988, 1990; Kirchner et al. 1995). Minisubstrates containing only a few terminal nucleotides of Ty1 are sufficient for complete in vitro integration (Devine and Boeke 1994). Many of the Ty1 studies have been carried out with IN present in VLPs. The in vitro integration reaction mediated by VLPs and utilizing exogenous minisubstrates requires wild-type IN, but it can take place quite efficiently in the presence of mutationally inactivated PR or RT. A variety of products are generated in the in vitro reaction, including single-end joins, resulting in Y-shaped products, as well as complete, double-end insertions and more complex products (Braiterman and Boeke 1994a,b). Purified recombinant Ty1 IN is capable of efficiently utilizing a surprising variety of oligonucleotide substrates and can perform a disintegration reaction (Moore and Garfinkel 1994; Moore et al. 1995; see Chapter 5). It remains to be seen how efficiently the recombinant protein can carry out a complete integration reaction in vitro.

Ty3 integration has also been studied in vitro. Although this reaction takes place at an exceedingly low efficiency and can currently be detected only by PCR, it shows an interesting dependence on RNA polymerase III transcription factors (see below), and integration into a tRNA target hot spot was observed (Kirchner et al. 1995).

One difference between Ty1 DNA (and certain other, but not all, LTR retrotransposons) and proviral DNA is the lack of the extra base pairs residing between the 3′ end of the 5′ LTR and the first base pair of the primer-binding site. Retroviral IN removes the extra bases (usually two) from one strand of the unintegrated linear DNA prior to integration into target DNA in a “processing” step (Chapter 5). Clearly, Ty1 IN does not need to carry out a processing step, nor does it seem capable of carrying out this reaction (Moore and Garfinkel 1994; Moore et al. 1995). Thus, 3′-end processing does not seem to be an obligatory feature of integration. The differences and similarities in structure between Ty1 and retroviral integration reactions are compared in Figure 8.

Figure 8. Integration of Ty1 as compared to retroviral Mo-MLV DNA. Ty1 integration is thought to bypass the integrase-mediated dinucleotide removal or “processing” reaction conserved among retroviral integrases.

In addition to the Ty1 type of LTR/primer-binding site junction, there is another unusual type typified by the Drosophila elements 17.6 and gypsy (Saigo et al. 1984; Marlor et al. 1986). In these elements, the last base pair of the 5′ LTR is the first base pair of the primer-binding site. Two possible scenarios could explain this arrangement: (1) The primer for these elements could be a deadenylated tRNA, retaining only the CC of the normal CCA 3′ terminus. If this were true, the integration could then proceed via a Ty1-like mechanism, i.e., without an IN-mediated processing step. Perhaps the terminal adenylate residue is actually cleaved off the tRNA by RT, just as some retroviral RTs apparently contain an activity capable of removing aminoacyl groups from charged tRNA bound to its primer-binding site (Sarih et al. 1982). (2) Alternatively, if the primer is a full-length tRNA, the terminal riboadenosine of the tRNA may actually form the 5′ terminus of the mature cDNA in these elements. The terminal riboadenosine is thought to become incorporated into HIV-1 LTR circle junctions, suggesting the plausibility of this model (Whitcomb et al. 1990). Detailed analysis of the appropriate reverse transcription intermediates will be necessary to answer these questions.

Target Site Specificity

Like all transposons, retrotransposons are potentially deleterious to their host cell. Thus, retrotransposons may adjust their transposition rates to extremely low levels or specific circumstances, so as to minimize their impact on host fitness. However, extremely low transposition rates may be incompatible with survival of the transposon in diploid hosts, since counterbalancing genetic forces such as homozygosis of the transposon-free allele (meaning conversion of a transposon-containing allele to a transposon-free allele by homologous recombination), LTR-LTR recombination, and other homology-dependent rearrangements can lead to elimination of transposon copies. Thus, many transposons, including retrotransposons, have apparently evolved mechanisms to minimize the extent of potential damage to the host genome by transposing into specific, presumably nondeleterious, regions of the genome. There appear to be at least three classes of target site specificities in retrotransposons (Sandmeyer et al. 1990): sequence-based, positional, and regional.

Sequence specificity

The first type of target site specificity is simple sequence specificity, which is common among poly(A) retrotransposons and may apply to some LTR retrotransposons. This specificity is suggested by the target site duplications of a subset of Drosophila elements in the Ty3/gypsy family. The consensus sequence for these elements is TA(T/C)ATA. This conclusion is based on a relatively small number of sequenced elements and is also confused by difficulty in assigning which base pairs are derived from target DNA and which base pairs are derived from the transposon termini. Furthermore, it is not known whether this particular type of sequence is underrepresented in coding regions, although this sequence does occur in transcription initiation sites (“TATA” boxes). Recent results suggest that this consensus should be revised to a short alternating pyrimidine/purine stretch (Song et al. 1994).
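Both the TA(T/C)ATA consensus and the proposed alternating pyrimidine/purine revision can be expressed as simple patterns. The sketch below is illustrative only: the function names and the test sequence are hypothetical, and the choice of three pyrimidine/purine repeats as the definition of a "short stretch" is an assumption made here so that the two patterns cover the same six positions.

```python
import re

# Lookaheads report every start position, including overlapping ones.
CONSENSUS = re.compile(r"(?=TA[TC]ATA)")
# Assumption: a "short alternating stretch" is taken as three Y-R dinucleotides.
ALTERNATING = re.compile(r"(?=(?:[CT][AG]){3})")

def consensus_sites(dna):
    """0-based start positions matching TA(T/C)ATA."""
    return [m.start() for m in CONSENSUS.finditer(dna.upper())]

def alternating_sites(dna):
    """0-based start positions of three consecutive pyrimidine-purine pairs."""
    return [m.start() for m in ALTERNATING.finditer(dna.upper())]

# Hypothetical target DNA used only to exercise the two scanners.
target = "GGCTACATAGGTATATACCGCGCACAGG"
print("consensus:", consensus_sites(target))
print("alternating:", alternating_sites(target))
```

Every TA(T/C)ATA match is also an alternating pyrimidine/purine match, so under these assumptions the revised pattern is strictly the broader of the two.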

Position specificity

Ty3 elements associate with tRNA genes in a position-specific manner. The insertion sites for this element show a near-perfect correlation with the nucleotide position at which RNA polymerase III initiates pre-tRNA transcription, although the sequences serving as the preintegration sites bear no resemblance to each other. Rather, it is the distance to the site of transcription initiation that appears to be conserved for each type of element. This correlation has been explored in great detail. Ty3 targets RNA polymerase III-transcribed genes, not just tRNA genes; the correlation directly reflects integration specificity and is not the result of selection bias for integration at tRNA genes. In an elegant series of recent experiments, the site of RNA polymerase III transcription initiation at a particular target gene was altered by mutations, and the hot spot for Ty3 integration was moved to the new transcription initiation site (Chalker and Sandmeyer 1992, 1993). Thus, the target may be recognized by interactions between the integration machinery and a component of the RNA polymerase III transcription complex, probably TFIIIB. Ty3 integration also depends on the presence of transcription factors TFIIIB and TFIIIC in vitro, implying a direct interaction between Ty3 IN and one or both of these factors (Kirchner et al. 1995).

Regional specificity

Other yeast transposons, such as Ty1, Ty2, and Ty4, are known to be associated with tRNA genes in a looser manner, i.e., without position specificity. Many studies of tRNA genes and native Ty elements in the genome have turned up correlations of this type. Nevertheless, many studies indicate that Ty1 can readily insert into coding regions and promoter regions of RNA polymerase II-transcribed genes that lack known RNA polymerase III-transcribed genes in their neighborhood (Natsoulis et al. 1989; Wilke et al. 1989; Garfinkel and Strathern 1991). Clearly, a more genome-level approach was necessary to obtain a better picture of Ty1 integration site specificity. An experimental system was developed in which any unselected Ty1 transposition into yeast chromosome III could be recovered (Ji et al. 1993). Because the complete nucleotide sequence of this chromosome was available (Oliver et al. 1992), it was possible to sequence the ends of these experimentally induced transposition events and position them precisely. The data showed a very strong correlation between new insertion sites and tRNA genes. Seven different tRNA gene regions served as targets and more than 50% of the sequenced insertions were associated with the regions 400 bp upstream of tRNA genes, a target region representing less than 1.5% of the DNA on chromosome III; when the window was extended 1000 bp upstream of tRNA genes, 90% of the sequenced insertions were within this range. Examination of the tRNA gene positions relative to the position of ORFs on the chromosome indicates that the regions upstream of tRNA genes typically lack ORFs for several hundred to several thousand base pairs. Thus, these regions of the yeast genome may represent safe havens for transposons because genes are not inactivated.
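The strength of this bias can be made concrete with a short calculation. The sketch below uses only the two bounds quoted above (more than 50% of insertions, less than 1.5% of the chromosome); the function names are hypothetical, and the sample size used in the binomial example is an assumed number chosen for illustration, not the number of insertions in the study.

```python
from math import comb

def fold_enrichment(fraction_of_insertions, fraction_of_dna):
    """Observed share of insertions divided by the share expected at random."""
    return fraction_of_insertions / fraction_of_dna

def binomial_tail(n, k, p):
    """Probability of at least k hits in n trials with per-trial probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Bounds from the text: >50% of insertions fall in <1.5% of the DNA,
# so the value computed here is a lower bound on the true enrichment.
lower_bound = fold_enrichment(0.50, 0.015)
print(f"enrichment is at least {lower_bound:.1f}-fold")

# Assumed sample size (hypothetical): 30 insertions, 15 in the target window.
print("chance probability under uniform insertion:", binomial_tail(30, 15, 0.015))
```

Because the two inputs are bounds in opposite directions, the ratio understates the real preference; the binomial tail shows how unlikely such clustering would be if insertion were uniform along the chromosome.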

These results have been confirmed and extended by examination of Ty1 integration into plasmid targets in vivo. Plasmids containing tRNA genes or other RNA polymerase III-transcribed genes are approximately 100–1000 times more likely to serve as transposition targets than plasmids that lack such genes. Furthermore, mutations that cripple transcription of the RNA polymerase III-transcribed genes severely reduce upstream transposition. In contrast, changing the DNA sequence into which the Ty1s insert has little effect on the ability of the DNA to serve as a target (Devine and Boeke 1996). In the case of Ty3, tRNA gene recognition is probably via direct recognition of a transcription factor; for Ty1 (and other elements with this regional specificity), it is less clear what factor is recognized, but it is probably associated with chromatin structure and/or the RNA polymerase III transcription machinery. Another difference between Ty3 and Ty1 is that Ty1 does not show the absolute specificity for RNA polymerase III-transcribed regions that Ty3 does; insertion into other genes, however, represents a relatively small fraction of Ty1 transposition events. Notably, such integration specificity is not observed with Ty1 artificial transposon integration in vitro (Ji et al. 1993; Devine and Boeke 1994), as expected if protein factors not likely to be present in the in vitro reaction are required for specificity.

Host Genetic Factors Regulating Retrotransposition

A variety of other host factors that affect replication of retrotransposons have been identified (Table 7). Some are gene products such as transcription factors required by these elements, which are obviously needed for retrotransposition. The requirement for other host factors such as Rad6p (also known as Ubc2p) in Ty1 transposition is more subtle. RAD6 encodes a ubiquitin-conjugating enzyme that can ubiquitinate histones in vitro. rad6 mutants are deficient in various types of DNA repair. Transposition of GAL-Ty1 elements is not affected by rad6 mutations (H. Xu and J.D. Boeke, unpubl.); however, spontaneous transpositions into certain loci can be affected dramatically (Kang et al. 1992). This suggests that the effect of the mutation is on targeting, rather than on transposition intermediates. rad6 mutations alter the pattern of Ty1 insertions within the target gene CAN1, perhaps by changing local chromatin structure (Liebman and Newman 1993). In addition, the transcription of donor copies of Ty1 elements located in the tandemly arrayed rDNA is markedly affected by rad6 mutations, suggesting a second level of action in which Rad6p silences the Ty1 elements located in rDNA (Bryk et al. 1997). In another example, the flamenco (flam) gene of Drosophila appears to be critical for controlling transposition of the gypsy element. Strains carrying flam mutations, which support transposition, have elevated overall levels of gypsy transcripts, but perhaps more importantly, they express a spliced env mRNA resembling that of retroviruses. This spliced mRNA is not seen in wild-type strains (Pélisson et al. 1994). Thus, the transposition competence of flam strains may reflect a requirement for Env protein in gypsy transposition, consistent with transposition via an infectious mechanism. The phenotype of flam mutants suggests that the flam gene product may be involved in both transcription and RNA splicing.

Table 7. Retrotransposition Host Factors.


Poly(A) (Non-LTR) Retrotransposons

The poly(A)-type retrotransposons represent a very large class of elements: large in number, diversity of structural types, and the extent to which they have colonized certain host genomes, including the human genome. This very important class of poly(A) elements has been found in filamentous fungi, plants, invertebrates, and mammals. The abbreviation LINE (long interspersed nuclear element) was coined by investigators of the mammalian genome to distinguish these abundant repeat sequences from another very common but much shorter type of repeat, called SINE (short interspersed nuclear element), typified by the human Alu and mouse B2 elements. The less cumbersome term L1 is now applied to mammalian LINEs and is used here. As much as 20% of human DNA is estimated to consist of L1 DNA (Smit 1996); if Alu sequences depend on L1 for transposition machinery as suggested above, then L1 is responsible for at least one third of human DNA. The poly(A) retrotransposons are unique among transposable elements: Most of them lack terminal repeats of any kind (except for the target site duplication). The target site duplication itself is unusual in that for most of these elements, it is variable rather than fixed in length. There are now also several documented examples of target site deletions caused by these elements, although these are less common than target site duplications (Jensen et al. 1995). Most of these transposons are characterized by a 3′ poly(A), oligo(A), or a similar sequence (e.g., [TAA]n for Drosophila I factor). Because of these structural peculiarities, the poly(A) retrotransposons serve as a molecular archetype for a larger set of elements such as the retrotranscripts, which include the SINEs and processed pseudogenes. All of these elements share the unusual structural features listed above. Thus, it seems likely that the retrotranscripts, which do not appear to encode any transposition machinery, “borrow” the transposition machinery encoded by the poly(A) retrotransposons.

Among the various types of poly(A) retrotransposons are two recurring subtypes: site-specific and nonsite-specific. The former are associated with specific repeat sequences of their host and are always inserted at the same site within the host repeat. These are typified by the R1 and R2 elements that inhabit insect ribosomal DNA, whereas the nonsite-specific class is typified by human L1. When RT sequences of the poly(A) retrotransposons are compared, the resulting phylogenetic tree does not sort these classes into separate branches, suggesting that site-specific integration is polyphyletic (i.e., arose independently more than once within the class) (Xiong and Eickbush 1990).

Site-specific Elements

The site-specific elements are associated with specific repeat sequences and usually have a target site duplication of fixed length. Probably the best studied elements of this type are the R1 and R2 elements of insect rDNA, in particular the R2 element of Bombyx mori (R2Bm). These elements have now been found in many disparate orders of insects (Jakubczak et al. 1991) as well as other invertebrates (Burke et al. 1996) and are adapted to highly conserved regions of the 28S rDNA. R2, for example, inserts near the site occupied by a group I intron in Tetrahymena rDNA, always inserting at exactly the same position. As in most organisms, insect rDNA occurs in genomic tandem arrays. Yet another variety of these retrotransposons is specific to trypanosomatids. These elements are associated with a different tandem array, the short (400–1400-bp) gene repeats encoding the spliced leader (or “mini-exon”) that is spliced in trans to all trypanosomatid mRNAs. Again, association is with a specific position in the miniexon DNA, usually very near or at the splicing site (Aksoy et al. 1987; Carrington et al. 1987; Gabriel et al. 1990; Villanueva et al. 1991). Yet another type of association with a target repeat is postulated for the Xenopus Tx elements; these appear to be associated with small dispersed repeat sequences bearing short inverted repeat termini; the latter repeats are believed to represent members of DNA transposon families of Xenopus (Garrett et al. 1989).

Little is known about the in vivo behavior of the site-specific elements, their transposition frequency, tissue or germ-line specificity, or possible regulation. One of the most puzzling aspects of this family is that little clear evidence exists that these site-specific elements are even transcribed, although there are now some interesting data on the function of their gene products, obtained by overexpression in heterologous systems. Because many of them are associated with transcribed sequences such as rDNA or miniexon DNA, it is possible either that these elements are cotranscribed passively with the flanking DNA or that they contain their own promoters, like the nonsite-specific poly(A) retrotransposons. The transcripts of these elements may either be very short-lived or fractionate away from cellular RNAs. Alternatively, they may be transcribed only in specific tissues or, in the case of microorganisms, under very specific environmental conditions. Regulated transcription seems more likely given that both the Drosophila I factor and mammalian L1 sequences are known to show tissue-specific transcription (Skowronski and Singer 1985; Skowronski et al. 1988; Chaboissier et al. 1990; Branciforte and Martin 1994).

The site-specific poly(A) retrotransposons can have either one (e.g., R2Bm, CRE-1) or two (e.g., R1, SLACS) ORFs. Those elements having two ORFs generally have a shorter upstream ORF that is featureless except for a carboxy-terminal zinc-finger-like domain (Covey 1986). The large second ORF invariably contains the motifs shared among RTs (Poch et al. 1989; Xiong and Eickbush 1990). RT activity has now been biochemically demonstrated for products of CRE-1 (Gabriel and Boeke 1991) and R2Bm (Luan et al. 1993).

Important clues to the mechanism of replication of these elements can be gleaned from the family-wide analysis of the structure of poly(A) retrotransposons, such as human L1 elements. As most element copies have variable amounts of 5′-end sequence, but virtually all have identical 3′ termini, one can easily imagine a critical initial priming event requiring nucleotide sequences from the transposon's 3′ end, followed by reverse transcription for variable distances. This would be followed by a second sequence-joining event (or second priming event) that lacks a specific requirement for 5′ transposon sequences.
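The reasoning above (priming at the 3′ end followed by reverse transcription for variable distances) can be illustrated with a minimal toy simulation. This is only a sketch under stated assumptions: the element sequence, the per-base stop probability, and the function names are hypothetical and are not taken from any experiment described here. For readability, each copy is written in the sense of the element rather than as minus-strand cDNA.

```python
import random

def reverse_transcribe_from_3prime(element, stop_probability, rng):
    """Copy `element` starting at its 3' end and moving toward its 5' end.

    After each copied base the hypothetical enzyme stops with probability
    `stop_probability`, so every copy keeps the 3' terminus but may lack
    a variable amount of 5' sequence.
    """
    copied = 0
    for _ in element:
        copied += 1
        if rng.random() < stop_probability:
            break
    # The retained part is the 3'-most `copied` bases of the element.
    return element[len(element) - copied:]

def simulate_copies(element, n_copies, stop_probability=0.02, seed=0):
    """Generate `n_copies` independent copies with a seeded generator."""
    rng = random.Random(seed)
    return [reverse_transcribe_from_3prime(element, stop_probability, rng)
            for _ in range(n_copies)]

# Toy element (assumption): an arbitrary body followed by a 3' poly(A) tract.
ELEMENT = "GGCCTTAGCATCGATCGGATTACAGCTAGCTAGGCTC" + "A" * 8
copies = simulate_copies(ELEMENT, n_copies=20)

# Right-align the copies so the shared 3' end and ragged 5' ends are visible.
for c in sorted(copies, key=len):
    print(c.rjust(len(ELEMENT), "."))
```

In this sketch every simulated copy is a suffix of the element, which mirrors the observation that copies share 3′ termini while differing in the extent of 5′ sequence. It does not model the second joining event at the 5′ end.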

Direct evidence for a specific version of this model was provided by analysis of the R2Bm element RT protein. This protein, purified from E. coli, has RT activity coupled to an endonuclease activity that is specific for DNA containing the insertion site of R2Bm. The purified protein carries out a target DNA nicking reaction (Xiong and Eickbush 1988), and in the presence of the appropriate RNA, which must contain at least the 3′ 250 nucleotides of R2Bm RNA (Luan and Eickbush 1995), a minus-strand nick-primed reverse transcription reaction is carried out in which the R2Bm RNA serves as template and the 3′-OH of the nicked target site serves as a primer. These data have provided the best evidence for a model of retrotransposition of the poly(A) retrotransposons (Fig. 9; see also below) (Eickbush 1992; Luan et al. 1993). Thus far, mostly minus-strand synthesis has been observed in this in vitro system; the means by which the 5′end of the element is joined to target DNA and by which the plus strand is synthesized are not yet clear. Given the existence of many 5′truncated R2 elements (as found in nearly all poly(A) retrotransposon families), it is unlikely that the reactions will depend on specific 5′-terminal transposon sequences.

Figure 9. Non-LTR retrotransposition model. In certain poly(A) retrotransposons such as R2Bm and probably in others as well, priming of reverse transcription is mediated by a nick in the target site. Thus, in these elements, cutting of target site DNA precedes …

These functional data imply an in vitro coupling between cleavage of the target site and reverse transcription of at least the minus strand in the R2Bm element and suggest that models such as those proposing reverse transcription in the cytoplasm or fold-back priming are unlikely. Eickbush (1992) has proposed a simple model (Fig. 9) in which retrotransposition is quite different from the LTR-containing element replication process: Integration is mediated by priming of reverse transcription at the target DNA nick. Hence, in this model, “integration” in the form of nicking the target DNA precedes reverse transcription, rather than the other way around as in the LTR elements. Recent experiments strongly suggest that this model can be extended to other poly(A) retrotransposon elements that lack specific target sequences, such as the human L1 element. These elements also encode a nicking endonuclease. The specificity and biochemical properties of the endonuclease strongly suggest that it recognizes and cleaves target DNA sites. This suggests that nick priming is the retrotransposition mechanism used by this entire class of elements (Feng et al. 1996).

Nonsite-specific Poly(A) Retrotransposons

L1-like Elements

This very important class of poly(A) elements is represented in virtually all eukaryotes (although conspicuously absent in Saccharomyces). A very large number of these elements have now been characterized from a variety of host genomes (Table 6), but this section focuses on four elements used for most functional studies: the Neurospora crassa Tad element, the Drosophila I factor, mouse L1, and human L1. These elements are usually characterized by a 3′ A-rich sequence (Tad lacks this), by the presence of 5′-truncated copies, and by variable-length target site duplications.

Transcripts

The full-length plus-strand transcripts of these elements are transposition intermediates; hence, it is of great interest to catalog the tissues containing these transcripts, to determine whether such transcripts can be found in the germ line, and to identify the relevant transcription factors. The mammalian L1 elements are highly transcribed in a variety of embryonal carcinoma cell lines, and much less so in other cell lines. The full-length human L1 transcripts predicted to be transposition intermediates were observed in teratocarcinoma lines (Skowronski and Singer 1985; Skowronski et al. 1988). Since all of these elements have different 5′-flanking sequences, this result suggested that these elements might contain an internal promoter downstream from the transcription start site. The presence of an internal RNA polymerase II promoter was demonstrated for the Drosophila jockey element (Mizrokhi et al. 1988) and subsequently found in the human L1 element (Swergold 1990) and Drosophila I factor (McLean et al. 1993). A number of cellular RNA polymerase II-transcribed genes that appear to contain internal promoters have also been described; many such TATA-independent promoters utilize the common transcription factor YY1 (Singer et al. 1993; Usheva and Shenk 1994). The murine L1 sequences differ from the jockey and the human L1 elements (and most other poly(A) retrotransposons) in that they contain tandem repeats at their 5′termini. These repeats have promoter activity (Severynse et al. 1992; Adey et al. 1994); however, the detailed 5′structure of murine L1 transcripts has not yet been reported. A recent study demonstrated that the mouse L1 elements are transcribed in a select subset of teratocarcinoma cell lines (Branciforte and Martin 1993). Expression of L1 RNA and protein was also demonstrated directly in vivo in mouse testis in both germ and somatic cell lineages (Branciforte and Martin 1994).

The identification of new mutations caused by human L1 elements in the factor VIII gene, causing hemophilia A (Kazazian et al. 1988), and the dystrophin gene, causing Duchenne muscular dystrophy (Narita et al. 1993; Holmes et al. 1994), suggests that L1 elements transpose and hence must be transcribed in the germ line. Similarly, an L1 insertion was reported in a canine venereal tumor, consistent with germ line expression (Katzir et al. 1985), and human L1 is expressed at high levels in germ line tumors (Bratthauer and Fanning 1992). A recently characterized mutation leading to spasticity in the mouse was shown to result from an L1 insertion in a glycine receptor gene (Kingsmore et al. 1994). Three additional mouse germ line mutations caused by L1 insertion have also been described recently (Kohrman et al. 1996; Takehara et al. 1996; Perou et al. 1997). However, transposition of mammalian L1 elements is clearly not limited to the germ line, as somatic transposition has also been observed in a breast cancer (Morse et al. 1988) and in a colon cancer (Miki et al. 1992), implying expression and transposition in somatic tissues as well.

In insects, evidence has also been obtained for germ-line-specific transcription of members of this family. The I factor of Drosophila has been shown to produce abundant full-length transcripts only in the germ line (Chaboissier et al. 1990; Lachaume et al. 1992; McLean et al. 1993). Similarly, the Fex element of Drosophila shows expression in the germ line, as well as in other tissues (Kerber et al. 1996).

Other elements of this type for which full-length RNAs have been detected include the Tad element of N. crassa. In addition to full-length RNA, an abundant short minus-strand RNA initiated near the 3′ end of the element has been demonstrated for this retrotransposon; the detailed structure and function of this minus-strand species are not yet known.

Gene Products

Many workers suspect that poly(A) elements, by analogy to LTR elements, require some particulate species or virus-like particle to mediate retrotransposition. ORF1 proteins of the poly(A) elements are similar in size and encoded at genomic locations similar to those of the Gag proteins; they have been put forward as candidate structural proteins. Like LTR element Gag proteins, they show little to no absolute conservation at the amino acid sequence level. However, the most popular current model for poly(A) retrotransposition, nick priming, does not require a virus-like particle, but only an RT-RNA complex (Fig. 9). Furthermore, many of these elements lack an ORF1 altogether, making a universal requirement for a virus-like particle in these elements somewhat less likely.

Protein products of the nonsite-specific poly(A) retrotransposons have mostly been detected by expression in heterologous cells. In the case of the ORF1 proteins of mammalian L1 elements, both human (Holmes and Singer 1992) and mouse (Martin 1991; Branciforte and Martin 1993, 1994), these proteins can be observed in the homologous cell type. In the case of the mouse elements, they reside in a particulate fraction. The human L1 ORF1 protein contains a leucine-zipper-like motif, which may be important for multimerization (Holmes and Singer 1992), but this motif is not conserved in all mammalian L1 ORF1s. The amino termini of L1 ORF1s are extremely diverse (Demers et al. 1989; Kolosha and Martin 1995); the amino-terminal domain of rabbit L1 strikingly resembles keratins. In addition, many poly(A) element ORFs contain a putative zinc-binding motif that is similar to zinc-binding motifs in the Gag proteins of many other retroelements (Covey 1986), supporting the possibility of a virus-like particle as a transposition intermediate. The best evidence for this comes from the mouse L1 element, in which cosedimentation of ORF1 protein and L1 RNA molecules has been observed in sucrose gradients. Most of these particles appear to be cytoplasmically located (Martin 1991). It has been reported that human teratocarcinoma cells contain a particulate RT activity that has been attributed to the L1 element (Deragon et al. 1990). Finally, the human L1 ORF1 protein is present as a high-molecular-weight cytoplasmic complex from which soluble L1 ORF1 protein can be released by RNase treatment; furthermore, L1 RNA could be specifically UV-cross-linked to ORF1 protein (Hohjoh and Singer 1996). Nevertheless, the function of the ORF1 protein in the poly(A) retrotransposons remains enigmatic.

Both ORF1 and ORF2 (pol) proteins from a variety of poly(A) retrotransposons have now been expressed in heterologous systems. The pol genes of jockey (Ivanov et al. 1991) and human L1 (Mathias et al. 1991) have been shown to encode enzymatically active RT. Are there any additional activities associated with these poly(A)-type retrotransposon proteins? One activity that might be expected is RNase H. Suggestive homologies with RNases H have been reported for I factors (Fawcett et al. 1988; Abad et al. 1989). Although most other poly(A) retrotransposon pol ORFs lack a readily identifiable RNase H domain, there is certainly “room” in the pol ORF of these elements to encode additional functions both upstream and downstream from the RT. Most poly(A) retrotransposons encode a zinc-finger-like motif in the pol ORF, but its function is unknown.

A recent report suggested the presence of an AP (apurinic-apyrimidinic) endonuclease domain in certain of these retrotransposons. This domain is found as a separate ORF in the trypanosome L1Tc element and in the amino termini of ORF2 of certain other elements (Martín et al. 1995). This domain was hypothesized to provide either a damage-specific nuclease that might direct integration (Martín et al. 1995) or an RNase H (a conserved activity of AP endonucleases) (Barzilay and Hickson 1995). A recent study has shown that this domain is widely conserved across all nonsite-specific poly(A) elements and is found in some but not all site-specific elements. When the human L1 endonuclease domain was expressed in E. coli, the purified protein was shown to contain a nicking endonuclease activity (Feng et al. 1996).

Retrotransposition Mechanism

Transposition mechanisms can often be deduced from studies of cloned, preferably genetically marked, functional elements in their native hosts. This type of study has now been carried out with the I factor, Tad, and L1 elements. Original studies on the I factors were carried out with unmarked elements in appropriate D. melanogaster strains. New mutations were shown to result from I factor insertions that resembled the original insertions except that the number of 3′-terminal TAA repeats and the target site duplication differed from those at the site of the parental element. Direct demonstration of an RNA intermediate in the transposition of these elements was accomplished by “intron-marking” experiments (Jensen and Heidmann 1991; Pélisson et al. 1991). In such experiments, an intron flanked by exon sequences sufficient to confer complete splicing signals is used as a marker for the element. Progeny copies can then be studied by various methods to determine whether the intron was lost during transposition, indicating passage through an RNA intermediate. Similar intron-marking experiments have been done with Tad (Kinsey 1993), murine L1 elements (Evans and Palmiter 1991), and human L1 (Moran et al. 1996). These experiments, together with the demonstration of full-length RNA transcripts and encoded RT activities, make a strong case that transposition of poly(A) retrotransposons is mediated by a cDNA intermediate. Moran et al. (1996) obtained very high frequency retrotransposition of a marked human L1 element by driving its expression with the strong cytomegalovirus enhancer/promoter from a high-copy episome, creating the mammalian equivalent of a GAL-Ty1 plasmid. Analysis of the progeny transposition events derived from this plasmid revealed a number of surprising results. All of the marked L1s terminated at a poly(A) site specified by an SV40 sequence in the plasmid rather than at the native L1 poly(A) site. Furthermore, most of the 3′ UTR could be deleted without affecting transposition frequency. These data suggest that it is the 3′ poly(A) sequence in the transposon RNA that is critical for retrotransposition and not a 3′ secondary structure. Presumably, the 3′ poly(A) is recognized by the EN/RT ORF2 protein, and this complex is brought to the target DNA site, where nick priming occurs. A second surprise was that the heterologous promoter was not required for transposition in HeLa cells, suggesting that the L1 promoter is not always quiescent in nonteratocarcinoma cells. Although this result seems to be at odds with other data (Swergold 1990), the expression observed by Moran et al. may be due to methylation differences, the presence of the promoter on a replicating plasmid, or sequence context effects. Finally, the human L1 construct is active for retrotransposition in murine cells, suggesting a host of practical applications ranging from insertional mutagenesis to gene therapy using L1 constructs.
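The logic of the intron-marking experiments described above can be sketched in a few lines of code. This is a hedged illustration only: the sequences, the function names, and the simplification of splicing to the removal of a single marked segment are assumptions made for the example and are not the constructs used in the cited studies.

```python
def transcribe_and_splice(dna, intron):
    """Toy model of transcription plus splicing.

    The transcript is written in the DNA alphabet for simplicity, and
    splicing is reduced to removing one copy of the marked intron.
    """
    return dna.replace(intron, "", 1)

def make_progeny(element, intron, via_rna):
    """Return a progeny copy of `element`.

    If the copy passes through a spliced RNA intermediate, the intron is
    lost; a purely DNA-based copy retains it.
    """
    if via_rna:
        return transcribe_and_splice(element, intron)
    return element

def classify_progeny(progeny, intron):
    """Interpret a progeny copy as the experimenter would."""
    if intron in progeny:
        return "intron retained: no evidence for an RNA intermediate"
    return "intron lost: consistent with an RNA intermediate"

# Hypothetical marked element: exons in upper case, intron in lower case.
EXON_1 = "ATGGCCAAG"
INTRON = "gtaagtttctaacttttcag"
EXON_2 = "GGTCTGTAA"
MARKED_ELEMENT = "CCGG" + EXON_1 + INTRON + EXON_2 + "AAAAAAAA"

for via_rna in (True, False):
    progeny = make_progeny(MARKED_ELEMENT, INTRON, via_rna)
    print(via_rna, classify_progeny(progeny, INTRON))
```

The design choice worth noting is that the marker reports on the pathway rather than on the element itself: only a copy that has been transcribed, spliced, and reverse-transcribed can lack the intron.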

There is still no single, satisfying model for the retrotransposition of nonsite-specific poly(A) retrotransposons. Studies of the R2Bm element strongly support models in which reverse transcription and integration are coupled and hence must occur in the nucleus where target DNA resides (Luan et al. 1993). However, there are a number of observations on the behavior of nonsite-specific poly(A) retrotransposons that seem to be somewhat difficult to reconcile with this model.

First, cytoplasmic particles containing ORF1 protein and L1 RNA are found in mouse cells (Martin 1991). This observation may indicate that these particles are en route to delivering the genomic L1 RNA to the nucleus, where it will become reverse-transcribed and incorporated into chromosomal DNA. Alternatively, the observed particles may be artifactual and are unrelated to transposition.

Second, studies of the N. crassa Tad element in heterokaryons have revealed a cytoplasmic intermediate in transposition, in which the transposon is transferred from one nucleus to the other. Again, this observation can be reconciled with the in situ integration/reverse transcription model if the intermediate is simply cytoplasmic Tad RNA or a ribonucleoprotein derived from it.

A model competing with the “integration precedes reverse transcription” model of Eickbush (1992) has been proposed by Schumann et al. (1994). This simple model is based on the observation that there is an abundant minus-strand RNA molecule corresponding precisely to the 3′terminus of the DRE-1 element. Annealing of the minus-strand species to the longest plus-strand species, which is incomplete at its 3′terminus relative to the DNA copy, would result in a duplex that could be filled in at both ends by RT. In principle, this would result in a fully duplex nucleic acid in which both 3′termini would be in the DNA form. Such a molecule could then be integrated into a target site subsequent to reverse transcription, as in the case of the LTR-containing retrotransposons. It may also be that this transposition mechanism is used only by DRE elements, which, unlike other poly(A) elements, have unusual terminal repeat structures.

Retrotranscripts

The term retrotranscript is used here to refer to both short, non-RT-encoding transposable sequences, such as Alu and other “SINE” sequences, and the processed pseudogenes, a diverse set of genes apparently derived from their normal progenitor genes via an mRNA intermediate. The latter are related to their “parent” gene by intron loss, presence of a 3′poly(A) tract, and lack of the native external promoter sequence. Like other transposable elements, the retrotranscripts are flanked by short target site duplications. Alu-like elements and processed pseudogenes have been lumped together because they share three common features: (1) a 3′poly(A) or oligo(A) tract, (2) variable-length target site duplications, and (3) lack of a functional gene product. There are a few exceptions to this final criterion among the processed pseudogenes. Because the first two features are shared with nonsite-specific poly(A) retrotransposons such as L1, and because retrotranscripts do not encode RT or other obvious transposition proteins, it is a reasonable assumption that these elements borrow transposition machinery from other elements, most likely poly(A) retrotransposons. SINEs are specialized nucleic acid sequences, most of which probably originally derived from “normal” cellular sequences such as 7SL RNA (Alu) or tRNA genes.
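Two of the shared structural features listed above, the 3′ poly(A) or oligo(A) tract and the variable-length target site duplication, can be checked with simple string operations. The following is a minimal sketch, assuming exact-match duplications and hypothetical sequences and function names; real annotation would need to tolerate mismatches and ambiguous boundaries.

```python
def target_site_duplication(left_flank, right_flank, max_len=30):
    """Return the longest sequence that both ends `left_flank` and begins
    `right_flank`, i.e. a candidate direct repeat flanking the insert.
    Returns an empty string if no such sequence exists.
    """
    limit = min(len(left_flank), len(right_flank), max_len)
    for k in range(limit, 0, -1):
        if left_flank[-k:] == right_flank[:k]:
            return left_flank[-k:]
    return ""

def poly_a_tail_length(insert):
    """Count consecutive A residues at the 3' end of `insert`."""
    return len(insert) - len(insert.rstrip("A"))

# Hypothetical locus: genomic flank, inserted sequence, genomic flank.
LEFT_FLANK = "CCGTTAGGCTAAGAATGTC"
INSERT = "GGCGGGCGCGGTGGCTC" + "A" * 12
RIGHT_FLANK = "AAGAATGTCGGATCCAT"

tsd = target_site_duplication(LEFT_FLANK, RIGHT_FLANK)
print("candidate target site duplication:", tsd, "length", len(tsd))
print("3' A-tract length:", poly_a_tail_length(INSERT))
```

Because the duplication length is not fixed for these elements, the sketch searches all lengths up to `max_len` rather than testing a single expected size.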

The SINEs themselves must have acquired special features that render them particularly well suited for successful replication; presumably, these features are acquired by chance. It is noteworthy that retrotranscripts have not been found in the yeast S. cerevisiae, which also lacks poly(A) elements, whereas most if not all organisms known to contain poly(A)-type elements, from insects to mammals, also contain retrotranscripts. Retrotranscripts such as Alu sequences are clearly transposons; they cause new mutations, such as an insertion in the NF1 gene resulting in neurofibromatosis (Wallace et al. 1991). Recent studies suggest that the target sites of L1 and Alu sequences are similar and utilize a similar L1-like endonuclease (Jurka 1997). This further supports the hypothesis that Alus depend on L1s for movement.

An unusual family of retrotranscripts of special interest to retrovirologists has recently been described by Ohshima et al. (1993). The structure of these elements suggests that they may be retroviral replication intermediates corresponding to minus-strand strong stop DNA that have become independent mobile genetic elements. These elements contain a tRNA-like domain (containing an internal RNA polymerase III promoter), as well as an R/U5-like domain. However, the mode of movement of these elements has not yet been established. They are present in high copy number in certain marine organisms and perhaps elsewhere, and they can be transcribed by RNA polymerase III in vitro.

Transcripts, Secondary Structure Models

The transcripts that give rise to processed pseudogenes are probably conventional cellular mRNAs. Therefore, there is nothing to indicate that any special structures in the mRNA (other than perhaps the 3′poly(A) tail) are essential to their use as templates for the creation of pseudogenes. On the other hand, some genes appear to be more prone to pseudogene formation than others (Piechaczyk et al. 1984; Heller et al. 1988). If, like R2Bm, the human L1 element prefers certain RNA structures to define the template for reverse transcription, and human L1 provides the RT for processed pseudogene formation, then it is likely that certain transcripts will be selectively reverse-transcribed.

Alu and B1 sequences have been shown to adopt very specific secondary structures (Labuda and Striker 1989; Sinnett et al. 1992). These sequences appear to be derived from the 7SL RNA that is the RNA component of SRP (Ullu and Tschudi 1984; Quentin 1992). Table 8 summarizes some common retrotranscript types. Alu and B2 transcripts appear to be quite rare and difficult to detect in mammalian RNA preparations, belying their extremely efficient colonization of mammalian genomes, of which they can constitute 10% or more by weight. This, together with the abundance of 7SL RNA, has made it difficult to ascertain the precise structure of the 5′and 3′ends of native Alu transcripts.

Table 8. Retrotranscript Types.

Alu insertion events have now been detected as mutations in human genes (Muratani et al. 1991; Wallace et al. 1991). Interestingly, these recently transposed Alu copies conform very well to a subclass of Alu elements that are thought to be recently transposed by other criteria. Evidence has been presented that this subclass is transcribed in human cells, albeit only at a very low level (Matera et al. 1990). Certain Alu sequences are hypomethylated in sperm, consistent with the possibility of higher transcription/transposition in germ line cells, as well as possible involvement in imprinting (Hellmann-Blumberg et al. 1993; Chesnokov and Schmid 1995). Recent results suggest that Alu transcript accumulation in tissue culture cells responds to various environmental stresses; 5-azacytidine treatment, heat shock, and related insults can dramatically increase Alu transcript levels (Liu et al. 1995).

Transposition Mechanism: Models

Model systems have been developed for the integration of the equivalent of processed pseudogenes in yeast and animal cells. In the first example of this approach, constructs bearing a neo gene but lacking viral packaging signals were introduced into cells expressing virion proteins. These cells fail to package viral RNA because of a packaging defect. “Virus” produced by these cells could transduce this neo gene into naive target cells. The transductants were dubbed “retrofectants” and were shown to have unusual structures (Linial 1987; Dornburg and Temin 1990; Linial and Miller 1990). The genomic structure of the retrofectants differs from that of “normal” processed pseudogenes in that they lack the expected 3′ poly(A) tails and target site duplications. These structures suggest that retrofection represents a different pathway from the one used for processed pseudogene formation, consistent with the hypothesis that the processed pseudogene formation machinery is provided by the L1 element, rather than by endogenous retroviruses.

Derr et al. (1991) demonstrated pseudogene formation in yeast using an intron-marking scheme with the selectable HIS3 reporter gene (see Fig. 10). This system more closely resembled true pseudogene formation in that poly(A) tails were always present at the expected end of the pseudogene, namely, at the 3′ end of the RNA that was originally transcribed and gave rise to the pseudogene. The efficiency of pseudogene formation was proportional to the expression of Ty1; Ty1 RT thus appears to be required for formation of these genes. A few of the pseudogenes were cloned and their structure was determined. The pseudogenes were flanked not by variable-length target site duplications but by various Ty1 sequences. Often, these Ty1::HIS3::Ty1 pseudogenes were part of larger arrays containing more than one Ty1 copy. The pseudogenes could be incorporated into the Ty1 element in either orientation; the mechanism by which the Ty1 sequences became joined to pseudogene DNA was presumed to reflect jumping of the RT from a Ty1 RNA (or minus-strand DNA) template to HIS3 RNA and back to Ty1 RNA (or minus-strand DNA). This process is a good model for the capture of oncogenes by retroviruses (see Chapter 4). These arrays were then integrated into the genome, either by Ty1 IN action on the terminal Ty1 sequences or via homologous recombination between the terminal Ty1 sequences and the endogenous Ty1 copies.

Figure 10. Intron reporters of retrotransposition: Strategies. A useful strategy for evaluating whether retrotransposition is occurring is to insert an intron-containing marker into the retroelement. This was done using a native gene segment as a marker by Boeke …

A mammalian system that leads to structurally faithful pseudogene formation (Tchénio et al. 1993) uses introduction of a highly expressed intron-marked neo gene into mammalian cells. Subsequent Neo selection on these cell lines gives rise to NeoR colonies, most of which contain processed pseudogenes. These pseudogenes have both a poly(A) terminus and variable-length target site duplications very similar to those observed in natural processed pseudogenes. Although the construct used here contains retroviral sequences, their removal actually results in more efficient pseudogene formation. This and follow-up studies (Maestre et al. 1996) suggest that the retroviral origin of some of the sequences in the construct was irrelevant. These results suggest that pseudogene formation is not mediated by endogenous viruses because (1) inclusion of retroviral packaging signals in the reporter gene interferes with pseudogene formation, rather than stimulating it, and (2) conditions that stimulate L1 transcription stimulate pseudogene formation. Thus, L1 is a very good candidate to encode the RT and perhaps other machinery responsible for pseudogene formation in mammalian cells. However, a direct demonstration that L1 plays this role remains to be made.

Experiments in yeast suggest that L1 RT may be necessary but not sufficient for the full pathway of processed pseudogene formation (Dombroski et al. 1994). Expression of L1 ORF2 in yeast cells that do not express Ty1 RT leads to efficient HIS3 pseudogene formation. However, most of these pseudogenes, like the Ty1-mediated pseudogenes, are flanked by Ty1 sequences and lack characteristic target site duplications. Unlike the Ty1-mediated events, extra bases are found at the junctions between Ty1 and pseudogene sequences. These results suggest that a mechanism similar to that used by Ty1 RT is employed by L1 RT, except that upon reaching the end of a Ty1 or HIS3 RNA, untemplated bases are added to the 3′end of the DNA strand. These would occasionally be fortuitously complementary to the sequence of an adjacent RNA template, allowing “jumping” of the RT to another template. L1 RT-mediated pseudogene formation in yeast is dependent on the host RAD52 gene, suggesting that integration of these pseudogenes is mediated by general homologous recombination between the terminal Ty1 sequences and Ty1 copies in the genome (J. Moran, unpubl.).

Yeast may not be the most appropriate system to study mammalian processed pseudogene formation as natural processed pseudogenes have not been observed in this organism. This might be due to the fact that yeast lacks L1-like elements. Fink (1987) has suggested that essentially all yeast genes are processed pseudogenes but that they are incorporated into the genome via general homologous recombination, leading to loss of any introns present in the original genomic copy. Such recombination occurs much more readily in yeast than in more complex organisms. By this hypothesis, yeast-processed pseudogenes would lack two important hallmarks of processed pseudogenes, namely, the poly(A) tail and variable-length target site duplication, because these features of the termini would be lost during the process of general homologous recombination. Thus, they are recognized as “processed” simply by their lack of introns, making them impossible to distinguish from normal yeast genes.

Retroplasmids and Retrointrons

Mauriceville/Varkud Plasmids

Most retroelements inhabit the cytoplasm and nuclei of host cells; however, a few bizarre retro-creatures live in organelles. These are members of the “prokaryotic” class of retroelements; an outline of the nucleic acid transactions that occur in these elements is presented in Figure 11. The unusual Mauriceville and Varkud plasmids were isolated from the mitochondria of certain strains of N. crassa, in which they replicate independently of the mitochondrial DNA. Sequence analysis indicated that these closely related plasmids contain a single RT reading frame (Nargang et al. 1984; Akins et al. 1988). The plasmids are transcribed into a long transcript that is exactly unit length; at the end of this transcript, an unusual tRNA-like structure is found, reminiscent of the 3′ genomic termini of Brome mosaic virus and certain other RNA viruses.

Figure 11. Replication mechanisms: “Prokaryotic” retroelements. A brief sketch of each element's RNA and DNA forms, drawn so as to imply the mechanisms used for replication. (A) Retron (msDNA); (B) Retrointron (group II intron); (C) Retroplasmid …

Recent work has shown that this RNA is packaged into a particulate structure with RT activity; in fact, the RT can carry out an endogenous reaction on this template, generating a full-length linear DNA (Kuiper and Lambowitz 1988). Surprisingly, the reaction products are not associated with an RNA primer on the 5′ end of minus-strand DNA. Recent results indicate that the Mauriceville RT can initiate DNA synthesis in a primer-independent reaction templated by the penultimate nucleotide of the 3′ terminus of the specific template RNA (Fig. 12) (Wang and Lambowitz 1993). This RT (unlike all DNA polymerases thus far studied) lacks an absolute primer requirement. This has led to the speculation that this enzyme might represent a transitional form between the primer-independent RNA polymerases and the DNA polymerases. In addition, this enzyme can use a wide variety of noncomplementary nucleic acids to prime its specific minus-strand DNA product in vitro. The 3′ end of the Mauriceville template RNA folds into a tRNA-like structure that is apparently a recognition motif for the RT. This 3′ tRNA-like structure fits very well with the genomic tag hypothesis of Weiner and Maizels (Weiner and Maizels 1987; Maizels and Weiner 1993), which suggests that tRNAs were recruited early in the RNA world as genomic punctuation marks. However, unlike the case for the primer tRNAs in retroviruses and LTR retrotransposons, the Mauriceville tRNA-like structure does not normally serve as the primer for reverse transcription.
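The initiation rule described in this paragraph can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the template sequence is hypothetical, the function name `minus_strand_from_penultimate` is invented here, and the model simply assumes that copying starts opposite the penultimate nucleotide, so that the 3′-terminal nucleotide of the RNA is not represented in the DNA product.

```python
# Toy model (not from the source): primer-independent initiation opposite
# the penultimate nucleotide of the template RNA's 3' terminus.

# DNA base incorporated opposite each RNA template base.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def minus_strand_from_penultimate(template_rna: str) -> str:
    """Return the minus-strand DNA (written 5'->3') copied from an RNA
    template (written 5'->3'), starting opposite the penultimate base.

    The 3'-terminal template nucleotide is therefore left uncopied.
    """
    if len(template_rna) < 2:
        raise ValueError("template must contain at least two nucleotides")
    copied_region = template_rna[:-1]  # skip the 3'-terminal nucleotide
    # The RT reads the template 3'->5' while extending the DNA 5'->3'.
    return "".join(DNA_COMPLEMENT[base] for base in reversed(copied_region))


if __name__ == "__main__":
    toy_template = "GGAUCCACCA"  # hypothetical sequence, for illustration only
    print(minus_strand_from_penultimate(toy_template))
```

In this simplified picture the product is one nucleotide shorter than a full copy of the template, because its 5′ end lies opposite the penultimate template nucleotide.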

Figure 12. Initiation of reverse transcription in retroplasmids. The 3′ end of Mauriceville plasmid RNA ends in a 3′ tRNA-like structure (A). Like native tRNAs and similar structures found in plant RNA viruses such as Brome mosaic virus, the Mauriceville …

Limited plus-strand synthesis is also observed in vitro with the isolated Mauriceville RT ribonucleoproteins, and full-length double-stranded DNA can be isolated from the reaction. Thus, it has been proposed that the replication mechanism of the plasmid is via its transcription, reverse transcription, and circularization. This life cycle is similar to that of the pararetroviruses (see below) but differs in that the plasmid genomic RNA is nonredundant.

Retrointrons

Yet another unusual form of RT was originally discovered in fungal mitochondria in the form of group II intron-encoded proteins. Group II introns are one of two major classes of self-splicing introns that are primarily but not exclusively limited to prokaryotic and organellar genomes. Group II introns are thought to be the precursors of nuclear pre-mRNA introns because they use the same lariat intermediate in splicing. Furthermore, group II introns have a highly conserved secondary structure; regions of this structure are functionally analogous to and may have given rise to at least some of the small nuclear RNAs that represent the core machinery of the spliceosome that carries out nuclear pre-mRNA splicing (Michel and Ferat 1995).

Certain of the group II introns of fungal mitochondria contain long ORFs. An unexpected homology between the products of group II intron ORFs and RTs was noted by Michel and Lang (1985). Since then, similar sequences have been found in group II intron ORFs from various bacteria (Ferat and Michel 1993). Direct evidence for an RT activity corresponding to group II introns came from yeast mitochondria (Kennell et al. 1993) and from an overexpressed Podospora anserina mitochondrial group II intron (Fassbender et al. 1994). A yeast mitochondrial RT activity is encoded by the second group II intron (aI2) of the cytochrome oxidase subunit I (COX1) gene (Moran et al. 1995); this RT can preferentially use RNA transcripts containing the intron as templates for reverse transcription (Kennell et al. 1993).

The mobile self-splicing introns can be divided into two groups with different mobility mechanisms. Mobile group I introns encode a site-specific DNA endonuclease that cleaves intronless alleles and mediates intron mobility by a mechanism that does not involve reverse transcription (Lambowitz and Belfort 1993). In contrast, group II intron mobility involves an intron-encoded RT and proceeds via an RNA intermediate (see below) (Moran et al. 1995). The group II introns aI1 and aI2 (adjacent introns just 36 bp apart in the same mitochondrial gene) are mobile genetic elements. In crosses between donor strains containing the introns and recipient strains lacking the introns, virtually all of the recipient alleles are converted to recombinant forms that contain both introns (Meunier et al. 1990; Lazowska et al. 1994; Moran et al. 1995). This process is often referred to as “intron homing.”

In every group II intron examined, transcription of the retrointron RT-like ORF is initiated from the promoter of the structural gene that harbors the intron. A hybrid pre-mRNA results, containing an in-frame fusion between the spliced upstream exons of the structural gene and the intron ORF (Perlman 1990). Translation of the pre-mRNA produces a fusion protein; the amino-terminal amino acids are encoded by the structural gene, whereas the majority of the protein (including the RT domain) is encoded by the intron (Fig. 11B). As a consequence of being translated from an intron-containing pre-mRNA, the level of retrointron protein is tightly regulated by splicing. Consistent with this notion, cis-dominant splicing mutations in the yeast mitochondrial introns aI1 or aI2 result in about a tenfold overproduction of the intron-encoded protein. It remains to be seen whether the overproduction of the retrointron protein actually results in an increase in RT activity in these mutants. A complication in interpreting the exact role of the intron-encoded enzymes is that these proteins may also have a “maturase” activity required for efficient splicing of their own intron (Carignani et al. 1983; Moran et al. 1994).

Another notable feature of the homing reaction of aI1 and aI2 is the unusual inheritance pattern of sequences adjacent to the introns. In crosses in which there are polymorphisms between the donor and recipient mitochondrial DNAs at these sites, intron homing is accompanied by flanking marker coconversion (Lazowska et al. 1994; Moran et al. 1995). This result means that not just the intron but also some of the nucleic acid sequences flanking it are transferred from donor to recipient during homing.

An additional transaction that certain group II introns can apparently undergo is referred to as “intron transposition”; this differs from intron homing in that the group II intron inserts itself into an ectopic site (Mueller et al. 1993; Sellem et al. 1993).

One proposed mechanism that could explain retrointron homing and transposition is reverse splicing of the excised retrointron lariat into an intronless site on RNA (or even DNA). Reverse transcription of the reinserted intron sequences could result in homing (if the intron inserted into its native location) or transposition of the intron. Group II introns can reverse splice in vitro (albeit inefficiently) into both RNA (Augustin et al. 1990; Mörl and Schmelzer 1990) and DNA (Mörl et al. 1992). However, a mechanism consisting of reverse splicing into pre-mRNA, followed by reverse transcription, cannot account for the asymmetric flanking marker coconversion that accompanies aI1 and aI2 homing (Lazowska et al. 1994; Moran et al. 1995). Another odd result was that mutations in the YXDD box of the RT do not completely eliminate homing, suggesting the existence of a backup pathway for homing that is RT-independent.

Recent results indicate that aI2 homing occurs by a mechanism resembling the R2Bm site-specific nicking/reverse transcription mechanism, neatly explaining the features of intron homing, including the unusual inheritance features (Zimmerly et al. 1995a). Previous work had indicated that mitochondrial ribonucleoprotein preparations contain an RT activity dependent on the presence of aI2; in appropriate strains, these ribonucleoprotein preparations also contain unspliced aI2-containing pre-mRNAs (Kennell et al. 1993). These preparations also contain a site-specific activity that precisely cleaves the homing site on one strand and 10 bp downstream from the homing site on the other strand (Fig. 11B). This cleavage results in the incorporation of labeled dNTPs at the 3′ ends of the cleaved DNA, and the newly synthesized DNA is derived directly from the aI2 sequence. The DNA was found to be joined to the homing site appropriately, indicating that homing had been at least partially reconstituted in vitro (Zimmerly et al. 1995a). Even more unexpectedly, aI2 intron lariat RNA, which is also present in these preparations, appears to have a direct biochemical role in the DNA cleavage reactions. Direct joining of aI2 intron lariats to the sense-strand cleavage product accompanies the sense-strand cleavage reaction, providing direct evidence that this step of the reaction is indeed accompanied by at least a partial reverse splicing reaction into target DNA (Fig. 11B) (Zimmerly et al. 1995b). Cleavage of the antisense strand (10 bp downstream from the homing site) requires “zinc domain” protein sequences distal to the RT domain; these are believed to encode an endonuclease function because they contain sequence motifs shared by other endonucleases (Gorbalenya 1994; Shub et al. 1994). Extension of the antisense strand by RT, using a pre-aI2 RNA template, would result in the synthesis of a product that would extend into sequences upstream of aI2. The mechanism by which these become joined to the target DNA is not yet clear, but it is presumed to be via general homologous recombination. The RT-independent backup pathway is presumably mediated by cleavage of the homing site DNA by the endonuclease, followed by gene conversion. Consistent with this, mutations that eliminate endonuclease activity eliminate homing.
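The geometry of the two cleavages described above (the sense strand cut at the homing site, the antisense strand cut 10 bp downstream) can be made concrete with a small coordinate sketch. This is a toy illustration only: the target sequence, the `StaggeredCut` type, and the function name `staggered_cut` are all hypothetical, and positions are expressed in sense-strand coordinates purely for convenience.

```python
# Toy coordinate model (not from the source) of the staggered cleavage of an
# intronless target: sense strand at the homing site, antisense strand 10 bp
# downstream of it.
from dataclasses import dataclass

ANTISENSE_OFFSET_BP = 10  # offset stated in the text


@dataclass(frozen=True)
class StaggeredCut:
    sense_cut: int      # cut lies immediately before this sense-strand index
    antisense_cut: int  # same convention, in sense-strand coordinates
    between_cuts: str   # sense-strand bases lying between the two cuts


def staggered_cut(target_sense: str, homing_site: int,
                  offset: int = ANTISENSE_OFFSET_BP) -> StaggeredCut:
    """Locate both cuts on a target written 5'->3' on the sense strand.

    `homing_site` is the number of bases upstream of the insertion point.
    """
    antisense_cut = homing_site + offset
    if homing_site < 0 or antisense_cut > len(target_sense):
        raise ValueError("cut positions fall outside the target sequence")
    return StaggeredCut(
        sense_cut=homing_site,
        antisense_cut=antisense_cut,
        between_cuts=target_sense[homing_site:antisense_cut],
    )


if __name__ == "__main__":
    upstream_exon = "ACGTACGTAC"         # hypothetical sequence
    downstream_exon = "TTGACCGGTAAGCTT"  # hypothetical sequence
    cut = staggered_cut(upstream_exon + downstream_exon, len(upstream_exon))
    print(cut)
```

In this picture the antisense 3′ end created at `antisense_cut` points back toward the homing site, so extension from it would first copy the bases in `between_cuts`, then the intron, and then sequences upstream of it, which is in line with the flanking marker coconversion discussed in the text.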

The Prokaryotic Retrons

General Features

The prokaryotic retrons are among the most unusual members of the retro-bestiary, both because of their bizarre structure and because they do not appear to encode any means for movement within or among host genomes. Because they do not appear to confer any selective advantage on their host, it is difficult to understand how they have persisted. Nevertheless, they are relatively pervasive, at least among some bacterial lineages. Found in all Myxococcus species, msDNAs are also observed in a significant fraction of E. coli isolates, as well as in occasional other genera of gram-negative bacteria such as Rhizobium (Rice et al. 1993). The retrons encode a bizarre nucleic acid species called msDNA. This nucleic acid is synthesized by a complex reverse transcription process that is quite different from the familiar retroviral process. There are several different types of msDNAs. Myxococcus has two known types, whereas E. coli can have one of at least five different types. msDNAs are named with a two-letter code for the host species followed by a number representing the DNA length in nucleotides (e.g., msDNAEc-67). These unusual entities have been the subject of extensive recent reviews (Inouye and Inouye 1991; Lampson et al. 1991; Inouye and Inouye 1993) and thus discussion here will be limited.
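The naming convention just described (a two-letter host code followed by the DNA length in nucleotides) is regular enough to parse mechanically. The following is a minimal sketch, assuming names are written with an `msDNA` prefix and optional hyphens around the host code; the function name `parse_msdna_name` and the tolerated spellings are assumptions made here, not an established standard.

```python
# Minimal sketch (assumed spellings): split an msDNA name into its two-letter
# host code and its DNA length in nucleotides.
import re
from typing import Tuple

# "msDNA", an optional hyphen, a two-letter host code (capital + lowercase),
# another optional hyphen, then the DNA length as digits.
_MSDNA_NAME = re.compile(r"msDNA-?([A-Z][a-z])-?(\d+)")


def parse_msdna_name(name: str) -> Tuple[str, int]:
    """Return (host_code, dna_length_nt) for a name such as 'msDNAEc-67'."""
    match = _MSDNA_NAME.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not a recognizable msDNA name: {name!r}")
    host_code, length = match.groups()
    return host_code, int(length)


if __name__ == "__main__":
    # The example name given in the text.
    print(parse_msdna_name("msDNAEc-67"))
```

The hyphen placement is deliberately tolerant because the same element may be written with the hyphen before or after the host code.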

Structure and Biosynthesis of msDNA

The msDNAs consist of RNA and DNA moieties linked via a 2′-5′ phosphodiester bond (Fig. 11A). Both moieties are encoded in the same segment of genomic DNA; just downstream from this segment is an RT ORF required for the synthesis of the corresponding msDNA. The unit of DNA containing the genetic information sufficient to make msDNA is referred to as a retron. The RT-based biosynthetic pathway of msDNA has been established through many different lines of evidence. A long transcript (pre-msRNA) is first made by host RNA polymerase and then processed by host enzymes to a shorter form. This form of msRNA is able to fold into a conformation recognized by the appropriate RT; it is an unusual primer-template complex in which a single RNA is both primer and template. In the msRNAs, an internal guanosine residue serves as the priming nucleotide, with the 2′-OH serving as the priming functional group rather than the usual 3′-OH. Presumably, the RT forms the first 2′-5′ phosphodiester bond of msDNA during the priming reaction. The best evidence for this point is that the biosynthesis of msDNA has been carried out in vitro using purified RT and pre-msRNA (Hsu et al. 1992; Shimamoto et al. 1995).

Retronphages

As a variety of msDNAs became known from E. coli strains and other sources, it became obvious that many retrons are part of much larger mobile elements that strongly resemble cryptic prophages. This was dramatically demonstrated when transmission of one E. coli retron, residing on a P4-like prophage, was mediated experimentally by superinfection of the strain with P4 and transduction of the msDNA (and its cryptic prophage) to E. coli strains lacking this “retronphage” (Inouye et al. 1991). The msDNA is not required for this type of gene transfer to occur, but rather temperate bacteriophage gene products (int, etc.) apparently mediate the process via direct DNA transactions. The direct mobility of msDNA itself has never been demonstrated, nor has msDNA ever been shown to have any effect, positive or negative, on host-cell phenotype. How it has so successfully maintained itself within certain bacterial lineages and why it should generate the unusual branched msDNA molecule remain quite mysterious.

Animal Pararetroviruses: Hepadnaviruses

Structure

The next two sections describe the pararetroviruses—a group of viruses that use a variation on the retroviral life cycle in which DNA, rather than RNA, is found in the virion; the reverse transcription process takes place in the infected cell prior to virion release (Summers and Mason 1982). The elements have so far been found in mammals, birds, and plants. Spumaviruses have recently been shown to contain DNA in the virion as well (Yu et al. 1996). However, spumaviruses have linear genomes and clearly belong taxonomically to the Retroviridae, whereas pararetroviruses have circular DNA genomes. The important point is that the type of nucleic acid that can be packaged into virions can be either RNA or DNA in viral retroelements.

Hepadnaviruses are known only from a small number of mammals and birds. Viral infection leads to hepatitis and is also associated with a high rate of liver cancer in humans and animals (Mason and Taylor 1989). The infectious virions consist of a core particle composed of the C (core) protein, the viral gapped circular DNA and the viral RT, or P (polymerase) protein, and it is surrounded by an envelope containing the surface proteins. In the presence of dNTPs, virion cores extend the short plus strands to fill in the gap, effecting the first step in the replication process.

Life Cycle

The incoming infectious viral particle contains double-stranded genomic DNA bearing a large gap of variable length in the plus strand. This DNA is also unusual in that the minus strand is covalently linked to the RT protein (P protein) at its 5′ end. The replication process of hepadnaviruses is convoluted and only partially understood (Loeb and Ganem 1993). Shortly after infection, the input molecule becomes converted to the covalently closed circle found in the nucleus of the infected hepatocyte. Transcription of this DNA by host RNA polymerase results in the formation of a number of transcripts including a greater than unit length, terminally redundant polyadenylated RNA referred to as pregenome RNA, which is then packaged into new viral core particles in the cytoplasm. The R region of pregenome RNA performs a function analogous to that of the retroviral R region in that it allows for strand transfer. However, the effective length of R is only about four nucleotides, as only this much of R is reverse-transcribed into the equivalent of strong stop DNA. Minus-strand and partial plus-strand DNA synthesis is carried out by the hepadnavirus RT to form the new genomic gapped circles.

One of the most unusual aspects of this reverse transcription is that the priming of minus-strand synthesis is carried out by the P (polymerase) protein itself rather than by a nucleic acid (Wang and Seeger 1992; Tavis and Ganem 1993; Lanford et al. 1995). The priming occurs at a tyrosine residue in the amino terminus of the RT (P) protein (Weber et al. 1994; Zoulim and Seeger 1994). This priming reaction is complex, occurring in two steps. The initial priming event occurs at a stem-loop structure in an RNA sequence (ε) that is also required for RNA encapsidation (see Fig. 6C). The first four nucleotides synthesized within ε are complementary to a sequence that is repeated at the 5′ boundary of the 3′ R region in the RNA. The latter copy of the sequence is sometimes called τ and coincidentally comprises part of the DR1 sequence (see below) that is also important in plus-strand priming. In the minus-strand priming reaction, the first four nucleotides are polymerized using the viral packaging signal (ε) at the 5′ end of pregenomic RNA as a template; this short DNA product, still covalently bound to RT protein, is then transferred (by an unknown mechanism) to the τ sequence near the 3′ end of the RNA and is elongated from there toward the 5′ end of pregenome RNA (Wang and Seeger 1993). The function of this elaborate process is not clear, but it ties together the processes of RNA encapsidation and initiation of reverse transcription. It is interesting to note that several other viruses that use protein priming (adenovirus, bacteriophage ϕ29) carry out a similar two-step priming process, possibly as a “proofreading” step to ensure accurate initiation by a relatively sloppy protein-priming mechanism (Esteban et al. 1993).
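The two-step priming described above can be sketched as a toy simulation. Everything specific in it is an assumption made for illustration: the pregenome sequence is hypothetical, the names `copy_rna_to_dna` and `two_step_priming` are invented here, and picking the 3′-most copy of the four-nucleotide sequence as τ is a simplification, since the text notes that the actual transfer mechanism is unknown.

```python
# Toy simulation (not from the source) of two-step minus-strand priming:
# (1) a 4-nt DNA is copied from a template within epsilon near the 5' end;
# (2) it is moved to an identical sequence (tau) near the 3' end and extended
#     toward the 5' end of the pregenome RNA.
from typing import Tuple

DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def copy_rna_to_dna(rna_segment: str) -> str:
    """Return the DNA copy (written 5'->3') of an RNA segment (written 5'->3')."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(rna_segment))


def two_step_priming(pregenome: str, epsilon_start: int,
                     primer_len: int = 4) -> Tuple[str, int, str]:
    """Return (primer_dna, tau_start, minus_strand_dna) for a toy pregenome."""
    epsilon_template = pregenome[epsilon_start:epsilon_start + primer_len]
    if len(epsilon_template) < primer_len:
        raise ValueError("epsilon template runs past the end of the RNA")
    primer = copy_rna_to_dna(epsilon_template)  # step 1: priming on epsilon

    # Step 2 (simplified): take the 3'-most identical sequence as tau.
    tau_start = pregenome.rfind(epsilon_template)
    if tau_start <= epsilon_start:
        raise ValueError("no downstream copy of the template sequence found")

    # Elongation from tau toward the 5' end of the RNA; the transferred
    # primer forms the 5' end of the minus-strand DNA.
    minus_strand = copy_rna_to_dna(pregenome[:tau_start + primer_len])
    return primer, tau_start, minus_strand


if __name__ == "__main__":
    toy_pregenome = "GCUUCAGGCCAAUUGGUUCACCUAG"  # hypothetical sequence
    primer, tau_start, minus_strand = two_step_priming(toy_pregenome, 2)
    print(primer, tau_start, minus_strand)
```

Because the τ copy is identical to the ε template, the transferred primer is fully complementary to its new site, and the finished minus strand begins with the same four nucleotides that were synthesized on ε.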

The plus-strand priming reaction appears to be carried out by a more conventional RNA priming mechanism, utilizing a capped oligoribonucleotide derived from the 5′ end of pregenome RNA, probably generated by RNase H cleavage. In this way, it somewhat resembles retroviral plus-strand priming, although a polypurine tract sequence does not define the RNase H cleavage site. This priming reaction occurs at DR2, a sequence identical to DR1 but located upstream of it. The RNA primer, however, is derived from the 5′ copy of DR1 and not from DR2 as would have been predicted based on a retrovirus-like, plus-strand priming mechanism. Therefore, the RNA primer must be translocated from the 5′ DR1 to DR2, to explain the requirement for these sequences in cis for replication. The region of complementarity allowing plus-strand transfer is called R and is defined as the sequences on the minus-strand DNA corresponding to sequences between the 5′ end of minus-strand DNA and the 5′ end of pregenomic RNA. Extension of the plus strand beyond the protein-linked 5′ end of the template presumably involves a template transfer step in order to allow utilization of this 5′ R region and circle formation. Once this template transfer has occurred, plus-strand reverse transcription proceeds for a variable distance, leaving a gap on the plus strand in most hepadnavirus types. The reason for this plus-strand gap is unclear, but it would appear to result from exhaustion of the dNTP pool or some other event associated with maturation of hepadnavirions; isolated hepadnavirion cores can readily carry out an endogenous reaction using the internal gapped circle as a primer/template and exogenously supplied dNTPs.

Plant Pararetroviruses: Caulimoviruses

Structure

The caulimoviruses resemble the hepadnaviruses in containing a double-stranded circular DNA genome with interruptions on both plus and minus strands. However, there are no large gaps, nor is there any protein covalently attached to the genome. The virions are produced in the form of large inclusion bodies, can be mechanically transmitted from one plant to another, and are transmitted by aphids or other insect vectors in nature. The gene structure of caulimoviruses is much more complex than that of retroviruses or hepadnaviruses. Most caulimoviruses have seven to eight ORFs. Many of these ORFs are important for inclusion body formation and aphid transmission and are not discussed further here. In addition to the “conventional” icosahedral caulimoviruses, a new class of nonicosahedral caulimoviruses called badnaviruses (bacilliform DNA viruses) has recently been characterized. For further details of caulimoviruses, see reviews by Bonneville et al. (1988) and Gordon (1990).

Life Cycle

Once injected into a plant cell either by a researcher or an aphid, caulimoviral particles are uncoated and the virion DNA becomes converted to supercoiled DNA circles, which are found in the nucleus; as in the hepadnaviruses, it is presumed that incoming viral DNA is “repaired” by unspecified processes to a supercoiled circle. This then provides the template for transcription, which results in multiple transcripts, including a terminally redundant polyadenylated 35S transcript. The details of gene expression in these viruses are very complex in that the mRNAs are polycistronic, they can contain very long leader sequences, and there is evidence for posttranscriptional regulation of expression (for more details, see Fütterer and Hohn 1991). The 35S RNA is presumably encapsidated together with the tRNAiMet primer used by all caulimoviruses and badnaviruses, and there the process of reverse transcription begins.

The overall process of reverse transcription is much more retrovirus-like and (thus far) somewhat less convoluted in caulimoviruses than in hepadnaviruses (Fig. 6D). The initial step in reverse transcription is tRNAiMet priming at a primer-binding site downstream from the 5′ R sequence. This results in a minus-strand strong-stop-like molecule (called α1) that is presumably transferred to the 3′ R repeat via an RNase H-mediated process. Extension of the minus strand and degradation of the RNA by RNase H would produce a terminally redundant minus-strand DNA (the terminal redundancy consisting of the primer-binding site sequence). Plus-strand priming at two polypurine tracts would result in the synthesis of two segments of plus-strand DNA; limited displacement synthesis would explain the overlapping strand discontinuities found in the plus strand. Circularization of the genome should occur during plus-strand synthesis and must occur within the primer-binding site sequence (for further details of the replication cycle, see Bonneville and Hohn 1993).

Copyright © 1997, Cold Spring Harbor Laboratory Press.
Bookshelf ID: NBK19412
