NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Coffin JM, Hughes SH, Varmus HE, editors. Retroviruses. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; 1997.
Until DNA cloning and sequencing were developed in the late 1970s, inferences about the organization of the retroviral genome were hard won. Because of the very high rate of recombination, typically yielding more than one crossover per recombinant genome, classical genetic experiments were difficult to interpret. Even the size of the genome remained uncertain until the application of RNA fingerprinting in the mid-1970s (Beemon et al. 1974; Billeter et al. 1974). Determination of the location of viral genes on the genomic RNA was first based on the identification of specific oligonucleotides on the viral RNA, localized by the selection of RNA fragments containing the poly(A) tail at the 3′ end of the genome (Wang et al. 1975; Coffin and Billeter 1976). Restriction endonuclease mapping and Southern blot hybridization were subsequently used to obtain high-resolution maps of the structure of viral genomes and their DNA intermediates. These approaches were superseded by DNA sequencing coupled with limited amino-terminal sequence analysis of viral proteins, which allowed precise definitions of coding sequences on the genome. The first complete retroviral genomes to be defined in such detail were the Moloney strain of murine leukemia virus (Mo-MLV; Shinnick et al. 1981), the prototype of the mammalian C-type genus, and the Prague strain of Rous sarcoma virus (Pr-RSV; Schwartz et al. 1983), the prototype of the ASLV genus. Detailed structural information derived from sequenced prototypes of all retroviral genera is presented in Appendix 2.
Retroviruses carry their genome as RNA, and because they replicate via a DNA intermediate, they must also carry the sequence for the promoter that will drive the expression of the genomic RNA. In the RNA genome, this sequence is located near the 3′ end. The promoter is juxtaposed upstream of the coding sequences only after reverse transcription creates a DNA copy (Chapter 4), as first shown in studies of the avian viruses (Hughes et al. 1978; Shank et al. 1978). For many purposes, it is conceptually simpler to discuss retroviral genome organization in terms of the DNA as it is integrated into cellular DNA (i.e., the provirus), because this places the promoter, the RNA start site, and the polyadenylation site in the same positions as they are found in typical host genes. Working out the relationship between the sequence organization of the viral RNA and DNA was an important milestone in understanding retroviral replication. A generalized provirus (Fig. 4) can be divided into genes proper and sequences that have other roles. By convention, the genes are given three-letter names written in lowercase and italics (Baltimore 1975). The product encoded by each gene is given the same name, but in normal type with the first letter capitalized. The proteins that are formed by proteolytic processing of the primary translational products are designated by two-letter names in cases where a function is known or by a “p” followed by a number for the size of the polypeptide if function has not been established (Leis et al. 1988). This nomenclature supersedes an earlier convention, in which all proteins were named by their apparent molecular weight, with prefixes “p,” “pp,” “gp,” and “Pr” standing for protein, phosphoprotein, glycoprotein, and precursor, respectively (August et al. 1974).
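The naming conventions just described can be made concrete with a minimal Python sketch. It is only an illustration: the function names `product_name` and `parse_size_based_name` are hypothetical, the sketch covers only three-letter gene names and the four prefixes listed above, and the sample size-based names in the usage section are illustrative inputs rather than an authoritative list.

```python
import re
from typing import Tuple

# Prefixes of the earlier size-based convention (August et al. 1974).
OLD_PREFIXES = {
    "p": "protein",
    "pp": "phosphoprotein",
    "gp": "glycoprotein",
    "Pr": "precursor",
}


def product_name(gene: str) -> str:
    """Map a three-letter lowercase gene name to its product name (gag -> Gag)."""
    if not re.fullmatch(r"[a-z]{3}", gene):
        raise ValueError(f"not a three-letter lowercase gene name: {gene!r}")
    return gene.capitalize()


def parse_size_based_name(name: str) -> Tuple[str, int]:
    """Split a size-based name into (protein class, number after the prefix)."""
    # Longer prefixes are listed first so that "pp60" is not read as "p" + "p60".
    match = re.fullmatch(r"(Pr|pp|gp|p)(\d+)", name)
    if match is None:
        raise ValueError(f"not a size-based protein name: {name!r}")
    prefix, number = match.groups()
    return OLD_PREFIXES[prefix], int(number)


if __name__ == "__main__":
    for gene in ("gag", "pro", "pol", "env"):
        print(gene, "->", product_name(gene))
    # Illustrative inputs only.
    for name in ("p27", "pp60", "gp85", "Pr76"):
        print(name, "->", parse_size_based_name(name))
```

Italic type for gene names cannot be represented in a plain string, so the sketch captures only the capitalization rule.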
The viral genes gag, pro, pol, and env (Table 1) occupy the body of the DNA. Indeed, for many retroviruses, these are the only genes. gag encodes the internal structural protein of the virus (Gag protein, from the original name “group-specific antigen,” reflecting what were believed to be the antigenic properties of this protein). Gag is proteolytically processed into the mature proteins MA (matrix), CA (capsid), NC (nucleocapsid), and sometimes others, of uncertain function, that are designated by numbers. pol encodes the enzymes reverse transcriptase (RT), which contains both DNA polymerase and associated RNase H activities, and integrase (IN), which mediates integration of the viral DNA into the host genome. pro encodes the viral protease (PR), which acts late in assembly of the viral particle to proteolytically process the proteins encoded by gag, pro, and pol, and in some cases also env (Chapter 7). env encodes the surface (SU) glycoprotein and the transmembrane (TM) protein of the virion, which form a complex that interacts specifically with cellular receptor proteins. This interaction leads ultimately to fusion of the viral membrane with the cell membrane (Chapter 3). A few retroviral groups also contain another gene, called dut, which encodes a deoxyuridine triphosphatase (dUTPase or DU). Unlike all other genes, dut is found in differing locations, being translated in the pro reading frame in the B- and D-type viruses, and in the pol frame in the nonprimate lentiviruses.
In addition to gag, pro, pol, and env, those retroviruses classified as complex, namely the lentivirus genus, the spumavirus genus, the HTLV/bovine leukemia virus (BLV) genus, and a newly characterized fish virus genus, also contain “accessory” genes. Accessory genes regulate and coordinate viral gene expression, and some also have other ancillary roles. These genes are located between pol and env, just downstream from env (including the U3 region of the LTR), or overlapping portions of env and each other.
Some retroviruses carry genes of a different class: the oncogenes, or onc genes. Many retroviruses—hereafter referred to as “transforming” viruses—were first identified by their ability to rapidly cause tumors in animals and oncogenically transform cells in culture. Transformation of cultured cells invariably was traced to host-derived sequences that the virus had acquired (Chapter 10). With the exception of some strains of RSV, retroviruses that carry oncogenes are defective, having suffered variable deletions of one or more of the viral genes needed for replication during or after the acquisition event. As a consequence, many retroviral oncogenes are expressed as Gag-Onc fusion proteins, with part or most of gag being deleted. There are also numerous examples in which the oncogene replaces env or is positioned elsewhere in the genome. Retroviruses with such rearrangements are defective for replication on their own and can replicate only if the cell is also infected with a nondefective virus, usually called a helper virus. In the nondefective strains of RSV, the v-src oncogene is freestanding downstream from env and is expressed from a separately spliced mRNA (Chapter 6). An overview of the organization of genes in a few prototypic retroviruses is presented in Figure 5.
The genes in the viral DNA are bracketed by the long terminal repeats (LTRs), identical sequences that can be divided into three elements. U3 is derived from the sequence unique to the 3′ end of the RNA, R is derived from a sequence repeated at both ends of the RNA, and U5 is derived from the sequence unique to the 5′ end of the RNA. The genesis of the LTR elements lies in the process of reverse transcription, whereby the enzyme “jumps” from one end of the template to the other (Chapter 4). The sizes of these three elements vary considerably among different retroviruses, with U3 typically ranging from several hundred nucleotides to more than a thousand nucleotides, R from a dozen to more than a hundred nucleotides, and U5 from about one to two hundred nucleotides (Table 2). From the definition of U3, R, and U5, it follows that the site of transcription initiation is at the boundary between U3 and R, and the site of poly(A) addition is at the boundary between R and U5, as shown. The other boundaries of U3 and U5 are determined by the sites of initiation of plus- and minus-strand DNA synthesis. U3 contains most of the transcriptional control elements of the provirus, which include the promoter proper and multiple enhancer sequences responsive to cellular and in some cases viral transcriptional activator proteins (Chapter 6). The exact nature of the enhancer sequences plays a critical part in determining tissue specificity of viral replication and, as a consequence, pathogenesis. Even minor sequence alterations in U3 can convert a pathogenic virus into a nonpathogenic one, or vice versa (Chapter 10).
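The relationship between the LTR elements and the ends of the genomic RNA can be checked with a small coordinate sketch in Python. It is an illustration under stated assumptions, not a model of any particular virus: the element sizes (U3 = 300, R = 20, U5 = 100) are hypothetical values chosen within the ranges quoted above, the 7000-nucleotide body is likewise hypothetical, and the names `LTRSizes`, `provirus_segments`, `transcript_bounds`, and `genomic_rna_order` are introduced only for this example. The poly(A) tail is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (name, start, end), 0-based, end exclusive


@dataclass(frozen=True)
class LTRSizes:
    u3: int
    r: int
    u5: int

    @property
    def total(self) -> int:
        return self.u3 + self.r + self.u5


def provirus_segments(ltr: LTRSizes, body_len: int) -> List[Segment]:
    """Lay out a provirus as LTR (U3-R-U5), body, LTR (U3-R-U5)."""
    layout = [
        ("U3", ltr.u3), ("R", ltr.r), ("U5", ltr.u5),
        ("body", body_len),
        ("U3", ltr.u3), ("R", ltr.r), ("U5", ltr.u5),
    ]
    segments, pos = [], 0
    for name, length in layout:
        segments.append((name, pos, pos + length))
        pos += length
    return segments


def transcript_bounds(ltr: LTRSizes, body_len: int) -> Tuple[int, int]:
    """Transcription starts at the U3/R boundary of the upstream LTR and
    poly(A) is added at the R/U5 boundary of the downstream LTR."""
    start = ltr.u3
    end = ltr.total + body_len + ltr.u3 + ltr.r
    return start, end


def genomic_rna_order(ltr: LTRSizes, body_len: int) -> List[str]:
    """Names of the provirus segments that lie entirely within the transcript."""
    start, end = transcript_bounds(ltr, body_len)
    return [name for name, s, e in provirus_segments(ltr, body_len)
            if s >= start and e <= end]


if __name__ == "__main__":
    ltr = LTRSizes(u3=300, r=20, u5=100)  # hypothetical sizes
    print(provirus_segments(ltr, body_len=7000))
    print(transcript_bounds(ltr, body_len=7000))
    print(genomic_rna_order(ltr, body_len=7000))
```

With these inputs the transcript contains R, U5, the body, U3, and R in that order, which matches the description of R as repeated at both ends of the RNA, U5 as unique to the 5′ end, and U3 as unique to the 3′ end.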
Figures
Figure 4
Figure 5
Tables
Table 1. Retroviral Genes
Gene | Properties/function of protein |
---|---|
Common to All Retroviruses | |
gag | precursor to internal structural proteins |
pro | PR enzyme |
pol | precursor to RT and IN enzymes |
env | precursor to envelope glycoproteins |
Accessory Genes | |
HTLV/BLV (e.g., HTLV-1) | |
tax | transcription activator |
rex | splicing/RNA transport regulator |
Lentiviruses (for primate lentiviruses, e.g., HIV-1) | |
tat | activates transcription |
rev | regulates splicing/RNA transport |
vif | affects infectivity of viral particles |
vpr and/or vpx | is present in virion; has nuclear localization signal; facilitates infectivity in quiescent cells |
nef | triggers CD4 endocytosis, alters signal transduction in T cells; enhances virion infectivity |
vpu | integral membrane protein; triggers CD4 degradation; enhances virion release |
dut | dUTPase (only in nonprimate lentiviruses); facilitates replication in certain cell types |
Type B (e.g., MMTV) | |
sag | superantigen |
dut | dUTPase (NC-DU fusion) |
Type D (e.g., M-PMV) | |
dut | DU enzyme (NC-DU fusion) |
Spumaviruses (e.g., HSRV) | |
bel1 | activates transcription |
bel2 | ? |
bet | ? |
Piscine retroviruses (e.g., WDSV) | |
orf A | ? |
orf B | ? |
orf C | ? |