Like the other biopolymers (proteins and nucleic acids), glycans come in a diversity of structures that underlie a vast array of biological functions. To understand these functions at a molecular level, we must first understand glycan structures at a chemical level. This chapter begins with an introduction to the building blocks, the monosaccharides, that are assembled to generate more complex glycans. After a brief summary of nomenclature, we present the salient chemical features of the monosaccharides that define their structural diversity, with an emphasis on stereochemical properties. We then illustrate how monosaccharide diversity, combined with a multiplicity of ways in which they can be linked together, can create the wealth of glycan structures found in nature. An understanding of the structural features that distinguish glycans from other biopolymers will help the reader to appreciate the origin of their biological capabilities.
INTRODUCTION TO GLYCAN TERMINOLOGY
Although we use the term glycan in this book, several names for sugar polymers are found in other textbooks and the literature. Early on, sugar-based substances were referred to as carbohydrates, a term literally derived from “hydrates of carbon.” This name was coined more than 100 years ago to describe naturally occurring substances with the general formula Cx(H2O)n that also possess a carbonyl group, either an aldehyde or a ketone. The simplest of these polyhydroxylated carbonyl compounds are called monosaccharides (saccharide is derived from the Greek sakchar, meaning sugar or sweetness).
Monosaccharides are joined together to make oligosaccharides or polysaccharides. Typically, the term oligosaccharide refers to any glycan that contains a small number (2–20) of monosaccharide residues connected by glycosidic linkages. The term polysaccharide is typically used to denote any linear or branched polymer consisting of monosaccharide residues, such as cellulose (Chapters 13 and 22). Thus, the relationship of monosaccharides to oligosaccharides or polysaccharides is analogous to that of amino acids and proteins, or nucleotides and nucleic acids.
The term glycoconjugate is often used to describe a macromolecule that contains monosaccharides covalently linked to proteins or lipids. The prefix glyco- and the suffixes -saccharide and -glycan indicate the presence of carbohydrate constituents (e.g., glycoproteins, glycolipids, and proteoglycans). Just as is observed with proteins in nature, additional structural diversity can be imparted to glycans by modifying their hydroxy groups with phosphate, sulfate, or acetyl esters.
The designation complex is given to a variety of carbohydrates. A carbohydrate may be termed complex if it contains more than one type of monosaccharide building unit. Thus, the glucose-based polymer cellulose is an example of a simple carbohydrate, whereas a galactomannan polysaccharide, which possesses both galactose and mannose, is an example of a complex carbohydrate. However, even so-called simple glycans, such as cellulose and starch, often have very complex molecular structures in three dimensions. In the description of N-glycans of glycoproteins (see Chapter 8), the term complex is used more specifically for N-glycans with multiple, extended branches, often containing N-acetyllactosamine units. Finally, the term complex carbohydrates includes glycoconjugates, whereas the term carbohydrates per se would not. Additional nomenclature issues are covered in the various sections of this chapter. A more detailed and comprehensive listing of carbohydrate nomenclature rules has been published (see the McNaught reference in Further Reading at the end of the chapter).
MONOSACCHARIDES: BASIC STRUCTURES AND STEREOISOMERISM
The classification of monosaccharide structures began in the late 19th century with the pioneering work of Emil Fischer. All simple monosaccharides have the general empirical formula Cx(H2O)n, where n is an integer ranging from 3 to 9. As mentioned briefly in Chapter 1, all monosaccharides consist of a chain of chiral hydroxymethylene units, which terminates at one end with a hydroxymethyl group and at the other with either an aldehyde group (aldoses) or an α-hydroxy ketone group (ketoses). Glyceraldehyde is the simplest aldose and dihydroxyacetone is the simplest ketose (Figure 2.1). The structures of glyceraldehyde and dihydroxyacetone are distinct in that glyceraldehyde contains an asymmetric (chiral) carbon atom (Figure 2.1), whereas dihydroxyacetone does not. With the exception of dihydroxyacetone, all monosaccharides have at least one asymmetric carbon atom, the total number being equal to the number of internal (CHOH) groups (n – 2 for aldoses and n – 3 for ketoses with n carbon atoms). The number of stereoisomers corresponds to 2^k, where k equals the number of asymmetric carbon atoms. For example, an aldohexose with the general formula C6H12O6 and four asymmetric carbon atoms, that is, four (CHOH) groups, can be described in 16 possible isomeric forms.
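The counting rule can be checked with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes an unbranched aldose or ketose with n carbon atoms, as described in the text.

```python
def asymmetric_carbons(n_carbons: int, kind: str) -> int:
    """Number of asymmetric (CHOH) carbons in an unbranched monosaccharide.

    Follows the rule in the text: n - 2 for aldoses, n - 3 for ketoses.
    """
    if n_carbons < 3:
        raise ValueError("monosaccharides have at least three carbon atoms")
    if kind == "aldose":
        return n_carbons - 2
    if kind == "ketose":
        return n_carbons - 3
    raise ValueError("kind must be 'aldose' or 'ketose'")


def stereoisomer_count(n_carbons: int, kind: str) -> int:
    """Number of possible stereoisomers: 2**k for k asymmetric carbons."""
    return 2 ** asymmetric_carbons(n_carbons, kind)


# Trioses through hexoses
for n in range(3, 7):
    print(n, "carbons:",
          "aldose", stereoisomer_count(n, "aldose"),
          "ketose", stereoisomer_count(n, "ketose"))
```

For n = 6 the aldose count is 16, matching the aldohexose example, and for the three-carbon ketose (dihydroxyacetone) the count is 1 because it has no asymmetric carbon.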
The numbering of carbon atoms follows the rules of organic chemistry nomenclature: The aldehyde carbon is referred to as C-1 and the carbonyl carbon in ketoses is referred to as C-2. The overall configuration (D or L) of each sugar is determined by the absolute configuration of the stereogenic center furthest from the carbonyl group (i.e., with the highest numbered asymmetric carbon atom; this is C-5 in hexoses and C-4 in pentoses). The configuration of a monosaccharide is most easily determined by representing the structure in a Fischer projection. If the OH (or other non-H group) is on the right in the Fischer projection, the overall configuration is D. If the OH (or other non-H group) is on the left, the overall configuration is L (Figure 2.2). This figure also shows D- and L-glucose in the cyclic form (chair conformation) found in solution. Most vertebrate monosaccharides have the D configuration, with the exception of fucose and iduronic acid, which are L sugars. The Fischer projections shown in Figure 2.3 illustrate the acyclic structures of all D-aldoses through the aldohexose group.
Any two sugars that differ only in the configuration around a single chiral carbon atom are called epimers. For example, D-mannose is the C-2 epimer of D-glucose, whereas D-galactose is the C-4 epimer of D-glucose (Figure 2.4). The names of monosaccharides are frequently abbreviated; most common are three-letter abbreviations for simple monosaccharides (e.g., Gal, Glc, Man, Xyl, Fuc). There are nine common monosaccharides found in vertebrate glycoconjugates (Figure 2.4). Once incorporated into a glycan, these nine monosaccharide building blocks can be further modified to generate additional sugar structures. For example, glucuronic acid can be epimerized at C-5 to generate iduronic acid (IdoA). A great many more monosaccharides exist in glycoconjugates from other species and as intermediates in metabolism. In this volume, we use a symbolic notation for the monosaccharides that are most abundant in vertebrate glycoconjugates (see Chapter 1, Figure 1.5).
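The D/L rule and the definition of epimers can both be expressed as simple comparisons of Fischer projections. The following Python sketch is one possible encoding, not a standard representation: each sugar is stored as a mapping from carbon number to the side ("right" or "left") on which the OH group appears in the open-chain Fischer projection, and all function names are hypothetical.

```python
# Side of the OH group at each asymmetric carbon in the Fischer projection,
# keyed by carbon number (open-chain form of three D-aldohexoses).
FISCHER = {
    "D-glucose":   {2: "right", 3: "left", 4: "right", 5: "right"},
    "D-mannose":   {2: "left",  3: "left", 4: "right", 5: "right"},
    "D-galactose": {2: "right", 3: "left", 4: "left",  5: "right"},
}


def overall_configuration(centers):
    """Return 'D' or 'L' from the highest-numbered asymmetric carbon."""
    highest = max(centers)
    return "D" if centers[highest] == "right" else "L"


def mirror_image(centers):
    """Flip every center, giving the enantiomer (D-glucose -> L-glucose)."""
    flip = {"right": "left", "left": "right"}
    return {carbon: flip[side] for carbon, side in centers.items()}


def epimeric_carbon(a, b):
    """Return the carbon at which two sugars differ if they are epimers, else None."""
    if a.keys() != b.keys():
        return None
    differing = [carbon for carbon in sorted(a) if a[carbon] != b[carbon]]
    return differing[0] if len(differing) == 1 else None


glc, man, gal = (FISCHER[name] for name in ("D-glucose", "D-mannose", "D-galactose"))
print(overall_configuration(glc))                # D
print(overall_configuration(mirror_image(glc)))  # L
print(epimeric_carbon(glc, man))                 # 2
print(epimeric_carbon(glc, gal))                 # 4
print(epimeric_carbon(man, gal))                 # None (they differ at C-2 and C-4)
```

Note that D-mannose and D-galactose are not epimers of each other under this definition, because they differ at two centers.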
MONOSACCHARIDES EXIST PRIMARILY IN CYCLIC FORM
Monosaccharides exist in solution as an equilibrium mixture of acyclic and cyclic forms. The percentage of each form depends on the sugar structure. The cyclic form of a monosaccharide is characterized by a hemiacetal group formed by the reaction of one of the hydroxy groups with the C-1 aldehyde (in aldoses) or the C-2 ketone (in ketoses). The general reaction that cyclizes the monosaccharide is shown in Figure 2.5. For reasons of chemical stability, five- and six-membered rings are most commonly formed from acyclic monosaccharides. Generally, aldohexoses form six-membered rings via a C-1—O—C-5 ring closure, although they can also form five-membered rings via a C-1—O—C-4 ring closure; ketohexoses form five-membered rings via a C-2—O—C-5 ring closure (Figure 2.6). Because of the structural similarity to the organic compounds furan and pyran, a five-membered cyclic hemiacetal is labeled a furanose and a six-membered cyclic hemiacetal is called a pyranose.
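The ring size that results from a given ring closure follows from counting atoms: the ring contains every carbon from the carbonyl carbon through the carbon bearing the attacking hydroxy group, plus the bridging oxygen. A minimal Python sketch of this bookkeeping follows; the function names are hypothetical and only unbranched sugars are assumed.

```python
RING_NAMES = {5: "furanose", 6: "pyranose"}


def ring_size(carbonyl_carbon: int, hydroxy_carbon: int) -> int:
    """Number of ring atoms formed when the OH on hydroxy_carbon adds to the carbonyl.

    The ring holds the carbons carbonyl_carbon..hydroxy_carbon plus one oxygen.
    """
    if hydroxy_carbon <= carbonyl_carbon:
        raise ValueError("the hydroxy group must be on a higher-numbered carbon")
    return (hydroxy_carbon - carbonyl_carbon + 1) + 1


def ring_name(carbonyl_carbon: int, hydroxy_carbon: int) -> str:
    """Name the cyclic hemiacetal, or 'other' for uncommon ring sizes."""
    return RING_NAMES.get(ring_size(carbonyl_carbon, hydroxy_carbon), "other")


print(ring_name(1, 5))  # aldohexose, C-1/C-5 closure: pyranose
print(ring_name(1, 4))  # aldohexose, C-1/C-4 closure: furanose
print(ring_name(2, 5))  # ketohexose, C-2/C-5 closure: furanose
```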
Monosaccharides can also be represented as Haworth projections in which both five- and six-membered cyclic structures are depicted as planar ring systems, with the hydroxy groups oriented either above or below the plane of the ring (Figure 2.7). Although not truly representative of the three-dimensional structure of a monosaccharide, the Haworth representation has been used since the late 1920s as an easy-to-draw formula that permits a quick evaluation of stereochemistry around the monosaccharide ring. The Haworth representations are preferably drawn with the ring oxygen atom at the top (for furanose) or the top right-hand corner (for pyranose) of the structure; the numbering of the ring carbons increases in a clockwise direction.
For any D sugar, the conversion of a Fischer projection into a Haworth projection proceeds as follows: (1) Any groups (atoms) that are directed to the right in the Fischer structure are given a downward orientation in the Haworth structure, (2) any groups (atoms) that are directed to the left in the Fischer structure are given an upward orientation in the Haworth structure, and (3) the terminal —CH2OH group is given an upward orientation in the Haworth structure. For an L sugar, (1) and (2) are the same, but the terminal —CH2OH group is projected downward.
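The three conversion rules can be written as a small lookup. The Python sketch below is an illustration under stated assumptions: the input lists only the ring carbons whose OH orientation is to be converted (C-2 to C-4 for a glucopyranose), the anomeric carbon is left out, and the function name and data layout are hypothetical.

```python
def fischer_to_haworth(oh_sides, series):
    """Convert Fischer sides ('right'/'left') to Haworth orientations ('down'/'up').

    oh_sides maps ring-carbon numbers to the side of the OH group in the
    Fischer projection; series is 'D' or 'L'.
    """
    if series not in ("D", "L"):
        raise ValueError("series must be 'D' or 'L'")
    rule = {"right": "down", "left": "up"}  # rules (1) and (2)
    orientation = {carbon: rule[side] for carbon, side in sorted(oh_sides.items())}
    # Rule (3): terminal CH2OH is up for D sugars and down for L sugars.
    orientation["CH2OH"] = "up" if series == "D" else "down"
    return orientation


# D-glucose: OH on the right at C-2, left at C-3, right at C-4
print(fischer_to_haworth({2: "right", 3: "left", 4: "right"}, "D"))
# L-glucose is the mirror image: left at C-2, right at C-3, left at C-4
print(fischer_to_haworth({2: "left", 3: "right", 4: "left"}, "L"))
```

Every orientation in the L-glucose result is the opposite of the D-glucose result, as expected for mirror-image molecules drawn with the same ring numbering.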
The planar Haworth structures are distorted representations of the actual molecules. The preferred conformation of a pyranose ring is the chair conformation, similar to the structure of cyclohexane. The conversion from Haworth projection to chair conformation leaves the downward or upward orientation of ring substituents unaltered. Two chair conformations can be distinguished and designated as ⁴C₁ and ¹C₄, respectively (Figure 2.8a), and these conformers can interconvert by a process called the “ring flip.” The first numeral in the chair conformer designation (superscript) indicates the number of the ring carbon atom above the “seat of the chair (C)” and the second numeral (subscript) indicates the number of the ring carbon atom below the plane of the seat (spanned by C-2, C-3, C-5, and the ring O). Chair conformations are designated from structures with the ring oxygen atom in the top right-hand corner of the ring “seat,” resulting in the clockwise appearance of the ring numbering. To determine the stereochemistry in the chair form as it corresponds to the Fischer projection, one can locate C-6 and then trace along the carbon skeleton of the sugar, bisecting the C—O and C—H bonds formed from each atom. The OH (or OR) and H groups are found on the right (R) or left (L) sides, just as in the Fischer projection (Figure 2.9).
At present, the more structurally accurate chair representations are preferred to Haworth projections for depicting pyranoses. However, Haworth projections are still commonly used for depicting furanoses. The furanose ring is rather flexible and not entirely flat in any of its energetically favored conformations; for example, it has a slight pucker when viewed from the side, as seen in the representations of the so-called envelope and twist (or skew) conformations (Figure 2.8b). Because furanoses can adopt many low-energy conformations, researchers have adopted the Haworth projection as a simple means to avoid this complexity.
CHEMISTRY AT THE ANOMERIC CENTER
Mutarotation
When cyclized into rings, monosaccharides acquire an additional asymmetric center derived from the carbonyl carbon atom (Figure 2.6). The new asymmetric center is termed the anomeric carbon (i.e., C-1 in the ring form of glucose). Two stereoisomers are formed by the cyclization reaction because the anomeric hydroxy group can assume two possible orientations. When, in the Fischer projection, the anomeric hydroxy group lies on the same side as the oxygen attached to the stereogenic center furthest from the anomeric carbon, the monosaccharide is defined as the α anomer. When the two lie on opposite sides, the monosaccharide is defined as the β anomer (Figure 2.10). Unlike the other stereocenters on the monosaccharide ring, which are configurationally stable, the anomeric center can undergo an interconversion of stereoisomers via the process of mutarotation. Catalyzed by dilute acid or base, the reaction proceeds by the reverse of the cyclization reaction. The monosaccharide ring opens up and then recloses to form a ring with the other anomeric configuration (Figure 2.6). The term mutarotation derives from the rapid change in optical rotation (denoted [α]D) that is observed when an anomerically pure form of a monosaccharide is dissolved in water. For example, β-D-glucopyranose shows an initial rotation of +19°, whereas the α anomer shows an initial rotation of +112°. When either anomer is allowed to undergo the mutarotation reaction, an equilibrium mixture containing both anomers is obtained, producing a rotation of +52.5°.
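The three rotation values quoted for D-glucose are enough to estimate the composition of the equilibrium mixture. The Python sketch below assumes that the observed rotation is a mole-fraction-weighted average of the two pure anomers and that only the α and β pyranose forms contribute; both are simplifying assumptions, and the function name is hypothetical.

```python
def anomer_fractions(rot_alpha: float, rot_beta: float, rot_equilibrium: float):
    """Estimate (alpha, beta) mole fractions from specific rotations in degrees.

    Solves x * rot_alpha + (1 - x) * rot_beta = rot_equilibrium for x.
    """
    if rot_alpha == rot_beta:
        raise ValueError("the two anomers must have different rotations")
    frac_alpha = (rot_equilibrium - rot_beta) / (rot_alpha - rot_beta)
    if not 0.0 <= frac_alpha <= 1.0:
        raise ValueError("equilibrium rotation lies outside the anomer range")
    return frac_alpha, 1.0 - frac_alpha


alpha, beta = anomer_fractions(rot_alpha=112.0, rot_beta=19.0, rot_equilibrium=52.5)
print(f"alpha: {alpha:.1%}, beta: {beta:.1%}")  # alpha: 36.0%, beta: 64.0%
```

Under these assumptions the equilibrium mixture is roughly one-third α anomer and two-thirds β anomer.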
Oxidation and Reduction
Generally, the acyclic (aldehyde or ketone) form of a monosaccharide is only present in minor amounts in an equilibrium mixture (<0.01%). Nevertheless, the open-chain aldehydes or ketones can participate in chemical reactions that drive the equilibrium and eventually consume the sugar.
Aldoses and ketoses were historically referred to as reducing sugars because they responded positively in a chemical test that effected oxidation of their aldehyde and hydroxyketone functionalities, respectively. The carboxylic acid formed by oxidation of the aldehyde in an aldose is referred to as a glyconic acid (e.g., gluconic acid is the oxidation product of glucose). It is also possible to oxidize the hydroxy groups of monosaccharides, most notably the terminal OH group (i.e., C-6 of glucose). In this reaction, a glycuronic acid is produced, and if both terminal groups are oxidized, the product is a glycaric acid. The three acids derived from D-glucose are illustrated in Figure 2.11. These compounds have a tendency to undergo intramolecular cyclization reactions, preferably yielding six-membered lactones. Two examples of lactonization are shown in Figure 2.11. Oxidized forms of monosaccharides can be found in nature. For example, glucuronic acid (GlcA) is an abundant component of many glycosaminoglycans (see Chapter 16).
The carbonyl groups of aldoses and ketoses also can be reduced with sodium borohydride (NaBH4) to form polyhydroxy alcohols, referred to as alditols. This reaction is widely used to introduce a radiolabel at C-1 of the monosaccharide by reduction with tritiated sodium borohydride (NaB[3H]4) (Figure 2.12).
Schiff Base Formation
The aldehyde and ketone groups of monosaccharides can also undergo Schiff base formation with amines or hydrazides, forming imines and hydrazones, respectively (Figure 2.13). This reaction is often used to conjugate the monosaccharide to proteins (via their lysine residues) or to biochemical probes such as biotin hydrazide. It should be noted that the imines formed with amino groups are not stable to water and are typically reduced with sodium cyanoborohydride (NaCNBH3) in a process termed reductive amination.
As aldehydes, reducing sugars can also form Schiff bases with amino groups of the lysine residues in proteins. This nonenzymatic process that links glycans to proteins is termed “glycation” and is distinct from “glycosylation,” which involves the formation of a glycosidic bond between the sugar and protein. Glycation products can undergo further reactions that lead to the formation of protein cross-links, and these can have pathogenic consequences (i.e., they are immunogenic and change the properties of the protein). Glycation products of glucose accumulate at higher levels in diabetics than in healthy individuals because of elevated blood glucose levels. These modified proteins are thought to underlie some of the pathologies associated with diabetes.
Glycosidic Bond Formation
Two monosaccharide units can be joined together by a glycosidic bond—this is the fundamental linkage among the monosaccharide building blocks found in all oligosaccharides. The glycosidic bond is formed between the anomeric carbon of one monosaccharide and a hydroxy group of another. In chemical terms, a hemiacetal group reacts with an alcohol group to form an acetal. Glycosidic bonds can be formed with virtually any hydroxylated compound, including simple alcohols such as methanol or ethanol (Figure 2.14) or hydroxy amino acids such as serine, threonine, and tyrosine. Indeed, glycosidic linkages are formed between sugars and these amino acids within proteins to form glycoproteins (see Chapters 8 and 9). Like the hemiacetal, the acetal or glycosidic linkage can exist in two stereoisomeric forms: α and β. But unlike the hemiacetal, the acetal is configurationally stable under most conditions. Thus, once a glycosidic bond is formed, its configuration is maintained indefinitely. Furthermore, no oxidation or reduction can take place at an anomeric center that is involved in a glycosidic bond. Like acetals in general, glycosidic bonds can be hydrolyzed in dilute acid, generating the constituent monosaccharides from oligosaccharides.
In the field of glycan synthesis, there has been considerable effort directed at the development of methods for constructing glycosidic bonds among monosaccharides. An overview of glycan synthesis strategies is provided in Chapter 49.
CHEMISTRY OF MONOSACCHARIDE FUNCTIONAL GROUPS
Methylation of Hydroxyl Groups
The hydroxy groups of both monosaccharides and oligosaccharides can be chemically modified without affecting glycosidic linkages. A common modification is the capping of hydroxy groups to generate methyl ethers. Methylation is a chemical transformation that is used in the structural analysis of glycans (see Chapter 47). Partially methylated glycans are also known to occur in natural products and a number of methyltransferases have been identified.
Esterification of Hydroxyl Groups
The hydroxy groups of glycans can be esterified in nature by a variety of different enzymes. Esterification constitutes another element of variation in glycan structure and is sometimes required for interactions with other biomolecules. The most important types of sugar esters in nature are phosphate esters (including diphosphate esters), acyl esters (with acetic acid or fatty acids), and sulfate esters.
Deoxygenation of Hydroxyl Groups
The hydroxy groups of monosaccharides can be replaced with hydrogen atoms to form deoxysugars. This can be achieved chemically using rather complex procedures, but nature has evolved reductases that perform such reactions in one step. For example, deoxygenation of ribose within a ribonucleotide to form the 2-deoxyribonucleotide is a critical reaction in DNA biosynthesis. Fucose (Fuc), one of the common vertebrate monosaccharides, is deoxygenated at C-6 (Figure 2.4) during its biosynthesis from mannose (see Chapter 4).
Amino Groups
Many monosaccharides have N-acetamido groups, such as GlcNAc, GalNAc, and NeuNAc. In rare cases, the N-acetamido group is de-N-acetylated to form amino groups. These are found in heparan sulfate (see Chapter 16), glycosylphosphatidylinositol (GPI) anchors (see Chapter 11), and many bacterial glycan structures (see Chapter 20). Amino groups can be modified with sulfates, similar to hydroxyl groups, as found in heparan sulfate.
GLYCAN STRUCTURE AND DIVERSITY
The diversity characteristic of glycans on glycoconjugates derives from the many ways in which monosaccharides can be linked together to form higher-order structures. Variety in glycosidic linkages has as its source the many possible isomers that can be formed between two monosaccharides. First, the glycosidic linkage can be formed in two possible stereoisomers at the anomeric carbon of one sugar (α or β). Second, the many hydroxy groups of the other sugar permit several possible regioisomers. For example, two glucose residues can be joined together in numerous ways, as illustrated by maltose (Glcα4Glc) and gentiobiose (Glcβ6Glc) (Figure 2.15). These isomers have very different three-dimensional structures and biological activities. Finally, a monosaccharide can be involved in more than two glycosidic linkages, thus serving as a branchpoint. The common occurrence of branched sequences (as opposed to the linear sequences that are found in almost all peptides) is unique to glycans and contributes to their structural diversity.
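The combination of stereochemical and regiochemical choices can be made concrete by enumerating the ways of joining two glucose residues. The Python sketch below rests on several assumptions that go beyond the text: both residues are pyranoses, so the free hydroxy groups available on the second sugar are at C-2, C-3, C-4, and C-6; the anomeric configuration of a free reducing end is not counted because it mutarotates; and the "1-1" label used for anomeric-to-anomeric linkages is an ad hoc extension of the condensed notation used above.

```python
from itertools import combinations_with_replacement, product

ANOMERS = ("α", "β")
ACCEPTOR_POSITIONS = (2, 3, 4, 6)  # free OH groups of a glucopyranose acceptor


def reducing_disaccharides():
    """Glc-Glc disaccharides linked from C-1 of one residue to an OH of the other."""
    return [f"Glc{anomer}{position}Glc"
            for anomer, position in product(ANOMERS, ACCEPTOR_POSITIONS)]


def nonreducing_disaccharides():
    """Glc-Glc disaccharides joined through both anomeric centers.

    The residues are identical, so (α, β) and (β, α) are the same molecule.
    """
    return [f"Glc{a}1-1{b}Glc"
            for a, b in combinations_with_replacement(ANOMERS, 2)]


reducing = reducing_disaccharides()
nonreducing = nonreducing_disaccharides()
print(reducing)
print(nonreducing)
print(len(reducing) + len(nonreducing), "distinct Glc-Glc disaccharides")
```

The list of reducing disaccharides includes Glcα4Glc (maltose) and Glcβ6Glc (gentiobiose). Under these assumptions, two identical residues already give 11 disaccharides, whereas two identical amino acids give a single dipeptide.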
The relationship of the glycosidic bond to oligosaccharides is analogous to the relationship of the peptide bond to polypeptides and the phosphodiester bond to polynucleotides. However, amino acids and nucleotides are linked in only one fashion during the formation of polypeptides and nucleic acids, respectively; there is no stereochemical or regiochemical diversity in these biopolymers. The number of monomeric residues contained in an oligosaccharide is designated in the nomenclature: disaccharide, trisaccharide, and so on. Just as polypeptides have amino and carboxyl termini and polynucleotides have 5′ and 3′ termini, oligosaccharides have a polarity that is defined by their reducing and nonreducing termini (Figure 2.16). The reducing end of the oligosaccharide bears a free anomeric center that is not engaged in a glycosidic bond and thus retains the chemical reactivity of the aldehyde. However, it continues to be referred to as the reducing end even when it is engaged in a linkage to another hydroxylic compound, for example, to the hydroxyl of serine or threonine, as in glycoproteins. Structures are commonly written from the nonreducing end on the left toward the reducing end on the right. For some structures, there is no reducing end. For example, the common disaccharides sucrose and trehalose have glycosidic linkages between the anomeric centers of two monosaccharide constituents (Figure 2.17).
The glycosidic linkage is the most flexible part of a disaccharide structure. Whereas the chair conformation of the constituent monosaccharides is relatively rigid, the torsion angles around the glycosidic bond (φ, ψ, and ω; Figure 2.18) can vary. Thus, a disaccharide of well-defined primary structure can adopt multiple conformations in solution that differ in the relative orientation of the two monosaccharides. The combination of structural rigidity and flexibility is typical of complex carbohydrates and, more than likely, essential to their biological functions.
Glycans are linked to other biomolecules, such as lipids or amino acids within polypeptides, through glycosidic linkages to form glycoconjugates (see Chapters 8, 9, 10, and 11). Glycans are often referred to as the glycone of a glycoconjugate and the noncarbohydrate component is named the aglycone. The glycan may be a single monosaccharide or an oligosaccharide. The attachment of many glycans to a polypeptide scaffold creates tremendous diversity among glycoproteins. The two general classes of protein-bound glycans are N- and O-linked. The N-linked glycans are bound to the nitrogen atom of asparagine side chains, whereas the O-linked glycans are bound to the oxygen atom of serine or threonine side chains. The structures, biosynthesis, and biological properties of these protein-bound glycans are discussed in Chapters 8 and 9, respectively. As examples, the structures depicted in Figure 2.19 represent a complex N-linked glycan and an O-linked glycan possessing a branched heptasaccharide structure. Many glycoproteins possess several attached glycans, each of different structure, and both N- and O-linked glycans can occur on the same polypeptide scaffold.
In conclusion, the tremendous diversity of glycoconjugate structures derives from many elements. The multiple monosaccharide building blocks can be linked in various regiochemistries and stereochemistries, and the resulting oligosaccharides can be assembled on protein or lipid scaffolds. Glycoconjugates therefore comprise an “information-rich” system capable of participating in a wide range of biological functions.
ONLINE RESOURCES
To assist the scientific community in keeping track of the primary structures and biological functions of glycans, several groups have generated computer-searchable databases, available through several large-scale glycomics initiatives.
The Consortium for Functional Glycomics (CFG) has developed a bioinformatics platform with search options including glycan structures and biological activities (http://www.functionalglycomics.org). Other major carbohydrate databases include the Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan database (http://www.genome.ad.jp/kegg/glycan/) and the KEGG pathways database (http://www.genome.jp/kegg/pathway.html). A collection of linked carbohydrate databases can be found at the German Cancer Research Center (DKFZ) (http://www.glycosciences.de). The Collaborative Glycomics Initiative, EUROCarbDB, can be accessed at http://www.eurocarbdb.org/.
FURTHER READING
- El Khadem HS. Carbohydrate chemistry—Monosaccharides and their oligomers. Academic Press; San Diego: 1988.
- Allen HJ, Kisailus EC, editors. Glycoconjugates: Composition, structure, and function. Marcel Dekker; New York: 1992.
- McNaught AD. Nomenclature of carbohydrates. Carbohydr. Res. 1997;297:1–92. [PubMed: 9042704]
- Bill MR, Revers L, Wilson IBH. Protein glycosylation. Kluwer Academic Publishers; Boston: 1998.
- Boons G-J, editor. Carbohydrate chemistry. Blackie Academic & Professional; London: 1998.
- Stick RV. Carbohydrates: The sweet molecules of life. Academic Press; New York: 2001.
- Raman R, Raguram S, Venkataraman G, Paulson JC, Sasisekharan R. Glycomics: An integrated systems approach to structure-function relationships of glycans. Nat Methods. 2005;2:817–824. [PubMed: 16278650]
- Varki NM, Varki A. Diversity in cell surface sialic acid presentations: Implications for biology and disease. Lab Invest. 2007;87:851–857. [PMC free article: PMC7100186] [PubMed: 17632542]
Publication Details
Author Information and Affiliations
Authors
Carolyn R Bertozzi and David Rabuka.
Publisher
Cold Spring Harbor Laboratory Press, Cold Spring Harbor (NY)
NLM Citation
Bertozzi CR, Rabuka D. Structural Basis of Glycan Diversity. In: Varki A, Cummings RD, Esko JD, et al., editors. Essentials of Glycobiology. 2nd edition. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; 2009. Chapter 2.