The Molecular Composition of Cells

Geoffrey M Cooper

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Cooper GM. The Cell: A Molecular Approach. 2nd edition. Sunderland (MA): Sinauer Associates; 2000.

Cells are composed of water, inorganic ions, and carbon-containing (organic) molecules. Water is the most abundant molecule in cells, accounting for 70% or more of total cell mass. Consequently, the interactions between water and the other constituents of cells are of central importance in biological chemistry. The critical property of water in this respect is that it is a polar molecule, in which the hydrogen atoms have a slight positive charge and the oxygen has a slight negative charge (Figure 2.1). Because of their polar nature, water molecules can form hydrogen bonds with each other or with other polar molecules, as well as interacting with positively or negatively charged ions. As a result of these interactions, ions and polar molecules are readily soluble in water (hydrophilic). In contrast, nonpolar molecules, which cannot interact with water, are poorly soluble in an aqueous environment (hydrophobic). Consequently, nonpolar molecules tend to minimize their contact with water by associating closely with each other instead. As discussed later in this chapter, such interactions of polar and nonpolar molecules with water and with each other play crucial roles in the formation of biological structures, such as cell membranes.

Figure 2.1

Characteristics of water. (A) Water is a polar molecule, with a slight negative charge (δ⁻) on the oxygen atom and a slight positive charge (δ⁺) on the hydrogen atoms. Because of this polarity, water molecules can form hydrogen bonds (dashed (more...)

The inorganic ions of the cell, including sodium (Na⁺), potassium (K⁺), magnesium (Mg²⁺), calcium (Ca²⁺), phosphate (HPO₄²⁻), chloride (Cl⁻), and bicarbonate (HCO₃⁻), constitute 1% or less of the cell mass. These ions are involved in a number of aspects of cell metabolism, and thus play critical roles in cell function.

It is, however, the organic molecules that are the unique constituents of cells. Most of these organic compounds belong to one of four classes of molecules: carbohydrates, lipids, proteins, and nucleic acids. Proteins, nucleic acids, and most carbohydrates (the polysaccharides) are macromolecules formed by the joining (polymerization) of hundreds or thousands of low-molecular-weight precursors: amino acids, nucleotides, and simple sugars, respectively. Such macromolecules constitute 80 to 90% of the dry weight of most cells. Lipids are the other major constituent of cells. The remainder of the cell mass is composed of a variety of small organic molecules, including macromolecular precursors. The basic chemistry of cells can thus be understood in terms of the structures and functions of four major classes of organic molecules.

Carbohydrates

The carbohydrates include simple sugars as well as polysaccharides. These simple sugars, such as glucose, are the major nutrients of cells. As discussed later in this chapter, their breakdown provides both a source of cellular energy and the starting material for the synthesis of other cell constituents. Polysaccharides are storage forms of sugars and form structural components of the cell. In addition, polysaccharides and shorter polymers of sugars act as markers for a variety of cell recognition processes, including the adhesion of cells to their neighbors and the transport of proteins to appropriate intracellular destinations.

The structures of representative simple sugars (monosaccharides) are illustrated in Figure 2.2. The basic formula for these molecules is (CH₂O)ₙ, from which the name carbohydrate is derived (C = “carbo” and H₂O = “hydrate”). The six-carbon (n = 6) sugar glucose (C₆H₁₂O₆) is especially important in cells, since it provides the principal source of cellular energy. Other simple sugars have between three and seven carbons, with three- and five-carbon sugars being the most common. Sugars containing five or more carbons can cyclize to form ring structures, which are the predominant forms of these molecules within cells. As illustrated in Figure 2.2, the cyclized sugars exist in two alternative forms (called α or β), depending on the configuration of carbon 1.
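The general formula given above can be expanded for any value of n with a few lines of Python. This is only an illustrative sketch: the function name `monosaccharide_formula` is hypothetical, and the range check simply mirrors the statement that simple sugars have between three and seven carbons.

```python
def monosaccharide_formula(n: int) -> str:
    """Expand the general formula (CH2O)n into C_n H_2n O_n."""
    # The text describes simple sugars with three to seven carbons.
    if not 3 <= n <= 7:
        raise ValueError("expected between three and seven carbons")
    return f"C{n}H{2 * n}O{n}"


# Triose, pentose, and hexose sugars (n = 3, 5, and 6).
for carbons in (3, 5, 6):
    print(carbons, monosaccharide_formula(carbons))
```

For n = 6 the function returns the formula of glucose, C6H12O6.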

Figure 2.2

Structure of simple sugars. Representative sugars containing three, five, and six carbons (triose, pentose, and hexose sugars, respectively) are illustrated. Sugars with five or more carbons can cyclize to form rings, which exist in two alternative forms (more...)

Monosaccharides can be joined together by dehydration reactions, in which H₂O is removed and the sugars are linked by a glycosidic bond between two of their carbons (Figure 2.3). If only a few sugars are joined together, the resulting polymer is called an oligosaccharide. If a large number (hundreds or thousands) of sugars are involved, the resulting polymers are macromolecules called polysaccharides.
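The atom bookkeeping of a dehydration reaction can be sketched in Python: add up the atoms of the sugars being joined, then remove one H₂O for each glycosidic bond formed. The names `GLUCOSE`, `WATER`, and `join_sugars` are illustrative, and the sketch assumes that joining n sugars into one molecule forms n - 1 bonds, which holds for both unbranched and branched chains.

```python
from collections import Counter

# Atom counts for glucose (C6H12O6) and water (H2O).
GLUCOSE = Counter({"C": 6, "H": 12, "O": 6})
WATER = Counter({"H": 2, "O": 1})


def join_sugars(units):
    """Return the atom counts of a polymer formed from the given sugars.

    One water molecule is removed for every glycosidic bond formed.
    """
    if not units:
        raise ValueError("at least one sugar is required")
    total = Counter()
    for unit in units:
        total.update(unit)
    for _ in range(len(units) - 1):  # n sugars are joined by n - 1 bonds
        total.subtract(WATER)
    return dict(total)


print(join_sugars([GLUCOSE, GLUCOSE]))
```

Joining two glucose molecules in this way gives a disaccharide of formula C12H22O11, which is two glucose formulas minus one H₂O.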

Figure 2.3

Formation of a glycosidic bond. Two simple sugars are joined by a dehydration reaction (a reaction in which water is removed). In the example shown, two glucose molecules in the α configuration are joined by a bond between carbons 1 and 4, which (more...)

Two common polysaccharides—glycogen and starch—are the storage forms of carbohydrates in animal and plant cells, respectively. Both glycogen and starch are composed entirely of glucose molecules in the α configuration (Figure 2.4). The principal linkage is between carbon 1 of one glucose and carbon 4 of a second. In addition, both glycogen and one form of starch (amylopectin) contain occasional α (1→6) linkages, in which carbon 1 of one glucose is joined to carbon 6 of a second. As illustrated in Figure 2.4, these linkages lead to the formation of branches resulting from the joining of two separate α (1→4) linked chains. Such branches are present in glycogen and amylopectin, although another form of starch (amylose) is an unbranched molecule.

Figure 2.4

Structure of polysaccharides. Polysaccharides are macromolecules consisting of hundreds or thousands of simple sugars. Glycogen, starch, and cellulose are all composed entirely of glucose residues, which are joined by α (1→4) glycosidic (more...)

The structures of glycogen and starch are thus basically similar, as is their function: to store glucose. Cellulose, in contrast, has a quite distinct function as the principal structural component of the plant cell wall. Perhaps surprisingly, then, cellulose is also composed entirely of glucose molecules. The glucose residues in cellulose, however, are in the β rather than the α configuration, and cellulose is an unbranched polysaccharide (see Figure 2.4). The linkage of glucose residues by β (1→4) rather than α (1→4) bonds causes cellulose to form long extended chains that pack side by side to form fibers of great mechanical strength.

In addition to their roles in energy storage and cell structure, oligosaccharides and polysaccharides are important in a variety of cell signaling processes. For example, oligosaccharides are frequently linked to proteins, where they serve as markers to target proteins for transport to the cell surface or incorporation into different subcellular organelles. Oligosaccharides and polysaccharides also serve as markers on the surface of cells, playing important roles in cell recognition and the interactions between cells in tissues of multicellular organisms.

Lipids

Lipids have three major roles in cells. First, they provide an important form of energy storage. Second, and of great importance in cell biology, lipids are the major components of cell membranes. Third, lipids play important roles in cell signaling, both as steroid hormones (e.g., estrogen and testosterone) and as messenger molecules that convey signals from cell surface receptors to targets within the cell.

The simplest lipids are fatty acids, which consist of long hydrocarbon chains, most frequently containing 16 or 18 carbon atoms, with a carboxyl group (COO⁻) at one end (Figure 2.5). Unsaturated fatty acids contain one or more double bonds between carbon atoms; in saturated fatty acids all of the carbon atoms are bonded to the maximum number of hydrogen atoms. The long hydrocarbon chains of fatty acids contain only nonpolar C-H bonds, which are unable to interact with water. The hydrophobic nature of these fatty acid chains is responsible for much of the behavior of complex lipids, particularly in the formation of biological membranes.

Figure 2.5

Structure of fatty acids. Fatty acids consist of long hydrocarbon chains terminating in a carboxyl group (COO⁻). Palmitate and stearate are saturated fatty acids consisting of 16 and 18 carbons, respectively. Oleate is an unsaturated 18-carbon fatty acid (more...)

Fatty acids are stored in the form of triacylglycerols, or fats, which consist of three fatty acids linked to a glycerol molecule (Figure 2.6). Triacylglycerols are insoluble in water and therefore accumulate as fat droplets in the cytoplasm. When required, they can be broken down for use in energy-yielding reactions discussed later in this chapter. It is noteworthy that fats are a more efficient form of energy storage than carbohydrates, yielding more than twice as much energy per weight of material broken down. Fats therefore allow energy to be stored in less than half the body weight that would be required to store the same amount of energy in carbohydrates—a particularly important consideration for animals because of their mobility.
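The weight comparison can be made concrete with a short calculation. The energy yields used below (about 9 kcal per gram of fat and about 4 kcal per gram of carbohydrate) are commonly cited approximate values and are an assumption of this sketch; the text itself states only that fats yield more than twice as much energy per weight. The amount of energy to be stored is arbitrary.

```python
# Approximate energy yields (assumed values, not given in the text).
KCAL_PER_GRAM_FAT = 9.0
KCAL_PER_GRAM_CARBOHYDRATE = 4.0


def grams_needed(energy_kcal: float, kcal_per_gram: float) -> float:
    """Mass of storage material required to hold the given amount of energy."""
    return energy_kcal / kcal_per_gram


energy_to_store = 1800.0  # kcal, an arbitrary illustrative amount
fat_grams = grams_needed(energy_to_store, KCAL_PER_GRAM_FAT)
carbohydrate_grams = grams_needed(energy_to_store, KCAL_PER_GRAM_CARBOHYDRATE)

print(f"fat: {fat_grams:.0f} g, carbohydrate: {carbohydrate_grams:.0f} g")
print(f"weight ratio (fat / carbohydrate): {fat_grams / carbohydrate_grams:.2f}")
```

Under these assumed values the same energy occupies 200 g as fat and 450 g as carbohydrate, so the fat weighs less than half as much.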

Figure 2.6

Structure of triacylglycerols. Triacylglycerols (fats) contain three fatty acids joined to glycerol. In this example, all three fatty acids are palmitate, but triacylglycerols often contain a mixture of different fatty acids.

Phospholipids, the principal components of cell membranes, consist of two fatty acids joined to a polar head group (Figure 2.7). In the glycerol phospholipids, the two fatty acids are bound to carbon atoms in glycerol, as in triacylglycerols. The third carbon of glycerol, however, is bound to a phosphate group, which is in turn frequently attached to another small polar molecule, such as choline, serine, inositol, or ethanolamine. Sphingomyelin, the only nonglycerol phospholipid in cell membranes, contains two hydrocarbon chains linked to a polar head group formed from serine rather than from glycerol. All phospholipids have hydrophobic tails, consisting of the two hydrocarbon chains, and hydrophilic head groups, consisting of the phosphate group and its polar attachments. Consequently, phospholipids are amphipathic molecules, part water-soluble and part water-insoluble. This property of phospholipids is the basis for the formation of biological membranes, as discussed later in this chapter.

Figure 2.7

Structure of phospholipids. Glycerol phospholipids contain two fatty acids joined to glycerol. The fatty acids may be different from each other and are designated R1 and R2. The third carbon of glycerol is joined to a phosphate group (forming phosphatidic (more...)

In addition to phospholipids, many cell membranes contain glycolipids and cholesterol. Glycolipids consist of two hydrocarbon chains linked to polar head groups that contain carbohydrates (Figure 2.8). They are thus similar to the phospholipids in their general organization as amphipathic molecules. Cholesterol, in contrast, consists of four hydrocarbon rings rather than linear hydrocarbon chains (Figure 2.9). The hydrocarbon rings are strongly hydrophobic, but the hydroxyl (OH) group attached to one end of cholesterol is weakly hydrophilic, so cholesterol is also amphipathic.

Figure 2.8

Structure of glycolipids. Two hydrocarbon chains are joined to a polar head group formed from serine and containing carbohydrates (e.g., glucose).

Figure 2.9

Cholesterol and steroid hormones. Cholesterol, an important component of cell membranes, is an amphipathic molecule because of its polar hydroxyl group. Cholesterol is also a precursor to the steroid hormones, such as testosterone and estradiol (a form (more...)

In addition to their roles as components of cell membranes, lipids function as signaling molecules, both within and between cells. The steroid hormones (such as estrogens and testosterone) are derivatives of cholesterol (see Figure 2.9). These hormones are a diverse group of chemical messengers, all of which contain four hydrocarbon rings to which distinct functional groups are attached. Derivatives of phospholipids also serve as messenger molecules within cells, acting to convey signals from cell surface receptors to intracellular targets (see Chapter 13).

Nucleic Acids

The nucleic acids—DNA and RNA—are the principal informational molecules of the cell. Deoxyribonucleic acid (DNA) has a unique role as the genetic material, which in eukaryotic cells is located in the nucleus. Different types of ribonucleic acid (RNA) participate in a number of cellular activities. Messenger RNA (mRNA) carries information from DNA to the ribosomes, where it serves as a template for protein synthesis. Two other types of RNA (ribosomal RNA and transfer RNA) are involved in protein synthesis. Still other kinds of RNAs are involved in the processing and transport of both RNAs and proteins. In addition to acting as an informational molecule, RNA is also capable of catalyzing a number of chemical reactions. In present-day cells, these include reactions involved in both protein synthesis and RNA processing.

DNA and RNA are polymers of nucleotides, which consist of purine and pyrimidine bases linked to phosphorylated sugars (Figure 2.10). DNA contains two purines (adenine and guanine) and two pyrimidines (cytosine and thymine). Adenine, guanine, and cytosine are also present in RNA, but RNA contains uracil in place of thymine. The bases are linked to sugars (2′-deoxyribose in DNA, or ribose in RNA) to form nucleosides. Nucleotides additionally contain one or more phosphate groups linked to the 5′ carbon of nucleoside sugars.

Figure 2.10

Components of nucleic acids. Nucleic acids contain purine and pyrimidine bases linked to phosphorylated sugars. A nucleic acid base linked to a sugar alone is a nucleoside. Nucleotides additionally contain one or more phosphate groups.

The polymerization of nucleotides to form nucleic acids involves the formation of phosphodiester bonds between the 5′ phosphate of one nucleotide and the 3′ hydroxyl of another (Figure 2.11). Oligonucleotides are small polymers containing only a few nucleotides; the large polynucleotides that make up cellular RNA and DNA may contain thousands or millions of nucleotides, respectively. It is important to note that a polynucleotide chain has a sense of direction, with one end of the chain terminating in a 5′ phosphate group and the other in a 3′ hydroxyl group. Polynucleotides are always synthesized in the 5′ to 3′ direction, with a free nucleotide being added to the 3′ OH group of a growing chain. By convention, the sequence of bases in DNA or RNA is also written in the 5′ to 3′ direction.

Figure 2.11

Polymerization of nucleotides. A phosphodiester bond is formed between the 3′ hydroxyl group of one nucleotide and the 5′ phosphate group of another. A polynucleotide chain has a sense of direction, one end terminating in a 5′ (more...)

The information in DNA and RNA is conveyed by the order of the bases in the polynucleotide chain. DNA is a double-stranded molecule consisting of two polynucleotide chains running in opposite directions (see Chapter 3). The bases are on the inside of the molecule, and the two chains are joined by hydrogen bonds between complementary base pairs—adenine pairing with thymine and guanine with cytosine (Figure 2.12). The important consequence of such complementary base pairing is that one strand of DNA (or RNA) can act as a template to direct the synthesis of a complementary strand. Nucleic acids are thus uniquely capable of directing their own self-replication, allowing them to function as the fundamental informational molecules of the cell. The information carried by DNA and RNA directs the synthesis of specific proteins, which control most cellular activities.
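The template principle can be illustrated with a minimal Python sketch that derives the complementary DNA strand from the pairing rules above (adenine with thymine, guanine with cytosine). Because the two chains run in opposite directions and sequences are written 5′ to 3′ by convention, the complement is read in reverse. The names `DNA_PAIRS` and `complementary_strand` are illustrative.

```python
# Complementary base pairs in DNA: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Return the strand complementary to `template`, written 5' to 3'.

    The template is read in reverse because the two chains of the
    double helix run in opposite directions.
    """
    return "".join(DNA_PAIRS[base] for base in reversed(template.upper()))


print(complementary_strand("ATGC"))
```

Applying the function twice returns the original sequence, which reflects how each strand can serve as a template for the other.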

Figure 2.12

Complementary pairing between nucleic acid bases. The formation of hydrogen bonds between bases on opposite strands of DNA leads to the specific pairing of guanine (G) with cytosine (C) and adenine (A) with thymine (T).

Nucleotides are not only important as the building blocks of nucleic acids; they also play critical roles in other cell processes. Perhaps the most prominent example is adenosine 5′-triphosphate (ATP), which is the principal form of chemical energy within cells. Other nucleotides similarly function as carriers of either energy or reactive chemical groups in a wide variety of metabolic reactions. In addition, some nucleotides (e.g., cyclic AMP) are important signaling molecules within cells (see Chapter 13).

Proteins

While nucleic acids carry the genetic information of the cell, the primary responsibility of proteins is to execute the tasks directed by that information. Proteins are the most diverse of all macromolecules, and each cell contains several thousand different proteins, which perform a wide variety of functions. The roles of proteins include serving as structural components of cells and tissues, acting in the transport and storage of small molecules (e.g., the transport of oxygen by hemoglobin), transmitting information between cells (e.g., protein hormones), and providing a defense against infection (e.g., antibodies). The most fundamental property of proteins, however, is their ability to act as enzymes, which, as discussed in the following section, catalyze nearly all the chemical reactions in biological systems. Thus, proteins direct virtually all activities of the cell. The central importance of proteins in biological chemistry is indicated by their name, which is derived from the Greek word proteios, meaning “of the first rank.”

Proteins are polymers of 20 different amino acids. Each amino acid consists of a carbon atom (called the α carbon) bonded to a carboxyl group (COO⁻), an amino group (NH₃⁺), a hydrogen atom, and a distinctive side chain (Figure 2.13). The specific chemical properties of the different amino acid side chains determine the roles of each amino acid in protein structure and function.

Figure 2.13

Structure of amino acids. Each amino acid consists of a central carbon atom (the α carbon) bonded to a hydrogen atom, a carboxyl group, an amino group, and a specific side chain (designated R). At physiological pH, both the carboxyl and amino (more...)

The amino acids can be grouped into four broad categories according to the properties of their side chains (Figure 2.14). Ten amino acids have nonpolar side chains that do not interact with water. Glycine is the simplest amino acid, with a side chain consisting of only a hydrogen atom. Alanine, valine, leucine, and isoleucine have hydrocarbon side chains consisting of up to four carbon atoms. The side chains of these amino acids are hydrophobic and therefore tend to be located in the interior of proteins, where they are not in contact with water. Proline similarly has a hydrocarbon side chain, but it is unique in that its side chain is bonded to the nitrogen of the amino group as well as to the α carbon, forming a cyclic structure. The side chains of two amino acids, cysteine and methionine, contain sulfur atoms. Methionine is quite hydrophobic, but cysteine is less so because of its sulfhydryl (SH) group. As discussed later, the sulfhydryl group of cysteine plays an important role in protein structure because disulfide bonds can form between the side chains of different cysteine residues. Finally, two nonpolar amino acids, phenylalanine and tryptophan, have side chains containing very hydrophobic aromatic rings.

Figure 2.14

The amino acids. The three-letter and one-letter abbreviations for each amino acid are indicated. The amino acids are grouped into four categories according to the properties of their side chains: nonpolar, polar, basic, and acidic.

Five amino acids have uncharged but polar side chains. These include serine, threonine, and tyrosine, which have hydroxyl groups on their side chains, as well as asparagine and glutamine, which have polar amide (O=C—NH₂) groups. Because the polar side chains of these amino acids can form hydrogen bonds with water, these amino acids are hydrophilic and tend to be located on the outside of proteins.

The amino acids lysine, arginine, and histidine have side chains with charged basic groups. Lysine and arginine are very basic amino acids, and their side chains are positively charged in the cell. Consequently, they are very hydrophilic and are found in contact with water on the surface of proteins. Histidine can be either uncharged or positively charged at physiological pH, so it frequently plays an active role in enzymatic reactions involving the exchange of hydrogen ions, as illustrated in the example of enzymatic catalysis discussed in the following section.

Finally, two amino acids, aspartic acid and glutamic acid, have acidic side chains terminating in carboxyl groups. These amino acids are negatively charged within the cell and are therefore frequently referred to as aspartate and glutamate. Like the basic amino acids, these acidic amino acids are very hydrophilic and are usually located on the surface of proteins.
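The four categories described above can be collected into a simple lookup table. The following Python sketch records the grouping exactly as given in the text; the names `AMINO_ACID_CATEGORIES` and `category_of` are illustrative.

```python
# The 20 amino acids grouped by side-chain properties, as described in the text.
AMINO_ACID_CATEGORIES = {
    "nonpolar": [
        "glycine", "alanine", "valine", "leucine", "isoleucine",
        "proline", "cysteine", "methionine", "phenylalanine", "tryptophan",
    ],
    "polar": ["serine", "threonine", "tyrosine", "asparagine", "glutamine"],
    "basic": ["lysine", "arginine", "histidine"],
    "acidic": ["aspartic acid", "glutamic acid"],
}


def category_of(amino_acid: str) -> str:
    """Return the side-chain category of the named amino acid."""
    name = amino_acid.lower()
    for category, members in AMINO_ACID_CATEGORIES.items():
        if name in members:
            return category
    raise KeyError(f"unknown amino acid: {amino_acid}")


for category, members in AMINO_ACID_CATEGORIES.items():
    print(f"{category}: {len(members)}")
```

The group sizes (ten nonpolar, five polar, three basic, and two acidic) add up to the 20 amino acids found in proteins.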

Amino acids are joined together by peptide bonds between the α amino group of one amino acid and the α carboxyl group of a second (Figure 2.15). Polypeptides are linear chains of amino acids, usually hundreds or thousands of amino acids in length. Each polypeptide chain has two distinct ends, one terminating in an α amino group (the amino, or N, terminus) and the other in an α carboxyl group (the carboxy, or C, terminus). Polypeptides are synthesized from the amino to the carboxy terminus, and the sequence of amino acids in a polypeptide is written (by convention) in the same order.

Figure 2.15

Formation of a peptide bond. The carboxyl group of one amino acid is linked to the amino group of a second.

The defining characteristic of proteins is that they are polypeptides with specific amino acid sequences. In 1953 Frederick Sanger was the first to determine the complete amino acid sequence of a protein, the hormone insulin. Insulin was found to consist of two polypeptide chains, joined by disulfide bonds between cysteine residues (Figure 2.16). Most important, Sanger's experiment revealed that each protein consists of a specific amino acid sequence. Proteins are currently sequenced using automated methods, and the complete amino acid sequences of over 100,000 proteins are now known. Each consists of a unique sequence of amino acids, determined by the order of nucleotides in a gene (see Chapter 3).

Figure 2.16

Amino acid sequence of insulin. Insulin consists of two polypeptide chains, one of 21 and the other of 30 amino acids (indicated here by their one-letter codes). The side chains of three pairs of cysteine residues are joined by disulfide bonds, two of (more...)

The amino acid sequence of a protein is only the first element of its structure. Rather than being extended chains of amino acids, proteins adopt distinct three-dimensional conformations that are critical to their function. These three-dimensional conformations of proteins are the result of interactions between their constituent amino acids, so the shapes of proteins are determined by their amino acid sequences. This was first demonstrated by experiments of Christian Anfinsen in which he disrupted the three-dimensional structures of proteins by treatments, such as heating, that break noncovalent bonds—a process called denaturation (Figure 2.17). Following incubation under milder conditions, such denatured proteins often spontaneously returned to their native conformations, indicating that these conformations were directly determined by the amino acid sequence.

Figure 2.17

Protein denaturation and refolding. Ribonuclease (RNase) is a protein of 124 amino acids (indicated by numbers). The protein is normally folded into its native conformation, which contains four disulfide bonds (indicated as paired circles representing (more...)

The three-dimensional structure of proteins is most frequently analyzed by X-ray crystallography, a high-resolution technique that can determine the arrangement of individual atoms within a molecule. A beam of X rays is directed at crystals of the protein to be analyzed, and the pattern of X rays that pass through the protein crystal is detected on X-ray film. As the X rays strike the crystal, they are scattered in characteristic patterns determined by the arrangement of atoms in the molecule. The structure of the molecule can therefore be deduced from the pattern of scattered X rays (the diffraction pattern).

In 1958 John Kendrew was the first to determine the three-dimensional structure of a protein, myoglobin—a relatively simple protein of 153 amino acids (Figure 2.18). Since then, the three-dimensional structures of several thousand proteins have been analyzed. Most, like myoglobin, are globular proteins with polypeptide chains folded into compact structures, although some (such as the structural proteins of connective tissues) are long fibrous molecules. Analysis of the three-dimensional structures of these proteins has revealed several basic principles that govern protein folding, although protein structure is so complex that predicting the three-dimensional structure of a protein directly from its amino acid sequence is impossible.

Figure 2.18

Three-dimensional structure of myoglobin. Myoglobin is a protein of 153 amino acids that is involved in oxygen transport. The polypeptide chain is folded around a heme group that serves as the oxygen-binding site.

Protein structure is generally described as having four levels. The primary structure of a protein is the sequence of amino acids in its polypeptide chain. The secondary structure is the regular arrangement of amino acids within localized regions of the polypeptide. Two types of secondary structure, which were first proposed by Linus Pauling and Robert Corey in 1951, are particularly common: the α helix and the β sheet. Both of these secondary structures are held together by hydrogen bonds between the CO and NH groups of peptide bonds. An α helix is formed when a region of a polypeptide chain coils around itself, with the CO group of one peptide bond forming a hydrogen bond with the NH group of a peptide bond located four residues downstream in the linear polypeptide chain (Figure 2.19). In contrast, a β sheet is formed when two parts of a polypeptide chain lie side by side with hydrogen bonds between them. Such β sheets can be formed between several polypeptide strands, which can be oriented either parallel or antiparallel to each other.
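The hydrogen-bonding pattern of the α helix can be written out explicitly. The Python sketch below lists, for an idealized helical segment, which CO group pairs with which NH group four residues downstream. The function name `alpha_helix_hydrogen_bonds`, the 1-based residue numbering, and the assumption that every residue in the segment is helical are simplifications made for illustration.

```python
def alpha_helix_hydrogen_bonds(n_residues: int, spacing: int = 4):
    """List (CO residue, NH residue) pairs in an idealized alpha helix.

    Residues are numbered from 1. The CO group of residue i is paired with
    the NH group of residue i + spacing, four residues downstream by default.
    """
    return [(i, i + spacing) for i in range(1, n_residues - spacing + 1)]


for co_residue, nh_residue in alpha_helix_hydrogen_bonds(10):
    print(f"CO of residue {co_residue} -> NH of residue {nh_residue}")
```

A segment shorter than five residues yields no such pairs, since no partner lies four residues downstream.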

Figure 2.19

Secondary structure of proteins. The most common types of secondary structure are the α helix and the β sheet. In an α helix, hydrogen bonds form between CO and NH groups of peptide bonds separated by four amino acid residues. (more...)

Tertiary structure is the folding of the polypeptide chain as a result of interactions between the side chains of amino acids that lie in different regions of the primary sequence (Figure 2.20). In most proteins, combinations of α helices and β sheets, connected by loop regions of the polypeptide chain, fold into compact globular structures called domains, which are the basic units of tertiary structure. Small proteins, such as ribonuclease or myoglobin, contain only a single domain; larger proteins may contain a number of different domains, which are frequently associated with distinct functions.

Figure 2.20

Tertiary structure of ribonuclease. Regions of α-helix and β-sheet secondary structures, connected by loop regions, are folded into the native conformation of the protein. In this schematic representation of the polypeptide chain as a (more...)

A critical determinant of tertiary structure is the localization of hydrophobic amino acids in the interior of the protein and of hydrophilic amino acids on the surface, where they interact with water. The interiors of folded proteins thus consist mainly of hydrophobic amino acids arranged in α helices and β sheets; these secondary structures are found in the hydrophobic cores of proteins because hydrogen bonding neutralizes the polar character of the CO and NH groups of the polypeptide backbone. The loop regions connecting the elements of secondary structure are found on the surface of folded proteins, where the polar components of the peptide bonds form hydrogen bonds with water or with the polar side chains of hydrophilic amino acids. Interactions between polar amino acid side chains (hydrogen bonds and ionic bonds) on the protein surface are also important determinants of tertiary structure. In addition, the covalent disulfide bonds between the sulfhydryl groups of cysteine residues stabilize the folded structures of many cell-surface or secreted proteins.

The fourth level of protein structure, quaternary structure, consists of the interactions between different polypeptide chains in proteins composed of more than one polypeptide. Hemoglobin, for example, is composed of four polypeptide chains held together by the same types of interactions that maintain tertiary structure (Figure 2.21).

Figure 2.21

Quaternary structure of hemoglobin. Hemoglobin is composed of four polypeptide chains, each of which is bound to a heme group. The two α chains are identical to each other, as are the two β chains.

The distinct chemical characteristics of the 20 different amino acids thus lead to considerable variation in the three-dimensional conformations of folded proteins. Consequently, proteins constitute an extremely complex and diverse group of macromolecules, suited to the wide variety of tasks they perform in cell biology.