NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Subcommittee on Reproductive and Developmental Toxicity. Evaluating Chemical and Other Agent Exposures for Reproductive and Developmental Toxicity. Washington (DC): National Academies Press (US); 2001.

Appendix D Experimental Animal and In Vitro Study Designs

Experimental animal studies should be evaluated as part of hazard characterization to ensure that adequate research has been carried out. The design (choice of species, vehicle, route and timing of exposure), conduct, interpretation, and reporting should be considered. In any assessment of the reproductive and developmental toxicity potential of exposure to a potentially harmful substance, all available data should be considered, including supplementary data from studies that are not designed to test reproductive and developmental toxicity. Supplementary information can be obtained from acute (single or multiple exposures that occur within 24 hours or less), subchronic (multiple or continuous exposures that last up to 3 months), and chronic (multiple exposures that occur over a significant fraction of an animal's life span) systemic toxicity studies (particularly where reproductive organs have been examined) and from toxicokinetic or tissue distribution data. In vitro test systems also can provide information about an agent's potential to cause reproductive or developmental toxicity. By themselves, however, in vitro tests are insufficient for defining the potential reproductive or developmental toxicity of an agent.

The primary information on experimental animal testing for reproductive and developmental toxicity potential is likely to be derived from standard studies used by regulatory agencies. Several statutes and guidelines have been published by different authorities, such as the Organization for Economic Cooperation and Development (OECD 1983, 1984, 1995, 1996, 2000a,b), the U.S. Environmental Protection Agency (EPA 1998a,b,c,d), and the U.S. Food and Drug Administration (FDA 1994, 2000).

This appendix describes experimental animal and in vitro studies that are used to assess developmental toxicity and male and female reproductive toxicity from exposures to pesticides, industrial chemicals, and food ingredients. The testing of pharmaceutical agents is not described in detail here, but can be found in FDA (1994). A summary of the study types, protocols, endpoints and limitations is presented in Table D-1. A description of the manifestations of each type of toxicity and guidance on the interpretation of results from the studies also are presented.

DEVELOPMENTAL TOXICITY

Developmental toxicity is defined as adverse effects in the developing organism that can result from exposure before conception in either parent, exposure during gestation, or exposure during postnatal development from birth to sexual maturation. Adverse developmental effects can be detected at any point in the life span of the organism. The major manifestations of developmental toxicity include death of the developing organism, structural abnormality, altered growth, and functional deficiency (EPA 1991).

Structural abnormalities in development include malformations and variations. A malformation is usually defined as a permanent structural change that can adversely affect survival, development, or function. The term variation indicates a divergence from the usual range of structural constitution that might not adversely affect survival or health. Because there is a continuum of responses from normal to severely abnormal, distinguishing between variations and malformations can be difficult.

Altered growth can result in an alteration in the size or weight of an organ or in body weight or size of exposed offspring. Changes in one indicator of altered growth might or might not be accompanied by other signs of altered growth. For example, changes in body weight sometimes accompany changes in crown-rump length or skeletal ossification. Altered growth can occur at any stage of development, and it can be reversible in some cases or permanent in others. Most current study designs do not allow differentiation between reversible and permanent changes.

Functional developmental toxicity is the study of alterations or delays in the physiological or biochemical competence of an organism or organ system after exposure to an agent during pre- or postnatal development. In any given test animal, delayed development can be assessed in relation to established landmarks for physical, behavioral, and sexual maturation.

Types of Studies

Two types of studies specifically designed to assess developmental toxicity are discussed in this section: the prenatal developmental toxicity study and the developmental neurotoxicity study. Several other types of studies, although not solely designed to assess developmental toxicity, can be used for that purpose. They include single- and multigeneration reproduction studies, reproductive assessment by continuous-breeding studies, and serial mating (dominant lethal) studies discussed in later sections.

Prenatal Developmental Toxicity

The prenatal developmental toxicity study provides information on the effects of repeated exposure to an agent during pregnancy (OECD 2000a; EPA 1998a; FDA 2000). It is normally conducted in two species, a rodent (usually rat) and a nonrodent (usually rabbit), although not all guidelines specify nonrodents. Animals are exposed to an agent, usually via ingestion or inhalation, during the period of major organogenesis. The protocols include exposure to the end of gestation in order to cover developmental events that occur later in gestation (e.g., central nervous system, skeletal growth, sexual differentiation). Offspring are delivered by cesarean section on the day before the expected day of parturition, and a maternal necropsy is conducted, including examination of the uterus for number of implantations, resorptions, fetal deaths, and live fetuses. Corpora lutea in the ovaries are also counted. Live fetuses are weighed and examined carefully for external, visceral, and skeletal malformations and variations. Although the terminology used for malformations and variations has been variable from laboratory to laboratory, attempts have been made at standardization (Wise et al. 1997).

Developmental Neurotoxicity

The objective of developmental neurotoxicity studies is to assess the potential of an agent to affect neurodevelopment (EPA 1998c). The protocol is designed to be used either as a separate study, usually as a follow up to other studies, or as part of a multigeneration reproduction study. A test agent is administered at a minimum of three dose levels to pregnant animals in groups that are large enough to produce 20 litters per dose group from day 6 of gestation through day 10 postnatally (the first half of lactation). (This is the minimum exposure period. Dosing can be continued throughout lactation or, in the context of a multigeneration study, dosing is done daily over two generations.) Pregnant and lactating dams are assessed for clinical signs of neurodevelopmental effects and for their performance in a functional observation battery. Litter sizes can be adjusted by random selection to provide equal numbers of male and female offspring (usually four of each). Offspring are randomly selected from litters for neurotoxicity evaluation, including gross neurological and behavioral disorders, motor activity, response to auditory startle, learning and memory, brain weight, and neuropathological examination. Motor activity is studied on postnatal days 13, 17, 21, and 60 ± 2. Auditory startle tests are conducted on postnatal days 22 and 60 ± 2. Learning and memory are evaluated in the offspring around the time of weaning (postnatal day 21) and again in adulthood (postnatal day 60 ± 2). Neuropathology is examined in the offspring on postnatal day 11 and at the termination of the study. The neuropathology analysis includes simple morphometric measurements of brain areas.

Although these studies are designed specifically to assess the effects of developmental exposures on nervous system structure and function, they are limited in the extent to which this complex system can be evaluated as part of routine testing. For example, assessments of social and reproductive behavior and of conditions such as anxiety are not included; different types of learning and memory (such as spatial and sequential learning, reference and working memory, or the effects of recall delay) are not assessed; and long-term effects of developmental exposures (beyond 60 days [d]) are not evaluated. Several efforts are under way to evaluate the utility of such protocols and to improve the methods used in rodent studies so that they are more comparable to those used in humans.

In Vitro Assays

Any developmental toxicity assay that uses a test subject other than a pregnant mammal falls under the general heading of an “in vitro assay.” Examples include isolated whole mammalian embryos in culture, nonmammalian embryo culture, and tissue, organ, and cell culture. Several manipulations are possible using in vitro assay systems that are not possible using pregnant mammals, such as the removal of the maternal environment, the removal or transplantation of specific tissues and cells, and the ability to track specific cells and molecules, to genetically alter cells, or to monitor embryo physiology.

There are two potential applications for in vitro assays: screening for developmental toxicity and analyzing mechanisms of normal and abnormal development. In vitro assays to screen chemicals for potential developmental toxicity have been under development for approximately 15 years with the idea that they could be used to assess larger numbers of chemicals than can be evaluated with in vivo developmental toxicity tests in mammals, could reduce the number of experimental animals used in those tests, and could be used to reduce the costs of testing large numbers of chemicals. A number of attempts have been made to validate in vitro assays for screening chemicals, and efforts are under way to validate the rodent embryo culture, micromass, and stem cell assays in a European-sponsored trial (Spielmann et al. 1998), and the frog embryo teratogenicity assay (Xenopus) in an interlaboratory comparison (Fort et al. 1998). Validation requires certain considerations in study design, including defined endpoints for toxicity, an understanding of the procedure's ability to respond to chemicals that require metabolic activation, and the accuracy of the test's response to chemicals that cause developmental toxicity or no effect in whole animal studies (Kimmel et al. 1982; Kimmel and Kochhar 1990; Schwetz et al. 1991). Since most in vitro systems involve an interruption in normal metabolism and the biological interrelationships found in the intact system, the range of developmental effects that can be produced and the power of the study to detect an effect are compromised as compared to those obtained using standard study designs in whole animal systems (Kimmel 1990). For these and other reasons, in vitro developmental toxicity assays are unlikely to be used alone to screen chemicals for risk assessment purposes when there is no prior knowledge about the potential for developmental toxicity. 
In the case of priority-setting in early drug or chemical development, such assays may be useful for eliminating candidates whose toxicity can be detected in these systems and for advancing those with little or no toxicity, with the expectation that standard in vivo assays would be conducted before actual marketing. In vitro screens also may be useful for assessing the developmental toxicity of chemicals or chemical classes for which there is already some information about toxicity from in vivo studies for the purpose of describing the relative toxicity (potency) of members of chemical families. If chemicals are likely to act through a common mechanism, a single in vitro screen that is sensitive to a particular mechanism may predict the relative potencies of a class of chemicals. For example, an in vitro mouse limb bud cell screen has been used successfully to rank the relative teratogenic potential of a large series of synthetic retinoids (Kistler 1987). In addition, in vitro assays may be useful for studying complex mixtures for synergism or antagonism, and for evaluating the cumulative risk of two or more chemicals that have similar mechanisms or effects.

In vitro assays have become widely used for mechanistic studies in developmental toxicology (Harris 1997). An advantage to using in vitro assays for such studies is that they utilize decreasing levels of biological complexity to isolate specific developmental processes. In vitro assays are useful for identification of tissue sites of accumulation, initial biochemical insults, gene expression changes, structure-activity relationships, and disrupted developmental pathways. It is important to link the information developed in these assays to the whole tissue and organism events that are seen as a result of developmental toxicity in order to be most useful for risk assessment purposes. Such information can be employed in developing biologically based dose-response models for developmental toxicity (e.g., Shuey et al. 1994).

Interpretation

Box D-1 lists endpoints that can be used to assess developmental toxicity from standard testing studies. EPA (1991) published guidelines for developmental toxicity risk assessment that provide more detailed discussion of study result interpretation.

Observations on dams during the course of a study include regular examination for signs of toxicity and measurements of body weight. Assessment of food and water intake also can indicate toxicity and is essential to calculate actual test substance intake when the substance is administered in the diet or in drinking-water. When an agent is known to produce pharmacological or toxic effects, including sedation, respiratory depression, or hemolysis, such endpoints also are monitored. Maternal observations assess the relative contribution of maternal toxicity to any embryo-fetal toxicity observed. Maternal body weight before and after removal of the gravid uterus allows the determination of toxicity to the mother exclusive of effects on uterine content.

Examination of the uterus and its contents and of the ovaries of animals that are killed before parturition allows determination of the number of corpora lutea (a measure of the number of eggs released); implantations; live, dead, and resorbing fetuses; fetal weight; and sex. The number of implantation sites equals the number of live fetuses plus the number of dead embryos and dead fetuses. Preimplantation loss can be determined by subtracting the number of implantation sites from the number of corpora lutea. It is possible that the treatment can prevent implantation, and caution should be applied when interpreting the number of implantation sites and preimplantation loss. Dividing the number of resorptions (embryonic deaths) by the total number of implants gives a measure of postimplantation loss, subject to the same caution as above. It should be noted that postimplantation loss is sometimes expressed inclusive of fetal deaths. Uteri that show no signs of implantation at all can be stained with ammonium sulfide to reveal completely resorbed implantation sites (Salewski 1964).
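The arithmetic in the preceding paragraph can be sketched as follows. This is an illustrative example only: the class and field names are hypothetical, the counts are invented for demonstration, and resorptions are treated as the "dead embryos" term of the implantation sum.

```python
from dataclasses import dataclass


@dataclass
class UterineCounts:
    """Per-dam counts recorded at cesarean section (illustrative field names)."""
    corpora_lutea: int
    live_fetuses: int
    dead_fetuses: int
    resorptions: int  # embryonic deaths

    @property
    def implantations(self) -> int:
        # Implantation sites = live fetuses + dead embryos + dead fetuses.
        return self.live_fetuses + self.resorptions + self.dead_fetuses

    @property
    def preimplantation_loss(self) -> int:
        # Corpora lutea minus implantation sites.
        return self.corpora_lutea - self.implantations

    def postimplantation_loss(self, include_fetal_deaths: bool = False) -> float:
        # Resorptions divided by total implants; some laboratories also
        # count fetal deaths in the numerator.
        if self.implantations == 0:
            raise ValueError("no implantation sites; the ratio is undefined")
        lost = self.resorptions
        if include_fetal_deaths:
            lost += self.dead_fetuses
        return lost / self.implantations


# Invented example values for a single dam.
dam = UterineCounts(corpora_lutea=15, live_fetuses=11, dead_fetuses=1, resorptions=2)
print(dam.implantations, dam.preimplantation_loss)
print(round(dam.postimplantation_loss(), 3))
print(round(dam.postimplantation_loss(include_fetal_deaths=True), 3))
```

Because postimplantation loss is reported both with and without fetal deaths, making the convention an explicit argument helps avoid comparing values computed in different ways.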

Viable fetuses are examined for external, visceral, and skeletal malformations and variations, and the sex is determined. Individual fetal weight and identification allow external, visceral or skeletal findings to be linked to individual weights. Because there is a correlation between the number of fetuses in a litter and fetal weight, fetal weight can be analyzed with litter size as a covariate.

Source: Adapted from EPA 1991.

It is helpful to distinguish between early and late resorptions because dose-related effects can assist in determining the period during development that is sensitive to the test agent. Placental examination and weight might be of value in interpreting results. It is the commonly accepted practice for studies of rats, mice, and hamsters to allocate fetuses alternately for visceral or skeletal examination; this is done when fetal sectioning is used for visceral examination and the fetus cannot be examined skeletally. Where fresh microdissection is used, the fetuses can be examined both viscerally and skeletally, except for the head, which is usually fixed for head sections to examine the brain, eyes, nasopharynx, and other structures or processed with the skeleton to examine skeletal structures. For rabbit or larger fetuses, each fetus usually is examined both for visceral and skeletal effects. Several techniques are used for skeletal examination, including single-staining with alizarin red S, double-staining with alizarin red S and alcian blue to show both ossified bone and cartilage, or X-rays with or without intensification (Inouye 1976; Whitaker and Dix 1979).

Maternal and developmental endpoints are evaluated to interpret developmental toxicity data (EPA 1991). Of particular concern are agents for which there are no signs of toxicity to the maternal animal but that induce toxicity in the developing offspring, or when developmental effects are observed at doses below those causing toxicity in maternal animals. Another common situation is when adverse developmental effects occur only at doses that cause minimal maternal toxicity. In these situations, developmental effects should be attributed to developmental toxicity and should not be considered secondary effects of maternal toxicity. It is possible that the adult and the developing offspring are sensitive to the same dose of an agent. Also, it is important to note that maternal effects might be reversible whereas developmental effects could be permanent. Data on developmental effects can be difficult to assess when they occur at doses that cause severe maternal toxicity.

REPRODUCTIVE TOXICITY

Manifestations

Male Reproductive Toxicity

Expressions of male reproductive toxicity can involve alterations in the male reproductive organs or in related endocrine systems. Such alterations can include changes in sexual behavior (mating behavior, libido, erection, intromission, ejaculation), onset of puberty (delayed physical and behavioral development), fertility (achieving conception within a defined period), pregnancy outcome (production of normal quality and number of offspring), reproductive organ structure and morphology, reproductive endocrine parameters (including peptide and steroid hormone control), or other functions that compromise the integrity of the male reproductive system.

Female Reproductive Toxicity

Female reproductive toxicity includes adverse effects on reproductive organs and related endocrine systems. Endpoints that reflect toxicity include sexual behavior (receptivity to the male at appropriate times in the cycle), age at onset of puberty, fertility (the ability to produce offspring in normal number), gestation length, parturition, lactation, loss of primordial follicles, and age at reproductive senescence.

Types of Studies

Single-Generation Reproduction Study

The single-generation test can provide useful information on basic reproductive function (OECD 1983). It also provides information on the effects of subchronic exposure of peripubertal and adult animals.

In a single-generation reproduction study, males and females can be exposed during the same or separate trials to determine whether one or both sexes are affected. Daily dosing of male laboratory animals should begin when they are 5 to 9 weeks (wk) old and continue for 10 wk (for rats) or 8 wk (for mice) before the mating period. This schedule exposes the animals to an agent for the duration of one complete cycle of spermatogenesis (approximately 70 d in rats and 56 d in mice). Daily dosing of females should begin when they are 5 to 9 wk old and continue for at least 2 wk (OECD 1983) before mating. Females should continue to receive daily doses of the test agent throughout the 3-wk mating period, during pregnancy, and until offspring are weaned. At least three dose groups and one control group are usually included. Either one male to one female or one male to two female matings can be used, resulting in group sizes of at least 20 females and 10 or 20 males. The goal is to produce a minimum of 20 pregnant females per treatment group. Animals that have not mated or that remain infertile should be separated and studied for the cause of their infertility.
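The group-size and dosing arithmetic above can be sketched as follows. The function names are hypothetical, and the sketch covers only the 1:1 and 1:2 mating ratios and the two species mentioned in the text.

```python
import math

# Pre-mating dosing period for males, in weeks, as given in the text.
MALE_PREMATING_WEEKS = {"rat": 10, "mouse": 8}


def males_needed(females: int, females_per_male: int) -> int:
    """Number of males required per group for a 1:1 or 1:2 mating scheme."""
    if females_per_male not in (1, 2):
        raise ValueError("only 1:1 and 1:2 matings are described in the text")
    return math.ceil(females / females_per_male)


def male_premating_days(species: str) -> int:
    """Pre-mating dosing period for males, converted from weeks to days."""
    return MALE_PREMATING_WEEKS[species] * 7


print(males_needed(20, 1), males_needed(20, 2))  # 20 10
print(male_premating_days("rat"), male_premating_days("mouse"))  # 70 56
```

The day counts match the approximate durations of one cycle of spermatogenesis quoted above (70 d in rats, 56 d in mice).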

Animals are allowed to litter normally and rear their progeny until weaning. Optionally, by the removal of some pups, the litters can be standardized (normally on day 4 postpartum) to include an equal number of pups of each sex (Agnish and Keller 1997). It is considered inappropriate to remove only runts or any other deviant animals. Adjustment of litter size is not possible when there are fewer than eight animals per litter. The major advantage of the standardization of litter size is the diminished variability of pup and litter data, because dams have equal lactational challenges and pups have similar possibilities for growth. The disadvantages of standardization have been documented extensively elsewhere (Palmer 1986; Palmer and Ulbrich 1997) and include the disruption of the normal distribution of litter sizes; standardized litter sizes that are below the natural mean, median, and modal values normally observed for most rat and mouse strains; the elimination of large numbers of offspring that normally would survive; the introduction of human bias in selection; and the raising of mean body weight of pups and the lowering of the challenge to the lactating ability of the dam.
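Random selection is what keeps human bias out of litter standardization. The sketch below shows one way it could be done, under stated assumptions: the function name and data layout are hypothetical, four pups per sex are retained (the figure mentioned for developmental neurotoxicity studies), and a litter that cannot supply four of each sex is simply left unadjusted, which is a simplification.

```python
import random


def standardize_litter(pups, per_sex=4, rng=None):
    """Randomly retain `per_sex` males and `per_sex` females from one litter.

    `pups` is a list of (pup_id, sex) tuples with sex "M" or "F".
    Returns (kept, culled). A litter without enough pups of either sex
    is returned unchanged with nothing culled.
    """
    if rng is None:
        rng = random.Random()
    males = [p for p in pups if p[1] == "M"]
    females = [p for p in pups if p[1] == "F"]
    if len(males) < per_sex or len(females) < per_sex:
        return list(pups), []
    # Random sampling, never selection of runts or otherwise deviant pups.
    kept = rng.sample(males, per_sex) + rng.sample(females, per_sex)
    culled = [p for p in pups if p not in kept]
    return kept, culled


litter = [(i, "M") for i in range(6)] + [(i, "F") for i in range(6, 13)]
kept, culled = standardize_litter(litter, rng=random.Random(0))
print(len(kept), len(culled))  # 8 5
```

Passing a seeded `random.Random` makes the selection reproducible for the study record.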

The animals are observed daily throughout testing. Toxic effects, mortality, neurobehavioral changes, altered sexual behavior, and problems in parturition and lactation are recorded. Food consumption and weight of animals are measured at least weekly and, after parturition, on the same days that litters are weighed. Individual records of each parent test animal and litter are maintained. The time after pairing to achieve a sperm-positive smear (the precoital interval) and duration of pregnancy are recorded, and soon after delivery the number and sex of pups, stillbirths, live births, and the presence of gross anomalies in each litter are recorded. The pups are weighed at a minimum on the morning after birth, on days 4 and 7, and weekly thereafter. Dead pups and the excess pups killed at day 4, if the litter is standardized, should be studied for any defects. All abnormalities in the dams or offspring should be recorded.

At necropsy, the offspring are examined for structural abnormalities, particularly those of the reproductive organs, which also can be preserved for histopathological study. At a minimum, all parental animals and offspring that die during the test, those in the highest dose group, and the controls should be examined. Whenever there are gross abnormalities in an organ, the animals also must be examined microscopically.

The data should be treated with appropriate statistical methods. If one male is mated to two females, then nested statistical analysis must be performed based on the number of males used. A well-conducted, single-generation reproductive toxicity study should provide an estimate of a no-observed-adverse-effect level (NOAEL) and an assessment of adverse effects on fertility, parturition, lactation, and postnatal growth. Significant detrimental effects on any endpoint should be considered adverse.

The primary limitation of a single-generation toxicity study is that it provides no information on the breeding capacity of offspring. Other limitations are noted in Table D-1. An EPA workshop (Francis and Kimmel 1988) examined the value of the single-generation reproductive study and concluded that it is “insufficient to identify all potential reproductive toxicants, because it would exclude detection of effects caused by prenatal and postnatal exposures (including the prepubertal period) as well as effects on germ cells that could be transmitted to and expressed in the next generation” (EPA 1996a).

Multigeneration Reproduction Study

Several authorities have published guidelines on multigeneration reproductive assays (OECD 2000b; FDA 2000; EPA 1998b). For a discussion of multigeneration tests, see articles by Lamb (1988, 1989) and Christian (1986).

Multigeneration reproduction studies determine the potential of an agent to produce adverse effects on the male and female reproductive systems, in the embryo and fetus, and in the neonate. The bioassay examines a wide variety of endpoints related to reproduction, including effects on libido; germ cells; gametogenesis; fertilization; implantation; embryonic, fetal, and neonatal growth; development; parturition; lactation; and postweaning growth and maturity. The direct toxic effects of an agent on the pregnant dam can be evaluated. In addition, the recently revised guidelines include measures of estrous cyclicity and ovarian primordial follicle counts in parental and first filial (F1) females, and sperm parameters (number, motility, and morphology) in parental and F1 males. Development of the reproductive system and measures of sexual maturation (vaginal opening and preputial separation) are also included. Finally, organ weights of the reproductive organs, target organs, and brain, spleen, and thymus are included. Because of its study design, a multigeneration reproduction study can provide data that cannot be developed from other standard testing protocols. The observed effects can be different from those seen in other (e.g., subchronic) studies.

The parental animals (P generation) are treated with an agent, usually via ingestion, for at least 10 wk before mating. Females continue to be exposed during gestation and lactation. Each dam can produce one to three litters, depending on whether the outcome in the first litter is unequivocal or confirmation of findings is required. This gives some flexibility in the protocol (multiple litters are needed only when initial results are equivocal), and it applies to P and F1 generations. If the effects in the F2 (second filial) generation are more marked than in the F1 generation, additional generations can be examined to clarify potential transgenerational effects.

Dosing of all generations is continuous throughout the study. During lactation, pups receive the test substance through the dam's milk and later from the treated food or drinking-water. If inhalation exposure is used, grooming of the fur can lead to additional exposure to the test material. Coprophagia by the pups is another possible route of exposure. Upon reaching sexual maturity, at least one male and one female from the F1 generation are selected from each litter for mating with another pup from a different litter but exposed to the same dose. F1 generation rats are treated for at least 13 wk and F1 mice are treated for at least 11 wk before mating.
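The requirement that F1 animals be mated with a pup from a different litter in the same dose group can be met in several ways. The sketch below uses a simple rotation, which is an assumption of this example and not a method prescribed by the guidelines; the function name and tuple layout are hypothetical.

```python
def pair_f1(litters):
    """Pair the selected male of each litter with the female of the next litter.

    `litters` is a list of (litter_id, male_id, female_id) tuples, all from
    one dose group, with unique litter IDs. Rotating by one position
    guarantees that no siblings are paired when there are at least two litters.
    """
    if len(litters) < 2:
        raise ValueError("at least two litters are needed to avoid sibling matings")
    pairs = []
    for i, (_, male_id, _) in enumerate(litters):
        _, _, female_id = litters[(i + 1) % len(litters)]
        pairs.append((male_id, female_id))
    return pairs


group = [("L1", "m1", "f1"), ("L2", "m2", "f2"), ("L3", "m3", "f3")]
print(pair_f1(group))  # [('m1', 'f2'), ('m2', 'f3'), ('m3', 'f1')]
```

Shuffling the litter order before calling the function would add randomization while keeping the no-sibling guarantee.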

The study report must include the following data:

  • Species and strain.
  • Toxic response data by sex and dose, including indices of mating, fertility, gestation, birth, viability, and lactation; offspring sex ratio; time-to-mating (including the number of days until mating and the number of estrous cycles until mating); duration of gestation.
  • Day of death if that occurs during the study.
  • Toxic or other effects on reproduction and pre- or postnatal growth of the offspring.
  • Developmental data, such as anogenital distance (triggered in F2 pups if positive findings are noted in the F1 animals), age of vaginal opening, and preputial separation.
  • Number of P and F1 females with normal cycles and cycle length.
  • Day of recording an abnormal effect and its subsequent course.
  • Body weight data by sex for each generation.
  • Dietary intake and food efficiency (body weight gain per food consumed), and test substance consumption for P and F1 animals, except for the period of cohabitation.
  • Sperm evaluation data, including total cauda epididymal sperm counts, percentage of progressively motile sperm, percentage of morphologically normal sperm, and percentage of sperm with each identified abnormality.
  • Stage of estrous cycle at the time of death for P and F1 females.
  • Necropsy findings.
  • Implantation data and postimplantation loss calculations for P and F1 females.
  • Absolute and relative organ weight data.
  • Detailed descriptions of all histopathological findings.
  • Adequate statistical analyses.
  • A copy of the study protocol.
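Two of the reported quantities in the list above, food efficiency and test substance consumption, are simple ratios. The following sketch shows how they might be computed from feeding records; the function names, units, and example values are assumptions made for illustration.

```python
def food_efficiency(weight_start_g, weight_end_g, food_consumed_g):
    """Body weight gain per unit of food consumed over the same interval."""
    if food_consumed_g <= 0:
        raise ValueError("food consumption must be positive")
    return (weight_end_g - weight_start_g) / food_consumed_g


def substance_intake_mg_per_kg_day(food_g_per_day, diet_conc_mg_per_kg, body_weight_g):
    """Daily test substance intake from treated diet, per kg of body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    substance_mg_per_day = (food_g_per_day / 1000) * diet_conc_mg_per_kg
    return substance_mg_per_day / (body_weight_g / 1000)


# Invented example: a 200 g animal gains 35 g while eating 140 g of diet.
print(food_efficiency(200, 235, 140))  # 0.25
# Invented example: 20 g/day of diet containing 500 mg/kg, 250 g body weight.
print(round(substance_intake_mg_per_kg_day(20, 500, 250), 6))  # 40.0
```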

The multigeneration study is probably the most complex type undertaken for regulatory purposes and provides information on toxicity that follows treatment throughout the entire reproductive cycle, except that it does not evaluate reproductive senescence other than the evaluation of primordial follicles in females. Other limitations are noted in Table D-1. In general, significant detrimental effects on any endpoints or on indices derived from the data should be considered adverse. EPA (1996a) provides a detailed discussion on adverse effects.

Reproductive Assessment by Continuous Breeding Study

The U.S. National Toxicology Program (NTP) has developed a test protocol for evaluating toxicity through a reproductive assessment by continuous breeding (RACB) study design (Lamb 1985; Gulati et al. 1991). The protocol was originally developed for mice as a faster and more cost-effective alternative to the conventional regulatory reproductive toxicity studies, but it also has been used successfully with rats. After a 1-wk pretreatment period, males and females are housed as breeding pairs in individual cages and allowed to mate continuously for 14 wk. Exposure to the test substance (usually in feed or drinking-water) is continued throughout the study, and the offspring are removed from the cage immediately after parturition. After the cohabitation breeding period, the pair is separated and the last litter is raised to weaning. Pups from these litters can then be selected, and treatment is continued. The pups are used in a mating trial to evaluate effects in the second generation in a manner comparable to that described previously for the multigeneration design.

The same endpoints (fertility, pups per litter, pup weight, sex, survival) are studied in the RACB protocol as in the standard multigeneration protocol. It is the time between litters and the progressive effects on fertility and reproduction that are specific to the RACB study design. The difference is that the RACB study design allows more than one litter to be examined per generation and can give an indication of subfertility and infertility. Adverse effects that might not be noted in the first mating may become evident later due to longer exposure time; such findings would not normally be detected in the conventional studies.
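The time between litters can be derived directly from delivery records for each breeding pair. The sketch below is illustrative only: the function name is hypothetical, delivery dates are expressed as study days, and the example values are invented.

```python
def interlitter_intervals(delivery_days):
    """Days between successive litters for one breeding pair.

    `delivery_days` holds the study day of each delivery during cohabitation.
    """
    days = sorted(delivery_days)
    return [later - earlier for earlier, later in zip(days, days[1:])]


# Invented example: one pair delivering four litters during cohabitation.
deliveries = [22, 43, 65, 90]
print(len(deliveries), interlitter_intervals(deliveries))  # 4 [21, 22, 25]
```

Comparing the number of litters and the trend in these intervals between treated and control pairs is one way the progressive effects described above might be summarized.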

The RACB does not give information on specific male and female reproductive effects unless cross-breeding of control and treated males and females is done following the 14-wk mating trial. It also does not provide information on effects in the second generation unless F1 pups are raised to breeding age and mated to produce a second generation as in the multigeneration study design. Other limitations are noted in Table D-1.

Serial Mating Study (Dominant Lethal Study)

If a single-mating trial results in an adverse effect attributable to the male, it is difficult to determine the developmental stage in which the disruption occurs. It is well known that different stages of spermatogenesis are variably sensitive to toxic effects and that each toxic substance can affect different sperm cell populations (Parvinen 1982). Spermatogonia, for example, are sensitive to cyclophosphamide in experiments conducted in mice (Toppari et al. 1990), whereas spermatocytes are disrupted by ethylene glycol monomethyl ether in experiments conducted in rats (Chapin et al. 1985). The action of a compound that primarily affects the somatic Sertoli cells of the testis, for example, m-dinitrobenzene (Foster 1989), will produce an extensive period of infertility because of adverse effects on the function of these cells at various stages of germ cell differentiation.

Serial mating makes it possible to assess the sensitive stages of spermatogenesis and susceptible cell types. This information can be obtained from a specific serial-mating trial or from a similar protocol used for dominant lethal testing (OECD 1984; EPA 1998d). Adult males (usually rats) are exposed before mating, typically for 1-5 d, with 20 males per dose group, after which they are mated to one to three females weekly for the next 8-10 wk. Adverse effects on male reproduction are manifested as decreased numbers of implantation sites in uteri (indicative of failure of fertilization or preimplantation loss) and increased early fetal mortality (indicative of postimplantation loss or dominant lethality). To examine the uterine contents, dams are killed before parturition (e.g., on days 13-18 of pregnancy).
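The uterine findings described above can be tabulated by mating week. The sketch below is illustrative only: the record layout, field names, and example counts are assumptions, and postimplantation loss is expressed here simply as dead implants divided by total implantation sites, which is one common convention rather than a formula stated in the guidelines cited above.

```python
from dataclasses import dataclass


@dataclass
class UterineExam:
    week: int           # mating week after the end of male exposure
    implants: int       # total implantation sites in the uterus
    dead_implants: int  # early fetal deaths among those implants


def summarize_by_week(exams):
    """Mean implants per female and fraction of dead implants, per mating week."""
    totals = {}
    for exam in exams:
        n, implants, dead = totals.get(exam.week, (0, 0, 0))
        totals[exam.week] = (n + 1, implants + exam.implants, dead + exam.dead_implants)

    summary = {}
    for week, (n, implants, dead) in sorted(totals.items()):
        summary[week] = {
            "mean_implants": implants / n,
            # Undefined when no female in that week had any implants.
            "postimplantation_loss": dead / implants if implants else None,
        }
    return summary


# Hypothetical examinations for females mated to treated males.
exams = [UterineExam(1, 12, 1), UterineExam(1, 10, 1), UterineExam(2, 6, 3)]
for week, row in summarize_by_week(exams).items():
    print(week, row)
```

Comparing these weekly summaries with those of concurrent controls is what allows an effect to be tied to a particular mating week.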

Any adverse effects can then be attributed to specific cell populations by back-calculation on the basis of the well-known kinetics of spermatogenesis (Chapin et al. 1985). The test was originally designed for detection of germ cell mutagenicity, and it requires a large number of female animals (e.g., an 8-wk trial would use 160-480 females), which is a disadvantage.
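The 160-480 figure can be reproduced from the protocol numbers given earlier (20 males per dose group, one to three females per male per week, 8 wk). The sketch below assumes that each male is paired with previously unmated females every week and that the figure refers to a single dose group; the function name is illustrative.

```python
def females_required(males_per_group: int, females_per_male_per_week: int, weeks: int) -> int:
    """Females needed for one dose group when every male gets new females each week."""
    return males_per_group * females_per_male_per_week * weeks


low = females_required(20, 1, 8)
high = females_required(20, 3, 8)
print(f"{low}-{high} females per dose group")  # 160-480 females per dose group
```

Under the same assumptions, a 10-wk trial with three females per male per week would need 600 females per dose group.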

The limitations of serial-mating trials are similar to those shown in Table D-1 for other reproduction studies, except for the identification of stage of spermatogenesis affected. Additional endpoints of male reproductive toxicity and effects other than death of the offspring are not evaluated unless included in the protocol.

Total Reproductive Capacity

The total reproductive capacity study, a variant of the continuous-breeding study, is designed to assess ovarian toxicity. Female fetuses are particularly susceptible to agents that can adversely affect germ cells because development of the oocyte occurs prenatally; no new germ cells develop after birth. Female animals are exposed to a test substance for a short period in utero (i.e., days 9-16 of gestation) (McLachlan et al. 1981) or postnatally (Generoso et al. 1971) and allowed to mate with a single male as long as the females remain fertile. The numbers of litters and offspring are compared with those of control animals to estimate the loss of oocytes resulting from the exposure.

Total reproductive capacity studies have been designed with the specific purpose of evaluating female reproductive capacity and are not tests of general reproduction function.

Interpretation

Well-conducted multigeneration and continuous-breeding studies can provide data that demonstrate changes in the key parameters of male and female fertility and reproduction. Statistically significant, dose-related changes in the indices listed in Table D-2 provide sufficient evidence of reproductive toxicity but by themselves do not identify the affected sex. Because most multigeneration or continuous-breeding studies place test males with females treated at the same dose, they cannot identify which sex is affected. Although such studies are the most typical way to evaluate the reproductive toxicity of an agent, most provide insufficient evidence of whether the agent causes male or female reproductive toxicity in animals. There is, therefore, a need for additional data, which, in fact, can come from the same study. For example, evidence of gonadal toxicity measured by testicular weight or altered morphology can provide sufficient evidence that an agent is a male reproductive toxicant or add weight to evidence that it is not a male reproductive toxicant. Likewise for females, evidence of ovarian toxicity measured by weight changes and altered morphology can provide sufficient evidence for female reproductive toxicity. Another way to provide sufficient evidence of male or female reproductive toxicity would be to mate the treated animal of one sex to the untreated animal of the other sex.

Male Indices

Organ Weight

A statistically significant, dose-related decrease in absolute or relative testicular weight is generally sufficient evidence that an agent can cause reproductive toxicity in animals. Most agents that cause testicular toxicity also cause decreases in testicular weight, but if they cause edema, the testicular weight increases. Decreases in testicular weight can be considered sufficient evidence of toxicity by themselves, but increases must be explained by other endpoints, such as morphology. Any changes also must be considered in light of the systemic toxicity elicited by the test chemical. Severe systemic toxicity brings into question not only the organ weight data, but also the relevance of any other reproductive effects.
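To illustrate the distinction between absolute and relative organ weight, the sketch below expresses testis weight as a percentage of body weight and averages both measures by dose group. The tuple layout, dose labels, and all weights are hypothetical; a real evaluation would also apply an appropriate statistical test for a dose-related trend, which is not shown here.

```python
from statistics import mean


def relative_organ_weight(organ_weight_g: float, body_weight_g: float) -> float:
    """Organ weight as a percentage of body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return 100.0 * organ_weight_g / body_weight_g


def group_means(animals):
    """animals: iterable of (dose, testis_g, body_g).

    Returns {dose: (mean absolute weight in g, mean relative weight in %)}.
    """
    by_dose = {}
    for dose, testis_g, body_g in animals:
        by_dose.setdefault(dose, []).append(
            (testis_g, relative_organ_weight(testis_g, body_g))
        )
    return {
        dose: (mean(a for a, _ in rows), mean(r for _, r in rows))
        for dose, rows in sorted(by_dose.items())
    }


# Hypothetical data: treated animals lose both testis weight and body weight.
animals = [(0, 3.4, 400.0), (0, 3.6, 400.0), (100, 3.0, 300.0), (100, 3.0, 300.0)]
for dose, (absolute, relative) in group_means(animals).items():
    print(dose, round(absolute, 3), round(relative, 3))
```

In this made-up example the absolute testis weight falls in the treated group while the relative weight rises, because body weight fell proportionally more. That is one reason organ weight changes have to be read alongside the systemic toxicity of the test chemical.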

Weight changes in male accessory sex organs can indicate significant functional effects. Both the seminal vesicles and the prostate, for example, contain a large proportion of luminal fluid that can decrease rapidly when androgenic hormone concentrations decline. Epididymal weight is largely affected by the number of sperm present in the epididymis. Statistically significant, dose-related decreases in the weight of the epididymis would be sufficient evidence of male effects. Decreases in the weight of the seminal vesicles or ventral prostate can be sufficient evidence of male reproductive toxicity, but are more useful if supplemented by data on endocrine effects. Changes in pituitary weight alone would typically be insufficient evidence of male reproductive toxicity, because pituitary weight is an inaccurate indicator of changes in pituitary function, which are best measured by other parameters, such as hormone concentrations. Furthermore, only a small portion of the gland is involved with reproductive function.

Organ Morphology

Changes in testicular morphology are best observed when the tissues are preserved by optimal methods. The best evaluations can be done on testes fixed by perfusion and embedded in a plastic, such as glycol methacrylate. More conventional, but still quite acceptable, morphologic investigations can be performed on testes fixed by immersion in Bouin's fixative, embedded in paraffin, and stained with PAS. Formalin fixation and paraffin embedding of testes is an inferior and generally inadequate method for the study of testicular pathology because it will reveal only the most severe effects. In formalin-fixed and paraffin-embedded tissues, only the most severe changes in the seminiferous epithelium of the testis could be considered sufficient evidence of male effects. The sensitivity of these evaluations can be substantially improved by more careful fixation, embedding, and observation techniques. Low-quality morphological techniques, such as formalin fixation and paraffin embedding, are never sufficient to show that an exposure did not produce testicular toxicity.

Morphological changes in accessory sex organs are less common, but clear treatment-related effects also can provide sufficient evidence of male effects.

Sexual Behavior

Fertility studies do not incorporate measures of sexual behavior, but they indirectly measure endpoints that can be altered by effects on sexual behavior. These measurements include collecting vaginal smears to check for the presence of sperm or checking vaginal plugs as evidence of mating. An azoospermic male, however, might have normal sexual behavior but will not have a “sperm-positive” mating. Thus, even though a decrease in sperm-positive matings can be sufficient evidence of reproductive toxicity, it would not be sufficient evidence of abnormal sexual behavior. If a study does measure sexual behavior, mounting frequency, intromission, ejaculation number, and latency can be measured. More detailed studies of sexual behavior (Zenick and Clegg 1989) would be helpful, but are rarely done.

Sperm Evaluation

In mice and rats, sperm motility and count are relatively sensitive and reliable indicators of male reproductive toxicity (Morrissey et al. 1988a,b). Statistically significant, dose-related decreases in these parameters would constitute sufficient evidence of male reproductive toxicity, even if fertility is not adversely affected. Sperm morphology changes, if statistically significant and dose-related, would be sufficient evidence of reproductive toxicity. Experience has shown, however, that sperm morphology changes in rodents are fairly insensitive indicators of reproductive toxicity (Morrissey et al. 1988a,b) even though they can be good indicators of reproductive dysfunction in humans.

Sperm evaluations in rats and mice are nearly always limited to the terminal sacrifice of the test animals because it is extremely difficult to collect semen samples from such small animals. Because investigators can collect whole semen samples from rabbits and domestic animals, however, it is possible to assess and follow progressive changes in semen in these animals over time. The potential advantages to conducting sperm assessments in rabbits include the ability to assess the same parameters (morphology, motility, sperm count) at successive points. Studies have shown that large decreases in semen parameters must occur before there are noticeable changes in fertility. Statistically significant, dose-related decreases in semen quality, however, could constitute sufficient evidence that an exposure causes reproductive effects in the test species.

Endocrine Evaluations

If adequately designed studies detect changes in concentrations of gonadal steroid or gonadotropic pituitary hormones, these endocrine parameters do provide sufficient evidence of reproductive toxicity. Typically, adequate studies that show toxicity will have multiple samples obtained in a well-defined context that includes sex, age, reproductive state, day of cycle, and so on. Endocrine changes that indicate toxicity will include both multiple values outside the normal physiological ranges and physiologically plausible changes in direction in hormone concentrations.

Biochemical Markers of Reproductive Exposure and Effect

Various markers of exposure and effect have been investigated in male reproductive toxicology, including prostatein, androgens, and prolactin (NRC 1989). Sertoli cell enzymes or biochemical secretory products, measured in vitro and in vivo as markers of cell function, are other examples of useful endpoints for studying target organ or cell responses. Currently, however, they cannot be considered evidence of male reproductive toxicity.

In Vitro Methods

There are methods for culturing various cells from the male reproductive system, such as pituitary cells, Sertoli cells, and germ cell-Sertoli cell cocultures. Although these investigations help elucidate mechanisms of action, they cannot by themselves generate sufficient evidence of reproductive toxicity.

Female Indices

Several endpoints listed in Table D-2 can provide evidence for female reproductive toxicity. For example, when a continuous-breeding study shows an adverse effect, it is desirable that the study also mate each member of a breeding pair to an untreated control to identify which member is affected by the agent. If a study has not taken this step, it cannot be said with certainty that the observed effect is the result of female reproductive toxicity; it can be equally likely that a male effect or a couple effect is involved.

Because most standard animal reproduction studies do not observe mating, they do not contain evaluations of an agent's effect on sexual behavior. If a study does report observations of mating, the failure of female rodents to assume a lordotic position and to accept mounting is evidence of abnormal sexual behavior. Additional signs include running from or fighting with the male (Uphouse and Williams 1989; Uphouse 1985).

Cytology Abnormalities

Abnormal findings for estrous animals include persistent estrus, prolonged diestrus, or anestrus (May and Finch 1988). To characterize the estrous cycle in appropriate experimental animals, studies can use vaginal cytology or other cyclic signs in animals that menstruate, including humans. These parameters can give information on whether cycling has discontinued or whether segments of the cycle are altered in length. Because estrous cycle length has a normal variation, it is also possible to evaluate changes in the distribution of cycle lengths. The interpretation of these data is, however, open to question. Vaginal cytology data can also be incorporated into such protocols as the continuous-breeding test, the subchronic study, and the two-generation reproduction study (Morrissey et al. 1988a,b; EPA 1998b; OECD 2000b; FDA 2000). Alterations in the distribution of estrous or menstrual cycle length alone have not been shown to be reliable predictors of reproductive toxicity. By themselves, these alterations would be insufficient to identify an agent as a reproductive toxicant.
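As an illustration of how daily vaginal cytology records might be turned into cycle lengths, the sketch below measures the number of days between successive onsets of estrus and the longest consecutive run of a given stage. The one-letter stage codes (P, E, M, D for proestrus, estrus, metestrus, diestrus) and the example record are assumptions; no cutoff for "persistent" estrus or "prolonged" diestrus is implied, and, as noted above, such data alone would not identify a reproductive toxicant.

```python
def cycle_lengths(stages):
    """Days between successive onsets of estrus ("E").

    An onset is a day scored "E" whose preceding day was scored as another
    stage, so an estrus already under way on the first day is not counted.
    """
    onsets = [
        i for i, stage in enumerate(stages)
        if stage == "E" and i > 0 and stages[i - 1] != "E"
    ]
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def longest_run(stages, stage):
    """Longest number of consecutive days scored as the given stage."""
    best = run = 0
    for s in stages:
        run = run + 1 if s == stage else 0
        best = max(best, run)
    return best


# Hypothetical 14-day record for one female.
record = list("DPEMDDPEMDDDPE")
print(cycle_lengths(record))     # [5, 6]
print(longest_run(record, "D"))  # 3
```

Pooling cycle lengths across animals in each dose group would give the distribution of cycle lengths referred to in the text.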

Weight and Morphology Changes

A statistically significant decrement in ovarian or uterine weight in a study properly controlled for cyclic variation is worthy of consideration and should signal the need for additional studies. Similarly, an increase in uterine weight in an acyclic or castrate animal, or in a study that controls for cyclic variation, should raise concern about possible estrogenicity of the test agent and should suggest that additional studies are needed. Neither of these parameters, as an isolated endpoint, is sufficient to characterize an agent as a reproductive toxicant. Evaluation of the ovary often includes counts of follicles or subpopulations of follicles (Pederson and Peters 1968; Heindel 1999). A decrease in the number of ovarian follicles or a change in follicle subtype, however, is evidence of reproductive toxicity.

Biochemical Changes

Secretion products of the uterus can be obtained with uterine lavage (Teng et al. 1986). Changes in uterine secretions could be useful for characterizing alterations associated with treatment; because these changes can be cycle dependent, however, they can be difficult to interpret. To date, the characterization of normal changes in uterine secretory products is incomplete. Such changes alone are therefore insufficient to characterize an agent as a reproductive toxicant.

Alterations in Age at Puberty or Reproductive Senescence

In animals with estrous cycles, the onset of puberty is marked by vaginal opening. Reproductive senescence may manifest as persistent vaginal estrus followed by anestrus. A change in the age at puberty or reproductive senescence is sufficient to characterize reproductive toxicity, although it is desirable to have supporting data that explain the mechanism of toxicity.

Endocrine Parameters

In estrous and menstrual animals, the reproductive cycle is characterized by the production of sex steroids from the ovary in response to pituitary gonadotropins, which are under hypothalamic control. It is possible to measure the relevant hormones, but evaluators must keep in mind that the hormones are produced in a pulsatile fashion, with cyclic variation in the amplitude and frequency of the pulses. For this reason, single static measures are unlikely to be informative unless a result is well outside the normal ranges (e.g., castrate concentrations of gonadotropins). Other strategies for evaluating endocrine parameters include serial measurements of hormones in blood at short intervals, and response of an endocrine measure to a stimulus. In the serial measurement strategy, frequent sampling permits the construction of a profile of the hormone change over time, which can disclose the pulse pattern. This method is difficult in animals with small blood volumes where frequent sampling may produce its own effects.

The second method, response of an endocrine measure to a stimulus, involves sampling an animal at a fixed time after administration of a releasing factor. One can, for example, measure luteinizing hormone after injecting gonadotropin-releasing hormone or measure progesterone after injecting chorionic gonadotropin (Hughes 1988). The disadvantage of this method is the possibility that the injection of the releasing agent will cause an atypical physiological situation, so that one cannot extrapolate the effect it “unmasks” to unmanipulated animals.

If changes in concentrations of gonadal steroid or gonadotropic pituitary hormones are detected in adequately designed studies, these endocrine parameters do provide sufficient evidence of reproductive toxicity. Results that show multiple values outside the normal physiological ranges, changes in hormone concentrations in physiologically plausible directions, or failure of key hormonal events (such as luteinizing hormone surge, preovulatory estradiol rise, maintenance of luteal phase progesterone production) provide sufficient evidence of reproductive toxicity.

In Vitro and Perfusion Systems

Tissue culture methods have been used to study ovary slices in vitro, and cell culture methods have been used for studying granulosa cells and myometrial cells. In culturing ovary slices or granulosa cells, investigators often use the release of sex steroids into the medium as an outcome parameter. Under some conditions, granulosa cells will luteinize, producing a range of steroid and nonsteroid products; of these, progesterone is measured most commonly. Some studies, however, have measured other products, including nonsteroidal substances (Haney et al. 1984; Teaff et al. 1990). Some cell culture studies have made use of the contractile properties of myometrial cells for evaluating the potential of agents to alter uterine activity. In all of these test systems, the artificial nature of the in vitro setting can limit the predictive value of the results.

Ovaries perfused in vitro are useful systems for studying the mechanical aspects of ovulation. The preparations allow observations on the effects of agents in preventing rupture of the follicle and expulsion of the oocyte. The perfusion system is artificial, however, and the relocation of the ovary from peritoneal cavity to the perfusion chamber can alter the mechanical features of the system. For this reason, data from perfusion studies are not, in themselves, sufficient for drawing conclusions about an agent's reproductive toxicity.

Any change observed in an in vitro or organ perfusion system should be considered supplemental. Isolated findings of studies that use these systems are insufficient to characterize an agent as causing reproductive toxicity.

Breast Milk

Changes in breast histopathology or in breast milk amount or composition should signal the need for additional studies, and in particular, the need for studies that evaluate the effect of such changes on the nourishment and health of the offspring. The mere presence of xenobiotics in milk is not, by itself, evidence of toxicity; however, if a test agent is concentrated in milk, this should prompt recognition of the need for studies on the nursling. Conversely, if an agent is not transferred into the milk in rodent studies, but it is clear that exposure to critical organ systems continues in utero at the same developmental stages in humans, it may be appropriate to conduct direct dosing studies in rodents to determine any potential effects on the structural and functional development of these systems.

Tables

TABLE D-1 Types of Animal Studies Used to Assess Reproductive and Developmental Toxicity

| Study Type | Purpose | Protocol | Endpoints | Limitations | Reference |
| --- | --- | --- | --- | --- | --- |
| Single-generation reproduction | Provides basic information on potential of agent to produce adverse effects on male and female reproduction | Before mating, males exposed for complete cycle of spermatogenesis and epididymal transit time; females exposed for at least 2 estrous cycles | Toxic effects, mortality, neurobehavioral changes, altered sexual behavior, problems in parturition and lactation, time to positive sperm smear, duration of pregnancy, number and sex of pups, stillbirths, live births, presence of gross abnormalities, weight changes, and structural abnormalities | Does not provide information on breeding capacity of the F1 generation, effects expressed after weaning, individual male and female effects, reproductive senescence, reversibility, other specific functional developmental effects, time of effect initiation, structural anomalies of offspring (generally), internal dose | OECD 1983 |
| Multi-generation reproduction | Determines potential of agent to produce adverse effects on male and female reproductive systems and on the embryo, fetus, and neonate | Before mating, males and females exposed 10 wk; offspring exposed through lactation and after weaning, by individual treatment; dosing of all generations is continuous throughout study | Libido; estrous cyclicity, ovarian histopathology including quantification of primordial follicles in P and F1 females; sperm parameters (count, motility, morphology) in P and F1 males; fertilization; implantation; embryonic, fetal, neonatal growth and development; parturition; lactation; post-weaning growth and sexual maturity; development of reproductive organs; brain, spleen, and thymus weights | Does not provide information on reproductive senescence, reversibility, detailed functional developmental effects other than on the reproductive system, time of effect initiation, structural anomalies of offspring (generally) | OECD 2000b; FDA 2000; EPA 1998b |
| Prenatal developmental toxicity | Determines potential of agent to produce adverse effects on animals exposed during gestation | Animals (rodent and nonrodent) exposed at least from implantation until just prior to parturition | External, visceral, and skeletal malformations and variations, weight, preimplantation and postimplantation loss | Does not provide information on reversibility and repair of specific effects, malformed offspring may die before observation, low power for detecting malformations, treatment does not usually cover period before implantation, function of fetal organs not evaluated, restricted macroscopic examination, specific susceptible period of development cannot be identified, limited evaluation of maternal and adult toxicity | OECD 2000a; EPA 1998a; FDA 2000 |
| Developmental neurotoxicity | Assesses potential neurotoxicity from exposure during critical stages of development | Pregnant animals exposed during gestation and lactation to postnatal day 10 | Offspring tested for gross neurological and behavioral disorders, motor activity, response to auditory startle, learning and memory, neuropathological effects, brain weight | Exposure over whole period of postnatal development is not included, limited assessment of learning and memory | EPA 1998c |
| Serial mating (dominant lethal) | Assesses stages of spermatogenesis and cell types in male reproduction for sensitivity to an agent | Adult males exposed for 1-5 d before mating, then mated to 1-3 females weekly for 8-10 wk | Number of implantation sites in uteri, early fetal mortality | Does not provide information on reversibility, several general reproduction parameters as stated under the limitations section of the single- and two-generation reproduction studies, and endpoints of male reproductive toxicity | OECD 1984; EPA 1998d |
| Continuous breeding (RACB) | Similar to multigeneration except that reproductive capacity over a 14-wk period is also assessed | Males and females (rats and mice) treated for 1 wk before mating and for 14-wk mating period. Litters removed at birth, examined, and discarded, except for last litter, which is weaned, raised to breeding age, and mated to evaluate effects in second generation | Same as those in the multigeneration studies, as well as time between litters and progressive effects on fertility and reproduction | Does not provide information on individual male and female parental effects, reversibility, specific functional developmental effects in offspring, time of effect initiation, internal and skeletal anomalies in offspring, sexual behavior | Reviewed in Lamb 1985 |
| Total reproductive capacity | Assesses ovarian toxicity | Females (usually mice) exposed to an agent for a short period in utero or postnatally and allowed to mate with a single male while female remains fertile | Number of litters and offspring | Does not provide information on reversibility, types of toxicity other than ovarian, effects in males | McLachlan et al. 1981; Generoso et al. 1971 |

TABLE D-2 Indices of Fertility and Reproductive Function

[Table D-2 appears only as an image in the source; its contents are not reproduced here.]

Boxes

Box D-1 Developmental Toxicity Endpoints from Standard Testing Protocols

Endpoints typically measured at terminal phase of pregnancy

Preimplantation loss

Implantation site

Corpora lutea

Resorptions and fetal death

Live offspring with malformations and variations

Affected (nonlive and malformed) conceptus

Fetal weight

Endpoints that can be measured postnatally

Stillbirth

Offspring viability (birth, within the first week, weaning, etc.)

Offspring growth (birth, postnatal)

Physical landmarks of development (e.g., vaginal opening, balanopreputial separation)

Neurobehavioral development and function (actual endpoints measured depend on the function or organ system being studied)

Reflex development

Locomotor development

Motor activity

Sensory function

Social-reproductive behavior

Cognitive function

Neuropathology and brain weight

Reproductive system development and function

Vaginal opening

Onset of estrus

Balano-preputial separation

Ovarian cyclicity

Quantitation of ovarian primordial follicles

Sperm measures (e.g., morphology, motility, number)

Fertility

Pregnancy outcome

Other organ system function (e.g., renal, cardiovascular)

Copyright 2001 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK222201