MATERIALS AND METHODS

Procurement and Characterization

AZT

3′-Azido-3′-deoxythymidine (AZT) was obtained from Cipla Ltd., Mumbai Central (Mumbai, India) in one lot (FX4159) used in the 30- and 45-week studies and the 45-week stop-study. Identity and purity analyses were conducted by the study laboratory at the National Center for Toxicological Research (NCTR; Jefferson, AR) and Galbraith Laboratories, Inc. (Knoxville, TN). Reports on analyses performed in support of the AZT studies are on file at the NCTR.

Lot FX4159 of the chemical, a white crystalline powder, was identified as AZT by the study laboratory using proton nuclear magnetic resonance (NMR) spectroscopy, direct exposure probe-electron ionization mass spectrometry, and high-performance liquid chromatography (HPLC) coupled with electrospray ionization mass spectrometry. All spectra were consistent with literature spectra and the structure of AZT. The melting point range of lot FX4159 was determined to be 122.5° to 123.4° C by Galbraith Laboratories, Inc.; these values were consistent with those reported in the literature for AZT crystallized from water.

Karl Fischer titration and elemental analyses of lot FX4159 were performed by Galbraith Laboratories, Inc., and the study laboratory determined the purity of the bulk chemical by HPLC. Karl Fischer titration indicated less than 0.46% water. Elemental analyses for carbon, hydrogen, nitrogen, and oxygen were in agreement with the theoretical values for AZT. HPLC detected no impurity peaks by comparisons to spectra from previously characterized AZT standards, and the purity of lot FX4159 was determined to be 100% under the conditions of the assay.

Dosing Vehicle

The vehicle used for dose formulations in the 30- and 45-week studies and the 45-week stop-study was a 0.2% methylcellulose/0.1% Tween® 80 aqueous solution. Methylcellulose was obtained from Sigma-Aldrich® Corporation (St. Louis, MO) in one lot (014K0081). Identity studies of lot 014K0081 were performed by the study laboratory using proton and carbon-13 NMR spectroscopy; the results of these analyses were consistent with those obtained previously for a methylcellulose standard obtained from Fisher Scientific (Pittsburgh, PA). Tween® 80 was obtained from Sigma-Aldrich® Corporation in one lot (073K00641).

Preparation and Analysis of Dose Formulations

The dose formulations were prepared by mixing AZT with the dosing vehicle to give the required concentrations. The dose formulations were stored at room temperature in sealed amber glass bottles for up to 29 days.

Stability studies of 0.05 and 0.20 mg/mL formulations were performed by the study laboratory using HPLC. Stability was confirmed for at least 29 days for formulations stored in sealed amber glass bottles at room temperature.

Periodic analyses of the dose formulations of AZT were conducted by the study laboratory using HPLC. Accuracy of dose delivery from the dosing apparatus was also periodically determined using HPLC. Of the dose formulations analyzed and used during the studies, 209 of 211 were within 10% of the target concentrations (Table F2). For the dose accuracy of delivered doses, 17 of the 20 samples analyzed were within 10% of the target doses (Table F3).
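The acceptance criterion described above (a measured value within 10% of its target) can be illustrated with a minimal sketch. The function name and the example measurements are hypothetical; only the 10% tolerance and the reported counts (209 of 211 formulations, 17 of 20 delivered doses) come from the text.

```python
def within_tolerance(measured, target, tolerance=0.10):
    """Return True if `measured` lies within +/- `tolerance` (a fraction) of `target`."""
    return abs(measured - target) <= tolerance * target


# Hypothetical measurements against a hypothetical 24 mg/mL target.
ok_sample = within_tolerance(21.7, 24.0)      # deviation 2.3 <= 2.4
failed_sample = within_tolerance(26.5, 24.0)  # deviation 2.5 > 2.4

# Percentages implied by the counts reported in the text.
formulations_pct = 100 * 209 / 211  # about 99.1%
delivered_pct = 100 * 17 / 20       # 85.0%
```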

Study Designs

In utero exposure was selected to mimic exposures in humans and was begun at gestational day (GD) 12, a time period coinciding with administration of AZT in the last third of pregnancy (DHHS, 2009). Starting dosing at GD 12 also allows for maximal sensitivity for carcinogenesis studies of genotoxic agents (Rice, 1973; Anderson, 2004). The doses for this study were selected to overlap human exposures.

Male and female heterozygous F1 p53+/− mice were exposed to AZT in utero on GDs 12 through 18, then administered AZT by gavage from postnatal day (PND) 1 through 30 weeks of age (30-week study), 45 weeks of age (45-week study), or PND 8 (45-week stop-study). Mice in 0 mg/kg groups received only an aqueous solution containing 0.2% methylcellulose and 0.1% Tween® 80. The dosing volume was 10 mL/kg, except on PNDs 1 through 8, when the dosing volume was 5 mL/kg. Mice were dosed once daily until PND 28, then once daily 5 days per week. Dosing was performed using a Hamilton Microlab 500 series pump. Full details of this dosing technique have been described by Lewis et al. (2010). A summary of the dose groups is presented in Table 1.
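The relationship between dose, dosing volume, and formulation concentration stated above is simple arithmetic and can be sketched as follows. The function names and the 25 g body weight are illustrative assumptions; the dose levels and dosing volumes (10 mL/kg, or 5 mL/kg on PNDs 1 through 8) are taken from the text.

```python
def formulation_concentration(dose_mg_per_kg, volume_ml_per_kg):
    """Concentration (mg/mL) needed to deliver a dose at a given dosing volume."""
    return dose_mg_per_kg / volume_ml_per_kg


def gavage_volume_ml(body_weight_g, volume_ml_per_kg):
    """Volume (mL) to administer to an animal of the given body weight in grams."""
    return body_weight_g / 1000.0 * volume_ml_per_kg


# 240 mg/kg delivered at 10 mL/kg requires a 24 mg/mL formulation.
high_dose_conc = formulation_concentration(240, 10)

# 120 mg/kg delivered at 5 mL/kg requires the same 24 mg/mL formulation.
neonatal_conc = formulation_concentration(120, 5)

# A hypothetical 25 g mouse dosed at 10 mL/kg receives about 0.25 mL.
volume = gavage_volume_ml(25.0, 10)
```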

Table 1. Summary of Dose Groups for Mice Treated with AZT.

For the continuous dosing studies, pregnant dams were administered 0, 80, 160, or 240 mg AZT/kg body weight per day on GDs 12 through 18. Litters were culled to six pups, which were administered 0, 40, 80, or 120 mg/kg as a single daily gavage on PNDs 1 through 10, then 0, 80, 160, or 240 mg/kg as a single daily gavage until PND 28, when they were assigned to either the 30-week or 45-week studies (Figure 1), or culled for genotoxicity studies (Appendix B). Litters that had fewer than six live pups on PND 1 were also dosed and used for these studies.

Figure 1. Dosing Schedules for 30-Week and 45-Week Studies.

For the 30-week study, groups of 26 or 27 male and 26 or 27 female pups from the 0/0/0 or 240/120/240 mg/kg dose groups were assigned to the study on PND 28 and were administered either 0 or 240 mg/kg in a single daily gavage, 5 days/week until the end of the study (Figure 1).

For the 45-week study, groups of 27 male and 26 or 27 female pups from the 0/0/0, 80/40/80, 160/80/160, or 240/120/240 mg/kg dose groups were assigned to the study on PND 28 and were administered either 0, 80, 160, or 240 mg/kg in a single daily gavage, 5 days/week until the end of the study (Figure 1).

For the 45-week stop-study, pregnant dams were administered 0 or 240 mg AZT/kg body weight per day on GDs 12 through 18. Litters were culled to six pups, which were administered 0 or 40 mg/kg as a single daily gavage on PNDs 1 through 8. Groups of 24 or 25 male and 25 or 26 female pups were assigned to the 0/0 or 240/40 mg/kg dose groups on PND 28 and then maintained on study until 45 weeks of age without further dosing (Figure 2). The neonatal dose of 40 mg/kg was used to match that used in a 2-year study in B6C3F1 mice that will be presented elsewhere. Where possible, no more than two littermates per sex were assigned to each dose group. The individual litter assignments are presented in Appendix G.

Figure 2. Dosing Schedule for 45-Week Stop-Study.

Source and Specification of Animals

Female C3H/HeN wild-type mice and male homozygous p53-null C57BL/6(N12)Trp53(−/−) mice were obtained under an academic breeding license from Taconic Farms, Inc. (Germantown, NY). The animals were quarantined for 14 days prior to being assigned to the study. Each male was mated with up to six females in succession, and plug-positive females were provisionally assigned to the study on the morning that the vaginal plugs were identified, which was designated GD 0 for the study. The plug-positive animals were weighed daily and those not showing signs of pregnancy were returned to the breeding pool on GD 10. Dosing of pregnant mice was initiated on GD 12. The heterozygous F1 p53+/− pups were born on GD 19 or 20, and the morning a litter was first observed was designated PND 0. On PND 1 each litter was examined, the sex of each pup was determined, and the litter was culled to six pups of equal sex ratio when possible. On PND 28, excess pups were culled for mechanistic studies that will be reported elsewhere.

Animal Maintenance

Mouse dams were housed individually with litters until PND 21; pups were weaned, then housed individually beginning PND 29. Feed and water were available ad libitum. In order to monitor the health of animals, blood was drawn from two sentinel mice at 3 and 20 weeks and from four sentinel mice at 37, 47, and 56 weeks after receipt of the breeding mice. Sera were analyzed for antibody titers to rodent viruses; all results were negative. Further details of animal maintenance are given in Table 2. Pups were weaned on PND 21, and were group housed as a litter until PND 28 when mice assigned to the study were housed individually.

Table 2. Experimental Design and Materials and Methods in the In Utero/Postnatal Gavage Studies of AZT.

Clinical Examinations and Pathology

Animals were observed twice daily. Body weights were recorded daily for pregnant dams and for litters until PND 21, then individual pups were weighed weekly until the end of the studies; clinical findings were recorded twice weekly.

At PND 160, blood was collected via the saphenous vein from eight males and eight females in the 0/0/0 mg/kg groups and 16 males and 16 females in the 240/120/240 mg/kg groups from the 30- and 45-week studies to monitor for macrocytic anemia. Blood was also collected from surviving male and female mice in the 30-week study at study termination via cardiac puncture under carbon dioxide anesthesia for hematology and clinical chemistry. Blood samples for clinical chemistry were allowed to clot then centrifuged. The serum was removed and frozen at −60° C until analysis.

Blood samples for hematology were collected in EDTA and analysis was performed on the day of collection. Automated hematology was performed using an ABX Pentra 60 C+ hematology analyzer (ABX, Irvine, CA). Clinical chemistry analyses were conducted on an Alfa Wassermann ALERA analyzer (Alfa Wassermann, West Caldwell, NJ) with reagents manufactured and/or supplied by Alfa Wassermann or Catachem (Bridgeport, CT). The parameters measured are listed in Table 2.

Necropsies and microscopic examinations were performed on all mice. The brain (45-week study and 45-week stop-study), heart, left and right kidney, liver, and lung (30-week study) were weighed. At necropsy, all major tissues were examined grossly for visible lesions, and all major tissues were preserved in 10% neutral buffered formalin or Davidson’s solution (eyes and testes). The major tissues and gross lesions were trimmed, processed, and embedded in Formula R®, sectioned at approximately 5 μm, and stained with hematoxylin and eosin. When applicable, nonneoplastic lesions were graded for severity as 1 (minimal), 2 (mild), 3 (moderate), or 4 (marked).

Microscopic evaluations were completed by the study laboratory pathologist, and the pathology data were entered into the Laboratory Data Acquisition System (LDAS) database. The slides, individual animal data records, and pathology tables were evaluated by the Toxicologic Pathology Associates (Jefferson, AR) and NCTR quality assurance units. The individual animal records and tables were compared for accuracy, the slide and tissue counts were verified, and the histotechnique was evaluated. A quality assessment pathologist evaluated slides from all tumors and all potential target organs, which included the liver of male mice and the lymph nodes, spleen, and thymus of male and female mice. Tissues examined microscopically are listed in Table 2.

The quality assessment report and the reviewed slides were submitted to the NTP Pathology Working Group (PWG) coordinator, who reviewed the selected tissues and addressed any inconsistencies in the diagnoses made by the laboratory and quality assessment pathologists. Representative histopathology slides containing examples of lesions related to chemical administration, examples of disagreements in diagnoses between the laboratory and quality assessment pathologists, or lesions of general interest were presented by the coordinator to the PWG for review. The PWG consisted of the quality assessment pathologist and other pathologists experienced in rodent toxicologic pathology. This group examined the tissues without any knowledge of dose groups or previously rendered diagnoses. When the PWG consensus differed from the opinion of the laboratory pathologist, the diagnosis was changed. Final diagnoses for reviewed lesions represent a consensus between the laboratory pathologist, reviewing pathologist, and the PWG. Details of these review procedures have been described, in part, by Maronpot and Boorman (1982) and Boorman et al. (1985). For subsequent analyses of the pathology data, the decision of whether to evaluate the diagnosed lesions for each tissue type separately or combined was generally based on the guidelines of McConnell et al. (1986).

Statistical Methods

Survival

Weaned pups reaching terminal kill were censored from analysis. Kaplan-Meier estimates (Kaplan and Meier, 1958) of mean survival times were calculated for each sex-by-treatment group and the Kaplan-Meier curves were plotted. For each sex and dosing regimen (30-week, 45-week, and 45-week stop-study) combination, four proportional hazards models (Cox, 1972) were used to test the effect of the dose (linear trend and comparison to control). The four models were: unadjusted for litter using a standard Cox model, unadjusted for litter using a sandwich variance estimate (Binder, 1992), adjusted for litter using a sandwich variance estimate, and adjusted for sires using a sandwich variance estimate. The second model was necessary to gauge the impact of the litter correlation. All survival analysis P values are two sided. Unless otherwise noted, statistical significance was set at the 5% level.
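As an illustration of the product-limit (Kaplan-Meier) estimate with terminal-kill animals treated as censored, a minimal pure-Python sketch follows. It is not the software used for these studies; the function name and the six example animals are hypothetical, and the Cox models and sandwich variance estimates are not covered.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : time on study for each animal (e.g., weeks)
    events : 1 if the animal died at that time, 0 if censored
             (e.g., removed at terminal kill)
    Returns a list of (time, estimated survival) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all animals leaving the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Hypothetical group: three deaths, one early censoring, two censored at week 45.
times = [10, 20, 20, 30, 45, 45]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
# Survival steps down to about 0.833, 0.667, and 0.444 at weeks 10, 20, and 30.
```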

Body Weight Analysis

The body weight data for each animal were rasterized to evenly-spaced time points (every 4 weeks) via LOESS scoring (Cleveland, 1979; Cleveland et al., 1988). This process reduces the number of time points for the mixed-effects model, reduces the effects of outliers, and creates a grid of regularly spaced time points. The scored data were then analyzed using a mixed effects model by sex and age (in weeks). This was done to facilitate proper modeling of the intralitter correlation and the inherent variance heteroscedasticity with age. To capture the growth dynamics, additional modeling captured the initial growth rate and the late growth rate. The model treated body weight as a function of treatment group. Dunnett’s method (Dunnett, 1955) was used to compare body weight in the dosed animals to body weight in the control animals at each scored age. This model was run unadjusted for litters, adjusted for litters, and adjusted for sires. Plots are presented using naive means and standard errors.

Analysis of Continuous Variables

For organ weights in the 30-week study and the 45-week stop-study and for hematology and clinical chemistry data, dosed groups were compared to the control groups using Student’s t-test. For organ weights in the 45-week study, groups were analyzed by a two-tailed Dunnett’s test run under the SAS General Linear Models Program.

Calculation of Neoplasm and Nonneoplastic Lesion Incidences and Severities

The incidences of neoplasms or nonneoplastic lesions are presented in Tables A1, A3, A4, A6, A7, A9, A10, A12, A13, A15, A16, and A18 as the number of animals bearing such lesions at a specific anatomic site and the number of animals with that site examined microscopically. For neoplasms and nonneoplastic lesions, the Poly-3 method of Bailer and Portier (1988) as modified by Bieler and Williams (1993) and NIEHS (continuity-correction) was used to analyze age-adjusted incidence. For dam- and sire-adjusted correlation models, the Poly-3 weighted generalized linear model was used to generate estimated correlation-adjusted incidences and these are given along with the relevant test P value.

The Poly-k test (Bailer and Portier, 1988; Portier and Bailer, 1989; Piegorsch and Bailer, 1997) is typically used to assess treatment effects on neoplastic and nonneoplastic lesion prevalence. This test is a survival-adjusted quantal-response procedure that modifies the Cochran-Armitage linear trend test to take survival differences into account. The variance correction of Bieler and Williams (1993) is usually used to account for the extra-binomial variability induced by using a stochastic denominator in the Cochran-Armitage test. Pairwise comparisons in this test are accomplished by reanalyzing the treatment groups in pairs. This framework limits the Poly-k test to one-way designs with no correlation. This model was run for these studies for each dosing regimen (30-week, 45-week, and 45-week stop-study) and does not adjust for intralitter correlation.
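The survival adjustment at the core of the Poly-k approach can be sketched as below. An animal with the lesion, or one that survives to the end of the study, counts as a full animal at risk; a lesion-free animal that dies early at time t contributes only (t/T)^k, with k = 3 for Poly-3. This is a minimal illustration of the weighting alone, with hypothetical function names and example data; it does not implement the trend test or the Bieler and Williams variance correction.

```python
def poly_k_weight(has_lesion, time_on_study, study_length, k=3):
    """Poly-k risk weight for one animal."""
    if has_lesion or time_on_study >= study_length:
        return 1.0
    return (time_on_study / study_length) ** k


def poly_k_adjusted_rate(animals, study_length, k=3):
    """Survival-adjusted lesion rate: lesion count / sum of Poly-k weights.

    animals : list of (has_lesion, time_on_study) tuples
    """
    n_lesion = sum(1 for has_lesion, _ in animals if has_lesion)
    adjusted_n = sum(poly_k_weight(h, t, study_length, k) for h, t in animals)
    return n_lesion / adjusted_n


# Hypothetical group in a 45-week study: one lesion-bearing animal, two
# lesion-free survivors, and one lesion-free animal dying at 22.5 weeks.
group = [(True, 30.0), (False, 45.0), (False, 45.0), (False, 22.5)]
rate = poly_k_adjusted_rate(group, study_length=45.0)
# The early death contributes weight 0.125, so the adjusted denominator is
# 3.125 and the adjusted rate is 0.32, compared with a raw rate of 1/4.
```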

To adjust for intralitter correlation, the Poly-k test was revised. Bieler and Williams (1993) used the fact that the Cochran-Armitage test can be envisioned as a binomial-weighted regression in the derivation of their variance correction. If we begin with this paradigm, we can generalize it to view the Cochran-Armitage test as a generalized linear model with binomial variation and an identity link function (McCullagh and Nelder, 1989). This model can be used with the Poly-k weights to allow more complex designs including litter correlations using generalized estimating equations (Liang and Zeger, 1986) and factorial effects as well as alternative link functions.
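One way to write down the model described in the preceding paragraph is sketched here. The notation is an assumption introduced for illustration and does not appear in the source: y_{ij} is the lesion-present flag for animal j in dose group i, d_i is the dose, t_{ij} is the time on study, T is the study length, and k = 3 for Poly-3.

```latex
\[
  y_{ij} \sim \mathrm{Bernoulli}(p_i), \qquad
  \mathrm{E}[y_{ij}] = p_i = \beta_0 + \beta_1 d_i
  \quad \text{(identity link)},
\]
\[
  w_{ij} =
  \begin{cases}
    1, & \text{lesion present or animal survived to } T, \\
    (t_{ij}/T)^{k}, & \text{otherwise},
  \end{cases}
\]
```

Under this formulation the linear trend test corresponds to testing \beta_1 = 0 in the weighted fit, and the litter correlation enters through the working correlation structure of the generalized estimating equations rather than through the mean model.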

Several issues arise in this situation. First, estimated variances will be group-specific rather than null hypothesis-specific. Bieler and Williams (1993) mentioned that group-specific variances had caused their correction to be less stable in simulations and opted for the null hypothesis variance. Using group-specific variances certainly has the effect of creating uniform groups and causing estimation problems in typical NTP datasets. Specifying the null hypothesis deviance function might address this problem, but this was not done for the current studies. Instead, for these analyses, a more traditional method of bumping uniform groups was used. The identity link did not appear to be very sensitive to the amount of the bump, especially in comparison to the logit link, which was sensitive to bump size. Therefore, the identity link was used.

The second difficulty comes in estimating correlations. Poly-k refits the pairwise models in order to estimate the pairwise effects. This is not applicable in factorial analyses and is problematic in litter correlation analyses since it allows the litter correlation to be different among the different analyses. Thus all the data were used to estimate the correlations that should be considered as common across treatment groups.

The common correlation leads to the third issue: Poly-k is ultimately a linear regression. However, we can consider the generalized linear model to be an ANOVA model with design effects and use contrasts to generate the linear trend and pairwise comparisons or, indeed, any factorial contrast we may wish to examine. This ANOVA-style method is not quite in keeping with traditional Poly-k.

For each dosing regimen and lesion pool, three analyses of these data were performed: traditional Poly-3, dam-adjusted Poly-3, and sire-adjusted Poly-3. For each lesion pool and drug combination, the sire- or dam-adjusted analysis fits a lesion-present flag to dose level using a generalized linear model with binomial distribution, identity link function, and Poly-3 observation weights. Uniform treatment groups were bumped away from uniformity by adding an uncorrelated dummy lesion flag observation with value=0.005 and Poly-3 weight=0.005. Correlation within sire or dam was achieved by invoking a generalized estimating equation-based exchangeable correlation among sire- or dam-mates. This completed the model. Suitable contrasts were used to test the relevant hypotheses. One-sided results were generated in keeping with NTP standards. The sire-adjusted analyses were generated in the same manner differing only in the specification of the correlation group variable.

It should be emphasized again that the implementation details of the correlation Poly-k methods are different from the Bieler and Williams’ variance-adjusted Poly-k test (Bieler and Williams, 1993). Particularly, the variance is not quantal-adjusted, is group-specific rather than null hypothesis-specific, and all comparisons are estimated within a single analysis of variance model rather than multiple regression models. Suitable contrasts were used to test the relevant hypotheses. Although the correlation method appears to generate similar results, it does represent a significant departure from the standard Poly-k method that the NTP has used.

In addition, to incorporate lesion severity scores, the distribution-free (but unadjusted for age and unadjusted for litter correlation) method of Jonckheere (1954) and Terpstra (1952) was used to compute monotonic trend tests and the method of Shirley (1977) as modified by Williams (1986) was used to compute comparisons to controls.
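For readers unfamiliar with the Jonckheere-Terpstra approach, the test statistic for ordered severity scores can be sketched as follows. This computes the statistic only, counting ties as one half; it does not compute a P value and does not implement the Shirley-Williams comparisons to controls. The function name and the severity scores are hypothetical.

```python
def jonckheere_statistic(groups):
    """Jonckheere-Terpstra statistic for groups ordered by increasing dose.

    For every pair of groups (lower dose, higher dose), count the pairs of
    observations in which the higher-dose value exceeds the lower-dose value;
    ties count one half.
    """
    j = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if y > x:
                        j += 1.0
                    elif y == x:
                        j += 0.5
    return j


# Hypothetical severity scores (0 = lesion absent, 1 to 4 = graded severity)
# for three dose groups of three animals each.
scores = [[0, 0, 1], [0, 1, 2], [1, 2, 3]]
j_stat = jonckheere_statistic(scores)
# j_stat is 22.0 out of a possible 27; values well above the no-trend
# expectation indicate an increasing trend with dose.
```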

Quality Assurance Methods

The 30- and 45-week studies and the 45-week stop-study were conducted in compliance with FDA Good Laboratory Practice Regulations (21 CFR, Part 58). In addition, records from these studies, including protocol and any amendments, deviations, or related information; study-related standard operating procedures (SOPs) and documentation; test article accountability and characterization; raw data generated in operational areas as defined in applicable SOPs; computer records containing INLIFE and pathology raw data; daily animal room logs; and a copy of the laboratory study, will be submitted to the NCTR Archives.

Genetic Toxicology

Mouse Peripheral Blood Micronucleus Test Protocol

Blood was collected from excess heterozygous F1 p53+/− mice that were culled on PND 1, PND 10, or PND 28 in the 45-week study or from mice in the 30-week study at terminal kill (Appendix B). Micronucleated cells were identified and quantified using a MicroFlowPLUS mouse kit (Litron Laboratories, Rochester, NY) (Dertinger et al., 2006). The frequencies of micronucleated reticulocytes and micronucleated normochromatic erythrocytes were determined in blood samples collected on PND 1 following transplacental dosing from GDs 12 through 18 (Figure 1) or within 6 hours after dosing for older mice. Mouse peripheral blood was diluted with anticoagulant, fixed in −80° C methanol, and stained with three fluorochromes for flow analysis. Reticulocytes were identified by fluorescein isothiocyanate-labeled antibodies against the CD71 mouse surface antigen; platelets were identified by phycoerythrin-labeled antibodies against CD61 antigen; and DNA, including micronuclei, was stained with propidium iodide. Flow cytometry was performed on a FACScan™ (Becton Dickinson, San Jose, CA) equipped with a 488 nm argon ion laser and fluorescence detectors. Data acquisition for each sample stopped automatically after 20,000 reticulocytes were detected by the flow cytometer. The flow cytometry collection parameters for the samples were set as described in the Litron instruction manual. Differences in reticulocyte micronucleus frequency between dose groups were analyzed using a two-tailed Dunnett’s test. At 30 weeks, when data from only 0/0/0 mg/kg and 240/120/240 mg/kg groups were available, the statistical analysis consisted of a two-tailed Student’s t-test. When data were available from more than one pup/sex from a given litter, the data were averaged and incorporated as one data point.
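The final step above, collapsing littermates of the same sex into a single data point, can be sketched as follows. The record layout, litter identifiers, and micronucleus frequencies are hypothetical and serve only to show the averaging.

```python
from collections import defaultdict


def litter_means(records):
    """Average values within each (litter, sex) so each contributes one data point.

    records : iterable of (litter_id, sex, value) tuples
    Returns a dict mapping (litter_id, sex) to the mean value.
    """
    grouped = defaultdict(list)
    for litter_id, sex, value in records:
        grouped[(litter_id, sex)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


# Hypothetical micronucleated reticulocyte frequencies (%).
records = [
    ("L01", "M", 0.25),
    ("L01", "M", 0.35),  # second male from litter L01, averaged with the first
    ("L01", "F", 0.30),
    ("L02", "M", 0.40),
]
means = litter_means(records)
# Litter L01 contributes one male value (about 0.30) and one female value.
```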