National Academies of Sciences, Engineering, and Medicine; Division on Earth and Life Studies; Board on Environmental Studies and Toxicology; Committee on Endocrine-Related Low-Dose Toxicity. Application of Systematic Review Methods in an Overall Strategy for Evaluating Low-Dose Toxicity from Endocrine Active Chemicals. Washington (DC): National Academies Press (US); 2017 Jul 18.

Appendix D Supporting Materials for the Phthalate (Human) Systematic Review

SECTION D-1. PHTHALATE (HUMAN) SYSTEMATIC REVIEW PROTOCOL

  • August 3, 2016
  • (Modified on October 31, 2016—See Section D-1f)

BACKGROUND AND INTRODUCTION

Phthalates are high production volume chemicals used primarily as plasticizers in many industrial and consumer products. As a result of their ubiquitous use, there is documented widespread human exposure to them. Because the fetus has been shown to be particularly vulnerable to endocrine-disrupting chemicals, such as phthalates, the committee decided to focus on studies of in utero exposure. Ortho-phthalates have been linked to effects on male reproductive-tract development after in utero exposure in human studies.

OBJECTIVE AND SPECIFIC AIMS

Review Question

The overall objective of this systematic review is to answer the following question: What is the effect of in utero exposure to ortho-phthalates on anogenital distance, hypospadias, or testosterone concentrations in male humans?

The specific aims of the review are to:

  • Identify literature reporting the effects of in utero phthalate exposure on male anogenital distance, hypospadias, or testosterone in humans.
  • Extract data on the effects of in utero phthalate exposure on male anogenital distance, hypospadias, or testosterone from relevant studies.
  • Assess the internal validity (risk of bias) of individual studies.
  • Summarize the extent of evidence available.
  • Synthesize the evidence using a narrative approach or meta-analysis (if appropriate) considering limitations on data integration such as study-design heterogeneity.
  • Rate the confidence in the body of evidence for studies in humans according to one of five statements: (1) high; (2) moderate; (3) low; (4) very low/no evidence available; or (5) evidence of lack of effects on male reproductive-tract development.

PECO Statement

A PECO (Population, Exposure, Comparator, and Outcome) statement was developed by the review team as an aid to identify search terms and inclusion/exclusion criteria as appropriate for addressing the review question for the systematic review.

Population: Male humans

Exposure:

  • In utero exposure to any of the following ortho-phthalates or the corresponding monoester or oxidative metabolites: benzylbutyl phthalate (CAS no. 85-68-7), dibutyl phthalate (CAS no. 84-74-2), diethyl phthalate (CAS 84-66-2), diethylhexyl phthalate (CAS no. 117-81-7), diisobutyl phthalate (CAS no. 84-69-5), diisononyl phthalate (CAS no. 28553-12-0), diisooctyl phthalate (CAS no. 27554-26-3), dimethyl phthalate (CAS no. 131-11-3), di-n-octyl phthalate (CAS no. 117-84-0), diisodecyl phthalate (CAS no. 26761-40-0), and/or dipentyl phthalate (CAS no. 131-18-0).
  • No restrictions based on route of exposure. Measurements must be based on biomonitoring data (e.g., urinary monoester or oxidative metabolites, amniotic fluid oxidative phthalate metabolites, oxidative metabolites in other matrices).

Comparator: Male humans exposed in utero to lower concentrations of phthalates.

Outcomes:

  • Anogenital distance (AGD): the measured distance between the anus and the genitals. Typically measured from the anus to the base of the scrotum or the base of the phallus. Other measures that might be used:
    • Anogenital index (AGI): AGD measurement divided by body weight or by the cube root of body weight.
    • Anoscrotal distance (ASD): the measured distance between the anus and base of the scrotum.
    • Anopenile distance (APD): the measured distance from the anus to the base of the penis.
  • Hypospadias (incidence, prevalence, and severity/grade) based on clinical guidelines for assessment.
  • Testosterone concentrations measured during gestation or at delivery.
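
The anogenital index defined above can be sketched as a small helper. This is only an illustration: the function name, the parameter names, and the units (millimeters and kilograms) are assumptions, since the protocol does not prescribe units or an implementation.

```python
def anogenital_index(agd_mm: float, body_weight_kg: float,
                     use_cube_root: bool = False) -> float:
    """Return AGD divided by body weight, or by the cube root of body weight.

    Units (mm, kg) are assumed for illustration only.
    """
    if agd_mm <= 0 or body_weight_kg <= 0:
        raise ValueError("AGD and body weight must be positive")
    # Choose the denominator according to the AGI variant requested.
    denominator = body_weight_kg ** (1.0 / 3.0) if use_cube_root else body_weight_kg
    return agd_mm / denominator
```

Because the two variants scale differently with body weight, AGI values from studies that use different denominators are not directly comparable.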

METHODS

Problem Formulation and Protocol Development

The review question and specific aims were developed and refined through a series of problem formulation steps. The committee considered review articles on endocrine disruptors in surveying the types of chemicals that might make good case examples, and held a workshop to explore potential case examples, including phthalates. The committee sought an example of a chemical for which the effects seen in both the human and the animal evidence appear to be associated with different exposure levels of that chemical and to result from perturbation of the estrogen or androgen hormone system. Phthalates appear to fit this case criterion, and positive feedback was received at the committee's workshop.

Alterations in male reproductive-tract development are the most sensitive effects from exposure to phthalates (NRC 2008). Because the period during in utero sexual differentiation (i.e., the masculinization programming window) is the most sensitive life stage, the exposure period of interest for the systematic review is in utero. This systematic review will focus on the same end points chosen for the phthalate (animal) systematic review: end points reflecting androgen-dependent adverse effects (AGD and hypospadias), an adverse effect that occurs at relatively low doses (AGD), and a key event in the adverse outcome pathway leading to reduced AGD and hypospadias (fetal testosterone).

Consideration was given to including cryptorchidism as an end point, but the committee decided against it. The mode of action for phthalate-induced cryptorchidism involves reductions in INSL-3 levels in addition to androgen-dependent mechanisms. Important for the committee's charge, there are few, if any, human studies on dose-response relationships between phthalate exposure and cryptorchidism to compare to animal data. Furthermore, studies have shown that rats exposed to phthalates are about as sensitive to decreases in fetal testosterone and AGD as they are to decreases in INSL-3, and that cryptorchidism is a less sensitive end point than reductions in AGD. Because the overall objective of the committee is to use this systematic review with the one being conducted on the animal evidence to evaluate the coherence between effects and dose-response relationships, the committee judged that it would not be useful to include cryptorchidism in the systematic reviews on phthalates.

The protocol will be peer reviewed by subject-matter and systematic-review experts in accordance with standard report-review practices of the National Academies of Sciences, Engineering, and Medicine. The protocols will be revised in response to peer review comments and will subsequently be published as appendices to the committee's final report. The identity of the peer reviewers will remain anonymous to the committee until the publication of the final report, when their names and affiliations are disclosed in the Preface.

Committee and Staff

There are 11 committee members, supported by two staff members of the National Academies. The committee members were appointed in accordance with the standard policies and practices of the National Academies on the basis of their expertise in general toxicology, reproductive toxicology, developmental toxicology, endocrinology, neurotoxicology, epidemiology, risk assessment, biostatistics, and systematic-review methods. The membership of the committee and the staff was determined before the topic of the systematic review was selected. It was known, however, that each case study would be on an endocrine-disrupting chemical, so committee members who have relevant expertise were specifically recruited and appointed.

Review Team

The review team for this case study will be a subgroup of the committee (RH, SS), two National Academies staff members (EM, SM), and an information specialist (JB). If a member of the review team was a coauthor of a study under review, that member will recuse himself or herself from the evaluation of the quality of that study.

The review team will be responsible for performing all aspects of the review, including conducting the literature searches; applying inclusion/exclusion criteria to screen studies; extracting data; assessing risk of bias for included studies; and analyzing and synthesizing data. The roles and responsibilities of the team members will be documented throughout the protocol. Throughout the course of its work, the review team will also engage other members of the committee to provide consultation as needed. The involvement of those individuals will be documented and acknowledged.

Biographical information on the review team is presented in Section D-1a.

Search Methods

Search for Existing Systematic Reviews

The review team will consider using existing systematic reviews to address or help to address its research question. English-language systematic reviews conducted within the last 3 years will be sought. The review team will incorporate prior reviews, update prior reviews, and/or use the reviews as part of its searching, depending on determination of their relevancy and quality (Whitlock et al. 2008). Current guidance on using existing systematic reviews will be used (Robinson et al. 2014, 2015, 2016).

Search

Recent, relevant, high-quality systematic reviews addressing the research question about phthalates and male reproductive-tract development will be sought. PubMed will be searched by adding the qualifier “systematic review”[ti] OR “meta-analysis”[pt] OR “meta-analysis”[ti] OR (“systematic”[ti] AND “review”[ti]) OR (systematic review [tiab] AND review [pt]) OR “meta synthesis”[ti] OR “meta synthesis”[ti] OR “integrative review”[tw] OR “integrative research review”[tw] OR “cochrane database syst rev”[ta] OR “evidence synthesis”[tiab] to the preliminary search strategy (see Section D-1b). Language and date restrictions will be applied (English language; published 2013 to present). The systematic review protocol registry PROSPERO (CRD) will also be searched using key terms from the preliminary PubMed strategy.
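
As a rough sketch of how the qualifier might be attached to a base strategy, the snippet below joins the filter terms with OR and combines them with a base query. Combining with AND is an assumption about what "adding the qualifier" means, the function and variable names are hypothetical, and the base query is a placeholder because the actual strategy is given in Section D-1b. The term that appears twice in the qualifier is listed once here, and the language and date restrictions are not encoded.

```python
# Filter terms transcribed from the qualifier described in the protocol text.
SR_FILTER_TERMS = [
    '"systematic review"[ti]',
    '"meta-analysis"[pt]',
    '"meta-analysis"[ti]',
    '("systematic"[ti] AND "review"[ti])',
    '(systematic review [tiab] AND review [pt])',
    '"meta synthesis"[ti]',
    '"integrative review"[tw]',
    '"integrative research review"[tw]',
    '"cochrane database syst rev"[ta]',
    '"evidence synthesis"[tiab]',
]


def add_systematic_review_filter(base_query: str) -> str:
    """Wrap a base query and AND it with the OR-joined filter terms (assumed combination)."""
    sr_filter = " OR ".join(SR_FILTER_TERMS)
    return f"({base_query}) AND ({sr_filter})"
```

Keeping the terms in a list makes it straightforward to record the exact filter used alongside the search date.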

Study Selection

Two team members (SM, EM) will independently screen search results, applying the following exclusion criteria:

  • Not a systematic review.1 The minimum criteria for a study to be considered a systematic review are
    • conduct of an explicit and adequate literature search,
    • application of predefined eligibility criteria,
    • consideration of the quality of included studies or risk of bias assessment, and
    • synthesis (or attempt at synthesis) of the findings, either qualitatively or quantitatively.
  • Not in English.
  • Search date prior to 2013.
  • Does not match our research question or PECO elements.

For PubMed results, screening will be conducted first using abstracts and then at the full-text level. Results from PROSPERO will be screened at one level, using the information in the registry. Disagreements regarding eligibility will be resolved through discussion or, where necessary, by a third team member.

Assessment for Quality

Two investigators (KR, AR) will independently assess the risk of bias of eligible systematic reviews using ROBIS (Whiting et al. 2016). Disagreements in rating will be resolved through discussion or, where necessary, through consultation with a third team member. Systematic reviews rated as low quality will be excluded from further consideration at this stage.

Use of Existing Reviews

Eligible systematic reviews of high quality will be reviewed, considering date of search and match with the PECO statement as well as availability of data from the primary studies, how risk of bias was assessed, and other factors. Current reviews considered a good match will be used to address the research question. Reviews that are a good match but whose search dates are more than a year old will be updated. If no relevant systematic reviews are found, an independent systematic review will be performed.

Literature Search for Independent Systematic Review

The review team will collaborate with an information specialist (JB) who has training, expertise, and familiarity with developing and performing systematic review literature searches. A variety of methods will be used to identify relevant data (see below). Literature searches will not be limited by publication date.

Online Databases

Electronic searches of the following three online databases will be performed using the search terms outlined in Section D-1b: PubMed, Embase, and Toxline. The search strategy and search terms will be developed by the information specialist (JB), who will implement the search for relevant studies.

Other Resources

Hand searching the reference lists of all the included studies after full-text review will be conducted using the same study selection process as used for screening records retrieved from the electronic search. Relevant studies identified through these steps will be marked as “provided from other sources” in the study selection flow diagram.

Study Selection

All search results will be imported or manually entered into EndNote (Version X7) reference management software. EndNote will be used to eliminate any duplicate citations before evaluating the eligibility of the citations.
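
The idea behind duplicate elimination can be illustrated with a short sketch. This is not EndNote's algorithm: the matching rule (DOI when present, otherwise normalized title plus year) and the record field names are assumptions chosen for illustration.

```python
import re


def citation_key(record: dict) -> tuple:
    """Build a comparison key: DOI if present, else normalized title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Lowercase the title and collapse punctuation and whitespace.
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return ("title", title, record.get("year"))


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = citation_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

A rule this simple will miss some duplicates (for example, one copy with a DOI and one without), so manual checking of near matches would still be needed.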

Screening Process

References retrieved from the literature search will be screened for relevance and eligibility against the evidence selection criteria using DistillerSR (Evidence Partners; https://www.evidencepartners.com). Screeners from the review team will be trained with an initial pilot phase on 25 studies undertaken to improve clarity of the evidence selection criteria and to improve accuracy and consistency among screeners. Screening forms are presented in Section D-1c.

Title and Abstract Screening

Each citation will be independently screened by two reviewers (SM, EM) to determine whether it meets the selection criteria for inclusion that reflect the PECO statement with some additional considerations as listed below. Citations included at the title/abstract screening level will be subject to a full-text review by the same two reviewers. Disagreements regarding citation eligibility will be resolved via consensus and, where necessary, by consulting a committee member.
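
A minimal sketch of how disagreements between the two independent screeners could be flagged for consensus discussion is shown below. The data layout (a mapping from citation ID to an include/exclude decision per reviewer) and the function name are assumptions; the review itself uses DistillerSR for this step.

```python
def find_screening_conflicts(decisions_a: dict, decisions_b: dict) -> list:
    """Return citation IDs screened by both reviewers whose decisions differ."""
    shared = decisions_a.keys() & decisions_b.keys()
    return sorted(cid for cid in shared if decisions_a[cid] != decisions_b[cid])
```

For example, `find_screening_conflicts({"c1": "include", "c2": "exclude"}, {"c1": "include", "c2": "include"})` flags only `"c2"` for discussion.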

The title/abstract screening form will be used to screen and EXCLUDE references if at least one of the following criteria is met:

  1. No original data (e.g., review article, commentary, editorial)
  2. Study does not include male humans
  3. Study does not report phthalate exposure
  4. No relevant outcomes
  5. Incomplete information (e.g., conference abstract, meeting poster)
  6. Not in English and unable to determine eligibility
  7. Other (explanation required)

The following types of records will be INCLUDED at the title/abstract level: any English-language study of male humans exposed to phthalates in utero.

Only English-language publications will be included, because of time and resource constraints. There appears to be no indication that foreign-language publications would make a contribution that is distinct from what is found in the English-language literature.

Updates to the instructions and interpretations for title and abstract screening will be added to Section D-1f to document the review team's decisions during the screening process.

Full-Text Screening

Citations included at the title/abstract screening level will be subject to a full-text review by the same two reviewers involved in title and abstract screening (SM, EM). Each reference will be screened in duplicate and independently. Disagreements regarding citation eligibility will be resolved via consensus and, where necessary, by consulting a committee member.

Citations will be EXCLUDED at the full-text level if at least one of the following criteria is met:

  1. No original data (e.g., review article, commentary, editorial)
  2. Study does not include male humans
  3. Study does not report phthalate exposure to one or more of the phthalates listed in the PECO statement
  4. Study does not have biomonitoring data specific to phthalate exposure
  5. Study does not include in utero exposure
  6. Study does not assess or report anogenital distance, anogenital index, anoscrotal distance, anopenile distance, hypospadias, or testosterone concentrations measured during gestation or at delivery
  7. No comparator group (males exposed in utero at lower concentrations of phthalates)
  8. Not in English
  9. Other reason (explanation required)

The reason for exclusion at the full-text-review stage will be annotated and reported in a study selection flow diagram in the final report (following PRISMA [Moher et al. 2009]). The reasons for exclusion will be documented from the list (1-9) above.
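
The counts needed for the exclusion boxes of a flow diagram could be tallied as in the sketch below. The reason labels are shortened paraphrases of the nine full-text criteria, and the input layout (pairs of citation ID and reason number, with `None` for included citations) is an assumption.

```python
from collections import Counter

# Shortened labels for the nine full-text exclusion criteria.
FULL_TEXT_EXCLUSION_REASONS = {
    1: "No original data",
    2: "No male humans",
    3: "No exposure to a listed phthalate",
    4: "No phthalate-specific biomonitoring data",
    5: "No in utero exposure",
    6: "No relevant outcome",
    7: "No comparator group",
    8: "Not in English",
    9: "Other reason",
}


def tally_exclusions(decisions) -> dict:
    """Count excluded citations per reason; a reason of None means included."""
    counts = Counter(reason for _, reason in decisions if reason is not None)
    unknown = set(counts) - set(FULL_TEXT_EXCLUSION_REASONS)
    if unknown:
        raise ValueError(f"Unknown exclusion reason(s): {sorted(unknown)}")
    return {FULL_TEXT_EXCLUSION_REASONS[n]: counts[n] for n in sorted(counts)}
```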

Citations will be INCLUDED if they meet the PECO statement criteria:

  • Study includes male humans
  • Study includes in utero exposure
  • Study includes comparison with males exposed in utero at lower concentrations
  • Study measures anogenital distance, anogenital index, anoscrotal distance, anopenile distance, hypospadias, or testosterone concentrations

Updates to the instructions and interpretations for full-text screening will be added to Section D-1f to document the review team's decisions during the screening process.

Data Extraction

Data will be collected and recorded (extracted) from included studies by one member of the review team and checked by a second member for completeness and accuracy. Any discrepancies in data extraction will be resolved through discussion. The extracted data will be used to summarize study designs and findings and/or to conduct statistical analyses. Section D-1d presents the data extraction elements that will be used.

The review team will attempt to contact authors of included studies to obtain missing data considered important for evaluating key study findings (e.g., level of data required to conduct a meta-analysis). The study extraction files will note whether an attempt was made to contact study authors by email for missing data considered important for evaluating key study findings (and whether or not a response was received).

Multiple publications with overlapping data for the same study (e.g., publications reporting subgroups, additional outcomes or exposures outside the scope of an evaluation, or longer follow-up) will be identified by examining author affiliations, study designs, cohort name, enrollment criteria, and enrollment dates. If necessary, study authors will be contacted to clarify any uncertainty about the independence of two or more articles. The review will include all publications on the study, select one publication to use as the primary publication, and consider the others as secondary publications with annotation as being related to the primary record during data extraction. The primary study will generally be the publication with the longest follow-up or, for studies with equivalent follow-up periods, the study with the largest number of cases or the most recent publication date. The review will include relevant data from all publications of the study, although if the same outcome is reported in more than one report, the review team will include only a single instance of the data to avoid duplication.
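
The rule for choosing a primary publication can be sketched as a sort key. The protocol does not say whether number of cases or publication date takes precedence when follow-up is equal, so the ordering below (follow-up, then cases, then year) is an assumption, as are the field names.

```python
def select_primary(publications: list[dict]) -> dict:
    """Pick the primary publication among reports of the same study.

    Assumed precedence: longest follow-up, then most cases, then latest year.
    """
    if not publications:
        raise ValueError("At least one publication is required")
    return max(
        publications,
        key=lambda p: (p["follow_up_years"], p["n_cases"], p["pub_year"]),
    )
```

The remaining publications would then be annotated as secondary records related to the one returned here.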

Data extraction will be completed using the Health Assessment Workspace Collaborative (HAWC) software, an open source and freely available Web-based interface application, for visualization and warehousing.2

Risk of Bias (Quality) Assessment of Individual Studies

Risk of bias is related to the internal validity of a study and reflects study-design characteristics that can introduce a systematic error (or deviation from the true effect) that might affect the magnitude and even the direction of the apparent effect. Internal validity or risk of bias will be assessed for individual studies using a tool developed by the National Toxicology Program's Office of Health Assessment and Translation (OHAT) that outlines an approach to evaluating risk of bias for human epidemiology studies. The risk of bias domains and questions are based on established guidance for observational human studies and randomized controlled trials (Higgins and Green 2011; Viswanathan et al. 2012, 2013; Sterne et al. 2014). The risk of bias tool includes a common set of questions (Section D-1e) that are answered based on the specific details of individual studies to develop risk of bias ratings (using the four options: definitely low risk of bias; probably low risk of bias; probably high risk of bias; or definitely high risk of bias). Study design determines the subset of questions that should be used to assess risk of bias for an individual study (see Table D1-1).

Studies are independently assessed by two assessors (RH, SS) who answer all applicable risk of bias questions with one of four options (see Table D1-2) following prespecified criteria detailed in Section D-1e. The criteria describe aspects of study design, conduct, and reporting required to reach risk of bias ratings for each question and specify factors that can distinguish among ratings (e.g., what separates “definitely low” from “probably low” risk of bias). The instructions and detailed criteria are tailored to the specific type of human study designs. Risk of bias will be assessed at the outcome level because study design or method specifics may increase the risk of bias for some outcomes and not others within the same study.

Information or study procedures that were not reported are assumed not to have been conducted, resulting in an assessment of “probably high” risk of bias. Authors will be queried by email to obtain missing information, and any responses received will be used to update risk of bias ratings.

Assessors will be trained in using the criteria to develop risk of bias ratings for each question, with an initial pilot phase undertaken to improve clarity of criteria that distinguish between adjacent ratings and to improve consistency among assessors. All team members involved in the risk of bias assessment will be trained on the same set of studies and asked to identify potential ambiguities in the criteria used to assign ratings for each question. Any ambiguities and rating conflicts will be discussed relative to opportunities to refine the criteria to more clearly distinguish between adjacent ratings. If major changes to the risk of bias criteria are made based on the pilot phase (i.e., those that would likely result in revision of response), they will be documented in a protocol amendment along with the date and the logic for the changes. It is also expected that information about confounding, exposure characterization, outcome assessment, and other important issues may be identified during or after data extraction, which can lead to further refinement of the risk of bias criteria.

After assessors have independently made risk of bias determinations for a study across all risk of bias questions, the two assessors will compare their results to identify discrepancies and attempt to resolve them. Any remaining discrepancies will be considered and resolved with the review team. The final risk of bias rating for each question will be recorded along with a statement of the basis for that rating.
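
The comparison of the two assessors' ratings might look like the following sketch. The four rating labels come from the protocol; the function names, the use of `None` to mark information a study did not report, and the dictionary layout (question ID to rating) are assumptions for illustration.

```python
RATINGS = ("definitely low", "probably low", "probably high", "definitely high")


def resolve_rating(rating):
    """Map an unreported item (None) to 'probably high'; validate other ratings."""
    if rating is None:
        return "probably high"
    if rating not in RATINGS:
        raise ValueError(f"Unknown rating: {rating!r}")
    return rating


def rating_discrepancies(assessor_a: dict, assessor_b: dict) -> dict:
    """Return {question: (rating_a, rating_b)} for questions rated differently."""
    differences = {}
    for question in sorted(assessor_a.keys() & assessor_b.keys()):
        a = resolve_rating(assessor_a[question])
        b = resolve_rating(assessor_b[question])
        if a != b:
            differences[question] = (a, b)
    return differences
```

Questions returned by `rating_discrepancies` are the ones the assessors would discuss and, if still unresolved, bring to the full review team.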

Data Analysis and Evidence Synthesis

The review team will qualitatively synthesize the body of evidence for each outcome and, where appropriate, a meta-analysis will be performed. If a meta-analysis is performed, summaries of main characteristics for each included study will be compiled and reviewed by two team members to determine comparability between studies, to identify data transformations necessary to ensure comparability, and to determine whether heterogeneity is a concern. The main characteristics considered across all eligible studies include the following:

  • Study design (e.g., cross-sectional, cohort)
  • Details on how participants were classified into exposure groups (e.g., quartiles of exposure)
  • Details on source of exposure data (e.g., questionnaire, area monitoring, biomonitoring)
  • Measurement of biomonitoring data specific to phthalate exposure for each exposure group
  • Health outcome(s) reported
  • Conditioning variables in the analysis (e.g., variables considered confounders)
  • Type of data (e.g., continuous or dichotomous), statistics presented in paper, access to raw data
  • Variation in degree of risk of bias at individual study level

The review team expects to require input from subject-matter experts to help assess the heterogeneity of the studies. Subgroup analyses to examine the extent to which risk of bias contributes to heterogeneity will be performed. It may not be appropriate to include a study in a meta-analysis when its exposure or outcome data are too different from those of other studies to be combined, or when other circumstances indicate that averaging study results would not produce meaningful results. When considering outcome measures for conducting meta-analyses, continuous outcome measures, such as beta coefficients (and their associated confidence intervals) from regression analysis, are preferred. A secondary alternative, when there are more than two groups, is to conduct a regression analysis of the odds or risk ratios across exposure groups and to use the derived beta coefficient. A tertiary alternative when there are only two groups (e.g., higher and lower exposure) is to use the odds or risk ratio itself.

TABLE D1-1. OHAT Risk of Bias Tool.

TABLE D1-2. Answers to the Risk of Bias Questions.

If a meta-analysis is conducted, a random effects model will be used for the analysis. Heterogeneity will be assessed using the I-squared statistic. Interpretation of I-squared will be based on the Cochrane Handbook: 0% to 40% (might not be important); 30% to 60% (may represent moderate heterogeneity); 50% to 90% (may represent substantial heterogeneity); 75% to 100% (considerable heterogeneity). Additionally, as described in the Cochrane Handbook, for the last three categories, the importance of the I-squared will be interpreted considering not only the magnitude of effects but also the strength of the evidence (90% two-tailed confidence interval).
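
To make the random effects model and the I-squared statistic concrete, here is a minimal sketch that pools study estimates (for example, regression beta coefficients) with their standard errors. The protocol does not name an estimator for the between-study variance; the DerSimonian-Laird method used here is an assumption, and the function and key names are hypothetical.

```python
import math


def random_effects_pool(effects, std_errors):
    """Pool estimates with a DerSimonian-Laird random effects model (assumed estimator)."""
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("Need at least two studies with matching standard errors")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Between-study variance (tau squared), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # I-squared as a percentage, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random effects weights add tau squared to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return {"pooled": pooled, "se": se_pooled, "tau2": tau2, "q": q, "i2": i2}
```

Because the interpretation bands for I-squared overlap, the returned value is meant to be read alongside the magnitude of effects and the strength of evidence for heterogeneity rather than mapped mechanically to a single label.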

The review team will also perform sensitivity analyses on the following aspects:

  • Sensitivity to exclusion of individual studies in succession,
  • Sensitivity to alternative exposure metrics (if available), and
  • Sensitivity to alternative outcome metrics (if available).
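
The first sensitivity analysis in the list above (excluding individual studies in succession) can be sketched as a leave-one-out loop. To keep the example self-contained, the default pooling function is a simple inverse-variance average; that stand-in is an assumption, and a random effects pooling function could be passed in its place.

```python
def inverse_variance_pool(effects, std_errors):
    """Stand-in pooling function: inverse-variance weighted mean."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(study_ids, effects, std_errors, pool=inverse_variance_pool):
    """Return {omitted study ID: pooled estimate without that study}."""
    results = {}
    for i, study_id in enumerate(study_ids):
        # Drop study i and re-pool the remaining estimates.
        remaining_effects = list(effects[:i]) + list(effects[i + 1:])
        remaining_ses = list(std_errors[:i]) + list(std_errors[i + 1:])
        results[study_id] = pool(remaining_effects, remaining_ses)
    return results
```

A pooled estimate that shifts markedly when one study is omitted indicates that the overall result depends heavily on that study.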

It is unlikely that there will be enough studies or information to meaningfully assess publication bias or to perform subgroup analyses, so no such analyses are planned.

In the event that these proposed methods for data analysis are altered to tailor to the evidence base from included studies, the protocol will be amended accordingly, and the reasons for change will be justified in the documentation.

Confidence Rating: Assessment of the Body of Evidence

The quality of evidence for each male reproductive outcome will be evaluated using the GRADE system for rating the confidence in the body of evidence (Guyatt et al. 2011; Rooney et al. 2014). More detailed guidance on reaching confidence ratings in the body of evidence as “high,” “moderate,” “low,” or “very low” is provided in NTP (2015, see Step 5). In brief, available studies on a particular outcome are initially grouped by key study-design features, and each grouping of studies is given an initial confidence rating by those features.

The initial rating is downgraded for factors that decrease confidence in the results, including

  • high risk of bias
  • unexplained inconsistency
  • indirectness or lack of applicability
  • imprecision
  • publication bias

The initial rating is upgraded for factors that increase confidence in the results, including

  • large magnitude of effect
  • dose-response relationship
  • consistency across study designs/populations/animal models or species
  • consideration of residual confounding
  • other factors that increase our confidence in the association or effect (e.g., particularly rare outcomes)

The reasons for downgrading (or upgrading) confidence may not be due to a single domain of the body of evidence. If a decision to downgrade is borderline for two domains, the body of evidence is downgraded once in a single domain to account for both partial concerns based on considering the key drivers of the strengths or weaknesses. Similarly, the body of evidence is not downgraded twice for what is essentially the same limitation (or upgraded twice for the same asset) that could be considered applicable to more than one domain of the body of evidence. Consideration of consistency across study designs, human populations, or animal species is not included in the GRADE guidance (Guyatt et al. 2011); however, it is considered in the modified version of GRADE used by OHAT (Rooney et al. 2014).
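
The downgrade and upgrade logic can be illustrated with a small sketch. It assumes each applied downgrade or upgrade moves the rating by one level and that the result is bounded by "very low" and "high"; the protocol leaves step sizes to the referenced guidance, and the separate "evidence of lack of effects" statement is not modeled here.

```python
LEVELS = ("very low", "low", "moderate", "high")


def final_confidence(initial: str, downgrades: int, upgrades: int) -> str:
    """Shift the initial rating by net upgrades, one level per step (assumed)."""
    if downgrades < 0 or upgrades < 0:
        raise ValueError("Counts of downgrades and upgrades must be non-negative")
    index = LEVELS.index(initial) - downgrades + upgrades
    # Keep the result within the defined range of levels.
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]
```

Consistent with the paragraph above, a limitation that spans two domains would be counted once when tallying `downgrades`, not twice.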

Confidence ratings will be independently assessed by members of the review team, and discrepancies will be resolved by consensus and consultation with technical advisors as needed. Confidence ratings will be summarized in evidence profile tables.

REFERENCES

  • Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, Devereaux PJ, Montori VM, Freyschuss B, Vist G, Jaeschke R, Williams JW Jr., Murad MH, Sinclair D, Falck-Ytter Y, Meerpohl J, Whittington C, Thorlund K, Andrews J, Schunemann HJ. GRADE guidelines 6. Rating the quality of evidence—imprecision. J. Clin. Epidemiol. 2011;64(12):1283–1293. [PubMed: 21839614]
  • Higgins J, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions, Version 5.1.0 (updated March 2011). The Cochrane Collaboration; 2011. [May 6, 2016]. http://handbook.cochrane.org.
  • IOM (Institute of Medicine). Finding What Works in Health Care: Standards for Systematic Reviews. Washington, DC: The National Academies Press; 2011. [PubMed: 24983062]
  • Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. J. Clin. Epidemiol. 2009;62(10):1006–1012. [PubMed: 19631508]
  • NRC (National Research Council). Phthalates and Cumulative Risk Assessment: The Tasks Ahead. Washington, DC: The National Academies Press; 2008. [PubMed: 25009926]
  • NTP (National Toxicology Program). Handbook for Conducting a Literature-Based Health Assessment Using OHAT Approach for Systematic Review and Evidence Integration. Office of Health Assessment and Translation, Division of the National Toxicology Program, National Institute of Environmental Health Sciences; Jan 9, 2015. [September 21, 2015]. http://ntp.niehs.nih.gov/ntp/ohat/pubs/handbookjan2015_508.pdf.
  • Robinson KA, Whitlock EP, O'Neil ME, Anderson JK, Hartling L, Dryden DM, Butler M, Newberry SJ, McPheeters M, Berkman ND, Lin JS, Chang S. Integration of Existing Systematic Reviews. Research white paper. Rockville, MD: Agency for Healthcare Research and Quality; 2014. [May 9, 2016]. https://www.ncbi.nlm.nih.gov/books/NBK216379/ AHRQ Publication No. 14-EHC016-EF. [PubMed: 25032273]
  • Robinson KA, Chou R, Berkman ND, Newberry SJ, Fu R, Hartling L, Dryden D, Butler M, Foisy M, Anderson J, Motu'apuaka M, Relevo R, Guise JM, Chang S. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Rockville, MD: Agency for Healthcare Research and Quality; 2015. [May 9, 2016]. Integrating Bodies of Evidence: Existing Systematic Reviews and Primary Studies. https://www.ncbi.nlm.nih.gov/books/NBK279904/ AHRQ Publication No. 15-EHC007-EF.
  • Robinson KA, Chou R, Berkman ND, Newberry SJ, Fu R, Hartling L, Dryden D, Butler M, Foisy M, Anderson J, Motu'apuaka M, Relevo R, Guise JM, Chang S. Twelve recommendations for integrating existing systematic reviews into new reviews: EPC guidance. J. Clin. Epidemiol. 2016;70:38–44. [PubMed: 26261004]
  • Rooney AA, Boyles AL, Wolfe MS, Bucher JR, Thayer KA. Systematic review and evidence integration for literature-based environmental health assessments. Environ. Health Perspect. 2014;122(7):711–718. [PMC free article: PMC4080517] [PubMed: 24755067]
  • Sterne JAC, Higgins JPT, Reeves BC. ACROBAT-NRSI: A Cochrane Risk of Bias Assessment Tool for Non-randomized Studies of Interventions. Version 1.0.0. Sep 24, 2014. [May 6, 2016]. www.riskofbias.info.
  • Viswanathan M, Ansari M, Berkman ND, Chang S, Hartling L, McPheeters LM, Santaguida PL, Shamliyan T, Singh K, Tsertsvadze A, Treadwell JR. Assessing the Risk of Bias of Individual Studies when Comparing Medical Interventions. Methods Guide for Comparative Effectiveness Reviews. Rockville, MD: Agency for Healthcare Research and Quality; 2012. [May 6, 2016]. www.effectivehealthcare.ahrq.gov/. AHRQ Publication No. 12-EHC047-EF.
  • Viswanathan M, Berkman ND, Dryden DM, Hartling L. Assessing Risk of Bias and Confounding in Observational Studies of Interventions or Exposures: Further Development of the RTI Item Bank. Methods Research Report. Rockville, MD: Agency for Healthcare Research and Quality; 2013. [May 6, 2016]. www.effectivehealthcare.ahrq.gov/reports/final.cfm. AHRQ Publication No. 13-EHC106-EF. [PubMed: 24006553]
  • Whiting P, Savovic J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, Davies P, Kleijnen J, Churchill R. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J. Clin. Epidemiol. 2016;69:225–234. [PMC free article: PMC4687950] [PubMed: 26092286]
  • Whitlock EP, Lin JS, Chou R, Shekelle P, Robinson KA. Using existing systematic reviews in complex systematic reviews. Ann. Intern. Med. 2008;148(10):776–782. [PubMed: 18490690]

Footnotes

1. A systematic review “is a scientific investigation that focuses on a specific question and uses explicit, prespecified scientific methods to identify, select, assess, and summarize the findings of similar but separate studies” (IOM 2011, p. 1).

2. HAWC (Health Assessment Workspace Collaborative): A Modular Web-based Interface to Facilitate Development of Human Health Assessments of Chemicals (https://hawcproject.org/portal/).

SECTION D-1a. REVIEW TEAM BIOGRAPHICAL INFORMATION

Jaime F. Blanck is a clinical informationist at the Welch Medical Library at Johns Hopkins University. She creates and implements systematic review search strategies across multiple databases and provides comprehensive reference, research, and information services to multiple departments within the School of Medicine. She received an MLIS from the University of Pittsburgh and an MPA from the University of Baltimore.

Russ B. Hauser is the Frederick Lee Hisaw Professor of Reproductive Physiology and professor of environmental and occupational epidemiology in the Department of Environmental Health at the Harvard T.H. Chan School of Public Health. He also holds an appointment at the Harvard Medical School, where he is professor of obstetrics, gynecology, and reproductive biology. Dr. Hauser's research focuses on the health risks posed by exposure to environmental chemicals that adversely affect human development and reproductive health. He has served on several NRC and IOM committees, including the Committee to Review EPA's State of the Science Paper on Nonmonotonic Dose Response and the Committee on the Health Risks of Phthalates. Dr. Hauser is a member of two EPA Science Advisory Boards. He served on the US Consumer Product Safety Commission's Chronic Hazard Advisory Panel examining the effects of phthalates on children's health. He received an MD from the Albert Einstein College of Medicine and an MPH and a ScD from the Harvard School of Public Health.

Ellen Mantus is a scholar and director of risk assessment on the Board on Environmental Studies and Toxicology at the National Academies of Sciences, Engineering, and Medicine with more than 20 years of experience in the fields of toxicology and risk assessment. She has served as the study director on numerous projects, including ones that have assessed the health implications of various chemical exposures; developed strategies for applying modern scientific approaches in toxicology and risk assessment; provided guidance to federal agencies on risk-based decision making; and evaluated barriers to deployment of electric vehicles and associated charging infrastructure. Before joining the National Academies, Dr. Mantus was a project manager with ICF Consulting where she served as a primary reviewer for numerous toxicological studies and provided risk assessment and regulatory support on a wide array of projects. Dr. Mantus received a PhD in chemistry from Cornell University.

Susan Martel is a senior program officer in the Board on Environmental Studies and Toxicology at the National Academies of Sciences, Engineering, and Medicine. She has 20 years of experience in supporting toxicology and risk assessment projects for the US Environmental Protection Agency, the US Department of Defense, and the National Aeronautics and Space Administration. Recent projects include working with committees evaluating the toxicological effects of arsenic, developing exposure guidelines for use on spacecraft, and assessing pesticide risk-assessment practices. Before joining the National Academies, she was the administrator of the Registry for Toxicology Pathology for Animals at the American Registry of Pathology. She received a BA in biology from Skidmore College.

Andrew A. Rooney is deputy director of the Office of Health Assessment and Translation (OHAT) in the National Toxicology Program at the National Institute of Environmental Health Sciences. He has been developing risk assessment methods and guidance throughout his professional career and is a principal author of the 2012 WHO/IPCS Guidance for Immunotoxicity Risk Assessment for Chemicals. Most recently, he has been working on emerging issues in toxicology and environmental health, including methods to address study quality in terms of risk of bias for human, animal, and mechanistic studies and adaptation of systematic review methods for addressing environmental health questions. He led the team that developed the OHAT approach to systematic review. Dr. Rooney has an MS and a PhD in zoology from the University of Florida.

Sheela Sathyanarayana is an associate professor in the Department of Pediatrics and an adjunct associate professor in the Department of Environmental and Occupational Health Sciences at the University of Washington. She is also an attending physician at Harborview Medical Center and Seattle Children's Hospital. Her research interests focus on exposures to endocrine disrupting chemicals, including phthalates and bisphenol A, and their effects on reproductive development. Currently, Dr. Sathyanarayana is the center director and clinical director for The Infant Development and Environment Study, which is a multicenter cohort study of phthalate exposures in pregnancy and health outcomes in children. She also chairs EPA's Children's Health Protection Advisory Committee. Dr. Sathyanarayana earned an MD from the University of Southern California and an MPH in epidemiology from the University of Washington.

SECTION D-1b. LITERATURE SEARCH STRATEGY

The review team will employ a multi-method process to identify all potentially relevant studies as detailed below.

Electronic Searches

PubMed

A search string employing medical subject heading (MeSH) terms and keyword synonyms will be developed. The PubMed search strategy will be considered the primary search strategy and will provide the basis of the other electronic search strategies. To assist in compiling these terms, the review team will conduct a text analysis of eight articles known to the authors. These articles were selected because they represent both American and non-American publications and will help identify spelling variants. The search strategies will address each of the following concepts:

  • Phthalates—The review team will use the MeSH database (http://www.ncbi.nlm.nih.gov/mesh) to find all MeSH headings and Supplementary Concept headings that relate to the following 11 phthalates: benzylbutyl phthalate (CAS no. 85-68-7), dibutyl phthalate (CAS no. 84-74-2), diethyl phthalate (CAS no. 84-66-2), diethylhexyl phthalate (CAS no. 117-81-7), diisobutyl phthalate (CAS no. 84-69-5), diisononyl phthalate (CAS no. 28553-12-0), diisooctyl phthalate (CAS no. 27554-26-3), dimethyl phthalate (CAS no. 131-11-3), di-n-octyl phthalate (CAS no. 117-84-0), diisodecyl phthalate (CAS no. 26761-40-0), and dipentyl phthalate (CAS no. 131-18-0). The review team will mine the “Entry Terms” list for each of the controlled vocabulary terms identified and include all unique keyword synonyms listed for each. CAS registry numbers for each phthalate substance will also be included in the list of search terms. All MeSH terms, Supplementary Concept terms, keyword synonyms, and CAS registry numbers will be searched together as one concept using the Boolean operator “OR.”
  • Exposure—The review team will use the MeSH database (http://www.ncbi.nlm.nih.gov/mesh) to find all MeSH headings and Supplementary Concept headings that relate to the exposure concept. The review team will mine the “Entry Terms” list for each of the controlled vocabulary terms identified and include all unique keyword synonyms listed for each. All MeSH terms and keyword synonyms will be searched together as one concept using the Boolean operator “OR.”
  • Human studies—The search filter developed by the Cochrane Library to identify human studies (see http://handbook.cochrane.org/ part 2, section 6.4.f) will be modified to comply with PubMed formatting.
  • Outcomes—The review team will use the MeSH database (http://www.ncbi.nlm.nih.gov/mesh) to find all MeSH headings and Supplementary Concept headings that relate to male genital abnormalities.
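
The concept-by-concept structure in the list above (terms joined with "OR" within a concept, and concepts then joined with "AND") can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the term lists are a small subset of the full strings under "Search Strategies," and the protocol does not state that the searches were assembled programmatically.

```python
def or_group(terms):
    """Join the terms of one concept with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts, exclude=None):
    """AND the per-concept OR groups together; optionally append a NOT filter."""
    query = " AND ".join(or_group(terms) for terms in concepts)
    if exclude:
        query = f"({query}) NOT {exclude}"
    return query


# Illustrative subsets only (hypothetical selection of terms from the full strings).
phthalates = ['"Dibutyl Phthalate"[Mesh]', '"DEHP"[tw]', '"117-81-7"[tw]']
exposure = ['"Maternal Exposure"[Mesh]', '"Exposure"[tw]']
outcomes = ['"Hypospadias"[tw]', '"Anogenital"[tw]']
human_filter = '("Animals"[Mesh] NOT ("Animals"[Mesh] AND "Humans"[Mesh]))'

query = build_query([phthalates, exposure, outcomes], exclude=human_filter)
print(query)
```

Building the string from lists makes it easy to confirm that every opening parenthesis has a matching close before the search is run.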

Each of the above concepts will be searched together using the Boolean operator “AND.” There will not be limitations on date of publication, language, or publication type. All citation records will be exported to EndNote. Additional citations identified through the search processes identified below will also be exported to the project EndNote library. Duplicates will be removed from the citation library using the “Find Duplicates” tool in EndNote as well as a manual review of citations by the project librarian to identify any duplicates not found during the automated process. The number of citations found in each database will be recorded, as well as the number of duplicates and final tally of unique citations. The final library of citations will be uploaded to the Health Assessment Workspace Collaborative Web-based tool (www.hawcproject.org) for systematic reviews where they will be reviewed by the team.
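
The bookkeeping described above (citations per database, number of duplicates, and final tally of unique citations) can be illustrated with a minimal Python sketch. It is not EndNote's “Find Duplicates” algorithm and does not replace the librarian's manual review; the matching rule (normalized title plus year), the field names, and the three records are all hypothetical.

```python
import re
from collections import Counter


def normalize_title(title):
    """Lowercase and collapse punctuation so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each (normalized title, year) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize_title(rec["title"]), rec["year"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical export rows; real records would come from the database exports.
records = [
    {"source": "PubMed", "title": "Example study of phthalates and AGD.", "year": 2010},
    {"source": "Embase", "title": "Example Study of Phthalates and AGD", "year": 2010},
    {"source": "Toxline", "title": "Another hypothetical record", "year": 2012},
]

per_database = Counter(rec["source"] for rec in records)
unique = deduplicate(records)
n_duplicates = len(records) - len(unique)
print(dict(per_database), n_duplicates, len(unique))
```

A simple key like this will miss duplicates whose titles differ materially between databases, which is one reason a manual pass remains useful.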

Embase

The controlled vocabulary database Emtree is used by Embase. For each MeSH term identified through the process above, Emtree will be searched for the appropriate corresponding term. Additional keywords will be identified using the list of synonyms from each Emtree record and added to the keywords from the MeSH records.

Toxline

The review team will develop the Toxline search strategy by removing any database-specific formatting from the PubMed search strategy to create a keyword-only search (Toxline does not employ a controlled vocabulary).
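
As a rough illustration of removing database-specific formatting, the Python sketch below strips PubMed field tags such as [tw], [Mesh], and [Supplementary Concept] from a query fragment. The regular expression and function name are assumptions made for illustration; the actual Toxline string was also edited by hand (for example, its term list and animal filter differ from the PubMed version), so this shows only the tag-removal step.

```python
import re

# Matches a PubMed field tag such as [tw], [Mesh], [Mesh:NoExp], or
# [Supplementary Concept], plus any whitespace immediately before the bracket.
FIELD_TAG = re.compile(r"\s*\[[^\]]+\]")


def to_keyword_only(pubmed_query):
    """Return the query with all bracketed field tags removed."""
    return FIELD_TAG.sub("", pubmed_query)


pubmed_fragment = (
    '("diethyl phthalate" [Supplementary Concept] OR "DEHP"[tw]) '
    'AND ("Exposure"[tw] OR "Environmental Exposure"[Mesh:NoExp])'
)
keyword_only = to_keyword_only(pubmed_fragment)
print(keyword_only)
# ("diethyl phthalate" OR "DEHP") AND ("Exposure" OR "Environmental Exposure")
```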

Search Strategies

PubMed

(“butylbenzyl phthalate” [Supplementary Concept] OR “Dibutyl Phthalate”[Mesh] OR “diethyl phthalate” [Supplementary Concept] OR “Diethylhexyl Phthalate”[Mesh] OR “diisobutyl phthalate” [Supplementary Concept] OR “diisononyl phthalate” [Supplementary Concept] OR “diisooctyl phthalate” [Supplementary Concept] OR “dimethyl phthalate” [Supplementary Concept] OR “di-n-octyl phthalate” [Supplementary Concept] OR “benzylbutyl phthalate”[tw] OR “benzyl butyl phthalate”[tw] OR “butyl benzyl phthalate”[tw] OR “butylbenzyl phthalate”[tw] OR “butylbenzylphthalate”[tw] OR “phthalic acid butyl benzyl ester”[tw] OR “butyl-benzyl-phthalate”[tw] OR “BBzP”[tw] OR “BzBP”[tw] OR “BBPHT”[tw] OR “85-68-7”[tw] OR “Dibutyl Phthalate”[tw] OR “Di-n-Butyl Phthalate”[tw] OR “Di n Butyl Phthalate”[tw] OR “Butyl Phthalate”[tw] OR “d n butyl phthalate”[tw] OR “dbp”[tw] OR “di n butyl phthalate”[tw] OR “dibutyl phthalate”[tw] OR “dibutylphthalate”[tw] OR “phthalic acid di n butyl este”[tw] OR “84-74-2”[tw] OR “phthalic acid diethyl ester”[tw] OR “diethyl phthalate”[tw] OR “diethylphthalate”[tw] OR “ethyl phthalate”[tw] OR “di-ethyl phthalate”[tw] OR “DEP”[tw] OR “84-66-2”[tw] OR “bis (2 ethylhexyl) phthalate”[tw] OR “bis (2 ethylhexyl) phthalate”[tw] OR “bis (2 ethylhexyl) phthalate”[tw] OR “bis (2 ethylhexylphthalate)”[tw] OR “Bis(2-ethylhexyl)phthalate”[tw] OR “DEHP”[tw] OR “di (2 ethylhexyl) phthalate”[tw] OR “di 2 ethylhexyl phthalate”[tw] OR “di 2 ethylhexylphthalate”[tw] OR “Di-2-Ethylhexylphthalate”[tw] OR “diethylhexyl phthalate”[tw] OR “Dioctyl Phthalate”[tw] OR “octoil”[tw] OR “phthalic acid di 2 ethylhexyl ester”[tw] OR “phthalic acid diethylhexyl ester”[tw] OR “117-81-7”[tw] OR “di-iso-butyl phthalate”[tw] OR “DiBP”[tw] OR “84-69-5”[tw] OR “di-isononylphthalate”[tw] OR “ENJ 2065”[tw] OR “ENJ-2065”[tw] OR “di-isononyl phthalate”[tw] OR “di-iso-nonyl phthalate”[tw] OR “DINP”[tw] OR “28553-12-0”[tw] OR “Diisooctylphthalate”[tw] OR “27554-26-3”[tw] OR “diamyl phthalate”[tw] OR “dipentyl 
phthalate”[tw] OR “phthalic acid dipentyl ester”[tw] OR “dipentyl benzene-1,2-dicarboxylate”[tw] OR “di-n-pentyl phthalate”[tw] OR “131-18-0”[tw] OR “Dimethyl phthalate”[tw] OR “Dimethylphthalate”[tw] OR “Avolin”[tw] OR “Citrola”[tw] OR “Dmp”[tw] OR “dmp30”[tw] OR “fermine”[tw] OR “methyl phthalate”[tw] OR “mipax”[tw] OR “mugia”[tw] OR “palatinol m”[tw] OR “sketofax”[tw] OR “131-11-3”[tw] OR “Di-n-octyl phthalate”[tw] OR “di n octyl phthalate”[tw] OR “di n octylphthalate”[tw] OR “dioctyl phthalate”[tw] OR “dioctylphthalate”[tw] OR “di(n-octyl)phthalate”[tw] OR “phthalic acid di n octyl ester”[tw] OR “DNOP”[tw] OR “117-84-0”[tw]) AND (“Maternal Exposure”[Mesh] OR “Environmental Exposure”[Mesh:NoExp] OR “Prenatal Exposure Delayed Effects”[Mesh] OR “Exposure”[tw] OR “Exposed”[tw] OR “exposures”[tw] OR “exposing”[tw] AND (“Genital Diseases, Male”[Mesh] OR “Genitalia, Male”[Mesh] OR “Testosterone”[Mesh:NoExp] OR “Androgens”[Mesh] OR “Anogenital”[tw] OR “AGD”[tw] OR “AGI”[tw] OR “ASD”[tw] OR “APD”[tw] OR “Urogenital”[tw] OR “Penile”[tw] OR “penis”[tw] OR “Anoscrotal”[tw] OR “Anopenile”[tw] OR “anorectal”[tw] OR “Testosterone”[tw] OR “androgen”[tw] OR “androgens”[tw] OR “Hypospadias”[tw] OR “hypospadia”[tw] OR “Testis”[tw] OR “testes”[tw] OR ((“Anorectal”[tw] OR “genital”[tw] OR “genitals”[tw] OR “testes”[tw] OR “rectum”[tw]) AND (“malformation”[tw] OR “malformations”[tw] OR “development”[tw] OR “abnormalities”[tw] OR “abnormality”[tw] OR “dysplasia”[tw])) OR (“Male”[tw] and (“genital”[tw] OR “genitals”[tw] OR “genitalia”[tw])) OR (“Anus”[tw] AND (“genital”[tw] OR “genitals”[tw] OR “genitalia”[tw]))) NOT (((“Animals”[Mesh] NOT (“Animals”[Mesh] AND (“Humans”[Mesh]))))

Embase

“phthalic acid benzyl butyl ester”/exp OR “phthalic acid dibutyl ester”/exp OR “phthalic acid diethyl ester”/exp OR “phthalic acid bis(2 ethylhexyl) ester”/exp OR “phthalic acid dimethyl ester”/exp OR “phthalic acid dioctyl ester”/exp OR “benzylbutyl phthalate” OR “benzyl butyl phthalate” OR “butyl benzyl phthalate” OR “butylbenzyl phthalate” OR “butylbenzylphthalate” OR “phthalic acid butyl benzyl ester” OR “butyl-benzyl-phthalate” OR “BBzP” OR “BzBP” OR “BBPHT” OR “85-68-7” OR “Dibutyl Phthalate” OR “Di-n-Butyl Phthalate” OR “Di n Butyl Phthalate” OR “Butyl Phthalate” OR “d n butyl phthalate” OR “dbp” OR “di n butyl phthalate” OR “dibutyl phthalate” OR “dibutylphthalate” OR “phthalic acid di n butyl este” OR “84-74-2” OR “phthalic acid diethyl ester” OR “diethyl phthalate” OR “diethylphthalate” OR “ethyl phthalate” OR “di-ethyl phthalate” OR “DEP” OR “84-66-2” OR “bis (2 ethylhexyl) phthalate” OR “bis (2 ethylhexyl) phthalate” OR “bis (2 ethylhexyl) phthalate” OR “bis (2 ethylhexylphthalate)” OR “Bis(2-ethylhexyl)phthalate” OR “DEHP” OR “di (2 ethylhexyl) phthalate” OR “di 2 ethylhexyl phthalate” OR “di 2 ethylhexylphthalate” OR “Di-2-Ethylhexylphthalate” OR “diethylhexyl phthalate” OR “Dioctyl Phthalate” OR “octoil” OR “phthalic acid di 2 ethylhexyl ester” OR “phthalic acid diethylhexyl ester” OR “117-81-7” OR “di-iso-butyl phthalate” OR “DiBP” OR “84-69-5” OR “di-isononylphthalate” OR “ENJ 2065” OR “ENJ-2065” OR “di-isononyl phthalate” OR “di-isononyl phthalate” OR “DINP” OR “28553-12-0” OR “Diisooctylphthalate” OR “27554-26-3” OR “diamyl phthalate” OR “dipentyl phthalate” OR “phthalic acid dipentyl ester” OR “dipentyl benzene-1,2dicarboxylate” OR “di-n-pentyl phthalate” OR “131-18-0” OR “Dimethyl phthalate” OR “Dimethylphthalate” OR “Avolin” OR “Citrola” OR “Dmp” OR “dmp30” OR “fermine” OR “methyl phthalate” OR “mipax” OR “mugia” OR “palatinol m” OR “sketofax” OR “131-11-3” OR “Di-n-octyl phthalate” OR “di n octyl phthalate” OR “di n octylphthalate” OR 
“dioctyl phthalate” OR “dioctylphthalate” OR “di(n-octyl)phthalate” OR “phthalic acid di n octyl ester” OR “DNOP” OR “117-84-0” AND (‘male genital system disease'/exp OR ‘male genital system'/exp OR ‘testosterone'/exp OR ‘androgen'/de OR “Anogenital”:ti,ab OR “AGD”:ti,ab OR “AGI”:ti,ab OR “ASD”:ti,ab OR “APD”:ti,ab OR “Urogenital”:ti,ab OR “Penile”:ti,ab OR “penis”:ti,ab OR “Anoscrotal”:ti,ab OR “Anopenile”:ti,ab OR “anorectal”:ti,ab OR “Testosterone”:ti,ab OR “androgen”:ti,ab OR “androgens”:ti,ab OR “Hypospadias”:ti,ab OR “hypospadia”:ti,ab OR “Testis”:ti,ab OR “testes”:ti,ab OR ((“Anorectal”:ti,ab OR “genital”:ti,ab OR “genitals”:ti,ab OR “testes”:ti,ab OR “rectum”:ti,ab) AND (“malformation”:ti,ab OR “malformations”:ti,ab OR “development”:ti,ab OR “abnormalities”:ti,ab OR “abnormality”:ti,ab OR “dysplasia”:ti,ab)) OR (“Male”:ti,ab and (“genital”:ti,ab OR “genitals”:ti,ab OR “genitalia”:ti,ab)) OR (“Anus”:ti,ab AND (“genital”:ti,ab OR “genitals”:ti,ab OR “genitalia”:ti,ab))) AND (‘prenatal exposure'/exp OR ‘environmental exposure'/exp OR ‘exposure' OR ‘exposed' OR ‘exposures' OR ‘exposing' NOT (‘animal'/exp NOT (‘animal'/exp AND ‘human'/exp))

Toxline

((“117-81-7” OR “117-84-0” OR “131-11-3” OR “131-18-0” OR “27554-26-3” OR “28553-12-0” OR “84-66-2” OR “84-69-5” OR “84-74-2” OR “85-68-7” OR “Avolin” OR “BBPHT” OR “BBzP” OR “bis 2 ethylhexylphthalate” OR “Bis 2-ethylhexyl phthalate” OR “butylbenzylphthalate” OR “butyl-benzyl-phthalate” OR “BzBP” OR “Dbp” OR “DEP” OR “di n octylphthalate” OR “DiBP” OR “diethylphthalate” OR “di-isononylphthalate” OR “Diisooctylphthalate” OR “Dimethylphthalate” OR “DINP” OR “dioctylphthalate” OR “dipentyl benzene-1,2-dicarboxylate” OR “Dmp” OR “dmp30” OR “DNOP” OR “ENJ 2065” OR “fermine” OR “mipax” OR “mugia” OR “octoil” OR “o-phthalate” OR “o-phthalates” OR “palatinol” OR “sketofax”) AND (“Exposure” OR “Exposed” OR “exposures” OR “exposing”) AND (“Anogenital” OR “AGD” OR “AGI” OR “ASD” OR “APD” OR “Urogenital” OR “Penile” OR “penis” OR “Anoscrotal” OR “Anopenile” OR “anorectal” OR “Testosterone” OR “androgen” OR “androgens” OR “Hypospadias” OR “hypospadia” OR “Testis” OR “testes” OR ((“Anorectal” OR “genital” OR “genitals” OR “testes” OR “rectum”) AND (“malformation” OR “malformations” OR “development” OR “abnormalities” OR “abnormality” OR “dysplasia”)) OR (“Male” and (“genital” OR “genitals” OR “genitalia”)) OR (“Anus” AND (“genital” OR “genitals” OR “genitalia”))) NOT (animals OR animal OR mice OR mouse OR rats OR rat OR rodent OR rodents OR fish)

SECTION D-1c. SCREENING FORMS

Title and Abstract Screening Form

Instructions: When a citation is excluded, the reason for exclusion should be specified.

Exclusion Reasons
No original data (e.g., review article, commentary, editorial)
Study does not include male humans
Study does not report phthalate exposure
No relevant outcomes
Incomplete information (e.g., conference abstract, meeting poster)
Not in English and unable to determine eligibility
Other (explanation required)

Full-Text Screening Form

Instructions: When a citation is excluded, the reason for exclusion should be specified.

Exclusion Reasons
No original data (e.g., review article, commentary, editorial)
Study does not include male humans
Study does not report exposure to one or more of the phthalates listed in the PECO statement
Study does not have biomonitoring data specific to phthalate exposure
Study does not include in utero exposure
Study does not assess or report anogenital distance, anogenital index, anoscrotal distance, anopenile distance, hypospadias, or testosterone concentration measured during gestation or at delivery
No comparator group (male humans exposed in utero to lower concentrations of phthalates)
Not in English
Other (explanation required)
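
The two rules stated on the forms (an excluded citation needs a reason, and "Other" needs an explanation) can be expressed as a small validated record. The Python sketch below is hypothetical: the class and field names are invented for illustration, it uses the title-and-abstract reasons only, and it does not describe how HAWC stores screening decisions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExclusionReason(Enum):
    NO_ORIGINAL_DATA = "No original data"
    NO_MALE_HUMANS = "Study does not include male humans"
    NO_PHTHALATE_EXPOSURE = "Study does not report phthalate exposure"
    NO_RELEVANT_OUTCOMES = "No relevant outcomes"
    INCOMPLETE_INFORMATION = "Incomplete information"
    NOT_IN_ENGLISH = "Not in English and unable to determine eligibility"
    OTHER = "Other"


@dataclass
class ScreeningDecision:
    citation_id: str
    include: bool
    reason: Optional[ExclusionReason] = None
    explanation: str = ""

    def __post_init__(self):
        # An excluded citation must carry a reason; an included one must not.
        if not self.include and self.reason is None:
            raise ValueError("an excluded citation needs an exclusion reason")
        if self.include and self.reason is not None:
            raise ValueError("an included citation should not carry an exclusion reason")
        # "Other" is only acceptable with a written explanation.
        if self.reason is ExclusionReason.OTHER and not self.explanation.strip():
            raise ValueError("'Other' requires an explanation")


decision = ScreeningDecision("cit-001", include=False,
                             reason=ExclusionReason.NO_RELEVANT_OUTCOMES)
print(decision.reason.value)
```

The full-text form could be handled the same way by extending the enumeration with its additional reasons.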

SECTION D-1d. DATA EXTRACTION ELEMENTS FOR HUMAN STUDIES

Funding
  • Funding source(s)
  • Reporting of conflict of interest (COI) by authors (*reporting bias)
Subjects
  • Study population name/description
  • Dates of study and sampling time frame
  • Geography (country, region, state, etc.)
  • Demographics (sex, race/ethnicity, age or life stage at exposure and at outcome assessment)
  • Number of subjects (target, enrolled, n per group in analysis, and participation/follow-up rates) (*missing data bias)
  • Inclusion/exclusion criteria/recruitment strategy (*selection bias)
  • Description of reference group (*selection bias)
Methods
  • Study design (e.g., prospective or retrospective cohort, nested case-control study, cross-sectional, population-based case-control study, intervention, case report, etc.)
  • Length of follow-up (*information bias)
  • Health outcome category (e.g., cardiovascular)
  • Health outcome (e.g., blood pressure) (*reporting bias)
  • Diagnostics or methods used to measure health outcome (*information bias)
  • Confounders or modifying factors and how considered in analysis (e.g., included in final model, considered for inclusion but determined not needed) (*confounding bias)
  • Substance name or CAS number
  • Exposure assessment (e.g., blood, urine, hair, air, drinking water, job classification, residence, administered treatment in controlled study, etc.) (*information bias)
  • Methodological details for exposure assessment (e.g., HPLC-MS/MS, limit of detection) (*information bias)
  • Statistical methods (*information bias)
Results
  • Exposure levels (e.g., mean, median, measures of variance as presented in paper, such as SD, SEM, 75th/90th/95th percentile, minimum/maximum); range of exposure levels, number of exposed cases
  • Statistical findings (e.g., adjusted β, standardized mean difference, adjusted odds ratio, standardized mortality ratio, relative risk, etc.) or description of qualitative results. When possible, measures of effect will be converted to a common metric with associated 95% confidence intervals (CIs). Most often, measures of effect for continuous data are expressed as mean difference, standardized mean difference, and percent control response. Categorical data are typically expressed as odds ratio, relative risk (RR, also called risk ratio), or β values, depending on what metric is most commonly reported in the included studies.
  • Observations on dose response (e.g., trend analysis, description of whether dose-response shape appears to be monotonic, nonmonotonic)
Other
  • Documentation of author queries, use of digital rulers to estimate data values from figures, exposure unit, and statistical result conversions, etc.

Items marked with an asterisk (*) are examples of items that can be used to assess internal validity/risk of bias.
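
To illustrate what converting results to a common metric with a 95% CI can involve, the Python sketch below computes an odds ratio with a Wald-type interval from a 2×2 table and a standardized mean difference from group summaries. These are standard textbook formulas chosen for illustration; the protocol does not specify which formulas the review team applied, and all input numbers are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases (all counts must be > 0).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd


# Hypothetical counts and summary statistics, for illustration only.
or_est, or_lo, or_hi = odds_ratio_ci(a=20, b=80, c=10, d=90)
smd = standardized_mean_difference(m1=10.0, sd1=2.0, n1=30, m2=9.0, sd2=2.0, n2=30)
print(round(or_est, 2), round(or_lo, 2), round(or_hi, 2), round(smd, 2))
```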

SECTION D-1e. RISK OF BIAS QUESTIONS FOR EPIDEMIOLOGIC STUDIES

Cohort Studies

1. Was administered dose or exposure level adequately randomized? [NA]

2. Was allocation to study groups adequately concealed? [NA]

3. Did selection of study participants result in the appropriate comparison groups?

Definitely Low Risk of Bias (++)
  • Direct evidence that subjects (both exposed and nonexposed) were similar (e.g., recruited from the same eligible population, recruited with the same method of ascertainment using the same inclusion and exclusion criteria, and were of similar age and health status), recruited within the same time frame, and had similar participation/response rates.
  • Note: A study will be considered low risk of bias if baseline characteristics of groups differed, but these differences were considered as potential confounding or stratification variables (see question #4).
Probably Low Risk of Bias (+)
  • Indirect evidence that subjects (both exposed and nonexposed) were similar (e.g., recruited from the same eligible population, recruited with the same method of ascertainment using the same inclusion and exclusion criteria, and were of similar age and health status), recruited within the same time frame, and had similar participation/response rates,
  • OR differences between groups would not appreciably bias results.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that subjects (both exposed and nonexposed) were not similar, were recruited within very different time frames, or had very different participation/response rates,
  • OR there is insufficient information provided about the comparison group, including a different rate of nonresponse without an explanation (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that subjects (both exposed and nonexposed) were not similar, were recruited within very different time frames, or had very different participation/response rates.

4. Did study design or analysis account for important confounding and modifying variables?

Definitely Low Risk of Bias (++)
  • Direct evidence that appropriate adjustments or explicit considerations were made for the variables listed below as potential confounders and/or effect measure modifiers in the final analyses through the use of statistical models to reduce research-specific bias, including standardization, matching, adjustment in multivariate model, stratification, propensity scoring, or other methods that were appropriately justified. Acceptable consideration of appropriate adjustment factors includes cases when the factor is not included in the final adjustment model because the author conducted analyses that indicated it did not need to be included,
  • AND there is direct evidence that primary covariates and confounders were assessed using valid and reliable measurements,
  • AND there is direct evidence that other exposures anticipated to bias results were not present or were appropriately measured and adjusted for. In occupational studies or studies of contaminated sites, other chemical exposures known to be associated with those settings were appropriately considered.
  • Note: The following variables should be considered as key/primary potential confounders and/or effect measure modifiers that must be considered in the analyses of the relationship between phthalate exposure and male reproductive outcomes: a measure of weight or body size at exam, a measure of weight or body size at birth, age at exam, and a measure of urinary dilution (specific gravity, creatinine, or osmolality) or an indication that the exposure measure was adjusted for urinary dilution.
  • Note: The following variables should be considered as additional potential confounders and/or effect measure modifiers, but consideration of these variables is not required in the analysis of the relationship between phthalate exposure and male reproductive outcomes: maternal age, pre-pregnancy or maternal BMI, maternal education, maternal income, maternal race/ethnicity, and time of day of urine collection.
Probably Low Risk of Bias (+)
  • Indirect evidence that appropriate adjustments were made,
  • OR it is deemed that not considering or only considering a partial list of covariates or confounders in the final analyses would not appreciably bias results,
  • AND there is evidence (direct or indirect) that covariates and confounders considered were assessed using valid and reliable measurements,
  • OR it is deemed that the measures used would not appreciably bias results (i.e., the authors justified the validity of the measures from previously published research),
  • OR it is deemed that co-exposures present would not appreciably bias results.
  • Note: This includes insufficient information provided on co-exposures in general population studies.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that the distribution of important covariates and known confounders differed between the groups and was not appropriately adjusted for in the final analyses,
  • OR there is insufficient information provided about the distribution of known confounders (record “NR” as basis for answer),
  • OR there is indirect evidence that covariates and confounders considered were assessed using measurements of unknown validity,
  • OR there is insufficient information provided about the measurement techniques used to assess covariates and confounders considered (record “NR” as basis for answer),
  • OR there is insufficient information provided about co-exposures in occupational studies or studies of contaminated sites where high exposures to other chemical exposures would have been reasonably anticipated (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that the distribution of important covariates and known confounders differed between the groups, confounding was demonstrated, and was not appropriately adjusted for in the final analyses,
  • OR there is direct evidence that covariates and confounders considered were assessed using nonvalid measurements,
  • OR there is direct evidence that there was an unbalanced provision of additional co-exposures across the primary study groups, which were not appropriately adjusted for.
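
Question 4 lists a measure of urinary dilution (specific gravity, creatinine, or osmolality) as a key variable. As a sketch of what such an adjustment can look like, the Python functions below apply two conventions that are widely used in biomonitoring studies: scaling a concentration by (SG_ref − 1)/(SG − 1), and dividing by the creatinine concentration. The protocol does not prescribe either formula, the reference specific gravity is study-specific, and the numbers shown are hypothetical.

```python
def specific_gravity_adjust(concentration, sg, sg_ref):
    """Scale a urinary concentration to a reference specific gravity.

    Uses the common form C_adj = C * (sg_ref - 1) / (sg - 1), where sg_ref is
    typically a study-specific value such as the population median.
    """
    if sg <= 1.0 or sg_ref <= 1.0:
        raise ValueError("specific gravity values must be greater than 1.0")
    return concentration * (sg_ref - 1.0) / (sg - 1.0)


def creatinine_adjust(concentration_ug_per_l, creatinine_g_per_l):
    """Express a urinary concentration per gram of creatinine (ug/g)."""
    if creatinine_g_per_l <= 0:
        raise ValueError("creatinine concentration must be positive")
    return concentration_ug_per_l / creatinine_g_per_l


# Hypothetical sample: dilute urine (SG 1.010) scaled to a reference SG of 1.020.
sg_adjusted = specific_gravity_adjust(10.0, sg=1.010, sg_ref=1.020)
cr_adjusted = creatinine_adjust(50.0, creatinine_g_per_l=1.25)
print(round(sg_adjusted, 2), cr_adjusted)
```

Some studies instead include the dilution measure as a covariate in the regression model rather than adjusting the concentration directly; either approach would count as considering urinary dilution under the criterion above.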

5. Were experimental conditions identical across study groups? [NA]

6. Were the research personnel blinded to the study group during the study? [NA]

7. Were outcome data complete without attrition or exclusion from analysis?

Definitely Low Risk of Bias (++)
  • Direct evidence that loss of subjects (i.e., incomplete outcome data) was adequately addressed and reasons were documented when human subjects were removed from a study.
  • Note: Acceptable handling of subject attrition includes very little missing outcome data; reasons for missing subjects unlikely to be related to outcome (for survival data, censoring unlikely to be introducing bias); missing outcome data balanced in numbers across study groups, with similar reasons for missing data across groups,
  • OR missing data have been imputed using appropriate methods and characteristics of subjects lost to follow-up or with unavailable records are described in an identical way and are not significantly different from those of the study participants.
Probably Low Risk of Bias (+)
  • Indirect evidence that loss of subjects (i.e., incomplete outcome data) was adequately addressed and reasons were documented when human subjects were removed from a study,
  • OR it is deemed that the proportion lost to follow-up would not appreciably bias results. This would include reports of no statistical differences in characteristics of subjects lost to follow-up or with unavailable records from those of the study participants. Generally, the higher the ratio of participants with missing data to participants with events, the greater potential there is for bias. For studies with a long duration of follow-up, some withdrawals for such reasons are inevitable.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that loss of subjects (i.e., incomplete outcome data) was unacceptably large and not adequately addressed,
  • OR there is insufficient information provided about numbers of subjects lost to follow-up (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that loss of subjects (i.e., incomplete outcome data) was unacceptably large and not adequately addressed.
  • Note: Unacceptable handling of subject attrition includes reason for missing outcome data likely to be related to true outcome, with either imbalance in numbers or reasons for missing data across study groups; or potentially inappropriate application of imputation.

8. Can we be confident in the exposure characterization?

Definitely Low Risk of Bias (++)
  • Direct evidence that exposure was consistently assessed (i.e., under the same method and time frame) using well-established methods that directly measure exposure (e.g., measurement of urinary phthalate metabolites [and a measure of urinary dilution was available], amniotic fluid oxidative phthalate metabolites),
  • OR exposure was assessed using less-established methods that directly measure exposure and are validated against well-established methods,
  • AND exposure was assessed in a relevant time-window for development of the outcome,
  • AND there is sufficient range or variation in exposure measurements across groups to potentially identify associations with health outcomes,
  • AND there is evidence that most of the exposure data measurements are above the limit of quantitation for the assay such that different exposure groups can be distinguished.
Probably Low Risk of Bias (+)
  • Indirect evidence that the exposure was consistently assessed using well-established methods that directly measure exposure,
  • AND exposure was assessed in a relevant time-window for development of the outcome,
  • AND there is sufficient range or variation in exposure measurements across groups to potentially identify associations with health outcomes (at a minimum, distinguishing high exposure or ever exposed from low exposure or never exposed).
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that the exposure was assessed using poorly validated methods that directly measure exposure,
  • OR there is insufficient information provided about the exposure assessment, including validity and reliability, but no evidence for concern about the method used (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that the exposure was assessed using methods with poor validity.
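
As a rough illustration of the limit-of-quantitation criterion above, the following Python sketch computes the share of exposure measurements that exceed an assay's limit of quantitation (LOQ). The function name, the sample concentrations, and the LOQ value are hypothetical. The protocol does not define a numeric cutoff for "most," so none is applied here.

```python
def fraction_above_loq(measurements, loq):
    """Return the fraction of measurements strictly above the LOQ."""
    if not measurements:
        raise ValueError("no measurements supplied")
    above = sum(1 for value in measurements if value > loq)
    return above / len(measurements)


# Hypothetical urinary metabolite concentrations (ng/mL) and a hypothetical LOQ.
sample = [0.4, 1.2, 3.5, 0.9, 7.8, 2.1, 0.2, 5.0]
share = fraction_above_loq(sample, loq=0.5)
print(f"{share:.0%} of measurements exceed the LOQ")
```

A reviewer would still need to judge whether the resulting share is sufficient to distinguish exposure groups in the study at hand.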

9. Can we be confident in the outcome assessment?

Definitely Low Risk of Bias (++)
  • Direct evidence that the male reproductive outcome was assessed using well-established methods (e.g., gold standard),
  • AND there is direct evidence that the outcome assessors were adequately blinded to the study group or exposure level, and it is unlikely that they could have broken the blinding prior to reporting outcomes.
  • Note: Well-established methods will depend on the outcome, and include: 1. Training on AGD measurements for all examiners using well-described methods (preferred method is using calipers but other methods will be considered). 2. Intra- and inter-rater reliability assessed. 3. For hypospadias diagnosis, direct exam by urologists/pediatric urologist or examiners who participated in training. 4. Testosterone and/or free testosterone measured in serum or amniotic fluid by HPLC, GC-MS, LC-MS, or equilibrium dialysis.
Probably Low Risk of Bias (+)
  • Indirect evidence that the outcome was assessed using acceptable methods (i.e., deemed valid and reliable but not the gold standard), such as non-caliper AGD measurements or testosterone measurements using radioimmunoassay,
  • OR it is deemed that the outcome assessment methods used would not appreciably bias results,
  • AND there is indirect evidence that the outcome assessors were adequately blinded to the study group, and it is unlikely that they could have broken the blinding prior to reporting outcomes,
  • OR it is deemed that lack of adequate blinding of outcome assessors would not appreciably bias results, which is more likely to apply to objective outcome measures.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that the outcome assessment method is an insensitive instrument (e.g., a questionnaire used to assess outcomes with no information on validation), or AGD assessment without intra-rater and/or inter-rater reliability, or hypospadias measured from the medical record,
  • OR there is indirect evidence that it was possible for outcome assessors to infer the study group prior to reporting outcomes,
  • OR there is insufficient information provided about blinding of outcome assessors (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that the outcome assessment method is an insensitive instrument, or no training for AGD measurement, or no description of AGD measurement or hypospadias assessment methods, or no description of methods for testosterone assays,
  • OR there is direct evidence for lack of adequate blinding of outcome assessors, including no blinding or incomplete blinding.

10. Were all measured outcomes reported?

Definitely Low Risk of Bias (++)
  • Direct evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have been reported.
Probably Low Risk of Bias (+)
  • Indirect evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have been reported,
  • OR analyses that had not been planned in advance (i.e., retrospective unplanned subgroup analyses) are clearly indicated as such, and it is deemed that the unplanned analyses were appropriate and that selective reporting would not appreciably bias results (e.g., appropriate analyses of an unexpected effect). This would include outcomes reported with insufficient detail, such as only reporting that results were statistically significant (or not).
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have not been reported,
  • OR there is indirect evidence that unplanned analyses were included that may appreciably bias results,
  • OR there is insufficient information provided about selective outcome reporting (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have not been reported. In addition to not reporting outcomes, this would include reporting outcomes based on a composite score without individual outcome components or outcomes reported using measurements, analysis methods, or subsets of the data (e.g., subscales) that were not prespecified or reporting outcomes not prespecified, or that unplanned analyses were included that would appreciably bias results.

11. Were there no other potential threats to internal validity?

There are no phthalate-specific additions to the risk-of-bias questions for this evaluation. This question will be used to examine individual studies for appropriate statistical methods (e.g., confirmation of homogeneity of variance for ANOVA and other statistical tests that require normally distributed data). It will also be used for risk-of-bias considerations that do not fit under the other questions.

Cross-Sectional and Case-Series Studies

1. Was administered dose or exposure level adequately randomized? [NA]

2. Was allocation to study groups adequately concealed? [NA]

3. Did selection of study participants result in the appropriate comparison groups? [NA to Case Series]

Definitely Low Risk of Bias (++)
  • Direct evidence that subjects (both exposed and nonexposed) were similar (e.g., recruited from the same eligible population, recruited with the same method of ascertainment using the same inclusion and exclusion criteria, and were of similar age and health status), recruited within the same time frame, and had similar participation/response rates.
  • Note: A study will be considered low risk of bias if baseline characteristics of groups differed, but these differences were considered as potential confounding or stratification variables (see question #4).
Probably Low Risk of Bias (+)
  • Indirect evidence that subjects (both exposed and nonexposed) were similar (e.g., recruited from the same eligible population, recruited with the same method of ascertainment using the same inclusion and exclusion criteria, and were of similar age and health status), recruited within the same time frame, and had similar participation/response rates,
  • OR differences between groups would not appreciably bias results.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that subjects (both exposed and nonexposed) were not similar, were recruited within very different time frames, or had very different participation/response rates,
  • OR there is insufficient information provided about the comparison group, including a different rate of nonresponse without an explanation (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that subjects (both exposed and nonexposed) were not similar, were recruited within very different time frames, or had very different participation/response rates.

4. Did study design or analysis account for important confounding and modifying variables?

Definitely Low Risk of Bias (++)
  • Direct evidence that appropriate adjustments or explicit considerations were made for the variables listed below as potential confounders and/or effect measure modifiers in the final analyses through the use of statistical models to reduce research-specific bias, including standardization, matching, adjustment in multivariate model, stratification, propensity scoring, or other methods that were appropriately justified. Acceptable consideration of appropriate adjustment factors includes cases when the factor is not included in the final adjustment model because the author conducted analyses that indicated it did not need to be included,
  • AND there is direct evidence that primary covariates and confounders were assessed using valid and reliable measurements,
  • AND there is direct evidence that other exposures anticipated to bias results were not present or were appropriately measured and adjusted for. In occupational studies or studies of contaminated sites, other chemical exposures known to be associated with those settings were appropriately considered.
  • Note: The following variables should be considered as key/primary potential confounders and/or effect measure modifiers that must be considered in the analyses of the relationship between phthalate exposure and male reproductive outcomes: a measure of weight or body size at exam, a measure of weight or body size at birth, age at exam, and measure of urinary dilution (specific gravity, creatinine, or osmolality) or indication that exposure measure was adjusted for urinary dilution.
  • Note: The following variables should be considered as additional potential confounders and/or effect measure modifiers, but consideration of these variables is not required in the analysis of the relationship between phthalate exposure and male reproductive outcomes: maternal age, pre-pregnancy or maternal BMI, maternal education, maternal income, maternal race/ethnicity, and time of day of urine collection.
Probably Low Risk of Bias (+)
  • Indirect evidence that appropriate adjustments were made,
  • OR it is deemed that not considering or only considering a partial list of covariates or confounders in the final analyses would not appreciably bias results,
  • AND there is evidence (direct or indirect) that covariates and confounders considered were assessed using valid and reliable measurements,
  • OR it is deemed that the measures used would not appreciably bias results (i.e., the authors justified the validity of the measures from previously published research),
  • OR it is deemed that co-exposures present would not appreciably bias results.
  • Note: This includes insufficient information provided on co-exposures in general population studies.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that the distribution of important covariates and known confounders differed between the groups and was not appropriately adjusted for in the final analyses,
  • OR there is insufficient information provided about the distribution of known confounders (record “NR” as basis for answer),
  • OR there is indirect evidence that covariates and confounders considered were assessed using measurements of unknown validity,
  • OR there is insufficient information provided about the measurement techniques used to assess covariates and confounders considered (record “NR” as basis for answer),
  • OR there is insufficient information provided about co-exposures in occupational studies or studies of contaminated sites where high exposures to other chemicals would have been reasonably anticipated (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that the distribution of important covariates and known confounders differed between the groups, confounding was demonstrated, and was not appropriately adjusted for in the final analyses,
  • OR there is direct evidence that covariates and confounders considered were assessed using nonvalid measurements,
  • OR there is direct evidence that there was an unbalanced provision of additional co-exposures across the primary study groups, which were not appropriately adjusted for.
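
The key confounders listed above include a measure of urinary dilution (specific gravity, creatinine, or osmolality). As a minimal sketch of what such an adjustment can look like, the Python functions below apply two conventions that are common in the biomonitoring literature: dividing by urinary creatinine, and scaling to a reference specific gravity. The protocol does not prescribe either formula. The function names, units, and choice of reference specific gravity are assumptions made for illustration, and osmolality adjustment is not shown.

```python
def creatinine_adjusted(conc_ug_per_l, creatinine_g_per_l):
    """Express a urinary concentration per gram of creatinine (ug/g)."""
    if creatinine_g_per_l <= 0:
        raise ValueError("creatinine must be positive")
    return conc_ug_per_l / creatinine_g_per_l


def sg_adjusted(conc, sg, sg_reference):
    """Scale a urinary concentration to a reference specific gravity.

    Uses conc * (sg_reference - 1) / (sg - 1); sg_reference is often
    taken as the study population median (an assumption here).
    """
    if sg <= 1.0 or sg_reference <= 1.0:
        raise ValueError("specific gravity must exceed 1.0")
    return conc * ((sg_reference - 1.0) / (sg - 1.0))


# Hypothetical sample: 10 ug/L metabolite, 2 g/L creatinine, SG 1.010.
print(creatinine_adjusted(10.0, 2.0))
print(sg_adjusted(10.0, sg=1.010, sg_reference=1.020))
```

Either adjusted value, or the inclusion of the dilution measure as a model covariate, would count as evidence that urinary dilution was considered.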

5. Were experimental conditions identical across study groups? [NA]

6. Were the research personnel blinded to the study group during the study? [NA]

7. Were outcome data complete without attrition or exclusion from analysis?

Definitely Low Risk of Bias (++)
  • Direct evidence that exclusion of subjects from analyses was adequately addressed, and reasons were documented when subjects were removed from the study or excluded from analyses.
Probably Low Risk of Bias (+)
  • Indirect evidence that exclusion of subjects from analyses was adequately addressed, and reasons were documented when subjects were removed from the study or excluded from analyses.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that exclusion of subjects from analyses was not adequately addressed,
  • OR there is insufficient information provided about why subjects were removed from the study or excluded from analyses (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that exclusion of subjects from analyses was not adequately addressed.
  • Note: Unacceptable handling of subject exclusion from analyses includes reason for exclusion likely to be related to true outcome, with either imbalance in numbers or reasons for exclusion across study groups.

8. Can we be confident in the exposure characterization?

Definitely Low Risk of Bias (++)
  • Direct evidence that exposure was consistently assessed (i.e., under the same method and time frame) using well-established methods that directly measure exposure (e.g., measurement of urinary phthalate metabolites [and a measure of urinary dilution was available], amniotic fluid oxidative phthalate metabolites),
  • OR exposure was assessed using less-established methods that directly measure exposure and are validated against well-established methods,
  • AND exposure was assessed in a relevant time-window for development of the outcome,
  • AND there is sufficient range or variation in exposure measurements across groups to potentially identify associations with health outcomes,
  • AND there is evidence that most of the exposure data measurements are above the limit of quantitation for the assay such that different exposure groups can be distinguished.
Probably Low Risk of Bias (+)
  • Indirect evidence that the exposure was consistently assessed using well-established methods that directly measure exposure,
  • AND exposure was assessed in a relevant time-window for development of the outcome,
  • AND there is sufficient range or variation in exposure measurements across groups to potentially identify associations with health outcomes (at a minimum, distinguishing high exposure or ever exposed from low exposure or never exposed),
  • AND there is evidence that most of the exposure data measurements are above the limit of quantitation for the assay such that different exposure groups can be distinguished.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that the exposure was assessed using poorly validated methods that directly measure exposure,
  • OR there is insufficient information provided about the exposure assessment, including validity and reliability, but no evidence for concern about the method used (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that the exposure was assessed using methods with poor validity,
  • OR evidence of exposure misclassification (e.g., differential recall of self-reported exposure).

9. Can we be confident in the outcome assessment?

Definitely Low Risk of Bias (++)
  • Direct evidence that the male reproductive outcome was assessed using well-established methods (e.g., gold standard),
  • AND there is direct evidence that the outcome assessors (including study subjects, if outcomes were self-reported) were adequately blinded to the study group or exposure level, and it is unlikely that they could have broken the blinding prior to reporting outcomes.
  • Note: Well-established methods will depend on the outcome, and include: 1. Training on AGD measurements for all examiners using well-described methods (preferred method is using calipers but other methods will be considered). 2. Intra- and inter-rater reliability assessed. 3. For hypospadias diagnosis, direct exam by urologists/pediatric urologist or examiners who participated in training. 4. Testosterone and/or free testosterone measured in serum or amniotic fluid by HPLC, GC-MS, LC-MS, or equilibrium dialysis.
Probably Low Risk of Bias (+)
  • Indirect evidence that the outcome was assessed using acceptable methods (i.e., deemed valid and reliable but not the gold standard), such as non-caliper AGD measurements and testosterone measurements using radioimmunoassay,
  • AND subjects had been followed for the same length of time in all study groups,
  • OR it is deemed that the outcome assessment methods used would not appreciably bias results,
  • AND there is indirect evidence that the outcome assessors were adequately blinded to the study group, and it is unlikely that they could have broken the blinding prior to reporting outcomes,
  • OR it is deemed that lack of adequate blinding of outcome assessors would not appreciably bias results, which is more likely to apply to objective outcome measures.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that the outcome assessment method is an insensitive instrument (e.g., a questionnaire used to assess outcomes with no information on validation), or AGD assessment without intra-rater and/or inter-rater reliability, or hypospadias measured from the medical record,
  • OR there is indirect evidence that it was possible for outcome assessors to infer the study group prior to reporting outcomes,
  • OR there is insufficient information provided about blinding of outcome assessors (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that the outcome assessment method is an insensitive instrument, or no training for AGD measurement, or no description of AGD measurement or hypospadias assessment methods, or no description of methods for testosterone assays,
  • OR there is direct evidence for lack of adequate blinding of outcome assessors, including no blinding or incomplete blinding.

10. Were all measured outcomes reported?

Definitely Low Risk of Bias (++)
  • Direct evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have been reported.
Probably Low Risk of Bias (+)
  • Indirect evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have been reported,
  • OR analyses that had not been planned in advance (i.e., retrospective unplanned subgroup analyses) are clearly indicated as such, and it is deemed that the unplanned analyses were appropriate and that selective reporting would not appreciably bias results (e.g., appropriate analyses of an unexpected effect). This would include outcomes reported with insufficient detail, such as only reporting that results were statistically significant (or not).
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have not been reported,
  • OR there is indirect evidence that unplanned analyses were included that may appreciably bias results,
  • OR there is insufficient information provided about selective outcome reporting (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have not been reported. In addition to not reporting outcomes, this would include reporting outcomes based on a composite score without individual outcome components or outcomes reported using measurements, analysis methods, or subsets of the data (e.g., subscales) that were not prespecified or reporting outcomes not prespecified, or that unplanned analyses were included that would appreciably bias results.

11. Were there no other potential threats to internal validity?

There are no phthalate-specific additions to the risk-of-bias questions for this evaluation. This question will be used to examine individual studies for appropriate statistical methods (e.g., confirmation of homogeneity of variance for ANOVA and other statistical tests that require normally distributed data). It will also be used for risk-of-bias considerations that do not fit under the other questions.
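
The four-tier scale used throughout these questions, together with the "NR" and "[NA]" conventions, can be captured in a small record structure. The Python sketch below is one possible encoding, offered only as an illustration. The class and field names are hypothetical and are not part of the protocol. It enforces the convention that "NR" is recorded only alongside a Probably High (-) rating.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rating(Enum):
    DEFINITELY_LOW = "++"
    PROBABLY_LOW = "+"
    PROBABLY_HIGH = "-"
    DEFINITELY_HIGH = "--"


@dataclass(frozen=True)
class Answer:
    question: int                # 1-11, as numbered in this protocol
    rating: Optional[Rating]     # None when the question is marked [NA]
    not_reported: bool = False   # True when "NR" is the basis for the answer

    def __post_init__(self):
        # "NR" appears only under the Probably High tier in the criteria.
        if self.not_reported and self.rating is not Rating.PROBABLY_HIGH:
            raise ValueError('"NR" is recorded only with a Probably High (-) rating')

    def symbol(self):
        """Return the symbol a reviewer would record for this answer."""
        if self.rating is None:
            return "NA"
        return "NR" if self.not_reported else self.rating.value


# Hypothetical answers for a single cross-sectional study.
answers = [
    Answer(5, None),
    Answer(7, Rating.PROBABLY_HIGH, not_reported=True),
    Answer(8, Rating.DEFINITELY_LOW),
]
print([a.symbol() for a in answers])
```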

Case-Control Studies

1. Was administered dose or exposure level adequately randomized? [NA]

2. Was allocation to study groups adequately concealed? [NA]

3. Did selection of study participants result in the appropriate comparison groups?

Definitely Low Risk of Bias (++)
  • Direct evidence that cases and controls were similar (e.g., recruited from the same eligible population including being of similar age, gender, ethnicity, and eligibility criteria other than outcome of interest as appropriate), recruited within the same time frame, and controls are described as having no history of the outcome.
  • Note: A study will be considered low risk of bias if baseline characteristics of groups differed, but these differences were considered as potential confounding or stratification variables (see question #4).
Probably Low Risk of Bias (+)
  • Indirect evidence that cases and controls were similar (e.g., recruited from the same eligible population, recruited with the same method of ascertainment using the same inclusion and exclusion criteria, and were of similar age), recruited within the same time frame, and controls are described as having no history of the outcome,
  • OR it is deemed differences between cases and controls would not appreciably bias results.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that controls were drawn from a population very dissimilar from that of the cases or were recruited within very different time frames,
  • OR there is insufficient information provided about the appropriateness of controls, including rate of response reported for cases only (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that controls were drawn from a population very dissimilar from that of the cases or were recruited within very different time frames.

4. Did study design or analysis account for important confounding and modifying variables?

Definitely Low Risk of Bias (++)
  • Direct evidence that appropriate adjustments or explicit considerations were made for the variables listed below as potential confounders and/or effect measure modifiers in the final analyses through the use of statistical models to reduce research-specific bias, including standardization, matching, adjustment in multivariate model, stratification, propensity scoring, or other methods that were appropriately justified. Acceptable consideration of appropriate adjustment factors includes cases when the factor is not included in the final adjustment model because the author conducted analyses that indicated it did not need to be included,
  • AND there is direct evidence that primary covariates and confounders were assessed using valid and reliable measurements,
  • AND there is direct evidence that other exposures anticipated to bias results were not present or were appropriately measured and adjusted for. In occupational studies or studies of contaminated sites, other chemical exposures known to be associated with those settings were appropriately considered.
  • Note: The following variables should be considered as key/primary potential confounders and/or effect measure modifiers that must be considered in the analyses of the relationship between phthalate exposure and male reproductive outcomes: a measure of weight or body size at exam, a measure of weight or body size at birth, age at exam, and measure of urinary dilution (specific gravity, creatinine, or osmolality) or indication that exposure measure was adjusted for urinary dilution.
  • Note: The following variables should be considered as additional potential confounders and/or effect measure modifiers, but consideration of these variables is not required in the analysis of the relationship between phthalate exposure and male reproductive outcomes: maternal age, pre-pregnancy or maternal BMI, maternal education, maternal income, maternal race/ethnicity, and time of day of urine collection.
  • Note: In case-control studies, the original cases and controls may have been matched on the covariates above. If so, adjustment for those covariates is not needed.
Probably Low Risk of Bias (+)
  • Indirect evidence that appropriate adjustments were made,
  • OR it is deemed that not considering or only considering a partial list of covariates or confounders in the final analyses would not appreciably bias results,
  • AND there is evidence (direct or indirect) that covariates and confounders considered were assessed using valid and reliable measurements,
  • OR it is deemed that the measures used would not appreciably bias results (i.e., the authors justified the validity of the measures from previously published research),
  • OR it is deemed that co-exposures present would not appreciably bias results.
  • Note: This includes insufficient information provided on co-exposures in general population studies.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that the distribution of important covariates and known confounders differed between the groups and was not appropriately adjusted for in the final analyses,
  • OR there is insufficient information provided about the distribution of known confounders (record “NR” as basis for answer),
  • OR there is indirect evidence that covariates and confounders considered were assessed using measurements of unknown validity,
  • OR there is insufficient information provided about the measurement techniques used to assess covariates and confounders considered (record “NR” as basis for answer),
  • OR there is insufficient information provided about co-exposures in occupational studies or studies of contaminated sites where high exposures to other chemicals would have been reasonably anticipated (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that the distribution of important covariates and known confounders differed between the groups, confounding was demonstrated, and was not appropriately adjusted for in the final analyses,
  • OR there is direct evidence that covariates and confounders considered were assessed using nonvalid measurements,
  • OR there is direct evidence that there was an unbalanced provision of additional co-exposures across the primary study groups, which were not appropriately adjusted for.

5. Were experimental conditions identical across study groups? [NA]

6. Were the research personnel blinded to the study group during the study? [NA]

7. Were outcome data complete without attrition or exclusion from analysis?

Definitely Low Risk of Bias (++)
  • Direct evidence that exclusion of subjects from analyses was adequately addressed, and reasons were documented when subjects were removed from the study or excluded from analyses.
Probably Low Risk of Bias (+)
  • Indirect evidence that exclusion of subjects from analyses was adequately addressed, and reasons were documented when subjects were removed from the study or excluded from analyses.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that exclusion of subjects from analyses was not adequately addressed,
  • OR there is insufficient information provided about why subjects were removed from the study or excluded from analyses (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that exclusion of subjects from analyses was not adequately addressed.
  • Note: Unacceptable handling of subject exclusion from analyses includes reason for exclusion likely to be related to true outcome, with either imbalance in numbers or reasons for exclusion across study groups.

8. Can we be confident in the exposure characterization?

Definitely Low Risk of Bias (++)
  • Direct evidence that exposure was consistently assessed (i.e., under the same method and time frame) using well-established methods that directly measure exposure (e.g., measurement of urinary phthalate metabolites [and a measure of urinary dilution was available], amniotic fluid oxidative phthalate metabolites),
  • OR exposure was assessed using less-established methods that directly measure exposure and are validated against well-established methods,
  • AND exposure was assessed in a relevant time-window for development of the outcome,
  • AND there is sufficient range or variation in exposure measurements across groups to potentially identify associations with health outcomes,
  • AND there is evidence that most of the exposure data measurements are above the limit of quantitation for the assay such that different exposure groups can be distinguished.
Probably Low Risk of Bias (+)
  • Indirect evidence that the exposure was consistently assessed using well-established methods that directly measure exposure,
  • AND exposure was assessed in a relevant time-window for development of the outcome,
  • AND there is sufficient range or variation in exposure measurements across groups to potentially identify associations with health outcomes (at a minimum, distinguishing high exposure or ever exposed from low exposure or never exposed).
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that the exposure was assessed using poorly validated methods that directly measure exposure,
  • OR there is insufficient information provided about the exposure assessment, including validity and reliability, but no evidence for concern about the method used (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that the exposure was assessed using methods with poor validity.

9. Can we be confident in the outcome assessment?

Definitely Low Risk of Bias (++)
  • Direct evidence that the male reproductive outcome was assessed using well-established methods (e.g., gold standard),
  • AND there is direct evidence that the outcome assessors (including study subjects, if outcomes were self-reported) were adequately blinded to the study group or exposure level, and it is unlikely that they could have broken the blinding prior to reporting outcomes.
  • Note: Well-established methods will depend on the outcome, and include: 1. Training on AGD measurements for all examiners using well-described methods (preferred method is using calipers but other methods will be considered). 2. Intra- and inter-rater reliability assessed. 3. For hypospadias diagnosis, direct exam by urologists/pediatric urologist or examiners who participated in training. 4. Testosterone and/or free testosterone measured in serum or amniotic fluid by HPLC, GC-MS, LC-MS, or equilibrium dialysis.
Probably Low Risk of Bias (+)
  • Indirect evidence that the outcome was assessed using acceptable methods (i.e., deemed valid and reliable but not the gold standard), such as non-caliper AGD measurements and testosterone measurements using radioimmunoassays,
  • OR it is deemed that the outcome assessment methods used would not appreciably bias results,
  • AND there is indirect evidence that the outcome assessors were adequately blinded to the study group, and it is unlikely that they could have broken the blinding prior to reporting outcomes,
  • OR it is deemed that lack of adequate blinding of outcome assessors would not appreciably bias results, which is more likely to apply to objective outcome measures.
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that the outcome assessment method is an insensitive instrument (e.g., a questionnaire used to assess outcomes with no information on validation), or AGD assessment without intra-rater and/or inter-rater reliability, or hypospadias measured from the medical record,
  • OR there is indirect evidence that it was possible for outcome assessors to infer the study group prior to reporting outcomes,
  • OR there is insufficient information provided about blinding of outcome assessors (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that the outcome assessment method is an insensitive instrument, or no training for AGD measurement, or no description of AGD measurement or hypospadias assessment methods, or no description of methods for testosterone assays,
  • OR there is direct evidence for lack of adequate blinding of outcome assessors, including no blinding or incomplete blinding.

10. Were all measured outcomes reported?

Definitely Low Risk of Bias (++)
  • Direct evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have been reported.
Probably Low Risk of Bias (+)
  • Indirect evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have been reported,
  • OR analyses that had not been planned in advance (i.e., retrospective unplanned subgroup analyses) are clearly indicated as such and deemed that unplanned analyses were appropriate and selective reporting would not appreciably bias results (e.g., appropriate analyses of an unexpected effect). This would include outcomes reported with insufficient detail such as only reporting that results were statistically significant (or not).
Probably High Risk of Bias (-) or (NR)
  • Indirect evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have not been reported,
  • OR there is indirect evidence that unplanned analyses were included that may appreciably bias results,
  • OR there is insufficient information provided about selective outcome reporting (record “NR” as basis for answer).
Definitely High Risk of Bias (--)
  • Direct evidence that all of the study's measured outcomes outlined in the protocol, methods, abstract, and/or introduction (that are relevant for the evaluation) have not been reported. In addition to not reporting outcomes, this would include reporting outcomes based on composite score without individual outcome components or outcomes reported using measurements, analysis methods, or subsets of the data (e.g., subscales) that were not prespecified or reporting outcomes not prespecified, or that unplanned analyses were included that would appreciably bias results.

11. Were there no other potential threats to internal validity?

There are no phthalate-specific additions to the risk of bias questions for this evaluation. This question will be used to examine individual studies for appropriate statistical methods (e.g., confirmation of homogeneity of variance for ANOVA and other statistical tests that require normally distributed data). It will also be used for risk of bias considerations that do not fit under the other questions.

SECTION D-1f. AMENDMENTS TO THE PROTOCOL

Additions to the Review Team

The following committee member was added to the review team to supplement expertise and to assist with the workload:

  • David C. Dorman (Chair) is a professor of toxicology in the Department of Molecular Biosciences of North Carolina State University. The primary objective of his research is to provide a refined understanding of chemically induced neurotoxicity in laboratory animals that will lead to improved assessment of potential toxicity in humans. Dr. Dorman's research interests include neurotoxicology, nasal toxicology, pharmacokinetics, and cognition and olfaction in animals. He has chaired or served on several NRC committees, including the Committee on Design and Evaluation of Safer Chemical Substitutions: A Framework to Inform Government and Industry Decisions, the Committee to Review EPA's Draft IRIS Assessment of Formaldehyde, and the Committee to Review the IRIS Process. He has served on other advisory boards for the US Navy, NASA, and USDA and is currently a member of NTP's Board of Scientific Counselors. Dr. Dorman is an elected fellow of the Academy of Toxicological Sciences and a fellow of the American Association for the Advancement of Science. He received a DVM from Colorado State University. He completed a combined PhD and residency program in toxicology at the University of Illinois at Urbana-Champaign, and he is a diplomate of the American Board of Veterinary Toxicology and the American Board of Toxicology.

A consulting firm (ICF International) was hired to assist the committee with extracting data from the epidemiological studies into HAWC. The two ICF staff members who performed the extraction task were:

  • Robyn Blain, who has 22 years of experience reviewing and analyzing public-health and mammalian toxicity studies, with 14 years at ICF. She also has about 10 years of experience reviewing and analyzing epidemiologic and mechanism studies. She has applied her expertise in several work assignments for NTP, using both DRAGON and HAWC as well as the Excel-based ROBINS-E risk of bias tool. Dr. Blain also has been involved in several work assignments for the US Environmental Protection Agency's (EPA's) National Center for Environmental Assessment (NCEA), including authoring Provisional Toxicity Values (PTVs) in support of EPA's Superfund Program and several Integrated Risk Information System (IRIS) Toxicological Reviews; supporting a 2,3,7,8-tetrachlorodibenzo-p-dioxin literature review; evaluating studies for chemical registration under the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) Program; developing robust study summaries in International Uniform Chemical Information Database (IUCLID) version 5; and preparing toxicological assessments for several chemicals. She has conducted multiple literature reviews and has been involved in the production and testing of two relational retrieval databases. Dr. Blain has extensive experience with in vivo toxicological experimentation and has published several articles on the hepatotoxicity of organic solvents, the induction of cancer from single exposures to carcinogens, and hormesis.
  • Pamela Hartman, who has more than 20 years of professional experience in environmental consulting, specializing in exposure and risk assessment, toxicology, literature search and review, technical editing, and document production. For NIEHS, she has conducted data extraction, study quality reviews, and risk of bias assessments for toxicological and epidemiological studies using DRAGON and HAWC for multiple projects, including perfluorooctanoic acid (PFOA)/perfluorooctane sulfonate (PFOS), bisphenol A (BPA), Fluoride, Folic Acid, and Transgenerational Inheritance. Ms. Hartman has also provided support to many work assignments for EPA/NCEA, specifically: Exposure Factors Interactive Resource for Scenarios Tool (ExpoFIRST); numerous IRIS Toxicological Reviews; EPA-Expo-Box; HERO Support; Risk Assessment Training and Experience (RATE) Program—Exposure Assessment (EXA) Course Series; Provisional Toxicity Value (PTV) documents; two Nanomaterial Case Study documents; and Dioxin Reassessment. Ms. Hartman has a BS in Natural Resources from Cornell University and an MA in Environmental Management from Duke University.

SECTION D-2. Results of Literature Searches for Human Studies on the Effects of Phthalates on Male Reproductive-Tract Development

Literature searches were performed on August 15, 2016, using the search strategy presented in the Phthalate (Human) Systematic Review Protocol (Section D-1). A summary of the results is presented below.

Embase: 422
PubMed: 210
Toxline: 111
Total citations found: 743
Duplicates removed: 149
Total unique citations: 594

SECTION D-3. Confidence Ratings for the Body of Evidence from Human Studies of Phthalates and Anogenital Distance

The confidence in the body of evidence from human studies on phthalates and male reproductive-tract development was rated in accordance with the OHAT Guidance (NTP 2015) specified in Section D-1. The results for DEHP are presented first, and the remaining phthalates are subsequently presented in alphabetical order.

DEHP (metabolites MEHP, 5oxo-MEHP, 5OH-MEHP, or sumDEHP Metabolites)

Five human studies of DEHP and AGD (as) or AGD (ap) were available. Figures D3-1 and D3-2 illustrate the data from studies that evaluated sumDEHP metabolites and individual DEHP metabolites, respectively.

FIGURE D3-1. Data pivot of studies that measured sumDEHP metabolites and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/sumdehp-metabolite-effects-agd-or-agd-ap/.

FIGURE D3-2. Data pivot of studies that measured DEHP metabolites and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/mehp-effects-agd-and-agd-ap/.

Factors Considered for Downgrading Confidence
  • Risk of bias: No downgrade. Figure D3-3 shows the risk of bias assessment for each study.
  • Unexplained inconsistencies: No downgrade. Although there is some inconsistency in results across studies, the differences can be explained (at least partially) by different study populations, different ranges of exposure within study population, and differences in timing of collection of urine samples (first trimester vs later samples). Despite these differences, there is general consistency of associations with respect to direction and magnitude of effects (smaller AGD with higher DEHP metabolite exposure).
  • Indirectness: No downgrade. The studies directly addressed the effect of prenatal exposure to DEHP on AGD in males, defined the window of exposure, and assessed the outcome within an appropriate amount of time.
  • Imprecision: No downgrade. Most of the studies included 95% confidence intervals to assess precision of associations.
  • Publication bias: No downgrade (see Table D3-1).

Sources of funding were used to evaluate publication bias in terms of whether a particular sector funded more studies than another.

FIGURE D3-3. Risk of bias heatmap of studies of DEHP and AGD in humans. In HAWC: https://hawcproject.org/summary/visual/341/.

TABLE D3-1. Sources of Funding for the Human Studies on Phthalates.


Factors Considered for Upgrading Confidence
  • Large magnitude: No upgrade. No evidence of a large magnitude of effect size.
  • Dose-response: No upgrade. No evidence of dose-response curve based on one study that performed quartile analyses (see Figure D3-4).
  • Residual confounding: No upgrade. Studies measured and adjusted for important known confounders. Little evidence that unmeasured confounding would affect the results across studies. Direction of potential residual confounding bias is unknown.

BzBP (metabolite MBzP)

Four human studies of BzBP and AGD (as) or AGD (ap) were available (see Figure D3-5).

Factors Considered for Downgrading Confidence
  • Risk of bias: No downgrade. See Figure D3-3 for the risk of bias assessments for the four studies.
  • Unexplained inconsistencies: No downgrade. Although there is some inconsistency in results across studies, the differences can be explained (at least partially) by different study populations, different ranges of exposure within the study population, and differences in timing of collection of urine samples (first trimester vs later samples).
  • Indirectness: No downgrade. The study designs directly addressed the topic of the evaluation.
  • Imprecision: No downgrade. All but one study (Swan 2008) included 95% confidence intervals to assess precision of associations.
  • Publication bias: No downgrade (see Table D3-1).
FIGURE D3-4. Data pivot of the Jensen et al. (2016) study of sumDEHP metabolites and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/sum-dehp-effects-agd-ap-agd-and-agi-quartiles/.

FIGURE D3-5. Data pivot of studies that measured MBzP and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/mbzp-effects-agd-ap-or-agd/.

Factors Considered for Upgrading Confidence
  • Large magnitude: No upgrade. No evidence of a large magnitude of effect size.
  • Dose-response: No upgrade. Unable to assess because no quartile data were available.
  • Residual confounding: No upgrade. Studies measured and adjusted for important known confounders. Little evidence that unmeasured confounding would affect the results across studies. Direction of potential residual confounding bias is unknown.

DBP (metabolite MBP)

Four human studies of MBP and AGD (as) or AGD (ap) were available (see Figure D3-6).

Factors Considered for Downgrading Confidence
  • Risk of bias: No downgrade. See Figure D3-3 for the risk of bias assessments for the four studies.
  • Unexplained inconsistencies: No downgrade. Although there is some inconsistency in results across studies, the differences can be explained (at least partially) by different study populations, different ranges of exposure within the study population, and differences in timing of collection of urine samples (first trimester vs later samples). Despite these differences, there is general consistency of associations with respect to direction and magnitude of effects (smaller AGD with higher MBP metabolite exposure).
  • Indirectness: No downgrade. The study designs directly addressed the topic of the evaluation.
  • Imprecision: No downgrade. All but one study (Swan 2008) included 95% confidence intervals to assess precision of associations.
  • Publication bias: No downgrade (see Table D3-1).
FIGURE D3-6. Data pivot of studies that measured MBP and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/mbp-effects-agd-or-agd-ap/update/.

Factors Considered for Upgrading Confidence
  • Large magnitude: No upgrade. No evidence of a large magnitude of effect size.
  • Dose-response: No upgrade. Unable to assess because no quartile data were available.
  • Residual confounding: No upgrade. Studies measured and adjusted for important known confounders. Little evidence that unmeasured confounding would affect the results across studies. Direction of potential residual confounding bias is unknown.

DEP (metabolite MEP)

Four human studies of DEP and AGD (as) or AGD (ap) were available (see Figure D3-7).

Factors Considered for Downgrading Confidence
  • Risk of bias: No downgrade. See Figure D3-3 for the risk of bias assessments for the four studies.
  • Unexplained inconsistencies: No downgrade. Results are largely null except for Swan (2008).
  • Indirectness: No downgrade. The study designs directly addressed the topic of the evaluation.
  • Imprecision: No downgrade. All but one study (Swan 2008) included 95% confidence intervals to assess precision of associations.
  • Publication bias: No downgrade (see Table D3-1).
Factors Considered for Upgrading Confidence
  • Large magnitude: No upgrade. No evidence of a large magnitude of effect size.
  • Dose-response: No upgrade. Unable to assess because no quartile data were available.
  • Residual confounding: No upgrade. Studies measured and adjusted for important known confounders. Little evidence that unmeasured confounding would affect the results across studies. Direction of potential residual confounding bias is unknown.
FIGURE D3-7. Data pivot of studies that measured MEP and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/mep-effects-agd-or-agd-ap/.

DIBP (metabolite MIBP)

Three human studies of DIBP and AGD (as) or AGD (ap) were available (see Figure D3-8).

Factors Considered for Downgrading Confidence
  • Risk of bias: No downgrade. See Figure D3-3 for the risk of bias assessments for the three studies.
  • Unexplained inconsistencies: No downgrade. Results are largely null, except for Swan (2008), and they are consistent across two other larger studies (Swan et al. 2015; Jensen et al. 2016).
  • Indirectness: No downgrade. The study designs directly addressed the topic of the evaluation.
  • Imprecision: No downgrade. All but one study (Swan 2008) included 95% confidence intervals to assess precision of associations.
  • Publication bias: No downgrade (see Table D3-1).
FIGURE D3-8. Data pivot of studies that measured MIBP and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/mibp-effects-agd-or-agd-ap/.

Factors Considered for Upgrading Confidence
  • Large magnitude: No upgrade. The effect estimates were generally null.
  • Dose-response: No upgrade. Unable to assess because no quartile data were available.
  • Residual confounding: No upgrade. Studies measured and adjusted for important known confounders. Little evidence that unmeasured confounding would affect the results across studies. Direction of potential residual confounding bias is unknown.

DIDP (metabolite MCNP)

One human study that evaluated the relationship between metabolites of DIDP and AGD was available (see Figure D3-9).

FIGURE D3-9. Data pivot of the study that measured MCNP and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/mcnp-effects-agd-or-agd-ap/.

Factors Considered for Downgrading Confidence
  • Risk of bias: No downgrade. The study was rated as having a definitely low risk of bias (see Figure D3-3).
  • Unexplained inconsistencies: No downgrade.
  • Indirectness: No downgrade.
  • Imprecision: No downgrade.
  • Publication bias: No downgrade (see Table D3-1).
Factors Considered for Upgrading Confidence
  • Large magnitude: No upgrade. The effect estimates were generally null.
  • Dose-response: No upgrade. Unable to assess because no quartile data were available.
  • Residual confounding: No upgrade. Study measured and adjusted for important known confounders.

DINP (metabolite MCOP)

Three human studies that evaluated the relationship between metabolites of DINP and AGD were available (see Figure D3-10).

Factors Considered for Downgrading Confidence
  • Risk of bias: No downgrade. See Figure D3-3 for the risk of bias assessments for the three studies.
  • Unexplained inconsistencies: No downgrade. Although there is some inconsistency in results across studies, the differences can be explained (at least partially) by different study populations, different ranges of exposure within study population, and differences in timing of collection of urine samples (first trimester vs later samples).
  • Indirectness: No downgrade. The study designs directly addressed the topic of the evaluation.
  • Imprecision: No downgrade.
  • Publication bias: No downgrade (see Table D3-1).
FIGURE D3-10. Data pivot of studies that measured MCOP and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/mcop-effects-agd-or-agd-ap/.

Factors Considered for Upgrading Confidence
  • Large magnitude: No upgrade. No evidence of a large magnitude of effect size.
  • Dose-response: No upgrade. Unable to assess because no quartile data were available.
  • Residual confounding: No upgrade. Studies measured and adjusted for important known confounders. Little evidence that unmeasured confounding would affect the results across studies. Direction of potential residual confounding bias is unknown.

DMP (metabolite MMP)

One human study of the relationship between metabolites of DMP and AGD was available (see Figure D3-11).

FIGURE D3-11. Data pivot of the study that measured MMP and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/mmp-effects-agd-or-agd-ap/.

Factors Considered for Downgrading Confidence
  • Risk of bias: No downgrade. The study was rated as having a probably low risk of bias (see Figure D3-3).
  • Unexplained inconsistencies: No downgrade.
  • Indirectness: No downgrade.
  • Imprecision: No downgrade.
  • Publication bias: No downgrade (see Table D3-1).
Factors Considered for Upgrading Confidence
  • Large magnitude: No upgrade.
  • Dose-response: No upgrade. Unable to assess because no quartile data were available.
  • Residual confounding: No upgrade. Study measured and adjusted for important known confounders.

DOP (metabolite MCPP)

Two human studies of the relationship between metabolites of DOP and AGD were available (see Figure D3-12).

Factors Considered for Downgrading Confidence
  • Risk of bias: No downgrade. The studies were rated as having probably or definitely low risk of bias (see Figure D3-3).
  • Unexplained inconsistencies: No downgrade.
  • Indirectness: No downgrade.
  • Imprecision: No downgrade.
  • Publication bias: No downgrade (see Table D3-1).
Factors Considered for Upgrading Confidence
  • Large magnitude: No upgrade. The effect estimates were generally null.
  • Dose-response: No upgrade. Unable to assess because no quartile data were available.
  • Residual confounding: No upgrade. Studies measured and adjusted for important known confounders.
FIGURE D3-12. Data pivot of studies that measured MCPP and AGD (as) or AGD (ap). In HAWC: https://hawcproject.org/summary/data-pivot/assessment/350/mcpp-effects-agd-or-agd-ap/.

SECTION D-4. Sensitivity Analyses of DEHP and AGD

TABLE D4-1. Sensitivity Analyses Performed by Leaving One Study Out at a Time, Using Alternative Exposure and Outcome Measures for Each Study One at a Time, and Restricting Analyses to Use the Same Exposure Measure (sumDEHP or MEHP) and/or the Same Outcome Measure (AGD [as] or AGD [ap]).


SECTION D-5. Meta-Analyses of Human Studies of Additional Phthalates and Anogenital Distance

Meta-analyses of human studies on BzBP, DBP, DEP, DIBP, and DINP in relation to alterations in anogenital distance (AGD) were conducted. The same meta-analysis methods used for DEHP in Chapter 3 were applied to these phthalates. (Three phthalates, DIDP, DMP, and DOP, had only one study each so no meta-analyses for these phthalates were performed.)

For each study, AGD (as) is preferred over AGD (ap). For the studies by Bustamante-Montes et al. (2013) and Swan (2008), the confidence interval was estimated using the reported p-value, assuming a normal distribution.
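The back-calculation of a confidence interval from a reported p-value can be sketched as follows. This is a minimal illustration, assuming a two-sided p-value from a normally distributed test statistic; the function name and the input values are hypothetical and are not taken from either study.

```python
from statistics import NormalDist


def ci_from_p_value(beta, p_value, level=0.95):
    """Back-calculate a confidence interval from a two-sided p-value.

    Assumes the test statistic beta / SE follows a standard normal
    distribution (an assumption of this sketch).
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must be strictly between 0 and 1")
    std_normal = NormalDist()
    # |z| that corresponds to the reported two-sided p-value
    z_observed = std_normal.inv_cdf(1.0 - p_value / 2.0)
    # Standard error implied by the estimate and its z statistic
    standard_error = abs(beta) / z_observed
    # Critical value for the requested confidence level
    z_critical = std_normal.inv_cdf(0.5 + level / 2.0)
    half_width = z_critical * standard_error
    return beta - half_width, beta + half_width, standard_error


# Hypothetical inputs for illustration only
lower, upper, se = ci_from_p_value(beta=-2.0, p_value=0.05)
print(f"SE = {se:.3f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

A p-value of exactly 0.05 yields an interval with one bound at zero, which is a quick consistency check on the calculation.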

Beta coefficients are reported in units of mm/log10 change in exposure. Two factors a priori may affect comparability across studies. First, there are baseline differences in AGD (as) across different studies due to demographic factors, such as birth weight. For instance, the mean AGD (as) in Bustamante-Montes et al. (2013) was 12.4 mm, whereas the mean AGD (as) in Bornehag et al. (2015) was 41.4 mm. Second, AGD (as) is shorter than AGD (ap). For instance, in the study by Jensen et al. (2016), mean AGD (as) was 36.9 mm, whereas mean AGD (ap) was 70.2 mm. Therefore, the same change in mm may reflect a different percentage change in AGD across studies and end points. To standardize effect sizes across studies, each reported beta coefficient was divided by the mean value of the reported outcome measure prior to conducting the meta-analysis. The result is that each beta coefficient is standardized to a percent change in AGD per log10 change in exposure.
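The standardization step can be illustrated with a short sketch. The mean AGD values are those quoted in the paragraph above; the beta coefficient and its confidence bounds are hypothetical, and the multiplication by 100 (to express the ratio as a percent) and the function name are assumptions of this sketch.

```python
def standardize_to_percent(beta_mm, ci_lower_mm, ci_upper_mm, mean_agd_mm):
    """Convert a beta (mm per log10 exposure) and its CI bounds to a
    percent change in AGD per log10 exposure."""
    if mean_agd_mm <= 0:
        raise ValueError("mean_agd_mm must be positive")
    scale = 100.0 / mean_agd_mm
    return beta_mm * scale, ci_lower_mm * scale, ci_upper_mm * scale


# Mean AGD values (mm) quoted in the text
mean_agd = {
    "Bustamante-Montes et al. 2013, AGD (as)": 12.4,
    "Bornehag et al. 2015, AGD (as)": 41.4,
    "Jensen et al. 2016, AGD (as)": 36.9,
    "Jensen et al. 2016, AGD (ap)": 70.2,
}

# Hypothetical coefficient: -1.0 mm per log10 exposure, CI -2.0 to 0.0 mm
for label, mean_mm in mean_agd.items():
    pct, lo, hi = standardize_to_percent(-1.0, -2.0, 0.0, mean_mm)
    print(f"{label}: {pct:.2f}% [{lo:.2f}, {hi:.2f}]")
```

The same 1 mm decrease corresponds to a much larger percent change where mean AGD is small, which is why unstandardized coefficients are not directly comparable across studies or end points.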

Sensitivity analyses included leaving one study out at a time and using AGD (ap) exclusively as the outcome measure. A separate meta-analysis using AGD (as) exclusively was not performed because it is the same as excluding the Swan (2008) study.
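A leave-one-out sensitivity analysis can be sketched as below, using the DIBP estimates reported in Table D5-7 as input. For brevity this sketch pools with fixed-effect inverse-variance weights, which is a simplification and not necessarily the model used for the report's sensitivity analyses; standard errors are recovered from the reported 95% confidence limits, and all function names are hypothetical.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)

# (reference, estimate %, lower CI, upper CI) as reported in Table D5-7
STUDIES = [
    ("Swan et al. 2015", -1.98, -6.83, 2.91),
    ("Jensen et al. 2016", -0.19, -5.61, 5.18),
    ("Swan 2008", -4.20, -9.16, 0.76),
]


def pool_inverse_variance(studies):
    """Fixed-effect inverse-variance pooled estimate with a 95% CI."""
    total_weight = 0.0
    weighted_sum = 0.0
    for _, estimate, lower, upper in studies:
        se = (upper - lower) / (2.0 * Z95)  # SE implied by the 95% CI
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * estimate
    pooled = weighted_sum / total_weight
    half_width = Z95 * total_weight ** -0.5
    return pooled, pooled - half_width, pooled + half_width


def leave_one_out(studies):
    """Re-pool the studies after dropping each one in turn."""
    results = {}
    for index, study in enumerate(studies):
        remaining = studies[:index] + studies[index + 1:]
        results[study[0]] = pool_inverse_variance(remaining)
    return results


for dropped, (est, lo, hi) in leave_one_out(STUDIES).items():
    print(f"Without {dropped}: {est:.2f} [{lo:.2f}, {hi:.2f}]")
```

If the pooled estimate changes sign or loses statistical significance when a single study is dropped, the overall result depends heavily on that study.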

BzBP Meta-Analysis

Primary Analysis

In the primary analysis, four studies (see Table D5-1), with beta coefficients standardized to a percent change per log10 change in BzBP exposure, were analyzed using a random effects model. A summary estimate of -1.43 [95% CI: -3.47, 0.61] (p = 0.17) was found (see Figure D5-1). There was no significant heterogeneity, with an estimated I² value of 0% (Q statistic was not statistically significant). In the sensitivity analyses (see Figures D5-2 and D5-3 and Table D5-2), effect sizes ranged from -0.15 to -2.21, none of which were statistically significant. In sum, although a small effect was observed, the precision of the estimate was not sufficient to rule out chance. Thus, the available studies do not support BzBP exposure being associated with decreased AGD.

TABLE D5-1. Studies Included in the Meta-Analysis of BzBP and AGD.


FIGURE D5-1. Meta-analysis of human studies of BzBP and AGD; reported effect estimates [95% confidence interval] from individual studies and overall pooled estimate from random effects (RE) model per 10-fold increase in BzBP exposure.


Sensitivity Analyses
FIGURE D5-2. Sensitivity analyses of human studies of BzBP and AGD performed by leaving one study out at a time.


FIGURE D5-3. Sensitivity analyses of human studies of BzBP and AGD performed by restricting analysis to the same outcome measure (AGD [ap]).


TABLE D5-2. Sensitivity Analyses of Human Studies of BzBP and AGD.


DBP Meta-Analysis

Primary Analysis

In the primary analysis, four studies (see Table D5-3), with beta coefficients standardized to a percent change per log10 change in DBP exposure, were analyzed using a random effects model. A summary estimate of -3.13 [95% CI: -5.63, -0.64] (p = 0.014) was found (see Figure D5-4). There was no significant heterogeneity, with an estimated I² value of 0% (Q statistic was not statistically significant). In the sensitivity analyses (see Figures D5-5 and D5-6 and Table D5-4), effect sizes ranged from -1.85 to -4.02, and remained statistically significant in three of the five analyses. Specifically, dropping either the Swan (2008) or the Swan et al. (2015) study resulted in summary estimates that were no longer statistically significant. There was no observed heterogeneity in any sensitivity analysis results (I² = 0).

Overall, there is consistent evidence of a small decrease in AGD being associated with increasing DBP exposure, of magnitude around 3% for each log10 increase in DBP exposure. However, some uncertainty remains because the statistical significance of this result depends on the Swan (2008) or Swan et al. (2015) studies. On the other hand, there was no observed heterogeneity, so it is likely that this sensitivity is related to the decreased statistical power when dropping studies.

TABLE D5-3. Studies Included in the Meta-Analysis of DBP and AGD.


FIGURE D5-4. Meta-analysis of human studies of DBP and AGD; reported effect estimates [95% confidence interval] from individual studies and overall pooled estimate from random effects (RE) model per 10-fold increase in DBP exposure.


Sensitivity Analyses
FIGURE D5-5. Sensitivity analysis of human studies of DBP and AGD performed by leaving one study out at a time.


FIGURE D5-6. Sensitivity analysis of human studies of DBP and AGD performed by restricting analysis to the same outcome measure (AGD [ap]).


TABLE D5-4. Sensitivity Analyses of Human Studies of DBP and AGD.


DEP Meta-Analysis

Primary Analysis

In the primary analysis, four studies (see Table D5-5), with beta coefficients standardized to a percent change per log10 change in DEP exposure, were analyzed using a random effects model. A summary estimate of -1.94 [95% CI: -3.88, 0.001] (p = 0.0501) was found (see Figure D5-7). There was some heterogeneity, with an estimated I² value of 29%, though the Q statistic was not statistically significant. In the five sensitivity analyses (see Figures D5-8 and D5-9 and Table D5-6), effect sizes ranged from -1.11 to -2.54; only one of the five analyses was statistically significant. Additionally, heterogeneity with I² > 50% was observed in three of the five sensitivity analyses (though none were statistically significant). Thus, while the primary analysis suggests DEP exposure being associated with decreased AGD, the effect size is small (e.g., as compared to DEHP or DBP), the statistical significance of the result was not robust, and some heterogeneity was observed.

TABLE D5-5. Studies Included in the Meta-Analysis of DEP and AGD.

FIGURE D5-7. Meta-analysis of human studies of DEP and AGD; reported effect estimates [95% confidence interval] from individual studies and overall pooled estimate from random effects (RE) model per 10-fold increase in DEP exposure.

Sensitivity Analyses
FIGURE D5-8. Sensitivity analysis of human studies of DEP and AGD performed by leaving one study out at a time.

FIGURE D5-9. Sensitivity analysis of human studies of DEP and AGD performed by restricting analysis to the same outcome measure (AGD [ap]).

TABLE D5-6. Sensitivity Analyses of Human Studies of DEP and AGD.

TABLE D5-7. Studies Included in the Meta-Analysis of DIBP and AGD.

Reference | Outcome | Mean AGD, mm | Exposure Metric | Estimate, % | Lower CI | Upper CI
Swan et al. 2015 | AGD (as) | 24.73 | MIBP (maternal urine, trimester 1) | -1.98 | -6.83 | 2.91
Jensen et al. 2016 | AGD (as) | 36.90 | MIBP (maternal urine) | -0.19 | -5.61 | 5.18
Swan 2008 | AGD (ap) | 70.40 | MIBP (maternal urine) | -4.20 | -9.16 | 0.76

DIBP Meta-Analysis

In the primary analysis, three studies (see Table D5-7), with beta coefficients standardized to a percent change per log10 change in DIBP exposure, were analyzed using a random effects model. A summary estimate of -2.23 [95% CI: -5.15, 0.70] (p = 0.13) was found (see Figure D5-10). No heterogeneity was detected: the estimated I² value was 0%, and the Q statistic was not statistically significant. In the sensitivity analyses (see Figures D5-11 and D5-12 and Table D5-8), effect sizes ranged from -1.18 to -3.07, none of which were statistically significant. In sum, although a small effect was observed, the precision of the estimate was not sufficient to rule out chance. Thus, the available studies do not support DIBP exposure being associated with decreased AGD.
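The pooled DIBP estimate can be approximately reproduced from the three rows of Table D5-7. The sketch below makes two assumptions that the source does not confirm: study variances are backed out of the reported 95% confidence intervals under a normal approximation, and the between-study variance is estimated with the DerSimonian-Laird method. The function name `pool_random_effects` is hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# (reference, estimate %, lower 95% CI, upper 95% CI), transcribed from Table D5-7
STUDIES = [
    ("Swan et al. 2015", -1.98, -6.83, 2.91),
    ("Jensen et al. 2016", -0.19, -5.61, 5.18),
    ("Swan 2008", -4.20, -9.16, 0.76),
]


def pool_random_effects(studies):
    """DerSimonian-Laird random-effects pooling (assumed estimator)."""
    y = [est for _, est, _, _ in studies]
    # Back out each study's sampling variance from the width of its 95% CI.
    v = [((hi - lo) / (2 * Z95)) ** 2 for _, _, lo, hi in studies]
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Re-weight with tau^2 added to every study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    p = math.erfc(abs(pooled / se) / math.sqrt(2))
    return {
        "pooled": pooled,
        "lower": pooled - Z95 * se,
        "upper": pooled + Z95 * se,
        "q": q,
        "tau2": tau2,
        "i2": i2,
        "p": p,
    }


result = pool_random_effects(STUDIES)
print(f"pooled = {result['pooled']:.2f} "
      f"[{result['lower']:.2f}, {result['upper']:.2f}], "
      f"I2 = {result['i2']:.0f}%, p = {result['p']:.3f}")
```

Under these assumptions Q is smaller than its degrees of freedom, so the between-study variance is truncated to zero and the random-effects result coincides with the fixed-effect one. The pooled estimate and interval round to the reported -2.23 [-5.15, 0.70]; the p-value lands close to the reported 0.13, with small differences attributable to rounding in the table entries.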

FIGURE D5-10. Meta-analysis of human studies of DIBP and AGD; reported effect estimates [95% confidence interval] from individual studies and overall pooled estimate from random effects (RE) model per 10-fold increase in DIBP exposure.

FIGURE D5-11. Sensitivity analysis of human studies of DIBP and AGD performed by leaving one study out at a time.
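A leave-one-out analysis of this kind can be sketched by re-pooling the Table D5-7 estimates with each study omitted in turn. As before, the DerSimonian-Laird estimator and the normal-approximation variances are assumptions rather than details taken from the source, and the function names are hypothetical.

```python
Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# (reference, estimate %, lower 95% CI, upper 95% CI), transcribed from Table D5-7
STUDIES = [
    ("Swan et al. 2015", -1.98, -6.83, 2.91),
    ("Jensen et al. 2016", -0.19, -5.61, 5.18),
    ("Swan 2008", -4.20, -9.16, 0.76),
]


def dl_pooled_estimate(studies):
    """Pooled estimate from a DerSimonian-Laird random-effects model (assumed)."""
    y = [est for _, est, _, _ in studies]
    v = [((hi - lo) / (2 * Z95)) ** 2 for _, _, lo, hi in studies]
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)


def leave_one_out(studies):
    """Re-pool the data once per study, omitting that study each time."""
    results = []
    for i, study in enumerate(studies):
        subset = studies[:i] + studies[i + 1:]
        results.append((study[0], dl_pooled_estimate(subset)))
    return results


loo = leave_one_out(STUDIES)
for omitted, estimate in loo:
    print(f"omitting {omitted}: pooled estimate = {estimate:.2f}")
```

Under these assumptions the smallest and largest leave-one-out estimates round to -1.18 (omitting Swan 2008) and -3.07 (omitting Jensen et al. 2016), which matches the range quoted in the text. The intermediate value depends on the choice of between-study variance estimator.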

FIGURE D5-12. Sensitivity analysis of human studies of DIBP and AGD performed by restricting analysis to the same outcome measure (AGD [ap]).

TABLE D5-8. Sensitivity Analyses of Human Studies of DIBP and AGD.

DINP Meta-Analysis

In the primary analysis, three studies (see Table D5-9), with beta coefficients standardized to a percent change per log10 change in DINP exposure, were analyzed using a random effects model. A summary estimate of -0.96 [95% CI: -4.17, 2.25] (p = 0.56) was found (see Figure D5-13). Heterogeneity was observed, with an estimated I² value of 58%, though the Q statistic was not statistically significant. In the sensitivity analyses (see Figures D5-14 and D5-15 and Table D5-10), effect sizes ranged from -2.42 to -0.30, none of which were statistically significant. Thus, the available studies do not support DINP exposure being associated with decreased AGD.
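It may help to see why an I² of 58% can coexist with a non-significant Q statistic when only three studies are pooled. The worked arithmetic below uses the standard Cochran Q and the Q-based definition of I²; the source does not say how I² was computed, so the implied Q is an approximation, not a value taken from the report.

```latex
% Standard definitions, with y_i the study estimates, v_i their variances,
% w_i = 1/v_i, and k the number of studies.
\[
Q = \sum_{i=1}^{k} w_i \,(y_i - \hat{\mu})^2,
\qquad
\hat{\mu} = \frac{\sum_i w_i y_i}{\sum_i w_i},
\qquad
I^2 = \max\!\left(0,\; \frac{Q - (k-1)}{Q}\right) \times 100\%.
\]

% Assumed inversion for k = 3 and I^2 = 58%:
\[
Q \approx \frac{k-1}{1 - I^2} = \frac{2}{1 - 0.58} \approx 4.76.
\]

% For a chi-square distribution with 2 degrees of freedom the tail
% probability has the closed form exp(-Q/2):
\[
P\!\left(\chi^2_2 > 4.76\right) = e^{-4.76/2} \approx 0.09 > 0.05.
\]
```

With so few studies the Q test has little power, so a moderate I² value does not necessarily reach statistical significance.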

TABLE D5-9. Studies Included in the Meta-Analysis of DINP and AGD.

FIGURE D5-13. Meta-analysis of human studies of DINP and AGD; reported effect estimates [95% confidence interval] from individual studies and overall pooled estimate from random effects (RE) model per 10-fold increase in DINP exposure.

FIGURE D5-14. Sensitivity analysis of human studies of DINP and AGD performed by leaving one study out at a time.

FIGURE D5-15. Sensitivity analysis of human studies of DINP and AGD performed by restricting analysis to the same outcome measure (AGD [ap]).

TABLE D5-10. Sensitivity Analyses of Human Studies of DINP and AGD.

REFERENCES

  • Bornehag CG, Carlstedt F, Jonsson BA, Lindh CH, Jensen TK, Bodin A, Jonsson C, Janson S, Swan SH. Prenatal phthalate exposures and anogenital distance in Swedish boys. Environ. Health Perspect. 2015;123(1):101–107. [PMC free article: PMC4286276] [PubMed: 25353625]
  • Bustamante-Montes LP, Hernández-Valero MA, Flores-Pimentel D, García-Fábila M, Amaya-Chávez A, Barr DB, Borja-Aburto VH. Prenatal exposure to phthalates is associated with decreased anogenital distance and penile size in male newborns. J. Dev. Orig. Health Dis. 2013;4(4):300–306. [PMC free article: PMC3862078] [PubMed: 24349678]
  • Jensen TK, Frederiksen H, Kyhl HB, Lassen TH, Swan SH, Bornehag CG, Skakkebaek NE, Main KM, Lind DV, Husby S, Andersson AM. Prenatal exposure to phthalates and anogenital distance in male infants from a low-exposed Danish cohort (2010-2012). Environ. Health Perspect. 2016;124(7):1107–1113. [PMC free article: PMC4937858] [PubMed: 26672060]
  • Martino-Andrade AJ, Liu F, Sathyanarayana S, Barrett ES, Redmon JB, Nguyen RH, Levine H, Swan SH. Timing of prenatal phthalate exposure in relation to genital endpoints in male newborns. Andrology. 2016;4(4):585–593. [PubMed: 27062102]
  • Swan SH. Environmental phthalate exposure in relation to reproductive outcomes and other health endpoints in humans. Environ. Res. 2008;108(2):177–184. [PMC free article: PMC2775531] [PubMed: 18949837]
  • Swan SH, Sathyanarayana S, Barrett ES, Janssen S, Liu F, Nguyen RH, Redmon JB. First trimester phthalate exposure and anogenital distance in newborns. Hum. Reprod. 2015;30(4):963–972. [PMC free article: PMC4359397] [PubMed: 25697839]
Copyright 2017 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK453255
