Viswanathan M, Berkman ND, Dryden DM, et al. Assessing Risk of Bias and Confounding in Observational Studies of Interventions or Exposures: Further Development of the RTI Item Bank [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Aug.

Introduction

A primary concern for systematic reviews of the effectiveness of interventions is to evaluate the causal relationship between the intervention and outcomes. Despite the basic nature of causality as the motivator for much of science, definitions of causality tend to be circular (“to cause” is to produce an effect, and “to produce” is to cause).1 Definitions of causality that reference the counterfactual offer a way out of the circularity.1 A causal effect is “a counterfactual contrast between the outcomes of a single unit under different treatment possibilities.”1, p. 1914 For example, a pregnant woman will deliver her child using a single mode of delivery. Although her selected mode of delivery may have resulted in neonatal respiratory distress syndrome, we can postulate a different outcome had she employed another mode of delivery. This approach to thinking about causal inference implies that studies that seek to establish a causal link need to meet three conditions: (1) a causal contrast involving two or more well-defined interventions,2 (2) independence between the counterfactual outcome and the intervention, and (3) each participant in the study having a positive probability (greater than zero) of being assigned to each of the evaluated interventions.3
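These conditions have a compact expression in the potential-outcomes notation that is standard in the causal inference literature. The following is a minimal sketch using that conventional notation (not spelled out in this report), with Y^a denoting the counterfactual outcome under intervention a, A the intervention actually received, and L measured covariates:

```latex
% Causal effect as a counterfactual contrast between two
% well-defined interventions, a = 1 versus a = 0:
\mathrm{ACE} = E\left[Y^{a=1}\right] - E\left[Y^{a=0}\right]

% Condition 2 (exchangeability): the counterfactual outcome is
% independent of the intervention actually received:
Y^{a} \perp A \quad \text{for all } a

% Condition 3 (positivity): every participant has a nonzero
% probability of receiving each evaluated intervention:
\Pr(A = a \mid L = l) > 0 \quad \text{whenever } \Pr(L = l) > 0
```

Randomization, discussed next, satisfies exchangeability and positivity by construction, which is why it licenses the causal contrast.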

Randomization offers a participant an equal probability of being assigned to each treatment alternative, distributes unmeasured confounding randomly, and clearly sets up contrasts between interventions. Because randomization ensures that these three conditions for causal inference are met, randomized clinical trials (RCTs) are generally considered the gold standard for evidence of benefits.

RCTs, however, cannot always be used to answer questions on the causal link between interventions or exposures and outcomes. RCTs may be unethical.4 A review consisting of RCTs alone may provide insufficient information on adverse effects, long-term benefits,5 or vulnerable subpopulations.6 In the absence of sufficient evidence from RCTs, Agency for Healthcare Research and Quality (AHRQ) Evidence-based Practice Center (EPC) guidance suggests that systematic reviews may include observational studies to help answer questions about the causal link between the intervention or exposure and outcomes.7 This approach, mirrored by other recent guides developed by the Cochrane Collaboration and the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group, offers cautious support for reliance on observational studies.8-10

When including observational study evidence, systematic reviewers have to contend not only with other risks of bias but also with the possibility of confounding; that is, the potential that extraneous factors, rather than the factors of interest (the intervention or exposure), influenced the results.1 “Confounding by indication” or “allocation bias” refers to a common cause that influences both treatment and outcome.11 In the example of mode of delivery, fetal distress increases the likelihood that the mother will undergo cesarean delivery in addition to the likelihood of neonatal respiratory distress. Confounding from the indication of fetal distress makes it difficult to assess the independent effect of the mode of delivery on neonatal respiratory distress. When reviewers rely on evidence from observational studies, the inferences they draw need to account for potentially important confounding, as researchers did in a recent asthma study.a
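To make the mechanism concrete, here is a minimal simulation sketch. The probabilities are invented for illustration (they are not estimates from any study cited here): fetal distress raises the probability of both cesarean delivery and neonatal respiratory distress, while mode of delivery has no true effect on the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # hypothetical cohort size

# Confounder: fetal distress (the indication)
distress = rng.random(n) < 0.15
# Indication -> treatment: distress makes cesarean delivery more likely
cesarean = rng.random(n) < np.where(distress, 0.80, 0.20)
# Indication -> outcome: distress raises respiratory distress risk;
# mode of delivery itself has NO effect in this simulation
rds = rng.random(n) < np.where(distress, 0.30, 0.05)

def risk_ratio(treated, outcome):
    """Risk ratio comparing treated with untreated."""
    return outcome[treated].mean() / outcome[~treated].mean()

print(f"Crude RR (cesarean vs. vaginal): {risk_ratio(cesarean, rds):.2f}")
for level, label in [(True, "fetal distress"), (False, "no fetal distress")]:
    stratum = distress == level
    print(f"RR within {label}: {risk_ratio(cesarean[stratum], rds[stratum]):.2f}")
```

With these values the crude risk ratio is roughly 2.5, while both stratum-specific risk ratios are close to 1.0: the crude contrast attributes to cesarean delivery an effect that belongs entirely to the indication.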

In this project, we reviewed methodological considerations when evaluating the validity of studies included in systematic reviews; specifically, concerns related to risk of bias and confounding in observational studies. Then, based on earlier work and current project activities, we developed a framework for the assessment of the risk of bias and confounding across a body of observational study evidence and refined an existing tool to aid in identifying risk of bias and confounding in individual studies.

Empirical Assessments of Risk of Bias

The higher theoretical risk of some types of bias in observational studies compared with RCTs (described in detail in Appendix A, based on our overview of the literature) raises a fundamental question when assessing causality in observational studies: do observational studies routinely provide a more inflated estimate of effect than trials?14 If so, can their results be discounted to arrive at a closer approximation of the true effect? Empirically, however, reviews have found no difference15,16 or inconsistent differences between types of designs (that is, RCTs sometimes had smaller estimates of effect than observational studies and vice versa).17,18 This unpredictability in direction and magnitude of effect means that systematic reviews cannot rely on a rule of thumb (based on type of design)19 to discount evidence from observational studies. Instead, a careful assessment of the risk of bias in each individual observational study that accounts for its unique clinical context is necessary to evaluate the validity of estimates from observational studies.14

Another important question when assessing causality in observational studies is whether empirical assessments of risk of bias help to shape our understanding of the validity of evidence. MacLehose et al. were unable to show associations between study quality and relative risk in predictable ways, suggesting the need for improved instrumentation for evaluating risk of bias for different types of observational studies: they note “compromises and ambiguities” arising from the use of the same instrument for all study designs and that “[d]eveloping an instrument to assess and characterize different studies is an urgent priority.”16, p. 45 Concern for better instrumentation is echoed by other researchers. Shrier et al. note that ideally reviews would weigh the results of a study with the potential for bias, but that approach requires that quality scores “be highly correlated with bias; therefore, there must be agreement on which items create which biases, in which direction and of what magnitude.”19, p. 1208 Consensus on the direction and magnitude of bias caused by aspects of study design and performance does not yet exist.

Need for a Revised Approach to Evaluating Observational Studies

In a review of critical appraisal tools for assessing the risk of bias of observational studies, Deeks et al. identified key quality domains (background, sample definition and selection, interventions, outcomes, creation of treatment groups, blinding, soundness of information, followup, analysis of comparability, analysis of outcomes, interpretation, and presentation and reporting) but found no gold standard for how the assessment should best be accomplished.20 Tools typically focus their assessments on either (1) capturing manuscript authors' descriptions or reporting of the methods they used in designing or conducting particular elements of the study, or (2) judging the risk of bias based on the study's design or implementation (whether the conduct of the study altered the validity of results). A recent review by Mhaskar et al. reveals the inadequacies of risk of bias assessment tools, all of which rely in varying degrees on reporting. They compared quality as documented in protocols with reported quality in publications from 429 RCTs. Their results showed that reported quality (from publications) did not adequately reflect the actual high quality of the trials (from protocols); moreover, the associations between poor allocation concealment or blinding and effect size as reported in publications did not persist when these two quality domains were assessed from descriptions in the protocols.21 Although the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement may improve the reporting of observational studies22 in the future, risk of bias instruments for observational studies that rely on reporting suffer from the same constraints as those for trials, and the conduct of observational studies is less likely to be based on published protocols.

Another constraint of some existing instruments is the reliance on a scale to summarize their findings. Empirical tests of the validity and reliability of these scales suggest the need for critical analysis of individual bias components rather than dependence solely on checklists and scales. Juni et al. noted dramatically different results in meta-analyses when different quality rating scales were used.23 For observational studies in particular, mechanical scoring of items on a checklist that focuses on quality of reporting, ethical issues, background, rationale, and so on, will fail to assess the critical question: whether the outcomes can be attributed to the effects of the intervention.24
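The sensitivity to the choice of scale is easy to demonstrate. In the sketch below, all effect estimates and quality items are invented for illustration (they are not data from Juni et al.23): two scales that weight the same three items differently select different “high-quality” subsets of the same studies and therefore pool to different summary estimates.

```python
import numpy as np

# Hypothetical studies: log risk ratios, standard errors, and per-item
# quality ratings (1 = adequate, 0 = inadequate). All values invented.
log_rr = np.array([-0.40, -0.05, -0.35, 0.10, -0.50])
se     = np.array([ 0.15,  0.10,  0.20, 0.12,  0.25])
items  = np.array([  # columns: randomization, blinding, attrition
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
])

def pooled_rr(mask):
    """Fixed-effect inverse-variance pooled risk ratio over selected studies."""
    w = 1 / se[mask] ** 2
    return np.exp((w * log_rr[mask]).sum() / w.sum())

# Scale A weights randomization heavily; scale B weights blinding heavily.
score_a = items @ np.array([3, 1, 1])
score_b = items @ np.array([1, 3, 1])

print("Pooled RR, high quality under scale A:", round(pooled_rr(score_a >= 4), 2))
print("Pooled RR, high quality under scale B:", round(pooled_rr(score_b >= 4), 2))
```

Under these invented numbers, scale A's “high-quality” subset pools to a risk ratio of about 0.68 while scale B's pools to about 0.90: the same studies and data yield different conclusions, driven solely by how each scale weights its items.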

As noted above, available empirical evidence cautions against merely applying weights to observational studies based on scores on a bias checklist.24-26 Rather, the risk of bias requires interpretation based on an understanding of the content of the topic, particularly when evaluating studies in a heterogeneous body of observational evidence (e.g., differently defined interventions and outcomes, choice of analytic methods). Existing guidance offers limited assistance with interpretation of risk of bias in observational studies, particularly with regard to confounding. The “Cochrane Handbook for Systematic Reviews of Interventions,”9 AHRQ guidance,7 and the Institute of Medicine (IOM) standards for systematic review8 all detail reasons to include observational studies for harms and attendant risks and to monitor longer-term outcomes.27 These documents caution that the risk of bias will always be greater for observational studies than for RCTs, and all offer general guidance on sources of bias for observational studies.

Both the IOM8 and AHRQ guidance28 discuss the role of plausible confounding (confounding not controlled for in a study that inflates the observed effect) in increasing the strength of evidence within a rating system in which observational studies start out with a lower grade, but neither offers a framework for evaluating the risk of confounding. Cochrane guidance notes that issues of confounding cannot be easily addressed within existing instruments that evaluate the risk of bias in individual studies in isolation; it suggests developing summary tables that identify, for each study, the prespecified confounders and the variables controlled for in the analysis. This information is intended to illustrate the extent of heterogeneity in the literature. Although the guidance does not require that all Cochrane reviews including observational studies develop these tables, reviewers need to demonstrate that they have considered the role of residual confounding (confounding not controlled for in the design or analysis of the study) in explaining the findings from nonrandomized studies.9
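A minimal sketch of such a summary table, continuing the mode-of-delivery example with invented studies and handling labels:

Study      Fetal distress   Maternal age    Parity
Study A    Adjusted         Adjusted        Not reported
Study B    Restricted       Not reported    Adjusted
Study C    Not reported     Matched         Matched

A reader can see at a glance that only Study A addresses the key indication (fetal distress) and that no confounder is handled consistently across the three studies; this is the heterogeneity the table is meant to expose.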

Earlier conceptual and empirical work described above suggests the following needs for evaluating causality in observational studies: (1) consensus around potential sources of bias for different observational study designs,1,6,19 (2) content-specific criteria to rate risk of bias,1,6,19 (3) use of risk of bias criteria to understand heterogeneity in results across individual studies rather than merely to weight pooled estimates of effect,24-26 and (4) a framework to adequately capture risk of bias and confounding concerns that are specific to a body of observational studies included in a systematic review.

Earlier Phases of Development

Our own previous work developed a tool consisting of specific questions that reviewers can use for identifying context-specific sources of bias and confounding in observational studies29 but stopped short of offering guidance on a larger framework for its use. In this earlier project, we developed the tool by first identifying a large list of questions that had been used previously in EPC systematic reviews or other instruments identified by the research team and a Technical Expert Panel. The resulting list of 1,492 questions was culled to eliminate duplicates and minor wording differences and refined through face, cognitive, and content validity and inter-rater reliability testing. Based on this process, the RTI Item Bank included 29 questions that sought to comprehensively capture the risk of bias and precision domains critical for evaluating observational studies. Precision was included in the bank to allow reviewers to create a tool to evaluate what has traditionally been referred to as the “quality” of an individual study; that is, the inclusion of both risk of bias and precision. The bank also included directions for adapting questions to a specific topic area and assistance in selecting questions relevant for particular study designs (e.g., case series, case-control, cohort, and cross-sectional).29 The next step in developing this tool required consensus around which specific questions are essential for evaluating a study of each design, and further refinement of the questions and related instructions.

Project Objectives

The project's objectives, in response to the needs described above, included creating a framework for evaluating the risk of bias and confounding in individual studies included in a body of observational study evidence, developing consensus around sources of bias for different observational study designs, and making enhancements to the RTI Item Bank. We determined, based on discussions with our expert Working Group, that adequate evaluation of the risk of bias and confounding in observational studies included in a systematic review could not be accomplished solely within the confines of questions included in an item bank evaluating individual studies in isolation. Therefore, the project's activities expanded to accommodate the project's goal more fully. Tasks included: (1) developing a process framework to assess the effect of confounding across the body of observational study evidence as well as within individual studies, (2) identifying critical sources of bias and confounding most common to each observational study design type, (3) refining and reducing the set of “core” questions necessary for evaluating risk of bias and confounding concerns for each design, and (4) refining the instructions provided to users to improve clarity and usefulness so that the RTI Item Bank is an easily accessible and practical tool that can be used across EPCs and other organizations conducting systematic reviews.

Determining, from the questions available within the item bank, the best set of questions to use for studies with specific design features and subject topics requires sufficient epidemiological expertise to classify study designs, experience conducting systematic reviews, and familiarity with risk of bias rating. However, even experienced researchers show poor inter-rater reliability in classifying study designs.30 To facilitate optimal use of the item bank, further development was needed to provide guidance to users in appropriately identifying the study designs included in their reviews and the possible bias concerns specific to those designs (sources of bias referred to above). Users could then use the item bank to create an instrument to capture the most likely risk of bias concerns in their subject area (content-specific criteria referred to above). We believe these enhancements will improve the practicality and user-friendliness of instruments created from the item bank and may help promote their inter-rater reliability.

Footnotes

a. As an example, a recent systematic review examined the association between the use of acetaminophen and risk of asthma.12 A subsequent prospective cohort study attributed the association to confounding by indication.13 That is, the study showed that individuals who subsequently developed asthma had been taking acetaminophen to manage early infections of the lower respiratory tract; however, after adjustment for respiratory infections, or when acetaminophen use was restricted to non-respiratory tract infections, no association was found. The authors therefore concluded that acetaminophen use was not an independent risk factor for the development of asthma and that the previous positive findings were due to confounding by indication. They also commented that an RCT designed to study this adverse effect would be infeasible. The example shows how diligent analysis of observational data can address problems of confounding by indication.
