NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

West SL, Gartlehner G, Mansfield AJ, et al. Comparative Effectiveness Review Methods: Clinical Heterogeneity [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2010 Sep.

Executive Summary

Introduction

The Agency for Healthcare Research and Quality (AHRQ) commissioned the RTI International–University of North Carolina at Chapel Hill (RTI-UNC) Evidence-based Practice Center (EPC) to explore how systematic review groups have dealt with clinical heterogeneity and to seek out best practices for addressing clinical heterogeneity in systematic reviews (SRs) and comparative effectiveness reviews (CERs). Such best practices, to the extent they exist, may enable AHRQ’s EPCs to address critiques from patients, clinicians, policymakers, and other proponents of health care about the extent to which “average” estimates of the benefits and harms of health care interventions apply to individual patients or to small groups of patients sharing similar characteristics.

Such users of reviews often assert that EPC reviews typically focus on broad populations and, as a result, often lack information relevant to patient subgroups that are of particular concern to them. More important, even when EPCs evaluate literature on homogeneous groups, there may be varying individual treatment for no apparent reason, indicating that average treatment effect does not point to the best treatment for any given individual. Thus, the health care community is looking for better ways to develop information that may foster better medical care at a “personal” or “individual” level. (We do not use the phrases “personalized medicine” or “individualized medicine” here, because of the terms’ commonly understood applications in genetics and genomics.)

To address our charge for this methods project, the EPC set out to answer six key questions (KQs) (Table A). AHRQ assigned these KQs to us, and we worked with AHRQ staff and the EPC program’s Scientific Resource Center (SRC) at the Oregon Health & Science University on approaches to address the five empirical issues and their subquestions. KQ 6 asked the project team to put forward ideas that an AHRQ cross-EPC work group might take on in 2010 or later, drawing on our findings for the first five questions. As implied by KQ 6, AHRQ wanted to understand how its EPC program (and the EPC tasks in the Effective Health Care Program related to production of CERs) can better address concerns of stakeholders related to clinical heterogeneity, that is, how confidently clinicians, policymakers, and others can draw conclusions about the effectiveness of interventions from reports that account for clinical heterogeneity in both the populations of interest to them and the populations studied. Although the first set of audiences is oriented to the United States, we believe that our findings can be helpful to systematic reviewers globally.

Table A. Key questions for methods report on clinical heterogeneity.

Before focusing on clinical heterogeneity per se, we needed to clarify three other terms often appearing in EPC reviews: effect measure, methodologic heterogeneity, and statistical heterogeneity. Table B provides definitions of these concepts and, specifically, gives the definition we used for clinical heterogeneity for this project.

Table B. Core concepts of heterogeneity and their definitions.

Heterogeneity (of any type) in EPC reviews is important because its appearance suggests that included studies differed on one or more dimensions such as patient demographics, study designs, coexisting conditions, or other factors. EPCs then need to clarify for clinical and other audiences, collectively referred to as stakeholders, the potential causes of the heterogeneity in their results. This will allow the stakeholders to understand whether and to what degree they can apply this information to their own patients or constituents. Of greatest importance for this project was clinical heterogeneity, which we define as the variation in study population characteristics, coexisting conditions, cointerventions, and outcomes evaluated across studies included in an SR or CER that may influence or modify the magnitude of the intervention measure of effect (e.g., odds ratio, risk ratio, risk difference).

Assessing how systematic reviewers approach clinical heterogeneity required us to develop and adopt a working definition of clinical heterogeneity and to explore how reviewers typically treat various types of heterogeneity. One major issue in dealing with heterogeneity, and clinical heterogeneity in particular, was that these terms have not been used consistently in clinical research or the SR literature. In fact, the term “clinical heterogeneity” may be more appropriate when used in the context of individual clinical studies rather than for SRs and CERs. Some review groups, such as the Cochrane Collaboration and the Centre for Reviews and Dissemination, use “clinical diversity” rather than “clinical heterogeneity” to describe clinical differences among studies in SRs. Because we could not find a clear definition of clinical heterogeneity in guidance documents or in the published literature that distinguished clinical heterogeneity from clinical diversity, we treat “clinical heterogeneity” and “clinical diversity” as synonymous in this report.

Researchers often consider any heterogeneity problematic because it indicates that pooling across studies may not be appropriate, yet true heterogeneity can be informative by suggesting new avenues for research investigations. Ideally, one could differentiate between heterogeneity of treatment effects stemming from factors, such as demographics, coexisting conditions, treatments, or genetics (what we and other researchers term “clinical heterogeneity”) and that resulting from variability in study design and analysis (which we and others refer to as “methodologic heterogeneity”). However, trying to distinguish between clinical and methodological heterogeneity is not easy because they are intertwined; both can and do co-occur in SRs and CERs.

Alternatively, “statistical heterogeneity” refers to the variability in observed treatment effects that is beyond what would be expected by random error (chance). Statistical heterogeneity may signal the presence of clinical heterogeneity, methodological heterogeneity, or chance. The difference between clinical heterogeneity and statistical heterogeneity can be thought of as a cause and an effect relationship, respectively: when clinical heterogeneity is apparent across studies included in a meta-analysis, it can lead to some degree of statistical heterogeneity.

How to address clinical heterogeneity when conducting SRs, CERs, and meta-analyses has been discussed in the literature, but little consensus has been reached on best practices for identifying and understanding the factors underlying such heterogeneity, which was one of the goals of this project.

Methods

To produce this methods report on issues relating to clinical heterogeneity and the six key questions, we used a variety of data sources. For KQ 1, we reviewed guidance documents developed by organizations involved in developing SRs and clinical practice guidelines. These organizations included AHRQ, Centre for Reviews and Dissemination (CRD), Cochrane Collaboration, Drug Effectiveness Review Program (DERP), Institute for Quality and Efficiency in Health Care (IQWIG), National Health and Medical Research Council (NHMRC), UK National Institute for Health and Clinical Excellence (NICE), and various health technology assessment organizations.

The literature base for KQs 2 and 3 included selected, relevant literature—i.e., SRs and CERs and similar reports (e.g., health technology assessments)—completed by four organizations with extensive expertise in literature syntheses: AHRQ, Cochrane Collaboration, DERP, and NICE. Our sample also included syntheses catalogued in the Centre for Reviews and Dissemination’s Database of Abstracts of Reviews of Effects (DARE) and the Health Technology Assessment (HTA) database. We limited our evaluation to reviews of 15 clinical conditions: breast cancer, lung cancer, prostate cancer, congestive heart failure, cesarean section, chronic kidney disease, chronic obstructive pulmonary disease (COPD), depression, dyspepsia/gastroesophageal reflux disease (GERD), heavy menstrual bleeding, hypertension, irritable bowel syndrome, labor induction, myocardial infarction, and osteoarthritis.

To address KQs 4 and 5, on critiques of reviews and best practices for dealing with clinical heterogeneity, we examined peer and public review comments from three CERs and conducted a literature scan to identify articles that discussed clinical heterogeneity. In addition, we conducted a small number of interviews with key informants to address KQ 5 further and to inform our recommendations for KQ 6.

Results

Determining the answers to two questions—whether an intervention will benefit some patients more (or less) than others and which patients are at greatest (or least) risk of harm when receiving an intervention—is the primary purpose for evaluating clinical heterogeneity in SRs and CERs.

Clinical heterogeneity definitions across review groups. Our first finding was that use and definitions of the terms “heterogeneity,” “clinical heterogeneity,” and “clinical diversity” varied among review groups, often without any clear distinctions among the definitions. As mentioned earlier, The Cochrane Handbook for Systematic Reviews of Interventions defines heterogeneity as “any kind of variability among studies in a systematic review,” but states that variability in the participants, interventions, and outcomes studied is termed “clinical diversity.” AHRQ, CRD, Cochrane Collaboration, DERP, NICE, and EUnetHTA all discuss variability in the population, interventions, and outcomes. These are three of the six factors to be considered in the development of KQs for AHRQ SRs (“PICOTS,” i.e., population, intervention, comparator, outcomes, the timing of their measurement, and setting).

Clinical heterogeneity is closely linked to statistical heterogeneity. The occurrence of clinical heterogeneity may lead to statistical heterogeneity that is detected using techniques such as Cochran’s Q test, the I² index, or meta-regression. Statistical heterogeneity may signal the presence of clinical heterogeneity, methodological heterogeneity, or chance (random error). If reviewers detect statistical heterogeneity, they cannot be sure whether to attribute it to clinical heterogeneity, methodologic heterogeneity, chance, or some combination of the three.
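To make the two statistics named above concrete, the following is a minimal sketch of how Cochran’s Q and the I² index are commonly computed from study-level effect estimates under inverse-variance weighting. The function name and the four log odds ratios and variances are hypothetical and chosen only for illustration; they do not come from any review examined in this report.

```python
from math import fsum


def cochran_q_i2(effects, variances):
    """Return (pooled_effect, Q, I2_percent) using inverse-variance weights.

    effects   -- study effect estimates on an additive scale (e.g., log odds ratios)
    variances -- the within-study variance of each estimate
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    # Fixed-effect pooled estimate: weighted mean of the study effects.
    pooled = fsum(w * y for w, y in zip(weights, effects)) / fsum(weights)
    # Cochran's Q: weighted sum of squared deviations from the pooled estimate.
    q = fsum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variability beyond what chance alone would produce,
    # truncated at zero and expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical data: four studies reporting log odds ratios with equal variances.
log_odds_ratios = [0.10, 0.30, 0.50, 0.90]
study_variances = [0.04, 0.04, 0.04, 0.04]
pooled, q, i2 = cochran_q_i2(log_odds_ratios, study_variances)
print(f"pooled log OR = {pooled:.2f}, Q = {q:.2f}, I2 = {i2:.1f}%")
```

A large Q or I² in a sketch like this says only that the estimates vary more than chance would predict; it does not by itself identify whether the source is clinical, methodologic, or both.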

Clinical heterogeneity is also closely related to applicability. This concept, otherwise known as “external validity” or “generalizability,” refers to whether, and to what extent, analysts can decide that they can generalize intervention-outcome associations to different persons, treatments, outcomes, or settings.

Addressing how clinical heterogeneity has been handled in the development of key questions. We evaluated the KQs from 123 SRs and CERs conducted by systematic reviewers. We focused on whether the review groups considered demographic variables, disease variables (i.e., disease stage, type, or severity), risk factors for disease, cointerventions, and coexisting conditions.

The groups varied in the extent to which they included demographic variables and disease factors in their KQs. In addition, we detected differences in the extent to which the groups elaborated on how they identified the variables; conditions in which the literature base was more extensive (e.g., hypertension) tended to specify more variables related to clinical heterogeneity. Manuals for these review organizations stressed that reviewers should specify such variables when they develop the protocols for their reviews. Few groups, however, documented exactly when or how they determined which factors to include in their questions.

Reporting on how clinical heterogeneity is handled in the review process. We focused on the results sections from 11 AHRQ reviews. We looked at whether the authors considered demographic and clinical variables during the analysis phase of their work. Of these 11 reviews, all included a clinical factor (disease variable, risk factor, coexisting condition, or cointervention) in their analysis, and 10 considered one or more demographic variables. However, it is important to realize that what EPCs evaluate in their analysis reflects the extent of the available literature and AHRQ does not require specific analyses for investigating clinical heterogeneity.

We noted five general critiques of how SRs handle clinical heterogeneity in the peer and public review comments for three AHRQ CERs:

  1. Missing information on clinically relevant subgroups;
  2. Failure to include all studies with relevant information on clinically heterogeneous populations;
  3. Too much focus on randomized controlled trials (RCTs) that do not inform “real world” practice;
  4. Inappropriate pooling of dissimilar populations or loss of information because of pooling; and
  5. Too little discussion of availability and/or evaluation of subgroups in conclusions.

External reviewer comments claimed that the publications failed to address important subgroups, even though all three reviews in fact had KQs focused on evaluating subgroups. It is very likely either that the literature synthesized for these reviews did not have sufficient information to provide summary data on important subgroups or that the authors chose not to present subgroup information.

From our literature search, we noted two major concerns. First, timing of subgroup identification is critical (i.e., a priori during the protocol development phase vs. post hoc during the analysis phase). Subgroups identified after the fact are often considered a product of data dredging; these subgroups are likely to be misleading and not confirmed in future studies. Second, the literature also cautions that testing of numerous subgroups without controlling for overall type I error probability may lead to misleading results as well.
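The second concern can be illustrated with a short sketch. It first shows how quickly the chance of at least one false-positive subgroup finding grows with the number of tests (assuming, for simplicity, that the tests are independent), and then applies the Holm step-down adjustment, which is one common way to control the overall type I error. The choice of Holm, the function names, and the p-values are our own illustrative assumptions, not a procedure prescribed by the literature we scanned.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, in the original order, marking which
    hypotheses are rejected while keeping the familywise error rate <= alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return reject


# Hypothetical p-values from five subgroup analyses in a single review.
subgroup_p = [0.004, 0.03, 0.04, 0.20, 0.60]
unadjusted = [p <= 0.05 for p in subgroup_p]
adjusted = holm_reject(subgroup_p)
print("FWER for 10 tests at alpha 0.05:", round(familywise_error_rate(0.05, 10), 3))
print("significant without adjustment:", sum(unadjusted))
print("significant after Holm:", sum(adjusted))
```

In this hypothetical case, subgroup findings that look significant one at a time no longer all survive once the number of comparisons is taken into account, which is the pattern the literature warns about.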

Addressing clinical heterogeneity in SRs. We gleaned two best practices from the existing literature. First, authors should identify factors that may cause clinical heterogeneity during the protocol development stage. Second, they should keep the list of factors to as few as possible to avoid misleading results.

We noted similar views from authors of six SRs on myocardial infarction or osteoarthritis. They commented that, ideally, authors should consider such factors during the protocol development process, but they also acknowledged that, sometimes, too little information about a given topic is available to enable any a priori determination. For that reason, and given the varying literature available for any specific topic when initiating a review, systematic reviewers should be considering clinical heterogeneity throughout the entire review process. This means being attentive to such heterogeneity issues not only during protocol development and analysis of the results, but also as part of developing the inclusion/exclusion criteria, creating abstraction forms, and abstracting data from articles.

Some authors combined clinical and methodological heterogeneity under the rubric of clinical heterogeneity. In addition, many of the publications we reviewed indicated that analysis of individual patient-level data in meta-analyses may allow better assessment of clinical heterogeneity, but the time, cost, and difficulty in obtaining these data are often prohibitive barriers to such analyses.

We provide the following suggestions for extending this work by an evidence-based practice work group. We note, as well, that these are not settled matters in the broader world of systematic reviewers. Thus, any elucidation of these types of questions should prove of benefit beyond the AHRQ EPC ambit. For that reason, and to gain the most up-to-date thinking across many groups dealing with these same problems, AHRQ’s EPC program may wish to involve leaders in the SR field from outside AHRQ and outside the U.S.

The 11 questions in Table C, offered in a somewhat “chronological order” as authors might move through a review, are our priority recommendations for what an EPC work group might address. Many might have obvious subquestions, but we believe that this set can establish a robust agenda for any work group.

Table C. Topics for specific charge to the work group.

Discussion

Clinical heterogeneity exists when patient-level factors (most commonly variables related to patient characteristics, disease location and severity, comorbidities, and accompanying treatment) influence or modify the magnitude of the treatment effect. Unlike statistical heterogeneity, which can be quantified, clinical heterogeneity is detected and evaluated without using statistical methods. Moreover, if reviewers detect statistical heterogeneity, they cannot be sure whether to attribute it to clinical heterogeneity, methodologic heterogeneity (study design issues), chance, or some combination of the three. Also, although clinical and methodological heterogeneity are conceptually distinct, drawing a firm line between them is often difficult.

Clinicians and patients are interested in which factors have important effects on the intervention-outcome association because this information helps them understand who is likely to benefit the most, who is likely to benefit the least, and who has the greatest (or preferably least) risk of experiencing adverse outcomes. Because every patient is different with respect to their comorbidities and concurrent treatments, clinicians need to know the extent to which a test or treatment might benefit the next patient they see. Likewise, patients seek to know whether a test or treatment will benefit them individually.

Unfortunately, research studies are not designed to answer treatment questions about individual patients. In order to provide robust conclusions, we need to study large numbers of individuals. And, if we want to be able to say anything about specific subgroups of the population such as those who are over age 65 with very severe disease, we need to make sure that we include enough people who are elderly with late-stage disease in our studies (i.e., we have to power our studies to estimate the intervention-outcome association in this subgroup). Thus, researchers need to consider subgroups in the planning of their original research studies.
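As a rough illustration of what powering a study for a subgroup implies, the sketch below uses the standard normal-approximation formula for comparing two proportions to estimate the number of subgroup members needed per arm, and then scales that figure by the share of enrolled patients expected to fall in the subgroup. The event rates, the 25 percent subgroup share, and the function names are hypothetical assumptions for illustration only; a real study would use a formal sample-size calculation suited to its design.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect p_control vs p_treated.

    Uses the unpooled normal approximation for a two-sided test of
    two independent proportions.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_treated) ** 2)


def enrollment_for_subgroup(n_subgroup, subgroup_share):
    """Patients per arm to enroll so that the subgroup reaches n_subgroup."""
    return ceil(n_subgroup / subgroup_share)


# Hypothetical scenario: event rate of 30% on control vs 20% on treatment
# within the subgroup, which makes up 25% of the enrolled population.
needed_in_subgroup = n_per_arm(0.30, 0.20)
needed_overall = enrollment_for_subgroup(needed_in_subgroup, 0.25)
print("per-arm subgroup size:", needed_in_subgroup)
print("per-arm total enrollment:", needed_overall)
```

The point of the arithmetic is simply that a subgroup forming a minority of enrolled patients multiplies the enrollment a study needs, which is why subgroups have to be considered when the original study is planned.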

This assumes, however, that we know which particular subgroup is important to evaluate in the primary studies that are eventually included in SRs and CERs. For some intervention-outcome associations, the research literature is so sparse that specifying subgroups a priori for an SR or CER is almost impossible because this information has not been published. Those conducting original research studies may have evaluated important subgroups but not included the information in the published paper. This may occur for two reasons: because the study was not powered to provide robust estimates for that subgroup or because including information on all of the subgroups evaluated might be considered data dredging.

Alternatively, systematic reviewers may be faced with the dilemma of having too many subgroups to evaluate if the topic has been well researched. Reviewers need to be cognizant that when evaluating numerous subgroups, either in original research papers or SRs, one might want to control for multiple comparisons or else the findings may be misleading.

Addressing clinical heterogeneity in various types of SRs is a necessary step, and some review organizations do provide guidance and rules on how to identify and evaluate clinical heterogeneity. The AHRQ EPC Methods Guide does not yet provide guidance on how to identify clinical heterogeneity variables that might modify estimates of treatment outcome in any given review, although an upcoming guidance paper on assessing applicability does provide some suggestions on how to select factors that may be considered for assessing both applicability and clinical heterogeneity. Nor does the Methods Guide discuss methods for addressing clinical heterogeneity or provide suggestions for inclusion in final reports. We conclude that our findings and the recommendations noted earlier can provide a foundation for an AHRQ work group to strategize on how best to address these issues for the EPCs.
