NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Assessing the Predictive Validity of Strength of Evidence Grades: A Meta-Epidemiological Study

Research White Paper

Investigators: , MD, MPH, , MD, , MOP, , PhD, , MSc, PhD, , PhD, MPH, , MD, MPH, , PhD, MPH, , MD, , MS, , MA, and , PhD, MPhil, MA.

Rockville (MD): Agency for Healthcare Research and Quality (US); September 2015.
Report No.: 15-EHC028-EF

Structured Abstract

Objective:

We sought to determine the predictive validity of the U.S. Evidence-based Practice Center (EPC) approach to GRADE (Grading of Recommendations Assessment, Development and Evaluation) by examining how reliably it can predict the likelihood that treatment effects remain stable as new studies emerge.

Study design and setting:

Based on 37 Cochrane reports with outcomes graded as high strength of evidence (SOE), we prepared 160 documents using portions of these bodies of evidence in chronological order. We randomly assigned these documents, which represented different levels of SOE, to professional systematic reviewers from seven academic centers in Austria, Canada, and the United States, who dually graded the SOE using guidance for the EPC program. For each of the 160 documents, we determined whether estimates remained stable as subsequent studies were added to the evidence base. For each grade of SOE, we compared the observed proportion of stable estimates with the expected proportion from an international survey. To determine the predictive validity, we used the Hosmer-Lemeshow test to assess calibration and the C (concordance) index to assess discrimination.
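The two measures named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names `c_index` and `hosmer_lemeshow_statistic` and all numbers in the usage lines are hypothetical and chosen only for illustration. The sketch assumes that each SOE grade is mapped to an expected probability of stability and that each graded estimate has a binary outcome (1 = remained stable, 0 = did not).

```python
from itertools import product


def c_index(predicted, outcomes):
    """Concordance (C) index for a binary outcome.

    Share of (stable, unstable) pairs in which the stable estimate had the
    higher predicted probability of stability; ties count as one half.
    A value of 0.5 means no discrimination, 1.0 means perfect discrimination.
    """
    stable = [p for p, y in zip(predicted, outcomes) if y == 1]
    unstable = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not stable or not unstable:
        raise ValueError("need at least one stable and one unstable estimate")
    score = 0.0
    for p_stable, p_unstable in product(stable, unstable):
        if p_stable > p_unstable:
            score += 1.0
        elif p_stable == p_unstable:
            score += 0.5
    return score / (len(stable) * len(unstable))


def hosmer_lemeshow_statistic(groups):
    """Hosmer-Lemeshow chi-square statistic for calibration.

    `groups` is a list of (n, observed_stable, expected_probability) tuples,
    one per group (here: one per SOE grade). Each group contributes
    (O - n*p)^2 / (n * p * (1 - p)), so every expected_probability must lie
    strictly between 0 and 1.
    """
    statistic = 0.0
    for n, observed, p in groups:
        expected = n * p
        statistic += (observed - expected) ** 2 / (expected * (1.0 - p))
    return statistic


# Hypothetical illustration only; these are NOT the study's data.
# Four estimates with assumed expected probabilities of stability.
predicted = [0.9, 0.9, 0.5, 0.5]
outcomes = [1, 0, 1, 0]
print(c_index(predicted, outcomes))

# One hypothetical grade: 10 estimates, 7 stable, expected probability 0.5.
print(hosmer_lemeshow_statistic([(10, 7, 0.5)]))
```

The Hosmer-Lemeshow statistic is compared against a chi-square distribution to obtain a p-value; the appropriate degrees of freedom depend on whether the expected probabilities were fitted to the same data or supplied externally, which this sketch leaves open.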

Results:

Overall, the predictive validity of the EPC approach to GRADE for the stability of effect estimates was limited. Except for moderate SOE, the expected and observed proportions of stable effect estimates differed considerably. Estimates graded as high SOE were less likely to remain stable than expected by producers and users of systematic reviews. By contrast, estimates graded as low or insufficient SOE were substantially more likely to remain stable than expected. In this sample, the EPC approach to GRADE could not reliably predict the likelihood that individual bodies of evidence remain stable as new evidence becomes available. Depending on the definition of stability used, C-indices ranged between 0.56 (95% CI, 0.47 to 0.66) and 0.58 (95% CI, 0.50 to 0.67), indicating a low discriminatory ability.

Conclusions:

The limited predictive validity of the EPC approach to GRADE seems to reflect a mismatch between expected and observed changes in treatment effects as bodies of evidence advance from insufficient to high SOE. In addition, many low or insufficient grades appear to be too strict.

Prepared for: Agency for Healthcare Research and Quality, U.S. Department of Health and Human Services [1], Contract No. 290-2012-00008-I. Prepared by: RTI International–University of North Carolina Evidence-based Practice Center, Research Triangle Park, NC.

Suggested citation:

Gartlehner G, Dobrescu A, Swinson Evans T, Bann C, Robinson KA, Reston J, Thaler K, Skelly A, Glechner A, Peterson K, Kien C, Lohr KN. Assessing the Predictive Validity of Strength of Evidence Grades: A Meta-Epidemiological Study, Research White Paper. (Prepared by the RTI International–University of North Carolina Evidence-based Practice Center under Contract No. 290-2012-00008-I.) AHRQ Publication No. 15-EHC028-EF. Rockville, MD: Agency for Healthcare Research and Quality. September 2015. www.effectivehealthcare.ahrq.gov/reports/final.cfm.

This report is based on research conducted by the RTI International–University of North Carolina Evidence-based Practice Center (EPC) under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No. 290-2012-00008-I). The findings and conclusions in this document are those of the authors, who are responsible for its contents; the findings and conclusions do not necessarily represent the views of AHRQ. Therefore, no statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.

The information in this report is intended to help health care decisionmakers (patients and clinicians, health system leaders, and policymakers, among others) make well-informed decisions and thereby improve the quality of health care services. This report is not intended to be a substitute for the application of clinical judgment. Anyone who makes decisions concerning the provision of clinical care should consider this report in the same way as any medical reference and in conjunction with all other pertinent information, i.e., in the context of available resources and circumstances presented by individual patients.

AHRQ or U.S. Department of Health and Human Services endorsement of any derivative products that may be developed from this report, such as clinical practice guidelines, other quality enhancement tools, or reimbursement or coverage policies, may not be stated or implied.

Drs. Gartlehner and Thaler are members of the GRADE Working Group. Drs. Gartlehner, Lohr, and Reston are co-authors of the AHRQ guidance for grading the strength of evidence. The other authors have no disclosures to report.

[1] 540 Gaither Road, Rockville, MD 20850; www.ahrq.gov

Bookshelf ID: NBK321518; PMID: 26468566
