Gartlehner G, Dobrescu A, Evans TS, et al. Assessing the Predictive Validity of Strength of Evidence Grades: A Meta-Epidemiological Study [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2015 Sep.
Results
Of 160 bodies of evidence, researchers dually graded 11 percent (n=17) as high, 42 percent (n=68) as moderate, 32 percent (n=51) as low, and 15 percent (n=24) as insufficient (very low) SOE. The inter-rater reliability was 0.56 (95% CI, 0.40 to 0.68), suggesting moderate agreement among researchers assigning SOE grades.
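To illustrate how such an agreement statistic can be computed, the sketch below calculates a weighted kappa over two raters' SOE grades. The report does not specify the exact reliability estimator, so treating it as a kappa-type coefficient is an assumption, and the rating vectors are illustrative placeholders rather than study data.

```python
# Minimal sketch: inter-rater agreement on SOE grades for two reviewers.
# Assumes a kappa-type statistic (the report does not name the exact estimator);
# the rating vectors below are illustrative, not the study data.
from sklearn.metrics import cohen_kappa_score

grades = ["insufficient", "low", "moderate", "high"]

rater_a = ["high", "moderate", "moderate", "low", "insufficient", "low", "moderate"]
rater_b = ["moderate", "moderate", "moderate", "low", "low", "low", "high"]

# Linear weighting penalizes disagreements by how many grades apart they are.
kappa = cohen_kappa_score(rater_a, rater_b, labels=grades, weights="linear")
print(f"weighted kappa: {kappa:.2f}")  # values around 0.56 indicate moderate agreement
```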
Concordance Between Expected and Observed Proportions of Stable Effect Estimates
For each grade, we compared the expected proportions of stable effect estimates with the observed proportions from our sample, using three different definitions of stability (see Methods and Table 2). Table 1 presents the proportions of estimates that producers and users of systematic reviews expected to remain stable for each SOE grade.
Overall, except for moderate SOE, the observed proportions of stable estimates differed considerably from the expected proportions, regardless of the definition used. Fewer estimates graded as high SOE in our sample remained stable than producers and users of systematic reviews expected; in our survey, 208 experts expected high SOE outcomes to remain stable in at least 86 percent of cases.6 In our sample, the observed proportions of stable estimates for definitions 1, 2, and 3 were 71 percent, 76 percent, and 76 percent, respectively. Conversely, substantially more low or insufficient SOE estimates than expected remained stable. Table 4 presents expected and observed proportions of stable effect estimates by grade of SOE for each of the three definitions of stability.
Figures 2, 3, and 4 illustrate the overlap between the expected proportions of stable effects (large black boxes) and the confidence intervals (CIs) of the observed proportions (grey columns) for the different grades of SOE and definitions of stability. The circles in the columns mark the point estimates. The y-axis shows the proportion of estimates that remained stable; the x-axis presents the four grades of SOE. For insufficient SOE, for example, producers and users of systematic reviews expected 0 percent to 33 percent of estimates to remain stable as new studies are added to the evidence base. For definition 1, the most rigorous of the three definitions of stability, more than half (54 percent) of effect estimates graded as insufficient remained stable. The CI ranged from 33 percent to 74 percent, which barely overlaps the expected range for insufficient SOE. For the less rigorous definitions 2 and 3, the CIs did not overlap at all with the range that producers and users of systematic reviews expected for insufficient SOE grades. By contrast, observed proportions of stable results for moderate SOE grades were concordant for all three definitions, with CIs overlapping widely with the range of expected proportions. Estimates graded as low SOE showed some concordance for definitions 1 and 3 but little for definition 2.
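The following sketch shows how such an overlap check could be performed: observed counts of stable estimates per grade yield a binomial proportion and a confidence interval, which are then compared with the expected range. The counts are placeholders rather than the study data, the Wilson score method is an assumption about how the CIs were computed, and only some of the expected bounds (at least 86 percent for high, 0 to 33 percent for insufficient) come from the report text; the remaining bounds are placeholders.

```python
# Minimal sketch: observed stability proportions with binomial CIs per SOE grade,
# checked against expected ranges. Counts are illustrative placeholders; the
# Wilson score CI and some expected bounds are assumptions.
from statsmodels.stats.proportion import proportion_confint

expected_range = {                # expected proportion of stable estimates
    "high": (0.86, 1.00),         # report: experts expected >=86% to remain stable
    "moderate": (0.71, 0.85),     # upper bound is a placeholder
    "low": (0.34, 0.60),          # lower bound is a placeholder
    "insufficient": (0.00, 0.33), # report: 0-33%
}
observed = {                      # (stable, total) per grade -- illustrative counts
    "high": (12, 17),
    "moderate": (50, 68),
    "low": (30, 51),
    "insufficient": (13, 24),
}

for grade, (stable, total) in observed.items():
    lo, hi = proportion_confint(stable, total, alpha=0.05, method="wilson")
    exp_lo, exp_hi = expected_range[grade]
    overlaps = lo <= exp_hi and hi >= exp_lo
    print(f"{grade:>12}: {stable/total:.0%} (95% CI {lo:.0%}-{hi:.0%}), "
          f"expected {exp_lo:.0%}-{exp_hi:.0%}, overlap={overlaps}")
```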
Predictive Validity of the EPC Approach to GRADE
To determine the predictive validity of the EPC approach to GRADE, we assessed its calibration (how accurately it predicts the likelihood that effect estimates will remain stable as new evidence emerges) and its discrimination (how accurately it differentiates between effect estimates that will remain stable and those that will change substantially). In theory, an ideal predictive tool would reliably identify estimates with a high likelihood of remaining stable and always grade them as high SOE. Conversely, effect estimates with a very low likelihood of remaining stable would always be graded as insufficient. Such an ideal tool would be perfectly calibrated and have a C index of 1.
Overall, regardless of the definition used, the calibration of the EPC approach to GRADE was suboptimal. When we compared observed proportions of stable effect estimates with lower, middle, and upper values of the ranges of expected proportions, eight of nine comparisons were statistically significantly different based on the Hosmer-Lemeshow test (Table 5), indicating a lack of calibration.
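As a rough illustration of this calibration check, the sketch below computes a Hosmer-Lemeshow-style chi-square statistic comparing observed counts of stable estimates per SOE grade with counts expected under chosen points from the expected ranges. The counts, expected probabilities, and degrees-of-freedom convention are assumptions for illustration, not the report's actual data or implementation.

```python
# Minimal sketch of a Hosmer-Lemeshow-style calibration check across SOE grades.
# All numbers below are illustrative placeholders; the G-2 degrees-of-freedom
# convention is an assumption about the report's implementation.
import numpy as np
from scipy.stats import chi2

observed_stable = np.array([12, 50, 30, 13])          # high, moderate, low, insufficient
n_per_grade     = np.array([17, 68, 51, 24])
expected_prob   = np.array([0.93, 0.78, 0.50, 0.17])  # chosen points from expected ranges

expected_stable = n_per_grade * expected_prob
hl_stat = np.sum((observed_stable - expected_stable) ** 2
                 / (n_per_grade * expected_prob * (1 - expected_prob)))
dof = len(n_per_grade) - 2
p_value = chi2.sf(hl_stat, dof)
print(f"H-L statistic = {hl_stat:.2f}, p = {p_value:.3f}")  # p < 0.05 suggests poor calibration
```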
Likewise, the C indices for the EPC approach to GRADE were low, with values close to the value expected by chance (i.e., a C index of 0.50). For definitions 1, 2, and 3, the C indices were 0.57 (95% CI, 0.50 to 0.67), 0.56 (95% CI, 0.47 to 0.66), and 0.58 (95% CI, 0.50 to 0.67), respectively. The C indices for definitions 1 and 3 reached statistical significance (CIs did not cross 0.5). Taking the uncertainty reflected in the confidence intervals into account, these results mean that in the worst case (lower confidence limits), the EPC approach to GRADE has no ability to discriminate between effect estimates with a low or high likelihood of remaining stable. In the best case (upper confidence limits), it can accurately distinguish between such estimates in 67 percent of cases.
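For readers unfamiliar with the C index, the sketch below shows one common way to compute it: with a binary outcome (whether an estimate remained stable) and an ordinal predictor (the SOE grade), the C index equals the area under the ROC curve. The data are illustrative placeholders, and treating the four grades as equally spaced scores is an assumption about the report's implementation.

```python
# Minimal sketch of the discrimination (C index) calculation for SOE grades.
# Illustrative data only; equal spacing of grade scores is an assumption.
from sklearn.metrics import roc_auc_score

grade_score = {"insufficient": 0, "low": 1, "moderate": 2, "high": 3}

soe_grades = ["high", "high", "moderate", "moderate", "low", "low", "insufficient", "insufficient"]
stable     = [1,      0,      1,          1,          0,     1,     0,              1]

c_index = roc_auc_score(stable, [grade_score[g] for g in soe_grades])
print(f"C index: {c_index:.2f}")  # 0.5 = chance, 1.0 = perfect discrimination
```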
The low overall predictive validity, however, stems primarily from the discordance between expected and observed proportions of stable effect estimates for high and insufficient SOE. In a post-hoc sensitivity analysis, we chose proportions within the expected ranges (Table 1) that were closest to the observed proportions of stable effect estimates. Using expected proportions of 86 percent for high SOE (the lower end of the expected range), 71 percent for moderate SOE, 60 percent for low SOE, and 33 percent for insufficient SOE (the latter two at the upper ends of their expected ranges), we found that the EPC approach to GRADE achieved satisfactory calibration for definitions 1 and 3 (Table 5).