Structured Abstract
Background:
Numerous tools exist to assess methodological quality, or risk of bias, in systematic reviews; however, few have undergone extensive reliability or validity testing.
Objectives:
To (1) assess the reliability of the Cochrane Risk of Bias (ROB) tool for randomized controlled trials (RCTs) and the Newcastle-Ottawa Scale (NOS) for cohort studies between individual raters, and, for the ROB tool, between consensus agreements of individual raters; (2) assess the validity of the Cochrane ROB tool and NOS by examining the association between study quality and treatment effect size (ES); and (3) examine the impact of study-level factors on reliability and validity.
Methods:
Two reviewers independently assessed risk of bias for 154 RCTs. For a subset of 30 RCTs, two reviewers from each of four Evidence-based Practice Centers assessed risk of bias and reached consensus. Inter-rater agreement was assessed using kappa statistics. We assessed the association between ES and risk of bias using meta-regression, and we examined the impact of study-level factors on this association using subgroup analyses. Two reviewers independently applied the NOS to 131 cohort studies from eight meta-analyses. Inter-rater agreement was calculated using kappa statistics. Within each meta-analysis, we generated a ratio of pooled estimates for each quality domain; the ratios were then combined, using inverse-variance weighting and a random-effects model, to give an overall estimate of differences in effect estimates.
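The report does not include analysis code. As an illustration only, the following minimal Python sketch shows two of the calculations described above: an unweighted Cohen's kappa for agreement between two raters, and inverse-variance, random-effects pooling of log ratios of effect estimates. The DerSimonian-Laird estimator of between-study variance and all data values are assumptions made for this example, not taken from the report.

```python
# Illustrative sketch only; not the authors' analysis code.
# All study data below are hypothetical.
import math
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)


def pooled_log_ratio_random_effects(log_ratios, variances):
    """Inverse-variance pooling of log ratios of effect estimates under a
    random-effects model (DerSimonian-Laird tau^2 is an assumed choice)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, log_ratios)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ratios))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_ratios) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ratios)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star))


# Hypothetical risk-of-bias judgments for one domain from two raters:
rater1 = ["low", "low", "unclear", "high", "low", "unclear"]
rater2 = ["low", "unclear", "unclear", "high", "high", "low"]
print("kappa =", round(cohens_kappa(rater1, rater2), 2))

# Hypothetical log ratios of pooled estimates (e.g., high vs. low quality)
# from three meta-analyses, with their variances:
pooled, se = pooled_log_ratio_random_effects([0.10, -0.05, 0.20],
                                             [0.04, 0.02, 0.05])
print("ratio of ratios = {:.2f} (95% CI {:.2f} to {:.2f})".format(
    math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)))
```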
Results:
Inter-rater reliability between two reviewers was considered fair for most domains (κ ranging from 0.24 to 0.37), except for sequence generation (κ=0.79, substantial). Inter-rater reliability of consensus assessments across four reviewer pairs was moderate for sequence generation (κ=0.60), fair for allocation concealment and “other sources of bias” (κ=0.37 and 0.27, respectively), and slight for the remaining domains (κ ranging from 0.05 to 0.09). Inter-rater variability was influenced by study-level factors, including nature of outcome, nature of intervention, study design, trial hypothesis, and funding source. Inter-rater variability resulted more often from different interpretations of the tool than from different information identified in the study reports. No statistically significant differences in ES were found when comparing studies categorized as high, unclear, or low risk of bias. Inter-rater reliability of the NOS varied from substantial for length of followup to poor for selection of the non-exposed cohort and demonstration that the outcome was not present at the outset of the study. We found no association between individual NOS items or the overall NOS score and effect estimates.
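The qualitative labels above (slight, fair, moderate, substantial) appear to follow the commonly used Landis and Koch (1977) benchmarks for kappa; this is an assumption for illustration, not stated in the abstract. A minimal helper mapping kappa values to those bands:

```python
# Illustrative only: Landis and Koch (1977) benchmarks for kappa, which the
# qualitative labels in the results (slight, fair, moderate, substantial)
# appear to follow.
def agreement_label(kappa):
    if kappa < 0.00:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


print(agreement_label(0.79))  # "substantial" (sequence generation, two reviewers)
print(agreement_label(0.30))  # "fair" (typical of the other domains)
```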
Conclusion:
More specific guidance is needed to apply risk of bias and quality assessment tools. The study-level factors shown to influence agreement provide direction for such detailed guidance. Low agreement across pairs of reviewers has implications for incorporating risk of bias into results and for grading the strength of evidence. Variable agreement for the NOS, together with the lack of evidence that it discriminates studies that may provide biased results, underscores the need for more detailed guidance on applying the tool in systematic reviews.
Contents
- Preface
- Acknowledgments
- Steering Committee
- Peer Reviewers
- Executive Summary
- Introduction
- Methods
- Results
- Summary and Discussion
- References
- Abbreviations
- Appendixes
- Appendix A Sample of Randomized Controlled Trials
- Appendix B Guidelines for Risk of Bias Assessments
- Appendix C Variables for Data Extraction from Randomized Controlled Trials
- Appendix D Meta-Analyses and Cohort Studies Used for NOS Assessments
- Appendix E Decision Rules for Application of the Newcastle-Ottawa Scale
- Appendix F Supplementary Information for NOS Assessments
- Appendix G Description of Randomized Controlled Trials
- Appendix H Data Used for Kappa Calculations
- Appendix I Inter-rater Reliability on Risk of Bias Assessments, by Domain and Study-level Variable With Confidence Intervals
- Appendix J Sources of Disagreement Across Consensus Assessments Using the Risk of Bias Tool
Prepared for: Agency for Healthcare Research and Quality, U.S. Department of Health and Human Services.1 Contract No. 290-2007-10021-I. Prepared by: University of Alberta Evidence-based Practice Center, Edmonton, Alberta, Canada
Suggested citation:
Hartling L, Hamm M, Milne A, Vandermeer B, Santaguida PL, Ansari M, Tsertsvadze A, Hempel S, Shekelle P, Dryden DM. Validity and inter-rater reliability testing of quality assessment instruments. (Prepared by the University of Alberta Evidence-based Practice Center under Contract No. 290-2007-10021-I.) AHRQ Publication No. 12-EHC039-EF. Rockville, MD: Agency for Healthcare Research and Quality. March 2012. www.effectivehealthcare.ahrq.gov/reports/final.cfm.
This report is based on research conducted by the University of Alberta Evidence-based Practice Center under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No. 290-2007-10021-I). The findings and conclusions in this document are those of the author(s), who are responsible for its content, and do not necessarily represent the views of AHRQ. No statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.
The information in this report is intended to help clinicians, employers, policymakers, and others make informed decisions about the provision of health care services. This report is intended as a reference and not as a substitute for clinical judgment.
This report may be used, in whole or in part, as the basis for the development of clinical practice guidelines and other quality enhancement tools, or as a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products or actions may not be stated or implied.
None of the investigators have any affiliations or financial involvement that conflicts with the material presented in this report.
1. 540 Gaither Road, Rockville, MD 20850; www.ahrq.gov