NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Structured Abstract
Background:
Classification of study design can help provide a common language for researchers. Within a systematic review, definition of specific study designs can help guide inclusion, assess the risk of bias, pool studies, interpret results, and grade the body of evidence. However, recent research demonstrated poor reliability for an existing classification scheme.
Objectives:
To review tools used to classify study designs; to select a tool for evaluation; to develop instructions for application of the tool to intervention/exposure studies; and to test the tool for accuracy and interrater reliability.
Methods:
We contacted representatives from all AHRQ Evidence-based Practice Centers (EPCs), other relevant organizations, and experts in the field to identify tools used to classify study designs. Twenty-three tools were identified; 10 were relevant to our objectives. The Steering Committee ranked the 10 tools using predefined criteria. The highest-ranked tool was a design algorithm for studies of health care interventions developed, but no longer advocated, by the Cochrane Non-Randomised Studies Methods Group. This tool was used as the basis for our classification tool and was revised to encompass more study designs and to incorporate elements of other tools. A sample of 30 studies was used to test the tool. Three members of the Steering Committee developed a reference standard (i.e., the “true” classification for each study); 6 testers applied the revised tool to the studies. Interrater reliability was measured using Fleiss’ kappa (κ) and accuracy of the testers’ classification was assessed against the reference standard. Based on feedback from the testers and the reference standard committee, the tool was further revised and tested by another 6 testers using 15 studies randomly selected from the original sample.
Results:
In the first round of testing, interrater reliability was fair among the testers (κ = 0.26) and the reference standard committee (κ = 0.33). Disagreements occurred at all decision points in the algorithm; revisions were made based on the feedback. The second round of testing showed improved interrater reliability (κ = 0.45, moderate agreement) with improved, but still low, accuracy. The most common disagreements were whether the study was “experimental” (5/15 studies) and whether there was a comparison (4/15 studies). In both rounds of testing, the level of agreement was higher for testers who had completed graduate-level training than for those who had not.
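The reliability figures above are Fleiss' kappa values, which generalize Cohen's kappa to more than two raters by comparing observed pairwise agreement against the agreement expected from the marginal category proportions. As an illustrative sketch (the function name and the toy data are ours, not from the report), the standard Fleiss formula can be computed as:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N subjects rated by n raters into k categories.

    counts[i][j] = number of raters who assigned subject i to category j;
    every row must sum to the same number of raters n.
    """
    N = len(counts)
    n = sum(counts[0])
    # Mean per-subject agreement: proportion of rater pairs that agree.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N
    # Chance agreement from the marginal proportion of each category.
    total = N * n
    p = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    P_e = sum(x * x for x in p)
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 3 studies classified by 4 raters into 2 design categories.
print(fleiss_kappa([[4, 0], [2, 2], [0, 4]]))
```

Under the commonly used Landis and Koch benchmarks, 0.21–0.40 is "fair" and 0.41–0.60 is "moderate" agreement, which is how the κ values of 0.26 and 0.45 are labeled here.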
Conclusion:
Potential reasons for the observed low reliability and accuracy include the lack of clarity and comprehensiveness of the tool, inadequate reporting of the studies, and variability in user characteristics. Application of a tool to classify study designs in the context of a systematic review should be accompanied by adequate training, pilot testing, and documented decision rules.
Contents
- Preface
- Executive Summary
- 1. Introduction
- 2. Methods
- 3. Results
- 4. Discussion
- References
- Appendixes
- Appendix A Steering Committee Members
- Appendix B Contacts for Identification of Study Design Classification Tools
- Appendix C Letter of Request To Identify Study Design Classification Tools
- Appendix D Bibliography and Summary of Studies for Classification (Objective 4)
- Appendix E Round One Algorithm and Glossary
- Appendix F Changes Made Between Round One and Round Two Algorithm
- Appendix G Round Two Algorithm and Glossary
- Appendix H Top Ranked Classification Tools Design
Acknowledgments: We are very grateful to the following individuals from the University of Alberta Evidence-based Practice Center who were involved in testing the taxonomies and providing feedback: Ahmed Abou-Setta, Liza Bialy, Michele Hamm, Nicola Hooton, David Jones, Andrea Milne, Kelly Russell, Jennifer Seida, and Kai-On Wong. We thank Ben Vandermeer for conducting the statistical analyses. We thank Karen Siegel from AHRQ for her input and advice during the course of the project. We are also appreciative of the individuals who provided sample taxonomies and of the members of other Evidence-based Practice Centers who provided the studies that were used for testing.
Prepared for: Agency for Healthcare Research and Quality, U.S. Department of Health and Human Services,1 Contract No. 290-02-0023. Prepared by: University of Alberta Evidence-based Practice Center, Edmonton, Alberta, Canada.
Suggested citation:
Hartling L, Bond K, Harvey K, Santaguida PL, Viswanathan M, Dryden DM. Developing and Testing a Tool for the Classification of Study Designs in Systematic Reviews of Interventions and Exposures. Agency for Healthcare Research and Quality; December 2010. Methods Research Report. AHRQ Publication No. 11-EHC-007. Available at http://effectivehealthcare.ahrq.gov/.
This report is based on research conducted by the University of Alberta Evidence-based Practice Center (EPC) under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No. 290-02-0023). The findings and conclusions in this document are those of the author(s), who are responsible for its contents; the findings and conclusions do not necessarily represent the views of AHRQ. Therefore, no statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.
The information in this report is intended to help clinicians, employers, policymakers, and others make informed decisions about the provision of health care services. This report is intended as a reference and not as a substitute for clinical judgment.
This report may be used, in whole or in part, as the basis for development of clinical practice guidelines and other quality enhancement tools, or as a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products may not be stated or implied.
This document was written with support from the Effective Health Care Program at AHRQ. None of the authors has a financial interest in any of the products discussed in this document.
The investigators have no relevant financial interests in the report. The investigators have no employment, consultancies, honoraria, or stock ownership or options, or royalties from any organization or entity with a financial interest or financial conflict with the subject matter discussed in the report.
1. 540 Gaither Road, Rockville, MD 20850; www.ahrq.gov