NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Structured Abstract
Background:
Web applications that use natural language processing technologies such as text mining and text classification to support systematic reviewers during abstract screening have become more user-friendly and more common. Such semi-automated screening tools can increase efficiency by reducing the number of abstracts that need to be screened or by replacing one human screener once the tool's algorithm has been adequately trained. Workload savings of between 30 percent and 70 percent might be possible with such tools. The goal of our project was to conduct a case study exploring a screening approach that temporarily replaces a human screener with a semi-automated screening tool.
Methods:
To address our objective, we evaluated the accuracy of a machine-assisted screening approach using an Agency for Healthcare Research and Quality comparative effectiveness review as the reference standard. We chose DistillerAI as the semi-automated screening tool for our project, applying its naïve Bayesian machine-learning option. Five teams screened the same 2,472 abstracts in parallel, using the machine-assisted approach. Each team trained DistillerAI with 300 randomly selected abstracts that the team had dually screened. For the remaining 2,172 abstracts, DistillerAI replaced one human screener in each team and provided predictions about the relevance of records. We used a prediction score of 0.5 (i.e., inconclusive) or greater to classify a record as an inclusion. A single reviewer also screened all remaining abstracts. A second human screener resolved conflicts between the single reviewer and DistillerAI. We compared the decisions of the machine-assisted approach, single-reviewer screening (i.e., no machine assistance), and screening with DistillerAI alone (i.e., no human involvement after training) against the reference standard and calculated sensitivities, specificities, and the area under the receiver operating characteristic curve. In addition, we determined the interrater agreement, the proportion of included abstracts, and the number of conflicts between human screeners and DistillerAI.
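The decision rule and accuracy measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual analysis code: the function names and the boolean encoding of include/exclude decisions are assumptions made for the example, and only the 0.5 threshold and the conflict-resolution step come from the Methods.

```python
from typing import Sequence, Tuple

# From the Methods: a prediction score of 0.5 or greater counts as an inclusion.
THRESHOLD = 0.5


def machine_decision(score: float, threshold: float = THRESHOLD) -> bool:
    """Turn a prediction score into an include (True) / exclude (False) decision."""
    return score >= threshold


def machine_assisted_decision(human: bool, score: float, adjudicator: bool) -> bool:
    """Hypothetical helper: combine one human screener with the tool.

    If the human and the tool agree, that shared decision stands.
    If they conflict, the second human screener's decision is used.
    """
    machine = machine_decision(score)
    return human if human == machine else adjudicator


def sensitivity_specificity(
    decisions: Sequence[bool], reference: Sequence[bool]
) -> Tuple[float, float]:
    """Compare screening decisions with the reference standard."""
    pairs = list(zip(decisions, reference))
    tp = sum(1 for d, r in pairs if d and r)          # relevant, included
    fn = sum(1 for d, r in pairs if not d and r)      # relevant, missed
    tn = sum(1 for d, r in pairs if not d and not r)  # irrelevant, excluded
    fp = sum(1 for d, r in pairs if d and not r)      # irrelevant, included
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference standard needs both relevant and irrelevant records")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    human = [True, False, False, True]
    scores = [0.7, 0.6, 0.1, 0.2]
    adjudicator = [True, False, False, True]
    reference = [True, False, False, True]
    combined = [
        machine_assisted_decision(h, s, a)
        for h, s, a in zip(human, scores, adjudicator)
    ]
    print(sensitivity_specificity(combined, reference))
```

In this sketch the adjudicator's decision is consulted only when the human screener and the tool disagree, which mirrors the conflict-resolution step described above.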
Results:
The mean sensitivity of the machine-assisted screening approach across the five screening teams was 78 percent (95% confidence interval [CI], 66% to 90%), and the mean specificity was 95 percent (95% CI, 92% to 97%). By comparison, the sensitivity of single-reviewer screening was also 78 percent (95% CI, 66% to 89%); the sensitivity of DistillerAI alone was 14 percent (95% CI, 0% to 31%). Specificities for single-reviewer screening and DistillerAI alone were 94 percent (95% CI, 91% to 97%) and 98 percent (95% CI, 97% to 100%), respectively. Machine-assisted screening and single-reviewer screening had similar areas under the curve (0.87 and 0.86, respectively); by contrast, the area under the curve for DistillerAI alone was just slightly better than chance (0.56). The interrater agreement between human screeners and DistillerAI with a prevalence-adjusted kappa was 0.85 (95% CI, 0.84 to 0.86).
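As a rough consistency check on these figures, the sketch below relates the reported measures to one another. It rests on two assumptions that the abstract does not state: that each area under the curve comes from a single binary decision point, in which case it equals the mean of sensitivity and specificity, and that the prevalence-adjusted kappa follows the common PABAK definition, 2 × observed agreement − 1. The function names are illustrative.

```python
def auc_single_threshold(sensitivity: float, specificity: float) -> float:
    """Area under an ROC curve that has only one operating point.

    With a single binary decision, the curve joins (0, 0), the operating
    point, and (1, 1), so the area is the mean of sensitivity and specificity.
    """
    return (sensitivity + specificity) / 2


def observed_agreement_from_pabak(pabak: float) -> float:
    """Invert PABAK = 2 * Po - 1 to recover the observed agreement Po.

    Assumes the standard prevalence-adjusted, bias-adjusted kappa.
    """
    return (pabak + 1) / 2


if __name__ == "__main__":
    # Rounded means reported in the Results; outputs are therefore approximate.
    approaches = {
        "machine-assisted": (0.78, 0.95),
        "single reviewer": (0.78, 0.94),
        "DistillerAI alone": (0.14, 0.98),
    }
    for name, (sens, spec) in approaches.items():
        print(name, auc_single_threshold(sens, spec))
    print("implied observed agreement:", observed_agreement_from_pabak(0.85))
```

Under these assumptions, the rounded sensitivities and specificities give areas of about 0.865, 0.86, and 0.56, in line with the reported 0.87, 0.86, and 0.56, and a kappa of 0.85 would correspond to raw agreement of roughly 92.5 percent.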
Discussion:
Findings of our study indicate that the accuracy of DistillerAI is not yet adequate to replace a human screener temporarily during abstract screening. The approach that we tested missed too many relevant studies and created too many conflicts between human screeners and DistillerAI. Rapid reviews, which do not require detecting the totality of the relevant evidence, may find semi-automation tools to have greater utility than traditional systematic reviews.
Suggested citation:
Gartlehner G, Wagner G, Lux L, Affengruber L, Dobrescu A, Kaminski-Hartenthaler A, Viswanathan M. Assessing the Accuracy of Machine-Assisted Abstract Screening With DistillerAI: A User Study. Methods Research Report. (Prepared by the RTI International–University of North Carolina Evidence-based Practice Center under Contract No. 290-2015-00011-I.) AHRQ Publication No. 19(20)-EHC026-EF. Rockville, MD: Agency for Healthcare Research and Quality; November 2019. Posted final reports are located on the Effective Health Care Program search page. DOI: https://doi.org/10.23970/AHRQEPCMETHMACHINEDISTILLER.
This report is based on research conducted by the RTI International–University of North Carolina Evidence-based Practice Center (EPC) under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No. 290-2015-00011-I). The findings and conclusions in this document are those of the authors, who are responsible for its contents; the findings and conclusions do not necessarily represent the views of AHRQ. Therefore, no statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.
None of the investigators have any affiliations or financial involvement that conflicts with the material presented in this report.
The information in this report is intended to help healthcare researchers and funders of research make well informed decisions in designing and funding research and thereby improve the quality of healthcare services. This report is not intended to be a substitute for the application of scientific judgment. Anyone who makes decisions concerning the provision of clinical care should consider this report in the same way as any medical research and in conjunction with all other pertinent information, i.e., in the context of available resources and circumstances.
This report is made available to the public under the terms of a licensing agreement between the author and AHRQ. This report may be used and reprinted without permission except those copyrighted materials that are clearly noted in the report. Further reproduction of those copyrighted materials is prohibited without the express permission of copyright holders.
AHRQ or the U.S. Department of Health and Human Services endorsement of any derivative products that may be developed from this report, such as clinical practice guidelines, other quality enhancement tools, or reimbursement or coverage policies, may not be stated or implied.
Persons using assistive technology may not be able to fully access information in this report. For assistance contact EPC@ahrq.hhs.gov.