Testing a Machine Learning Tool for Facilitating Living Systematic Reviews of Chronic Pain Treatments [Internet]

Review
Rockville (MD): Agency for Healthcare Research and Quality (US); 2020 Nov. Report No.: 21-EHC004.

Excerpt

Background: Living systematic reviews can more rapidly and efficiently incorporate new evidence into systematic reviews through ongoing updates. A challenge to conducting living systematic reviews is identifying new articles in a timely manner. Optimizing search strategies to identify new studies before they have been indexed in electronic databases, together with automation using machine learning classifiers, may increase the efficiency of identifying relevant new studies.

Methods: This project had three stages: develop optimized search strategies (Stage 1), test a machine learning classifier on optimized searches (Stage 2), and test the machine learning classifier on monthly update searches (Stage 3). Ovid® MEDLINE® search strategies were developed for three previously conducted chronic pain reviews using standard methods, combining National Library of Medicine Medical Subject Headings (MeSH) terms and text words (“standard searches”). Text word-only search strategies (“optimized searches”) were also developed based on the inclusion criteria for each review. In Stage 2, a machine learning classifier was trained and refined using citations from each of the completed pain reviews (“training set”) and tested on a subset of more recent citations (“simulated update”) to develop models that could predict the relevance of citations for each topic. In Stage 3, the machine learning models were prospectively applied to “optimized” monthly update searches conducted for the three pain reviews.
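As a rough illustration of the Stage 2 setup (the report names a support vector machine model but not a specific toolkit or pipeline), the Python sketch below trains a TF-IDF plus SVM relevance classifier on labeled citations from a completed review and scores citations from a simulated update search. All citation text, labels, and parameter choices are placeholder assumptions, not data from the project.

# Minimal sketch only; not the report's actual pipeline or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical "training set": title text with reviewer decisions
# (1 = included in the completed review, 0 = excluded).
training_citations = [
    ("Randomized trial of duloxetine for chronic low back pain", 1),
    ("Acupuncture versus sham for chronic neck pain: a randomized study", 1),
    ("Long-term opioid therapy outcomes in chronic noncancer pain", 1),
    ("Imaging strategies for acute appendicitis in children", 0),
    ("Cost-effectiveness of statins for primary prevention", 0),
    ("Surgical management of acute ankle fracture", 0),
]
texts, labels = zip(*training_citations)

# TF-IDF text features feeding a linear-kernel SVM with probability
# estimates, so that update-search citations can be ranked or
# thresholded by predicted relevance.
model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2))),
    ("svm", SVC(kernel="linear", probability=True, class_weight="balanced")),
])
model.fit(texts, labels)

# Score citations retrieved by a simulated (or monthly) update search.
update_texts = [
    "Mindfulness-based stress reduction for chronic low back pain",
    "Antibiotic prophylaxis in dental implant surgery",
]
relevance_probability = model.predict_proba(update_texts)[:, 1]
for title, p in zip(update_texts, relevance_probability):
    print(f"{p:.2f}  {title}")

In practice the training sets would be the full citation records (titles and abstracts) with the reviewers' include/exclude decisions from each completed pain review, refreshed as the living review accrues new screening decisions.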

Results: In Stage 1, the optimized searches were less precise than the standard searches (i.e., identified more citations that reviewers eventually excluded) but were highly sensitive. In Stage 2, a machine learning classifier using a support vector machine model achieved 96 to 100 percent recall for all topics, with precision between 1 and 7 percent. Performance was similar using the training data and on the simulated updates. The machine learning classifier excluded (i.e., classified as low relevance) 35 to 65 percent of studies. In Stage 3, the machine learning classifier achieved 97 to 100 percent sensitivity and excluded (i.e., classified as very low probability) 45 to 76 percent of studies identified in prospective, actual update searches. The estimated time savings using the machine learning classifier ranged from 2.0 to 13.2 hours.
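The Stage 2 and Stage 3 metrics (sensitivity/recall among relevant citations, the proportion of retrieved citations excluded below a low-probability cutoff, and the resulting screening time saved) could be computed along the lines of the following sketch. The cutoff, per-citation screening time, and all toy data are illustrative assumptions; the report does not specify these exact values or formulas.

from typing import Sequence

def screening_metrics(probabilities: Sequence[float],
                      is_relevant: Sequence[bool],
                      cutoff: float = 0.05,
                      minutes_per_citation: float = 0.5):
    """Return (sensitivity, proportion_excluded, hours_saved) for one update search."""
    kept = [p >= cutoff for p in probabilities]          # citations forwarded to reviewers
    relevant_total = sum(is_relevant)
    relevant_kept = sum(k for k, r in zip(kept, is_relevant) if r)
    sensitivity = relevant_kept / relevant_total if relevant_total else 1.0
    excluded = len(kept) - sum(kept)                     # citations screened out automatically
    proportion_excluded = excluded / len(kept)
    hours_saved = excluded * minutes_per_citation / 60   # assumed manual screening rate
    return sensitivity, proportion_excluded, hours_saved

# Toy update search: 8 retrieved citations, 2 of which reviewers would include.
probs = [0.92, 0.40, 0.03, 0.01, 0.08, 0.02, 0.60, 0.04]
truth = [True, False, False, False, False, False, True, False]
print(screening_metrics(probs, truth))

With this toy data both relevant citations fall above the cutoff (sensitivity 1.0) and half of the retrieved citations are screened out automatically, mirroring the trade-off the report describes between preserving recall and reducing the manual screening burden.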

Conclusions: Text word-only searches to facilitate the conduct of living systematic reviews are associated with high sensitivity but reduced precision compared with standard searches using MeSH indexing terms. A machine learning classifier had high recall for identifying relevant studies retrieved by text word searches but low to moderate precision, resulting in small to moderate estimated time savings when applied to update searches.

Publication types

  • Review

Grants and funding

Prepared for: Agency for Healthcare Research and Quality, U.S. Department of Health and Human Services, 5600 Fishers Lane, Rockville, MD 20857; www.ahrq.gov. Contract Nos. 290-2015-00009-I and 290-2015-00010-I. Prepared by: Pacific Northwest Evidence-based Practice Center, Portland, OR; Southern California Evidence-based Practice Center–RAND Corporation, Santa Monica, CA.