Comparison of Machine Learning Methods With National Cardiovascular Data Registry Models for Prediction of Risk of Bleeding After Percutaneous Coronary Intervention

Bobak J Mortazavi; Emily M Bucholz; Nihar R Desai; Chenxi Huang; Jeptha P Curtis; Frederick A Masoudi; Richard E Shaw; Sahand N Negahban; Harlan M Krumholz

doi:10.1001/jamanetworkopen.2019.6835

JAMA Netw Open. 2019 Jul 3;2(7):e196835. doi: 10.1001/jamanetworkopen.2019.6835.

Authors

Bobak J Mortazavi¹ ² ³ ⁴, Emily M Bucholz⁵ ⁶, Nihar R Desai³ ⁴, Chenxi Huang³ ⁴, Jeptha P Curtis³ ⁴, Frederick A Masoudi⁷, Richard E Shaw⁸, Sahand N Negahban⁹, Harlan M Krumholz³ ⁴ ¹⁰

Affiliations

¹ Department of Computer Science and Engineering, Texas A&M University, College Station.
² Center for Remote Health Technologies and Systems, Texas A&M University, College Station.
³ Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut.
⁴ Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, Connecticut.
⁵ Division of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut.
⁶ Now with the Department of Pediatrics, Boston Children's Hospital, Boston, Massachusetts.
⁷ Division of Cardiology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora.
⁸ Division of Cardiology, Department of Medicine, California Pacific Medical Center, Sutter Health, San Francisco.
⁹ Department of Statistics, Yale University, New Haven, Connecticut.
¹⁰ Department of Health Policy and Management, Yale School of Public Health, New Haven, Connecticut.

Abstract

Importance: Better prediction of major bleeding after percutaneous coronary intervention (PCI) may improve clinical decisions aimed at reducing bleeding risk. Machine learning techniques, bolstered by better selection of variables, hold promise for enhancing prediction.

Objective: To determine whether machine learning techniques better predict post-PCI major bleeding compared with the existing National Cardiovascular Data Registry (NCDR) models.

Design, setting, and participants: This comparative effectiveness study used the NCDR CathPCI Registry data version 4.4 (July 1, 2009, to April 1, 2015). Machine learning techniques (logistic regression with lasso regularization and gradient descent boosting [XGBoost, version 0.71.2]) were applied, and their output was compared with that of the existing simplified risk score and full NCDR models. The existing models were recreated, and performance with additional techniques and variables was then evaluated in a 5-fold cross-validation, in analysis conducted from October 1, 2015, to October 27, 2017. The setting was retrospective modeling of a nationwide clinical registry of PCI. Participants were all patients undergoing PCI. Percutaneous coronary intervention procedures were excluded if they were not the index PCI of admission, if the hospital site had missing outcomes measures, or if the patient underwent subsequent coronary artery bypass grafting.

Exposures: Clinical variables available at admission and diagnostic coronary angiography data were used to determine the severity and complexity of presentation.

Main outcomes and measures: The main outcome was in-hospital major bleeding within 72 hours after PCI. Results were evaluated by comparing C statistics, calibration, and decision threshold-based metrics, including the F score (harmonic mean of positive predictive value and sensitivity) and the false discovery rate.
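The two threshold-based metrics named above can be written out from confusion-matrix counts. The following is a minimal sketch, not the study's code: the function names and the example counts are illustrative assumptions and are not taken from the registry data.

```python
def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of cases flagged as high risk that actually bled."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Share of cases that bled that were flagged as high risk."""
    return tp / (tp + fn)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of positive predictive value and sensitivity."""
    ppv = positive_predictive_value(tp, fp)
    sens = sensitivity(tp, fn)
    return 2 * ppv * sens / (ppv + sens)


def false_discovery_rate(tp: int, fp: int) -> float:
    """Share of cases flagged as high risk that did not bleed (1 - PPV)."""
    return fp / (tp + fp)


# Illustrative counts only (assumed, not from the study):
# 30 true positives, 70 false positives, 50 false negatives.
example_f = f_score(tp=30, fp=70, fn=50)
example_fdr = false_discovery_rate(tp=30, fp=70)
```

Because the false discovery rate is the complement of the positive predictive value, any change that raises the positive predictive value at a given threshold lowers the false discovery rate by the same amount.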

Results: The post-PCI major bleeding rate among 3 316 465 procedures (patients' median age, 65 years; interquartile range, 56-73 years; 68.1% male) was 4.5%. The existing full model achieved a mean C statistic of 0.78 (95% CI, 0.78-0.78). XGBoost with the full range of selected variables achieved a C statistic of 0.82 (95% CI, 0.82-0.82), with an F score of 0.31 (95% CI, 0.30-0.31). XGBoost correctly identified as high risk an additional 3.7% of cases who experienced a bleeding event and, overall, correctly identified as low risk an additional 1.0% of cases who did not experience a bleeding event. The data-driven decision threshold helped improve the false discovery rate of the existing techniques: for the existing simplified risk score model, the false discovery rate improved from more than 90% to 78.7%. Modifying the model and the data-driven decision threshold improved this rate further, from 78.7% to 73.4%.
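The abstract does not state how the data-driven decision threshold was chosen. One common approach, shown here purely as an assumption, is to sweep candidate thresholds over the predicted risks and keep the one that maximizes the F score. The function names and the toy risks and outcomes below are hypothetical and are not study data.

```python
def confusion_counts(risks, outcomes, threshold):
    """Count true positives, false positives, and false negatives
    when cases with risk >= threshold are flagged as high risk."""
    tp = fp = fn = 0
    for risk, bled in zip(risks, outcomes):
        flagged = risk >= threshold
        if flagged and bled:
            tp += 1
        elif flagged:
            fp += 1
        elif bled:
            fn += 1
    return tp, fp, fn


def best_threshold(risks, outcomes):
    """Return (threshold, F score) for the candidate threshold with the
    highest F score; candidates are the distinct predicted risks."""
    best_t, best_f = None, -1.0
    for t in sorted(set(risks)):
        tp, fp, fn = confusion_counts(risks, outcomes, t)
        denom = 2 * tp + fp + fn
        # F score written directly in counts: 2TP / (2TP + FP + FN).
        f = 2 * tp / denom if denom else 0.0
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy predicted risks and outcomes (1 = bleeding event), assumed for illustration.
toy_risks = [0.02, 0.05, 0.10, 0.20, 0.40, 0.60]
toy_outcomes = [0, 0, 0, 1, 0, 1]
threshold, score = best_threshold(toy_risks, toy_outcomes)
```

In practice a threshold chosen this way would be fitted on training folds and evaluated on the held-out fold, so that the reported false discovery rate is not optimistically biased.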

Conclusions and relevance: Machine learning techniques improved the prediction of major bleeding after PCI. These techniques may help to better identify patients who would benefit most from strategies to reduce bleeding risk.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Aged
Clinical Decision Rules
Comparative Effectiveness Research
Coronary Artery Disease / surgery
Female
Humans
Machine Learning*
Male
Models, Statistical
Percutaneous Coronary Intervention / adverse effects*
Percutaneous Coronary Intervention / methods
Postoperative Hemorrhage / diagnosis*
Registries / statistics & numerical data*
Risk Adjustment / methods
Risk Assessment / methods*
United States

Grants and funding

UL1 TR001863/TR/NCATS NIH HHS/United States