Automated extraction of BI-RADS final assessment categories from radiology reports with natural language processing

J Digit Imaging. 2013 Oct;26(5):989-94. doi: 10.1007/s10278-013-9616-5.

Abstract

The objective of this study is to evaluate a natural language processing (NLP) algorithm that determines American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) final assessment categories from radiology reports. This HIPAA-compliant study was granted institutional review board approval with waiver of informed consent. This cross-sectional study involved 1,165 breast imaging reports from 2009 in the electronic medical record (EMR) of a tertiary care academic breast imaging center. Reports included screening mammography, diagnostic mammography, breast ultrasound, combined diagnostic mammography and breast ultrasound, and breast magnetic resonance imaging studies. More than 220 reports of each study type were included. The recall (sensitivity) and precision (positive predictive value) of an NLP algorithm for collecting the BI-RADS final assessment categories stated in the final report text were evaluated against a reference standard of manual human review. For all breast imaging reports, the NLP algorithm demonstrated a recall of 100.0 % (95 % confidence interval [CI], 99.7–100.0 %) and a precision of 96.6 % (95 % CI, 95.4–97.5 %) for correct identification of BI-RADS final assessment categories. The NLP algorithm demonstrated high recall and precision for extraction of BI-RADS final assessment categories from the free text of breast imaging reports. NLP may provide an accurate, scalable mechanism for extracting data from reports within EMRs, enabling databases that track breast imaging performance measures and support optimal breast cancer population management strategies.
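The abstract does not describe how the algorithm locates the category in the report text, how true and false positives were counted, or which confidence-interval method was used. As a rough illustration only, the following Python sketch shows the general shape of such an evaluation under loudly stated assumptions: a single hypothetical regular expression (`BIRADS_PATTERN`), a "last mention wins" heuristic in the hypothetical `extract_final_birads`, hypothetical counting rules in `score`, and the Wilson score interval in `wilson_interval`, which is a swapped-in choice and not necessarily the authors' method. The reports and reference labels are invented toy data, not study data.

```python
import math
import re
from typing import Optional

# Hypothetical pattern (an assumption, not the published algorithm): matches
# phrasings such as "BI-RADS 4", "BIRADS: 2", "BI-RADS Category 4A" and
# "BI-RADS final assessment category 0". Category digit 0-6 plus an optional
# letter suffix (e.g. 4A).
BIRADS_PATTERN = re.compile(
    r"BI-?RADS\s*(?:final\s+assessment\s*)?(?:category\s*)?[:\-]?\s*([0-6])([ABC])?\b",
    re.IGNORECASE,
)


def extract_final_birads(report_text: str) -> Optional[str]:
    """Return the last BI-RADS category mentioned in a report, or None.

    Taking the last mention is a heuristic assumption: the final assessment
    usually closes the impression, while earlier mentions may cite priors.
    """
    matches = list(BIRADS_PATTERN.finditer(report_text))
    if not matches:
        return None
    last = matches[-1]
    return last.group(1) + (last.group(2) or "").upper()


def score(predicted: dict[str, Optional[str]],
          reference: dict[str, Optional[str]]) -> tuple[int, int, int]:
    """Count (TP, FP, FN) per report under assumed rules.

    TP: extractor agrees with the reviewer. FP: extractor returns a category
    that disagrees with the reviewer (or the reviewer recorded none).
    FN: reviewer recorded a category that the extractor did not return.
    """
    tp = fp = fn = 0
    for report_id, ref in reference.items():
        pred = predicted.get(report_id)
        if pred is not None and pred == ref:
            tp += 1
        elif pred is not None:
            fp += 1
        elif ref is not None:
            fn += 1
    return tp, fp, fn


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 -> ~95 %)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Toy reports and a toy manual reference standard, invented for illustration.
reports = {
    "r1": "IMPRESSION: Irregular mass. BI-RADS Category 4A: Low suspicion.",
    "r2": "No mammographic evidence of malignancy. BI-RADS 1: Negative.",
    "r3": "Prior exam was BI-RADS 3. Stable today. BIRADS: 2",
    "r4": "Probably benign finding; short-interval follow-up advised.",
}
reference = {"r1": "4A", "r2": "1", "r3": "2", "r4": "3"}

predicted = {rid: extract_final_birads(text) for rid, text in reports.items()}
tp, fp, fn = score(predicted, reference)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
r_lo, r_hi = wilson_interval(tp, tp + fn)
p_lo, p_hi = wilson_interval(tp, tp + fp)
print(f"recall={recall:.2f} (95% CI {r_lo:.2f} to {r_hi:.2f})")
print(f"precision={precision:.2f} (95% CI {p_lo:.2f} to {p_hi:.2f})")
```

In this toy run the fourth report states no explicit category, so it counts as a miss and lowers recall while leaving precision untouched. A keyword pattern like this is brittle (negations, prior-exam citations, addenda), which is why any real extractor would need validation against manual review, as the study did.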

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Breast Neoplasms / diagnosis*
  • Cross-Sectional Studies
  • Databases, Factual / statistics & numerical data
  • Female
  • Humans
  • Magnetic Resonance Imaging / statistics & numerical data
  • Mammography / statistics & numerical data*
  • Natural Language Processing*
  • Radiology Information Systems / statistics & numerical data*
  • Sensitivity and Specificity
  • Ultrasonography, Mammary / statistics & numerical data*