Ontology-guided feature engineering for clinical text classification

Vijay N Garla; Cynthia Brandt

doi:10.1016/j.jbi.2012.04.010

J Biomed Inform. 2012 Oct;45(5):992-8. doi: 10.1016/j.jbi.2012.04.010. Epub 2012 May 9.

Authors

Vijay N Garla¹, Cynthia Brandt

Affiliation

¹ Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, 300 George Street, Suite 501, New Haven, CT 06520-8009, USA. vijay.garla@yale.edu

Abstract

In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning-based clinical text classification. Critical steps in clinical text classification include identifying the features and passages relevant to the classification task, and representing clinical text in a way that enables discrimination between documents of different classes. We developed novel information-theoretic techniques that use the taxonomical structure of the UMLS to improve feature ranking, and we developed a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Informatics for Integrating Biology and the Bedside (i2b2) obesity challenge. The methods we developed improve upon the results of this challenge's top machine-learning-based system, and may improve the performance of other machine-learning-based clinical text classification systems. We have released all tools developed as part of this study as open source, available at http://code.google.com/p/ytex.
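As a rough illustration of how a taxonomy can inform information-theoretic feature ranking, the sketch below scores concepts by information gain after each document's concepts have been expanded with their ancestors. This is not the authors' method, which the abstract does not detail. The toy taxonomy, the concept names, and the helper functions (`with_ancestors`, `information_gain`, `rank_concepts`) are all hypothetical and are not taken from UMLS or from the released tools.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); these are NOT real UMLS concepts.
PARENT = {
    "atorvastatin": "statin",
    "simvastatin": "statin",
    "statin": "lipid_lowering_drug",
}


def with_ancestors(concepts):
    """Return the concepts plus every ancestor reachable through PARENT."""
    expanded = set(concepts)
    for concept in concepts:
        while concept in PARENT:
            concept = PARENT[concept]
            expanded.add(concept)
    return expanded


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(docs, labels, concept):
    """Reduction in label entropy from knowing whether `concept` is present."""
    present = [y for d, y in zip(docs, labels) if concept in d]
    absent = [y for d, y in zip(docs, labels) if concept not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def rank_concepts(docs, labels):
    """Rank concepts by information gain, after taxonomy expansion."""
    expanded = [with_ancestors(d) for d in docs]
    vocabulary = set().union(*expanded)
    scores = {c: information_gain(expanded, labels, c) for c in vocabulary}
    # Highest gain first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Four toy documents, each a set of extracted concepts, with binary labels.
docs = [{"atorvastatin"}, {"simvastatin"}, {"aspirin"}, {"aspirin"}]
labels = [1, 1, 0, 0]
ranked = rank_concepts(docs, labels)
scores = dict(ranked)
```

In this toy data each specific drug appears in only one document, so on its own it separates the classes poorly, while the shared ancestor `statin` separates them perfectly. Expanding documents with ancestors lets the ranking surface that more general concept.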

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Cardiovascular Diseases
Data Mining
Databases as Topic / classification
Humans
Medical Informatics Applications
Models, Theoretical
Natural Language Processing*
Obesity
Semantics
Unified Medical Language System
