Biomedical informatics techniques for processing and analyzing web blogs of military service members

J Med Internet Res. 2010 Oct 5;12(4):e45. doi: 10.2196/jmir.1538.

Abstract

Introduction: Web logs ("blogs") have become a popular mechanism for people to express their daily thoughts, feelings, and emotions. Many of these expressions contain health care-related themes, both physical and mental, similar to information discussed during a clinical interview or medical consultation. Thus, some of the information contained in blogs might be important for health care research, especially in mental health, where stress-related conditions may be difficult and expensive to diagnose and where early recognition is often key to successful treatment. In the field of biomedical informatics, techniques such as information retrieval (IR) and natural language processing (NLP) are often used to unlock information contained in free-text notes. These methods might help the clinical research community better understand feelings and emotions after deployment and the burden of stress symptoms among US military service members.

Methods: In total, 90 military blog posts describing deployment situations and 60 control posts of Operation Enduring Freedom/Operation Iraqi Freedom (OEF/OIF) were collected. After "stop" word exclusion and stemming, a "bag-of-words" representation and term weighting were performed, and the most relevant words were manually selected from the high-weight words. A pilot ontology was created using Collaborative Protégé, a knowledge management application. The word lists and the ontology were then used within General Architecture for Text Engineering (GATE), an NLP framework, to create an automated pipeline for recognition and analysis of blogs related to combat exposure. An independent expert opinion was used to create a reference standard and evaluate the results of the GATE pipeline.
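The preprocessing steps named in the Methods (stop-word exclusion, stemming, bag-of-words representation, and term weighting) can be illustrated with a minimal Python sketch. Several parts are assumptions rather than details from the study: the abstract does not name the stop-word list, the stemmer, or the weighting scheme, so the sketch uses a tiny hypothetical stop-word list, a crude suffix-stripping stand-in for a real stemmer, and TF-IDF as one common weighting choice. The example sentences are invented for illustration and are not study data.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not the study's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "by",
              "we", "was", "were", "is", "it", "at", "our"}


def crude_stem(word):
    """Strip a few common English suffixes.

    This is only a stand-in for a real stemmer; the abstract does not say
    which stemming algorithm was used.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Lowercase, tokenize, drop stop words, stem, and count the stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def tf_idf(docs):
    """Return one {term: weight} dict per document using TF-IDF (assumed scheme)."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # count each term once per document
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


def top_terms(term_weights, k=5):
    """Return the k highest-weight terms, as candidates for manual review."""
    return sorted(term_weights, key=term_weights.get, reverse=True)[:k]


if __name__ == "__main__":
    # Invented toy posts, for illustration only.
    posts = [
        "The convoy was hit by an explosion and we were scared.",
        "We watched a movie and cooked dinner at home.",
        "Mortars exploded near the base during the patrol.",
    ]
    for post_weights in tf_idf(posts):
        print(top_terms(post_weights, k=3))
```

In this sketch the ranked terms are only candidates: as in the study, a person would still review the high-weight words and select the relevant ones by hand.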

Results: Two dimensions of combat exposure descriptors were identified: words dealing with physical exposure and words describing the soldiers' emotional reactions to it. The GATE pipeline retrieved blog texts describing combat exposure with a precision of 0.9, a recall of 0.75, and an F-score of 0.82.
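The three reported figures are related: an F-score combines precision and recall into a single number. The short Python sketch below assumes the balanced F-score (the harmonic mean of precision and recall, often called F1); the abstract does not state which variant was used, but the reported values are consistent with that assumption. The function name `f_score` and the `beta` parameter are illustrative choices.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta=1.0 gives the balanced F-score (F1), which is assumed here.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing relevant is retrieved
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Values reported in the Results: precision 0.9, recall 0.75.
    print(round(f_score(0.9, 0.75), 2))  # prints 0.82
```

With precision 0.9 and recall 0.75, the calculation is 2 × 0.9 × 0.75 / (0.9 + 0.75) = 1.35 / 1.65 ≈ 0.82, matching the reported F-score.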

Discussion: Natural language processing and automated information retrieval might provide valuable tools for retrieving and analyzing military blog posts and for uncovering military service members' emotions and experiences of combat exposure.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Adult
  • Artificial Intelligence
  • Benchmarking / methods*
  • Benchmarking / statistics & numerical data
  • Blogging / statistics & numerical data*
  • Emotions
  • Female
  • Humans
  • Information Storage and Retrieval / methods*
  • Information Storage and Retrieval / statistics & numerical data
  • Interpersonal Relations
  • Iraq War, 2003-2011
  • Male
  • Military Medicine
  • Military Personnel / statistics & numerical data*
  • Natural Language Processing*
  • Self Concept
  • Terminology as Topic*
  • United States
  • Young Adult