Using Visual Analytic Methods to Identify Patient Groups

Suresh K. Bhavnani; Yong-Fang Kuo

doi:10.25302/06.2021.ME.151133194


Washington (DC): Patient-Centered Outcomes Research Institute (PCORI); 2021 Jun.

Structured Abstract

Background:

A primary goal of precision medicine is to identify patient subgroups and infer their underlying disease processes to design interventions that are targeted to those processes. However, few approaches to identify patient subgroups are designed to reveal relationships within and across patient subgroups, which constrains the ability of clinician and patient stakeholders to fully comprehend the processes underlying heterogeneity in a patient population.

Objectives:

Our study had the following 3 aims: Develop a visual analytical method to (1) quantitatively identify the number, size, statistical significance, and replicability of biclusters consisting of patient subgroups and their most frequently co-occurring characteristics; (2) visualize that information through a bipartite network of readmitted patients and their comorbidities to enable stakeholders to comprehend the relationships within and across patient subgroups; and (3) test whether these patient subgroups could improve the prediction of hospital readmission through the use of stratified regression models.

Methods

Data:

We extracted Medicare data from 2013 and 2014 related to hospital readmission for 3 conditions: chronic obstructive pulmonary disease (COPD), congestive heart failure (CHF), and total hip/knee arthroplasty (THA/TKA). For each condition, we extracted cases (patients readmitted within 30 days of hospital discharge) and controls (patients not readmitted within 90 days of discharge, matched by age, sex, and race). The data were randomly divided into a training data set and a replication data set consisting of case-control matched pairs. Variables included 42 unique comorbidities highly prevalent in older adult patients and were extracted from the Elixhauser, Charlson, and Centers for Medicare & Medicaid Services condition category indices, in addition to condition-specific variables identified by the stakeholders. Comorbidities within each condition that were significant in both the training and replication data sets were selected for further analysis.

Analysis:

We analyzed the data using 3 modeling methods:

1. Visual analytical modeling was conducted by (a) representing the patients (only cases) and their comorbidities as a bipartite network; (b) identifying patient subgroups and their most frequently co-occurring comorbidities through the use of bicluster modularity maximization, and testing the statistical significance of its degree of clusteredness through comparisons to 1000 random permutations of the data; and (c) comparing the comorbidity co-occurrence between the training and replication data sets through the use of the Rand Index and testing its significance. The stakeholders examined the bipartite networks of readmitted patients and their comorbidities to assess their clinical significance and to design potential interventions for reducing the risk of readmission.
2. Classification modeling was conducted by (a) building and validating a multinomial logistic regression model to predict the probability of a patient (cases and controls) belonging to each of the subgroups identified by the visual analytics based on their profile of the comorbidities; and (b) classifying each patient (cases and controls) to the subgroup to which the patient had the highest probability of belonging, which was used in the subsequent predictive modeling.
3. Predictive modeling was conducted by (a) building and validating a logistic regression model using data from all the patients lumped together to predict the probability of a patient being readmitted to the hospital, based on the patient's profile of comorbidities, age, sex, and race; and (b) using these variables and standard predictive modeling discrimination measures (concordance statistic [C statistic], calibration plots, net reclassification improvement [NRI], and integrated discrimination improvement [IDI]) to compare this lumped model to the corresponding hierarchical logistic regression models containing information about the patient subgroups (identified from the visual analytics).
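
The replication comparison in step (c) of the visual analytical modeling relies on the Rand Index. As a minimal sketch, and not the authors' R implementation, the Python function below computes the basic (unadjusted) Rand Index for two assignments of the same items to clusters. The function name and the toy labels are illustrative assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Unadjusted Rand Index: share of item pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    # A pair "agrees" when both clusterings group it together, or both split it.
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical bicluster labels for the same 6 comorbidities in two data sets.
training = [0, 0, 0, 1, 1, 1]
replication = [0, 0, 1, 1, 1, 1]
print(rand_index(training, replication))
```

A value of 1 means the two data sets group every pair of comorbidities identically; values closer to 0 indicate disagreement. Testing the significance of the observed value would require an additional step, such as comparison against permuted labels.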

Results:

In each of the 3 conditions, the visual analytical model (No. 1 above) automatically identified statistically significant and replicated biclusters, whose comorbidity co-occurrence and risk for readmission were determined to be clinically meaningful by the stakeholders. Similarly, in each of the 3 conditions, the multinomial logistic regression classifier (No. 2 above) had high accuracy in correctly classifying patients into the patient subgroups identified by the visual analytical model. Finally, the hierarchical model had a small but statistically significant improvement in discriminating between the readmitted and not readmitted patients as measured by NRI, but not as measured by the C statistic or IDI, in 2 (COPD, THA/TKA) of the 3 conditions.

Conclusions:

The visual analytical modeling successfully identified statistically and clinically significant patient subgroups. Similarly, the classification modeling accurately predicted which patient subgroup a patient should belong to, based on their comorbidity profile. However, the hierarchical predictive models showed only a small, though statistically significant, improvement in discrimination between readmitted and not readmitted patients as measured by NRI, and no significant improvement as measured by the C statistic or IDI. These predictive modeling results suggest that improvements in predictive accuracy occurred within readmission risk categories rather than across all the patients and that comorbidities on their own were not strong predictors for hospital readmission.

Limitations

Data limitations:

Because the baseline models were built using Medicare administrative data, we used the same data for our models to enable a head-to-head comparison. However, although such administrative data have strengths, including being representative of the population aged >65 years in the United States and having multiple years enabling replication analysis, they have known limitations, such as the lack of comorbidity severity and test results, which could strongly impact the accuracy of predictive models. Research using electronic medical records containing a wider range and more granularity of variables, such as laboratory results and medications, could help overcome such limitations. Furthermore, comorbidities on their own do not have a strong enough signal for readmission, and future approaches should explore other variables, including medications, to predict readmission.

Method limitations:

Our development and use of visual analytics for building stratified regression models revealed 2 limitations related to analyzing large data sets:

Although our overall method scaled well to analyze data sets ranging up to 25 000 patients, the measurement of significance for biclustering and replication took up to a week of computation despite using a dedicated server with multiple cores. This high computational cost arose because the significance of clustering is currently measured by comparing the clustering results to 1000 random permutations of the data. Future research should explore analytical (formula-based) approaches to measuring the significance of modularity rather than permutation-based approaches.
Although bicluster modularity was successful in finding significant and meaningful biclusters, the visualizations were extremely dense and therefore concealed patterns within and between biclusters. Future research should explore more advanced methods to enable interpretability of results within and between large and dense biclusters. Finally, following recommendations from the stakeholders, we pooled 2 years of the data and randomly split them to form the training and internal validation data sets. Future research should therefore externally validate the results using a different year from the Medicare database.
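
The permutation-based significance test named in the first limitation above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the report does not say how the data were permuted, so shuffling each comorbidity column independently (which preserves each comorbidity's prevalence) is an assumption, as are all function names and the toy statistic.

```python
import random


def empirical_p_value(observed, null_values):
    """One-sided p-value with a +1 correction so that it is never exactly 0."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def shuffle_columns(matrix, rng):
    """Permute each column independently (assumed permutation scheme)."""
    columns = [list(column) for column in zip(*matrix)]
    for column in columns:
        rng.shuffle(column)
    return [list(row) for row in zip(*columns)]


def permutation_test(matrix, statistic, n_permutations=1000, seed=0):
    """Compare the observed statistic with its values on permuted data."""
    rng = random.Random(seed)
    observed = statistic(matrix)
    null_values = [
        statistic(shuffle_columns(matrix, rng)) for _ in range(n_permutations)
    ]
    return observed, empirical_p_value(observed, null_values)


def co_occurrence(matrix):
    """Toy statistic: number of patients with both of the first 2 comorbidities."""
    return sum(1 for row in matrix if row[0] == 1 and row[1] == 1)


# Rows are hypothetical patients, columns are hypothetical comorbidities.
data = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 1], [0, 0, 0]]
observed, p_value = permutation_test(data, co_occurrence)
print(observed, p_value)
```

In the study the statistic would be the modularity of the best biclustering, so each of the 1000 permutations requires a full clustering run. That repeated clustering is what makes the test expensive on large data sets.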

Background

Need for Automatic Approaches to Identify Patient Subgroups

Heterogeneity in Humans

A wide range of studies^1-9 on topics ranging from molecular to environmental determinants of health have shown that most humans tend to share key characteristics (eg, comorbidities or genes) that form distinct patient subgroups. A primary goal of precision medicine is to identify such patient subgroups and infer their underlying disease processes to design interventions that are targeted to those processes.^2,10 For example, recent studies in complex diseases such as breast cancer^3,4 and asthma^5-7 have revealed molecular-based subphenotypes, each with different underlying biological mechanisms precipitating the disease, and therefore each requiring different treatments. Because such targeted treatments can have a profound impact on patient outcomes, the identification and comprehension of patient subgroups is critical to patient-centered outcomes research (PCOR).

Current Approaches for Identifying Patient Subgroups

A patient subgroup is defined as a subset of patients drawn from a population (eg, older adult patients) that share ≥1 characteristics (eg, renal failure and diabetes). Patients have been divided into subgroups by using (1) investigator-selected variables, such as race, for developing stratified regression models¹¹ or assigning patients to different arms of a clinical trial; (2) existing classification systems, such as the Medicare Severity Diagnosis Related Group,¹² to assign patients to a disease category for billing purposes; or (3) computational methods, such as classification^13-15 and clustering,^5,16 to discover patient subgroups from data.

One of the simplest unsupervised computational methods for identifying patient subgroups is to enumerate conjunctions of variables, for example by analyzing all dyads and triads of co-occurring comorbidities in the Medicare database,¹⁷ and then to examine the most prevalent subgroups. Other methods attempt to partition a data set of patients and characteristics into sets that are relatively homogeneous. These sets can either be 1-sided clusters (clusters of patients, or clusters of characteristics) or biclusters^16,18,19 (clusters of patients and characteristics). K-means and hierarchical clustering^15,16 are among the most commonly used 1-sided clustering methods and require inputs such as a similarity measure (eg, Jaccard similarity) and the expected number of subgroups, but there are no agreed-upon approaches for determining these inputs automatically. More recently, biclustering^16,18,19 methods (also called co-clustering methods) have been developed to automatically identify nonoverlapping or overlapping submatrices consisting of both patients and characteristics.

Unlike partitioning methods that use similarity measures to identify clusters, dimensionality reduction methods attempt to find a reduced dimensional space where differences among patients are maximized. For example, principal component analysis¹⁵ (PCA) is used to identify principal components, which are weighted combinations of characteristics along which patients have the maximum variance. The patients' scores are projected onto a plane typically defined by the 2 most important principal components. Methods such as k-means are then used to identify clusters of patients in this reduced dimensional space.
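
Two of the building blocks mentioned above, enumerating dyads and triads of co-occurring comorbidities and computing the Jaccard similarity between patients, are simple enough to sketch directly. The Python below is an illustration only; the function names and the toy patient records are assumptions and are not drawn from the Medicare data.

```python
from collections import Counter
from itertools import combinations


def count_conjunctions(patients, size):
    """Count how many patients have each combination of `size` comorbidities."""
    counts = Counter()
    for comorbidities in patients:
        for combo in combinations(sorted(comorbidities), size):
            counts[combo] += 1
    return counts


def jaccard(a, b):
    """Jaccard similarity: shared comorbidities divided by all comorbidities."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical comorbidity profiles for 3 patients.
patients = [
    {"CHF", "diabetes", "renal failure"},
    {"CHF", "diabetes"},
    {"COPD", "diabetes"},
]
dyads = count_conjunctions(patients, 2)
triads = count_conjunctions(patients, 3)
print(dyads.most_common(1))
print(jaccard(patients[0], patients[1]))
```

The number of possible conjunctions grows rapidly with the number of comorbidities, which is one reason this approach considers only small combinations.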

In contrast to the above-described unsupervised methods, supervised methods focus on identifying patient subgroups by taking into consideration outcome variables (eg, responders and nonresponders in a treatment arm). For example, classification and regression trees (CARTs),¹⁵ as well as enhancements such as random forests²⁰ and bump hunting,¹⁴ progressively divide patients into subgroups based on the outcome variable by using conjunctions of patient characteristics at each step. The method outputs a tree, and each path from the root node to a leaf node defines a patient subgroup.

Strengths and Limitations of Existing Methods

Although the aforementioned methods have improved our understanding of heterogeneity in different populations, they have important limitations with respect to enabling the identification and comprehension of patient subgroups. Although all share the goal of identifying patient subgroups based on characteristics, they either (1) consider only some characteristics when defining subgroups (eg, methods using variable conjunctions), (2) output 1-sided clusters such as patient subgroups without their characteristics (eg, k-means, hierarchical clustering, PCA), or (3) cannot reveal the relationship among patient subgroups (eg, biclustering, CART).

Visual Analytical Approach for the Identification and Comprehension of Patient Subgroups

Bipartite Network Analysis

This project focuses on using a powerful unsupervised visual analytical method called bipartite network analysis²¹ to automatically identify and visualize patient subgroups based on multiple characteristics. Visual analytics is defined as the science of analytical reasoning facilitated by interactive visual interfaces.²² Bipartite network analysis is an unsupervised visual analytics²² method used to help discover complex patterns in data. Bipartite networks have been used extensively to analyze a wide range of associations, including social networks,²³ disease classification,²⁴ and drug discovery,²⁵ and to identify proteomic heterogeneity.²⁶ The principal investigator (PI) has developed a general methodology to use bipartite networks to identify patient subgroups based on their molecular profile²⁶ and applied it in a number of diseases, including asthma,²⁷ rickettsial diseases,²⁸ and influenza.²⁹ This methodology enabled the identification of a new classification of asthma based on complex inter-subgroup associations and their underlying mechanisms.²⁷

A network consists of a set of nodes (represented as shapes such as circles) connected in pairs by edges (represented as lines).²¹ Unipartite networks (Figure 1A), commonly used to model social networks, have nodes that represent 1 type of entity (eg, patients), while the edges represent associations among them (eg, friendship links). In contrast, bipartite networks (Figure 1B) represent 2 types of entities (eg, patients and characteristics), and edges exist only between different types of entities representing a specific relationship (eg, a patient has diabetes).²¹
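
As a minimal sketch of the bipartite representation described above, the Python below turns patient records into an edge list and a patient-by-characteristic (biadjacency) matrix. The function name and the example records are hypothetical.

```python
def build_bipartite(patient_comorbidities):
    """Return patient nodes, comorbidity nodes, edges, and a 0/1 biadjacency matrix."""
    patients = sorted(patient_comorbidities)
    comorbidities = sorted(
        {c for items in patient_comorbidities.values() for c in items}
    )
    # Edges connect a patient to a comorbidity, never two nodes of the same type.
    edges = [
        (p, c) for p in patients for c in sorted(patient_comorbidities[p])
    ]
    matrix = [
        [1 if c in patient_comorbidities[p] else 0 for c in comorbidities]
        for p in patients
    ]
    return patients, comorbidities, edges, matrix


# Hypothetical records: each patient maps to the set of comorbidities they have.
records = {
    "P1": {"CHF", "diabetes"},
    "P2": {"diabetes"},
    "P3": {"renal failure"},
}
patients, comorbidities, edges, matrix = build_bipartite(records)
print(comorbidities)
print(matrix)
```

Each row of the matrix is one patient's comorbidity profile, and each 1 corresponds to one edge in the network.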

Figure 1

Comparison Between a Unipartite (A) and a Bipartite Network Representation (B), and How a Bipartite Network Can Reveal Biclusters Containing Patient Subgroups and Their Most Frequently Co-occurring Comorbidities (C).

Based on a wide range of projects that the PI has conducted to identify patient subgroups in different diseases,^26,27,30-35 in addition to a preliminary analysis of readmission data discussed later in this report, bipartite networks appear to be powerful for identifying patient subgroups and their underlying mechanisms based on 2 fundamental properties:

Automatic identification of the number, size, and significance of patient subgroups. A network is a graph with mathematical properties,²¹ and the field of network analysis has developed a large collection of quantitative measures to analyze complex associations in data. For example, modularity algorithms can automatically (ie, requiring no user input) identify in large networks the number of clusters, their members, the degree of clustering with respect to random networks of the same size,^21,36-38 and the significance of the clustering.^16,39 Modularity is defined as the fraction of edges falling within a cluster minus the expected fraction of such edges in a network of the same size with randomly assigned edges.²¹ Modularity ranges from −0.5 to +1. Modularity algorithms search for a partition of nodes that optimizes the modularity quantity. This approach has been developed for identifying uniclusters in unipartite networks and biclusters in bipartite networks.⁴⁰ For example, we have used modularity to identify biclusters of patients and their characteristics in readmission data.^41-43 Although there is active research in developing extensions to the modularity algorithm, its basic approach is a well-accepted and automatic method for finding clusters in complex networks.²¹
Visual representation enabling comprehension of quantitative results. Although modularity algorithms identify the number and membership of biclusters, they cannot reveal how individual nodes or subgroups are related to each other, which has considerable explanatory power to aid comprehension of patient subgroups. To enable comprehension of these relationships, networks are typically laid out using force-directed algorithms that pull together nodes that are strongly connected and push apart nodes that are not. The result is that nodes with a similar pattern of connections are placed close to each other, and those that are dissimilar are pushed apart. As shown in Figure 1C, the application of the Kamada–Kawai (KK) force-directed algorithm⁴⁴ to a bipartite network has revealed 2 biclusters, each consisting of patients and comorbidities, 1 on the left and 1 on the right. The patients in each bicluster are defined as a patient subgroup, and the comorbidities in each bicluster are the most frequently co-occurring comorbidities for that patient subgroup. Because such layouts are approximate, the patient subgroups identified from a modularity algorithm are superimposed on the layout (as shown by the ovals in Figure 1C).
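
To make the modularity quantity in the first property concrete, the Python sketch below scores a given assignment of patients and comorbidities to biclusters. It uses one common bipartite formulation (often attributed to Barber), in which the expected number of edges between a patient and a comorbidity is the product of their degrees divided by the total number of edges. Whether this is the exact variant used in the study is an assumption, and the function only evaluates a partition; it does not search for the best one.

```python
def bipartite_modularity(biadjacency, patient_labels, comorbidity_labels):
    """Observed minus expected within-bicluster edges, as a fraction of all edges."""
    total_edges = sum(sum(row) for row in biadjacency)
    if total_edges == 0:
        raise ValueError("the network has no edges")
    patient_degree = [sum(row) for row in biadjacency]
    comorbidity_degree = [sum(column) for column in zip(*biadjacency)]
    score = 0.0
    for i, row in enumerate(biadjacency):
        for j, edge in enumerate(row):
            if patient_labels[i] == comorbidity_labels[j]:
                expected = patient_degree[i] * comorbidity_degree[j] / total_edges
                score += edge - expected
    return score / total_edges


# Hypothetical network: 4 patients (rows) by 4 comorbidities (columns),
# forming 2 perfectly separated biclusters.
network = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
two_biclusters = bipartite_modularity(network, [0, 0, 1, 1], [0, 0, 1, 1])
one_bicluster = bipartite_modularity(network, [0, 0, 0, 0], [0, 0, 0, 0])
print(two_biclusters, one_bicluster)
```

Placing every node in a single bicluster yields a score of 0 because observed and expected edges cancel, whereas the partition that matches the 2 blocks scores higher. A modularity algorithm searches over partitions for the highest such score.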

The resulting bipartite visual representation reveals 2 important relationships enabling comprehension of subgroups:

Relationships within patient subgroups include the number and proportion of patients and characteristics in each cluster (eg, a bicluster containing few patients and many variables and another containing many patients with few variables) or patient subgroups that have high or low uniformity (eg, the left- and right-side patient subgroups in Figure 1C).
Relationships across patient subgroups include visualizing the proportion of additional characteristics across biclusters. For example, by coloring nodes in a network by race or sex, the visualization could reveal differences according to these characteristics across the subgroups (eg, an Asian-dominated cluster that has chronic obstructive pulmonary disease [COPD] and congestive heart failure [CHF] vs a Hispanic-dominated cluster that has osteoporosis and renal failure). Finally, the visualization also can reveal intercluster associations (eg, the intersubgroup associations shown in Figure 1C).

Although each of the above-described relationships can be calculated without the visualization, the simultaneous presentation of them in the bipartite network visualization leverages the parallel processing power of the visual cortex of the human brain,⁴⁵ enabling the discovery and comprehension of intra- and intercluster associations. We have demonstrated^26,27,30-35 that this integration of associations, once tested for significance, can enable stakeholders to infer the underlying processes in each patient subgroup and thus design targeted interventions. Furthermore, as shown by the edges that cross the biclusters, patients in 1 bicluster can have characteristics present in another bicluster, albeit far less frequently compared with the comorbidities in their own biclusters. Therefore, based on the 2 biclusters in Figure 1C, a patient with C1 is also highly likely to have C3 and C5, but less likely to have C2 or C4, and vice versa. Consequently, patient subgroups tend to have a similar but not necessarily identical profile of comorbidities. Membership of a new patient in a bicluster is therefore probabilistic, based on the entire comorbidity profile, and therefore reflects the true variation of comorbidities in humans. This probabilistic approach is different from the approach of defining patient subgroups based on a few inclusion or exclusion criteria (eg, CHF and diabetes) while ignoring the rest of the comorbidity profile.
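
The probabilistic view of subgroup membership can be illustrated with a multinomial logistic (softmax) calculation. In the Python sketch below, the weights are hypothetical values chosen by hand to mimic 2 biclusters over comorbidities C1 to C5; they are not fitted coefficients from this study.

```python
import math


def subgroup_probabilities(profile, weights, intercepts):
    """Softmax over one linear score per subgroup, given a 0/1 comorbidity profile."""
    scores = [
        intercept + sum(w * x for w, x in zip(subgroup_weights, profile))
        for subgroup_weights, intercept in zip(weights, intercepts)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exponentials = [math.exp(score - top) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def assign_subgroup(profile, weights, intercepts):
    """Assign the patient to the subgroup with the highest probability."""
    probabilities = subgroup_probabilities(profile, weights, intercepts)
    return probabilities.index(max(probabilities))


# Hypothetical weights: subgroup 0 favors C1, C3, C5; subgroup 1 favors C2, C4.
weights = [
    [1.5, -1.0, 1.5, -1.0, 1.5],
    [-1.0, 1.5, -1.0, 1.5, -1.0],
]
intercepts = [0.0, 0.0]

# A patient with C1, C2, C3, and C5 still falls mostly in subgroup 0.
mixed_profile = [1, 1, 1, 0, 1]
print(subgroup_probabilities(mixed_profile, weights, intercepts))
print(assign_subgroup(mixed_profile, weights, intercepts))
```

Because every comorbidity contributes to the score, a patient whose profile mostly matches one bicluster is assigned to it even when a characteristic from another bicluster is present.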

Role of Patient Subgroups in Developing More Accurate Predictive Models

Stratified Regression Modeling

Although numerous studies have revealed considerable heterogeneity across patients, automatically identified subgroups have not been systematically used to develop predictive models. Consider the following logistic regression model with the 3 predictors age, sex, and body mass index (BMI) used to predict the risk of type 2 diabetes, simplified for illustration:

$$\text{Risk of type 2 diabetes} = \frac{\exp(\beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Sex} + \beta_3 \cdot \text{BMI})}{1 + \exp(\beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Sex} + \beta_3 \cdot \text{BMI})}$$

This model can predict, for example, that patients John, Mary, and Tom, each with different profiles of age, sex, and BMI, have different risks for developing type 2 diabetes of 0.9, 0.6, and 0.3, respectively. Although the model shown above uses the characteristics of each patient to estimate their individual risks, each estimate is derived using the same coefficients (β₁, β₂, and β₃) for the same set of variables (age, sex, BMI). Therefore, the underlying assumption is that a single model can accurately estimate the risk for all patients in a cohort.
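
The single-model assumption can be seen directly in code. The Python sketch below applies one set of coefficients to every patient; the coefficient values and patient profiles are made up for illustration and do not reproduce the risks quoted above.

```python
import math


def logistic(z):
    """Numerically stable form of exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predicted_risk(age, sex, bmi, beta):
    """Risk from the logistic model with coefficients beta = (b0, b1, b2, b3)."""
    b0, b1, b2, b3 = beta
    return logistic(b0 + b1 * age + b2 * sex + b3 * bmi)


# Hypothetical coefficients, shared by every patient in the cohort.
beta = (-8.0, 0.05, 0.3, 0.12)

# Hypothetical profiles: (age in years, sex coded 0/1, BMI).
cohort = {"John": (70, 1, 35.0), "Mary": (62, 0, 31.0), "Tom": (55, 1, 24.0)}
for name, (age, sex, bmi) in cohort.items():
    print(name, round(predicted_risk(age, sex, bmi, beta), 2))
```

Only the inputs differ between patients; the same `beta` is used for all of them.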

However, as stated earlier, a “one size fits all” assumption is not well aligned with the notion of heterogeneity in patient populations that is observed in numerous diseases and conditions.^1-9 For example, the sample regression model discussed above ignores the possibility that John and Mary might be similar because they share the same processes resulting in diabetes, whereas Tom might have an entirely different process leading to the same disease. In such situations, a single model might not be as accurate as having 2 models, each targeting different patient subgroups formed by similar profiles—an approach referred to as stratified regression modeling.

The mathematical intuition for using stratified regression models is that they can achieve better fit to subsets of the data that are homogeneous, compared with a single regression model that is fitted to all the data,¹¹ an approach well established in biostatistics. However, as stated before, stratification is typically done using the conjunction of a few investigator-selected variables. Although this approach can be successful sometimes, it does not consider multiple characteristics in a data set. In contrast, bipartite network analysis provides the opportunity to automatically identify patient subgroups based on many characteristics, refine them through stakeholder engagement, and then incorporate them in the design of stratified regression models. Therefore, if the subgroups identified are indeed strongly and significantly clustered, they should improve the accuracy of predictive models that do not exploit such patient subgroups.
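
The intuition that stratified models can fit homogeneous subsets better than a single pooled model can be shown with a deliberately extreme toy example. In the Python sketch below, the data, the single predictor, and the simple gradient-ascent fitting routine are all assumptions made for illustration; the study itself used hierarchical logistic regression on real comorbidity data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=2000):
    """Fit an intercept and slope by plain gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iterations):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1


def log_loss(xs, ys, b0, b1):
    """Mean negative log-likelihood; lower values mean a better fit."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(xs)


# Hypothetical subgroups in which the same predictor acts in opposite directions.
xs_a, ys_a = [0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1]
xs_b, ys_b = [0, 1, 2, 3, 4, 5], [1, 1, 0, 1, 0, 0]

pooled = fit_logistic(xs_a + xs_b, ys_a + ys_b)
model_a = fit_logistic(xs_a, ys_a)
model_b = fit_logistic(xs_b, ys_b)

pooled_loss = log_loss(xs_a + xs_b, ys_a + ys_b, *pooled)
stratified_loss = (log_loss(xs_a, ys_a, *model_a) + log_loss(xs_b, ys_b, *model_b)) / 2
print(pooled_loss, stratified_loss)
```

In this constructed case the pooled model cannot do better than chance because the two subgroups cancel each other out, while each subgroup-specific model recovers its own trend. Real subgroups are rarely this cleanly opposed, so the gain from stratification is usually much smaller.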

Need for Automatic Patient Stratification to Improve the Prediction of Hospital Readmission

The High Cost of Hospital Readmission

Although our approach to the automatic identification of patient subgroups and its use in stratified regression modeling is generally applicable to analyzing a wide range of patient-characteristic data sets, there is an urgent need⁴⁶ to apply it to address the hospital readmission problem in older adult patients. An estimated 1 in 5 older adult patients (>2.3 million Americans) is readmitted to a hospital within 30 days after being discharged.⁴⁶ Although many readmissions are unavoidable, an estimated 75% of readmissions are unplanned, and mostly preventable.⁴⁷ Unplanned hospital readmissions nationwide therefore impose a significant burden in terms of mortality, morbidity, and resource consumption. For example, a hospital readmission of a patient with a hip fracture can easily negate the functional gains painstakingly achieved through weeks of postacute rehabilitation. This loss is over and above the costs to caregivers and relatives, who have to relive the stress of the original hip fracture episode, reorganize their work schedules to care for the patient, and restart the rehabilitation process after discharge. Across all conditions, unplanned readmissions cost almost $17 billion annually in the United States,⁴⁷ making them an ineffective use of costly resources in addition to being closely scrutinized as a marker for poor quality of care by organizations such as the Centers for Medicare & Medicaid Services (CMS).⁴⁸

Current Risk Prediction Models for Readmission

Several national and local efforts have developed models to predict the patient-specific risk of readmission. For example, researchers at Yale University developed a model to predict the readmission of patients with CHF, based on patient profiles such as patient age, sex, and comorbidities.⁴⁹ Similar models have been developed for predicting the readmission risk of patients admitted for COPD⁵⁰ and total hip/knee arthroplasty (THA/TKA).⁵¹ However, because the predictive power (measured by the concordance statistic [C statistic]) of these models is in the range of 0.60 to 0.65, there is considerable room for improvement. We have achieved some improvement in predictive power for COPD (C statistic = 0.72) by introducing factors related to postdischarge management (eg, antibiotic use and follow-up after discharge).⁵² However, an accurate predictive model based solely on a patient's health profile at or during hospitalization will be more helpful for planning targeted postdischarge care. In this project, we used discrimination (ie, C statistic, discrimination slope), calibration (ie, calibration-in-the-large, calibration slope), and reclassification (ie, net reclassification improvement [NRI], integrated discrimination improvement [IDI]) measures discussed in aim 2 to test whether information about patient subgroups identified through visual analytics will improve the accuracy of prediction when compared with existing models that do not consider information about patient subgroups.
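
For readers unfamiliar with the C statistic, the Python sketch below computes it as the proportion of readmitted/not readmitted patient pairs in which the readmitted patient received the higher predicted risk, counting ties as one-half. The function name and the example risks are illustrative assumptions.

```python
def c_statistic(predicted_risks, readmitted):
    """Concordance between predicted risks and a 0/1 readmission outcome."""
    case_risks = [r for r, y in zip(predicted_risks, readmitted) if y == 1]
    control_risks = [r for r, y in zip(predicted_risks, readmitted) if y == 0]
    if not case_risks or not control_risks:
        raise ValueError("need at least one readmitted and one non-readmitted patient")
    concordant = 0.0
    for case_risk in case_risks:
        for control_risk in control_risks:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5  # ties count as half
    return concordant / (len(case_risks) * len(control_risks))


# Hypothetical predicted risks for 2 readmitted and 2 non-readmitted patients.
risks = [0.9, 0.4, 0.6, 0.2]
outcomes = [1, 1, 0, 0]
print(c_statistic(risks, outcomes))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which puts the reported range of 0.60 to 0.65 in context.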

Importance of Comorbidities in Predicting the Risk of Readmission

Studies have shown that the number and type of comorbidities play a critical role in determining the risk of readmission.⁴⁶ For example, almost two-thirds of the older adult patient population have ≥2 comorbid conditions, resulting in a heightened risk of adverse health outcomes, such as hospital readmission.¹⁷ Furthermore, multiple comorbidities very often do not act independently but rather interact with each other, resulting in processes that can precipitate readmission.⁵³ For example, because of the systemic nature of renal disease, a patient with hip fracture, CHF, and renal failure is at a higher risk of renal failure exacerbation resulting in a readmission than is a patient who only has renal failure.^41,42 Additional evidence for different comorbidities precipitating readmission came from discussions with 3 readmitted patients (see Patient and Stakeholder Engagement section). The aforementioned studies and our discussions with readmitted patients suggest that comorbidity profiles can cause considerable heterogeneity among patients, each with different processes precipitating readmission and associated risks.

Differences With Previously Funded Methods for Improving PCOR

Previously funded PCOR methods related to heterogeneity of treatment effects focus on identifying the characteristics that best separate responders from nonresponders to a treatment. In contrast, our unsupervised approach is focused on identifying patient subgroups in any data consisting of patients and characteristics, with the goal of enabling stakeholders to more fully comprehend the disease processes within patient subgroups. For example, our method could be used to identify subgroups in Medicare data, in electronic medical records (EMRs), or in data from randomized clinical trials to identify subgroups within responders to treatment or nonresponders, in order to comprehend heterogeneity within each group. Furthermore, as shown in Figure 1, our integrated quantitative and visual approach enables clinician stakeholders to infer the processes underlying patient subgroups and to design potential interventions.

Research Questions and Specific Aims

This project will address 3 research questions: (1) Can a visual analytical approach be developed to automatically identify patient subgroups based on their comorbidity profiles, enable clinician stakeholders to infer the processes precipitating readmission, and design potential targeted interventions? (2) Can knowledge of patient subgroups and the inferred processes precipitating their readmission improve existing readmission prediction models? (3) Does the visual analytical approach generalize across disease conditions? These research questions will be addressed through the following 3 specific aims:

Aim 1. Develop a visual analytical method to automatically identify subgroups of readmitted patients with COPD. This aim will be achieved through the following subaims: (a) engage clinician stakeholders to review comorbidities critical for subgroup identification in readmitted patients with COPD; (b) implement a method for the automatic identification of subgroups of readmitted patients with COPD, based on their comorbidity profiles; (c) engage clinician stakeholders to reevaluate the comorbidities for each patient subgroup; and (d) develop a classifier to organize patients into the identified subgroups.
Aim 2. Develop, validate, and test improvement of subgroup-specific models for predicting readmission in patients with COPD. This aim will be achieved by the following subaims: (a) develop and validate subgroup-specific risk prediction models for readmission; (b) compare subgroup-specific models with an existing COPD model that does not consider subgroups; (c) engage stakeholders to reevaluate the variables included in the subgroup-specific models; and (d) compare subgroup-specific models vs enhanced subgroup-specific models with additional variables suggested by stakeholders.
Aim 3. Test the generalizability of subgroup-specific modeling in other conditions and disseminate the method. This aim will be achieved through the following subaims: (a) test the generalizability of the subgroup-specific modeling method in patients readmitted for CHF and post–THA/TKA; and (b) disseminate the method for automatically identifying patient subgroups using visual analytics to PCOR researchers.

Although the above-mentioned specific aims demonstrate our method using Medicare claims data, the methods are designed to analyze other types of patient-characteristic data, such as hospital EMRs.

Potential Impact

Automatic method for identification of patient subgroups. In contrast to the current approach of using investigator-selected variables for identifying patient subgroups to develop stratified regression models, our use of powerful visual analytical methods provides a scalable and data-driven approach to automatically identify patient subgroups based on multiple comorbidities in large data sets.
Improved comprehension of patient subgroups by stakeholders. Little is understood about how comorbidities co-occur to define patient subgroups in older adult patients. Our visualization of patient subgroups will enable stakeholders to (1) comprehend the number, size, and interrelationships among patient subgroups; (2) consider reasons for the processes that precipitate readmission underlying the patient subgroups; and (3) design interventions targeted to each patient subgroup. Furthermore, our approach targets an important patient outcome in an underrepresented and rapidly growing segment of the US population, and therefore could have a potentially significant impact on health care.
Improved accuracy of prediction. Despite evidence of considerable heterogeneity in comorbidities in older adult patients,⁵⁴ current risk models for readmission use comorbidities and demographic information but do not use patient-subgroup information. Our proposed research demonstrates how data-driven patient subgroup information can be added to readmission risk prediction modeling with the goal of improving predictive accuracy.
Publicly available R scripts for identifying and comprehending patient subgroups. Current PCOR methods do not have publicly available code to automatically identify patient subgroups and visually inspect the resulting subgroups. Our research produced R scripts to identify patient subgroups through the use of visual analytics, which will be disseminated widely through a website for free download by PCOR researchers worldwide.

Participation of Patients and Other Stakeholders

Identification of Research Questions

The research questions addressed in this project were motivated by the experiences of 3 patients, including that of the PI's mother with a hip fracture; she was readmitted twice to the hospital within 30 days of discharge for a sudden onset of syncope caused by interactions of extended bed-rest and comorbidities. The other 2 patients faced personal hardships after being readmitted due to drug interactions with existing comorbidities. The high cost of these unplanned readmissions to the patients and families prompted the question of how such readmissions could be prevented and focused our attention on comorbidities. Discussions with co-PI Yong-Fang Kuo, PhD, revealed that readmission was a widespread problem but was not well understood at the patient level. We were intrigued by the idea of using visual analytics (the PI's area of expertise) to gain deeper insight into the problem. We extracted data on all patients with hip fracture in 2009 and 2010 who had unplanned hospital readmissions. A network analysis of these patients revealed distinct patient subgroups with different combinations of comorbidities. Using the visual representation to show the comorbidity pattern, clinician-stakeholder Mukaila Raji, MD, MS, inferred the processes that might have precipitated readmission in each subgroup.

Presentations drew the attention of co-PI Dr Kuo and clinician-stakeholder Gulshan Sharma, MD, MPH. Both agreed that such subgroup information provided a novel understanding of the readmission problem and suggested that we include such information in the design of readmission-risk prediction models for determining the risk of a patient. These early attempts led to a publication and a pilot grant from the University of Texas Medical Branch Sealy Center of Aging, which enabled detailed discussions with Dr Goodwin, who is well known in the area of geriatric research based on claims and EMR data. The research questions in this project were therefore motivated by observing first-hand the real-world experience of a readmitted patient, leading to a progressive series of collaborations resulting in the generation of ideas, publications, and pilot funding.

Stakeholder Engagement

As shown in Table 1, we assembled a stakeholder team with complementary expertise and viewpoints. Engagement with the stakeholder team was organized by the PI, who has expertise in visual analytics, and the co-PI (Dr Kuo), who has extensive experience in analyzing Medicare data.

Table 1

Members of the Stakeholder Team, Their Expertise, and Their Roles.

We here describe the meetings that were held with the stakeholders over the course of the project, along with their outcomes:

Meeting 1 (July 2017, April 2020: Dr Raji, geriatrician): Dr Raji provided feedback on how to combine variables that have different names across the 3 comorbidity indices (Elixhauser, Charlson, and CMS condition category [CC] indices). This feedback was crucial to align the variables to those being used in the CMS readmission models.
Meeting 2 (December 2017: Dr Raji): Because the network was very dense in the center, we explored the concept of identifying patients who belong equally to all clusters, and therefore belong strongly to none. We explored with Dr Raji the notion of a “null” cluster, which would define this phenomenon, and whose extraction from the other clusters could lead to improved subgroup-specific modeling. Dr Raji thought this was a good direction, because members of a null cluster could contain patients who had many comorbidities and therefore were the “frequent fliers” who were repeatedly admitted to and discharged from hospitals. We plan to pursue this informatics-identified and clinically relevant concept in our future modeling. Dr Raji was also available by phone for clarifications related to the clinical interpretation of the patient subgroups and the design of targeted interventions throughout the project period.
Meeting 1 (June 2017: Dr Sharma, pulmonologist and expert in hospital readmission): The meeting produced the following recommendations that were used in the reanalysis of the inpatient-outpatient data we received from Medicare: (1) include socioeconomic status (eligible for Medicaid) with age, sex, and race for matching controls; (2) use feature selection to preserve variable interactions and collinearities; and (3) indicate risk for each cluster on the visualization.
Meeting 2 (December 2017: Dr Sharma): The meeting produced the following recommendations: (1) In addition to conducting risk analysis at the cluster level (which was probabilistic), we should also conduct risk analysis at the tuple level (ie, specific combinations of comorbidities that were deterministic). (2) Because patients were not independent between the training and replication datasets, we should redo the analysis by pooling both years and splitting the pooled data equally into training and replication datasets. The reanalysis using the new pooled-split approach did not significantly change the results; thus, there were no additional recommendations from the stakeholders. Dr Sharma was also available by phone for clarifications related to the clinical interpretation of the patient subgroups and the design of targeted interventions throughout the project period.

Methods

Research Design

As shown in Figure 2, our project had 3 aims: (1) develop a visual analytical method to automatically identify subgroups of readmitted patients with COPD based on their comorbidity profiles; (2) develop risk models targeted to subgroups by integrating results from the method and stakeholders; and (3) test the generalizability of the visual analytical method across conditions and make the method available to other researchers through R scripts on a website.

Figure 2

Overview of the 3 Specific Aims, Showing Feedback Loops With the Stakeholders.

Research Conduct

There were no major departures from the research as originally approved by PCORI. However, procedurally there were 3 changes: (1) Because of changes in Medicare policies, we received the data much later than expected. We therefore proceeded with data that were available at the University of Texas Medical Branch; as a result, some of our publications contain results from only inpatient Medicare data. (2) The stakeholders suggested that, instead of testing for replication between years, we pool 2 years of the data, remove duplicate patients occurring in both years, and then split the pooled data into 2 random halves. This required redoing the COPD analysis. However, the overall results did not change. (3) Because we had regular contact with the stakeholders in person and by phone, aim 2d (see Figure 2) was more tightly integrated and therefore did not require a separate iteration.

Data Sources and Data Sets

The following description of data sources, study population, and variables applied for all 3 index conditions: COPD, CHF, and THA/TKA.

Data Source

We used 100% Medicare claims data for years 2011-2014 because (1) the scale of the data enables subgroup identification with sufficient statistical power; (2) the spread of data collected from across the United States enables generalizability of the results; (3) the population consisting of older adult patients enables research on an underrepresented segment of the US population; (4) the variables include comorbidities critical in readmission and demographics enabling analysis of differences by race and sex; and (5) the use of the data to build readmission predictive models enables head-to-head comparison with the subgroup-specific modeling we developed. (Limitations of Medicare data are discussed in the “Conclusions” section.)

Study Population

We analyzed patients hospitalized for COPD, CHF, and THA/TKA. We selected these 3 conditions because (1) hospitalizations for each of these conditions are highly prevalent in older adult patients,⁴⁶ (2) hospitals report very high variations in their readmission rates,⁴⁶ and (3) for each of these conditions there exist well-tested readmission prediction models that did not consider patient subgroups.^49-52,55 For example, as shown in the regression equation presented earlier, although these models include comorbidities and demographics as predictors, they do not take into consideration the existence of patient subgroups that share many comorbidities. For each index condition, we used 100% Medicare claims data on readmitted patients in 2013 and 2014, removed patients who had duplicate records, and then randomly split the pooled data set to create the training and replication data sets. We extracted all patients who were admitted to an acute care hospital on or after July 1, 2013, had a principal diagnosis of the index condition, were aged ≥66 years, and were enrolled in both Medicare parts A and B fee-for-service plans. Because payment to primary care physicians participating in Medicare Advantage (MA) plans is capitated, and the physicians are not required to file claims for MA patients, these patients do not have complete data (eg, comorbidities) relevant to our study. Therefore, we excluded patients with MA coverage from 6 months before to 1 month after the discharge of the index admission. To determine how this exclusion affects the generalizability of our results, we compared the demographics of patients with and without MA coverage and here report whether our results generalize to MA patients. Because the published CMS models did not use Medicare Part D data related to prescription medications, we did not include those data in our analyses.

After extracting the cases (defined as patients readmitted within 30 days of hospital discharge), we extracted an equal number of controls (defined as patients not readmitted within 90 days of discharge, matched by age, sex, race, and Medicaid eligibility as a proxy for economic status). We performed individual matching by randomly selecting a control to match a case on age, sex, race/ethnicity, and Medicaid eligibility.⁵⁶ The 90-day window of no readmittance represents an episode of care proposed by CMS for patients,^57,58 indicating that the controls are substantially free of complications that result in readmission during this period, and therefore allowing an effective comparison with the cases. This method of defining controls was identical to that used by the CMS models, and we adopted it to allow a fair comparison. We used this method for both the training data set and the replication data set. A small percentage (0.8%) of Medicare patients had “unknown race” recorded for the race attribute. Because controls were matched to cases on race, the cases and controls contained an equal number of patients of unknown race.
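The individual matching described above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `match_controls`, the record fields, and the toy records are all hypothetical, and the sketch assumes exact matching on every key with each control used at most once.

```python
import random
from collections import defaultdict

def match_controls(cases, candidates, keys=("age", "sex", "race", "medicaid"), seed=0):
    """Pair each case with one randomly chosen, unused control sharing all matching keys."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for person in candidates:
        pool[tuple(person[k] for k in keys)].append(person)
    for group in pool.values():
        rng.shuffle(group)  # random selection within each matching stratum
    pairs = []
    for case in cases:
        group = pool.get(tuple(case[k] for k in keys))
        if group:  # cases with no eligible control are left unmatched
            pairs.append((case, group.pop()))
    return pairs

# Hypothetical toy records: cases were readmitted within 30 days,
# candidate controls were not readmitted within 90 days.
cases = [
    {"id": "c1", "age": 70, "sex": "F", "race": "white", "medicaid": False},
    {"id": "c2", "age": 82, "sex": "M", "race": "black", "medicaid": True},
]
candidates = [
    {"id": "k1", "age": 70, "sex": "F", "race": "white", "medicaid": False},
    {"id": "k2", "age": 70, "sex": "F", "race": "white", "medicaid": False},
    {"id": "k3", "age": 82, "sex": "M", "race": "black", "medicaid": True},
]
pairs = match_controls(cases, candidates)
```

Because every pair agrees on race, any case recorded with unknown race is paired with a control of unknown race, which is why the two groups contain equal numbers of such patients.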

To match the exclusion criteria of the published CMS models, we excluded patients who were transferred from other facilities, died during the readmission hospitalization, or transferred to another acute care hospital. For this aim, we analyzed only data from patients who were readmitted within 30 days after discharge. Finally, we used the union of comorbidities in the 3 comorbidity indices described earlier: Elixhauser and Charlson comorbidity indices, and CMS CCs. To guard against miscoding of comorbidities, we used a 6-month look-back period in addition to the year the data were collected to ensure all comorbidities of the patients were included. Appendix A shows the detailed inclusion and exclusion criteria used to extract cases and controls for COPD, CHF, and THA/TKA; the respective numbers of patients extracted at each step; and the ICD codes for each of the 3 index conditions selected for analysis.

Measures

The outcome of interest was whether a patient with the index admission had an unplanned readmission to an acute care hospital within 30 days after discharge, based on data extracted from the MEDPAR file. Predictors for the risk of readmission model for each disease condition were the same patient demographics (eg, age and sex) and condition-specific variables (eg, oxygen dependency for COPD) as in the existing predictive models, in addition to the comorbidities we describe next.

Established comorbidity indices were used to form the candidate predictor pool, including the Charlson and Elixhauser comorbidity indices and the CMS CCs in the condition- and procedure-specific readmission models.⁵⁹ All these indices have mapped ICD-9-CM diagnosis codes to identify selected comorbid conditions. The Charlson Comorbidity Index is calculated as a weighted score from 17 conditions.⁶⁰ Elixhauser methods identify 31 conditions often used individually in regression models for prediction or risk adjustment.⁶¹ The CMS CCs are used for risk adjustment for various outcomes, such as readmission, mortality, and complication. We extracted the conditions included in these indices using corresponding diagnostic and procedure codes from physician, outpatient, and inpatient claims in the 6 months before or on the day of the index admission. Because these comorbidity indices contain common as well as unique conditions, a union of the indices was prepared for analysis. This list of comorbidities was presented to the stakeholders, who suggested additional variables for each index condition that were important for analyzing hospital readmission; this resulted in the following additional variables in each condition: for COPD, history of sleep apnea and mechanical ventilation; for CHF, history of coronary artery bypass graft surgery; and for THA/TKA, congenital deformity of the hip joint and posttraumatic osteoarthritis.

Feature Selection

The visual analytical modeling was conducted with comorbidities, whereas the subsequent predictive modeling was done using the same variables that were included in the existing Yale models with which we were conducting our comparison. We used the following method to select features to include in our visual analytical modeling: (1) We removed comorbidities with prevalence <1%. This approach is typically used in other studies,⁶² because low-prevalence comorbidities tend to be unreliable, representing noise in the data. (2) We selected significant comorbidities in the training data set based on 2-way interaction tests using odds ratios with directionality and corrected for multiple testing using the Bonferroni method. (3) The remaining comorbidities were tested for replication in the replication data set. Appendix B shows the number of comorbidities and overall variables that were included in the analysis for each of the 3 conditions.
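The first 2 feature selection steps can be illustrated with a short sketch. It is not the project's code. The report does not name the significance test, so the Wald test on the log odds ratio used here is an assumption, as are the 0.05 significance level, the function names, the comorbidity names, and the counts. The sketch also assumes every cell of each 2 × 2 table is nonzero.

```python
import math

def odds_ratio_test(a, b, c, d):
    """a: cases with comorbidity, b: cases without, c: controls with, d: controls without."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = math.erfc(abs(log_or / se) / math.sqrt(2))  # 2-sided normal P value
    return math.exp(log_or), p

def select_features(tables, n_patients, min_prevalence=0.01, alpha=0.05):
    """Drop comorbidities with prevalence below 1%, then keep Bonferroni-significant ones."""
    kept = {name: t for name, t in tables.items()
            if (t[0] + t[2]) / n_patients >= min_prevalence}
    if not kept:
        return {}
    threshold = alpha / len(kept)  # Bonferroni correction for the number of tests
    selected = {}
    for name, t in kept.items():
        odds_ratio, p = odds_ratio_test(*t)
        if p < threshold:
            selected[name] = (odds_ratio, p)  # odds ratio > 1 means more common in cases
    return selected

# Hypothetical counts for 1000 cases and 1000 matched controls.
tables = {
    "renal_failure": (300, 700, 200, 800),
    "rare_condition": (5, 995, 3, 997),
    "hypertension": (600, 400, 595, 405),
}
selected = select_features(tables, n_patients=2000)
```

The third step would apply the same function to the replication data set and keep only comorbidities that are selected in both data sets with the same direction of effect.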

Analytical and Evaluative Approach

Visual Analytical Modeling

We developed a visual analytical method to identify patient subgroups in each of the 3 index conditions. The visual analytical method was developed using R scripts and applied using the following steps:

Represented data as a bipartite network. As shown in Figure 1C, we used a bipartite network to model the cases (30-day readmitted patients) and their comorbidities (features selected above) with each condition. The nodes represented the cases and comorbidities, and edges represented which patient had which comorbidity.
Laid out the network using a force-directed algorithm. We used the KK algorithm,⁴⁴ a force-directed algorithm to lay out the bipartite network. This layout algorithm pulls together nodes that are strongly connected and pushes apart nodes that are not. The result is that nodes with a similar pattern of connections are placed close to each other, and those that are dissimilar are pushed apart.
Identified biclusters and their degree of clusteredness, using a modularity algorithm. We used a modularity algorithm to identify the number of clusters, their boundaries, and the quality of that cluster partitioning. Modularity is defined as the fraction of edges falling within a cluster minus the expected fraction of such edges in a network of the same size with randomly assigned edges.²¹ Modularity ranges from −0.5 to +1, with values >0.3 indicating substantial clustering.²¹ We used the bipartite version of modularity to find biclusters (consisting of patient and comorbidity nodes) in the network.
Enhanced the network layout based on biclusters. Because the traditional KK algorithm often cannot fully separate clusters in large and dense networks, the network layout needs to be enhanced before it can be useful to the stakeholders for understanding the associations. Therefore, once the biclusters were identified through the bicluster modularity algorithm, we used the ExplodeLayout algorithm⁴³ to separate the clusters identified through bicluster modularity, with the goal of reducing the visual overlap among them and thereby enhancing their comprehensibility. This approach effectively preserved the distances of nodes within a cluster but not the distances of nodes across clusters.
Evaluated the significance and internal validation of the readmitted patient subgroups. The patient subgroups identified through the modularity algorithm were tested for statistical significance and internal validation in the replication data set. The significance of the biclustered modularity was measured by comparing it with a distribution of the same quantity generated from 1000 random permutations of the network by preserving the network size (number of nodes) and the network density (number of edges). Next, we tested the replicability of comorbidity co-occurrence in the replication data set using the Rand index.⁶³ This index measures the proportion of comorbidity pairs that co-occurred and did not co-occur in a cluster in both data sets (where 0 = no internetwork cluster similarity, and 1 = total internetwork cluster similarity). Finally, we tested the significance of the Rand index by comparing it with a distribution of the same measure generated from 1000 random permutations of the training and replication networks. All tests of statistical significance were 2 sided.
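The 2 quantities evaluated in the steps above, bipartite modularity and the Rand index over comorbidity pairs, can be sketched as follows. This is an illustration under stated assumptions rather than the project's R scripts: it uses Barber's formulation of bipartite modularity, scores a cluster assignment that is already given (it does not search for the best partition), and all node names and cluster labels are hypothetical.

```python
from itertools import combinations

def bipartite_modularity(edges, patient_cluster, comorbidity_cluster):
    """Q = fraction of within-cluster edges minus the fraction expected by chance."""
    m = len(edges)
    k, d = {}, {}  # patient degrees and comorbidity degrees
    for patient, comorbidity in edges:
        k[patient] = k.get(patient, 0) + 1
        d[comorbidity] = d.get(comorbidity, 0) + 1
    within = sum(1 for p, c in edges if patient_cluster[p] == comorbidity_cluster[c])
    expected = sum(k[p] * d[c] for p in k for c in d
                   if patient_cluster[p] == comorbidity_cluster[c]) / m
    return (within - expected) / m

def rand_index(labels_a, labels_b):
    """Proportion of comorbidity pairs grouped consistently in both cluster assignments."""
    items = sorted(set(labels_a) & set(labels_b))
    pairs = list(combinations(items, 2))
    agree = sum((labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y])
                for x, y in pairs)
    return agree / len(pairs)

# Hypothetical network: 4 patients, 4 comorbidities, 1 edge crossing biclusters.
edges = [("p1", "A"), ("p1", "B"), ("p2", "A"), ("p2", "B"), ("p2", "C"),
         ("p3", "C"), ("p3", "D"), ("p4", "C"), ("p4", "D")]
patient_cluster = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
comorbidity_cluster = {"A": 0, "B": 0, "C": 1, "D": 1}
q = bipartite_modularity(edges, patient_cluster, comorbidity_cluster)

# Hypothetical comorbidity clusters found in the training and replication networks.
training = {"A": 0, "B": 0, "C": 1, "D": 1}
replication = {"A": 5, "B": 5, "C": 7, "D": 5}
ri = rand_index(training, replication)
```

In the permutation tests, each observed value would be compared with the mean and standard deviation of the same quantity computed on randomized networks, giving a z score of the form (observed − mean) / SD.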

Classification Modeling

As shown in Figure 1, the biclusters identified through the modularity algorithm contain patient subgroups and their most frequently co-occurring comorbidities with respect to other patients in the network. However, as shown in both figures, many edges go across biclusters, which demonstrates that many patients within a bicluster have comorbidities that exist in other biclusters. As is true for all partitioning clustering methods, including modularity, membership of a new patient in each bicluster is therefore probabilistic. This means that classification of a patient into a cluster is not defined by the inclusion or exclusion of comorbidities (eg, CHF and diabetes) but rather by the probability of being in a patient subgroup. This reflects the true variation in comorbidities where patients are similar or different, not just in a handful of carefully selected comorbidities while ignoring others; rather, patients are similar or different based on all of their considered comorbidities. It is this overall profile of patients that reflects the reality of comorbid conditions and therefore is most likely to reveal the real-world processes precipitating readmission.

Therefore, to determine whether a specific patient belongs to a patient subgroup, we developed a classifier that provided the probability of a patient belonging to each subgroup. Such probabilities provide information such as whether a patient has a much higher probability of being in 1 patient subgroup compared with others or whether the patient has an approximately equal probability of belonging to 2 patient subgroups, in which case multiple processes could be precipitating that patient's readmission.

To capture this complexity of how patients are similar or different based on their comorbidities, we developed the classifier using multinomial logistic regression.¹⁵ In contrast to support vector machines and decision trees, multinomial logistic regression has the advantage of generating “soft labels,” or probabilities for a patient to be in each patient subgroup, enabling an understanding of complex patterns in patients such as those who might belong equally to 2 different subgroups. To validate the classifier, we used a cross-validation approach, where 1000 random samples of 75% of the data were selected with replacement to build the prediction model. The classifier was then used to predict the subgroup membership for the remaining 25% of the patients as the basis of calculating the average accuracy across the 1000 cross-validated samples.
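The “soft labels” produced by a multinomial logistic regression classifier can be illustrated with the softmax step that converts per-subgroup linear scores into probabilities. This is a minimal sketch and not the project's classifier: the coefficients, intercepts, and the 3 comorbidity indicators are hypothetical values chosen so that 1 patient belongs equally to 2 subgroups.

```python
import math

def subgroup_probabilities(features, weights, intercepts):
    """Return one probability per subgroup from a fitted multinomial logistic model."""
    scores = [b + sum(w * x for w, x in zip(ws, features))
              for ws, b in zip(weights, intercepts)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients: one row per subgroup, one column per comorbidity.
weights = [
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
]
intercepts = [0.0, 0.0, 0.0]

# A patient with the first and third comorbidities but not the second.
patient = [1, 0, 1]
probs = subgroup_probabilities(patient, weights, intercepts)
```

For this patient the first 2 probabilities are equal and much larger than the third, which is the pattern that a hard-label classifier would hide by assigning the patient to a single subgroup.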

Predictive Modeling

To develop and evaluate subgroup-specific models, we used the same data sources and inclusion and exclusion criteria for all patients who were admitted for each condition, as described in the data selection section above. However, here we used the cohort consisting of the entire population in each condition. We classified all patients in each index condition into subgroups by applying the patient subgroup classifier developed in aim 1. The outcome measure was 30-day unplanned readmission (yes vs no) extracted from MEDPAR files (inpatient claims). We conducted the development and evaluation of subgroup-specific readmission prediction models through the following steps.

Developed subgroup-specific models. Our goal was to develop predictive models with an optimal balance between predictive ability and complexity. Therefore, we used an iterative process consisting of modeling and feedback from the clinician stakeholders. The training set for model development was randomly selected from the patients who were admitted for the respective index condition. All models included age, sex, race, index condition severity (eg, use of mechanical ventilation and respirator dependence), and subgroup indicators. A small percentage (0.8%) of Medicare patients had “unknown race” (ie, the data for race were missing); therefore, we grouped “unknown race” and “other race” together in the analyses. Because only 0.8% of Medicare beneficiaries had missing data on race, the risk of bias was too low to warrant a sensitivity analysis.
Using the subgroups identified in aim 1, we developed stratified logistic regression models for each subgroup. Each of these subgroup-specific models included the same variables selected through feature selection in an index condition and additional ones deemed to be relevant to hospital readmission by the stakeholders. Subgroup-stratified modeling has the advantage of being more specific and interpretable but may not be the most efficient modeling approach. Therefore, we also incorporated information on patient subgroups into a single regression model (referred to as a hierarchical model) of all the data by inclusion of a subgroup variable. The model performance in the training sample was measured by goodness-of-fit statistics (eg, likelihood ratio test, deviance), C statistics, and Akaike information criterion (AIC). Given a set of models with satisfying goodness-of-fit, the preferred model should be the one with higher C statistic and lower AIC values. Coefficient estimates from the model were applied to the validation sets for model validation.
Internally validated subgroup-specific models. We conducted internal validation to examine whether the prediction model had sustainable good performance in the validation sets. The model's performance in the validation sets was evaluated by the following measures:
– Discrimination. This quantity measures the model's ability to distinguish patients with readmission from those without readmission. The C statistic is the most frequently used discrimination measure for logistic regression models and is identical to the area under the receiver operating characteristic curve. Model discrimination was also presented visually, using box plots showing the average risk prediction for patients with and without readmission.
– Calibration. This quantity measures how well the predicted probabilities agree with observed probabilities. In logistic regression, calibration is measured by calibration indices (calibration-in-the-large and calibration slope) and visually illustrated by a calibration plot (the scatterplot of the proportion of patients actually readmitted vs deciles of predicted probability of having readmission). Good calibration is indicated by a calibration-in-the-large with a value close to 0 and a calibration slope close to 1. Because the study was overpowered owing to the large sample size, we did not measure the calibration based on statistical significance (eg, P values of the Hosmer-Lemeshow and calibration indices).
Compared hierarchical models with the existing model that does not consider patient subgroups. The performance of our subgroup-specific prediction model derived from visual analytics was compared with the CMS models that include comorbidities and demographic variables as predictors but do not consider subgroups. Besides the comorbidities, all variables in the comparison sets were the same. We tested for discrimination and calibration using the following 2 measures:
– NRI indicates the proportion of patients whose predicted probability of readmission improved (with reference to actual readmission status) between the CMS reference model and subgroup-specific model. We report 2 NRI statistics: (1) categorical NRI, in which the predicted readmission probabilities for each model were divided into 10 sequential categories between 0 and 1, with improvement requiring a shift between categories; and (2) continuous NRI, which is based on the proportions of individuals with any improved predicted probability of readmission, regardless of the size of that improvement.
– IDI indicates the difference in the average improvement in predicted risks between the CMS reference model and the subgroup-specific model.
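The discrimination and reclassification measures listed above can be sketched in a few lines. The code follows the standard textbook definitions of the C statistic, continuous NRI, and IDI. It is not the project's code, and the outcome vector and predicted probabilities are hypothetical.

```python
def c_statistic(y, p):
    """Share of (readmitted, not readmitted) pairs ranked correctly; ties count as half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if e > n else 0.5 if e == n else 0.0
                for e in events for n in nonevents)
    return score / (len(events) * len(nonevents))

def continuous_nri(y, p_old, p_new):
    """Net share of events moving up plus net share of nonevents moving down."""
    def net_up(target):
        moves = [(new > old) - (new < old)
                 for yi, old, new in zip(y, p_old, p_new) if yi == target]
        return sum(moves) / len(moves)
    return net_up(1) - net_up(0)

def idi(y, p_old, p_new):
    """Mean gain in predicted risk among events minus the mean gain among nonevents."""
    def mean_gain(target):
        gains = [new - old for yi, old, new in zip(y, p_old, p_new) if yi == target]
        return sum(gains) / len(gains)
    return mean_gain(1) - mean_gain(0)

# Hypothetical data: 1 = readmitted within 30 days, 0 = not readmitted.
y = [1, 1, 0, 0]
p_reference = [0.6, 0.4, 0.5, 0.2]  # stands in for a model without subgroups
p_subgroup = [0.7, 0.5, 0.4, 0.2]   # stands in for a subgroup-specific model

c_ref = c_statistic(y, p_reference)
c_sub = c_statistic(y, p_subgroup)
nri = continuous_nri(y, p_reference, p_subgroup)
idi_value = idi(y, p_reference, p_subgroup)
```

Positive NRI and IDI values indicate that the second model moves predicted risks in the right direction on average: up for readmitted patients and down for patients who were not readmitted.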

Results

Here, we present the results for each of the modeling methods used—visual analytics, classification, and predictive—across the 3 index conditions (COPD, CHF, and THA/TKA).

Visual Analytical Modeling

The visual analytical modeling of readmitted patients in all 3 index conditions produced statistically and clinically significant patient subgroups based on their most frequently co-occurring comorbidities; these subgroups were significantly replicated, as measured by the Rand index.

Chronic Obstructive Pulmonary Disease

The application of the inclusion and exclusion selection criteria described in Appendix A resulted in a training data set (n = 14 508 matched case-control pairs, of which 51 pairs of patients with no comorbidities were dropped) and a replication data set (n = 14 508 matched case-control pairs, of which 51 pairs of patients with no comorbidities were dropped), matched by age, sex, race, and Medicaid eligibility (a proxy for economic status). The feature selection described in Appendix B resulted in 45 unique comorbidities identified from the 3 comorbidity indices (Elixhauser and Charlson indices and CMS CCs) plus 2 condition-specific comorbidities suggested by the stakeholders. Of these, 3 were removed because of <1% prevalence, and 30 survived the significance and replication testing with Bonferroni correction. These data were used for the subsequent visual analytical modeling.

As shown in Figure 3, the bipartite network method identified 4 biclusters, each representing a subgroup of readmitted patients with COPD and their most frequently co-occurring comorbidities. The analysis revealed that the biclustering had significant modularity (Q = 0.17; z = 7.3; P = 2.813e−13) and significant co-occurrence replication (RI = 0.92; z = 11.62; P = 3.11e−31). Furthermore, as requested by the stakeholders, we superimposed the risk for each of the patient subgroups to enable stakeholders to determine whether the subgroups were clinically meaningful. Although the comorbidity co-occurrences within each bicluster were meaningful, because of the size of the data, the edges between clusters were too dense to be individually visible.

Figure 3

Results From the COPD Network Analysis Showing 4 Patient Subgroups, Based on Their Most Frequently Co-occurring Comorbidities, and Their Risk of Readmission (Red Text).

Drs Sharma and Raji inspected the network and noted that the readmission risk of the patient subgroups had a wide range (12.7%-19.6%) with clinical (face) validity. Furthermore, the co-occurrence of comorbidities in each patient subgroup was clinically meaningful. Drs Sharma and Raji inferred that patients in subgroup 1 had a low disease burden, with uncomplicated hypertension leading to the lowest risk for readmission (12.7%). This subgroup represented patients with early organ dysfunction who would benefit from creating checklists to remind clinicians to conduct procedures such as regular monitoring of blood pressure in predischarge protocols to reduce the risk of readmission. Next, Drs Sharma and Raji inferred that patients in subgroup 3 had mainly psychosocial comorbidities, which could lead to aspiration precipitating pneumonia, in turn leading to an increased risk for readmission (15.9%). This subgroup would benefit from early consultation with specialists (eg, psychiatrists, therapists, neurologists, geriatricians) with expertise in psychosocial comorbidities, with a focus on the early identification of aspiration risks and precautions. Next, Drs Sharma and Raji inferred that patients in subgroup 2 had diabetes with complications, renal failure, and heart failure, and therefore had higher disease burden leading to an increased risk for readmission (17.8%) compared with subgroup 1. This subgroup had metabolic abnormalities with greater end-organ dysfunction and would therefore benefit from case management from advanced practice providers (eg, nurse practitioners) with rigorous adherence to established guidelines to reduce the risk of readmission. Finally, Drs Sharma and Raji inferred that patients in subgroup 4 had diseases with end-organ damage including gastrointestinal disorders, and therefore had the highest disease burden and risk for readmission (19.6%). This subgroup would also benefit from case management with rigorous adherence to established guidelines to reduce the risk of readmission. Furthermore, because patients in this subgroup typically experience complications that could impair their ability to make medical decisions, they should be provided with early consultation with a palliative care team to ensure that care interventions are aligned with patient preferences and values.

Congestive Heart Failure

The application of the inclusion and exclusion selection criteria described in Appendix A resulted in a training data set (n = 25 775 matched case-control pairs, of which 103 pairs of patients with no comorbidities were dropped) and a replication data set (n = 25 775 matched case-control pairs, of which 104 pairs of patients with no comorbidities were dropped), matched by age, sex, race, and Medicaid eligibility (a proxy for economic status). The feature selection described in Appendix B resulted in 42 unique comorbidities identified from the 3 comorbidity indices plus 1 condition-specific comorbidity. Of these, 1 comorbidity was removed because of <1% prevalence, and 37 comorbidities survived the significance and replication testing with Bonferroni correction. These data were used for the subsequent visual analytical modeling.

As shown in Figure 4, the bipartite network method identified 4 biclusters, each representing a subgroup of readmitted patients with CHF and their most frequently co-occurring comorbidities. The analysis revealed that the biclustering had significant modularity (Q = 0.17; z = 8.69; P = 3.636e−18) and significant co-occurrence replication (RI = 0.94; z = 17.66; P = 8.65e−70). Furthermore, as requested by the stakeholders, we superimposed the risk for each of the patient subgroups to enable a geriatrician to determine whether the subgroups were clinically meaningful.
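The P values reported alongside each z score appear consistent with a two-sided tail probability of a standard normal distribution. The following minimal sketch checks that reading; the two-sided normal convention is an assumption (this section does not state how P was derived from z), and `two_sided_p` is a hypothetical helper name.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided tail probability of a standard normal at |z| (assumed convention)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# z scores reported above for the CHF network
p_modularity = two_sided_p(8.69)     # modularity
p_replication = two_sided_p(17.66)   # co-occurrence replication

print(f"{p_modularity:.3e}  {p_replication:.3e}")
```

Under this assumption the two values come out on the order of 1e-18 and 1e-70, in line with the reported P values; small differences are expected because the z scores are rounded to 2 decimals.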

Figure 4

Results From the CHF Network Analysis Showing 4 Patient Subgroups, Based on Their Most Frequently Co-occurring Comorbidities, and Their Risk of Readmission (Red Text).

Dr Raji inspected the network and noted that the readmission risk of the patient subgroups, ranging from 15.1% to 19.9%, was wide and had clinical (face) validity. Furthermore, the co-occurrence of comorbidities in each patient subgroup was clinically meaningful. He inferred that patients in subgroup 1 had chronic but stable conditions and therefore had the lowest risk for readmission (15.1%). Next, he inferred that patients in subgroup 3 had mainly psychosocial comorbidities but were not acute or fragile compared with subgroups 2 and 4 and therefore had medium risk (16.6%). Finally, he inferred that patients in subgroup 4 had severe chronic conditions; these patients were fragile and close to hospice care and were therefore at high risk for readmission (19.9%).

Total Hip/Knee Arthroplasty

The application of the inclusion and exclusion selection criteria described in Appendix A resulted in a training data set (n = 8249 matched case-control pairs, of which 1239 pairs of patients with no comorbidities were dropped) and a replication data set (n = 8249 matched case-control pairs, of which 1264 pairs of patients with no comorbidities were dropped), matched by age, sex, race, and Medicaid eligibility (a proxy for economic status). The feature selection described in Appendix B resulted in 39 unique comorbidities identified from the 3 comorbidity indices plus 2 condition-specific comorbidities. Of these, 11 comorbidities were removed because of <1% prevalence, and 11 survived the significance and replication testing with Bonferroni correction. These data were used for the subsequent visual analytical modeling.

As shown in Figure 5, the bipartite network method identified 7 biclusters, each representing a subgroup of readmitted patients who had undergone THA/TKA and their most frequently co-occurring comorbidities. The analysis revealed that the biclustering had significant modularity (Q = 0.31; z = 2.52; P = .011) and significant co-occurrence replication (RI = 0.89; z = 3.15; P = .0016). Furthermore, as requested by the stakeholders, we superimposed the risk for each of the patient subgroups to enable a geriatrician to determine whether the subgroups were clinically meaningful.
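The replication measure can be illustrated with a short sketch, assuming that RI denotes the Rand index computed on how comorbidities cluster in the training versus the replication data set. The function name and the toy labels are hypothetical, and the comparison against a null distribution that would yield the z score and P value is not shown.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree
    (both place the pair together, or both place it apart)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical cluster labels for four comorbidities in the two data sets
training = [0, 0, 1, 1]
replication = [0, 0, 1, 2]
ri = rand_index(training, replication)
```

In this toy case the two clusterings disagree on 1 of the 6 comorbidity pairs. An RI close to 1 means that nearly all pairs of comorbidities that co-cluster in one data set also co-cluster in the other.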

Figure 5

Results From the THA/TKA Network Analysis Showing 7 Patient Subgroups, Based on Their Most Frequently Co-occurring Comorbidities, and Their Risk for Readmission (Red Text).

Dr Raji inspected the network and noted that patients who had undergone TKA were healthier, in general, than were patients who had undergone THA, and the network was therefore difficult to interpret when merged together. However, he noted that the range of readmission risk had clinical (face) validity. Finally, he noted that subgroups 2, 4, and 5 comprised patients with more-severe comorbidities related to lung, heart, and kidney, with higher risk for readmission, than patients in subgroups 1, 6, and 7, who had less severe comorbidities and therefore had a lower risk for readmission.

Classification Modeling

The results of the classification modeling across all 3 index conditions showed high accuracy (ranging from 98.7% to 100%) in classifying patients into the subgroups identified from the visual analytics.

Chronic Obstructive Pulmonary Disease

Table 2 summarizes the percentage of patients correctly assigned to a subgroup by the classification model over 1000 iterations for the training (first row of the table) and testing components (second row of the table), including the total N, mean, SD, quantiles 0.025, 0.25, 0.50, 0.75, and 0.975, and the minimum and maximum. For the testing data, the percentage of patients correctly assigned to a subgroup ranged from 99.1% to 100.0%, with a median of 99.6%, and with 95% being in the range of 99.3% to 99.8%. A single model of all the data (not separated into training and testing components) correctly predicted subgroup membership for 99.9% of patients (14 443/14 457). Model coefficients are summarized in Appendix C.
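The summary statistics described above can be sketched as follows. This is an illustration only: the input values are synthetic placeholders rather than the study's accuracies, the linear-interpolation quantile is one common convention (the report does not say which was used), and `quantile` and `summarize` are hypothetical names.

```python
import statistics

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def summarize(accuracies):
    """Total N, mean, SD, selected quantiles, minimum, and maximum."""
    vals = sorted(accuracies)
    return {
        "n": len(vals),
        "mean": statistics.mean(vals),
        "sd": statistics.stdev(vals),
        "quantiles": {q: quantile(vals, q) for q in (0.025, 0.25, 0.50, 0.75, 0.975)},
        "min": vals[0],
        "max": vals[-1],
    }

# Synthetic placeholder values (0..100), NOT the study's per-iteration accuracies
summary = summarize(list(range(101)))

# Pooled accuracy from the counts reported in the text (14 443 of 14 457)
pooled = 100 * 14443 / 14457
print(round(pooled, 1))
```

The interval between the 0.025 and 0.975 quantiles corresponds to the "95% being in the range" statement in the text.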

Table 2

Percentage of Patients With COPD Correctly Assigned to a Subgroup by the Classification Model.

Congestive Heart Failure

Table 3 summarizes the percentage of patients correctly assigned to a subgroup by the classification model over 1000 iterations for the training (first row of the table) and testing components (second row of the table), including the total number; mean (SD); quantiles 0.025, 0.25, 0.50, 0.75, and 0.975; and the minimum and maximum. For the testing data, the percentage of patients correctly assigned to a subgroup ranged from 98.7% to 99.7%, with a median of 99.3%, and with 95% being in the range of 99.3% to 99.6%. A single model of all the data (not separated into training and testing components) correctly predicted subgroup membership for 99.2% of patients (25 476/25 672). Model coefficients are summarized in Appendix C.

Table 3

Percentage of Patients With CHF Correctly Assigned to a Subgroup by the Classification Model.

Total Hip and Total Knee Arthroplasty

Table 4 summarizes the percentage of patients who underwent a THA/TKA who were correctly assigned to a subgroup by the classification model over 1000 iterations for the training (first row of the table) and testing components (second row of the table), including the total number; mean (SD); quantiles 0.025, 0.25, 0.50, 0.75, and 0.975; and the minimum and maximum. For the testing data, the percentage of patients correctly assigned to a subgroup ranged from 99.4% to 100%, with a median of 99.9%, and with 95% being in the range of 99.8% to 100%. A single model of all the data (not separated into training and testing components) correctly predicted subgroup membership for 100% of patients (7010/7010). Model coefficients are summarized in Appendix C.

Table 4

Percentage of Patients Who Underwent THA/TKA Correctly Assigned to a Subgroup by the Classification Model.

Predictive Modeling

For each of the 3 index conditions, we developed the following 3 sets of models to predict readmission, each of which included all predictors used in the current CMS model (comorbidities, age, sex, and race): (1) a lumped CMS model, which used the entire study population in each condition; (2) subgroup models (1 for each bicluster in each condition), each of which used cases identified from the visual analytical modeling, and controls from the entire study population assigned to each subgroup using the classifier; and (3) a hierarchical model, which used the entire study population with an additional categorical predictor for subgroup membership (determined by the subgroup classifier).
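The difference among the 3 sets of models lies mainly in which patients and which predictors enter each fit. The sketch below shows only that data layout, not the regression itself. The records, field names, and function names are hypothetical, the choice of the first subgroup as the reference category is an encoding assumption, and the sketch does not reproduce the report's specific selection of cases and controls for the subgroup models.

```python
from collections import defaultdict

# Hypothetical records: "x" holds the shared predictors, "subgroup" the
# classifier's assignment, "y" the outcome (1 = readmitted within 30 days).
patients = [
    {"x": [1, 0, 72], "subgroup": 1, "y": 1},
    {"x": [0, 1, 80], "subgroup": 2, "y": 0},
    {"x": [1, 1, 68], "subgroup": 1, "y": 0},
    {"x": [0, 0, 75], "subgroup": 3, "y": 1},
]

def lumped_rows(records):
    """(1) One model on the entire population, shared predictors only."""
    return [(r["x"], r["y"]) for r in records]

def subgroup_rows(records):
    """(2) One training set per subgroup, same predictors."""
    by_group = defaultdict(list)
    for r in records:
        by_group[r["subgroup"]].append((r["x"], r["y"]))
    return dict(by_group)

def hierarchical_rows(records, levels):
    """(3) Entire population plus indicator columns for subgroup membership.
    The first level serves as the reference category (an assumption)."""
    rows = []
    for r in records:
        indicators = [1 if r["subgroup"] == lv else 0 for lv in levels[1:]]
        rows.append((r["x"] + indicators, r["y"]))
    return rows

lumped = lumped_rows(patients)
stratified = subgroup_rows(patients)
hierarchical = hierarchical_rows(patients, levels=[1, 2, 3])
```

Each of the three outputs could then be passed to the same logistic regression routine, which keeps the comparison focused on the use of subgroup information.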

Chronic Obstructive Pulmonary Disease

The application of the inclusion and exclusion selection criteria (described in Appendix A) resulted in a cohort of 186 041 patients (29 026 cases and 157 015 controls). As shown in Figure 6, the lumped CMS model had a C statistic of 0.624; this was not significantly different from the hierarchical model, which had a C statistic of 0.625. The 4 subgroup models had C statistics that ranged from 0.596 to 0.616. Appendix D provides additional details related to the distribution of risk among the subgroup models. Furthermore, as shown in Appendix D, the calibration plots revealed that all models had a slope close to 1 and an intercept close to 0. Finally, the hierarchical model had significantly higher NRI but not significantly higher IDI compared with the CMS reference model.
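The three comparison measures can be illustrated on toy data. This is a sketch under stated assumptions: the NRI shown is the category-free (continuous) variant, which may differ from the variant used in the report; the predicted risks are invented for illustration; and the function names are hypothetical.

```python
def c_statistic(y, risk):
    """Share of case-control pairs in which the case has the higher
    predicted risk (ties count as 0.5)."""
    cases = [r for yi, r in zip(y, risk) if yi == 1]
    controls = [r for yi, r in zip(y, risk) if yi == 0]
    score = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases for k in controls
    )
    return score / (len(cases) * len(controls))

def _mean(values):
    return sum(values) / len(values)

def idi(y, old, new):
    """Integrated discrimination improvement: change in the gap between
    mean predicted risk in cases and in controls."""
    def separation(risk):
        return (_mean([r for yi, r in zip(y, risk) if yi == 1])
                - _mean([r for yi, r in zip(y, risk) if yi == 0]))
    return separation(new) - separation(old)

def continuous_nri(y, old, new):
    """Category-free NRI: net upward moves among cases plus net
    downward moves among controls."""
    def net_up(outcome):
        moves = [n - o for yi, o, n in zip(y, old, new) if yi == outcome]
        return (sum(m > 0 for m in moves) - sum(m < 0 for m in moves)) / len(moves)
    return net_up(1) - net_up(0)

# Toy data: 2 readmitted (1) and 2 not readmitted (0) patients
y = [1, 1, 0, 0]
reference = [0.6, 0.4, 0.5, 0.3]   # invented risks from a reference model
candidate = [0.7, 0.5, 0.4, 0.3]   # invented risks from a candidate model

c_ref = c_statistic(y, reference)
c_new = c_statistic(y, candidate)
idi_value = idi(y, reference, candidate)
nri_value = continuous_nri(y, reference, candidate)
```

The pairwise loop in `c_statistic` is quadratic in cohort size, so a rank-based formulation would be preferable for cohorts of the size analyzed here.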

Figure 6

Results From the Prediction Modeling of the COPD Index Condition Showing the C Statistic of the Lumped CMS Model, the 4 Subgroup Models (SM-1, SM-2, SM-3, and SM-4), and the Hierarchical Model.

Congestive Heart Failure

The application of the inclusion and exclusion selection criteria (described in Appendix A) resulted in a cohort of 295 761 patients (51 573 cases and 186 935 controls). As shown in Figure 7, the lumped CMS model had a C statistic of 0.6; this was not significantly different from the hierarchical model, which also had a C statistic of 0.6. The 4 subgroup models had C statistics that ranged from 0.570 to 0.614. Appendix D provides additional details related to the distribution of risk among the subgroup models. Furthermore, as shown in Appendix D, the calibration plots revealed that all models had a slope close to 1 and an intercept close to 0. Finally, the hierarchical model had a significantly lower NRI and IDI than did the CMS reference model.

Figure 7

Results From the Prediction Modeling of the CHF Index Condition Showing the C Statistic of the Lumped CMS Model, the 4 Subgroup Models (SM-1, SM-2, SM-3, and SM-4), and the Hierarchical Model.

Total Hip and Total Knee Arthroplasty

The application of the inclusion and exclusion selection criteria (described in Appendix A) resulted in a cohort of 356 772 patients (16 520 cases and 321 441 controls). As shown in Figure 8, the lumped CMS model had a C statistic of 0.638; this was not significantly different from the hierarchical model, which also had a C statistic of 0.638. The 7 subgroup models had C statistics that ranged from 0.578 to 0.636. Appendix D provides additional details related to the distribution of risk among the subgroup models. Furthermore, as shown in Appendix D, the calibration plots revealed that the lumped CMS and hierarchical models had a slope close to 1 and an intercept close to 0. However, the calibration plots also revealed that 4 subgroup models had poor calibration in terms of slope (0.868 to 1.176) and intercept (−0.427 to 0.396). Finally, the hierarchical model had a significantly higher NRI and IDI than did the CMS reference model.

Figure 8

Results From the Prediction Modeling of the THA/TKA Index Condition Showing the C Statistic of the Lumped CMS Model, the 7 Subgroup Models (SM-1–SM-7), and the Hierarchical Model.

Discussion

Overall Results

The results revealed that (1) the visual analytical modeling was successful in automatically identifying strong and significant clustering of patients and comorbidities that were clinically meaningful to the stakeholders and enabled them to infer potential targeted interventions; (2) the classification modeling had high accuracy in classifying patients into the patient subgroups identified by the visual analytical modeling; and (3) the predictive modeling revealed that in 2 of the 3 conditions (COPD and THA/TKA), the hierarchical models (which incorporated information about patient subgroups) had significant improvement in discrimination between readmitted and not readmitted patients as measured by NRI, but not as measured by the C statistic or IDI, suggesting that improvements occurred within risk categories. However, the absolute differences were small, suggesting that comorbidities on their own were insufficient for building such predictive models. Furthermore, because the hierarchical models in all conditions had close to perfect calibration, the improvement in NRI occurred without losing accuracy in calibration.

Strengths and Limitations of the Data

The strengths of the Medicare data we used to test our method include the following: (1) large-scale data, enabling subgroup identification with sufficient statistical power; (2) generalizability of the data, as they were collected from across the United States; (3) an older adult population, enabling research on an underrepresented segment of the US population; (4) predictor variables that, according to clinicians, included comorbidities critical in readmission, as well as demographics enabling analysis of differences in race and sex; and (5) data that have been used to build readmission predictive models for COPD, CHF, and THA/TKA, enabling a head-to-head comparison with the hierarchical modeling approach that we developed.

However, the data had several limitations. Our baseline models were built using Medicare administrative data, and we then used the same data set to make head-to-head comparisons with other models. However, such administrative data have known limitations, such as the lack of comorbidity severity and test results, which could strongly impact the accuracy of predictive models. Furthermore, findings from the existing CMS models suggest that comorbidities are not strong predictors for readmission, and future modeling should explore other variables to predict readmission. Finally, because we were comparing our models with the CMS models, we had to use the same definition for controls (ie, 90 days with no readmission) that CMS used, which introduced a selection bias that exaggerated the separation between cases and controls. Similarly, we excluded patients who died, which introduced a bias of the results toward healthier patients.

Strengths and Limitations of the Visual Analytical Method

The strengths of our method include the following: (1) We used an automatic method for the identification of patient subgroups and their characteristics, enabling development of stratified regression models. This method contrasts with the current approach of using investigator-selected variables for identifying patient subgroups to develop stratified regression models. (2) Our method takes a bipartite visual analytical approach that enables stakeholders to comprehend the number, size, and interrelationships among patient subgroups; consider reasons for the processes that precipitate readmission underlying the patient subgroups; and design interventions targeted to each patient subgroup. (3) The method gives us the ability to identify which subgroups have higher or lower predictive accuracy, compared with predictive models that do not take patient subgroups into consideration.

However, our method has 2 limitations related to the analysis of large data sets: (1) Although our overall method scaled adequately to analyze data sets ranging up to 25 000 patients, testing the significance of biclustering and its replication took almost a week of computation, despite using a dedicated server with multiple cores. This high computational cost to test for significance resulted from the permutation test using 1000 random permutations, which is currently the best approach to measure the significance of bipartite modularity. (2) Although bicluster modularity was successful in finding significant and meaningful patient subgroups based on their comorbidities, the visualizations were extremely dense and therefore concealed patterns within and between the subgroups.
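The structure of such a permutation test, and the reason it is expensive, can be seen in a minimal sketch. It makes several simplifying assumptions that are not taken from the report: the partition is held fixed instead of re-running modularity maximization on every permuted network, the null model shuffles each comorbidity across patients while preserving its prevalence, the modularity follows the general bipartite form described by Barber, and only 200 permutations are used on a toy network. All names are hypothetical.

```python
import random
import statistics

def bipartite_modularity(edges, patient_cluster, comorbidity_cluster):
    """Bipartite modularity Q of a fixed partition (Barber-style form)."""
    m = len(edges)
    k, d = {}, {}  # patient degrees, comorbidity degrees
    for p, c in edges:
        k[p] = k.get(p, 0) + 1
        d[c] = d.get(c, 0) + 1
    within = sum(patient_cluster[p] == comorbidity_cluster[c] for p, c in edges)
    expected = sum(
        k[p] * d[c] / m
        for p in k for c in d
        if patient_cluster[p] == comorbidity_cluster[c]
    )
    return (within - expected) / m

def shuffled_edges(edges, patients, rng):
    """Null network: give each comorbidity to randomly chosen patients,
    keeping its prevalence unchanged (a simplifying assumption)."""
    by_comorbidity = {}
    for p, c in edges:
        by_comorbidity.setdefault(c, []).append(p)
    return [
        (p, c)
        for c, members in by_comorbidity.items()
        for p in rng.sample(patients, len(members))
    ]

# Toy network: patients 0-2 have comorbidities A and B; patients 3-5 have C and D
patients = list(range(6))
edges = ([(p, c) for p in range(3) for c in ("A", "B")]
         + [(p, c) for p in range(3, 6) for c in ("C", "D")])
patient_cluster = {p: 0 if p < 3 else 1 for p in patients}
comorbidity_cluster = {"A": 0, "B": 0, "C": 1, "D": 1}

rng = random.Random(0)  # fixed seed for reproducibility
q_obs = bipartite_modularity(edges, patient_cluster, comorbidity_cluster)
null_q = [
    bipartite_modularity(shuffled_edges(edges, patients, rng),
                         patient_cluster, comorbidity_cluster)
    for _ in range(200)
]
z = (q_obs - statistics.mean(null_q)) / statistics.stdev(null_q)
```

In a full analysis each permuted network would need its own modularity maximization, so the cost grows with both the number of permutations and the size of the network, which is consistent with the long run times described above.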

Future Research Recommendations

From a data perspective, we recommend that future research use EMRs containing a wider range and granularity of variables, such as test results and disease severity, to improve the accuracy of stratified regression models of hospital readmission. Future research should also explore alternate approaches for inclusion and exclusion criteria of patients based on the data limitations discussed earlier, updated ICD-10 codes, and the inclusion of data from Medicare Part D, which contain medications. From a method perspective, future research should externally validate the results using a different year from the Medicare database. We also recommend the development of mathematical methods to approximate the significance of bipartite modularity, which would enable the automatic measurement of significance in massive data sets. Furthermore, we recommend more powerful visual analytical algorithms to visualize and comprehend associations within and between large and dense subgroups.

Conclusions

This PCORI-funded research enabled us to arrive at the following 4 conclusions:

The visual analytical modeling was successful in identifying statistically and clinically significant patient subgroups in all 3 index conditions (COPD, CHF, and THA/TKA). Furthermore, we made no changes to the method or the code after it was developed and tested on the initial COPD index condition.
The classification modeling was successful in accurately predicting which patient subgroup a patient should belong to, based on their profile of comorbidities.
The predictive modeling was successful in revealing that in 2 of the 3 conditions (COPD and THA/TKA), the hierarchical models had significant improvement in discrimination between readmitted and not readmitted patients as measured by NRI, but not as measured by the C statistic or IDI, suggesting that improvements occurred within risk categories. Furthermore, this improvement occurred without losing accuracy in calibration. However, the absolute differences in discrimination were small, suggesting that comorbidities were not good predictors for building such readmission predictive models.
The analysis also revealed that dichotomized comorbidities by themselves do not have sufficient predictive power individually or in groups of related comorbidities to significantly improve the predictive accuracy of readmission. This and other lacking information, such as test results, are known limitations of CMS administrative data, but because the baseline CMS models were built using Medicare administrative data, we had to use the same data for our models to enable a head-to-head comparison. Future research using EMRs that contain a wider range and granularity of variables, such as laboratory test results and disease severity, could help overcome such limitations but could also bring with them other yet-to-be-discovered limitations.

Related Publications

Bhavnani SK, Andersen AM, Lin YL, Santillana E, Chen T, Kuo YF. The role of bipartite networks in stratified predictive modeling. Paper presented at: AMIA Annual Symposium; November 16-20, 2019; Washington, DC.
Bhavnani SK, Andersen C, Lin YL, Santillana E, Chen T, Kuo YF. The role of visual analytics in predictive modeling. Paper presented at: 2019 AMIA Summit, March 25-28, 2019; San Francisco, CA.
Bhavnani SK, Lin YL, Chennuri LR, Bores JM, Chen CH, Kuo YF. Identification, replication, visualization, and interpretation of patient subgroups: implications for precision medicine, and predictive modeling. Presented at: AMIA 2018 Informatics Summit; March 12-15, 2018; San Francisco, CA.
Bhavnani SK, Dang B, Vishweswaran S, et al. How high-risk comorbidities co-occur in readmitted hip fracture patients: implications for precision medicine and predictive modeling. JMIR Med Inform. 2020;8(10):e13567. doi:10.2196/13567 [PMC free article: PMC7652691] [PubMed: 33103657] [CrossRef]

Acknowledgments

We thank Tianlong Chen, Clark Andersen, Yu-Li Lin, and Emmanuel Santillana for their dedication in performing the analysis for this project.

Research reported in this report was funded through a Patient-Centered Outcomes Research Institute® (PCORI®) Award (ME-1511-33194). Further information is available at: https://www.pcori.org/research-results/2016/using-visual-analytic-methods-identify-patient-groups

Appendices

Institution receiving the Award: The University of Texas Medical Branch at Galveston

Original Project Title: Leveraging Visual Analytics for the Identification of Patient Subgroups: Application to Improving the Prediction of Hospital Readmission in the Elderly

PCORI ID: ME-1511-33194

Suggested citation:

Bhavnani SK, Kuo Y-F. (2021). Using Visual Analytic Methods to Identify Patient Groups. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/06.2021.ME.151133194

Disclaimer

The views, statements, and opinions presented in this report are solely the responsibility of the author(s) and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute® (PCORI®), its Board of Governors or Methodology Committee.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits noncommercial use and distribution provided the original author(s) and source are credited. (See https://creativecommons.org/licenses/by-nc-nd/4.0/)

Bookshelf ID: NBK603847
PMID: 38781405
DOI: 10.25302/06.2021.ME.151133194

Cite this Page
Bhavnani SK, Kuo YF. Using Visual Analytic Methods to Identify Patient Groups [Internet]. Washington (DC): Patient-Centered Outcomes Research Institute (PCORI); 2021 Jun. doi: 10.25302/06.2021.ME.151133194