Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach

J Am Stat Assoc. 2018;113(522):845-854. doi: 10.1080/01621459.2017.1292915. Epub 2018 Jun 6.

Abstract

Ordinal outcomes are common in scientific research and everyday practice, and we often rely on regression models to make inference. A long-standing problem with such regression analyses is the lack of effective diagnostic tools for validating model assumptions. The difficulty arises from the fact that an ordinal variable has discrete values that are labeled with, but not, numerical values. The values merely represent ordered categories. In this paper, we propose a surrogate approach to defining residuals for an ordinal outcome Y. The idea is to define a continuous variable S as a "surrogate" of Y and then obtain residuals based on S. For the general class of cumulative link regression models, we study the residual's theoretical and graphical properties. We show that the residual has null properties similar to those of the common residuals for continuous outcomes. Our numerical studies demonstrate that the residual has power to detect misspecification with respect to 1) mean structures; 2) link functions; 3) heteroscedasticity; 4) proportionality; and 5) mixed populations. The proposed residual also enables us to develop numeric measures for goodness-of-fit using classical distance notions. Our results suggest that compared to a previously defined residual, our residual can reveal deeper insights into model diagnostics. We stress that this work focuses on residual analysis, rather than hypothesis testing. The latter has limited utility as it only provides a single p-value, whereas our residual can reveal what components of the model are misspecified and advise how to make improvements.

Keywords: goodness-of-fit; logistic odds model; model diagnostics; probit model.