Variable selection in strong hierarchical semiparametric models for longitudinal data

Stat Interface. 2015;8(3):355-365. doi: 10.4310/SII.2015.v8.n3.a9.

Abstract

In this paper, we consider the variable selection problem in semiparametric additive partially linear models for longitudinal data. Our goal is to identify the relevant main effects and the corresponding interactions associated with the response variable. We also enforce the strong hierarchy restriction on the model: an interaction can be included in the model only if both of its associated main effects are included. Based on a B-spline basis approximation of the nonparametric components, we propose an iterative estimation procedure that penalizes the likelihood with a partial group minimax concave penalty (MCP), and we use the Bayesian information criterion (BIC) to select the tuning parameter. To further improve estimation efficiency, we specify the working covariance matrix by maximum likelihood estimation. Simulation studies indicate that the proposed method tends to select the true model consistently and is efficient in estimation and prediction with finite samples, especially when the true model obeys the strong hierarchy. Finally, we fit the proposed model to China Stock Market data to illustrate its effectiveness.
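
As a rough illustration of two ingredients named in the abstract, the sketch below evaluates the standard scalar minimax concave penalty and checks whether a selected set of terms obeys the strong hierarchy. It is not the paper's partial group MCP or its iterative estimation procedure: the function names, the default gamma = 3.0, and the representation of interactions as index pairs are all assumptions made for this example.

```python
import numpy as np

def mcp_penalty(t, lam, gamma=3.0):
    """Standard scalar MCP, applied elementwise to |t|.

    p(t) = lam*|t| - t^2 / (2*gamma)   if |t| <= gamma*lam
         = gamma * lam^2 / 2           otherwise
    gamma > 1 controls concavity; 3.0 is an arbitrary illustrative default.
    """
    a = np.abs(np.asarray(t, dtype=float))
    inner = lam * a - a**2 / (2.0 * gamma)   # concave region near zero
    flat = 0.5 * gamma * lam**2              # constant beyond gamma*lam
    return np.where(a <= gamma * lam, inner, flat)

def obeys_strong_hierarchy(main_effects, interactions):
    """True if every selected interaction (j, k) has both j and k
    among the selected main effects (hypothetical index-pair encoding)."""
    selected = set(main_effects)
    return all(j in selected and k in selected for j, k in interactions)

# Usage: the penalty flattens out for large coefficients, so large effects
# are not shrunk further, unlike with an L1 penalty.
print(mcp_penalty([0.0, 1.0, 10.0], lam=1.0))
print(obeys_strong_hierarchy({1, 2}, [(1, 2)]))  # both parents selected
print(obeys_strong_hierarchy({1}, [(1, 2)]))     # main effect 2 is missing
```

In a group version the penalty would be applied to the norm of a block of coefficients (for example, the spline coefficients of one nonparametric component) rather than to a single coefficient; how the paper forms and partially penalizes those groups is not specified in the abstract.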

Keywords: Interaction; Longitudinal data; Semiparametric additive partially linear model; Strong hierarchy; Variable selection.