Augmented pseudo-likelihood estimation for two-phase studies

Stat Methods Med Res. 2020 Feb;29(2):344-358. doi: 10.1177/0962280219833415. Epub 2019 Mar 5.

Abstract

In many public health and medical research settings, information on key covariates may not be readily available or may be too expensive to gather for all individuals in the study. In such settings, the two-phase design provides a way forward by first stratifying an initial (large) phase I sample on the basis of covariates readily available (including, possibly, the outcome), and sub-sampling participants at phase II to collect the expensive measure(s). When the outcome of interest is binary, several methods have been proposed for estimation and inference for the parameters of a logistic regression model, including weighted likelihood, pseudo-likelihood and maximum likelihood. Although these methods yield consistent estimation and valid inference, they do so solely on the basis of the phase I stratification and the detailed covariate information obtained at phase II. Moreover, they ignore any additional information that is readily available at phase I but was not used as part of the stratified sampling design. Motivated by the potential for efficiency gains, especially concerning parameters corresponding to the additional phase I covariates, we propose a novel augmented pseudo-likelihood estimator for two-phase studies that makes use of all available information. In contrast to recently proposed weighted likelihood-based methods that calibrate to the influence function of the model of interest, the methods we propose do not require the development of additional models and, therefore, enjoy a degree of robustness. In addition, we expand the broader framework for pseudo-likelihood based estimation and inference to permit link functions for binary regression other than the logit link. Comprehensive simulations, based on a one-time cross-sectional survey of 82,887 patients undergoing anti-retroviral therapy in Malawi between 2005 and 2007, illustrate finite sample properties of the proposed methods and compare their performance with that of competing approaches. The proposed method yields the lowest standard errors when the model is correctly specified. Finally, the methods are applied to a large implementation science project examining the effect of an enhanced community health worker program to improve adherence to WHO guidelines for at least four antenatal visits, in Dar es Salaam, Tanzania.

Keywords: Calibration; pseudo-likelihood; two-phase design; weighted likelihood.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Analysis of Variance
  • Cross-Sectional Studies
  • Databases, Factual
  • Humans
  • Models, Statistical*
  • Research Design*
  • Tanzania