Accommodating missingness in environmental measurements in gene-environment interaction analysis

Genet Epidemiol. 2017 Sep;41(6):523-554. doi: 10.1002/gepi.22055. Epub 2017 Jun 28.

Abstract

For the prognosis of complex diseases, gene-environment (G-E) interactions play an important role beyond the main effects of genetic (G) and environmental (E) factors. Many approaches have been developed for detecting important G-E interactions, but most of them assume that measurements are complete. In practical data analysis, missingness in E measurements is not uncommon, and failing to properly accommodate it leads to biased estimation and false marker identification. In this study, we conduct G-E interaction analysis of prognosis data under an accelerated failure time (AFT) model. To accommodate missingness in E measurements, we adopt a nonparametric kernel-based data augmentation approach. With a well-designed weighting scheme, the proposed approach also gains a useful byproduct: a certain robustness property. A penalization approach that respects the "main effects, interactions" hierarchy is adopted for selecting important interactions and main effects and for regularized estimation. The proposed approach has sound interpretations and a solid statistical basis, and it outperforms multiple alternatives in simulation. Analysis of The Cancer Genome Atlas (TCGA) data on lung cancer and melanoma leads to interesting findings and to models with superior prediction performance.
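The abstract does not give the kernel, the bandwidth, the covariates used to define similarity between subjects, or how augmented records enter the AFT estimation. The sketch below is therefore only a generic illustration of kernel-based data augmentation for one missing covariate, not the authors' estimator. Each subject with a missing E is replaced by one pseudo-record per complete case, carrying that complete case's E value and a normalized Gaussian kernel weight computed on fully observed covariates Z. The helper names `kernel_augment` and `weighted_ls`, the Gaussian kernel, the fixed bandwidth, and the simulated data are all assumptions. The demo ignores censoring and fits log survival time by weighted least squares without penalization. It does not reproduce the paper's robustness weighting or its hierarchical penalty.

```python
import numpy as np


def kernel_augment(E, Z, bandwidth):
    """Hypothetical helper: kernel-weighted augmentation of a covariate with missing values.

    E: (n,) array, np.nan marks a missing environmental measurement.
    Z: (n, p) array of fully observed covariates used to measure similarity (assumption).
    Returns (rows, values, weights):
      - each complete case i contributes one record (i, E[i], 1.0);
      - each incomplete case i contributes one record per complete case j,
        (i, E[j], K_h(Z_i - Z_j) / sum_k K_h(Z_i - Z_k)), with a Gaussian kernel K.
    """
    E = np.asarray(E, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(E), -1)
    observed = ~np.isnan(E)
    obs_idx = np.flatnonzero(observed)
    if obs_idx.size == 0 and not observed.all():
        raise ValueError("at least one complete case is needed to augment missing values")

    rows, values, weights = [], [], []
    for i in range(len(E)):
        if observed[i]:
            rows.append(i)
            values.append(E[i])
            weights.append(1.0)
            continue
        # Scaled Euclidean distance from subject i to every complete case.
        dist = np.linalg.norm(Z[obs_idx] - Z[i], axis=1) / bandwidth
        # Gaussian kernel on the log scale; subtracting the max avoids underflow,
        # and the constant cancels after normalization.
        log_k = -0.5 * dist ** 2
        w = np.exp(log_k - log_k.max())
        w /= w.sum()
        rows.extend([i] * obs_idx.size)
        values.extend(E[obs_idx])
        weights.extend(w)
    return np.array(rows), np.array(values), np.array(weights)


def weighted_ls(X, y, w):
    """Weighted least squares: minimize sum_r w_r * (y_r - X_r @ beta)^2."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    G = rng.binomial(2, 0.3, size=n).astype(float)  # hypothetical SNP coded 0/1/2
    Z = rng.normal(size=(n, 1))                      # fully observed covariate
    E = Z[:, 0] + 0.5 * rng.normal(size=n)           # environmental factor related to Z
    log_t = 1.0 + 0.5 * G - 0.4 * E + 0.3 * G * E + 0.2 * rng.normal(size=n)

    E_obs = E.copy()
    E_obs[rng.random(n) < 0.2] = np.nan              # about 20% of E set to missing

    rows, e_aug, w = kernel_augment(E_obs, Z, bandwidth=0.5)
    # Design: intercept, G main effect, E main effect, G x E interaction.
    X = np.column_stack([np.ones(rows.size), G[rows], e_aug, G[rows] * e_aug])
    print("estimated (intercept, G, E, GxE):", weighted_ls(X, log_t[rows], w))
```

Because every incomplete subject's weights sum to one, each subject contributes a total weight of one to the fit, so the augmented data do not inflate the effective sample size. In the paper's setting, the weighted least squares step would be replaced by a censoring-aware AFT objective with a hierarchy-respecting penalty.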

Keywords: G-E interaction; data augmentation; missing data; penalized estimation; prognosis.

MeSH terms

  • Adenocarcinoma / genetics
  • Adenocarcinoma of Lung
  • Computer Simulation
  • Databases, Genetic
  • Gene-Environment Interaction*
  • Humans
  • Lung Neoplasms / genetics
  • Melanoma / genetics
  • Models, Genetic*
  • Skin Neoplasms / genetics