Identifying gene-environment and gene-gene interactions using a progressive penalization approach

Genet Epidemiol. 2014 May;38(4):353-68. doi: 10.1002/gepi.21807. Epub 2014 Apr 10.

Abstract

In genomic studies, identifying important gene-environment and gene-gene interactions is a challenging problem. In this study, we adopt a statistical modeling approach in which interactions are represented by product terms in regression models. To identify important interactions, we adopt penalization, which has been used in many genomic studies. Straightforward application of penalization does not respect the "main effect, interaction" hierarchical structure. A few recently proposed methods respect this structure by applying constrained penalization. However, they demand very complicated computational algorithms and can accommodate only a small number of genomic measurements. We propose a computationally fast penalization method that can identify important gene-environment and gene-gene interactions while respecting a strong hierarchical structure. The method takes a stagewise approach and progressively expands its optimization domain to account for possible hierarchical interactions. It is applicable to multiple data types and models. A coordinate descent method is used to produce the entire regularized solution path. A simulation study demonstrates the superior performance of the proposed method. We analyze a lung cancer prognosis study with gene expression measurements and identify important gene-environment interactions.
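The ideas named in the abstract (product-term interactions, lasso-type penalization fitted by coordinate descent, and a strong hierarchy in which an interaction may enter only if both of its main effects are selected) can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified two-stage stand-in written for illustration, and the function names (soft_threshold, lasso_cd, two_stage_fit, simulate), the squared-error loss, the single fixed penalty lam, and the simulated data are all assumptions not taken from the paper. Environmental and genetic covariates are treated alike as generic columns.

```python
import random
from itertools import combinations


def soft_threshold(z, lam):
    """Soft-thresholding operator: the closed-form lasso update for one coordinate."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(cols, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    cols is a list of feature columns (each a list of n floats).
    Returns the coefficients and the final residual vector.
    """
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)
    scale = [sum(v * v for v in c) / n for c in cols]
    for _ in range(n_iter):
        for j, c in enumerate(cols):
            if scale[j] == 0.0:
                continue
            # Inner product of column j with the partial residual (column j added back).
            rho = sum(cv * rv for cv, rv in zip(c, resid)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam) / scale[j]
            if new != beta[j]:
                d = new - beta[j]
                resid = [rv - cv * d for cv, rv in zip(c, resid)]
                beta[j] = new
    return beta, resid


def two_stage_fit(cols, y, lam):
    """Hypothetical two-stage illustration of strong hierarchy (an assumption,
    not the paper's procedure).

    Stage 1 selects main effects. Stage 2 expands the search to product terms
    whose two parent main effects were both selected, fitted to the stage-1
    residual so that the selected main effects stay in the model.
    """
    beta, resid = lasso_cd(cols, y, lam)
    mains = {j: b for j, b in enumerate(beta) if b != 0.0}
    pairs = list(combinations(sorted(mains), 2))
    prods = [[a * b for a, b in zip(cols[j], cols[k])] for j, k in pairs]
    gamma, _ = lasso_cd(prods, resid, lam)
    inter = {p: g for p, g in zip(pairs, gamma) if g != 0.0}
    return mains, inter


def simulate(n=200, p=5, seed=0):
    """Simulated data (illustrative only): columns 0 and 1 have main effects
    and interact; the remaining columns are pure noise."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [2.0 * cols[0][i] + 1.5 * cols[1][i]
         + 1.0 * cols[0][i] * cols[1][i] + 0.5 * rng.gauss(0, 1)
         for i in range(n)]
    return cols, y


if __name__ == "__main__":
    cols, y = simulate()
    mains, inter = two_stage_fit(cols, y, lam=0.3)
    print("main effects:", mains)
    print("interactions:", inter)
```

Because the second stage only ever considers products of already selected main effects and leaves those main effects in place, every reported interaction has both parents in the model by construction. A stagewise method of the kind the abstract describes would presumably update main effects and interactions jointly over a path of penalty values, which this sketch does not attempt.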

Keywords: gene-environment interactions; gene-gene interactions; progressive penalization; stagewise regression.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Gene Expression Regulation, Neoplastic
  • Gene-Environment Interaction*
  • Genes / genetics*
  • Genomics
  • Humans
  • Lung Neoplasms / diagnosis
  • Lung Neoplasms / genetics
  • Models, Genetic*
  • Models, Statistical
  • Prognosis