Network-based regularization for high dimensional SNP data in the case-control study of Type 2 diabetes

BMC Genet. 2017 May 16;18(1):44. doi: 10.1186/s12863-017-0495-5.

Abstract

Background: Over the past decades, the prevalence of type 2 diabetes mellitus (T2D) has been steadily increasing around the world. Despite large efforts devoted to better understanding the genetic basis of the disease, the identified susceptibility loci account for only a small portion of T2D heritability. Some existing approaches for high-dimensional genetic data from T2D case-control studies are limited because they analyze only a few SNPs at a time from a large pool, ignore the correlations among SNPs, and adopt inefficient selection techniques.

Methods: We propose a network-constrained regularization method that selects important SNPs while taking linkage disequilibrium into account. To accommodate the case-control design, we developed an iteratively reweighted least squares algorithm within the coordinate descent framework, in which the regularized logistic loss function is optimized with respect to one parameter at a time, cycling through all the parameters until convergence.
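
The algorithm described in the Methods can be illustrated with a minimal sketch. The abstract does not state the exact penalty, so the form used here is an assumption: a lasso term plus a quadratic term built from an unnormalized graph Laplacian L = D - A, where A is a hypothetical SNP adjacency matrix (for example, derived from pairwise linkage disequilibrium). The function names, the tuning parameters lam1 and lam2, and the convergence tolerances are likewise illustrative and not taken from the paper.

```python
import numpy as np

def laplacian_from_adjacency(A):
    """Unnormalized graph Laplacian L = D - A (assumed network encoding)."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def soft_threshold(a, lam):
    """Scalar soft-thresholding operator used for the lasso term."""
    return np.sign(a) * max(abs(a) - lam, 0.0)

def network_logistic_cd(X, y, L, lam1, lam2, n_outer=50, n_inner=100, tol=1e-6):
    """Sketch: minimize mean logistic loss + lam1*||b||_1 + (lam2/2)*b'Lb.

    Outer loop: IRLS quadratic approximation of the logistic loss.
    Inner loop: coordinate descent, one parameter at a time.
    Assumes no column of X is identically zero.
    """
    n, p = X.shape
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_outer):
        b_outer, b0_outer = b.copy(), b0
        # IRLS step: weights and working response at the current estimate
        eta = np.clip(b0 + X @ b, -30.0, 30.0)
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)
        z = eta + (y - prob) / w
        for _ in range(n_inner):
            b_prev = b.copy()
            # Unpenalized intercept: weighted mean of the partial residual
            b0 = np.sum(w * (z - X @ b)) / np.sum(w)
            resid = z - b0 - X @ b
            for j in range(p):
                xj = X[:, j]
                r_j = resid + xj * b[j]  # residual with coordinate j removed
                # Network term uses only the neighbors of j (off-diagonal of L)
                num = np.dot(w * xj, r_j) / n - lam2 * (L[j] @ b - L[j, j] * b[j])
                den = np.dot(w, xj ** 2) / n + lam2 * L[j, j]
                new_bj = soft_threshold(num, lam1) / den
                resid -= xj * (new_bj - b[j])
                b[j] = new_bj
            if np.max(np.abs(b - b_prev)) < tol:
                break
        if max(np.max(np.abs(b - b_outer)), abs(b0 - b0_outer)) < tol:
            break
    return b0, b
```

In this sketch, lam1 controls sparsity and lam2 controls how strongly the coefficients of connected SNPs are pulled toward each other. Setting lam2 to zero reduces it to ordinary lasso-penalized logistic regression.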

Results: In this article, a novel approach is developed to identify important SNPs more effectively by incorporating the interconnections among them in the regularized selection. A coordinate descent based iteratively reweighted least squares (IRLS) algorithm has been proposed.

Conclusions: Both the simulation study and the analysis of data from the Nurses' Health Study, a case-control study of type 2 diabetes with high-dimensional SNP measurements, demonstrate the advantage of the network-based approach over the competing alternatives.

Keywords: Case–control association study; Network-based regularization; Regularized logistic regression; Type 2 diabetes; Variable selection.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Case-Control Studies
  • Computer Simulation
  • Diabetes Mellitus, Type 2 / genetics*
  • Gene Regulatory Networks*
  • Humans
  • Linkage Disequilibrium
  • Polymorphism, Single Nucleotide*