Bayesian semiparametric regression models for evaluating pathway effects on continuous and binary clinical outcomes

Stat Med. 2012 Jul 10;31(15):1633-51. doi: 10.1002/sim.4493. Epub 2012 Mar 22.

Abstract

Many statistical methods for microarray data analysis consider one gene at a time and may therefore miss subtle changes at the single-gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge. Limited work has been carried out in the regression setting to study the effects of clinical covariates and the expression levels of genes in a pathway on either a continuous or a binary clinical outcome. Hence, we propose a Bayesian approach for identifying pathways related to both types of outcomes. We compare our Bayesian approach with a likelihood-based approach that was developed by relating a least squares kernel machine for the nonparametric pathway effect to restricted maximum likelihood for variance components. Unlike the likelihood-based approach, the Bayesian approach allows us to directly estimate all parameters and pathway effects. It can also incorporate prior knowledge through the Bayesian hierarchical model formulation, and it makes inferences from the posterior samples without relying on asymptotic theory. We consider several kernels (Gaussian, polynomial, and neural network kernels) to characterize gene expression effects in a pathway on clinical outcomes. Our simulation results suggest that the Bayesian approach has more accurate coverage probability than the likelihood-based approach, especially when the sample size is small compared with the number of genes being studied in a pathway. We demonstrate the usefulness of our approaches through their application to a type II diabetes mellitus data set. Our approaches can also be applied to other settings where a large number of strongly correlated predictors are present.
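The three kernel families named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact parameterizations, so the forms below (a Gaussian kernel with scale `rho`, a polynomial kernel with `degree` and `offset`, and a tanh-based neural network kernel with `scale` and `offset`) are common textbook choices assumed here for illustration, not necessarily those used in the paper. The function names and the simulated expression matrix `Z` are likewise hypothetical.

```python
import numpy as np


def gaussian_kernel(Z, rho=1.0):
    """Gaussian kernel: K[i, j] = exp(-||z_i - z_j||^2 / rho) (assumed form)."""
    sq_norms = np.sum(Z ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows of Z.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Z @ Z.T)
    # Clip tiny negative values caused by floating-point rounding.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / rho)


def polynomial_kernel(Z, degree=2, offset=1.0):
    """Polynomial kernel: K[i, j] = (z_i . z_j + offset)^degree (assumed form)."""
    return (Z @ Z.T + offset) ** degree


def neural_network_kernel(Z, scale=1.0, offset=0.0):
    """Neural network kernel: K[i, j] = tanh(scale * z_i . z_j + offset) (assumed form)."""
    return np.tanh(scale * (Z @ Z.T) + offset)


# Simulated data: 5 subjects (rows) and 20 genes in one pathway (columns).
# This mimics the "small sample size, many genes" setting only in shape.
rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 20))

K_gauss = gaussian_kernel(Z, rho=20.0)
K_poly = polynomial_kernel(Z, degree=2, offset=1.0)
K_nn = neural_network_kernel(Z, scale=0.05)

print(K_gauss.shape, K_poly.shape, K_nn.shape)
```

Each function returns an n-by-n matrix of pairwise similarities between subjects, so its size depends on the sample size rather than on the number of genes in the pathway. In a kernel machine formulation, a matrix of this kind would describe the covariance structure of the nonparametric pathway effect; the choice of kernel parameters above is arbitrary.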

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem*
  • Case-Control Studies
  • Computer Simulation
  • Diabetes Mellitus, Type 2 / genetics*
  • Gene Expression Profiling / methods
  • Gene Expression Profiling / statistics & numerical data*
  • Humans
  • Male
  • Microarray Analysis / statistics & numerical data
  • Normal Distribution
  • Outcome Assessment, Health Care / methods
  • Outcome Assessment, Health Care / statistics & numerical data*
  • Regression Analysis