Integrative analysis of multiple cancer prognosis studies with gene expression measurements

Stat Med. 2011 Dec 10;30(28):3361-71. doi: 10.1002/sim.4337. Epub 2011 Aug 25.

Abstract

Although microarray gene profiling studies in cancer research have been successful in identifying genetic variants predisposing to the development and progression of cancer, the markers identified from the analysis of single datasets often suffer from low reproducibility. Among multiple possible causes, the most important is the small sample size, and hence the lack of power, of single studies. Integrative analysis jointly considers multiple heterogeneous studies, has a significantly larger sample size, and can improve reproducibility. In this article, we focus on cancer prognosis studies, where the response variables are progression-free, overall, or other types of survival. We propose a group minimax concave penalty (GMCP) penalized integrative analysis approach for analyzing multiple heterogeneous cancer prognosis studies with microarray gene expression measurements, and we develop an efficient group coordinate descent algorithm to compute it. The GMCP can automatically accommodate the heterogeneity across multiple datasets, and the identified markers have consistent effects across multiple studies. Simulation studies show that the GMCP provides significantly improved selection results as compared with the existing meta-analysis approaches, intensity approaches, and group Lasso penalized integrative analysis. We apply the GMCP to four microarray studies and identify genes associated with the prognosis of breast cancer.
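
For illustration only: the abstract does not state the GMCP formula or the update used inside the group coordinate descent algorithm. The sketch below shows the standard minimax concave penalty and the closed-form group thresholding step that is commonly paired with it for least-squares-type losses. It assumes gamma > 1 and that each gene's expression values are standardized within each study. The function names and the parameters lam and gamma are hypothetical and are not taken from the article.

```python
import math


def mcp_penalty(t, lam, gamma):
    """Standard MCP value at t (assumed form, not quoted from the article).

    Equals lam*|t| - t^2 / (2*gamma) while |t| <= gamma*lam,
    and stays flat at gamma*lam^2 / 2 beyond that point.
    """
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def group_mcp_threshold(z, lam, gamma):
    """Minimize 0.5*||z - b||^2 + mcp_penalty(||b||, lam, gamma) over b.

    z is the unpenalized solution for one group; here a group is assumed
    to hold one gene's coefficients, one entry per study. Requires gamma > 1.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        # Weak signal: the whole group is set to zero.
        return [0.0] * len(z)
    if norm <= gamma * lam:
        # Moderate signal: group soft-thresholding, rescaled by gamma/(gamma-1).
        scale = (gamma / (gamma - 1.0)) * (1.0 - lam / norm)
        return [scale * v for v in z]
    # Strong signal: left unshrunk, which is what distinguishes MCP from Lasso.
    return list(z)
```

In this sketch a group holds one gene's coefficients across the studies, so a gene is either dropped from all studies or retained in all of them, while the individual entries are free to differ from study to study. A full group coordinate descent pass would cycle through the genes, recompute z from the current residuals, and apply this step until the coefficients stabilize.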

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Breast Neoplasms / diagnosis
  • Breast Neoplasms / genetics
  • Computer Simulation
  • Female
  • Gene Expression Profiling*
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Kaplan-Meier Estimate
  • Least-Squares Analysis
  • Meta-Analysis as Topic
  • Models, Statistical*
  • Neoplasms / diagnosis*
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis
  • Prognosis
  • Sample Size
  • Survival Analysis