Integrating multidimensional omics data for cancer outcome

Biostatistics. 2016 Oct;17(4):605-18. doi: 10.1093/biostatistics/kxw010. Epub 2016 Mar 14.

Abstract

In multidimensional cancer omics studies, one subject is profiled on multiple layers of omics activities. In this article, the goal is to integrate multiple types of omics measurements, identify markers, and build a model for cancer outcome. The proposed analysis is achieved in two steps. In the first step, we analyze the regulation among different types of omics measurements through the construction of linear regulatory modules (LRMs). The LRMs have a sound biological basis, and their construction differs from existing analyses by modeling the regulation of sets of gene expressions (GEs) by sets of regulators. The construction is realized with the assistance of regularized singular value decomposition. In the second step, the proposed cancer outcome model includes the regulated GEs, "residuals" of GEs, and "residuals" of regulators, and we use regularized estimation to select relevant markers. Simulation shows that the proposed method outperforms the alternatives, with more accurate marker identification. We analyze The Cancer Genome Atlas data on cutaneous melanoma and lung adenocarcinoma and obtain meaningful results.
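The two-step structure described above can be illustrated with a minimal sketch on synthetic data. It is not the authors' implementation, and several pieces are assumptions made for illustration only: a plain truncated SVD of a least-squares coefficient matrix stands in for the regularized SVD, the "residuals" of regulators are taken as the part of the regulators orthogonal to the module loadings, the outcome is continuous rather than a survival time, and the selection step is a simple lasso solved by iterative soft-thresholding. All variable names (Z, X, U_r, T, D) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_reg, p_ge, rank = 200, 15, 10, 2

# Synthetic data (assumption): regulators Z drive gene expressions X
# through a low-rank coefficient matrix plus noise.
Z = rng.standard_normal((n, p_reg))
B_true = rng.standard_normal((p_reg, rank)) @ rng.standard_normal((rank, p_ge))
X = Z @ B_true + 0.5 * rng.standard_normal((n, p_ge))

# Step 1 (stand-in for regularized SVD): least-squares regulation
# coefficients, then a rank-truncated SVD to define the modules.
B_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
U, s, Vt = np.linalg.svd(B_hat, full_matrices=False)
U_r = U[:, :rank]                        # regulator-side loadings
B_lr = (U_r * s[:rank]) @ Vt[:rank]      # low-rank coefficient matrix

X_regulated = Z @ B_lr                   # regulated part of the GEs
X_resid = X - X_regulated                # "residuals" of GEs
Z_resid = Z - Z @ U_r @ U_r.T            # "residuals" of regulators (assumed form)
T = Z @ U_r                              # one score per module

# Step 2: outcome model on module scores and both sets of residuals.
D = np.hstack([T, X_resid, Z_resid])
y = 2.0 * T[:, 0] + 1.5 * X_resid[:, 3] + 0.3 * rng.standard_normal(n)

# Standardize columns and center the outcome before penalized fitting.
D_std = (D - D.mean(axis=0)) / D.std(axis=0)
y_c = y - y.mean()

def lasso_ista(A, b, lam, n_iter=1000):
    """Lasso via iterative soft-thresholding on (1/2n)||b - A w||^2 + lam*||w||_1."""
    m = A.shape[0]
    step = 1.0 / np.linalg.eigvalsh(A.T @ A / m).max()
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = w - step * (A.T @ (A @ w - b) / m)
        w = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return w

beta = lasso_ista(D_std, y_c, lam=0.1)
selected = np.flatnonzero(np.abs(beta) > 1e-8)
print("selected columns of D:", selected)
```

In this layout, columns 0 to rank-1 of D are module scores, the next p_ge columns are GE residuals, and the remaining p_reg columns are regulator residuals, so a selected index can be traced back to the layer it came from.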

Keywords: Integrated analysis; Multidimensional data; Regularized estimation and selection.

MeSH terms

  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / genetics*
  • Genomics / methods*
  • Humans
  • Neoplasms / genetics*
  • Outcome Assessment, Health Care / methods*