Integrative Analysis of Cancer Diagnosis Studies with Composite Penalization

Jin Liu; Jian Huang; Shuangge Ma

doi:10.1111/j.1467-9469.2012.00816.x

Scand Stat Theory Appl. 2014 Mar 1;41(1):87-103. doi: 10.1111/j.1467-9469.2012.00816.x.

Authors

Jin Liu¹, Jian Huang², Shuangge Ma¹

Affiliations

¹ School of Public Health, Yale University.
² Departments of Statistics & Actuarial Science and Biostatistics, University of Iowa.

Abstract

In cancer diagnosis studies, high-throughput gene profiling has been extensively conducted, searching for genes whose expressions may serve as markers. Data generated from such studies have the "large d, small n" feature, with the number of genes profiled much larger than the sample size. Penalization has been extensively adopted for simultaneous estimation and marker selection. Because of small sample sizes, markers identified from the analysis of single datasets can be unsatisfactory. A cost-effective remedy is to conduct integrative analysis of multiple heterogeneous datasets. In this article, we investigate composite penalization methods for estimation and marker selection in integrative analysis. The proposed methods use the minimax concave penalty (MCP) as the outer penalty. Under the homogeneity model, the ridge penalty is adopted as the inner penalty. Under the heterogeneity model, the Lasso penalty and MCP are adopted as the inner penalty. Effective computational algorithms based on coordinate descent are developed. Numerical studies, including simulation and analysis of practical cancer datasets, show satisfactory performance of the proposed methods.
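
To make the composite penalty concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard MCP form, rho(t; lambda, a) = lambda*|t| - t^2/(2a) for |t| <= a*lambda and a*lambda^2/2 otherwise, and it assumes the outer MCP is applied to one gene's coefficients across all datasets after they are summarized by an inner ridge (L2) or Lasso (L1) norm. The names `mcp` and `composite_penalty`, the argument layout, and the absence of any per-group scaling of lambda are illustrative assumptions; the exact scaling and the MCP-inner variant used in the article are not reproduced here.

```python
import math


def mcp(t, lam, a):
    """Standard minimax concave penalty evaluated at a scalar t.

    lam > 0 is the tuning parameter and a > 1 controls concavity.
    """
    t = abs(t)
    if t <= a * lam:
        # Concave region: the Lasso term minus a quadratic correction.
        return lam * t - t * t / (2.0 * a)
    # Beyond a * lam the penalty is flat, so large effects are not shrunk further.
    return 0.5 * a * lam * lam


def composite_penalty(beta, lam, a, inner="ridge"):
    """Sum over genes of an outer MCP applied to an inner norm (assumed form).

    beta is a list of rows; row j holds gene j's coefficients across datasets.
    inner="ridge" uses the L2 norm of the row, inner="lasso" uses the L1 norm.
    """
    total = 0.0
    for row in beta:
        if inner == "ridge":
            size = math.sqrt(sum(b * b for b in row))
        elif inner == "lasso":
            size = sum(abs(b) for b in row)
        else:
            raise ValueError("inner must be 'ridge' or 'lasso'")
        total += mcp(size, lam, a)
    return total
```

With the ridge inner norm, a gene's coefficients are penalized as one block, which matches the idea that a gene is either a marker in all datasets or in none. An L1 inner norm is not differentiable at zero in each coordinate, which is what allows individual coefficients within a selected gene to be set to zero.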

Keywords: cancer diagnosis studies; composite penalization; gene expression; integrative analysis.
