Control of dataset bias in combined Affymetrix cohorts of triple negative breast cancer

Genom Data. 2014 Oct 23:2:354-6. doi: 10.1016/j.gdata.2014.09.014. eCollection 2014 Dec.

Abstract

Heterogeneous subtypes of breast cancer need to be analyzed separately. Pooling of datasets can provide reasonable sample sizes, but dataset bias is an important concern. We assembled a combined dataset of 579 Affymetrix microarrays from triple negative breast cancer (TNBC) in Gene Expression Omnibus (GEO) series GSE31519. We developed a method for selecting comparable datasets and for controlling the amount of dataset bias of individual probesets.
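
The abstract does not spell out how the bias of an individual probeset is quantified. As an illustration only, and not the authors' method, one common way to score it is the fraction of a probeset's expression variance that is explained by dataset membership (eta squared from a one-way ANOVA). The sketch below assumes this measure; the function names, the probeset and dataset identifiers, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def dataset_bias_score(values, dataset_labels):
    """Return eta squared: the share of variance explained by dataset membership.

    ASSUMPTION: eta squared stands in for "dataset bias"; the source does not
    state which statistic was used.
    """
    if len(values) != len(dataset_labels):
        raise ValueError("values and dataset_labels must have the same length")
    if not values:
        raise ValueError("at least one sample is required")

    # Group the expression values of this probeset by originating dataset.
    groups = defaultdict(list)
    for value, label in zip(values, dataset_labels):
        groups[label].append(value)

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # constant probeset: no variance to attribute

    # Between-dataset sum of squares, weighted by dataset size.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


def flag_biased_probesets(expression, dataset_labels, threshold=0.5):
    """Return {probeset: score} for probesets whose score exceeds the threshold.

    `expression` maps a probeset ID to its per-sample values, in the same
    sample order as `dataset_labels`. The threshold is a hypothetical choice.
    """
    scores = {
        probeset: dataset_bias_score(values, dataset_labels)
        for probeset, values in expression.items()
    }
    return {p: s for p, s in scores.items() if s > threshold}


# Toy data with made-up identifiers: four samples from two datasets.
labels = ["dataset_1", "dataset_1", "dataset_2", "dataset_2"]
toy_expression = {
    "probeset_A": [1.0, 1.0, 5.0, 5.0],  # differs only between datasets
    "probeset_B": [1.0, 5.0, 1.0, 5.0],  # varies only within datasets
}
flagged = flag_biased_probesets(toy_expression, labels)
```

In this toy case only `probeset_A` is flagged, because all of its variance lies between the two datasets, while `probeset_B` has identical dataset means. A probeset flagged this way could then be excluded or handled with caution in a pooled analysis.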

Keywords: Breast cancer; Dataset bias; Gene expression; Microarray; Pooling.