A parametric model to estimate the proportion from true null using a distribution for p-values

Comput Stat Data Anal. 2017 Oct:114:105-118. doi: 10.1016/j.csda.2017.04.008. Epub 2017 Apr 29.

Abstract

Microarray studies generate a large number of p-values from many gene expression comparisons. Estimating the proportion of these p-values that arise under the null hypothesis is of broad interest, and a two-component mixture model is often used for this purpose. If the data are generated under the null hypothesis, the p-values follow the uniform distribution on (0, 1). What is the distribution of p-values when the data are sampled from the alternative hypothesis? This distribution is derived for the chi-squared test and is then used to estimate the proportion of p-values sampled from the null hypothesis in a parametric framework. Simulation studies are conducted to evaluate the performance of the new method in comparison with five recent methods. The new method performs robustly even in scenarios with clusters of correlated p-values and with a multicomponent or continuous mixture in the alternative. The methods are demonstrated through an analysis of a real microarray dataset.
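The two-component mixture idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the simplest case: a chi-squared test with 1 degree of freedom, whose statistic is the square of a z-statistic distributed as N(mu, 1), and a single effect size mu shared by all non-null tests. Under these assumptions the p-value density under the alternative is exp(-mu^2/2) * cosh(mu * z), where z is the upper p/2 quantile of the standard normal, and the mixture density is pi0 + (1 - pi0) times that expression. The function names, the grid over mu, and the EM update for pi0 are all illustrative choices made here, not details taken from the paper.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and quantiles


def alt_density(p, mu):
    """Density of a 1-df chi-squared test p-value when the z-statistic is N(mu, 1).

    Assumption of this sketch: one shared effect size mu for all non-null tests.
    """
    z = -STD_NORMAL.inv_cdf(p / 2.0)  # |z| value that yields p-value p
    return math.exp(-0.5 * mu * mu) * math.cosh(mu * z)


def simulate_p_values(n, pi0, mu, rng):
    """Draw n p-values: a fraction pi0 from the null, the rest from the alternative."""
    ps = []
    for _ in range(n):
        mean = 0.0 if rng.random() < pi0 else mu
        z = rng.gauss(mean, 1.0)
        p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
        ps.append(min(max(p, 1e-300), 1.0))  # keep p strictly positive
    return ps


def fit_pi0(ps, mu_grid, n_iter=100):
    """Maximise the mixture likelihood: EM for pi0 at each mu on a grid."""
    best = None
    for mu in mu_grid:
        dens = [alt_density(p, mu) for p in ps]
        pi0 = 0.5
        for _ in range(n_iter):
            # E-step: posterior probability that each p-value is null
            resp = [pi0 / (pi0 + (1.0 - pi0) * d) for d in dens]
            # M-step: pi0 is the average posterior null probability
            pi0 = sum(resp) / len(resp)
        loglik = sum(math.log(pi0 + (1.0 - pi0) * d) for d in dens)
        if best is None or loglik > best[0]:
            best = (loglik, pi0, mu)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(2017)
    p_values = simulate_p_values(2000, pi0=0.8, mu=3.0, rng=rng)
    grid = [0.5 + 0.25 * k for k in range(19)]  # mu from 0.5 to 5.0
    pi0_hat, mu_hat = fit_pi0(p_values, grid)
    print(f"estimated pi0 = {pi0_hat:.3f}, estimated mu = {mu_hat:.2f}")
```

When mu is 0 the alternative density reduces to 1, the uniform density, which is why the null proportion becomes hard to identify as the effect size shrinks. The scenarios with correlated p-values and with multicomponent or continuous alternatives mentioned in the abstract are not covered by this sketch.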

Keywords: distribution of p-values; microarray studies; mixture model; proportion from the null hypothesis.