Overlapping probabilities of top ranking gene lists, hypergeometric distribution, and stringency of gene selection criterion

Wen Fury; Franak Batliwalla; Peter K Gregersen; Wentian Li

doi:10.1109/IEMBS.2006.260828

Conf Proc IEEE Eng Med Biol Soc. 2006:2006:5531-4. doi: 10.1109/IEMBS.2006.260828.

Authors

Wen Fury¹, Franak Batliwalla, Peter K Gregersen, Wentian Li

Affiliation

¹ Regeneron Pharmaceutical Inc., Tarrytown, NY 10591, USA. wen.fury@regeneron.com

PMID: 17947148
DOI: 10.1109/IEMBS.2006.260828

Abstract

When the same set of genes appears in the top-ranking gene lists of two different studies, it is often of interest to estimate the probability that this overlap is a chance event. This overlapping probability is well known to follow the hypergeometric distribution. Usually, the lengths of the top-ranking gene lists are assumed to be fixed by a preset criterion, e.g., a threshold on the p-value of a t-test. We investigate how the overlapping probability changes with the gene selection criterion or, more simply, with the length of the top-ranking gene lists. We conclude that the overlapping probability is indeed a function of the gene list length, and that its statistical significance should be quoted in the context of the gene selection criterion.
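
The hypergeometric calculation referred to above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the gene-universe size of 20000, and the fixed overlap of 10 genes are all hypothetical values chosen for demonstration, and the two lists are assumed to be drawn independently and uniformly at random from the same gene universe under the null hypothesis.

```python
from math import comb


def overlap_pmf(total, len_a, len_b, k):
    """Probability of exactly k shared genes when lists of len_a and len_b
    genes are drawn at random from a universe of `total` genes
    (hypergeometric probability mass function)."""
    return comb(len_a, k) * comb(total - len_a, len_b - k) / comb(total, len_b)


def overlap_pvalue(total, len_a, len_b, observed):
    """Upper-tail probability of seeing at least `observed` shared genes."""
    upper = min(len_a, len_b)
    return sum(overlap_pmf(total, len_a, len_b, k)
               for k in range(observed, upper + 1))


# Hypothetical illustration: the same observed overlap (10 genes) evaluated
# for progressively longer top-ranking lists from an assumed 20000-gene universe.
TOTAL_GENES = 20000   # assumed universe size, not taken from the paper
OBSERVED = 10         # assumed overlap, not taken from the paper
for list_length in (50, 100, 200, 500, 1000):
    p = overlap_pvalue(TOTAL_GENES, list_length, list_length, OBSERVED)
    print(f"list length {list_length:5d}: P(overlap >= {OBSERVED}) = {p:.3e}")
```

Under these assumptions, the same overlap count becomes less surprising as the lists grow, which is the dependence on list length that the abstract describes. The standard library's exact integer arithmetic is used here to keep the sketch self-contained; `scipy.stats.hypergeom` offers the same distribution if SciPy is available.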

MeSH terms

Algorithms
Cluster Analysis
Data Interpretation, Statistical
Databases, Protein
Gene Expression Profiling*
Gene Expression Regulation*
Humans
Models, Genetic
Models, Statistical
Models, Theoretical
Oligonucleotide Array Sequence Analysis / instrumentation
Oligonucleotide Array Sequence Analysis / methods*
Pattern Recognition, Automated
Probability
Software