Detecting differentially expressed genes by relative entropy

J Theor Biol. 2005 Jun 7;234(3):395-402. doi: 10.1016/j.jtbi.2004.11.039. Epub 2005 Jan 24.

Abstract

DNA microarray experiments have generated large amounts of gene expression measurements across different conditions. One crucial step in the analysis of these data is to detect differentially expressed genes. Some parametric methods, including the two-sample t-test (T-test) and its variants, have been used. Alternatively, a class of non-parametric algorithms has been proposed, such as the Wilcoxon rank sum test (WRST), the significance analysis of microarrays (SAM) of Tusher et al. (2001) and the empirical Bayesian (EB) method of Efron et al. (2001). Most of the popular methods currently available are based on the t-statistic. Because of the nature of the statistic they use to describe the difference between groups of data, these methods can be inefficient in some situations, especially when the data follow multi-modal distributions. For example, some genes may display different expression patterns within the same cell type, say tumor or normal, forming subtypes. Most available methods are likely to miss these genes. We developed a new non-parametric method for selecting differentially expressed genes by relative entropy, called SDEGRE, which combines relative entropy and kernel density estimation to detect all types of differences between two groups of samples. Whether a gene is significantly differentially expressed can be assessed by resampling-based permutations. We illustrate our method on two data sets, from Golub et al. (1999) and Alon et al. (1999). Comparing the results with those of the T-test, the WRST and the SAM, we identified novel differentially expressed genes whose biological significance is supported by previous biological studies but which were not detected by the other three methods. The results also show that the genes selected by SDEGRE are better able to distinguish the two cell types.
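The core idea (estimate each group's expression density by kernel density estimation, compare the two densities with relative entropy D(p‖q) = ∫ p(x) log(p(x)/q(x)) dx, and judge significance by permuting group labels) can be illustrated with a small sketch for a single gene. This is not the authors' SDEGRE implementation: the abstract does not specify the kernel, the bandwidth rule, or which form of relative entropy is used, so the Gaussian kernel, Silverman's rule-of-thumb bandwidth, the grid discretisation, the symmetrised score D(p‖q) + D(q‖p) and the add-one permutation p-value below are all assumptions, and every function name is hypothetical.

```python
import math
import random
import statistics


def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth (assumed; the paper's rule may differ)."""
    sd = statistics.stdev(x)
    # Guard against a zero-variance sample, which would give a zero bandwidth.
    return 1.06 * sd * len(x) ** (-1 / 5) if sd > 0 else 1e-3


def kde_on_grid(x, grid, h):
    """Gaussian KDE of sample x on the grid, normalised to sum to 1 over the grid.

    The kernel's constant factor cancels in the normalisation, so it is omitted.
    """
    dens = [sum(math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x) for g in grid]
    total = sum(dens)
    return [d / total for d in dens]


def relative_entropy(p, q, eps=1e-12):
    """Discrete D(p || q) = sum_i p_i log(p_i / q_i); eps avoids log(0) and 1/0."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def entropy_score(a, b, n_grid=256):
    """Symmetrised relative entropy between the KDEs of two expression samples."""
    h_a, h_b = silverman_bandwidth(a), silverman_bandwidth(b)
    pad = 3 * max(h_a, h_b)
    lo = min(min(a), min(b)) - pad
    hi = max(max(a), max(b)) + pad
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    # Floor each bandwidth at one grid step so every KDE is resolved by the grid.
    p = kde_on_grid(a, grid, max(h_a, step))
    q = kde_on_grid(b, grid, max(h_b, step))
    return relative_entropy(p, q) + relative_entropy(q, p)


def permutation_pvalue(a, b, n_perm=200, seed=0):
    """Permutation p-value: share of label shuffles scoring at least the observed."""
    rng = random.Random(seed)
    observed = entropy_score(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if entropy_score(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical gene: bimodal in group A (two subtypes), unimodal in group B.
    # Both groups have mean 0, so a difference-in-means statistic sees nothing.
    group_a = [-2.1, -1.9, -2.0, -2.2, -1.8, 1.8, 2.0, 2.1, 1.9, 2.2]
    group_b = [-0.2, 0.1, 0.0, 0.15, -0.1, 0.05, -0.05, 0.2, -0.15, 0.0]
    print("score:", entropy_score(group_a, group_b))
    print("p-value:", permutation_pvalue(group_a, group_b))
```

In the made-up example both groups have mean exactly 0, so a two-sample t-statistic is essentially zero for this gene, whereas the density-based score is large and the permutation p-value is small. This mirrors the multi-modal "subtype" situation the abstract describes. In a genome-wide analysis the score and permutation test would be run per gene, with some multiple-testing control that the sketch does not attempt.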

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acute Disease
  • Animals
  • Entropy*
  • Gene Expression
  • Gene Expression Profiling*
  • Leukemia, Myeloid / genetics
  • Models, Genetic*
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis*
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics
  • Reproducibility of Results