Correcting the loss of cell-cycle synchrony in clustering analysis of microarray data using weights

Bioinformatics. 2004 Jul 22;20(11):1766-71. doi: 10.1093/bioinformatics/bth169. Epub 2004 May 27.

Abstract

Motivation: Because cell-cycle data sets lose synchrony over time, standard clustering methods (e.g. k-means), which group open reading frames (ORFs) by similarity of expression levels, perform poorly unless the temporal pattern of the expression levels is taken into account.

Methods: We propose to improve the performance of the k-means method by assigning a time-decreasing weight at its variable level, and we evaluate the resulting 'weighted k-means' on a yeast cell-cycle data set. Protein complexes from a public website are used as biological benchmarks. To compare the k-means clusters with the structure of the protein complexes, we measure the agreement between the two groupings via the adjusted Rand index.
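As an illustration of the agreement measure named above, the following is a minimal sketch of the adjusted Rand index in its usual pair-counting form. It assumes every ORF carries exactly one complex label and one cluster label (the abstract does not say how ORFs belonging to several complexes are handled), and the function name `adjusted_rand_index` is a hypothetical choice for this example, not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two hard partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions agree trivially

    # Contingency counts: how many items share each (label_a, label_b) pair.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    # Number of item pairs placed together in both, in A only, in B only.
    together_both = sum(comb(n, 2) for n in cells.values())
    together_a = sum(comb(n, 2) for n in rows.values())
    together_b = sum(comb(n, 2) for n in cols.values())

    expected = together_a * together_b / total_pairs  # chance agreement
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single group in both partitions
    return (together_both - expected) / (maximum - expected)


# Toy labels (hypothetical, not from the paper): complex membership
# versus k-means cluster assignment for six ORFs.
complexes = ["c1", "c1", "c1", "c2", "c2", "c2"]
clusters = [0, 0, 1, 1, 1, 1]
print(adjusted_rand_index(complexes, clusters))
```

The index equals 1 when the two groupings are identical up to relabeling and is close to 0 when the agreement is what chance alone would produce, which is why it is suited to comparing clusterings that have different numbers of groups.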

Results: Our results show the time-decreasing weight function $\exp[-\frac{1}{2}(t^2/C^2)]$, which we assign to the variable level of k-means, generally increases the agreement between protein complexes and k-means clusters when C is near the length of two cell cycles.
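One way to read "assigning the weight to the variable level" is that each time point contributes to the squared Euclidean distance in proportion to its weight. The sketch below follows that reading and is an assumption rather than the authors' implementation: the function names, the random initialization, the iteration cap, and the toy data are all hypothetical. The time t and the constant C must be in the same units.

```python
import math
import random


def time_weights(times, C):
    """Weight exp[-(1/2)(t^2/C^2)] for each sampling time t."""
    return [math.exp(-0.5 * (t / C) ** 2) for t in times]


def weighted_sq_dist(x, y, weights):
    """Squared Euclidean distance with one weight per time point."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))


def weighted_kmeans(profiles, weights, k, n_iter=100, seed=0):
    """Lloyd-style k-means using the weighted distance (illustrative only)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [None] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = [
            min(range(k), key=lambda j: weighted_sq_dist(p, centroids[j], weights))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stable, so the centroids will not move either
        labels = new_labels
        # Update step: with positive per-variable weights, the weighted
        # distance is still minimized by the plain mean of each variable.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy expression profiles (hypothetical): the first two time points carry
# the group signal, the last one is dominated by out-of-sync variation.
times = [0.0, 1.0, 10.0]
profiles = [
    [1.0, 1.0, 5.0],
    [1.1, 0.9, -5.0],
    [-1.0, -1.0, 5.0],
    [-1.1, -0.9, -5.0],
]
labels, _ = weighted_kmeans(profiles, time_weights(times, C=2.0), k=2)
print(labels)
```

Because the weights act on variables only, the same clustering could be obtained by multiplying each time point by the square root of its weight and running an ordinary k-means routine. A small C discounts late time points heavily, while a very large C recovers unweighted k-means.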

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Biological Clocks / physiology
  • Cell Cycle / physiology*
  • Cluster Analysis*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Fungal / physiology
  • Models, Genetic
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated
  • Reproducibility of Results
  • Saccharomyces cerevisiae / cytology*
  • Saccharomyces cerevisiae / physiology*
  • Saccharomyces cerevisiae Proteins / genetics
  • Saccharomyces cerevisiae Proteins / metabolism
  • Sensitivity and Specificity
  • Time Factors

Substances

  • Saccharomyces cerevisiae Proteins