CCor: A whole genome network-based similarity measure between two genes

Biometrics. 2016 Dec;72(4):1216-1225. doi: 10.1111/biom.12508. Epub 2016 Mar 8.

Abstract

Measuring the similarity between genes is often the starting point for building gene regulatory networks. Most similarity measures used in practice consider only pairwise information, with a few also considering network structure. Although the theoretical properties of pairwise measures are well understood in the statistics literature, little is known about the statistical properties of similarity measures based on network structure. In this article, we consider a new whole genome network-based similarity measure, called CCor, that makes use of information from all the genes in the network. We derive a concentration inequality for CCor and compare it with the commonly used Pearson correlation coefficient for inferring network modules. Both theoretical analysis and a real data example demonstrate the advantages of CCor over existing measures for inferring gene modules.
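The abstract does not state the formula for CCor. As a rough illustration of how a similarity measure can draw on every gene rather than on the pair alone, the sketch below compares each gene's Pearson correlation profile against all remaining genes. This construction is an assumption made for illustration and is not necessarily the authors' definition; the function name `profile_similarity` and the samples-by-genes layout of `X` are likewise hypothetical.

```python
import numpy as np

def profile_similarity(X):
    """Illustrative network-style similarity (an assumption, not the paper's formula).

    X is a (samples x genes) expression matrix with at least four genes.
    For genes i and j, the score is the Pearson correlation between their
    correlation profiles, i.e. their correlations with every other gene k.
    """
    R = np.corrcoef(X, rowvar=False)  # ordinary pairwise Pearson matrix
    p = R.shape[0]
    S = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            keep = np.ones(p, dtype=bool)
            keep[[i, j]] = False  # drop the pair itself, keep all other genes
            S[i, j] = S[j, i] = np.corrcoef(R[i, keep], R[j, keep])[0, 1]
    return S

# Toy usage on simulated data: 50 samples, 6 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
S = profile_similarity(X)
```

The pairwise matrix `R` uses only two genes per entry, whereas each entry of `S` depends on all columns of `X`, which is the contrast the abstract draws between pairwise and network-based measures.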

Keywords: Clustering; Co-expression; Concentration inequality; Gaussian graphical model; Gene module; Pearson correlation; Similarity measure; Topological overlap measure.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Gene Expression Profiling*
  • Gene Regulatory Networks*
  • Genome*
  • Models, Statistical