Spectral clustering based on learning similarity matrix

Bioinformatics. 2018 Jun 15;34(12):2069-2076. doi: 10.1093/bioinformatics/bty050.

Abstract

Motivation: Single-cell RNA-sequencing (scRNA-seq) technology can generate genome-wide expression data at the single-cell level. One important objective in scRNA-seq analysis is to cluster cells by their gene expression patterns so that each cluster consists of cells of the same cell type.

Results: We introduce a novel spectral clustering framework that imposes sparse structures on a target matrix. Specifically, we use multiple doubly stochastic similarity matrices to learn a single similarity matrix, motivated by the observation that each similarity matrix can be a different informative representation of the data. We impose a sparse structure on the target matrix and then shrink the pairwise differences between its rows, because the target matrix should have both structures in the ideal case. We solve the resulting non-convex problem iteratively using the ADMM algorithm and show that the algorithm converges. We evaluate the proposed clustering method on various simulated and real scRNA-seq datasets and show that it identifies clusters accurately and robustly.
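The idea of combining several doubly stochastic similarity matrices before spectral clustering can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation: the Gaussian kernels and their bandwidths, the Sinkhorn normalization, the unweighted average (standing in for the learned combination), and all function names are assumptions made for illustration, and the sparsity penalties and ADMM solver described above are omitted entirely.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Dense Gaussian similarity between the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def sinkhorn(K, n_iter=5000, tol=1e-10):
    """Alternately rescale rows and columns until K is (nearly) doubly stochastic."""
    P = K.copy()
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)
        P /= P.sum(axis=0, keepdims=True)
        if np.abs(P.sum(axis=1) - 1.0).max() < tol:
            break
    return 0.5 * (P + P.T)  # remove residual asymmetry

def kmeans(Y, k, n_iter=50):
    """Tiny Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def cluster_cells(X, k, sigmas=(1.0, 2.0, 3.0)):
    """Average several doubly stochastic kernels, then cluster spectrally."""
    kernels = [sinkhorn(gaussian_kernel(X, s)) for s in sigmas]
    # Plain average: a stand-in for a learned combination of the kernels.
    S = np.mean(kernels, axis=0)
    # S is doubly stochastic, so the Laplacian is I - S and its smallest
    # eigenvectors are the largest eigenvectors of S.
    _, vecs = np.linalg.eigh(S)
    Y = vecs[:, -k:]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return kmeans(Y, k), S

# Toy data: two well-separated groups of 20 "cells" with 2 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(8.0, 0.5, (20, 2))])
labels, S = cluster_cells(X, k=2)
print(labels)
```

Because the set of doubly stochastic matrices is convex, the averaged matrix `S` is itself doubly stochastic, which is why the sketch can work with the eigenvectors of `S` directly.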

Availability and implementation: The algorithm is implemented in MATLAB. The source code can be downloaded at https://github.com/ishspsy/project/tree/master/MPSSC.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Gene Expression Profiling / methods*
  • Sequence Analysis, RNA / methods*
  • Single-Cell Analysis / methods*
  • Software