A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa

PLoS Comput Biol. 2018 Apr 9;14(4):e1006053. doi: 10.1371/journal.pcbi.1006053. eCollection 2018 Apr.

Abstract

Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Because scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases than bulk RNA-seq experiments, the limited number of sampled cells, combined with experimental biases and other dataset-specific variations, presents a challenge to cross-dataset analysis and to the discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns, so that the scRNA-seq data are better integrated than in typical pooled analyses, which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code is available at https://github.com/kuanglab/scVDMC.
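
The idea of clustering each dataset separately while sharing one marker set can be illustrated with a toy sketch. This is not the scVDMC algorithm or the authors' MATLAB/Octave code: it is a simplified stand-in that alternates plain k-means within each dataset with selection of the genes whose cluster centers vary most, summed over datasets. It omits the across-dataset variance control described in the abstract, and all function names and parameters (`kmeans`, `shared_marker_scores`, `multitask_cluster`, `n_markers`, `n_rounds`) are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    # Start from the first cell, then repeatedly add the cell that is
    # farthest from all centers chosen so far.
    centers = [X[0].astype(float)]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()].astype(float))
    centers = np.stack(centers)
    for _ in range(n_iter):
        # Squared distance from every cell to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def shared_marker_scores(datasets, labels_list, k):
    """Per-gene variance of cluster centers, summed over datasets."""
    score = np.zeros(datasets[0].shape[1])
    for X, labels in zip(datasets, labels_list):
        centers = np.stack(
            [X[labels == c].mean(axis=0) for c in range(k) if np.any(labels == c)]
        )
        score += centers.var(axis=0)
    return score


def multitask_cluster(datasets, k, n_markers, n_rounds=5):
    """Alternate per-dataset clustering with shared marker selection.

    Each dataset (cells x genes) keeps its own cluster centers, so a
    dataset-wide shift does not distort the clusters, while the marker
    genes are chosen jointly from all datasets.
    """
    markers = np.arange(datasets[0].shape[1])  # begin with all genes
    for _ in range(n_rounds):
        labels_list = [kmeans(X[:, markers], k) for X in datasets]
        score = shared_marker_scores(datasets, labels_list, k)
        markers = np.sort(np.argsort(score)[-n_markers:])
    # Final clustering restricted to the selected shared markers.
    labels_list = [kmeans(X[:, markers], k) for X in datasets]
    return labels_list, markers
```

Calling `multitask_cluster([X1, X2], k=2, n_markers=3)` on two cells-by-genes matrices with the same gene order returns one label vector per dataset plus the indices of the shared marker genes. Pooling the matrices before clustering would instead force a single set of centers onto both datasets.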

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Case-Control Studies
  • Cluster Analysis
  • Collagen Type VII / genetics
  • Computational Biology
  • Computer Simulation
  • Embryonic Stem Cells / cytology
  • Embryonic Stem Cells / metabolism
  • Epidermolysis Bullosa Dystrophica / genetics*
  • Gene Expression Profiling / methods
  • Genetic Markers
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Leukocytes, Mononuclear / cytology
  • Leukocytes, Mononuclear / metabolism
  • Lung / cytology
  • Lung / metabolism
  • Machine Learning
  • Mice
  • Models, Genetic
  • RNA / genetics
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods

Substances

  • COL7A1 protein, human
  • Collagen Type VII
  • Genetic Markers
  • RNA