Removal of batch effects using distribution-matching residual networks

Bioinformatics. 2017 Aug 15;33(16):2539-2546. doi: 10.1093/bioinformatics/btx196.

Abstract

Motivation: Sources of variability in experimentally derived data include measurement error in addition to the physical phenomena of interest. This measurement error combines systematic components, which originate from the measuring instrument, with random measurement errors. Several novel biological technologies, such as mass cytometry and single-cell RNA-seq (scRNA-seq), are plagued by systematic errors that may severely affect statistical analysis if the data are not properly calibrated.

Results: We propose a novel deep learning approach for removing systematic batch effects. Our method is based on a residual neural network, trained to minimize the Maximum Mean Discrepancy between the multivariate distributions of two replicates, measured in different batches. We apply our method to mass cytometry and scRNA-seq datasets, and demonstrate that it effectively attenuates batch effects.
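The idea in the Results paragraph can be sketched in NumPy: a residual map x -> x + f(x) adjusts the source batch, and the squared Maximum Mean Discrepancy (MMD) measures how far the adjusted source still is from the target batch. This is a minimal illustration, not the authors' implementation (the linked repository contains that). The single hypothetical residual block, the sum-of-Gaussians kernel, and the bandwidths (1, 2, 4) are assumptions. Instead of training the weights by gradient descent, the sketch hand-sets the block's output bias to undo a known synthetic shift, only to show the MMD loss dropping.

```python
import numpy as np

def gaussian_kernel_sum(x, y, bandwidths=(1.0, 2.0, 4.0)):
    """Sum of Gaussian kernels exp(-||a - b||^2 / (2 s^2)) over assumed bandwidths s."""
    # Pairwise squared Euclidean distances between rows of x (n x d) and y (m x d).
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by floating-point error
    return sum(np.exp(-sq / (2.0 * s**2)) for s in bandwidths)

def mmd2(x, y, bandwidths=(1.0, 2.0, 4.0)):
    """Biased estimate of the squared MMD between samples x and y."""
    kxx = gaussian_kernel_sum(x, x, bandwidths).mean()
    kyy = gaussian_kernel_sum(y, y, bandwidths).mean()
    kxy = gaussian_kernel_sum(x, y, bandwidths).mean()
    return kxx + kyy - 2.0 * kxy

def residual_map(x, w1, b1, w2, b2):
    """Hypothetical residual block: x + (relu(x W1 + b1) W2 + b2)."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return x + h @ w2 + b2

rng = np.random.default_rng(0)
d, hidden = 3, 8
shift = np.array([1.0, 0.0, -0.5])  # synthetic batch effect (assumption)
target = rng.normal(size=(500, d))             # "reference" batch
source = rng.normal(size=(500, d)) + shift     # replicate measured in a shifted batch

w1 = rng.normal(scale=0.1, size=(d, hidden))
b1 = np.zeros(hidden)
w2 = np.zeros((hidden, d))                     # zero output weights: block is the identity
b2 = np.zeros(d)

raw = mmd2(source, target)
identity = mmd2(residual_map(source, w1, b1, w2, b2), target)
# Hand-set the output bias to cancel the known shift (stand-in for training).
corrected = mmd2(residual_map(source, w1, b1, w2, -shift), target)
print(f"raw={raw:.4f} identity={identity:.4f} corrected={corrected:.4f}")
```

With zero output weights the block returns its input unchanged, so the loss before any correction equals the raw MMD. One motivation for a residual parametrization is that the network then only has to learn the correction f(x) on top of the identity.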

Availability and implementation: Our code and data are publicly available at https://github.com/ushaham/BatchEffectRemoval.git.

Contact: yuval.kluger@yale.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Computational Biology / methods*
  • Cytophotometry / methods
  • Data Accuracy*
  • Humans
  • Machine Learning*
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods
  • Statistics as Topic*