Identification of human circadian genes based on time course gene expression profiles by using a deep learning method

Biochim Biophys Acta Mol Basis Dis. 2018 Jun;1864(6 Pt B):2274-2283. doi: 10.1016/j.bbadis.2017.12.004. Epub 2017 Dec 12.

Abstract

Circadian genes express periodically in an approximate 24-h period and the identification and study of these genes can provide deep understanding of the circadian control which plays significant roles in human health. Although many circadian gene identification algorithms have been developed, large numbers of false positives and low coverage are still major problems in this field. In this study we constructed a novel computational framework for circadian gene identification using deep neural networks (DNN) - a deep learning algorithm which can represent the raw form of data patterns without imposing assumptions on the expression distribution. Firstly, we transformed time-course gene expression data into categorical-state data to denote the changing trend of gene expression. Two distinct expression patterns emerged after clustering of the state data for circadian genes from our manually created learning dataset. DNN was then applied to discriminate the aperiodic genes and the two subtypes of periodic genes. In order to assess the performance of DNN, four commonly used machine learning methods including k-nearest neighbors, logistic regression, naïve Bayes, and support vector machines were used for comparison. The results show that the DNN model achieves the best balanced precision and recall. Next, we conducted large scale circadian gene detection using the trained DNN model for the remaining transcription profiles. Comparing with JTK_CYCLE and a study performed by Möller-Levet et al. (doi: https://doi.org/10.1073/pnas.1217154110), we identified 1132 novel periodic genes. Through the functional analysis of these novel circadian genes, we found that the GTPase superfamily exhibits distinct circadian expression patterns and may provide a molecular switch of circadian control of the functioning of the immune system in human blood. Our study provides novel insights into both the circadian gene identification field and the study of complex circadian-driven biological control. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.

Keywords: Circadian genes; Classification; Deep learning; Deep neural network; Functional analysis; GTPase.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Circadian Rhythm / physiology*
  • Databases, Genetic*
  • Gene Expression Profiling / methods*
  • Humans
  • Machine Learning*
  • Transcriptome / physiology*