A Dirichlet-tree multinomial regression model for associating dietary nutrients with gut microorganisms

Biometrics. 2017 Sep;73(3):792-801. doi: 10.1111/biom.12654. Epub 2017 Jan 23.

Abstract

Understanding the factors that alter the composition of the human microbiota may help in developing personalized healthcare strategies and identifying therapeutic drug targets. In many sequencing studies, microbial communities are characterized by a list of taxa, their counts, and their evolutionary relationships represented by a phylogenetic tree. In this article, we consider an extension of the Dirichlet multinomial distribution, called the Dirichlet-tree multinomial distribution, for multivariate, over-dispersed, and tree-structured count data. To model the relationships between these counts and a set of covariates, we propose the Dirichlet-tree multinomial regression model, for which we develop a penalized likelihood method for estimating parameters and selecting covariates. For efficient optimization, we adopt the accelerated proximal gradient approach. Simulation studies are presented to demonstrate the good performance of the proposed procedure. An analysis of a data set relating dietary nutrients to bacterial counts shows that incorporating the tree structure into the model helps increase predictive power.
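
As a rough illustration of the distribution named above, the following minimal Python sketch evaluates a Dirichlet-tree multinomial log-likelihood for one sample as a sum of Dirichlet-multinomial terms, one per internal node of the tree. Several details are assumptions made for illustration and are not stated in the abstract: the log-linear link alpha = exp(x'beta) for each parent-child edge, the dictionary encoding of the tree, and the function names dm_logpmf, subtree_count, and dtm_loglik. The penalty term and the accelerated proximal gradient optimizer are not shown.

```python
import math


def dm_logpmf(counts, alphas):
    """Log-pmf of a Dirichlet-multinomial with counts y and parameters alpha."""
    n = sum(counts)
    a = sum(alphas)
    # Multinomial coefficient: log n! - sum log y_c!
    out = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    # Ratio of gamma functions contributed by the Dirichlet mixing distribution
    out += math.lgamma(a) - math.lgamma(n + a)
    out += sum(math.lgamma(y + al) - math.lgamma(al)
               for y, al in zip(counts, alphas))
    return out


def subtree_count(tree, node, leaf_counts):
    """Total count of all leaves below `node` (or the leaf's own count)."""
    children = tree.get(node)
    if not children:
        return leaf_counts[node]
    return sum(subtree_count(tree, c, leaf_counts) for c in children)


def dtm_loglik(tree, leaf_counts, x, beta):
    """Dirichlet-tree multinomial log-likelihood for a single sample.

    tree        : dict mapping each internal node to its list of children
    leaf_counts : dict mapping each leaf (taxon) to its observed count
    x           : list of covariate values for the sample
    beta        : dict mapping (parent, child) to a coefficient list
                  (assumed log-linear link: alpha = exp(x . beta))
    """
    total = 0.0
    for node, children in tree.items():
        counts = [subtree_count(tree, c, leaf_counts) for c in children]
        alphas = [
            math.exp(sum(xj * bj for xj, bj in zip(x, beta[(node, c)])))
            for c in children
        ]
        total += dm_logpmf(counts, alphas)
    return total


# Hypothetical three-taxon tree: root -> (A, t3), A -> (t1, t2)
tree = {"root": ["A", "t3"], "A": ["t1", "t2"]}
leaf_counts = {"t1": 1, "t2": 2, "t3": 3}
x = [1.0]  # a single covariate, e.g. an intercept
beta = {("root", "A"): [0.0], ("root", "t3"): [0.0],
        ("A", "t1"): [0.0], ("A", "t2"): [0.0]}
loglik = dtm_loglik(tree, leaf_counts, x, beta)
```

In this toy setting all coefficients are zero, so every alpha equals 1 and each node's term reduces to a uniform distribution over the possible splits of its count among its children. A penalized fit would maximize the sum of such log-likelihoods over samples minus a sparsity penalty on the coefficients.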

Keywords: Dirichlet distributions; Over-dispersion; Sparse group lasso; Tree-structured learning.

MeSH terms

  • Diet
  • Gastrointestinal Microbiome*
  • Humans
  • Likelihood Functions
  • Phylogeny
  • Regression Analysis
  • Statistical Distributions