Studying the evolution of transcription factor binding events using multi-species ChIP-Seq data

Stat Appl Genet Mol Biol. 2013 Mar 26;12(1):1-15. doi: 10.1515/sagmb-2012-0004.

Abstract

Recent technological advances make it possible to collect whole-genome transcription factor binding (TFB) profiles from multiple species using ChIP-Seq. These data provide rich information for understanding TFB evolution, but few rigorous statistical models are available to infer TFB evolution from them. We have developed a phylogenetic-tree-based method to model the on/off rates of TFB events. Our method has two features that distinguish it from existing models. First, we mask nucleotide substitutions and focus on INDEL disruption of TFB events; INDELs are rarer evolutionary events and are more appropriate for divergent species and non-coding regulatory regions. Second, we correct for ascertainment bias in ChIP-Seq data by maximizing the likelihood conditional on the observed (incomplete) data. Simulations show that our method performs well in model selection and parameter estimation when there are sufficient aligned TFB events. When the method is applied to a ChIP-Seq data set from five vertebrates, we find that the instantaneous transition rates to INDELs are higher in TFB regions than in homologous non-binding regions. This difference is driven by an excess of alignment columns showing binding in one species but gaps in all other species. When we compare the inferred transition rates between conserved and non-conserved regions, the conserved regions are, as expected, estimated to have lower transition rates. The R package TFBphylo, which implements the described model, can be downloaded from http://bioinformatics.med.yale.edu/.
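The idea of maximizing a likelihood conditional on the observed data can be illustrated with a minimal Python sketch. This is not the TFBphylo implementation, and the abstract does not spell out the exact model. The sketch assumes a simplified two-state (off/on) continuous-time Markov chain rather than a model with an explicit INDEL state, a hypothetical three-species tree with made-up branch lengths, and an ascertainment scheme in which an alignment column enters the data only if at least one species shows binding. All function names, species labels, and rate values are illustrative.

```python
import math
from itertools import product


def transition_matrix(on_rate, off_rate, t):
    """Two-state CTMC transition probabilities over a branch of length t.

    Rows are the parent state and columns the child state (0 = off, 1 = on).
    """
    total = on_rate + off_rate
    pi_on = on_rate / total
    pi_off = off_rate / total
    decay = math.exp(-total * t)
    return [
        [pi_off + pi_on * decay, pi_on * (1.0 - decay)],
        [pi_off * (1.0 - decay), pi_on + pi_off * decay],
    ]


def partial_likelihoods(node, pattern, on_rate, off_rate):
    """Felsenstein pruning: P(data below node | node state) for states 0 and 1."""
    if isinstance(node, str):  # a leaf is identified by its species label
        return [1.0 if pattern[node] == state else 0.0 for state in (0, 1)]
    result = [1.0, 1.0]
    for child, branch_length in node:  # an internal node is a tuple of (child, length)
        p = transition_matrix(on_rate, off_rate, branch_length)
        below = partial_likelihoods(child, pattern, on_rate, off_rate)
        for state in (0, 1):
            result[state] *= sum(p[state][c] * below[c] for c in (0, 1))
    return result


def pattern_likelihood(tree, pattern, on_rate, off_rate):
    """Unconditional likelihood of one on/off pattern, stationary prior at the root."""
    total = on_rate + off_rate
    root_prior = [off_rate / total, on_rate / total]
    root = partial_likelihoods(tree, pattern, on_rate, off_rate)
    return root_prior[0] * root[0] + root_prior[1] * root[1]


def leaf_names(node):
    """Collect species labels in left-to-right order."""
    if isinstance(node, str):
        return [node]
    names = []
    for child, _ in node:
        names.extend(leaf_names(child))
    return names


def conditional_likelihood(tree, pattern, on_rate, off_rate):
    """Likelihood conditional on the pattern being observable.

    Assumption: the only unobservable pattern is 'off in every species'.
    """
    all_off = {name: 0 for name in leaf_names(tree)}
    p_unobserved = pattern_likelihood(tree, all_off, on_rate, off_rate)
    return pattern_likelihood(tree, pattern, on_rate, off_rate) / (1.0 - p_unobserved)


# Hypothetical tree ((sp_a:0.1, sp_b:0.3):0.2, sp_c:0.6); not the tree used in the paper.
TREE = (((("sp_a", 0.1), ("sp_b", 0.3)), 0.2), ("sp_c", 0.6))

if __name__ == "__main__":
    observed = {"sp_a": 1, "sp_b": 0, "sp_c": 0}  # binding seen in one species only
    print(conditional_likelihood(TREE, observed, on_rate=0.5, off_rate=1.5))
```

Under these assumptions, parameter estimation would sum the logarithm of `conditional_likelihood` over all observed alignment columns and maximize that sum over the two rates. Dividing by the probability of being observed is what keeps the missing all-off columns from biasing the estimated rates.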

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Artifacts
  • Binding Sites / genetics
  • CCAAT-Enhancer-Binding Proteins / metabolism*
  • Chromatin Immunoprecipitation
  • Computer Simulation
  • Conserved Sequence
  • Evolution, Molecular*
  • Humans
  • INDEL Mutation
  • Likelihood Functions
  • Models, Genetic
  • Phylogeny
  • Protein Binding
  • Transcription Factors / metabolism

Substances

  • CCAAT-Enhancer-Binding Proteins
  • CEBPA protein, human
  • Transcription Factors