A stochastic model for the evolution of transcription factor binding site abundance

J Theor Biol. 2007 Aug 7;247(3):544-53. doi: 10.1016/j.jtbi.2007.03.001. Epub 2007 Mar 7.

Abstract

Both experimental and sequence-evolution evidence suggest that transcription factor binding sites can diverge and turn over even when the transcriptional output remains conserved. Furthermore, lineage-specific differences in binding site retention rates are likely, which makes it desirable to estimate the rates of binding site acquisition and decay from comparative sequence data. In this paper we propose a stochastic, phenomenological model of binding site turnover. For a given genomic region we assume a constant rate of binding site origination, λ, and a constant per-site decay rate, μ. We derived an explicit expression for the conditional probability distribution of the number of binding sites n at time t, given n(0) binding sites at t = 0. We compared the analytical result to a simulation model and found that it closely predicts the simulated sequence evolution. We then analyzed a small data set of the number of estrogen response elements (EREs) in mammalian HoxA sequences and showed that the data are broadly consistent with the assumption of a stationary turnover process. A regression of the number of shared EREs on the time since divergence led to an estimate of the half-life of an ERE in the primate HoxA clusters of about 27 Myr, which corresponds to a per-site decay rate of μ ≈ 1.3 × 10⁻⁸ per year and a rate of origination of λ ≈ 1.6 × 10⁻⁷ per year. We conclude that the model can be used to estimate the rate of binding site turnover from comparative genomic data.
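The model as stated (origination at a constant rate λ, independent loss of each site at rate μ) has the form of a standard immigration-death process. For such a process, the count at time t is the sum of two independent parts. The first is the initial sites that survive, which are binomial with survival probability e^(−μt). The second is the sites that arose after t = 0 and are still present, which are Poisson with mean (λ/μ)(1 − e^(−μt)). The Python sketch below assumes this standard result. It is not necessarily written in the same form as the expression derived in the paper, and all function names are hypothetical. It also encodes one reading of the reported rates that reproduces the abstract's numbers. If a site shared by two lineages is lost when either lineage loses it, shared sites decay as e^(−2μT), so a half-life of 27 Myr gives μ = ln 2 / (2 × 27 Myr) ≈ 1.28 × 10⁻⁸ per year. The stationary mean number of sites is λ/μ.

```python
import math


def binomial_pmf(k: int, n0: int, p: float) -> float:
    """Probability that exactly k of n0 initial sites survive, each with probability p."""
    return math.comb(n0, k) * p**k * (1.0 - p) ** (n0 - k)


def poisson_pmf(j: int, m: float) -> float:
    """Poisson probability of j events with mean m, computed in log space."""
    if m == 0.0:
        return 1.0 if j == 0 else 0.0
    return math.exp(j * math.log(m) - m - math.lgamma(j + 1))


def transition_pmf(n: int, n0: int, t: float, lam: float, mu: float) -> float:
    """P(N(t) = n | N(0) = n0) for an immigration-death process (hypothetical helper).

    Assumes sites originate at constant rate lam and each existing site is
    lost independently at rate mu.
    """
    p = math.exp(-mu * t)          # survival probability of one initial site
    m = (lam / mu) * (1.0 - p)     # expected number of new sites present at t
    # Convolve surviving initial sites (binomial) with new sites (Poisson).
    return sum(
        binomial_pmf(k, n0, p) * poisson_pmf(n - k, m)
        for k in range(0, min(n, n0) + 1)
    )


def stationary_mean(lam: float, mu: float) -> float:
    """Long-run expected number of sites, lam / mu."""
    return lam / mu


def decay_rate_from_shared_half_life(half_life_years: float) -> float:
    """Per-site decay rate mu from the half-life of *shared* sites.

    Assumption: a site shared by two lineages disappears when either lineage
    loses it, so the expected shared count decays as exp(-2 * mu * T).
    """
    return math.log(2.0) / (2.0 * half_life_years)


if __name__ == "__main__":
    mu = decay_rate_from_shared_half_life(27e6)
    print(f"mu = {mu:.2e} per year")  # mu = 1.28e-08 per year
    lam, mu_reported = 1.6e-7, 1.3e-8
    print(f"stationary mean = {stationary_mean(lam, mu_reported):.1f} sites")
    # Distribution after 10 Myr starting from 5 sites (illustrative inputs only).
    for n in range(0, 11):
        print(n, round(transition_pmf(n, 5, 10e6, lam, mu_reported), 4))
```

With the reported values λ ≈ 1.6 × 10⁻⁷ and μ ≈ 1.3 × 10⁻⁸ per year, the implied stationary mean is λ/μ ≈ 12 sites per region. This is arithmetic on the abstract's rates, not a count reported in the abstract. Computing the Poisson term in log space keeps the sketch stable for larger site counts.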

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • DNA-Binding Proteins / genetics
  • Evolution, Molecular*
  • Gene Expression Regulation, Developmental*
  • Genes, Homeobox*
  • Genome
  • Humans
  • Models, Genetic*
  • Plant Proteins / genetics
  • Stochastic Processes
  • Transcription Factors / genetics*

Substances

  • DNA-Binding Proteins
  • Plant Proteins
  • Transcription Factors
  • ethylene-responsive element binding protein