Heterodimeric DNA motif synthesis and validations

Nucleic Acids Res. 2019 Feb 28;47(4):1628-1636. doi: 10.1093/nar/gky1297.

Abstract

Bound by transcription factors, DNA motifs (i.e. transcription factor binding sites) are prevalent and important for gene regulation in different tissues at different developmental stages of eukaryotes. Although considerable efforts have been made to elucidate monomeric DNA motif patterns, our knowledge of heterodimeric DNA motifs is still far from complete. Therefore, we propose a computational approach to synthesize a heterodimeric DNA motif from two monomeric DNA motifs. The approach is divided into two sequential components (Phases A and B). In Phase A, we propose inference models for how two monomeric DNA motifs can be oriented and overlapped with each other at the nucleotide level. In Phase B, given the two oriented monomeric DNA motifs, we further propose DNA-binding family-specific input-output hidden Markov models (IOHMMs) to synthesize a heterodimeric DNA motif. To validate the approach, we execute and cross-validate it on 618 experimentally verified heterodimeric DNA motifs across 49 DNA-binding family combinations. We observe that our approach can even "rescue" the existing heterodimeric DNA motif pattern HOXB2_EOMES, previously published in Nature. Lastly, we apply the proposed approach to infer previously uncharacterized heterodimeric motifs. Their motif instances are supported by DNase accessibility, gene ontology, protein-protein interactions, in vivo ChIP-seq peaks, and even structural data from PDB. A public web server has been built for open accessibility and scientific impact; its address is http://motif.cs.cityu.edu.hk/custom/MotifKirin.
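The abstract does not spell out the Phase A inference models or the Phase B IOHMMs. To give a rough intuition for the search space Phase A has to cover, the Python sketch below enumerates every motif order, strand orientation, and overlap/spacer offset for two monomeric position weight matrices (PWMs). It builds a naive composite for each layout and picks the one closest to an observed heterodimeric PWM by mean squared error. Everything here is an assumption for illustration and not the paper's method. The function names (`reverse_complement`, `compose`, `candidate_layouts`, `best_layout`), the A/C/G/T column order, the uniform spacer background, the product-and-renormalise rule at overlapping positions, and the `max_gap` limit are all hypothetical. In particular, the overlap rule is a crude stand-in for the family-specific IOHMMs of Phase B, not an implementation of them.

```python
import itertools

import numpy as np

# Hypothetical convention: rows are motif positions, columns are A, C, G, T.


def reverse_complement(pwm: np.ndarray) -> np.ndarray:
    """Flip position order and swap A<->T, C<->G (column reversal)."""
    return pwm[::-1, ::-1].copy()


def compose(pwm1: np.ndarray, pwm2: np.ndarray, offset: int,
            pseudocount: float = 1e-3) -> np.ndarray:
    """Place pwm2 starting `offset` positions after the start of pwm1.

    offset < len(pwm1) overlaps the motifs; offset > len(pwm1) leaves a
    spacer. Spacer columns are uniform and overlapping columns are
    multiplied and renormalised (naive independence assumption).
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    l1, l2 = len(pwm1), len(pwm2)
    out = np.full((max(l1, offset + l2), 4), 0.25)
    out[:l1] = pwm1
    for j in range(l2):
        i = offset + j
        if i < l1:  # overlapping position: combine both motifs' columns
            prod = (out[i] + pseudocount) * (pwm2[j] + pseudocount)
            out[i] = prod / prod.sum()
        else:  # position covered by pwm2 only
            out[i] = pwm2[j]
    return out


def candidate_layouts(pwm_a, pwm_b, max_gap=5):
    """Yield ((first, second, offset), composite) for all orders/strands/offsets."""
    strands = {"+": lambda p: p, "-": reverse_complement}
    motifs = {"A": pwm_a, "B": pwm_b}
    for first, second in (("A", "B"), ("B", "A")):
        for s1, s2 in itertools.product("+-", repeat=2):
            p1 = strands[s1](motifs[first])
            p2 = strands[s2](motifs[second])
            for offset in range(len(p1) + max_gap + 1):
                yield (first + s1, second + s2, offset), compose(p1, p2, offset)


def best_layout(pwm_a, pwm_b, target, max_gap=5):
    """Return (mse, layout, composite) for the candidate closest to `target`."""
    best = None
    for layout, comp in candidate_layouts(pwm_a, pwm_b, max_gap):
        if comp.shape != target.shape:
            continue  # only same-length composites are comparable
        mse = float(np.mean((comp - target) ** 2))
        if best is None or mse < best[0]:
            best = (mse, layout, comp)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pwm_a = rng.dirichlet(np.ones(4), size=6)  # toy 6-bp monomer (random)
    pwm_b = rng.dirichlet(np.ones(4), size=5)  # toy 5-bp monomer (random)
    # Toy "observed" dimer: B on the reverse strand, overlapping A by 2 bp.
    target = compose(pwm_a, reverse_complement(pwm_b), offset=4)
    mse, layout, _ = best_layout(pwm_a, pwm_b, target)
    print(layout, round(mse, 6))
```

Because the search covers both motif orders and both strands, the reverse complement of every composite is also among the candidates, so the strand on which the observed dimer is reported does not need to be guessed. This brute-force search only works when an observed dimer is available to compare against; predicting the layout and the overlapping columns from the two monomers alone is the problem that the paper's Phase A and Phase B models address.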

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites / genetics
  • Computational Biology*
  • DNA Replication / genetics
  • Gene Expression Regulation, Developmental / genetics
  • Genomics / methods*
  • Humans
  • Markov Chains
  • Nucleotide Motifs / genetics*
  • Regulatory Elements, Transcriptional / genetics
  • Sequence Analysis, DNA / methods
  • Software
  • Transcription Factors / chemistry
  • Transcription Factors / genetics*

Substances

  • Transcription Factors