An efficient motif discovery algorithm with unknown motif length and number of binding sites

Int J Data Min Bioinform. 2006;1(2):201-15. doi: 10.1504/ijdmb.2006.010856.

Abstract

Most algorithms for discovering motifs in DNA sequences require the motif's length as input. Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP), in which the motif's length is not an input parameter. Unfortunately, their algorithm takes an unacceptably long time to run, e.g. over 3 months to discover a length-14 motif. Because the best motif may be neither the longest nor the one with the largest number of binding sites, in this paper we also eliminate a second input parameter, the minimum number of binding sites, in order to provide more realistic and robust results. We also develop an efficient algorithm that solves both EMP and this redefined problem.
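
The abstract does not spell out the problem definition. As a rough illustration only, the sketch below assumes the classic (l,d)-motif formulation, in which a motif is a length-l string that occurs in every input sequence with at most d mismatches, and searches for it exhaustively. All function names are hypothetical. Looping over a range of candidate lengths is merely a stand-in for treating the length as unknown, and requiring an occurrence in every sequence is a simplification. This is not the algorithm of this paper or of Styczynski et al.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, seq, d):
    """True if some window of seq is within d mismatches of candidate."""
    l = len(candidate)
    return any(
        hamming(candidate, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def ld_motifs(seqs, l, d):
    """Naive search (assumed formulation): every length-l DNA string that
    occurs in all sequences with at most d mismatches. Cost grows as 4**l,
    so this is only usable for very small l."""
    found = []
    for letters in product("ACGT", repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in seqs):
            found.append(candidate)
    return found


def motifs_unknown_length(seqs, d, l_min, l_max):
    """Hypothetical wrapper: try each length in [l_min, l_max] in turn
    instead of taking a single motif length as input."""
    return {l: ld_motifs(seqs, l, d) for l in range(l_min, l_max + 1)}


if __name__ == "__main__":
    seqs = ["AACGTA", "CCGTAA", "TCGTTT"]
    print(motifs_unknown_length(seqs, d=0, l_min=3, l_max=4))
```

The exponential cost of enumerating every candidate string at every length suggests why a naive treatment of unknown motif length quickly becomes impractical as the motif grows.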

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Motifs
  • Binding Sites
  • Proteins / chemistry*

Substances

  • Proteins