Finding exact optimal motifs in matrix representation by partitioning

Bioinformatics. 2005 Sep 1:21 Suppl 2:ii86-92. doi: 10.1093/bioinformatics/bti1115.

Abstract

Motivation: Finding common patterns, or motifs, in the promoter regions of co-expressed genes is an important problem in bioinformatics. A common representation of the motif is by probability matrix or PSSM (position specific scoring matrix). However, even for a motif of length six or seven, there is no algorithm that can guarantee finding the exact optimal matrix from an infinite number of possible matrices.

Results: This paper introduces the first algorithm, called EOMM, for finding the exact optimal matrix-represented motif, or simply optimal motif. Based on branch-and-bound searching by partitioning the solution space recursively, EOMM can find the optimal motif of size up to eight or nine, and a motif of larger size with any desired accuracy on the principle that the smaller the error bound, the longer the running time. Experiments show that for some real and simulated data sets, EOMM finds the motif despite very weak signals when existing software, such as MEME and MITRA-PSSM, fails to do so.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Motifs / genetics
  • Base Sequence
  • Molecular Sequence Data
  • Multigene Family / genetics*
  • Proteins / genetics*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • Proteins