Finding linear motif pairs from protein interaction networks: a probabilistic approach

Comput Syst Bioinformatics Conf. 2007:6:111-9.

Abstract

Finding motif pairs in a set of protein sequences based on protein-protein interaction data is a challenging computational problem. Existing effective approaches usually rely on additional information, such as prior knowledge of protein groupings based on protein domains. In practice, this kind of knowledge is not always available, so novel approaches that do not require it are highly desirable. Recently, Tan et al. proposed such an approach. However, their approach has two problems. First, its scoring function (based on χ² testing) is not adequate: random motif pairs may have higher scores than the correct ones. Second, it is not scalable: it may take days to process a set of 5000 protein sequences with about 20,000 interactions. In this paper, our contribution is twofold. We first introduce a new scoring method, which is shown to be more accurate than the χ² score used by Tan et al. We then present two efficient algorithms, an exact algorithm and a heuristic version of it, to solve the problem of finding motif pairs. Based on experiments on real datasets, we show that our algorithms are efficient and can accurately locate the motif pairs. We have also evaluated the sensitivity and efficiency of our heuristic algorithm using simulated datasets; the results show that the algorithm is very efficient with reasonably high sensitivity.
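To make the notion of a chi-square score for a motif pair concrete, the following is a minimal sketch that assumes the standard Pearson chi-square statistic on a 2x2 contingency table over all unordered protein pairs (motif pair present or absent, versus interacting or not). It is not necessarily the exact formulation used by Tan et al., and it is not the new scoring method proposed in the paper. The function name `chi_square_pair_score`, the toy sequences, and the treatment of motifs as exact substrings are all illustrative assumptions.

```python
from itertools import combinations


def chi_square_pair_score(sequences, interactions, motif_a, motif_b):
    """Pearson chi-square for a motif pair over all unordered protein pairs.

    sequences:    dict mapping protein name -> amino acid sequence
    interactions: iterable of (name, name) pairs that interact
    Motifs are treated as exact substrings (an illustrative simplification).
    """
    interacting = {frozenset(pair) for pair in interactions}
    n11 = n10 = n01 = n00 = 0
    for p, q in combinations(sorted(sequences), 2):
        sp, sq = sequences[p], sequences[q]
        # The pair "matches" if one protein has motif_a and the other motif_b.
        matches = (motif_a in sp and motif_b in sq) or (
            motif_a in sq and motif_b in sp
        )
        interacts = frozenset((p, q)) in interacting
        if matches and interacts:
            n11 += 1
        elif matches:
            n10 += 1
        elif interacts:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A row or column of the table is empty; the statistic is undefined.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


# Toy data (hypothetical, for illustration only).
toy_sequences = {
    "P1": "AAXYZ",
    "P2": "XYZBB",
    "P3": "CCPQR",
    "P4": "PQRDD",
    "P5": "GGGGG",
    "P6": "HHHHH",
}
toy_interactions = [("P1", "P3"), ("P2", "P4"), ("P5", "P6")]

score = chi_square_pair_score(toy_sequences, toy_interactions, "XYZ", "PQR")
absent = chi_square_pair_score(toy_sequences, toy_interactions, "ZZZ", "QQQ")
print(round(score, 3), absent)
```

This sketch enumerates every protein pair, so its cost grows quadratically with the number of sequences for a single candidate motif pair. It is meant only to illustrate the statistic, not the efficient algorithms described in the abstract.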

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs
  • Amino Acid Sequence
  • Binding Sites
  • Computer Simulation
  • Data Interpretation, Statistical
  • Models, Biological*
  • Models, Chemical*
  • Models, Statistical
  • Molecular Sequence Data
  • Protein Binding
  • Protein Interaction Mapping / methods*
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Sequence Analysis, Protein / methods*
  • Signal Transduction / physiology*

Substances

  • Proteins