Bayesian methods for predicting interacting protein pairs using domain information

Biometrics. 2007 Sep;63(3):824-33. doi: 10.1111/j.1541-0420.2007.00755.x.

Abstract

Protein-protein interactions (PPIs) play important roles in most fundamental cellular processes including cell cycle, metabolism, and cell proliferation. Therefore, the development of effective statistical approaches to predicting protein interactions based on recently available large-scale experimental data is very important. Because protein domains are the functional units of proteins and PPIs are mostly achieved through domain-domain interactions (DDIs), the modeling and analysis of protein interactions at the domain level may be more informative and insightful. However, due to the large number of domains, the number of parameters to be estimated is very large, yet the amount of information for statistical inference is quite limited. In this article we propose a full Bayesian method and a semi-Bayesian method for simultaneously estimating DDI probabilities, the false positive rate, and the false negative rate of high-throughput data through integrating data from several organisms. We also propose a model to associate protein interaction probabilities with domain interaction probabilities that reflects the number of domains in each protein. Our Bayesian methods are compared with the likelihood-based approach (Deng et al., 2002, Genome Research 12, 1504-1508; Liu, Liu, and Zhao, 2005, Bioinformatics 21, 3279-3285) developed using the expectation maximization algorithm. We show that the full Bayesian method has the smallest mean square error through both simulations and theoretical justification under a special scenario. The large-scale PPI data obtained from high-throughput yeast two-hybrid experiments are used to demonstrate the advantages of the Bayesian approaches.
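
As a rough illustration of how domain-level probabilities can be linked to protein-level observations, the sketch below follows the general form of the likelihood-based baseline cited in the abstract (Deng et al., 2002): two proteins are assumed to interact if at least one of their domain pairs interacts, and the observed data are then distorted by a false positive rate and a false negative rate. It does not reproduce the model proposed in this article, which additionally reflects the number of domains in each protein, nor either Bayesian estimation procedure. The function names, the dictionary layout, and all numbers are hypothetical and chosen only for the example.

```python
def protein_interaction_prob(domains_i, domains_j, ddi_prob):
    """Probability that two proteins truly interact.

    Assumes (as in the cited baseline) that domain pairs interact
    independently and that one interacting pair is enough:
        P = 1 - prod(1 - lambda_mn) over distinct domain pairs (m, n).
    Domain pairs missing from ddi_prob are treated as lambda = 0
    (an assumption made for this sketch).
    """
    # Unordered, de-duplicated domain pairs between the two proteins.
    pairs = {frozenset((m, n)) for m in domains_i for n in domains_j}
    no_interaction = 1.0
    for pair in pairs:
        no_interaction *= 1.0 - ddi_prob.get(pair, 0.0)
    return 1.0 - no_interaction


def observed_interaction_prob(p_true, fp, fn):
    """Probability that an experiment reports an interaction.

    A true interaction is detected with probability 1 - fn;
    a non-interaction is reported with probability fp.
    """
    return p_true * (1.0 - fn) + (1.0 - p_true) * fp


# Hypothetical example: protein A has domains D1 and D2, protein B has D3.
ddi_prob = {
    frozenset(("D1", "D3")): 0.5,
    frozenset(("D2", "D3")): 0.2,
}
p_true = protein_interaction_prob(["D1", "D2"], ["D3"], ddi_prob)
p_obs = observed_interaction_prob(p_true, fp=0.1, fn=0.5)
print(round(p_true, 4), round(p_obs, 4))
```

In this toy case the true interaction probability is 1 - (0.5)(0.8) = 0.6, and with a false positive rate of 0.1 and a false negative rate of 0.5 the probability of observing the interaction is 0.6(0.5) + 0.4(0.1) = 0.34.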

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Binding Sites
  • Computer Simulation
  • Data Interpretation, Statistical
  • Databases, Protein
  • Models, Biological*
  • Models, Chemical*
  • Models, Statistical
  • Protein Binding
  • Protein Interaction Mapping / methods*
  • Protein Structure, Tertiary / physiology*
  • Sequence Analysis, Protein / methods*