A spatial simulation approach to account for protein structure when identifying non-random somatic mutations

BMC Bioinformatics. 2014 Jul 3;15:231. doi: 10.1186/1471-2105-15-231.

Abstract

Background: Current research suggests that a small set of "driver" mutations is responsible for tumorigenesis, while a larger body of "passenger" mutations occurs in the tumor but does not drive disease progression. Because of recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies that attempt to identify such mutations have been developed. Since driver mutations are hypothesized to cluster in key regions of the protein, cluster identification algorithms have become critical to this effort.

Results: We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), that identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) with the spatial information in the Protein Data Bank (PDB), SpacePAC is able to identify novel mutation clusters in many proteins, such as FGFR3 and CHRM2. In addition, SpacePAC is better able to localize the most significant mutational hotspots, as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor at http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html.
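To make the idea of clustering "directly in 3D space" concrete, here is a minimal Python sketch of one way such a test could work. It is not the SpacePAC implementation or its R API. It assumes a single sphere of fixed radius centered on each residue, a null model in which mutations fall uniformly at random on residues, and a toy hairpin structure with invented coordinates and mutation counts. The function names (`sphere_counts`, `best_sphere`, `simulated_p_value`) are hypothetical.

```python
import math
import random


def sphere_counts(coords, counts, radius):
    """For each residue, sum the mutations on all residues within `radius` of it."""
    totals = []
    for center in coords:
        total = 0
        for point, n in zip(coords, counts):
            if math.dist(center, point) <= radius:
                total += n
        totals.append(total)
    return totals


def best_sphere(coords, counts, radius):
    """Return (center index, mutations covered) for the densest sphere."""
    totals = sphere_counts(coords, counts, radius)
    best = max(range(len(totals)), key=totals.__getitem__)
    return best, totals[best]


def simulated_p_value(coords, counts, radius, n_sim=1000, seed=0):
    """Estimate how often uniformly scattered mutations match the observed maximum."""
    rng = random.Random(seed)
    _, observed = best_sphere(coords, counts, radius)
    n_mut, n_res = sum(counts), len(coords)
    hits = 0
    for _ in range(n_sim):
        # Assumed null model: each mutation lands on a residue chosen uniformly.
        sim = [0] * n_res
        for _ in range(n_mut):
            sim[rng.randrange(n_res)] += 1
        if best_sphere(coords, sim, radius)[1] >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)


# Toy hairpin (invented data): residues 0-9 run along x, residues 10-19 fold
# back 3 units above them, so residues 2 and 17 are far apart in sequence
# but adjacent in space.
coords = [(4.0 * i, 0.0, 0.0) for i in range(10)]
coords += [(4.0 * (9 - j), 3.0, 0.0) for j in range(10)]
counts = [0] * 20
counts[2], counts[17], counts[12] = 5, 4, 1

print(best_sphere(coords, counts, radius=3.5))  # (2, 9)
print(simulated_p_value(coords, counts, radius=3.5))
```

In this toy case the sphere centered on residue 2 also captures the mutations on residue 17, a grouping that a scan along the linear sequence would miss. That is the motivation the abstract gives for working with the tertiary structure.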

Conclusion: SpacePAC is a valuable addition to the tools for identifying mutational clusters, one that takes protein tertiary structure into account.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Protein
  • Genes, Neoplasm / genetics
  • Humans
  • Mutation*
  • Neoplasms / genetics
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / genetics*

Substances

  • Proteins