Integrating domain knowledge with statistical and data mining methods for high-density genomic SNP disease association analysis

J Biomed Inform. 2007 Dec;40(6):750-60. doi: 10.1016/j.jbi.2007.06.002. Epub 2007 Jun 10.

Abstract

Genome-wide association studies can help identify multi-gene contributions to disease. As the number of high-density genomic markers tested increases, however, so does the number of loci associated with disease by chance. Performing a brute-force test for the interaction of four or more high-density genomic loci is infeasible given current computational limitations, so heuristics must be employed to limit the number of statistical tests performed. In this paper we explore the use of biological domain knowledge to supplement statistical analysis and data mining methods in identifying genes and pathways associated with disease. We describe Pathway/SNP, a software application designed to help evaluate the association between pathways and disease. Pathway/SNP integrates domain knowledge (SNP, gene, and pathway annotation from multiple sources) with statistical and data mining algorithms into a tool that can be used to explore the etiology of complex diseases.
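The scale of the brute-force problem can be illustrated with a short calculation. The sketch below is not part of Pathway/SNP; it counts the locus subsets an exhaustive interaction scan would have to test and the number of tests expected to reach significance by chance alone. The panel size of 500,000 markers, the significance level of 0.05, and all function names are assumptions chosen for illustration, and the chance-hit estimate assumes every null hypothesis is true.

```python
from math import comb


def n_interaction_tests(n_loci: int, order: int) -> int:
    """Number of distinct locus subsets of size `order` (one test per subset)."""
    return comb(n_loci, order)


def expected_chance_hits(n_tests: int, alpha: float) -> float:
    """Expected number of significant tests if every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test significance level that holds the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    N_LOCI = 500_000  # assumed panel size, not taken from the paper
    ALPHA = 0.05      # assumed nominal significance level

    for order in (1, 2, 3, 4):
        tests = n_interaction_tests(N_LOCI, order)
        print(
            f"order {order}: {tests:.3e} tests, "
            f"{expected_chance_hits(tests, ALPHA):.3e} expected by chance, "
            f"Bonferroni threshold {bonferroni_threshold(tests, ALPHA):.3e}"
        )
```

Under these assumptions the number of tests grows by several orders of magnitude with each added locus, which is why restricting the search to SNPs within annotated genes and pathways can reduce both the computational cost and the multiple-testing burden.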

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Artificial Intelligence
  • Biomarkers / analysis
  • Chromosome Mapping / methods*
  • Data Interpretation, Statistical
  • Databases, Genetic*
  • Genetic Predisposition to Disease / genetics*
  • Humans
  • Information Storage and Retrieval / methods*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide / genetics*
  • Proteome / genetics*
  • Systems Integration

Substances

  • Biomarkers
  • Proteome