Improved risk prediction for Crohn's disease with a multi-locus approach

Hum Mol Genet. 2011 Jun 15;20(12):2435-42. doi: 10.1093/hmg/ddr116. Epub 2011 Mar 22.

Abstract

Genome-wide association studies have identified numerous loci demonstrating genome-wide significant association with Crohn's disease. However, when many single nucleotide polymorphisms (SNPs) have weak-to-moderate disease risks, genetic risk prediction models based only on those markers that pass the most stringent statistical significance threshold may be suboptimal. Haplotype-based predictive models may provide advantages over single-SNP approaches by facilitating detection of associations driven by cis-interactions among nearby SNPs. In addition, these approaches may be helpful in assaying non-genotyped, rare causal variants. In this study, we investigated the use of two-marker haplotypes for risk prediction in Crohn's disease and showed that it leads to improved prediction accuracy compared with single-point analyses. With large numbers of predictors, traditional classification methods such as logistic regression and support vector machine approaches may be suboptimal. An alternative approach is to apply the risk-score method, in which the score is calculated as the number of risk haplotypes an individual carries, both within and across loci. We used the area under the curve (AUC) of the receiver operating characteristic curve to assess the performance of prediction models in large-scale genetic data, and observed that prediction performance in the validation cohort continued to improve as thousands of haplotypes were included in the model, with the AUC plateauing at 0.72 at ∼7000 haplotypes and gradually declining beyond that point. In contrast, using single SNPs as predictors, we obtained a maximum AUC of only 0.65. Validation studies in independent cohorts further support improved prediction capacity with multi-marker, as opposed to single-marker, analyses.
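
The risk-score idea described in the abstract can be illustrated with a minimal Python sketch. It assumes an unweighted count of risk haplotypes and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The data layout, the locus names, the haplotype labels, and the functions `risk_score` and `auc` are hypothetical and chosen for illustration; the abstract does not specify how risk haplotypes were selected or how the score was implemented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout: for each locus (a pair of nearby SNPs), a person
# carries two two-marker haplotypes, one per chromosome copy.
Genotype = Dict[str, Tuple[str, str]]


def risk_score(person: Genotype, risk_haplotypes: Dict[str, Set[str]]) -> int:
    """Count the risk haplotypes a person carries, within and across loci."""
    score = 0
    for locus, risky in risk_haplotypes.items():
        # Each locus contributes 0, 1, or 2, depending on how many of the
        # two chromosome copies carry a risk haplotype.
        score += sum(1 for hap in person.get(locus, ()) if hap in risky)
    return score


def auc(case_scores: List[float], control_scores: List[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy, made-up data: two loci, haplotypes written as allele pairs.
risk_haplotypes = {"locus1": {"AG"}, "locus2": {"CT", "TT"}}
cases = [
    {"locus1": ("AG", "AG"), "locus2": ("CT", "CC")},
    {"locus1": ("AG", "AA"), "locus2": ("TT", "CT")},
]
controls = [
    {"locus1": ("AA", "AA"), "locus2": ("CC", "CC")},
    {"locus1": ("AG", "AG"), "locus2": ("CC", "CT")},
]

case_scores = [risk_score(p, risk_haplotypes) for p in cases]
control_scores = [risk_score(p, risk_haplotypes) for p in controls]
print(case_scores, control_scores, auc(case_scores, control_scores))
```

In this toy example both cases score 3 and the controls score 0 and 3, so the pairwise comparison gives an AUC of 0.75. The pairwise loop is quadratic in cohort size; for large cohorts a rank-based computation would be the more practical choice.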

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Area Under Curve
  • Crohn Disease / genetics*
  • Genetic Predisposition to Disease / genetics*
  • Genome-Wide Association Study / methods*
  • Haplotypes / genetics
  • Humans
  • Models, Genetic
  • Polymorphism, Single Nucleotide / genetics
  • Risk Assessment / methods