Estimating the total number of susceptibility variants underlying complex diseases from genome-wide association studies

Hon-Cheong So; Benjamin H K Yip; Pak Chung Sham

doi:10.1371/journal.pone.0013898

PLoS One. 2010 Nov 17;5(11):e13898. doi: 10.1371/journal.pone.0013898.

Authors

Hon-Cheong So¹, Benjamin H K Yip, Pak Chung Sham

Affiliation

¹ Department of Psychiatry, University of Hong Kong, Hong Kong SAR, China.

Abstract

Recently, genome-wide association studies (GWAS) have identified numerous susceptibility variants for complex diseases. In this study we proposed several approaches to estimate the total number of variants underlying these diseases. We assume that the variance explained by genetic markers (Vg) follows an exponential distribution, an assumption justified by previous studies on theories of adaptation. Our aim is to fit the observed distribution of Vg from GWAS to its theoretical distribution. The number of variants is obtained by dividing the heritability by the estimated mean of the exponential distribution. In practice, limited sample sizes leave insufficient power to detect variants with small effects, so power was taken into account in the fitting. Besides considering only the most significant variants, we also relaxed the significance threshold, allowing more markers to be fitted. The effects of false positive variants were removed by considering the local false discovery rates. In addition, we developed an alternative approach that directly fits the z-statistics from GWAS to their theoretical distribution. In all cases, the "winner's curse" effect was corrected analytically. Confidence intervals were also derived. Simulations were performed to compare and verify the performance of the different estimators (which incorporate various means of winner's curse correction) and the coverage of the proposed analytic confidence intervals. Our methodology requires only summary statistics and can handle both binary and continuous traits. Finally, we applied the methods to a few real disease examples (lipid traits, type 2 diabetes and Crohn's disease) and estimated that hundreds to nearly a thousand variants underlie these traits.
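
The core relation in the abstract (number of variants = heritability / mean of the exponential distribution of Vg) can be illustrated with a minimal Python sketch. This is not the authors' estimator. It assumes, purely for illustration, that detection is a hard cutoff on Vg (a crude stand-in for the power weighting and winner's curse correction described above), so that the memoryless property of the exponential distribution gives the mean directly. The function names, the cutoff, and all numeric values are hypothetical.

```python
import random


def estimate_num_variants(observed_vg, threshold, heritability):
    """Estimate the number of variants as heritability / mean(Vg).

    ASSUMPTION (not from the paper): a variant is observed only if its
    Vg exceeds `threshold`, i.e. detection is a hard cutoff.
    """
    if not observed_vg:
        raise ValueError("need at least one observed variant")
    # Memoryless property of the exponential: E[Vg | Vg > t] = t + mean,
    # so subtracting the cutoff undoes the upward selection bias.
    mean_hat = sum(observed_vg) / len(observed_vg) - threshold
    if mean_hat <= 0:
        raise ValueError("estimated mean of Vg is not positive")
    return heritability / mean_hat


def simulate_detected_vg(n_variants, true_mean, threshold, rng):
    """Draw Vg ~ Exponential(mean=true_mean) and keep values above the cutoff."""
    draws = (rng.expovariate(1.0 / true_mean) for _ in range(n_variants))
    return [vg for vg in draws if vg > threshold]


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed, illustrative only
    # Hypothetical setting: 500 variants with mean Vg 0.001, so h2 = 0.5.
    detected = simulate_detected_vg(500, 0.001, 0.002, rng)
    print(len(detected), estimate_num_variants(detected, 0.002, 0.5))
```

With only a few dozen detected variants the estimate is noisy, which is one reason the abstract also considers relaxing the significance threshold and deriving confidence intervals.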

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computer Simulation
Crohn Disease / genetics
Diabetes Mellitus, Type 2 / genetics
Gene Frequency
Genetic Predisposition to Disease / genetics*
Genome-Wide Association Study / methods*
Humans
Lipid Metabolism / genetics
Models, Genetic
Polymorphism, Single Nucleotide*