Group-combined P-values with applications to genetic association studies

Bioinformatics. 2016 Sep 15;32(18):2737-43. doi: 10.1093/bioinformatics/btw314. Epub 2016 Jun 3.

Abstract

Motivation: In large-scale genetic association studies with tens of hundreds of single nucleotide polymorphisms (SNPs) genotyped, the traditional statistical framework of logistic regression, which uses the maximum likelihood estimator (MLE) to infer the odds ratios of SNPs, may not work appropriately. This is because a large number of odds ratios need to be estimated, and the MLEs may not be stable when some of the SNPs are in high linkage disequilibrium. In this situation, P-value combination procedures provide good alternatives because they are built on single-marker analysis.

Results: The commonly used P-value combination methods (such as Fisher's combined test, the truncated product method, the truncated tail strength and the adaptive rank truncated product, ARTP) may lose power when the significance level varies across SNPs. To tackle this problem, a group combined P-value method (GCP) is proposed, in which the P-values are divided into multiple groups and then combined at the group level. With this strategy, the significance values are integrated at different levels, and the power is improved. Simulations show that the GCP effectively controls the type I error rate and has additional power over the existing methods; the power increase can exceed 50% in some situations. The proposed GCP method is applied to data from the Genetic Analysis Workshop 16. Among all the methods, only the GCP and ARTP reach significance in identifying a genomic region covering the gene DSC3 as associated with rheumatoid arthritis, and the GCP yields the smaller P-value.
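As a rough illustration of what combining P-values at the group level can look like, the sketch below implements Fisher's combined test and then applies it twice: once within each group and once across the group-level results. This is not the GCP procedure from the paper, whose grouping rule and group-level statistic are not given in the abstract. The function names, the fixed `group_size` grouping and the two-level use of Fisher's test are all assumptions made for illustration, and the chi-square reference distribution holds only for independent P-values, which SNPs in linkage disequilibrium do not satisfy.

```python
import math


def fisher_combined(pvalues):
    """Fisher's combined test, assuming independent P-values in (0, 1]."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one P-value, each in (0, 1]")
    k = len(pvalues)
    # half_stat is X/2, where X = -2 * sum(ln p) ~ chi-square with 2k df.
    half_stat = -sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def two_level_combine(pvalues, group_size):
    """Hypothetical two-level scheme (NOT the paper's GCP): combine
    P-values within consecutive groups, then combine the group results."""
    groups = [pvalues[i:i + group_size]
              for i in range(0, len(pvalues), group_size)]
    group_ps = [fisher_combined(g) for g in groups]
    return group_ps, fisher_combined(group_ps)


if __name__ == "__main__":
    # Made-up single-marker P-values, for illustration only.
    snp_pvalues = [0.001, 0.004, 0.03, 0.2, 0.45, 0.6, 0.75, 0.9]
    group_ps, overall = two_level_combine(snp_pvalues, group_size=4)
    print("group-level P-values:", group_ps)
    print("overall P-value:", overall)
```

With correlated SNPs, the null distribution of such a statistic would typically be obtained by permutation rather than from the chi-square formula used here.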

Availability and implementation: http://www.statsci.amss.ac.cn/yjscy/yjy/lqz/201510/t20151027_313273.html

Contact: liqz@amss.ac.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Databases, Genetic
  • Genetic Association Studies
  • Genome-Wide Association Study*
  • Genomics
  • Genotype
  • Linkage Disequilibrium*
  • Models, Statistical*
  • Polymorphism, Single Nucleotide*