Characterizing linkage disequilibrium and evaluating imputation power of human genomic insertion-deletion polymorphisms

Genome Biol. 2012 Feb 29;13(2):R15. doi: 10.1186/gb-2012-13-2-r15.

Abstract

Background: Indels are an important cause of human variation and central to the study of human disease. The 1000 Genomes Project Low-Coverage Pilot identified over 1.3 million indels shorter than 50 bp, of which over 890 were identified as potentially disruptive variants. Yet, despite their ubiquity, the local genomic characteristics of indels remain unexplored.

Results: Herein we describe population- and minor allele frequency-based differences in linkage disequilibrium and imputation characteristics for indels included in the 1000 Genomes Project Low-Coverage Pilot for the CEU, YRI and CHB+JPT populations. Common indels were well tagged by nearby SNPs in all studied populations, and were also tagged at a similar rate to common SNPs. Both neutral and functionally deleterious common indels were imputed with greater than 95% concordance from HapMap Phase 3 and OMNI SNP sites. Further, 38 to 56% of low frequency indels were tagged by low frequency SNPs. We were able to impute heterozygous low frequency indels with over 50% concordance. Lastly, our analysis also revealed evidence of ascertainment bias. This bias prevents us from extending the applicability of our results to highly polymorphic indels that could not be identified in the Low-Coverage Pilot.

Conclusions: Although further scope exists to improve the imputation of low frequency indels, our study demonstrates that there are already ample opportunities to retrospectively impute indels for prior genome-wide association studies and to incorporate indel imputation into future case/control studies.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Frequency
  • Genetics, Population
  • Genome, Human
  • Genome-Wide Association Study
  • HapMap Project
  • Heterozygote
  • Humans
  • INDEL Mutation / genetics*
  • Linkage Disequilibrium*
  • Polymorphism, Genetic*
  • Polymorphism, Single Nucleotide / genetics*