A universal algorithm for de novo decrypting of heterozygous indel sequences: a tool for personalized medicine

Clin Chim Acta. 2008 Mar;389(1-2):7-13. doi: 10.1016/j.cca.2007.11.011. Epub 2007 Nov 23.

Abstract

Introduction: Indels (insertions/deletions) are important DNA sequence variations because of their high frequency in the human genome, their deleterious effects on the reading frame and protein expression, and their association with disease and with susceptibility to common diseases. In a recent study of a human individual whose whole genome was sequenced, 292,102 heterozygous indels and 559,473 homozygous indels were identified. Decrypting such a large number of heterozygous indels is computationally intensive and requires efficient algorithms. However, current algorithms for decrypting heterozygous indels cannot be applied to genomes sequenced for the first time, and they cannot be run without a reference sequence or reference sequence tracing for the sequenced genome.

Methods: A new algorithm for de novo decrypting of heterozygous indels is built around isolating the indel sequence directly from the genotype, or diploid, sequence. A universal algorithm is described here for heterozygous indel detection, indel size determination, and de novo decrypting of the indel sequence, without subtracting the diploid DNA sequence from the reference sequence or reference sequence tracing.
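The abstract does not give the algorithm's steps. As an illustration only, and not as the authors' method, the Python sketch below shows one simple way to detect a heterozygous indel, determine its size, and decode both alleles from a diploid read alone, with no reference. It assumes that the read starts at the first mixed base call, that each call is a plain base or a two-base IUPAC code, and that from that point on the two alleles are the same sequence S offset by the indel size k, so call j equals {S[j], S[j+k]}. The names `to_sets`, `solve_chain`, `decode`, and `detect_size`, and the rule of taking the smallest consistent shift as the indel size, are hypothetical choices made for this sketch.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Plain bases plus the six two-base IUPAC ambiguity codes (assumed input alphabet).
IUPAC: Dict[str, FrozenSet[str]] = {
    "A": frozenset("A"), "C": frozenset("C"),
    "G": frozenset("G"), "T": frozenset("T"),
    "R": frozenset("AG"), "Y": frozenset("CT"), "S": frozenset("CG"),
    "W": frozenset("AT"), "K": frozenset("GT"), "M": frozenset("AC"),
}


def to_sets(diploid: str) -> List[FrozenSet[str]]:
    """Turn an IUPAC-coded diploid read into one set of observed bases per position."""
    return [IUPAC[c] for c in diploid.upper()]


def solve_chain(calls: List[FrozenSet[str]], start: int, k: int) -> List[List[str]]:
    """Return every consistent assignment of S[start], S[start+k], S[start+2k], ...

    Model (an assumption of this sketch): calls[j] == {S[j], S[j+k]} for every j.
    """
    solutions: List[List[str]] = []
    for first in sorted(calls[start]):  # at most two ways to start the chain
        chain = [first]
        j = start
        while j < len(calls):
            cur = chain[-1]
            if cur not in calls[j]:
                break  # contradiction: this starting choice is wrong
            # Homozygous call: both alleles share cur; mixed call: the other base.
            other = calls[j] - {cur}
            chain.append(next(iter(other)) if other else cur)
            j += k
        else:
            solutions.append(chain)  # loop finished without a contradiction
    return solutions


def decode(calls: List[FrozenSet[str]], k: int) -> Optional[str]:
    """Rebuild S (length n + k) if shift k explains the read in exactly one way."""
    n = len(calls)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the read length")
    s = [""] * (n + k)
    for r in range(k):  # positions r, r+k, r+2k, ... form one independent chain
        sols = solve_chain(calls, r, k)
        if len(sols) != 1:
            return None  # 0: shift k is inconsistent; 2: ambiguous for this read
        for m, base in enumerate(sols[0]):
            s[r + m * k] = base
    return "".join(s)


def detect_size(calls: List[FrozenSet[str]], max_k: int) -> Optional[Tuple[int, str]]:
    """Return (k, S) for the smallest shift k that uniquely explains the read."""
    if all(len(c) == 1 for c in calls):
        return None  # no mixed calls: no heterozygous indel to decode
    for k in range(1, min(max_k, len(calls)) + 1):
        s = decode(calls, k)
        if s is not None:
            return k, s
    return None
```

For example, `detect_size(to_sets("YAKSMYGMY"), 5)` returns `(3, "TAGCATGCCGAT")`. Shifts 1 and 2 hit a contradiction. Shift 3 yields a longer allele continuing `TAGCATGCCGAT`, a shorter allele continuing `CATGCCGAT`, and the 3-base indel `TAG` (in repetitive sequence this is only one of several equivalent placements). Each residue class modulo k is an independent chain with at most two candidate assignments, so checking one candidate size takes time linear in the read length. The wrong starting base is usually eliminated within a few calls, as soon as a propagated base is missing from the observed call.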

Results: The results obtained with this algorithm are identical to those obtained with PolyPhred and PolyScan. Unlike those algorithms, the new algorithm is not computationally intensive for large indels, is independent of the sequencing technology, and applies to genotype data from all existing sequencing platforms. A read of only 29 bases is enough to reduce the false detection rate (FDR) to 1 in a million.

Conclusions: This algorithm is unique among existing algorithms in performing indel detection, size determination, and decrypting simultaneously. Because this universal approach eliminates the need for a reference sequence or sequence tracing, the algorithm can also decrypt genomes sequenced for the first time. Because of the high frequency of heterozygous indels in the human genome, this universal algorithm will greatly reduce the time required for post-sequencing data analysis in whole-genome sequencing of an individual for the practice of personalized medicine.

MeSH terms

  • Algorithms*
  • Base Sequence
  • Genetic Diseases, Inborn / diagnosis
  • Genetic Diseases, Inborn / genetics
  • Heterozygote*
  • INDEL Mutation*
  • Molecular Sequence Data
  • Mutation