PrimeIndel: four-prime-number genetic code for indel decryption and sequence read alignment

Clin Chim Acta. 2014 Sep 25:436:1-4. doi: 10.1016/j.cca.2014.04.006. Epub 2014 Apr 24.

Abstract

Background: To decrypt a doubly heterozygous sequence (DHS) and thereby define the indel mutation for mutation reporting, an algorithm that recursively searches for the overlapped nucleotide at offset nucleotide positions can decrypt the indel without using a reference sequence. However, because the genetic code is letter-based, special computer programs are required to run the decryption algorithm.
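A minimal sketch of the letter-based overlap search follows. It assumes a heterozygous deletion, an invented 12-base example sequence, and a simple rule (the smallest offset at which every DHS position overlaps the position that many bases downstream). The names form_dhs, shared and find_offset are hypothetical illustrations, not the authors' program.

```python
from typing import List, Optional, FrozenSet


def form_dhs(allele1: str, allele2: str) -> List[FrozenSet[str]]:
    """Superimpose two alleles base by base; a position holds one base
    if homozygous and two bases if heterozygous."""
    return [frozenset((a, b)) for a, b in zip(allele1, allele2)]


def shared(dhs: List[FrozenSet[str]], i: int, k: int) -> FrozenSet[str]:
    """Nucleotide(s) overlapping between DHS positions i and i + k."""
    return dhs[i] & dhs[i + k]


def find_offset(dhs: List[FrozenSet[str]], start: int, max_k: int) -> Optional[int]:
    """Hypothetical rule: smallest offset k such that every position from
    `start` shares at least one base with the position k downstream."""
    for k in range(1, max_k + 1):
        if all(shared(dhs, i, k) for i in range(start, len(dhs) - k)):
            return k
    return None


wild = "GACTGACCTAAT"                 # invented example sequence
mutant = wild[:3] + wild[6:]          # heterozygous 3-bp deletion of "TGA"
dhs = form_dhs(wild, mutant)
start = next(i for i, pos in enumerate(dhs) if len(pos) == 2)  # first mixed position
k = find_offset(dhs, start, max_k=5)
print(start, k)  # 3 3
```

With the correct offset, each overlapped nucleotide is the wild-type base k positions downstream, which is what the recursive search exploits; every set operation here needs string handling rather than plain arithmetic.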

Methods: The previous text-based algorithm was converted to a number-based algorithm by re-expressing the DNA sequence in a 4-prime-number genetic code instead of the 4-letter genetic code, i.e., converting A, C, G and T to 2, 3, 5 and 7, respectively. This algorithm based on the prime-number genetic code is called PrimeIndel and can be executed in a spreadsheet. In the prime-number-coded DNA sequence, each DHS position is the product of 2 prime numbers, and the overlapped nucleotide between any 2 positions of the DHS is given by the greatest common divisor (GCD) of their products. This algorithm can also be used to align multiple overlapping sequence reads by forming a DHS in silico. The indel size of the in-silico DHS indicates the positions in the paired sequences for correct alignment.
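The prime-number version can be sketched with Python's math.gcd and math.isqrt. This is a minimal illustration under stated assumptions, not PrimeIndel's spreadsheet formulas: the helper names form_dhs, find_offset and decrypt, the smallest-offset rule, and the example sequences are hypothetical. It assumes the longer allele (or the upstream read) is superimposed with the shorter one from their first base.

```python
from math import gcd, isqrt
from typing import List, Optional

# Prime-number genetic code from the abstract: A, C, G, T -> 2, 3, 5, 7.
PRIME = {"A": 2, "C": 3, "G": 5, "T": 7}
BASE = {p: b for b, p in PRIME.items()}


def form_dhs(seq1: str, seq2: str) -> List[int]:
    """Each DHS position is the product of the 2 primes observed there;
    a prime square such as 4 = 2 * 2 means a homozygous A."""
    return [PRIME[a] * PRIME[b] for a, b in zip(seq1, seq2)]


def find_offset(dhs: List[int], start: int, max_k: int) -> Optional[int]:
    """Hypothetical rule: smallest offset k for which positions i and i + k
    always share a nucleotide, i.e. their GCD is greater than 1."""
    for k in range(1, max_k + 1):
        if all(gcd(dhs[i], dhs[i + k]) > 1 for i in range(start, len(dhs) - k)):
            return k
    return None


def decrypt(dhs: List[int], start: int, k: int) -> str:
    """Recover the longer sequence; 'N' marks positions left ambiguous."""
    w = {i: isqrt(p) for i, p in enumerate(dhs[:start])}  # homozygous prefix
    for i in range(start, len(dhs)):
        root = isqrt(dhs[i])
        if root * root == dhs[i]:
            w[i] = w[i + k] = root           # both alleles carry the same base
        elif i in w:
            w[i + k] = dhs[i] // w[i]        # recursive step: divide out known prime
        elif i + k < len(dhs):
            g = gcd(dhs[i], dhs[i + k])
            if g in BASE:                    # a single shared prime = overlapped base
                w[i + k] = g
                w[i] = dhs[i] // g
    return "".join(BASE.get(w.get(i), "N") for i in range(len(dhs) + k))


# Indel decryption: heterozygous deletion of "TGA" (invented example).
wild = "GACTGACCTAAT"
dhs = form_dhs(wild, wild[:3] + wild[6:])
start = next(i for i, p in enumerate(dhs) if isqrt(p) ** 2 != p)
k = find_offset(dhs, start, max_k=5)
print(k, decrypt(dhs, start, k))  # 3 GACTGACCTAAT

# Read alignment: 2 overlapping reads superimposed to form an in-silico DHS.
read1, read2 = "GACTGACCT", "TGACCTAAT"
offset = find_offset(form_dhs(read1, read2), 0, max_k=5)
print(offset, decrypt(form_dhs(read1, read2), 0, offset))  # 3 GACTGACCTAAT
```

When two positions k apart carry the same 2 bases, their GCD is a product rather than a single prime, so this sketch skips them and relies on the recursive division step to fill them in. On short sequences a smaller offset could pass the GCD test by chance, so the scoring rule here is only illustrative.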

Results: DHSs were successfully decrypted by the prime-number-based algorithm, and sequence reads were aligned correctly.

Conclusions: DNA sequence expressed in prime numbers can be used to decrypt DHSs and to align sequence reads using GCD, a well-known mathematical function available in spreadsheet programs. PrimeIndel is a useful tool for mutation reporting in clinical laboratories. The software is downloadable from http://www.patho.hku.hk/staff/list/cwlam.htm.

Keywords: Indel decryption; Prime-number genetic code; PrimeIndel; Sequence read alignment.

MeSH terms

  • Algorithms*
  • DNA Mutational Analysis / methods*
  • Heterozygote
  • INDEL Mutation*
  • Sequence Alignment / methods*
  • Statistics as Topic / methods*