Quick, sensitive and specific detection and evaluation of quantification of minor variants by high-throughput sequencing

Mol Biosyst. 2014 Feb;10(2):206-14. doi: 10.1039/c3mb70334g.

Abstract

Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise for which simulation cannot be performed. The information also enables better parameterization in depth of coverage, read quality and heterogeneity, library preparation techniques, technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication processes only require minor variants to: (1) have minimum depth of coverage larger than 30; (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), and with the interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could neither be overcome by ultra-deep sequencing, nor recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e. low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genetic Variation*
  • Genome, Bacterial
  • Haplotypes*
  • High-Throughput Nucleotide Sequencing / methods*
  • Mutation
  • Plasmids
  • Sensitivity and Specificity
  • Software
  • Viral Nonstructural Proteins / genetics*

Substances

  • NS protein, influenza virus
  • Viral Nonstructural Proteins