Reproducibility of Variant Calls in Replicate Next Generation Sequencing Experiments

PLoS One. 2015 Jul 2;10(7):e0119230. doi: 10.1371/journal.pone.0119230. eCollection 2015.

Abstract

Nucleotide alterations detected by next-generation sequencing are not always true biological changes; some may be sequencing errors. Even highly accurate methods can yield a substantial number of errors when applied to millions of nucleotides. In this study, we examined the reproducibility of nucleotide variant calls in replicate sequencing experiments of the same genomic DNA. We performed targeted sequencing of all known human protein kinase genes (the kinome, ~3.2 Mb) using the SOLiD v4 platform. Seventeen breast cancer samples were sequenced in duplicate (n=14) or triplicate (n=3) to assess concordance of all calls and of single nucleotide variant (SNV) calls. The concordance rates over the entire sequenced region were >99.99%, while the concordance rates for SNVs were 54.3-75.5%. There was substantial variation in basic sequencing metrics from experiment to experiment. The type of nucleotide substitution and the genomic location of the variant had little impact on concordance, but concordance increased with coverage level, variant allele count (VAC), variant allele frequency (VAF), variant allele quality and p-value of the SNV call. The most important determinants of concordance were VAC and VAF. Even using the highest stringency of QC metrics, the reproducibility of SNV calls was around 80%, suggesting that erroneous variant calling can be as high as 20-40% in a single experiment. The sequence data have been deposited into the European Genome-phenome Archive (EGA) with accession number EGAS00001000826.
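
As an illustration of the two quantities discussed in the abstract, the following is a minimal Python sketch, not the authors' pipeline. The abstract does not define how SNV concordance was computed, so the sketch assumes it is the number of calls shared by two replicates divided by the number of calls made in either replicate. The function names (`snv_concordance`, `filter_calls`), the toy variant records and the VAC/VAF thresholds are all hypothetical and chosen only for demonstration.

```python
def snv_concordance(calls_a, calls_b):
    """Shared calls divided by calls seen in either replicate.

    ASSUMPTION: this intersection-over-union definition is illustrative;
    the abstract does not state the exact formula used in the study.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 1.0  # no calls in either replicate: trivially concordant
    return len(a & b) / len(union)


def filter_calls(calls, min_vac=0, min_vaf=0.0):
    """Keep variants whose allele count (VAC) and frequency (VAF) pass thresholds."""
    return {
        key
        for key, (vac, vaf) in calls.items()
        if vac >= min_vac and vaf >= min_vaf
    }


# Hypothetical toy data: (chrom, pos, ref, alt) -> (VAC, VAF). Not from the study.
rep1 = {
    ("chr1", 1000, "A", "G"): (25, 0.48),
    ("chr2", 2000, "C", "T"): (12, 0.30),
    ("chr3", 3000, "G", "A"): (3, 0.05),
    ("chr7", 4000, "T", "C"): (2, 0.04),
}
rep2 = {
    ("chr1", 1000, "A", "G"): (22, 0.45),
    ("chr2", 2000, "C", "T"): (10, 0.28),
    ("chr9", 5000, "G", "C"): (2, 0.03),
}

# Concordance over all calls, then over calls passing arbitrary example thresholds.
unfiltered = snv_concordance(rep1, rep2)
filtered = snv_concordance(
    filter_calls(rep1, min_vac=10, min_vaf=0.2),
    filter_calls(rep2, min_vac=10, min_vaf=0.2),
)

# Scale check using figures from the abstract: a ~3.2 Mb target at exactly
# 99.99% overall concordance would still leave about this many discordant positions.
max_discordant = round(3_200_000 * (1 - 0.9999))

print(f"unfiltered SNV concordance: {unfiltered:.2f}")
print(f"filtered SNV concordance:   {filtered:.2f}")
print(f"discordant positions at 99.99% over 3.2 Mb: {max_discordant}")
```

In this toy example the low-VAC, low-VAF calls are the ones that fail to replicate, so filtering raises concordance. That mirrors the direction of the reported effect only; the numbers themselves carry no meaning. The final line shows why a >99.99% overall concordance is compatible with much lower SNV concordance: a small fraction of a multi-megabase target is still hundreds of positions.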

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Artifacts*
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / pathology
  • DNA / genetics
  • Female
  • Gene Frequency
  • Genome, Human
  • High-Throughput Nucleotide Sequencing / statistics & numerical data*
  • Humans
  • Polymorphism, Single Nucleotide*
  • Protein Kinases / genetics*
  • Reproducibility of Results
  • Sequence Analysis, DNA / statistics & numerical data*

Substances

  • DNA
  • Protein Kinases