According to our literature search, WGS is currently applied in medical genetics and in studies of heritable monogenic disorders, both of which interrogate germline variants. The studies reported in the previous paragraph usually call variants following the gold-standard GATK Best Practices pipeline (https://software.broadinstitute.org/gatk/best-practices/), supplemented in cancer studies by somatic variant callers [do Valle and others 2016]. From a technical point of view, the critical step in such a pipeline is variant calling, which must be precise and appropriate both to the WGS coverage and to the type of experiment. Our literature search indicates that the most accurate variant calls for 30x human WGS were recently reported in the PrecisionFDA Truth Challenge (https://precision.fda.gov/challenges/truth/results). F-score values (the harmonic mean of recall and precision) reached 99.9587% for single nucleotide variants (SNVs) and 99.4009% for short indels. DeepVariant [Poplin et al., bioRxiv], the tool that won the challenge, is the first variant-calling method built on the TensorFlow machine learning framework. We therefore ask whether introducing machine learning into medical genomics could significantly improve the precision of variant analysis. To test this hypothesis, we compared DeepVariant with recently used methods on an independent NA12878 sample sequenced in our laboratory. Furthermore, we used the newest reference genome, GRCh38.p10 [Speir and others 2016], whereas Poplin et al. called variants against GRCh37. Results generated by DeepVariant were compared with those of GATK 4.0 (the gold-standard pipeline) and SpeedSeq (an efficient pipeline).
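For readers unfamiliar with the metric, the F-score quoted above can be computed from the counts of true-positive, false-positive and false-negative variant calls obtained by comparing a call set with a truth set. The snippet below is a minimal sketch of that arithmetic only; the function name and the example counts are hypothetical illustrations and are not taken from the PrecisionFDA results or from our data.

```python
def variant_calling_metrics(tp, fp, fn):
    """Return (precision, recall, f_score) from variant-call counts.

    tp: calls that match the truth set (true positives)
    fp: calls absent from the truth set (false positives)
    fn: truth-set variants that were not called (false negatives)
    """
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts, chosen only to illustrate the calculation.
precision, recall, f_score = variant_calling_metrics(tp=8, fp=2, fn=8)
print(f"precision={precision:.4f} recall={recall:.4f} F-score={f_score:.4f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a caller cannot reach a high F-score by trading many missed variants for few false calls, or the reverse.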