TSomVar: a tumor-only somatic and germline variant identification method with random forest

Brief Bioinform. 2022 Sep 20;23(5):bbac381. doi: 10.1093/bib/bbac381.

Abstract

Somatic variants play critical roles in the onset and development of cancer, so an accurate and robust method for identifying them is a foundation of cancer genome research. However, because somatic variants in tumor samples have low accessibility and are highly individual- and sample-specific, their detection remains challenging, particularly when no paired normal sample is available as a control. To address this problem, we developed TSomVar, a tumor-only somatic and germline variant identification method that uses a random forest algorithm built on sample-specific variant datasets derived from genotype imputation, read-mapping-level annotation and functional annotation. We trained TSomVar on genomic variant datasets from three major cancer types: colorectal cancer, hepatocellular carcinoma and skin cutaneous melanoma. Compared with existing tumor-only somatic variant identification tools, TSomVar detects somatic variants with higher accuracy and better recall on test datasets from colorectal cancer and skin cutaneous melanoma. In addition, TSomVar can accurately identify germline variants in tumor samples. Taken together, TSomVar should facilitate the exploration of somatic variants in cancer research.
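As a rough illustration of the general idea of classifying variants as somatic or germline with a random forest, the sketch below trains a small ensemble of one-split trees (decision stumps) on bootstrap samples with random feature subsets and predicts by majority vote. It is not the TSomVar implementation: the three features (variant allele fraction, population allele frequency, and a flag for whether an imputed genotype supports the variant), the toy training rows, and all function names and parameters are assumptions made for illustration only.

```python
import random
from collections import Counter

# Hypothetical per-variant features (NOT the published TSomVar feature set):
# [variant allele fraction, population allele frequency, imputation supports variant (0/1)]
TRAIN = [
    ([0.48, 0.210, 1], "germline"),
    ([0.52, 0.050, 1], "germline"),
    ([0.99, 0.330, 1], "germline"),
    ([0.45, 0.012, 1], "germline"),
    ([0.12, 0.000, 0], "somatic"),
    ([0.23, 0.000, 0], "somatic"),
    ([0.08, 0.001, 0], "somatic"),
    ([0.31, 0.000, 0], "somatic"),
]

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_stump(rows, feature_ids):
    """Return (feature, threshold, left_label, right_label) with the lowest weighted Gini.

    If no split improves on the unsplit node, feature and threshold are None
    and both labels are the majority class.
    """
    all_labels = [lbl for _, lbl in rows]
    majority = Counter(all_labels).most_common(1)[0][0]
    best, best_score = (None, None, majority, majority), gini(all_labels)
    for f in feature_ids:
        for thr in sorted({x[f] for x, _ in rows}):
            left = [lbl for x, lbl in rows if x[f] <= thr]
            right = [lbl for x, lbl in rows if x[f] > thr]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_score = score
                best = (f, thr,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best

def train_forest(rows, n_trees=25, n_features=2, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_total = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]        # sample with replacement
        feats = rng.sample(range(n_total), n_features)   # random feature subset
        forest.append(best_stump(sample, feats))
    return forest

def predict(forest, x):
    """Majority vote over all stumps for one feature vector."""
    votes = Counter()
    for f, thr, left, right in forest:
        votes[left if f is None or x[f] <= thr else right] += 1
    return votes.most_common(1)[0][0]

forest = train_forest(TRAIN)
print(predict(forest, [0.15, 0.0, 0]))  # a low-VAF variant absent from the population
```

A production classifier would use deeper trees, many more features and far larger training sets; an established library implementation would normally be preferred over hand-written stumps.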

Keywords: genotype imputation; identification method; random forest; somatic variant; tumors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Colorectal Neoplasms*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Melanoma* / genetics
  • Melanoma, Cutaneous Malignant
  • Neoplasms* / genetics
  • Skin Neoplasms* / genetics