Understanding the function of rare non-coding genetic variants remains a significant challenge. Here, we developed MapUTR, a screen to identify rare 3’ UTR variants that affect mRNA abundance post-transcriptionally. Among 17,301 rare variants tested, an average of 24.5% were functional, 70% of which fell in cancer-related genes, many of them in critical cancer pathways. This observation motivated a further interrogation of 11,929 cancer somatic mutations, uncovering 3,928 (33%) functional mutations in well-established cancer driver genes such as CDKN2A. Functional MapUTR variants were enriched in miRNA target sites and protein-RNA interaction sites. Based on MapUTR, we defined a new metric, untranslated tumor mutation burden (uTMB), which reflects the number of functional somatic MapUTR variants in a tumor, and we showed its potential for predicting patient survival. Using prime editing, we characterized three variants in cancer-relevant genes (MFN2, FOSL2, and IRAK1), illustrating their cancer-driving potential. Our study elucidates the function of thousands of non-coding variants, nominates non-coding cancer driver mutations, and demonstrates their potential contributions to cancer.
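uTMB is described here only as reflecting the number of functional somatic MapUTR variants in a tumor; the exact formula (for example, any normalization by the amount of 3’ UTR sequence assessed) is not given. The sketch below assumes the simplest reading, a raw count of distinct somatic mutations per tumor that the screen classified as functional. The function name `compute_utmb` and the (tumor_id, variant_id) input format are hypothetical.

```python
def compute_utmb(somatic_calls, functional_variants):
    """Hypothetical uTMB sketch: count distinct functional MapUTR variants per tumor.

    somatic_calls: iterable of (tumor_id, variant_id) pairs (assumed input format).
    functional_variants: set of variant_ids classified as functional by the screen.
    Returns {tumor_id: count}; tumors with no functional variants get 0.
    Assumption: a raw count, with no normalization by 3' UTR territory.
    """
    seen = set()
    utmb = {}
    for tumor_id, variant_id in somatic_calls:
        utmb.setdefault(tumor_id, 0)
        if (tumor_id, variant_id) in seen:
            continue  # skip duplicate calls of the same mutation in one tumor
        seen.add((tumor_id, variant_id))
        if variant_id in functional_variants:
            utmb[tumor_id] += 1
    return utmb


calls = [("T1", "v1"), ("T1", "v2"), ("T1", "v1"), ("T2", "v3")]
print(compute_utmb(calls, functional_variants={"v1", "v2"}))  # {'T1': 2, 'T2': 0}
```

Counting distinct (tumor, variant) pairs keeps a mutation that is reported twice for the same tumor from inflating its score; a per-megabase normalization, if used, would be a final division step on top of this count.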
Overall design: To test a variant of interest, both the reference and the alternative alleles were designed into 200-nt oligos synthesized by Twist Bioscience. Each oligo contains a 164-nt sequence centered on the variant plus adaptor sequences for cloning. The oligos were then cloned into the 3’ UTR of the eGFP gene, whose expression is driven by the CAG promoter, in the plasmid reporter. For each biological replicate, we introduced 3 µg of the plasmid library into 15 million HEK293 or HeLa cells by electroporation; three biological replicates were performed in total. Total mRNA was isolated from the cells 24 h after electroporation, reverse-transcribed into cDNA, and converted into sequencing libraries through stepwise PCR. Specifically, a unique molecular identifier (UMI) was added to each cDNA transcript in the first round of PCR, and the products were further amplified in a second round of PCR to add Illumina sequencing adaptors. A similar stepwise PCR protocol was applied to the plasmid DNA to generate DNA sequencing libraries. The DNA and RNA sequencing libraries were pooled and sequenced on a HiSeq 3000 (PE150) or NovaSeq SP (PE150) with a 15% PhiX spike-in. For data analysis, we extracted the UMI from each read to remove PCR duplicates.
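The text states only that UMIs were extracted to remove PCR duplicates. As an illustration of how deduplicated counts can be turned into an allele comparison, the sketch below collapses reads that share an oligo ID and UMI, computes a per-oligo log2 ratio of library-size-normalized RNA to DNA counts, and subtracts the reference from the alternative allele. The (oligo_id, umi) input format, the `v1_ref`/`v1_alt` IDs, the counts-per-million normalization, and the pseudocount are all assumptions, not the authors' pipeline.

```python
import math
from collections import defaultdict


def count_unique_umis(reads):
    """Collapse PCR duplicates: reads sharing an oligo ID and a UMI count once.

    reads: iterable of (oligo_id, umi) pairs, assumed to be already parsed
    from the sequencing reads (hypothetical upstream step).
    """
    umis = defaultdict(set)
    for oligo_id, umi in reads:
        umis[oligo_id].add(umi)
    return {oligo_id: len(seen) for oligo_id, seen in umis.items()}


def log2_rna_dna(rna_counts, dna_counts, pseudocount=1.0):
    """Per-oligo log2 ratio of library-size-normalized RNA to DNA counts.

    Oligos absent from the DNA library are skipped; the pseudocount avoids
    taking the log of zero. The normalization scheme is an assumption.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    ratios = {}
    for oligo_id, dna in dna_counts.items():
        rna = rna_counts.get(oligo_id, 0)
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        ratios[oligo_id] = math.log2(rna_cpm / dna_cpm)
    return ratios


def allele_effect(ratios, ref_id, alt_id):
    """Alt minus ref log2 ratio; negative values mean the alt allele lowers mRNA abundance."""
    return ratios[alt_id] - ratios[ref_id]


rna_reads = [("v1_ref", "AAAA"), ("v1_ref", "AAAA"), ("v1_ref", "CCCC"), ("v1_alt", "GGGG")]
dna_reads = [("v1_ref", "TTTT"), ("v1_alt", "ACGT")]
ratios = log2_rna_dna(count_unique_umis(rna_reads), count_unique_umis(dna_reads))
print(round(allele_effect(ratios, "v1_ref", "v1_alt"), 3))  # -0.585
```

Normalizing RNA to DNA corrects for uneven oligo representation in the plasmid library. With three replicates, a real analysis would compute this effect per replicate and apply a statistical test before calling a variant functional; that step is not modeled here.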
Please note that each C#Sp#.fasta file contains the library sequences for Chip # Subpool # (e.g., C1Sp1.fasta contains the Chip 1 Subpool 1 library sequences).
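The naming convention above can be parsed programmatically when loading the library files. The sketch below extracts the chip and subpool numbers from a file name and reads the sequences with a minimal FASTA parser. The function names are hypothetical, and because the header format inside these files is not described here, the parser keeps each header string as-is without interpreting it.

```python
import re
from pathlib import Path

# Matches names such as "C1Sp1.fasta" (Chip 1, Subpool 1).
_LIBRARY_NAME = re.compile(r"^C(\d+)Sp(\d+)\.fasta$")


def parse_library_name(path):
    """Return (chip, subpool) as integers from a C#Sp#.fasta file name."""
    match = _LIBRARY_NAME.match(Path(path).name)
    if match is None:
        raise ValueError(f"not a C#Sp#.fasta file name: {path}")
    return int(match.group(1)), int(match.group(2))


def read_fasta(lines):
    """Minimal FASTA parser: returns {header: sequence} from an iterable of lines."""
    records = {}
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

For example, `parse_library_name("C1Sp1.fasta")` returns `(1, 1)`, and passing an open file handle to `read_fasta` loads that subpool's oligo sequences, since a text file iterates line by line.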