Series GSE166189
Status Public on Jun 30, 2021
Title CASowary: CRISPR-cas13 guide RNA predictor for transcript depletion
Organism Homo sapiens
Experiment type Other
Summary The recent discovery of the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and CRISPR-associated protein (Cas) gene-editing system has led to its widespread use for studying a variety of biological systems, because it enables large-scale perturbation of genomes and transcriptomes. Cas13, a lesser-studied Cas protein, has been repurposed to allow efficient and precise editing of RNA molecules. The Cas13 system relies on base complementarity between a crRNA/sgRNA (CRISPR RNA or single guide RNA) and a target RNA transcript to bind preferentially to the target transcript alone. Compared with the upstream regulatory regions of protein-coding genes in the genome, the transcriptome is significantly more redundant, so many transcripts share long stretches of identical nucleotide sequence. Transcripts also adopt complex three-dimensional structures and interact with an array of RNA-binding proteins (RBPs), both of which further limit the set of effective target sequences. As a result, no method currently exists to predict whether a specific sgRNA will effectively knock down a transcript. Here we present CASowary, a machine learning tool that predicts the efficacy of an sgRNA. We used publicly available RNA knockdown data from Cas13 characterization experiments [1] for 555 sgRNAs targeting the transcriptome in HEK293 cells, in conjunction with transcriptome-wide protein occupancy information on RNA [2]. Our model uses a Decision Tree architecture with a set of 112 sequence and target availability features to classify sgRNA efficacy into one of four classes, based on the expected level of target transcript knockdown. After accounting for noise in the training data set, the noise-normalized accuracy exceeds 90%.
Additionally, highly effective sgRNA predictions have been experimentally validated using an independent RNA-targeting Cas system, CIRTS [3], confirming the robustness and reproducibility of our model's sgRNA predictions. In particular, several highly efficient sgRNAs designed with our model against the SMARCA4 gene showed strong agreement with experimental data supporting a 10-fold decrease in expression. By using transcriptome-wide protein occupancy information, CASowary can predict high-quality guides for different transcripts in a cell-specific manner. Applying CASowary to whole transcriptomes should enable rapid deployment of CRISPR/Cas13 systems and facilitate the development of therapeutic interventions for aberrations in RNA regulatory processes.
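The base-complementarity principle described in the summary can be illustrated with a minimal sketch. It is not CASowary's implementation: the function names, the `guide_len` parameter, and the GC-fraction feature are hypothetical and chosen only for illustration, and the 112 features used by the actual model are not listed in this record.

```python
# Illustrative sketch only: enumerate candidate guide sequences that are
# complementary to windows of a target transcript, and compute one simple
# sequence feature. All names and parameters here are assumptions.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases; an example of a simple sequence feature."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def candidate_guides(transcript: str, guide_len: int):
    """Yield (start, target_site, guide) for every window of guide_len bases.

    The guide is the reverse complement of the target site, so it can
    base-pair with that site on the transcript.
    """
    rna = transcript.upper().replace("T", "U")  # accept DNA-style input
    for start in range(len(rna) - guide_len + 1):
        site = rna[start:start + guide_len]
        yield start, site, reverse_complement_rna(site)


if __name__ == "__main__":
    # Toy sequence and window length, not real data or a recommended length.
    for start, site, guide in candidate_guides("AUGGCUACGUAGC", guide_len=8):
        print(start, site, guide, round(gc_fraction(guide), 2))
```

A real predictor would score each candidate with many more features, including the protein occupancy of the target site, before classifying its expected knockdown level.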
 
Overall design To test the efficacy of CASowary in designing potential sgRNAs, we mapped protein-occupied sites in HeLa cells using Protein Occupancy Profile-Sequencing (POP-seq) in three variants: NPOP-seq (no crosslinking), UPOP-seq (UV crosslinking) and FPOP-seq (formaldehyde crosslinking), as described previously by Srivastava et al., Sci Rep 11, 1175 (2021).
 
Contributor(s) Srivastava M, Krohannon A, Srivastava R, Janga S
Citation(s) 35236300
NIH grant(s)
Grant ID Grant title Affiliation Name
R01 GM123314 Mapping RNA protein interaction networks in the human genome INDIANA UNIVERSITY Sarath Chandra Janga
Submission date Feb 04, 2021
Last update date Mar 14, 2022
Contact name Rajneesh Srivastava
E-mail(s) rsrivast@pitt.edu
Organization name McGowan Institute for Regenerative Medicine
Department Surgery, University of Pittsburgh
Street address 450 Technology Dr
City Pittsburgh
State/province Pennsylvania
ZIP/Postal code 15219
Country USA
 
Platforms (1)
GPL18573 Illumina NextSeq 500 (Homo sapiens)
Samples (6)
GSM5065623 NPOP_R1 HeLa
GSM5065626 NPOP_R2 HeLa
GSM5065628 FPOP_R1 HeLa
Relations
BioProject PRJNA699563
SRA SRP304756

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE166189_FPOP_peaks.tsv.gz 796.7 Kb (ftp)(http) TSV
GSE166189_NPOP_peaks.tsv.gz 200.1 Kb (ftp)(http) TSV
GSE166189_UPOP_peaks.tsv.gz 1.3 Mb (ftp)(http) TSV
Raw data are available in SRA
Processed data are available on Series record
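The supplementary peak files listed above are gzip-compressed TSV files, so they can be inspected with the Python standard library alone. The sketch below assumes one of the files has already been downloaded to the working directory. The column layout is not described in this record, so the code makes no assumption about headers and returns raw rows. The function name `read_peaks` and its `limit` parameter are hypothetical.

```python
# Illustrative sketch: read the first rows of a gzip-compressed TSV file.
# The column layout of the GEO peak files is not documented here, so rows
# are returned as plain lists of strings without interpreting any header.
import csv
import gzip


def read_peaks(path, limit=None):
    """Return rows of a .tsv.gz file as lists of strings.

    If limit is given, stop after that many rows.
    """
    rows = []
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for index, row in enumerate(reader):
            if limit is not None and index >= limit:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Assumes the file was downloaded manually from the Series record.
    for row in read_peaks("GSE166189_NPOP_peaks.tsv.gz", limit=5):
        print(row)
```

Printing the first few rows is a quick way to check the file's column layout before writing any downstream parsing.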
