Status |
Public on Mar 28, 2016 |
Title |
A1_N704_S502 |
Sample type |
SRA |
|
|
Source name |
Cerebral Cortex
|
Organism |
Homo sapiens |
Characteristics |
tissue: Cerebral Cortex developmental stage: GW20.5
|
Extracted molecule |
polyA RNA |
Extraction protocol |
Single-cell capture and cell lysis using the Fluidigm C1. RT and whole-transcriptome amplification on the Fluidigm C1 IFC; libraries indexed with the Illumina Nextera DNA Sample Prep Kit
|
|
|
Library strategy |
ncRNA-Seq |
Library source |
transcriptomic |
Library selection |
size fractionation |
Instrument model |
Illumina HiSeq 2500 |
|
|
Description |
polyA RNA single cell
|
Data processing |
Bulk RNA-seq: Strand-specific reads were aligned to the human reference genome, Ensembl GRCh37/hg19 release 75, using TopHat v2.0.10 with the flags (--library-type fr-firststrand --microexon-search). De novo transcriptome assembly was performed separately on rRNA-depleted total RNA-seq alignments and on polyA-selected RNA-seq alignments, using Cufflinks v2.2.1 with the flags (-M ensembl_75_mtRNA_rRNA.gtf -b genome.fa -u --library-type fr-firststrand --max-multiread-fraction 0.25 --3-overhang-tolerance 2000). Transcriptome assemblies from all developmental stages and replicates were merged, separately for total RNA-seq and polyA RNA-seq, with the Ensembl 75/GENCODE 19 reference transcriptome using Cuffmerge. To identify transcripts novel relative to Ensembl, we used Cuffcompare class codes and extracted assembled transcripts classified as: i – novel intronic, u – novel intergenic, x – novel antisense. All novel transcripts under 200 nt in length were removed. For the remaining transcripts, we determined minimal read coverage thresholds based on whether Cufflinks classified previously annotated transcripts as having “full_read_support”. By analyzing the true positive rate vs. false positive rate of classifying known genes as having “full_read_support” at various coverage thresholds, we determined the minimum coverage to be 1.4 for polyA and 1.67 for total RNA-seq (at FDR = 0.05). Starting with just the polyA RNA-seq data, transcripts with read coverage above 1.4 in both biological replicates of at least one developmental stage were included in the reference and considered to be expressed in the neocortex. Due to limited availability of early fetal tissue, the GW14.5 sample was treated as the biological duplicate of the GW13 sample.
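The novel-transcript filter described above (class code i, u, or x; length of at least 200 nt; coverage above the threshold in both replicates of at least one stage) can be sketched as follows. This is a minimal illustration, not the submitters' code: the `Transcript` record, its field names, and the way coverage is stored per stage are all assumptions made for the example. Only the class codes, the 200 nt cutoff, and the polyA threshold of 1.4 come from the text.

```python
from dataclasses import dataclass

# Values taken from the description above.
NOVEL_CLASS_CODES = {"i", "u", "x"}  # novel intronic, intergenic, antisense
MIN_LENGTH_NT = 200
POLYA_MIN_COVERAGE = 1.4

@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative only.
    transcript_id: str
    class_code: str   # Cuffcompare class code
    length: int       # transcript length in nt
    coverage: dict    # stage -> (replicate 1 coverage, replicate 2 coverage)

def is_expressed_novel(t, min_cov=POLYA_MIN_COVERAGE):
    """Return True if t passes the class-code, length, and coverage filters."""
    if t.class_code not in NOVEL_CLASS_CODES:
        return False
    if t.length < MIN_LENGTH_NT:
        return False
    # Coverage must exceed the threshold in BOTH replicates of at least one stage.
    return any(all(c > min_cov for c in reps) for reps in t.coverage.values())
```

For total RNA-seq, the same function would be called with `min_cov=1.67`.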
Novel transcripts that were predicted to have protein-coding capability by one or more of the following methods were classified as transcripts of uncertain coding potential (TUCP): CPAT, threshold = 0.364; CPC, threshold = 0; Pfam. For comparison against the Pfam database, the longest potential open reading frame (ORF) of each novel transcript was obtained, and any putative ORF with a significant match to a protein domain annotated in Pfam A or Pfam B resulted in the parent transcript being classified as a TUCP. All remaining novel lncRNAs and TUCPs were then named according to recently proposed nomenclature standards, for instance LINC-[nearest mRNA] for intergenic lncRNAs and [nearest mRNA]-AS for antisense lncRNAs, and were then merged with the Ensembl 75 reference transcriptome, resulting in the polyA Full reference transcriptome. The polyA Stringent reference transcriptome was produced by removing all novel single-exon lncRNAs and TUCPs. Known lncRNAs from Ensembl were obtained by identifying transcripts with one of the following biotype classifications: “3prime_overlapping_ncrna”, “antisense”, “lincRNA”, “processed_transcript”, “sense_intronic”, and “sense_overlapping”. The same pipeline, with the coverage threshold of 1.67, was applied to reads derived from the total RNA-seq. Gene-level fragment counts for each polyA and total RNA sample were quantified using featureCounts v1.4.6, using the flags: -p -s 2 -B -C -t exon -g gene_id. Count tables were normalized to TPM (Transcripts per Million) for internal comparisons and visualizations of bulk RNA-seq. To identify differentially expressed genes, we used DESeq2 on gene-level fragment counts derived from the polyA samples and the polyA Full reference transcriptome. Pairwise negative binomial significance tests were performed between developmental stages using biological duplicates, and the union of genes significant at FDR < 0.01 in any comparison was classified as differentially expressed.
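As an illustration of the TPM normalization mentioned above, the sketch below converts gene-level counts to TPM using the standard definition (length-normalized rate divided by the sum of all rates, times one million). The record does not state which gene length was used, so the `lengths_bp` input is an assumption, and the function name is hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert gene-level counts to TPM.

    counts:     dict gene_id -> fragment count for one sample
    lengths_bp: dict gene_id -> gene length in bp (assumed input; the
                record does not say how lengths were defined)
    """
    # Length-normalized rate per gene.
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    # Scale so that the values in one sample sum to one million.
    return {g: r / total * 1e6 for g, r in rates.items()}
```

Because every sample is scaled to the same total, TPM values are convenient for visualization, while the differential expression step described above operates on raw fragment counts.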
Single-cell RNA-seq: Paired-end 100 bp reads from single-cell cDNA libraries were quality trimmed using Trim Galore with the flags: -q 20 --nextera --length 20. Trimmed reads were aligned to the human reference genome, Ensembl GRCh37/hg19 release 75, augmented with the 92 ERCC Spike-In Control sequences, using TopHat v2.0.10 with the flags: --transcriptome-index=polya_stringent_reference.gtf --prefilter-multihits. The polyA Stringent reference transcriptome, derived from whole-tissue RNA-seq as described above, was used as a transcriptome guide. Gene-level fragment counts were quantified using featureCounts v1.4.6 with the flags: -p -B -C. Counts were normalized by transcriptome size factors according to DESeq. An additional 50 single-cell libraries, deposited under SRP041736, were also included.
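The size-factor normalization mentioned above can be illustrated with the median-of-ratios estimator associated with DESeq. This is a sketch under assumptions: the record does not say which genes entered the calculation or how zero counts were handled, and the function name and input layout are hypothetical. Here, only genes with a nonzero count in every sample contribute, which is the usual convention for this estimator.

```python
import math
from statistics import median

def deseq_size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict sample_id -> list of gene counts, same gene order in
            every sample (hypothetical layout for this example).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene, using only genes nonzero in all samples.
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        # Size factor = median ratio of this sample to the per-gene geometric mean.
        factors[s] = math.exp(median(log_ratios))
    return factors
```

Normalized counts are obtained by dividing each sample's counts by its size factor. With sparse single-cell data, few or no genes may be nonzero in every cell, in which case `median` raises an error on the empty list; a real pipeline would need to handle that case explicitly.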
|
|
|
Submission date |
Jul 24, 2015 |
Last update date |
May 15, 2019 |
Contact name |
John Liu |
E-mail(s) |
john.liu@ucsf.edu
|
Organization name |
UCSF
|
Street address |
35 Medical Center Way
|
City |
SAN FRANCISCO |
State/province |
California |
ZIP/Postal code |
94143 |
Country |
USA |
|
|
Platform ID |
GPL16791 |
Series (1) |
GSE71315 |
Single cell analysis of long non-coding RNAs in the developing human neocortex |
|
Relations |
BioSample |
SAMN03922181 |
SRA |
SRX1117356 |
Supplementary data files not provided |
SRA Run Selector |
Raw data are available in SRA |
Processed data are available on Series record |