GEO Accession viewer

Sample GSM1832379

Status

Public on Mar 28, 2016

Title

A1_N704_S502

Sample type

SRA

Source name

Cerebral Cortex

Organism

Homo sapiens

Characteristics

tissue: Cerebral Cortex
developmental stage: GW20.5

Extracted molecule

polyA RNA

Extraction protocol

Single-cell capture and cell lysis using Fluidigm C1.
RT and whole transcriptome amplification on the Fluidigm C1 IFC; library indexed with the Illumina Nextera DNA Sample Prep Kit.

Library strategy

ncRNA-Seq

Library source

transcriptomic

Library selection

size fractionation

Instrument model

Illumina HiSeq 2500

Description

polyA RNA
single cell

Data processing

Bulk RNA-seq: Strand-specific reads were aligned to the human reference genome, Ensembl GRCh37/hg19 release 75, using TopHat v2.0.10 with the flags (--library-type fr-firststrand --microexon-search).
De novo transcriptome assembly was performed separately on the rRNA-depleted total RNA-seq alignments and on the polyA-selected RNA-seq alignments, using Cufflinks v2.2.1 with the flags (-M ensembl_75_mtRNA_rRNA.gtf -b genome.fa -u --library-type fr-firststrand --max-multiread-fraction 0.25 --3-overhang-tolerance 2000).
Transcriptome assemblies from all developmental stages and replicates were merged with the Ensembl 75/GENCODE 19 reference transcriptome using Cuffmerge, separately for rRNA-depleted total RNA-seq and polyA-selected RNA-seq.
To identify transcripts that are novel relative to Ensembl, we used Cuffcompare class codes and extracted the assembled transcripts classified as i (novel intronic), u (novel intergenic), or x (novel antisense). All novel transcripts under 200 nt in length were removed.
For the remaining transcripts, we determined minimum read coverage thresholds based on whether Cufflinks classified previously annotated transcripts as having “full_read_support”. By analyzing the true positive rate vs. false positive rate of classifying known genes as having “full_read_support” at various coverage thresholds, we determined the minimum coverage to be 1.4 for polyA and 1.67 for total RNA-seq (at FDR = 0.05).
Starting with just the polyA RNA-seq data, transcripts with read coverage above 1.4 in both biological replicates of at least one developmental stage were included in the reference and considered to be expressed in the neocortex. Due to limited availability of early fetal tissue, the GW14.5 sample was treated as the biological replicate of the GW13 sample.
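The novel-transcript filter described above (class code i, u, or x; length of at least 200 nt; coverage above the threshold in both replicates of at least one stage) can be sketched as follows. This is a minimal illustration, not the submitters' code: the `Transcript` record, its field names, and the stage labels used in the example are hypothetical, and only the polyA threshold of 1.4 is taken from the record.

```python
from dataclasses import dataclass, field

# Cuffcompare class codes named in the record:
# i = novel intronic, u = novel intergenic, x = novel antisense.
NOVEL_CLASS_CODES = {"i", "u", "x"}
MIN_LENGTH_NT = 200          # transcripts under 200 nt are removed
POLYA_MIN_COVERAGE = 1.4     # polyA threshold stated in the record


@dataclass
class Transcript:
    """Hypothetical record; field names are assumptions for this sketch."""
    transcript_id: str
    class_code: str
    length_nt: int
    # Maps a developmental stage label to per-replicate read coverage.
    coverage: dict = field(default_factory=dict)


def is_expressed(tx: Transcript, min_cov: float) -> bool:
    """True if both replicates of at least one stage exceed min_cov."""
    return any(
        len(reps) >= 2 and all(c > min_cov for c in reps)
        for reps in tx.coverage.values()
    )


def keep_novel(tx: Transcript, min_cov: float = POLYA_MIN_COVERAGE) -> bool:
    """Apply the class-code, length, and coverage filters in turn."""
    return (
        tx.class_code in NOVEL_CLASS_CODES
        and tx.length_nt >= MIN_LENGTH_NT
        and is_expressed(tx, min_cov)
    )


# Illustrative values only; these are not data from the study.
example = Transcript("TCONS_example", "u", 850,
                     {"stage_A": [2.1, 1.6], "stage_B": [0.3, 0.0]})
print(keep_novel(example))
```

For the total RNA-seq data the same function would be called with `min_cov=1.67`, the threshold stated in the record.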
Novel transcripts that were predicted to have protein-coding capability by one or more of the following methods were classified as transcripts of uncertain coding potential (TUCP): CPAT, threshold = 0.364; CPC, threshold = 0; Pfam. For the comparison with the Pfam database, the longest potential open reading frame (ORF) of each novel transcript was obtained, and any putative ORF with a significant match to a protein domain annotated in Pfam A or Pfam B resulted in the parent transcript being classified as a TUCP.
All remaining novel lncRNAs and TUCPs were then named according to recently proposed nomenclature standards, for instance LINC-[nearest mRNA] for intergenic lncRNAs and [nearest mRNA]-AS for antisense lncRNAs, and were then merged with the Ensembl 75 reference transcriptome, resulting in the polyA Full reference transcriptome. The polyA Stringent reference transcriptome was produced by removing all novel single-exon lncRNAs and TUCPs.
Known lncRNAs from Ensembl were obtained by identifying transcripts with one of the following biotype classifications: “3prime_overlapping_ncrna”, “antisense”, “lincRNA”, “processed_transcript”, “sense_intronic”, and “sense_overlapping”.
The same pipeline, with the coverage threshold of 1.67, was applied to reads derived from the total RNA-seq.
Gene-level fragment counts for each polyA and total RNA sample were quantified using featureCounts v1.4.6 with the flags: -p -s 2 -B -C -t exon -g gene_id. Count tables were normalized to TPM (Transcripts per Million) for internal comparisons and visualizations of bulk RNA-seq.
To identify differentially expressed genes, we used DESeq2 on gene-level fragment counts derived from the polyA samples and the polyA Full reference transcriptome. Pairwise negative binomial significance tests were performed between developmental stages using biological duplicates, and the union of genes significant at FDR < 0.01 was classified as differentially expressed.
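The TPM normalization mentioned above follows a standard formula: divide each gene's count by its length, then rescale so the values sum to one million. The sketch below assumes per-gene lengths are available (featureCounts reports a Length column alongside its counts); the function name and the dictionary layout are illustrative, and the record does not say how the submitters computed TPM.

```python
def counts_to_tpm(counts: dict, lengths: dict) -> dict:
    """Convert gene-level fragment counts to TPM.

    counts:  gene_id -> fragment count for one sample
    lengths: gene_id -> gene length in bases (assumed input)
    """
    # Length-normalized rate for every gene.
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        # No fragments assigned anywhere: report zeros instead of dividing by zero.
        return {gene: 0.0 for gene in counts}
    # Rescale so the sample sums to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Illustrative values only; these are not data from the study.
tpm = counts_to_tpm({"geneA": 10, "geneB": 20}, {"geneA": 1000, "geneB": 2000})
print(tpm)
```

Because both example genes have the same count-to-length ratio, they receive equal TPM values even though their raw counts differ, which is the point of the length correction.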
Single-cell RNA-seq: Paired-end 100 bp reads from single-cell cDNA libraries were quality trimmed using Trim Galore with the flags: -q 20 --nextera --length 20. Trimmed reads were aligned to the human reference genome, Ensembl GRCh37/hg19 release 75, augmented with the 92 ERCC Spike-In Control sequences, using TopHat v2.0.10 with the flags: --transcriptome-index=polya_stringent_reference.gtf --prefilter-multihits. The polyA Stringent reference transcriptome, derived from whole-tissue RNA-seq as described above, was used as a transcriptome guide.
Gene-level fragment counts were quantified using featureCounts v1.4.6 with the flags: -p -B -C. Counts were normalized by transcriptome size factors according to DESeq.
Fifty additional single-cell libraries, deposited under SRP041736, were also included.
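The size-factor normalization above can be illustrated with the median-of-ratios estimator that DESeq uses for size factors. The sketch assumes this is what "size factors according to DESeq" refers to; the function name and the input layout (one list of gene counts per cell, in a shared gene order) are hypothetical, and the real analysis would have been run in R.

```python
import math
from statistics import median


def size_factors(counts: dict) -> dict:
    """Median-of-ratios size factors.

    counts: cell_id -> list of gene counts, same gene order in every cell.
    """
    cells = list(counts)
    n_genes = len(counts[cells[0]])

    # Log geometric mean per gene, using only genes that are nonzero in every cell.
    log_geo_means = {}
    for g in range(n_genes):
        values = [counts[c][g] for c in cells]
        if all(v > 0 for v in values):
            log_geo_means[g] = sum(math.log(v) for v in values) / len(values)

    if not log_geo_means:
        raise ValueError("no gene has a nonzero count in every cell")

    # Size factor = median ratio of a cell's counts to the per-gene geometric mean.
    factors = {}
    for c in cells:
        log_ratios = [math.log(counts[c][g]) - lg
                      for g, lg in log_geo_means.items()]
        factors[c] = math.exp(median(log_ratios))
    return factors


# Illustrative values only; cell_2 is sequenced twice as deeply as cell_1.
raw = {"cell_1": [10, 20, 30], "cell_2": [20, 40, 60]}
sf = size_factors(raw)
normalized = {c: [v / sf[c] for v in raw[c]] for c in raw}
print(sf, normalized)
```

Dividing each cell's counts by its size factor puts the two example cells on the same scale. In sparse single-cell tables the set of genes that are nonzero in every cell can be small or empty, which is why the sketch raises an error in that case instead of returning a value.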

Submission date

Jul 24, 2015

Last update date

May 15, 2019

Contact name

John Liu

E-mail(s)

john.liu@ucsf.edu

Organization name

UCSF

Street address

35 Medical Center Way