Sample GSM673636
Status Public on Aug 24, 2011
Title cortex_B_rep2 (mostly layer 4)
Sample type SRA
 
Source name primary somatosensory cortex
Organism Mus musculus
Characteristics tissue: primary somatosensory cortex
number of pooled mice: 4
strain: C57BL/6
Sex: male
age: P56
laminar sample (see publication): B2
Extracted molecule polyA RNA
Extraction protocol Total RNA >200 nt was extracted using the RNeasy Lipid Tissue Mini kit (QIAGEN), following the manufacturer's instructions and using the on-column DNase digest. RNA quantity was assessed using a NanoDrop 1000 spectrophotometer (ThermoScientific), and RNA quality and integrity were assessed using a BioAnalyzer (Agilent Laboratories). Total RNA was prepared for paired-end deep sequencing on Illumina's Genome Analyzer II following the manufacturer's protocol. Briefly, poly(A) RNA was enriched using Dynal oligo(dT) beads (Invitrogen). The enriched RNA was fragmented using the RNA fragmentation kit (Ambion). First- and second-strand cDNA was synthesized with random hexamer primers (Invitrogen) and SuperScript II (Invitrogen). Ends were repaired using T4 DNA polymerase and Klenow DNA polymerase. A single adenosine and Illumina adapters were ligated using Klenow 3'-to-5' exo-nuclease. Following gel purification of cDNA templates, the library was enriched with 15 rounds of PCR before being added to the flow cell for paired-end sequencing (51 nt from either end, of which 50 nt were used in mapping; except for two libraries in sample F, where 76 nt were sequenced from either end and 75 nt were used in mapping). Paired-end library average internal insert sizes (standard deviations) were as follows: A 164 (27); B1 239 (29); B2 193 (28); C 184 (21); D 208 (22); E 172 (24); F 184 (28); DC2 A 287 (31); DC2 B 290 (28); DC2 C 291 (28); DC2 D 269 (30); DC2 E 284 (31); DC2 F 279 (21); LC A 281 (40); LC B 275 (32); LC C 282 (41); LC D 267 (29); LC E 268 (38); LC F 272 (34).
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina Genome Analyzer II
 
Description mostly layer 4
Data processing Bam: Reads were processed with versions 1.4 and 1.6 of the Illumina pipeline. The internal insert size and its standard deviation were estimated for each library by empirically calculating the full width at half maximum (FWHM) of an internal insert size distribution constructed using uniquely mapping reads on the same chromosome from Illumina's Gerald pipeline. The insert size was estimated to be the midpoint of the minimum and maximum insert sizes at half (or greater) the frequency of the most common insert size, and the standard deviation was calculated under a Gaussian assumption: StdDev = FWHM / (2 * sqrt(2 * ln 2)). Using this insert size and standard deviation, lanes were separately mapped with TopHat v1.0.13 to the mouse reference genome (mm9, downloaded from UCSC) plus all splice junctions from the UCSC Table Browser tables all_mrna (mouse mRNAs), intronEst (mouse spliced ESTs), and xenoMrna (mRNAs mapped from other species to mouse) with the following options: butterfly search, closure search, fill gaps, microexon search, min anchor length = 5, min isoform fraction = 0.0, max intron length = 500 kb. In a second round of TopHat mapping (default options except min isoform fraction = 0.0), all novel junctions identified from any sample in the previous step were given to TopHat again (as raw junctions), so that splice junctions could be measured across all samples in an unbiased manner.
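The FWHM-based estimate described above can be illustrated with a minimal Python sketch. This is not the submitters' script; the function name `estimate_insert_size` and its input (a plain list of observed internal insert sizes) are assumptions made for illustration only.

```python
import math
from collections import Counter


def estimate_insert_size(insert_sizes):
    """Estimate insert size and standard deviation from the FWHM.

    insert_sizes: iterable of observed internal insert sizes (hypothetical
    input; how these are extracted from the alignments is not shown here).
    """
    counts = Counter(insert_sizes)
    # Frequency of the most common insert size.
    peak = max(counts.values())
    half = peak / 2.0
    # All insert sizes observed at half the peak frequency or greater.
    at_or_above_half = [size for size, n in counts.items() if n >= half]
    lo, hi = min(at_or_above_half), max(at_or_above_half)
    fwhm = hi - lo
    # Midpoint of the minimum and maximum sizes at half maximum.
    midpoint = (lo + hi) / 2.0
    # Gaussian assumption: StdDev = FWHM / (2 * sqrt(2 * ln 2)).
    std_dev = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return midpoint, std_dev
```

Under the Gaussian assumption the divisor 2 * sqrt(2 * ln 2) is roughly 2.355, so an FWHM of 66 nt would correspond to a standard deviation of about 28 nt.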
Wig: WIG files created by parsing a pileup were normalized (such that scales were comparable across samples) as follows. For normalization, uniquely mapping reads were first extracted from alignment files using a custom script and pysam. Then, the number of unique (and uniquely mapping) reads overlapping (at least partially) each Ensembl (release 56) protein-coding gene model was quantified for each sample. The exonic Ensembl models were further required to be >200 nt in size, because the experimental protocol selected against shorter RNAs, so variation in their measured expression levels might derive from the upstream experimental protocol. Eliminating these 328 small gene models left 22533 Ensembl protein-coding gene models that were used in subsequent analyses. Read counts were then normalized to adjusted read counts that were directly comparable across samples as a measure of relative expression. For each gene model, the number of reads falling (at least partially) into that model was summed separately for each sample. Then, for each gene model in every sample, the ratio of (read counts in that model in the sample):(total number of read counts in that model across all samples) was calculated. The median ratio across all gene models was found for each sample. Adjusted read counts were calculated for each sample by multiplying all read counts in that sample by (the minimum median ratio of all samples)/(median ratio of that sample). These ratios were used to adjust WIG coverage densities to (often non-integral) normalized values. These WIG files represent uniquely mapping reads.
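The median-ratio normalization described above can be sketched in Python as follows. This is an illustrative reading of the text, not the custom script used for this record; the function names, the dictionary input layout, and the choice to skip gene models with zero reads in every sample are all assumptions.

```python
from statistics import median


def scaling_factors(counts):
    """Compute one scaling factor per sample.

    counts: dict mapping sample name -> list of read counts, one entry per
    gene model, in the same gene order for every sample (hypothetical layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Total read count per gene model across all samples.
    totals = [sum(counts[s][g] for s in samples) for g in range(n_genes)]
    median_ratio = {}
    for s in samples:
        # Ratio of this sample's count to the across-sample total, per gene.
        # Assumption: gene models with no reads in any sample are skipped.
        ratios = [counts[s][g] / totals[g] for g in range(n_genes) if totals[g] > 0]
        median_ratio[s] = median(ratios)
    smallest = min(median_ratio.values())
    # Factor = (minimum median ratio of all samples) / (median ratio of sample).
    return {s: smallest / median_ratio[s] for s in samples}


def adjust(values, factor):
    """Scale read counts or WIG coverage values; results may be non-integral."""
    return [v * factor for v in values]
```

With this scheme the sample with the smallest median ratio keeps a factor of 1.0 and every other sample is scaled down, which is consistent with the note that normalized WIG values are often non-integral.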
Expression: De novo transcript models were built for each sample using Cufflinks v0.8.3 with default options, providing the average insert sizes and standard deviations calculated previously. Transcript models were combined across all samples using cuffcompare with default options, comparing against a GTF of Ensembl gene models downloaded from Ensembl (release 57) and adapted to use the UCSC nomenclature. Expression levels of the resultant unified cuffcompare transcript models were assessed in every sample using Cufflinks in quantification-only mode with default settings. Expression levels of all genes and transcripts in Ensembl (release 57) were also quantified with Cufflinks in quantification-only mode with default settings. This is the tab-separated Cufflinks output.
GTF: De novo transcript models were built for each sample using Cufflinks v0.8.3 with default options, providing the average insert sizes and standard deviations calculated previously. Transcript models were combined across all samples using cuffcompare with default options, comparing against a GTF of Ensembl gene models downloaded from Ensembl (release 57) and adapted to use the UCSC nomenclature.
 
Submission date Feb 11, 2011
Last update date May 15, 2019
Contact name T Grant Belgard
URL http://www.grantbelgard.com
Organization name University of Oxford
Department Department of Physiology, Anatomy and Genetics
Lab MRC Functional Genomics Unit
Street address Le Gros Clark Building, South Parks Road
City Oxford
ZIP/Postal code OX1 3QX
Country United Kingdom
 
Platform ID GPL9250
Series (1)
GSE27243 A Transcriptomic Atlas of Mouse Neocortical Layers
Relations
Reanalyzed by GSE80797
SRA SRX042256
BioSample SAMN00210980

Supplementary file Size Download File type/resource
GSM673636_B2.bam 7.7 Gb (ftp)(http) BAM
GSM673636_B2.wig.gz 1.0 Gb (ftp)(http) WIG
GSM673636_B2_cuff.gtf.gz 20.0 Mb (ftp)(http) GTF
GSM673636_B2_cuff_genes.tsv.gz 4.2 Mb (ftp)(http) TSV
GSM673636_B2_cuff_trans.tsv.gz 11.0 Mb (ftp)(http) TSV
GSM673636_B2_ens_genes.tsv.gz 580.6 Kb (ftp)(http) TSV
GSM673636_B2_ens_trans.tsv.gz 1.2 Mb (ftp)(http) TSV
Raw data are available in SRA
Processed data provided as supplementary file
