GEO Accession viewer

Sample GSM3061195

Status

Public on Dec 31, 2018

Title

mouse26_pancreas_mRNAseq

Sample type

SRA

Source name

mouse26_pancreas_mRNAseq

Organism

Mus musculus

Characteristics

strain: C57BL6
internal animal id: 26
age: 13 weeks old mouse
inhibitors: none
injection route: n/a
tissue: pancreas

Extracted molecule

polyA RNA

Extraction protocol

Total RNA was extracted with Trizol (Invitrogen) according to the manufacturer's instructions. After phase separation, the upper aqueous phase was collected and mixed with 1.5 volumes of isopropanol. This mixture was applied to a column from the RNAqueous kit and processed as described in the kit manual. After elution, 1 µl of Superase-In was added before storage at -80 °C.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina HiSeq 4000

Description

processed data file:
organs_mRNAseq_count.txt

Data processing

To remove the Illumina adapter from Ribo-seq libraries, the following command was run: "cutadapt -u 1 -m 23 -a AGATCGGAAGAGCACACGTCT --discard-untrimmed -o output.fastq input.fastq"
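The per-read logic of the cutadapt options above can be sketched as follows. This is a simplified illustration, not cutadapt itself: it assumes exact adapter matching, whereas cutadapt tolerates mismatches and indels (default maximum error rate 0.1). The minimum overlap of 3 nt for a truncated adapter at the 3' end is cutadapt's default and is assumed here; the function names are hypothetical.

```python
from typing import Optional

ADAPTER = "AGATCGGAAGAGCACACGTCT"  # -a: 3' adapter sequence
MIN_LENGTH = 23  # -m 23: minimum read length after trimming
MIN_OVERLAP = 3  # assumed: cutadapt's default minimum overlap (-O 3)


def find_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = MIN_OVERLAP) -> int:
    """Return the start index of a full or 3'-truncated adapter, or -1.

    Simplification: exact matches only (real cutadapt allows errors).
    """
    full = seq.find(adapter)
    if full != -1:
        return full
    # A partial adapter can only run off the 3' end of the read;
    # test the longest possible overlap first.
    for overlap in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return len(seq) - overlap
    return -1


def trim_read(seq: str) -> Optional[str]:
    """Apply -u 1, adapter trimming, --discard-untrimmed and -m 23.

    Returns the trimmed sequence, or None if the read is discarded.
    """
    seq = seq[1:]  # -u 1: remove the first base (done before adapter search)
    start = find_adapter(seq)
    if start == -1:
        return None  # --discard-untrimmed: no adapter found
    trimmed = seq[:start]
    if len(trimmed) < MIN_LENGTH:
        return None  # -m 23: too short after trimming
    return trimmed
```

For example, `trim_read("N" + "ACGT" * 6 + ADAPTER)` drops the leading base and the adapter and returns the 24-nt insert `"ACGT" * 6`, while a read without any adapter returns `None`.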
mRNA-seq reads were aligned with TopHat 2.1.0 using the following settings: tophat --transcriptome-index --no-discordant --no-mixed --no-novel-juncs
The NCBI mouse genome build GRCm38.p3 and the Mus musculus Annotation Release 105 were used as references. To align mRNA-seq and ribosome profiling reads for expression and translation efficiency estimation, we used the following strategy. Only complete chromosomes were retained; all non-chromosomal and mitochondrial records were removed. Furthermore, only RefSeq and BestRefSeq records were kept in the transcriptome annotation, while Gnomon predictions and pseudogenes were discarded. Read counts per gene were obtained with HTSeq-count.
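The annotation filtering described above can be sketched as a pass over an NCBI-style GFF3 file. This is an illustrative sketch under assumptions not stated in the source: the set of full-chromosome seqids is supplied by the caller (the names below are hypothetical), the source column is matched exactly against `RefSeq` and `BestRefSeq`, and pseudogenes are recognised by a `pseudogene` feature type or a `pseudo=true` attribute.

```python
from typing import Dict, Iterable, Iterator, Set

# Sources kept in the transcriptome annotation (everything else, including
# Gnomon predictions, is dropped). Exact string match is an assumption.
KEEP_SOURCES = {"RefSeq", "BestRefSeq"}


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def filter_gff(lines: Iterable[str], chromosomes: Set[str]) -> Iterator[str]:
    """Yield GFF3 records on full chromosomes from RefSeq/BestRefSeq, minus pseudogenes."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, source, ftype = cols[0], cols[1], cols[2]
        if seqid not in chromosomes:
            continue  # drops mitochondrial and non-chromosomal records
        if source not in KEEP_SOURCES:
            continue  # drops Gnomon and any other non-RefSeq source
        if ftype == "pseudogene" or parse_attributes(cols[8]).get("pseudo") == "true":
            continue  # drops pseudogenes
        yield line
```

A caller would pass, for example, a hypothetical set such as `{f"chr{i}" for i in range(1, 20)} | {"chrX", "chrY"}`, deliberately leaving out the mitochondrial sequence.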
To plot time-dependent ribosome occupancy, we employed a different strategy. First, from the GenBank (gbk) file distributed with the genome assembly and annotation, we extracted RefSeq and BestRefSeq records for every gene, including CDS, 5’-UTR and 3’-UTR lengths and sequences. Among them, we selected the longest isoform of every gene, prioritizing length as CDS > 5’-UTR > 3’-UTR. 5’-UTRs were trimmed to 100 nucleotides. If either UTR was shorter than 100 nucleotides, it was extended to 100 nucleotides based on genomic coordinates. The full list of sequences is collected in the mRNA_100.fna file. To prepare a list of non-redundant genes, we ran an all-vs-all BLAST search (blastall -p blastn -m 8 -b 500 -v 500 -e 0.001) and excluded gene pairs with highly similar nucleotide sequences. In addition to the E-value cutoff, a pair of genes was treated as homologous and redundant only if the high-homology stretch was at least 50 nt long and at least 90% identical. A total of 13,685 genes passed every threshold. This reference set ensured unambiguous alignment of ribosome footprints. The resulting sequences are collected in the mRNA_100unique.fna file.
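Two of the steps above, isoform selection and BLAST-based redundancy filtering, can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline. It assumes that "CDS > 5’UTR > 3’UTR" means ranking isoforms by CDS length, breaking ties by 5’-UTR length and then by 3’-UTR length, and that both members of a redundant pair are excluded. The `Isoform` type and function names are introduced here for illustration; the column order is the standard BLAST tabular (`-m 8`) layout.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class Isoform:
    gene: str
    accession: str
    cds_len: int
    utr5_len: int
    utr3_len: int


def longest_isoforms(isoforms: Iterable[Isoform]) -> Dict[str, Isoform]:
    """Keep one isoform per gene, ranked by (CDS, 5'-UTR, 3'-UTR) length."""
    best: Dict[str, Isoform] = {}
    for iso in isoforms:
        key = (iso.cds_len, iso.utr5_len, iso.utr3_len)
        cur = best.get(iso.gene)
        # Tuple comparison: CDS length first, then 5'-UTR, then 3'-UTR.
        if cur is None or key > (cur.cds_len, cur.utr5_len, cur.utr3_len):
            best[iso.gene] = iso
    return best


def redundant_genes(m8_lines: Iterable[str], min_len: int = 50,
                    min_identity: float = 90.0, max_evalue: float = 1e-3) -> Set[str]:
    """Return genes involved in any non-self hit of >= min_len nt at >= min_identity %."""
    # -m 8 columns: qseqid sseqid pident length mismatch gapopen
    #               qstart qend sstart send evalue bitscore
    flagged: Set[str] = set()
    for line in m8_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip malformed lines
        query, subject = cols[0], cols[1]
        if query == subject:
            continue  # every sequence hits itself in an all-vs-all search
        pident, length, evalue = float(cols[2]), int(cols[3]), float(cols[10])
        if evalue <= max_evalue and length >= min_len and pident >= min_identity:
            flagged.update((query, subject))  # assumed: drop both members
    return flagged
```

The non-redundant set would then be the selected genes minus `redundant_genes(...)`. The E-value check only repeats the `-e 0.001` cutoff already applied by blastall.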
Custom Perl and R scripts were used to calculate the footprint coverage profiles of individual genes.
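The custom Perl and R scripts are not part of this record. The following is only a hypothetical Python illustration of a per-gene coverage profile on the mRNA_100 layout. It assumes each reference sequence starts with exactly 100 nt of 5’-UTR, so the first nucleotide of the start codon sits at 0-based position 100. It also assumes reads are already aligned and that each footprint is counted once at its 5’-end position, with no P-site offset applied.

```python
from typing import Dict, Iterable, List, Tuple

UTR5_LEN = 100  # assumed fixed 5'-UTR length in mRNA_100.fna


def coverage_profiles(alignments: Iterable[Tuple[str, int]],
                      lengths: Dict[str, int]) -> Dict[str, List[int]]:
    """Count footprint 5' ends per position for each gene.

    alignments: (gene, 0-based 5'-end position on the transcript) pairs.
    lengths: transcript length for each gene in the reference.
    """
    profiles = {gene: [0] * n for gene, n in lengths.items()}
    for gene, pos in alignments:
        profile = profiles.get(gene)
        # Ignore genes not in the reference and out-of-range positions.
        if profile is not None and 0 <= pos < len(profile):
            profile[pos] += 1
    return profiles


def relative_to_start(profile: List[int]) -> Dict[int, int]:
    """Re-index a profile so that 0 is the first nucleotide of the start codon."""
    return {i - UTR5_LEN: count for i, count in enumerate(profile)}
```

Indexing positions relative to the start codon lets profiles of genes with different CDS lengths be compared or summed position by position near the translation start.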
Genome_build: GRCm38
Supplementary_files_format_and_content: read count files contain raw read counts for individual genes; footprint coverage files contain per-gene coverage profiles

Submission date

Mar 22, 2018

Last update date

Dec 31, 2018

Contact name

Maxim Gerashchenko

E-mail(s)

mgerashchenko@bwh.harvard.edu

Organization name

Brigham and Women's Hospital

Department

Medicine