GEO Accession viewer

Sample GSM1703070

Status

Public on Jun 04, 2015

Title

CA894

Sample type

SRA

Source name

children with diarrhea_DAEC

Organism

Homo sapiens

Characteristics

age: 2 year-old or older
group: Diarrhea
organisms: DAEC
severity: mild
tissue: Whole blood
flowcell: A
lane: 4

Extracted molecule

polyA RNA

Extraction protocol

Total RNA was isolated using the MagMax for Stabilized Blood Tubes RNA Isolation Kit (Ambion, TX), and each RNA sample was globin reduced with GLOBINclear (Ambion, TX) according to the manufacturer's instructions.
Libraries were prepared from globin-reduced RNA samples using the Illumina TruSeq RNA Sample Preparation kit according to the manufacturer's instructions.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina HiSeq 2500

Data processing

Base-calling was performed automatically in Illumina BaseSpace after sequencing. FASTQ reads were trimmed in Galaxy in two steps: 1) hard-trimming to remove one base from the 3' end (FASTQ Trimmer tool, v.1.0.0); 2) quality trimming from both ends until minimum base quality for each read >= 30 (FASTQ Quality Trimmer tool, v.1.0.0).
Reads were aligned in Galaxy using bowtie and TopHat (Tophat for Illumina tool, v.1.5.0).
Read counts per Ensembl gene ID were estimated in Galaxy using htseq-count (htseq-count tool, v.0.4.1).
Sequencing, alignment, and quantitation metrics were obtained for FASTQ, BAM/SAM, and count files in Galaxy using FastQC, Picard, TopHat, Samtools, and htseq-count.
Non-protein coding and mitochondrial genes were filtered out.
Samples were selected for further analysis using these criteria: unpaired reads examined/FASTQ total reads > 0.75 and median CV coverage < 1. Samples with total counts below one million were excluded.
Genes expressed (counts per million > 1) in fewer than 3 samples were removed. Data normalization (TMM) and differential expression analysis were performed in R using the "edgeR" package.
Genome_build: GRCh38
Supplementary_files_format_and_content: (1) mexico_processed_data_for_GEO.csv: comma-separated matrix – first 6 columns contain Ensembl gene ID, Ensembl transcript ID, HGNC symbol, location information and chromosome name, remaining columns include read counts assigned for each library; data represents all processing steps up to and including "sample QC and filtering" but not downstream processing/normalization for analysis. (2) mexico_combined_metrics.csv: comma-separated matrix – the first column contains library ID, remaining columns include RNA sequencing and alignment metrics.
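
The sample-selection and gene-filtering rules described above can be illustrated with a minimal Python sketch. This is not the submitters' code (the record states that the analysis was done in Galaxy and in R with edgeR). The dictionary keys (`unpaired_reads_examined`, `fastq_total_reads`, `median_cv_coverage`, `total_counts`) and both function names are hypothetical, since the record does not give the column names of the metrics file. Counts per million are computed here from raw column sums, which is an assumption about how library size was defined.

```python
from typing import Dict, List


def select_samples(metrics: Dict[str, Dict[str, float]]) -> List[str]:
    """Return library IDs that pass the three sample-level criteria.

    Field names are hypothetical placeholders for the metrics columns.
    """
    kept = []
    for lib_id, m in metrics.items():
        fraction = m["unpaired_reads_examined"] / m["fastq_total_reads"]
        if (
            fraction > 0.75                      # unpaired reads examined / FASTQ total reads
            and m["median_cv_coverage"] < 1      # median CV of coverage
            and m["total_counts"] >= 1_000_000   # exclude libraries under one million counts
        ):
            kept.append(lib_id)
    return kept


def filter_genes(
    counts: Dict[str, List[int]],
    min_cpm: float = 1.0,
    min_samples: int = 3,
) -> Dict[str, List[int]]:
    """Keep genes with CPM > min_cpm in at least min_samples samples.

    counts maps a gene ID to its per-sample read counts (same sample order
    for every gene). Library size is assumed to be the raw column sum.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    kept = {}
    for gene, row in counts.items():
        expressed = sum(
            1
            for count, size in zip(row, lib_sizes)
            if size > 0 and count * 1e6 / size > min_cpm
        )
        if expressed >= min_samples:
            kept[gene] = row
    return kept
```

Applied in sequence, `select_samples` would first restrict the count matrix to passing libraries, and `filter_genes` would then drop genes that are rarely expressed among them. TMM normalization and the differential expression test are not reproduced here.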

Submission date

Jun 03, 2015

Last update date

May 15, 2019

Contact name

Scott Presnell

E-mail(s)

SPresnell@benaroyaresearch.org

Organization name

Benaroya Research Institute

Street address

1201 Ninth Avenue