Sample GSM3276717
Status Public on Jul 18, 2018
Title SRX124975
Sample type SRA
 
Source name Dmel23_E23h
Organism Drosophila melanogaster
Characteristics developmental stage: embryonic stage 22-23 h (ael)
tissue: whole body
attributes: sp|embryo|cn|bw|laying|whole|egg
Extracted molecule total RNA
Extraction protocol see original sample
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina HiSeq 2000
 
Data processing We created a pre-alignment pipeline to identify technical metadata and generate sample quality metrics. We downloaded FASTQs from SRA using fastq-dump (sra-tools v2.8.2) with --split-files -M 0, counted the number of reads, and estimated average read lengths. A sample was considered paired end if fastq-dump generated two files and each file had an equal number of reads, ≥ 10,000 reads, and an average read length ≥ 10 bp. We removed individual reads shorter than 25 bp using atropos (v1.1.18) with --minimum-length 25. We simultaneously verified that samples were indeed Drosophila and estimated contamination with FastQ Screen (v0.11.3) and Bowtie 2 (v2.3.3.1) by mapping 100,000 reads to 8 references (dm6, rRNA, Wolbachia, human, yeast, E. coli, PhiX, ERCC-SRM2374). Next we aligned all reads with Hisat2 (v2.1.0), using --max-intronlen 300000 and --known-splicesite-infile, to the Drosophila melanogaster Release 6 plus ISO1 MT assembly (GCA_000001215.4). We then ran samtools (v1.7) and bamtools (v2.4.1) with default settings to generate summary statistics. We estimated various metrics with Picard CollectRnaSeqMetrics (v2.15.0) using three separate strand settings: STRAND=NONE, STRAND=FIRST_READ_TRANSCRIPTION_STRAND, and STRAND=SECOND_READ_TRANSCRIPTION_STRAND. These metrics allowed us to estimate library strandedness. Finally, we identified duplicates using Picard MarkDuplicates (v2.15.0).
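The paired-end rule described above can be written as a small predicate. This is a minimal sketch, not the submitters' code: the FastqSummary type and the is_paired_end function are hypothetical names introduced for illustration, and the per-file read counts and average read lengths are assumed to have been computed already from the fastq-dump output.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds quoted in the data processing description.
MIN_READS = 10_000
MIN_AVG_READ_LENGTH = 10.0  # bp


@dataclass(frozen=True)
class FastqSummary:
    """Hypothetical per-file summary: read count and mean read length."""
    n_reads: int
    avg_read_length: float


def is_paired_end(files: Sequence[FastqSummary]) -> bool:
    """Return True only if the files satisfy the stated paired-end rule."""
    # Rule 1: fastq-dump --split-files must have produced exactly two files.
    if len(files) != 2:
        return False
    first, second = files
    # Rule 2: both files must contain the same number of reads.
    if first.n_reads != second.n_reads:
        return False
    # Rules 3 and 4: each file needs enough reads and long enough reads.
    return all(
        f.n_reads >= MIN_READS and f.avg_read_length >= MIN_AVG_READ_LENGTH
        for f in files
    )
```

How samples that fail this test were handled is not stated in the record, so the sketch only answers the yes/no question.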
To generate counts tables and coverage tracks, our alignment pipeline used parameters discovered in the pre-alignment pipeline. The alignment pipeline uses the FASTQ file(s) downloaded by the pre-alignment pipeline, but trims adapter sequence and low-quality bases using atropos (v1.1.18) with -q 20 --minimum-length 25. The remaining reads were mapped using Hisat2 (v2.1.0) with --dta --max-intronlen 300000 --known-splicesite-infile and with --rna-strandness set to ‘F’, ‘R’, ‘FR’, or ‘RF’ depending on the strandedness. We merged alignments from individual SRA runs (SRRs) to the library level (SRX) and generated gene-level, junction-level, and intergenic coverage counts using featureCounts from the Subread package (v1.5.3). Finally, we created browser tracks using bamCoverage from the deepTools package (v2.5.4) with --binSize 1 --normalizeTo1x 129000000 --ignoreForNormalization chrX.
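The mapping step above can be illustrated by assembling the Hisat2 argument list from the library layout and strandedness. This is a sketch under stated assumptions, not the pipeline's actual code: build_hisat2_args and all file paths are hypothetical, unstranded libraries are assumed to omit the strand option, and the option is spelled --rna-strandness as in the HISAT2 command-line interface. The -x, -U, -1, and -2 options are the standard HISAT2 index and read inputs.

```python
from typing import List, Optional, Sequence

# HISAT2 accepts F or R for single-end reads and FR or RF for paired-end reads.
SINGLE_END_VALUES = {"F", "R"}
PAIRED_END_VALUES = {"FR", "RF"}


def build_hisat2_args(
    index_prefix: str,
    fastqs: Sequence[str],
    splice_sites: str,
    strandness: Optional[str] = None,
) -> List[str]:
    """Assemble a hisat2 argument list (hypothetical helper, paths are placeholders)."""
    if len(fastqs) not in (1, 2):
        raise ValueError("expected one FASTQ (single end) or two (paired end)")
    paired = len(fastqs) == 2
    allowed = PAIRED_END_VALUES if paired else SINGLE_END_VALUES
    if strandness is not None and strandness not in allowed:
        raise ValueError(f"strandness {strandness!r} does not match the layout")

    # Options quoted in the data processing description.
    args = [
        "hisat2", "-x", index_prefix,
        "--dta",
        "--max-intronlen", "300000",
        "--known-splicesite-infile", splice_sites,
    ]
    # Assumption: unstranded libraries simply leave the strand option out.
    if strandness is not None:
        args += ["--rna-strandness", strandness]
    if paired:
        args += ["-1", fastqs[0], "-2", fastqs[1]]
    else:
        args += ["-U", fastqs[0]]
    return args
```

The returned list can be passed to subprocess.run without a shell, which avoids quoting problems in file names.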
Genome_build: Drosophila melanogaster Release 6 plus ISO1 MT (GenBank assembly accession: GCA_000001215.4)
Supplementary_files_format_and_content: Processed data files include:
*.bw are BigWig files generated using deeptools bamCoverage
*.counts are gene level coverage counts
*.jcounts are gene level junction counts
*.intergenic.counts are intergenic coverage counts
*.intergenic.jcounts are intergenic junction counts
Series level supplementary files:
dmel_r6-11.intergenic.gtf intergenic GTF generated by the pipeline for estimating intergenic coverage counts.
supplemental_metadata.tsv supplemental metadata file containing additional metadata for each sample including QC values and various flags generated by each pipeline
gene_counts.tsv supplemental file containing all gene counts as a single matrix
intergenic_counts.tsv supplemental file containing all intergenic counts as a single matrix
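As an illustration of how a per-sample *.counts file might be read, the sketch below assumes the usual featureCounts text layout: an initial comment line starting with "#", a tab-separated header row, and one row per gene with the gene identifier in the first column and the count in the last column. That layout is an assumption about these particular files, and parse_gene_counts and read_gene_counts are hypothetical helper names.

```python
import gzip
from typing import Dict, Iterable


def parse_gene_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map gene identifier to count from featureCounts-style text lines."""
    counts: Dict[str, int] = {}
    header_seen = False
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and the leading "#" comment line.
        if not line or line.startswith("#"):
            continue
        # The first non-comment line is the column header row.
        if not header_seen:
            header_seen = True
            continue
        fields = line.split("\t")
        # Assumption: first column is the gene ID, last column is the count.
        counts[fields[0]] = int(fields[-1])
    return counts


def read_gene_counts(path: str) -> Dict[str, int]:
    """Read a gzip-compressed counts file such as *.bam.counts.txt.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_gene_counts(handle)
```

For example, read_gene_counts("GSM3276717_SRX124975.bam.counts.txt.gz") would return a dictionary keyed by gene identifier, provided the file follows the assumed layout.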
 
Submission date Jul 17, 2018
Last update date Sep 04, 2018
Contact name Brian Oliver
E-mail(s) briano@nih.gov
Phone 301-204-9463
Organization name NIDDK, NIH
Department LBG
Lab Developmental Genomics
Street address 50 South Drive
City Bethesda
State/province MD
ZIP/Postal code 20892
Country USA
 
Platform ID GPL13304
Series (1)
GSE117217 Remapping the SRA: Drosophila melanogaster RNA-Seq data from the Sequence Read Archive
Relations
Reanalysis of GSM883785
BioSample SAMN00794536
SRA SRX124975
Named Annotation GSM3276717_SRX124975.flybase.minus.bw
Named Annotation GSM3276717_SRX124975.flybase.plus.bw

Supplementary file Size File type/resource
GSM3276717_SRX124975.bam.counts.jcounts.txt.gz 183.2 Kb TXT
GSM3276717_SRX124975.bam.counts.txt.gz 829.1 Kb TXT
GSM3276717_SRX124975.bam.intergenic.counts.jcounts.txt.gz 157.8 Kb TXT
GSM3276717_SRX124975.bam.intergenic.counts.txt.gz 145.7 Kb TXT
GSM3276717_SRX124975.flybase.minus.bw 3.2 Mb BW
GSM3276717_SRX124975.flybase.plus.bw 3.3 Mb BW
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record
