Sample GSM2214185
Status Public on Aug 01, 2016
Title mTAIL-seq exp #10 e20
Sample type SRA
Source name Drosophila embryo
Organism Drosophila melanogaster
Characteristics genotype: w1118
developmental stage: 2.0-3.0hr after egg laying (AEL)
Treatment protocol No treatment
Growth protocol HeLa cells were maintained in DMEM (Welgene) supplemented with 10% fetal bovine serum (Welgene). All fly strains were obtained from the Bloomington stock center. w1118 was used as the wild-type control. wispKG5287 was previously described as a null allele of wisp (Benoit et al., 2008). Immature and mature oocytes were collected by hand dissection in Grace′s Unsupplemented Insect Media (Gibco, 11595-030) from 3- or 4-day-old female flies. Unfertilized activated eggs were produced from w1118 virgin females mated to sterile males (sons of tud1 mothers). Fly eggs and embryos were collected on grape juice plates for the designated time frame at 25°C.
Extracted molecule total RNA
Extraction protocol Total RNAs were extracted from HeLa cells or Drosophila samples by TRIzol reagent (Invitrogen, 15596-018).
Total RNA (~1–5 µg) was ligated to the 3′ hairpin adaptor using T4 RNA ligase 2 (NEB, M0239) overnight. The 3′-ligated RNA was partially digested with RNase T1 (Ambion, AM2283) and subjected to streptavidin beads (Invitrogen, 11206D). 5′ phosphorylation by PNK reaction (Takara, 2021B) and endonucleolytic cleavage by APE1 reaction (NEB, M0282) were performed on beads. Subsequently, RNA was eluted with 2X RNA loading dye and gel-purified on a 6% urea-PAGE gel in the range of 300–750 nucleotides. The purified RNAs were ligated to the 5′ adaptor, reverse-transcribed (Invitrogen, 18080-085) and amplified by PCR using Phusion DNA polymerase (Thermo, F-530L). PCR products were purified with AMPure XP beads (Beckman, A63881).
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina MiSeq
Description mTAIL-seq for activated egg of wt, rep #1
Data processing The base calls and signal intensities were processed by Illumina RTA 1.17.28 for MiSeq. The read 1 sequences were reanalyzed for more sensitive basecalling using AYB 2. The read 1 sequences were aligned to the common contaminants set, which is composed of rDNA repeat units (GenBank accession U13369.1), the PhiX genome (GenBank accession J02482.1), Illumina TruSeq primer sequences, and all sequences for 5S and 5.8S rRNAs of the respective species (retrieved from Rfam 11.0 of the Wellcome Trust Sanger Institute), using GSNAP 2013-03-31 with a maximum of 5% mismatches allowed. Clusters with any match to the contaminants were removed from the subsequent analyses. Sequences with completely identical nucleotides in the 21st to 35th cycles of read 1 (representative region of the insert) and the 1st to 15th cycles of read 2 (degenerate bases in the 3′ adapter) were deduplicated by keeping only the cluster with the maximum PHRED quality sum in read 1. The degenerate and fixed delimiter sequences in the 3′ adapter were clipped from read 2 by searching for a perfect match of the delimiter sequence (‘GTCAG’ as in the direction of read 2) between the 14th and 16th cycles of read 2. Clusters missing a delimiter sequence or having low diversity in the degenerate region (the requirement being at least two occurrences of each of A, C, G and T) were removed from further analyses.
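The first duplicate filter and the adapter clipping described above can be sketched in Python as follows. This is only an illustration, not the submitters' pipeline: the function names, the tuple layout of a cluster, and the PHRED+33 quality encoding are assumptions. It also assumes that the delimiter's first base may fall on cycle 14, 15 or 16, and that the diversity rule means each of A, C, G and T must occur at least twice in the degenerate region.

```python
from collections import Counter

DELIMITER = "GTCAG"  # fixed delimiter, written in the direction of read 2


def dedup_key(read1, read2):
    """Cycles 21-35 of read 1 plus cycles 1-15 of read 2 (1-based, inclusive)."""
    return read1[20:35], read2[:15]


def phred_sum(qual, offset=33):
    """Sum of PHRED scores; PHRED+33 encoding is an assumption."""
    return sum(ord(c) - offset for c in qual)


def deduplicate(clusters):
    """clusters: iterable of (read1, qual1, read2). Keep the best read 1 per key."""
    best = {}
    for read1, qual1, read2 in clusters:
        key = dedup_key(read1, read2)
        score = phred_sum(qual1)
        if key not in best or score > best[key][0]:
            best[key] = (score, (read1, qual1, read2))
    return [record for _, record in best.values()]


def clip_adapter(read2):
    """Return (degenerate, insert), or None when no delimiter is found."""
    for start in (13, 14, 15):  # 0-based offsets for cycles 14, 15 and 16
        if read2[start:start + len(DELIMITER)] == DELIMITER:
            return read2[:start], read2[start + len(DELIMITER):]
    return None


def has_enough_diversity(degenerate, min_count=2):
    """True when each of A, C, G and T occurs at least min_count times."""
    counts = Counter(degenerate)
    return all(counts[base] >= min_count for base in "ACGT")
```

A cluster would be kept only if clip_adapter returns a result and has_enough_diversity is true for the degenerate part.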
The fluorescence signal intensities were processed into “Relative T signal” as described in our previous paper (Chang et al., 2014). The signals from a spike-in sample were purified with an outlier filter based on robust Mahalanobis distance (mvoutlier package 1.9.9; quan=0.5, alpha=0.025). For each spike-in, 500 randomly chosen clusters were used to estimate the parameters of a Gaussian mixture hidden Markov model (GMHMM). We trained the model using the Baum-Welch algorithm implemented in the GHMM library, with the topology and initial parameters shown in fig. S1A and tables S7 and S8 (1,000 iterations). The procedure was iterated to maximize likelihood, without using any property (e.g. designed length of poly(A) tail) of the spike-ins. Relative T signals outside the range of [-5, 5] were clipped into the range for both training and later calculations.
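Two of the simpler steps in this paragraph, clipping relative T signals into [-5, 5] and drawing 500 random clusters per spike-in, can be illustrated with a short Python sketch. The function names and the fixed seed are assumptions made for the example. The outlier filter and the Baum-Welch training are not reproduced here.

```python
import random


def clip_signal(values, low=-5.0, high=5.0):
    """Clip each relative T signal into the closed range [low, high]."""
    return [min(max(v, low), high) for v in values]


def sample_training_clusters(cluster_ids, n=500, seed=0):
    """Pick n clusters at random without replacement (all of them if fewer exist).

    The seed is a hypothetical parameter added so the example is reproducible.
    """
    ids = list(cluster_ids)
    if len(ids) <= n:
        return ids
    return random.Random(seed).sample(ids, n)
```

Clusters left out of the sample can then serve as the held-out set for performance estimation.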
The lengths of poly(A) tails were first measured with the base call-based “Strategy II” described in fig. S1A. For clusters whose measured length was shorter than 8 nt, that length was taken as the final poly(A) tail length. For the others, normalized T signals starting from the first position of the T-stretch detected by Strategy II were analyzed with the GMHMM. The hidden states were decoded with the standard Viterbi algorithm implemented in the GHMM library. The number of cycles in state 1 or 2 was called as the length of the poly(A) tail. To estimate performance, we applied the process to all spike-in samples, excluding the clusters used for fitting the model parameters.
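The decoding step can be sketched with a small log-space Viterbi implementation in Python. This is not the GHMM library and not the trained model: it uses a single Gaussian per state instead of a Gaussian mixture, and the three-state left-to-right topology, the state labels and every parameter value below are hypothetical stand-ins for fig. S1A and tables S7 and S8.

```python
import math

NEG_INF = float("-inf")


def ln(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, log_start, log_trans, emissions):
    """Most likely state index path for a Gaussian-emission HMM."""
    n = len(log_start)
    score = [log_start[s] + log_gauss(obs[0], *emissions[s]) for s in range(n)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + log_trans[p][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_gauss(x, *emissions[s]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: states 1 and 2 are tail states, state 3 is non-tail.
STATE_LABELS = [1, 2, 3]
LOG_START = [ln(0.9), ln(0.1), ln(0.0)]
LOG_TRANS = [[ln(0.9), ln(0.05), ln(0.05)],
             [ln(0.0), ln(0.8), ln(0.2)],
             [ln(0.0), ln(0.0), ln(1.0)]]
EMISSIONS = [(3.0, 1.0), (1.0, 1.0), (-3.0, 1.0)]  # (mean, sd) per state


def call_tail_length(strategy2_length, signals, min_hmm_length=8):
    """Use the base-call length when it is short, else count cycles in state 1 or 2."""
    if strategy2_length < min_hmm_length:
        return strategy2_length
    path = viterbi(signals, LOG_START, LOG_TRANS, EMISSIONS)
    return sum(1 for s in path if STATE_LABELS[s] in (1, 2))
```

With the toy parameters, ten strongly positive signals followed by five strongly negative ones decode to a tail of 10 cycles.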
The reads remaining after the contaminant filter and the first duplicate filter were then aligned to the genome sequences (UCSC hg19; splice junction positions were taken from the UCSC Genome Browser database, version of Jan 24, 2013) using GSNAP 2013-03-31. Three different versions of alignments to the genome were used in this study. (1) R1 alignment: using only the full read 1 sequences, which are 51 nt long. This was used for identification of a cluster. (2) R2 short alignment: using only the 40 nt right next to the 3′ adapter of read 2. This was used in searching for the poly(A)-free 3′ hydroxyl ends. (3) Paired alignment: using the full read 1 sequences and the part of the read 2 sequences trimmed of degenerate bases and delimiter. We filtered out genome-encoded poly(A) stretches using this alignment set. All the alignments were performed with a maximum of 5% mismatches and a minimum mapping quality of 3. All multi-mapped reads were removed. The remaining PCR artifacts with few mismatches were removed again using the R1 alignment together with the 15 degenerate bases inside the 3′ adapter region. To detect this kind of artifact, we clustered the R1 alignments with a maximum distance between mapped positions of 10 bp. Within each of these clusters, reads were then clustered again by the degenerate bases from read 2 of the respective reads with CD-HIT-EST 4.5.4 (word size=6, sequence identity=0.85). For each set of detected duplicates, we retained the read with the maximum sum of PHRED quality in read 1.
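The first stage of the second duplicate filter, grouping R1 alignments by mapped position, might look like the Python sketch below. It assumes that "maximum distance of 10 bp" means the gap between neighbouring sorted positions on the same chromosome and strand; the record shows no code, so the function name and tuple layout are illustrative. The second stage with CD-HIT-EST is an external tool and is not reproduced.

```python
def cluster_by_position(alignments, max_gap=10):
    """Group read ids whose mapped positions chain together within max_gap bp.

    alignments: list of (chrom, strand, pos, read_id) tuples.
    Returns a list of clusters, each a list of read ids.
    """
    clusters = []
    last = None  # (chrom, strand, pos) of the previous alignment
    for chrom, strand, pos, read_id in sorted(alignments):
        same_locus = last is not None and last[:2] == (chrom, strand)
        if same_locus and pos - last[2] <= max_gap:
            clusters[-1].append(read_id)
        else:
            clusters.append([read_id])
        last = (chrom, strand, pos)
    return clusters
```

Each positional cluster would then be split further by similarity of the degenerate bases before the read with the best read 1 quality sum is kept.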
For classification and transcript-level analyses, we compiled reference annotations for human and mouse using the NCBI RefSeq, RepeatMasker, gtRNAdb, Rfam and miRBase databases (the first three were downloaded from the UCSC Genome Browser on Apr 25, 2013; Rfam version 11; miRBase version 19). The R1 alignments were annotated by intersection with the compiled annotations using BEDTools (Quinlan, 2010). When multiple annotations overlapped an alignment, we chose a class for the statistics requiring exclusive assignment of a genomic source type by the following priority: miRNA, rRNA, tRNA, Mt-tRNA, snoRNA, scRNA, srpRNA, snRNA, lncRNA, RNA, ncRNA, misc_RNA, Cis-reg, ribozyme, RC, IRES, frameshift_element, LINE, SINE, Simple_repeat, Low_complexity, Satellite, DNA, LTR, CDS, 3′ UTR, 5′ UTR, intron, Other, Unknown (higher priority first). The transcript-level analyses were performed using our custom non-redundant RefSeq (nrRefSeq) transcript set, which is a reduced set retaining only the longest isoform or transcript when regions overlap with each other. The positions of read 1 in nrRefSeq transcripts were determined by BEDTools intersection between the alignments to genome sequences and the nrRefSeq annotation set, and then translated to transcript-level coordinates with in-house software.
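The exclusive class assignment amounts to picking the highest-priority label among the overlapping annotations. A minimal Python sketch, using the priority order listed above, is shown here. Returning "Unknown" when nothing overlaps and ranking unlisted labels last are assumptions, as is the function name.

```python
PRIORITY = [
    "miRNA", "rRNA", "tRNA", "Mt-tRNA", "snoRNA", "scRNA", "srpRNA", "snRNA",
    "lncRNA", "RNA", "ncRNA", "misc_RNA", "Cis-reg", "ribozyme", "RC", "IRES",
    "frameshift_element", "LINE", "SINE", "Simple_repeat", "Low_complexity",
    "Satellite", "DNA", "LTR", "CDS", "3′ UTR", "5′ UTR", "intron", "Other",
    "Unknown",
]
RANK = {name: i for i, name in enumerate(PRIORITY)}


def pick_class(overlapping):
    """Return the highest-priority class among the overlapping annotations."""
    if not overlapping:
        return "Unknown"  # assumption: no overlap means Unknown
    # Labels missing from PRIORITY are ranked last (assumption).
    return min(overlapping, key=lambda name: RANK.get(name, len(PRIORITY)))
```

For example, an alignment overlapping an intron, a CDS and a LINE element would be counted as LINE.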
Because poly(A) tails were initially detected with the constraint that they must begin within the first 30 cycles, the maximum detectable 3′ end modification of poly(A) tails was limited to the last 30 nucleotides of the insert. To exclude A stretches obviously encoded in the genomic sequence (with or without 3′ end modifications), we masked detected poly(A) tail ranges with read 2 alignments so that the 3′-most alignable (not clipped) position is eliminated from the poly(A) tail or its 3′ end modifications. All statistics regarding transcript-level modification rates were calculated for transcripts having more than 200 tags with poly(A) tails longer than 8 nt.
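The final eligibility rule (more than 200 tags with poly(A) tails longer than 8 nt) can be expressed as a short Python filter. The input layout, a list of (transcript id, tail length) pairs, and the function name are assumptions for illustration.

```python
from collections import Counter


def eligible_transcripts(tags, min_tail=8, min_tags=200):
    """Return transcripts with more than min_tags tags whose tail exceeds min_tail nt.

    tags: iterable of (transcript_id, tail_length) pairs.
    """
    counts = Counter(tx for tx, tail in tags if tail > min_tail)
    return {tx for tx, n in counts.items() if n > min_tags}
```

Both thresholds are strict, so a transcript with exactly 200 qualifying tags, or tags with tails of exactly 8 nt, would not pass.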
Genome_build: hg38, dm6
Supplementary_files_format_and_content: The spreadsheet files contain the poly(A) tail length distribution and 3' end modification frequencies next to poly(A) tails for all detected transcripts
Submission date Jun 26, 2016
Last update date May 15, 2019
Contact name Jaechul Lim
Organization name Seoul National University
Street address 1 Gwanak-ro, 1 Gwanak-gu
City Seoul
State/province CT
ZIP/Postal code 08826
Country South Korea
Platform ID GPL16479
Series (2)
GSE83731 mTAIL-seq reveals dynamic poly(A) tail regulation in oocyte-to-embryo development [RNA-seq and mTAIL-seq Human and Drosophila]
GSE83732 mTAIL-seq reveals dynamic poly(A) tail regulation in oocyte-to-embryo development
Reanalyzed by GSM3281515
BioSample SAMN05293651
SRA SRX1878976

Supplementary file Size Download File type/resource
GSM2214185_mTS10_e20.csv.gz 477.0 Kb (ftp)(http) CSV
Raw data are available in SRA
Processed data provided as supplementary file
