Status |
Public on Jul 10, 2019 |
Title |
TT-seq_K562_Ctrl_rep1 |
Sample type |
SRA |
|
|
Source name |
K562 erythroleukemia cells
|
Organism |
Homo sapiens |
Characteristics |
cell line: K562
treatment conditions: 37 °C, 4sU (Sigma) labeling for 5 minutes
molecule: 4sU-labeled RNA
|
Treatment protocol |
To avoid transcriptional changes caused by freshly added growth medium, cells were expanded 24 h prior to heat shock treatments to a density of 0.3 x 10^6 cells/mL. Heat shock treatments of K562 or Raji B (CDK9as) cells were performed in T175 flasks in a volume of 50 mL at 0.6 x 10^6 cells/mL in a water bath (LAUDA, Aqualine AL12) at 42 °C. Temperature was monitored with a thermometer. It took 5 min until the cell suspension reached 42 °C. For TT-seq, RNA-seq and mNET-seq experiments, cells were treated for 0 min (Ctrl), 15 min (HS15) or 30 min (HS30).
|
Growth protocol |
Human K562 erythroleukemia cells were obtained from DSMZ (Cat.# ACC-10). Human Raji Burkitt’s (B) lymphoma cells (CDK9as) carry a homozygous mutation of phenylalanine (F) 103 to alanine (A) at the CDK9 gene loci and were generated using the CRISPR-Cas9 system (Gressel, Schwalb et al. 2017). K562 and Raji B (CDK9as) cells were cultured antibiotic-free in accordance with the DSMZ Cell Culture standards in RPMI 1640 medium (Thermo Fisher Scientific, Waltham, MA USA) containing 10% heat-inactivated fetal bovine serum (FBS) (Thermo Fisher Scientific, Waltham, MA USA) and 1x GlutaMAX supplement (Thermo Fisher Scientific, Waltham, MA USA) at 37 °C in a humidified 5% CO2 incubator. Both cell lines used in this study display the phenotypic properties, including morphology and proliferation rate, that have been described in the literature. Cells were verified to be free of mycoplasma contamination using the Plasmo Test Mycoplasma Detection Kit (InvivoGen, San Diego, CA USA). Both cell lines were authenticated using DNA profiling according to the documentary standard (ASN-0002). Biological replicates were cultured independently.
|
Extracted molecule |
total RNA |
Extraction protocol |
Ovation Universal RNA-Seq System (NuGEN) TT-seq, RNA-seq (Schwalb et al., 2016), or mNET-seq (Nojima et al., 2016; Schlackow et al., 2017).
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina NextSeq 500 |
|
|
Data processing |
TT-seq and RNA-seq data preprocessing and global normalization parameters. Paired-end 75 base reads with additional 6 base reads of barcodes were obtained for each of the samples (Supplementary Table 1). Reads were demultiplexed and mapped with STAR 2.3.0 (Dobin and Gingeras 2015) to the hg20/hg38 (GRCh38) genome assembly (Human Genome Reference Consortium). Samtools (Li, Handsaker et al. 2009) was used to quality filter SAM files, whereby alignments with MAPQ smaller than 7 (-q 7) were skipped and only proper pairs (-f 2) were selected. Further data processing was carried out using the R/Bioconductor environment. We used a spike-in (RNAs) normalization strategy essentially as described (Schwalb, Michel et al. 2016) to allow observation of global shifts (sequencing depth), cross-contamination rate (proportion of unlabeled reads purified in the TT-seq samples) and antisense bias ratio (ratio of spurious reads originating from the opposite strand introduced by the reverse transcription reaction). Read counts (kij) for spike-ins were calculated using HTSeq (Anders, Pyl et al. 2015).
Calculation of the number of transcribed bases. Of all fragments, only those that exhibited a positive inner mate distance were kept. The number of transcribed bases (tbj) for all samples was calculated as the sum of the coverage of evident (sequenced) fragment parts (read pairs only) for all fragments with an inner mate interval not entirely overlapping a RefSeq annotated intron (UCSC RefSeq GRCh38, ~ 98.5% of all fragments) in addition to the sum of the coverage of non-evident fragment parts (entire fragment).
Number of transcribed bases per base (coverage). Coverages were calculated from the antisense bias corrected number of transcribed bases (tbj) falling into the region of a cTU, divided by its length in bases.
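The samtools quality filter described above (skip alignments with MAPQ below 7, keep only proper pairs) can be illustrated with a minimal Python sketch. This is not the submitters' pipeline, which used samtools itself; the function names `keep_alignment` and `filter_sam` are hypothetical, and passing header lines through unchanged is an assumption made for convenience.

```python
def keep_alignment(sam_line, min_mapq=7, required_flags=0x2):
    """Return True if one SAM alignment line passes a -q/-f style filter.

    min_mapq mirrors "-q 7" (alignments with MAPQ < 7 are skipped);
    required_flags mirrors "-f 2" (FLAG bit 0x2 = read mapped in proper pair).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # SAM column 2: bitwise FLAG
    mapq = int(fields[4])  # SAM column 5: mapping quality
    has_required = (flag & required_flags) == required_flags
    return has_required and mapq >= min_mapq


def filter_sam(lines, min_mapq=7, required_flags=0x2):
    """Yield header lines unchanged (assumption) and alignments that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, min_mapq, required_flags):
            yield line
```

For example, a record with FLAG 99 (paired, proper pair, mate reverse, first in pair) and MAPQ 60 is kept, while the same record with MAPQ 6, or a record with FLAG 65 (not a proper pair), is dropped.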
Based on the antisense bias corrected coverages, a subgroup of expressed cTUs was defined to comprise all cTUs with a coverage of 5 or higher in one of two summarized replicates of TT-seq (HS15, HS30 or control).
mNET-seq data preprocessing. Paired-end 75 base reads with additional 6 base reads of barcodes were obtained for each of the samples (Supplementary Table 1). Reads were demultiplexed, trimmed for adapter content with cutadapt (Martin 2011; -O 12 -m 25 -a TGGAATTCTCGG -A GATCGTCGGACT) and mapped with STAR 2.3.0 (Dobin and Gingeras 2015) to the hg20/hg38 (GRCh38) genome assembly (Human Genome Reference Consortium). Samtools (Li, Handsaker et al. 2009) was used to quality filter SAM files, whereby alignments with MAPQ smaller than 7 (-q 7) were skipped and only proper pairs (-f 2) were selected. Further data processing was carried out using the R/Bioconductor environment. Antisense bias (ratio of spurious reads originating from the opposite strand introduced by the RT reactions) was determined using positions in regions without antisense annotation with a coverage of at least 100 according to RefSeq annotated genes (UCSC RefSeq GRCh38). Coverage tracks for further analysis were restricted to the last nucleotide incorporated by the polymerase in the aligned mNET-seq reads.
mNET-seq data preprocessing and normalization. We first identified a subgroup of RefSeq-TUs with unchanged behavior over the response to heat shock in the spike-in normalized TT-seq data via k-means clustering. On the resulting 3,416 RefSeq-TUs i, size factors for each sample j were determined as in (Anders and Huber 2010) and were used to correct for library size and sequencing depth variations.
GenoSTAN annotation of transcription units (TUs, K562 and Raji). Annotation of different transcript classes was done as in (Schwalb, Michel et al. 2016) with minor differences.
In brief, genome-wide coverage was calculated from all TT-seq fragment midpoints in consecutive 200 bp bins throughout the genome. In order to create a comprehensive annotation independent of heat shock induced length differences, two replicate tracks were constructed by taking the maximum of each bin over the first and second replicates, respectively, regardless of treatment. A two-state hidden Markov model with a Poisson-Log-Normal emission distribution was learned in order to segment the genome into ‘transcribed’ and ‘untranscribed’ states. Consecutive ‘transcribed’ states were joined if the gaps between them were smaller than 200 bp, lay within a validated GENCODE mRNA or lincRNA (version 22), or showed uninterrupted coverage supported by all TT-seq samples. Subsequently, TU start and end sites were refined to nucleotide precision by finding borders of abrupt coverage increase or decrease between two consecutive segments in the four 200 bp bins located around the initially assigned start and stop sites. This was done by fitting a piecewise constant curve to the TT-seq coverage profiles for both replicates using the segmentation method from the R/Bioconductor package "tilingArray" (Huber, Toedling et al. 2006).
GRO-cap TSS refinement of TUs (cTUs, K562). For all TUs i, the GRO-cap refined transcription start site tss* was determined as the position of the maximum GRO-cap signal (Core, Martins et al. 2014) in a window of 500 bp around the start of the TU. Note that TUs without a non-zero GRO-cap site were not used.
Transcript sorting (K562 and Raji). We sorted each cTU into one of the following seven classes: eRNA, sincRNA, asRNA, conRNA, uaRNA, lincRNA and mRNA (see also Supplementary Figure 3a). First, cTUs reciprocally overlapping by at least 50% with a validated GENCODE mRNA or lincRNA (version 22) on the same strand were classified as mRNAs and lincRNAs. cTUs reciprocally overlapping by less than 50% with a validated GENCODE mRNA or lincRNA (version 22) on the same strand were not classified.
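The 50% reciprocal overlap criterion above can be sketched in Python as follows. This is an illustrative reading, not the submitters' R code: it assumes that "reciprocally overlapping by at least 50%" means the shared region covers at least half of both intervals, and that coordinates are half-open `[start, end)`. The names `reciprocal_overlap` and `matches_annotation` are hypothetical.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for half-open intervals [start, end)."""
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    len_a = a_end - a_start
    len_b = b_end - b_start
    if len_a <= 0 or len_b <= 0:
        return 0.0
    # Reciprocal: the overlap must be large relative to BOTH intervals,
    # so the limiting value is the smaller fraction.
    return min(overlap / len_a, overlap / len_b)


def matches_annotation(ctu, transcript, min_fraction=0.5):
    """ctu and transcript are (strand, start, end) tuples (assumed layout)."""
    same_strand = ctu[0] == transcript[0]
    fraction = reciprocal_overlap(ctu[1], ctu[2], transcript[1], transcript[2])
    return same_strand and fraction >= min_fraction
```

Taking the minimum of the two fractions prevents a short cTU nested inside a long annotated transcript from being counted as a match: the cTU is fully covered, but the transcript is not.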
Next, cTUs located on the opposite strand of either an mRNA or a lincRNA were classified as asRNA if their TSS was located > 1 kb downstream of the sense TSS, as uaRNA if their TSS was located < 1 kb upstream of the sense TSS, and as conRNA if their TSS was located < 1 kb downstream of the sense TSS. For K562, each of the remaining cTUs did not overlap with a validated GENCODE mRNA or lincRNA (version 22) and was classified as sincRNA. Every ncRNA (sincRNA, asRNA, conRNA or uaRNA) was re-classified as eRNA if its TSS fell into a K562 enhancer state (Zacher, Michel et al. 2017). This resulted in 11,325 non-ambiguously classified RNAs (transcript.annotation.timecourse.refined.TSS.corrected.gtf) (825 eRNA, 1,582 sincRNA, 564 asRNA, 502 conRNA, 1,064 uaRNA, 239 lincRNA and 6,549 mRNA). For Raji, each of the remaining TUs did not overlap with a validated GENCODE mRNA or lincRNA (version 22) and was classified as eRNA if its TSS exhibited a high (>1) H3K4me1/H3K4me3 ratio, or as sincRNA if its TSS exhibited a low (<1) H3K4me1/H3K4me3 ratio. This resulted in 16,452 non-ambiguously classified RNAs (transcript.annotation.timecourse.refined.gtf) (3,451 eRNA, 3,479 sincRNA, 1,398 asRNA, 326 conRNA, 565 uaRNA, 243 lincRNA and 6,990 mRNA).
Genome_build: hg20/hg38 (GRCh38) genome assembly (Human Genome Reference Consortium, http://hgdownload.soe.ucsc.edu/downloads.html#human)
Supplementary_files_format_and_content: bigwig files containing the coverage
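The antisense sorting rule from the transcript sorting step (asRNA, conRNA, uaRNA by TSS distance) can be sketched as a small Python function. Several points are assumptions not stated in the record: "upstream" and "downstream" are taken relative to the direction of the sense transcript, an offset of exactly 0 is treated as downstream, and positions the text does not cover (exactly 1 kb away, or 1 kb or more upstream) are returned as unclassified. The name `classify_antisense` is hypothetical.

```python
def classify_antisense(sense_tss, sense_strand, antisense_tss, limit=1000):
    """Classify an opposite-strand cTU by its TSS position relative to the sense TSS.

    Returns "asRNA", "conRNA", "uaRNA", or None when no rule applies.
    """
    # Orient the offset along the sense transcript: positive = downstream.
    direction = 1 if sense_strand == "+" else -1
    offset = (antisense_tss - sense_tss) * direction

    if offset > limit:
        return "asRNA"    # > 1 kb downstream of the sense TSS
    if 0 <= offset < limit:
        return "conRNA"   # < 1 kb downstream (offset 0 included by assumption)
    if -limit < offset < 0:
        return "uaRNA"    # < 1 kb upstream of the sense TSS
    return None           # not covered by the stated rules
```

Under these assumptions, a sense TSS at position 10,000 on the plus strand with an antisense TSS at 12,000 yields asRNA, at 10,500 conRNA, and at 9,500 uaRNA; on the minus strand the directions are mirrored.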
|
|
|
Submission date |
Dec 17, 2018 |
Last update date |
Jul 10, 2019 |
Contact name |
Björn Schwalb |
E-mail(s) |
bschwal@gwdg.de
|
Organization name |
MPI
|
Street address |
Fassberg 11
|
City |
Göttingen |
ZIP/Postal code |
37077 |
Country |
Germany |
|
|
Platform ID |
GPL18573 |
Series (1) |
GSE123980 |
The pause-initiation limit restricts transcription activation in human cells |
|
Relations |
BioSample |
SAMN10604697 |
SRA |
SRX5145531 |
Supplementary file | Size | Download | File type/resource |
GSM3518105_L_with_1_AAGCCT_1_K562_wildtype.minus.bw | 321.3 Mb | (ftp)(http) | BW |
GSM3518105_L_with_1_AAGCCT_1_K562_wildtype.plus.bw | 335.1 Mb | (ftp)(http) | BW |
Raw data are available in SRA |
Processed data provided as supplementary file |