Sample GSM4610687
Status Public on Jan 12, 2021
Title TTseq_LabeledRNA_K562_Rep2
Sample type SRA
Source name human cancer cell line
Organism Homo sapiens
Characteristics cell line: K562
cell type: cancer
4su labelling: 5min 4sU
method: TT-seq
labeled molecule: 4sU-labeled RNA
Treatment protocol 48 hours before the experiment, cells were seeded in 15 cm dishes to reach 70% confluence at the time of 4sU labeling. Cells were labeled with 500 µM 4-thiouridine (4sU) (Sigma-Aldrich) for 5 min at 37 °C under 5% CO2. Cells were harvested using TRIzol (Life Technologies).
Growth protocol Cells were cultured in cell culture medium supplemented with 1% penicillin/streptomycin (GE Healthcare) and 10% fetal bovine serum (Sigma) at 37 °C in a humidified 5% CO2 incubator. BT-20 (ATCC HTB-19), DU 145 (ATCC HTB-81) and SK-UT-1 (ATCC HTB-114) cells were cultured in MEM (Gibco) supplemented with 0.1 mM NEAA (Gibco) and 1 mM sodium pyruvate (Gibco). 769-P (ATCC CRL-1933) and 786-O (ATCC CRL-1932) cells were cultured in RPMI-1640 (Gibco A1049101). MDA-MB-231 (ATCC HTB-26) and LoVo (ATCC CCL-229) cells were cultured in DMEM-GlutaMAX (Gibco 21885-25). U-118 MG (ATCC HTB-15) and GP5d (Sigma) cells were cultured in DMEM (Gibco); GP5d cells were grown in cell culture flasks coated with poly-L-lysine (Sigma). M059J (ATCC CRL-2366) cells were cultured in DMEM/F-12 (Gibco), HCT 116 (ATCC CCL-247) cells in McCoy's 5a Medium Modified (Gibco) and PC-3 (ATCC CRL-1435) cells in F-12K (ATCC 30-2004).
Extracted molecule total RNA
Extraction protocol Total RNA was extracted using TRIzol (Life Technologies); 4sU-labeled RNA was extracted according to Gressel et al., 2019.
Sequencing libraries were prepared with the Ovation Universal RNA-Seq System (NuGEN) following the manufacturer's instructions.
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina HiSeq 2500
Description K562_eRNAs.gtf
Data processing Paired-end 50 bp reads were first mapped to a single copy of the rDNA locus to remove rRNA-related sequences. Reads that did not map to the rDNA were then aligned to the hg20/hg38 (GRCh38) genome assembly (Genome Reference Consortium) using STAR 2.6.0c (Dobin 2013) with the following parameters: --outFilterMismatchNoverLmax 0.05, --outFilterMultimapScoreRange 0 and --alignIntronMax 500000. BAM files were filtered with Samtools (Li 2009) to remove alignments with MAPQ below 7 (-q 7), and only proper pairs (flags 99, 147, 83 and 163) were retained. Fragment counts for different features were calculated with HTSeq (Anders 2015). Further data processing was carried out in the R/Bioconductor environment. To normalize TT-seq samples for sequencing depth and make them comparable between cell lines, we used the median-of-ratios approach described in the DESeq2 R/Bioconductor package (Anders 2010). Transcription unit (TU) annotation and classification were performed using the 4sU-labeled (L) samples as described (Michel, Demel 2017) with a few modifications. Briefly, strand-specific coverage was calculated from fragment midpoints in consecutive 200 bp bins throughout the genome for all TT-seq samples. Binning reduced the number of uncovered positions within expressed transcripts and increased the sensitivity for detecting lowly synthesized transcripts. The GenoSTAN R/Bioconductor package (Zacher 2017) was used to learn a two-state hidden Markov model with a Poisson-LogNormal emission distribution in order to segment the genome into transcribed and untranscribed states, which resulted in 51,198 to 134,528 TUs per cell line. TUs that overlapped at least 25% of a protein-coding gene annotated in RefSeq (Release 109.20190607) or GENCODE (v31) and overlapped an annotated exon of the corresponding gene were classified as mRNAs. The remaining TUs were annotated as non-coding RNAs (ncRNAs).
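The read filtering, 200 bp midpoint binning and median-of-ratios normalization described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the function names are hypothetical, the flag filter uses exact matches to 99/147/83/163 (note that `samtools view -f` keeps reads with all listed bits set, so reads carrying extra bits could also pass it), and fragments are assumed to be given as 0-based half-open (start, end, strand) tuples.

```python
import math
from collections import Counter
from statistics import median

# Hypothetical constants mirroring the filters named in the record.
MIN_MAPQ = 7                            # samtools view -q 7 keeps MAPQ >= 7
PROPER_PAIR_FLAGS = {99, 147, 83, 163}  # exact-match assumption
BIN_SIZE = 200                          # bin width in bp


def keep_alignment(flag: int, mapq: int) -> bool:
    """True if an alignment passes the MAPQ and proper-pair filters."""
    return mapq >= MIN_MAPQ and flag in PROPER_PAIR_FLAGS


def midpoint_bin_counts(fragments, bin_size=BIN_SIZE):
    """Count fragment midpoints per (strand, bin index).

    fragments: iterable of (start, end, strand), 0-based half-open coordinates.
    """
    counts = Counter()
    for start, end, strand in fragments:
        midpoint = (start + end) // 2
        counts[(strand, midpoint // bin_size)] += 1
    return counts


def size_factors(counts):
    """Median-of-ratios size factors in the style of DESeq/DESeq2.

    counts: dict feature_id -> list of per-sample counts (same sample order).
    Features with a zero count in any sample are skipped because their
    geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median on the log scale, then back-transform.
    return [math.exp(median(r)) for r in log_ratios]
```

Dividing each sample's counts by its size factor makes libraries of different depth comparable; in the toy case where one sample has exactly twice the counts of the other for every feature, the factors come out as 1/sqrt(2) and sqrt(2).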
In order to overcome low expression or mappability issues, ncRNAs that were only 200 bp (1 bin) apart were merged. Subsequently, TU start and end sites were refined to nucleotide precision: within the two 200 bp bins around each initially assigned start or stop site, the border of abrupt coverage increase or decrease between two consecutive segments was located by fitting a piecewise constant curve to the coverage profiles (whole fragments) of all TT-seq samples, using the segmentation method of the R/Bioconductor package tilingArray (Huber 2006). To filter spurious predictions, TUs were further required to pass a minimal expression threshold of replicate-averaged normalized FPK >= 20 as described above (with at least 15 normalized FPK per replicate). ncRNAs overlapping GENCODE (v31)-annotated small non-coding RNA classes (snRNA, snoRNA, tRNA) were omitted from further analysis. The remaining ncRNAs were classified according to their genomic location relative to protein-coding genes into four categories: upstream antisense RNA (uaRNA), convergent RNA (conRNA), antisense RNA (asRNA) and intergenic RNA. ncRNAs located on the opposite strand of an mRNA were classified as asRNA if their TSS was located > 1 kb downstream of the sense TSS, as uaRNA if their TSS was located < 1 kb upstream of the sense TSS, and as conRNA if their TSS was located < 1 kb downstream of the sense TSS. All remaining ncRNAs were classified as intergenic.
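The expression filter and the TSS-based classification of opposite-strand ncRNAs can be illustrated with a small sketch. The record does not define "normalized FPK" precisely, so `normalized_fpk` assumes fragments divided by the sample's size factor and by TU length in kb. The classifier is hypothetical: it assumes the caller passes the mRNA the ncRNA lies antisense to, and the handling of the exact 1 kb and 0 bp boundaries, which the record leaves open, is an assumption.

```python
from statistics import mean


def normalized_fpk(fragment_count: float, length_bp: int, size_factor: float) -> float:
    """Fragments per kilobase after dividing by the sample's size factor (assumed definition)."""
    return fragment_count / size_factor / (length_bp / 1000)


def passes_expression_filter(replicate_fpk, min_mean=20.0, min_each=15.0) -> bool:
    """Replicate-averaged normalized FPK >= 20 and every replicate >= 15."""
    return mean(replicate_fpk) >= min_mean and min(replicate_fpk) >= min_each


def classify_opposite_strand_ncrna(nc_tss: int, mrna_tss: int, mrna_strand: str,
                                   window: int = 1000) -> str:
    """Classify an ncRNA on the strand opposite an mRNA by its TSS position.

    The offset is measured along the mRNA's direction of transcription,
    so positive values are downstream of the sense TSS.
    """
    offset = nc_tss - mrna_tss if mrna_strand == "+" else mrna_tss - nc_tss
    if offset >= window:        # > 1 kb downstream (exactly 1 kb: assumption)
        return "asRNA"
    if 0 <= offset < window:    # < 1 kb downstream (offset 0: assumption)
        return "conRNA"
    if -window < offset < 0:    # < 1 kb upstream
        return "uaRNA"
    return "intergenic"         # not covered by the three rules above
```

Measuring the offset along the sense strand's direction lets one function handle mRNAs on either strand; for example, an ncRNA TSS 500 bp to the left of a "-" strand mRNA TSS is 500 bp downstream and is classified as conRNA.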
eRNA annotation: Of the ncRNA classes defined above, we further classified intergenic RNAs and asRNAs as eRNAs if their TSS +/- 500 bp overlapped an enhancer state annotated by GenoSTAN (Zacher 2017) based on patterns of five histone modifications (H3K4me1, H3K4me3, H3K36me3, H3K27me3 and H3K9me3) in any of 127 cell types and tissues, covering 111 datasets of the Roadmap Epigenomics project (Roadmap Epigenomics Consortium et al., 2015) and 16 datasets of the Encyclopedia of DNA Elements (ENCODE) project (Dunham et al., 2012). Long antisense eRNAs that overlapped the promoter region of the respective protein-coding gene (TSS +/- 1 kbp) were annotated to end 1 kbp downstream of the protein-coding TSS. Next, we introduced additional filtering criteria, because highly transcribed TUs can give rise to spurious downstream TUs that do not represent independent TUs but rather ongoing, gradually decreasing transcription by Pol II downstream of the main TU. To avoid misclassifying such downstream TUs as putative eRNAs, we did the following:
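The core eRNA test, whether the TSS +/- 500 bp window of an intergenic RNA or asRNA touches any enhancer-state segment, is an interval-overlap check. The sketch below is hypothetical: it assumes enhancer segments from all 127 cell types and tissues have already been pooled into (start, end) intervals on the same chromosome, uses 0-based half-open coordinates, and does not cover the promoter-trimming step for long antisense eRNAs.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Overlap test for 0-based half-open intervals [start, end)."""
    return a_start < b_end and b_start < a_end


def is_erna_candidate(nc_class: str, tss: int, enhancer_states, flank: int = 500) -> bool:
    """True if TSS +/- flank overlaps any enhancer-state segment.

    enhancer_states: iterable of (start, end) half-open intervals on the same
    chromosome, e.g. pooled across cell types and tissues (assumption).
    """
    if nc_class not in {"intergenic", "asRNA"}:
        return False
    # The +1 makes the window include position tss + flank itself.
    window_start, window_end = tss - flank, tss + flank + 1
    return any(overlaps(window_start, window_end, s, e) for s, e in enhancer_states)
```

For genome-wide use, a sorted-interval or interval-tree lookup would replace the linear `any(...)` scan; the linear version is kept here only for readability.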
First, we restricted our set of intergenic eRNAs to those originating from regions with TT-seq-detected transcription on both strands, allowing a gap of up to 750 bp between divergently transcribed eRNA pairs. To this end, a non-coding TU had to be annotated on the opposite strand, but it was not required to pass the expression threshold. Second, we also restricted our set of antisense eRNAs. To this end, non-coding TUs downstream of mRNA/uaRNA TUs were sequentially merged with the upstream mRNA/uaRNA if the gap between TUs was less than 5 kbp. Subsequently, non-coding TUs that had been classified as antisense eRNAs and were constituents of such merged regions were excluded from further analyses, unless their TT-seq signal (normalized FPK) was at least 2-fold higher than the TT-seq signal over the end (last 1 kbp) of the upstream mRNA/uaRNA.
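The second filter, sequential merging of downstream non-coding TUs into the upstream mRNA/uaRNA followed by the 2-fold signal test, can be sketched for a single "+" strand as below. Everything here is illustrative: the `TU` record, the "asRNA_eRNA" label and the function name are hypothetical, the signal over the last 1 kbp of each mRNA/uaRNA is assumed to be precomputed in `tail_fpk`, each new mRNA/uaRNA is assumed to start a fresh merged region, and "-" strand TUs would need their coordinates mirrored first.

```python
from dataclasses import dataclass


@dataclass
class TU:
    name: str
    start: int              # 0-based, half-open, '+' strand
    end: int
    kind: str               # "mRNA", "uaRNA", "asRNA_eRNA" or another ncRNA label
    fpk: float              # normalized FPK over the whole TU
    tail_fpk: float = 0.0   # normalized FPK over the last 1 kbp (anchors only)


ANCHOR_KINDS = {"mRNA", "uaRNA"}


def readthrough_erna_to_drop(tus, max_gap: int = 5000, min_fold: float = 2.0):
    """Names of antisense eRNAs to exclude as likely read-through of an upstream anchor."""
    dropped = []
    anchor = None
    region_end = 0
    for tu in sorted(tus, key=lambda t: t.start):
        if tu.kind in ANCHOR_KINDS:
            # Each mRNA/uaRNA starts a new merged region (assumption).
            anchor, region_end = tu, tu.end
            continue
        if anchor is not None and tu.start - region_end < max_gap:
            # Merge this non-coding TU into the current region.
            region_end = max(region_end, tu.end)
            if tu.kind == "asRNA_eRNA" and tu.fpk < min_fold * anchor.tail_fpk:
                dropped.append(tu.name)
        else:
            anchor = None  # gap too large: the chain of merges ends here
    return dropped
```

Because `region_end` grows with every merged TU, a chain of closely spaced downstream TUs stays attached to the same upstream mRNA/uaRNA, which is one reading of "sequentially merged"; a TU separated by 5 kbp or more breaks the chain.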
Enhancer annotation: To define intergenic enhancer regions, we selected the region between the TSSs of each pair of intergenic bidirectional eRNAs and extended it by 500 bp in both directions. To define intragenic enhancer regions, we selected the region from 750 bp upstream to 500 bp downstream of the TSS of antisense eRNAs. Because we allowed a maximum gap of 750 bp between bidirectional intergenic eRNAs, we also used 750 bp as the putative enhancer region upstream of antisense eRNAs. By this definition, all intragenic enhancer regions were 1250 bp long, whereas the length of intergenic enhancer regions depended on the gap or overlap between bidirectional eRNAs. Strongly transcribed intergenic enhancer regions gave rise to uninterrupted bidirectional TT-seq signal over long genomic stretches, which could not be resolved into individual enhancers and therefore resulted in some enhancer regions covering several kbp.
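The two enhancer-region definitions reduce to simple coordinate arithmetic. In this sketch (function names hypothetical), regions are 0-based half-open intervals, and "upstream" for an antisense eRNA is taken relative to the eRNA's own strand, so every intragenic region spans 750 + 500 = 1250 bp.

```python
def intergenic_enhancer(tss_a: int, tss_b: int, flank: int = 500) -> tuple[int, int]:
    """Span between the TSSs of a bidirectional eRNA pair, extended by flank on both sides."""
    lo, hi = sorted((tss_a, tss_b))
    return lo - flank, hi + flank


def intragenic_enhancer(tss: int, strand: str,
                        upstream: int = 750, downstream: int = 500) -> tuple[int, int]:
    """750 bp upstream to 500 bp downstream of an antisense eRNA TSS, strand-aware."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream
```

For example, an antisense eRNA on the "-" strand with its TSS at 10,000 yields the region (9500, 10750): upstream of a "-" strand TSS lies at higher coordinates.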
Dealing with annotated lncRNAs: The FANTOM consortium has shown that the majority of intergenic lncRNAs originate from enhancer regions; we therefore only excluded eRNAs/enhancers if they overlapped a functional lncRNA originating from an intergenic promoter DHS (24 lncRNAs, e.g. Malat1, taken from Supplementary Table 5 in Hon 2017 by selecting CAT_geneCategory = p_lncRNA_intergenic).
The resulting annotations of eRNAs and enhancer regions reported in Lidschreiber et al. are provided as eRNA/enhancer .gtf files. In addition, conservative subsets of these eRNAs/enhancer regions are provided. In the conservative annotations, intragenic enhancer regions are restricted to those defined by eRNAs of at least 300 nt with normalized FPK >= 30; intergenic enhancer regions/eRNAs are the same as in Lidschreiber et al.
processed data files format and content: eRNA and transcribed enhancer annotations are provided for each of the cell lines as .gtf files. Strand-specific bigWig files contain size factor normalized fragment coverages for each TT-seq labeled RNA sample.
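A minimal way to load one of the .gtf annotation files (for example K562_eRNAs.gtf) without external dependencies is sketched below. It follows the general GTF layout (nine tab-separated columns, 1-based inclusive coordinates, `key "value";` attributes) and converts to 0-based half-open coordinates. The record does not document which feature types or attribute keys these files use, so the example line in the usage note is hypothetical, and the attribute parser naively assumes no ";" inside quoted values.

```python
def parse_gtf_line(line: str) -> dict:
    """Parse one GTF record into a dict with 0-based half-open coordinates."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields
    attributes = {}
    for item in attr_text.split(";"):  # naive: no ';' inside quoted values
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start) - 1,  # GTF start is 1-based, inclusive
        "end": int(end),          # 1-based inclusive end == 0-based exclusive end
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


def read_gtf(lines):
    """Yield parsed records, skipping blank and '#' comment lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_gtf_line(line)
```

Usage: `with open("K562_eRNAs.gtf") as fh: records = list(read_gtf(fh))`. A hypothetical line such as `chr1	TTseq	enhancer	1001	2250	.	+	.	gene_id "enh1";` would parse to start 1000 and end 2250, i.e. a 1250 bp interval. The bigWig coverage tracks are binary files and need a dedicated reader library rather than a text parser.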
Submission date Jun 11, 2020
Last update date Jan 12, 2021
Contact name Michael Lidschreiber
Organization name Max-Planck-Institute for Multidisciplinary Sciences
Street address Am Fassberg 11
City Goettingen
ZIP/Postal code 37077
Country Germany
Platform ID GPL16791
Series (1)
GSE152291 Transcriptionally active enhancers in human cancer cells
Reanalysis of GSM1967864
BioSample SAMN15215110
SRA SRX8530979

Supplementary files 216.2 Mb (BW); 228.2 Mb (BW)
Raw data are available in SRA
Processed data are available on Series record
