Status Public on May 13, 2019
Title HSJ-020
Sample type SRA
Source name Tumor
Organism Homo sapiens
Characteristics genotype: H3.3K27WT
treatment: None
Extracted molecule total RNA
Extraction protocol Total RNA was extracted from cell pellets using the RNeasy mini kit (Qiagen) according to instructions from the manufacturer.
Library preparation was performed with ribosomal RNA (rRNA) depletion according to the manufacturer's instructions (Epicentre) to achieve greater coverage of mRNA and long non-coding transcripts. Paired-end sequencing was performed on the Illumina HiSeq 2000, 2500 and 4000 platforms.
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina HiSeq 2000
Data processing Data processing. Adaptor sequences and the first four nucleotides of each read were removed from the read sets using Trimmomatic (v0.32). Reads were scanned from start to end and truncated at the point where the average quality of a 4-nucleotide sliding window fell below 30 (phred33 encoding). Short reads (<30 bp) were subsequently discarded. Multiple control metrics were obtained using FastQC (v0.11.2), samtools (v0.1.19), BEDtools (v2.17.0) and custom scripts.
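The trimming rules described above can be illustrated with a minimal pure-Python sketch. It is not Trimmomatic itself and simplifies its behaviour; the function names and default arguments (`trim_read`, `decode_phred33`, `headcrop`, `window`, `min_avg_q`, `min_len`) are hypothetical and chosen only to mirror the values stated in this record.

```python
def decode_phred33(qual_string):
    """Convert a FASTQ quality string (Phred+33 encoding) to integer scores."""
    return [ord(ch) - 33 for ch in qual_string]


def trim_read(seq, quals, headcrop=4, window=4, min_avg_q=30, min_len=30):
    """Simplified illustration of the trimming steps (assumed parameters).

    1. Drop the first `headcrop` bases.
    2. Scan from the 5' end; cut the read at the first window of `window`
       bases whose average quality is below `min_avg_q`.
    3. Discard the read (return None) if fewer than `min_len` bases remain.
    """
    seq, quals = seq[headcrop:], quals[headcrop:]

    cut = len(seq)  # default: keep the whole read
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg_q:
            cut = start
            break

    seq, quals = seq[:cut], quals[:cut]
    if len(seq) < min_len:
        return None
    return seq, quals


# Usage: a 50-base read whose quality collapses over the last 10 bases.
example = trim_read("A" * 50, [40] * 40 + [2] * 10)
```

Adaptor clipping is left out of the sketch because it depends on the adaptor sequences, which this record does not list.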
Gene expression analysis. The remaining clean set of reads was then aligned to the reference genome build hg19 (GRCh37) with STAR (v2.3.0e) using the default parameters. Multimapping reads (MAPQ<3) were discarded from downstream analyses. Gene expression levels were estimated by quantifying uniquely mapped reads falling into exonic regions defined by the ensGene annotation set from Ensembl (GRCh37; N=60234 genes) using featureCounts (v1.4.4). Normalization (mean-of-ratios), variance-stabilizing transformation of the data, and differential expression analysis were performed using DESeq2. Unless otherwise stated, all reported p-values have been adjusted for multiple testing using the Benjamini-Hochberg procedure.
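For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, the following is a minimal pure-Python sketch of the adjustment. It only illustrates the standard step-up formula; it is not the DESeq2 code path, which may apply additional filtering before adjustment. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), then
    monotonicity is enforced by taking a running minimum from the largest
    rank downwards. Results are capped at 1.0.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Usage with four illustrative (made-up) raw p-values.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```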
Repeat element expression analysis. The clean set of rRNA-depleted RNA-seq reads was aligned to the human repeat genome, as previously described. The reference repeat sequences were downloaded from Repbase (v23.03). We then combined humrep.ref and humsub.ref into a single reference of repeat sequences for the human genome, covering a total of 1132 consensus repeat element sequences. All subsequent steps closely mirror those detailed for gene expression analysis. Normalization (library size) factors derived from canonical genes (Ensembl ensGene annotation) using mean-of-ratios, as described above, were used to normalize repeat element counts.
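The idea of deriving normalization factors from canonical genes and applying them to repeat element counts can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the record names the method "mean-of-ratios", while the sketch uses the median of per-gene ratios to the geometric mean, which is the estimator documented for DESeq2 size factors. The function names and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(gene_counts):
    """Per-sample size factors from a dict of gene -> list of raw counts.

    Assumption: median of ratios to the per-gene geometric mean. Genes with
    a zero count in any sample are skipped, since their geometric mean is 0.
    """
    n_samples = len(next(iter(gene_counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in gene_counts.values():
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, count in enumerate(row):
            log_ratios[j].append(math.log(count) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return {name: [c / f for c, f in zip(row, factors)]
            for name, row in counts.items()}


# Toy data (made up): sample 2 was sequenced twice as deeply as sample 1.
genes = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10]}
repeats = {"repeatX": [7, 14]}

factors = size_factors(genes)                    # derived from genes only
normalized_repeats = normalize(repeats, factors)
```

Estimating the factors from genes alone keeps a global shift in repeat expression from being absorbed into the normalization.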
Genome_build: hg19
Supplementary_files_format_and_content: tab separated values containing raw read counts
Submission date Mar 21, 2019
Last update date May 13, 2019
Contact name Nada Jabado
Organization name McGill University
Department Department of Pediatrics
Lab Jabado Lab
Street address 1001 Décarie Boulevard
City Montreal
State/province Québec
ZIP/Postal code H4A 3J1
Country Canada
Platform ID GPL11154
