Status |
Public on May 17, 2019 |
Title |
WT_N_3 (5711) |
Sample type |
SRA |
|
|
Source name |
normal mammary epithelial cells
|
Organism |
Mus musculus |
Characteristics |
strain: C3H/HeJ genotyping: Wild type age: 270 days
|
Extracted molecule |
polyA RNA |
Extraction protocol |
RNA was extracted from tumors using TRI reagent (Biolab), then chloroform was added. The upper layer was taken and washed with isopropanol and then with 75% ethanol. The RNA was eluted using DEPC water. Libraries were made using the KAPA Single-Indexed Adapter Kit (Illumina, Massachusetts, USA).
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina NextSeq 500 |
|
|
Description |
DESeq2_final 14-7-2018
|
Data processing |
Trimming and filtering of raw reads
The NextSeq basecall files were converted to fastq files using the bcl2fastq program (v2.17.1.14) with default parameters. The provided SampleSheet.csv file contained only the samples' names and barcodes, so no trimming or filtering was done at this stage, and a separate fastq file was created for each sample. Raw reads (fastq files) were inspected for quality issues with FastQC (v0.11.2, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Based on the FastQC report, reads were quality-trimmed at both ends, using in-house Perl scripts, with a quality threshold of 32. In short, the scripts use a sliding window of 5 bases from the read's end and trim one base at a time until the average quality of the window passes the given threshold. Following quality trimming, adapter sequences were removed with cutadapt (version 1.7.1, http://cutadapt.readthedocs.org/en/stable/), using a minimal overlap of 1 (-O parameter), allowing for read wildcards, and filtering out reads that became shorter than 15 nt (-m parameter). The remaining reads were further filtered to remove very low-quality reads, using the fastq_quality_filter program of the FASTX package (version 0.0.13, http://hannonlab.cshl.edu/fastx_toolkit/), requiring a quality score of at least 20 at 90 percent or more of the read's positions.

Mapping and differential expression analysis
The processed fastq files were mapped to the mouse transcriptome and genome using TopHat (v2.0.13). The genome version was GRCm38, with annotations from Ensembl release 84. Mapping allowed up to 5 mismatches per read, a maximum gap of 5 bases, and a total edit distance of 10 (full command: tophat -G genes.gtf -N 5 --read-gap-length 5 --read-edit-dist 10 --segment-length 20 --read-realign-edit-dist 3 --library-type fr-firststrand genome processed.fastq).

For the statistics file, quantification was done using htseq-count (version 0.6.0, http://www-huber.embl.de/users/anders/HTSeq/doc/count.html). Strand information was set to 'reverse', and the annotation file used lacked information for genes of type IG, TR, Mt, rRNA, tRNA, miRNA, misc_RNA, scRNA, snRNA, snoRNA, sRNA, scaRNA, piRNA, vaultRNA, ribozyme, artifact and LRG_gene.

For further analysis, quantification was done with the Cufflinks package (v2.2.1), using the cuffquant program with genome bias correction (-b parameter), the multi-mapped reads assignment algorithm (-u parameter) and masking of genes of the same types as above (IG, TR, etc.) (-M parameter). Strand directionality was given by the --library-type fr-firststrand parameter. Raw counts were obtained by running cuffnorm on the cuffquant output.

Normalization and differential expression were done with the DESeq2 package (version 1.10.1). Genes with a sum of counts less than 2 over all samples were filtered out prior to normalization, then dispersion and size factors were calculated. Differential expression was calculated using a design which included the genotype factor, compensating for any run effects (both P53_WWOX_DKO and WWOX_KO_T genotypes included samples from 2 different runs). All required comparisons were performed with default parameters and a significance threshold of padj<0.1. Comparisons between normal and cancer samples were performed with the lfcThreshold parameter, which changes the default test for any change into a test for changes above a given threshold; its value was set to 0.7 (~1.6-fold).

Several quality control assays, such as count distributions and principal component analysis, as well as differential expression results, were calculated and visualized in R (version 3.2.1, with packages 'RColorBrewer_1.1-2', 'pheatmap_1.0.8' and 'ggplot2_2.1.0'). The principal component analysis showed that sample P53_WWOX_DKO_4 harbored an additional, unknown source of variation and thus might be an outlier. Hence, an analysis which excluded this sample (using the same parameters otherwise) was also performed (termed All_noDKO4). In addition, samples in this experiment came from two different mouse strains, introducing a source of variation which might not be directly related to the different genotypes. For each strain, an analysis was performed which included only its own samples, calculating the intra-strain comparisons (using the same parameters as above). Results (for all analyses) were then combined with gene details (such as symbol, Entrez accession, etc.) taken from the results of a BioMart query (Ensembl, release 84) to produce the final Excel file.

Genome_build: mm10
Supplementary_files_format_and_content: Normalization and differential expression were done with the DESeq2 package (version 1.10.1).
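The sliding-window trimming and the per-read quality filter described under Data processing can be sketched in Python as follows. This is only an illustration, not the submitters' in-house Perl scripts, which are not part of this record. It assumes that Phred scores are already decoded to integers, that a window "passes" when its mean quality is greater than or equal to the threshold, and that the window shrinks when fewer than 5 bases remain. All function names are hypothetical. The last helper shows the arithmetic behind the stated lfcThreshold of 0.7 (log2 scale) corresponding to roughly 1.6-fold.

```python
from typing import List, Tuple


def _bases_to_keep(quals: List[int], window: int, threshold: float) -> int:
    """Trim from the end of `quals`; return how many leading bases remain.

    Assumption: the window passes when its mean quality is >= threshold,
    and it shrinks if fewer than `window` bases are left.
    """
    end = len(quals)
    while end > 0:
        win = quals[max(0, end - window):end]
        if sum(win) / len(win) >= threshold:
            break
        end -= 1  # drop one base and re-evaluate the window
    return end


def quality_trim(seq: str, quals: List[int],
                 window: int = 5, threshold: float = 32) -> Tuple[str, List[int]]:
    """Quality-trim a read at both ends with a sliding window."""
    # Trim the 3' end.
    keep = _bases_to_keep(quals, window, threshold)
    seq, quals = seq[:keep], quals[:keep]
    # Trim the 5' end by applying the same procedure to the reversed read.
    keep = _bases_to_keep(quals[::-1], window, threshold)
    start = len(quals) - keep
    return seq[start:], quals[start:]


def passes_quality_filter(quals: List[int],
                          min_quality: int = 20,
                          min_percent: float = 90) -> bool:
    """Keep a read if at least `min_percent` of its bases have
    quality >= `min_quality` (mirrors the thresholds stated in the text)."""
    if not quals:
        return False
    good = sum(1 for q in quals if q >= min_quality)
    return 100.0 * good / len(quals) >= min_percent


def lfc_to_fold(lfc: float) -> float:
    """Convert a log2 fold-change threshold to a linear fold change."""
    return 2.0 ** lfc


if __name__ == "__main__":
    read = "ACGTACGTAC"
    scores = [10, 10, 35, 35, 35, 35, 35, 35, 10, 10]
    trimmed_seq, trimmed_quals = quality_trim(read, scores)
    print(trimmed_seq, passes_quality_filter(trimmed_quals))
    print(round(lfc_to_fold(0.7), 2))
```

In this toy read, the two low-quality bases at each end are removed. A real pipeline would additionally apply the 15 nt minimum length that the record states was enforced by cutadapt.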
|
|
|
Submission date |
Jul 19, 2018 |
Last update date |
May 17, 2019 |
Contact name |
Suhaib Khaled Abdeen |
E-mail(s) |
suhibabdin@gmail.com
|
Organization name |
Hebrew University
|
Department |
Immunology and cancer research
|
Lab |
Rami Aqeilan
|
Street address |
Ein kerem
|
City |
Jerusalem |
ZIP/Postal code |
9112101 |
Country |
Israel |
|
|
Platform ID |
GPL19057 |
Series (1) |
GSE117387 |
Somatic loss of WWOX is associated with TP53 perturbation in basal-like breast cancer |
|
Relations |
BioSample |
SAMN09692894 |
SRA |
SRX4409210 |
Supplementary data files not provided |
Raw data are available in SRA |
Processed data are available on Series record |