Sample GSM6103949
Status Public on May 09, 2022
Title S001253
Sample type SRA
 
Source name Primary breast tumor
Organism Homo sapiens
Characteristics scanb external id: Q007551.C007529.S001253.l.r.lib.g.k2.a
age at diagnosis: 67
tumor size: 16
lymph node group: N0
lymph node status: NodeNegative
er status: 1
pgr status: 1
her2 status: 0
ki67 status: NA
nhg: G2
pam50 subtype: LumA
clinical groups: ERpHER2n
histopathological type: Ductal
endocrine treated: 1
chemo treated: 0
overall survival days: 1191
overall survival years: 3.26078028747433
overall survival event: 1
relapse free interval days: 1191
relapse free interval years: 3.26078028747433
relapse free interval event: 0
esr1 expression: log2(tpm+0.1): 8.2674228677742
esr2 expression: log2(tpm+0.1): -2.40656113053976
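The survival fields can be cross-checked arithmetically. Dividing 1191 days by 365.25 gives 3.26078028747433, which matches both "years" fields, so a 365.25-day year appears to have been used. This is an inference and is not stated in the record. The sketch below parses a few of the key: value lines above and converts between the log2(TPM+0.1) scale and TPM. The helper names are hypothetical.

```python
import math

# A few "key: value" lines copied from the Characteristics field of this record.
RAW = [
    "overall survival days: 1191",
    "overall survival years: 3.26078028747433",
    "esr1 expression: log2(tpm+0.1): 8.2674228677742",
    "ki67 status: NA",
]


def parse_characteristics(lines):
    """Split each line at the first ': ' into a key and a raw string value."""
    fields = {}
    for line in lines:
        key, value = line.split(": ", 1)
        fields[key] = value
    return fields


def days_to_years(days):
    """Assumed convention: 365.25 days per year (reproduces the record's values)."""
    return days / 365.25


def log2_tpm(tpm, offset=0.1):
    """Forward transform used for the expression fields: log2(TPM + 0.1)."""
    return math.log2(tpm + offset)


def tpm_from_log2(value, offset=0.1):
    """Inverse transform: recover TPM from log2(TPM + 0.1)."""
    return 2.0 ** value - offset


fields = parse_characteristics(RAW)
# Expression values keep their scale as a prefix, e.g. "log2(tpm+0.1): 8.267...".
esr1_log2 = float(fields["esr1 expression"].rsplit(": ", 1)[1])
esr1_tpm = tpm_from_log2(esr1_log2)            # roughly 308 TPM
years = days_to_years(int(fields["overall survival days"]))
```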
Extracted molecule polyA RNA
Extraction protocol QIAGEN AllPrep
Poly(A) mRNA is isolated from the total RNA in up to 96-well microtiter plate format by two rounds of purification with Dynabeads Oligo (dT)25 (Invitrogen) using a KingFisher Flex magnetic particle processor (Thermo Scientific). Zinc-mediated fragmentation (Ambion) is performed, and the fragmented mRNA is recovered by column purification (Zymo-Spin I-96 plates; Zymo).

The sequencing library generation protocol is a modification of the dUTP method, which importantly retains the directionality (strandedness) of the sequenced RNA molecules. First-strand cDNA synthesis is performed using random hexamers and a standard dNTP mix, followed by cleanup using Sephadex gel filtration (Illustra AutoScreen-96A plates; GE Healthcare). Second-strand cDNA synthesis is performed using dUTP in place of dTTP in the dNTP mix, followed by cleanup using Zymo-Spin I-96 plates. The cDNA is end-repaired and A-tailed, and diluted barcoded TruSeq adapters are ligated using a modified protocol (Illumina). Adapter-ligated cDNA is then size-selected to remove short oligonucleotides using carboxylic acid (CA) paramagnetic beads (Invitrogen) and polyethylene glycol (PEG), similar to previously described methods, and this step is automated on the KingFisher Flex. The second cDNA strand is digested using uracil-DNA glycosylase, and the product is enriched by 12 PCR cycles (Illumina). The PCR product undergoes two rounds of size selection using CA beads and varying concentrations of PEG, first to exclude DNA fragments >700 bp and then to exclude fragments <200 bp.

Quality control is performed on control libraries using Qubit fluorometric measurement (Life Technologies) and Caliper LabChip XT microcapillary gel electrophoresis. Typically, 10-24 barcoded libraries are included in a pool, and each pool is sequenced in at least one lane across dual flowcells. Paired-end sequencing with 50 bp reads is performed on an Illumina HiSeq 2000 or NextSeq 500 instrument.
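The dUTP-marked second strand is digested before PCR, so the surviving molecules derive from the first cDNA strand, which is the reverse complement of the RNA. In a paired-end library this means read 1 aligns to the strand opposite the transcript and read 2 aligns to the transcript strand. This is the "RF" layout later passed to HISAT2. The sketch below encodes that rule together with an idealized version of the 200-700 bp size-selection window. Real bead selection is not a sharp cutoff. The function names are illustrative and do not come from the original pipeline.

```python
def flip(strand):
    """Return the opposite genomic strand ('+' <-> '-')."""
    return "-" if strand == "+" else "+"


def expected_alignment_strand(transcript_strand, mate):
    """Genomic strand a mate is expected to align to in a dUTP (RF) library.

    Read 1 comes from the first cDNA strand (antisense to the RNA), so it aligns
    opposite the transcript; read 2 aligns to the transcript's own strand.
    """
    if mate == 1:
        return flip(transcript_strand)
    if mate == 2:
        return transcript_strand
    raise ValueError("mate must be 1 or 2")


def passes_size_selection(fragment_bp, low=200, high=700):
    """Idealized two-sided bead selection: drop fragments <200 bp or >700 bp."""
    return low <= fragment_bp <= high


# A read 1 on the '-' strand over a '+'-strand gene is consistent with RF.
print(expected_alignment_strand("+", 1))   # -
print(passes_size_selection(350))          # True
```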
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina HiSeq 2000
 
Description ER/PgR/HER2/Ki67: 0 negative/normal/low, 1 positive/amplified/high
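The 0/1 coding above applies across the ER, PgR, HER2, and Ki67 fields. The record does not specify which word applies to which marker, so the sketch below keeps the combined labels. "NA" is treated as missing, as in this sample's ki67 status. The mapping table and function name are hypothetical.

```python
# Codes from the Description field; "NA" marks a missing value.
STATUS_CODES = {
    "0": "negative/normal/low",
    "1": "positive/amplified/high",
    "NA": None,
}


def decode_status(raw):
    """Map a raw status string from the record to its label (None if missing)."""
    return STATUS_CODES[raw.strip()]


# Values taken from this sample's Characteristics.
record = {"er status": "1", "pgr status": "1", "her2 status": "0", "ki67 status": "NA"}
decoded = {field: decode_status(code) for field, code in record.items()}
```

With ER coded 1 and HER2 coded 0, the decoded values agree with this sample's clinical group, ERpHER2n.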
Data processing Base calling was performed using the manufacturer's on-instrument software.
Demultiplexing was performed with Picard versions 1.120 or 1.128, running IlluminaBasecallsToFastq with the parameters ADAPTERS_TO_CHECK=INDEXED, ADAPTERS_TO_CHECK=PAIRED_END, and INCLUDE_NON_PF_READS=false.
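Picard tools of this era take KEY=VALUE arguments. List-valued options such as ADAPTERS_TO_CHECK are supplied by repeating the key, which is why that option appears twice. The sketch below assembles an argument list containing the three options from the record. The following are assumptions made for illustration:

- the jar path and invocation style;
- the BASECALLS_DIR, LANE, READ_STRUCTURE, and MULTIPLEX_PARAMS values.

Depending on the Picard version, other required arguments such as run or flowcell identifiers may also be needed and are not shown.

```python
import shlex


def basecalls_to_fastq_args(basecalls_dir, lane, read_structure, multiplex_params,
                            picard_jar="picard.jar"):
    """Build a hypothetical IlluminaBasecallsToFastq command line."""
    return [
        "java", "-jar", picard_jar, "IlluminaBasecallsToFastq",
        f"BASECALLS_DIR={basecalls_dir}",
        f"LANE={lane}",
        f"READ_STRUCTURE={read_structure}",      # e.g. 50 bp template, 8 bp barcode, 50 bp template
        f"MULTIPLEX_PARAMS={multiplex_params}",
        # Options reported in this record; the list-valued option repeats its key.
        "ADAPTERS_TO_CHECK=INDEXED",
        "ADAPTERS_TO_CHECK=PAIRED_END",
        "INCLUDE_NON_PF_READS=false",
    ]


args = basecalls_to_fastq_args("Data/Intensities/BaseCalls", 1, "50T8B50T",
                               "lane1_multiplex_params.txt")
print(shlex.join(args))
```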
Reads were then filtered to remove those that align (using Bowtie 2 with default parameters except -k 1 --phred33 --local) to ribosomal RNA/DNA (GenBank loci NR_023363.1, NR_003285.2, NR_003286.2, NR_003287.2, X12811.1, U13369.1), the phiX174 Illumina control (NC_001422.1), or sequences contained in the UCSC RepeatMasker track (downloaded March 14, 2011).
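The following is a sketch of the filtering call under two assumptions. First, the listed rRNA, phiX, and RepeatMasker sequences were combined into a single Bowtie 2 index, hypothetically named "contaminants" here. Second, pairs that do not align concordantly were kept via --un-conc-gz. The record gives only the non-default alignment options and does not say how the surviving reads were captured. Under this design, a pair that aligns only discordantly to a contaminant would still pass the filter.

```python
def contaminant_filter_args(index_prefix, read1, read2, out_prefix):
    """Build a hypothetical Bowtie 2 command that keeps non-contaminant pairs."""
    return [
        "bowtie2",
        "-k", "1", "--phred33", "--local",      # non-default options from the record
        "-x", index_prefix,                      # index built from the contaminant set
        "-1", read1, "-2", read2,
        # Pairs that fail to align concordantly are written here; Bowtie 2
        # replaces '%' with 1 or 2 for the two mates.
        "--un-conc-gz", f"{out_prefix}_R%.fastq.gz",
        "-S", "/dev/null",                       # the alignments themselves are discarded
    ]


args = contaminant_filter_args("contaminants", "S001253_R1.fastq.gz",
                               "S001253_R2.fastq.gz", "S001253_filtered")
```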
The fragment size distribution (mean and width) for the alignment step was estimated for each sample using Bowtie 2 (versions 2.2.3 and 2.2.5) against human genome assembly GRCh38, with the parameters --fr, -k 1, --phred33, --local, and -u 100000.
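Bowtie 2 reports the inferred fragment length in the TLEN column (column 9) of its SAM output. The mean and width can therefore be summarized from properly paired alignments. The -u 100000 option already caps the input at the first 100,000 pairs. This sketch assumes that "width" means the standard deviation, which the record does not define. The toy SAM lines are made up.

```python
import statistics


def fragment_size_stats(sam_lines):
    """Mean and standard deviation of fragment sizes from SAM text lines.

    Counts each pair once (first mate, flag 0x40) and keeps only pairs
    aligned as proper pairs (flag 0x2) with a non-zero TLEN.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & 0x2 and flag & 0x40 and tlen != 0:
            sizes.append(abs(tlen))
    if len(sizes) < 2:
        raise ValueError("need at least two proper pairs")
    return statistics.mean(sizes), statistics.stdev(sizes)


# Two toy pairs with fragment sizes 250 and 350 bp (flags 99/147 = proper pair).
toy_sam = [
    "@HD\tVN:1.0",
    "p1\t99\tchr1\t100\t42\t50M\t=\t300\t250\t*\t*",
    "p1\t147\tchr1\t300\t42\t50M\t=\t100\t-250\t*\t*",
    "p2\t99\tchr1\t500\t42\t50M\t=\t800\t350\t*\t*",
    "p2\t147\tchr1\t800\t42\t50M\t=\t500\t-350\t*\t*",
]
mean_bp, sd_bp = fragment_size_stats(toy_sam)   # 300 and about 70.7
```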
Remaining reads were aligned using HISAT2 v2.1.0 to the human genome reference GRCh38/hg38 together with 104,133 transcript annotations from the UCSC knownGenes table (downloaded September 22, 2014) using the GENCODE release 27 transcriptome model, with default parameters except --no-unal --non-deterministic --novel-splicesite-outfile ${SPLICEFILE} --rna-strandness RF. HISAT2 indexes were built using the --snp parameter with dbSNP build 150.
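The alignment options above map directly onto a HISAT2 argument list. In the sketch below, the index prefix, file names, and output path are hypothetical, and ${SPLICEFILE} is filled with an illustrative path. The --rna-strandness RF setting matches the dUTP library layout, in which read 1 is antisense to the transcript. The index itself would have been built with hisat2-build and its --snp option, which expects a HISAT2-format SNP file derived from dbSNP. That conversion step is not described in the record and is not shown.

```python
def hisat2_args(index_prefix, read1, read2, splice_file, sam_out):
    """Build a HISAT2 paired-end command with the record's non-default options."""
    return [
        "hisat2",
        "-x", index_prefix,                      # SNP-aware GRCh38 index (assumed name)
        "-1", read1, "-2", read2,
        "--no-unal",                             # omit unaligned reads from the SAM
        "--non-deterministic",
        "--novel-splicesite-outfile", splice_file,
        "--rna-strandness", "RF",                # dUTP: read 1 antisense, read 2 sense
        "-S", sam_out,
    ]


args = hisat2_args("grch38_snp", "S001253_filtered_R1.fastq.gz",
                   "S001253_filtered_R2.fastq.gz", "S001253.novel_splicesites.txt",
                   "S001253.sam")
```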
Gene expression data in FPKM (fragments per kilobase of transcript per million mapped reads) were generated using StringTie v1.3.3b (default parameters, including --rf -e) with protein-coding transcripts from GENCODE release 27 as the transcriptome model. Novel transcripts were discarded. An FPKM gene expression matrix was generated from the .ctab files using tximport. The values were then converted to TPM, offset by adding 0.1 to each expression value, and log2-transformed.
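Within one sample, FPKM and TPM differ only by a per-sample scale factor: TPM_i = FPKM_i / sum_j FPKM_j * 10^6. The conversion can therefore be done column by column on the tximport matrix. Below is a minimal numpy sketch of the conversion and the log2(TPM + 0.1) transform, using a made-up two-gene, two-sample matrix.

```python
import numpy as np


def fpkm_to_tpm(fpkm):
    """Rescale each sample (column) of a genes x samples FPKM matrix to sum to 1e6."""
    fpkm = np.asarray(fpkm, dtype=float)
    return fpkm / fpkm.sum(axis=0, keepdims=True) * 1e6


def log2_tpm(fpkm, offset=0.1):
    """Convert FPKM to TPM, add the 0.1 offset, then take log2."""
    return np.log2(fpkm_to_tpm(fpkm) + offset)


# Toy matrix: rows are genes, columns are samples (values are made up).
fpkm = np.array([[1.0, 2.0],
                 [3.0, 2.0]])
tpm = fpkm_to_tpm(fpkm)      # [[250000, 500000], [750000, 500000]]
expr = log2_tpm(fpkm)
```

The 0.1 offset keeps zero-expression genes finite: a TPM of 0 maps to log2(0.1), about -3.32.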
PAM50 subtyping was performed using an implementation of the Parker method (Parker et al., J Clin Oncol 2009). In short, to avoid context dependency when assigning PAM50 subtypes by nearest centroid, a fixed reference set was selected to match the original cohort used by Parker et al. with respect to available clinical characteristics. Before subtyping, the expression of the PAM50 genes in each tumor was centered against the reference set, one tumor at a time, using custom R scripts.
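The nearest-centroid step can be sketched as follows. This sketch assumes that each centered tumor profile is compared with each subtype centroid by Spearman correlation, the similarity measure commonly used for PAM50, and is assigned to the best-matching subtype. It reads "centered to the reference set" as subtracting per-gene medians computed over the reference samples plus the one tumor being classified. The record does not give the exact centering rule. The four-gene centroids and the reference below are toy values, not the published PAM50 centroids.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks with tied values sharing their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def center_to_reference(tumor, reference):
    """Subtract per-gene medians over the reference samples plus this tumor."""
    tumor = np.asarray(tumor, dtype=float)
    combined = np.vstack([reference, tumor])      # samples x genes
    return tumor - np.median(combined, axis=0)


def nearest_centroid(profile, centroids):
    """Return the best-correlated subtype and all correlation scores."""
    scores = {name: spearman(profile, c) for name, c in centroids.items()}
    return max(scores, key=scores.get), scores


# Toy example with four genes and two subtypes (values are made up).
centroids = {"LumA": [1.0, 1.0, -1.0, -1.0], "Basal": [-1.0, -1.0, 1.0, 1.0]}
reference = np.zeros((3, 4))
tumor = [2.0, 1.0, -1.0, -2.0]
subtype, scores = nearest_centroid(center_to_reference(tumor, reference), centroids)
```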
Spearman rank correlation was used to assess the correlation between ESR1 and ESR2 expression. The Kruskal-Wallis test and the Wilcoxon rank-sum test were used to compare and plot ESR1 and ESR2 expression across clinical groups in the SCAN-B cohort, such as PAM50 subtype and age group. The transformed ESR2 expression data were divided into tertiles, with the top tertile defined as ESR2-high and the bottom two tertiles as ESR2-low. The Mann-Whitney U test and Fisher's exact test were used to evaluate differences in clinicopathological variables between the ESR2-high and ESR2-low groups. Survival analysis was performed using Kaplan-Meier and Cox regression analyses. All analyses were performed on a cohort of 3207 samples from patients enrolled in the SCAN-B study.
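Two steps of this analysis are easy to make concrete:

- The ESR2 split divides values at the upper tertile boundary, the 2/3 quantile. Here, ties at the cutoff fall into ESR2-low, a choice the record does not specify.
- A plain Kaplan-Meier product-limit estimate is computed from follow-up times and event indicators, such as the relapse free interval fields above.

The sketch is illustrative only and is not the study's own code. The toy data are made up.

```python
import numpy as np


def esr2_groups(values):
    """Label the top tertile 'ESR2-high' and the bottom two tertiles 'ESR2-low'."""
    values = np.asarray(values, dtype=float)
    cutoff = np.quantile(values, 2 / 3)
    return np.where(values > cutoff, "ESR2-high", "ESR2-low")


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 if the event occurred, 0 if censored.
    Returns (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed                          # events and censored leave the risk set
    return curve


groups = esr2_groups([1, 2, 3, 4, 5, 6, 7, 8, 9])   # 7, 8, 9 -> ESR2-high
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])     # [(1, 0.75), (3, 0.375), (4, 0.0)]
```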
Assembly: Human genome reference GRCh38/hg38.
Supplementary files format and content: Gene expression in TPM in CSV format.
 
Submission date May 04, 2022
Last update date May 09, 2022
Contact name Lao H Saal
E-mail(s) lao.saal@med.lu.se
Organization name Lund University
Department Department of Oncology and Pathology
Lab Translational Oncogenomics Unit
Street address Scheelevägen 2, MV404B2
City Lund
ZIP/Postal code 22391
Country Sweden
 
Platform ID GPL11154
Series (1)
GSE202203 Clinical associations of ESR2 (estrogen receptor beta; ERβ) expression across thousands of primary breast tumors 
Relations
Reanalysis of GSM2528162

Supplementary data files not provided
Processed data are available on Series record
Raw data not provided for this record
