Status |
Public on Mar 12, 2018 |
Title |
F1 |
Sample type |
SRA |
|
|
Source name |
Primary breast tumor
|
Organism |
Homo sapiens |
Characteristics |
scan-b external id: Q008818.C008840.S000215.l.r.m2.c.lib.g.k.a.t
instrument model: HiSeq 2000
age at diagnosis: 43
tumor size: 9
lymph node group: NodeNegative
lymph node status: NodeNegative
er status: NA
pgr status: NA
her2 status: 0
ki67 status: NA
nhg: G3
er prediction mgc: 0
pgr prediction mgc: 0
her2 prediction mgc: 0
ki67 prediction mgc: 1
nhg prediction mgc: G3
er prediction sgc: 0
pgr prediction sgc: 0
her2 prediction sgc: 0
ki67 prediction sgc: 1
pam50 subtype: Basal
overall survival days: 2367
overall survival event: 0
endocrine treated: 0
chemo treated: 1
|
Extracted molecule |
polyA RNA |
Extraction protocol |
QIAGEN AllPrep
Poly(A) mRNA is isolated from the total RNA in up to 96-well microtiter plate format by two rounds of purification with Dynabeads Oligo (dT)25 (Invitrogen) using a KingFisher Flex magnetic particle processor (ThermoScientific). Zinc-mediated fragmentation (Ambion) is performed and the fragmented mRNA retrieved using column purification (Zymo-Spin I-96 plates; Zymo).
The sequencing library generation protocol is a modification of the dUTP method, which importantly retains the directionality (stranded-ness) of the sequenced RNA molecules. First strand cDNA synthesis is performed using random hexamers and standard dNTP mix followed by cleanup using Sephadex gel filtration (Illustra AutoScreen-96A plates; GE Healthcare), and second strand cDNA synthesis is performed using dUTP in place of dTTP in the dNTP-mix and cleanup using Zymo-Spin I-96 plates.
The cDNA is end-repaired and A-tailed, and diluted TruSeq adapters with barcodes are ligated using a modified protocol (Illumina). Adapter-ligated cDNA is then size-selected to remove short oligonucleotides using carboxylic acid (CA) paramagnetic beads (Invitrogen) and polyethylene glycol (PEG), similar to the previously described methods, and automated on the KingFisher Flex. The second cDNA strand is digested using uracil-DNA glycosylase and the product is enriched by 12 PCR cycles (Illumina). The PCR product undergoes two cycles of size selection using CA-beads and varying concentrations of PEG, first to exclude DNA fragments >700 bp and then to exclude fragments <200 bp.
Quality control is performed on control libraries using Qubit fluorometric measurement (Life Technologies) and Caliper LabChip XT microcapillary gel electrophoresis.
Typically, 10-24 barcoded libraries are included in a pool and each pool is sequenced in at least one lane across dual flowcells. Paired-end sequencing of 50 bp read-length is performed on an Illumina HiSeq 2000 or NextSeq 500 instrument.
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina HiSeq 2000 |
|
|
Description |
ER/PgR/HER2/Ki67: 0 negative/normal/low, 1 positive/amplified/high
processed data file: gene_expression_3273_samples_and_136_replicates_transformed.csv
transcript_expression_3273_samples_and_136_replicates.csv
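The 0/1 coding above can be turned into readable labels with a small lookup. This is a minimal sketch, not part of the GEO record: the helper name `decode_status` is hypothetical, and the pairing of words to markers (ER and PgR as negative/positive, HER2 as normal/amplified, Ki67 as low/high) is an assumption, since the record lists the words without saying which marker each belongs to.

```python
from typing import Optional

# Assumed pairing of labels to markers (index 0 = code "0", index 1 = code "1").
# The record only lists "negative/normal/low" and "positive/amplified/high".
LABELS = {
    "er": ("negative", "positive"),
    "pgr": ("negative", "positive"),
    "her2": ("normal", "amplified"),
    "ki67": ("low", "high"),
}


def decode_status(marker: str, code: str) -> Optional[str]:
    """Translate a "0"/"1" code into a label; "NA" or any other value gives None."""
    if code not in ("0", "1"):
        return None
    return LABELS[marker.lower()][int(code)]


# Clinical status values copied from this sample's Characteristics field.
sample = {"er": "NA", "pgr": "NA", "her2": "0", "ki67": "NA"}
decoded = {marker: decode_status(marker, code) for marker, code in sample.items()}
print(decoded)
```

Returning `None` for "NA" keeps missing pathology values distinct from a true 0, which matters when comparing the clinical fields with the SGC/MGC predictions.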
|
Data processing |
Base-calling was performed using the manufacturer's on-instrument software.
Demultiplexing was done with Picard versions 1.120 or 1.128. IlluminaBasecallsToFastq parameters used were ADAPTERS_TO_CHECK=INDEXED, ADAPTERS_TO_CHECK=PAIRED_END, INCLUDE_NON_PF_READS=false.
Reads were filtered to remove those that align (using Bowtie 2 with default parameters except -k 1 --phred33 --local) to ribosomal RNA/DNA (GenBank loci NR_023363.1, NR_003285.2, NR_003286.2, NR_003287.2, X12811.1, U13369.1), phiX174 Illumina control (NC_001422.1), and sequences contained in the UCSC RepeatMasker track (downloaded March 14, 2011).
Fragment size distribution (mean and width) for the alignment step was estimated for each sample using Bowtie 2 versions 2.2.3 and 2.2.5. Parameters set during estimation were -fr, -k 1, --phred33, --local, and -u 100000, using human genome assembly GRCh38.
Remaining reads were aligned using TopHat2 2.0.12 or 2.0.13 (default parameters except for --mate-inner-dist X (estimated in previous step), --mate-std-dev Y (estimated in previous step), --library-type fr-firststrand, --no-coverage-search, --max-insertion-length 20, --max-deletion-length 20, --read-gap-length 20, --read-edit-dist 22) to the human genome reference GRCh38 together with 104,133 transcript annotations from the UCSC knownGenes table (downloaded September 22, 2014).
Gene expression data in FPKM were generated using Cufflinks 2.2.1 (default parameters except --GTF, --frag-bias-correct GRCh38.fa, --multi-read-correct, --library-type fr-firststrand, --total-hits-norm, --max-bundle-frags 10000000).
The resulting data were post-processed by collapsing on 30,865 unique gene symbols (sum of FPKM values of each matching transcript), adding 0.1 FPKM to each expression measurement, and performing a log2 transformation.
PAM50 subtyping was performed using an implementation of the Parker method (Parker et al., J Clin Oncol 2009). In short, to avoid context dependency when assigning PAM50 subtype by nearest-centroid, a fixed reference was selected to match the original cohort used by Parker et al. with respect to available clinical characteristics. Before subtyping tumors in this study, gene expression of the PAM50 genes for each tumor was centered to the reference set separately using custom R scripts.
Single-gene classifiers (SGCs) and multi-gene classifiers (MGCs) were trained for ER, PgR, HER2, Ki67 (SGC and MGC) and NHG (MGC only) on the expression of the single underlying gene (ESR1, PGR, ERBB2 or MKI67) or multiple genes (5000 most varying genes across all samples) of a 405 sample cohort using consensus pathology scores as labels. Classifier training was performed by selecting expression thresholds that maximize prediction concordance to the consensus scores (SGCs) and nearest shrunken centroids using the pamr R package (MGCs). The classifiers were used to predict the biomarker status in a validation cohort of 3273 samples from patients enrolled in the SCAN-B study.
Genome_build: Human genome reference GRCh38/hg38.
Supplementary_files_format_and_content: Gene expression in FPKM in CSV format.
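Two of the steps described above are specific enough to sketch in code: the FPKM post-processing (sum per gene symbol, add 0.1, log2) and the threshold search behind a single-gene classifier. The following is an illustrative Python sketch only; the submitters' pipeline used Cufflinks output and custom R scripts. The function names, the toy values, and the use of plain agreement fraction as the concordance measure are assumptions, as the record does not define how concordance was computed.

```python
import math
from collections import defaultdict

PSEUDOCOUNT = 0.1  # FPKM offset stated in the record


def collapse_and_log2(transcript_fpkm, transcript_to_gene):
    """Sum transcript FPKM per gene symbol, add 0.1 FPKM, then take log2."""
    gene_fpkm = defaultdict(float)
    for transcript, fpkm in transcript_fpkm.items():
        gene_fpkm[transcript_to_gene[transcript]] += fpkm
    return {gene: math.log2(total + PSEUDOCOUNT) for gene, total in gene_fpkm.items()}


def best_threshold(expression, labels):
    """Return (cut-off, agreement) for the rule 'expression >= cut-off means positive'.

    Candidate cut-offs are the observed expression values; agreement is the
    fraction of samples whose call matches the 0/1 label (an assumed measure).
    Ties keep the lowest cut-off.
    """
    best_cut, best_agree = None, -1
    for cut in sorted(set(expression)):
        agree = sum((x >= cut) == bool(y) for x, y in zip(expression, labels))
        if agree > best_agree:
            best_cut, best_agree = cut, agree
    return best_cut, best_agree / len(labels)


# Toy data, invented for illustration (transcript IDs are placeholders).
fpkm = {"tx1": 3.9, "tx2": 4.0, "tx3": 0.0}
tx2gene = {"tx1": "ESR1", "tx2": "ESR1", "tx3": "PGR"}
log_expr = collapse_and_log2(fpkm, tx2gene)

cut, agreement = best_threshold([0.5, 1.0, 3.0, 4.0], [0, 0, 1, 1])
print(log_expr, cut, agreement)
```

The 0.1 FPKM offset keeps unexpressed genes finite after the log2 step: a gene with 0 FPKM maps to log2(0.1), roughly -3.32, instead of negative infinity.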
|
|
|
Submission date |
Mar 09, 2017 |
Last update date |
May 04, 2022 |
Contact name |
Lao H Saal |
E-mail(s) |
lao.saal@med.lu.se
|
Organization name |
Lund University
|
Department |
Department of Oncology and Pathology
|
Lab |
Translational Oncogenomics Unit
|
Street address |
Scheelevägen 2, MV404B2
|
City |
Lund |
ZIP/Postal code |
22391 |
Country |
Sweden |
|
|
Platform ID |
GPL11154 |
Series (2) |
GSE81540 |
Clinical Value of RNA Sequencing–Based Classifiers for Prediction of the Five Conventional Breast Cancer Biomarkers: A Report From the Population-Based Multicenter Sweden Cancerome Analysis Network—Breast Initiative [superseries] |
GSE96058 |
Clinical Value of RNA Sequencing–Based Classifiers for Prediction of the Five Conventional Breast Cancer Biomarkers: A Report From the Population-Based Multicenter Sweden Cancerome Analysis Network—Breast Initiative [cohort 3273] |
|
Relations |
Reanalyzed by |
GSM6103185 |
BioSample |
SAMN06556824 |
Supplementary data files not provided |
Raw data provided as supplementary file |
Processed data are available on Series record |