GEO Accession viewer

Sample GSM6643751

Status

Public on Dec 06, 2022

Title

whole blood, subject 5386f49e, sample time point T1, library prep Plate_11

Sample type

SRA

Source name

whole blood

Organism

Homo sapiens

Characteristics

tissue: whole blood
subject id: Subj_5386f49e
blood sample_id: Subj_5386f49eT1
library prep_plate: Plate_11
Sex: Female
sampling time_point_label: T1
days since_first_sample: 0
patient classification_at_first_sample: SARS-CoV-2_Positive_Ab_Positive
covid-19 positive: Positive

Extracted molecule

total RNA

Extraction protocol

Frozen blood samples were thawed and total RNA was extracted from the samples using a modification of the MagMax protocol for Stabilized Blood Tubes RNA Isolation Kit (Thermo Fisher, #4451893).
Samples that yielded sufficient RNA (>50 ng) were barcoded and prepared for pooled whole transcriptome sequencing using the TruSeq Stranded Total RNA Library Prep Gold (Illumina, #20020599), which is designed to remove ribosomal, globin, and mitochondrial RNA. Libraries were amplified with 15 cycles of PCR, pooled, and sequenced on a NovaSeq 6000 (Illumina) using Sprime flow cells with 100 base pair paired-end reads, targeting a mean of 50 million read pairs per sample. For a minority of samples in which the first extraction failed (N=24), RNA was re-extracted from the supernatant saved from the first centrifugation pellet. The extraction protocol was repeated starting with the second wash step after re-pelleting the RNA.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina NovaSeq 6000

Description

See https://www.synapse.org/#!Synapse:syn35874390/ for full clinical data, RNA-seq QC data, and other data and metadata.

Data processing

Quality-filtered raw data was converted into FASTQ files using bcl2fastq (Illumina).
RNA-seq reads were aligned to the GRCh38 primary assembly with Gencode gene annotation v30 by STAR (v2.7.3a) using per-sample 2-pass mapping (--twopassMode Basic) and chimeric alignment options (--chimOutType Junctions SeparateSAMold --chimSegmentMin 15 --chimJunctionOverhangMin 15).
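As an illustration of the alignment step, the following Python sketch assembles a STAR argument list carrying the options named above. STAR long options take two leading hyphens. The function name and all file paths are hypothetical placeholders, --genomeDir, --readFilesIn, and --outFileNamePrefix are standard STAR options that this record does not mention, and the submitters' real invocation may have included further options.

```python
import shlex


def star_command(genome_dir, fastq_r1, fastq_r2, out_prefix):
    """Build a STAR argument list (hypothetical helper, not the submitters' script)."""
    return [
        "STAR",
        "--genomeDir", genome_dir,            # assumed: pre-built GRCh38 / Gencode v30 index
        "--readFilesIn", fastq_r1, fastq_r2,  # paired-end FASTQ files
        "--outFileNamePrefix", out_prefix,
        "--twopassMode", "Basic",             # per-sample 2-pass mapping
        "--chimOutType", "Junctions", "SeparateSAMold",
        "--chimSegmentMin", "15",
        "--chimJunctionOverhangMin", "15",
    ]


# Placeholder paths, for illustration only.
cmd = star_command("star_index", "sample_R1.fastq", "sample_R2.fastq", "out/sample_")
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run without shell quoting.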
RNA-seq QC metrics were calculated by fastqc (v0.11.8) and Picard Tools (v2.22.3).
Quantification was done at the gene level with antisense strand specificity using featureCounts (Subread R package v1.6.3, strandedness option -s 2), with gene-level grouping, primary alignments only, and counting of all overlapping features (-t exon -g gene_id --primary -O).
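The quantification options can be collected in the same way. This is a minimal Python sketch that builds a featureCounts argument list from the options listed above. The function name and file paths are hypothetical, -a (annotation file) and -o (output file) are standard featureCounts options not named in this record, and any paired-end options the submitters used are not stated here and are therefore left out.

```python
def featurecounts_command(annotation_gtf, bam_path, out_path):
    """Build a featureCounts argument list (hypothetical helper)."""
    return [
        "featureCounts",
        "-s", "2",        # reversely stranded (antisense) counting
        "-t", "exon",     # count reads overlapping exon features
        "-g", "gene_id",  # group exon-level counts by gene
        "--primary",      # primary alignments only
        "-O",             # count reads overlapping more than one feature
        "-a", annotation_gtf,
        "-o", out_path,
        bam_path,
    ]


# Placeholder file names, for illustration only.
fc_cmd = featurecounts_command("gencode.v30.gtf", "sample.bam", "sample_counts.txt")
print(" ".join(fc_cmd))
```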
MultiQC was used to compile and summarize per-sample statistics from STAR, Picard Tools and featureCounts (e.g. gene-level counts, mtRNA counts, and globin RNA counts) into an interactive HTML report.
We excluded samples with DV200 below 80% as well as samples with fewer than 10 million mapped reads counted by featureCounts.
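The two exclusion criteria amount to a simple per-sample filter. The sketch below assumes a pandas QC table with one row per sample. The column names dv200_percent and counted_reads are hypothetical and do not come from the deposited files.

```python
import pandas as pd


def passing_samples(qc, min_dv200=80.0, min_counted_reads=10_000_000):
    """Return IDs of samples meeting both thresholds (column names are assumed)."""
    keep = (qc["dv200_percent"] >= min_dv200) & (qc["counted_reads"] >= min_counted_reads)
    return qc.index[keep].tolist()


# Toy QC table with made-up values, for illustration only.
toy_qc = pd.DataFrame(
    {"dv200_percent": [92.0, 75.0, 88.0],
     "counted_reads": [25_000_000, 30_000_000, 8_000_000]},
    index=["sample_a", "sample_b", "sample_c"],
)
print(passing_samples(toy_qc))
```

In the toy table, only sample_a passes both thresholds.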
To remove the unwanted signal due to globin gene expression in whole blood, counts for all annotated globin genes (gene symbols CYGB, HBA1, HBA2, HBB, HBD, HBE1, HBG1, HBG2, HBM, HBQ1, HBZ, and MB) were discarded, and the remaining count matrix was transformed to counts per million (CPM).
Genes with CPM ≥1 in ≥36 samples (half the number of subjects with no positive PCR or antibody test for SARS-CoV-2 during the study period) were included in our analyses (21,194 genes).
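The globin removal, CPM transformation, and expression filter can be sketched in a few lines of Python. This is an illustrative sketch, not the submitters' code. It assumes a pandas count matrix with gene symbols as the row index and samples as columns, and it computes library sizes after the globin genes are discarded, following the order described above.

```python
import pandas as pd

# Globin gene symbols as listed in this record.
GLOBIN_GENES = ["CYGB", "HBA1", "HBA2", "HBB", "HBD", "HBE1",
                "HBG1", "HBG2", "HBM", "HBQ1", "HBZ", "MB"]


def drop_globin_and_cpm(counts):
    """Discard globin genes, then scale each sample to counts per million."""
    kept = counts.loc[~counts.index.isin(GLOBIN_GENES)]
    lib_sizes = kept.sum(axis=0)  # per-sample totals after globin removal
    return kept.div(lib_sizes, axis=1) * 1e6


def filter_expressed(cpm, min_cpm=1.0, min_samples=36):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    n_pass = (cpm >= min_cpm).sum(axis=1)
    return cpm.loc[n_pass >= min_samples]


# Toy matrix with made-up counts; min_samples is lowered to fit two samples.
toy_counts = pd.DataFrame(
    {"s1": [900, 50, 50, 0], "s2": [800, 100, 100, 0]},
    index=["HBB", "GENE_A", "GENE_B", "GENE_C"],
)
toy_cpm = drop_globin_and_cpm(toy_counts)
toy_kept = filter_expressed(toy_cpm, min_samples=2)
print(toy_kept.index.tolist())
```

In the toy example, HBB is dropped as a globin gene and GENE_C fails the expression filter, leaving GENE_A and GENE_B.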
Gene expression was normalized for composition bias using the trimmed mean of M-values method, implemented by calcNormFactors in the edgeR package, and transformed to normalized log2 CPM with observation weights computed by voomWithDreamWeights from the variancePartition package.
Batch effects were residualized from the data by fitting a linear mixed model to the normalized log2 CPM and weights for each gene with random effects for library prep plate (the batch effect to be residualized out) and blood sample ID (the biological signal to be retained) using dream, and then subtracting the best linear unbiased predictors for library prep plate while retaining the differences between blood samples and the residuals.
The technical replicates for each batch control sample were summarized to a single residualized expression value equal to the weighted mean of the technical replicates with a weight equal to the sum of the individual weights of the technical replicates.
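The replicate summary described above is a weighted mean whose combined weight is the sum of the replicate weights. A minimal NumPy sketch for one gene in one batch control sample follows. The function name is hypothetical, and the inputs stand in for the residualized log2 CPM values and voom weights of the replicates.

```python
import numpy as np


def summarize_replicates(values, weights):
    """Collapse technical replicates to (weighted mean, summed weight)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total_weight = weights.sum()
    weighted_mean = (values * weights).sum() / total_weight
    return weighted_mean, total_weight


# Two made-up replicates: values 2.0 and 4.0 with weights 1.0 and 3.0.
mean_value, combined_weight = summarize_replicates([2.0, 4.0], [1.0, 3.0])
print(mean_value, combined_weight)  # 3.5 4.0
```

Replicates with larger weights pull the summary toward their values, and the summed weight lets the collapsed observation carry the combined precision into downstream models.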
Assembly: GRCh38 primary genome assembly with Gencode v30 gene annotation (corresponds to Ensembl version 96)
Supplementary files format and content: rnaseq_raw_count_matrix.csv: comma-delimited raw RNA-seq read counts
Supplementary files format and content: rnaseq_logCPM_matrix.csv: comma-delimited filtered, normalized, batch-corrected log2 counts per million computed by voomWithDreamWeights with values for technical replicates summarized to a single weighted mean value per blood sample
Supplementary files format and content: rnaseq_voom_weight_matrix.csv: comma-delimited voom weight matrix corresponding to normalized batch-corrected log2 CPM values

Submission date

Oct 16, 2022

Last update date

Dec 06, 2022

Contact name

Ryan Conrad Thompson

E-mail(s)

rct@thompsonclan.org

Organization name

Icahn School of Medicine at Mount Sinai

Department

Charles Bronfman Institute for Personalized Medicine