Sample GSM6643751
Status Public on Dec 06, 2022
Title whole blood, subject 5386f49e, sample time point T1, library prep Plate_11
Sample type SRA
 
Source name whole blood
Organism Homo sapiens
Characteristics tissue: whole blood
subject id: Subj_5386f49e
blood sample_id: Subj_5386f49eT1
library prep_plate: Plate_11
Sex: Female
sampling time_point_label: T1
days since_first_sample: 0
patient classification_at_first_sample: SARS-CoV-2_Positive_Ab_Positive
covid-19 positive: Positive
Extracted molecule total RNA
Extraction protocol Frozen blood samples were thawed and total RNA was extracted from the samples using a modification of the MagMax protocol for Stabilized Blood Tubes RNA Isolation Kit (Thermo Fisher, #4451893).
Samples that yielded sufficient RNA (>50 ng) were barcoded and prepared for pooled whole transcriptome sequencing using the TruSeq Stranded Total RNA Library Prep Gold (Illumina, #20020599), which is designed to remove ribosomal, globin, and mitochondrial RNA. Libraries were amplified with 15 cycles of PCR, pooled, and sequenced on a NovaSeq 6000 (Illumina) using Sprime flow cells with 100 base pair paired-end reads, targeting a mean of 50 million read pairs per sample. For a minority of samples in which the first extraction failed (N=24), RNA was re-extracted from the supernatant saved from the first centrifugation pellet. The extraction protocol was repeated starting with the second wash step after re-pelleting the RNA.
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina NovaSeq 6000
 
Description See https://www.synapse.org/#!Synapse:syn35874390/ for full clinical data, RNA-seq QC data, and other data and metadata.
Data processing Quality-filtered raw data was converted into FASTQ files using bcl2fastq (Illumina).
RNA-seq reads were aligned to the GRCh38 primary assembly with Gencode gene annotation v30 by STAR (v2.7.3a) using per-sample 2-pass mapping (--twopassMode Basic) and chimeric alignment options (--chimOutType Junctions SeparateSAMold --chimSegmentMin 15 --chimJunctionOverhangMin 15).
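The alignment options above can be collected into a single argument list. The following is a minimal sketch, not the submitters' actual invocation: the index directory, FASTQ names, and output prefix are hypothetical placeholders, and only the two-pass and chimeric options come from this record. STAR's long options take two leading dashes.

```python
from typing import List


def star_command(genome_dir: str, fastq_r1: str, fastq_r2: str,
                 out_prefix: str) -> List[str]:
    """Build a STAR argument list using the options named in this record.

    The paths passed in are hypothetical; the record does not give them.
    """
    return [
        "STAR",
        "--genomeDir", genome_dir,            # index built from GRCh38 + Gencode v30
        "--readFilesIn", fastq_r1, fastq_r2,  # paired-end reads
        "--outFileNamePrefix", out_prefix,
        "--twopassMode", "Basic",             # per-sample 2-pass mapping
        "--chimOutType", "Junctions", "SeparateSAMold",
        "--chimSegmentMin", "15",
        "--chimJunctionOverhangMin", "15",
    ]


# Hypothetical file names, for illustration only.
cmd = star_command("star_index_GRCh38_gencode_v30",
                   "sample_R1.fastq", "sample_R2.fastq", "sample.")
print(" ".join(cmd))
```

Building the command as a list rather than one string keeps each option and value separate, so it could be passed to `subprocess.run` without shell quoting.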
RNA-seq QC metrics were calculated by fastqc (v0.11.8) and Picard Tools (v2.22.3).
Quantification was done at the gene level with antisense strand specificity using featureCounts (Subread R package v1.6.3, strandedness option -s 2) with gene-level grouping, primary alignments only, and counting of all overlapping features (-t exon -g gene_id --primary -O).
MultiQC was used to compile and summarize per-sample statistics from STAR, Picard Tools and featureCounts (e.g. gene-level counts, mtRNA counts, globinRNA counts, etc.) into an interactive HTML report.
We excluded samples with DV200 below 80% as well as samples with fewer than 10 million mapped reads counted by featureCounts.
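The two sample-level exclusion rules can be written as a single predicate. This is an illustrative sketch only: the sample names and QC values are invented, and the function name `passes_qc` is hypothetical.

```python
def passes_qc(dv200_percent: float, counted_reads: int) -> bool:
    """Keep a sample only if DV200 is at least 80% and at least
    10 million mapped reads were counted by featureCounts."""
    return dv200_percent >= 80.0 and counted_reads >= 10_000_000


# Invented example values: (DV200 in percent, reads counted by featureCounts).
samples = {
    "sample_A": (92.0, 48_000_000),
    "sample_B": (71.5, 52_000_000),  # fails the DV200 threshold
    "sample_C": (88.0, 6_500_000),   # fails the read-count threshold
}

kept = [name for name, (dv200, reads) in samples.items()
        if passes_qc(dv200, reads)]
print(kept)  # ['sample_A']
```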
To remove the unwanted signal due to globin gene expression in whole blood, counts for all annotated globin genes (gene symbols CYGB, HBA1, HBA2, HBB, HBD, HBE1, HBG1, HBG2, HBM, HBQ1, HBZ, and MB) were discarded, and the remaining count matrix was transformed to counts per million (CPM).
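As a rough illustration of this step for one sample, the sketch below discards the listed globin genes and rescales the remaining counts to CPM. It assumes the library size is the sum of the counts that remain after globin removal, which is one reading of the description; the toy counts and the function name are invented.

```python
GLOBIN_GENES = {
    "CYGB", "HBA1", "HBA2", "HBB", "HBD", "HBE1",
    "HBG1", "HBG2", "HBM", "HBQ1", "HBZ", "MB",
}


def drop_globin_and_cpm(counts: dict) -> dict:
    """Remove globin genes from one sample's raw counts, then convert
    the remaining counts to counts per million (CPM).

    Assumption: the library size is taken after globin removal.
    """
    kept = {gene: n for gene, n in counts.items() if gene not in GLOBIN_GENES}
    library_size = sum(kept.values())
    if library_size == 0:
        raise ValueError("no non-globin counts in this sample")
    return {gene: n * 1_000_000 / library_size for gene, n in kept.items()}


# Invented counts for a single sample.
toy_counts = {"HBB": 900_000, "HBA1": 50_000,
              "ACTB": 30_000, "GAPDH": 15_000, "IFI27": 5_000}
cpm = drop_globin_and_cpm(toy_counts)
print(cpm)  # {'ACTB': 600000.0, 'GAPDH': 300000.0, 'IFI27': 100000.0}
```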
Genes with CPM ≥1 in ≥36 samples (half the number of subjects with no positive PCR or antibody test for SARS-CoV-2 during the study period) were included in our analyses (21,194 genes).
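The expression filter amounts to counting, for each gene, the samples at or above the CPM threshold. A minimal sketch with invented gene names and values follows; the thresholds default to those stated in this record.

```python
def expressed_genes(cpm_by_gene: dict, min_cpm: float = 1.0,
                    min_samples: int = 36) -> list:
    """Return genes whose CPM is >= min_cpm in at least min_samples samples."""
    return [
        gene for gene, values in cpm_by_gene.items()
        if sum(1 for v in values if v >= min_cpm) >= min_samples
    ]


# Invented CPM vectors across 50 samples.
toy_cpm = {
    "GENE_A": [2.0] * 40 + [0.0] * 10,  # 40 samples pass: kept
    "GENE_B": [1.0] * 36 + [0.5] * 14,  # exactly 36 samples pass: kept
    "GENE_C": [5.0] * 35 + [0.9] * 15,  # only 35 samples pass: dropped
}
print(expressed_genes(toy_cpm))  # ['GENE_A', 'GENE_B']
```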
Gene expression was normalized for composition bias using the trimmed mean of M-values method, implemented by calcNormFactors in the edgeR package, and transformed to normalized log2 CPM with observation weights computed by voomWithDreamWeights from the variancePartition package.
Batch effects were residualized from the data by fitting a linear mixed model to the normalized log2 CPM and weights for each gene with random effects for library prep plate (the batch effect to be residualized out) and blood sample ID (the biological signal to be retained) using dream, and then subtracting the best linear unbiased predictors for library prep plate while retaining the differences between blood samples and the residuals.
The technical replicates for each batch control sample were summarized to a single residualized expression value equal to the weighted mean of the technical replicates with a weight equal to the sum of the individual weights of the technical replicates.
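The replicate summary described above reduces, for one gene, to a weighted mean plus a summed weight. The sketch below shows that arithmetic with invented values; the function name is hypothetical.

```python
def summarize_replicates(values: list, weights: list) -> tuple:
    """Collapse technical replicates of one gene into a single value.

    Returns (weighted mean of the values, sum of the replicate weights).
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    weighted_mean = sum(v * w for v, w in zip(values, weights)) / total_weight
    return weighted_mean, total_weight


# Invented residualized log2 CPM values and voom weights for three replicates.
value, weight = summarize_replicates([5.0, 6.0, 7.0], [1.0, 2.0, 1.0])
print(value, weight)  # 6.0 4.0
```

Here the summary value is (5·1 + 6·2 + 7·1) / 4 = 6.0, and the combined weight 4.0 is what would be carried forward for the summarized sample.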
Assembly: GRCh38 primary genome assembly with Gencode v30 gene annotation (corresponds to Ensembl version 96)
Supplementary files format and content: rnaseq_raw_count_matrix.csv: comma-delimited raw RNA-seq read counts
Supplementary files format and content: rnaseq_logCPM_matrix.csv: comma-delimited filtered, normalized, batch-corrected log2 counts per million computed by voomWithDreamWeights with values for technical replicates summarized to a single weighted mean value per blood sample
Supplementary files format and content: rnaseq_voom_weight_matrix.csv: comma-delimited voom weight matrix corresponding to normalized batch-corrected log2 CPM values
 
Submission date Oct 16, 2022
Last update date Dec 06, 2022
Contact name Ryan Conrad Thompson
E-mail(s) rct@thompsonclan.org
Organization name Icahn School of Medicine at Mount Sinai
Department Charles Bronfman Institute for Personalized Medicine
Lab Beckmann Lab
Street address 1 Gustave L. Levy Place
City New York
State/province NY
ZIP/Postal code 10029-5674
Country USA
 
Platform ID GPL24676
Series (1)
GSE215865 Molecular states during acute COVID-19 reveal distinct etiologies of long-term sequelae
Relations
BioSample SAMN31307713
SRA SRX17908726

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
