Status |
Public on Dec 06, 2022 |
Title |
whole blood, subject 2a142bf4, sample time point T2, library prep Plate_19 |
Sample type |
SRA |
|
|
Source name |
whole blood
|
Organism |
Homo sapiens |
Characteristics |
tissue: whole blood
subject id: Subj_2a142bf4
blood sample_id: Subj_2a142bf4T2
library prep_plate: Plate_19
Sex: Female
sampling time_point_label: T2
days since_first_sample: 2
patient classification_at_first_sample: SARS-CoV-2_Positive_Ab_Positive
covid-19 positive: Positive
|
Extracted molecule |
total RNA |
Extraction protocol |
Frozen blood samples were thawed and total RNA was extracted from the samples using a modification of the MagMax protocol for Stabilized Blood Tubes RNA Isolation Kit (Thermo Fisher, #4451893). Samples that yielded sufficient RNA (>50 ng) were barcoded and prepared for pooled whole transcriptome sequencing using the TruSeq Stranded Total RNA Library Prep Gold (Illumina, #20020599), which is designed to remove ribosomal, globin, and mitochondrial RNA. Libraries were amplified with 15 cycles of PCR, pooled, and sequenced on a NovaSeq 6000 (Illumina) using Sprime flow cells with 100 base pair paired-end reads, targeting a mean of 50 million read pairs per sample. For a minority of samples in which the first extraction failed (N=24), RNA was re-extracted from the supernatant saved from the first centrifugation pellet. The extraction protocol was repeated starting with the second wash step after re-pelleting the RNA.
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina NovaSeq 6000 |
|
|
Description |
See https://www.synapse.org/#!Synapse:syn35874390/ for full clinical data, RNA-seq QC data, and other data and metadata.
|
Data processing |
Quality-filtered raw data were converted into FASTQ files using bcl2fastq (Illumina).
RNA-seq reads were aligned to the GRCh38 primary assembly with Gencode gene annotation v30 using STAR (v2.7.3a) with per-sample 2-pass mapping (--twopassMode Basic) and chimeric alignment options (--chimOutType Junctions SeparateSAMold --chimSegmentMin 15 --chimJunctionOverhangMin 15).
RNA-seq QC metrics were calculated with FastQC (v0.11.8) and Picard Tools (v2.22.3).
Quantification was done at the gene level with antisense strand specificity using featureCounts (Subread R package v1.6.3, strandedness option -s 2), grouping exons by gene, counting primary alignments only, and counting reads for all overlapping features (-t exon -g gene_id --primary -O).
MultiQC was used to compile and summarize per-sample statistics from STAR, Picard Tools, and featureCounts (e.g. gene-level counts, mtRNA counts, and globin RNA counts) into an interactive HTML report.
We excluded samples with DV200 below 80% as well as samples with fewer than 10 million mapped reads counted by featureCounts.
To remove the unwanted signal due to globin gene expression in whole blood, counts for all annotated globin genes (gene symbols CYGB, HBA1, HBA2, HBB, HBD, HBE1, HBG1, HBG2, HBM, HBQ1, HBZ, and MB) were discarded, and the remaining count matrix was transformed to counts per million (CPM).
Genes with CPM ≥1 in ≥36 samples (half the number of subjects with no positive PCR or antibody test for SARS-CoV-2 during the study period) were included in our analyses (21,194 genes).
Gene expression was normalized for composition bias using the trimmed mean of M-values method, implemented by calcNormFactors in the edgeR package, and transformed to normalized log2 CPM with observation weights computed by voomWithDreamWeights from the variancePartition package.
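The globin removal, CPM transform, and expression filter described above can be illustrated with a minimal Python sketch. This is not the submitters' code (the record points to R packages for the downstream steps); the function name `filter_expressed_genes` and its parameters are hypothetical, and it assumes a genes-by-samples count matrix in which every sample has a nonzero total after globin removal.

```python
import numpy as np

# Globin gene symbols exactly as listed in the record.
GLOBIN_GENES = {"CYGB", "HBA1", "HBA2", "HBB", "HBD", "HBE1",
                "HBG1", "HBG2", "HBM", "HBQ1", "HBZ", "MB"}


def filter_expressed_genes(counts, gene_symbols, min_cpm=1.0, min_samples=36):
    """Drop globin genes, convert to CPM, and keep sufficiently expressed genes.

    counts: (genes x samples) array of raw read counts.
    gene_symbols: one symbol per row of `counts`.
    Returns (kept_symbols, cpm) restricted to genes with
    CPM >= min_cpm in at least min_samples samples.
    """
    counts = np.asarray(counts, dtype=float)
    symbols = np.asarray(gene_symbols)

    # Discard globin rows before computing library sizes.
    not_globin = np.array([s not in GLOBIN_GENES for s in symbols])
    counts = counts[not_globin]
    symbols = symbols[not_globin]

    # Library size per sample is the column total of the remaining counts.
    lib_sizes = counts.sum(axis=0)
    cpm = counts * 1e6 / lib_sizes

    # Keep genes that reach the CPM threshold in enough samples.
    keep = (cpm >= min_cpm).sum(axis=1) >= min_samples
    return symbols[keep], cpm[keep]
```

The default `min_samples=36` mirrors the threshold stated in the record; for a toy matrix with only a few samples, pass a smaller value.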
Batch effects were residualized from the data by fitting a linear mixed model to the normalized log2 CPM and weights for each gene with random effects for library prep plate (the batch effect to be residualized out) and blood sample ID (the biological signal to be retained) using dream, and then subtracting the best linear unbiased predictors for library prep plate while retaining the differences between blood samples and the residuals.
The technical replicates for each batch control sample were summarized to a single residualized expression value equal to the weighted mean of the technical replicates with a weight equal to the sum of the individual weights of the technical replicates.
Assembly: GRCh38 primary genome assembly with Gencode v30 gene annotation (corresponds to Ensembl version 96)
Supplementary files format and content: rnaseq_raw_count_matrix.csv: comma-delimited raw RNA-seq read counts
Supplementary files format and content: rnaseq_logCPM_matrix.csv: comma-delimited filtered, normalized, batch-corrected log2 counts per million computed by voomWithDreamWeights with values for technical replicates summarized to a single weighted mean value per blood sample
Supplementary files format and content: rnaseq_voom_weight_matrix.csv: comma-delimited voom weight matrix corresponding to normalized batch-corrected log2 CPM values
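The technical-replicate summarization described under Data processing (a weighted mean per gene, carrying a weight equal to the sum of the replicate weights) can be sketched as follows. This is an illustrative reading of that sentence, not the submitters' implementation; the function name `summarize_replicates` is hypothetical, and the inputs are assumed to be genes-by-replicates arrays of residualized log2 CPM values and their voom weights for one blood sample.

```python
import numpy as np


def summarize_replicates(values, weights):
    """Collapse technical replicates of one sample to a single value per gene.

    values:  (genes x replicates) residualized expression values.
    weights: (genes x replicates) positive observation weights.
    Returns (weighted_mean, summed_weight), each of length `genes`.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # The summarized weight is the sum of the individual replicate weights.
    summed_weight = weights.sum(axis=1)

    # Weighted mean: sum(w * x) / sum(w), computed per gene.
    weighted_mean = (values * weights).sum(axis=1) / summed_weight
    return weighted_mean, summed_weight
```

For example, a gene with replicate values 1.0 and 3.0 and weights 1.0 and 3.0 is summarized to the value 2.5 with weight 4.0.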
|
|
|
Submission date |
Oct 16, 2022 |
Last update date |
Dec 06, 2022 |
Contact name |
Ryan Conrad Thompson |
E-mail(s) |
rct@thompsonclan.org
|
Organization name |
Icahn School of Medicine at Mount Sinai
|
Department |
Charles Bronfman Institute for Personalized Medicine
|
Lab |
Beckmann Lab
|
Street address |
1 Gustave L. Levy Place
|
City |
New York |
State/province |
NY |
ZIP/Postal code |
10029-5674 |
Country |
USA |
|
|
Platform ID |
GPL24676 |
Series (1) |
GSE215865 |
Molecular states during acute COVID-19 reveal distinct etiologies of long-term sequelae |
|
Relations |
BioSample |
SAMN31307394 |
SRA |
SRX17908689 |
Supplementary data files not provided |
Raw data are available in SRA |
Processed data are available on Series record |