Sample GSM7766775
Status Public on Sep 11, 2023
Title P020
Sample type SRA
Source name 3rd and 4th rosette leaves
Organism Arabidopsis thaliana
Characteristics tissue: 3rd and 4th rosette leaves
genotype: Ws-2
Growth protocol Seeds of the Arabidopsis thaliana ecotype Wassilewskija (Ws-2) were surface sterilized and then grown on MS-agar plates with 1% sucrose. The plates were cold treated at 4C in the dark for 4 days before being moved to 16h light / 8h dark cycles at a constant temperature of 21C. Seedlings were transferred to soil (with 5% sand) after 10 days and the 3rd and 4th true rosette leaves were tracked for future reference. After 10 days in soil, 75 plants were selected from the total of 225 plants grown, with plants chosen to ensure a diversity of bolting statuses. At ZT4 (4h after lights on) the following day, the 3rd and 4th rosette leaves of each individual plant were pooled and flash frozen in liquid nitrogen.
Extracted molecule polyA RNA
Extraction protocol Total RNA was isolated from these samples using the Qiagen RNeasy Plant Mini Kit (Cat no. 74904). Residual genomic DNA was removed using the Invitrogen Turbo DNA-free kit (Cat no. AM1907), according to the manufacturer’s protocol.
Libraries were prepared with the NEBNext Ultra II Directional Library Prep Kit for Illumina (Cat no. E7765), using the NEBNext poly(A) magnetic isolation module (Cat no. E7490). Quality control was performed with the Agilent 2100 Bioanalyzer instrument (Part no. G2939BA). Finally, a total of 70 libraries were pooled and sequenced by Novogene, using one lane on an Illumina NovaSeq system.
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina NovaSeq 6000
Description 4 days of stratification at 4C; 10 days grown on plates (16h light / 8h dark); 10 days grown in soil (same photoperiod); then sampled
Data processing Before analysis of the raw sequencing data, FastQC v0.11.7 (Andrews et al. 2012) was used to assess read quality.
Illumina adapters were trimmed using CutAdapt v3.4 (Martin 2011). (parameters: -a AGATCGGAAGAG -A AGATCGGAAGAG -j 0 -q 10,10 -m 30 -u 10 -U 10)
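As a minimal sketch, the CutAdapt parameters listed above can be assembled into a paired-end command line as follows. The function name `cutadapt_command` and the input/output file names are hypothetical and not part of the submission; `-o` and `-p` are CutAdapt's standard options for the two output files.

```python
def cutadapt_command(r1, r2, out1, out2):
    """Build a paired-end CutAdapt argument list (file names are placeholders)."""
    return [
        "cutadapt",
        "-a", "AGATCGGAAGAG",   # 3' adapter to remove from read 1
        "-A", "AGATCGGAAGAG",   # 3' adapter to remove from read 2
        "-j", "0",              # auto-detect the number of CPU cores
        "-q", "10,10",          # quality-trim both ends at a cutoff of 10
        "-m", "30",             # discard reads shorter than 30 nt after trimming
        "-u", "10",             # remove the first 10 bases of read 1
        "-U", "10",             # remove the first 10 bases of read 2
        "-o", out1,             # trimmed read 1 output
        "-p", out2,             # trimmed read 2 output
        r1, r2,
    ]

# Hypothetical file names, for illustration only.
cmd = cutadapt_command("P020_1.fq.gz", "P020_2.fq.gz",
                       "P020_trimmed_1.fq.gz", "P020_trimmed_2.fq.gz")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a system where CutAdapt is installed.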
Reads were quantified using Salmon v1.6.0 (Patro et al. 2017) and the TAIR10 transcriptome (Berardini et al. 2015). (parameters: -l A --validateMappings)
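Salmon writes its per-transcript estimates to a tab-separated `quant.sf` file with the columns Name, Length, EffectiveLength, TPM and NumReads. The sketch below shows one way to read such a table in Python; the function name `read_quant_sf` and the example rows are illustrative and are not values from this sample.

```python
import csv
import io

def read_quant_sf(handle):
    """Return {transcript_id: TPM} from an open Salmon quant.sf handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}

# Made-up rows in quant.sf layout (not real values from GSM7766775).
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "AT1G01010.1\t1688\t1439.0\t12.5\t230.0\n"
    "AT1G01020.1\t1623\t1374.0\t0.0\t0.0\n"
)
tpm = read_quant_sf(example)
print(tpm)
```

For a real sample, the same function can be given a file handle opened on the `quant.sf` file inside the extracted quantification folder.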
To analyse genetic variation within the population, we followed GATK guidelines for short variant discovery from RNA-seq data.
We used STAR (v2.7.10b; two-pass mode) to align reads to the TAIR10 genome, revision 56 (Dobin et al. 2013). (parameters: --runThreadN 20 --readFilesIn ${file}_1.fq.gz ${file}_2.fq.gz --readFilesCommand "gunzip -c" --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 5000000000 --twopassMode Basic --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 0 --outFilterMismatchNmax 1)
We then pre-processed the aligned reads using the commands ‘MarkDuplicates’ (parameters: gatk --java-options '-Xmx28G' MarkDuplicates -ASO coordinate --VERBOSITY DEBUG --MAX_RECORDS_IN_RAM 5000000) and ‘SplitNCigarReads’ (parameters: gatk --java-options '-Xmx28G' SplitNCigarReads --verbosity DEBUG) from GATK (v4.3.0.0) (Poplin et al. 2018).
We used the command ‘HaplotypeCaller’ (parameters: -ERC GVCF --dont-use-soft-clipped-bases true --standard-min-confidence-threshold-for-calling 20) to produce genomic variant calling format (gVCF) files per sample.
Finally, we combined the gVCF files using ‘CombineGVCFs’ (parameters: gatk --java-options '-Xmx60G' CombineGVCFs) and used ‘GenotypeGVCFs’ (parameters: gatk --java-options "-Xmx60g" GenotypeGVCFs) to identify single-nucleotide variants (SNVs) and insertions / deletions (indels) that were confidently called across the whole population.
For further processing of these initial SNVs and indels, we followed the filtering guidelines suggested by Cruz et al. (2020). Specifically, we selected only biallelic variants with a minimum genotype quality of 40 that were called in at least 80% of all samples, using VCFtools (v0.1.16) (Danecek et al. 2011). (parameters: --minGQ 40 --max-missing 0.8 --out joint_genotyping/all_genotyped_filtered.vcf --recode)
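The per-site logic of this filter can be sketched in plain Python. This is a simplified illustration, not a reimplementation of VCFtools: it assumes that genotypes below the quality threshold are treated as missing before the call rate is computed, and the function name `passes_site_filter` is hypothetical.

```python
def passes_site_filter(n_alt_alleles, genotype_quals, min_gq=40, min_call_rate=0.8):
    """Decide whether one variant site is kept.

    genotype_quals holds one GQ value per sample, or None where the
    genotype is missing.
    """
    if n_alt_alleles != 1:  # keep biallelic sites only
        return False
    # Genotypes below the GQ threshold are treated as missing (assumption).
    called = [gq for gq in genotype_quals if gq is not None and gq >= min_gq]
    return len(called) / len(genotype_quals) >= min_call_rate

# Ten hypothetical samples: eight confident calls, one low-GQ call, one missing.
kept = passes_site_filter(1, [50] * 8 + [30, None])
# Only seven confident calls out of ten falls below the 80% call rate.
dropped = passes_site_filter(1, [50] * 7 + [30, 30, None])
print(kept, dropped)
```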
We then imputed missing genotypes using Beagle (v5.4, 22Jul22, 46e) on default settings (Browning, Zhou, and Browning 2018). (parameters: java -Xmx10240m -jar beagle.22Jul22.46e.jar nthreads=2) Note that, due to processing by Beagle, the variant calling format (VCF) file includes two representations of the same heterozygous genotype (‘0|1’ and ‘1|0’).
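For analyses that do not depend on phase, the two heterozygous codes can be collapsed into a single alternate-allele count. The following is a minimal sketch for biallelic, diploid genotypes; the function name `alt_dosage` is hypothetical.

```python
def alt_dosage(gt):
    """Count ALT alleles in a diploid GT string such as '0|1' or '1/1'."""
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele == "1")

# Both phased heterozygous codes map to the same dosage of 1.
dosages = [alt_dosage(gt) for gt in ["0|0", "0|1", "1|0", "1|1"]]
print(dosages)
```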
Finally, we selected variants with a minor allele frequency (MAF) of at least 0.05, using VCFtools. (vcftools --maf 0.05 --recode)
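As an illustration of the quantity being thresholded, the minor allele frequency of a biallelic site can be computed from per-sample alternate-allele counts. This sketch assumes diploid samples with no missing genotypes (as is the case after imputation); the function name and example values are hypothetical.

```python
def minor_allele_frequency(dosages):
    """dosages: ALT allele counts (0, 1 or 2), one per diploid sample."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)

# Four hypothetical samples: 3 ALT alleles out of 8, so ALT is the minor allele.
maf_a = minor_allele_frequency([0, 0, 1, 2])
# 7 ALT alleles out of 8, so the reference allele is the minor allele.
maf_b = minor_allele_frequency([2, 2, 2, 1])
print(maf_a, maf_b, maf_a >= 0.05)
```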
Assembly: TAIR10
Supplementary files format and content: Salmon quantification folder for each sample, included as a .tar file
Supplementary files format and content: A final VCF file for all samples
Submission date Sep 07, 2023
Last update date Sep 11, 2023
Contact name Ethan James Redmond
Organization name University of York
Department Department of Biology
Lab Ezer lab
Street address Wentworth Way
City York
ZIP/Postal code YO10 5DD
Country United Kingdom
Platform ID GPL26208
Series (1)
GSE242681 Single-plant-omics reveals the cascade of transcriptional changes during the vegetative-to-reproductive transition
BioSample SAMN37318780
SRA SRX21665181

Supplementary file Size Download File type/resource
GSM7766775_P020_quant.tar.gz 611.7 Kb (ftp)(http) TAR
Raw data are available in SRA
Processed data provided as supplementary file
