Status Public on Jun 04, 2021
Title HYR__3ac717__20201006_non-exo_S2
Sample type SRA
Source name 50:50 mixture of human MM087 cells and mouse melanoma cells
Organisms Homo sapiens; Mus musculus
Characteristics tissue/cell line: Human: melanoma cell line/mouse: melanoma cell line
Growth protocol MM087 human melanoma cells were cultured in F-10 Nutrient mix supplemented with 10% FBS and 1% penicillin/streptomycin and passaged once per week. Mouse melanoma cells were cultured in DMEM supplemented with 10% FBS and 1% penicillin/streptomycin and passaged once per week.
Extracted molecule cytoplasmic RNA
Extraction protocol NOT PROVIDED; REQUESTED
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model NextSeq 2000
Data processing For HyDrop-ATAC samples (prefix HYA):
Barcode reads (read 2) were trimmed to exclude the intersub-barcode PCR adapters using MAWK
Barcode reads (read 2) were trimmed using Trim Galore (trim_galore -j 2 -o . $R1 $R2 --paired --gzip), corrected to a whitelist allowing 1 mismatch (custom seq script in VSN pipeline version 41df185308 [develop_atac]), and appended to the fastq sequence identifier of the paired-end ATAC-seq reads (read 1 and read 3)
Cell barcodes were written to the bam as a CB SAM tag using SAMtools and AWK (custom VSN pipeline version 41df185308 [develop_atac] script) (samtools view -h HYA__combined__20210323_cortex_phu_dv_etssb_1-5_S5.bwa.out.possorted.bam | mawk '/^@/ {print;next} {N=split($1,n,"_"); NN=split(n[1],nbc,"-"); ucorr_bc=""; corr_bc=""; if(NN==1){ # BioRad data with format {corrected_bc}_qname; corr_bc="CB:Z:" nbc[1];} if(NN==2){ucorr_bc="CR:Z:" nbc[1]; if(length(nbc[2])>0){corr_bc="
Reads were mapped to reference genomes using bwa-mem and duplicate-marked using samtools markdup by the VSN pipeline version 41df185308 [develop_atac] (bwa mem -t 8 genome.fa $R1.fq.gz $R2.fq.gz | samtools fixmate -@ 6 -m -u -O bam - - | samtools sort -@ 6 -u -O bam - | samtools markdup -@ 6 -f markdup.log - bwa.out.possorted.bam).
Fragments files were generated using Sinto (sinto fragments -b $bam -m 30 --barcodetag CB --use_chrom "(?i)^chr" --min_distance 10 --max_distance 5000 --chunksize 5000000 -p 2 -f $fragments.bed)
Per-cluster differentially accessible regions were generated using pycistopic '0.1.dev214+gc72ef83.d20210522' (markers_dict = find_diff_features(cistopic_obj, imputed_acc_obj, variable='cell_type_finetuned', var_features=variable_regions, contrasts=None, adjpval_thr=0.05, log2fc_thr=np.log2(1.5), n_cpu=12)) and written to BED format using a custom Python script.
Per-cluster coverage bigwigs were generated using pycistopic '0.1.dev214+gc72ef83.d20210522' (bw_paths, bed_paths = export_pseudobulk(input_data = cistopic_obj.cell_data, variable = 'cell_type_finetuned', chromsizes = chromsizes, bed_path = bed_path, bigwig_path = bw_path, path_to_fragments = fragments_dict, n_cpu = nclusters, normalize_bigwig = True, remove_duplicates = True))
Aggregate genome coverage bigwigs were generated using DeepTools 3.5.0 (bamCoverage -b ${file} -o ${file%.bam} -bs 1 -p 4 --normalizeUsing RPGC --effectiveGenomeSize 2913022398)
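The whitelist correction with 1 allowed mismatch, mentioned above for the HyDrop-ATAC barcode reads, can be illustrated with a minimal Python sketch. This is not the custom VSN pipeline script, which is not reproduced in this record. The function names (build_mismatch_index, correct_barcode) and the choice to reject barcodes that are one mismatch away from more than one whitelist entry are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Optional


def build_mismatch_index(whitelist: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each sequence within Hamming distance 1 of a whitelist barcode
    to that barcode. Sequences that are one mismatch away from several
    whitelist barcodes map to None (assumption: ambiguous hits are rejected).
    """
    allowed = set(whitelist)
    index: Dict[str, Optional[str]] = {bc: bc for bc in allowed}
    for bc in allowed:
        for i, original in enumerate(bc):
            for base in "ACGTN":
                if base == original:
                    continue
                neighbor = bc[:i] + base + bc[i + 1:]
                if neighbor in allowed:
                    continue  # an exact whitelist match always wins
                if neighbor in index and index[neighbor] != bc:
                    index[neighbor] = None  # reachable from two barcodes
                else:
                    index[neighbor] = bc
    return index


def correct_barcode(raw: str, index: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the corrected barcode, or None if no unique match exists."""
    return index.get(raw)


if __name__ == "__main__":
    # Hypothetical toy whitelist, not the HyDrop whitelist.
    idx = build_mismatch_index(["AAAA", "CCCC", "AACC"])
    for raw in ["AAAA", "AAAT", "AAAC", "GGGG"]:
        print(raw, "->", correct_barcode(raw, idx))
```

Precomputing the neighbor index trades memory for a constant-time lookup per read, which is one common design for this step. A real pipeline may instead compare each read against the whitelist directly.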
For HyDrop-RNA samples (prefix HYR):
Reads were mapped to reference genomes and expression matrices were generated using STAR version: 2.7.8a_2021-04-27 in STARsolo mode (STAR --runThreadN 6 --runMode alignReads --outSAMtype BAM SortedByCoordinate --sysShell /bin/bash --genomeDir "${star_reference_dir}" --readFilesIn "${fastq_R1_filename}" "${fastq_R2_filename}" --readFilesCommand 'gzip -c -d' --soloCBwhitelist "${whitelist_part1_filename}" "${whitelist_part2_filename}" "${whitelist_part3_filename}" --soloType CB_UMI_Complex --soloCBposition 0_0_0_9 0_20_0_29 0_40_0_49 --soloUMIposition 0_50_0_57 --sjdbGTFfile $sjdbgtf --soloCellFilter CellRanger2.2 2000 0.99 10 --soloCBmatchWLtype 1MM --outFilterMultimapNmax 1 --outSAMattributes NH HI AS nM CB UB CR CY UR UY --outFileNamePrefix ${bam_filename%bam} --outReadsUnmapped Fastx --quantMode GeneCounts --bamRemoveDuplicatesType UniqueIdentical --soloFeatures Gene GeneFull Velocyto)
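To make the --soloCBposition and --soloUMIposition values in the STAR command concrete, the following Python sketch slices a barcode read at those coordinates. It assumes the STARsolo convention that each value reads startAnchor_startPosition_endAnchor_endPosition, that anchor 0 is the read start, and that positions are 0-based and inclusive. The helper names and the example read are hypothetical, and the sketch does not reproduce STAR's own implementation.

```python
from typing import List, Tuple

# Values copied from the STAR command in this record.
CB_POSITIONS = ["0_0_0_9", "0_20_0_29", "0_40_0_49"]
UMI_POSITION = "0_50_0_57"


def parse_position(spec: str) -> Tuple[int, int]:
    """Parse 'startAnchor_start_endAnchor_end' into (start, end), inclusive."""
    start_anchor, start, end_anchor, end = (int(x) for x in spec.split("_"))
    if start_anchor != 0 or end_anchor != 0:
        # Other anchors (e.g. read end, adapter) are outside this sketch.
        raise ValueError("only read-start anchors (0) are handled here")
    return start, end


def slice_inclusive(read: str, spec: str) -> str:
    """Return the bases covered by an inclusive position spec."""
    start, end = parse_position(spec)
    return read[start:end + 1]


def split_barcode_read(read: str) -> Tuple[List[str], str]:
    """Split a barcode read into its three sub-barcodes and the UMI."""
    _, last = parse_position(UMI_POSITION)
    if len(read) <= last:
        raise ValueError("read is shorter than the UMI end position")
    sub_barcodes = [slice_inclusive(read, spec) for spec in CB_POSITIONS]
    umi = slice_inclusive(read, UMI_POSITION)
    return sub_barcodes, umi


if __name__ == "__main__":
    # Hypothetical 58-base read: three 10 bp sub-barcodes separated by
    # two 10 bp linkers, followed by an 8 bp UMI.
    read = "A" * 10 + "G" * 10 + "C" * 10 + "G" * 10 + "T" * 10 + "ACGTACGT"
    print(split_barcode_read(read))
```

Under these assumptions the three sub-barcodes are 10 bases each and the UMI is 8 bases, with the two 10-base stretches in between left unused.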
For the optimisation samples (HYR__1ec4a0__20200929_Exo_S1 HYR__3ac717__20201006_non-exo_S2 HYR__70afb2__20201006_TSO-LNA_S3 HYR__c6c4b7__20201016_gtp_S4 HYR__d0a5b4__20201023_klenow_S5 HYR__f0eb5a__20201023_bst_S6), FASTQs were downsampled to match the reads per cell depth of the lowest sample and re-mapped using the same STAR parameters to generate the DOWNSAMPLED expression matrices.
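The record does not state how the downsampling fractions were computed or which tool was used to subsample the FASTQs. As a sketch under that caveat, one straightforward way to match reads-per-cell depth is to compute each sample's reads per cell, take the minimum as the target, and keep the corresponding fraction of reads in every other sample. The function name and the read and cell counts below are hypothetical.

```python
from typing import Dict, Tuple


def downsample_fractions(samples: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return the fraction of reads to keep per sample so that every sample
    matches the reads-per-cell depth of the shallowest sample.

    samples maps a sample name to (total_reads, n_cells).
    """
    depth = {}
    for name, (total_reads, n_cells) in samples.items():
        if n_cells <= 0:
            raise ValueError(f"sample {name!r} has no cells")
        depth[name] = total_reads / n_cells
    target = min(depth.values())  # reads per cell of the lowest sample
    return {name: target / d for name, d in depth.items()}


if __name__ == "__main__":
    # Hypothetical counts, not taken from this dataset.
    example = {
        "sample_a": (1_000_000, 1_000),  # 1000 reads per cell
        "sample_b": (3_000_000, 1_500),  # 2000 reads per cell
    }
    for name, fraction in downsample_fractions(example).items():
        print(f"{name}: keep {fraction:.2f} of reads")
```

The resulting fractions could then be passed to any FASTQ subsampling tool, provided the same random seed is used for all mate files so that read pairs stay in sync.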
Genome_build: For human samples: hg38, for mouse samples: mm10, for hybrid samples: GRCh38_GRCm38 hybrid genome, for fly samples: dm6
Supplementary_files_format_and_content: For HyDrop-ATAC samples (prefix HYA): standard tab-separated fragments files, tar archive with per-cluster differentially accessible region bed files, tar archive with per-cluster genome coverage bigwig
Supplementary_files_format_and_content: For HyDrop-RNA samples (prefix HYR): STARsolo gene expression matrix in MEX format
Submission date May 27, 2021
Last update date Jun 04, 2021
Contact name Stein Aerts
Organization name KU Leuven
Lab Lab of Computational Biology
Street address O&N4 Herestraat 49 PO Box 602
City Leuven
State/province Vlaams-Brabant
ZIP/Postal code 3000
Country Belgium
Platform ID GPL30202
GSE175684 HyDrop: droplet-based scATAC-seq and scRNA-seq using dissolvable hydrogel beads
BioSample SAMN19370043
