Sample GSM3476057
Status Public on Mar 28, 2019
Title human white blood cells single cells H3K4me3 Ab-Mnase-179
Sample type SRA
Source name white blood cells
Organism Homo sapiens
Characteristics cell type: white blood cells
antibody-mnase: H3K4me3 Ab-MNase
Extracted molecule genomic DNA
Extraction protocol The DNAs were end-repaired using the End-It™ DNA End-Repair Kit (Lucigen, Radnor, PA, USA) in a 10 µL reaction volume according to the manufacturer’s instructions. The reaction was terminated by adding 115 µL of 10 mM Tris-Cl buffer (pH 7.5) containing 1 mM EDTA, and the DNAs were extracted by phenol-chloroform extraction followed by ethanol precipitation as described above. The precipitate was resuspended in 5 µL of Qiagen Buffer EB (10 mM Tris-Cl, pH 8.5). 3’ A overhangs were added to the end-repaired DNA fragments using the Klenow fragment (3’→5’ exo-) (New England BioLabs, Ipswich, MA, USA) and 1 mM deoxyadenosine triphosphate in the reaction buffer. The DNA fragments were incubated at 37°C for 20 min, and the enzymes were inactivated at 75°C for 20 min. The Illumina Adaptor Oligo Mix (Illumina, San Diego, CA, USA) was ligated to the 3’ A overhangs of the DNA fragments using T4 DNA ligase (New England BioLabs) by incubating at room temperature for 3 hr.
The adaptor-ligated DNA fragments were PCR amplified using the Phusion High-Fidelity PCR Master Mix (New England BioLabs) and 125 nM PE PCR Primer 1.0 (Illumina). The PCR was done under the following conditions: 98°C for 10 min, followed by cycles of 30 sec at 68°C and 30 sec at 72°C (18 cycles in the case of 3,000 cells or 25 cycles in the case of a single cell). The 140-350 bp fragments were isolated by 2% [w/v] agarose gel electrophoresis and purified using the QIAquick Gel Extraction kit (Qiagen). The concentration of the purified DNAs was measured using the Qubit dsDNA HS kit (Thermo Fisher Scientific). Paired-end sequencing was run on the Illumina MiSeq and HiSeq2000.
Library strategy OTHER
Library source genomic
Library selection other
Instrument model Illumina HiSeq 3000
Description KZ1353_GB2021
Data processing Raw sequence reads were aligned to the reference human genome (UCSC, hg18) by Bowtie2 (bowtie2 -D 20 -R 3 -N 1 -L 16 -i S,1,0.5 -X 1000 -p 20 -q -5 0 -3 30). Reads with low mapping quality (MAPQ ≤ 10) and redundant reads that mapped to the same location with the same orientation were removed from further analysis in each single cell library. For scH3K4me3, four cells were removed to avoid the possibility of doublet (or triplet) outliers. A cell is defined as a doublet outlier if it satisfies the following two requirements. 1. The log-library size of the cell is larger than the average log-library size over all cells. 2. With cells sorted by their log-library size, for any of the selected window sizes, the log-library size of the cell is more than three local scaled median absolute deviations (MAD) from the local median within the sliding window. Several window sizes were tested. The remaining 281 cells were used for genome browser visualization and for identification of peaks in the pooled cells.
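The two doublet requirements described above can be illustrated with a short Python sketch. This is not the submitters' code: the function name `find_doublet_candidates`, the window sizes, the log base, the centred window, the scale constant 1.4826, and the one-sided reading of "larger than" are all assumptions made for illustration, since the record does not specify them.

```python
import math
from statistics import mean, median

MAD_SCALE = 1.4826  # assumed consistency constant; the record only says "scaled MAD"


def find_doublet_candidates(library_sizes, window_sizes=(11, 21), n_mads=3.0):
    """Return sorted indices of cells that meet both doublet requirements."""
    log_sizes = [math.log10(n) for n in library_sizes]  # log base is an assumption
    overall_mean = mean(log_sizes)

    # Requirement 2 works on cells ordered by log-library size.
    order = sorted(range(len(log_sizes)), key=lambda i: log_sizes[i])
    sorted_vals = [log_sizes[i] for i in order]

    flagged = set()
    for window in window_sizes:  # a cell is flagged if ANY window size flags it
        half = window // 2
        for pos, cell in enumerate(order):
            value = sorted_vals[pos]
            if value <= overall_mean:
                continue  # requirement 1 not met
            # Window centred on the cell, truncated at the ends (assumption).
            local = sorted_vals[max(0, pos - half):pos + half + 1]
            local_median = median(local)
            local_mad = MAD_SCALE * median([abs(v - local_median) for v in local])
            # One-sided test: only cells ABOVE the local median are flagged (assumption).
            if value - local_median > n_mads * local_mad:
                flagged.add(cell)
    return sorted(flagged)
```

Calling `find_doublet_candidates(sizes)` with a list of per-cell read counts returns the positions of candidate doublets in that list; the cells actually removed in this dataset depend on the window sizes the submitters tested, which are not listed in the record.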
Peaks/Enriched regions for the 281 pooled single cell H3K4me3 libraries were identified using SICER (gap size = 200 bp, window size = 200 bp, and fragment size = 150 bp). 22,293 peaks were identified for the pooled H3K4me3 scChIC-seq data. Peaks/Enriched regions for the 106 pooled single cell H3K27me3 libraries were identified using SICER (gap size = 600 bp, window size = 200 bp, and fragment size = 150 bp). 21,465 peaks were identified for the pooled H3K27me3 scChIC-seq data.
For each single cell library (both scH3K4me3 and scH3K27me3), we filtered out the reads that are not located in the identified peak regions (22,293 peaks and 21,465 peaks for scH3K4me3 and scH3K27me3 libraries, respectively). The resulting *filtered.txt files were used for the downstream analysis.
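The peak-based read filter described above can be sketched as follows. This is an illustrative sketch only: the helper names `build_peak_index`, `overlaps_peak`, and `filter_reads` are hypothetical, reads and peaks are assumed to use half-open BED coordinates, "located in" is assumed to mean any overlap of at least one base, and peaks on a chromosome are assumed not to overlap each other.

```python
from bisect import bisect_right
from collections import defaultdict


def build_peak_index(peaks):
    """Group (chrom, start, end) peaks by chromosome and sort them by start."""
    index = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def overlaps_peak(index, chrom, start, end):
    """True if the half-open read [start, end) overlaps any peak on chrom."""
    intervals = index.get(chrom, [])
    # Number of peaks whose start lies before the read end.
    i = bisect_right(intervals, (end - 1, float("inf")))
    # With non-overlapping sorted peaks, only the last such peak can overlap.
    return i > 0 and intervals[i - 1][1] > start


def filter_reads(reads, index):
    """Keep reads (tuples beginning chrom, start, end) that fall in a peak."""
    return [r for r in reads if overlaps_peak(index, r[0], r[1], r[2])]
```

The binary search keeps the lookup logarithmic in the number of peaks per chromosome, which matters little for a single-cell library but keeps the sketch usable on pooled data.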
Genome_build: hg18
Supplementary_files_format_and_content: *filtered.txt: BED 6 file. Each line is a peak region. The six columns are chrom, start position, end position, end position - start position, mapq, strand.
Supplementary_files_format_and_content: bed: BED 6 file. Each line is a read. The six columns are chrom, start position, end position, end position - start position, mapq, strand.
Supplementary_files_format_and_content: bedgraph: BED 4 file. Each line is a bin, with bin size = 200 bp. The four columns are chrom, start position, end position, read density.
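As a reading aid for the file layouts above, the following Python sketch parses the six-column read file and tallies reads into 200 bp bins. It is only a sketch under stated assumptions: the names `BedRead`, `parse_bed6_line`, `count_reads_per_bin`, and `to_bedgraph_lines` are hypothetical, fields are assumed to be whitespace-separated, each read is assigned to a bin by its midpoint, and the output holds raw counts because the record does not define how read density is normalized.

```python
from collections import Counter, namedtuple

BIN_SIZE = 200  # bin size stated for the bedgraph file

# Column order as listed in the record: chrom, start, end, end - start, mapq, strand.
BedRead = namedtuple("BedRead", "chrom start end length mapq strand")


def parse_bed6_line(line):
    """Parse one whitespace-separated six-column line into a BedRead."""
    chrom, start, end, length, mapq, strand = line.split()
    return BedRead(chrom, int(start), int(end), int(length), int(mapq), strand)


def count_reads_per_bin(reads, bin_size=BIN_SIZE):
    """Count reads per (chrom, bin start); bin chosen by read midpoint (assumption)."""
    counts = Counter()
    for read in reads:
        midpoint = (read.start + read.end) // 2
        counts[(read.chrom, midpoint // bin_size * bin_size)] += 1
    return counts


def to_bedgraph_lines(counts, bin_size=BIN_SIZE):
    """Format raw counts as four-column lines: chrom, start, end, count."""
    return [
        f"{chrom}\t{start}\t{start + bin_size}\t{n}"
        for (chrom, start), n in sorted(counts.items())
    ]
```

To use it on a downloaded file, the lines could be read with `gzip.open(path, "rt")` and passed through `parse_bed6_line` one at a time.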
Library strategy: scChIC-seq
Submission date Nov 16, 2018
Last update date Mar 30, 2019
Contact name Keji Zhao
Organization name National Institutes of Health
Department National Heart, Lung, and Blood Institute
Street address Building 10 Room 7B06A
City Bethesda
State/province MD
ZIP/Postal code 20814
Country USA
Platform ID GPL21290
Series (1)
GSE105012 Single cell chromatin immunocleavage sequencing (scChIC-Seq) to profile histone modification
BioSample SAMN10434646
SRA SRX5017074

Supplementary file                                                 Size       Download      File type/resource
GSM3476057_KZ1353_GB2021_sc1_0_30_mapq10_noDup.bed.gz              312.6 Kb   (ftp)(http)   BED
GSM3476057_KZ1353_GB2021_sc1_0_30_mapq10_noDup_filtered.txt.gz     91.1 Kb    (ftp)(http)   TXT
GSM3476057_KZ1353_GB2021_sc1_mapq10_noDup_RPBM.bedgraph.gz         229.7 Kb   (ftp)(http)   BEDGRAPH
