Status: Public on Feb 18, 2015
Title: CTCF ChIP-Seq, H1 embryonic stem cells, replicate one
Sample type: SRA

Source name: H1 embryonic stem cells

Organism: Homo sapiens
Characteristics: cell type: H1 embryonic stem cells

Treatment protocol: None

Growth protocol: Growth and differentiation of H1 hESCs were performed as previously described (Xie et al., 2013, Cell 153, 1134-1148).

Extracted molecule: genomic DNA
Extraction protocol: Sequencing libraries were constructed according to a previous publication (Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-293 (2009)).

Library strategy: ChIP-Seq
Library source: genomic
Library selection: ChIP
Instrument model: Illumina HiSeq 2500

Description: CTCF_H1_R1.fastq.gz

Data processing:
fastq: Illumina's HiSeq Control Software.
For Hi-C read alignment, we aligned Hi-C reads to the hg18 (human) genome. We masked any bases in the genome that were genotyped as SNPs in the H1 genome; these bases were masked to "N" to reduce reference-bias mapping artifacts. Hi-C reads were aligned iteratively as single-end reads using Novoalign. Specifically, for iterative alignment, we first aligned the entire sequencing read to either the mouse or human genome. Unmapped reads were then trimmed by 5 base pairs and realigned. This process was repeated until the read successfully aligned to the genome or until the trimmed read was less than 25 base pairs long. After iterative mapping was finished, read pairs were reconstructed from single reads using an in-house pipeline. Unmapped reads were filtered out and PCR duplicate reads were removed. Final alignment files were then processed using the GATK pipeline, specifically Indel Realignment and Variant Recalibration. A similar pipeline, without the iterative alignment step, was used to align the other high-throughput sequencing datasets. Haplotypes were generated with the HapCUT algorithm from the final aligned BAM file after merging the two biological replicates. The details of HapCUT are described previously (Bansal and Bafna, Bioinformatics 24, i153-159, 2008).
Genome_build: hg18
Supplementary_files_format_and_content: Reads are listed in BED format, with one line per sequencing read. The reads have been split by haplotype into the "A" and "B" (alternatively, "p1" and "p2") alleles according to the haplotype to which the bases within each sequencing read correspond. For paired-end Hi-C data, each line lists a single read, and pairing information can be obtained from the read names. The original fastq files for data other than the Hi-C and CTCF ChIP-seq are available in the GSE16256 dataset.
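The iterative alignment loop described in the data-processing notes above can be sketched as follows. This is only a minimal illustration, not the submitters' pipeline. The `align` callable is a hypothetical stand-in for Novoalign and does not use Novoalign's real interface. Trimming from the 3' end is also an assumption, because the record does not say which end is trimmed. The 5 bp step and the 25 bp floor are taken from the record. `make_toy_aligner` is a hypothetical exact-match aligner included for demonstration only.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical aligner interface (NOT Novoalign's API): takes a read
# sequence and returns (chrom, 0-based position), or None if unmapped.
Aligner = Callable[[str], Optional[Tuple[str, int]]]

TRIM_STEP = 5    # bases removed after each failed attempt (from the record)
MIN_LENGTH = 25  # give up once the read is shorter than this (from the record)


def iterative_align(read: str, align: Aligner,
                    trim_step: int = TRIM_STEP,
                    min_length: int = MIN_LENGTH) -> Optional[Tuple[str, int, int]]:
    """Align a read, trimming and retrying until it maps or becomes too short.

    Returns (chrom, position, aligned_length), or None if no attempt maps.
    Trimming from the 3' end is an assumption; the record does not say which end.
    """
    if trim_step <= 0:
        raise ValueError("trim_step must be positive")
    seq = read
    while len(seq) >= min_length:
        hit = align(seq)
        if hit is not None:
            chrom, pos = hit
            return chrom, pos, len(seq)
        seq = seq[:-trim_step]  # drop trim_step bases from the 3' end
    return None


def make_toy_aligner(reference: Dict[str, str]) -> Aligner:
    """Toy exact-match aligner for demonstration only (hypothetical)."""
    def align(seq: str) -> Optional[Tuple[str, int]]:
        for chrom, chrom_seq in reference.items():
            pos = chrom_seq.find(seq)
            if pos != -1:
                return chrom, pos
        return None
    return align


if __name__ == "__main__":
    ref = {"chr1": "ACGT" * 20}
    # 30 bp that match chr1, followed by 10 bp that do not (e.g. past a ligation junction)
    read = "ACGT" * 7 + "AC" + "T" * 10
    print(iterative_align(read, make_toy_aligner(ref)))  # ('chr1', 0, 30)
```

Passing the aligner in as a function keeps the trimming logic independent of any particular mapper. A real pipeline would more likely run each round as a batch over all still-unmapped reads rather than loop one read at a time.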
Supplementary_files_format_and_content: The processed haplotypes for the H1 genome ("H1_haps.vcf") are available in VCF format.
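The haplotype split of the BED reads can be illustrated with a minimal sketch that uses phased heterozygous SNPs from a VCF such as H1_haps.vcf. Several assumptions here are not stated in the record:

- Genotypes are phased with `|`, and the allele on the left is treated as haplotype "A".
- The read start is a 0-based BED coordinate, and the alignment is ungapped.
- Reads that cover no informative SNP, or that carry alleles from both haplotypes, are left unassigned.

`load_phased_hets` and `assign_haplotype` are hypothetical helper names.

```python
from typing import Dict, Iterable, Optional, Tuple

# (chrom, 1-based VCF position) -> (haplotype A base, haplotype B base)
PhasedSites = Dict[Tuple[str, int], Tuple[str, str]]


def load_phased_hets(vcf_lines: Iterable[str], sample_col: int = 9) -> PhasedSites:
    """Collect phased heterozygous SNPs for one sample from VCF text lines.

    Assumes GT is the first FORMAT key (as the VCF spec requires when present)
    and treats the allele left of '|' as haplotype A (hypothetical convention).
    """
    sites: PhasedSites = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        gt = fields[sample_col].split(":")[0]
        parts = gt.split("|")
        if len(parts) != 2:
            continue  # unphased or non-diploid genotype: cannot assign a haplotype
        a, b = parts
        if not (a.isdigit() and b.isdigit()) or a == b:
            continue  # missing call or homozygous site: uninformative
        alleles = [ref] + alt.split(",")
        hap_a, hap_b = alleles[int(a)], alleles[int(b)]
        if len(hap_a) == 1 and len(hap_b) == 1:
            sites[(chrom, pos)] = (hap_a, hap_b)  # keep single-base SNPs only
    return sites


def assign_haplotype(chrom: str, start0: int, seq: str,
                     sites: PhasedSites) -> Optional[str]:
    """Return "A", "B", or None for a read with an ungapped alignment.

    start0 is the 0-based leftmost aligned position (BED convention).
    Reads with no informative SNP, or with alleles from both haplotypes,
    are left unassigned (an assumption, not stated in the record).
    """
    votes_a = votes_b = 0
    for offset, base in enumerate(seq.upper()):
        site = sites.get((chrom, start0 + offset + 1))  # convert to 1-based
        if site is None:
            continue
        if base == site[0]:
            votes_a += 1
        elif base == site[1]:
            votes_b += 1
        # bases matching neither allele (e.g. sequencing errors) are ignored
    if votes_a and not votes_b:
        return "A"
    if votes_b and not votes_a:
        return "B"
    return None


if __name__ == "__main__":
    vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tH1",
        "chr1\t5\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",
        "chr1\t8\t.\tC\tT\t.\tPASS\t.\tGT\t1|0",
    ]
    sites = load_phased_hets(vcf)
    print(assign_haplotype("chr1", 2, "CCACCTCCCC", sites))  # A
```

Requiring every informative base to agree is a conservative choice. A majority vote would assign more reads, but some of those assignments would be wrong.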

Submission date: Nov 18, 2013
Last update date: May 15, 2019

Contact name: Jesse R Dixon
E-mail(s): jedixon@salk.edu
Organization name: Salk Institute for Biological Studies
Lab: PBL-D
Street address: 10010 N. Torrey Pines Rd.
City: La Jolla
State/province: CA
ZIP/Postal code: 92037
Country: USA

Platform ID: GPL16791
Series (1): GSE52457, Global Reorganization of Chromatin Architecture during Embryonic Stem Cell Differentiation

Relations:
BioSample: SAMN02404681
SRA: SRX378281

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record