NCBI Logo
GEO Logo
   NCBI > GEO > Accession DisplayHelp Not logged in | LoginHelp
GEO help: Mouse over screen elements for information.
          Go
Series GSE32465 Query DataSets for GSE32465
Status Public on Sep 29, 2011
Title Transcription Factor Binding Sites by ChIP-seq from ENCODE/HAIB
Project ENCODE
Organism Homo sapiens
Experiment type Genome binding/occupancy profiling by high throughput sequencing
Summary This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Florencia Pauli mailto:fpauli@hudsonalpha.org). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu).

The ChIP-Seq method was used to assay chromatin fragments bound by specific or general transcription factors as described below. DNA isolated by ChIP-Seq was size-selected (~225 bp) and sequenced. Short reads of 25-36 bp were mapped to the human reference genome, and enriched regions of high read density relative to a total input chromatin control reads were identified.
The sequence reads with quality scores (fastq files) and alignment coordinates (BAM files) from these experiments are available for download.

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
 
Overall design Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). Cross-linked chromatin was immunoprecipitated with an antibody. The Protein:DNA crosslinks were then reversed and the DNA fragments were recovered and sequenced. Please see protocol notes below and go to http://hudsonalpha.org/myers-lab/protocols for the most current version of the protocol. Biological replicates from each experiment were completed.
Libraries were sequenced with an Illumina Genome Analyzer I or an Illumina Genome Analyzer IIx according to the manufacturer's recommendations. Sequence data produced by the Illumina data pipeline software were quality filtered and then mapped to NCBI Build37 (hg19) using the integrated Eland software; 32 nt of the sequence reads were used for alignment; up to two mismatches were tolerated; reads that mapped to multiple sites in the genome were discarded.
To identify likely binding sites, peak calling was applied to the aligned sequence data sets using Model-based Analysis of Chip-Seq MACS (Zhang Y, et al., 2008) (http://liulab.dfci.harvard.edu/MACS/00README.html). MACS models the shift size of ChIP-Seq tags empirically, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to capture local biases in the genome, allowing for more robust predictions (Zhang Y, et al., 2008).
Protocol Notes:
Several changes and improvements were made to the original ChIP-Seq protocol (Jonshon et al.,2008). The major differences between protocols are the number of cells and magnetic beads used for IP, the method of sonication used to fragment DNA, and the number of cycles of PCR used to amplify the sequencing library. The most current protocol used by the Myers lab can be found at http://hudsonalpha.org/myers-lab/protocols. The protocol field for each file denotes the version of the protocol used as being PCR1x, PCR2x or a version number (for examples, v041610.1).
The sequencing libraries labeled as PCR2x were made with two rounds of amplification (25 and 15 cycles) and those labeled as PCR1x were made with one 15-cycle round of amplification. These experiments were completed prior to January 2010 and were originally aligned to NCBI Build36 (hg18). They have been re-aligned to NCBI Build37 (hg19) with the Bowtie software (Langmead, et al., 2009) for this data release (http://bowtie-bio.sourceforge.net/index.shtml). The libraries labeled with a protocol version number were competed after January 2010 and were only aligned to NCBI Build37 (hg19). Please refer to the Myers Lab website (http://hudsonalpha.org/myers-lab/protocols) for details on each protocol version.
Verification:
The MACS (http://liulab.dfci.harvard.edu/MACS/00README.html) peak caller was used to call significant peaks on the individual replicates of a ChIP-Seq experiment. Afterwards, the irreproducible discovery rate (IDR) method, developed by Li et al. (submitted), was used to quantify the consistency between pairs of ranked peaks lists from replicates. The IDR methods uses a model that assumes that the ranked lists of peaks in a pair of replicates consist of two groups - a reproducible group and an irreproducible group. In general, the signals in the reproducible group are more consistent (i.e. with a larger rank correlation coefficient) and are ranked higher than the irreproducible group. The proportion of peaks that belong to the irreproducible component and the correlation of the reproducible component are estimated adaptively from the data. The model also provides an IDR score for each peak, which reflects the posterior probability of the peak belonging to the irreproducible group. The aligned reads were pooled from all replicates and the MACS peak caller was used to call significant peaks on the pooled data. Only datasets containing at least 100 peaks passing the IDR threshold are considered valid and submitted for release.
Web link http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeHaibTfbs
 
Contributor(s) Myers R, Pauli F
Citation(s) 24076218
BioProject PRJNA63447
Submission date Sep 28, 2011
Last update date May 15, 2019
Contact name ENCODE DCC
E-mail(s) encode-help@lists.stanford.edu
Organization name ENCODE DCC
Street address 300 Pasteur Dr
City Stanford
State/province CA
ZIP/Postal code 94305-5120
Country USA
 
Platforms (1)
GPL9052 Illumina Genome Analyzer (Homo sapiens)
Samples (399)
GSM803333 HudsonAlpha_ChipSeq_SK-N-SH_RA_CTCF_v041610.2
GSM803334 HudsonAlpha_ChipSeq_GM12892_PAX5-C20_v041610.1
GSM803335 HudsonAlpha_ChipSeq_SK-N-SH_NRSF_PCR2x
Relations
Affiliated with GSE33816
SRA SRP008797

Download family Format
SOFT formatted family file(s) SOFTHelp
MINiML formatted family file(s) MINiMLHelp
Series Matrix File(s) TXTHelp

Supplementary file Size Download File type/resource
GSE32465_RAW.tar 52.3 Gb (http)(custom) TAR (of BIGWIG, BROADPEAK)
GSE32465_run_info.txt.gz 12.4 Kb (ftp)(http) TXT
SRA Run SelectorHelp
Raw data are available in SRA
Processed data provided as supplementary file

| NLM | NIH | GEO Help | Disclaimer | Accessibility |
NCBI Home NCBI Search NCBI SiteMap