Series GSE118725
Status Public on Oct 28, 2020
Title Systematic Analysis of Transcription Factors Binding to Noncoding Variants
Organisms Homo sapiens; synthetic construct
Experiment type Genome binding/occupancy profiling by high throughput sequencing
Expression profiling by high throughput sequencing
Summary A large number of sequence variants have been linked to complex human traits and diseases, but deciphering their biological functions is still challenging since most of them reside in noncoding DNA. To fill this gap, we have systematically assessed the binding of 270 human transcription factors (TFs) to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein-DNA binding assay, termed SNP evaluation by Systematic Evolution of Ligands by EXponential enrichment (SNP-SELEX). The resulting 828 million measurements of TF-DNA interactions enable estimation of the relative affinity of these TFs to each variant in vitro and allow evaluation of current methods for predicting the impact of noncoding variants on TF binding. We show that the position weight matrices (PWMs) of most TFs lack sufficient predictive power, whereas a support vector machine (SVM) combined with a gapped k-mer representation shows much improved performance when assessed on results from independent SNP-SELEX experiments involving a new set of 61,020 sequence variants. We report highly predictive models for 94 human TFs and demonstrate their utility in genome-wide association studies (GWAS) and in understanding the molecular pathways involved in diverse human traits and diseases.
Overall design 768 experiments with HT-SELEX for six SELEX cycles to measure allelic TF binding for 95,886 SNPs. TF ChIP-seq and RNA-seq were performed in HepG2 cells to validate allelic TF binding. STARR-seq experiments were performed to identify SNPs that affect enhancer activity in HepG2 and HEK293 cells with three replicates. In situ Hi-C experiments were performed in HepG2 cells and human islets to identify target genes of SNPs.
Contributor(s) Yan J, Qiu Y, Ribeiro dos Santos AM, Benaglio P, Ren B
Citation(s) 33505025
Submission date Aug 17, 2018
Last update date Feb 02, 2021
Contact name Yunjiang Qiu
Organization name University of California, San Diego
Street address 9500 Gilman Drive
City La Jolla
State/province CA
ZIP/Postal code 92093
Country USA
Platforms (4)
GPL16791 Illumina HiSeq 2500 (Homo sapiens)
GPL19604 Illumina HiSeq 2500 (synthetic construct)
GPL20301 Illumina HiSeq 4000 (Homo sapiens)
Samples (11359)
GSM3344562 HepG2 HOXB4
GSM3344563 HepG2 PEA3
GSM3344564 HepG2 MAFF/G/K
BioProject PRJNA486548
SRA SRP158289

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE118725_Islet.fpkm.tsv.gz 1.7 Mb (ftp)(http) TSV
GSE118725_Islet.hic 9.6 Gb (ftp)(http) HIC
GSE118725_Islet.loops.txt.gz 595.0 Kb (ftp)(http) TXT
GSE118725_RAW.tar 12.2 Gb (http)(custom) TAR (of BW, HIC, NARROWPEAK, TSV, TXT, VCF)
GSE118725_pbs.novel_batch.tsv.gz 41.0 Mb (ftp)(http) TSV
GSE118725_pbs.obs_pval05.tsv.gz 97.2 Mb (ftp)(http) TSV
GSE118725_starr_seq.count.tsv.gz 389.3 Kb (ftp)(http) TSV
SRA Run Selector
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record
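The supplementary files listed above can be fetched and inspected programmatically. The following is a minimal sketch, not part of the original record. It assumes the usual GEO FTP layout, in which a series is stored under a directory named after its accession with the last three digits replaced by "nnn". The function names geo_suppl_url and peek_tsv_gz are hypothetical helpers introduced for illustration. The column layout of the TSV files is not documented in this record, so the sketch only returns the first rows for manual inspection.

```python
import csv
import gzip
import re


def geo_suppl_url(accession, filename):
    """Build the HTTPS URL of a series-level supplementary file.

    Assumes the standard GEO FTP layout:
    /geo/series/<stub>/<accession>/suppl/<filename>
    """
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = accession[3:]
    # Directory stub: drop the last three digits and append "nnn",
    # e.g. GSE118725 -> GSE118nnn.
    stub = "GSE" + digits[:-3] + "nnn"
    return (
        "https://ftp.ncbi.nlm.nih.gov/geo/series/"
        f"{stub}/{accession}/suppl/{filename}"
    )


def peek_tsv_gz(path, n_rows=5):
    """Return the first n_rows rows of a gzipped TSV file as lists of strings.

    Reads the file as a stream, so large files such as the 97.2 Mb
    pbs.obs_pval05 table are not loaded into memory.
    """
    rows = []
    with gzip.open(path, "rt", newline="") as handle:
        for i, row in enumerate(csv.reader(handle, delimiter="\t")):
            if i >= n_rows:
                break
            rows.append(row)
    return rows
```

For example, geo_suppl_url("GSE118725", "GSE118725_starr_seq.count.tsv.gz") gives a URL that can be passed to any downloader, and peek_tsv_gz on the downloaded file shows the header row and the first few records before any full-scale parsing is attempted.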

