Genome-wide association studies (GWAS) have discovered thousands of genomic loci associated with disease risk and quantitative traits, but most of the variants responsible for risk remain uncharacterized. The vast majority of GWAS-identified loci contain non-coding SNPs, so defining the molecular mechanism of risk is challenging. Many non-coding causal SNPs are hypothesized to affect organismal phenotypes by altering transcription factor (TF) binding sites. We employed an integrative genomics approach to identify candidate TF binding motifs that confer breast cancer-specific phenotypes identified by GWAS. We performed de novo motif analysis of regulatory elements, analyzed the evolutionary conservation of the identified motifs, and examined TF footprinting data to identify sequence elements that recruit TFs and maintain the chromatin landscape in breast cancer-relevant tissues and cell lines. Regulatory elements in MCF10A cells were mapped with ATAC-seq. We identified top candidate causal SNPs that lie within breast cancer-relevant regulatory regions, are predicted to alter TF binding, and are in strong linkage disequilibrium (LD) with the GWAS SNPs. This integrative analysis pipeline is a general framework for identifying candidate causal variants within regulatory regions and TF binding sites that confer phenotypic variation and disease risk.
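The final prioritization step (keep SNPs that are in strong LD with a GWAS SNP, fall inside a regulatory region, and are predicted to alter TF binding) can be illustrated with a minimal Python sketch. It assumes each proxy SNP already carries an r² value relative to its GWAS lead SNP and a yes/no motif-disruption call, and that regulatory regions are ATAC-seq peaks given as BED-style intervals. The r² ≥ 0.8 cutoff, the `Snp` class, the helper names, and the toy inputs are hypothetical choices for illustration, not the thresholds or code used in this study.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A BED-style interval: (chrom, start, end), 0-based and half-open.
Peak = Tuple[str, int, int]


@dataclass
class Snp:
    rsid: str
    chrom: str
    pos: int             # 1-based genomic position
    ld_r2: float         # r^2 with the GWAS lead SNP (assumed precomputed)
    alters_motif: bool   # assumed output of a separate motif-disruption scan


def in_peak(snp: Snp, peaks: List[Peak]) -> bool:
    """True if the SNP falls inside any accessible region (e.g., an ATAC-seq peak)."""
    # A 1-based position p corresponds to the 0-based base p - 1.
    return any(c == snp.chrom and s <= snp.pos - 1 < e for c, s, e in peaks)


def candidate_causal(snps: List[Snp], peaks: List[Peak], r2_min: float = 0.8) -> List[str]:
    """Keep SNPs passing all three filters: strong LD, inside a peak, alters a motif."""
    return [
        s.rsid
        for s in snps
        if s.ld_r2 >= r2_min and s.alters_motif and in_peak(s, peaks)
    ]


# Toy, made-up inputs for illustration only.
peaks = [("chr1", 100, 200), ("chr2", 500, 600)]
snps = [
    Snp("snpA", "chr1", 150, 0.95, True),   # passes all filters
    Snp("snpB", "chr1", 150, 0.50, True),   # LD too weak
    Snp("snpC", "chr1", 300, 0.90, True),   # outside every peak
    Snp("snpD", "chr2", 550, 0.85, False),  # no predicted motif change
]
print(candidate_causal(snps, peaks))  # ['snpA']
```

Applying the filters as a single conjunction keeps the logic transparent; in practice the peak lookup would usually be delegated to an interval tool rather than a linear scan.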
Map the hypersensitive regions in breast cancer-specific cell lines. Identify the TF motifs that underlie the overexpressed regions. Use GWAS data to impute causal SNPs that could disrupt TF binding sites. Five biological replicate libraries were prepared and sequenced paired-end (2x75 bp).
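One common way to predict whether a SNP could disrupt a TF binding site is to score both alleles against a position weight matrix (PWM). The sketch below assumes a log2-odds PWM built from base counts with a pseudocount of 1 and a uniform 0.25 background, scans both strands, and reports the change in the best motif score over all windows that cover the SNP. The toy GATA motif, function names, and parameters are hypothetical; this description does not specify which motif-scanning method was used.

```python
import math
from typing import Dict, List

BASES = "ACGT"
PWM = List[Dict[str, float]]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: List[Dict[str, float]], pseudocount: float = 1.0,
                 background: float = 0.25) -> PWM:
    """Convert per-position base counts to log2-odds scores vs. a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(window: str, pwm: PWM) -> float:
    return sum(pwm[i][b] for i, b in enumerate(window))


def best_score_at(seq: str, idx: int, pwm: PWM) -> float:
    """Best motif score, on either strand, over all windows containing position idx."""
    w = len(pwm)
    best = -math.inf
    for start in range(max(0, idx - w + 1), min(idx, len(seq) - w) + 1):
        window = seq[start:start + w]
        rc = window.translate(_COMPLEMENT)[::-1]  # reverse complement
        best = max(best, score(window, pwm), score(rc, pwm))
    return best


def allele_delta(seq: str, idx: int, alt: str, pwm: PWM) -> float:
    """Alt-minus-ref best score; a negative value suggests the alt allele weakens the site."""
    alt_seq = seq[:idx] + alt + seq[idx + 1:]
    return best_score_at(alt_seq, idx, pwm) - best_score_at(seq, idx, pwm)


# Toy motif strongly preferring GATA (made-up counts, not a real TF model).
counts = [{b: (10.0 if b == c else 0.0) for b in BASES} for c in "GATA"]
pwm = log_odds_pwm(counts)
print(allele_delta("TTGATATT", 3, "C", pwm))  # negative: A>C breaks the GATA match
```

Taking the maximum over all overlapping windows on both strands avoids assuming where the site sits relative to the SNP; a real analysis would also apply a score or p-value threshold before calling a site disrupted.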