Transcription factors (TFs) play a central role in regulating gene expression by interacting with cis-regulatory DNA elements associated with their target genes. Recent surveys have examined the DNA binding specificities of most Saccharomyces cerevisiae transcription factors, but a comprehensive evaluation of these data has been lacking.
Results: We analyzed in vitro and in vivo TF-DNA binding data reported in previous large-scale studies to generate a comprehensive, curated resource of DNA binding specificity data for all characterized S. cerevisiae transcription factors. Our collection comprises DNA binding site motifs and comprehensive in vitro DNA binding specificity data for all possible 8 bp sequences. The database includes DNA binding specificity data for 27 TFs that were independently generated by PBM analysis in the current study. Investigation of the DNA binding specificities within the basic leucine zipper (bZIP) and VHR transcription factor families revealed unexpected plasticity in TF-DNA recognition: intriguingly, the VHR transcription factors, newly characterized by protein binding microarrays in this study, recognize bZIP-like DNA motifs, while the bZIP transcription factor Hac1 recognizes a motif highly similar to the canonical E-box motif of basic helix-loop-helix (bHLH) transcription factors. We identified several transcription factors with distinct primary and secondary motifs, which might be associated with different regulatory functions. Finally, integrated analysis of in vivo transcription factor binding data with protein binding microarray data lends further support to indirect DNA binding in vivo by sequence-specific transcription factors.
Overall design: Twenty-seven protein binding microarray (PBM) experiments of Saccharomyces cerevisiae transcription factors were performed. Briefly, the PBMs involved binding GST-tagged yeast transcription factors to double-stranded 44K Agilent microarrays in order to determine their sequence preferences. The method is described in Berger et al., Nature Biotechnology 2006 (PMID 16998473). A key feature is that the microarrays are composed of de Bruijn sequences that contain each 10-base sequence once and only once, providing an evenly balanced sequence distribution. Individual de Bruijn sequences have different properties, including representation of gapped patterns. The array probe sequences on the custom array design used in this study were reported previously in Berger et al., Cell 2008 (PMID 18585359) and are available via an academic research use license. Here we provide the data transformed into median signal intensities (after normalization and detrending of the original array data) for all 32,896 8-base sequences, Z-scores for these intensities, and E-scores. E-scores are a modified version of the area under the ROC curve (AUC) and describe how well each 8-mer ranks the intensities of the spots. In general, the E-scores are slightly more reproducible than Z-scores, but contain less information about relative binding affinity. Additional experimental details are found in Berger et al., Nature Biotechnology 2006, Gordan et al., Genome Biology (in press), and the accompanying Supplementary information.
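The figure of 32,896 8-base sequences can be reproduced under the assumption that each 8-mer and its reverse complement are counted as a single sequence, as is natural for double-stranded probes. The sketch below is illustrative only and is not part of the deposited analysis pipeline; the function names are hypothetical.

```python
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k):
    """Enumerate all k-mers, keeping one representative per reverse-complement pair.

    The representative is the lexicographically smaller of the k-mer and its
    reverse complement (an arbitrary but common convention, assumed here).
    """
    canon = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        canon.add(min(kmer, reverse_complement(kmer)))
    return canon


n_8mers = len(canonical_kmers(8))
print(n_8mers)  # 32896
```

Of the 4^8 = 65,536 possible 8-mers, 4^4 = 256 are their own reverse complement, so the number of distinct sequences is (65,536 + 256) / 2 = 32,896.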