Arabidopsis gene expression is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for most Arabidopsis TFs, we lack knowledge about the presence, location, and transcriptional strength of their ADs. To address this gap, we experimentally identified Arabidopsis ADs on a proteome-wide scale, finding that over half of Arabidopsis TFs carry an AD. We annotated 1,553 ADs, the vast majority of which were previously unknown. We used the dataset generated to develop a neural network to accurately predict ADs and to identify sequence features necessary to recruit coactivator complexes. We uncovered six distinct sequence feature combinations that resulted in activation activity, providing a framework to interrogate activation domain sub-functionalization. Furthermore, we identified activation domains within the ancient AUXIN RESPONSE FACTOR (ARF) family of transcription factors, finding conservation of AD positioning in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examination of function within intrinsically disordered regions, and a predictive model of activation domains.
Overall design: Arabidopsis TFs (PADI) or ARF regions (ARF Evolution and Pilot) were fragmented into 40-amino-acid tiles with a 10-amino-acid step size; the tiles were made as oligo pools and cloned into plasmid libraries. Yeast transformations ensured integration of a single synthetic TF, each expressing one tile, at the URA3 locus. TF activity was induced with beta-estradiol and measured by cell sorting based on the ratio of reporter to synthetic TF (GFP:mCherry). The distribution of tiles across GFP:mCherry ratio bins was determined by next-generation sequencing of the integrated TF. Median GFP:mCherry values were used to calculate activation domain scores.
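
The tiling and scoring steps described in the overall design can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names `tile_protein` and `bin_weighted_score` are hypothetical, and so is the handling of edge cases: a final tile flush with the C-terminus, and sequences shorter than one tile returned whole. The scoring function assumes a read-weighted average of per-bin median GFP:mCherry values, one common way to summarize sort-seq data; the record does not state the exact formula used.

```python
from typing import List, Optional, Sequence, Tuple


def tile_protein(seq: str, tile_len: int = 40, step: int = 10) -> List[Tuple[int, str]]:
    """Split a protein sequence into overlapping tiles.

    Returns (start, tile) pairs with 0-based start positions.
    Assumption: sequences no longer than one tile are returned whole.
    """
    if len(seq) <= tile_len:
        return [(0, seq)]
    starts = list(range(0, len(seq) - tile_len + 1, step))
    # Assumption: add one extra tile so the C-terminal residues are covered.
    last_start = len(seq) - tile_len
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, seq[s:s + tile_len]) for s in starts]


def bin_weighted_score(
    read_counts: Sequence[int], bin_medians: Sequence[float]
) -> Optional[float]:
    """Read-weighted average of per-bin median GFP:mCherry ratios.

    read_counts[i] is the number of reads for one tile in sorting bin i;
    bin_medians[i] is the median GFP:mCherry ratio of that bin.
    Hypothetical scoring scheme; returns None when the tile has no reads.
    """
    if len(read_counts) != len(bin_medians):
        raise ValueError("read_counts and bin_medians must have equal length")
    total = sum(read_counts)
    if total == 0:
        return None
    return sum(c * m for c, m in zip(read_counts, bin_medians)) / total
```

For example, `tile_protein(seq)` applied to a 100-residue sequence yields tiles starting at positions 0, 10, ..., 60, and `bin_weighted_score` can then be applied to each tile's read counts across the sorting bins.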