Cotton has one of the lowest levels of genetic diversity of any of our major crop plants. Much of the natural genetic diversity found in the wild was lost during the thousands of years of breeding since the original founding population. Developments in DNA sequencing technology have enabled new strategies to quantify and characterize this missing genetic diversity. This project will sequence the genomes of old cotton cultivars, cotton plants from the wild, and some of cotton's wild relatives. Computational analysis of the DNA sequences will identify individual plants that contain unique genetic diversity that will be useful for understanding cotton biology and for crop improvement. This project will also provide Research Experiences for science Teachers (grades 7-12) at both Brigham Young University and Iowa State University that introduce the conceptual and experimental foundations of genome analysis strategies. Teachers will then be able to share their knowledge of genome sequencing with their high school and middle school students.
The amount and apportionment of genetic diversity in domesticated plants and their wild relatives are key concerns for crop improvement. We have selected 506 accessions of cotton (and its wild, diploid relatives) for genome re-sequencing. Following high-coverage sequencing of DNA libraries of each sample, reads will be mapped to the reference genome using GSNAP; subsequent analyses will use GATK and the BamBam package (including PolyCat) to analyze these read alignments and compare them to the reference genome of cotton. Nucleotide variation along chromosomes will be identified and quantified for various levels of clustering of accessions. Population genomics analyses will reveal unique alleles and accessions within the public cotton germplasm collection. We will characterize and phylogenetically place copy number variants (CNVs) using sequence coverage and sliding windows of SNPs. These analyses will provide a broad-based interpretation of the homoeologous exchanges within cotton's polyploid genome. A pan-genome of cotton will be synthesized to better understand the scope, pace, and pattern of genomic variations that arise during speciation, population differentiation, and domestication and breeding. All project outcomes will be made available to the public through a project website. Sequence data will be accessible through long-term repositories such as GenBank, the NCBI's SRA, and Phytozome. Data will also be available long-term through CottonGen (http://www.cottongen.org/), a community genomics, genetics and breeding database for cotton. The results of these analyses will have a long-lasting impact on how we understand the organization of the cotton genome.
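
As an illustration of the sliding-window idea mentioned above, the following is a minimal Python sketch that counts SNPs in overlapping windows along one chromosome. It is not the project's pipeline (which relies on GSNAP, GATK, and BamBam); the function name `snp_counts_per_window`, the window and step sizes, and the input format (a list of 1-based SNP positions) are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def snp_counts_per_window(positions, chrom_length, window=100_000, step=50_000):
    """Count SNPs in sliding windows along a single chromosome.

    positions    -- iterable of 1-based SNP coordinates (hypothetical input)
    chrom_length -- chromosome length in base pairs
    window, step -- window size and step size in base pairs (illustrative defaults)

    Returns a list of (start, end, count) tuples with inclusive coordinates.
    """
    sorted_pos = sorted(positions)
    results = []
    start = 1
    while start <= chrom_length:
        # The last windows are truncated at the chromosome end.
        end = min(start + window - 1, chrom_length)
        # Number of SNPs with start <= position <= end, via binary search.
        count = bisect_right(sorted_pos, end) - bisect_left(sorted_pos, start)
        results.append((start, end, count))
        start += step
    return results


if __name__ == "__main__":
    # Toy data: four SNPs on a 200 bp chromosome, 100 bp windows, 50 bp step.
    for row in snp_counts_per_window([5, 10, 15, 120], 200, window=100, step=50):
        print(row)
```

In practice, per-window counts like these would typically be computed per accession or per group of accessions and compared across windows; windows with unusually high or low values, considered together with read coverage, are the kind of signal that could then be examined further.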