Vibrio cholerae is a globally dispersed pathogen capable of infecting a wide range of hosts. As the causative agent of cholera it has evolved and adapted with humans for centuries. Controversy surrounding the source of the recent cholera outbreak in Haiti has brought renewed interest to this organism, which is becoming the model species for genomic epidemiology. The recently released Ion Torrent DNA sequencing platform enabled rapid and cheap whole genome sequencing of Vibrio genomes. Here, we combine this novel sequencing technology with Random Forest-based machine learning to understand the evolution and persistence of V. cholerae across habitat, space and time. Our analysis shows that both individual point mutations (SNPs) and genetic markers (gene clusters, subsystems and phages) contribute to the persistence of Vibrio across these different niche dimensions. We show that the toxin-coregulated pilus (TCP), involved in Vibrio virulence, is a critical determinant of habitat; the prophage composition of the genomes separates the isolates by space; and the interplay between error-prone DNA replication and SNPs describes the genomic differences that accumulate over time. The new approach applied here highlights the genes that may be responsible for previously reported differences in the rate of evolution of the global pathogen V. cholerae.