Next-generation sequencing (NGS) methods are low-cost, high-throughput sequencing technologies that produce thousands to millions of sequences. Despite this high throughput, NGS produces shorter reads than Sanger sequencing, which complicates the assembly of genomic repeats. The assembly of highly repetitive genomes using only short NGS reads therefore remains challenging. Many genome assembly tools are available, but they differ greatly in their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of the assembled sequence). In the present study, we designed a low-cost approach to quickly assemble and annotate the repetitive genome of the human parasite Trypanosoma cruzi strain 231 using only Illumina reads. We developed a computational workflow for assembling highly repetitive genomes that combines the de novo and reference-based assembly strategies to better overcome their intrinsic limitations. The accuracy and performance of our combined assembly approach were demonstrated by comparison with other assembly approaches on repetitive sequencing read datasets. The combined approach pairs the ability of reference-based assembly to resolve highly repetitive sequences that are represented in the reference genome with the capacity of de novo assembly to reconstruct genome-specific regions. This improves the quality of the genome assembly using only short Illumina reads, in an affordable and fast manner. The pipeline was also used to assemble other trypanosomatid genomes accurately, showing that our approach is an attractive option for assembling highly repetitive genomes.