More than 10 years ago, the T. brucei reference genome from the TREU 927/4 strain (Tb927) was published, greatly advancing our understanding of the parasite's complex biology. However, it is not Tb927 but the Lister 427 strain that is most widely used in the T. brucei field. Molecular karyotyping has revealed that the Lister 427 genome is ~16 Mb larger than that of the reference strain and that its size is more representative of other isolates. The lack of a complete 427 genome has made ‘true’ genome-wide analyses impossible, highlighting the importance of obtaining a complete genome sequence of the Lister 427 model strain. However, the generation of complete and accurate trypanosome genome assemblies has been hindered by the large number of repetitive sequence elements present in their genomes. To overcome this problem, we combined long-read sequencing technology (SMRT sequencing) with DNA-DNA contact data obtained from genome-wide chromosome conformation capture (Hi-C) experiments. We exploited strong, ubiquitous features of the 3D architecture of eukaryotic genomes to order and scaffold the assembled fragments. Using PacBio sequencing technology, we sequenced the 427 genome to 100X coverage and obtained reads exceeding 50 kb in length, which were assembled into 1232 contigs. These contigs were then clustered and positioned relative to each other based on 3D contact frequency, scaffolded and joined where possible. The resulting scaffold of the eleven-chromosome genome has a size of ~42 Mb, 30% larger than the reference genome, in agreement with karyotype predictions. Genome comparison shows that the gene-dense central regions of chromosomes are mostly conserved among strains, with specific variations in the copy number of some gene families. Interestingly, however, unlike in the Tb927 strain, the majority of Lister 427 chromosomes have extremely long hemizygous subtelomeric regions.
These regions contain gene arrays coding for variant surface proteins and can comprise most of a chromosome, indicating that T. brucei devotes a very large part of its genome to maintaining its antigenic variation repertoire. The availability of the Lister 427 genome will provide a better platform on which to study gene function and variation.