A genome assembly was generated using a small pool (n=10) of male individuals from an inbred line of Callosobruchus chinensis.
PacBio RSII sequences representing 57X genomic coverage, with an average read length of 9.2 Kbp, were assembled using FALCON and error-corrected with one round of Arrow (SMRT Portal), based on realignment of the full set of PacBio reads. The resulting assembly is 701 Mbp in total size, with an N50 of 801 Kbp.
For the genome annotation, a first round was run with the MAKER3 pipeline using two sources of evidence: i) proteins from the UniProt/Swiss-Prot database; ii) existing proteomes of Acanthoscelides obtectus and Callosobruchus maculatus. This evidence-based gene build (rc1) contained 27,601 gene models. The gene models from this first round were then used to train the ab initio tools Augustus (v2.7), SNAP and GeneMark-ET (v4.3). Next, we performed a second round, the ab initio "evidence-driven" annotation, which integrates the previously trained ab initio tools (Augustus, SNAP and GeneMark-ET) and EVidenceModeler. This ab initio evidence-driven gene build (rc2) contains 34,872 gene models. Finally, all ab initio gene models from rc2 that mapped within an empty locus of the evidence-based annotation (rc1) were added to rc1 to create our final build (rc3), containing 36,086 gene models.
With the final gene build (rc3), we inferred putative functions for all genes. We first predicted functional domains using InterProScan 5.21-60 to retrieve functional information from InterPro (21 different sources). Functional annotations were thus assigned to 25,118 of the predicted coding genes and to 42,983 of the predicted mRNAs. Each predicted protein sequence was also searched with BLAST against the UniProt/Swiss-Prot reference data set in order to infer, when available, the gene and protein name. The inference used the best BLAST hit approach, i.e. the best hit with a maximum e-value cut-off of 1e-6. This made it possible to associate gene names with 15,014 protein sequences. In addition, 6,200 tRNA genes were annotated with tRNAscan 1.3.1 and added to rc3.
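The step that combines rc1 and rc2 into rc3 can be illustrated with a minimal sketch. It assumes that "empty locus" means an rc2 gene model shares no base with any rc1 gene model on the same sequence, and it ignores strand; neither detail is stated in the record. The `Gene` type, the `merge_builds` function and the toy coordinates are hypothetical and are not taken from the actual pipeline.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    seqid: str    # contig or scaffold name
    start: int    # 1-based inclusive start, as in GFF3
    end: int      # 1-based inclusive end
    gene_id: str


def overlaps(a: Gene, b: Gene) -> bool:
    """True when two genes on the same sequence share at least one base."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def merge_builds(rc1: list[Gene], rc2: list[Gene]) -> list[Gene]:
    """Keep every rc1 gene and add rc2 genes that overlap no rc1 gene.

    Assumption: strand is ignored and any shared base counts as overlap.
    """
    rc1_by_seq = defaultdict(list)
    for gene in rc1:
        rc1_by_seq[gene.seqid].append(gene)

    merged = list(rc1)
    for candidate in rc2:
        occupied = any(overlaps(candidate, g) for g in rc1_by_seq[candidate.seqid])
        if not occupied:
            merged.append(candidate)
    return sorted(merged, key=lambda g: (g.seqid, g.start, g.end))


# Toy data (hypothetical coordinates, for illustration only).
rc1 = [
    Gene("ctg1", 100, 500, "rc1_g1"),
    Gene("ctg1", 2000, 2600, "rc1_g2"),
]
rc2 = [
    Gene("ctg1", 450, 900, "rc2_g1"),    # overlaps rc1_g1, so it is dropped
    Gene("ctg1", 1000, 1500, "rc2_g2"),  # falls in an empty locus, so it is kept
    Gene("ctg2", 10, 300, "rc2_g3"),     # sequence with no rc1 genes, so it is kept
]
rc3 = merge_builds(rc1, rc2)
print([g.gene_id for g in rc3])
```

The linear scan over rc1 genes per sequence keeps the sketch short; for a build with tens of thousands of gene models, an interval tree or a sorted sweep would likely be preferable.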
Accession | PRJEB70760 |
Scope | Monoisolate |
Submission | Registration date: 30-Nov-2023 Evolutionary Biology Centre |
Locus Tag Prefix | CALCHINE |
Project Data:
No public data is linked to this project. Any recently released data that cites this project will be linked to it within a few days.