We report the genome sequence of melon (Cucumis melo L.), an important horticultural crop worldwide. We assembled 375 Mb, representing 83.3 % of the estimated melon genome.
A whole-genome shotgun strategy based on 454 pyrosequencing was used, producing 14.8 million single shotgun reads and 7.7 million paired-end reads. Additionally, 53,203 BAC-end sequences obtained by Sanger sequencing were available (Gonzalez et al., 2010, BMC Genomics 11:618). In total, 22.7 million reads were produced, equivalent to 7.99 Gb, or 17.76x coverage of the estimated 450 Mb melon genome; after filtering out reads from the mitochondrial and chloroplast genomes (Rodriguez-Moreno et al., 2011, BMC Genomics 12:424), the final coverage was 13.52x. We assembled 361.4 Mb into 1,594 scaffolds, representing 80.3 % of the estimated genome size; a further 29,865 contigs spanning 13.5 Mb were also obtained, for a total of 375 Mb of assembled genome. The N50 scaffold size was 4.68 Mb, and 90 % of the assembly was contained in 78 scaffolds. Finally, the assembly was corrected in homopolymer regions using Illumina reads.
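The coverage and assembly-fraction figures above follow from simple ratios, which the minimal Python sketch below reproduces from the reported totals. The function names (`coverage`, `n50`) and the toy scaffold lengths are illustrative assumptions, not part of the original pipeline; the N50 helper uses the common definition (the length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total).

```python
def coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Values taken from the text above (bases).
GENOME_SIZE = 450e6      # estimated melon genome size
RAW_BASES = 7.99e9       # all reads before organelle filtering
SCAFFOLD_BASES = 361.4e6
CONTIG_BASES = 13.5e6

raw_cov = round(coverage(RAW_BASES, GENOME_SIZE), 2)
scaffold_pct = round(100 * SCAFFOLD_BASES / GENOME_SIZE, 1)
total_pct = round(100 * (SCAFFOLD_BASES + CONTIG_BASES) / GENOME_SIZE, 1)

print(raw_cov, scaffold_pct, total_pct)

# Toy scaffold lengths (hypothetical) to show how N50 is computed.
print(n50([2, 3, 4, 5, 6]))
```

The same `n50` function could be applied to the real scaffold lengths, which are not listed here, to check the reported 4.68 Mb value.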
Assembly v3.3 was generated with Newbler version 2.5 (Roche 454) from all available reads. Each SFF file was filtered for duplicate reads using CD-HIT-454. Raw BAC-end sequences were filtered for quality and vector contamination using SeqTrim. The melon chloroplast and mitochondrial genome sequences were supplied to Newbler as the screening database. Assembly v3.3 was then homopolymer-corrected with 2 x 54 bp Illumina reads obtained from two lanes of a GAIIx instrument. Three sequential mapping steps were carried out with GEM (parameters -d 20 --max-indel-length 12), each step mapping the reads left unmapped by the previous one. In total, 82,651,113 of 97,643,590 reads (84.6 %) were mapped. Mapping positions were converted to SAM format, and the SAMtools pileup program (6) was run to identify indels. Called indels with a quality greater than 20 that involved only homopolymers were applied to the assembly sequence and qualities; substitutions were ignored. For insertions, the pileup consensus quality was used as the assembly consensus quality. The homopolymer-corrected assembly was named v3.4. Finally, anchoring the assembly to the genetic map revealed five large scaffolds that each mapped to two genomic locations because of misassemblies; these were manually corrected, yielding five additional scaffolds. This version was named v3.5.
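The indel filter described above (quality greater than 20, homopolymers only) can be illustrated with a short Python sketch. This is not the authors' code and does not parse SAMtools pileup output: the `Indel` record, the 0-based coordinate convention, and the helper names are assumptions made for illustration. An indel is treated as a homopolymer event when it consists of a single repeated base and an adjacent reference base is that same base. Quality-string updates and overlapping indels are not modelled.

```python
from typing import NamedTuple


class Indel(NamedTuple):
    pos: int    # 0-based reference index; the event occurs before ref[pos]
    kind: str   # "ins" or "del"
    seq: str    # inserted or deleted bases
    qual: int   # call quality


def is_homopolymer_indel(ref, indel):
    """True if the indel extends or shortens a run of one repeated base."""
    seq = indel.seq.upper()
    if not seq or len(set(seq)) != 1:
        return False
    if indel.kind == "del":
        end = indel.pos + len(seq)
        if ref[indel.pos:end].upper() != seq:
            return False  # deleted bases must match the reference
    else:
        end = indel.pos
    left = ref[indel.pos - 1].upper() if indel.pos > 0 else ""
    right = ref[end].upper() if end < len(ref) else ""
    return seq[0] in (left, right)


def apply_corrections(ref, indels, min_qual=20):
    """Apply homopolymer indels with quality > min_qual to the sequence."""
    kept = [i for i in indels
            if i.qual > min_qual and is_homopolymer_indel(ref, i)]
    corrected = ref
    # Work from right to left so earlier coordinates stay valid.
    for indel in sorted(kept, key=lambda i: i.pos, reverse=True):
        if indel.kind == "ins":
            corrected = corrected[:indel.pos] + indel.seq + corrected[indel.pos:]
        else:
            corrected = corrected[:indel.pos] + corrected[indel.pos + len(indel.seq):]
    return corrected


calls = [
    Indel(pos=7, kind="ins", seq="T", qual=30),  # extends the T run: applied
    Indel(pos=1, kind="del", seq="C", qual=40),  # not a homopolymer: skipped
    Indel(pos=3, kind="ins", seq="T", qual=10),  # low quality: skipped
]
print(apply_corrections("ACGTTTTGCA", calls))
```

In this toy example only the first call passes both filters, so a single T is added to the run of four.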