Our study introduces DeepMineLys, a two-track CNN-based model designed to mine phage lysins from human microbiome datasets. The human microbiome, including the Gut Virome dataset (GutV), offers a rich source for discovering phage lysins. Two samples from the GutV dataset, collected from healthy individuals, were sequenced in-house. Viral DNA from each sample was used for library construction, with whole genome amplification (WGA) performed to obtain sufficient nucleic acids. Paired-end metagenomic sequencing (2 × 150 bp reads) was conducted on an Illumina HiSeq 2500 short-read platform with an expected sequencing depth of 6 Gb per library (MAGIGENE, Guangzhou, China). The quality of the shotgun Illumina paired-end reads from the GutV dataset was assessed using FastQC v0.11.7 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Based on the quality statistics, low-quality bases were trimmed using Trimmomatic v0.36 with the options SLIDINGWINDOW:4:20 MINLEN:50. Human contamination was removed by aligning the reads against the human genome GRCh38 with BWA MEM v0.7.17-r1188 (default options). The cleaned paired-end reads were then assembled into metagenomic contigs using MEGAHIT v1.1.3 with the option '--k-list 31,51,71,91,111'.
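The read-processing steps described above can be sketched as a sequence of command lines. The following is a minimal dry-run sketch, not the authors' actual scripts: it only builds and prints the commands rather than executing them. The function name `build_commands`, all file and directory names, and the `trimmomatic` wrapper executable are hypothetical assumptions; only the tool names, the Trimmomatic options, the BWA MEM defaults, and the MEGAHIT `--k-list` value come from the text. How unmapped (non-human) reads were extracted from the BWA alignment is not stated, so that step is left as a placeholder.

```python
import shlex
from pathlib import Path


def build_commands(r1, r2, human_ref, outdir, sample="sample"):
    """Return the pipeline as a list of argument lists (dry run, nothing is executed).

    All file names below are hypothetical placeholders chosen for illustration.
    """
    out = Path(outdir)
    # Trimmomatic paired-end mode writes paired and unpaired outputs per mate.
    p1 = str(out / f"{sample}_R1.paired.fq.gz")
    u1 = str(out / f"{sample}_R1.unpaired.fq.gz")
    p2 = str(out / f"{sample}_R2.paired.fq.gz")
    u2 = str(out / f"{sample}_R2.unpaired.fq.gz")
    # Placeholder names for reads left after human-read removal. The text does
    # not say how these were derived from the alignment, so no command is given.
    c1 = str(out / f"{sample}_R1.clean.fq.gz")
    c2 = str(out / f"{sample}_R2.clean.fq.gz")

    return [
        # 1. Quality assessment of the raw reads.
        ["fastqc", str(r1), str(r2), "-o", str(out / "fastqc")],
        # 2. Trimming with the options given in the text.
        #    Assumption: a `trimmomatic` wrapper is on PATH (otherwise use java -jar).
        ["trimmomatic", "PE", str(r1), str(r2), p1, u1, p2, u2,
         "SLIDINGWINDOW:4:20", "MINLEN:50"],
        # 3. Alignment against GRCh38 with default options (SAM goes to stdout).
        ["bwa", "mem", str(human_ref), p1, p2],
        # 4. Assembly of the cleaned reads with the k-mer list from the text.
        ["megahit", "-1", c1, "-2", c2,
         "--k-list", "31,51,71,91,111", "-o", str(out / "megahit")],
    ]


if __name__ == "__main__":
    for cmd in build_commands("GutV1_R1.fq.gz", "GutV1_R2.fq.gz", "GRCh38.fa", "out"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes the parameters easy to inspect or log before anything is run; each list could later be passed to `subprocess.run` once the intermediate files and output directories exist.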