Metagenomic data acquired by deep sequencing is immensely complex, lacks apparent structure and is typically dominated by unknown species. Using an abundance co-variance strategy, we group highly co-varying genes into MetaGenomic Species, which represent a wide range of biological entities: bacterial genomes, plasmids, genomic islands, clonal variation and bacteriophages. Applying this concept to a new catalogue of 3.9 million microbial genes derived from 396 human stool samples, we identified 7,381 such MetaGenomic Species. They range in size from 3 to 6,319 genes, with 741 MetaGenomic Species resembling bacterial genomes in the number of genes they contain. The MetaGenomic Species display remarkable consistency in taxonomy and GC content, and 247 of the MetaGenomic Species assemblies even pass the HMP high-quality draft genome criteria. A large proportion (73%) of the MetaGenomic Species display no sequence similarity to any previously sequenced organism. Smaller MetaGenomic Species are enriched for genes characteristic of bacteriophages and for functions important for biotic interactions, and they show strong dependencies on gene-rich MetaGenomic Species. We present the first unsupervised structuring of a highly complex series of metagenomic samples into biological entities, including a global analysis of the genetic interdependencies between bacteria, plasmids, phages and genomic islands in the human distal gut.
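The idea of grouping genes by abundance co-variance can be illustrated with a minimal sketch. It is not the procedure used in this work: it assumes a small gene-by-sample abundance matrix, uses Pearson correlation as the co-variance measure, and links genes whose correlation reaches a hypothetical cut-off `min_corr`, taking connected components as groups. The function name, the cut-off value and the clustering rule are all assumptions chosen for illustration.

```python
import numpy as np


def group_covarying_genes(abundance, min_corr=0.9):
    """Group genes whose abundance profiles across samples co-vary.

    abundance: array-like of shape (n_genes, n_samples).
    min_corr:  illustrative Pearson correlation cut-off (an assumption).
    Returns a sorted list of groups, each a sorted list of gene indices.
    """
    abundance = np.asarray(abundance, dtype=float)
    n_genes = abundance.shape[0]

    # Pearson correlation between every pair of gene profiles (rows).
    # A gene with constant abundance yields NaN, which never passes the
    # cut-off below, so such a gene stays in a group of its own.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(abundance)

    # Union-find: link every pair of genes at or above the cut-off.
    parent = list(range(n_genes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if corr[i, j] >= min_corr:
                parent[find(i)] = find(j)

    # Collect the members of each connected component.
    groups = {}
    for i in range(n_genes):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For a toy matrix of five genes measured in four samples, `[[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2], [1, 3, 2, 5]]`, genes 0 and 1 rise together and genes 2 and 3 fall together, so the sketch places each pair in one group and leaves gene 4 on its own. The all-pairs correlation used here scales quadratically with the number of genes, so it only serves to convey the principle and would not be practical for a catalogue of millions of genes.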