2D microbiomeprints of Disease-Set1 generated by random and manifold embedding algorithms, and the disease-prediction performance of the corresponding AggMapNet models
(A) Examples of the 2D signatures generated by the different embedding algorithms on the IBD dataset. Random uniform embedding (RUE) and five manifold embedding approaches are used to embed the microbes and generate the microbial grid maps and the 2D microbiomeprint Fmaps. The five approaches are multi-dimensional scaling (MDS), isometric mapping (ISOMAP), locally linear embedding (LLE), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Hierarchical clustering based on the metagenomic data groups the microbes into ten subgroups, and each color marks one cluster of microbial species.
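The following minimal Python sketch illustrates how a 2D grid map can be derived from pairwise microbe relationships. It rests on several assumptions. Microbes are compared by the correlation distance between their abundance profiles. Classical MDS stands in for any of the manifold methods, and average-linkage hierarchical clustering provides the color groups. Embedded points are assigned one-to-one to grid cells with SciPy's `linear_sum_assignment`. The toy data, the grid size, and the helper names (`classical_mds`, `assign_to_grid`, `make_fmap`) are hypothetical. This is not the AggMap implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist, squareform


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS from a symmetric distance matrix."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)                  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]     # indices of the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def assign_to_grid(coords, n_rows, n_cols):
    """Place each feature in exactly one grid cell, keeping nearby points nearby."""
    lo, span = coords.min(axis=0), np.ptp(coords, axis=0)
    scaled = (coords - lo) / np.where(span > 0, span, 1.0)  # rescale each axis to [0, 1]
    rr, cc = np.meshgrid(np.linspace(0, 1, n_rows), np.linspace(0, 1, n_cols), indexing="ij")
    cells = np.column_stack([rr.ravel(), cc.ravel()])      # grid cell centers
    cost = cdist(scaled, cells, metric="sqeuclidean")
    feat_idx, cell_idx = linear_sum_assignment(cost)        # requires n_features <= n_cells
    grid = np.full(n_rows * n_cols, -1)                     # -1 marks an empty cell
    grid[cell_idx] = feat_idx
    return grid.reshape(n_rows, n_cols)


def make_fmap(sample, grid):
    """Fill the grid with one sample's feature values (empty cells stay 0)."""
    fmap = np.zeros(grid.shape)
    filled = grid >= 0
    fmap[filled] = sample[grid[filled]]
    return fmap


# Toy data: 50 samples x 12 microbes (synthetic, for illustration only).
rng = np.random.default_rng(0)
abundance = rng.random((50, 12))

# Pairwise correlation distance between microbes (columns), made exactly symmetric.
dist = cdist(abundance.T, abundance.T, metric="correlation")
dist = (dist + dist.T) / 2
np.fill_diagonal(dist, 0.0)

# Hierarchical clustering into color groups (3 here; the figure uses ten).
clusters = fcluster(linkage(squareform(dist, checks=False), method="average"),
                    t=3, criterion="maxclust")

grid_mds = assign_to_grid(classical_mds(dist), 4, 4)          # manifold-style layout
grid_rue = assign_to_grid(rng.uniform(size=(12, 2)), 4, 4)    # RUE-style random baseline
fmap = make_fmap(abundance[0], grid_mds)                      # 2D Fmap of the first sample
```

Swapping `classical_mds` for another embedding changes only the coordinates passed to `assign_to_grid`. The RUE-style baseline goes through the same grid assignment, but its coordinates carry no information about microbe-microbe relationships.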
(B) Average ROC AUC of AggMapNet models trained on the four metagenomic binary classification tasks of Disease-Set1 (cirrhosis, IBD, obesity, and T2D), using the RUE, MDS, ISOMAP, LLE, t-SNE, and UMAP 2D microbiomeprint Fmaps as inputs. Performance was evaluated as the average AUC over 10 repeats of 10-fold cross-validation (100 data points in total), and the error bars show the SD across the 10 repeats. A paired t test (100 pairs, RUE versus each manifold embedding (ME) method) was used to assess whether each ME method performed significantly differently from the random embedding RUE. Significance levels: ∗∗∗∗p ≤ 0.0001; ∗∗∗0.0001 < p ≤ 0.001; ∗∗0.001 < p ≤ 0.01; ∗0.01 < p ≤ 0.05; not significant (n.s.), p > 0.05.
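The evaluation protocol can be sketched in Python as follows. The sketch assumes the 100 AUCs of every method are stored in the same repeat-by-fold order, so that element i of each list comes from the same repeat and fold, which is the pairing a paired t test requires. It also assumes the SD is taken over the 10 per-repeat mean AUCs. `ttest_rel` is SciPy's paired t test. The helper names and the synthetic AUC values are hypothetical and are not results from the figure.

```python
import random
import statistics

from scipy.stats import ttest_rel


def significance_label(p):
    """Map a p value to the star notation defined in the caption."""
    if p <= 1e-4:
        return "****"
    if p <= 1e-3:
        return "***"
    if p <= 1e-2:
        return "**"
    if p <= 0.05:
        return "*"
    return "n.s."


def repeat_summary(aucs, n_repeats=10, n_folds=10):
    """Mean AUC and SD across repeats, assuming aucs is ordered repeat by repeat."""
    per_repeat = [statistics.fmean(aucs[r * n_folds:(r + 1) * n_folds])
                  for r in range(n_repeats)]
    return statistics.fmean(per_repeat), statistics.stdev(per_repeat)


def compare_to_rue(auc_rue, auc_me):
    """Paired t test on fold-matched AUCs (pair i = same repeat and fold in both lists)."""
    if len(auc_rue) != len(auc_me):
        raise ValueError("AUC lists must be paired fold by fold")
    res = ttest_rel(auc_me, auc_rue)  # two-sided by default; statistic > 0 if ME is higher
    return res.statistic, res.pvalue, significance_label(res.pvalue)


# Synthetic AUCs for 10 x 10-fold CV (illustration only).
rng = random.Random(0)
auc_rue = [0.80 + 0.02 * rng.random() for _ in range(100)]
auc_me = [a + 0.03 + 0.01 * (rng.random() - 0.5) for a in auc_rue]

mean_me, sd_me = repeat_summary(auc_me)
t_stat, p_value, label = compare_to_rue(auc_rue, auc_me)
```

Pairing by fold removes the fold-to-fold variation shared by both methods. This is why the test uses `ttest_rel` on matched pairs rather than an unpaired test on two pools of 100 AUCs.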