| Title: | Metric Learning for Statistical Inference in Microbiome Analysis |
|---|---|
| Description: | MeLSI (Metric Learning for Statistical Inference) is a novel machine learning method for microbiome data analysis that learns optimal distance metrics to improve statistical power in detecting group differences. Unlike traditional distance metrics (Bray-Curtis, Euclidean, Jaccard), MeLSI adapts to the specific characteristics of your dataset to maximize separation between groups. The method uses an ensemble of weak learners to identify which microbial features drive group differences, providing both improved statistical power and biological interpretability through feature importance weights. |
| Authors: | Nathan Bresette [aut, cre] (ORCID: <https://orcid.org/0009-0003-1554-6006>), Aaron C. Ericsson [aut] (ORCID: <https://orcid.org/0000-0002-3053-7269>), Carter Woods [aut] (ORCID: <https://orcid.org/0009-0007-5345-2712>), Ai-Ling Lin [aut, fnd] (ORCID: <https://orcid.org/0000-0002-5197-2219>) |
| Maintainer: | Nathan Bresette <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.1 |
| Built: | 2026-06-01 23:05:56 UTC |
| Source: | https://github.com/bioc/MeLSI |
Applies centered log-ratio (CLR) transformation to microbiome count data. This transformation is recommended for microbiome data before running MeLSI.
clr_transform(X, pseudocount = 1)clr_transform(X, pseudocount = 1)
X |
Feature matrix (samples x taxa) with raw counts or relative abundances |
pseudocount |
Small constant to add before log transformation (default: 1) |
CLR-transformed matrix with preserved column names
# Generate synthetic data test_data <- generate_test_data(n_samples = 20, n_taxa = 30, n_signal_taxa = 5) X <- test_data$counts # Transform microbiome data X_clr <- clr_transform(X) # Verify transformation stopifnot(is.matrix(X_clr)) stopifnot(nrow(X_clr) == nrow(X))# Generate synthetic data test_data <- generate_test_data(n_samples = 20, n_taxa = 30, n_signal_taxa = 5) X <- test_data$counts # Transform microbiome data X_clr <- clr_transform(X) # Verify transformation stopifnot(is.matrix(X_clr)) stopifnot(nrow(X_clr) == nrow(X))
Creates synthetic microbiome count data with specified characteristics for testing and demonstration purposes.
generate_test_data( n_samples = 60, n_taxa = 100, n_signal_taxa = 10, effect_size = 2, group_balance = 0.5 )generate_test_data( n_samples = 60, n_taxa = 100, n_signal_taxa = 10, effect_size = 2, group_balance = 0.5 )
n_samples |
Total number of samples to generate |
n_taxa |
Total number of taxa (features) |
n_signal_taxa |
Number of taxa with differential abundance between groups |
effect_size |
Magnitude of differential effect (fold-change) |
group_balance |
Proportion of samples in group A (default 0.5 for balanced) |
A list containing:
counts: Matrix of count data (samples x taxa)
metadata: Data frame with sample metadata including Group
signal_taxa: Vector of indices for taxa with true differential abundance
# Generate balanced dataset with 60 samples and 100 taxa test_data <- generate_test_data(n_samples = 60, n_taxa = 100, n_signal_taxa = 10) X <- test_data$counts y <- test_data$metadata$Group# Generate balanced dataset with 60 samples and 100 taxa test_data <- generate_test_data(n_samples = 60, n_taxa = 100, n_signal_taxa = 10) X <- test_data$counts y <- test_data$metadata$Group
Performs MeLSI (Metric Learning for Statistical Inference) analysis for microbiome data. Automatically handles both pairwise comparisons (2 groups) and multi-group analysis (3+ groups).
melsi( X, y, analysis_type = "auto", n_perms = 200, B = 30, m_frac = 0.8, show_progress = TRUE, plot_vip = TRUE, correction_method = "BH", BPPARAM = NULL )melsi( X, y, analysis_type = "auto", n_perms = 200, B = 30, m_frac = 0.8, show_progress = TRUE, plot_vip = TRUE, correction_method = "BH", BPPARAM = NULL )
X |
A matrix of feature abundances with samples as rows and features as columns |
y |
A vector of group labels for each sample |
analysis_type |
Type of analysis to perform: - "auto" (default): Automatically choose based on number of groups - "pairwise": For 2 groups or all pairwise comparisons for 3+ groups - "omnibus": Global analysis for 3+ groups (requires at least 3 groups) - "both": Both omnibus and pairwise for 3+ groups |
n_perms |
Number of permutations for p-value calculation (default: 200) |
B |
Number of weak learners in the ensemble (default: 30) |
m_frac |
Fraction of features to use in each weak learner (default: 0.8) |
show_progress |
Whether to display progress information (default: TRUE) |
plot_vip |
Whether to display Variable Importance Plot (default: TRUE) |
correction_method |
Multiple testing correction method for pairwise comparisons (default: "BH") |
BPPARAM |
A |
For 2 groups or pairwise analysis: List with F-statistic, p-value, feature weights, etc. For 3+ groups: List containing omnibus results, pairwise results, or both.
# Generate test data test_data <- generate_test_data(n_samples = 40, n_taxa = 50, n_signal_taxa = 5) X <- test_data$counts y <- test_data$metadata$Group # CLR transformation X_clr <- clr_transform(X) # Run MeLSI analysis results <- melsi(X_clr, y, n_perms = 19, B = 10, show_progress = FALSE) # Check results stopifnot(is.list(results)) stopifnot("F_observed" %in% names(results)) stopifnot("p_value" %in% names(results))# Generate test data test_data <- generate_test_data(n_samples = 40, n_taxa = 50, n_signal_taxa = 5) X <- test_data$counts y <- test_data$metadata$Group # CLR transformation X_clr <- clr_transform(X) # Run MeLSI analysis results <- melsi(X_clr, y, n_perms = 19, B = 10, show_progress = FALSE) # Check results stopifnot(is.list(results)) stopifnot("F_observed" %in% names(results)) stopifnot("p_value" %in% names(results))
Creates a barplot showing the top features ranked by their learned weights
plot_feature_importance( feature_weights, top_n = 8, main_title = NULL, directionality = NULL )plot_feature_importance( feature_weights, top_n = 8, main_title = NULL, directionality = NULL )
feature_weights |
Named vector of feature weights |
top_n |
Number of top features to display (default: 8) |
main_title |
Optional title for the plot |
directionality |
Optional named vector indicating which group has higher abundance for each feature |
A ggplot2 object (invisibly)
# Generate test data and run MeLSI test_data <- generate_test_data(n_samples = 30, n_taxa = 20, n_signal_taxa = 5) X <- test_data$counts y <- test_data$metadata$Group X_clr <- clr_transform(X) results <- melsi(X_clr, y, n_perms = 19, B = 10, show_progress = FALSE) # Plot feature importance plot_feature_importance(results$feature_weights, top_n = 10)# Generate test data and run MeLSI test_data <- generate_test_data(n_samples = 30, n_taxa = 20, n_signal_taxa = 5) X <- test_data$counts y <- test_data$metadata$Group X_clr <- clr_transform(X) results <- melsi(X_clr, y, n_perms = 19, B = 10, show_progress = FALSE) # Plot feature importance plot_feature_importance(results$feature_weights, top_n = 10)
Creates a Principal Coordinates Analysis (PCoA) plot using the learned MeLSI distance matrix.
plot_pcoa(melsi_results, X, y, title = "PCoA using MeLSI Distance")plot_pcoa(melsi_results, X, y, title = "PCoA using MeLSI Distance")
melsi_results |
Results object from melsi() function |
X |
Original feature matrix (samples x taxa) |
y |
Group labels vector |
title |
Optional custom title for the plot (default: "PCoA using MeLSI Distance") |
A ggplot2 object (invisibly)
# Generate test data and run MeLSI test_data <- generate_test_data(n_samples = 30, n_taxa = 20, n_signal_taxa = 5) X <- test_data$counts y <- test_data$metadata$Group X_clr <- clr_transform(X) results <- melsi(X_clr, y, n_perms = 19, B = 10, show_progress = FALSE) # Plot PCoA plot_pcoa(results, X_clr, y)# Generate test data and run MeLSI test_data <- generate_test_data(n_samples = 30, n_taxa = 20, n_signal_taxa = 5) X <- test_data$counts y <- test_data$metadata$Group X_clr <- clr_transform(X) results <- melsi(X_clr, y, n_perms = 19, B = 10, show_progress = FALSE) # Plot PCoA plot_pcoa(results, X_clr, y)
Simplified function to plot Variable Importance (VIP) directly from MeLSI results. Automatically extracts feature weights and optionally includes directionality information.
plot_vip(melsi_results, top_n = 15, title = NULL, directionality = TRUE)plot_vip(melsi_results, top_n = 15, title = NULL, directionality = TRUE)
melsi_results |
Results object from melsi() function |
top_n |
Number of top features to display (default: 15) |
title |
Optional custom title for the plot |
directionality |
Whether to include directionality coloring (default: TRUE) |
A ggplot2 object (invisibly)
# Generate test data and run MeLSI test_data <- generate_test_data(n_samples = 30, n_taxa = 20, n_signal_taxa = 5) X <- test_data$counts y <- test_data$metadata$Group X_clr <- clr_transform(X) results <- melsi(X_clr, y, n_perms = 19, B = 10, show_progress = FALSE) # Plot VIP with directionality (default) plot_vip(results, top_n = 10)# Generate test data and run MeLSI test_data <- generate_test_data(n_samples = 30, n_taxa = 20, n_signal_taxa = 5) X <- test_data$counts y <- test_data$metadata$Group X_clr <- clr_transform(X) results <- melsi(X_clr, y, n_perms = 19, B = 10, show_progress = FALSE) # Plot VIP with directionality (default) plot_vip(results, top_n = 10)