Title: | "Refining and extending generalized multivariate linear models for meta-omic association discovery" |
---|---|
Description: | MaAsLin 3 refines and extends generalized multivariate linear models for meta-omicron association discovery. It finds abundance and prevalence associations between microbiome meta-omics features and complex metadata in population-scale epidemiological studies. The software includes multiple analysis methods (including support for multiple covariates, repeated measures, and ordered predictors), filtering, normalization, and transform options to customize analysis for your specific study. |
Authors: | William Nickols [aut, cre] , Jacob Nearing [aut] |
Maintainer: | William Nickols <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.99.3 |
Built: | 2025-01-20 03:32:02 UTC |
Source: | https://github.com/bioc/maaslin3 |
Check the arguments provided are valid for further MaAsLin 3 use.
maaslin_check_arguments(feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, zero_threshold = 0, normalization = 'TSS', transform = 'LOG', correction = 'BH', warn_prevalence = TRUE, evaluate_only = NULL, unscaled_abundance = NULL, median_comparison_abundance = TRUE)
maaslin_check_arguments(feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, zero_threshold = 0, normalization = 'TSS', transform = 'LOG', correction = 'BH', warn_prevalence = TRUE, evaluate_only = NULL, unscaled_abundance = NULL, median_comparison_abundance = TRUE)
feature_specific_covariate |
A table of feature-specific covariates
or a filepath to a tab-delimited file with feature-specific covariates.
It should be formatted with features as columns and samples as rows (or
the transpose). The row names and column names should be the same as
those of the |
feature_specific_covariate_name |
The name for the feature-specific covariates when fitting the models. |
feature_specific_covariate_record |
Whether to keep the feature-specific covariates in the outputs when calculating p-values, writing results, and displaying plots. |
zero_threshold |
Abundances less than or equal to
|
normalization |
The normalization to apply to the features before
transformation and analysis. The option |
transform |
The transformation to apply to the features after
normalization and before analysis. The option |
correction |
The correction to obtain FDR-corrected q-values from
raw p-values. Any valid options for |
warn_prevalence |
Warn when prevalence associations are likely induced by abundance associations. This requires re-fitting the linear models on the TSS log-transformed data. |
evaluate_only |
Whether to evaluate just the abundnace ("abundance") or prevalence ("prevalence") models |
unscaled_abundance |
A data frame with a single column of absolute
abundances or a filepath to such a tab-delimited file. The row names
should match the names of the samples in |
median_comparison_abundance |
Test abundance coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is recommended for relative abundance data but should not be used for absolute abundance data. |
No value is returned, but incompatibile arguments will produce an error.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) # Prepare parameter lists maaslin3::maaslin_check_arguments(zero_threshold = 0, normalization = 'TSS', transform = 'LOG', correction = 'BH', median_comparison_abundance = TRUE) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) # Prepare parameter lists maaslin3::maaslin_check_arguments(zero_threshold = 0, normalization = 'TSS', transform = 'LOG', correction = 'BH', median_comparison_abundance = TRUE) unlink('output', recursive=TRUE) logging::logReset()
Ensure that the formula provided is valid. Only one of
maaslin_compute_formula
or maaslin_check_formula
should be
used.
maaslin_check_formula(data, metadata, input_formula = NULL, feature_specific_covariate_name = NULL)
maaslin_check_formula(data, metadata, input_formula = NULL, feature_specific_covariate_name = NULL)
data |
A data frame of feature abundances. It should be formatted with features as columns and samples as rows. The column and row names should be the feature names and sample names respectively. |
metadata |
A data frame of per-sample metadata. It should be formatted with variables as columns and samples as rows. The column and row names should be the variable names and sample names respectively. |
input_formula |
A formula in |
feature_specific_covariate_name |
The name for the feature-specific covariates when fitting the models. |
A list containing the following named items:
formula
: The constructed formula.
random_effects_formula
: A formula for the random
effects.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') unlink('output', recursive=TRUE) logging::logReset()
Compute a formula using variables provided through fixed_effects
,
random_effects
, group_effects
, ordered_effects
, and
strata_effects
. Only one of maaslin_compute_formula
or
maaslin_check_formula
should be used.
maaslin_compute_formula(data, metadata, fixed_effects = NULL, random_effects = NULL, group_effects = NULL, ordered_effects = NULL, strata_effects = NULL, feature_specific_covariate_name = NULL)
maaslin_compute_formula(data, metadata, fixed_effects = NULL, random_effects = NULL, group_effects = NULL, ordered_effects = NULL, strata_effects = NULL, feature_specific_covariate_name = NULL)
data |
A data frame of feature abundances. It should be formatted with features as columns and samples as rows. The column and row names should be the feature names and sample names respectively. |
metadata |
A data frame of per-sample metadata. It should be formatted with variables as columns and samples as rows. The column and row names should be the variable names and sample names respectively. |
fixed_effects |
A vector of variable names to be included as fixed effects. |
random_effects |
A vector of variable names to be included as random intercepts. |
group_effects |
A factored categorical variable to be included for group testing. An ANOVA-style test will be performed to assess whether any of the variable's levels are significant, and no coefficients or individual p-values will be returned. |
ordered_effects |
A factored categorical variable to be included. Consecutive levels will be tested for significance against each other, and the resulting associations will correspond to effect sizes, standard errors, and significances of each level versus the previous. |
strata_effects |
A vector with one variable name to be included as the strata variable in case-control studies. Strata cannot be combined with random effects. |
feature_specific_covariate_name |
The name for the feature-specific covariates when fitting the models. |
A list containing the following named items:
formula
: The constructed formula.
random_effects_formula
: A formula for the random
effects.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', fixed_effects = c('diagnosis', 'dysbiosis_state', 'antibiotics', 'age', 'reads'), random_effects = c('participant_id'), plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_compute_formula( data, metadata, fixed_effects = c('diagnosis', 'dysbiosis_state', 'antibiotics', 'age', 'reads'), random_effects = c('participant_id')) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', fixed_effects = c('diagnosis', 'dysbiosis_state', 'antibiotics', 'age', 'reads'), random_effects = c('participant_id'), plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_compute_formula( data, metadata, fixed_effects = c('diagnosis', 'dysbiosis_state', 'antibiotics', 'age', 'reads'), random_effects = c('participant_id')) unlink('output', recursive=TRUE) logging::logReset()
Perform a contrast test (lmerTest::contest
for mixed effects linear;
multcomp::glht
for all others) using a named contrast matrix and
right hand side. One contrast test is applied per row of the matrix.
maaslin_contrast_test(fits, contrast_mat, rhs = NULL, median_comparison = NULL)
maaslin_contrast_test(fits, contrast_mat, rhs = NULL, median_comparison = NULL)
fits |
The list of fits from |
contrast_mat |
A matrix with one row per contrast test to run. The columns will be matched to the coefficients of the model by name. Contrast vector coefficients need not be specified if they would be zero. If row names are provided, they will be used to label the test in the results. |
rhs |
The right hand size of the contrast test. The length should
be the same as the number of rows in the |
median_comparison |
Run a median comparison as though the contrast
had been fit directly. The parameter |
A dataframe with the following columns:
feature
: The feature involved in the test.
test
: The name of the contrast_mat
row for the
test or the row number of the contrast_mat
if no row names are
specified.
coef
: The coefficient used in the hypothesis test: this
is the dot product of the contrast vector and the model coefficients.
rhs
: The right hand side against which the coefficient
is test.
stderr
: The standard error of the coefficient.
pval_individual
: The (uncorrected) p-value of the
test.
error
: Any error produced during the testing.
NA otherwise.
model
: abundance
for the abundance models and
prevalence
for the prevalence models.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 fit_out <- maaslin3::maaslin3(input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) contrast_mat <- matrix(c(1, 1, 0, 0, 0, 0, 1, 1), ncol = 4, nrow = 2, byrow = TRUE) colnames(contrast_mat) <- c("diagnosisUC", "diagnosisCD", "dysbiosis_statedysbiosis_UC", "dysbiosis_statedysbiosis_CD") rownames(contrast_mat) <- c("diagnosis_test", "dysbiosis_test") maaslin_contrast_test(fits = fit_out$fit_data_abundance$fits, contrast_mat = contrast_mat) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 fit_out <- maaslin3::maaslin3(input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) contrast_mat <- matrix(c(1, 1, 0, 0, 0, 0, 1, 1), ncol = 4, nrow = 2, byrow = TRUE) colnames(contrast_mat) <- c("diagnosisUC", "diagnosisCD", "dysbiosis_statedysbiosis_UC", "dysbiosis_statedysbiosis_CD") rownames(contrast_mat) <- c("diagnosis_test", "dysbiosis_test") maaslin_contrast_test(fits = fit_out$fit_data_abundance$fits, contrast_mat = contrast_mat) unlink('output', recursive=TRUE) logging::logReset()
Set abundances below zero_threshold
to zero, remove features
without abundances more than min_abundance
in
min_prevalence
of the samples, and remove features with variances
less than or equal to min_variance
.
maaslin_filter(normalized_data, output, min_abundance = 0, min_prevalence = 0, zero_threshold = 0, min_variance = 0)
maaslin_filter(normalized_data, output, min_abundance = 0, min_prevalence = 0, zero_threshold = 0, min_variance = 0)
normalized_data |
A data frame of normalized feature abundances. It should be formatted with features as columns and samples as rows. The column and row names should be the feature names and sample names respectively. |
output |
The output folder to write results. |
min_abundance |
Features with abundances more than
|
min_prevalence |
See |
zero_threshold |
Abundances less than or equal to
|
min_variance |
Features with abundance variances less than or equal
to |
A dataframe of filtered features (features are columns; samples are rows).
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') unlink('output', recursive=TRUE) logging::logReset()
Fit the abundance data with abundance and prevalence models to discover feature-metadata associations.
maaslin_fit(filtered_data, transformed_data, metadata, formula, random_effects_formula, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, zero_threshold = 0, max_significance = 0.1, correction = 'BH', median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, median_comparison_abundance_threshold = 0, median_comparison_prevalence_threshold = 0, subtract_median = FALSE, warn_prevalence = TRUE, augment = TRUE, evaluate_only = NULL, cores = 1, save_models = FALSE, data = NULL, min_abundance = 0, min_prevalence = 0, min_variance = 0)
maaslin_fit(filtered_data, transformed_data, metadata, formula, random_effects_formula, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, zero_threshold = 0, max_significance = 0.1, correction = 'BH', median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, median_comparison_abundance_threshold = 0, median_comparison_prevalence_threshold = 0, subtract_median = FALSE, warn_prevalence = TRUE, augment = TRUE, evaluate_only = NULL, cores = 1, save_models = FALSE, data = NULL, min_abundance = 0, min_prevalence = 0, min_variance = 0)
filtered_data |
A data frame of filtered feature abundances. It should be formatted with features as columns and samples as rows. The column and row names should be the feature names and sample names respectively. |
transformed_data |
A data frame of transformed feature abundances. It should be formatted with features as columns and samples as rows. The column and row names should be the feature names and sample names respectively. |
metadata |
A data frame of per-sample metadata. It should be formatted with variables as columns and samples as rows. The column and row names should be the variable names and sample names respectively. |
formula |
A formula in |
random_effects_formula |
A formula in |
feature_specific_covariate |
A table of feature-specific covariates
or a filepath to a tab-delimited file with feature-specific covariates.
It should be formatted with features as columns and samples as rows (or
the transpose). The row names and column names should be the same as
those of the |
feature_specific_covariate_name |
The name for the feature-specific covariates when fitting the models. |
feature_specific_covariate_record |
Whether to keep the feature-specific covariates in the outputs when calculating p-values, writing results, and displaying plots. |
zero_threshold |
Abundances less than or equal to
|
max_significance |
The FDR corrected q-value threshold for significance used in selecting which associations to write as significant and to plot. |
correction |
The correction to obtain FDR-corrected q-values from
raw p-values. Any valid options for |
median_comparison_abundance |
Test abundance coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is recommended for relative abundance data but should not be used for absolute abundance data. |
median_comparison_prevalence |
Test prevalence coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is only recommended if the analyst is interested in how feature prevalence associations compare to each other or if there is likely strong compositionality-induced sparsity. |
median_comparison_abundance_threshold |
Coefficients within
|
median_comparison_prevalence_threshold |
Same as
|
subtract_median |
Subtract the median from the coefficients. |
warn_prevalence |
Warn when prevalence associations are likely induced by abundance associations. This requires re-fitting the linear models on the TSS log-transformed data. |
augment |
Add extra lowly-weighted 0s and 1s to avoid linear separability. |
evaluate_only |
Whether to evaluate just the abundnace ("abundance") or prevalence ("prevalence") models |
cores |
How many cores to use when fitting models. (Using multiple cores will likely be faster only for large datasets or complex models. |
save_models |
Whether to return the fit models and save them to an RData file. |
data |
The original data (only necessary if warn_prevalence is TRUE). |
min_abundance |
The original min_abundance parameter (only necessary if warn_prevalence is TRUE). |
min_prevalence |
The original min_prevalence parameter (only necessary if warn_prevalence is TRUE). |
min_variance |
The original min_variance parameter (only necessary if min_variance is TRUE). |
A list containing the following named items:
fit_data_abundance
: The results from the fit abundance
models.
fit_data_prevalence
: The results from the fit
prevalence models.
The fit_data_abundance
and fit_data_prevalence
items have
the same structure. They are both lists with the following named items:
results
: A results table with the modeled associations
(see below).
residuals
: A features (rows) by samples (columns)
dataframe of residuals from the models.
fitted
: A features (rows) by samples (columns)
dataframe of fitted values from the models.
ranef
: A features (rows) by random effect (columns)
dataframe of random effects from the models. If multiple random effects
are specified, this is a dataframe of dataframes.
fits
: If save_models=TRUE
, this is a list of
the fit models.
The results
tables contain the following columns for each
association (row):
feature
: The feature involved in the association.
metadata
: The metadata variable involved in the
association.
value
: The value of the metadata variable: the
metadata variable itself if continuous or the level if categorical.
name
: The name of the model component involved in the
association: the metadata variable itself if continuous or a
concatenated version of the metadata variable and level if categorical.
coef
: The coefficient of the association: the slope
coefficient in the abundance model and the change in log odds in the
prevalence model.
stderr
: The standard error of the coefficient.
pval_individual
: The (uncorrected) p-value of the
association.
qval_individual
: The FDR corrected q-value of the
association. FDR correction is performed over all associations in the
abundance and prevalence modeling without errors together.
pval_joint
: The p-value of the overall association
(combining abundance and prevalence) by taking the minimum of the
abundance and logistic p-values and applying the Beta(1,2) CDF. These
will be the same in the abundance and prevalence results for an
association.
qval_joint
: The FDR corrected q-value of the
association. FDR correction is performed over all joint p-values
without errors.
error
: Any error produced by the model during fitting.
NA otherwise.
model
: linear
for the abundance models and
logistic
for the prevalence models.
N
: The number of data points for the association's
feature.
N_not_zero
: The number of non-zero data points for
the association's feature.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) unlink('output', recursive=TRUE) logging::logReset()
Check that the parameters provided are valid for further MaAsLin 3 use and open a logger to log the parameters.
maaslin_log_arguments(input_data, input_metadata, output, formula = NULL, fixed_effects = NULL, reference = NULL, random_effects = NULL, group_effects = NULL, ordered_effects = NULL, strata_effects = NULL, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, min_abundance = 0, min_prevalence = 0, zero_threshold = 0, min_variance = 0, max_significance = 0.1, normalization = 'TSS', transform = 'LOG', correction = 'BH', standardize = TRUE, unscaled_abundance = NULL, median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, median_comparison_abundance_threshold = 0, median_comparison_prevalence_threshold = 0, subtract_median = FALSE, warn_prevalence = TRUE, augment = TRUE, evaluate_only = NULL, plot_summary_plot = TRUE, summary_plot_first_n = 25, coef_plot_vars = NULL, heatmap_vars = NULL, plot_associations = TRUE, max_pngs = 30, cores = 1, save_models = FALSE, save_plots_rds = FALSE, verbosity = 'FINEST', summary_plot_balanced = FALSE)
maaslin_log_arguments(input_data, input_metadata, output, formula = NULL, fixed_effects = NULL, reference = NULL, random_effects = NULL, group_effects = NULL, ordered_effects = NULL, strata_effects = NULL, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, min_abundance = 0, min_prevalence = 0, zero_threshold = 0, min_variance = 0, max_significance = 0.1, normalization = 'TSS', transform = 'LOG', correction = 'BH', standardize = TRUE, unscaled_abundance = NULL, median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, median_comparison_abundance_threshold = 0, median_comparison_prevalence_threshold = 0, subtract_median = FALSE, warn_prevalence = TRUE, augment = TRUE, evaluate_only = NULL, plot_summary_plot = TRUE, summary_plot_first_n = 25, coef_plot_vars = NULL, heatmap_vars = NULL, plot_associations = TRUE, max_pngs = 30, cores = 1, save_models = FALSE, save_plots_rds = FALSE, verbosity = 'FINEST', summary_plot_balanced = FALSE)
input_data |
A data frame of feature abundances or read counts or a filepath to a tab-delimited file with abundances. It should be formatted with features as columns and samples as rows (or the transpose). The column and row names should be the feature names and sample names respectively. |
input_metadata |
A data frame of per-sample metadata or a filepath to a tab-delimited file with metadata. It should be formatted with variables as columns and samples as rows (or the transpose). The column and row names should be the variable names and sample names respectively. |
output |
The output folder to write results. |
formula |
A formula in |
fixed_effects |
A vector of variable names to be included as fixed effects. |
reference |
For a variable with more than two levels supplied with
|
random_effects |
A vector of variable names to be included as random intercepts. |
group_effects |
A factored categorical variable to be included for group testing. An ANOVA-style test will be performed to assess whether any of the variable's levels are significant, and no coefficients or individual p-values will be returned. |
ordered_effects |
A factored categorical variable to be included. Consecutive levels will be tested for significance against each other, and the resulting associations will correspond to effect sizes, standard errors, and significances of each level versus the previous. |
strata_effects |
A vector with one variable name to be included as the strata variable in case-control studies. Strata cannot be combined with random effects. |
feature_specific_covariate |
A table of feature-specific covariates
or a filepath to a tab-delimited file with feature-specific covariates.
It should be formatted with features as columns and samples as rows (or
the transpose). The row names and column names should be the same as
those of the |
feature_specific_covariate_name |
The name for the feature-specific covariates when fitting the models. |
feature_specific_covariate_record |
Whether to keep the feature-specific covariates in the outputs when calculating p-values, writing results, and displaying plots. |
min_abundance |
Features with abundances more than
|
min_prevalence |
See |
zero_threshold |
Abundances less than or equal to
|
min_variance |
Features with abundance variances less than or equal
to |
max_significance |
The FDR corrected q-value threshold for significance used in selecting which associations to write as significant and to plot. |
normalization |
The normalization to apply to the features before
transformation and analysis. The option |
transform |
The transformation to apply to the features after
normalization and before analysis. The option |
correction |
The correction to obtain FDR-corrected q-values from
raw p-values. Any valid options for |
standardize |
Whether to apply z-scores to continuous metadata
variables so they are on the same scale. This is recommended in order to
compare coefficients across metadata variables, but note that functions
of the metadata specified in the |
unscaled_abundance |
A data frame with a single column of absolute
abundances or a filepath to such a tab-delimited file. The row names
should match the names of the samples in |
median_comparison_abundance |
Test abundance coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is recommended for relative abundance data but should not be used for absolute abundance data. |
median_comparison_prevalence |
Test prevalence coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is only recommended if the analyst is interested in how feature prevalence associations compare to each other or if there is likely strong compositionality-induced sparsity. |
median_comparison_abundance_threshold |
Coefficients within
|
median_comparison_prevalence_threshold |
Same as
|
subtract_median |
Subtract the median from the coefficients. |
warn_prevalence |
Warn when prevalence associations are likely induced by abundance associations. This requires re-fitting the linear models on the TSS log-transformed data. |
augment |
Add extra lowly-weighted 0s and 1s to avoid linear separability. |
evaluate_only |
Whether to evaluate just the abundnace ("abundance") or prevalence ("prevalence") models |
plot_summary_plot |
Generate a summary plot of significant associations. |
summary_plot_first_n |
Include the top |
coef_plot_vars |
Vector of variable names to be used in the
coefficient plot section of the summary plot. Continuous variables
should match the metadata column name, and categorical variables should
be of the form |
heatmap_vars |
Vector of variable names to be used in the heatmap
section of the summary plot. Continuous variables should match the
metadata column name, and categorical variables should be of the form
|
plot_associations |
Whether to generate plots for significant associations. |
max_pngs |
The top |
cores |
How many cores to use when fitting models. (Using multiple cores will likely be faster only for large datasets or complex models. |
save_models |
Whether to return the fit models and save them to an RData file. |
save_plots_rds |
Whether to return the plots to an RDS file. |
verbosity |
The level of verbosity for the |
summary_plot_balanced |
If set to TRUE the summary plot will
show the top N features of each variable included in
|
No value is returned, but a logger is opened with the parameters logged.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) # Prepare parameter lists maaslin3::maaslin_log_arguments(input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) # Prepare parameter lists maaslin3::maaslin_log_arguments(input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) unlink('output', recursive=TRUE) logging::logReset()
Normalize the abundance data according to the normalization
parameter. If unscaled_abundance
is specified, compute the
absolute abundances.
maaslin_normalize(data, output, zero_threshold = 0, normalization = 'TSS', unscaled_abundance = NULL)
maaslin_normalize(data, output, zero_threshold = 0, normalization = 'TSS', unscaled_abundance = NULL)
data |
A data frame of feature abundances. It should be formatted with features as columns and samples as rows. The column and row names should be the feature names and sample names respectively. |
output |
The output folder to write results. |
zero_threshold |
Abundances less than or equal to
|
normalization |
The normalization to apply to the features before
transformation and analysis. The option |
unscaled_abundance |
A data frame with a single column of absolute
abundances. The row names should match the names of the samples in
|
A dataframe of normalized features (features are columns; samples are rows).
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') normalized_data = maaslin3::maaslin_normalize(data, output = 'output') unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') normalized_data = maaslin3::maaslin_normalize(data, output = 'output') unlink('output', recursive=TRUE) logging::logReset()
Two types of plots are generated. First, the summary plot contains sorted per-feature coefficients plotted with their standard errors for key variables and a heatmap summarizing the remaining variables. Second, for significant features, association plots (scatterplots, boxplots, or tables depending on the association) are generated to visualize and verify the model fits. The data are shown with their transformed values in the association plots since this is the scale on which the models are fit.
maaslin_plot_results(output, transformed_data, unstandardized_metadata, fit_data_abundance, fit_data_prevalence, normalization, transform, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, max_significance = 0.1, plot_summary_plot = TRUE, summary_plot_first_n = 25, coef_plot_vars = NULL, heatmap_vars = NULL, plot_associations = TRUE, max_pngs = 30, balanced = FALSE, save_plots_rds = FALSE)
maaslin_plot_results(output, transformed_data, unstandardized_metadata, fit_data_abundance, fit_data_prevalence, normalization, transform, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, max_significance = 0.1, plot_summary_plot = TRUE, summary_plot_first_n = 25, coef_plot_vars = NULL, heatmap_vars = NULL, plot_associations = TRUE, max_pngs = 30, balanced = FALSE, save_plots_rds = FALSE)
output |
The output folder to write results. |
transformed_data |
A data frame of transformed feature abundances. It should be formatted with features as columns and samples as rows. The column and row names should be the feature names and sample names respectively. |
unstandardized_metadata |
A data frame of per-sample metadata. It should be formatted with variables as columns and samples as rows. The column and row names should be the variable names and sample names respectively. |
fit_data_abundance |
The abundance outputs of
|
fit_data_prevalence |
The prevalence outputs of
|
normalization |
The normalization to apply to the features before
transformation and analysis. The option |
transform |
The transformation to apply to the features after
normalization and before analysis. The option |
feature_specific_covariate |
A table of feature-specific covariates
or a filepath to a tab-delimited file with feature-specific covariates.
It should be formatted with features as columns and samples as rows (or
the transpose). The row names and column names should be the same as
those of the |
feature_specific_covariate_name |
The name for the feature-specific covariates when fitting the models. |
feature_specific_covariate_record |
Whether to keep the feature-specific covariates in the outputs when calculating p-values, writing results, and displaying plots. |
median_comparison_abundance |
Test abundance coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is recommended for relative abundance data but should not be used for absolute abundance data. |
median_comparison_prevalence |
Test prevalence coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is only recommended if the analyst is interested in how feature prevalence associations compare to each other or if there is likely strong compositionality-induced sparsity. |
max_significance |
The FDR corrected q-value threshold for significance used in selecting which associations to write as significant and to plot. |
plot_summary_plot |
Generate a summary plot of significant associations. |
summary_plot_first_n |
Include the top |
coef_plot_vars |
Vector of variable names to be used in the
coefficient plot section of the summary plot. Continuous variables
should match the metadata column name, and categorical variables should
be of the form |
heatmap_vars |
Vector of variable names to be used in the heatmap
section of the summary plot. Continuous variables should match the
metadata column name, and categorical variables should be of the form
|
plot_associations |
Whether to generate plots for significant associations. |
max_pngs |
The top |
balanced |
If set to TRUE the summary plot will
show the top N features of each variable included in
|
save_plots_rds |
Whether to return the plots to an RDS file. |
Results will be written to the figures
folder within the folder
output
. The list of individual association plots is returned if
plot_associations=TRUE
. In the heatmap of the summary plot, one
star corresponds to the user-set max_significance
and two stars
corresponds to the user-set max_signifiance
divided by 10.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) maaslin3::maaslin_write_results( output = 'output', maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence, random_effects_formula) maaslin3::maaslin_plot_results( output = 'output', transformed_data, metadata, maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence, normalization = "TSS", transform = "LOG") unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) maaslin3::maaslin_write_results( output = 'output', maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence, random_effects_formula) maaslin3::maaslin_plot_results( output = 'output', transformed_data, metadata, maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence, normalization = "TSS", transform = "LOG") unlink('output', recursive=TRUE) logging::logReset()
Two types of plots are generated. First, the summary plot contains
sorted per-feature coefficients plotted with their standard errors for
key variables and a heatmap summarizing the remaining variables. Second,
for significant features, association plots (scatterplots, boxplots, or
tables depending on the association) are generated to visualize and
verify the model fits. The data are shown with their transformed values
in the association plots since this is the scale on which the models are
fit. In comparison to maaslin_plot_results
that needs the entire
maaslin_fit
list, only the parameter list and an outputs
directory containing a completed run are needed for
maaslin_plot_results_from_output
.
maaslin_plot_results_from_output(output, metadata, normalization, transform, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, max_significance = 0.1, plot_summary_plot = TRUE, summary_plot_first_n = 25, coef_plot_vars = NULL, heatmap_vars = NULL, plot_associations = TRUE, max_pngs = 30, balanced = FALSE, save_plots_rds = FALSE)
maaslin_plot_results_from_output(output, metadata, normalization, transform, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, max_significance = 0.1, plot_summary_plot = TRUE, summary_plot_first_n = 25, coef_plot_vars = NULL, heatmap_vars = NULL, plot_associations = TRUE, max_pngs = 30, balanced = FALSE, save_plots_rds = FALSE)
output |
The output folder to write results. |
metadata |
A data frame of per-sample metadata. It should be formatted with variables as columns and samples as rows. The column and row names should be the variable names and sample names respectively. |
normalization |
The normalization to apply to the features before
transformation and analysis. The option |
transform |
The transformation to apply to the features after
normalization and before analysis. The option |
feature_specific_covariate |
A table of feature-specific covariates
or a filepath to a tab-delimited file with feature-specific covariates.
It should be formatted with features as columns and samples as rows (or
the transpose). The row names and column names should be the same as
those of the |
feature_specific_covariate_name |
The name for the feature-specific covariates when fitting the models. |
feature_specific_covariate_record |
Whether to keep the feature-specific covariates in the outputs when calculating p-values, writing results, and displaying plots. |
median_comparison_abundance |
Test abundance coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is recommended for relative abundance data but should not be used for absolute abundance data. |
median_comparison_prevalence |
Test prevalence coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is only recommended if the analyst is interested in how feature prevalence associations compare to each other or if there is likely strong compositionality-induced sparsity. |
max_significance |
The FDR corrected q-value threshold for significance used in selecting which associations to write as significant and to plot. |
plot_summary_plot |
Generate a summary plot of significant associations. |
summary_plot_first_n |
Include the top |
coef_plot_vars |
Vector of variable names to be used in the
coefficient plot section of the summary plot. Continuous variables
should match the metadata column name, and categorical variables should
be of the form |
heatmap_vars |
Vector of variable names to be used in the heatmap
section of the summary plot. Continuous variables should match the
metadata column name, and categorical variables should be of the form
|
plot_associations |
Whether to generate plots for significant associations. |
max_pngs |
The top |
balanced |
If set to TRUE the summary plot will
show the top N features of each variable included in
|
save_plots_rds |
Whether to return the plots to an RDS file. |
Results will be written to the figures
folder within the folder
output
. The list of individual association plots is returned if
plot_associations=TRUE
. In the heatmap of the summary plot, one
star corresponds to the user-set max_significance
and two stars
corresponds to the user-set max_signifiance
divided by 10.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) maaslin3::maaslin_write_results( output = 'output', maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence, random_effects_formula) maaslin3::maaslin_plot_results_from_output( output = 'output', metadata, normalization = "TSS", transform = "LOG") unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) maaslin3::maaslin_write_results( output = 'output', maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence, random_effects_formula) maaslin3::maaslin_plot_results_from_output( output = 'output', metadata, normalization = "TSS", transform = "LOG") unlink('output', recursive=TRUE) logging::logReset()
Check that references are set properly if the metadata variables are
categorical and provided through fixed_effects
. Standardize the
continuous metadata variables as a z-score (subtract the mean, divide by
the standard deviation) if standardize
is set.
maaslin_process_metadata(metadata, formula = NULL, fixed_effects = NULL, reference = NULL, feature_specific_covariate_name = NULL, standardize = TRUE)
maaslin_process_metadata(metadata, formula = NULL, fixed_effects = NULL, reference = NULL, feature_specific_covariate_name = NULL, standardize = TRUE)
metadata |
A data frame of per-sample metadata. It should be formatted with variables as columns and samples as rows. The column and row names should be the variable names and sample names respectively. |
formula |
A formula in |
fixed_effects |
A vector of variable names to be included as fixed effects. |
reference |
For a variable with more than two levels supplied with
|
feature_specific_covariate_name |
The name for the feature-specific covariates when fitting the models. |
standardize |
Whether to apply z-scores to continuous metadata
variables so they are on the same scale. This is recommended in order to
compare coefficients across metadata variables, but note that functions
of the metadata specified in the |
The processed metadata.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) unlink('output', recursive=TRUE) logging::logReset()
Read in the abundance data and metadata from files if necessary.
maaslin_read_data(input_data, input_metadata, feature_specific_covariate = NULL, unscaled_abundance = NULL)
maaslin_read_data(input_data, input_metadata, feature_specific_covariate = NULL, unscaled_abundance = NULL)
input_data |
A data frame of feature abundances or read counts or a filepath to a tab-delimited file with abundances. It should be formatted with features as columns and samples as rows (or the transpose). The column and row names should be the feature names and sample names respectively. |
input_metadata |
A data frame of per-sample metadata or a filepath to a tab-delimited file with metadata. It should be formatted with variables as columns and samples as rows (or the transpose). The column and row names should be the variable names and sample names respectively. |
feature_specific_covariate |
A table of feature-specific covariates
or a filepath to a tab-delimited file with feature-specific covariates.
It should be formatted with features as columns and samples as rows (or
the transpose). The row names and column names should be the same as
those of the |
unscaled_abundance |
A data frame with a single column of absolute
abundances or a filepath to such a tab-delimited file. The row names
should match the names of the samples in |
A list containing the following items:
data
: A data frame of feature abundances.
metadata
: A data frame of metadata.
feature_specific_covariate
: A data frame of feature
specific covariates.
unscaled_abundance
: A data frame of unscaled
abundances.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) unlink('output', recursive=TRUE) logging::logReset()
Reorder the abundance data and metadata to ensure samples are rows and remove any samples without abundances or metadata.
maaslin_reorder_data(data, metadata, feature_specific_covariate = NULL, unscaled_abundance = NULL)
maaslin_reorder_data(data, metadata, feature_specific_covariate = NULL, unscaled_abundance = NULL)
data |
A data frame of feature abundances or read counts. It should be formatted with features as columns and samples as rows (or the transpose). The column and row names should be the feature names and sample names respectively. |
metadata |
A data frame of per-sample metadata. It should be formatted with variables as columns and samples as rows (or the transpose). The column and row names should be the variable names and sample names respectively. |
feature_specific_covariate |
A table of feature-specific
covariates. It should be formatted with features as columns and samples
as rows (or the transpose). The row names and column names should be the
same as those of the |
unscaled_abundance |
A data frame with a single column of absolute
abundances. The row names should match the names of the samples in
|
A list containing the following items:
data
: A data frame of feature abundances.
metadata
: A data frame of metadata.
feature_specific_covariate
: A data frame of feature
specific covariates.
unscaled_abundance
: A data frame of unscaled
abundances.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( taxa_table, metadata) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( taxa_table, metadata) unlink('output', recursive=TRUE) logging::logReset()
Transform the abundance data according to the transform
parameter.
maaslin_transform(filtered_data, output, transform = 'LOG')
maaslin_transform(filtered_data, output, transform = 'LOG')
filtered_data |
A data frame of filtered feature abundances. It should be formatted with features as columns and samples as rows. The column and row names should be the feature names and sample names respectively. |
output |
The output folder to write results. |
transform |
The transformation to apply to the features after
normalization and before analysis. The option |
A dataframe of transformed features (features are columns; samples are rows).
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') unlink('output', recursive=TRUE) logging::logReset()
Write the results from a MaAsLin 3 run to the output folder as a TSV.
maaslin_write_results(output, fit_data_abundance, fit_data_prevalence, random_effects_formula = NULL, max_significance = 0.1, save_models = FALSE)
maaslin_write_results(output, fit_data_abundance, fit_data_prevalence, random_effects_formula = NULL, max_significance = 0.1, save_models = FALSE)
output |
The output folder to write results. |
fit_data_abundance |
The abundance outputs of
|
fit_data_prevalence |
The prevalence outputs of
|
random_effects_formula |
A formula in |
max_significance |
The FDR corrected q-value threshold for significance used in selecting which associations to write as significant and to plot. |
save_models |
Whether to return the fit models and save them to an RData file. |
Results will be written to the all_results.tsv
and
significant_results.tsv
files in the folder output
. The
file all_results.tsv
will contain all results in the
fit_data_abundance
and fit_data_prevalence
items of the
input list (with 'linear' and 'logistic' replaced by 'abundance' and
'prevalence' in the model
column). The file
significant_results.tsv
will contain all results with joint or
individual
q-values below the 'max_significance' parameter and only 'NA' in the
'error' column of the results (the 'error' column is subsequently
dropped). No value is returned.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) maaslin3::maaslin_write_results( output = 'output', maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence, random_effects_formula) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) maaslin3::maaslin_write_results( output = 'output', maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence, random_effects_formula) unlink('output', recursive=TRUE) logging::logReset()
Write the results from a MaAsLin 3 run to the output folder in LEfSe format.
maaslin_write_results_lefse_format(output, fit_data_abundance, fit_data_prevalence)
maaslin_write_results_lefse_format(output, fit_data_abundance, fit_data_prevalence)
output |
The output folder to write results. |
fit_data_abundance |
The abundance outputs of
|
fit_data_prevalence |
The prevalence outputs of
|
Results will be written to the lefse_style_results_abundance.res
file in the folder output
. No value is returned.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) maaslin3::maaslin_write_results_lefse_format( output = 'output', maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 maaslin3::maaslin_log_arguments( input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) read_data_list <- maaslin3::maaslin_read_data( taxa_table, metadata) read_data_list <- maaslin3::maaslin_reorder_data( read_data_list$data, read_data_list$metadata) data <- read_data_list$data metadata <- read_data_list$metadata formulas <- maaslin3::maaslin_check_formula( data, metadata, input_formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads') formula <- formulas$formula random_effects_formula <- formulas$random_effects_formula normalized_data = maaslin3::maaslin_normalize(data, output = 'output') filtered_data = maaslin3::maaslin_filter(normalized_data, output = 'output') transformed_data = maaslin3::maaslin_transform(filtered_data, output = 'output') standardized_metadata = maaslin3::maaslin_process_metadata( metadata, formula = formula) maaslin_results = maaslin3::maaslin_fit( filtered_data, transformed_data, standardized_metadata, formula, random_effects_formula, warn_prevalence = FALSE) maaslin3::maaslin_write_results_lefse_format( output = 'output', maaslin_results$fit_data_abundance, maaslin_results$fit_data_prevalence) unlink('output', recursive=TRUE) logging::logReset()
This wrapper for all MaAsLin 3 steps finds abundance and prevalence associations between microbiome meta-omics features and complex metadata in population-scale epidemiological studies. The software includes multiple analysis methods (including support for multiple covariates, repeated measures, and ordered predictors), filtering, normalization, and transform options to customize analysis for your specific study.
maaslin3(input_data, input_metadata, output, formula = NULL, fixed_effects = NULL, reference = NULL, random_effects = NULL, group_effects = NULL, ordered_effects = NULL, strata_effects = NULL, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, min_abundance = 0, min_prevalence = 0, zero_threshold = 0, min_variance = 0, max_significance = 0.1, normalization = 'TSS', transform = 'LOG', correction = 'BH', standardize = TRUE, unscaled_abundance = NULL, median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, median_comparison_abundance_threshold = 0, median_comparison_prevalence_threshold = 0, subtract_median = FALSE, warn_prevalence = TRUE, augment = TRUE, evaluate_only = NULL, plot_summary_plot = TRUE, summary_plot_first_n = 25, coef_plot_vars = NULL, heatmap_vars = NULL, plot_associations = TRUE, max_pngs = 30, cores = 1, save_models = FALSE, save_plots_rds = FALSE, verbosity = 'FINEST', summary_plot_balanced = FALSE)
maaslin3(input_data, input_metadata, output, formula = NULL, fixed_effects = NULL, reference = NULL, random_effects = NULL, group_effects = NULL, ordered_effects = NULL, strata_effects = NULL, feature_specific_covariate = NULL, feature_specific_covariate_name = NULL, feature_specific_covariate_record = NULL, min_abundance = 0, min_prevalence = 0, zero_threshold = 0, min_variance = 0, max_significance = 0.1, normalization = 'TSS', transform = 'LOG', correction = 'BH', standardize = TRUE, unscaled_abundance = NULL, median_comparison_abundance = TRUE, median_comparison_prevalence = FALSE, median_comparison_abundance_threshold = 0, median_comparison_prevalence_threshold = 0, subtract_median = FALSE, warn_prevalence = TRUE, augment = TRUE, evaluate_only = NULL, plot_summary_plot = TRUE, summary_plot_first_n = 25, coef_plot_vars = NULL, heatmap_vars = NULL, plot_associations = TRUE, max_pngs = 30, cores = 1, save_models = FALSE, save_plots_rds = FALSE, verbosity = 'FINEST', summary_plot_balanced = FALSE)
input_data |
A data frame of feature abundances or read counts, a filepath to a tab-delimited file with abundances, or a SummarizedExperiment or TreeSummarizedExperiment object with the taxa table in 'assays' and metadata in 'colData'. If a data frame or a filepath is supplied, the table should be formatted with features as columns and samples as rows (or the transpose). The column and row names should be the feature names and sample names respectively. |
input_metadata |
A data frame of per-sample metadata or a filepath to a tab-delimited file with metadata. It should be formatted with variables as columns and samples as rows (or the transpose). The column and row names should be the variable names and sample names respectively. |
output |
The output folder to write results. |
formula |
A formula in |
fixed_effects |
A vector of variable names to be included as fixed effects. |
reference |
For a variable with more than two levels supplied with
|
random_effects |
A vector of variable names to be included as random intercepts. |
group_effects |
A factored categorical variable to be included for group testing. An ANOVA-style test will be performed to assess whether any of the variable's levels are significant, and no coefficients or individual p-values will be returned. |
ordered_effects |
A factored categorical variable to be included. Consecutive levels will be tested for significance against each other, and the resulting associations will correspond to effect sizes, standard errors, and significances of each level versus the previous. |
strata_effects |
A vector with one variable name to be included as the strata variable in case-control studies. Strata cannot be combined with random effects. |
feature_specific_covariate |
A table of feature-specific covariates
or
a filepath to a tab-delimited file with feature-specific covariates. It
should be formatted with features as columns and samples as rows (or the
transpose). The row names and column names should be the same as those
of
the |
feature_specific_covariate_name |
The name for the feature-specific covariates when fitting the models. |
feature_specific_covariate_record |
Whether to keep the feature-specific covariates in the outputs when calculating p-values, writing results, and displaying plots. |
min_abundance |
Features with abundances more than
|
min_prevalence |
See |
zero_threshold |
Abundances less than or equal to
|
min_variance |
Features with abundance variances less than or equal
to
|
max_significance |
The FDR corrected q-value threshold for significance used in selecting which associations to write as significant and to plot. |
normalization |
The normalization to apply to the features before
transformation and analysis. The option |
transform |
The transformation to apply to the features after
normalization and before analysis. The option |
correction |
The correction to obtain FDR-corrected q-values from
raw
p-values. Any valid options for |
standardize |
Whether to apply z-scores to continuous metadata
variables so they are on the same scale. This is recommended in order to
compare coefficients across metadata variables, but note that functions
of the metadata specified in the |
unscaled_abundance |
A data frame with a single column of absolute
abundances or a filepath to such a tab-delimited file. The row names
should match the names of the samples in |
median_comparison_abundance |
Test abundance coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is recommended for relative abundance data but should not be used for absolute abundance data. |
median_comparison_prevalence |
Test prevalence coefficients against a null value corresponding to the median coefficient for a metadata variable across the features. This is only recommended if the analyst is interested in how feature prevalence associations compare to each other or if there is likely strong compositionality-induced sparsity. |
median_comparison_abundance_threshold |
Coefficients within
|
median_comparison_prevalence_threshold |
Same as
|
subtract_median |
Subtract the median from the coefficients. |
warn_prevalence |
Warn when prevalence associations are likely induced by abundance associations. This requires re-fitting the linear models on the TSS log-transformed data. |
augment |
Add extra lowly-weighted 0s and 1s to avoid linear separability. |
evaluate_only |
Whether to evaluate just the abundnace ("abundance") or prevalence ("prevalence") models |
plot_summary_plot |
Generate a summary plot of significant associations. |
summary_plot_first_n |
Include the top |
coef_plot_vars |
Vector of variable names to be used in the
coefficient plot section of the summary plot. Continuous variables
should match the metadata column name, and categorical variables should
be of the form |
heatmap_vars |
Vector of variable names to be used in the heatmap
section of the summary plot. Continuous variables should match the
metadata column name, and categorical variables should be of the form
|
plot_associations |
Whether to generate plots for significant associations. |
max_pngs |
The top |
cores |
How many cores to use when fitting models. (Using multiple cores will likely be faster only for large datasets or complex models. |
save_models |
Whether to return the fit models and save them to an RData file. |
save_plots_rds |
Whether to return the fit models and save them to an RData file. |
verbosity |
The level of verbosity for the |
summary_plot_balanced |
If set to TRUE the summary plot will
show the top N features of each variable included in
|
A list containing the following items:
data
: A dataframe of feature abundances with the
retained samples for fitting.
normalized_data
: A dataframe of normalized feature
abundances.
filtered_data
: A dataframe of feature abundances on
the original scale after normalization and filtering.
transformed_data
: A dataframe of feature abundances
after filtering, normalization, and transformation.
metadata
: A dataframe of metadata with the retained
samples for fitting.
standardized_metadata
: A dataframe of metadata after
scaling (if selected).
formula
: Checked or constructed formula(s) specifying
the model to be fit.
fit_data_abundance
: The results from the fit abundance
models (see maaslin_fit
).
fit_data_prevalence
: The results from the fit
prevalence models (see maaslin_fit
).
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 fit_out <- maaslin3::maaslin3(input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) unlink('output', recursive=TRUE) logging::logReset()
# Read features table taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3") taxa_table <- read.csv(taxa_table_name, sep = '\t') # Read metadata table metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3") metadata <- read.csv(metadata_name, sep = '\t') metadata$diagnosis <- factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD')) metadata$dysbiosis_state <- factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC', 'dysbiosis_CD')) metadata$antibiotics <- factor(metadata$antibiotics, levels = c('No', 'Yes')) #Run MaAsLin3 fit_out <- maaslin3::maaslin3(input_data = taxa_table, input_metadata = metadata, output = 'output', formula = '~ diagnosis + dysbiosis_state + antibiotics + age + reads', plot_summary_plot = FALSE, plot_associations = FALSE) unlink('output', recursive=TRUE) logging::logReset()
Pre-process the DNA covariates for metatranscriptomics by total-sum-scaling DNA abundances per sample and then, for each sample in each feature:
Log 2 transforming the DNA abundance if the DNA abundance is >=0
Setting the DNA abundance to log2([minimum non-zero relative
abundance in the dataset] / 2)
if the corresponding RNA abundance is
non-zero but the DNA abundance is zero
Setting the DNA abundance to NA if both are zero
When the DNA is present, the RNA data can be modeled as usual in MaAsLin
3 with
log2(DNA)
as a covariate. When the DNA is not present, if the RNA
is
present, we assume the DNA was missed due to finite read depth, so the
DNA
abundance is imputed with a small pseudo-count. When neither the DNA nor
RNA is
present, we assume the gene/microbe was not in the sample and therefore
no
information about the transcription level can be obtained. Setting the
DNA
covariate to NA has the effect of dropping the sample when fitting the
relevant
feature's model in MaAsLin 3. Unlike most MaAsLin functions that will
infer the
samples from the row names and column names, the rna_table
must be
formated as samples (rows) by features (columns).
preprocess_dna_mtx(dna_table, rna_table)
preprocess_dna_mtx(dna_table, rna_table)
dna_table |
The samples (rows) by features (columns) data frame of DNA or taxon abundances to preprocess. These can be relative abundances or counts. |
rna_table |
The samples (rows) by features (columns) data frame of RNA to preprocess. These can be relative abundances or counts. |
A list containing the following named items:
dna_table
: The table of log2 transformed DNA relative
abundances with NAs for any feature-sample pairs for which both the DNA
and
RNA abundances were 0.
rna_table
: The table of total sum scaled RNA abundances.
These
are not log2 transformed.
William Nickols<[email protected]>,
Jacob Nearing<[email protected]>,
Maintainers: Lauren McIver<[email protected]>,
mgx_in <- data.frame('a' = c(1, 2, 0, 4, 5), 'b' = c(2, 3, 4, 5, 6), 'c' = c(3, 4, 5, 6, 0)) rownames(mgx_in) <- paste0("sample", c(1:5)) mtx_in <- data.frame('a' = c(1, 2, 3, 4, 5), 'b' = c(2, 3, 4, 5, 0), 'c' = c(3, 4, 5, 6, 0)) rownames(mtx_in) <- paste0("sample", c(1:5)) preprocess_out <- preprocess_dna_mtx(mgx_in, mtx_in)
mgx_in <- data.frame('a' = c(1, 2, 0, 4, 5), 'b' = c(2, 3, 4, 5, 6), 'c' = c(3, 4, 5, 6, 0)) rownames(mgx_in) <- paste0("sample", c(1:5)) mtx_in <- data.frame('a' = c(1, 2, 3, 4, 5), 'b' = c(2, 3, 4, 5, 0), 'c' = c(3, 4, 5, 6, 0)) rownames(mtx_in) <- paste0("sample", c(1:5)) preprocess_out <- preprocess_dna_mtx(mgx_in, mtx_in)