| Title: | Bayesian analysis of longitudinal metabolomics data |
|---|---|
| Description: | MetaboDynamics is an R-package that provides a framework of probabilistic models to analyze longitudinal metabolomics data. It enables robust estimation of mean concentrations despite varying spread between timepoints and reports differences between timepoints as well as metabolite specific dynamics profiles that can be used for identifying "dynamics clusters" of metabolites of similar dynamics. Provides probabilistic over-representation analysis of KEGG functional modules and pathways as well as comparison between clusters of different experimental conditions. |
| Authors: | Katja Danielzik [aut, cre] (ORCID: <https://orcid.org/0009-0007-5021-6212>), Simo Kitanovski [ctb] (ORCID: <https://orcid.org/0000-0003-2909-5376>), Johann Matschke [ctb] (ORCID: <https://orcid.org/0000-0003-4878-8741>), Daniel Hoffmann [ctb] (ORCID: <https://orcid.org/0000-0003-2973-7869>) |
| Maintainer: | Katja Danielzik <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 2.3.0 |
| Built: | 2026-05-29 06:08:31 UTC |
| Source: | https://github.com/bioc/MetaboDynamics |
Convenient wrapper function for clustering of metabolite dynamics employing the "hybrid" method of the dynamicTreeCut package for clustering and hclust for computing of distance matrix and hierarchical clustering needed as input for dynamicTreeCut. Provides bootstrapping of clustering solution from posterior estimates of the model.
cluster_dynamics( data = NULL, fit, estimates = NULL, distance = "euclidean", agglomeration = "ward.D2", minClusterSize = 1, deepSplit = 2, B = 1000 )cluster_dynamics( data = NULL, fit, estimates = NULL, distance = "euclidean", agglomeration = "ward.D2", minClusterSize = 1, deepSplit = 2, B = 1000 )
data |
data frame or colData of a SummarizedExperiment used to fit dynamics model |
fit |
model fit obtained by fit_dynamics_model(). Needed if data is not a SummarizedExperiment object for which the model fit is saved in "dynamic_fit" of metadata |
estimates |
output of estimates_dynamics function, needed if data is not a SummarizedExperiment object for which the model estimates are saved in "estimates_dynamics" of metadata |
distance |
distance method to be used as input for hierarchical clustering dist can be "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski" |
agglomeration |
agglomerative method to be used for hierarchical clustering hclust can be "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid" |
minClusterSize |
minimum number of metabolites per of cluster cutreeDynamic |
deepSplit |
rough control over sensitivity of cluster analysis. Possible values are 0:4, the higher the value, the more and smaller clusters will be produced by cutreeDynamic |
B |
number of bootstraps |
a list with one list per condition. The elements per condition are 'data' (mean estimates of mu plus the clustering solution), mean_dendro' the dendrogram of the mean estimates, and mean_phylo' the phylogram of the mean estimates. if data is a SummarizedExperiment object clustering results are stored in metadata under "cluster" Element 'dynamics' contains column names of time points
fit_dynamics_model(), estimates_dynamics(), plot_cluster()
data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite %in% c("ATP", "L-Alanine", "GDP")] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) data <- cluster_dynamics(data, B = 1000) S4Vectors::metadata(data)[["cluster"]][["A"]] plot(S4Vectors::metadata(data)[["cluster"]][["A"]][["mean_dendro"]])data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite %in% c("ATP", "L-Alanine", "GDP")] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) data <- cluster_dynamics(data, B = 1000) S4Vectors::metadata(data)[["cluster"]][["A"]] plot(S4Vectors::metadata(data)[["cluster"]][["A"]][["mean_dendro"]])
Employs a Bayesian model that assumes a normal distribution of Euclidean distances between dynamics vectors (metabolite abundances at different time points) of two clusters that come from different experimental conditions to estimate the mean distance between clusters.
compare_dynamics(data, cores = 4)compare_dynamics(data, cores = 4)
data |
result of |
cores |
how many cores should be used for model fitting; this parallelizes the model fitting and therefore speeds it up; default=4 |
a list holding a 1) the model fit 2) dataframe of estimates of the mean distance between #' clusters of different experimental conditions ("mean") and the standard deviation ("sigma"). If data input was a SummarizedExperiment results are stored in metadata(data) under "comparison_dynamics"
Visualization of estimates heatmap_dynamics()/
compare metabolite composition of clusters compare_metabolites()
data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite %in% c("ATP", "L-Alanine", "GDP")] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) data <- cluster_dynamics(data, B = 1) data <- compare_dynamics( data = data, cores = 1 ) S4Vectors::metadata(data)[["comparison_dynamics"]]data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite %in% c("ATP", "L-Alanine", "GDP")] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) data <- cluster_dynamics(data, B = 1) data <- compare_dynamics( data = data, cores = 1 ) S4Vectors::metadata(data)[["comparison_dynamics"]]
Uses the Jaccard Index to compare metabolite names between dynamics clusters of different experimental conditions
compare_metabolites(data)compare_metabolites(data)
data |
result of |
a data frame of Jaccard indices between data or if data input was a SummarizedExperiment results are stored in metadata(data) under "comparison_metabolites"
Visualization of metabolite similarity heatmap_metabolites()/
compare dynamics of clusters compare_dynamics()
data("longitudinalMetabolomics") longitudinalMetabolomics <- compare_metabolites( data = longitudinalMetabolomics ) S4Vectors::metadata(longitudinalMetabolomics)[["comparison_metabolites"]]data("longitudinalMetabolomics") longitudinalMetabolomics <- compare_metabolites( data = longitudinalMetabolomics ) S4Vectors::metadata(longitudinalMetabolomics)[["comparison_metabolites"]]
gathers number of divergences, rhat values, number of effective samples (n_eff) and provides plots for diagnostics criteria as well as posterior predictive checks. Output dataframe "model_diagnostics" contains information about experimental condition, number of divergent transitions and rhat and neff values for all timepoints.
diagnostics_dynamics( data, assay = "scaled_log", iter = 2000, warmup = iter/4, chains = 4, fit = metadata(data)[["dynamic_fit"]] )diagnostics_dynamics( data, assay = "scaled_log", iter = 2000, warmup = iter/4, chains = 4, fit = metadata(data)[["dynamic_fit"]] )
data |
data frame or a SummarizedExperiment used to fit dynamics model column of "time" that contains time must be numeric, has to contain columns specifying the metabolite named "metabolite", and column specifiying the time point named "time", a column named "condition" must specify the experimental condition. |
assay |
of the SummarizedExperiment object that was used to fit the dynamics model |
iter |
number of iterations used for model fit |
warmup |
number of warmup iterations used for model fit |
chains |
number of chains used for model fit |
fit |
model fit for which diagnostics should be extracted, is the object that gets returned by fit_dynamics_model(), or if a SummarizedExperiment object the results of fit_dynamics_model() are stored in metadata(data) under "dynamic_fit" |
a list which contains diagnostics criteria of all conditions in a dataframe (named "model_diagnostics") and one dataframe per condition that contains necessary information for Posterior predictive check (named "PPC_condition"). If data is a summarizedExperiment object the diagnostics are stored in metadata(data) "diagnostics_dynamics"
estimates_dynamics()
parent function fit_dynamics_model()
visualization functions: plot_diagnostics()/plot_PPC()
data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite == "ATP"] data <- fit_dynamics_model( model = "scaled_log", data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- diagnostics_dynamics( data = data, assay = "scaled_log", iter = 2000, chains = 1, fit = metadata(data)[["dynamic_fit"]] ) S4Vectors::metadata(data)[["diagnostics_dynamics"]][["model_diagnostics"]] S4Vectors::metadata(data)[["diagnostics_dynamics"]][["posterior"]]data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite == "ATP"] data <- fit_dynamics_model( model = "scaled_log", data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- diagnostics_dynamics( data = data, assay = "scaled_log", iter = 2000, chains = 1, fit = metadata(data)[["dynamic_fit"]] ) S4Vectors::metadata(data)[["diagnostics_dynamics"]][["model_diagnostics"]] S4Vectors::metadata(data)[["diagnostics_dynamics"]][["posterior"]]
Extracts the mean concentrations (mu) at every time point from the dynamics model fit, the 95% highest density interval (HDI), the estimated standard deviation of metabolite concentrations at every time point (sigma), and the pooled standard deviation of every metabolite over all timepoints (lambda). Additionally samples from the posterior of mu can be drawn. This can be helpful if p.e. one wants to estimate the clustering precision. Lambda can be used for clustering algorithms such as VSClust that also take the variance into account.
estimates_dynamics( data, assay = "scaled_log", fit = metadata(data)[["dynamic_fit"]], model_option = "sd_per_condition" )estimates_dynamics( data, assay = "scaled_log", fit = metadata(data)[["dynamic_fit"]], model_option = "sd_per_condition" )
data |
data frame or colData of a SummarizedExperiment
used to fit dynamics model, must contain a column named "condition" specifiyng
the experimental condition and a column
named "time" specifying the timepoints.
If it is a SummarizedExperiment object the dynamic fits must be stored in metadata(data)
under "dynamic_fits" (automatically done by |
assay |
of the SummarizedExperiment object that was used to fit the dynamics model |
fit |
model fit for which estimates should be extracted |
model_option |
model option that was used for fitting with |
a list of dataframes (one per parameters mu, sigma, lambda, delta_mu and euclidean distance) that contains the estimates: mu: is the estimated mean metabolite abundance sigma: the estimated standard deviation of metabolite abundance lambda: pooled sigma per condition delta_mu: differences of mu between time points euclidean_distances: estimated euclidean distance of time vectors of one metabolite between conditions If data is a summarizedExperiment object the estimates are stored in metadata(data) under "estimates_dynamics"
Fit the dynamic model fit_dynamics_model().
Diagnostics of the dynamic model diagnostics_dynamics()
Visualization of estimates with plot_estimates()
data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite == "ATP"] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) S4Vectors::metadata(data)[["estimates_dynamics"]]data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite == "ATP"] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) S4Vectors::metadata(data)[["estimates_dynamics"]]
Employs a hierarchical model that assumes a normal distribution of standardized (mean=0, sd=1) log(cpc) (cpc = normalized metabolite abundance) values for robust estimation of mean concentrations over time of single metabolites at single experimental conditions. At least three replicates for metabolite concentrations per time point and condition are needed. If cell counts are provided at least one replicate per time point and condition is needed.
fit_dynamics_model( model = "scaled_log", model_option = "sd_per_condition", data, scaled_measurement = "m_scaled", counts = NULL, assay = "scaled_log", chains = 4, cores = 4, adapt_delta = 0.95, max_treedepth = 10, iter = 2000, warmup = iter/4 )fit_dynamics_model( model = "scaled_log", model_option = "sd_per_condition", data, scaled_measurement = "m_scaled", counts = NULL, assay = "scaled_log", chains = 4, cores = 4, adapt_delta = 0.95, max_treedepth = 10, iter = 2000, warmup = iter/4 )
model |
which model to fit. Two options are available: "scaled_log": taking in normalized and scaled metabolite concentrations (see scaled measurement) "raw_plus_counts": tailored for in vitro untargeted LC-MS experiments, taking in "raw" (i.e. not normalized and not scaled) metabolite concentrations and cell counts. This model assumes independent measurement (i.e. different wells) of cell counts and metabolite concentrations. Additionally it assumes that cell counts were estimated e.g. by cell counters (i.e. that cells were not counted under the microscope) leading to a small uncertainty of the true cell count. |
model_option |
for each model ("scaled_log" and "raw_plus_counts") two versions are available: "sd_per_time_point" and "sd_per_condition". "sd_per_time_point" is suited for more replicates (at least triplicates) and estimates standard deviations per time point and therefore allows for more sensitive discrimination of changing abundances. high deviations between replicates) or for many metabolites, time points or conditions as it reduces run time due to less parameters. For model details see vignette (browseVignettes("MetaboDynamics")). |
data |
concentration table with at least three replicate measurements per metabolite. Must contain columns named "metabolite" (containing names or IDs), "time" (categorical, the same for all conditions), and "condition" or colData of a SummarizedExperiment object Time column needs to be sorted in ascending order |
scaled_measurement |
column of "data" that contains the concentrations per cell, centered and normalized per metabolite and experimental condition (mean=0, sd=1), must be numeric |
counts |
data frame with at least one replicate per time point and condition specifying the cell counts, must contain columns "time", and "condition" equivalent to the specifications of "data". Must contain a column named "counts" that specifies the cell counts. Model assumes that the replicates of the cell counts and metabolite concentrations are independent of each other (i.e. cell counts were measured in in different wells than metabolite concentrations) |
assay |
if input is a SummarizedExperiment specify the assay that should be used for input, colData has to hold the columns, "condition" and "metabolite", rowData the timepoint specifications, in case of the model "scaled_log" assay needs to hold scaled log-transformed metabolite concentrations (mean=0,sd=1 per metabolite and experimental condition), if model "raw_plus_counts" is chosen must hold the non-transformed and non-scaled metabolite concentrations |
chains |
how many Markov-Chains should be used for model fitting, use at least two, default=4 |
cores |
how many cores should be used for model fitting; this parallelizes the model fitting and therefore speeds it up; default=4 |
adapt_delta |
target average acceptance probability, can be adapted if divergent transitions are reported, default is 0.95 |
max_treedepth |
can be adapted if model throws warnings about hitting max_treedepth, warnings are mostly efficiency not validity concerns and treedepth can be raised, default=10 |
iter |
how many iterations are run, increasing might help with effective sample size being to low, default=2000 |
warmup |
how many iterations the model warms up for, increasing might facilitate efficiency, must be at least 25% of ITER, default=iter/4 |
returns a list of model fits. One model fit named "condition" per experimental condition. If input is a summarizedExperiment object the dynamic fits are stored metadata(data) under "dynamic_fits". 'model' and 'model_option' are also stored in metadata(data)
Example data setlongitudinalMetabolomics.
Get model diagnostics diagnostics_dynamics()
Get model estimates estimates_dynamics()
## on scaled log-transformed metabolite concentrations data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite == "ATP"] data <- fit_dynamics_model( model = "scaled_log", data = data, assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) S4Vectors::metadata(data)[["dynamic_fit"]]## on scaled log-transformed metabolite concentrations data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite == "ATP"] data <- fit_dynamics_model( model = "scaled_log", data = data, assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) S4Vectors::metadata(data)[["dynamic_fit"]]
Uses the package KEGGREST to retrieve background and annotation information needed for over-representation analysis. As KEGGREST only allows 10 queries per second this might take some time to run, depending on the size of the dataset and organism. The user should check afterwards if all functional modules are applicable for the analysis of the dataset (p.e. organism, tissue).
get_ORA_annotations( data, kegg = "KEGG", metabolite_name = "metabolite", update_background = TRUE )get_ORA_annotations( data, kegg = "KEGG", metabolite_name = "metabolite", update_background = TRUE )
data |
data frame to be analyzed with ORA. Must at least contain a column with KEGG IDs or a SummarizedExperiment where the metabolite names or IDs are stored in colData |
kegg |
column name of "data" that holds KEGG IDs |
metabolite_name |
column name of "data" that holds metabolite names |
update_background |
logical. Should the background information be updated? Should be set to TRUE of this is the first time using this function. If TRUE this may take some time. |
a list with dataframes "background" and "annotation" needed for ORA, if data is a SummarizedExperiment SummarizedExperiment object annotations are stored in metadata(data) under "KEGG_annotations"
Do over-representation analysis of KEGG functional modules ORA_hypergeometric()
data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite == "ATP"] data <- get_ORA_annotations( data = data, kegg = "KEGG", metabolite_name = "metabolite", update_background = FALSE ) S4Vectors::metadata(data)[["KEGG_annotations"]]data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite == "ATP"] data <- get_ORA_annotations( data = data, kegg = "KEGG", metabolite_name = "metabolite", update_background = FALSE ) S4Vectors::metadata(data)[["KEGG_annotations"]]
plot bubble heatmap from the numerical fit of compare_dynamics()
heatmap_dynamics( estimates = metadata(data)[["comparison_dynamics"]][["estimates"]], data )heatmap_dynamics( estimates = metadata(data)[["comparison_dynamics"]][["estimates"]], data )
estimates |
dataframe of estimates of the mean distance between clusters of different experimental conditions ("mean") and the standard deviation ("sigma") obtain by function compare_dynamics() |
data |
a dataframe or containing a column specifying the metabolite names to be compared and cluster IDs (column named "cluster") of clusters of similar dynamics, as well as a column "condition" specifying the experimental conditions. to be compared or a SummarizedExperiment storing the same information in metadata(data) under "cluster" |
a bubble heat map where the color of the bubble represents the similarity of two clusters in regards to their dynamics in the color and the size the uncertainty of the similarity. Big bright bubbles mean high similarity with low uncertainty.
Do calculations for comparison of dynamics between clusters compare_dynamics()
data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite %in% c("ATP", "L-Alanine", "GDP")] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) data <- cluster_dynamics(data, B = 1) data <- compare_dynamics( data = data, cores = 1 ) S4Vectors::metadata(data)[["comparison_dynamics"]] heatmap_dynamics(data = data)data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite %in% c("ATP", "L-Alanine", "GDP")] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) data <- cluster_dynamics(data, B = 1) data <- compare_dynamics( data = data, cores = 1 ) S4Vectors::metadata(data)[["comparison_dynamics"]] heatmap_dynamics(data = data)
plot heatmap from comparison of metabolite composition compare_metabolites()
heatmap_metabolites( distances = metadata(data)[["comparison_metabolites"]], data )heatmap_metabolites( distances = metadata(data)[["comparison_metabolites"]], data )
distances |
dataframe of Jaccard indices between clusters obtained by function compare_metabolites(). If compare_metabolites() was executed on as SummarizedExperiment or a SummarizedExperiment than this is stored in metadata(data) under "comparison_metabolites" |
data |
a dataframe containing the columns "metabolite" specifying the metabolite names to be compared and cluster IDs(column named "cluster") of clusters of similar dynamics, as well as a column "condition" specifying the experimental conditions to be compared |
a heatmap where the color of the tile represents the similarity of two clusters in regards to their metabolite composition. The brighter the color the more similar the metabolite compositions.
Do calculations for comparison of metabolites between clusters compare_metabolites()
data("longitudinalMetabolomics") longitudinalMetabolomics <- compare_metabolites( data = longitudinalMetabolomics ) heatmap_metabolites(data = longitudinalMetabolomics)data("longitudinalMetabolomics") longitudinalMetabolomics <- compare_metabolites( data = longitudinalMetabolomics ) heatmap_metabolites(data = longitudinalMetabolomics)
Example data set of KEGG ID annotations of metabolites as needed for ORA_hypergeometric()
data("IDs")data("IDs")
A dataframe
KEGGKEGG ID of metabolites
replicatecolumn that specifies the measurement replicate
A simulated data set of 98 metabolites. 3 replicate measurements of 4 time points and at 2 experimental conditions. Metabolites are in 8 dynamics groups per experimental condition. 4 groups have varying dynamics between conditions. Is represented as a SummarizedExperiment object where concentration tables of each experimental condition are stored in assays (raw concentrations in "concentrations", log-transformed transformations in "log_con" and scaled log-transformed concentrations" in "scaled_log") and metabolite names, KEGG IDs, experimental conditions and clustering solutions per experimental condition are stored in colData and timepoint specifications in rowData. (SummarizedExperiment).
data("longitudinalMetabolomics")data("longitudinalMetabolomics")
A SummarizedExperiment object with concentration tables in assays. RowData contains the time point specification. ColData as specified below.
conditionexperimental condition
metabolitemetabolite name
KEGGKEGG ID of metabolites
replicatecolumn that specifies the measurement replicate
clustercluster ID that is condition specific for every metabolite
Script used to create simulated data set.seed(1234) # load KEGG database for assignment of metabolite names: data("metabolite_modules")
# metabolite_db <- metabolite_modules # Group <- middle_hierarchy
library(dplyr) library(SummarizedExperiment) library(tidyverse) # Parameters (as before)
n_features <- 98
n_groups <- 8 # Number of groups (randomly choose between 6-8)
n_time_points <- 4 # Number of time points
n_replicates <- 3 # Number of replicates for all features and time points
n_conditions <- 2 # Number of experimental conditions
x_varying_groups <- 4 # Number of groups with varying dynamics across conditions
condition_names <- c("A","B")
# Probability matrix for assigning metabolites from different database groups to dynamic groups # For simplicity, we assume equal probability; customize as needed
group_probabilities <- matrix(c(0.8,rep(0.01,7), #amino acid metabolism rep(0.01,7),0.8, #nucleotide metabolism 0.1,0.8,0.8,rep(0.1,5), # energy and carbohydrate metabolism runif(5 * length(unique(metabolite_modules$middle_hierarchy)))), nrow = n_groups, ncol = length(unique(metabolite_modules$middle_hierarchy)))
# Generate group dynamics (base trends over time) for each condition
group_dynamics <- list()
# Define the base group dynamics for condition 1
group_dynamics[[1]] <- lapply(1:n_groups, function(g) trend <- rnorm(n_time_points, mean = g * 2, sd = 0.5) return(trend) )
# Define varying dynamics for selected groups across other conditions
varying_groups <- sample(1:n_groups, x_varying_groups, replace = FALSE)
for (cond in 2:n_conditions) group_dynamics[[cond]] <- group_dynamics[[1]] for (g in varying_groups) group_dynamics[[cond]][[g]] <- rnorm(n_time_points, mean = g * 2, sd = 1)
# Assign each feature to a group
feature_to_group <- sample(1:n_groups, n_features, replace = TRUE)
# Initialize a list to store the simulated data
simulated_data <- list()
# Assign metabolite names to features
available_metabolites <- metabolite_modules # Copy of metabolite database to keep track of unused names
# Simulate data for each feature across all conditions
for (feature in 1:n_features)
# Get the group for this feature
group <- feature_to_group[feature]
# Determine probability of each metabolite database group for this dynamic group
group_probs <- group_probabilities[group, ]
# Subset the metabolite database for selection based on group probabilities
metabolite_candidates <- available_metabolites group_by(middle_hierarchy) mutate(Probability = group_probs[match(middle_hierarchy, unique(metabolite_modules$middle_hierarchy))]) ungroup() filter(metabolite
# Randomly sample a metabolite based on these probabilities
metabolite_name <- sample(metabolite_candidates$metabolite, 1, prob = metabolite_candidates$Probability)
# Remove this metabolite from available pool
available_metabolites <- available_metabolites[available_metabolites$metabolite != metabolite_name, ]
# Generate a random base mean for this feature between 0.001 and 1000
base_mean <- runif(1, min = 0.1, max = 1e5)
# Generate feature-specific variances for each time point
feature_variances <- runif(n_time_points, min = 0.1, max = 2)
# Store data for each condition
for (cond in 1:n_conditions) trend <- group_dynamics[[cond]][[group]] feature_means <- base_mean * trend / max(abs(trend))
feature_data <- data.frame( metabolite = metabolite_name, # Assign metabolite name here condition = paste0(condition_names[[cond]]), time = rep(1:n_time_points, each = n_replicates), replicate = rep(1:n_replicates, times = n_time_points) )
# Generate the actual data points with strictly positive concentrations
feature_data$measurement <- unlist(lapply(1:n_time_points, function(t) rlnorm(n_replicates, meanlog = log(feature_means[t]), sdlog = feature_variances[t]) ))
simulated_data[[length(simulated_data) + 1]] <- feature_data
rm(base_mean,cond,feature,feature_means,feature_to_group,feature_variances, g,group,group_probs,metabolite_name,n_conditions,n_features,n_groups,n_replicates, n_time_points,trend,varying_groups,x_varying_groups,available_metabolites,feature_data,group_dynamics, group_probabilities,metabolite_candidates)
# Combine all features and conditions into one data frame
simulated_data_df <- do.call(rbind, simulated_data)
simulated_data_df <- simulated_data_df group_by(metabolite, condition) mutate( log_m = log10(measurement), m_scaled = (log_m - mean(log_m)) / sd(log_m) )
# add KEGG IDs name_map_HMDB_CAS <- readr::read_csv("name_map_HMDB_CAS.csv") longitudinalMetabolomics <- dplyr::left_join(simulated_data_df,name_map_HMDB_CAS[,c("Query","KEGG")],join_by("metabolite"=="Query"))
## concentrations temp <- longitudinalMetabolomics temp <- temp select(condition,metabolite,KEGG,time,measurement,replicate) pivot_wider(id_cols=c(condition,metabolite,KEGG,replicate), names_from = time, values_from = measurement) concentrations <- temp ## transform matrix so that conditions are in columns to facilitate access ## with colData -> se[,se$condition="A"] concentrations <- t(as.matrix(concentrations)) row.names(concentrations) <- NULL # prepare log_transformed data temp <- longitudinalMetabolomics temp <- temp select(condition,metabolite,KEGG,time,log_m,replicate) pivot_wider(id_cols=c(condition,metabolite,KEGG,replicate), names_from = time, values_from = log_m) log_con <- temp log_con <- t(as.matrix(log_con)) row.names(log_con) <- NULL # prepare scaled log_transformed data temp <- longitudinalMetabolomics temp <- temp select(condition,metabolite,KEGG,time,m_scaled,replicate) pivot_wider(id_cols=c(condition,metabolite,KEGG,replicate), names_from = time, values_from = m_scaled) scaled_log <- temp scaled_log <- t(as.matrix(scaled_log)) row.names(scaled_log) <- NULL
# simulate cell counts ## I am assuming that all cell counts are from a poisson distribution with the same ## location parameter (not the case for real life applications). One parameter is just used to show application of the dynamics model including cell counts
counts <- rbind(rpois(ncol(concentrations),1e5), rpois(ncol(concentrations),1e5), rpois(ncol(concentrations),1e5), rpois(ncol(concentrations),1e5)) # 882 columns, three entries for same metabolite, four time points
# prepare row and colData #### row_data <- DataFrame(time_points=c("time_point_1","time_point_2","time_point_3", "time_point_4")) col_data <- DataFrame(condition=temp$condition,metabolite=temp$metabolite, KEGG=temp$KEGG,replicate=temp$replicate)
se <- SummarizedExperiment(assays=SimpleList(concentrations=concentrations, log_con=log_con, scaled_log=scaled_log, cell_counts = counts), rowData = row_data, colData = col_data) # set row and colnames #### rownames(se) <- c("time_point_1","time_point_2","time_point_3", "time_point_4") colnames(se) <- temp$metabolite
# add metadata #### metadata(se)[["data origin"]] <- "Simulated data of 98 metabolites with three concentration observations at four time points and at three different biological conditions. Script to construct dataset can be seen with ?longitudinalMetabolomics"
data <- se data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.999, iter = 5000, cores = 2, chains = 2 )
data <- estimates_dynamics( data = data ) data <- cluster_dynamics(data,B=10) metadata(data)[["dynamic_fit"]] <- NULL
data("modules_compounds") data("metabolite_modules") data("IDs") data <- ORA_hypergeometric(data = data, background = modules_compounds, IDs = IDs, annotations = metabolite_modules, tested_column = "middle_hierarchy")
# save shortened analysis in metadata of se metadata(se)[["estimates_dynamics"]] <- metadata(data)[["estimates_dynamics"]] metadata(se)[["dynamics"]] <- metadata(data)[["dynamics"]] metadata(se)[["cluster"]] <- metadata(data)[["cluster"]] metadata(se)[["ORA_middle_hierarchy"]] <- metadata(data)[["ORA_middle_hierarchy"]]
longitudinalMetabolomics <- se usethis::use_data(longitudinalMetabolomics,overwrite = TRUE)
longitudinalMetabolomics_df <- dplyr::left_join(simulated_data_df,name_map_HMDB_CAS[,c("Query","KEGG")],join_by("metabolite"=="Query")) usethis::use_data(longitudinalMetabolomics_df,overwrite = TRUE)
A simulated data set of 98 metabolites. 3 replicate measurements of 4 time points and at 3 experimental conditions. Metabolites are in 8 dynamics groups per experimental condition. 4 groups have varying dynamics between conditions. Is represented as a SummarizedExperiment object where concentration tables of each experimental condition are stored in assays (raw concentrations in "concentrations", log-transformed transformations in "log_con" and scaled log-transformed concentrations" in "scaled_log") and metabolite names, KEGG IDs, experimental conditions and clustering solutions per experimental condition are stored in colData and timepoint specifications in rowData. (SummarizedExperiment).
data("longitudinalMetabolomics")data("longitudinalMetabolomics")
A SummarizedExperiment object with concentration tables in assays. RowData contains the time point specification. ColData as specified below.
conditionexperimental condition
metabolitemetabolite name
KEGGKEGG ID of metabolites
replicatecolumn that specifies the measurement replicate
clustercluster ID that is condition specific for every metabolite
Script used to create simulated data
# load KEGG database for assignment of metabolite names: data("metabolite_modules")
# metabolite_db <- metabolite_modules # Group <- middle_hierarchy
library(dplyr) library(SummarizedExperiment) # Parameters (as before)
n_features <- 98
n_groups <- 8 # Number of groups (randomly choose between 6-8)
n_time_points <- 4 # Number of time points
n_replicates <- 3 # Number of replicates for all features and time points
n_conditions <- 2 # Number of experimental conditions
x_varying_groups <- 4 # Number of groups with varying dynamics across conditions
condition_names <- c("A","B")
# Probability matrix for assigning metabolites from different database groups to dynamic groups # For simplicity, we assume equal probability; customize as needed
group_probabilities <- matrix(c(0.8,rep(0.01,7), #amino acid metabolism rep(0.01,7),0.8, #nucleotide metabolism 0.1,0.8,0.8,rep(0.1,5), # energy and carbohydrate metabolism runif(5 * length(unique(metabolite_modules$middle_hierarchy)))), nrow = n_groups, ncol = length(unique(metabolite_modules$middle_hierarchy)))
# Generate group dynamics (base trends over time) for each condition
group_dynamics <- list()
# Define the base group dynamics for condition 1
group_dynamics[[1]] <- lapply(1:n_groups, function(g) trend <- rnorm(n_time_points, mean = g * 2, sd = 0.5) return(trend) )
# Define varying dynamics for selected groups across other conditions
varying_groups <- sample(1:n_groups, x_varying_groups, replace = FALSE)
for (cond in 2:n_conditions) group_dynamics[[cond]] <- group_dynamics[[1]] for (g in varying_groups) group_dynamics[[cond]][[g]] <- rnorm(n_time_points, mean = g * 2, sd = 1)
# Assign each feature to a group
feature_to_group <- sample(1:n_groups, n_features, replace = TRUE)
# Initialize a list to store the simulated data
simulated_data <- list()
# Assign metabolite names to features
available_metabolites <- metabolite_modules # Copy of metabolite database to keep track of unused names
# Simulate data for each feature across all conditions
for (feature in 1:n_features)
# Get the group for this feature
group <- feature_to_group[feature]
# Determine probability of each metabolite database group for this dynamic group
group_probs <- group_probabilities[group, ]
# Subset the metabolite database for selection based on group probabilities
metabolite_candidates <- available_metabolites group_by(middle_hierarchy) mutate(Probability = group_probs[match(middle_hierarchy, unique(metabolite_modules$middle_hierarchy))]) ungroup() filter(metabolite
# Randomly sample a metabolite based on these probabilities
metabolite_name <- sample(metabolite_candidates$metabolite, 1, prob = metabolite_candidates$Probability)
# Remove this metabolite from available pool
available_metabolites <- available_metabolites[available_metabolites$metabolite != metabolite_name, ]
# Generate a random base mean for this feature between 0.001 and 1000
base_mean <- runif(1, min = 0.001, max = 1000)
# Generate feature-specific variances for each time point
feature_variances <- runif(n_time_points, min = 0.1, max = 2)
# Store data for each condition
for (cond in 1:n_conditions) trend <- group_dynamics[[cond]][[group]] feature_means <- base_mean * trend / max(abs(trend))
feature_data <- data.frame( metabolite = metabolite_name, # Assign metabolite name here condition = paste0(condition_names[[cond]]), time = rep(1:n_time_points, each = n_replicates), replicate = rep(1:n_replicates, times = n_time_points) )
# Generate the actual data points with strictly positive concentrations
feature_data$measurement <- unlist(lapply(1:n_time_points, function(t) rlnorm(n_replicates, meanlog = log(feature_means[t]), sdlog = feature_variances[t]) ))
simulated_data[[length(simulated_data) + 1]] <- feature_data
rm(base_mean,cond,feature,feature_means,feature_to_group,feature_variances, g,group,group_probs,metabolite_name,n_conditions,n_features,n_groups,n_replicates, n_time_points,trend,varying_groups,x_varying_groups,available_metabolites,feature_data,group_dynamics, group_probabilities,metabolite_candidates)
# Combine all features and conditions into one data frame
simulated_data_df <- do.call(rbind, simulated_data)
simulated_data_df <- simulated_data_df group_by(metabolite, condition) mutate( log_m = log10(measurement), m_scaled = (log_m - mean(log_m)) / sd(log_m) )
# add KEGG IDs name_map_HMDB_CAS <- readr::read_csv("name_map_HMDB_CAS.csv") longitudinalMetabolomics_df <- dplyr::left_join(simulated_data_df,name_map_HMDB_CAS[,c("Query","KEGG")],join_by("metabolite"=="Query")) usethis::use_data(longitudinalMetabolomics_df,overwrite = TRUE)
Using the package KEGGREST (https://www.bioconductor.org/packages/release/bioc/html/KEGGREST.html) all experimental metabolites (see data("intra")) were queried with there KEGG-IDs and all functional modules recorded to which the metabolite is annotated in the KEGG-database.
data("metabolite_modules")data("metabolite_modules")
A data frame with 348 observations on the following 8 variables.
...1row number of the dataframe
metabolitename of the experimental metabolite
KEGGKEGG ID of the experimental metabolite
module_idID of the KEGG module to which the metabolite is annotated
module_namename of the KEGG module to which the metabolite is annotated
upper_hierarchyname of the highest hierarchy level of module organization
middle_hierarchyname of the middle hierarchy = functional module, p.e. "Amino acid metabolism"
lower_hierarchyname of the lowest level of modules, this often contain only a couple pathways p.e. "Arginine and proline metabolism"
https://www.genome.jp/kegg/module.html
Using the package KEGGREST (https://www.bioconductor.org/packages/release/bioc/html/KEGGREST.html) a list of all KEGG-modules (KeggList("module")) including their upper, middle and lower hierachy as given by the KEGG-database and the corresponding annotated metabolites was queried.
data("modules_compounds")data("modules_compounds")
A data frame with 1988 observations on the following 6 variables.
KEGGKEGG ID of a metabolite annotated to a functional module
upper_hierarchyname of the highest hierachy level of module organization
middle_hierarchyname of the middle hierachy = functional module, p.e. "Amino acid metabolism"
lower_hierarchyname of the lowest level of modules, this often contain only a couple pathways p.e. "Arginine and proline metabolism"
module_idthe ID of the KEGG functional module
module_namename of the KEGG module
https://www.genome.jp/kegg/module.html
Testing the hypothesis that certain KEGG modules are over-represented in clusters of metabolites. A module is considered over-represented in a cluster the number of metabolites in a cluster being annotated to a functional module (n_obs) is higher than the expected number of metabolites in a cluster of this size being annotated to a functional module (n_theo). We can calculate the OvE (Observed versus Expected = n_obs/n_theo) and show the probabilities of these ratios. log(p(OvE))>0 indicates an over-representation of the functional module in the cluster, log(p(OvE))<0 an under-representation.
ORA_hypergeometric( background, annotations, data, IDs, tested_column = "middle_hierarchy" )ORA_hypergeometric( background, annotations, data, IDs, tested_column = "middle_hierarchy" )
background |
dataframe that contains KEGG IDs of metabolites that are assigned to functional modules, is incorporated in the package modules_compounds |
annotations |
dataframe tha contains information to which functional modules our experimental metabolites are annotated in KEGG, can be constructed by filtering the provided KEGG background modules_compounds for the experimental metabolites |
data |
result of |
IDs |
dataframe with two columns 'metabolite' and 'KEGG' mapping KEGG IDs to metabolites.
If function |
tested_column |
column that is in background and annotations and on which the hypergeometric model will be executed |
a dataframe containing the ORA results or if data is SummarizedExperiment SummarizedExperiment object the output is stored in metadata(data) under "ORA_tested_column"
Function to obtain data cluster_dynamics()
Function to visualize ORA results plot_ORA()
data("longitudinalMetabolomics") data("modules_compounds") head(modules_compounds) data("metabolite_modules") head(metabolite_modules) data("IDs") head(IDs) longitudinalMetabolomics <- ORA_hypergeometric( data = longitudinalMetabolomics, annotations = metabolite_modules, background = modules_compounds, IDs = IDs, tested_column = "middle_hierarchy" ) S4Vectors::metadata(longitudinalMetabolomics)[["ORA_middle_hierarchy"]]data("longitudinalMetabolomics") data("modules_compounds") head(modules_compounds) data("metabolite_modules") head(metabolite_modules) data("IDs") head(IDs) longitudinalMetabolomics <- ORA_hypergeometric( data = longitudinalMetabolomics, annotations = metabolite_modules, background = modules_compounds, IDs = IDs, tested_column = "middle_hierarchy" ) S4Vectors::metadata(longitudinalMetabolomics)[["ORA_middle_hierarchy"]]
visualize clustering solution of cluster_dynamics()
plot_cluster(data)plot_cluster(data)
data |
result of |
a list of plots. Per experimental condition: 1) a 'bubbletree': a phylogram with numbers on nodes indicating in how many bootstraps of the posterior estimates the same clustering solution was generated, 2) cluster affiliation of metabolites, 3) dynamics of metabolites per cluster, 4) patchwork of 1-3, 5) order of the tips in bubbletree: needed for matching cluster plots and ORA
data("longitudinalMetabolomics") plots <- plot_cluster(longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A"]) plots[["trees"]][["A"]]data("longitudinalMetabolomics") plots <- plot_cluster(longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A"]) plots[["trees"]][["A"]]
Plot diagnostic criteria of numerical fit of Bayesian model of dynamics
plot_diagnostics( data, assay = "scaled_log", diagnostics = metadata(data)[["diagnostics_dynamics"]][["model_diagnostics"]], divergences = TRUE, max_treedepth = TRUE, Rhat = TRUE, n_eff = TRUE )plot_diagnostics( data, assay = "scaled_log", diagnostics = metadata(data)[["diagnostics_dynamics"]][["model_diagnostics"]], divergences = TRUE, max_treedepth = TRUE, Rhat = TRUE, n_eff = TRUE )
data |
dataframe or colData of a SummarizedExperiment used to fit dynamics model must contain column "time" |
assay |
of the SummarizedExperiment object that was used to fit the dynamics model |
diagnostics |
dataframe containing diagnostics criteria from the numerical fit of Bayesian model of dynamics obtained by function diagnostics_dynamics() |
divergences |
should number of divergent transitions be visualized? |
max_treedepth |
should number of exceeded maximum treedepth be visualized? |
Rhat |
should Rhat be visualized? |
n_eff |
should number of effective samples be visualized? |
plots of diagnostic criteria of numerical fit of Bayesian model of dynamics
parent function diagnostics_dynamics()
visualization function for posterior predictive check plot_PPC()
data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite %in% c("ATP", "ADP")] data <- fit_dynamics_model( model = "scaled_log", data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- diagnostics_dynamics( data = data, assay = "scaled_log", iter = 2000, chains = 1, fit = metadata(data)[["dynamic_fit"]] ) plot_diagnostics(data = data, assay = "scaled_log")data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite %in% c("ATP", "ADP")] data <- fit_dynamics_model( model = "scaled_log", data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- diagnostics_dynamics( data = data, assay = "scaled_log", iter = 2000, chains = 1, fit = metadata(data)[["dynamic_fit"]] ) plot_diagnostics(data = data, assay = "scaled_log")
Visualization of parameter estimates from numeric fit of Bayesian model of dynamics
plot_estimates( data = NULL, estimates = metadata(data)[["estimates_dynamics"]], delta_t = TRUE, dynamics = TRUE, distance_conditions = TRUE )plot_estimates( data = NULL, estimates = metadata(data)[["estimates_dynamics"]], delta_t = TRUE, dynamics = TRUE, distance_conditions = TRUE )
data |
SummarizedExperiment used to fit dynamics model and extract the estimates |
estimates |
a list of data frames (elements: mu, sigma, lambda, euclidean_distance) that contains the model estimates by estimates_dynamics() or if data is a SummarizedExperiment estimates must be stored in metadata(data) under "estimates_dynamics" |
delta_t |
should differences between time points be plotted? |
dynamics |
should dynamics be plotted? |
distance_conditions |
should differences in metabolite specific dynamic should be plotted? |
Visualization of differences between time points(delta_t) and dynamics profiles of single metabolites
parent function estimates_dynamics()
data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite %in% c("ATP")] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) plot_estimates(data = data, delta_t = TRUE, dynamic = FALSE, distance_conditions = FALSE) plot_estimates(data = data, delta_t = FALSE, dynamic = TRUE, distance_conditions = FALSE) plot_estimates(data = data, delta_t = FALSE, dynamic = FALSE, distance_conditions = TRUE)data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition %in% c("A", "B") & longitudinalMetabolomics$metabolite %in% c("ATP")] data <- fit_dynamics_model( data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- estimates_dynamics( data = data ) plot_estimates(data = data, delta_t = TRUE, dynamic = FALSE, distance_conditions = FALSE) plot_estimates(data = data, delta_t = FALSE, dynamic = TRUE, distance_conditions = FALSE) plot_estimates(data = data, delta_t = FALSE, dynamic = FALSE, distance_conditions = TRUE)
Plot results of over-representation analysis with ORA_hypergeometric()
plot_ORA( data, tested_column = "middle_hierarchy", patchwork = FALSE, plot_cluster = NULL )plot_ORA( data, tested_column = "middle_hierarchy", patchwork = FALSE, plot_cluster = NULL )
data |
result dataframe from |
tested_column |
KEGG module hierarchy level on which ORA was executed |
patchwork |
should result be patchworked to results of |
plot_cluster |
if patchwork = TRUE this needs to be the result of |
a plot of the over-representation analysis and lsit of plots suitable to patchwork with cluster visualization if patchwork=TRUE
do over-represenation analysis of KEGG functional modules ORA_hypergeometric()
data("longitudinalMetabolomics") data("modules_compounds") head(modules_compounds) data("metabolite_modules") head(metabolite_modules) data("IDs") head(IDs) # middly hierachy longitudinalMetabolomics <- ORA_hypergeometric( data = longitudinalMetabolomics, annotations = metabolite_modules, background = modules_compounds, tested_column = "middle_hierarchy", IDs = IDs ) plot_ORA(longitudinalMetabolomics)data("longitudinalMetabolomics") data("modules_compounds") head(modules_compounds) data("metabolite_modules") head(metabolite_modules) data("IDs") head(IDs) # middly hierachy longitudinalMetabolomics <- ORA_hypergeometric( data = longitudinalMetabolomics, annotations = metabolite_modules, background = modules_compounds, tested_column = "middle_hierarchy", IDs = IDs ) plot_ORA(longitudinalMetabolomics)
Plots posterior predictive check of numerical fit of Bayesian dynamics model
plot_PPC( posterior = metadata(data)[["diagnostics_dynamics"]], data, assay = "scaled_log", scaled_measurement = "scaled_measurement" )plot_PPC( posterior = metadata(data)[["diagnostics_dynamics"]], data, assay = "scaled_log", scaled_measurement = "scaled_measurement" )
posterior |
a dataframe that contains necessary information for Posterior predictive check obtained by function diagnostics_dynamics()(named "PPC_condition") |
data |
dataframe or colData of a SummarizedExperiment used to fit dynamics model |
assay |
of the SummarizedExperiment object that was used to fit the dynamics model |
scaled_measurement |
column name of concentration values used to model fit, should be normalized by experimental condition and metabolite to mean of zero and standard deviation of one |
a list of visual posterior predictive check, one per experimental condition
parent function diagnostics_dynamics()
visualization function for diagnostics plot_diagnostics()
data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite %in% c("ATP", "ADP")] data <- fit_dynamics_model( model = "scaled_log", data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- diagnostics_dynamics( data = data, assay = "scaled_log", iter = 2000, chains = 1, fit = metadata(data)[["dynamic_fit"]] ) plot_PPC( data = data, assay = "scaled_log" )data("longitudinalMetabolomics") data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" & longitudinalMetabolomics$metabolite %in% c("ATP", "ADP")] data <- fit_dynamics_model( model = "scaled_log", data = data, scaled_measurement = "m_scaled", assay = "scaled_log", max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1 ) data <- diagnostics_dynamics( data = data, assay = "scaled_log", iter = 2000, chains = 1, fit = metadata(data)[["dynamic_fit"]] ) plot_PPC( data = data, assay = "scaled_log" )