Package 'MetaboDynamics' reference manual

Title:	Bayesian analysis of longitudinal metabolomics data
Description:	MetaboDynamics is an R-package that provides a framework of probabilistic models to analyze longitudinal metabolomics data. It enables robust estimation of mean concentrations despite varying spread between timepoints and reports differences between timepoints as well as metabolite specific dynamics profiles that can be used for identifying "dynamics clusters" of metabolites of similar dynamics. Provides probabilistic over-representation analysis of KEGG functional modules and pathways as well as comparison between clusters of different experimental conditions.
Authors:	Katja Danielzik [aut, cre] , Simo Kitanovski [ctb] , Johann Matschke [ctb] , Daniel Hoffmann [ctb]
Maintainer:	Katja Danielzik <katja.danielzik@uni-due.de>
License:	GPL (>= 3)
Version:	0.99.23
Built:	2025-03-27 03:39:16 UTC
Source:	https://github.com/bioc/MetaboDynamics

Comparison of metabolite dynamics clusters under different experimental conditions

Description

Employs a Bayesian model that assumes a normal distribution of Euclidean distances between dynamics vectors (metabolite concentrations at different time points) of two clusters that come from different experimental conditions to estimate the mean distance between clusters.
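
The distances that enter the model can be illustrated with a small base R sketch. It assumes that every dynamics vector of one cluster is compared with every dynamics vector of the other cluster; the toy matrices and the helper `pairwise_distances()` are hypothetical and not part of the package, and the sample mean shown at the end is only a naive stand-in for the Bayesian estimate.

```r
# Hypothetical toy data: rows are metabolites, columns are time points.
cluster_a <- rbind(c(0, 1, 2, 1), c(0, 1, 2, 2))
cluster_b <- rbind(c(1, 1, 0, 0), c(2, 1, 0, 0), c(1, 2, 0, 0))

# Euclidean distance between every vector in x and every vector in y
# (illustrative helper, not a MetaboDynamics function).
pairwise_distances <- function(x, y) {
  out <- matrix(NA_real_, nrow = nrow(x), ncol = nrow(y))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      out[i, j] <- sqrt(sum((x[i, ] - y[j, ])^2))
    }
  }
  out
}

distances <- pairwise_distances(cluster_a, cluster_b)
mean_distance <- mean(distances) # plain sample mean, not a posterior estimate
```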

Usage

compare_dynamics(data, dynamics, cores = 4)

Arguments

`data`	a dataframe containing a column specifying the metabolite names to be compared, cluster IDs (column named "cluster") of clusters of similar dynamics, and a column "condition" specifying the experimental conditions to be compared, or a SummarizedExperiment storing the same information in metadata(data) under "cluster"
`dynamics`	vector specifying the column names of the cluster dataframe that hold the dynamics information
`cores`	how many cores should be used for model fitting; this parallelizes the model fitting and therefore speeds it up; default=4

Value

a list holding 1) the model fit and 2) a dataframe of estimates of the mean distance between clusters of different experimental conditions ("mean") and the standard deviation ("sigma"). If data input was a SummarizedExperiment, results are stored in metadata(data) under "comparison_dynamics"

Examples

data("longitudinalMetabolomics")
longitudinalMetabolomics <- compare_dynamics(
  data = longitudinalMetabolomics,
  dynamics = c("mu1_mean", "mu2_mean", "mu3_mean", "mu4_mean"),
  cores = 1
)
S4Vectors::metadata(longitudinalMetabolomics)[["comparison_dynamics"]]

Comparison of metabolite sets between dynamics clusters of different experimental conditions

Description

Uses the Jaccard Index to compare metabolite names between dynamics clusters of different experimental conditions
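
As a minimal sketch of the measure itself: the Jaccard index of two sets is the size of their intersection divided by the size of their union. The helper `jaccard_index()` and the metabolite names below are illustrative only and are not the package's internal implementation.

```r
# Jaccard index of two character vectors treated as sets
# (illustrative helper, not a MetaboDynamics function).
jaccard_index <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

cluster_1 <- c("ATP", "ADP", "AMP")
cluster_2 <- c("ATP", "ADP", "GTP", "GDP")
jaccard_index(cluster_1, cluster_2) # 2 shared / 5 distinct = 0.4
```

A value of 1 means both clusters contain exactly the same metabolites; a value of 0 means they share none.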

Usage

compare_metabolites(data, metabolite = "metabolite")

Arguments

`data`	a dataframe containing a column specifying the metabolite names to be compared, cluster IDs (column named "cluster") of clusters of similar dynamics, and a column "condition" specifying the experimental conditions to be compared, or a SummarizedExperiment storing the same information in metadata(data) under "cluster"
`metabolite`	column in "data" that specifies either metabolite name or KEGG ID or some other identifier

Value

a dataframe of Jaccard indices between clusters. If data input was a SummarizedExperiment, results are stored in metadata(data) under "comparison_metabolites"

Examples

data("longitudinalMetabolomics")
longitudinalMetabolomics <- compare_metabolites(
  data = longitudinalMetabolomics
)
S4Vectors::metadata(longitudinalMetabolomics)[["comparison_metabolites"]]

Extracts diagnostic criteria from numeric fit of Bayesian model of dynamics

Description

Gathers the number of divergences, rhat values, and number of effective samples (n_eff), and provides plots for diagnostics criteria as well as posterior predictive checks. Dataframe "model_diagnostics" contains information about experimental condition, number of divergent transitions, and rhat and n_eff values for all timepoints.

Usage

diagnostics_dynamics(
  data,
  assay = "scaled_log",
  iter = 2000,
  warmup = iter/4,
  chains = 4,
  fits = metadata(data)[["dynamic_fits"]]
)

Arguments

`data`	dataframe or SummarizedExperiment used to fit the dynamics model; must contain a column "time" that holds time as numeric
`assay`	of the SummarizedExperiment object that was used to fit the dynamics model
`iter`	number of iterations used for model fit
`warmup`	number of warmup iterations used for model fit
`chains`	number of chains used for model fit
`fits`	list of model fits for which diagnostics should be extracted; this is the object returned by fit_dynamics_model(), or, if data is a SummarizedExperiment object, the results of fit_dynamics_model() stored in metadata(data) under "dynamic_fits"

Value

a list which contains diagnostics criteria of all conditions in a dataframe (named "model_diagnostics") and one dataframe per condition that contains the information necessary for posterior predictive checks (named "PPC_condition"). If data is a SummarizedExperiment object the diagnostics are stored in metadata(data) under "diagnostics_dynamics"

Examples

data("longitudinalMetabolomics")
data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" &
  longitudinalMetabolomics$metabolite == "ATP"]
data <- fit_dynamics_model(
  data = data,
  scaled_measurement = "m_scaled", assay = "scaled_log",
  max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1
)
data <- diagnostics_dynamics(
  data = data, assay = "scaled_log",
  iter = 2000, chains = 1,
  fits = metadata(data)[["dynamic_fits"]]
)
S4Vectors::metadata(data)[["diagnostics_dynamics"]][["model_diagnostics"]]
S4Vectors::metadata(data)[["diagnostics_dynamics"]][["posterior_A"]]

Extracts parameter estimates from numeric fit of Bayesian model of dynamics

Description

Extracts the mean concentrations (mu) at every timepoint from the dynamics model fit, the 95% highest density interval (HDI), the estimated standard deviation of metabolite concentrations at every time point (sigma), and the pooled standard deviation of every metabolite over all timepoints (lambda). Additionally, samples from the posterior of mu can be drawn. This can be helpful, e.g., for estimating the clustering precision. Lambda can be used for clustering algorithms such as VSClust that also take the variance into account.
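
For readers unfamiliar with the HDI, the following base R sketch shows one common way to compute it from posterior draws: it is the narrowest interval that contains the requested share of the samples. The helper `hdi()` and the simulated draws are assumptions for illustration; how the package computes its intervals internally is not documented here.

```r
# Narrowest interval containing `mass` of the samples
# (illustrative helper, not a MetaboDynamics function).
hdi <- function(samples, mass = 0.95) {
  sorted <- sort(samples)
  n <- length(sorted)
  n_in <- ceiling(mass * n)    # number of samples inside the interval
  n_candidates <- n - n_in + 1 # number of candidate intervals
  starts <- seq_len(n_candidates)
  widths <- sorted[starts + n_in - 1] - sorted[starts]
  best <- which.min(widths)    # start index of the narrowest interval
  c(lower = sorted[best], upper = sorted[best + n_in - 1])
}

set.seed(1)
posterior_mu <- rnorm(4000, mean = 0.5, sd = 0.2) # stand-in for posterior draws of mu
hdi(posterior_mu)
```

For a symmetric, unimodal posterior the HDI is close to the central quantile interval; for skewed posteriors the two can differ noticeably.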

Usage

estimates_dynamics(
  data,
  assay = "scaled_log",
  kegg = "KEGG",
  condition = "condition",
  time = "time",
  fits = metadata(data)[["dynamic_fits"]],
  iter = 2000,
  warmup = iter/4,
  chains = 4,
  samples = 1
)

Arguments

`data`	dataframe or colData of a SummarizedExperiment used to fit the dynamics model; must contain a column specifying KEGG IDs, a column named "condition" specifying the experimental condition, and a column named "time" specifying the timepoints. If it is a SummarizedExperiment object the dynamic fits must be stored in metadata(data) under "dynamic_fits"
`assay`	of the SummarizedExperiment object that was used to fit the dynamics model
`kegg`	column in "data" that contains the KEGG IDs or other identifier of metabolites
`condition`	name of column in dataframe data that specifies the experimental condition
`time`	column in "data" that contains the time point identifiers
`fits`	list of model fits for which estimates should be extracted
`iter`	how many iterations were used to fit the dynamics model
`warmup`	how many warm-up iterations were used to fit the dynamics model
`chains`	how many chains were used to fit the dynamics model
`samples`	how many posterior samples should be drawn (e.g. for a check of clustering precision)

Value

a list of dataframes (one per experimental condition) that contain the estimates at the timepoints and samples from the posterior (number as specified in samples). If the dataset has more than one time point, delta_mu specifies the difference between the time point specified in column "time.ID" and the subsequent time point (delta_mu in row time.ID=1: mu(time point 2) - mu(time point 1)). If data is a SummarizedExperiment object the estimates are stored in metadata(data) under "estimates_dynamics"
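
The delta_mu convention can be sketched in base R as follows. The values of `mu` are made up, and filling the last row with NA (because there is no subsequent time point) is an assumption for illustration, not documented package behaviour.

```r
mu <- c(0.2, 0.9, 0.4, -0.1) # hypothetical mean estimates at time points 1 to 4

# Row i holds mu(time point i + 1) - mu(time point i);
# the last time point has no successor (assumed NA here).
delta_mu <- c(diff(mu), NA)

data.frame(time.ID = seq_along(mu), mu_mean = mu, delta_mu = delta_mu)
```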

Examples

data("longitudinalMetabolomics")
data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" &
  longitudinalMetabolomics$metabolite == "ATP"]
data <- fit_dynamics_model(
  data = data,
  scaled_measurement = "m_scaled", assay = "scaled_log",
  max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1
)
data <- estimates_dynamics(
  data = data, iter = 2000,
  chains = 1, condition = "condition"
)

Fits dynamics model

Description

Employs a hierarchical model that assumes a normal distribution of standardized (mean=0, sd=1) log(cpc) (cpc = concentration per cell) values for robust estimation of mean concentrations over time of single metabolites at single experimental conditions.
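
The expected input scaling can be sketched in base R. The example below log-transforms made-up concentrations and standardizes them per metabolite and experimental condition, mirroring the `log_m` and `m_scaled` columns of the package's simulated data set; the toy data frame `df` and its values are assumptions for illustration.

```r
set.seed(42)
# Hypothetical concentration table: 2 metabolites x 2 conditions x 6 measurements
df <- data.frame(
  metabolite = rep(c("ATP", "ADP"), each = 12),
  condition = rep(rep(c("A", "B"), each = 6), times = 2),
  measurement = rlnorm(24, meanlog = 2, sdlog = 0.5)
)

df$log_m <- log10(df$measurement)

# Center and scale within every metabolite/condition combination (mean = 0, sd = 1)
group <- interaction(df$metabolite, df$condition)
df$m_scaled <- ave(df$log_m, group, FUN = function(x) (x - mean(x)) / sd(x))
```

After this step every metabolite has mean 0 and standard deviation 1 within each condition, so the model compares shapes of dynamics rather than absolute concentration levels.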

Usage

fit_dynamics_model(
  data,
  metabolite = "metabolite",
  time = "time",
  condition = "condition",
  scaled_measurement = "m_scaled",
  assay = "scaled_log",
  chains = 4,
  cores = 4,
  adapt_delta = 0.95,
  max_treedepth = 10,
  iter = 2000,
  warmup = iter/4
)

Arguments

`data`	concentration table with at least three replicate measurements per metabolite containing the columns "metabolite", "condition", and "m_scaled" by default, or colData of a SummarizedExperiment object
`metabolite`	column of "data" that contains the metabolite names or IDs
`time`	column of "data" that contains time as numeric; make sure your time column is ordered from lowest to highest for the model to work
`condition`	column of "data" that contains the experimental conditions
`scaled_measurement`	column of "data" that contains the concentrations per cell, centered and normalized per metabolite and experimental condition (mean=0, sd=1)
`assay`	if input is a SummarizedExperiment, specify the assay that should be used for input; colData has to hold the columns "condition" and "metabolite", rowData the timepoint specifications
`chains`	how many Markov chains should be used for model fitting, use at least two, default=4
`cores`	how many cores should be used for model fitting; this parallelizes the model fitting and therefore speeds it up; default=4
`adapt_delta`	target average acceptance probability, can be adapted if divergent transitions are reported, default is 0.95
`max_treedepth`	can be adapted if model throws warnings about hitting max_treedepth, warnings are mostly efficiency not validity concerns and treedepth can be raised, default=10
`iter`	how many iterations are run; increasing might help if the effective sample size is too low, default=2000
`warmup`	how many iterations the model warms up for; increasing might improve efficiency, must be at least 25% of iter, default=iter/4

Value

returns a list of model fits. One model fit named "condition" per experimental condition. If input is a SummarizedExperiment object the dynamic fits are stored in metadata(data) under "dynamic_fits"

Examples

data("longitudinalMetabolomics")
data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" &
  longitudinalMetabolomics$metabolite == "ATP"]
data <- fit_dynamics_model(
  data = data,
  scaled_measurement = "m_scaled", assay = "scaled_log",
  max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1
)
S4Vectors::metadata(data)[["dynamic_fits"]]

Retrieve background and annotation information for over-representation analysis (ORA)

Description

Uses the package KEGGREST to retrieve background and annotation information needed for over-representation analysis. As KEGGREST only allows 10 queries per second this might take some time to run, depending on the size of the dataset and organism. The user should check afterwards if all functional modules are applicable for the analysis of the dataset (e.g. organism, tissue).

Usage

get_ORA_annotations(
  data,
  kegg = "KEGG",
  metabolite_name = "metabolite",
  update_background = TRUE
)

Arguments

`data`	data frame to be analyzed with ORA. Must at least contain a column with KEGG IDs or a SummarizedExperiment where the metabolite names or IDs are stored in colData
`kegg`	column name of "data" that holds KEGG IDs
`metabolite_name`	column name of "data" that holds metabolite names
`update_background`	logical. Should the background information be updated? Should be set to TRUE if this is the first time using this function. If TRUE this may take some time.

Value

a list with dataframes "background" and "annotation" needed for ORA. If data is a SummarizedExperiment object, annotations are stored in metadata(data) under "KEGG_annotations"

Examples

data("longitudinalMetabolomics")
data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" &
  longitudinalMetabolomics$metabolite == "ATP"]
data <- get_ORA_annotations(
  data = data,
  kegg = "KEGG",
  metabolite_name = "metabolite",
  update_background = FALSE
)
S4Vectors::metadata(data)[["KEGG_annotations"]]

Plot bubble heatmap from the numerical fit of compare_dynamics()

Description

Plots a bubble heatmap from the numerical fit of compare_dynamics().

Usage

heatmap_dynamics(
  estimates = metadata(data)[["comparison_dynamics"]][["estimates"]],
  data
)

Arguments

`estimates`	dataframe of estimates of the mean distance between clusters of different experimental conditions ("mean") and the standard deviation ("sigma") obtained by function compare_dynamics()
`data`	a dataframe containing a column specifying the metabolite names to be compared, cluster IDs (column named "cluster") of clusters of similar dynamics, and a column "condition" specifying the experimental conditions to be compared, or a SummarizedExperiment storing the same information in metadata(data) under "cluster"

Value

a bubble heat map where the color of the bubble represents the similarity of two clusters with regard to their dynamics and the size represents the uncertainty of the similarity. Big bright bubbles mean high similarity with low uncertainty.

Examples

data("longitudinalMetabolomics")
longitudinalMetabolomics <- compare_dynamics(
  data = longitudinalMetabolomics,
  dynamics = c("mu1_mean", "mu2_mean", "mu3_mean", "mu4_mean"),
  cores = 1
)
heatmap_dynamics(data = longitudinalMetabolomics)

Plot heatmap from comparison of metabolite composition compare_metabolites()

Description

Plots a heatmap from the comparison of metabolite composition obtained with compare_metabolites().

Usage

heatmap_metabolites(
  distances = metadata(data)[["comparison_metabolites"]],
  data
)

Arguments

`distances`	dataframe of Jaccard indices between clusters obtained by function compare_metabolites(). If compare_metabolites() was executed on a SummarizedExperiment, this is stored in metadata(data) under "comparison_metabolites"
`data`	a dataframe containing the columns "metabolite" specifying the metabolite names to be compared and cluster IDs (column named "cluster") of clusters of similar dynamics, as well as a column "condition" specifying the experimental conditions to be compared

Value

a heatmap where the color of the tile represents the similarity of two clusters in regards to their metabolite composition. The brighter the color the more similar the metabolite compositions.

Examples

data("longitudinalMetabolomics")
longitudinalMetabolomics <- compare_metabolites(
  data = longitudinalMetabolomics
)
heatmap_metabolites(data = longitudinalMetabolomics)

A simulated data set of longitudinal concentration tables of metabolites.

Description

A simulated data set of 98 metabolites, with 3 replicate measurements at each of 4 time points under 3 experimental conditions. Metabolites are in 8 dynamics groups per experimental condition; 4 groups have varying dynamics between conditions. The data are represented as a SummarizedExperiment object: concentration tables of each experimental condition are stored in assays (raw concentrations in "concentrations", log-transformed concentrations in "log_con", and scaled log-transformed concentrations in "scaled_log"); metabolite names, KEGG IDs, experimental conditions, and clustering solutions per experimental condition are stored in colData; and timepoint specifications are stored in rowData.

Usage

data("longitudinalMetabolomics")

Format

A SummarizedExperiment object with concentration tables in assays. RowData contains the time point specification. ColData as specified below.

condition: experimental condition
metabolite: metabolite name
KEGG: KEGG ID of metabolites
replicate: column that specifies the measurement replicate
cluster: cluster ID that is condition specific for every metabolite

Source

Script used to create simulated data

# load KEGG database for assignment of metabolite names:
data("metabolite_modules")

# metabolite_db <- metabolite_modules
# Group <- middle_hierarchy

library(dplyr)
library(tidyr) # provides pivot_wider(), used further below
library(SummarizedExperiment)

# Parameters

n_features <- 98

n_groups <- 8 # Number of groups (randomly choose between 6-8)

n_time_points <- 4 # Number of time points

n_replicates <- 3 # Number of replicates for all features and time points

n_conditions <- 3 # Number of experimental conditions

x_varying_groups <- 4 # Number of groups with varying dynamics across conditions

condition_names <- c("A","B","C")

# Probability matrix for assigning metabolites from different database groups to dynamic groups
# For simplicity, we assume equal probability; customize as needed

group_probabilities <- matrix(
  c(
    0.8, rep(0.01, 7), # amino acid metabolism
    rep(0.01, 7), 0.8, # nucleotide metabolism
    0.1, 0.8, 0.8, rep(0.1, 5), # energy and carbohydrate metabolism
    runif(5 * length(unique(metabolite_modules$middle_hierarchy)))
  ),
  nrow = n_groups, ncol = length(unique(metabolite_modules$middle_hierarchy))
)

# Generate group dynamics (base trends over time) for each condition

group_dynamics <- list()

# Define the base group dynamics for condition 1

group_dynamics[[1]] <- lapply(1:n_groups, function(g) {
  trend <- rnorm(n_time_points, mean = g * 2, sd = 0.5)
  return(trend)
})

# Define varying dynamics for selected groups across other conditions

varying_groups <- sample(1:n_groups, x_varying_groups, replace = FALSE)

for (cond in 2:n_conditions) {
  group_dynamics[[cond]] <- group_dynamics[[1]]
  for (g in varying_groups) {
    group_dynamics[[cond]][[g]] <- rnorm(n_time_points, mean = g * 2, sd = 1)
  }
}

# Assign each feature to a group

feature_to_group <- sample(1:n_groups, n_features, replace = TRUE)

# Initialize a list to store the simulated data

simulated_data <- list()

# Assign metabolite names to features

available_metabolites <- metabolite_modules # Copy of metabolite database to keep track of unused names

# Simulate data for each feature across all conditions

for (feature in 1:n_features) {
  # Get the group for this feature
  group <- feature_to_group[feature]

  # Determine probability of each metabolite database group for this dynamic group
  group_probs <- group_probabilities[group, ]

  # Subset the metabolite database for selection based on group probabilities
  metabolite_candidates <- available_metabolites %>%
    group_by(middle_hierarchy) %>%
    mutate(Probability = group_probs[match(middle_hierarchy, unique(metabolite_modules$middle_hierarchy))]) %>%
    ungroup() %>%
    filter(metabolite

  # Randomly sample a metabolite based on these probabilities
  metabolite_name <- sample(metabolite_candidates$metabolite, 1, prob = metabolite_candidates$Probability)

  # Remove this metabolite from available pool
  available_metabolites <- available_metabolites[available_metabolites$metabolite != metabolite_name, ]

  # Generate a random base mean for this feature between 0.001 and 1000
  base_mean <- runif(1, min = 0.001, max = 1000)

  # Generate feature-specific variances for each time point
  feature_variances <- runif(n_time_points, min = 0.1, max = 2)

  # Store data for each condition
  for (cond in 1:n_conditions) {
    trend <- group_dynamics[[cond]][[group]]
    feature_means <- base_mean * trend / max(abs(trend))

    feature_data <- data.frame(
      metabolite = metabolite_name, # Assign metabolite name here
      condition = paste0(condition_names[[cond]]),
      time = rep(1:n_time_points, each = n_replicates),
      replicate = rep(1:n_replicates, times = n_time_points)
    )

    # Generate the actual data points with strictly positive concentrations
    feature_data$measurement <- unlist(lapply(1:n_time_points, function(t) {
      rlnorm(n_replicates, meanlog = log(feature_means[t]), sdlog = feature_variances[t])
    }))

    simulated_data[[length(simulated_data) + 1]] <- feature_data
  }
}

rm(base_mean,cond,feature,feature_means,feature_to_group,feature_variances, g,group,group_probs,metabolite_name,n_conditions,n_features,n_groups,n_replicates, n_time_points,trend,varying_groups,x_varying_groups,available_metabolites,feature_data,group_dynamics, group_probabilities,metabolite_candidates)

# Combine all features and conditions into one data frame

simulated_data_df <- do.call(rbind, simulated_data)

simulated_data_df <- simulated_data_df %>%
  group_by(metabolite, condition) %>%
  mutate(
    log_m = log10(measurement),
    m_scaled = (log_m - mean(log_m)) / sd(log_m)
  )

# add KEGG IDs
name_map_HMDB_CAS <- readr::read_csv("name_map_HMDB_CAS.csv")
longitudinalMetabolomics <- dplyr::left_join(longitudinalMetabolomics, name_map_HMDB_CAS[, c("Query", "KEGG")], join_by("metabolite" == "Query"))

## concentrations
temp <- longitudinalMetabolomics
temp <- temp %>%
  select(condition, metabolite, KEGG, time, measurement, replicate) %>%
  pivot_wider(id_cols = c(condition, metabolite, KEGG, replicate), names_from = time, values_from = measurement)
concentrations <- temp
## transform matrix so that conditions are in columns to facilitate access
## with colData -> se[, se$condition == "A"]
concentrations <- t(as.matrix(concentrations))
row.names(concentrations) <- NULL

# prepare log_transformed data
temp <- data
temp <- temp %>%
  select(condition, metabolite, KEGG, time, log_m, replicate) %>%
  pivot_wider(id_cols = c(condition, metabolite, KEGG, replicate), names_from = time, values_from = log_m)
log_con <- temp
log_con <- t(as.matrix(log_con))
row.names(log_con) <- NULL

# prepare scaled log_transformed data
temp <- data
temp <- temp %>%
  select(condition, metabolite, KEGG, time, m_scaled, replicate) %>%
  pivot_wider(id_cols = c(condition, metabolite, KEGG, replicate), names_from = time, values_from = m_scaled)
scaled_log <- temp
scaled_log <- t(as.matrix(scaled_log))
row.names(scaled_log) <- NULL

# prepare row and colData ####
row_data <- DataFrame(time_points = c("time_point_1", "time_point_2", "time_point_3", "time_point_4"))
col_data <- DataFrame(
  condition = temp$condition, metabolite = temp$metabolite,
  KEGG = temp$KEGG, replicate = temp$replicate
)

se <- SummarizedExperiment(
  assays = SimpleList(concentrations = concentrations, log_con = log_con, scaled_log = scaled_log),
  rowData = row_data, colData = col_data
)
# set row and colnames ####
rownames(se) <- c("time_point_1", "time_point_2", "time_point_3", "time_point_4")
colnames(se) <- temp$metabolite

# add metadata ####
metadata(se)[["data origin"]] <- "Simulated data of 98 metabolites with three concentration observations at four time points and at three different biological conditions. Script to construct dataset can be seen with ?longitudinalMetabolomics"

# add clustering solution
# get distances between vectors
clust_A <- metadata(data)[["estimates_dynamics"]][["A"]]
clust_A <- clust_A %>%
  select(metabolite.ID, condition, time.ID, mu_mean) %>%
  pivot_wider(id_cols = c(metabolite.ID, condition), names_from = time.ID, values_from = mu_mean)

dist <- clust_A
dd_A <- dist(dist, method = "euclidean")
# hierarchical clustering
clust <- hclust(dd_A, method = "ward.D2")
clust_cut <- cutree(clust, k = 8)
# adding cluster ID to estimates
clust_A$cluster <- clust_cut

# get distances between vectors
clust_B <- metadata(data)[["estimates_dynamics"]][["B"]]
clust_B <- clust_B %>%
  select(metabolite.ID, condition, time.ID, mu_mean) %>%
  pivot_wider(id_cols = c(metabolite.ID, condition), names_from = time.ID, values_from = mu_mean)

dist <- clust_B
dd_B <- dist(dist, method = "euclidean")
# hierarchical clustering
clust <- hclust(dd_B, method = "ward.D2")
clust_cut <- cutree(clust, k = 8)
# adding cluster ID to estimates
clust_B$cluster <- clust_cut

# get distances between vectors
clust_C <- metadata(data)[["estimates_dynamics"]][["C"]]
clust_C <- clust_C %>%
  select(metabolite.ID, condition, time.ID, mu_mean) %>%
  pivot_wider(id_cols = c(metabolite.ID, condition), names_from = time.ID, values_from = mu_mean)

dist <- clust_C
dd_C <- dist(dist, method = "euclidean")
# hierarchical clustering
clust <- hclust(dd_C, method = "ward.D2")
clust_cut <- cutree(clust, k = 8)
# adding cluster ID to estimates
clust_C$cluster <- clust_cut

cluster <- rbind(clust_A, clust_B, clust_C)
metadata(se)[["cluster"]] <- cluster
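
The clustering step of the script can be tried in isolation with base R. The following sketch applies the same calls (Euclidean distances, Ward's method, cut into 8 clusters) to a random matrix; the matrix `dynamics` and its row names are hypothetical stand-ins for the estimated mean dynamics and are not taken from the package data.

```r
set.seed(7)
# Hypothetical dynamics matrix: 20 metabolites (rows) x 4 time points (columns)
dynamics <- matrix(
  rnorm(80),
  nrow = 20, ncol = 4,
  dimnames = list(paste0("met_", 1:20), paste0("t", 1:4))
)

dd <- dist(dynamics, method = "euclidean") # pairwise distances between dynamics vectors
clust <- hclust(dd, method = "ward.D2")    # hierarchical clustering
cluster_id <- cutree(clust, k = 8)         # one cluster ID per metabolite

table(cluster_id)
```

The number of clusters (k = 8) matches the number of simulated dynamics groups; for real data it has to be chosen by the user.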

KEGG Query Results of experimental metabolites

Description

Using the package KEGGREST (https://www.bioconductor.org/packages/release/bioc/html/KEGGREST.html), all experimental metabolites (see data("intra")) were queried with their KEGG IDs, and all functional modules to which each metabolite is annotated in the KEGG database were recorded.

Usage

data("metabolite_modules")

Format

A data frame with 348 observations on the following 8 variables.

...1: row number of the dataframe
metabolite: name of the experimental metabolite
KEGG: KEGG ID of the experimental metabolite
module_id: ID of the KEGG module to which the metabolite is annotated
module_name: name of the KEGG module to which the metabolite is annotated
upper_hierarchy: name of the highest hierarchy level of module organisation
middle_hierarchy: name of the middle hierarchy = functional module, e.g. "Amino acid metabolism"
lower_hierarchy: name of the lowest level of modules, this often contains only a couple of pathways, e.g. "Arginine and proline metabolism"

Source

https://www.genome.jp/kegg/module.html

data <- get_ORA_annotations(longitudinalMetabolomics, update_background = TRUE)
metabolite_modules <- metadata(data)[["KEGG_annotations"]]$annotations

Background KEGG Query Results Of Functional Modules

Description

Using the package KEGGREST (https://www.bioconductor.org/packages/release/bioc/html/KEGGREST.html), a list of all KEGG modules (keggList("module")), including their upper, middle and lower hierarchy as given by the KEGG database and the corresponding annotated metabolites, was queried.

Usage

data("modules_compounds")

Format

A data frame with 1988 observations on the following 6 variables.

KEGG: KEGG ID of a metabolite annotated to a functional module
upper_hierarchy: name of the highest hierarchy level of module organisation
middle_hierarchy: name of the middle hierarchy = functional module, e.g. "Amino acid metabolism"
lower_hierarchy: name of the lowest level of modules, this often contains only a couple of pathways, e.g. "Arginine and proline metabolism"
module_id: the ID of the KEGG functional module
module_name: name of the KEGG module

Source

https://www.genome.jp/kegg/module.html

data <- get_ORA_annotations(longitudinalMetabolomics, update_background = TRUE)
modules_compounds <- metadata(data)[["KEGG_annotations"]]$background

Over-representation analysis (ORA) with a hypergeometric model

Description

Tests the hypothesis that certain KEGG modules are over-represented in clusters of metabolites. A module is considered over-represented in a cluster if the number of metabolites in the cluster annotated to a functional module (n_obs) is higher than the expected number of metabolites in a cluster of this size annotated to that functional module (n_theo). We can calculate the OvE (Observed versus Expected = n_obs/n_theo) and show the probabilities of these ratios. log(p(OvE))>0 indicates an over-representation of the functional module in the cluster, log(p(OvE))<0 an under-representation.
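
The quantities n_obs, n_theo and OvE can be illustrated with a small base R calculation. All counts below are invented, and the upper-tail probability from `phyper()` is the classical hypergeometric over-representation probability; it is shown only as a point of reference and is not necessarily the quantity p(OvE) that the package reports.

```r
# Hypothetical counts (not taken from the package data)
background_size <- 1988 # metabolites in the background
module_size <- 120      # background metabolites annotated to the module
cluster_size <- 15      # metabolites in the cluster
n_obs <- 4              # cluster metabolites annotated to the module

# Expected number of module members in a cluster of this size
n_theo <- cluster_size * module_size / background_size
OvE <- n_obs / n_theo   # observed versus expected ratio

# Probability of observing at least n_obs module members by chance
p_at_least <- phyper(
  n_obs - 1,
  m = module_size,
  n = background_size - module_size,
  k = cluster_size,
  lower.tail = FALSE
)
```

An OvE above 1 means that more module members were observed in the cluster than expected for a cluster of this size.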

Usage

ORA_hypergeometric(
  background = metadata(data)[["KEGG_annotations"]]$background,
  annotations = metadata(data)[["KEGG_annotations"]]$annotations,
  data,
  tested_column = "middle_hierarchy"
)

Arguments

`background`	dataframe that contains KEGG IDs of metabolites that are assigned to functional modules
`annotations`	dataframe that contains information on which functional modules our experimental metabolites are annotated to in KEGG
`data`	dataframe containing columns "KEGG" specifying the KEGG identifiers of metabolites, "cluster" specifying the cluster ID of metabolites and a column specifying the experimental condition called "condition"; or, if data is a SummarizedExperiment, the clustering solution must be stored in metadata(data) under "cluster"
`tested_column`	column that is in background and annotations and on which the hypergeometric model will be executed

Value

a dataframe containing the ORA results or, if data is a SummarizedExperiment object, the output is stored in metadata(data) under "ORA_tested_column" (e.g. "ORA_middle_hierarchy")

Examples

data("longitudinalMetabolomics")
data("modules_compounds")
head(modules_compounds)
data("metabolite_modules")
head(metabolite_modules)
# middle hierarchy
longitudinalMetabolomics <- ORA_hypergeometric(
  data = longitudinalMetabolomics,
  annotations = metabolite_modules,
  background = modules_compounds,
  tested_column = "middle_hierarchy"
)
S4Vectors::metadata(longitudinalMetabolomics)[["ORA_middle_hierarchy"]]

Plot diagnostic criteria of numerical fit of Bayesian model of dynamics

Description

Plot diagnostic criteria of numerical fit of Bayesian model of dynamics

Usage

plot_diagnostics(
  data,
  assay = "scaled_log",
  diagnostics = metadata(data)[["diagnostics_dynamics"]][["model_diagnostics"]],
  divergences = TRUE,
  max_treedepth = TRUE,
  Rhat = TRUE,
  n_eff = TRUE
)

Arguments

`data`	dataframe or colData of a SummarizedExperiment used to fit the dynamics model; must contain a column "time"
`assay`	of the SummarizedExperiment object that was used to fit the dynamics model
`diagnostics`	dataframe containing diagnostics criteria from the numerical fit of Bayesian model of dynamics obtained by function diagnostics_dynamics()
`divergences`	should number of divergent transitions be visualized?
`max_treedepth`	should the number of transitions that exceeded the maximum treedepth be visualized?
`Rhat`	should Rhat be visualized?
`n_eff`	should number of effective samples be visualized?

Value

plots of diagnostic criteria of numerical fit of Bayesian model of dynamics
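
As a rough illustration of what the Rhat criterion measures, the following base R sketch computes the classic (non-split) potential scale reduction factor from a matrix of simulated draws. It is not the computation used by the package or by Stan, which reports a more refined variant; the function name basic_rhat and the simulated chains are assumptions made for illustration. Note that this quantity needs at least two chains.

```r
set.seed(1)

# Classic Gelman-Rubin statistic; `draws` has one column per chain
basic_rhat <- function(draws) {
  n <- nrow(draws)                      # iterations per chain
  B <- n * var(colMeans(draws))         # between-chain variance
  W <- mean(apply(draws, 2, var))       # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var_plus / W)
}

# Four simulated chains that sample the same distribution
mixed <- matrix(rnorm(4000), ncol = 4)

# Same chains, but the first one is shifted to mimic a chain stuck elsewhere
stuck <- mixed
stuck[, 1] <- stuck[, 1] + 3

rhat_mixed <- basic_rhat(mixed)
rhat_stuck <- basic_rhat(stuck)
print(c(mixed = rhat_mixed, stuck = rhat_stuck))
```

Values close to 1 indicate that the chains agree with each other; the shifted chain pushes the statistic well above 1.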

Examples

data("longitudinalMetabolomics")
data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" &
  longitudinalMetabolomics$metabolite %in% c("ATP", "ADP")]
data <- fit_dynamics_model(
  data = data,
  scaled_measurement = "m_scaled", assay = "scaled_log",
  max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1
)
data <- diagnostics_dynamics(
  data = data, assay = "scaled_log",
  iter = 2000, chains = 1,
  fits = metadata(data)[["dynamic_fits"]]
)
plot_diagnostics(data = data, assay = "scaled_log")

Visualization of parameter estimates from numeric fit of Bayesian model of dynamics

Description

Visualization of parameter estimates from numeric fit of Bayesian model of dynamics

Usage

plot_estimates(
  data,
  estimates = metadata(data)[["estimates_dynamics"]],
  assay = "scaled_log",
  time = "time",
  delta_t = TRUE,
  dynamics = TRUE
)

Arguments

`data`	dataframe or SummarizedExperiment used to fit the dynamics model and extract the estimates
`estimates`	a list of dataframes (one per experimental condition) that contains the estimates at the timepoints and samples from the posterior generated by estimates_dynamics(); or, if data is a SummarizedExperiment, the estimates must be stored in metadata(data) under "estimates_dynamics"
`assay`	of the SummarizedExperiment object that was used to fit the dynamics model
`time`	column in "data" that contains the time point identifiers
`delta_t`	should differences between timepoints be plotted?
`dynamics`	should dynamics be plotted?

Value

Visualization of differences between timepoints (delta_t) and dynamics profiles of single metabolites
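
The idea behind delta_t can be sketched in base R with simulated posterior draws. The sketch assumes that delta_t denotes differences between consecutive time points, computed draw by draw; the matrix mu_draws, its column names and the summary columns are hypothetical and do not reproduce the output format of estimates_dynamics().

```r
set.seed(42)

# Hypothetical posterior draws of the mean at four time points (rows = draws)
mu_draws <- cbind(
  t1 = rnorm(1000, mean = 0, sd = 0.2),
  t2 = rnorm(1000, mean = 1, sd = 0.2),
  t3 = rnorm(1000, mean = 1, sd = 0.2),
  t4 = rnorm(1000, mean = -0.5, sd = 0.2)
)

# Difference between each time point and the previous one, per draw
later   <- mu_draws[, -1]
earlier <- mu_draws[, -ncol(mu_draws)]
delta_t <- later - earlier
colnames(delta_t) <- paste0(colnames(later), "-", colnames(earlier))

# Posterior mean and central 95% interval of every difference
summary_delta <- data.frame(
  comparison = colnames(delta_t),
  mean  = colMeans(delta_t),
  lower = apply(delta_t, 2, quantile, probs = 0.025),
  upper = apply(delta_t, 2, quantile, probs = 0.975),
  row.names = NULL
)
print(summary_delta)
```

An interval that excludes zero suggests a change between the two time points, while an interval that covers zero (here the t3-t2 comparison is simulated without a change) does not.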

Examples

data("longitudinalMetabolomics")
data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" &
  longitudinalMetabolomics$metabolite == "ATP"]
data <- fit_dynamics_model(
  data = data,
  scaled_measurement = "m_scaled", assay = "scaled_log",
  max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1
)
data <- estimates_dynamics(
  data = data, iter = 2000,
  chains = 1, condition = "condition"
)
plot_estimates(data = data, delta_t = FALSE)
plot_estimates(data = data, dynamics = FALSE)

Plot results of over-representation analysis with ORA_hypergeometric()

Description

Plot results of over-representation analysis with ORA_hypergeometric()

Usage

plot_ORA(data, tested_column = "middle_hierarchy")

Arguments

`data`	result dataframe from ORA_hypergeometric() or SummarizedExperiment object where the ORA_hypergeometric() results are stored in metadata(data) under "ORA_tested_column"
`tested_column`	KEGG module hierarchy level on which ORA was executed

Value

a plot of the over-representation analysis

Examples

data("longitudinalMetabolomics")
data("modules_compounds")
head(modules_compounds)
data("metabolite_modules")
head(metabolite_modules)
# middle hierarchy
longitudinalMetabolomics <- ORA_hypergeometric(
  data = longitudinalMetabolomics,
  annotations = metabolite_modules,
  background = modules_compounds,
  tested_column = "middle_hierarchy"
)
plot_ORA(longitudinalMetabolomics)

Plots posterior predictive check of numerical fit of Bayesian dynamics model

Description

Plots posterior predictive check of numerical fit of Bayesian dynamics model

Usage

plot_PPC(
  posterior = metadata(data)[["diagnostics_dynamics"]],
  data,
  assay = "scaled_log",
  scaled_measurement = "scaled_measurement"
)

Arguments

`posterior`	a list of one dataframe per condition (named "PPC_condition") that contains the information necessary for the posterior predictive check, obtained by function diagnostics_dynamics()
`data`	dataframe or colData of a SummarizedExperiment used to fit dynamics model
`assay`	of the SummarizedExperiment object that was used to fit the dynamics model
`scaled_measurement`	column name of the concentration values used for the model fit; these should be normalized by experimental condition and metabolite to a mean of zero and a standard deviation of one

Value

a list of visual posterior predictive checks, one per experimental condition
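
The normalization expected for scaled_measurement can be sketched in base R as follows. The data frame, its column names and the values are hypothetical; the sketch only shows one way to centre and scale within each combination of condition and metabolite, and is not how the package data set was prepared.

```r
# Hypothetical long-format table of log-transformed concentrations
df <- data.frame(
  condition  = rep(c("A", "B"), each = 6),
  metabolite = rep(rep(c("ATP", "ADP"), each = 3), times = 2),
  log_conc   = c(1.0, 2.0, 3.0, 10, 12, 14, 0.5, 0.7, 0.9, 5, 5.5, 6)
)

# Centre and scale within every condition x metabolite group
df$m_scaled <- ave(df$log_conc, df$condition, df$metabolite,
  FUN = function(x) (x - mean(x)) / sd(x)
)

# Check: every group now has mean 0 and standard deviation 1
group <- interaction(df$condition, df$metabolite)
group_means <- tapply(df$m_scaled, group, mean)
group_sds   <- tapply(df$m_scaled, group, sd)
print(round(cbind(mean = group_means, sd = group_sds), 3))
```

Scaling per group puts metabolites with very different absolute concentrations on a common scale, so that their profiles over time become comparable.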

Examples

data("longitudinalMetabolomics")
data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" &
  longitudinalMetabolomics$metabolite %in% c("ATP", "ADP")]
data <- fit_dynamics_model(
  data = data,
  scaled_measurement = "m_scaled", assay = "scaled_log",
  max_treedepth = 14, adapt_delta = 0.95, iter = 2000, cores = 1, chains = 1
)
data <- diagnostics_dynamics(
  data = data, assay = "scaled_log",
  iter = 2000, chains = 1,
  fits = metadata(data)[["dynamic_fits"]]
)
plot_PPC(
  data = data, assay = "scaled_log"
)
