| Title: | Integrated Exposure-Omics Analysis Powered by Tidy Principles |
|---|---|
| Description: | The tidyexposomics package is designed to facilitate the integration of exposure and omics data to identify exposure-omics associations. We structure our commands to fit into the tidyverse framework, where commands are designed to be simplified and intuitive. Here we provide functionality to perform quality control, sample and exposure association analysis, differential abundance analysis, multi-omics integration, and functional enrichment analysis. |
| Authors: | Jason Laird [aut, cre] (ORCID: <https://orcid.org/0009-0000-5994-2236>), Thomas Hartung [ctb] (ORCID: <https://orcid.org/0000-0003-1359-7689>), Fenna Sillé [ctb] (ORCID: <https://orcid.org/0000-0003-4305-2049>), Alexandra Maertens [ctb] (ORCID: <https://orcid.org/0000-0002-2077-2011>), JHU Discovery Award [fnd] |
| Maintainer: | Jason Laird <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-16 09:25:50 UTC |
| Source: | https://github.com/bioc/tidyexposomics |
Returns a Shiny app object for ontology annotation. This function does not
launch the app. Please call shiny::runApp() yourself (see examples).

    build_ont_annot_app(use_demo = TRUE, ...)

| Argument | Description |
|---|---|
| use_demo | Logical; if TRUE, load the packaged lightweight demo ontologies. |
| ... | Optional named overrides passed as data.frames. |

A shiny.appobj.

    if (interactive()) {
      app <- build_ont_annot_app()
      shiny::runApp(app)
    }

Constructs a MultiAssayExperiment object from exposure data and optionally
omics datasets, ensuring proper formatting and alignment of samples and
features. For epidemiology-only workflows, omics data can be omitted.

    create_exposomicset(codebook, exposure, omics = NULL, row_data = NULL)

| Argument | Description |
|---|---|
| codebook | A data frame containing variable information metadata. |
| exposure | A data frame containing exposure data, with rows as samples and columns as variables. |
| omics | An optional list of matrices or a single matrix representing omics data. Each matrix should have samples as columns and features as rows. If NULL (the default), an exposure-only object is created. |
| row_data | An optional list of row (feature) metadata for the omics datasets. Default is NULL. |

The function validates inputs and creates a MultiAssayExperiment object.
If omics data is provided, it converts matrices into SummarizedExperiment
objects with proper sample alignment. If omics is NULL, the function
creates an exposure-only object suitable for epidemiological analyses
using run_association() with source = "exposures".
A MultiAssayExperiment object containing the formatted exposure
and optionally omics datasets.
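The input orientation described above is easy to get wrong, so here is a minimal base-R sketch of the expected shapes: exposures with samples in rows, omics matrices with samples in columns, and a check that the two share sample names. The variable names and the alignment check are illustrative assumptions, not the package's own validation code.

```r
# Hypothetical toy inputs; names and sizes are illustrative only.
set.seed(1)
samples <- paste0("subj_", 1:4)

# Exposure data: samples in rows, variables in columns.
exposure <- data.frame(
  pm25 = rnorm(4),
  age = rnorm(4, 45, 10),
  row.names = samples
)

# Omics data: features in rows, samples in columns.
omics <- list(
  proteomics = matrix(
    rpois(12, lambda = 20),
    nrow = 3,
    dimnames = list(paste0("prot_", 1:3), samples)
  )
)

# Count the sample names each omics matrix shares with the exposure table.
shared <- lapply(omics, function(m) intersect(colnames(m), rownames(exposure)))
n_shared <- vapply(shared, length, integer(1))
```

If `n_shared` is smaller than the number of samples, the sample identifiers probably differ between the two inputs and should be reconciled before building the object.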

    # Epi user workflow
    # so no omics data
    epi_data <- data.frame(
      pm25 = rnorm(10),
      outcome = rbinom(10, 1, 0.5),
      age = rnorm(10, 45, 10),
      row.names = paste0("subj_", 1:10)
    )
    codebook <- data.frame(
      variable = c("pm25", "outcome", "age"),
      category = c("exposure", "outcome", "covariate")
    )
    mae <- create_exposomicset(
      codebook = codebook,
      exposure = epi_data
    )

    # Multi-omics workflow
    tmp <- make_example_data(n_samples = 10)
    mae <- create_exposomicset(
      codebook = tmp$codebook,
      exposure = tmp$exposure,
      omics = tmp$omics,
      row_data = tmp$row_data
    )

Download and cache a tidyexposomics dataset

    download_dataset(
      name = c("omics_list", "fdata", "meta", "annotated_cb", "expom_1",
               "chebi", "ecto", "hpo"),
      verbose = TRUE,
      validate = TRUE
    )

| Argument | Description |
|---|---|
| name | Dataset name: one of "omics_list", "fdata", "meta", "annotated_cb", "expom_1", "chebi", "ecto", "hpo". |
| verbose | Logical; print messages. |
| validate | Logical; validate MD5 checksum. |

A list or object loaded from the cached .RData.
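As a rough illustration of what MD5 validation of a cached .RData file involves, the following base-R sketch writes a toy file, compares its checksum with an expected value, and loads it into a separate environment. The helper `validate_md5()` is hypothetical and is not part of tidyexposomics; only `tools::md5sum()`, `save()`, and `load()` are standard R.

```r
# Hypothetical helper: TRUE when the file's MD5 checksum matches the expected one.
validate_md5 <- function(path, expected_md5) {
  observed <- unname(tools::md5sum(path))
  identical(observed, expected_md5)
}

# Write a toy .RData file to stand in for a cached dataset.
tmp <- tempfile(fileext = ".RData")
toy <- list(a = 1:3)
save(toy, file = tmp)

# In practice the expected checksum would come from a trusted source.
expected <- unname(tools::md5sum(tmp))
ok <- validate_md5(tmp, expected)
bad <- validate_md5(tmp, "00000000000000000000000000000000")

# Load into a dedicated environment so nothing in the workspace is overwritten.
env <- new.env()
loaded_names <- load(tmp, envir = env)
```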
This function extracts and merges exposure variables from colData
with selected features from omics datasets
in a MultiAssayExperiment object. Optionally applies log2 transformation
to omics data and restricts features based on a variable map.

    extract_omics_exposure_df(exposomicset, variable_map = NULL, log2_trans = TRUE)

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| variable_map | A data frame with columns variable and exp_name that selects the variables to extract. Default is NULL. |
| log2_trans | Logical; whether to log2-transform omics data. Default is TRUE. |

If variable_map is provided, it is used to select variables from
both exposures and omics. If not provided, all numeric colData variables
are used as exposures (excluding variables matching ^PC),
and all omics features are included.
A data frame where rows correspond to samples, and columns contain exposure variables and log2-transformed omics features. Columns from different omics types are disambiguated using prefixes.
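The merge described above can be pictured with a small base-R sketch: keep numeric sample-level columns that do not match `^PC`, log2-transform each omics matrix, prefix feature names with the experiment name, and bind everything by sample. The pseudocount of 1 and the `experiment_feature` naming scheme are assumptions made for illustration; the package's actual transformation and prefixes may differ.

```r
samples <- paste0("s", 1:3)

# Stand-in for colData: one exposure and one principal component column.
col_data <- data.frame(
  pm25 = c(10, 12, 9),
  PC1 = c(0.1, -0.2, 0.1),
  row.names = samples
)

# Stand-in for the experiments: features in rows, samples in columns.
omics <- list(
  mRNA = matrix(c(3, 7, 15, 1, 0, 31), nrow = 2, byrow = TRUE,
                dimnames = list(c("g1", "g2"), samples)),
  proteomics = matrix(c(1, 3, 7), nrow = 1,
                      dimnames = list("p1", samples))
)

# Exposures: numeric columns, excluding anything matching ^PC.
is_exposure <- vapply(col_data, is.numeric, logical(1)) &
  !grepl("^PC", names(col_data))
exposure_df <- col_data[, is_exposure, drop = FALSE]

# Omics: log2(x + 1) (assumed pseudocount), samples moved to rows, names prefixed.
omics_dfs <- lapply(names(omics), function(nm) {
  m <- t(log2(omics[[nm]] + 1))
  colnames(m) <- paste(nm, colnames(m), sep = "_")
  as.data.frame(m)
})

merged_df <- do.call(cbind, c(list(exposure_df), omics_dfs))
```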

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    # export the omics exposure df
    merged_df <- extract_omics_exposure_df(
      mae,
      log2_trans = TRUE
    )

Retrieves a specific analysis result from the metadata slot of a MultiAssayExperiment object.

    extract_results(
      exposomicset,
      result = c("codebook", "quality_control", "correlation", "association",
                 "differential_analysis", "multiomics_integration", "network",
                 "enrichment")
    )

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| result | A character string indicating which result to extract from metadata. Must be one of: "codebook", "quality_control", "correlation", "association", "differential_analysis", "multiomics_integration", "network", "enrichment". |

The corresponding result object stored in metadata(exposomicset),
or NULL if not present.

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    # extract results
    res <- extract_results(
      exposomicset = mae,
      result = "codebook"
    )

Exports selected results stored in a MultiAssayExperiment object created
by the tidyexposomics pipeline to an Excel workbook.
Users can select which result types to include,
and optionally add placeholder sheets for missing data.

    extract_results_excel(
      exposomicset,
      file = "tidyexposomics_results.xlsx",
      result_types = c("correlation", "association", "mixture_analysis",
                       "differential_analysis", "multiomics_integration",
                       "network", "enrichment", "exposure_summary", "pipeline"),
      include_empty_tabs = FALSE
    )

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| file | Character. Path to the output Excel file. |
| result_types | Character vector specifying which result categories to export. Options include: "correlation", "association", "mixture_analysis", "differential_analysis", "multiomics_integration", "network", "enrichment", "exposure_summary", "pipeline". |
| include_empty_tabs | Logical. If TRUE, placeholder sheets are added for result types with no data. Default is FALSE. |

An Excel file is written to the specified path. A message is printed with the file location.

    # Create example data
    mae <- make_example_data(
      n_samples = 20,
      return_mae = TRUE
    )

    # run correlation analysis
    mae <- mae |>
      run_correlation(
        feature_type = "exposures",
        exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
      )

    # file path of the output file
    tmp <- tempfile(fileext = ".xlsx")

    # extract the correlation results
    extract_results_excel(
      exposomicset = mae,
      result_types = "correlation",
      file = tmp
    )

Identifies the most influential features for specified factors using multiomics integration results. Features are selected based on either a percentile cutoff or an absolute loading threshold.

    extract_top_factor_features(
      exposomicset,
      factors = NULL,
      pval_col = "p_adjust",
      pval_thresh = 0.05,
      method = "percentile",
      percentile = 0.9,
      threshold = 0.3,
      action = "add"
    )

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| factors | A character vector specifying the factors of interest. If NULL, statistically significant factors are selected automatically from the stored association results. |
| pval_col | A string specifying the column name of the p-value or adjusted p-value used for factor selection if factors is NULL. Default is "p_adjust". |
| pval_thresh | A numeric value specifying the significance threshold for selecting factors from association results when factors is NULL. Default is 0.05. |
| method | A character string specifying the feature selection method ("percentile" or "threshold"). Default is "percentile". |
| percentile | A numeric value between 0 and 1 indicating the percentile threshold for feature selection when method = "percentile". Default is 0.9. |
| threshold | A numeric value specifying the absolute loading cutoff for feature selection when method = "threshold". Default is 0.3. |
| action | A character string indicating whether to return results ("get") or add them to metadata ("add"). Default is "add". |

The function extracts factor loadings from metadata(exposomicset),
applies filtering based on
the selected method, and identifies top contributing features for
each specified factor.
If factors is not provided, the function will automatically select
statistically significant factors from metadata(exposomicset)$association$assoc_factors$results_df
using the specified pval_col and pval_thresh as criteria.
Features can be selected using:
Percentile-based filtering (method = "percentile"): Selects
features with absolute loadings above a specified percentile.
Threshold-based filtering (method = "threshold"): Selects
features with absolute loadings exceeding a fixed value.
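The two selection rules can be sketched in base R on a toy loadings table. This is only an illustration of percentile-based versus threshold-based filtering on absolute loadings; the column names and the strict `>` comparison are assumptions, and the package may compute the percentile per factor or per experiment.

```r
# Hypothetical loadings for a single factor.
loadings <- data.frame(
  feature = paste0("feat_", 1:10),
  factor = "V1",
  loading = c(-0.9, 0.05, 0.1, -0.2, 0.35, 0.5, -0.02, 0.15, 0.8, -0.4)
)
abs_load <- abs(loadings$loading)

# Percentile-based: keep features whose absolute loading exceeds the
# 90th percentile of absolute loadings.
cutoff <- quantile(abs_load, probs = 0.9)
top_percentile <- loadings[abs_load > cutoff, ]

# Threshold-based: keep features whose absolute loading exceeds a fixed value.
top_threshold <- loadings[abs_load > 0.3, ]
```

The percentile rule always returns roughly the same fraction of features regardless of scale, while the fixed threshold can return many features, or none, depending on how the loadings are distributed.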
If action = "add", returns the modified exposomicset with
selected features stored in metadata.
If action = "get", returns a data frame containing:

| Column | Description |
|---|---|
| feature | The selected feature contributing to the factor. |
| factor | The factor to which the feature contributes. |
| loading | The factor loading value of the feature. |
| exp_name | The experiment from which the feature originated. |


    # create example data
    mae <- make_example_data(
      n_samples = 20,
      return_mae = TRUE
    )

    # perform multiomics integration
    mae <- run_multiomics_integration(
      mae,
      method = "DIABLO",
      outcome = "smoker",
      n_factors = 3
    )

    top_feats <- extract_top_factor_features(
      mae,
      factors = c("V1", "V2", "V3"),
      method = "percentile",
      percentile = 0.9,
      action = "get"
    )

Removes exposure variables and omics features with missing values above a specified threshold. Generates missing data summaries and quality control (QC) plots.

    filter_missing(exposomicset, na_thresh = 20, na_plot_thresh = 5)

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| na_thresh | A numeric value specifying the percentage of missing data allowed before a variable or feature is removed. Default is 20. |
| na_plot_thresh | A numeric value specifying the minimum missing percentage for inclusion in QC plots. Default is 5. |

The function assesses missingness in both colData(exposomicset)
(exposure data) and experiments(exposomicset) (omics data).
Exposure variables with more than na_thresh% missing values are removed.
Omics features (rows in assay matrices) exceeding na_thresh%
missing values are filtered.
Missingness summaries and QC plots are generated using
naniar::gg_miss_var() and stored in metadata.
A MultiAssayExperiment object with filtered exposure
variables and omics features.
QC results, including missingness summaries and plots,
are stored in metadata(exposomicset)$na_qc.
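To make the two thresholds concrete, here is a base-R sketch that computes the percentage of missing values per exposure variable, drops variables above `na_thresh`, and lists the variables that would qualify for a QC plot. The toy data and the exact comparison operators at the boundary are assumptions.

```r
# Hypothetical exposure table with different amounts of missingness.
exposures <- data.frame(
  pm25 = c(1.2, NA, NA, 0.8, NA, 1.1, 0.9, 1.0, NA, 1.3),    # 40% missing
  no2  = c(NA, 2.1, 2.0, 1.9, 2.2, 2.4, 2.0, 1.8, 2.1, 2.3), # 10% missing
  age  = c(34, 51, 47, 29, 62, 45, 38, 55, 41, 49)           # 0% missing
)

na_thresh <- 20
na_plot_thresh <- 5

# Percentage of missing values per variable.
pct_missing <- colMeans(is.na(exposures)) * 100

# Keep variables whose missingness does not exceed na_thresh.
filtered <- exposures[, pct_missing <= na_thresh, drop = FALSE]

# Variables with enough missingness to be worth showing in a QC plot.
to_plot <- names(pct_missing)[pct_missing >= na_plot_thresh]
```

The same calculation applies to omics features by using `rowMeans(is.na(assay_matrix))` on a features-by-samples matrix.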

    # Create example data
    mae <- make_example_data(
      n_samples = 20,
      return_mae = TRUE
    )

    # Introduce some missingness
    MultiAssayExperiment::colData(mae)$exposure_pm25[sample(1:20, 5)] <- NA

    # Filter features and exposures with high missingness
    mae_filtered <- filter_missing(
      exposomicset = mae,
      na_thresh = 20,
      na_plot_thresh = 5
    )

Removes exposure variables that deviate significantly from a normal distribution based on normality test results stored in metadata.

    filter_non_normal(exposomicset, p_thresh = 0.05)

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| p_thresh | A numeric value specifying the p-value threshold for normality. Variables with p.value < p_thresh are removed. Default is 0.05. |

The function identifies exposure variables that fail a normality test
using metadata(exposomicset)$transformation$norm_df.
Exposure variables with p.value < p_thresh are removed from
colData(exposomicset).
The same filtering is applied to colData in each experiment
within experiments(exposomicset).
A MultiAssayExperiment object with non-normal
exposure variables removed.
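The filtering rule can be sketched in base R. The source does not say which normality test populates `norm_df`, so this example assumes the Shapiro-Wilk test (`shapiro.test()`); the data frame layout is likewise an assumption.

```r
# Deterministic toy data: one near-normal variable and one heavily skewed one.
z <- qnorm(ppoints(50))
exposures <- data.frame(
  normal_var = z,
  skewed_var = exp(1.5 * z)
)

# Assumed stand-in for the stored normality results (Shapiro-Wilk p-values).
norm_df <- data.frame(
  exposure = names(exposures),
  p.value = vapply(exposures, function(x) shapiro.test(x)$p.value, numeric(1))
)

# Remove variables whose p-value falls below the threshold.
p_thresh <- 0.05
keep <- norm_df$exposure[norm_df$p.value >= p_thresh]
filtered <- exposures[, keep, drop = FALSE]
```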

    # Create example data
    mae <- make_example_data(
      n_samples = 20,
      return_mae = TRUE
    )

    # Test for normality
    mae <- mae |>
      run_normality_check() |>
      transform_exposure(exposure_cols = c("age", "bmi", "exposure_pm25"))

    # Remove non-normal variables
    mae_filtered <- mae |>
      filter_non_normal()

This function applies variance- or expression-based filtering
across one or more assays within a MultiAssayExperiment object.
It is useful for removing low-quality or uninformative features
before downstream analysis.

    filter_omics(
      exposomicset,
      method = c("variance", "expression"),
      assays = NULL,
      assay_name = 1,
      min_var = 1e-05,
      min_value = 5,
      min_prop = 0.7,
      verbose = TRUE
    )

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| method | Filtering method: either "variance" or "expression". |
| assays | Character vector of assay names to filter. If NULL, all assays are filtered. |
| assay_name | Name or index of the assay within each SummarizedExperiment. Default is 1. |
| min_var | Minimum variance threshold (used if method = "variance"). Default is 1e-05. |
| min_value | Minimum expression value (used if method = "expression"). Default is 5. |
| min_prop | Minimum proportion of samples exceeding min_value (used if method = "expression"). Default is 0.7. |
| verbose | Whether to print messages for each assay being filtered. |

A filtered MultiAssayExperiment object with
updated assays and step record.
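The difference between the two filtering methods is easiest to see on a small matrix. This base-R sketch applies a variance filter and an expression filter to the same toy counts; the comparison operators and the toy values are assumptions, not the package's internal code.

```r
# Hypothetical features-by-samples matrix.
counts <- matrix(
  c(10, 12, 11, 13,  # g_var:  varies, expressed in all samples
     7,  7,  7,  7,  # g_flat: zero variance, expressed in all samples
     0,  9,  1,  0,  # g_low:  varies, above min_value in 1 of 4 samples
     6,  8,  2,  9), # g_mid:  varies, above min_value in 3 of 4 samples
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g_var", "g_flat", "g_low", "g_mid"), paste0("s", 1:4))
)

# Variance filter: keep features whose variance across samples exceeds min_var.
min_var <- 1e-05
keep_var <- apply(counts, 1, var) > min_var
by_variance <- counts[keep_var, , drop = FALSE]

# Expression filter: keep features above min_value in at least min_prop of samples.
min_value <- 5
min_prop <- 0.7
keep_expr <- rowMeans(counts > min_value) >= min_prop
by_expression <- counts[keep_expr, , drop = FALSE]
```

Note that the two filters disagree on `g_flat` and `g_low`: a constant but well-expressed feature survives the expression filter only, and a variable but rarely expressed feature survives the variance filter only.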

    # Filter the proteomics assay by variance
    filtered_mae <- filter_omics(
      exposomicset = make_example_data(return_mae = TRUE),
      method = c("variance"),
      assays = "proteomics",
      assay_name = 1,
      min_var = 0.01,
      verbose = TRUE
    )

Removes sample outliers from a MultiAssayExperiment object
based on PCA results.

    filter_sample_outliers(exposomicset, outliers = NULL)

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| outliers | An optional character vector specifying sample names to be removed. If NULL, the precomputed outliers in metadata(exposomicset)$pca$outliers are used. |

The function checks for the presence of PCA results in
metadata(exposomicset). If outliers is not provided,
it retrieves precomputed outliers from metadata(exposomicset)$pca$outliers.
The identified samples are removed
from the dataset.
A MultiAssayExperiment object with the specified outliers removed.

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    # run PCA
    mae <- mae |>
      run_pca()

    # filter outliers if present
    mae <- mae |>
      filter_sample_outliers()

Downloads and loads chebi, ecto, and hpo.

    load_annotation_data(
      to_load = c("all", "chebi", "ecto", "hpo"),
      verbose = TRUE,
      validate = TRUE
    )

| Argument | Description |
|---|---|
| to_load | Character vector indicating which ontology to load. |
| verbose | Logical; print messages. |
| validate | Logical; validate MD5 checksum. |

A named list of ontology objects.

    ## Not run:
    # load single ontology
    onts <- load_annotation_data(to_load = "ecto")

    # load all ontologies
    onts <- load_annotation_data(to_load = "all")
    ## End(Not run)

This helper function generates a reproducible dummy dataset
containing exposures, mRNA data, and proteomics data. It can
optionally return the data as a MultiAssayExperiment using
create_exposomicset.

    make_example_data(
      n_samples = 12,
      n_proteins = 80,
      use_batch = FALSE,
      return_mae = FALSE
    )

| Argument | Description |
|---|---|
| n_samples | Integer. Number of samples to simulate (default: 12). |
| n_proteins | Integer. Number of proteins to simulate (default: 80). |
| use_batch | Logical (default: FALSE). |
| return_mae | Logical. If TRUE, return a MultiAssayExperiment; if FALSE, return a named list (default: FALSE). |

Either:
A named list containing codebook,
exposure, omics, and row_data,
if return_mae = FALSE.
A MultiAssayExperiment, if return_mae = TRUE.

    # Return as a list
    dummy <- make_example_data()

    # Return as a MultiAssayExperiment
    mae <- make_example_data(return_mae = TRUE)

Extracts a specified omics dataset from a MultiAssayExperiment,
optionally filters by feature (row) names,
and returns a tidy tibble. The output includes assay values along with
sample metadata and feature metadata.

    pivot_exp(exposomicset, exp_name, features = NULL)

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| exp_name | A character string. The name of the omics dataset to extract (e.g., "Proteomics"). |
| features | Optional character vector of row (feature) names to retain. If NULL, all features are retained. |

A tibble in tidy format with one row per feature/sample pair,
including all metadata and a new column exp_name
indicating the assay source. Assay values are provided in separate columns
named after the assay slot(s).
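The "one row per feature/sample pair" layout can be sketched in base R by reshaping a small assay matrix and joining sample metadata. The column names `.feature` and `.sample`, the assay name `counts`, and the toy metadata are assumptions for illustration; the real function also attaches feature metadata.

```r
# Hypothetical assay: features in rows, samples in columns.
assay <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("feat_1", "feat_2"), c("s1", "s2", "s3"))
)
sample_meta <- data.frame(
  .sample = c("s1", "s2", "s3"),
  smoker = c("yes", "no", "yes")
)

# Optional feature filter, then reshape to long format.
features <- "feat_2"
sub <- assay[features, , drop = FALSE]
long <- data.frame(
  .feature = rep(rownames(sub), times = ncol(sub)),
  .sample = rep(colnames(sub), each = nrow(sub)),
  counts = as.vector(sub),
  exp_name = "mRNA"
)

# Attach sample metadata to every feature/sample row.
tidy <- merge(long, sample_meta, by = ".sample")
```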

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    # pivot experiment
    exp_data <- pivot_exp(
      exposomicset = mae,
      exp_name = "mRNA",
      features = c("feat_42")
    )

Extracts feature-level metadata across all assays in a
MultiAssayExperiment and returns a combined tibble.

    pivot_feature(exposomicset)

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |

This function:
Iterates over all assays in the MultiAssayExperiment.
Updates each assay's sample metadata (colData) using
.update_assay_colData().
Extracts feature-level metadata using pivot_transcript() from tidybulk.
Combines results across assays into a single tibble,
adding a .exp_name column.
A tibble with feature metadata from all assays,
with an added .exp_name column.

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    # pivot experiment
    feature_data <- mae |>
      pivot_feature()

Extracts and formats sample-level metadata (colData) from a
MultiAssayExperiment or SummarizedExperiment object.
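
    pivot_sample(x, ...)

| Argument | Description |
|---|---|
| x | A MultiAssayExperiment or SummarizedExperiment object. |
| ... | Additional arguments passed to pivot_sample() from tidybulk. |
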
This function:
Extracts sample metadata from MultiAssayExperiment using colData(),
converting it to a tibble.
Calls pivot_sample() from tidybulk when applied to a
SummarizedExperiment object.
Error Handling: Returns an error if x is not a
MultiAssayExperiment or SummarizedExperiment.
A tibble containing sample metadata with an added .sample column.

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    sample_data <- mae |>
      pivot_sample()

Generates a forest plot for association results from any source stored
in the metadata of a MultiAssayExperiment object.
Supports faceting and visual augmentation with R^2 tiles when available.

    plot_association(
      exposomicset,
      source = c("omics", "exposures", "factors", "go_pcs"),
      terms = NULL,
      filter_col = "p.value",
      filter_thresh = 0.05,
      direction_filter = "all",
      add_r2_tile = TRUE,
      r2_col = "adj_r2",
      facet = FALSE,
      nrow = 1,
      subtitle = NULL
    )

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| source | Character string indicating the association source. One of "omics", "exposures", "factors", "go_pcs". |
| terms | Optional character vector of term names to subset the plot to. Default is NULL. |
| filter_col | Column used to assess statistical significance (default: "p.value"). |
| filter_thresh | Numeric threshold applied to filter_col (default: 0.05). |
| direction_filter | Direction of associations to retain. One of "all", "up", "down" (default: "all"). |
| add_r2_tile | Logical; if TRUE, adds an R^2 tile strip to the plot (default: TRUE). |
| r2_col | Column used for coloring the tile plot (default: "adj_r2"). |
| facet | Logical; if TRUE, the plot is faceted (default: FALSE). |
| nrow | Integer; number of rows for facet layout if enabled (default: 1). |
| subtitle | Optional subtitle for the plot (default: NULL). |

This function visualizes effect size estimates and confidence intervals from
association analyses. It allows filtering by
direction ("up" for positive, "down" for negative) and by significance.
For source = "go_pcs", it supports special
formatting by splitting term labels into nested facets.
The R^2 tile (if enabled) adds a side heatmap indicating model fit for each association. This can be useful for model diagnostics.
A ggplot2 object: either a single forest plot or a composite plot
with an R^2 tile strip.
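Before plotting, the association table is narrowed by significance and direction. The following base-R sketch shows one way such filtering could work; the helper `filter_assoc()`, the column names (`estimate`, `std.error`), and the normal-approximation confidence interval are assumptions for illustration and are not taken from the package.

```r
# Hypothetical association results.
assoc <- data.frame(
  term = c("exposure_pm25", "exposure_no2", "bmi", "age"),
  estimate = c(0.42, -0.31, 0.05, -0.12),
  std.error = c(0.10, 0.12, 0.09, 0.04),
  p.value = c(0.001, 0.02, 0.60, 0.004)
)

# Hypothetical helper mirroring filter_col, filter_thresh, and direction_filter.
filter_assoc <- function(df, filter_col = "p.value", filter_thresh = 0.05,
                         direction_filter = "all") {
  out <- df[df[[filter_col]] < filter_thresh, ]
  if (direction_filter == "up") out <- out[out$estimate > 0, ]
  if (direction_filter == "down") out <- out[out$estimate < 0, ]
  # Approximate 95% interval for the forest plot whiskers.
  out$conf.low <- out$estimate - 1.96 * out$std.error
  out$conf.high <- out$estimate + 1.96 * out$std.error
  out
}

up <- filter_assoc(assoc, direction_filter = "up")
down <- filter_assoc(assoc, direction_filter = "down")
```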

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    # run association tests
    mae <- mae |>
      run_association(
        source = "exposures",
        feature_set = c("exposure_pm25", "exposure_no2"),
        outcome = "smoker",
        covariates = c("age"),
        family = "binomial"
      )

    assoc_plot <- mae |>
      plot_association(
        source = "exposures"
      )

Generates a circular network plot to visualize relationships between exposures, either based on correlation ("exposures") or shared features ("degs", "factors").

    plot_circos_correlation(
      exposomicset,
      feature_type = c("degs", "omics", "factors", "factor_features",
                       "exposures", "pcs"),
      exposure_cols = NULL,
      corr_threshold = NULL,
      shared_cutoff = 10,
      annotation_colors = NULL,
      low = "#006666",
      mid = "white",
      high = "#8E0152",
      midpoint = NULL
    )

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| feature_type | One of "exposures", "degs", or "factors". |
| exposure_cols | Character vector of exposures to include (only for "exposures"). |
| corr_threshold | Minimum absolute correlation (only for "exposures"). |
| shared_cutoff | Minimum number of shared features (only for "degs" or "factors"). Default = 10. |
| annotation_colors | Optional named vector of colors for categories. |
| low | Color for low edge values. |
| mid | Color for middle edge values. |
| high | Color for high edge values. |
| midpoint | Midpoint for edge color gradient. Defaults to 0 (for correlations) or mean shared features. |

A ggplot object (ggraph circular plot).

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    # run correlation analysis
    mae <- mae |>
      run_correlation(
        feature_type = "exposures",
        exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
      )

    # create the circos plot
    circos_plot <- mae |>
      plot_circos_correlation(
        feature_type = "exposures"
      )

Generates a bar plot summary of exposure-feature correlations using customizable modes.

    plot_correlation_summary(
      exposomicset,
      feature_type = c("degs", "omics", "factors", "factor_features",
                       "exposures", "pcs"),
      mode = c("top_exposures", "top_features", "exposure_category",
               "assay", "summary"),
      top_n = 15
    )

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| feature_type | One of "degs", "omics", "factors", "factor_features", "exposures", "pcs". |
| mode | One of: "top_exposures", "top_features", "exposure_category", "assay", "summary". |
| top_n | Number of top exposures or features to display (for top modes). Default is 15. |

A ggplot2 object or a patchwork object (if mode = "summary").

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    # correlate with exposures
    mae <- mae |>
      run_correlation(
        feature_type = "omics",
        variable_map = mae |>
          pivot_feature() |>
          dplyr::select(
            variable = .feature,
            exp_name = .exp_name
          ),
        exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
      )

    # create the correlation summary plot
    cor_summary_plot <- mae |>
      plot_correlation_summary(
        feature_type = "omics",
        mode = "summary"
      )

Visualizes a correlation matrix as a heatmap tile plot using
correlation results stored in the metadata of a
MultiAssayExperiment object. When feature_type = "pcs",
the function forces PCs to appear on the x-axis and exposures on the y-axis,
and it adds a barplot showing how many PCs are significantly
associated with each exposure. It also suppresses nonsignificant tiles
based on a specified p-value threshold.

    plot_correlation_tile(
      exposomicset,
      feature_type = c("pcs", "degs", "omics", "factors", "factor_features",
                       "exposures"),
      pval_cutoff = 0.05,
      na_color = "grey100",
      fill_limits = c(-1, 1),
      midpoint = 0
    )

| Argument | Description |
|---|---|
| exposomicset | A MultiAssayExperiment object. |
| feature_type | Type of correlation results to plot. One of "pcs", "degs", "omics", "factors", "factor_features", "exposures". |
| pval_cutoff | Numeric p-value cutoff below which correlations are displayed. Nonsignificant tiles are rendered in the na_color. Default is 0.05. |
| na_color | Color used to represent nonsignificant or missing correlations. Default is "grey100". |
| fill_limits | Numeric vector of length 2 defining the scale limits for correlation values. Default is c(-1, 1). |
| midpoint | Numeric value for centering the fill gradient. Default is 0. |

A ggplot2 tile plot (or a combined tile + barplot if
feature_type = "pcs").
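The masking of nonsignificant tiles and the per-exposure bar counts can be sketched in base R. The table layout and column names below are assumptions; the sketch only shows the idea of setting nonsignificant correlations to NA (so they take the NA color) and counting significant PCs per exposure.

```r
# Hypothetical exposure-by-PC correlation results.
cors <- data.frame(
  exposure = rep(c("exposure_pm25", "age"), each = 3),
  pc = rep(paste0("PC", 1:3), times = 2),
  correlation = c(0.62, -0.10, 0.45, 0.05, -0.55, 0.08),
  p.value = c(0.001, 0.70, 0.03, 0.85, 0.01, 0.75)
)

pval_cutoff <- 0.05

# Nonsignificant correlations become NA and would be drawn in the NA color.
cors$fill <- ifelse(cors$p.value < pval_cutoff, cors$correlation, NA)

# Number of PCs significantly associated with each exposure (the side barplot).
n_sig <- tapply(cors$p.value < pval_cutoff, cors$exposure, sum)
```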

    # create example data
    mae <- make_example_data(
      n_samples = 10,
      return_mae = TRUE
    )

    # run pca
    mae <- mae |>
      run_pca()

    # correlate with exposures
    mae <- mae |>
      run_correlation(
        feature_type = "pcs",
        exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
      )

    # make the correlation tile plot
    cor_tile_p <- mae |>
      plot_correlation_tile(
        feature_type = "pcs"
      )

Visualize enrichment results stored in a MultiAssayExperiment object.
Supports dotplots, heatmaps, cnetplots, networks,
and multi-panel summary plots.
plot_enrichment( exposomicset, feature_type = c("degs", "degs_robust", "omics", "factor_features", "degs_cor", "omics_cor", "factor_features_cor"), plot_type = c("dotplot", "cnet", "network", "heatmap", "summary"), top_n = 5, n_per_group = 5, add_top_genes = TRUE, top_n_genes = 5, heatmap_fill = TRUE, logfc_thresh = log2(1), pval_col = "P.Value", pval_thresh = 0.05, score_metric = "stability_score", score_thresh = NULL, overlap_thresh = 0.2, node_radius = 0.2, pie_colors = NULL, label_top_n = NULL, label_colour = "black", net_facet_by = NULL, max_terms = 30, node_size = 1, term_node_correction = 0.2, gene_node_correction = 3, go_groups = NULL, layout_algo = "fr", edge_alpha = 0.3, label_size = 3, feature_col = "feature", logfc_col = "logFC" )plot_enrichment( exposomicset, feature_type = c("degs", "degs_robust", "omics", "factor_features", "degs_cor", "omics_cor", "factor_features_cor"), plot_type = c("dotplot", "cnet", "network", "heatmap", "summary"), top_n = 5, n_per_group = 5, add_top_genes = TRUE, top_n_genes = 5, heatmap_fill = TRUE, logfc_thresh = log2(1), pval_col = "P.Value", pval_thresh = 0.05, score_metric = "stability_score", score_thresh = NULL, overlap_thresh = 0.2, node_radius = 0.2, pie_colors = NULL, label_top_n = NULL, label_colour = "black", net_facet_by = NULL, max_terms = 30, node_size = 1, term_node_correction = 0.2, gene_node_correction = 3, go_groups = NULL, layout_algo = "fr", edge_alpha = 0.3, label_size = 3, feature_col = "feature", logfc_col = "logFC" )
exposomicset |
A |
feature_type |
Character; one of |
plot_type |
Type of plot to generate. One of |
top_n |
Integer; number of top |
n_per_group |
Integer; number of terms per group to plot
(used in |
add_top_genes |
Logical; if |
top_n_genes |
Integer; number of top genes to show in each group
(used in |
heatmap_fill |
Logical; whether to fill tiles by logFC in the heatmap.
Default is TRUE. |
logfc_thresh |
Numeric; log2 fold change threshold for filtering
(heatmap only). Default is log2(1). |
pval_col |
Column name of the p-value used for filtering in |
pval_thresh |
Threshold for |
score_metric |
Column for stability score
(used in |
score_thresh |
Numeric; threshold for |
overlap_thresh |
Numeric; Jaccard threshold for edges in
the network plot. Default is 0.2. |
node_radius |
Numeric; node size in network plot. Default is 0.2. |
pie_colors |
Optional named vector of colors for pie charts (network and cnet). |
label_top_n |
Integer; number of top nodes to label in network.
Default is NULL. |
label_colour |
Color of node labels in network. Default is "black". |
net_facet_by |
Column used to facet the network plot
(e.g., |
max_terms |
Integer; max number of terms to include in the cnet plot.
Default is 30. |
node_size |
Numeric; base node size for cnet plot. Default is 1. |
term_node_correction |
Scaling factor for term nodes in cnet plot.
Default is 0.2. |
gene_node_correction |
Scaling factor for gene nodes in cnet plot.
Default is 3. |
go_groups |
Optional character vector of GO group names to subset enrichment results (all plots). |
layout_algo |
Graph layout algorithm to use in |
edge_alpha |
Transparency of network/cnet plot edges.
Default is 0.3. |
label_size |
Font size for labels in network and cnet plots.
Default is 3. |
feature_col |
Column name used to join gene-level metadata.
Default is "feature". |
logfc_col |
Column name used for log2 fold change values.
Default is "logFC". |
This function visualizes results from run_enrichment()
using one of several plot types:
"dotplot": Enrichment terms grouped by GO group, colored by
significance.
"heatmap": Term - gene matrix with optional logFC fill and
shared gene highlighting.
"network": Graph of term overlap based on shared genes,
faceted by metadata if desired.
"cnet": Gene - term bipartite graph with gene logFC values
and term pie slices.
"summary": Multi-panel figure with GO group ridgeplots,
gene counts, and Venn diagram.
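The term-overlap idea behind the "network" plot can be sketched in base R. This is a minimal illustration, not the package's implementation: the term names and gene sets are made up, and treating the cutoff as "greater than or equal to" is an assumption. The 0.2 cutoff mirrors the default of overlap_thresh.

```r
# Hypothetical term -> gene sets (illustrative data, not package output)
term_genes <- list(
  term_a = c("TP53", "BRCA1", "ATM", "CHEK2"),
  term_b = c("TP53", "ATM", "MDM2"),
  term_c = c("IL6", "TNF")
)

# Jaccard index: size of the intersection divided by size of the union
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Score every unordered pair of terms
term_pairs <- t(combn(names(term_genes), 2))
edges <- data.frame(
  from = term_pairs[, 1],
  to = term_pairs[, 2],
  jaccard = apply(term_pairs, 1, function(p) {
    jaccard(term_genes[[p[1]]], term_genes[[p[2]]])
  })
)

# Keep pairs at or above the cutoff (assumed inclusive)
overlap_thresh <- 0.2
edges_kept <- edges[edges$jaccard >= overlap_thresh, ]
```

Here only term_a and term_b share enough genes (2 of 5) to be connected, so raising overlap_thresh yields a sparser network.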
A ggplot or patchwork object corresponding to the
requested plot type.
# create example data
mae <- make_example_data(
  n_samples = 30,
  return_mae = TRUE
)

# perform differential abundance analysis
mae <- run_differential_abundance(
  exposomicset = mae,
  formula = ~ smoker + sex,
  abundance_col = "counts",
  method = "limma_voom",
  action = "add"
)

# perform enrichment analysis
mae <- run_enrichment(
  exposomicset = mae,
  feature_type = "degs",
  feature_col = "symbol",
  species = "goa_human",
  deg_logfc_threshold = log2(1),
  deg_pval_col = "P.Value",
  deg_pval_threshold = 0.2,
  action = "add"
)

# create an enrichment plot
enr_plot <- plot_enrichment(
  exposomicset = mae,
  feature_type = "degs",
  plot_type = "network"
)
Visualizes the impact of exposures on network centrality measures of associated features (e.g., genes or latent factors) as a heatmap. Each exposure is scored by four centrality metrics, scaled within metric, and grouped by exposure category.
plot_exposure_impact(
  exposomicset,
  feature_type = c("degs", "omics", "factors"),
  min_per_group = 5,
  facet_cols = NULL,
  bar_cols = NULL,
  alpha = 0.3,
  ncol = 2,
  nrow = 1,
  heights = c(1, 1),
  widths = c(2, 1)
)
exposomicset |
A |
feature_type |
Character string specifying the feature type.
One of "degs", "omics", or "factors". |
min_per_group |
Minimum number of features per exposure for
inclusion (not currently used). Default is 5. |
facet_cols |
Optional named vector of colors for exposure categories. |
bar_cols |
Optional vector of colors for bar plots (if enabled). |
alpha |
Transparency level for category strips (if enabled).
Default is 0.3. |
ncol, nrow |
Layout for optional patchwork combination
(currently unused). Defaults: ncol = 2, nrow = 1. |
heights |
Relative heights for combined plots
(currently unused). Default: c(1, 1). |
widths |
Relative widths for combined plots
(currently unused). Default: c(2, 1). |
This function uses the output of run_exposure_impact() to
calculate and visualize the mean centrality
values for each exposure across its associated features.
The following network centrality metrics are shown:
Degree centrality
Eigenvector centrality
Closeness centrality
Betweenness centrality
All values are scaled within metric across exposures. A side bar indicates the category of each exposure.
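The averaging and within-metric scaling described above can be sketched in base R as follows. The data are invented, only two of the four metrics are shown, and z-score scaling via scale() is an assumption, since the text does not state which scaling is used.

```r
# Hypothetical per-feature centrality values (illustrative only)
centrality <- data.frame(
  exposure = c("pm25", "pm25", "no2", "no2", "bmi", "bmi"),
  degree = c(4, 6, 2, 2, 8, 10),
  betweenness = c(0.1, 0.3, 0.0, 0.2, 0.5, 0.7)
)
metrics <- c("degree", "betweenness")

# Mean centrality per exposure across its associated features
mean_cent <- aggregate(
  cbind(degree, betweenness) ~ exposure,
  data = centrality,
  FUN = mean
)

# Scale each metric across exposures (z-score; an assumed choice)
scaled <- mean_cent
scaled[, metrics] <- scale(mean_cent[, metrics])
```

After scaling, each metric column has mean 0 and standard deviation 1 across exposures, so metrics on very different ranges share one color scale in a heatmap.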
A ggplot/patchwork object showing a heatmap of
scaled network centrality scores per exposure, annotated by category.
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# perform correlation analyses
# correlate with exposures
mae <- mae |>
  run_correlation(
    feature_type = "omics",
    variable_map = mae |>
      pivot_feature() |>
      dplyr::select(
        variable = .feature,
        exp_name = .exp_name
      ),
    exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
  ) |>
  run_correlation(
    feature_type = "omics",
    variable_map = mae |>
      pivot_feature() |>
      dplyr::select(
        variable = .feature,
        exp_name = .exp_name
      ),
    feature_cors = TRUE,
    exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
  )

# create the networks
mae <- mae |>
  run_create_network(
    feature_type = "omics_feature_cor",
    action = "add"
  ) |>
  run_create_network(
    feature_type = "omics",
    action = "add"
  )

# perform impact analysis
mae <- mae |>
  run_exposure_impact(
    feature_type = "omics"
  )

# create the exposure impact plot
exposure_impact_p <- mae |>
  plot_exposure_impact(
    feature_type = "omics"
  )
Plots the number of significant exposure-omics associations, grouped either by exposure or the exposure category.
plot_exposure_omics_association(
  exposomicset,
  plot_type = c("exposures", "category"),
  pval_col = "p_adjust",
  pval_thresh = 0.05
)
exposomicset |
A |
plot_type |
Character. One of "exposures" or "category". |
pval_col |
Character. Name of the column used for p-value filtering.
Defaults to "p_adjust". |
pval_thresh |
Numeric. Significance threshold applied to pval_col. Defaults to 0.05. |
A ggplot object.
# create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

# run exposure-omics association
mae <- mae |>
  run_exposure_omics_association(
    exposures = c("exposure_pm25", "exposure_no2"),
    covariates = c("age", "sex")
  )

plot_exposure_omics_association(
  exposomicset = mae,
  plot_type = "exposures"
)
Visualizes exposure variable distributions using boxplots or ridge plots.
plot_exposures(
  exposomicset,
  exposure_cat = "all",
  exposure_cols = NULL,
  group_by = NULL,
  plot_type = "boxplot",
  alpha = 0.3,
  panel_sizes = rep(1, 100),
  title = "Exposure Levels by Category",
  xlab = "",
  ylab = "",
  facet_cols = NULL,
  group_cols = NULL,
  box_width = 0.1,
  fill_lab = ""
)
exposomicset |
A |
exposure_cat |
A character string or vector specifying exposure
category names (from |
exposure_cols |
Optional character vector specifying exact exposure variables to plot. |
group_by |
A string specifying the column in |
plot_type |
Type of plot: |
alpha |
Transparency level for background facet color strips.
Default is 0.3. |
panel_sizes |
A numeric vector passed to |
title |
Plot title. Default is "Exposure Levels by Category". |
xlab |
X-axis label. Default is an empty string. |
ylab |
Y-axis label. Default is an empty string. |
facet_cols |
Optional vector of colors to use as background for
facet categories. If |
group_cols |
Optional named vector of colors for |
box_width |
A numeric value specifying the width of the boxplots.
Only used when plot_type = "boxplot". Default is 0.1. |
fill_lab |
Legend title for the fill aesthetic
(e.g., |
This function:
Filters exposure data based on category or selected columns.
Merges variable metadata from metadata(exposomicset)$codebook.
Supports either boxplot (vertical distributions per variable) or ridgeplot (horizontal density plots per variable).
If group_by is specified, that variable defines the plot fill color;
otherwise, the fill is based on exposure category.
Facets by category using ggh4x::facet_grid2()
with color-coded strip backgrounds.
The box_width argument controls the width of the boxplots when
plot_type = "boxplot".
A ggplot2 object showing exposure distributions,
optionally grouped.
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# plot exposure data
exposure_plot <- mae |>
  plot_exposures(
    exposure_cols = c("exposure_pm25", "exposure_no2"),
    box_width = 0.2
  )
Generates a summary plot of factor contributions from multi-omics
integration results stored in a MultiAssayExperiment object.
plot_factor_summary(
  exposomicset,
  low = "#006666",
  mid = "white",
  high = "#8E0152",
  midpoint = 0.5
)
exposomicset |
A |
low |
Color for low values in the fill gradient.
Default is "#006666". |
mid |
Color for midpoint in the fill gradient.
Default is "white". |
high |
Color for high values in the fill gradient.
Default is "#8E0152". |
midpoint |
Midpoint value for the gradient color scale.
Default is 0.5. |
This function visualizes factor contributions based on the integration method:
MOFA: Variance explained per factor and view.
MCIA: Block score weights per omic.
DIABLO: Mean absolute sample score per omic and factor (from block-specific variates).
RGCCA: Mean absolute sample score per omic and factor (from aligned block scores).
The color gradient can be customized using the low, mid, high,
and midpoint parameters.
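For the DIABLO and RGCCA cases, the "mean absolute sample score per omic and factor" summary can be sketched in base R. The score matrices below are invented stand-ins for block-specific sample scores, and the object names are hypothetical.

```r
# Hypothetical sample scores per omic (rows = samples, columns = factors)
scores <- list(
  mRNA = matrix(
    c(1, -1, 2, -2, 0.5, -0.5, 0.5, -0.5),
    ncol = 2, dimnames = list(NULL, c("Factor1", "Factor2"))
  ),
  proteomics = matrix(
    c(3, -3, 1, -1, 0, 0, 2, -2),
    ncol = 2, dimnames = list(NULL, c("Factor1", "Factor2"))
  )
)

# Mean absolute score per factor within each omic;
# the result has one row per omic and one column per factor
contrib <- t(sapply(scores, function(m) colMeans(abs(m))))
```

Taking absolute values before averaging keeps positive and negative scores from cancelling, so a larger value means the factor separates samples more strongly in that omic.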
A ggplot object showing factor contributions based on
the integration method.
# create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

mae <- run_multiomics_integration(
  mae,
  method = "DIABLO",
  outcome = "smoker",
  n_factors = 3
)

factor_sum_plot <- mae |>
  plot_factor_summary()
Generates a multi-faceted Manhattan plot from the results of
run_association(), visualizing the significance of associations
across omics features, grouped by category. Significant features can be
highlighted and labeled, and strip backgrounds can be colored per facet.
plot_manhattan(
  exposomicset,
  pval_thresh = 0.05,
  feature_col = "term",
  alpha = 0.5,
  min_per_cat = 1,
  vars_to_label = NULL,
  sig_color = "magenta2",
  non_sig_cols = c("grey25", "grey75"),
  pval_thresh_line_col = "grey25",
  panel_sizes = c(1, 1, 1, 1, 1),
  linetype = "dashed",
  facet_cols = NULL,
  label_size = 3.5,
  facet_angle = 90,
  facet_text_face = "bold.italic"
)
exposomicset |
A |
pval_thresh |
Numeric threshold for significance (default = 0.05). |
feature_col |
A character string indicating the column name to use
for feature labeling and highlighting (e.g., |
alpha |
Transparency applied to facet strip colors (default = 0.5). |
min_per_cat |
Minimum number of features per category to be shown (default = 1). |
vars_to_label |
Optional character vector of variable names to label
explicitly, matched against the |
sig_color |
Color used for significant points (default = "magenta2"). |
non_sig_cols |
Character vector of alternating colors for
non-significant points (default = c("grey25", "grey75")). |
pval_thresh_line_col |
Color of the horizontal significance
threshold line (default = "grey25"). |
panel_sizes |
Numeric vector passed to |
linetype |
Line type for the horizontal threshold (default = "dashed"). |
facet_cols |
Optional vector of colors to use for facet strip backgrounds. |
label_size |
Numeric size of the feature label text (default = 3.5). |
facet_angle |
Angle (in degrees) for strip text rotation (default = 90). |
facet_text_face |
Font face for facet strip labels
(default = "bold.italic"). |
This function expects run_association() to have been run first.
Facets represent omics categories, and points represent features.
Non-significant points are colored using non_sig_cols,
while significant points are colored with sig_color and optionally labeled.
Uses ggrepel to avoid overlapping labels and ggh4x for
enhanced faceting.
The feature_col argument allows customization of which column is used
to label or identify features, enabling compatibility with different
result formats.
A ggplot object showing the Manhattan-style faceted plot.
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# run association tests
mae <- mae |>
  run_association(
    source = "omics",
    top_n = 20,
    feature_set = c("exposure_pm25", "exposure_no2"),
    outcome = "smoker",
    covariates = c("age"),
    family = "binomial"
  )

# create the manhattan plot
manhattan_p <- mae |>
  plot_manhattan(feature_col = "term")
Visualizes missing data patterns in a MultiAssayExperiment object using
summary bar plots or feature-level lollipop plots.
plot_missing(
  exposomicset,
  threshold = 5,
  plot_type = c("summary", "lollipop"),
  layers = NULL
)
exposomicset |
A |
threshold |
Numeric. The percentage threshold (0-100) above which
features are counted as missing in the summary plot. Default is 5. |
plot_type |
Character. Type of plot to generate. Either "summary" or "lollipop". |
layers |
Optional character vector. If specified, filters the plot
to include only selected layers (e.g., |
The function calculates missing data per feature (or variable) across all assays (including exposure variables) and generates:
Summary plot (plot_type = "summary"): A bar plot showing the number
of variables in each assay exceeding the specified missingness threshold.
Lollipop plot (plot_type = "lollipop"): A feature-level plot where
each feature's percent missingness is shown, along with a color-coded tile
on the side indicating its layer of origin.
The tile colors in the lollipop plot match the experiment colors used in
other visualizations (e.g., via scale_color_tidy_exp()).
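The per-feature missingness calculation behind both plot types can be sketched in base R. The matrix below is a made-up stand-in for a single assay, and the threshold value is chosen only for illustration.

```r
# Hypothetical assay: rows are features, columns are samples
assay <- matrix(
  c(1, NA, 3, 4,
    NA, NA, 3, 4,
    1, 2, 3, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("feat1", "feat2", "feat3"), paste0("s", 1:4))
)

# Percent of samples missing for each feature (lollipop-style values)
pct_missing <- rowMeans(is.na(assay)) * 100

# Number of features above the threshold (summary-style count)
threshold <- 30
n_over <- sum(pct_missing > threshold)
```

In this toy assay the features are 25%, 50%, and 0% missing, so exactly one feature exceeds a 30% threshold.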
A ggplot or patchwork object depending on the selected
plot_type.
# Create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

# Introduce some missingness
MultiAssayExperiment::colData(mae)$exposure_pm25[sample(1:20, 5)] <- NA

# Summary bar plot of missing data
summary_p <- plot_missing(
  mae,
  threshold = 10,
  plot_type = "summary"
)

# Lollipop plot for all features with any missingness
lollipop_p <- plot_missing(
  mae,
  plot_type = "lollipop"
)
Visualizes network structures created by run_create_network() from
the metadata of a MultiAssayExperiment object.
Nodes can represent features (e.g., genes or factors) or exposures,
and edges represent correlations or shared connections.
plot_network(
  exposomicset,
  network = c("degs", "omics", "factors", "factor_features", "exposures",
    "degs_feature_cor", "omics_feature_cor", "factor_features_feature_cor"),
  include_stats = TRUE,
  nodes_to_include = NULL,
  centrality_thresh = NULL,
  top_n_nodes = NULL,
  cor_thresh = NULL,
  label = FALSE,
  label_top_n = 5,
  nodes_to_label = NULL,
  facet_var = NULL,
  foreground = "steelblue",
  fg_text_colour = "grey25",
  node_colors = NULL,
  node_color_var = NULL,
  alpha = 0.5,
  size_lab = "Centrality",
  color_lab = "Group"
)
exposomicset |
A |
network |
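Character string specifying the network type.
One of "degs", "omics", "factors", "factor_features", "exposures", "degs_feature_cor", "omics_feature_cor", or "factor_features_feature_cor". |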
include_stats |
Logical; if |
nodes_to_include |
Optional character vector of node names to
include (subset of |
centrality_thresh |
Optional numeric threshold to filter nodes by centrality degree. |
top_n_nodes |
Optional integer to keep only the top N nodes by centrality. |
cor_thresh |
Optional numeric threshold to filter edges by minimum absolute correlation. |
label |
Logical; whether to label nodes.
If |
label_top_n |
Integer; number of top-centrality nodes to
label if |
nodes_to_label |
Optional character vector of specific nodes to label. |
facet_var |
Optional node metadata column to facet the network layout by. |
foreground |
Color for node outlines and edge borders.
Default is "steelblue". |
fg_text_colour |
Color of node label text. Default is "grey25". |
node_colors |
Optional named vector of colors for node groups. |
node_color_var |
Optional node attribute used for node color mapping. |
alpha |
Alpha transparency for nodes and edges. Default is 0.5. |
size_lab |
Legend title for node size (typically centrality).
Default is "Centrality". |
color_lab |
Legend title for node color group. Default is "Group". |
This function retrieves the stored graph object and optionally filters or labels nodes based on: centrality, correlation, user input, or group-specific attributes. It supports layout faceting, custom color mappings, and highlights highly central nodes.
Large graphs (> 500 nodes) will prompt the user before plotting.
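The edge and node filtering described above can be sketched on a plain edge list in base R. This is illustrative only: the edge table is invented, degree is used as the centrality measure, and filtering edges before ranking nodes is an assumed order of operations.

```r
# Hypothetical correlation edge list (not package output)
edges <- data.frame(
  from = c("g1", "g1", "g2", "g3"),
  to = c("g2", "g3", "g3", "g4"),
  correlation = c(0.9, -0.7, 0.2, 0.6)
)

# Keep edges whose absolute correlation meets the cutoff (cor_thresh-style)
cor_thresh <- 0.5
edges_kept <- edges[abs(edges$correlation) >= cor_thresh, ]

# Degree: number of retained edges touching each node
degree <- sort(table(c(edges_kept$from, edges_kept$to)), decreasing = TRUE)

# Keep the most connected nodes (top_n_nodes-style)
top_n_nodes <- 2
top_nodes <- names(degree)[seq_len(top_n_nodes)]
```

With these values the weak g2-g3 edge is dropped, and g1 and g3 (two retained edges each) are the top nodes.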
A ggraph plot object.
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# perform correlation analyses
# correlate with exposures
mae <- mae |>
  run_correlation(
    feature_type = "omics",
    variable_map = mae |>
      pivot_feature() |>
      dplyr::select(
        variable = .feature,
        exp_name = .exp_name
      ),
    exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
  )

# create the networks
mae <- mae |>
  run_create_network(
    feature_type = "omics",
    action = "add"
  )

# plot the network
network_p <- mae |>
  plot_network(
    network = "omics"
  )
Generates a bar plot summarizing the number of exposure variables that pass or fail normality tests (e.g., Shapiro-Wilk) before or after transformation.
plot_normality_summary(exposomicset, transformed = FALSE)
exposomicset |
A |
transformed |
Logical; if |
This function assumes that run_normality_check() has been executed and
that the results are
stored in metadata(exposomicset)$quality_control$normality.
If transformed = TRUE, the function will
instead plot the transformation summary stored in metadata(exposomicset)$quality_control$transformation$norm_summary,
which is populated by transform_exposure().
The plot includes both bar heights and overlaid line segments to reinforce the counts.
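The pass/fail counts that this plot summarizes can be sketched with a per-variable Shapiro-Wilk test in base R. The data are synthetic, the 0.05 cutoff is an assumption, and the "Normal" / "Not Normal" labels are placeholders rather than the package's exact wording.

```r
# Synthetic exposures: one close to normal, one strongly right-skewed
exposures <- data.frame(
  near_normal = qnorm(ppoints(50)),
  skewed = exp(2 * qnorm(ppoints(50)))
)

# Shapiro-Wilk p-value for each exposure variable
p_values <- vapply(
  exposures,
  function(x) shapiro.test(x)$p.value,
  numeric(1)
)

# Classify each variable (0.05 cutoff is an assumed convention)
status <- ifelse(p_values > 0.05, "Normal", "Not Normal")

# Counts of the kind a bar plot would display
status_counts <- table(status)
```

Running the same classification on transformed values would give the "after transformation" counterpart of the summary.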
A ggplot object summarizing the number of exposures
classified as normal or not normal.
# Create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

# Test for normality
mae <- mae |>
  run_normality_check() |>
  transform_exposure(exposure_cols = c("age", "bmi", "exposure_pm25"))

# plot the normality summary
norm_p <- mae |>
  plot_normality_summary()
Generates PCA plots for both feature space and sample space, including scatter plots and scree plots.
plot_pca(
  exposomicset,
  feature_col = "#00a9b2",
  sample_col = "#8a4f77",
  sample_outlier_col = "firebrick"
)
exposomicset |
A |
feature_col |
A character string specifying the color for the
feature scree plot.
Default is "#00a9b2". |
sample_col |
A character string specifying the color for the
sample scree plot.
Default is "#8a4f77". |
sample_outlier_col |
A character string specifying the color
for sample outlier labels.
Default is "firebrick". |
This function creates four PCA visualizations:
Feature Space PCA Plot: Colored by category (e.g., omics, exposure).
Feature Scree Plot: Displays the variance explained by each principal component.
Sample Space PCA Plot: Highlights outlier samples.
Sample Scree Plot: Displays variance explained in the sample PCA.
Outliers are labeled based on metadata(exposomicset)$pca$outliers.
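The two scree plots show the proportion of variance explained by each principal component in sample space and in feature space. A minimal base R sketch with stats::prcomp() on random data follows; centering and scaling are assumed choices, and the outlier rule is not reproduced here.

```r
set.seed(1)

# Hypothetical data: 20 samples (rows) by 5 features (columns)
x <- matrix(
  rnorm(20 * 5),
  nrow = 20,
  dimnames = list(paste0("s", 1:20), paste0("f", 1:5))
)

# Sample-space PCA: samples are the observations
sample_pca <- prcomp(x, center = TRUE, scale. = TRUE)
sample_var <- sample_pca$sdev^2 / sum(sample_pca$sdev^2)

# Feature-space PCA: transpose so that features are the observations
feature_pca <- prcomp(t(x), center = TRUE, scale. = TRUE)
feature_var <- feature_pca$sdev^2 / sum(feature_pca$sdev^2)
```

sample_var and feature_var are the values a scree plot would draw, and the first two columns of sample_pca$x and feature_pca$x give the scatter plot coordinates.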
A combined ggplot object containing the four PCA plots.
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# run pca
mae <- mae |>
  run_pca()

# create the pca plot
pca_p <- mae |>
  plot_pca()
Generates a heatmap of sample clustering results and summarizes sample group assignments.
plot_sample_clusters(exposomicset, exposure_cols = NULL)
exposomicset |
A |
exposure_cols |
A character vector specifying columns from |
This function:
Extracts sample cluster assignments from
metadata(exposomicset)$sample_clustering.
Merges cluster labels with colData(exposomicset).
Plots the heatmap stored in
metadata(exposomicset)$sample_clustering$heatmap.
A ComplexHeatmap plot displaying sample clustering results.
# create example data
mae <- make_example_data(
  n_samples = 30,
  return_mae = TRUE
)

# determine sample clusters
mae <- run_cluster_samples(
  exposomicset = mae,
  exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi"),
  clustering_approach = "diana"
)

# plot sample clusters
sample_cluster_p <- mae |>
  plot_sample_clusters(
    exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
  )
Generates a ridge plot and bar chart summarizing feature stability scores across assays.
plot_sensitivity_summary(
  exposomicset,
  stability_score_thresh = NULL,
  stability_metric = "stability_score",
  title = "Distribution of Stability Scores"
)
exposomicset |
A |
stability_score_thresh |
A numeric threshold for stability scores.
Default is NULL. |
stability_metric |
A character string specifying which stability metric to plot (e.g., "stability_score", "logp_weighted_score"). Default is "stability_score". |
title |
A character string specifying the title of the ridge plot. Default is "Distribution of Stability Scores". |
This function:
Extracts feature stability scores from metadata(exposomicset)$sensitivity_analysis$feature_stability.
Displays a ridge plot of stability score distributions per assay.
Displays a bar chart of the number of features per assay.
Prints the number of features with stability scores above the threshold.
A patchwork object combining a ridge plot and a bar chart.
# create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

# Run differential abundance
mae <- run_differential_abundance(
  exposomicset = mae,
  formula = ~ smoker + sex,
  abundance_col = "counts",
  method = "limma_voom",
  action = "add"
)

# Run the sensitivity analysis
mae <- run_sensitivity_analysis(
  exposomicset = mae,
  base_formula = ~ smoker + sex,
  methods = c("limma_voom"),
  scaling_methods = c("none"),
  covariates_to_remove = "sex",
  pval_col = "P.Value",
  logfc_col = "logFC",
  pval_threshold = 0.05,
  logFC_threshold = 0,
  bootstrap_n = 3,
  action = "add"
)

# create the sensitivity summary plot
sens_sum_p <- mae |>
  plot_sensitivity_summary()
Visualizes the top loading features for each factor from multi-omics integration results (e.g., MOFA, MCIA, DIABLO, RGCCA).
plot_top_factor_features(
  exposomicset,
  feature_col = "feature",
  factors = NULL,
  top_n = 5,
  facet_cols = NULL,
  exp_name_cols = NULL,
  alpha = 0.5
)
exposomicset |
A |
feature_col |
A character string indicating the column name to use
for y-axis feature labels (e.g., |
factors |
Character vector of factors to include
(e.g., "Factor1", "Factor2"). If |
top_n |
Integer specifying the number of top features to show
per factor. Default is 5. |
facet_cols |
Optional color palette for facet strip backgrounds
(one per |
exp_name_cols |
Optional color palette for experiment labels
in the plot ( |
alpha |
Numeric value between 0 and 1 controlling the
transparency of facet strip background fill. Default is 0.5. |
This function supports the following integration methods:
"MOFA": Uses feature weights from MOFA2 (get_weights()).
"MCIA": Uses block loadings from MCIA (@block_loadings).
"DIABLO": Extracts block-specific loadings from loadings.
"RGCCA": Extracts block-specific loadings from a.
For each factor, it:
Selects the top top_n features by absolute loading.
Merges with feature metadata using pivot_feature().
Creates a point-range plot showing the loading magnitude.
Facets each factor with a customizable strip background.
The feature_col argument allows you to control which feature-level
metadata column (e.g., gene symbols, metabolite names) is used for
labeling the y-axis.
If palettes are not provided, defaults are chosen using
ggpubr::get_palette().
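The "top top_n features by absolute loading" step can be sketched in base R. The loadings table is invented and its column names are hypothetical; the package obtains loadings from the fitted integration object instead.

```r
# Hypothetical long-format loadings (one row per factor/feature pair)
loadings <- data.frame(
  factor = rep(c("Factor1", "Factor2"), each = 4),
  feature = rep(c("a", "b", "c", "d"), times = 2),
  loading = c(0.1, -0.8, 0.5, -0.2, 0.9, 0.05, -0.6, 0.3)
)

top_n <- 2

# Within each factor, order by absolute loading and keep the first top_n rows
top_features <- do.call(
  rbind,
  lapply(split(loadings, loadings$factor), function(df) {
    head(df[order(abs(df$loading), decreasing = TRUE), ], top_n)
  })
)
```

Ranking on the absolute value keeps strongly negative loadings such as feature b in Factor1, which a signed ranking would drop.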
A ggplot2 object with one facet per factor, showing the
top features and their loadings by experiment.
# create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

mae <- run_multiomics_integration(
  mae,
  method = "DIABLO",
  outcome = "smoker",
  n_factors = 3
)

# plot top features using default `feature` column
top_feature_p <- mae |>
  plot_top_factor_features()
Generates a volcano plot to visualize differential abundance results across one or more omics layers.
plot_volcano(
  exposomicset,
  pval_col = "adj.P.Val",
  pval_thresh = 0.05,
  logFC_col = "logFC",
  logFC_thresh = log2(1.5),
  plot_n_sig = TRUE,
  top_n_label = NULL,
  features_to_label = NULL,
  feature_col = "feature",
  xlab = expression(Log[2] * "FC"),
  ylab = expression(-Log[10] * "P"),
  title = "Volcano Plot of Differential Abundance",
  nrow = 2
)
exposomicset |
A |
pval_col |
A character string specifying the column containing p-values.
Default is "adj.P.Val". |
pval_thresh |
A numeric threshold for significance. Features with
p-values below this are considered significant. Default is 0.05. |
logFC_col |
A character string specifying the column for
log fold changes. Default is "logFC". |
logFC_thresh |
A numeric threshold for absolute
log fold change significance. Default is log2(1.5). |
plot_n_sig |
Logical; if |
top_n_label |
Optional integer. If provided, the top |
features_to_label |
Optional character vector. Specific features to label regardless of significance. |
feature_col |
A character string naming the feature ID
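column to use for labeling. Default is "feature". |
xlab |
Label for the x-axis. Default is expression(Log[2] * "FC"). |
ylab |
Label for the y-axis. Default is expression(-Log[10] * "P"). |
title |
Plot title. Default is
"Volcano Plot of Differential Abundance". |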
nrow |
Number of rows in the facet_wrap() layout. Default is 2. |
The function:
- Extracts differential abundance results from metadata(exposomicset)$differential_abundance.
- Assigns each feature a direction of change: Upregulated, Downregulated, or Not-Significant.
- Uses logFC_thresh and pval_thresh to define thresholds.
- Adds dashed lines to indicate cutoffs for fold change and significance.
- Uses facet_wrap() to display each assay (exp_name) separately.
- Optionally labels the most significant features or user-defined ones.
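
The direction assignment described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy table `da` is hypothetical, and the use of strict inequalities at the cutoffs is an assumption.

```r
# Hypothetical differential abundance table; column names mirror the defaults above.
da <- data.frame(
  feature   = c("g1", "g2", "g3", "g4"),
  logFC     = c(1.2, -0.9, 0.1, 2.0),
  adj.P.Val = c(0.01, 0.03, 0.50, 0.20)
)

pval_thresh  <- 0.05
logFC_thresh <- log2(1.5)

# A feature is significant only if it passes both cutoffs (assumed strict).
is_sig <- da$adj.P.Val < pval_thresh & abs(da$logFC) > logFC_thresh

da$direction <- ifelse(
  is_sig & da$logFC > 0, "Upregulated",
  ifelse(is_sig & da$logFC < 0, "Downregulated", "Not-Significant")
)

# y-axis coordinate that a volcano plot would draw
da$neg_log10_p <- -log10(da$adj.P.Val)
```

Here `g4` has a large fold change but stays Not-Significant because its adjusted p-value misses the cutoff.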
A ggplot2 object representing the volcano plot.
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# perform differential abundance analysis
mae <- run_differential_abundance(
  exposomicset = mae,
  formula = ~ smoker + sex,
  abundance_col = "counts",
  method = "limma_voom",
  action = "add"
)

# create the volcano plot
volcano_p <- mae |>
  plot_volcano()
Perform GLM-based association testing between a specified outcome and features from exposures, omics, or latent factors. Automatically adjusts for covariates and supports both Gaussian and binomial models.
run_association(
  exposomicset,
  outcome,
  source = c("omics", "exposures", "factors"),
  covariates = NULL,
  feature_set = NULL,
  log_trans = TRUE,
  top_n = NULL,
  family = "gaussian",
  correction_method = "fdr",
  action = "add",
  feature_col = NULL,
  mirna_assays = NULL
)
exposomicset |
A MultiAssayExperiment object. |
outcome |
The outcome variable name (must be a column in colData). |
source |
Source of features to test. One of "omics", "exposures", or "factors". |
covariates |
Optional vector of covariate names to include in the model. |
feature_set |
Optional character vector of exposure or GO terms to test. |
log_trans |
Optional boolean value dictating whether or not to log2 transform omics features. |
top_n |
Optional integer: if using omics source, select top |
family |
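GLM family; "gaussian" (default) or "binomial". |
correction_method |
Method for p-value adjustment (default: "fdr"). |
action |
If "add" (default), returns the updated MultiAssayExperiment; otherwise returns the results as a list. |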
feature_col |
The column in |
mirna_assays |
Optional character vector of assays to exclude when extracting GO terms. |
If action = "add", returns the updated MultiAssayExperiment.
Otherwise, returns a list of:
- results_df: tidy summary of associations
- covariates: the covariates used
- model_data: model matrix used in the GLMs
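
As a rough illustration of per-feature GLM testing with covariate adjustment and p-value correction, the following base R sketch fits one binomial model per exposure. The data frame `dat` is simulated and hypothetical, and the loop is an assumed simplification rather than the function's actual code.

```r
set.seed(1)
n <- 40

# Hypothetical sample table: binary outcome, one covariate, two exposures.
dat <- data.frame(
  smoker        = rbinom(n, 1, 0.5),
  age           = rnorm(n, 50, 10),
  exposure_pm25 = rnorm(n),
  exposure_no2  = rnorm(n)
)

features <- c("exposure_pm25", "exposure_no2")

# Fit one binomial GLM per feature, adjusting for age.
res <- do.call(rbind, lapply(features, function(f) {
  fit <- glm(reformulate(c(f, "age"), response = "smoker"),
             data = dat, family = binomial())
  co <- summary(fit)$coefficients[f, ]
  data.frame(term = f,
             estimate = co[["Estimate"]],
             p.value = co[["Pr(>|z|)"]])
}))

# Adjust p-values across the tested features ("fdr" is an alias for "BH").
res$p.adjust <- p.adjust(res$p.value, method = "fdr")
```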
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# run association tests
mae <- mae |>
  run_association(
    source = "exposures",
    feature_set = c("exposure_pm25", "exposure_no2"),
    outcome = "smoker",
    covariates = c("age"),
    family = "binomial"
  )
Performs hierarchical clustering of samples using exposure data from
colData(exposomicset).
run_cluster_samples(
  exposomicset,
  exposure_cols = NULL,
  dist_method = NULL,
  user_k = NULL,
  cluster_method = "ward.D",
  clustering_approach = "diana",
  action = "add"
)
exposomicset |
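A MultiAssayExperiment object. |
exposure_cols |
A character vector of column names in
colData(exposomicset) to use for clustering. Default is NULL. |
dist_method |
A character string specifying the distance metric
("gower" for mixed data, "euclidean" for numeric data). Default is NULL. |
user_k |
An integer specifying the number of clusters.
If NULL (default), the number of clusters is determined using clustering_approach. |
cluster_method |
A character string specifying the hierarchical
clustering method. Default is "ward.D". |
clustering_approach |
A character string specifying the method
for determining the number of clusters (k). Default is "diana". |
action |
A character string specifying whether to store ("add") or return ("get") the results. Default is "add". |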
This function:
- Extracts numeric exposure data from colData(exposomicset).
- Computes a distance matrix ("gower" for mixed data, "euclidean" for numeric data).
- Determines the optimal number of clusters (k) using the specified method.
- Performs hierarchical clustering (hclust) and assigns samples to clusters.
- Generates a heatmap of scaled exposure values.
- Stores results in metadata(exposomicset)$sample_clustering when action = "add".
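
The distance, clustering, and assignment steps can be sketched with base R alone. The matrix `expo` is simulated for illustration, and `k = 2` is fixed by hand here, whereas the function selects k with the chosen clustering_approach.

```r
set.seed(1)

# Hypothetical numeric exposures: two well-separated groups of five samples.
expo <- rbind(
  matrix(rnorm(10, mean = 0), ncol = 2),
  matrix(rnorm(10, mean = 6), ncol = 2)
)
rownames(expo) <- paste0("s", 1:10)
colnames(expo) <- c("exposure_pm25", "exposure_no2")

d  <- dist(scale(expo), method = "euclidean")  # distances on scaled exposures
hc <- hclust(d, method = "ward.D")             # hierarchical clustering
groups <- cutree(hc, k = 2)                    # named vector of cluster assignments

table(groups)
```

The `hc` and `groups` objects correspond in spirit to the sample_cluster and sample_groups elements returned with action = "get".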
If action="add", returns the updated exposomicset.
If action="get", returns a list with:
sample_cluster |
A hierarchical clustering object (hclust). |
sample_groups |
A named vector of sample cluster assignments. |
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# determine sample clusters
mae <- run_cluster_samples(
  exposomicset = mae,
  exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi"),
  clustering_approach = "diana"
)
Computes correlations between exposures and feature types including DEGs, omics, latent factors, top factor features, or principal components (PCs). Optionally computes feature-feature correlations to support network analysis.
run_correlation(
  exposomicset,
  feature_type = c("degs", "omics", "factors", "factor_features", "exposures", "pcs"),
  exposure_cols = NULL,
  variable_map = NULL,
  n_pcs = NULL,
  feature_cors = FALSE,
  robust = FALSE,
  score_col = "stability_score",
  score_thresh = NULL,
  correlation_method = "spearman",
  correlation_cutoff = 0.3,
  cor_pval_column = "p.value",
  pval_cutoff = 0.05,
  deg_pval_col = "adj.P.Val",
  deg_logfc_col = "logFC",
  deg_pval_thresh = 0.05,
  deg_logfc_thresh = log2(1.5),
  batch_size = 1500,
  action = c("add", "get")
)
exposomicset |
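A MultiAssayExperiment object. |
feature_type |
Type of features to correlate. One of "degs", "omics", "factors", "factor_features", "exposures", or "pcs". |
exposure_cols |
Optional vector of exposure column names
(from colData) to use. |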
variable_map |
Optional mapping of features to include by assay
for |
n_pcs |
Number of PCs to use when feature_type = "pcs". |
feature_cors |
Logical; if TRUE, feature-feature correlations are also computed. Default is FALSE. |
robust |
Logical; restrict DEGs to those passing sensitivity threshold. |
score_col |
Column name in sensitivity analysis with feature stability score. |
score_thresh |
Threshold for filtering robust features. |
correlation_method |
Correlation method to use. Default is "spearman". |
correlation_cutoff |
Minimum absolute correlation to retain. |
cor_pval_column |
Column in output to filter by p-value
(default: "p.value"). |
pval_cutoff |
Maximum p-value or FDR threshold to retain a correlation. |
deg_pval_col |
Column with DEG adjusted p-values. |
deg_logfc_col |
Column with DEG log fold-changes. |
deg_pval_thresh |
P-value cutoff for DEGs. |
deg_logfc_thresh |
Log fold-change cutoff for DEGs. |
batch_size |
Number of features to process per batch (default: 1500). |
action |
Whether to "add" the results to metadata or "get" them as a tidy data.frame. |
If action = "add", returns updated MultiAssayExperiment
with results added to metadata.
If action = "get", returns a tidy data.frame of correlations.
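
To make the cutoff logic concrete, here is a small base R sketch that correlates one exposure with two features and keeps pairs passing both thresholds. The simulated data and the feature names `feat_a` and `feat_b` are hypothetical, and the filter is an assumed reading of correlation_cutoff and pval_cutoff.

```r
set.seed(1)
n <- 30
pm25 <- rnorm(n)

dat <- data.frame(
  exposure_pm25 = pm25,
  feat_a = pm25 + rnorm(n, sd = 0.3),  # strongly related to the exposure
  feat_b = rnorm(n)                    # unrelated noise
)

features <- c("feat_a", "feat_b")

# Spearman correlation and p-value for each exposure-feature pair.
cors <- do.call(rbind, lapply(features, function(f) {
  ct <- cor.test(dat$exposure_pm25, dat[[f]], method = "spearman")
  data.frame(exposure = "exposure_pm25", feature = f,
             correlation = unname(ct$estimate), p.value = ct$p.value)
}))

# Keep pairs passing both the absolute-correlation and p-value cutoffs.
kept <- cors[abs(cors$correlation) >= 0.3 & cors$p.value < 0.05, ]
```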
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# run correlation analysis
mae <- mae |>
  run_correlation(
    feature_type = "exposures",
    exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
  )
Constructs an undirected feature-feature or feature-exposure
correlation network from correlation results stored in a
MultiAssayExperiment object. The function
supports multiple correlation formats depending on feature_type,
and stores or returns an igraph object with associated node
and edge metadata.
run_create_network(
  exposomicset,
  feature_type = c("degs", "omics", "factors", "factor_features", "exposures",
    "degs_feature_cor", "omics_feature_cor", "factor_features_feature_cor"),
  action = c("add", "get")
)
exposomicset |
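A MultiAssayExperiment object. |
feature_type |
Type of correlation result to convert to a network.
One of:
"degs", "omics", "factors", "factor_features", "exposures", "degs_feature_cor", "omics_feature_cor", or "factor_features_feature_cor". |
action |
Whether to "add" the network to metadata or "get" it as a list. |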
The function detects the appropriate edge and node structure based on column names in the correlation results. Edge weights are based on correlation coefficients and include FDR values.
If action = "add", returns the updated MultiAssayExperiment
with a new network entry in metadata.
If action = "get", returns a list with graph
(an igraph object) and summary (a tibble).
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# perform correlation analyses
# correlate with exposures
mae <- mae |>
  run_correlation(
    feature_type = "omics",
    variable_map = mae |>
      pivot_feature() |>
      dplyr::select(
        variable = .feature,
        exp_name = .exp_name
      ),
    exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
  ) |>
  # correlate omics features with themselves
  run_correlation(
    feature_type = "omics",
    variable_map = mae |>
      pivot_feature() |>
      dplyr::select(
        variable = .feature,
        exp_name = .exp_name
      ),
    feature_cors = TRUE,
    exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
  )

# create the networks
mae <- mae |>
  run_create_network(
    feature_type = "omics_feature_cor",
    action = "add"
  ) |>
  run_create_network(
    feature_type = "omics",
    action = "add"
  )
Performs differential abundance testing across all assays in a
MultiAssayExperiment object using a specified statistical method.
The function updates each assay with its
corresponding colData, fits the model using the provided formula,
and combines the results into a unified table.
run_differential_abundance(
  exposomicset,
  formula,
  abundance_col = "counts",
  method = "limma_trend",
  contrasts = NULL,
  scaling_method = "none",
  action = "add"
)
exposomicset |
A MultiAssayExperiment object. |
formula |
A model formula for the differential analysis (e.g., ~ group + batch). |
abundance_col |
Character. The name of the assay matrix to use
for abundance values. Default is "counts". |
method |
Character. Differential analysis method to use,
for example "limma_trend" or "limma_voom". Default is "limma_trend". |
contrasts |
A named list of contrasts for pairwise comparisons.
Default is NULL. |
scaling_method |
Character. Scaling method to apply before modeling.
Default is "none". |
action |
Character. Whether to "add" the results to metadata or "get" them as a tibble. Default is "add". |
Either the updated MultiAssayExperiment (if action = "add")
or a tibble with differential abundance results (if action = "get").
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# perform differential abundance analysis
mae <- run_differential_abundance(
  exposomicset = mae,
  formula = ~ smoker + sex,
  abundance_col = "counts",
  method = "limma_trend",
  action = "add"
)
This function performs enrichment analysis using selected features derived
from differential expression, correlation analysis,
or multi-omics factor features across experiments in an exposomicset.
It supports multiple enrichment databases (e.g., GO, KEGG, Reactome),
applies FDR correction, and optionally clusters GO terms by Jaccard overlap.
run_enrichment(
  exposomicset,
  feature_type = c("degs", "degs_robust", "omics", "factor_features",
    "degs_cor", "omics_cor", "factor_features_cor"),
  score_col = "stability_score",
  score_threshold = NULL,
  variable_map = NULL,
  factor_type = c("common_top_factor_features", "top_factor_features"),
  feature_col = "feature",
  deg_pval_col = "adj.P.Val",
  deg_pval_threshold = 0.05,
  deg_logfc_col = "logFC",
  deg_logfc_threshold = log2(1.5),
  db = c("GO", "KEGG", "Reactome"),
  species = NULL,
  fenr_col = "gene_symbol",
  padj_method = "fdr",
  pval_thresh = 0.1,
  min_set = 3,
  max_set = 800,
  clustering_approach = "diana",
  action = "add"
)
exposomicset |
An |
feature_type |
Character string indicating the feature source.
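One of "degs", "degs_robust", "omics", "factor_features", "degs_cor", "omics_cor", or "factor_features_cor". |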
score_col |
Column name used for stability score filtering
(only for "degs_robust"). Default is "stability_score". |
score_threshold |
Optional numeric threshold for filtering
stability scores. If |
variable_map |
A data frame with variable and exp_name columns identifying the features to include. |
factor_type |
Character string for selecting factor features:
"common_top_factor_features" or "top_factor_features". |
feature_col |
The name of the feature column used to extract gene identifiers. |
deg_pval_col |
Column name for adjusted p-values from DEG analysis. |
deg_pval_threshold |
Threshold to select significant DEGs (default: 0.05). |
deg_logfc_col |
Column name for log-fold changes from DEG analysis. |
deg_logfc_threshold |
Threshold to select DEGs by absolute logFC
(default: log2(1.5)). |
db |
Enrichment database to use. One of "GO", "KEGG", or "Reactome". |
species |
Species name (required for GO enrichment,
e.g., "goa_human"). |
fenr_col |
Column name for gene IDs used by fenr (default: "gene_symbol"). |
padj_method |
Method for p-value adjustment (default: "fdr"). |
pval_thresh |
Adjusted p-value threshold for filtering enriched terms (default: 0.1). |
min_set |
Minimum number of selected genes overlapping an enriched term (default: 3). |
max_set |
Maximum number of selected genes overlapping an enriched term (default: 800). |
clustering_approach |
Clustering method for GO term grouping.
Defaults to "diana". |
action |
Either "add" (store the results in metadata) or "get" (return a data.frame). |
The function identifies selected features based on the chosen
feature_type, determines the gene universe
for each experiment, and performs enrichment analysis using the
fenr package. Results are adjusted for
multiple testing and optionally clustered by gene set overlap (for GO terms).
If feature_type includes correlation-based results
(ending in _cor), enrichment is performed for each
exposure category separately.
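
For intuition, over-representation testing against a gene universe and Jaccard overlap between gene sets can be sketched in base R. This is a generic hypergeometric test on made-up gene sets, offered as an assumption about the general technique; it does not reproduce the fenr-based workflow the function uses.

```r
universe <- paste0("g", 1:100)
selected <- paste0("g", 1:10)

# Two hypothetical gene sets.
term_a <- paste0("g", 1:8)              # heavily overlaps the selection
term_b <- paste0("g", c(5:8, 50:60))    # partial overlap

# P(overlap >= observed) under the hypergeometric distribution.
ora_p <- function(term, selected, universe) {
  hits <- length(intersect(term, selected))
  phyper(hits - 1,
         m = length(term),
         n = length(universe) - length(term),
         k = length(selected),
         lower.tail = FALSE)
}

p <- c(term_a = ora_p(term_a, selected, universe),
       term_b = ora_p(term_b, selected, universe))
p_adj <- p.adjust(p, method = "fdr")

# Jaccard overlap between two gene sets, usable as a similarity for clustering terms.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
jaccard(term_a, term_b)
```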
If action = "add", returns the modified exposomicset with
enrichment results added to metadata.
If action = "get", returns a data.frame of enrichment results
with GO term clusters (if applicable).
# create example data
mae <- make_example_data(
  n_samples = 30,
  return_mae = TRUE
)

# perform differential abundance analysis
mae <- run_differential_abundance(
  exposomicset = mae,
  formula = ~ smoker + sex,
  abundance_col = "counts",
  method = "limma_voom",
  action = "add"
)

# perform enrichment analysis
mae <- run_enrichment(
  exposomicset = mae,
  feature_type = "degs",
  feature_col = "symbol",
  species = "goa_human",
  deg_logfc_threshold = log2(1),
  deg_pval_col = "P.Value",
  deg_pval_threshold = 0.2,
  action = "add"
)
Calculates a summary exposome score per sample using one of several methods
including mean, sum, median, PCA (first principal component),
IRT (Item Response Theory), quantile binning, or row-wise variance.
The resulting score is added to the colData of the
MultiAssayExperiment object.
run_exposome_score(
  exposomicset,
  score_type,
  exposure_cols = NULL,
  scale = TRUE,
  score_column_name = NULL
)
exposomicset |
A MultiAssayExperiment object. |
score_type |
Character. The method used to compute the score.
Options are:
"mean", "sum", "median", "pca", "irt", "quantile", or "var". |
exposure_cols |
Optional character vector. Specific exposure
column names to include. If |
scale |
Logical. Whether to scale the exposures before computing
the score. Default is TRUE. |
score_column_name |
Optional name for the resulting score column.
If |
"pca" uses the first principal component from prcomp().
"irt" uses the mirt package to fit a graded response model to
discretized exposures.
"quantile" assigns decile bins (1-10) to each variable and sums
them row-wise.
"var" computes the row-wise variance across exposures.
A MultiAssayExperiment object with the exposome score added
to colData().
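
Three of the scoring methods can be approximated in a few lines of base R. The data frame `expo` is simulated, and the decile helper is an assumed reading of "decile bins (1-10) summed row-wise", not the package's code.

```r
set.seed(1)

# Hypothetical exposure table with 20 samples.
expo <- data.frame(
  exposure_pm25 = rnorm(20, 10, 2),
  exposure_no2  = rnorm(20, 30, 5)
)

# "pca": first principal component of the scaled exposures.
pca_score <- prcomp(expo, center = TRUE, scale. = TRUE)$x[, 1]

# "quantile": decile bin (1-10) per variable, summed row-wise.
decile <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, 0.1))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}
quantile_score <- rowSums(sapply(expo, decile))

# "var": row-wise variance across the scaled exposures.
var_score <- apply(scale(expo), 1, var)
```

Note that the sign of a principal component is arbitrary, so a PCA-based score may need to be flipped before interpretation.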
# create the example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# create the air pollution score
mae <- run_exposome_score(
  mae,
  score_type = "pca",
  exposure_cols = c("exposure_pm25", "exposure_no2"),
  scale = TRUE,
  score_column_name = "air_pollution_score"
)
Generalized centrality-based exposure impact analysis using DEG, omics, or factor features.
run_exposure_impact(
  exposomicset,
  feature_type = c("degs", "omics", "factor_features"),
  pval_col = "adj.P.Val",
  pval_thresh = 0.1,
  action = c("add", "get")
)
exposomicset |
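A MultiAssayExperiment object. |
feature_type |
One of "degs", "omics", or "factor_features". |
pval_col |
Column in differential abundance results to filter DEGs.
Default = "adj.P.Val". |
pval_thresh |
DEG p-value threshold. Ignored unless
feature_type = "degs". Default is 0.1. |
action |
Either "add" or "get". |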
Either an updated MultiAssayExperiment (if action = "add") or a list.
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# perform correlation analyses
# correlate with exposures
mae <- mae |>
  run_correlation(
    feature_type = "omics",
    variable_map = mae |>
      pivot_feature() |>
      dplyr::select(
        variable = .feature,
        exp_name = .exp_name
      ),
    exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
  ) |>
  run_correlation(
    feature_type = "omics",
    variable_map = mae |>
      pivot_feature() |>
      dplyr::select(
        variable = .feature,
        exp_name = .exp_name
      ),
    feature_cors = TRUE,
    exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi")
  )

# create the networks
mae <- mae |>
  run_create_network(
    feature_type = "omics_feature_cor",
    action = "add"
  ) |>
  run_create_network(
    feature_type = "omics",
    action = "add"
  )

# perform impact analysis
mae <- mae |>
  run_exposure_impact(
    feature_type = "omics"
  )
Test associations between each exposure and each omics feature using limma's linear modeling framework.
run_exposure_omics_association(
  exposomicset,
  exposures = NULL,
  exp_name = NULL,
  covariates = NULL,
  scaling_method = "none",
  correction_method = "fdr",
  top_pct = NULL,
  filter_by = c("variance", "mean"),
  action = "add"
)
exposomicset |
A MultiAssayExperiment object. |
exposures |
Character vector of exposure variable names to test.
If |
exp_name |
Name(s) of the omics assay(s) to test against. If |
covariates |
Optional character vector of covariate names to include in the model. |
scaling_method |
Character. Scaling method to apply before modeling.
Default is "none". |
correction_method |
Method for p-value adjustment. Default is "fdr". |
top_pct |
Top X% of features to retain, ranked by mean or variance
as specified by filter_by. |
filter_by |
How to rank omics features for filtering: "variance" or "mean". |
action |
If "add" (default), returns the updated MultiAssayExperiment; otherwise returns a tibble of association results. |
This function uses limma to test associations between multiple exposures and omics features. For each exposure, a linear model is fit with the exposure as the predictor and each omics feature as the outcome, adjusting for covariates.
omics_feature ~ exposure + covariate1 + covariate2 + ...
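
The model formula above can be illustrated with ordinary least squares in base R. This sketch uses lm() on simulated data as a stand-in; limma additionally moderates the variance estimates across features, which is not reproduced here, and all object names are hypothetical.

```r
set.seed(1)
n <- 25

# Hypothetical sample-level data: one exposure and two covariates.
covars <- data.frame(
  exposure_pm25 = rnorm(n),
  age = rnorm(n, 50, 10),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Hypothetical omics matrix: features in rows, samples in columns.
omics <- matrix(rnorm(3 * n), nrow = 3,
                dimnames = list(paste0("feat", 1:3), NULL))
omics["feat1", ] <- omics["feat1", ] + 2 * covars$exposure_pm25

# omics_feature ~ exposure + covariates, fit once per feature.
res <- do.call(rbind, lapply(rownames(omics), function(f) {
  fit <- lm(y ~ exposure_pm25 + age + sex,
            data = data.frame(y = omics[f, ], covars))
  co <- summary(fit)$coefficients["exposure_pm25", ]
  data.frame(feature = f,
             estimate = co[["Estimate"]],
             p.value = co[["Pr(>|t|)"]])
}))
res$p.adjust <- p.adjust(res$p.value, method = "fdr")
```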
If action = "add", returns updated MultiAssayExperiment.
Otherwise, returns a tibble with association results.
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)

# run exposure-omics association
mae <- mae |>
  run_exposure_omics_association(
    exposures = c("exposure_pm25", "exposure_no2"),
    covariates = c("age", "sex")
  )
Identifies top features shared across factors based on integration method. For MOFA/MCIA, takes intersection across factors. For DIABLO/RGCCA, takes features recurring in more than 2 block-specific components.
run_factor_overlap(
  exposomicset,
  robust = TRUE,
  stability_score = NULL,
  score_col = "stability_score",
  pval_thresh = 0.05,
  logfc_thresh = log2(1.5),
  pval_col = "padj",
  logfc_col = "logFC",
  action = "add"
)
exposomicset |
A MultiAssayExperiment object. |
robust |
Logical; if |
stability_score |
Optional numeric threshold (overrides default from metadata). |
score_col |
Column name for sensitivity score.
Default is "stability_score". |
pval_thresh |
DEG p-value threshold (if |
logfc_thresh |
DEG logFC threshold (if |
pval_col |
Column name for p-value. Default is "padj". |
logfc_col |
Column name for logFC. Default is "logFC". |
action |
"add" (default) or "get". |
Modified MultiAssayExperiment or data.frame of
shared top features.
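
The two overlap rules in the description (intersection across factors, and recurrence in more than 2 components) can be sketched with base R set operations. The list `top_by_factor` is a hypothetical stand-in for the stored top factor features.

```r
# Hypothetical top features per factor or component.
top_by_factor <- list(
  V1 = c("g1", "g2", "g3"),
  V2 = c("g2", "g3", "g4"),
  V3 = c("g3", "g5", "g2")
)

# Intersection rule: features present in every factor.
shared_all <- Reduce(intersect, top_by_factor)

# Recurrence rule: features appearing in more than 2 components.
counts <- table(unlist(lapply(top_by_factor, unique)))
recurring <- names(counts)[counts > 2]
```

With three components the two rules agree; with more components the recurrence rule is the more permissive of the two.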
# create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

# perform multiomics integration
mae <- run_multiomics_integration(
  mae,
  method = "DIABLO",
  outcome = "smoker",
  n_factors = 3
)

# identify the features that contribute most to the factors
mae <- extract_top_factor_features(
  mae,
  factors = c("V1", "V2", "V3"),
  method = "percentile",
  percentile = 0.5,
  action = "add"
)

# perform differential abundance analysis
mae <- run_differential_abundance(
  exposomicset = mae,
  formula = ~ smoker + sex,
  abundance_col = "counts",
  method = "limma_voom",
  action = "add"
)

# determine the overlap in features
mae <- mae |>
  run_factor_overlap(
    robust = FALSE,
    pval_col = "adj.P.Val"
  )
Performs missing data imputation on both exposure variables
(from colData) and omics datasets (from experiments) within
a MultiAssayExperiment object.
run_impute_missing(
  exposomicset,
  exposure_impute_method = "median",
  exposure_cols = NULL,
  omics_impute_method = NULL,
  omics_to_impute = NULL
)
exposomicset |
A MultiAssayExperiment object. |
exposure_impute_method |
Character. Imputation method to use for
exposure variables. Defaults to "median". |
exposure_cols |
Character vector. Names of columns in
colData to impute. Default is NULL. |
omics_impute_method |
Character. Imputation method to use for
omics data. Defaults to NULL. |
omics_to_impute |
Character vector. Names of omics datasets to impute.
If |
For exposures, numeric columns in colData are imputed using
the selected method. For omics data, assays are selected and
imputed individually.
Supported imputation methods include:
- "median": Median imputation using naniar::impute_median_all
- "mean": Mean imputation using naniar::impute_mean_all
- "knn": k-nearest neighbor imputation using impute::impute.knn
- "mice": Multiple imputation using chained equations (mice::mice)
- "missforest": Random forest-based imputation using missForest::missForest
- "lod_sqrt2": Substitution of missing values with LOD/sqrt(2), where LOD is the smallest non-zero value per variable
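
Two of the simpler methods can be written out by hand to show what they do to a single variable. These base R helpers (`impute_median`, `impute_lod_sqrt2`) are hypothetical illustrations and are not the naniar-based code paths named above.

```r
x <- c(0.8, NA, 0.2, 1.5, NA, 0.4)

# "median": replace missing values with the observed median.
impute_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}

# "lod_sqrt2": replace missing values with LOD / sqrt(2),
# taking the LOD as the smallest non-zero observed value.
impute_lod_sqrt2 <- function(v) {
  lod <- min(v[!is.na(v) & v != 0])
  v[is.na(v)] <- lod / sqrt(2)
  v
}

impute_median(x)
impute_lod_sqrt2(x)
```

Median imputation pulls missing values toward the center of the distribution, while the LOD-based substitution places them below every observed non-zero value, which suits left-censored measurements.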
A MultiAssayExperiment object with imputed exposure
and/or omics data.
# Create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

# Introduce some missingness
MultiAssayExperiment::colData(mae)$exposure_pm25[sample(1:20, 5)] <- NA

# Impute missing exposure values with the median
mae <- run_impute_missing(
  exposomicset = mae,
  exposure_impute_method = "median"
)
Performs multi-omics integration using one of several available methods:
MOFA, MCIA, RGCCA, or DIABLO. This function takes a MultiAssayExperiment
object with two or more assays and computes shared latent factors
across omics layers.
run_multiomics_integration(
  exposomicset,
  method = "MCIA",
  n_factors = 10,
  scale = TRUE,
  outcome = NULL,
  max.iter = 500,
  near.zero.var = TRUE,
  action = "add"
)
exposomicset |
A MultiAssayExperiment object. |
method |
Character. Integration method to use. Options are
"MOFA", "MCIA", "RGCCA", and "DIABLO". Default is "MCIA". |
n_factors |
Integer. Number of latent factors/components to compute. Default is 10. |
scale |
Logical. Whether to scale each assay before integration.
Default is TRUE. |
outcome |
Character. Required if method = "DIABLO". |
max.iter |
Numeric. Option to increase the number of iterations for
|
near.zero.var |
Logical. Option to remove variables with near zero
variance for |
action |
Character. Whether to "add" the results to metadata or "get" them as a list. Default is "add". |
"MOFA" runs Multi-Omics Factor Analysis using the MOFA2 package and
returns a trained model.
"MCIA" runs multi-co-inertia analysis using the nipalsMCIA package.
"RGCCA" runs Regularized Generalized Canonical Correlation Analysis
using the RGCCA package.
"DIABLO" performs supervised integration using the mixOmics package
and a specified outcome.
If action = "add", returns a MultiAssayExperiment with
integration results
stored in metadata(exposomicset)$multiomics_integration$integration_results.
If action = "get", returns a list with integration method and result.
# create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

# perform multiomics integration
mae <- run_multiomics_integration(
  mae,
  method = "DIABLO",
  outcome = "smoker",
  n_factors = 3
)
Performs Shapiro-Wilk tests to check the normality of numeric exposure
variables in colData of a MultiAssayExperiment object.
run_normality_check(exposomicset, action = "add")
exposomicset |
A MultiAssayExperiment object. |
action |
A character string specifying whether to store ("add") or return ("get") the results. Default is "add". |
This function:
- Extracts numeric, non-constant exposure variables from colData.
- Runs Shapiro-Wilk tests to assess normality.
- Summarizes the number of normally and non-normally distributed exposures.
- Generates a bar plot visualizing the normality results.

Output Handling:
- "add": Stores results in metadata(exposomicset)$normality.
- "get": Returns a list containing the normality test results and plot.
A MultiAssayExperiment object with normality results added to
metadata (if action = "add") or a list with:
norm_df |
A data frame of Shapiro-Wilk test results for each exposure variable. |
norm_plot |
A ggplot object showing the distribution of normal and non-normal exposures. |
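
A minimal base R sketch of the testing step follows. The data frame `expo` is simulated, and the 0.05 cutoff used to label a variable as normal is an assumption made for illustration.

```r
set.seed(1)

# Hypothetical exposure table.
expo <- data.frame(
  age = rnorm(50, 50, 10),             # roughly normal
  exposure_pm25 = rexp(50, rate = 1),  # right-skewed
  constant = rep(1, 50)                # dropped: shapiro.test() needs variation
)

# Keep numeric, non-constant columns.
keep <- vapply(expo,
               function(x) is.numeric(x) && length(unique(x)) > 1,
               logical(1))

# One Shapiro-Wilk test per remaining variable.
norm_df <- do.call(rbind, lapply(names(expo)[keep], function(v) {
  sw <- shapiro.test(expo[[v]])
  data.frame(exposure = v,
             statistic = unname(sw$statistic),
             p.value = sw$p.value)
}))
norm_df$normal <- norm_df$p.value > 0.05  # assumed cutoff

table(norm_df$normal)
```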
# Create example data
mae <- make_example_data(
  n_samples = 20,
  return_mae = TRUE
)

# Test for normality
mae <- mae |>
  run_normality_check() |>
  transform_exposure(exposure_cols = c("age", "bmi", "exposure_pm25"))
Runs PCA on the feature and sample spaces of a MultiAssayExperiment object,
identifying outliers based on Mahalanobis distance.
run_pca(
  exposomicset,
  log_trans_exp = FALSE,
  log_trans_omics = TRUE,
  action = "add"
)
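| exposomicset | A MultiAssayExperiment object. |
|---|---|
| log_trans_exp | A boolean value specifying whether to log2 transform the exposure data. Default is FALSE. |
| log_trans_omics | A boolean value specifying whether to log2 transform the omics data. Default is TRUE. |
| action | A character string specifying whether to store ("add") or return ("get") the results. Default is "add". |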
This function:
Identifies common samples across all assays and exposure data.
Performs PCA on features (transformed and standardized).
Performs PCA on samples and computes Mahalanobis distance to detect outliers.
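A minimal sketch of the sample-space step, using only base R. The simulated matrix, the choice of the first two principal components, and the chi-squared cutoff are assumptions made for illustration; the thresholds the package actually applies are not stated in this section.

```r
# Simulated samples-by-features matrix (hypothetical data).
set.seed(42)
x <- matrix(
  rnorm(30 * 6),
  nrow = 30,
  dimnames = list(paste0("s", 1:30), paste0("f", 1:6))
)
x["s30", ] <- x["s30", ] + 8  # plant one obvious outlier

# PCA on the standardized samples.
pca_sample <- stats::prcomp(x, center = TRUE, scale. = TRUE)
scores <- pca_sample$x[, 1:2]  # assumption: use the first two PCs

# Squared Mahalanobis distance of each sample from the score centroid.
d2 <- stats::mahalanobis(
  scores,
  center = colMeans(scores),
  cov = stats::cov(scores)
)

# Assumption: flag samples beyond the 97.5% chi-squared quantile.
cutoff <- stats::qchisq(0.975, df = ncol(scores))
outliers <- names(d2)[d2 > cutoff]
print(outliers)
```

The same pattern applies to the feature space by running `prcomp()` on the transposed matrix.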
If action = "add", the function returns the input
MultiAssayExperiment with:
PC scores added as columns in colData(exposomicset), and
PCA objects stored under
metadata(exposomicset)$quality_control$pca.
If action = "get", the function returns a list containing:
| pca_df | A tibble of the transformed input data. |
|---|---|
| pca_feature | The PCA object for the feature space. |
| pca_sample | The PCA object for the sample space. |
| outliers | A character vector of detected sample outliers. |
    # create example data
    mae <- make_example_data(
        n_samples = 10,
        return_mae = TRUE
    )

    # run pca
    mae <- mae |>
        run_pca()
This function prints and visualizes the analysis steps stored in the
metadata of a MultiAssayExperiment object. The steps are optionally
printed to the console as a numbered list and can be rendered as a
left-to-right Mermaid flowchart.
The flowchart connects steps with arrows and includes step notes
if requested.
    run_pipeline_summary(
        exposomicset,
        show_index = TRUE,
        console_print = TRUE,
        diagram_print = FALSE,
        include_notes = TRUE
    )
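| exposomicset | A MultiAssayExperiment object. |
|---|---|
| show_index | Logical, default TRUE. |
| console_print | Logical, default TRUE. Whether to print the steps to the console. |
| diagram_print | Logical, default FALSE. Whether to render the Mermaid flowchart. |
| include_notes | Logical, default TRUE. Whether to include step notes. |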
The Mermaid flowchart is rendered left-to-right and connects each step in sequence. Each node is labeled using the step name and, optionally, any attached notes.
No return value. This function is called for its side effects: console output and/or diagram rendering.
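To make the flowchart description concrete, the sketch below builds left-to-right Mermaid text from a vector of step names. It is an illustration only: `build_mermaid()` is a hypothetical helper, and the `steps` and `notes` vectors stand in for whatever step records the package stores in the object's metadata.

```r
# Hypothetical step names and notes; an empty string means "no note".
steps <- c("run_normality_check", "transform_exposure", "run_pca")
notes <- c("Shapiro-Wilk on numeric exposures", "boxcox_best", "")

# Hypothetical helper: one node per step, one arrow between consecutive steps.
build_mermaid <- function(steps, notes = NULL, include_notes = TRUE) {
  ids <- paste0("step", seq_along(steps))
  labels <- steps
  if (include_notes && !is.null(notes)) {
    has_note <- nzchar(notes)
    labels[has_note] <- paste0(steps[has_note], "<br/>", notes[has_note])
  }
  nodes <- sprintf('  %s["%s"]', ids, labels)
  edges <- if (length(ids) > 1) {
    sprintf("  %s --> %s", head(ids, -1), tail(ids, -1))
  } else {
    character(0)
  }
  # "flowchart LR" asks Mermaid for a left-to-right layout.
  paste(c("flowchart LR", nodes, edges), collapse = "\n")
}

diagram <- build_mermaid(steps, notes)
cat(diagram, "\n")
```

The resulting text can be pasted into any Mermaid renderer; setting `include_notes = FALSE` labels each node with the step name alone.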
    # Create example data
    mae <- make_example_data(
        n_samples = 20,
        return_mae = TRUE
    )

    # Test for normality
    mae <- mae |>
        run_normality_check() |>
        transform_exposure(exposure_cols = c("age", "bmi", "exposure_pm25"))

    # Run the pipeline summary
    run_pipeline_summary(mae)
Performs sensitivity analysis by systematically varying statistical methods, scaling strategies, and model formulas (with optional bootstrap sampling) to assess the stability of differentially abundant features.
    run_sensitivity_analysis(
        exposomicset,
        base_formula,
        abundance_col = "counts",
        methods = c("limma_trend", "limma_voom", "DESeq2", "edgeR_quasi_likelihood"),
        scaling_methods = c("none", "TMM", "quantile"),
        contrasts = NULL,
        covariates_to_remove = NULL,
        pval_col = "adj.P.Val",
        logfc_col = "logFC",
        pval_threshold = 0.05,
        logFC_threshold = log2(1),
        score_thresh = NULL,
        score_quantile = 0.9,
        stability_metric = "stability_score",
        action = "add",
        bootstrap_n = 1
    )
| exposomicset | A MultiAssayExperiment object. |
|---|---|
| base_formula | The base model formula used for differential analysis. |
| abundance_col | Character. Name of the column in the assays representing abundance. Default is "counts". |
| methods | Character vector of differential expression methods. Options include "limma_trend", "limma_voom", "DESeq2", and "edgeR_quasi_likelihood". |
| scaling_methods | Character vector of normalization methods to try. Options include "none", "TMM", and "quantile". |
| contrasts | Optional list of contrasts to apply for differential testing. |
| covariates_to_remove | Optional character vector of covariates to remove from the base formula to generate model variants. |
| pval_col | Name of the column containing p-values or adjusted p-values used to define significance. Default is "adj.P.Val". |
| logfc_col | Name of the column containing log fold changes. Default is "logFC". |
| pval_threshold | Numeric threshold for significance. Default is 0.05. |
| logFC_threshold | Numeric threshold for absolute log fold change. Default is log2(1), that is, 0. |
| score_thresh | Optional threshold for the selected stability metric. If not provided, calculated using score_quantile. |
| score_quantile | Quantile used to define the threshold if score_thresh is NULL. Default is 0.9. |
| stability_metric | Character. Name of the column in feature_stability used as the stability metric. Default is "stability_score". |
| action | Whether to store ("add") or return ("get") the results. Default is "add". |
| bootstrap_n | Integer. Number of bootstrap iterations. If 0, no resampling is performed. Default is 1. |
If action = "add", returns a MultiAssayExperiment
with results stored in
metadata(exposomicset)$differential_analysis$sensitivity_analysis.
If action = "get",
returns a list with three elements:
sensitivity_df: Data frame of all differential results across model/method combinations.
feature_stability: Data frame summarizing feature stability scores.
score_thresh: The threshold used to define stable features.
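The sketch below illustrates, in base R, how a grid of model/method/scaling combinations and a quantile-based threshold might be formed. It is a rough illustration under stated assumptions, not the package's algorithm: the stability scores are simulated, the strict `>` comparison is a guess, and no differential testing is actually run.

```r
base_formula <- ~ smoker + sex
covariates_to_remove <- "sex"
methods <- c("limma_trend", "limma_voom")
scaling_methods <- c("none", "TMM")

# Model variants: the base formula plus one variant per removed covariate.
variants <- c(
  list(base_formula),
  lapply(covariates_to_remove, function(v) {
    stats::update(base_formula, stats::as.formula(paste("~ . -", v)))
  })
)
model_labels <- vapply(
  variants,
  function(f) paste(deparse(f), collapse = ""),
  character(1)
)

# Every combination of model variant, method, and scaling strategy.
grid <- expand.grid(
  model = model_labels,
  method = methods,
  scaling = scaling_methods,
  stringsAsFactors = FALSE
)
print(grid)

# Simulated per-feature stability scores (hypothetical values).
set.seed(7)
feature_stability <- data.frame(
  feature = paste0("feat", 1:20),
  stability_score = runif(20)
)

# With no explicit score_thresh, derive one from the 0.9 quantile.
score_quantile <- 0.9
score_thresh <- stats::quantile(
  feature_stability$stability_score,
  probs = score_quantile,
  names = FALSE
)

# Assumption: features strictly above the threshold count as stable.
stable <- feature_stability$feature[
  feature_stability$stability_score > score_thresh
]
print(stable)
```

Each row of `grid` corresponds to one differential analysis run; with bootstrapping, each row would be repeated on resampled data.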
    # create example data
    mae <- make_example_data(
        n_samples = 20,
        return_mae = TRUE
    )

    # Run differential abundance
    mae <- run_differential_abundance(
        exposomicset = mae,
        formula = ~ smoker + sex,
        abundance_col = "counts",
        method = "limma_voom",
        action = "add"
    )

    # Run the sensitivity analysis
    mae <- run_sensitivity_analysis(
        exposomicset = mae,
        base_formula = ~ smoker + sex,
        methods = c("limma_voom"),
        scaling_methods = c("none"),
        covariates_to_remove = "sex",
        pval_col = "P.Value",
        logfc_col = "logFC",
        pval_threshold = 0.05,
        logFC_threshold = 0,
        bootstrap_n = 3,
        action = "add"
    )
Computes summary statistics for numeric exposure variables and
optionally stores the results in the MultiAssayExperiment metadata.
    run_summarize_exposures(exposomicset, exposure_cols = NULL, action = "add")
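| exposomicset | A MultiAssayExperiment object. |
|---|---|
| exposure_cols | A character vector of exposure variable names to summarize. If NULL (the default), no filtering is applied and all numeric exposure variables are summarized. |
| action | A string specifying the action to take. Use "add" to store the results in the metadata or "get" to return them as a data frame. Default is "add". |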
This function:
Extracts sample-level exposure data using pivot_sample().
Filters to user-specified exposures (exposure_cols) if provided.
Computes descriptive statistics for each numeric variable:
Number of values (n_values)
Number of NAs (n_na)
Minimum, maximum, and range
Sum, median, mean
Standard error of the mean
95% confidence interval of the mean
Variance, standard deviation
Coefficient of variation (sd / mean)
Merges the result with variable metadata stored in
metadata(exposomicset)$codebook.
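The descriptive statistics listed above can be reproduced for a single numeric vector with base R. This is a sketch, not the package's code: `summarize_exposure()` is a hypothetical helper, `n_values` is assumed to count non-missing values, and the confidence interval is assumed to be the t-based interval.

```r
# Hypothetical helper computing the statistics listed above for one variable.
summarize_exposure <- function(x, conf_level = 0.95) {
  n_na <- sum(is.na(x))
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  s <- stats::sd(x)
  se <- s / sqrt(n)
  # Assumption: t-based interval with n - 1 degrees of freedom.
  half_width <- stats::qt(1 - (1 - conf_level) / 2, df = n - 1) * se
  data.frame(
    n_values = n,
    n_na     = n_na,
    min      = min(x),
    max      = max(x),
    range    = max(x) - min(x),
    sum      = sum(x),
    median   = stats::median(x),
    mean     = m,
    se_mean  = se,
    ci_lower = m - half_width,
    ci_upper = m + half_width,
    var      = stats::var(x),
    sd       = s,
    coef_var = s / m
  )
}

# Hypothetical exposure data.
exposures <- data.frame(
  age = c(34, 41, NA, 29, 52),
  bmi = c(22.1, 27.4, 24.8, 30.2, 21.5)
)

# One summary row per variable.
summary_df <- do.call(rbind, lapply(names(exposures), function(v) {
  cbind(variable = v, summarize_exposure(exposures[[v]]))
}))
print(summary_df)
```

Joining `summary_df` to a codebook keyed on `variable` (for example with `merge()`) would mirror the final merge step.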
A modified MultiAssayExperiment object (if action = "add"),
or a data frame of summary statistics (if action = "get").
    # Create example data
    mae <- make_example_data(
        n_samples = 20,
        return_mae = TRUE
    )

    # Summarize exposure data
    exp_sum <- mae |>
        run_summarize_exposures(
            exposure_cols = c("age", "bmi", "exposure_pm25"),
            action = "get"
        )
A downsampled version of the ISGlobal Exposome Data Challenge 2021 dataset (Maitre et al., Environment International, 2022; DOI: 10.1016/j.envint.2022.107422). This example dataset is included with tidyexposomics for vignette evaluation. This subset represents samples from children with low socioeconomic status in the first cohort of the original dataset.
    data("tidyexposomics_example")
An object of class list of length 6.
The data have been filtered and transformed to illustrate
the tidyexposomics workflow. Only a small subset of variables, and
the top 500 most variable features per omic layer are retained.
Contents
A data frame of selected exposure, demographic, and outcome variables for a subset of participants.
A data frame providing ontology-linked annotation
for the exposures in meta.
A numeric matrix of gene expression values (500 features, 48 samples) representing the top-variance transcripts.
Feature-level metadata for exp_filt, including
cleaned gene symbols.
A numeric matrix of DNA methylation M-values (500 CpG sites, 48 samples).
Feature-level metadata for methyl_filt.
Source
Derived from the ISGlobal Exposome Data Challenge 2021
(Maitre et al., Environment International, 2022;
DOI: 10.1016/j.envint.2022.107422), licensed under CC-BY 4.0.
The original data are available on Figshare (Project 98813)
and GitHub (isglobal-exposomeHub/ExposomeDataChallenge2021).
This example dataset was processed and downsampled by the
tidyexposomics authors and is not a replacement for the full dataset.
    data("tidyexposomics_example")
Applies a transformation to selected numeric exposure variables in the
colData of a MultiAssayExperiment to improve their normality
(e.g., log, Box-Cox, sqrt). Transformation results and normality
statistics are stored in metadata for tracking.
    transform_exposure(
        exposomicset,
        exposure_cols = NULL,
        transform_method = "boxcox_best"
    )
| exposomicset | A MultiAssayExperiment object. |
|---|---|
| exposure_cols | Optional character vector of exposure variable names to transform. Default is NULL. |
| transform_method | Character. Transformation method to apply. Default is "boxcox_best". |
For transform_method = "boxcox_best", the function automatically shifts
values to be strictly positive and chooses from a discrete set of
transformations (e.g., 1/x, log(x), x^2)
based on estimated Box-Cox lambda. Each variable may receive a
different transformation.
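The idea behind "boxcox_best" can be sketched in base R: shift the data to be strictly positive, estimate the Box-Cox lambda by maximum likelihood, then snap it to the nearest member of a small set of powers. Everything specific here is an assumption for illustration: `boxcox_snap()` is a hypothetical helper, and the candidate set, the size of the shift, and the search interval are not taken from the package.

```r
# Hypothetical helper: estimate lambda, snap it to a discrete set, transform.
boxcox_snap <- function(x, candidates = c(-1, -0.5, 0, 0.5, 1, 2)) {
  # Shift so the smallest observed value is strictly positive
  # (assumption: shift by |min| + 1 only when needed).
  min_x <- min(x, na.rm = TRUE)
  shift <- if (min_x <= 0) abs(min_x) + 1 else 0
  y <- x + shift
  y_obs <- y[!is.na(y)]
  n <- length(y_obs)

  # Box-Cox profile log-likelihood for a given lambda.
  loglik <- function(lambda) {
    z <- if (abs(lambda) < 1e-8) log(y_obs) else (y_obs^lambda - 1) / lambda
    -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(y_obs))
  }
  lambda_hat <- stats::optimize(
    loglik,
    interval = c(-2, 2),
    maximum = TRUE
  )$maximum

  # Snap the estimate to the nearest candidate power
  # (-1 is 1/x, 0 is log(x), 2 is x^2, and so on).
  lambda <- candidates[which.min(abs(candidates - lambda_hat))]
  transformed <- if (lambda == 0) log(y) else y^lambda

  list(
    lambda_hat = lambda_hat,
    lambda = lambda,
    shift = shift,
    values = transformed
  )
}

set.seed(3)
pm25 <- exp(rnorm(200))  # hypothetical right-skewed exposure
res <- boxcox_snap(pm25)
print(res[c("lambda_hat", "lambda", "shift")])

# A variable containing zero or negative values is shifted first.
res_shifted <- boxcox_snap(c(-3, 0, 2, 5, 9))
print(res_shifted$shift)
```

Applying the helper column by column lets each variable receive its own power, which matches the per-variable behaviour described above.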
A MultiAssayExperiment object with transformed exposures
in colData, and transformation details stored in:
metadata(exposomicset)$quality_control$transformation$norm_df:
Shapiro-Wilk test results
metadata(exposomicset)$quality_control$transformation$norm_summary:
Summary of normality
metadata(exposomicset)$codebook: Updated with transformation info
per variable
metadata(exposomicset)$summary$steps: Updated with step record
    # Create example data
    mae <- make_example_data(
        n_samples = 20,
        return_mae = TRUE
    )

    # Test for normality
    mae <- mae |>
        run_normality_check() |>
        transform_exposure(
            exposure_cols = c("age", "bmi", "exposure_pm25"),
            transform_method = "boxcox_best"
        )