| Title: | Estimating Absolute Abundance from Microbial Spike-in Controls |
|---|---|
| Description: | Provides a reproducible and modular workflow for absolute microbial quantification using spike-in controls. Supports both single spike-in taxa and synthetic microbial communities with user-defined spike-in volumes and genome copy numbers. Compatible with 'phyloseq' and 'TreeSummarizedExperiment' (TSE) data structures. The package implements methods for spike-in validation, preprocessing, scaling factor estimation, absolute abundance conversion, bias correction, and normalization. Facilitates downstream statistical analyses with 'DESeq2', 'edgeR', and other Bioconductor-compatible methods. Visualization tools are provided via 'ggplot2', 'ggtree', and related packages. Includes detailed vignettes, case studies, and function-level documentation to guide users through experimental design, quantification, and interpretation. |
| Authors: | Mitra Ghotbi [aut, cre] (ORCID: <https://orcid.org/0000-0001-9185-9993>), Marjan Ghotbi [ctb] (ORCID: <https://orcid.org/0000-0003-4655-6445>) |
| Maintainer: | Mitra Ghotbi <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.3.0 |
| Built: | 2026-05-22 09:53:44 UTC |
| Source: | https://github.com/bioc/DspikeIn |
This dataset provides reference ranges and sample-level metadata for microbial spike-in performance evaluation. It includes taxonomic annotations and summary statistics used in validation and quality control workflows.
data(AcceptableRange)data(AcceptableRange)
A data frame with the following columns:
EPA Level III ecoregion classification.
Genus of the taxon.
Host genus from which the sample was derived.
Observed spike-in percentage.
Phylum classification of the taxon.
Acceptable range category.
Number of reads matching the spike-in species.
Total number of reads per sample.
Row identifier (optional, may be an index or artifact of data processing).
Mean abundance across all samples.
A data frame of spike-in evaluation metrics and taxonomy annotations.
Internal package dataset.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("AcceptableRange", package = "DspikeIn") head(AcceptableRange) summary(AcceptableRange$Percentage) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("AcceptableRange", package = "DspikeIn") head(AcceptableRange) summary(AcceptableRange$Percentage) }
This function normalizes microbial abundance data in a phyloseq or TreeSummarizedExperiment
object by dividing all abundance values by a user-defined factor.
It supports both data structures and preserves their internal metadata.
adjust_abundance_one_third(obj, factor = 3, output_file = NULL) adjust_abundance_by_factor(obj, factor = 3, output_file = NULL)adjust_abundance_one_third(obj, factor = 3, output_file = NULL) adjust_abundance_by_factor(obj, factor = 3, output_file = NULL)
obj |
A |
factor |
A numeric value |
output_file |
A character string specifying a path to save the adjusted object as |
This function extracts the OTU table (for phyloseq) or assay matrix (for TSE),
divides all abundance values by the provided factor, and reinserts the adjusted
table while maintaining the full metadata structure.
Historically, the function name adjust_abundance_one_third() referred to a fixed
normalization by 3. This generalized version preserves the same function name for
backward compatibility and allows any user-defined divisor.
For clarity, an alias function adjust_abundance_by_factor() is provided.
An adjusted object of the same class (phyloseq or TreeSummarizedExperiment),
where the abundance values are divided by the specified factor.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Adjust phyloseq object adjusted_physeq <- adjust_abundance_one_third(physeq_16SOTU, factor = 3) # Convert to TSE and adjust tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) adjusted_tse <- adjust_abundance_one_third(tse_16SOTU, factor = 2) # Using the alias adjusted_physeq2 <- adjust_abundance_by_factor(physeq_16SOTU, factor = 5) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Adjust phyloseq object adjusted_physeq <- adjust_abundance_one_third(physeq_16SOTU, factor = 3) # Convert to TSE and adjust tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) adjusted_tse <- adjust_abundance_one_third(tse_16SOTU, factor = 2) # Using the alias adjusted_physeq2 <- adjust_abundance_by_factor(physeq_16SOTU, factor = 5) }
Removes low-prevalence taxa from a phyloseq or TreeSummarizedExperiment object based
on user-specified prevalence thresholds derived from taxa abundance statistics.
adjusted_prevalence(obj, method = "min", output_file = NULL)adjusted_prevalence(obj, method = "min", output_file = NULL)
obj |
A |
method |
Character. Threshold method: one of |
output_file |
Optional. Character. Path to save the adjusted object as |
The adjusted object of the same class as obj.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") ## Adjust prevalence in phyloseq adjusted_physeq <- adjusted_prevalence(physeq_16SOTU, method = "mean") ## Convert to TreeSummarizedExperiment tse_obj <- convert_phyloseq_to_tse(physeq_16SOTU) ## Adjust prevalence in TSE adjusted_tse <- adjusted_prevalence(tse_obj, method = "median") }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") ## Adjust prevalence in phyloseq adjusted_physeq <- adjusted_prevalence(physeq_16SOTU, method = "mean") ## Convert to TreeSummarizedExperiment tse_obj <- convert_phyloseq_to_tse(physeq_16SOTU) ## Adjust prevalence in TSE adjusted_tse <- adjusted_prevalence(tse_obj, method = "median") }
This function creates an alluvial plot based on input data, which can be either absolute or relative abundance data.
alluvial_plot( data, axes = NULL, abundance_threshold = 10000, fill_variable = "Phylum", silent = TRUE, abundance_type = "absolute", total_reads = NULL, top_taxa = NULL, facet_vars = NULL, text_size = 4, legend_ncol = 1, custom_colors = color_palette$MG, color_mapping = NULL )alluvial_plot( data, axes = NULL, abundance_threshold = 10000, fill_variable = "Phylum", silent = TRUE, abundance_type = "absolute", total_reads = NULL, top_taxa = NULL, facet_vars = NULL, text_size = 4, legend_ncol = 1, custom_colors = color_palette$MG, color_mapping = NULL )
data |
A data frame containing abundance and categorical variables. |
axes |
A character vector specifying the categorical variables for the x-axis. |
abundance_threshold |
A numeric value specifying the minimum abundance
required for an entity to be included in the plot. Default is |
fill_variable |
A string specifying the variable to be used for fill colors. Default is |
silent |
Logical. If |
abundance_type |
A string specifying the type of abundance: |
total_reads |
Numeric, total number of reads for relative abundance calculation. Default is |
top_taxa |
Integer. The number of top abundant taxa to retain. Default is |
facet_vars |
A character vector specifying variables to facet by. Default is |
text_size |
Numeric, size of text labels. Default is |
legend_ncol |
Integer, number of columns for the legend. Default is |
custom_colors |
A named vector specifying colors for taxa. Default is |
color_mapping |
A named vector of colors for taxa, overriding |
A ggplot2 object representing an alluvial plot.
This function assumes data has already been converted to long format with an "Abundance" column.
Built using ggalluvial, ggplot2, and dplyr for visualization of microbial abundance dynamics.
if (requireNamespace("DspikeIn", quietly = TRUE) && requireNamespace("phyloseq", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") physeq_subset <- phyloseq::subset_samples(physeq_16SOTU, Animal.type == "Frog") physeq_subset <- phyloseq::prune_taxa( phyloseq::taxa_sums(physeq_subset) > 0, physeq_subset ) # Convert phyloseq object to long format pps_Abs <- get_long_format_data(physeq_subset) # Calculate total reads (illustrative) total_reads <- sum(pps_Abs$Abundance) message("Total reads in Frog subset: ", total_reads) # Heavy plotting step – wrapped in \donttest{} to reduce build time alluvial_plot_rel <- alluvial_plot( data = pps_Abs, axes = c("Env.broad.scale", "Host.genus", "Diet"), abundance_threshold = 0.01, fill_variable = "Phylum", abundance_type = "relative", top_taxa = 5, silent = TRUE, text_size = 3, legend_ncol = 1, custom_colors = DspikeIn::color_palette$cool_MG ) print(alluvial_plot_rel) # Convert to TreeSummarizedExperiment (TSE) format tse_data <- convert_phyloseq_to_tse(physeq_subset) tse_long <- get_long_format_data(tse_data) # Heavy plotting step – wrapped in \donttest{} to reduce build time alluvial_plot_abs <- alluvial_plot( data = tse_long, axes = c("Env.broad.scale", "Host.genus", "Diet"), abundance_threshold = 2000, fill_variable = "Phylum", abundance_type = "absolute", top_taxa = 5, silent = TRUE, text_size = 3, legend_ncol = 1, custom_colors = DspikeIn::color_palette$cool_MG ) print(alluvial_plot_abs) }if (requireNamespace("DspikeIn", quietly = TRUE) && requireNamespace("phyloseq", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") physeq_subset <- phyloseq::subset_samples(physeq_16SOTU, Animal.type == "Frog") physeq_subset <- phyloseq::prune_taxa( phyloseq::taxa_sums(physeq_subset) > 0, physeq_subset ) # Convert phyloseq object to long format pps_Abs <- get_long_format_data(physeq_subset) # Calculate total reads (illustrative) total_reads <- sum(pps_Abs$Abundance) message("Total reads in Frog subset: ", total_reads) # Heavy plotting step – wrapped in \donttest{} to reduce build time alluvial_plot_rel <- alluvial_plot( data = pps_Abs, axes = c("Env.broad.scale", "Host.genus", "Diet"), abundance_threshold = 0.01, fill_variable = "Phylum", abundance_type = "relative", top_taxa = 5, silent = TRUE, text_size = 3, legend_ncol = 1, custom_colors = DspikeIn::color_palette$cool_MG ) print(alluvial_plot_rel) # Convert to TreeSummarizedExperiment (TSE) format tse_data <- convert_phyloseq_to_tse(physeq_subset) tse_long <- get_long_format_data(tse_data) # Heavy plotting step – wrapped in \donttest{} to reduce build time alluvial_plot_abs <- alluvial_plot( data = tse_long, axes = c("Env.broad.scale", "Host.genus", "Diet"), abundance_threshold = 2000, fill_variable = "Phylum", abundance_type = "absolute", top_taxa = 5, silent = TRUE, text_size = 3, legend_ncol = 1, custom_colors = DspikeIn::color_palette$cool_MG ) print(alluvial_plot_abs) }
Computes sample-specific scaling factors for multiple groups of spiked species
in a phyloseq or TreeSummarizedExperiment object. Each group can have its own
expected spike-in cell count. Scaling factors are calculated per sample and averaged across groups.
Missing spike-in observations in a sample will be handled gracefully by averaging available groups.
calculate_list_average_scaling_factors( obj, spiked_species_list, spiked_cells_list, merge_method = c("sum", "max") )calculate_list_average_scaling_factors( obj, spiked_species_list, spiked_cells_list, merge_method = c("sum", "max") )
obj |
A |
spiked_species_list |
A list of character vectors. Each vector contains taxon names (at species level) for one spike-in group. |
spiked_cells_list |
A numeric vector specifying the expected number of spike-in cells for each group.
The order must match |
merge_method |
Character. Either |
The function assumes that the taxonomy table has a Species column.
The output is suitable for downstream absolute quantification pipelines.
OTUs belonging to each spike-in group will be merged using the specified merge_method
("sum" or "max") to obtain a group-specific spike-in abundance in each sample.
If a sample does not contain any spike-in sequences, a scaling factor of 1 is assigned.
A named numeric vector of sample-specific scaling factors.
This function does not modify the input object.
The returned scaling factors are intended to be used for absolute abundance normalization.
if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load example phyloseq and TSE objects data("physeq", package = "DspikeIn") data("tse", package = "DspikeIn") # Define spike-in species groups and expected cell counts spiked_species_list <- list( c("Pseudomonas aeruginosa"), c("Escherichia coli"), c("Clostridium difficile") ) spiked_cells_list <- c(10000, 20000, 15000) # --- Phyloseq example --- scaling_phyloseq <- calculate_list_average_scaling_factors( physeq, spiked_species_list, spiked_cells_list, merge_method = "sum" ) print(scaling_phyloseq) # --- TreeSummarizedExperiment (TSE) example --- scaling_tse <- calculate_list_average_scaling_factors( tse, spiked_species_list, spiked_cells_list, merge_method = "sum" ) print(scaling_tse) }if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load example phyloseq and TSE objects data("physeq", package = "DspikeIn") data("tse", package = "DspikeIn") # Define spike-in species groups and expected cell counts spiked_species_list <- list( c("Pseudomonas aeruginosa"), c("Escherichia coli"), c("Clostridium difficile") ) spiked_cells_list <- c(10000, 20000, 15000) # --- Phyloseq example --- scaling_phyloseq <- calculate_list_average_scaling_factors( physeq, spiked_species_list, spiked_cells_list, merge_method = "sum" ) print(scaling_phyloseq) # --- TreeSummarizedExperiment (TSE) example --- scaling_tse <- calculate_list_average_scaling_factors( tse, spiked_species_list, spiked_cells_list, merge_method = "sum" ) print(scaling_tse) }
In spike-in based absolute quantitation workflows, the acceptable recovery range of spike-in reads is system dependent, varying with sequencing platform, extraction protocol, and microbial community structure. This function calculates the percentage of reads, categorizes the results as passed or failed, optionally saves the results as DOCX and CSV files.It also visualizes the relationship between observed spike-in abundance and total reads, stratified across user-defined recovery intervals, to empirically determine the optimal range for quality control.
calculate_spike_percentage( obj, merged_spiked_species = NULL, merged_spiked_hashcodes = NULL, output_file = NULL, passed_range = c(0.1, 11) )calculate_spike_percentage( obj, merged_spiked_species = NULL, merged_spiked_hashcodes = NULL, output_file = NULL, passed_range = c(0.1, 11) )
obj |
A |
merged_spiked_species |
A character vector of spiked taxa names (can be from any taxonomic level). |
merged_spiked_hashcodes |
A character vector of spiked hashcodes (ASV/OTU IDs) to check in the dataset. Default is NULL. |
output_file |
A character string specifying the path to save the output files. Default is NULL (no files are written). |
passed_range |
A numeric vector of length 2 specifying the range of percentages to categorize results as "passed". Default is c(0.1, 11). |
A data frame with the following columns:
Sample identifier.
Total number of reads in the sample.
Number of reads mapped to the spike-in taxa.
Percentage of spike-in reads (Spiked_Reads / Total_Reads * 100).
Quality control result, either "passed" or "failed", based on the specified range.
Pre_processing_species, calculate_spike_percentage
if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load example phyloseq object data("physeq_16SOTU", package = "DspikeIn") # ----------- Phyloseq Example ----------- species_name <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") merged_spiked_species <- "Tetragenococcus_halophilus" # Pre-process the phyloseq object to merge spike-in taxa merged_physeq <- Pre_processing_species( physeq_16SOTU, species_name = species_name, merge_method = "sum" ) # Perform spike-in percentage calculation output_docx <- file.path(tempdir(), "spike_summary_physeq.docx") result_physeq <- calculate_spike_percentage( obj = merged_physeq, merged_spiked_species = merged_spiked_species, output_file = output_docx, passed_range = c(0.1, 20) ) print(result_physeq) # ----------- TSE Example ----------- tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) merged_tse <- Pre_processing_species( tse_16SOTU, species_name = species_name, merge_method = "sum" ) output_docx_tse <- file.path(tempdir(), "spike_summary_tse.docx") result_tse <- calculate_spike_percentage( obj = merged_tse, merged_spiked_species = merged_spiked_species, output_file = output_docx_tse, passed_range = c(0.1, 20) ) print(result_tse) # Clean up temporary files if (file.exists(output_docx)) unlink(output_docx, force = TRUE) if (file.exists(output_docx_tse)) unlink(output_docx_tse, force = TRUE) }if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load example phyloseq object data("physeq_16SOTU", package = "DspikeIn") # ----------- Phyloseq Example ----------- species_name <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") merged_spiked_species <- "Tetragenococcus_halophilus" # Pre-process the phyloseq object to merge spike-in taxa merged_physeq <- Pre_processing_species( physeq_16SOTU, species_name = species_name, merge_method = "sum" ) # Perform spike-in percentage calculation output_docx <- file.path(tempdir(), "spike_summary_physeq.docx") result_physeq <- calculate_spike_percentage( obj = merged_physeq, merged_spiked_species = merged_spiked_species, output_file = output_docx, passed_range = c(0.1, 20) ) print(result_physeq) # ----------- TSE Example ----------- tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) merged_tse <- Pre_processing_species( tse_16SOTU, species_name = species_name, merge_method = "sum" ) output_docx_tse <- file.path(tempdir(), "spike_summary_tse.docx") result_tse <- calculate_spike_percentage( obj = merged_tse, merged_spiked_species = merged_spiked_species, output_file = output_docx_tse, passed_range = c(0.1, 20) ) print(result_tse) # Clean up temporary files if (file.exists(output_docx)) unlink(output_docx, force = TRUE) if (file.exists(output_docx_tse)) unlink(output_docx_tse, force = TRUE) }
Computes the percentage of reads attributed to specified spike-in taxa in a
phyloseq or TreeSummarizedExperiment object. The function merges spike-in taxa,
computes percentages, classifies samples into "passed" or "failed" based on a
user-defined threshold, and optionally exports DOCX and CSV reports.
calculate_spike_percentage_list( obj, merged_spiked_species, output_path = NULL, passed_range = c(0.1, 11) )calculate_spike_percentage_list( obj, merged_spiked_species, output_path = NULL, passed_range = c(0.1, 11) )
obj |
A |
merged_spiked_species |
Character vector or list of spike-in species names (at the |
output_path |
Optional. Character string specifying the output path (DOCX).
If |
passed_range |
Numeric vector of length 2 specifying the accepted percentage range.
Default is |
The function automatically detects spike-in OTUs based on the Species column in
the taxonomy table. It works with both phyloseq and TreeSummarizedExperiment
objects and produces QC diagnostics commonly required for spike-in based absolute
quantification workflows.
A data.frame with:
Sample
Total_Reads
Total_Reads_spiked
Percentage (of spike-in reads)
Result ("passed"/"failed")
Assumes the taxonomy table contains a column named Species.
Supports both phyloseq and TreeSummarizedExperiment objects.
if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load example phyloseq and TSE objects data("physeq", package = "DspikeIn") data("tse", package = "DspikeIn") # Define merged spike-in species list spiked_species_list <- c( "Pseudomonas aeruginosa", "Escherichia coli", "Clostridium difficile" ) # Create temporary output paths temp_docx <- file.path(tempdir(), "merged_result.docx") temp_csv <- sub(".docx", ".csv", temp_docx) # --- Phyloseq Example --- result_physeq <- calculate_spike_percentage_list( obj = physeq, merged_spiked_species = spiked_species_list, output_path = temp_docx, passed_range = c(0.1, 10) ) print(result_physeq) # --- TSE Example --- result_tse <- calculate_spike_percentage_list( obj = tse, merged_spiked_species = spiked_species_list, output_path = temp_docx, passed_range = c(0.1, 10) ) print(result_tse) # Clean up if (file.exists(temp_docx)) unlink(temp_docx, force = TRUE) if (file.exists(temp_csv)) unlink(temp_csv, force = TRUE) }if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load example phyloseq and TSE objects data("physeq", package = "DspikeIn") data("tse", package = "DspikeIn") # Define merged spike-in species list spiked_species_list <- c( "Pseudomonas aeruginosa", "Escherichia coli", "Clostridium difficile" ) # Create temporary output paths temp_docx <- file.path(tempdir(), "merged_result.docx") temp_csv <- sub(".docx", ".csv", temp_docx) # --- Phyloseq Example --- result_physeq <- calculate_spike_percentage_list( obj = physeq, merged_spiked_species = spiked_species_list, output_path = temp_docx, passed_range = c(0.1, 10) ) print(result_physeq) # --- TSE Example --- result_tse <- calculate_spike_percentage_list( obj = tse, merged_spiked_species = spiked_species_list, output_path = temp_docx, passed_range = c(0.1, 10) ) print(result_tse) # Clean up if (file.exists(temp_docx)) unlink(temp_docx, force = TRUE) if (file.exists(temp_csv)) unlink(temp_csv, force = TRUE) }
Calculates scaling factors for specified spike-in species or genera in
a phyloseq or TreeSummarizedExperiment (TSE) object. It supports genus/species-level detection,
removes spike-ins, merges them, computes scaling factors, and returns a bias-corrected absolute count matrix.
The function automatically handles:Species or genus-level spike-in identification, Safe tree and taxonomy synchronization,
Volume-based scaling (via the spiked.volume field in metadata),
Optional export of intermediate results for traceability (Total_Reads.csv, Spiked_Reads.csv, Scaling_Factors.csv).
calculate_spikeIn_factors( obj, spiked_cells, merged_spiked_species, output_path = NULL )calculate_spikeIn_factors( obj, spiked_cells, merged_spiked_species, output_path = NULL )
obj |
A |
spiked_cells |
A numeric value for the number of spiked cells per unit volume. |
merged_spiked_species |
A character vector of spiked taxon names (species or genus). |
output_path |
Optional directory path to save intermediate files (default is NULL). |
A list with:
scaling_factors |
Named numeric vector of scaling factors per sample. |
filtered_obj |
Input object with spike-in taxa removed. |
spiked_species_reads |
Data frame with spike-in reads per sample. |
total_reads |
Data frame with total reads per sample. |
spiked_species_merged |
The merged spike-in taxa as a phyloseq object. |
tree |
Original phylogenetic tree, if available. |
if (requireNamespace("DspikeIn", quietly = TRUE) && requireNamespace("phyloseq", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") spiked_cells <- 1847 species_name <- spiked_species <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") merged_spiked_species <- "Tetragenococcus_halophilus" # --- Phyloseq Example --- spiked_16S_OTU <- phyloseq::subset_samples(physeq_16SOTU, spiked.volume %in% c("2", "1")) temp_output_file <- file.path(tempdir(), "merged_physeq_sum.rds") output_dir <- file.path(tempdir(), "spikeIn_factors_output") merged_physeq_sum <- Pre_processing_species( spiked_16S_OTU, species_name = merged_spiked_species, merge_method = "sum", output_file = temp_output_file ) result_physeq <- calculate_spikeIn_factors( merged_physeq_sum, spiked_cells = spiked_cells, merged_spiked_species = merged_spiked_species, output_path = output_dir ) print(result_physeq$scaling_factors) if (file.exists(temp_output_file)) unlink(temp_output_file, force = TRUE) if (dir.exists(output_dir)) unlink(output_dir, recursive = TRUE, force = TRUE) # --- TSE Example --- tse_data <- convert_phyloseq_to_tse(physeq_16SOTU) merged_tse_sum <- Pre_processing_species( tse_data, species_name = merged_spiked_species, merge_method = "sum" ) result_tse <- calculate_spikeIn_factors( merged_tse_sum, spiked_cells = spiked_cells, merged_spiked_species = merged_spiked_species ) print(result_tse$scaling_factors) # --- Final cleanup of any extra leftover RDS files --- leftover_rds <- list.files(tempdir(), pattern = "merged_physeq.*\\.rds$", full.names = TRUE) file.remove(leftover_rds[file.exists(leftover_rds)]) }if (requireNamespace("DspikeIn", quietly = TRUE) && requireNamespace("phyloseq", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") spiked_cells <- 1847 species_name <- spiked_species <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") merged_spiked_species <- "Tetragenococcus_halophilus" # --- Phyloseq Example --- spiked_16S_OTU <- phyloseq::subset_samples(physeq_16SOTU, spiked.volume %in% c("2", "1")) temp_output_file <- file.path(tempdir(), "merged_physeq_sum.rds") output_dir <- file.path(tempdir(), "spikeIn_factors_output") merged_physeq_sum <- Pre_processing_species( spiked_16S_OTU, species_name = merged_spiked_species, merge_method = "sum", output_file = temp_output_file ) result_physeq <- calculate_spikeIn_factors( merged_physeq_sum, spiked_cells = spiked_cells, merged_spiked_species = merged_spiked_species, output_path = output_dir ) print(result_physeq$scaling_factors) if (file.exists(temp_output_file)) unlink(temp_output_file, force = TRUE) if (dir.exists(output_dir)) unlink(output_dir, recursive = TRUE, force = TRUE) # --- TSE Example --- tse_data <- convert_phyloseq_to_tse(physeq_16SOTU) merged_tse_sum <- Pre_processing_species( tse_data, species_name = merged_spiked_species, merge_method = "sum" ) result_tse <- calculate_spikeIn_factors( merged_tse_sum, spiked_cells = spiked_cells, merged_spiked_species = merged_spiked_species ) print(result_tse$scaling_factors) # --- Final cleanup of any extra leftover RDS files --- leftover_rds <- list.files(tempdir(), pattern = "merged_physeq.*\\.rds$", full.names = TRUE) file.remove(leftover_rds[file.exists(leftover_rds)]) }
Computes summary statistics (mean, standard deviation, standard error, and quartiles)
for numeric columns of a data frame, and saves the resulting table as a .csv file.
This utility is designed for quick post-analysis summaries and ensures that
all numerical columns are summarized consistently across datasets.
calculate_summary_stats_table(data, output_path = NULL)calculate_summary_stats_table(data, output_path = NULL)
data |
A data frame containing numeric variables. |
output_path |
Optional. Character string specifying the CSV output path.
Default is |
This function provides a concise statistical overview of numeric datasets.
It computes central tendency, dispersion, and quartile-based spread metrics
and writes the results to a .csv file suitable for reporting or downstream use.
A data frame containing mean, standard deviation, standard error, first quartile (Q25), median, and third quartile (Q75) for each numeric column.
if (requireNamespace("DspikeIn", quietly = TRUE)) { # --- Phyloseq example --- data("physeq", package = "DspikeIn") absolute_count <- phyloseq::otu_table(physeq) tmp_csv <- file.path(tempdir(), "physeq_summary.csv") summary_table_physeq <- calculate_summary_stats_table( data = as.data.frame(absolute_count), output_path = tmp_csv ) print(summary_table_physeq) if (file.exists(tmp_csv)) file.remove(tmp_csv) # --- TSE example --- data("tse", package = "DspikeIn") tse_counts <- SummarizedExperiment::assay(tse) tmp_csv2 <- file.path(tempdir(), "tse_summary.csv") summary_table_tse <- calculate_summary_stats_table( data = as.data.frame(tse_counts), output_path = tmp_csv2 ) print(summary_table_tse) if (file.exists(tmp_csv2)) file.remove(tmp_csv2) }if (requireNamespace("DspikeIn", quietly = TRUE)) { # --- Phyloseq example --- data("physeq", package = "DspikeIn") absolute_count <- phyloseq::otu_table(physeq) tmp_csv <- file.path(tempdir(), "physeq_summary.csv") summary_table_physeq <- calculate_summary_stats_table( data = as.data.frame(absolute_count), output_path = tmp_csv ) print(summary_table_physeq) if (file.exists(tmp_csv)) file.remove(tmp_csv) # --- TSE example --- data("tse", package = "DspikeIn") tse_counts <- SummarizedExperiment::assay(tse) tmp_csv2 <- file.path(tempdir(), "tse_summary.csv") summary_table_tse <- calculate_summary_stats_table( data = as.data.frame(tse_counts), output_path = tmp_csv2 ) print(summary_table_tse) if (file.exists(tmp_csv2)) file.remove(tmp_csv2) }
This object provides multiple color palettes for scientific, exploratory, and publication-grade graphics:
MG: The original palette.
extended_palette: MG combined with rainbow colors.
light_MG: A chic pastel palette.
MG_Awesome: A vibrant and unique 50-color palette for visualization.
cool_MG: A sophisticated, modern 50-color palette with oceanic, earthy, and high-contrast tones.
mix_MG: A randomly mixed unique palette.
vivid_MG: A vivid and bright 50-color palette.
Mar_palette: A carefully crafted 40-color nature-inspired palette suitable for high-quality publications.
color_palettecolor_palette
A list with eight elements:
A character vector of the original color codes.
A character vector of extended color codes, combining MG with the rainbow palette.
A character vector of chic pastel color codes.
A vibrant 50-color palette designed for high-impact visualizations.
A sophisticated and modern 50-color palette with oceanic, earthy, and high-contrast tones.
A fully combined and randomly mixed unique palette including Mar_palette.
A vivid and bright 50-color palette.
A 40-color palette inspired by nature, suitable for academic and professional publications.
These palettes can be directly used with ggplot2 or other visualization systems supporting manual color scales.
A named list of character vectors, each containing color hex codes for different palettes.
# Example using the Mar_palette ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg, color = factor(cyl))) + ggplot2::geom_point(size = 4) + ggplot2::scale_color_manual(values = color_palette$Mar_palette) + ggplot2::theme_minimal()# Example using the Mar_palette ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg, color = factor(cyl))) + ggplot2::geom_point(size = 4) + ggplot2::scale_color_manual(values = color_palette$Mar_palette) + ggplot2::theme_minimal()
Computes per-sample spike-in summary statistics from a microbiome object (phyloseq or TSE),
generates a spike-in success report using calculate_spike_percentage(), and returns both the raw data and
a formatted summary table (flextable). The function also attempts to extract and retain the phylogenetic tree if present.
conclusion( obj, merged_spiked_species, max_passed_range = 11, output_path = "merged_data.docx" )conclusion( obj, merged_spiked_species, max_passed_range = 11, output_path = "merged_data.docx" )
obj |
A |
merged_spiked_species |
A character vector of spiked species names. |
max_passed_range |
Numeric, maximum threshold for passing spike percentage. |
output_path |
Character, file path for the |
A list containing:
summary_stats |
A |
full_report |
The full spiked species report as a |
phy_tree |
The phylogenetic tree (if available). |
## ------------------------------- ## Example 1: Using phyloseq object ## ------------------------------- library(DspikeIn) data("physeq_16SOTU", package = "DspikeIn") # Merge spike-in species species_name <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") merged_sum <- Pre_processing_species( obj = physeq_16SOTU, species_name = species_name, merge_method = "sum" ) # Compute summary statistics output_doc <- file.path(tempdir(), "summary_phyloseq.docx") results_physeq <- conclusion( obj = merged_sum, merged_spiked_species = "Tetragenococcus_halophilus", max_passed_range = 20, output_path = output_doc ) print(results_physeq$summary_stats) ## ----------------------------------------------- ## Example 2: Using TreeSummarizedExperiment object ## ----------------------------------------------- tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) output_doc_tse <- file.path(tempdir(), "summary_tse.docx") results_tse <- conclusion( obj = tse_16SOTU, merged_spiked_species = "Tetragenococcus_halophilus", max_passed_range = 20, output_path = output_doc_tse ) print(results_tse$summary_stats)## ------------------------------- ## Example 1: Using phyloseq object ## ------------------------------- library(DspikeIn) data("physeq_16SOTU", package = "DspikeIn") # Merge spike-in species species_name <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") merged_sum <- Pre_processing_species( obj = physeq_16SOTU, species_name = species_name, merge_method = "sum" ) # Compute summary statistics output_doc <- file.path(tempdir(), "summary_phyloseq.docx") results_physeq <- conclusion( obj = merged_sum, merged_spiked_species = "Tetragenococcus_halophilus", max_passed_range = 20, output_path = output_doc ) print(results_physeq$summary_stats) ## ----------------------------------------------- ## Example 2: Using TreeSummarizedExperiment object ## ----------------------------------------------- tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) output_doc_tse <- file.path(tempdir(), "summary_tse.docx") results_tse <- conclusion( obj = tse_16SOTU, merged_spiked_species = "Tetragenococcus_halophilus", max_passed_range = 20, output_path = output_doc_tse ) print(results_tse$summary_stats)
Convert Categorical Columns to Factors in Sample Data
convert_categorical_to_factors(obj)convert_categorical_to_factors(obj)
obj |
A |
A phyloseq object with updated sample data.
data("physeq_16SOTU", package = "DspikeIn") ps_factor <- convert_categorical_to_factors(physeq_16SOTU)data("physeq_16SOTU", package = "DspikeIn") ps_factor <- convert_categorical_to_factors(physeq_16SOTU)
phyloseq Object to a TreeSummarizedExperiment
Converts a phyloseq object into a TreeSummarizedExperiment (TSE),
preserving key biological data components. The function supports retention of:
OTU abundance matrix (as assay).
Taxonomic classifications (as rowData).
Sample metadata (as colData).
Phylogenetic tree (as rowTree, if available).
Reference sequences (as referenceSeq, if available).
This allows seamless interoperability between phyloseq and Bioconductor ecosystems.
convert_phyloseq_to_tse(physeq)convert_phyloseq_to_tse(physeq)
physeq |
A valid |
A TreeSummarizedExperiment object with one or more of the following slots:
assays: OTU count matrix.
rowData: Taxonomy table.
colData: Sample metadata.
rowTree: Phylogenetic tree (if present).
referenceSeq: Reference sequences (if present).
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Create a small subset for fast execution physeq_sub <- phyloseq::prune_taxa( phyloseq::taxa_names(physeq_16SOTU)[1:10], phyloseq::prune_samples( phyloseq::sample_names(physeq_16SOTU)[1:5], physeq_16SOTU ) ) # Example transformation tse_sub <- convert_phyloseq_to_tse(physeq_sub) tse_sub }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Create a small subset for fast execution physeq_sub <- phyloseq::prune_taxa( phyloseq::taxa_names(physeq_16SOTU)[1:10], phyloseq::prune_samples( phyloseq::sample_names(physeq_16SOTU)[1:5], physeq_16SOTU ) ) # Example transformation tse_sub <- convert_phyloseq_to_tse(physeq_sub) tse_sub }
Converts relative ASV counts in a phyloseq or TreeSummarizedExperiment (TSE) object
to absolute counts by multiplying ASV counts by provided scaling factors.
Ensures phylogenetic tree (rowTree) and reference sequences (refseq) are retained if possible.
convert_to_absolute_counts(obj, scaling_factors, output_dir = NULL)convert_to_absolute_counts(obj, scaling_factors, output_dir = NULL)
obj |
A |
scaling_factors |
A named numeric vector of scaling factors for each sample. |
output_dir |
A character string specifying the directory to save the output file (default: |
A list containing:
absolute_counts: A data frame of absolute counts.
obj_adj: The modified phyloseq or TreeSummarizedExperiment object.
The output file absolute_counts.csv is written to the specified directory. For CRAN compliance, use tempdir() when saving files inside examples or vignettes.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") spiked_cells <- 1847 species_name <- spiked_species <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") merged_spiked_species <- "Tetragenococcus_halophilus" spiked_16S_OTU <- phyloseq::subset_samples(physeq_16SOTU, spiked.volume %in% c("2", "1")) Spiked_16S_sum_scaled <- Pre_processing_species( spiked_16S_OTU, species_name, merge_method = "sum", output_file = file.path(tempdir(), "merged_physeq_sum.rds") ) result <- calculate_spikeIn_factors( Spiked_16S_sum_scaled, spiked_cells, merged_spiked_species ) scaling_factors <- result$scaling_factors result_physeq <- convert_to_absolute_counts( Spiked_16S_sum_scaled, scaling_factors, output_dir = tempdir() ) abs_counts_physeq <- result_physeq$absolute_counts physeq_adj <- result_physeq$obj_adj tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) spiked_16S_OTU_TSE <- tse_16SOTU[, tse_16SOTU$spiked.volume %in% c("2", "1")] Spiked_16S_sum_tse_scaled <- Pre_processing_species( spiked_16S_OTU_TSE, species_name, merge_method = "sum", output_file = file.path(tempdir(), "merged_tse_sum.rds") ) result <- calculate_spikeIn_factors( Spiked_16S_sum_tse_scaled, spiked_cells, merged_spiked_species ) scaling_factors <- result$scaling_factors result_tse <- convert_to_absolute_counts( spiked_16S_OTU_TSE, scaling_factors, output_dir = tempdir() ) abs_counts_tse <- result_tse$absolute_counts tse_adj <- result_tse$obj_adj }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") spiked_cells <- 1847 species_name <- spiked_species <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") merged_spiked_species <- "Tetragenococcus_halophilus" spiked_16S_OTU <- phyloseq::subset_samples(physeq_16SOTU, spiked.volume %in% c("2", "1")) Spiked_16S_sum_scaled <- Pre_processing_species( spiked_16S_OTU, species_name, merge_method = "sum", output_file = file.path(tempdir(), "merged_physeq_sum.rds") ) result <- calculate_spikeIn_factors( Spiked_16S_sum_scaled, spiked_cells, merged_spiked_species ) scaling_factors <- result$scaling_factors result_physeq <- convert_to_absolute_counts( Spiked_16S_sum_scaled, scaling_factors, output_dir = tempdir() ) abs_counts_physeq <- result_physeq$absolute_counts physeq_adj <- result_physeq$obj_adj tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) spiked_16S_OTU_TSE <- tse_16SOTU[, tse_16SOTU$spiked.volume %in% c("2", "1")] Spiked_16S_sum_tse_scaled <- Pre_processing_species( spiked_16S_OTU_TSE, species_name, merge_method = "sum", output_file = file.path(tempdir(), "merged_tse_sum.rds") ) result <- calculate_spikeIn_factors( Spiked_16S_sum_tse_scaled, spiked_cells, merged_spiked_species ) scaling_factors <- result$scaling_factors result_tse <- convert_to_absolute_counts( spiked_16S_OTU_TSE, scaling_factors, output_dir = tempdir() ) abs_counts_tse <- result_tse$absolute_counts tse_adj <- result_tse$obj_adj }
TreeSummarizedExperiment to a phyloseq ObjectConverts a TreeSummarizedExperiment (TSE) object into a phyloseq object,
preserving key components such as the OTU table, taxonomy, sample metadata, phylogenetic tree,
and reference sequences (if present). This enables seamless interoperability between Bioconductor
and phyloseq workflows.
convert_tse_to_phyloseq(tse)convert_tse_to_phyloseq(tse)
tse |
A
|
A phyloseq object with all available components (otu_table, tax_table,
sample_data, phy_tree, refseq) populated accordingly.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Convert to TSE and back to phyloseq tse <- convert_phyloseq_to_tse(physeq_16SOTU) phy <- convert_tse_to_phyloseq(tse) print(phy) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Convert to TSE and back to phyloseq tse <- convert_phyloseq_to_tse(physeq_16SOTU) phy <- convert_tse_to_phyloseq(tse) print(phy) }
This function checks if a specified directory exists and creates it if it doesn't. Optionally, it can also set the newly created or existing directory as the working directory.
create_directory(directory_path, set_working_dir = FALSE)create_directory(directory_path, set_working_dir = FALSE)
directory_path |
A character string specifying the path of the directory to create. |
set_working_dir |
A logical value indicating whether to set the directory as the working directory. Default is FALSE. |
NULL. The function prints messages indicating whether the directory was created or already exists, and if the working directory was set.
if (interactive()) { # Save the current working directory old_wd <- getwd() # Use a temporary directory for safe example use tmp_dir <- file.path(tempdir(), "example_new_dir") # Create the directory and set it as the working directory create_directory(tmp_dir, set_working_dir = TRUE) # Do something inside tmp_dir... # Restore the original working directory setwd(old_wd) # Remove the created directory unlink(tmp_dir, recursive = TRUE, force = TRUE) }if (interactive()) { # Save the current working directory old_wd <- getwd() # Use a temporary directory for safe example use tmp_dir <- file.path(tempdir(), "example_new_dir") # Create the directory and set it as the working directory create_directory(tmp_dir, set_working_dir = TRUE) # Do something inside tmp_dir... # Restore the original working directory setwd(old_wd) # Remove the created directory unlink(tmp_dir, recursive = TRUE, force = TRUE) }
Create a List from a Phyloseq or TSE Object
create_list(obj)create_list(obj)
obj |
A phyloseq or TreeSummarizedExperiment object. |
A list containing the DGE list and updated phyloseq object.
This function loads a microbial network from a GraphML file or an igraph object,
computes node degree and modularity, assigns node sizes based on degree, and visualizes the network.
Edge thickness is determined by weight. It also computes and optionally saves global network metrics.
degree_network( graph_path, save_metrics = TRUE, metrics_path = "Global_Network_Metrics.csv", layout_type = "stress" )degree_network( graph_path, save_metrics = TRUE, metrics_path = "Global_Network_Metrics.csv", layout_type = "stress" )
graph_path |
Character or igraph object. Path to the GraphML file or an already loaded |
save_metrics |
Logical. If TRUE, saves the global metrics as a CSV file. |
metrics_path |
Optional. Path to save the metrics file. Default: "Global_Network_Metrics.csv". |
layout_type |
Character. Layout algorithm: "stress" (default), "graphopt", "fr", "mds", or "kk". |
A list containing:
plot |
A ggplot object displaying the network. |
metrics |
A dataframe with global network metrics. |
layout_data |
The layout used for plotting. |
graph |
The annotated igraph object. |
if (requireNamespace("DspikeIn", quietly = TRUE)) { Complete <- load_graphml(system.file("extdata", "Complete.graphml", package = "DspikeIn")) # Save metrics to a temporary file temp_metrics <- file.path(tempdir(), "Global_Network_Metrics.csv") result <- degree_network( graph_path = Complete, save_metrics = TRUE, metrics_path = temp_metrics ) print(result$metrics) print(result$plot) # Clean up temporary file unlink(temp_metrics) }if (requireNamespace("DspikeIn", quietly = TRUE)) { Complete <- load_graphml(system.file("extdata", "Complete.graphml", package = "DspikeIn")) # Save metrics to a temporary file temp_metrics <- file.path(tempdir(), "Global_Network_Metrics.csv") result <- degree_network( graph_path = Complete, save_metrics = TRUE, metrics_path = temp_metrics ) print(result$metrics) print(result$plot) # Clean up temporary file unlink(temp_metrics) }
This function identifies the common Amplicon Sequence Variants (ASVs) and
common taxa detected across multiple phyloseq or TreeSummarizedExperiment (TSE) objects.
It extracts ASVs and taxa using the package's accessor functions, finds the common ones,
and returns a pruned object containing only the shared features.
detect_common_asvs_taxa( obj_list, output_common_csv = "common_asvs_taxa.csv", output_common_rds = "common_asvs_taxa.rds", return_as_df = FALSE )detect_common_asvs_taxa( obj_list, output_common_csv = "common_asvs_taxa.csv", output_common_rds = "common_asvs_taxa.rds", return_as_df = FALSE )
obj_list |
A list of |
output_common_csv |
A character string specifying the path to save the pruned ASV/Taxa
object as a CSV file. Default is |
output_common_rds |
A character string specifying the path to save the pruned ASV/Taxa
object as an RDS file. Default is |
return_as_df |
A logical indicating whether to return the results as a data frame ( |
Optionally, the results can be saved to CSV and RDS files.
A pruned phyloseq or TreeSummarizedExperiment object containing only common ASVs and taxa.
If return_as_df = TRUE, a data frame with common ASVs/taxa is returned instead.
## Not run: if (requireNamespace("DspikeIn", quietly = TRUE)) { # Example with phyloseq objects common_physeq <- detect_common_asvs_taxa(list(physeq1, physeq2, physeq3)) # Example with TreeSummarizedExperiment objects common_tse <- detect_common_asvs_taxa(list(tse1, tse2, tse3)) } ## End(Not run)## Not run: if (requireNamespace("DspikeIn", quietly = TRUE)) { # Example with phyloseq objects common_physeq <- detect_common_asvs_taxa(list(physeq1, physeq2, physeq3)) # Example with TreeSummarizedExperiment objects common_tse <- detect_common_asvs_taxa(list(tse1, tse2, tse3)) } ## End(Not run)
Extracts the first and second neighbors of a given target node in an igraph network.
Users can provide either an igraph object or a GraphML file path (internal or external).
extract_neighbors(graph = NULL, target_node, mode = "all")extract_neighbors(graph = NULL, target_node, mode = "all")
graph |
Either:
|
target_node |
Character. The name of the target node. |
mode |
Character. Direction of edges to consider (
|
A list containing:
first_neighbors |
Character vector of first-degree neighbor names. |
second_neighbors |
Character vector of second-degree neighbor names. |
summary |
Data frame summarizing the extracted neighbors. |
# Load the built-in Complete graph complete_graph <- load_graphml("Complete.graphml") result1 <- extract_neighbors( graph = complete_graph, target_node = "OTU69:Basidiobolus_sp" ) print(result1$summary) # Load from an external GraphML file (ensure the file path is correct) # external_graph <- load_graphml("~/custom_network.graphml") # extract_neighbors(external_graph, target_node = "SomeNode")# Load the built-in Complete graph complete_graph <- load_graphml("Complete.graphml") result1 <- extract_neighbors( graph = complete_graph, target_node = "OTU69:Basidiobolus_sp" ) print(result1$summary) # Load from an external GraphML file (ensure the file path is correct) # external_graph <- load_graphml("~/custom_network.graphml") # extract_neighbors(external_graph, target_node = "SomeNode")
Filters low-abundance taxa from a phyloseq or TreeSummarizedExperiment object,
splits data into high- and low-abundance groups, and optionally saves results.
filter_and_split_abundance(obj, threshold = 0.01, output_prefix = NULL)filter_and_split_abundance(obj, threshold = 0.01, output_prefix = NULL)
obj |
A |
threshold |
A numeric value indicating the mean abundance threshold for filtering. |
output_prefix |
A character string specifying the filename prefix for saving.
If |
A named list with components:
Subset with taxa having mean abundance > threshold
Subset with taxa having mean abundance <= threshold
Each returned element contains a single object of the same class as the input (either phyloseq or TreeSummarizedExperiment).
data("physeq_ITSOTU", package = "DspikeIn") # Return results without saving output <- filter_and_split_abundance(physeq_ITSOTU, threshold = 0.05) # With saving tse_ITSOTU <- convert_phyloseq_to_tse(physeq_ITSOTU) output <- filter_and_split_abundance(tse_ITSOTU, threshold = 0.05, output_prefix = file.path(tempdir(), "abund") )data("physeq_ITSOTU", package = "DspikeIn") # Return results without saving output <- filter_and_split_abundance(physeq_ITSOTU, threshold = 0.05) # With saving tse_ITSOTU <- convert_phyloseq_to_tse(physeq_ITSOTU) output <- filter_and_split_abundance(tse_ITSOTU, threshold = 0.05, output_prefix = file.path(tempdir(), "abund") )
Converts a phyloseq or TreeSummarizedExperiment object into
a long-format data frame, similar to phyloseq::psmelt(), for compatibility
with visualization functions such as alluvial_plot().
get_long_format_data(obj)get_long_format_data(obj)
obj |
A |
A long-format data.frame containing taxonomic, abundance, and sample metadata.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Use a small subset for examples physeq_small <- phyloseq::prune_taxa( phyloseq::taxa_names(physeq_16SOTU)[1:10], phyloseq::prune_samples( phyloseq::sample_names(physeq_16SOTU)[1:5], physeq_16SOTU ) ) tse_small <- convert_phyloseq_to_tse(physeq_small) melted <- get_long_format_data(tse_small) head(melted) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Use a small subset for examples physeq_small <- phyloseq::prune_taxa( phyloseq::taxa_names(physeq_16SOTU)[1:10], phyloseq::prune_samples( phyloseq::sample_names(physeq_16SOTU)[1:5], physeq_16SOTU ) ) tse_small <- convert_phyloseq_to_tse(physeq_small) melted <- get_long_format_data(tse_small) head(melted) }
Retrieves the OTU table from a phyloseq or TreeSummarizedExperiment object.
Retrieves the OTU table or assay matrix from a phyloseq or TreeSummarizedExperiment object.
Retrieves the OTU table from a phyloseq or TreeSummarizedExperiment object.
Retrieves the OTU table from a phyloseq or TreeSummarizedExperiment object.
get_otu_table(obj) get_otu_table(obj) get_otu_table(obj) get_otu_table(obj)get_otu_table(obj) get_otu_table(obj) get_otu_table(obj) get_otu_table(obj)
obj |
A |
This is a thin wrapper that unifies access to assay() (for TreeSummarizedExperiment) and otu_table() (for phyloseq) into a single function for format-agnostic use.
Ensures consistent extraction across object classes and automatically transposes
the phyloseq matrix if taxa are not stored as rows.
A matrix containing OTU count data.
A numeric matrix containing OTU count data.
A matrix containing OTU count data.
A matrix containing OTU count data.
Retrieves the phylogenetic tree.
get_phy_tree(obj)get_phy_tree(obj)
obj |
A |
This function wraps phy_tree() and rowTree() for compatibility with both frameworks.
A phylogenetic tree object.
Retrieves the reference sequences (if available) from an object.
get_reference_seq(obj)get_reference_seq(obj)
obj |
A |
Thin wrapper around refseq() or referenceSeq() for unified sequence retrieval.
A DNAStringSet object containing the reference sequences.
Retrieves the sample metadata.
get_sample_data(obj)get_sample_data(obj)
obj |
A |
Thin wrapper over colData() or sample_data() for unified metadata access across formats.
A data frame with sample metadata.
Retrieves the total sequence counts per sample from a phyloseq or TreeSummarizedExperiment object.
get_sample_sums(obj)get_sample_sums(obj)
obj |
A |
A numeric vector of total counts per sample.
Retrieves the taxonomy table from an object.
get_tax_table(obj)get_tax_table(obj)
obj |
A |
This is a thin wrapper around rowData() or tax_table() to ensure compatibility across data structures.
A data frame containing taxonomy annotations.
This documentation provides an overview of normalization methods used for microbiome analysis.
For Differential Abundance Analysis: DESeq, TMM, or CSS normalization methods handle compositional biases and library size differences.
For Compositional Data: CLR normalization accounts for compositional structure by transforming the data into log-ratio format.
For Simplicity and Ease of Use: TC, UQ, or Median normalization methods are quick but may not be as robust.
This function calculates the geometric mean of a numeric vector. It removes non-positive and NA values by default.
gm_mean(x, na.rm = TRUE)gm_mean(x, na.rm = TRUE)
x |
A numeric vector. |
na.rm |
Logical. Should missing values (NAs) be removed? Defaults to TRUE. |
Overview of Normalization Methods
Geometric mean of x, or NA if no valid values are present.
These normalization methods are commonly used in microbiome analysis to ensure fair comparisons across samples.
vec <- c(1, 10, 100, 1000) gm_mean(vec)vec <- c(1, 10, 100, 1000) gm_mean(vec)
Computes per-sample scaling factors for multiple spike-in taxa (e.g.,
Bacillus_spike, Flavobacterium_spike) in either a phyloseq or
TreeSummarizedExperiment object. Handles variable spike-in cell counts per
sample and supports "sum" or "max" OTU merging methods.
imbalance_calculate_list_average_scaling_factors( obj, spiked_species_list, spiked_cells_list, merge_method = c("sum", "max"), normalize = TRUE, allow_infinite = FALSE, verbose = FALSE )imbalance_calculate_list_average_scaling_factors( obj, spiked_species_list, spiked_cells_list, merge_method = c("sum", "max"), normalize = TRUE, allow_infinite = FALSE, verbose = FALSE )
obj |
A |
spiked_species_list |
A named list of character vectors giving the
spike-in species names (as in |
spiked_cells_list |
A named list (same length as |
merge_method |
|
normalize |
Logical; if |
allow_infinite |
Logical; if |
verbose |
Logical; if |
Scaling factors are computed as:
For each spike-in group:
Identify OTUs matching that spike species via the Species column.
Merge those OTUs per sample (sum or max).
Divide expected spike cells by observed reads.
Average across all spike-in groups to produce one factor per sample.
Uses full matrix preallocation (no incremental vector growth) for Bioconductor
compliance. Missing values (zero spike reads) are set to NA or Inf if
allow_infinite = TRUE. Samples with all NA receive scaling = 1.
Named numeric vector of scaling factors (one per sample).
if (requireNamespace("phyloseq", quietly = TRUE)) { library(phyloseq) ## Example dataset otu <- matrix( c( 6000, 6200, 5900, 6100, 4000, 4200, 3900, 4100, 2000, 1900, 2100, 2050, 1300, 1250, 1350, 1400, 500, 800, 900, 700, # Flavobacterium_spike 900, 1200, 1100, 1000 # Bacillus_spike ), nrow = 6, byrow = TRUE, dimnames = list( c("OTU1", "OTU2", "OTU3", "OTU4", "Flavobacterium_spike", "Bacillus_spike"), c("S1", "S2", "S3", "S4") ) ) tax <- data.frame( Kingdom = rep("Bacteria", 6), Species = c("OTU1", "OTU2", "OTU3", "OTU4", "Flavobacterium_spike", "Bacillus_spike"), row.names = rownames(otu) ) # Fixed: add a column so sample_data is valid sam <- data.frame(SampleID = c("S1", "S2", "S3", "S4"), row.names = c("S1", "S2", "S3", "S4")) ps <- phyloseq( otu_table(otu, taxa_are_rows = TRUE), tax_table(as.matrix(tax)), sample_data(sam) ) spiked_species_list <- list( Flavo = "Flavobacterium_spike", Bacillus = "Bacillus_spike" ) spiked_cells_list <- list( Flavo = c(S1 = 1e7, S2 = 3e7, S3 = 6e7, S4 = 2e7), Bacillus = c(S1 = 2e7, S2 = 1e7, S3 = 5e7, S4 = 3e7) ) ## Works for both phyloseq and TSE: factors_phy <- imbalance_calculate_list_average_scaling_factors( ps, spiked_species_list, spiked_cells_list, normalize = FALSE ) tss <- convert_phyloseq_to_tse(ps) factors_tse <- imbalance_calculate_list_average_scaling_factors( tss, spiked_species_list, spiked_cells_list, normalize = FALSE ) all.equal(factors_phy, factors_tse) }if (requireNamespace("phyloseq", quietly = TRUE)) { library(phyloseq) ## Example dataset otu <- matrix( c( 6000, 6200, 5900, 6100, 4000, 4200, 3900, 4100, 2000, 1900, 2100, 2050, 1300, 1250, 1350, 1400, 500, 800, 900, 700, # Flavobacterium_spike 900, 1200, 1100, 1000 # Bacillus_spike ), nrow = 6, byrow = TRUE, dimnames = list( c("OTU1", "OTU2", "OTU3", "OTU4", "Flavobacterium_spike", "Bacillus_spike"), c("S1", "S2", "S3", "S4") ) ) tax <- data.frame( Kingdom = rep("Bacteria", 6), Species = c("OTU1", "OTU2", "OTU3", "OTU4", "Flavobacterium_spike", "Bacillus_spike"), row.names = rownames(otu) ) # Fixed: add a column so sample_data is valid sam <- data.frame(SampleID = c("S1", "S2", "S3", "S4"), row.names = c("S1", "S2", "S3", "S4")) ps <- phyloseq( otu_table(otu, taxa_are_rows = TRUE), tax_table(as.matrix(tax)), sample_data(sam) ) spiked_species_list <- list( Flavo = "Flavobacterium_spike", Bacillus = "Bacillus_spike" ) spiked_cells_list <- list( Flavo = c(S1 = 1e7, S2 = 3e7, S3 = 6e7, S4 = 2e7), Bacillus = c(S1 = 2e7, S2 = 1e7, S3 = 5e7, S4 = 3e7) ) ## Works for both phyloseq and TSE: factors_phy <- imbalance_calculate_list_average_scaling_factors( ps, spiked_species_list, spiked_cells_list, normalize = FALSE ) tss <- convert_phyloseq_to_tse(ps) factors_tse <- imbalance_calculate_list_average_scaling_factors( tss, spiked_species_list, spiked_cells_list, normalize = FALSE ) all.equal(factors_phy, factors_tse) }
Labels ASVs/OTUs in a phyloseq object using a named vector mapping hashcodes
to known taxonomy labels (e.g., spike-in species, genera, or families). This is
especially useful for clearly labeling synthetic controls or mock taxa.
If the specified taxonomic rank (e.g., "Genus" or "Family") does not exist in the
tax_table, it will be added and filled with NA, followed by inserting your custom labels.
label(obj, hashcode_label_map, tax_rank = "Species")label(obj, hashcode_label_map, tax_rank = "Species")
obj |
A |
hashcode_label_map |
A named character vector where names are ASV hashcodes (i.e., row names of the tax table), and values are the taxonomy labels to assign. |
tax_rank |
A character string specifying the taxonomic rank to label
(e.g., |
A modified phyloseq object with updated taxonomic labels at the specified rank.
if (requireNamespace("phyloseq", quietly = TRUE)) { library(phyloseq) # Create dummy tax_table with hashcodes tax_mat <- matrix( data = c( "Bacteria", NA, "Bacteria", NA, "Bacteria", NA, "Bacteria", NA ), nrow = 4, dimnames = list( c( "8ac7ad6e4b6501eb143d97f10bcc2b6d", "5a92565231c6df8f58871c0df2d1a12a", "8bf5a7b04cb725bc3d2627b971eb03fb", "7ae171e44f46ddadcbb53ee6b34a483b" ), c("Kingdom", "Species") ) ) ps <- phyloseq::phyloseq(phyloseq::tax_table(tax_mat)) # Define hash-to-label map hash_map <- c( "8ac7ad6e4b6501eb143d97f10bcc2b6d" = "MockGenus_A", "5a92565231c6df8f58871c0df2d1a12a" = "MockGenus_B" ) # Label genus (automatically adds "Genus" column if missing) ps <- label(ps, hash_map, tax_rank = "Genus") # Check updated taxonomy phyloseq::tax_table(ps)[, "Genus"] }if (requireNamespace("phyloseq", quietly = TRUE)) { library(phyloseq) # Create dummy tax_table with hashcodes tax_mat <- matrix( data = c( "Bacteria", NA, "Bacteria", NA, "Bacteria", NA, "Bacteria", NA ), nrow = 4, dimnames = list( c( "8ac7ad6e4b6501eb143d97f10bcc2b6d", "5a92565231c6df8f58871c0df2d1a12a", "8bf5a7b04cb725bc3d2627b971eb03fb", "7ae171e44f46ddadcbb53ee6b34a483b" ), c("Kingdom", "Species") ) ) ps <- phyloseq::phyloseq(phyloseq::tax_table(tax_mat)) # Define hash-to-label map hash_map <- c( "8ac7ad6e4b6501eb143d97f10bcc2b6d" = "MockGenus_A", "5a92565231c6df8f58871c0df2d1a12a" = "MockGenus_B" ) # Label genus (automatically adds "Genus" column if missing) ps <- label(ps, hash_map, tax_rank = "Genus") # Check updated taxonomy phyloseq::tax_table(ps)[, "Genus"] }
Loads a GraphML network from DspikeIn or user path, ensuring all attributes remain intact and correctly assigning original node names.
load_graphml(filename = "herp.spiecsym.network.graphml")load_graphml(filename = "herp.spiecsym.network.graphml")
filename |
Name of the GraphML file (default: "herp.spiecsym.network.graphml") or full file path. |
An igraph object representing the network.
This dataset contains detailed metadata associated with microbiome sequencing samples. It includes host information, sequencing metrics, environmental classifications, and experimental factors relevant for downstream analyses.
data(metadata_full)data(metadata_full)
A data frame with 312 rows and 46 columns:
Unique identifier for each sample.
Total number of sequencing reads before processing.
Total number of sequencing reads after spiking.
Relative abundance percentage of the target taxa.
Experimental result classification.
Beta diversity distances between samples.
Observed microbial richness in the sample.
Chao1 diversity estimator.
Abundance-based Coverage Estimator (ACE).
Standard error of ACE.
Shannon diversity index.
Simpson diversity index.
Inverse Simpson diversity index.
16S rRNA sequencing biosample identifier.
DNA biosample identifier.
General data type classification.
Library quantification (ng/L).
Identifier for the sequencing plate.
Well location on the sequencing plate.
Broad-scale environmental classification.
Taxonomic classification of the host.
Genus of the host organism.
Species of the host organism.
Classification of the host (e.g., Mammal, Bird).
Ecological mode of the host organism.
Taxonomic order of the host.
Taxonomic family of the host.
General dietary classification of the host.
Detailed dietary classification.
Habitat description where the sample was collected.
Indicates whether the species undergoes metamorphosis.
Reproductive strategy of the host species.
Ecoregion classification (level III).
Ecoregion classification (level IV).
Sampling site identifier.
Human-readable sample name.
Parent biosample identifier.
Secondary data type classification.
Second library quantification (ng/L).
Secondary plate ID (if applicable).
Secondary well location (if applicable).
Indicates whether the sample is biological or a blank control.
Indicates whether the sample was spiked with a blank.
Volume of spike added to the sample.
Indicates whether a swab was used for collection.
Molecular spike-in used in the sequencing process.
A data frame containing sample-level metadata for microbiome sequencing analysis.
data(metadata_full) head(metadata_full) summary(metadata_full)data(metadata_full) head(metadata_full) summary(metadata_full)
A numeric vector of point shapes (0-25) for use in ggplot2.
MG_shapesMG_shapes
An object of class numeric of length 26.
A numeric vector of shape codes.
# Example usage of MG_shapes in ggplot2 ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg, shape = factor(cyl))) + ggplot2::geom_point(size = 4) + ggplot2::scale_shape_manual(values = MG_shapes[seq_len(3)]) + my_custom_theme()# Example usage of MG_shapes in ggplot2 ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg, shape = factor(cyl))) + ggplot2::geom_point(size = 4) + ggplot2::scale_shape_manual(values = MG_shapes[seq_len(3)]) + my_custom_theme()
Creates a custom ggplot2 theme with consistent styling options, including background color, font size, and axis line formatting.
my_custom_theme(base_size = 12, font_family = "sans", bg_color = "white")my_custom_theme(base_size = 12, font_family = "sans", bg_color = "white")
base_size |
Numeric. Base font size for text elements (default: 12). |
font_family |
Character. Font family to use for text elements (default: "sans"). |
bg_color |
Character. Background color of the plot (default: "white"). |
This function applies a custom theme to ggplot2 plots. It removes unnecessary grid lines, ensures a clean background, and standardizes text styling.
A ggplot2 theme object.
library(ggplot2) p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(size = 3) + my_custom_theme() print(p)library(ggplot2) p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(size = 3) + my_custom_theme() print(p)
Computes various network metrics and generates:
Two multi-panel visualizations (each with 4 subplots).
A faceted plot of standardized (Z-score) metrics across communities.
A formatted flextable summarizing node-level metrics.
node_level_metrics(graph, save_path = NULL)node_level_metrics(graph, save_path = NULL)
graph |
An |
save_path |
Character. A file path (without extension) to save figures and table. Default is NULL (no saving). |
The function computes the following node-level metrics:
| Metric | Description |
Node |
Node name (character format) |
Degree |
Number of edges connected to the node |
Strength |
Sum of edge weights connected to the node |
Closeness |
Closeness centrality (normalized, based on shortest paths) |
Betweenness |
Betweenness centrality (normalized, measures control over network flow) |
EigenvectorCentrality |
Eigenvector centrality (importance based on connections to influential nodes) |
PageRank |
PageRank score (importance based on incoming links) |
Transitivity |
Local clustering coefficient (tendency of a node to form triangles) |
Coreness |
Node's coreness (from k-core decomposition) |
Constraint |
Burt's constraint (measures structural holes in a node's ego network) |
EffectiveSize |
Inverse of constraint (larger values = more non-redundant connections) |
Redundancy |
Sum of constraint values of a node's alters |
Community |
Community assignment from Louvain clustering |
Efficiency |
Global efficiency (average inverse shortest path length) |
Local_Efficiency |
Local efficiency (subgraph efficiency for a node's neighbors) |
Within_Module_Connectivity |
Proportion of neighbors in the same community |
Among_Module_Connectivity |
Proportion of neighbors in different communities |
A list containing:
metrics: A data frame with node-level metrics.
plot1: First multi-panel plot (2x2 layout with 4 subplots)
plot2: Second multi-panel plot (2x2 layout with 4 subplots)
facet_plot: A faceted plot showing Z-score standardized metrics across communities.
library(igraph) set.seed(42) # For external graphml please use full address # Load internal graphml Complete <- load_graphml("Complete.graphml") # Compute node-level metrics result <- node_level_metrics(Complete) # View computed metrics print(result$metrics) # Show the first 4x4 plot print(result$plots$plot1) # Show the second 4x4 plot print(result$plots$plot2) # Show facet plot print(result$facet_plot) # Print metrics and flextable print(result$metrics) print(result$flextable)library(igraph) set.seed(42) # For external graphml please use full address # Load internal graphml Complete <- load_graphml("Complete.graphml") # Compute node-level metrics result <- node_level_metrics(Complete) # View computed metrics print(result$metrics) # Show the first 4x4 plot print(result$plots$plot1) # Show the second 4x4 plot print(result$plots$plot2) # Show facet plot print(result$facet_plot) # Print metrics and flextable print(result$metrics) print(result$flextable)
CLR Normalization (Centered Log-Ratio Transformation)
norm.clr(obj)norm.clr(obj)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
A list containing the normalized phyloseq object and scaling factors.
CSS Normalization (Cumulative Sum Scaling)
norm.css(obj)norm.css(obj)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
A list containing the normalized phyloseq object and scaling factors.
DESeq Normalization
norm.DESeq(obj, groups, pseudocount = 1)norm.DESeq(obj, groups, pseudocount = 1)
obj |
A |
groups |
A string specifying the grouping variable in sample data. |
pseudocount |
A numeric value added to avoid zeros in the dataset. |
A list containing the normalized object (same format as input) and scaling factors.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Example 1: phyloseq input (subset to Animal.type == "Frog") physeq_frog <- phyloseq::subset_samples(physeq_16SOTU, Animal.type == "Frog") result_DESeq_phy <- norm.DESeq(physeq_frog, groups = "Animal.type", pseudocount = 1) # Example 2: TSE input (convert and subset to Animal.type == "Frog") tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) col_meta <- SummarizedExperiment::colData(tse_16SOTU) tse_frog <- tse_16SOTU[, which(col_meta$Animal.type == "Frog")] result_DESeq_tse <- norm.DESeq(tse_frog, groups = "Animal.type", pseudocount = 1) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Example 1: phyloseq input (subset to Animal.type == "Frog") physeq_frog <- phyloseq::subset_samples(physeq_16SOTU, Animal.type == "Frog") result_DESeq_phy <- norm.DESeq(physeq_frog, groups = "Animal.type", pseudocount = 1) # Example 2: TSE input (convert and subset to Animal.type == "Frog") tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) col_meta <- SummarizedExperiment::colData(tse_16SOTU) tse_frog <- tse_16SOTU[, which(col_meta$Animal.type == "Frog")] result_DESeq_tse <- norm.DESeq(tse_frog, groups = "Animal.type", pseudocount = 1) }
Median Normalization
norm.med(obj, groups)norm.med(obj, groups)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
groups |
A string specifying the grouping variable in sample data. |
A list containing the normalized phyloseq object and scaling factors.
Poisson Normalization and Differential Abundance Function
norm.Poisson(obj, group_var = NULL, pseudocount = 1e-06)norm.Poisson(obj, group_var = NULL, pseudocount = 1e-06)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
group_var |
A string specifying the grouping variable in sample data (if phyloseq object). |
pseudocount |
A numeric value added to avoid division by zero. |
A list containing the normalized data, scaling factor, and differential abundance results.
Quantile Normalization (QN) for phyloseq object
norm.QN(obj, filter = FALSE)norm.QN(obj, filter = FALSE)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
filter |
Logical, whether to filter low counts. |
A list containing the normalized phyloseq object and scaling factors.
Rarefying
norm.rar(obj)norm.rar(obj)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
A list containing the normalized phyloseq object and scaling factors.
RLE Normalization (Relative Log Expression)
norm.rle( obj, locfunc = stats::median, type = c("poscounts", "ratio"), geo_means = NULL, control_genes = NULL )norm.rle( obj, locfunc = stats::median, type = c("poscounts", "ratio"), geo_means = NULL, control_genes = NULL )
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
locfunc |
A function to compute the location statistic (default is median). |
type |
A character string specifying the type of normalization ("poscounts" or "ratio"). |
geo_means |
A vector of geometric means for each feature. |
control_genes |
A vector of control genes. |
A list containing the normalized phyloseq object and scaling factors.
TC Normalization (Total Count Scaling)
norm.TC(obj, groups)norm.TC(obj, groups)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
groups |
A string specifying the grouping variable in sample data. |
A list containing the normalized phyloseq object and scaling factors.
TMM Normalization (Trimmed Mean of M component)
norm.TMM(obj, groups)norm.TMM(obj, groups)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
groups |
A string specifying the grouping variable in sample data. |
A list containing the normalized phyloseq object and scaling factors.
TSS Normalization (Total Sum Scaling)
norm.tss(obj)norm.tss(obj)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
A list containing the normalized phyloseq object and scaling factor
UQ Normalization (Upper Quartile)
norm.UQ(obj, groups)norm.UQ(obj, groups)
obj |
A Phyloseq or TreeSummarizedExperiment objects. |
groups |
A string specifying the grouping variable in sample data. |
A list containing the normalized phyloseq object and scaling factors.
Apply the Selected Normalization Method to the Phyloseq and TSE Objects
normalization_set(obj, method, groups = NULL)normalization_set(obj, method, groups = NULL)
obj |
A phyloseq object. |
method |
A character string specifying the normalization method ("TC", "UQ", "med", "DESeq", "Poisson", "QN", "TMM", "clr", "rar", "css", "tss", "rle"). |
groups |
A column name of group labels from sample data. |
A list containing the normalized phyloseq object and scaling factors.
# Example with a phyloseq object if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") ps <- physeq_16SOTU result_phyloseq <- normalization_set(ps, method = "TC", groups = "Host.species") head(result_phyloseq$scaling.factor) normed_physeq <- result_phyloseq$dat.normed } # Example with a TreeSummarizedExperiment (TSE) object if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) result_tse <- normalization_set(tse_16SOTU, method = "clr") head(result_tse$scaling.factor) normed_tse <- result_tse$dat.normed } # For a full comparison of all normalization methods, see the vignette: # vignette("DspikeIn-with-Phyloseq", package = "DspikeIn") # vignette("DspikeIn-with-TSE", package = "DspikeIn")# Example with a phyloseq object if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") ps <- physeq_16SOTU result_phyloseq <- normalization_set(ps, method = "TC", groups = "Host.species") head(result_phyloseq$scaling.factor) normed_physeq <- result_phyloseq$dat.normed } # Example with a TreeSummarizedExperiment (TSE) object if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) result_tse <- normalization_set(tse_16SOTU, method = "clr") head(result_tse$scaling.factor) normed_tse <- result_tse$dat.normed } # For a full comparison of all normalization methods, see the vignette: # vignette("DspikeIn-with-Phyloseq", package = "DspikeIn") # vignette("DspikeIn-with-TSE", package = "DspikeIn")
Performs differential abundance analysis using edgeR or DESeq2, and visualizes results with a volcano plot, a log-fold change bar plot, and a relative abundance bar plot for significantly enriched taxa. Supports comparisons across one or more variables and allows global FDR correction across multiple contrasts.
perform_and_visualize_DA( obj, method, group_var = NULL, contrast, pseudocount = 1, significance_level = 0.05, output_csv_path = NULL, target_glom = "Genus", palette = c("#FFEB3B", "#073B4C"), global_fdr = TRUE )perform_and_visualize_DA( obj, method, group_var = NULL, contrast, pseudocount = 1, significance_level = 0.05, output_csv_path = NULL, target_glom = "Genus", palette = c("#FFEB3B", "#073B4C"), global_fdr = TRUE )
obj |
A |
method |
A string: either |
group_var |
A string specifying the grouping variable in the sample metadata.
Not required if |
contrast |
One of the following:
|
pseudocount |
A numeric value used to replace zero or negative counts (default = |
significance_level |
A numeric value for the FDR threshold to determine significance (default = |
output_csv_path |
Optional path to save results as CSV files. For multiple contrasts, each is saved separately. |
target_glom |
A string specifying the taxonomic rank to aggregate taxa (default = |
palette |
A vector of colors for plotting significant and non-significant points (default = |
global_fdr |
Logical. If |
For edgeR, the standard error of log-fold change (lfcSE) is estimated using the formula:
lfcSE = logFC / sqrt(LR), based on the likelihood ratio test statistic.
When providing multiple contrasts, FDR correction can be applied globally using the
global_fdr = TRUE option.
A list of result objects, or a single result if a single contrast is given. Each result includes:
results |
A |
obj_significant |
A filtered |
plot |
A |
bar_plot |
A |
bar_abundance_plot |
A |
if (requireNamespace("phyloseq", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Salamander samples belonging to two diet types ps_sal <- phyloseq::subset_samples( physeq_16SOTU, Animal.type == "Salamander" & Diet %in% c("Insectivore", "Carnivore") ) # Remove taxa with zero counts ps_sal <- phyloseq::prune_taxa( phyloseq::taxa_sums(ps_sal) > 0, ps_sal ) # Differential abundance test at Genus level da_res <- perform_and_visualize_DA( obj = ps_sal, method = "DESeq2", group_var = "Diet", significance_level = 0.05, contrast = c("Insectivore", "Carnivore"), target_glom = "Genus" ) # Visualize results if (!is.null(da_res$plot)) print(da_res$plot) head(da_res$results) # Example: multiple contrasts (optional demonstration) contrast_list <- list(c("Insectivore", "Carnivore")) da_multi <- perform_and_visualize_DA( obj = ps_sal, method = "DESeq2", group_var = "Diet", significance_level = 0.01, contrast = contrast_list, target_glom = "Genus", global_fdr = TRUE ) if (!is.null(da_multi[[1]]$bar_plot)) print(da_multi[[1]]$bar_plot) }if (requireNamespace("phyloseq", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Salamander samples belonging to two diet types ps_sal <- phyloseq::subset_samples( physeq_16SOTU, Animal.type == "Salamander" & Diet %in% c("Insectivore", "Carnivore") ) # Remove taxa with zero counts ps_sal <- phyloseq::prune_taxa( phyloseq::taxa_sums(ps_sal) > 0, ps_sal ) # Differential abundance test at Genus level da_res <- perform_and_visualize_DA( obj = ps_sal, method = "DESeq2", group_var = "Diet", significance_level = 0.05, contrast = c("Insectivore", "Carnivore"), target_glom = "Genus" ) # Visualize results if (!is.null(da_res$plot)) print(da_res$plot) head(da_res$results) # Example: multiple contrasts (optional demonstration) contrast_list <- list(c("Insectivore", "Carnivore")) da_multi <- perform_and_visualize_DA( obj = ps_sal, method = "DESeq2", group_var = "Diet", significance_level = 0.01, contrast = contrast_list, target_glom = "Genus", global_fdr = TRUE ) if (!is.null(da_multi[[1]]$bar_plot)) print(da_multi[[1]]$bar_plot) }
A general-purpose phyloseq object for microbiome method development and testing.
This synthetic dataset includes OTU abundances, taxonomic classifications, sample metadata,
a phylogenetic tree, and reference sequences.
data(physeq)data(physeq)
A phyloseq object with:
OTU abundance matrix (ASVs samples).
Taxonomic classification of ASVs.
Sample-level metadata.
A rooted phylogenetic tree of ASVs.
DNA reference sequences corresponding to each ASV.
A phyloseq object with full microbiome data components.
Synthetic data generated for benchmarking and demonstration.
if (requireNamespace("phyloseq", quietly = TRUE)) { data(physeq, package = "DspikeIn") physeq phyloseq::sample_names(physeq) phyloseq::taxa_names(physeq) phyloseq::phy_tree(physeq) phyloseq::refseq(physeq) }if (requireNamespace("phyloseq", quietly = TRUE)) { data(physeq, package = "DspikeIn") physeq phyloseq::sample_names(physeq) phyloseq::taxa_names(physeq) phyloseq::phy_tree(physeq) phyloseq::refseq(physeq) }
This dataset contains an example phyloseq object representing
16S rRNA gene amplicon sequencing data. It includes taxonomic assignments,
abundance counts, sample metadata, and phylogenetic information.
This object is intended for demonstration and testing of microbiome workflows.
data(physeq_16SOTU)data(physeq_16SOTU)
A phyloseq object with:
Operational Taxonomic Unit (OTU) abundance matrix.
Taxonomic classification of OTUs.
Metadata associated with the samples.
Phylogenetic tree relating OTUs (if available).
A phyloseq object containing 16S OTU data with taxonomy, sample metadata, and phylogeny.
Internal dataset for microbiome analysis.
if (requireNamespace("phyloseq", quietly = TRUE)) { data(physeq_16SOTU) physeq_16SOTU summary(physeq_16SOTU) phyloseq::sample_names(physeq_16SOTU) phyloseq::taxa_names(physeq_16SOTU) }if (requireNamespace("phyloseq", quietly = TRUE)) { data(physeq_16SOTU) physeq_16SOTU summary(physeq_16SOTU) phyloseq::sample_names(physeq_16SOTU) phyloseq::taxa_names(physeq_16SOTU) }
This dataset contains an example phyloseq object representing
Internal Transcribed Spacer (ITS) amplicon sequencing data. It includes
taxonomic annotations, OTU abundance counts, and associated sample metadata,
suitable for downstream analysis of fungal communities.
data(physeq_ITSOTU)data(physeq_ITSOTU)
A phyloseq object with:
Operational Taxonomic Unit (OTU) abundance matrix.
Taxonomic classification of OTUs.
Metadata associated with the samples.
Phylogenetic tree relating OTUs (if available).
A phyloseq object containing ITS OTU data with taxonomy, sample metadata, and phylogeny.
Internal dataset for microbiome analysis.
if (requireNamespace("phyloseq", quietly = TRUE)) { data(physeq_ITSOTU) physeq_ITSOTU summary(physeq_ITSOTU) phyloseq::sample_data(physeq_ITSOTU) phyloseq::taxa_names(physeq_ITSOTU) }if (requireNamespace("phyloseq", quietly = TRUE)) { data(physeq_ITSOTU) physeq_ITSOTU summary(physeq_ITSOTU) phyloseq::sample_data(physeq_ITSOTU) phyloseq::taxa_names(physeq_ITSOTU) }
This function generates a prevalence heatmap of the core microbiome at a specified taxonomic rank. It allows users to pass custom detection thresholds, prevalence thresholds, and a minimum prevalence filter, and it provides an option to order taxa either in ascending or descending abundance. The plot displays a heatmap showing detection thresholds at different prevalence levels.
plot_core_microbiome_custom( obj, taxrank = "Genus", select_taxa = NULL, detections = list(prevalences = seq(0.03, 1, 0.01), thresholds = 10^seq(log10(0.03), log10(1), length = 10), min_prevalence = 0.2, taxa_order = "descending"), output_core_csv = NULL, output_core_rds = NULL )plot_core_microbiome_custom( obj, taxrank = "Genus", select_taxa = NULL, detections = list(prevalences = seq(0.03, 1, 0.01), thresholds = 10^seq(log10(0.03), log10(1), length = 10), min_prevalence = 0.2, taxa_order = "descending"), output_core_csv = NULL, output_core_rds = NULL )
obj |
A |
taxrank |
A character string specifying the taxonomic rank to glom taxa. Default is "Genus". |
select_taxa |
A character vector of taxa to select. Default is |
detections |
A list with the following elements:
|
output_core_csv |
Path to save the core microbiome subset as a CSV file.
Default is |
output_core_rds |
Path to save the core microbiome subset as an RDS file.
Default is |
A ggplot2 object representing the core microbiome prevalence heatmap.
Uses microbiome::plot_core() for core heatmap visualization
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Subset to only Frog samples (biologically meaningful) physeq_frog <- phyloseq::subset_samples( physeq_16SOTU, Animal.type == "Frog" ) # Remove taxa with zero total abundance physeq_frog <- phyloseq::prune_taxa( phyloseq::taxa_sums(physeq_frog) > 0, physeq_frog ) # Relaxed thresholds to ensure visible core taxa custom_detections <- list( prevalences = seq(0.03, 1, 0.01), thresholds = 10^seq(log10(0.03), log10(1), length = 10), min_prevalence = 0.3, taxa_order = "ascending" ) # Temporary output paths (required for Bioconductor examples) core_csv <- file.path(tempdir(), "core_microbiome_frog.csv") core_rds <- file.path(tempdir(), "core_microbiome_frog.rds") # Generate the core microbiome prevalence plot plot_result <- plot_core_microbiome_custom( obj = physeq_frog, detections = custom_detections, taxrank = "Genus", output_core_csv = core_csv, output_core_rds = core_rds ) print(plot_result) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Subset to only Frog samples (biologically meaningful) physeq_frog <- phyloseq::subset_samples( physeq_16SOTU, Animal.type == "Frog" ) # Remove taxa with zero total abundance physeq_frog <- phyloseq::prune_taxa( phyloseq::taxa_sums(physeq_frog) > 0, physeq_frog ) # Relaxed thresholds to ensure visible core taxa custom_detections <- list( prevalences = seq(0.03, 1, 0.01), thresholds = 10^seq(log10(0.03), log10(1), length = 10), min_prevalence = 0.3, taxa_order = "ascending" ) # Temporary output paths (required for Bioconductor examples) core_csv <- file.path(tempdir(), "core_microbiome_frog.csv") core_rds <- file.path(tempdir(), "core_microbiome_frog.rds") # Generate the core microbiome prevalence plot plot_result <- plot_core_microbiome_custom( obj = physeq_frog, detections = custom_detections, taxrank = "Genus", output_core_csv = core_csv, output_core_rds = core_rds ) print(plot_result) }
Diagnostic visualization of spike-in taxa (ASVs/OTUs) on a phylogenetic tree.
Works with both phyloseq and TreeSummarizedExperiment objects (auto-converts internally).
Visual elements:
Tip labels (OTU/ASV names)
Branch length annotations
Prevalence (star size on tip)
Log10(mean abundance) (bar ring)
Sample metadata (discrete tile at tip)
plot_spikein_tree_diagnostic( obj, metadata_var, output_prefix = "spikein_diag", layout = "circular", save_plot = FALSE, width = 10, height = 10 )plot_spikein_tree_diagnostic( obj, metadata_var, output_prefix = "spikein_diag", layout = "circular", save_plot = FALSE, width = 10, height = 10 )
obj |
A |
metadata_var |
Character. Sample metadata variable to use as tile color. |
output_prefix |
Character. Filename prefix for saved plot. Default = "spikein_diag". |
layout |
Character. Tree layout. Default = "circular". |
save_plot |
Logical. Save figure if TRUE. |
width |
Numeric. Width of the output plot in inches. Default = 10. |
height |
Numeric. Height of the output plot in inches. Default = 10. |
Invisibly returns the ggplot object.
## Not run: if ( requireNamespace("DspikeIn", quietly = TRUE) && requireNamespace("phyloseq", quietly = TRUE) && requireNamespace("TreeSummarizedExperiment", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("ggtree", quietly = TRUE) && requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggstar", quietly = TRUE) && requireNamespace("ggnewscale", quietly = TRUE) ) { # ----------- Phyloseq Example ----------- data("physeq_16SOTU", package = "DspikeIn") spikein_ps <- phyloseq::subset_taxa(physeq_16SOTU, Genus == "Tetragenococcus") plot_spikein_tree_diagnostic( obj = spikein_ps, metadata_var = "Animal.type", save_plot = FALSE ) # ----------- TSE Example ----------- tse_spikein <- convert_phyloseq_to_tse(spikein_ps) plot_spikein_tree_diagnostic( obj = tse_spikein, metadata_var = "Animal.type", save_plot = FALSE ) } ## End(Not run)## Not run: if ( requireNamespace("DspikeIn", quietly = TRUE) && requireNamespace("phyloseq", quietly = TRUE) && requireNamespace("TreeSummarizedExperiment", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("ggtree", quietly = TRUE) && requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggstar", quietly = TRUE) && requireNamespace("ggnewscale", quietly = TRUE) ) { # ----------- Phyloseq Example ----------- data("physeq_16SOTU", package = "DspikeIn") spikein_ps <- phyloseq::subset_taxa(physeq_16SOTU, Genus == "Tetragenococcus") plot_spikein_tree_diagnostic( obj = spikein_ps, metadata_var = "Animal.type", save_plot = FALSE ) # ----------- TSE Example ----------- tse_spikein <- convert_phyloseq_to_tse(spikein_ps) plot_spikein_tree_diagnostic( obj = tse_spikein, metadata_var = "Animal.type", save_plot = FALSE ) } ## End(Not run)
Create a bar plot of relative or absolute abundances of microbial taxa
without using taxonomic glomming (tax_glom). Works on raw or normalized data.
plotbar_abundance( physeq, tax_level = "Genus", normalize = TRUE, treatment_variable = "Host.taxon", abundance_type = "relative", x_angle = 25, fill_variable = tax_level, facet_variable = NULL, palette = DspikeIn::color_palette$mix_MG, legend_size = 11, legend_columns = 1, x_scale = "free", xlab = NULL )plotbar_abundance( physeq, tax_level = "Genus", normalize = TRUE, treatment_variable = "Host.taxon", abundance_type = "relative", x_angle = 25, fill_variable = tax_level, facet_variable = NULL, palette = DspikeIn::color_palette$mix_MG, legend_size = 11, legend_columns = 1, x_scale = "free", xlab = NULL )
physeq |
A |
tax_level |
A character string indicating the taxonomic rank (e.g., |
normalize |
Logical; if |
treatment_variable |
Character; column name in sample metadata for x-axis grouping. |
abundance_type |
Character; either |
x_angle |
Numeric; angle of x-axis tick labels. Default: |
fill_variable |
Character; variable to fill bars by. Default: same as |
facet_variable |
Optional; column name for faceting. Default: |
palette |
A named vector of colors to use for |
legend_size |
Numeric; legend text size. Default: |
legend_columns |
Integer; number of legend columns. Default: |
x_scale |
Character; either |
xlab |
Optional; override x-axis label. If |
A ggplot2 object containing a bar plot of taxa abundance.
## Not run: # Load required package if (requireNamespace("phyloseq", quietly = TRUE)) { # Load example data data("physeq_ITSOTU", package = "DspikeIn") # Subset: Eurycea salamanders from Blue Ridge, exclude unwanted genera Des <- physeq_ITSOTU |> phyloseq::subset_taxa(Genus != "Dekkera") |> phyloseq::subset_samples(Clade.Order == "Caudate") |> phyloseq::subset_samples(Host.genus == "Eurycea") |> phyloseq::subset_samples(Ecoregion.III == "Blue Ridge") # Clean taxa: remove NAs or blanks in Phylum, filter low-abundance Des_filtered <- phyloseq::subset_taxa(Des, !is.na(Phylum) & Phylum != "") Des_ps <- phyloseq::prune_taxa(phyloseq::taxa_sums(Des_filtered) > 99, Des_filtered) # Plot taxa abundance with full control plotbar_abundance( physeq = Des_ps, normalize = TRUE, treatment_variable = "Diet", abundance_type = "absolute", x_angle = 0, fill_variable = "Phylum", palette = DspikeIn::color_palette$mix_MG, legend_size = 10, legend_columns = 1, x_scale = "free", xlab = NULL ) } ## End(Not run)## Not run: # Load required package if (requireNamespace("phyloseq", quietly = TRUE)) { # Load example data data("physeq_ITSOTU", package = "DspikeIn") # Subset: Eurycea salamanders from Blue Ridge, exclude unwanted genera Des <- physeq_ITSOTU |> phyloseq::subset_taxa(Genus != "Dekkera") |> phyloseq::subset_samples(Clade.Order == "Caudate") |> phyloseq::subset_samples(Host.genus == "Eurycea") |> phyloseq::subset_samples(Ecoregion.III == "Blue Ridge") # Clean taxa: remove NAs or blanks in Phylum, filter low-abundance Des_filtered <- phyloseq::subset_taxa(Des, !is.na(Phylum) & Phylum != "") Des_ps <- phyloseq::prune_taxa(phyloseq::taxa_sums(Des_filtered) > 99, Des_filtered) # Plot taxa abundance with full control plotbar_abundance( physeq = Des_ps, normalize = TRUE, treatment_variable = "Diet", abundance_type = "absolute", x_angle = 0, fill_variable = "Phylum", palette = DspikeIn::color_palette$mix_MG, legend_size = 10, legend_columns = 1, x_scale = "free", xlab = NULL ) } ## End(Not run)
Subsets, merges, and saves taxa based on hashcodes and a specified merge method ("sum" or "max").
This function pre-processes a phyloseq or TreeSummarizedExperiment (TSE) object by subsetting, merging,
and saving taxa based on provided hashcodes. It retains taxonomic information and creates intermediate datasets
for further downstream analysis.
Pre_processing_hashcodes( obj, hashcodes, merge_method = c("sum", "max"), output_prefix = "merged_physeq" )Pre_processing_hashcodes( obj, hashcodes, merge_method = c("sum", "max"), output_prefix = "merged_physeq" )
obj |
A |
hashcodes |
A character vector of taxon hashcodes (OTU row names). |
merge_method |
The method to merge taxa: "sum" or "max". |
output_prefix |
A prefix for the output file names. |
A processed phyloseq or TSE object.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Subset to Tetragenococcus species tetragenococcus_physeq <- phyloseq::subset_taxa( physeq_16SOTU, Species %in% c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") ) # Extract OTU IDs (hashcodes) for phyloseq object hashcodes_physeq <- rownames(phyloseq::otu_table(tetragenococcus_physeq)) # Remove previous output file if exists if (file.exists("merged_physeq_processed.rds")) { file.remove("merged_physeq_processed.rds") } # Run merging with "sum" method for phyloseq processed_sum <- Pre_processing_hashcodes( physeq_16SOTU, hashcodes = hashcodes_physeq, merge_method = "sum" ) # Convert to TreeSummarizedExperiment (TSE) tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) tetragenococcus_TSE <- convert_phyloseq_to_tse(tetragenococcus_physeq) # Extract hashcodes for TSE hashcodes_tse <- rownames(tetragenococcus_TSE) # Run merging with "max" method for TSE processed_max <- Pre_processing_hashcodes( tse_16SOTU, hashcodes = hashcodes_tse, merge_method = "max" ) # Final cleanup of written file file.remove("merged_physeq_processed.rds") }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Subset to Tetragenococcus species tetragenococcus_physeq <- phyloseq::subset_taxa( physeq_16SOTU, Species %in% c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") ) # Extract OTU IDs (hashcodes) for phyloseq object hashcodes_physeq <- rownames(phyloseq::otu_table(tetragenococcus_physeq)) # Remove previous output file if exists if (file.exists("merged_physeq_processed.rds")) { file.remove("merged_physeq_processed.rds") } # Run merging with "sum" method for phyloseq processed_sum <- Pre_processing_hashcodes( physeq_16SOTU, hashcodes = hashcodes_physeq, merge_method = "sum" ) # Convert to TreeSummarizedExperiment (TSE) tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) tetragenococcus_TSE <- convert_phyloseq_to_tse(tetragenococcus_physeq) # Extract hashcodes for TSE hashcodes_tse <- rownames(tetragenococcus_TSE) # Run merging with "max" method for TSE processed_max <- Pre_processing_hashcodes( tse_16SOTU, hashcodes = hashcodes_tse, merge_method = "max" ) # Final cleanup of written file file.remove("merged_physeq_processed.rds") }
Merges ASVs/OTUs while ensuring that the phylogenetic tree and reference sequences remain intact. The provided taxonomic name(s) will be searched across all taxonomic levels (e.g., Kingdom, Phylum, ..Genus, Species). If tree or refseq become mismatched, they are pruned or removed safely.
Pre_processing_species( obj, species_name, merge_method = c("sum", "max"), output_file = NULL )Pre_processing_species( obj, species_name, merge_method = c("sum", "max"), output_file = NULL )
obj |
A |
species_name |
A character vector of exact taxonomic names to merge (matched across all taxonomy levels). |
merge_method |
Method used to merge counts: |
output_file |
Optional file path to save the processed object (e.g., |
A processed phyloseq or TreeSummarizedExperiment object with merged ASVs/OTUs.
library(DspikeIn) data("physeq_16SOTU", package = "DspikeIn") species_name <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") # Merge species in phyloseq format merged_sum <- Pre_processing_species( physeq_16SOTU, species_name, merge_method = "sum" ) # Convert phyloseq to TSE format tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) # Merge species in TSE format and write to tempdir output_rds <- file.path(tempdir(), "merged_TSE_sum.rds") merged_TSE_sum <- Pre_processing_species( tse_16SOTU, species_name, merge_method = "sum", output_file = output_rds )library(DspikeIn) data("physeq_16SOTU", package = "DspikeIn") species_name <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp.") # Merge species in phyloseq format merged_sum <- Pre_processing_species( physeq_16SOTU, species_name, merge_method = "sum" ) # Convert phyloseq to TSE format tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) # Merge species in TSE format and write to tempdir output_rds <- file.path(tempdir(), "merged_TSE_sum.rds") merged_TSE_sum <- Pre_processing_species( tse_16SOTU, species_name, merge_method = "sum", output_file = output_rds )
Merges ASVs belonging to user-defined spike-in species by summing or selecting maximum counts,
while preserving all available metadata (taxonomy, sample data, phylogenetic tree, and reference sequences).
This function works for both phyloseq and TreeSummarizedExperiment objects.
Pre_processing_species_list( obj, spiked_species, merge_method = c("sum", "max"), output_file = NULL )Pre_processing_species_list( obj, spiked_species, merge_method = c("sum", "max"), output_file = NULL )
obj |
A |
spiked_species |
Character vector of species names to be processed (matched against the |
merge_method |
Either |
output_file |
Optional. File path to save the merged object as an |
A merged object of the same class as the input (phyloseq or TreeSummarizedExperiment).
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq", package = "DspikeIn") data("tse", package = "DspikeIn") spiked_species <- c("Pseudomonas aeruginosa", "Escherichia coli", "Clostridium difficile") # Merge in phyloseq object merged_physeq <- Pre_processing_species_list( physeq, spiked_species = spiked_species, merge_method = "sum" ) # Merge in TreeSummarizedExperiment object merged_tse <- Pre_processing_species_list( tse, spiked_species = spiked_species, merge_method = "sum" ) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq", package = "DspikeIn") data("tse", package = "DspikeIn") spiked_species <- c("Pseudomonas aeruginosa", "Escherichia coli", "Clostridium difficile") # Merge in phyloseq object merged_physeq <- Pre_processing_species_list( physeq, spiked_species = spiked_species, merge_method = "sum" ) # Merge in TreeSummarizedExperiment object merged_tse <- Pre_processing_species_list( tse, spiked_species = spiked_species, merge_method = "sum" ) }
This function normalizes the abundance data in a phyloseq or TreeSummarizedExperiment
object by adjusting each sample's counts based on a total value, typically the maximum total
sequence count across all samples. The adjusted counts are then rounded to the nearest integer.
proportion_adj(obj, output_file = "proportion_adjusted.rds")proportion_adj(obj, output_file = "proportion_adjusted.rds")
obj |
A |
output_file |
A character string specifying the output file name for the adjusted object.
If |
This function extracts the OTU table (or assay in TSE), normalizes it based on the sample sums,
and updates the original object while maintaining its structure.
A modified object of the same class (phyloseq or TreeSummarizedExperiment) with proportionally adjusted and rounded abundance data.
if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load phyloseq object data("physeq_16SOTU", package = "DspikeIn") normalized_physeq <- proportion_adj( physeq_16SOTU, output_file = file.path(tempdir(), "proportion_adjusted_physeq.rds") ) print(normalized_physeq) # Convert to TSE and apply tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) normalized_tse <- proportion_adj( tse_16SOTU, output_file = file.path(tempdir(), "proportion_adjusted_tse.rds") ) print(normalized_tse) }if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load phyloseq object data("physeq_16SOTU", package = "DspikeIn") normalized_physeq <- proportion_adj( physeq_16SOTU, output_file = file.path(tempdir(), "proportion_adjusted_physeq.rds") ) print(normalized_physeq) # Convert to TSE and apply tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) normalized_tse <- proportion_adj( tse_16SOTU, output_file = file.path(tempdir(), "proportion_adjusted_tse.rds") ) print(normalized_tse) }
This function visualizes relationships between any two node metrics in a quadrant plot. Quadrant labels dynamically adjust based on the selected X and Y axis metrics.
quadrant_plot( metrics, x_metric = "Degree", y_metric = "Redundancy", x_threshold = NULL, y_threshold = NULL, top_quantile = 0.95, point_size = 3 )quadrant_plot( metrics, x_metric = "Degree", y_metric = "Redundancy", x_threshold = NULL, y_threshold = NULL, top_quantile = 0.95, point_size = 3 )
metrics |
A |
x_metric |
Character. Column name to use for the x-axis. Default is |
y_metric |
Character. Column name to use for the y-axis. Default is |
x_threshold |
Numeric. X-axis threshold for quadrant separation. Default is |
y_threshold |
Numeric. Y-axis threshold for quadrant separation. Default is |
top_quantile |
Numeric. Quantile threshold (0-1) to highlight top nodes. Default is |
point_size |
Numeric. Size of the points. Default is |
A ggplot object representing the customized quadrant plot.
if (requireNamespace("DspikeIn", quietly = TRUE)) { g <- load_graphml("Complete.graphml") # Compute node-level metrics result <- node_level_metrics(g) metrics <- result$metrics # Generate a quadrant plot using Degree and Efficiency plot <- quadrant_plot(metrics, x_metric = "Degree", y_metric = "Efficiency") print(plot) }if (requireNamespace("DspikeIn", quietly = TRUE)) { g <- load_graphml("Complete.graphml") # Compute node-level metrics result <- node_level_metrics(g) metrics <- result$metrics # Generate a quadrant plot using Degree and Efficiency plot <- quadrant_plot(metrics, x_metric = "Degree", y_metric = "Efficiency") print(plot) }
Performs random subsampling on the OTU table of a phyloseq or
TreeSummarizedExperiment (TSE) object by dividing each ASV count
by a specified reduction factor and rounding down to the nearest whole number.
Optionally, the result can be saved to disk as an .rds file.
random_subsample_WithReductionFactor( obj, reduction_factor = 3, output_file = NULL )random_subsample_WithReductionFactor( obj, reduction_factor = 3, output_file = NULL )
obj |
A |
reduction_factor |
A numeric value |
output_file |
Optional. A character string specifying the |
A subsampled object of the same class as the input (phyloseq or TreeSummarizedExperiment).
if (requireNamespace("DspikeIn", quietly = TRUE) && requireNamespace("phyloseq", quietly = TRUE) && requireNamespace("TreeSummarizedExperiment", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") red <- random_subsample_WithReductionFactor(physeq_16SOTU, reduction_factor = 10) summary_stats <- summ_phyloseq_sampleID(red) print(summary_stats) tse <- convert_phyloseq_to_tse(physeq_16SOTU) red_tse <- random_subsample_WithReductionFactor(tse, reduction_factor = 10) summary_stats <- summ_phyloseq_sampleID(red_tse) print(summary_stats) }if (requireNamespace("DspikeIn", quietly = TRUE) && requireNamespace("phyloseq", quietly = TRUE) && requireNamespace("TreeSummarizedExperiment", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") red <- random_subsample_WithReductionFactor(physeq_16SOTU, reduction_factor = 10) summary_stats <- summ_phyloseq_sampleID(red) print(summary_stats) tse <- convert_phyloseq_to_tse(physeq_16SOTU) red_tse <- random_subsample_WithReductionFactor(tse, reduction_factor = 10) summary_stats <- summ_phyloseq_sampleID(red_tse) print(summary_stats) }
This function selects the most important Amplicon Sequence Variants (ASVs) or Operational Taxonomic Units (OTUs)
based on a Random Forest model. If a TreeSummarizedExperiment (TSE) is provided, it is first converted to phyloseq.
The function allows filtering and pruning of taxa before selecting the most important features.
Optionally, the selected ASVs/OTUs can be saved as a CSV file.
RandomForest_selected( physeq, response_var, minlib = 5000, prunescale = 1e-05, ntree = 100, n_top_predictors = 100, output_csv = NULL, na_vars = NULL )RandomForest_selected( physeq, response_var, minlib = 5000, prunescale = 1e-05, ntree = 100, n_top_predictors = 100, output_csv = NULL, na_vars = NULL )
physeq |
A |
response_var |
A character string specifying the response variable from the sample metadata. |
minlib |
A numeric value specifying the minimum library size for filtering low-abundance taxa. Default is |
prunescale |
A numeric value specifying the relative abundance threshold for pruning rare OTUs. Default is |
ntree |
An integer specifying the number of trees to grow in the Random Forest model. Default is |
n_top_predictors |
An integer specifying the number of top ASVs/OTUs to select based on feature importance. Default is |
output_csv |
An optional character string specifying the output CSV file name. If |
na_vars |
A character vector specifying metadata variables to check for missing values ( |
Returns a pruned phyloseq or TreeSummarizedExperiment (TSE) object containing only the selected ASVs/OTUs.
If the input is TSE, the output is converted back to TSE.
Based on public API usage of randomForest and phyloseq packages.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Perform Random Forest feature selection rf_physeq <- RandomForest_selected( physeq_16SOTU, prunescale = 0.00001, minlib = 5000, ntree = 30, n_top_predictors = 30, response_var = "Host.genus", na_vars = c("Habitat", "Ecoregion.III", "Host.genus", "Diet") ) # Less aggressive pruning (retain rare taxa) rf_physeq_relaxed <- RandomForest_selected( physeq_16SOTU, response_var = "Host.genus", minlib = 5000, prunescale = 0.00001, na_vars = c("Habitat", "Ecoregion.III", "Host.genus", "Diet") ) rf_physeq_strict <- RandomForest_selected( physeq_16SOTU, response_var = "Host.genus", minlib = 20000, prunescale = 0.0002, ntree = 200, n_top_predictors = 30, na_vars = c("Habitat", "Ecoregion.III", "Host.genus", "Diet") ) # Load TreeSummarizedExperiment (TSE) object tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) # Perform Random Forest feature selection on TSE object rf_tse <- RandomForest_selected( tse_16SOTU, response_var = "Host.genus", na_vars = c("Habitat", "Ecoregion.III", "Host.genus", "Diet") ) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Perform Random Forest feature selection rf_physeq <- RandomForest_selected( physeq_16SOTU, prunescale = 0.00001, minlib = 5000, ntree = 30, n_top_predictors = 30, response_var = "Host.genus", na_vars = c("Habitat", "Ecoregion.III", "Host.genus", "Diet") ) # Less aggressive pruning (retain rare taxa) rf_physeq_relaxed <- RandomForest_selected( physeq_16SOTU, response_var = "Host.genus", minlib = 5000, prunescale = 0.00001, na_vars = c("Habitat", "Ecoregion.III", "Host.genus", "Diet") ) rf_physeq_strict <- RandomForest_selected( physeq_16SOTU, response_var = "Host.genus", minlib = 20000, prunescale = 0.0002, ntree = 200, n_top_predictors = 30, na_vars = c("Habitat", "Ecoregion.III", "Host.genus", "Diet") ) # Load TreeSummarizedExperiment (TSE) object tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) # Perform Random Forest feature selection on TSE object rf_tse <- RandomForest_selected( tse_16SOTU, response_var = "Host.genus", na_vars = c("Habitat", "Ecoregion.III", "Host.genus", "Diet") ) }
Performs subsampling to an equal sequencing depth, determined by the sample with
the lowest sequencing depth after excluding very low abundant taxa. It rounds down the result
to the nearest integer (floor). Note: some samples may be lost in this process.
randomsubsample_Trimmed_evenDepth( obj, smalltrim = 0.001, replace = TRUE, output_file = NULL )randomsubsample_Trimmed_evenDepth( obj, smalltrim = 0.001, replace = TRUE, output_file = NULL )
obj |
A |
smalltrim |
A numeric value specifying the trimming percentage to exclude very low abundant taxa. Default is 0.001. |
replace |
A logical value indicating whether to sample with replacement. Default is TRUE. |
output_file |
A character string specifying the output file name for the subsampled object.
Default is |
A rarefied phyloseq or TreeSummarizedExperiment object with adjusted sequencing depths.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_ITSOTU", package = "DspikeIn") tse_ITSOTU <- convert_phyloseq_to_tse(physeq_ITSOTU) rarefied <- randomsubsample_Trimmed_evenDepth(tse_ITSOTU, smalltrim = 0.001) print(rarefied) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_ITSOTU", package = "DspikeIn") tse_ITSOTU <- convert_phyloseq_to_tse(physeq_ITSOTU) rarefied <- randomsubsample_Trimmed_evenDepth(tse_ITSOTU, smalltrim = 0.001) print(rarefied) }
This function generates a customizable scatter plot with a linear regression line,
statistical equation, and facets based on a specified range variable.
The x and y variables are transformed using a natural log transformation (log1p) to handle zero values.
regression_plot( data, x_var, y_var, custom_range = c(0.1, 15, 30, 50, 75, 100), formula = y ~ x, plot_title = NULL )regression_plot( data, x_var, y_var, custom_range = c(0.1, 15, 30, 50, 75, 100), formula = y ~ x, plot_title = NULL )
data |
A data frame containing the variables to plot. |
x_var |
A string specifying the name of the x-axis variable. |
y_var |
A string specifying the name of the y-axis variable. |
custom_range |
A numeric vector for defining custom ranges for the 'Percentage' column (default: c(0.1, 15, 30, 50, 75, 100)). |
formula |
A formula for the regression equation (default: y ~ x). |
plot_title |
A string specifying the title of the plot (default: NULL, no title will be shown if not provided). |
A ggplot2 object.
stat_regline_equation, stat_cor, facet_wrap
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("metadata_full", package = "DspikeIn") plot_object <- regression_plot( data = metadata_full, x_var = "Observed", y_var = "Total_Reads_spiked", custom_range = c(0.1, 15, 30, 50, 75, 100) ) # Print the plot output print(plot_object) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("metadata_full", package = "DspikeIn") plot_object <- regression_plot( data = metadata_full, x_var = "Observed", y_var = "Total_Reads_spiked", custom_range = c(0.1, 15, 30, 50, 75, 100) ) # Print the plot output print(plot_object) }
This function filters taxa from a phyloseq or TreeSummarizedExperiment (TSE) object based on custom thresholds for percentage of samples, mean abundance, count, and relative abundance.
relativized_filtered_taxa( obj, threshold_percentage = 0.5, threshold_mean_abundance = 0.001, threshold_count = 10, threshold_relative_abundance = NULL )relativized_filtered_taxa( obj, threshold_percentage = 0.5, threshold_mean_abundance = 0.001, threshold_count = 10, threshold_relative_abundance = NULL )
obj |
A |
threshold_percentage |
A numeric value specifying the minimum percentage of samples in which a taxon must be present to be retained. Default is 0.5. |
threshold_mean_abundance |
A numeric value specifying the minimum mean abundance of a taxon to be retained. Default is 0.001. |
threshold_count |
A numeric value specifying the minimum count of a taxon in a sample to be considered present. Default is 10. |
threshold_relative_abundance |
A numeric value specifying the minimum relative abundance of a taxon to be retained. Default is NULL. |
A filtered phyloseq or TSE object containing only the taxa that meet the specified thresholds.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Apply relative filtering on taxa FT <- relativized_filtered_taxa( physeq_16SOTU, threshold_percentage = 0.001, threshold_mean_abundance = 1, threshold_count = 5, threshold_relative_abundance = 0.001 ) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Apply relative filtering on taxa FT <- relativized_filtered_taxa( physeq_16SOTU, threshold_percentage = 0.001, threshold_mean_abundance = 1, threshold_count = 5, threshold_relative_abundance = 0.001 ) }
Remove Samples with Zero, Negative Counts, or NA Values and Add Pseudocount
remove_zero_negative_count_samples(obj, pseudocount = 1e-06)remove_zero_negative_count_samples(obj, pseudocount = 1e-06)
obj |
A |
pseudocount |
A numeric value to add to avoid zero counts. |
A phyloseq object with filtered and adjusted OTU table.
if (requireNamespace("DspikeIn", quietly = TRUE)) { library(DspikeIn) data("physeq_16SOTU", package = "DspikeIn") # Remove samples with zero/negative/NA counts and add pseudocount cleaned_ps <- remove_zero_negative_count_samples( physeq_16SOTU, pseudocount = 1e-6 ) }if (requireNamespace("DspikeIn", quietly = TRUE)) { library(DspikeIn) data("physeq_16SOTU", package = "DspikeIn") # Remove samples with zero/negative/NA counts and add pseudocount cleaned_ps <- remove_zero_negative_count_samples( physeq_16SOTU, pseudocount = 1e-6 ) }
This function processes microbiome data (phyloseq or TreeSummarizedExperiment),
rarefies the dataset, performs proportion transformation, and generates ridge plots
to visualize the distribution of relative abundances at a specified taxonomic rank.
ridge_plot_it(obj, taxrank = "Genus", rarefaction_depth = NULL, top_n = 10)ridge_plot_it(obj, taxrank = "Genus", rarefaction_depth = NULL, top_n = 10)
obj |
A |
taxrank |
A character string specifying the taxonomic rank for glomming and plotting. Default is "Genus". |
rarefaction_depth |
A numeric value specifying the rarefaction depth. If NULL, it is set to 90\
of the minimum sample sum (for |
top_n |
An integer specifying the number of top taxa to include in the plot. Default is 10. |
This function:
Rarefies the dataset (if phyloseq) to normalize sample depth.
Extracts abundance and taxonomic data using get_otu_table() and get_tax_table().
Aggregates abundance data at the specified taxonomic rank.
Selects the top n taxa by total abundance.
Generates a ridge plot to visualize abundance distributions.
A ggplot2 object representing the ridge plot of the distribution of relative abundances.
if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load phyloseq object data("physeq_16SOTU", package = "DspikeIn") ridge_physeq <- ridge_plot_it(physeq_16SOTU, taxrank = "Family", top_n = 10) # convert phyloseq object to TSE tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) ridge_tse <- ridge_plot_it(tse_16SOTU, taxrank = "Family", top_n = 10) }if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load phyloseq object data("physeq_16SOTU", package = "DspikeIn") ridge_physeq <- ridge_plot_it(physeq_16SOTU, taxrank = "Family", top_n = 10) # convert phyloseq object to TSE tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) ridge_tse <- ridge_plot_it(tse_16SOTU, taxrank = "Family", top_n = 10) }
Set Normalization Factors in the Sample Data of the Phyloseq Object
set_nf(obj, scaling.factor)set_nf(obj, scaling.factor)
obj |
A |
scaling.factor |
A vector of normalization factors. |
A phyloseq object with updated sample data.
if (requireNamespace("phyloseq", quietly = TRUE)) { data("physeq_ITSOTU", package = "DspikeIn") # Create normalization factors (e.g., all ones) nf <- rep(1, phyloseq::nsamples(physeq_ITSOTU)) # Apply normalization factors physeq_ITSOTU <- DspikeIn::set_nf(physeq_ITSOTU, scaling.factor = nf) # Check the updated sample data head(phyloseq::sample_data(physeq_ITSOTU)$norm_factors) }if (requireNamespace("phyloseq", quietly = TRUE)) { data("physeq_ITSOTU", package = "DspikeIn") # Create normalization factors (e.g., all ones) nf <- rep(1, phyloseq::nsamples(physeq_ITSOTU)) # Apply normalization factors physeq_ITSOTU <- DspikeIn::set_nf(physeq_ITSOTU, scaling.factor = nf) # Check the updated sample data head(phyloseq::sample_data(physeq_ITSOTU)$norm_factors) }
Evaluates the robustness of a network by iteratively removing nodes and measuring the size of the largest connected component at each step.
simulate_network_robustness( graph, steps = 10, removal_strategy = "random", plot_results = TRUE )simulate_network_robustness( graph, steps = 10, removal_strategy = "random", plot_results = TRUE )
graph |
An |
steps |
Integer. Number of nodes to remove. Default is |
removal_strategy |
Character. Node removal strategy, one of:
|
plot_results |
Logical. If |
Users can specify a node removal strategy from the following options:
"random": Removes nodes randomly.
"degree": Removes the node with the highest degree first.
"betweenness": Removes the node with the highest betweenness centrality first.
Important: For the "betweenness" strategy, all edge weights must be positive.
A list containing:
results: A data.frame with the step number and the relative size of the largest connected component.
plot: A ggplot2 object (returned if plot_results = TRUE).
if (requireNamespace("DspikeIn", quietly = TRUE)) { Complete <- load_graphml("Complete.graphml") # Simulate robustness by removing 200 highest-degree nodes robustness_degree <- simulate_network_robustness( graph = Complete, steps = 200, removal_strategy = "degree" ) # Simulate robustness with random node removal robustness_random <- simulate_network_robustness( graph = Complete, steps = 200, removal_strategy = "random" ) # Simulate robustness with betweenness-based node removal robustness_betweenness <- simulate_network_robustness( graph = Complete, steps = 200, removal_strategy = "betweenness" ) # Print robustness plots print(robustness_degree$plot) print(robustness_random$plot) print(robustness_betweenness$plot) }if (requireNamespace("DspikeIn", quietly = TRUE)) { Complete <- load_graphml("Complete.graphml") # Simulate robustness by removing 200 highest-degree nodes robustness_degree <- simulate_network_robustness( graph = Complete, steps = 200, removal_strategy = "degree" ) # Simulate robustness with random node removal robustness_random <- simulate_network_robustness( graph = Complete, steps = 200, removal_strategy = "random" ) # Simulate robustness with betweenness-based node removal robustness_betweenness <- simulate_network_robustness( graph = Complete, steps = 200, removal_strategy = "betweenness" ) # Print robustness plots print(robustness_degree$plot) print(robustness_random$plot) print(robustness_betweenness$plot) }
This function generates summary statistics (mean, median, standard deviation,
standard error, and quantiles) for each ASV (Amplicon Sequence Variant)
in a phyloseq or TreeSummarizedExperiment object.
summ_ASV_OTUID(obj)summ_ASV_OTUID(obj)
obj |
A |
This function extracts the OTU table (or assay in TSE), computes per-ASV statistics,
and returns a tidy summary data frame.
A data frame containing summary statistics for each ASV.
# Example with a phyloseq object if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_ITSOTU", package = "DspikeIn") summary_physeq <- summ_ASV_OTUID(physeq_ITSOTU) # Example with a TreeSummarizedExperiment object tse_ITSOTU <- convert_phyloseq_to_tse(physeq_ITSOTU) summary_tse <- summ_ASV_OTUID(tse_ITSOTU) }# Example with a phyloseq object if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_ITSOTU", package = "DspikeIn") summary_physeq <- summ_ASV_OTUID(physeq_ITSOTU) # Example with a TreeSummarizedExperiment object tse_ITSOTU <- convert_phyloseq_to_tse(physeq_ITSOTU) summary_tse <- summ_ASV_OTUID(tse_ITSOTU) }
Computes overall summary statistics (mean, median, standard deviation, standard error, and quantiles)
for the OTU table in a phyloseq or TreeSummarizedExperiment (TSE) object.
summ_count_phyloseq(obj)summ_count_phyloseq(obj)
obj |
A |
A data frame with overall summary statistics.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Summarize counts for the phyloseq object summary_stats_physeq <- summ_count_phyloseq(physeq_16SOTU) # Convert phyloseq object to a TreeSummarizedExperiment (TSE) tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) # Summarize counts for the TSE object summary_stats_tse <- summ_count_phyloseq(tse_16SOTU) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Summarize counts for the phyloseq object summary_stats_physeq <- summ_count_phyloseq(physeq_16SOTU) # Convert phyloseq object to a TreeSummarizedExperiment (TSE) tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) # Summarize counts for the TSE object summary_stats_tse <- summ_count_phyloseq(tse_16SOTU) }
Calculates summary statistics (mean, median, standard deviation, standard error, and quartiles)
for each sample in a phyloseq or TreeSummarizedExperiment object.
summ_phyloseq_sampleID(obj)summ_phyloseq_sampleID(obj)
obj |
A |
A data frame containing summary statistics per sample, with columns:
Sample_ID
Mean
Median
Standard Deviation (SD)
Standard Error (SE)
Q25 (25th percentile)
Q50 (Median)
Q75 (75th percentile)
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Summarize the phyloseq object summary_stats_physeq <- summ_phyloseq_sampleID(physeq_16SOTU) print(summary_stats_physeq) # Convert to TreeSummarizedExperiment tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) summary_stats_tse <- summ_phyloseq_sampleID(tse_16SOTU) print(summary_stats_tse) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") # Summarize the phyloseq object summary_stats_physeq <- summ_phyloseq_sampleID(physeq_16SOTU) print(summary_stats_physeq) # Convert to TreeSummarizedExperiment tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) summary_stats_tse <- summ_phyloseq_sampleID(tse_16SOTU) print(summary_stats_tse) }
This function creates a bar plot of relative or absolute abundances of the top n taxa
(OTUs/ASVs) at a specified taxonomic rank. Non-top taxa are aggregated into an "Others" category,
which is always displayed at the top for clarity. The function supports normalization, faceting,
and customization of plot aesthetics. The plot is built using ggplot2, and the taxa are ordered
with "Others" appearing first.
taxa_barplot( physeq, target_glom = "Genus", custom_tax_names = NULL, normalize = TRUE, treatment_variable = "Treatment", abundance_type = "relative", x_angle = 25, fill_variable = target_glom, facet_variable = NULL, top_n_taxa = 20, palette = color_palette$MG, legend_size = 11, legend_columns = 1, x_scale = "free", xlab = NULL )taxa_barplot( physeq, target_glom = "Genus", custom_tax_names = NULL, normalize = TRUE, treatment_variable = "Treatment", abundance_type = "relative", x_angle = 25, fill_variable = target_glom, facet_variable = NULL, top_n_taxa = 20, palette = color_palette$MG, legend_size = 11, legend_columns = 1, x_scale = "free", xlab = NULL )
physeq |
A |
target_glom |
A character string specifying the taxonomic rank to plot (e.g., "Genus"). |
custom_tax_names |
A character vector specifying custom taxonomic names for the levels. Default is |
normalize |
A logical value indicating whether to normalize sample counts to relative abundances. Default is |
treatment_variable |
A character string specifying the treatment variable for the x-axis (e.g., "Treatment"). Default is |
abundance_type |
A character string specifying whether to plot |
x_angle |
A numeric value specifying the angle of x-axis text labels. Default is |
fill_variable |
A character string specifying the variable to use for filling bar colors. Default is the same as |
facet_variable |
A character string specifying the variable to use for faceting the plot. Default is |
top_n_taxa |
A numeric value specifying the number of top taxa to include in the plot. Default is |
palette |
A character vector of color codes or a function generating such a palette. Default is |
legend_size |
A numeric value specifying the size of the legend text. Default is |
legend_columns |
A numeric value specifying the number of columns for the legend. Default is |
x_scale |
A character string specifying the x-axis scale in facets. Options are |
xlab |
A character string specifying the x-axis label. Default is |
A list containing the following components:
A ggplot2 object representing the taxa barplot.
A phyloseq object containing the top taxa and aggregated "Others" taxa.
Built on public functions from phyloseq and ggplot2 for data transformation and plotting.
# Example 1: Relative abundance barplot (subset to Frog samples for speed) data("physeq_16SOTU", package = "DspikeIn") # Subset to only 'Frog' samples physeq_frog <- phyloseq::subset_samples(physeq_16SOTU, Animal.type == "Frog") # Remove taxa with zero abundance physeq_frog <- phyloseq::prune_taxa(phyloseq::taxa_sums(physeq_frog) > 0, physeq_frog) # Plot relative abundance for the top 5 genera bp_rel <- taxa_barplot( physeq = physeq_frog, target_glom = "Genus", fill_variable = "Family", treatment_variable = "Diet", abundance_type = "relative", top_n_taxa = 5, legend_size = 9, x_scale = "fixed", legend_columns = 1, x_angle = 15, palette = DspikeIn::color_palette$MG ) print(bp_rel$barplot) # Example 2: Absolute abundance barplot (ITS data converted to TSE; Frog subset) data("physeq_ITSOTU", package = "DspikeIn") # Convert the phyloseq object to TreeSummarizedExperiment tse_ITSOTU <- convert_phyloseq_to_tse(physeq_ITSOTU) tse_frog <- tse_ITSOTU[, SummarizedExperiment::colData(tse_ITSOTU)$Animal.type == "Frog"] tse_frog <- tse_frog[rowSums(SummarizedExperiment::assay(tse_frog)) > 0, ] # Plot absolute abundance for top 5 Families bp_abs <- taxa_barplot( physeq = tse_frog, target_glom = "Family", treatment_variable = "Diet", fill_variable = "Family", abundance_type = "absolute", top_n_taxa = 5, x_angle = 15, legend_size = 9, legend_columns = 1, x_scale = "fixed", palette = DspikeIn::color_palette$cool_MG ) print(bp_abs$barplot)# Example 1: Relative abundance barplot (subset to Frog samples for speed) data("physeq_16SOTU", package = "DspikeIn") # Subset to only 'Frog' samples physeq_frog <- phyloseq::subset_samples(physeq_16SOTU, Animal.type == "Frog") # Remove taxa with zero abundance physeq_frog <- phyloseq::prune_taxa(phyloseq::taxa_sums(physeq_frog) > 0, physeq_frog) # Plot relative abundance for the top 5 genera bp_rel <- taxa_barplot( physeq = physeq_frog, target_glom = "Genus", fill_variable = "Family", treatment_variable = "Diet", abundance_type = "relative", top_n_taxa = 5, legend_size = 9, x_scale = "fixed", legend_columns = 1, x_angle = 15, palette = DspikeIn::color_palette$MG ) print(bp_rel$barplot) # Example 2: Absolute abundance barplot (ITS data converted to TSE; Frog subset) data("physeq_ITSOTU", package = "DspikeIn") # Convert the phyloseq object to TreeSummarizedExperiment tse_ITSOTU <- convert_phyloseq_to_tse(physeq_ITSOTU) tse_frog <- tse_ITSOTU[, SummarizedExperiment::colData(tse_ITSOTU)$Animal.type == "Frog"] tse_frog <- tse_frog[rowSums(SummarizedExperiment::assay(tse_frog)) > 0, ] # Plot absolute abundance for top 5 Families bp_abs <- taxa_barplot( physeq = tse_frog, target_glom = "Family", treatment_variable = "Diet", fill_variable = "Family", abundance_type = "absolute", top_n_taxa = 5, x_angle = 15, legend_size = 9, legend_columns = 1, x_scale = "fixed", palette = DspikeIn::color_palette$cool_MG ) print(bp_abs$barplot)
Cleans and standardizes a microbiome dataset, supporting both
phyloseq and TreeSummarizedExperiment. Performs:
Standardization of taxonomic ranks (if available)
Removal of leading/trailing whitespace in taxa names
Filtering out zero-count taxa
Exclusion of "Chloroplast" and "Mitochondria" classifications (if applicable)
Cleans and standardizes a microbiome object by:
Removing taxonomic prefixes (e.g. k__, p__)
Trimming whitespace
Renaming Domain to Kingdom (if present)
Removing taxa with "Chloroplast" or "Mitochondria" in any rank
Filtering out taxa with zero total abundance
Ensuring robust handling of both phyloseq and TreeSummarizedExperiment formats
tidy_phyloseq_tse(obj) tidy_phyloseq_tse(obj)tidy_phyloseq_tse(obj) tidy_phyloseq_tse(obj)
obj |
A |
Tidy a Phyloseq Object and Remove Zero/Negative Count Samples
This function standardizes taxonomic ranks, removes unnecessary whitespace, and filters unwanted classifications, ensuring consistency for downstream analysis.
A cleaned and tidied object of the same class.
A cleaned and filtered object of the same class with updated taxonomy.
if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") tidy_physeq <- tidy_phyloseq_tse(physeq_16SOTU) } if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load example phyloseq object data("physeq_16SOTU", package = "DspikeIn") # ----------- Tidy phyloseq object ----------- tidy_physeq <- tidy_phyloseq_tse(physeq_16SOTU) # ----------- Tidy TSE object ----------- tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) tidy_tse <- tidy_phyloseq_tse(tse_16SOTU) }if (requireNamespace("DspikeIn", quietly = TRUE)) { data("physeq_16SOTU", package = "DspikeIn") tidy_physeq <- tidy_phyloseq_tse(physeq_16SOTU) } if (requireNamespace("DspikeIn", quietly = TRUE)) { # Load example phyloseq object data("physeq_16SOTU", package = "DspikeIn") # ----------- Tidy phyloseq object ----------- tidy_physeq <- tidy_phyloseq_tse(physeq_16SOTU) # ----------- Tidy TSE object ----------- tse_16SOTU <- convert_phyloseq_to_tse(physeq_16SOTU) tidy_tse <- tidy_phyloseq_tse(tse_16SOTU) }
This dataset contains synthetic microbiome data stored as a TreeSummarizedExperiment (TSE) object.
The TSE structure includes abundance data, sample metadata, taxonomic annotations,
a phylogenetic tree linking ASVs, and reference sequences.
It is suitable for benchmarking preprocessing workflows involving taxonomic merging or filtering.
data(tse)data(tse)
A TreeSummarizedExperiment object with:
Matrix of observed counts or abundances (ASVs samples).
Taxonomic annotations per ASV.
Sample-level metadata.
A phylogenetic tree object representing ASV relationships.
A DNAStringSet of DNA sequences matching ASVs.
A TreeSummarizedExperiment object with full microbiome context (abundance, taxonomy, tree, and reference sequences).
Simulated data generated for reproducibility in microbiome pipelines.
TreeSummarizedExperiment::TreeSummarizedExperiment()
if (requireNamespace("TreeSummarizedExperiment", quietly = TRUE)) { data(tse, package = "DspikeIn") tse SummarizedExperiment::assay(tse) SummarizedExperiment::colData(tse) SummarizedExperiment::rowData(tse) TreeSummarizedExperiment::rowTree(tse) TreeSummarizedExperiment::referenceSeq(tse) }if (requireNamespace("TreeSummarizedExperiment", quietly = TRUE)) { data(tse, package = "DspikeIn") tse SummarizedExperiment::assay(tse) SummarizedExperiment::colData(tse) SummarizedExperiment::rowData(tse) TreeSummarizedExperiment::rowTree(tse) TreeSummarizedExperiment::referenceSeq(tse) }
Validates whether sample spike-in sequences form a monophyletic clade with known reference spike-in(s) using a Neighbor-Joining (NJ) tree with Jukes-Cantor correction and bootstrap support.
This function produces:
A bootstrap-annotated NJ tree
A boxplot comparing branch lengths
A histogram of patristic distances
If output_prefix is provided, outputs are saved. Otherwise, they
are shown interactively and recorded in-memory.
validate_spikein_clade( reference_fasta, sample_fasta, bootstrap = 100, output_prefix = NULL )validate_spikein_clade( reference_fasta, sample_fasta, bootstrap = 100, output_prefix = NULL )
reference_fasta |
Character. Path to FASTA file of reference spike-ins. |
sample_fasta |
Character. Path to FASTA file of sample spike-ins. |
bootstrap |
Integer. Number of bootstrap replicates (default = 100). |
output_prefix |
Character or NULL. File prefix for saving output. |
A list with:
NJ tree (class phylo)
TRUE if sample spike-ins form a clade
Bootstrap support percentage
Branch length summary
Patristic distance matrix
Tree plot object
Boxplot object
Histogram object
Text summary
Multiple sequence alignment (MSA)
Alignment converted to phyDat
JC69 distance matrix
ref_fasta <- system.file("extdata", "Ref.fasta", package = "DspikeIn") sample_fasta <- system.file("extdata", "Sample.fasta", package = "DspikeIn") result <- validate_spikein_clade(ref_fasta, sample_fasta)ref_fasta <- system.file("extdata", "Ref.fasta", package = "DspikeIn") sample_fasta <- system.file("extdata", "Sample.fasta", package = "DspikeIn") result <- validate_spikein_clade(ref_fasta, sample_fasta)
Loads a microbial network from the DspikeIn package (Complete.graphml, NoHubs.graphml, or NoBasid.graphml)
or from a user-provided GraphML file. Computes network metrics, assigns modularity-based node colors (cool_MG from DspikeIn),
and visualizes the network using ggraph. Optionally saves computed global network metrics as a CSV file.
weight_Network(graph_path = NULL, save_metrics = TRUE)weight_Network(graph_path = NULL, save_metrics = TRUE)
graph_path |
Character. The GraphML file path. Default ( |
save_metrics |
Logical. If |
A list containing:
plot |
A |
metrics |
A |
graph |
The annotated |
cluster_fast_greedy, ggraph, load_graphml
if (requireNamespace("DspikeIn", quietly = TRUE)) { Complete <- load_graphml("Complete.graphml") # Run network weighting function on the loaded dataset # Load a specific GraphML dataset from DspikeIn result <- weight_Network(graph_path = "Complete.graphml") # Load a custom GraphML file from user directory, # for external graphml please use **full address** print(result$plot) # View network metrics result$metrics # Optional: Clean up generated metrics file if saved unlink("Global_Network_Metrics.csv") }if (requireNamespace("DspikeIn", quietly = TRUE)) { Complete <- load_graphml("Complete.graphml") # Run network weighting function on the loaded dataset # Load a specific GraphML dataset from DspikeIn result <- weight_Network(graph_path = "Complete.graphml") # Load a custom GraphML file from user directory, # for external graphml please use **full address** print(result$plot) # View network metrics result$metrics # Optional: Clean up generated metrics file if saved unlink("Global_Network_Metrics.csv") }