Reproducibility is a crucial aspect of data analysis, particularly in the context of microbiome data. The ability to consistently replicate an analysis and obtain the same results is essential for ensuring the reliability of findings and facilitating scientific collaboration.
The dar package includes two key functions, export_steps and import_steps, which promote reproducibility in microbiome data analysis. These functions let you export the steps of a recipe to a JSON file and then import those steps to reproduce the analysis in a different environment.
The export_steps function writes a recipe’s steps to a JSON file. This is useful for documenting and sharing the parameters used in the analysis.
Here’s an example of how to use the export_steps function:
library(dar)
data(metaHIV_phy)
# Create a recipe with steps
rec <-
  recipe(metaHIV_phy, "RiskGroup2", "Species") |>
  step_subset_taxa(tax_level = "Kingdom", taxa = c("Bacteria", "Archaea")) |>
  step_filter_taxa(.f = "function(x) sum(x > 0) >= (0.3 * length(x))") |>
  step_maaslin()
# Export the steps to a JSON file
out_file <- tempfile(fileext = ".json")
export_steps(rec, out_file)
In this example, a recipe with several steps is created, and its steps are then exported to a JSON file with the export_steps function.
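tempfile() returns a path inside the session’s temporary directory (tempdir()). R removes that directory when the session ends normally, so a file written there can only be reused within the same session. To share steps with collaborators or reuse them later, you would write the JSON file to a stable location instead. The sketch below assumes a project-relative folder named "analysis" and a file named "dar_steps.json". Both names are illustrative, and steps_path() is a hypothetical helper, not part of dar.

```r
# Hypothetical helper (not part of dar): build a stable path for exported
# recipe steps, creating the parent directory if it does not exist yet.
steps_path <- function(dir = "analysis", file = "dar_steps.json") {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  file.path(dir, file)
}

# Usage with the recipe defined above (same export_steps call as before,
# but targeting a persistent path instead of a temporary one):
# export_steps(rec, steps_path())
```

Keeping the path logic in one function means the export and the later import refer to the same file name.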
The import_steps function reads steps from a JSON file and adds them to an existing recipe. This is useful when you want to reuse a previously saved set of steps or incorporate steps from another recipe into your current analysis.
Here’s an example of how to use the import_steps function:
# Initialize a recipe with a phyloseq object
rec <- recipe(metaHIV_phy, "RiskGroup2", "Species")
# Import the steps from the JSON file written in the export example
json_file <- out_file
rec <- import_steps(rec, json_file)
rec
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Preporcessing steps:
#>
#> ◉ step_subset_taxa() id = subset_taxa__Makmur
#> ◉ step_filter_taxa() id = filter_taxa__Gustavus_Adolphus_pastry
#>
#> DA steps:
#>
#> ◉ step_maaslin() id = maaslin__Chouquette
In this example, a recipe without any steps is initialized, and the steps are then imported from a JSON file with the import_steps function. The imported steps are added to the existing recipe.
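In the example above, the JSON file exists because out_file was created earlier in the same session. When the file comes from a collaborator or from a previous session, it helps to check it before importing so that a missing or empty file produces a clear error. The following is a minimal base R sketch. check_steps_file() is a hypothetical helper, not part of dar, and it does not validate the JSON content itself.

```r
# Hypothetical helper (not part of dar): fail early with a clear message
# if the steps file is missing or empty, otherwise return its absolute path.
check_steps_file <- function(path) {
  if (!file.exists(path)) {
    stop("Steps file not found: ", path, call. = FALSE)
  }
  if (file.size(path) == 0) {
    stop("Steps file is empty: ", path, call. = FALSE)
  }
  normalizePath(path)
}

# Usage before importing:
# rec <- import_steps(rec, check_steps_file(json_file))
```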
Once the steps are imported, we can add more steps or run the analysis with the prep function. In this case, we call prep directly and pass the prepared recipe to bake.
## Execute
da_results <- prep(rec, parallel = FALSE) |> bake()
da_results
#> ── DAR Results ─────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 101 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Results:
#>
#> ✔ maaslin__Chouquette diff_taxa = 86
#>
#> ℹ 86 taxa are present in all tested methods
#>
#> Bakes:
#>
#> ◉ 1 -> count_cutoff: NULL, weights: NULL, exclude: NULL, id: bake__Cuban_pastelito
Keep the following limitations and considerations in mind when using the export_steps and import_steps functions:
The JSON files generated by export_steps contain only the parameters of the recipe steps and bakes, not the original data used in the analysis. Therefore, when importing steps from a JSON file, make sure you have access to the same data that was originally used.
The export_steps and import_steps functions are specific to the dar package and are designed for microbiome data analysis. They are not applicable to other types of analyses or packages.
When importing steps from a JSON file, check whether the file contains bake steps. If it does, the recipe is prepared automatically once the steps are imported, which can increase the runtime and resource requirements of the analysis.
Make sure the same versions of the dar package’s dependencies are installed when exporting and importing recipe steps. Updates to dependencies can affect the compatibility and reproducibility of the analyses.
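The JSON file holds only step parameters, so the input data and the package versions have to be tracked separately. One way to do this is to save a small companion "manifest" next to the JSON file. The manifest records an MD5 fingerprint of the serialized input object and the installed versions of selected packages. The sketch below uses only base R. write_manifest(), check_manifest(), and the file name in the usage comment are hypothetical, not part of dar.

The fingerprint depends on R’s serialization format. Identical data may therefore hash differently across R versions or locales, so treat a mismatch as a prompt to investigate rather than as proof that the data changed.

```r
# Hypothetical helpers (not part of dar): record and verify a data
# fingerprint and package versions alongside an exported steps file.

# MD5 hash of an R object, computed by serializing it to a temporary file.
object_md5 <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(x, connection = NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Installed version of each package, as a character vector named by package.
package_versions <- function(pkgs) {
  vapply(pkgs, function(p) as.character(utils::packageVersion(p)), character(1))
}

# Save the data fingerprint and package versions to an .rds file.
write_manifest <- function(data, pkgs, file) {
  manifest <- list(data_md5 = object_md5(data), versions = package_versions(pkgs))
  saveRDS(manifest, file)
  invisible(manifest)
}

# Return TRUE if the data and package versions match the manifest;
# otherwise warn about each kind of mismatch and return FALSE.
check_manifest <- function(data, file) {
  manifest <- readRDS(file)
  ok <- TRUE
  if (!identical(object_md5(data), manifest$data_md5)) {
    warning("Input data differ from the data recorded in the manifest.")
    ok <- FALSE
  }
  current <- package_versions(names(manifest$versions))
  changed <- names(current)[current != manifest$versions]
  if (length(changed) > 0) {
    warning("Package versions changed: ", paste(changed, collapse = ", "))
    ok <- FALSE
  }
  ok
}

# Usage next to the exported JSON (package list is illustrative):
# write_manifest(metaHIV_phy, c("dar", "phyloseq", "Maaslin2"), "dar_steps_manifest.rds")
# check_manifest(metaHIV_phy, "dar_steps_manifest.rds")
```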
Reproducibility is essential in microbiome data analysis, and the dar package supports it through the export_steps and import_steps functions. These functions let you export the steps of a recipe to a JSON file and then import them to reproduce the analysis in a different environment. With these tools, you can document and share your analyses, increasing transparency and the reliability of your results.
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14)
#> os Ubuntu 24.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2024-10-30
#> pandoc 3.2.1 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-8 2024-09-12 [2] RSPM (R 4.4.0)
#> ade4 1.7-22 2023-02-06 [2] RSPM (R 4.4.0)
#> ape 5.8 2024-04-11 [2] RSPM (R 4.4.0)
#> assertthat 0.2.1 2019-03-21 [2] RSPM (R 4.4.0)
#> backports 1.5.0 2024-05-23 [2] RSPM (R 4.4.0)
#> base64enc 0.1-3 2015-07-28 [2] RSPM (R 4.4.0)
#> beachmat 2.23.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> beeswarm 0.4.0 2021-06-01 [2] RSPM (R 4.4.0)
#> biglm 0.9-3 2024-06-12 [2] RSPM (R 4.4.0)
#> Biobase * 2.67.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> BiocGenerics * 0.53.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> BiocNeighbors 2.1.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> BiocParallel 1.39.0 2024-10-23 [2] https://bioc.r-universe.dev (R 4.4.1)
#> BiocSingular 1.23.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> biomformat 1.35.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> Biostrings * 2.75.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> bit 4.5.0 2024-09-20 [2] RSPM (R 4.4.0)
#> bit64 4.5.2 2024-09-22 [2] RSPM (R 4.4.0)
#> bluster 1.17.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> boot 1.3-31 2024-08-28 [2] RSPM (R 4.4.0)
#> brio 1.1.5 2024-04-24 [2] RSPM (R 4.4.0)
#> bslib 0.8.0 2024-07-29 [2] RSPM (R 4.4.0)
#> buildtools 1.0.0 2024-10-28 [3] local (/pkg)
#> ca 0.71.1 2020-01-24 [2] RSPM (R 4.4.0)
#> cachem 1.1.0 2024-05-16 [2] RSPM (R 4.4.0)
#> checkmate 2.3.2 2024-07-29 [2] RSPM (R 4.4.0)
#> cli 3.6.3 2024-06-21 [2] RSPM (R 4.4.0)
#> cluster 2.1.6 2023-12-01 [2] RSPM (R 4.4.0)
#> codetools 0.2-20 2024-03-31 [2] RSPM (R 4.4.0)
#> colorspace 2.1-1 2024-07-26 [2] RSPM (R 4.4.0)
#> crayon 1.5.3 2024-06-20 [2] RSPM (R 4.4.0)
#> crosstalk 1.2.1 2023-11-23 [2] RSPM (R 4.4.0)
#> dar * 1.3.0 2024-10-30 [1] https://bioc.r-universe.dev (R 4.4.1)
#> data.table 1.16.2 2024-10-10 [2] RSPM (R 4.4.0)
#> DBI 1.2.3 2024-06-02 [2] RSPM (R 4.4.0)
#> DECIPHER 3.1.6 2024-10-29 [2] https://bioc.r-universe.dev (R 4.4.1)
#> decontam 1.25.0 2024-10-10 [2] https://bioc.r-universe.dev (R 4.4.1)
#> DelayedArray 0.31.14 2024-10-29 [2] https://bioc.r-universe.dev (R 4.4.1)
#> DelayedMatrixStats 1.27.3 2024-10-08 [2] https://bioc.r-universe.dev (R 4.4.1)
#> dendextend 1.18.1 2024-10-13 [2] RSPM (R 4.4.0)
#> DEoptimR 1.1-3 2023-10-07 [2] RSPM (R 4.4.0)
#> devtools 2.4.5 2022-10-11 [2] RSPM (R 4.4.0)
#> digest 0.6.37 2024-08-19 [2] RSPM (R 4.4.0)
#> DirichletMultinomial 1.47.2 2024-10-20 [2] https://bioc.r-universe.dev (R 4.4.1)
#> dplyr 1.1.4 2023-11-17 [2] RSPM (R 4.4.0)
#> ellipsis 0.3.2 2021-04-29 [2] RSPM (R 4.4.0)
#> evaluate 1.0.1 2024-10-10 [2] RSPM (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [2] RSPM (R 4.4.0)
#> farver 2.1.2 2024-05-13 [2] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [2] RSPM (R 4.4.0)
#> foreach 1.5.2 2022-02-02 [2] RSPM (R 4.4.0)
#> foreign 0.8-87 2024-06-26 [2] RSPM (R 4.4.0)
#> Formula 1.2-5 2023-02-24 [2] RSPM (R 4.4.0)
#> fs 1.6.4 2024-04-25 [2] RSPM (R 4.4.0)
#> furrr 0.3.1 2022-08-15 [2] RSPM (R 4.4.0)
#> future 1.34.0 2024-07-29 [2] RSPM (R 4.4.0)
#> generics 0.1.3 2022-07-05 [2] RSPM (R 4.4.0)
#> GenomeInfoDb * 1.41.2 2024-10-02 [2] https://bioc.r-universe.dev (R 4.4.1)
#> GenomeInfoDbData 1.2.13 2024-10-30 [2] Bioconductor
#> GenomicRanges * 1.57.2 2024-10-29 [2] https://bioc.r-universe.dev (R 4.4.1)
#> getopt 1.20.4 2023-10-01 [2] RSPM (R 4.4.0)
#> ggbeeswarm 0.7.2 2023-04-29 [2] RSPM (R 4.4.0)
#> ggplot2 3.5.1 2024-04-23 [2] RSPM (R 4.4.0)
#> ggrepel 0.9.6 2024-09-07 [2] RSPM (R 4.4.0)
#> globals 0.16.3 2024-03-08 [2] RSPM (R 4.4.0)
#> glue 1.8.0 2024-09-30 [2] RSPM (R 4.4.0)
#> gridExtra 2.3 2017-09-09 [2] RSPM (R 4.4.0)
#> gtable 0.3.6 2024-10-25 [2] RSPM (R 4.4.0)
#> hash 2.2.6.3 2023-08-19 [2] RSPM (R 4.4.0)
#> heatmaply 1.5.0 2023-10-06 [2] RSPM (R 4.4.0)
#> highr 0.11 2024-05-26 [2] RSPM (R 4.4.0)
#> Hmisc 5.2-0 2024-10-28 [2] RSPM (R 4.4.0)
#> hms 1.1.3 2023-03-21 [2] RSPM (R 4.4.0)
#> htmlTable 2.4.3 2024-07-21 [2] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [2] RSPM (R 4.4.0)
#> htmlwidgets 1.6.4 2023-12-06 [2] RSPM (R 4.4.0)
#> httpuv 1.6.15 2024-03-26 [2] RSPM (R 4.4.0)
#> httr 1.4.7 2023-08-15 [2] RSPM (R 4.4.0)
#> igraph 2.1.1 2024-10-19 [2] RSPM (R 4.4.0)
#> IRanges * 2.39.2 2024-10-25 [2] https://bioc.r-universe.dev (R 4.4.1)
#> irlba 2.3.5.1 2022-10-03 [2] RSPM (R 4.4.0)
#> iterators 1.0.14 2022-02-05 [2] RSPM (R 4.4.0)
#> jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.4.0)
#> jsonlite 1.8.9 2024-09-20 [2] RSPM (R 4.4.0)
#> knitr 1.48 2024-07-07 [2] RSPM (R 4.4.0)
#> labeling 0.4.3 2023-08-29 [2] RSPM (R 4.4.0)
#> later 1.3.2 2023-12-06 [2] RSPM (R 4.4.0)
#> lattice 0.22-6 2024-03-20 [2] RSPM (R 4.4.0)
#> lazyeval 0.2.2 2019-03-15 [2] RSPM (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.4.0)
#> listenv 0.9.1 2024-01-29 [2] RSPM (R 4.4.0)
#> lme4 1.1-35.5 2024-07-03 [2] RSPM (R 4.4.0)
#> logging 0.10-108 2019-07-14 [2] RSPM (R 4.4.0)
#> lpSolve 5.6.21 2024-09-12 [2] RSPM (R 4.4.0)
#> Maaslin2 1.19.0 2024-10-10 [2] https://bioc.r-universe.dev (R 4.4.1)
#> magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.4.0)
#> maketools 1.3.1 2024-10-28 [3] Github (jeroen/maketools@d46f92c)
#> MASS 7.3-61 2024-06-13 [2] RSPM (R 4.4.0)
#> Matrix 1.7-1 2024-10-18 [2] RSPM (R 4.4.0)
#> MatrixGenerics * 1.17.1 2024-10-23 [2] https://bioc.r-universe.dev (R 4.4.1)
#> matrixStats * 1.4.1 2024-09-08 [2] RSPM (R 4.4.0)
#> mediation 4.5.0 2019-10-08 [2] RSPM (R 4.4.0)
#> memoise 2.0.1 2021-11-26 [2] RSPM (R 4.4.0)
#> mgcv 1.9-1 2023-12-21 [2] RSPM (R 4.4.0)
#> mia * 1.13.47 2024-10-22 [2] https://bioc.r-universe.dev (R 4.4.1)
#> microbiome 1.27.0 2024-10-22 [2] https://bioc.r-universe.dev (R 4.4.1)
#> mime 0.12 2021-09-28 [2] RSPM (R 4.4.0)
#> miniUI 0.1.1.1 2018-05-18 [2] RSPM (R 4.4.0)
#> minqa 1.2.8 2024-08-17 [2] RSPM (R 4.4.0)
#> MultiAssayExperiment * 1.31.5 2024-10-22 [2] https://bioc.r-universe.dev (R 4.4.1)
#> multtest 2.61.0 2024-10-20 [2] https://bioc.r-universe.dev (R 4.4.1)
#> munsell 0.5.1 2024-04-01 [2] RSPM (R 4.4.0)
#> mvtnorm 1.3-1 2024-09-03 [2] RSPM (R 4.4.0)
#> nlme 3.1-166 2024-08-14 [2] RSPM (R 4.4.0)
#> nloptr 2.1.1 2024-06-25 [2] RSPM (R 4.4.0)
#> nnet 7.3-19 2023-05-03 [2] RSPM (R 4.4.0)
#> optparse 1.7.5 2024-04-16 [2] RSPM (R 4.4.0)
#> parallelly 1.38.0 2024-07-27 [2] RSPM (R 4.4.0)
#> pbapply 1.7-2 2023-06-27 [2] RSPM (R 4.4.0)
#> pcaPP 2.0-5 2024-08-19 [2] RSPM (R 4.4.0)
#> permute 0.9-7 2022-01-27 [2] RSPM (R 4.4.0)
#> phyloseq * 1.49.0 2024-10-24 [2] https://bioc.r-universe.dev (R 4.4.1)
#> pillar 1.9.0 2023-03-22 [2] RSPM (R 4.4.0)
#> pkgbuild 1.4.5 2024-10-28 [2] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.4.0)
#> pkgload 1.4.0 2024-06-28 [2] RSPM (R 4.4.0)
#> plotly 4.10.4 2024-01-13 [2] RSPM (R 4.4.0)
#> plyr 1.8.9 2023-10-02 [2] RSPM (R 4.4.0)
#> profvis 0.4.0 2024-09-20 [2] RSPM (R 4.4.0)
#> promises 1.3.0 2024-04-05 [2] RSPM (R 4.4.0)
#> purrr 1.0.2 2023-08-10 [2] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [2] RSPM (R 4.4.0)
#> rbiom 1.0.3 2021-11-05 [2] RSPM (R 4.4.0)
#> RColorBrewer 1.1-3 2022-04-03 [2] RSPM (R 4.4.0)
#> Rcpp 1.0.13 2024-07-17 [2] RSPM (R 4.4.0)
#> RcppParallel 5.1.9 2024-08-19 [2] RSPM (R 4.4.0)
#> readr 2.1.5 2024-01-10 [2] RSPM (R 4.4.0)
#> registry 0.5-1 2019-03-05 [2] RSPM (R 4.4.0)
#> remotes 2.5.0 2024-03-17 [2] RSPM (R 4.4.0)
#> reshape2 1.4.4 2020-04-09 [2] RSPM (R 4.4.0)
#> rhdf5 2.49.0 2024-10-28 [2] https://bioc.r-universe.dev (R 4.4.1)
#> rhdf5filters 1.17.0 2024-10-03 [2] https://bioc.r-universe.dev (R 4.4.1)
#> Rhdf5lib 1.27.0 2024-10-03 [2] https://bioc.r-universe.dev (R 4.4.1)
#> rlang 1.1.4 2024-06-04 [2] RSPM (R 4.4.0)
#> rmarkdown * 2.28 2024-08-17 [2] RSPM (R 4.4.0)
#> robustbase 0.99-4-1 2024-09-27 [2] RSPM (R 4.4.0)
#> rpart 4.1.23 2023-12-05 [2] RSPM (R 4.4.0)
#> rstudioapi 0.17.1 2024-10-22 [2] RSPM (R 4.4.0)
#> rsvd 1.0.5 2021-04-16 [2] RSPM (R 4.4.0)
#> Rtsne 0.17 2023-12-07 [2] RSPM (R 4.4.0)
#> S4Arrays 1.5.11 2024-10-29 [2] https://bioc.r-universe.dev (R 4.4.1)
#> S4Vectors * 0.43.2 2024-10-17 [2] https://bioc.r-universe.dev (R 4.4.1)
#> sandwich 3.1-1 2024-09-15 [2] RSPM (R 4.4.0)
#> sass 0.4.9 2024-03-15 [2] RSPM (R 4.4.0)
#> ScaledMatrix 1.13.0 2024-10-28 [2] https://bioc.r-universe.dev (R 4.4.1)
#> scales 1.3.0 2023-11-28 [2] RSPM (R 4.4.0)
#> scater 1.33.4 2024-10-17 [2] https://bioc.r-universe.dev (R 4.4.1)
#> scuttle 1.15.5 2024-10-27 [2] https://bioc.r-universe.dev (R 4.4.1)
#> seriation 1.5.6 2024-08-19 [2] RSPM (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [2] RSPM (R 4.4.0)
#> shiny 1.9.1 2024-08-01 [2] RSPM (R 4.4.0)
#> SingleCellExperiment * 1.27.2 2024-10-26 [2] https://bioc.r-universe.dev (R 4.4.1)
#> slam 0.1-54 2024-10-15 [2] RSPM (R 4.4.0)
#> SparseArray 1.5.45 2024-10-29 [2] https://bioc.r-universe.dev (R 4.4.1)
#> sparseMatrixStats 1.17.2 2024-10-11 [2] https://bioc.r-universe.dev (R 4.4.1)
#> stringi 1.8.4 2024-05-06 [2] RSPM (R 4.4.0)
#> stringr 1.5.1 2023-11-14 [2] RSPM (R 4.4.0)
#> SummarizedExperiment * 1.35.5 2024-10-29 [2] https://bioc.r-universe.dev (R 4.4.1)
#> survival 3.7-0 2024-06-05 [2] RSPM (R 4.4.0)
#> sys 3.4.3 2024-10-04 [2] RSPM (R 4.4.0)
#> testthat 3.2.1.1 2024-04-14 [2] RSPM (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [2] RSPM (R 4.4.0)
#> tidyr 1.3.1 2024-01-24 [2] RSPM (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [2] RSPM (R 4.4.0)
#> tidytree 0.4.6 2023-12-12 [2] RSPM (R 4.4.0)
#> treeio 1.29.2 2024-10-28 [2] https://bioc.r-universe.dev (R 4.4.1)
#> TreeSummarizedExperiment * 2.13.0 2024-10-28 [2] https://bioc.r-universe.dev (R 4.4.1)
#> TSP 1.2-4 2023-04-04 [2] RSPM (R 4.4.0)
#> tzdb 0.4.0 2023-05-12 [2] RSPM (R 4.4.0)
#> UCSC.utils 1.1.0 2024-10-29 [2] https://bioc.r-universe.dev (R 4.4.1)
#> UpSetR 1.4.0 2019-05-22 [2] RSPM (R 4.4.0)
#> urlchecker 1.0.1 2021-11-30 [2] RSPM (R 4.4.0)
#> usethis 3.0.0 2024-07-29 [2] RSPM (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [2] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [2] RSPM (R 4.4.0)
#> vegan 2.6-8 2024-08-28 [2] RSPM (R 4.4.0)
#> vipor 0.4.7 2023-12-18 [2] RSPM (R 4.4.0)
#> viridis 0.6.5 2024-01-29 [2] RSPM (R 4.4.0)
#> viridisLite 0.4.2 2023-05-02 [2] RSPM (R 4.4.0)
#> vroom 1.6.5 2023-12-05 [2] RSPM (R 4.4.0)
#> webshot 0.5.5 2023-06-26 [2] RSPM (R 4.4.0)
#> withr 3.0.2 2024-10-28 [2] RSPM (R 4.4.0)
#> xfun 0.48 2024-10-03 [2] RSPM (R 4.4.0)
#> xtable 1.8-4 2019-04-21 [2] RSPM (R 4.4.0)
#> XVector * 0.45.0 2024-10-02 [2] https://bioc.r-universe.dev (R 4.4.1)
#> yaml 2.3.10 2024-07-26 [2] RSPM (R 4.4.0)
#> yulab.utils 0.1.7 2024-08-26 [2] RSPM (R 4.4.0)
#> zlibbioc 1.51.2 2024-10-21 [2] Bioconductor 3.20 (R 4.4.1)
#> zoo 1.8-12 2023-04-13 [2] RSPM (R 4.4.0)
#>
#> [1] /tmp/Rtmplc7Qga/Rinst44d2321d7110
#> [2] /github/workspace/pkglib
#> [3] /usr/local/lib/R/site-library
#> [4] /usr/lib/R/site-library
#> [5] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────