The package includes a dataset from a study by Noguera-Julian, M.,
et al. 2016, which investigated the differential abundance of
microbial species between men who have sex with men (MSM) and non-MSM
(hts). This data is stored as an object of the phyloseq
class, a standard input format for creating recipes with dar in
conjunction with TreeSummarizedExperiment
. To begin the
analysis, we first load and inspect the data:
First, we will create a recipe object from the original data and then specify the processing and differential analysis steps.
Recipes can be created manually by sequentially adding roles to variables in a data set.
The easiest way to create the initial recipe is:
rec_obj <- recipe(metaHIV_phy, var_info = "RiskGroup2", tax_info = "Species")
rec_obj
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
The var_info
argument corresponds to the variable to be
considered in the modeling process and tax_info
indicates
the taxonomic level that will be used for the analyses.
From here, preprocessing steps for some step X can be added sequentially in one of two ways:
Note that all step_ancom
and the other functions will
always return updated recipes.
We have two types of steps, those in charge of processing (prepro) and those destined to define the methods of differential analysis (da).
The prepro steps are used to modify the data loaded into the recipe
which will then be used for the da steps. The dar
package
include 3 main preprocessing functionalities.
step_subset_taxa
: Is used for subsetting the columns
and values within the tax_table.
step_filter_taxa
: Is used for filtering OTUs from
recipe objects.
step_rarefaction
: Is used to resample an OTU table
such that all samples have the same library size.
Additionally, the dar
package provides convenient
wrappers for the step_filter_taxa
function, designed to
filter Operational Taxonomic Units (OTUs) based on specific criteria:
prevalence, variance, abundance, and rarity.
step_filter_by_prevalence
: Filters OTUs according to
the number of samples in which the OTU appears.step_filter_by_variance
: Filters OTUs based on the
variance of the OTU’s presence across samples.step_filter_by_abundance
: Filters OTUs according to the
OTU’s abundance across samples.step_filter_by_rarity
: Filters OTUs based on the rarity
of the OTU across samples.For our data, we can add an operation to preprocessing the data
stored in the initial recpie. First, we will use
step_subset_taxa
to retain only Bacteria and Archaea OTUs
from the Kingdom taxonomic level. We will then filter out OTUs where at
least 3% of the samples have counts greater than 0.
rec_obj <- rec_obj |>
step_subset_taxa(tax_level = "Kingdom", taxa = c("Bacteria", "Archaea")) |>
step_filter_by_prevalence(0.03)
rec_obj
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Preporcessing steps:
#>
#> ◉ step_subset_taxa() id = subset_taxa__Jachnun
#> ◉ step_filter_by_prevalence() id = filter_by_prevalence__Bougatsa
#>
#> DA steps:
Now that we have defined the preprocessing of the input data for all the da methods that will be used, we need to define them. For this introduction we will use metagenomeSeq and maaslin2 methods with default parameters (those defined by the authors of each method).
rec_obj <- rec_obj |>
step_deseq() |>
step_metagenomeseq(rm_zeros = 0.01) |>
step_maaslin()
rec_obj
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Preporcessing steps:
#>
#> ◉ step_subset_taxa() id = subset_taxa__Jachnun
#> ◉ step_filter_by_prevalence() id = filter_by_prevalence__Bougatsa
#>
#> DA steps:
#>
#> ◉ step_deseq() id = deseq__Nunt
#> ◉ step_metagenomeseq() id = metagenomeseq__Kringle
#> ◉ step_maaslin() id = maaslin__Kouign_amann
The dar
package includes more da steps than those
defined above. Below is the full list:
To ensure the reproducibility and consistency of the generated
results, all the steps defined in the recipe are executed at the same
time using the prep
function.
da_results <- prep(rec_obj, parallel = TRUE)
da_results
#> ── DAR Results ─────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 278 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Results:
#>
#> ✔ deseq__Nunt diff_taxa = 166
#> ✔ metagenomeseq__Kringle diff_taxa = 233
#> ✔ maaslin__Kouign_amann diff_taxa = 146
#>
#> ℹ 64 taxa are present in all tested methods
Note that the resulting object print shows information about the amount of differentially abundant OTUs in each of the methods, as well as the number of OTUs that have been detected by all methods (consensus).
Now that we have the results we need to extract them, however for
this we first need to define a consensus strategy with the
bake
. For this example we are only interested in those OTUs
detected as differentially abundant in the three methods used.
## Number of used methods
count <- steps_ids(da_results, type = "da") |> length()
## Define the bake
da_results <- bake(da_results, count_cutoff = count)
Finally we can extract the table with the results using the
cool
function.
cool(da_results)
#> # A tibble: 64 × 2
#> taxa_id taxa
#> <chr> <chr>
#> 1 Otu_15 Bifidobacterium_catenulatum
#> 2 Otu_34 Olsenella_scatoligenes
#> 3 Otu_35 Collinsella_aerofaciens
#> 4 Otu_37 Collinsella_stercoris
#> 5 Otu_38 Enorma_massiliensis
#> 6 Otu_45 Slackia_isoflavoniconvertens
#> 7 Otu_47 Bacteroides_cellulosilyticus
#> 8 Otu_48 Bacteroides_clarus
#> 9 Otu_63 Bacteroides_plebeius
#> 10 Otu_69 Bacteroides_sp_CAG_530
#> # ℹ 54 more rows
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14)
#> os Ubuntu 24.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2024-10-30
#> pandoc 3.2.1 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> ade4 1.7-22 2023-02-06 [2] RSPM (R 4.4.0)
#> ape 5.8 2024-04-11 [2] RSPM (R 4.4.0)
#> assertthat 0.2.1 2019-03-21 [2] RSPM (R 4.4.0)
#> Biobase 2.67.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> BiocGenerics 0.53.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> biomformat 1.35.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> Biostrings 2.75.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> brio 1.1.5 2024-04-24 [2] RSPM (R 4.4.0)
#> bslib 0.8.0 2024-07-29 [2] RSPM (R 4.4.0)
#> buildtools 1.0.0 2024-10-28 [3] local (/pkg)
#> ca 0.71.1 2020-01-24 [2] RSPM (R 4.4.0)
#> cachem 1.1.0 2024-05-16 [2] RSPM (R 4.4.0)
#> cli 3.6.3 2024-06-21 [2] RSPM (R 4.4.0)
#> cluster 2.1.6 2023-12-01 [2] RSPM (R 4.4.0)
#> codetools 0.2-20 2024-03-31 [2] RSPM (R 4.4.0)
#> colorspace 2.1-1 2024-07-26 [2] RSPM (R 4.4.0)
#> crayon 1.5.3 2024-06-20 [2] RSPM (R 4.4.0)
#> crosstalk 1.2.1 2023-11-23 [2] RSPM (R 4.4.0)
#> dar * 1.3.0 2024-10-30 [1] https://bioc.r-universe.dev (R 4.4.1)
#> data.table 1.16.2 2024-10-10 [2] RSPM (R 4.4.0)
#> dendextend 1.18.1 2024-10-13 [2] RSPM (R 4.4.0)
#> devtools 2.4.5 2022-10-11 [2] RSPM (R 4.4.0)
#> digest 0.6.37 2024-08-19 [2] RSPM (R 4.4.0)
#> dplyr 1.1.4 2023-11-17 [2] RSPM (R 4.4.0)
#> ellipsis 0.3.2 2021-04-29 [2] RSPM (R 4.4.0)
#> evaluate 1.0.1 2024-10-10 [2] RSPM (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [2] RSPM (R 4.4.0)
#> farver 2.1.2 2024-05-13 [2] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [2] RSPM (R 4.4.0)
#> foreach 1.5.2 2022-02-02 [2] RSPM (R 4.4.0)
#> fs 1.6.4 2024-04-25 [2] RSPM (R 4.4.0)
#> furrr 0.3.1 2022-08-15 [2] RSPM (R 4.4.0)
#> future 1.34.0 2024-07-29 [2] RSPM (R 4.4.0)
#> generics 0.1.3 2022-07-05 [2] RSPM (R 4.4.0)
#> GenomeInfoDb 1.41.2 2024-10-02 [2] https://bioc.r-universe.dev (R 4.4.1)
#> GenomeInfoDbData 1.2.13 2024-10-30 [2] Bioconductor
#> ggplot2 3.5.1 2024-04-23 [2] RSPM (R 4.4.0)
#> globals 0.16.3 2024-03-08 [2] RSPM (R 4.4.0)
#> glue 1.8.0 2024-09-30 [2] RSPM (R 4.4.0)
#> gridExtra 2.3 2017-09-09 [2] RSPM (R 4.4.0)
#> gtable 0.3.6 2024-10-25 [2] RSPM (R 4.4.0)
#> heatmaply 1.5.0 2023-10-06 [2] RSPM (R 4.4.0)
#> highr 0.11 2024-05-26 [2] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [2] RSPM (R 4.4.0)
#> htmlwidgets 1.6.4 2023-12-06 [2] RSPM (R 4.4.0)
#> httpuv 1.6.15 2024-03-26 [2] RSPM (R 4.4.0)
#> httr 1.4.7 2023-08-15 [2] RSPM (R 4.4.0)
#> igraph 2.1.1 2024-10-19 [2] RSPM (R 4.4.0)
#> IRanges 2.39.2 2024-10-25 [2] https://bioc.r-universe.dev (R 4.4.1)
#> iterators 1.0.14 2022-02-05 [2] RSPM (R 4.4.0)
#> jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.4.0)
#> jsonlite 1.8.9 2024-09-20 [2] RSPM (R 4.4.0)
#> knitr 1.48 2024-07-07 [2] RSPM (R 4.4.0)
#> labeling 0.4.3 2023-08-29 [2] RSPM (R 4.4.0)
#> later 1.3.2 2023-12-06 [2] RSPM (R 4.4.0)
#> lattice 0.22-6 2024-03-20 [2] RSPM (R 4.4.0)
#> lazyeval 0.2.2 2019-03-15 [2] RSPM (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.4.0)
#> listenv 0.9.1 2024-01-29 [2] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.4.0)
#> maketools 1.3.1 2024-10-28 [3] Github (jeroen/maketools@d46f92c)
#> MASS 7.3-61 2024-06-13 [2] RSPM (R 4.4.0)
#> Matrix 1.7-1 2024-10-18 [2] RSPM (R 4.4.0)
#> memoise 2.0.1 2021-11-26 [2] RSPM (R 4.4.0)
#> mgcv 1.9-1 2023-12-21 [2] RSPM (R 4.4.0)
#> microbiome 1.27.0 2024-10-22 [2] https://bioc.r-universe.dev (R 4.4.1)
#> mime 0.12 2021-09-28 [2] RSPM (R 4.4.0)
#> miniUI 0.1.1.1 2018-05-18 [2] RSPM (R 4.4.0)
#> multtest 2.61.0 2024-10-20 [2] https://bioc.r-universe.dev (R 4.4.1)
#> munsell 0.5.1 2024-04-01 [2] RSPM (R 4.4.0)
#> nlme 3.1-166 2024-08-14 [2] RSPM (R 4.4.0)
#> parallelly 1.38.0 2024-07-27 [2] RSPM (R 4.4.0)
#> permute 0.9-7 2022-01-27 [2] RSPM (R 4.4.0)
#> phyloseq 1.49.0 2024-10-24 [2] https://bioc.r-universe.dev (R 4.4.1)
#> pillar 1.9.0 2023-03-22 [2] RSPM (R 4.4.0)
#> pkgbuild 1.4.5 2024-10-28 [2] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.4.0)
#> pkgload 1.4.0 2024-06-28 [2] RSPM (R 4.4.0)
#> plotly 4.10.4 2024-01-13 [2] RSPM (R 4.4.0)
#> plyr 1.8.9 2023-10-02 [2] RSPM (R 4.4.0)
#> profvis 0.4.0 2024-09-20 [2] RSPM (R 4.4.0)
#> promises 1.3.0 2024-04-05 [2] RSPM (R 4.4.0)
#> purrr 1.0.2 2023-08-10 [2] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [2] RSPM (R 4.4.0)
#> RColorBrewer 1.1-3 2022-04-03 [2] RSPM (R 4.4.0)
#> Rcpp 1.0.13 2024-07-17 [2] RSPM (R 4.4.0)
#> registry 0.5-1 2019-03-05 [2] RSPM (R 4.4.0)
#> remotes 2.5.0 2024-03-17 [2] RSPM (R 4.4.0)
#> reshape2 1.4.4 2020-04-09 [2] RSPM (R 4.4.0)
#> rhdf5 2.49.0 2024-10-28 [2] https://bioc.r-universe.dev (R 4.4.1)
#> rhdf5filters 1.17.0 2024-10-03 [2] https://bioc.r-universe.dev (R 4.4.1)
#> Rhdf5lib 1.27.0 2024-10-03 [2] https://bioc.r-universe.dev (R 4.4.1)
#> rlang 1.1.4 2024-06-04 [2] RSPM (R 4.4.0)
#> rmarkdown * 2.28 2024-08-17 [2] RSPM (R 4.4.0)
#> Rtsne 0.17 2023-12-07 [2] RSPM (R 4.4.0)
#> S4Vectors 0.43.2 2024-10-17 [2] https://bioc.r-universe.dev (R 4.4.1)
#> sass 0.4.9 2024-03-15 [2] RSPM (R 4.4.0)
#> scales 1.3.0 2023-11-28 [2] RSPM (R 4.4.0)
#> seriation 1.5.6 2024-08-19 [2] RSPM (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [2] RSPM (R 4.4.0)
#> shiny 1.9.1 2024-08-01 [2] RSPM (R 4.4.0)
#> stringi 1.8.4 2024-05-06 [2] RSPM (R 4.4.0)
#> stringr 1.5.1 2023-11-14 [2] RSPM (R 4.4.0)
#> survival 3.7-0 2024-06-05 [2] RSPM (R 4.4.0)
#> sys 3.4.3 2024-10-04 [2] RSPM (R 4.4.0)
#> testthat 3.2.1.1 2024-04-14 [2] RSPM (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [2] RSPM (R 4.4.0)
#> tidyr 1.3.1 2024-01-24 [2] RSPM (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [2] RSPM (R 4.4.0)
#> TSP 1.2-4 2023-04-04 [2] RSPM (R 4.4.0)
#> UCSC.utils 1.1.0 2024-10-29 [2] https://bioc.r-universe.dev (R 4.4.1)
#> UpSetR 1.4.0 2019-05-22 [2] RSPM (R 4.4.0)
#> urlchecker 1.0.1 2021-11-30 [2] RSPM (R 4.4.0)
#> usethis 3.0.0 2024-07-29 [2] RSPM (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [2] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [2] RSPM (R 4.4.0)
#> vegan 2.6-8 2024-08-28 [2] RSPM (R 4.4.0)
#> viridis 0.6.5 2024-01-29 [2] RSPM (R 4.4.0)
#> viridisLite 0.4.2 2023-05-02 [2] RSPM (R 4.4.0)
#> webshot 0.5.5 2023-06-26 [2] RSPM (R 4.4.0)
#> withr 3.0.2 2024-10-28 [2] RSPM (R 4.4.0)
#> xfun 0.48 2024-10-03 [2] RSPM (R 4.4.0)
#> xtable 1.8-4 2019-04-21 [2] RSPM (R 4.4.0)
#> XVector 0.45.0 2024-10-02 [2] https://bioc.r-universe.dev (R 4.4.1)
#> yaml 2.3.10 2024-07-26 [2] RSPM (R 4.4.0)
#> zlibbioc 1.51.2 2024-10-21 [2] Bioconductor 3.20 (R 4.4.1)
#>
#> [1] /tmp/Rtmplc7Qga/Rinst44d2321d7110
#> [2] /github/workspace/pkglib
#> [3] /usr/local/lib/R/site-library
#> [4] /usr/lib/R/site-library
#> [5] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────