The step_filter_taxa function is a general function that allows flexible
filtering of OTUs based on across-sample abundance criteria. The other
functions, step_filter_by_prevalence, step_filter_by_variance,
step_filter_by_abundance, and step_filter_by_rarity, are convenience
wrappers around step_filter_taxa, each designed to filter OTUs on a
specific criterion: prevalence, variance, abundance, and rarity,
respectively. The step_subset_taxa function subsets taxa based on their
taxonomic level.
The phyloseq or TSE object used as input can be pre-filtered with
whichever methods are most convenient to the user. However, the dar
package also provides several functions that perform this filtering
directly on the recipe object.
The step_filter_taxa function applies an arbitrary set of functions to
OTUs as across-sample criteria. It takes a phyloseq object as input and
returns a logical vector indicating whether each OTU passed the
criteria. If the "prune" option is set to TRUE, it instead returns the
already-trimmed version of the phyloseq object.
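The idea of an across-sample criterion can be sketched in base R on a toy count matrix. This is only an illustration of the logic, not the dar or phyloseq implementation; the object names (otu, present_in_half) and the counts are made up for the example.

```r
# Toy OTU table: rows are taxa, columns are samples (values are made up).
otu <- matrix(
  c(10, 0, 3,  0,
     0, 0, 1,  0,
    25, 7, 9, 12),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu_a", "otu_b", "otu_c"), paste0("s", 1:4))
)

# Hypothetical across-sample criterion: the taxon must be present
# (count > 0) in at least half of the samples.
present_in_half <- function(x) sum(x > 0) >= 0.5 * length(x)

# One logical value per taxon, analogous to the logical vector described above.
keep <- apply(otu, 1, present_in_half)

# Pruning then amounts to keeping only the rows that passed the criterion.
otu_pruned <- otu[keep, , drop = FALSE]
print(keep)
print(rownames(otu_pruned))
```

Any function that takes one taxon's vector of abundances and returns a single TRUE or FALSE can play the role of present_in_half here.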
The step_filter_by_abundance function filters OTUs based on their abundance. A taxon is retained when the sum of its abundance across samples is greater than the total abundance of the dataset multiplied by the provided threshold.
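The abundance criterion can be written out as plain arithmetic. The sketch below uses base R on a made-up count matrix; the names otu and threshold and the value 0.05 are assumptions for illustration, and the code is not taken from dar.

```r
# Toy OTU table: rows are taxa, columns are samples (values are made up).
otu <- matrix(
  c(10, 0, 3,  0,
     0, 0, 1,  0,
    25, 7, 9, 12),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu_a", "otu_b", "otu_c"), paste0("s", 1:4))
)

threshold <- 0.05                  # illustrative value
cutoff <- threshold * sum(otu)     # total abundance times the threshold

# Keep taxa whose summed abundance is strictly greater than the cutoff.
taxon_totals <- rowSums(otu)
keep <- taxon_totals > cutoff
print(taxon_totals)
print(names(which(keep)))
```

With these counts the total abundance is 67, so the cutoff is 3.35 and only otu_b (total 1) falls below it.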
The step_filter_by_prevalence function filters OTUs based on their prevalence. A taxon is retained when its prevalence is greater than the provided threshold.
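A minimal base R sketch of a prevalence filter follows. It assumes that prevalence means the fraction of samples in which a taxon has a nonzero count; the text above does not define the term, so that definition, the object names, and the threshold value are all assumptions for the example.

```r
# Toy OTU table: rows are taxa, columns are samples (values are made up).
otu <- matrix(
  c(10, 0, 3,  0,
     0, 0, 1,  0,
    25, 7, 9, 12),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu_a", "otu_b", "otu_c"), paste0("s", 1:4))
)

threshold <- 0.5                   # illustrative value

# Assumption: prevalence = fraction of samples with a nonzero count.
prevalence <- rowMeans(otu > 0)

# Strict inequality, matching "greater than the provided threshold".
keep <- prevalence > threshold
print(prevalence)
print(names(which(keep)))
```

Because the comparison is strict, otu_a, which is present in exactly half of the samples, is dropped at a threshold of 0.5.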
The step_filter_by_rarity function filters OTUs based on their rarity. A taxon is retained when the sum of its rarity is less than the provided threshold.
The step_filter_by_variance function filters OTUs based on their variance. A taxon is retained when the variance of its abundance across samples is greater than the provided threshold.
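The variance criterion can be sketched in base R as below. The matrix, the names, and the threshold of 1 are made up, and the sketch uses the sample variance from stats::var on raw counts; whether dar computes variance on counts or on transformed values is not stated above.

```r
# Toy OTU table: rows are taxa, columns are samples (values are made up).
otu <- matrix(
  c(10, 0, 3,  0,
     0, 0, 1,  0,
    25, 7, 9, 12),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu_a", "otu_b", "otu_c"), paste0("s", 1:4))
)

threshold <- 1                     # illustrative value, on the scale of the counts

# Sample variance of each taxon's abundance across samples.
taxon_var <- apply(otu, 1, var)

# Keep taxa whose variance is strictly greater than the threshold.
keep <- taxon_var > threshold
print(taxon_var)
print(names(which(keep)))
```

A useful threshold depends on the scale of the data: the same cutoff behaves very differently on raw counts and on relative abundances.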
The step_subset_taxa function subsets taxa based on their taxonomic
level. The taxa retained in the dataset are those whose value at the
chosen taxonomic level matches one of the provided taxa.
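Subsetting by taxonomic level is a membership test on one column of the taxonomy table. The base R sketch below uses a made-up taxonomy; the variable names tax_level and taxa are chosen for the example and are not confirmed here as dar argument names.

```r
# Toy taxonomy table: one row per taxon, one column per rank (made up).
tax <- data.frame(
  Kingdom = c("Bacteria", "Archaea", "Eukaryota"),
  Genus   = c("Bacteroides", "Methanobrevibacter", "Candida"),
  row.names = c("otu_a", "otu_b", "otu_c"),
  stringsAsFactors = FALSE
)

tax_level <- "Kingdom"               # rank to match on
taxa <- c("Bacteria", "Archaea")     # values to retain at that rank

# Keep rows whose value at the chosen rank is one of the requested taxa.
keep <- tax[[tax_level]] %in% taxa
tax_subset <- tax[keep, , drop = FALSE]
print(tax_subset)
```

The same logical vector can be used to subset the rows of the matching abundance table so that both stay aligned.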
These functions provide a flexible way to filter and subset OTUs in the
phyloseq objects contained within a recipe object, which makes complex
experimental data easier to work with. Used well, they streamline the
analysis workflow and keep the focus on the taxa most relevant to the
research question. The dar package adds the convenience of performing
these operations directly on the recipe object.
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.2 (2024-10-31)
#> os Ubuntu 24.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2025-02-10
#> pandoc 3.2.1 @ /usr/local/bin/ (via rmarkdown)
#> quarto 1.6.40 @ /usr/local/bin/quarto
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-8 2024-09-12 [2] RSPM (R 4.4.0)
#> ade4 1.7-22 2023-02-06 [2] RSPM (R 4.4.0)
#> ape 5.8-1 2024-12-16 [2] RSPM (R 4.4.0)
#> assertthat 0.2.1 2019-03-21 [2] RSPM (R 4.4.0)
#> beachmat 2.23.6 2025-01-15 [2] (R 4.4.2)
#> beeswarm 0.4.0 2021-06-01 [2] RSPM (R 4.4.0)
#> Biobase * 2.67.0 2025-01-29 [2] (R 4.4.2)
#> BiocBaseUtils 1.9.0 2025-01-17 [2] (R 4.4.2)
#> BiocGenerics * 0.53.6 2025-02-07 [2] (R 4.4.2)
#> BiocNeighbors 2.1.2 2025-02-08 [2] (R 4.4.2)
#> BiocParallel 1.41.0 2025-01-14 [2] (R 4.4.2)
#> BiocSingular 1.23.0 2024-12-19 [2] (R 4.4.2)
#> biomformat 1.35.0 2025-01-28 [2] (R 4.4.2)
#> Biostrings * 2.75.3 2025-01-15 [2] (R 4.4.2)
#> bluster 1.17.0 2025-02-08 [2] (R 4.4.2)
#> brio 1.1.5 2024-04-24 [2] RSPM (R 4.4.0)
#> bslib 0.9.0 2025-01-30 [2] RSPM (R 4.4.0)
#> buildtools 1.0.0 2025-01-11 [3] local (/pkg)
#> ca 0.71.1 2020-01-24 [2] RSPM (R 4.4.0)
#> cachem 1.1.0 2024-05-16 [2] RSPM (R 4.4.0)
#> cellranger 1.1.0 2016-07-27 [2] RSPM (R 4.4.0)
#> cli 3.6.3 2024-06-21 [2] RSPM (R 4.4.0)
#> cluster 2.1.8 2024-12-11 [2] RSPM (R 4.4.0)
#> coda 0.19-4.1 2024-01-31 [2] RSPM (R 4.4.0)
#> codetools 0.2-20 2024-03-31 [2] RSPM (R 4.4.0)
#> colorspace 2.1-1 2024-07-26 [2] RSPM (R 4.4.0)
#> crayon 1.5.3 2024-06-20 [2] RSPM (R 4.4.0)
#> crosstalk 1.2.1 2023-11-23 [2] RSPM (R 4.4.0)
#> dar * 1.3.0 2025-02-10 [1] (R 4.4.2)
#> data.table 1.16.4 2024-12-06 [2] RSPM (R 4.4.0)
#> DBI 1.2.3 2024-06-02 [2] RSPM (R 4.4.0)
#> DECIPHER 3.3.2 2025-01-19 [2] (R 4.4.2)
#> decontam 1.27.0 2025-01-16 [2] (R 4.4.2)
#> DelayedArray 0.33.5 2025-02-06 [2] (R 4.4.2)
#> DelayedMatrixStats 1.29.1 2025-02-08 [2] (R 4.4.2)
#> dendextend 1.19.0 2024-11-15 [2] RSPM (R 4.4.0)
#> devtools 2.4.5 2022-10-11 [2] RSPM (R 4.4.0)
#> digest 0.6.37 2024-08-19 [2] RSPM (R 4.4.0)
#> DirichletMultinomial 1.49.0 2024-12-19 [2] (R 4.4.2)
#> dplyr 1.1.4 2023-11-17 [2] RSPM (R 4.4.0)
#> ellipsis 0.3.2 2021-04-29 [2] RSPM (R 4.4.0)
#> emmeans 1.10.7 2025-01-31 [2] RSPM (R 4.4.0)
#> estimability 1.5.1 2024-05-12 [2] RSPM (R 4.4.0)
#> evaluate 1.0.3 2025-01-10 [2] RSPM (R 4.4.0)
#> farver 2.1.2 2024-05-13 [2] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [2] RSPM (R 4.4.0)
#> fillpattern 1.0.2 2024-06-24 [2] RSPM (R 4.4.0)
#> foreach 1.5.2 2022-02-02 [2] RSPM (R 4.4.0)
#> fs 1.6.5 2024-10-30 [2] RSPM (R 4.4.0)
#> furrr 0.3.1 2022-08-15 [2] RSPM (R 4.4.0)
#> future 1.34.0 2024-07-29 [2] RSPM (R 4.4.0)
#> generics * 0.1.3 2022-07-05 [2] RSPM (R 4.4.0)
#> GenomeInfoDb * 1.43.4 2025-01-25 [2] (R 4.4.2)
#> GenomeInfoDbData 1.2.13 2025-02-10 [2] Bioconductor
#> GenomicRanges * 1.59.1 2024-12-19 [2] (R 4.4.2)
#> ggbeeswarm 0.7.2 2023-04-29 [2] RSPM (R 4.4.0)
#> ggnewscale 0.5.0 2024-07-19 [2] RSPM (R 4.4.0)
#> ggplot2 3.5.1 2024-04-23 [2] RSPM (R 4.4.0)
#> ggrepel 0.9.6 2024-09-07 [2] RSPM (R 4.4.0)
#> ggtext 0.1.2 2022-09-16 [2] RSPM (R 4.4.0)
#> globals 0.16.3 2024-03-08 [2] RSPM (R 4.4.0)
#> glue 1.8.0 2024-09-30 [2] RSPM (R 4.4.0)
#> gridExtra 2.3 2017-09-09 [2] RSPM (R 4.4.0)
#> gridtext 0.1.5 2022-09-16 [2] RSPM (R 4.4.0)
#> gtable 0.3.6 2024-10-25 [2] RSPM (R 4.4.0)
#> heatmaply 1.5.0 2023-10-06 [2] RSPM (R 4.4.0)
#> hms 1.1.3 2023-03-21 [2] RSPM (R 4.4.0)
#> htmltools 2024-04-04 [2] RSPM (R 4.4.0)
#> htmlwidgets 1.6.4 2023-12-06 [2] RSPM (R 4.4.0)
#> httpuv 1.6.15 2024-03-26 [2] RSPM (R 4.4.0)
#> httr 1.4.7 2023-08-15 [2] RSPM (R 4.4.0)
#> igraph 2.1.4 2025-01-23 [2] RSPM (R 4.4.0)
#> IRanges * 2.41.2 2025-02-01 [2] (R 4.4.2)
#> irlba 2022-10-03 [2] RSPM (R 4.4.0)
#> iterators 1.0.14 2022-02-05 [2] RSPM (R 4.4.0)
#> jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.4.0)
#> jsonlite 1.8.9 2024-09-20 [2] RSPM (R 4.4.0)
#> knitr 1.49 2024-11-08 [2] RSPM (R 4.4.0)
#> labeling 0.4.3 2023-08-29 [2] RSPM (R 4.4.0)
#> later 1.4.1 2024-11-27 [2] RSPM (R 4.4.0)
#> lattice 0.22-6 2024-03-20 [2] RSPM (R 4.4.0)
#> lazyeval 0.2.2 2019-03-15 [2] RSPM (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.4.0)
#> listenv 0.9.1 2024-01-29 [2] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.4.0)
#> maketools 1.3.1 2024-10-04 [3] RSPM (R 4.4.0)
#> MASS 7.3-64 2025-01-04 [2] RSPM (R 4.4.0)
#> Matrix 1.7-2 2025-01-23 [2] RSPM (R 4.4.0)
#> MatrixGenerics * 1.19.1 2025-02-08 [2] (R 4.4.2)
#> matrixStats * 1.5.0 2025-01-07 [2] RSPM (R 4.4.0)
#> memoise 2.0.1 2021-11-26 [2] RSPM (R 4.4.0)
#> mgcv 1.9-1 2023-12-21 [2] RSPM (R 4.4.0)
#> mia * 1.15.20 2025-01-29 [2] (R 4.4.2)
#> microbiome 1.29.0 2024-12-19 [2] (R 4.4.2)
#> mime 0.12 2021-09-28 [2] RSPM (R 4.4.0)
#> miniUI 2018-05-18 [2] RSPM (R 4.4.0)
#> multcomp 1.4-28 2025-01-29 [2] RSPM (R 4.4.0)
#> MultiAssayExperiment * 1.33.9 2025-01-29 [2] (R 4.4.2)
#> multtest 2.63.0 2025-01-28 [2] (R 4.4.2)
#> munsell 0.5.1 2024-04-01 [2] RSPM (R 4.4.0)
#> mvtnorm 1.3-3 2025-01-10 [2] RSPM (R 4.4.0)
#> nlme 3.1-167 2025-01-27 [2] RSPM (R 4.4.0)
#> parallelly 1.42.0 2025-01-30 [2] RSPM (R 4.4.0)
#> patchwork 1.3.0 2024-09-16 [2] RSPM (R 4.4.0)
#> permute 0.9-7 2022-01-27 [2] RSPM (R 4.4.0)
#> phyloseq * 1.51.0 2025-01-30 [2] (R 4.4.2)
#> pillar 1.10.1 2025-01-07 [2] RSPM (R 4.4.0)
#> pkgbuild 1.4.6 2025-01-16 [2] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.4.0)
#> pkgload 1.4.0 2024-06-28 [2] RSPM (R 4.4.0)
#> plotly 4.10.4 2024-01-13 [2] RSPM (R 4.4.0)
#> plyr 1.8.9 2023-10-02 [2] RSPM (R 4.4.0)
#> profvis 0.4.0 2024-09-20 [2] RSPM (R 4.4.0)
#> promises 1.3.2 2024-11-28 [2] RSPM (R 4.4.0)
#> purrr 1.0.4 2025-02-05 [2] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [2] RSPM (R 4.4.0)
#> ragg 1.3.3 2024-09-11 [2] RSPM (R 4.4.0)
#> rbiom 2.0.13 2025-01-24 [2] RSPM (R 4.4.0)
#> RColorBrewer 1.1-3 2022-04-03 [2] RSPM (R 4.4.0)
#> Rcpp 1.0.14 2025-01-12 [2] RSPM (R 4.4.0)
#> readr 2.1.5 2024-01-10 [2] RSPM (R 4.4.0)
#> readxl 1.4.3 2023-07-06 [2] RSPM (R 4.4.0)
#> registry 0.5-1 2019-03-05 [2] RSPM (R 4.4.0)
#> remotes 2.5.0 2024-03-17 [2] RSPM (R 4.4.0)
#> reshape2 1.4.4 2020-04-09 [2] RSPM (R 4.4.0)
#> rhdf5 2.51.2 2025-02-08 [2] (R 4.4.2)
#> rhdf5filters 1.19.0 2025-01-17 [2] (R 4.4.2)
#> Rhdf5lib 1.29.0 2025-01-29 [2] (R 4.4.2)
#> rlang 1.1.5 2025-01-17 [2] RSPM (R 4.4.0)
#> rmarkdown * 2.29 2024-11-04 [2] RSPM (R 4.4.0)
#> rsvd 1.0.5 2021-04-16 [2] RSPM (R 4.4.0)
#> Rtsne 0.17 2023-12-07 [2] RSPM (R 4.4.0)
#> S4Arrays 1.7.3 2025-02-08 [2] (R 4.4.2)
#> S4Vectors * 0.45.2 2025-01-15 [2] (R 4.4.2)
#> sandwich 3.1-1 2024-09-15 [2] RSPM (R 4.4.0)
#> sass 0.4.9 2024-03-15 [2] RSPM (R 4.4.0)
#> ScaledMatrix 1.15.0 2025-01-18 [2] (R 4.4.2)
#> scales 1.3.0 2023-11-28 [2] RSPM (R 4.4.0)
#> scater 1.35.1 2025-02-01 [2] (R 4.4.2)
#> scuttle 1.17.0 2025-01-29 [2] (R 4.4.2)
#> seriation 1.5.7 2024-12-05 [2] RSPM (R 4.4.0)
#> sessioninfo 1.2.3 2025-02-05 [2] RSPM (R 4.4.0)
#> shiny 1.10.0 2024-12-14 [2] RSPM (R 4.4.0)
#> SingleCellExperiment * 1.29.1 2025-02-07 [2] (R 4.4.2)
#> slam 0.1-55 2024-11-13 [2] RSPM (R 4.4.0)
#> SparseArray 1.7.5 2025-02-05 [2] (R 4.4.2)
#> sparseMatrixStats 1.19.0 2024-12-19 [2] (R 4.4.2)
#> stringi 1.8.4 2024-05-06 [2] RSPM (R 4.4.0)
#> stringr 1.5.1 2023-11-14 [2] RSPM (R 4.4.0)
#> SummarizedExperiment * 1.37.0 2025-01-20 [2] (R 4.4.2)
#> survival 3.8-3 2024-12-17 [2] RSPM (R 4.4.0)
#> sys 3.4.3 2024-10-04 [2] RSPM (R 4.4.0)
#> systemfonts 1.2.1 2025-01-20 [2] RSPM (R 4.4.0)
#> testthat 3.2.3 2025-01-13 [2] RSPM (R 4.4.0)
#> textshaping 1.0.0 2025-01-20 [2] RSPM (R 4.4.0)
#> 1.1-3 2025-01-17 [2] RSPM (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [2] RSPM (R 4.4.0)
#> tidyr 1.3.1 2024-01-24 [2] RSPM (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [2] RSPM (R 4.4.0)
#> tidytree 0.4.6 2023-12-12 [2] RSPM (R 4.4.0)
#> treeio 1.31.0 2024-12-19 [2] (R 4.4.2)
#> TreeSummarizedExperiment * 2.15.0 2025-01-25 [2] (R 4.4.2)
#> TSP 1.2-4 2023-04-04 [2] RSPM (R 4.4.0)
#> tzdb 0.4.0 2023-05-12 [2] RSPM (R 4.4.0)
#> UCSC.utils 1.3.1 2025-01-16 [2] (R 4.4.2)
#> UpSetR 1.4.0 2019-05-22 [2] RSPM (R 4.4.0)
#> urlchecker 1.0.1 2021-11-30 [2] RSPM (R 4.4.0)
#> usethis 3.1.0 2024-11-26 [2] RSPM (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [2] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [2] RSPM (R 4.4.0)
#> vegan 2.6-10 2025-01-29 [2] RSPM (R 4.4.0)
#> vipor 0.4.7 2023-12-18 [2] RSPM (R 4.4.0)
#> viridis 0.6.5 2024-01-29 [2] RSPM (R 4.4.0)
#> viridisLite 0.4.2 2023-05-02 [2] RSPM (R 4.4.0)
#> webshot 0.5.5 2023-06-26 [2] RSPM (R 4.4.0)
#> withr 3.0.2 2024-10-28 [2] RSPM (R 4.4.0)
#> xfun 0.50 2025-01-07 [2] RSPM (R 4.4.0)
#> xml2 1.3.6 2023-12-04 [2] RSPM (R 4.4.0)
#> xtable 1.8-4 2019-04-21 [2] RSPM (R 4.4.0)
#> XVector * 0.47.2 2025-02-07 [2] (R 4.4.2)
#> yaml 2.3.10 2024-07-26 [2] RSPM (R 4.4.0)
#> yulab.utils 0.2.0 2025-01-29 [2] RSPM (R 4.4.0)
#> zoo 1.8-12 2023-04-13 [2] RSPM (R 4.4.0)
#> [1] /tmp/RtmpogDe5Q/Rinst3cac26054101
#> [2] /github/workspace/pkglib
#> [3] /usr/local/lib/R/site-library
#> [4] /usr/lib/R/site-library
#> [5] /usr/lib/R/library
#> * ── Packages attached to the search path.
#> ──────────────────────────────────────────────────────────────────────────────