This vignette is intended for users who are already familiar with the basic condiments workflow described in the first vignette. Here, we show how to modify the default parameters for the first two steps of the workflow.
We rely on the same toy dataset as in the first vignette.
By default, the topologyTest function requires only two inputs: the sds object and the condition labels. To limit run time for the vignette, we also change the number of permutations used to generate trajectories under the null by setting the rep argument to 10 instead of the default 100. As a result, the test statistics might be more variable.
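The call that produces the output below is not shown here. A minimal sketch, assuming the toy dataset is loaded and the trajectory is fitted as in the first vignette (the Dim1, Dim2 and cl column names and the seed are assumptions, not taken from this page), could look as follows:

```r
library(condiments)
library(slingshot)

# Assumption: the toy dataset ships with condiments and stores the reduced
# dimensions (Dim1, Dim2), cluster labels (cl) and condition labels in $sd.
data("toy_dataset", package = "condiments")
df <- toy_dataset$sd
rd <- as.matrix(df[, c("Dim1", "Dim2")])
sds <- slingshot(rd, df$cl)

# Arbitrary seed so that the permutations are reproducible.
set.seed(21)

# Only sds and the condition labels are required; rep is lowered from the
# default of 100 to 10 to keep the run time short.
top_res <- topologyTest(sds = sds, conditions = df$conditions, rep = 10)
knitr::kable(top_res)
```

Because only 10 permutations are used, rerunning with a different seed may change the statistic noticeably.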
## Generating permuted trajectories
## Running KS-mean test
method | thresh | statistic | p.value |
---|---|---|---|
KS_mean | 0.01 | 0 | 1 |
The topologyTest function can be relatively slow on large datasets. Moreover, whichever method is used to test the null hypothesis that a common trajectory should be fitted, the first step, generating rep trajectories under the null by permutation, is identical. Therefore, we allow users to specify more than one method and more than one threshold value in a single call. Here, we use both the Kolmogorov-Smirnov test (Smirnov 1939) and the classifier test (Lopez-Paz and Oquab 2016).
top_res <- topologyTest(sds = sds, conditions = df$conditions, rep = 10,
                        methods = c("KS_mean", "Classifier"),
                        threshs = c(0, .01, .05, .1))
## Generating permuted trajectories
## Running KS-mean test
## Running Classifier test
method | thresh | statistic | p.value |
---|---|---|---|
KS_mean | 0 | 0.0070000 | 1.0000000 |
KS_mean | 0.01 | 0.0000000 | 1.0000000 |
KS_mean | 0.05 | 0.0000000 | 1.0000000 |
KS_mean | 0.1 | 0.0000000 | 1.0000000 |
Classifier | 0 | 0.4150000 | 0.9999821 |
Classifier | 0.01 | 0.3800000 | 1.0000000 |
Classifier | 0.05 | 0.3333333 | 1.0000000 |
Classifier | 0.1 | 0.2833333 | 1.0000000 |
To see all available methods, run ?topologyTest and look at the methods argument.
For all methods except the KS test, additional parameters can be specified using a dedicated argument: args_classifier, args_wass or args_mmd. See the help file of the corresponding test for more information on those parameters. For example, since the default test based on the Wasserstein distance and a permutation test is quite slow, we can pass a fast argument.
top_res <- topologyTest(sds = sds, conditions = df$conditions, rep = 10,
                        methods = "wasserstein_permutation",
                        args_wass = list(fast = TRUE, S = 100, iterations = 10^2))
## Generating permuted trajectories
## Running wassertsein permutation test
method | thresh | statistic | p.value |
---|---|---|---|
wasserstein_permutation | NA | 1.356861 | 0.85 |
For now, the first part of topologyTest (generating the permuted trajectories) has been designed for parallelisation using the BiocParallel package. For example, it can be run on 4 cores by supplying a BiocParallel backend with 4 workers.
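A minimal sketch of a 4-core run, assuming that topologyTest accepts parallel and BPPARAM arguments (check ?topologyTest for the installed version) and that the toy dataset is laid out as in the first vignette:

```r
library(BiocParallel)
library(condiments)
library(slingshot)

# Same setup as in the first vignette (column names are assumptions).
data("toy_dataset", package = "condiments")
df <- toy_dataset$sd
sds <- slingshot(as.matrix(df[, c("Dim1", "Dim2")]), df$cl)

# Request 4 workers. MulticoreParam relies on forking, which is not
# available on Windows, so fall back to SnowParam there.
BPPARAM <- if (.Platform$OS.type == "windows") {
  SnowParam(workers = 4)
} else {
  MulticoreParam(workers = 4)
}

# Assumption: parallel = TRUE together with BPPARAM distributes the
# generation of the permuted trajectories across the workers.
top_res <- topologyTest(sds = sds, conditions = df$conditions, rep = 100,
                        parallel = TRUE, BPPARAM = BPPARAM)
knitr::kable(top_res)
```

The number of workers should not exceed the number of cores available on the machine; the speed-up applies to the permutation step only.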
The tests in the second step are much less compute-intensive, so no parallelisation is offered. However, the other changes introduced in the previous section are still possible.
lineage | statistic | p.value |
---|---|---|
All | 5.504172 | 0 |
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
##
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
pair | statistic | p.value |
---|---|---|
1vs2 | 0.6386486 | 1.9e-05 |
prog_res <- progressionTest(sds, conditions = df$conditions, method = "Classifier")
knitr::kable(prog_res)
lineage | statistic | p.value |
---|---|---|
All | 0.5890991 | 0.0043539 |
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
##
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
pair | statistic | p.value |
---|---|---|
1vs2 | 0.6121622 | 0.0004817 |
prog_res <- progressionTest(sds, conditions = df$conditions, method = "Classifier",
                            args_classifier = list(method = "rf"))
## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
lineage | statistic | p.value |
---|---|---|
All | 0.517027 | 0.3192952 |
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
##
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
pair | statistic | p.value |
---|---|---|
1vs2 | 0.6611712 | 8e-07 |
For all of the above procedures, it is important to note that we are making multiple comparisons. The p-values we obtain from these tests should be corrected for multiple testing, especially for trajectories with a large number of lineages.
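As an illustration of such a correction, the sketch below applies base R's p.adjust to a small vector of p-values. The values are hypothetical placeholders, one per lineage, and are not output from this vignette.

```r
# Hypothetical raw p-values, one per lineage (placeholders for illustration).
p_raw <- c(lineage1 = 0.01, lineage2 = 0.04, lineage3 = 0.03)

# Holm controls the family-wise error rate;
# Benjamini-Hochberg (BH) controls the false discovery rate.
p_holm <- p.adjust(p_raw, method = "holm")
p_bh <- p.adjust(p_raw, method = "BH")

# Show raw and adjusted values side by side.
data.frame(p_raw, p_holm, p_bh)
```

Which correction is appropriate depends on the analysis; Holm is the more conservative of the two.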
That said, trajectory inference is often one of the last computational methods in a very long analysis pipeline (generally including gene-level quantification, gene filtering / feature selection, and dimensionality reduction). Hence, we strongly discourage the reader from putting too much faith in any p-value that comes out of this analysis. Such values may be useful suggestions, indicating particular features or cells for follow-up study, but should generally not be treated as meaningful statistical quantities.
If some commands or parameters are still unclear after going through this vignette, do not hesitate to open an issue on the condiments GitHub repository.
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] caret_6.0-94 lattice_0.22-6
## [3] viridis_0.6.5 viridisLite_0.4.2
## [5] RColorBrewer_1.1-3 ggplot2_3.5.1
## [7] tidyr_1.3.1 dplyr_1.1.4
## [9] slingshot_2.13.0 TrajectoryUtils_1.13.0
## [11] SingleCellExperiment_1.27.2 SummarizedExperiment_1.35.5
## [13] Biobase_2.67.0 GenomicRanges_1.57.2
## [15] GenomeInfoDb_1.41.2 IRanges_2.39.2
## [17] S4Vectors_0.43.2 BiocGenerics_0.53.0
## [19] MatrixGenerics_1.17.1 matrixStats_1.4.1
## [21] princurve_2.1.6 condiments_1.15.0
## [23] knitr_1.48 rmarkdown_2.28
##
## loaded via a namespace (and not attached):
## [1] sys_3.4.3 jsonlite_1.8.9
## [3] magrittr_2.0.3 spatstat.utils_3.1-0
## [5] ggbeeswarm_0.7.2 farver_2.1.2
## [7] zlibbioc_1.51.2 vctrs_0.6.5
## [9] DelayedMatrixStats_1.27.3 htmltools_0.5.8.1
## [11] S4Arrays_1.5.11 BiocNeighbors_2.1.0
## [13] SparseArray_1.5.45 pROC_1.18.5
## [15] sass_0.4.9 parallelly_1.38.0
## [17] bslib_0.8.0 plyr_1.8.9
## [19] lubridate_1.9.3 cachem_1.1.0
## [21] buildtools_1.0.0 igraph_2.1.1
## [23] lifecycle_1.0.4 iterators_1.0.14
## [25] pkgconfig_2.0.3 rsvd_1.0.5
## [27] Matrix_1.7-1 R6_2.5.1
## [29] fastmap_1.2.0 GenomeInfoDbData_1.2.13
## [31] future_1.34.0 digest_0.6.37
## [33] colorspace_2.1-1 scater_1.33.4
## [35] irlba_2.3.5.1 beachmat_2.23.0
## [37] labeling_0.4.3 randomForest_4.7-1.2
## [39] fansi_1.0.6 timechange_0.3.0
## [41] mgcv_1.9-1 httr_1.4.7
## [43] abind_1.4-8 compiler_4.4.1
## [45] rngtools_1.5.2 proxy_0.4-27
## [47] withr_3.0.2 doParallel_1.0.17
## [49] BiocParallel_1.39.0 highr_0.11
## [51] MASS_7.3-61 lava_1.8.0
## [53] DelayedArray_0.31.14 ModelMetrics_1.2.2.2
## [55] tools_4.4.1 vipor_0.4.7
## [57] beeswarm_0.4.0 future.apply_1.11.3
## [59] nnet_7.3-19 glue_1.8.0
## [61] nlme_3.1-166 grid_4.4.1
## [63] reshape2_1.4.4 generics_0.1.3
## [65] recipes_1.1.0 gtable_0.3.6
## [67] class_7.3-22 data.table_1.16.2
## [69] BiocSingular_1.23.0 ScaledMatrix_1.13.0
## [71] utf8_1.2.4 XVector_0.45.0
## [73] ggrepel_0.9.6 RANN_2.6.2
## [75] foreach_1.5.2 pillar_1.9.0
## [77] stringr_1.5.1 limma_3.61.12
## [79] Ecume_0.9.2 splines_4.4.1
## [81] survival_3.7-0 tidyselect_1.2.1
## [83] maketools_1.3.1 scuttle_1.15.5
## [85] pbapply_1.7-2 transport_0.15-4
## [87] gridExtra_2.3 xfun_0.48
## [89] statmod_1.5.0 hardhat_1.4.0
## [91] distinct_1.17.0 timeDate_4041.110
## [93] stringi_1.8.4 UCSC.utils_1.1.0
## [95] yaml_2.3.10 evaluate_1.0.1
## [97] codetools_0.2-20 kernlab_0.9-33
## [99] tibble_3.2.1 cli_3.6.3
## [101] rpart_4.1.23 munsell_0.5.1
## [103] jquerylib_0.1.4 Rcpp_1.0.13
## [105] globals_0.16.3 spatstat.univar_3.0-1
## [107] parallel_4.4.1 gower_1.0.1
## [109] doRNG_1.8.6 sparseMatrixStats_1.17.2
## [111] listenv_0.9.1 ipred_0.9-15
## [113] scales_1.3.0 prodlim_2024.06.25
## [115] e1071_1.7-16 purrr_1.0.2
## [117] crayon_1.5.3 rlang_1.1.4