This package implements alternative inference methods for BASiCS. The original package uses adaptive Metropolis within Gibbs sampling, while BASiCStan uses Stan to enable the use of maximum a posteriori estimation, variational inference, and Hamiltonian Monte Carlo. These each have advantages for different use cases.
To use BASiCStan
, we can first use
BASICS_MockSCE()
to generate an toy example dataset. We
will also set a seed for reproducibility.
The interface for running MCMC using BASiCS and using alternative
inference methods using Stan is very similar. It is worth noting that
the joint prior between mean and over-dispersion parameters,
corresponding to Regression = TRUE
, is the only supported
mode in BASiCStan()
.
amwg_fit <- BASiCS_MCMC(
sce,
N = 200,
Thin = 10,
Burn = 100,
Regression = TRUE
)
#> altExp 'spike-ins' is assumed to contain spike-in genes.
#> see help(altExp) for details.
#> Running with spikes BASiCS sampler (regression case) ...
#> -------------------------------------------------------------
#> MCMC running time
#> -------------------------------------------------------------
#> user: 0.195
#> system: 0
#> elapsed: 0.195
#> -------------------------------------------------------------
#> Output
#> -------------------------------------------------------------
#> -------------------------------------------------------------
#> BASiCS version 2.17.0 :
#> vertical integration (spikes case)
#> -------------------------------------------------------------
stan_fit <- BASiCStan(sce, Method = "sampling", iter = 10)
#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
The output of BASiCStan()
is an object of class
BASiCS_Chain
, similar to the output of
BASiCS_MCMC()
. Therefore, you could use these as you would
an object generated using Metropolis within Gibbs sampling. For example,
we can plot the relationship between mean and over-dispersion estimated
using the joint regression prior:
Using Stan has advantages outside of the variety of inference engines
available. By returning a Stan object that we can later convert to a
BASiCS_Chain
object, we can leverage an even broader set of
functionality. For example, Stan has the ability to easily generate MCMC
diagnostics using simple functions. For example, summary()
outputs a number of useful per-chain and across-chain diagnostics:
stan_obj <- BASiCStan(sce, ReturnBASiCS = FALSE, Method = "sampling", iter = 10)
#>
#> SAMPLING FOR MODEL 'basics_regression' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.001893 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 18.93 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: WARNING: No variance estimation is
#> Chain 1: performed for num_warmup < 20
#> Chain 1:
#> Chain 1: Iteration: 1 / 10 [ 10%] (Warmup)
#> Chain 1: Iteration: 2 / 10 [ 20%] (Warmup)
#> Chain 1: Iteration: 3 / 10 [ 30%] (Warmup)
#> Chain 1: Iteration: 4 / 10 [ 40%] (Warmup)
#> Chain 1: Iteration: 5 / 10 [ 50%] (Warmup)
#> Chain 1: Iteration: 6 / 10 [ 60%] (Sampling)
#> Chain 1: Iteration: 7 / 10 [ 70%] (Sampling)
#> Chain 1: Iteration: 8 / 10 [ 80%] (Sampling)
#> Chain 1: Iteration: 9 / 10 [ 90%] (Sampling)
#> Chain 1: Iteration: 10 / 10 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.033 seconds (Warm-up)
#> Chain 1: 0.018 seconds (Sampling)
#> Chain 1: 0.051 seconds (Total)
#> Chain 1:
#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
head(summary(stan_obj)$summary)
#> mean se_mean sd 2.5% 25% 50% 75% 97.5%
#> log_mu[1] 3.098420 NaN 0 3.098420 3.098420 3.098420 3.098420 3.098420
#> log_mu[2] 1.058488 NaN 0 1.058488 1.058488 1.058488 1.058488 1.058488
#> log_mu[3] 2.107672 NaN 0 2.107672 2.107672 2.107672 2.107672 2.107672
#> log_mu[4] 2.245089 NaN 0 2.245089 2.245089 2.245089 2.245089 2.245089
#> log_mu[5] 2.002693 NaN 0 2.002693 2.002693 2.002693 2.002693 2.002693
#> log_mu[6] 1.457520 NaN 0 1.457520 1.457520 1.457520 1.457520 1.457520
#> n_eff Rhat
#> log_mu[1] NaN NaN
#> log_mu[2] NaN NaN
#> log_mu[3] NaN NaN
#> log_mu[4] NaN NaN
#> log_mu[5] NaN NaN
#> log_mu[6] NaN NaN
We can then convert this object to a BASiCS_Chain
and
carry on a workflow as before:
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BASiCStan_1.9.0 rstan_2.32.6
#> [3] StanHeaders_2.32.10 BASiCS_2.17.0
#> [5] SingleCellExperiment_1.27.2 SummarizedExperiment_1.35.5
#> [7] Biobase_2.65.1 GenomicRanges_1.57.2
#> [9] GenomeInfoDb_1.41.2 IRanges_2.39.2
#> [11] S4Vectors_0.43.2 BiocGenerics_0.51.3
#> [13] MatrixGenerics_1.17.1 matrixStats_1.4.1
#> [15] rmarkdown_2.28
#>
#> loaded via a namespace (and not attached):
#> [1] gridExtra_2.3 inline_0.3.19
#> [3] rlang_1.1.4 magrittr_2.0.3
#> [5] compiler_4.4.1 DelayedMatrixStats_1.27.3
#> [7] loo_2.8.0 vctrs_0.6.5
#> [9] pkgconfig_2.0.3 crayon_1.5.3
#> [11] fastmap_1.2.0 backports_1.5.0
#> [13] XVector_0.45.0 labeling_0.4.3
#> [15] scuttle_1.15.5 utf8_1.2.4
#> [17] promises_1.3.0 UCSC.utils_1.1.0
#> [19] xfun_0.48 bluster_1.15.1
#> [21] zlibbioc_1.51.2 cachem_1.1.0
#> [23] beachmat_2.21.9 jsonlite_1.8.9
#> [25] highr_0.11 later_1.3.2
#> [27] DelayedArray_0.31.14 BiocParallel_1.39.0
#> [29] irlba_2.3.5.1 parallel_4.4.1
#> [31] cluster_2.1.6 R6_2.5.1
#> [33] bslib_0.8.0 limma_3.61.12
#> [35] jquerylib_0.1.4 Rcpp_1.0.13
#> [37] assertthat_0.2.1 knitr_1.48
#> [39] splines_4.4.1 httpuv_1.6.15
#> [41] Matrix_1.7-1 igraph_2.1.1
#> [43] abind_1.4-8 yaml_2.3.10
#> [45] viridis_0.6.5 codetools_0.2-20
#> [47] miniUI_0.1.1.1 curl_5.2.3
#> [49] pkgbuild_1.4.5 lattice_0.22-6
#> [51] tibble_3.2.1 withr_3.0.2
#> [53] shiny_1.9.1 posterior_1.6.0
#> [55] coda_0.19-4.1 evaluate_1.0.1
#> [57] RcppParallel_5.1.9 pillar_1.9.0
#> [59] tensorA_0.36.2.1 checkmate_2.3.2
#> [61] distributional_0.5.0 generics_0.1.3
#> [63] ggplot2_3.5.1 sparseMatrixStats_1.17.2
#> [65] rstantools_2.4.0 munsell_0.5.1
#> [67] scales_1.3.0 xtable_1.8-4
#> [69] glue_1.8.0 metapod_1.13.0
#> [71] maketools_1.3.1 tools_4.4.1
#> [73] hexbin_1.28.4 BiocNeighbors_1.99.3
#> [75] sys_3.4.3 ScaledMatrix_1.13.0
#> [77] locfit_1.5-9.10 buildtools_1.0.0
#> [79] scran_1.33.2 cowplot_1.1.3
#> [81] grid_4.4.1 QuickJSR_1.4.0
#> [83] edgeR_4.3.21 colorspace_2.1-1
#> [85] GenomeInfoDbData_1.2.13 BiocSingular_1.21.4
#> [87] cli_3.6.3 rsvd_1.0.5
#> [89] fansi_1.0.6 S4Arrays_1.5.11
#> [91] viridisLite_0.4.2 V8_6.0.0
#> [93] glmGamPoi_1.17.4 gtable_0.3.6
#> [95] sass_0.4.9 digest_0.6.37
#> [97] SparseArray_1.5.45 dqrng_0.4.1
#> [99] farver_2.1.2 htmltools_0.5.8.1
#> [101] lifecycle_1.0.4 httr_1.4.7
#> [103] statmod_1.5.0 mime_0.12
#> [105] ggExtra_0.10.1 MASS_7.3-61