dreamlet()
evaluates precision-weighted linear (mixed)
models on each gene that passes standard filters. The linear mixed model
used by dream()
can be a little fragile for small sample
sizes and correlated covariates. dreamlet()
runs
variancePartition::dream()
in the backend for each cell
cluster. dream()
reports model failures for each cell
cluster and dreamlet()
reports these failures to the user.
dreamlet()
returns all successful model
fits to be used for downstream analysis.
See details from variancePartition
error
page.
Due to a recent bug in the dependency
Matrix
package, all random effects models may fail for
technical reasons. If your random effects analysis is failing for all
genes in cases with no good explanation, this bug may be responsible.
This case can be detected and resolved as follows:
library(lme4)
# Fit simple mixed model
lmer(Reaction ~ (1 | Subject), sleepstudy)
# Error in initializePtr() :
# function 'chm_factor_ldetL2' not provided by package 'Matrix'
This error indicates incompatible installs of Matrix
and
lme4
. This can be solved with
followed by restarting R.
The most common issue is when dreamlet()
analysis
succeeds for most genes, but a handful of genes fail in each cell
cluster. These genes can fail if the iterative process of fitting the
linear mixed model does not converge, or if the estimated covariance
matrix that is supposed be positive definite has an eigen-value that is
negative or too close to zero due to rounding errors in floating point
arithmetic.
In these cases, dreamlet()
stores a summary of these
failures for all cell clusters that is accessible with
details()
. Specific failure messages for each cell cluster
and gene can be extracted using seeErrors()
Here we demonstrate how dreamlet()
handles model
failures:
library(dreamlet)
library(muscat)
library(SingleCellExperiment)
data(example_sce)
# create pseudobulk for each sample and cell cluster
pb <- aggregateToPseudoBulk(example_sce,
assay = "counts",
cluster_id = "cluster_id",
sample_id = "sample_id",
verbose = FALSE
)
# voom-style normalization for each cell cluster
res.proc <- processAssays(
pb[1:300, ],
~group_id
)
# Redundant formula
# This example is an extreme example of redundancy
# but more subtle cases often show up in real data
form <- ~ group_id + (1 | group_id)
# fit dreamlet model
res.dl <- dreamlet(res.proc, form)
## B cells...7.9 secs
## CD14+ Monocytes...10 secs
## CD4 T cells...9 secs
## CD8 T cells...4.4 secs
## FCGR3A+ Monocytes...11 secs
##
## Of 1,062 models fit across all assays, 96.2% failed
# summary of models
res.dl
## class: dreamletResult
## assays(5): B cells CD14+ Monocytes CD4 T cells CD8 T cells FCGR3A+ Monocytes
## Genes:
## min: 3
## max: 11
## details(7): assay n_retain ... n_errors error_initial
## coefNames(2): (Intercept) group_idstim
##
## Of 1,062 models fit across all assays, 96.2% failed
# summary of models for each cell cluster
details(res.dl)
## assay n_retain formula formDropsTerms n_genes n_errors error_initial
## 1 B cells 4 ~group_id + (1 | group_id) FALSE 201 190 FALSE
## 2 CD14+ Monocytes 4 ~group_id + (1 | group_id) FALSE 269 263 FALSE
## 3 CD4 T cells 4 ~group_id + (1 | group_id) FALSE 216 207 FALSE
## 4 CD8 T cells 4 ~group_id + (1 | group_id) FALSE 118 115 FALSE
## 5 FCGR3A+ Monocytes 4 ~group_id + (1 | group_id) FALSE 258 247 FALSE
assay
: cell typen_retain
: number of samples retainedformula
: regression formula used after variable
filteringformDropsTerms
: whether a variable was dropped from the
formula for having zero variance following filteringn_genes
: number of genes analyzedn_errors
: number of genes with errorserror_initial
: indicator for assay-level errorBefore the full dataset is analyzed, dreamlet()
runs a
test for each assay to see if the model succeeds. If the model fails,
its does not continue analysis for that assay. These assay-level errors
are reported above in the error_initial
column, and details
are returned here.
# Extract errors as a tibble
res.err = seeErrors(res.dl)
## Assay-level errors: 0
## Gene-level errors: 1038
# No errors at the assay level
res.err$assayLevel
# the most common error is:
"Some predictor variables are on very different scales: consider rescaling"
This indicates that the scale of the predictor variables are very
different and can affect the numerical stability of the iterative
algorithm. This can be solved by running scale()
on each
variable in the formula:
A model can fail for a single gene if covariates are too correlated, or for other numerical issues. Failed models are reported here and are not included in downstream analysis.
# See gene-level errors for each assay
res.err$geneLevel[1:2,]
## # A tibble: 2 × 3
## assay feature errorText
## <chr> <chr> <chr>
## B cells ISG15 "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol):…
## B cells AURKAIP1 "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol):…
# See full error message text
res.err$geneLevel$errorText[1]
"Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol): (converted from warning)
Model may not have converged with 1 eigenvalue close to zero: 1.4e-09\n"
This message indicates that the model was numerically unstable likely because the variables are closely correlated.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] GO.db_3.20.0 org.Hs.eg.db_3.20.0
## [3] AnnotationDbi_1.69.0 zenith_1.9.0
## [5] muscData_1.20.0 scater_1.35.0
## [7] scuttle_1.17.0 ExperimentHub_2.15.0
## [9] AnnotationHub_3.15.0 BiocFileCache_2.15.0
## [11] dbplyr_2.5.0 muscat_1.21.0
## [13] dreamlet_1.5.0 SingleCellExperiment_1.29.1
## [15] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [17] GenomicRanges_1.59.1 GenomeInfoDb_1.43.2
## [19] IRanges_2.41.2 S4Vectors_0.45.2
## [21] BiocGenerics_0.53.3 generics_0.1.3
## [23] MatrixGenerics_1.19.0 matrixStats_1.4.1
## [25] variancePartition_1.37.1 BiocParallel_1.41.0
## [27] limma_3.63.2 ggplot2_3.5.1
## [29] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-9 httr_1.4.7
## [3] RColorBrewer_1.1-3 doParallel_1.0.17
## [5] Rgraphviz_2.51.0 numDeriv_2016.8-1.1
## [7] tools_4.4.2 sctransform_0.4.1
## [9] backports_1.5.0 utf8_1.2.4
## [11] R6_2.5.1 metafor_4.6-0
## [13] mgcv_1.9-1 GetoptLong_1.0.5
## [15] withr_3.0.2 prettyunits_1.2.0
## [17] gridExtra_2.3 cli_3.6.3
## [19] labeling_0.4.3 sass_0.4.9
## [21] KEGGgraph_1.67.0 SQUAREM_2021.1
## [23] mvtnorm_1.3-2 blme_1.0-6
## [25] mixsqp_0.3-54 parallelly_1.40.1
## [27] invgamma_1.1 RSQLite_2.3.9
## [29] shape_1.4.6.1 gtools_3.9.5
## [31] dplyr_1.1.4 Matrix_1.7-1
## [33] metadat_1.2-0 ggbeeswarm_0.7.2
## [35] abind_1.4-8 lifecycle_1.0.4
## [37] yaml_2.3.10 edgeR_4.5.1
## [39] mathjaxr_1.6-0 gplots_3.2.0
## [41] SparseArray_1.7.2 grid_4.4.2
## [43] blob_1.2.4 crayon_1.5.3
## [45] lattice_0.22-6 beachmat_2.23.5
## [47] msigdbr_7.5.1 annotate_1.85.0
## [49] KEGGREST_1.47.0 sys_3.4.3
## [51] maketools_1.3.1 pillar_1.10.0
## [53] knitr_1.49 ComplexHeatmap_2.23.0
## [55] rjson_0.2.23 boot_1.3-31
## [57] corpcor_1.6.10 future.apply_1.11.3
## [59] codetools_0.2-20 glue_1.8.0
## [61] data.table_1.16.4 vctrs_0.6.5
## [63] png_0.1-8 Rdpack_2.6.2
## [65] gtable_0.3.6 assertthat_0.2.1
## [67] cachem_1.1.0 xfun_0.49
## [69] mime_0.12 rbibutils_2.3
## [71] S4Arrays_1.7.1 Rfast_2.1.0
## [73] reformulas_0.4.0 iterators_1.0.14
## [75] statmod_1.5.0 nlme_3.1-166
## [77] pbkrtest_0.5.3 bit64_4.5.2
## [79] filelock_1.0.3 progress_1.2.3
## [81] EnvStats_3.0.0 bslib_0.8.0
## [83] TMB_1.9.15 irlba_2.3.5.1
## [85] vipor_0.4.7 KernSmooth_2.23-24
## [87] colorspace_2.1-1 rmeta_3.0
## [89] DBI_1.2.3 DESeq2_1.47.1
## [91] tidyselect_1.2.1 curl_6.0.1
## [93] bit_4.5.0.1 compiler_4.4.2
## [95] graph_1.85.0 BiocNeighbors_2.1.2
## [97] DelayedArray_0.33.3 scales_1.3.0
## [99] caTools_1.18.3 remaCor_0.0.18
## [101] rappdirs_0.3.3 stringr_1.5.1
## [103] digest_0.6.37 minqa_1.2.8
## [105] rmarkdown_2.29 aod_1.3.3
## [107] XVector_0.47.0 RhpcBLASctl_0.23-42
## [109] htmltools_0.5.8.1 pkgconfig_2.0.3
## [111] lme4_1.1-35.5 sparseMatrixStats_1.19.0
## [113] mashr_0.2.79 fastmap_1.2.0
## [115] rlang_1.1.4 GlobalOptions_0.1.2
## [117] UCSC.utils_1.3.0 DelayedMatrixStats_1.29.0
## [119] farver_2.1.2 jquerylib_0.1.4
## [121] jsonlite_1.8.9 BiocSingular_1.23.0
## [123] RCurl_1.98-1.16 magrittr_2.0.3
## [125] GenomeInfoDbData_1.2.13 munsell_0.5.1
## [127] Rcpp_1.0.13-1 babelgene_22.9
## [129] viridis_0.6.5 EnrichmentBrowser_2.37.0
## [131] RcppZiggurat_0.1.6 stringi_1.8.4
## [133] zlibbioc_1.52.0 MASS_7.3-61
## [135] plyr_1.8.9 parallel_4.4.2
## [137] listenv_0.9.1 ggrepel_0.9.6
## [139] Biostrings_2.75.3 splines_4.4.2
## [141] hms_1.1.3 circlize_0.4.16
## [143] locfit_1.5-9.10 buildtools_1.0.0
## [145] reshape2_1.4.4 ScaledMatrix_1.15.0
## [147] BiocVersion_3.21.1 XML_3.99-0.17
## [149] evaluate_1.0.1 RcppParallel_5.1.9
## [151] BiocManager_1.30.25 nloptr_2.1.1
## [153] foreach_1.5.2 tidyr_1.3.1
## [155] purrr_1.0.2 future_1.34.0
## [157] clue_0.3-66 scattermore_1.2
## [159] ashr_2.2-63 rsvd_1.0.5
## [161] broom_1.0.7 xtable_1.8-4
## [163] fANCOVA_0.6-1 viridisLite_0.4.2
## [165] truncnorm_1.0-9 tibble_3.2.1
## [167] lmerTest_3.1-3 glmmTMB_1.1.10
## [169] memoise_2.0.1 beeswarm_0.4.0
## [171] cluster_2.1.8 globals_0.16.3
## [173] GSEABase_1.69.0