---
title: "Error handling"
author: "Developed by [Gabriel Hoffman](http://gabrielhoffman.github.io/)"
date: "Run on `r format(Sys.time())`"
documentclass: article
vignette: >
%\VignetteIndexEntry{Error handling}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\usepackage[utf8]{inputenc}
# output:
# BiocStyle::html_document:
# toc: true
# toc_float: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE,
warning = FALSE,
message = FALSE,
error = FALSE,
tidy = FALSE,
dev = c("png"),
cache = TRUE
)
```
`dreamlet()` evaluates precision-weighted linear (mixed) models on each gene that passes standard filters. The linear mixed model used by `dream()` can be a little fragile for small sample sizes and correlated covariates. `dreamlet()` runs `variancePartition::dream()` in the backend for each cell cluster. `dream()` reports model failures for each cell cluster and `dreamlet()` reports these failures to the user. `dreamlet()` returns all **successful** model fits to be used for downstream analysis.
See details from `variancePartition` [error page](https://diseaseneurogenomics.github.io/variancePartition/articles/errors.html).
# Errors with random effects
Due to a recent [bug](https://github.com/lme4/lme4/issues/763) in the dependency `Matrix` package, all random effects models may fail for technical reasons. If your random effects analysis is failing for all genes in cases with no good explanation, this bug may be responsible. This case can be detected and resolved as follows:
```{r lme4, eval=FALSE}
library(lme4)
# Fit simple mixed model
lmer(Reaction ~ (1 | Subject), sleepstudy)
# Error in initializePtr() :
# function 'chm_factor_ldetL2' not provided by package 'Matrix'
```
This error indicates incompatible installs of `Matrix` and `lme4`. This can be solved with
```{r install, eval=FALSE}
install.packages("lme4", type = "source")
```
followed by __restarting__ R.
# Errors at the assay- and gene-level
The most common issue is when `dreamlet()` analysis succeeds for most genes, but a handful of genes fail in each cell cluster. These genes can fail if the iterative process of fitting the linear mixed model does not converge, or if the estimated covariance matrix that is supposed be positive definite has an eigen-value that is negative or too close to zero due to rounding errors in floating point arithmetic.
In these cases, `dreamlet()` stores a summary of these failures for all cell clusters that is accessible with `details()`. Specific failure messages for each cell cluster and gene can be extracted using `seeErrors()`
Here we demonstrate how `dreamlet()` handles model failures:
```{r error, eval=FALSE}
library(dreamlet)
library(muscat)
library(SingleCellExperiment)
data(example_sce)
# create pseudobulk for each sample and cell cluster
pb <- aggregateToPseudoBulk(example_sce,
assay = "counts",
cluster_id = "cluster_id",
sample_id = "sample_id",
verbose = FALSE
)
# voom-style normalization for each cell cluster
res.proc <- processAssays(
pb[1:300, ],
~group_id
)
# Redundant formula
# This example is an extreme example of redundancy
# but more subtle cases often show up in real data
form <- ~ group_id + (1 | group_id)
# fit dreamlet model
res.dl <- dreamlet(res.proc, form)
## B cells...7.9 secs
## CD14+ Monocytes...10 secs
## CD4 T cells...9 secs
## CD8 T cells...4.4 secs
## FCGR3A+ Monocytes...11 secs
##
## Of 1,062 models fit across all assays, 96.2% failed
# summary of models
res.dl
## class: dreamletResult
## assays(5): B cells CD14+ Monocytes CD4 T cells CD8 T cells FCGR3A+ Monocytes
## Genes:
## min: 3
## max: 11
## details(7): assay n_retain ... n_errors error_initial
## coefNames(2): (Intercept) group_idstim
##
## Of 1,062 models fit across all assays, 96.2% failed
# summary of models for each cell cluster
details(res.dl)
## assay n_retain formula formDropsTerms n_genes n_errors error_initial
## 1 B cells 4 ~group_id + (1 | group_id) FALSE 201 190 FALSE
## 2 CD14+ Monocytes 4 ~group_id + (1 | group_id) FALSE 269 263 FALSE
## 3 CD4 T cells 4 ~group_id + (1 | group_id) FALSE 216 207 FALSE
## 4 CD8 T cells 4 ~group_id + (1 | group_id) FALSE 118 115 FALSE
## 5 FCGR3A+ Monocytes 4 ~group_id + (1 | group_id) FALSE 258 247 FALSE
```
- `assay`: cell type
- `n_retain`: number of samples retained
- `formula`: regression formula used after variable filtering
- `formDropsTerms`: whether a variable was dropped from the formula for having zero variance following filtering
- `n_genes`: number of genes analyzed
- `n_errors`: number of genes with errors
- `error_initial`: indicator for assay-level error
## Assay-level errors
Before the full dataset is analyzed, `dreamlet()` runs a test for each assay to see if the model succeeds. If the model fails, its does not continue analysis for that assay. These assay-level errors are reported above in the `error_initial` column, and details are returned here.
```{r err1, eval=FALSE}
# Extract errors as a tibble
res.err = seeErrors(res.dl)
## Assay-level errors: 0
## Gene-level errors: 1038
# No errors at the assay level
res.err$assayLevel
# the most common error is:
"Some predictor variables are on very different scales: consider rescaling"
```
This indicates that the scale of the predictor variables are very different and can affect the numerical stability of the iterative algorithm. This can be solved by running `scale()` on each variable in the formula:
```{r formula, eval=FALSE}
form = ~ scale(x) + scale(y) + ...
```
## Gene-level errors
A model can fail for a single gene if covariates are too correlated, or for other numerical issues. Failed models are reported here and are not included in downstream analysis.
```{r err2, eval=FALSE}
# See gene-level errors for each assay
res.err$geneLevel[1:2,]
## # A tibble: 2 × 3
## assay feature errorText
##
## B cells ISG15 "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol):…
## B cells AURKAIP1 "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol):…
# See full error message text
res.err$geneLevel$errorText[1]
"Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol): (converted from warning)
Model may not have converged with 1 eigenvalue close to zero: 1.4e-09\n"
```
This message indicates that the model was numerically unstable likely because the variables are closely correlated.
# Session Info
```{r session, echo=FALSE}
sessionInfo()
```