spoon
is a method to address the mean-variance
relationship in spatially resolved transcriptomics data. Current
approaches rank spatially variable genes based on either p-values or
some effect size, such as the proportion of spatially variable genes.
However, previous work in RNA-sequencing has shown that a technical
bias, referred to as the “mean-variance relationship”, exists in these
data in that the gene-level variance is correlated with mean RNA
expression. We found that there is a “mean-variance relationship” in
spatial transcriptomics data, and so we propose spoon
, a
statistical framework to prioritize spatially variable genes that is not
confounded by this relationship. We fit a spline curve to estimate the
mean-variance relationship. Then, similar to using weights in a weighted
least squares model, we used weights that we plugged into a Gaussian
Process Regression model fit with a nearest-neighbor Gaussian process
model to the preprocessed expression measurements for each gene,
i.e. one model per gene. spoon
removes the bias and leads
to a more informative set of spatially variable genes.
The generate_weights()
function calculates individual
observation weights, where an individual observation is a UMI (unique
molecular identifier) count value for a specific gene and sample. If the
desired SVG detection method accepts weights, then the individual
observation weights can be used as inputs. If the desired SVG detection
method does not accept weights, then the Delta method is leveraged to
rescale the data and covariates by the weights. These scaled data and
covariates are used as inputs into the desired SVG detection
function.
Bioconductor houses the infrastructure to store and analyze spatially
resolved transcriptomics data for R users, including many SVG detection
methods. This method addresses the mean-variance relationship
confounding SVG detection, which is related to these other Bioconductor
packages. Additionally, spoon
is inspired by
limma::voom()
, which is a popular Bioconductor
package.
The following code will install the latest release version of the
spoon
package from Bioconductor. Additional details are
shown on the Bioconductor
page.
The latest development version can also be installed from the
devel
version of Bioconductor or from GitHub.
We recommend the input data be provided as a SpatialExperiment
Bioconductor object. The outputs are stored in the rowData
of the SpatialExperiment
object. The examples below use
this input data format.
The inputs can also be provided as a numeric matrix of raw counts and a numeric matrix of spatial coordinates.
Load packages and data
## Loading required package: ExperimentHub
## Loading required package: BiocGenerics
## Loading required package: generics
##
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
##
## as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
## setequal, union
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
## mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
## unsplit, which.max, which.min
## Loading required package: AnnotationHub
## Loading required package: BiocFileCache
## Loading required package: dbplyr
## Loading required package: SingleCellExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
## The following object is masked from 'package:ExperimentHub':
##
## cache
## The following object is masked from 'package:AnnotationHub':
##
## cache
## Loading required package: SpatialExperiment
## Loading required package: RANN
## Loading required package: parallel
## Loading required package: rdist
## Loading required package: pbapply
## The ordering of inputs x (covariates) and y (response) in BRISC_estimation has been changed BRISC 1.0.0 onwards.
## Please check the new documentation with ?BRISC_estimation.
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:S4Vectors':
##
## expand
## see ?STexampleData and browseVignettes('STexampleData') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
Preprocessing
# keep spots over tissue
spe <- spe[, colData(spe)$in_tissue == 1]
# filter out low quality genes
spe <- filter_genes(spe)
## Gene filtering: removing mitochondrial genes
## removed 13 mitochondrial genes
## Gene filtering: retaining genes with at least 3 counts in at least 0.5% (n = 14) of spatial locations
## removed 21883 out of 32272 genes due to low expression
# calculate logcounts (log-transformed normalized counts) using scran package
spe <- computeLibraryFactors(spe)
spe <- logNormCounts(spe)
# choose a small number of genes for this example to run quickly
set.seed(3)
ix_random <- sample(seq_len(nrow(spe)), 10)
spe <- spe[ix_random, ]
# remove spots with zero counts
spe <- spe[, colSums(logcounts(spe)) > 0]
Step 1: generate weights
weights <- generate_weights(input = spe,
stabilize = TRUE,
BPPARAM = MulticoreParam(workers = 1,
RNGseed = 4))
## 21.8962962962963% of observations had their weight stabilized
Step 2: weighted SVG detection
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
## Warning in nnSVG(spe[i, ], X = matrix(w[, i]), assay_name =
## "weighted_logcounts"): Rows (genes) and/or columns (spots) containing all zero
## counts have been found. Please see examples in tutorial for code to filter out
## zeros and/or low-expressed genes to avoid errors.
Show results
## DataFrame with 10 rows and 11 columns
## gene_id gene_name feature_type weighted_mean
## <character> <character> <character> <numeric>
## ENSMUSG00000030282 ENSMUSG00000030282 Cmas Gene Expression 2.648394
## ENSMUSG00000022601 ENSMUSG00000022601 Zbtb11 Gene Expression 0.950335
## ENSMUSG00000040220 ENSMUSG00000040220 Gas8 Gene Expression 0.403709
## ENSMUSG00000020704 ENSMUSG00000020704 Asic2 Gene Expression 1.320342
## ENSMUSG00000019173 ENSMUSG00000019173 Rab5c Gene Expression 2.830450
## ENSMUSG00000042156 ENSMUSG00000042156 Dzip1 Gene Expression 0.919618
## ENSMUSG00000050856 ENSMUSG00000050856 Atp5k Gene Expression 40.248323
## ENSMUSG00000033594 ENSMUSG00000033594 Spata2l Gene Expression 0.769744
## ENSMUSG00000037003 ENSMUSG00000037003 Tns2 Gene Expression 0.362365
## ENSMUSG00000026817 ENSMUSG00000026817 Ak1 Gene Expression 2.459034
## weighted_LR_stat weighted_sigma.sq weighted_tau.sq
## <numeric> <numeric> <numeric>
## ENSMUSG00000030282 502.84174 1.2603087 1.513980
## ENSMUSG00000022601 179.37382 0.7362598 1.079312
## ENSMUSG00000040220 7.10899 0.0161404 0.763038
## ENSMUSG00000020704 378.40033 1.2209165 1.177764
## ENSMUSG00000019173 38.96779 0.1380800 1.397705
## ENSMUSG00000042156 87.34257 0.1366753 1.315818
## ENSMUSG00000050856 141.25778 3.4727316 11.651506
## ENSMUSG00000033594 308.28712 0.6077107 1.038931
## ENSMUSG00000037003 228.11029 0.8156710 0.582559
## ENSMUSG00000026817 190.23915 0.3057843 1.490163
## weighted_prop_sv weighted_phi weighted_padj weighted_rank
## <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000030282 0.4542817 6.153329 0.00000e+00 1
## ENSMUSG00000022601 0.4055249 2.217456 0.00000e+00 6
## ENSMUSG00000040220 0.0207146 2.117324 2.85958e-02 10
## ENSMUSG00000020704 0.5089950 1.872969 0.00000e+00 2
## ENSMUSG00000019173 0.0899084 16.532327 3.45344e-09 9
## ENSMUSG00000042156 0.0940970 4.405530 0.00000e+00 8
## ENSMUSG00000050856 0.2296137 27.166397 0.00000e+00 7
## ENSMUSG00000033594 0.3690607 2.204091 0.00000e+00 3
## ENSMUSG00000037003 0.5833596 0.179876 0.00000e+00 4
## ENSMUSG00000026817 0.1702635 11.461213 0.00000e+00 5
Other SVG detection tools
We provided a function to use the weights with nnSVG for more accurate spatially variable gene detection. The weights can also be used with other spatially variable gene detection tools using the following procedure:
assay_name <- "logcounts"
weighted_logcounts <- t(weights)*assays(spe)[[assay_name]]
assay(spe, "weighted_logcounts") <- weighted_logcounts
weighted_logcounts
can be accessed from
assay(spe, "weighted_logcounts")
. Then,
weighted_logcounts
should be used as the input counts
matrix and weights
as the input covariate matrix in a
spatially variable detection tool.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] spoon_1.3.1 Matrix_1.7-1
## [3] scuttle_1.17.0 BiocParallel_1.41.0
## [5] BRISC_1.0.6 pbapply_1.7-2
## [7] rdist_0.0.5 RANN_2.6.2
## [9] STexampleData_1.14.0 SpatialExperiment_1.17.0
## [11] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0
## [13] Biobase_2.67.0 GenomicRanges_1.59.1
## [15] GenomeInfoDb_1.43.1 IRanges_2.41.1
## [17] S4Vectors_0.45.2 MatrixGenerics_1.19.0
## [19] matrixStats_1.4.1 ExperimentHub_2.15.0
## [21] AnnotationHub_3.15.0 BiocFileCache_2.15.0
## [23] dbplyr_2.5.0 BiocGenerics_0.53.3
## [25] generics_0.1.3 nnSVG_1.11.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4 blob_1.2.4
## [4] filelock_1.0.3 Biostrings_2.75.1 fastmap_1.2.0
## [7] digest_0.6.37 mime_0.12 lifecycle_1.0.4
## [10] KEGGREST_1.47.0 RSQLite_2.3.8 magrittr_2.0.3
## [13] compiler_4.4.2 rlang_1.1.4 sass_0.4.9
## [16] tools_4.4.2 utf8_1.2.4 yaml_2.3.10
## [19] knitr_1.49 S4Arrays_1.7.1 bit_4.5.0
## [22] curl_6.0.1 DelayedArray_0.33.2 abind_1.4-8
## [25] withr_3.0.2 purrr_1.0.2 sys_3.4.3
## [28] grid_4.4.2 fansi_1.0.6 beachmat_2.23.1
## [31] cli_3.6.3 rmarkdown_2.29 crayon_1.5.3
## [34] httr_1.4.7 rjson_0.2.23 DBI_1.2.3
## [37] cachem_1.1.0 zlibbioc_1.52.0 AnnotationDbi_1.69.0
## [40] BiocManager_1.30.25 XVector_0.47.0 vctrs_0.6.5
## [43] jsonlite_1.8.9 bit64_4.5.2 maketools_1.3.1
## [46] magick_2.8.5 jquerylib_0.1.4 glue_1.8.0
## [49] codetools_0.2-20 BiocVersion_3.21.1 UCSC.utils_1.3.0
## [52] tibble_3.2.1 pillar_1.9.0 rappdirs_0.3.3
## [55] htmltools_0.5.8.1 GenomeInfoDbData_1.2.13 R6_2.5.1
## [58] evaluate_1.0.1 lattice_0.22-6 png_0.1-8
## [61] memoise_2.0.1 bslib_0.8.0 Rcpp_1.0.13-1
## [64] SparseArray_1.7.2 xfun_0.49 buildtools_1.0.0
## [67] pkgconfig_2.0.3