| Title: | Spatially-aware normalisation for spatial transcriptomics data |
|---|---|
| Description: | This package implements the spatially aware library size normalisation algorithm, SpaNorm. SpaNorm normalises out library size effects while retaining biology through the modelling of smooth functions for each effect. Normalisation is performed in a gene- and cell-/spot- specific manner, yielding library size adjusted data. |
| Authors: | Dharmesh D. Bhuva [aut, cre] (ORCID: <https://orcid.org/0000-0002-6398-9157>), Agus Salim [aut] (ORCID: <https://orcid.org/0000-0003-3999-7701>), Ahmed Mohamed [aut] (ORCID: <https://orcid.org/0000-0001-6507-5300>) |
| Maintainer: | Dharmesh D. Bhuva <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.7.0 |
| Built: | 2026-05-30 09:52:48 UTC |
| Source: | https://github.com/bioc/SpaNorm |
This function computes the size factors using a fast but inaccurate approach. Size factors are computed using the direct estimate of library sizes (sum of all counts). Though fast, this approach does not cater for compositional biases in the data and therefore is less accurate than scran-based estimates.
fastSizeFactors(spe) ## S4 method for signature 'SpatialExperiment' fastSizeFactors(spe)fastSizeFactors(spe) ## S4 method for signature 'SpatialExperiment' fastSizeFactors(spe)
spe |
a SpatialExperiment, Seurat, or SpatialFeatureExperiment object containing count data. |
a SpatialExperiment, Seurat, or SpatialFeatureExperiment, containing size factors in the 'sizeFactor' column of the column annotation.
data(HumanDLPFC) HumanDLPFC <- fastSizeFactors(HumanDLPFC) head(HumanDLPFC$sizeFactor)data(HumanDLPFC) HumanDLPFC <- fastSizeFactors(HumanDLPFC) head(HumanDLPFC$sizeFactor)
This function removes genes that are very lowly expressed.
filterGenes(spe, prop = 0.1) ## S4 method for signature 'SpatialExperiment' filterGenes(spe, prop = 0.1) ## S4 method for signature 'Seurat' filterGenes(spe, prop = 0.1)filterGenes(spe, prop = 0.1) ## S4 method for signature 'SpatialExperiment' filterGenes(spe, prop = 0.1) ## S4 method for signature 'Seurat' filterGenes(spe, prop = 0.1)
spe |
a SpatialExperiment, Seurat, or SpatialFeatureExperiment object containing count data. |
prop |
a numeric, indicating the proportion of loci/cells where the gene should be expressed (default is 0.1, i.e., genes should be expressed in at least 10% of the loci/cells). |
a logical vector encoding which genes should be kept for further analysis.
data(HumanDLPFC) keep <- filterGenes(HumanDLPFC) table(keep)data(HumanDLPFC) keep <- filterGenes(HumanDLPFC) table(keep)
Human DLPFC sample profiled using the 10x Visium platform. Lowly expressed genes were filtered out using the filterGenes function to retain genes expressed in at least 10% of samples. This version was obtained from the SpatialLIBD R/Bioconductor package.
data(HumanDLPFC)data(HumanDLPFC)
A SpatialExperiment containing region annotations in the colData column 'AnnotatedCluster'.
SpatialLIBD R/Bioconductor package.
Maynard KR, Collado-Torres L, Weber LM, Uytingco C, Barry BK, Williams SR, Catallini JL, Tran MN, Besich Z, Tippani M, Chew J. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nature neuroscience. 2021 Mar;24(3):425-36.
This function can be used to spatially visualise the library size, biology or batch specific effect modelled for each gene.
plotCovariate(spe, covariate = c("biology", "ls", "batch"), ...)plotCovariate(spe, covariate = c("biology", "ls", "batch"), ...)
spe |
a SpatialExperiment object. |
covariate |
a character, specifying the type of covariate to be plot: "biology" (default), "ls" to plot the library size effect, and "batch" to plot the batch-specific effect. |
... |
additional parameters to be passed to the plotSpatial function. |
a ggplot2 object
library(SpatialExperiment) library(ggplot2) data(HumanDLPFC) HumanDLPFC = SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2) # plot spatial region annotations p1 <- plotCovariate(HumanDLPFC, covariate = "biology", colour = ENSG00000075624) + scale_colour_viridis_c(option = "F") p1 p2 <- plotCovariate(HumanDLPFC, covariate = "ls", colour = ENSG00000075624) + scale_colour_viridis_c(option = "F") p2library(SpatialExperiment) library(ggplot2) data(HumanDLPFC) HumanDLPFC = SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2) # plot spatial region annotations p1 <- plotCovariate(HumanDLPFC, covariate = "biology", colour = ENSG00000075624) + scale_colour_viridis_c(option = "F") p1 p2 <- plotCovariate(HumanDLPFC, covariate = "ls", colour = ENSG00000075624) + scale_colour_viridis_c(option = "F") p2
Plot spatial transcriptomic annotations per spot
plotSpatial( spe, what = c("annotation", "expression", "reduceddim"), ..., assay = SummarizedExperiment::assayNames(spe), dimred = SingleCellExperiment::reducedDimNames(spe), img = FALSE, crop = FALSE, imgAlpha = 1, rl = 1, circles = FALSE )plotSpatial( spe, what = c("annotation", "expression", "reduceddim"), ..., assay = SummarizedExperiment::assayNames(spe), dimred = SingleCellExperiment::reducedDimNames(spe), img = FALSE, crop = FALSE, imgAlpha = 1, rl = 1, circles = FALSE )
spe |
a SpatialExperiment object. |
what |
a character, specifying what aspect should be plot, "annotation", "expression", or reduced dimension ("reduceddim"). |
... |
additional aesthetic mappings or fixed parameters (e.g., shape = "."). |
assay |
a character or numeric, specifying the assay to plot (default is the first assay). |
dimred |
a character or numeric, specifying the reduced dimension to plot (default is the first reduced dimension). |
img |
a logical, indicating whether the tissue image (if present) should be plot (default = FALSE). |
crop |
a logical, indicating whether the image should be cropped to the spatial coordinates (default = FALSE). |
imgAlpha |
a numeric, specifying the alpha value for the image (default = 1). |
rl |
a numeric, specifying the relative size of the text (default = 1). |
circles |
a logical, indicating whether the spots should be plotted as circles (default = FALSE). This can be slower for large datasets. |
a ggplot2 object
library(SpatialExperiment) library(ggplot2) # load data data(HumanDLPFC) # plot spatial region annotations p1 <- plotSpatial(HumanDLPFC, colour = AnnotatedCluster) p1 # change colour scale p1 + scale_colour_brewer(palette = "Paired") # plot spatial expression plotSpatial(HumanDLPFC, what = "expression", colour = ENSG00000075624) + scale_colour_viridis_c(option = "F") # plot logcounts logcounts(HumanDLPFC) <- log2(counts(HumanDLPFC) + 1) plotSpatial(HumanDLPFC, what = "expression", colour = ENSG00000075624, assay = "logcounts") + scale_colour_viridis_c(option = "F") # change point shape plotSpatial(HumanDLPFC, what = "expression", colour = ENSG00000075624, assay = "logcounts", shape = 18) + scale_colour_viridis_c(option = "F")library(SpatialExperiment) library(ggplot2) # load data data(HumanDLPFC) # plot spatial region annotations p1 <- plotSpatial(HumanDLPFC, colour = AnnotatedCluster) p1 # change colour scale p1 + scale_colour_brewer(palette = "Paired") # plot spatial expression plotSpatial(HumanDLPFC, what = "expression", colour = ENSG00000075624) + scale_colour_viridis_c(option = "F") # plot logcounts logcounts(HumanDLPFC) <- log2(counts(HumanDLPFC) + 1) plotSpatial(HumanDLPFC, what = "expression", colour = ENSG00000075624, assay = "logcounts") + scale_colour_viridis_c(option = "F") # change point shape plotSpatial(HumanDLPFC, what = "expression", colour = ENSG00000075624, assay = "logcounts", shape = 18) + scale_colour_viridis_c(option = "F")
Performs normalisation of spatial transcriptomics data using spatially-dependent spot- and gene- specific size factors.
SpaNorm( spe, sample.p = 0.25, gene.model = c("nb"), adj.method = c("auto", "logpac", "pearson", "medbio", "meanbio"), scale.factor = 1, df.tps = 6, lambda.a = 1e-04, batch = NULL, tol = 1e-04, step.factor = 0.5, maxit.nb = 50, maxit.psi = 25, maxn.psi = 500, overwrite = FALSE, backend = c("auto", "cpu", "gpu"), verbose = TRUE, ... ) ## S4 method for signature 'SpatialExperiment' SpaNorm( spe, sample.p = 0.25, gene.model = c("nb"), adj.method = c("auto", "logpac", "pearson", "medbio", "meanbio"), scale.factor = 1, df.tps = 6, lambda.a = 1e-04, batch = NULL, tol = 1e-04, step.factor = 0.5, maxit.nb = 50, maxit.psi = 25, maxn.psi = 500, overwrite = FALSE, backend = c("auto", "cpu", "gpu"), verbose = TRUE, ... ) ## S4 method for signature 'Seurat' SpaNorm( spe, sample.p = 0.25, gene.model = c("nb"), adj.method = c("auto", "logpac", "pearson", "medbio", "meanbio"), scale.factor = 1, df.tps = 6, lambda.a = 1e-04, batch = NULL, tol = 1e-04, step.factor = 0.5, maxit.nb = 50, maxit.psi = 25, maxn.psi = 500, overwrite = FALSE, backend = c("auto", "cpu", "gpu"), verbose = TRUE, ... )SpaNorm( spe, sample.p = 0.25, gene.model = c("nb"), adj.method = c("auto", "logpac", "pearson", "medbio", "meanbio"), scale.factor = 1, df.tps = 6, lambda.a = 1e-04, batch = NULL, tol = 1e-04, step.factor = 0.5, maxit.nb = 50, maxit.psi = 25, maxn.psi = 500, overwrite = FALSE, backend = c("auto", "cpu", "gpu"), verbose = TRUE, ... ) ## S4 method for signature 'SpatialExperiment' SpaNorm( spe, sample.p = 0.25, gene.model = c("nb"), adj.method = c("auto", "logpac", "pearson", "medbio", "meanbio"), scale.factor = 1, df.tps = 6, lambda.a = 1e-04, batch = NULL, tol = 1e-04, step.factor = 0.5, maxit.nb = 50, maxit.psi = 25, maxn.psi = 500, overwrite = FALSE, backend = c("auto", "cpu", "gpu"), verbose = TRUE, ... ) ## S4 method for signature 'Seurat' SpaNorm( spe, sample.p = 0.25, gene.model = c("nb"), adj.method = c("auto", "logpac", "pearson", "medbio", "meanbio"), scale.factor = 1, df.tps = 6, lambda.a = 1e-04, batch = NULL, tol = 1e-04, step.factor = 0.5, maxit.nb = 50, maxit.psi = 25, maxn.psi = 500, overwrite = FALSE, backend = c("auto", "cpu", "gpu"), verbose = TRUE, ... )
spe |
a SpatialExperiment or Seurat object, with the count data stored in 'counts' or 'data' assays respectively. |
sample.p |
a numeric, specifying the (maximum) proportion of cells/spots to sample for model fitting (default is 0.25). |
gene.model |
a character, specifying the model to use for gene/protein abundances (default 'nb'). This should be 'nb' for count based datasets. |
adj.method |
a character, specifying the method to use to adjust the data (default 'auto', see details) |
scale.factor |
a numeric, specifying the sample-specific scaling factor to scale the adjusted count. |
df.tps |
a numeric, of length 1 or 2, specifying the maximum degrees of freedom for the thin-plate spline for the biology and library size effects, respectively (default is 6, see details). |
lambda.a |
a numeric, of length 1 or 2, specifying the smoothing parameter for regularizing regression coefficients (default is 0.0001, see details). Actual lambda.a used is lambda.a * ncol(spe). |
batch |
a vector or numeric matrix, specifying the batch design to regress out (default NULL, representing no batch effects). See details for more information on how to define this variable. |
tol |
a numeric, specifying the tolerance for convergence (default is 1e-4). |
step.factor |
a numeric, specifying the multiplicative factor to decrease IRLS step by when log-likelihood diverges (default is 0.5). |
maxit.nb |
a numeric, specifying the maximum number of IRLS iteration for estimating NB mean parameters for a given dispersion parameter (default is 50). |
maxit.psi |
a numeric, specifying the maximum number of IRLS iterations to estimate the dispersion parameter (default is 25). |
maxn.psi |
a numeric, specifying the maximum number of cells/spots to sample for dispersion estimation (default is 500). |
overwrite |
a logical, specifying whether to force recomputation and overwrite an existing fit (default FALSE). Note that if df.tps, batch, lambda.a, or gene.model are changed, the model is recomputed and overwritten. |
backend |
a character, specifying the backend to use for computations (default 'auto', see details). If 'gpu', GPU-based computations are used if available, otherwise CPU-based computations are used. |
verbose |
a logical, specifying whether to show update messages (default TRUE). |
... |
other parameters fitting parameters. |
SpaNorm works by first fitting a spatial regression model for library size to the data. Normalised data can then be computed using various adjustment approaches. When a negative binomial gene-model is used, the data can be adjusted using the following approaches: 'logpac', 'pearson', 'medbio', and 'meanbio'.
The df.tps parameter specifies the degrees of freedom for the thin-plate spline. If only 1 value is provided, it specifies the degrees of freedom of the biology with the degrees of freedom of the library size being half of that. If 2 values are provided, the first value specifies the degrees of freedom of the biology and the second value specifies the degrees of freedom of the library size. For rectangular tissues, df.tps specifies the degrees of freedom along the length, with the degrees of freedom along the width calculated ceiling(width / length * df.tps).
Similarly, the lambda.a parameter specifies the smoothing parameter for regularizing regression coefficients. If only 1 value is provided, it specifies the lambda.a for both the biology and library size functions. If 2 values are provided, the first value specifies the lambda.a for the biology and the second value specifies the lambda.a for the library size. Batch effects are not regularised.
Batch effects can be specified using the batch parameter. If this parameter is a vector, a design matrix will be created within the function using model.matrix. If a custom design is provided in the form of a numeric matrix, this should ideally be created using model.matrix. The batch matrix should be created with an intercept term. The SpaNorm function will automatically detect the intercept term and remove the relevant column. Alternatively, users can subset the model matrix to remove this column manually. Please note that the model formula should include the intercept term and that the intercept column should be subset out after.
a SpatialExperiment or Seurat object with the adjusted data stored in 'logcounts' or 'data', respectively.
data(HumanDLPFC) SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2)data(HumanDLPFC) SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2)
An S4 class to store a SpaNorm model fit
## S4 method for signature 'SpaNormFit' x$name## S4 method for signature 'SpaNormFit' x$name
x |
an object of class SpaNormFit. |
name |
a character, specifying the name of the slot to retrieve. |
Return value varies depending on method.
ngenesa numeric, specifying the number of genes in the dataset.
ncellsa numeric, specifying the number of cells/spots in the dataset.
gene.modela character, specifying the gene-specific model to used (see getGeneModels()).
df.tpsan integer, specifying the degrees of freedom to used for the thin plate spline.
sample.pa numeric, specifying the proportion of samples used to approximated the model.
lambda.aa numeric, specifying the shinkage parameter used.
batcha vector or matrix, specifying the batch design used (if any).
Wa matrix, specifying the covariate matrix of the linear model.
alphaa matrix, specifying the coefficients of the linear model.
gmeana numeric, specifying the mean estimate for each gene in the linear model.
psia numeric, specifying the over-dispersion parameter for each gene if a negative binomial model was used (or a vector of NAs if another gene model is used).
wtypea factor, specifying the covariate types of columns in the covariate matrix, W. These could be "biology", "ls", or "batch".
loglika numeric, specifying the log-likelihood of the model at each external iteration.
samplinga factor, specifying the cells/spots used for dispersion estimation ('dispersion'), GLM fitting ('glm' and 'dispersion'), all other cells/spots ('all').
example(SpaNorm)example(SpaNorm)
GLM-based PCA using the SpaNorm model. The null model is considered to consist of the library size effects, batch effects, and the gene mean. GLM-PCA is approximated by regressing the null model from the data, and performing PCA on the residuals (Pearson or deviance).
SpaNormPCA( spe, nsvgs = 3000, ncomponents = 50, svg.fdr = 1, BSPARAM = bsparam(), BPPARAM = SerialParam(), residuals = c("deviance", "pearson"), name = "PCA" ) ## S4 method for signature 'SpatialExperiment' SpaNormPCA( spe, nsvgs = 3000, ncomponents = 50, svg.fdr = 1, BSPARAM = bsparam(), BPPARAM = SerialParam(), residuals = c("deviance", "pearson"), name = "PCA" )SpaNormPCA( spe, nsvgs = 3000, ncomponents = 50, svg.fdr = 1, BSPARAM = bsparam(), BPPARAM = SerialParam(), residuals = c("deviance", "pearson"), name = "PCA" ) ## S4 method for signature 'SpatialExperiment' SpaNormPCA( spe, nsvgs = 3000, ncomponents = 50, svg.fdr = 1, BSPARAM = bsparam(), BPPARAM = SerialParam(), residuals = c("deviance", "pearson"), name = "PCA" )
spe |
a SpatialExperiment or Seurat object, with the count data stored in 'counts' or 'data' assays respectively, and a SpaNorm model fit. |
nsvgs |
the number of SVGs to use for PCA. |
ncomponents |
the number of components to compute. |
svg.fdr |
the FDR threshold for SVG calling. |
BSPARAM |
a BiocSingularParam object specifying which algorithm should be used to perform the PCA. |
BPPARAM |
a BiocParallelParam object specifying whether the PCA should be parallelized. |
residuals |
the type of residuals to use for PCA. Either "deviance" (default) or "pearson". |
name |
the name of the reducedDim to store the PCA results. |
SpaNorm PCA works by using the SpaNorm model fit for data normalisation to approximate a GLM-based PCA as described in Townes et al. (Genome Biology, 2019). The model used for normalisation represents the library size effects and the gene mean. Regressing these covariates, we remain with the deviance or Pearson residuals, upon which PCA can be performed to approximate the GLM-PCA.
a SpatialExperiment or Seurat object with PCA results. For SpatialExperiment objects, these are stored in the reducedDims.
library(SpatialExperiment) library(ggplot2) data(HumanDLPFC) HumanDLPFC = SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2) HumanDLPFC = SpaNormSVG(HumanDLPFC) HumanDLPFC = SpaNormPCA(HumanDLPFC) reducedDims(HumanDLPFC)library(SpatialExperiment) library(ggplot2) data(HumanDLPFC) HumanDLPFC = SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2) HumanDLPFC = SpaNormSVG(HumanDLPFC) HumanDLPFC = SpaNormPCA(HumanDLPFC) reducedDims(HumanDLPFC)
Spatially variable gene (SVG) calling using the SpaNorm model.
SpaNormSVG(spe, backend = c("auto", "cpu", "gpu"), verbose = TRUE) ## S4 method for signature 'SpatialExperiment' SpaNormSVG(spe, backend = c("auto", "cpu", "gpu"), verbose = TRUE)SpaNormSVG(spe, backend = c("auto", "cpu", "gpu"), verbose = TRUE) ## S4 method for signature 'SpatialExperiment' SpaNormSVG(spe, backend = c("auto", "cpu", "gpu"), verbose = TRUE)
spe |
a SpatialExperiment or Seurat object, with the count data stored in 'counts' or 'data' assays respectively, and a SpaNorm model fit. |
backend |
a character, specifying the backend to use for computations. Options are "auto" (default), "cpu", or "gpu". If "auto", it will use GPU if available, otherwise CPU. |
verbose |
a logical, specifying whether to show update messages (default TRUE). |
SpaNorm SVG calling works by using the SpaNorm model fit for data normalisation to perform a likelihood ratio test (LRT). The model used for normalisation is considered to be the full model. A second nested model is fit without the splines representing biology. These nested models are then compared using a LRT to identify genes where the splines representing biology contain strong signal.
a SpatialExperiment or Seurat object with F-statistics, false discovery rates (FDRs). For SpatialExperiment objects, these are stored in the rowData.
library(SpatialExperiment) library(ggplot2) data(HumanDLPFC) HumanDLPFC = SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2) HumanDLPFC = SpaNormSVG(HumanDLPFC) head(rowData(HumanDLPFC))library(SpatialExperiment) library(ggplot2) data(HumanDLPFC) HumanDLPFC = SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2) HumanDLPFC = SpaNormSVG(HumanDLPFC) head(rowData(HumanDLPFC))
Export top SVG results to a data frame
topSVGs(spe, n = 10, fdr = 1)topSVGs(spe, n = 10, fdr = 1)
spe |
a SpatialExperiment object with SVG results from SpaNormSVG. |
n |
a numeric, specifying the number of top SVGs to call. |
fdr |
a numeric, specifying the false discovery rate (FDR) threshold for calling SVGs. |
A data frame containing the top SVGs from F-test results including F-statistics, p-values and FDR.
library(SpatialExperiment) library(ggplot2) data(HumanDLPFC) HumanDLPFC = SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2) HumanDLPFC = SpaNormSVG(HumanDLPFC) topSVGs = topSVGs(HumanDLPFC, n = 10)library(SpatialExperiment) library(ggplot2) data(HumanDLPFC) HumanDLPFC = SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2) HumanDLPFC = SpaNormSVG(HumanDLPFC) topSVGs = topSVGs(HumanDLPFC, n = 10)