Title: | Clustering and Resolution Enhancement of Spatial Transcriptomes |
---|---|
Description: | Tools for clustering and enhancing the resolution of spatial gene expression experiments. BayesSpace clusters a low-dimensional representation of the gene expression matrix, incorporating a spatial prior to encourage neighboring spots to cluster together. The method can enhance the resolution of the low-dimensional representation into "sub-spots", for which features such as gene expression or cell type composition can be imputed. |
Authors: | Edward Zhao [aut], Senbai Kang [aut], Matt Stone [aut, cre], Xing Ren [ctb], Raphael Gottardo [ctb] |
Maintainer: | Matt Stone <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.17.0 |
Built: | 2024-12-29 04:09:49 UTC |
Source: | https://github.com/bioc/BayesSpace |
Plot spatial cluster assignments.
clusterPlot( sce, label = "spatial.cluster", palette = NULL, color = NULL, platform = NULL, is.enhanced = NULL, nsubspots.per.edge = 3, ... )
clusterPlot( sce, label = "spatial.cluster", palette = NULL, color = NULL, platform = NULL, is.enhanced = NULL, nsubspots.per.edge = 3, ... )
sce |
SingleCellExperiment. If |
label |
Labels used to color each spot. May be the name of a column in
|
palette |
Optional vector of hex codes to use for discrete spot values. |
color |
Optional hex code to set color of borders around spots. Set to
|
platform |
Spatial sequencing platform. If "Visium", the hex spot layout
will be used, otherwise square spots will be plotted. |
is.enhanced |
True if |
nsubspots.per.edge |
Number of subspots per edge of the square. Only
valid when |
... |
Additional arguments for |
Returns a ggplot object.
Other spatial plotting functions:
featurePlot()
sce <- exampleSCE() clusterPlot(sce)
sce <- exampleSCE() clusterPlot(sce)
Predict feature vectors from enhanced PCs.
enhanceFeatures( sce.enhanced, sce.ref, feature_names = NULL, model = c("xgboost", "dirichlet", "lm"), use.dimred = "PCA", assay.type = "logcounts", altExp.type = NULL, feature.matrix = NULL, nrounds = 0, train.n = round(ncol(sce.ref) * 2/3) )
enhanceFeatures( sce.enhanced, sce.ref, feature_names = NULL, model = c("xgboost", "dirichlet", "lm"), use.dimred = "PCA", assay.type = "logcounts", altExp.type = NULL, feature.matrix = NULL, nrounds = 0, train.n = round(ncol(sce.ref) * 2/3) )
sce.enhanced |
SingleCellExperiment object with enhanced PCs. |
sce.ref |
SingleCellExperiment object with original PCs and expression. |
feature_names |
List of genes/features to predict expression/values for. |
model |
Model used to predict enhanced values. |
use.dimred |
Name of dimension reduction to use. |
assay.type |
Expression matrix in |
altExp.type |
Expression matrix in |
feature.matrix |
Expression/feature matrix to predict, if not directly
attached to |
nrounds |
Nonnegative integer to set the |
train.n |
Number of spots to use in the training dataset for tuning nrounds. By default, 2/3 the total number of spots are used. |
Enhanced features are computed by fitting a predictive model to a
low-dimensional representation of the original expression vectors. By
default, a linear model is fit for each gene using the top 15 principal
components from each spot, i.e. lm(gene ~ PCs)
, and the fitted model
is used to predict the enhanced expression for each gene from the subspots'
principal components.
Diagnostic measures, such as RMSE for xgboost
or R.squared for linear
regression, are added to the 'rowData' of the enhanced experiment if the
features are an assay of the original experiment. Otherwise they are stored
as an attribute of the returned matrix/altExp.
Note that feature matrices will be returned and are expected to be input as
matrices of
-dimensional feature vectors over the
spots.
If assay.type
or altExp.type
are specified, the
enhanced features are stored in the corresponding slot of
sce.enhanced
and the modified SingleCellExperiment object is
returned.
If feature.matrix
is specified, or if a subset of features are
requested, the enhanced features are returned directly as a matrix.
set.seed(149) sce <- exampleSCE() sce <- spatialCluster(sce, 7, nrep=100, burn.in=10) enhanced <- spatialEnhance(sce, 7, init=sce$spatial.cluster, nrep=100, burn.in=10) enhanced <- enhanceFeatures(enhanced, sce, feature_names=c("gene_1", "gene_2"))
set.seed(149) sce <- exampleSCE() sce <- spatialCluster(sce, 7, nrep=100, burn.in=10) enhanced <- spatialEnhance(sce, 7, init=sce$spatial.cluster, nrep=100, burn.in=10) enhanced <- enhanceFeatures(enhanced, sce, feature_names=c("gene_1", "gene_2"))
SingleCellExperiment
for documentation examples.Create minimal SingleCellExperiment
for documentation examples.
exampleSCE(nrow = 8, ncol = 12, n_genes = 100, n_PCs = 10)
exampleSCE(nrow = 8, ncol = 12, n_genes = 100, n_PCs = 10)
nrow |
Number of rows of spots |
ncol |
Number of columns of spots |
n_genes |
Number of genes to simulate |
n_PCs |
Number of principal components to include |
Inspired by scuttle's mockSCE()
.
A SingleCellExperiment object with simulated counts, corresponding
logcounts and PCs, and positional data in colData
. Spots are
distributed over an (nrow
x ncol
) rectangle.
set.seed(149) sce <- exampleSCE()
set.seed(149) sce <- exampleSCE()
Plot spatial gene expression.
featurePlot( sce, feature, assay.type = "logcounts", diverging = FALSE, low = NULL, high = NULL, mid = NULL, color = NULL, platform = NULL, is.enhanced = NULL, nsubspots.per.edge = 3, ... )
featurePlot( sce, feature, assay.type = "logcounts", diverging = FALSE, low = NULL, high = NULL, mid = NULL, color = NULL, platform = NULL, is.enhanced = NULL, nsubspots.per.edge = 3, ... )
sce |
SingleCellExperiment. If |
feature |
Feature vector used to color each spot. May be the name of a
gene/row in an assay of |
assay.type |
String indicating which assay in |
diverging |
If true, use a diverging color gradient in
|
low , mid , high
|
Optional hex codes for low, mid, and high values of the color gradient used for continuous spot values. |
color |
Optional hex code to set color of borders around spots. Set to
|
platform |
Spatial sequencing platform. If "Visium", the hex spot layout
will be used, otherwise square spots will be plotted. |
is.enhanced |
True if |
nsubspots.per.edge |
Number of subspots per edge of the square. Only
valid when |
... |
Additional arguments for |
Returns a ggplot object.
Other spatial plotting functions:
clusterPlot()
sce <- exampleSCE() featurePlot(sce, "gene_2")
sce <- exampleSCE() featurePlot(sce, "gene_2")
Datasets are cached locally using BiocFileCache
. The first time using
this function, you may need to consent to creating a BiocFileCache directory
if one does not already exist.
getRDS(dataset, sample, cache = TRUE)
getRDS(dataset, sample, cache = TRUE)
dataset |
Dataset identifier |
sample |
Sample identifier |
cache |
If true, cache the dataset locally with |
The following datasets are available via getRDS
.
Dataset | Sample(s) |
2018_thrane_melanoma | ST_mel1_rep2 |
2020_maynard_prefrontal-cortex | 151507, 151508, 151509, 151510, 151669, 151670, 151671, 151672, 151673, 151674, 151675, 151676 |
2020_ji_squamous-cell-carcinoma | P4_rep1 |
2020_10X-IDC | IDC1 |
2020_10X-demo_ovarian-cancer | whole_transcriptome |
sce A SingleCellExperiment with positional information in colData and PCs based on the top 2000 HVGs
sce <- getRDS("2018_thrane_melanoma", "ST_mel1_rep2", cache = FALSE)
sce <- getRDS("2018_thrane_melanoma", "ST_mel1_rep2", cache = FALSE)
BayesSpace stores the MCMC chain associated with a clustering or enhancement
on disk in an HDF5 file. The mcmcChain()
function reads any parameters
specified by the user into a coda::mcmc
object compatible with
TidyBayes.
mcmcChain(sce, params = NULL) removeChain(sce)
mcmcChain(sce, params = NULL) removeChain(sce)
sce |
SingleCellExperiment with a file path stored in its metadata. |
params |
List of model parameters to read |
To interact with the HDF5 file directly, obtain the filename from the
SingleCellExperiment's metadata: metadata(sce)$chain.h5
. Each
parameter is stored as a separate dataset in the file, and is represented as
a matrix of size (n_iterations x n_parameter_indices). Parameter choices
for the spot-level clustering include:
z
(cluster assignments)
weights
()
mu
(mean vectors)
lambda
(precision matrix)
plogLik
(pseudo-log-likelihood)
Parameter choices for the subspot-level enhanced clustering include:
z
(cluster assignments)
weights
()
Y
(enhanced PCs)
mu
(mean vectors)
lambda
(precision matrix)
Ychange
(acceptance rate for the jittering of PCs)
For best results, Ychange
should average between 0.25 and 0.40.
Returns an mcmc
object containing the values of the requested
parameters over the constructed chain.
set.seed(149) sce <- exampleSCE() sce <- spatialCluster(sce, 7, nrep=100, burn.in=10, save.chain=TRUE) chain <- mcmcChain(sce) removeChain(sce)
set.seed(149) sce <- exampleSCE() sce <- spatialCluster(sce, 7, nrep=100, burn.in=10, save.chain=TRUE) chain <- mcmcChain(sce) removeChain(sce)
A convenient wrapper function of BiocParallel
providing easy
parallelization.
paraLapply( X, FUN, BPPARAM = NULL, cores = 1L, type = c("serial", "fork", "sock"), verbose = FALSE, ... )
paraLapply( X, FUN, BPPARAM = NULL, cores = 1L, type = c("serial", "fork", "sock"), verbose = FALSE, ... )
X |
Any object for which methods |
FUN |
The |
BPPARAM |
An optional |
cores |
The number of threads to use. The results are invariate to the
value of |
type |
One of "serial", "fork", or "sock". When |
verbose |
Whether to print debug information or not. |
... |
Additional parameters passed to |
See lapply
.
Before running spatialCluster()
, we recommend tuning the choice of
q
by choosing the q
that minimizes the model's negative log
likelihood over early iterations. qTune()
computes the average
negative log likelihood for a range of q values over iterations 100:1000, and
qPlot()
displays the results.
qPlot(sce, qs = seq(3, 7), force.retune = FALSE, ...) qTune(sce, qs = seq(3, 7), burn.in = 100, nrep = 1000, cores = 1L, ...)
qPlot(sce, qs = seq(3, 7), force.retune = FALSE, ...) qTune(sce, qs = seq(3, 7), burn.in = 100, nrep = 1000, cores = 1L, ...)
sce |
A SingleCellExperiment object containing the spatial data. |
qs |
The values of q to evaluate. |
force.retune |
If specified, existing tuning values in |
... |
Other parameters are passed to |
burn.in , nrep
|
Integers specifying the range of repetitions to compute. |
cores |
The number of threads to use. The results are invariate to the
value of |
qTune()
takes the same parameters as spatialCluster()
and will
run the MCMC clustering algorithm up to nrep
iterations for each
value of q
. The first burn.in
iterations are discarded as
burn-in and the log likelihood is averaged over the remaining iterations.
qPlot()
plots the computed negative log likelihoods as a function of
q. If qTune()
was run previously, i.e. there exists an attribute of
sce
named "q.logliks"
, the pre-computed results are
displayed. Otherwise, or if force.retune
is specified,
qplot()
will automatically run qTune()
before plotting (and
can take the same parameters as spatialCluster()
.
qTune()
returns a modified sce
with tuning log
likelihoods stored as an attribute named "q.logliks"
.
qPlot()
returns a ggplot object.
set.seed(149) sce <- exampleSCE() sce <- qTune(sce, seq(3, 7), burn.in = 10, nrep = 100) qPlot(sce)
set.seed(149) sce <- exampleSCE() sce <- qTune(sce, seq(3, 7), burn.in = 10, nrep = 100) qPlot(sce)
Load a Visium spatial dataset as a SingleCellExperiment.
readVisium( dirname, rm.feats.pat = c("^NegControl.*", "^BLANK.*", "^DEPRECATED.*") ) read10Xh5( dirname, fname = "filtered_feature_bc_matrix.h5", rm.feats.pat = c("^NegControl.*", "^BLANK.*", "^DEPRECATED.*") ) counts2h5(dirname)
readVisium( dirname, rm.feats.pat = c("^NegControl.*", "^BLANK.*", "^DEPRECATED.*") ) read10Xh5( dirname, fname = "filtered_feature_bc_matrix.h5", rm.feats.pat = c("^NegControl.*", "^BLANK.*", "^DEPRECATED.*") ) counts2h5(dirname)
dirname |
Path to spaceranger output directory (e.g. "sampleID/outs/").
This directory must contain the counts matrix and feature/barcode TSVs in
|
rm.feats.pat |
Patterns for features (genes) to remove. |
fname |
File name of the h5 file. It should be inside |
We store two variables associated with downstream BayesSpace
functions in a list called BayesSpace.data
in the
SingleCellExperiment's metadata
.
platform
is set to "Visium", and is used to determine spot
layout and neighborhood structure.
is.enhanced
is set to FALSE
to denote the object
contains spot-level data.
SingleCellExperiment containing the counts matrix in counts
and spatial data in colData
. Array coordinates for each spot are
stored in columns array_row
and array_col
, while image
coordinates are stored in columns pxl_row_in_fullres
and
pxl_col_in_fullres
.
## Not run: sce <- readVisium("path/to/outs/") ## End(Not run)
## Not run: sce <- readVisium("path/to/outs/") ## End(Not run)
Cluster a spatial expression dataset.
spatialCluster( sce, q, use.dimred = "PCA", d = 15, platform = c("Visium", "VisiumHD", "ST"), init = NULL, init.method = c("mclust", "kmeans"), model = c("t", "normal"), precision = c("equal", "variable"), nrep = 50000, burn.in = 1000, thin = 100, gamma = NULL, mu0 = NULL, lambda0 = NULL, alpha = 1, beta = 0.01, save.chain = FALSE, chain.fname = NULL )
spatialCluster( sce, q, use.dimred = "PCA", d = 15, platform = c("Visium", "VisiumHD", "ST"), init = NULL, init.method = c("mclust", "kmeans"), model = c("t", "normal"), precision = c("equal", "variable"), nrep = 50000, burn.in = 1000, thin = 100, gamma = NULL, mu0 = NULL, lambda0 = NULL, alpha = 1, beta = 0.01, save.chain = FALSE, chain.fname = NULL )
sce |
A SingleCellExperiment object containing the spatial data. |
q |
The number of clusters. |
use.dimred |
Name of a reduced dimensionality result in
|
d |
Number of top principal components to use when clustering. |
platform |
Spatial transcriptomic platform. Specify 'Visium' for hex
lattice geometry or 'ST' and 'VisiumHD' for square lattice geometry.
Specifying this parameter is optional when analyzing SingleCellExperiments
processed using |
init |
Initial cluster assignments for spots. |
init.method |
If |
model |
Error model. ('normal' or 't') |
precision |
Covariance structure. ('equal' or 'variable' for EEE and VVV covariance models, respectively.) |
nrep |
The number of MCMC iterations. |
burn.in |
The number of MCMC iterations to exclude as burn-in period. |
thin |
Thinning rate. |
gamma |
Smoothing parameter. Defaults to 2 for |
mu0 |
Prior mean hyperparameter for mu. If not provided, mu0 is set to the mean of PCs over all spots. |
lambda0 |
Prior precision hyperparam for mu. If not provided, lambda0
is set to a diagonal matrix |
alpha |
Hyperparameter for Wishart distributed precision lambda. |
beta |
Hyperparameter for Wishart distributed precision lambda. |
save.chain |
If true, save the MCMC chain to an HDF5 file. |
chain.fname |
File path for saved chain. Tempfile used if not provided. |
The input SCE must have row
and col
columns in its
colData
, corresponding to the array row and column coordinates of each
spot. These are automatically parsed by readVisium
or can be
added manually when creating the SCE.
Cluster labels are stored in the spatial.cluster
column of the SCE,
and the cluster initialization is stored in cluster.init
.
Returns a modified sce
with cluster assignments stored in
colData
under the name spatial.cluster
.
spatialPreprocess
for preparing the SCE for
clustering, spatialEnhance
for enhancing the clustering
resolution, clusterPlot
for visualizing the cluster
assignments, featurePlot
for visualizing expression levels
in spatial context, and mcmcChain
for examining the full
MCMC chain associated with the clustering.
set.seed(149) sce <- exampleSCE() sce <- spatialCluster(sce, 7, nrep = 100, burn.in = 10)
set.seed(149) sce <- exampleSCE() sce <- spatialCluster(sce, 7, nrep = 100, burn.in = 10)
Enhanced clustering of a spatial expression dataset to subspot resolution.
spatialEnhance( sce, q, platform = c("Visium", "VisiumHD", "ST"), use.dimred = "PCA", d = 15, nsubspots.per.edge = 3, init = NULL, init.method = c("spatialCluster", "mclust", "kmeans"), model = c("t", "normal"), nrep = 1e+05, gamma = NULL, mu0 = NULL, lambda0 = NULL, alpha = 1, beta = 0.01, save.chain = FALSE, chain.fname = NULL, burn.in = 10000, thin = 100, jitter.scale = 5, jitter.prior = 0.3, adapt.before = burn.in, cores = 1, verbose = FALSE ) coreTune(sce, test.cores = detectCores(), test.times = 1, ...) adjustClusterLabels(sce, burn.in)
spatialEnhance( sce, q, platform = c("Visium", "VisiumHD", "ST"), use.dimred = "PCA", d = 15, nsubspots.per.edge = 3, init = NULL, init.method = c("spatialCluster", "mclust", "kmeans"), model = c("t", "normal"), nrep = 1e+05, gamma = NULL, mu0 = NULL, lambda0 = NULL, alpha = 1, beta = 0.01, save.chain = FALSE, chain.fname = NULL, burn.in = 10000, thin = 100, jitter.scale = 5, jitter.prior = 0.3, adapt.before = burn.in, cores = 1, verbose = FALSE ) coreTune(sce, test.cores = detectCores(), test.times = 1, ...) adjustClusterLabels(sce, burn.in)
sce |
A SingleCellExperiment object containing the spatial data. |
q |
The number of clusters. |
platform |
Spatial transcriptomic platform. Specify 'Visium' for hex
lattice geometry or 'ST' and 'VisiumHD' for square lattice geometry.
Specifying this parameter is optional when analyzing SingleCellExperiments
processed using |
use.dimred |
Name of a reduced dimensionality result in
|
d |
Number of top principal components to use when clustering. |
nsubspots.per.edge |
Number of subspots per edge of the square. Only
valid when |
init |
Initial cluster assignments for spots. |
init.method |
If |
model |
Error model. ('normal' or 't') |
nrep |
The number of MCMC iterations. |
gamma |
Smoothing parameter. (Values in range of 1-3 seem to work well.) |
mu0 |
Prior mean hyperparameter for mu. If not provided, mu0 is set to the mean of PCs over all spots. |
lambda0 |
Prior precision hyperparam for mu. If not provided, lambda0
is set to a diagonal matrix |
alpha |
Hyperparameter for Wishart distributed precision lambda. |
beta |
Hyperparameter for Wishart distributed precision lambda. |
save.chain |
If true, save the MCMC chain to an HDF5 file. |
chain.fname |
File path for saved chain. Tempfile used if not provided. |
burn.in |
Number of iterations to exclude as burn-in period. The MCMC
iterations are currently thinned to every 100; accordingly |
thin |
Thinning rate. |
jitter.scale |
Controls the amount of jittering. Small amounts of
jittering are more likely to be accepted but result in exploring the space
more slowly. We suggest tuning |
jitter.prior |
Scale factor for the prior variance, parameterized as the
proportion (default = 0.3) of the mean variance of the PCs.
We suggest making |
adapt.before |
Adapting the MCMC chain before the specified number
or proportion of iterations (by default equal to |
cores |
The number of threads to use. The results are invariate to the
value of |
verbose |
Log progress to stderr. |
test.cores |
Either a list of, or a maximum number of cores to test. In the latter case, a list of values (power of 2) will be created |
test.times |
Times to repeat the benchmarking with microbenchmark. |
... |
Arguments for |
The enhanced SingleCellExperiment
has most of the properties of the
input SCE - rowData
, colData
, reducedDims
- but does
not include expression data in counts
or logcounts
. To impute
enhanced expression vectors, please use [enhanceFeatures()] after
running spatialEnhance
.
The colData
of the enhanced SingleCellExperiment
includes the
following columns to permit referencing the subspots in spatial context and
linking back to the original spots:
spot.idx
: Index of the spot this subspot belongs to (with
respect to the input SCE).
subspot.idx
: Index of the subspot within its parent spot.
spot.row
: Array row of the subspot's parent spot.
spot.col
: Array col of the subspot's parent spot.
array_row
: Array row of the subspot. This is the parent spot's row
plus an offset based on the subspot's position within the spot.
array_col
: Array col of the subspot. This is the parent spot's col
plus an offset based on the subspot's position within the spot.
pxl_row_in_fullres
: Pixel row of the subspot. This is the parent spot's
row plus an offset based on the subspot's position within the spot.
pxl_col_in_fullres
: Pixel col of the subspot. This is the parent spot's
col plus an offset based on the subspot's position within the spot.
spatialEnhance
returns a new SingleCellExperiment object.
By default, the assays
of this object are empty, and the enhanced
resolution PCs are stored as a reduced dimensionality result accessible
with reducedDim(sce, 'PCA')
.
coresTune
returns the output of microbenchmark
.
adjustClusterLabels
adjusts the cluster labels from the MCMC samples
via burn.in
, the percentage of samples to drop. The MCMC chain
must be retained.
spatialCluster
for clustering at the spot level
before enhancing, clusterPlot
for visualizing the cluster
assignments, enhanceFeatures
for imputing enhanced
expression, and mcmcChain
for examining the full MCMC chain
associated with the enhanced clustering.
.
set.seed(149) sce <- exampleSCE() sce <- spatialCluster(sce, 7, nrep = 100, burn.in = 10) enhanced <- spatialEnhance(sce, 7, nrep = 100, burn.in = 10)
set.seed(149) sce <- exampleSCE() sce <- spatialCluster(sce, 7, nrep = 100, burn.in = 10) enhanced <- spatialEnhance(sce, 7, nrep = 100, burn.in = 10)
Adds metadata required for downstream analyses, and (optionally) performs PCA on log-normalized expression of top HVGs.
spatialPreprocess( sce, platform = c("Visium", "VisiumHD", "ST"), n.PCs = 15, n.HVGs = 2000, skip.PCA = FALSE, log.normalize = TRUE, assay.type = "logcounts", BSPARAM = ExactParam(), BPPARAM = SerialParam() )
spatialPreprocess( sce, platform = c("Visium", "VisiumHD", "ST"), n.PCs = 15, n.HVGs = 2000, skip.PCA = FALSE, log.normalize = TRUE, assay.type = "logcounts", BSPARAM = ExactParam(), BPPARAM = SerialParam() )
sce |
SingleCellExperiment to preprocess |
platform |
Spatial sequencing platform. Used to determine spot layout and neighborhood structure (Visium = hex, VisiumHD = square, ST = square). |
n.PCs |
Number of principal components to compute. We suggest using the top 15 PCs in most cases. |
n.HVGs |
Number of highly variable genes to run PCA upon. |
skip.PCA |
Skip PCA (if dimensionality reduction was previously computed.) |
log.normalize |
Whether to log-normalize the input data with scater. May be omitted if log-normalization previously computed. |
assay.type |
Name of assay in |
BSPARAM |
A BiocSingularParam object specifying which
algorithm should be used to perform the PCA. By default, an exact PCA is
performed, as current spatial datasets are generally small (<10,000 spots).
To perform a faster approximate PCA, please specify
|
BPPARAM |
A BiocParallelParam object specifying whether
to model the gene variation in parallel or not
(default to |
SingleCellExperiment with PCA and BayesSpace metadata
sce <- exampleSCE() sce <- spatialPreprocess(sce)
sce <- exampleSCE() sce <- spatialPreprocess(sce)