Title: | Spatial transcriptomic clustering |
---|---|
Description: | Banksy is an R package that incorporates spatial information to cluster cells in a feature space (e.g. gene expression). To incorporate spatial information, BANKSY computes the mean neighborhood expression and azimuthal Gabor filters that capture gene expression gradients. These features are combined with the cell's own expression to embed cells in a neighbor-augmented product space which can then be clustered, allowing for accurate and spatially-aware cell typing and tissue domain segmentation. |
Authors: | Vipul Singhal [aut], Joseph Lee [aut, cre] |
Maintainer: | Joseph Lee <[email protected]> |
License: | file LICENSE |
Version: | 1.3.0 |
Built: | 2024-10-30 03:42:15 UTC |
Source: | https://github.com/bioc/Banksy |
Perform clustering in BANKSY's neighborhood-augmented feature space.
clusterBanksy( se, use_agf = FALSE, lambda = 0.2, use_pcs = TRUE, npcs = 20L, dimred = NULL, ndims = NULL, assay_name = NULL, group = NULL, algo = c("leiden", "louvain", "kmeans", "mclust"), k_neighbors = 50, resolution = 1, leiden.iter = -1, kmeans.centers = 5, mclust.G = 5, M = NULL, seed = NULL, ... )
clusterBanksy( se, use_agf = FALSE, lambda = 0.2, use_pcs = TRUE, npcs = 20L, dimred = NULL, ndims = NULL, assay_name = NULL, group = NULL, algo = c("leiden", "louvain", "kmeans", "mclust"), k_neighbors = 50, resolution = 1, leiden.iter = -1, kmeans.centers = 5, mclust.G = 5, M = NULL, seed = NULL, ... )
se |
A |
use_agf |
A logical vector specifying whether to use the AGF for clustering. |
lambda |
A numeric vector in |
use_pcs |
A logical scalar specifying whether to cluster on PCs. If FALSE, runs on the BANKSY matrix. |
npcs |
An integer scalar specifying the number of principal components
to use if |
dimred |
A string scalar specifying the name of an existing
dimensionality reduction result to use. Will overwrite |
ndims |
An integer scalar specifying the number of dimensions to use if
|
assay_name |
A string scalar specifying the name of the assay used in
|
group |
A string scalar specifying a grouping variable for samples in
|
algo |
A string scalar specifying the clustering algorithm to use; one of leiden, louvain, mclust, kmeans. |
k_neighbors |
An integer vector specifying number of neighbors for constructing sNN (for louvain / leiden). |
resolution |
A numeric vector specifying resolution used for clustering (louvain / leiden). |
leiden.iter |
An integer scalar specifying the number of leiden iterations. For running till convergence, set to -1 (leiden). |
kmeans.centers |
An integer vector specifying the number of kmeans clusters (kmeans). |
mclust.G |
An integer vector specifying the number of mixture components (Mclust). |
M |
Advanced usage. An integer vector specifying the highest azimuthal
Fourier harmonic to cluster with. If specified, overwrites the
|
seed |
Random seed for clustering. If not specified, no seed is set. |
... |
to pass to methods |
This function performs clustering on the principal components computed on
the BANKSY matrix, i.e., the BANKSY embedding. The PCA corresponding to the
parameters use_agf
and lambda
must have been computed with
runBanksyPCA. Clustering may also be performed directly on the
BANKSY matrix with use_pcs
set to FALSE
(this is not
recommended).
Four clustering algorithms are implemented.
leiden: Leiden graph-based clustering. The arguments
k_neighbors
and resolution
should be specified.
louvain: Louvain graph-based clustering. The arguments
k_neighbors
and resolution
should be specified.
kmeans: kmeans clustering. The argument kmeans.centers
should
be specified.
mclust: Gaussian mixture model-based clustering. The argument
mclust.G
should be specified.
By default, no seed is set for clustering. If a seed is specified, the same seed is used for clustering across the input parameters.
A SpatialExperiment / SingleCellExperiment / SummarizedExperiment
object with cluster labels in colData(se)
.
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = c(0, 0.2), npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = c(0, 0.2), resolution = 1)
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = c(0, 0.2), npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = c(0, 0.2), resolution = 1)
Get names of clustering runs.
clusterNames(se)
clusterNames(se)
se |
A |
A character vector of names of clustering runs.
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = c(0, 0.2), npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = c(0, 0.2), resolution = 1) clusterNames(spe)
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = c(0, 0.2), npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = c(0, 0.2), resolution = 1) clusterNames(spe)
Compare cluster outputs based on various clustering comparison measures.
compareClusters( se, func = c("ARI", "AMI", "MARI", "MARIraw", "RI", "NID", "NMI", "NVI"), digits = 3 )
compareClusters( se, func = c("ARI", "AMI", "MARI", "MARIraw", "RI", "NID", "NMI", "NVI"), digits = 3 )
se |
A |
func |
A string scalar specifying what clustering comparison measure to
compute. See |
digits |
An integer scalar specifying the number of digits to round to. |
A matrix of cluster comparison measures.
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = 0.2, npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = 0.2, resolution = c(0.1, 1)) spe <- connectClusters(spe) compareClusters(spe)
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = 0.2, npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = 0.2, resolution = c(0.1, 1)) spe <- connectClusters(spe) compareClusters(spe)
Compute the component neighborhood matrices for the BANKSY matrix.
computeBanksy( se, assay_name, coord_names = NULL, compute_agf = FALSE, k_geom = 15, spatial_mode = c("kNN_median", "kNN_r", "kNN_rn", "kNN_rank", "kNN_unif", "rNN_gauss"), n = 2, sigma = 1.5, alpha = 0.05, k_spatial = 100L, M = NULL, sample_size = NULL, sample_renorm = TRUE, seed = NULL, dimensions = "all", center = TRUE, verbose = TRUE )
computeBanksy( se, assay_name, coord_names = NULL, compute_agf = FALSE, k_geom = 15, spatial_mode = c("kNN_median", "kNN_r", "kNN_rn", "kNN_rank", "kNN_unif", "rNN_gauss"), n = 2, sigma = 1.5, alpha = 0.05, k_spatial = 100L, M = NULL, sample_size = NULL, sample_renorm = TRUE, seed = NULL, dimensions = "all", center = TRUE, verbose = TRUE )
se |
A |
assay_name |
A string scalar specifying the name of the assay to use. |
coord_names |
A string vector specifying the names in |
compute_agf |
A logical scalar specifying whether to compute the AGF. |
k_geom |
An integer scalar specifying the number of neighbors to use.
Values |
spatial_mode |
A string scalar specifying the kernel for neighborhood computation (default: kNN_median).
|
n |
A numeric scalar specifying the exponent of radius (for kNN_rn). |
sigma |
A numeric scalar specifying the std. dev. of Gaussian kernel (for rNN_gauss). |
alpha |
A numeric scalar specifying the radius used: larger alphas give smaller radii (for rNN_gauss). |
k_spatial |
An integer scalar specifying the initial number of neighbors to use (for rNN_gauss) |
M |
Advanced usage. A integer scalar specifying the highest azimuthal
Fourier harmonic to compute. If specified, overwrites the |
sample_size |
An integer scalar number of neighbors to sample from the neighborhood. |
sample_renorm |
A logical scalar specifying whether to renormalize the neighbor weights to 1. |
seed |
An integer scalar specifying seed for sampling the neighborhood. |
dimensions |
A character vector specifying the dimensions to use when computing neighborhood.
|
center |
A logical scalar specifying whether to center higher order harmonics in local neighborhoods. |
verbose |
A logical scalar specifying verbosity. |
Given an expression matrix (as specified by assay_name
), this function
computes the mean neighborhood matrix (H0
) and optionally, the
azimuthal Gabor filter (AGF) matrix (H1
). The number of neighbors
used to define the spatial neighborhood is given by k_geom
.
Different kernels may be used to compute the neighborhood features,
specified by spatial_mode
.
A SpatialExperiment / SingleCellExperiment / SummarizedExperiment object with neighborhood matrices added.
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30))
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30))
Relabel cluster labels across parameter runs to maximise their similarity.
connectClusters(se, map_to = NULL, verbose = TRUE)
connectClusters(se, map_to = NULL, verbose = TRUE)
se |
A |
map_to |
A string scalar specify a cluster to map to. |
verbose |
A logical scalar specifying verbosity. |
A SpatialExperiment / SingleCellExperiment / SummarizedExperiment
object with 'connected' cluster labels in colData(se)
.
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = c(0, 0.2), npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = c(0, 0.2), resolution = 1) spe <- connectClusters(spe)
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = c(0, 0.2), npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = c(0, 0.2), resolution = 1) spe <- connectClusters(spe)
Builds the BANKSY matrix from neighborhood matrices.
getBanksyMatrix( se, M, lambda, assay_name = NULL, scale = FALSE, group = NULL, verbose = TRUE )
getBanksyMatrix( se, M, lambda, assay_name = NULL, scale = FALSE, group = NULL, verbose = TRUE )
se |
A |
M |
A integer scalar specifying the highest azimuthal Fourier harmonic to compute. |
lambda |
A numeric vector in |
assay_name |
A string scalar specifying the name of the assay used in
|
scale |
A logical scalar specifying whether to scale the features to zero mean and unit standard deviation. This is performed before multiplying the assays by their corresponding lambda weighting factors. |
group |
A string scalar specifying a grouping variable for samples in
|
verbose |
A logical scalar specifying verbosity. |
After computation of the neighborhood matrices
(see computeBanksy), this function builds the BANKSY matrix by
concatenating the original expression matrix with the neighborhood matrices,
and scales each matrix by an appropriate weight as determined by
lambda
. The weights of the own expression matrix, mean neighborhood
matrix and azimuthal Gabor filter are given by ,
and
respectively, where
. In the case where the AGF is not computed, the weights for
the own and mean neighborhood expression matrix simplify to
and
respectively.
BANKSY matrix.
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) banksyMatrix <- getBanksyMatrix(spe, M = 1, lambda = 0.2)
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) banksyMatrix <- getBanksyMatrix(spe, M = 1, lambda = 0.2)
This dataset comprises VeraFISH profiling of cells in the mouse hippocampus. Gene expression and cell centroids for 10,944 cells and 129 genes in 2 spatial dimensions are provided. For details on how this dataset was generated, refer to Supplementary Information section 2.2 of our preprint.
data(hippocampus)
data(hippocampus)
A list with 2 entries:
(matrix) gene expression matrix
(data.frame) cell centroids in 2D
List with expression and locations
This dataset comprises gene expression and spatial coordinates for 50 genes
and 308 cells from 4 clusters (rings$clusters
). See system.file('scripts/rings.R', package='Banksy')
on how this dataset was generated.
data(rings)
data(rings)
A SpatialExperiment object.
A SpatialExperiment object
Run PCA on a BANKSY matrix.
runBanksyPCA( se, use_agf = FALSE, lambda = 0.2, npcs = 20L, assay_name = NULL, scale = TRUE, group = NULL, M = NULL, seed = NULL )
runBanksyPCA( se, use_agf = FALSE, lambda = 0.2, npcs = 20L, assay_name = NULL, scale = TRUE, group = NULL, M = NULL, seed = NULL )
se |
A |
use_agf |
A logical vector specifying whether to use the AGF for computing principal components. |
lambda |
A numeric vector in |
npcs |
An integer scalar specifying the number of principal components to compute. |
assay_name |
A string scalar specifying the name of the assay used in
|
scale |
A logical scalar specifying whether to scale features before PCA. Defaults to TRUE. |
group |
A string scalar specifying a grouping variable for samples in
|
M |
Advanced usage. An integer vector specifying the highest azimuthal
Fourier harmonic to use. If specified, overwrites the |
seed |
Seed for PCA. If not specified, no seed is set. |
This function runs PCA on the BANKSY matrix (see getBanksyMatrix) with features scaled to zero mean and unit standard deviation.
A SpatialExperiment / SingleCellExperiment / SummarizedExperiment
object with PC coordinates in reducedDims(se)
.
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = 0.2, npcs = 20)
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = 0.2, npcs = 20)
Run UMAP on a BANKSY embedding.
runBanksyUMAP( se, use_agf = FALSE, lambda = 0.2, use_pcs = TRUE, npcs = 20L, dimred = NULL, ndims = NULL, assay_name = NULL, scale = TRUE, group = NULL, n_neighbors = 30L, spread = 3, min_dist = 0.1, n_epochs = 300L, M = NULL, seed = NULL, ... )
runBanksyUMAP( se, use_agf = FALSE, lambda = 0.2, use_pcs = TRUE, npcs = 20L, dimred = NULL, ndims = NULL, assay_name = NULL, scale = TRUE, group = NULL, n_neighbors = 30L, spread = 3, min_dist = 0.1, n_epochs = 300L, M = NULL, seed = NULL, ... )
se |
A |
use_agf |
A logical vector specifying whether to use the AGF for computing UMAP. |
lambda |
A numeric vector in |
use_pcs |
A logical scalar specifying whether to run UMAP on PCs. If FALSE, runs on the BANKSY matrix. |
npcs |
An integer scalar specifying the number of principal components
to use if |
dimred |
A string scalar specifying the name of an existing
dimensionality reduction result to use. Will overwrite |
ndims |
An integer scalar specifying the number of dimensions to use if
|
assay_name |
A string scalar specifying the name of the assay used in
|
scale |
A logical scalar specifying whether to scale features before UMAP. Only used when use_pcs is FALSE. Defaults to TRUE. |
group |
A string scalar specifying a grouping variable for samples in
|
n_neighbors |
An integer scalar specifying the number of neighbors to use for UMAP. |
spread |
A numeric scalar specifying the effective scale of embedded points. |
min_dist |
A numeric scalar specifying the effective min. dist. between embedded points. |
n_epochs |
An integer scalar specifying the number of epochs to run UMAP optimization. |
M |
Advanced usage. An integer vector specifying the highest azimuthal
Fourier harmonic to use. If specified, overwrites the |
seed |
Seed for UMAP. If not specified, no seed is set. |
... |
parameters to pass to uwot::umap |
This function runs UMAP on the principal components computed on the BANKSY matrix.
A SpatialExperiment / SingleCellExperiment / SummarizedExperiment
object with UMAP coordinates in reducedDims(se)
.
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = 0.2, npcs = 20) spe <- runBanksyUMAP(spe, M = 1, lambda = 0.2)
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = 0.2, npcs = 20) spe <- runBanksyUMAP(spe, M = 1, lambda = 0.2)
Simulate an unrealistic spatial omics dataset.
simulateDataset(n_cells = 300, n_genes = 30, n_rings = 3, rate = 10)
simulateDataset(n_cells = 300, n_genes = 30, n_rings = 3, rate = 10)
n_cells |
An integer scalar specifying the approximate number of cells. |
n_genes |
An integer scalar specifying the number of genes. |
n_rings |
An integer scalar specifying the number of spatial rings. |
rate |
A numeric scalar specifying the Poisson rate parameter for simulating counts. |
This function generates an unrealistic spatial omics dataset based on a
user-specified number of cells and genes. The number of clusters is defined
by n_rings
, while counts follow a Poisson distribution with a
user-specified rate rate
. The simulation is set up such that the
number of cells in each cluster is uniformly distributed; as such, the final
number of cells is approximately equal to the user-specified number of cells.
A SpatialExperiment object.
set.seed(2023) rings <- simulateDataset(n_cells = 5e3, n_genes = 50, n_rings = 8) rings table(rings$cluster) df <- cbind.data.frame( SummarizedExperiment::colData(rings), SpatialExperiment::spatialCoords(rings)) library(ggplot2) ggplot(df, aes(x=x, y=y, col=cluster)) + geom_point() + theme_classic()
set.seed(2023) rings <- simulateDataset(n_cells = 5e3, n_genes = 50, n_rings = 8) rings table(rings$cluster) df <- cbind.data.frame( SummarizedExperiment::colData(rings), SpatialExperiment::spatialCoords(rings)) library(ggplot2) ggplot(df, aes(x=x, y=y, col=cluster)) + geom_point() + theme_classic()
k-Nearest neighbor cluster label smoothing.
smoothLabels( se, cluster_names = NULL, coord_names = NULL, k = 15L, prop_thres = 0.5, max_iter = 10, verbose = TRUE )
smoothLabels( se, cluster_names = NULL, coord_names = NULL, k = 15L, prop_thres = 0.5, max_iter = 10, verbose = TRUE )
se |
A |
cluster_names |
A string vector of label names to smooth. If NULL, smooths labels in colData(se) matching /^clust/ |
coord_names |
A string vector specifying the names in |
k |
An integer scalar specifying number of neighbors for smooething. |
prop_thres |
A numeric scalar |
max_iter |
An integer scalar specifying the max number of smoothing iterations. Set to -1 for smoothing to convergence. |
verbose |
A logical scalar specifying verbosity. |
As described in SpiceMix (https://doi.org/10.1038/s41588-022-01256-z). Implemented for labels that can be coerced to numeric only.
A SpatialExperiment / SingleCellExperiment / SummarizedExperiment
object with smoothed cluster labels in colData(se)
suffixed with
'_smooth'.
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = 0.2, npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = 0.2, resolution = 1) spe <- smoothLabels(spe, cluster_names = "clust_M1_lam0.2_k50_res1")
data(rings) spe <- computeBanksy(rings, assay_name = "counts", M = 1, k_geom = c(15, 30)) spe <- runBanksyPCA(spe, M = 1, lambda = 0.2, npcs = 20) spe <- clusterBanksy(spe, M = 1, lambda = 0.2, resolution = 1) spe <- smoothLabels(spe, cluster_names = "clust_M1_lam0.2_k50_res1")