smoothclust
is a method for segmentation of spatial
domains and spatially-aware clustering in spatial transcriptomics data.
The method generates spatial domains with smooth boundaries by smoothing
gene expression profiles across neighboring spatial locations, followed
by unsupervised clustering. Spatial domains consisting of consistent
mixtures of cell types may then be further investigated by applying cell
type compositional analyses or differential analyses.
The smoothclust
package can be installed from
Bioconductor as follows (using R version 4.4 onwards). This is the
recommended installation for most users. Additional details are shown on
the Bioconductor
package landing page.
The latest development version of the package can also be installed from the devel version of Bioconductor or from GitHub.
Input data can be provided either as a SpatialExperiment
object within the Bioconductor framework, or as numeric matrices of
expression values and spatial coordinates. See help file
(?smoothclust
) for details.
In the example workflow below, we assume the input is in
SpatialExperiment
format.
The example workflow in this section demonstrates how to run
smoothclust
to generate spatial domains with smooth
boundaries for a dataset from the 10x Genomics Visium platform.
library(smoothclust)
library(STexampleData)
library(scuttle)
library(scran)
library(scater)
library(ggspavis)
Load dataset from STexampleData
package:
## see ?STexampleData and browseVignettes('STexampleData') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## [1] 33538 3639
## [1] "counts"
Run smoothclust
using default parameter settings, which
have been selected to be appropriate for Visium data from human
tissue.
The method for smoothing can be specified by providing the
method
argument, with available options
uniform
, kernel
, and knn
.
Additional arguments can be used to set parameter values, including
bandwidth
, k,
and truncate
,
depending on the choice of method. For more details, see the function
documentation (?smoothclust
) or the paper (in
preparation).
The smoothed expression counts are stored in a new assay named
counts_smooth
:
## [1] "counts" "counts_smooth"
Calculate log-transformed normalized counts (logcounts) on the
smoothed expression counts. Here, the argument
assay.type = "counts_smooth"
specifies that we want to
calculate logcounts using the smoothed counts from
smoothclust
.
## [1] "counts" "counts_smooth" "logcounts"
Run clustering. We use a standard clustering workflow from single-cell data, consisting of k-means clustering on the top principal components (PCs) calculated on the set of top highly variable genes (HVGs) with logcounts as the input.
We use a relatively small number of clusters for demonstration purposes in this example:
# preprocessing steps for clustering
# remove mitochondrial genes
is_mito <- grepl("(^mt-)", rowData(spe)$gene_name, ignore.case = TRUE)
table(is_mito)
## is_mito
## FALSE TRUE
## 33525 13
## [1] 33525 3639
# select top highly variable genes (HVGs)
dec <- modelGeneVar(spe)
top_hvgs <- getTopHVGs(dec, prop = 0.1)
length(top_hvgs)
## [1] 986
## [1] 986 3639
# run k-means clustering
set.seed(123)
k <- 5
clust <- kmeans(reducedDim(spe, "PCA"), centers = k)$cluster
table(clust)
## clust
## 1 2 3 4 5
## 967 1574 492 272 334
Plot clusters / spatial domains generated by the smoothclust workflow:
# color palettes
pal8 <- "libd_layer_colors"
pal36 <- unname(palette.colors(36, "Polychrome 36"))
# plot clusters / spatial domains
plotSpots(spe, annotate = "label", pal = pal8)
Plot manually annotated reference labels, which can be used to evaluate the performance of the clustering in this dataset:
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggspavis_1.13.0 scater_1.35.1
## [3] ggplot2_3.5.1 scran_1.35.0
## [5] scuttle_1.17.0 STexampleData_1.14.0
## [7] SpatialExperiment_1.17.0 SingleCellExperiment_1.29.1
## [9] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [11] GenomicRanges_1.59.1 GenomeInfoDb_1.43.4
## [13] IRanges_2.41.2 S4Vectors_0.45.2
## [15] MatrixGenerics_1.19.1 matrixStats_1.5.0
## [17] ExperimentHub_2.15.0 AnnotationHub_3.15.0
## [19] BiocFileCache_2.15.1 dbplyr_2.5.0
## [21] BiocGenerics_0.53.6 generics_0.1.3
## [23] smoothclust_1.3.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 sys_3.4.3 jsonlite_1.8.9
## [4] wk_0.9.4 magrittr_2.0.3 ggbeeswarm_0.7.2
## [7] magick_2.8.5 farver_2.1.2 rmarkdown_2.29
## [10] vctrs_0.6.5 spdep_1.3-10 memoise_2.0.1
## [13] htmltools_0.5.8.1 S4Arrays_1.7.1 curl_6.2.0
## [16] BiocNeighbors_2.1.2 s2_1.1.7 SparseArray_1.7.4
## [19] sass_0.4.9 spData_2.3.4 KernSmooth_2.23-26
## [22] bslib_0.9.0 cachem_1.1.0 buildtools_1.0.0
## [25] igraph_2.1.4 mime_0.12 ggside_0.3.1
## [28] lifecycle_1.0.4 pkgconfig_2.0.3 rsvd_1.0.5
## [31] Matrix_1.7-2 R6_2.5.1 fastmap_1.2.0
## [34] GenomeInfoDbData_1.2.13 digest_0.6.37 colorspace_2.1-1
## [37] AnnotationDbi_1.69.0 dqrng_0.4.1 irlba_2.3.5.1
## [40] RSQLite_2.3.9 beachmat_2.23.6 labeling_0.4.3
## [43] filelock_1.0.3 httr_1.4.7 abind_1.4-8
## [46] compiler_4.4.2 proxy_0.4-27 bit64_4.6.0-1
## [49] withr_3.0.2 BiocParallel_1.41.0 viridis_0.6.5
## [52] DBI_1.2.3 rappdirs_0.3.3 DelayedArray_0.33.4
## [55] rjson_0.2.23 classInt_0.4-11 bluster_1.17.0
## [58] tools_4.4.2 units_0.8-5 vipor_0.4.7
## [61] beeswarm_0.4.0 glue_1.8.0 grid_4.4.2
## [64] sf_1.0-19 cluster_2.1.8 gtable_0.3.6
## [67] class_7.3-23 BiocSingular_1.23.0 ScaledMatrix_1.15.0
## [70] metapod_1.15.0 sp_2.2-0 XVector_0.47.2
## [73] ggrepel_0.9.6 BiocVersion_3.21.1 pillar_1.10.1
## [76] limma_3.63.3 dplyr_1.1.4 lattice_0.22-6
## [79] bit_4.5.0.1 deldir_2.0-4 tidyselect_1.2.1
## [82] locfit_1.5-9.10 maketools_1.3.1 Biostrings_2.75.3
## [85] knitr_1.49 gridExtra_2.3 edgeR_4.5.2
## [88] xfun_0.50 statmod_1.5.0 UCSC.utils_1.3.1
## [91] yaml_2.3.10 boot_1.3-31 evaluate_1.0.3
## [94] codetools_0.2-20 tibble_3.2.1 BiocManager_1.30.25
## [97] cli_3.6.3 munsell_0.5.1 jquerylib_0.1.4
## [100] Rcpp_1.0.14 png_0.1-8 parallel_4.4.2
## [103] blob_1.2.4 sparseMatrixStats_1.19.0 viridisLite_0.4.2
## [106] scales_1.3.0 e1071_1.7-16 purrr_1.0.2
## [109] crayon_1.5.3 rlang_1.1.5 KEGGREST_1.47.0