The sosta package can be installed from Bioconductor as follows:
For this vignette, we will need several additional packages:
As an example, we will load an simulated dataset with three images and three cell types A, B and C, which are stored in a SpatialExperiment object:
# load the data
data("sostaSPE")
sostaSPE
#> class: SpatialExperiment
#> dim: 0 5641
#> metadata(0):
#> assays(0):
#> rownames: NULL
#> rowData names(0):
#> colnames: NULL
#> colData names(3): cellType imageName sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x y
#> imgData names(0):
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = cellType)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal()
The goal is to reconstruct, or segment, the structures given by cell type A.
In this example, we will reconstruct cellular “structures” based on the point pattern density of the cells of type A. We will start with estimating parameters that are used for reconstruction afterwards. For one image, this can be illustrated as follows:
shapeIntensityImage(
sostaSPE,
marks = "cellType",
imageCol = "imageName",
imageId = "image1",
markSelect = "A"
)
We see the pixel-wise density image on the left and a histogram of
the intensity values on the right. The estimated threshold is roughly
the mean between the two modes of the (truncated) pixel intensity
distribution. The dimensions of the pixel image are specified on the
bottom left. The dimensions correspond to the density image, setting a
higher value for the dim
parameter will result in a higher
resolution density image but will not change how many points are within
a pixel. This can be changed by varying the smoothing parameter
(bndw
).
This was done for one image. The function
estimateReconstructionParametersSPE
returns two plots with
the estimated parameters for each image. The parameter bndw
is the bandwidth parameter that is used for estimating the intensity
profile of the point pattern. The parameter thres
is the
estimated parameter for the density threshold for reconstruction.
n <- estimateReconstructionParametersSPE(
sostaSPE,
marks = "cellType",
imageCol = "imageName",
markSelect = "A",
plotHist = TRUE
)
We will use the mean of the two estimated vectors as our parameters.
We can now use the function reconstructShapeDensity
to
segment the cell-type-A structure into regions, the result of which is
sf polygon
(Pebesma 2018).
(struct <- reconstructShapeDensityImage(
sostaSPE,
marks = "cellType",
imageCol = "imageName",
imageId = "image1",
markSelect = "A",
bndw = bndwSPE,
dim = 500,
thres = thresSPE
))
#> Simple feature collection with 3 features and 0 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: 0.1061074 ymin: 0.1104626 xmax: 127.8393 ymax: 127.9601
#> CRS: NA
#> sostaPolygon
#> 1 POLYGON ((3.938104 113.6409...
#> 2 POLYGON ((96.41697 125.6588...
#> 3 POLYGON ((47.11194 127.4487...
We can then plot both the points and the segmented polygon.
cbind(
colData(sostaSPE[, sostaSPE$imageName == "image1"]),
spatialCoords(sostaSPE[, sostaSPE$imageName == "image1"])
) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = cellType)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
geom_sf(
data = struct,
fill = NA,
color = "darkblue",
inherit.aes = FALSE, # this is important
linewidth = 1
)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
If no arguments are given, the function
reconstructShapeDensityImage
estimates the reconstruction
parameters internally. The bandwidth is estimated using cross validation
using the function bw.diggle
of the spatstat.explore
package. The threshold is estimated by taking the mean between the two
modes of the pixel intensity distribution as illustrated above.
struct2 <- reconstructShapeDensityImage(
sostaSPE,
marks = "cellType",
imageCol = "imageName",
imageId = "image1",
markSelect = "A",
dim = 500
)
cbind(
colData(sostaSPE[, sostaSPE$imageName == "image1"]),
spatialCoords(sostaSPE[, sostaSPE$imageName == "image1"])
) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = cellType)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
geom_sf(
data = struct2,
fill = NA,
color = "darkblue",
inherit.aes = FALSE, # this is important
linewidth = 1
)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
The function reconstructShapeDensitySPE
reconstructs the
cell-type-A structure for all images in the spe
object. We
use the estimated parameters from above.
Next, we assign cells in the SingleCellExperiment
object
to their specific structure.
assign <- assingCellsToStructures(sostaSPE, allStructs,
imageCol = "imageName", nCores = 1
)
sostaSPE$structAssign <- assign
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = structAssign)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal()
Using the function cellTypeProportions
, we can estimate
the proportion of cell types witin each individual structure.
cellTypeProportions(sostaSPE, "structAssign", "cellType")
#> A B C
#> image1_2 0.8302326 0.08837209 0.08139535
#> image1_1 0.7996255 0.12172285 0.07865169
#> image1_3 0.8333333 0.12500000 0.04166667
#> image2_3 0.8770764 0.07475083 0.04817276
#> image2_2 0.8609626 0.09447415 0.04456328
#> image3_7 0.8088235 0.10160428 0.08957219
#> image3_2 0.6200000 0.28000000 0.10000000
#> image3_5 0.8421053 0.10526316 0.05263158
#> image3_6 0.7307692 0.19230769 0.07692308
#> image3_4 0.7894737 0.21052632 0.00000000
#> image3_1 0.8000000 0.20000000 0.00000000
#> image3_3 1.0000000 0.00000000 0.00000000
The function totalShapeMetrics
calculates a set of
metrics related to the shape of the structures
(shapeMetrics <- totalShapeMetrics(allStructs))
#> image1_1 image1_2 image1_3 image2_1 image2_2
#> Area 4250.2792228 3574.6477409 226.9959779 1.3752774 4295.1222139
#> Compactness 0.1346764 0.2519490 0.6080376 0.4581489 0.1953486
#> Eccentricity 0.7875707 0.4641678 0.8363778 0.9999739 0.5647329
#> Circularity 0.4723775 0.5970360 0.8846882 0.5943689 0.4862921
#> Solidity 0.5963449 0.8051348 0.9769469 0.8936170 0.6614292
#> Curl 0.6224990 0.3932054 0.2617308 0.3922730 0.4574191
#> fibreLength 300.7421880 192.5589303 25.2605053 2.5265850 245.3105698
#> fibreWidth 14.1326338 18.5639157 8.9862010 0.5443226 17.5089162
#> image2_3 image3_1 image3_2 image3_3 image3_4
#> Area 4746.6716180 51.8620571 417.5778496 12.6875650 145.71079859
#> Compactness 0.2150477 0.5718048 0.6331764 0.6342179 0.35221739
#> Eccentricity 0.4239889 0.5710073 0.7120634 0.9381917 0.27046967
#> Circularity 0.5075727 0.7502777 0.8892681 0.9053567 0.45173303
#> Solidity 0.7430950 0.9653074 0.9832153 0.9440389 0.96512887
#> Curl 0.4753033 0.1632771 0.1889006 0.2827801 0.09665593
#> fibreLength 243.8668860 12.8415084 32.7783740 5.7028999 31.41216792
#> fibreWidth 19.4641909 4.0386266 12.7394315 2.2247567 4.63867374
#> image3_5 image3_6 image3_7
#> Area 186.3240867 234.8507530 6621.3393593
#> Compactness 0.6428357 0.5221272 0.1752898
#> Eccentricity 0.8736600 0.7688595 0.7542569
#> Circularity 0.9288492 0.7443243 0.5683033
#> Solidity 0.9787015 0.9320010 0.6863814
#> Curl 0.2514767 0.2936578 0.6032739
#> fibreLength 21.5160903 29.6774666 324.0514214
#> fibreWidth 8.6597558 7.9134367 20.4329897
cbind(allStructs, t(shapeMetrics)) |>
ggplot() +
geom_sf(aes(fill = Area)) +
facet_wrap(~imageName)
Using the function minBoundaryDistances
, we can compute
the distance to the border structure. Negative values indicate that the
points lie inside the structure.
sostaSPE$minDist <- minBoundaryDistances(
sostaSPE, "imageName",
"structAssign", allStructs
)
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = minDist)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
scale_colour_gradient2() +
geom_sf(
data = allStructs,
fill = NA,
inherit.aes = FALSE
) +
facet_wrap(~imageName)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
This information can be used to define border cells by thresholding to a range of positive and negative values.
sostaSPE$border <- ifelse(abs(sostaSPE$minDist) < 3, TRUE, FALSE)
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = border)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
geom_sf(
data = allStructs,
fill = NA,
inherit.aes = FALSE
) +
facet_wrap(~imageName)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
Alternatively, borders can be defined using
st_difference
and st_buffer
on the structure
object. In this case we will have an sf
polygon that
correspond to the border region.
borders <- lapply(
st_geometry(allStructs),
\(x) st_difference(st_buffer(x, 3), st_buffer(x, -3))
) |>
# both needed for proper conversion
st_as_sfc() |> st_as_sf()
borders$imageName <- allStructs$imageName
borders$borderID <- paste0("border_", allStructs$structID)
sostaSPE$borderSf <- assingCellsToStructures(sostaSPE,
borders,
imageCol = "imageName",
uniqueId = "borderID",
nCores = 1
)
cbind(colData(sostaSPE), spatialCoords(sostaSPE)) |>
as.data.frame() |>
ggplot(aes(x = x, y = y, color = borderSf)) +
geom_point(size = 0.25) +
facet_wrap(~imageName) +
coord_equal() +
geom_sf(
data = borders,
fill = NA,
inherit.aes = FALSE
) +
facet_wrap(~imageName)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
sessionInfo()
#> R version 4.4.3 (2025-02-28)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ggfortify_0.4.17 tidyr_1.3.1
#> [3] sosta_0.99.5 SpatialExperiment_1.17.0
#> [5] SingleCellExperiment_1.29.2 SummarizedExperiment_1.37.0
#> [7] Biobase_2.67.0 GenomicRanges_1.59.1
#> [9] GenomeInfoDb_1.43.4 IRanges_2.41.3
#> [11] S4Vectors_0.45.4 MatrixGenerics_1.19.1
#> [13] matrixStats_1.5.0 sf_1.0-19
#> [15] lmerTest_3.1-3 lme4_1.1-36
#> [17] Matrix_1.7-3 ggplot2_3.5.1
#> [19] ExperimentHub_2.15.0 AnnotationHub_3.15.0
#> [21] BiocFileCache_2.15.1 dbplyr_2.5.0
#> [23] BiocGenerics_0.53.6 generics_0.1.3
#> [25] dplyr_1.1.4 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 sys_3.4.3 jsonlite_1.9.1
#> [4] magrittr_2.0.3 spatstat.utils_3.1-3 magick_2.8.5
#> [7] farver_2.1.2 nloptr_2.2.1 rmarkdown_2.29
#> [10] vctrs_0.6.5 memoise_2.0.1 minqa_1.2.8
#> [13] spatstat.explore_3.4-2 RCurl_1.98-1.16 terra_1.8-29
#> [16] htmltools_0.5.8.1 S4Arrays_1.7.3 curl_6.2.1
#> [19] SparseArray_1.7.7 sass_0.4.9 KernSmooth_2.23-26
#> [22] bslib_0.9.0 htmlwidgets_1.6.4 cachem_1.1.0
#> [25] buildtools_1.0.0 mime_0.13 lifecycle_1.0.4
#> [28] pkgconfig_2.0.3 R6_2.6.1 fastmap_1.2.0
#> [31] GenomeInfoDbData_1.2.13 rbibutils_2.3 digest_0.6.37
#> [34] numDeriv_2016.8-1.1 colorspace_2.1-1 patchwork_1.3.0
#> [37] AnnotationDbi_1.69.0 tensor_1.5 RSQLite_2.3.9
#> [40] filelock_1.0.3 labeling_0.4.3 spatstat.sparse_3.1-0
#> [43] httr_1.4.7 polyclip_1.10-7 abind_1.4-8
#> [46] compiler_4.4.3 proxy_0.4-27 bit64_4.6.0-1
#> [49] withr_3.0.2 tiff_0.1-12 DBI_1.2.3
#> [52] MASS_7.3-65 rappdirs_0.3.3 DelayedArray_0.33.6
#> [55] rjson_0.2.23 classInt_0.4-11 tools_4.4.3
#> [58] units_0.8-7 goftest_1.2-3 glue_1.8.0
#> [61] nlme_3.1-167 EBImage_4.49.0 grid_4.4.3
#> [64] gtable_0.3.6 spatstat.data_3.1-6 class_7.3-23
#> [67] XVector_0.47.2 spatstat.geom_3.3-6 stringr_1.5.1
#> [70] BiocVersion_3.21.1 pillar_1.10.1 splines_4.4.3
#> [73] lattice_0.22-6 bit_4.6.0 deldir_2.0-4
#> [76] tidyselect_1.2.1 locfit_1.5-9.12 maketools_1.3.2
#> [79] Biostrings_2.75.4 knitr_1.50 gridExtra_2.3
#> [82] reformulas_0.4.0 xfun_0.51 smoothr_1.0.1
#> [85] stringi_1.8.4 UCSC.utils_1.3.1 fftwtools_0.9-11
#> [88] yaml_2.3.10 boot_1.3-31 evaluate_1.0.3
#> [91] codetools_0.2-20 tibble_3.2.1 BiocManager_1.30.25
#> [94] cli_3.6.4 Rdpack_2.6.3 munsell_0.5.1
#> [97] jquerylib_0.1.4 Rcpp_1.0.14 spatstat.random_3.3-3
#> [100] png_0.1-8 spatstat.univar_3.1-2 parallel_4.4.3
#> [103] blob_1.2.4 jpeg_0.1-11 bitops_1.0-9
#> [106] viridisLite_0.4.2 scales_1.3.0 e1071_1.7-16
#> [109] purrr_1.0.4 crayon_1.5.3 rlang_1.1.5
#> [112] KEGGREST_1.47.0