The sosta package can be installed from Bioconductor as follows:
For this vignette, we will need several additional packages:
In this vignette, we will use an imaging mass cytometry (IMC) dataset of pancreatic islets from human donors at different stages of type 1 diabetes (T1D) and healthy controls (Damond et al. 2019). Note that we will only use a subset of the patients.
# load experiment hub
eh <- ExperimentHub()
oid <- names(eh[eh$title == "Damond_2019_Pancreas - sce - v1 - full"])
# Load single cell experiment object
spe <- eh[[oid]]
# Convert to spatial experiment object
spe <- toSpatialExperiment(spe,
sample_id = "image_name",
spatialCoordsNames = c("cell_x", "cell_y")
)
First, we plot the data for illustration. As we have multiple images per patient, we will subset to a few slides. As can be seen, the dimensions of the field of view are different for each image.
df <- cbind(
colData(spe[, spe$image_name %in% c("E04", "E03", "G01", "J02")]),
spatialCoords(spe[, spe$image_name %in% c("E04", "E03", "G01", "J02")])
)
df |>
as.data.frame() |>
ggplot(aes(x = cell_x, y = cell_y, color = cell_category)) +
geom_point(size = 0.25) +
facet_wrap(~image_name, ncol = 2) +
coord_equal()
The goal is to reconstruct / segment and quantify the pancreatic islets.
In this example, we will reconstruct the islets based on the point pattern density of the islet cells. We will start with estimating the parameters that we use for reconstruction afterwards. For one image this can be illustrated as follows.
shapeIntensityImage(
spe,
marks = "cell_category",
imageCol = "image_name",
imageId = "G01",
markSelect = "islet"
)
We see the density (pixel-level) image on the left and a histogram of the intensity values on the right. The estimated threshold is roughly the mean between the two modes of the (truncated) pixel intensity distribution.
Note that the above calculation was done for one image. The function
estimateReconstructionParametersSPE
returns two plots with
the estimated parameters for each image. The parameter bndw
is the bandwidth parameter that is used for estimating the intensity
profile of the point pattern. The parameter thres
is the
estimated parameter for the density threshold for reconstruction. We
subset 20 images to speed up computation.
n <- estimateReconstructionParametersSPE(
spe,
marks = "cell_category",
imageCol = "image_name",
markSelect = "islet",
nImages = 20,
plotHist = TRUE
)
We can inspect the relationship of the estimated bandwidth and threshold.
We note that the estimated bandwidth varies more than the estimated threshold. We will use the mean of the two estimated vectors as our parameters.
The function reconstructShapeDensitySPE
reconstructs the
islets for all images in the spe
object. We use the
estimated parameters from above. For computational reasons, we will
subset to 10 images per patient for the rest of the vignette.
# Sample 15 images per patient
sel <- colData(spe) |>
as.data.frame() |>
group_by(patient_id) |>
select(image_name) |>
sample_n(size = 10, replace = FALSE) |>
pull(image_name)
#> Adding missing grouping variables: `patient_id`
# Select sampled images
spe <- spe[, spe$image_name %in% sel]
# Run on all images
allIslets <- reconstructShapeDensitySPE(
spe,
marks = "cell_category",
imageCol = "image_name",
markSelect = "islet",
bndw = bndwSPE,
thres = thresSPE,
nCores = 1
)
The result is a (simple feature collection)[https://r-spatial.github.io/sf/articles/sf1.html]. This
contains the polygons (<GEOMETRY>
column), a
structure identifier (structID
) and the image identifier
(image_name
). Let’s add some patient metadata to the
object.
colsKeep <- c(
"patient_stage", "tissue_slide", "tissue_region",
"patient_id", "patient_disease_duration",
"patient_age", "patient_gender", "sample_id"
)
patientData <- colData(spe) |>
as_tibble() |>
group_by(image_name) |>
select(all_of(colsKeep)) |>
unique()
#> Adding missing grouping variables: `image_name`
allIslets <- allIslets |>
dplyr::left_join(patientData, by = "image_name")
We can now inspect the number of structures found per patient or image.
allIslets |>
st_drop_geometry() |>
group_by(patient_id) |>
summarise(n = n()) |>
ungroup()
#> # A tibble: 12 × 2
#> patient_id n
#> <int> <int>
#> 1 6089 25
#> 2 6126 17
#> 3 6134 24
#> 4 6180 28
#> 5 6228 17
#> 6 6264 27
#> 7 6278 29
#> 8 6362 30
#> 9 6380 28
#> 10 6386 19
#> 11 6414 28
#> 12 6418 25
Now that we have islet structures for all images, we can now use the
function totalShapeMetrics
to calculate a set of metrics
related to the shape of the islets.
The result is a simple feature collection with polygons. We will add some patient level information to the simple feature collection for plotting afterwards.
We use PCA to get an overview of the different features. Each dot represents one structure.
library(ggfortify)
autoplot(
prcomp(t(isletMetrics), scale. = TRUE),
x = 1,
y = 2,
data = allIslets,
color = "patient_stage",
size = 2,
# shape = 'type',
loadings = TRUE,
loadings.colour = "steelblue3",
loadings.label = TRUE,
loadings.label.size = 3,
loadings.label.repel = TRUE,
loadings.label.colour = "black"
) +
scale_color_brewer(palette = "Dark2") +
theme_bw() +
coord_fixed()
We can use boxplots to investigate differences of shape metrics between patient stages. We will subset to a few metrics that are not colinear in the PCA plot. Note that the boxplots don’t reveal patient specific effects.
allIslets |>
sf::st_drop_geometry() |>
select(patient_stage, rownames(isletMetrics)) |>
pivot_longer(-patient_stage) |>
filter(name %in% c("Area", "Compactness", "Curl")) |>
ggplot(aes(x = patient_stage, y = value, fill = patient_stage)) +
geom_boxplot() +
facet_wrap(~name, scales = "free") +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
guides(fill = "none")
As the individual structure level metrics are not independent we have to account for dependence between measurements. This dependence can lie on the level of the patient and the slide as we have repeated measurements for each level.
To account for this, we will use mixed linear models with random effects for the patient and the individual slides (image name). We will use the lme4 package for fitting linear mixed effects models (Bates et al. 2015) and lmerTest for p-value calculation (Kuznetsova, Brockhoff, and Christensen 2017).
To see differences between the Area of the islets between conditions, we can test as follows.
mod <- lmer(Area ~ patient_stage + (1 | patient_id) + (1 | image_name), data = allIslets)
#> boundary (singular) fit: see help('isSingular')
summary(mod)
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: Area ~ patient_stage + (1 | patient_id) + (1 | image_name)
#> Data: allIslets
#>
#> REML criterion at convergence: 6618.3
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -1.4215 -0.4256 -0.1589 0.1260 5.9149
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> image_name (Intercept) 0 0
#> patient_id (Intercept) 31101292 5577
#> Residual 321853054 17940
#> Number of obs: 297, groups: image_name, 110; patient_id, 12
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 17290.369 3391.002 10.308 5.099 0.000423
#> patient_stageOnset -3308.928 4744.110 9.869 -0.697 0.501602
#> patient_stageLong-duration -12655.942 4727.033 9.773 -2.677 0.023635
#>
#> (Intercept) ***
#> patient_stageOnset
#> patient_stageLong-duration *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr) ptnt_O
#> ptnt_stgOns -0.715
#> ptnt_stgLn- -0.717 0.513
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')
As we can see from summary(mod)
there is a signification
difference in islet area of long-duration patients with respect to
non-diabetic patients, accounting for correlation on the patient and
image level. Please note that this was only run on a subset of the
data.
sessionInfo()
#> R version 4.4.3 (2025-02-28)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ggfortify_0.4.17 tidyr_1.3.1
#> [3] sosta_0.99.5 SpatialExperiment_1.17.0
#> [5] SingleCellExperiment_1.29.2 SummarizedExperiment_1.37.0
#> [7] Biobase_2.67.0 GenomicRanges_1.59.1
#> [9] GenomeInfoDb_1.43.4 IRanges_2.41.3
#> [11] S4Vectors_0.45.4 MatrixGenerics_1.19.1
#> [13] matrixStats_1.5.0 sf_1.0-19
#> [15] lmerTest_3.1-3 lme4_1.1-36
#> [17] Matrix_1.7-3 ggplot2_3.5.1
#> [19] ExperimentHub_2.15.0 AnnotationHub_3.15.0
#> [21] BiocFileCache_2.15.1 dbplyr_2.5.0
#> [23] BiocGenerics_0.53.6 generics_0.1.3
#> [25] dplyr_1.1.4 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 sys_3.4.3 jsonlite_1.9.1
#> [4] magrittr_2.0.3 spatstat.utils_3.1-3 magick_2.8.5
#> [7] farver_2.1.2 nloptr_2.2.1 rmarkdown_2.29
#> [10] vctrs_0.6.5 memoise_2.0.1 minqa_1.2.8
#> [13] spatstat.explore_3.4-2 RCurl_1.98-1.16 terra_1.8-29
#> [16] htmltools_0.5.8.1 S4Arrays_1.7.3 curl_6.2.1
#> [19] SparseArray_1.7.7 sass_0.4.9 KernSmooth_2.23-26
#> [22] bslib_0.9.0 htmlwidgets_1.6.4 cachem_1.1.0
#> [25] buildtools_1.0.0 mime_0.13 lifecycle_1.0.4
#> [28] pkgconfig_2.0.3 R6_2.6.1 fastmap_1.2.0
#> [31] GenomeInfoDbData_1.2.13 rbibutils_2.3 digest_0.6.37
#> [34] numDeriv_2016.8-1.1 colorspace_2.1-1 patchwork_1.3.0
#> [37] AnnotationDbi_1.69.0 tensor_1.5 RSQLite_2.3.9
#> [40] filelock_1.0.3 labeling_0.4.3 spatstat.sparse_3.1-0
#> [43] httr_1.4.7 polyclip_1.10-7 abind_1.4-8
#> [46] compiler_4.4.3 proxy_0.4-27 bit64_4.6.0-1
#> [49] withr_3.0.2 tiff_0.1-12 DBI_1.2.3
#> [52] MASS_7.3-65 rappdirs_0.3.3 DelayedArray_0.33.6
#> [55] rjson_0.2.23 classInt_0.4-11 tools_4.4.3
#> [58] units_0.8-7 goftest_1.2-3 glue_1.8.0
#> [61] nlme_3.1-167 EBImage_4.49.0 grid_4.4.3
#> [64] gtable_0.3.6 spatstat.data_3.1-6 class_7.3-23
#> [67] XVector_0.47.2 spatstat.geom_3.3-6 stringr_1.5.1
#> [70] BiocVersion_3.21.1 pillar_1.10.1 splines_4.4.3
#> [73] lattice_0.22-6 bit_4.6.0 deldir_2.0-4
#> [76] tidyselect_1.2.1 locfit_1.5-9.12 maketools_1.3.2
#> [79] Biostrings_2.75.4 knitr_1.50 gridExtra_2.3
#> [82] reformulas_0.4.0 xfun_0.51 smoothr_1.0.1
#> [85] stringi_1.8.4 UCSC.utils_1.3.1 fftwtools_0.9-11
#> [88] yaml_2.3.10 boot_1.3-31 evaluate_1.0.3
#> [91] codetools_0.2-20 tibble_3.2.1 BiocManager_1.30.25
#> [94] cli_3.6.4 Rdpack_2.6.3 munsell_0.5.1
#> [97] jquerylib_0.1.4 Rcpp_1.0.14 spatstat.random_3.3-3
#> [100] png_0.1-8 spatstat.univar_3.1-2 parallel_4.4.3
#> [103] blob_1.2.4 jpeg_0.1-11 bitops_1.0-9
#> [106] viridisLite_0.4.2 scales_1.3.0 e1071_1.7-16
#> [109] purrr_1.0.4 crayon_1.5.3 rlang_1.1.5
#> [112] KEGGREST_1.47.0