Segmenting and normalizing multiplexed imaging data with simpleSeg

# load required packages
library(simpleSeg)
library(ggplot2)
library(EBImage)
library(cytomapper)

Installation

# Install the package from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("simpleSeg")

Overview

The simpleSeg package extends existing bioconductor packages such as cytomapper and EBImage by providing a structured pipeline for creating segmentation masks from multiplexed cellular images in the form of tiff stacks. This allows for the single cell information of these images to be extracted in R, without the need for external segmentation programs. simpleSeg also facilitates the normalisation of cellular features after these features have been extracted from the image, priming cells for classification / clustering. These functions leverage the functionality of the EBImage package on Bioconductor. For more flexibility when performing your segmentation in R we recommend learning to use the EBimage package. A key strength of simpleSeg is that we have coded multiple ways to perform some simple segmentation operations as well as incorporating multiple automatic procedures to optimise key parameters when these aren’t specified.

Load example data

In the following we will reanalyse two MIBI-TOF images from (Risom et al., 2022) profiling the spatial landscape of ductal carcinoma in situ (DCIS), which is a pre-invasive lesion that is thought to be a precursor to invasive breast cancer (IBC). These images are stored in the “extdata” folder in the package. When the path to this folder is identified, we can read these images into R using readImage from EBImage and store these as a CytoImageList using the cytomapper package.

# Get path to image directory
pathToImages <- system.file("extdata", package = "simpleSeg")

# Get directories of images

imageDirs <- dir(pathToImages, "Point", full.names = TRUE)
names(imageDirs) <- dir(pathToImages, "Point", full.names = FALSE)





# Get files in each directory
files <- files <- lapply(
  imageDirs,
  list.files,
  pattern = "tif",
  full.names = TRUE
)

# Read files with readImage from EBImage
images <- lapply(files, EBImage::readImage, as.is = TRUE)

# Convert to cytoImageList
images <- cytomapper::CytoImageList(images)
mcols(images)$imageID <- names(images)

Segmentation

simpleSeg accepts an Image, list of Image’s, or CytoImageList as input and generates a CytoImageList of masks as output. Here we will use the histone H3 channel in the image as a nuclei marker for segmentation. By default, simpleseg will isolate individual nuclei by watershedding using a combination of the intensity of this marker and a distance map. Nuclei are dilated out by 3 pixels to capture the cytoplasm. The user may also specify simple image transformations using the transform argument.

masks <- simpleSeg::simpleSeg(images,
  nucleus = "HH3",
  transform = "sqrt"
)

Visualise separation

The display and colorLabels functions in EBImage make it very easy to examine the performance of the cell segmentation. The great thing about display is that if used in an interactive session it is very easy to zoom in and out of the image.

# Visualise segmentation performance one way.
EBImage::display(colorLabels(masks[[1]]))

Visualise outlines

The plotPixels function in cytomapper make it easy to overlay the masks on top of the intensities of 6 markers. Here we can see that the segmentation appears to be performing reasonably.

# Visualise segmentation performance another way.
cytomapper::plotPixels(
  image = images[1],
  mask = masks[1],
  img_id = "imageID",
  colour_by = c("PanKRT", "GLUT1", "HH3", "CD3", "CD20"),
  display = "single",
  colour = list(
    HH3 = c("black", "blue"),
    CD3 = c("black", "purple"),
    CD20 = c("black", "green"),
    GLUT1 = c("black", "red"),
    PanKRT = c("black", "yellow")
  ),
  bcg = list(
    HH3 = c(0, 1, 1.5),
    CD3 = c(0, 1, 1.5),
    CD20 = c(0, 1, 1.5),
    GLUT1 = c(0, 1, 1.5),
    PanKRT = c(0, 1, 1.5)
  ),
  legend = NULL
)

Methods of Watershedding

Watershedding is a method which treats images as topographical maps in order to identify individual objects and the borders between them.

The user may specify how watershedding is to be performed by using the watershed argument in simpleSeg.

Method Description
“distance” Performs watershedding on a distance map of the thresholded nuclei signal. With a pixels distance being defined as the distance from the closest background signal.
“intensity” Performs watershedding using the intensity of the nuclei marker.
“combine” Combines the previous two methods by multiplying the distance map by the nuclei marker intensity.

Methods of cell body identification

The cell body can also be identified in simpleSeg using models of varying complexity, specified with the cellBody argument.

Method Description
“dilation” Dilates the nuclei by an amount defined by the user. The size of the dilatation in pixels may be specified with the discDize argument.
“discModel” Uses all the markers to predict the presence of dilated ‘discs’ around the nuclei. The model therefore learns which markers are typically present in the cell cytoplasm and generates a mask based on this.
“marker” The user may specify one or multiple dedicated cytoplasm markers to predict the cytoplasm. This can be done using cellBody = "marker name"/"index"
“None” The nuclei mask is returned directly.

Parallel Processing

simpleSeg also supports parallel processing, with the cores argument being used to specify how many cores should be used.

masks <- simpleSeg::simpleSeg(images,
  nucleus = "HH3",
  cores = 1
)

Summarise cell features

In order to characterise the phenotypes of each of the segmented cells, measureObjects from cytomapper will calculate the average intensity of each channel within each cell as well as a few morphological features. The channel intensities will be stored in the counts assay in a SingleCellExperiment. Information on the spatial location of each cell is stored in colData in the m.cx and m.cy columns. In addition to this, it will propagate the information we have store in the mcols of our CytoImageList in the colData of the resulting SingleCellExperiment.

cellSCE <- cytomapper::measureObjects(masks, images, img_id = "imageID")

Normalising cells

Once cellular features have been extracted into a SingleCellExperement or dataframe, these features may then be normalised using the normalizeCellsfunction, transformed by any number of transformations (e.g., asinh, sqrt) and normalisation methods.

mean(Divides the marker cellular marker intensities by their mean), minMax (Subtracts the minimum value and scales markers between 0 and 1.), trim99 (Sets the highest 1% of values to the value of the 99th percentile.), PC1 (Removes the 1st principal component) can be performed with one call of the function, in the order specified by the user.

Method Description
“mean” Divides the marker cellular marker intensities by their mean.
“minMax” Subtracts the minimum value and scales markers between 0 and 1.
“trim99” Sets the highest 1% of values to the value of the 99th percentile.`
“PC1” Removes the 1st principal component) can be performed with one call of the function, in the order specified by the user.
# Transform and normalise the marker expression of each cell type.
# Use a square root transform, then trimmed the 99 quantile
cellSCE <- normalizeCells(cellSCE,
  assayIn = "counts",
  assayOut = "norm",
  imageID = "imageID",
  transformation = "sqrt",
  method = c("trim99", "minMax")
)

QC normalisation

We could check to see if the marker intensities of each cell require some form of transformation or normalisation. Here we extract the intensities from the counts assay. Looking at PanKRT which should be expressed in the majority of the tumour cells, the intensities are clearly very skewed.

# Extract marker data and bind with information about images
df <- as.data.frame(cbind(colData(cellSCE), t(assay(cellSCE, "counts"))))

# Plots densities of PanKRT for each image.
ggplot(df, aes(x = PanKRT, colour = imageID)) +
  geom_density() +
  labs(x = "PanKRT expression") +
  theme_minimal()

We can see that the normalised data stored in the norm assay appears more bimodal, not perfect, but likely sufficient for clustering.

# Extract normalised marker information.
df <- as.data.frame(cbind(colData(cellSCE), t(assay(cellSCE, "norm"))))

# Plots densities of normalised PanKRT for each image.
ggplot(df, aes(x = PanKRT, colour = imageID)) +
  geom_density() +
  labs(x = "PanKRT expression") +
  theme_minimal()

Session Info

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] cytomapper_1.19.0           SingleCellExperiment_1.28.0
#>  [3] SummarizedExperiment_1.36.0 Biobase_2.67.0             
#>  [5] GenomicRanges_1.59.0        GenomeInfoDb_1.43.0        
#>  [7] IRanges_2.41.0              S4Vectors_0.44.0           
#>  [9] BiocGenerics_0.53.0         MatrixGenerics_1.19.0      
#> [11] matrixStats_1.4.1           EBImage_4.49.0             
#> [13] ggplot2_3.5.1               simpleSeg_1.9.0            
#> [15] BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] bitops_1.0-9             deldir_2.0-4             gridExtra_2.3           
#>   [4] rlang_1.1.4              magrittr_2.0.3           svgPanZoom_0.3.4        
#>   [7] shinydashboard_0.7.2     spatstat.geom_3.3-3      compiler_4.4.1          
#>  [10] png_0.1-8                systemfonts_1.1.0        fftwtools_0.9-11        
#>  [13] vctrs_0.6.5              pkgconfig_2.0.3          SpatialExperiment_1.16.0
#>  [16] crayon_1.5.3             fastmap_1.2.0            magick_2.8.5            
#>  [19] XVector_0.46.0           labeling_0.4.3           utf8_1.2.4              
#>  [22] promises_1.3.0           rmarkdown_2.28           UCSC.utils_1.2.0        
#>  [25] ggbeeswarm_0.7.2         xfun_0.48                zlibbioc_1.52.0         
#>  [28] cachem_1.1.0             jsonlite_1.8.9           highr_0.11              
#>  [31] later_1.3.2              rhdf5filters_1.18.0      DelayedArray_0.33.1     
#>  [34] spatstat.utils_3.1-0     Rhdf5lib_1.28.0          BiocParallel_1.41.0     
#>  [37] jpeg_0.1-10              tiff_0.1-12              terra_1.7-83            
#>  [40] parallel_4.4.1           R6_2.5.1                 spatstat.data_3.1-2     
#>  [43] bslib_0.8.0              RColorBrewer_1.1-3       spatstat.univar_3.0-1   
#>  [46] jquerylib_0.1.4          Rcpp_1.0.13              knitr_1.48              
#>  [49] httpuv_1.6.15            Matrix_1.7-1             nnls_1.6                
#>  [52] tidyselect_1.2.1         abind_1.4-8              yaml_2.3.10             
#>  [55] viridis_0.6.5            codetools_0.2-20         lattice_0.22-6          
#>  [58] tibble_3.2.1             withr_3.0.2              shiny_1.9.1             
#>  [61] evaluate_1.0.1           polyclip_1.10-7          pillar_1.9.0            
#>  [64] BiocManager_1.30.25      generics_0.1.3           sp_2.1-4                
#>  [67] RCurl_1.98-1.16          munsell_0.5.1            scales_1.3.0            
#>  [70] xtable_1.8-4             glue_1.8.0               maketools_1.3.1         
#>  [73] tools_4.4.1              sys_3.4.3                locfit_1.5-9.10         
#>  [76] buildtools_1.0.0         rhdf5_2.50.0             grid_4.4.1              
#>  [79] colorspace_2.1-1         GenomeInfoDbData_1.2.13  raster_3.6-30           
#>  [82] beeswarm_0.4.0           HDF5Array_1.35.0         vipor_0.4.7             
#>  [85] cli_3.6.3                fansi_1.0.6              S4Arrays_1.6.0          
#>  [88] viridisLite_0.4.2        svglite_2.1.3            dplyr_1.1.4             
#>  [91] gtable_0.3.6             sass_0.4.9               digest_0.6.37           
#>  [94] SparseArray_1.6.0        farver_2.1.2             rjson_0.2.23            
#>  [97] htmlwidgets_1.6.4        htmltools_0.5.8.1        lifecycle_1.0.4         
#> [100] httr_1.4.7               mime_0.12