Package 'clustSIGNAL'

Title: ClustSIGNAL: a spatial clustering method
Description: clustSIGNAL: clustering of Spatially Informed Gene expression with Neighbourhood Adapted Learning. A tool for adaptively smoothing and clustering gene expression data. clustSIGNAL uses entropy to measure heterogeneity of cell neighbourhoods and performs a weighted, adaptive smoothing, where homogeneous neighbourhoods are smoothed more and heterogeneous neighbourhoods are smoothed less. This not only overcomes data sparsity but also incorporates spatial context into the gene expression data. The resulting smoothed gene expression data is used for clustering and could be used for other downstream analyses.
Authors: Pratibha Panwar [cre, aut, ctb] (ORCID: <https://orcid.org/0000-0002-7437-7084>), Boyi Guo [aut], Haowen Zhao [aut], Stephanie Hicks [aut], Shila Ghazanfar [aut, ctb] (ORCID: <https://orcid.org/0000-0001-7861-6997>)
Maintainer: Pratibha Panwar <[email protected]>
License: GPL-2
Version: 1.5.1
Built: 2026-06-02 18:47:14 UTC
Source: https://github.com/bioc/clustSIGNAL

Adaptive smoothing

Description

A function to perform a weighted, adaptive smoothing of the gene expression of each cell based on the heterogeneity of its neighbourhood. Heterogeneous neighbourhoods are smoothed less, with higher weights given to cells belonging to the same initial cluster as the index cell. Homogeneous neighbourhoods are smoothed more, with similar weights given to most cells.

Usage

adaptiveSmoothing(spe, nnCells, NN = 30, kernel = c("G", "E"), spread = 0.3)

Arguments

spe

SpatialExperiment object containing neighbourhood entropy values of each cell.

nnCells

a character matrix of NN nearest neighbours, generated using the neighbourDetect() function. Rows are index cells and columns are their nearest neighbours, ordered from closest to farthest. If the matrix was generated with sort = TRUE, the neighbours belonging to the same initial cluster as the index cell are moved closer to it.

NN

an integer for the number of neighbouring cells the function should consider. The value must be greater than or equal to 1. Default value is 30.

kernel

a character for type of distribution to be used. The two valid values are "G" or "E" for Gaussian and exponential distributions, respectively. Default value is "G".

spread

a numeric value for distribution spread, represented by standard deviation for Gaussian distribution and rate for exponential distribution. Default value is 0.3 for Gaussian distribution. The recommended value is 5 for exponential distribution.

Value

SpatialExperiment object including smoothed gene expression as an additional assay.

Examples

data(ClustSignal_example)

# requires matrix containing NN nearest neighbour cell labels (nnCells),
# generated using the neighbourDetect() function
spe <- clustSIGNAL::adaptiveSmoothing(spe, nnCells)
spe
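The kernel and spread arguments can be pictured with a small base R sketch. This is an illustration only, not the package's internal code: the helper kernel_weights(), the placement of the index cell at position 0 and the farthest neighbour at position 1, and the toy expression matrix are all assumptions made for this example. How clustSIGNAL combines the entropy value with the kernel is not shown here.

```r
# Hypothetical helper (not part of clustSIGNAL): weights for an index cell
# followed by NN neighbours ranked from closest to farthest.
kernel_weights <- function(NN = 30, kernel = c("G", "E"), spread = 0.3) {
  kernel <- match.arg(kernel)
  # Assumed positions: index cell at 0, farthest neighbour at 1
  pos <- seq(0, 1, length.out = NN + 1)
  w <- switch(kernel,
    G = dnorm(pos, mean = 0, sd = spread), # spread = standard deviation
    E = dexp(pos, rate = spread)           # spread = rate
  )
  w / sum(w) # normalise so that the weights sum to 1
}

w_g <- kernel_weights(NN = 5, kernel = "G", spread = 0.3)
w_e <- kernel_weights(NN = 5, kernel = "E", spread = 5)

# Toy expression: 3 genes x 6 cells (index cell first, then 5 neighbours)
set.seed(1)
expr <- matrix(rpois(18, lambda = 4), nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))

# Smoothed expression of the index cell as a weighted average of the columns
smoothed_g <- as.vector(expr %*% w_g)
smoothed_e <- as.vector(expr %*% w_e)
```

In this sketch both kernels give the largest weight to the index cell and progressively smaller weights to more distant neighbours; changing spread changes how quickly the weights decay.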

ClustSIGNAL

Description

Clustering method for spatially-resolved cell-state classification of spatial transcriptomics data. The tool generates and uses an adaptively smoothed, spatially-informed gene expression for clustering.

Usage

clustSIGNAL(
  spe,
  samples,
  dimRed_init = "None",
  dimRed_f = c("None", "embed.smooth"),
  batch = FALSE,
  batch_by = "None",
  NN = 30,
  kernel = c("G", "E"),
  spread = 0.3,
  sort = TRUE,
  threads = 1,
  outputs = c("c", "n", "s", "a"),
  clustParams = list(clust_c = 0, subclust_c = 0, iter.max = 30, k = 10, cluster.fun =
    "louvain")
)

Arguments

spe

a SpatialExperiment object containing spatial coordinates in 'spatialCoords' matrix and normalised gene expression in 'logcounts' assay.

samples

a character indicating name of colData(spe) column containing sample names.

dimRed_init

a character indicating the name of the reduced dimensions in the SpatialExperiment object (i.e., from reducedDimNames(spe)) to use for initial clustering step. Default value is 'None'.

dimRed_f

a character indicating the name of the reduced dimensions in the SpatialExperiment object (i.e., from reducedDimNames(spe)) to use for final clustering step. Two valid options are "None" (default), which triggers a PCA run on smoothed expression, and "embed.smooth", which triggers a search for an externally-generated "embed.smooth" low-dimensional embedding in reducedDimNames(spe).

batch

a logical parameter for whether to perform batch correction. Default value is FALSE.

batch_by

a character indicating name of colData(spe) column containing the batch names. Default value is 'None'.

NN

an integer for the number of neighbouring cells the function should consider. The value must be greater than or equal to 1. Default value is 30.

kernel

a character for type of distribution to be used. The two valid values are "G" or "E" for Gaussian and exponential distributions, respectively. Default value is "G".

spread

a numeric value for distribution spread, represented by standard deviation for Gaussian distribution and rate for exponential distribution. Default value is 0.3 for Gaussian distribution. The recommended value is 5 for exponential distribution.

sort

a logical parameter for whether to sort the neighbourhood by initial clusters. Default value is TRUE.

threads

a numeric value for the number of CPU cores to be used for the analysis. Default value set to 1.

outputs

a character for the type of output to return to the user. "c" for data frame of cell IDs and their respective ClustSIGNAL cluster labels, "n" for ClustSIGNAL cluster dataframe plus neighbourhood matrix, "s" for ClustSIGNAL cluster dataframe plus final SpatialExperiment object, or "a" for all 3 outputs. Default value is 'c'.

clustParams

a list of parameters for TwoStepParam clustering methods:

clust_c: the number of centers to use for clustering with KmeansParam. By default set to 0, in which case the method uses either 3000 centers or 1/5th of the total cells in the data as the number of centers, whichever is lower.

subclust_c: the number of centers to use for sub-clustering the initial clusters with KmeansParam. The default value is 0, in which case the method uses either 1 center or half of the total cells in the initial cluster as the number of centers, whichever is higher.

iter.max: the maximum number of iterations to perform during clustering and sub-clustering with KmeansParam. Default value is 30.

k: a numeric value indicating the k-value used for clustering and sub-clustering with NNGraphParam. Default value is 10.

cluster.fun: a character indicating the graph clustering method used with NNGraphParam. By default, the Louvain method is used.
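The default rules for clust_c and subclust_c can be written out as a short base R sketch. The helper names default_clust_c() and default_subclust_c() are hypothetical, and rounding down with floor() is an assumption; the documentation states only the "whichever is lower" and "whichever is higher" rules.

```r
# Hypothetical helpers illustrating the documented defaults used when
# clust_c = 0 and subclust_c = 0. floor() is an assumed rounding rule.
default_clust_c <- function(n_cells) {
  # the lower of 3000 centers and 1/5th of the total cells
  min(3000, floor(n_cells / 5))
}

default_subclust_c <- function(n_cluster_cells) {
  # the higher of 1 center and half of the cells in the initial cluster
  max(1, floor(n_cluster_cells / 2))
}

default_clust_c(1000)   # example data: 1000 cells
default_clust_c(57536)  # full mouse embryo dataset: capped at 3000
default_subclust_c(41)  # an initial cluster of 41 cells
default_subclust_c(1)   # a single-cell cluster still gets 1 center
```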

Value

a list of outputs depending on the type of outputs specified in the main function call.

1. clusters: a data frame of cell names and their ClustSIGNAL cluster classification.

2. neighbours: a character matrix containing cell IDs of each cell's NN neighbours.

3. spe_final: a SpatialExperiment object containing the original spe object data plus initial cluster and subcluster labels, entropy values, smoothed gene expression, and ClustSIGNAL cluster labels.

Examples

data(ClustSignal_example)

names(colData(spe))
# identify the column name with sample labels
samples = "sample_id"
res_list <- clustSIGNAL(spe, samples, outputs = "c")

Example data with SpatialExperiment object

Description

This example data was generated from the mouse embryo spatial transcriptomics dataset of 3 mouse embryos, with 351 genes and a total of 57536 cells. For running examples, we subset the data by selecting 1000 random cells from embryo 2, excluding any cells annotated as 'low quality'. After subsetting, we have expression for 351 genes from 1000 cells in embryo 2.
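The subsetting described here can be sketched in base R on toy metadata. This is not the script used to build the example data: the data frame, its column names (cell_id, embryo, celltype), the cell type labels and the seed are all hypothetical.

```r
# Toy metadata standing in for the full dataset (hypothetical columns/labels)
set.seed(100)
meta <- data.frame(
  cell_id = paste0("cell", 1:6000),
  embryo = sample(c("embryo1", "embryo2", "embryo3"), 6000, replace = TRUE),
  celltype = sample(c("Gut tube", "Low quality", "Neural crest"), 6000,
                    replace = TRUE)
)

# Keep embryo 2 cells that are not annotated as low quality
keep <- meta$embryo == "embryo2" & meta$celltype != "Low quality"

# Randomly select 1000 of the remaining cells
chosen <- sample(meta$cell_id[keep], size = 1000)
meta_sub <- meta[meta$cell_id %in% chosen, ]
```

The same cell IDs would then be used to subset the columns of the expression matrix so that metadata and expression stay aligned.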

Usage

data(ClustSignal_example)

Format

spe: a SpatialExperiment object containing a gene expression matrix with normalised counts, where rows indicate genes and columns indicate cells. It also contains cell metadata including cell IDs, sample IDs, cell type annotations, and x-y coordinates of cells.

nnCells: a matrix where each row corresponds to a cell in the spe object, and the columns correspond to the nearest neighbours.

regXclust: a list where each element corresponds to a cell in the spe object, and contains the cluster composition proportions.

Source

Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis, Nature Biotechnology, 2021. Webpage: https://www.nature.com/articles/s41587-021-01006-2


Heterogeneity measure

Description

A function to measure the heterogeneity of a cell's neighbourhood in terms of entropy. Generally, homogeneous neighbourhoods have low entropy and heterogeneous neighbourhoods have high entropy.

Usage

entropyMeasure(spe, regXclust)

Arguments

spe

SpatialExperiment object with initial cluster and subcluster labels.

regXclust

a numeric matrix of cells by subclusters, where the values are the proportion of initial subclusters in each cell's neighbourhood.

Value

SpatialExperiment object with entropy values associated with each cell.

Examples

data(ClustSignal_example)

# requires matrix containing cluster proportions of each neighbourhood
# (regXclust), generated using the neighbourDetect() function
spe <- clustSIGNAL::entropyMeasure(spe, regXclust)
spe$entropy |> head()
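The relationship between neighbourhood composition and entropy can be illustrated with Shannon entropy in base R. This is a sketch under assumptions: the helper shannon_entropy(), the base-2 logarithm and the toy proportion matrix are chosen for illustration, and the exact estimator and log base used by entropyMeasure() are not stated in this manual.

```r
# Hypothetical helper: Shannon entropy (in bits) of a vector of proportions
shannon_entropy <- function(p) {
  p <- p[p > 0] # zero proportions contribute nothing to the sum
  -sum(p * log2(p))
}

# Toy cells-by-subclusters proportion matrix (each row sums to 1)
regXclust_toy <- rbind(
  homogeneous   = c(1, 0, 0, 0),
  mixed         = c(0.5, 0.5, 0, 0),
  heterogeneous = c(0.25, 0.25, 0.25, 0.25)
)

ent <- apply(regXclust_toy, 1, shannon_entropy)
ent
```

A neighbourhood made of a single subcluster gets entropy 0, an even split between two subclusters gets 1 bit, and an even split across four subclusters gets 2 bits.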

Mouse Embryo Data

Description

This dataset contains spatial transcriptomics data from 3 mouse embryos, with 351 genes and a total of 57536 cells. For vignettes, we subset the data by randomly selecting 5000 cells from embryo 2, excluding cells that were annotated as 'low quality'.

Usage

data(mEmbryo2)

Format

me_expr: a gene expression matrix with normalised counts, where rows indicate genes and columns indicate cells.

me_data: a data frame of cell metadata including cell IDs, sample IDs, cell type annotations, and x-y coordinates of cells.

Source

Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis, Nature Biotechnology, 2021. Webpage: https://www.nature.com/articles/s41587-021-01006-2


Mouse Hypothalamus Data

Description

This dataset contains spatial transcriptomics data from 181 mouse hypothalamus samples, with 155 genes and a total of 1,027,080 cells. For running the vignettes, we subset the data by selecting a total of 6000 cells from only 3 samples: Animal 1 Bregma -0.09 (2080 cells), Animal 7 Bregma 0.16 (1936 cells), and Animal 7 Bregma -0.09 (1984 cells). We excluded cells that were annotated as 'ambiguous' and removed 20 genes that were assessed using a different technology.

Usage

data(mHypothal)

Format

mh_expr: a gene expression matrix with normalised counts, where rows indicate genes and columns indicate cells.

mh_data: a data frame of cell metadata including cell IDs, sample IDs, cell type annotations, and x-y coordinates of cells.

Source

Molecular, Spatial and Functional Single-Cell Profiling of the Hypothalamic Preoptic Region, Science, 2018. Webpage: https://www.science.org/doi/10.1126/science.aau5324


Cell neighbourhood detection

Description

A function to identify the neighbourhood of each cell. If sort = TRUE, the neighbourhoods are also sorted such that cells belonging to the same 'initial cluster' as the index cell are arranged closer to it.

Usage

neighbourDetect(spe, samples, NN = 30, sort = TRUE, threads = 1)

Arguments

spe

SpatialExperiment object with initial cluster and subcluster labels.

samples

a character indicating name of colData(spe) column containing sample names.

NN

an integer for the number of neighbouring cells the function should consider. The value must be greater than or equal to 1. Default value is 30.

sort

a logical parameter for whether to sort the neighbourhood by initial clusters. Default value is TRUE.

threads

a numeric value for the number of CPU cores to be used for the analysis. Default value set to 1.

Value

a list containing two items:

1. nnCells, a character matrix of NN nearest neighbours - rows are index cells and columns are their nearest neighbours ranging from closest to farthest neighbour. For sort = TRUE, the neighbours belonging to the same initial cluster as the index cell are moved closer to it.

2. regXclust, a numeric matrix of each cell's neighbourhood composition, where the values are the proportions of initial subclusters (columns) in each cell's neighbourhood (rows).

Examples

data(ClustSignal_example)

out_list <- clustSIGNAL::neighbourDetect(spe, samples = "sample_id")
out_list |> names()
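The two outputs can be pictured with a base R sketch on toy data. This is not the package's implementation: it uses a brute-force distance matrix, assumes that the index cell is excluded from its own neighbour list, and uses made-up coordinates and cluster labels.

```r
set.seed(42)
n <- 8   # toy number of cells
NN <- 3  # neighbours per cell

# Toy spatial coordinates and initial cluster labels (hypothetical)
coords <- matrix(runif(n * 2), ncol = 2,
                 dimnames = list(paste0("cell", seq_len(n)), c("x", "y")))
cluster <- setNames(rep(c("A", "B"), length.out = n), rownames(coords))

# Brute-force pairwise Euclidean distances between cells
d <- as.matrix(dist(coords))

# For each index cell: take the NN closest other cells, then move neighbours
# from the same cluster as the index cell to the front (as with sort = TRUE)
nn_cells <- t(vapply(rownames(d), function(cell) {
  nbrs <- names(sort(d[cell, colnames(d) != cell]))[seq_len(NN)]
  same <- cluster[nbrs] == cluster[cell]
  c(nbrs[same], nbrs[!same])
}, character(NN)))

# Neighbourhood composition: proportion of each cluster among the neighbours
levels_cl <- sort(unique(cluster))
comp <- t(apply(nn_cells, 1, function(nbrs) {
  as.vector(table(factor(cluster[nbrs], levels = levels_cl))) / NN
}))
colnames(comp) <- levels_cl
```

Here nn_cells plays the role of nnCells (rows are index cells, columns are ordered neighbours) and comp plays the role of regXclust (rows are cells, columns are clusters, each row summing to 1).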

Initial non-spatial clustering

Description

A function to perform initial non-spatial clustering and sub-clustering of normalised gene expression to generate 'initial clusters' and 'initial subclusters'.

Usage

p1_clustering(
  spe,
  dimRed_init = "None",
  batch = FALSE,
  batch_by = "None",
  threads = 1,
  clustParams = list(clust_c = 0, subclust_c = 0, iter.max = 30, k = 10, cluster.fun =
    "louvain")
)

Arguments

spe

a SpatialExperiment object containing spatial coordinates in 'spatialCoords' matrix and normalised gene expression in 'logcounts' assay.

dimRed_init

a character indicating the name of the reduced dimensions in the SpatialExperiment object (i.e., from reducedDimNames(spe)) to use for initial clustering step. Default value is 'None'.

batch

a logical parameter for whether to perform batch correction. Default value is FALSE.

batch_by

a character indicating name of colData(spe) column containing the batch names. Default value is 'None'.

threads

a numeric value for the number of CPU cores to be used for the analysis. Default value set to 1.

clustParams

a list of parameters for TwoStepParam clustering methods:

clust_c: the number of centers to use for clustering with KmeansParam. By default set to 0, in which case the method uses either 3000 centers or 1/5th of the total cells in the data as the number of centers, whichever is lower.

subclust_c: the number of centers to use for sub-clustering the initial clusters with KmeansParam. The default value is 0, in which case the method uses either 1 center or half of the total cells in the initial cluster as the number of centers, whichever is higher.

iter.max: the maximum number of iterations to perform during clustering and sub-clustering with KmeansParam. Default value is 30.

k: a numeric value indicating the k-value used for clustering and sub-clustering with NNGraphParam. Default value is 10.

cluster.fun: a character indicating the graph clustering method used with NNGraphParam. By default, the Louvain method is used.

Value

SpatialExperiment object with initial cluster and subcluster labels of each cell.

Examples

data(ClustSignal_example)

spe <- clustSIGNAL::p1_clustering(spe, dimRed_init = "PCA")
spe$nsCluster |> head()
spe$initCluster |> head()

Final non-spatial clustering

Description

A function to perform clustering on adaptively smoothed gene expression data to generate ClustSIGNAL clusters.

Usage

p2_clustering(
  spe,
  dimRed_f = c("None", "embed.smooth"),
  batch = FALSE,
  batch_by = "None",
  threads = 1,
  clustParams = list(clust_c = 0, subclust_c = 0, iter.max = 30, k = 10, cluster.fun =
    "louvain")
)

Arguments

spe

SpatialExperiment object containing the adaptively smoothed gene expression.

dimRed_f

a character indicating the name of the reduced dimensions in the SpatialExperiment object (i.e., from reducedDimNames(spe)) to use for final clustering step. Two valid options are "None" (default), which triggers a PCA run on smoothed expression, and "embed.smooth", which triggers a search for an "embed.smooth" low-dimensional embedding in reducedDimNames(spe).

batch

a logical parameter for whether to perform batch correction. Default value is FALSE.

batch_by

a character indicating name of colData(spe) column containing the batch names. Default value is 'None'.

threads

a numeric value for the number of CPU cores to be used for the analysis. Default value set to 1.

clustParams

a list of parameters for TwoStepParam clustering methods:

clust_c: the number of centers to use for clustering with KmeansParam. By default set to 0, in which case the method uses either 3000 centers or 1/5th of the total cells in the data as the number of centers, whichever is lower.

subclust_c: the number of centers to use for sub-clustering the initial clusters with KmeansParam. This parameter is not used in the final clustering step.

iter.max: the maximum number of iterations to perform during clustering and sub-clustering with KmeansParam. Default value is 30.

k: a numeric value indicating the k-value used for clustering with NNGraphParam. Default value is 10.

cluster.fun: a character indicating the graph clustering method used with NNGraphParam. By default, the Louvain method is used.

Value

SpatialExperiment object containing clusters generated from smoothed data.

Examples

data(ClustSignal_example)

# For clustering of adaptively smoothed gene expression
spe <- clustSIGNAL::p2_clustering(spe)
spe$ClustSIGNAL |> head()