Title: | A Correlation Based Multiview Self Organizing Maps Clustering For IMC Datasets |
---|---|
Description: | A correlation-based multiview self-organizing map for the characterization of cell types in highly multiplexed in situ imaging cytometry assays (`FuseSOM`) is a tool for unsupervised clustering. `FuseSOM` is robust and achieves high accuracy by combining a `Self Organizing Map` architecture and a `Multiview` integration of correlation-based metrics. This allows FuseSOM to cluster highly multiplexed in situ imaging cytometry assays. |
Authors: | Elijah Willie [aut, cre] |
Maintainer: | Elijah Willie |
License: | GPL-2 |
Version: | 1.9.0 |
Built: | 2024-11-29 07:12:32 UTC |
Source: | https://github.com/bioc/FuseSOM |
Function to do arsinh normalization
.arsinhNnorm(x, cofactor = 5)
x |
A numeric or complex vector |
cofactor |
Cofactor by which x is divided before the arsinh transform. Default is 5. |
Arsinh normalized vector.
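A minimal sketch of the transform, assuming the usual cytometry convention of dividing by the cofactor before taking the inverse hyperbolic sine. `arsinhSketch` is a hypothetical name, not the package's internal function.

```r
# Hypothetical helper: arsinh transform with a cofactor, assuming the
# common cytometry convention asinh(x / cofactor)
arsinhSketch <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

intensities <- c(0, 5, 50, 500)
arsinhSketch(intensities)
```

Dividing by the cofactor keeps low intensities in the roughly linear part of asinh, while large intensities are compressed approximately logarithmically.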
A function to compute the elbow point given a set of points
.computeElbow(vals)
vals |
Values to compute the elbow point of. |
An integer indicating the elbow point of vals.
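One common way to locate an elbow is to take the point farthest from the straight line joining the first and last points of the curve. The sketch below uses that rule as an assumption; it is not necessarily the method inside `.computeElbow`, and `elbowSketch` is a hypothetical name.

```r
# Hypothetical helper: elbow = point with the largest perpendicular
# distance to the chord joining the first and last points (assumed rule)
elbowSketch <- function(vals) {
  n <- length(vals)
  x <- seq_len(n)
  # Direction of the chord from (1, vals[1]) to (n, vals[n])
  dx <- x[n] - x[1]
  dy <- vals[n] - vals[1]
  # Perpendicular distance of each point (x, vals) to the chord
  perp <- abs(dy * x - dx * vals + x[n] * vals[1] - vals[n] * x[1]) /
    sqrt(dx^2 + dy^2)
  which.max(perp)
}

wcd <- c(100, 40, 15, 12, 10, 9, 8.5)
elbowSketch(wcd)
```

For this decreasing curve the third point lies farthest from the chord, so the sketch returns 3.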
Function to do min-max normalization
.minmaxNorm(x)
x |
Matrix to min-max normalize. |
Min-max normalized version of x
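A sketch of min-max normalization that rescales each column (marker) to [0, 1]. Applying it column-wise is an assumption, and `minmaxSketch` is a hypothetical name; each column is assumed to have a non-zero range.

```r
# Hypothetical helper: rescale every column of a matrix to [0, 1];
# assumes no column is constant (max > min)
minmaxSketch <- function(x) {
  x <- as.matrix(x)
  mins <- apply(x, 2, min)
  maxs <- apply(x, 2, max)
  # Subtract each column's minimum, then divide by its range
  sweep(sweep(x, 2, mins, "-"), 2, maxs - mins, "/")
}

m <- cbind(CD45 = c(2, 4, 6), SMA = c(10, 20, 30))
minmaxSketch(m)
```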
Function to do percentile normalization
.percentileNorm(x)
x |
Matrix to percentile normalize. |
Percentile normalized version of x
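The manual does not say which percentile is used. A frequent choice for imaging data is to divide each marker by its 99th percentile and cap the result at 1; the sketch below assumes exactly that, and `percentileSketch` is a hypothetical name.

```r
# Hypothetical helper: divide each column by its 99th percentile (assumed)
# and cap values at 1 so that extreme outliers do not dominate
percentileSketch <- function(x, prob = 0.99) {
  x <- as.matrix(x)
  q <- apply(x, 2, quantile, probs = prob, names = FALSE)
  out <- sweep(x, 2, q, "/")
  out[out > 1] <- 1
  out
}

set.seed(1)
m <- cbind(CK7 = rexp(1000), VIM = rexp(1000, rate = 0.1))
res <- percentileSketch(m)
summary(res)
```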
Function to estimate the number of clusters using discriminant analysis. Parts of this function are based on the sigclust2 package by Patrick Kimes; see https://github.com/pkimes/sigclust2
.runDiscriminant(distMat, minClusterSize, alpha = 0.001)
distMat |
A distance matrix |
minClusterSize |
The minimum cluster size |
alpha |
a value between 0 and 1 specifying the significance level used as the cutoff |
Optimal number of clusters
Creates uniformly distributed data with the same dimensionality as the input data. This function was obtained from the Stab package.
.uniformData(data)
data |
A data matrix. |
Uniform random noise with the same dimensions as data (dim(data)).
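A sketch of generating uniform reference data. Drawing each column between that column's observed minimum and maximum is an assumed choice of bounds, and `uniformSketch` is a hypothetical name.

```r
# Hypothetical helper: uniform noise with the same dimensions as `data`,
# each column drawn between that column's min and max (assumed bounds)
uniformSketch <- function(data) {
  data <- as.matrix(data)
  apply(data, 2, function(col) runif(length(col), min(col), max(col)))
}

set.seed(42)
m <- matrix(rnorm(200), ncol = 4)
u <- uniformSketch(m)
dim(u)
```

Structureless data of this kind is the usual reference against which statistics such as the gap statistic compare within-cluster dispersion.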
Cluster the prototypes from the Self Organizing Map. Clustering is done using hierarchical clustering with average linkage.
clusterPrototypes(somModel, numClusters = NULL)
somModel |
the self-organizing map returned by generatePrototypes() |
numClusters |
the number of clusters to generate |
the cluster labels
data("risom_dat")
risomMarkers <- c(
  "CD45", "SMA", "CK7", "CK5", "VIM", "CD31", "PanKRT", "ECAD"
)
prototypes <- generatePrototypes(risom_dat[, risomMarkers])
clusters <- clusterPrototypes(prototypes, 23)
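The average-linkage step can be sketched with base R's `hclust()` and `cutree()`. As a simplifying assumption, a single 1 - Pearson correlation distance between prototypes stands in for FuseSOM's multiview correlation-based distance, and `clusterPrototypesSketch` is a hypothetical name.

```r
# Sketch: average-linkage clustering of SOM prototypes. Assumption: one
# 1 - Pearson correlation distance replaces the multiview distance.
clusterPrototypesSketch <- function(prototypes, numClusters) {
  # Rows are prototypes, columns are markers; cor() correlates columns,
  # so transpose to correlate prototypes with each other
  d <- as.dist(1 - cor(t(prototypes)))
  tree <- hclust(d, method = "average")
  cutree(tree, k = numClusters)
}

set.seed(7)
protos <- rbind(
  matrix(rep(c(5, 1, 1, 1), each = 10), ncol = 4) + rnorm(40, sd = 0.1),
  matrix(rep(c(1, 1, 5, 1), each = 10), ncol = 4) + rnorm(40, sd = 0.1)
)
clusterPrototypesSketch(protos, numClusters = 2)
```

Under this scheme each cell would then inherit the cluster label of its best-matching prototype.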
The function finds the eigenvalues of the sample covariance matrix and returns the number of significant eigenvalues according to the Tracy-Widom test. The function is based on the estkTW function from the SC3 package.
computeGridSize(dataset)
dataset |
The matrix of marker intensities, with samples as rows and markers as columns. |
the optimal grid size.
Elijah Willie
data("risom_dat")
risomMarkers <- c(
  "CD45", "SMA", "CK7", "CK5", "VIM", "CD31", "PanKRT", "ECAD"
)
computeGridSize(risom_dat[, risomMarkers])
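The eigenvalue test can be sketched as follows, modelled on the Tracy-Widom bound used by SC3's estkTW. The constant 3.273 (roughly the 0.001 upper tail of the Tracy-Widom distribution) is carried over as an assumption, rows are treated as observations, and `countSignificantEigen` is a hypothetical name. How FuseSOM converts this count into a grid size is not shown here.

```r
# Sketch: count eigenvalues above a Tracy-Widom bound (SC3-style constants)
countSignificantEigen <- function(dataset) {
  x <- scale(as.matrix(dataset))   # centre and scale each column
  n <- nrow(x)
  p <- ncol(x)
  # Centring and scaling constants for the largest eigenvalue under noise
  muTW <- (sqrt(n - 1) + sqrt(p))^2
  sigmaTW <- (sqrt(n - 1) + sqrt(p)) * (1 / sqrt(n - 1) + 1 / sqrt(p))^(1 / 3)
  bound <- 3.273 * sigmaTW + muTW
  # Eigenvalues of the unnormalised p x p covariance t(x) %*% x
  evals <- eigen(crossprod(x), symmetric = TRUE, only.values = TRUE)$values
  sum(evals > bound)
}

set.seed(3)
n <- 500
latent1 <- rnorm(n)
latent2 <- rnorm(n)
# Two blocks of four markers, each block driven by one shared signal
dat <- cbind(
  sapply(1:4, function(i) latent1 + rnorm(n, sd = 0.1)),
  sapply(1:4, function(i) latent2 + rnorm(n, sd = 0.1))
)
countSignificantEigen(dat)
```

With two independent signals each shared by a block of markers, only two eigenvalues clear the bound.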
A function for estimating the number of clusters using various methods. Methods available are: Discriminant, Distance (Gap, Silhouette, Slope, Jump, and Within Cluster Distance) and Instability.
estimateNumCluster(data, method = c("Discriminant", "Distance"), kSeq = 2:20)
data |
the SOM object generated by generatePrototypes(), or an object of class SingleCellExperiment or SpatialExperiment. |
method |
one of Discriminant, Distance, Stability. By default, everything is run |
kSeq |
a sequence of the number of clusters to try. Default is 2:20 clusters |
A list containing the cluster estimations if a dataframe or matrix is provided
A SingleCellExperiment with a cluster estimate in its metadata if a SingleCellExperiment or SpatialExperiment object is provided
Elijah Willie
data("risom_dat")
risomMarkers <- c(
  "CD45", "SMA", "CK7", "CK5", "VIM", "CD31", "PanKRT", "ECAD"
)
res <- runFuseSOM(risom_dat, markers = risomMarkers, numClusters = 23)
res.est.k <- estimateNumCluster(res$model, kSeq = 2:25)
FuseSOM provides a pipeline for the clustering of highly multiplexed in situ imaging cytometry assays. This pipeline uses the Self Organizing Map architecture coupled with Multiview hierarchical clustering. We also provide functions for normalisation and estimation of the number of clusters.
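The FuseSOM package provides three categories of important functions: normalisation (normaliseData()), clustering (runFuseSOM(), generatePrototypes(), clusterPrototypes()), and estimation of the number of clusters (estimateNumCluster(), optiPlot()).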
A self-organizing map of the marker intensities is generated and its prototypes are returned. If size is NULL, the grid size is determined automatically.
generatePrototypes(data, verbose = FALSE, size = NULL)
data |
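the matrix of marker intensities, with samples as rows and markers as columns |
verbose |
whether progress should be printed |
size |
the side length of the square grid for the Self Organizing Map (e.g. size = 10 for a 10x10 grid); if NULL, it is estimated automatically |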
the self organizing map object
data("risom_dat")
risomMarkers <- c(
  "CD45", "SMA", "CK7", "CK5", "VIM", "CD31", "PanKRT", "ECAD"
)
generatePrototypes(risom_dat[, risomMarkers])
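To show what "prototypes" means, here is a minimal online Self Organizing Map in base R. It does not reproduce generatePrototypes(): the random initialisation, Gaussian neighbourhood, linearly decaying learning rate and radius, number of epochs, and the names `somSketch` and `mapToPrototypes` are all assumptions for illustration.

```r
# Minimal online SOM sketch (assumed settings, not FuseSOM's implementation)
somSketch <- function(data, size = 5, epochs = 10, alpha0 = 0.5) {
  data <- as.matrix(data)
  grid <- expand.grid(x = seq_len(size), y = seq_len(size))
  nodes <- nrow(grid)
  # Initialise prototypes from randomly chosen observations
  protos <- data[sample(nrow(data), nodes, replace = TRUE), , drop = FALSE]
  # Squared distances between grid nodes, used by the neighbourhood
  gridDist2 <- as.matrix(dist(grid))^2
  totalSteps <- epochs * nrow(data)
  radius0 <- size / 2
  step <- 0
  for (e in seq_len(epochs)) {
    for (i in sample(nrow(data))) {
      frac <- step / totalSteps
      alpha <- alpha0 * (1 - frac)
      radius <- max(radius0 * (1 - frac), 0.5)
      obs <- data[i, ]
      # Best-matching unit: prototype closest to the observation
      bmu <- which.min(colSums((t(protos) - obs)^2))
      # Gaussian neighbourhood weights around the BMU on the grid
      h <- exp(-gridDist2[bmu, ] / (2 * radius^2))
      # Pull every prototype towards the observation, weighted by h
      protos <- protos + alpha * h * sweep(-protos, 2, obs, "+")
      step <- step + 1
    }
  }
  list(prototypes = protos, grid = grid)
}

# Index of the best-matching prototype for each observation
mapToPrototypes <- function(data, protos) {
  apply(as.matrix(data), 1, function(obs) which.min(colSums((t(protos) - obs)^2)))
}

set.seed(11)
dat <- rbind(matrix(rnorm(300, mean = 0), ncol = 3),
             matrix(rnorm(300, mean = 5), ncol = 3))
som <- somSketch(dat, size = 4)
bmus <- mapToPrototypes(dat, som$prototypes)
```

Each update is a convex combination of a prototype and an observation, so the learned prototypes stay within the range of the data.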
A function for generating a heat map of marker expression across clusters
markerHeatmap( data, markers = NULL, clusters = NULL, threshold = 2, clusterMarkers = FALSE, fontSize = 14 )
data |
a matrix or dataframe where the rows are samples and columns are markers |
markers |
a list of markers of interest. If not provided, all columns will be used |
clusters |
a vector of cluster labels |
threshold |
the value to threshold the marker expression at |
clusterMarkers |
whether the rows (markers) of the heatmap should be clustered |
fontSize |
the size of the text on the heatmap |
a heatmap with the markers in the rows and clusters in the columns
Elijah Willie
data("risom_dat")
risomMarkers <- c(
  "CD45", "SMA", "CK7", "CK5", "VIM", "CD31", "PanKRT", "ECAD"
)
res <- runFuseSOM(risom_dat, markers = risomMarkers, numClusters = 23)
p.heat <- markerHeatmap(risom_dat, risomMarkers, clusters = res$clusters)
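The matrix behind such a heatmap can be sketched in base R. The assumptions here are that each tile is the mean marker value within a cluster, that markers are z-scored across clusters, and that threshold clips the z-scores to [-threshold, threshold]. `clusterMarkerMatrix` is a hypothetical name, and `stats::heatmap()` stands in for the package's own plotting.

```r
# Sketch: markers x clusters matrix of clipped z-scored cluster means
clusterMarkerMatrix <- function(data, clusters, threshold = 2) {
  data <- as.matrix(data)
  # Mean of each marker within each cluster: clusters x markers
  means <- apply(data, 2, function(col) tapply(col, clusters, mean))
  # z-score each marker across clusters, then clip at the threshold
  z <- scale(means)
  z[z > threshold] <- threshold
  z[z < -threshold] <- -threshold
  # Markers in rows and clusters in columns, as markerHeatmap() describes
  t(z)
}

set.seed(5)
dat <- cbind(CD45 = c(rnorm(50, 10), rnorm(50, 0)),
             ECAD = c(rnorm(50, 0), rnorm(50, 10)))
cl <- rep(c("A", "B"), each = 50)
hm <- clusterMarkerMatrix(dat, cl)
# Base R rendering without dendrograms
heatmap(hm, Rowv = NA, Colv = NA, scale = "none")
```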
The matrix of intensities is normalised using one of four methods: percentile, zscore, arsinh, or minmax.
normaliseData(data, markers, method = "none", cofactor = 5)
data |
the raw intensity scores. |
markers |
the markers of interest. |
method |
the normalisation method |
cofactor |
the cofactor for arsinh normalisation |
Normalised matrix.
Elijah Willie
data("risom_dat")
risomMarkers <- c(
  "CD45", "SMA", "CK7", "CK5", "VIM", "CD31", "PanKRT", "ECAD"
)
normaliseData(risom_dat[, risomMarkers])
The matrix of intensities is normalised using one of four methods: percentile, zscore, arsinh, or minmax.
normalizeData(data, markers, method = "none", cofactor = 5)
data |
the raw intensity scores. |
markers |
the markers of interest. |
method |
the normalization method |
cofactor |
the cofactor for arsinh normalization |
Normalised matrix.
Elijah Willie
data("risom_dat")
risomMarkers <- c(
  "CD45", "SMA", "CK7", "CK5", "VIM", "CD31", "PanKRT", "ECAD"
)
normaliseData(risom_dat[, risomMarkers])
A function for generating an elbow plot of the optimal number of clusters returned by estimateNumCluster(). Methods available are: Gap, Silhouette, Slope, Jump, and Within Cluster Distance (WCD).
optiPlot(data, method = "jump")
data |
the cluster estimates returned by estimateNumCluster(): a list, or a SingleCellExperiment or SpatialExperiment object with the estimates in its metadata |
method |
one of 'jump', 'slope', 'wcd', 'gap', or 'silhouette' |
an elbow plot object where the optimal number of clusters is marked
Elijah Willie
data("risom_dat")
risomMarkers <- c(
  "CD45", "SMA", "CK7", "CK5", "VIM", "CD31", "PanKRT", "ECAD"
)
res <- runFuseSOM(risom_dat, markers = risomMarkers, numClusters = 23)
resEstK <- estimateNumCluster(res$model, kSeq = 2:25)
p <- optiPlot(resEstK, method = "jump")
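As an illustration of the "jump" method, the sketch below computes the jump statistic of Sugar and James (2003): distortions d_k are transformed to d_k^(-p/2) and the k with the largest increase is chosen. The clustering step (average-linkage `hclust` on Euclidean distances), the distortion definition, and the name `jumpSketch` are assumptions; estimateNumCluster() and optiPlot() may compute it differently.

```r
# Sketch of the jump statistic. Assumptions: average-linkage hclust,
# distortion = mean squared distance to the cluster centroid divided by
# the number of markers p, power Y = p / 2, and kSeq = 1, 2, ..., K.
jumpSketch <- function(data, kSeq = 1:10) {
  data <- as.matrix(data)
  p <- ncol(data)
  tree <- hclust(dist(data), method = "average")
  distortion <- sapply(kSeq, function(k) {
    lab <- cutree(tree, k = k)
    # Replace each value by its cluster mean to get per-row centroids
    centroids <- apply(data, 2, function(col) ave(col, lab))
    mean(rowSums((data - centroids)^2)) / p
  })
  transformed <- distortion^(-p / 2)
  # J_k = d_k^(-Y) - d_(k-1)^(-Y), with d_0^(-Y) taken as 0
  jumps <- diff(c(0, transformed))
  list(k = kSeq, jump = jumps, best = kSeq[which.max(jumps)])
}

set.seed(9)
centers <- rbind(c(0, 0), c(5, 0), c(0, 5))
dat <- centers[rep(1:3, each = 50), ] +
  matrix(rnorm(300, sd = 0.1), ncol = 2)
res <- jumpSketch(dat, kSeq = 1:6)
res$best
```

Three tight, well-separated groups produce one dominant jump at k = 3.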
IMC Breast Cancer Data. Data from "A spatial atlas of breast cancer progression using MIBI-TOF and tissue transcriptomics".
data(risom_dat)
An object of class "data.frame"
Mendeley Data, https://data.mendeley.com/datasets/d87vg86zd8/3
T. Risom et al. Transition to invasive breast cancer is associated with progressive changes in the structure and composition of tumor stroma. Cell, 185 (2022), pp. 299-310.
This function accepts a matrix, data frame, SingleCellExperiment, or SpatialExperiment object. For matrices and data frames, markers are assumed to be columns and samples rows.
runFuseSOM( data, markers = NULL, numClusters = NULL, assay = NULL, clusterCol = "clusters", size = NULL, verbose = FALSE )
data |
a matrix, dataframe, SingleCellExperiment or SpatialExperiment object. |
markers |
the markers of interest. If this is not provided, all columns will be used |
numClusters |
the number of clusters to be generated from the data |
assay |
the assay of interest if a SingleCellExperiment or SpatialExperiment object is used |
clusterCol |
the name of the column to store the clusters in |
size |
the side length of the square grid, e.g. size = 10 for a 10x10 grid |
verbose |
whether the progress of the Self Organising Map generation should be printed |
A list containing the SOM model and the cluster labels if a dataframe or matrix is provided
A SingleCellExperiment object with labels in colData, and the SOM model in metadata, if a SingleCellExperiment or SpatialExperiment object is provided
Elijah Willie
data("risom_dat")
risomMarkers <- c(
  "CD45", "SMA", "CK7", "CK5", "VIM", "CD31", "PanKRT", "ECAD"
)
res <- runFuseSOM(
  risom_dat,
  markers = risomMarkers, numClusters = 23, size = 8
)
These functions were adapted, with major modifications, from https://rdrr.io/rforge/yasomi/
## Default S3 method: somInitPca(data, somGrid, weights, with.princomp = FALSE, ...)
data |
The data to which the SOM will be fitted, a matrix or data frame of observations (which should be scaled) |
somGrid |
A |
weights |
Optional weights for the data points |
with.princomp |
Whether princomp should be used instead of prcomp to compute the principal components when no weights are given (see details) |
... |
not used |
A list containing: prototype, a matrix of appropriate initial prototypes, and data.pca, the results of the PCA conducted on the data
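The returned pair, initial prototypes plus the PCA, can be illustrated with a PCA-based initialisation sketch. Assumptions: prototypes lie on an evenly spaced size x size grid spanning plus or minus 2 standard deviations along the first two principal components; weights, princomp, and the somGrid object are omitted; and `somInitPcaSketch` is a hypothetical name, not the yasomi or FuseSOM function.

```r
# Sketch of PCA initialisation for a size x size SOM grid (assumed spread)
somInitPcaSketch <- function(data, size) {
  data <- as.matrix(data)
  pca <- prcomp(data, center = TRUE, scale. = FALSE)
  # Evenly spaced grid coordinates along the first two components
  steps <- seq(-2, 2, length.out = size)
  coords <- as.matrix(expand.grid(pc1 = steps, pc2 = steps))
  # Scale by each component's standard deviation, map back to data space
  scores <- sweep(coords, 2, pca$sdev[1:2], "*")
  prototypes <- sweep(scores %*% t(pca$rotation[, 1:2]), 2, pca$center, "+")
  list(prototype = prototypes, data.pca = pca)
}

set.seed(2)
dat <- matrix(rnorm(400), ncol = 4)
init <- somInitPcaSketch(dat, size = 5)
dim(init$prototype)
```

Because the grid coordinates are symmetric around zero, the average initial prototype equals the data mean, and the prototypes start spread along the directions of greatest variance rather than at random.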