| Title: | POpulation-based Evaluation Metrics |
|---|---|
| Description: | This package provides a comprehensive set of external and internal evaluation metrics. It includes metrics for assessing partitions or fuzzy partitions derived from clustering results, as well as for evaluating subpopulation identification results within embeddings or graph representations. Additionally, it provides metrics for comparing spatial domain detection results against ground truth labels, and tools for visualizing spatial errors. |
| Authors: | Siyuan Luo [cre, aut] (ORCID: <https://orcid.org/0009-0007-6404-3244>), Pierre-Luc Germain [aut, ctb] (ORCID: <https://orcid.org/0000-0003-3418-4218>) |
| Maintainer: | Siyuan Luo <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.5.0 |
| Built: | 2026-05-30 07:09:24 UTC |
| Source: | https://github.com/bioc/poem |
Computes the CDbw-index (Halkidi and Vazirgiannis 2008; Halkidi,
Vazirgiannis and Hennig, 2015). This function is directly copied from the
fpc CRAN package and was written by Christian Hennig. It is included here
to reduce the package dependencies (since fpc has a few not-so-light
dependencies that aren't required here).
CDbw(x, labels, r = 10, s = seq(0.1, 0.8, by = 0.1), clusterstdev = TRUE, trace = FALSE)
x |
Something that can be coerced into a numerical matrix, with elements as rows. |
labels |
A vector of integers with length |
r |
Number of cluster border representatives. |
s |
Vector of shrinking factors. |
clusterstdev |
Logical. If |
trace |
Logical; whether to print processing info. |
A vector with the following values (see refs for details):
cdbw |
value of CDbw index (the higher the better). |
cohesion |
cohesion. |
compactness |
compactness. |
sep |
separation. |
Christian Hennig
Halkidi, M. and Vazirgiannis, M. (2008) A density-based cluster validity approach using multi-representatives. Pattern Recognition Letters 29, 773-786.
Halkidi, M., Vazirgiannis, M. and Hennig, C. (2015)
Method-independent indices for cluster validation. In C. Hennig, M. Meila,
F. Murtagh, R. Rocci (eds.) Handbook of Cluster Analysis, CRC Press/Taylor
& Francis, Boca Raton.
d1 <- mockData()
CDbw(d1[,seq_len(2)], d1[,3])
The CHAOS score measures clustering performance by calculating the mean length of the graph edges in the 1-nearest neighbor (1NN) graph for each cluster, averaged across clusters. A lower CHAOS score indicates better spatial domain clustering performance.
CHAOS(labels, location, BNPARAM = NULL)
labels |
Cluster labels. |
location |
A numeric data matrix containing location information, where rows are points and columns are location dimensions. |
BNPARAM |
BNPARAM object passed to |
A numeric value for CHAOS score.
data(sp_toys)
data <- sp_toys
CHAOS(data$label, data[,c("x", "y")])
CHAOS(data$p1, data[,c("x", "y")])
CHAOS(data$p2, data[,c("x", "y")])
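The description above (mean 1NN edge length within each cluster, then averaged across clusters) can be illustrated with a minimal base-R sketch. This is not the package's implementation: `chaos_sketch` is a hypothetical helper, it uses exact distances from `stats::dist()` rather than a BiocNeighbors search, and it assumes no rescaling of the coordinates, which the actual `CHAOS()` function may handle differently.

```r
# Hypothetical helper, for illustration only (not part of poem).
chaos_sketch <- function(labels, location) {
  location <- as.matrix(location)
  per_cluster <- vapply(split(seq_along(labels), labels), function(idx) {
    # A cluster with a single point has no 1NN edge.
    if (length(idx) < 2) return(NA_real_)
    d <- as.matrix(dist(location[idx, , drop = FALSE]))
    diag(d) <- Inf               # exclude self-matches
    mean(apply(d, 1, min))       # mean distance to the nearest same-cluster point
  }, numeric(1))
  mean(per_cluster, na.rm = TRUE) # average across clusters
}

# Toy data: a tight cluster (spacing 1) and a looser one (spacing 2).
toy_loc <- cbind(x = c(0, 1, 2, 10, 12), y = 0)
toy_lab <- c(1, 1, 1, 2, 2)
chaos_value <- chaos_sketch(toy_lab, toy_loc)
```

In this toy case the per-cluster means are 1 and 2, so the sketch returns their average.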
Compute the DBCV (Density-Based Clustering Validation) metric.
dbcv(X, labels, distance = "euclidean", noise_id = -1, check_duplicates = FALSE, use_igraph_mst = TRUE, BPPARAM = BiocParallel::SerialParam(), ...)
X |
Numeric matrix of samples. |
labels |
Integer vector of cluster IDs. |
distance |
String specifying the distance metric. |
noise_id |
Integer, the cluster ID in |
check_duplicates |
Logical flag to check for duplicate samples. |
use_igraph_mst |
Logical flag to use |
BPPARAM |
BiocParallel params for multithreading (default none) |
... |
Ignored |
This implementation will not fully reproduce the results of other existing implementations (e.g. https://github.com/FelSiq/DBCV) due to the different algorithms used for computing the Minimum Spanning Tree.
A list:
vcs |
Numeric vector of validity index for each cluster. |
dbcv |
Numeric value representing the overall DBCV metric. |
Davoud Moulavi, et al. 2014; 10.1137/1.9781611973440.96.
data(noisy_moon)
data <- noisy_moon
dbcv(data[, c("x", "y")], data$kmeans_label)
dbcv(data[, c("x", "y")], data$hdbscan_label)
Calculates the Entropy-based Local indicator of Spatial Association (ELSA) scores, which consist of Ea, Ec and the overall ELSA.
ELSA(labels, location, k = 10)
labels |
Cluster labels. |
location |
A numerical matrix containing the location information, with rows as samples and columns as location dimensions. |
k |
Number of nearest neighbors. |
A dataframe containing the Ea, Ec and ELSA for all samples in the dataset.
Naimi, Babak, et al., 2019; 10.1016/j.spasta.2018.10.001
data(sp_toys)
data <- sp_toys
ELSA(data$label, data[,c("x", "y")], k=6)
ELSA(data$p1, data[,c("x", "y")], k=6)
ELSA(data$p2, data[,c("x", "y")], k=6)
Computes k nearest neighbors from embedding.
emb2knn(x, k, BNPARAM = NULL)
x |
A numeric matrix (with features as columns and items as rows) from which nearest neighbors will be computed. |
k |
The number of nearest neighbors. |
BNPARAM |
A BiocNeighbors parameter object to compute kNNs. Ignored unless the input is a matrix or data.frame. If omitted, the Annoy approximation will be used if there are more than 500 elements. |
A knn list.
d1 <- mockData()
emb2knn(as.matrix(d1[,seq_len(2)]),k=5)
Computes shared nearest neighbors from embedding.
emb2snn(x, k, type = "rank", BNPARAM = NULL)
x |
A numeric matrix (with features as columns and items as rows) from which nearest neighbors will be computed. |
k |
The number of nearest neighbors. |
type |
A string specifying the type of weighting scheme to use for
shared neighbors.
Possible choices include "rank", "number", and "jaccard". See |
BNPARAM |
A BiocNeighbors parameter object to compute kNNs. Ignored unless the input is a matrix or data.frame. If omitted, the Annoy approximation will be used if there are more than 500 elements. |
An igraph object.
d1 <- mockData()
emb2snn(as.matrix(d1[,seq_len(2)]),k=5)
For a given dataset, find the k nearest neighbors for each
object based on their spatial locations, with the option of handling ties.
findSpatialKNN(location, k, keep_ties = TRUE, useMedianDist = FALSE, BNPARAM = NULL)
location |
A numeric data matrix containing location information, where rows are points and columns are location dimensions. |
k |
The number of nearest neighbors to look at. |
keep_ties |
A Boolean indicating if ties are counted once or not. If
TRUE, neighbors of the same distances will be included even if it means
returning more than |
useMedianDist |
Use the median distance to the k nearest neighbors as the
maximum distance to be included. Ignored if |
BNPARAM |
BNPARAM object passed to |
A list of indices.
data(sp_toys)
data <- sp_toys
findSpatialKNN(data[,c("x", "y")], k=6)
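To make the tie-handling idea concrete, here is a small base-R sketch of a k-nearest-neighbor search that optionally keeps all neighbors tied at the k-th distance. It is an assumption-laden illustration rather than the package's code: `knn_with_ties` is a hypothetical name, it computes the full distance matrix with `stats::dist()` (fine for toy data only), and it does not cover the `useMedianDist` option.

```r
# Hypothetical helper, for illustration only (not part of poem).
knn_with_ties <- function(location, k, keep_ties = TRUE) {
  d <- as.matrix(dist(as.matrix(location)))
  diag(d) <- Inf                            # a point is not its own neighbor
  lapply(seq_len(nrow(d)), function(i) {
    ord <- order(d[i, ])
    if (!keep_ties) return(ord[seq_len(k)]) # exactly k neighbors
    cutoff <- d[i, ord[k]]                  # distance to the k-th neighbor
    unname(which(d[i, ] <= cutoff))         # everything at or below that distance
  })
}

# 3 x 3 unit grid: the center point (index 5) has four neighbors at distance 1.
grid <- expand.grid(x = 0:2, y = 0:2)
nn_ties   <- knn_with_ties(grid, k = 3, keep_ties = TRUE)
nn_noties <- knn_with_ties(grid, k = 3, keep_ties = FALSE)
```

With `k = 3`, the center of the grid gets four neighbors when ties are kept and three when they are not, which is the "more than k" behavior the `keep_ties` argument refers to.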
Computes fuzzy-hard versions of pair-sorting partition metrics to compare a
hard clustering with both a fuzzy and hard truth. This was especially
designed for cases where the fuzzy truth represents an uncertainty of a hard
truth. Briefly put, the maximum of the pair concordance between the
clustering and either the hard or the fuzzy truth is used, and the hard truth
is used to compute completeness. See fuzzyPartitionMetrics for
the more standard implementation of the metrics.
fuzzyHardMetrics(hardTrue, fuzzyTrue, hardPred, nperms = NULL, returnElementPairAccuracy = FALSE, lowMemory = NULL, verbose = TRUE, BPPARAM = BiocParallel::SerialParam())
hardTrue |
An atomic vector coercible to a factor or integer vector
containing the true hard labels. Must have the same length as |
fuzzyTrue |
An object coercible to a numeric matrix with membership
probability of elements (rows) in clusters (columns). Must have the same
number of rows as the length of |
hardPred |
An atomic vector coercible to a factor or integer vector containing the predicted hard labels. |
nperms |
The number of permutations (for correction for chance). If NULL (default), a first set of 10 permutations will be run to estimate whether the variation across permutations is above 0.0025, in which case more (max 1000) permutations will be run. |
returnElementPairAccuracy |
Logical. If TRUE, returns the per-element pair accuracy instead of the various partition-level and dataset-level metrics. Default FALSE. |
lowMemory |
Logical; whether to use the slower, low-memory algorithm. By default this is enabled if the projected memory usage is higher than ~2GB. |
verbose |
Logical; whether to print info and warnings, including the standard error of the mean across permutations (giving an idea of the precision of the adjusted metrics). |
BPPARAM |
BiocParallel params for multithreading (default none) |
A list of metrics:
NDC |
Hullermeier's NDC (fuzzy rand index) |
ACI |
D'Ambrosio's Adjusted Concordance Index (ACI), i.e. a permutation-based fuzzy version of the adjusted Rand index. |
fuzzyWH |
Fuzzy Wallace Homogeneity index |
fuzzyWC |
Fuzzy Wallace Completeness index |
fuzzyAWH |
Adjusted fuzzy Wallace Homogeneity index |
fuzzyAWC |
Adjusted fuzzy Wallace Completeness index |
Pierre-Luc Germain
Hullermeier et al. 2012; 10.1109/TFUZZ.2011.2179303;
D'Ambrosio et al. 2021; 10.1007/s00357-020-09367-0
# generate a fuzzy truth:
fuzzyTrue <- matrix(c(
  0.95, 0.025, 0.025,
  0.98, 0.01, 0.01,
  0.96, 0.02, 0.02,
  0.95, 0.04, 0.01,
  0.95, 0.01, 0.04,
  0.99, 0.005, 0.005,
  0.025, 0.95, 0.025,
  0.97, 0.02, 0.01,
  0.025, 0.025, 0.95),
  ncol = 3, byrow=TRUE)
# a hard truth:
hardTrue <- apply(fuzzyTrue,1,FUN=which.max)
# some predicted labels:
hardPred <- c(1,1,1,1,1,1,2,2,2)
fuzzyHardMetrics(hardTrue, fuzzyTrue, hardPred, nperms=3)
Per-element maximal concordance between a hard clustering and hard and fuzzy ground truth labels.
fuzzyHardSpotConcordance(hardTrue, fuzzyTrue, hardPred, useNegatives = TRUE, verbose = TRUE)
hardTrue |
A vector of true cluster labels |
fuzzyTrue |
An object coercible to a numeric matrix with membership
probability of elements (rows) in clusters (columns). Must have the same
number of rows as the length of |
hardPred |
A vector of predicted cluster labels |
useNegatives |
Logical; whether to include negative pairs in the concordance score (tends to result in a larger overall concordance and lower dynamic range of the score). Default TRUE. |
verbose |
Logical; whether to print expected memory usage for large datasets. |
A numeric vector of concordance scores for each element of hardPred
# generate a fuzzy truth:
fuzzyTrue <- matrix(c(
  0.95, 0.025, 0.025,
  0.98, 0.01, 0.01,
  0.96, 0.02, 0.02,
  0.95, 0.04, 0.01,
  0.95, 0.01, 0.04,
  0.99, 0.005, 0.005,
  0.025, 0.95, 0.025,
  0.97, 0.02, 0.01,
  0.025, 0.025, 0.95),
  ncol = 3, byrow=TRUE)
# a hard truth:
hardTrue <- apply(fuzzyTrue,1,FUN=which.max)
# some predicted labels:
hardPred <- c(1,1,1,1,1,1,2,2,2)
fuzzyHardSpotConcordance(hardTrue, fuzzyTrue, hardPred)
Computes fuzzy versions of pair-sorting partition metrics. This is largely
based on the permutation-based implementation by Antonio D'Ambrosio from the
ConsRankClass package, modified to also compute the fuzzy versions of the
adjusted Wallace indices, implement multithreading, and adjust the number of
permutations according to their variability.
fuzzyPartitionMetrics(P, Q, computeWallace = TRUE, nperms = NULL, verbose = TRUE, returnElementPairAccuracy = FALSE, BPPARAM = BiocParallel::SerialParam(), tnorm = c("product", "min", "lukasiewicz"))
P |
An object coercible to a numeric matrix with membership probability of elements (rows) in ground-truth classes (columns). |
Q |
An object coercible to a numeric matrix with membership probability
of elements (rows) in predicted clusters (columns). Must have the same
number of rows as |
computeWallace |
Logical; whether to compute the individual fuzzy versions of the Wallace indices (increases running time). |
nperms |
The number of permutations (for correction for chance). If NULL (default), a first set of 10 permutations will be run to estimate whether the variation across permutations is above 0.0025, in which case more (max 1000) permutations will be run. |
verbose |
Logical; whether to print info and warnings, including the standard error of the mean across permutations (giving an idea of the precision of the adjusted metrics). |
returnElementPairAccuracy |
Logical. If TRUE, returns the per-element pair accuracy instead of the various partition-level and dataset-level metrics. Default FALSE. |
BPPARAM |
BiocParallel params for multithreading (default none) |
tnorm |
Which type of t-norm operation to use for class membership of pairs (either product, min, or lukasiewicz) when calculating the Wallace indices. Does not influence the NDC/ACI metrics. |
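For readers unfamiliar with the three t-norm names, the sketch below shows their standard textbook definitions for two membership values `a` and `b`. The helper `tnorm_sketch` is hypothetical and only illustrates the operations themselves; how the package combines them into pair memberships for the Wallace indices is not shown here.

```r
# Hypothetical helper, for illustration only (not part of poem).
# Standard t-norm definitions for membership values in [0, 1].
tnorm_sketch <- function(a, b, tnorm = c("product", "min", "lukasiewicz")) {
  tnorm <- match.arg(tnorm)
  switch(tnorm,
    product     = a * b,              # product t-norm
    min         = pmin(a, b),         # minimum (Goedel) t-norm
    lukasiewicz = pmax(0, a + b - 1)  # Lukasiewicz t-norm
  )
}

t_prod <- tnorm_sketch(0.9, 0.8, "product")
t_min  <- tnorm_sketch(0.9, 0.8, "min")
t_luk  <- tnorm_sketch(0.9, 0.8, "lukasiewicz")
```

For the same inputs the three operations are ordered Lukasiewicz ≤ product ≤ min, so the choice affects how strictly two elements must both belong to a class before the pair counts as sharing it.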
When returnElementPairAccuracy is FALSE, return a list of
metrics:
NDC |
Hullermeier's NDC (fuzzy rand index) |
ACI |
D'Ambrosio's Adjusted Concordance Index (ACI), i.e. a permutation-based fuzzy version of the adjusted Rand index. |
fuzzyWH |
Fuzzy Wallace Homogeneity index |
fuzzyWC |
Fuzzy Wallace Completeness index |
fuzzyAWH |
Adjusted fuzzy Wallace Homogeneity index |
fuzzyAWC |
Adjusted fuzzy Wallace Completeness index |
Pierre-Luc Germain
Hullermeier et al. 2012; 10.1109/TFUZZ.2011.2179303;
D'Ambrosio et al. 2021; 10.1007/s00357-020-09367-0
# generate fuzzy partitions:
Computes embedding-based metrics for the specified level.
getEmbeddingMetrics(x, labels, metrics = NULL, distance = "euclidean", level = "class", ...)
x |
A data.frame or matrix (with features as columns and items as rows) from which the metrics will be computed. |
labels |
A vector containing the labels of the predicted clusters. Must be a vector of characters, integers, numerics, or a factor, but not a list. |
metrics |
The metrics to compute. See details. |
distance |
The distance metric to use (default euclidean). |
level |
The level to calculate the metrics. Options include
|
... |
Optional arguments. See details. |
The allowed values for metrics depend on the value of level:
If level = "element", the allowed metrics are: "SW".
If level = "class", the allowed metrics are: "meanSW", "minSW",
"pnSW", "dbcv".
If level = "dataset", the allowed metrics are: "meanSW",
"meanClassSW", "pnSW", "minClassSW", "cdbw", "cohesion",
"compactness", "sep", "dbcv".
The function(s) to which the optional arguments ... are passed depend on the
value of level:
If level = "element", optional arguments are passed to stats::dist().
If level = "class", optional arguments are passed to dbcv().
If level = "dataset", optional arguments are passed to dbcv() or
CDbw().
A data.frame of metrics.
d1 <- mockData()
getEmbeddingMetrics(d1[,seq_len(2)], labels=d1$class, metrics=c("meanSW", "minSW", "pnSW", "dbcv"), level="class")
Get fuzzy representation of labels according to the spatial neighborhood label composition.
getFuzzyLabel(labels, location, k = 6, alpha = 0.5, ...)
labels |
An atomic vector of cluster labels |
location |
A matrix or data.frame of coordinates |
k |
The desired number of nearest neighbors |
alpha |
The parameter controlling to what extent the spot itself
contributes to the class composition calculation. |
... |
Passed to |
A matrix of fuzzy memberships.
data(sp_toys)
data <- sp_toys
getFuzzyLabel(data$label, data[,c("x", "y")], k=6)
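As a rough illustration of what a neighborhood-based fuzzy label could look like, the sketch below mixes each spot's own one-hot label with the label composition of its k nearest neighbors. The mixing formula (`alpha` times the spot's own label plus `1 - alpha` times the neighborhood composition) is an assumption made for this example, as is the helper name `fuzzy_label_sketch`; the actual `getFuzzyLabel()` may weight or select neighbors differently.

```r
# Hypothetical helper, for illustration only (not part of poem).
# Assumed formula: alpha * own one-hot label + (1 - alpha) * kNN label composition.
# Assumes at least two distinct labels.
fuzzy_label_sketch <- function(labels, location, k = 6, alpha = 0.5) {
  labels <- as.factor(labels)
  d <- as.matrix(dist(as.matrix(location)))
  diag(d) <- Inf                                   # exclude the spot itself
  onehot <- outer(as.integer(labels), seq_along(levels(labels)), "==") * 1
  colnames(onehot) <- levels(labels)
  t(vapply(seq_along(labels), function(i) {
    nn <- order(d[i, ])[seq_len(k)]                # k nearest neighbors of spot i
    alpha * onehot[i, ] + (1 - alpha) * colMeans(onehot[nn, , drop = FALSE])
  }, numeric(nlevels(labels))))
}

# Six spots on a line; the second "a" spot borders the "b" domain.
toy_loc <- cbind(x = c(0, 1, 2, 10, 11, 12))
toy_lab <- c("a", "a", "b", "b", "b", "b")
fuzzy <- fuzzy_label_sketch(toy_lab, toy_loc, k = 2, alpha = 0.5)
```

Each row of the result sums to 1, and spots deep inside a domain keep a membership of 1 for their own class, while spots near a boundary share some membership with the neighboring class.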
Computes a selection of external fuzzy clustering evaluation metrics.
getFuzzyPartitionMetrics(hardTrue = NULL, fuzzyTrue = NULL, hardPred = NULL, fuzzyPred = NULL, metrics = c("fuzzyWH", "fuzzyAWH", "fuzzyWC", "fuzzyAWC"), level = "class", nperms = NULL, verbose = TRUE, BPPARAM = BiocParallel::SerialParam(), useNegatives = TRUE, usePairs = NULL, lowMemory = NULL, ...)
hardTrue |
An atomic vector coercible to a factor or integer vector containing the true hard labels. |
fuzzyTrue |
An object coercible to a numeric matrix with membership probability of elements (rows) in clusters (columns). |
hardPred |
An atomic vector coercible to a factor or integer vector containing the predicted hard labels. |
fuzzyPred |
An object coercible to a numeric matrix with membership probability of elements (rows) in clusters (columns). |
metrics |
The metrics to compute. See details. |
level |
The level to calculate the metrics. Options include
|
nperms |
The number of permutations (for correction for chance). If NULL (default), a first set of 10 permutations will be run to estimate whether the variation across permutations is above 0.0025, in which case more (max 1000) permutations will be run. |
verbose |
Logical; whether to print info and warnings, including the standard error of the mean across permutations (giving an idea of the precision of the adjusted metrics). |
BPPARAM |
BiocParallel params for multithreading (default none) |
useNegatives |
Logical; whether to include negative pairs in the concordance score (tends to result in a larger overall concordance and lower dynamic range of the score). Default TRUE. |
usePairs |
Logical; whether to compute over pairs instead of elements. Recommended and TRUE by default. |
lowMemory |
Logical, whether to use a low memory mode. This is only
useful when |
... |
Optional arguments for |
The allowed values for metrics depend on the value of level:
If level = "element", the allowed metrics are: "fuzzySPC".
If level = "class", the allowed metrics are: "fuzzyWH",
"fuzzyAWH", "fuzzyWC", "fuzzyAWC".
If level = "dataset", the allowed metrics are: "fuzzyRI",
"fuzzyARI", "fuzzyWH", "fuzzyAWH", "fuzzyWC", "fuzzyAWC".
A dataframe of metric results.
# generate fuzzy partitions:
m1 <- matrix(c(
  0.95, 0.025, 0.025,
  0.98, 0.01, 0.01,
  0.96, 0.02, 0.02,
  0.95, 0.04, 0.01,
  0.95, 0.01, 0.04,
  0.99, 0.005, 0.005,
  0.025, 0.95, 0.025,
  0.97, 0.02, 0.01,
  0.025, 0.025, 0.95),
  ncol = 3, byrow=TRUE)
m2 <- matrix(c(
  0.95, 0.025, 0.025,
  0.98, 0.01, 0.01,
  0.96, 0.02, 0.02,
  0.025, 0.95, 0.025,
  0.02, 0.96, 0.02,
  0.01, 0.98, 0.01,
  0.05, 0.05, 0.95,
  0.02, 0.02, 0.96,
  0.01, 0.01, 0.98),
  ncol = 3, byrow=TRUE)
colnames(m1) <- colnames(m2) <- LETTERS[seq_len(3)]
getFuzzyPartitionMetrics(fuzzyTrue=m1,fuzzyPred=m2, level="class")
# generate a fuzzy truth:
fuzzyTrue <- matrix(c(
  0.95, 0.025, 0.025,
  0.98, 0.01, 0.01,
  0.96, 0.02, 0.02,
  0.95, 0.04, 0.01,
  0.95, 0.01, 0.04,
  0.99, 0.005, 0.005,
  0.025, 0.95, 0.025,
  0.97, 0.02, 0.01,
  0.025, 0.025, 0.95),
  ncol = 3, byrow=TRUE)
# a hard truth:
hardTrue <- apply(fuzzyTrue,1,FUN=which.max)
# some predicted labels:
hardPred <- c(1,1,1,1,1,1,2,2,2)
getFuzzyPartitionMetrics(hardPred=hardPred, hardTrue=hardTrue, fuzzyTrue=fuzzyTrue, nperms=3, level="class")
getFuzzyPartitionMetrics(hardTrue=hardPred, hardPred=hardTrue, fuzzyPred=fuzzyTrue, nperms=3, level="class")
Computes a selection of graph evaluation metrics using class labels.
getGraphMetrics(x, labels, metrics = NULL, directed = NULL, k = 10, shared = FALSE, level = "class", ...)
x |
Either an igraph object, a list of nearest neighbors (see details below), or a data.frame or matrix (with features as columns and items as rows) from which nearest neighbors will be computed. |
labels |
Either a factor or a character vector indicating the true class
label of each element (i.e. row or vertex) of |
metrics |
The metrics to compute. See details. |
directed |
Logical; whether to compute the metrics in a directed fashion. If left to NULL, conventional choices will be made per metric (adhesion, cohesion, PWC, AMSP undirected, others directed). |
k |
The number of nearest neighbors to compute and/or use. Can be
omitted if |
shared |
Logical; whether to use a shared nearest neighbor network
instead of a nearest neighbor network. Ignored if |
level |
The level to calculate the metrics. Options include
|
... |
The allowed values for metrics depend on the value of level:
If level = "element", the allowed metrics are: "SI","ISI","NP",
"NCE" (see below for details).
If level = "class", the allowed metrics are:
"SI": Simpson’s Index.
"ISI": Inverse Simpson’s Index
"NP": Neighborhood Purity
"AMSP": Adjusted Mean Shortest Path
"PWC": Proportion of Weakly Connected
"NCE": Neighborhood Class Enrichment
"adhesion": adhesion of a graph, i.e. the minimum number of edges
that must be removed to split a graph.
"cohesion": cohesion of a graph, i.e. the minimum number of nodes
that must be removed to split a graph.
If level = "dataset", the allowed metrics are: "SI","ISI",
"NP","AMSP","PWC","NCE", "adhesion","cohesion".
A data.frame of metrics.
d1 <- mockData()
getGraphMetrics(d1[,seq_len(2)], labels=d1$class, level="class")
Per-element local concordance between a clustering and a ground truth
getNeighboringPairConcordance(true, pred, location, k = 20L, useNegatives = FALSE, distWeights = TRUE, BNPARAM = NULL)
true |
A vector of true class labels |
pred |
A vector of predicted clusters |
location |
A matrix or data.frame with spatial dimensions as columns.
Alternatively, a nearest neighbor object as produced by
|
k |
Approximate number of nearest neighbors to consider |
useNegatives |
Logical; whether to include the concordance of negative pairs in the score (default FALSE). |
distWeights |
Logical; whether to weight concordance by distance (default TRUE). |
BNPARAM |
A BiocNeighbors parameter object to compute kNNs. Ignored unless the input is a matrix or data.frame. If omitted, the Annoy approximation will be used if there are more than 500 elements. |
A vector of concordance scores
data(sp_toys)
data <- sp_toys
getNeighboringPairConcordance(data$label, data$p1, data[,c("x", "y")], k=6)
Per-element pair concordance between a clustering and a ground truth. Note
that by default, negative pairs (i.e. that are split in both the predicted
and true groupings) are not counted. To count them (as in the standard Rand
Index), use useNegatives=TRUE.
getPairConcordance(true, pred, usePairs = TRUE, useNegatives = FALSE, adjust = FALSE)
true |
A vector of true class labels |
pred |
A vector of predicted clusters |
usePairs |
Logical; whether to compute over pairs instead of elements. Recommended and TRUE by default. |
useNegatives |
Logical; whether to include the consistency of negative pairs in the score (default FALSE). |
adjust |
Logical; whether to adjust for chance. Only implemented for
|
A vector of concordance scores
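The effect of counting or ignoring negative pairs can be seen in a small base-R sketch of per-element pair concordance. This is a simplified illustration under stated assumptions, not the package's algorithm: `pair_concordance_sketch` is a hypothetical helper, it builds full n-by-n matrices (so it only suits small inputs), and it covers neither the `usePairs` nor the `adjust` options.

```r
# Hypothetical helper, for illustration only (not part of poem).
pair_concordance_sketch <- function(true, pred, useNegatives = FALSE) {
  stopifnot(length(true) == length(pred))
  same_true <- outer(true, true, "==")   # pair grouped together in the truth?
  same_pred <- outer(pred, pred, "==")   # pair grouped together in the prediction?
  diag(same_true) <- diag(same_pred) <- NA
  concordant <- same_true == same_pred
  if (!useNegatives) {
    # Drop negative pairs, i.e. pairs split in both groupings.
    concordant[which(!same_true & !same_pred)] <- NA
  }
  rowMeans(concordant, na.rm = TRUE)     # per-element fraction of concordant pairs
}

toy_true <- c("A", "A", "B", "B")
toy_pred <- c(1, 1, 1, 2)
pc_with_neg    <- pair_concordance_sketch(toy_true, toy_pred, useNegatives = TRUE)
pc_without_neg <- pair_concordance_sketch(toy_true, toy_pred, useNegatives = FALSE)
```

With negatives included, the mean of the per-element scores in this toy case equals the Rand index (0.5); excluding them lowers the score of the last element, whose only agreements were negative pairs.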
Computes a selection of external evaluation metrics for partitions.
getPartitionMetrics(true, pred, metrics = NULL, level = "class", ...)
true |
A vector containing the labels of the true classes. Must be a vector of characters, integers, numerics, or a factor, but not a list. |
pred |
A vector containing the labels of the predicted clusters. Must be a vector of characters, integers, numerics, or a factor, but not a list. |
metrics |
The metrics to compute. If omitted, main metrics will be computed. See details. |
level |
The level to calculate the metrics. Options include "element",
|
... |
Optional arguments for MI, VI, or VM. See |
The allowed values for metrics depend on the value of level:
If level = "element", the allowed metrics are:
"SPC": Spot-wise Pair Concordance.
"ASPC": Adjusted Spot-wise Pair Concordance.
If level = "class", the allowed metrics are: "WC","WH","AWC",
"AWH","FM" (see below for details).
If level = "dataset", the allowed metrics are:
"RI": Rand Index
"WC": Wallace Completeness
"WH": Wallace Homogeneity
"ARI": Adjusted Rand Index
"AWC": Adjusted Wallace Completeness
"AWH": Adjusted Wallace Homogeneity
"NCR": Normalized class size Rand index
"MI": Mutual Information
"AMI": Adjusted Mutual Information
"VI": Variation of Information
"EH": (Entropy-based) Homogeneity
"EC": (Entropy-based) Completeness
"VM": V-measure
"FM": F-measure/weighted average F1 score
"VDM": Van Dongen Measure
"MHM": Meila-Heckerman Measure
"MMM": Maximum-Match Measure
"Mirkin": Mirkin Metric
"Accuracy": Set Matching Accuracy
A data.frame of metrics.
true <- rep(LETTERS[seq_len(3)], each=10)
pred <- c(rep("A", 8), rep("B", 9), rep("C", 3), rep("D", 10))
getPartitionMetrics(true, pred, level="class")
getPartitionMetrics(true, pred, level="dataset")
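Several of the dataset-level metrics listed above are pair-counting measures. The sketch below computes three of them from a contingency table, assuming the usual definitions: the Rand index as the fraction of concordant pairs, Wallace Homogeneity as the fraction of pairs grouped together in `pred` that are also together in `true`, and Wallace Completeness as the converse. `pair_metrics_sketch` is a hypothetical helper and is not guaranteed to match the package's output in every edge case.

```r
# Hypothetical helper, for illustration only (not part of poem).
pair_metrics_sketch <- function(true, pred) {
  tab <- unclass(table(true, pred))
  total     <- choose(sum(tab), 2)          # all element pairs
  both      <- sum(choose(tab, 2))          # together in true and in pred
  same_true <- sum(choose(rowSums(tab), 2)) # together in true
  same_pred <- sum(choose(colSums(tab), 2)) # together in pred
  c(RI = (total + 2 * both - same_true - same_pred) / total,
    WC = both / same_true,
    WH = both / same_pred)
}

# Same toy labels as in the example above.
true <- rep(LETTERS[seq_len(3)], each = 10)
pred <- c(rep("A", 8), rep("B", 9), rep("C", 3), rep("D", 10))
pm <- pair_metrics_sketch(true, pred)
```

On these labels, 98 pairs are grouped together in both partitions, out of 135 same-class pairs and 112 same-cluster pairs, which gives a completeness of 98/135 and a homogeneity of 98/112.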
A generic function to calculate spatial external metrics. It can be applied
to raw components (true, pred, location) or directly to a
SpatialExperiment object.
getSpatialExternalMetrics(object = NULL, true, pred, location = NULL, k = 6, alpha = 0.5, level = "class", metrics = NULL, fuzzy_true = TRUE, fuzzy_pred = FALSE, ...)
## S4 method for signature 'missing'
getSpatialExternalMetrics(object = NULL, true, pred, location = NULL, k = 6, alpha = 0.5, level = "class", metrics = NULL, fuzzy_true = TRUE, fuzzy_pred = FALSE, ...)
## S4 method for signature 'SpatialExperiment'
getSpatialExternalMetrics(object = NULL, true, pred, location = NULL, k = 6, alpha = 0.5, level = "class", metrics = NULL, fuzzy_true = TRUE, fuzzy_pred = FALSE, ...)
object |
The main input. Can be a |
true |
When |
pred |
When |
location |
A matrix or data.frame of coordinates |
k |
The number of neighbors used when calculating the fuzzy class memberships for fuzzy metrics, or when calculating the weighted accuracy. |
alpha |
The parameter controlling to what extent the spot itself
contributes to the class composition calculation. |
level |
The level to calculate the metrics. Options include |
metrics |
The metrics to compute. See details. |
fuzzy_true |
Logical; whether to compute fuzzy class memberships
for |
fuzzy_pred |
Logical; whether to compute fuzzy class memberships
for |
... |
Additional arguments passed to specific methods. |
The allowed values for metrics depend on the value of level:
If level = "element", the allowed metrics are: "nsSPC",
"NPC","SpatialSPC".
If level = "class", the allowed metrics are: "nsWH",
"nsAWH", "nsWC","nsAWC".
If level = "dataset", the allowed metrics are: "nsRI",
"nsARI","nsWH","nsAWH", "nsWC","nsAWC",
"nsAccuracy","SpatialRI","SpatialARI".
A data.frame of metrics based on the specified input.
# Example with individual components
data(sp_toys)
data <- sp_toys
getSpatialExternalMetrics(true=data$label, pred=data$p1, location=data[,c("x", "y")], k=6, level="class")
# Example with SpatialExperiment object
se_object <- SpatialExperiment::SpatialExperiment(assays=matrix(NA, ncol = nrow(data[,c("x", "y")]), nrow = ncol(data[,c("x", "y")])), spatialCoords=as.matrix(data[,c("x", "y")]))
SummarizedExperiment::colData(se_object) <- cbind(SummarizedExperiment::colData(se_object), data.frame(true=data$label, pred=data$p1))
getSpatialExternalMetrics(object=se_object, true="true", pred="pred", k=6, level="class")
A generic function to compute a selection of internal clustering evaluation
metrics for spatial data. It can be applied to raw components
(labels, location) or directly to a SpatialExperiment object.
getSpatialInternalMetrics(
  object = NULL,
  labels,
  location = NULL,
  k = 6,
  level = "class",
  metrics = c("CHAOS", "PAS", "ELSA"),
  ...
)

## S4 method for signature 'missing'
getSpatialInternalMetrics(
  object = NULL,
  labels,
  location = NULL,
  k = 6,
  level = "class",
  metrics = c("CHAOS", "PAS", "ELSA"),
  ...
)

## S4 method for signature 'SpatialExperiment'
getSpatialInternalMetrics(
  object = NULL,
  labels,
  location = NULL,
  k = 6,
  level = "class",
  metrics = c("CHAOS", "PAS", "ELSA"),
  ...
)
object |
The main input. Can be a |
labels |
When |
location |
A numerical matrix containing the location information, with rows as samples and columns as location dimensions. |
k |
The size of the spatial neighborhood to look at for each spot. This is used for calculating PAS and ELSA scores. |
level |
The level to calculate the metrics. Options include |
metrics |
The metrics to compute. See details. |
... |
Optional params for |
The allowed values for metrics depend on the value of level:
If level = "element", the allowed metrics are: "PAS", "ELSA".
If level = "class", the allowed metrics are: "CHAOS", "PAS",
"ELSA".
If level = "dataset", the allowed metrics are:
"PAS": Proportion of abnormal spots (PAS score)
"ELSA": Entropy-based Local indicator of Spatial Association
(ELSA score)
"CHAOS": Spatial Chaos Score.
"MPC": Modified partition coefficient
"PC": Partition coefficient
"PE": Partition entropy
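For readers unfamiliar with the last three indices, the sketch below computes their textbook definitions on a fuzzy membership matrix in base R. It assumes a matrix `U` with spots as rows, classes as columns and rows summing to 1; how the package derives such memberships from labels and locations is not shown here, and the function names are hypothetical.

```r
# Textbook fuzzy-partition indices on a membership matrix U
# (rows = spots, columns = classes, each row sums to 1).
partition_coefficient <- function(U) mean(rowSums(U^2))

partition_entropy <- function(U) {
  terms <- ifelse(U > 0, U * log(U), 0)  # treat 0 * log(0) as 0
  -mean(rowSums(terms))
}

modified_partition_coefficient <- function(U) {
  K <- ncol(U)
  # Rescales PC so that a fully ambiguous partition maps to 0.
  1 - K / (K - 1) * (1 - partition_coefficient(U))
}

# Toy memberships: two crisp spots and one fully ambiguous spot.
U <- rbind(c(1, 0), c(0, 1), c(0.5, 0.5))
c(PC  = partition_coefficient(U),
  PE  = partition_entropy(U),
  MPC = modified_partition_coefficient(U))
```

With these definitions, crisper memberships push PC and MPC towards 1 and PE towards 0.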
A data.frame of metrics.
# Example with individual components
data(sp_toys)
data <- sp_toys
getSpatialInternalMetrics(labels=data$label, location=data[,c("x", "y")], k=6, level="class")
# Example with SpatialExperiment object
se_object <- SpatialExperiment::SpatialExperiment(assays=matrix(NA, ncol = nrow(data[,c("x", "y")]), nrow = ncol(data[,c("x", "y")])), spatialCoords=as.matrix(data[,c("x", "y")]))
SummarizedExperiment::colData(se_object) <- cbind(SummarizedExperiment::colData(se_object), data.frame(label=data$label))
getSpatialInternalMetrics(object=se_object, labels="label", k=6, level="class")
For a given dataset with locations and labels, compute the label composition of the neighborhood for each sample.
knnComposition(location, k = 6, labels, alpha = 0.5, ...)
location |
A numeric data matrix containing location information, where rows are points and columns are location dimensions. |
k |
The number of nearest neighbors to look at. |
labels |
A vector containing the label for the dataset. |
alpha |
The parameter controlling to what extent the spot itself
contributes to the class composition calculation. |
... |
Optional arguments for |
A numerical matrix indicating the composition, where rows correspond
to samples and columns correspond to the classes in label.
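To make the shape of this output concrete, here is a minimal base R sketch of a k-nearest-neighbor label composition. It is not the package's implementation: it uses a brute-force distance matrix, it ignores the `alpha` self-weighting entirely (the spot itself is simply excluded), and `knn_composition_sketch` is a hypothetical name.

```r
# Base R sketch of a k-nearest-neighbor label composition.
# `knn_composition_sketch` is a hypothetical name, not a poem function.
knn_composition_sketch <- function(location, labels, k = 6) {
  location <- as.matrix(location)
  labels <- as.factor(labels)
  d <- as.matrix(dist(location))      # pairwise Euclidean distances
  diag(d) <- Inf                      # exclude each spot from its own neighborhood
  comp <- t(vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]   # indices of the k nearest neighbors
    as.numeric(table(labels[nn])) / k # proportion of each class among them
  }, numeric(nlevels(labels))))
  colnames(comp) <- levels(labels)
  comp
}

# 4 x 4 grid, left half labelled "a" and right half labelled "b".
grid <- expand.grid(x = 1:4, y = 1:4)
lab <- ifelse(grid$x <= 2, "a", "b")
comp <- knn_composition_sketch(grid, lab, k = 3)
head(comp)
```

Each row sums to 1, and spots deep inside a domain have a composition concentrated on a single class, whereas spots near a boundary show a mixture.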
data(sp_toys)
data <- sp_toys
knnComposition(data[,c("x", "y")], k=6, data$label)
Match sets from a partition to a reference partition using the Hungarian algorithm to optimize F1 scores.
matchSets(pred, true, forceMatch = TRUE, returnIndices = is.integer(true))
pred |
An integer or factor of cluster labels |
true |
An integer or factor of reference labels |
forceMatch |
Logical; whether to enforce a match for every set of |
returnIndices |
Logical; whether to return indices rather than levels |
A vector of matching sets (i.e. level) from true for every set
(i.e. level) of pred.
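The idea of matching sets by F1 can be illustrated in base R as follows. This is only a sketch under stated assumptions: an exhaustive search over permutations stands in for the Hungarian algorithm (so it is practical only for a handful of sets), it handles only the case with equally many predicted and reference sets, and all function names are hypothetical.

```r
# Sketch: match predicted clusters to reference classes by maximizing total F1.
# Exhaustive search stands in for the Hungarian algorithm; names are hypothetical.
f1_matrix <- function(pred, true) {
  tab <- table(pred, true)                       # overlap counts
  # F1 of sets i and j: 2 * |i and j| / (|i| + |j|)
  2 * tab / outer(rowSums(tab), colSums(tab), "+")
}

permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

match_sets_sketch <- function(pred, true) {
  f1 <- f1_matrix(pred, true)
  stopifnot(nrow(f1) == ncol(f1))                # square case only, for brevity
  perms <- permutations(seq_len(ncol(f1)))
  score <- vapply(perms, function(p) sum(f1[cbind(seq_along(p), p)]), numeric(1))
  best <- perms[[which.max(score)]]
  setNames(colnames(f1)[best], rownames(f1))     # one reference level per pred level
}

true <- c("A", "A", "A", "B", "B", "C", "C", "C")
pred <- c(2, 2, 1, 1, 1, 3, 3, 3)
match_sets_sketch(pred, true)
```

The result is a named vector giving, for each level of `pred`, the level of `true` it is matched to, which mirrors the kind of output described above.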
A data frame storing information about all metrics. The code to generate the dataset is at system.file('inst/scripts/', 'metric_info.R', package='poem').
metric_info
A data frame.
Generates mock multidimensional data of a given number of classes of points, for testing.
mockData(
  Ns = c(25, 15),
  classDiff = 2,
  Sds = 1,
  ndims = 2,
  spread = c(1, 2),
  rndFn = rnorm
)
Ns |
A vector of two or more positive integers specifying the number of elements of each class. |
classDiff |
The distances between the classes. If there are more than 2
classes, this can be a |
Sds |
The standard deviation. Can either be a fixed value, a value per class, or a matrix of values for each class (rows) and dimension (column). |
ndims |
The number of dimensions to generate (default 2). |
spread |
The spread of the points. Can either be a fixed value, a value per class, or a matrix of values for each class (rows) and dimension (col). |
rndFn |
The random function, by default |
A data.frame with coordinates and a class column.
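As a rough picture of what such mock data looks like, the base R sketch below draws Gaussian clouds whose centers are shifted along the first dimension. It is not the package's generator: the way classes are positioned is an assumption, the `spread` argument is not modelled, and `mock_two_classes` is a hypothetical name.

```r
# Hypothetical stand-in for mock class data: Gaussian clouds whose centers
# are `class_diff` apart along the first dimension (an assumption).
mock_two_classes <- function(ns = c(25, 15), class_diff = 2, sd = 1, ndims = 2) {
  blocks <- lapply(seq_along(ns), function(i) {
    m <- matrix(rnorm(ns[i] * ndims, sd = sd), ncol = ndims)
    m[, 1] <- m[, 1] + (i - 1) * class_diff   # shift each class along dimension 1
    data.frame(m, class = paste0("class", i))
  })
  do.call(rbind, blocks)
}

set.seed(1)
d <- mock_two_classes()
table(d$class)
```

The result has one coordinate column per dimension plus a `class` column, which is the general layout described for the return value above.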
d <- mockData()
A simple toy dataset consisting of two interleaving half circles. The code to generate the dataset is at system.file('inst/scripts/', 'noisy_moon.R', package='poem').
noisy_moon
A data frame with 100 rows and 5 columns:
Coordinates of each observation.
Ground truth labels. Either 1 or 2.
Predicted clustering labels using kmeans with 2 centers.
Predicted clustering labels using hdbscan with
minPts = 5.
The PAS score measures clustering performance by calculating the randomness of the spots that are located outside of the spatial region to which they were clustered. A lower PAS score indicates better spatial domain clustering performance.
PAS(labels, location, k = 10, ...)
labels |
Cluster labels. |
location |
A numerical matrix containing the location information, with rows as samples and columns as location dimensions. |
k |
Number of nearest neighbors. |
... |
Optional params for |
A numeric value for the PAS score, and a logical vector indicating the abnormal spots.
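The following base R sketch illustrates the general idea of a proportion-of-abnormal-spots score. It rests on an assumption that is not stated in this documentation: a spot is treated as abnormal when more than half of its k nearest neighbors carry a different label. It also uses a brute-force distance matrix, and `pas_sketch` is a hypothetical name rather than the package's function.

```r
# Base R sketch of a proportion-of-abnormal-spots score.
# Assumption: a spot is abnormal when more than half of its k nearest
# neighbors have a different label. `pas_sketch` is a hypothetical name.
pas_sketch <- function(labels, location, k = 10) {
  d <- as.matrix(dist(as.matrix(location)))
  diag(d) <- Inf                                  # a spot is not its own neighbor
  abnormal <- vapply(seq_along(labels), function(i) {
    nn <- order(d[i, ])[seq_len(k)]               # k nearest neighbors of spot i
    mean(labels[nn] != labels[i]) > 0.5
  }, logical(1))
  list(PAS = mean(abnormal), abnormal = abnormal)
}

# 6 x 6 grid split into two halves, with one mislabelled spot in the left half.
grid <- expand.grid(x = 1:6, y = 1:6)
lab <- ifelse(grid$x <= 3, 1, 2)
lab[grid$x == 1 & grid$y == 3] <- 2
res <- pas_sketch(lab, grid, k = 4)
res$PAS
which(res$abnormal)
```

In this toy layout only the isolated mislabelled spot is flagged, while spots along the straight boundary between the two halves are not, since at most half of their neighbors differ.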
data(sp_toys)
data <- sp_toys
PAS(data$label, data[,c("x", "y")], k=6)
PAS(data$p1, data[,c("x", "y")], k=6)
PAS(data$p2, data[,c("x", "y")], k=6)
Toy examples of spatial data. The code to generate the dataset is at system.file('inst/scripts/', 'sp_toys.R', package='poem').
sp_toys
A data frame with 240 rows and 11 columns, representing a 16 x 15 array of spots:
Coordinates of each spot.
The row and column index of each spot.
Ground truth labels. Either 1 or 2.
Hypothetical predicted spatial clustering labels.
Computes the spatial Rand Index and spatial ARI (Yan, Feng and Luo, 2025).
Note that by default, the decay functions are different from those of the
original publication (see details for more information), but the latter can
be replicated with original=TRUE.
spatialARI(
  true,
  pred,
  location,
  normCoords = TRUE,
  lambda = 0.8,
  fbeta = 4,
  hbeta = 1,
  spotWise = FALSE,
  nChunks = NULL,
  original = FALSE,
  f = function(x) {
    lambda * exp(-x * fbeta)
  },
  h = function(x) {
    lambda * (1 - exp(-x * hbeta))
  }
)
true |
A vector of true class labels |
pred |
A vector of predicted clusters |
location |
A matrix of spatial coordinates, with dimensions as columns |
normCoords |
Logical; whether to normalize the coordinates to 0-1. |
lambda |
The |
fbeta, hbeta |
Additional factors used in the exponential decay functions
(see details). A higher value means a faster decay. These are ignored if
|
spotWise |
Logical; whether to return the spot-wise spatial concordance (not adjusted for chance). |
nChunks |
The number of processing chunks. If NULL, this will be determined automatically based on the size of the dataset, so as to remain below 2GB RAM usage. |
original |
Logical; whether to use the original h/f functions from Yan,
Feng and Luo (default FALSE). If set to TRUE, the arguments |
f |
The f function, which determines the positive contribution of pairs that are in different partitions in the reference, but grouped together in the clustering, based on the distance between mates. |
h |
The h function, which determines the positive contribution of pairs that are in the same partition in the reference, but different ones in the clustering, based on the distance between mates. |
This is a reimplementation of the method from the spARI package, made more
scalable (i.e. a bit slower but more memory-efficient) through chunk-based
processing, extensible to more than 2 dimensions, and with some additional
options.
Note that by default, this will not produce the same results as the original
method: to do so, set original=TRUE. In our exploration of the method and
its behavior, we found the decay to be too slow. We therefore 1) do not
square the distances, and 2) introduced a beta parameter in each function
to scale it (a higher beta parameter means a faster decay).
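The effect of the beta parameters can be seen by evaluating the default decay functions directly. The sketch below transcribes `f` and `h` from the usage signature into standalone base R functions (taking beta as an explicit argument is a convenience of this sketch, not the package's calling convention) and evaluates them over a few illustrative distances.

```r
# Default decay functions, transcribed from the usage signature of spatialARI().
lambda <- 0.8
f <- function(x, fbeta = 4) lambda * exp(-x * fbeta)        # shrinks towards 0 with distance
h <- function(x, hbeta = 1) lambda * (1 - exp(-x * hbeta))  # grows towards lambda with distance

x <- c(0, 0.1, 0.25, 0.5, 1)   # illustrative distances on normalized (0-1) coordinates
round(rbind(
  f_beta4 = f(x),
  f_beta8 = f(x, fbeta = 8),   # larger beta: faster decay
  h_beta1 = h(x),
  h_beta3 = h(x, hbeta = 3)
), 3)
```

At distance 0, `f` equals `lambda` and `h` equals 0. Increasing either beta makes the corresponding function approach its limit over shorter distances.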
By default, chunking is used to keep RAM usage roughly below 2GB. Higher speed can
be achieved (at higher memory costs) for larger datasets by limiting the
number of chunks. The memory usage if done in a single chunk should be
roughly 4e-5*nrow(location)^2 MB, and this scales down linearly with the
number of chunks.
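The rule of thumb above lends itself to a quick back-of-the-envelope calculation. The sketch below is not how the package chooses `nChunks` internally; `single_chunk_mb` and `estimate_chunks` are hypothetical helpers, and treating 2GB as a 2000 MB budget is an assumption.

```r
# Rough memory estimate from the rule of thumb above (4e-5 * n^2 MB in one chunk).
# Both helpers are hypothetical and not part of the package.
single_chunk_mb <- function(n) 4e-5 * n^2

estimate_chunks <- function(n, budget_mb = 2000) {
  # Memory scales down linearly with the number of chunks, so divide and round up.
  max(1L, as.integer(ceiling(single_chunk_mb(n) / budget_mb)))
}

single_chunk_mb(240)      # sp_toys has 240 spots
estimate_chunks(240)
estimate_chunks(50000)    # a larger, hypothetical dataset
```

Small datasets such as `sp_toys` fit comfortably in a single chunk, whereas the quadratic growth means that datasets with tens of thousands of spots need many chunks to stay within the same budget.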
A vector containing the spatial Rand Index (spRI) and spatial
adjusted Rand Index (spARI). Alternatively, if spotWise=TRUE, a vector
of spatial pair concordances for each spot.
Pierre-Luc Germain
Yan, Feng and Luo, biorxiv 2025, https://doi.org/10.1101/2025.03.25.645156
data(sp_toys)
spatialARI(true=sp_toys$label, pred=sp_toys$p2, location = sp_toys[,1:2])
Toy example 2D embeddings of elements of different classes, with varying mixing and spread. Graphs 1-3 all have 20 elements of each of 4 classes, but these are mixed in different ways in the embedding space. Graphs 4-7 all have 100 elements of class1 and 60 of class2, and the class1 elements vary in their spread. The code to generate the dataset is at system.file('inst/scripts/', 'graph_example.R', package='poem').
toyExamples
A data frame.
The name of the embedding to which the element belongs.
Coordinates in the 2D embedding.
The class to which the element belongs.