Title: | Improving replicability in single-cell RNA-Seq cell type discovery |
---|---|
Description: | Given a set of clustering labels, Dune merges pairs of clusters to increase mean ARI between labels, improving replicability. |
Authors: | Hector Roux de Bezieux [aut, cre] , Kelly Street [aut] |
Maintainer: | Hector Roux de Bezieux <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.19.0 |
Built: | 2024-11-29 05:26:02 UTC |
Source: | https://github.com/bioc/Dune |
adjustedRandIndex
.adjustedRandIndex(tab)
tab |
The confusion matrix |
The ARI
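The sketch below computes the adjusted Rand Index from a confusion matrix in base R with the standard Hubert-Arabie formula. The helper name `ari_from_table` is hypothetical, and it is an assumption that `.adjustedRandIndex` uses this same formula internally.

```r
# Adjusted Rand Index from a confusion matrix (standard Hubert-Arabie formula).
ari_from_table <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))            # pairs grouped together by both labels
  sum_rows <- sum(choose(rowSums(tab), 2))    # pairs grouped together by the row label
  sum_cols <- sum(choose(colSums(tab), 2))    # pairs grouped together by the column label
  expected <- sum_rows * sum_cols / choose(n, 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

x <- c(1, 1, 2, 2, 3, 3)
y <- c("a", "a", "b", "b", "b", "c")
ari_from_table(table(x, y))  # 4/9, about 0.444
```

Identical labelings give 1. If both labelings put every cell into a single cluster, the denominator is 0 and the result is NaN.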
Compute the ARI improvement over the ARI merging procedure
ARIImp(merger, unclustered = NULL)
merger |
the result from having run Dune. |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
a vector with the mean ARI between methods at each merge
ARItrend
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
plot(0:nrow(merger$merges), ARIImp(merger))
ARI Matrix
ARIs(clusMat, unclustered = NULL)
clusMat |
The clustering matrix with a row per cell and a column per clustering label type |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
In the ARI matrix, each cell i,j is the adjusted Rand Index between columns i and j of the original clusMat.
If unclustered is not NULL, the cells which have been assigned to the unclustered cluster are not counted towards computing the ARI.
The ARI matrix
data("clusMat", package = "Dune")
ARIs(clusMat)
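To show how a pairwise ARI matrix can be assembled, here is a minimal base-R sketch. The helpers `ari` and `ari_matrix` are hypothetical, not Dune exports, and the handling of `unclustered` is an assumption: for the pair of columns (i, j), a cell is dropped whenever it carries the unclustered value in either column.

```r
# Adjusted Rand Index between two label vectors (standard Hubert-Arabie form).
ari <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  s_cells <- sum(choose(tab, 2))
  s_rows <- sum(choose(rowSums(tab), 2))
  s_cols <- sum(choose(colSums(tab), 2))
  expected <- s_rows * s_cols / choose(n, 2)
  (s_cells - expected) / ((s_rows + s_cols) / 2 - expected)
}

# Pairwise ARI matrix over the columns of a clustering matrix.
# Assumption: a cell is ignored for the pair (i, j) whenever it carries the
# `unclustered` value in column i or in column j.
ari_matrix <- function(clusMat, unclustered = NULL) {
  k <- ncol(clusMat)
  # Diagonal left at 1: each label compared with itself
  out <- matrix(1, k, k, dimnames = list(colnames(clusMat), colnames(clusMat)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i == j) next
      keep <- rep(TRUE, nrow(clusMat))
      if (!is.null(unclustered)) {
        keep <- clusMat[, i] != unclustered & clusMat[, j] != unclustered
      }
      out[i, j] <- ari(clusMat[keep, i], clusMat[keep, j])
    }
  }
  out
}

toy <- cbind(A = c(1, 1, 2, 2, 3, 3),
             B = c(1, 1, 2, 2, 2, 4),
             C = c(1, 1, 2, 2, 3, 0))
ari_matrix(toy)
ari_matrix(toy, unclustered = 0)
```

With `unclustered = 0`, the last cell is ignored when comparing A with C, and the two labels then agree perfectly.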
A plot to see how ARI improves over merging
ARItrend(merger, unclustered = NULL)
merger |
the result from having run Dune. |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
a ggplot object
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
ARItrend(merger)
A clustering matrix used to demonstrate the ARI-merging process.
clusMat
An object of class matrix (inherits from array) with 100 rows and 5 columns.
This matrix has 100 samples with 5 cluster labels. Cluster labels 2 through 5 are modified versions of cluster label 1, where some clusters from label 1 were broken down into smaller clusters. It is just a toy dataset that can be regenerated with the code in https://github.com/HectorRDB/Pipeline_Brain/blob/master/Sandbox/createToyDataset.R
Find the conversion between the old clusters and the final clusters
clusterConversion(merger, p = 1, average_n = NULL, n_steps = NULL)
merger |
the result from having run Dune. |
p |
A value between 0 and 1. We stop when the metric used for merging has improved by a fraction p of the final total improvement. Default to 1 (i.e. running the full merging). |
average_n |
Alternatively, you can specify the average number of clusters you want to have. |
n_steps |
Finally, you can specify the number of merging steps to do before stopping. |
If more than one of p, average_n and n_steps is specified, then the order of preference is n_steps, then average_n, then p.
A list containing a matrix per clustering method, with a column for the old labels and a column for the new labels.
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
clusterConversion(merger)[[2]]
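A minimal sketch of what such a conversion contains, assuming the initial and final label matrices share the same columns and the same cell order: for each column, the distinct (old, new) label pairs. `conversion_tables` is a hypothetical helper; the real `clusterConversion` also resolves `p`, `average_n` and `n_steps` to pick the stopping point, which this sketch does not do.

```r
# For each clustering column, list which initial label ended up in which
# merged label. Assumes both matrices have the same dimensions and cell order.
conversion_tables <- function(initialMat, currentMat) {
  stopifnot(identical(dim(initialMat), dim(currentMat)))
  out <- lapply(seq_len(ncol(initialMat)), function(j) {
    pairs <- unique(cbind(old = initialMat[, j], new = currentMat[, j]))
    pairs[order(pairs[, "old"]), , drop = FALSE]
  })
  names(out) <- colnames(initialMat)
  out
}

initial <- cbind(A = c(1, 2, 3, 4), B = c(1, 1, 2, 3))
current <- cbind(A = c(1, 1, 3, 3), B = c(1, 1, 2, 2))
conversion_tables(initial, current)
```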
Animated version of ConfusionPlot
ConfusionEvolution(merger, unclustered = NULL, x, y, state_length = 1)
merger |
the result from having run Dune. |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
x |
The name of the first cluster label to plot |
y |
The name of the second cluster label to plot |
state_length |
Time between steps. Default to 1. See |
See ConfusionPlot and animate.
a gganim object
## Not run:
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
ConfusionEvolution(merger, x = "A", y = "B")
## End(Not run)
A plot to visualize how alike two clustering labels are
ConfusionPlot(x, y = NULL)
x |
A vector of clustering labels or a matrix of clustering labels. See details. |
y |
Optional. Another vector of clustering labels |
a ggplot object
data("nuclei", package = "Dune")
ConfusionPlot(nuclei[, c("SC3", "Monocle")])
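The information a plot comparing two labelings summarizes can be sketched in base R as a contingency table; row proportions show how each cluster of `x` is split across the clusters of `y`. This only illustrates the underlying counts under that assumption; how `ConfusionPlot` actually draws them is not reproduced here.

```r
# Contingency table: entry (i, j) counts the cells placed in cluster i by x
# and in cluster j by y.
x <- c("a", "a", "a", "b", "b", "c")
y <- c(1, 1, 2, 2, 2, 3)
tab <- table(x = x, y = y)
print(tab)
# Row proportions: how each cluster of x is spread over the clusters of y.
print(round(prop.table(tab, margin = 1), 2))
```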
Compute the metric between every pair of clustering labels after merging every possible pair of clusters. Find the merge that improves the metric the most and perform it. Repeat until no merge improves the metric.
Dune(clusMat, ...)
## S4 method for signature 'matrix'
Dune(clusMat, unclustered = NULL, verbose = FALSE, parallel = FALSE, BPPARAM = BiocParallel::bpparam(), metric = "NMI")
## S4 method for signature 'data.frame'
Dune(clusMat, unclustered = NULL, verbose = FALSE, parallel = FALSE, BPPARAM = BiocParallel::bpparam(), metric = "NMI")
## S4 method for signature 'SummarizedExperiment'
Dune(clusMat, cluster_columns, unclustered = NULL, verbose = FALSE, parallel = FALSE, BPPARAM = BiocParallel::bpparam(), metric = "NMI")
clusMat |
the matrix of samples by clustering labels. |
... |
parameters including: |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
verbose |
Whether or not to print cluster merging as it happens. |
parallel |
Logical, defaults to FALSE. Set to TRUE if you want to parallelize the fitting. |
BPPARAM |
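object of class BiocParallelParam specifying the parallelization back-end. See bpparam in the BiocParallel package. |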
metric |
The metric that is tracked to decide which clusters to merge. For now, either ARI or NMI is accepted. Default to NMI. See details. |
cluster_columns |
if |
The Dune algorithm merges pairs of clusters in order to improve the mean adjusted Rand Index or the mean normalized mutual information with the other clustering labels. It returns a list with five components:
initialMat: The initial matrix of cluster labels
currentMat: The final matrix of cluster labels
merges: The step-by-step detail of the merges, recapitulating which clusters were merged in which cluster label
impMetric: How much each merge improved the mean Metric between the cluster label that has been merged and the other cluster labels.
metric: The metric that was used to find the merges.
A list with five components: the initial matrix of clustering labels, the final matrix of clustering labels, the merge info matrix, the Metric improvement vector and the metric used.
clusterConversion ARIImp
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
# Clusters 11 to 14 from cluster labels 5 and 3 are subsets of cluster 2 from
# the other cluster labels. Designating cluster 2 as unclustered therefore
# means we do fewer merges.
merger2 <- Dune(clusMat = clusMat, unclustered = 2)
merger$merges
merger2$merges
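The greedy procedure described above can be sketched in base R, using the mean pairwise ARI as the metric. Everything here (`ari`, `mean_ari`, `greedy_merge`, and the `label`/`kept`/`removed` layout of `merges`) is hypothetical: the package also supports NMI, unclustered cells and parallel evaluation, none of which this sketch covers, and the sketch recomputes every ARI at each step rather than only those involving the merged column.

```r
# Adjusted Rand Index between two label vectors (standard Hubert-Arabie form).
ari <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  s_cells <- sum(choose(tab, 2))
  s_rows <- sum(choose(rowSums(tab), 2))
  s_cols <- sum(choose(colSums(tab), 2))
  expected <- s_rows * s_cols / choose(n, 2)
  (s_cells - expected) / ((s_rows + s_cols) / 2 - expected)
}

# Mean ARI over all pairs of columns.
mean_ari <- function(mat) {
  pairs <- combn(ncol(mat), 2)
  mean(apply(pairs, 2, function(p) ari(mat[, p[1]], mat[, p[2]])))
}

# Greedy merging: try every pair of clusters in every column, apply the merge
# that raises the mean ARI the most, and stop when no merge raises it.
greedy_merge <- function(clusMat) {
  mat <- clusMat
  merges <- data.frame(label = integer(0), kept = numeric(0), removed = numeric(0))
  improvements <- numeric(0)
  current <- mean_ari(mat)
  repeat {
    best <- NULL
    best_score <- current
    for (j in seq_len(ncol(mat))) {
      clusters <- sort(unique(mat[, j]))
      if (length(clusters) < 2) next
      for (pr in combn(length(clusters), 2, simplify = FALSE)) {
        a <- clusters[pr[1]]
        b <- clusters[pr[2]]
        candidate <- mat
        candidate[candidate[, j] == b, j] <- a  # relabel cluster b as a
        score <- mean_ari(candidate)
        # isTRUE() guards against NaN scores from degenerate one-cluster columns
        if (isTRUE(score > best_score + 1e-12)) {
          best <- list(j = j, a = a, b = b, mat = candidate)
          best_score <- score
        }
      }
    }
    if (is.null(best)) break
    mat <- best$mat
    merges <- rbind(merges, data.frame(label = best$j, kept = best$a, removed = best$b))
    improvements <- c(improvements, best_score - current)
    current <- best_score
  }
  list(initialMat = clusMat, currentMat = mat, merges = merges, impMetric = improvements)
}

toy <- cbind(A = c(1, 1, 1, 1, 2, 2, 2, 2),
             B = c(1, 1, 2, 2, 3, 3, 4, 4))
res <- greedy_merge(toy)
res$merges
```

On this toy input the first merge fuses clusters 1 and 2 of B and the second fuses 3 and 4; B then matches A exactly, and any further merge would lower the mean ARI, so the loop stops.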
For a given ARI merging, compute the evolution of the function f
functionTracking(merger, f, p = 1, n_steps = NULL, ...)
merger |
the result from having run Dune. |
f |
the function used. It must take a clustering matrix as input and return a value |
p |
A value between 0 and 1. We stop when the metric used for merging has improved by a fraction p of the final total improvement. Default to 1 (i.e. running the full merging). |
n_steps |
Alternatively, you can specify the number of merging steps to do before stopping. |
... |
additional arguments passed to f |
a vector of length equal to the number of merges
# Return the number of clusters for the fourth cluster label
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
f <- function(clusMat, i) dplyr::n_distinct(clusMat[, i])
functionTracking(merger, f, i = 4)
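A sketch of the idea behind tracking a function over the merges: replay the recorded merges one at a time on the initial matrix and evaluate `f` after each step, with step 0 being the unmerged matrix. The merge layout (one row per merge, with the column index `label` and the clusters `kept` and `removed`) and the helpers `replay_merges` and `track_function` are hypothetical; Dune's `merger$merges` may store this information differently.

```r
# Hypothetical merge record: one row per merge, giving the column (`label`)
# and the two clusters involved; `removed` is relabelled as `kept`.
replay_merges <- function(initialMat, merges, n_steps = nrow(merges)) {
  mat <- initialMat
  for (s in seq_len(n_steps)) {
    j <- merges$label[s]
    mat[mat[, j] == merges$removed[s], j] <- merges$kept[s]
  }
  mat
}

# Evaluate f on the matrix obtained after 0, 1, ..., nrow(merges) merges.
# Extra arguments in ... are forwarded to f.
track_function <- function(initialMat, merges, f, ...) {
  sapply(0:nrow(merges), function(s) f(replay_merges(initialMat, merges, s), ...))
}

initial <- cbind(A = c(1, 1, 2, 2, 3, 3), B = c(1, 2, 3, 4, 5, 6))
merges <- data.frame(label = c(2, 2, 2), kept = c(1, 3, 5), removed = c(2, 4, 6))
n_clusters <- function(clusMat, i) length(unique(clusMat[, i]))
track_function(initial, merges, n_clusters, i = 2)  # 6 5 4 3
```

`replay_merges(initial, merges, n_steps)` on its own gives the matrix after a fixed number of steps, which is the kind of result an early-stopping query returns.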
Find the clustering matrix that we would get if we stopped the ARI merging early
intermediateMat(merger, p = 1, average_n = NULL, n_steps = NULL)
merger |
the result from having run Dune. |
p |
A value between 0 and 1. We stop when the metric used for merging has improved by a fraction p of the final total improvement. Default to 1 (i.e. running the full merging). |
average_n |
Alternatively, you can specify the average number of clusters you want to have. |
n_steps |
Finally, you can specify the number of merging steps to do before stopping. |
If more than one of p, average_n and n_steps is specified, then the order of preference is n_steps, then average_n, then p.
A data.frame with the same dimensions as the currentMat of the merger argument, plus one column with cell names, related to the rownames of the original input
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
head(intermediateMat(merger, n_steps = 1))
Compute the NMI improvement over the NMI merging procedure
NMIImp(merger, unclustered = NULL)
merger |
the result from having run Dune. |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
a vector with the mean NMI between methods at each merge
NMItrend
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
plot(0:nrow(merger$merges), NMIImp(merger))
NMI Matrix
NMIs(clusMat, unclustered = NULL)
clusMat |
The clustering matrix with a row per cell and a column per clustering label type |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
In the NMI matrix, each cell i,j is the normalized mutual information between columns i and j of the original clusMat.
If unclustered is not NULL, the cells which have been assigned to the unclustered cluster are not counted towards computing the NMI.
The NMI matrix
data("clusMat", package = "Dune")
NMIs(clusMat)
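A base-R sketch of normalized mutual information and of a pairwise NMI matrix. It normalizes the mutual information by sqrt(H(x) H(y)); other normalizations (by the maximum, minimum or mean entropy) are also common, and which one Dune uses is not stated here. `entropy`, `nmi` and `nmi_matrix` are hypothetical helpers, and the sketch ignores `unclustered`.

```r
# Shannon entropy (natural log) of a vector of counts.
entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# NMI normalized by sqrt(H(x) * H(y)); one of several common normalizations.
nmi <- function(x, y) {
  tab <- table(x, y)
  pxy <- tab / sum(tab)
  indep <- outer(rowSums(pxy), colSums(pxy))  # joint probabilities under independence
  nz <- pxy > 0
  mi <- sum(pxy[nz] * log(pxy[nz] / indep[nz]))
  mi / sqrt(entropy(rowSums(tab)) * entropy(colSums(tab)))
}

nmi_matrix <- function(clusMat) {
  k <- ncol(clusMat)
  out <- matrix(1, k, k, dimnames = list(colnames(clusMat), colnames(clusMat)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i != j) out[i, j] <- nmi(clusMat[, i], clusMat[, j])
    }
  }
  out
}

toy <- cbind(A = c(1, 1, 2, 2, 3, 3),
             B = c(1, 1, 2, 2, 2, 4),
             C = c(1, 2, 1, 2, 1, 2))
nmi_matrix(toy)
```

Identical labelings give 1 and independent labelings give 0; a labeling with a single cluster has zero entropy, which makes this ratio undefined.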
A plot to see how NMI improves over merging
NMItrend(merger, unclustered = NULL)
merger |
the result from having run Dune. |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
a ggplot object
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
NMItrend(merger)
Cluster labels for a subset of the Allen Smart-Seq nuclei dataset
nuclei
An object of class data.frame with 1744 rows and 7 columns.
This matrix of clusters was obtained by running 3 clustering algorithms on a brain snRNA-Seq dataset from Tasic et al. (https://doi.org/10.1038/s41586-018-0654-5). This dataset was then subsetted to the GABAergic neurons. Code to reproduce all this can be found in the GitHub repository of the Dune paper (https://github.com/HectorRDB/Dune_Paper).
We can compute the ARI between pairs of cluster labels. This function plots a matrix where cell (i, j) is the adjusted Rand Index between the cluster label of row i and the cluster label of column j.
plotARIs(clusMat, unclustered = NULL, values = TRUE, numericalLabels = FALSE)
clusMat |
The clustering matrix with a row per cell and a column per clustering label type |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
values |
Whether to also display the ARI values. Default to TRUE. |
numericalLabels |
Whether labels are numerical values. Default to FALSE. |
a ggplot object
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
plotARIs(merger$initialMat)
plotARIs(merger$currentMat)
We can compute the NMI between pairs of cluster labels. This function plots a matrix where cell (i, j) is the normalized mutual information between the cluster label of row i and the cluster label of column j.
plotNMIs(clusMat, unclustered = NULL, values = TRUE, numericalLabels = FALSE)
clusMat |
The clustering matrix with a row per cell and a column per clustering label type |
unclustered |
The value assigned to unclustered cells. Default to NULL. |
values |
Whether to also display the NMI values. Default to TRUE. |
numericalLabels |
Whether labels are numerical values. Default to FALSE. |
a ggplot object
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat, metric = "NMI")
plotNMIs(merger$initialMat)
plotNMIs(merger$currentMat)
Dune
Plot the reduction in cluster size for an ARI merging with Dune
plotPrePost(merger)
merger |
The output from an ARI merging, by calling Dune. |
a ggplot object
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
plotPrePost(merger)
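As an assumption about what a before/after comparison of merging summarizes, the sketch below counts the distinct clusters in each clustering label before and after merging, the kind of table such a plot could be drawn from. `pre_post_counts` is a hypothetical helper, not the package's plotting code.

```r
# Number of distinct clusters per clustering label before and after merging.
pre_post_counts <- function(initialMat, currentMat) {
  count_clusters <- function(m) unname(apply(m, 2, function(col) length(unique(col))))
  data.frame(label = colnames(initialMat),
             before = count_clusters(initialMat),
             after = count_clusters(currentMat))
}

initial <- cbind(A = c(1, 2, 3, 4), B = c(1, 1, 2, 3))
current <- cbind(A = c(1, 1, 3, 3), B = c(1, 1, 2, 2))
pre_post_counts(initial, current)
```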
When to Stop
whenToStop(merger, p = 1, average_n = NULL)
merger |
the result from having run Dune. |
p |
A value between 0 and 1. We stop when the metric used for merging has improved by a fraction p of the final total improvement. Default to 1 (i.e. running the full merging). |
average_n |
Alternatively, you can specify the average number of clusters you want to have. |
The Dune process improves the metric. This returns the first merging step after which the metric has been improved by a fraction p of the total improvement. Setting p = 1 just returns the number of merges.
An integer giving the step where to stop.
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
whenToStop(merger, p = .5)
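The rule for `p` can be sketched as follows, assuming the per-merge improvements are available as a numeric vector (such as the `impMetric` component returned by `Dune`) and are all positive. `stop_step` is a hypothetical helper, and it does not implement the `average_n` alternative.

```r
# First merge step after which the cumulative improvement reaches a
# fraction p of the total improvement (assumes positive improvements).
stop_step <- function(impMetric, p = 1) {
  stopifnot(p > 0, p <= 1)
  if (length(impMetric) == 0) return(0L)
  cumulative <- cumsum(impMetric)
  # Small tolerance so that p = 1 is not defeated by floating-point rounding
  which(cumulative >= p * sum(impMetric) - 1e-12)[1]
}

imp <- c(0.20, 0.10, 0.05, 0.05)
stop_step(imp, p = 0.6)  # 2
stop_step(imp, p = 1)    # 4
```

With p = 0.5 this sketch already stops at step 1, because the first merge alone supplies half of the total improvement.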