Package 'Dune' reference manual

Title:	Improving replicability in single-cell RNA-Seq cell type discovery
Description:	Given a set of clustering labels, Dune merges pairs of clusters to increase mean ARI between labels, improving replicability.
Authors:	Hector Roux de Bezieux [aut, cre], Kelly Street [aut]
Maintainer:	Hector Roux de Bezieux <[email protected]>
License:	MIT + file LICENSE
Version:	1.19.0
Built:	2025-03-26 06:05:12 UTC
Source:	https://github.com/bioc/Dune

ARI improvement

Description

Compute the ARI improvement over the ARI merging procedure

Usage

ARIImp(merger, unclustered = NULL)

Arguments

`merger`	the result from having run `Dune` on the dataset
`unclustered`	The value assigned to unclustered cells. Default to `NULL`

Value

a vector with the mean ARI between methods at each merge

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
plot(0:nrow(merger$merges), ARIImp(merger))

ARI Matrix

Description

ARI Matrix

Usage

ARIs(clusMat, unclustered = NULL)

Arguments

`clusMat`	The clustering matrix with a row per cell and a column per clustering label type
`unclustered`	The value assigned to unclustered cells. Default to `NULL`

Details

In the ARI matrix, each cell (i, j) is the adjusted Rand Index between columns i and j of the original clusMat. If unclustered is not NULL, cells assigned to the unclustered cluster are not counted when computing the ARI.
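
To illustrate the computation described above, here is a minimal base-R sketch of a pairwise ARI matrix using the textbook contingency-table formula. The helper names `ari_pair` and `ari_matrix` are hypothetical and not part of Dune. Dropping a cell when it is unclustered in either of the two labels being compared is an assumption of this sketch; the package's own implementation may differ.

```r
# Hypothetical helper (not part of Dune): adjusted Rand Index between two
# label vectors, computed from their contingency table.
ari_pair <- function(x, y, unclustered = NULL) {
  if (!is.null(unclustered)) {
    # Assumption: a cell is ignored if it is unclustered in either label
    keep <- x != unclustered & y != unclustered
    x <- x[keep]
    y <- y[keep]
  }
  tab <- table(x, y)
  s_ij <- sum(choose(tab, 2))           # pairs grouped together in both labels
  s_a <- sum(choose(rowSums(tab), 2))   # pairs grouped together in x
  s_b <- sum(choose(colSums(tab), 2))   # pairs grouped together in y
  expected <- s_a * s_b / choose(length(x), 2)
  (s_ij - expected) / ((s_a + s_b) / 2 - expected)
}

# Hypothetical helper: matrix whose entry (i, j) is the ARI between
# columns i and j of clusMat.
ari_matrix <- function(clusMat, unclustered = NULL) {
  k <- ncol(clusMat)
  out <- matrix(NA_real_, k, k,
                dimnames = list(colnames(clusMat), colnames(clusMat)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      out[i, j] <- ari_pair(clusMat[, i], clusMat[, j], unclustered)
    }
  }
  out
}

ari_pair(c(1, 1, 2, 2), c(1, 1, 2, 2))  # identical partitions: 1
ari_pair(c(1, 1, 2, 2), c(1, 2, 1, 2))  # -0.5
```

Note that the formula is undefined (0/0) when both labels place every cell in a single cluster; the sketch returns NaN in that case.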

Value

The ARI matrix

Examples

data("clusMat", package = "Dune")
ARIs(clusMat)

ARI improvement plot

Description

A plot to see how ARI improves over merging

Usage

ARItrend(merger, unclustered = NULL)

Arguments

`merger`	the result from having run `Dune` on the dataset
`unclustered`	The value assigned to unclustered cells. Default to `NULL`

Value

a ggplot object

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
ARItrend(merger)

A clustering matrix used to demonstrate the ari-merging process.

Description

A clustering matrix used to demonstrate the ari-merging process.

Usage

clusMat

Format

An object of class matrix (inherits from array) with 100 rows and 5 columns.

Details

This matrix has 100 samples with 5 cluster labels. Cluster labels 2 through 5 are modified versions of cluster label 1, where some clusters from label 1 were broken down into smaller clusters. It is a toy dataset that can be regenerated with the code in https://github.com/HectorRDB/Pipeline_Brain/blob/master/Sandbox/createToyDataset.R

clusterConversion

Description

Find the conversion between the old cluster and the final clusters

Usage

clusterConversion(merger, p = 1, average_n = NULL, n_steps = NULL)

Arguments

`merger`	the result from having run `Dune` on the dataset
`p`	A value between 0 and 1. We stop when the metric used for merging has improved by a fraction p of the final total improvement. Defaults to 1 (i.e., running the full merging).
`average_n`	Alternatively, you can specify the average number of clusters you want to have.
`n_steps`	Finally, you can specify the number of merging steps to do before stopping.

Details

If more than one of p, average_n and n_steps is specified, the order of preference is n_steps, then average_n, then p.

Value

A list containing a matrix per clustering method, with a column for the old labels and a column for the new labels.
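
To make the shape of one such conversion matrix concrete, here is a minimal base-R sketch that builds an old-to-new lookup from two label vectors for the same cells. The function name `conversion_table` and the column names `old` and `new` are hypothetical; the sketch returns a data.frame for convenience and does not reproduce the package's internals.

```r
# Hypothetical helper (not part of Dune): one row per original cluster,
# giving the cluster it ends up in after merging.
conversion_table <- function(old, new) {
  stopifnot(length(old) == length(new))
  conv <- unique(data.frame(old = old, new = new))
  conv <- conv[order(conv$old), ]
  rownames(conv) <- NULL
  conv
}

# Toy labels: original cluster 3 has been merged into cluster 2
old_labels <- c(1, 2, 3, 3, 2, 1)
new_labels <- c(1, 2, 2, 2, 2, 1)
conversion_table(old_labels, new_labels)
```

For the toy labels, the table has three rows: clusters 1 and 2 map to themselves, and cluster 3 maps to cluster 2.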

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
clusterConversion(merger)[[2]]

Plot the evolution of the ConfusionPlot as merging happens

Description

Animated version of ConfusionPlot

Usage

ConfusionEvolution(merger, unclustered = NULL, x, y, state_length = 1)

Arguments

`merger`	the result from having run `Dune` on the dataset
`unclustered`	The value assigned to unclustered cells. Default to `NULL`
`x`	The name of the first cluster label to plot
`y`	The name of the second cluster label to plot
`state_length`	Time between steps. Default to 1. See `transition_states` for details.

Details

See ConfusionPlot and animate.

Value

a gganim object

Examples

## Not run: 
  data("clusMat", package = "Dune")
  merger <- Dune(clusMat = clusMat)
  ConfusionEvolution(merger, x = "A", y = "B")
## End(Not run)

Plot confusion matrix

Description

A plot to visualize how alike two clustering labels are

Usage

ConfusionPlot(x, y = NULL)

Arguments

`x`	A vector of clustering labels or a matrix of clustering labels. See details.
`y`	Optional. Another vector of clustering labels

Value

a ggplot object

Examples

data("nuclei", package = "Dune")
ConfusionPlot(nuclei[, c("SC3", "Monocle")])

Dune

Description

Compute the metric between every pair of clustering labels after merging every possible pair of clusters. Find the merge that improves the metric the most and apply it. Repeat until no merge yields an improvement.

Usage

Dune(clusMat, ...)

## S4 method for signature 'matrix'
Dune(
  clusMat,
  unclustered = NULL,
  verbose = FALSE,
  parallel = FALSE,
  BPPARAM = BiocParallel::bpparam(),
  metric = "NMI"
)

## S4 method for signature 'data.frame'
Dune(
  clusMat,
  unclustered = NULL,
  verbose = FALSE,
  parallel = FALSE,
  BPPARAM = BiocParallel::bpparam(),
  metric = "NMI"
)

## S4 method for signature 'SummarizedExperiment'
Dune(
  clusMat,
  cluster_columns,
  unclustered = NULL,
  verbose = FALSE,
  parallel = FALSE,
  BPPARAM = BiocParallel::bpparam(),
  metric = "NMI"
)

Arguments

`clusMat`	the matrix of samples by clustering labels.
`...`	parameters including:
`unclustered`	The value assigned to unclustered cells. Default to `NULL`
`verbose`	Whether or not to print cluster merges as they happen.
`parallel`	Logical, defaults to FALSE. Set to TRUE if you want to parallelize the fitting.
`BPPARAM`	object of class `bpparamClass` that specifies the back-end to be used for computations. See `bpparam` in `BiocParallel` package for details. Won't be used if `parallel` is FALSE.
`metric`	The metric that is tracked to decide which clusters to merge. For now, either ARI or NMI is accepted. Defaults to NMI. See details.
`cluster_columns`	if `clusMat` is a `SummarizedExperiment`, then this defines the columns of `colData` that are outputs from a clustering algorithm.

Details

The Dune algorithm merges pairs of clusters in order to improve the mean adjusted Rand Index or the mean normalized mutual information with other clustering labels. It returns a list with five components:

initialMat: The initial matrix of cluster labels
currentMat: The final matrix of cluster labels
merges: The step-by-step detail of the merges, recapitulating which clusters were merged in which cluster label
impMetric: How much each merge improved the mean Metric between the cluster label that has been merged and the other cluster labels.
metric: The metric that was used to find the merges.
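
The greedy procedure described above can be sketched in base R as follows. This is a naive illustration under stated assumptions, not the package's implementation: it uses ARI only (whereas the usage above defaults to NMI), ignores the `unclustered`, `parallel` and `BPPARAM` arguments, and all function names (`ari`, `mean_ari`, `best_merge`, `greedy_merge`) and the toy matrix are hypothetical.

```r
# Textbook adjusted Rand Index between two label vectors.
ari <- function(x, y) {
  tab <- table(x, y)
  s_ij <- sum(choose(tab, 2))
  s_a <- sum(choose(rowSums(tab), 2))
  s_b <- sum(choose(colSums(tab), 2))
  expected <- s_a * s_b / choose(length(x), 2)
  (s_ij - expected) / ((s_a + s_b) / 2 - expected)
}

# Mean ARI between column `col` and all other columns of `mat`.
mean_ari <- function(mat, col) {
  others <- setdiff(seq_len(ncol(mat)), col)
  mean(vapply(others, function(j) ari(mat[, col], mat[, j]), numeric(1)))
}

# Try every pair of clusters in every column. Return the merge with the
# largest strictly positive gain, or NULL when no merge improves the mean ARI.
best_merge <- function(mat) {
  best <- NULL
  best_gain <- 0
  for (col in seq_len(ncol(mat))) {
    clusters <- sort(unique(mat[, col]))
    if (length(clusters) < 2) next
    base <- mean_ari(mat, col)
    pairs <- combn(length(clusters), 2)  # index pairs into `clusters`
    for (k in seq_len(ncol(pairs))) {
      into <- clusters[pairs[1, k]]
      from <- clusters[pairs[2, k]]
      trial <- mat
      trial[trial[, col] == from, col] <- into
      gain <- mean_ari(trial, col) - base
      # NaN gains (undefined ARI) are skipped
      if (!is.na(gain) && gain > best_gain) {
        best_gain <- gain
        best <- list(col = col, into = into, from = from, gain = gain)
      }
    }
  }
  best
}

# Apply the best merge repeatedly until nothing improves.
greedy_merge <- function(mat) {
  merges <- list()
  repeat {
    step <- best_merge(mat)
    if (is.null(step)) break
    mat[mat[, step$col] == step$from, step$col] <- step$into
    merges[[length(merges) + 1]] <- step
  }
  list(initialMat = NULL, currentMat = mat, merges = merges)
}

# Toy input: label B splits cluster 2 of labels A and C into clusters 2 and 3
toy <- cbind(A = c(1, 1, 1, 2, 2, 2),
             B = c(1, 1, 1, 2, 2, 3),
             C = c(1, 1, 1, 2, 2, 2))
res <- greedy_merge(toy)
res$merges[[1]][c("col", "into", "from")]
```

On the toy matrix, the only improving merge is in column B (cluster 3 into cluster 2), after which all three labels agree and the loop stops. The exhaustive search costs one metric evaluation per cluster pair per column at every step, which is why a real implementation would likely cache or parallelize this work.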

Value

A list with five components: the initial matrix of clustering labels, the final matrix of clustering labels, the merge info matrix, the metric improvement vector and the metric that was used.

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
# Clusters 11 to 14 from cluster labels 5 and 3 are subsets of cluster 2 from
# the other cluster labels. Designating cluster 2 as unclustered therefore
# means we do fewer merges.
merger2 <- Dune(clusMat = clusMat, unclustered = 2)
merger$merges
merger2$merges

Track the evolution of a function along merging

Description

For a given ARI merging, compute the evolution of the function f

Usage

functionTracking(merger, f, p = 1, n_steps = NULL, ...)

Arguments

`merger`	the result from having run `Dune` on the dataset
`f`	the function used. It must take as input a clustering matrix and return a value
`p`	A value between 0 and 1. We stop when the metric used for merging has improved by a fraction p of the final total improvement. Defaults to 1 (i.e., running the full merging).
`n_steps`	Alternatively, you can specify the number of merging steps to do before stopping.
`...`	additional arguments passed to f

Value

a vector whose length is the number of merges

Examples

# Return the number of clusters for the fourth cluster label
data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
f <- function(clusMat, i) dplyr::n_distinct(clusMat[, i])
functionTracking(merger, f, i = 4)

Find the clustering matrix that we would get if we stopped the ARI merging early

Description

Find the clustering matrix that we would get if we stopped the ARI merging early

Usage

intermediateMat(merger, p = 1, average_n = NULL, n_steps = NULL)

Arguments

`merger`	the result from having run `Dune` on the dataset
`p`	A value between 0 and 1. We stop when the metric used for merging has improved by a fraction p of the final total improvement. Defaults to 1 (i.e., running the full merging).
`average_n`	Alternatively, you can specify the average number of clusters you want to have.
`n_steps`	Finally, you can specify the number of merging steps to do before stopping.

Details

If more than one of p, average_n and n_steps is specified, the order of preference is n_steps, then average_n, then p.

Value

A data.frame with the same dimensions as the currentMat of the merger argument, plus one column with cell names, related to the rownames of the original input

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
head(intermediateMat(merger, n_steps = 1))

NMI improvement

Description

Compute the NMI improvement over the NMI merging procedure

Usage

NMIImp(merger, unclustered = NULL)

Arguments

`merger`	the result from having run `Dune` on the dataset
`unclustered`	The value assigned to unclustered cells. Default to `NULL`

Value

a vector with the mean NMI between methods at each merge

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
plot(0:nrow(merger$merges), NMIImp(merger))

NMI Matrix

Description

NMI Matrix

Usage

NMIs(clusMat, unclustered = NULL)

Arguments

`clusMat`	The clustering matrix with a row per cell and a column per clustering label type
`unclustered`	The value assigned to unclustered cells. Default to `NULL`

Details

In the NMI matrix, each cell (i, j) is the normalized mutual information between columns i and j of the original clusMat. If unclustered is not NULL, cells assigned to the unclustered cluster are not counted when computing the NMI.
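
As an illustration of the quantity in each cell, here is a minimal base-R sketch of the normalized mutual information between two label vectors. Several normalizations of mutual information are in common use; this sketch divides by the arithmetic mean of the two entropies, which is an assumption and may not match the variant used by the package. The helper names `entropy` and `nmi_pair` are hypothetical, as is the rule of dropping a cell that is unclustered in either label.

```r
# Shannon entropy (natural log) of a vector or table of probabilities.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical helper (not part of Dune): NMI between two label vectors,
# normalized by the arithmetic mean of the marginal entropies (assumption).
nmi_pair <- function(x, y, unclustered = NULL) {
  if (!is.null(unclustered)) {
    # Assumption: a cell is ignored if it is unclustered in either label
    keep <- x != unclustered & y != unclustered
    x <- x[keep]
    y <- y[keep]
  }
  joint <- table(x, y) / length(x)   # joint distribution of the two labels
  h_x <- entropy(rowSums(joint))
  h_y <- entropy(colSums(joint))
  mi <- h_x + h_y - entropy(joint)   # mutual information
  mi / ((h_x + h_y) / 2)
}

nmi_pair(c(1, 1, 2, 2), c(1, 1, 2, 2))  # identical partitions: 1
nmi_pair(c(1, 1, 2, 2), c(1, 2, 1, 2))  # independent partitions: 0 (up to rounding)
```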

Value

The NMI matrix

Examples

data("clusMat", package = "Dune")
NMIs(clusMat)

NMI improvement plot

Description

A plot to see how NMI improves over merging

Usage

NMItrend(merger, unclustered = NULL)

Arguments

`merger`	the result from having run `Dune` on the dataset
`unclustered`	The value assigned to unclustered cells. Default to `NULL`

Value

a ggplot object

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
NMItrend(merger)

Cluster labels for a subset of the Allen Smart-Seq nuclei dataset

Description

Cluster labels for a subset of the Allen Smart-Seq nuclei dataset

Usage

nuclei

Format

An object of class data.frame with 1744 rows and 7 columns.

Details

This matrix of clusters was obtained by running 3 clustering algorithms on a brain snRNA-Seq dataset from Tasic et al. (https://doi.org/10.1038/s41586-018-0654-5). This dataset was then subsetted to the GABAergic neurons. Code to reproduce all this can be found in the GitHub repository from the Dune paper (https://github.com/HectorRDB/Dune_Paper).

Plot a heatmap of the ARI matrix

Description

We can compute the ARI between pairs of cluster labels. This function plots a matrix where a cell is the adjusted Rand Index between cluster label of row i and cluster label of column j.

Usage

plotARIs(clusMat, unclustered = NULL, values = TRUE, numericalLabels = FALSE)

Arguments

`clusMat`	The clustering matrix with a row per cell and a column per clustering label type
`unclustered`	The value assigned to unclustered cells. Default to `NULL`
`values`	Whether to also display the ARI values. Default to TRUE.
`numericalLabels`	Whether labels are numerical values. Default to FALSE.

Value

a ggplot object

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
plotARIs(merger$initialMat)
plotARIs(merger$currentMat)

Plot a heatmap of the NMI matrix

Description

We can compute the NMI between pairs of cluster labels. This function plots a matrix where a cell is the Normalized Mutual Information between cluster label of row i and cluster label of column j.

Usage

plotNMIs(clusMat, unclustered = NULL, values = TRUE, numericalLabels = FALSE)

Arguments

`clusMat`	The clustering matrix with a row per cell and a column per clustering label type
`unclustered`	The value assigned to unclustered cells. Default to `NULL`
`values`	Whether to also display the NMI values. Default to TRUE.
`numericalLabels`	Whether labels are numerical values. Default to FALSE.

Value

a ggplot object

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat, metric = "NMI")
plotNMIs(merger$initialMat)
plotNMIs(merger$currentMat)

Plot the reduction in cluster size for an ARI merging with `Dune`

Description

Plot the reduction in cluster size for an ARI merging with Dune

Usage

plotPrePost(merger)

Arguments

`merger`	The output from an ARI merging, by calling Dune

Value

a ggplot object

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
plotPrePost(merger)

When to Stop

Description

When to Stop

Usage

whenToStop(merger, p = 1, average_n = NULL)

Arguments

`merger`	the result from having run `Dune` on the dataset
`p`	A value between 0 and 1. We stop when the metric used for merging has improved by a fraction p of the final total improvement. Defaults to 1 (i.e., running the full merging).
`average_n`	Alternatively, you can specify the average number of clusters you want to have.

Details

The Dune process improves the metric. This returns the first merging step after which the metric has improved by a fraction p of the total improvement. Setting p = 1 simply returns the number of merges.
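
The stopping rule above can be sketched in base R as follows, assuming we already have the mean metric after 0, 1, 2, ... merges (a vector of the kind returned by `ARIImp` or `NMIImp`, per their Value sections). The function name `when_to_stop_sketch` and the numbers in the toy trace are hypothetical, and the sketch assumes the trace is non-decreasing; it is not the package's implementation and does not handle `average_n`.

```r
# Hypothetical helper (not part of Dune): first merge step at which the
# improvement reaches a fraction p of the total improvement.
# metric_trace[1] is the metric before any merge, metric_trace[k + 1]
# the metric after k merges.
when_to_stop_sketch <- function(metric_trace, p = 1) {
  stopifnot(p >= 0, p <= 1, length(metric_trace) >= 1)
  gain <- metric_trace - metric_trace[1]  # improvement after 0, 1, 2, ... merges
  total <- gain[length(gain)]             # total improvement of the full merging
  # Position 1 corresponds to zero merges, hence the "- 1"
  which(gain >= p * total)[1] - 1
}

# Made-up trace: the metric goes from 0.50 to 0.70 over three merges
trace <- c(0.50, 0.55, 0.68, 0.70)
when_to_stop_sketch(trace, p = 0.5)  # 2: the second merge passes half the gain
when_to_stop_sketch(trace, p = 1)    # 3: the full merging
```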

Value

An integer giving the step where to stop.

Examples

data("clusMat", package = "Dune")
merger <- Dune(clusMat = clusMat)
whenToStop(merger, p = .5)
