| Title | Evaluation of Bioinformatics Metrics |
|---|---|
| Description | Evaluating the reliability of your own metrics and the measurements done on your own datasets by analysing the stability and goodness of the classifications of such metrics. |
| Authors | José Antonio Bernabé-Díaz [aut, cre], Manuel Franco [aut], Juana-María Vivo [aut], Manuel Quesada-Martínez [aut], Astrid Duque-Ramos [aut], Jesualdo Tomás Fernández-Breis [aut] |
| Maintainer | José Antonio Bernabé-Díaz <[email protected]> |
| License | GPL-3 |
| Version | 1.23.0 |
| Built | 2024-10-30 07:46:06 UTC |
| Source | https://github.com/bioc/evaluomeR |
Returns a named list in which each metric name is linked to a data frame containing the evaluated individuals, their score for that metric, and the id of the cluster to which each individual is assigned. The cluster assignment is performed after computing the optimal k value with evaluomeR.
annotateClustersByMetric(df, k.range, bs, seed)
df | Input data frame. The first column denotes the identifier of the evaluated individuals. The remaining columns contain the metrics used to evaluate the individuals. Rows with NA values will be ignored. |
k.range | Range of k values in which the optimal k will be searched. |
bs | Bootstrap re-sampling parameter (number of bootstrap replicates). |
seed | Random seed to be used. |
A named list resulting from computing the optimal cluster for each metric. Each metric is a name in the list, and its content is a data frame that includes the individuals, the value of the corresponding metric, and the id of the cluster to which each individual has been assigned according to the optimal clustering.
data("ontMetrics") annotated_clusters=annotateClustersByMetric(ontMetrics, k.range=c(2,3), bs=20, seed=100) annotated_clusters[['ANOnto']]
data("ontMetrics") annotated_clusters=annotateClustersByMetric(ontMetrics, k.range=c(2,3), bs=20, seed=100) annotated_clusters[['ANOnto']]
Metrics for biological pathways: two metrics that provide quantitative characterizations of the importance of regulation in biochemical pathway systems, including systems designed for applications in synthetic biology or metabolic engineering. The metrics are reachability and efficiency.
data("bioMetrics")
data("bioMetrics")
An object of class SummarizedExperiment with 15 rows and 3 columns.
Davis JD, Voit EO (2018). “Metrics for regulated biochemical pathway systems.” Bioinformatics. doi:10.1093/bioinformatics/bty942.
A vector of supported CBIs available in evaluomeR.
evaluomeRSupportedCBI()
A character vector of CBI names.
supportedCBIs <- evaluomeRSupportedCBI()
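The returned vector lists the values accepted by the cbi parameter of the stability and quality analyses documented below. A minimal sketch, assuming the ontMetrics example data set and "clara" being among the supported CBIs:

# Minimal sketch: list the supported CBIs and use one with quality()
supportedCBIs <- evaluomeRSupportedCBI()
print(supportedCBIs)
data("ontMetrics")
result <- quality(ontMetrics, k=4, cbi="clara", getImages=FALSE)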
This method is a wrapper to retrieve a specific SummarizedExperiment, given a k value, from the object returned by the qualityRange function.
getDataQualityRange(data, k)
data | The object returned by the qualityRange function. |
k | The desired k value whose SummarizedExperiment will be retrieved. |
The SummarizedExperiment that contains information about the selected k cluster.
# Using example data from our package
data("ontMetrics")
qualityRangeData <- qualityRange(ontMetrics, k.range=c(3,5), getImages = FALSE)
# Getting dataframe that contains information about k=5
k5Data = getDataQualityRange(qualityRangeData, 5)
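The measurements can then be extracted from k5Data with the standard SummarizedExperiment accessor; a minimal sketch, assuming the values are stored in the first assay:

# Minimal sketch, assuming the measurements are stored in the first assay
library(SummarizedExperiment)
head(assay(k5Data))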
Obtains the ranges of the metrics obtained by each optimal cluster.
getMetricRangeByCluster(df, k.range, bs, seed)
df | Input data frame. The first column denotes the identifier of the evaluated individuals. The remaining columns contain the metrics used to evaluate the individuals. Rows with NA values will be ignored. |
k.range | Range of k values in which the optimal k will be searched. |
bs | Bootstrap re-sampling parameter (number of bootstrap replicates). |
seed | Random seed to be used. |
A dataframe including the min and the max value for each pair (metric, cluster).
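This entry ships without an example; the following is a hedged sketch built from the signature above and the package's ontMetrics example data:

# Hedged sketch: range of each metric per optimal cluster
data("ontMetrics")
metricRanges <- getMetricRangeByCluster(ontMetrics, k.range=c(2,3), bs=20, seed=100)
head(metricRanges)   # min and max value for each (metric, cluster) pair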
Computes the relevancy of each metric for the clustering of the individuals into k groups, by means of robust sparse k-means (RSKC).
getMetricsRelevancy(df, k, alpha = NULL, L1 = NULL, seed = NULL)
df | Input data frame. The first column denotes the identifier of the evaluated individuals. The remaining columns contain the metrics used to evaluate the individuals. Rows with NA values will be ignored. |
k | K value (number of clusters). |
alpha | 0 <= alpha <= 1, the proportion of the cases to be trimmed in robust sparse K-means (see the RSKC package). |
L1 | A single L1 bound on the feature weights (see the RSKC package). |
seed | Random seed to be used. |
A list containing the RSKC output object, the row indexes of the trimmed cases, and a table with the relevancy of each metric.
data("ontMetrics") metricsRelevancy = getMetricsRelevancy(ontMetrics, k=3, alpha=0.1, seed=100) metricsRelevancy$rskc # RSKC output object metricsRelevancy$trimmed_cases # Trimmed cases from input (row indexes) metricsRelevancy$relevancy # Metrics relevancy table
data("ontMetrics") metricsRelevancy = getMetricsRelevancy(ontMetrics, k=3, alpha=0.1, seed=100) metricsRelevancy$rskc # RSKC output object metricsRelevancy$trimmed_cases # Trimmed cases from input (row indexes) metricsRelevancy$relevancy # Metrics relevancy table
This method finds the optimal value of k for each metric.
getOptimalKValue(stabData, qualData, k.range = NULL)
stabData | The output of a stability analysis (e.g., stabilityRange). |
qualData | The output of a quality analysis (e.g., qualityRange). |
k.range | A range of k values to limit the scope of the analysis. |
It returns a dataframe following the schema: metric, optimal_k.
# Using example data from our package
data("rnaMetrics")
stabilityData <- stabilityRange(data=rnaMetrics, k.range=c(2,4), bs=20, getImages = FALSE)
qualityData <- qualityRange(data=rnaMetrics, k.range=c(2,4), getImages = FALSE)
kOptTable = getOptimalKValue(stabilityData, qualityData)
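The resulting table can be inspected directly; a short sketch (the columns follow the metric/optimal_k schema described above):

# Inspect the optimal k chosen per metric
head(kOptTable)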
This analysis calculates a global metric score based upon a prediction model computed with the flexmix package.
globalMetric(data, k.range = c(2, 15), nrep = 10, criterion = c("BIC", "AIC"), PCA = FALSE, seed = NULL)
data | A SummarizedExperiment object or data frame containing the metrics. |
k.range | Concatenation of two positive integers. The first value is the minimum k of the range and the second is the maximum k. |
nrep | Positive integer. Number of random initializations used in adjusting the model. |
criterion | String. Criterion applied in order to select the best model. Possible values: "BIC" or "AIC". |
PCA | Boolean. If true, a PCA is performed on the input dataframe before computing the predictions. |
seed | Positive integer. A seed for internal bootstrap. |
A dataframe containing the global metric score for each metric.
# Using example data from our package
data("rnaMetrics")
globalMetric(rnaMetrics, k.range = c(2,3), nrep=10, criterion="AIC", PCA=TRUE)
Calculation of the Pearson correlation coefficient between every pair of available metrics in order to quantify their degree of interrelationship. The score is in the range [-1,1]. Perfect correlations: -1 (inverse) and 1 (direct).
metricsCorrelations(data, margins = c(0, 10, 9, 11), getImages = TRUE)
data | A SummarizedExperiment object or data frame containing the metrics. |
margins | Numeric vector of four values defining the plot margins. |
getImages | Boolean. If true, a plot is displayed. |
The Pearson correlation matrix as an assay in a SummarizedExperiment object.
# Using example data from our package
data("ontMetrics")
cor = metricsCorrelations(ontMetrics, getImages = TRUE, margins = c(1,0,5,11))
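The correlation matrix itself can be recovered from the returned object with the standard accessor; a minimal sketch, assuming the Pearson matrix is stored in the first assay:

# Minimal sketch, assuming the Pearson matrix is stored in the first assay
library(SummarizedExperiment)
corMatrix <- assay(cor)
corMatrix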
Structural ontology metrics: 19 metrics measuring structural aspects of bio-ontologies, analysed on two different corpora of ontologies: OBO Foundry and AgroPortal.
data("ontMetrics")
data("ontMetrics")
An object of class SummarizedExperiment with 80 rows and 20 columns.
Franco M, Vivo JM, Quesada-Martínez M, Duque-Ramos A, Fernández-Breis JT (2019). “Evaluation of ontology structural metrics based on public repository data.” Bioinformatics. doi:10.1093/bib/bbz009, https://dx.doi.org/10.1093/bib/bbz009.
It plots the values of the metrics in a SummarizedExperiment object as a boxplot.
plotMetricsBoxplot(data)
data | A SummarizedExperiment object or data frame containing the metrics. |
Nothing.
# Using example data from our package
data("ontMetrics")
plotMetricsBoxplot(ontMetrics)
It clusters the values of the metrics in a SummarizedExperiment object as an hclust dendrogram from stats. By default, the distance measure is 'euclidean' and the hclust method is 'ward.D2'.
plotMetricsCluster(data, scale = FALSE, k = NULL)
data | A SummarizedExperiment object or data frame containing the metrics. |
scale | Boolean. If true, the input data is scaled. Default: FALSE. |
k | Integer. If not NULL, a 'cutree' cut of the dendrogram into k groups is performed. Default: NULL. |
An hclust object.
# Using example data from our package
data("ontMetrics")
plotMetricsCluster(ontMetrics, scale=TRUE)
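Since the return value is an hclust object, it can also be cut into groups with cutree from stats; a minimal sketch, assuming three groups of metrics:

# Minimal sketch: cut the metric dendrogram into 3 groups
data("ontMetrics")
hc <- plotMetricsCluster(ontMetrics, scale=TRUE, k=3)
cutree(hc, k=3)   # group membership per metric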
It plots a clustering comparison between two different k-cluster vectors for a set of metrics.
plotMetricsClusterComparison(data, k.vector1, k.vector2 = NULL, seed = NULL)
data | A SummarizedExperiment object or data frame containing the metrics. |
k.vector1 | Vector of positive integers representing the k clusters of the first clustering. |
k.vector2 | Optional. Vector of positive integers representing the k clusters of the second clustering. |
seed | Positive integer. A seed for internal bootstrap. |
Nothing.
# Using example data from our package
data("rnaMetrics")
stabilityData <- stabilityRange(data=rnaMetrics, k.range=c(2,4), bs=20, getImages = FALSE)
qualityData <- qualityRange(data=rnaMetrics, k.range=c(2,4), getImages = FALSE)
kOptTable = getOptimalKValue(stabilityData, qualityData)
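The example above stops at the optimal-k table; the comparison plot itself would be produced by a call such as the following hedged sketch, which assumes that single k values are accepted as length-one vectors:

# Hedged sketch: compare clusterings with k = 2 and k = 3 for the rnaMetrics data
plotMetricsClusterComparison(rnaMetrics, k.vector1=2, k.vector2=3, seed=100)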
It plots the minimum, maximum and standard deviation values of the metrics in a SummarizedExperiment object.
plotMetricsMinMax(data)
data | A SummarizedExperiment object or data frame containing the metrics. |
Nothing.
# Using example data from our package
data("ontMetrics")
plotMetricsMinMax(ontMetrics)
It plots the values of the metrics in a SummarizedExperiment object as a violin plot.
plotMetricsViolin(data, nplots = 20)
data | A SummarizedExperiment object or data frame containing the metrics. |
nplots | Positive integer. Number of metrics per violin plot. Default: 20. |
Nothing.
# Using example data from our package
data("ontMetrics")
plotMetricsViolin(ontMetrics)
The goodness of the classifications is assessed by validating the clusters generated. For this purpose, we use the Silhouette width as validity index. This index computes and compares the quality of the clustering outputs found by the different metrics, thus making it possible to measure the goodness of the classification for both instances and metrics. More precisely, this goodness measurement provides an assessment of how similar an instance is to the other instances of its own cluster and how dissimilar it is to all the other clusters. The average over all instances quantifies how appropriately the instances are clustered. Kaufman and Rousseeuw suggested interpreting the global Silhouette width score as the effectiveness of the clustering structure. The values are in the range [-1,1], with the following meaning:
There is no substantial clustering structure: [-1, 0.25].
The clustering structure is weak and could be artificial: ]0.25, 0.50].
There is a reasonable clustering structure: ]0.50, 0.70].
A strong clustering structure has been found: ]0.70, 1].
quality(data, k = 5, cbi = "kmeans", getImages = FALSE, all_metrics = FALSE, seed = NULL, ...)
data | A SummarizedExperiment object or data frame containing the metrics. |
k | Positive integer. Number of clusters, within the [2,15] range. |
cbi | Clusterboot interface name (default: "kmeans"): "kmeans", "clara", "clara_pam", "hclust", "pamk", "pamk_pam", "pamk". Any CBI appended with '_pam' makes use of PAM (partition around medoids). |
getImages | Boolean. If true, a plot is displayed. |
all_metrics | Boolean. If true, clustering is performed upon all the dataset. |
seed | Positive integer. A seed for internal bootstrap. |
A SummarizedExperiment containing the silhouette width measurements and cluster sizes for cluster k.
Kaufman L, Rousseeuw PJ (2009). Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons.
# Using example data from our package
data("ontMetrics")
result = quality(ontMetrics, k=4)
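The silhouette widths and cluster sizes can be pulled out of the result with the standard accessor; a minimal sketch, assuming they are stored in the first assay:

# Minimal sketch, assuming the measurements are stored in the first assay
library(SummarizedExperiment)
assay(result)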
The goodness of the classifications is assessed by validating the clusters generated for a range of k values. For this purpose, we use the Silhouette width as validity index. This index computes and compares the quality of the clustering outputs found by the different metrics, thus making it possible to measure the goodness of the classification for both instances and metrics. More precisely, this measurement provides an assessment of how similar an instance is to the other instances of its own cluster and how dissimilar it is to the rest of the clusters. The average over all instances quantifies how appropriately the instances are clustered. Kaufman and Rousseeuw suggested interpreting the global Silhouette width score as the effectiveness of the clustering structure. The values are in the range [-1,1], with the following meaning:
There is no substantial clustering structure: [-1, 0.25].
The clustering structure is weak and could be artificial: ]0.25, 0.50].
There is a reasonable clustering structure: ]0.50, 0.70].
A strong clustering structure has been found: ]0.70, 1].
qualityRange(data, k.range = c(3, 5), cbi = "kmeans", getImages = FALSE, all_metrics = FALSE, seed = NULL, ...)
data | A SummarizedExperiment object or data frame containing the metrics. |
k.range | Concatenation of two positive integers. The first value is the minimum k of the range and the second is the maximum k. |
cbi | Clusterboot interface name (default: "kmeans"): "kmeans", "clara", "clara_pam", "hclust", "pamk", "pamk_pam", "pamk". Any CBI appended with '_pam' makes use of PAM (partition around medoids). |
getImages | Boolean. If true, a plot is displayed. |
all_metrics | Boolean. If true, clustering is performed upon all the dataset. |
seed | Positive integer. A seed for internal bootstrap. |
A list of SummarizedExperiment objects containing the silhouette width measurements and cluster sizes from k.range[1] to k.range[2]. The position in the list matches the k value used in that dataframe; for instance, position 5 represents the dataframe with k = 5.
Kaufman L, Rousseeuw PJ (2009). Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons.
# Using example data from our package
data("ontMetrics")
# Without plotting
dataFrameList = qualityRange(ontMetrics, k.range=c(2,3), getImages = FALSE)
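Individual k values can then be retrieved from the returned list with getDataQualityRange, documented above; a short sketch:

# Retrieve the SummarizedExperiment corresponding to k = 3
k3Data <- getDataQualityRange(dataFrameList, 3)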
The goodness of the classifications is assessed by validating the clusters generated for a set of k values. For this purpose, we use the Silhouette width as validity index. This index computes and compares the quality of the clustering outputs found by the different metrics, thus making it possible to measure the goodness of the classification for both instances and metrics. More precisely, this measurement provides an assessment of how similar an instance is to the other instances of its own cluster and how dissimilar it is to the rest of the clusters. The average over all instances quantifies how appropriately the instances are clustered. Kaufman and Rousseeuw suggested interpreting the global Silhouette width score as the effectiveness of the clustering structure. The values are in the range [-1,1], with the following meaning:
There is no substantial clustering structure: [-1, 0.25].
The clustering structure is weak and could be artificial: ]0.25, 0.50].
There is a reasonable clustering structure: ]0.50, 0.70].
A strong clustering structure has been found: ]0.70, 1].
qualitySet(data, k.set = c(2, 4), cbi = "kmeans", all_metrics = FALSE, getImages = FALSE, seed = NULL, ...)
data | A SummarizedExperiment object or data frame containing the metrics. |
k.set | A list of integer values of k to be evaluated. |
cbi | Clusterboot interface name (default: "kmeans"): "kmeans", "clara", "clara_pam", "hclust", "pamk", "pamk_pam", "pamk". Any CBI appended with '_pam' makes use of PAM (partition around medoids). |
all_metrics | Boolean. If true, clustering is performed upon all the dataset. |
getImages | Boolean. If true, a plot is displayed. |
seed | Positive integer. A seed for internal bootstrap. |
A list of SummarizedExperiment objects containing the silhouette width measurements and cluster sizes from k.set.
Kaufman L, Rousseeuw PJ (2009). Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons.
# Using example data from our package
data("rnaMetrics")
# Without plotting
dataFrameList = qualitySet(rnaMetrics, k.set=c(2,3), getImages = FALSE)
RNA quality metrics for the assessment of gene expression differences: two quality metrics measured on 16 aliquots of a single batch of RNA samples. The metrics are the Degradation Factor (DegFact) and the RNA Integrity Number (RIN).
data("rnaMetrics")
data("rnaMetrics")
An object of class SummarizedExperiment with 16 rows and 3 columns.
Imbeaud S, Graudens E, Boulanger V, Barlet X, Zaborski P, Eveno E, Mueller O, Schroeder A, Auffray C (2005). “Towards standardization of RNA quality assessment using user-independent classifiers of microcapillary electrophoresis traces.” Nucleic acids research, 33(6), e56–e56.
This analysis permits estimating whether the clustering is meaningfully affected by small variations in the sample. First, a clustering using the k-means algorithm is carried out; the value of k can be provided by the user. Then, the stability index is the mean of the Jaccard coefficient values of a number of bs bootstrap replicates. The values are in the range [0,1], with the following meaning:
Unstable: [0, 0.60[.
Doubtful: [0.60, 0.75].
Stable: ]0.75, 0.85].
Highly Stable: ]0.85, 1].
stability(data, k = 5, bs = 100, cbi = "kmeans", getImages = FALSE, all_metrics = FALSE, seed = NULL, ...)
data | A SummarizedExperiment object or data frame containing the metrics. |
k | Positive integer. Number of clusters, within the [2,15] range. |
bs | Positive integer. Bootstrap value to perform the resampling. |
cbi | Clusterboot interface name (default: "kmeans"): "kmeans", "clara", "clara_pam", "hclust", "pamk", "pamk_pam", "pamk". Any CBI appended with '_pam' makes use of PAM (partition around medoids). |
getImages | Boolean. If true, a plot is displayed. |
all_metrics | Boolean. If true, clustering is performed upon all the dataset. |
seed | Positive integer. A seed for internal bootstrap. |
An ExperimentList containing the stability and cluster measurements for k clusters.
Milligan GW, Cheng R (1996). “Measuring the influence of individual data points in a cluster analysis.” Journal of classification, 13(2), 315–335.
Jaccard P (1901). “Distribution de la flore alpine dans le bassin des Dranses et dans quelques regions voisines.” Bull Soc Vaudoise Sci Nat, 37, 241–272.
# Using example data from our package
data("ontMetrics")
result <- stability(ontMetrics, k=6, getImages=TRUE)
This analysis permits estimating whether the clustering is meaningfully affected by small variations in the sample. For a range of k values (k.range), a clustering using the k-means algorithm is carried out. Then, the stability index is the mean of the Jaccard coefficient values of a number of bs bootstrap replicates. The values are in the range [0,1], with the following meaning:
Unstable: [0, 0.60[.
Doubtful: [0.60, 0.75].
Stable: ]0.75, 0.85].
Highly Stable: ]0.85, 1].
stabilityRange(data, k.range = c(2, 15), bs = 100, cbi = "kmeans", getImages = FALSE, all_metrics = FALSE, seed = NULL, ...)
data | A SummarizedExperiment object or data frame containing the metrics. |
k.range | Concatenation of two positive integers. The first value is the minimum k of the range and the second is the maximum k. |
bs | Positive integer. Bootstrap value to perform the resampling. |
cbi | Clusterboot interface name (default: "kmeans"): "kmeans", "clara", "clara_pam", "hclust", "pamk", "pamk_pam", "pamk". Any CBI appended with '_pam' makes use of PAM (partition around medoids). |
getImages | Boolean. If true, a plot is displayed. |
all_metrics | Boolean. If true, clustering is performed upon all the dataset. |
seed | Positive integer. A seed for internal bootstrap. |
An ExperimentList containing the stability and cluster measurements for each k value within k.range.
Milligan GW, Cheng R (1996). “Measuring the influence of individual data points in a cluster analysis.” Journal of classification, 13(2), 315–335.
Jaccard P (1901). “Distribution de la flore alpine dans le bassin des Dranses et dans quelques regions voisines.” Bull Soc Vaudoise Sci Nat, 37, 241–272.
# Using example data from our package
data("ontMetrics")
result <- stabilityRange(ontMetrics, k.range=c(2,3))
This analysis permits estimating whether the clustering is meaningfully affected by small variations in the sample. For a set of k values (k.set), a clustering using the k-means algorithm is carried out. Then, the stability index is the mean of the Jaccard coefficient values of a number of bs bootstrap replicates. The values are in the range [0,1], with the following meaning:
Unstable: [0, 0.60[.
Doubtful: [0.60, 0.75].
Stable: ]0.75, 0.85].
Highly Stable: ]0.85, 1].
stabilitySet(data, k.set = c(2, 3), bs = 100, cbi = "kmeans", getImages = FALSE, all_metrics = FALSE, seed = NULL, ...)
data | A SummarizedExperiment object or data frame containing the metrics. |
k.set | A list of integer values of k to be evaluated. |
bs | Positive integer. Bootstrap value to perform the resampling. |
cbi | Clusterboot interface name (default: "kmeans"): "kmeans", "clara", "clara_pam", "hclust", "pamk", "pamk_pam", "pamk". Any CBI appended with '_pam' makes use of PAM (partition around medoids). |
getImages | Boolean. If true, a plot is displayed. |
all_metrics | Boolean. If true, clustering is performed upon all the dataset. |
seed | Positive integer. A seed for internal bootstrap. |
An ExperimentList containing the stability and cluster measurements for each k value in k.set.
Milligan GW, Cheng R (1996). “Measuring the influence of individual data points in a cluster analysis.” Journal of classification, 13(2), 315–335.
Jaccard P (1901). “Distribution de la flore alpine dans le bassin des Dranses et dans quelques regions voisines.” Bull Soc Vaudoise Sci Nat, 37, 241–272.
# Using example data from our package
data("rnaMetrics")
result <- stabilitySet(rnaMetrics, k.set=c(2,3))