Title: | Machine learning tools for automated transcriptome clustering analysis |
---|---|
Description: | Symptomatic heterogeneity in complex diseases reveals differences in molecular states that need to be investigated. However, selecting the numerous parameters of an exploratory clustering analysis in RNA profiling studies requires deep understanding of machine learning and extensive computational experimentation. Tools that assist with such decisions without prior field knowledge are nonexistent and further gene association analyses need to be performed independently. We have developed a suite of tools to automate these processes and make robust unsupervised clustering of transcriptomic data more accessible through automated machine learning based functions. The efficiency of each tool was tested with four datasets characterised by different expression signal strengths. Our toolkit’s decisions reflected the real number of stable partitions in datasets where the subgroups are discernible. Even in datasets with less clear biological distinctions, stable subgroups with different expression profiles and clinical associations were found. |
Authors: | Sokratis Kariotis [aut, cre] |
Maintainer: | Sokratis Kariotis <[email protected]> |
License: | GPL-3 |
Version: | 1.9.0 |
Built: | 2024-10-30 09:08:45 UTC |
Source: | https://github.com/bioc/omada |
Method Selection through intra-method Consensus Partition Consistency
clusteringMethodSelection(data, method.upper.k = 5, number.of.comparisons = 3)
data |
A dataframe, where columns are features and rows are data points |
method.upper.k |
The number of clusters, k, up to which the average agreements will be calculated |
number.of.comparisons |
The number of comparisons to average over per k |
An object of class "methodSelection" containing a dataframe of partition agreement scores for a set of clustering runs with random parameters across different methods, and the corresponding plot
clusteringMethodSelection(toy_genes, method.upper.k = 3, number.of.comparisons = 2)
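The idea of averaging agreement scores over several comparisons per k can be sketched in base R as follows. The score table below is made up for illustration, and choosing the method with the highest mean agreement is an assumption of this sketch, not a documented rule of clusteringMethodSelection.

```r
# Hypothetical agreement scores for three methods, two values of k and
# two comparisons per k (all numbers are invented for illustration).
agreement <- data.frame(
  method = rep(c("spectral", "kmeans", "hierarchical"), each = 2),
  k = rep(2:3, times = 3),
  comparison_1 = c(0.80, 0.70, 0.60, 0.55, 0.90, 0.40),
  comparison_2 = c(0.70, 0.60, 0.70, 0.65, 0.80, 0.50)
)

# Average over the comparisons for every (method, k) pair.
agreement$average <- rowMeans(agreement[, c("comparison_1", "comparison_2")])

# Average over k for every method and pick the most self-consistent one
# (assumed selection rule for this sketch).
per_method <- tapply(agreement$average, agreement$method, mean)
best_method <- names(per_method)[which.max(per_method)]
print(per_method)
print(best_method)
```

With more comparisons per k the averages become less sensitive to a single unlucky parameter draw, at the cost of more clustering runs.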
Estimating number of clusters through internal exhaustive ensemble majority voting
clusterVoting(data, min.k, max.k, algorithm)
data |
A dataframe, where columns are features and rows are data points |
min.k |
Minimum number of clusters for which we calculate stabilities |
max.k |
Maximum number of clusters for which we calculate stabilities |
algorithm |
The clustering algorithm to use for the multiple clustering runs to be measured |
An object of class "clusterVoting" containing a matrix with metric scores for every k and internal index, cluster memberships for every k, a dataframe with the k votes for every index, k vote frequencies and the frequency barplot of the k votes
clusterVoting(toy_genes, 4,14,"sc")
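Majority voting over internal indices can be illustrated with a small base R sketch. The score matrix and index names below are hypothetical, and the sketch assumes that a higher score is better for every index, which does not hold for all real internal indices.

```r
# Hypothetical scores: rows are candidate k values, columns are internal indices.
# Assumption for this sketch: a higher score is better for every index.
scores <- matrix(
  c(0.41, 0.55, 0.30,
    0.62, 0.58, 0.44,
    0.37, 0.61, 0.40,
    0.29, 0.33, 0.21),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(4:7), c("index_a", "index_b", "index_c"))
)

# Each index votes for the k at which its score is highest.
votes <- apply(scores, 2, function(column) rownames(scores)[which.max(column)])

# Tally the votes over all candidate k values, including those with no votes.
vote_frequencies <- table(factor(votes, levels = rownames(scores)))

# The estimated number of clusters is the most frequently voted k.
best_k <- as.integer(names(vote_frequencies)[which.max(vote_frequencies)])
print(vote_frequencies)
print(best_k)
```

A bar plot of `vote_frequencies` would correspond to the frequency barplot mentioned in the return value.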
Simulating a dataset and calculating stabilities over different numbers of clusters
feasibilityAnalysis(classes = 3, samples = 320, features = 400)
classes |
The number of classes of samples to be reflected in the simulated dataset. Also determines the ks to be considered (classes-2, classes+2) |
samples |
The number of samples in the simulated dataset |
features |
The number of features in the simulated dataset |
An object of class "feasibilityAnalysis" containing the average stabilities for all numbers of clusters (k), the average (over all k) and maximum stabilities observed, and the generated dataset
feasibilityAnalysis(classes = 2, samples = 20, features = 30)
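A rough base R sketch of the feasibility idea is shown below: simulate a dataset with a known number of classes, then measure how stable a clustering is for each k around that number. The class-specific mean shift, the use of kmeans, the 80% subsampling and the plain Rand index are all assumptions of this sketch and may differ from what feasibilityAnalysis does internally.

```r
set.seed(42)
classes <- 3
samples <- 60
features <- 10

# Simulate: each class gets its own mean shift (assumption of this sketch).
labels <- rep(seq_len(classes), length.out = samples)
simulated <- matrix(rnorm(samples * features), nrow = samples) + 3 * labels

# Candidate k values around the number of classes, never below 2.
ks <- seq(max(2, classes - 2), classes + 2)

# Plain Rand index: share of sample pairs on which two partitions agree.
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  mean(same_a[upper.tri(same_a)] == same_b[upper.tri(same_b)])
}

# Stability of k: mean agreement between the full-data partition and
# partitions fitted on random 80% subsamples, compared on shared samples.
stability_for_k <- function(data, k, runs = 10) {
  full <- kmeans(data, centers = k, nstart = 5)$cluster
  mean(replicate(runs, {
    keep <- sample(nrow(data), size = floor(0.8 * nrow(data)))
    sub <- kmeans(data[keep, , drop = FALSE], centers = k, nstart = 5)$cluster
    rand_index(full[keep], sub)
  }))
}

stabilities <- sapply(ks, function(k) stability_for_k(simulated, k))
names(stabilities) <- ks
average_stability <- mean(stabilities)
maximum_stability <- max(stabilities)
print(stabilities)
```

The three summary values mirror the quantities the return value describes: per-k stabilities, their average and their maximum.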
Simulating a dataset based on an existing dataset's dimensions, mean and standard deviation
feasibilityAnalysisDataBased(data, classes = 3)
data |
The dataset on which to base the simulation; the number of samples, the number of features and the numeric summaries (mean and standard deviation) are extracted from it |
classes |
The number of classes of samples to be reflected in the simulated dataset. Also determines the ks to be considered (classes-2, classes+2) |
An object of class "feasibilityAnalysis" containing the average stabilities for all numbers of clusters(k), the average (over all k) and maximum stabilities observed and the generated dataset
feasibilityAnalysisDataBased(data = toy_genes, classes = 2)
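As a minimal sketch of what basing a simulation on an existing dataset can mean, the base R snippet below copies the dimensions, overall mean and overall standard deviation of a stand-in matrix. The stand-in data and the use of a single global mean and standard deviation are assumptions; how feasibilityAnalysisDataBased uses these quantities internally is not described here.

```r
set.seed(7)
# Hypothetical stand-in for an existing expression dataset (samples x genes).
existing <- matrix(rnorm(40 * 6, mean = 5, sd = 2), nrow = 40)

# Quantities extracted from the existing dataset.
n_samples <- nrow(existing)
n_features <- ncol(existing)
overall_mean <- mean(existing)
overall_sd <- sd(as.vector(existing))

# Simulated dataset with matching dimensions and first two moments.
simulated <- matrix(
  rnorm(n_samples * n_features, mean = overall_mean, sd = overall_sd),
  nrow = n_samples, ncol = n_features
)
print(dim(simulated))
```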
Predictor variable subsampling sets and bootstrapping stability set selection
featureSelection(data, min.k = 2, max.k = 4, step = 5)
data |
A dataframe, where columns are features and rows are data points |
min.k |
Minimum number of clusters for which we calculate stabilities |
max.k |
Maximum number of clusters for which we calculate stabilities |
step |
The number of additional features each successive feature set will contain |
An object of class "featureSelection" containing the dataframe of average bootstrap stabilities, where rows represent feature sets and columns number of clusters, the corresponding line plot, the number and the names of the selected features
featureSelection(toy_genes, min.k = 2, max.k = 4, step = 10)
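The role of step can be sketched in base R by building nested feature sets that grow by step features at a time. Ranking the features by variance is an assumption made for this illustration only, and the demo data are random; each resulting set would then be clustered and scored for bootstrap stability.

```r
set.seed(3)
# Hypothetical data: 20 samples (rows) and 25 features (columns).
demo <- as.data.frame(matrix(rnorm(20 * 25), nrow = 20))
step <- 10

# Assumption of this sketch: order features by decreasing variance.
ranked <- names(sort(sapply(demo, var), decreasing = TRUE))

# Set sizes grow by `step`; the last set always holds all features.
set_sizes <- unique(c(seq_len(length(ranked) %/% step) * step, length(ranked)))

# Nested feature sets: every set contains the previous one plus `step` more.
feature_sets <- lapply(set_sizes, function(size) ranked[seq_len(size)])
print(set_sizes)
```

A smaller step gives a finer search over the number of features but multiplies the number of clustering runs.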
Get a dataframe of partition agreement scores for a set of clustering runs with random parameters across different methods
get_agreement_scores(object)
object |
An object of class "partitionAgreement" |
A dataframe of partition agreement scores for a set of clustering runs with random parameters across different methods
pa.object <- partitionAgreement(toy_genes, algorithm.1 = "spectral", measure.1 = "rbfdot", algorithm.2 = "kmeans", measure.2 = "Lloyd", number.of.clusters = 3)
get_agreement_scores(pa.object)
Get a dataframe of average bootstrap stabilities
get_average_feature_k_stabilities(object)
object |
An object of class "featureSelection" |
A dataframe of average bootstrap stabilities
fs.object <- featureSelection(toy_genes, min.k = 3, max.k = 4, step = 10)
get_average_feature_k_stabilities(fs.object)
Get average stabilities for all numbers of clusters(k)
get_average_stabilities_per_k(object)
object |
An object of class "feasibilityAnalysis" |
Average stabilities for all numbers of clusters(k)
fa.object <- feasibilityAnalysis(classes = 2, samples = 10, features = 15)
average.sts.k <- get_average_stabilities_per_k(fa.object)
Get the average stability(over all k)
get_average_stability(object)
object |
An object of class "feasibilityAnalysis" |
The average stability(over all k)
fa.object <- feasibilityAnalysis(classes = 2, samples = 10, features = 15)
average.st <- get_average_stability(fa.object)
Get cluster memberships for every k
get_cluster_memberships_k(object)
object |
An object of class "clusterVoting" |
Cluster memberships for every k
cv.object <- clusterVoting(toy_genes, 4, 6, "sc")
get_cluster_memberships_k(cv.object)
Get k vote frequencies
get_cluster_voting_k_votes(object)
object |
An object of class "clusterAnalysis" |
Matrix with k vote frequencies
oa.object <- omada(toy_genes, method.upper.k = 4)
get_cluster_voting_k_votes(oa.object)
Get cluster memberships for every k
get_cluster_voting_memberships(object)
object |
An object of class "clusterAnalysis" |
Cluster memberships for every k
oa.object <- omada(toy_genes, method.upper.k = 4)
get_cluster_voting_memberships(oa.object)
Get a dataframe with the k votes for every index
get_cluster_voting_metric_votes(object)
object |
An object of class "clusterAnalysis" |
Dataframe with the k votes for every index
oa.object <- omada(toy_genes, method.upper.k = 4)
get_cluster_voting_metric_votes(oa.object)
Get a matrix with metric scores for every k and internal index
get_cluster_voting_scores(object)
object |
An object of class "clusterAnalysis" |
A matrix with metric scores for every k and internal index
oa.object <- omada(toy_genes, method.upper.k = 4)
get_cluster_voting_scores(oa.object)
Get the optimal features
get_feature_selection_optimal_features(object)
object |
An object of class "clusterAnalysis" |
The list of optimal features
oa.object <- omada(toy_genes, method.upper.k = 4)
get_feature_selection_optimal_features(oa.object)
Get the optimal number of features
get_feature_selection_optimal_number_of_features(object)
object |
An object of class "clusterAnalysis" |
The optimal number of features
oa.object <- omada(toy_genes, method.upper.k = 6)
get_feature_selection_optimal_number_of_features(oa.object)
Get a dataframe of average bootstrap stabilities
get_feature_selection_scores(object)
object |
An object of class "clusterAnalysis" |
A dataframe of average bootstrap stabilities
oa.object <- omada(toy_genes, method.upper.k = 6)
get_feature_selection_scores(oa.object)
Get the simulated dataset
get_generated_dataset(object)
object |
An object of class "feasibilityAnalysis" |
Simulated dataset
fa.object <- feasibilityAnalysis(classes = 4, samples = 50, features = 15)
generated.ds <- get_generated_dataset(fa.object)
Get a matrix with metric scores for every k and internal index
get_internal_metric_scores(object)
object |
An object of class "clusterVoting" |
A matrix with metric scores for every k and internal index
cv.object <- clusterVoting(toy_genes, 4, 6, "sc")
get_internal_metric_scores(cv.object)
Get the maximum stability
get_max_stability(object)
object |
An object of class "feasibilityAnalysis" |
The maximum stability
fa.object <- feasibilityAnalysis(classes = 2, samples = 10, features = 15)
maximum.st <- get_max_stability(fa.object)
Get a dataframe with the k votes for every index
get_metric_votes_k(object)
object |
An object of class "clusterVoting" |
Dataframe with the k votes for every index
cv.object <- clusterVoting(toy_genes, 4, 6, "sc")
get_metric_votes_k(cv.object)
Get the optimal features
get_optimal_features(object)
object |
An object of class "featureSelection" |
The list of optimal features
fs.object <- featureSelection(toy_genes, min.k = 3, max.k = 6, step = 10)
get_optimal_features(fs.object)
Get a dataframe with the memberships of the samples found in the input data
get_optimal_memberships(object)
object |
An object of class "optimalClustering" |
A dataframe with the memberships of the samples found in the input data
oc.object <- optimalClustering(toy_genes, 4, "spectral")
get_optimal_memberships(oc.object)
Get the optimal number of features
get_optimal_number_of_features(object)
object |
An object of class "featureSelection" |
The optimal number of features
fs.object <- featureSelection(toy_genes, min.k = 3, max.k = 6, step = 10)
get_optimal_number_of_features(fs.object)
Get the optimal parameter used
get_optimal_parameter_used(object)
object |
An object of class "optimalClustering" |
The optimal parameter used
oc.object <- optimalClustering(toy_genes, 4, "spectral")
get_optimal_parameter_used(oc.object)
Get the optimal stability score
get_optimal_stability_score(object)
object |
An object of class "optimalClustering" |
The optimal stability score
oc.object <- optimalClustering(toy_genes, 4, "spectral")
get_optimal_stability_score(oc.object)
Get a dataframe of partition agreement scores for a set of clustering runs with random parameters across different methods
get_partition_agreement_scores(object)
object |
An object of class "methodSelection" or "clusterAnalysis" |
A dataframe of partition agreement scores for a set of clustering runs with random parameters across different methods
ms.object <- clusteringMethodSelection(toy_genes, method.upper.k = 3, number.of.comparisons = 2)
get_partition_agreement_scores(ms.object)
oa.object <- omada(toy_genes, method.upper.k = 4)
get_partition_agreement_scores(oa.object)
Get a dataframe with the memberships of the samples found in the input data
get_sample_memberships(object)
object |
An object of class "clusterAnalysis" |
A dataframe with the memberships of the samples found in the input data
oa.object <- omada(toy_genes, method.upper.k = 4)
get_sample_memberships(oa.object)
Get k vote frequencies
get_vote_frequencies_k(object)
object |
An object of class "clusterVoting" |
Matrix with k vote frequencies
cv.object <- clusterVoting(toy_genes, 4, 6, "sc")
get_vote_frequencies_k(cv.object)
A wrapper function that utilizes all tools to produce the optimal sample memberships
omada(data, method.upper.k = 5)
data |
A dataframe, where columns are features and rows are data points |
method.upper.k |
The upper limit of clusters, k, to be considered. Must be more than 2 |
An object of class "clusterAnalysis" containing partition.agreement.scores, partition.agreement.plot, feature.selection.scores, feature.selection.plot, feature.selection.optimal.features, feature.selection.optimal.number.of.features, cluster.voting.scores, cluster.voting.cluster.memberships, cluster.voting.metric.votes, as well as the k vote frequencies, the vote frequency plot and the sample memberships returned by the corresponding getters
omada(toy_genes, method.upper.k = 3)
Clustering with the optimal parameters estimated by these tools
optimalClustering(data, clusters, algorithm)
data |
A dataframe, where columns are features and rows are data points |
clusters |
Number of clusters to be generated by this clustering |
algorithm |
The clustering algorithm to be used |
An object of class "optimalClustering" containing a dataframe with the memberships of the samples found in the input data, the optimal stability score and parameter used
optimalClustering(toy_genes, 2,"kmeans")
Calculate the agreement (0,1) between two partitionings generated by two clustering runs using the adjusted Rand Index. Three clustering algorithms can be used (spectral, kmeans and hierarchical), each with its own set of parameters, listed below.
partitionAgreement( data, algorithm.1 = "hierarchical", measure.1 = "canberra", hier.agglo.algorithm.1 = "average", algorithm.2 = "hierarchical", measure.2 = "manhattan", hier.agglo.algorithm.2 = "average", number.of.clusters = 5 )
data |
A dataframe, where columns are features and rows are data points |
algorithm.1 |
First algorithm to be used (spectral/kmeans/hierarchical) |
measure.1 |
Concerns the first algorithm to be used and represents a kernel for Spectral/kmeans or a distance measure for hierarchical clustering |
hier.agglo.algorithm.1 |
Concerns the first algorithm to be used and represents the agglomerative method for hierarchical clustering (not used in spectral/kmeans clustering) |
algorithm.2 |
Second algorithm to be used (spectral/kmeans/hierarchical) |
measure.2 |
Concerns the second algorithm to be used and represents a kernel for Spectral/kmeans or a distance measure for hierarchical clustering |
hier.agglo.algorithm.2 |
Concerns the second algorithm to be used and represents the agglomerative method for hierarchical clustering (not used in spectral/kmeans clustering) |
number.of.clusters |
The upper limit of clusters to form starting from 2 |
Spectral kernels: rbfdot, polydot, vanilladot, tanhdot, laplacedot, besseldot, anovadot, splinedot
K-means kernels: Hartigan-Wong, Lloyd, Forgy, MacQueen
Hierarchical Agglomeration methods: average, ward.D, ward.D2, single, complete, mcquitty, median, centroid
Distance measures: euclidean, manhattan, canberra, minkowski, maximum
An object of class "partitionAgreement" containing agreements (Rand Indexes) from 1 cluster (ARI=0) up to the number of clusters requested
partitionAgreement(toy_genes, algorithm.1 = "hierarchical", measure.1 = "canberra", hier.agglo.algorithm.1 = "average", algorithm.2 = "hierarchical", measure.2 = "manhattan", hier.agglo.algorithm.2 = "average", number.of.clusters = 3)
partitionAgreement(toy_genes, algorithm.1 = "spectral", measure.1 = "rbfdot", algorithm.2 = "kmeans", measure.2 = "Lloyd", number.of.clusters = 5)
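To make the agreement measure concrete, the base R sketch below computes the adjusted Rand Index between two hierarchical clusterings that use the canberra and manhattan distances with average linkage, for k from 2 to 5. The helper adjusted_rand_index and the random demo data are written for this illustration and are not part of the package; partitionAgreement may compute and report the scores differently.

```r
# Adjusted Rand Index between two membership vectors (base R only).
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)
  choose2 <- function(x) x * (x - 1) / 2
  sum_cells <- sum(choose2(tab))
  sum_rows <- sum(choose2(rowSums(tab)))
  sum_cols <- sum(choose2(colSums(tab)))
  expected <- sum_rows * sum_cols / choose2(length(a))
  max_index <- (sum_rows + sum_cols) / 2
  # The index is undefined when both partitions are trivial.
  if (max_index == expected) return(NA_real_)
  (sum_cells - expected) / (max_index - expected)
}

set.seed(1)
# Hypothetical stand-in for an expression dataset: 30 samples x 8 features.
demo_data <- as.data.frame(matrix(abs(rnorm(240)) + 1, nrow = 30))

# Two hierarchical clusterings that differ only in the distance measure.
tree_1 <- hclust(dist(demo_data, method = "canberra"), method = "average")
tree_2 <- hclust(dist(demo_data, method = "manhattan"), method = "average")

# Agreement for every number of clusters from 2 up to 5.
ks <- 2:5
agreement <- sapply(ks, function(k) {
  adjusted_rand_index(cutree(tree_1, k), cutree(tree_2, k))
})
names(agreement) <- ks
print(agreement)
```

Values near 1 mean the two runs split the samples almost identically, while values near 0 indicate agreement no better than chance.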
Plot the average bootstrap stabilities
plot_average_stabilities(object)
object |
An object of class "featureSelection" |
Line plot of average bootstrap stabilities
fs.object <- featureSelection(toy_genes, min.k = 3, max.k = 6, step = 10)
plot_average_stabilities(fs.object)
Plot k vote frequencies
plot_cluster_voting(object)
object |
An object of class "clusterAnalysis" |
Plot k vote frequencies
oa.object <- omada(toy_genes, method.upper.k = 3)
plot_cluster_voting(oa.object)
Plot the average bootstrap stabilities
plot_feature_selection(object)
object |
An object of class "clusterAnalysis" |
Line plot of average bootstrap stabilities
oa.object <- omada(toy_genes, method.upper.k = 4)
plot_feature_selection(oa.object)
Plot of partition agreement scores
plot_partition_agreement(object)
object |
An object of class "methodSelection" or "clusterAnalysis" |
Plot of partition agreement scores
ms.object <- clusteringMethodSelection(toy_genes, method.upper.k = 3, number.of.comparisons = 2)
plot_partition_agreement(ms.object)
oa.object <- omada(toy_genes, method.upper.k = 4)
plot_partition_agreement(oa.object)
Plot k vote frequencies
plot_vote_frequencies(object)
object |
An object of class "clusterVoting" |
Plot k vote frequencies
cv.object <- clusterVoting(toy_genes, 4, 6, "sc")
plot_vote_frequencies(cv.object)
Column "id" represents genes and column "memberships" represents their respective clusters. Rows are samples
data(toy_gene_memberships)
data(toy_gene_memberships)
Columns are genes and rows are samples
data(toy_genes)
data(toy_genes)