Title: | A Framework for Consensus Partitioning |
---|---|
Description: | Subgroup classification is a basic task in genomic data analysis, especially for gene expression and DNA methylation data. It can also be used to test agreement with known clinical annotations, or to test whether significant batch effects exist. The cola package provides a general framework for subgroup classification by consensus partitioning. It has the following features: 1. It modularizes the consensus partitioning process so that various methods can be easily integrated. 2. It provides rich visualizations for interpreting the results. 3. It allows running multiple methods at the same time and provides functionality to compare results straightforwardly. 4. It provides a new method to extract features which separate subgroups more efficiently. 5. It automatically generates detailed reports for the complete analysis. 6. It allows applying consensus partitioning in a hierarchical manner. |
Authors: | Zuguang Gu [aut, cre] |
Maintainer: | Zuguang Gu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.13.0 |
Built: | 2024-11-29 05:45:18 UTC |
Source: | https://github.com/bioc/cola |
Subset a ConsensusPartitionList object
## S3 method for class 'ConsensusPartitionList'
x[i, j, drop = TRUE]
x |
A |
i |
Index for top-value methods, character or numeric. |
j |
Index for partitioning methods, character or numeric. |
drop |
Whether to drop the class, i.e. to return a ConsensusPartition object when a single combination of methods is selected. |
For a specific combination of top-value method and partitioning method, you can also subset by e.g. x['SD:hclust'].
A ConsensusPartitionList-class object or a ConsensusPartition-class object.
Zuguang Gu <[email protected]>
data(golub_cola)
golub_cola[c("SD", "MAD"), c("hclust", "kmeans")]
golub_cola["SD", "kmeans"] # a ConsensusPartition object
golub_cola["SD:kmeans"] # a ConsensusPartition object
golub_cola[["SD:kmeans"]] # a ConsensusPartition object
golub_cola["SD", "kmeans", drop = FALSE] # still a ConsensusPartitionList object
golub_cola["SD:kmeans", drop = FALSE] # still a ConsensusPartitionList object
golub_cola["SD", ]
golub_cola[, "hclust"]
golub_cola[1:2, 1:2]
Subset the HierarchicalPartition object
## S3 method for class 'HierarchicalPartition'
x[i]
x |
A |
i |
Index. The value should be numeric or a node ID. |
On each node, there is a ConsensusPartition-class object.
Note that you cannot get a sub-hierarchy of the partition.
A ConsensusPartition-class object.
data(golub_cola_rh)
golub_cola_rh["01"]
Subset a ConsensusPartitionList object
## S3 method for class 'ConsensusPartitionList'
x[[i]]
x |
A |
i |
Character index for combination of top-value methods and partitioning method in a form of e.g. |
A ConsensusPartition-class
object.
Zuguang Gu <[email protected]>
data(golub_cola)
golub_cola[["SD:kmeans"]]
Subset the HierarchicalPartition object
## S3 method for class 'HierarchicalPartition'
x[[i]]
x |
A |
i |
Index. The value should be numeric or a node ID. |
On each node, there is a ConsensusPartition-class object.
Note that you cannot get a sub-hierarchy of the partition.
A ConsensusPartition-class object.
# There is no example
NULL
Remove rows with low variance and impute missing values
adjust_matrix(m, sd_quantile = 0.05, max_na = 0.25, verbose = TRUE)
m |
A numeric matrix. |
sd_quantile |
Cutoff of the quantile of standard deviation. Rows with standard deviation less than it are removed. |
max_na |
Maximum NA fraction in each row. Rows with NA fraction larger than it are removed. |
verbose |
Whether to print messages. |
The function uses impute.knn to impute missing values, then uses adjust_outlier to adjust outliers and removes rows with low standard deviations.
A numeric matrix.
Zuguang Gu <[email protected]>
set.seed(123)
m = matrix(rnorm(100), nrow = 10)
m[sample(length(m), 5)] = NA
m[1, ] = 0
m
m2 = adjust_matrix(m)
m2
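The two row filters described above can be illustrated with a minimal base R sketch. It assumes the NA filter is applied before the standard-deviation filter, and it deliberately leaves out the impute.knn and adjust_outlier steps. The name filter_rows_sketch is hypothetical and not part of cola, and the order of steps and edge-case handling inside adjust_matrix may differ.

```r
# Hypothetical helper, not part of cola: mimics only the row-filtering idea.
filter_rows_sketch <- function(m, sd_quantile = 0.05, max_na = 0.25) {
  # Fraction of missing values in each row
  na_frac <- rowMeans(is.na(m))
  m <- m[na_frac <= max_na, , drop = FALSE]

  # Row standard deviations, ignoring any remaining NAs
  row_sd <- apply(m, 1, sd, na.rm = TRUE)
  cutoff <- quantile(row_sd, sd_quantile, na.rm = TRUE)

  # Keep rows whose standard deviation is not below the quantile cutoff
  m[row_sd >= cutoff, , drop = FALSE]
}

m <- rbind(
  constant  = rep(0, 8),
  mostly_na = c(1, NA, NA, NA, 2, 3, 4, 5),
  variable1 = c(1, 5, 2, 8, 3, 9, 4, 7),
  variable2 = c(2, 4, 6, 8, 1, 3, 5, 7)
)
kept <- filter_rows_sketch(m)
rownames(kept)
```

In this toy matrix the row with too many NA values and the constant row are dropped, while the two variable rows are kept.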
Adjust outliers
adjust_outlier(x, q = 0.05)
x |
A numeric vector. |
q |
Percentile to adjust. |
Values larger than the 1 - q percentile are adjusted to the 1 - q percentile, and values smaller than the q percentile are adjusted to the q percentile.
A numeric vector with the same length as the original one.
Zuguang Gu <[email protected]>
set.seed(123)
x = rnorm(40)
x[1] = 100
adjust_outlier(x)
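The clamping rule described above can be sketched in base R as follows. This is an illustration only: clamp_to_quantiles is a hypothetical name, and it uses the default quantile definition of stats::quantile, which may not match what adjust_outlier does internally.

```r
# Hypothetical re-implementation for illustration; not cola's source code.
clamp_to_quantiles <- function(x, q = 0.05) {
  bounds <- quantile(x, c(q, 1 - q), na.rm = TRUE)
  # pmax raises values below the lower bound,
  # pmin lowers values above the upper bound
  pmin(pmax(x, bounds[[1]]), bounds[[2]])
}

x <- c(-50, 1:18, 100)
y <- clamp_to_quantiles(x)
range(x)
range(y)
```

Only the two extreme values are pulled towards the bulk of the data; the values in between are left unchanged.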
All leaves in the hierarchy
## S4 method for signature 'HierarchicalPartition'
all_leaves(object, merge_node = merge_node_param())
object |
A |
merge_node |
Parameters to merge sub-dendrograms, see |
A vector of node ID.
Zuguang Gu <[email protected]>
data(golub_cola_rh)
all_leaves(golub_cola_rh)
All nodes in the hierarchy
## S4 method for signature 'HierarchicalPartition'
all_nodes(object, merge_node = merge_node_param())
object |
A |
merge_node |
Parameters to merge sub-dendrograms, see |
A vector of node ID.
Zuguang Gu <[email protected]>
data(golub_cola_rh)
all_nodes(golub_cola_rh)
All supported partitioning methods
all_partition_methods()
New partitioning methods can be registered by register_partition_methods.
A vector of supported partitioning methods.
Zuguang Gu <[email protected]>
all_partition_methods()
All supported top-value methods
all_top_value_methods()
New top-value methods can be registered by register_top_value_methods.
A vector of supported top-value methods.
Zuguang Gu <[email protected]>
all_top_value_methods()
Adapted PAC scores
aPAC(consensus_mat)
consensus_mat |
A consensus matrix. |
A consensus value x is transformed to 1 - x if x < 0.5. After the transformation, for any pair of samples in the consensus matrix, if they are always in the same group or always in different groups, the value x is 1 in both cases. Thus, if the consensus matrix shows stable partitions, the values x will all be close to 1. Reflected in the CDF of x, the curve is shifted to the right and the area under the CDF curve should be very small.
An aPAC value less than 0.05 is considered to indicate a stable partition, which can be interpreted as the proportion of ambiguous partitioning being less than 0.05.
A numeric value.
data(golub_cola)
aPAC(get_consensus(golub_cola[1, 1], k = 2))
aPAC(get_consensus(golub_cola[1, 1], k = 3))
aPAC(get_consensus(golub_cola[1, 1], k = 4))
aPAC(get_consensus(golub_cola[1, 1], k = 5))
aPAC(get_consensus(golub_cola[1, 1], k = 6))
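To make the folding step and the "area under the CDF" more concrete, here is a minimal base R sketch. It is an assumption-based illustration, not cola's implementation: apac_sketch is a hypothetical name, the empirical CDF is integrated over [0.5, 1] without any rescaling, and whether aPAC normalizes the area differently is not stated in this page.

```r
# Hypothetical sketch of the idea described above; aPAC() may differ in detail.
apac_sketch <- function(consensus_mat) {
  # Use each pair of samples once
  x <- consensus_mat[lower.tri(consensus_mat)]
  # Fold values below 0.5 so that "always apart" also counts as stable
  x <- ifelse(x < 0.5, 1 - x, x)

  # Integrate the empirical CDF (a step function) over [0.5, 1]
  Fn <- ecdf(x)
  brk <- sort(unique(c(0.5, x, 1)))
  sum(Fn(brk[-length(brk)]) * diff(brk))
}

# Two perfectly separated groups of two samples each
stable <- matrix(c(1, 1, 0, 0,
                   1, 1, 0, 0,
                   0, 0, 1, 1,
                   0, 0, 1, 1), nrow = 4, byrow = TRUE)

# Same layout, but samples from different groups co-cluster in 40% of runs
ambiguous <- stable
ambiguous[stable == 0] <- 0.4

apac_sketch(stable)
apac_sketch(ambiguous)
```

The perfectly stable matrix gives an area of zero, and the area grows as more consensus values move away from 0 and 1.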
Ability to correlate to other rows
ATC(mat, cor_fun = stats::cor, min_cor = 0, power = 1, k_neighbours = -1, group = NULL, mc.cores = 1, cores = mc.cores, ...)
mat |
A numeric matrix. ATC score is calculated by rows. |
cor_fun |
A function which calculates correlations. |
min_cor |
Cutoff for the minimal absolute correlation. |
power |
Power on the correlation values. |
k_neighbours |
Nearest k neighbours. |
mc.cores |
Number of cores. This argument will be removed in future versions. |
cores |
Number of cores. |
group |
A categorical variable. If it is specified, the correlation is only calculated for the rows in the same group as current row. |
... |
Pass to |
For a given row in a matrix, the ATC score is the area above the curve of the cumulative distribution function of the absolute correlation to all other rows. Formally, if F_i(X) is the cumulative distribution function of X, where X is the absolute correlation for row i raised to the power given by the power argument (i.e. x = cor^power), then ATC_i = 1 - \int_{min_cor}^1 F_i(X) dX.
By default the ATC scores are calculated with Pearson correlation. To use Spearman correlation, you can register a new top-value method by:
register_top_value_methods(
    "ATC_spearman" = function(m) ATC(m, method = "spearman")
)
Similarly, to use a robust correlation method, e.g. the bicor function, you can do:
register_top_value_methods(
    "ATC_bicor" = function(m) ATC(m, cor_fun = WGCNA::bicor)
)
If the number of rows exceeds 30000, it internally uses ATC_approx.
A vector of numeric values with the same order as rows in the input matrix.
Zuguang Gu <[email protected]>
https://jokergoo.github.io/cola_supplementary/suppl_1_ATC/suppl_1_ATC.html
set.seed(12345)
nr1 = 100
mat1 = matrix(rnorm(100*nr1), nrow = nr1)

nr2 = 10
require(mvtnorm)
sigma = matrix(0.8, nrow = nr2, ncol = nr2); diag(sigma) = 1
mat2 = t(rmvnorm(100, mean = rep(0, nr2), sigma = sigma))

nr3 = 50
sigma = matrix(0.5, nrow = nr3, ncol = nr3); diag(sigma) = 1
mat3 = t(rmvnorm(100, mean = rep(0, nr3), sigma = sigma))

mat = rbind(mat1, mat2, mat3)
ATC_score = ATC(mat)
plot(ATC_score, pch = 16, col = c(rep(1, nr1), rep(2, nr2), rep(3, nr3)))
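The formula in the details can also be worked through by hand on a tiny matrix. The base R sketch below follows the definition literally (empirical CDF of the absolute correlations to all other rows, integrated from min_cor to 1). It is not cola's implementation: atc_sketch is a hypothetical name, and arguments such as k_neighbours and group are not modelled.

```r
# Hypothetical sketch of the definition above; not cola's implementation.
atc_sketch <- function(mat, min_cor = 0, power = 1) {
  # Absolute row-by-row correlations, raised to `power` and capped at 1
  cm <- pmin(abs(cor(t(mat)))^power, 1)
  vapply(seq_len(nrow(mat)), function(i) {
    x <- cm[i, -i]  # correlations of row i to all other rows
    Fn <- ecdf(x)
    # Breakpoints of the step function F_i on [min_cor, 1]
    brk <- sort(unique(c(min_cor, x[x > min_cor & x < 1], 1)))
    1 - sum(Fn(brk[-length(brk)]) * diff(brk))
  }, numeric(1))
}

mat <- rbind(
  c(1, 2, 3, 4),
  c(2, 4, 6, 8),   # perfectly correlated with row 1
  c(4, 3, 2, 1),   # perfectly anti-correlated with row 1
  c(1, -1, -1, 1)  # uncorrelated with rows 1 to 3
)
atc_sketch(mat)
```

Rows 1 to 3 each have two perfectly (anti-)correlated partners and one uncorrelated partner, so they receive a much higher score than row 4, which correlates with no other row.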
Ability to correlate to other rows - an approximated method
ATC_approx(mat, cor_fun = stats::cor, min_cor = 0, power = 1, k_neighbours = -1, mc.cores = 1, cores = mc.cores, n_sampling = c(1000, 500), group = NULL, ...)
mat |
A numeric matrix. ATC score is calculated by rows. |
cor_fun |
A function which calculates correlations on matrix rows. |
min_cor |
Cutoff for the minimal absolute correlation. |
power |
Power on the correlation values. |
k_neighbours |
Nearest k neighbours. Note that when this argument is set, there is no subset sampling for calculating correlations, which means correlations to all other rows are calculated. |
mc.cores |
Number of cores. This argument will be removed in future versions. |
cores |
Number of cores. |
n_sampling |
When there are too many rows in the matrix, we do not need to use all the rows to get the cumulative distribution of how one row correlates with other rows, e.g. 1000 rows can already give a very good estimate. |
group |
A categorical variable. If it is specified, the correlation is only calculated for the rows in the same group as current row. |
... |
Pass to |
For a matrix with a huge number of rows, it is not feasible to calculate correlations to all other rows, thus the correlation is only calculated for a randomly sampled subset of other rows.
For matrices with a small number of rows, ATC should be used, which calculates the "exact" ATC value, but the values of ATC and ATC_approx should be very similar.
# There is no example
NULL
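The subsampling idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: atc_approx_sketch is a hypothetical name, a single n_sampling value is used (the two-element default of ATC_approx is not modelled), and it relies on the fact that with min_cor = 0 and power = 1 the area above the CDF of values in [0, 1] equals their mean.

```r
# Hypothetical sketch of subsampled ATC scores; not cola's implementation.
atc_approx_sketch <- function(mat, n_sampling = 1000) {
  n <- nrow(mat)
  vapply(seq_len(n), function(i) {
    others <- setdiff(seq_len(n), i)
    # Only subsample when there are more candidate rows than requested
    if (length(others) > n_sampling) others <- sample(others, n_sampling)
    x <- abs(cor(mat[i, ], t(mat[others, , drop = FALSE])))
    # With min_cor = 0 and power = 1, the score reduces to the mean
    mean(pmin(x, 1))
  }, numeric(1))
}

mat <- rbind(
  c(1, 2, 3, 4),
  c(2, 4, 6, 8),
  c(4, 3, 2, 1),
  c(1, -1, -1, 1)
)
exact   <- atc_approx_sketch(mat)                  # uses all other rows
sampled <- atc_approx_sketch(mat, n_sampling = 2)  # uses 2 random other rows
exact
sampled
```

When n_sampling is at least the number of other rows, no sampling takes place and the sketch returns the exact scores; with fewer sampled rows the scores vary from call to call.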
A bottle of cola
cola()
Simply serve you a bottle of cola.
The ASCII art is from http://ascii.co.uk/art/coke .
No value is returned.
Zuguang Gu <[email protected]>
for(i in 1:10) cola()
Global parameters
cola_opt(..., RESET = FALSE, READ.ONLY = NULL, LOCAL = FALSE, ADD = FALSE)
... |
Arguments for the parameters, see "details" section. |
RESET |
Whether to reset to default values. |
READ.ONLY |
Please ignore. |
LOCAL |
Please ignore. |
ADD |
Please ignore. |
There are the following global parameters:
group_diff
Used in get_signatures,ConsensusPartition-method to globally control the minimal difference between subgroups.
fdr_cutoff
Used in get_signatures,ConsensusPartition-method to globally control the cutoff of FDR for the differential signature tests.
color_set_2
Colors for the predicted subgroups.
help
Whether to print help messages.
message
Whether to print messages.
cola_opt
cola_opt$group_diff = 0.2 # e.g. for methylation datasets
cola_opt$fdr_cutoff = 0.1 # e.g. for methylation datasets
cola_opt
cola_opt(RESET = TRUE)
Make HTML report from the ConsensusPartition object
## S4 method for signature 'ConsensusPartition'
cola_report(object, output_dir = getwd(), title = qq("cola Report for Consensus Partitioning (@{object@top_value_method}:@{object@partition_method})"), env = parent.frame())
object |
A |
output_dir |
The output directory where the report is saved. |
title |
Title of the report. |
env |
Where the objects in the report are found, internally used. |
It generates a report for a specific combination of top-value method and partitioning method.
No value is returned.
Zuguang Gu <[email protected]>
cola_report,ConsensusPartitionList-method
# There is no example
NULL
Make HTML report from the ConsensusPartitionList object
## S4 method for signature 'ConsensusPartitionList'
cola_report(object, output_dir = getwd(), mc.cores = 1, cores = mc.cores, title = "cola Report for Consensus Partitioning", env = parent.frame())
object |
A |
output_dir |
The output directory where the report is saved. |
mc.cores |
Multiple cores to use. This argument will be removed in future versions. |
cores |
Number of cores, or a |
title |
Title of the report. |
env |
Where the objects in the report are found, internally used. |
The ConsensusPartitionList-class object contains results for all combinations of top-value methods and partitioning methods.
This function generates an HTML report which contains all plots and tables for every combination of methods.
The report generation may take a while because it generates a large number of heatmaps.
Examples of reports can be found at https://jokergoo.github.io/cola_collection/ .
No value is returned.
Zuguang Gu <[email protected]>
if(FALSE) {
# the following code is runnable
data(golub_cola)
cola_report(golub_cola[c("SD", "MAD"), c("hclust", "skmeans")], output_dir = "~/test_cola_cl_report")
}
Method dispatch page for cola_report.
cola_report can be dispatched on the following classes:
cola_report,HierarchicalPartition-method, HierarchicalPartition-class class method
cola_report,ConsensusPartition-method, ConsensusPartition-class class method
cola_report,ConsensusPartitionList-method, ConsensusPartitionList-class class method
# no example
NULL
Make HTML report from the HierarchicalPartition object
## S4 method for signature 'HierarchicalPartition'
cola_report(object, output_dir = getwd(), mc.cores = 1, cores = mc.cores, title = qq("cola Report for Hierarchical Partitioning"), env = parent.frame())
object |
A |
output_dir |
The output directory where the report is put. |
mc.cores |
Multiple cores to use. This argument will be removed in future versions. |
cores |
Number of cores, or a |
title |
Title of the report. |
env |
Where the objects in the report are found, internally used. |
This function generates an HTML report which contains all plots for all nodes in the partition hierarchy.
No value is returned.
Zuguang Gu <[email protected]>
if(FALSE) {
# the following code is runnable
data(golub_cola_rh)
cola_report(golub_cola_rh, output_dir = "~/test_cola_rh_report")
}
Example ConsensusPartitionList object
data(cola_rl)
The following code was used to generate cola_rl:
set.seed(123)
m = cbind(rbind(matrix(rnorm(20*20, mean = 1, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.5), nr = 20)),
          rbind(matrix(rnorm(20*20, mean = 0, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 1, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.5), nr = 20)),
          rbind(matrix(rnorm(20*20, mean = 0.5, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 0.5, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 1, sd = 0.5), nr = 20))
         ) + matrix(rnorm(60*60, sd = 0.5), nr = 60)
cola_rl = run_all_consensus_partition_methods(data = m, cores = 6)
Zuguang Gu <[email protected]>
data(cola_rl)
cola_rl
Collect subgroups from ConsensusPartition object
## S4 method for signature 'ConsensusPartition'
collect_classes(object, internal = FALSE, show_row_names = FALSE, row_names_gp = gpar(fontsize = 8), anno = object@anno, anno_col = object@anno_col)
object |
A |
internal |
Used internally. |
show_row_names |
Whether to show row names in the heatmap (which are the column names in the original matrix). |
row_names_gp |
Graphics parameters for row names. |
anno |
A data frame of annotations for the original matrix columns. By default it uses the annotations specified in |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
The percent membership matrix and the subgroup labels for each k are plotted in the heatmaps.
The same row in all heatmaps corresponds to the same column in the original matrix.
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola)
collect_classes(golub_cola["ATC", "skmeans"])
Collect classes from ConsensusPartitionList object
## S4 method for signature 'ConsensusPartitionList'
collect_classes(object, k, show_column_names = FALSE, column_names_gp = gpar(fontsize = 8), anno = get_anno(object), anno_col = get_anno_col(object), simplify = FALSE, ...)
object |
A |
k |
Number of subgroups. |
show_column_names |
Whether to show column names in the heatmap (which are the column names in the original matrix). |
column_names_gp |
Graphics parameters for column names. |
anno |
A data frame of annotations for the original matrix columns. By default it uses the annotations specified in |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
simplify |
Internally used. |
... |
Pass to |
There are the following panels in the plot:
a heatmap showing the partitions predicted by all methods, where the top annotation is the consensus partition summarized from the partitions of all methods, weighted by the mean silhouette scores of each method.
a row barplot annotation showing the mean silhouette scores for different methods.
The row clustering is applied on the dissimilarity matrix calculated by cl_dissimilarity with the comembership method.
The brightness of the color corresponds to the silhouette scores for the consensus partition in each method.
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola)
collect_classes(golub_cola, k = 3)
Method dispatch page for collect_classes.
collect_classes can be dispatched on the following classes:
collect_classes,HierarchicalPartition-method, HierarchicalPartition-class class method
collect_classes,ConsensusPartitionList-method, ConsensusPartitionList-class class method
collect_classes,ConsensusPartition-method, ConsensusPartition-class class method
# no example
NULL
Collect classes from HierarchicalPartition object
## S4 method for signature 'HierarchicalPartition'
collect_classes(object, merge_node = merge_node_param(), show_row_names = FALSE, row_names_gp = gpar(fontsize = 8), anno = get_anno(object[1]), anno_col = get_anno_col(object[1]), ...)
object |
A |
merge_node |
Parameters to merge sub-dendrograms, see |
show_row_names |
Whether to show the row names. |
row_names_gp |
Graphic parameters for row names. |
anno |
A data frame of annotations for the original matrix columns. By default it uses the annotations specified in |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
... |
Other arguments. |
The function plots the hierarchy of the classes.
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola_rh)
collect_classes(golub_cola_rh)
collect_classes(golub_cola_rh, merge_node = merge_node_param(depth = 2))
Collect plots from ConsensusPartition object
## S4 method for signature 'ConsensusPartition'
collect_plots(object, verbose = TRUE)
object |
A |
verbose |
Whether to print messages. |
Plots by plot_ecdf, collect_classes,ConsensusPartition-method, consensus_heatmap, membership_heatmap and get_signatures are arranged in one single page, for all available k.
No value is returned.
Zuguang Gu <[email protected]>
collect_plots,ConsensusPartitionList-method collects plots for the ConsensusPartitionList-class object.
data(golub_cola)
collect_plots(golub_cola["ATC", "skmeans"])
Collect plots from ConsensusPartitionList object
## S4 method for signature 'ConsensusPartitionList'
collect_plots(object, k = 2, fun = consensus_heatmap, top_value_method = object@top_value_method, partition_method = object@partition_method, verbose = TRUE, mc.cores = 1, cores = mc.cores, ...)
object |
A |
k |
Number of subgroups. |
fun |
Function used to generate plots. Valid functions are |
top_value_method |
A vector of top-value methods. |
partition_method |
A vector of partitioning methods. |
verbose |
Whether to print message. |
mc.cores |
Number of cores. This argument will be removed in future versions. |
cores |
Number of cores, or a |
... |
other Arguments passed to corresponding |
Plots for all combinations of top-value methods and partitioning methods are arranged in one single page.
This function makes it easy to directly compare results from multiple methods.
No value is returned.
Zuguang Gu <[email protected]>
collect_plots,ConsensusPartition-method collects plots for a single ConsensusPartition-class object.
data(cola_rl)
collect_plots(cola_rl, k = 3)
collect_plots(cola_rl, k = 3, fun = membership_heatmap)
collect_plots(cola_rl, k = 3, fun = get_signatures)
Method dispatch page for collect_plots.
collect_plots can be dispatched on the following classes:
collect_plots,ConsensusPartition-method, ConsensusPartition-class class method
collect_plots,ConsensusPartitionList-method, ConsensusPartitionList-class class method
# no example
NULL
Draw and compare statistics for a single method
## S4 method for signature 'ConsensusPartition'
collect_stats(object, ...)
object |
A |
... |
Other arguments. |
It is identical to select_partition_number,ConsensusPartition-method.
# There is no example
NULL
Draw and compare statistics for multiple methods
## S4 method for signature 'ConsensusPartitionList'
collect_stats(object, k, layout_nrow = 2, all_stats = FALSE, ...)
object |
A |
k |
Number of subgroups. |
layout_nrow |
Number of rows in the layout |
all_stats |
Whether to show all statistics that were calculated. Used internally. |
... |
Other arguments |
It draws heatmaps of statistics for multiple methods in parallel, so that users can compare which combination of methods gives the best results for a given number of subgroups.
data(golub_cola)
collect_stats(golub_cola, k = 3)
Method dispatch page for collect_stats.
collect_stats can be dispatched on the following classes:
collect_stats,ConsensusPartitionList-method, ConsensusPartitionList-class class method
collect_stats,ConsensusPartition-method, ConsensusPartition-class class method
# no example
NULL
Column names of the matrix
## S4 method for signature 'ConsensusPartition'
colnames(x)
x |
A |
# There is no example
NULL
Column names of the matrix
## S4 method for signature 'ConsensusPartitionList'
colnames(x)
x |
A |
# There is no example
NULL
Method dispatch page for colnames.
colnames can be dispatched on the following classes:
colnames,ConsensusPartition-method, ConsensusPartition-class class method
colnames,ConsensusPartitionList-method, ConsensusPartitionList-class class method
colnames,DownSamplingConsensusPartition-method, DownSamplingConsensusPartition-class class method
colnames,HierarchicalPartition-method, HierarchicalPartition-class class method
# no example
NULL
Column names of the matrix
## S4 method for signature 'DownSamplingConsensusPartition'
colnames(x)
x |
A |
# There is no example
NULL
Column names of the matrix
## S4 method for signature 'HierarchicalPartition'
colnames(x)
x |
A |
# There is no example
NULL
Compare two partitionings
## S4 method for signature 'ConsensusPartition'
compare_partitions(object, object2, output_file, k1 = 2, k2 = 2, dimension_reduction_method = "UMAP", id_mapping = guess_id_mapping(rownames(object), "org.Hs.eg.db", FALSE), row_km1 = ifelse(k1 == 2, 2, 1), row_km2 = ifelse(k1 == 2 && k2 == 2, 2, 1), row_km3 = ifelse(k2 == 2, 2, 1))
object |
A |
object2 |
A |
output_file |
The path of the output HTML file. If it is not specified, the report will be opened in the web browser. |
k1 |
Number of subgroups in |
k2 |
Number of subgroups in |
dimension_reduction_method |
Which dimension reduction method to use. |
id_mapping |
|
row_km1 |
Number of k-means groups, see Details. |
row_km2 |
Number of k-means groups, see Details. |
row_km3 |
Number of k-means groups, see Details. |
The function produces an HTML report which includes comparisons between two partitioning results.
In the report, there are three heatmaps which visualize A) the signature genes specific to the first partitioning, B) the signature genes shared by both partitionings and C) the signature genes specific to the second partitioning. Arguments row_km1, row_km2 and row_km3 control how many k-means groups should be applied on the three heatmaps.
## Not run: 
data(golub_cola)
require(hu6800.db)
x = hu6800ENTREZID
mapped_probes = mappedkeys(x)
id_mapping = unlist(as.list(x[mapped_probes]))
compare_partitions(golub_cola["ATC:skmeans"], golub_cola["SD:kmeans"], id_mapping = id_mapping)
## End(Not run)
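The three groups of signature genes shown in heatmaps A, B and C correspond to simple set operations on the two signature lists. The base R sketch below only illustrates that split: the gene IDs are made up, and how compare_partitions obtains the signatures internally is not shown here.

```r
# Hypothetical signature gene IDs for two partitionings (made-up values)
sig1 <- c("g1", "g2", "g3", "g4")
sig2 <- c("g3", "g4", "g5")

groups <- list(
  specific_to_first  = setdiff(sig1, sig2),    # heatmap A
  common             = intersect(sig1, sig2),  # heatmap B
  specific_to_second = setdiff(sig2, sig1)     # heatmap C
)
groups
```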
Compare Signatures from Different k
## S4 method for signature 'ConsensusPartition'
compare_signatures(object, k = object@k, verbose = interactive(), ...)
object |
A |
k |
Number of subgroups. Value should be a vector. |
verbose |
Whether to print message. |
... |
Other arguments passed to |
It plots an Euler diagram showing the overlap of signatures from different k.
data(golub_cola)
res = golub_cola["ATC", "skmeans"]
compare_signatures(res)
Method dispatch page for compare_signatures.
compare_signatures can be dispatched on the following classes:
compare_signatures,HierarchicalPartition-method, HierarchicalPartition-class class method
compare_signatures,ConsensusPartition-method, ConsensusPartition-class class method
# no example
NULL
Compare Signatures from Different Nodes
## S4 method for signature 'HierarchicalPartition'
compare_signatures(object, merge_node = merge_node_param(), method = c("euler", "upset"), upset_max_comb_sets = 20, verbose = interactive(), ...)
object |
A |
merge_node |
Parameters to merge sub-dendrograms, see |
method |
Method to visualize. |
upset_max_comb_sets |
Maximal number of combination sets to show. |
verbose |
Whether to print messages. |
... |
Other arguments passed to |
It plots an Euler diagram or an UpSet plot showing the overlap of signatures from different nodes.
On each node, the number of subgroups is inferred by suggest_best_k,ConsensusPartition-method
.
data(golub_cola_rh)
compare_signatures(golub_cola_rh)
Concordance to the consensus partition
concordance(membership_each, class)
membership_each |
A matrix which contains the partitions from every single run, where columns correspond to runs. The object can be obtained from |
class |
Consensus subgroup labels. |
Note subgroup labels in membership_each should already be adjusted to the consensus labels, i.e. by relabel_class.
The concordance score is the mean proportion of samples having the same subgroup labels as the consensus labels among individual partition runs.
A numeric value.
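The definition above can be written out in a few lines of base R. This is a minimal sketch under the stated definition, not the package's implementation; concordance_sketch and the toy labels are hypothetical, and the labels in each run are assumed to be already aligned to the consensus labels.

```r
# Hypothetical helper: mean proportion of samples whose label in a run
# agrees with the consensus label (rows are samples, columns are runs)
concordance_sketch = function(membership_each, class) {
    agreement_per_run = apply(membership_each, 2, function(x) mean(x == class))
    mean(agreement_per_run)
}

# Toy data: 4 samples, 3 partition runs
membership_each = cbind(
    run1 = c(1, 1, 2, 2),
    run2 = c(1, 2, 2, 2),
    run3 = c(1, 1, 2, 1)
)
consensus_classes = c(1, 1, 2, 2)

score = concordance_sketch(membership_each, consensus_classes)
score
```

Each run contributes the fraction of samples that match the consensus, and the score is the average of those fractions over runs.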
Zuguang Gu <[email protected]>
data(golub_cola)
membership_each = get_membership(golub_cola["SD", "kmeans"], each = TRUE, k = 3)
consensus_classes = get_classes(golub_cola["SD", "kmeans"], k = 3)$class
concordance(membership_each, consensus_classes)
Adjust parameters for default ATC method
config_ATC(cor_fun = stats::cor, min_cor = 0, power = 1, k_neighbours = -1, group = NULL, cores = 1, ...)
cor_fun |
A function that calculates correlations from a matrix (on matrix rows). |
min_cor |
Cutoff for the minimal absolute correlation. |
power |
Power on the correlation values. |
k_neighbours |
Number of the closest neighbours to use. |
group |
A categorical variable. |
cores |
Number of cores. |
... |
Other arguments passed to |
This function changes the default parameters for the ATC method. All arguments of this function are passed to ATC.
# use Spearman correlation
config_ATC(cor_fun = function(m) stats::cor(m, method = "spearman"))
# use knn
config_ATC(k_neighbours = 100)
Heatmap of the consensus matrix
## S4 method for signature 'ConsensusPartition'
consensus_heatmap(object, k, internal = FALSE,
    anno = object@anno, anno_col = get_anno_col(object),
    show_row_names = FALSE, show_column_names = FALSE,
    row_names_gp = gpar(fontsize = 8), simplify = FALSE, ...)
object |
A ConsensusPartition-class object. |
k |
Number of subgroups. |
internal |
Used internally. |
anno |
A data frame of annotations for the original matrix columns. By default it uses the annotations specified in |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
show_row_names |
Whether to plot row names on the consensus heatmap (which are the column names in the original matrix). |
show_column_names |
Whether to show column names. |
row_names_gp |
Graphics parameters for row names. |
simplify |
Internally used. |
... |
other arguments. |
For row i and column j in the consensus matrix, the value x_ij is the probability that sample i and sample j are in the same group across all partitions.
The following heatmaps are drawn from left to right:
probability of the sample to stay in the corresponding group,
silhouette scores, which measure the distance of an item to the second closest subgroup,
predicted subgroups,
consensus matrix,
more annotations if provided as anno.
Note that since the consensus subgroups are already known from the consensus partition, only rows or columns within each group are clustered in the heatmap.
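The consensus matrix described above can be sketched in base R as the fraction of partition runs in which two samples share a label. This is an illustrative sketch only: consensus_matrix_sketch and the toy membership matrix are hypothetical, and every sample is assumed to receive a label in every run.

```r
# Hypothetical helper: x_ij = proportion of runs where samples i and j
# fall into the same group (rows are samples, columns are runs)
consensus_matrix_sketch = function(membership_each) {
    n = nrow(membership_each)
    mat = matrix(0, nrow = n, ncol = n)
    for(r in seq_len(ncol(membership_each))) {
        label = membership_each[, r]
        mat = mat + outer(label, label, "==")
    }
    mat / ncol(membership_each)
}

# Toy data: 4 samples, 3 partition runs
membership_each = cbind(
    run1 = c(1, 1, 2, 2),
    run2 = c(1, 2, 2, 2),
    run3 = c(1, 1, 2, 1)
)
cm = consensus_matrix_sketch(membership_each)
round(cm, 2)
```

Values near 1 or 0 indicate sample pairs that are consistently together or consistently apart, which is what makes the blocks in the heatmap look sharp.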
No value is returned.
Zuguang Gu <[email protected]>
membership_heatmap,ConsensusPartition-method
data(golub_cola)
consensus_heatmap(golub_cola["ATC", "skmeans"], k = 3)
Consensus partition
consensus_partition(data, top_value_method = "ATC", top_n = NULL, partition_method = "skmeans", max_k = 6, k = NULL, sample_by = "row", p_sampling = 0.8, partition_repeat = 50, partition_param = list(), anno = NULL, anno_col = NULL, scale_rows = NULL, verbose = TRUE, mc.cores = 1, cores = mc.cores, prefix = "", .env = NULL, help = cola_opt$help)
data |
A numeric matrix where subgroups are found by columns. |
top_value_method |
A single top-value method. Available methods are in |
top_n |
Number of rows with top values. The value can be a vector with length > 1. When n > 5000, the function randomly samples only 5000 rows from the top n rows. If |
partition_method |
A single partitioning method. Available methods are in |
max_k |
Maximal number of subgroups to try. The function will try for |
k |
Alternatively, you can specify a vector k. |
sample_by |
Should randomly sample the matrix by rows or by columns? |
p_sampling |
Proportion of the submatrix which contains the top n rows to sample. |
partition_repeat |
Number of repeats for the random sampling. |
partition_param |
Parameters for the partition method which are passed to |
anno |
A data frame with known annotation of samples. The annotations will be plotted in heatmaps and the correlation to predicted subgroups will be tested. |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
scale_rows |
Whether to scale rows. If it is |
verbose |
Whether to print messages. |
mc.cores |
Multiple cores to use. This argument will be removed in future versions. |
cores |
Number of cores, or a |
prefix |
Internally used. |
.env |
An environment, internally used. |
help |
Whether to print help messages. |
The function performs the analysis in the following steps:
calculate scores for rows by the top-value method,
for each top_n value, take the top n rows,
randomly sample a proportion p_sampling of rows from the top_n-row matrix and perform partitioning, repeated partition_repeat times,
collect the partitions from all individual runs and summarize a consensus partition.
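The steps above can be sketched with base R alone. This is a rough illustration under loud assumptions, not what consensus_partition() runs internally: row standard deviation stands in for the top-value method, stats::kmeans stands in for the partitioning method, hierarchical clustering on the consensus matrix stands in for the consensus summary, and only a single top_n and a single k are tried. The toy matrix is simulated.

```r
set.seed(1)
# Toy matrix: 100 rows, 20 columns; columns 1-10 differ from 11-20 in rows 1-30
m = matrix(rnorm(100 * 20), nrow = 100)
m[1:30, 1:10] = m[1:30, 1:10] + 4

top_n = 30
p_sampling = 0.8
partition_repeat = 20
k = 2

# Step 1: score rows (assumption: SD as the top-value method)
score = apply(m, 1, sd)

# Step 2: take the top n rows
top_mat = m[order(score, decreasing = TRUE)[seq_len(top_n)], ]

# Step 3: sample a proportion of rows and partition the columns, repeatedly
membership_each = sapply(seq_len(partition_repeat), function(i) {
    ind = sample(top_n, round(top_n * p_sampling))
    kmeans(t(top_mat[ind, ]), centers = k, nstart = 10)$cluster
})

# Step 4: summarize the runs into a consensus matrix and a consensus partition
consensus = matrix(0, ncol(m), ncol(m))
for(i in seq_len(partition_repeat)) {
    consensus = consensus + outer(membership_each[, i], membership_each[, i], "==")
}
consensus = consensus / partition_repeat
consensus_class = cutree(hclust(as.dist(1 - consensus)), k = k)
table(consensus_class)
```

Working on co-membership rather than raw labels avoids having to align label numbers between runs, which is why the consensus matrix is a convenient intermediate in this sketch.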
A ConsensusPartition-class object. Simply type the object in an interactive R session to see which functions can be applied to it.
Zuguang Gu <[email protected]>
run_all_consensus_partition_methods runs consensus partitioning with multiple top-value methods and multiple partitioning methods.
set.seed(123)
m = cbind(rbind(matrix(rnorm(20*20, mean = 1, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.5), nr = 20)),
          rbind(matrix(rnorm(20*20, mean = 0, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 1, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.5), nr = 20)),
          rbind(matrix(rnorm(20*20, mean = 0.5, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 0.5, sd = 0.5), nr = 20),
                matrix(rnorm(20*20, mean = 1, sd = 0.5), nr = 20))
         ) + matrix(rnorm(60*60, sd = 0.5), nr = 60)
res = consensus_partition(m, partition_repeat = 10, top_n = c(10, 20, 50))
res
Consensus partitioning only with a subset of columns
consensus_partition_by_down_sampling(data, top_value_method = "ATC", top_n = NULL, partition_method = "skmeans", max_k = 6, k = NULL, subset = min(round(ncol(data)*0.2), 250), pre_select = TRUE, verbose = TRUE, prefix = "", anno = NULL, anno_col = NULL, predict_method = "centroid", dist_method = c("euclidean", "correlation", "cosine"), .env = NULL, .predict = TRUE, mc.cores = 1, cores = mc.cores, ...)
data |
A numeric matrix where subgroups are found by columns. |
top_value_method |
A single top-value method. Available methods are in |
top_n |
Number of rows with top values. The value can be a vector with length > 1. When n > 5000, the function only randomly sample 5000 rows from top n rows. If |
partition_method |
A single partitioning method. Available methods are in |
max_k |
Maximal number of subgroups to try. The function will try for |
k |
Alternatively, you can specify a vector k. |
subset |
Number of columns to randomly sample, or a vector of selected indices. |
pre_select |
Whether to pre-select by k-means. |
verbose |
Whether to print messages. |
prefix |
Internally used. |
anno |
Annotation data frame. |
anno_col |
Annotation colors. |
predict_method |
Method for predicting class labels. Possible values are "centroid", "svm" and "randomForest". |
dist_method |
Method for predicting the class of the other columns. |
.env |
An environment, internally used. |
.predict |
Internally used. |
mc.cores |
Number of cores. This argument will be removed in future versions. |
cores |
Number of cores, or a |
... |
All pass to |
The function performs consensus partitioning only with a small subset of columns, and the classes of the other columns are predicted by predict_classes,ConsensusPartition-method.
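The idea of predicting the remaining columns can be illustrated with a nearest-centroid rule in base R. This is only a sketch under assumptions: the subset labels are given by hand instead of coming from consensus partitioning, Euclidean distance is used, and no p-values are computed, so it should not be read as the behaviour of predict_classes().

```r
set.seed(1)
# Toy matrix: 50 rows, 40 columns; columns 21-40 are shifted in rows 1-25
m = matrix(rnorm(50 * 40), nrow = 50)
m[1:25, 21:40] = m[1:25, 21:40] + 3

# Hypothetical down-sampled columns and their (pretend) consensus classes
subset_ind = c(1:5, 21:25)
class_subset = rep(c(1, 2), each = 5)

# Class centroids computed from the subset only (one column per class)
centroids = sapply(1:2, function(k) {
    rowMeans(m[, subset_ind[class_subset == k], drop = FALSE])
})

# Assign every other column to the closest centroid (Euclidean distance)
others = setdiff(seq_len(ncol(m)), subset_ind)
pred = apply(m[, others, drop = FALSE], 2, function(x) {
    which.min(colSums((centroids - x)^2))
})
table(pred)
```

Only the subset goes through the expensive partitioning step; the other columns need just one distance computation per class, which is where the saving for large data sets comes from.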
## Not run:
data(golub_cola)
m = get_matrix(golub_cola)
set.seed(123)
golub_cola_ds = consensus_partition_by_down_sampling(m, subset = 50,
    anno = get_anno(golub_cola), anno_col = get_anno_col(golub_cola),
    top_value_method = "SD", partition_method = "kmeans")
## End(Not run)
The ConsensusPartition class
The ConsensusPartition-class has the following methods:
consensus_partition: constructor method, run consensus partitioning with a specified top-value method and a partitioning method.
select_partition_number,ConsensusPartition-method: make a list of plots for selecting the optimized number of subgroups.
consensus_heatmap,ConsensusPartition-method: make a heatmap of the consensus matrix.
membership_heatmap,ConsensusPartition-method: make a heatmap of the membership for individual partitions.
get_signatures,ConsensusPartition-method: get the signature rows and make a heatmap.
dimension_reduction,ConsensusPartition-method: make dimension reduction plots.
collect_plots,ConsensusPartition-method: make heatmaps for the consensus matrix and membership matrix with different numbers of subgroups.
collect_classes,ConsensusPartition-method: make a heatmap of subgroups for different numbers of subgroups.
get_param,ConsensusPartition-method: get parameters for the consensus clustering.
get_matrix,ConsensusPartition-method: get the original matrix.
get_consensus,ConsensusPartition-method: get the consensus matrix.
get_membership,ConsensusPartition-method: get the membership of partitions generated from random samplings.
get_stats,ConsensusPartition-method: get statistics for the consensus partitioning.
get_classes,ConsensusPartition-method: get the consensus subgroup labels and other columns.
suggest_best_k,ConsensusPartition-method: guess the best number of subgroups.
test_to_known_factors,ConsensusPartition-method: test correlation between predicted subgroups and known factors, if available.
cola_report,ConsensusPartition-method: generate an HTML report for the whole analysis.
functional_enrichment,ConsensusPartition-method: perform functional enrichment analysis on significant genes if rows in the matrix can be mapped to genes.
Zuguang Gu <[email protected]>
The ConsensusPartitionList class
The object contains results from all combinations of top-value methods and partitioning methods.
The ConsensusPartitionList-class provides the following methods:
run_all_consensus_partition_methods: constructor method.
top_rows_overlap,ConsensusPartitionList-method: plot the overlaps of top rows under different top-value methods.
top_rows_heatmap,ConsensusPartitionList-method: plot the heatmap of top rows under different top-value methods.
get_classes,ConsensusPartitionList-method: get consensus subgroup labels merged from all methods.
get_matrix,ConsensusPartition-method: get the original matrix.
get_stats,ConsensusPartitionList-method: get statistics for the partition for a specified k.
get_membership,ConsensusPartitionList-method: get the consensus membership matrix summarized from all methods.
suggest_best_k,ConsensusPartitionList-method: guess the best number of subgroups for all methods.
collect_plots,ConsensusPartitionList-method: collect plots from all combinations of top-value methods and partitioning methods, using a chosen plotting function.
collect_classes,ConsensusPartitionList-method: make a plot which contains predicted subgroups from all combinations of top-value methods and partitioning methods.
test_to_known_factors,ConsensusPartitionList-method: test correlation between predicted subgroups and known annotations, if provided.
cola_report,ConsensusPartitionList-method: generate an HTML report for the whole analysis.
functional_enrichment,ConsensusPartitionList-method: perform functional enrichment analysis on significant genes if rows in the matrix can be mapped to genes.
Zuguang Gu <[email protected]>
Correspond between a list of rankings
correspond_between_rankings(lt, top_n = length(lt[[1]]), col = cola_opt$color_set_1[1:length(lt)], ...)
lt |
A list of scores under different metrics. |
top_n |
Top n elements to show the correspondence. |
col |
A vector of colors for |
... |
Pass to |
It makes plots for every pairwise comparison in lt.
No value is returned.
Zuguang Gu <[email protected]>
require(matrixStats)
mat = matrix(runif(1000), ncol = 10)
x1 = rowSds(mat)
x2 = rowMads(mat)
x3 = rowSds(mat)/rowMeans(mat)
correspond_between_rankings(lt = list(SD = x1, MAD = x2, CV = x3),
    top_n = 20, col = c("red", "blue", "green"))
Correspond two rankings
correspond_between_two_rankings(x1, x2, name1, name2, col1 = 2, col2 = 3, top_n = round(0.25*length(x1)), transparency = 0.9, pt_size = unit(1, "mm"), newpage = TRUE, ratio = c(1, 1, 1))
x1 |
A vector of scores calculated by one metric. |
x2 |
A vector of scores calculated by another metric. |
name1 |
Name of the first metric. |
name2 |
Name of the second metric. |
col1 |
Color for the first metric. |
col2 |
Color for the second metric. |
top_n |
Top n elements to show the correspondence. |
transparency |
Transparency of the connecting lines. |
pt_size |
Size of the points, must be a |
newpage |
Whether to plot in a new graphic page. |
ratio |
Ratio of width of the left barplot, connection lines and right barplot. The three values will be scaled to a sum of 1. |
In x1 and x2, the i^th element in both vectors corresponds to the same object (e.g. the same row if they are calculated from a matrix) but with different scores under different metrics.
x1 and x2 are sorted in the left panel and right panel respectively. The top n elements under the corresponding metric are highlighted by vertical colored lines in both panels. The left and right panels are also shown as barplots of the scores in the two metrics. Between the left and right panels, there are lines connecting the same element (e.g. the i^th element in x1 and x2) in the two ordered vectors, so that you can see how the same element is ranked differently by the two metrics.
Under the plot is a simple Venn diagram showing the overlap of the top n elements by the two metrics.
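The quantities behind the plot (ranks of the same element under two metrics and the overlap of the top n elements) can be computed directly in base R. The sketch below is an illustration only; it uses sd and mad from the stats package on simulated data and does not draw anything.

```r
set.seed(123)
mat = matrix(runif(1000), ncol = 10)
x1 = apply(mat, 1, sd)    # scores under the first metric
x2 = apply(mat, 1, mad)   # scores under the second metric
top_n = 20

# Rank of every element under each metric (1 = highest score)
rank1 = rank(-x1)
rank2 = rank(-x2)

# Top n elements per metric and their overlap (the Venn diagram part)
top1 = order(x1, decreasing = TRUE)[seq_len(top_n)]
top2 = order(x2, decreasing = TRUE)[seq_len(top_n)]
shared = intersect(top1, top2)

# How the shared elements are ranked by the two metrics
data.frame(element = shared, rank_SD = rank1[shared], rank_MAD = rank2[shared])
length(shared)
```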
No value is returned.
Zuguang Gu <[email protected]>
correspond_between_rankings draws for more than 2 sets of rankings.
require(matrixStats)
mat = matrix(runif(1000), ncol = 10)
x1 = rowSds(mat)
x2 = rowMads(mat)
correspond_between_two_rankings(x1, x2, name1 = "SD", name2 = "MAD", top_n = 20)
Perform DAVID enrichment analysis
david_enrichment(genes, email, catalog = c("GOTERM_CC_FAT", "GOTERM_BP_FAT", "GOTERM_MF_FAT", "KEGG_PATHWAY"), idtype = "ENSEMBL_GENE_ID", species = "Homo sapiens")
genes |
A vector of gene identifiers. |
email |
The email address that the user registered with the DAVID web service (https://david.ncifcrf.gov/content.jsp?file=WS.html ). |
catalog |
A vector of function catalogs. Valid values should be in |
idtype |
ID types for the input gene list. Valid values should be in |
species |
Full species name if the ID type is not uniquely mapped to one single species. |
This function directly sends the HTTP request to the DAVID web service (https://david.ncifcrf.gov/content.jsp?file=WS.html ) and parses the returned XML. I wrote this function because I had problems with other R packages for DAVID analysis (e.g. RDAVIDWebService, https://bioconductor.org/packages/devel/bioc/html/RDAVIDWebService.html ): the rJava package that RDAVIDWebService depends on cannot be installed on my machine.
Users are encouraged to use more advanced gene set enrichment tools such as clusterProfiler (http://www.bioconductor.org/packages/release/bioc/html/clusterProfiler.html ), or fgsea (http://www.bioconductor.org/packages/release/bioc/html/fgsea.html ).
If you want to run this function multiple times, please set time intervals between runs.
A data frame with functional enrichment results.
Zuguang Gu <[email protected]>
Now cola has a replacement function functional_enrichment to perform enrichment analysis.
Dimension of the matrix
## S3 method for class 'ConsensusPartition'
dim(x)
x |
A ConsensusPartition-class object. |
Dimension of the matrix
## S3 method for class 'ConsensusPartitionList'
dim(x)
x |
A ConsensusPartitionList-class object. |
Dimension of the matrix
## S3 method for class 'DownSamplingConsensusPartition'
dim(x)
x |
A DownSamplingConsensusPartition-class object. |
Dimension of the matrix
## S3 method for class 'HierarchicalPartition'
dim(x)
x |
A HierarchicalPartition-class object. |
Visualize samples (the matrix columns) after dimension reduction
## S4 method for signature 'ConsensusPartition'
dimension_reduction(object, k, top_n = NULL,
    method = c("PCA", "MDS", "t-SNE", "UMAP"),
    control = list(), color_by = NULL,
    internal = FALSE, nr = 5000,
    silhouette_cutoff = 0.5, remove = FALSE,
    scale_rows = object@scale_rows, verbose = TRUE, ...)
object |
A ConsensusPartition-class object. |
k |
Number of subgroups. |
top_n |
Top n rows to use. By default it uses all rows in the original matrix. |
method |
Which method to reduce the dimension of the data. |
color_by |
If annotation table is set, an annotation name can be set here. |
control |
|
internal |
Internally used. |
nr |
If number of matrix rows is larger than this value, random |
silhouette_cutoff |
Cutoff of silhouette score. Data points with values less than it will be mapped with cross symbols. |
remove |
Whether to remove columns which have silhouette scores lower than the cutoff. |
scale_rows |
Whether to perform scaling on matrix rows. |
verbose |
Whether to print messages. |
... |
Pass to |
Locations of the points.
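For the PCA case, the returned locations can be pictured as the first two principal component coordinates of the samples. The base R sketch below is an assumption-laden illustration (simulated matrix, hand-made class labels, stats::prcomp without row scaling) and is not the code path used by dimension_reduction().

```r
set.seed(1)
# Toy matrix: 200 rows, 12 samples (columns); samples 7-12 shifted in rows 1-40
m = matrix(rnorm(200 * 12), nrow = 200)
m[1:40, 7:12] = m[1:40, 7:12] + 3
subgroup = rep(c(1, 2), each = 6)   # hypothetical subgroup labels

# Samples are columns, so the matrix is transposed before PCA
pca = prcomp(t(m), center = TRUE, scale. = FALSE)
loc = pca$x[, 1:2]                  # locations of the points

plot(loc, col = subgroup + 1, pch = 16, xlab = "PC1", ylab = "PC2")
head(loc)
```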
Zuguang Gu <[email protected]>
data(golub_cola)
dimension_reduction(golub_cola["ATC", "skmeans"], k = 3)
Method dispatch page for dimension_reduction.
dimension_reduction can be dispatched on the following classes:
dimension_reduction,ConsensusPartition-method, ConsensusPartition-class class method
dimension_reduction,DownSamplingConsensusPartition-method, DownSamplingConsensusPartition-class class method
dimension_reduction,HierarchicalPartition-method, HierarchicalPartition-class class method
dimension_reduction,matrix-method, matrix-class class method
Visualize samples (the matrix columns) after dimension reduction
## S4 method for signature 'DownSamplingConsensusPartition'
dimension_reduction(object, k, top_n = NULL,
    method = c("PCA", "MDS", "t-SNE", "UMAP"),
    control = list(), color_by = NULL,
    internal = FALSE, nr = 5000,
    p_cutoff = 0.05, remove = FALSE,
    scale_rows = TRUE, verbose = TRUE, ...)
object |
A DownSamplingConsensusPartition-class object. |
k |
Number of subgroups. |
top_n |
Top n rows to use. By default it uses all rows in the original matrix. |
method |
Which method to reduce the dimension of the data. |
color_by |
If annotation table is set, an annotation name can be set here. |
control |
|
internal |
Internally used. |
nr |
If number of matrix rows is larger than this value, random |
p_cutoff |
Cutoff of p-value of class label prediction. Data points with values higher than it will be mapped with cross symbols. |
remove |
Whether to remove columns which have p-values higher than the cutoff. |
scale_rows |
Whether to perform scaling on matrix rows. |
verbose |
Whether to print messages. |
... |
Other arguments. |
This function is very similar to dimension_reduction,ConsensusPartition-method.
No value is returned.
data(golub_cola_ds)
dimension_reduction(golub_cola_ds, k = 2)
dimension_reduction(golub_cola_ds, k = 3)
Visualize columns after dimension reduction
## S4 method for signature 'HierarchicalPartition'
dimension_reduction(object, merge_node = merge_node_param(),
    parent_node, top_n = NULL,
    top_value_method = object@list[[1]]@top_value_method,
    method = c("PCA", "MDS", "t-SNE", "UMAP"),
    color_by = NULL, scale_rows = object@list[[1]]@scale_rows,
    verbose = TRUE, ...)
object |
A HierarchicalPartition-class object. |
merge_node |
Parameters to merge sub-dendrograms, see |
top_n |
Top n rows to use. By default it uses all rows in the original matrix. |
top_value_method |
Which top-value method to use. |
parent_node |
Parent node. If it is set, the function call is identical to |
method |
Which method to reduce the dimension of the data. |
color_by |
If annotation table is set, an annotation name can be set here. |
scale_rows |
Whether to perform scaling on matrix rows. |
verbose |
Whether to print messages. |
... |
Other arguments passed to |
The class IDs are extracted at depth.
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola_rh)
dimension_reduction(golub_cola_rh)
Visualize columns after dimension reduction
## S4 method for signature 'matrix'
dimension_reduction(object, pch = 16, col = "black",
    cex = 1, main = NULL,
    method = c("PCA", "MDS", "t-SNE", "UMAP"),
    pc = NULL, control = list(),
    scale_rows = FALSE, nr = 5000,
    internal = FALSE, verbose = TRUE)
object |
A numeric matrix. |
method |
Which method to reduce the dimension of the data. |
pc |
Which two principal components to visualize. |
control |
|
pch |
Shape of points. |
col |
Color of points. |
cex |
Size of points. |
main |
Title of the plot. |
scale_rows |
Whether to perform scaling on matrix rows. |
nr |
If number of matrix rows is larger than this value, random |
internal |
Internally used. |
verbose |
Whether to print messages. |
Locations of the points.
Zuguang Gu <[email protected]>
The DownSamplingConsensusPartition class
The DownSamplingConsensusPartition performs consensus partitioning only with a small subset of columns, and the classes of the other columns are predicted by predict_classes,ConsensusPartition-method.
The DownSamplingConsensusPartition-class is a child class of ConsensusPartition-class. It inherits all methods of ConsensusPartition-class.
The constructor function is consensus_partition_by_down_sampling.
Flatness of the CDF curve
FCC(consensus_mat, diff = 0.1)
consensus_mat |
A consensus matrix. |
diff |
Difference of F(b) - F(a). |
For a in [0, 0.5] and b in [0.5, 1], the score measures the flatness of the CDF curve F of the consensus matrix. It is calculated as the maximum width b - a that satisfies F(b) - F(a) <= diff.
A numeric value.
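The flatness definition can be tried out with a small base R sketch. It is not the package's FCC() code: fcc_sketch is a hypothetical helper, the grid of 100 candidate values for a and b is an assumption, and only the lower triangle of the consensus matrix is used to build the empirical CDF.

```r
# Hypothetical helper: largest b - a with F(b) - F(a) <= diff,
# searched on a regular grid with a in [0, 0.5] and b in [0.5, 1]
fcc_sketch = function(consensus_mat, diff = 0.1, n_grid = 100) {
    x = consensus_mat[lower.tri(consensus_mat)]
    f = ecdf(x)
    a = seq(0, 0.5, length.out = n_grid)
    b = seq(0.5, 1, length.out = n_grid)
    increase = outer(f(b), f(a), "-")   # F(b) - F(a) for every grid pair
    width = outer(b, a, "-")            # b - a for every grid pair
    ok = increase <= diff
    if(!any(ok)) return(0)
    max(width[ok])
}

# A stable partition: consensus values are only 0 or 1 (two clean blocks)
stable = matrix(0, 6, 6)
stable[1:3, 1:3] = 1
stable[4:6, 4:6] = 1

# An ambiguous partition: every pair co-clusters half of the time
ambiguous = matrix(0.5, 6, 6)

fcc_sketch(stable)
fcc_sketch(ambiguous)
```

A consensus matrix dominated by values near 0 and 1 has a CDF that is flat over a wide middle range, so the stable example scores higher than the ambiguous one.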
data(golub_cola)
FCC(get_consensus(golub_cola[1, 1], k = 2))
FCC(get_consensus(golub_cola[1, 1], k = 3))
FCC(get_consensus(golub_cola[1, 1], k = 4))
FCC(get_consensus(golub_cola[1, 1], k = 5))
FCC(get_consensus(golub_cola[1, 1], k = 6))
Find a best k for the k-means clustering
find_best_km(mat, max_km = 15)
mat |
A matrix where k-means clustering is executed by rows. |
max_km |
Maximal k to try. |
The best k is determined by looking for the knee/elbow of the WSS (within-cluster sum of squares) curve.
Note this function is only for a rough and quick estimation of the best k.
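A knee search on the WSS curve can be sketched as follows. The knee rule used here (the point farthest from the straight line joining both ends of the normalized curve) is an assumption chosen for illustration and may differ from what find_best_km() does; find_knee_sketch and the simulated data are hypothetical.

```r
# Hypothetical helper: run k-means on rows for k = 1..max_km and pick the
# k whose point on the normalized WSS curve is farthest from the chord
find_knee_sketch = function(mat, max_km = 8) {
    k = seq_len(max_km)
    wss = sapply(k, function(i) kmeans(mat, centers = i, nstart = 10)$tot.withinss)
    x = (k - 1) / (max_km - 1)                        # k scaled to [0, 1]
    y = (wss - min(wss)) / (max(wss) - min(wss))      # WSS scaled to [0, 1]
    d = (1 - x - y) / sqrt(2)                         # distance below the line x + y = 1
    k[which.max(d)]
}

set.seed(1)
# Toy data: three well separated groups of 50 rows each
mat = rbind(
    matrix(rnorm(50 * 5, mean = 0), ncol = 5),
    matrix(rnorm(50 * 5, mean = 6), ncol = 5),
    matrix(rnorm(50 * 5, mean = 12), ncol = 5)
)
best_k = find_knee_sketch(mat)
best_k
```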
Perform functional enrichment on signature genes
## S4 method for signature 'ANY'
functional_enrichment(object,
    id_mapping = guess_id_mapping(object, org_db, verbose),
    org_db = "org.Hs.eg.db", ontology = "BP",
    min_set_size = 10, max_set_size = 1000,
    verbose = TRUE, prefix = "", ...)
object |
A vector of gene IDs. |
id_mapping |
If the gene IDs are not Entrez IDs, a named vector should be provided where the names are the gene IDs and the values are the corresponding Entrez IDs. The value can also be a function that converts gene IDs. |
org_db |
Annotation database. |
ontology |
Following ontologies are allowed: |
min_set_size |
The minimal size of the gene sets. |
max_set_size |
The maximal size of the gene sets. |
verbose |
Whether to print messages. |
prefix |
Used internally. |
... |
Pass to |
The functional enrichment is performed by the clusterProfiler, DOSE or ReactomePA packages.
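The two accepted forms of id_mapping (a named vector or a conversion function) can be illustrated in base R. The IDs below are made up, and dropping unmapped IDs is an assumption made for the sketch rather than documented behaviour.

```r
# Hypothetical probe-to-Entrez mapping (made-up IDs):
# names are the input gene IDs, values are Entrez IDs
id_mapping = c(probe_a = "1001", probe_b = "1002", probe_c = "1003")
genes = c("probe_a", "probe_c", "probe_x")   # probe_x has no mapping

# Form 1: named vector, converted by indexing with the input IDs
entrez_1 = unname(id_mapping[genes])
entrez_1 = entrez_1[!is.na(entrez_1)]

# Form 2: a function that converts gene IDs
map_fun = function(x) unname(id_mapping[x])
entrez_2 = map_fun(genes)
entrez_2 = entrez_2[!is.na(entrez_2)]

entrez_1
```

Either form could then be supplied as the id_mapping argument; the function form is convenient when the conversion is a rule (for example stripping a version suffix) rather than a lookup table.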
A data frame.
http://bioconductor.org/packages/devel/bioc/vignettes/cola/inst/doc/functional_enrichment.html
Perform functional enrichment on signature genes
## S4 method for signature 'ConsensusPartition'
functional_enrichment(object, gene_fdr_cutoff = cola_opt$fdr_cutoff,
    k = suggest_best_k(object, help = FALSE), row_km = NULL,
    id_mapping = guess_id_mapping(rownames(object), org_db, verbose),
    org_db = "org.Hs.eg.db", ontology = "BP",
    min_set_size = 10, max_set_size = 1000,
    verbose = TRUE, prefix = "", ...)
object |
A ConsensusPartition-class object. |
gene_fdr_cutoff |
Cutoff of FDR to define significant signature genes. |
k |
Number of subgroups. |
row_km |
Number of row clusterings by k-means to separate the matrix that only contains signatures. |
id_mapping |
If the gene IDs which are row names of the original matrix are not Entrez IDs, a named vector should be provided where the names are the gene IDs in the matrix and the values are the corresponding Entrez IDs. The value can also be a function that converts gene IDs. |
org_db |
Annotation database. |
ontology |
See the corresponding argument in |
min_set_size |
The minimal size of the gene sets. |
max_set_size |
The maximal size of the gene sets. |
verbose |
Whether to print messages. |
prefix |
Used internally. |
... |
Pass to |
For how to control the parameters of functional enrichment, see the help page of functional_enrichment,ANY-method.
A list of data frames which correspond to results for the functional ontologies:
http://bioconductor.org/packages/devel/bioc/vignettes/cola/inst/doc/functional_enrichment.html
Perform functional enrichment on signature genes
## S4 method for signature 'ConsensusPartitionList'
functional_enrichment(object, gene_fdr_cutoff = cola_opt$fdr_cutoff,
    id_mapping = guess_id_mapping(rownames(object), org_db, FALSE),
    org_db = "org.Hs.eg.db", ontology = "BP",
    min_set_size = 10, max_set_size = 1000, ...)
object |
A |
gene_fdr_cutoff |
Cutoff of FDR to define significant signature genes. |
id_mapping |
If the gene IDs which are row names of the original matrix are not Entrez IDs, a named vector should be provided where the names are the gene IDs in the matrix and values are corresponding Entrez IDs. The value can also be a function that converts gene IDs. |
org_db |
Annotation database. |
ontology |
See corresponding argument in |
min_set_size |
The minimal size of the gene sets. |
max_set_size |
The maximal size of the gene sets. |
... |
Pass to |
For each method, the signature genes are extracted based on the best k.
It calls functional_enrichment,ConsensusPartition-method
on the consensus partitioning results for each method.
For how to control the parameters of functional enrichment, see help page of functional_enrichment,ANY-method
.
A list where each element in the list corresponds to enrichment results from a single method.
http://bioconductor.org/packages/devel/bioc/vignettes/cola/inst/doc/functional_enrichment.html
# There is no example NULL
Method dispatch page for functional_enrichment
.
functional_enrichment
can be dispatched on the following classes:
functional_enrichment,HierarchicalPartition-method
, HierarchicalPartition-class
class method
functional_enrichment,ANY-method
, ANY-class
class method
functional_enrichment,ConsensusPartition-method
, ConsensusPartition-class
class method
functional_enrichment,ConsensusPartitionList-method
, ConsensusPartitionList-class
class method
# no example NULL
Perform functional enrichment on signature genes
## S4 method for signature 'HierarchicalPartition' functional_enrichment(object, merge_node = merge_node_param(), gene_fdr_cutoff = cola_opt$fdr_cutoff, row_km = NULL, id_mapping = guess_id_mapping(rownames(object), org_db, verbose), org_db = "org.Hs.eg.db", ontology = "BP", min_set_size = 10, max_set_size = 1000, verbose = TRUE, ...)
object |
a |
merge_node |
Parameters to merge sub-dendrograms, see |
gene_fdr_cutoff |
Cutoff of FDR to define significant signature genes. |
row_km |
Number of row clusterings by k-means to separate the matrix that only contains signatures. |
id_mapping |
If the gene IDs which are row names of the original matrix are not Entrez IDs, a named vector should be provided where the names are the gene IDs in the matrix and values are corresponding Entrez IDs. The value can also be a function that converts gene IDs. |
org_db |
Annotation database. |
ontology |
See corresponding argument in |
min_set_size |
The minimal size of the gene sets. |
max_set_size |
The maximal size of the gene sets. |
verbose |
Whether to print messages. |
... |
Pass to |
For how to control the parameters of functional enrichment, see help page of functional_enrichment,ANY-method
.
A list of data frames which correspond to results for the functional ontologies:
# There is no example NULL
Get annotation colors
## S4 method for signature 'ConsensusPartition' get_anno_col(object)
object |
A |
A list of color vectors or else NULL
.
Zuguang Gu <[email protected]>
# There is no example NULL
Get annotation colors
## S4 method for signature 'ConsensusPartitionList' get_anno_col(object)
object |
A |
A list of color vectors or else NULL
.
Zuguang Gu <[email protected]>
# There is no example NULL
Method dispatch page for get_anno_col
.
get_anno_col
can be dispatched on the following classes:
get_anno_col,HierarchicalPartition-method
, HierarchicalPartition-class
class method
get_anno_col,ConsensusPartitionList-method
, ConsensusPartitionList-class
class method
get_anno_col,ConsensusPartition-method
, ConsensusPartition-class
class method
# no example NULL
Get annotation colors
## S4 method for signature 'HierarchicalPartition' get_anno_col(object)
object |
A |
A list of color vectors or NULL
.
Zuguang Gu <[email protected]>
# There is no example NULL
Get annotations
## S4 method for signature 'ConsensusPartition' get_anno(object)
object |
A |
A data frame if anno
was specified in run_all_consensus_partition_methods
or consensus_partition
, or else NULL
.
Zuguang Gu <[email protected]>
# There is no example NULL
Get annotations
## S4 method for signature 'ConsensusPartitionList' get_anno(object)
object |
A |
A data frame if anno
was specified in run_all_consensus_partition_methods
, or else NULL
.
Zuguang Gu <[email protected]>
# There is no example NULL
Method dispatch page for get_anno
.
get_anno
can be dispatched on the following classes:
get_anno,HierarchicalPartition-method
, HierarchicalPartition-class
class method
get_anno,ConsensusPartition-method
, ConsensusPartition-class
class method
get_anno,ConsensusPartitionList-method
, ConsensusPartitionList-class
class method
get_anno,DownSamplingConsensusPartition-method
, DownSamplingConsensusPartition-class
class method
# no example NULL
Get annotations
## S4 method for signature 'DownSamplingConsensusPartition' get_anno(object, reduce = FALSE)
object |
A |
reduce |
Used internally. |
A data frame if anno
was specified in consensus_partition_by_down_sampling
, or else NULL
.
Zuguang Gu <[email protected]>
data(golub_cola_ds)
get_anno(golub_cola_ds)
Get annotations
## S4 method for signature 'HierarchicalPartition' get_anno(object)
object |
A |
A data frame if anno
was specified in hierarchical_partition
, or NULL
.
Zuguang Gu <[email protected]>
# There is no example NULL
Get children nodes
## S4 method for signature 'HierarchicalPartition' get_children_nodes(object, node, merge_node = merge_node_param())
object |
A |
node |
A vector of node IDs. |
merge_node |
Parameters to merge sub-dendrograms, see |
A vector of children nodes.
# There is no example NULL
Get subgroup labels
## S4 method for signature 'ConsensusPartition' get_classes(object, k = object@k)
object |
A |
k |
Number of subgroups. |
A data frame with subgroup labels and other columns which are entropy of the percent membership matrix and the silhouette scores which measure the stability of a sample to stay in its group.
If k
is not specified, it returns a data frame with subgroup labels from all k.
Zuguang Gu <[email protected]>
data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
get_classes(obj, k = 2)
get_classes(obj)
Get subgroup labels
## S4 method for signature 'ConsensusPartitionList' get_classes(object, k)
object |
A |
k |
Number of subgroups. |
The subgroup labels are inferred by merging partitions from all methods by weighting the mean silhouette scores in each method.
A data frame with subgroup labels and other columns which are entropy of the percent membership matrix and the silhouette scores which measure the stability of a sample to stay in its group.
Zuguang Gu <[email protected]>
data(golub_cola)
get_classes(golub_cola, k = 2)
Method dispatch page for get_classes
.
get_classes
can be dispatched on the following classes:
get_classes,HierarchicalPartition-method
, HierarchicalPartition-class
class method
get_classes,ConsensusPartitionList-method
, ConsensusPartitionList-class
class method
get_classes,ConsensusPartition-method
, ConsensusPartition-class
class method
get_classes,DownSamplingConsensusPartition-method
, DownSamplingConsensusPartition-class
class method
# no example NULL
Get subgroup labels
## S4 method for signature 'DownSamplingConsensusPartition' get_classes(object, k = object@k, p_cutoff = 0.05, reduce = FALSE)
object |
A |
k |
Number of subgroups. |
p_cutoff |
Cutoff of p-values of class label prediction. It is only used when |
reduce |
Used internally. |
If k
is a scalar, it returns a data frame with two columns:
the class labels
the p-value for the prediction of class labels.
If k
is a vector, it returns a data frame of class labels for each k. The class
label with prediction p-value > p_cutoff
is set to NA
.
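The effect of p_cutoff described above can be sketched in base R. The data frame below is hypothetical toy data, not output from the package, and the column names are assumptions made for illustration.

```r
# Hypothetical predicted labels and prediction p-values for five samples.
pred <- data.frame(
    class = c(1, 2, 1, 3, 2),
    p     = c(0.001, 0.20, 0.04, 0.051, 0.0005)
)
p_cutoff <- 0.05

# Labels whose prediction p-value exceeds the cutoff are treated as unassigned.
masked_class <- ifelse(pred$p > p_cutoff, NA, pred$class)
```

Samples 2 and 4 have p-values above 0.05, so their labels become NA while the other labels are kept.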
data(golub_cola_ds)
get_classes(golub_cola_ds, k = 3)
get_classes(golub_cola_ds)
Get class IDs from the HierarchicalPartition object
## S4 method for signature 'HierarchicalPartition' get_classes(object, merge_node = merge_node_param())
object |
A |
merge_node |
Parameters to merge sub-dendrograms, see |
A data frame of class IDs. The class IDs are the node IDs where the subgroup sits in the hierarchy.
Zuguang Gu <[email protected]>
data(golub_cola_rh) get_classes(golub_cola_rh)
Get consensus matrix
## S4 method for signature 'ConsensusPartition' get_consensus(object, k)
object |
A |
k |
Number of subgroups. |
For row i and column j in the consensus matrix, the value x_ij is the probability that sample i and sample j are in the same group, estimated from all partitions.
A consensus matrix corresponding to the current k.
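How such a consensus matrix arises from a set of partitions can be sketched in base R. This is a minimal illustration on hypothetical labels, assuming every sample appears in every partition run; it is not the package's internal code.

```r
# Hypothetical labels from 4 partition runs (rows) on 5 samples (columns).
partitions <- rbind(
    c(1, 1, 2, 2, 2),
    c(1, 1, 1, 2, 2),
    c(2, 2, 1, 1, 1),   # same grouping as run 1, labels swapped
    c(1, 1, 2, 2, 1)
)

# For each run, mark the sample pairs that share a label, then average over runs.
same_group <- lapply(seq_len(nrow(partitions)), function(r) {
    outer(partitions[r, ], partitions[r, ], "==") * 1
})
consensus <- Reduce(`+`, same_group) / nrow(partitions)
```

Because only co-membership is counted, the result does not depend on how the labels are numbered in each run. Here samples 1 and 2 are always together (value 1), while samples 1 and 3 share a group in one run out of four (value 0.25).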
Zuguang Gu <[email protected]>
data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
get_consensus(obj, k = 2)
Get the original matrix
## S4 method for signature 'ConsensusPartition' get_matrix(object, full = FALSE, include_all_rows = FALSE)
object |
A |
full |
Whether to extract the complete original matrix. |
include_all_rows |
Internally used. |
A numeric matrix.
Zuguang Gu <[email protected]>
data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
get_matrix(obj)
Get the original matrix
## S4 method for signature 'ConsensusPartitionList' get_matrix(object)
object |
A |
A numeric matrix.
Zuguang Gu <[email protected]>
data(golub_cola)
get_matrix(golub_cola)
Method dispatch page for get_matrix
.
get_matrix
can be dispatched on the following classes:
get_matrix,ConsensusPartition-method
, ConsensusPartition-class
class method
get_matrix,ConsensusPartitionList-method
, ConsensusPartitionList-class
class method
get_matrix,DownSamplingConsensusPartition-method
, DownSamplingConsensusPartition-class
class method
get_matrix,HierarchicalPartition-method
, HierarchicalPartition-class
class method
# no example NULL
Get the original matrix
## S4 method for signature 'DownSamplingConsensusPartition' get_matrix(object, reduce = FALSE)
object |
A |
reduce |
Whether to return the reduced matrix where columns are randomly sampled. |
A numeric matrix
# There is no example NULL
Get the original matrix
## S4 method for signature 'HierarchicalPartition' get_matrix(object)
object |
A |
A numeric matrix.
Zuguang Gu <[email protected]>
# There is no example NULL
Get membership matrix
## S4 method for signature 'ConsensusPartition' get_membership(object, k, each = FALSE)
object |
A |
k |
Number of subgroups. |
each |
Whether to return the percentage membership matrix which is summarized from all partitions or the individual membership in every single partition run. |
If each == FALSE
, the value in the membership matrix is the probability
to be in one subgroup, while if each == TRUE
, the membership matrix contains the
subgroup labels from every single partition, each of which is generated by random sampling from the original matrix.
The percent membership matrix is calculated by cl_consensus
.
If each == FALSE
, it returns a membership matrix where rows correspond to the columns from the subgroups.
If each == TRUE
, it returns a membership matrix where rows correspond to the columns from the original matrix.
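The relation between the two forms of the membership matrix can be sketched in base R. This is a simplified illustration on hypothetical labels that are assumed to be already aligned across runs; the package itself relies on cl_consensus for the summary, and the column names p1 and p2 are chosen here for illustration only.

```r
# Hypothetical labels: 5 samples (rows) in 4 partition runs (columns), k = 2.
# This corresponds to the per-run form (each = TRUE).
labels_each <- cbind(
    run1 = c(1, 1, 2, 2, 2),
    run2 = c(1, 1, 1, 2, 2),
    run3 = c(1, 1, 2, 2, 2),
    run4 = c(1, 2, 2, 2, 2)
)
k <- 2

# Summarized form (each = FALSE): fraction of runs in which each sample
# was assigned to each subgroup.
membership <- sapply(seq_len(k), function(g) rowMeans(labels_each == g))
colnames(membership) <- paste0("p", seq_len(k))
```

Each row of the summarized matrix sums to 1. Sample 2, for example, is in subgroup 1 in three of four runs, giving a membership of 0.75.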
Zuguang Gu <[email protected]>
get_membership,ConsensusPartitionList-method
summarizes membership from partitions from all combinations
of top-value methods and partitioning methods.
data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
get_membership(obj, k = 2)
get_membership(obj, k = 2, each = TRUE)
Get membership matrix
## S4 method for signature 'ConsensusPartitionList' get_membership(object, k)
object |
A |
k |
Number of subgroups. |
The membership matrix (the probability of each sample to be in one subgroup, if assuming columns represent samples) is inferred from the consensus partition of every combination of methods, weighted by the mean silhouette score of the partition for each method. So methods which give unstable partitions have lower weights when summarizing membership matrix from all methods.
A membership matrix where rows correspond to the columns in the original matrix.
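The silhouette-weighted summary described above can be sketched in base R. The exact weighting formula used by the package is not stated here, so a plain weighted mean is assumed; the membership matrices and silhouette scores are hypothetical, and subgroup labels are assumed to be aligned between methods.

```r
# Hypothetical membership matrices (4 samples x 2 subgroups) from two methods.
m1 <- rbind(c(1.0, 0.0), c(0.9, 0.1), c(0.2, 0.8), c(0.0, 1.0))
m2 <- rbind(c(0.6, 0.4), c(0.5, 0.5), c(0.5, 0.5), c(0.4, 0.6))

# Hypothetical mean silhouette scores; the second method is less stable.
w <- c(0.9, 0.3)

# Weighted average (an assumption): stable methods contribute more.
membership <- (w[1] * m1 + w[2] * m2) / sum(w)
```

With these weights, the summary stays close to the first method: sample 2 gets a membership of 0.8 in subgroup 1, compared with 0.9 from the stable method and 0.5 from the unstable one.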
Zuguang Gu <[email protected]>
get_membership,ConsensusPartition-method
returns membership matrix for a single top-value method and partitioning method.
data(golub_cola)
get_membership(golub_cola, k = 2)
Method dispatch page for get_membership
.
get_membership
can be dispatched on the following classes:
get_membership,ConsensusPartition-method
, ConsensusPartition-class
class method
get_membership,ConsensusPartitionList-method
, ConsensusPartitionList-class
class method
# no example NULL
Get parameters
## S4 method for signature 'ConsensusPartition' get_param(object, k = object@k, unique = TRUE)
object |
A |
k |
Number of subgroups. |
unique |
Whether to apply |
It is mainly used internally.
A data frame of parameters corresponding to the current k. In the data frame, each row corresponds to a partition run.
Zuguang Gu <[email protected]>
data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
get_param(obj)
get_param(obj, k = 2)
get_param(obj, unique = FALSE)
Get signature rows
## S4 method for signature 'ConsensusPartition' get_signatures(object, k, col = if(scale_rows) c("green", "white", "red") else c("blue", "white", "red"), silhouette_cutoff = 0.5, fdr_cutoff = cola_opt$fdr_cutoff, top_signatures = NULL, group_diff = cola_opt$group_diff, scale_rows = object@scale_rows, .scale_mean = NULL, .scale_sd = NULL, row_km = NULL, diff_method = c("Ftest", "ttest", "samr", "pamr", "one_vs_others", "uniquely_high_in_one_group"), anno = get_anno(object), anno_col = get_anno_col(object), internal = FALSE, show_row_dend = FALSE, show_column_names = FALSE, column_names_gp = gpar(fontsize = 8), use_raster = TRUE, plot = TRUE, verbose = TRUE, seed = 888, left_annotation = NULL, right_annotation = NULL, simplify = FALSE, prefix = "", enforce = FALSE, hash = NULL, from_hc = FALSE, ...)
object |
A |
k |
Number of subgroups. |
col |
Colors for the main heatmap. |
silhouette_cutoff |
Cutoff for silhouette scores. Samples with values less than it are not used for finding signature rows. For selecting a proper silhouette cutoff, please refer to https://www.stat.berkeley.edu/~s133/Cluster2a.html#tth_tAb1. |
fdr_cutoff |
Cutoff for FDR of the difference test between subgroups. |
top_signatures |
Top signatures with most significant fdr. Note since fdr might be same for multiple rows, the final number of signatures might not be exactly the same as the one that has been set. |
group_diff |
Cutoff for the maximal difference between group means. |
scale_rows |
Whether to apply row scaling when making the heatmap. |
.scale_mean |
Internally used. |
.scale_sd |
Internally used. |
row_km |
Number of groups for performing k-means clustering on rows. By default it is automatically selected. |
diff_method |
Methods to get rows which are significantly different between subgroups, see 'Details' section. |
anno |
A data frame of annotations for the original matrix columns. By default it uses the annotations specified in |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
internal |
Used internally. |
show_row_dend |
Whether to show the row dendrogram. |
show_column_names |
Whether to show column names in the heatmap. |
column_names_gp |
Graphics parameters for column names. |
use_raster |
Internally used. |
plot |
Whether to make the plot. |
verbose |
Whether to print messages. |
seed |
Random seed. |
left_annotation |
Annotation put on the left of the heatmap. It should be a |
right_annotation |
Annotation put on the right of the heatmap. Same format as |
simplify |
Only used internally. |
prefix |
Only used internally. |
enforce |
The analysis is cached by default, so that the analysis with the same input will be automatically extracted without rerunning them. Set |
hash |
Used internally. |
from_hc |
Is the |
... |
Other arguments. |
Basically, the function applies a statistical test for the difference between subgroups to every row. The following methods test the significance of the difference:
ttest: First it looks for the subgroup with the highest mean value, compares it to each of the other subgroups with a t-test and takes the maximum p-value. Second it looks for the subgroup with the lowest mean value, compares it to each of the other subgroups again with a t-test and takes the maximum p-value. The minimum of these two p-values is the final p-value.
samr/pamr: Use the SAM (from the samr package) or PAM (from the pamr package) method to find significantly different rows between subgroups.
Ftest: Use an F-test to find significantly different rows between subgroups.
one_vs_others: For each subgroup i in each row, it uses a t-test to compare samples in the current subgroup to all other samples, giving p_i. The p-value for the current row is min(p_i).
uniquely_high_in_one_group: A row is a signature uniquely up-regulated in subgroup A if it fits the following criteria: 1. in a two-group t-test of A ~ other_merged_groups, the statistic is > 0 (high in group A) and the p-value is significant, and 2. among the other groups (excluding A), the t-test for every pair of groups is not significant.
diff_method
can also be a self-defined function. The function needs two arguments which are the matrix for the analysis
and the predicted classes. The function should return a vector of FDR values from the difference test.
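A self-defined diff_method function could look like the following base R sketch. The function name kw_fdr and the choice of a Kruskal-Wallis test are illustrative assumptions, not something provided by the package; the only requirements taken from the text above are the two arguments (matrix, predicted classes) and the returned vector of FDR values.

```r
# A hypothetical self-defined difference test: Kruskal-Wallis per row,
# followed by Benjamini-Hochberg adjustment. `mat` is the matrix for the
# analysis (rows are features) and `class` holds the predicted subgroup
# of each column.
kw_fdr <- function(mat, class) {
    p <- apply(mat, 1, function(x) kruskal.test(x, g = factor(class))$p.value)
    p.adjust(p, method = "BH")
}

# Toy data: row 1 differs strongly between the two groups, row 2 does not.
set.seed(1)
class <- rep(c(1, 2), each = 10)
mat <- rbind(
    c(rnorm(10, mean = 0), rnorm(10, mean = 5)),
    rnorm(20)
)
fdr <- kw_fdr(mat, class)
```

Such a function would then be supplied as diff_method = kw_fdr in the call to get_signatures.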
A data frame with more than two columns:
which_row
:row index corresponding to the original matrix.
fdr
:the FDR.
km
:the k-means groups if row_km
is set.
The mean value in each subgroup (scaled or unscaled, depending on whether rows are scaled).
Zuguang Gu <[email protected]>
data(golub_cola)
res = golub_cola["ATC", "skmeans"]
tb = get_signatures(res, k = 3)
head(tb)
get_signatures(res, k = 3, top_signatures = 100)
Method dispatch page for get_signatures
.
get_signatures
can be dispatched on the following classes:
get_signatures,ConsensusPartition-method
, ConsensusPartition-class
class method
get_signatures,DownSamplingConsensusPartition-method
, DownSamplingConsensusPartition-class
class method
get_signatures,HierarchicalPartition-method
, HierarchicalPartition-class
class method
# no example NULL
Get signature rows
## S4 method for signature 'DownSamplingConsensusPartition' get_signatures(object, k, p_cutoff = 1, ...)
object |
A |
k |
Number of subgroups. |
p_cutoff |
Cutoff for p-values of class label prediction. Samples with values higher than it are not used for finding signature rows. |
... |
Other arguments passed to |
This function is very similar to get_signatures,ConsensusPartition-method
.
data(golub_cola_ds)
get_signatures(golub_cola_ds, k = 2)
get_signatures(golub_cola_ds, k = 3)
Get signature rows
## S4 method for signature 'HierarchicalPartition' get_signatures(object, merge_node = merge_node_param(), group_diff = object@param$group_diff, row_km = NULL, diff_method = "Ftest", fdr_cutoff = object@param$fdr_cutoff, scale_rows = object[1]@scale_rows, anno = get_anno(object), anno_col = get_anno_col(object), show_column_names = FALSE, column_names_gp = gpar(fontsize = 8), verbose = TRUE, plot = TRUE, seed = 888, ...)
object |
a |
merge_node |
Parameters to merge sub-dendrograms, see |
group_diff |
Cutoff for the maximal difference between group means. |
row_km |
Number of groups for performing k-means clustering on rows. By default it is automatically selected. |
diff_method |
Methods to get rows which are significantly different between subgroups. |
fdr_cutoff |
Cutoff for FDR of the difference test between subgroups. |
scale_rows |
Whether to apply row scaling when making the heatmap. |
anno |
a data frame of annotations for the original matrix columns. By default it uses the annotations specified in |
anno_col |
a list of colors (color is defined as a named vector) for the annotations. If |
show_column_names |
Whether to show column names in the heatmap. |
column_names_gp |
Graphic parameters for column names. |
verbose |
whether to print messages. |
plot |
whether to make the plot. |
seed |
Random seed. |
... |
other arguments pass to |
The function calls get_signatures,ConsensusPartition-method
to find signatures at
each node of the partition hierarchy.
A data frame with more than two columns:
which_row
:row index corresponding to the original matrix.
km
:the k-means groups if row_km
is set.
The mean value in each subgroup (scaled or unscaled, depending on whether rows are scaled).
Zuguang Gu <[email protected]>
data(golub_cola_rh)
tb = get_signatures(golub_cola_rh)
head(tb)
Get statistics
## S4 method for signature 'ConsensusPartition' get_stats(object, k = object@k, all_stats = FALSE)
object |
A |
k |
Number of subgroups. The value can be a vector. |
all_stats |
Whether to show all statistics that were calculated. Used internally. |
The statistics are:
1-PAC: 1 - proportion of ambiguous clustering, calculated by PAC.
mean_silhouette: The mean silhouette score. See https://en.wikipedia.org/wiki/Silhouette_(clustering) .
concordance: The mean probability that each partition fits the consensus partition, calculated by concordance.
area_increased: The increase in the area under the eCDF (empirical cumulative distribution function) curve relative to the previous k.
Rand: The percentage of pairs of samples that are either both in the same cluster or both in different clusters in the partitions of k and k-1. See https://en.wikipedia.org/wiki/Rand_index .
Jaccard: The ratio of the number of pairs of samples that are in the same cluster in both the partitions of k and k-1 to the number of pairs that are in the same cluster in the partition of k or k-1.
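The two pair-counting statistics that compare the partitions of k and k-1 can be sketched in base R. The class labels below are hypothetical and the code is only an illustration of the definitions given above, not the package's implementation.

```r
# Hypothetical class labels of 6 samples for k - 1 = 2 and k = 3 subgroups.
cl_prev <- c(1, 1, 1, 2, 2, 2)
cl_curr <- c(1, 1, 3, 2, 2, 2)

# For every pair of samples, record whether the pair shares a cluster.
pairs <- combn(length(cl_prev), 2)
same_prev <- cl_prev[pairs[1, ]] == cl_prev[pairs[2, ]]
same_curr <- cl_curr[pairs[1, ]] == cl_curr[pairs[2, ]]

# Rand: pairs on which the two partitions agree
# (together in both, or apart in both).
rand <- mean(same_prev == same_curr)

# Jaccard: pairs together in both partitions over pairs together in either.
jaccard <- sum(same_prev & same_curr) / sum(same_prev | same_curr)
```

In this toy case 13 of the 15 pairs agree, so the Rand value is 13/15, and 4 of the 6 pairs that are together in at least one partition are together in both, so the Jaccard value is 2/3.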
A matrix of partition statistics.
Zuguang Gu <[email protected]>
data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
get_stats(obj)
get_stats(obj, k = 2)
Get statistics
## S4 method for signature 'ConsensusPartitionList' get_stats(object, k, all_stats = FALSE)
object |
A |
k |
Number of subgroups. The value can only be a single value. |
all_stats |
Whether to show all statistics that were calculated. Used internally. |
A matrix of partition statistics for a selected k. Rows in the matrix correspond to combinations of top-value methods and partitioning methods.
Zuguang Gu <[email protected]>
data(golub_cola)
get_stats(golub_cola, k = 2)
Method dispatch page for get_stats
.
get_stats
can be dispatched on the following classes:
get_stats,ConsensusPartitionList-method
, ConsensusPartitionList-class
class method
get_stats,ConsensusPartition-method
, ConsensusPartition-class
class method
# no example NULL
Example ConsensusPartitionList object from Golub dataset
data(golub_cola)
Following code was used to generate golub_cola
:
library(cola)

library(golubEsets)  # from bioc
data(Golub_Merge)
m = exprs(Golub_Merge)
colnames(m) = paste0("sample_", colnames(m))
anno = pData(Golub_Merge)

m[m <= 1] = NA
m = log10(m)

m = adjust_matrix(m)

library(preprocessCore)  # from bioc
cn = colnames(m)
rn = rownames(m)
m = normalize.quantiles(m)
colnames(m) = cn
rownames(m) = rn

set.seed(123)
golub_cola = run_all_consensus_partition_methods(
    m, cores = 6,
    anno = anno[, c("ALL.AML"), drop = FALSE],
    anno_col = c("ALL" = "red", "AML" = "blue")
)
Zuguang Gu <[email protected]>
https://jokergoo.github.io/cola_examples/Golub_leukemia/
data(golub_cola)
golub_cola
Example DownSamplingConsensusPartition object from Golub dataset
data(golub_cola_ds)
Following code was used to generate golub_cola_ds
:
library(cola)
data(golub_cola)
m = get_matrix(golub_cola)
set.seed(123)
golub_cola_ds = consensus_partition_by_down_sampling(
    m, subset = 50, cores = 6,
    anno = get_anno(golub_cola),
    anno_col = get_anno_col(golub_cola)
)
Zuguang Gu <[email protected]>
data(golub_cola_ds)
golub_cola_ds
Example HierarchicalPartition object from Golub dataset
data(golub_cola_rh)
Following code was used to generate golub_cola_rh
:
library(cola)
data(golub_cola)
m = get_matrix(golub_cola)
set.seed(123)
golub_cola_rh = hierarchical_partition(
    m, cores = 6,
    anno = get_anno(golub_cola),
    anno_col = get_anno_col(golub_cola)
)
Zuguang Gu <[email protected]>
data(golub_cola_rh)
golub_cola_rh
Hierarchical partition
hierarchical_partition(data, top_n = NULL, top_value_method = "ATC", partition_method = "skmeans", combination_method = expand.grid(top_value_method, partition_method), anno = NULL, anno_col = NULL, mean_silhouette_cutoff = 0.9, min_samples = max(6, round(ncol(data)*0.01)), subset = Inf, predict_method = "centroid", group_diff = ifelse(scale_rows, 0.5, 0), fdr_cutoff = cola_opt$fdr_cutoff, min_n_signatures = NULL, filter_fun = function(mat) { s = rowSds(mat) s > quantile(unique(s[s > 1e-10]), 0.05, na.rm = TRUE) }, max_k = 4, scale_rows = TRUE, verbose = TRUE, mc.cores = 1, cores = mc.cores, help = TRUE, ...)
data |
a numeric matrix where subgroups are found by columns. |
top_n |
Number of rows with top values. |
top_value_method |
a single or a vector of top-value methods. Available methods are in |
partition_method |
a single or a vector of partition methods. Available methods are in |
combination_method |
A list of combinations of top-value methods and partitioning methods. The value can be a two-column data frame where the first column is the top-value methods and the second column is the partitioning methods. Or it can be a vector of combination names in a form of "top_value_method:partitioning_method". |
anno |
A data frame with known annotation of samples. The annotations will be plotted in heatmaps and the correlation to predicted subgroups will be tested. |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
mean_silhouette_cutoff |
The cutoff to test whether partition in current node is stable. |
min_samples |
the cutoff of number of samples to determine whether to continue looking for subgroups. |
group_diff |
|
fdr_cutoff |
|
subset |
Number of columns to randomly sample. |
predict_method |
Method for predicting class labels. Possible values are "centroid", "svm" and "randomForest". |
min_n_signatures |
Minimal number of signatures under the best classification. |
filter_fun |
A self-defined function which filters the original matrix and returns a submatrix for partitioning. |
max_k |
maximal number of partitions to try. The function will try |
scale_rows |
Whether rows are scaled. |
verbose |
Whether to print messages. |
mc.cores |
Number of cores to use. This argument will be removed in future versions. |
cores |
Number of cores, or a |
help |
Whether to show the help message. |
... |
pass to |
The function looks for subgroups in a hierarchical way.
There is a special way to encode the nodes in the hierarchy. The length of the node name is the depth of the node in the hierarchy, and the substring excluding the last digit is the name of the parent node. E.g. for the node 0011, the depth is 4 and the parent node is 001.
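The node-name encoding can be sketched in a few lines of base R. The helpers `node_depth` and `parent_node` are hypothetical names used only for illustration, they are not part of cola, and the sketch assumes that the root node is a single digit.

```r
# Hypothetical helpers (not part of cola) illustrating the node-name encoding.
node_depth = function(node) nchar(node)

parent_node = function(node) {
    # drop the last digit; a single-digit (root) node has no parent
    ifelse(nchar(node) > 1, substr(node, 1, nchar(node) - 1), NA_character_)
}

node_depth("0011")    # depth of the node
parent_node("0011")   # name of the parent node
```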
A HierarchicalPartition-class object. Simply type object in the interactive R session to see which functions can be applied on it.
Zuguang Gu <[email protected]>
## Not run:
set.seed(123)
m = cbind(rbind(matrix(rnorm(20*20, mean = 2, sd = 0.3), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.3), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.3), nr = 20)),
          rbind(matrix(rnorm(20*20, mean = 0, sd = 0.3), nr = 20),
                matrix(rnorm(20*20, mean = 1, sd = 0.3), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.3), nr = 20)),
          rbind(matrix(rnorm(20*20, mean = 0, sd = 0.3), nr = 20),
                matrix(rnorm(20*20, mean = 0, sd = 0.3), nr = 20),
                matrix(rnorm(20*20, mean = 1, sd = 0.3), nr = 20))
         ) + matrix(rnorm(60*60, sd = 0.5), nr = 60)
rh = hierarchical_partition(m, top_value_method = "SD", partition_method = "kmeans")
## End(Not run)
The HierarchicalPartition class
The HierarchicalPartition-class has the following methods:
hierarchical_partition: constructor method.
collect_classes,HierarchicalPartition-method: plot the hierarchy of subgroups predicted.
get_classes,HierarchicalPartition-method: get the class IDs of subgroups.
suggest_best_k,HierarchicalPartition-method: guess the best number of partitions for each node.
get_matrix,HierarchicalPartition-method: get the original matrix.
get_signatures,HierarchicalPartition-method: get the signatures for each subgroup.
compare_signatures,HierarchicalPartition-method: compare signatures from different nodes.
dimension_reduction,HierarchicalPartition-method: make dimension reduction plots.
test_to_known_factors,HierarchicalPartition-method: test correlation between predicted subgrouping and known annotations, if available.
cola_report,HierarchicalPartition-method: generate an HTML report for the whole analysis.
functional_enrichment,HierarchicalPartition-method: apply functional enrichment.
Zuguang Gu <[email protected]>
Test whether the current k is the best/optional k
## S4 method for signature 'ConsensusPartition'
is_best_k(object, k, ...)
object |
A |
k |
Number of subgroups. |
... |
Optional best k is also assigned as TRUE.
Logical scalar.
data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
is_best_k(obj, k = 2)
is_best_k(obj, k = 3)
Test whether the current k is the best/optional k
## S4 method for signature 'ConsensusPartitionList'
is_best_k(object, k, ...)
object |
A |
k |
Number of subgroups. |
... |
It tests on the partitions for every method.
Logical vector.
data(golub_cola)
is_best_k(golub_cola, k = 3)
Method dispatch page for is_best_k.
is_best_k can be dispatched on the following classes:
is_best_k,ConsensusPartition-method, ConsensusPartition-class class method
is_best_k,ConsensusPartitionList-method, ConsensusPartitionList-class class method
Test whether a node is a leaf node
## S4 method for signature 'HierarchicalPartition'
is_leaf_node(object, node, merge_node = merge_node_param())
object |
A |
node |
A vector of node IDs. |
merge_node |
Parameters to merge sub-dendrograms, see |
data(golub_cola_rh)
is_leaf_node(golub_cola_rh, all_leaves(golub_cola_rh))
Test whether the current k corresponds to a stable partition
## S4 method for signature 'ConsensusPartition'
is_stable_k(object, k, stable_PAC = 0.1, ...)
object |
A |
k |
Number of subgroups. |
stable_PAC |
Cutoff for stable PAC. |
... |
If 1-PAC for the given k is larger than 0.9 (i.e. less than 10% ambiguity in the partition), cola marks it as a stable partition.
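The rule above amounts to a single comparison. The following is a minimal sketch, assuming that the 0.9 threshold is obtained as 1 - stable_PAC; the helper `is_stable` and the 1-PAC values are made up for illustration and this is not cola's implementation.

```r
# Hypothetical helper, not cola's implementation: a partition is called stable
# when 1 - PAC exceeds 1 - stable_PAC (0.9 with the default cutoff of 0.1).
is_stable = function(one_minus_pac, stable_PAC = 0.1) {
    one_minus_pac > 1 - stable_PAC
}

# made-up 1-PAC values for k = 2 and k = 3
one_minus_pac = c("2" = 0.95, "3" = 0.62)
is_stable(one_minus_pac)
```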
Logical scalar.
data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
is_stable_k(obj, k = 2)
is_stable_k(obj, k = 3)
Test whether the current k corresponds to a stable partition
## S4 method for signature 'ConsensusPartitionList'
is_stable_k(object, k, ...)
object |
A |
k |
Number of subgroups. |
... |
It tests on the partitions for every method.
Logical vector
data(golub_cola)
is_stable_k(golub_cola, k = 3)
Method dispatch page for is_stable_k.
is_stable_k can be dispatched on the following classes:
is_stable_k,ConsensusPartitionList-method, ConsensusPartitionList-class class method
is_stable_k,ConsensusPartition-method, ConsensusPartition-class class method
Find the knee/elbow of a list of sorted points
knee_finder2(x, plot = FALSE)
x |
A numeric vector. |
plot |
Whether to make the plot. |
A vector of two numeric values: the first for the left knee and the second for the right knee.
x = rnorm(1000)
knee_finder2(x, plot = TRUE)
Add JavaScript tab in the report
knitr_add_tab_item(code, header, prefix, desc = "", opt = NULL, message = NULL, hide_and_show = FALSE)
code |
R code to execute. |
header |
Header or the title for the tab. |
prefix |
Prefix of the chunk label. |
desc |
Description in the tab. |
opt |
Options for the knitr chunk. |
message |
Message to print. |
hide_and_show |
Whether to hide the code output. |
Each tab contains the R source code and results generated from it (figure, tables, text, ...).
This function is only for internal use.
No value is returned.
Zuguang Gu <[email protected]>
knitr_insert_tabs produces a complete HTML fragment.
Generate the HTML fragment for the JavaScript tabs
knitr_insert_tabs(uid)
uid |
A unique identifier for the div. |
jQuery UI is used to generate the HTML tabs (https://jqueryui.com/tabs/).
knitr_insert_tabs should be used after several calls of knitr_add_tab_item to generate a complete HTML fragment for all tabs with all necessary JavaScript and CSS code.
This function is only for internal use.
No value is returned.
Zuguang Gu <[email protected]>
Map to Entrez IDs
map_to_entrez_id(from, org_db = "org.Hs.eg.db")
from |
The input gene ID type. Valid values should be in, e.g. |
org_db |
The annotation database. |
If there are multiple mappings from the input ID type to a unique Entrez ID, it randomly picks one.
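The random pick can be illustrated on a toy mapping table. This is only a sketch under assumptions: the IDs below are made up, and the way cola actually resolves multiple mappings internally is not shown in this page.

```r
set.seed(1)
# Toy mapping table (made-up IDs): two input IDs map to Entrez ID "10".
map_df = data.frame(
    input  = c("ENSG_A", "ENSG_B", "ENSG_C"),
    entrez = c("10", "10", "20"),
    stringsAsFactors = FALSE
)

# For every Entrez ID keep one randomly chosen input ID.
picked = tapply(map_df$input, map_df$entrez,
    function(x) x[sample.int(length(x), 1)])

# Named vector: names are input IDs, values are Entrez IDs.
map = structure(names(picked), names = as.vector(picked))
map
```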
A named vector where names are IDs of the input ID type and values are the Entrez IDs.
The returned object is normally used in functional_enrichment.
map = map_to_entrez_id("ENSEMBL")
head(map)
Max depth of the hierarchy
## S4 method for signature 'HierarchicalPartition'
max_depth(object)
object |
A |
A numeric value.
Zuguang Gu <[email protected]>
data(golub_cola_rh)
max_depth(golub_cola_rh)
Heatmap of membership in each partition
## S4 method for signature 'ConsensusPartition'
membership_heatmap(object, k, internal = FALSE,
    anno = object@anno, anno_col = get_anno_col(object),
    show_column_names = FALSE, column_names_gp = gpar(fontsize = 8), ...)
object |
A |
k |
Number of subgroups. |
internal |
Used internally. |
anno |
A data frame of annotations for the original matrix columns. By default it uses the annotations specified in |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
show_column_names |
Whether to show column names in the heatmap (these are the column names of the original matrix). |
column_names_gp |
Graphics parameters for column names. |
... |
Other arguments. |
Each row in the heatmap is the membership in one single partition.
The heatmap is split on rows by top_n.
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola)
membership_heatmap(golub_cola["ATC", "skmeans"], k = 3)
Parameters to merge branches in subgroup dendrogram.
merge_node_param(depth = Inf, min_n_signatures = -Inf, min_p_signatures = -Inf)
depth |
Depth of the dendrogram. |
min_n_signatures |
Minimal number of signatures for the partitioning on each node. |
min_p_signatures |
Minimal fraction of signatures compared to the total number of rows on each node. |
Merge node
## S4 method for signature 'HierarchicalPartition'
merge_node(object, node_id)
object |
A |
node_id |
A vector of node IDs where each node is merged as a leaf node. |
A HierarchicalPartition-class object.
Number of columns in the matrix
## S4 method for signature 'ConsensusPartition'
ncol(x)
x |
A |
Number of columns in the matrix
## S4 method for signature 'ConsensusPartitionList'
ncol(x)
x |
A |
Method dispatch page for ncol.
ncol can be dispatched on the following classes:
ncol,ConsensusPartitionList-method, ConsensusPartitionList-class class method
ncol,ConsensusPartition-method, ConsensusPartition-class class method
ncol,DownSamplingConsensusPartition-method, DownSamplingConsensusPartition-class class method
ncol,HierarchicalPartition-method, HierarchicalPartition-class class method
Number of columns in the matrix
## S4 method for signature 'DownSamplingConsensusPartition'
ncol(x)
x |
A |
Number of columns in the matrix
## S4 method for signature 'HierarchicalPartition'
ncol(x)
x |
A |
Information on the nodes
## S4 method for signature 'HierarchicalPartition'
node_info(object)
object |
A |
It returns the following node-level information:
Node id.
Number of columns.
Number of signatures.
Percent of signatures.
Whether the node is a leaf
Information on the nodes
## S4 method for signature 'HierarchicalPartition'
node_level(object)
object |
A |
It is the same as node_info,HierarchicalPartition-method.
Number of rows in the matrix
## S4 method for signature 'ConsensusPartition'
nrow(x)
x |
A |
Number of rows in the matrix
## S4 method for signature 'ConsensusPartitionList'
nrow(x)
x |
A |
Method dispatch page for nrow.
nrow can be dispatched on the following classes:
nrow,HierarchicalPartition-method, HierarchicalPartition-class class method
nrow,ConsensusPartitionList-method, ConsensusPartitionList-class class method
nrow,ConsensusPartition-method, ConsensusPartition-class class method
Number of rows in the matrix
## S4 method for signature 'HierarchicalPartition'
nrow(x)
x |
A |
The proportion of ambiguous clustering (PAC score)
PAC(consensus_mat, x1 = 0.1, x2 = 0.9, class = NULL)
consensus_mat |
A consensus matrix. |
x1 |
Lower bound to define "ambiguous clustering". |
x2 |
Upper bound to define "ambiguous clustering". |
class |
Subgroup labels. If it is provided, samples with silhouette score less than the 5th percentile are removed from PAC calculation. |
The PAC score is defined as F(x2) - F(x1) where F(x) is the CDF of the consensus matrix.
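As a worked example of the definition, the sketch below computes F(x2) - F(x1) with base R's ecdf on a toy consensus matrix. The helper `pac_sketch` is hypothetical and is not cola's PAC(); in particular, taking the CDF over the lower-triangle entries only is an assumption made here for illustration.

```r
# Hypothetical helper, not cola's PAC(): F(x2) - F(x1) computed on the
# off-diagonal (lower-triangle) entries of a consensus matrix.
pac_sketch = function(consensus_mat, x1 = 0.1, x2 = 0.9) {
    v = consensus_mat[lower.tri(consensus_mat)]
    Fn = ecdf(v)
    Fn(x2) - Fn(x1)
}

# Toy 4-sample consensus matrix: samples 1-2 and 3-4 always co-cluster,
# while samples 2 and 3 are grouped together in half of the partitions.
m = matrix(c(1,   1,   0,   0,
             1,   1,   0.5, 0,
             0,   0.5, 1,   1,
             0,   0,   1,   1), nrow = 4, byrow = TRUE)

# one of the six sample pairs is ambiguous
pac_sketch(m)
```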
A single numeric value.
See https://www.nature.com/articles/srep06207 for explanation of PAC score.
Zuguang Gu <[email protected]>
data(golub_cola)
PAC(get_consensus(golub_cola[1, 1], k = 2))
PAC(get_consensus(golub_cola[1, 1], k = 3))
PAC(get_consensus(golub_cola[1, 1], k = 4))
PAC(get_consensus(golub_cola[1, 1], k = 5))
PAC(get_consensus(golub_cola[1, 1], k = 6))
# with specifying `class`
PAC(get_consensus(golub_cola[1, 1], k = 2),
    class = get_classes(golub_cola[1, 1], k = 2)[, 1])
Plot the empirical cumulative distribution (eCDF) curve of the consensus matrix
## S4 method for signature 'ConsensusPartition'
plot_ecdf(object, ...)
object |
A |
... |
Other arguments. |
It plots the eCDF curve for each k.
This function is mainly used in the collect_plots and select_partition_number functions.
No value is returned.
Zuguang Gu <[email protected]>
See ecdf for a detailed explanation of the empirical cumulative distribution function.
data(golub_cola)
plot_ecdf(golub_cola["ATC", "skmeans"])
Predict classes for new samples based on cola classification
## S4 method for signature 'ConsensusPartition'
predict_classes(object, k, mat,
    silhouette_cutoff = 0.5,
    fdr_cutoff = cola_opt$fdr_cutoff,
    group_diff = cola_opt$group_diff,
    scale_rows = object@scale_rows,
    diff_method = "Ftest",
    method = "centroid",
    dist_method = c("euclidean", "correlation", "cosine"), nperm = 1000,
    p_cutoff = 0.05, plot = TRUE, col_fun = NULL,
    split_by_sigatures = FALSE, force = FALSE,
    verbose = TRUE, help = TRUE, prefix = "",
    mc.cores = 1, cores = mc.cores)
object |
A |
k |
Number of subgroups to get the classifications. |
mat |
The new matrix where the sample classes are going to be predicted. The number of rows should be the same as the original matrix for cola analysis (also make sure the row orders are the same). Be careful that the scaling of |
silhouette_cutoff |
Send to |
fdr_cutoff |
Send to |
group_diff |
Send to |
scale_rows |
Send to |
diff_method |
Send to |
method |
Method for predicting class labels. Possible values are "centroid", "svm" and "randomForest". |
dist_method |
Distance method. Value should be "euclidean", "correlation" or "cosine". Send to |
nperm |
Number of permutations. It is used when |
p_cutoff |
Cutoff for the p-values for determining class assignment. Send to |
plot |
Whether to draw the plot that visualizes the process of prediction. Send to |
col_fun |
A color mapping function generated from |
split_by_sigatures |
Should the heatmaps be split based on k-means on the main heatmap, or on the patterns of the signature heatmap. |
force |
If the value is |
verbose |
Whether to print messages. Send to |
help |
Whether to print help messages. |
prefix |
Used internally. |
mc.cores |
Number of cores. This argument will be removed in future versions. |
cores |
Number of cores, or a |
The prediction is based on the signature centroid matrix from cola classification. The processes are as follows:
1. For the provided ConsensusPartition-class object and a selected k, the signatures that discriminate classes are extracted by get_signatures,ConsensusPartition-method. If the number of signatures is more than 2000, only 2000 signatures are randomly sampled.
2. The signature centroid matrix is a k-column matrix where each column is the centroid of samples in the corresponding class, i.e. the mean across samples. If rows were scaled in cola analysis, the signature centroid matrix is the mean of scaled values and vice versa. Please note that samples with silhouette score less than silhouette_cutoff are removed before calculating the centroids.
3. With the signature centroid matrix and the new matrix, it calls predict_classes,matrix-method to perform the prediction.
Please see more details of the prediction on that help page. Please note, the scales of the new matrix should be the same as the matrix used for cola analysis.
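Step 2 above can be sketched in base R as follows. All values (matrix, class labels, silhouette scores) are made up for illustration, and the code is not taken from cola; it only shows how a k-column centroid matrix results from class-wise row means after dropping low-silhouette samples.

```r
set.seed(42)
# Toy data (all values made up): 5 signature rows, 6 samples in k = 2 classes.
mat = matrix(rnorm(30), nrow = 5)
class = c(1, 1, 1, 2, 2, 2)
silhouette = c(0.9, 0.8, 0.3, 0.7, 0.9, 0.6)
silhouette_cutoff = 0.5

# Samples with silhouette score below the cutoff are not used for centroids.
keep = silhouette >= silhouette_cutoff

# One column per class: row means over the retained samples of that class.
centroid = sapply(sort(unique(class)), function(cl) {
    rowMeans(mat[, keep & class == cl, drop = FALSE])
})
dim(centroid)
```

Here the third sample (silhouette 0.3) does not contribute to the centroid of class 1.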
A data frame with two columns: the class labels (in numeric) and the corresponding p-values.
predict_classes,matrix-method that predicts the classes for new samples.
data(golub_cola)
res = golub_cola["ATC:skmeans"]
mat = get_matrix(res)
# note scaling should be applied here because the matrix was scaled in the cola analysis
mat2 = t(scale(t(mat)))
cl = predict_classes(res, k = 3, mat2)
# compare the real classification and the predicted classification
data.frame(cola_class = get_classes(res, k = 3)[, "class"], predicted = cl[, "class"])
# change to correlation method
cl = predict_classes(res, k = 3, mat2, dist_method = "correlation")
# compare the real classification and the predicted classification
data.frame(cola_class = get_classes(res, k = 3)[, "class"], predicted = cl[, "class"])
Method dispatch page for predict_classes.
predict_classes can be dispatched on the following classes:
predict_classes,matrix-method, matrix-class class method
predict_classes,ConsensusPartition-method, ConsensusPartition-class class method
Predict classes for new samples based on signature centroid matrix
## S4 method for signature 'matrix'
predict_classes(object, mat,
    dist_method = c("euclidean", "correlation", "cosine"), nperm = 1000,
    p_cutoff = 0.05, plot = TRUE, col_fun = NULL, split_by_sigatures = FALSE,
    verbose = TRUE, prefix = "", mc.cores = 1, cores = mc.cores,
    width1 = NULL, width2 = NULL)
object |
The signature centroid matrix. See the Details section. |
mat |
The new matrix where the classes are going to be predicted. The number of rows should be the same as the signature centroid matrix (also make sure the row orders are the same). Be careful that |
dist_method |
Distance method. Value should be "euclidean", "correlation" or "cosine". |
nperm |
Number of permutations. It is used when |
p_cutoff |
Cutoff for the p-values for determining class assignment. |
plot |
Whether to draw the plot that visualizes the process of prediction. |
col_fun |
A color mapping function generated from |
verbose |
Whether to print messages. |
split_by_sigatures |
Should the heatmaps be split based on k-means on the main heatmap, or on the patterns of the signature heatmap. |
prefix |
Used internally. |
mc.cores |
Number of cores. This argument will be removed in future versions. |
cores |
Number of cores, or a |
width1 |
Width of the first heatmap. |
width2 |
Width of the second heatmap. |
The signature centroid matrix is a k-column matrix where each column is the centroid of samples in the corresponding class (k-group classification).
For each sample in the new matrix, the task is basically to test which signature centroid the current sample is the closest to. Three distance methods are available: the Euclidean distance, the cosine distance and the correlation (Spearman) distance.
For the Euclidean/cosine distance method, let x denote the vector that corresponds to sample i in the new matrix. To decide which class should be assigned to sample i, the distances between sample i and all k signature centroids are calculated and denoted as d_1, d_2, ..., d_k. The class with the smallest distance is assigned to sample i. The distances to the k centroids are sorted increasingly, and we design a statistic named "difference ratio", denoted as r and calculated as (|d_(1) - d_(2)|)/mean(d), which is the difference between the smallest distance and the second smallest distance, normalized by the mean distance. To test the statistical significance of r, we randomly permute rows of the signature centroid matrix and calculate r_rand. The random permutation is performed nperm times and the p-value is calculated as the proportion of r_rand being larger than r.
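The difference ratio and its permutation test can be sketched in base R for the Euclidean case. The helper `diff_ratio`, the toy centroid matrix and the sample vector are all made up for illustration; this is a sketch of the description above, not cola's own code.

```r
set.seed(123)
# Hypothetical helper: difference ratio r = |d_(1) - d_(2)| / mean(d) for the
# Euclidean distances from sample x to the columns of `centroid`.
diff_ratio = function(x, centroid) {
    d = sqrt(colSums((centroid - x)^2))
    ds = sort(d)
    abs(ds[1] - ds[2]) / mean(d)
}

# Toy signature centroid matrix (3 classes, 4 signatures) and a new sample
# that lies close to the second centroid.
centroid = cbind(c1 = c(2, 0, 0, 2), c2 = c(0, 2, 0, -2), c3 = c(0, 0, 2, 0))
x = c(0.1, 1.9, -0.1, -1.8)

d = sqrt(colSums((centroid - x)^2))
predicted = names(which.min(d))   # class with the smallest distance
r = diff_ratio(x, centroid)

# Null distribution of r: permute the rows of the centroid matrix.
nperm = 1000
r_rand = replicate(nperm,
    diff_ratio(x, centroid[sample(nrow(centroid)), , drop = FALSE]))
p_value = mean(r_rand > r)
```

With only four signature rows the number of distinct permutations is small, so the p-value here is coarse; real signature matrices have many more rows.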
For the correlation method, the distance is calculated as the Spearman correlation between sample i and signature centroid k. The label for the class with the maximal correlation value is assigned to sample i. The p-value is simply calculated by cor.test between sample i and centroid k.
If a sample is tested with a p-value higher than p_cutoff, the corresponding class label is set to NA.
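The correlation method and the p_cutoff rule can be sketched with base R's cor and cor.test. The centroid matrix and the sample are made-up toy values, and the code only illustrates the description above, it is not cola's implementation.

```r
# Toy signature centroid matrix (made-up values) and one new sample.
centroid = cbind(class1 = c(1, 2, 3, 4, 5, 6),
                 class2 = c(6, 5, 4, 3, 2, 1),
                 class3 = c(1, 3, 2, 5, 4, 6))
x = c(0.9, 2.2, 2.8, 4.1, 5.3, 5.9)
p_cutoff = 0.05

# Spearman correlation between the sample and every centroid.
rho = apply(centroid, 2, function(ce) cor(x, ce, method = "spearman"))
best = names(which.max(rho))

# p-value from cor.test for the best-matching centroid.
p = cor.test(x, centroid[, best], method = "spearman")$p.value

# The label is set to NA when the p-value is higher than the cutoff.
label = if (p > p_cutoff) NA else best
```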
A data frame with two columns: the class labels (the column names of the signature centroid matrix are treated as class labels) and the corresponding p-values.
data(golub_cola)
res = golub_cola["ATC:skmeans"]
mat = get_matrix(res)
# note scaling should be applied here because the matrix was scaled in the cola analysis
mat2 = t(scale(t(mat)))
tb = get_signatures(res, k = 3, plot = FALSE)
sig_mat = tb[, grepl("scaled_mean", colnames(tb))]
sig_mat = as.matrix(sig_mat)
colnames(sig_mat) = paste0("class", seq_len(ncol(sig_mat)))
# this is how the signature centroid matrix looks like:
head(sig_mat)
mat2 = mat2[tb$which_row, , drop = FALSE]
# now we predict the class for `mat2` based on `sig_mat`
predict_classes(sig_mat, mat2)
Print the hc_table_suggest_best_k object
## S3 method for class 'hc_table_suggest_best_k'
print(x, ...)
x |
A |
... |
Other arguments. |
Recalculate statistics in the ConsensusPartitionList object
recalc_stats(rl)
rl |
A |
It updates the stat slot in the ConsensusPartitionList object, used internally.
Register NMF partitioning method
register_NMF()
NMF analysis is performed by nmf.
Register user-defined partitioning methods
register_partition_methods(..., scale_method = c("z-score", "min-max", "none"))
... |
A named list of functions. |
scale_method |
Normally, data matrix is scaled by rows before sent to the partition function. The default scaling is applied by |
The user-defined function should accept at least two arguments. The first two arguments are the data matrix and the number of subgroups. The third optional argument should always be ... so that parameters for the partition function can be passed by partition_param from consensus_partition. If users forget to add ..., it is added internally.
The function should return a vector of partitions (or class labels) or an object which can be recognized by cl_membership.
The partition function should be applied on columns (users should be careful with this because some R functions apply on rows and some apply on columns). E.g. the following is how we register the kmeans partition method:
register_partition_methods(
    kmeans = function(mat, k, ...) {
        # mat is transposed because kmeans() applies on rows
        # $cluster is the vector of class labels, one per column of mat
        kmeans(t(mat), centers = k, ...)$cluster
    }
)
The registered partitioning methods will be used as defaults in run_all_consensus_partition_methods.
To remove a partitioning method, use remove_partition_methods.
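As a sketch of the contract described above, a partition function takes a matrix and k and returns one class label per column. The function `hclust_cor` below is a hypothetical example (it is not one of cola's default methods) that clusters columns hierarchically with 1 - Pearson correlation as the distance; the toy matrix is made up.

```r
# Hypothetical partition function: hierarchical clustering of columns with
# 1 - Pearson correlation as the distance.
hclust_cor = function(mat, k, ...) {
    d = as.dist(1 - cor(mat))          # cor() already works on columns
    cutree(hclust(d, ...), k = k)      # one class label per column
}

set.seed(1)
# Toy matrix: 20 rows, 10 columns in two groups with opposite row patterns.
pattern = rnorm(20)
mat = cbind(matrix(pattern, 20, 5) + rnorm(100, sd = 0.1),
            matrix(-pattern, 20, 5) + rnorm(100, sd = 0.1))

cl = hclust_cor(mat, k = 2)
table(cl)
```

Such a function could then be passed by name to register_partition_methods, e.g. as hclust_cor = hclust_cor.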
There are the following default partitioning methods:
hierarchical clustering with Euclidean distance, later columns are partitioned by cutree. If users want to use another distance metric or clustering method, consider registering a new partitioning method. E.g. register_partition_methods(hclust_cor = function(mat, k) cutree(hclust(as.dist(1 - cor(mat))), k)).
by kmeans.
by skmeans.
by pam.
by Mclust. mclust is applied to the first three principal components from rows.
Users can register two other pre-defined partitioning methods by register_NMF and register_SOM.
No value is returned.
Zuguang Gu <[email protected]>
all_partition_methods lists all registered partitioning methods.
all_partition_methods()
register_partition_methods(
    random = function(mat, k) sample(k, ncol(mat), replace = TRUE)
)
all_partition_methods()
remove_partition_methods("random")
Register SOM partitioning method
register_SOM()
The SOM analysis is performed by som.
Register user-defined top-value methods
register_top_value_methods(..., validate = TRUE)
... |
A named list of functions. |
validate |
Whether to validate the functions. |
The user-defined function should accept one argument which is the data matrix where the scores are calculated by rows. Rows with top scores are treated as "top rows" in cola analysis. Following is how we register "SD" (standard deviation) top-value method:
register_top_value_methods(SD = function(mat) apply(mat, 1, sd))
Of course, you can use rowSds to give a faster calculation of row SD:
register_top_value_methods(SD = rowSds)
The registered top-value methods will be used as defaults in run_all_consensus_partition_methods.
To remove a top-value method, use remove_top_value_methods.
There are four default top-value methods:
standard deviation, by rowSds.
coefficient of variation, calculated as sd/(mean+s) where s is the 10th percentile of all row means.
median absolute deviation, by rowMads.
the ATC method.
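The coefficient-of-variation score described above can be written as a one-argument function of the kind a top-value method expects. The function `cv_score` is a hypothetical re-implementation from the formula sd/(mean+s), not cola's own code, and the input matrix is made-up toy data.

```r
# Hypothetical re-implementation of the score described above:
# sd / (mean + s), where s is the 10th percentile of all row means.
cv_score = function(mat) {
    m = rowMeans(mat)
    s = quantile(m, 0.1, names = FALSE)
    apply(mat, 1, sd) / (m + s)
}

set.seed(7)
mat = matrix(rexp(200, rate = 0.5), nrow = 20)    # toy non-negative matrix
score = cv_score(mat)

# Rows with the highest scores are the "top rows".
top_rows = order(score, decreasing = TRUE)[1:5]
```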
No value is returned.
Zuguang Gu <[email protected]>
all_top_value_methods lists all registered top-value methods.
all_top_value_methods()
register_top_value_methods(
    ATC_spearman = function(mat) ATC(mat, method = "spearman")
)
all_top_value_methods()
remove_top_value_methods("ATC_spearman")
Relabel class labels according to the reference labels
relabel_class(class, ref, full_set = union(class, ref), return_map = TRUE)
class |
A vector of class labels. |
ref |
A vector of reference labels. |
full_set |
The full set of labels. |
return_map |
Whether to return the mapping of the adjusted labels. |
In partitions, the exact value of the class label is not of importance. E.g. the two partitions a, a, a, b, b, b, b and b, b, b, a, a, a, a are the same partition although the labels a and b are switched. Even the partition c, c, c, d, d, d, d is the same as the previous two although it uses a different set of labels. Here the relabel_class function relabels the class vector according to the labels in the ref vector by looking for a mapping m() that maximizes sum(m(class) == ref).
Mathematically, this is called the linear sum assignment problem and it is solved by solve_LSAP.
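To make the objective concrete, the sketch below finds the mapping m() by brute force over all permutations of a small label set. This is only an illustration of what is maximized: `permutations` and `relabel_sketch` are hypothetical helpers, and the brute-force search is swapped in here in place of the solve_LSAP approach that the function itself uses.

```r
# Hypothetical helper: enumerate all permutations of a small label set.
permutations = function(x) {
    if (length(x) <= 1) return(list(x))
    out = list()
    for (i in seq_along(x)) {
        for (p in permutations(x[-i])) out[[length(out) + 1]] = c(x[i], p)
    }
    out
}

# Hypothetical brute-force relabelling: pick the mapping m() that
# maximizes sum(m(class) == ref).
relabel_sketch = function(class, ref) {
    labels = sort(union(class, ref))
    best = NULL
    best_score = -1
    for (p in permutations(labels)) {
        m = structure(p, names = labels)     # candidate mapping m()
        score = sum(m[class] == ref)
        if (score > best_score) {
            best = m
            best_score = score
        }
    }
    best
}

class = c(rep("a", 10), rep("b", 3))
ref = c(rep("b", 4), rep("a", 9))
map = relabel_sketch(class, ref)
map
sum(map[class] == ref)    # number of agreements after relabelling
```

The enumeration grows factorially with the number of labels, which is why an assignment solver is the practical choice.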
A named vector where names correspond to the labels in class and values correspond to ref, which means map = relabel_class(class, ref); map[class] returns the relabelled labels.
The returned object attaches a data frame with three columns:
original labels, in class
adjusted labels, according to ref
reference labels, in ref
If return_map in relabel_class is set to FALSE, the function simply returns a vector of adjusted class labels.
If the function returns the mapping vector (when return_map = TRUE), the mapping variable is always character, which means, if your class and ref are numeric, you need to convert them back to numeric explicitly. If return_map = FALSE, the returned relabelled vector has the same mode as class.
class = c(rep("a", 10), rep("b", 3))
ref = c(rep("b", 4), rep("a", 9))
relabel_class(class, ref)
relabel_class(class, ref, return_map = FALSE)
# if class and ref are from completely different sets
class = c(rep("A", 10), rep("B", 3))
relabel_class(class, ref)
# class labels are numeric
class = c(rep(1, 10), rep(2, 3))
ref = c(rep(2, 4), rep(1, 9))
relabel_class(class, ref)
relabel_class(class, ref, return_map = FALSE)
Remove partitioning methods
remove_partition_methods(method)
method |
Name of the partitioning methods to be removed. |
No value is returned.
Zuguang Gu <[email protected]>
# There is no example NULL
Remove top-value methods
remove_top_value_methods(method)
method |
Name of the top-value methods to be removed. |
No value is returned.
Zuguang Gu <[email protected]>
# There is no example NULL
Row names of the matrix
## S4 method for signature 'ConsensusPartition' rownames(x)
x |
A |
# There is no example NULL
Row names of the matrix
## S4 method for signature 'ConsensusPartitionList' rownames(x)
x |
A |
# There is no example NULL
Method dispatch page for rownames.
rownames can be dispatched on the following classes:
rownames,HierarchicalPartition-method, HierarchicalPartition-class class method
rownames,ConsensusPartition-method, ConsensusPartition-class class method
rownames,ConsensusPartitionList-method, ConsensusPartitionList-class class method
# no example NULL
Row names of the matrix
## S4 method for signature 'HierarchicalPartition' rownames(x)
x |
A |
# There is no example NULL
Consensus partitioning for all combinations of methods
run_all_consensus_partition_methods(data, top_value_method = all_top_value_methods(), partition_method = all_partition_methods(), max_k = 6, k = NULL, top_n = NULL, mc.cores = 1, cores = mc.cores, anno = NULL, anno_col = NULL, sample_by = "row", p_sampling = 0.8, partition_repeat = 50, scale_rows = NULL, verbose = TRUE, help = cola_opt$help)
data |
A numeric matrix where subgroups are found by columns. |
top_value_method |
Methods used to extract the top n rows. Allowed methods are in |
partition_method |
Methods used to partition samples. Allowed methods are in |
max_k |
Maximal number of subgroups to try. The function will try |
k |
Alternatively, you can specify a vector k. |
top_n |
Number of rows with top values. The value can be a vector with length > 1. When n > 5000, the function randomly samples only 5000 rows from the top n rows. If |
mc.cores |
Number of cores to use. This argument will be removed in future versions. |
cores |
Number of cores, or a |
anno |
A data frame with known annotation of columns. |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
sample_by |
Whether to randomly sample the matrix by rows or by columns. |
p_sampling |
Proportion of the top n rows to sample. |
partition_repeat |
Number of repeats for the random sampling. |
scale_rows |
Whether to scale rows. If it is |
verbose |
Whether to print messages. |
help |
Whether to print help messages. |
The function performs consensus partitioning by consensus_partition for all combinations of top-value methods and partitioning methods.
It also adjusts the subgroup labels for all methods and for all k to make them as consistent as possible.
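The resampling idea behind consensus partitioning can be sketched in base R. This is a simplified illustration under stated assumptions, not what consensus_partition does internally: it uses a single partitioning method (kmeans), a fixed k, row sampling only, and toy data; the variable names co and consensus are hypothetical. It reuses the argument names p_sampling and partition_repeat from the usage above.

```r
set.seed(123)
# toy data: 20 rows, two groups of 10 columns (samples) with shifted means
m = cbind(matrix(rnorm(20 * 10, mean = 2), nrow = 20),
          matrix(rnorm(20 * 10, mean = -2), nrow = 20))

k = 2
p_sampling = 0.8
partition_repeat = 50
n = ncol(m)

co = matrix(0, n, n)   # counts how often two samples fall in the same subgroup
for (r in seq_len(partition_repeat)) {
    # randomly sample a proportion of the rows, then partition the columns
    rows = sample(nrow(m), round(p_sampling * nrow(m)))
    cl = kmeans(t(m[rows, , drop = FALSE]), centers = k, nstart = 5)$cluster
    co = co + outer(cl, cl, "==")
}

# consensus matrix: fraction of repeats in which two samples co-cluster
consensus = co / partition_repeat
```

Values close to 0 or 1 in consensus indicate that a pair of samples is partitioned consistently across the repeats.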
A ConsensusPartitionList-class object. Type the object name in an interactive R session to see which functions can be applied to it.
Zuguang Gu <[email protected]>
## Not run:
set.seed(123)
m = cbind(rbind(matrix(rnorm(20*20, mean = 1), nr = 20),
                matrix(rnorm(20*20, mean = -1), nr = 20)),
          rbind(matrix(rnorm(20*20, mean = -1), nr = 20),
                matrix(rnorm(20*20, mean = 1), nr = 20))
         ) + matrix(rnorm(40*40), nr = 40)
rl = run_all_consensus_partition_methods(data = m, top_n = c(20, 30, 40))
## End(Not run)
Several plots for determining the optimized number of subgroups
## S4 method for signature 'ConsensusPartition' select_partition_number(object, mark_best = TRUE, all_stats = FALSE)
object |
A |
mark_best |
Whether to mark the best k in the plot. |
all_stats |
Whether to show all statistics that were calculated. Used internally. |
The following plots are made:
eCDF of the consensus matrix under each k, made by plot_ecdf,ConsensusPartition-method,
PAC score,
mean silhouette score,
the concordance for each partition to the consensus partition,
area increase of the area under the eCDF of the consensus matrix with increasing k,
Rand index for current k compared to k - 1,
Jaccard coefficient for current k compared to k - 1.
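As a rough illustration of one of these statistics, the PAC (proportion of ambiguous clustering) score can be sketched as the fraction of consensus values that are neither close to 0 nor close to 1. The helper pac_score and the thresholds 0.1 and 0.9 are assumptions made for this sketch; the package's own calculation may differ in detail.

```r
# Hypothetical helper: fraction of pairwise consensus values that lie
# strictly between `lower` and `upper` (the "ambiguous" range).
pac_score = function(consensus, lower = 0.1, upper = 0.9) {
    v = consensus[lower.tri(consensus)]   # each sample pair counted once
    mean(v > lower & v < upper)
}

# a perfectly stable partition: samples 1 and 2 always together, 3 always apart
stable = matrix(c(1, 1, 0,
                  1, 1, 0,
                  0, 0, 1), nrow = 3)
# a fully ambiguous partition: every pair co-clusters half of the time
ambiguous = matrix(c(1,   0.5, 0.5,
                     0.5, 1,   0.5,
                     0.5, 0.5, 1), nrow = 3)

pac_stable = pac_score(stable)
pac_ambiguous = pac_score(ambiguous)
```

A lower PAC means a more stable partition, which is why 1-PAC is used as the stability score elsewhere in this manual.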
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola)
select_partition_number(golub_cola["ATC", "skmeans"])
Print the ConsensusPartition object
## S4 method for signature 'ConsensusPartition' show(object)
object |
A |
No value is returned.
Zuguang Gu <[email protected]>
# There is no example NULL
Print the ConsensusPartitionList object
## S4 method for signature 'ConsensusPartitionList' show(object)
object |
A |
No value is returned.
Zuguang Gu <[email protected]>
# There is no example NULL
Method dispatch page for show.
show can be dispatched on the following classes:
show,HierarchicalPartition-method, HierarchicalPartition-class class method
show,ConsensusPartition-method, ConsensusPartition-class class method
show,ConsensusPartitionList-method, ConsensusPartitionList-class class method
show,DownSamplingConsensusPartition-method, DownSamplingConsensusPartition-class class method
# no example NULL
Print the DownSamplingConsensusPartition object
## S4 method for signature 'DownSamplingConsensusPartition' show(object)
object |
A |
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola_ds)
golub_cola_ds
Print the HierarchicalPartition object
## S4 method for signature 'HierarchicalPartition' show(object)
object |
a |
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola_rh)
golub_cola_rh
Split node
## S4 method for signature 'HierarchicalPartition' split_node(object, node_id, subset = object@param$subset, min_samples = object@param$min_samples, max_k = object@param$max_k, cores = object@param$cores, verbose = TRUE, top_n = object@param$top_n, min_n_signatures = object@param$min_n_signatures, group_diff = object@param$group_diff, fdr_cutoff = object@param$fdr_cutoff)
object |
A |
node_id |
A single ID of a node that is going to be split. |
subset |
The same as in |
min_samples |
The same as in |
max_k |
The same as in |
cores |
Number of cores. |
verbose |
Whether to print messages. |
top_n |
The same as in |
min_n_signatures |
The same as in |
group_diff |
The same as in |
fdr_cutoff |
The same as in |
It applies hierarchical consensus partitioning on the specified node.
A HierarchicalPartition-class
object.
# There is no example NULL
Suggest the best number of subgroups
## S4 method for signature 'ConsensusPartition' suggest_best_k(object, jaccard_index_cutoff = select_jaccard_cutoff(ncol(object)), mean_silhouette_cutoff = NULL, stable_PAC = 0.1, help = cola_opt$help)
object |
A |
jaccard_index_cutoff |
The cutoff for Jaccard index for comparing to previous k. |
mean_silhouette_cutoff |
Cutoff for mean silhouette scores. |
stable_PAC |
Cutoff for stable PAC. This argument only takes effect when |
help |
Whether to print help message. |
The best k is selected according to the following rules:
All k with a Jaccard index larger than jaccard_index_cutoff are removed because increasing k does not provide enough extra information. If all k are removed, it is marked as no subgroup being detected.
If all k have a Jaccard index larger than 0.75, the k with the highest mean silhouette score is taken as the best k.
For all k with a mean silhouette score larger than mean_silhouette_cutoff, the maximal k is taken as the best k, and the other k are marked as optional best k.
If the argument mean_silhouette_cutoff is set to NULL, k are not filtered by mean silhouette scores but by 1-PAC scores instead. Similarly, the k with the highest 1-PAC is taken as the best k and the other k are marked as optional best k.
If the second rule does not apply, the k with the most votes among the highest 1-PAC score, the highest mean silhouette score, and the highest concordance is taken as the best k.
Note that it is difficult to find the best k deterministically. We encourage users to compare the results for all k and choose the one that best explains their study.
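The voting step in the last rule can be sketched as follows. The statistics table is made up for illustration, and the column names are hypothetical; the sketch only shows the vote among the three criteria, not the preceding filtering rules.

```r
# Hypothetical per-k statistics (not real cola output)
stats = data.frame(
    k = 2:4,
    one_minus_PAC   = c(0.85, 0.80, 0.60),
    mean_silhouette = c(0.90, 0.92, 0.70),
    concordance     = c(0.95, 0.90, 0.80)
)

# each criterion votes for the k where it is highest
winners = c(
    stats$k[which.max(stats$one_minus_PAC)],
    stats$k[which.max(stats$mean_silhouette)],
    stats$k[which.max(stats$concordance)]
)

votes = table(winners)
best_k = as.numeric(names(votes)[which.max(votes)])
```

In this toy table, 1-PAC and concordance both vote for k = 2 while the mean silhouette votes for k = 3, so k = 2 wins with two votes.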
The best k.
The selection of the best k can be visualized by select_partition_number
.
Zuguang Gu <[email protected]>
data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
suggest_best_k(obj)
Suggest the best number of subgroups
## S4 method for signature 'ConsensusPartitionList' suggest_best_k(object, jaccard_index_cutoff = select_jaccard_cutoff(ncol(object)))
object |
A |
jaccard_index_cutoff |
The cutoff for Jaccard index for comparing to previous k. |
It gives the best k for each combination of top-value method and partitioning method by calling suggest_best_k,ConsensusPartition-method.
A 1-PAC score higher than 0.95 is treated as a very stable partition (marked by **) and a score higher than 0.9 is treated as a stable partition (marked by *).
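The marking convention can be written as a small helper. The function name stability_mark is hypothetical and only mirrors the two thresholds stated above.

```r
# Hypothetical helper: map 1-PAC scores to the stability marks described above
stability_mark = function(one_minus_pac) {
    ifelse(one_minus_pac > 0.95, "**",
           ifelse(one_minus_pac > 0.9, "*", ""))
}

marks = stability_mark(c(0.99, 0.93, 0.50))
```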
A data frame with the best k and other statistics for each combination of methods.
Zuguang Gu <[email protected]>
data(golub_cola)
suggest_best_k(golub_cola)
Method dispatch page for suggest_best_k.
suggest_best_k can be dispatched on the following classes:
suggest_best_k,HierarchicalPartition-method, HierarchicalPartition-class class method
suggest_best_k,ConsensusPartitionList-method, ConsensusPartitionList-class class method
suggest_best_k,ConsensusPartition-method, ConsensusPartition-class class method
# no example NULL
Guess the best number of partitions
## S4 method for signature 'HierarchicalPartition' suggest_best_k(object)
object |
A |
It basically gives the best k at each node.
A data frame with the best k and other statistics for each node.
Zuguang Gu <[email protected]>
data(golub_cola_rh)
suggest_best_k(golub_cola_rh)
Test whether a list of factors are correlated
test_between_factors(x, y = NULL, all_factors = FALSE, verbose = FALSE)
x |
A data frame or a vector which contains discrete or continuous variables. if |
y |
A data frame or a vector which contains discrete or continuous variables. |
all_factors |
Are all columns in |
verbose |
Whether to print messages. |
A pairwise test is applied to every two columns in the data frames. The methods are:
two numeric variables: a correlation test by cor.test is applied (Spearman method);
two character or factor variables: chisq.test is applied;
one numeric variable and one character/factor variable: a one-way ANOVA test by oneway.test is applied.
This function can be used to test the correlation between the predicted classes and other known factors.
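The choice of test by variable type can be sketched in base R. The helper pairwise_p is hypothetical and only mirrors the three cases listed above; how test_between_factors handles edge cases (missing values, too few observations, the diagonal of the result) is not shown here, and the diagonal is simply left as NA.

```r
# Hypothetical helper: pick the test according to the types of x and y
pairwise_p = function(x, y) {
    x_num = is.numeric(x)
    y_num = is.numeric(y)
    if (x_num && y_num) {
        # two numeric variables: Spearman correlation test
        cor.test(x, y, method = "spearman")$p.value
    } else if (!x_num && !y_num) {
        # two discrete variables: chi-squared test on the contingency table
        suppressWarnings(chisq.test(table(x, y))$p.value)
    } else if (x_num) {
        # numeric vs. discrete: one-way ANOVA
        oneway.test(x ~ factor(y))$p.value
    } else {
        oneway.test(y ~ factor(x))$p.value
    }
}

set.seed(123)
df = data.frame(
    v1 = rnorm(100),
    v2 = sample(letters[1:3], 100, replace = TRUE),
    v3 = sample(LETTERS[5:6], 100, replace = TRUE)
)

# matrix of pairwise p-values; the diagonal is left as NA in this sketch
p = matrix(NA_real_, ncol(df), ncol(df), dimnames = list(names(df), names(df)))
for (i in seq_len(ncol(df))) {
    for (j in seq_len(ncol(df))) {
        if (i != j) p[i, j] = pairwise_p(df[[i]], df[[j]])
    }
}
```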
A matrix of p-values. NA values mean that there were not enough data points to perform the test.
Zuguang Gu <[email protected]>
df = data.frame(
    v1 = rnorm(100),
    v2 = sample(letters[1:3], 100, replace = TRUE),
    v3 = sample(LETTERS[5:6], 100, replace = TRUE)
)
test_between_factors(df)
x = runif(100)
test_between_factors(x, df)
Test correspondence between predicted subgroups and known factors
## S4 method for signature 'ConsensusPartition' test_to_known_factors(object, k, known = get_anno(object), silhouette_cutoff = 0.5, verbose = FALSE)
object |
A |
k |
Number of subgroups. It uses all |
known |
A vector or a data frame with known factors. By default it is the annotation table set in |
silhouette_cutoff |
Cutoff for silhouette scores. Samples with values less than it are omitted. |
verbose |
Whether to print messages. |
The test is performed by test_between_factors between the predicted classes and the user's annotation table.
A data frame with the following columns:
number of samples used in the test after filtering by silhouette_cutoff,
p-values from the tests,
number of subgroups.
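The filter-then-test logic can be sketched with made-up vectors. The class labels, silhouette scores, and annotation values below are hypothetical and only illustrate how samples under silhouette_cutoff are dropped before the test; the real method takes them from the ConsensusPartition object.

```r
set.seed(1)
# hypothetical predicted classes, silhouette scores and a known annotation
predicted = rep(c("1", "2"), each = 10)
silhouette = c(runif(8, 0.6, 1), 0.2, 0.3, runif(10, 0.6, 1))
known = rep(c("ALL", "AML"), each = 10)

silhouette_cutoff = 0.5
keep = silhouette >= silhouette_cutoff   # samples below the cutoff are omitted
n_sample = sum(keep)

# two discrete variables, so a chi-squared test is used
p_value = suppressWarnings(
    chisq.test(table(predicted[keep], known[keep]))$p.value
)
```

Two of the twenty samples fall below the cutoff, so the test is run on the remaining 18.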
Zuguang Gu <[email protected]>
data(golub_cola)
res = golub_cola["ATC:skmeans"]
anno = get_anno(res)
anno
test_to_known_factors(res, k = 3)
# or explicitly specify known argument
test_to_known_factors(res, k = 3, known = anno)
Test correspondence between predicted classes and known factors
## S4 method for signature 'ConsensusPartitionList' test_to_known_factors(object, k, known = get_anno(object), silhouette_cutoff = 0.5, verbose = FALSE)
object |
A |
k |
Number of subgroups. It uses all |
known |
A vector or a data frame with known factors. By default it is the annotation table set in |
silhouette_cutoff |
Cutoff for silhouette scores. Samples with values less than this are omitted. |
verbose |
Whether to print messages. |
The function sends each ConsensusPartition-class object to test_to_known_factors,ConsensusPartition-method and merges the results afterwards.
A data frame with the following columns:
number of samples used in the test after filtering by silhouette_cutoff,
p-values from the tests,
number of subgroups.
NA values mean that there were not enough data points to perform the test.
Zuguang Gu <[email protected]>
test_between_factors
, test_to_known_factors,ConsensusPartition-method
data(golub_cola)
test_to_known_factors(golub_cola)
Method dispatch page for test_to_known_factors.
test_to_known_factors can be dispatched on the following classes:
test_to_known_factors,HierarchicalPartition-method, HierarchicalPartition-class class method
test_to_known_factors,ConsensusPartition-method, ConsensusPartition-class class method
test_to_known_factors,ConsensusPartitionList-method, ConsensusPartitionList-class class method
test_to_known_factors,DownSamplingConsensusPartition-method, DownSamplingConsensusPartition-class class method
# no example NULL
Test correspondence between predicted subgroups and known factors
## S4 method for signature 'DownSamplingConsensusPartition' test_to_known_factors(object, k, known = get_anno(object), p_cutoff = 0.05, verbose = FALSE)
object |
A |
k |
Number of subgroups. It uses all |
known |
A vector or a data frame with known factors. By default it is the annotation table set in |
p_cutoff |
Cutoff for p-values for the class prediction. Samples with p-values higher than it are omitted. |
verbose |
Whether to print messages. |
The test is performed by test_between_factors between the predicted classes and the user's annotation table.
A data frame with the following columns:
number of samples used in the test after filtering by p_cutoff,
p-values from the tests,
number of subgroups.
Zuguang Gu <[email protected]>
data(golub_cola_ds)
test_to_known_factors(golub_cola_ds, k = 3)
test_to_known_factors(golub_cola_ds)
Test correspondence between predicted classes and known factors
## S4 method for signature 'HierarchicalPartition' test_to_known_factors(object, known = get_anno(object[1]), merge_node = merge_node_param(), verbose = FALSE)
object |
A |
merge_node |
Parameters to merge sub-dendrograms, see |
known |
A vector or a data frame with known factors. By default it is the annotation table set in |
verbose |
Whether to print messages. |
A data frame with columns:
number of samples
p-values from the tests
number of classes
The classifications are extracted for each depth.
Zuguang Gu <[email protected]>
data(golub_cola_rh)
# golub_cola_rh already has known annotations, so test_to_known_factors()
# can be directly applied
test_to_known_factors(golub_cola_rh)
Overlap of top elements from different metrics
top_elements_overlap(object, top_n = round(0.25*length(object[[1]])), method = c("euler", "upset", "venn", "correspondance"), fill = NULL, ...)
object |
A list which contains values from different metrics. |
top_n |
Number of top rows. |
method |
|
fill |
Fill color for the Euler diagram. The value should be a color vector. A transparency of 0.5 is added internally. |
... |
Additional arguments passed to |
The i-th value of every vector in object should correspond to the same element from the original data.
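Because the vectors are aligned by position, the overlap can be computed on indices. The following base R sketch shows the idea without drawing any diagram; it uses apply() instead of matrixStats so that it has no extra dependencies, and the variable names top_idx and shared are hypothetical.

```r
set.seed(123)
mat = matrix(rnorm(1000), nrow = 100)

# one score per row for each metric; position i refers to row i of mat
lt = list(sd = apply(mat, 1, sd), mad = apply(mat, 1, mad))

top_n = 20
# indices of the top_n highest-scoring rows for each metric
top_idx = lapply(lt, function(v) order(v, decreasing = TRUE)[seq_len(top_n)])

# rows that are in the top list of every metric
shared = Reduce(intersect, top_idx)
n_shared = length(shared)
```

The sets in top_idx are what an Euler, UpSet, or Venn diagram would then visualize.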
No value is returned.
Zuguang Gu <[email protected]>
require(matrixStats)
set.seed(123)
mat = matrix(rnorm(1000), nrow = 100)
lt = list(sd = rowSds(mat), mad = rowMads(mat))
top_elements_overlap(lt, top_n = 20, method = "euler")
top_elements_overlap(lt, top_n = 20, method = "upset")
top_elements_overlap(lt, top_n = 20, method = "venn")
top_elements_overlap(lt, top_n = 20, method = "correspondance")
Heatmap of top rows
## S4 method for signature 'ConsensusPartition' top_rows_heatmap(object, top_n = min(object@top_n), k = NULL, anno = get_anno(object), anno_col = get_anno_col(object), scale_rows = object@scale_rows, ...)
object |
A |
top_n |
Number of top rows. |
k |
Number of subgroups. If it is not specified, it uses the "best k". |
anno |
A data frame of annotations. |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
scale_rows |
Whether to scale rows. |
... |
Pass to |
No value is returned.
Zuguang Gu <[email protected]>
top_rows_heatmap,matrix-method
data(golub_cola)
top_rows_heatmap(golub_cola["ATC:skmeans"])
Heatmap of top rows from different top-value methods
## S4 method for signature 'ConsensusPartitionList' top_rows_heatmap(object, top_n = min(object@list[[1]]@top_n), anno = get_anno(object), anno_col = get_anno_col(object), scale_rows = object@list[[1]]@scale_rows, ...)
object |
A |
top_n |
Number of top rows. |
anno |
A data frame of annotations for the original matrix columns. By default it uses the annotations specified in |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
scale_rows |
Whether to scale rows. |
... |
Pass to |
No value is returned.
Zuguang Gu <[email protected]>
top_rows_heatmap,matrix-method
data(golub_cola)
top_rows_heatmap(golub_cola)
Method dispatch page for top_rows_heatmap.
top_rows_heatmap can be dispatched on the following classes:
top_rows_heatmap,ConsensusPartition-method, ConsensusPartition-class class method
top_rows_heatmap,ConsensusPartitionList-method, ConsensusPartitionList-class class method
top_rows_heatmap,HierarchicalPartition-method, HierarchicalPartition-class class method
top_rows_heatmap,matrix-method, matrix-class class method
# no example NULL
Heatmap of top rows from different top-value methods
## S4 method for signature 'HierarchicalPartition' top_rows_heatmap(object, top_n = min(object@list[[1]]@top_n), anno = get_anno(object), anno_col = get_anno_col(object), scale_rows = object@list[[1]]@scale_rows, ...)
object |
A |
top_n |
Number of top rows. |
anno |
A data frame of annotations for the original matrix columns. By default it uses the annotations specified in |
anno_col |
A list of colors (color is defined as a named vector) for the annotations. If |
scale_rows |
Whether to scale rows. |
... |
Pass to |
No value is returned.
Zuguang Gu <[email protected]>
top_rows_heatmap,matrix-method
# There is no example NULL
Heatmap of top rows from different top-value methods
## S4 method for signature 'matrix' top_rows_heatmap(object, all_top_value_list = NULL, top_value_method = all_top_value_methods(), bottom_annotation = NULL, top_n = round(0.25*nrow(object)), scale_rows = TRUE, ...)
object |
A numeric matrix. |
all_top_value_list |
Top-values that have already been calculated from the matrix. If it is |
top_value_method |
Methods defined in |
bottom_annotation |
A |
top_n |
Number of top rows to show in the heatmap. |
scale_rows |
Whether to scale rows. |
... |
Pass to |
The function makes heatmaps where the rows are scaled (or not scaled) for the top n rows from different top-value methods.
The top n rows are used for subgroup classification in cola analysis, so the heatmaps show which top-value method gives better candidate rows for the classification.
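What goes into one such heatmap can be sketched in base R. The sketch assumes that the top-value method is the row standard deviation and that "scaled" means a per-row z-score; both are assumptions made for illustration, and the scaling the package actually applies may differ.

```r
set.seed(123)
mat = matrix(rnorm(1000), nrow = 100)
top_n = 25

# top-value score per row (here: standard deviation, as an example metric)
score = apply(mat, 1, sd)
top = order(score, decreasing = TRUE)[seq_len(top_n)]

# z-score scaling of each selected row (assumed meaning of scale_rows = TRUE)
scaled = t(scale(t(mat[top, , drop = FALSE])))
```

The matrix scaled holds the 25 selected rows, each with mean 0 and standard deviation 1, which is the kind of input a heatmap of top rows would display.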
No value is returned.
Zuguang Gu <[email protected]>
set.seed(123)
mat = matrix(rnorm(1000), nrow = 100)
top_rows_heatmap(mat, top_n = 25)
Overlap of top rows from different top-value methods
## S4 method for signature 'ConsensusPartitionList' top_rows_overlap(object, top_n = min(object@list[[1]]@top_n), method = c("euler", "upset", "venn", "correspondance"), fill = NULL, ...)
object |
A |
top_n |
Number of top rows. |
method |
|
fill |
Fill color for the Euler diagram. The value should be a color vector. A transparency of 0.5 is added internally. |
... |
Additional arguments passed to |
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola)
top_rows_overlap(golub_cola, method = "euler")
top_rows_overlap(golub_cola, method = "upset")
top_rows_overlap(golub_cola, method = "venn")
top_rows_overlap(golub_cola, method = "correspondance")
Method dispatch page for top_rows_overlap.
top_rows_overlap can be dispatched on the following classes:
top_rows_overlap,HierarchicalPartition-method, HierarchicalPartition-class class method
top_rows_overlap,matrix-method, matrix-class class method
top_rows_overlap,ConsensusPartitionList-method, ConsensusPartitionList-class class method
# no example NULL
Overlap of top rows on different nodes
## S4 method for signature 'HierarchicalPartition' top_rows_overlap(object, method = c("euler", "upset", "venn"), fill = NULL, ...)
object |
A |
method |
|
fill |
Fill color for the Euler diagram. The value should be a color vector. A transparency of 0.5 is added internally. |
... |
Additional arguments passed to |
No value is returned.
Zuguang Gu <[email protected]>
data(golub_cola_rh)
top_rows_overlap(golub_cola_rh, method = "euler")
top_rows_overlap(golub_cola_rh, method = "upset")
top_rows_overlap(golub_cola_rh, method = "venn")
Overlap of top rows from different top-value methods
## S4 method for signature 'matrix' top_rows_overlap(object, top_value_method = all_top_value_methods(), top_n = round(0.25*nrow(object)), method = c("euler", "upset", "venn", "correspondance"), fill = NULL, ...)
object |
A numeric matrix. |
top_value_method |
Methods defined in |
top_n |
Number of top rows. |
method |
|
fill |
Fill color for the Euler diagram. The value should be a color vector. A transparency of 0.5 is added internally. |
... |
Additional arguments passed to |
It first calculates scores for every top-value method and then makes the plot with top_elements_overlap.
No value is returned.
Zuguang Gu <[email protected]>
set.seed(123)
mat = matrix(rnorm(1000), nrow = 100)
top_rows_overlap(mat, top_n = 25)