Title: | Compute Mantel Cluster Correlations |
---|---|
Description: | Computes Mantel cluster correlations from a (p x n) numeric data matrix (e.g. microarray gene-expression data). |
Authors: | Brian Steinmeyer and William Shannon |
Maintainer: | Brian Steinmeyer <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.77.0 |
Built: | 2024-12-29 05:35:47 UTC |
Source: | https://github.com/bioc/MantelCorr |
'ClusterGeneList' produces a list of both significant and nonsignificant genes from each respective cluster type
ClusterGeneList(clus, clustlist.sig, x.data)
clus |
'clusters' object returned by 'GetClusters' |
clustlist.sig |
'SignificantClusters' object returned by 'ClusterList' |
x.data |
original (p x n) numeric data matrix (e.g., gene-expression data) |
A list with components:
SignificantClusterGenes |
significant cluster genes returned from 'ClusterList' |
NonSignificantClusterGenes |
nonsignificant cluster genes returned from 'ClusterList' |
argument 'x.data' should have an ID gene variable, 'probes', attached as a 'dimnames' attribute
Brian Steinmeyer
'GetClusters' 'ClusterList'
library(MantelCorr)
# simulate a p x n microarray expression dataset, where p = genes and n = samples
data.sep <- rbind(matrix(rnorm(1000), ncol=50), matrix(rnorm(1000, mean=5), ncol=50))
noise <- matrix(runif(40000), ncol=1000)
data <- t(cbind(data.sep, noise))
data <- data[1:200, ]
# after subsetting, data has p = 200 genes and n = 40 samples
clusters.result <- GetClusters(data, 100, 100)
dist.matrices <- DistMatrices(data, clusters.result$clusters)
mantel.corrs <- MantelCorrs(dist.matrices$Dfull, dist.matrices$Dsubsets)
permutation.result <- PermutationTest(dist.matrices$Dfull, dist.matrices$Dsubsets, 100, 40, 0.05)
# generate both significant and non-significant gene clusters
cluster.list <- ClusterList(permutation.result, clusters.result$cluster.sizes, mantel.corrs)
# significant and non-significant cluster genes (expression values)
cluster.genes <- ClusterGeneList(clusters.result$clusters, cluster.list$SignificantClusters, data)
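As a rough illustration of what 'ClusterGeneList' is described as returning, the sketch below splits the rows of a data matrix according to whether each gene's cluster is in a set of significant cluster numbers. The helper 'genes_by_cluster', its argument 'sig.ids' and the toy data are hypothetical, and storing gene IDs as row names under a 'probes' dimnames entry is only one reading of the note above; this is not the package source.

```r
# Hypothetical helper (not the package's ClusterGeneList):
# assumes gene IDs are stored as row names of x.data.
genes_by_cluster <- function(clus, sig.ids, x.data) {
  # TRUE for genes whose cluster number is among the significant ones
  is.sig <- clus %in% sig.ids
  list(SignificantClusterGenes    = x.data[is.sig, , drop = FALSE],
       NonSignificantClusterGenes = x.data[!is.sig, , drop = FALSE])
}

# toy 6-gene x 2-sample matrix with made-up probe IDs
toy <- matrix(1:12, nrow = 6,
              dimnames = list(probes = paste0("g", 1:6), NULL))
out <- genes_by_cluster(clus = c(1, 1, 2, 3, 3, 2), sig.ids = c(1, 3), toy)
rownames(out$SignificantClusterGenes)      # "g1" "g2" "g4" "g5"
```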
'ClusterList' generates a list of both significant and nonsignificant clusters, with cluster number, Mantel cluster correlation and size
ClusterList(p.val, clus.size, mantel.cors)
p.val |
permutation p-value returned from 'PermutationTest' |
clus.size |
vector of k cluster sizes returned from 'GetClusters' |
mantel.cors |
original, unpermuted k Mantel correlations returned from 'MantelCorrs' |
A list with components:
SignificantClusters |
clusters with significant Mantel correlation, equal to or larger than the permutation p-value returned by 'PermutationTest' |
NonSignificantClusters |
clusters with nonsignificant Mantel correlation, smaller than the permutation p-value returned by 'PermutationTest' |
Brian Steinmeyer
'PermutationTest'
library(MantelCorr)
# simulate a p x n microarray expression dataset, where p = genes and n = samples
data.sep <- rbind(matrix(rnorm(1000), ncol=50), matrix(rnorm(1000, mean=5), ncol=50))
noise <- matrix(runif(40000), ncol=1000)
data <- t(cbind(data.sep, noise))
data <- data[1:200, ]
# after subsetting, data has p = 200 genes and n = 40 samples
clusters.result <- GetClusters(data, 100, 100)
dist.matrices <- DistMatrices(data, clusters.result$clusters)
mantel.corrs <- MantelCorrs(dist.matrices$Dfull, dist.matrices$Dsubsets)
permutation.result <- PermutationTest(dist.matrices$Dfull, dist.matrices$Dsubsets, 100, 40, 0.05)
# generate both significant and non-significant gene clusters
cluster.list <- ClusterList(permutation.result, clusters.result$cluster.sizes, mantel.corrs)
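The split described under the return value (correlation equal to or larger than the permutation value versus smaller) can be sketched as follows. The helper 'split_clusters' and its column names are hypothetical and chosen only for illustration; the actual layout of the objects returned by 'ClusterList' may differ.

```r
# Hypothetical helper (not the package's ClusterList); column names are illustrative.
split_clusters <- function(p.val, clus.size, mantel.cors) {
  tab <- data.frame(cluster    = seq_along(mantel.cors),
                    mantel.cor = unlist(mantel.cors),
                    size       = clus.size)
  # a cluster counts as significant when its correlation reaches the threshold
  keep <- tab$mantel.cor >= p.val
  list(SignificantClusters    = tab[keep, ],
       NonSignificantClusters = tab[!keep, ])
}

res <- split_clusters(p.val = 0.5,
                      clus.size = c(3, 7, 2, 8),
                      mantel.cors = list(0.9, 0.2, 0.5, 0.1))
res$SignificantClusters$cluster      # clusters 1 and 3
res$NonSignificantClusters$cluster   # clusters 2 and 4
```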
'DistMatrices' uses 'dist' to compute dissimilarity matrices for 'data' and each cluster k from 'GetClusters'
DistMatrices(x.data, cluster.assignment)
x.data |
original 'data' matrix |
cluster.assignment |
cluster assignment vector, "clusters", returned by 'GetClusters' |
returns a list with two components:
Dsubsets |
dissimilarity matrices for each cluster k |
Dfull |
dissimilarity matrix for the original 'data' |
'GetClusters' should be executed prior to 'DistMatrices'
Brian Steinmeyer
'GetClusters'
library(MantelCorr)
# simulate a p x n microarray expression dataset, where p = genes and n = samples
data.sep <- rbind(matrix(rnorm(1000), ncol=50), matrix(rnorm(1000, mean=5), ncol=50))
noise <- matrix(runif(40000), ncol=1000)
data <- t(cbind(data.sep, noise))
data <- data[1:200, ]
# after subsetting, data has p = 200 genes and n = 40 samples
clusters.result <- GetClusters(data, 100, 100)
dissimilarity.matrices <- DistMatrices(data, clusters.result$clusters)
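A minimal sketch of how such matrices could be built with 'dist' is shown below. It assumes that dissimilarities are Euclidean distances between samples (columns), so that the full matrix and every per-cluster matrix cover the same n samples; the page does not state this explicitly. The helper 'sample_dists' and the toy labels are hypothetical.

```r
# Hypothetical helper (not the package's DistMatrices);
# assumes sample-by-sample Euclidean distances.
sample_dists <- function(x.data, cluster.assignment) {
  ids <- sort(unique(cluster.assignment))
  # full matrix: distances between samples using all genes
  Dfull <- dist(t(x.data))
  # one matrix per cluster: same samples, restricted to that cluster's genes
  Dsubsets <- lapply(ids, function(i) {
    dist(t(x.data[cluster.assignment == i, , drop = FALSE]))
  })
  list(Dsubsets = Dsubsets, Dfull = Dfull)
}

set.seed(1)
toy <- matrix(rnorm(20 * 6), nrow = 20)   # 20 genes x 6 samples
labels <- rep(1:4, each = 5)              # made-up cluster labels
dm <- sample_dists(toy, labels)
length(dm$Dfull)       # choose(6, 2) pairwise distances
length(dm$Dsubsets)    # one entry per cluster
```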
'GetClusters' runs 'kmeans' with a deliberately large k to over-partition the p variables (rows = genes) measured on n objects (columns = samples) in the data matrix 'x.data'
GetClusters(x.data, num.k, num.iters)
x.data |
p x n data matrix of numeric values |
num.k |
number of k partitions desired |
num.iters |
number of iterations - recommend >= 100 |
'GetClusters' returns a list with the following components:
clusters |
cluster assignment from 'kmeans' |
cluster.sizes |
size of each cluster k from 'kmeans' |
The input data matrix, x.data, must be numeric (e.g., gene-expression values). We recommend using 'num.k' = one-half the number of genes and 'num.iters' greater than 50
Brian Steinmeyer
'kmeans'
library(MantelCorr)
# simulate a p x n microarray expression dataset, where p = genes and n = samples
data.sep <- rbind(matrix(rnorm(1000), ncol=50), matrix(rnorm(1000, mean=5), ncol=50))
noise <- matrix(runif(40000), ncol=1000)
data <- t(cbind(data.sep, noise))
data <- data[1:200, ]
# after subsetting, data has p = 200 genes and n = 40 samples
clusters.result <- GetClusters(data, 100, 100)
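For readers who want to see the over-partitioning step in isolation, the sketch below calls 'stats::kmeans' directly with k set to half the number of genes. The wrapper 'over_partition' is hypothetical; mapping 'num.k' to 'centers' and 'num.iters' to 'iter.max' is an assumption about how 'GetClusters' uses 'kmeans', not something the page confirms.

```r
# Hypothetical wrapper (not the package's GetClusters source).
over_partition <- function(x.data, num.k, num.iters) {
  stopifnot(is.matrix(x.data), is.numeric(x.data))
  # kmeans clusters the rows of its input, so genes (rows) are partitioned
  km <- kmeans(x.data, centers = num.k, iter.max = num.iters)
  list(clusters = km$cluster, cluster.sizes = km$size)
}

set.seed(1)
toy <- matrix(rnorm(200 * 40), nrow = 200)   # 200 genes x 40 samples
res <- over_partition(toy, num.k = 100, num.iters = 100)
length(res$clusters)      # one cluster label per gene
sum(res$cluster.sizes)    # sizes add up to the number of genes
```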
Samples were taken with Affymetrix Hgu6800 chips and expression levels measured on 7,129 genes (probes). The samples consist of 27 acute lymphoblastic leukemia (ALL) and 11 acute myeloid leukemia (AML) patients. The data values are raw (i.e., no standardization or gene filtering has been applied).
data(GolubTrain)
A data frame of 7129 observations (genes) with the following 38 variables (samples):
ALL (columns 1 to 27, one column per acute lymphoblastic leukemia sample)
AML (columns 28 to 38, one column per acute myeloid leukemia sample)
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
Golub, T.R. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, vol 286, 531-537, 1999.
data(GolubTrain)
'MantelCorrs' computes the Mantel correlation between two dissimilarity matrices
MantelCorrs(Dfull, Dsubsets)
Dfull |
distance matrix returned by 'DistMatrices' using original 'data' |
Dsubsets |
list of distance matrices from each k cluster or partition returned by 'DistMatrices' |
A list with k components
where component i |
Mantel correlation for cluster i, i = 1,...,k |
The function is meant to be executed AFTER 'GetClusters' and 'DistMatrices' (see example)
the value 'k' corresponds to the parameter 'num.k' in 'GetClusters'
Brian Steinmeyer
Mantel N: The detection of disease clustering and a generalized regression approach. Cancer Research. 27(2), 209-220 (1967).
'GetClusters' 'DistMatrices' 'kmeans'
library(MantelCorr)
# simulate a p x n microarray expression dataset, where p = genes and n = samples
data.sep <- rbind(matrix(rnorm(1000), ncol=50), matrix(rnorm(1000, mean=5), ncol=50))
noise <- matrix(runif(40000), ncol=1000)
data <- t(cbind(data.sep, noise))
data <- data[1:200, ]
# after subsetting, data has p = 200 genes and n = 40 samples
clusters.result <- GetClusters(data, 100, 100)
dist.matrices <- DistMatrices(data, clusters.result$clusters)
mantel.corrs <- MantelCorrs(dist.matrices$Dfull, dist.matrices$Dsubsets)
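The Mantel correlation itself is commonly computed as the Pearson correlation between the corresponding off-diagonal entries of two dissimilarity matrices. The sketch below shows that calculation on toy data; 'mantel_r' is a hypothetical helper, and the assumption that distances are taken between samples is not confirmed by this page.

```r
# Hypothetical helper: Pearson correlation of paired pairwise distances.
mantel_r <- function(d1, d2) {
  # as.vector() on a 'dist' object yields its lower-triangle entries
  cor(as.vector(d1), as.vector(d2))
}

set.seed(1)
toy <- matrix(rnorm(20 * 6), nrow = 20)   # 20 genes x 6 samples
d.full <- dist(t(toy))                    # samples compared on all genes
d.sub  <- dist(t(toy[1:5, ]))             # samples compared on 5 genes only
mantel_r(d.full, d.full)                  # identical matrices give 1
mantel_r(d.full, d.sub)                   # lies in [-1, 1]
```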
'PermutationTest' computes and returns an empirical p-value from a null distribution generated by permuting 'Dfull' a total of 'num.per' times.
PermutationTest(Dfull, Dsubsets, num.per, num.chips, alpha)
Dfull |
dissimilarity matrix from the original (p x n) microarray expression data |
Dsubsets |
dissimilarity matrices from each k disjoint clusters returned by 'GetClusters' |
num.per |
number of permutations |
num.chips |
number of samples, 'n' from the original (p x n) data matrix |
alpha |
desired level of significance |
For each permutation, k Mantel correlations are computed by correlating the permuted 'Dfull' with each dissimilarity matrix 'Dsubsets' from the 'k' clusters returned by 'GetClusters'. The absolute value of the maximum Mantel cluster correlation is retained at each permutation. These 'num.per' maximum correlations are then used to generate a null distribution for distance metric independence, with the p-value taken from the (1 - 'alpha') percentile of this permutation distribution.
Returns the permutation p-value for the selected significance level 'alpha'
(p x n) data matrix should be numeric (e.g. gene-expression levels)
The function is meant to be executed AFTER 'GetClusters', 'DistMatrices' and 'MantelCorrs' (see example)
Brian Steinmeyer
'GetClusters' 'DistMatrices' 'MantelCorrs'
library(MantelCorr)
# simulate a p x n microarray expression dataset, where p = genes and n = samples
data.sep <- rbind(matrix(rnorm(1000), ncol=50), matrix(rnorm(1000, mean=5), ncol=50))
noise <- matrix(runif(40000), ncol=1000)
data <- t(cbind(data.sep, noise))
data <- data[1:200, ]
# after subsetting, data has p = 200 genes and n = 40 samples
clusters.result <- GetClusters(data, 100, 100)
dist.matrices <- DistMatrices(data, clusters.result$clusters)
mantel.corrs <- MantelCorrs(dist.matrices$Dfull, dist.matrices$Dsubsets)
permutation.result <- PermutationTest(dist.matrices$Dfull, dist.matrices$Dsubsets, 100, 40, 0.05)
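The procedure described in the details (permute 'Dfull', correlate it with every cluster matrix, keep the absolute value of the maximum, then take the (1 - 'alpha') percentile) can be sketched as below. The helper 'perm_threshold' is hypothetical and follows the prose description only; permuting rows and columns of the sample-by-sample matrix together is an assumption about how the permutation is carried out.

```r
# Hypothetical helper following the described steps; not the package source.
perm_threshold <- function(Dfull, Dsubsets, num.per, num.chips, alpha) {
  full  <- as.matrix(Dfull)
  lower <- lower.tri(full)
  max.corrs <- numeric(num.per)
  for (b in seq_len(num.per)) {
    idx <- sample(num.chips)              # relabel the samples at random
    permuted <- full[idx, idx][lower]     # rows and columns permuted together
    corrs <- sapply(Dsubsets, function(d) cor(permuted, as.matrix(d)[lower]))
    max.corrs[b] <- abs(max(corrs))       # absolute value of the maximum correlation
  }
  # (1 - alpha) quantile of the permutation distribution of maxima
  unname(quantile(max.corrs, probs = 1 - alpha))
}

set.seed(1)
toy <- matrix(rnorm(20 * 6), nrow = 20)   # 20 genes x 6 samples
Dfull <- dist(t(toy))
# four made-up clusters of five genes each
Dsubsets <- lapply(1:4, function(i) dist(t(toy[(5 * i - 4):(5 * i), ])))
crit <- perm_threshold(Dfull, Dsubsets, num.per = 50, num.chips = 6, alpha = 0.05)
```

Taking the maximum over all k clusters at each permutation means the resulting threshold accounts for the fact that k correlations are examined at once.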