Title: | Graph Theoretic Association Tests |
---|---|
Description: | Functions and data used in Balasubramanian, et al. (2004) |
Authors: | R. Balasubramanian, T. LaFramboise, D. Scholtens |
Maintainer: | Thomas LaFramboise <[email protected]> |
License: | LGPL |
Version: | 1.79.0 |
Built: | 2024-10-30 07:24:42 UTC |
Source: | https://github.com/bioc/GraphAT |
This data set contains mRNA expression from a microarray experiment involving yeast grown under a variety of altered environments (e.g. acid, heat, sorbitol, etc.)
data(causton)
data(causton)
A matrix whose rows are the 6015 genes and whose columns are the 45 experimental conditions.
http://web.wi.mit.edu/young/environment
Causton, H. C. et al. 2001 “Remodeling of Yeast Genome Expression in Response to Environmental Changes.” Molecular Biology of the Cell 12, 323–337.
data(causton) ## Find the 3000 most variable genes, according to sd/mean: varMeas<-function(vec) sd(vec)/mean(vec) variability<-apply(causton,1,varMeas) rks<-rank(variability) causton3000<-causton[rks>length(rownames(causton))-3000,]
data(causton) ## Find the 3000 most variable genes, according to sd/mean: varMeas<-function(vec) sd(vec)/mean(vec) variability<-apply(causton,1,varMeas) rks<-rank(variability) causton3000<-causton[rks>length(rownames(causton))-3000,]
An adjacency matrix in which
data(ccCM)
data(ccCM)
ccCM
is a symmetric matrix with 2885 columns and 2885 rows.
nNamescc
is a vector of 2885 gene names.
Cho, et al. discuss the k means clustering of 2885 Saccharomyces genes into 30 clusters with measurements taken over two synchronized cell cycles. nNamescc
is a vector of the 2885 gene names. ccCM
is an adjacency matrix in which a "1" in the ith row and jth column indicates that gene i and gene j belong to the same cluster. All other entries are 0. These data are integrated with phenotypic data and GO data in Balasubramanian, et al (2004).
Balasubramanian R, LaFramboise T, Scholtens D, Gentleman R. (2004) A graph theoretic approach to integromics - integrating disparate sources of functional genomics data
Cho, et al. (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell, 2, 65-73.
Tavazoie, et al. (1999) Systematic determination of genetic network architecture. Nature Genetics, 22, 281-285.
data(ccCM)
data(ccCM)
Given a list of cluster memberships, this function computes the adjacency matrix for the associated cluster graph. The adjacency matrix is a matrix whose rows and columns are the nodes of the cluster graph and whose entries are 0's or 1's. A 1 entry indicates that the corresponding nodes are connected, and a 0 indocates that they are not.
clust2Mat(memb)
clust2Mat(memb)
memb |
A numeric vector, with each entry representing a node, the entry's value being the number of the cluster to which that node belongs. |
Given a vector of cluster membership numbers, where the number of entries is the number of nodes n, the function computes an nxn “adjacency matrix” for the corresponding cluster graph. The cluster graph is the graph in which two nodes are connected by an edge if and only if they are members of the same cluster. The adjacency matrix for the graph has rows and columns representing the nodes, in the same order as the input vector. The (i,j) entry is 1 if and only if node i and node j are in the same cluster. Otherwise, the entry is 0. By convention, diagonal entries are 0.
An nxn adjacency matrix for the cluster graph, where n=length of cluster membership input vector memb.
Tom LaFramboise [email protected]
memberships<-c(1,2,3,1,2,3,1,2,3,4) clust2Mat(memberships)
memberships<-c(1,2,3,1,2,3,1,2,3,4) clust2Mat(memberships)
This matrix of depths is used to obtain the predictome data in the paper. This is a symmetric matrix, where the i,j element corresponds to is the maximum depth of all annotations shared by genes i and j. Note that depth of a term in a specific Gene Ontology (BP, CC, MF) is defined as the shortest path between the term and the root node, where distance between nodes is measured by the number of edges traversed. Row labels of the matrix can be obtained by the row.names() function
data(depthmatBP)
data(depthmatBP)
Each of three matrices, namely depthmatBP.rda, depthmatCC.rda, depthmatMF.rda is a symmetric matrix whose rows and columns correspond to specific YEAST genes (see row labels using row.names()). The i,j entry of each matrix refers to the maximum depth shared by genes i and j under each of the BP, CC and MF ontlogies respectively
data(depthmatBP) print(row.names(depthmatBP)[1:10])
data(depthmatBP) print(row.names(depthmatBP)[1:10])
The function takes as inputs two adjacency matrices. Let X denote the observed number of edges in common between the two adjacency matrices. To test the significance of the correlation between the two data sources, the function performs N random edge permutations and random node permutations respectively. For each permutation test, the function outputs the proportion of N realizations that resulted in X edges or more at the intersection of the two datasources
getpvalue(act.mat, nonact.mat, num.iterations = 1000)
getpvalue(act.mat, nonact.mat, num.iterations = 1000)
act.mat |
Adjacency matrix corresponding to first data source. That is, the i,j element of this matrix is 1 if data source one specifies a functional link between genes i and j |
nonact.mat |
Adjacency matrix corresponding to first data source. That is, the i,j element of this matrix is 1 if data source two specifies a functional link between genes i and j |
num.iterations |
Number of realizations from random edge (node) permutation to be obtained |
We note that the first adjacency matrix, denoted act.mat is the data source that is permutated with respect to edges or notes
A vector of length 2, where the first element is the P value from Random Edge Permutation and the second element is the P value from Random Node Permutation
Raji Balasubramanian [email protected]
permEdgesM2M
, permNodesM2M
, makeClustM
act.mat <- matrix(0,3,3) act.mat[2,1] <- 1 act.mat[3,1] <- 1 nonact.mat <- matrix(0,3,3) nonact.mat[2,1] <- 1 nonact.mat[3,2] <- 1 p.val <- getpvalue(act.mat, nonact.mat, num.iterations = 100) print(p.val)
act.mat <- matrix(0,3,3) act.mat[2,1] <- 1 act.mat[3,1] <- 1 nonact.mat <- matrix(0,3,3) nonact.mat[2,1] <- 1 nonact.mat[3,2] <- 1 p.val <- getpvalue(act.mat, nonact.mat, num.iterations = 100) print(p.val)
This data set contains fitness deficiency scores from gene knockout experiments involving yeast grown under a variety of altered environments (e.g. acid, heat, sorbitol, etc.)
data(giaever)
data(giaever)
A matrix whose rows are the 5922 genes knocked out and whose columns are the 32 experimental conditions.
http://gobi.lbl.gov/YeastFitnessData
Giaever, G. et al. 2002 “Functional profiling of the Saccharomyces cerevisiae genome.” Nature 418, 387–391.
data(giaever) ## Find the 3000 most variable genes, according to sd/mean: varMeas <- function(vec, na.rm=TRUE) { if(na.rm) vec <- vec[!is.na(vec)] if(length(vec) == 0) measure <- NA else measure <- sd(vec)/mean(vec) return(measure) } variability <- apply(giaever, 1, varMeas) rks <- rank(variability) giaever3000 <- giaever[rks>length(rownames(giaever))-3000,]
data(giaever) ## Find the 3000 most variable genes, according to sd/mean: varMeas <- function(vec, na.rm=TRUE) { if(na.rm) vec <- vec[!is.na(vec)] if(length(vec) == 0) measure <- NA else measure <- sd(vec)/mean(vec) return(measure) } variability <- apply(giaever, 1, varMeas) rks <- rank(variability) giaever3000 <- giaever[rks>length(rownames(giaever))-3000,]
This function takes a vector of cluster sizes and returns an adjacency matrix for a graph in which edges connect nodes if they are members of the same cluster.
makeClustM(nvec)
makeClustM(nvec)
nvec |
A vector of cluster sizes |
A square adjacency matrix with the number of rows and columns equal to the sum of nvec. An entry of "1" in the ith row and jth column indicates that node i and node j are members of the same cluster. All other entries are "0".
Denise Scholtens
Balasubramanian, et al. (2004) A graph theoretic approach to integromics - integrating disparate sources of functional genomics data.
a <- makeClustM(c(2,3,4))
a <- makeClustM(c(2,3,4))
A function to turn an adjacency matrix for a graph into a graphNEL object.
mat2UndirG(V, mat)
mat2UndirG(V, mat)
V |
A vector of node names |
mat |
A square symmetric matrix indicating the presence of edges |
mat
is a square matrix with rows and columns corresponding to nodes in the graph. Entries of "0" indicate the lack of an edge. Since this is making an undirected graph, mat
must be symmetric.
A graphNEL object.
Denise Scholtens
Balasubramanian, et al. (2004) A graph theoretic approach to integromics - integrating disparate sources of functional genomics data.
library(graph) a <- matrix(c(0,1,1,1,1,0,0,0,1,0,0,0,1,0,0,0),ncol=4) ag <- mat2UndirG(V=letters[1:4],mat=a)
library(graph) a <- matrix(c(0,1,1,1,1,0,0,0,1,0,0,0,1,0,0,0),ncol=4) ag <- mat2UndirG(V=letters[1:4],mat=a)
This data set contains cluster membership for yeast genes clustered using mRNA expression from a microarray experiment in Causton, et al. Molecular Biology of the Cell (2001). The 3000 most variable genes were clustered using k-means with 30 clusters.
data(mRNAclusters)
data(mRNAclusters)
A data frame whose rows are the 3000 genes and whose two columns are gene name and cluster membership number.
http://web.wi.mit.edu/young/environment
Causton, H. C. et al. 2001 “Remodeling of Yeast Genome Expression in Response to Environmental Changes.” Molecular Biology of the Cell 12, 323–337.
data(mRNAclusters) ## Compute the adjacency matrix for the corresponding cluster graph: mRNAMat<-clust2Mat(mRNAclusters[,2])
data(mRNAclusters) ## Compute the adjacency matrix for the corresponding cluster graph: mRNAMat<-clust2Mat(mRNAclusters[,2])
This function simulates graphs from the alternative hypothesis of preferential connection of intracluster nodes. For each graph, it runs a node and edge permutation test. The estimated “power” of each test is the proportion of graphs that the test rejects the null hypothesis of no preferential connection of intracluster edges.
permPower(psi=1,clsizes, nedge, nhyper=100, nperms=1000)
permPower(psi=1,clsizes, nedge, nhyper=100, nperms=1000)
psi |
The non-centrality parameter for the noncentral hypergeometric distribution used to simulate the graphs. |
clsizes |
A vector of cluster sizes. |
nedge |
The number of edges in each graph. |
nhyper |
The number of noncentral hypergeometric graphs simulated to estimate "power". |
nperms |
The number of permutations used for each run of the edge and node permutation tests. |
The function first generates nhyper realizations of a noncentral hypergeometric(nedge,n,k,psi) random variable, where n is the number of node pairs and k is the number of intracluster node pairs. For each realization x, a graph with n edges, x of which are intracluster, is generated. The edge and node permutation tests (with nperms permutations each) are performed on each graph. The estimated “power” of each test is the proportion of graphs for which the test rejects the null hypothesis of no preferential connection of intracluster nodes (at the 5% level). The 95% confidence intervals for the power levels are also computed.
A list with four components:
power.permedge |
Estimated “power” for edge permutation test. |
power.permnode |
Estimated “power” for node permutation test. |
CI.permedge |
Vector giving 95% confidence interval for edge permutation test power. |
CI.permnode |
Vector giving 95% confidence interval for node permutation test power. |
Tom LaFramboise [email protected]
permEdgesM2M
, permNodesM2M
, makeClustM
permPower(psi=5,clsizes=c(1,2,3,4),nedge=10,nhyper=100,nperms=100)
permPower(psi=5,clsizes=c(1,2,3,4),nedge=10,nhyper=100,nperms=100)
Given an adjacency matrix for a graph, permEdgesM2M
will return an adjacency matrix after an Erdos-Renyi random permutation of the edges in the graph. perNodesM2M
will return an adjacency matrix for a graph with identical structure, but with the node labels permuted.
permEdgesM2M(mat) permNodesM2M(mat)
permEdgesM2M(mat) permNodesM2M(mat)
mat |
A square adjacency matrix for a graph. |
A square adjacency matrix for the new graph, subject to a random permutation of the edges or nodes.
Denise Scholtens
Balasubramanian, et al. (2004) A graph theoretic approach to integromics - integrating disparate sources of functional genomics data.
g <- matrix(c(0,1,1,1,1,0,0,0,1,0,0,0,1,0,0,0),nrow=4) g1 <- permEdgesM2M(g) g2 <- permNodesM2M(g)
g <- matrix(c(0,1,1,1,1,0,0,0,1,0,0,0,1,0,0,0),nrow=4) g1 <- permEdgesM2M(g) g2 <- permNodesM2M(g)
This data set contains cluster memberships for yeast genes clustered using fitness deficiency scores from gene knockout experiments from Giaever et al. Nature (2002). The 3000 most variable genes were clustered using k-means with 30 clusters
data(Phenoclusters)
data(Phenoclusters)
A matrix whose rows are the 3000 genes and whose two columns are gene name and cluster membership number.
http://gobi.lbl.gov/YeastFitnessData
Giaever, G. et al. 2002 “Functional profiling of the Saccharomyces cerevisiae genome.” Nature 418, 387–391.
data(Phenoclusters) ## Compute the adjacency matrix for the corresponding cluster graph: phenoMat<-clust2Mat(Phenoclusters[,2])
data(Phenoclusters) ## Compute the adjacency matrix for the corresponding cluster graph: phenoMat<-clust2Mat(Phenoclusters[,2])