Title: | Gene network reconstruction |
---|---|
Description: | This package can be used to compute associations among genes (gene networks) or between genes and external traits (e.g. clinical data). |
Authors: | Yin Jin, Hesen Peng, Lei Wang, Raffaele Fronza, Yuanhua Liu and Christine Nardini |
Maintainer: | Yuanhua Liu <[email protected]> |
License: | GPL-3 |
Version: | 1.63.0 |
Built: | 2024-11-19 03:24:04 UTC |
Source: | https://github.com/bioc/BUS |
This package can be used to compute associations among genes (gene networks) or between genes and external traits (e.g. clinical data). [Function: BUS]
Both associations can be computed via correlation or mutual information (MI). [Functions: gene.similarity (gene-gene associations) and gene.trait.similarity (gene-trait associations)]
The statistical significance of each association is computed for single and multiple hypothesis testing, using a random permutation method. [Functions: gene.pvalue, gene.trait.pvalue]
The package can handle data with missing values, using bootstrapping methods to fill NAs. [Argument: na.replica]
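As a rough illustration of the mutual information measure mentioned above, the following base R sketch estimates MI between two expression profiles by equal-width binning. The helper name `mi_binned` and the bin count are assumptions made for this example; the estimator that BUS actually uses may differ.

```r
# Plug-in mutual information (in nats) between two numeric vectors,
# estimated from an equal-width discretisation.
# NOTE: mi_binned and bins = 5 are illustrative choices, not part of BUS.
mi_binned <- function(x, y, bins = 5) {
  x_bin <- cut(x, breaks = bins)          # discretise x into equal-width bins
  y_bin <- cut(y, breaks = bins)          # discretise y the same way
  pxy <- table(x_bin, y_bin) / length(x)  # joint relative frequencies
  px <- rowSums(pxy)                      # marginal distribution of x
  py <- colSums(pxy)                      # marginal distribution of y
  nz <- pxy > 0                           # empty cells contribute nothing
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
}

set.seed(1)
x <- rnorm(200)
y_dep <- x + rnorm(200, sd = 0.3)  # profile that depends on x
y_ind <- rnorm(200)                # profile independent of x

mi_dep <- mi_binned(x, y_dep)
mi_ind <- mi_binned(x, y_ind)
```

Unlike correlation, this kind of estimate also responds to non-linear dependence, which is one common reason for offering both measures.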
Package: | BUS |
Type: | Package |
Version: | 1.0.2 |
Date: | 2009-10-31 |
License: | GPL-3 |
Yin Jin, Hesen Peng, Lei Wang, Raffaele Fronza, Yuanhua Liu and Christine Nardini
Maintainer: Yuanhua Liu<[email protected]>
A wrapper function that computes two types of similarity (correlation and mutual information) with two different goals: (i) identifying statistically significant similarities among the activity of molecules sampled across different experiments (option Unsupervised, U); (ii) identifying statistically significant similarities between such molecules and other types of information (e.g. clinical data; option Supervised, S).
BUS(EXP, trait = NULL, measure, method.permut = 2, n.replica = 400, net.trim = NULL, thresh = NULL, nflag)
EXP | Gene expression data in the form of a matrix. Rows stand for genes and columns for experiments. |
trait | Trait data in the form of a matrix. Rows stand for traits and columns for experiments. |
measure | Metric used to calculate similarity: "corr" for correlation, "MI" for mutual information. |
method.permut | A flag indicating which method is used to correct permutation p-values; the default is 2. See gene.trait.pvalue for details. |
n.replica | Number of permutations used for the correction of multiple hypothesis testing; the default is 400. |
net.trim | Method used to trim the network: "mrnet", "clr", "aracne" or "none". "mrnet" infers a network using the maximum relevance/minimum redundancy feature selection method; "clr" uses the CLR algorithm; "aracne" applies the data processing inequality to all triplets of nodes in order to remove the least significant edge in each triplet. These options come from the package minet and are used only for mutual information. "none" indicates no trim operation and should be chosen when correlation is used. |
thresh | Threshold for significance of the corrected p-value. In the Unsupervised case, it is used to trim the adjacency matrix (which contains the results of the gene-gene association based on the chosen metric) and obtain a predicted gene interaction network. In the Supervised case, since no network is predicted, it is set to NULL. |
nflag | A flag indicating a gene-gene interaction case (Unsupervised) or a gene-trait interaction case (Supervised): 1 for Unsupervised and 2 for Supervised. |
similarity | A matrix of similarity values, which can be correlation or mutual information. |
single.perm.p.value | A matrix of single p-values. |
multi.perm.p.value | A matrix of corrected p-values. |
net.pred.permut | Predicted network obtained by trimming non-significant values. |
Yin Jin, Hesen Peng, Lei Wang, Raffaele Fronza, Yuanhua Liu and Christine Nardini
gene.pvalue, gene.trait.pvalue, pred.network
data(copasi)
mat <- as.matrix(copasi[1:10, ])
rownames(mat) <- paste("G", 1:nrow(mat), sep = "")
BUS(EXP = mat, measure = "corr", net.trim = "none", thresh = 0.05, nflag = 1)
This dataset is taken from Copasi2 (Complex Pathway Simulator), a software for simulation and analysis of biochemical networks. The system generates random artificial gene networks according to well-defined topological and kinetic properties. These are used to run in silico experiments simulating real laboratory micro-array experiments. Noise with controlled properties is added to the simulation results several times emulating measurement replicates, before expression ratios are calculated. This series consists of 150 artificial gene networks. Each network consists of 100 genes with a total of 200 gene interactions (on average each gene has 2 modulators).
A data frame of size 100x100: the 100 rows represent 100 genes and the 100 columns represent 100 experiments.
See http://www.comp-sys-bio.org/AGN/data.html for detailed information.
Calculates p-values for the null hypothesis that there is no gene-gene interaction. For gene expression data with M genes, a p-value matrix is computed under MxM single null hypotheses (each pair of genes has no interaction). In addition, a matrix of corrected p-values (multi.perm.p.value) is output, obtained with a corrected permutation method that uses a distribution of MxMxP null hypothesis tests (P being the number of permutations). The p-values are calculated from the adjacency matrix for gene-gene interaction computed by the function gene.similarity.
gene.pvalue(EXP, measure, net.trim, n.replica = 400)
EXP | Gene expression data in the form of a matrix. Rows stand for genes and columns for experiments. |
measure | Metric used to calculate similarity between genes: "corr" for correlation, "MI" for mutual information. |
net.trim | Method used to trim the network: "mrnet", "clr", "aracne" or "none". "mrnet" infers a network using the maximum relevance/minimum redundancy feature selection method; "clr" uses the CLR algorithm; "aracne" applies the data processing inequality to all triplets of nodes in order to remove the least significant edge in each triplet. These options come from the package minet and are used only for mutual information. "none" indicates no trim operation and should be chosen when correlation is used. |
n.replica | Number of permutations used for the correction of multiple hypothesis testing; the default is 400. |
Normally, in a permutation method, the empirical distribution of some statistic is used to estimate the p-value. To get a single p-value for no interaction between genes i and j, the empirical distribution of a vector of length P (the number of permutation replicates) is used; to correct for multiple hypotheses with permutations, the empirical distribution of a vector of length PxM (M being the number of hypotheses tested) is used.
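The difference between the single and the corrected empirical distribution can be sketched in base R as follows. This is a minimal illustration on simulated data, assuming absolute Pearson correlation as the statistic and a null obtained by shuffling each gene across experiments; the variable names are hypothetical and the code is not the package's own implementation.

```r
set.seed(42)
M <- 4; N <- 20; P <- 200                  # genes, experiments, permutations
EXP <- matrix(rnorm(M * N), nrow = M,
              dimnames = list(paste0("G", 1:M), NULL))
EXP[2, ] <- EXP[1, ] + rnorm(N, sd = 0.2)  # make G1 and G2 strongly associated

obs <- abs(cor(t(EXP)))                    # observed M x M statistic

# null[i, j, p]: statistic between a shuffled copy of gene i and gene j
null <- array(NA_real_, dim = c(M, M, P))
for (p in 1:P) {
  shuffled_t <- apply(EXP, 1, sample)      # N x M, each gene permuted independently
  null[, , p] <- abs(cor(shuffled_t, t(EXP)))
}

# Single p-value: compare each pair with its own P null values
single_p <- apply(null >= array(obs, dim = dim(null)), c(1, 2), mean)
dimnames(single_p) <- dimnames(obs)

# Corrected p-value: compare each pair with the pooled null values of all pairs
multi_p <- matrix(vapply(obs, function(s) mean(null >= s), numeric(1)),
                  nrow = M, dimnames = dimnames(obs))
```

Pooling enlarges the reference distribution from P values per pair to the null values of every hypothesis tested, which is what makes the second p-value a multiple-testing correction rather than a per-pair estimate.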
single.perm.p.value | A matrix of single p-values, obtained with the permutation method plus a beta distribution for extreme values (for MI) or from the exact distribution computed directly by cor.test (for correlation). |
multi.perm.p.value | A matrix of corrected p-values obtained with the permutation method. |
Yin Jin, Hesen Peng, Lei Wang, Raffaele Fronza, Yuanhua Liu and Christine Nardini
data(copasi)
mat <- as.matrix(copasi)[1:10, ]
rownames(mat) <- paste("G", 1:nrow(mat), sep = "")
gene.pvalue(mat, measure = "MI", net.trim = "mrnet")
Calculates an adjacency matrix for gene-gene interaction (using the correlation or mutual information metric). For gene expression data with M genes and N experiments, the adjacency matrix has size MxM. Optionally, a trimmed adjacency matrix can be obtained according to the argument net.trim, i.e. mrnet, clr and aracne (from the package minet).
gene.similarity(EXP, measure, net.trim, na.replica = 50)
EXP | Gene expression data in the form of a matrix. Rows stand for genes and columns for experiments. |
measure | Metric used to calculate similarity between genes: "corr" for correlation, "MI" for mutual information. |
net.trim | Method used to trim the adjacency matrix: "mrnet", "clr", "aracne" or "none". "mrnet" infers a network using the maximum relevance/minimum redundancy feature selection method; "clr" uses the CLR algorithm; "aracne" applies the data processing inequality to all triplets of nodes in order to remove the least significant edge in each triplet. These options come from the package minet and are used only for mutual information. "none" indicates no trim operation and should be chosen when correlation is used. |
na.replica | Number of replicates used to fill NAs in the imputation method; the default is 50. The (smooth) bootstrapping approach is used to estimate missing values in the data. |
An adjacency matrix of size MxM, with rows and columns both standing for genes. The element in row i and column j indicates the similarity between gene i and gene j.
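The shape of this result can be illustrated with base R alone. The sketch below builds a correlation-based adjacency matrix for simulated data laid out as described (genes in rows, experiments in columns); it assumes plain Pearson correlation and is not the package's own code.

```r
set.seed(7)
# Simulated expression matrix: 5 genes (rows) x 12 experiments (columns)
EXP <- matrix(rnorm(5 * 12), nrow = 5,
              dimnames = list(paste0("G", 1:5), paste0("E", 1:12)))

# cor() correlates columns, so the matrix is transposed to put genes in columns
adj <- cor(t(EXP))
```

Here `adj["G1", "G2"]` is the similarity between genes G1 and G2, the matrix is symmetric, and its diagonal is 1.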
Yin Jin, Hesen Peng, Lei Wang, Raffaele Fronza, Yuanhua Liu and Christine Nardini
data(copasi)
mat <- as.matrix(copasi)[1:10, ]
rownames(mat) <- paste("G", 1:nrow(mat), sep = "")
res <- gene.similarity(mat, measure = "corr", net.trim = "none")
Calculates p-values for the null hypothesis that there is no interaction between a gene and a trait. There are MxT interactions between M genes and T traits. Results are given as a single p-value and as a corrected p-value, for which 3 different types of correction are available (see method.permut). The p-values are calculated from the similarity matrix for gene-trait interaction computed by the function gene.trait.similarity.
gene.trait.pvalue(EXP, trait, measure, method.permut = 2, n.replica = 400)
EXP | Gene expression data in the form of a matrix. Rows stand for genes and columns for experiments. |
trait | Trait data in the form of a matrix. Rows stand for traits and columns for experiments. |
measure | Metric used to calculate similarity: "corr" for correlation, "MI" for mutual information. |
method.permut | A flag indicating the correction style when multiple hypothesis testing is considered: 1 for multiple traits correction, 2 for multiple genes correction and 3 for both genes and traits correction. The default is 2. |
n.replica | Number of permutations for the correction of multiple hypothesis testing; the default is 400. |
In a permutation method, the empirical distribution of some statistic is used to estimate the p-value. For a single p-value, the empirical distribution is a vector of P test values (P being the number of random replicates for each test). The p-value can then be corrected in different ways. With method.permut = 1, the empirical distribution of a vector of length TxP is used, which corrects for the multiple traits tested. With method.permut = 2, the empirical distribution of a vector of length MxP is used, which corrects for the multiple genes tested. With method.permut = 3, the empirical distribution of a vector of length MxTxP is used, which corrects for the multiple traits and genes tested.
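The three pooling schemes can be sketched in base R as below. This is an illustration under stated assumptions: simulated data, absolute Pearson correlation as the statistic, and a null generated by shuffling the experiment order of the traits. The helper `corrected_p` and the name `T_n` (used instead of `T`, which R treats as shorthand for `TRUE`) are hypothetical.

```r
set.seed(3)
M <- 3; T_n <- 2; N <- 15; P <- 100        # genes, traits, experiments, replicates
EXP   <- matrix(rnorm(M * N),   nrow = M)
trait <- matrix(rnorm(T_n * N), nrow = T_n)

obs <- abs(cor(t(EXP), t(trait)))          # observed M x T statistic

# null[i, j, p]: statistic for gene i and trait j in permutation p
null <- array(NA_real_, dim = c(M, T_n, P))
for (p in 1:P) {
  perm <- sample(N)                        # shuffle the experiment order of the traits
  null[, , p] <- abs(cor(t(EXP), t(trait[, perm, drop = FALSE])))
}

# Corrected p-values for a given pooling scheme (illustrative helper)
corrected_p <- function(method.permut) {
  out <- matrix(NA_real_, M, T_n)
  for (i in 1:M) for (j in 1:T_n) {
    pool <- switch(as.character(method.permut),
                   "1" = null[i, , ],      # T x P values: all traits for gene i
                   "2" = null[, j, ],      # M x P values: all genes for trait j
                   "3" = null)             # M x T x P values: everything
    out[i, j] <- mean(pool >= obs[i, j])
  }
  out
}

p_genes <- corrected_p(2)                  # matches the documented default
```

The only thing that changes between the three settings is which slice of the null array serves as the reference distribution for a given gene-trait pair.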
single.perm.p.value | A matrix of single p-values, obtained with the permutation method plus a beta distribution for extreme values (for MI) or from the exact distribution computed directly by cor.test (for correlation). |
multi.perm.p.value | A matrix of corrected p-values obtained with the permutation method. |
Yin Jin, Hesen Peng, Lei Wang, Raffaele Fronza, Yuanhua Liu and Christine Nardini
data(tumors.mRNA)
data(tumors.miRNA)
exp <- tumors.mRNA
trait <- tumors.miRNA
gene.trait.pvalue(EXP = exp[1:10, ], trait = trait[1:5, ], measure = "MI")
Calculates the similarity for gene-trait interaction (using the correlation or mutual information metric).
gene.trait.similarity(EXP, trait, measure, na.replica = 50)
EXP | Gene expression data in the form of a matrix. Rows stand for genes and columns for experiments. |
trait | Trait data in the form of a matrix. Rows stand for traits and columns for experiments. |
measure | Metric used to calculate similarity: "corr" for correlation, "MI" for mutual information. |
na.replica | Number of replicates used to fill NAs in the imputation method; the default is 50. The (smooth) bootstrapping approach is used to estimate missing values in the data. |
A matrix in which rows stand for genes and columns for traits. The element in row i and column j is the association between gene i and trait j.
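For orientation, a gene-by-trait matrix of this shape can be produced in base R as follows. The data are simulated, the trait names are made up, and Pearson correlation stands in for whichever measure is selected; this is not the package's own code.

```r
set.seed(11)
# 4 genes and 2 traits measured over the same 10 experiments
EXP   <- matrix(rnorm(4 * 10), nrow = 4,
                dimnames = list(paste0("G", 1:4), NULL))
trait <- matrix(rnorm(2 * 10), nrow = 2,
                dimnames = list(c("traitA", "traitB"), NULL))

# Transpose both inputs so that experiments become rows;
# cor(x, y) then returns one row per gene and one column per trait
sim <- cor(t(EXP), t(trait))
```

Both inputs must share the same experiments in the same column order, since each association is computed across experiments.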
Yin Jin, Hesen Peng, Lei Wang, Raffaele Fronza, Yuanhua Liu and Christine Nardini
data(tumors.mRNA)
data(tumors.miRNA)
exp <- tumors.mRNA
trait <- tumors.miRNA
gene.trait.similarity(EXP = exp[1:10, ], trait = trait[1:5, ], measure = "MI")
Predicts the gene network matrix from the similarity matrix, filtered according to a corrected p-value matrix.
pred.network(pM,similarity,thresh)
pM | A corrected p-value matrix: an MxM matrix giving the significance of the similarity among M genes. |
similarity | An MxM matrix of similarity between genes. |
thresh | Threshold for significance of the p-value. |
An MxM matrix of the predicted network, where cell ij indicates a link between gene i and gene j and is set to 0 when the p-value is not significant (no link).
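The filtering step described here amounts to masking the similarity matrix with the p-value matrix. A minimal base R sketch on hand-made 3x3 inputs, assuming that a link is kept when its p-value is strictly below the threshold (the exact comparison used by pred.network is not stated in this documentation):

```r
genes <- paste0("G", 1:3)
similarity <- matrix(c(1.0, 0.8, 0.1,
                       0.8, 1.0, 0.4,
                       0.1, 0.4, 1.0),
                     nrow = 3, byrow = TRUE, dimnames = list(genes, genes))
pM <- matrix(c(0.00, 0.01, 0.60,
               0.01, 0.00, 0.03,
               0.60, 0.03, 0.00),
             nrow = 3, byrow = TRUE, dimnames = list(genes, genes))
thresh <- 0.05

# Keep the similarity where the p-value is significant, set it to 0 elsewhere
net <- similarity
net[pM >= thresh] <- 0
```

With these inputs the G1-G3 entry is removed (p = 0.60), while G1-G2 and G2-G3 keep their similarity values.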
Yin Jin, Hesen Peng, Lei Wang, Raffaele Fronza, Yuanhua Liu and Christine Nardini
data(copasi)
mat <- as.matrix(copasi[1:10, ])
rownames(mat) <- paste("G", 1:nrow(mat), sep = "")
similarity <- gene.similarity(mat, measure = "MI", net.trim = "mrnet")
pM <- gene.pvalue(mat, measure = "MI", net.trim = "mrnet")$single.perm.p.value
pred.network(pM, similarity, thresh = 0.05)
miRNA data obtained by RT-PCR from human brain tumors. 12 brain tumors at different levels are analyzed for both mRNA and miRNA levels to study the correlation of any mRNA-miRNA pair in the reference.
data(tumors.miRNA)
tumors.miRNA is a matrix with miRNAs as rows and tumor types as columns.
Liu T, Papagiannakopoulos T, Puskar K, Qi S, Santiago F, Clay W, Lao K, Lee Y, Nelson SF, Kornblum HI, Doyle F, Petzold L, Shraiman B, Kosik KS. Detection of a microRNA signal in an in vivo expression set of mRNAs. Plos One. 2007; 2(8):e804.
data(tumors.miRNA)
tumors.miRNA[1:10, ]
Gene expression data obtained by microarray from human brain tumors. 12 brain tumors at different levels are analyzed for both mRNA and miRNA levels to study the correlation of any mRNA-miRNA pair in the reference.
data(tumors.mRNA)
tumors.mRNA is a matrix with mRNA probe IDs as rows and tumor types as columns.
Liu T, Papagiannakopoulos T, Puskar K, Qi S, Santiago F, Clay W, Lao K, Lee Y, Nelson SF, Kornblum HI, Doyle F, Petzold L, Shraiman B, Kosik KS. Detection of a microRNA signal in an in vivo expression set of mRNAs. Plos One. 2007; 2(8):e804.
data(tumors.mRNA)
tumors.mRNA[1:10, ]