Title: | Differential co-expression/association network analysis |
---|---|
Description: | This package implements methods and an evaluation framework to infer differential co-expression/association networks. Various methods are implemented and can be evaluated using simulated datasets. Inference of differential co-expression networks can allow identification of networks that are altered between two conditions (e.g., health and disease). |
Authors: | Dharmesh D. Bhuva [aut, cre] |
Maintainer: | Dharmesh D. Bhuva <[email protected]> |
License: | GPL-3 |
Version: | 1.23.0 |
Built: | 2024-11-08 06:00:33 UTC |
Source: | https://github.com/bioc/dcanr |
Fast estimation of pairwise correlation coefficients.
cor.pairs(emat, cor.method = c("pearson", "spearman"))
cor.pairs(emat, cor.method = c("pearson", "spearman"))
emat |
a numeric matrix |
cor.method |
a character, specifying the method to use for estimation. Possible values are 'pearson' (default) and 'spearman' |
a numeric matrix with estimated correlation coefficients
x <- matrix(rnorm(200), 100, 2) cor.pairs(x) cor.pairs(x, cor.method = 'spearman')
x <- matrix(rnorm(200), 100, 2) cor.pairs(x) cor.pairs(x, cor.method = 'spearman')
Adjust for multiple hypothesis testing after performing
statistical tests using dcTest
. This can be performed using a
method provided by the users. p.adjust
is used by default.
dcAdjust(dcpvals, f = stats::p.adjust, ...)
dcAdjust(dcpvals, f = stats::p.adjust, ...)
dcpvals |
a matrix, the result of the |
f |
a function, the function to be used for adjustment. |
... |
additional parameters to the adjustment function such as
|
Ensure that the p-value matrix passed to this function is the one
produced by dcTest
. Any modification to the result matrix will
result in failure of the function.
This method applies the adjustment method only to one triangle of the matrix to ensure adjustment is not performed for duplicated tests (symmetric matrix). As results from the DiffCoEx and EBcoexpress do not produce p-values, this method does not change anything thereby returning the original matrix.
a matrix, of adjusted p-values (or scores in the case of DiffCoEx and EBcoexpress) representing significance of differential associations.
x <- matrix(rnorm(60), 2, 30) cond <- rep(1:2, 15) zscores <- dcScore(x, cond) pvals <- dcTest(zscores, emat = x, condition = cond) dcAdjust(pvals, p.adjust, method = 'fdr')
x <- matrix(rnorm(60), 2, 30) cond <- rep(1:2, 15) zscores <- dcScore(x, cond) pvals <- dcTest(zscores, emat = x, condition = cond) dcAdjust(pvals, p.adjust, method = 'fdr')
Quantify the performance of a differential co-expression pipeline on simulated data.
dcEvaluate( simulation, dclist, truth.type = c("association", "influence", "direct"), perf.method = "f.measure", combine = TRUE, ... )
dcEvaluate( simulation, dclist, truth.type = c("association", "influence", "direct"), perf.method = "f.measure", combine = TRUE, ... )
simulation |
a list, storing data and results generated from simulations |
dclist |
a list of igraphs, produced using |
truth.type |
a character, specifying which level of the true network to retrieve: 'association' (default), 'influence' or 'direct' |
perf.method |
a character, specifying the method to use. Available
methods can be accessed using |
combine |
a logical, indicating whether differential networks from
independent knock-outs should be treated as a single inference or
independent inferences (defaults to |
... |
additional parameters to be passed on to the performance metric
method (see |
a numeric, representing the performance metric. A single value if
combine = TRUE
and a named vector otherwise.
dcPipeline
, performanceMeasure
,
perfMethods
data(sim102) #run a standard pipeline resStd <- dcPipeline(sim102, dc.func = 'zscore') dcEvaluate(sim102, resStd) dcEvaluate(sim102, resStd, combine = FALSE)
data(sim102) #run a standard pipeline resStd <- dcPipeline(sim102, dc.func = 'zscore') dcEvaluate(sim102, resStd) dcEvaluate(sim102, resStd, combine = FALSE)
Returns a list of differential co-expression methods
dcMethods()
dcMethods()
names of methods implemented
dcMethods()
dcMethods()
Threshold the results from a differential co-expression analysis and create a differential network.
dcNetwork(dcscores, dcpvals = NULL, thresh = NULL, ...)
dcNetwork(dcscores, dcpvals = NULL, thresh = NULL, ...)
dcscores |
a matrix, the result of the |
dcpvals |
a matrix or NULL, raw or adjusted p-values resulting from
|
thresh |
a numeric, threshold to apply. If |
... |
see details |
No extra arguments required for this function. The ellipsis are used to allow flexibility in pipelines.
an igraph object, representing the differential network. Scores are added as edge attributes with the name 'score'
#create data set.seed(360) x <- matrix(rnorm(120), 4, 30) cond <- rep(1:2, 15) #perform analysis - z-score zscores <- dcScore(x, cond) pvals <- dcTest(zscores, emat = x, condition = cond) pvals <- dcAdjust(pvals, p.adjust, method = 'fdr') ig <- dcNetwork(zscores, pvals, 0.1) #perform analysis - DiffCoEx dcscores <- dcScore(x, cond, dc.method = 'diffcoex') ig <- dcNetwork(dcscores, thresh = 0.001) #plot the resulting differential co-expression network igraph::plot.igraph(ig)
#create data set.seed(360) x <- matrix(rnorm(120), 4, 30) cond <- rep(1:2, 15) #perform analysis - z-score zscores <- dcScore(x, cond) pvals <- dcTest(zscores, emat = x, condition = cond) pvals <- dcAdjust(pvals, p.adjust, method = 'fdr') ig <- dcNetwork(zscores, pvals, 0.1) #perform analysis - DiffCoEx dcscores <- dcScore(x, cond, dc.method = 'diffcoex') ig <- dcNetwork(dcscores, thresh = 0.001) #plot the resulting differential co-expression network igraph::plot.igraph(ig)
Run a differential co-expression pipeline on data from a simulation experiment. A default pipeline can be used which consists of methods in the package or custom pipelines can be provided.
dcPipeline( simulation, dc.func = "zscore", precomputed = FALSE, continuous = FALSE, cond.args = list(), ... )
dcPipeline( simulation, dc.func = "zscore", precomputed = FALSE, continuous = FALSE, cond.args = list(), ... )
simulation |
a list, storing data and results generated from simulations |
dc.func |
a function or character. Character represents one of the
method names from |
precomputed |
a logical, indicating whether the precomputed inference should be used or a new one computed (default FALSE) |
continuous |
a logical, indicating whether binary or continuous conditions should be used (default FALSE). No methods implemented currently use continuous conditions. This is to allow custom methods that require continuous conditions |
cond.args |
a list, containing condition-specific arguments for the DC inference pipeline. See details |
... |
additional parameters to |
If dc.func
is a character, the existing methods in the
package will be run with their default parameters. The pipeline is as such:
dcScore -> dcTest -> dcAdjust -> dcNetwork, resulting in a igraph object.
Parameters to the independent processing steps can also be provided to this
function as shown in the examples.
If precomputed
is TRUE while dc.func
is a character,
pre-computed results will be used. These can then be evaluated using
dcEvaluate
.
Custom pipelines need to be coded into a function which can then be provided instead of a character. Functions must have the following structure:
function(emat, condition, ...)
They must return either an igraph object or an adjacency matrix stored in a base R 'matrix' or the S4 'Matrix' class, containing all genes in the expression matrix 'emat'. See examples for how the in-built functions are combined into a pipeline.
If the pipeline (in-built or custom) requires condition-specific parameters
to run, cond.args can be used to pass these. For instance, LDGM requires
lambda OR the number of edges in the target network to be specified for
each inference/condition. For the latter case and with 3 different
conditions, this can be done by setting cond.args =
list('ldgm.ntarget' = c(100, 140, 200))
. Non-specific arguments should be
passed directly to the dcPipeline
function call.
a list of igraphs, representing the differential network for each independent condition (knock-out).
plot.igraph
, dcScore
,
dcTest
, dcAdjust
, dcNetwork
,
dcMethods
data(sim102) #run a standard pipeline resStd <- dcPipeline(sim102, dc.func = 'zscore') #run a standard pipeline and specify params resParam <- dcPipeline(sim102, dc.func = 'zscore', cor.method = 'pearson') #run a standard pipeline and specify condition-specific params resParam <- dcPipeline( sim102, dc.func = 'diffcoex', #arguments for the conditions ADR1 knockdown and UME6 knockdown resp. cond.args = list(diffcoex.beta = c(6, 20)) ) #retrieve pre-computed results resPrecomputed <- dcPipeline(sim102, dc.func = 'zscore', precomputed = TRUE) #run a custom pipeline analysisInbuilt <- function(emat, condition, dc.method = 'zscore', ...) { #compute scores score = dcScore(emat, condition, dc.method, ...) #perform statistical test pvals = dcTest(score, emat, condition, ...) #adjust tests for multiple testing adjp = dcAdjust(pvals, ...) #threshold and generate network dcnet = dcNetwork(score, adjp, ...) return(dcnet) } resCustom <- dcPipeline(sim102, dc.func = analysisInbuilt) plot(resCustom[[1]])
data(sim102) #run a standard pipeline resStd <- dcPipeline(sim102, dc.func = 'zscore') #run a standard pipeline and specify params resParam <- dcPipeline(sim102, dc.func = 'zscore', cor.method = 'pearson') #run a standard pipeline and specify condition-specific params resParam <- dcPipeline( sim102, dc.func = 'diffcoex', #arguments for the conditions ADR1 knockdown and UME6 knockdown resp. cond.args = list(diffcoex.beta = c(6, 20)) ) #retrieve pre-computed results resPrecomputed <- dcPipeline(sim102, dc.func = 'zscore', precomputed = TRUE) #run a custom pipeline analysisInbuilt <- function(emat, condition, dc.method = 'zscore', ...) { #compute scores score = dcScore(emat, condition, dc.method, ...) #perform statistical test pvals = dcTest(score, emat, condition, ...) #adjust tests for multiple testing adjp = dcAdjust(pvals, ...) #threshold and generate network dcnet = dcNetwork(score, adjp, ...) return(dcnet) } resCustom <- dcPipeline(sim102, dc.func = analysisInbuilt) plot(resCustom[[1]])
Implementations and wrappers for existing implementations for methods inferring differential associations/co-expression. This method requires a matrix of expression and a binary condition to compute the differential association scores for all pairs of features (genes). Applications are not limited to analysis of gene expression data and may be used for differential associations in general.
dcScore(emat, condition, dc.method, ...) ## S4 method for signature 'matrix' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'Matrix' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'data.frame' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'ExpressionSet' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'SummarizedExperiment' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'DGEList' dcScore(emat, condition, dc.method = "zscore", ...)
dcScore(emat, condition, dc.method, ...) ## S4 method for signature 'matrix' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'Matrix' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'data.frame' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'ExpressionSet' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'SummarizedExperiment' dcScore(emat, condition, dc.method = "zscore", ...) ## S4 method for signature 'DGEList' dcScore(emat, condition, dc.method = "zscore", ...)
emat |
a matrix, Matrix, data.frame, ExpressionSet, SummarizedExperiment or DGEList |
condition |
a numeric, (with 1's and 2's representing a binary condition), a factor with 2 levels or a character representing 2 conditions |
dc.method |
a character, representing the method to use. Use
|
... |
possible arguments are |
When using data from sequencing experiments, make sure appropriate filtering for low counts and data transformation has been performed. Not doing so will affect estimation of correlation coefficients which most methods rely on.
Additional method specific parameters can be supplied to the function.
cor.method
can be set to either 'pearson' (default) or 'spearman' to
determine the method to use for estimating correlations. These are the two
measures currently supported in the package. We recommend using the
'spearman' correlation when dealing with sequencing data.
The beta parameter in the DiffCoEx method can be specified using
diffcoex.beta
(defaults to 6). This enable soft thresholding of
correlations similar to WGCNA.
EBcoexpress specific parameters include ebcoexpress.useBWMC
(defaults to TRUE
) representing whether to use the bi-weight
mid-correlation coefficient or not, and ebcoexpress.plot
which plots
the diagnostic plots if set to TRUE
(defaults to FALSE
).
LDGM specific parameters include ldgm.lambda
, ldgm.ntarget
and ldgm.iter
. ldgm.lambda
specifies the L1 regularisation
parameter to use when fitting the model. This can be tuned and specified by
the user. Alternatively, this can be tuned such that the resulting network
has a specified number of edges. In this case, ldgm.ntarget
should
be specified instead. ldgm.iter
is the maximum number of iterations
to perform when tuning ldgm.lambda
using ldgm.ntarget
(defaults to 50).
EBcoexpress and GGM-based are implemented by providing interfaces to,
or using functions from the EBcoexpress
, GeneNet
, and
COSINE
packages respectively. If using any of these methods, please
cite the appropriate packages and the appropriate methodology
articles.
a matrix, of scores/statistics representing differential associations; p-values will be returned if FTGI is used and posterior probabilities if EBcoexpress is used.
x <- matrix(rnorm(60), 2, 30) cond <- rep(1:2, 15) dcScore(x, cond) #defaults to zscore dcScore(x, cond, dc.method = 'diffcoex')
x <- matrix(rnorm(60), 2, 30) cond <- rep(1:2, 15) dcScore(x, cond) #defaults to zscore dcScore(x, cond, dc.method = 'diffcoex')
Perform statistical tests for scores generated using
dcScore
. Selects appropriate tests for the different methods used
in computing scores. The exact test is selected based on the scoring method
used and cannot be manually specified. Available tests include the z-test
and permutation tests. Parallel computation supported for the permutation
test.
dcTest(dcscores, emat, condition, ...)
dcTest(dcscores, emat, condition, ...)
dcscores |
a matrix, the result of the |
emat |
a matrix, data.frame, ExpressionSet, SummarizedExperiment or
DGEList. This should be the one passed to |
condition |
a numeric, (with 1's and 2's representing a binary
condition), a factor with 2 levels or a character representing 2
conditions. This should be the one passed to |
... |
see details |
Ensure that the score matrix passed to this function is the one
produced by dcScore
. Any modification to the result matrix will
cause this function to fail. This is intended as the test need to be
performed on the entire score matrix, not subsets.
The appropriate test is chosen automatically based on the scoring method used. A z-test is performed for the z-score method while no tests are performed for DiffCoEx, EBcoexpress and FTGI. Permutation tests are performed for the remainder of methods by permutation sample labels. Statistics from a permutation are pooled such that statistics from all scores are used to evaluate a single observed score.
Additional method specific parameters can be supplied to the function when
performing permutation tests. B
specifies the number of permutations
to be performed and defaults to 20.
If a cluster exists, computation in a permutation test will be performed in parallel (see examples).
a matrix, of p-values (or scores in the case of DiffCoEx and EBcoexpress) representing significance of differential associations. DiffCoEx will return scores as the publication specifies direct thresholding of scores and EBcoexpress returns posterior probabilities.
x <- matrix(rnorm(60), 2, 30) cond <- rep(1:2, 15) scores <- dcScore(x, cond, dc.method = 'mindy') dcTest(scores, emat = x, condition = cond) ## Not run: #running in parallel num_cores = 2 cl <- parallel::makeCluster(num_cores) doSNOW::registerDoSNOW(cl) #or doParallel set.seed(36) #for reproducibility dcTest(scores, emat = x, condition = cond, B = 100) parallel::stopCluster(cl) ## End(Not run)
x <- matrix(rnorm(60), 2, 30) cond <- rep(1:2, 15) scores <- dcScore(x, cond, dc.method = 'mindy') dcTest(scores, emat = x, condition = cond) ## Not run: #running in parallel num_cores = 2 cl <- parallel::makeCluster(num_cores) doSNOW::registerDoSNOW(cl) #or doParallel set.seed(36) #for reproducibility dcTest(scores, emat = x, condition = cond, B = 100) parallel::stopCluster(cl) ## End(Not run)
This function packs the entire DC analysis pipeline using the z-score method. It simplifies the implementation of the analysis and increases the flexibility of the analysis (not just limited to all pairwise comparisons).
dcZscore( emat, condition, from = NULL, to = NULL, fdrthresh = 0.1, cor.method = c("spearman", "pearson") )
dcZscore( emat, condition, from = NULL, to = NULL, fdrthresh = 0.1, cor.method = c("spearman", "pearson") )
emat |
a matrix, Matrix, data.frame, ExpressionSet, SummarizedExperiment or DGEList |
condition |
a numeric, (with 1's and 2's representing a binary condition), a factor with 2 levels or a character representing 2 conditions |
from |
a character vector, with the names of nodes from which comparisons need to be performed. |
to |
a character vector, with the names of nodes to which comparisons need to be performed. |
fdrthresh |
a numeric, specifying the FDR cutoff to apply to the inferred network. |
cor.method |
a character, either 'spearman' (default) or 'pearson' specifying the correlation computation method to use. |
an igraph object, containing the differential coexpression network.
x <- matrix(rnorm(60), 10, 30) rownames(x) = 1:10 cond <- rep(1:2, 15) dcZscore(x, cond) dcZscore(x, cond, to = 1:2)
x <- matrix(rnorm(60), 10, 30) rownames(x) = 1:10 cond <- rep(1:2, 15) dcZscore(x, cond) dcZscore(x, cond, to = 1:2)
Retrieves the simulated expression matrix and sample classification for a specific knock-down experiment.
getSimData(simulation, cond.name = NULL, full = FALSE) getConditionNames(simulation) getTrueNetwork( simulation, cond.name = NULL, truth.type = c("association", "influence", "direct"), full = FALSE )
getSimData(simulation, cond.name = NULL, full = FALSE) getConditionNames(simulation) getTrueNetwork( simulation, cond.name = NULL, truth.type = c("association", "influence", "direct"), full = FALSE )
simulation |
a list, storing data and results generated from simulations |
cond.name |
a character, indicating the knock-down to use to derive
conditions. Multiple knock-downs (KDs) are performed per simulation. If
|
full |
a logical, indicating whether genes associated with the condition
should be excluded. Defaults to |
truth.type |
a character, specifying which level of the true network to retrieve: 'association' (default), 'influence' or 'direct' |
Genes discarded when full
is FALSE
are those that are
solely dependent on the condition. These genes are discarded from the
analysis to focus on those that are differentially co-expressed, not
coordinately co-expressed.
The names of all genes knocked-out can be retrieved using
getConditionNames
.
The direct, influence and association networks represent different levels of true differential networks. The direct network contains differential regulatory interactions present in the original network. The influence network includes upstream interactions and the association network includes non-causative differential interactions.
a list, containing emat
, a matrix representing the expression
data, condition
, a numeric containing the classification of samples,
and , condition_c
, a numeric containing the expression levels of the
KD gene (continuous condition) for getSimData
; the names of all genes
that are KD for getConditionNames
; and an adjacency matrix for
getTrueNetwork
.
getSimData
: get the expression matrix and sample classification
getConditionNames
: get names of the conditions (KDs)
getTrueNetwork
: get the true differential network
data(sim102) KDs <- getConditionNames(sim102) #get simulated data simdata <- getSimData(sim102, KDs[2]) cond <- simdata$condition emat <- simdata$emat zscores <- dcScore(emat, cond) #get the true network to evaluate against truenet <- getTrueNetwork(sim102, KDs[2], truth.type = 'association')
data(sim102) KDs <- getConditionNames(sim102) #get simulated data simdata <- getSimData(sim102, KDs[2]) cond <- simdata$condition emat <- simdata$emat zscores <- dcScore(emat, cond) #get the true network to evaluate against truenet <- getTrueNetwork(sim102, KDs[2], truth.type = 'association')
Computes the mutual information between all pairs of variables in the matrix (along the columns). Variables are discretised using the adaptive partitioning algorithm
mi.ap(mat)
mi.ap(mat)
mat |
a numeric matrix |
matrix of pairwise mutual information estimates
x <- matrix(rnorm(200), 100, 2) mi.ap(x)
x <- matrix(rnorm(200), 100, 2) mi.ap(x)
Returns a list of performance metrics
perfMethods()
perfMethods()
names of methods implemented
perfMethods()
perfMethods()
Quantify the performance of a classification algorithm. Predictions and truth both have to be binary.
performanceMeasure(pred, obs, perf.method = "f.measure", ...)
performanceMeasure(pred, obs, perf.method = "f.measure", ...)
pred |
a logical or numeric, where 0 and FALSE represent control, and, 1 and TRUE represent cases |
obs |
a logical or numeric, where 0 and FALSE represent control, and, 1 and TRUE represent cases |
perf.method |
a character, specifying the method to use. Available
methods can be accessed using |
... |
additional parameters to methods. see details |
The F-measure requires the beta parameter which can be specified
using f.beta
which defaults to 1 thereby computing the F1-measure.
a numeric, representing the performance
pred <- sample(0:1, 100, replace = TRUE, prob = c(0.75, 0.25)) obs <- sample(0:1, 100, replace = TRUE, prob = c(0.75, 0.25)) #compute the F1 and F2 scores f1 <- performanceMeasure(pred, obs) f2 <- performanceMeasure(pred, obs, f.beta = 2)
pred <- sample(0:1, 100, replace = TRUE, prob = c(0.75, 0.25)) obs <- sample(0:1, 100, replace = TRUE, prob = c(0.75, 0.25)) #compute the F1 and F2 scores f1 <- performanceMeasure(pred, obs) f2 <- performanceMeasure(pred, obs, f.beta = 2)
Plots either the source network or the true differential network for all KDs performed in the simulation. KD nodes are coloured with their resulting differential networks coloured accordingly.
plotSimNetwork( simulation, what = c("source", "direct", "influence", "association"), ... )
plotSimNetwork( simulation, what = c("source", "direct", "influence", "association"), ... )
simulation |
a list, storing data and results generated from simulations |
what |
a character, indicating which network to retrieve, 'source' (default), 'direct', 'influence' or 'association' |
... |
additional parameters to |
The direct, influence and association networks represent different levels of true differential networks. The direct network contains differential regulatory interactions present in the original network. The influence network includes upstream interactions and the association network includes non-causative differential interactions.
a plot of the network
data(sim102) plotSimNetwork(sim102) plotSimNetwork(sim102, what = 'direct') plotSimNetwork(sim102, what = 'influence') plotSimNetwork(sim102, what = 'association')
data(sim102) plotSimNetwork(sim102) plotSimNetwork(sim102, what = 'direct') plotSimNetwork(sim102, what = 'influence') plotSimNetwork(sim102, what = 'association')
A dataset containing simulated expression dataset. Data is simulated using a dynamical systems model from a network sampled from the S. Cerevisiae regulatory network. The dataset is a list containing the results from the simulation, and other information generated subsequently.
sim102
sim102
A named list with 14 elements:
a numeric, indicating the iteration of the simulation (a total of 1000 were performed and 812 converged)
an S4 Matrix, containing vectorised inference scores of applying the methods implemented in the package. These are precomputed predictions
a named list, storing the parameters used to sample the initial values of input genes. Proportions, means and variances of each gene is stored for each gene
an igraph object, storing the initial regulatory network (150 node network)
an igraph object, representing the true differential network as determined using sensitivity analysis of the model
a matrix (150 x 2), storing the (x, y) positions of nodes for laying out the graph
a numeric, network density of the true differential association network
a numeric, the number of input genes in the regulatory network. These are genes that have no regulators therefore need to be pre-defined
a numeric, the number of input genes that are knocked-down therefore have a bimodal distribution
a numeric, the number of genes in the network that regulate any other gene (are TFs)
a numeric, the number of genes that are co-regulated, i.e. regulated by more than one TF
an S4 Matrix, the expression data with samples along the columns and genes along the rows. Condition classification (KD vs WT) are stored as attributes of this object
a data frame, consisting of gene triplets representing TF- Target associations conditioned on the gene knocked-down. Triplets are annotated for being in either the direct, influence and association networks
an S4 Matrix, sensitivities of genes to TFs based on perturbation analysis of the simulation model
LINK TO PAPERRRR