Title: | Computes the Quaternary Dot Product Scoring Statistic for Signed and Unsigned Causal Graphs |
---|---|
Description: | QuaternaryProd is an R package that performs causal reasoning on biological networks, including publicly available networks such as STRINGdb. QuaternaryProd is an open-source alternative to commercial products such as Ingenuity Pathway Analysis. For a given set of differentially expressed genes, QuaternaryProd computes the significance of upstream regulators in the network by performing causal reasoning using the Quaternary Dot Product Scoring Statistic (Quaternary Statistic), the Ternary Dot Product Scoring Statistic (Ternary Statistic) and Fisher's exact test (Enrichment test). The Quaternary Statistic handles signed, unsigned and ambiguous edges in the network. Ambiguity arises when the direction of causality is unknown, or when the source node (e.g., a protein) has edges with conflicting signs for the same target gene. The Ternary Statistic, by contrast, provides causal reasoning using only the signed and unambiguous edges. The vignette provides more details on the Quaternary Statistic and illustrates how to perform causal reasoning using STRINGdb. |
Authors: | Carl Tony Fakhry [cre, aut], Ping Chen [ths], Kourosh Zarringhalam [aut, ths] |
Maintainer: | Carl Tony Fakhry <[email protected]> |
License: | GPL (>=3) |
Version: | 1.41.0 |
Built: | 2024-10-31 03:42:10 UTC |
Source: | https://github.com/bioc/QuaternaryProd |
QuaternaryProd is an R package that performs causal reasoning on biological networks, including publicly available networks such as STRINGdb. QuaternaryProd is an open-source alternative to commercial products such as Ingenuity Pathway Analysis. For a given set of differentially expressed genes, QuaternaryProd computes the significance of upstream regulators in the network by performing causal reasoning using the Quaternary Dot Product Scoring Statistic (Quaternary Statistic), the Ternary Dot Product Scoring Statistic (Ternary Statistic) and Fisher's exact test (Enrichment test). The Quaternary Statistic handles signed, unsigned and ambiguous edges in the network. Ambiguity arises when the direction of causality is unknown, or when the source node (e.g., a protein) has edges with conflicting signs for the same target gene. The Ternary Statistic, by contrast, provides causal reasoning using only the signed and unambiguous edges. The vignette provides more details on the Quaternary Statistic and illustrates how to perform causal reasoning using STRINGdb.
Package: | QuaternaryProd |
Type: | Package |
Version: | 1.15.3 |
Date: | 2015-10-22 |
License: | GPL (>= 2) |
Carl Tony Fakhry, Ping Chen and Kourosh Zarringhalam
Carl Tony Fakhry, Parul Choudhary, Alex Gutteridge, Ben Sidders, Ping Chen, Daniel Ziemek, and Kourosh Zarringhalam. Interpreting transcriptional changes using causal graphs: new methods and their practical utility on public networks. BMC Bioinformatics, 17:318, 2016. ISSN 1471-2105. doi: 10.1186/s12859-016-1181-8.
Franceschini, A. (2013). STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research, 41(Database issue):D808-15, 2013. doi: 10.1093/nar/gks1094.
This function computes the probability mass function of the Quaternary Dot Product Scoring Statistic for signed causal graphs. Only scores with probability strictly greater than zero are included.
QP_Pmf(q_p, q_m, q_z, q_r, n_p, n_m, n_z, epsilon = 1e-16)
q_p |
Expected number of positive predictions. |
q_m |
Expected number of negative predictions. |
q_z |
Expected number of nil predictions. |
q_r |
Expected number of regulated predictions. |
n_p |
Number of positive predictions from experiments. |
n_m |
Number of negative predictions from experiments. |
n_z |
Number of nil predictions from experiments. |
epsilon |
Threshold for probabilities of matrices. Default value is 1e-16. |
This function computes the probability of each score in the support of the distribution. The return value is a vector of probabilities whose names are the corresponding scores.
Matrices with probabilities smaller than epsilon*D_max are ignored, where D_max is the numerator associated with the matrix of highest probability for the given constraints. Setting epsilon to zero computes the probability mass function without ignoring any matrices. The default value of 1e-16 has been experimentally validated as a reasonable threshold. Setting the threshold to higher values (still smaller than 1) leads to underestimating the probability of each score, since more tables are ignored.
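The effect of the threshold can be illustrated with a toy sketch in base R. The weights below are made-up stand-ins for the unnormalized probabilities of individual matrices, and `retained_mass` is a hypothetical helper that is not part of QuaternaryProd; it only mimics the documented rule of ignoring matrices whose weight falls below epsilon*D_max.

```r
# Made-up unnormalized weights standing in for the matrices of one problem.
weights <- c(1, 1e-3, 1e-9, 1e-18, 1e-20)

# Hypothetical helper: fraction of the total mass that survives thresholding.
retained_mass <- function(weights, epsilon) {
  d_max <- max(weights)                       # weight of the most probable matrix
  kept <- weights[weights >= epsilon * d_max] # drop weights below epsilon * D_max
  sum(kept) / sum(weights)
}

mass_exact   <- retained_mass(weights, 0)      # nothing is ignored
mass_default <- retained_mass(weights, 1e-16)  # only negligible weights are ignored
mass_coarse  <- retained_mass(weights, 1e-2)   # all but the largest weight are ignored
```

With the coarse threshold the retained mass drops below 1, which corresponds to the underestimation described above.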
Vector of probabilities for scores in the support.
# Compute the probability mass function of the Quaternary Dot
# Product Scoring Statistic for the given table margins.
pmf <- QP_Pmf(50,50,50,0,50,50,50)
This function computes the probability of a score in the Quaternary Dot Product scoring distribution.
QP_Probability(score, q_p, q_m, q_z, q_r, n_p, n_m, n_z, epsilon = 1e-16)
score |
The score for which the probability will be computed. |
q_p |
Expected number of positive predictions. |
q_m |
Expected number of negative predictions. |
q_z |
Expected number of nil predictions. |
q_r |
Expected number of regulated predictions. |
n_p |
Number of positive predictions from experiments. |
n_m |
Number of negative predictions from experiments. |
n_z |
Number of nil predictions from experiments. |
epsilon |
Threshold for probabilities of matrices. Default value is 1e-16. |
Matrices with probabilities smaller than epsilon*D_max are ignored, where D_max is the numerator associated with the matrix of highest probability for the given constraints. Setting epsilon to zero computes the probability without ignoring any matrices. The default value of 1e-16 has been experimentally validated as a reasonable threshold. Setting the threshold to higher values (still smaller than 1) leads to underestimating the probability of each score, since more tables are ignored.
For computing p-values, the user is advised to use the p-value function (QP_Pvalue), which is optimized for that purpose.
A numeric value: the probability of the score.
QP_Pmf, QP_Pvalue, QP_SigPvalue
# Compute the probability of score 0
# for the given table margins.
prob <- QP_Probability(0,50,50,50,0,50,50,50)
This function computes the right-sided p-value for the Quaternary Dot Product Scoring Statistic.
QP_Pvalue(score, q_p, q_m, q_z, q_r, n_p, n_m, n_z, epsilon = 1e-16)
score |
The score for which the p-value will be computed. |
q_p |
Expected number of positive predictions. |
q_m |
Expected number of negative predictions. |
q_z |
Expected number of nil predictions. |
q_r |
Expected number of regulated predictions. |
n_p |
Number of positive predictions from experiments. |
n_m |
Number of negative predictions from experiments. |
n_z |
Number of nil predictions from experiments. |
epsilon |
Threshold for probabilities of matrices. Default value is 1e-16. |
Matrices with probabilities smaller than epsilon*D_max are ignored, where D_max is the numerator associated with the matrix of highest probability for the given constraints. Setting epsilon to zero computes the p-value without ignoring any matrices. The default value of 1e-16 has been experimentally validated as a reasonable threshold. Setting the threshold to higher values (still smaller than 1) leads to underestimating the probability of each score, since more tables are ignored.
A numeric value: the p-value of the score.
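As a rough illustration of what a right-sided p-value means here, the sketch below sums a probability mass function over all scores at least as large as the observed one. The `pmf` vector is invented and only mimics the named-vector shape that QP_Pmf is documented to return; `right_sided_pvalue` is a hypothetical helper, not the optimized routine behind QP_Pvalue.

```r
# Invented PMF: names are scores, values are their probabilities.
pmf <- c("-2" = 0.1, "-1" = 0.2, "0" = 0.4, "1" = 0.2, "2" = 0.1)

# Hypothetical helper: P(S >= score) computed from a named PMF vector.
right_sided_pvalue <- function(pmf, score) {
  scores <- as.integer(names(pmf))  # recover the scores from the names
  sum(pmf[scores >= score])         # add up the upper tail
}

p_upper <- right_sided_pvalue(pmf, 1)   # mass at scores 1 and 2
p_all   <- right_sided_pvalue(pmf, -2)  # the whole support
```

Summing the full PMF this way is only practical for small problems, which is presumably why a dedicated p-value function is provided.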
# Compute the p-value of score 50
# for the given table margins.
pval <- QP_Pvalue(50,50,50,50,0,50,50,50)
This function computes the right-sided p-value for the Quaternary Dot Product Scoring Statistic, but only for statistically significant scores.
QP_SigPvalue(score, q_p, q_m, q_z, q_r, n_p, n_m, n_z, epsilon = 1e-16, sig_level = 0.05)
score |
The score for which the p-value will be computed. |
q_p |
Expected number of positive predictions. |
q_m |
Expected number of negative predictions. |
q_z |
Expected number of nil predictions. |
q_r |
Expected number of regulated predictions. |
n_p |
Number of positive predictions from experiments. |
n_m |
Number of negative predictions from experiments. |
n_z |
Number of nil predictions from experiments. |
epsilon |
Threshold for probabilities of matrices. Default value is 1e-16. |
sig_level |
Significance level of test hypothesis. Default value is 0.05. |
Matrices with probabilities smaller than epsilon*D_max are ignored, where D_max is the numerator associated with the matrix of highest probability for the given constraints. Setting epsilon to zero computes the p-value without ignoring any matrices. The default value of 1e-16 has been experimentally validated as a reasonable threshold. Setting the threshold to higher values (still smaller than 1) leads to underestimating the probability of each score, since more tables are ignored. If the score is not statistically significant, a value of -1 is returned.
A numeric value: the p-value of the score if the score is statistically significant, otherwise -1.
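Because -1 is a sentinel rather than a probability, downstream code should recode it before any numeric summary. The sketch below reproduces only the documented return convention with a hypothetical helper, `sig_or_sentinel`; treating a p-value exactly equal to the significance level as significant is an assumption, and none of this is the package's implementation.

```r
# Hypothetical helper mimicking the documented convention:
# return the p-value when significant, otherwise the sentinel -1.
sig_or_sentinel <- function(pvalue, sig_level = 0.05) {
  if (pvalue <= sig_level) pvalue else -1  # boundary handling is an assumption
}

# Invented p-values for three scores.
reported <- vapply(c(0.001, 0.2, 0.04), sig_or_sentinel, numeric(1))

# Recode the sentinel as NA so that -1 is never mistaken for a small p-value.
cleaned <- ifelse(reported == -1, NA_real_, reported)
```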
# Compute the p-value of score 50
# for the given table margins.
pval <- QP_SigPvalue(50,50,50,50,0,50,50,50)
This function computes the support of the Quaternary Dot Product Scoring distribution for signed causal graphs. This includes all scores which have probabilities strictly greater than 0.
QP_Support(q_p, q_m, q_z, q_r, n_p, n_m, n_z)
q_p |
Expected number of positive predictions. |
q_m |
Expected number of negative predictions. |
q_z |
Expected number of nil predictions. |
q_r |
Expected number of regulated predictions. |
n_p |
Number of positive predictions from experiments. |
n_m |
Number of negative predictions from experiments. |
n_z |
Number of nil predictions from experiments. |
Integer vector of support.
# Compute the support of the Quaternary Dot Product Scoring
# distribution with the given margins.
QP_Support(50,50,50,0,50,50,50)
This function runs a causal relation engine by computing the Quaternary Dot Product Scoring Statistic, the Ternary Dot Product Scoring Statistic or the Enrichment test over the Homo sapiens STRINGdb causal network (version 10, provided under the Creative Commons license: https://creativecommons.org/licenses/by/3.0/). The user may also supply a different causal network through the relations and entities arguments.
RunCRE_HSAStringDB(gene_expression_data, method = "Quaternary", fc.thresh = log2(1.3), pval.thresh = 0.05, only.significant.pvalues = FALSE, significance.level = 0.05, epsilon = 1e-16, progressBar = TRUE, relations = NULL, entities = NULL)
gene_expression_data |
A data frame for gene expression data. The |
method |
Choose one of |
fc.thresh |
Threshold for fold change in |
pval.thresh |
Threshold for p-values in |
only.significant.pvalues |
If |
significance.level |
When |
epsilon |
Threshold for probabilities of matrices. Default value is |
progressBar |
Progress bar for the percentage of computed p-values for the regulators in the network. Default value is |
relations |
A data frame containing pairs of connected entities in a causal network,
and the type of causal relation between them. The data frame must have three columns named srcuid,
trguid and mode, in that order. srcuid is the source entity, trguid is the
target entity and mode is the type of relation between srcuid and trguid. The relation
must be one of +1 for upregulation, -1 for downregulation or 0 for regulation without
a specified direction. All three columns must be of type integer. Default value is |
entities |
A data frame of mappings for all entities present in the data frame relations. entities must contain
four columns: uid, id, symbol and type, in that order. uid must be
of type integer, and id, symbol and type must be of type character. uid includes every source and target
node in the network (i.e., in relations),
id is the id of uid (e.g., the Entrez id of an mRNA), symbol is the symbol of id and type
is the type of entity of id (e.g., mRNA, protein, drug or compound). Default value is |
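The column requirements for a custom network can be sketched in base R as follows. All uids, ids and symbols below are invented placeholders, and the checks at the end are only a suggested sanity test before passing the data frames on, not something the package is known to require.

```r
# Invented three-edge network: modes +1, -1 and 0 as described above.
relations <- data.frame(
  srcuid = c(1L, 1L, 2L),
  trguid = c(3L, 4L, 3L),
  mode   = c(1L, -1L, 0L)
)

# One row per node that appears as a source or target in relations.
entities <- data.frame(
  uid    = 1:4,
  id     = c("1001", "1002", "1003", "1004"),  # placeholder ids
  symbol = c("REG_A", "REG_B", "GENE_C", "GENE_D"),
  type   = c("protein", "protein", "mRNA", "mRNA"),
  stringsAsFactors = FALSE
)

# Suggested sanity checks on column order, types and node coverage.
cols_ok  <- identical(names(relations), c("srcuid", "trguid", "mode")) &&
  identical(names(entities), c("uid", "id", "symbol", "type"))
types_ok <- all(vapply(relations, is.integer, logical(1))) &&
  is.integer(entities$uid) &&
  all(vapply(entities[c("id", "symbol", "type")], is.character, logical(1)))
modes_ok <- all(relations$mode %in% c(-1L, 0L, 1L))
nodes_ok <- all(c(relations$srcuid, relations$trguid) %in% entities$uid)
```

If all four flags are TRUE, the two data frames match the documented layout and could be supplied as the relations and entities arguments.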
This function returns a data frame containing parameters concerning the method used. The p-value of each regulator is also computed, and the data frame is sorted in increasing order of the p-values of the goodness-of-fit score for the given regulators. The column names of the data frame are:
uid
The regulator in the causal network.
symbol
Symbol of the regulator.
regulation
Direction of regulation of the regulator.
correct.pred
Number of correct predictions in gene_expression_data when compared to predictions made by the network.
incorrect.pred
Number of incorrect predictions in gene_expression_data when compared to predictions made by the network.
score
The number of correct predictions minus the number of incorrect predictions.
total.reachable
Total number of children of the given regulator.
significant.reachable
Number of children of the given regulator that are also present in gene_expression_data.
total.ambiguous
Total number of children of the given regulator which are regulated by the given regulator without a known direction of regulation.
significant.ambiguous
Total number of children of the given regulator which are regulated by the given regulator without a known direction of regulation and which are also present in gene_expression_data.
unknown
Number of target nodes in the causal network which do not interact with the given regulator.
pvalue
P-value of the score computed according to the selected method. If only.significant.pvalues = TRUE and the pvalue of the regulator is greater than significance.level, then the p-value is not computed and is set to a value of -1.
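A short sketch of how such a result table might be post-processed is given below. The `results` data frame is invented and contains only a subset of the documented columns; it is not output from RunCRE_HSAStringDB.

```r
# Invented stand-in for a result table with a subset of the documented columns.
results <- data.frame(
  uid            = c(11L, 12L, 13L),
  symbol         = c("REG_A", "REG_B", "REG_C"),
  correct.pred   = c(30L, 12L, 5L),
  incorrect.pred = c(4L, 10L, 6L),
  pvalue         = c(1e-6, -1, 0.02),
  stringsAsFactors = FALSE
)

# score is documented as correct predictions minus incorrect predictions.
results$score <- results$correct.pred - results$incorrect.pred

# Drop regulators whose p-value was replaced by the -1 sentinel,
# then sort the remainder by increasing p-value.
significant <- results[results$pvalue != -1, ]
significant <- significant[order(significant$pvalue), ]
```

Filtering on the sentinel before sorting matters: otherwise rows with -1 would sort ahead of every genuine p-value.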
# Load the package
library(QuaternaryProd)

# Get gene expression data
e2f3 <- system.file("extdata", "e2f3_sig.txt", package = "QuaternaryProd")
e2f3 <- read.table(e2f3, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Rename column names appropriately and remove duplicated entrez ids
names(e2f3) <- c("entrez", "pvalue", "fc")
e2f3 <- e2f3[!duplicated(e2f3$entrez),]

# Run the Enrichment test for statistically significant
# regulators in the STRINGdb network
enrichment_results <- RunCRE_HSAStringDB(e2f3, method = "Enrichment",
                         fc.thresh = log2(1.3), pval.thresh = 0.05,
                         only.significant.pvalues = TRUE)
enrichment_results[1:4, c("uid","symbol","regulation","pvalue")]