Main functions demonstration
Information measures are typically computed by first discretizing
continuous variables into a count table, then estimating probabilities
from the counts, then estimating entropy from the (joint) probability
matrix, and finally calculating the information value that best
represents the association between the variables. This package adopts
two of the most common discretization methods. One is a uniform
width-based method (default) that divides the continuous data into N
bins of equal width. The other is a uniform frequency-based method that
divides the continuous data into N bins containing equal numbers of
observations. In both methods, the number of bins is initialized by
default to a rounded value based on the square root of the data size.
For probability estimation, the package provides three types of
estimators, following the entropy package[5]: the empirical estimator
(default), the Dirichlet distribution estimator and the shrinkage
estimator. The Dirichlet distribution estimator comes in four variants
that differ in their prior values. The available probability estimators
are listed below:
method = "ML": empirical estimator, also referred to as the maximum likelihood estimator,
method = "Jeffreys": Dirichlet distribution estimator with prior a = 0.5,
method = "Laplace": Dirichlet distribution estimator with prior a = 1,
method = "SG": Dirichlet distribution estimator with prior a = 1/length(count table),
method = "minimax": Dirichlet distribution estimator with prior a = sqrt(sum(count table))/length(count table),
method = "shrink": shrinkage estimator.
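As a rough illustration of the two discretization schemes and the pseudo-count (Dirichlet-type) estimators listed above, here is a minimal base R sketch. The helper names (`default_bins`, `bin_equal_width`, `bin_equal_freq`, `estimate_prob`) are hypothetical and are not part of Informeasure. The exact rounding rule for the bin count is an assumption, the data are simulated, and the shrinkage estimator is left out.

```r
# Hypothetical helpers for illustration only; not Informeasure functions.

# Default bin count: rounded square root of the sample size
# (the exact rounding rule is an assumption).
default_bins <- function(x) as.integer(round(sqrt(length(x))))

# Uniform width: split the range of x into n_bins intervals of equal width.
bin_equal_width <- function(x, n_bins = default_bins(x)) {
  scaled <- (x - min(x)) / (max(x) - min(x))  # rescale to [0, 1]
  as.integer(pmin(n_bins, floor(scaled * n_bins) + 1))
}

# Uniform frequency: bins hold (nearly) equal numbers of observations.
bin_equal_freq <- function(x, n_bins = default_bins(x)) {
  ranks <- rank(x, ties.method = "first")
  as.integer(ceiling(ranks * n_bins / length(x)))
}

# Pseudo-count estimator: add the prior a to every cell, then normalise.
estimate_prob <- function(counts,
                          method = c("ML", "Jeffreys", "Laplace", "SG", "minimax")) {
  method <- match.arg(method)
  a <- switch(method,
              ML       = 0,
              Jeffreys = 0.5,
              Laplace  = 1,
              SG       = 1 / length(counts),
              minimax  = sqrt(sum(counts)) / length(counts))
  (counts + a) / sum(counts + a)
}

# Simulated data (not the TCGA test data shipped with the package).
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)
n_bins <- default_bins(x)

# Two-way count table; factor levels keep empty bins in the table.
counts <- table(factor(bin_equal_width(x), levels = seq_len(n_bins)),
                factor(bin_equal_width(y), levels = seq_len(n_bins)))

p_ml       <- estimate_prob(counts, "ML")        # empty cells stay at 0
p_jeffreys <- estimate_prob(counts, "Jeffreys")  # every cell becomes positive
```

With a = 0 the estimator reduces to relative frequencies. Any positive prior moves probability mass into empty cells, which matters when the count table is sparse relative to the number of bins.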
MI.measure(): mutual information
In the case of two variables, the representative measure is mutual information, which quantifies the mutual dependence between two random variables. It can be used to identify dependence between proteins in protein-protein interaction network inference. The algorithm accepts two input data formats: the simple data.frame data type and the SummarizedExperiment data type.
# data.frame data type
library(Informeasure)
load(system.file("extdata/tcga.brca.testdata.Rdata", package = "Informeasure"))
mRNAexpression <- log2(mRNAexpression + 1)
x <- as.numeric(mRNAexpression[which(rownames(mRNAexpression) == "BRCA1"), ])
y <- as.numeric(mRNAexpression[which(rownames(mRNAexpression) == "BARD1"), ])
XY <- discretize2D(x,y)
MI.measure(XY)
##> [1] 0.6459387
# SummarizedExperiment data type
library(Informeasure)
library(SummarizedExperiment)
load(system.file("extdata/tcga.brca.testdata.Rdata", package = "Informeasure"))
mRNAexpression <- as.matrix(mRNAexpression)
se.mRNAexpression = SummarizedExperiment(assays = list(mRNAexpression = mRNAexpression))
assays(se.mRNAexpression)[["log2"]] <- log2(assays(se.mRNAexpression)[["mRNAexpression"]]+1)
x <- assays(se.mRNAexpression["BRCA1", ])$log2
y <- assays(se.mRNAexpression["BARD1", ])$log2
XY <- discretize2D(x,y)
MI.measure(XY)
##> [1] 0.6459387
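To make the quantity behind the examples above concrete, the following base R sketch computes mutual information from a two-way count table via I(X;Y) = H(X) + H(Y) - H(X,Y). The function names are hypothetical, the counts are made up, and the use of empirical probabilities with natural logarithms is an assumption of this sketch rather than a statement about the internals of `MI.measure()`.

```r
# Shannon entropy in nats; cells with zero probability contribute nothing.
entropy_nats <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Mutual information from a two-way count table (empirical probabilities).
mi_from_counts <- function(counts) {
  p_xy <- counts / sum(counts)
  p_x  <- rowSums(p_xy)
  p_y  <- colSums(p_xy)
  entropy_nats(p_x) + entropy_nats(p_y) - entropy_nats(p_xy)
}

# Toy count tables (made-up numbers, not the TCGA test data).
independent <- matrix(c(25, 25, 25, 25), nrow = 2)  # X and Y unrelated
dependent   <- matrix(c(50, 0, 0, 50), nrow = 2)    # Y is a copy of X

mi_independent <- mi_from_counts(independent)  # approximately 0
mi_dependent   <- mi_from_counts(dependent)    # approximately log(2)
```

The two toy tables mark the extremes for binary variables: no shared information when the cells are uniform, and one full bit (log(2) nats) when one variable determines the other.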
CMI.measure(): conditional mutual information
In the three-variable case, the classic measure is conditional mutual information, which evaluates the expected mutual information between two random variables conditioned on a third. This property makes conditional mutual information well suited to lncRNA-associated ceRNA network inference.
# data.frame data type
library(Informeasure)
load(system.file("extdata/tcga.brca.testdata.Rdata", package = "Informeasure"))
lncRNAexpression <- log2(lncRNAexpression + 1)
miRNAexpression <- log2(miRNAexpression + 1)
mRNAexpression <- log2(mRNAexpression + 1)
x <- as.numeric(miRNAexpression[which(rownames(miRNAexpression) == "hsa-miR-26a-5p"), ])
y <- as.numeric(mRNAexpression[which(rownames(mRNAexpression) == "PTEN"), ])
z <- as.numeric(lncRNAexpression[which(rownames(lncRNAexpression) == "PTENP1"), ])
XYZ <- discretize3D(x,y,z)
CMI.measure(XYZ)
##> [1] 0.7697107
# SummarizedExperiment data type
library(Informeasure)
library(SummarizedExperiment)
load(system.file("extdata/tcga.brca.testdata.Rdata", package="Informeasure"))
lncRNAexpression <- as.matrix(lncRNAexpression)
se.lncRNAexpression = SummarizedExperiment(assays = list(lncRNAexpression = lncRNAexpression))
miRNAexpression <- as.matrix(miRNAexpression)
se.miRNAexpression = SummarizedExperiment(assays = list(miRNAexpression = miRNAexpression))
mRNAexpression <- as.matrix(mRNAexpression)
se.mRNAexpression = SummarizedExperiment(assays = list(mRNAexpression = mRNAexpression))
assays(se.lncRNAexpression)[["log2"]] <- log2(assays(se.lncRNAexpression)[["lncRNAexpression"]] + 1)
assays(se.miRNAexpression)[["log2"]] <- log2(assays(se.miRNAexpression)[["miRNAexpression"]] + 1)
assays(se.mRNAexpression)[["log2"]] <- log2(assays(se.mRNAexpression)[["mRNAexpression"]] + 1)
x <- assays(se.miRNAexpression["hsa-miR-26a-5p", ])$log2
y <- assays(se.mRNAexpression["PTEN", ])$log2
z <- assays(se.lncRNAexpression["PTENP1", ])$log2
XYZ <- discretize3D(x,y,z)
CMI.measure(XYZ)
##> [1] 0.7697107
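For readers who want to see the underlying arithmetic, here is a small base R sketch of conditional mutual information computed from a three-way count array via I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z). The function names and toy counts are hypothetical, and empirical probabilities with natural logarithms are assumed; this is not the package's own implementation.

```r
# Shannon entropy in nats; cells with zero probability contribute nothing.
entropy_nats <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Conditional mutual information I(X;Y|Z) from a count array counts[x, y, z].
cmi_from_counts <- function(counts) {
  p_xyz <- counts / sum(counts)
  p_xz  <- apply(p_xyz, c(1, 3), sum)  # marginalise over y
  p_yz  <- apply(p_xyz, c(2, 3), sum)  # marginalise over x
  p_z   <- apply(p_xyz, 3, sum)        # marginalise over x and y
  entropy_nats(p_xz) + entropy_nats(p_yz) - entropy_nats(p_z) - entropy_nats(p_xyz)
}

# Toy case 1: X and Y are both copies of Z, so Z explains all of their dependence.
explained <- array(0, dim = c(2, 2, 2))
explained[1, 1, 1] <- 50
explained[2, 2, 2] <- 50

# Toy case 2: Y is a copy of X and Z is unrelated, so conditioning changes nothing.
unexplained <- array(0, dim = c(2, 2, 2))
unexplained[1, 1, ] <- 25
unexplained[2, 2, ] <- 25

cmi_explained   <- cmi_from_counts(explained)    # approximately 0
cmi_unexplained <- cmi_from_counts(unexplained)  # approximately log(2)
```

In the first toy case X and Y are strongly dependent, yet the conditional measure is zero because the third variable accounts for the whole association.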
II.measure(): interaction information
Interaction information, also known as co-information, measures the amount of information contained in a set of variables beyond that present in any subset of those variables. The number of variables here is limited to three. It can be applied to explore the cooperative or competitive regulation mechanism of two miRNAs on a common target mRNA.
# data.frame data type
library(Informeasure)
load(system.file("extdata/tcga.brca.testdata.Rdata", package = "Informeasure"))
miRNAexpression <- log2(miRNAexpression + 1)
mRNAexpression <- log2(mRNAexpression + 1)
x <- as.numeric(miRNAexpression[which(rownames(miRNAexpression) == "hsa-miR-34a-5p"), ])
y <- as.numeric(mRNAexpression[which(rownames(mRNAexpression) == "MYC"), ])
z <- as.numeric(miRNAexpression[which(rownames(miRNAexpression) == "hsa-miR-34b-5p"), ])
XYZ <- discretize3D(x,y,z)
II.measure(XYZ)
##> [1] 0.4676038
# SummarizedExperiment data type
library(Informeasure)
library(SummarizedExperiment)
load(system.file("extdata/tcga.brca.testdata.Rdata", package="Informeasure"))
miRNAexpression <- as.matrix(miRNAexpression)
se.miRNAexpression = SummarizedExperiment(assays = list(miRNAexpression = miRNAexpression))
mRNAexpression <- as.matrix(mRNAexpression)
se.mRNAexpression = SummarizedExperiment(assays = list(mRNAexpression = mRNAexpression))
assays(se.miRNAexpression)[["log2"]] <- log2(assays(se.miRNAexpression)[["miRNAexpression"]] + 1)
assays(se.mRNAexpression)[["log2"]] <- log2(assays(se.mRNAexpression)[["mRNAexpression"]] + 1)
x <- assays(se.miRNAexpression["hsa-miR-34a-5p", ])$log2
y <- assays(se.mRNAexpression["MYC", ])$log2
z <- assays(se.miRNAexpression["hsa-miR-34b-5p", ])$log2
XYZ <- discretize3D(x,y,z)
II.measure(XYZ)
##> [1] 0.4676038
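The three-variable quantity can be written entirely in terms of entropies. The base R sketch below uses the convention II = I(X;Y|Z) - I(X;Y), which is positive for synergistic and negative for redundant triplets. Sign conventions differ across the literature, so this choice is an assumption and may not match the sign used by `II.measure()`. Function names and counts are hypothetical.

```r
# Shannon entropy in nats; cells with zero probability contribute nothing.
entropy_nats <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Entropy of the marginal over the given dimensions of a 3-way array.
marginal_entropy <- function(p, dims) entropy_nats(apply(p, dims, sum))

# Interaction information under the (assumed) convention I(X;Y|Z) - I(X;Y):
#   -[H(X) + H(Y) + H(Z)] + [H(X,Y) + H(X,Z) + H(Y,Z)] - H(X,Y,Z)
ii_from_counts <- function(counts) {
  p <- counts / sum(counts)
  h_single <- marginal_entropy(p, 1) + marginal_entropy(p, 2) + marginal_entropy(p, 3)
  h_pairs  <- marginal_entropy(p, c(1, 2)) + marginal_entropy(p, c(1, 3)) +
              marginal_entropy(p, c(2, 3))
  -h_single + h_pairs - entropy_nats(p)
}

# Synergistic toy case: Z = XOR(X, Y) with X and Y independent fair bits.
xor_counts <- array(0, dim = c(2, 2, 2))
xor_counts[1, 1, 1] <- 25
xor_counts[1, 2, 2] <- 25
xor_counts[2, 1, 2] <- 25
xor_counts[2, 2, 1] <- 25

# Redundant toy case: X, Y and Z are identical copies.
copy_counts <- array(0, dim = c(2, 2, 2))
copy_counts[1, 1, 1] <- 50
copy_counts[2, 2, 2] <- 50

ii_xor  <- ii_from_counts(xor_counts)   # approximately  log(2)
ii_copy <- ii_from_counts(copy_counts)  # approximately -log(2)
```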
PID.measure(): partial information decomposition
Partial information decomposition decomposes the information that two source variables provide about a common target into four parts: joint information (synergy), unique information from x, unique information from y and shared information (redundancy). It can also be applied to explore the cooperative or competitive regulation mechanism of two miRNAs on a common target mRNA.
# data.frame data type
library(Informeasure)
load(system.file("extdata/tcga.brca.testdata.Rdata", package = "Informeasure"))
miRNAexpression <- log2(miRNAexpression + 1)
mRNAexpression <- log2(mRNAexpression + 1)
x <- as.numeric(miRNAexpression[which(rownames(miRNAexpression) == "hsa-miR-34a-5p"), ])
y <- as.numeric(miRNAexpression[which(rownames(miRNAexpression) == "hsa-miR-34b-5p"), ])
z <- as.numeric(mRNAexpression[which(rownames(mRNAexpression) == "MYC"), ])
XYZ <- discretize3D(x,y,z)
PID.measure(XYZ)
##>    Synergy  Unique_X    Unique_Y Redundancy      PID
##> 1 0.670815 0.1854147 0.003058109  0.2032112 1.062499
# SummarizedExperiment data type
library(Informeasure)
library(SummarizedExperiment)
load(system.file("extdata/tcga.brca.testdata.Rdata", package="Informeasure"))
miRNAexpression <- as.matrix(miRNAexpression)
se.miRNAexpression = SummarizedExperiment(assays = list(miRNAexpression = miRNAexpression))
mRNAexpression <- as.matrix(mRNAexpression)
se.mRNAexpression = SummarizedExperiment(assays = list(mRNAexpression = mRNAexpression))
assays(se.miRNAexpression)[["log2"]] <- log2(assays(se.miRNAexpression)[["miRNAexpression"]] + 1)
assays(se.mRNAexpression)[["log2"]] <- log2(assays(se.mRNAexpression)[["mRNAexpression"]] + 1)
x <- assays(se.miRNAexpression["hsa-miR-34a-5p", ])$log2
y <- assays(se.miRNAexpression["hsa-miR-34b-5p", ])$log2
z <- assays(se.mRNAexpression["MYC", ])$log2
XYZ <- discretize3D(x,y,z)
PID.measure(XYZ)
##>    Synergy  Unique_X    Unique_Y Redundancy      PID
##> 1 0.670815 0.1854147 0.003058109  0.2032112 1.062499
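The four parts reported above sum to the PID column (0.670815 + 0.1854147 + 0.003058109 + 0.2032112 = 1.062499). How the parts are obtained depends on the chosen redundancy measure, which the text here does not specify. The base R sketch below assumes the minimum specific information of Williams and Beer as the redundancy term, with x and y as sources and z as the target; whether `PID.measure()` uses the same definition is not confirmed. All function names and toy counts are hypothetical.

```r
entropy_nats <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Mutual information between the rows and columns of a probability table.
mi_table <- function(p) {
  entropy_nats(rowSums(p)) + entropy_nats(colSums(p)) - entropy_nats(p)
}

# Specific information of a source about each target state z:
#   sum_s p(s|z) * log( p(z|s) / p(z) ), for p_sz[source, target].
specific_info <- function(p_sz) {
  p_s <- rowSums(p_sz)
  p_z <- colSums(p_sz)
  sapply(seq_along(p_z), function(k) {
    keep <- p_sz[, k] > 0
    sum(p_sz[keep, k] / p_z[k] * log(p_sz[keep, k] / p_s[keep] / p_z[k]))
  })
}

# Decomposition for counts[x, y, z] with sources x, y and target z.
pid_from_counts <- function(counts) {
  p      <- counts / sum(counts)
  p_xz   <- apply(p, c(1, 3), sum)
  p_yz   <- apply(p, c(2, 3), sum)
  p_z    <- apply(p, 3, sum)
  # Treat the pair (x, y) as one joint source: rows = (x, y), columns = z.
  p_xy_z <- matrix(p, nrow = dim(p)[1] * dim(p)[2], ncol = dim(p)[3])

  redundancy <- sum(p_z * pmin(specific_info(p_xz), specific_info(p_yz)))
  unique_x   <- mi_table(p_xz) - redundancy
  unique_y   <- mi_table(p_yz) - redundancy
  synergy    <- mi_table(p_xy_z) - unique_x - unique_y - redundancy
  c(Synergy = synergy, Unique_X = unique_x, Unique_Y = unique_y,
    Redundancy = redundancy)
}

# Toy case 1: z = XOR(x, y); all information about z is synergistic.
xor_counts <- array(0, dim = c(2, 2, 2))
xor_counts[1, 1, 1] <- 25
xor_counts[1, 2, 2] <- 25
xor_counts[2, 1, 2] <- 25
xor_counts[2, 2, 1] <- 25

# Toy case 2: x, y and z are identical; all information is redundant.
copy_counts <- array(0, dim = c(2, 2, 2))
copy_counts[1, 1, 1] <- 50
copy_counts[2, 2, 2] <- 50

pid_xor  <- pid_from_counts(xor_counts)
pid_copy <- pid_from_counts(copy_counts)
```

By construction the four parts always add up to the mutual information between the joint source (x, y) and the target, which is why a total can be reported alongside them.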
PMI.measure(): part mutual information
Part mutual information measures nonlinear direct dependencies between two random variables given a third, especially when either of the two variables is potentially strongly correlated with the third. This property also makes part mutual information well suited to lncRNA-associated ceRNA network inference.
# data.frame data type
library(Informeasure)
load(system.file("extdata/tcga.brca.testdata.Rdata", package = "Informeasure"))
lncRNAexpression <- log2(lncRNAexpression + 1)
miRNAexpression <- log2(miRNAexpression + 1)
mRNAexpression <- log2(mRNAexpression + 1)
x <- as.numeric(miRNAexpression[which(rownames(miRNAexpression) == "hsa-miR-26a-5p"), ])
y <- as.numeric(mRNAexpression[which(rownames(mRNAexpression) == "PTEN"), ])
z <- as.numeric(lncRNAexpression[which(rownames(lncRNAexpression) == "PTENP1"), ])
XYZ <- discretize3D(x,y,z)
PMI.measure(XYZ)
##> [1] 1.074813
# SummarizedExperiment data type
library(Informeasure)
library(SummarizedExperiment)
load(system.file("extdata/tcga.brca.testdata.Rdata", package="Informeasure"))
lncRNAexpression <- as.matrix(lncRNAexpression)
se.lncRNAexpression = SummarizedExperiment(assays = list(lncRNAexpression = lncRNAexpression))
miRNAexpression <- as.matrix(miRNAexpression)
se.miRNAexpression = SummarizedExperiment(assays = list(miRNAexpression = miRNAexpression))
mRNAexpression <- as.matrix(mRNAexpression)
se.mRNAexpression = SummarizedExperiment(assays = list(mRNAexpression = mRNAexpression))
assays(se.lncRNAexpression)[["log2"]] <- log2(assays(se.lncRNAexpression)[["lncRNAexpression"]] + 1)
assays(se.miRNAexpression)[["log2"]] <- log2(assays(se.miRNAexpression)[["miRNAexpression"]] + 1)
assays(se.mRNAexpression)[["log2"]] <- log2(assays(se.mRNAexpression)[["mRNAexpression"]] + 1)
x <- assays(se.miRNAexpression["hsa-miR-26a-5p", ])$log2
y <- assays(se.mRNAexpression["PTEN", ])$log2
z <- assays(se.lncRNAexpression["PTENP1", ])$log2
XYZ <- discretize3D(x,y,z)
PMI.measure(XYZ)
##> [1] 1.074813
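As a sketch of what distinguishes this measure from conditional mutual information, the base R code below assumes the definition PMI(X;Y|Z) = sum over x, y, z of p(x,y,z) log[ p(x,y|z) / (p*(x|z) p*(y|z)) ], with p*(x|z) = sum_y p(x|z,y) p(y) and p*(y|z) = sum_x p(y|z,x) p(x). It requires a strictly positive joint probability array to avoid undefined conditionals, uses made-up probabilities, and may differ in detail from what `PMI.measure()` computes. The function names are hypothetical.

```r
# Part mutual information for a strictly positive joint array p[x, y, z].
pmi_from_probs <- function(p) {
  p_x  <- apply(p, 1, sum)
  p_y  <- apply(p, 2, sum)
  p_z  <- apply(p, 3, sum)
  p_xz <- apply(p, c(1, 3), sum)
  p_yz <- apply(p, c(2, 3), sum)
  total <- 0
  for (i in seq_len(dim(p)[1])) for (j in seq_len(dim(p)[2])) for (k in seq_len(dim(p)[3])) {
    px_star <- sum(p[i, , k] / p_yz[, k] * p_y)  # p*(x|z) = sum_y p(x|z,y) p(y)
    py_star <- sum(p[, j, k] / p_xz[, k] * p_x)  # p*(y|z) = sum_x p(y|z,x) p(x)
    total <- total + p[i, j, k] * log((p[i, j, k] / p_z[k]) / (px_star * py_star))
  }
  total
}

# Conditional mutual information of the same array, for comparison.
cmi_from_probs <- function(p) {
  h <- function(q) -sum(q * log(q))  # valid because all cells are positive
  h(apply(p, c(1, 3), sum)) + h(apply(p, c(2, 3), sum)) - h(apply(p, 3, sum)) - h(p)
}

# Toy case 1: x and y are conditionally independent given z.
p_z  <- c(0.4, 0.6)
px_z <- matrix(c(0.7, 0.3, 0.2, 0.8), nrow = 2)  # columns: p(x | z)
py_z <- matrix(c(0.5, 0.5, 0.9, 0.1), nrow = 2)  # columns: p(y | z)
cond_indep <- array(0, dim = c(2, 2, 2))
for (k in 1:2) cond_indep[, , k] <- p_z[k] * outer(px_z[, k], py_z[, k])

# Toy case 2: an arbitrary dependent table (made-up counts).
dependent <- array(c(10, 2, 3, 9, 4, 6, 7, 5), dim = c(2, 2, 2)) / 46

pmi_indep <- pmi_from_probs(cond_indep)  # approximately 0
pmi_dep   <- pmi_from_probs(dependent)
cmi_dep   <- cmi_from_probs(dependent)   # never exceeds pmi_dep under this definition
```

Under this definition the measure equals conditional mutual information plus two non-negative divergence terms, so it is never smaller. That is consistent with the outputs in this document, where `PMI.measure()` returned 1.074813 and `CMI.measure()` returned 0.7697107 for the same miRNA, mRNA and lncRNA triplet.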