Title: | TargetScore: Infer microRNA targets using microRNA-overexpression data and sequence information |
---|---|
Description: | Infer the posterior distributions of microRNA targets by probabilistically modelling the likelihood of microRNA-overexpression fold-changes and sequence-based scores. A variational Bayesian Gaussian mixture model (VB-GMM) is applied to log fold-changes and sequence scores to obtain the posteriors of the latent variable indicating the miRNA targets. The final targetScore is computed as the sigmoid-transformed fold-change weighted by the averaged posteriors of target components over all of the features. |
Authors: | Yue Li |
Maintainer: | Yue Li <[email protected]> |
License: | GPL-2 |
Version: | 1.45.0 |
Built: | 2024-10-31 05:38:22 UTC |
Source: | https://github.com/bioc/TargetScore |
Infer the posterior distributions of microRNA targets by probabilistically modelling the likelihood of microRNA-overexpression fold-changes and sequence-based scores. A variational Bayesian Gaussian mixture model (VB-GMM) is applied to log fold-changes and sequence scores to obtain the posteriors of the latent variable indicating the miRNA targets. The final targetScore is computed as the sigmoid-transformed fold-change weighted by the averaged posteriors of target components over all of the features.
Package: | TargetScore |
Type: | Package |
Version: | 1.1.5 |
Date: | 2013-10-15 |
License: | GPL-2 |
The front-end main function targetScore should be used to obtain the probabilistic score of a miRNA target. The workhorse function is vbgmm, which implements a multivariate variational Bayesian Gaussian mixture model.
Yue Li <[email protected]>
Lim, L. P., Lau, N. C., Garrett-Engele, P., Grimson, A., Schelter, J. M., Castle, J., Bartel, D. P., Linsley, P. S., and Johnson, J. M. (2005). Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature, 433(7027), 769-773.
Bartel, D. P. (2009). MicroRNAs: Target Recognition and Regulatory Functions. Cell, 136(2), 215-233.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, Information Science and Statistics. NY, USA. (p474-486)
library(TargetScore)
ls("package:TargetScore")
Depending on the dimensions of x, repeat y either by row or by column and apply the element-wise operation defined by func.
bsxfun.se(func, x, y, expandByRow = TRUE)
func |
function with two or more input parameters. |
x, y |
two vectors, matrices, or arrays |
expandByRow |
whether to expand y by row or by column of x when nrow(x)==ncol(x)==length(y) |
The function is used by vbgmm.
func(x, y) |
A matrix having the same dimensions as x. |
Internal function.
Yue Li
bsxfun.se("*", matrix(c(1:10), nrow=2), matrix(c(1:5), nrow=5))
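The row/column expansion described above can be sketched in base R with sweep(). This is only an illustration under assumptions: broadcast_apply and its by_row argument are hypothetical names, not part of TargetScore, and the package's own expandByRow semantics may differ.

```r
# Hypothetical helper (not the package's bsxfun.se): broadcast y across x.
broadcast_apply <- function(func, x, y, by_row = TRUE) {
  x <- as.matrix(x)
  y <- as.vector(y)
  # MARGIN = 2 pairs y[j] with column j (y is repeated down the rows);
  # MARGIN = 1 pairs y[i] with row i (y is repeated across the columns).
  margin <- if (by_row) 2 else 1
  stopifnot(length(y) == dim(x)[margin])
  sweep(x, margin, y, FUN = func)
}

x <- matrix(1:10, nrow = 2)   # 2 x 5 matrix
y <- 1:5                      # one value per column of x
res <- broadcast_apply("*", x, y)
```

The result keeps the dimensions of x, which matches the Value description of a matrix with the same dimensions as x.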
Same as dot, but handles a single-row matrix differently: the values are multiplied element-wise and not summed.
dot.ext(x, y, mydim)
x |
numeric vector or matrix |
y |
numeric vector or matrix |
mydim |
Elementwise product (if 1); otherwise defined by |
Returns the 'dot' or 'scalar' product of vectors or columns of matrices. Two vectors must be of the same length; two matrices must be of the same size. If x and y are column or row vectors, their dot product will be computed IF mydim is 1 (the only difference from dot).
A scalar, or a vector whose length is the number of columns of x and y.
Yue Li
dot.ext(1:5, 1:5)
dot.ext(1:5, 1:5, 1)
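To make the distinction between a column-wise dot product and an element-wise product concrete, here is a minimal base-R sketch. The helper col_dot is a hypothetical name used only for illustration; it does not reproduce the mydim handling of dot.ext.

```r
# Hypothetical helper: dot product of matching columns of x and y.
col_dot <- function(x, y) {
  x <- as.matrix(x)   # a plain vector becomes a one-column matrix
  y <- as.matrix(y)
  stopifnot(all(dim(x) == dim(y)))
  colSums(x * y)      # multiply element-wise, then sum within each column
}

d_vec <- col_dot(1:5, 1:5)                                 # single dot product
d_mat <- col_dot(matrix(1:4, nrow = 2), matrix(1:4, nrow = 2))
elementwise <- (1:5) * (1:5)                               # products kept, not summed
```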
Obtain for each gene the targetScore using pre-computed (logFC) TargetScan context score and PCT as sequence scores. The TargetScoreData package is needed.
getTargetScores(mirID, logFC, ...)
mirID |
A character string of microRNA ID (e.g., hsa-miR-1) |
logFC |
N x D numeric vector or matrix of logFC with D replicates for N genes. |
... |
Parameters passed to |
This is a convenience function for computing targetScore for a human miRNA using user-supplied or pre-computed logFC and (if available) two pre-computed sequence scores, namely TargetScan context score and PCT (probability of conserved targeting). The function also searches for any validated targets from the MirTarBase human validated target list. The function requires TargetScoreData to be installed first.
targetScores |
numeric matrix of probabilistic targetScores together with the input variables and a binary vector indicating whether each gene is a validated target (if available). |
Yue Li
Lim, L. P., Lau, N. C., Garrett-Engele, P., Grimson, A., Schelter, J. M., Castle, J., Bartel, D. P., Linsley, P. S., and Johnson, J. M. (2005). Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature, 433(7027), 769-773.
Bartel, D. P. (2009). MicroRNAs: Target Recognition and Regulatory Functions. Cell, 136(2), 215-233.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, Information Science and Statistics. NY, USA. (p474-486)
if (interactive()) {
  library(TargetScoreData)
  library(Biobase)
  library(GEOquery)

  # compute targetScore from pre-computed logFC and sequence scores
  # for hsa-miR-1
  mir1.score <- getTargetScores("hsa-miR-1", tol=1e-3, maxiter=200)

  # download fold-change data from GEO for hsa-miR-124 overexpression in HeLa
  gset <- getGEO("GSE2075", GSEMatrix=TRUE, AnnotGPL=TRUE)
  if (length(gset) > 1) idx <- grep("GPL1749", attr(gset, "names")) else idx <- 1
  gset <- gset[[idx]]
  sampleinfo <- as.character(pData(gset)$title)
  geneInfo <- fData(gset)

  # only 24h data are used (discard 12h data)
  logfc.mir124 <- as.matrix(exprs(gset)[,
    grep("HeLa transfected with miR-1 versus control transfected HeLa, 24 hours", sampleinfo)])
  rownames(logfc.mir124) <- geneInfo$`Gene symbol`

  mir124.score <- getTargetScores("hsa-miR-124", logfc.mir124, tol=1e-3, maxiter=200)
  head(mir124.score)
}
Initialize latent variables based on the number of components. The function is run before the VB-EM iteration in vbgmm.
initialization(X, init)
X |
D x N numeric vector or matrix of observations |
init |
Based on its dimension, init is expected to be one of the following: scalar: number of components; vector: initial class labels; matrix: initialize with a D x K matrix for D variables and K components. |
The function is expected to be used by vbgmm to initialize the assignments of latent variables before the VB-EM iterations.
R |
N by K matrix for N observations and K latent components (defined by init) |
Yue Li
Mo Chen (2012). Matlab code for Variational Bayesian Inference for Gaussian Mixture Model. http://www.mathworks.com/matlabcentral/fileexchange/35362-variational-bayesian-inference-for-gaussian-mixture-model
tmp <- initialization(matrix(c(rnorm(100,mean=2), rnorm(100,mean=3)),nrow=1), init=2)
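When init is a vector of class labels, the returned N by K matrix can be pictured as a hard (one-hot) assignment of each observation to a component. The sketch below is an assumption about that shape only; labels_to_R is a hypothetical helper and is not claimed to be what initialization does internally.

```r
# Hypothetical helper: turn integer class labels into an N x K indicator matrix.
labels_to_R <- function(label) {
  k <- max(label)                              # number of components K
  R <- matrix(0, nrow = length(label), ncol = k)
  R[cbind(seq_along(label), label)] <- 1       # one entry of 1 per row
  R
}

R <- labels_to_R(c(1, 2, 2, 1, 3))             # N = 5 observations, K = 3
```

Each row sums to one, so the matrix can be read as a degenerate posterior over the K components.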
Compute the logarithm of the multivariate Gamma function.
logmvgamma(x, d)
x |
numeric vector or matrix |
d |
dimension |
$$\Gamma_p(x) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left(x + \frac{1-j}{2}\right)$$
$$\log \Gamma_p(x) = \frac{p(p-1)}{4} \log \pi + \sum_{j=1}^{p} \log \Gamma\left(x + \frac{1-j}{2}\right)$$
Matrix of the same dimension as x.
Yue Li
Mo Chen (2012). Matlab code for Variational Bayesian Inference for Gaussian Mixture Model. http://www.mathworks.com/matlabcentral/fileexchange/35362-variational-bayesian-inference-for-gaussian-mixture-model
logmvgamma(matrix(1:6,nrow=3), 2)
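The log-form of the formula above translates directly into base R with lgamma(). The function below is a minimal sketch, assuming the dimension argument d plays the role of p; log_mv_gamma is a hypothetical name, not the package's logmvgamma.

```r
# Hypothetical re-implementation of the log multivariate Gamma formula.
log_mv_gamma <- function(x, d) {
  x <- as.matrix(x)
  # constant term: d(d-1)/4 * log(pi)
  total <- matrix(d * (d - 1) / 4 * log(pi), nrow(x), ncol(x))
  # sum over j = 1..d of log Gamma(x + (1 - j)/2), element-wise in x
  for (j in seq_len(d)) {
    total <- total + lgamma(x + (1 - j) / 2)
  }
  total
}

out <- log_mv_gamma(matrix(1:6, nrow = 3), 2)
```

For d = 1 the expression reduces to the ordinary lgamma(x), which is a quick sanity check.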
Compute log(sum(exp(x),dim)) while avoiding numerical underflow.
logsumexp(x, margin = 1)
x |
numeric vector or matrix |
margin |
dimension to apply summation |
numeric vector or matrix of the same columns or rows (depending on margin) as x
Yue Li
Mo Chen (2012). Matlab code for Variational Bayesian Inference for Gaussian Mixture Model. http://www.mathworks.com/matlabcentral/fileexchange/35362-variational-bayesian-inference-for-gaussian-mixture-model
logsumexp(matrix(c(1:5)), 2)
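The usual way to avoid underflow in log(sum(exp(x))) is to subtract the maximum before exponentiating and add it back afterwards. The sketch below shows that trick in base R; log_sum_exp is a hypothetical name, and its margin convention (1 = per row, 2 = per column, as in apply) is an assumption that may not match the package's logsumexp.

```r
# Hypothetical helper: numerically stable log(sum(exp(x))) along one margin.
log_sum_exp <- function(x, margin = 1) {
  x <- as.matrix(x)
  m <- apply(x, margin, max)                   # per-row or per-column maximum
  shifted <- sweep(x, margin, m)               # x - max, so the largest term is exp(0)
  m + log(apply(exp(shifted), margin, sum))
}

small  <- matrix(c(-1000, -1000), nrow = 1)
stable <- log_sum_exp(small, 1)                # finite: -1000 + log(2)
naive  <- log(sum(exp(small)))                 # exp(-1000) underflows to 0
```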
Sort Gaussian mixture components with model parameters in increasing order of the averaged means of the d variables.
sort_components(model)
model |
A list containing trained parameters of the Bayesian GMM (see Value section in |
VB-GMM model list in increasing order of averaged means.
Yue Li
tmp <- vbgmm(c(rnorm(100,mean=2), rnorm(100,mean=3)), tol=1e-3)
tmp$mu
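Ordering components by the average of their means over the variables can be sketched as follows. The D x K layout of mu (variables in rows, components in columns) and the toy values are assumptions made for illustration; the sketch does not touch the other model parameters that sort_components reorders.

```r
# Toy means: D = 2 variables (rows) x K = 3 components (columns); values are made up.
mu <- cbind(c(3.1, 0.4), c(-1.2, -0.8), c(0.2, 0.1))

avg_mean  <- colMeans(mu)              # average mean of each component over the D variables
ord       <- order(avg_mean)           # component indices in increasing order
mu_sorted <- mu[, ord, drop = FALSE]   # the same permutation would apply to other parameters
```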
Given the overexpression fold-change and sequence-scores (optional) of all of the genes, calculate for each gene the TargetScore as a probability of miRNA target.
targetScore(logFC, seqScores, ...)
logFC |
numeric vector of log fold-changes of N genes in treatment (miRNA overexpression) vs control (mock). |
seqScores |
N x D numeric vector or matrix of D sequence-scores for N genes. Each score vector is expected to be equal to or less than 0. The more negative the scores, the more likely the corresponding target. |
... |
Parameters passed to |
Given expression fold-change (due to miRNA transfection), we use a three-component VB-GMM to infer down-regulated targets, accounting for genes with little or positive log fold-change (due to off-target effects; Khan et al., 2009). A two-component VB-GMM is applied to the unsigned sequence scores (seqScores). The parameters of the VB-GMM are optimized using the Variational Bayesian Expectation-Maximization (VB-EM) algorithm. Presumably, the mixture component with the largest absolute mean of observed negative fold-change or sequence score is associated with miRNA targets and is denoted the "target component". The other components correspond to the "background component". It follows that inferring the miRNA-mRNA interactions most likely explained by the observed data is equivalent to inferring the posterior distribution of the target component given the observed variables. The targetScore is computed as the sigmoid-transformed fold-change weighted by the averaged posteriors of target components over all of the features. Specifically, we define the targetScore as a composite probabilistic score of a gene being the target t of a miRNA:
$$\mathrm{sigmoid}(-\mathrm{logFC}) \cdot \frac{1}{L+1} \sum_{x \in \{x_f, x_1, \ldots, x_L\}} p(t \mid x),$$
where $\mathrm{sigmoid}(-\mathrm{logFC}) = 1/(1 + \exp(\mathrm{logFC}))$, $x_f$ is the log fold-change, $x_1, \ldots, x_L$ are the L sequence scores, and $p(t \mid x)$ is the posterior of the first component computed by vbgmm.
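As a worked illustration of the composite score, the sketch below combines a sigmoid-transformed log fold-change with the row-wise average of per-feature target posteriors. The posterior values are made up for the example (in practice they would come from vbgmm), and composite_score is a hypothetical helper, not the package's targetScore.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical helper: post is an N x (L + 1) matrix holding p(t | x) for the
# fold-change feature and each of the L sequence scores.
composite_score <- function(logFC, post) {
  post <- as.matrix(post)
  stopifnot(nrow(post) == length(logFC))
  sigmoid(-logFC) * rowMeans(post)     # sigmoid(-logFC) * 1/(L+1) * sum of posteriors
}

logFC <- c(-2, 0, 1.5)                 # three genes: down-regulated, unchanged, up-regulated
post  <- cbind(fc   = c(0.90, 0.20, 0.05),
               seqA = c(0.80, 0.10, 0.10),
               seqB = c(0.70, 0.30, 0.00))
score <- composite_score(logFC, post)
```

Because both factors lie in [0, 1], the score does too, and the strongly down-regulated gene with high posteriors ranks first.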
targetScore |
numeric vector of probabilistic targetScores for N genes |
Yue Li
Lim, L. P., Lau, N. C., Garrett-Engele, P., Grimson, A., Schelter, J. M., Castle, J., Bartel, D. P., Linsley, P. S., and Johnson, J. M. (2005). Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature, 433(7027), 769-773.
Bartel, D. P. (2009). MicroRNAs: Target Recognition and Regulatory Functions. Cell, 136(2), 215-233.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, Information Science and Statistics. NY, USA. (p474-486)
# A toy example:
# 10 down-reg, 1000 unchanged, 90 up-reg genes
# due to overexpressing a miRNA
trmt <- c(rnorm(10,mean=0.01), rnorm(1000,mean=1), rnorm(90,mean=2)) + 1e3
ctrl <- c(rnorm(1100,mean=1)) + 1e3
logFC <- log2(trmt) - log2(ctrl)

# 8 out of the 10 down-reg genes have prominent seq score A
seqScoreA <- c(rnorm(8,mean=-2), rnorm(1092,mean=0))

# 10 down-reg genes plus 10 more genes have prominent seq score B
seqScoreB <- c(rnorm(20,mean=-2), rnorm(1080,mean=0))

seqScores <- cbind(seqScoreA, seqScoreB)
p.targetScore <- targetScore(logFC, seqScores, tol=1e-3)
Given a N x D matrix of N observations and D variables, compute VB-GMM via VB-EM.
vbgmm(data, init = 2, prior, tol = 1e-20, maxiter = 2000, mirprior = TRUE, expectedTargetFreq = 0.01, verbose = FALSE)
data |
N x D numeric vector or matrix of N observations (rows) and D variables (columns) |
init |
Based on its dimension, init is expected to be one of the following: scalar: number of components; vector: initial class labels; matrix: initialize with a D x K matrix for D variables and K components. |
prior |
A list containing the hyperparameters including alpha (Dirichlet), m (Gaussian mean), kappa (Gaussian variance), v (Wishart degree of freedom), M (Wishart precision matrix). |
tol |
Threshold that defines termination/convergence of VB-EM when abs(L[t] - L[t-1])/abs(L[t]) < tol |
maxiter |
Scalar for maximum number of EM iterations |
mirprior |
Boolean to indicate whether to use expectedTargetFreq to initialize alpha0 for the hyperparameters of Dirichlet. |
expectedTargetFreq |
Expected target frequency within the gene population. By default, it is set to 0.01, which is consistent with the widely accepted prior knowledge of about 200 targets per miRNA among 20000 genes (200/20000). |
verbose |
Boolean indicating whether to show progress in terms of lower bound ( |
The function implements the variational Bayesian multivariate GMM described in Bishop (2006). Please refer to the reference below for more details. This is the workhorse of targetScore. Alternatively, users can apply this function to problems other than miRNA target prediction.
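The stopping rule given for tol, abs(L[t] - L[t-1])/abs(L[t]) < tol, can be sketched on a made-up sequence of lower-bound values. The helper has_converged and the numbers in L are hypothetical and serve only to show how the relative change is evaluated.

```r
# Hypothetical helper: relative-change test on the lower bound at iteration t.
has_converged <- function(L, t, tol = 1e-3) {
  t > 1 && abs(L[t] - L[t - 1]) / abs(L[t]) < tol
}

L <- c(-500, -420, -401, -400.9, -400.89)      # made-up lower-bound trace
flags <- vapply(seq_along(L), function(t) has_converged(L, t), logical(1))
first_stop <- which(flags)[1]                  # first iteration satisfying the rule
```

With these values the relative change first drops below 1e-3 at the fourth iteration; a smaller tol, such as the default 1e-20, would let the loop run until maxiter on the same trace.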
A list containing:
label |
a vector of maximum-a-posteriori (MAP) assignments of latent discrete values based on the posteriors of latent variables. |
R |
N x K matrix of posteriors of latent variables for N observations and K components |
mu |
Gaussian means of the latent components |
full.model |
A list containing posteriors R, logR, and the model parameters including alpha (Dirichlet), m (Gaussian mean), kappa (Gaussian variance), v (Wishart degree of freedom), M (Wishart precision matrix) |
L |
A vector of variational lower bounds, one per EM iteration (should be strictly increasing) |
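The relation between the posterior matrix R and the MAP label can be sketched with base R's max.col(). The small matrix below is made up, and the tie-breaking rule shown is an assumption; vbgmm may resolve ties differently.

```r
# Made-up posteriors for N = 3 observations over K = 2 components.
R <- rbind(c(0.9, 0.1),
           c(0.3, 0.7),
           c(0.5, 0.5))

# MAP assignment: index of the largest posterior in each row.
# ties.method = "first" makes the result deterministic for the tied third row.
label <- max.col(R, ties.method = "first")
```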
Yue Li
Mo Chen (2012). Matlab code for Variational Bayesian Inference for Gaussian Mixture Model. http://www.mathworks.com/matlabcentral/fileexchange/35362-variational-bayesian-inference-for-gaussian-mixture-model
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, Information Science and Statistics. NY, USA. (p474-486)
X <- c(rnorm(100,mean=2), rnorm(100,mean=3))
tmp <- vbgmm(X, tol=1e-3)
names(tmp)
Evaluate variational lower bound to determine when to stop VB-EM iteration (convergence).
vbound(X, model, prior)
X |
D x N numeric vector or matrix of N observations (columns) and D variables (rows) |
model |
List containing model parameters (see |
prior |
numeric vector or matrix containing the hyperparameters for the prior distributions |
A continuous scalar indicating the lower bound (the higher the more converged)
X is expected to be D x N for N observations (columns) and D variables (rows)
Yue Li
Mo Chen (2012). Matlab code for Variational Bayesian Inference for Gaussian Mixture Model. http://www.mathworks.com/matlabcentral/fileexchange/35362-variational-bayesian-inference-for-gaussian-mixture-model
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, Information Science and Statistics. NY, USA. (p474-486)
X <- c(rnorm(100,mean=2), rnorm(100,mean=3))
tmp <- vbgmm(X, tol=1e-3)
head(tmp$L) # lower bound should be strictly increasing
The E step in VB-EM iteration.
vexp(X, model)
X |
D x N numeric vector or matrix of N observations (columns) and D variables (rows) |
model |
List containing model parameters (see |
model |
A list containing the updated model parameters including alpha (Dirichlet), m (Gaussian mean), kappa (Gaussian variance), v (Wishart degree of freedom), M (Wishart precision matrix). |
X is expected to be D x N for N observations (columns) and D variables (rows)
Yue Li
Mo Chen (2012). Matlab code for Variational Bayesian Inference for Gaussian Mixture Model. http://www.mathworks.com/matlabcentral/fileexchange/35362-variational-bayesian-inference-for-gaussian-mixture-model
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, Information Science and Statistics. NY, USA. (p474-486)
X <- c(rnorm(100,mean=2), rnorm(100,mean=3))
tmp <- vbgmm(X, tol=1e-3)
dim(tmp$R)
head(tmp$R)
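In the variational E step of Bishop (2006), the posteriors are obtained by normalizing unnormalized log-responsibilities across components, r_nk = rho_nk / sum_j rho_nj. The sketch below shows only that normalization, done in log space for stability; normalize_log_rho and the input values are hypothetical, and the computation of log rho itself is not shown.

```r
# Hypothetical helper: normalize an N x K matrix of unnormalized log-responsibilities.
normalize_log_rho <- function(log_rho) {
  m <- apply(log_rho, 1, max)                       # per-observation maximum
  log_norm <- m + log(rowSums(exp(log_rho - m)))    # stable log of the row sums
  log_R <- log_rho - log_norm
  list(R = exp(log_R), logR = log_R)
}

log_rho <- rbind(c(-1000, -1001),                   # would underflow if exponentiated directly
                 c(0, log(3)))
est <- normalize_log_rho(log_rho)
```

Each row of est$R sums to one, so it can be read as a posterior over the K components.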
The M step in VB-EM iteration.
vmax(X, model, prior)
X |
D x N numeric vector or matrix of N observations (columns) and D variables (rows) |
model |
List containing model parameters (see |
prior |
List containing the hyperparameters defining the prior distributions |
model |
A list containing the updated model parameters including alpha (Dirichlet), m (Gaussian mean), kappa (Gaussian variance), v (Wishart degree of freedom), M (Wishart precision matrix). |
X is expected to be D x N for N observations (columns) and D variables (rows)
Yue Li
Mo Chen (2012). Matlab code for Variational Bayesian Inference for Gaussian Mixture Model. http://www.mathworks.com/matlabcentral/fileexchange/35362-variational-bayesian-inference-for-gaussian-mixture-model
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, Information Science and Statistics. NY, USA. (p474-486)
X <- c(rnorm(100,mean=2), rnorm(100,mean=3))
tmp <- vbgmm(X, tol=1e-3)
names(tmp$full.model)
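Part of the variational M step in Bishop (2006) updates the Dirichlet and Gaussian-mean parameters from the effective counts N_k = sum_n r_nk and weighted means of each component. The sketch below covers only those updates, assuming that kappa here corresponds to Bishop's beta; m_step_stats is a hypothetical helper, and the Wishart parameters v and M are deliberately left out.

```r
# Hypothetical helper: X is D x N, R is N x K, prior holds scalars alpha, kappa and mean m.
m_step_stats <- function(X, R, prior) {
  Nk    <- colSums(R)                          # effective number of points per component
  S     <- X %*% R                             # D x K: sum_n r_nk * x_n
  xbar  <- sweep(S, 2, Nk, "/")                # weighted component means
  alpha <- prior$alpha + Nk                    # Dirichlet update
  kappa <- prior$kappa + Nk                    # scaling of the Gaussian mean prior
  m     <- sweep(prior$kappa * prior$m + S, 2, kappa, "/")
  list(Nk = Nk, xbar = xbar, alpha = alpha, kappa = kappa, m = m)
}

X <- matrix(c(0, 2, 4, 6), nrow = 1)           # D = 1 variable, N = 4 observations
R <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1)) # hard assignments, for clarity
fit <- m_step_stats(X, R, list(alpha = 1, kappa = 1, m = 0))
```

The posterior means in fit$m are pulled from the weighted means towards the prior mean, more strongly when N_k is small relative to the prior kappa.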