Title: | Zero-Inflated Negative Binomial Model for RNA-Seq Data |
---|---|
Description: | Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representations of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need for pre-normalize the data. |
Authors: | Davide Risso [aut, cre, cph], Svetlana Gribkova [aut], Fanny Perraudeau [aut], Jean-Philippe Vert [aut], Clara Bagatin [aut] |
Maintainer: | Davide Risso <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.29.0 |
Built: | 2024-11-18 04:44:53 UTC |
Source: | https://github.com/bioc/zinbwave |
Given a matrix of counts, this function computes the deviance residuals under a zero-inflated negative binomial (ZINB) model.
computeDevianceResiduals(model, x, ignoreW = TRUE)
computeDevianceResiduals(model, x, ignoreW = TRUE)
model |
the zinb model |
x |
the matrix of counts n cells by J genes |
ignoreW |
logical, if true matrix |
the matrix of deviance residuals of the model.
se <- SummarizedExperiment(matrix(rpois(60, lambda=5), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) computeDevianceResiduals(m, t(assay(se)))
se <- SummarizedExperiment(matrix(rpois(60, lambda=5), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) computeDevianceResiduals(m, t(assay(se)))
Given a matrix of counts, this function computes the observational weights of the counts under a zero-inflated negative binomial (ZINB) model. For each count, the ZINB distribution is parametrized by three parameters: the mean value and the dispersion of the negative binomial distribution, and the probability of the zero component.
computeObservationalWeights(model, x)
computeObservationalWeights(model, x)
model |
the zinb model |
x |
the matrix of counts |
the matrix of observational weights computed from the model.
se <- SummarizedExperiment(matrix(rpois(60, lambda=1), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) computeObservationalWeights(m, assay(se))
se <- SummarizedExperiment(matrix(rpois(60, lambda=1), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) computeObservationalWeights(m, assay(se))
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of parameters associated with W for the mean part (mu)
getAlpha_mu(object, ...)
getAlpha_mu(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the matrix of alpha_mu parameters
a <- zinbModel(n=5, J=10) getAlpha_mu(a)
a <- zinbModel(n=5, J=10) getAlpha_mu(a)
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of parameters associated with W for the zero part (pi)
getAlpha_pi(object, ...)
getAlpha_pi(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the matrix of alpha_pi parameters
a <- zinbModel(n=5, J=10) getAlpha_pi(a)
a <- zinbModel(n=5, J=10) getAlpha_pi(a)
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of parameters associated with X_mu
getBeta_mu(object, ...)
getBeta_mu(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the matrix of beta_mu parameters
a <- zinbModel(n=5, J=10) getBeta_mu(a)
a <- zinbModel(n=5, J=10) getBeta_mu(a)
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of parameters associated with X_pi
getBeta_pi(object, ...)
getBeta_pi(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the matrix of beta_pi parameters
a <- zinbModel(n=5, J=10) getBeta_pi(a)
a <- zinbModel(n=5, J=10) getBeta_pi(a)
Given an object describing a ZINB model, returns a vector of size the number
of rows in the parameter alpha
with the regularization parameters
associated to each row. Here alpha
refers to both alpha_mu
and
alpha_pi
, which have the same size and have the same regularization.
getEpsilon_alpha(object)
getEpsilon_alpha(object)
object |
an object that describes a matrix of zero-inflated distributions. |
the regularization parameters for alpha_mu
and
alpha_pi
.
a <- zinbModel(n=5, J=10) getEpsilon_alpha(a)
a <- zinbModel(n=5, J=10) getEpsilon_alpha(a)
Given an object describing a ZINB model, returns a vector of size the number
of rows in the parameter beta_mu
with the regularization parameters
associated to each row.
getEpsilon_beta_mu(object)
getEpsilon_beta_mu(object)
object |
an object that describes a matrix of zero-inflated distributions. |
the regularization parameters for beta_mu
.
a <- zinbModel(n=5, J=10) getEpsilon_beta_mu(a)
a <- zinbModel(n=5, J=10) getEpsilon_beta_mu(a)
Given an object describing a ZINB model, returns a vector of size the number
of rows in the parameter beta_pi
with the regularization parameters
associated to each row.
getEpsilon_beta_pi(object)
getEpsilon_beta_pi(object)
object |
an object that describes a matrix of zero-inflated distributions. |
the regularization parameters for beta_pi
.
a <- zinbModel(n=5, J=10) getEpsilon_beta_pi(a)
a <- zinbModel(n=5, J=10) getEpsilon_beta_pi(a)
Given an object describing a ZINB model, returns a vector of size the number
of columns in the parameter gamma_mu
with the regularization
parameters associated to each row.
getEpsilon_gamma_mu(object)
getEpsilon_gamma_mu(object)
object |
an object that describes a matrix of zero-inflated distributions. |
the regularization parameters for gamma_mu
.
a <- zinbModel(n=5, J=10) getEpsilon_gamma_mu(a)
a <- zinbModel(n=5, J=10) getEpsilon_gamma_mu(a)
Given an object describing a ZINB model, returns a vector of size the number
of columns in the parameter gamma_pi
with the regularization
parameters associated to each column.
getEpsilon_gamma_pi(object)
getEpsilon_gamma_pi(object)
object |
an object that describes a matrix of zero-inflated distributions. |
the regularization parameters for gamma_pi
.
a <- zinbModel(n=5, J=10) getEpsilon_gamma_pi(a)
a <- zinbModel(n=5, J=10) getEpsilon_gamma_pi(a)
Given an object describing a ZINB model, returns a vector of size the number
of columns in the parameter W
with the regularization
parameters associated to each column.
getEpsilon_W(object)
getEpsilon_W(object)
object |
an object that describes a matrix of zero-inflated distributions. |
the regularization parameters for W
.
a <- zinbModel(n=5, J=10) getEpsilon_W(a)
a <- zinbModel(n=5, J=10) getEpsilon_W(a)
The regularization parameter penalizes the variance of zeta, the log of the dispersion parameters across samples.
getEpsilon_zeta(object)
getEpsilon_zeta(object)
object |
an object that describes a matrix of zero-inflated distributions. |
the regularization parameters for zeta
.
a <- zinbModel(n=5, J=10) getEpsilon_zeta(a)
a <- zinbModel(n=5, J=10) getEpsilon_zeta(a)
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of parameters associated with V_mu
getGamma_mu(object, ...)
getGamma_mu(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the matrix of gamma_mu parameters
a <- zinbModel(n=5, J=10) getGamma_mu(a)
a <- zinbModel(n=5, J=10) getGamma_mu(a)
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of parameters associated with V_pi
getGamma_pi(object, ...)
getGamma_pi(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the matrix of gamma_pi parameters
a <- zinbModel(n=5, J=10) getGamma_pi(a)
a <- zinbModel(n=5, J=10) getGamma_pi(a)
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of logit of probabilities of 0.
getLogitPi(object)
getLogitPi(object)
object |
an object that describes a matrix of zero-inflated distributions. |
Note that although the user interface of zinbFit
requires a J x n matrix, internally this is stored as a n x J matrix (i.e.,
samples in row and genes in column). Hence the parameter matrix returned by
this function is of n x J dimensions.
the matrix of logit-probabilities of 0
a <- zinbModel(n=5, J=10) getLogitPi(a)
a <- zinbModel(n=5, J=10) getLogitPi(a)
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of logarithm of mean parameters.
getLogMu(object)
getLogMu(object)
object |
an object that describes a matrix of zero-inflated distributions. |
Note that although the user interface of zinbFit
requires a J x n matrix, internally this is stored as a n x J matrix (i.e.,
samples in row and genes in column). Hence the parameter matrix returned by
this function is of n x J dimensions.
the matrix of logarithms of mean parameters
a <- zinbModel(n=5, J=10) getLogMu(a)
a <- zinbModel(n=5, J=10) getLogMu(a)
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of mean parameters.
getMu(object)
getMu(object)
object |
an object that describes a matrix of zero-inflated distributions. |
Note that although the user interface of zinbFit
requires a J x n matrix, internally this is stored as a n x J matrix (i.e.,
samples in row and genes in column). Hence the parameter matrix returned by
this function is of n x J dimensions.
the matrix of mean parameters
a <- zinbModel(n=5, J=10) getMu(a)
a <- zinbModel(n=5, J=10) getMu(a)
Given an object that describes a matrix of zero-inflated negative binomial
distributions, returns the vector of dispersion parameters phi
.
getPhi(object)
getPhi(object)
object |
an object that describes a matrix of zero-inflated. distributions. |
the vector of dispersion parameters
a <- zinbModel(n=5, J=10) getPhi(a)
a <- zinbModel(n=5, J=10) getPhi(a)
Given an object that describes a matrix of zero-inflated distributions, returns the matrix of probabilities of 0.
getPi(object)
getPi(object)
object |
an object that describes a matrix of zero-inflated distributions. |
Note that although the user interface of zinbFit
requires a J x n matrix, internally this is stored as a n x J matrix (i.e.,
samples in row and genes in column). Hence the parameter matrix returned by
this function is of n x J dimensions.
the matrix of probabilities of 0
a <- zinbModel(n=5, J=10) getPi(a)
a <- zinbModel(n=5, J=10) getPi(a)
Given an object that describes a matrix of zero-inflated negative binomial
distributions, returns the vector of inverse dispersion parameters
theta
.
getTheta(object)
getTheta(object)
object |
an object that describes a matrix of zero-inflated distributions. |
the vector of inverse dispersion parameters theta
a <- zinbModel(n=5, J=10) getTheta(a)
a <- zinbModel(n=5, J=10) getTheta(a)
Given an object that describes a matrix of zero-inflated distributions, returns the gene-level design matrix for mu
getV_mu(object, ...)
getV_mu(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the gene-level design matrix for mu
a <- zinbModel(n=5, J=10) getV_mu(a)
a <- zinbModel(n=5, J=10) getV_mu(a)
Given an object that describes a matrix of zero-inflated distributions, returns the gene-level design matrix for pi
getV_pi(object, ...)
getV_pi(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the gene-level design matrix for pi
a <- zinbModel(n=5, J=10) getV_pi(a)
a <- zinbModel(n=5, J=10) getV_pi(a)
Given an object that contains the fit of a ZINB-WaVE model, returns the
matrix W
of low-dimensional matrix of inferred sample-level
covariates.
getW(object)
getW(object)
object |
the matrix W
of inferred sample-level covariates.
a <- zinbModel(n=5, J=10) getW(a)
a <- zinbModel(n=5, J=10) getW(a)
Given an object that describes a matrix of zero-inflated distributions, returns the sample-level design matrix for mu
getX_mu(object, ...)
getX_mu(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the sample-level design matrix for mu
a <- zinbModel(n=5, J=10) getX_mu(a)
a <- zinbModel(n=5, J=10) getX_mu(a)
Given an object that describes a matrix of zero-inflated distributions, returns the sample-level design matrix for pi
getX_pi(object, ...)
getX_pi(object, ...)
object |
an object that describes a matrix of zero-inflated distributions. |
... |
Additional parameters. |
the sample-level design matrix for pi
a <- zinbModel(n=5, J=10) getX_pi(a)
a <- zinbModel(n=5, J=10) getX_pi(a)
Given an object that describes a matrix of zero-inflated negative binomial
distributions, returns the vector zeta
of log of inverse dispersion
parameters
getZeta(object)
getZeta(object)
object |
an object that describes a matrix of zero-inflated distributions. |
the vector zeta
of log of inverse dispersion parameters
a <- zinbModel(n=5, J=10) getZeta(a)
a <- zinbModel(n=5, J=10) getZeta(a)
This function recycles an old version of the glmLRT
method that allows an F-test with adjusted denominator degrees of freedom to
account for the downweighting in the zero-inflation model.
glmWeightedF( glmfit, coef = ncol(glmfit$design), contrast = NULL, ZI = TRUE, independentFiltering = TRUE, filter = NULL )
glmWeightedF( glmfit, coef = ncol(glmfit$design), contrast = NULL, ZI = TRUE, independentFiltering = TRUE, filter = NULL )
glmfit |
a |
coef |
integer or character vector indicating which coefficients of the
linear model are to be tested equal to zero. Values must be columns or
column names of design. Defaults to the last coefficient. Ignored if
|
contrast |
numeric vector or matrix specifying one or more contrasts of
the linear model coefficients to be tested equal to zero. Number of rows
must equal to the number of columns of |
ZI |
logical, specifying whether the degrees of freedom in the
statistical test should be adjusted according to the weights in the
|
independentFiltering |
logical, specifying whether independent filtering should be performed. |
filter |
vector of values to perform filtering on. Default is the mean of the fitted values from glmfit. |
When 'independentFiltering=TRUE' (default) an independent filtering step is applied prior to the multiple testing procedure, as described in great details in the 'DESeq2“ vignette. The values in the 'padjFilter' column refer to this procedure. They are identical to the 'FDR' values if the filtering step does not remove any gene, since the function uses the Benjamini-Hochberg correction by default. If the procedure filters some genes, the adjusted p-values will typically result in greater power to detect DE genes. The theory behind independent filtering is described in Bourgon et al. (2010).
This function uses an adapted version of the glmLRT
function
that was originally written by Gordon Smyth, Davis McCarthy and Yunshun
Chen as part of the edgeR package. Koen Van den Berge wrote code to adjust
residual degree of freedoom and added the independent filtering step.
McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. Bourgon, Richard, Robert Gentleman, and Wolfgang Huber (2010) Independent Filtering Increases Detection Power for High-Throughput Experiments. PNAS 107 (21): 9546-51.
Given a matrix of counts and a zinb model, this function computes the imputed counts under a zero-inflated negative binomial (ZINB) model.
imputeZeros(model, x)
imputeZeros(model, x)
model |
the zinb model |
x |
the matrix of counts n cells by J genes |
the matrix of imputed counts.
se <- SummarizedExperiment(matrix(rpois(60, lambda=5), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) imputeZeros(m, t(assay(se)))
se <- SummarizedExperiment(matrix(rpois(60, lambda=5), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) imputeZeros(m, t(assay(se)))
This function uses the DESeq2
independent filtering method
to increase detection power in high throughput gene expression
studies.
independentFiltering(object, filter, objectType = c("edgeR", "limma"))
independentFiltering(object, filter, objectType = c("edgeR", "limma"))
object |
Either a |
filter |
The characteristic to use for filtering, usually a measure of normalized mean expression for the features. |
objectType |
Either |
Koen Van den Berge
Michael I Love, Wolfgang Huber, and Simon Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12):550, dec 2014.
Given a statistical model and some data, this function computes the log-likelihood of the model given the data, i.e., the log-probability of the data under the model.
loglik(model, x, ...) ## S4 method for signature 'ZinbModel,matrix' loglik(model, x)
loglik(model, x, ...) ## S4 method for signature 'ZinbModel,matrix' loglik(model, x)
model |
an object that describes a statistical model. |
x |
an object that describes data. |
... |
additional arguments. |
The log-likelihood of the model given the data.
loglik(model = ZinbModel, x = matrix)
: return the log-likelihood of the ZINB model.
m <- zinbModel(n=5, J=10) x <- zinbSim(m) loglik(m, x$counts)
m <- zinbModel(n=5, J=10) x <- zinbSim(m) loglik(m, x$counts)
Given an object that describes a dataset or a model involving latent factors, this function returns the number of latent factors.
nFactors(x)
nFactors(x)
x |
an object that describes a dataset or a model involving latent factors |
the number of latent factors
a <- zinbModel() nFactors(a)
a <- zinbModel() nFactors(a)
Given an object that describes a dataset or a model, it returns the number of features.
nFeatures(x)
nFeatures(x)
x |
an object that describes a dataset or a model. |
the number of features.
a <- zinbModel() nFeatures(a)
a <- zinbModel() nFeatures(a)
Given an object that describes a model or a dataset, it returns total number of parameters of the model.
nParams(model) ## S4 method for signature 'ZinbModel' nParams(model)
nParams(model) ## S4 method for signature 'ZinbModel' nParams(model)
model |
an object that describes a dataset or a model. |
the total number of parameters of the model.
nParams(ZinbModel)
: returns the total number of parameters in the model.
a <- zinbModel() nParams(a)
a <- zinbModel() nParams(a)
Given an object that describes a model or a dataset, it returns the number of samples.
nSamples(x)
nSamples(x)
x |
an object that describes a dataset or a model. |
the number of samples.
a <- zinbModel() nSamples(a)
a <- zinbModel() nSamples(a)
Given two matrices U and V that can be multiplied, this function finds two new matrices U2 and V2 such that their product is conserved (U*V = U2*V2) and such that a||U||^2 + b||V||^2 is minimized.
orthogonalizeTraceNorm(U, V, a = 1, b = 1)
orthogonalizeTraceNorm(U, V, a = 1, b = 1)
U |
left matrix |
V |
right matrix |
a |
weight of the norm of U (default=1) |
b |
weight of the norm of V (default=1) |
A list with the two matrices that solve the problem in the slots U and V.
U <- matrix(rnorm(15),5,3) V <- matrix(rnorm(12),3,4) o <- orthogonalizeTraceNorm(U,V) norm( U%*%V - o$U%*%o$V) # should be zero sum(U^2)+sum(V^2) sum(o$U^2)+sum(o$V^2) # should be smaller
U <- matrix(rnorm(15),5,3) V <- matrix(rnorm(12),3,4) o <- orthogonalizeTraceNorm(U,V) norm( U%*%V - o$U%*%o$V) # should be zero sum(U^2)+sum(V^2) sum(o$U^2)+sum(o$V^2) # should be smaller
Given a statistical model with regularization parameters, compute the penalty.
penalty(model) ## S4 method for signature 'ZinbModel' penalty(model)
penalty(model) ## S4 method for signature 'ZinbModel' penalty(model)
model |
an object that describes a statistical model with regularization parameters. |
The penalty of the model.
penalty(ZinbModel)
: return the penalization.
m <- zinbModel(K=2) penalty(m)
m <- zinbModel(K=2) penalty(m)
This function performs independent filtering to increase detection power in high throughput gene expression studies.
pvalueAdjustment( baseMean, filter, pValue, theta, alpha = 0.05, pAdjustMethod = "BH" )
pvalueAdjustment( baseMean, filter, pValue, theta, alpha = 0.05, pAdjustMethod = "BH" )
baseMean |
A vector of mean values. |
filter |
A vector of stage-one filter statistics. |
pValue |
A vector of unadjusted p-values, or a function which is able to compute this vector from the filtered portion of data, if data is supplied. The option to supply a function is useful when the value of the test statistic depends on which hypotheses are filtered out at stage one. (The limma t-statistic is an example.) |
theta |
A vector with one or more filtering fractions to consider. Actual cutoffs are then computed internally by applying quantile to the filter statistics contained in (or produced by) the filter argument. |
alpha |
A cutoff to which p-values, possibly adjusted for multiple testing, will be compared. Default is 0.05. |
pAdjustMethod |
The unadjusted p-values contained in (or produced by) test will be adjusted for multiple testing after filtering. Default is "BH". |
a list with pvalues, filtering threshold, theta, number of rejections, and alpha.
This function is an adapted version of the
pvalueAdjustment
function that was originally written by
Michael I. Love as part of the DESeq2 package.
Koen Van den Berge adapted the function.
This function solves a regression or logistic regression problem regularized
by a L2 or weighted L2 penalty. Contrary to lm.ridge
or glmnet
,
it works for any number of predictors.
solveRidgeRegression( x, y, beta = rep(0, NCOL(x)), epsilon = 1e-06, family = c("gaussian", "binomial"), offset = rep(0, NROW(x)) )
solveRidgeRegression( x, y, beta = rep(0, NCOL(x)), epsilon = 1e-06, family = c("gaussian", "binomial"), offset = rep(0, NROW(x)) )
x |
a matrix of covariates, one sample per row, one covariate per column. |
y |
a vector of response (continuous for regression, 0/1 binary for logistic regression) |
beta |
an initial solution where optimization starts (null vector by default) |
epsilon |
a scalar or vector of regularization parameters (default
|
family |
a string to choose the type of regression (default
|
offset |
a vector of offsets (default null vector) |
When family="gaussian"
, we solve the ridge regression problem
that finds the that minimizes:
When family="binomial"
we solve the ridge
logistic regression problem
When epsilon
is a
vector of size equal to the size of beta
, then the penalty is a
weighted L2 norm .
A vector solution of the regression problem
Toy dataset to check the model
A matrix of integers (counts) with 96 samples (rows) and 500 genes (columns).
Given a vector of counts, this function computes the sum of the log-probabilities of the counts under a zero-inflated negative binomial (ZINB) model. For each count, the ZINB distribution is parametrized by three parameters: the mean value and the dispersion of the negative binomial distribution, and the probability of the zero component.
zinb.loglik(Y, mu, theta, logitPi)
zinb.loglik(Y, mu, theta, logitPi)
Y |
the vector of counts |
mu |
the vector of mean parameters of the negative binomial |
theta |
the vector of dispersion parameters of the negative binomial, or a single scalar is also possible if the dispersion parameter is constant. Note that theta is sometimes called inverse dispersion parameter (and phi=1/theta is then called the dispersion parameter). We follow the convention that the variance of the NB variable with mean mu and dispersion theta is mu + mu^2/theta. |
logitPi |
the vector of logit of the probabilities of the zero component |
the log-likelihood of the model.
n <- 10 mu <- seq(10,50,length.out=n) logitPi <- rnorm(10) zeta <- rnorm(10) Y <- rnbinom(n=n, size=exp(zeta), mu=mu) zinb.loglik(Y, mu, exp(zeta), logitPi) zinb.loglik(Y, mu, 1, logitPi)
n <- 10 mu <- seq(10,50,length.out=n) logitPi <- rnorm(10) zeta <- rnorm(10) Y <- rnbinom(n=n, size=exp(zeta), mu=mu) zinb.loglik(Y, mu, exp(zeta), logitPi) zinb.loglik(Y, mu, 1, logitPi)
Given a unique dispersion parameter and a set of counts, together with a corresponding set of mean parameters and probabilities of zero components, this function computes the sum of the log-probabilities of the counts under the ZINB model. The dispersion parameter is provided to the function through zeta = log(theta), where theta is sometimes called the inverse dispersion parameter. The probabilities of the zero components are provided through their logit, in order to better numerical stability.
zinb.loglik.dispersion(zeta, Y, mu, logitPi)
zinb.loglik.dispersion(zeta, Y, mu, logitPi)
zeta |
a scalar, the log of the inverse dispersion parameters of the negative binomial model |
Y |
a vector of counts |
mu |
a vector of mean parameters of the negative binomial |
logitPi |
a vector of logit of the probabilities of the zero component |
the log-likelihood of the model.
mu <- seq(10,50,length.out=10) logitPi <- rnorm(10) zeta <- rnorm(10) Y <- rnbinom(n=10, size=exp(zeta), mu=mu) zinb.loglik.dispersion(zeta, Y, mu, logitPi)
mu <- seq(10,50,length.out=10) logitPi <- rnorm(10) zeta <- rnorm(10) Y <- rnbinom(n=10, size=exp(zeta), mu=mu) zinb.loglik.dispersion(zeta, Y, mu, logitPi)
Derivative of the log-likelihood of the zero-inflated negative binomial model with respect to the log of the inverse dispersion parameter
zinb.loglik.dispersion.gradient(zeta, Y, mu, logitPi)
zinb.loglik.dispersion.gradient(zeta, Y, mu, logitPi)
zeta |
the log of the inverse dispersion parameters of the negative binomial |
Y |
a vector of counts |
mu |
a vector of mean parameters of the negative binomial |
logitPi |
a vector of the logit of the probability of the zero component |
the gradient of the inverse dispersion parameters.
zinb.loglik
, zinb.loglik.dispersion
.
Given a matrix of counts, this function computes the log-probabilities of the counts under a zero-inflated negative binomial (ZINB) model. For each count, the ZINB distribution is parametrized by three parameters: the mean value and the dispersion of the negative binomial distribution, and the probability of the zero component.
zinb.loglik.matrix(model, x)
zinb.loglik.matrix(model, x)
model |
the zinb model |
x |
the matrix of counts |
the matrix of log-likelihood of the model.
This function computes the penalized log-likelihood of a ZINB regression model given a vector of counts.
zinb.loglik.regression( alpha, Y, A.mu = matrix(nrow = length(Y), ncol = 0), B.mu = matrix(nrow = length(Y), ncol = 0), C.mu = matrix(0, nrow = length(Y), ncol = 1), A.pi = matrix(nrow = length(Y), ncol = 0), B.pi = matrix(nrow = length(Y), ncol = 0), C.pi = matrix(0, nrow = length(Y), ncol = 1), C.theta = matrix(0, nrow = length(Y), ncol = 1), epsilon = 0 )
zinb.loglik.regression( alpha, Y, A.mu = matrix(nrow = length(Y), ncol = 0), B.mu = matrix(nrow = length(Y), ncol = 0), C.mu = matrix(0, nrow = length(Y), ncol = 1), A.pi = matrix(nrow = length(Y), ncol = 0), B.pi = matrix(nrow = length(Y), ncol = 0), C.pi = matrix(0, nrow = length(Y), ncol = 1), C.theta = matrix(0, nrow = length(Y), ncol = 1), epsilon = 0 )
alpha |
the vectors of parameters c(a.mu, a.pi, b) concatenated |
Y |
the vector of counts |
A.mu |
matrix of the model (see Details, default=empty) |
B.mu |
matrix of the model (see Details, default=empty) |
C.mu |
matrix of the model (see Details, default=zero) |
A.pi |
matrix of the model (see Details, default=empty) |
B.pi |
matrix of the model (see Details, default=empty) |
C.pi |
matrix of the model (see Details, default=zero) |
C.theta |
matrix of the model (see Details, default=zero) |
epsilon |
regularization parameter. A vector of the same length as
|
The regression model is parametrized as follows:
where are
respectively the vector of mean parameters of the NB distribution, the
vector of probabilities of the zero component, and the vector of inverse
dispersion parameters. Note that the
vector is shared between the
mean of the negative binomial and the probability of zero. The
log-likelihood of a vector of parameters
is penalized by a regularization term
is
is a scalar, or
is
is a vector of the same size as
to allow for
differential regularization among the parameters.
the penalized log-likelihood.
This function computes the gradient of the penalized log-likelihood of a ZINB regression model given a vector of counts.
zinb.loglik.regression.gradient( alpha, Y, A.mu = matrix(nrow = length(Y), ncol = 0), B.mu = matrix(nrow = length(Y), ncol = 0), C.mu = matrix(0, nrow = length(Y), ncol = 1), A.pi = matrix(nrow = length(Y), ncol = 0), B.pi = matrix(nrow = length(Y), ncol = 0), C.pi = matrix(0, nrow = length(Y), ncol = 1), C.theta = matrix(0, nrow = length(Y), ncol = 1), epsilon = 0 )
zinb.loglik.regression.gradient( alpha, Y, A.mu = matrix(nrow = length(Y), ncol = 0), B.mu = matrix(nrow = length(Y), ncol = 0), C.mu = matrix(0, nrow = length(Y), ncol = 1), A.pi = matrix(nrow = length(Y), ncol = 0), B.pi = matrix(nrow = length(Y), ncol = 0), C.pi = matrix(0, nrow = length(Y), ncol = 1), C.theta = matrix(0, nrow = length(Y), ncol = 1), epsilon = 0 )
alpha |
the vectors of parameters c(a.mu, a.pi, b) concatenated |
Y |
the vector of counts |
A.mu |
matrix of the model (see Details, default=empty) |
B.mu |
matrix of the model (see Details, default=empty) |
C.mu |
matrix of the model (see Details, default=zero) |
A.pi |
matrix of the model (see Details, default=empty) |
B.pi |
matrix of the model (see Details, default=empty) |
C.pi |
matrix of the model (see Details, default=zero) |
C.theta |
matrix of the model (see Details, default=zero) |
epsilon |
regularization parameter. A vector of the same length as
|
The regression model is described in
zinb.loglik.regression
.
The gradient of the penalized log-likelihood.
Given the parameters of a ZINB regression model, this function parses the
model and computes the vector of log(mu), logit(pi), and the dimensions of
the different components of the vector of parameters. See
zinb.loglik.regression
for details of the ZINB regression model
and its parameters.
zinb.regression.parseModel(alpha, A.mu, B.mu, C.mu, A.pi, B.pi, C.pi)
zinb.regression.parseModel(alpha, A.mu, B.mu, C.mu, A.pi, B.pi, C.pi)
alpha |
the vectors of parameters c(a.mu, a.pi, b) concatenated |
A.mu |
matrix of the model (see above, default=empty) |
B.mu |
matrix of the model (see above, default=empty) |
C.mu |
matrix of the model (see above, default=zero) |
A.pi |
matrix of the model (see above, default=empty) |
B.pi |
matrix of the model (see above, default=empty) |
C.pi |
matrix of the model (see above, default=zero) |
A list with slots logMu
, logitPi
, dim.alpha
(a
vector of length 3 with the dimension of each of the vectors a.mu
,
a.pi
and b
in alpha
), and start.alpha
(a vector
of length 3 with the starting indices of the 3 vectors in alpha
)
Given a statistical model and some data, these functions compute the AIC or BIC of the model given the data, i.e., the AIC/BIC of the data under the model.
zinbAIC(model, x) zinbBIC(model, x) ## S4 method for signature 'ZinbModel,matrix' zinbAIC(model, x) ## S4 method for signature 'ZinbModel,matrix' zinbBIC(model, x)
zinbAIC(model, x) zinbBIC(model, x) ## S4 method for signature 'ZinbModel,matrix' zinbAIC(model, x) ## S4 method for signature 'ZinbModel,matrix' zinbBIC(model, x)
model |
an object that describes a statistical model. |
x |
an object that describes data. |
the AIC/BIC of the model.
zinbAIC(model = ZinbModel, x = matrix)
: returns the AIC of the ZINB model.
zinbBIC(model = ZinbModel, x = matrix)
: returns the BIC of the ZINB model.
se <- SummarizedExperiment(matrix(rpois(60, lambda=5), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) zinbAIC(m, t(assay(se))) zinbBIC(m, t(assay(se)))
se <- SummarizedExperiment(matrix(rpois(60, lambda=5), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) zinbAIC(m, t(assay(se))) zinbBIC(m, t(assay(se)))
Given an object with the data, it fits a ZINB model.
zinbFit(Y, ...) ## S4 method for signature 'SummarizedExperiment' zinbFit( Y, X, V, K, which_assay, commondispersion = TRUE, zeroinflation = TRUE, verbose = FALSE, nb.repeat.initialize = 2, maxiter.optimize = 25, stop.epsilon.optimize = 1e-04, BPPARAM = BiocParallel::bpparam(), ... ) ## S4 method for signature 'matrix' zinbFit( Y, X, V, K, commondispersion = TRUE, zeroinflation = TRUE, verbose = FALSE, nb.repeat.initialize = 2, maxiter.optimize = 25, stop.epsilon.optimize = 1e-04, BPPARAM = BiocParallel::bpparam(), ... ) ## S4 method for signature 'dgCMatrix' zinbFit(Y, ...)
zinbFit(Y, ...) ## S4 method for signature 'SummarizedExperiment' zinbFit( Y, X, V, K, which_assay, commondispersion = TRUE, zeroinflation = TRUE, verbose = FALSE, nb.repeat.initialize = 2, maxiter.optimize = 25, stop.epsilon.optimize = 1e-04, BPPARAM = BiocParallel::bpparam(), ... ) ## S4 method for signature 'matrix' zinbFit( Y, X, V, K, commondispersion = TRUE, zeroinflation = TRUE, verbose = FALSE, nb.repeat.initialize = 2, maxiter.optimize = 25, stop.epsilon.optimize = 1e-04, BPPARAM = BiocParallel::bpparam(), ... ) ## S4 method for signature 'dgCMatrix' zinbFit(Y, ...)
Y |
The data (genes in rows, samples in columns). |
... |
Additional parameters to describe the model, see
|
X |
The design matrix containing sample-level covariates, one sample per row. If missing, X will contain only an intercept. If Y is a SummarizedExperiment object, X can be a formula using the variables in the colData slot of Y. |
V |
The design matrix containing gene-level covariates, one gene per row. If missing, V will contain only an intercept. If Y is a SummarizedExperiment object, V can be a formula using the variables in the rowData slot of Y. |
K |
integer. Number of latent factors. |
which_assay |
numeric or character. Which assay of Y to use (only if Y is a SummarizedExperiment). |
commondispersion |
Whether or not a single dispersion for all features is estimated (default TRUE). |
zeroinflation |
Whether or not a ZINB model should be fitted. If FALSE, a negative binomial model is fitted instead. |
verbose |
Print helpful messages. |
nb.repeat.initialize |
Number of iterations for the initialization of beta_mu and gamma_mu. |
maxiter.optimize |
maximum number of iterations for the optimization step (default 25). |
stop.epsilon.optimize |
stopping criterion in the optimization step, when the relative gain in likelihood is below epsilon (default 0.0001). |
BPPARAM |
object of class |
By default, i.e., if no arguments other than Y
are passed,
the model is fitted with an intercept for the regression across-samples and
one intercept for the regression across genes, both for mu and for pi.
This means that by default the model is fitted with X_mu =
X_pi = 1_n
and V_mu = V_pi = 1_J
. If the user explicitly passes the
design matrices, this behavior is overwritten, i.e., the user needs to
explicitly include the intercept in the design matrices.
If Y is a Summarized experiment, the function uses the assay named "counts", if any, or the first assay.
Currently, if Y is a sparseMatrix, this calls the zinbFit method on as.matrix(Y)
An object of class ZinbModel
that has been fitted by penalized
maximum likelihood on the data.
zinbFit(SummarizedExperiment)
: Y is a
SummarizedExperiment
.
zinbFit(matrix)
: Y is a matrix of counts (genes in rows).
zinbFit(dgCMatrix)
: Y is a sparse matrix of counts (genes in rows).
se <- SummarizedExperiment(matrix(rpois(60, lambda=5), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) bio <- gl(2, 3) m <- zinbFit(matrix(rpois(60, lambda=5), nrow=10, ncol=6), X=model.matrix(~bio), BPPARAM=BiocParallel::SerialParam())
se <- SummarizedExperiment(matrix(rpois(60, lambda=5), nrow=10, ncol=6), colData = data.frame(bio = gl(2, 3))) m <- zinbFit(se, X=model.matrix(~bio, data=colData(se)), BPPARAM=BiocParallel::SerialParam()) bio <- gl(2, 3) m <- zinbFit(matrix(rpois(60, lambda=5), nrow=10, ncol=6), X=model.matrix(~bio), BPPARAM=BiocParallel::SerialParam())
The initialization performs quick optimization of the parameters with several simplifying assumptions compared to the true model: non-zero counts are models as log-Gaussian, zeros are modeled as dropouts. The dispersion parameter is not modified.
zinbInitialize( m, Y, nb.repeat = 2, it.max = 100, BPPARAM = BiocParallel::bpparam() )
zinbInitialize( m, Y, nb.repeat = 2, it.max = 100, BPPARAM = BiocParallel::bpparam() )
m |
The model of class ZinbModel |
Y |
The matrix of counts. |
nb.repeat |
Number of iterations for the estimation of beta_mu and gamma_mu. |
it.max |
Maximum number of iterations in softImpute. |
BPPARAM |
object of class |
An object of class ZinbModel similar to the one given as argument with modified parameters alpha_mu, alpha_pi, beta_mu, beta_pi, gamma_mu, gamma_pi, W.
Y <- matrix(rpois(60, lambda=2), 6, 10) bio <- gl(2, 3) time <- rnorm(6) gc <- rnorm(10) m <- zinbModel(Y, X=model.matrix(~bio + time), V=model.matrix(~gc), which_X_pi=1L, which_V_mu=1L, K=1) m <- zinbInitialize(m, Y, BPPARAM=BiocParallel::SerialParam())
Y <- matrix(rpois(60, lambda=2), 6, 10) bio <- gl(2, 3) time <- rnorm(6) gc <- rnorm(10) m <- zinbModel(Y, X=model.matrix(~bio + time), V=model.matrix(~gc), which_X_pi=1L, which_V_mu=1L, K=1) m <- zinbInitialize(m, Y, BPPARAM=BiocParallel::SerialParam())
Initialize an object of class ZinbModel
zinbModel( X, V, O_mu, O_pi, which_X_mu, which_X_pi, which_V_mu, which_V_pi, W, beta_mu, beta_pi, gamma_mu, gamma_pi, alpha_mu, alpha_pi, zeta, epsilon, epsilon_beta_mu, epsilon_gamma_mu, epsilon_beta_pi, epsilon_gamma_pi, epsilon_W, epsilon_alpha, epsilon_zeta, epsilon_min_logit, n, J, K )
zinbModel( X, V, O_mu, O_pi, which_X_mu, which_X_pi, which_V_mu, which_V_pi, W, beta_mu, beta_pi, gamma_mu, gamma_pi, alpha_mu, alpha_pi, zeta, epsilon, epsilon_beta_mu, epsilon_gamma_mu, epsilon_beta_pi, epsilon_gamma_pi, epsilon_W, epsilon_alpha, epsilon_zeta, epsilon_min_logit, n, J, K )
X |
matrix. The design matrix containing sample-level covariates, one sample per row. |
V |
matrix. The design matrix containing gene-level covariates, one gene per row. |
O_mu |
matrix. The offset matrix for mu. |
O_pi |
matrix. The offset matrix for pi. |
which_X_mu |
integer. Indeces of which columns of X to use in the regression of mu. |
which_X_pi |
integer. Indeces of which columns of X to use in the regression of pi. |
which_V_mu |
integer. Indeces of which columns of V to use in the regression of mu. |
which_V_pi |
integer. Indeces of which columns of V to use in the regression of pi. |
W |
matrix. The factors of sample-level latent factors. |
beta_mu |
matrix or NULL. The coefficients of X in the regression of mu. |
beta_pi |
matrix or NULL. The coefficients of X in the regression of pi. |
gamma_mu |
matrix or NULL. The coefficients of V in the regression of mu. |
gamma_pi |
matrix or NULL. The coefficients of V in the regression of pi. |
alpha_mu |
matrix or NULL. The coefficients of W in the regression of mu. |
alpha_pi |
matrix or NULL. The coefficients of W in the regression of pi. |
zeta |
numeric. A vector of log of inverse dispersion parameters. |
epsilon |
nonnegative scalar. Regularization parameter. |
epsilon_beta_mu |
nonnegative scalar. Regularization parameter for beta_mu. |
epsilon_gamma_mu |
nonnegative scalar. Regularization parameter for gamma_mu. |
epsilon_beta_pi |
nonnegative scalar. Regularization parameter for beta_pi. |
epsilon_gamma_pi |
nonnegative scalar. Regularization parameter for gamma_pi. |
epsilon_W |
nonnegative scalar. Regularization parameter for W. |
epsilon_alpha |
nonnegative scalar. Regularization parameter for alpha (both alpha_mu and alpha_pi). |
epsilon_zeta |
nonnegative scalar. Regularization parameter for zeta. |
epsilon_min_logit |
scalar. Minimum regularization parameter for parameters of the logit model, including the intercept. |
n |
integer. Number of samples. |
J |
integer. Number of genes. |
K |
integer. Number of latent factors. |
This is a wrapper around the new() function to create an
instance of class ZinbModel
. Rarely, the user will need to create a
ZinbModel
object from scratch, as tipically this is the result of
zinbFit
.
If any of X
, V
, W
matrices are passed,
n
, J
, and K
are inferred. Alternatively, the user can
specify one or more of n
, J
, and K
.
The regularization parameters can be set by a unique parameter
epsilon
or specific values for the different regularization
parameters can also be provided.
If only epsilon
is specified, the other parameters take the
following values:
epsilon_beta = epsilon/J
epsilon_gamma = epsilon/n
epsilon_W = epsilon/n
epsilon_alpha = epsilon/J
epsilon_zeta = epsilon
We empirically found that large values of epsilon
provide a more
stable estimation of W
.
A call with no argument has the following default values: n =
50
, J = 100
, K = 0
, epsilon=J
.
Although it is possible to create new instances of the class by
calling this function, this is not the most common way of creating
ZinbModel
objects. The main use of the class is within the
zinbFit
function.
an object of class ZinbModel
.
a <- zinbModel() nSamples(a) nFeatures(a) nFactors(a) nParams(a)
a <- zinbModel() nSamples(a) nFeatures(a) nFactors(a) nParams(a)
Objects of this class store all the values needed to work with a zero-inflated negative binomial (ZINB) model, as described in the vignette. They contain all information to fit a model by penalized maximum likelihood or simulate data from a model.
## S4 method for signature 'ZinbModel' show(object) ## S4 method for signature 'ZinbModel' nSamples(x) ## S4 method for signature 'ZinbModel' nFeatures(x) ## S4 method for signature 'ZinbModel' nFactors(x) ## S4 method for signature 'ZinbModel' getX_mu(object, intercept = TRUE) ## S4 method for signature 'ZinbModel' getX_pi(object, intercept = TRUE) ## S4 method for signature 'ZinbModel' getV_mu(object, intercept = TRUE) ## S4 method for signature 'ZinbModel' getV_pi(object, intercept = TRUE) ## S4 method for signature 'ZinbModel' getLogMu(object) ## S4 method for signature 'ZinbModel' getMu(object) ## S4 method for signature 'ZinbModel' getLogitPi(object) ## S4 method for signature 'ZinbModel' getPi(object) ## S4 method for signature 'ZinbModel' getZeta(object) ## S4 method for signature 'ZinbModel' getPhi(object) ## S4 method for signature 'ZinbModel' getTheta(object) ## S4 method for signature 'ZinbModel' getEpsilon_beta_mu(object) ## S4 method for signature 'ZinbModel' getEpsilon_gamma_mu(object) ## S4 method for signature 'ZinbModel' getEpsilon_beta_pi(object) ## S4 method for signature 'ZinbModel' getEpsilon_gamma_pi(object) ## S4 method for signature 'ZinbModel' getEpsilon_W(object) ## S4 method for signature 'ZinbModel' getEpsilon_alpha(object) ## S4 method for signature 'ZinbModel' getEpsilon_zeta(object) ## S4 method for signature 'ZinbModel' getW(object) ## S4 method for signature 'ZinbModel' getBeta_mu(object) ## S4 method for signature 'ZinbModel' getBeta_pi(object) ## S4 method for signature 'ZinbModel' getGamma_mu(object) ## S4 method for signature 'ZinbModel' getGamma_pi(object) ## S4 method for signature 'ZinbModel' getAlpha_mu(object) ## S4 method for signature 'ZinbModel' getAlpha_pi(object)
## S4 method for signature 'ZinbModel' show(object) ## S4 method for signature 'ZinbModel' nSamples(x) ## S4 method for signature 'ZinbModel' nFeatures(x) ## S4 method for signature 'ZinbModel' nFactors(x) ## S4 method for signature 'ZinbModel' getX_mu(object, intercept = TRUE) ## S4 method for signature 'ZinbModel' getX_pi(object, intercept = TRUE) ## S4 method for signature 'ZinbModel' getV_mu(object, intercept = TRUE) ## S4 method for signature 'ZinbModel' getV_pi(object, intercept = TRUE) ## S4 method for signature 'ZinbModel' getLogMu(object) ## S4 method for signature 'ZinbModel' getMu(object) ## S4 method for signature 'ZinbModel' getLogitPi(object) ## S4 method for signature 'ZinbModel' getPi(object) ## S4 method for signature 'ZinbModel' getZeta(object) ## S4 method for signature 'ZinbModel' getPhi(object) ## S4 method for signature 'ZinbModel' getTheta(object) ## S4 method for signature 'ZinbModel' getEpsilon_beta_mu(object) ## S4 method for signature 'ZinbModel' getEpsilon_gamma_mu(object) ## S4 method for signature 'ZinbModel' getEpsilon_beta_pi(object) ## S4 method for signature 'ZinbModel' getEpsilon_gamma_pi(object) ## S4 method for signature 'ZinbModel' getEpsilon_W(object) ## S4 method for signature 'ZinbModel' getEpsilon_alpha(object) ## S4 method for signature 'ZinbModel' getEpsilon_zeta(object) ## S4 method for signature 'ZinbModel' getW(object) ## S4 method for signature 'ZinbModel' getBeta_mu(object) ## S4 method for signature 'ZinbModel' getBeta_pi(object) ## S4 method for signature 'ZinbModel' getGamma_mu(object) ## S4 method for signature 'ZinbModel' getGamma_pi(object) ## S4 method for signature 'ZinbModel' getAlpha_mu(object) ## S4 method for signature 'ZinbModel' getAlpha_pi(object)
object |
an object of class |
x |
an object of class |
intercept |
logical. Whether to return the intercept (ignored if the
design matrix has no intercept). Default |
For the full description of the model see the model vignette.
Internally, the slots are checked so that the matrices are of the
appropriate dimensions: in particular, X
, O_mu
, O_pi
,
and W
need to have n
rows, V
needs to have J
rows, zeta
must be of length J
.
nSamples
returns the number of samples; nFeatures
returns the number of features; nFactors
returns the number of latent
factors.
show(ZinbModel)
: show useful info on the object.
nSamples(ZinbModel)
: returns the number of samples.
nFeatures(ZinbModel)
: returns the number of features.
nFactors(ZinbModel)
: returns the number of latent factors.
getX_mu(ZinbModel)
: returns the sample-level design matrix for mu.
getX_pi(ZinbModel)
: returns the sample-level design matrix for pi.
getV_mu(ZinbModel)
: returns the gene-level design matrix for mu.
getV_pi(ZinbModel)
: returns the sample-level design matrix for pi.
getLogMu(ZinbModel)
: returns the logarithm of the mean of the non-zero
component.
getMu(ZinbModel)
: returns the mean of the non-zero component.
getLogitPi(ZinbModel)
: returns the logit-probability of zero.
getPi(ZinbModel)
: returns the probability of zero.
getZeta(ZinbModel)
: returns the log of the inverse of the dispersion
parameter.
getPhi(ZinbModel)
: returns the dispersion parameter.
getTheta(ZinbModel)
: returns the inverse of the dispersion parameter.
getEpsilon_beta_mu(ZinbModel)
: returns the regularization parameters for
beta_mu
.
getEpsilon_gamma_mu(ZinbModel)
: returns the regularization parameters for
gamma_mu
.
getEpsilon_beta_pi(ZinbModel)
: returns the regularization parameters for
beta_pi
.
getEpsilon_gamma_pi(ZinbModel)
: returns the regularization parameters for
gamma_pi
.
getEpsilon_W(ZinbModel)
: returns the regularization parameters for
W
.
getEpsilon_alpha(ZinbModel)
: returns the regularization parameters for
alpha
.
getEpsilon_zeta(ZinbModel)
: returns the regularization parameters for
zeta
.
getW(ZinbModel)
: returns the matrix W of inferred sample-level
covariates.
getBeta_mu(ZinbModel)
: returns the matrix beta_mu of inferred parameters.
getBeta_pi(ZinbModel)
: returns the matrix beta_pi of inferred parameters.
getGamma_mu(ZinbModel)
: returns the matrix gamma_mu of inferred parameters.
getGamma_pi(ZinbModel)
: returns the matrix gamma_pi of inferred parameters.
getAlpha_mu(ZinbModel)
: returns the matrix alpha_mu of inferred parameters.
getAlpha_pi(ZinbModel)
: returns the matrix alpha_pi of inferred parameters.
X
matrix. The design matrix containing sample-level covariates, one sample per row.
V
matrix. The design matrix containing gene-level covariates, one gene per row.
O_mu
matrix. The offset matrix for mu.
O_pi
matrix. The offset matrix for pi.
which_X_mu
integer. Indeces of which columns of X to use in the regression of mu.
which_V_mu
integer. Indeces of which columns of V to use in the regression of mu.
which_X_pi
integer. Indeces of which columns of X to use in the regression of pi.
which_V_pi
integer. Indeces of which columns of V to use in the regression of pi.
X_mu_intercept
logical. TRUE if X_mu contains an intercept.
X_pi_intercept
logical. TRUE if X_pi contains an intercept.
V_mu_intercept
logical. TRUE if V_mu contains an intercept.
V_pi_intercept
logical. TRUE if V_pi contains an intercept.
W
matrix. The factors of sample-level latent factors.
beta_mu
matrix or NULL. The coefficients of X in the regression of mu.
gamma_mu
matrix or NULL. The coefficients of V in the regression of mu.
alpha_mu
matrix or NULL. The coefficients of W in the regression of mu.
beta_pi
matrix or NULL. The coefficients of X in the regression of pi.
gamma_pi
matrix or NULL. The coefficients of V in the regression of pi.
alpha_pi
matrix or NULL. The coefficients of W in the regression of pi.
zeta
numeric. A vector of log of inverse dispersion parameters.
epsilon_beta_mu
nonnegative scalar. Regularization parameter for beta_mu
epsilon_gamma_mu
nonnegative scalar. Regularization parameter for gamma_mu
epsilon_beta_pi
nonnegative scalar. Regularization parameter for beta_pi
epsilon_gamma_pi
nonnegative scalar. Regularization parameter for gamma_pi
epsilon_W
nonnegative scalar. Regularization parameter for W
epsilon_alpha
nonnegative scalar. Regularization parameter for alpha (both alpha_mu and alpha_pi)
epsilon_zeta
nonnegative scalar. Regularization parameter for zeta
epsilon_min_logit
scalar. Minimum regularization parameter for parameters of the logit model, including the intercept.
The parameters of the model given as argument are optimized by penalized maximum likelihood on the count matrix given as argument. It is recommended to call zinb_initialize before this function to have good starting point for optimization, since the optimization problem is not convex and can only converge to a local minimum.
zinbOptimize( m, Y, commondispersion = TRUE, maxiter = 25, stop.epsilon = 1e-04, verbose = FALSE, BPPARAM = BiocParallel::bpparam() )
zinbOptimize( m, Y, commondispersion = TRUE, maxiter = 25, stop.epsilon = 1e-04, verbose = FALSE, BPPARAM = BiocParallel::bpparam() )
m |
The model of class ZinbModel |
Y |
The matrix of counts. |
commondispersion |
Whether the dispersion is the same for all features (default=TRUE) |
maxiter |
maximum number of iterations (default 25) |
stop.epsilon |
stopping criterion, when the relative gain in likelihood is below epsilon (default 0.0001) |
verbose |
print information (default FALSE) |
BPPARAM |
object of class |
An object of class ZinbModel similar to the one given as argument with modified parameters alpha_mu, alpha_pi, beta_mu, beta_pi, gamma_mu, gamma_pi, W.
Y = matrix(10, 3, 5) m = zinbModel(n=NROW(Y), J=NCOL(Y)) m = zinbInitialize(m, Y, BPPARAM=BiocParallel::SerialParam()) m = zinbOptimize(m, Y, BPPARAM=BiocParallel::SerialParam())
Y = matrix(10, 3, 5) m = zinbModel(n=NROW(Y), J=NCOL(Y)) m = zinbInitialize(m, Y, BPPARAM=BiocParallel::SerialParam()) m = zinbOptimize(m, Y, BPPARAM=BiocParallel::SerialParam())
The dispersion parameters of the model are optimized by penalized maximum likelihood on the count matrix given as argument.
zinbOptimizeDispersion( J, mu, logitPi, epsilon, Y, commondispersion = TRUE, BPPARAM = BiocParallel::bpparam() )
zinbOptimizeDispersion( J, mu, logitPi, epsilon, Y, commondispersion = TRUE, BPPARAM = BiocParallel::bpparam() )
J |
The number of genes. |
mu |
the matrix containing the mean of the negative binomial. |
logitPi |
the matrix containing the logit of the probability parameter of the zero-inflation part of the model. |
epsilon |
the regularization parameter. |
Y |
The matrix of counts. |
commondispersion |
Whether or not a single dispersion for all features is estimated (default TRUE) |
BPPARAM |
object of class |
An object of class ZinbModel similar to the one given as argument with modified parameters zeta.
Y = matrix(10, 3, 5) m = zinbModel(n=NROW(Y), J=NCOL(Y)) m = zinbInitialize(m, Y, BPPARAM=BiocParallel::SerialParam()) m = zinbOptimizeDispersion(NROW(Y), getMu(m), getLogitPi(m), getEpsilon_zeta(m), Y, BPPARAM=BiocParallel::SerialParam())
Y = matrix(10, 3, 5) m = zinbModel(n=NROW(Y), J=NCOL(Y)) m = zinbInitialize(m, Y, BPPARAM=BiocParallel::SerialParam()) m = zinbOptimizeDispersion(NROW(Y), getMu(m), getLogitPi(m), getEpsilon_zeta(m), Y, BPPARAM=BiocParallel::SerialParam())
Given an object that describes zero-inflated negative binomial distribution, simulate counts from the distribution.
zinbSim(object, seed, ...) ## S4 method for signature 'ZinbModel' zinbSim(object, seed)
zinbSim(object, seed, ...) ## S4 method for signature 'ZinbModel' zinbSim(object, seed)
object |
an object that describes a matrix of zero-inflated negative binomial. |
seed |
an optional integer to specify how the random number generator
should be initialized with a call to |
... |
additional arguments. |
A list with the following elements.
countsthe matrix with the simulated counts.
dataNBthe data simulated from the negative binomial.
dataDropoutsthe data simulated from the binomial process.
zeroFractionthe fraction of zeros.
zinbSim(ZinbModel)
: simulate from a ZINB distribution.
a <- zinbModel(n=5, J=10) zinbSim(a)
a <- zinbModel(n=5, J=10) zinbSim(a)
Given an object with the data, it performs dimensionality reduction using a ZINB regression model with gene and cell-level covariates on a random subset of the data. It then projects the remaining data onto the lower dimensional space.
zinbsurf(Y, ...) ## S4 method for signature 'SummarizedExperiment' zinbsurf( Y, X, V, K, which_assay, which_genes, zeroinflation = TRUE, prop_fit = 0.1, BPPARAM = BiocParallel::bpparam(), verbose = FALSE, ... )
zinbsurf(Y, ...) ## S4 method for signature 'SummarizedExperiment' zinbsurf( Y, X, V, K, which_assay, which_genes, zeroinflation = TRUE, prop_fit = 0.1, BPPARAM = BiocParallel::bpparam(), verbose = FALSE, ... )
Y |
The data (genes in rows, samples in columns). Currently implemented
only for |
... |
Additional parameters to describe the model, see
|
X |
The design matrix containing sample-level covariates, one sample per row. If missing, X will contain only an intercept. If Y is a SummarizedExperiment object, X can be a formula using the variables in the colData slot of Y. |
V |
The design matrix containing gene-level covariates, one gene per row. If missing, V will contain only an intercept. If Y is a SummarizedExperiment object, V can be a formula using the variables in the rowData slot of Y. |
K |
integer. Number of latent factors. Specify |
which_assay |
numeric or character. Which assay of Y to use. If missing, if 'assayNames(Y)' contains "counts" then that is used. Otherwise, the first assay is used. |
which_genes |
character. Which genes to use to estimate W (see details).
Ignored if |
zeroinflation |
Whether or not a ZINB model should be fitted. If FALSE, a negative binomial model is fitted instead. |
prop_fit |
numeric between 0 and 1. The proportion of cells to use for the zinbwave fit. |
BPPARAM |
object of class |
verbose |
Print helpful messages. |
This function implements an approximate strategy, in which the full
zinbwave
model is fit only on a random subset of the data
(controlled by the prop_fit
parameter). The rest of the samples are
subsequently projected onto the low-rank space. This strategy is much
faster and uses less memory than the full zinbwave
method. It
is recommended with extremely large datasets.
By default zinbsurf
uses all genes to estimate W
.
However, we recommend to use the top 1,000 most variable genes for this
step. In general, a user can specify any custom set of genes to be used to
estimate W
, by specifying either a vector of gene names, or a single
character string corresponding to a column of the rowData
.
An object of class SingleCellExperiment
; the dimensionality
reduced matrix is stored in the reducedDims
slot.
zinbsurf(SummarizedExperiment)
: Y is a
SummarizedExperiment
.
se <- SingleCellExperiment(assays = list(counts = matrix(rpois(60, lambda=5), nrow=10, ncol=6)), colData = data.frame(bio = gl(2, 3))) colnames(se) <- paste0("sample", 1:6) m <- zinbsurf(se, X="~bio", K = 1, prop_fit = .5, which_assay = 1, BPPARAM=BiocParallel::SerialParam())
se <- SingleCellExperiment(assays = list(counts = matrix(rpois(60, lambda=5), nrow=10, ncol=6)), colData = data.frame(bio = gl(2, 3))) colnames(se) <- paste0("sample", 1:6) m <- zinbsurf(se, X="~bio", K = 1, prop_fit = .5, which_assay = 1, BPPARAM=BiocParallel::SerialParam())
Given an object with the data, it performs dimensionality reduction using a ZINB regression model with gene and cell-level covariates.
zinbwave(Y, ...) ## S4 method for signature 'SummarizedExperiment' zinbwave( Y, X, V, K = 2, fitted_model, which_assay, which_genes, commondispersion = TRUE, zeroinflation = TRUE, verbose = FALSE, nb.repeat.initialize = 2, maxiter.optimize = 25, stop.epsilon.optimize = 1e-04, BPPARAM = BiocParallel::bpparam(), normalizedValues = FALSE, residuals = FALSE, imputedValues = FALSE, observationalWeights = FALSE, ... )
zinbwave(Y, ...) ## S4 method for signature 'SummarizedExperiment' zinbwave( Y, X, V, K = 2, fitted_model, which_assay, which_genes, commondispersion = TRUE, zeroinflation = TRUE, verbose = FALSE, nb.repeat.initialize = 2, maxiter.optimize = 25, stop.epsilon.optimize = 1e-04, BPPARAM = BiocParallel::bpparam(), normalizedValues = FALSE, residuals = FALSE, imputedValues = FALSE, observationalWeights = FALSE, ... )
Y |
The data (genes in rows, samples in columns). Currently implemented
only for |
... |
Additional parameters to describe the model, see
|
X |
The design matrix containing sample-level covariates, one sample per row. If missing, X will contain only an intercept. If Y is a SummarizedExperiment object, X can be a formula using the variables in the colData slot of Y. |
V |
The design matrix containing gene-level covariates, one gene per row. If missing, V will contain only an intercept. If Y is a SummarizedExperiment object, V can be a formula using the variables in the rowData slot of Y. |
K |
integer. Number of latent factors. Specify |
fitted_model |
a |
which_assay |
numeric or character. Which assay of Y to use. If missing, if 'assayNames(Y)' contains "counts" then that is used. Otherwise, the first assay is used. |
which_genes |
character. Which genes to use to estimate W (see details).
Ignored if |
commondispersion |
Whether or not a single dispersion for all features is estimated (default TRUE). |
zeroinflation |
Whether or not a ZINB model should be fitted. If FALSE, a negative binomial model is fitted instead. |
verbose |
Print helpful messages. |
nb.repeat.initialize |
Number of iterations for the initialization of beta_mu and gamma_mu. |
maxiter.optimize |
maximum number of iterations for the optimization step (default 25). |
stop.epsilon.optimize |
stopping criterion in the optimization step, when the relative gain in likelihood is below epsilon (default 0.0001). |
BPPARAM |
object of class |
normalizedValues |
indicates wether or not you want to compute normalized values for the counts after adjusting for gene and cell-level covariates. |
residuals |
indicates wether or not you want to compute the residuals of the ZINB model. Deviance residuals are computed. |
imputedValues |
indicates wether or not you want to compute the imputed counts of the ZINB model. |
observationalWeights |
indicates whether to compute the observational weights for differential expression (see vignette). |
For visualization (heatmaps, ...), please use the normalized values.
It corresponds to the deviance residuals when the W
is not included
in the model but the gene and cell-level covariates are. As a results, when
W
is not included in the model, the deviance residuals should capture
the biology. Note that we do not recommend to use the normalized values for
any downstream analysis (such as clustering, or differential expression), but
only for visualization.
If one has already fitted a model using ZinbModel
,
the object containing such model can be used as input of zinbwave
to
save the resulting W into a SummarizedExperiment
and optionally
compute residuals and normalized values, without the need for re-fitting the
model.
By default zinbwave
uses all genes to estimate W
.
However, we recommend to use the top 1,000 most variable genes for this
step. In general, a user can specify any custom set of genes to be used to
estimate W
, by specifying either a vector of gene names, or a single
character string corresponding to a column of the rowData
.
Note that if both which_genes
is specified and at least one
among observationalWeights
, imputedValues
, residuals
,
and normalizedValues
is TRUE
, the model needs to be fit
twice.
An object of class SingleCellExperiment
; the dimensionality
reduced matrix is stored in the reducedDims
slot and optionally
normalized values and residuals are added in the list of assays.
zinbwave(SummarizedExperiment)
: Y is a
SummarizedExperiment
.
se <- SingleCellExperiment(assays = list(counts = matrix(rpois(60, lambda=5), nrow=10, ncol=6)), colData = data.frame(bio = gl(2, 3))) m <- zinbwave(se, X="~bio", BPPARAM=BiocParallel::SerialParam())
se <- SingleCellExperiment(assays = list(counts = matrix(rpois(60, lambda=5), nrow=10, ncol=6)), colData = data.frame(bio = gl(2, 3))) m <- zinbwave(se, X="~bio", BPPARAM=BiocParallel::SerialParam())