| Title: | Hierarchical Variance Partitioning |
|---|---|
| Description: | HVP is a quantitative batch effect metric that estimates the proportion of variance associated with batch effects in a data set. |
| Authors: | Wei Xin Chan [aut, cre] (ORCID: <https://orcid.org/0000-0003-3193-9195>) |
| Maintainer: | Wei Xin Chan <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.3.0 |
| Built: | 2026-05-30 08:01:28 UTC |
| Source: | https://github.com/bioc/HVP |
'HVP' calculates the proportion of variance associated with batch effects in a data set (the "HVP" value of the data set). To determine whether batch effects in a data set are statistically significant, a permutation test can be performed by setting 'nperm' to a number above 100. 'HVP' is an S4 generic function, so methods can be added for new classes. Methods are provided for array-like objects, 'SummarizedExperiment', 'SingleCellExperiment' and 'Seurat'.
HVP(x, ...)

## S4 method for signature 'matrix'
HVP(x, batch, cls = NULL, nperm = 0, use.sparse = FALSE, ...)

## S4 method for signature 'Matrix'
HVP(x, batch, cls = NULL, nperm = 0, use.sparse = FALSE, ...)

## S4 method for signature 'data.frame'
HVP(x, ...)

## S4 method for signature 'Seurat'
HVP(x, batchname, classname = NULL, nperm = 0, use.sparse = FALSE, ...)

## S4 method for signature 'SummarizedExperiment'
HVP(
  x,
  batchname,
  classname = NULL,
  assayname = NULL,
  nperm = 0,
  use.sparse = FALSE,
  ...
)
| Argument | Description |
|---|---|
| x | object to calculate HVP for. |
| ... | additional arguments to pass to S4 methods. |
| batch | vector indicating the batch information of samples. |
| cls | vector or list of vectors with class information of samples. |
| nperm | numeric indicating the number of permutations to simulate in the Monte Carlo permutation test. We recommend a value no less than 1000. By default, no permutation test is performed. |
| use.sparse | logical indicating whether to use sparse matrices when computing HVP. N.B. Using sparse matrices may lead to a slight increase in run time. |
| batchname | character, name of the column in the metadata indicating batch. |
| classname | character, name of the column(s) in the metadata indicating class. |
| assayname | character, name of the assay to use. By default the first assay is used. |
The S4 methods for data frames and matrices take an array with dimensions (nfeatures, nsamples).
The S4 method for 'SummarizedExperiment' also applies to the 'SingleCellExperiment' class, as it inherits from the 'SummarizedExperiment' class.
An 'hvp' S4 object with the following slots:

| Slot | Description |
|---|---|
| HVP | the proportion of variance associated with batch effects. |
| sum.squares | matrix of sum of squares between batches and total sum of squares for all features. |
| p.value | p-value of the permutation test. |
| null.distribution | numeric, null distribution of HVP values. |

The last two slots are only present if a permutation test is performed.
Wei Xin Chan
X <- matrix(rnorm(1000), 50, 20)
batch <- factor(rep(1:2, each = 10))
class <- factor(rep(LETTERS[1:2], 10))
res <- HVP(X, batch, class)
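The Monte Carlo permutation test controlled by 'nperm' can be illustrated in base R. The sketch below is not the HVP algorithm: it uses a simple stand-in statistic (the share of the total sum of squares that lies between batches, ignoring class), and the helper name `prop_between` and the p-value formula are assumptions made for illustration only. It shows how an observed value is compared against a null distribution obtained by shuffling the batch labels.

```r
set.seed(1)
X <- matrix(rnorm(1000), 50, 20)          # (nfeatures, nsamples)
batch <- factor(rep(1:2, each = 10))

# Hypothetical stand-in statistic, NOT the statistic computed by HVP():
# between-batch sum of squares divided by total sum of squares.
prop_between <- function(X, batch) {
  grand <- rowMeans(X)                     # per-feature grand mean
  ss_total <- sum((X - grand)^2)
  ss_between <- 0
  for (b in levels(batch)) {
    idx <- batch == b
    batch_mean <- rowMeans(X[, idx, drop = FALSE])
    ss_between <- ss_between + sum(idx) * sum((batch_mean - grand)^2)
  }
  ss_between / ss_total
}

observed <- prop_between(X, batch)

# Null distribution: recompute the statistic with permuted batch labels.
nperm <- 1000
null_dist <- replicate(nperm, prop_between(X, sample(batch)))

# One common Monte Carlo p-value estimate (assumed here, not taken from HVP).
p_value <- (sum(null_dist >= observed) + 1) / (nperm + 1)
```

Because the data here are pure noise, the observed value should usually fall well inside the null distribution, so a small p-value is not expected.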
An S4 class to store the results from hierarchical variance partitioning (HVP).

| Slot | Description |
|---|---|
| HVP | numeric indicating the proportion of variance associated with batch effects. |
| sum.squares | matrix containing sum of squares between batches and total sum of squares for all features. |
| p.value | optional numeric of P-value from permutation test. |
| null.distribution | optional numeric vector of null distribution of HVP values. |
Plot results of permutation test
## S4 method for signature 'hvp,missing'
plot(x, y, ...)
| Argument | Description |
|---|---|
| x | 'hvp' S4 object containing HVP results after permutation testing. |
| y | ignored argument for compatibility with the generic plot function. |
| ... | ignored argument for compatibility with the generic plot function. |
Plots the null distribution of the permutation test.
ggplot object of null distribution of permutation test.
Sigmoid function
sigmoid(x, r = 1, s = 0)
| Argument | Description |
|---|---|
| x | numeric scalar/vector/matrix. |
| r | inverse scale parameter of the sigmoid function. |
| s | midpoint parameter of the sigmoid function. |
A numeric scalar/vector/matrix of the same dimensions containing the transformed values.
p <- sigmoid(0.5)
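The exact formula behind 'sigmoid' is not given here. As a rough illustration of what an inverse scale parameter 'r' and a midpoint 's' usually mean, the sketch below assumes the standard logistic form 1 / (1 + exp(-r * (x - s))). The function name `sigmoid_sketch` is hypothetical and the formula is an assumption, not the package's confirmed implementation.

```r
# Hypothetical logistic parameterisation (an assumption, not taken from the package):
# r controls how steep the curve is, s is the input at which the output equals 0.5.
sigmoid_sketch <- function(x, r = 1, s = 0) {
  1 / (1 + exp(-r * (x - s)))
}

at_midpoint <- sigmoid_sketch(0)                  # x equals the default midpoint s = 0
shifted     <- sigmoid_sketch(-6, r = 2, s = -6)  # x equals a shifted midpoint

# Vectorised arithmetic keeps the shape of the input, as described for the return value.
M <- matrix(seq(-3, 3, length.out = 6), 2, 3)
P <- sigmoid_sketch(M, r = 2)
```

Under this assumed form, a larger 'r' makes the transition around 's' sharper, and the output is 0.5 exactly when 'x' equals 's'.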
Simulate log-transformed microarray gene expression data
simulateMicroarray(
  crosstab, m,
  delta = 1, gamma = 0.5, phi = 0.2,
  c = 10, d = 6,
  epsilon = 0.5, kappa = 0.2,
  a = 40, b = 5,
  dropout = FALSE, r = 2, s = -6
)
| Argument | Description |
|---|---|
| crosstab | matrix of contingency table specifying the number of samples in each class-batch condition, with classes as rows and batches as columns. |
| m | number of genes. |
| delta | magnitude of additive batch effects (i.e. standard deviation of normal distribution modelling batch log fold change means of all genes). |
| gamma | magnitude of multiplicative batch effects (i.e. standard deviation of normal distribution modelling log batch effect terms of all samples in a batch). |
| phi | percentage of differentially expressed genes. |
| c | shape parameter of Gamma distribution modelling class log fold change means of all genes. |
| d | rate parameter of Gamma distribution modelling class log fold change means of all genes. |
| epsilon | magnitude of random noise across samples (i.e. standard deviation of normal distribution modelling log expression values with class effects only). |
| kappa | standard deviation of normal distribution modelling log scaling factors of all samples. |
| a | shape parameter of Gamma distribution modelling basal log mean expression of all genes. |
| b | rate parameter of Gamma distribution modelling basal log mean expression of all genes. |
| dropout | logical indicating whether to perform dropout. |
| r | inverse scale parameter of the sigmoid function used to calculate probability of dropout for each value. |
| s | midpoint parameter of the sigmoid function used to calculate probability of dropout for each value. |
A list containing the following components:
matrix with dimensions '(m, n)' of log expression values.
data frame with 'n' rows of sample metadata.
character vector, names of differentially expressed genes.
matrix with dimensions '(m, n)' of log expression values with class effects only.
matrix with dimensions '(m, n)' of log batch effect terms.
matrix of class log fold change means for each gene in each class.
matrix of batch log fold change means for each gene in each batch.
list of parameters supplied.
Wei Xin Chan
crosstab <- matrix(10, 3, 2)
data <- simulateMicroarray(crosstab, 100)
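To make the meaning of 'crosstab' concrete, the base R sketch below expands a class-by-batch contingency table into one row of metadata per sample. The dimension names ("A", "B", "C" for classes and "1", "2" for batches) are assumptions added for readability. Whether 'simulateMicroarray' needs or uses dimension names is not stated here, and this is not the package's own code.

```r
# 3 classes (rows) x 2 batches (columns), 10 samples in every class-batch cell.
# The dimnames are illustrative assumptions.
crosstab <- matrix(
  10, 3, 2,
  dimnames = list(class = c("A", "B", "C"), batch = c("1", "2"))
)

n <- sum(crosstab)  # total number of samples implied by the table

# One row per class-batch cell, with the cell count in column 'Freq'.
cells <- as.data.frame(as.table(crosstab))

# Repeat each cell 'Freq' times to obtain one row per sample.
metadata <- cells[rep(seq_len(nrow(cells)), times = cells$Freq), c("class", "batch")]
rownames(metadata) <- NULL
```

Tabulating `metadata$class` against `metadata$batch` recovers the original counts, which is a quick way to check that a design matches the intended crosstab.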
Splits subsettable objects according to their columns
splitCols(x, f, drop = FALSE, ...)
| Argument | Description |
|---|---|
| x | subsettable object to be split. |
| f | vector or list of vectors indicating the grouping of columns. |
| drop | logical indicating if levels that do not occur should be dropped. |
| ... | optional arguments to split(). |
List of objects split by columns
X <- matrix(1:60, 10, 6)
cond <- rep(1:3, each = 2)
splitCols(X, cond)
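For readers who want to see what splitting by columns amounts to, the same grouping can be sketched in base R by splitting the column indices and subsetting. This is an illustrative equivalent under the assumption that 'splitCols' returns one sub-object per group; it is not the package's implementation, and `parts` is just a local name.

```r
X <- matrix(1:60, 10, 6)
cond <- rep(1:3, each = 2)

# Split the column indices by group, then subset the matrix once per group.
# drop = FALSE in the subsetting keeps each piece a matrix even for single columns.
col_groups <- split(seq_len(ncol(X)), cond)
parts <- lapply(col_groups, function(j) X[, j, drop = FALSE])

names(parts)      # one element per level of 'cond'
dim(parts[["2"]]) # rows are untouched, only columns are partitioned
```

Each element keeps all 10 rows and holds only the columns belonging to its group.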