| Title: | Count ratio uncertainty modeling base linear regression |
|---|---|
| Description: | Crumblr enables analysis of count ratio data using precision weighted linear (mixed) models. It uses an asymptotic normal approximation of the variance following the centered log ration transform (CLR) that is widely used in compositional data analysis. Crumblr provides a fast, flexible alternative to GLMs and GLMM's while retaining high power and controlling the false positive rate. |
| Authors: | Gabriel Hoffman [aut, cre] (ORCID: <https://orcid.org/0000-0002-0957-0224>) |
| Maintainer: | Gabriel Hoffman <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.5.0 |
| Built: | 2026-05-30 09:40:12 UTC |
| Source: | https://github.com/bioc/crumblr |
Crumblr enables analysis of count ratio data using precision weighted linear (mixed) models. It uses an asymptotic normal approximation of the variance following the centered log ration transform (CLR) that is widely used in compositional data analysis. Crumblr provides a fast, flexible alternative to GLMs and GLMM's while retaining high power and controlling the false positive rate.
none
Perform hierarchical clustering dimension reduction from single cell expression data
buildClusterTree( sce, reduction, labelCol, method.dist = c("cosine", "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), method.hclust = c("complete", "ward.D", "ward.D2") )buildClusterTree( sce, reduction, labelCol, method.dist = c("cosine", "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), method.hclust = c("complete", "ward.D", "ward.D2") )
sce |
|
reduction |
field of reducedDims(sce) to use |
labelCol |
column in |
method.dist |
method for |
method.hclust |
method for |
hierarchical clustering computed by hclust()
library(muscat) data(example_sce) hcl_test = buildClusterTree(example_sce, "TSNE", "cluster_id")library(muscat) data(example_sce) hcl_test = buildClusterTree(example_sce, "TSNE", "cluster_id")
Compute the centered log ratio (CLR) transform of a count matrix.
clr(counts, pseudocount = 0.5)clr(counts, pseudocount = 0.5)
counts |
count data with samples as rows and variables are columns |
pseudocount |
added to counts to avoid issues with zeros |
The CLR of a vector x of counts in D categories is defined as
clr(x) = log(x) - mean(log(x)). For details see van den Boogaart and Tolosana-Delgado (2013).
matrix of CLR transformed counts
Van den Boogaart, K. Gerald, and Raimon Tolosana-Delgado. Analyzing compositional data with R. Vol. 122. Berlin: Springer, 2013.
compositions::clr()
# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 5 # simulate info for each sample info <- data.frame(Age = rgamma(n_samples, 50, 1)) rownames(info) <- paste0("sample_", 1:n_samples) # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # centered log ratio clr(counts)# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 5 # simulate info for each sample info <- data.frame(Age = rgamma(n_samples, 50, 1)) rownames(info) <- paste0("sample_", 1:n_samples) # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # centered log ratio clr(counts)
Compute the inverse centered log ratio (CLR) transform of a count matrix.
clrInv(x)clrInv(x)
x |
CLR transform values |
Given the CLR transformed values, compute the original fractions
matrix of fractions
Van den Boogaart, K. Gerald, and Raimon Tolosana-Delgado. Analyzing compositional data with R. Vol. 122. Berlin: Springer, 2013.
compositions::clrInv()
# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 5 # simulate info for each sample info <- data.frame(Age = rgamma(n_samples, 50, 1)) rownames(info) <- paste0("sample_", 1:n_samples) # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # Fractions counts / rowSums(counts) # centered log ratio, with zero pseudocount clr(counts, 0) # recover fractions from CLR transformed values clrInv(clr(counts, 0))# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 5 # simulate info for each sample info <- data.frame(Age = rgamma(n_samples, 50, 1)) rownames(info) <- paste0("sample_", 1:n_samples) # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # Fractions counts / rowSums(counts) # centered log ratio, with zero pseudocount clr(counts, 0) # recover fractions from CLR transformed values clrInv(clr(counts, 0))
Count ratio uncertainty modeling based linear regression (crumblr) returns CLR-transformed counts and observation-level inverse-variance weights for use in weighted linear models.
crumblr( counts, pseudocount = 0.5, method = c("clr", "clr_2class"), tau = 1, max.ratio = 5, quant = 0.05 ) ## S4 method for signature 'matrix' crumblr( counts, pseudocount = 0.5, method = c("clr", "clr_2class"), tau = 1, max.ratio = 5, quant = 0.05 ) ## S4 method for signature 'data.frame' crumblr( counts, pseudocount = 0.5, method = c("clr", "clr_2class"), tau = 1, max.ratio = 5, quant = 0.05 )crumblr( counts, pseudocount = 0.5, method = c("clr", "clr_2class"), tau = 1, max.ratio = 5, quant = 0.05 ) ## S4 method for signature 'matrix' crumblr( counts, pseudocount = 0.5, method = c("clr", "clr_2class"), tau = 1, max.ratio = 5, quant = 0.05 ) ## S4 method for signature 'data.frame' crumblr( counts, pseudocount = 0.5, method = c("clr", "clr_2class"), tau = 1, max.ratio = 5, quant = 0.05 )
counts |
count data with samples as rows and variables are columns |
pseudocount |
added to counts to avoid issues with zeros |
method |
|
tau |
overdispersion parameter for Dirichlet multinomial. If |
max.ratio |
regularize estimates of the weights to have a maximum ratio of |
quant |
quantile value used for |
Evaluate the centered log ratio (CLR) transform of the count matrix, and the asymptotic theoretical variances of each transformed observation. The asymptotic normal approximation is increasingly accurate for small overdispersion , large total counts , and large proportions , but shows good agreement with the empirical results in most situations. In practice, it is often reasonable to assume a sufficient number of counts before a variable is included in an analysis anyway. But the feasibility of this assumption is up to the user to determine.
Given the array p storing proportions for one sample across all categories, the delta approximation uses the term 1/p. This can be unstable for small values of p, and the estimated variances can be sensitive to small changes in the proportions. To address this, the "clr_2class" method computes the clr() transform for category i using 2 classes: 1) counts in category i, and 2) counts _not_ in category i. Since class (2) now sums counts across all other categories, the small proportions are avoided and the variance estimates are more stable.
For real data, the asymptotic variance formula can give weights that vary substantially across samples and give very high weights for a subset of samples. In order to address this, we regularize the weights to reduce the variation in the weights to have a maximum ratio of max.ratio between the maximum and quant quantile value.
An EList object with the following components:
numeric matrix of CLR transformed counts
numeric matrix of observation-level inverse-variance weights
limma::voom(), variancePartition::dream()
# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 100 # simulate info for each sample info <- data.frame(Age = rgamma(n_samples, 50, 1)) rownames(info) <- paste0("sample_", 1:n_samples) # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # run crumblr on counts cobj <- crumblr(counts) # run standard variancePartition analysis on crumblr results library(variancePartition) fit <- dream(cobj, ~Age, info) fit <- eBayes(fit) topTable(fit, coef = "Age", sort.by = "none")# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 100 # simulate info for each sample info <- data.frame(Age = rgamma(n_samples, 50, 1)) rownames(info) <- paste0("sample_", 1:n_samples) # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # run crumblr on counts cobj <- crumblr(counts) # run standard variancePartition analysis on crumblr results library(variancePartition) fit <- dream(cobj, ~Age, info) fit <- eBayes(fit) topTable(fit, coef = "Age", sort.by = "none")
Compare difference in coefficient estimates between two trees. For node i, the test evaluates tree1[i] - tree2[i] = 0.
diffTree(tree1, tree2)diffTree(tree1, tree2)
tree1 |
object of type |
tree2 |
object of type |
When a fixed effect test is performed at each node using treeTest() with method = "FE.empirical" or method = "FE", a coefficient estimate and standard error are estimated for each node based on the children. This function performs a two-sample z-test to test if a given coefficient from tree1 is significantly different from the corresponding coefficient in tree2.
a comparison of the coefficient estimates at each node
library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Simulate a factor with 2 levels called DiseaseRand set.seed(123) info$DiseaseRand <- sample(LETTERS[seq(2)], nrow(info), replace = TRUE) info$DiseaseRand <- factor(info$DiseaseRand, LETTERS[seq(2)]) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res1 <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Perform same test, but on DiseaseRand fit2 <- dream(cobj, ~DiseaseRand, info) fit2 <- eBayes(fit2) res2 <- treeTest(fit2, cobj, hcl, coef = "DiseaseRandB") # Compare the coefficient estimates at each node # Test if res1 - res2 is significantly different from zero resDiff <- diffTree(res1, res2) resDiff plotTreeTest(resDiff) plotTreeTestBeta(resDiff)library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Simulate a factor with 2 levels called DiseaseRand set.seed(123) info$DiseaseRand <- sample(LETTERS[seq(2)], nrow(info), replace = TRUE) info$DiseaseRand <- factor(info$DiseaseRand, LETTERS[seq(2)]) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res1 <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Perform same test, but on DiseaseRand fit2 <- dream(cobj, ~DiseaseRand, info) fit2 <- eBayes(fit2) res2 <- treeTest(fit2, cobj, hcl, coef = "DiseaseRandB") # Compare the coefficient estimates at each node # Test if res1 - res2 is significantly different from zero resDiff <- diffTree(res1, res2) resDiff plotTreeTest(resDiff) plotTreeTestBeta(resDiff)
MLE for Dirichlet Multinomial
dmn_mle(counts, ...)dmn_mle(counts, ...)
counts |
matrix with rows as samples and columns as categories |
... |
additional arguments passed to |
Maximize Dirichlet Multinomial (DMN) log-likelihood with optim() using log likelihood function and its gradient. This method uses a second round of optimization to estimate the scale of parameters, which is necessary for accurate estimation of overdispersion metric.
The covariance between counts in each category from DMN distributed data is for total counts, and vector of proportions , where and . The count data is overdispersed by a factor of compared to a multinomial (MN) distribution. As increases, the DMN converges to the MN.
See https://en.wikipedia.org/wiki/Dirichlet-multinomial_distribution#Matrix_notation
list storing alpha parameter estimates, logLik, and details about convergence
alphaestimated parameters
overdispersionOverdispersion value compared to multinomial
logLikvalue of function
scalescaling of parameters computed in a second optimization step
evalsnumber of function evaluations in step 1
convergenceconvergence details from step 1
Other functions also estimate DMN parameters. MGLM::MGLMfit() and dirmult::dirmult() give good parameter estimates but are slower. Rfast::dirimultinom.mle() often fails to converge
library(HMP) set.seed(1) n_samples <- 1000 n_counts <- 5000 alpha <- c(500, 1000, 2000) # Dirichlet.multinomial counts <- Dirichlet.multinomial(rep(n_counts, n_samples), alpha) fit <- dmn_mle(counts) fit # overdispersion: true value a0 <- sum(alpha) rhoSq <- 1 / (a0 + 1) 1 + rhoSq * (n_counts - 1) # multinomial, so overdispersion is 1 counts <- t(rmultinom(n_samples, n_counts, prob = alpha / sum(alpha))) dmn_mle(counts) # #library(HMP) set.seed(1) n_samples <- 1000 n_counts <- 5000 alpha <- c(500, 1000, 2000) # Dirichlet.multinomial counts <- Dirichlet.multinomial(rep(n_counts, n_samples), alpha) fit <- dmn_mle(counts) fit # overdispersion: true value a0 <- sum(alpha) rhoSq <- 1 / (a0 + 1) 1 + rhoSq * (n_counts - 1) # multinomial, so overdispersion is 1 counts <- t(rmultinom(n_samples, n_counts, prob = alpha / sum(alpha))) dmn_mle(counts) # #
Counts are from single cell RNA-seq data from treated and untreated samples from Kang, et al (2018).
data(IFNCellCounts) info df_cellCounts hcldata(IFNCellCounts) info df_cellCounts hcl
info is metadata for each sample
df_cellCounts data.frame of counts for each sample
hcl cluster of cell types based on pseudobulk expression
An object of class data.frame with 16 rows and 4 columns.
An object of class matrix (inherits from array) with 16 rows and 8 columns.
An object of class hclust of length 7.
Kang, Hyun Min, et al. "Multiplexed droplet single-cell RNA-sequencing using natural genetic variation." Nature Biotechnology 36.1 (2018): 89-94.
Compute log fractions and precision weights from matrix of c ounts, where columns are variables and rows are samples
logFrac(counts, pseudocount = 0.5, max.ratio = 5, quant = 0.05)logFrac(counts, pseudocount = 0.5, max.ratio = 5, quant = 0.05)
counts |
count data with samples as rows and variables are columns |
pseudocount |
added to counts to avoid issues with zeros |
max.ratio |
regularize estimates of the weights to have a maximum ratio of |
quant |
quantile value used for |
For real data, the asymptotic variance formula can give weights that vary substantially across samples and give very high weights for a subset of samples. In order to address this, we regularize the weights to reduce the variation in the weights to have a maximum ratio of max.ratio between the maximum and quant quantile value.
An EList object with the following components:
numeric matrix of log transformed counts
numeric matrix of observation-level inverse-variance weights
limma::voom(), variancePartition::dream()
# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 100 # simulate info for each sample info <- data.frame(Age = rgamma(n_samples, 50, 1)) rownames(info) <- paste0("sample_", 1:n_samples) # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # run logFrac on counts cobj <- logFrac(counts) # run standard variancePartition analysis on crumblr results library(variancePartition) fit <- dream(cobj, ~ Age, info) fit <- eBayes(fit) topTable(fit, coef = "Age", sort.by = "none")# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 100 # simulate info for each sample info <- data.frame(Age = rgamma(n_samples, 50, 1)) rownames(info) <- paste0("sample_", 1:n_samples) # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # run logFrac on counts cobj <- logFrac(counts) # run standard variancePartition analysis on crumblr results library(variancePartition) fit <- dream(cobj, ~ Age, info) fit <- eBayes(fit) topTable(fit, coef = "Age", sort.by = "none")
Diagnositic plot for homoscedasticity across variables
meanSdPlot(x)meanSdPlot(x)
x |
data matrix |
Plot the sd versus rank mean of each row like vsn::meanSdPlot. Also show the coefficient of variation of the variances. A lower value indicates stronger variance stabilization
ggplot2 object
vsn::meanSdPlot
# set probability of each category prob <- runif(300) # number of samples n_samples <- 1000 # number of counts nCounts <- 3000 # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = nCounts, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # keep categories with at least 5 counts in at least 10 samples keep <- colSums(counts > 5) > 10 # run crumblr on counts cobj <- crumblr(counts[, keep], max.ratio = 10) # Plot for CLR # For each sample, plot rank of mean vs sd fig1 <- meanSdPlot(cobj$E) + ggtitle("CLR") # run crumblr::standardize() df_std <- standardize(cobj) # Standardized crumblr fig2 <- meanSdPlot(df_std) + ggtitle("Standardized crumblr") # Standardizing the crumblr results better stabilizes # the variances across variables fig1 | fig2# set probability of each category prob <- runif(300) # number of samples n_samples <- 1000 # number of counts nCounts <- 3000 # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = nCounts, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # keep categories with at least 5 counts in at least 10 samples keep <- colSums(counts > 5) > 10 # run crumblr on counts cobj <- crumblr(counts[, keep], max.ratio = 10) # Plot for CLR # For each sample, plot rank of mean vs sd fig1 <- meanSdPlot(cobj$E) + ggtitle("CLR") # run crumblr::standardize() df_std <- standardize(cobj) # Standardized crumblr fig2 <- meanSdPlot(df_std) + ggtitle("Standardized crumblr") # Standardizing the crumblr results better stabilizes # the variances across variables fig1 | fig2
Forest plot
Forest plot of effect size estimates at the leaves of the tree
plotForest(x, ...) ## S4 method for signature 'treedata' plotForest(x, ..., hide = FALSE)plotForest(x, ...) ## S4 method for signature 'treedata' plotForest(x, ..., hide = FALSE)
x |
result from |
... |
other arguments |
hide |
hide rownames and legend |
ggplot2 object
library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Plot log fold changes from coef plotForest(res)library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Plot log fold changes from coef plotForest(res)
Scatter plot with 2D density using viridis colors
plotScatterDensity(x, y, size = 1)plotScatterDensity(x, y, size = 1)
x |
the x-coordinates of points in the plot |
y |
the y-coordinates of points in the plot |
size |
size of point |
ggplot2 object
# simulate data M <- Rfast::rmvnorm(1000, mu = c(0, 0), sigma = diag(1, 2)) # create 2D density plot plotScatterDensity(M[, 1], M[, 2])# simulate data M <- Rfast::rmvnorm(1000, mu = c(0, 0), sigma = diag(1, 2)) # create 2D density plot plotScatterDensity(M[, 1], M[, 2])
Plot tree with results from multivariate testing
plotTreeTest( tree, low = "grey90", mid = "red", high = "darkred", xmax.scale = 1.5 )plotTreeTest( tree, low = "grey90", mid = "red", high = "darkred", xmax.scale = 1.5 )
tree |
phylo object storing tree |
low |
low color on gradient |
mid |
mid color on gradient |
high |
high color on gradient |
xmax.scale |
expand the x-axis by this factor so leaf labels fit in the plot |
ggplot2 object
library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Plot hierarchy and testing results plotTreeTest(res) # Extract results for first 3 nodes res[1:3, ]library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Plot hierarchy and testing results plotTreeTest(res) # Extract results for first 3 nodes res[1:3, ]
Plot tree coefficients from multivariate testing at each node. Only applicable top fixed effect tests
plotTreeTestBeta( tree, low = "blue", mid = "white", high = "red", xmax.scale = 1.5 )plotTreeTestBeta( tree, low = "blue", mid = "white", high = "red", xmax.scale = 1.5 )
tree |
phylo object storing tree |
low |
low color on gradient |
mid |
mid color on gradient |
high |
high color on gradient |
xmax.scale |
expand the x-axis by this factor so leaf labels fit in the plot |
ggplot2 object
library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Plot hierarchy, no tests are significant plotTreeTestBeta(res)library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Plot hierarchy, no tests are significant plotTreeTestBeta(res)
Compute standardized observations by dividing the observed values by their standard deviations based on the precision weights
standardize(x, ...) ## S4 method for signature 'EList' standardize(x, ...)standardize(x, ...) ## S4 method for signature 'EList' standardize(x, ...)
x |
object storing data to be transformed |
... |
other arguments |
Weighted response by their standard deviation so that resulting values have approximately equal sample variance. This is a key property that improves downstream PCA and clustering analysis.
matrix of standardized values
# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 100 # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # run crumblr on counts cobj <- crumblr(counts) # Standardize crumblr responses df_std <- standardize(cobj) # Perform PCA on student transformed data pca <- prcomp(t(df_std)) df_pca <- as.data.frame(pca$x) ggplot(df_pca, aes(PC1, PC2)) + geom_point() + theme_classic() + theme(aspect.ratio = 1)# set probability of each category prob <- c(0.1, 0.2, 0.3, 0.5) # number of total counts countsTotal <- 300 # number of samples n_samples <- 100 # simulate counts from multinomial counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob)) colnames(counts) <- paste0("cat_", 1:length(prob)) rownames(counts) <- paste0("sample_", 1:n_samples) # run crumblr on counts cobj <- crumblr(counts) # Standardize crumblr responses df_std <- standardize(cobj) # Perform PCA on student transformed data pca <- prcomp(t(df_std)) df_pca <- as.data.frame(pca$x) ggplot(df_pca, aes(PC1, PC2)) + geom_point() + theme_classic() + theme(aspect.ratio = 1)
Perform multivariate testing using mvTest() along the nodes of tree
treeTest( fit, obj, hc, coef, method = c("FE.empirical", "FE", "RE2C", "tstat", "sidak", "fisher"), shrink.cov = TRUE )treeTest( fit, obj, hc, coef, method = c("FE.empirical", "FE", "RE2C", "tstat", "sidak", "fisher"), shrink.cov = TRUE )
fit |
|
obj |
|
hc |
hierarchical clustering as an |
coef |
name of coefficient to be extracted |
method |
statistical method used to perform multivariate test. See details. |
shrink.cov |
shrink the covariance matrix between coefficients using the Schafer-Strimmer method |
See package remaCor for details about the remaCor::RE2C() test, and see remaCor::LS() for details about the fixed effect test. When only 1 feature is selected, the original t-statistic and p-value are returned.
object of type treedata storing results
variancePartition::mvTest()
library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Plot hierarchy and testing results plotTreeTest(res) # Extract results for first 3 nodes res[1:3, ]library(variancePartition) # Load cell counts, clustering and metadata # from Kang, et al. (2018) https://doi.org/10.1038/nbt.4042 data(IFNCellCounts) # Apply crumblr transformation cobj <- crumblr(df_cellCounts) # Use dream workflow to analyze each cell separately fit <- dream(cobj, ~ StimStatus + ind, info) fit <- eBayes(fit) # Perform multivariate test across the hierarchy res <- treeTest(fit, cobj, hcl, coef = "StimStatusstim") # Plot hierarchy and testing results plotTreeTest(res) # Extract results for first 3 nodes res[1:3, ]