| Title: | Improved ChromVAR (Chromatin Variation Across Regions) |
|---|---|
| Description: | A much faster analytical implementation of chromVAR, with additional features, used to infer TF activity from (bulk or single-cell) ATAC-seq data and motif annotations (or binding probabilities). The package also includes the CVnorm normalization method based on the chromVAR logic. |
| Authors: | Pierre-Luc Germain [aut, cre] (ORCID: <https://orcid.org/0000-0003-3418-4218>) |
| Maintainer: | Pierre-Luc Germain <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.1.2 |
| Built: | 2026-06-04 06:41:00 UTC |
| Source: | https://github.com/bioc/betterChromVAR |
Add the bias column to the object's rowData, containing the regions'
proportion of Gs and Cs.
addGCBias(object, genome)addGCBias(object, genome)
object |
An object inheriting RangedSummarizedExperiment or GRanges. |
genome |
A BSgenome object or any other genome object supported by
|
object with the GC content in mcols(object)$bias (if GRanges)
or rowData(object)$bias.
# not run: # se <- addGCBias(se, genome)# not run: # se <- addGCBias(se, genome)
Coerce bcvBackground to a list
Show a bcvBackground object
Subsetting a bcvBackground
## S4 method for signature 'bcvBackground' as.list(x) ## S4 method for signature 'bcvBackground' show(object) ## S4 method for signature 'bcvBackground,ANY,ANY,ANY' x[i, j, ..., drop = TRUE]## S4 method for signature 'bcvBackground' as.list(x) ## S4 method for signature 'bcvBackground' show(object) ## S4 method for signature 'bcvBackground,ANY,ANY,ANY' x[i, j, ..., drop = TRUE]
x |
A |
object |
A |
i, j
|
Indices for subsetting (if j is provided, i is ignored). |
... |
Additional arguments. |
drop |
Logical, whether to drop dimensions. |
A list containing the slots of the object.
Nothing, prints an overview of the object.
An bcvBackground object.
Bin and background data for betterChromVAR (for internal use)
A fast, analytic implementation of chromVAR.
This is a wrapper around the getBackgroundBins,
computeBackgrounds, and computeDeviationsAnalytic
steps. It additionally allows for multithreading. For more control or
optimization, see the individual steps.
A fast, analytic implementation of chromVAR's computeDeviations, with
additional features.
betterChromVAR( object, annotations, grouping = NULL, nthreads = NULL, verbose = FALSE, ... )betterChromVAR( object, annotations, grouping = NULL, nthreads = NULL, verbose = FALSE, ... )
object |
A SummarizedExperiment (or SingleCellExperiment) with an assay
'counts', and with a 'bias' column in |
annotations |
Peak annotation (sparse) matrix, with motifs as columns, or a SummarizedExperiment containing this in the first assay. Values should be either logical or between 0 and 1. |
grouping |
An optional factor or vector coercible to a factor indicating
the groupings of the columns of |
nthreads |
Either an integer scalar indicating the number of threads to
use, or a |
verbose |
Logical; whether to output progress messages (default FALSE). |
... |
Passed to |
Contrarily to the original chromVAR, this function is entirely deterministic, and achieves higher precision and much higher efficiency through two changes:
working with expected background sampling mean and variances, rather than
actual permutations, and 2) computing expectations and variance at the level
of bias bins, instead of in the peak-space. The function additionally
includes experimental bias shrinkage options, the possibility to handle
annotations that are not binary (e.g. probability scores) and a third
bias dimension (fragment length bias, which should be stored in
rowData(object)$flbias see getBackgroundBins for details).
A SummarizedExperiment containing the adjusted deviations and z-scores for each motif/sample. The rowData additionally contains the number of motif matches and their variability.
Pierre-Luc Germain
Schep A.N., Wu B., Buenrostro J.D., Greenleaf W.J. (2017) chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data, Nature Methods, doi: 10.1038/nmeth.4401
attach(getDummyData()) # if GC content not already in the object, use: # counts <- addGCBias(counts, genome=YOUR_GENOME) dev <- betterChromVAR(counts, motifMatches) dev # note that this is the exact equivalent of doing: # bg <- getBackgroundBins(counts) # bg <- computeBackgrounds(counts, bg) # dev <- computeDeviationsAnalytic(counts, bg, motifMatches)attach(getDummyData()) # if GC content not already in the object, use: # counts <- addGCBias(counts, genome=YOUR_GENOME) dev <- betterChromVAR(counts, motifMatches) dev # note that this is the exact equivalent of doing: # bg <- getBackgroundBins(counts) # bg <- computeBackgrounds(counts, bg) # dev <- computeDeviationsAnalytic(counts, bg, motifMatches)
computeBackgrounds
computeBackgrounds( object, bins, grouping = NULL, expectation = NULL, shrinkage = c("none", "average", "smooth"), sigma = 1, verbose = FALSE )computeBackgrounds( object, bins, grouping = NULL, expectation = NULL, shrinkage = c("none", "average", "smooth"), sigma = 1, verbose = FALSE )
object |
A SummarizedExperiment (or SingleCellExperiment) with an assay 'counts', or a (sparse) matrix of counts. |
bins |
A |
grouping |
An optional factor or vector coercible to a factor indicating
the groupings of the columns of |
expectation |
Optional vector of length equal to |
shrinkage |
The method to use to shrink background (i.e. bias) bin
frequencies. Either "average" (shrinks towards the bin's average across
cells/samples of the same group), "smooth" (per-sample 2D smoothing over
the bin matrix, somewhat redundant with |
sigma |
Sigma parameter for the 2D smoothing. Ignored unless
|
verbose |
Logical; whether to output progress messages. |
A bcvBackground object with bins*samples slots filled, for use
with computeDeviationsAnalytic.
attach(getDummyData()) # if GC content not already in the object, use: # counts <- addGCBias(counts, genome=YOUR_GENOME) # we fist get the background bins: bg <- getBackgroundBins(counts) # then we can compute the backgrounds for each sample: bg <- computeBackgrounds(counts, bg) # for use in computeDeviationsAnalytic...attach(getDummyData()) # if GC content not already in the object, use: # counts <- addGCBias(counts, genome=YOUR_GENOME) # we fist get the background bins: bg <- getBackgroundBins(counts) # then we can compute the backgrounds for each sample: bg <- computeBackgrounds(counts, bg) # for use in computeDeviationsAnalytic...
computeDeviationsAnalytic
computeDeviationsAnalytic( object, background, annotations, verbose = FALSE, retSE = TRUE, compute = c("deviations", "z", "variability"), denominator = c("global", "local", "none") )computeDeviationsAnalytic( object, background, annotations, verbose = FALSE, retSE = TRUE, compute = c("deviations", "z", "variability"), denominator = c("global", "local", "none") )
object |
A SummarizedExperiment (or SingleCellExperiment) with an assay
'counts', and with a 'bias' column in |
background |
A |
annotations |
Peak annotation (sparse) matrix, with motifs as columns, or a SummarizedExperiment containing this in the first assay. Values should be either logical or between 0 and 1. |
verbose |
Logical; whether to output progress messages. |
retSE |
Logical; whether to return a SummarizedExperiment object. |
compute |
What to compute. Defaults to everything: deviations, z and motif variability. |
denominator |
The type of denominator to use for the deviations. Either 'global' (default), i.e. the global expectation (same as the original chromVAR), 'local' (background expectation of the cell/sample), or 'none' (denominator of 1). 'global' (default) is recommended. |
A SummarizedExperiment (or a list if retSE=FALSE).
attach(getDummyData()) # if GC content not already in the object, use: # counts <- addGCBias(counts, genome=YOUR_GENOME) # we fist get the background bins: bg <- getBackgroundBins(counts) # then we compute the backgrounds for each sample: bg <- computeBackgrounds(counts, bg) # then we can compute the deviations: dev <- computeDeviationsAnalytic(counts, bg, motifMatches) devattach(getDummyData()) # if GC content not already in the object, use: # counts <- addGCBias(counts, genome=YOUR_GENOME) # we fist get the background bins: bg <- getBackgroundBins(counts) # then we compute the backgrounds for each sample: bg <- computeBackgrounds(counts, bg) # then we can compute the deviations: dev <- computeDeviationsAnalytic(counts, bg, motifMatches) dev
Corrects ATAC peak counts by removing the effects of technical biases (GC/accessibility) using the chromVAR background binning approach and an optional variance-based bias shrinkage (inspired from the qsmooth package) to preserve group biological signal.
CVnorm( object, bias = NULL, grouping = NULL, smoothGrouping = grouping, shrinkMode = c("dampen", "qsmooth"), toAssay = "corrected", bs = NULL, w = 0.1, Z = FALSE, useWidthAdj = NULL, enforceZeros = TRUE )CVnorm( object, bias = NULL, grouping = NULL, smoothGrouping = grouping, shrinkMode = c("dampen", "qsmooth"), toAssay = "corrected", bs = NULL, w = 0.1, Z = FALSE, useWidthAdj = NULL, enforceZeros = TRUE )
object |
A matrix of counts, or a SummarizedExperiment-like object with an assay named 'counts'. |
bias |
A vector of length equal to |
grouping |
Optional grouping for the baseline expectation (prevents
bias toward more abundant groups). This should either be a vector coercible
to factor of length equal to |
smoothGrouping |
Optional grouping to determine correction strength.
If bias is consistent within these groups, correction is reduced. Accepts
the same type of inputs as |
shrinkMode |
The way to perform the group-based shrinkage. With
|
toAssay |
The name of the assay in which to store the corrected data
(default 'corrected'). Ignored unless |
bs |
Number of bins per dimension (see |
w |
Standard deviation of the Gaussian kernel for bin smoothing. |
Z |
Logical; whether to return standardized residuals (Z-scores) instead of the (default) corrected counts. |
useWidthAdj |
Whether to adjust for the different width of the regions.
If omitted, will be TRUE if the average absolute difference to the median
width is greater than 10% of the median width. If TRUE, will adjust for
|
enforceZeros |
Logical; whether to enforce that zero counts should
remain zeroes after correction (ignored if |
If object is a matrix, then a matrix of corrected counts of the
same dimensions. If object is a SummarizedExperiment-like object, then
the object is returned with an extra assay named based on toAssay.
Pierre-Luc Germain
Schep A.N., Wu B., Buenrostro J.D., Greenleaf W.J. (2017) chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data, Nature Methods, doi: 10.1038/nmeth.4401
Hicks SC, Okrah K, Paulson JN, Quackenbush J, Irizarry RA, Corrado Bravo H (2018). “Smooth quantile normalization.” Biostatistics 19 (2), doi: 10.1093/biostatistics/kxx028
counts_se <- getDummyData()$counts # if GC content not already in the object, use: # counts_se <- addGCBias(counts_se, genome=YOUR_GENOME) counts_se <- CVnorm(counts_se)counts_se <- getDummyData()$counts # if GC content not already in the object, use: # counts_se <- addGCBias(counts_se, genome=YOUR_GENOME) counts_se <- CVnorm(counts_se)
Computes chromVAR-like background (i.e. bias) bins, as well as bin-to-bin
selection probabilities needed for betterChromVAR.
getBackgroundBins( x, bias = NULL, flbias = NULL, w = 0.1, bs = NULL, pseudo = 0, verbose = TRUE )getBackgroundBins( x, bias = NULL, flbias = NULL, w = 0.1, bs = NULL, pseudo = 0, verbose = TRUE )
x |
A SummarizedExperiment containing a 'counts' assay, or a matrix of counts, or a vector of expected (e.g. mean) counts. |
bias |
A vector of length equal to |
flbias |
A vector of length equal to |
w |
Standard deviation of the Gaussian kernel. |
bs |
Number of bins per dimension. This can be a single integer (total
bins = |
pseudo |
Optional pseudocount to be added. This should not be needed with standard workflows. |
verbose |
Whether to print processing info. |
The procedure underlying this function is the same as in
chromVAR::getBackgroundPeaks, with the following differences:
Rather than producing a set of background peaks for each input peak, the function returns peak-to-bin mappings and bin-to-bin background selection probabilities, which enables an analytic background computation. It is, as such, entirely deterministic.
The function supports the optional use of a third bias dimension, provided
through the flbias argument, meant for fragment length bias. This is
still an experimental feature.
A bcvBackground object, to be used with
computeBackgrounds.
Schep A.N., Wu B., Buenrostro J.D., Greenleaf W.J. (2017) chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data, Nature Methods, doi: 10.1038/nmeth.4401
counts_se <- getDummyData()$counts background <- getBackgroundBins(counts_se)counts_se <- getDummyData()$counts background <- getBackgroundBins(counts_se)
Dummy data for testing purposes
getDummyData(nRegions = 500, nSamples = 10, nMotifs = 5)getDummyData(nRegions = 500, nSamples = 10, nMotifs = 5)
nRegions |
Number of regions to generate |
nSamples |
Number of samples to generate |
nMotifs |
Number of motifs to generate |
A list with the slots counts (a peak counts SummarizedExperiment)
and matches (a sparse matrix of binary motif matches per peaks)
out <- getDummyData() (counts <- out$counts) matches <- out$motifMatchesout <- getDummyData() (counts <- out$counts) matches <- out$motifMatches
Computes expected counts (a glorified rowMeans)
getExpectation(counts, grouping = NULL, normalize = TRUE)getExpectation(counts, grouping = NULL, normalize = TRUE)
counts |
A count matrix, or object inheriting SummarizedExperiment with a 'counts' assay. |
grouping |
An optional vector of length equal to |
normalize |
Logical; whether to normalize data between averaging (but
after grouping). Default TRUE and highly recommended if providing
|
A vector of expectation for each row of counts
attach(getDummyData()) e <- getExpectation(counts)attach(getDummyData()) e <- getExpectation(counts)
Normalizes the z-scores assay of a deviations object to make the scores comparable across motifs with different number of matches.
normalizeDevsForSize(dev)normalizeDevsForSize(dev)
dev |
A SummarizedExperiment object as produced by
|
The dev object with an additional assay named 'norm'.
attach(getDummyData()) dev <- betterChromVAR(counts, motifMatches) dev <- normalizeDevsForSize(dev) devattach(getDummyData()) dev <- betterChromVAR(counts, motifMatches) dev <- normalizeDevsForSize(dev) dev
Given a background generated by getBackgroundBins, samples
background peaks for each input peak.
sampleBackgroundPeaks(background, niterations = 50)sampleBackgroundPeaks(background, niterations = 50)
background |
A |
niterations |
Number of background peaks to sample for each target peak. |
This function is not used by betterChromVAR, which is
deterministic, but for other applications requiring an outputs similar to
that of the original getBackgroundPeaks.
A peaks x niterations matrix of integers representing the indices of the sampled background peaks.
counts_se <- getDummyData()$counts background <- getBackgroundBins(counts_se) bg_peaks <- sampleBackgroundPeaks(background, niterations=20)counts_se <- getDummyData()$counts background <- getBackgroundBins(counts_se) bg_peaks <- sampleBackgroundPeaks(background, niterations=20)
Empirical Bayes shrinkage of a matrix of counts towards a prior proportion (by default the mean across columns).
shrinkColumnProps(x, shrinkTo = NULL, var.theo = FALSE)shrinkColumnProps(x, shrinkTo = NULL, var.theo = FALSE)
x |
A matrix of counts, with features as rows and samples as columns. |
shrinkTo |
A vector (of length equal to |
var.theo |
Logical; whether to use theoretical (i.e. binomial) variances of the proportions, rather than the observed (weighted) variance. |
A matrix of the same dimensions as x representing the shrunk
column-wise proportions.
# generate a matrix of 5 sampling (with different total counts) of 20 # features based on the same base frequency : baseFreq <- abs(rnorm(20)) baseFreq <- baseFreq/sum(baseFreq) mat <- sapply(c(10,20,30,40,50), function(tot){ rpois(length(baseFreq), baseFreq*tot) }) # apply shrinkage and confirm that shrunk proportions are better correlated shrunk_mat <- shrinkColumnProps(mat) mean(cor(shrunk_mat))>mean(cor(mat))# generate a matrix of 5 sampling (with different total counts) of 20 # features based on the same base frequency : baseFreq <- abs(rnorm(20)) baseFreq <- baseFreq/sum(baseFreq) mat <- sapply(c(10,20,30,40,50), function(tot){ rpois(length(baseFreq), baseFreq*tot) }) # apply shrinkage and confirm that shrunk proportions are better correlated shrunk_mat <- shrinkColumnProps(mat) mean(cor(shrunk_mat))>mean(cor(mat))