| Title: | Flexible simulation of paired-insertion counts for single-cell ATAC-sequencing data |
|---|---|
| Description: | simPIC is a package for simulating single-cell ATAC-seq count data. It provides a user-friendly, well documented interface for data simulation. Functions are provided for parameter estimation, realistic scATAC-seq data simulation, and comparing real and simulated datasets. |
| Authors: | Sagrika Chugh [aut, cre] |
| Maintainer: | Sagrika Chugh <[email protected]> |
| License: | GPL-3 |
| Version: | 1.9.0 |
| Built: | 2026-05-28 18:05:35 UTC |
| Source: | https://github.com/bioc/simPIC |
Add additional feature statistics to a SingleCellExperiment object
addFeatureStats(sce, value = "counts", log = FALSE, offset = 1, no.zeros = FALSE)
sce |
SingleCellExperiment to add feature statistics to. |
value |
the count value to calculate statistics. |
log |
logical. Whether to take log2 before calculating statistics. |
offset |
offset to add to avoid taking log of zero. |
no.zeros |
logical. Whether to remove all zeros from each feature before calculating statistics. |
Currently adds the following statistics: mean and variance. Statistics
are added to the rowData slot and are named
Stat[Log]Value[No0] where Log and No0 are added if
those arguments are true.
SingleCellExperiment with additional feature statistics
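The options above can be illustrated with a minimal base R sketch. The helper `feature_stats` and the toy matrix are hypothetical and only mirror the documented arguments (`log`, `offset`, `no.zeros`); this is not the package's implementation, and it works on a plain matrix rather than a SingleCellExperiment.

```r
# Toy peak-by-cell count matrix (3 peaks x 4 cells); values are illustrative.
counts <- matrix(
    c(0, 2, 4, 6,
      1, 1, 1, 1,
      0, 0, 3, 5),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("peak", 1:3), paste0("cell", 1:4))
)

# Hypothetical helper mirroring the documented options (assumption, not
# the package code): per-feature mean and variance.
feature_stats <- function(mat, log = FALSE, offset = 1, no.zeros = FALSE) {
    if (no.zeros) mat[mat == 0] <- NA       # ignore zeros within each feature
    if (log) mat <- log2(mat + offset)      # log2 transform with an offset
    data.frame(
        Mean = rowMeans(mat, na.rm = TRUE),
        Var = apply(mat, 1, var, na.rm = TRUE)
    )
}

stats_raw <- feature_stats(counts)
stats_no0 <- feature_stats(counts, no.zeros = TRUE)
stats_log <- feature_stats(counts, log = TRUE)
```

With `no.zeros = TRUE` the mean of the first peak is taken over its three non-zero values only, which is why the two results differ.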
This function converts a sparse matrix into a SingleCellExperiment (SCE) object.
convert_to_SCE(sparse_data)
sparse_data |
A sparse matrix containing count data, where rows are peaks and columns represent cells. |
A SingleCellExperiment (SCE) object with the sparse matrix stored in the "counts" assay.
Reorders assays in a SingleCellExperiment so that the counts assay
appears first. This keeps assay(sce) behavior predictable for
downstream code that still assumes the primary assay is counts.
ensureCountsFirst(sce)
sce |
SingleCellExperiment object. |
SingleCellExperiment with counts assay first if present.
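The reordering idea can be shown on a plain named list standing in for the assay list. The helper `counts_first` is hypothetical and is only a sketch of the behaviour described above, not the package's code.

```r
# Stand-in for the assays of a SingleCellExperiment: a plain named list.
assay_list <- list(
    logcounts = matrix(0, nrow = 2, ncol = 2),
    counts = matrix(1, nrow = 2, ncol = 2)
)

# Hypothetical helper: move "counts" to the front when it is present,
# otherwise return the list unchanged.
counts_first <- function(assays) {
    if (!"counts" %in% names(assays)) {
        return(assays)
    }
    assays[c("counts", setdiff(names(assays), "counts"))]
}

reordered <- counts_first(assay_list)
names(reordered)
```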
Get counts matrix from a SingleCellExperiment object. If counts is missing a warning is issued and the first assay is returned.
getCounts(sce)
sce |
SingleCellExperiment object |
counts matrix
simPIC: Simulate single-cell ATAC-seq data
globalvariables
Create a new simPICcount object to store parameters.
newsimPICcount(...)
... |
Variables to set newsimPICcount object parameters. |
This function creates the simPICcount object that is passed to the other simPIC functions.
A new object of class simPICcount.
object <- newsimPICcount()
This function defines a custom theme for ggplot2 to ensure consistent visual appearance across multiple plots.
plot_theme()
A ggplot2 theme object with predefined settings.
Bind the rows of two data frames, keeping only the columns that are common to both.
rbindMatched(df1, df2)
df1 |
first data.frame to bind. |
df2 |
second data.frame to bind. |
data.frame containing rows from df1 and df2 but
only common columns.
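A minimal base R sketch of the same idea, assuming two small illustrative data frames (the column names are made up for the example):

```r
# Two data frames that share only the "Peak" and "Mean" columns.
df1 <- data.frame(Peak = c("p1", "p2"), Mean = c(0.5, 1.2), Dataset = "real")
df2 <- data.frame(Peak = c("p3", "p4"), Mean = c(0.7, 0.9), Var = c(0.1, 0.2))

# Keep the columns present in both, then stack the rows.
common <- intersect(colnames(df1), colnames(df2))
combined <- rbind(df1[, common, drop = FALSE], df2[, common, drop = FALSE])
combined
```

Columns that appear in only one input (`Dataset`, `Var`) are dropped from the result.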
Trying two fitting methods and selecting the best one.
selectFit(data, distr, verbose = TRUE)
data |
The data to fit. |
distr |
Name of the distribution to fit. |
verbose |
Logical. Whether to print progress messages. |
The distribution is fitted to the data using each of the
fitdist fitting methods. The fit with the
smallest Cramer-von Mises statistic is selected.
The selected fit object
Set input parameters of the simPICcount object.
setsimPICparameters(object, update = NULL, ...)
object |
input simPICcount object. |
update |
new parameters. |
... |
set new parameters for simPICcount object. |
simPICcount object with updated parameters.
object <- newsimPICcount()
object <- setsimPICparameters(object, nCells = 200, nPeaks = 500)
Combine data from several SingleCellExperiment objects and produce some basic plots comparing them.
simPICcompare(sces, point.size = 0.2, point.alpha = 0.1, fits = TRUE, colours = NULL)
sces |
named list of SingleCellExperiment objects to combine and compare. |
point.size |
size of points in scatter plots. |
point.alpha |
opacity of points in scatter plots. |
fits |
whether to include fits in scatter plots. |
colours |
vector of colours to use for each dataset. |
The first dataset in sces is treated as the reference dataset when
computing Kolmogorov-Smirnov (KS) summaries.
The returned list has four items:
RowData: Combined row data from the provided SingleCellExperiments.
ColData: Combined column data from the provided SingleCellExperiments.
KS: Data frame summarizing KS statistics for peak means,
library sizes, and cell sparsity, comparing each dataset to the first
dataset in sces.
Plots: Comparison plots:
  Means: Boxplot of mean distribution.
  Variances: Boxplot of variance distribution.
  MeanVar: Scatter plot with fitted lines showing the mean-variance relationship.
  LibrarySizes: Boxplot of the library size distribution.
  ZerosPeak: Boxplot of the percentage of each peak that is zero.
  ZerosCell: Boxplot of the percentage of each cell that is zero.
  MeanZeros: Scatter plot with fitted lines showing the mean-zeros relationship.
The plots returned by this function are created using
ggplot and are only a sample of the kind of plots
you might like to consider. The data used to create these plots is also
returned and should be in the correct format to allow you to create
further plots using ggplot.
List containing the combined datasets and plots.
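To make the KS summary concrete, here is a small base R sketch that compares simulated library sizes against a reference with `stats::ks.test`. The lognormal parameters and the column names of `ks_summary` are assumptions for illustration; the data frame returned by simPICcompare may be laid out differently.

```r
set.seed(1)

# Illustrative library sizes; the lognormal parameters are arbitrary.
lib_ref <- rlnorm(500, meanlog = 9.0, sdlog = 0.5)  # reference dataset
lib_sim <- rlnorm(500, meanlog = 9.1, sdlog = 0.5)  # dataset being compared

# Two-sample Kolmogorov-Smirnov test against the reference.
ks <- ks.test(lib_sim, lib_ref)

# Hypothetical one-row summary (column names are assumptions).
ks_summary <- data.frame(
    Dataset = "sim",
    Statistic = unname(ks$statistic),
    PValue = ks$p.value
)
ks_summary
```

The KS statistic is the largest gap between the two empirical distribution functions, so values near 0 indicate closely matching distributions.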
sim1 <- simPICsimulate(nPeaks = 1000, nCells = 500, pm.distr = "weibull", seed = 7856)
sim2 <- simPICsimulate(nPeaks = 1000, nCells = 500, pm.distr = "gamma", seed = 4234)
comparison <- simPICcompare(list(weibull = sim1, gamma = sim2))
names(comparison)
names(comparison$Plots)
S4 class that holds parameters for simPIC simulation.
a simPIC class object.
The parameters not shown in brackets can be estimated from real data
using simPICestimate. For details of the simPIC simulation
see simPICsimulate. The default parameters are based on the
PBMC10k dataset and can be reproduced using the test data and script
provided in inst/scripts.
simPIC simulation parameters:
nPeaks: The number of peaks to simulate.
nCells: The number of cells to simulate.
[seed]: Seed to use for generating random numbers.
[default]: Logical value indicating whether to use default parameters (TRUE) or learn parameters from data (FALSE).
lib.size.meanlog: meanlog (location) parameter for the library size log-normal distribution.
lib.size.sdlog: sdlog (scale) parameter for the library size log-normal distribution.
mean.scale: scale parameter for the mean Weibull distribution.
mean.shape: shape parameter for the mean Weibull distribution.
sparsity: Probability that contributes to the sparsity of the final simulated matrix.
Parameters are estimated using the estimateDisp function
in the edgeR package.
simPICEstBCV(counts, object, verbose)
counts |
counts matrix to estimate parameters from. |
object |
simPICcount object to store estimated values in. |
verbose |
Logical. Whether to print progress messages. |
The estimateDisp function is used to estimate the common
dispersion and prior degrees of freedom. See
estimateDisp for details. When estimating parameters on
simulated data we found a broadly linear relationship between the true
underlying common dispersion and the edgeR estimate, therefore we
apply a small correction, disp = -0.3 + 0.15 * edgeR.disp.
simPICcount object with estimated values.
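As a small worked example of the correction quoted above, the sketch below applies the linear formula to an illustrative edgeR estimate. The helper name `correct_dispersion` and the input value are hypothetical.

```r
# Linear correction quoted in the text: disp = -0.3 + 0.15 * edgeR.disp
# (hypothetical helper name; the input value below is illustrative only).
correct_dispersion <- function(edgeR.disp) {
    -0.3 + 0.15 * edgeR.disp
}

edgeR_estimate <- 4
corrected <- correct_dispersion(edgeR_estimate)
corrected
```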
Estimate simulation parameters for library size, peak means, and sparsity from a real peak-by-cell input matrix.
simPICestimate(
  counts, object = newsimPICcount(),
  pm.distr = c("lngamma", "gamma", "weibull", "pareto"),
  method = c("single", "groups"), verbose = TRUE
)
## S3 method for class 'SingleCellExperiment'
simPICestimate(
  counts, object = newsimPICcount(), pm.distr = "lngamma",
  method = "single", verbose = TRUE
)
## S3 method for class 'dgCMatrix'
simPICestimate(
  counts, object = newsimPICcount(), pm.distr = "lngamma",
  method = "single", verbose = TRUE
)
counts |
either a sparse peak by cell count matrix, or a SingleCellExperiment object containing count data to estimate parameters. |
object |
simPICcount object to store estimated parameters and counts. |
pm.distr |
statistical distribution for estimating peak mean parameters. Available distributions: gamma, weibull, lngamma, pareto. Default is lngamma. |
method |
Simulation mode. Use |
verbose |
logical variable. Prints the simulation progress if TRUE. |
simPICcount object containing all estimated parameters.
counts <- readRDS(system.file("extdata", "test.rds", package = "simPIC"))
est <- newsimPICcount()
est <- simPICestimate(counts, pm.distr = "lngamma")
Estimate the library size parameters for simPIC simulation.
simPICestimateLibSize(counts, object, verbose)
counts |
count matrix. |
object |
simPICcount object to store estimated values. |
verbose |
Logical. Whether to print progress messages. |
Parameters for the lognormal distribution are estimated by fitting the
library sizes using fitdist. All the fitting
methods are tried and the fit with the best Cramer-von Mises statistic is
selected.
simPICcount object with estimated library size parameters.
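For intuition about the two lognormal parameters, the sketch below recovers them from simulated library sizes using closed-form estimates on the log scale. This is a simplified stand-in, not the fitdist-based procedure described above, and the true parameter values are arbitrary.

```r
set.seed(42)

# Simulated library sizes with known (arbitrary) lognormal parameters.
lib_sizes <- rlnorm(2000, meanlog = 8, sdlog = 0.6)

# Closed-form estimates on the log scale (a stand-in for fitdist):
# the mean and standard deviation of log(library size).
meanlog_hat <- mean(log(lib_sizes))
sdlog_hat <- sd(log(lib_sizes))

c(meanlog = meanlog_hat, sdlog = sdlog_hat)
```

In practice the library sizes would be the column sums of the count matrix rather than simulated values.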
Estimate peak mean parameters for simPIC simulation
simPICestimatePeakMean(norm.counts, object, pm.distr, verbose)
norm.counts |
Library-size normalized count matrix. |
object |
simPICcount object to store estimated values. |
pm.distr |
distribution parameter for peak means. |
verbose |
Logical. Whether to print progress messages. |
Parameters for the gamma distribution are estimated by fitting the mean
normalized counts using fitdist.
All the fitting methods are tried and the fit with the best Cramer-von
Mises statistic is selected.
simPICcount object containing all estimated parameters
This function estimates cell sparsity from a normalized count matrix and updates the parameters of a simPIC object accordingly.
simPICestimateSparsity(norm.counts, object, verbose)
norm.counts |
A normalized count matrix to estimate parameters from. |
object |
simPICcount object to store estimated parameters. |
verbose |
Logical. Whether to print progress messages. |
simPICcount object with updated sparsity parameter.
Get the value of a single variable from input simPICcount object.
simPICget(object, name)
object |
input simPICcount object. |
name |
name of the parameter. |
Value of the input parameter.
object <- newsimPICcount()
nPeaks <- simPICget(object, "nPeaks")
Get multiple parameter values from a simPIC object.
simPICgetparameters(object, names)
object |
input object to get values from. |
names |
vector of names of the parameters to get. |
List with the values of the selected parameters.
object <- newsimPICcount()
simPICgetparameters(object, c("nPeaks", "nCells", "peak.mean.shape"))
Run a streamlined population-scale Microglia example using packaged
SingleCellExperiment, sample mapping, peak annotation, and VCF files.
The workflow follows the splatPop estimation pattern by estimating
population means from retained donor-batch units while estimating
single-cell behaviour from the donor-batch unit with the most cells.
simPICMicrogliaExample(
  sce = NULL, sample.map = NULL, peak.annot = NULL, vcf = NULL, params = NULL,
  min.cells = 20L, pop.cv.bins = 10L, eqtl.n = 0.05, similarity.scale = 2,
  eqtl.dist = 1e+08, sparsify = FALSE, pca.ntop = 100L, pca.components = 5L,
  plot.n.samples = 4L, plot.samples = NULL, comparison.batch = NULL,
  comparison.pathology = NULL, seed = NULL, verbose = TRUE, ...
)
sce |
Optional |
sample.map |
Optional data.frame or path to a tab-separated sample map
linking VCF donor IDs to Microglia donor IDs. If |
peak.annot |
Optional data.frame or path to the chr14 peak annotation
table. If |
vcf |
Optional |
params |
Optional |
min.cells |
Minimum number of cells required per donor-batch unit for inclusion in the aggregated population means. Units must exceed this threshold. |
pop.cv.bins |
Number of CV bins used when |
eqtl.n |
Proportion of peaks to simulate with caQTL effects. The packaged example uses a modest default to keep the workflow lightweight while still including genetic effects. |
similarity.scale |
Similarity scale passed to
|
eqtl.dist |
Maximum cis-distance used when assigning caQTL effects. |
sparsify |
Logical. Whether to sparsify the simulated output. |
pca.ntop |
Number of variable peaks used for PCA plots. |
pca.components |
Number of principal components to compute for the PCA summaries. |
plot.n.samples |
Number of libraries to show in the default PCA comparison panels. The top libraries are chosen by real-data cell count. |
plot.samples |
Optional character vector of specific libraries to show
in the PCA comparison panels. When |
comparison.batch |
Optional real-data batch/library label used for the Poptrial-style comparison panels. When supplied, the real data are subset to this batch and the simulated data are matched by the same sample IDs. |
comparison.pathology |
Optional pathology label used together with
|
seed |
Optional random seed. |
verbose |
Logical. Whether to print progress messages. |
... |
Additional parameters passed to |
A list containing:
sce: Filtered real-data Microglia SingleCellExperiment.
bigcounts: The donor-batch subset with the most cells used for count-based estimation.
aggregated: Aggregated donor-batch means as a SingleCellExperiment.
params: Estimated SplatPopParams after Microglia-specific parameter updates.
sim: Simulated SingleCellExperiment.
plot_samples: Libraries used for the default PCA comparison panels.
plots: A named list containing real and simulated PCA plot summaries, side-by-side comparison panels, and bluster-based silhouette and neighborhood-purity panels.
sample_map: The cleaned sample map used for alignment.
peak_annot: The chr14 peak annotation used for simulation.
vcf: The aligned VCF used for simulation.
if (requireNamespace("splatter", quietly = TRUE) &&
    requireNamespace("VariantAnnotation", quietly = TRUE) &&
    requireNamespace("bluster", quietly = TRUE)) {
    out <- simPICMicrogliaExample(verbose = FALSE)
    out$plot_samples
    names(out$plots$comparison)
}
Create side-by-side real-versus-simulated comparison panels inspired by the
bluster-based plotting workflow used in Poptrial.Rmd. The comparison
can be restricted either to a chosen set of samples or to a real-data batch,
in which case the simulated data are matched by the same sample IDs.
simPICplotBlusterComparison(
  real.sce, simulated.sce, sample.col = "Sample", batch.col = NULL,
  subset.batch = NULL, pathology.col = NULL, subset.pathology = NULL,
  plot.samples = NULL, plot.n.samples = 5L, pca.ntop = 2000L,
  pca.components = 10L, point.size = 0.8, verbose = TRUE
)
real.sce |
Real-data |
simulated.sce |
Simulated |
sample.col |
Column in |
batch.col |
Optional column in the real-data |
subset.batch |
Optional batch/library value used to subset the real data before matching the simulated samples. |
pathology.col |
Optional pathology column in the real-data
|
subset.pathology |
Optional pathology value used together with
|
plot.samples |
Optional character vector of samples to compare. Ignored
when |
plot.n.samples |
Number of top samples to keep when neither
|
pca.ntop |
Number of most variable peaks used for PCA. |
pca.components |
Number of principal components to compute. |
point.size |
Point size used in PCA panels. |
verbose |
Logical. Whether to print progress messages. |
A list with the subsetted real and simulated SCEs, the selected
sample IDs, the bluster metric tables, and three ggplot panels:
cell_pca_by_sample, silhouette_width, and
neighborhood_purity.
Create splatPop-style PCA plots from a real peak-by-cell matrix or
SingleCellExperiment. When metadata is not already available, sample
IDs can be inferred from cell names, which is useful for peak-by-cell
matrices where columns are encoded like sample#barcode.
simPICplotPopulationPCA(
  counts, sample.col = NULL, batch.col = NULL, sample.pattern = "#.*$",
  sample.replacement = "", pca.ntop = 2000, pca.components = 10,
  aggregate.by.sample = TRUE, point.size = 0.8, verbose = TRUE
)
counts |
A peak-by-cell count matrix, sparse |
sample.col |
Optional name of a sample column already present in
|
batch.col |
Optional name of a batch column already present in
|
sample.pattern |
Regular expression used to strip the barcode suffix from cell names when inferring sample IDs. |
sample.replacement |
Replacement string used with
|
pca.ntop |
Number of most variable peaks to use for PCA. |
pca.components |
Number of principal components to compute. |
aggregate.by.sample |
Logical. Whether to also aggregate cells by sample and create a donor-level PCA plot. |
point.size |
Point size passed to |
verbose |
Logical. Whether to print progress messages. |
A list containing:
cell_sce: Cell-level SingleCellExperiment with inferred metadata, log-normalized counts, and PCA.
sample_sce: Sample-aggregated SingleCellExperiment if aggregate.by.sample = TRUE, otherwise NULL.
plots: A named list of ggplot objects.
if (requireNamespace("scater", quietly = TRUE)) {
    counts <- matrix(rpois(50 * 24, lambda = 3), nrow = 50, ncol = 24)
    colnames(counts) <- paste0(
        rep(paste0("Sample", 1:6), each = 4),
        "#Cell", seq_len(ncol(counts))
    )
    plots <- simPICplotPopulationPCA(counts, verbose = FALSE)
    names(plots$plots)
}
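The default sample.pattern and sample.replacement values shown in the usage can be tried directly with base R `sub()`. The sketch below only illustrates what the regular expression strips from cell names encoded as sample#barcode; how the function applies it internally is an assumption.

```r
# Cell names encoded as sample#barcode (illustrative values).
cell_names <- c("Sample1#Cell1", "Sample1#Cell2", "Sample2#Cell3")

# Default pattern and replacement from the usage above: drop everything
# from the first "#" to the end of the name.
sample_ids <- sub("#.*$", "", cell_names)
sample_ids

# Number of cells per inferred sample.
table(sample_ids)
```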
Simulate a peak by cell matrix given the mean accessibility for each peak in each cell. Cells start with the mean accessibility for the group they belong to (when simulating groups). The selected means are adjusted for each cell's expected library size.
simPICsimSingleCellMeans(object, sim)
simPICsimulateGroupCellMeans(object, sim)
object |
simPIC object with simulation parameters. |
sim |
SingleCellExperiment to add cell means to. |
SingleCellExperiment with added cell means.
Simulate a peak-by-cell count matrix using simPIC methods.
simPICsimulate(
  object = newsimPICcount(), pm.distr = "lngamma",
  method = c("single", "groups"), verbose = TRUE, ...
)
simPICsimulatesingle(object = newsimPICcount(), verbose = TRUE, ...)
simPICsimulatemulti(
  object = newsimPICcount(), pm.distr = "lngamma",
  method = c("groups"), verbose = TRUE, ...
)
object |
simPICcount object with simulation parameters.
See |
pm.distr |
distribution parameter for peak means. Available distributions: gamma, weibull, lngamma, pareto. Default is lngamma. |
method |
Simulation mode. Use |
verbose |
Logical. Whether to print progress messages. |
... |
Any additional parameter settings to override what is provided
in |
simPIC provides the option to manually adjust each of the
simPICcount object parameters by calling
setsimPICparameters.
The simulation involves the following steps:
Set up simulation parameters
Set up SingleCellExperiment object
Simulate library sizes
Simulate sparsity
Simulate peak means
Create final synthetic counts
The final output is a
SingleCellExperiment object that
contains the simulated count matrix. The parameters are stored in the
colData (for cell-specific information),
rowData (for peak-specific information),
or assays (for peak-by-cell matrices).
SingleCellExperiment object containing the simulated counts.
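The steps listed above can be sketched in base R as a simplified generative model. This is an illustration under stated assumptions, not the package's implementation: the parameter values are arbitrary, a gamma distribution stands in for the peak-mean distribution, and sparsity is modelled as an independent accessibility indicator per entry.

```r
set.seed(123)

# 1. Simulation parameters (arbitrary illustrative values).
n_peaks <- 100
n_cells <- 20

# 2. Library sizes from a lognormal distribution.
lib_size <- rlnorm(n_cells, meanlog = 7, sdlog = 0.4)

# 3. Sparsity: 1 = entry can be non-zero, 0 = forced zero (assumed model).
accessible <- matrix(
    rbinom(n_peaks * n_cells, size = 1, prob = 0.7),
    nrow = n_peaks, ncol = n_cells
)

# 4. Peak means from a gamma distribution, scaled to proportions.
peak_mean <- rgamma(n_peaks, shape = 0.8, rate = 2)
peak_prop <- peak_mean / sum(peak_mean)

# 5. Final counts: Poisson draws with rate = peak proportion x library
#    size, zeroed where the accessibility indicator is 0.
lambda <- outer(peak_prop, lib_size) * accessible
counts <- matrix(
    rpois(n_peaks * n_cells, lambda = lambda),
    nrow = n_peaks, ncol = n_cells
)

dim(counts)
```

Entries with a zero rate always yield zero counts, so the accessibility indicator controls the minimum sparsity of the simulated matrix.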
# default simulation
sim <- simPICsimulate(pm.distr = "lngamma")
Simulate means for each peak in each cell that are adjusted to follow a mean-variance trend using a Biological Coefficient of Variation (BCV) drawn from an inverse gamma distribution.
simPICsimulateBCVMeans(object, sim)
object |
simPICcount object with simulation parameters. |
sim |
SingleCellExperiment to add BCV means to. |
SingleCellExperiment with simulated BCV means.
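One common way to impose such a trend, used by Splat-style simulators, is sketched below. Whether simPIC uses exactly this form is an assumption; the variable names and parameter values are hypothetical. The BCV combines a common component with a term that shrinks as the mean grows, scaled by a per-peak scaled inverse chi-squared draw (a special case of the inverse gamma), and the adjusted means are gamma draws centred on the base means.

```r
set.seed(7)

n_peaks <- 50
n_cells <- 10

# Base cell means (peaks x cells); arbitrary illustrative values.
base_means <- matrix(
    rgamma(n_peaks * n_cells, shape = 2, rate = 1),
    nrow = n_peaks, ncol = n_cells
)

bcv_common <- 0.1   # common dispersion component (assumed value)
bcv_df <- 60        # degrees of freedom (assumed value)

# Per-peak scaling factor, recycled down the rows of the matrix.
peak_factor <- sqrt(bcv_df / rchisq(n_peaks, df = bcv_df))
bcv <- (bcv_common + 1 / sqrt(base_means)) * peak_factor

# Gamma draws with expectation shape * scale = base_means.
bcv_means <- matrix(
    rgamma(n_peaks * n_cells, shape = 1 / bcv^2, scale = base_means * bcv^2),
    nrow = n_peaks, ncol = n_cells
)

dim(bcv_means)
```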
Generate library sizes for cells in simPIC simulation based on the estimated values of mus and sigmas.
simPICsimulateLibSize(object, sim, verbose)
object |
simPICcount object with simulation parameters. |
sim |
SingleCellExperiment object containing simulation parameters. |
verbose |
Logical. Whether to print progress messages. |
SingleCellExperiment object with simulated library sizes.
Simulate differential accessibility. Differential accessibility factors for each
group are produced using getLNormFactors and these are added
along with updated means for each group. For paths care is taken to make sure
they are simulated in the correct order.
simPICsimulatemultiDA(object, sim)
object |
simPICcount object with simulation parameters. |
sim |
SingleCellExperiment to add differential accessibility to. |
SingleCellExperiment with simulated differential accessibility.
Generate peak means for cells in simPIC simulation based on the estimated values of shape and rate parameters.
simPICsimulatePeakMean(object, sim, pm.distr, verbose)
object |
simPICcount object with simulation parameters. |
sim |
SingleCellExperiment object containing simulation parameters. |
pm.distr |
distribution parameter for peak means. Available distributions: gamma, weibull, lngamma, pareto. Default is lngamma. |
verbose |
logical. Whether to print progress messages. |
SingleCellExperiment object with simulated peak means.
Counts are simulated from a Poisson distribution where each peak has a mean, expected library size and proportion of accessible chromatin.
simPICsimulateTrueCounts(object, sim)
object |
simPICcount object with simulation parameters. |
sim |
SingleCellExperiment object containing simulation parameters. |
SingleCellExperiment object with simulated true counts.
Counts are simulated from a Poisson distribution where each peak has a mean, expected library size and proportion of accessible chromatin.
simPICsimulateTrueCountsGroups(object, sim)
object |
simPICcount object with simulation parameters. |
sim |
SingleCellExperiment object containing simulation parameters. |
SingleCellExperiment object with simulated true counts.
Estimate splatPop population parameters from peak-by-cell counts. This is a
peak-based wrapper around splatter::splatPopEstimate() that can
optionally aggregate donor-level peak means from a
SingleCellExperiment object and then apply the simPIC BCV correction.
splatPopEstimatePeak(
  counts = NULL, means = NULL, eqtl = NULL, params = NULL,
  sample.col = "Sample", batch.col = NULL,
  aggregate.by = c("sample", "sample_batch"), min.cells = 1,
  pop.cv.bins = 50, apply.bcv.correction = TRUE, verbose = TRUE
)
counts |
Either a |
means |
Optional dense matrix of aggregated peak means, with peaks in rows and donors/samples in columns. |
eqtl |
Optional empirical eQTL table to pass through to
|
params |
Optional |
sample.col |
Name of the sample/donor column in |
batch.col |
Optional batch column in |
aggregate.by |
Whether automatically derived means should be aggregated by sample or by sample-batch combinations. |
min.cells |
Minimum number of cells required per aggregation unit when
deriving |
pop.cv.bins |
Number of CV bins to use when |
apply.bcv.correction |
Logical. If |
verbose |
Logical. Whether to print progress messages. |
A SplatPopParams object.
if (requireNamespace("splatter", quietly = TRUE) &&
    requireNamespace("VariantAnnotation", quietly = TRUE)) {
    set.seed(101)
    gene_means <- rgamma(60, shape = 2, rate = 0.4)
    cell_scales <- runif(48, min = 0.7, max = 1.4)
    counts <- vapply(
        cell_scales,
        function(scale) {
            rnbinom(60, mu = gene_means * scale, size = 0.2)
        },
        numeric(60)
    )
    sce <- SingleCellExperiment::SingleCellExperiment(
        assays = list(counts = counts),
        colData = data.frame(
            Sample = rep(paste0("S", 1:12), each = 4),
            Batch = rep(c("B1", "B2"), each = 24)
        )
    )
    params <- splatPopEstimatePeak(
        counts = sce, sample.col = "Sample", batch.col = "Batch",
        aggregate.by = "sample", min.cells = 2, pop.cv.bins = 10,
        verbose = FALSE
    )
    params
}
Simulate peak-based single-cell data using splatter::splatPopSimulate
while coercing peak annotations into the GFF-like structure that splatPop
expects internally.
splatPopSimulatePeak(
  params = NULL, vcf = NULL, method = c("single", "groups", "paths"),
  gff = NULL, eqtl = NULL, means = NULL, key = NULL, counts.only = FALSE,
  sparsify = TRUE, verbose = TRUE, ...
)
params |
Optional |
vcf |
Optional |
method |
Simulation mode passed through to |
gff |
Peak annotation supplied either as |
eqtl |
Optional empirical eQTL table. |
means |
Optional empirical population means matrix. |
key |
Optional splatPop key. |
counts.only |
Logical. Whether to keep only counts in the simulated object. |
sparsify |
Logical. Whether to sparsify the resulting assays. |
verbose |
Logical. Whether to print progress messages. |
... |
Additional parameters passed to |
A SingleCellExperiment object.
if (requireNamespace("splatter", quietly = TRUE) &&
    requireNamespace("VariantAnnotation", quietly = TRUE)) {
    params <- splatter::newSplatPopParams(nGenes = 20)
    params <- splatter::setParams(params, batchCells = c(20))
    sim <- splatPopSimulatePeak(
        params = params,
        vcf = splatter::mockVCF(),
        gff = splatter::mockGFF()[seq_len(20), ],
        sparsify = FALSE,
        verbose = FALSE
    )
    sim
}