Title: | simPIC: flexible simulation of paired-insertion counts for single-cell ATAC-sequencing data |
---|---|
Description: | simPIC is a package for simulating single-cell ATAC-seq count data. It provides a user-friendly, well documented interface for data simulation. Functions are provided for parameter estimation, realistic scATAC-seq data simulation, and comparing real and simulated datasets. |
Authors: | Sagrika Chugh [aut, cre] , Davis McCarthy [aut], Heejung Shim [aut] |
Maintainer: | Sagrika Chugh <[email protected]> |
License: | GPL-3 |
Version: | 1.3.0 |
Built: | 2025-01-02 05:44:01 UTC |
Source: | https://github.com/bioc/simPIC |
Add additional feature statistics to a SingleCellExperiment object
addFeatureStats( sce, value = "counts", log = FALSE, offset = 1, no.zeros = FALSE )
addFeatureStats( sce, value = "counts", log = FALSE, offset = 1, no.zeros = FALSE )
sce |
SingleCellExperiment to add feature statistics to. |
value |
the count value to calculate statistics. |
log |
logical. Whether to take log2 before calculating statistics. |
offset |
offset to add to avoid taking log of zero. |
no.zeros |
logical. Whether to remove all zeros from each feature before calculating statistics. |
Currently adds the following statistics: mean and variance. Statistics
are added to the rowData
slot and are named
Stat[Log]Value[No0]
where Log
and No0
are added if
those arguments are true.
SingleCellExperiment with additional feature statistics
This function converts a dgc/sparse matrix into a SingleCellExperiment(SCE) object.
convert_to_SCE(sparse_data)
convert_to_SCE(sparse_data)
sparse_data |
A sparse matrix containing count data, where rows are peaks and columns represent cells. |
A SingleCellExperiment(SCE) object with the sparse matrix stored in the "counts" assay.
Get counts matrix from a SingleCellExperiment object. If counts is missing a warning is issued and the first assay is returned.
getCounts(sce)
getCounts(sce)
sce |
SingleCellExperiment object |
counts matrix
simPIC: Simulate single-cell ATAC-seq data
globalvariables
Create a newsimPICcount object to store parameters.
newsimPICcount(...)
newsimPICcount(...)
... |
Variables to set newsimPICcount object parameters. |
This function creates the object variable which is passed in all functions.
new object from class simPICcount.
object <- newsimPICcount()
object <- newsimPICcount()
This function defines a custom theme for ggplot2 to ensure consistent visual appearance across multiple plots.
plot_theme()
plot_theme()
A ggplot2 theme object with predefined settings.
Bind the rows of two data frames, keeping only the columns that are common to both.
rbindMatched(df1, df2)
rbindMatched(df1, df2)
df1 |
first data.frame to bind. |
df2 |
second data.frame to bind. |
data.frame containing rows from df1
and df2
but
only common columns.
Trying two fitting methods and selecting the best one.
selectFit(data, distr, verbose = TRUE)
selectFit(data, distr, verbose = TRUE)
data |
The data to fit. |
distr |
Name of the distribution to fit. |
verbose |
logical. To print messages or not. |
The distribution is fitted to the data using each of the
fitdist
fitting methods. The fit with the
smallest Cramer-von Mises statistic is selected.
The selected fit object
Set input parameters of the simPICcount object.
setsimPICparameters(object, update = NULL, ...)
setsimPICparameters(object, update = NULL, ...)
object |
input simPICcount object. |
update |
new parameters. |
... |
set new parameters for simPICcount object. |
simPICcount object with updated parameters.
object <- newsimPICcount() object <- setsimPICparameters(object, nCells = 200, nPeaks = 500)
object <- newsimPICcount() object <- setsimPICparameters(object, nCells = 200, nPeaks = 500)
Combine data from several SingleCellExperiment objects and produce some basic plots comparing them.
simPICcompare( sces, point.size = 0.2, point.alpha = 0.1, fits = TRUE, colours = NULL )
simPICcompare( sces, point.size = 0.2, point.alpha = 0.1, fits = TRUE, colours = NULL )
sces |
named list of SingleCellExperiment objects to combine and compare. |
point.size |
size of points in scatter plots. |
point.alpha |
opacity of points in scatter plots. |
fits |
whether to include fits in scatter plots. |
colours |
vector of colours to use for each dataset. |
The returned list has three items:
RowData
Combined row data from the provided SingleCellExperiments.
ColData
Combined column data from the provided SingleCellExperiments.
Plots
Comparison plots
Means
Boxplot of mean distribution.
Variances
Boxplot of variance distribution.
MeanVar
Scatter plot with fitted lines showing the mean-variance relationship.
LibrarySizes
Boxplot of the library size distribution.
ZerosPeak
Boxplot of the percentage of each peak that is zero.
ZerosCell
Boxplot of the percentage of each cell that is zero.
MeanZeros
Scatter plot with fitted lines showing the mean-zeros relationship.
The plots returned by this function are created using
ggplot
and are only a sample of the kind of plots
you might like to consider. The data used to create these plots is also
returned and should be in the correct format to allow you to create
further plots using ggplot
.
List containing the combined datasets and plots.
sim1 <- simPICsimulate( nPeaks = 1000, nCells = 500, pm.distr = "weibull", seed = 7856 ) sim2 <- simPICsimulate( nPeaks = 1000, nCells = 500, pm.distr = "gamma", seed = 4234 ) comparison <- simPICcompare(list(weibull = sim1, gamma = sim2)) names(comparison) names(comparison$Plots)
sim1 <- simPICsimulate( nPeaks = 1000, nCells = 500, pm.distr = "weibull", seed = 7856 ) sim2 <- simPICsimulate( nPeaks = 1000, nCells = 500, pm.distr = "gamma", seed = 4234 ) comparison <- simPICcompare(list(weibull = sim1, gamma = sim2)) names(comparison) names(comparison$Plots)
S4 class that holds parameters for simPIC simulation.
a simPIC class object.
The parameters not shown in brackets can be estimated from real data
using simPICestimate
. For details of the simPIC simulation
see simPICsimulate
. The default parameters are based on PBMC10k
dataset and can be reproduced using test data and script provided in
inst/script
simPIC simulation parameters:
nPeaks
The number of peaks to simulate.
nCells
The number of cells to simulate.
[seed]
Seed to use for generating random numbers.
[default]
The logical variable whether to use default parameters (TRUE) or learn from data (FALSE)
lib.size.meanlog
meanlog (location) parameter for the library size log-normal distribution.
lib.size.sdlog
sdlog (scale) parameter for the library size log-normal distribution.
mean.scale
scale parameter for the mean weibull distribution.
mean.shape
shape parameter for the mean weibull distribution.
sparsity
probability of openness to be multiplied to the input of poisson distribution to generate final simulated matrix.
Estimate simulation parameters for library size, peak means, and sparsity for simPIC simulation from a real peak by cell input matrix
simPICestimate( counts, object = newsimPICcount(), pm.distr = c("gamma", "weibull", "pareto", "lngamma"), verbose = TRUE ) ## S3 method for class 'SingleCellExperiment' simPICestimate( counts, object = newsimPICcount(), pm.distr = "weibull", verbose = TRUE ) ## S3 method for class 'dgCMatrix' simPICestimate( counts, object = newsimPICcount(), pm.distr = "weibull", verbose = TRUE )
simPICestimate( counts, object = newsimPICcount(), pm.distr = c("gamma", "weibull", "pareto", "lngamma"), verbose = TRUE ) ## S3 method for class 'SingleCellExperiment' simPICestimate( counts, object = newsimPICcount(), pm.distr = "weibull", verbose = TRUE ) ## S3 method for class 'dgCMatrix' simPICestimate( counts, object = newsimPICcount(), pm.distr = "weibull", verbose = TRUE )
counts |
either a sparse peak by cell count matrix, or a SingleCellExperiment object containing count data to estimate parameters. |
object |
simPICcount object to store estimated parameters and counts. |
pm.distr |
statistical distribution for estimating peak mean parameters. Available distributions: gamma, weibull, lngamma, pareto. Default is weibull. |
verbose |
logical variable. Prints the simulation progress if TRUE. |
simPICcount object containing all estimated parameters.
counts <- readRDS(system.file("extdata", "test.rds", package = "simPIC")) est <- newsimPICcount() est <- simPICestimate(counts, pm.distr = "weibull")
counts <- readRDS(system.file("extdata", "test.rds", package = "simPIC")) est <- newsimPICcount() est <- simPICestimate(counts, pm.distr = "weibull")
Estimate the library size parameters for simPIC simulation.
simPICestimateLibSize(counts, object, verbose)
simPICestimateLibSize(counts, object, verbose)
counts |
count matrix. |
object |
simPICcount object to store estimated values. |
verbose |
logical. To print messages or not. |
Parameters for the lognormal distribution are estimated by fitting the
library sizes using fitdist
. All the fitting
methods are tried and the fit with the best Cramer-von Mises statistic is
selected.
simPICcount object with estimated library size parameters.
Estimate peak mean parameters for simPIC simulation
simPICestimatePeakMean(norm.counts, object, pm.distr, verbose)
simPICestimatePeakMean(norm.counts, object, pm.distr, verbose)
norm.counts |
library size normalised counts matrix. |
object |
simPICcount object to store estimated values. |
pm.distr |
distribution parameter for peak means. |
verbose |
logical. To print progress messages or not. |
Parameters for gamma distribution are estimated by fitting the mean
normalised counts using fitdist
.
All the fitting methods are tried and the fit with the best Cramer-von
Mises statistic is selected.
simPICcount object containing all estimated parameters
Extract the accessibility proportion (sparsity) of each cell among all peaksvfrom the input count matrix.
simPICestimateSparsity(norm.counts, object, verbose)
simPICestimateSparsity(norm.counts, object, verbose)
norm.counts |
A sparse count matrix to estimate parameters from. |
object |
simPICcount object to store estimated parameters. |
verbose |
logical. To print messages or not. |
Vector of non-zero cell proportions of peaks is calculated by dividing the number of non-zero entries over the number of all cells for each peak.
simPICcount object with updated non-zero cell proportion parameter.
Get the value of a single variable from input simPICcount object.
simPICget(object, name)
simPICget(object, name)
object |
input simPICcount object. |
name |
name of the parameter. |
Value of the input parameter.
object <- newsimPICcount() nPeaks <- simPICget(object, "nPeaks")
object <- newsimPICcount() nPeaks <- simPICget(object, "nPeaks")
Get multiple parameter values from a simPIC object.
simPICgetparameters(object, names)
simPICgetparameters(object, names)
object |
input object to get values from. |
names |
vector of names of the parameters to get. |
List with the values of the selected parameters.
object <- newsimPICcount() simPICgetparameters(object, c("nPeaks", "nCells", "peak.mean.shape"))
object <- newsimPICcount() simPICgetparameters(object, c("nPeaks", "nCells", "peak.mean.shape"))
Simulate peak by cell count matrix from a sparse single-cell ATAC-seq peak by cell input using simPIC methods.
simPICsimulate( object = newsimPICcount(), verbose = TRUE, pm.distr = "weibull", ... )
simPICsimulate( object = newsimPICcount(), verbose = TRUE, pm.distr = "weibull", ... )
object |
simPICcount object with simulation parameters.
See |
verbose |
logical variable. Prints the simulation progress if TRUE. |
pm.distr |
distribution parameter for peak means. Available distributions: gamma, weibull, lngamma, pareto. Default is weibull. |
... |
Any additional parameter settings to override what is provided
in |
simPIC provides the option to manually adjust each of the
simPICcount
object parameters by calling
setsimPICparameters
.
The simulation involves following steps:
Set up simulation parameters
Set up SingleCellExperiment object
Simulate library sizes
Simulate sparsity
Simulate peak means
Create final synthetic counts
The final output is a
SingleCellExperiment
object that
contains the simulated count matrix. The parameters are stored in the
colData
(for cell specific information),
rowData
(for peak specific information) or
assays
(for peak by cell matrix) slots. This additional
information includes:
SingleCellExperiment object containing the simulated counts.
# default simulation sim <- simPICsimulate(pm.distr = "weibull")
# default simulation sim <- simPICsimulate(pm.distr = "weibull")
Generate library sizes for cells in simPIC simulation based on the estimated values of mus and sigmas.
simPICsimulateLibSize(object, sim, verbose)
simPICsimulateLibSize(object, sim, verbose)
object |
simPICcount object with simulation parameters. |
sim |
SingleCellExperiment object containing simulation parameters. |
verbose |
logical. To print progress messages. |
SingleCellExperiment object with simulated library sizes.
Generate peak means for cells in simPIC simulation based on the estimated values of shape and rate parameters.
simPICsimulatePeakMean(object, sim, pm.distr, verbose)
simPICsimulatePeakMean(object, sim, pm.distr, verbose)
object |
simPICcount object with simulation parameters. |
sim |
SingleCellExperiment object containing simulation parameters. |
pm.distr |
distribution parameter for peak means. Available distributions: gamma, weibull, lngamma, pareto. Default is weibull. |
verbose |
logical. Whether to print progress messages. |
SingleCellExperiment object with simulated peak means.
Counts are simulated from a poisson distribution where each peak has a mean, expected library size and proportion of accessible chromatin.
simPICsimulateTrueCounts(object, sim)
simPICsimulateTrueCounts(object, sim)
object |
simPICcount object with simulation parameters. |
sim |
SingleCellExperiment object containing simulation parameters. |
SingleCellExperiment object with simulated true counts.