Title: | Simulation of Multi-Modality Single Cell Data Guided By Gene Regulatory Networks and Cell-Cell Interactions |
---|---|
Description: | scMultiSim simulates paired single cell RNA-seq, single cell ATAC-seq and RNA velocity data, while incorporating mechanisms of gene regulatory networks, chromatin accessibility and cell-cell interactions. It allows users to tune various parameters controlling the amount of each biological factor, variation of gene-expression levels, the influence of chromatin accessibility on RNA sequence data, and so on. It can be used to benchmark various computational methods for single cell multi-omics data, and to assist in experimental design of wet-lab experiments. |
Authors: | Hechen Li [aut, cre] , Xiuwei Zhang [aut], Ziqi Zhang [aut], Michael Squires [aut] |
Maintainer: | Hechen Li <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.3.0 |
Built: | 2025-01-17 05:02:06 UTC |
Source: | https://github.com/bioc/scMultiSim |
Add experimental noise to true counts
add_expr_noise(results, ...)
add_expr_noise(results, ...)
results |
The scMultisim result object |
... |
|
none
The underlying methods True2ObservedCounts and True2ObservedATAC
results <- sim_example(ncells = 10) add_expr_noise(results)
results <- sim_example(ncells = 10) add_expr_noise(results)
Add outliers to the observed counts
add_outliers( res, prob = 0.01, factor = 2, sd = 0.5, cell.num = 1, max.var = Inf )
add_outliers( res, prob = 0.01, factor = 2, sd = 0.5, cell.num = 1, max.var = Inf )
res |
The scMultisim result object |
prob |
The probability of adding outliers for each gene |
factor |
The factor of the outliers |
sd |
The standard deviation of the outliers |
cell.num |
For a gene, the number of cells chosen to add outliers |
max.var |
The maximum variance allowed |
none
See the return value if you want to specify the cell-type level ground truth.
cci_cell_type_params( tree, total.lr, ctype.lr = 4:6, step.size = 1, rand = TRUE, discrete = FALSE )
cci_cell_type_params( tree, total.lr, ctype.lr = 4:6, step.size = 1, rand = TRUE, discrete = FALSE )
tree |
Use the same value for |
total.lr |
Total number of LR pairs in the database. Use the same value for |
ctype.lr |
If |
step.size |
Use the same value for |
rand |
Whether fill the matrix randomly |
discrete |
Whether the cell population is discrete. Use the same value for |
A 3D matrix of (n_cell_type, n_cell_type, n_lr). The value at (i, j, k) is 1 if there exist CCI of LR-pair k between cell type i and cell type j.
cci_cell_type_params(Phyla3(), 100, 4:6, 0.5, TRUE, FALSE)
cci_cell_type_params(Phyla3(), 100, 4:6, 0.5, TRUE, FALSE)
this is the density function of log(x+1), where x is the non-zero values for ATAC-SEQ data
data(dens_nonzero)
data(dens_nonzero)
a vector.
a vector.
data(dens_nonzero)
data(dens_nonzero)
Divide batches for observed counts
divide_batches(results, nbatch = 2, effect = 3, randseed = 0)
divide_batches(results, nbatch = 2, effect = 3, randseed = 0)
results |
The scMultisim result object, after running |
nbatch |
Number of batches |
effect |
Batch effect size, default is 3 |
randseed |
Random seed |
none
results <- sim_example(ncells = 10) add_expr_noise(results) divide_batches(results)
results <- sim_example(ncells = 10) add_expr_noise(results) divide_batches(results)
generate a clutter of cells by growing from the center
gen_clutter( n_cell, grid_size = NA, center = c(0, 0), existing_loc = NULL, existing_grid = NULL )
gen_clutter( n_cell, grid_size = NA, center = c(0, 0), existing_loc = NULL, existing_grid = NULL )
n_cell |
the number of cells |
grid_size |
the width and height of the grid |
center |
the center of the grid |
existing_loc |
only place cells on the specified existing locations |
existing_grid |
manually specify what locations are in the grid |
a matrix of locations
gen_clutter(10, 10, c(5, 5))
gen_clutter(10, 10, c(5, 5))
Plot the ligand-receptor correlation summary
gene_corr_cci( results = .getResultsFromGlobal(), all.genes = FALSE, .pair = NULL, .exclude.same.types = TRUE )
gene_corr_cci( results = .getResultsFromGlobal(), all.genes = FALSE, .pair = NULL, .exclude.same.types = TRUE )
results |
The scMultisim result object |
all.genes |
Whether to use all genes or only the ligand/receptor genes |
.pair |
Return the raw data for the given LR pair |
.exclude.same.types |
Whether to exclude neighbor cells with same cell type |
none
results <- sim_example_spatial(ncells = 10) gene_corr_cci(results)
results <- sim_example_spatial(ncells = 10) gene_corr_cci(results)
Print the correlations between targets of each regulator
gene_corr_regulator(results = .getResultsFromGlobal(), regulator)
gene_corr_regulator(results = .getResultsFromGlobal(), regulator)
results |
The scMultisim result object |
regulator |
The regulator ID in the GRN params |
none
results <- sim_example(ncells = 10) gene_corr_regulator(results, 2)
results <- sim_example(ncells = 10) gene_corr_regulator(results, 2)
This function gets the average correlation rna seq counts and region effect on genes for genes which are only associated with 1 chromatin region
Get_1region_ATAC_correlation(counts, atacseq_data, region2gene)
Get_1region_ATAC_correlation(counts, atacseq_data, region2gene)
counts |
rna seq counts |
atacseq_data |
atac seq data |
region2gene |
a 0 1 coupling matrix between regions and genes of shape (nregions) x (num_genes), where a value of 1 indicates the gene is affected by a particular region |
the correlation value
results <- sim_example(ncells = 10) Get_1region_ATAC_correlation(results$counts, results$atacseq_data, results$region_to_gene)
results <- sim_example(ncells = 10) Get_1region_ATAC_correlation(results$counts, results$atacseq_data, results$region_to_gene)
This function gets the average correlation rna seq counts and chromatin region effect on genes
Get_ATAC_correlation(counts, atacseq_data, num_genes)
Get_ATAC_correlation(counts, atacseq_data, num_genes)
counts |
rna seq counts |
atacseq_data |
atac seq data |
num_genes |
number of genes |
the correlation value
results <- sim_example(ncells = 10) Get_ATAC_correlation(results$counts, results$atacseq_data, results$num_genes)
results <- sim_example(ncells = 10) Get_ATAC_correlation(results$counts, results$atacseq_data, results$num_genes)
100_gene_GRN is a matrix of GRN params consisting of 100 genes where: # - column 1 is the target gene ID, # - column 2 is the gene ID which acts as a transcription factor for the target (regulated) gene # - column 3 is the effect of the column 2 gene ID on the column 1 gene ID
data(GRN_params_100)
data(GRN_params_100)
a data frame.
a data frame with three columns: target gene ID, TF gene ID, and the effect of TF on target gene.
data(GRN_params_100)
data(GRN_params_100)
GRN_params_1139 is a matrix of GRN params consisting of 1139 genes where: # - column 1 is the target gene ID, # - column 2 is the gene ID which acts as a transcription factor for the target (regulated) gene # - column 3 is the effect of the column 2 gene ID on the column 1 gene ID
data(GRN_params_1139)
data(GRN_params_1139)
a data frame.
a data frame with three columns: target gene ID, TF gene ID, and the effect of TF on target gene.
data(GRN_params_1139)
data(GRN_params_1139)
Creating a linear example tree
Phyla1(len = 1)
Phyla1(len = 1)
len |
length of the tree |
a R phylo object
Phyla1(len = 1)
Phyla1(len = 1)
Creating an example tree with 3 tips
Phyla3(plotting = FALSE)
Phyla3(plotting = FALSE)
plotting |
True for plotting the tree on console, False for no plot |
a R phylo object
Phyla3()
Phyla3()
Creating an example tree with 5 tips
Phyla5(plotting = FALSE)
Phyla5(plotting = FALSE)
plotting |
True for plotting the tree on console, False for no plot |
a R phylo object
Phyla5()
Phyla5()
Plot cell locations
plot_cell_loc( results = .getResultsFromGlobal(), size = 4, show.label = FALSE, show.arrows = TRUE, lr.pair = 1, .cell.pop = NULL, .locs = NULL )
plot_cell_loc( results = .getResultsFromGlobal(), size = 4, show.label = FALSE, show.arrows = TRUE, lr.pair = 1, .cell.pop = NULL, .locs = NULL )
results |
The scMultisim result object |
size |
Fig size |
show.label |
Show cell numbers |
show.arrows |
Show arrows representing cell-cell interactions |
lr.pair |
The ligand-receptor pair used to plot CCI arrows
|
.cell.pop |
Specify the cell population metadata |
.locs |
Manually specify the cell locations as a 2x |
none
results <- sim_example_spatial(ncells = 10) plot_cell_loc(results)
results <- sim_example_spatial(ncells = 10) plot_cell_loc(results)
Plot the gene module correlation heatmap
plot_gene_module_cor_heatmap( results = .getResultsFromGlobal(), seed = 0, grn.genes.only = TRUE, save = FALSE )
plot_gene_module_cor_heatmap( results = .getResultsFromGlobal(), seed = 0, grn.genes.only = TRUE, save = FALSE )
results |
The scMultisim result object |
seed |
The random seed |
grn.genes.only |
Plot the GRN gens only |
save |
save the plot as pdf |
none
results <- sim_example(ncells = 10) plot_gene_module_cor_heatmap(results)
results <- sim_example(ncells = 10) plot_gene_module_cor_heatmap(results)
In normal cases, please use plotCellLoc
instead.
plot_grid(results = .getResultsFromGlobal())
plot_grid(results = .getResultsFromGlobal())
results |
The scMultisim result object |
none
results <- sim_example_spatial(ncells = 10) plot_grid(results)
results <- sim_example_spatial(ncells = 10) plot_grid(results)
Plot the GRN network
plot_grn(params)
plot_grn(params)
params |
The GRN params data frame |
none
data(GRN_params_100, envir = environment()) plot_grn(GRN_params_100)
data(GRN_params_100, envir = environment()) plot_grn(GRN_params_100)
Plot a R phylogenic tree
plot_phyla(tree)
plot_phyla(tree)
tree |
The tree |
none
plot_phyla(Phyla5())
plot_phyla(Phyla5())
Plot RNA velocity as arrows on tSNE plot
plot_rna_velocity( results = .getResultsFromGlobal(), velocity = results$velocity, perplexity = 70, arrow.length = 1, save = FALSE, randseed = 0, ... )
plot_rna_velocity( results = .getResultsFromGlobal(), velocity = results$velocity, perplexity = 70, arrow.length = 1, save = FALSE, randseed = 0, ... )
results |
The scMultiSim result object |
velocity |
The velocity matrix, by default using the velocity matrix in the result object |
perplexity |
The perplexity for tSNE |
arrow.length |
The length scaler of the arrow |
save |
Whether to save the plot |
randseed |
The random seed |
... |
Other parameters passed to ggplot |
The plot
results <- sim_example(ncells = 10, velocity = TRUE) plot_rna_velocity(results, perplexity = 3)
results <- sim_example(ncells = 10, velocity = TRUE) plot_rna_velocity(results, perplexity = 3)
Plot t-SNE visualization of a data matrix
plot_tsne( data, labels, perplexity = 60, legend = "", plot.name = "", save = FALSE, rand.seed = 0, continuous = FALSE, labels2 = NULL, lim = NULL, runPCA = FALSE, alpha = 1 )
plot_tsne( data, labels, perplexity = 60, legend = "", plot.name = "", save = FALSE, rand.seed = 0, continuous = FALSE, labels2 = NULL, lim = NULL, runPCA = FALSE, alpha = 1 )
data |
The |
labels |
A vector of length |
perplexity |
Perplexity value used for t-SNE |
legend |
A list of colors for the labels |
plot.name |
The plot title |
save |
If |
rand.seed |
The random seed |
continuous |
Whether |
labels2 |
Additional label |
lim |
Specify the xlim and y lim c(x_min, x_max, y_min, y_max) |
runPCA |
Whether to run PCA before t-SNE |
alpha |
The alpha value for the points |
the figure if not save
, otherwise save the figure as plot.name
.pdf
results <- sim_example(ncells = 10) plot_tsne(log2(results$counts + 1), results$cell_meta$pop, perplexity = 3)
results <- sim_example(ncells = 10) plot_tsne(log2(results$counts + 1), results$cell_meta$pop, perplexity = 3)
Launch the Shiny App to configure the simulation
run_shiny()
run_shiny()
Show detailed documentations of scMultiSim's parameters
scmultisim_help(topic = NULL)
scmultisim_help(topic = NULL)
topic |
Can be |
none
scmultisim_help()
scmultisim_help()
Simulate a small example dataset with 200 cells and the 100-gene GRN
sim_example(ncells = 10, velocity = FALSE)
sim_example(ncells = 10, velocity = FALSE)
ncells |
number of cells, please increase this number on your machine |
velocity |
whether to simulate RNA velocity |
the simulation result
sim_example(ncells = 10)
sim_example(ncells = 10)
Simulate a small example dataset with 200 cells and the 100-gene GRN, with CCI enabled
sim_example_spatial(ncells = 10)
sim_example_spatial(ncells = 10)
ncells |
number of cells, please increase this number on your machine |
the simulation result
sim_example_spatial(ncells = 10)
sim_example_spatial(ncells = 10)
Simulate true scRNA and scATAC counts from the parameters
sim_true_counts(options, return_summarized_exp = FALSE)
sim_true_counts(options, return_summarized_exp = FALSE)
options |
See scMultiSim_help(). |
return_summarized_exp |
Whether to return a SummarizedExperiment object. |
scMultiSim returns an environment with the following fields:
counts
: Gene-by-cell scRNA-seq counts.
atac_counts
: Region-by-cell scATAC-seq counts.
region_to_gene
: Region-by-gene 0-1 marix indicating the corresponding relationship between chtomatin regions and genes.
atacseq_data
: The "clean" scATAC-seq counts without added intrinsic noise.
cell_meta
: A dataframe containing cell type labels and pseudotime information.
cif
: The CIF used during the simulation.
giv
: The GIV used during the simulation.
kinetic_params
: The kinetic parameters used during the simulation.
.grn
: The GRN used during the simulation.
.grn$regulators
: The list of TFs used by all gene-by-TF matrices.
.grn$geff
: Gene-by-TF matrix representing the GRN used during the simulation.
.n
: Other metadata, e.g. .n$cells
is the number of cells.
If do.velocity
is enabled, it has these additional fields:
unspliced_counts
: Gene-by-cell unspliced RNA counts.
velocity
: Gene-by-cell RNA velocity ground truth.
cell_time
: The pseudotime at which the cell counts were generated.
If dynamic GRN is enabled, it has these additional fields:
cell_specific_grn
: A list of length n_cells
. Each element is a gene-by-TF matrix, indicating the cell's GRN.
If cell-cell interaction is enabled, it has these additional fields:
grid
: The grid object used during the simulation.
grid$get_neighbours(i)
: Get the neighbour cells of cell i
.
cci_locs
: A dataframe containing the X and Y coordinates of each cell.
cci_cell_type_param
: A dataframe containing the CCI network ground truth: all ligand-receptor pairs between each pair of cell types.
cci_cell_types
: For continuous cell population, the sub-divided cell types along the trajectory used when simulating CCI.
If it is a debug session (debug = TRUE
), a sim
field is available,
which is an environment contains all internal states and data structures.
data(GRN_params_100, envir = environment()) sim_true_counts(list( rand.seed = 0, GRN = GRN_params_100, num.cells = 100, num.cifs = 50, tree = Phyla5() ))
data(GRN_params_100, envir = environment()) sim_true_counts(list( rand.seed = 0, GRN = GRN_params_100, num.cells = 100, num.cifs = 50, tree = Phyla5() ))
The class for spatial grids
a spatialGrid object
method
the method to generate the cell layout
grid_size
the width and height of the grid
ncells
the number of cells
grid
the grid matrix
locs
a list containing the locations of all cells
loc_order
deprecated, don't use; the order of the locations
cell_types
a map to save the cell type of each allocated cell
same_type_prob
the probability of a new cell placed next to a cell with the same type
max_nbs
the maximum number of neighbors for each cell
nb_map
a list containing the neighbors for each cell
nb_adj
adjacency matrix for neighbors
nb_radius
the radius of neighbors
final_types
the final cell types after the final time step
pre_allocated_pos
the pre-allocated positions for each cell, if any
method_param
additional parameters for the layout method
Simulate observed ATAC-seq matrix given technical noise and the true counts
True2ObservedATAC( atacseq_data, randseed, observation_prob = 0.3, sd_frac = 0.1 )
True2ObservedATAC( atacseq_data, randseed, observation_prob = 0.3, sd_frac = 0.1 )
atacseq_data |
true ATAC-seq data |
randseed |
(should produce same result if nregions, nevf and randseed are all the same) |
observation_prob |
for each integer count of a particular region for a particular cell, the probability the count will be observed |
sd_frac |
the fraction of ATAC-seq data value used as the standard deviation of added normally distrubted noise |
a matrix of observed ATAC-seq data
results <- sim_example(ncells = 10) True2ObservedATAC(results$atac_counts, randseed = 1)
results <- sim_example(ncells = 10) True2ObservedATAC(results$atac_counts, randseed = 1)
Simulate observed count matrix given technical biases and the true counts
True2ObservedCounts( true_counts, meta_cell, protocol, randseed, alpha_mean = 0.1, alpha_sd = 0.002, alpha_gene_mean = 1, alpha_gene_sd = 0, gene_len, depth_mean, depth_sd, lenslope = 0.02, nbins = 20, amp_bias_limit = c(-0.2, 0.2), rate_2PCR = 0.8, nPCR1 = 16, nPCR2 = 10, LinearAmp = FALSE, LinearAmp_coef = 2000 )
True2ObservedCounts( true_counts, meta_cell, protocol, randseed, alpha_mean = 0.1, alpha_sd = 0.002, alpha_gene_mean = 1, alpha_gene_sd = 0, gene_len, depth_mean, depth_sd, lenslope = 0.02, nbins = 20, amp_bias_limit = c(-0.2, 0.2), rate_2PCR = 0.8, nPCR1 = 16, nPCR2 = 10, LinearAmp = FALSE, LinearAmp_coef = 2000 )
true_counts |
gene cell matrix |
meta_cell |
the meta information related to cells, will be combined with technical cell level information and returned |
protocol |
a string, can be "nonUMI" or "UMI" |
randseed |
(should produce same result if nregions, nevf and randseed are all the same) |
alpha_mean |
the mean of rate of subsampling of transcripts during capture step, default at 10 percent efficiency |
alpha_sd |
the std of rate of subsampling of transcripts |
alpha_gene_mean |
the per-gene scale factor of the alpha parameter, default at 1 |
alpha_gene_sd |
the standard deviation of the per-gene scale factor of the alpha parameter, default at 0 |
gene_len |
a vector with lengths of all genes |
depth_mean |
mean of sequencing depth |
depth_sd |
std of sequencing depth |
lenslope |
amount of length bias |
nbins |
number of bins for gene length |
amp_bias_limit |
range of amplification bias for each gene, a vector of length ngenes |
rate_2PCR |
PCR efficiency, usually very high, default is 0.8 |
nPCR1 |
the number of PCR cycles in "pre-amplification" step, default is 16 |
nPCR2 |
the number of PCR cycles used after fragmentation. |
LinearAmp |
if linear amplification is used for pre-amplification step, default is FALSE |
LinearAmp_coef |
the coeficient of linear amplification, that is, how many times each molecule is amplified by |
if UMI, a list with two elements, the first is the observed count matrix, the second is the metadata; if nonUMI, a matrix
results <- sim_example(ncells = 10) data(gene_len_pool) gene_len <- sample(gene_len_pool, results$num_genes, replace = FALSE) True2ObservedCounts( results$counts, results$cell_meta, protocol = "nonUMI", randseed = 1, alpha_mean = 0.1, alpha_sd = 0.05, gene_len = gene_len, depth_mean = 1e5, depth_sd = 3e3 )
results <- sim_example(ncells = 10) data(gene_len_pool) gene_len <- sample(gene_len_pool, results$num_genes, replace = FALSE) True2ObservedCounts( results$counts, results$cell_meta, protocol = "nonUMI", randseed = 1, alpha_mean = 0.1, alpha_sd = 0.05, gene_len = gene_len, depth_mean = 1e5, depth_sd = 3e3 )