---
title: "1. Getting Started"
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteEngine{knitr::knitr}
  %\VignetteIndexEntry{1. Getting Started}
  %\usepackage[UTF-8]{inputenc}
---

scMultiSim is a simulation tool for single-cell multi-omics data.
It can simulate RNA counts, ATAC-seq data, RNA velocity, and spatial locations of continuous or discrete cell populations.
It can model the effects of gene regulatory networks (GRN), chromatin accessibility, and cell-cell interactions on the simulated data.

This article introduces the basic workflow of `scMultiSim`.

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Installation

We recommend installing `scMultiSim` from Bioconductor:

```R
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scMultiSim")
```

You can also install the development version of `scMultiSim` from GitHub:

```R
devtools::install_github("ZhangLabGT/scMultiSim@main")
```

# Running Simulation

Once installed, you can load the package with:

```{r setup}
library(scMultiSim)
```

A typical workflow consists of two main steps:

1. Simulate the true counts;
2. Add technical noise and batch effects to the dataset.

The `sim_true_counts` function generates the true counts.
It accepts a list of options as input, and most of the simulated effects are controlled through these options.

```{r true-counts}
data(GRN_params_100)

results <- sim_true_counts(list(
  # required options
  GRN = GRN_params_100,
  tree = Phyla3(),
  # commonly adjusted options
  num.cells = 500,
  num.cif = 20,
  discrete.cif = FALSE,
  cif.sigma = 0.1
  # ... other options
))
```

scMultiSim requires users to provide the following options:

- `GRN`: The Gene Regulatory Network.
- `tree`: The cell differentiation tree.

Typically, you may also want to adjust the following options to control other important factors:

- `num.cells`: Specify the number of cells.
- `unregulated.gene.ratio` or `num.genes`: Control the total number of genes.
- `discrete.cif`: Whether to generate a discrete or continuous cell population.
- `diff.cif.fraction`: Control the contribution of the trajectory/cluster specified by the tree.
- `cif.sigma`: Control the variation of cells along the trajectory.

The [Simulating Multimodal Single-Cell Data](https://zhanglabgt.github.io/scMultiSim/articles/basics.html) tutorial covers these options in more detail, including how to simulate RNA velocity data and ATAC-seq data.
The [Simulating Spatial Cell-Cell Interactions](https://zhanglabgt.github.io/scMultiSim/articles/spatialCCI.html) tutorial focuses on simulating spatial cell locations and cell-cell interactions.

For a complete list of options, see the [Parameter Guide](https://zhanglabgt.github.io/scMultiSim/articles/options.html) or run the `scmultisim_help()` function.

## The Shiny app

scMultiSim also provides a Shiny app to help you explore the options interactively.
Simply run `run_shiny()` to start the app.

```{r run-shiny, eval = FALSE}
run_shiny()
```

## Add technical noise and batch effects

You can use `add_expr_noise` to add technical noise to the dataset, and `divide_batches` to add batch effects.

```{r technical-noise}
add_expr_noise(results)
divide_batches(results, nbatch = 2)
```

## Visualize the results

scMultiSim provides various visualization functions to help you understand the simulated data.
For example, `plot_tsne()` visualizes the cells using t-SNE.

```{r visualize}
plot_tsne(results$counts, results$cell_meta$pop)
```
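
Before plotting, it can help to check the simulated object numerically. The chunk below is a minimal sketch using only base R. `summarize_sim` is a hypothetical helper written for this example, not part of scMultiSim. It assumes that `results$counts` stores genes in rows and cells in columns, and that `results$cell_meta$pop` holds one population label per cell, as suggested by the `plot_tsne()` call above.

```{r inspect-results}
# Hypothetical helper (not provided by scMultiSim): summarise a count
# matrix together with its per-cell population labels.
summarize_sim <- function(counts, pop) {
  # Assumption: one column of `counts` per cell, one label in `pop` per cell.
  stopifnot(ncol(counts) == length(pop))
  list(
    n_genes       = nrow(counts),
    n_cells       = ncol(counts),
    cells_per_pop = table(pop),             # number of cells in each population
    mean_depth    = mean(colSums(counts))   # average total count per cell
  )
}

# `results` is created by the true-counts chunk above; the summary is
# skipped if this chunk is run on its own.
if (exists("results")) {
  str(summarize_sim(results$counts, results$cell_meta$pop))
}
```

If the reported number of cells does not match `num.cells`, the matrix orientation assumed here is probably wrong for your object, and the `stopifnot()` check will say so before any summary is printed.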