--- title: "3. Simulating Spatial Cell-Cell Interactions" output: BiocStyle::html_document: toc: true toc_depth: 2 vignette: > %\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{3. Simulating Spatial Cell-Cell Interactions} %\usepackage[UTF-8]{inputenc} --- ```{r "setup", include=FALSE} require("knitr") opts_chunk$set(fig.width=4, fig.height=3) # devtools::load_all(".") ``` ## Simulating Spatial Cell-Cell Interactions scMultiSim can simulate spatial cell-cell interactions. To do so, we need to provide the `cci` option as a list. The following code will print more instructions on how to use the `cci` option. ```{r help-cci} library(scMultiSim) scmultisim_help("cci") ``` Now, we prepare a ligand-receptor interaction database. This is pretty similar to the GRN network: it is a data frame with three columns, specifying `target`, `regulator`, and `effect`, respectively. The target and regulator columns should contain the IDs of the target and regulator genes. In the following example, we have two ligand-receptor pairs interacting between two neighboring cells. ```{r cci-network} lig_params <- data.frame( target = c(101, 102), regulator = c(103, 104), effect = c(5.2, 5.9) ) ``` We can now simulate the spatial cell-cell interactions. In scMultiSim, the CCI network is cell-type based, which means that between each cell type pair, we can have a different CCI network sampled from the database defined above. Here, we set the `step.size` to 0.5, so the differentiation tree is divided into segments of length 0.5, each segment is treated as a cell type in CCI. We set `cell.type.interaction` to `random`, so the CCI network between each cell type pair is randomly sampled from the database. Here, we use only 100 cells to speed up the simulation. Feel free to try a larger number of cells when running this vignette locally. ```{r} data(GRN_params_100) set.seed(42) options_ <- list( GRN = GRN_params_100, speed.up = TRUE, num.genes = 120, num.cells = 80, num.cifs = 20, cif.sigma = 0.2, tree = Phyla3(), intrinsic.noise = 0.5, cci = list( params = lig_params, max.neighbors = 4, grid.size = 13, cell.type.interaction = "random", step.size = 0.5 ) ) results <- sim_true_counts(options_) ``` The `results$cell_meta` will contain the cell type information used in CCI. We can plot the cell spatial locations using `plot_cell_loc()`. The arrows indicate cell-cell interactions between two cells (for the first ligand-receptor pair). ```{r plot-cell-loc, fig.width=6, fig.height=6} plot_cell_loc(results) ``` The cell locations are available in `results$cci_locs`. ```{r print-cell-loc} head(results$cci_locs) ``` ### Speeding up the Simulation Simulating spatial cell-cell interactions can be computationally expensive. Setting these two options can speed up the simulation: ``` options_ <- list( # ... speed.up = T, cci = list( # ... start.layer = ncells ) ) ``` First of all, it is recommended to set the experimental `speed.up = T` option. This option will become default in later versions of scMultiSim. Next, it is possible to set the CCI option `start.layer = n_cells`, where `n_cells` is the number of cells. scMultiSim simulates a spatial dataset by following `n_cells` steps, adding one more cell to the spatial grid in each step. Only the final step is outputted as the result. The CCI option `start.layer` can be used to start simulation from a specific time step. When set to `n_cells`, the simulation will skip all previous steps by adding all cells at once. By default, `start.layer` will be set to `n_cells` when number of cells is greater than 800. ## Spatial layouts scMultiSim provides powerful customization options for spatial cell layouts. ### Built-in layouts scMultiSim ships with several built-in spatial layouts. The `enhanced` layout is the default layout, where cells are added to the grid one by one. When adding a new cell, it has a higher probability of being placed near the existing cells of the same cell type. ```{r layout-enhanced, fig.width=6, fig.height=6} # helper function to add `layout` to options, to make the code more readable spatial_options <- function (...) { cci_opt <- list( params = lig_params, max.neighbors = 4, start.layer = 300, grid.size = 28, cell.type.interaction = "random" ) list( rand.seed = 0, GRN = GRN_params_100, speed.up = TRUE, num.genes = 200, num.cells = 300, num.cifs = 50, tree = Phyla3(), cci = c(cci_opt, list(...)) ) } results <- sim_true_counts(spatial_options( layout = "enhanced" )) plot_cell_loc(results, show.arrows = FALSE) ``` An option `same.type.prob` decides the probability of a new cell being placed near the existing cells of the same cell type. By default, it is 0.8; and if we use a lower value, the new cell will be placed more randomly. ```{r layout-random, fig.width=6, fig.height=6} results <- sim_true_counts(spatial_options( layout = "enhanced", same.type.prob = 0.1 )) plot_cell_loc(results, show.arrows = FALSE) ``` The `layers` layout arranges cells in layers. ```{r layout-layers, fig.width=6, fig.height=6} results <- sim_true_counts(spatial_options( layout = "layers" )) plot_cell_loc(results, show.arrows = FALSE) ``` The `islands` layout will put some cell types in the center like islands, and others around them. You may specify which cell type should be islands in the format `islands:1,2,3`. The number here can be looked up in `results$cci_cell_types`. ```{r} results$cci_cell_types ``` ```{r layout-islands, fig.width=6, fig.height=6} results <- sim_true_counts(spatial_options( # cell type 4_1_2 should be the island layout = "islands:5" )) plot_cell_loc(results, show.arrows = FALSE) ``` ### Custom layouts It is also possible to layout the cells programmatically. The `layout` option can be a function that takes the cell type information and returns the spatial locations of the cells: ``` # grid_size is a number # cell_types is an integer vector, representing the cell types function(grids_size, cell_types) { # return a matrix with two columns, representing the x and y coordinates of the cells return matrix(nrow = 2, ncol = ncells) } ``` For example, the following layout function will place the cells sequentially in the grid, starting from the bottom-left corner. ```{r layout-custom, fig.width=6, fig.height=6} results <- sim_true_counts(spatial_options( layout = function (grid_size, cell_types) { ncells <- length(cell_types) new_locs <- matrix(nrow = ncells, ncol = 2) # for each cell... for (i in 1:ncells) { # ...place it in the grid new_locs[i,] <- c(i %% grid_size, i %/% grid_size) } return(new_locs) } )) plot_cell_loc(results, show.arrows = FALSE) ``` ## Spatial domains Next, we demonstrate how to use custom layout function to create spatial domains. We want to have three spatial domains in a layered layout, and we have four cell types. Each cell type has a different probability of being in each domain. The following layout function will do this job: First of all, it generates a set of locations that form a circular shape. Next, it assigns cells to these locations; the leftmost cell is selected as the origin. Then, we can create a layered layout by sorting the locations based on their euclidian distance to the origin. The three domains are determined by the distance to the origin. We have a matrix `ct_matrix` that specifies the probability of each cell type being in each domain. Finally, we sample the cells based on the probabilities and assign them to the domains. ```{r layout-domains} layout_fn <- function(grid_size, final_types) { ncells <- length(final_types) grid_center <- c(round(grid_size / 2), round(grid_size / 2)) all_locs <- gen_clutter(ncells, grid_size, grid_center) # center is bottom-left left_ones <- which(all_locs[,1] == min(all_locs[,1])) new_center <<- all_locs[left_ones[which.min(all_locs[left_ones, 2])],] dist_to_center <- sqrt(colSums((t(all_locs) - new_center)^2)) new_locs <- all_locs[order(dist_to_center),] # prob of a cell type being in a zone (cell_type x zone) ct_matrix <- matrix(c( 0.9, 0.1, 0.0, 0.1, 0.8, 0.1, 0.1, 0.7, 0.2, 0.0, 0.1, 0.9 ), nrow = 4, byrow = TRUE) # number of cells per type ct_pop <- c(160, 80, 100, 140) pop_mtx <- round(ct_matrix * ct_pop) if (sum(pop_mtx) != ncells) { diffrence <- ncells - sum(pop_mtx) pop_mtx[1, 1] <- pop_mtx[1, 1] + diffrence } # number of cells per zone zone_pop <- colSums(pop_mtx) # assign cells to zones cs <- cumsum(zone_pop) # sample cells cell_idx <- unlist(lapply(1:3, function(izone) { sample(rep(1:4, pop_mtx[,izone]), zone_pop[izone]) })) locs <<- new_locs[order(cell_idx),] zone_gt <<- rep(1:3, zone_pop)[order(cell_idx)] return(locs) } ``` Inspecting the result, we can see the three spatial domains, where the middle one contains a mix of two cell types. ```{r layout-domains-plot, fig.width=6, fig.height=6} results <- sim_true_counts(list( num.cells = 500, num.genes = 300, num.cifs = 40, GRN = NA, speed.up = T, cif.sigma = 0.8, tree = ape::read.tree(text = "(A:1,B:1,C:1,D:1);"), diff.cif.fraction = 0.8, discrete.cif = T, discrete.pop.size = as.integer(c(120,150,100,130)), cci = list( params = lig_params, max.neighbors = 4, start.layer = 500, cell.type.interaction = "random", layout = layout_fn, step.size = 1 ) )) plot_cell_loc(results, show.arrows = FALSE) ``` ## Spatially variable genes The `ext.cif.giv` option allows us to append custom CIF and GIV entries for each cell and gene. We can use this option to simulate spatially variable genes. This option should be a function that takes the kinetic parameter index and returns a list of extra CIF and GIV matrices. ```{r} scmultisim_help("ext.cif.giv") ``` Using the previous layout function, we can add extra CIF with value based on the distance to the origin. ```{r} ext_cif <- function(i) { # We manually set genes 290-300 to be spatially variable spatial_genes <- 290:300 dist_to_center <- colSums((t(locs) - new_center)^2) dist_to_center <- dist_to_center / max(dist_to_center) # 3 is the s parameter if (i == 3) { # n_extra_cif x n_cells ex_cif <- cbind( # the two CIFs have large values when distance to the center is near 0.5 rnorm(500, 0.5 * dnorm(abs(dist_to_center - 0.5), 0, 0.04), 0.02), rnorm(500, 0.5 * dnorm(abs(dist_to_center - 0.5), 0, 0.04), 0.02) ) # n_genes x n_extra_cif ex_giv <- matrix(0, nrow = 300, ncol = 2) for (i in spatial_genes) { # odd genes affected by the first two CIF, even genes affected by the last two CIF ex_giv[i, ] <- rnorm(2, 1, 0.5) } list(ex_cif, ex_giv * 2) } else { NULL } } ``` ```{r} results <- sim_true_counts(list( num.cells = 500, num.genes = 300, num.cifs = 40, GRN = NA, speed.up = T, cif.sigma = 0.8, tree = ape::read.tree(text = "(A:1,B:1,C:1,D:1);"), diff.cif.fraction = 0.8, ext.cif.giv = ext_cif, discrete.cif = T, discrete.pop.size = as.integer(c(120,150,100,130)), cci = list( params = lig_params, max.neighbors = 4, start.layer = 500, cell.type.interaction = "random", layout = layout_fn, step.size = 1 ) )) ``` Try plotting one of the spatially variable genes. We can see that the gene expression is higher in the specific spatial region. ```{r spatially-variable-gene, fig.width=6, fig.height=6} library(ggplot2) plot_cell_loc(results, show.arrows = FALSE, .cell.pop = log(results$counts[299,] + 1)) + scale_colour_viridis_c() ``` ## Long-distance Cell-Cell Interactions scMultiSim also supports simulation of long-distance cell-cell interactions. The CCI option `radius` controls the maximum distance between two cells for them to interact. It can be a number or a string. When it is a number, it specifies the maximum distance. When it is a string it should be in the format `gaussian:sigma`, for example, `gaussian:1.2`. In this case, the probability of two cells interacting is proportional to the distance with a Gaussian kernel applied. By default, `radius = 1`, which means scMultiSim only consider the four nearest neighbors. We can compare the result with different sigma values 1 and 3: ```{r long-distance-cci} options <- lapply(c(1, 3), \(sigma) { list( rand.seed = 1, GRN = NA, num.genes = 200, num.cells = 500, num.cifs = 50, tree = Phyla5(), discrete.cif = T, discrete.min.pop.size = 20, discrete.pop.size = as.integer(c(110, 80, 140, 40, 130)), do.velocity = F, scale.s = 1, cci = list( params = lig_params, max.neighbors = 4, cell.type.interaction = "random", cell.type.lr.pairs = 3:6, step.size = 0.3, grid.size = 35, start.layer = 500, radius = paste0("gaussian:", sigma), layout = "layers" ) ) }) results_1 <- sim_true_counts(options[[1]]) results_3 <- sim_true_counts(options[[2]]) ``` ```{r plot-long-distance-cci, fig.width=6, fig.height=6} plot_cell_loc(results_1, show.arrows = T, .cell.pop = as.character(results$grid$final_types)) plot_cell_loc(results_3, show.arrows = T, .cell.pop = as.character(results$grid$final_types)) ``` ## Session Information ```{r session-info} sessionInfo() ```