--- title: "A quick start guide to the hoodscanR package" author: "Ning Liu, Melissa Davis" date: "`r Sys.Date()`" output: BiocStyle::html_document: toc: true number_sections: true theme: cosmo highlight: tango code_folding: show vignette: > %\VignetteIndexEntry{hoodscanR_introduction} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{=html} ``` ```{r message=FALSE, warning=FALSE, include=FALSE} knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, dpi = 80) suppressWarnings(library(ggplot2)) ``` # Introduction hoodscanR is an user-friendly R package providing functions to assist cellular neighborhood analysis of any spatial transcriptomics data with single-cell resolution. All functions in the package are built based on the `r BiocStyle::Biocpkg("SpatialExperiment")` infrastructure, allowing integration into various spatial transcriptomics-related packages from Bioconductor. The package can result in cell-level neighborhood annotation output, along with funtions to perform neighborhood colocalization analysis and neighborhood-based cell clustering. # Installation ```{r eval=FALSE} if (!require("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install("hoodscanR") ``` The development version of `hoodscanR` can be installed from GitHub: ```{r eval=FALSE} devtools::install_github("DavisLaboratory/hoodscanR") ``` # Quick start ```{r} library(hoodscanR) library(SpatialExperiment) library(scico) ``` # Data exploration The `readHoodData` function can format the spatialExperiment input object as desired for all other functions in the `hoodscanR` package. ```{r} data("spe_test") spe <- readHoodData(spe, anno_col = "celltypes") ``` ```{r} spe ``` ```{r} colData(spe) ``` We can have a look at the tissue and cell positions by using the function `plotTissue`. The test data is relatively sparse with low-level cell type annotations. ```{r, fig.width=7, fig.height=4} col.pal <- c("red3", "royalblue", "gold", "cyan2", "purple3", "darkgreen") plotTissue(spe, color = cell_annotation, size = 1.5, alpha = 0.8) + scale_color_manual(values = col.pal) ``` # Neighborhoods scanning In order to perform neighborhood scanning, we need to firstly identify k (in this example, k = 100) nearest cells for each cells. The searching algorithm is based on Approximate Near Neighbor (ANN) C++ library from the RANN package. ```{r} fnc <- findNearCells(spe, k = 100) ``` The output of `findNearCells` function includes two matrix, an annotation matrix and a distance matrix. ```{r} lapply(fnc, function(x) x[1:10, 1:5]) ``` We can then perform neighborhood analysis using the function `scanHoods`. This function incldue the modified softmax algorithm, aimming to genereate a matrix with the probability of each cell associating with their 100 nearest cells. ```{r} pm <- scanHoods(fnc$distance) ``` The resulting ```{r} pm[1:10, 1:5] ``` We can then merge the probabilities by the cell types of the 100 nearest cells. ```{r} hoods <- mergeByGroup(pm, fnc$cells) ``` Now we have the final probability distribution of each cell all each neighborhood. ```{r} hoods[1:10, ] ``` # Neighborhoods analysis We plot randomly plot 10 cells to see the output of neighborhood scanning using `plotHoodMat`. In this plot, each value represent the probability of the each cell (each row) located in each cell type neighborhood. The rowSums of the probability maxtrix will always be 1. ```{r, fig.width=5, fig.height=4} plotHoodMat(hoods, n = 10, hm_height = 5) ``` Or to check the cells-of-interest with the parameter `targetCells` within the function ```{r, fig.width=5, fig.height=4} plotHoodMat(hoods, targetCells = c("Lung9_Rep1_5_1975", "Lung9_Rep1_5_2712"), hm_height = 3) ``` We can then merge the neighborhood results with the SpatialExperiment object using `mergeHoodSpe` so that we can conduct more neighborhood-related analysis. ```{r} spe <- mergeHoodSpe(spe, hoods) ``` To summarise our neighborhood results, we can use `calcMetrics` to calculate entropy and perplexity of the probability matrix so that we can have a summarisation of the neighborhood distribution across the tissue slide, i.e. where neighborhood is more distinct and where is more mixed. ```{r} spe <- calcMetrics(spe, pm_cols = colnames(hoods)) ``` We then again use `plotTissue` to plot out the entropy or perplexity. While both entropy and perplexity measure the mixture of neighborhood of each cell, perplexity can be more intuitive for the human mind as the value of it actually mean something. For example, perplexity of 1 means the cell is located in a very distinct neighborhood, perplexity of 2 means the cell is located in a mixed neighborhood, and the probability is about 50% to 50%. ```{r, fig.width=7, fig.height=4} plotTissue(spe, size = 1.5, color = entropy) + scale_color_scico(palette = "tokyo") ``` ```{r, fig.width=7, fig.height=4} plotTissue(spe, size = 1.5, color = perplexity) + scale_color_scico(palette = "tokyo") ``` We can perform neighborhood colocalization analysis using `plotColocal`. This function compute pearson correlation on the probability distribution of each cell. Here we can see in the test data, endothelial cells and stromal cells are more likely to colocalize, epithelial cells and dividing cells are more likely to colocalize. ```{r, fig.width=6, fig.height=4} plotColocal(spe, pm_cols = colnames(hoods)) ``` We can cluster the cells by their neighborhood probability distribution using `clustByHood`, it is based on the k-means algorithm and here we set k to 10. ```{r} spe <- clustByHood(spe, pm_cols = colnames(hoods), k = 10) ``` We can see what are the neighborhood distributions look like in each cluster using `plotProbDist`. ```{r, fig.width=6, fig.height=5} plotProbDist(spe, pm_cols = colnames(hoods), by_cluster = TRUE, plot_all = TRUE, show_clusters = as.character(seq(10))) ``` We can plot the clusters on the tissue slide, agian using `plotTissue`. ```{r, fig.width=7, fig.height=4} plotTissue(spe, color = clusters) ``` # Session info ```{r} sessionInfo() ```