---
title: "Getting started with 'SpotSweeper'"
author:
  - name: Michael Totty
    affiliation: &id1 "Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA"
  - name: Boyi Guo
    affiliation: *id1
  - name: Stephanie Hicks
    affiliation: *id1
date: "`r Sys.Date()`"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Getting Started with `SpotSweeper`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```

## Introduction

`SpotSweeper` is an R package for spatial transcriptomics data quality control (QC). It provides functions for detecting and visualizing spot-level local outliers and artifacts using spatially-aware methods. The package is designed to work with [SpatialExperiment](https://github.com/drighelli/SpatialExperiment) objects, and is compatible with data from 10X Genomics Visium and other spatial transcriptomics platforms.

## Installation

Currently, the only way to install `SpotSweeper` is to install the development version from [GitHub](https://github.com/MicTott/SpotSweeper):

```{r 'install_dev', eval = FALSE}
# install the remotes package if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}
remotes::install_github("MicTott/SpotSweeper")
```

Once accepted in [Bioconductor](http://bioconductor.org/), `SpotSweeper` will be installable using:

```{r 'install', eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("SpotSweeper")
```

## Spot-level local outlier detection

### Loading example data

Here we'll walk through the standard workflow for using `SpotSweeper` to detect and visualize local outliers in spatial transcriptomics data. We'll use the `Visium_humanDLPFC` dataset from the `STexampleData` package, which is a `SpatialExperiment` object.
Because local outliers will be saved in the `colData` of the `SpatialExperiment` object, we'll first view the `colData` and drop out-of-tissue spots before calculating quality control (QC) metrics and running `SpotSweeper`.

```{r example_spe}
library(SpotSweeper)
# provides colData() and rowData() for SpatialExperiment objects
library(SpatialExperiment)

# load Maynard et al DLPFC dataset
spe <- STexampleData::Visium_humanDLPFC()

# show column data before SpotSweeper
colnames(colData(spe))

# drop out-of-tissue spots
spe <- spe[, spe$in_tissue == 1]
```

### Calculating QC metrics using `scuttle`

We'll use the `scuttle` package to calculate QC metrics. To do this, we first need to change the `rownames` from gene IDs to gene names. We'll then identify the mitochondrial transcripts and calculate QC metrics for each spot using `scuttle::addPerCellQCMetrics`.

```{r example_scuttle}
# change from gene id to gene names
rownames(spe) <- rowData(spe)$gene_name

# identifying the mitochondrial transcripts (gene names starting with "MT-")
is.mito <- rownames(spe)[grepl("^MT-", rownames(spe))]

# calculating QC metrics for each spot using scuttle
spe <- scuttle::addPerCellQCMetrics(spe, subsets = list(Mito = is.mito))
colnames(colData(spe))
```

### Identifying local outliers using `SpotSweeper`

We can now use `SpotSweeper` to identify local outliers in the spatial transcriptomics data. We'll use the `localOutliers` function to detect local outliers based on the number of detected genes, the total library size, and the percentage of total reads that are mitochondrial. These methods assume a normal distribution, so we'll use the log-transformed sum of the counts and the log-transformed number of detected genes. For the mitochondrial percentage, we'll use the raw (untransformed) value.
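
The idea of a local outlier can be illustrated with a minimal base-R sketch. This is not `SpotSweeper`'s implementation: the simulated grid of spots, the `local_z` helper, the neighborhood size `k = 8`, and the cutoff of 3 are all assumptions chosen for illustration. Each spot is compared with the median and median absolute deviation of its nearest neighbors rather than with the whole tissue.

```r
# Illustrative sketch only: NOT SpotSweeper's implementation.
set.seed(1)

# hypothetical 10 x 10 grid of spots with a log-scale QC metric
coords <- as.matrix(expand.grid(x = 1:10, y = 1:10))
metric <- rnorm(nrow(coords), mean = 8, sd = 0.2)
metric[45] <- 5 # plant one spot with an unusually low value

# local z-score: compare each spot with its k nearest neighbors
local_z <- function(coords, metric, k = 8) {
    d <- as.matrix(dist(coords))
    vapply(seq_len(nrow(coords)), function(i) {
        # position 1 of the ordering is the spot itself (distance 0)
        nbrs <- order(d[i, ])[2:(k + 1)]
        (metric[i] - median(metric[nbrs])) / mad(metric[nbrs])
    }, numeric(1))
}

z <- local_z(coords, metric)

# spots far below their neighborhood (assumed cutoff of 3)
low_outliers <- which(z < -3)
```

Flagging only `z < -3` corresponds to looking for unusually low values, which is the sense of `direction = "lower"` for library size and detected genes; a metric such as mitochondrial percentage would instead be checked for unusually high values. The `localOutliers` calls that follow apply the package's own method to each metric.
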

```{r example_local_outliers}
# library size
spe <- localOutliers(spe,
    metric = "sum",
    direction = "lower",
    log = TRUE
)

# unique genes
spe <- localOutliers(spe,
    metric = "detected",
    direction = "lower",
    log = TRUE
)

# mitochondrial percent
spe <- localOutliers(spe,
    metric = "subsets_Mito_percent",
    direction = "higher",
    log = FALSE
)
```

The `localOutliers` function automatically writes its results to the `colData` using the naming convention `X_outliers`, where `X` is the name of the input metric. We can then combine all outliers into a single column called `local_outliers` in the `colData` of the `SpatialExperiment` object.

```{r example_combine_local_outliers}
# combine all outliers into "local_outliers" column
spe$local_outliers <- as.logical(spe$sum_outliers) |
    as.logical(spe$detected_outliers) |
    as.logical(spe$subsets_Mito_percent_outliers)
```

### Visualizing local outliers

We can visualize the local outliers using the `plotQC` function. This function plots the spots colored by the specified metric and highlights the local outliers in red using the `escheR` package. Here, we'll visualize local outliers of library size, unique genes, mitochondrial percent, and finally, all local outliers. We'll then arrange these plots in a grid using `ggpubr::ggarrange`.

```{r local_outlier_plot}
library(escheR)
library(ggpubr)

# library size
p1 <- plotQC(spe,
    metric = "sum_log",
    outliers = "sum_outliers",
    point_size = 1.1
) +
    ggtitle("Library Size")

# unique genes
p2 <- plotQC(spe,
    metric = "detected_log",
    outliers = "detected_outliers",
    point_size = 1.1
) +
    ggtitle("Unique Genes")

# mitochondrial percent
p3 <- plotQC(spe,
    metric = "subsets_Mito_percent",
    outliers = "subsets_Mito_percent_outliers",
    point_size = 1.1
) +
    ggtitle("Mitochondrial Percent")

# all local outliers
p4 <- plotQC(spe,
    metric = "sum_log",
    outliers = "local_outliers",
    point_size = 1.1,
    stroke = 0.75
) +
    ggtitle("All Local Outliers")

# plot
plot_list <- list(p1, p2, p3, p4)
ggarrange(
    plotlist = plot_list,
    ncol = 2, nrow = 2,
    common.legend = FALSE
)
```

## Removing technical artifacts using `SpotSweeper`

### Loading example data

```{r example_artifactRemoval}
# load in DLPFC sample with hangnail artifact
data(DLPFC_artifact)
spe <- DLPFC_artifact

# inspect colData before artifact detection
colnames(colData(spe))
```

### Visualizing technical artifacts

Technical artifacts are often visible in standard QC metrics, including library size, unique genes, or mitochondrial percentage. We can first visualize the technical artifacts using the `plotQC` function, which plots the Visium spots colored by the specified QC metric. We'll then again arrange these plots using `ggpubr::ggarrange`.

```{r artifact_QC_plots}
# library size
p1 <- plotQC(spe,
    metric = "sum_umi",
    outliers = NULL, point_size = 1.1
) +
    ggtitle("Library Size")

# unique genes
p2 <- plotQC(spe,
    metric = "sum_gene",
    outliers = NULL, point_size = 1.1
) +
    ggtitle("Unique Genes")

# mitochondrial percent
p3 <- plotQC(spe,
    metric = "expr_chrM_ratio",
    outliers = NULL, point_size = 1.1
) +
    ggtitle("Mitochondrial Percent")

# plot
plot_list <- list(p1, p2, p3)
ggarrange(
    plotlist = plot_list,
    ncol = 3, nrow = 1,
    common.legend = FALSE
)
```

### Identifying artifacts using `SpotSweeper`

We can then use the `findArtifacts` function to identify artifacts in the spatial transcriptomics data. This function identifies technical artifacts based on the first principal component of the local variance of the specified QC metric (`mito_percent`) at several neighborhood sizes (`n_rings = 5`). Currently, `kmeans` clustering is used to separate technical-artifact spots from high-quality Visium spots. Similar to `localOutliers`, the `findArtifacts` function then writes its results to the `colData`.

```{r artifact_plot}
# find artifacts using SpotSweeper
spe <- findArtifacts(spe,
    mito_percent = "expr_chrM_ratio",
    mito_sum = "expr_chrM",
    n_rings = 5,
    name = "artifact"
)

# check that "artifact" is now in colData
colnames(colData(spe))
```

### Visualizing artifacts

We can visualize the detected artifacts using the `plotQC` function, which relies on the `escheR` package. Here, we'll plot the mitochondrial ratio and highlight the spots labeled as `artifact`.

```{r artifact_visualization}
plotQC(spe,
    metric = "expr_chrM_ratio",
    outliers = "artifact", point_size = 1.1
) +
    ggtitle("Hangnail artifact")
```

# Session information

```{r}
utils::sessionInfo()
```