--- title: "SpaceMarkers with SpatialExperiment Objects" author: "Atul Deshpande" date: "`r Sys.Date()`" output: BiocStyle::html_document: toc: yes toc_float: toc_collapsed: true vignette: > %\VignetteIndexEntry{SpaceMarkers with SpatialExperiment Objects} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r global.options, include = FALSE} knitr::opts_knit$set( collapse = TRUE, comment = "#>", fig.align = 'center' ) knitr::opts_chunk$set( out.extra = 'style="display:block; margin:auto;"', warning = FALSE, message = FALSE, # Compact, HTML-friendly rendering. JPEG (rather than PNG) is the # right choice for the H&E histology overlays — photographic raster # compresses ~5x better as JPEG at quality 80, keeping the rendered # vignette under Bioconductor's 5 MB per-file size guideline. dev = "jpeg", dev.args = list(quality = 80), dpi = 72, fig.width = 5, fig.height = 4, fig.retina = 1 ) ``` # Overview This vignette demonstrates how to use the `SpaceMarkersExperiment` class, a `SpatialExperiment` subclass that serves as a unified container for all SpaceMarkers inputs and outputs. Instead of tracking separate matrices, data frames, and lists across pipeline steps, a single `SpaceMarkersExperiment` object holds everything: expression data, spatial coordinates, pattern features, optimal parameters, hotspots, interacting genes, and interaction scores. We will reproduce the same breast cancer analysis from the main vignette using this integrated workflow. # Installation ```{r eval = FALSE} if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("SpaceMarkers") ``` ```{r message = FALSE, warning = FALSE} library(SpaceMarkers) ``` # Setup ## Links to Data ```{r} data_dir <- "visiumBrCA" unlink(file.path(data_dir), recursive = TRUE) dir.create(data_dir, showWarnings = FALSE) main_10xlink <- "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0" counts_folder <- "Visium_Human_Breast_Cancer" counts_file <- "Visium_Human_Breast_Cancer_filtered_feature_bc_matrix.h5" counts_url <- paste(c(main_10xlink, counts_folder, counts_file), collapse = "/") sp_folder <- "Visium_Human_Breast_Cancer" sp_file <- "Visium_Human_Breast_Cancer_spatial.tar.gz" sp_url <- paste(c(main_10xlink, sp_folder, sp_file), collapse = "/") ``` ## Downloading Data Downloads are routed through `BiocFileCache` so the ~260 MB dataset is fetched once and reused across vignette builds and across the other SpaceMarkers vignettes. ```{r} bfc <- BiocFileCache::BiocFileCache(ask = FALSE) counts_cache <- BiocFileCache::bfcrpath(bfc, counts_url) sp_cache <- BiocFileCache::bfcrpath(bfc, sp_url) file.copy(counts_cache, file.path(data_dir, counts_file), overwrite = TRUE) untar(sp_cache, exdir = data_dir) ``` # Loading Data into a SpaceMarkersExperiment ## Using `load10X` `load10X()` searches the Visium directory for expression and spatial files using flexible pattern matching — it works regardless of whether your files are in an `outs/`, `data/`, or flat directory layout. It log-transforms the counts and returns a `SpaceMarkersExperiment` directly: ```{r message=FALSE, warning=FALSE} cogaps_result <- readRDS(system.file("extdata", "CoGAPS_result.rds", package = "SpaceMarkers", mustWork = TRUE)) sme <- load10X(data_dir, features = cogaps_result, method = "CoGAPS") sme ``` Alternatively, you can load first and add features separately with `add_features()`: ```{r eval=FALSE} # Step 1: Load without features sme <- load10X(data_dir) # Step 2: Add features later sme <- add_features(sme, cogaps_result, method = "CoGAPS") ``` If the features cover fewer spots than the expression data (or vice versa), `add_features()` will keep only the shared spots and emit a warning. The object inherits from `SpatialExperiment` so all standard accessors work: ```{r} # Expression matrix dim(assay(sme, "logcounts")) # Spatial coordinates head(spatialCoords(sme)) # Spatial patterns stored in colData head(spatial_patterns(sme)) ``` ## Filtering For this demonstration we will use two patterns and a curated gene list: ```{r} data("curated_genes") good_gene_threshold <- 3 keep_genes <- rownames(sme)[ apply(assay(sme, "logcounts"), 1, function(x) sum(x > 0) >= good_gene_threshold) ] keep_genes <- intersect(keep_genes, curated_genes) sme <- sme[keep_genes, ] # Keep all patterns for now; we will subset per analysis below sme ``` # Running SpaceMarkers with the SME Object ## Undirected Analysis The `SpaceMarkers()` function now accepts a `SpaceMarkersExperiment` as its first argument. All results are stored back in the returned object. We keep all five CoGAPS patterns on the SME (so the overlap heatmap shows every pairwise relationship) and restrict the gene-level interaction tests to Pattern\_1 × Pattern\_2 — our narrative focus — via `pattern_pairs`, which `SpaceMarkers()` forwards to `get_pairwise_interacting_genes()`. ```{r message=FALSE, warning=FALSE} sme_undirected <- SpaceMarkers( x = sme, directed = FALSE, min.gene.expr = 3, pattern_pairs = rbind(c("Pattern_1", "Pattern_2")) ) ``` ### Inspecting Results Summary scores are accessed via dedicated accessors: ```{r} # Analysis type analysis_type(sme_undirected) # IM scores (summary: genes x pattern pairs) head(undirected_scores(sme_undirected)) ``` Detailed intermediate results are stored in `metadata()` and accessed via convenience functions: ```{r} # Undirected hotspots hs <- hotspots(sme_undirected, type = "undirected") head(hs) ``` ```{r} # Per-pair interaction details pair_names <- names(interactions(sme_undirected)) pair_names # Detailed stats for the first pair (if any interactions were found) if (length(pair_names) > 0) { pair1 <- interactions(sme_undirected, pair = pair_names[1]) if (length(pair1$interacting_genes) > 0) { head(pair1$interacting_genes[[1]]) } else { message("No interacting genes found for this pair.") } } ``` ## Directed Analysis The directed analysis keeps every pattern on the SME so the directed overlap heatmap covers all source → target pairs, and restricts gene-level scoring to Pattern\_1 → Pattern\_5 (immune → cancer) via `pattern_pairs`, which `SpaceMarkers()` forwards to `calculate_gene_scores_directed()`. ```{r message=FALSE, warning=FALSE} sme_directed <- SpaceMarkers( x = sme, directed = TRUE, min.gene.expr = 3, pattern_pairs = rbind(c("Pattern_1", "Pattern_5")) ) ``` ```{r} analysis_type(sme_directed) head(directed_scores(sme_directed)) ``` Directed analysis also stores hotspot and influence intermediates: ```{r} # Pattern hotspots head(hotspots(sme_directed, type = "pattern")) # Influence hotspots head(hotspots(sme_directed, type = "influence")) # Influence map head(influence_map(sme_directed)) ``` # Visualization Since the SME holds all results, every visualization function accepts the object directly and reads from the stored slots. ## Patterns on the tissue `plot_spatial()` automatically extracts coordinates from the object, finds the tissue image stored in the object's `visiumDir`, and falls back to a blank background if none is available. The `source` argument selects where `feature_col` is resolved: `colData` (patterns, continuous features), `assay` (gene expression), `hotspots` (hotspot labels), `influence_map` (per-pattern influence values), or `interaction` (a three-way categorical overlay). ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_spatial(sme, feature_col = "Pattern_1", title = "Pattern_1 (immune)") ``` ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_spatial(sme, feature_col = "Pattern_5", title = "Pattern_5 (cancer)") ``` ## Undirected hotspots Pass `source = "hotspots"` so `plot_spatial()` reads hotspot labels from `metadata(sme)$hotspots$undirected` rather than the continuous pattern column in `colData` (which shares the same name). `colors` maps the "hot" level to a single fill. ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_spatial(sme_undirected, feature_col = "Pattern_1", source = "hotspots", hotspot_type = "undirected", colors = "steelblue", title = "Pattern_1 hotspots") ``` …and Pattern\_2, from the same `find_all_hotspots()` pass: ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_spatial(sme_undirected, feature_col = "Pattern_2", source = "hotspots", hotspot_type = "undirected", colors = "firebrick", title = "Pattern_2 hotspots") ``` ## Undirected overlap scores With hotspots in hand we can already answer the broader spatial question — which pattern pairs actually share territory. `plot_overlap_scores()` reads `overlap_scores(sme)` directly and auto-detects the undirected shape (symmetric `pattern1 × pattern2` heatmap) across every pair: ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_overlap_scores(sme_undirected, title = "Undirected overlap") ``` ## Undirected interaction overlay For a chosen pair, a three-way overlay classifies each hotspot spot as Pattern\_1 only, interacting (hot in both), or Pattern\_2 only. This is the spatial region the per-gene tests below operate on. ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_spatial(sme_undirected, source = "interaction", hotspot_type = "undirected", interaction_patterns = c("Pattern_1", "Pattern_2"), colors = c(Pattern_1 = "steelblue", interacting = "purple", Pattern_2 = "firebrick")) ``` ## IM scores bar plot and top genes The column names of `undirected_scores(sme)` are the pattern-pair keys — use the first pair key robustly instead of hard-coding: ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} pair_col <- setdiff(colnames(undirected_scores(sme_undirected)), "Gene")[1] plot_im_scores(sme_undirected, interaction = pair_col, nGenes = 10) ``` `plot_spatial()` accepts gene names directly — it falls through to the expression assay via `source = "auto"` (the default): ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} ims <- undirected_scores(sme_undirected) top_genes <- head(ims[order(ims[[pair_col]], decreasing = TRUE), "Gene"], 5) for (g in top_genes) { print(plot_spatial(sme_undirected, feature_col = g, title = g)) } ``` ## Directed hotspots The directed workflow splits hotspots into two flavors: pattern-value GMM hotspots (`hotspot_type = "pattern"`) and influence-map GMM hotspots (`hotspot_type = "influence"`). Here we focus on the pattern-level hotspots that define the source/target regions of each directed pair. ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_spatial(sme_directed, feature_col = "Pattern_1", source = "hotspots", hotspot_type = "pattern", colors = "steelblue", title = "Pattern_1 GMM hotspots") ``` …and Pattern\_5, the target region the immune pattern radiates into: ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_spatial(sme_directed, feature_col = "Pattern_5", source = "hotspots", hotspot_type = "pattern", colors = "firebrick", title = "Pattern_5 GMM hotspots") ``` ## Directed overlap scores `plot_overlap_scores()` detects the directed shape (`pattern × influence` with a relative-abundance fill) automatically, again across every source → target pair: ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_overlap_scores(sme_directed, title = "Directed overlap (relative abundance)") ``` ## Directed interaction overlays Because a directed analysis runs two independent t-tests per pair — one per source → target direction — the per-spot classification is best rendered one direction at a time. The forward view shows Pattern\_1's hotspot split by whether each spot is also in Pattern\_5's influence zone (the test's interaction group): ```{r} labs_fwd <- overlap_map(sme_directed, c("Pattern_1", "Pattern_5"), direction = "forward") table(labs_fwd, useNA = "ifany") ``` ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_spatial(sme_directed, source = "interaction", interaction_patterns = c("Pattern_1", "Pattern_5"), direction = "forward", colors = c(Pattern_1 = "steelblue", "Pattern_1 near Pattern_5" = "purple", "Pattern_5 influence" = "gray60")) ``` And the reverse (Pattern\_5 → Pattern\_1): ```{r} labs_rev <- overlap_map(sme_directed, c("Pattern_1", "Pattern_5"), direction = "reverse") table(labs_rev, useNA = "ifany") ``` ```{r, message=FALSE, warning=FALSE, dpi=60, fig.width=6} plot_spatial(sme_directed, source = "interaction", interaction_patterns = c("Pattern_1", "Pattern_5"), direction = "reverse", colors = c(Pattern_5 = "firebrick", "Pattern_5 near Pattern_1" = "goldenrod", "Pattern_1 influence" = "gray60")) ``` # Constructing a SpaceMarkersExperiment Manually If you already have expression and coordinate data in R, you can construct a `SpaceMarkersExperiment` directly: ```{r} mat <- matrix(rnorm(100), nrow = 10, ncol = 10) rownames(mat) <- paste0("gene_", 1:10) colnames(mat) <- paste0("spot_", 1:10) coords <- matrix(runif(20) * 100, ncol = 2) colnames(coords) <- c("y", "x") rownames(coords) <- colnames(mat) cd <- S4Vectors::DataFrame( CellType_A = runif(10), CellType_B = runif(10), row.names = colnames(mat) ) my_sme <- SpaceMarkersExperiment( assays = list(logcounts = mat), colData = cd, spatialCoords = coords, spaceMarkers = list( params = list(pattern_names = c("CellType_A", "CellType_B")) ) ) my_sme ``` ## From an Existing SpatialExperiment You can also extract features from any `SpatialExperiment` object using `get_spatial_features()`: ```{r} spe <- SpatialExperiment::SpatialExperiment( assays = list(logcounts = mat), colData = cd, spatialCoords = coords ) # Extract numeric colData columns as spatial features features <- get_spatial_features(spe) head(features) ``` # Backward Compatibility The legacy file-path interface still works. Set `returnSME = FALSE` to get the old-style output: ```{r eval = FALSE} # Legacy usage (returns data.frame, not SME) result <- SpaceMarkers( features = "path/to/features.rds", data = "path/to/visium_dir", directed = FALSE, returnSME = FALSE ) ``` # Summary | Feature | Legacy Workflow | SME Workflow | |---|---|---| | Input | File paths | `SpaceMarkersExperiment` or file paths | | Output | Scattered data.frames/lists | Single `SpaceMarkersExperiment` | | Expression | Separate matrix | `assay(sme, "logcounts")` | | Coordinates | Separate data.frame | `spatialCoords(sme)` | | Patterns | Merged in spPatterns | `spatial_patterns(sme)` | | Kernel Parameters | Separate matrix | `spatial_params(sme)` | | All Hyperparameters | Scattered arguments | `params(sme)` | | Undirected Scores | Separate data.frame | `undirected_scores(sme)` | | Directed Scores | Separate data.frame | `directed_scores(sme)` | | LR Scores | Separate matrix | `lr_scores(sme)` | | Overlap Scores | Separate data.frame | `overlap_scores(sme)` | | Hotspots | Separate data.frame | `hotspots(sme, type = ...)` | | Interactions | Nested list | `interactions(sme, pair = ...)` | | Influence | Separate data.frame | `influence_map(sme)` | | Spatial Plotting | Manual coord extraction + `plot_spatial_data_over_image` | `plot_spatial(sme, feature_col = ...)` | # Cleanup ```{r} unlink(file.path(data_dir), recursive = TRUE) ``` # Session Info ```{r} sessionInfo() ```