---
title: "Introduction to visiumStitched"
author: 
  - name: Nicholas J. Eagles
    affiliation:
    - &libd Lieber Institute for Brain Development
    email: nickeagles77@gmail.com
  - name: Leonardo Collado-Torres
    affiliation:
    - *libd
    - &ccb Center for Computational Biology, Johns Hopkins University
    - &jhubiostat Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
    email: lcolladotor@gmail.com
output: 
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r doc_date()`"
package: "`r pkg_ver('visiumStitched')`"
vignette: >
  %\VignetteIndexEntry{Introduction to visiumStitched}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}  
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
```


```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE}
## Bib setup
library("RefManageR")

## Write bibliography information
bib <- c(
    R = citation(),
    BiocFileCache = citation("BiocFileCache")[1],
    BiocStyle = citation("BiocStyle")[1],
    dplyr = citation("dplyr")[1],
    DropletUtils = citation("DropletUtils")[1],
    ggplot2 = citation("ggplot2")[1],
    imager = citation("imager")[1],
    knitr = citation("knitr")[1],
    pkgcond = citation("pkgcond")[1],
    readr = citation("readr")[1],
    RefManageR = citation("RefManageR")[1],
    rjson = citation("rjson")[1],
    rmarkdown = citation("rmarkdown")[1],
    S4Vectors = citation("S4Vectors")[1],
    sessioninfo = citation("sessioninfo")[1],
    Seurat = citation("Seurat")[1],
    SpatialExperiment = citation("SpatialExperiment")[1],
    spatialLIBD = citation("spatialLIBD")[1],
    stringr = citation("stringr")[1],
    SummarizedExperiment = citation("SummarizedExperiment")[1],
    testthat = citation("testthat")[1],
    visiumStitched = citation("visiumStitched")[1],
    xml2 = citation("xml2")[1]
)
```

# Basics

## Install `visiumStitched`

`r Biocpkg("visiumStitched")` is a Bioconductor `R` package that can be
installed with the following commands in your `R` session:

```{r "install", eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("visiumStitched")
```

## Citing `visiumStitched`

We hope that `r Biocpkg("visiumStitched")` will be useful for your research.
Please use the following information to cite the package and the overall
approach. Thank you!

```{r "citation"}
## Citation info
citation("visiumStitched")
```

## Packages used in this vignette

Let's load the packages we'll use in this vignette.

```{r "start", message=FALSE, warning=FALSE}
library("SpatialExperiment")
library("visiumStitched")
library("dplyr")
library("spatialLIBD")
library("BiocFileCache")
library("ggplot2")
```

# Preparing Experiment Information

Much of the `visiumStitched` package uses a `tibble` (or `data.frame`) defining
information about the experiment. Most fundamentally, the `group` column allows
you to line up which capture areas, in the `capture_area` column, are to be
stitched together later. In our case, we have just one unique group, consisting
of all three capture areas. Note multiple groups are supported. By the end of
this demo, the `SpatialExperiment` will consist of just one sample composed of
the three capture areas; in general, there will be one sample per group.

```{r "sample_info"}
## Create initial sample_info
sample_info <- data.frame(
    group = "Br2719",
    capture_area = c("V13B23-283_A1", "V13B23-283_C1", "V13B23-283_D1")
)

## Initial sample_info
sample_info
```

Next, we'll need the Spaceranger outputs for each capture area, which can be
retrieved with `spatialLIBD::fetch_data()`.

```{r "spaceranger_dir"}
## Download example SpaceRanger output files
sr_dir <- tempdir()
temp <- unzip(spatialLIBD::fetch_data("visiumStitched_brain_spaceranger"),
    exdir = sr_dir
)
sample_info$spaceranger_dir <- file.path(
    sr_dir, sample_info$capture_area, "outs", "spatial"
)

## Sample_info with paths to SpaceRanger output directories
sample_info
```

# Preparing Inputs to Fiji

The `visiumStitched` workflow makes use of
[Fiji](https://imagej.net/software/fiji/), a distribution of the `ImageJ`
image-processing software, which includes an interface for aligning images on a
shared coordinate system. Before aligning anything in Fiji, we need to ensure
that images to align from all capture areas are on the same scale-- that is, a
pixel in each image represents the same distance. This is typically
approximately true, but is not guaranteed to be exactly true, especially when
the capture areas to align come from different Visium slides.
`rescale_fiji_inputs()` reads in the
[high-resolution tissue images](https://www.10xgenomics.com/support/software/space-ranger/latest/analysis/outputs/spatial-outputs#tissue-png)
for each capture area, and uses info about their spot diameters in pixels and
scale factors to rescale the images appropriately (even if they are from
different Visium slides).

For demonstration purposes, we'll set `out_dir` to a temporary location.
Typically, it would really be any suitable directory to place the rescaled
images for later input to Fiji.

```{r "rescale_inputs"}
#   Generate rescaled approximately high-resolution images
sample_info <- rescale_fiji_inputs(sample_info, out_dir = tempdir())

## Sample_info with output directories
sample_info
```

# Building a `SpatialExperiment`

## Stitching Images with Fiji

Before building a `SpatialExperiment` for a stitched dataset, we must align the
images for each group in Fiji. Check out
[this video](https://www.youtube.com/watch?v=kFLtpK3qbSY) for a guide through
this process with the example data.

<iframe width="560" height="315" src="https://www.youtube.com/embed/kFLtpK3qbSY?si=CF-rv7Nd5yc7W6nC" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

## Creating Group-Level Samples

From the Fiji alignment, two output files will be produced: an `XML` file
specifying rigid affine transformations for each capture area, and the stitched
approximately high-resolution image. These files for this dataset are
available through `spatialLIBD::fetch_data()`. We'll need to add the paths to
the XML and PNG files to the `fiji_xml_path` and `fiji_image_path` columns of
`sample_info`, respectively.

```{r "fiji_out"}
fiji_dir <- tempdir()
temp <- unzip(fetch_data("visiumStitched_brain_Fiji_out"), exdir = fiji_dir)
sample_info$fiji_xml_path <- temp[grep("xml$", temp)]
sample_info$fiji_image_path <- temp[grep("png$", temp)]
```

We now have every column present in `sample_info` that will be necessary for any
`visiumStitched` function.

```{r "print_info"}
## Complete sample_info
sample_info
```

Before building the `SpatialExperiment`, the idea is to create a directory
structure very similar to
[Spaceranger's spatial outputs](https://www.10xgenomics.com/support/software/space-ranger/latest/analysis/outputs/spatial-outputs)
for each *group*, as opposed to the *capture-area*-level directories we already
have. We'll place this directory in a temporary location that will later be read
in to produce the final `SpatialExperiment`.

First, `prep_fiji_coords()` will apply the rigid affine transformations
specified by Fiji's output XML file to the spatial coordinates, ultimately
producing a group-level `tissue_positions.csv` file. Next, `prep_fiji_image()`
will rescale the stitched image to have a default of 1,200 pixels in the
longest dimension. The idea is that in an experiment with multiple groups, the
images stored in the `SpatialExperiment` for any group will be similarly scaled
and occupy similar memory footprints.

```{r "prep_fiji"}
## Prepare the Fiji coordinates and images.
## These functions return the file paths to the newly-created files that follow
## the standard directory structure from SpaceRanger (10x Genomics)
spe_input_dir <- tempdir()
prep_fiji_coords(sample_info, out_dir = spe_input_dir)
prep_fiji_image(sample_info, out_dir = spe_input_dir)
```

## Constructing the Object

We now have all the pieces to create the `SpatialExperiment` object. After
constructing the base object, information related to how spots may overlap
between capture areas in each `group` is added. The `sum_umi` metric will by
default determine which spots in overlapping regions to exclude in plots. In
particular, at regions of overlap, spots from capture areas with higher average
UMI (unique molecular identifier) counts will be plotted, while any other spots
will not be shown using `spatialLIBD::vis_clus()`, `spatialLIBD::vis_gene()`,
and related visualization functions. We'll also mirror the image and
gene-expression data to match the orientation specified at the wet bench. More
info about performing geometric transformations is
[here](#geometric-transformations).

```{r "gtf"}
## Download the Gencode v32 GTF file which is the closest one to the one
## that was used with SpaceRanger. Note that SpaceRanger GTFs are available at
## https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz
## but is too large for us to download here since it includes many other files
## we don't need right now.
## However, ideally you would adapt this code and use the actual GTF file you
## used when running SpaceRanger.
bfc <- BiocFileCache::BiocFileCache()
gtf_cache <- bfcrpath(
    bfc,
    paste0(
        "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/",
        "release_32/gencode.v32.annotation.gtf.gz"
    )
)
```

```{r "build_SpatialExperiment"}
## Now we can build the SpatialExperiment object. We'll later explore error
## metrics related to computing new array coordinates, and thus specify
## 'calc_error_metrics = TRUE'.
spe <- build_SpatialExperiment(
    sample_info,
    coords_dir = spe_input_dir, reference_gtf = gtf_cache,
    calc_error_metrics = TRUE
)

## The images in this example data have to be mirrored across the horizontal axis.
spe <- SpatialExperiment::mirrorObject(spe, axis = "h")

## Explore stitched spe object
spe
```

The `colData(spe)$exclude_overlapping` column controls
which spots to drop for visualization purposes. Note also that the `overlap_key`
column was added, which gives a comma-separated string of spot keys overlapping
each given spot, or the empty string otherwise. After spatial clustering, the
`overlap_key` information can be useful to check how frequently overlapping
spots are assigned the same cluster.

```{r "exclude_overlapping"}
## Examine spots to exclude for plotting
table(spe$exclude_overlapping)
```

# Examining the stitched data

## Stitched plotting

To demonstrate that we've stitched both the gene expression and image data
successfully, we'll use `spatialLIBD::vis_gene(is_stitched = TRUE)` (version
1.17.8 or newer) to plot the distribution of white matter spatially. For more
context on human brain white matter spatial marker genes, check
[our previous work on this subject](https://doi.org/10.1038/s41593-020-00787-0).

```{r "explore_coords", fig.height = 4}
## Show combined raw expression of white-matter marker genes
wm_genes <- rownames(spe)[
    match(c("MBP", "GFAP", "PLP1", "AQP4"), rowData(spe)$gene_name)
]
vis_gene(spe, geneid = wm_genes, assayname = "counts", is_stitched = TRUE)
```

Note that we're plotting raw counts; prior to normalization, library-size
variation across spots can bias the apparent distribution. Later, we'll show 
that normalization is critical to producing a visually seamless transition
between overlapping capture areas.

## Defining Array Coordinates

Given that the stitched data is larger than a default Visium capture area,
`add_array_coords()` (which is used internally by `build_SpatialExperiment()`) recomputed the
array coordinates (i.e. `spe$array_row` and `spe$array_col`) to more sensibly
index the stitched data. 

Let's explain this in more detail. By definition, these array coordinates (see 
[documentation from 10X](https://www.10xgenomics.com/support/software/space-ranger/latest/analysis/outputs/spatial-outputs#tissue-positions))
are integer indices of each spot on a Visium capture area, numbering the
typically 78 and 128 rows and columns, respectively, for a 6.5mm capture area.
The `build_SpatialExperiment()` function retains each capture area's original array
coordinates, `spe$array_row_original` and `spe$array_col_original`, but these
are typically not useful to represent our group-level, stitched data. In fact,
each stitched capture area has the same exact array coordinates, despite having
different spatial positions after stitching. We'll take in-tissue spots only and
use transparency to emphasize the overlap among capture areas:

```{r "array_plot_orig"}
## Plot positions of default array coordinates, before overwriting with more
## meaningful values. Use custom colors for each capture area
ca_colors <- c("#A33B20", "#e7bb41", "#3d3b8e")
names(ca_colors) <- c("V13B23-283_C1", "V13B23-283_D1", "V13B23-283_A1")

colData(spe) |>
    as_tibble() |>
    filter(in_tissue) |>
    ggplot(
        mapping = aes(
            x = array_row_original, y = array_col_original, color = capture_area
        )
    ) +
    geom_point(alpha = 0.3) +
    scale_color_manual(values = ca_colors)
```

Let's contrast this with the array coordinates recomputed by `visiumStitched`.
Briefly, `visiumStitched` forms a new hexagonal, Visium-like grid spanning the
space occupied by all capture areas after stitching. Then, the true spot
positions are fit to the nearest new spot positions, in terms of Euclidean
distance. Finally, array coordinates are re-indexed according to the new spot
assignments, resulting in spatially meaningful values that apply at the group
level for stitched data. 

```{r "array_plot_after"}
## Plot positions of redefined array coordinates
colData(spe) |>
    as_tibble() |>
    filter(in_tissue) |>
    ggplot(
        mapping = aes(
            x = array_row, y = array_col, color = capture_area
        )
    ) +
    geom_point(alpha = 0.3) +
    scale_color_manual(values = ca_colors)
```

An important downstream application of these array coordinates, is that it
enables methods that rely on the hexagonal grid structure of Visium to find more
than the original six neighboring spots. This enables clustering with
[`BayesSpace`](https://doi.org/10.1038/s41587-021-00935-2) or
[`PRECAST`](https://doi.org/10.1038/s41467-023-35947-w), to treat each group as
a spatially continuous sample. We can see here how
[`BayesSpace:::.find_neighbors()`](https://github.com/edward130603/BayesSpace/blob/8e9af8f2fa8e93518cf9ecee1ded9ab93e88fffd/R/spatialCluster.R#L214-L220)
version 1.11.0 uses the hexagonal Visium grid properties to find the spot
neighbors. See also
[`BayesSpace` Figure 1b](https://www.nature.com/articles/s41587-021-00935-2/figures/1)
for an illustration of this process.

<iframe frameborder="0" scrolling="no" style="width:100%; height:226px;" allow="clipboard-write" src="https://emgithub.com/iframe.html?target=https%3A%2F%2Fgithub.com%2Fedward130603%2FBayesSpace%2Fblob%2F8e9af8f2fa8e93518cf9ecee1ded9ab93e88fffd%2FR%2FspatialCluster.R%23L214-L220&style=default&type=code&showBorder=on&showLineNumbers=on&showFileMeta=on&showFullPath=on&showCopy=on"></iframe>

Yet, it doesn't matter if there are actually two or more spots on each of those
six neighbor positions. `visiumStitched` takes advantage of this property to
enable `BayesSpace` and other spatially-aware clustering methods to use data
from overlapping spots when performing spatial clustering. You can then use
`colData(spe)$overlap_key` to inspect whether overlapping spots were assigned to
the same spatial cluster.

### Error metrics

No algorithm can fit a set of capture areas' spots onto a single hexagonal grid
without some error. Here, we define a spot's error in being assigned new array
coordinates with two independent metrics, which are stored in
`spe$euclidean_error` and `spe$shared_neighbors` if the user opts to compute
them with `build_SpatialExperiment(calc_error_metrics = TRUE)`. The latter metric can take a
couple minutes to compute, and thus by default the metrics are not computed.

The first metric is the Euclidean distance, in multiples of 100 microns (the
distance between spots on a Visium capture area), between a spot's original
position and the position of its assigned array coordinates.

```{r "euclidean_error"}
#   Explore the distribution of Euclidean error
colData(spe) |>
    as_tibble() |>
    ggplot(mapping = aes(x = 0, y = euclidean_error)) +
    geom_boxplot()
```

The other metric, `spe$shared_neighbors`, measures the fraction of original
neighbors (from a same capture area) that are retained after mapping to the new
array coordinates. Thus, a value of 1 is ideal.

```{r "shared_neighbors"}
#   Explore the distribution of Euclidean error
colData(spe) |>
    as_tibble() |>
    ggplot(mapping = aes(x = 0, y = shared_neighbors)) +
    geom_boxplot()
```

In theory, error as measured through these metrics could have a very slight
impact on the quality of clustering results downstream. We envision interested
users in checking these metrics when interpreting specific spots' cluster
assignments downstream.

# Downstream applications

One common area of analysis in spatial transcriptomics involves clustering--
in particular, spatially-aware clustering. Many spatially-aware clustering
algorithms check the array coordinates to determine neighboring spots and
ultimately produce spatially smooth clusters. As we have previously explained, 
`visiumStitched` [re-computes array coordinates](#array-coordinates) in a
meaningful way, such that software like
[`BayesSpace`](https://doi.org/10.1038/s41587-021-00935-2) and
[`PRECAST`](https://doi.org/10.1038/s41467-023-35947-w) work out-of-the-box with
stitched data, treating each group as a single continuous sample.

[We've already run PRECAST](https://github.com/LieberInstitute/visiumStitched_brain/blob/devel/code/03_stitching/06_precast.R),
and can visualize the results here, where we see a fairly seamless transition of
cluster assignments across capture-area boundaries. First, let's examine
`k = 2`:

```{r "precast_k2_stitched", fig.height = 4}
## Grab SpatialExperiment with normalized counts
spe_norm <- fetch_data(type = "visiumStitched_brain_spe")
assayNames(spe_norm)

## PRECAST k = 2 clusters with our manually chosen colors
vis_clus(
    spe_norm,
    clustervar = "precast_k2_stitched",
    is_stitched = TRUE,
    colors = c(
        "1" = "gold",
        "2" = "darkblue",
        "NA" = "white"
    ),
    spatial = FALSE
)
```

We can see that these two spatial clusters are differentiating the white vs the
gray matter based on the white matter marker genes we
[previously visualized](https://research.libd.org/visiumStitched/articles/visiumStitched.html#a-note-on-normalization).

In the example data, `k = 4` and `k = 8` have also been computed. Let's
visualize the `k = 4` results.

```{r "precast_k4_stitched", fig.height = 4}
## PRECAST results already available in this example data
vars <- colnames(colData(spe_norm))
vars[grep("precast", vars)]

## PRECAST k = 4 clusters with default cluster colors
vis_clus(
    spe_norm,
    clustervar = "precast_k4_stitched",
    is_stitched = TRUE,
    spatial = FALSE
)
```

The biological interpretation of these spatial clusters would need further work,
using methods such as:

* [spatial registration](https://research.libd.org/spatialLIBD/articles/guide_to_spatial_registration.html) of reference sc/snRNA-seq or spatial data,
* visualization of known marker genes for the tissue of interest,
* or identification of data driven marker genes using [`spatialLIBD::registration_wrapper()`](https://research.libd.org/spatialLIBD/reference/registration_wrapper.html), [`DeconvoBuddies::findMarkers_1vAll()`](https://research.libd.org/DeconvoBuddies/reference/findMarkers_1vAll.html), [`DeconvoBuiddies::get_mean_ratio()`](https://research.libd.org/DeconvoBuddies/reference/get_mean_ratio.html) or other tools. See [Pullin and McCarthy, _Genome Biol._, 2024](https://doi.org/10.1186/s13059-024-03183-0) for a list of marker gene selection methods.

# Conclusion

`visiumStitched` provides a set of helper functions, in conjunction with
`ImageJ`/`Fiji`, intended to simplify the stitching of Visium data into a
spatially integrated `SpatialExperiment` object ready for analysis. We hope you
find it useful for your research!

# Reproducibility

The `r Biocpkg("visiumStitched")` package `r Citep(bib[["visiumStitched"]])` was
made possible thanks to:

* R `r Citep(bib[["R"]])`
* `r Biocpkg("BiocFileCache")` `r Citep(bib[["BiocFileCache"]])`
* `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])`
* `r CRANpkg("dplyr")` `r Citep(bib[["dplyr"]])`
* `r Biocpkg("DropletUtils")` `r Citep(bib[["DropletUtils"]])`
* `r CRANpkg("ggplot2")` `r Citep(bib[["ggplot2"]])`
* `r CRANpkg("imager")` `r Citep(bib[["imager"]])`
* `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])`
* `r CRANpkg("pkgcond")` `r Citep(bib[["pkgcond"]])`
* `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`
* `r CRANpkg("rjson")` `r Citep(bib[["rjson"]])`
* `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])`
* `r Biocpkg("S4Vectors")` `r Citep(bib[["S4Vectors"]])`
* `r CRANpkg("sessioninfo")` `r Citep(bib[["sessioninfo"]])`
* `r CRANpkg("Seurat")` `r Citep(bib[["Seurat"]])`
* `r Biocpkg("SpatialExperiment")` `r Citep(bib[["SpatialExperiment"]])`
* `r Biocpkg("spatialLIBD")` `r Citep(bib[["spatialLIBD"]])`
* `r CRANpkg("stringr")` `r Citep(bib[["stringr"]])`
* `r Biocpkg("SummarizedExperiment")` `r Citep(bib[["SummarizedExperiment"]])`
* `r CRANpkg("testthat")` `r Citep(bib[["testthat"]])`
* `r CRANpkg("xml2")` `r Citep(bib[["xml2"]])`

This package was developed using `r BiocStyle::Biocpkg("biocthis")`.

Code for creating the vignette

```{r createVignette, eval=FALSE}
## Create the vignette
library("rmarkdown")
system.time(render("visiumStitched.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("visiumStitched.Rmd", tangle = TRUE)
```

`R` session information.

```{r reproduce3, echo=FALSE}
## Session info
library("sessioninfo")
options(width = 120)
session_info()
```

# Bibliography

This vignette was generated using `r Biocpkg("BiocStyle")`
`r Citep(bib[["BiocStyle"]])` with `r CRANpkg("knitr")`
`r Citep(bib[["knitr"]])` and `r CRANpkg("rmarkdown")`
`r Citep(bib[["rmarkdown"]])` running behind the scenes.

Citations made with `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`.

```{r vignetteBiblio, results = "asis", echo = FALSE, warning = FALSE, message = FALSE}
## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))
```