---
title: "Spatial Linear and Mixed-Effects Modelling with spicy"
date: "`r BiocStyle::doc_date()`"
params:
  test: FALSE
author:
- name: Nicolas Canete
  affiliation:  
  - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia
  email: nicolas.canete@sydney.edu.au
- name: Ellis Patrick
  affiliation:
  - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
- name: Alex Qin
  affiliation:
  - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia
  - School of Mathematics and Statistics, University of Sydney, Australia
package: "`r BiocStyle::pkg_ver('spicyR')`"
abstract: > 
  Perform linear and mixed-effects models to assess and visualise changes 
  in cell localisation across disease conditions.
vignette: >
  %\VignetteIndexEntry{"Spatial Linear and Mixed-Effects Modelling with spicy"}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output: 
  BiocStyle::html_document
bibliography: REFERENCES.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
library(BiocStyle)
```

# Installation

```{r, eval = FALSE}
if (!require("BiocManager")) {
  install.packages("BiocManager")
}
BiocManager::install("spicyR")
```


```{r warning=FALSE, message=FALSE}
# load required packages
library(spicyR)
library(ggplot2)
library(SpatialExperiment)
library(SpatialDatasets)
library(imcRtools)
```


# Overview

This guide provides step-by-step instructions on how to apply a linear model to multiple segmented and labelled images to assess how the localisation of different cell types changes across different disease conditions. 

# Example data

We use the @keren2018 breast cancer dataset to compare the spatial distribution of immune cells in individuals with different levels of tumour infiltration (cold and compartmentalised).

The data is stored as a `SpatialExperiment` object and contains single-cell spatial data from 41 images.


```{r warning=FALSE, message=FALSE}
kerenSPE <- SpatialDatasets::spe_Keren_2018()
```

The cell types in this dataset includes 11 immune cell types (double negative CD3 T cells, CD4 T cells, B cells, monocytes, macrophages, CD8 T cells, neutrophils, natural killer cells, dendritic cells, regulatory T cells), 2 structural cell types (endothelial, mesenchymal), 2 tumour cell types (keratin+ tumour, tumour) and one unidentified category.

# Linear modelling

To investigate changes in localisation between two different cell types, we 
measure the level of localisation between two cell types by modelling with the 
L-function. The L-function is a variance-stabilised K-function given by the equation

$$
\widehat{L_{ij}} (r) = \sqrt{\frac{\widehat{K_{ij}}(r)}{\pi}}
$$

with $\widehat{K_{ij}}$ defined as

$$
\widehat{K_{ij}} (r) = \frac{|W|}{n_i n_j} \sum_{n_i} \sum_{n_j} 1 \{d_{ij} \leq r \} e_{ij} (r)
$$

where $\widehat{K_{ij}}$ summarises the degree of co-localisation of cell type $j$ with cell type $i$, $n_i$ and $n_j$ are the number of cells of type $i$ and $j$, $|W|$ is the image area, $d_{ij}$ is the distance between two cells and $e_{ij} (r)$ is an edge correcting factor.

Specifically, the mean difference between the experimental function and the theoretical function is used as a measure for the level of localisation, defined as

$$
u = \sum_{r' = r_{\text{min}}}^{r_{\text{max}}} \widehat L_{ij, \text{Experimental}} (r') - \widehat L_{ij, \text{Poisson}} (r')
$$

where $u$ is the sum is taken over a discrete range of $r$ between $r_{\text{min}}$ and $r_{\text{max}}$. Differences of the statistic $u$ between two  conditions is modelled using a weighted linear model.

## Test for change in localisation for a specific pair of cells

Firstly, we can test whether one cell type tends to be more localised with another cell type in one condition compared to the other. This can be done using the `spicy()` function, where we specify the `condition` parameter.

In this example, we want to see whether or not neutrophils (`to`) tend to be found around CD8 T cells (`from`) in compartmentalised tumours compared to cold tumours. Given that there are 3 conditions, we can specify the desired conditions by setting the order of our `condition` factor. `spicy` will choose the first level of the factor as the base condition and the second level as the comparison condition. `spicy` will also naturally coerce the `condition` column into a factor if it is not already a factor. The column containing cell type annotations can be specified using the `cellTypeCol` argument. By default, `spicy` uses the column named `cellType` in the `SpatialExperiment` object.

```{r}
spicyTestPair <- spicy(
  kerenSPE,
  condition = "tumour_type",
  from = "CD8_T_cell",
  to = "Neutrophils"
)

topPairs(spicyTestPair)
```

We obtain a `spicy` object which details the results of the 
modelling performed. The `topPairs()` function can be used to obtain the associated coefficients and p-value.

As the `coefficient` in `spicyTestPair` is positive, we find that neutrophils are significantly more likely to be found near CD8 T cells in the compartmentalised tumours group compared to the cold tumour group.

## Test for change in localisation for all pairwise cell combinations

We can perform what we did above for all pairwise combinations of cell 
types by excluding the `from` and `to` parameters in `spicy()`.


```{r}
spicyTest <- spicy(
  kerenSPE,
  condition = "tumour_type"
)

topPairs(spicyTest)
```

Again, we obtain a `spicy` object which outlines the result of the linear
models performed for each pairwise combination of cell types.

We can also examine the L-function metrics of individual images by using the
convenient `bind()` function on our `spicyTest` results object.

```{r}
bind(spicyTest)[1:5, 1:5]
```

The results can be represented as a bubble plot using the `signifPlot()` function.

```{r}
signifPlot(
  spicyTest,
  breaks = c(-3, 3, 1),
  marksToPlot = c("Macrophages", "DC_or_Mono", "dn_T_CD3", "Neutrophils",
                  "CD8_T_cell", "Keratin_Tumour")
)
```

Here, we can observe that the most significant relationships occur between macrophages and double negative CD3 T cells, suggesting that the two cell types are far more dispersed in compartmentalised tumours compared to cold tumours.

To examine a specific cell type-cell type relationship in more detail, we can use `spicyBoxplot()` and specify either `from = "Macrophages"` and `to = "dn_T_CD3"` or `rank = 1`.

```{r}
spicyBoxPlot(results = spicyTest, 
             # from = "Macrophages",
             # to = "dn_T_CD3"
             rank = 1)
```

# Linear modelling for custom metrics

`spicyR` can also be applied to custom distance or abundance metrics. A kNN interactions graph can be generated with the function `buildSpatialGraph` from the `imcRtools` package. This generates a `colPairs` object inside of the `SpatialExperiment` object.

`spicyR` provides the function `convPairs` for converting a `colPairs` object into an abundance matrix by calculating the average number of nearby cells types for every cell type for a given `k`. For example, if there exists on average 5 neutrophils for every macrophage in image 1, the column `neutrophil__macrophage` would have a value of 5 for image 1.

```{r}
kerenSPE <- imcRtools::buildSpatialGraph(kerenSPE, 
                                         img_id = "imageID", 
                                         type = "knn", k = 20,
                                        coords = c("x", "y"))

pairAbundances <- convPairs(kerenSPE,
                  colPair = "knn_interaction_graph")

head(pairAbundances["B_cell__B_cell"])
```


The custom distance or abundance metrics can then be included in the analysis with the `alternateResult` parameter. The `Statial` package contains other custom distance metrics which can be used with `spicy`.

```{r}
spicyTestColPairs <- spicy(
  kerenSPE,
  condition = "tumour_type",
  alternateResult = pairAbundances,
  weights = FALSE
)

topPairs(spicyTestColPairs)
```


```{r}
signifPlot(
  spicyTestColPairs,
  breaks = c(-3, 3, 1),
  marksToPlot = c("Macrophages", "dn_T_CD3", "CD4_T_cell", 
                  "B_cell", "DC_or_Mono", "Neutrophils", "CD8_T_cell")
)
```

# Mixed effects modelling

`spicyR` supports mixed effects modelling when multiple images are obtained for each subject. In this case, `subject` is treated as a random effect and `condition` is treated as a fixed effect. To perform mixed effects modelling, we can specify the `subject` parameter in the `spicy` function.

```{r eval=FALSE}
spicyMixedTest <- spicy(
  diabetesData,
  condition = "stage",
  subject = "case"
)
```

# References


```{r}
sessionInfo()
```