---
title: "Supported differential expression methods"
author: 
  - name: Kevin Rue-Albrecht
    affiliation:
    - University of Oxford
    email: kevin.rue-albrecht@imm.ox.ac.uk
output: 
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r doc_date()`"
package: "`r pkg_ver('iSEEde')`"
vignette: >
  %\VignetteIndexEntry{Supported differential expression methods}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}  
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
options(width = 100)
```

```{r, eval=!exists("SCREENSHOT"), include=FALSE}
SCREENSHOT <- function(x, ...) knitr::include_graphics(x)
```

```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE}
## Track time spent on making the vignette
startTime <- Sys.time()

## Bib setup
library("RefManageR")

## Write bibliography information
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    knitr = citation("knitr")[1],
    RefManageR = citation("RefManageR")[1],
    rmarkdown = citation("rmarkdown")[1],
    sessioninfo = citation("sessioninfo")[1],
    testthat = citation("testthat")[1],
    iSEEde = citation("iSEEde")[1]
)
```

# Implementation

## User-facing storage and access

Differential expression results are generally reported as tables of statistics,
including (log~2~) fold-change, p-value, average expression, etc.

Those statistics being reported for individual features (e.g., genes), the
`rowData()` component of `SummarizedExperiment()` objects provides a natural
home for that information.
Specifically, `r BiocStyle::Biocpkg("iSEEde")` stores and retrieves differential expression results in `rowData(se)[["iSEEde"]]`.
However, the `rowData()` function should not be used to access or edit that information.
Instead, the functions `embedContrastResults()` and `contrastResults()`, should be used to store and retrieve contrast results, respectively, as those functions ensure that feature names are kept synchronised with the enclosing `SummarizedExperiment` object.

Moreover, the function `contrastResultsNames()` can be used to retrieve the names of contrast available in a given `SummarizedExperiment` object.

## Additional considerations

The first challenge arises when differential expression statistics are computed only for a subset of features.
In that case, `embedContrastResults()` populates missing information with `NA` values.

The second challenge arises from the different names of columns used by individual differential expression methods to store differential expression common statistics.
To address this, `r BiocStyle::Biocpkg("iSEEde")` provides S4 classes creating a common interface to supported differential expression methods.
Each class of differential expression results implements the following methods:

- `pValue()` returns the vector of raw p-values.
- `log2FoldChange()` returns the vector of log2-fold-change values.
- `averageLog2()` returns the vector of average log2-expression values.

# Example data

In this example, we use the `?airway` data set.

We briefly adjust the reference level of the treatment factor to the untreated condition.

```{r, message=FALSE, warning=FALSE}
library("airway")
data("airway")
airway$dex <- relevel(airway$dex, "untrt")
```

# Supported methods

## Limma

We first use `edgeR::filterByExpr()` to remove genes whose counts are too low to
support a rigorous differential expression analysis. Then we run a standard 
Limma-Voom analysis using `edgeR::voomLmFit()`, `limma::makeContrasts()`, and
`limma::eBayes()`.  (Alternatively, we could have used `limma::treat()` instead
of `limma::eBayes()`.)

The linear model includes the `dex` and `cell` covariates, indicating the
treatment conditions and cell types, respectively. Here, we are interested in 
differences between treatments, adjusted by cell type, and define this
comparison as the `dextrt - dexuntrt` contrast.

The final differential expression results are fetched using `limma::topTable()`.

```{r, message=FALSE, warning=FALSE}
library("edgeR")

design <- model.matrix(~ 0 + dex + cell, data = colData(airway))

keep <- filterByExpr(airway, design)
fit <- voomLmFit(airway[keep, ], design, plot = FALSE)
contr <- makeContrasts("dextrt - dexuntrt", levels = design)
fit <- contrasts.fit(fit, contr)
fit <- eBayes(fit)
res_limma <- topTable(fit, sort.by = "P", n = Inf)
head(res_limma)
```

Then, we embed this set of differential expression results in the `airway`
object using the `embedContrastResults()` method. 

Because the output of `limma::topTable()` is a standard `data.frame`, the
`class=` argument must be used to manually identify the method that produced
those results.

Supported classes are listed in the object `iSEEde::embedContrastResultsMethods`, i.e.

```{r, message=FALSE}
library(iSEEde)
embedContrastResultsMethods
```

This manual method is preferable to any automated heuristic (e.g. using the
column names of the `data.frame` for identifying it as a `limma::topTable()`
output).

The results embedded in the `airway` object can be accessed using the `contrastResults()` function.

```{r}
airway <- embedContrastResults(res_limma, airway, name = "Limma-Voom", class = "limma")
contrastResults(airway)
contrastResults(airway, "Limma-Voom")
```

## DESeq2

First, we use `DESeqDataSet()` to construct a `DESeqDataSet` object
for the analysis. Then we run a standard `r BiocStyle::Biocpkg("DESeq2")`
analysis using `DESeq()`.

The differential expression results are fetched using `results()`.

```{r, message=FALSE, warning=FALSE}
library("DESeq2")

dds <- DESeqDataSet(airway, ~ 0 + dex + cell)

dds <- DESeq(dds)
res_deseq2 <- results(dds, contrast = list("dextrt", "dexuntrt"))
head(res_deseq2)
```

Then, we embed this set of differential expression results in the `airway` 
object using the `embedContrastResults()` method.

In this instance, the `r BiocStyle::Biocpkg("DESeq2")` results are stored in a
recognisable `?DESeqResults` object, which can be given *as is* directly to the
`embedContrastResults()` method.

The function will automatically pass the results object to the 
`iSEEDESeq2Results()` constructor, to identify it as such.

The results embedded in the airway object can be accessed using the `contrastResults()` function.

```{r}
airway <- embedContrastResults(res_deseq2, airway, name = "DESeq2")
contrastResults(airway)
contrastResults(airway, "DESeq2")
```

## edgeR

We run a standard `r BiocStyle::Biocpkg("edgeR")` analysis using `glmFit()` and
`glmLRT()`.

The differential expression results are fetched using `topTags()`.

```{r, message=FALSE, warning=FALSE}
library("edgeR")

design <- model.matrix(~ 0 + dex + cell, data = colData(airway))

fit <- glmFit(airway, design, dispersion = 0.1)
lrt <- glmLRT(fit, contrast = c(-1, 1, 0, 0, 0))
res_edger <- topTags(lrt, n = Inf)
head(res_edger)
```

Then, we embed this set of differential expression results in the `airway`
object using the `embedContrastResults()` method.

In this instance, the `r BiocStyle::Biocpkg("edgeR")` results are stored in a 
recognisable `?TopTags` object. As such, the results object can be given *as is*
directly to the `embedContrastResults()` method. The function will automatically pass
the results object to the `iSEEedgeRResults()` constructor, to identify it as
such.

The results embedded in the airway object can be accessed using the `contrastResults()` function.

```{r}
airway <- embedContrastResults(res_edger, airway, name = "edgeR")
contrastResults(airway)
contrastResults(airway, "edgeR")
```

# Live app

In this example, we used the `r BiocStyle::Biocpkg("iSEEde")` functions `DETable()`, `VolcanoPlot()`, and `MAPlot()` to add panels that facilitate the visualisation of differential expression results in an `r BiocStyle::Biocpkg("iSEE")` app.

Specifically, we add one set of panels for each differential expression method used in this vignette (i.e., Limma-Voom, DESeq2, edgeR).

```{r, message=FALSE}
library(iSEE)
app <- iSEE(airway, initial = list(
  DETable(ContrastName="Limma-Voom", HiddenColumns = c("AveExpr", 
    "t", "B"), PanelWidth = 4L),
  VolcanoPlot(ContrastName = "Limma-Voom", PanelWidth = 4L),
  MAPlot(ContrastName = "Limma-Voom", PanelWidth = 4L),
  DETable(ContrastName="DESeq2", HiddenColumns = c("baseMean", 
    "lfcSE", "stat"), PanelWidth = 4L),
  VolcanoPlot(ContrastName = "DESeq2", PanelWidth = 4L),
  MAPlot(ContrastName = "DESeq2", PanelWidth = 4L),
  DETable(ContrastName="edgeR", HiddenColumns = c("logCPM", 
    "LR"), PanelWidth = 4L),
  VolcanoPlot(ContrastName = "edgeR", PanelWidth = 4L),
  MAPlot(ContrastName = "edgeR", PanelWidth = 4L)
))

if (interactive()) {
  shiny::runApp(app)
}
```

```{r, echo=FALSE, out.width="100%"}
SCREENSHOT("screenshots/methods_side_by_side.png", delay = 20)
```

# Comparing two contrasts

The `?LogFCLogFCPlot` class allows users to compare the log~2~ fold-change value of features between two differential expression contrasts.

In this example, we add one `?LogFCLogFCPlot` panel comparing the same contrast using the Limma-Voom and DESeq2 methods, alongside one `?VolcanoPlot` panel for each of those two contrasts.
Moreover, we pre-select an area of the `?LogFCLogFCPlot` and highlight the selected features in the two `?VolcanoPlot` panels.

```{r, message=FALSE, warning=FALSE}
library(iSEE)
app <- iSEE(airway, initial = list(
    VolcanoPlot(ContrastName="Limma-Voom",
        RowSelectionSource = "LogFCLogFCPlot1", ColorBy = "Row selection",
        PanelWidth = 4L),
    LogFCLogFCPlot(ContrastNameX="Limma-Voom", ContrastNameY="DESeq2",
        BrushData = list(
        xmin = 3.6, xmax = 8.2, ymin = 3.8, ymax = 9.8,
        mapping = list(x = "X", y = "Y"),
        direction = "xy", brushId = "LogFCLogFCPlot1_Brush", 
        outputId = "LogFCLogFCPlot1"),
        PanelWidth = 4L),
    VolcanoPlot(ContrastName="DESeq2",
        RowSelectionSource = "LogFCLogFCPlot1", ColorBy = "Row selection",
        PanelWidth = 4L)
))

if (interactive()) {
  shiny::runApp(app)
}
```

```{r, echo=FALSE, out.width="100%"}
SCREENSHOT("screenshots/logfc_logfc_plot.png", delay = 20)
```

# Reproducibility

The `r Biocpkg("iSEEde")` package `r Citep(bib[["iSEEde"]])` was made possible
thanks to:

* R `r Citep(bib[["R"]])`
* `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])`
* `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])`
* `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`
* `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])`
* `r CRANpkg("sessioninfo")` `r Citep(bib[["sessioninfo"]])`
* `r CRANpkg("testthat")` `r Citep(bib[["testthat"]])`

This package was developed using `r BiocStyle::Biocpkg("biocthis")`.


Code for creating the vignette

```{r createVignette, eval=FALSE}
## Create the vignette
library("rmarkdown")
system.time(render("methods.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("methods.Rmd", tangle = TRUE)
```

Date the vignette was generated.

```{r reproduce1, echo=FALSE}
## Date the vignette was generated
Sys.time()
```

Wallclock time spent generating the vignette.

```{r reproduce2, echo=FALSE}
## Processing time in seconds
totalTime <- diff(c(startTime, Sys.time()))
round(totalTime, digits = 3)
```

`R` session information.

```{r reproduce3, echo=FALSE}
## Session info
library("sessioninfo")
options(width = 120)
session_info()
```


# Bibliography

This vignette was generated using `r Biocpkg("BiocStyle")` `r
Citep(bib[["BiocStyle"]])` with `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])`
and `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` running behind the
scenes.

Citations made with `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`.

```{r vignetteBiblio, results = "asis", echo = FALSE, warning = FALSE, message = FALSE}
## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))
```

<!-- Links -->

[scheme-wikipedia]: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax
[iana-uri]: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml