--- title: "Supported differential expression methods" author: - name: Kevin Rue-Albrecht affiliation: - University of Oxford email: kevin.rue-albrecht@imm.ox.ac.uk output: BiocStyle::html_document: self_contained: yes toc: true toc_float: true toc_depth: 2 code_folding: show date: "`r doc_date()`" package: "`r pkg_ver('iSEEde')`" vignette: > %\VignetteIndexEntry{Supported differential expression methods} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html ) options(width = 100) ``` ```{r, eval=!exists("SCREENSHOT"), include=FALSE} SCREENSHOT <- function(x, ...) knitr::include_graphics(x) ``` ```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE} ## Track time spent on making the vignette startTime <- Sys.time() ## Bib setup library("RefManageR") ## Write bibliography information bib <- c( R = citation(), BiocStyle = citation("BiocStyle")[1], knitr = citation("knitr")[1], RefManageR = citation("RefManageR")[1], rmarkdown = citation("rmarkdown")[1], sessioninfo = citation("sessioninfo")[1], testthat = citation("testthat")[1], iSEEde = citation("iSEEde")[1] ) ``` # Implementation ## User-facing storage and access Differential expression results are generally reported as tables of statistics, including (log~2~) fold-change, p-value, average expression, etc. Those statistics being reported for individual features (e.g., genes), the `rowData()` component of `SummarizedExperiment()` objects provides a natural home for that information. Specifically, `r BiocStyle::Biocpkg("iSEEde")` stores and retrieves differential expression results in `rowData(se)[["iSEEde"]]`. However, the `rowData()` function should not be used to access or edit that information. Instead, the functions `embedContrastResults()` and `contrastResults()`, should be used to store and retrieve contrast results, respectively, as those functions ensure that feature names are kept synchronised with the enclosing `SummarizedExperiment` object. Moreover, the function `contrastResultsNames()` can be used to retrieve the names of contrast available in a given `SummarizedExperiment` object. ## Additional considerations The first challenge arises when differential expression statistics are computed only for a subset of features. In that case, `embedContrastResults()` populates missing information with `NA` values. The second challenge arises from the different names of columns used by individual differential expression methods to store differential expression common statistics. To address this, `r BiocStyle::Biocpkg("iSEEde")` provides S4 classes creating a common interface to supported differential expression methods. Each class of differential expression results implements the following methods: - `pValue()` returns the vector of raw p-values. - `log2FoldChange()` returns the vector of log2-fold-change values. - `averageLog2()` returns the vector of average log2-expression values. # Example data In this example, we use the `?airway` data set. We briefly adjust the reference level of the treatment factor to the untreated condition. ```{r, message=FALSE, warning=FALSE} library("airway") data("airway") airway$dex <- relevel(airway$dex, "untrt") ``` # Supported methods ## Limma We first use `edgeR::filterByExpr()` to remove genes whose counts are too low to support a rigorous differential expression analysis. Then we run a standard Limma-Voom analysis using `edgeR::voomLmFit()`, `limma::makeContrasts()`, and `limma::eBayes()`. (Alternatively, we could have used `limma::treat()` instead of `limma::eBayes()`.) The linear model includes the `dex` and `cell` covariates, indicating the treatment conditions and cell types, respectively. Here, we are interested in differences between treatments, adjusted by cell type, and define this comparison as the `dextrt - dexuntrt` contrast. The final differential expression results are fetched using `limma::topTable()`. ```{r, message=FALSE, warning=FALSE} library("edgeR") design <- model.matrix(~ 0 + dex + cell, data = colData(airway)) keep <- filterByExpr(airway, design) fit <- voomLmFit(airway[keep, ], design, plot = FALSE) contr <- makeContrasts("dextrt - dexuntrt", levels = design) fit <- contrasts.fit(fit, contr) fit <- eBayes(fit) res_limma <- topTable(fit, sort.by = "P", n = Inf) head(res_limma) ``` Then, we embed this set of differential expression results in the `airway` object using the `embedContrastResults()` method. Because the output of `limma::topTable()` is a standard `data.frame`, the `class=` argument must be used to manually identify the method that produced those results. Supported classes are listed in the object `iSEEde::embedContrastResultsMethods`, i.e. ```{r, message=FALSE} library(iSEEde) embedContrastResultsMethods ``` This manual method is preferable to any automated heuristic (e.g. using the column names of the `data.frame` for identifying it as a `limma::topTable()` output). The results embedded in the `airway` object can be accessed using the `contrastResults()` function. ```{r} airway <- embedContrastResults(res_limma, airway, name = "Limma-Voom", class = "limma") contrastResults(airway) contrastResults(airway, "Limma-Voom") ``` ## DESeq2 First, we use `DESeqDataSet()` to construct a `DESeqDataSet` object for the analysis. Then we run a standard `r BiocStyle::Biocpkg("DESeq2")` analysis using `DESeq()`. The differential expression results are fetched using `results()`. ```{r, message=FALSE, warning=FALSE} library("DESeq2") dds <- DESeqDataSet(airway, ~ 0 + dex + cell) dds <- DESeq(dds) res_deseq2 <- results(dds, contrast = list("dextrt", "dexuntrt")) head(res_deseq2) ``` Then, we embed this set of differential expression results in the `airway` object using the `embedContrastResults()` method. In this instance, the `r BiocStyle::Biocpkg("DESeq2")` results are stored in a recognisable `?DESeqResults` object, which can be given *as is* directly to the `embedContrastResults()` method. The function will automatically pass the results object to the `iSEEDESeq2Results()` constructor, to identify it as such. The results embedded in the airway object can be accessed using the `contrastResults()` function. ```{r} airway <- embedContrastResults(res_deseq2, airway, name = "DESeq2") contrastResults(airway) contrastResults(airway, "DESeq2") ``` ## edgeR We run a standard `r BiocStyle::Biocpkg("edgeR")` analysis using `glmFit()` and `glmLRT()`. The differential expression results are fetched using `topTags()`. ```{r, message=FALSE, warning=FALSE} library("edgeR") design <- model.matrix(~ 0 + dex + cell, data = colData(airway)) fit <- glmFit(airway, design, dispersion = 0.1) lrt <- glmLRT(fit, contrast = c(-1, 1, 0, 0, 0)) res_edger <- topTags(lrt, n = Inf) head(res_edger) ``` Then, we embed this set of differential expression results in the `airway` object using the `embedContrastResults()` method. In this instance, the `r BiocStyle::Biocpkg("edgeR")` results are stored in a recognisable `?TopTags` object. As such, the results object can be given *as is* directly to the `embedContrastResults()` method. The function will automatically pass the results object to the `iSEEedgeRResults()` constructor, to identify it as such. The results embedded in the airway object can be accessed using the `contrastResults()` function. ```{r} airway <- embedContrastResults(res_edger, airway, name = "edgeR") contrastResults(airway) contrastResults(airway, "edgeR") ``` # Live app In this example, we used the `r BiocStyle::Biocpkg("iSEEde")` functions `DETable()`, `VolcanoPlot()`, and `MAPlot()` to add panels that facilitate the visualisation of differential expression results in an `r BiocStyle::Biocpkg("iSEE")` app. Specifically, we add one set of panels for each differential expression method used in this vignette (i.e., Limma-Voom, DESeq2, edgeR). ```{r, message=FALSE} library(iSEE) app <- iSEE(airway, initial = list( DETable(ContrastName="Limma-Voom", HiddenColumns = c("AveExpr", "t", "B"), PanelWidth = 4L), VolcanoPlot(ContrastName = "Limma-Voom", PanelWidth = 4L), MAPlot(ContrastName = "Limma-Voom", PanelWidth = 4L), DETable(ContrastName="DESeq2", HiddenColumns = c("baseMean", "lfcSE", "stat"), PanelWidth = 4L), VolcanoPlot(ContrastName = "DESeq2", PanelWidth = 4L), MAPlot(ContrastName = "DESeq2", PanelWidth = 4L), DETable(ContrastName="edgeR", HiddenColumns = c("logCPM", "LR"), PanelWidth = 4L), VolcanoPlot(ContrastName = "edgeR", PanelWidth = 4L), MAPlot(ContrastName = "edgeR", PanelWidth = 4L) )) if (interactive()) { shiny::runApp(app) } ``` ```{r, echo=FALSE, out.width="100%"} SCREENSHOT("screenshots/methods_side_by_side.png", delay = 20) ``` # Comparing two contrasts The `?LogFCLogFCPlot` class allows users to compare the log~2~ fold-change value of features between two differential expression contrasts. In this example, we add one `?LogFCLogFCPlot` panel comparing the same contrast using the Limma-Voom and DESeq2 methods, alongside one `?VolcanoPlot` panel for each of those two contrasts. Moreover, we pre-select an area of the `?LogFCLogFCPlot` and highlight the selected features in the two `?VolcanoPlot` panels. ```{r, message=FALSE, warning=FALSE} library(iSEE) app <- iSEE(airway, initial = list( VolcanoPlot(ContrastName="Limma-Voom", RowSelectionSource = "LogFCLogFCPlot1", ColorBy = "Row selection", PanelWidth = 4L), LogFCLogFCPlot(ContrastNameX="Limma-Voom", ContrastNameY="DESeq2", BrushData = list( xmin = 3.6, xmax = 8.2, ymin = 3.8, ymax = 9.8, mapping = list(x = "X", y = "Y"), direction = "xy", brushId = "LogFCLogFCPlot1_Brush", outputId = "LogFCLogFCPlot1"), PanelWidth = 4L), VolcanoPlot(ContrastName="DESeq2", RowSelectionSource = "LogFCLogFCPlot1", ColorBy = "Row selection", PanelWidth = 4L) )) if (interactive()) { shiny::runApp(app) } ``` ```{r, echo=FALSE, out.width="100%"} SCREENSHOT("screenshots/logfc_logfc_plot.png", delay = 20) ``` # Reproducibility The `r Biocpkg("iSEEde")` package `r Citep(bib[["iSEEde"]])` was made possible thanks to: * R `r Citep(bib[["R"]])` * `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])` * `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])` * `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])` * `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` * `r CRANpkg("sessioninfo")` `r Citep(bib[["sessioninfo"]])` * `r CRANpkg("testthat")` `r Citep(bib[["testthat"]])` This package was developed using `r BiocStyle::Biocpkg("biocthis")`. Code for creating the vignette ```{r createVignette, eval=FALSE} ## Create the vignette library("rmarkdown") system.time(render("methods.Rmd", "BiocStyle::html_document")) ## Extract the R code library("knitr") knit("methods.Rmd", tangle = TRUE) ``` Date the vignette was generated. ```{r reproduce1, echo=FALSE} ## Date the vignette was generated Sys.time() ``` Wallclock time spent generating the vignette. ```{r reproduce2, echo=FALSE} ## Processing time in seconds totalTime <- diff(c(startTime, Sys.time())) round(totalTime, digits = 3) ``` `R` session information. ```{r reproduce3, echo=FALSE} ## Session info library("sessioninfo") options(width = 120) session_info() ``` # Bibliography This vignette was generated using `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])` with `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])` and `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` running behind the scenes. Citations made with `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`. ```{r vignetteBiblio, results = "asis", echo = FALSE, warning = FALSE, message = FALSE} ## Print bibliography PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html")) ``` [scheme-wikipedia]: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax [iana-uri]: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml