gg4way

Introduction

4way plots enable a comparison of the logFC values from two contrasts of differential gene expression (Friedman and Maniatis 2011). The gg4way package creates 4way plots using the ggplot2 framework and supports popular Bioconductor objects. The package also provides information about the correlation between contrasts and significant genes of interest.

Installation

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("gg4way")

To install the development version directly from GitHub:

if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}

remotes::install_github("ben-laufer/gg4way")

Quick start: limma

This example involves testing a popular RNA-seq dataset using limma-voom.

Prepare data

First the airway data package is loaded, gene symbols are added, and then for the purpose of this vignette only genes with symbols are kept.

library("airway")
data("airway")
se <- airway

library("org.Hs.eg.db")
rowData(se)$symbol <- mapIds(org.Hs.eg.db,
                             keys = rownames(se),
                             column = "SYMBOL",
                             keytype = "ENSEMBL")

rowData(se)$ID <- rownames(se)

se <- se[!is.na(rowData(se)$symbol)]

limma-voom

The output from limma::eBayes() and limma::treat() is supported; however, only the former is shown for this example.

library("edgeR")
library("limma")

dge <- se |>
    SE2DGEList()

design <- model.matrix(~ 0 + cell + dex, data = dge$samples)
colnames(design) <- gsub("cell", "", colnames(design))

contr.matrix <- makeContrasts(N61311 - N052611,
                              N061011 - N052611,
                              levels = c("N052611", "N061011",
                                         "N080611", "N61311",
                                         "dexuntrt"))

keep <- filterByExpr(dge, design)
dge <- dge[keep, ]

efit <- dge |>
    calcNormFactors() |>
    voom(design) |>
    lmFit(design) |>
    contrasts.fit(contrasts = contr.matrix) |>
    eBayes()

Plot

Finally, we create a 4way plot comparing the logFC for all genes in the two contrasts.

library("gg4way")

p1 <- efit |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611")

p1
gg4way from limma

gg4way from limma

The legend title at the bottom shows that there is a correlation of r = 0.43, which is exemplified by more shared DEGs (blue dots) going in the same direction (upper right and bottom left) than opposite direction (upper left and bottom right). The numbers in the plot give the totals for the different quadrants of the 4way plot. If you look at the bottom left quadrant, the blue text shows that there are 62 DEGs where N052611 has significantly increased expression relative to both N61311 and N061011. The red text shows that there are 238 DEGs where N052611 has significantly increased expression relative to N61311 only, while the green text shows that there are 41 DEGs where N052611 has significantly increased expression relative to N061011 only.

Add gene labels

Table for labels

The genes that are significant in both contrasts can be obtained in a table through getShared().

p1 |>
    getShared() |>
    head()
## # A tibble: 6 × 12
##   ID              symbol   `N61311 vs N052611 LogFC` `N061011 vs N052611 LogFC`
##   <chr>           <chr>                        <dbl>                      <dbl>
## 1 ENSG00000204941 PSG5                         -4.61                      -5.24
## 2 ENSG00000164308 ERAP2                         3.45                       3.77
## 3 ENSG00000018625 ATP1A2                       -2.81                      -2.17
## 4 ENSG00000262902 MTCO1P40                     -6.72                      -6.78
## 5 ENSG00000180914 OXTR                         -3.78                      -3.18
## 6 ENSG00000078018 MAP2                         -3.10                      -1.35
## # ℹ 8 more variables: `N61311 vs N052611 FDR` <dbl>,
## #   `N061011 vs N052611 FDR` <dbl>, `N61311 vs N052611 FDRpass` <lgl>,
## #   `N061011 vs N052611 FDRpass` <lgl>, `N61311 vs N052611 Direction` <chr>,
## #   `N061011 vs N052611 Direction` <chr>, Significant <fct>, alpha <dbl>

Plotting labels

Gene symbols can be added to the plot through the label argument. Setting it to TRUE will plot all the genes colored blue, while specific genes can be labelled by providing their symbol. Below, two of the genes from the above table are labelled in the plot.

efit |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611",
           label = c("PSG5", "ERAP2"))
gg4way with labels

gg4way with labels

Advanced options

Axis titles

The axis titles can be further customized through ggplot2.

p1 +
    xlab(expression(atop(
        paste("Higher in Line 2" %<->% "Higher in Line 1"),
        paste("Line 1 vs 2 LogFC")))) +
    ylab(expression(atop(
        paste("Line 3 vs 2"),
        paste("Higher in Line 2" %<->% "Higher in Line 3"))))
gg4way with custom axis titles

gg4way with custom axis titles

Correlation only

The Pearson correlation coefficient can be obtained from getCor(). Advanced users can apply this in their own functions to compare across pairs in a heatmap.

p1 |>
    getCor()
## [1] 0.43

edgeR

In addition to the output of limma, the functions are also compatible with edgeR and DESeq2. If a user is starting here, they will first have to run the Prepare data and limma-voom subsections in the Quick start: limma section. The output from edgeR::glmQLFTest(), edgeR::glmTreat(), and edgeR::glmLRT() is supported.

library("purrr")

rfit <- dge |>
    calcNormFactors() |>
    estimateDisp() |>
    glmQLFit(design) 

rfit <- contr.matrix |> 
    colnames() |>
    set_names() |>
    map(~ rfit |>
            glmQLFTest(contrast = contr.matrix[,.x]))

rfit |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611")
gg4way from edgeR

gg4way from edgeR

DESeq2

If a user is starting here, they will first have to run the Prepare data subsection in the Quick start: limma section.

DESeq analysis

For the purpose of this vignette, we filter the object to remove the difference between the results name and contrast approaches shown below.

library("DESeq2")

ddsSE <- se |>
    DESeqDataSet(design = ~ cell + dex)

keep <- ddsSE |>
    counts() |>
    apply(1, function(gene) {
        all(gene != 0)
    })

ddsSE <- ddsSE[keep, ]

dds <- ddsSE |>
    DESeq()

Plot by results name

dds |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611")
gg4way from DESeq2

gg4way from DESeq2

Plot by contrast

The same result as above can be obtained through the contrast argument of DESeq2::results(), where you can also specify the lfcThreshold.

list("N61311 vs N052611" = c("cell", "N61311", "N052611"),
     "N061011 vs N052611" = c("cell", "N061011", "N052611")) |>
    map(~ results(dds, contrast = .x)) |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611")
gg4way from DESeq2 contrast

gg4way from DESeq2 contrast

Plot by lfcShrink

Finally, the output of DESeq2::lfcShrink() can also be plotted.

list("N61311 vs N052611" = c("cell", "N61311", "N052611"),
     "N061011 vs N052611" = c("cell", "N061011", "N052611")) |>
    map(~ dds |>
        results(contrast = .x) |>
        lfcShrink(dds, contrast = .x, res = _, type = "normal")) |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611")
gg4way from DESeq2 lfcShrink

gg4way from DESeq2 lfcShrink

Other packages

gg4way is not limited to input from limma, edgeR, or DESeq2. It also works with a named list of data.frames, where the names correspond to the x and y arguments. The separator between groups in the names of the list can be specified using the sep argument. Each data.frame should have columns corresponding to values provided to the ID, logFC, and FDR arguments. The symbol column is optional and used for the plot labels. This enables cases where not every feature has a (unique) gene ID. If the symbol column is not present then the argument should be given the ID column. The default values for the arguments can be seen in the documentation through ?gg4way().

Package support

The Bioconductor support site is the preferred method to ask for help. Before posting, it’s recommended to check previous posts for the answer and look over the posting guide. For the post, it’s important to use the gg4way tag and provide both a minimal reproducible example and session information.

Session info

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] DESeq2_1.47.0               purrr_1.0.2                
##  [3] gg4way_1.5.0                ggplot2_3.5.1              
##  [5] edgeR_4.3.21                limma_3.61.12              
##  [7] org.Hs.eg.db_3.20.0         AnnotationDbi_1.69.0       
##  [9] airway_1.25.0               SummarizedExperiment_1.35.5
## [11] Biobase_2.67.0              GenomicRanges_1.57.2       
## [13] GenomeInfoDb_1.41.2         IRanges_2.39.2             
## [15] S4Vectors_0.43.2            BiocGenerics_0.53.0        
## [17] MatrixGenerics_1.17.1       matrixStats_1.4.1          
## [19] BiocStyle_2.35.0           
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1        farver_2.1.2            dplyr_1.1.4            
##  [4] blob_1.2.4              Biostrings_2.75.0       fastmap_1.2.0          
##  [7] janitor_2.2.0           digest_0.6.37           timechange_0.3.0       
## [10] lifecycle_1.0.4         statmod_1.5.0           KEGGREST_1.45.1        
## [13] RSQLite_2.3.7           magrittr_2.0.3          compiler_4.4.1         
## [16] rlang_1.1.4             sass_0.4.9              tools_4.4.1            
## [19] utf8_1.2.4              yaml_2.3.10             knitr_1.48             
## [22] labeling_0.4.3          S4Arrays_1.5.11         bit_4.5.0              
## [25] DelayedArray_0.33.1     abind_1.4-8             BiocParallel_1.41.0    
## [28] withr_3.0.2             sys_3.4.3               grid_4.4.1             
## [31] fansi_1.0.6             colorspace_2.1-1        scales_1.3.0           
## [34] cli_3.6.3               rmarkdown_2.28          crayon_1.5.3           
## [37] generics_0.1.3          httr_1.4.7              DBI_1.2.3              
## [40] cachem_1.1.0            stringr_1.5.1           splines_4.4.1          
## [43] zlibbioc_1.51.2         parallel_4.4.1          BiocManager_1.30.25    
## [46] XVector_0.45.0          vctrs_0.6.5             Matrix_1.7-1           
## [49] jsonlite_1.8.9          bit64_4.5.2             ggrepel_0.9.6          
## [52] maketools_1.3.1         locfit_1.5-9.10         tidyr_1.3.1            
## [55] jquerylib_0.1.4         glue_1.8.0              codetools_0.2-20       
## [58] stringi_1.8.4           lubridate_1.9.3         gtable_0.3.6           
## [61] UCSC.utils_1.1.0        munsell_0.5.1           tibble_3.2.1           
## [64] pillar_1.9.0            htmltools_0.5.8.1       GenomeInfoDbData_1.2.13
## [67] R6_2.5.1                evaluate_1.0.1          lattice_0.22-6         
## [70] highr_0.11              png_0.1-8               snakecase_0.11.1       
## [73] memoise_2.0.1           bslib_0.8.0             Rcpp_1.0.13            
## [76] SparseArray_1.5.45      xfun_0.48               buildtools_1.0.0       
## [79] pkgconfig_2.0.3

References

Friedman, B. A., and T. Maniatis. 2011. ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data.” Genome Biol 12 (7): R69.