DESeq2

Introduction

In this vignette we present the basic features of Glimma. Glimma is an interactive R widget for creating plots for differential expression analysis, created using the Vega and htmlwidgets frameworks. The created plots can be embedded in R Markdown, or exported as standalone HTML documents. The data presented here is slightly modified from the RNAseq123 workflow with only a single contrast has been performed for simplicity. Here we use DESeq2 to fit the model.

To begin, the DGEList object from the workflow has been included with the package as internal data. We will convert this to a DESeq data object.

library(Glimma)
library(edgeR)
library(DESeq2)

dge <- readRDS(system.file("RNAseq123/dge.rds", package = "Glimma"))

dds <- DESeqDataSetFromMatrix(
  countData = dge$counts,
  colData = dge$samples,
  rowData = dge$genes,
  design = ~group
)
#> converting counts to integer mode

MDS Plot

The multidimensional scaling (MDS) plot is frequently used to explore differences in samples. When data has been MDS transformed, the first two dimensions explain the greatest variance between samples, and the amount of variance decreases monotonically with increasing dimension.

The Glimma MDS contains two main components:

a plot showing two MDS dimensions, and
a plot of the eigenvalues of each dimension

The Glimma MDS allows different dimensions to be plotted against each other, with the proportion of variability explained by each dimension highlighted in the barplot alongside it. The interactive MDS plot can be created simply with a single argument for a DESeqDataSet object. The points in the MDS plot can have their size, colour and shape changed based on the information that is stored in the colData of the DESeqDataSet.

glimmaMDS(dds)

Interactions with the plot

In the plot above, try:

Scaling the points by library size (lib_size).
Changing the colour of points by group using the colour_by field.
Changing the colour scheme using to colour points using the colourscheme field.
Altering the shape of points by sample sequencing lane using the shape_by field.
Changing the dimensions plotted on the x-axis (x_axis) to dim2 and y-axis (y_axis) to dim3.
Saving the plots in either PNG or SVG formats using the “Save Plot” button.

Modifications to the plot

Adjusting plot size

Usage: glimmaMDS(dds, width=1200, height=1200)

Users can specify the width and height of the MDS plot widget in pixels. The default width and height are 900 and 500 respectively.

Continuous colour schemes

Usage: glimmaMDS(dds, continuous.color=TRUE)

This argument specifies that continuous colour schemes should be used, which can be useful for colouring samples by their expression for a particular gene.

Custom experimental groups

Usage: glimmaMDS(dds, groups=[vector or data frame])

This allows the user to change the associated sample information such as experimental groups. This information is displayed in mouseover tooltips and can be used to adjust the plot using scale_by, colour_by and shape_by fields.

MA Plot

The MA plot is a visualisation that plots the log-fold-change between experimental groups (M) against the mean expression across all the samples (A) for each gene.

The Glimma MA plot contains two main components:

a plot of summary statistics across all genes that have been tested, and
a plot of gene expression from individual samples for a given gene

The second plot shows gene expression from the last selected sample, which can be selected from the table or directly from the summary plot.

To create the MA plot we first need to run differential expression (DE) analysis for our data using the DESeq function.

dds <- DESeq(dds, quiet=TRUE)

The MA plot can then be created using the dds object that now contains fitted results and the gene counts.

glimmaMA(dds)
#> 15 of 16624 genes were filtered out in DESeq2 tests

Interactions with the plot

In the plot above, try:

Clicking points in the summary plot or rows in the table to plot the gene expression of the selection.
- Clicking genes in the table after selecting individual points will remove the previous selection.
Searching for individual genes using the search box. The search results are displayed in the table.
- If genes are currently selected, the search box will not function.
Setting a maximum value for the y-axis of the expression plot using the max_y_axis field.
- This allows for comparison of gene expression between genes on a comparable scale.
Saving the currently selected genes using the Save Data dropdown.
- From here, you can also choose to save the entire table.
Saving the summary plot or expression plot in either PNG or SVG formats, using the “Save Data” dropdown.

Modifications to the plot

Adjusting plot size

Usage: glimmaMA(dds, width=1200, height=1200)

Users can specify the width and height of the MA plot widget in pixels. The default width and height are both 920px.

Changing DE status colouring

Usage: glimmaMA(dds, status.cols=c("blue", "grey", "red")

Users can customise the colours associated with the differential expression status of a gene using the status.cols argument. A vector of length three should be passed in, where each element must be a valid CSS colour string.

Changing sample colours in expression plot

Usage: glimmaMA(dds, sample.cols=colours)

The sample.cols argument colours each sample based on the character vector of valid CSS colour strings colours. The colours vector must be of length ncol(counts).

Overriding counts and groups

Usage: glimmaMA(dds, counts=counts, groups=groups)

Glimma extracts counts from DESeq2::counts(dds) by default, and experimental groups from a group column in colData(dds) if it is available. However, users can optionally supply their own counts matrix and groups vector using the counts and groups arguments.

Transforming counts values

Usage: glimmaMA(dds, transform.counts="rpkm")

The transform.counts argument allows users to choose between strategies for transforming counts data displayed on the expression plot. The default argument is "logcpm" which log-transforms counts using edgeR::cpm(counts, log=TRUE). Other options are “rpkm" for edgeR::rpkm(counts), cpm for edgeR::cpm(counts) and none for no transformation.

Changing displayed columns in gene annotation The gene annotations are pulled from the DGEList object by default. This can be overwritten by providing a different table of annotations via the anno argument, the substitute annotations must have the same number of rows as the counts matrix and the genes must be in the same order as in the counts.

Some annotations may contain too many columsn to be sensibly displayed. The display.columns argument can be used to control the columns displayed in the plot. A vector of column names are to be provided for selecting the columns that will be displayed in the interactive plot.

Saving widgets

The plots created are automatically embedded into Rmarkdown reports, but having many interactive plots can significantly slow down the page. It is instead recommended to save the plots using htmlwidgets::saveWidget and linking to it via markdown hyperlinks.

# creates ma-plot.html in working directory
# link to it in Rmarkdown using [MA-plot](ma-plot.html)
htmlwidgets::saveWidget(glimmaMA(dds), "ma-plot.html")

Session Info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] DESeq2_1.47.3               SummarizedExperiment_1.37.0
#>  [3] Biobase_2.67.0              MatrixGenerics_1.19.1      
#>  [5] matrixStats_1.5.0           GenomicRanges_1.59.1       
#>  [7] GenomeInfoDb_1.43.4         IRanges_2.41.2             
#>  [9] S4Vectors_0.45.2            BiocGenerics_0.53.6        
#> [11] generics_0.1.3              edgeR_4.5.2                
#> [13] limma_3.63.3                Glimma_2.17.3              
#> [15] rmarkdown_2.29             
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.4.9              SparseArray_1.7.5       lattice_0.22-6         
#>  [4] magrittr_2.0.3          digest_0.6.37           evaluate_1.0.3         
#>  [7] grid_4.4.2              fastmap_1.2.0           jsonlite_1.8.9         
#> [10] Matrix_1.7-2            httr_1.4.7              UCSC.utils_1.3.1       
#> [13] scales_1.3.0            codetools_0.2-20        jquerylib_0.1.4        
#> [16] abind_1.4-8             cli_3.6.3               rlang_1.1.5            
#> [19] crayon_1.5.3            XVector_0.47.2          munsell_0.5.1          
#> [22] cachem_1.1.0            DelayedArray_0.33.5     yaml_2.3.10            
#> [25] S4Arrays_1.7.2          tools_4.4.2             parallel_4.4.2         
#> [28] BiocParallel_1.41.0     dplyr_1.1.4             colorspace_2.1-1       
#> [31] ggplot2_3.5.1           locfit_1.5-9.11         GenomeInfoDbData_1.2.13
#> [34] vctrs_0.6.5             buildtools_1.0.0        R6_2.5.1               
#> [37] lifecycle_1.0.4         htmlwidgets_1.6.4       pkgconfig_2.0.3        
#> [40] pillar_1.10.1           bslib_0.9.0             gtable_0.3.6           
#> [43] glue_1.8.0              Rcpp_1.0.14             statmod_1.5.0          
#> [46] tidyselect_1.2.1        tibble_3.2.1            xfun_0.50              
#> [49] sys_3.4.3               knitr_1.49              htmltools_0.5.8.1      
#> [52] maketools_1.3.1         compiler_4.4.2