Volcano plots represent a useful way to visualise the results of differential expression analyses. Here, we present a highly-configurable function that produces publication-ready volcano plots. EnhancedVolcano (Blighe, Rana, and Lewis 2018) will attempt to fit as many labels in the plot window as possible, thus avoiding ‘clogging’ up the plot with labels that could not otherwise have been read. Other functionality allows the user to identify up to 5 different types of attributes in the same plot space via colour, shape, size, encircling, and shade parameter configurations.
For this example, we will follow the tutorial in Section 3.1 of the Bioconductor workflow 'RNA-seq workflow: gene-level exploratory analysis and differential expression'. Specifically, we will load the 'airway' data, in which airway smooth muscle cell lines were either treated with dexamethasone or left untreated.
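Load the 'airway' data and annotate the Ensembl gene IDs with gene symbols:
library(airway)
data('airway')
# make 'untrt' (untreated) the reference level of the treatment factor
airway$dex <- relevel(airway$dex, 'untrt')
# map Ensembl gene IDs to gene symbols
ens <- rownames(airway)
library(org.Hs.eg.db)
symbols <- mapIds(org.Hs.eg.db, keys = ens,
  column = c('SYMBOL'), keytype = 'ENSEMBL')
symbols <- symbols[!is.na(symbols)]
# re-align symbols to the original row order (IDs without a symbol become NA)
symbols <- symbols[match(rownames(airway), names(symbols))]
rownames(airway) <- symbols
# drop rows that have no gene symbol
keep <- !is.na(rownames(airway))
airway <- airway[keep,]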
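The annotation steps rely on match() to re-align the symbols to the original row order, and then drop IDs that lack a symbol. The base-R sketch below reproduces that logic on a toy named vector standing in for the output of mapIds(); the IDs and symbols are made up for illustration. It also shows one optional addition that is not part of the workflow above: several Ensembl IDs can map to the same symbol, and make.unique() is one way to keep row names distinct if that matters downstream.

```r
# Toy stand-in for mapIds() output: names are Ensembl-style IDs, values are
# gene symbols, NA where no symbol exists (all values are made up).
ens <- c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D")
symbols <- c(ENSG_A = "GENE1", ENSG_B = NA, ENSG_C = "GENE2", ENSG_D = "GENE2")

# Drop missing symbols, then re-align to the original row order with match();
# IDs whose symbol was dropped come back as NA.
symbols <- symbols[!is.na(symbols)]
aligned <- unname(symbols[match(ens, names(symbols))])
print(aligned)                 # "GENE1" NA "GENE2" "GENE2"

# Keep only rows that received a symbol, as in the workflow above
keep <- !is.na(aligned)
new_names <- aligned[keep]
print(new_names)               # "GENE1" "GENE2" "GENE2"

# Optional extra (not in the original workflow): make duplicated symbols unique
print(make.unique(new_names))  # "GENE1" "GENE2" "GENE2.1"
```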
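Conduct differential expression analysis with DESeq2, comparing dexamethasone-treated with untreated samples while accounting for cell line, and then shrink the log2 fold-change estimates with lfcShrink():
library('DESeq2')
dds <- DESeqDataSet(airway, design = ~ cell + dex)
dds <- DESeq(dds, betaPrior=FALSE)
# Wald test results for treated versus untreated
res <- results(dds,
  contrast = c('dex','trt','untrt'))
# replace the log2 fold changes with shrunken estimates ('normal' prior)
res <- lfcShrink(dds,
  contrast = c('dex','trt','untrt'), res=res, type = 'normal')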
For the most basic volcano plot, only a single data-frame, data-matrix, or tibble of test results is required, containing point labels, log2FC, and adjusted or unadjusted P values, e.g., EnhancedVolcano(res, lab = rownames(res), x = 'log2FoldChange', y = 'pvalue'). The default log2FC cut-off is 2 (i.e., |log2FC| > 2); the default P value cut-off is 10e-6, which in R notation equals 1e-5.
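Because R reads 10e-6 as 10 × 10^-6, the default P value cut-off is 1e-05 rather than 1e-06, and the same applies to every pCutoff in this vignette (10e-32 is 1e-31, and so on). The base-R sketch below checks this and computes where each cut-off would sit on the -log10(P) y-axis, assuming that the horizontal cut-off line is drawn at -log10(pCutoff).

```r
# In R, 10e-6 means 10 * 10^-6, which equals 1e-05 (not 1e-06).
p_default <- 10e-6
print(p_default)                              # 1e-05
print(isTRUE(all.equal(p_default, 1e-5)))     # TRUE

# Assumed position of each cut-off line on a -log10(P) axis.
# These are the pCutoff values used in this vignette.
cutoffs <- c(10e-6, 10e-12, 10e-14, 10e-16, 10e-32)
line_y <- -log10(cutoffs)
print(round(line_y, 6))                       # 5 11 13 15 31
```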
Virtually all aspects of an EnhancedVolcano plot can be configured to accommodate all types of statistical distributions and labelling preferences. By default, EnhancedVolcano will only attempt to label genes that pass the thresholds that you set for statistical significance, i.e., 'pCutoff' and 'FCcutoff'. In addition, it will only label as many of these as can reasonably fit in the plot space. The user can optionally supply a vector of labels (as 'selectLab') that they wish to label in the plot.
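To make the thresholding concrete, the sketch below assigns each point of a made-up results table to one of four categories (neither threshold passed, log2FC only, P value only, both), which are the categories that the colour and shape settings later in this vignette map onto. classify_points() is a hypothetical helper, not part of EnhancedVolcano; the use of strict inequalities (|log2FC| > FCcutoff, P < pCutoff) and the treatment of NA as failing a threshold are assumptions. Its defaults mirror the values quoted above.

```r
# Hypothetical helper: four-way classification of points by the two cut-offs.
classify_points <- function(log2fc, p, FCcutoff = 2, pCutoff = 10e-6) {
  fc_pass <- !is.na(log2fc) & abs(log2fc) > FCcutoff  # NA fails the test
  p_pass  <- !is.na(p) & p < pCutoff
  ifelse(fc_pass & p_pass, "FC and P",
         ifelse(p_pass, "P only",
                ifelse(fc_pass, "FC only", "NS")))
}

# Toy results table (made-up numbers, for illustration only)
toy <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(3.1, -0.4, 2.5, -2.8, NA),
  pvalue = c(1e-8, 1e-9, 0.2, 0.01, 1e-3),
  stringsAsFactors = FALSE
)
toy$category <- classify_points(toy$log2FoldChange, toy$pvalue)

# By default, only points passing both thresholds are candidates for labels
label_candidates <- toy$gene[toy$category == "FC and P"]
print(toy)
print(label_candidates)   # "g1"
```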
The default P value cut-off of 10e-6 may be too relaxed for many studies, which may therefore necessitate making it more stringent by lowering it by a few orders of magnitude (e.g., to 10e-32, as below). Equally, the log2FC cut-off may be too stringent, given that moderated ('shrunk') estimates of log2FC differences in differential expression analysis can now be calculated; because shrinkage pulls these estimates towards zero, a lower 'FCcutoff' (e.g., 0.5) may be more appropriate.
In this example, we also modify the point and label sizes, which can improve clarity when many variables were tested in the differential expression analysis.
library(EnhancedVolcano)
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'N061011 versus N61311',
pCutoff = 10e-32,
FCcutoff = 0.5,
pointSize = 3.0,
labSize = 6.0)
The default colour scheme may not be to everyone's taste. Here, only the variables passing both the log2FC and P value thresholds are coloured red, with everything else black. We also adjust 'colAlpha', which controls the transparency of the plotted points: 1 = 100% opaque; 0 = 100% transparent.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'N061011 versus N61311',
pCutoff = 10e-16,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 6.0,
col=c('black', 'black', 'black', 'red3'),
colAlpha = 1)
It can help, visually, to also plot different points as different shapes. The default shape is a circle. The user can specify their own shape encoding via the 'shape' parameter, which accepts either a single value or a vector of four values: with four values, the shapes map, in order, to the same four categories that the colours encode (not significant, log2FC only, P value only, and both P value and log2FC); with a single value, all points take that shape.
For more information on shape encoding, see the ggplot2 Quick Reference page on 'shape'.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'N061011 versus N61311',
pCutoff = 10e-16,
FCcutoff = 1.5,
pointSize = 4.0,
labSize = 6.0,
shape = 8,
colAlpha = 1)
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'N061011 versus N61311',
pCutoff = 10e-16,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 6.0,
shape = c(1, 4, 23, 25),
colAlpha = 1)
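The single-versus-four-value rule for 'shape' can be illustrated with a small base-R sketch. expand_to_categories() is a hypothetical helper, not the package's code; it assumes the four values follow the category order of the legend labels used later in this vignette (not significant, log2FC only, P value only, both).

```r
# Category order assumed to match the legend order
categories <- c("NS", "FC only", "P only", "FC and P")

# Hypothetical helper: one value is recycled to every point; four values are
# looked up by each point's category.
expand_to_categories <- function(values, point_category) {
  if (length(values) == 1) {
    return(rep(values, length(point_category)))
  }
  stopifnot(length(values) == 4)
  lookup <- setNames(values, categories)
  unname(lookup[point_category])
}

pts <- c("NS", "FC and P", "P only", "NS")
print(expand_to_categories(8, pts))                # 8 8 8 8
print(expand_to_categories(c(1, 4, 23, 25), pts))  # 1 25 23 1
```

The same rule, applied to colours, explains why col = c('black', 'black', 'black', 'red3') above colours only the points passing both thresholds.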
The lines that are drawn to indicate cut-off points are also modifiable. The parameter ‘cutoffLineType’ accepts the following values: “blank”, “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, and “twodash”. The colour and thickness of these can also be modified with ‘cutoffLineCol’ and ‘cutoffLineWidth’. To disable the lines, set either cutoffLineType=“blank” or cutoffLineWidth=0.
Extra lines can also be added via ‘hline’ and ‘vline’ to display other cut-offs.
To make these more visible, we will also remove the default gridlines.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
xlim = c(-6, 6),
title = 'N061011 versus N61311',
pCutoff = 10e-12,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 6.0,
colAlpha = 1,
cutoffLineType = 'blank',
cutoffLineCol = 'black',
cutoffLineWidth = 0.8,
hline = c(10e-20,
10e-20 * 10e-30,
10e-20 * 10e-60,
10e-20 * 10e-90),
hlineCol = c('pink', 'hotpink', 'purple', 'black'),
hlineType = c('solid', 'longdash', 'dotdash', 'dotted'),
hlineWidth = c(1.0, 1.5, 2.0, 2.5),
gridlines.major = FALSE,
gridlines.minor = FALSE)
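The 'hline' values in the example above are given on the P value scale. Since 10e-20 is 1e-19 and 10e-30 is 1e-29, their product is 1e-48, and so on. Assuming these lines are placed at -log10(value) on the y-axis, as for the cut-off line, the sketch below computes where they land.

```r
# hline values from the example above, on the P value scale
hline <- c(10e-20,
           10e-20 * 10e-30,
           10e-20 * 10e-60,
           10e-20 * 10e-90)
print(hline)                     # 1e-19 1e-48 1e-78 1e-108

# Assumed y-axis positions: -log10(value)
y_positions <- -log10(hline)
print(round(y_positions, 6))     # 19 48 78 108
```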
The position of the legend can also be changed to 'left' or 'right' (stacked vertically), or 'top' or 'bottom' (stacked horizontally). The legend text, label size, and icon size can also be modified via 'legendLabels', 'legendLabSize', and 'legendIconSize', respectively.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
pCutoff = 10e-12,
FCcutoff = 1.5,
cutoffLineType = 'twodash',
cutoffLineWidth = 0.8,
pointSize = 4.0,
labSize = 6.0,
colAlpha = 1,
legendLabels=c('Not sig.','Log (base 2) FC','p-value',
'p-value & Log (base 2) FC'),
legendPosition = 'right',
legendLabSize = 16,
legendIconSize = 5.0)
Note: to make the legend completely invisible, specify legendPosition = 'none'.
To maximise free space in the plot window, one can fit more labels by adding connectors from labels to points, where appropriate. The width and colour of these connectors can be modified with 'widthConnectors' and 'colConnectors', respectively. Further configuration is achievable via 'typeConnectors' (“open”, “closed”), 'endsConnectors' (“last”, “first”, “both”), and 'lengthConnectors' (default = unit(0.01, 'npc')).
The result may not always be desirable, as it can make the plot look overcrowded.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
xlab = bquote(~Log[2]~ 'fold change'),
pCutoff = 10e-32,
FCcutoff = 2.0,
pointSize = 4.0,
labSize = 6.0,
colAlpha = 1,
legendPosition = 'right',
legendLabSize = 12,
legendIconSize = 4.0,
drawConnectors = TRUE,
widthConnectors = 0.75)
In many situations, people may only wish to label their key variables / variables of interest. One can therefore supply a vector of these variables via the 'selectLab' parameter; its contents must also be present in the vector passed to 'lab'.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = c('TMEM176B','ADH1A'),
xlab = bquote(~Log[2]~ 'fold change'),
pCutoff = 10e-14,
FCcutoff = 2.0,
pointSize = 4.0,
labSize = 6.0,
shape = c(4, 35, 17, 18),
colAlpha = 1,
legendPosition = 'right',
legendLabSize = 14,
legendIconSize = 5.0)
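Because 'selectLab' entries that are absent from 'lab' cannot be labelled, a quick check before plotting can catch typos. The base-R sketch below uses toy vectors (with real data, 'lab' would be rownames(res)); note that setdiff() compares strings exactly, so a difference in case counts as a mismatch.

```r
# Toy vectors; with real data, lab <- rownames(res)
lab <- c("TMEM176B", "ADH1A", "VCAM1", "DUSP1")
selectLab <- c("TMEM176B", "ADH1A", "ADH1a")  # last entry has a case typo

# Entries requested for labelling that do not occur in 'lab'
missing_labels <- setdiff(selectLab, lab)
if (length(missing_labels) > 0) {
  message("Not found in 'lab': ", paste(missing_labels, collapse = ", "))
}
print(missing_labels)  # "ADH1a"
```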
To improve label clarity, we can draw simple boxes around the plot’s labels via boxedLabels. This works much better when drawConnectors is also TRUE.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = c('VCAM1','KCTD12','ADAM12',
'CXCL12','CACNB2','SPARCL1','DUSP1','SAMHD1','MAOA'),
xlab = bquote(~Log[2]~ 'fold change'),
pCutoff = 10e-14,
FCcutoff = 2.0,
pointSize = 4.0,
labSize = 6.0,
labCol = 'black',
labFace = 'bold',
boxedLabels = TRUE,
colAlpha = 4/5,
legendPosition = 'right',
legendLabSize = 14,
legendIconSize = 4.0,
drawConnectors = TRUE,
widthConnectors = 1.0,
colConnectors = 'black')
To make the labels italic, we can create a new vector in which we encode the labels as follows: italic(‘[LABEL]’). By then setting parseLabels = TRUE, these will be parsed by the internal ggplot2 or ggrepel engine and presented as italicised text. Advanced users can encode any expression as the label, which will then also be parsed.
To flip the volcano on its side, we just use EnhancedVolcano(…) + coord_flip().
lab_italics <- paste0("italic('", rownames(res), "')")
selectLab_italics <- paste0(
"italic('",
c('VCAM1','KCTD12','ADAM12', 'CXCL12','CACNB2','SPARCL1','DUSP1','SAMHD1','MAOA'),
"')")
EnhancedVolcano(res,
lab = lab_italics,
x = 'log2FoldChange',
y = 'pvalue',
selectLab = selectLab_italics,
xlab = bquote(~Log[2]~ 'fold change'),
pCutoff = 10e-14,
FCcutoff = 1.0,
pointSize = 3.0,
labSize = 6.0,
labCol = 'black',
labFace = 'bold',
boxedLabels = TRUE,
parseLabels = TRUE,
col = c('black', 'pink', 'purple', 'red3'),
colAlpha = 4/5,
legendPosition = 'bottom',
legendLabSize = 14,
legendIconSize = 4.0,
drawConnectors = TRUE,
widthConnectors = 1.0,
colConnectors = 'black') + coord_flip()
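With parseLabels = TRUE, each label string is treated as an R plotmath expression. Assuming it is parsed the way base R's parse(text = ...) would parse it, the paste0("italic('", ..., "')") wrapping used above fails for any label that itself contains a single quote. The sketch below checks which labels parse and shows deparse() as one way to quote labels safely; the gene names (including the apostrophe-containing one) are hypothetical, and parses_ok() is an illustrative helper.

```r
genes <- c("VCAM1", "HLA-A", "5'UTR")   # hypothetical labels

# Illustrative helper: TRUE where a string parses as an R expression
parses_ok <- function(labels) {
  vapply(labels, function(s) {
    !inherits(tryCatch(parse(text = s), error = function(e) e), "error")
  }, logical(1), USE.NAMES = FALSE)
}

# Wrapping used in the example above: breaks on embedded single quotes
naive <- paste0("italic('", genes, "')")
# deparse() wraps each name in properly escaped double quotes instead
safer <- paste0("italic(", vapply(genes, deparse, character(1)), ")")

print(parses_ok(naive))  # TRUE TRUE FALSE
print(parses_ok(safer))  # TRUE TRUE TRUE
print(safer)
```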
In certain situations, one may wish to override the default colour scheme with a custom one, such as colouring variables by pathway, cell type, or group. This can be achieved by supplying a named vector as 'colCustom'.
In this example, we simply colour all variables with log2FC > 2.5 as 'high' and those with log2FC < -2.5 as 'low'.
# create custom key-value pairs for 'high', 'low', 'mid' expression by fold-change
# this can be achieved with nested ifelse statements
keyvals <- ifelse(
res$log2FoldChange < -2.5, 'royalblue',
ifelse(res$log2FoldChange > 2.5, 'gold',
'black'))
keyvals[is.na(keyvals)] <- 'black'
names(keyvals)[keyvals == 'gold'] <- 'high'
names(keyvals)[keyvals == 'black'] <- 'mid'
names(keyvals)[keyvals == 'royalblue'] <- 'low'
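The vector passed to 'colCustom' has one entry per row of the results table, in the same order, with colours as values and group labels (here 'high', 'mid', and 'low', which appear in the legend) as names. As an alternative sketch to the nested ifelse() calls above, the code below assigns groups first and then looks colours up in a named palette; the toy fold changes are made up, and with real data lfc would be res$log2FoldChange.

```r
lfc <- c(-3.1, 0.2, NA, 2.7, -1.0)   # toy values

# Assign groups with the same strict thresholds; NA falls into 'mid'
group <- rep("mid", length(lfc))
group[!is.na(lfc) & lfc < -2.5] <- "low"
group[!is.na(lfc) & lfc >  2.5] <- "high"

# Indexing a named palette by group name keeps the group names as names
palette <- c(low = "royalblue", mid = "black", high = "gold")
keyvals_alt <- palette[group]
print(keyvals_alt)
```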
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = rownames(res)[which(names(keyvals) %in% c('high', 'low'))],
xlab = bquote(~Log[2]~ 'fold change'),
title = 'Custom colour over-ride',
pCutoff = 10e-14,
FCcutoff = 1.0,
pointSize = 3.5,
labSize = 4.5,
shape = c(6, 4, 2, 11),
colCustom = keyvals,
colAlpha = 1,
legendPosition = 'left',
legendLabSize = 15,
legendIconSize = 5.0,
drawConnectors = TRUE,
widthConnectors = 1.0,
colConnectors = 'black',
arrowheads = FALSE,
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'partial',
borderWidth = 1.5,
borderColour = 'black')
In this example, we first override the existing shape scheme, and then override both the colour and shape schemes at the same time.
# define the genes for each cell type; these will be given custom shapes
celltype1 <- c('VCAM1','KCTD12','ADAM12','CXCL12')
celltype2 <- c('CACNB2','SPARCL1','DUSP1','SAMHD1','MAOA')
# create custom key-value pairs for different cell-types
# this can be achieved with nested ifelse statements
keyvals.shape <- ifelse(
rownames(res) %in% celltype1, 17,
ifelse(rownames(res) %in% celltype2, 64,
3))
keyvals.shape[is.na(keyvals.shape)] <- 3
names(keyvals.shape)[keyvals.shape == 3] <- 'PBMC'
names(keyvals.shape)[keyvals.shape == 17] <- 'Cell-type 1'
names(keyvals.shape)[keyvals.shape == 64] <- 'Cell-type 2'
p1 <- EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = rownames(res)[which(names(keyvals) %in% c('high', 'low'))],
xlab = bquote(~Log[2]~ 'fold change'),
title = 'Custom shape over-ride',
pCutoff = 10e-14,
FCcutoff = 1.0,
pointSize = 4.5,
labSize = 4.5,
shapeCustom = keyvals.shape,
colCustom = NULL,
colAlpha = 1,
legendLabSize = 15,
legendPosition = 'left',
legendIconSize = 5.0,
drawConnectors = TRUE,
widthConnectors = 0.5,
colConnectors = 'grey50',
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'partial',
borderWidth = 1.5,
borderColour = 'black')
# create custom key-value pairs for 'high', 'low', 'mid' expression by fold-change
# this can be achieved with nested ifelse statements
keyvals.colour <- ifelse(
res$log2FoldChange < -2.5, 'royalblue',
ifelse(res$log2FoldChange > 2.5, 'gold',
'black'))
keyvals.colour[is.na(keyvals.colour)] <- 'black'
names(keyvals.colour)[keyvals.colour == 'gold'] <- 'high'
names(keyvals.colour)[keyvals.colour == 'black'] <- 'mid'
names(keyvals.colour)[keyvals.colour == 'royalblue'] <- 'low'
p2 <- EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = rownames(res)[which(names(keyvals) %in% c('High', 'Low'))],
xlab = bquote(~Log[2]~ 'fold change'),
title = 'Custom shape & colour over-ride',
pCutoff = 10e-14,
FCcutoff = 1.0,
pointSize = 5.5,
labSize = 0.0,
shapeCustom = keyvals.shape,
colCustom = keyvals.colour,
colAlpha = 1,
legendPosition = 'right',
legendLabSize = 15,
legendIconSize = 5.0,
drawConnectors = TRUE,
widthConnectors = 0.5,
colConnectors = 'grey50',
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'full',
borderWidth = 1.0,
borderColour = 'black')
library(gridExtra)
library(grid)
grid.arrange(p1, p2,
ncol=2,
top = textGrob('EnhancedVolcano',
just = c('center'),
gp = gpar(fontsize = 32)))
In this example, we add an extra level of identification of key variables by encircling or shading them.
This feature works best for highlighting just 1 or 2 key variables. For more in-depth identification of different types of variables, the 'shapeCustom' parameter is expected to be more suitable.
The encircling feature relies on the package ggalt being installed; below, encircling is only requested if ggalt is available.
# define the genes to encircle (celltype1) and to shade (celltype2)
celltype1 <- c('VCAM1','CXCL12')
celltype2 <- c('SORT1', 'KLF15')
# TRUE if ggalt is installed (its namespace is loaded as a side effect)
has_ggalt <- requireNamespace("ggalt", quietly = TRUE)
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = c(if (has_ggalt) celltype1 else NULL, celltype2),
xlab = bquote(~Log[2]~ 'fold change'),
title = 'Shading cell-type 1|2',
pCutoff = 10e-14,
FCcutoff = 1.0,
pointSize = 8.0,
labSize = 6.0,
labCol = 'black',
labFace = 'bold',
boxedLabels = TRUE,
shape = 42,
colCustom = keyvals,
colAlpha = 1,
legendPosition = 'right',
legendLabSize = 20,
legendIconSize = 20.0,
# encircle
encircle = if (has_ggalt) celltype1 else NULL,
encircleCol = 'black',
encircleSize = 2.5,
encircleFill = 'pink',
encircleAlpha = 1/2,
# shade
shade = celltype2,
shadeAlpha = 1/2,
shadeFill = 'skyblue',
shadeSize = 1,
shadeBins = 5,
drawConnectors = TRUE,
widthConnectors = 2.0,
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'full',
borderWidth = 5,
borderColour = 'black')
One can also supply a vector of sizes to 'pointSize' to give each point its own size. To illustrate this, we switch to the pasilla dataset and run a standard DESeq2 analysis; we then enlarge just those variables with log2FC > 2:
library("pasilla")
pasCts <- system.file("extdata", "pasilla_gene_counts.tsv",
package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata", "pasilla_sample_annotation.csv",
package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]
rownames(coldata) <- sub("fb", "", rownames(coldata))
cts <- cts[, rownames(coldata)]
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = coldata,
design = ~ condition)
featureData <- data.frame(gene=rownames(cts))
mcols(dds) <- DataFrame(mcols(dds), featureData)
dds <- DESeq(dds)
res <- results(dds)
p1 <- EnhancedVolcano(res,
lab = rownames(res),
x = "log2FoldChange",
y = "pvalue",
pCutoff = 10e-4,
FCcutoff = 2,
ylim = c(0, -log10(10e-12)),
pointSize = c(ifelse(res$log2FoldChange>2, 8, 1)),
labSize = 6.0,
shape = c(6, 6, 19, 16),
title = "DESeq2 results",
subtitle = "Differential expression",
caption = bquote(~Log[2]~ "fold change cutoff, 2; p-value cutoff, 10e-4"),
legendPosition = "right",
legendLabSize = 14,
col = c("grey30", "forestgreen", "royalblue", "red2"),
colAlpha = 0.9,
drawConnectors = TRUE,
hline = c(10e-8),
widthConnectors = 0.5)
p1
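'pointSize' takes one value per row here. DESeq2 reports NA log2 fold changes for some genes (for example, those with all-zero counts), and ifelse() propagates NA, so the size vector used above can contain NA entries. The sketch below contrasts that with an NA-safe version on toy values; with real data, lfc would be res$log2FoldChange.

```r
lfc <- c(2.5, 0.3, NA, -3.0)   # toy values

size_naive <- ifelse(lfc > 2, 8, 1)                # NA stays NA
size_safe  <- ifelse(!is.na(lfc) & lfc > 2, 8, 1)  # NA treated as "not > 2"

print(size_naive)  # 8 1 NA 1
print(size_safe)   # 8 1 1 1
```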
We can override the default 'discrete' colour scheme with a continuous one, via 'colGradient', that shades between 2 colours based on the nominal or adjusted P value (whichever is passed to 'y'):
p1 <- EnhancedVolcano(res,
lab = rownames(res),
x = "log2FoldChange",
y = "pvalue",
pCutoff = 10e-4,
FCcutoff = 2,
ylim = c(0, -log10(10e-12)),
pointSize = c(ifelse(res$log2FoldChange>2, 8, 1)),
labSize = 6.0,
shape = c(6, 6, 19, 16),
title = "DESeq2 results",
subtitle = "Differential expression",
caption = bquote(~Log[2]~ "fold change cutoff, 2; p-value cutoff, 10e-4"),
legendPosition = "right",
legendLabSize = 14,
colAlpha = 0.9,
colGradient = c('red3', 'royalblue'),
drawConnectors = TRUE,
hline = c(10e-8),
widthConnectors = 0.5)
p1
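For intuition about what a two-colour continuous scale does, the sketch below interpolates between 'red3' and 'royalblue' according to a rescaled -log10(P). It is not the package's implementation: the linear rescaling and the direction (least significant point red3, most significant royalblue) are assumptions made for illustration, and the P values are made up.

```r
p <- c(1e-1, 1e-4, 1e-8, 1e-12)   # toy P values
score <- -log10(p)
scaled <- (score - min(score)) / (max(score) - min(score))  # rescale to [0, 1]

# colorRamp() returns a function mapping [0, 1] to RGB rows (0-255 scale)
ramp <- grDevices::colorRamp(c("red3", "royalblue"))
rgb_mat <- ramp(scaled)
point_cols <- grDevices::rgb(rgb_mat, maxColorValue = 255)
print(point_cols)   # first is red3, last is royalblue
```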
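Because EnhancedVolcano returns a ggplot object (which is why + coord_flip() works above), custom axis ticks can be added in a 'plug and play' fashion with standard ggplot2 functions such as coord_cartesian() and scale_x_continuous().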
More information on this can be found here: http://www.sthda.com/english/wiki/ggplot2-axis-ticks-a-guide-to-customize-tick-marks-and-labels
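A minimal sketch of such axis customisation, using a toy ggplot2 scatter plot in place of an EnhancedVolcano object (the data and the column name neglog10p are made up). The same '+' layers should apply to a saved plot such as p1 above.

```r
library(ggplot2)

toy <- data.frame(log2FoldChange = c(-4.2, -1.1, 0.3, 2.6, 5.1),
                  neglog10p = c(8, 1.5, 0.4, 6, 10))

p <- ggplot(toy, aes(x = log2FoldChange, y = neglog10p)) +
  geom_point() +
  coord_cartesian(xlim = c(-6, 6)) +            # zoom without dropping points
  scale_x_continuous(breaks = seq(-6, 6, 1))    # a tick at every integer

print(p)   # draw the plot
```

Applied to the pasilla plot, this would read p1 + coord_cartesian(xlim = c(-6, 6)) + scale_x_continuous(breaks = seq(-6, 6, 1)); if the plot already defines an x scale, ggplot2 may report that it is being replaced.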
The development of EnhancedVolcano has benefited from contributions and suggestions from:
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] pasilla_1.33.0 DEXSeq_1.51.0
## [3] RColorBrewer_1.1-3 BiocParallel_1.41.0
## [5] gridExtra_2.3 DESeq2_1.45.3
## [7] org.Hs.eg.db_3.20.0 AnnotationDbi_1.69.0
## [9] magrittr_2.0.3 airway_1.25.0
## [11] SummarizedExperiment_1.35.5 Biobase_2.67.0
## [13] GenomicRanges_1.57.2 GenomeInfoDb_1.41.2
## [15] IRanges_2.39.2 S4Vectors_0.43.2
## [17] BiocGenerics_0.53.0 MatrixGenerics_1.17.1
## [19] matrixStats_1.4.1 EnhancedVolcano_1.25.0
## [21] ggrepel_0.9.6 ggplot2_3.5.1
## [23] knitr_1.48
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 bitops_1.0-9 httr2_1.0.5
## [4] biomaRt_2.63.0 rlang_1.1.4 compiler_4.4.1
## [7] RSQLite_2.3.7 png_0.1-8 vctrs_0.6.5
## [10] maps_3.4.2 stringr_1.5.1 pkgconfig_2.0.3
## [13] crayon_1.5.3 fastmap_1.2.0 dbplyr_2.5.0
## [16] XVector_0.45.0 labeling_0.4.3 utf8_1.2.4
## [19] Rsamtools_2.21.2 rmarkdown_2.28 UCSC.utils_1.1.0
## [22] bit_4.5.0 xfun_0.48 zlibbioc_1.51.2
## [25] cachem_1.1.0 ash_1.0-15 jsonlite_1.8.9
## [28] progress_1.2.3 blob_1.2.4 highr_0.11
## [31] DelayedArray_0.31.14 parallel_4.4.1 prettyunits_1.2.0
## [34] R6_2.5.1 bslib_0.8.0 stringi_1.8.4
## [37] genefilter_1.87.0 extrafontdb_1.0 jquerylib_0.1.4
## [40] Rcpp_1.0.13 extrafont_0.19 splines_4.4.1
## [43] Matrix_1.7-1 tidyselect_1.2.1 abind_1.4-8
## [46] yaml_2.3.10 codetools_0.2-20 hwriter_1.3.2.1
## [49] curl_5.2.3 lattice_0.22-6 tibble_3.2.1
## [52] withr_3.0.2 KEGGREST_1.45.1 evaluate_1.0.1
## [55] survival_3.7-0 BiocFileCache_2.15.0 isoband_0.2.7
## [58] xml2_1.3.6 Biostrings_2.75.0 filelock_1.0.3
## [61] pillar_1.9.0 KernSmooth_2.23-24 generics_0.1.3
## [64] hms_1.1.3 munsell_0.5.1 scales_1.3.0
## [67] xtable_1.8-4 glue_1.8.0 maketools_1.3.1
## [70] tools_4.4.1 sys_3.4.3 annotate_1.85.0
## [73] locfit_1.5-9.10 XML_3.99-0.17 buildtools_1.0.0
## [76] Rttf2pt1_1.3.12 colorspace_2.1-1 GenomeInfoDbData_1.2.13
## [79] ggalt_0.4.0 cli_3.6.3 rappdirs_0.3.3
## [82] proj4_1.0-14 fansi_1.0.6 S4Arrays_1.5.11
## [85] dplyr_1.1.4 gtable_0.3.6 sass_0.4.9
## [88] digest_0.6.37 SparseArray_1.5.45 geneplotter_1.83.0
## [91] farver_2.1.2 memoise_2.0.1 htmltools_0.5.8.1
## [94] lifecycle_1.0.4 httr_1.4.7 statmod_1.5.0
## [97] bit64_4.5.2 MASS_7.3-61
Blighe, Rana, and Lewis (2018)