The stable version of this package is available on Bioconductor. You can install it by running the following:
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("vidger")
The latest developmental version of
ViDGER
can be installed via GitHub using the
devtools
package:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("btmonier/vidger", ref = "devel")
Once installed, you will have access to the following functions:
vsBoxplot()
vsScatterPlot()
vsScatterMatrix()
vsDEGMatrix()
vsMAPlot()
vsMAMatrix()
vsVolcano()
vsVolcanoMatrix()
vsFourWay()
Further explanation will be given to how these functions work later
on in the documentation. For the following examples, three toy data sets
will be used: df.cuff
, df.deseq
, and
df.edger
. Each of these data sets reflect the three RNA-seq
analyses this package covers. These can be loaded in the R workspace by
using the following command:
data(<data_set>)
Where <data_set>
is one of the previously
mentioned data sets. Some of the recurring elements that are found in
each of these functions are the type
and
d.factor
arguments. The type
argument tells
the function how to process the data for each analytical type
(i.e. "cuffdiff"
, "deseq"
, or
"edger"
). The d.factor
argument is used
specifically for DESeq2
objects which we will discuss in
the DESeq2 section. All other arguments are discussed in further detail
by looking at the respective help file for each functions
(i.e. ?vsScatterPlot
).
As mentioned earlier, three toy data sets are included with this package. In addition to these data sets, 5 “real-world” data sets were also used. All real-world data used is currently unpublished from ongoing collaborations. Summaries of this data can be found in the following tables:
Table 1: An overview of the toy data sets included in this package. In this table, each data set is summarized in terms of what analytical software was used, organism ID, experimental layout (replicates and treatments), number of transcripts (IDs), and size of the data object in terms of megabytes (MB).
Data | Software | Organism | Reps | Treat. | IDs | Size (MB) |
---|---|---|---|---|---|---|
df.cuff |
CuffDiff | H | 2 | 3 | 1200 | 0.2 |
sapiens | ||||||
df.deseq |
DESeq2 | D. | 2 | 3 | 29391 | 2.3 |
melanogaster | ||||||
df.deseq |
edgeR | A. | 2 | 3 | 724 | 0.1 |
thaliana |
Table 2: “Real-world” (RW) data set statistics. To test the reliability of our package, real data was used from human collections and several plant samples. Each data set is summarized in terms of organism ID, number of experimental samples (n), experimental conditions, and number of transcripts ( IDs).
Data | Organism | n | Exp. Conditions | IDs |
---|---|---|---|---|
RW-1 | H. | 10 | Two treatment dosages taken at two | 198002 |
sapiens | time points and one control sample | |||
taken at one time point | ||||
RW-2 | M. | 24 | Two phenotypes taken at four time | 63517 |
domestia | points (three replicates each) | |||
RW-3 | V. | 6 | Two conditions (three replicates | 59262 |
ripria: | each). | |||
bud | ||||
RW-4 | V. | 6 | Two conditions (three replicates | 17962 |
ripria: | each). | |||
shoot-tip | ||||
(7 days) | ||||
RW-5 | V. | 6 | Two conditions (three replicates | 19064 |
ripria: | each). | |||
shoot-tip | ||||
(21 days) |
Box plots are a useful way to determine the distribution of data. In
this case we can determine the distribution of FPKM or CPM values by
using the vsBoxPlot()
function. This function allows you to
extract necessary results-based data from analytical objects to create a
box plot comparing log10
(FPKM or CPM) distributions for experimental treatments.
vsBoxPlot(
data = df.cuff, d.factor = NULL, type = 'cuffdiff', title = TRUE,
legend = TRUE, grid = TRUE
)
A box plot example using the vsBoxPlot()
function with
cuffdiff
data. In this example, FPKM distributions for each
treatment within an experiment are shown in the form of a box and
whisker plot.
vsBoxPlot(
data = df.deseq, d.factor = 'condition', type = 'deseq',
title = TRUE, legend = TRUE, grid = TRUE
)
A box plot example using the vsBoxPlot()
function with
DESeq2
data. In this example, FPKM distributions for each
treatment within an experiment are shown in the form of a box and
whisker plot.
vsBoxPlot(
data = df.edger, d.factor = NULL, type = 'edger',
title = TRUE, legend = TRUE, grid = TRUE
)
A box plot example using the vsBoxPlot()
function with
edgeR
data. In this example, CPM distributions for each
treatment within an experiment are shown in the form of a box and
whisker plot
vsBoxPlot()
can allow for different iterations to
showcase data distribution. These changes can be implemented using the
aes
parameter. Currently, there are 6 different
variants:
box
: standard box plotviolin
: violin plotboxdot
: box plot with dot plot overlayviodot
: violin plot with dot plot overlayviosumm
: violin plot with summary stats overlaynotch
: box plot with notchbox
variantdata("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "box"
)
A box plot example using the aes
parameter:
box
.
violin
variantdata("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "violin"
)
A box plot example using the aes
parameter:
violin
.
boxdot
variantdata("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "boxdot"
)
A box plot example using the aes
parameter:
boxdot
.
viodot
variantdata("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "viodot"
)
A box plot example using the aes
parameter:
viodot
.
In addition to aesthetic changes, the fill color of each variant can
also be changed. This can be implemented by modifiying the
fill.color
parameter.
The palettes that can be used for this parameter are based off of the
palettes found in the RColorBrewer
package.
A visual list of all the palettes can be found here.
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "box", fill.color = "RdGy"
)
Color variant 1. A box plot example using the fill.color
parameter: RdGy
.
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "viosumm", fill.color = "Paired"
)
Color variant 2. A violin plot example using the fill.color
parameter: Paired
with the aes
parameter:
viosumm
.
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "notch", fill.color = "Greys"
)
Color variant 3. A notched box plot example using the
fill.color
parameter: Greys
with the
aes
parameter: notch
. Using these parameters,
we can also generate grey-scale plots.
This example will look at a basic scatter plot function,
vsScatterPlot()
. This function allows you to visualize
comparisons of log10
values of either FPKM or CPM measurements of two treatments depending on
analytical type.
vsScatterPlot(
x = 'hESC', y = 'iPS', data = df.cuff, type = 'cuffdiff',
d.factor = NULL, title = TRUE, grid = TRUE
)
A scatterplot example using the vsScatterPlot()
function
with Cuffdiff
data. In this visualization, log10
comparisons are made of fragments per kilobase of transcript per million
mapped reads (FPKM) measurments. The dashed line represents regression
line for the comparison.
vsScatterPlot(
x = 'treated_paired.end', y = 'untreated_paired.end',
data = df.deseq, type = 'deseq', d.factor = 'condition',
title = TRUE, grid = TRUE
)
A scatterplot example using the vsScatterPlot()
function
with DESeq2
data. In this visualization, log10
comparisons are made of fragments per kilobase of transcript per million
mapped reads (FPKM) measurments. The dashed line represents regression
line for the comparison.
vsScatterPlot(
x = 'WM', y = 'MM', data = df.edger, type = 'edger',
d.factor = NULL, title = TRUE, grid = TRUE
)
A scatterplot example using the vsScatterPlot()
function
with edgeR
data. In this visualization, log10
comparisons are made of fragments per kilobase of transcript per million
mapped reads (FPKM) measurments. The dashed line represents regression
line for the comparison.
This example will look at an extension of the
vsScatterPlot()
function which is
vsScatterMatrix()
. This function will create a matrix of
all possible comparisons of treatments within an experiment with
additional info.
vsScatterMatrix(
data = df.cuff, d.factor = NULL, type = 'cuffdiff',
comp = NULL, title = TRUE, grid = TRUE, man.title = NULL
)
A scatterplot matrix example using the vsScatterMatrix()
function with Cuffdiff
data. Similar to the scatterplot
function, this visualization allows for all comparisons to be made
within an experiment. In addition to the scatterplot visuals, FPKM
distributions (histograms) and correlation (Corr) values are generated.
vsScatterMatrix(
data = df.deseq, d.factor = 'condition', type = 'deseq',
comp = NULL, title = TRUE, grid = TRUE, man.title = NULL
)
A scatterplot matrix example using the vsScatterMatrix()
function with DESeq2
data. Similar to the scatterplot
function, this visualization allows for all comparisons to be made
within an experiment. In addition to the scatterplot visuals, FPKM
distributions (histograms) and correlation (Corr) values are generated.
vsScatterMatrix(
data = df.edger, d.factor = NULL, type = 'edger', comp = NULL,
title = TRUE, grid = TRUE, man.title = NULL
)
A scatterplot matrix example using the vsScatterMatrix()
function with edgeR
data. Similar to the scatterplot
function, this visualization allows for all comparisons to be made
within an experiment. In addition to the scatterplot visuals, FPKM
distributions (histograms) and correlation (Corr) values are generated.
Using the vsDEGMatrix()
function allows the user to
visualize the number of differentially expressed genes (DEGs) at a given
adjusted p-value (padj =
) for each experimental
treatment level. Higher color intensity correlates to a higher number of
DEGs.
vsDEGMatrix(
data = df.cuff, padj = 0.05, d.factor = NULL, type = 'cuffdiff',
title = TRUE, legend = TRUE, grid = TRUE
)
A matrix of differentially expressed genes (DEGs) at a given
p-value using the vsDEGMatrix()
function with
Cuffdiff
data. With this function, the user is able to
visualize the number of DEGs ata given adjusted p-value for
each experimental treatment level. Higher color intensity correlates to
a higher number of DEGs.
vsDEGMatrix(
data = df.deseq, padj = 0.05, d.factor = 'condition',
type = 'deseq', title = TRUE, legend = TRUE, grid = TRUE
)
A matrix of differentially expressed genes (DEGs) at a given
p-value using the vsDEGMatrix()
function with
DESeq2
data. With this function, the user is able to
visualize the number of DEGs ata given adjusted p-value for
each experimental treatment level. Higher color intensity correlates to
a higher number of DEGs.
vsDEGMatrix(
data = df.edger, padj = 0.05, d.factor = NULL, type = 'edger',
title = TRUE, legend = TRUE, grid = TRUE
)
A matrix of differentially expressed genes (DEGs) at a given
p-value using the vsDEGMatrix()
function with
edgeR
data. With this function, the user is able to
visualize the number of DEGs ata given adjusted p-value for
each experimental treatment level. Higher color intensity correlates to
a higher number of DEGs.
A grey-scale option is available for this function if you wish to use
a grey-to-white gradient instead of the classic blue-to-white gradient.
This can be invoked by setting the grey.scale
parameter to
TRUE
.
vsMAPlot()
visualizes the variance between two samples
in terms of gene expression values where logarithmic fold changes of
count data are plotted against mean counts. For more information on how
each of the aesthetics are plotted, please refer to the figure captions
and Method S1.
vsMAPlot(
x = 'iPS', y = 'hESC', data = df.cuff, d.factor = NULL,
type = 'cuffdiff', padj = 0.05, y.lim = NULL, lfc = NULL,
title = TRUE, legend = TRUE, grid = TRUE
)
MA plot visualization using the vsMAPLot()
function with
Cuffdiff
data. LFCs are plotted mean counts to determine
the variance between two treatments in terms of gene expression. Blue
nodes on the graph represent statistically significant LFCs which are
greater than a given value than a user-defined LFC parameter. Green
nodes indicate statistically significant LFCs which are less than the
user-defined LFC parameter. Gray nodes are data points that are not
statistically significant. Numerical values in parantheses for each
legend color indicate the number of transcripts that meet the prior
conditions. Triangular shapes represent values that exceed the viewing
area of the graph. Node size changes represent the magnitude of the LFC
values (i.e. larger shapes reflect larger LFC values). Dashed lines
indicate user-defined LFC values.
vsMAPlot(
x = 'treated_paired.end', y = 'untreated_paired.end',
data = df.deseq, d.factor = 'condition', type = 'deseq',
padj = 0.05, y.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE
)
MA plot visualization using the vsMAPLot()
function with
DESeq2
data. LFCs are plotted mean counts to determine the
variance between two treatments in terms of gene expression. Blue nodes
on the graph represent statistically significant LFCs which are greater
than a given value than a user-defined LFC parameter. Green nodes
indicate statistically significant LFCs which are less than the
user-defined LFC parameter. Gray nodes are data points that are not
statistically significant. Numerical values in parantheses for each
legend color indicate the number of transcripts that meet the prior
conditions. Triangular shapes represent values that exceed the viewing
area of the graph. Node size changes represent the magnitude of the LFC
values (i.e. larger shapes reflect larger LFC values). Dashed lines
indicate user-defined LFC values.
vsMAPlot(
x = 'WW', y = 'MM', data = df.edger, d.factor = NULL,
type = 'edger', padj = 0.05, y.lim = NULL, lfc = NULL,
title = TRUE, legend = TRUE, grid = TRUE
)
MA plot visualization using the vsMAPLot()
function with
edgeR
data. LFCs are plotted mean counts to determine the
variance between two treatments in terms of gene expression. Blue nodes
on the graph represent statistically significant LFCs which are greater
than a given value than a user-defined LFC parameter. Green nodes
indicate statistically significant LFCs which are less than the
user-defined LFC parameter. Gray nodes are data points that are not
statistically significant. Numerical values in parantheses for each
legend color indicate the number of transcripts that meet the prior
conditions. Triangular shapes represent values that exceed the viewing
area of the graph. Node size changes represent the magnitude of the LFC
values (i.e. larger shapes reflect larger LFC values). Dashed lines
indicate user-defined LFC values.
Similar to a scatter plot matrix, vsMAMatrix()
will
produce visualizations for all comparisons within your data set. For
more information on how the aesthetics are plotted in these
visualizations, please refer to the figure caption and Method S1.
vsMAMatrix(
data = df.cuff, d.factor = NULL, type = 'cuffdiff',
padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE,
grid = TRUE, counts = TRUE, data.return = FALSE
)
A MA plot matrix using the vsMAMatrix()
function with
Cuffdiff
data. Similar to the vsMAPlot()
function, vsMAMatrix()
will generate a matrix of MA plots
for all comparisons within an experiment. LFCs are plotted mean counts
to determine the variance between two treatments in terms of gene
expression. Blue nodes on the graph represent statistically significant
LFCs which are greater than a given value than a user-defined LFC
parameter. Green nodes indicate statistically significant LFCs which are
less than the user-defined LFC parameter. Gray nodes are data points
that are not statistically significant. Numerical values in parantheses
for each legend color indicate the number of transcripts that meet the
prior conditions. Triangular shapes represent values that exceed the
viewing area of the graph. Node size changes represent the magnitude of
the LFC values (i.e. larger shapes reflect larger LFC values). Dashed
lines indicate user-defined LFC values.
vsMAMatrix(
data = df.deseq, d.factor = 'condition', type = 'deseq',
padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE,
grid = TRUE, counts = TRUE, data.return = FALSE
)
A MA plot matrix using the vsMAMatrix()
function with
DESeq2
data. Similar to the vsMAPlot()
function, vsMAMatrix()
will generate a matrix of MA plots
for all comparisons within an experiment. LFCs are plotted mean counts
to determine the variance between two treatments in terms of gene
expression. Blue nodes on the graph represent statistically significant
LFCs which are greater than a given value than a user-defined LFC
parameter. Green nodes indicate statistically significant LFCs which are
less than the user-defined LFC parameter. Gray nodes are data points
that are not statistically significant. Numerical values in parantheses
for each legend color indicate the number of transcripts that meet the
prior conditions. Triangular shapes represent values that exceed the
viewing area of the graph. Node size changes represent the magnitude of
the LFC values (i.e. larger shapes reflect larger LFC values). Dashed
lines indicate user-defined LFC values.
vsMAMatrix(
data = df.edger, d.factor = NULL, type = 'edger',
padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE,
grid = TRUE, counts = TRUE, data.return = FALSE
)
A MA plot matrix using the vsMAMatrix()
function with
edgeR
data. Similar to the vsMAPlot()
function, vsMAMatrix()
will generate a matrix of MA plots
for all comparisons within an experiment. LFCs are plotted mean counts
to determine the variance between two treatments in terms of gene
expression. Blue nodes on the graph represent statistically significant
LFCs which are greater than a given value than a user-defined LFC
parameter. Green nodes indicate statistically significant LFCs which are
less than the user-defined LFC parameter. Gray nodes are data points
that are not statistically significant. Numerical values in parantheses
for each legend color indicate the number of transcripts that meet the
prior conditions. Triangular shapes represent values that exceed the
viewing area of the graph. Node size changes represent the magnitude of
the LFC values (i.e. larger shapes reflect larger LFC values). Dashed
lines indicate user-defined LFC values.
The next few visualizations will focus on ways to display
differential gene expression between two or more treatments. Volcano
plots visualize the variance between two samples in terms of gene
expression values where the −log10
of calculated p-values (y-axis) are a plotted against the log2
changes (x-axis). These plots can be visualized with the
vsVolcano()
function. For more information on how each of
the aesthetics are plotted, please refer to the figure captions and
Method S1.
vsVolcano(
x = 'iPS', y = 'hESC', data = df.cuff, d.factor = NULL,
type = 'cuffdiff', padj = 0.05, x.lim = NULL, lfc = NULL,
title = TRUE, legend = TRUE, grid = TRUE, data.return = FALSE
)
A volcano plot example using the vsVolcano()
function with
Cuffdiff
data. In this visualization, comparisons are made
between the −log10
p-value versus the log2
fold change (LFC) between two treatments. Blue nodes on the graph
represent statistically significant LFCs which are greater than a given
value than a user-defined LFC parameter. Green nodes indicate
statistically significant LFCs which are less than the user-defined LFC
parameter. Gray nodes are data points that are not statistically
significant. Numerical values in parantheses for each legend color
indicate the number of transcripts that meet the prior conditions. Left
and right brackets (< and >) represent values that exceed the
viewing area of the graph. Node size changes represent the magnitude of
the LFC values (i.e. larger shapes reflect larger LFC values). Vertical
and horizontal lines indicate user-defined LFC and adjusted
p-values, respectively.
vsVolcano(
x = 'treated_paired.end', y = 'untreated_paired.end',
data = df.deseq, d.factor = 'condition', type = 'deseq',
padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE, data.return = FALSE
)
A volcano plot example using the vsVolcano()
function with
DESeq2
data. In this visualization, comparisons are made
between the −log10
p-value versus the log2
fold change (LFC) between two treatments. Blue nodes on the graph
represent statistically significant LFCs which are greater than a given
value than a user-defined LFC parameter. Green nodes indicate
statistically significant LFCs which are less than the user-defined LFC
parameter. Gray nodes are data points that are not statistically
significant. Numerical values in parantheses for each legend color
indicate the number of transcripts that meet the prior conditions. Left
and right brackets (< and >) represent values that exceed the
viewing area of the graph. Node size changes represent the magnitude of
the LFC values (i.e. larger shapes reflect larger LFC values). Vertical
and horizontal lines indicate user-defined LFC and adjusted
p-values, respectively.
vsVolcano(
x = 'WW', y = 'MM', data = df.edger, d.factor = NULL,
type = 'edger', padj = 0.05, x.lim = NULL, lfc = NULL,
title = TRUE, legend = TRUE, grid = TRUE, data.return = FALSE
)
A volcano plot example using the vsVolcano()
function with
edgeR
data. In this visualization, comparisons are made
between the −log10
p-value versus the log2
fold change (LFC) between two treatments. Blue nodes on the graph
represent statistically significant LFCs which are greater than a given
value than a user-defined LFC parameter. Green nodes indicate
statistically significant LFCs which are less than the user-defined LFC
parameter. Gray nodes are data points that are not statistically
significant. Numerical values in parantheses for each legend color
indicate the number of transcripts that meet the prior conditions. Left
and right brackets (< and >) represent values that exceed the
viewing area of the graph. Node size changes represent the magnitude of
the LFC values (i.e. larger shapes reflect larger LFC values). Vertical
and horizontal lines indicate user-defined LFC and adjusted
p-values, respectively.
Similar to the prior matrix functions, vsVolcanoMatrix()
will produce visualizations for all comparisons within your data set.
For more information on how the aesthetics are plotted in these
visualizations, please refer to the figure caption and Method S1.
vsVolcanoMatrix(
data = df.cuff, d.factor = NULL, type = 'cuffdiff',
padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE, counts = TRUE
)
A volcano plot matrix using the vsVolcanoMatrix()
function
with Cuffdiff
data. Similar to the vsVolcano()
function, vsVolcanoMatrix()
will generate a matrix of
volcano plots for all comparisons within an experiment. Comparisons are
made between the −log10
p-value versus the log2
fold change (LFC) between two treatments. Blue nodes on the graph
represent statistically significant LFCs which are greater than a given
value than a user-defined LFC parameter. Green nodes indicate
statistically significant LFCs which are less than the user-defined LFC
parameter. Gray nodes are data points that are not statistically
significant. The blue and green numbers in each facet represent the
number of transcripts that meet the criteria for blue and green nodes in
each comparison. Left and right brackets (< and >) represent
values that exceed the viewing area of the graph. Node size changes
represent the magnitude of the LFC values (i.e. larger shapes reflect
larger LFC values). Vertical and horizontal lines indicate user-defined
LFC and adjusted p-values, respectively.
vsVolcanoMatrix(
data = df.deseq, d.factor = 'condition', type = 'deseq',
padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE, counts = TRUE
)
A volcano plot matrix using the vsVolcanoMatrix()
function
with DESeq2
data. Similar to the vsVolcano()
function, vsVolcanoMatrix()
will generate a matrix of
volcano plots for all comparisons within an experiment. Comparisons are
made between the −log10
p-value versus the log2
fold change (LFC) between two treatments. Blue nodes on the graph
represent statistically significant LFCs which are greater than a given
value than a user-defined LFC parameter. Green nodes indicate
statistically significant LFCs which are less than the user-defined LFC
parameter. Gray nodes are data points that are not statistically
significant. The blue and green numbers in each facet represent the
number of transcripts that meet the criteria for blue and green nodes in
each comparison. Left and right brackets (< and >) represent
values that exceed the viewing area of the graph. Node size changes
represent the magnitude of the LFC values (i.e. larger shapes reflect
larger LFC values). Vertical and horizontal lines indicate user-defined
LFC and adjusted p-values, respectively.
vsVolcanoMatrix(
data = df.edger, d.factor = NULL, type = 'edger', padj = 0.05,
x.lim = NULL, lfc = NULL, title = TRUE, legend = TRUE,
grid = TRUE, counts = TRUE
)
A volcano plot matrix using the vsVolcanoMatrix()
function
with edgeR
data. Similar to the vsVolcano()
function, vsVolcanoMatrix()
will generate a matrix of
volcano plots for all comparisons within an experiment. Comparisons are
made between the −log10
p-value versus the log2
fold change (LFC) between two treatments. Blue nodes on the graph
represent statistically significant LFCs which are greater than a given
value than a user-defined LFC parameter. Green nodes indicate
statistically significant LFCs which are less than the user-defined LFC
parameter. Gray nodes are data points that are not statistically
significant. The blue and green numbers in each facet represent the
number of transcripts that meet the criteria for blue and green nodes in
each comparison. Left and right brackets (< and >) represent
values that exceed the viewing area of the graph. Node size changes
represent the magnitude of the LFC values (i.e. larger shapes reflect
larger LFC values). Vertical and horizontal lines indicate user-defined
LFC and adjusted p-values, respectively.
To create four-way plots, the function, vsFourWay()
is
used. This plot compares the log2
fold changes between two samples and a ‘control’. For more information
on how each of the aesthetics are plotted, please refer to the figure
captions and Method S1.
vsFourWay(
x = 'iPS', y = 'hESC', control = 'Fibroblasts', data = df.cuff,
d.factor = NULL, type = 'cuffdiff', padj = 0.05, x.lim = NULL,
y.lim = NULL, lfc = NULL, legend = TRUE, title = TRUE, grid = TRUE
)
A four way plot visualization using the vsFourWay()
function with Cuffdiff
data. In this example, LFCs
comparisons between two treatments and a control are made. Blue nodes
indicate statistically significant LFCs which are greater than a given
user-defined value for both x and y-axes. Green nodes reflect
statistically significant LFCs which are less than a user-defined value
for treatment y and greater than said value for treatment x. Similar to
green nodes, red nodes reflect statistically significant LFCs which are
greater than a user-defined vlaue treatment y and less than said value
for treatment x. Gray nodes are data points that are not statistically
significant for both x and y-axes. Triangular shapes indicate values
which exceed the viewing are for the graph. Size change reflects the
magnitude of LFC values ( i.e. larger shapes reflect larger LFC values).
Vertical and horizontal dashed lines indicate user-defined LFC values.
vsFourWay(
x = 'treated_paired.end', y = 'untreated_single.read',
control = 'untreated_paired.end', data = df.deseq,
d.factor = 'condition', type = 'deseq', padj = 0.05, x.lim = NULL,
y.lim = NULL, lfc = NULL, legend = TRUE, title = TRUE, grid = TRUE
)
A four way plot visualization using the vsFourWay()
function with DESeq2
data. In this example, LFCs
comparisons between two treatments and a control are made. Blue nodes
indicate statistically significant LFCs which are greater than a given
user-defined value for both x and y-axes. Green nodes reflect
statistically significant LFCs which are less than a user-defined value
for treatment y and greater than said value for treatment x. Similar to
green nodes, red nodes reflect statistically significant LFCs which are
greater than a user-defined vlaue treatment y and less than said value
for treatment x. Gray nodes are data points that are not statistically
significant for both x and y-axes. Triangular shapes indicate values
which exceed the viewing are for the graph. Size change reflects the
magnitude of LFC values ( i.e. larger shapes reflect larger LFC values).
Vertical and horizontal dashed lines indicate user-defined LFC values.
vsFourWay(
x = 'WW', y = 'WM', control = 'MM', data = df.edger,
d.factor = NULL, type = 'edger', padj = 0.05, x.lim = NULL,
y.lim = NULL, lfc = NULL, legend = TRUE, title = TRUE, grid = TRUE
)
A four way plot visualization using the vsFourWay()
function with DESeq2
data. In this example, LFCs
comparisons between two treatments and a control are made. Blue nodes
indicate statistically significant LFCs which are greater than a given
user-defined value for both x and y-axes. Green nodes reflect
statistically significant LFCs which are less than a user-defined value
for treatment y and greater than said value for treatment x. Similar to
green nodes, red nodes reflect statistically significant LFCs which are
greater than a user-defined vlaue treatment y and less than said value
for treatment x. Gray nodes are data points that are not statistically
significant for both x and y-axes. Triangular shapes indicate values
which exceed the viewing are for the graph. Size change reflects the
magnitude of LFC values ( i.e. larger shapes reflect larger LFC values).
Vertical and horizontal dashed lines indicate user-defined LFC values.
For point-based plots, users can highlight IDs of interest (i.e. genes, transcripts, etc.). Currently, this functionality is implemented in the following functions:
vsScatterPlot()
vsMAPlot()
vsVolcano()
vsFourWay()
To use this feature, simply provide a vector of specified IDs to the
highlight
parameter found in the prior functions. An
example of a typical vector would be as follows:
## [1] "ID_001" "ID_002" "ID_003" "ID_004" "ID_005"
For specific examples using the toy data set, please see the proceeding 4 sub-sections.
vsScatterPlot()
data("df.cuff")
hl <- c(
"XLOC_000033",
"XLOC_000099",
"XLOC_001414",
"XLOC_001409"
)
vsScatterPlot(
x = "hESC", y = "iPS", data = df.cuff, d.factor = NULL,
type = "cuffdiff", title = TRUE, grid = TRUE, highlight = hl
)
Highlighting with vsScatterPlot()
. IDs of interest can be
identified within basic scatter plots. When highlighted, non-important
points will turn grey while highlighted points will turn blue. Text tags
will try to optimize their location within the graph without
trying to overlap each other.
vsMAPlot()
hl <- c(
"FBgn0022201",
"FBgn0003042",
"FBgn0031957",
"FBgn0033853",
"FBgn0003371"
)
vsMAPlot(
x = "treated_paired.end", y = "untreated_paired.end",
data = df.deseq, d.factor = "condition", type = "deseq",
padj = 0.05, y.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE, data.return = FALSE, highlight = hl
)
Highlighting with vsMAPlot()
. IDs of interest can be
identified within MA plots. When highlighted, non-important points will
decrease in transparency (i.e. lower alpha values) while highlighted
points will turn red. Text tags will try to optimize their
location within the graph without trying to overlap each other.
vsVolcano()
hl <- c(
"FBgn0036248",
"FBgn0026573",
"FBgn0259742",
"FBgn0038961",
"FBgn0038928"
)
vsVolcano(
x = "treated_paired.end", y = "untreated_paired.end",
data = df.deseq, d.factor = "condition",
type = "deseq", padj = 0.05, x.lim = NULL, lfc = NULL,
title = TRUE, grid = TRUE, data.return = FALSE, highlight = hl
)
Highlighting with vsVolcano()
. IDs of interest can be
identified within volcano plots. When highlighted, non-important points
will decrease in transparency (i.e. lower alpha values) while
highlighted points will turn red. Text tags will try to
optimize their location within the graph without trying to overlap each
other.
vsFourWay()
data("df.edger")
hl <- c(
"ID_639",
"ID_518",
"ID_602",
"ID_449",
"ID_076"
)
vsFourWay(
x = "WM", y = "WW", control = "MM", data = df.edger,
d.factor = NULL, type = "edger", padj = 0.05, x.lim = NULL,
y.lim = NULL, lfc = 2, title = TRUE, grid = TRUE,
data.return = FALSE, highlight = hl
)
Highlighting with vsFourWay()
. IDs of interest can be
identified within four-way plots. When highlighted, non-important points
will decrease in transparency (i.e. lower alpha values) while
highlighted points will turn dark grey. Text tags will try to
optimize their location within the graph without trying to overlap each
other.
For all plots, users can extract datasets used for the visualizations. You may want to pursue this option if you want to use a highly customized plot script or you would like to perform some unmentioned analysis, for example.
To use this this feature, set the data.return
parameter
in the function you are using to TRUE
. You will also need
to assign the function to an object. See the following example for
further details.
In this example, we will use the toy data set df.cuff
, a
cuffdiff output on the function vsScatterPlot()
. Take note
that we are assigning the function to an object tmp
:
# Extract data frame from visualization
data("df.cuff")
tmp <- vsScatterPlot(
x = "hESC", y = "iPS", data = df.cuff, d.factor = NULL,
type = "cuffdiff", title = TRUE, grid = TRUE, data.return = TRUE
)
The object we have created is a list with two elements:
data
and plot
. To extract the data, we can
call the first element of the list using the subset method
(<object>[[1]]
) or by invoking its element name
(<object>$data
):
## id x y
## 1 XLOC_000001 3.47386e-01 20.21750
## 2 XLOC_000002 0.00000e+00 0.00000
## 3 XLOC_000003 0.00000e+00 0.00000
## 4 XLOC_000004 6.97259e+05 0.00000
## 5 XLOC_000005 6.96704e+02 355.82300
## 6 XLOC_000006 0.00000e+00 1.51396
For all functions, users can modify the font size of multiple portions of the plot. These portions primarily revolve around these components:
To manipulate these components, users can modify the default values of the following parameters:
xaxis.text.size
yaxis.text.size
xaxis.title.size
yaxis.title.size
main.title.size
legend.text.size
legend.title.size
facet.title.size
Each of parameters mentioned in the prior section refer to numerical values. These values correlate to font size in typographic points. To illustrate what exactly these parameters modify, please refer to the following figure:
A visual guide to text size parameters. Users can modify these components which are highlighted by their respective parameter.
The facet.title.size
parameter refers to the facets
which are allocated in the matrix functions
(vsScatterMatrix()
, vsMAMatrix()
,
vsVolcanoMatrix()
). This is illustrated in the following
figure:
Location of facet titles. Facet title sizes can be modified using the
facet.title.size
parameter.
Since not all functions are equal in their parameters and component layout, some functions will either have or lack some of the prior parameters. To get an idea of which have functions have which, please refer to the following figure:
An overview of text size parameters for each function. Cells highlighted in red refer to parameters (columns) which are found in their respective functions (rows). Cells which are grey indicate parameters which are not found in each of the functions.
The shape and size of each data point will also change depending on several conditions. To maximize the viewing area while retaining high resolution, some data points will not be present within the viewing area. If they exceed the viewing area, they will change shape from a circle to a triangular orientation.
The extent (i.e. fold change) to how far these points exceed the viewing area are based on the following criteria:
To further clarify theses conditions, please refer to the following figure:
An illustration detailing the principles behind the node size for the differntial gene expression functions. In this figure, the data points increase in size depending on which quartile they reside as the absolute LFC increases (top bar). Data points that fall within the viewing area classified as SUB while data points that exceed this area are classified as T-1 through T-4.
Function efficiencies were determined by calculating system times by
using the microbenchmark
R package. Each function was ran
100 times with the prior code used in the documentation. All benchmarks
were determined on a machine running a 64-bit Windows 10 operating
system, 8 GB of RAM, and an Intel Core i5-6400 processor running at 2.7
GHz.
Benchmarks for the vsScatterPlot()
function. Time (ms)
distributions were generated for this function using 100 trials for each
of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example
data sets contained 1200, 724, and 29391 transcripts, respectively.
Benchmarks for the vsScatterMatrix()
function. Time (ms)
distributions were generated for this function using 100 trials for each
of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example
data sets contained 1200, 724, and 29391 transcripts, respectively.
Benchmarks for the vsBoxPlot()
function. Time (ms)
distributions were generated for this function using 100 trials for each
of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example
data sets contained 1200, 724, and 29391 transcripts, respectively.
Benchmarks for the vsDEGMatrix()
function. Time (ms)
distributions were generated for this function using 100 trials for each
of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example
data sets contained 1200, 724, and 29391 transcripts, respectively.
Benchmarks for the vsVolcano()
function. Time (ms)
distributions were generated for this function using 100 trials for each
of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example
data sets contained 1200, 724, and 29391 transcripts, respectively.
Benchmarks for the vsVolcanoMatrix()
function. Time (ms)
distributions were generated for this function using 100 trials for each
of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example
data sets contained 1200, 724, and 29391 transcripts, respectively.
Benchmarks for the vsMAPlot()
function. Time (ms)
distributions were generated for this function using 100 trials for each
of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example
data sets contained 1200, 724, and 29391 transcripts, respectively.
Benchmarks for the vsMAMatrix()
function. Time (s)
distributions were generated for this function using 100 trials for each
of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example
data sets contained 1200, 724, and 29391 transcripts, respectively.
Benchmarks for the vsFourWay()
function. Time (ms)
distributions were generated for this function using 100 trials for each
of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example
data sets contained 1200, 724, and 29391 transcripts, respectively.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] edgeR_4.5.2 limma_3.63.3
## [3] DESeq2_1.47.1 SummarizedExperiment_1.37.0
## [5] Biobase_2.67.0 MatrixGenerics_1.19.1
## [7] matrixStats_1.5.0 GenomicRanges_1.59.1
## [9] GenomeInfoDb_1.43.4 IRanges_2.41.2
## [11] S4Vectors_0.45.2 BiocGenerics_0.53.6
## [13] generics_0.1.3 vidger_1.27.0
## [15] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.50 bslib_0.8.0
## [4] ggplot2_3.5.1 ggrepel_0.9.6 GGally_2.2.1
## [7] lattice_0.22-6 vctrs_0.6.5 tools_4.4.2
## [10] parallel_4.4.2 tibble_3.2.1 pkgconfig_2.0.3
## [13] Matrix_1.7-2 RColorBrewer_1.1-3 lifecycle_1.0.4
## [16] GenomeInfoDbData_1.2.13 farver_2.1.2 compiler_4.4.2
## [19] statmod_1.5.0 munsell_0.5.1 codetools_0.2-20
## [22] htmltools_0.5.8.1 sys_3.4.3 buildtools_1.0.0
## [25] sass_0.4.9 yaml_2.3.10 tidyr_1.3.1
## [28] pillar_1.10.1 crayon_1.5.3 jquerylib_0.1.4
## [31] BiocParallel_1.41.0 DelayedArray_0.33.4 cachem_1.1.0
## [34] abind_1.4-8 ggstats_0.8.0 tidyselect_1.2.1
## [37] locfit_1.5-9.10 digest_0.6.37 purrr_1.0.2
## [40] dplyr_1.1.4 labeling_0.4.3 maketools_1.3.1
## [43] fastmap_1.2.0 grid_4.4.2 colorspace_2.1-1
## [46] cli_3.6.3 SparseArray_1.7.4 magrittr_2.0.3
## [49] S4Arrays_1.7.1 withr_3.0.2 scales_1.3.0
## [52] UCSC.utils_1.3.1 rmarkdown_2.29 XVector_0.47.2
## [55] httr_1.4.7 evaluate_1.0.3 knitr_1.49
## [58] rlang_1.1.5 Rcpp_1.0.14 glue_1.8.0
## [61] BiocManager_1.30.25 jsonlite_1.8.9 R6_2.5.1
## [64] plyr_1.8.9