Title: | User Friendly Single-Cell and Bulk RNA Sequencing Visualization |
---|---|
Description: | A universal, user friendly, single-cell and bulk RNA sequencing visualization toolkit that allows highly customizable creation of color blindness friendly, publication-quality figures. dittoSeq accepts both SingleCellExperiment (SCE) and Seurat objects, as well as the import and usage, via conversion to an SCE, of SummarizedExperiment or DGEList bulk data. Visualizations include dimensionality reduction plots, heatmaps, scatterplots, percent composition or expression across groups, and more. Customizations range from size and title adjustments to automatic generation of annotations for heatmaps, overlay of trajectory analysis onto any dimensionality reduction plot, hidden data overlay upon cursor hovering via ggplotly conversion, and many more. All with simple, discrete inputs. Color blindness friendliness is powered by legend adjustments (enlarged keys), and by allowing the use of shapes or letter-overlay in addition to the carefully selected dittoColors(). |
Authors: | Daniel Bunis [aut, cre], Jared Andrews [aut, ctb] |
Maintainer: | Daniel Bunis <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.19.0 |
Built: | 2024-12-02 06:19:10 UTC |
Source: | https://github.com/bioc/dittoSeq |
Add any dimensionality reduction space to a SingleCellExperiment object containing bulk or single-cell data
addDimReduction(object, embeddings, name, key = .gen_key(name))
object |
the bulk or single-cell SingleCellExperiment object to target |
embeddings |
a numeric matrix or matrix-like object, with number of rows equal to ncol(object), containing the coordinates of all cells / samples within the dimensionality reduction space. |
name |
String name for the reduction slot. Example: "pca".
This will become the name of the slot, and what should be provided to the reduction.use input when making a dittoDimPlot. |
key |
String, like "PC", which sets the default axes-label prefix when this reduction is used for making a dittoDimPlot |
Outputs a SingleCellExperiment object with an added or replaced dimensionality reduction slot.
Daniel Bunis
addPrcomp for a prcomp specific PCA import wrapper
importDittoBulk for initial import of bulk RNAseq data into dittoSeq as a SingleCellExperiment.
dittoDimPlot for visualizing how samples group within added dimensionality reduction spaces
example("importDittoBulk", echo = FALSE)

# Calculate PCA
# NOTE: This is typically not done with all genes in the dataset.
# The inclusion of this example code is not an endorsement of a particular
# method of PCA. Consult yourself, a bioinformatician, or literature for
# tips on proper techniques.
embeds <- prcomp(t(logcounts(myRNA)), center = TRUE, scale = TRUE)$x

myRNA <- addDimReduction(
    object = myRNA,
    embeddings = embeds,
    name = "pca",
    key = "PC")

# Visualize conditions metadata on a PCA plot
dittoDimPlot(myRNA, "conditions", reduction.use = "pca", size = 3)
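Because addDimReduction accepts any numeric matrix with one row per cell/sample, the embeddings need not come from PCA. The following is a minimal sketch, using only base R and mock data, of preparing a non-PCA embedding and checking the row-count requirement stated for the embeddings input. Classical multidimensional scaling (stats::cmdscale) is used purely for illustration, and the names expr and mds are hypothetical.

```r
# Mock expression matrix: 50 genes (rows) x 12 samples (columns).
# In a real workflow this might be, for example, logcounts(myRNA).
set.seed(1)
expr <- matrix(
    rnorm(50 * 12), nrow = 50,
    dimnames = list(paste0("gene", 1:50), paste0("sample", 1:12)))

# Classical MDS on between-sample distances: one row per sample, 2 columns.
mds <- cmdscale(dist(t(expr)), k = 2)

# The requirement for 'embeddings': as many rows as the object has columns.
stopifnot(is.numeric(mds), nrow(mds) == ncol(expr))
dim(mds)
```

A matrix like mds could then be supplied as embeddings, for example with name = "mds" and key = "MDS".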
Add a prcomp pca calculation to a SingleCellExperiment object containing bulk or single-cell data
addPrcomp(object, prcomp, name = "pca", key = "PC")
object |
the bulk or single-cell SingleCellExperiment object to target |
prcomp |
a prcomp output which will be added to the object |
name |
String name for the reduction slot.
Normally, this will be "pca", but you can hold any number of PCA calculations so long as a unique name is given to each. |
key |
String, like "PC", which sets the default axes-label prefix when this reduction is used for making a dittoDimPlot |
Outputs a SingleCellExperiment object with an added or replaced pca reduction slot.
Daniel Bunis
addDimReduction for adding other types of dimensionality reductions
importDittoBulk for initial import of bulk RNAseq data into dittoSeq as a SingleCellExperiment.
dittoDimPlot for visualizing how samples group within added dimensionality reduction spaces
example("importDittoBulk", echo = FALSE)

# Calculate PCA with prcomp
# NOTE: This is typically not done with all genes in a dataset.
# The inclusion of this example code is not an endorsement of a particular
# method of PCA. Consult yourself, a bioinformatician, or literature for
# tips on proper techniques.
calc <- prcomp(t(logcounts(myRNA)), center = TRUE, scale = TRUE)

myRNA <- addPrcomp(
    object = myRNA,
    prcomp = calc)

# Now we can visualize conditions metadata on a PCA plot
dittoDimPlot(myRNA, "conditions", reduction.use = "pca", size = 3)
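To make the link between the prcomp input and the name and key defaults more concrete, here is a small base R sketch on mock data (the object names expr and calc are hypothetical). It shows the parts of a prcomp output that matter here: the x element holds the per-sample coordinates, which is what the addDimReduction example passes as embeddings, and its columns carry the "PC" prefix that matches the default key.

```r
# Mock expression matrix: 30 genes (rows) x 8 samples (columns).
set.seed(42)
expr <- matrix(
    rnorm(30 * 8), nrow = 30,
    dimnames = list(paste0("gene", 1:30), paste0("sample", 1:8)))

# prcomp expects samples as rows, hence the transpose.
calc <- prcomp(t(expr), center = TRUE, scale. = TRUE)

class(calc)               # a "prcomp" object
dim(calc$x)               # one row per sample, one column per component
head(colnames(calc$x), 2) # component names start with "PC"
```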
A wrapper for the darken function of the colorspace package.
Darken(colors, percent.change = 0.25, relative = TRUE)
colors |
the color(s) input. Can be a list of colors, for example, dittoColors(). |
percent.change |
Number between 0 and 1. The percentage (expressed as a fraction) to darken by. Defaults to 0.25 if not given. |
relative |
TRUE/FALSE. Whether the percentage should be a relative change versus an absolute one. Default = TRUE. |
Returns a darkened version of the color(s) in hexadecimal color form ("#RRGGBB", base 16).
Daniel Bunis
Darken("blue") # "blue" = "#0000FF"
# Output: "#0000BF"
Darken(dittoColors()[1:8]) # Works for multiple color inputs as well.
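Since Darken is described as a wrapper for the darken function of the colorspace package, the mapping of its inputs can be sketched as below. This is a hypothetical stand-in, not the package's own code: the function name darken_sketch, the choice of the "HLS" color space, and the translation of relative into colorspace's method argument are all assumptions made for illustration.

```r
# Hypothetical stand-in for the wrapper (assumed argument mapping).
darken_sketch <- function(colors, percent.change = 0.25, relative = TRUE) {
    colorspace::darken(
        colors,
        amount = percent.change,
        method = if (relative) "relative" else "absolute",
        space = "HLS") # assumed color space
}

# Only run when the colorspace package is installed.
if (requireNamespace("colorspace", quietly = TRUE)) {
    darkened <- darken_sketch(c("blue", "#E69F00"))
    print(darkened)
}
```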
Plots the number of annotations per sample, per lane
demux.calls.summary( object, singlets.only = FALSE, main = "Sample Annotations by Lane", sub = NULL, ylab = "Annotations", xlab = "Sample", color = dittoColors()[2], theme = NULL, rotate.labels = TRUE, data.out = FALSE )
object |
A Seurat or SingleCellExperiment object |
singlets.only |
Whether to only show data for cells called as singlets by demuxlet. Default is FALSE. Note: if doublets are included, only one of their sample calls will be used. |
main |
plot title. Default = "Sample Annotations by Lane" |
sub |
plot subtitle |
ylab |
y axis label, default is "Annotations" |
xlab |
x axis label, default is "Sample" |
color |
bars color. Default is the dittoColors skyBlue. |
theme |
A complete ggplot theme. Default is a slightly modified theme_bw(). |
rotate.labels |
whether sample names / x-axis labels should be rotated or not. Default is TRUE. |
data.out |
Logical, whether underlying data for the plot should be output instead of the plot itself. |
A faceted ggplot summarizing how many cells in each lane were annotated to each sample.
Assumes that the Sample calls of each cell, and which lane each cell belonged to, are stored in 'Sample' and 'Lane' metadata slots, respectively, as would be the case if demuxlet information was imported with importDemux.
Alternatively, value will be a data.frame containing the underlying data if data.out = TRUE is provided.
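The kind of summary being plotted, the number of cells annotated to each sample within each lane, can be sketched in base R as follows. The metadata values are invented for illustration, and the exact layout of the data.frame returned with data.out = TRUE may differ from this sketch.

```r
# Mock per-cell metadata using the 'Sample' and 'Lane' slot names described
# above; the values themselves are made up.
meta <- data.frame(
    Sample = c("sample1", "sample1", "sample2", "sample2", "sample2", "sample3"),
    Lane = c("Lane1", "Lane2", "Lane1", "Lane1", "Lane2", "Lane2"))

# Count cells per Sample within each Lane.
calls <- as.data.frame(
    table(Sample = meta$Sample, Lane = meta$Lane),
    responseName = "Annotations")
calls
```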
Daniel Bunis
demux.SNP.summary for plotting the number of SNPs measured per cell. This is the other Demuxlet-associated QC visualization included with dittoSeq.
importDemux, for how to import relevant demuxlet information as metadata.
Kang et al. Nature Biotechnology, 2018 https://www.nature.com/articles/nbt.4042 for more information about the demuxlet cell-sample deconvolution method.
example(importDemux, echo = FALSE)

demux.calls.summary(myRNA)

# Exclude doublets by setting 'singlets.only = TRUE'
demux.calls.summary(myRNA,
    singlets.only = TRUE)

# To return the underlying data.frame
demux.calls.summary(myRNA, data.out = TRUE)
Plots the number of SNPs sequenced per droplet
demux.SNP.summary( object, group.by = "Lane", color.by = group.by, plots = c("jitter", "boxplot"), boxplot.color = "grey30", add.line = 50, min = 0, ... )
object |
A Seurat or SingleCellExperiment object |
group.by |
String "name" of a metadata to use for grouping values. Default is "Lane". |
color.by |
String "name" of a metadata to use for coloring.
Default is whatever was provided to group.by. |
plots |
String vector which sets the types of plots to include: possibilities = "jitter", "boxplot", "vlnplot", "ridgeplot". NOTE: The order matters, so use c("back","middle","front") when inputting multiple to put them in the order you want. |
boxplot.color |
The color of the lines of the boxplot. |
add.line |
numeric value(s) where a dashed horizontal line should go. Default = 50, a high confidence minimum number of SNPs per cell for highly accurate demuxlet sample deconvolution. |
min |
numeric value which sets the minimum value shown on the y-axis. |
... |
extra arguments passed to dittoPlot |
This function is a wrapper that essentially runs dittoPlot("demux.N.SNP") with a few modified defaults.
The altered defaults:
Data is grouped and colored by the "Lane" metadata (unless group.by or color.by are adjusted otherwise).
Data is displayed as boxplots with gray lines on top of dots for individual cells (unless plots or boxplot.color are adjusted otherwise).
The plot is set to have minimum y axis value of zero (unless min is adjusted otherwise).
A dashed line is added at the value 50, a very conservative minimum number of SNPs for high confidence sample calls (unless add.line is adjusted otherwise).
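The altered defaults listed above can be written out as an explicit set of dittoPlot inputs. The sketch below collects them in a plain list (the name altered_defaults is hypothetical) and only attempts the plotting call when dittoSeq is installed and an object named myRNA, as created by the importDemux example, exists. It is meant as an approximation of what the wrapper does, not a copy of its internals.

```r
# The altered defaults, taken from the usage signature, as a plain list.
altered_defaults <- list(
    var = "demux.N.SNP",
    group.by = "Lane",
    color.by = "Lane", # color.by defaults to group.by
    plots = c("jitter", "boxplot"),
    boxplot.color = "grey30",
    add.line = 50,
    min = 0)

# Roughly equivalent direct call, attempted only when it can work.
if (requireNamespace("dittoSeq", quietly = TRUE) && exists("myRNA")) {
    p <- do.call(
        dittoSeq::dittoPlot,
        c(list(object = myRNA), altered_defaults))
}
```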
A ggplot, made with dittoPlot, showing a summary of how many SNPs were available to Demuxlet for each cell of a dataset.
Alternatively, a plotly object if do.hover = TRUE is provided.
Alternatively, a list containing a ggplot and the underlying data as a dataframe if data.out = TRUE is provided.
Daniel Bunis
demux.calls.summary for plotting the number of sample annotations assigned within each lane. This is the other Demuxlet-associated QC visualization included with dittoSeq.
dittoPlot, as demux.SNP.summary is essentially just a dittoPlot wrapper.
importDemux, for how to import relevant demuxlet information as metadata.
Kang et al. Nature Biotechnology, 2018 https://www.nature.com/articles/nbt.4042 for more information about the demuxlet cell-sample deconvolution method.
example(importDemux, echo = FALSE)

demux.SNP.summary(myRNA)

# Function wraps dittoPlot. See dittoPlot docs for more examples
A dataframe containing mock demuxlet information for the 80-cell Seurat::pbmc_small dataset
demuxlet.example
An object of class data.frame with 80 rows and 7 columns.
This data was created based on the structure of real demuxlet.best output files.
Barcodes from Seurat's pbmc_small example data were used as the BARCODES column.
Cells were then assigned randomly as either SNG (singlets), DBL (doublets), or AMB (ambiguous).
Cells were then randomly assigned to sample1-10 (or multiple samples for doublets), and this information was combined using the paste function into the typical structure of a demuxlet CALL column.
Random sampling of remaining data from a separate, actual, demuxlet dataset was used for remaining columns.
A dataframe
This is a slightly simplified example. Real demuxlet.best data has additional columns.
Daniel Bunis
Outputs a stacked bar plot to show the percent composition of samples, groups, clusters, or other groupings
dittoBarPlot( object, var, group.by, scale = c("percent", "count"), split.by = NULL, cells.use = NULL, retain.factor.levels = FALSE, data.out = FALSE, do.hover = FALSE, color.panel = dittoColors(), colors = seq_along(color.panel), split.nrow = NULL, split.ncol = NULL, split.adjust = list(), y.breaks = NA, min = 0, max = NULL, var.labels.rename = NULL, var.labels.reorder = NULL, x.labels = NULL, x.labels.rotate = TRUE, x.reorder = NULL, theme = theme_classic(), xlab = group.by, ylab = "make", main = "make", sub = NULL, legend.show = TRUE, legend.title = NULL )
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
var |
String name of a metadata that contains discrete data, or a factor or vector containing such data for all cells/samples in the target object |
group.by |
String name of a metadata to use for separating the cells/samples into discrete groups. |
scale |
"count" or "percent". Sets whether data should be shown as counts versus percentage. |
split.by |
1 or 2 strings naming discrete metadata to use for splitting the cells/samples into multiple plots with ggplot faceting. When 2 metadatas are named, c(row,col), the first is used as rows and the second is used for columns of the resulting grid. When 1 metadata is named, shape control can be achieved with split.nrow and split.ncol |
cells.use |
String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. Note: When |
retain.factor.levels |
Logical which controls whether factor identities of |
data.out |
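Logical. When set to TRUE, the output becomes a list containing the plot ("p") and a dataframe of the underlying data ("data"). |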
do.hover |
Logical which sets whether the ggplot output should be converted to a ggplotly object with data about individual bars displayed when you hover your cursor over them. |
color.panel |
String vector which sets the colors to draw from. |
colors |
Integer vector, which sets the indexes / order, of colors from color.panel to actually use.
(Provides an alternative to directly modifying color.panel.) |
split.nrow, split.ncol |
Integers which set the dimensions of faceting/splitting when a single metadata is given to split.by |
split.adjust |
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 metadata to |
y.breaks |
Numeric vector which sets the plot's tick marks / major gridlines. c(break1,break2,break3,etc.) |
min, max |
Scalars which control the zoom of the plot.
These inputs set the minimum / maximum values of the y-axis.
Default = set based on the limits of the data, 0 to 1 for |
var.labels.rename |
String vector for renaming the distinct identities of Hint: use |
var.labels.reorder |
Integer vector. A sequence of numbers, from 1 to the number of distinct Method: Make a first plot without this input.
Then, treating the top-most grouping as index 1, and the bottom-most as index n.
Values of |
x.labels |
String vector which will replace the x-axis groupings' labels.
Regardless of |
x.labels.rotate |
Logical which sets whether the x-axis grouping labels should be rotated. |
x.reorder |
Integer vector. A sequence of numbers, from 1 to the number of groupings, for rearranging the order of x-axis groupings. Method: Make a first plot without this input.
Then, treating the leftmost grouping as index 1, and the rightmost as index n.
Values of Recommendation for advanced users: If you find yourself coming back to this input too many times, an alternative solution that can be easier long-term
is to make the target data into a factor, and to put its levels in the desired order: |
theme |
A ggplot theme which will be applied before dittoSeq adjustments.
Default = theme_classic() |
xlab |
String which sets the x-axis title.
Default is group.by |
ylab |
String which sets the y-axis title. Default = "make" and if left as make, a title will be automatically generated. |
main |
String, sets the plot title |
sub |
String, sets the plot subtitle |
legend.show |
Logical which sets whether the legend should be displayed. |
legend.title |
String which adds a title to the legend. |
The function creates a dataframe containing counts and percent makeup of var identities for each x-axis grouping (determined by the group.by input).
If a set of cells/samples to use is indicated with the cells.use input, only those cells/samples are used for counts and percent makeup calculations.
Then, a vertical bar plot is generated (ggplot2::geom_col()) showing either percent makeup if scale = "percent", which is the default, or raw counts if scale = "count".
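The counts and percent makeup described above can be illustrated with a short base R sketch on mock metadata. The column names clustering and groups mirror the example data used on this page, while the values and the names counts, percents, and summary_df are hypothetical. The dataframe that dittoBarPlot builds internally may be laid out differently.

```r
# Mock per-cell metadata: a discrete 'var' (clustering) and a grouping (groups).
meta <- data.frame(
    clustering = c("1", "1", "2", "3", "1", "2", "2", "3"),
    groups = c("A", "A", "A", "A", "B", "B", "B", "B"))

# Counts of each var identity within each x-axis grouping.
counts <- table(var = meta$clustering, grouping = meta$groups)

# Fraction of each grouping made up by each var identity (columns sum to 1).
percents <- prop.table(counts, margin = 2)

# Long-format summary, one row per var identity and grouping.
summary_df <- merge(
    as.data.frame(counts, responseName = "count"),
    as.data.frame(percents, responseName = "percent"))
summary_df
```

Stacking the percent column by grouping corresponds to scale = "percent", and stacking the count column corresponds to scale = "count".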
A ggplot plot where discrete data, grouped by sample, condition, cluster, etc. on the x-axis, is shown on the y-axis as either counts or percent-of-total-per-grouping in a stacked barplot.
Alternatively, if data.out = TRUE, a list containing the plot ("p") and a dataframe of the underlying data ("data").
Alternatively, if do.hover = TRUE, a plotly conversion of the ggplot output in which underlying data can be retrieved upon hovering the cursor over the plot.
Colors can be adjusted with color.panel and/or colors.
y-axis zoom and tick marks can be adjusted using min, max, and y.breaks.
Titles can be adjusted with main, sub, xlab, ylab, and legend.title arguments.
The legend can be removed by setting legend.show = FALSE.
x-axis labels and groupings can be changed / reordered using x.labels and x.reorder, and rotation of these labels can be turned off with x.labels.rotate = FALSE.
y-axis var-group labels and their order can be changed / reordered using var.labels.rename and var.labels.reorder.
Daniel Bunis
dittoFreqPlot for a data representation that focuses on per-sample frequencies of each of the var-data values individually, rather than emphasizing total makeup of samples/groups.
example(importDittoBulk, echo = FALSE)
myRNA

dittoBarPlot(myRNA, "clustering", group.by = "groups")
dittoBarPlot(myRNA, "clustering", group.by = "groups",
    scale = "count")

# Reordering the x-axis groupings to have "C" (#3) come first
dittoBarPlot(myRNA, "clustering", group.by = "groups",
    x.reorder = c(3,1,2,4))

### Accessing underlying data:
# as dataframe
dittoBarPlot(myRNA, "clustering", group.by = "groups",
    data.out = TRUE)
# through hovering the cursor over the relevant parts of the plot
if (requireNamespace("plotly", quietly = TRUE)) {
    dittoBarPlot(myRNA, "clustering", group.by = "groups",
        do.hover = TRUE)
}

### Previous Version Compatibility
# Mistakenly, dittoBarPlot used to remove factor identities entirely from the
# data it used. This manifests as ignorance of a user's set orderings for
# their data. That is no longer done by default, but to recreate old plots,
# restoring this behavior can be achieved with 'retain.factor.levels = FALSE'
# Set factor level ordering for a metadata we'll give to 'group.by'
myRNA$groups_reverse_levels <- factor(
    myRNA$groups,
    levels = c("D", "C", "B", "A"))
# dittoBarPlot will now respect this level order by default.
dittoBarPlot(myRNA, "clustering", group.by = "groups_reverse_levels")
# But that respect can be turned off...
dittoBarPlot(myRNA, "clustering", group.by = "groups_reverse_levels",
    retain.factor.levels = FALSE)
Creates a string vector of 40 unique colors, in hexadecimal form, repeated 100 times.
Or, if get.names is set to TRUE, outputs the names of the colors which can be helpful as reference when adjusting how colors get used.
These colors are a modification of the protanope and deuteranope friendly colors from Wong, B. Nature Methods, 2011.
Truly, only the first 1-7 are maximally (red-green) color-blindness friendly, but the lightened and darkened versions (plus grey) in slots 8-40 still work relatively well at extending their utility further. Note that past 40, the colors simply repeat in order to most easily allow dittoSeq visualizations to handle situations requiring even more colors.
The colors are:
1-7 = Suggested color panel from Wong, B. Nature Methods, 2011, minus black
1- orange = "#E69F00"
2- skyBlue = "#56B4E9"
3- bluishGreen = "#009E73"
4- yellow = "#F0E442"
5- blue = "#0072B2"
6- vermillion = "#D55E00"
7- reddishPurple = "#CC79A7"
8 = gray40
9-16 = 25% darker versions of colors 1-8
17-24 = 25% lighter versions of colors 1-8
25-32 = 40% lighter versions of colors 1-8
33-40 = 40% darker versions of colors 1-8
dittoColors(reps = 100, get.names = FALSE)
reps |
Integer which sets how many times the original set of colors should be repeated |
get.names |
Logical, whether only the names of the default dittoSeq color panel should be returned instead |
A string vector of the 40 colors repeated reps times, or the names of those colors when get.names = TRUE.
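How repetition lets a fixed panel cover any number of groups can be sketched in base R. The sketch below uses only the first eight colors listed above (the seven hexadecimal values plus gray40) rather than the full 40, and the names base_panel and repeat_panel are hypothetical, so this is an illustration of the idea rather than the function's actual code.

```r
# First eight colors of the panel as listed above (names kept for reference).
base_panel <- c(
    orange = "#E69F00", skyBlue = "#56B4E9", bluishGreen = "#009E73",
    yellow = "#F0E442", blue = "#0072B2", vermillion = "#D55E00",
    reddishPurple = "#CC79A7", gray40 = "gray40")

# Repeat the panel so that indexing past its end wraps back to the start.
repeat_panel <- function(panel, reps = 100) {
    rep(unname(panel), reps)
}

panel <- repeat_panel(base_panel, reps = 3)
length(panel)        # 8 colors x 3 repeats
panel[9] == panel[1] # the ninth slot reuses the first color
```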
Daniel Bunis
dittoColors()

# To retrieve names:
dittoColors(get.names = TRUE)
Shows data overlaid on a tsne, pca, or similar type of plot
dittoDimPlot( object, var, reduction.use = .default_reduction(object), size = 1, opacity = 1, dim.1 = 1, dim.2 = 2, cells.use = NULL, shape.by = NULL, split.by = NULL, split.adjust = list(), extra.vars = NULL, multivar.split.dir = c("col", "row"), show.others = TRUE, split.show.all.others = TRUE, split.nrow = NULL, split.ncol = NULL, assay = .default_assay(object), slot = .default_slot(object), adjustment = NULL, swap.rownames = NULL, color.panel = dittoColors(), colors = seq_along(color.panel), shape.panel = c(16, 15, 17, 23, 25, 8), min.color = "#F0E442", max.color = "#0072B2", min = NA, max = NA, order = c("unordered", "increasing", "decreasing", "randomize"), main = "make", sub = NULL, xlab = "make", ylab = "make", rename.var.groups = NULL, rename.shape.groups = NULL, theme = theme_bw(), show.axes.numbers = TRUE, show.grid.lines = if (is.character(reduction.use)) { !grepl("umap|tsne", tolower(reduction.use)) } else { TRUE }, do.letter = FALSE, do.ellipse = FALSE, do.label = FALSE, labels.size = 5, labels.highlight = TRUE, labels.repel = TRUE, labels.split.by = split.by, labels.repel.adjust = list(), do.hover = FALSE, hover.data = var, hover.assay = .default_assay(object), hover.slot = .default_slot(object), hover.adjustment = NULL, add.trajectory.lineages = NULL, add.trajectory.curves = NULL, trajectory.cluster.meta, trajectory.arrow.size = 0.15, do.contour = FALSE, contour.color = "black", contour.linetype = 1, legend.show = TRUE, legend.size = 5, legend.title = "make", legend.breaks = waiver(), legend.breaks.labels = waiver(), shape.legend.size = 5, shape.legend.title = shape.by, do.raster = FALSE, raster.dpi = 300, data.out = FALSE )
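The default for show.grid.lines in the usage above is an expression rather than a fixed value. As a sketch, the same expression can be wrapped in a small helper (the name default_show_grid_lines is hypothetical) to see what it evaluates to for different reduction.use inputs.

```r
# Same logic as the show.grid.lines default shown in the usage signature.
default_show_grid_lines <- function(reduction.use) {
    if (is.character(reduction.use)) {
        # Grid lines are hidden when the name contains "umap" or "tsne",
        # ignoring case.
        !grepl("umap|tsne", tolower(reduction.use))
    } else {
        # A matrix or data.frame of embeddings keeps grid lines.
        TRUE
    }
}

default_show_grid_lines("UMAP")           # FALSE
default_show_grid_lines("tsne_harmony")   # FALSE
default_show_grid_lines("pca")            # TRUE
default_show_grid_lines(matrix(0, 2, 2))  # TRUE
```

Note that the match is on the literal substrings, so a slot named "t-SNE" would not be recognized and grid lines would be kept.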
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
var |
String name of a "gene" or "metadata" (or "ident" for a Seurat Alternatively, a string vector naming multiple genes or metadata, OR a vector of the same length as there are cells/samples in the |
reduction.use |
String, such as "pca", "tsne", "umap", or "PCA", etc, which is the name of a dimensionality reduction slot within the object, and which sets what dimensionality reduction space within the object to use. Default = the first dimensionality reduction slot inside the object with "umap", "tsne", or "pca" within its name, (priority: UMAP > t-SNE > PCA) or the first dimensionality reduction slot if none of those exist. Alternatively, a matrix (or data.frame) containing the dimensionality reduction embeddings themselves.
The matrix should have as many rows as there are cells/samples in the object |
size |
Number which sets the size of data points. Default = 1. |
opacity |
Number between 0 and 1. Great for when you have MANY overlapping points, this sets how solid the points should be: 1 = not see-through at all. 0 = invisible. Default = 1. (In terms of typical ggplot variables, = alpha) |
dim.1 |
The component number to use on the x-axis. Default = 1 |
dim.2 |
The component number to use on the y-axis. Default = 2 |
cells.use |
String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. |
shape.by |
Variable for setting the shape of cells/samples in the plot. Note: must be discrete. Can be the name of a gene or meta-data. Alternatively, can be "ident" for clusters of a Seurat object. Alternatively, can be a numeric of length equal to the total number of cells/samples in object. Note: shapes can be harder to see, and to process mentally, than colors. Even as a color blind person myself writing this code, I recommend use of colors for variables with many discrete values. |
split.by |
1 or 2 strings naming discrete metadata to use for splitting the cells/samples into multiple plots with ggplot faceting. When 2 metadatas are named, c(row,col), the first is used as rows and the second is used for columns of the resulting grid. When 1 metadata is named, shape control can be achieved with split.nrow and split.ncol |
split.adjust |
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 metadata to |
extra.vars |
String vector providing names of any extra metadata to be stashed in the dataframe supplied to Useful for making custom splitting/faceting or other additional alterations after dittoSeq plot generation. |
multivar.split.dir |
"row" or "col", sets the direction of faceting used for 'var' values when |
show.others |
Logical. Whether other cells should be shown in the background in light gray. Default = TRUE. |
split.show.all.others |
Logical which sets whether gray "others" cells of facets should include all cells of other facets ( |
split.nrow, split.ncol |
Integers which set the dimensions of faceting/splitting when a single metadata is given to split.by |
assay, slot |
single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use.
See |
adjustment |
When plotting gene / feature expression, should that data be used directly (default) or should it be adjusted to be
|
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
color.panel |
String vector which sets the colors to draw from. |
colors |
Integer vector, the indexes / order, of colors from color.panel to actually use. Useful for quickly swapping the colors of nearby clusters. |
shape.panel |
Vector of integers corresponding to ggplot shapes which sets what shapes to use.
When discrete groupings are supplied by Note: Unfortunately, shapes can be hard to see when points are on top of each other & they are more slowly processed by the brain. For these reasons, even as a color blind person myself writing this code, I recommend use of colors for variables with many discrete values. |
min.color |
color for lowest values of |
max.color |
color for highest values of |
min |
Number which sets the value associated with the minimum color. |
max |
Number which sets the value associated with the maximum color. |
order |
String. If the data should be plotted based on the order of the color data, sets whether to plot (from back to front) in "increasing", "decreasing", "randomize" order.
If left as "unordered", plot order is simply based on the order of cells within the object |
main |
String, sets the plot title.
Default title is automatically generated if not given a specific value. To remove, set to |
sub |
String, sets the plot subtitle |
xlab , ylab
|
Strings which set the labels for the axes.
Default labels are generated if you do not give this a specific value.
To remove, set to |
rename.var.groups |
String vector which sets new names for the identities of |
rename.shape.groups |
String vector which sets new names for the identities of |
theme |
A ggplot theme which will be applied before dittoSeq adjustments.
Default = |
show.axes.numbers |
Logical which controls whether the axes values should be displayed. |
show.grid.lines |
Logical which sets whether gridlines of the plot should be shown.
They are removed when set to FALSE.
Default = FALSE for umap and tsne |
do.letter |
Logical which sets whether letters should be added on top of the colored dots. For extended colorblindness compatibility.
NOTE: |
do.ellipse |
Logical. Whether the groups should be surrounded by median-centered ellipses. |
do.label |
Logical. Whether to add text labels near the center (median) of clusters for grouping vars. |
labels.size |
Size of the labels text |
labels.highlight |
Logical. Whether the labels should have a box behind them |
labels.repel |
Logical, that sets whether the labels' placements will be adjusted with ggrepel to avoid intersections between labels and plot bounds. TRUE by default. |
labels.split.by |
String of one or two metadata names which controls the facet-split calculations for label placements.
Defaults to |
labels.repel.adjust |
A named list which allows extra parameters to be pushed through to ggrepel function calls.
List elements should be valid inputs to the |
do.hover |
Logical which controls whether the output will be converted to a plotly object so that data about individual points will be displayed when you hover your cursor over them.
|
hover.data |
String vector of gene and metadata names, example: |
hover.assay , hover.slot , hover.adjustment
|
Similar to the non-hover versions of these inputs, when showing expression data upon hover, these set what data will be shown. |
add.trajectory.lineages |
List of vectors representing trajectory paths, each from start-cluster to end-cluster, where vector contents are the names of clusters provided in the If the |
add.trajectory.curves |
List of matrices, each representing coordinates for a trajectory path, from start to end, where matrix columns represent x ( Alternatively, a list of lists(/princurve objects) can be provided.
Thus, if the |
trajectory.cluster.meta |
String name of metadata containing the clusters that were used for generating trajectories. Required when plotting trajectories using the |
trajectory.arrow.size |
Number representing the size of trajectory arrows, in inches. Default = 0.15. |
do.contour |
Logical. Whether density-based contours should be displayed. |
contour.color |
String that sets the color(s) of the |
contour.linetype |
String or numeric which sets the type of line used for |
legend.show |
Logical. Whether the legend should be displayed. Default = |
legend.size |
Number representing the size at which color legend shapes should be plotted (for discrete variable plotting) in the color legend. Default = 5. *Enlarging the colors legend is incredibly helpful for making colors more distinguishable by color blind individuals. |
legend.title |
String which sets the title for the color legend. Default = |
legend.breaks |
Numeric vector which sets the discrete values to show in the color-scale legend for continuous data. |
legend.breaks.labels |
String vector, with same length as |
shape.legend.size |
Number representing the size at which shapes should be plotted in the shape legend. |
shape.legend.title |
String which sets the title of the shapes legend. Default is |
do.raster |
Logical. When set to |
raster.dpi |
Number indicating dpi to use for rasterization. Default = 300. |
data.out |
Logical. When set to |
The function creates a dataframe containing the metadata or expression data associated with the given var (or if a vector of data is provided directly, it just uses that), plus X and Y coordinates data determined by the reduction.use and dim.1 (x-axis) and dim.2 (y-axis) inputs.
Any extra data requested with shape.by, split.by or extra.var is added as well.
For expression/counts data, assay, slot, and adjustment inputs can be used to change which data is used, and if it should be adjusted in some way.
Next, if a set of cells or samples to use is indicated with the cells.use input, then the dataframe is split into Target_data and Others_data based on subsetting by the target cells/samples.
Finally, a scatter plot is created using these dataframes where non-target cells will be displayed in gray if show.others=TRUE, and target cell data is displayed on top, colored based on the var-associated data, and with shapes determined by the shape.by-associated data.
If split.by was used, the plot will be split into a matrix of panels based on the associated groupings.
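The target / non-target split described above can be pictured with a small base R sketch. This is only an illustration under stated assumptions: the data frame `cell_data` and its column names are hypothetical stand-ins, not dittoSeq internals, and only the name-based form of cells.use is shown.

```r
# Hypothetical stand-in for the per-cell data frame: one row per cell,
# holding reduction coordinates (X, Y) and the data used for coloring.
cell_data <- data.frame(
    X = c(0.1, 1.2, -0.5, 2.0),
    Y = c(1.0, -0.3, 0.7, 0.2),
    color = c("A", "B", "A", "B"),
    row.names = c("cell1", "cell2", "cell3", "cell4")
)

# cells.use can be names, indices, or a logical vector; names are used here.
cells_use <- c("cell1", "cell3")

# Split into the cells to draw in color and the cells to draw in gray.
is_target <- rownames(cell_data) %in% cells_use
Target_data <- cell_data[is_target, ]
Others_data <- cell_data[!is_target, ]
```

In a plot built from these two pieces, Others_data would be drawn first in gray (when show.others=TRUE) and Target_data on top.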
A ggplot or plotly object where colored dots (or other shapes) are overlaid onto a tSNE, PCA, UMAP, ..., plot of choice.
Alternatively, if data.out=TRUE, a list containing three slots is output: the plot (named 'p'), a data.table containing the underlying data for target cells (named 'Target_data'), and a data.table containing the underlying data for non-target cells (named 'Others_data').
Alternatively, if do.hover is set to TRUE, the plot is converted from ggplot to plotly & cell/sample information, determined by the hover.data input, is retrieved, added to the dataframe, and displayed upon hovering the cursor over the plot.
size and opacity can be used to adjust the size and transparency of the data points.
Color can be adjusted with color.panel and/or colors for discrete data, or min, max, min.color, and max.color for continuous data.
Shapes can be adjusted with shape.panel.
Color and shape labels can be changed using rename.var.groups and rename.shape.groups.
Titles and axes labels can be adjusted with main, sub, xlab, ylab, and legend.title arguments.
Legends can also be adjusted in other ways, using variables that all start with "legend." for easy tab-completion lookup.
Many other tweaks and features can be added as well.
Each is accessible through 'tab' autocompletion starting with "do."--- or "add."---, and if additional inputs are involved in implementing or tweaking these, the associated inputs will start with the "---.":
If do.label is set to TRUE, labels will be added based on median centers of the discrete var-data groupings.
The size of the text in the labels can be adjusted using the labels.size input.
By default labels will repel each other and the bounds of the plot, and labels will be highlighted with a white background.
Either of these can be turned off by setting labels.repel = FALSE or labels.highlight = FALSE.
If do.ellipse is set to TRUE, ellipses will be added to highlight distinct var-data groups' positions based on median positions of their cell/sample components.
If do.contour is set to TRUE, density gradient contour lines will be overlaid with color and linetype adjustable via contour.color and contour.linetype.
If add.trajectory.lineages is provided a list of vectors (each vector being cluster names from start-cluster-name to end-cluster-name), and a metadata name pointing to the relevant clustering information is provided to trajectory.cluster.meta, then median centers of the clusters will be calculated and arrows will be overlaid to show trajectory inference paths in the current dimensionality reduction space.
If add.trajectory.curves is provided a list of matrices (each matrix containing x, y coordinates from start to end), paths and arrows will be overlaid to show trajectory inference curves in the current dimensionality reduction space.
Arrow size is controlled with the trajectory.arrow.size input.
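As a rough picture of the "median centers" idea used for labels and lineage arrows, the base R sketch below computes per-cluster median coordinates and turns one lineage into arrow segments. It is an assumption-laden illustration: the `coords` data frame, the cluster names, and the `segments` layout are hypothetical, and dittoSeq's own implementation may differ.

```r
# Hypothetical coordinates for nine cells in three clusters.
coords <- data.frame(
    X = c(0, 1, 2, 10, 11, 12, 5, 6, 7),
    Y = c(0, 2, 1, 5, 7, 6, -3, -1, -2),
    cluster = rep(c("1", "2", "3"), each = 3)
)

# Median center of each cluster in the reduction space.
centers <- aggregate(cbind(X, Y) ~ cluster, data = coords, FUN = median)
rownames(centers) <- as.character(centers$cluster)

# One lineage, given as cluster names from start-cluster to end-cluster.
lineage <- c("1", "3", "2")
path <- centers[lineage, c("X", "Y")]

# Consecutive centers become arrow segments (start point -> end point).
segments <- data.frame(
    x = head(path$X, -1), y = head(path$Y, -1),
    xend = tail(path$X, -1), yend = tail(path$Y, -1)
)
```

Each row of `segments` could then be drawn as one arrow, for example with ggplot2's geom_segment().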
Daniel Bunis and Jared Andrews
getGenes and getMetas to see what the var, split.by, etc. options are of an object.
getReductions to see what the reduction.use options are of an object.
importDittoBulk for how to create a SingleCellExperiment object from bulk seq data that dittoSeq functions can use & addDimReduction for how to specifically add calculated dimensionality reductions that dittoDimPlot can utilize.
dittoScatterPlot for showing very similar data representations, but where genes or metadata are wanted as the axes.
dittoDimHex and dittoScatterHex for showing very similar data representations, but where nearby cells are summarized together in hexagonal bins.
dittoPlot for an alternative continuous data display method where data broken into discrete groupings is shown on a y- (or x-) axis.
dittoBarPlot for an alternative discrete data display and quantification method.
example(importDittoBulk, echo = FALSE)
myRNA

# Display discrete data:
dittoDimPlot(myRNA, "clustering")
# Display continuous data:
dittoDimPlot(myRNA, "gene1")
# You can also plot multiple sets of continuous data:
dittoDimPlot(myRNA, c("gene1", "gene2"))
# (See ?multi_dittoDimPlot if you would like to have wholly separate
# plots/scales/legends for each set.)

# To show currently set clustering for seurat objects, you can use "ident".
# To change the dimensional reduction type, use 'reduction.use'.
dittoDimPlot(myRNA, "clustering",
    reduction.use = "pca",
    dim.1 = 3,
    dim.2 = 4)

# Subset to certain cells with cells.use
dittoDimPlot(myRNA, "clustering",
    cells.use = !myRNA$SNP)

# Data can also be split in other ways with 'shape.by' or 'split.by'
dittoDimPlot(myRNA, "gene1",
    shape.by = "clustering",
    split.by = "SNP") # single split.by element
dittoDimPlot(myRNA, "gene1",
    split.by = c("groups","SNP")) # row and col split.by elements

# Modify the look with intuitive inputs
dittoDimPlot(myRNA, "clustering",
    size = 2, opacity = 0.7, show.axes.numbers = FALSE,
    ylab = NULL, xlab = "tSNE",
    main = "Plot Title",
    sub = "subtitle",
    legend.title = "clustering")

# MANY additional tweaks are possible.
# Also, many extra features are easy to add as well:
dittoDimPlot(myRNA, "clustering",
    do.label = TRUE, do.ellipse = TRUE)
dittoDimPlot(myRNA, "clustering",
    do.label = TRUE, labels.highlight = FALSE, labels.size = 8)
if (requireNamespace("plotly", quietly = TRUE)) {
    dittoDimPlot(myRNA, "gene1", do.hover = TRUE,
        hover.data = c("gene2", "clustering", "timepoint"))
}
dittoDimPlot(myRNA, "gene1",
    add.trajectory.lineages = list(c(1,2,4), c(1,3)),
    trajectory.cluster.meta = "clustering",
    sub = "Pseudotime Trajectories")
dittoDimPlot(myRNA, "gene1",
    do.contour = TRUE,
    contour.color = "lightblue", # Optional, black by default
    contour.linetype = "dashed") # Optional, solid by default

# Plotting ordering can also be adjusted with 'order':
dittoDimPlot(myRNA, "timepoint", size = 20, order = "increasing")
dittoDimPlot(myRNA, "timepoint", size = 20, order = "decreasing")
dittoDimPlot(myRNA, "timepoint", size = 20, order = "randomize")
Compact plotting of per group summaries for expression of multiple features
dittoDotPlot(
    object,
    vars,
    group.by,
    scale = TRUE,
    split.by = NULL,
    cells.use = NULL,
    size = 6,
    vars.dir = c("x", "y"),
    categories.split.adjust = TRUE,
    categories.theme.adjust = TRUE,
    split.nrow = NULL,
    split.ncol = NULL,
    split.adjust = list(),
    min.color = "grey90",
    max.color = "#C51B7D",
    min = "make",
    max = NA,
    mid.color = NULL,
    mid = "make",
    summary.fxn.color = function(x) { mean(x[x != 0]) },
    summary.fxn.size = function(x) { mean(x != 0) },
    min.percent = 0.01,
    max.percent = NA,
    assay = .default_assay(object),
    slot = .default_slot(object),
    adjustment = NULL,
    swap.rownames = NULL,
    do.hover = FALSE,
    main = NULL,
    sub = NULL,
    ylab = group.by,
    y.labels = NULL,
    y.reorder = NULL,
    xlab = NULL,
    x.labels.rotate = vars.dir == "x",
    groupings.drop.unused = TRUE,
    theme = theme_classic(),
    legend.show = TRUE,
    legend.color.breaks = waiver(),
    legend.color.breaks.labels = waiver(),
    legend.color.title = "make",
    legend.size.title = "percent\nexpression",
    data.out = FALSE
)
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
vars |
String vector of gene or metadata names which selects the features to summarize and show.
Example: Alternatively, a named list of string vectors where names represent category labels, such as associated cell types, and values are the gene or metadata names that you wish to have grouped together.
Example: |
group.by |
String representing the name of a metadata to use for separating the cells/samples into discrete groups. |
scale |
Logical which sets whether the values shown with color (default: mean non-zero expression) should be centered and scaled. |
split.by |
1 or 2 strings naming discrete metadata to use for splitting the cells/samples into multiple plots with ggplot faceting.
|
cells.use |
String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. |
size |
Number which sets the visual dot size associated with the highest value shown by dot size (default: percent non-zero expression). |
vars.dir |
"x" or "y", sets the axis where |
categories.split.adjust |
Boolean. When
|
categories.theme.adjust |
Boolean. When |
split.nrow , split.ncol
|
Integers which set the dimensions of faceting/splitting when a single metadata is given to |
split.adjust |
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 metadata to |
min.color , max.color
|
colors to use for minimum and maximum color values.
Default = light grey and purple.
Ignored if |
min , max
|
Numbers which set the values associated with the minimum and maximum colors. |
mid.color |
NULL (default), "ryb", "rwb", "rgb", or a color to use for the midpoint of a three-color color scale. This parameter acts as a switch between using a 2-color scale or a 3-color scale:
|
mid |
Number or "make" (default) which sets the value associated with the |
summary.fxn.color , summary.fxn.size
|
A function which sets how color or size will be used to summarize variables' data for each group. Any function can be used as long as it takes in a numeric vector and returns a single numeric value. |
min.percent , max.percent
|
Numbers between 0 and 1 which set the minimum and maximum percent expression to show. When set to NA, the minimum/maximum of the data are used. |
assay , slot
|
single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use.
See |
adjustment |
Should expression data be used directly (default) or should it be adjusted to be
|
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
do.hover |
Logical. Default = |
main |
String which sets the plot title. |
sub |
String which sets the plot subtitle. |
ylab |
String which sets the y/grouping-axis label.
Default is |
y.labels |
String vector, c("label1","label2","label3",...) which overrides the names of the samples/groups. |
y.reorder |
Integer vector. A sequence of numbers, from 1 to the number of groupings, for rearranging the order of groupings. Method: Make a first plot without this input. Then, treating the bottom-most grouping as index 1, and the top-most as index n, values of y.reorder should be these indices, but in the order that you would like them rearranged to be. Recommendation for advanced users: If you find yourself coming back to this input too many times, an alternative solution that can be easier long-term
is to make the target data into a factor, and to put its levels in the desired order: |
xlab |
String which sets the x/var-axis label.
Set to |
x.labels.rotate |
Logical which sets whether the var-labels should be rotated. |
groupings.drop.unused |
Logical. |
theme |
A ggplot theme which will be applied before dittoSeq adjustments.
Default = |
legend.show |
Logical. Whether the legend should be displayed. Default = |
legend.color.breaks |
Numeric vector which sets the discrete values to label in the color-scale legend for continuous data. |
legend.color.breaks.labels |
String vector, with same length as |
legend.color.title , legend.size.title
|
String or |
data.out |
Logical. When set to |
This function will output a compact summary of expression of multiple genes, or of values of multiple numeric metadata, across cell/sample groups (clusters, sample identity, conditions, etc.), where dot-size and dot-color are used to reflect distinct features of the data. Typically, and by default, size will reflect the percent of non-zero values, and color will reflect the mean of non-zero values for each var and group pairing.
Internally, the data for each element of vars is obtained.
When elements are genes/features, assay and slot are utilized to determine which expression data to use, and adjustment determines if and how the expression data might be adjusted.
(Note that 'adjustment' would be applied before cells/samples subsetting, and across all groups of cells/samples.)
Groupings are determined using group.by, and then data for each variable is summarized based on summary.fxn.color & summary.fxn.size.
If scale = TRUE (default setting), the color summary values are centered and scaled.
Doing so 1) puts values for all vars in a similar range, and 2) emphasizes relative differences between groups.
Finally, data is plotted as dots of differing colors and sizes, with vars along the vars.dir-axis and groupings along the other.
Labels along the x-axis can be rotated 45 degrees with x.labels.rotate=TRUE, which is on by default when vars.dir=='x'.
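To make the summarization step concrete, here is a minimal base R sketch using the two default summary functions shown in the usage above. The expression vector and group labels are made-up toy data, and base R's scale() is used only as a stand-in for "centered and scaled"; the exact scaling code inside dittoSeq is not shown in this document.

```r
# Toy data: one feature measured across 8 cells in two groups (hypothetical).
expr <- c(0, 2, 4, 0, 0, 3, 0, 5)
group <- rep(c("g1", "g2"), each = 4)

# Defaults from the usage: color = mean of non-zero values,
# size = fraction of non-zero values.
summary_color <- function(x) mean(x[x != 0])
summary_size <- function(x) mean(x != 0)

# One summary value per group for each of the two aesthetics.
color_vals <- tapply(expr, group, summary_color)
size_vals <- tapply(expr, group, summary_size)

# Centering and scaling the color summaries across groups
# (stand-in for what scale = TRUE describes).
scaled <- as.numeric(scale(color_vals))
```

Any replacement for summary.fxn.color or summary.fxn.size only needs to follow the same contract: take a numeric vector and return a single numeric value.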
A ggplot object where dots of different colors and sizes summarize continuous data for multiple features per multiple groups.
Alternatively when data.out = TRUE, a list containing the plot ("p") and the underlying data as a dataframe ("data").
Alternatively when do.hover = TRUE, a plotly converted version of the plot where additional data will be displayed when the cursor is hovered over the dots.
Size of the dots can be changed with size.
Subsetting to utilize only certain cells/samples can be achieved with cells.use.
Markers can be grouped into categories by providing them to the vars input as a list, where list element names represent category names, and list element contents are the feature names which each category should contain.
Colors (2-color scale) can be adjusted with min.color and max.color.
Coloring can also be switched to a 3-color scale by using the mid.color parameter. For details, see that parameter's description above.
Displayed value ranges can be adjusted with min and max for color, or min.percent and max.percent for size.
Titles and axes labels can be adjusted with main, sub, xlab, ylab, legend.color.title, and legend.size.title arguments.
The legend can be hidden by setting legend.show = FALSE.
The color legend tick marks and associated labels can be adjusted with legend.color.breaks and legend.color.breaks.labels, respectively.
The groupings labels and order can be changed using y.labels and y.reorder.
Rotation of x-axis labels can be turned off with x.labels.rotate = FALSE.
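The y.reorder description suggests that, long-term, it can be easier to store the grouping data as a factor with levels in the desired order. The base R sketch below illustrates both ideas on a plain character vector; the group names are hypothetical, and the index-based line is only meant to mirror the "indices in the order you want" wording, not dittoSeq's internal code.

```r
# Hypothetical grouping values for five cells/samples.
groups <- c("late", "early", "mid", "early", "late")

# Default ordering of a factor is alphabetical.
default_levels <- levels(factor(groups))

# Index-based reordering, in the spirit of y.reorder:
# give the default positions in the order you want them to appear.
reordered <- default_levels[c(1, 3, 2)]

# Factor-based alternative: set the level order once on the data itself.
groups_f <- factor(groups, levels = c("early", "mid", "late"))
```

With the factor approach, the ordering travels with the data, so it does not need to be restated for each plot.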
Daniel Bunis
dittoPlotVarsAcrossGroups for a different method of summarizing expression of multiple features across distinct groups that can be better (and more compact) when the mapping of values to individual genes among the requested set is unimportant.
dittoPlot and multi_dittoPlot for plotting of expression and metadata vars, each as separate plots, on a per cell/sample basis.
example(importDittoBulk, echo = FALSE)
myRNA

# These random data don't mimic dropout, so we'll add some zeros.
logcounts(myRNA)[
    matrix(
        sample(c(TRUE,FALSE), ncol(myRNA)*10, prob = c(.2,.8), replace = TRUE),
        ncol=10
    )] <- 0

dittoDotPlot(
    myRNA, c("gene1", "gene2", "gene3", "gene4"),
    group.by = "clustering")

# 'size' adjusts the dot-size associated with the highest percent expression
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    size = 12)

# 'scale' input can be used to control / turn off scaling of avg exp values.
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    scale = FALSE)

# x-axis label rotation can be controlled with 'x.labels.rotate'
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    x.labels.rotate = FALSE)

# The axis that vars get shown on can be swapped with the 'vars.dir' input.
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    vars.dir = "y")

# Titles are adjustable via various discrete inputs:
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    main = "Title",
    sub = "Subtitle",
    ylab = "y-axis label",
    xlab = "x-axis label",
    legend.color.title = "Colors title",
    legend.size.title = "Dot size title")

# You can also bin vars into groups by providing them in a named list:
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(
        'Naive' = c("gene1", "gene2"),
        'Stimulated' = c("gene3", "gene4")
    )
)
# The 'categories.split.adjust' and 'categories.theme.adjust' arguments then
# control whether 'split.adjust' and 'theme' input contents, respectively,
# will be added to in ways that make these categories actually appear, and
# work, like categories.
# They both default to TRUE, and the axis they affect follows 'vars.dir'.
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3"))
)
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    split.by = "conditions"
)
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    categories.split.adjust = FALSE,
    categories.theme.adjust = FALSE
)
# Now with 'vars.dir' changed to 'y'...
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    vars.dir = "y"
)
dittoDotPlot(myRNA, group.by = "clustering",
    vars = list(Naive = c("gene1", "gene2"), Stimulated = c("gene3")),
    split.by = "conditions",
    vars.dir = "y"
)

# Coloring can be swapped from the default 2-color scale to a 3-color scale
# by using the 'mid.color' input:
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    mid.color = "white"
)
# Setting it to "ryb", "rgb", or "rwb" quickly updates this input as well as
# 'min.color' and 'max.color', making the effect of these next two calls
# equivalent:
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    mid.color = "rgb"
)
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    min.color = "#2166AC", # (blue)
    mid.color = "gray97",  # (gray)
    max.color = "#B2182B"  # (red)
)

# For certain specialized applications, it may be helpful to adjust the
# functions used for summarizing the data as well. Inputs are:
#   summary.fxn.color & summary.fxn.size
# Requirement for each: Any function that takes in a numeric vector &
#   returns, as output, a single numeric value.
dittoDotPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    summary.fxn.color = mean,
    legend.color.title = "mean\nexpression\nincluding 0s",
    x.labels.rotate = FALSE,
    scale = FALSE)
Plot cell type/cluster/identity frequencies per sample and per grouping
dittoFreqPlot(
    object,
    var,
    sample.by = NULL,
    group.by,
    color.by = group.by,
    vars.use = NULL,
    scale = c("percent", "count"),
    max.normalize = FALSE,
    plots = c("boxplot", "jitter"),
    split.nrow = NULL,
    split.ncol = NULL,
    split.adjust = list(),
    cells.use = NULL,
    data.out = FALSE,
    do.hover = FALSE,
    color.panel = dittoColors(),
    colors = seq_along(color.panel),
    y.breaks = NULL,
    min = 0,
    max = NA,
    var.labels.rename = NULL,
    var.labels.reorder = NULL,
    x.labels = NULL,
    x.labels.rotate = TRUE,
    x.reorder = NULL,
    theme = theme_classic(),
    xlab = group.by,
    ylab = "make",
    main = "make",
    sub = NULL,
    jitter.size = 1,
    jitter.width = 0.2,
    jitter.color = "black",
    jitter.position.dodge = boxplot.position.dodge,
    do.raster = FALSE,
    raster.dpi = 300,
    boxplot.width = 0.4,
    boxplot.color = "black",
    boxplot.show.outliers = NA,
    boxplot.outlier.size = 1.5,
    boxplot.fill = TRUE,
    boxplot.position.dodge = vlnplot.width,
    boxplot.lineweight = 1,
    vlnplot.lineweight = 1,
    vlnplot.width = 1,
    vlnplot.scaling = "area",
    vlnplot.quantiles = NULL,
    ridgeplot.lineweight = 1,
    ridgeplot.scale = 1.25,
    ridgeplot.ymax.expansion = NA,
    ridgeplot.shape = c("smooth", "hist"),
    ridgeplot.bins = 30,
    ridgeplot.binwidth = NULL,
    add.line = NULL,
    line.linetype = "dashed",
    line.color = "black",
    legend.show = TRUE,
    legend.title = color.by
)
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
var |
String name of a metadata that contains discrete data, or a factor or vector containing such data for all cells/samples in the target object. |
sample.by |
String name of a metadata containing which samples each cell belongs to. Note that when this is not provided, there will only be one data point per grouping.
A warning can be expected then for all |
group.by |
String representing the name of a metadata to use for separating the cells/samples into discrete groups. |
color.by |
String representing the name of a metadata to use for setting fills.
Great for highlighting supersets or subgroups when wanted, but it defaults to group.by. |
vars.use |
String or string vector naming a subset of the values of Hint: use Note: When |
scale |
"count" or "percent". Sets whether data should be shown as counts versus percentage. |
max.normalize |
Logical which sets whether the data for each When set to |
plots |
String vector which sets the types of plots to include: possibilities = "jitter", "boxplot", "vlnplot", "ridgeplot". Order matters: c("vlnplot", "boxplot", "jitter") will put a violin plot in the back, boxplot in the middle, and then individual dots in the front. See details section for more info. |
split.nrow , split.ncol
|
Integers which set the dimensions of the facet grid. (the |
split.adjust |
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. Faceting for this dittoFreqPlot is always by the |
cells.use |
String vector of cells'(/samples' for bulk data) names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. |
data.out |
Logical. When set to |
do.hover |
Logical. Default = FALSE. |
color.panel |
String vector which sets the colors to draw from for plot fills.
Default = dittoColors(). |
colors |
Integer vector, the indexes / order, of colors from color.panel to actually use.
(Provides an alternative to directly modifying |
y.breaks |
Numeric vector, a set of breaks that should be used as major gridlines. c(break1,break2,break3,etc.). |
min , max
|
Scalars which control the zoom of the plot. These inputs set the minimum / maximum values of the data to display. Default = NA, which allows ggplot to set these limits based on the range of all data being shown. |
var.labels.rename |
String vector for renaming the distinct identities of Hint: use |
var.labels.reorder |
Integer vector. A sequence of numbers, from 1 to the number of distinct Method: Make a first plot without this input.
Then, treating the top-left-most grouping as index 1, and the bottom-right-most as index n.
Values of |
x.labels |
String vector, c("label1","label2","label3",...) which overrides the names of groupings. |
x.labels.rotate |
Logical which sets whether the labels should be rotated.
Default: TRUE. |
x.reorder |
Integer vector. A sequence of numbers, from 1 to the number of groupings, for rearranging the order of x-axis groupings. Method: Make a first plot without this input. Then, treating the leftmost grouping as index 1, and the rightmost as index n. Values of x.reorder should be these indices, but in the order that you would like them rearranged to be. Recommendation for advanced users: If you find yourself coming back to this input too many times, an alternative solution that can be easier long-term
is to make the target data into a factor, and to put its levels in the desired order: |
theme |
A ggplot theme which will be applied before dittoSeq adjustments.
Default = theme_classic(). |
xlab |
String which sets the grouping-axis label (=x-axis for box and violin plots, y-axis for ridgeplots).
Set to |
ylab |
String, sets the continuous-axis label (=y-axis for box and violin plots, x-axis for ridgeplots). Default = "make" and if left as make, a title will be automatically generated. |
main |
String, sets the plot title. Default = "make" and if left as make, a title will be automatically generated. To remove, set to |
sub |
String, sets the plot subtitle |
jitter.size |
Scalar which sets the size of the jitter shapes. |
jitter.width |
Scalar that sets the width/spread of the jitter in the x direction. Ignored in ridgeplots. Note for when |
jitter.color |
String which sets the color of the jitter shapes |
jitter.position.dodge |
Scalar which adjusts the relative distance between jitter widths when multiple subgroups exist per |
do.raster |
Logical. When set to |
raster.dpi |
Number indicating dots/pixels per inch (dpi) to use for rasterization. Default = 300. |
boxplot.width |
Scalar which sets the width/spread of the boxplot in the x direction |
boxplot.color |
String which sets the color of the lines of the boxplot |
boxplot.show.outliers |
Logical, whether outliers should be included in the boxplot.
Default is |
boxplot.outlier.size |
Scalar which adjusts the size of points used to mark outliers |
boxplot.fill |
Logical, whether the boxplot should be filled in or not. Known bug: when boxplot fill is turned off, outliers do not render. |
boxplot.position.dodge |
Scalar which adjusts the relative distance between boxplots when multiple are drawn per grouping (a.k.a. when |
boxplot.lineweight |
Scalar which adjusts the thickness of boxplot lines. |
vlnplot.lineweight |
Scalar which sets the thickness of the line that outlines the violin plots. |
vlnplot.width |
Scalar which sets the width/spread of violin plots in the x direction |
vlnplot.scaling |
String which sets how the widths of the violin plots are set in relation to each other.
Options are "area", "count", and "width". If the default is not right for your data, I recommend trying "width".
For an explanation of each, see |
vlnplot.quantiles |
Single number or numeric vector of values in [0,1] naming quantiles at which to draw a horizontal line within each violin plot. Example: |
ridgeplot.lineweight |
Scalar which sets the thickness of the ridgeplot outline. |
ridgeplot.scale |
Scalar which sets the distance/overlap between ridgeplots. A value of 1 means the tallest density curve just touches the baseline of the next higher one. Higher numbers lead to greater overlap. Default = 1.25 |
ridgeplot.ymax.expansion |
Scalar which adjusts the minimal space between the top-most grouping and the top of the plot in order to ensure that the curve is not cut off by the plotting grid. The larger the value, the greater the space requested. When left as NA, dittoSeq will attempt to determine an ideal value itself based on the number of groups & linear interpolation between these goal posts: 0.6 when g<=3, 0.1 when g==12, and 0.05 when g>=34, where g is the number of groups. |
ridgeplot.shape |
Either "smooth" or "hist", sets whether ridges will be smoothed (the typical, and default) versus rectangular like a histogram.
(Note: as of the time shape "hist" was added, combination of jittered points is not supported by the |
ridgeplot.bins |
Integer which sets how many chunks to break the x-axis into when |
ridgeplot.binwidth |
Integer which sets the width of chunks to break the x-axis into when |
add.line |
numeric value(s) where one or multiple line(s) should be added |
line.linetype |
String which sets the type of line for |
line.color |
String that sets the color(s) of the |
legend.show |
Logical. Whether the legend should be displayed. Default = TRUE. |
legend.title |
String or |
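The ridgeplot.ymax.expansion entry above describes linear interpolation between three goal posts. A minimal base-R sketch of that rule follows; the helper name `ymax_expansion` is hypothetical, and clamping outside the 3 to 34 range via `rule = 2` is an assumption drawn from the "g<=3" and "g>=34" wording, not a copy of dittoSeq's internal code.

```r
# Hypothetical helper (not part of dittoSeq): interpolate the expansion value
# from the number of groups, g, using the goal posts quoted in the argument
# description: 0.6 when g <= 3, 0.1 when g == 12, 0.05 when g >= 34.
ymax_expansion <- function(g) {
  stats::approx(
    x = c(3, 12, 34),        # number of groups at each goal post
    y = c(0.6, 0.1, 0.05),   # expansion value at each goal post
    xout = g,
    rule = 2                 # hold the end values constant outside 3..34
  )$y
}

# Expansion values for a few group counts
ymax_expansion(c(2, 3, 12, 23, 34, 50))
```

Passing an explicit number to ridgeplot.ymax.expansion skips any such automatic choice.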
The function creates a dataframe containing counts and percent makeup of var identities per sample if sample.by is given, or per group if only group.by is given.
color.by can optionally be used to add subgroupings to calculations and ultimate plots, or to convey super-groups of group.by groupings.
Typically, var will be pointed to clustering or cell type annotations, but in truth it can be given any discrete data.
If a set of cells to use is indicated with the cells.use input, only those cells/samples are used for counts and percent makeup calculations.
If a set of var-values to show is indicated with the vars.use input, the data.frame is trimmed at the end to include only corresponding rows.
If max.normalize is set to TRUE, counts and percent data are transformed to a 0-1 scale, which makes better use of white space for lower frequency var-values.
Either percent of total (scale = "percent"), which is the default, or counts (if scale = "count") data is then (gg)plotted with the data representation types in plots by utilizing the same machinery as dittoPlot.
Faceting by var-data values is utilized to achieve per var-value (e.g. cluster or cell type) granularity.
See below for additional customization options!
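The calculation described above can be approximated in base R. This is only a sketch of the idea, assuming toy metadata with hypothetical column names (`sample`, `groups`, `clustering`) and assuming that max-normalization divides by the largest value per var-value; dittoFreqPlot's own implementation may differ in detail.

```r
# Toy per-cell metadata (hypothetical values, not from a real dataset)
meta <- data.frame(
  sample     = rep(c("s1", "s2", "s3", "s4"), each = 5),
  groups     = rep(c("A", "B"), each = 10),
  clustering = c(1, 1, 1, 2, 3,  1, 2, 2, 2, 3,  1, 1, 2, 2, 2,  3, 3, 3, 3, 1)
)

# Treat var-data as a factor so every var-value is counted for every sample,
# including zero counts
meta$clustering <- factor(meta$clustering)

# Counts of each var-value per sample, in long format
freq <- as.data.frame(
  table(sample = meta$sample, var = meta$clustering),
  responseName = "count", stringsAsFactors = FALSE
)

# Percent of each sample's cells (what scale = "percent" would display)
freq$percent <- 100 * freq$count / ave(freq$count, freq$sample, FUN = sum)

# Attach the group that each sample belongs to
freq$groups <- meta$groups[match(freq$sample, meta$sample)]

# max.normalize-style rescaling (assumed): divide by the per-var-value maximum
freq$percent.norm <- freq$percent / ave(freq$percent, freq$var, FUN = max)

freq
```

Each row of `freq` corresponds to one plotted data point: one sample within one var-value facet.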
A ggplot plot where frequencies of discrete data, grouped by sample, condition, etc., are shown on the y-axis by a violin plot, boxplot, and/or jittered points, or on the x-axis by a ridgeplot with or without jittered points.
Alternatively, if data.out = TRUE, a list containing the plot ("p") and a dataframe of the underlying data ("data").
Alternatively, if do.hover = TRUE, a plotly conversion of the ggplot output in which underlying data can be retrieved upon hovering the cursor over the plot.
The function is restricted in that each sample's cells, indicated by the unique values of sample.by-data, must exist within single group.by and color.by groupings.
Thus, in order to ensure all valid var-data composition data points are generated, prior to calculations...
var-data are ensured to be a factor, which ensures a calculation will be run for every var-value (a.k.a. cell type or cluster)
group.by-data and color.by-data are treated as non-factor data, which ensures that calculations are run only for the groupings that each sample is associated with.
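Before plotting, it can be worth confirming that the restriction above holds for your metadata. The check below is a sketch in base R; `one_grouping_per_sample` is a hypothetical helper, not a dittoSeq function, and the vectors mirror the metadata built in this page's examples.

```r
# Per-cell metadata vectors shaped like those in the examples for this function
sample    <- rep(1:12, each = 10)
groups    <- rep(c("A", "B"), each = 60)
subgroups <- rep(as.character(c(1:3, 1:3, 1:3, 1:3)), each = 10)

# Hypothetical helper: TRUE when every sample maps to exactly one grouping value
one_grouping_per_sample <- function(sample, grouping) {
  all(tapply(grouping, sample, function(x) length(unique(x))) == 1)
}

one_grouping_per_sample(sample, groups)      # suitable as group.by
one_grouping_per_sample(sample, subgroups)   # suitable as color.by

# A violating case: one cell of sample 1 is assigned to the other group
bad_groups <- groups
bad_groups[1] <- "B"
one_grouping_per_sample(sample, bad_groups)
```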
The plots argument determines the types of data representation that will be generated, as well as their order from back to front.
Options are "jitter", "boxplot", "vlnplot", and "ridgeplot".
Each plot type has specific associated options which are controlled by variables that start with their associated string.
For example, all jitter adjustments start with "jitter.", such as jitter.size and jitter.width.
Inclusion of "ridgeplot" overrides "boxplot" and "vlnplot" presence and changes the plot to be horizontal.
Additionally:
Colors can be adjusted with color.panel.
Subgroupings: color.by can be utilized to split major group.by groupings into subgroups. When this is done in y-axis plotting, dittoSeq automatically ensures the centers of all geoms will align, but users will need to manually adjust jitter.width to less than 0.5/num_subgroups to avoid overlaps. There are also three inputs that control geom-center placement, but the easiest way to adjust all of them at once is to just adjust vlnplot.width! The other two: boxplot.position.dodge and jitter.position.dodge.
Line(s) can be added at single or multiple value(s) by providing these values to add.line. Linetype and color are set with line.linetype, which is "dashed" by default, and line.color, which is "black" by default.
Titles and axes labels can be adjusted with main, sub, xlab, ylab, and legend.title arguments.
The legend can be hidden by setting legend.show = FALSE.
y-axis zoom and tick marks can be adjusted using min, max, and y.breaks.
x-axis labels and groupings can be changed / reordered using x.labels and x.reorder, and rotation of these labels can be turned on/off with x.labels.rotate = TRUE/FALSE.
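The x.reorder argument description recommends, as a longer-term alternative, turning the grouping data into a factor with its levels in the desired order. The base-R sketch below illustrates how the two approaches relate; it assumes that groupings appear in factor-level (alphabetical) order by default, and the vector `groups` is a hypothetical stand-in for a metadata column.

```r
# Hypothetical grouping metadata
groups <- c("B", "A", "C", "A", "B")

# Assumed default plotting order: the factor levels, i.e. "A", "B", "C"
default_order <- levels(factor(groups))

# x.reorder-style indices: put the 3rd grouping first, then the 1st, then the 2nd
x.reorder <- c(3, 1, 2)
via_index <- factor(groups, levels = default_order[x.reorder])

# Factor-levels alternative: state the desired order by name, once, in the data
via_levels <- factor(groups, levels = c("C", "A", "B"))

# Both routes yield the same ordering
identical(via_index, via_levels)
levels(via_levels)
```

Setting the levels in the metadata itself means the order carries over to later plots without repeating x.reorder in each call.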
Daniel Bunis
dittoBarPlot for a data representation that emphasizes total makeup of samples/groups rather than focusing on the var-data values individually.
# Establish some workable example data
example(importDittoBulk, echo = FALSE)
myRNA1 <- myRNA
colnames(myRNA) <- paste0(colnames(myRNA),"_1")
example(importDittoBulk, echo = FALSE)
myRNA <- cbind(myRNA, myRNA1)
myRNA <- setBulk(myRNA, FALSE)
myRNA$sample <- rep(1:12, each = 10)
myRNA$groups <- rep(c("A", "B"), each = 60)
myRNA$subgroups <- rep(as.character(c(1:3,1:3,1:3,1:3)), each = 10)
myRNA

# There are three main inputs for this function, in addition to 'object'.
#  var = typically this will be cell types annotations or clustering
#  sample.by = the name of a metadata containing sample assignment of cells.
#  group.by = how to group the data on the x-axis (y-axis for ridgeplots)
dittoFreqPlot(myRNA,
    var = "clustering",
    sample.by = "sample",
    group.by = "groups")

# 'color.by' can also be set differently from 'group.by' to have the effect
#  of highlighting supersets or subgroupings:
dittoFreqPlot(myRNA, "clustering",
    group.by = "groups",
    sample.by = "sample",
    color.by = "subgroups")

# The var-values shown can be subset with 'vars.use'
dittoFreqPlot(myRNA, "clustering",
    group.by = "groups", sample.by = "sample", color.by = "subgroups",
    vars.use = 1:2)

# Lower frequency groups can be expanded to use the entire y-axis by:
#  turning on 'max.normalize'-ation:
dittoFreqPlot(myRNA, "clustering",
    group.by = "groups", sample.by = "sample", color.by = "subgroups",
    max.normalize = TRUE)
#  or by setting y-scale limits to be set by the contents of facets:
dittoFreqPlot(myRNA, "clustering",
    group.by = "groups", sample.by = "sample", color.by = "subgroups",
    split.adjust = list(scales = "free_y"))

# Data representations can also be selected and reordered with the 'plots'
#  input, and further adjusted with inputs applying to each representation.
dittoFreqPlot(myRNA,
    var = "clustering", sample.by = "sample", group.by = "groups",
    plots = c("vlnplot", "boxplot", "jitter"),
    vlnplot.lineweight = 0.2,
    boxplot.fill = FALSE,
    boxplot.lineweight = 0.2)

# Finally, 'sample.by' is not technically required. When not given, a
#  single-datapoint of overall composition stats will be shown for each
#  grouping.
# Just note, all data representation other than "jitter" will complain
#  due to there only being the one datapoint per group.
dittoFreqPlot(myRNA,
    var = "clustering", group.by = "groups", color.by = "subgroups",
    plots = "jitter")
Given a set of genes, cells/samples, and metadata names for column annotations, this function will retrieve the expression data for those genes and cells, and the annotation data for those cells.
It will then utilize these data to make a heatmap using the pheatmap function of either the pheatmap (default) or ComplexHeatmap package.
dittoHeatmap( object, genes = getGenes(object, assay), metas = NULL, cells.use = NULL, annot.by = NULL, order.by = .default_order(object, annot.by), main = NA, cell.names.meta = NULL, assay = .default_assay(object), slot = .default_slot(object), swap.rownames = NULL, heatmap.colors = colorRampPalette(c("blue", "white", "red"))(50), scaled.to.max = FALSE, heatmap.colors.max.scaled = colorRampPalette(c("white", "red"))(25), annot.colors = c(dittoColors(), dittoColors(1)[seq_len(7)]), annotation_col = NULL, annotation_colors = NULL, data.out = FALSE, highlight.features = NULL, show_colnames = isBulk(object), show_rownames = TRUE, scale = "row", cluster_cols = isBulk(object), border_color = NA, legend_breaks = NA, drop_levels = FALSE, breaks = NA, complex = FALSE, ... )
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
genes |
String vector, c("gene1","gene2","gene3",...) = the list of genes to put in the heatmap. If not provided, defaults to all genes of the object / assay. |
metas |
String vector, c("meta1","meta2","meta3",...) = the list of metadata variables to put in the heatmap. |
cells.use |
String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. |
annot.by |
String name of any metadata slots containing how the cells/samples should be annotated. |
order.by |
Single string, string vector, or numeric vector which sets how cells/samples (columns) will be ordered when Strings should be the name of a gene, or metadata slot, but can also be multiple such values in order of priority. Alternatively, can be a numeric vector which gives the column index order directly. |
main |
String that sets the title for the heatmap. |
cell.names.meta |
quoted "name" of a meta.data slot to use for naming the columns instead of using the raw cell/sample names. |
assay , slot
|
single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use.
See |
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
heatmap.colors |
the colors to use within the heatmap when (default setting) |
scaled.to.max |
Logical, |
heatmap.colors.max.scaled |
the colors to use within the heatmap when |
annot.colors |
String (color) vector where each color will be assigned to an individual annotation in the generated annotation bars. |
data.out |
Logical. When set to |
highlight.features |
String vector of genes/metadata whose names you would like to show. Only these genes/metadata will be named in the resulting heatmap. |
show_colnames , show_rownames , scale , annotation_col , annotation_colors
|
arguments passed to
|
cluster_cols , border_color , legend_breaks , breaks , drop_levels , ...
|
other arguments passed to |
complex |
Logical which sets whether the heatmap should be generated with ComplexHeatmap ( |
This function serves as a wrapper for creating heatmaps from bulk or single-cell RNAseq data with pheatmap::pheatmap, by essentially automating the data extraction and annotation building steps.
(Or alternatively with ComplexHeatmap::pheatmap if complex is set to TRUE.)
The function will extract the expression matrix for a set of genes and/or an optional subset of cells / samples to use via cells.use.
This matrix is either left as is, default (for scaling within the ultimate call to pheatmap), or if scaled.to.max = TRUE, is scaled by dividing each row by its maximum value.
When provided with a set of metadata slot names to use for building annotations (with the annot.by input), the relevant metadata is retrieved from the object and compiled into a pheatmap-ready annotation_col input.
The input annot.colors is used to establish the set of colors that should be used for building a pheatmap-ready annotation_colors input as well, unless such an input has been provided by the user. See below for further details.
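The scaled.to.max transformation described above, dividing each row by its maximum value, can be illustrated on a toy matrix in base R. The matrix `expr` and its gene and cell names are hypothetical; this is a sketch of the arithmetic only, not dittoHeatmap's internal code.

```r
# Toy expression matrix: rows are genes, columns are cells (hypothetical values)
expr <- matrix(
  c(0, 2, 4,
    1, 1, 5,
    0, 0, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

# Maximum of each row (gene)
row_max <- apply(expr, 1, max)

# Divide every row by its own maximum, giving values between 0 and 1
scaled <- sweep(expr, 1, row_max, "/")

scaled
```

Zeros stay at zero under this scaling, whereas row-wise z-scoring (scale = "row") shifts them to negative values. Note that a gene with a maximum of zero would produce NaN values in this sketch.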
A pheatmap object.
Alternatively, if complex is set to TRUE, a Heatmap object.
Alternatively, if data.out is set to TRUE, a list containing all arguments that would have been passed to pheatmap to generate such a heatmap.
The cells can be ordered in a set way using the order.by input. Such ordering happens by default for single-cell RNAseq data when any metadata are provided to annot.by as it is often unfeasible to cluster thousands of cells.
A plot title can be added with main.
Gene or cell/sample names can be hidden with show_rownames and show_colnames, respectively, or...
Particular features can also be selected for labeling using the highlight.features input.
Names of all cells/samples can be replaced with the contents of a metadata slot using the cell.names.meta input.
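When order.by is given several names, they act as sort keys in order of priority. The base-R sketch below shows what such a priority ordering looks like on toy metadata; the data frame `meta` and its values are hypothetical, and the use of `order()` is an assumption about the general approach rather than dittoHeatmap's exact code.

```r
# Toy per-cell metadata (hypothetical values)
meta <- data.frame(
  clustering = c("2", "1", "2", "1", "1"),
  timepoint  = c(3, 2, 1, 1, 3),
  row.names  = paste0("cell", 1:5)
)

# Sort keys in order of priority, as would be given to 'order.by'
order.by <- c("clustering", "timepoint")

# Column index order: sort by clustering first, then by timepoint within ties
col_order <- do.call(order, unname(as.list(meta[order.by])))

# Cell names in their heatmap column order
rownames(meta)[col_order]
```

A numeric vector like `col_order` is also the form that order.by accepts when the column index order is supplied directly.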
Additional tweaks are possible through use of pheatmap inputs which will be directly passed through.
Some examples of useful pheatmap parameters are:
cluster_cols and cluster_rows for controlling clustering. Note: cluster_cols will always be over-written to be FALSE when the input order.by is used above.
treeheight_row and treeheight_col for setting how large the trees on the side/top should be drawn.
cutree_cols and cutree_rows for splitting the heatmap based on the hierarchical clustering.
When complex is set to TRUE, additional inputs for the Heatmap function can be given as well.
Some examples:
use_raster to have the heatmap rasterized/flattened to pixels which can make working with large heatmaps in a figure editor, like Illustrator, simpler.
name to give the heatmap color scale a custom title.
In typical operation, dittoHeatmap pulls metadata annotations given to annot.by to build a pheatmap-annotation_col input, then it uses the colors provided to annot.colors to create the pheatmap-annotation_colors input which sets the annotation coloring.
Specifically...
colors for the values of discrete metadata are pulled from the start of the annot.colors vector, in the order that they are given to annot.by
colors for the values of continuous metadata are pulled from the end of the annot.colors vector, in the order that they are given to annot.by
To customize colors or add additional column or row annotations, users can also provide annotation_colors, annotation_col, or annotation_row pheatmap-inputs directly.
General structure is described below, but see pheatmap for additional details and examples.
annotation_col = a data.frame with rownames of the barcodes/names of all cells/samples in the dataset & columns representing annotations. Names of columns are used as the annotation titles. *dittoSeq will append any annot.by annotations to this dataframe.
annotation_row = a data.frame with rownames of the genes/feature of the dataset & columns representing annotations. Names of columns are used as the annotation titles.
annotation_colors = a named list of string (color) vectors. Vectors must be named by the row or column annotation title that they are associated with. Optionally, individual colors can be named with the values that they should be associated with.
Partial annotation_colors lists (containing vectors for only certain annotations) will have colors for left out annotations filled in automatically. For such filling, annot.colors are pulled for column annotations first, then for row annotations.
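The structures described above can be built by hand in base R. The sketch below uses hypothetical cell names, annotation titles, and colors purely to show the expected shapes; it does not call dittoHeatmap or pheatmap.

```r
# Hypothetical cell names
cells <- paste0("cell", 1:4)

# annotation_col: rownames are cell/sample names, columns are annotations
annotation_col <- data.frame(
  clustering = factor(c("1", "2", "1", "2")),   # a discrete annotation
  score      = c(0.1, 0.5, 0.9, 0.3),           # a continuous annotation
  row.names  = cells
)

# annotation_colors: a named list of color vectors, named by annotation title.
# Colors for discrete annotations can be named by the values they belong to.
annotation_colors <- list(
  clustering = c("1" = "#E69F00", "2" = "#56B4E9"),
  score      = c("white", "#009E73")
)

# Sanity checks: list names match annotation titles,
# and every discrete value has a named color
all(names(annotation_colors) %in% colnames(annotation_col))
all(levels(annotation_col$clustering) %in% names(annotation_colors$clustering))
```

A partial list, for example one holding only the `clustering` entry, is also valid according to the description above; colors for the omitted annotations would then be filled in from annot.colors.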
Daniel Bunis and Jared Andrews
pheatmap::pheatmap, for how to add additional heatmap tweaks, or ComplexHeatmap::pheatmap and Heatmap for when you want to turn on rasterization or any additional customizations offered by this fantastic package.
metaLevels for helping to create manual annotation_colors inputs. This function universally checks the options/levels of a string, factor (filled only by default), or numerical metadata.
example(importDittoBulk, echo = FALSE)
scRNA <- setBulk(myRNA, FALSE)
# We now have two SCEs for our example purposes:
#  'myRNA' will be treated as a bulk RNAseq dataset
#  'scRNA' will be treated as a single-cell RNAseq dataset

# Pick a set of genes
genes <- getGenes(myRNA)[1:30]

# Make a heatmap with cells/samples annotated by their clusters
dittoHeatmap(myRNA, genes,
    annot.by = "clustering")

# For single-cell data, you will typically have more cells than can be
#  clustered quickly. Thus, cell clustering is turned off by default for
#  single-cell data.
dittoHeatmap(scRNA, genes,
    annot.by = "clustering")

# Using the 'order.by' input:
#  Ordering by a useful metadata or gene is often helpful.
#  For single-cell data, order.by defaults to the first element given to
#   annot.by.
#  For bulk data, order.by must be set separately.
dittoHeatmap(myRNA, genes,
    annot.by = "clustering",
    order.by = "clustering",
    cluster_cols = FALSE)
# 'order.by' can be multiple metadata/genes, or a vector of indexes directly
dittoHeatmap(scRNA, genes,
    annot.by = "clustering",
    order.by = c("clustering", "timepoint"))
dittoHeatmap(scRNA, genes,
    annot.by = "clustering",
    order.by = ncol(scRNA):1)

# When there are many cells, showing names becomes less useful.
#  Names can be turned off with the 'show_colnames' parameter.
dittoHeatmap(scRNA, genes,
    annot.by = "groups",
    show_colnames = FALSE)

# When there are many, many cells & genes, rasterization can be super useful
#  as well.
# Rasterization, or flattening of the distinct color objects to a matrix of
#  pixels, is the default for large heatmaps in the ComplexHeatmap package,
#  and you can have the heatmap rendered with this package (rather than the
#  pheatmap package) by setting 'complex = TRUE'.
# Our data here is too small to hit that defaulting switch, so let's give
#  the direct input, 'use_raster' as well:
if (requireNamespace("ComplexHeatmap")) { # Checks if you have the package.
    dittoHeatmap(scRNA, genes,
        annot.by = "groups",
        show_colnames = FALSE,
        complex = TRUE,
        use_raster = TRUE)
}

# Additionally, it is recommended for single-cell data that the parameter
#  scaled.to.max be set to TRUE, or scale be "none" and turned off altogether,
#  because these data are generally enriched for zeros that otherwise get
#  scaled to a negative value.
dittoHeatmap(myRNA, genes,
    annot.by = "groups",
    order.by = "groups",
    show_colnames = FALSE,
    scaled.to.max = TRUE)
Show RNAseq data, grouped into hexagonal bins, on a scatter or dimensionality reduction plot
dittoDimHex(
  object, color.var = NULL, bins = 30, color.method = NULL,
  reduction.use = .default_reduction(object), dim.1 = 1, dim.2 = 2,
  cells.use = NULL, color.panel = dittoColors(), colors = seq_along(color.panel),
  split.by = NULL, extra.vars = NULL, multivar.split.dir = c("col", "row"),
  split.nrow = NULL, split.ncol = NULL, split.adjust = list(),
  assay = .default_assay(object), slot = .default_slot(object), adjustment = NULL,
  swap.rownames = NULL, assay.extra = assay, slot.extra = slot,
  adjustment.extra = adjustment, show.axes.numbers = TRUE,
  show.grid.lines = !grepl("umap|tsne", tolower(reduction.use)),
  main = "make", sub = NULL, xlab = "make", ylab = "make", theme = theme_bw(),
  do.contour = FALSE, contour.color = "black", contour.linetype = 1,
  min.density = NA, max.density = NA, min.color = "#F0E442", max.color = "#0072B2",
  min.opacity = 0.2, max.opacity = 1, min = NA, max = NA,
  rename.color.groups = NULL, do.ellipse = FALSE, do.label = FALSE,
  labels.size = 5, labels.highlight = TRUE, labels.repel = TRUE,
  labels.split.by = split.by, labels.repel.adjust = list(),
  add.trajectory.lineages = NULL, add.trajectory.curves = NULL,
  trajectory.cluster.meta, trajectory.arrow.size = 0.15, data.out = FALSE,
  legend.show = TRUE, legend.color.title = "make",
  legend.color.breaks = waiver(), legend.color.breaks.labels = waiver(),
  legend.density.title = if (isBulk(object)) "Samples" else "Cells",
  legend.density.breaks = waiver(), legend.density.breaks.labels = waiver()
)

dittoScatterHex(
  object, x.var, y.var, color.var = NULL, bins = 30, color.method = NULL,
  split.by = NULL, extra.vars = NULL, cells.use = NULL,
  color.panel = dittoColors(), colors = seq_along(color.panel),
  multivar.split.dir = c("col", "row"), split.nrow = NULL, split.ncol = NULL,
  split.adjust = list(),
  assay.x = .default_assay(object), slot.x = .default_slot(object), adjustment.x = NULL,
  assay.y = .default_assay(object), slot.y = .default_slot(object), adjustment.y = NULL,
  assay.color = .default_assay(object), slot.color = .default_slot(object),
  adjustment.color = NULL,
  assay.extra = .default_assay(object), slot.extra = .default_slot(object),
  adjustment.extra = NULL, swap.rownames = NULL,
  min.density = NA, max.density = NA, min.color = "#F0E442", max.color = "#0072B2",
  min.opacity = 0.2, max.opacity = 1, min = NA, max = NA,
  rename.color.groups = NULL, xlab = x.var, ylab = y.var,
  main = "make", sub = NULL, theme = theme_bw(),
  do.contour = FALSE, contour.color = "black", contour.linetype = 1,
  do.ellipse = FALSE, do.label = FALSE, labels.size = 5, labels.highlight = TRUE,
  labels.repel = TRUE, labels.split.by = split.by, labels.repel.adjust = list(),
  add.trajectory.lineages = NULL, add.trajectory.curves = NULL,
  trajectory.cluster.meta, trajectory.arrow.size = 0.15,
  legend.show = TRUE, legend.color.title = "make",
  legend.color.breaks = waiver(), legend.color.breaks.labels = waiver(),
  legend.density.title = if (isBulk(object)) "Samples" else "Cells",
  legend.density.breaks = waiver(), legend.density.breaks.labels = waiver(),
  data.out = FALSE
)
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
color.var |
Single string giving a gene or metadata that will set the color of cells/samples in the plot. Alternatively, can be a directly supplied numeric or string vector or a factor of length equal to the total number of cells/samples in |
bins |
Numeric or numeric vector giving the number of hexagonal bins in the x and y directions. Set to 30 by default. |
color.method |
Works differently depending on whether the color.var data is continuous or discrete.
Continuous: String naming a function for how target data should be summarized for each bin.
Can be any function that summarizes a numeric vector input with a single numeric output value.
Default is
Discrete: A string signifying whether the color should be based on the "max" grouping of the bin (default), on "prop.<value>", the proportion of a specific value (e.g. "prop.A" or "prop.TRUE"), or on "max.prop", the maximum proportion of cells/samples belonging to any one grouping. |
reduction.use |
String, such as "pca", "tsne", "umap", or "PCA", etc, which is the name of a dimensionality reduction slot within the object, and which sets what dimensionality reduction space within the object to use. Default = the first dimensionality reduction slot inside the object with "umap", "tsne", or "pca" within its name, (priority: UMAP > t-SNE > PCA) or the first dimensionality reduction slot if none of those exist. Alternatively, a matrix (or data.frame) containing the dimensionality reduction embeddings themselves.
The matrix should have as many rows as there are cells/samples in the |
dim.1 |
The component number to use on the x-axis. Default = 1 |
dim.2 |
The component number to use on the y-axis. Default = 2 |
cells.use |
String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. |
color.panel |
String vector which sets the colors to draw from. |
colors |
Integer vector, the indexes / order, of colors from color.panel to actually use |
split.by |
1 or 2 strings naming discrete metadata to use for splitting the cells/samples into multiple plots with ggplot faceting. When 2 metadatas are named, c(row,col), the first is used as rows and the second is used for columns of the resulting grid. When 1 metadata is named, shape control can be achieved with |
extra.vars |
String vector providing names of any extra metadata to be stashed in the dataframe supplied to Useful for making custom alterations after dittoSeq plot generation. |
multivar.split.dir |
"row" or "col", sets the direction of faceting used for 'var' values when |
split.nrow , split.ncol
|
Integers which set the dimensions of faceting/splitting when a single metadata is given to |
split.adjust |
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 metadata to |
assay , slot
|
single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use.
See |
adjustment |
When plotting gene / feature expression, should that data be used directly (default) or should it be adjusted to be
|
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
show.axes.numbers |
Logical which controls whether the axes values should be displayed. |
show.grid.lines |
Logical which sets whether gridlines of the plot should be shown.
They are removed when set to FALSE.
Default = FALSE for umap and tsne |
main |
String, sets the plot title. The default title is either "Density", |
sub |
String, sets the plot subtitle. |
xlab , ylab
|
Strings which set the labels for the axes. To remove, set to |
theme |
A ggplot theme which will be applied before dittoSeq adjustments.
Default = |
do.contour |
Logical. Whether density-based contours should be displayed. |
contour.color |
String that sets the color(s) of the |
contour.linetype |
String or numeric which sets the type of line used for |
min.density , max.density
|
Number which sets the min/max values used for the density scale. Used no matter whether density is represented through opacity or color. |
min.color , max.color
|
color for the min/max values of the color scale. |
min.opacity , max.opacity
|
Scalar between [0,1] which sets the minimum or maximum opacity used for the density legend (when color is used for |
min , max
|
Number which sets the values associated with the minimum or maximum color for |
rename.color.groups |
String vector containing new names for the identities of discrete color groups. |
do.ellipse |
Logical. Whether the groups should be surrounded by median-centered ellipses. |
do.label |
Logical. Whether to add text labels near the center (median) of clusters for grouping vars. |
labels.size |
Size of the labels' text |
labels.highlight |
Logical. Whether the labels should have a box behind them |
labels.repel |
Logical, that sets whether the labels' placements will be adjusted with ggrepel to avoid intersections between labels and plot bounds. TRUE by default. |
labels.split.by |
String of one or two metadata names which controls the facet-split calculations for label placements.
Defaults to |
labels.repel.adjust |
A named list which allows extra parameters to be pushed through to ggrepel function calls.
List elements should be valid inputs to the |
add.trajectory.lineages |
List of vectors representing trajectory paths, each from start-cluster to end-cluster, where vector contents are the names of clusters provided in the If the |
add.trajectory.curves |
List of matrices, each representing coordinates for a trajectory path, from start to end, where matrix columns represent x ( Alternatively, (for dittoDimHex only, but not dittoScatterHex) a list of lists(/princurve objects) can be provided.
Thus, if the |
trajectory.cluster.meta |
String name of metadata containing the clusters that were used for generating trajectories. Required when plotting trajectories using the |
trajectory.arrow.size |
Number representing the size of trajectory arrows, in inches. Default = 0.15. |
data.out |
Logical. When set to |
legend.show |
Logical. Whether any legend should be displayed. Default = |
legend.density.title , legend.color.title
|
Strings which set the title for the legends. |
legend.density.breaks , legend.color.breaks
|
Numeric vector which sets the discrete values to label in the density and color.var legends. |
legend.density.breaks.labels , legend.color.breaks.labels
|
String vector, with same length as |
x.var , y.var
|
Single string giving a gene or metadata that will be used for the x- and y-axis of the scatterplot. Note: must be continuous. Alternatively, can be a directly supplied numeric vector of length equal to the total number of cells/samples in |
assay.x , assay.y , assay.color , assay.extra , slot.x , slot.y , slot.color , slot.extra
|
single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use for each given data target.
See |
adjustment.x , adjustment.y , adjustment.color , adjustment.extra
|
For the given data target, when targeting gene / feature expression, should that data be used directly (default) or should it be adjusted to be
|
The functions create a dataframe with x and y coordinates for each cell/sample, determined by either x.var and y.var for dittoScatterHex, or reduction.use, dim.1 (x), and dim.2 (y) for dittoDimHex.
Extra data requested by color.var for coloring, split.by for faceting, or extra.vars for manual external manipulations, are added to the dataframe as well.
For expression/counts data, assay, slot, and adjustment inputs can be used to select which values to use, and if they should be adjusted in some way.
The dataframe is then subset to only target cells/samples based on the cells.use input.
Finally, a hex plot is created using this dataframe:
If color.var is not provided, coloring is based on the density of cells/samples within each hex bin.
When color.var is provided, density is represented through opacity while coloring is based on a summarization, chosen with the color.method input, of the target color.var data.
If split.by was used, the plot will be split into a matrix of panels based on the associated groupings.
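The per-bin summarization chosen with color.method can be pictured with a small base R sketch. This is only an illustration of the options described in the arguments above, not dittoSeq's internal code: the helpers summarize_discrete_bin and summarize_continuous_bin are hypothetical names, and each call receives the color.var values of the cells/samples falling within a single hex bin.

```r
# Hypothetical helpers, NOT part of dittoSeq: they mimic how the color.var
# values inside ONE hex bin could be reduced to a single value.

# Discrete data: "max", "max.prop", or "prop.<value>"
summarize_discrete_bin <- function(values, color.method = "max") {
    props <- table(values) / length(values)
    if (color.method == "max") {
        # Name of the most common grouping in the bin
        names(props)[which.max(props)]
    } else if (color.method == "max.prop") {
        # Proportion of the bin taken up by its most common grouping
        max(as.numeric(props))
    } else if (startsWith(color.method, "prop.")) {
        # Proportion of the bin equal to the value named after "prop."
        target <- sub("^prop\\.", "", color.method)
        mean(as.character(values) == target)
    } else {
        stop("Unrecognized color.method: ", color.method)
    }
}

# Continuous data: color.method names any function that turns a numeric
# vector into a single number, e.g. "mean" or "median"
summarize_continuous_bin <- function(values, color.method) {
    match.fun(color.method)(values)
}

one_bin_groups <- c("A", "A", "B", "A", "C")
max_group <- summarize_discrete_bin(one_bin_groups, "max")
max_prop  <- summarize_discrete_bin(one_bin_groups, "max.prop")
prop_B    <- summarize_discrete_bin(one_bin_groups, "prop.B")

one_bin_expr <- c(1, 2, 6)
bin_mean <- summarize_continuous_bin(one_bin_expr, "mean")
```

With "max" the bin would take the color of a grouping, whereas "max.prop" and "prop.<value>" yield a number between 0 and 1 that would be mapped onto the continuous color scale.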
A ggplot object where colored hexagonal bins are used to summarize RNAseq data in a scatterplot or in a tSNE, PCA, or UMAP plot.
Alternatively, if data.out=TRUE, a list containing two slots is output: the plot (named 'plot'), and a data.table containing the underlying data for target cells (named 'data').
dittoDimHex(): Show RNAseq data overlaid on a tsne, pca, or similar, grouped into hexagonal bins
dittoScatterHex(): Make a scatter plot of RNAseq data, grouped into hexagonal bins
Colors: min.color and max.color adjust the colors for continuous data.
For discrete color.var plotting with color.method = "max", colors are instead adjusted with color.panel and/or colors, and the labels of the groupings can be changed using rename.color.groups.
Titles and axes labels can be adjusted with the main, sub, xlab, ylab, legend.color.title, and legend.density.title arguments.
Legends can also be adjusted in other ways, using variables that all start with "legend." for easy tab completion lookup.
Other tweaks and features can be added as well.
Each is accessible through 'tab' autocompletion starting with "do."--- or "add."---, and if additional inputs are involved in implementing or tweaking these, the associated inputs will start with the "---.":
If do.contour is set to TRUE, density gradient contour lines will be overlaid, with color and linetype adjustable via contour.color and contour.linetype.
If add.trajectory.lineages is provided a list of vectors (each vector being cluster names from start-cluster-name to end-cluster-name), and a metadata name pointing to the relevant clustering information is provided to trajectory.cluster.meta, then median centers of the clusters will be calculated and arrows will be overlaid to show trajectory inference paths in the current dimensionality reduction space.
If add.trajectory.curves is provided a list of matrices (each matrix containing x, y coordinates from start to end), paths and arrows will be overlaid to show trajectory inference curves in the current dimensionality reduction space.
Arrow size is controlled with the trajectory.arrow.size input.
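The median-center step described for add.trajectory.lineages can be sketched in base R. This is a rough illustration under stated assumptions rather than dittoSeq's own implementation: the coords data frame, its column names, and the segments_df layout are all made up for the example.

```r
# Made-up embedding coordinates for nine cells in three clusters
coords <- data.frame(
    x = c(0, 1, 2, 10, 11, 12, 20, 21, 25),
    y = c(0, 2, 1, 5, 6, 7, 0, 1, 2),
    cluster = rep(c("1", "2", "4"), each = 3)
)

# Median center of each cluster in the plotted space
centers <- aggregate(cbind(x, y) ~ cluster, data = coords, FUN = median)

# One lineage, given as cluster names from start-cluster to end-cluster
lineage <- c("1", "2", "4")
path <- centers[match(lineage, centers$cluster), ]

# Each pair of consecutive centers becomes one arrow segment
segments_df <- data.frame(
    x    = head(path$x, -1),
    y    = head(path$y, -1),
    xend = tail(path$x, -1),
    yend = tail(path$y, -1)
)
segments_df
```

A data frame shaped like segments_df is what one would typically hand to a segment-drawing layer; how dittoSeq draws its arrows internally is not shown in this manual section.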
Daniel Bunis with some code adapted from Giuseppe D'Agostino
dittoDimPlot and dittoScatterPlot for making very similar data representations, but where each cell is represented individually.
It is often best to investigate your data with both the individual and hex-bin methods, then pick whichever is the best representation for your particular goal.
getGenes and getMetas to see what the var, split.by, etc. options are of an object.
getReductions to see what the reduction.use options are of an object.
example(importDittoBulk, echo = FALSE)
myRNA

# Mock up some nCount_RNA and nFeature_RNA metadata
#  == the default way to extract
myRNA$nCount_RNA <- runif(60,200,1000)
myRNA$nFeature_RNA <- myRNA$nCount_RNA*runif(60,0.95,1.05)
# and also percent.mito metadata
myRNA$percent.mito <- sample(c(runif(50,0,0.05),runif(10,0.05,0.2)))

dittoScatterHex(
    myRNA, x.var = "nCount_RNA", y.var = "nFeature_RNA")
dittoDimHex(myRNA)

# We don't have too many samples here, so let's increase the bin size.
dittoDimHex(myRNA, bins = 10)

# x and y bins can be set separately, useful for non-square plots
dittoDimHex(myRNA, bins = c(20, 10))

### Coloring
# Default coloring, as above, is by cell/sample density in the region, but
# 'color.var' can be used to color the data by another metric.
# Density will then be represented via bin opacity.
dittoDimHex(myRNA, color.var = "clustering", bins = 10)
dittoDimHex(myRNA, color.var = "gene1", bins = 10)

# 'color.method' is then used to adjust how the target data is summarized
dittoDimHex(myRNA, color.var = "groups", bins = 10,
    color.method = "max.prop")
dittoDimHex(myRNA, color.var = "gene1", bins = 10,
    color.method = "mean")
# One particularly useful 'color.method' for discrete 'color.var'-data is
# to use 'prop.<value>' to color by the proportion of a particular value
# within each bin:
dittoDimHex(myRNA, color.var = "groups", bins = 10,
    color.method = "prop.A")

### Additional Features:

# Faceting with 'split.by'
dittoDimHex(myRNA, bins = 10, split.by = "groups")
dittoDimHex(myRNA, bins = 10, split.by = c("groups", "clustering"))

# Faceting can also be used to show multiple continuous variables side-by-side
# by giving a vector of continuous metadata or gene names to 'color.var'.
# This can also be combined with 1 'split.by' variable, with direction then
# controlled via 'multivar.split.dir':
dittoDimHex(myRNA, bins = 10,
    color.var = c("gene1", "gene2"))
dittoDimHex(myRNA, bins = 10,
    color.var = c("gene1", "gene2"),
    split.by = "groups")
dittoDimHex(myRNA, bins = 10,
    color.var = c("gene1", "gene2"),
    split.by = "groups",
    multivar.split.dir = "row")

# Underlying data output with 'data.out = TRUE'
dittoDimHex(myRNA, data.out = TRUE)

# Contour lines can be added with 'do.contour = TRUE'
dittoDimHex(myRNA, bins = 10,
    do.contour = TRUE,
    contour.color = "lightblue", # Optional, black by default
    contour.linetype = "dashed") # Optional, solid by default

# Trajectories can be added to dittoDimHex plots
dittoDimHex(myRNA, bins = 10,
    add.trajectory.lineages = list(c(1,2,4), c(1,4), c(1,3)),
    trajectory.cluster.meta = "clustering")
Plots continuous data for customizeable cells'/samples' groupings on a y- (or x-) axis
dittoPlot(
  object, var, group.by, color.by = group.by, shape.by = NULL, split.by = NULL,
  extra.vars = NULL, cells.use = NULL, plots = c("jitter", "vlnplot"),
  multivar.aes = c("split", "group", "color"), multivar.split.dir = c("col", "row"),
  assay = .default_assay(object), slot = .default_slot(object), adjustment = NULL,
  swap.rownames = NULL, do.hover = FALSE, hover.data = var,
  color.panel = dittoColors(), colors = seq_along(color.panel),
  shape.panel = c(16, 15, 17, 23, 25, 8), theme = theme_classic(),
  main = "make", sub = NULL, ylab = "make", y.breaks = NULL, min = NA, max = NA,
  xlab = "make", x.labels = NULL, x.labels.rotate = NA, x.reorder = NULL,
  split.nrow = NULL, split.ncol = NULL, split.adjust = list(),
  do.raster = FALSE, raster.dpi = 300,
  jitter.size = 1, jitter.width = 0.2, jitter.color = "black",
  jitter.shape.legend.size = NA, jitter.shape.legend.show = TRUE,
  jitter.position.dodge = boxplot.position.dodge,
  boxplot.width = 0.2, boxplot.color = "black", boxplot.show.outliers = NA,
  boxplot.outlier.size = 1.5, boxplot.fill = TRUE,
  boxplot.position.dodge = vlnplot.width, boxplot.lineweight = 1,
  vlnplot.lineweight = 1, vlnplot.width = 1, vlnplot.scaling = "area",
  vlnplot.quantiles = NULL,
  ridgeplot.lineweight = 1, ridgeplot.scale = 1.25, ridgeplot.ymax.expansion = NA,
  ridgeplot.shape = c("smooth", "hist"), ridgeplot.bins = 30, ridgeplot.binwidth = NULL,
  add.line = NULL, line.linetype = "dashed", line.color = "black",
  legend.show = TRUE, legend.title = "make", data.out = FALSE
)

dittoRidgePlot(..., plots = c("ridgeplot"))

dittoRidgeJitter(..., plots = c("ridgeplot", "jitter"))

dittoBoxPlot(..., plots = c("boxplot", "jitter"))
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
var |
Single string representing the name of a metadata or gene, OR a vector with length equal to the total number of cells/samples in the dataset. Alternatively, a string vector naming multiple genes or metadata. This is the primary data that will be displayed. |
group.by |
String representing the name of a metadata to use for separating the cells/samples into discrete groups. |
color.by |
String representing the name of a metadata to use for setting fills.
Great for highlighting supersets or subgroups when wanted, but it defaults to |
shape.by |
Single string representing the name of a metadata to use for setting the shapes of the jitter points. When not provided, all cells/samples will be represented with dots. |
split.by |
1 or 2 strings naming discrete metadata to use for splitting the cells/samples into multiple plots with ggplot faceting. When 2 metadatas are named, c(row,col), the first is used as rows and the second is used for columns of the resulting grid. When 1 metadata is named, shape control can be achieved with |
extra.vars |
String vector providing names of any extra metadata to be stashed in the dataframe supplied to Useful for making custom splitting/faceting or other additional alterations after dittoSeq plot generation. |
cells.use |
String vector of cells'(/samples' for bulk data) names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. |
plots |
String vector which sets the types of plots to include: possibilities = "jitter", "boxplot", "vlnplot", "ridgeplot". Order matters: c("vlnplot", "boxplot", "jitter") will put a violin plot in the back, boxplot in the middle, and then individual dots in the front. See details section for more info. |
multivar.aes |
"split", "group", or "color", the plot feature to utilize for displaying 'var' value when |
multivar.split.dir |
"row" or "col", sets the direction of faceting used for 'var' values when |
assay , slot
|
single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use.
See |
adjustment |
When plotting gene expression / feature counts, should that data be used directly (default) or should it be adjusted to be
|
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
do.hover |
Logical. Default = |
hover.data |
String vector, a list of variable names, c("meta1","gene1","meta2",...) which determines what data to show upon hover when do.hover is set to |
color.panel |
String vector which sets the colors to draw from for plot fills.
Default = |
colors |
Integer vector, the indexes / order, of colors from color.panel to actually use.
(Provides an alternative to directly modifying |
shape.panel |
Vector of integers corresponding to ggplot shapes which sets what shapes to use.
When discrete groupings are supplied by |
theme |
A ggplot theme which will be applied before dittoSeq adjustments.
Default = |
main |
String, sets the plot title. Default = "make" and if left as make, a title will be automatically generated. To remove, set to |
sub |
String, sets the plot subtitle |
ylab |
String, sets the continuous-axis label (=y-axis for box and violin plots, x-axis for ridgeplots).
Defaults to " |
y.breaks |
Numeric vector, a set of breaks that should be used as major gridlines. c(break1,break2,break3,etc.). |
min , max
|
Scalars which control the zoom of the plot. These inputs set the minimum / maximum values of the data to display. Default = NA, which allows ggplot to set these limits based on the range of all data being shown. |
xlab |
String which sets the grouping-axis label (=x-axis for box and violin plots, y-axis for ridgeplots).
Set to |
x.labels |
String vector, c("label1","label2","label3",...) which overrides the names of groupings. |
x.labels.rotate |
Logical which sets whether the labels should be rotated.
Default: |
x.reorder |
Integer vector. A sequence of numbers, from 1 to the number of groupings, for rearranging the order of x-axis groupings. Method: Make a first plot without this input. Then, treat the leftmost grouping as index 1 and the rightmost as index n. Values of x.reorder should be these indices, but in the order that you would like them rearranged to be. Recommendation for advanced users: If you find yourself coming back to this input too many times, an alternative solution that can be easier long-term
is to make the target data into a factor, and to put its levels in the desired order: |
split.nrow , split.ncol
|
Integers which set the dimensions of faceting/splitting when a single metadata is given to |
split.adjust |
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 metadata to |
do.raster |
Logical. When set to |
raster.dpi |
Number indicating dots/pixels per inch (dpi) to use for rasterization. Default = 300. |
jitter.size |
Scalar which sets the size of the jitter shapes. |
jitter.width |
Scalar that sets the width/spread of the jitter in the x direction. Ignored in ridgeplots. Note for when |
jitter.color |
String which sets the color of the jitter shapes |
jitter.shape.legend.size |
Scalar which changes the size of the shape key in the legend.
If set to |
jitter.shape.legend.show |
Logical which sets whether the shapes legend will be shown when its shape is determined by |
jitter.position.dodge |
Scalar which adjusts the relative distance between jitter widths when multiple subgroups exist per |
boxplot.width |
Scalar which sets the width/spread of the boxplot in the x direction |
boxplot.color |
String which sets the color of the lines of the boxplot |
boxplot.show.outliers |
Logical, whether outliers should be included in the boxplot.
Default is |
boxplot.outlier.size |
Scalar which adjusts the size of points used to mark outliers |
boxplot.fill |
Logical, whether the boxplot should be filled in or not. Known bug: when boxplot fill is turned off, outliers do not render. |
boxplot.position.dodge |
Scalar which adjusts the relative distance between boxplots when multiple are drawn per grouping (a.k.a. when |
boxplot.lineweight |
Scalar which adjusts the thickness of boxplot lines. |
vlnplot.lineweight |
Scalar which sets the thickness of the line that outlines the violin plots. |
vlnplot.width |
Scalar which sets the width/spread of violin plots in the x direction |
vlnplot.scaling |
String which sets how the widths of the violin plots are set in relation to each other.
Options are "area", "count", and "width". If the default is not right for your data, I recommend trying "width".
For an explanation of each, see |
vlnplot.quantiles |
Single number or numeric vector of values in [0,1] naming quantiles at which to draw a horizontal line within each violin plot. Example: |
ridgeplot.lineweight |
Scalar which sets the thickness of the ridgeplot outline. |
ridgeplot.scale |
Scalar which sets the distance/overlap between ridgeplots. A value of 1 means the tallest density curve just touches the baseline of the next higher one. Higher numbers lead to greater overlap. Default = 1.25 |
ridgeplot.ymax.expansion |
Scalar which adjusts the minimal space between the top-most grouping and the top of the plot in order to ensure that the curve is not cut off by the plotting grid. The larger the value, the greater the space requested. When left as NA, dittoSeq will attempt to determine an ideal value itself based on the number of groups & linear interpolation between these goal posts: 0.6 when g<=3, 0.1 when g==12, and 0.05 when g>=34, where g is the number of groups. |
ridgeplot.shape |
Either "smooth" or "hist", sets whether ridges will be smoothed (the typical, and default) versus rectangular like a histogram.
(Note: as of the time shape "hist" was added, combination of jittered points is not supported by the |
ridgeplot.bins |
Integer which sets how many chunks to break the x-axis into when |
ridgeplot.binwidth |
Integer which sets the width of chunks to break the x-axis into when |
add.line |
numeric value(s) where one or multiple line(s) should be added |
line.linetype |
String which sets the type of line for |
line.color |
String that sets the color(s) of the |
legend.show |
Logical. Whether the legend should be displayed. Default = |
legend.title |
String or |
data.out |
Logical. When set to |
... |
arguments passed to dittoPlot by dittoRidgePlot, dittoRidgeJitter, and dittoBoxPlot wrappers. Options are all the ones above. |
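The default rule described for ridgeplot.ymax.expansion (linear interpolation between three goal posts) can be approximated in base R. This is only a sketch of the documented rule, not dittoSeq's internal code; the helper name ymax_expansion_sketch is hypothetical.

```r
# Hypothetical helper: approximates the documented default for
# ridgeplot.ymax.expansion from the number of groups, g.
ymax_expansion_sketch <- function(g) {
  # Goal posts from the documentation: 0.6 at g <= 3, 0.1 at g == 12,
  # and 0.05 at g >= 34. rule = 2 holds the end values outside [3, 34].
  approx(x = c(3, 12, 34), y = c(0.6, 0.1, 0.05), xout = g, rule = 2)$y
}

# Expansion requested for plots with 2, 3, 12, 23, and 40 groups
expansions <- ymax_expansion_sketch(c(2, 3, 12, 23, 40))
expansions
```

Supplying a number directly to ridgeplot.ymax.expansion bypasses any such automatic choice.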
The function creates a dataframe containing the metadata or expression data associated with the given var (or if a vector of data is provided, that data). On the discrete axis, data will be grouped by the metadata given to group.by and colored by the metadata given to color.by. The assay and slot inputs can be used to change what expression data is used when displaying gene expression. If a set of cells to use is indicated with the cells.use input, the data is subset to include only those cells before plotting.
The plots argument determines the types of data representation that will be generated, as well as their order from back to front. Options are "jitter", "boxplot", "vlnplot", and "ridgeplot". Inclusion of "ridgeplot" overrides "boxplot" and "vlnplot" presence and changes the plot to be horizontal.
When split.by is provided the name of a metadata containing discrete data, separate plots will be produced representing each of the distinct groupings of the split.by data.
dittoRidgePlot, dittoRidgeJitter, and dittoBoxPlot are included as wrappers of the basic dittoPlot function that simply change the default for the plots input to be "ridgeplot", c("ridgeplot","jitter"), or c("boxplot","jitter"), to make such plots even easier to produce.
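The wrapper pattern described above, where a wrapper only changes the default of one input and forwards everything else, can be sketched in base R. Both functions below are hypothetical stand-ins rather than dittoSeq code, and the c("jitter", "vlnplot") default is an assumption taken from the "jitter behind a vlnplot" comment in the examples.

```r
# Hypothetical stand-in for a plotting function with a 'plots' argument.
# It only reports which representations would be drawn, back to front.
base_plot_sketch <- function(var, group.by, plots = c("jitter", "vlnplot")) {
  paste(plots, collapse = " -> ")
}

# A wrapper in the style described above: same function, different default.
box_plot_sketch <- function(..., plots = c("boxplot", "jitter")) {
  base_plot_sketch(..., plots = plots)
}

default_layers <- base_plot_sketch("gene1", "timepoint")
wrapper_layers <- box_plot_sketch("gene1", "timepoint")
# The wrapper's default can still be overridden by the caller
override_layers <- box_plot_sketch("gene1", "timepoint", plots = "ridgeplot")
```

Because only a default changes, any other input accepted by the base function can be passed through the wrapper unchanged.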
A ggplot in which continuous data, grouped by sample, age, cluster, etc., is shown either on the y-axis by a violin plot, boxplot, and/or jittered points, or on the x-axis by a ridgeplot with or without jittered points.
Alternatively, when data.out = TRUE, a list containing the plot ("p") and the underlying data as a dataframe ("data").
Alternatively, when do.hover = TRUE, a plotly-converted version of the ggplot where additional data will be displayed when the cursor is hovered over jitter points.
dittoRidgePlot(): Plots continuous data for customizable cells'/samples' groupings horizontally in a density representation
dittoRidgeJitter(): dittoRidgePlot, but with jitter overlaid
dittoBoxPlot(): Plots continuous data for customizable cells'/samples' groupings in boxplot form
The plots argument determines the types of data representation that will be generated, as well as their order from back to front. Options are "jitter", "boxplot", "vlnplot", and "ridgeplot".
Each plot type has specific associated options which are controlled by variables that start with their associated string. For example, all jitter adjustments start with "jitter.", such as jitter.size and jitter.width.
Inclusion of "ridgeplot" overrides "boxplot" and "vlnplot" presence and changes the plot to be horizontal.
Additionally:
Colors can be adjusted with color.panel.
Subgroupings: color.by can be utilized to split major group.by groupings into subgroups. When this is done in y-axis plotting, dittoSeq automatically ensures the centers of all geoms will align, but users will need to manually adjust jitter.width to less than 0.5/num_subgroups to avoid overlaps. Three inputs control geom-center placement; the easiest way to adjust all of them at once is to change vlnplot.width. The other two are boxplot.position.dodge and jitter.position.dodge.
Line(s) can be added at single or multiple value(s) by providing these values to add.line. Linetype and color are set with line.linetype, which is "dashed" by default, and line.color, which is "black" by default.
Titles and axes labels can be adjusted with the main, sub, xlab, ylab, and legend.title arguments.
The legend can be hidden by setting legend.show = FALSE.
y-axis zoom and tick marks can be adjusted using min, max, and y.breaks.
x-axis labels and groupings can be changed / reordered using x.labels and x.reorder, and rotation of these labels can be turned on/off with x.labels.rotate = TRUE/FALSE.
Shapes used in conjunction with shape.by can be adjusted with shape.panel.
Single or multiple additional per-cell features can be retrieved and stashed within the underlying data using extra.vars. This can be very useful for making manual additional alterations after dittoSeq plot generation.
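The jitter.width guidance for subgroupings (less than 0.5/num_subgroups) is simple arithmetic. A minimal sketch follows, with a hypothetical helper name and made-up metadata values; it only computes the bound and does not call dittoSeq.

```r
# Hypothetical helper: upper bound on jitter.width suggested by the
# "less than 0.5/num_subgroups" guidance when color.by creates subgroups.
jitter_width_limit <- function(num_subgroups) {
  stopifnot(num_subgroups >= 1)
  0.5 / num_subgroups
}

# Made-up color.by metadata for illustration: 3 distinct subgroups
conditions <- c("ctrl", "drugA", "drugB", "ctrl", "drugA", "drugB")
limit <- jitter_width_limit(length(unique(conditions)))
limit  # choose a jitter.width below this value
```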
Daniel Bunis
multi_dittoPlot for easy creation of multiple dittoPlots each focusing on a different var.
dittoPlotVarsAcrossGroups to create dittoPlots that show summarized expression (or values for metadata), across groups, of multiple vars in a single plot.
dittoRidgePlot, dittoRidgeJitter, and dittoBoxPlot for shortcuts to a few common 'plots' input settings.
example(importDittoBulk, echo = FALSE)
myRNA

# Basic dittoplot, with jitter behind a vlnplot (looks better with more cells)
dittoPlot(object = myRNA, var = "gene1", group.by = "timepoint")

# Color distinctly from the grouping variable using 'color.by'
dittoPlot(object = myRNA, var = "gene1", group.by = "timepoint",
    color.by = "conditions")
dittoPlot(object = myRNA, var = "gene1", group.by = "conditions",
    color.by = "timepoint")

# Update the 'plots' input to change / reorder the data representations
dittoPlot(myRNA, "gene1", "timepoint",
    plots = c("vlnplot", "boxplot", "jitter"))
dittoPlot(myRNA, "gene1", "timepoint",
    plots = c("ridgeplot", "jitter"))

### Provided wrappers enable certain easy adjustments of the 'plots' parameter.
# Quickly make a Boxplot
dittoBoxPlot(myRNA, "gene1", group.by = "timepoint")
# Quickly make a Ridgeplot, with or without jitter
dittoRidgePlot(myRNA, "gene1", group.by = "timepoint")
dittoRidgeJitter(myRNA, "gene1", group.by = "timepoint")

### Additional Functionality
# Modify the look with intuitive inputs
dittoPlot(myRNA, "gene1", "timepoint",
    plots = c("vlnplot", "boxplot", "jitter"),
    boxplot.color = "white",
    main = "CD3E",
    legend.show = FALSE)

# Data can also be split in other ways with 'shape.by' or 'split.by'
dittoPlot(object = myRNA, var = "gene1", group.by = "timepoint",
    plots = c("vlnplot", "boxplot", "jitter"),
    shape.by = "clustering",
    split.by = "SNP") # single split.by element
dittoPlot(object = myRNA, var = "gene1", group.by = "timepoint",
    plots = c("vlnplot", "boxplot", "jitter"),
    split.by = c("groups","SNP")) # row and col split.by elements

# Multiple genes or continuous metadata can also be plotted by giving them as
# a vector to 'var'. One aesthetic of the plot will then be used to display
# 'var'-info, and you can control which (faceting / "split", x-axis grouping
# / "group", or color / "color") with 'multivar.aes':
dittoPlot(object = myRNA, group.by = "timepoint",
    var = c("gene1", "gene2"))
dittoPlot(object = myRNA, group.by = "timepoint",
    var = c("gene1", "gene2"),
    multivar.aes = "group")
dittoPlot(object = myRNA, group.by = "timepoint",
    var = c("gene1", "gene2"),
    multivar.aes = "color")

# For faceting, instead of using 'split.by', the target data can alternatively
# be given to 'extra.vars' to have it added in the underlying dataframe, then
# faceting can be added manually for extra flexibility
dittoPlot(myRNA, "gene1", "clustering",
    plots = c("vlnplot", "boxplot", "jitter"),
    extra.vars = "SNP") +
    facet_wrap("SNP", ncol = 1, strip.position = "left")
Generates a dittoPlot where data points are per-group summaries of genes/metadata, instead of individual values per cell/sample.
dittoPlotVarsAcrossGroups( object, vars, group.by, color.by = group.by, split.by = NULL, summary.fxn = mean, cells.use = NULL, plots = c("vlnplot", "jitter"), assay = .default_assay(object), slot = .default_slot(object), adjustment = "z-score", swap.rownames = NULL, do.hover = FALSE, main = NULL, sub = NULL, ylab = "make", y.breaks = NULL, min = NA, max = NA, xlab = group.by, x.labels = NULL, x.labels.rotate = NA, x.reorder = NULL, groupings.drop.unused = TRUE, color.panel = dittoColors(), colors = c(seq_along(color.panel)), theme = theme_classic(), jitter.size = 1, jitter.width = 0.2, jitter.color = "black", jitter.position.dodge = boxplot.position.dodge, do.raster = FALSE, raster.dpi = 300, boxplot.width = 0.2, boxplot.color = "black", boxplot.show.outliers = NA, boxplot.outlier.size = 1.5, boxplot.fill = TRUE, boxplot.position.dodge = vlnplot.width, boxplot.lineweight = 1, vlnplot.lineweight = 1, vlnplot.width = 1, vlnplot.scaling = "area", vlnplot.quantiles = NULL, ridgeplot.lineweight = 1, ridgeplot.scale = 1.25, ridgeplot.ymax.expansion = NA, ridgeplot.shape = c("smooth", "hist"), ridgeplot.bins = 30, ridgeplot.binwidth = NULL, add.line = NULL, line.linetype = "dashed", line.color = "black", split.nrow = NULL, split.ncol = NULL, split.adjust = list(), legend.show = TRUE, legend.title = NULL, data.out = FALSE )
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
vars |
String vector (example: |
group.by |
String representing the name of a metadata to use for separating the cells/samples into discrete groups. |
color.by |
String representing the name of a metadata to use for setting fills.
Great for highlighting subgroups when wanted, but it defaults to |
split.by |
1 or 2 strings naming discrete metadata to use for splitting the cells/samples into multiple plots with ggplot faceting. When 2 metadatas are named, c(row,col), the first is used as rows and the second is used for columns of the resulting grid. When 1 metadata is named, shape control can be achieved with |
summary.fxn |
A function which sets how variables' data will be summarized across the groups.
Default is |
cells.use |
String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. |
plots |
String vector which sets the types of plots to include: possibilities = "jitter", "boxplot", "vlnplot", "ridgeplot". Order matters: c("vlnplot", "boxplot", "jitter") will put a violin plot in the back, boxplot in the middle, and then individual dots in the front. See details section for more info. |
assay, slot |
single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use.
See |
adjustment |
When plotting gene expression (or antibody, or other forms of counts data), should that data be used directly or should it be adjusted to be
|
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
do.hover |
Logical. Default = |
main |
String which sets the plot title. |
sub |
String which sets the plot subtitle. |
ylab |
String which sets the y axis label.
Default = a combination of the name of the summary function + |
y.breaks |
Numeric vector, a set of breaks that should be used as major grid lines. c(break1,break2,break3,etc.). |
min, max |
Scalars which control the zoom of the plot. These inputs set the minimum / maximum values of the data to display. Default = NA, which allows ggplot to set these limits based on the range of all data being shown. |
xlab |
String which sets the grouping-axis label (=x-axis for box and violin plots, y-axis for ridgeplots).
Set to |
x.labels |
String vector, c("label1","label2","label3",...) which overrides the names of groupings. |
x.labels.rotate |
Logical which sets whether the labels should be rotated.
Default: |
x.reorder |
Integer vector. A sequence of numbers, from 1 to the number of groupings, for rearranging the order of x-axis groupings. Method: Make a first plot without this input. Then, treating the leftmost grouping as index 1, and the rightmost as index n. Values of x.reorder should be these indices, but in the order that you would like them rearranged to be. Recommendation for advanced users: If you find yourself coming back to this input too many times, an alternative solution that can be easier long-term
is to make the target data into a factor, and to put its levels in the desired order: |
groupings.drop.unused |
Logical. |
color.panel |
String vector which sets the colors to draw from for plot fills. |
colors |
Integer vector, the indexes / order, of colors from color.panel to actually use.
(Provides an alternative to directly modifying |
theme |
A ggplot theme which will be applied before dittoSeq adjustments.
Default = |
jitter.size |
Scalar which sets the size of the jitter shapes. |
jitter.width |
Scalar that sets the width/spread of the jitter in the x direction. Ignored in ridgeplots. Note for when |
jitter.color |
String which sets the color of the jitter shapes |
jitter.position.dodge |
Scalar which adjusts the relative distance between jitter widths when multiple subgroups exist per |
do.raster |
Logical. When set to |
raster.dpi |
Number indicating dots/pixels per inch (dpi) to use for rasterization. Default = 300. |
boxplot.width |
Scalar which sets the width/spread of the boxplot in the x direction |
boxplot.color |
String which sets the color of the lines of the boxplot |
boxplot.show.outliers |
Logical, whether outliers should be included in the boxplot.
Default is |
boxplot.outlier.size |
Scalar which adjusts the size of points used to mark outliers |
boxplot.fill |
Logical, whether the boxplot should be filled in or not. Known bug: when boxplot fill is turned off, outliers do not render. |
boxplot.position.dodge |
Scalar which adjusts the relative distance between boxplots when multiple are drawn per grouping (a.k.a. when |
boxplot.lineweight |
Scalar which adjusts the thickness of boxplot lines. |
vlnplot.lineweight |
Scalar which sets the thickness of the line that outlines the violin plots. |
vlnplot.width |
Scalar which sets the width/spread of violin plots in the x direction |
vlnplot.scaling |
String which sets how the widths of the violin plots are set in relation to each other.
Options are "area", "count", and "width". If the default is not right for your data, I recommend trying "width".
For an explanation of each, see |
vlnplot.quantiles |
Single number or numeric vector of values in [0,1] naming quantiles at which to draw a horizontal line within each violin plot. Example: |
ridgeplot.lineweight |
Scalar which sets the thickness of the ridgeplot outline. |
ridgeplot.scale |
Scalar which sets the distance/overlap between ridgeplots. A value of 1 means the tallest density curve just touches the baseline of the next higher one. Higher numbers lead to greater overlap. Default = 1.25 |
ridgeplot.ymax.expansion |
Scalar which adjusts the minimal space between the top-most grouping and the top of the plot in order to ensure that the curve is not cut off by the plotting grid. The larger the value, the greater the space requested. When left as NA, dittoSeq will attempt to determine an ideal value itself based on the number of groups & linear interpolation between these goal posts: 0.6 when g<=3, 0.1 when g==12, and 0.05 when g>=34, where g is the number of groups. |
ridgeplot.shape |
Either "smooth" or "hist", sets whether ridges will be smoothed (the typical, and default) versus rectangular like a histogram.
(Note: as of the time shape "hist" was added, combination of jittered points is not supported by the |
ridgeplot.bins |
Integer which sets how many chunks to break the x-axis into when |
ridgeplot.binwidth |
Integer which sets the width of chunks to break the x-axis into when |
add.line |
numeric value(s) where one or multiple line(s) should be added |
line.linetype |
String which sets the type of line for |
line.color |
String that sets the color(s) of the |
split.nrow, split.ncol |
Integers which set the dimensions of faceting/splitting when a single metadata is given to |
split.adjust |
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 metadata to |
legend.show |
Logical. Whether the legend should be displayed. Default = |
legend.title |
String or |
data.out |
Logical. When set to |
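The x.reorder entry in the argument table suggests, as a longer-term alternative, turning the grouping data into a factor with levels in the desired order. Both approaches are sketched below in base R with made-up data. The assumption that non-factor groupings start out in alphabetical order is ours; check the order in a first plot before relying on it.

```r
# Made-up grouping data for illustration
timepoint <- c("d7", "d0", "d3", "d0", "d7", "d3")

# Assumed default grouping order (alphabetical): d0, d3, d7
default_order <- levels(factor(timepoint))

# x.reorder-style indices: show the 3rd default grouping first,
# then the 1st, then the 2nd
x.reorder <- c(3, 1, 2)
reordered <- default_order[x.reorder]

# Factor alternative: store the desired order in the levels themselves
timepoint_fct <- factor(timepoint, levels = c("d7", "d0", "d3"))
levels(timepoint_fct)
```

With the factor approach, the order is stored with the data, so it does not need to be restated in every plotting call.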
Generally, this function will output a dittoPlot where each data point represents a gene (or metadata) rather than a cell/sample. Values are the summary (mean by default) of the values for each gene or metadata requested with vars, within each group set by group.by.
To start with, the data for each element of vars is obtained. When elements are genes/features, assay and slot are utilized to determine which expression data to use, and adjustment determines if and how the expression data might be adjusted. By default, a z-score adjustment is applied to all gene/feature vars. Note that this adjustment is applied before cells/samples subsetting.
x-axis groupings are then determined using group.by, and data for each variable is summarized using the summary.fxn.
Finally, data is plotted with the data representation types in plots.
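The data preparation steps above (adjust, subset, group, summarize) can be sketched in base R on a toy matrix. This is an illustration under stated assumptions, not dittoSeq's implementation: the z-score is assumed to be the usual per-feature centering and scaling across all samples, and all object names and values are made up.

```r
# Toy expression matrix: 3 genes x 4 samples (made-up values)
expr <- matrix(
  c(1, 2, 3, 4,
    10, 10, 20, 20,
    0, 5, 0, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"), paste0("s", 1:4)))
group <- c(s1 = "A", s2 = "A", s3 = "B", s4 = "B")  # stands in for group.by

# 1. z-score each gene across ALL samples, before any subsetting
z_all <- t(scale(t(expr)))

# 2. Subset to the requested samples (stands in for cells.use)
cells.use <- c("s1", "s3", "s4")
z <- z_all[, cells.use, drop = FALSE]

# 3. Summarize each gene within each group (stands in for summary.fxn)
summary.fxn <- mean
summ <- sapply(
  split(colnames(z), group[colnames(z)]),
  function(cols) apply(z[, cols, drop = FALSE], 1, summary.fxn))
summ  # rows = genes (one data point each), columns = groups
```

Because step 1 uses every sample, the z-scores of the retained samples still reflect the excluded sample s2.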
A ggplot object.
Alternatively, when data.out = TRUE, a list containing the plot ("p") and the underlying data as a dataframe ("data").
Alternatively, when do.hover = TRUE, a plotly-converted version of the plot where additional data will be displayed when the cursor is hovered over jitter points.
The plots argument determines the types of data representation that will be generated, as well as their order from back to front. Options are "jitter", "boxplot", "vlnplot", and "ridgeplot".
Each plot type has specific associated options which are controlled by variables that start with their associated string. For example, all jitter adjustments start with "jitter.", such as jitter.size and jitter.width.
Inclusion of "ridgeplot" overrides "boxplot" and "vlnplot" presence and changes the plot to be horizontal.
Additionally:
Colors can be adjusted with color.panel.
Subgroupings: color.by can be utilized to split major group.by groupings into subgroups. When this is done in y-axis plotting, dittoSeq automatically ensures the centers of all geoms will align, but users will need to manually adjust jitter.width to less than 0.5/num_subgroups to avoid overlaps. Three inputs control geom-center placement; the easiest way to adjust all of them at once is to change vlnplot.width. The other two are boxplot.position.dodge and jitter.position.dodge.
Line(s) can be added at single or multiple value(s) by providing these values to add.line. Linetype and color are set with line.linetype, which is "dashed" by default, and line.color, which is "black" by default.
Titles and axes labels can be adjusted with the main, sub, xlab, ylab, and legend.title arguments.
The legend can be hidden by setting legend.show = FALSE.
y-axis zoom and tick marks can be adjusted using min, max, and y.breaks.
x-axis labels and groupings can be changed / reordered using x.labels and x.reorder, and rotation of these labels can be turned on/off with x.labels.rotate = TRUE/FALSE.
Shapes used in conjunction with shape.by can be adjusted with shape.panel.
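In the dittoPlotVarsAcrossGroups usage signature, boxplot.position.dodge defaults to vlnplot.width and jitter.position.dodge defaults to boxplot.position.dodge, which is why changing vlnplot.width alone moves all three geom centers. The hypothetical function below mirrors only those three defaults to show how R resolves such chained default arguments; it is not part of dittoSeq.

```r
# Hypothetical function mirroring only the three width/dodge defaults
# from the usage signature.
dodge_defaults_sketch <- function(
    vlnplot.width = 1,
    boxplot.position.dodge = vlnplot.width,
    jitter.position.dodge = boxplot.position.dodge) {
  c(vlnplot.width = vlnplot.width,
    boxplot.position.dodge = boxplot.position.dodge,
    jitter.position.dodge = jitter.position.dodge)
}

# All three follow vlnplot.width unless set explicitly
all_default <- dodge_defaults_sketch()
narrower <- dodge_defaults_sketch(vlnplot.width = 0.6)
# Setting boxplot.position.dodge also changes the jitter default
mixed <- dodge_defaults_sketch(vlnplot.width = 0.6, boxplot.position.dodge = 0.8)
```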
Daniel Bunis
dittoPlot and multi_dittoPlot for plotting of single or multiple expression and metadata vars, each as separate plots, on a per cell/sample basis.
dittoDotPlot for an alternative representation of per-group summaries of multiple vars where all vars are displayed separately, but still in a single plot.
example(importDittoBulk, echo = FALSE)

# Pick a set of genes
genes <- getGenes(myRNA)[1:30]

dittoPlotVarsAcrossGroups(
    myRNA, genes, group.by = "timepoint")

# Color can be controlled separately from grouping with 'color.by'
# Just note: all groupings must map to a single color.
dittoPlotVarsAcrossGroups(myRNA, genes, "timepoint",
    color.by = "conditions")

# To change it to have the violin plot in the back, a jitter on
# top of that, and a white boxplot with no fill in front:
dittoPlotVarsAcrossGroups(myRNA, genes, "timepoint",
    plots = c("vlnplot","jitter","boxplot"),
    boxplot.color = "white",
    boxplot.fill = FALSE)

## Data can be summarized in other ways by changing the summary.fxn input.
# median
dittoPlotVarsAcrossGroups(myRNA, genes, "timepoint",
    summary.fxn = median,
    adjustment = NULL)
# Percent non-zero expression ( = boring for this fake data)
percent <- function(x) {sum(x!=0)/length(x)}
dittoPlotVarsAcrossGroups(myRNA, genes, "timepoint",
    summary.fxn = percent,
    adjustment = NULL)

# To investigate the identities of outlier genes, we can turn on hovering
# (if the plotly package is available)
if (requireNamespace("plotly", quietly = TRUE)) {
    dittoPlotVarsAcrossGroups(
        myRNA, genes, "timepoint",
        do.hover = TRUE)
}
Show RNAseq data overlaid on a scatter plot
dittoScatterPlot( object, x.var, y.var, color.var = NULL, shape.by = NULL, split.by = NULL, extra.vars = NULL, cells.use = NULL, multivar.split.dir = c("col", "row"), show.others = FALSE, split.show.all.others = TRUE, size = 1, opacity = 1, color.panel = dittoColors(), colors = seq_along(color.panel), split.nrow = NULL, split.ncol = NULL, split.adjust = list(), assay.x = .default_assay(object), slot.x = .default_slot(object), adjustment.x = NULL, assay.y = .default_assay(object), slot.y = .default_slot(object), adjustment.y = NULL, assay.color = .default_assay(object), slot.color = .default_slot(object), adjustment.color = NULL, assay.extra = .default_assay(object), slot.extra = .default_slot(object), adjustment.extra = NULL, swap.rownames = NULL, shape.panel = c(16, 15, 17, 23, 25, 8), rename.color.groups = NULL, rename.shape.groups = NULL, min.color = "#F0E442", max.color = "#0072B2", min = NA, max = NA, order = c("unordered", "increasing", "decreasing", "randomize"), xlab = x.var, ylab = y.var, main = "make", sub = NULL, theme = theme_bw(), do.hover = FALSE, hover.data = NULL, hover.assay = .default_assay(object), hover.slot = .default_slot(object), hover.adjustment = NULL, do.contour = FALSE, contour.color = "black", contour.linetype = 1, add.trajectory.lineages = NULL, add.trajectory.curves = NULL, trajectory.cluster.meta, trajectory.arrow.size = 0.15, do.letter = FALSE, do.ellipse = FALSE, do.label = FALSE, labels.size = 5, labels.highlight = TRUE, labels.repel = TRUE, labels.split.by = split.by, labels.repel.adjust = list(), legend.show = TRUE, legend.color.title = "make", legend.color.size = 5, legend.color.breaks = waiver(), legend.color.breaks.labels = waiver(), legend.shape.title = shape.by, legend.shape.size = 5, do.raster = FALSE, raster.dpi = 300, data.out = FALSE )
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
x.var, y.var |
Single string giving a gene or metadata that will be used for the x- and y-axis of the scatterplot. Note: must be continuous. Alternatively, can be a directly supplied numeric vector of length equal to the total number of cells/samples in |
color.var |
Single string giving a gene or metadata that will set the color of cells/samples in the plot. Alternatively, can be a directly supplied numeric or string vector or a factor of length equal to the total number of cells/samples in |
shape.by |
Single string giving a metadata (Note: must be discrete.) that will set the shape of cells/samples in the plot. Alternatively, can be a directly supplied string vector or a factor of length equal to the total number of cells/samples in |
split.by |
1 or 2 strings naming discrete metadata to use for splitting the cells/samples into multiple plots with ggplot faceting. When 2 metadatas are named, c(row,col), the first is used as rows and the second is used for columns of the resulting grid. When 1 metadata is named, shape control can be achieved with |
extra.vars |
String vector providing names of any extra metadata to be stashed in the dataframe supplied to Useful for making custom alterations after dittoSeq plot generation. |
cells.use |
String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. |
multivar.split.dir |
"row" or "col", sets the direction of faceting used for 'var' values when |
show.others |
Logical. FALSE by default, whether other cells should be shown in the background in light gray. |
split.show.all.others |
Logical which sets whether gray "others" cells of facets should include all cells of other facets ( |
size |
Number which sets the size of data points. Default = 1. |
opacity |
Number between 0 and 1. Great for when you have MANY overlapping points, this sets how solid the points should be: 1 = not see-through at all. 0 = invisible. Default = 1. (In terms of typical ggplot variables, = alpha) |
color.panel |
String vector which sets the colors to draw from. |
colors |
Integer vector, the indexes / order, of colors from color.panel to actually use |
split.nrow , split.ncol
|
Integers which set the dimensions of faceting/splitting when a single metadata is given to split.by. |
split.adjust |
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 metadata to split.by, see facet_wrap, OR when giving 2 metadata to split.by, see facet_grid. |
assay.x , assay.y , assay.color , assay.extra , slot.x , slot.y , slot.color , slot.extra
|
single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use for each given data target.
See |
adjustment.x , adjustment.y , adjustment.color , adjustment.extra
|
For the given data target, when targeting gene / feature expression, should that data be used directly (default) or should it be adjusted to be
|
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
shape.panel |
Vector of integers corresponding to ggplot shapes which sets what shapes to use.
When discrete groupings are supplied by shape.by, this sets the panel of shapes. Note: Unfortunately, shapes can be hard to see when points are on top of each other & they are more slowly processed by the brain. For these reasons, even as a color blind person myself writing this code, I recommend use of colors for variables with many discrete values. |
rename.color.groups , rename.shape.groups
|
String vector containing new names for the identities of the color or shape overlay groups. |
min.color |
color for |
max.color |
color for |
min , max
|
Number which sets the values associated with the minimum or maximum colors. |
order |
String. If the data should be plotted based on the order of the color data, sets whether to plot (from back to front) in "increasing", "decreasing", "randomize" order.
If left as "unordered", plot order is simply based on the order of cells within the |
xlab , ylab
|
Strings which set the labels for the axes. To remove, set to NULL. |
main |
String, sets the plot title.
A default title is automatically generated if based on |
sub |
String, sets the plot subtitle. |
theme |
A ggplot theme which will be applied before dittoSeq adjustments.
Default = theme_bw() |
do.hover |
Logical which controls whether the object will be converted to a plotly object so that data about individual points will be displayed when you hover your cursor over them.
|
hover.data |
String vector of gene and metadata names, example: |
hover.assay , hover.slot , hover.adjustment
|
Similar to the x, y, color, and extra versions, when showing expression data upon hover, these set what data will be shown. |
do.contour |
Logical. Whether density-based contours should be displayed. |
contour.color |
String that sets the color(s) of the |
contour.linetype |
String or numeric which sets the type of line used for |
add.trajectory.lineages |
List of vectors representing trajectory paths, each from start-cluster to end-cluster, where vector contents are the names of clusters provided in the If the |
add.trajectory.curves |
List of matrices, each representing coordinates for a trajectory path, from start to end, where matrix columns represent x and y coordinates of the paths. |
trajectory.cluster.meta |
String name of metadata containing the clusters that were used for generating trajectories. Required when plotting trajectories using the |
trajectory.arrow.size |
Number representing the size of trajectory arrows, in inches. Default = 0.15. |
do.letter |
Logical which sets whether letters should be added on top of the colored dots. For extended colorblindness compatibility.
NOTE: |
do.ellipse |
Logical. Whether the groups should be surrounded by median-centered ellipses. |
do.label |
Logical. Whether to add text labels near the center (median) of clusters for grouping vars. |
labels.size |
Size of the labels' text |
labels.highlight |
Logical. Whether the labels should have a box behind them |
labels.repel |
Logical, that sets whether the labels' placements will be adjusted with ggrepel to avoid intersections between labels and plot bounds. TRUE by default. |
labels.split.by |
String of one or two metadata names which controls the facet-split calculations for label placements.
Defaults to split.by. |
labels.repel.adjust |
A named list which allows extra parameters to be pushed through to ggrepel function calls.
List elements should be valid inputs to the |
legend.show |
Logical. Whether any legend should be displayed. Default = TRUE. |
legend.color.title , legend.shape.title
|
Strings which set the title for the color or shape legends. |
legend.color.size , legend.shape.size
|
Numbers representing the size at which shapes should be plotted in the color and shape legends (for discrete variable plotting). Default = 5. *Enlarging the icons in the colors legend is incredibly helpful for making colors more distinguishable by color blind individuals. |
legend.color.breaks |
Numeric vector which sets the discrete values to label in the color-scale legend for continuous data. |
legend.color.breaks.labels |
String vector, with same length as |
do.raster |
Logical. When set to |
raster.dpi |
Number indicating dots/pixels per inch (dpi) to use for rasterization. Default = 300. |
data.out |
Logical. When set to |
This function creates a dataframe with X, Y, color, shape, and faceting data determined by x.var, y.var, color.var, shape.by, and split.by.
Any extra gene or metadata requested with extra.vars is added as well.
For expression/counts data, assay, slot, and adjustment inputs (.x, .y, and .color) can be used to change which data is used, and whether it should be adjusted in some way.
Next, if a set of cells or samples to use is indicated with the cells.use input, then the dataframe is split into Target_data and Others_data based on subsetting by the target cells/samples.
Finally, a scatter plot is created using these dataframes.
Non-target cells are colored in gray if show.others=TRUE, and target cell data is displayed on top, colored and shaped based on the color.var- and shape.by-associated data.
If split.by was used, the plot will be split into a matrix of panels based on the associated groupings.
A ggplot scatterplot where colored dots and/or shapes represent individual cells/samples. X and Y axes can be gene expression, numeric metadata, or manually supplied values.
Alternatively, if data.out=TRUE, a list containing three slots is output: the plot (named 'p'), a data.table containing the underlying data for target cells (named 'Target_data'), and a data.table containing the underlying data for non-target cells (named 'Others_data').
Alternatively, if do.hover is set to TRUE, the plot is converted from ggplot to plotly & cell/sample information, determined by the hover.data input, is retrieved, added to the dataframe, and displayed upon hovering the cursor over the plot.
size and opacity can be used to adjust the size and transparency of the data points.
Colors used can be adjusted with color.panel and/or colors for discrete data, or min, max, min.color, and max.color for continuous data.
Shapes used can be adjusted with shape.panel.
Color and shape labels can be changed using rename.color.groups and rename.shape.groups.
Titles and axes labels can be adjusted with the main, sub, xlab, ylab, legend.color.title, and legend.shape.title arguments.
Legends can also be adjusted in other ways, using variables that all start with "legend." for easy tab completion lookup.
Daniel Bunis and Jared Andrews
getGenes and getMetas to see what the x.var, y.var, color.var, shape.by, and hover.data options are of an object.
dittoDimPlot for making very similar data representations, but where dimensionality reduction (PCA, t-SNE, UMAP, etc.) dimensions are the scatterplot axes.
dittoDimHex and dittoScatterHex for showing very similar data representations, but where nearby cells are summarized together in hexagonal bins.
example(importDittoBulk, echo = FALSE)
myRNA

# Mock up some nCount_RNA and nFeature_RNA metadata
#  == the default way to extract
myRNA$nCount_RNA <- runif(60,200,1000)
myRNA$nFeature_RNA <- myRNA$nCount_RNA*runif(60,0.95,1.05)
# and also percent.mito metadata
myRNA$percent.mito <- sample(c(runif(50,0,0.05),runif(10,0.05,0.2)))

dittoScatterPlot(
    myRNA, x.var = "nCount_RNA", y.var = "nFeature_RNA")

# Shapes or colors can be overlaid representing discrete metadata
#   or (only colors) continuous metadata / expression data by providing
#   metadata or gene names to 'color.var' and 'shape.by'
dittoScatterPlot(
    myRNA, x.var = "gene1", y.var = "gene2",
    color.var = "groups",
    shape.by = "SNP",
    size = 3)
dittoScatterPlot(
    myRNA, x.var = "gene1", y.var = "gene2",
    color.var = "gene3")

# Note: scatterplots like this can be very useful for dataset QC, especially
#   with percentage of mitochondrial reads as the color overlay.
dittoScatterPlot(myRNA,
    x.var = "nCount_RNA", y.var = "nFeature_RNA",
    color.var = "percent.mito")

# Data can be "split" or faceted by a discrete variable as well.
dittoScatterPlot(myRNA, x.var = "gene1", y.var = "gene2",
    split.by = "timepoint") # single split.by element
dittoScatterPlot(myRNA, x.var = "gene1", y.var = "gene2",
    split.by = c("groups","SNP")) # row and col split.by elements
# OR with 'extra.vars' plus manually faceting for added control
dittoScatterPlot(myRNA, x.var = "gene1", y.var = "gene2",
    extra.vars = c("SNP")) +
    facet_wrap("SNP", ncol = 1, strip.position = "left")

# Contours can also be added to help illuminate overlapping samples
dittoScatterPlot(myRNA, x.var = "gene1", y.var = "gene2",
    do.contour = TRUE)

# Multiple continuous metadata or genes can also be plotted together by
#   giving that vector to 'color.var':
dittoScatterPlot(myRNA, x.var = "gene1", y.var = "gene2",
    color.var = c("gene3", "gene4"))
# This functionality can be combined with 1 additional 'split.by' variable,
#   with the directionality then controlled via 'multivar.split.dir':
dittoScatterPlot(myRNA, x.var = "gene1", y.var = "gene2",
    color.var = c("gene3", "gene4"),
    split.by = "timepoint",
    multivar.split.dir = "col")
dittoScatterPlot(myRNA, x.var = "gene1", y.var = "gene2",
    color.var = c("gene3", "gene4"),
    split.by = "timepoint",
    multivar.split.dir = "row")
This package was built to make the visualization of single-cell and bulk RNA-sequencing data pipeline-agnostic and accessible for both experienced and novice coders, and for color vision impaired individuals.
Includes many plotting functions (dittoPlot, dittoDimPlot, dittoBarPlot, dittoHeatmap, ...), helper functions (meta, gene, isMeta, getMetas, ...), and color adjustment functions (Simulate, Darken, Lighten), to aid in making sense of RNA sequencing data.
All included plotting functions produce a ggplot object (or pheatmap / Heatmap for dittoHeatmap) by default and can spit out a full plot with just a few arguments.
Many additional arguments are available for customization to generate complex, publication-ready figures.
Default dittoColors are color blindness friendly and adapted from Wong B, "Points of view: Color blindness." Nature Methods, 2011.
To report bugs, suggest new features, or ask for help, the best method is to create an issue on the package's GitHub repository, or to post on the Bioconductor support site (be sure to tag 'dittoSeq' so that I get a notification!).
Daniel Bunis
Returns the expression values of a gene for all cells/samples
gene( gene, object, assay = .default_assay(object), slot = .default_slot(object), adjustment = NULL, adj.fxn = NULL, swap.rownames = NULL )
gene |
String, REQUIRED. The (quoted) name of the gene whose expression data should be retrieved. |
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
assay , slot
|
single strings or integers (SCEs and SEs) or a vector of such values that set which expression data to use.
See |
adjustment |
Should expression data be used directly (default) or should it be adjusted to be
|
adj.fxn |
A function which takes a vector (of metadata values) and returns a vector of the same length. For example, |
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
Returns the expression values of a gene for all cells/samples.
Daniel Bunis
example(importDittoBulk, echo = FALSE)

gene("gene1", object = myRNA, assay = "counts")

# z-scored
gene("gene1", object = myRNA, assay = "counts",
    adjustment = "z-score")

# Log2'd
gene("gene1", object = myRNA, assay = "counts",
    adj.fxn = function(x) {log2(x)})

# To see expression of the gene for the default assay that dittoSeq would use
# leave out the assay input
# (For this object, the default assay is the logcounts assay)
gene("gene1", myRNA)

# Seurat (raw counts)
# Runs only when Seurat is installed and a Seurat object named 'pbmc'
# (a placeholder for your own object) exists in the session.
if (requireNamespace("Seurat", quietly = TRUE) && exists("pbmc")) {
    gene("CD14", object = pbmc, assay = "RNA", slot = "counts")
}
Control of Gene/Feature targeting
As of dittoSeq version 1.15.2, we made it possible to target genes / features from across multiple modalities. Here, we describe intricacies of how 'assay', 'slot', and 'swap.rownames' inputs now work to allow for this purpose.
Control of gene/feature targeting in dittoSeq functions aims to blend seamlessly with how similar control works in Seurat, SingleCellExperiment (SCE), and other packages that deal with these data structures. However, as we've built in new features into dittoSeq, and the Seurat and SCE-package maintainers extend their tools as well, some divergence was to be expected.
The way Seurat and SingleCellExperiment objects hold data from multiple modalities is quite distinct, thus it is worth describing each distinctly.
It's also important to note that both structures utilize the term 'assay', but with distinct meanings. Keep that in mind because we chose to stick with the native terminologies within dittoSeq in order to maintain intuitiveness with other Seurat or SCE data accession methods. In other words, rather than enforcing a new consistent paradigm, the native Seurat 'assay' meaning is respected for Seurat objects, and the native SCE 'assay' meaning is respected for SCE objects.
When not provided by the user, the defaults for assay and slot inputs are:
Seurat-v3+: assay = DefaultAssay(object), slot = "data"
Seurat-v2 (v2 pre-dates Seurat's own multi-modal capabilities): assay is not used, slot = "data"
SingleCellExperiment or SummarizedExperiment: assay = whichever of "logcounts", "normcounts", or "counts" are found to exist first, prioritized in that order, otherwise the first assay of object's top-level / primary modality; slot is not used.
The default for swap.rownames is NULL, a.k.a. not used.
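The SCE/SE default-assay rule above can be sketched in base R. This is only an illustration of the stated priority order; pick_default_sce_assay is a hypothetical helper written for this example, not dittoSeq's internal .default_assay() function.

```r
# Hypothetical helper (not part of dittoSeq): choose a default assay name
# from the assay names of an object's primary modality.
pick_default_sce_assay <- function(assay_names) {
  preferred <- c("logcounts", "normcounts", "counts")  # documented priority order
  found <- preferred[preferred %in% assay_names]
  # Fall back to the first assay when none of the preferred names exist.
  if (length(found) > 0) found[1] else assay_names[1]
}

pick_default_sce_assay(c("counts", "logcounts"))  # "logcounts"
pick_default_sce_assay(c("tpm", "fpkm"))          # "tpm"
```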
For Seurat objects, dittoSeq makes use of its assay and slot inputs for gene/feature retrieval control, and ultimately makes use of Seurat's GetAssayData function for extracting data. (See: '?SeuratObject::GetAssayData')
To allow targeting of features across multiple modalities, we allow provision of multiple assay names to dittoSeq's version of the 'assay' input. Internally, dittoSeq will then loop through all values of 'assay', making a separate call to GetAssayData for each assay.
Otherwise, dittoSeq's assay and slot inputs work exactly the same as described in Seurat's documentation.
Phrased another way, it works via inputs:
assay - takes the name(s) of Seurat Assays to target. Examples: "RNA" or c("RNA", "ADT")
slot - "counts", "data", or "scale.data". Directs which 'slot' of data from the targeted assays to extract from. Example: "data"
As an example, if you wanted to plot raw counts data from 1) the CD4 gene of the RNA assay and 2) the CD4.1 marker of an ADT assay, you would:
1. point the var or vars input of the plotter to c("CD4", "CD4.1")
2. target both modalities via assay = c("RNA", "ADT") (Note that "RNA" and "ADT" are the default assay names typically used, but you do need to match with what is in your own Seurat object if your assays are named differently.)
3. target the raw counts data via slot = "counts"
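A closely related case can be sketched with dittoScatterPlot, whose x and y targets each take their own assay.x / assay.y and slot.x / slot.y inputs (documented in its argument list). The wrapper name plot_cd4_rna_vs_adt and the seurat_obj argument are hypothetical, and the sketch assumes a Seurat object holding assays named "RNA" and "ADT" with features "CD4" and "CD4.1".

```r
# Hypothetical wrapper: RNA-level CD4 counts on the x-axis against
# ADT-level CD4.1 counts on the y-axis, for a multi-modal Seurat object.
plot_cd4_rna_vs_adt <- function(seurat_obj) {
  dittoSeq::dittoScatterPlot(
    seurat_obj,
    x.var = "CD4",   assay.x = "RNA", slot.x = "counts",
    y.var = "CD4.1", assay.y = "ADT", slot.y = "counts")
}

# Usage, once a suitable object exists in your session:
# plot_cd4_rna_vs_adt(my_seurat_object)
```

For plotters that take var or vars, the same targeting is expressed through the single assay and slot inputs, exactly as in the numbered steps.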
For SCE objects, dittoSeq makes use of its assay input for both modality and data form (the meaning of 'assay' for SCEs) control, and ultimately makes use of the assay and altExp functions for extracting data.
Additionally, we allow use of the swap.rownames input to allow targeting & display of alternative gene/feature names. The implementation here is that rownames of the extracted assay data are swapped out for the given rowData column of the object (or altExp).
When used, note that you will need to use these swapped names for targeting genes / features with gene, var, or vars inputs.
In SCE objects themselves, the primary modality's expression data are stored in 'assay's of the SCE object. You might have one assay containing raw data, and another containing log-normalized data. Additional details of genes/features of this modality, possibly including alternative gene names, can be stored in the object's 'rowData' slot. When additional modalities are collected, the way to store them is via a nested SCE object called an "alternative experiment". Any number of these can be stored in the 'altExps' slot of the SCE object. Each alternative experiment can contain any number of assays. Again each will often have one representing raw data and another representing a normalized form of that data. And, these alternative experiments might also make use of their rowData to store additional characteristics or names of each gene/feature.
The system feels a bit more complicated here, because the SCE system is itself a bit more complicated. But the hope is that this system becomes simple to work with once learned!
To allow targeting of features across multiple modalities, dittoSeq's assay input can be given:
Simplest form: a single string or string vector where values are either the names of an assay of the primary modality OR the name of an alternative experiment to target, with 'main' as an indicator for the primary modality and 'altexp' as a shortcut for indicating "the first altExp". In this form, when 'main', 'altexp', or the actual name of an alternative experiment are used, the first assay of that targeted modality will be used.
Explicit form: a named string or named vector of string values where names indicate the modality/experiment to target and values indicate what assays of those experiments to target. Here again, you can use 'main' or 'altexp' as names to mean the primary modality and "the first altExp", respectively.
These methods can also be combined. A few examples:
Using the simplified method only: assay = c('main', 'altexp', 'hto') will target the first assays each of the main object, of the first alternative experiment of the object, and also of an alternative experiment named 'hto'.
Using the explicit form only: assay = c('main'='logexp', 'adt'='clr', 'altexp'='raw') will target 1) the logexp-named assay of the main object, 2) the clr-named assay of an alternative experiment named 'adt', and 3) the raw-named assay of the first alternative experiment of the object.
Using a combination of the two: assay = c('logexp', 'adt'='clr') will target 1) the logexp-named assay of the primary modality, unless there is an alternative experiment named 'logexp' which will lead to grabbing the first assay of that modality, and 2) the clr-named assay of an alternative experiment named 'adt'.
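To make the resolution rules above concrete, here is a minimal base-R sketch that maps an assay input to (modality, assay) pairs. It only mirrors the rules as written in this section; resolve_assay_targets and its altexp_names argument are hypothetical and do not reproduce dittoSeq's internal code.

```r
# Hypothetical helper (not part of dittoSeq): resolve an 'assay' input into
# modality / assay pairs, given the names of the object's altExps in order.
resolve_assay_targets <- function(assay, altexp_names) {
  nms <- names(assay)
  if (is.null(nms)) nms <- rep("", length(assay))
  first_alt <- if (length(altexp_names) > 0) altexp_names[1] else NA_character_
  modality <- character(length(assay))
  target <- character(length(assay))
  for (i in seq_along(assay)) {
    value <- assay[[i]]
    if (nzchar(nms[i])) {
      # Explicit form: the name gives the modality, the value gives the assay.
      modality[i] <- if (nms[i] == "altexp") first_alt else nms[i]
      target[i] <- value
    } else if (value == "main") {
      modality[i] <- "main"
      target[i] <- "<first assay>"
    } else if (value == "altexp") {
      modality[i] <- first_alt
      target[i] <- "<first assay>"
    } else if (value %in% altexp_names) {
      # An altExp name takes priority over a same-named primary assay.
      modality[i] <- value
      target[i] <- "<first assay>"
    } else {
      modality[i] <- "main"
      target[i] <- value
    }
  }
  data.frame(modality = modality, assay = target, stringsAsFactors = FALSE)
}

alt_names <- c("adt", "hto")  # mock altExp names; 'adt' is the first altExp
resolve_assay_targets(c("main", "altexp", "hto"), alt_names)
resolve_assay_targets(c(main = "logexp", adt = "clr", altexp = "raw"), alt_names)
resolve_assay_targets(c("logexp", adt = "clr"), alt_names)
```

Under the same rules, resolve_assay_targets("altexp", alt_names) points at the first altExp, while resolve_assay_targets(c(main = "altexp"), alt_names) points at a primary-modality assay named 'altexp'.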
The swap.rownames input allows swapping to alternative names for genes/features via provision of a column name of rowData(object).
The values of that rowData column are then used to identify and label features of the modality's assays instead of the original rownames of the assays.
To allow swap.rownames to also work with the multi-modality access system in the most simplified way, the swap.rownames input also has both a simple and an explicit provision system:
Simple form: a single string or string vector where all modalities will be checked for the presence of these values in colnames of their rowData. If multiple matches are found, priority goes to the earlier value. Values of matched rowData columns are then set (internally to dittoSeq only) as the rownames of the modality.
Explicit form: similar to the explicit assay use, a named string or named vector of string values where names indicate the modality/experiment to target and values indicate columns to look for among the given modality's rowData. 'main' should be used as the name / indicator for the primary modality, and 'altexp' can be used as a shortcut for indicating "the first altExp".
Examples:
Simplified1: Using assay = c('main', 'altexp'), swap.rownames = "SYMBOL" with an object where the primary modality rowData has a SYMBOL column and the first alternative experiment's rowData is empty, will lead to swapping to the SYMBOL values for main modality features and use of original rownames for the alternative experiment's features. (You will also see a warning indicating that the rownames were not swapped for the alternative experiment.)
Simplified2: Using assay = c('main', 'altexp'), swap.rownames = "SYMBOL" with an object where both modalities' rowData have a SYMBOL column, will lead to swapping to the SYMBOL values for both modalities (and no warning).
Explicit: Using assay = c('main', 'altexp'), swap.rownames = c(main="SYMBOL") with an object where both modalities' rowData have a SYMBOL column, will lead to swapping to the SYMBOL values for main modality only.
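The three examples above can be mimicked with a small base-R sketch. resolve_swap_rownames and its rowdata_cols argument (a named list of each modality's rowData column names) are hypothetical and only illustrate the matching rules described here; mixed named/unnamed inputs and the 'altexp' shortcut are left out for brevity.

```r
# Hypothetical helper (not part of dittoSeq): for each modality, report which
# rowData column would be swapped in, or NA when rownames stay as they are.
resolve_swap_rownames <- function(swap.rownames, rowdata_cols) {
  nms <- names(swap.rownames)
  vapply(names(rowdata_cols), function(modality) {
    candidates <- if (is.null(nms)) {
      swap.rownames                             # simple form: try every value
    } else {
      unname(swap.rownames[nms == modality])    # explicit form: named values only
    }
    hits <- candidates[candidates %in% rowdata_cols[[modality]]]
    if (length(hits) > 0) hits[1] else NA_character_  # earlier values win
  }, character(1))
}

# Simplified1: only the primary modality has a SYMBOL column
resolve_swap_rownames("SYMBOL", list(main = "SYMBOL", adt = character(0)))
# Simplified2: both modalities have a SYMBOL column
resolve_swap_rownames("SYMBOL", list(main = "SYMBOL", adt = "SYMBOL"))
# Explicit: swap for the primary modality only
resolve_swap_rownames(c(main = "SYMBOL"), list(main = "SYMBOL", adt = "SYMBOL"))
```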
As a full example, if you wanted to plot from 1) the raw 'counts' assay for a CD4 gene of the primary modality and 2) the normalized 'logexp' assay for a CD4.1 marker of an alternative experiment assay named 'ADT', but where 3) the rownames of these modalities are Ensembl ids while gene symbol names are held in a rowData column of both modalities that is named "symbols", the simplest provision method is:
1. point the var or vars input of the plotter to c("CD4", "CD4.1")
2. target the counts assay of the primary modality and logexp assay of the ADT alternative experiment via assay = c('counts', ADT = 'logexp')
3. swap to the symbol names of features from both modalities by also giving swap.rownames = "symbols"
Some choices within dittoSeq's multi-modality implementation for SCEs were made with a prioritization of ease over creation of edge-cases. Thus, a few known edge-cases exist:
Avoid naming alternative experiments as 'main' or 'altexp'. Because these tokens have been chosen as indicators of "top-level data", and "the first alternative experiment", respectively, any alternative experiment given one of these names will not be able to be reliably accessed via dittoSeq's system.
The explicit path is required for top-level assays named 'altexp': use assay = c(main='altexp') for a top-level assay named 'altexp'.
Because we think the "simple path" is usefully simpler for cases where it works, assay = 'altexp' and assay = c('main'='altexp') are not equivalent.
The explicit method MUST be used to extract from an assay named 'altexp' because assay = 'altexp' will instead target the first assay of the first altExp of the SCE.
Dan Bunis
Returns the names of all genes of a target object.
getGenes(object, assay = .default_assay(object), swap.rownames = NULL)
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
assay |
Single string or integer that sets which set of seq data inside the object to check. |
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
A string vector, returns the names of all genes of the object for the requested assay.
Daniel Bunis
isGene for checking whether certain genes are present in an object
gene for obtaining the expression data of genes
example(importDittoBulk, echo = FALSE)

getGenes(object = myRNA, assay = "counts")

# To see all genes of an object for the default assay that dittoSeq would use
# leave out the assay input
getGenes(myRNA)

# Seurat
# pbmc <- Seurat::pbmc_small
# # To see all genes of an object of a particular assay
# getGenes(pbmc, assay = "RNA")
Returns the names of all meta.data slots of a target object.
getMetas(object, names.only = TRUE)
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
names.only |
Logical, |
A string vector of the names of all metadata slots of the object, or alternatively the entire dataframe of metadata if names.only is set to FALSE.
Daniel Bunis
isMeta for checking if certain metadata slots exist in an object
meta for obtaining the contents of metadata slots
example(importDittoBulk, echo = FALSE)

# To see all metadata slots of an object
getMetas(myRNA)

# To retrieve the entire metadata dataframe
getMetas(myRNA, names.only = FALSE)
Returns the names of all dimensionality reduction slots of a target object.
getReductions(object)
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
A string vector of the names of all dimensionality reduction slots of the object.
These represent the options for the reduction.use input of dittoDimPlot.
Daniel Bunis
example("addDimReduction", echo = FALSE)

# To see all dimensionality reduction slots of an object
getReductions(myRNA)
Extracts Demuxlet information into a pre-made SingleCellExperiment or Seurat object
importDemux( object, raw.cell.names = NULL, lane.meta = NULL, lane.names = NA, demuxlet.best, trim.before_ = TRUE, bypass.check = FALSE, verbose = TRUE )
object |
A pre-made Seurat(v3+) or SingleCellExperiment object to add demuxlet information to. |
raw.cell.names |
A string vector consisting of the raw cell barcodes of the object as they would have been output by cellranger aggr. Format per cell.name = NNN...NNN-# where NNN...NNN are the cell barcode nucleotides, and # is the lane number. This input should be used when additional information has been added directly into the cell names outside of Seurat's standard merge prefix: "user-text_". |
lane.meta |
A string which names a metadata slot that contains which cells came from which droplet-generation wells. |
lane.names |
String vector which sets how the lanes should be named (if you want to give them something different from the default = Lane1, Lane2, Lane3...) |
demuxlet.best |
String or String vector pointing to the location(s) of the .best output file from running of demuxlet. Alternatively, a data.frame representing an already imported .best matrix. |
trim.before_ |
Logical which sets whether any characters in front of an "_" should be deleted from the |
bypass.check |
Logical which sets whether the function should run even when meta.data slots would be over-written. |
verbose |
whether to print messages about the stage of this process that is currently being run & also the summary at the end. |
The function takes in a previously generated Seurat or SingleCellExperiment (SCE) object.
It also takes in demuxlet information either in the form of: (1) the location of a single demuxlet.best out file, (2) the locations of multiple demuxlet.best output files, (3) a user-constructed data.frame created by reading in a demuxlet.best file.
Then it matches barcodes and adds demuxlet-information to the Seurat or SCE as metadata.
For a note on how best to utilize this function with multi-lane droplet-based data, see the devoted section below.
Specifically:
1. If a metadata slot name is provided to lane.meta, information in that metadata slot is copied into a metadata slot called "Lane". Alternatively, if lane.meta is left as NULL, separate lanes are assumed to be marked by distinct values of "-#" at the end of cell names, as is the typical output of the 10X cellranger count & aggr pipeline.
(1a. If demuxlet.best was provided as a set of separate file locations (recommended usage in conjunction with 'cellranger aggr'), the "-#" at the ends of BARCODEs columns from these files are incremented on read-in so that they can match the incrementation applied by cellranger aggr. See the section on multi-lane scRNAseq for more.)
2. Barcodes in the demuxlet .best data are then matched to barcodes in the object. The cell names, colnames(object), are used by default for this matching, but if these have been modified from what would have been given to demuxlet (outside of -# at the end or ***_'s at the beginning, as can be added in common merge functions), raw.cell.names can be provided and these cell names used instead.
3. Singlet/doublet/ambiguous calls and sample identities (1st only for doublets) are parsed and carried into metadata.
4. Finally, a summary of the results including mean number of SNPs and percentages of singlets and doublets is output unless verbose is set to FALSE.
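The matching steps above can be illustrated with a minimal base-R sketch. This is not importDemux()'s actual implementation: the cell names, the toy .best table, and the assumption that BEST values look like "SNG-donor1" or "DBL-donor2-donor3-0.500" (and that sample names contain no "-") are all hypothetical, chosen only to show prefix trimming, lane detection, barcode matching, and call parsing.

```r
# Hypothetical cell names, as they might look after a merge that added "prefix_"
cell_names <- c("sampleA_AAACCTGA-1", "sampleA_AAACGGGT-1", "sampleB_TTTGTCAC-2")

# Hypothetical, heavily reduced .best table (real files hold many more columns)
best <- data.frame(
    BARCODE = c("AAACGGGT-1", "TTTGTCAC-2", "AAACCTGA-1", "GGGGGGGG-1"),
    BEST = c("SNG-donor1", "DBL-donor2-donor3-0.500",
             "AMB-donor1-donor2-donor1/donor2", "SNG-donor3"),
    stringsAsFactors = FALSE)

# Drop everything up to the last "_" (the behavior described for trim.before_ = TRUE)
trimmed <- sub("^.*_", "", cell_names)

# With no lane metadata given, read the lane from the trailing "-#"
lane <- paste0("Lane", sub("^.*-", "", trimmed))

# Match each cell to its row of the .best table (NA if a barcode is absent)
idx <- match(trimmed, best$BARCODE)

# Split BEST into the call type and the first sample identity
parts <- strsplit(best$BEST[idx], "-", fixed = TRUE)
call <- vapply(parts, `[`, character(1), 1)
sample <- vapply(parts, `[`, character(1), 2)

result <- data.frame(
    row.names = cell_names,
    Lane = lane,
    Sample = sample,
    demux.doublet.call = call,
    stringsAsFactors = FALSE)
result
```

Barcodes present only in the .best table (the fourth row here) are simply never matched, and a cell missing from the table would receive NA in every imported column.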
The Seurat or SingleCellExperiment object with metadata added for "Sample" calls and other relevant statistics.
Lane information and demuxlet calls and statistics are imported into the object as these metadata:
Lane = guided by the lane.meta input or "-#"s in barcodes, represents the separate droplet-generation lanes.
Sample = The sample call, parsed from the BEST column
demux.doublet.call = whether the sample was a singlet (SNG), doublet (DBL), or ambiguous (AMB), parsed from the BEST column
demux.RD.TOTL = RD.TOTL column
demux.RD.PASS = RD.PASS column
demux.RD.UNIQ = RD.UNIQ column
demux.N.SNP = N.SNP column
demux.PRB.DBL = PRB.DBL column
demux.barcode.dup = (Only generated when TRUEs will exist) whether a cell's barcode in the demuxlet.best referred to more than 1 cell in the object. (When TRUE, indicates that cells from distinct lanes were interpreted together by demuxlet. These will often be mistakenly called as doublets.)
There are many different ways such data might initially be processed which will affect its accessibility to importDemux().
Initial Processing: 10X recommends running cellranger count individually for each well/lane. Non-10X droplet-based data from separate lanes should also be processed separately, at least for the steps of collecting reads for individual cells. NOT processing such droplet lanes separately will create artificial doublets from cells that ended up with similar barcodes, but in separate droplet-gen lanes. Thus, proper initial processing leads to creation of separate counts matrices for each droplet-generation lane.
Combining data from each lane: These per-lane counts matrices can be combined in various ways. All options will alter the cell barcode names in a way that makes them unique across lanes, but how this uniquification is achieved varies.
Counts table combination methods generally do not adjust BAM files, specifically the cell names embedded within the BAM files, which are what demuxlet uses for its BARCODEs column. Thus, cell names may need to be modified in order to make the object's cell names and demuxlet.best's BARCODEs match.
Running Demuxlet: Demuxlet should also be run, separately, on the BAM files of each individual lane. Improperly running demuxlet on a combined BAM file can lead to loss of lane information and then to generation of artificial doublet calls for cells of distinct wells that received similar barcodes. The BAM file associated with each demuxlet run is what is used for generating the BARCODE column of the demuxlet output.
How importDemux() handles barcode matching:
importDemux is built to work with the 'cellranger aggr' pipeline by default, but can be used for demuxlet datasets processed differently as well (Option 2).
Option 1: When you merge matrices of all lanes with cellranger aggr before R import, aggr's barcode uniquification method is to increment a "-1", "-2", "-3", ... "-#" that is appended to the end of all barcode names. The number is incremented for each successive lane. Note that lane-numbers depend on the order in which they were supplied to cellranger aggr.
to use: Simply supply to demuxlet.best a vector containing the locations of the separate '.best' outputs for each lane, in the same order that lanes were provided to aggr. importDemux will adjust the "-#" in the demuxlet.best BARCODEs automatically before performing the matching step.
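As an illustration of the "-#" adjustment described for Option 1, the following base-R sketch rewrites the barcode suffix of each per-lane table to that lane's position in the list. It is only a sketch under stated assumptions: the tables are hypothetical, each per-lane demuxlet run is assumed to report barcodes ending in "-1", and importDemux() itself may perform the increment differently.

```r
# Hypothetical per-lane .best tables, listed in the order given to cellranger aggr
best_lane1 <- data.frame(BARCODE = c("AAACCTGA-1", "AAACGGGT-1"),
                         stringsAsFactors = FALSE)
best_lane2 <- data.frame(BARCODE = c("AAACCTGA-1", "TTTGTCAC-1"),
                         stringsAsFactors = FALSE)
best_list <- list(best_lane1, best_lane2)

# Replace the trailing "-<number>" of lane i with "-i"
adjusted <- lapply(seq_along(best_list), function(i) {
    best <- best_list[[i]]
    best$BARCODE <- sub("-[0-9]+$", paste0("-", i), best$BARCODE)
    best
})

# After adjustment, the same nucleotide barcode in two lanes no longer collides
combined <- do.call(rbind, adjusted)
combined$BARCODE
anyDuplicated(combined$BARCODE) == 0
```

Without the adjustment, "AAACCTGA-1" would appear twice in the combined table, which is the situation that the demux.barcode.dup metadata is described as flagging.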
Option 2: When you instead import your counts data into a Seurat or SingleCellExperiment, and then merge the separate objects into one, the uniquification method is dependent on your particular method.
to use: For these methods, it is easiest to 1) import your counts data, 2) transfer in your demuxlet info with importDemux() to each lane's object individually (you can supply unique lane identifiers to the lane.names input), and then 3) merge the separate objects.
Extra notes for any alternative cases:
For Seurat's merge(), user-defined strings can be appended to the start of the barcodes, followed by an "_". By default, importDemux() will ignore these prefixes; this behavior can be controlled with the trim.before_ input.
Alternatively, cell names that are consistent with the demuxlet.best BARCODEs can be supplied to the raw.cell.names input.
Daniel Bunis
Included QC visualizations:
demux.calls.summary for plotting the number of sample annotations assigned within each lane.
demux.SNP.summary for plotting the number of SNPs measured per cell.
Or, see Kang et al. Nature Biotechnology, 2018 https://www.nature.com/articles/nbt.4042 for more information about the demuxlet cell-sample deconvolution method.
# Prep: loading in an example dataset and sample demuxlet data
example("importDittoBulk", echo = FALSE)
demux <- demuxlet.example
colnames(myRNA) <- demux$BARCODE[seq_len(ncol(myRNA))]

###
### Method 1: Lanes info stored in a metadata
###

# Notice there is a groups metadata in this object.
getMetas(myRNA)
# We will treat these as if that holds Lane information

# Now, running importDemux:
myRNA <- importDemux(
    myRNA,
    lane.meta = "groups",
    demuxlet.best = demux)

# Note, importDemux can also take in the location of the .best file.
# myRNA <- importDemux(
#     object = myRNA,
#     lane.meta = "groups",
#     demuxlet.best = "Location/filename.best")

# demux.SNP.summary() and demux.calls.summary() can now be used.
demux.SNP.summary(myRNA)
demux.calls.summary(myRNA)

###
### Method 2: cellranger aggr combined data (denoted with "-#" in barcodes)
###

# If cellranger aggr was used, lanes will be denoted by "-1", "-2", ... "-#"
# at the ends of cell names.
# Demuxlet should be run on each lane individually.
# Provided locations of each demuxlet.best output file, *in the same order
# that lanes were provided to cellranger aggr*, this function will then
# adjust the "-#" within the .best BARCODEs automatically before matching.
#
# myRNA <- importDemux(
#     object = myRNA,
#     demuxlet.best = c(
#         "Location/filename1.best",
#         "Location/filename2.best"),
#     lane.names = c("g1","g2"))
Imports bulk sequencing data into a SingleCellExperiment format that will work with other dittoSeq functions.
importDittoBulk(x, reductions = NULL, metadata = NULL, combine_metadata = TRUE)
x |
A SummarizedExperiment (includes DESeqDataSets) or DGEList object holding the bulk data to import. Alternatively, for import from a raw matrix format, a named list of matrices (or matrix-like objects) where names will become the assay names of the eventual SCE. NOTE: As of dittoSeq version 1.1.11, all dittoSeq functions can work directly with SummarizedExperiment objects, so this import function is no longer required for such data. |
reductions |
A named list of dimensionality reduction embeddings matrices.
Names will become the names of the dimensionality reductions, which is how each is referred to when plotting. For each matrix, rows should represent the different samples of the dataset, and columns the different dimensions. |
metadata |
A data.frame (or data.frame-like object) where rows represent samples and named columns represent the extra information about such samples that should be accessible to visualizations. The names of these columns can then be used to retrieve and plot such data in any dittoSeq visualization. |
combine_metadata |
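Logical which sets whether the original sample metadata stored within x should be retained and combined with any provided to the metadata input. |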
A SingleCellExperiment object that contains all assays (SummarizedExperiment; includes DESeqDataSets), all standard slots (DGEList; see below for specifics), or expression matrices of the input x, as well as any dimensionality reductions provided to reductions, and any provided metadata stored in colData.
As of dittoSeq version 1.1.11, all dittoSeq functions can work directly with SummarizedExperiment objects, so this import function is no longer required for such data.
One recommended assay to create, if it is not already present in your dataset, is a log-normalized version of the counts data. The logNormCounts function of the scater package is an easy way to make such a slot.
dittoSeq visualizations default to grabbing expression data from an assay named logcounts > normcounts > counts, in that order of preference.
SingleCellExperiment for more information about this storage structure.
library(SingleCellExperiment)

# Generate some random data
nsamples <- 60
exp <- matrix(rpois(1000*nsamples, 20), ncol=nsamples)
colnames(exp) <- paste0("sample", seq_len(ncol(exp)))
rownames(exp) <- paste0("gene", seq_len(nrow(exp)))
logexp <- log2(exp + 1)

# Dimensionality Reductions
pca <- matrix(runif(nsamples*5,-2,2), nsamples)
tsne <- matrix(rnorm(nsamples*2), nsamples)

# Some Metadata
conds <- factor(rep(c("condition1", "condition2"), each=nsamples/2))
timept <- rep(c("d0", "d3", "d6", "d9"), each = 15)
genome <- rep(c(rep(TRUE,7),rep(FALSE,8)), 4)
grps <- sample(c("A","B","C","D"), nsamples, TRUE)
clusts <- as.character(1*(tsne[,1]>0&tsne[,2]>0) +
                       2*(tsne[,1]<0&tsne[,2]>0) +
                       3*(tsne[,1]>0&tsne[,2]<0) +
                       4*(tsne[,1]<0&tsne[,2]<0))
score1 <- seq_len(nsamples)/2
score2 <- rnorm(nsamples)

### We can import the counts directly
myRNA <- importDittoBulk(
    x = list(counts = exp, logcounts = logexp))

### Adding metadata & PCA or other dimensionality reductions
# We can add these directly during import, or after.
myRNA <- importDittoBulk(
    x = list(counts = exp, logcounts = logexp),
    metadata = data.frame(
        conditions = conds,
        timepoint = timept,
        SNP = genome,
        groups = grps),
    reductions = list(
        pca = pca))

myRNA$clustering <- clusts

myRNA <- addDimReduction(
    myRNA,
    embeddings = tsne,
    name = "tsne")

# (other packages' SCE manipulations can also be used)

### When we import from SummarizedExperiment, all metadata is retained.
# The object is just 'upgraded' to hold extra slots.
# The output is the same, aside from a message when metadata are replaced.
se <- SummarizedExperiment(
    list(counts = exp, logcounts = logexp))
myRNA <- importDittoBulk(
    x = se,
    metadata = data.frame(
        conditions = conds,
        timepoint = timept,
        SNP = genome,
        groups = grps,
        clustering = clusts,
        score1 = score1,
        score2 = score2),
    reductions = list(
        pca = pca,
        tsne = tsne))
myRNA

### For DESeq2, how we might have made this:
# DESeqDataSets are SummarizedExperiments, and behave similarly
# library(DESeq2)
# dds <- DESeqDataSetFromMatrix(
#     exp, data.frame(conditions = conds), ~ conditions)
# dds <- DESeq(dds)
# dds_ditto <- importDittoBulk(dds)

### For edgeR, DGELists are a separate beast.
# dittoSeq imports what I know to commonly be inside them, but please submit
# an issue on the github (dtm2451/dittoSeq) if more should be retained.
# library(edgeR)
# dgelist <- DGEList(counts=exp, group=conds)
# dge_ditto <- importDittoBulk(dgelist)
Retrieve whether a given object would be treated as bulk versus single-cell by dittoSeq
isBulk(object)
object |
A target Seurat, SingleCellExperiment, or SummarizedExperiment object |
Logical: whether the provided object would be treated as bulk data by dittoSeq.
TRUE for SummarizedExperiments that are not SCEs, and for SCEs with $bulk = TRUE in their internal metadata.
FALSE for any other object type and for SCEs without such internal metadata.
setBulk to (add to and) set the internal metadata of an SCE to say whether the object represents bulk data.
example(importDittoBulk, echo = FALSE)

myRNA
isBulk(myRNA)

scRNA <- setBulk(myRNA, FALSE)
isBulk(scRNA)
Tests if input is the name of a gene in a target object.
isGene( test, object, assay = .default_assay(object), return.values = FALSE, swap.rownames = NULL )
test |
String or vector of strings, the "potential.gene.name"(s) to check for. |
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
assay |
single string or integer that sets which set of seq data inside the object to check. |
return.values |
Logical which sets whether the function returns a logical vector (FALSE, the default) or instead the values of test that are genes (TRUE). |
swap.rownames |
optionally named string or string vector.
For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object).
When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for.
See |
Returns a logical vector indicating whether each instance in test is a rowname within the requested assay of the object.
Alternatively, returns the values of test that were indeed rownames if return.values = TRUE.
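The two return modes can be pictured with a small base-R sketch. The helper isGeneSketch below is hypothetical and checks against a plain character vector of gene names, whereas the real function is described as checking the rownames of the requested assay of the object.

```r
# Hypothetical stand-in: gene_names plays the role of the assay's rownames
isGeneSketch <- function(test, gene_names, return.values = FALSE) {
    found <- test %in% gene_names
    # Either the logical vector itself, or only the entries that were found
    if (return.values) test[found] else found
}

genes <- paste0("gene", 1:5)
candidates <- c("gene1", "gene2", "not-a-gene")

as_logical <- isGeneSketch(candidates, genes)
as_values <- isGeneSketch(candidates, genes, return.values = TRUE)
as_logical
as_values
```

The return.values = TRUE form is convenient for filtering a candidate gene list down to the features actually present before plotting.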
Daniel Bunis
getGenes for returning all genes in an object
gene for obtaining the expression data of genes
example(importDittoBulk, echo = FALSE)

# To see the first 10 genes of an object of a particular assay
getGenes(myRNA, assay = "counts")[1:10]

# To see all genes of an object for the default assay that dittoSeq would use
# leave out the assay input (again, remove `head()`)
head(getGenes(myRNA))

# To test if something is a gene in an object:
isGene("gene1", object = myRNA) # TRUE
isGene("CD12345", myRNA) # FALSE

# To test if many things are genes of an object
isGene(c("gene1", "gene2", "not-a-gene", "CD12345"), myRNA)

# 'return.values' input is especially useful in these cases.
isGene(c("gene1", "gene2", "not-a-gene", "CD12345"), myRNA,
    return.values = TRUE)
Tests if an input is the name of a meta.data slot in a target object.
isMeta(test, object, return.values = FALSE)
test |
String or vector of strings, the "potential.metadata.name"(s) to check for. |
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
return.values |
Logical which sets whether the function returns a logical vector (FALSE, the default) or instead the values of test that are metadata slots (TRUE). |
For Seurat objects, also returns TRUE for the input "ident" because, for all dittoSeq visualizations, "ident" will retrieve a Seurat object's clustering slot.
Returns a logical or logical vector indicating whether each instance in test is a meta.data slot within the object.
Alternatively, returns the values of test that were indeed metadata slots if return.values = TRUE.
Daniel Bunis
getMetas for returning all metadata slots of an object
meta for obtaining the contents of metadata slots
example(importDittoBulk, echo = FALSE)

# To check if something is a metadata slot
isMeta("timepoint", object = myRNA) # TRUE
isMeta("nCount_RNA", object = myRNA) # FALSE

# To test if many things are metadata of an object
isMeta(c("age","groups"), myRNA) # FALSE, TRUE

# 'return.values' input is especially useful in these cases.
isMeta(c("age","groups"), myRNA,
    return.values = TRUE)

# Alternatively, to see all metadata slots of an object, use getMetas
getMetas(myRNA)
A wrapper for the lighten function of the colorspace package.
Lighten(colors, percent.change = 0.25, relative = TRUE)
colors |
the color(s) input. Can be a list of colors, for example, dittoColors(). |
percent.change |
Number between 0 and 1. The percentage to lighten by. Defaults to 0.25 if not given. |
relative |
TRUE/FALSE. Whether the percentage should be a relative change versus an absolute one. Default = TRUE. |
Returns a lighter version of the color(s) in hexadecimal color form ("#RRGGBB").
Daniel Bunis
Lighten("blue") # "blue" = "#0000FF"
# Output: "#4040FF"
Lighten(dittoColors()[1:8]) # Works for multiple color inputs as well.
Returns the values of a meta.data for all cells/samples
meta(meta, object, adjustment = NULL, adj.fxn = NULL)
meta |
String, the name of the "metadata" slot to grab. OR "ident" to retrieve the clustering of a Seurat object. |
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
adjustment |
A recognized string indicating whether numeric metadata should be used directly (default) versus adjusted to be
Ignored if the target metadata is not numeric. |
adj.fxn |
A function which takes a vector (of metadata values) and returns a vector of the same length. For example, |
Retrieves the values of a metadata slot from object, or the clustering slot if meta = "ident" and the object is a Seurat.
If adjustment or adj.fxn are provided, then these requested adjustments are applied to these values (adjustment first). Note: Alterations via adjustment are only applied when metadata is numeric, but adj.fxn alterations are applied to metadata of any type.
Lastly, the output values are named by the cells'/samples' names.
A named vector.
Daniel Bunis
metaLevels for returning just the unique discrete identities that exist within a metadata slot
getMetas for returning all metadata slots of an object
isMeta for testing whether something is the name of a metadata slot
example(importDittoBulk, echo = FALSE)
meta("groups", object = myRNA)

myRNA$numbers <- seq_len(ncol(myRNA))
meta("numbers", myRNA, adjustment = "z-score")
meta("numbers", myRNA, adj.fxn = as.factor)
meta("numbers", myRNA, adj.fxn = function(x) {log2(x)})
Gives the distinct values of a meta.data slot (or ident)
metaLevels(meta, object, cells.use = NULL, used.only = TRUE)
meta |
String (REQUIRED), the name of the meta.data slot whose potential values should be retrieved. |
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
cells.use |
String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included. Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include. |
used.only |
TRUE by default, for target metadata that are factors, whether levels nonexistent in the target data should be ignored. |
String vector, the distinct values of a metadata slot (factor or not) among all cells/samples, or for a subset of cells/samples.
(Alternatively, returns the distinct values of clustering if meta = "ident" and the object is a Seurat object).
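The effect of used.only and cells.use on factor metadata can be illustrated with base R. The helper metaLevelsSketch is hypothetical and operates on a plain named vector rather than an object; how the package orders levels of non-factor data is an assumption here (sorted, via as.factor).

```r
# Hypothetical stand-in operating on a named metadata vector
metaLevelsSketch <- function(values, cells.use = NULL, used.only = TRUE) {
    # cells.use may be names, integer indices, or a logical vector
    if (!is.null(cells.use)) values <- values[cells.use]
    if (is.factor(values)) {
        # For factors, optionally ignore levels with no remaining members
        if (used.only) values <- droplevels(values)
        return(levels(values))
    }
    levels(as.factor(values))
}

# A factor whose level "3" is defined but never used
clustering <- factor(c("1", "2", "2", "4"), levels = c("1", "2", "3", "4"))
names(clustering) <- paste0("cell", 1:4)

used <- metaLevelsSketch(clustering)
all_levels <- metaLevelsSketch(clustering, used.only = FALSE)
subset_levels <- metaLevelsSketch(clustering, cells.use = c("cell1", "cell2"))
```

Here used omits the empty level "3", all_levels keeps it, and subset_levels reflects only the two selected cells.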
Daniel Bunis
meta for returning the entire contents of a metadata slot of an object, not just the potential levels
getMetas for returning all metadata slots of an object
isMeta for testing whether something is the name of a metadata slot
example(importDittoBulk, echo = FALSE)

metaLevels("clustering", object = myRNA)

# Note: Set 'used.only' (default = TRUE) to FALSE to show unused levels
# of metadata that are already factors. By default, only the in use options
# of a metadata are shown.
metaLevels("clustering", myRNA,
    used.only = FALSE)
Generates dittoDimPlots for multiple features.
multi_dittoDimPlot( object, vars, ncol = NULL, nrow = NULL, axes.labels.show = FALSE, list.out = FALSE, OUT.List = NULL, ..., xlab = NA, ylab = NA, data.out = FALSE, do.hover = FALSE, legend.show = FALSE )
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
vars |
c("var1","var2","var3",...). A vector of vars ('var' in regular |
ncol, nrow |
Integer or NULL. How many columns or rows the plots should be arranged into. |
axes.labels.show |
Logical. Whether axis labels should be shown.
Subordinate to |
list.out |
Logical. (Default = FALSE) When set to |
OUT.List |
Deprecated. Use |
..., xlab, ylab, data.out, do.hover, legend.show |
other parameters passed to |
Given multiple 'var' parameters to vars, this function creates a dittoDimPlot for each one, with minor defaulting tweaks (see below).
By default, these dittoDimPlots are arranged into a grid. Alternatively, if list.out is set to TRUE, they are output as a list with each plot named as the vars being shown.
All parameters that can be adjusted in dittoDimPlot can be adjusted here, but the only input that will change between plots is var.
A set of dittoDimPlots either arranged into a grid (default), or output as a list.
axes labels are not shown by default to save space (control with axes.labels.show or xlab and ylab)
legends are also not shown to save space (control with legend.show)
Daniel Bunis
multi_dittoDimPlotVaryCells for an alternate dittoDimPlot multi-plotter where the cells/samples are varied between plots.
dittoDimPlot for the base dittoDimPlot plotting function and details on all accepted inputs.
example(importDittoBulk, echo = FALSE)

multi_dittoDimPlot(myRNA, c("gene1", "gene2", "clustering"))

# Control grid shape with ncol / nrow
multi_dittoDimPlot(myRNA, c("gene1", "gene2", "clustering"),
    nrow = 1)

# Output as list instead
multi_dittoDimPlot(myRNA, c("gene1", "gene2", "clustering"),
    list.out = TRUE)
Generates multiple dittoDimPlots, for a single feature, each showing different cells
multi_dittoDimPlotVaryCells( object, var, vary.cells.meta, vary.cells.levels = metaLevels(vary.cells.meta, object), show.titles = TRUE, show.allcells.plot = TRUE, allcells.main = "All Cells", show.legend.single = TRUE, show.legend.plots = FALSE, show.legend.allcells.plot = FALSE, nrow = NULL, ncol = NULL, list.out = FALSE, OUT.List = NULL, ..., assay = .default_assay(object), slot = .default_slot(object), adjustment = NULL, min = NULL, max = NULL, color.panel = dittoColors(), colors = seq_along(color.panel), data.out = FALSE, do.hover = FALSE, swap.rownames = NULL )
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
var |
String name of a "gene" or "metadata" (or "ident" for a Seurat Alternatively, can be a vector of same length as there are cells/samples in the |
vary.cells.meta |
String name of a metadata that should be used for selecting which cells to show in each "VaryCells" |
vary.cells.levels |
The values/groupings of the |
show.titles |
Logical which sets whether grouping-levels should be used as titles for the individual VaryCell plots. Default = TRUE. |
show.allcells.plot |
Logical which sets whether an additional plot showing all of the cells should be added. |
allcells.main |
String which adjusts the title of the allcells plot. Default = "All Cells". Set to |
show.legend.single |
Logical which sets whether to add a single legend as an additional plot. Default = TRUE. |
show.legend.plots |
Logical which sets whether or not legends should be plotted in individual VaryCell plots. Default = FALSE. |
show.legend.allcells.plot |
Logical which sets whether or not a legend should be plotted in the allcells plot. Default = FALSE. |
ncol, nrow |
Integers which set dimensions of the plot grid when |
list.out |
Logical which controls whether the list of plots should be returned as a list instead of as a single grid arrangement of the plots. |
OUT.List |
Deprecated. Use |
..., color.panel, colors, min, max, assay, slot, adjustment, data.out, do.hover, swap.rownames |
additional parameters passed to All parameters of |
This function generates separate dittoDimPlots that show the same target data, but each for distinct cells.
How cells are separated into distinct plots is controlled with the vary.cells.meta parameter.
Individual dittoDimPlots are created for all levels of vary.cells.meta groupings given to the vary.cells.levels input (default = all).
The function then appends a plot containing all cells/samples when show.allcells.plot = TRUE, with the title of this plot controlled by allcells.main, as well as a single legend when show.legend.single = TRUE.
By default, these dittoDimPlots are output in a grid with ncol columns and nrow rows.
Alternatively, if list.out is set to TRUE, they are returned as a list.
In the list, the VaryCell plots will be named by the levels of vary.cells.meta that they contain, and the optional allcells plot and single legend will be named "allcells" and "legend", respectively.
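As a brief illustration of the list output described above, the following sketch inspects the names of the returned list. It assumes the myRNA example object created by example(importDittoBulk), as in the examples for this function; the variable name plots is only illustrative.

```r
library(dittoSeq)

# Creates the example object 'myRNA' used throughout these help pages
example(importDittoBulk, echo = FALSE)

# Return the individual plots as a list rather than as one arranged grid
plots <- multi_dittoDimPlotVaryCells(
    myRNA, "gene1",
    vary.cells.meta = "clustering",
    list.out = TRUE)

# With default settings: one entry per 'clustering' level,
# plus "allcells" and "legend"
names(plots)

# Individual elements can then be pulled out by name for further editing
allcells_plot <- plots$allcells
```

Working from the list is mostly useful when one plot needs separate adjustment before the set is arranged manually.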
Either continuous or discrete var data can be displayed.
For continuous data, the range of potential values is calculated at the start, and set, so that colors represent the same value across all plots.
For discrete data, colors used in each plot are adjusted so that colors represent the same groupings across all plots.
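The idea of keeping colors comparable across plots can be sketched in base R. This is not dittoSeq's internal code; the data frame, the three hex colors, and all variable names below are hypothetical stand-ins chosen only to illustrate the principle.

```r
# Hypothetical stand-in data: one continuous value and one grouping per cell
set.seed(1)
cells <- data.frame(
    value = runif(12, min = 0, max = 10),
    cluster = factor(rep(c("A", "B", "C"), each = 4)))

# Continuous case: compute the range once, over all cells,
# then reuse it as the color-scale limits of every per-group plot
shared_range <- range(cells$value)

# Discrete case: fix a level-to-color lookup before splitting,
# so a given level gets the same color in whichever plot it appears
example_colors <- c("#E69F00", "#56B4E9", "#009E73")
level_colors <- setNames(example_colors, levels(cells$cluster))

# Split the cells by group, then look colors up by level name
subsets <- split(cells, cells$cluster)
colors_per_plot <- lapply(subsets, function(df) {
    level_colors[as.character(unique(df$cluster))]
})
```

Had the range or the colors been computed separately within each subset, the same color could stand for different values or groups in different plots.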
A set of dittoDimPlots either arranged into a grid (default), or output as a list.
Daniel Bunis
multi_dittoDimPlot for an alternate dittoDimPlot multi-plotter where vars are varied across plots rather than cells/samples
dittoDimPlot for the base dittoDimPlot plotting function and details on all accepted inputs.
example(importDittoBulk, echo = FALSE)

# This function can be used to quickly scan for differences in expression
# within or across clusters/cell types.
multi_dittoDimPlotVaryCells(myRNA, "gene1", vary.cells.meta = "clustering")

# Output as list instead
multi_dittoDimPlotVaryCells(myRNA, "gene1", vary.cells.meta = "clustering",
    list.out = TRUE)

# This function is also great for generating separate plots of each individual
# grouping of a tsne/PCA/umap. This can be useful to check for dispersion
# of groups that might otherwise be hidden behind other cells/samples.
# The effect is similar to faceting, but: all distinct plots are treated
# separately rather than being just a part of the whole, and with portrayal
# of all cells/samples in an additional plot by default.
#
# To do so, set 'var' and 'vary.cells.meta' the same.
multi_dittoDimPlotVaryCells(myRNA, "clustering", vary.cells.meta = "clustering")

# The function can also be used to quickly visualize how separate clustering
# resolutions match up to each other, or perhaps how certain conditions of
# cells disperse across clusters.
# (For an alternative method of viewing, and easily quantifying, how discrete
# conditions of cells disperse across clusters, see '?dittoBarPlot')
multi_dittoDimPlotVaryCells(myRNA, "groups", vary.cells.meta = "clustering")
Generates dittoPlots for multiple features.
multi_dittoPlot(
    object,
    vars,
    group.by,
    ncol = 3,
    nrow = NULL,
    main = "var",
    ylab = NULL,
    list.out = FALSE,
    OUT.List = NULL,
    ...,
    xlab = NULL,
    data.out = FALSE,
    do.hover = FALSE,
    legend.show = FALSE
)
object |
A Seurat, SingleCellExperiment, or SummarizedExperiment object. |
vars |
c("var1","var2","var3",...). A vector of gene or metadata names from which to generate the separate plots |
group.by |
String representing the name of a metadata to use for separating the cells/samples into discrete groups. |
ncol , nrow
|
Integer or NULL. How many columns or rows the plots should be arranged into. |
main , ylab
|
String which sets whether / how plot titles or y-axis labels should be added to each individual plot
|
list.out |
Logical. (Default = FALSE) When set to |
OUT.List |
Deprecated. Use |
... , xlab , data.out , do.hover , legend.show
|
other parameters passed along to |
Given multiple 'var' parameters to vars, this function creates a dittoPlot for each one, with minor changes to the defaults (see below).
By default, these dittoPlots are arranged into a grid.
Alternatively, if list.out is set to TRUE, they are output as a list with each plot named as the vars being shown.
All parameters that can be adjusted in dittoPlot can be adjusted here, but the only input that will change between plots is the var.
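The list output can be handy when a single plot needs editing before arrangement. The sketch below assumes the myRNA example object from example(importDittoBulk); the edited title and the use of gridExtra::arrangeGrob for the manual arrangement are illustrative choices, not a description of how multi_dittoPlot builds its own grid.

```r
library(dittoSeq)

# Creates the example object 'myRNA'
example(importDittoBulk, echo = FALSE)

vars <- c("gene1", "gene2", "gene3", "gene4")

# One dittoPlot per entry of 'vars', returned as a named list
plots <- multi_dittoPlot(myRNA, vars, group.by = "clustering",
    list.out = TRUE)
names(plots)

# Each element is an ordinary ggplot, so it can be modified on its own
plots$gene1 <- plots$gene1 + ggplot2::ggtitle("gene1 (edited)")

# Arrange the (edited) plots manually into a 2-column layout
grid <- gridExtra::arrangeGrob(grobs = plots, ncol = 2)
```

Calling grid::grid.draw(grid) would then render the arranged set.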
A set of dittoPlots either arranged into a grid (default), or output as a list.
Axes labels are not shown by default to save space (control with xlab and ylab).
Legends are also not shown to save space (control with legend.show).
Daniel Bunis
dittoPlot for the single plot version of this function and details on all accepted inputs.
dittoDotPlot and dittoPlotVarsAcrossGroups to show, in a single plot, per-group summaries of the values for multiple vars.
example(importDittoBulk, echo = FALSE)

genes <- getGenes(myRNA)[1:4]
multi_dittoPlot(myRNA,
    vars = c("gene1", "gene2", "gene3", "gene4"),
    group.by = "clustering")

# To make it output a grid that is 2x2, to add y-axis labels
# instead of titles, and to show legends...
multi_dittoPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    nrow = 2, ncol = 2,           # Make grid 2x2 (only one of these needed)
    main = NULL, ylab = "make",   # Add y axis labels instead of titles
    legend.show = TRUE)           # Show legends

# Output as list instead
multi_dittoPlot(myRNA, c("gene1", "gene2", "gene3", "gene4"), "clustering",
    list.out = TRUE)
Set whether a SingleCellExperiment object should be treated as bulk versus single-cell by dittoSeq
setBulk(object, set = TRUE)

## S4 method for signature 'SingleCellExperiment'
setBulk(object, set = TRUE)
object |
A target SingleCellExperiment object |
set |
Logical, whether the object should be considered as bulk (TRUE) or not (FALSE) |
A SingleCellExperiment object with "bulk" internal metadata set to set
example(importDittoBulk, echo = FALSE)

myRNA
isBulk(myRNA)

scRNA <- setBulk(myRNA, FALSE)
isBulk(scRNA)

# Now, if we make a heatmap with this data, we will see that single-cell
# defaults (ordering by the first 'annot.by' & cell names not shown) are used.
dittoHeatmap(scRNA, getGenes(scRNA)[1:30],
    annot.by = c("clustering", "groups"),
    main = "isBulk(object) == FALSE")
Essentially a wrapper function for colorspace's deutan(), protan(), and tritan() functions. This function will output any dittoSeq plot as it might look to an individual with one of the common forms of colorblindness: deuteranopia/deuteranomaly, the most common, is when the cones mainly responsible for green vision are defective. Protanopia/protanomaly is when the cones mainly responsible for red vision are defective. In tritanopia/tritanomaly, the defective cones are responsible for blue vision. Note: there are more severe color deficiencies that are even rarer. Unfortunately, for these types of color vision deficiency, only non-color methods, like lettering or shapes, will do much to help.
Simulate(
    type = c("deutan", "protan", "tritan"),
    plot.function,
    ...,
    color.panel = dittoColors(),
    min.color = "#F0E442",
    max.color = "#0072B2"
)
type |
The type of colorblindness that you want to simulate for. Options: "deutan", "protan", "tritan". Anything else, and you will get an error. |
plot.function |
The plotting function that you want to use/simulate. Give it unquoted, and make sure to remove the () that R will try to add. |
... |
other parameters that can be given to dittoSeq plotting functions, including color.panel, used in exactly the same way they are used for those functions. (Contrary to the look of this documentation, color.panel will still default to dittoColors() when not provided.) |
color.panel , min.color , max.color
|
The set of colors to be used. |
Outputs a dittoSeq plot with the color.panel / min.color & max.color updated as it might look to a colorblind individual.
Note: Does not currently adjust dittoHeatmap.
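To see the kind of color conversion involved, the colorspace functions named in the description can be applied directly to a vector of colors. This is only a sketch of the underlying conversion, not of Simulate() itself; the two hex values are the default min.color and max.color from the usage above, and the variable names are illustrative.

```r
library(colorspace)

# Default min.color and max.color from the Simulate() usage
original <- c("#F0E442", "#0072B2")

# Each function takes a vector of colors and returns the colors
# as they might appear under that form of color vision deficiency
simulated <- list(
    deutan = deutan(original),
    protan = protan(original),
    tritan = tritan(original))

simulated
```

Comparing the entries of simulated against original gives a quick check of whether two colors remain distinguishable under each deficiency type.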
Daniel Bunis
example(importDittoBulk, echo = FALSE)

Simulate("deutan", dittoDimPlot, object = myRNA, var = "clustering", size = 2)
Simulate("protan", dittoDimPlot, myRNA, "clustering", size = 2)
Simulate("tritan", dittoDimPlot, myRNA, "clustering", size = 2)