Title: | Heatmaps of Stack Profiles from Epigenetic Signals |
---|---|
Description: | The epistack package main objective is the visualizations of stacks of genomic tracks (such as, but not restricted to, ChIP-seq, ATAC-seq, DNA methyation or genomic conservation data) centered at genomic regions of interest. epistack needs three different inputs: 1) a genomic score objects, such as ChIP-seq coverage or DNA methylation values, provided as a `GRanges` (easily obtained from `bigwig` or `bam` files). 2) a list of feature of interest, such as peaks or transcription start sites, provided as a `GRanges` (easily obtained from `gtf` or `bed` files). 3) a score to sort the features, such as peak height or gene expression value. |
Authors: | SACI Safia [aut], DEVAILLY Guillaume [cre, aut] |
Maintainer: | DEVAILLY Guillaume <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.13.0 |
Built: | 2024-12-18 03:41:21 UTC |
Source: | https://github.com/bioc/epistack |
Add an optional bin
metadata column to gr
,
to serve as annotations
for the epistack plots.
addBins(rse, nbins = 5L, bin = NULL)
addBins(rse, nbins = 5L, bin = NULL)
rse |
a SummarizedExperiment or a GRanges object. |
nbins |
an integer number, the number of bins. |
bin |
a vector containing pre-determined bins, in the same
order as |
nbins
is taken into account only if bin
is NULL
.
rse
should be sorted first, usually with the
addMetricAndArrangeGRanges()
function. addBin(rse, bin = vec)
is equivalent to
rse$bin <- vec
, while
addBin(rse, nbins = 5)
will create 5 bins of equal size based on
rse
order.
the RangedSummarizedExperiment or GRanges object with a
new bin
metadata column
addMetricAndArrangeGRanges
plotBinning
data("stackepi") addBins(stackepi) # 3 bins instead of 5 addBins(stackepi, nbins = 3) # assign bins using a vector addBins(stackepi, bin = rep(c("a", "b", "c"), length.out = length(stackepi)))
data("stackepi") addBins(stackepi) # 3 bins instead of 5 addBins(stackepi, nbins = 3) # assign bins using a vector addBins(stackepi, bin = rep(c("a", "b", "c"), length.out = length(stackepi)))
Perform an inner join between a GRanges object and a data.frame. Sort the resulting GRanges based on a metric column.
addMetricAndArrangeGRanges( gr, order, gr_key = "name", order_key = "name", order_value = "exp", shuffle_tie = TRUE )
addMetricAndArrangeGRanges( gr, order, gr_key = "name", order_key = "name", order_value = "exp", shuffle_tie = TRUE )
gr |
a GRanges object. |
order |
a data.frame with at least two columns: keys and values. |
gr_key |
name of the gr metadata column containing unique names for
each genomic region in |
order_key |
name of the |
order_value |
name of the |
shuffle_tie |
a boolean Value (TRUE / FALSE). When TRUE, shuffle the GRanges before sorting, mixing the ties. |
This utility function allow the addition of a metric column to
genomic regions of interest. One of its common use case is to add
gene expression values on a set of transcription start sites.
The resulting GRanges object will only contain regions presents in both
gr
and order
.
a GRanges sorted in descending order.
data("stackepi_gr") ramdomOrder <- data.frame(gene_id = stackepi_gr$gene_id, value = rnorm(length(stackepi_gr))) addMetricAndArrangeGRanges(stackepi_gr, ramdomOrder, gr_key = "gene_id", order_key = "gene_id", order_value = "value")
data("stackepi_gr") ramdomOrder <- data.frame(gene_id = stackepi_gr$gene_id, value = rnorm(length(stackepi_gr))) addMetricAndArrangeGRanges(stackepi_gr, ramdomOrder, gr_key = "gene_id", order_key = "gene_id", order_value = "value")
Perform an inner join between a rangedSummarizedExperiment object and a data.frame. Sort the resulting rangedSummarizedExperiment based on a metric column.
addMetricAndArrangeRSE( rse, order, rse_key = "name", order_key = "name", order_value = "exp", shuffle_tie = TRUE )
addMetricAndArrangeRSE( rse, order, rse_key = "name", order_key = "name", order_value = "exp", shuffle_tie = TRUE )
rse |
a rangedSummarizedExperiment object. |
order |
a data.frame with at least two columns: keys and values. |
rse_key |
name of the gr metadata column containing unique names for
each genomic region in |
order_key |
name of the |
order_value |
name of the |
shuffle_tie |
a boolean Value (TRUE / FALSE). When TRUE, shuffle the GRanges before sorting, mixing the ties. |
This utility function allow the addition of a metric column to
genomic regions of interest. One of its common use case is to add
gene expression values on a set of transcription start sites.
The resulting GRanges object will only contain regions presents in both
rse
and order
.
a rangedSummarizedExperiment sorted in descending order.
data("stackepi") ramdomOrder <- data.frame( gene_id = SummarizedExperiment::rowRanges(stackepi)$gene_id, value = rnorm(length(stackepi)) ) addMetricAndArrangeRSE(stackepi, ramdomOrder, rse_key = "gene_id", order_key = "gene_id", order_value = "value")
data("stackepi") ramdomOrder <- data.frame( gene_id = SummarizedExperiment::rowRanges(stackepi)$gene_id, value = rnorm(length(stackepi)) ) addMetricAndArrangeRSE(stackepi, ramdomOrder, rse_key = "gene_id", order_key = "gene_id", order_value = "value")
Convert objects from the old input format
(GRanges
object) to the new recommanded input format
RangedSummarizedExperiment
.
GRanges2RSE(gr, patterns, names = patterns)
GRanges2RSE(gr, patterns, names = patterns)
gr |
a |
patterns |
A character vector of column prefixes
(can be regular expressions) that should match columns of |
names |
specify the desired names of the assays (if different from patterns). |
Mostly used for backward compatibilities and unit testing.
a RangedSummarizedExperiment
.
data("stackepi_gr") GRanges2RSE(stackepi_gr, patterns = c("window")) GRanges2RSE(stackepi_gr, patterns = c("^window_"), names = c("DNAme"))
data("stackepi_gr") GRanges2RSE(stackepi_gr, patterns = c("window")) GRanges2RSE(stackepi_gr, patterns = c("^window_"), names = c("DNAme"))
Return the average color of a vector of colors, computed in the RGB space.
meanColor(colors)
meanColor(colors)
colors |
a vector of colors |
Input colors can be either in html or color name formats. The alpha channel is supported but optional.
a single color value
meanColor(c("#000000FF", "#FFFFFF00", "#FFFF00FF", "#FF0000FF")) # works with color names meanColor(c("blue", "red")) # Mix color names and HTML codes meanColor(c("blue", "red", "#FFFF00FF")) # works without alpha channel in inputs (but outputs an alpha channel): meanColor(c("#000000", "#FFFFFF", "#FFFF00", "#FF0000"))
meanColor(c("#000000FF", "#FFFFFF00", "#FFFF00FF", "#FF0000FF")) # works with color names meanColor(c("blue", "red")) # Mix color names and HTML codes meanColor(c("blue", "red", "#FFFF00FF")) # works without alpha channel in inputs (but outputs an alpha channel): meanColor(c("#000000", "#FFFFFF", "#FFFF00", "#FF0000"))
Plot the average stack profiles +/- error (sd or sem).
If a bin
column is present in rowRanges(rse)
,
one average profile is drawn for each bin.
plotAverageProfile( rse, assay = NULL, x_labels = c("Before", "Anchor", "After"), palette = colorRampPalette(c("#DF536B", "black", "#61D04F")), alpha_for_se = 0.25, error_type = c("sd", "sem", "ci95"), reversed_z_order = FALSE, ylim = NULL, y_title = NULL, pattern = NULL )
plotAverageProfile( rse, assay = NULL, x_labels = c("Before", "Anchor", "After"), palette = colorRampPalette(c("#DF536B", "black", "#61D04F")), alpha_for_se = 0.25, error_type = c("sd", "sem", "ci95"), reversed_z_order = FALSE, ylim = NULL, y_title = NULL, pattern = NULL )
rse |
a RangedSummarizedExperiment input. Aletrnatively: can be a
GRanges object
(for backward compatibility, |
assay |
specify the name of the assay to plot,
that should match one of |
x_labels |
x-axis labels. |
palette |
A vector of colors, or a function that returns
a palette of |
alpha_for_se |
the transparency (alpha) value for the error band. |
error_type |
can be either |
reversed_z_order |
should the z-order of the curves be reversed (i.e. first or last bin on top?) |
ylim |
a vector of two numbers corresponding to the y-limits of the plot. |
y_title |
the y-axis title. |
pattern |
only if |
Display a plot.
data("stackepi") plotAverageProfile(stackepi)
data("stackepi") plotAverageProfile(stackepi)
Plot a vertical color bar of the bin
column.
plotBinning( rse, target_height = 650, palette = colorRampPalette(c("#DF536B", "black", "#61D04F")) )
plotBinning( rse, target_height = 650, palette = colorRampPalette(c("#DF536B", "black", "#61D04F")) )
rse |
a RangedSummarizedExperiment input with a column |
target_height |
an integer, the approximate height (in pixels) of the final plot. Used to avoid overplotting artefacts. |
palette |
A vector of colors, or a function that returns
a palette of |
Display a plot.
data("stackepi") rse <- stackepi rse <- addBins(rse, nbins = 3) plotBinning(rse) gr2 <- data.frame(bin = rep(c(1,2,3,4), each = 5)) plotBinning(gr2, palette = colorRampPalette(c("blue4", "forestgreen", "coral3", "goldenrod")))
data("stackepi") rse <- stackepi rse <- addBins(rse, nbins = 3) plotBinning(rse) gr2 <- data.frame(bin = rep(c(1,2,3,4), each = 5)) plotBinning(gr2, palette = colorRampPalette(c("blue4", "forestgreen", "coral3", "goldenrod")))
Plot distribution of a metric values as boxplots
depending of bins.
If the bin
is absent from gr
, a single boxplot is drawn.
plotBoxMetric( rse, metric = "expr", title = "Metric", trans_func = function(x) x, ylim = NULL, ylab = "metric", palette = colorRampPalette(c("#DF536B", "black", "#61D04F")) )
plotBoxMetric( rse, metric = "expr", title = "Metric", trans_func = function(x) x, ylim = NULL, ylab = "metric", palette = colorRampPalette(c("#DF536B", "black", "#61D04F")) )
rse |
a RangedSummarizedExperiment input. Aletrnatively: can be a GRanges object (for backward compatibility). |
metric |
name of the column in |
title |
title of the plot. |
trans_func |
A function to transform value of x before ploting.
Useful to apply log10 transformation
(i.e. with |
ylim |
limit of the y axis; format: |
ylab |
y-axis title |
palette |
A vector of colors, or a function that returns
a palette of |
Display a plot.
data("stackepi") plotBoxMetric( stackepi, trans_func = function(x) x, metric = "exp", title = "Metric" )
data("stackepi") plotBoxMetric( stackepi, trans_func = function(x) x, metric = "exp", title = "Metric" )
Given a list of genomic regions,
epigenetic signals surrounding these regions, and a score for each regions,
plot epigenetic stacks depending on the score. An optional bin
column
allow the grouping of several genomic regions to produce average profiles per
bins.
plotEpistack( rse, assays = NULL, tints = "gray", titles = NULL, legends = "", main = NULL, x_labels = c("Before", "Anchor", "After"), zlim = c(0, 1), ylim = NULL, metric_col = "exp", metric_title = "Metric", metric_label = "metric", metric_ylab = NULL, metric_transfunc = function(x) x, bin_palette = colorRampPalette(c("#DF536B", "black", "#61D04F")), npix_height = 650, n_core = 1, high_mar = c(2.5, 0.6, 4, 0.6), low_mar = c(2.5, 0.6, 0.3, 0.6), error_type = c("ci95", "sd", "sem"), reversed_z_order = FALSE, rel_widths = c(score = 0.35, bin = 0.08, assays = 0.35), rel_heights = c(1, 0.14, 0.3), patterns = NULL, ... )
plotEpistack( rse, assays = NULL, tints = "gray", titles = NULL, legends = "", main = NULL, x_labels = c("Before", "Anchor", "After"), zlim = c(0, 1), ylim = NULL, metric_col = "exp", metric_title = "Metric", metric_label = "metric", metric_ylab = NULL, metric_transfunc = function(x) x, bin_palette = colorRampPalette(c("#DF536B", "black", "#61D04F")), npix_height = 650, n_core = 1, high_mar = c(2.5, 0.6, 4, 0.6), low_mar = c(2.5, 0.6, 0.3, 0.6), error_type = c("ci95", "sd", "sem"), reversed_z_order = FALSE, rel_widths = c(score = 0.35, bin = 0.08, assays = 0.35), rel_heights = c(1, 0.14, 0.3), patterns = NULL, ... )
rse |
a RangedSummarizedExperiment input. Aletrnatively: can be a
GRanges object
(for backward compatibility, |
assays |
specify the name(s) and order of assay(s) to plot. A vector of
names that should match |
tints |
a vector of colors to tint the heatmaps. Can alos be a
function returning |
titles |
titles of each heatmap. Defaults to |
legends |
legend names for the epistacks. |
main |
Main title for the figure. |
x_labels |
a character vector of length 3 used as x-axis labels. |
zlim |
the minimum and maximum z values the heatmap.
Format: |
ylim |
limits of the y axis for bottom plots. |
metric_col |
a character, name of a column in |
metric_title |
title to be display on the leftmost plots. |
metric_label |
label of the leftmost plots. |
metric_ylab |
y axis label of the top left plot. |
metric_transfunc |
a function to transform value of |
bin_palette |
A vector of colors, or a function that returns
a palette of |
npix_height |
The matrix height is reduced to this number of rows before plotting. Useful to limit overplotting artefacts. It should roughtly be set to the pixel height in the final heatmaps |
n_core |
number of core used to speedup the martrix resizing. |
high_mar |
a vector of numerical values corresponding to the margins of the top figures. c(bottom, left, top, right) |
low_mar |
a vector of numerical values corresponding to the margins of the bottom figures. c(bottom, left, top, right) |
error_type |
error_type, can be either |
reversed_z_order |
For the bottom panels: should the z-order of the curves be reversed (i.e. first or last bin on top)? |
rel_widths |
A named vector of three elements of relative panel widths:
|
rel_heights |
A vector of three elements of relative panel heights.
Default to |
patterns |
only if |
... |
Arguments to be passed to |
This function produce a comprehensive figure including epigenetic heatmaps
and average epigenetic profiles from a well formated
RangedSummarizedExperiment
object
with expected rowData metadata columns.
It scales resonably well up to hundreds of
thousands of genomic regions.
The visualisation is centered on an anchor,
a set of genomic coordinated that can be transcription start sites or
peak center for example.
Anchor coordinates are those of the GRanges
used as a rowData in the input RangedSummarizedExperiment object
(hereafter rse
).
Anchors are plotted from top to bottom in the same order as in rse
.
One should sort rse
before plotting if needed.
rse
's rowData should have a metric column that is used in the
leftmost plots.
The name of the metric column must be specified to metric_col
.
The metric can be transformed before plotting if needed using the
metric_transfunc
parameter.
The matrix or matrices used to display the heatmap(s) should be passed as
assay(s) in rse
. Such matrix can be obtained using
EnrichedHeatmap::normalizeToMatrix()
for example. The assay
names are then specified through assays
.
If an optionnal bin
column is present in rse
's rowData,
it will be used
to group genomic regions to performed average profile per bins in the bottom
plots.
Epistack are multipanel plots build using layout()
. Margins for the
panels can be specified using high_mar
and low_mar
parameters
if needed, especially to avoid text overlaps. The default value should
be appropriate in most situations. Individual component can be plotted
using severa epistack
functions such has plotStackProfile()
or plotAverageProfile()
.
Plotting more than > 1000 regions can lead to overplotting
issued as well as some plotting artefacts (such as horizontal white strips).
Both issues can be resolved with fidling with the npix_height
parameter. npix_height
should be smaller than the number of regions,
and in the same order of magnitude of the final heatmap height in pixels.
Last minutes call to the redimMatrix()
function will hapen before
plotting using npix_height
as target height. Parameter n_core
is passed to redimMatrix()
to speed up the down-scaling.
The input can also be a GRanges
object for backward compatibility. See
GRanges2RSE
. patterns
would then be required.
Display a plot.
plotStackProfile
,
plotAverageProfile
,
redimMatrix
,
normalizeToMatrix
,
addMetricAndArrangeGRanges
,
addBins
data("stackepi") plotEpistack(stackepi, metric_col = "exp", ylim = c(0, 1), metric_transfunc = function(x) log10(x+1))
data("stackepi") plotEpistack(stackepi, metric_col = "exp", ylim = c(0, 1), metric_transfunc = function(x) log10(x+1))
Plot a vertical line chart of the metric column, in the same order as the input.
plotMetric( x, trans_func = function(x) x, title = "Metric", ylim = NULL, xlab = NULL, ylab = NULL )
plotMetric( x, trans_func = function(x) x, title = "Metric", ylim = NULL, xlab = NULL, ylab = NULL )
x |
a numeric vector. |
trans_func |
a function to transform |
title |
Title of the plot. |
ylim |
limit of the y axis; format: |
xlab |
x-axis title |
ylab |
y-axis title |
Display a plot.
data("stackepi") plotMetric(SummarizedExperiment::rowRanges(stackepi)$exp)
data("stackepi") plotMetric(SummarizedExperiment::rowRanges(stackepi)$exp)
Display a heatmap of an epigenetic track centered at genomic anchors such as Transcription Start Sites or peak center.
plotStackProfile( rse, assay = NULL, x_labels = c("Before", "Anchor", "After"), title = "", zlim = NULL, palette = function(n) grDevices::hcl.colors(n, rev = TRUE), target_height = 650, summary_func = function(x) mean(x, na.rm = TRUE), n_core = 1, pattern = NULL )
plotStackProfile( rse, assay = NULL, x_labels = c("Before", "Anchor", "After"), title = "", zlim = NULL, palette = function(n) grDevices::hcl.colors(n, rev = TRUE), target_height = 650, summary_func = function(x) mean(x, na.rm = TRUE), n_core = 1, pattern = NULL )
rse |
a RangedSummarizedExperiment input. Aletrnatively: can be a
GRanges object
(for backward compatibility, |
assay |
specify the name of the assay to plot,
that should match one of |
x_labels |
a character vectors of length 3 used to label the x-axis. |
title |
The title of the heatmap |
zlim |
The minimum and maximum z values to match color to values. Format: zlim = c (min, max) |
palette |
a palette of color, (i.e. a function of parameter n that should retrun n colors). |
target_height |
The matrix height is reduced to this number of rows before plotting. Useful to limit overplotting artefacts. It should roughtly be set to the pixel height in the final heatmap. |
summary_func |
function passed to |
n_core |
multicore option, passed to |
pattern |
only if |
The visualisation is centered on an anchor,
a set of genomic coordinated that can be transcription start sites or
peak center for example. Anchor coordinates are those of the
RangedSummarizedExperiment
object used as an input
(hereafter rse
).
Anchors are plotted from top to bottom in the same order as in rse
.
One should sort rse
before plotting if needed.
The matrix used to display the heatmap should be passed as
assay of rse
. Such matrix can be obtained using
EnrichedHeatmap::normalizeToMatrix()
for example.
This function scale reasonnably wells up to hundred thousands
of regions. Overplotting issues are solved by last-minute reduction of the
matrix size using redimMatrix()
.
Display a plot.
plotAverageProfile
,
plotEpistack
,
normalizeToMatrix
,
plotStackProfileLegend
data("stackepi") plotStackProfile(stackepi, target_height = 650, zlim = c(0, 1), palette = colorRampPalette(c("white", "dodgerblue", "black")), title = "DNA methylation")
data("stackepi") plotStackProfile(stackepi, target_height = 650, zlim = c(0, 1), palette = colorRampPalette(c("white", "dodgerblue", "black")), title = "DNA methylation")
Utility function to plot the corresponding legend key of
plotStackProfile()
's plots.
plotStackProfileLegend( zlim, palette = colorRampPalette(c("white", "grey", "black")), title = NA )
plotStackProfileLegend( zlim, palette = colorRampPalette(c("white", "grey", "black")), title = NA )
zlim |
the limits of the values to be displayed.
Format: |
palette |
a palette of color, (i.e. a function of parameter n that should retrun n colors). |
title |
an optionnal title to be display bellow the color legend. |
Display a plot.
plotStackProfileLegend(zlim = c(0, 2), palette = colorRampPalette(c("white", "grey", "black")))
plotStackProfileLegend(zlim = c(0, 2), palette = colorRampPalette(c("white", "grey", "black")))
Reduce the input matrix size by applying a summary function on cells to be fused.
redimMatrix( mat, target_height = 100, target_width = 100, summary_func = function(x) mean(x, na.rm = TRUE), output_type = 0, n_core = 1 )
redimMatrix( mat, target_height = 100, target_width = 100, summary_func = function(x) mean(x, na.rm = TRUE), output_type = 0, n_core = 1 )
mat |
the input matrix. |
target_height |
height of the output matrix
(should be smaller than or equal to |
target_width |
width of the output matrix
(should be smaller than or equal to |
summary_func |
how to summerize cells? A function such has
|
output_type |
Type of the output, to be passed to |
n_core |
number of core to use for parallel processing. |
This function is used to reduce matrix right before plotting them in order to avoid overplotting issues as well as other plotting artefacts.
a resized matrix of size target_width
x target_height
where the summary_fun
was apply
to adjacent cells.
data("stackepi") mat <- SummarizedExperiment::assay(stackepi, "DNAme") dim(mat) smallMat <- redimMatrix(mat, target_height = 10, target_width = ncol(mat)) dim(smallMat) # changing the summary function mat <- matrix(sample(1:40,100,replace=TRUE),nrow=10,ncol=10) dim(mat) smallMat <- redimMatrix(mat, target_height = 5, target_width = ncol(mat), summary_func = function(x) max(x, na.rm = TRUE)) dim(smallMat) # working with colors colmat <- matrix( c("red", "red", "blue", "blue", "red", "blue", "blue", "green"), ncol = 2 ) redimMatrix(colmat, target_height = 2, target_width = 2, summary_func = meanColor, output_type = "color")
data("stackepi") mat <- SummarizedExperiment::assay(stackepi, "DNAme") dim(mat) smallMat <- redimMatrix(mat, target_height = 10, target_width = ncol(mat)) dim(smallMat) # changing the summary function mat <- matrix(sample(1:40,100,replace=TRUE),nrow=10,ncol=10) dim(mat) smallMat <- redimMatrix(mat, target_height = 5, target_width = ncol(mat), summary_func = function(x) max(x, na.rm = TRUE)) dim(smallMat) # working with colors colmat <- matrix( c("red", "red", "blue", "blue", "red", "blue", "blue", "green"), ncol = 2 ) redimMatrix(colmat, target_height = 2, target_width = 2, summary_func = meanColor, output_type = "color")
DNA methylation profiles (from MBD-seq data) arround transcription start sites of the 693 chr18 genes annotated on the pig genome (Sscrofa11.1), as well as gene expression levels in Transcript Per Million (TPM) measured by RNA-seq in the same duodenum sample.
data("stackepi")
data("stackepi")
A RangedSummarizedExperiment
of the 693 rows,
2 rows metadata columns, and one assay containing the DNA methylation
signal.
This dataset was generated from ENSEMBL annotation data and data generated by our lab (publicly available soon).
DNA methylation profiles (from MBD-seq data) arround transcription start sites of the 693 chr18 genes annotated on the pig genome (Sscrofa11.1), as well as gene expression levels in Transcript Per Million (TPM) measured by RNA-seq in the same duodenum sample.
data("stackepi_gr")
data("stackepi_gr")
A GRanges
of the 693 rows and 54 metadata
columns, kept for unit-testing backward-compatibility.
This dataset was generated from ENSEMBL annotation data and data generated by our lab (publicly available soon).