Title: | diffUTR: Streamlining differential exon and 3' UTR usage |
---|---|
Description: | The diffUTR package provides a uniform interface and plotting functions for limma/edgeR/DEXSeq -powered differential bin/exon usage. It includes in addition an improved version of the limma::diffSplice method. Most importantly, diffUTR further extends the application of these frameworks to differential UTR usage analysis using poly-A site databases. |
Authors: | Pierre-Luc Germain [cre, aut] , Stefan Gerber [aut] |
Maintainer: | Pierre-Luc Germain <[email protected]> |
License: | GPL-3 |
Version: | 1.15.0 |
Built: | 2024-11-18 03:42:07 UTC |
Source: | https://github.com/bioc/diffUTR |
addNormalizedAssays
addNormalizedAssays(se, readLength = 50L)
addNormalizedAssays(se, readLength = 50L)
se |
A bin-wise 'SummarizedExperiment' as produced by
|
readLength |
Used as a minimum width to estimate read density (default 50). |
The 'se' object with populated 'logcpm' and 'logNormDensity' assays.
data(example_bin_se) example_bin_se <- addNormalizedAssays(example_bin_se)
data(example_bin_se) example_bin_se <- addNormalizedAssays(example_bin_se)
countFeatures
countFeatures( bamfiles, bins, strandSpecific = 0, readLength = 50L, allowMultiOverlap = TRUE, inclNormalized = TRUE, tmpDir = tempdir(), ... )
countFeatures( bamfiles, bins, strandSpecific = 0, readLength = 50L, allowMultiOverlap = TRUE, inclNormalized = TRUE, tmpDir = tempdir(), ... )
bamfiles |
A vector of paths to bam files |
bins |
A GRanges of bins in which to count reads (or path to a rds file containing such an object |
strandSpecific |
Passed to 'Rsubread::featureCounts' |
readLength |
Used as a minimum width to estimate read density. |
allowMultiOverlap |
Passed to 'Rsubread::featureCounts' |
inclNormalized |
Logical; whether to include normalized assays (needed for plotting) |
tmpDir |
Passed to 'Rsubread::featureCounts' |
... |
Passed to 'Rsubread::featureCounts' |
A RangedSummarizedExperiment-class
data("example_gene_annotation", package="diffUTR") bins <- prepareBins(example_gene_annotation) bam_files <- list.files(system.file("extdata", package="diffUTR"), pattern="bam$", full=TRUE) # not run # se <- countFeatures(bam_files, bins, verbose=FALSE)
data("example_gene_annotation", package="diffUTR") bins <- prepareBins(example_gene_annotation) bam_files <- list.files(system.file("extdata", package="diffUTR"), pattern="bam$", full=TRUE) # not run # se <- countFeatures(bam_files, bins, verbose=FALSE)
deuBinPlot
deuBinPlot( se, gene, type = c("summary", "condition", "sample"), intronSize = 2, exonSize = c("sqrt", "linear", "log"), y = NULL, condition = NULL, size = "type", lineSize = 1, colour = NULL, alpha = NULL, removeAmbiguous = TRUE, minDensityRatio = 0.1 )
deuBinPlot( se, gene, type = c("summary", "condition", "sample"), intronSize = 2, exonSize = c("sqrt", "linear", "log"), y = NULL, condition = NULL, size = "type", lineSize = 1, colour = NULL, alpha = NULL, removeAmbiguous = TRUE, minDensityRatio = 0.1 )
se |
A bin-wise SummarizedExperiment as produced by
|
gene |
The gene of interest |
type |
Either 'summary' (plot DEU summary), 'sample' (plot sample-wise data), or 'condition' (plot data aggregate by condition) |
intronSize |
Intron plot size. If <=3, intron size will be this fraction of the mean exon size. If >3, each intron will have the given size. |
exonSize |
Scaling for exon sizes, either 'sqrt', 'log', or 'linear'. |
y |
Value to plot on the y-axis. If 'type="summary"', this should be a column of 'rowData(se)', otherwise should be an assay name of 'se'. |
condition |
The colData column containing the samples' condition. |
size |
rowData variable to use to determine the thickness of the bins. |
lineSize |
Size of the line connecting the bins. Use 'lineSize=0' to omit the line. |
colour |
rowData variable to use to determine the colour of the bins. If 'type="condition"', can also be "condition"; if 'type="sample"' can be any colData column. |
alpha |
Alpha level, passed to ggplot. |
removeAmbiguous |
Logical; whether to remove bins that are gene-ambiguous (i.e. overlap multiple genes). |
minDensityRatio |
Minimum ratio of read density (with respect to the gene's average) for a bin to be plotted. |
A ggplot object
data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) deuBinPlot(se, "Jund")
data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) deuBinPlot(se, "Jund")
This is a small improvement to the diffSplice
function
written by Gordon Smyth and Charity Law.
diffSplice2(fit, geneid, exonid = NULL, robust = FALSE, verbose = TRUE)
diffSplice2(fit, geneid, exonid = NULL, robust = FALSE, verbose = TRUE)
fit |
an |
geneid |
gene identifiers (as in |
exonid |
exon identifiers (as in |
robust |
logical, should the estimation of the empirical Bayes prior parameters be robustified against outlier sample variances? |
verbose |
logical, if TRUE will output some diagnostic information |
An MArrayLM-class
object containing both exon
level and gene level tests. Results are sorted by geneid and by exonid
within gene.
library(SummarizedExperiment) library(edgeR) data(example_bin_se) se <- example_bin_se design <- model.matrix(~condition, data=as.data.frame(colData(se))) dds <- calcNormFactors(DGEList(assays(se)$counts)) dds <- voom(dds, design) dds <- lmFit(dds, design) res <- diffSplice2(dds, geneid=rowData(se)$gene, exonid=row.names(se)) topSplice(res)
library(SummarizedExperiment) library(edgeR) data(example_bin_se) se <- example_bin_se design <- model.matrix(~condition, data=as.data.frame(colData(se))) dds <- calcNormFactors(DGEList(assays(se)$counts)) dds <- voom(dds, design) dds <- lmFit(dds, design) res <- diffSplice2(dds, geneid=rowData(se)$gene, exonid=row.names(se)) topSplice(res)
Wrappers around commonly-used DEU methods
(diffSpliceDGE
, DEXSeq
and an
improved version of diffSplice
diffSpliceDGEWrapper( se, design, coef = NULL, QLF = TRUE, robust = TRUE, countFilter = TRUE, excludeTypes = NULL ) diffSpliceWrapper( se, design, coef = NULL, robust = TRUE, improved = TRUE, countFilter = TRUE, excludeTypes = NULL ) DEXSeqWrapper( se, design = ~sample + exon + condition:exon, reducedModel = ~sample + exon, excludeTypes = NULL, ... )
diffSpliceDGEWrapper( se, design, coef = NULL, QLF = TRUE, robust = TRUE, countFilter = TRUE, excludeTypes = NULL ) diffSpliceWrapper( se, design, coef = NULL, robust = TRUE, improved = TRUE, countFilter = TRUE, excludeTypes = NULL ) DEXSeqWrapper( se, design = ~sample + exon + condition:exon, reducedModel = ~sample + exon, excludeTypes = NULL, ... )
se |
A bin-wise SummarizedExperiment as produced by
|
design |
A formula (using columns of 'colData(se)') or (for 'diffSpliceWrapper' or 'diffSpliceDGEWrapper' only) a model.matrix. |
coef |
The coefficient to be tested (ignored for 'DEXSeqWrapper'). |
QLF |
Logical; whether to use edgeR's quasi-likelihood negative binomial (applicable only to 'diffSpliceDGEWrapper'). |
robust |
Logical; whether to use robust fitting for the dispersion trend (ignored for 'DEXSeqWrapper'). |
countFilter |
Logical; whether to filter out low-count bins (ignored for 'DEXSeqWrapper'). |
excludeTypes |
A vector of bin types to ignore for testing. To test
for any kind of differential usage, leave empty. To test for differential
UTR usage, use 'excludeTypes=c("CDS","non-coding")' (or see
|
improved |
Logical; whether to use |
reducedModel |
A reduced formula (applicable only to 'DEXSeqWrapper'). |
... |
Further arguments (passed to 'testForDEU' and 'estimateExonFoldChanges') of 'DEXSeq'. Can for instance be used to enable multithreading, by passing 'BPPARAM=BiocParallel::MulticoreParam(ncores)'. |
The 'se' object with additional rowData columns contain bin (i.e. exon) -level statistics, and a metadata slot containing gene level p-values.
library(SummarizedExperiment) data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) head(rowData(se))
library(SummarizedExperiment) data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) head(rowData(se))
An object produced by countFeatures
containing small subset of
genes from mouse hippocampal slices undergoing Forskolin-induced long-term
potentiation (GSE84643).
a 'RangedSummarizedExperiment'
https://www.nature.com/articles/s41598-017-17407-w
An example gene annotation containing only a small subset of mouse genes.
a 'GRanges' object
A wrapper around 'ComplexHeatmap'.
geneBinHeatmap( se, gene, what = NULL, anno_rows = c("type", "logWidth", "meanLogDensity", "log10PValue", "geneAmbiguous"), anno_columns = c(), anno_colors = list(), removeAmbiguous = FALSE, merge_legends = TRUE, cluster_columns = FALSE, minDensityRatio = 0.1, left_annotation = NULL, top_annotation = NULL, ... )
geneBinHeatmap( se, gene, what = NULL, anno_rows = c("type", "logWidth", "meanLogDensity", "log10PValue", "geneAmbiguous"), anno_columns = c(), anno_colors = list(), removeAmbiguous = FALSE, merge_legends = TRUE, cluster_columns = FALSE, minDensityRatio = 0.1, left_annotation = NULL, top_annotation = NULL, ... )
se |
A bin-wise SummarizedExperiment as produced by
|
gene |
The gene of interest |
what |
Type of values (i.e. assay) to plot |
anno_rows |
Row annotation columns (i.e. columns of 'rowData(se)') to plot |
anno_columns |
Column annotation columns (i.e. columns of 'colData(se)') to plot |
anno_colors |
Annotation colors, as a list named with the row/column
annotations, see ' |
removeAmbiguous |
Logical; whether to remove bins that are gene-ambiguous (i.e. overlap multiple genes). |
merge_legends |
Logical; whether to merge legends. This effectively calls 'draw(..., merge_legends=TRUE)' around the heatmap. |
cluster_columns |
Logical; whether to cluster columns (passed to
|
minDensityRatio |
Minimum ratio of read density (with respect to the gene's average) for a bin to be plotted. |
left_annotation |
Passed to |
top_annotation |
Passed to |
... |
Passed to 'ComplexHeatmap' (see
|
A Heatmap
data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) geneBinHeatmap(se, "Jund")
data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) geneBinHeatmap(se, "Jund")
Aggregates bin-level statistics to the gene-level
geneLevelStats( se, coef = NULL, excludeTypes = NULL, includeTypes = NULL, returnSE = TRUE, minDensityRatio = 0.1, minWidth = 20, excludeGeneAmbiguous = TRUE )
geneLevelStats( se, coef = NULL, excludeTypes = NULL, includeTypes = NULL, returnSE = TRUE, minDensityRatio = 0.1, minWidth = 20, excludeGeneAmbiguous = TRUE )
se |
A 'RangedSummarizedExperiment' containing the results of one of the DEU wrappers. |
coef |
The coefficients tested (if the model included more than one term). |
excludeTypes |
Vector of bin types to exclude. |
includeTypes |
Vector of bin types to include (overrides 'excludeTypes') |
returnSE |
Logical; whether to return the updated 'se' object (default), or the gene-level table. |
minDensityRatio |
Minimum ratio of read density (with respect to the gene's average) for a bin to be included. |
minWidth |
Minimum bin width to include |
excludeGeneAmbiguous |
Logical; whether to exclude bins which are ambiguous (i.e. can be from different genes) |
If 'returnSE=TRUE' (default), returns the 'se' object with an updated 'metadata(se)$geneLevel' slot, otherwise returns the gene-level data.frame.
library(SummarizedExperiment) data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) se <- geneLevelStats(se, includeTypes="3UTR") head(metadata(se)$geneLevel)
library(SummarizedExperiment) data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) se <- geneLevelStats(se, includeTypes="3UTR") head(metadata(se)$geneLevel)
plotTopGenes
plotTopGenes(se, n = 10, FDR = 0.05, diffUTR = FALSE, alpha = 1, ...)
plotTopGenes(se, n = 10, FDR = 0.05, diffUTR = FALSE, alpha = 1, ...)
se |
A bin-wise SummarizedExperiment as produced by
|
n |
The maximum number of genes for which to plot labels |
FDR |
The FDR threshold above which to plot labels |
diffUTR |
Logical; if FALSE, uses absolute coefficients (appropriate for normal differential exon usage); if TRUE, uses non-absolute (ie changes should be in the same direction across significant bins) and width-weighted scores (i.e. larger bins have more weight) – this is relevant only when testing UTR usage. |
alpha |
Points transparency |
... |
Passed to geom_label_repel; this can for instance be used to increase 'max.overlaps' when not all desired gene labels are displayed) |
A ggplot
data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) plotTopGenes(se)
data(example_bin_se) se <- diffSpliceWrapper(example_bin_se, ~condition) plotTopGenes(se)
prepareBins
prepareBins( g, APA = NULL, onlyMainChr = TRUE, removeAntisense = TRUE, chrStyle = NULL, maxUTRbinSize = 15000, codingOnly = FALSE, genewise = FALSE, stranded = FALSE, verbose = TRUE )
prepareBins( g, APA = NULL, onlyMainChr = TRUE, removeAntisense = TRUE, chrStyle = NULL, maxUTRbinSize = 15000, codingOnly = FALSE, genewise = FALSE, stranded = FALSE, verbose = TRUE )
g |
A GRanges (or path to RDS file containing a GRanges) or path to a gtf file or EnsDb object containing the gene annotation. |
APA |
A GRanges (or path to a GRanges in RDS format) or bed file containing the alternative poly-A site database |
onlyMainChr |
Logical; whether to keep only main chromosomes |
removeAntisense |
Logical; whether to remove antisense APA sites |
chrStyle |
Chromosome notation to convert to (default no conversion) |
maxUTRbinSize |
Max width of new alternative UTR bins |
codingOnly |
Logical, whether to keep only coding transcripts |
genewise |
Logical, whether annotation should be flattened genewise |
stranded |
Logical, whether to perform disjoin in a stranded fashion. |
verbose |
Logical, whether to print run information |
See the vignette for more details.
A 'GRanges' object.
Stefan Greber
data(example_gene_annotation) bins <- prepareBins(example_gene_annotation)
data(example_gene_annotation) bins <- prepareBins(example_gene_annotation)
These are the sites from polyA_DB release 3.2, downloaded from https://exon.apps.wistar.org/PolyA_DB/v3/download/3.2/rat_pas.zip, and lifted over to Rno6.
a 'GRanges' object
Simes p-value correction and aggregation, adapted from
link[limma]{diffSplice}
simesAggregation(p.value, geneid)
simesAggregation(p.value, geneid)
p.value |
A vector of p-values |
geneid |
A vector of group labels such as gene identifiers |
A named vector of aggregated p-values
p <- runif(50) genes <- sample(LETTERS,50,replace=TRUE) simesAggregation(p, genes)
p <- runif(50) genes <- sample(LETTERS,50,replace=TRUE) simesAggregation(p, genes)