Title: | Additional functions for working with ChIP-Seq data |
---|---|
Description: | This package builds on existing tools and adds some simple but extremely useful capabilities for working wth ChIP-Seq data. The focus is on detecting differential binding windows/regions. One set of functions focusses on set-operations retaining mcols for GRanges objects, whilst another group of functions are to aid visualisation of results. Coercion to tibble objects is also implemented. |
Authors: | Stevie Pederson [aut, cre] |
Maintainer: | Stevie Pederson <[email protected]> |
License: | GPL-3 |
Version: | 1.11.0 |
Built: | 2024-10-30 07:48:35 UTC |
Source: | https://github.com/bioc/extraChIPs |
Add a status column based on significance and estimated change
addDiffStatus(x, ...) ## S4 method for signature 'data.frame' addDiffStatus( x, fc_col = "logFC", sig_col = c("FDR", "hmp_fdr", "p_fdr", "adj.P.Value"), alpha = 0.05, cutoff = 0, up = "Increased", down = "Decreased", other = "Unchanged", missing = "Undetected", new_col = "status", drop = FALSE, ... ) ## S4 method for signature 'DataFrame' addDiffStatus(x, new_col = "status", ...) ## S4 method for signature 'GRanges' addDiffStatus(x, ...) ## S4 method for signature 'GRangesList' addDiffStatus(x, ...) ## S4 method for signature 'SummarizedExperiment' addDiffStatus(x, ...)
addDiffStatus(x, ...) ## S4 method for signature 'data.frame' addDiffStatus( x, fc_col = "logFC", sig_col = c("FDR", "hmp_fdr", "p_fdr", "adj.P.Value"), alpha = 0.05, cutoff = 0, up = "Increased", down = "Decreased", other = "Unchanged", missing = "Undetected", new_col = "status", drop = FALSE, ... ) ## S4 method for signature 'DataFrame' addDiffStatus(x, new_col = "status", ...) ## S4 method for signature 'GRanges' addDiffStatus(x, ...) ## S4 method for signature 'GRangesList' addDiffStatus(x, ...) ## S4 method for signature 'SummarizedExperiment' addDiffStatus(x, ...)
x |
Object to be classified |
... |
Used to pass arguments between methods |
fc_col |
Name of the fold-change column |
sig_col |
Name of the column with significance values |
alpha |
significance threshold |
cutoff |
minimum estimated change to be considered in either of the up or down categories |
up , down , other
|
factor levels to annotate regions based on the above criteria |
missing |
Value to add when either fc_col or sig_col has NA values |
new_col |
name of the new column to be added |
drop |
logical(1) Drop unused factor levels from the status column |
This takes a simple object and adds a new column classifying entries into
one of three categories, as specified using up
, down
or other
.
Results in the new column will always be returned as a factor with levels in
order of the values provided in the arguments other
, down
and up
An object of the same type as provided
## Working with a data.frame set.seed(101) df <- data.frame(logFC = rnorm(20), p = rbeta(20, shape1 = 1, shape2 = 20)) df$FDR <- p.adjust(df$p, "fdr") addDiffStatus(df) ## This works identically with a GRanges object, amongst others gr <- GRanges(paste0("chr1:", seq_len(20))) mcols(gr) <- df addDiffStatus(gr)
## Working with a data.frame set.seed(101) df <- data.frame(logFC = rnorm(20), p = rbeta(20, shape1 = 1, shape2 = 20)) df$FDR <- p.adjust(df$p, "fdr") addDiffStatus(df) ## This works identically with a GRanges object, amongst others gr <- GRanges(paste0("chr1:", seq_len(20))) mcols(gr) <- df addDiffStatus(gr)
Convert multiple Genomic objects to tibbles
## S3 method for class 'DataFrame' as_tibble(x, rangeAsChar = TRUE, ...) ## S3 method for class 'GenomicRanges' as_tibble(x, rangeAsChar = TRUE, name = "range", ...) ## S3 method for class 'Seqinfo' as_tibble(x, ...) ## S3 method for class 'GInteractions' as_tibble(x, rangeAsChar = TRUE, suffix = c(".x", ".y"), ...) ## S3 method for class 'SummarizedExperiment' as_tibble(x, rangeAsChar = TRUE, ...) ## S3 method for class 'TopTags' as_tibble(x, ...)
## S3 method for class 'DataFrame' as_tibble(x, rangeAsChar = TRUE, ...) ## S3 method for class 'GenomicRanges' as_tibble(x, rangeAsChar = TRUE, name = "range", ...) ## S3 method for class 'Seqinfo' as_tibble(x, ...) ## S3 method for class 'GInteractions' as_tibble(x, rangeAsChar = TRUE, suffix = c(".x", ".y"), ...) ## S3 method for class 'SummarizedExperiment' as_tibble(x, rangeAsChar = TRUE, ...) ## S3 method for class 'TopTags' as_tibble(x, ...)
x |
A Genomic Ranges or DataFrame object |
rangeAsChar |
Convert any GRanges element to a character vector |
... |
Passed to |
name |
Name of column to use for ranges. Ignored if rangeAsChar =
|
suffix |
Suffix appended to column names for anchor1 and anchor2 of a GInteractions object. Only used if specifying rangeAsChar = FALSE |
Quick and dirty conversion into a tibble.
By default, GenomicRanges will be returned with the range as a character
column called range
and all mcols parsed as the remaining columns.
Seqinfo information will be lost during coercion.
Given that names for ranges are considered as rownames in the mcols element,
these can be simply parsed by setting rownames = "id"
in the call to
as_tibble()
When coercing a DataFrame, any Compressed/SimpleList columns will be coerced to S3 list columns. Any GRanges columns will be returned as a character column, losing any additional mcols from these secondary ranges
Coercion of SummarizedExperiment objects will be performed on the
rowRanges()
element, whilst for a GInteractions
object, both anchors will returned with the default suffixes .x
and .y
Defined as an S3 method for consistency with existing tidy methods
A tibble
gr <- GRanges("chr1:1-10") gr$p_value <- runif(1) names(gr) <- "range1" gr as_tibble(gr) as_tibble(gr, rownames = "id") as_tibble(mcols(gr)) as_tibble(seqinfo(gr)) hic <- InteractionSet::GInteractions(gr, GRanges("chr1:201-210")) hic$id <- "interaction1" as_tibble(hic)
gr <- GRanges("chr1:1-10") gr$p_value <- runif(1) names(gr) <- "range1" gr as_tibble(gr) as_tibble(gr, rownames = "id") as_tibble(mcols(gr)) as_tibble(seqinfo(gr)) hic <- InteractionSet::GInteractions(gr, GRanges("chr1:201-210")) hic$id <- "interaction1" as_tibble(hic)
Find the best overlap between ranges
bestOverlap(x, y, ...) ## S4 method for signature 'GRanges,GRanges' bestOverlap( x, y, var = NULL, ignore.strand = FALSE, missing = NA_character_, min_prop = 0.01, ... ) ## S4 method for signature 'GRanges,GRangesList' bestOverlap( x, y, ignore.strand = FALSE, missing = NA_character_, min_prop = 0.01, ... )
bestOverlap(x, y, ...) ## S4 method for signature 'GRanges,GRanges' bestOverlap( x, y, var = NULL, ignore.strand = FALSE, missing = NA_character_, min_prop = 0.01, ... ) ## S4 method for signature 'GRanges,GRangesList' bestOverlap( x, y, ignore.strand = FALSE, missing = NA_character_, min_prop = 0.01, ... )
x |
a GRanges object |
y |
a named GRangesList or GRanges object with mcol as reference category |
... |
Not used |
var |
The variable to use as the category. Not required if |
ignore.strand |
logical(1) Passed to findOverlaps |
missing |
Value to assign to ranges with no overlap |
min_prop |
Threshold below which overlaps are discarded |
This finds the category in the subject GRanges (y) which has the best overlap with the query GRanges (x). The aim is to produce a character vector for best classifying the query GRanges using an external set of features (e.g. promoters, enhancers etc). If the subject (y) is a GRanges object, the values in the specified column will be used as the category. If the subject (y) is a GRangesList, the names of the list will be used to provide the best match
Character vector the same length as the supplied GRanges object
gr <- GRanges("chr1:1-10") gr_cat <- GRanges(c("chr1:2-10", "chr1:5-10")) gr_cat$category <- c("a", "b") propOverlap(gr, gr_cat) bestOverlap(gr, gr_cat, var = "category") grl <- splitAsList(gr_cat, gr_cat$category) lapply(grl, function(x) propOverlap(gr, x)) bestOverlap(gr, grl)
gr <- GRanges("chr1:1-10") gr_cat <- GRanges(c("chr1:2-10", "chr1:5-10")) gr_cat$category <- c("a", "b") propOverlap(gr, gr_cat) bestOverlap(gr, gr_cat, var = "category") grl <- splitAsList(gr_cat, gr_cat$category) lapply(grl, function(x) propOverlap(gr, x)) bestOverlap(gr, grl)
Use coverage to estimate peak centres
centrePeaks(x, y, ...) ## S4 method for signature 'GRanges,BamFileList' centrePeaks( x, y, f = c("weighted.cov", "mean", "median"), BPPARAM = bpparam(), ... ) ## S4 method for signature 'GRanges,BamFile' centrePeaks(x, y, ...) ## S4 method for signature 'GRanges,BigWigFileList' centrePeaks( x, y, f = c("weighted.cov", "mean", "median"), BPPARAM = bpparam(), ... ) ## S4 method for signature 'GRanges,BigWigFile' centrePeaks(x, y, ...) ## S4 method for signature 'GRanges,character' centrePeaks(x, y, ...)
centrePeaks(x, y, ...) ## S4 method for signature 'GRanges,BamFileList' centrePeaks( x, y, f = c("weighted.cov", "mean", "median"), BPPARAM = bpparam(), ... ) ## S4 method for signature 'GRanges,BamFile' centrePeaks(x, y, ...) ## S4 method for signature 'GRanges,BigWigFileList' centrePeaks( x, y, f = c("weighted.cov", "mean", "median"), BPPARAM = bpparam(), ... ) ## S4 method for signature 'GRanges,BigWigFile' centrePeaks(x, y, ...) ## S4 method for signature 'GRanges,character' centrePeaks(x, y, ...)
x |
A set of GRanges representing peaks |
y |
A suitable set of files with methods defined |
... |
Used to pass arguments between methods |
f |
The function to use when estimating a combined peak centre |
BPPARAM |
An object of class BPPARAM |
Use coverage to estimate the centre of a set of peaks or GenomicRanges.
If using the mean or median, the point of maximum coverage for each sample will be found within each peak and these positions will be averaged to return a position representing an estimated peak centre.
If using weighted.cov, positions are weighted by the combined coverage across all samples to return the weighted mean position. In this case coverage will be scaled by total alignments within each bam file before summing across files
A GRanges object with all widths set to one
## Define some peaks f <- system.file("extdata/peaks.bed.gz", package = "extraChIPs") peaks <- importPeaks(f, type = "bed")[[1]] peaks ## Use a bam file to re-centre the regions using highest coverage bf <- system.file("extdata/bam/ex1.bam", package = "extraChIPs") centres <- centrePeaks(peaks, bf, BPPARAM = SerialParam()) centres
## Define some peaks f <- system.file("extdata/peaks.bed.gz", package = "extraChIPs") peaks <- importPeaks(f, type = "bed")[[1]] peaks ## Use a bam file to re-centre the regions using highest coverage bf <- system.file("extdata/bam/ex1.bam", package = "extraChIPs") centres <- centrePeaks(peaks, bf, BPPARAM = SerialParam()) centres
Keep unique ranges by 'chopping' mcols
chopMC(x, simplify = TRUE)
chopMC(x, simplify = TRUE)
x |
A GenomicRanges object |
simplify |
logical(1) |
This function finds unique ranges and chops all mcols in a manner similar
to chop.
Chopped columns will be returned as CompressedList
columns, unless
simplify = TRUE
(the default).
In this case, columns will be returned as vectors where possible.
A GRanges object
gr <- GRanges(rep(c("chr1:1-10"), 2)) gr$id <- paste0("range", seq_along(gr)) gr$gene <- "gene1" gr chopMC(gr)
gr <- GRanges(rep(c("chr1:1-10"), 2)) gr$id <- paste0("range", seq_along(gr)) gr$gene <- "gene1" gr chopMC(gr)
Collapse a vector of gene names
collapseGenes( x, sort = TRUE, dedup = TRUE, format = "_", sep = ", ", last = " and ", numeric = TRUE, width = Inf, ... )
collapseGenes( x, sort = TRUE, dedup = TRUE, format = "_", sep = ", ", last = " and ", numeric = TRUE, width = Inf, ... )
x |
character vector representing gene names |
sort |
logical(1) Should the names be sorted alphabetically |
dedup |
logical(1) Should duplicate names be removed |
format |
character string for markdown formatting of each element |
sep |
separator between vector elements |
last |
character string to place before the last element |
numeric |
logical(1) sort digits numerically, instead of as strings |
width |
The maximum width of the string before truncating to ... |
... |
passed to str_sort |
Convenience function to collapse a vector of gene names into a character/glue object of length 1. By default, symbols are deduplicated, sorted alpha-numerically and italicised with an underscore.
a glue object
genes <- c("FOXP3", "BRCA1", "TP53") collapseGenes(genes)
genes <- c("FOXP3", "BRCA1", "TP53") collapseGenes(genes)
Coerce a column to a GRanges object from a rectangular object
colToRanges(x, ...) ## S4 method for signature 'DataFrame' colToRanges(x, var, seqinfo = NULL, ...) ## S4 method for signature 'GRanges' colToRanges(x, var, ..., keep_metadata = TRUE) ## S4 method for signature 'data.frame' colToRanges(x, var, seqinfo = NULL, ...)
colToRanges(x, ...) ## S4 method for signature 'DataFrame' colToRanges(x, var, seqinfo = NULL, ...) ## S4 method for signature 'GRanges' colToRanges(x, var, ..., keep_metadata = TRUE) ## S4 method for signature 'data.frame' colToRanges(x, var, seqinfo = NULL, ...)
x |
A data-frame or GRanges object containing the column to coerce |
... |
Used to pass arguments to lower-level functions |
var |
The name of the column to coerce |
seqinfo |
A seqinfo object to be applied to the new GRanges object. Ignored if the column is already a GRanges object |
keep_metadata |
logical(1) If the original object is a GRanges object, retain all metadata from the original ranges in the replaced ranges |
Take a data.frame-like object and coerce one column to a GRanges object,
setting the remainder as the mcols
.
A particularly useful application of this is when you have a GRanges object
with one mcol being a secondary GRanges object.
Alternatively, if you have a data.frame with GRanges represented as a character column, this provides a simple method of coercion. In this case, no Seqinfo element will be applied to the GRanges element.
A GenomicRanges object
set.seed(73) x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60")) seqinfo(x) <- Seqinfo("chr1", 60, FALSE, "Example") df <- data.frame(logFC = rnorm(3), logCPM = rnorm(3,8), p = 10^-rexp(3)) mcols(x) <- df gr <- mergeByCol(x, col = "logCPM", pval = "p") colToRanges(gr, "keyval_range")
set.seed(73) x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60")) seqinfo(x) <- Seqinfo("chr1", 60, FALSE, "Example") df <- data.frame(logFC = rnorm(3), logCPM = rnorm(3,8), p = 10^-rexp(3)) mcols(x) <- df gr <- mergeByCol(x, col = "logCPM", pval = "p") colToRanges(gr, "keyval_range")
Cytogenetic bands for GRCh37/hg19 and GRCh38/hg38
data(grch37.cytobands) data(grch38.cytobands)
data(grch37.cytobands) data(grch38.cytobands)
Cytogenetic bands for standard chromosomes from GRCh37,in the format required by IdeogramTrack. A data.frame with 5 columns:
Chromosome
Starting position for each cytogenetic band
End position for each cytogenetic band
Name for each band, e.g. p.36.33
Staining pattern
An object of class data.frame
with 862 rows and 5 columns.
https://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz
https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/cytoBand.txt.gz
data(grch37.cytobands) head(grch37.cytobands) data(grch38.cytobands) head(grch38.cytobands)
data(grch37.cytobands) head(grch37.cytobands) data(grch38.cytobands) head(grch38.cytobands)
Use gene, transcript and exon annotations to define genomic regions
defineRegions( genes, transcripts, exons, promoter = c(2500, 500), upstream = 5000, intron = TRUE, proximal = 10000, simplify = FALSE, cols = c("gene_id", "gene_name", "transcript_id", "transcript_name"), ... )
defineRegions( genes, transcripts, exons, promoter = c(2500, 500), upstream = 5000, intron = TRUE, proximal = 10000, simplify = FALSE, cols = c("gene_id", "gene_name", "transcript_id", "transcript_name"), ... )
genes , transcripts , exons
|
GRanges objects defining each level of annotation |
promoter |
Numeric vector defining upstream and/or downstream distances for a promoter. Passing a single value will define a symmetrical promoter The first value represents the upstream range |
upstream |
The distance from a TSS defining an upstream promoter |
intron |
logical(1) Separate gene bodies into introns and exons. If
|
proximal |
Distance from a gene to be considered a proximal intergenic region. If set to 0, intergenic regions will simply be considered as uniformly intergenic |
simplify |
Passed internally to |
cols |
Column names to be retained from the supplied annotations |
... |
Not used |
Using GRanges annotated as genes, transcripts and exons this function will define ranges uniquely assigned to a region type using a hierarchical process. By default, these region types will be (in order) 1) Promoters, 2) Upstream Promoters, 3) Exons, 4) Introns, 5) Proximal Intergenic and 6) Distal Intergenic.
Setting intron = FALSE
will replace introns and exons with a generic "Gene
Body" annotation.
Setting proximal = 0
will return all intergenic regions (not previously
annotated as promoters or upstream promoters) to an "Intergenic" category
Notably, once a region has been defined, it is excluded from all subsequent candidate regions.
Any columns matching the names provided in cols will be returned, and it is
assumed that the gene/transcript/exon ranges will contain informative columns
in the mcols()
element.
A GRangesList
## Define two exons for two transcripts sq <- Seqinfo(seqnames = "chr1", seqlengths = 50000) e <- c("chr1:20001-21000", "chr1:29001-29950", "chr1:22001-23000", "chr1:29001-30000") e <- GRanges(e, seqinfo = sq) mcols(e) <- DataFrame( gene_id = "Gene1", transcript_id = paste0("Trans", c(1, 1, 2, 2)) ) ## Define the transcript ranges t <- unlist(endoapply(split(e, e$transcript_id), range)) t$gene_id <- "Gene1" t$transcript_id <- names(t) names(t) <- NULL ## Summarise to gene level g <- reduceMC(t) g$transcript_id <- NA_character_ ## Now annotate the regions regions <- defineRegions(genes = g, transcripts = t, exons = e) sort(unlist(regions)) ## Alternatively, collpse gene body and intergenic ranges regions <- defineRegions( genes = g, transcripts = t, exons = e, intron = FALSE, proximal = 0 ) sort(unlist(regions))
## Define two exons for two transcripts sq <- Seqinfo(seqnames = "chr1", seqlengths = 50000) e <- c("chr1:20001-21000", "chr1:29001-29950", "chr1:22001-23000", "chr1:29001-30000") e <- GRanges(e, seqinfo = sq) mcols(e) <- DataFrame( gene_id = "Gene1", transcript_id = paste0("Trans", c(1, 1, 2, 2)) ) ## Define the transcript ranges t <- unlist(endoapply(split(e, e$transcript_id), range)) t$gene_id <- "Gene1" t$transcript_id <- names(t) names(t) <- NULL ## Summarise to gene level g <- reduceMC(t) g$transcript_id <- NA_character_ ## Now annotate the regions regions <- defineRegions(genes = g, transcripts = t, exons = e) sort(unlist(regions)) ## Alternatively, collpse gene body and intergenic ranges regions <- defineRegions( genes = g, transcripts = t, exons = e, intron = FALSE, proximal = 0 ) sort(unlist(regions))
Use package data to define a Seqinfo object
defineSeqinfo( build = c("GRCh38", "GRCh37", "GRCm39", "GRCm38", "hg19", "hg38", "T2T-CHM13v2.0", "mm39", "mm10"), chr = TRUE, mito, ... )
defineSeqinfo( build = c("GRCh38", "GRCh37", "GRCm39", "GRCm38", "hg19", "hg38", "T2T-CHM13v2.0", "mm39", "mm10"), chr = TRUE, mito, ... )
build |
The Genome build used |
chr |
logical(1) Include the prefix "chr" |
mito |
Specify M or MT to include the mitochondrial chromosome. Omitted by default |
... |
Not used |
This function will create a Seqinfo object from pre-defined data from the Genome Reference Consortium. Returned objects will always be restricted to assembled molecules only. Currently implemented genome builds represent the four most common builds for ChIP-Seq analysis
A Seqinfo object
defineSeqinfo("GRCh37", TRUE) defineSeqinfo("GRCh37", FALSE, "MT")
defineSeqinfo("GRCh37", TRUE) defineSeqinfo("GRCh37", FALSE, "MT")
Keep distinct ranges by including mcols
distinctMC(x, ..., .keep_all = FALSE)
distinctMC(x, ..., .keep_all = FALSE)
x |
A GenomicRanges object |
... |
|
.keep_all |
If |
Wrapper to distinct for GRanges
objects.
Finds unique ranges and mcols in combination and retains only
the distinct combinations, in keeping with the dplyr
function.
Will default to unique(granges(x))
if no columns are provided
A GRanges object
gr <- GRanges(rep(c("chr1:1-10"), 2)) gr$id <- paste0("range", seq_along(gr)) gr$gene <- "gene1" gr distinctMC(gr) distinctMC(gr, gene) distinctMC(gr, gene, .keep_all = TRUE)
gr <- GRanges(rep(c("chr1:1-10"), 2)) gr$id <- paste0("range", seq_along(gr)) gr$gene <- "gene1" gr distinctMC(gr) distinctMC(gr, gene) distinctMC(gr, gene, .keep_all = TRUE)
Apply two filters to counts generated using sliding windows
dualFilter( x, bg = NULL, ref, q = 0.5, logCPM = TRUE, keep.totals = TRUE, bin.size = NULL, prior.count = 2, BPPARAM = bpparam() )
dualFilter( x, bg = NULL, ref, q = 0.5, logCPM = TRUE, keep.totals = TRUE, bin.size = NULL, prior.count = 2, BPPARAM = bpparam() )
x |
RangedSummarizedExperiment containing sample counts |
bg |
RangedSummarizedExperiment containing background/input counts, or alternate method for selecting samples from within x, such as a logical, numeric or character vector |
ref |
GRanges object containing ranges where signal is expected |
q |
The upper percentile of the reference ranges expected to be returned when tuning the filtering criteria |
logCPM |
logical(1) Add a logCPM assay to the returned data |
keep.totals |
logical(1) Keep the original library sizes or replace using only the retained windows |
bin.size |
Bin sizes when calling filterWindowsControl. If not specified will default to the largest of 2000bp or 10x the window size |
prior.count |
Passed to filterWindowsControl and filterWindowsProportion |
BPPARAM |
Settings for running in parallel |
This function will take sliding (or tiling) windows for it's input as a RangedSummarizedExperiment object. The dual strategy of applying filterWindowsControl and filterWindowsProportion will then be applied. A set of reference ranges for which signal is expected is used to refine the filtering criteria.
Cutoff values are found for both signal relative to input and overall signal,
such that the 100*q%
of the (sliding) windows which overlap a reference
range will be returned, along with any others which match the
dual filtering criteria.
In general, higher values of q
will return more windows as those with weak
signal and a marginal overlap with a reference range will be returned.
Lower values will ensure that fewer windows, generally with the strongest
signal, are retained.
Cutoff values for both criteria are added to the metadata
element of the returned object.
If setting bg = NULL
the filterWindowsControl step will be
ignored and only the filterWindowsProportion will be used.
This should only be performed if no Input sample is available.
Please note that the any .bam
files referred to in the supplied objects
must be accessible to this function. It will not run on a separate
machine or file structure to that which the original sliding windows were
prepared. Please see the example/vignette for runnable code.
A RangedSummarizedExperiment which is a
filtered subset of the original object. If requested the assay "logCPM" will
be added (TRUE
by default)
## Taken from the differential_binding vignette library(tidyverse) library(Rsamtools) library(csaw) library(BiocParallel) library(rtracklayer) ## For this function we need a set of counts using sliding windows and the ## original BamFiles from which they were taken ## First we'll set up the bam file list bfl <- system.file( "extdata", "bam", c("ex1.bam", "ex2.bam", "input.bam"), package = "extraChIPs" ) %>% BamFileList() %>% setNames(c("ex1", "ex2", "input")) ## Then define the readParam settings for csaw::readParam() rp <- readParam( pe = "none", dedup = TRUE, restrict = "chr10" ) ## Now we can form our sliding window object with the counts. wincounts <- windowCounts( bam.files = bfl, spacing = 60, width = 180, ext = 200, filter = 1, param = rp ) ## As this is a subset of reads, add the initial library sizes for accuracy ## Note that this step is not normally required wincounts$totals <- c(964076L, 989543L, 1172179L) ## We should also update the metadata for our counts wincounts$sample <- colnames(wincounts) wincounts$treat <- as.factor(c("ctrl", "treat", NA)) colData(wincounts) ## The function dualFilter requires a set of peaks which will guide the ## filtering step. This indicate where genuine signal is likely to be found ## and will perform the filtering based on a) signal above the input, and ## b) The overall signal level, using the guide set of peaks to inform the ## cutoff values for inclusion peaks <- import.bed( system.file("extdata", "peaks.bed.gz", package = "extraChIPs") ) filtcounts <- dualFilter( x = wincounts, bg = "input", ref = peaks, q = 0.8 # Better to use q = 0.5 on real data ) filtcounts
## Taken from the differential_binding vignette library(tidyverse) library(Rsamtools) library(csaw) library(BiocParallel) library(rtracklayer) ## For this function we need a set of counts using sliding windows and the ## original BamFiles from which they were taken ## First we'll set up the bam file list bfl <- system.file( "extdata", "bam", c("ex1.bam", "ex2.bam", "input.bam"), package = "extraChIPs" ) %>% BamFileList() %>% setNames(c("ex1", "ex2", "input")) ## Then define the readParam settings for csaw::readParam() rp <- readParam( pe = "none", dedup = TRUE, restrict = "chr10" ) ## Now we can form our sliding window object with the counts. wincounts <- windowCounts( bam.files = bfl, spacing = 60, width = 180, ext = 200, filter = 1, param = rp ) ## As this is a subset of reads, add the initial library sizes for accuracy ## Note that this step is not normally required wincounts$totals <- c(964076L, 989543L, 1172179L) ## We should also update the metadata for our counts wincounts$sample <- colnames(wincounts) wincounts$treat <- as.factor(c("ctrl", "treat", NA)) colData(wincounts) ## The function dualFilter requires a set of peaks which will guide the ## filtering step. This indicate where genuine signal is likely to be found ## and will perform the filtering based on a) signal above the input, and ## b) The overall signal level, using the guide set of peaks to inform the ## cutoff values for inclusion peaks <- import.bed( system.file("extdata", "peaks.bed.gz", package = "extraChIPs") ) filtcounts <- dualFilter( x = wincounts, bg = "input", ref = peaks, q = 0.8 # Better to use q = 0.5 on real data ) filtcounts
Various example datasets for demonstrating analysis and
visualisation strategies.
Generation of all datasets is documented in
system.file("script/ex_datasets.md", package = "extraChIPs")
Simple GRanges object with complete ranges for each gene
Exon & transcript level information prepared for plotting
with Gviz
or plotHFGC()
Regions defined as promoters
Example HiC interactions
data(ex_trans) data(ex_genes) data(ex_prom) data(ex_hic)
data(ex_trans) data(ex_genes) data(ex_prom) data(ex_hic)
GRanges and GInteractions objects
All annotations are from GRCh37
An object of class GRanges
of length 4.
An object of class GRanges
of length 9.
An object of class GInteractions
of length 1.
data(ex_trans) ex_trans
data(ex_trans) ex_trans
Detect differential ChIP signal using one of many approaches
fitAssayDiff(x, ...) ## S4 method for signature 'SummarizedExperiment' fitAssayDiff( x, assay = "counts", design = NULL, coef = NULL, lib.size = "totals", method = c("qlf", "lt", "wald"), norm = c("none", "TMM", "RLE", "TMMwsp", "upperquartile"), groups = NULL, fc = 1, lfc = log2(fc), asRanges = FALSE, offset = NULL, weighted = FALSE, ..., null = c("interval", "worst.case"), robust = FALSE, type = c("apeglm", "ashr", "normal") )
fitAssayDiff(x, ...) ## S4 method for signature 'SummarizedExperiment' fitAssayDiff( x, assay = "counts", design = NULL, coef = NULL, lib.size = "totals", method = c("qlf", "lt", "wald"), norm = c("none", "TMM", "RLE", "TMMwsp", "upperquartile"), groups = NULL, fc = 1, lfc = log2(fc), asRanges = FALSE, offset = NULL, weighted = FALSE, ..., null = c("interval", "worst.case"), robust = FALSE, type = c("apeglm", "ashr", "normal") )
x |
a SummarizedExperiment object |
... |
Passed to normLibSizes and |
assay |
The assay to use for analysis |
design |
The design matrix to use for analysis |
coef |
The required column from the design matrix |
lib.size |
The column within the colData element which contains the library size information. If set to NULL, column summaries will be used. |
method |
the analytic method to be used. Can be 'qlf' which will fit counts using the glmQLFit strategy , or 'lt' which fits the limma-trend model on logCPM, or pre-processed logCPM values. Setting method = 'wald' will call nbinomWaldTest |
norm |
The normalisation strategy to use when running the glmQLF model or the Wald test. The value 'none' relies solely on library-size normalisation, and is the default. All methods available in normLibSizes are implemented. Ignored when using method = "lt" |
groups |
character(1) If a column name is supplied here, group-based normalisation will be applied to GLM models treating data in this column as a grouping factor. Ignored when using method = "lt" |
fc , lfc
|
|
asRanges |
logical(1). By default, the returned object will be a
|
offset |
If provided will be used as the offset when a DGEList object is created during model fitting for method = 'qlf' |
weighted |
logical(1) Passed to normLibSizes. Only used when applying a TMM-type normalisation strategy |
null |
Passed to glmTreat glmQLFit when method = "qlf". If method = "lt", instead passed to lmFit |
robust |
|
type |
Passed to lfcShrink |
Starting with a SummarizedExperiment object this function fits either a glmQLFit model to count data, performs the nbinomWaldTest on count data, or applies the limma-trend model to logCPM data.
If fitting Generalised Linear Models via glmQLFit, options for normalisation are "none", which normalises to library size. Existing library sizes are commonly found in the "totals" column of the colData element and this is attempted by default. All methods provided in normLibSizes are also implemented, with the added possibility of normalising within groups instead of across the entire dataset. To enable this, the column with the grouping factor is expected to be in the colData element and is simply called by column name. No normalisation is applied when using the limma-trend model, as this allows for previous normalisation strategies to be performed on the data.
If testing with nbinomWaldTest, applying RLE normalisation without groups, and using colSums for library sizes (instead of total alignments), the standard normalisation factors from estimateSizeFactors will be used. In all other scenarios, normalisation factors as returned by normLibSizes will be used. The fitType is set to 'local' when estimating dispersions, and this can be easily modified by passing fitType via the dot arguments. Results are additionally returned after applying lfcShrink, including the svalue returned by this approach.
Normalising to ChIP Input samples is not yet implemented. Similarly, the use of offsets when applying the Wald test is not yet implemented.
Range-based hypothesis testing is implemented using glmTreat or treat. Setting fc to 1 (or lfc to 0) will default to a point-based null hypothesis, equivalent to either glmQLFTest (method = "qlf") or eBayes (method = "lt"). When applying nbinomWaldTest, lfcShrink will be applied.
It should also be noted that this is primarily a convenience function and if requiring intermediate output from any steps, then these can be run individually as conventionally specified.
A SummarizedExperiment object with results set as the rowData
element.
Any existing columns not contained in the differential ChIP results will be
retained.
Results from testing will contain logCPM, logFC, PValue and any t/F
statistic as appropriate, along with an FDR-adjusted p-value.
If specifying a range-based H0 by setting lfc != 0, an additional column p_mu0 will be included which is the p-value for the point H0: logFC = 0. These are not used for FDR-adjusted p-values but can be helpful when integrating multiple ChIP targets due to the increase in false-negatives when using a range-based H0, and when requiring more accurate identification of truly unchanged sites, as opposed to those which simply fail to achieve significance using a range-based H0 where arbitrary cutoff values are used.
nrows <- 200; ncols <- 6 counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows) colnames(counts) <- paste0("Sample_", seq_len(ncols)) df <- DataFrame(treat = c("A", "A", "A", "B", "B", "B")) df$treat <- as.factor(df$treat) se <- SummarizedExperiment( assays = SimpleList(counts = counts), colData = df ) X <- model.matrix(~treat, colData(se)) se <- fitAssayDiff(se, design = X, lib.size = NULL) rowData(se)
nrows <- 200; ncols <- 6 counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows) colnames(counts) <- paste0("Sample_", seq_len(ncols)) df <- DataFrame(treat = c("A", "A", "A", "B", "B", "B")) df$treat <- as.factor(df$treat) se <- SummarizedExperiment( assays = SimpleList(counts = counts), colData = df ) X <- model.matrix(~treat, colData(se)) se <- fitAssayDiff(se, design = X, lib.size = NULL) rowData(se)
GRangesList of peaks and SummarizedExperiment of counts All were saved during initial vignette preparation at https://github.com/smped/extraChIPs_vignette/blob/main/differential_signal_fixed.Rmd
data(se) data(peaks)
data(se) data(peaks)
An object of class RangedSummarizedExperiment
with 188 rows and 6 columns.
An object of class CompressedGRangesList
of length 6.
data(se) se data(peaks) peaks
data(se) se data(peaks) peaks
Get coverage Profile Data surrounding specified ranges
getProfileData(x, gr, ...) ## S4 method for signature 'BigWigFile,GenomicRanges' getProfileData( x, gr, upstream = 2500, downstream = upstream, bins = 100, mean_mode = "w0", log = TRUE, offset = 1, n_max = Inf, ... ) ## S4 method for signature 'BigWigFileList,GenomicRanges' getProfileData( x, gr, upstream = 2500, downstream = upstream, bins = 100, mean_mode = "w0", log = TRUE, offset = 1, BPPARAM = SerialParam(), ... ) ## S4 method for signature 'character,GenomicRanges' getProfileData( x, gr, upstream = 2500, downstream = upstream, bins = 100, mean_mode = "w0", log = TRUE, offset = 1, ... )
getProfileData(x, gr, ...) ## S4 method for signature 'BigWigFile,GenomicRanges' getProfileData( x, gr, upstream = 2500, downstream = upstream, bins = 100, mean_mode = "w0", log = TRUE, offset = 1, n_max = Inf, ... ) ## S4 method for signature 'BigWigFileList,GenomicRanges' getProfileData( x, gr, upstream = 2500, downstream = upstream, bins = 100, mean_mode = "w0", log = TRUE, offset = 1, BPPARAM = SerialParam(), ... ) ## S4 method for signature 'character,GenomicRanges' getProfileData( x, gr, upstream = 2500, downstream = upstream, bins = 100, mean_mode = "w0", log = TRUE, offset = 1, ... )
x |
A BigWigFile or BigWiFileList |
gr |
A GRanges object |
... |
Passed to normalizeToMatrix |
upstream |
The distance to extend upstream from the centre of each
range within |
downstream |
The distance to extend downstream from the centre of each
range within |
bins |
The total number of bins to break the extended ranges into |
mean_mode |
The method used for calculating the score for each bin. See normalizeToMatrix for details |
log |
logical(1) Should the returned values be log2-transformed |
offset |
Value added to data if log-transforming. Ignored otherwise |
n_max |
Upper limit on the number of ranges to return profile data for. By default, no limit will be applied . |
BPPARAM |
Passed internally to bplapply |
This will take all provided ranges and set as identical width ranges, extending by the specified amount both up and downstream of the centre of the provided ranges. By default, the ranges extensions are symmetrical and only the upstream range needs to be specified, however this parameterisation allows for non-symmetrical ranges to be generated.
These uniform width ranges will then be used to extract the value contained in the score field from one or more BigWigFiles. Uniform width ranges are then broken into bins of equal width and the average score found within each bin.
The binned profiles are returned as a DataFrameList called profile_data
as
a column within the resized GRanges object.
Column names in each DataFrame are score
, position
and bp
.
If passing a BigWigFileList, profiles will be obtained in series by
default. To run in parallel pass a MulticoreParam object
to the BPPARAM
argument.
GRanges or GrangesList with column profile_data, as described above
bw <- system.file("tests", "test.bw", package = "rtracklayer") gr <- GRanges("chr2:1000") pd <- getProfileData(bw, gr, upstream = 500, bins = 10) pd pd$profile_data
bw <- system.file("tests", "test.bw", package = "rtracklayer") gr <- GRanges("chr2:1000") pd <- getProfileData(bw, gr, upstream = 500, bins = 10) pd pd$profile_data
Move one or more columns from a GRangesList elements into assays in a RangesSummarizedEperiment
grlToSE(x, ...) ## S4 method for signature 'GRangesList' grlToSE( x, assayCols = c(), metaCols = c(), keyvals = c(), by = c("min", "max"), ..., ignore.strand = FALSE )
grlToSE(x, ...) ## S4 method for signature 'GRangesList' grlToSE( x, assayCols = c(), metaCols = c(), keyvals = c(), by = c("min", "max"), ..., ignore.strand = FALSE )
x |
A GrangesList |
... |
Passed to reduce |
assayCols |
Columns to move to separate assays |
metaCols |
Columns to move to mcols within the rowRanges element |
keyvals |
The value to use when choosing representative values |
by |
How to choose by keyvals |
ignore.strand |
logical(1). Whether the strand of the input ranges should be ignored or not. |
Given a GRangesList which would commonly represent multiple samples, reduce any overlapping ranges into a consensus range, setting any metadata columns to be retained as separate assays. These columns may contain values such as coverage, p-values etc.
Additional columns can also be placed as rowData columns where the original values are better suited to information about the consensus range rather than the sample (or GRangesList element).
Only one value for each range will be retained, and these are chosen using the value provided as the keyvals, taking either the min or max value in this column as the representative range.
A RangedSummarizedExperiment
a <- GRanges("chr1:1-10") a$feature <- "Gene" a$p <- 0.1 b <- GRanges(c("chr1:6-15", "chr1:15")) b$feature <- c("Gene", "Promoter") b$p <- c(0.5, 0.01) grl <- GRangesList(a = a, b = b) grl se <- grlToSE( grl, assayCols = "p", metaCols = "feature", keyvals = "p", by = "min" ) assay(se, "p") rowRanges(se)
a <- GRanges("chr1:1-10") a$feature <- "Gene" a$p <- 0.1 b <- GRanges(c("chr1:6-15", "chr1:15")) b$feature <- c("Gene", "Promoter") b$p <- c(0.5, 0.01) grl <- GRangesList(a = a, b = b) grl se <- grlToSE( grl, assayCols = "p", metaCols = "feature", keyvals = "p", by = "min" ) assay(se, "p") rowRanges(se)
Import peaks in narrowPeak, broadPeak or bed format
importPeaks( x, type = c("narrow", "broad", "bed"), blacklist, seqinfo, pruning.mode = c("coarse", "error"), sort = TRUE, setNames = TRUE, glueNames = "{basename(x)}", centre = FALSE, nameRanges = TRUE, ... )
importPeaks( x, type = c("narrow", "broad", "bed"), blacklist, seqinfo, pruning.mode = c("coarse", "error"), sort = TRUE, setNames = TRUE, glueNames = "{basename(x)}", centre = FALSE, nameRanges = TRUE, ... )
x |
One or more files to be imported. All files must be of the same type, i.e. narrow or broad |
type |
The type of peaks to be imported |
blacklist |
A set of ranges to be excluded |
seqinfo |
A seqinfo object to be applied to the GRanges objects |
pruning.mode |
How to handle conflicts if supplying a seqinfo object.
Defaults to |
sort |
logical. Should the ranges be sorted during import |
setNames |
logical Set basename(x) as the name for each element of the GRangesList |
glueNames |
glue syntax for naming list elements |
centre |
Add the estimated peak centre. Ignored unless type = "narrow" |
nameRanges |
Place any values in the name column as range names within each file. |
... |
passed to |
Peaks are imported from narrowPeak, broadPeak or bed format as GenomicRanges objects.
If importing bed files, only the default 3-6 columns will imported.
A GRangesList
fl <- system.file( c("extdata/ER_1.narrowPeak", "extdata/ER_2.narrowPeak"), package = "extraChIPs" ) peaks <- importPeaks(fl) peaks
fl <- system.file( c("extdata/ER_1.narrowPeak", "extdata/ER_2.narrowPeak"), package = "extraChIPs" ) peaks <- importPeaks(fl) peaks
Make a set of consensus peaks based on the number of replicates
makeConsensus( x, p = 0, var = NULL, method = c("union", "coverage"), ignore.strand = TRUE, simplify = FALSE, min_width = 0, merge_within = 1L, ... )
makeConsensus( x, p = 0, var = NULL, method = c("union", "coverage"), ignore.strand = TRUE, simplify = FALSE, min_width = 0, merge_within = 1L, ... )
x |
A GRangesList |
p |
The minimum proportion of samples (i.e. elements of |
var |
Additional columns in the |
method |
Either return the union of all overlapping ranges, or the regions within the overlapping ranges which are covered by the specified proportion of replicates. When using p = 0, both methods will return identical results |
ignore.strand , simplify , ...
|
Passed to reduceMC or intersectMC internally |
min_width |
Discard any regions below this width |
merge_within |
Passed to reduce as |
This takes a list of GRanges objects and forms a set of consensus peaks.
When using method = "union" the union ranges of all overlapping peaks will be returned, using the minimum proportion of replicates specified. When using method = "coverage", only the regions within each overlapping range which are 'covered' by the minimum proportion of replicates specified are returned. This will return narrower peaks in general, although some artefactual very small ranges may be included (e.g. 10bp). Careful setting of the min_width and merge_within parameters may be very helpful for these instances. It is also expected that setting method = "coverage" should return the region within each range which is more likely to contain the true binding site for the relevant ChIP targets
GRanges
object with mcols containing a logical vector for every element of
x, along with the column n
which adds all logical columns. These columns
denote which replicates contain an overlapping peak for each range
If any additional columns have been requested using var
, these will be
returned as CompressedList objects as produced by reduceMC()
or
intersectMC()
.
data("peaks") ## The first three replicates are from the same treatment group grl <- peaks[1:3] names(grl) <- gsub("_peaks.+", "", names(grl)) makeConsensus(grl) makeConsensus(grl, p = 2/3, var = "score") ## Using method = 'coverage' finds ranges based on the intersection makeConsensus(grl, p = 2/3, var = "score", method = "coverage")
data("peaks") ## The first three replicates are from the same treatment group grl <- peaks[1:3] names(grl) <- gsub("_peaks.+", "", names(grl)) makeConsensus(grl) makeConsensus(grl, p = 2/3, var = "score") ## Using method = 'coverage' finds ranges based on the intersection makeConsensus(grl, p = 2/3, var = "score", method = "coverage")
Map Genomic Ranges to genes using defined regulatory features
mapByFeature( gr, genes, prom, enh, gi, cols = c("gene_id", "gene_name", "symbol"), gr2prom = 0, gr2enh = 0, gr2gi = 0, gr2gene = 1e+05, prom2gene = 0, enh2gene = 1e+05, gi2gene = 0, ... )
mapByFeature( gr, genes, prom, enh, gi, cols = c("gene_id", "gene_name", "symbol"), gr2prom = 0, gr2enh = 0, gr2gi = 0, gr2gene = 1e+05, prom2gene = 0, enh2gene = 1e+05, gi2gene = 0, ... )
gr |
GRanges object with query ranges to be mapped to genes |
genes |
GRanges object containing genes (or any other nominal feature) to be assigned |
prom |
GRanges object defining promoters |
enh |
GRanges object defining Enhancers |
gi |
GInteractions object defining interactions. Mappings from interactions to genes should be performed as a separate prior step. |
cols |
Column names to be assigned as mcols in the output. Columns
must be minimally present in |
gr2prom |
The maximum permissible distance between a query range and any ranges defined as promoters |
gr2enh |
The maximum permissible distance between a query range and any ranges defined as enhancers |
gr2gi |
The maximum permissible distance between a query range and any ranges defined as GInteraction anchors |
gr2gene |
The maximum permissible distance between a query range and genes (for ranges not otherwise mapped) |
prom2gene |
The maximum permissible distance between a range provided
in |
enh2gene |
The maximum permissible distance between a range provided
in |
gi2gene |
The maximum permissible distance between a GInteractions
anchor (provided in |
... |
Passed to findOverlaps and overlapsAny internally |
This function is able to utilise feature-level information and long-range interactions to enable better mapping of regions to genes. If provided, this essentially maps from ranges to genes using the regulatory features as a framework. The following sequential strategy is used:
Ranges overlapping a promoter are assigned to that gene
Ranges overlapping an enhancer are assigned to all genes within a specified distance
Ranges overlapping a long-range interaction are assigned to all genes connected by the interaction
Ranges with no gene assignment from the previous steps are assigned to all overlapping genes or the nearest gene within a specified distance
If information is missing for one of these steps, the algorithm will simply proceed to the next step. If no promoter, enhancer or interaction data is provided, all ranges will be simply mapped by step 4. Ranges can be mapped by any or all of the first three steps, but step 4 is mutually exclusive with the first 3 steps.
Distances between each set of features and the query range can be
individually specified by modifying the gr2prom
, gr2enh
, gr2gi
or
gr2gene
parameters. Distances between features and genes can also be set
using the parameters prom2gene
, enh2gene
and gi2gene
.
Additionally, if previously defined mappings are included with any of the
prom
, enh
or gi
objects, this will be used in preference to any
obtained from the genes
object.
A GRanges object with added mcols as specified
## Define some genes genes <- GRanges(c("chr1:2-10:*", "chr1:25-30:-", "chr1:31-40:+")) genes$gene_id <- paste0("gene", seq_along(genes)) genes ## Add a promoter for each gene prom <- promoters(genes, upstream = 1, downstream = 1) prom ## Some ranges to map gr <- GRanges(paste0("chr1:", seq(0, 60, by = 15))) gr ## Map so that any gene within 25bp of the range is assigned mapByFeature(gr, genes, gr2gene = 25) ## Now use promoters to be more accurate in the gene assignment ## Given that the first range overlaps the promoter of gene1, this is a ## more targetted approach. Similarly for the third range mapByFeature(gr, genes, prom, gr2gene = 25)
## Define some genes genes <- GRanges(c("chr1:2-10:*", "chr1:25-30:-", "chr1:31-40:+")) genes$gene_id <- paste0("gene", seq_along(genes)) genes ## Add a promoter for each gene prom <- promoters(genes, upstream = 1, downstream = 1) prom ## Some ranges to map gr <- GRanges(paste0("chr1:", seq(0, 60, by = 15))) gr ## Map so that any gene within 25bp of the range is assigned mapByFeature(gr, genes, gr2gene = 25) ## Now use promoters to be more accurate in the gene assignment ## Given that the first range overlaps the promoter of gene1, this is a ## more targetted approach. Similarly for the third range mapByFeature(gr, genes, prom, gr2gene = 25)
Make consensus peaks and add individual columns from each original GRangesList element
mapGrlCols( x, var = NULL, collapse = NULL, collapse_sep = "; ", name_sep = "_", include = FALSE, ... )
mapGrlCols( x, var = NULL, collapse = NULL, collapse_sep = "; ", name_sep = "_", include = FALSE, ... )
x |
GRangesList |
var |
Column(s) to map onto the set of consensus peaks |
collapse |
Columns specified here will be simplified into a single column. Should only be character or factor columns |
collapse_sep |
String to separate values when collapsing columns |
name_sep |
String to separate values when adding column names |
include |
logical(1) Include the original ranges as character columns |
... |
Passed to makeConsensus |
Starting with a GRangesList, make a single GRanges object with select columns from each element added to the new object
GRanges object with a set of consensus ranges across all list elements and values from each element mapped to these consensus ranges.
If requested (include = TRUE
) the original ranges are returned as
character columns, as there will be multiple NA values in each.
a <- GRanges(paste0("chr1:", seq(1, 61, by = 20))) width(a) <- 5 a$logFC <- rnorm(length(a)) a_g <- as.list(paste("Gene", seq_along(a))) a_g[[1]] <- c("Gene 0", a_g[[1]]) a$genes <- as(a_g, "CompressedList") b <- GRanges("chr1:61-70") b$logFC <- rnorm(1) b$genes <- as(list("Gene 5"), "CompressedList") grl <- GRangesList(A = a, B = b) mapGrlCols(grl, var = "logFC") ## This forms a union of overlapping rangesby default ## Pass methods to makeConsensus() to change to regions with coverage == 2 mapGrlCols(grl, var = "logFC", method = "coverage", p = 1) ## Columns can be collapsed to merge into a single column mapGrlCols(grl, var = "logFC", collapse = "genes") ## Original ranges can also be included mapGrlCols(grl, collapse = "genes", include = TRUE)
a <- GRanges(paste0("chr1:", seq(1, 61, by = 20))) width(a) <- 5 a$logFC <- rnorm(length(a)) a_g <- as.list(paste("Gene", seq_along(a))) a_g[[1]] <- c("Gene 0", a_g[[1]]) a$genes <- as(a_g, "CompressedList") b <- GRanges("chr1:61-70") b$logFC <- rnorm(1) b$genes <- as(list("Gene 5"), "CompressedList") grl <- GRangesList(A = a, B = b) mapGrlCols(grl, var = "logFC") ## This forms a union of overlapping rangesby default ## Pass methods to makeConsensus() to change to regions with coverage == 2 mapGrlCols(grl, var = "logFC", method = "coverage", p = 1) ## Columns can be collapsed to merge into a single column mapGrlCols(grl, var = "logFC", collapse = "genes") ## Original ranges can also be included mapGrlCols(grl, collapse = "genes", include = TRUE)
Merge sliding windows using a specified column
mergeByCol(x, ...) ## S4 method for signature 'GenomicRanges' mergeByCol( x, df = NULL, col, by = c("max", "median", "mean", "min"), logfc = "logFC", pval = "P", inc_cols, p_adj_method = "fdr", merge_within = 1L, ignore_strand = TRUE, min_win = 1, ... ) ## S4 method for signature 'RangedSummarizedExperiment' mergeByCol( x, df = NULL, col, by = c("max", "median", "mean", "min"), logfc = "logFC", pval = "P", inc_cols, p_adj_method = "fdr", merge_within = 1L, ignore_strand = FALSE, ... )
mergeByCol(x, ...) ## S4 method for signature 'GenomicRanges' mergeByCol( x, df = NULL, col, by = c("max", "median", "mean", "min"), logfc = "logFC", pval = "P", inc_cols, p_adj_method = "fdr", merge_within = 1L, ignore_strand = TRUE, min_win = 1, ... ) ## S4 method for signature 'RangedSummarizedExperiment' mergeByCol( x, df = NULL, col, by = c("max", "median", "mean", "min"), logfc = "logFC", pval = "P", inc_cols, p_adj_method = "fdr", merge_within = 1L, ignore_strand = FALSE, ... )
x |
A GenomicRanges or SummarizedExperiment object |
... |
Not used |
df |
A data.frame-like object containing the columns of interest. If not provided, any columns in the mcols() slot will be used. |
col |
The column to select as representative of the merged ranges |
by |
The method for selecting representative values |
logfc |
Column containing logFC values |
pval |
Column containing p-values |
inc_cols |
Any additional columns to return. Output will always include
columns specified in the arguments |
p_adj_method |
Any of p.adjust.methods |
merge_within |
Merge any ranges within this distance |
ignore_strand |
Passed internally to reduce and findOverlaps |
min_win |
Only keep merged windows derived from at least this number |
This merges sliding windows using the values in a given column to select
representative values for the subsequent merged windows.
Values can be chosen from the specified column using any of min()
,
max()
, mean()
or median()
, although max()
is strongly recommended
when specifying values like logCPM.
Once a representative range is selected using the specified column, values
from columns specified using inc_cols
are also returned.
In addition to these columns, the range from the representative window is
returned in the mcols element as a GRanges object in the column
keyval_range
.
Merging windows using either the logFC or p-value columns is not implemented.
If adjusted p-values are requested an additional column names the same as the initial p-value, but tagged with the adjustment method, will be added. In addition, using the p-value from the selected window, the number of windows with lower p-values are counted by direction and returned in the final object. The selected window will always be counted as up/down regardless of significance as the p-value for this column is taken as the threshold. This is a not dissimilar approach to cluster-direction.
If called on a SummarizedExperiment object, the function will be applied to
the rowRanges
element.
A Genomic Ranges object
x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60")) set.seed(1001) df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10)) mergeByCol(x, df, col = "logCPM", pval = "p") mcols(x) <- df x mergeByCol(x, col = "logCPM", pval = "p")
x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60")) set.seed(1001) df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10)) mergeByCol(x, df, col = "logCPM", pval = "p") mcols(x) <- df x mergeByCol(x, col = "logCPM", pval = "p")
Merge overlapping windows using harmonic mean p-values from significance testing
mergeByHMP(x, ...) ## S4 method for signature 'GenomicRanges' mergeByHMP( x, df = NULL, w = NULL, logfc = "logFC", pval = "P", cpm = "logCPM", inc_cols = NULL, p_adj_method = "fdr", merge_within = 1L, ignore_strand = TRUE, min_win = 1, keyval = c("min", "merged"), hm_pre = "hm", ... ) ## S4 method for signature 'RangedSummarizedExperiment' mergeByHMP( x, df = NULL, w = NULL, logfc = "logFC", pval = "P", cpm = "logCPM", inc_cols = NULL, p_adj_method = "fdr", merge_within = 1L, ignore_strand = FALSE, hm_pre = "hm", ... )
mergeByHMP(x, ...) ## S4 method for signature 'GenomicRanges' mergeByHMP( x, df = NULL, w = NULL, logfc = "logFC", pval = "P", cpm = "logCPM", inc_cols = NULL, p_adj_method = "fdr", merge_within = 1L, ignore_strand = TRUE, min_win = 1, keyval = c("min", "merged"), hm_pre = "hm", ... ) ## S4 method for signature 'RangedSummarizedExperiment' mergeByHMP( x, df = NULL, w = NULL, logfc = "logFC", pval = "P", cpm = "logCPM", inc_cols = NULL, p_adj_method = "fdr", merge_within = 1L, ignore_strand = FALSE, hm_pre = "hm", ... )
x |
GenomicRanges object |
... |
Not used |
df |
data.frame with results of differential binding analysis performed
using a sliding window strategy. If not provided, the columns in the
|
w |
vector of weights to applied when calculating harmonic mean p-values |
logfc , pval , cpm
|
Column names for the values holding window specific estimates of change in binding (logfc), overall signal intensity (cpm) and the significance from statistical testing (pval). |
inc_cols |
(Optional) Character vector of any additional columns in
|
p_adj_method |
One of |
merge_within |
Merge any non-overlapping windows within this distance |
ignore_strand |
Passed internally to reduce and findOverlaps |
min_win |
Only keep merged windows derived from at least this number |
keyval |
Return the key-value range as the window associated with the minimum p-value, or by merging the ranges from all windows with raw p-values below the merged harmonic-mean p-value |
hm_pre |
Prefix to add to the beginning of all HMP-derived columns |
When using sliding windows to test for differential signal, overlapping
windows can be merged based on the significance of results.
mergeByHMP()
merges overlapping windows using the asymptotically exact
harmonic mean p-value p.hmp from the individual,
window-level tests. This tests the Null Hypothesis that there is no
significance amongst the initial set of p-values, and returns a summarised
value which controls the FDR within a set of tests (Wilson, PNAS, 2019).
Multilevel testing across the set of results is currently implemented using
p_adj_method = "fwer"
Given that the harmonic mean p-value is calculated from the inverse p-values,
these are used to provide a weighted average of expression and logFC values
in the returned object. Any weights provided in w
are ignored for these
values as they are simple representative estimates.
The representative range returned in keyval_range
corresponds to the window
with the lowest p-value.
The total number of windows is also returned in the final object, with the summarised values n_up and n_down indicating the number of windows with raw p-values below the calculated harmonic mean p-value, and with the corresponding direction of change.
The column containing the harmonic mean p-values is returned as 'hmp'. An additional column with adjusted hmp-values is returned with the suffix '_*' added where the p-value adjustment method is added after the underscore.
A GenomicRanges object with merged ranges from the original object along with summarised or representative values from the relevant columns. The range corresponding to a representative values is also returned as described above
x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60")) set.seed(1001) df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10)) mergeByHMP(x, df, pval = "p") mcols(x) <- df x mergeByHMP(x, pval = "p", p_adj_method = "fwer")
x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60")) set.seed(1001) df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10)) mergeByHMP(x, df, pval = "p") mcols(x) <- df x mergeByHMP(x, pval = "p", p_adj_method = "fwer")
Merge overlapping windows using p-values from significance testing
mergeBySig(x, ...) ## S4 method for signature 'GenomicRanges' mergeBySig( x, df = NULL, logfc = "logFC", pval = "P", cpm = "logCPM", inc_cols, p_adj_method = "fdr", alpha = 0.05, method = c("combine", "best", "minimal"), merge_within = 1L, ignore_strand = TRUE, min_win = 1, ... ) ## S4 method for signature 'RangedSummarizedExperiment' mergeBySig( x, df = NULL, logfc = "logFC", pval = "P", cpm = "logCPM", inc_cols, p_adj_method = "fdr", alpha = 0.05, method = c("combine", "best", "minimal"), merge_within = 1L, ignore_strand = TRUE, ... )
mergeBySig(x, ...) ## S4 method for signature 'GenomicRanges' mergeBySig( x, df = NULL, logfc = "logFC", pval = "P", cpm = "logCPM", inc_cols, p_adj_method = "fdr", alpha = 0.05, method = c("combine", "best", "minimal"), merge_within = 1L, ignore_strand = TRUE, min_win = 1, ... ) ## S4 method for signature 'RangedSummarizedExperiment' mergeBySig( x, df = NULL, logfc = "logFC", pval = "P", cpm = "logCPM", inc_cols, p_adj_method = "fdr", alpha = 0.05, method = c("combine", "best", "minimal"), merge_within = 1L, ignore_strand = TRUE, ... )
x |
GenomicRanges object |
... |
Passed to all csaw functions being wrapped |
df |
data.frame with results of differential binding analysis performed
using a sliding window strategy. If not provided, the columns in the
|
logfc , pval , cpm
|
Column names for the values holding window specific estimates of change in binding (logfc), overall signal intensity (cpm) and the significance from statistical testing (pval) |
inc_cols |
(Optional) Character vector of any additional columns in
|
p_adj_method |
One of |
alpha |
Significance threshold to apply during internal calculations |
method |
Shorthand versions for which |
merge_within |
Merge any non-overlapping windows within this distance |
ignore_strand |
Passed internally to reduce and findOverlaps |
min_win |
Only keep merged windows derived from at least this number |
When using sliding windows to test for differential signal, overlapping
windows can be merged based on the significance of results.
mergeBySig()
is a wrapper to the functions combineTests,
getBestTest and minimalTests, using each
function's approach to finding a representative window. The returned object
differs from those returned by the original functions in that the
description of windows as 'up', 'down' or mixed is omitted and the genomic
range corresponding to the representative window is also returned. Column
names also correspond to those in the original object.
An additional column with adjusted p-values is returned. This column retains the same name as the original but with the suffix '_*' added where the p-value adjustment method is added after the underscore.
A GenomicRanges object with overlapping ranges from the original object merged and representative values returned. The range corresponding to the representative values is also returned
x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60")) set.seed(1001) df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10)) mcols(x) <- df mergeBySig(x, pval = "p", method = "combine") mergeBySig(x, pval = "p", method = "best") mergeBySig(x, pval = "p", method = "min")
x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60")) set.seed(1001) df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10)) mcols(x) <- df mergeBySig(x, pval = "p", method = "combine") mergeBySig(x, pval = "p", method = "best") mergeBySig(x, pval = "p", method = "min")
Partition a set of Genomic Ranges by another
partitionRanges(x, y, ...) ## S4 method for signature 'GRanges,GRanges' partitionRanges( x, y, y_as_both = TRUE, ignore.strand = FALSE, simplify = TRUE, suffix = c(".x", ".y"), ... )
partitionRanges(x, y, ...) ## S4 method for signature 'GRanges,GRanges' partitionRanges( x, y, y_as_both = TRUE, ignore.strand = FALSE, simplify = TRUE, suffix = c(".x", ".y"), ... )
x , y
|
GenomicRanges objects |
... |
Not used |
y_as_both |
logical(1) If there are any unstranded regions in y, should these be assigned to both strands. If TRUE unstranded regions can be used to partition stranded regions |
ignore.strand |
If set to TRUE, then the strand of x and y is set to "*" prior to any computation. |
simplify |
Pass to chopMC and simplify mcols in the output |
suffix |
Added to any shared column names in the provided objects |
The query set of ranges can be broken in regions which strictly overlap a second set of ranges. The complete set of mcols from both initial objects will included in the set of partitioned ranges
A GRanges object
x <- GRanges(c("chr1:1-10", "chr1:6-15")) x$id <- paste0("range", seq_along(x)) x y <- GRanges(c("chr1:2-5", "chr1:6-12")) y$id <- paste0("range", seq_along(y)) y partitionRanges(x, y)
x <- GRanges(c("chr1:1-10", "chr1:6-15")) x$id <- paste0("range", seq_along(x)) x y <- GRanges(c("chr1:2-5", "chr1:6-12")) y$id <- paste0("range", seq_along(y)) y partitionRanges(x, y)
Plot Densities for any assay within a SummarizedExperiment
plotAssayDensities(x, ...) ## S4 method for signature 'SummarizedExperiment' plotAssayDensities( x, assay = "counts", colour, linetype, group, trans = NULL, n_max = Inf, ... )
plotAssayDensities(x, ...) ## S4 method for signature 'SummarizedExperiment' plotAssayDensities( x, assay = "counts", colour, linetype, group, trans = NULL, n_max = Inf, ... )
x |
A SummarizedExperiment object |
... |
Passed to density |
assay |
An assay within x |
colour |
Optional column in colData to colour lines by. To remove any
colours, set this argument to |
linetype |
Any optional column in colData used to determine linetype |
group |
Used by geom_line. Defaults to the sample names but setting to NULL will over-write this and only groups specified by colour or linetype will be drawn |
trans |
character(1). Any transformative function to be applied to the
data before calculating the density, e.g. |
n_max |
Maximum number of points to use when calculating densities |
Uses ggplot2 to create a density plot for all samples within the selected assay
A ggplot2
object. Scales and labels can be added using conventional
ggplot2
syntax.
data("se") se$treatment <- c("E2", "E2", "E2", "E2DHT", "E2DHT", "E2DHT") ## Plot individual samples plotAssayDensities(se, colour = "treatment") ## Plot combined within treatment groups plotAssayDensities(se, colour = "treatment", group = "treatment") ## Use a data transformation plotAssayDensities(se, trans = "log1p", colour = "treat")
data("se") se$treatment <- c("E2", "E2", "E2", "E2DHT", "E2DHT", "E2DHT") ## Plot individual samples plotAssayDensities(se, colour = "treatment") ## Plot combined within treatment groups plotAssayDensities(se, colour = "treatment", group = "treatment") ## Use a data transformation plotAssayDensities(se, trans = "log1p", colour = "treat")
Use ggplot2 to create a heatmap from a SummarizedExperiment object
plotAssayHeatmap(x, ...) ## S4 method for signature 'SummarizedExperiment' plotAssayHeatmap( x, assay = "counts", by_x = "colnames", facet_x = NULL, ysideline = FALSE, yside_col = NULL, trans = NULL, n_max = 100, ... )
plotAssayHeatmap(x, ...) ## S4 method for signature 'SummarizedExperiment' plotAssayHeatmap( x, assay = "counts", by_x = "colnames", facet_x = NULL, ysideline = FALSE, yside_col = NULL, trans = NULL, n_max = 100, ... )
x |
a SummarizedExperiment object |
... |
Not used |
assay |
the assay to take values from |
by_x |
the parameter to use for the x-axis. Will default to column names but should be one value per sample, such as an additional column containing shortened sample labels. |
facet_x |
column from colData(x) which will be used to group samples along the x-axis |
ysideline |
logical(1) Draw a line across the side of the y-axis summarising values for each range |
yside_col |
column from colData(x) to group and colour the lines drawn on the side of the y-axis. If grouping by treatment or replicate, the mean values will be shown |
trans |
character(1). Any transformative function to be applied to the
data before calculating the density, e.g. |
n_max |
Maximum number of ranges to draw |
Draw a heatmap containing selected values from an assay within a SummarizedExperiment object. Columns within the colData element of the object can be used to facet along the x-axis (e.g. treatment groups). The maximum number of points is set to be 100, although this can be changed easily should the plot require more ranges to be drawn.
The averages across any grouping of samples can be drawn as a line plot on
the side of the y-axis by setting ysideline = TRUE
, with groups as
specified in yside_col
. This feature is added for the specific context of
neighbouring or overlapping ranges, and as such may be less informative in
any other scenario
The returned object is a ggplot2 object so scales can easily be added after heatmap creation using scale_fill_\* for the main heatmap, and scale_colour_\* for any groupings along the y-axis
A ggplot2
object. Scales and labels can be added using conventional
ggplot2
syntax.
nrows <- 10; ncols <- 4 counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows) colnames(counts) <- paste0("Sample_", seq_len(ncols)) df <- DataFrame(treat = c("A", "A", "B", "B")) se <- SummarizedExperiment( assays = SimpleList(counts = counts), colData = df ) rowRanges(se) <- GRanges(paste0("chr1:", seq_len(nrows))) plotAssayHeatmap(se, facet_x = "treat")
nrows <- 10; ncols <- 4 counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows) colnames(counts) <- paste0("Sample_", seq_len(ncols)) df <- DataFrame(treat = c("A", "A", "B", "B")) se <- SummarizedExperiment( assays = SimpleList(counts = counts), colData = df ) rowRanges(se) <- GRanges(paste0("chr1:", seq_len(nrows))) plotAssayHeatmap(se, facet_x = "treat")
Plot PCA for any assay within a SummarizedExperiment object
plotAssayPCA(x, ...) ## S4 method for signature 'SummarizedExperiment' plotAssayPCA( x, assay = "counts", colour, shape, size, label, show_points = TRUE, pc_x = 1, pc_y = 2, trans = NULL, n_max = Inf, tol = sqrt(.Machine$double.eps), rank = NULL, ... )
plotAssayPCA(x, ...) ## S4 method for signature 'SummarizedExperiment' plotAssayPCA( x, assay = "counts", colour, shape, size, label, show_points = TRUE, pc_x = 1, pc_y = 2, trans = NULL, n_max = Inf, tol = sqrt(.Machine$double.eps), rank = NULL, ... )
x |
An object containing an assay slot |
... |
Passed to geom_text |
assay |
The assay to perform PCA on |
colour |
The column name to be used for colours |
shape , size
|
The column name(s) to be used for determining the shape or size of points |
label |
The column name to be used for labels |
show_points |
logical(1). Display the points. If |
pc_x |
numeric(1) The PC to plot on the x-axis |
pc_y |
numeric(1) The PC to plot on the y-axis |
trans |
character(1). Any transformative function to be applied to the
data before performing the PCA, e.g. |
n_max |
Subsample the data to this many points before performing PCA |
tol |
Any rows with variance below this value will be excluded prior to passing to prcomp. All rows are scaled and centred by default |
rank |
Passed to prcomp |
Uses ggplot2 to create a PCA plot for the selected assay. Any numerical
transformation prior to performing the PCA can be specified using the
trans
argument
A ggplot2 object
data("se") se$treatment <- c("E2", "E2", "E2", "E2DHT", "E2DHT", "E2DHT") se$sample <- colnames(se) plotAssayPCA(se, trans = "log1p", colour = "treatment", label = "sample") plotAssayPCA( se, trans = "log1p", colour = "treatment", label = "sample", size = totals / 1e3 ) plotAssayPCA( se, trans = "log1p", colour = "treatment", label = "sample", show_points = FALSE )
data("se") se$treatment <- c("E2", "E2", "E2", "E2DHT", "E2DHT", "E2DHT") se$sample <- colnames(se) plotAssayPCA(se, trans = "log1p", colour = "treatment", label = "sample") plotAssayPCA( se, trans = "log1p", colour = "treatment", label = "sample", size = totals / 1e3 ) plotAssayPCA( se, trans = "log1p", colour = "treatment", label = "sample", show_points = FALSE )
Plot RLE for a given assay within a SummarizedExperiment
plotAssayRle(x, ...) ## S4 method for signature 'SummarizedExperiment' plotAssayRle( x, assay = "counts", colour, fill, rle_group, by_x, n_max = Inf, trans = NULL, ... )
plotAssayRle(x, ...) ## S4 method for signature 'SummarizedExperiment' plotAssayRle( x, assay = "counts", colour, fill, rle_group, by_x, n_max = Inf, trans = NULL, ... )
x |
A SummarizedExperiment object |
... |
Passed to geom_boxplot |
assay |
The assay to plot |
colour |
Column from |
fill |
Column from |
rle_group |
Column from |
by_x |
Boxplots will be drawn by this grouping variable from
|
n_max |
Maximum number of points to plot |
trans |
character(1). Numerical transformation to apply to the data prior to RLE calculation |
Uses ggplot2 to create an RLE plot for the selected assay. Any numerical
transformation prior to performing the RLE can be specified using the
trans
argument
A ggplot2 object
data("se") se$treatment <- c("E2", "E2", "E2", "E2DHT", "E2DHT", "E2DHT") se$sample <- colnames(se) ## A conventional RLE Plot using all samples plotAssayRle(se, trans = "log1p", fill = "treatment") ## Calculate RLE within groups plotAssayRle(se, trans = "log1p", fill = "treatment", rle_group = "treatment") # Or show groups combined plotAssayRle(se, trans = "log1p", fill = "treatment", by_x = "treatment")
data("se") se$treatment <- c("E2", "E2", "E2", "E2DHT", "E2DHT", "E2DHT") se$sample <- colnames(se) ## A conventional RLE Plot using all samples plotAssayRle(se, trans = "log1p", fill = "treatment") ## Calculate RLE within groups plotAssayRle(se, trans = "log1p", fill = "treatment", rle_group = "treatment") # Or show groups combined plotAssayRle(se, trans = "log1p", fill = "treatment", by_x = "treatment")
Draw a plot from a GRangesList column using ggplot2
plotGrlCol( x, var = "width", geom = c("boxplot", "violin", "point", "jitter"), .id = "sample", df, fill, colour, q = 0.1, q_size = 3.5, qline_type = 2, qline_col = "blue", total = "{comma(n)}", total_geom = c("label", "text", "none"), total_pos = c("median", "top", "bottom"), total_size = 3.5, total_alpha = 1, total_adj = 0.025, ..., digits = 0 )
plotGrlCol( x, var = "width", geom = c("boxplot", "violin", "point", "jitter"), .id = "sample", df, fill, colour, q = 0.1, q_size = 3.5, qline_type = 2, qline_col = "blue", total = "{comma(n)}", total_geom = c("label", "text", "none"), total_pos = c("median", "top", "bottom"), total_size = 3.5, total_alpha = 1, total_adj = 0.025, ..., digits = 0 )
x |
A GRangesList |
var |
The variable to plot. Either a column in the mcols element or width. Can be quoted or unquoted |
geom |
Choose between different geoms, or even provide a geom_*() function |
.id |
The column name to place the element names. Passed internally to the same argument in bind_rows |
df |
Optional data.frame with columns to be passed to the colour or
fill parameters. Must contain a column with the same name as the
value passed to the |
fill , colour
|
Optional column names found in the df. Can be quoted or unquoted |
q |
The overall percentile to be drawn as a labelled, horizontal line. Set q = 0 to hide this line |
q_size |
Text size of percentile label |
qline_type , qline_col
|
Linetype and colour arguments for the horizontal line showing the specified percentile(s) |
total |
Glue syntax for totals, representing the length of each GRangesList element |
total_geom |
Passed to annotate. Set to |
total_pos |
Position for placing totals |
total_size , total_alpha
|
Size and transparency of totals |
total_adj |
Adjustment for labels |
... |
Passed to the geom if selecting via character string. Ignored otherwise |
digits |
Number of decimal places for the horizontal line label |
Using a common column or the width of the ranges, produces a boxplot or
violinplot from each element of the provided GRangesList.
The names of the GRangesList will be passed to the x-axis using the .id
argument.
A data frame containing annotations corresponding to each element can be
supplied, ensuring that the column associated with each elements is the name
passed to the .id
argument.
If q is > 0, a horizontal line will be draw corresponding to this percentile across the complete dataset, with parameters for this line able to be set using the qline_* arguments. The digits argument controls how many decimal points will be shown for the associated label.
The total length of each element will be added by default as a total, and is able to be placed across the median values, or at the top and bottom extremes of the plot.
A ggplot object
## Load some peaks data('peaks') names(peaks) <- gsub("_peaks.+", "", names(peaks)) ## The default boxplot plotGrlCol(peaks) ## A customised violin plot df <- data.frame(sample = names(peaks), treat = rep(c("A", "B"), each = 3)) plotGrlCol( peaks, geom = "violin", total_pos = "bottom", total_adj = 0.05, df = df, fill = "treat", draw_quantiles = 0.5, trim = FALSE, width = 0.7, alpha = 0.7 ) + scale_y_log10() plotGrlCol( peaks, var = score, geom = "jitter", total_pos = "bottom", total_adj = 0.05, df = df, colour = treat, width = 0.2, height = 0 ) plotGrlCol( peaks, geom = geom_boxplot(colour = "grey70"), df = df, fill = treat, total_pos = "bottom", total_adj = 0.05, ) + scale_y_log10()
## Load some peaks data('peaks') names(peaks) <- gsub("_peaks.+", "", names(peaks)) ## The default boxplot plotGrlCol(peaks) ## A customised violin plot df <- data.frame(sample = names(peaks), treat = rep(c("A", "B"), each = 3)) plotGrlCol( peaks, geom = "violin", total_pos = "bottom", total_adj = 0.05, df = df, fill = "treat", draw_quantiles = 0.5, trim = FALSE, width = 0.7, alpha = 0.7 ) + scale_y_log10() plotGrlCol( peaks, var = score, geom = "jitter", total_pos = "bottom", total_adj = 0.05, df = df, colour = treat, width = 0.2, height = 0 ) plotGrlCol( peaks, geom = geom_boxplot(colour = "grey70"), df = df, fill = treat, total_pos = "bottom", total_adj = 0.05, ) + scale_y_log10()
Plot a region with showing HiC, Features, Genes and Coverage
plotHFGC( gr, hic, features, genes, coverage, annotation, zoom = 1, shift = 0, max = 1e+07, axistrack = TRUE, cytobands, covtype = c("l", "heatmap"), linecol = c(), gradient = hcl.colors(101, "viridis"), hiccol = list(anchors = "lightblue", interactions = "red"), featcol, genecol, annotcol, highlight = "blue", hicsize = 1, featsize = 1, genesize = 1, covsize = 4, annotsize = 0.5, hicname = "HiC", featname = "Features", featstack = c("full", "hide", "dense", "squish", "pack"), collapseTranscripts = "auto", maxTrans = 12, ylim = NULL, ..., fontsize = 12, cex.title = 0.8, rotation.title = 0, col.title = "white", background.title = "lightgray", title.width = 1.5 )
plotHFGC( gr, hic, features, genes, coverage, annotation, zoom = 1, shift = 0, max = 1e+07, axistrack = TRUE, cytobands, covtype = c("l", "heatmap"), linecol = c(), gradient = hcl.colors(101, "viridis"), hiccol = list(anchors = "lightblue", interactions = "red"), featcol, genecol, annotcol, highlight = "blue", hicsize = 1, featsize = 1, genesize = 1, covsize = 4, annotsize = 0.5, hicname = "HiC", featname = "Features", featstack = c("full", "hide", "dense", "squish", "pack"), collapseTranscripts = "auto", maxTrans = 12, ylim = NULL, ..., fontsize = 12, cex.title = 0.8, rotation.title = 0, col.title = "white", background.title = "lightgray", title.width = 1.5 )
gr |
The range(s) of interest. Must be on a single chromosome |
hic |
Any HiC interactions to be included as a GenomicInteractions object. If not supplied, no HiC track will be drawn. |
features |
A named GRangesList or list of GRangesList objects. Each
GRangesList should contain features in each element which will drawn on the
same track. If providing a list, each GRangesList within the list will drawn
on a separate track. If this argument is not specified, no feature track will
be drawn. Features will be drawn with colours provided in |
genes |
A GRanges object with exon structure for each transcript/gene.
If not included, no track will be drawn for gene/transcript structure.
The expected mcols in this object are |
coverage |
A named list of BigWigFileList objects containing the primary tracks to show coverage for. Each list element will be drawn on a separate track, with elements within each BigWigFileList shown on the same track. List names will become track names. Alternatively, a single BigWigFileList will plot all individual files on separate tracks. If not included, no coverage tracks will be drawn. |
annotation |
Annotations for the coverage track(s). A single GRangesList if coverage is a BigWigListList. If coverage is supplied as a list of BigWigFileLists, a named list of GRangesList objects for each coverage track being annotatated. Names must match those given for coverage. |
zoom |
Multiplicative factor for zooming in and out |
shift |
Shift the plot. Applied after zooming |
max |
The maximum width of the plotting region. Given that the width of the final plotting window will be determined by any HiC interactions, this argument excludes any interactions beyond this distance. Plotting can be somewhat slow if any long range interactions are included. Ignored if no HiC interactions are supplied. |
axistrack |
logical. Add an AxisTrack() |
cytobands |
Cytogenetic bands to be displayed on each chromosome. See data('grch37.cytobands') for the correct format. Only drawn if a cytobands data.frame is provided. |
covtype |
The plot type for coverage. Currently only lines ("l") and heatmaps ("heatmap") are supported |
linecol |
If passing a BigWigFileList to coverage, a vector of colours. If passing a list of BigWigFileList objects to coverage, a list of colours with structure that matches the object being passed to coverage, i.e. a named list of the same length, with elements who's length matches each BigWigFileList. Only used if covtype = "l". |
gradient |
Colour gradient for heatmaps |
hiccol |
list with names |
featcol |
Named vector (or list) of colours for each feature. Must be provided if drawing features |
genecol |
Named vector (or list) of colours for each gene category |
annotcol |
Colours matching the coverage annotations |
highlight |
Outline colour for the highlight track. Setting this to
|
hicsize , featsize , genesize , covsize , annotsize
|
Relative sizes for each track (hic, features, genes, coverage & annotation) |
hicname , featname
|
Names displayed in the LHS panel |
featstack |
Stacking for the fature track |
collapseTranscripts |
Passed to GeneRegionTrack for the
genes track. Defaults to |
maxTrans |
Only used if |
ylim |
If a numeric vector, this will be passed to all coverage tracks. Alternatively, a named list of y-limits for each coverage track with names that match those in each element of the coverage list. |
... |
Passed to DataTrack for the coverage tracks only.
Useful arguments may be things like |
fontsize |
Applied across all tracks |
cex.title |
Passed to all tracks |
rotation.title |
Passed to all tracks |
col.title |
Passed to all tracks |
background.title |
Passed to all tracks |
title.width |
Expansion factor passed to plotTracks, and
used to widen the panels on the LHS of all tracks.
Can have unpredictable effects on the font
size of y-axis limits due to the algorithm applied by |
Convenience function for plotting a common set of tracks. All tracks are optional. For more fine control, users are advised to simply use Gviz directly.
The primary tracks defined in this function are H (HiC), F (features), G (genes), and C (coverage). Axis and Ideogram tracks are an additional part of this visualisation, with the Ideogram also being optional
Use all tracks specific to this dataset to generate a simple visualisation. In descending order the tracks displayed will be:
HiC Interactions (if supplied)
Regulatory features
Genes/genes
Coverage tracks as supplied
All tracks are optional and will simply be omitted if no data is supplied. See individual sections below for a more detailed explanation of each track
If wanting a single track of genes, simply pass a GRanges object in the format specified for a GeneRegionTrack. Passing a GRangesList with the same format will yield an individual track for each list element, with each track shown by default as a separate colour. This can be used for showing Up/Down-regulated genes, or Detected/Undetected genes.
If passing a BigWigFileList for the coverage track, each file within the object will be drawn on a separate track. If specified, the same y-limits will be applied to each track If passing a list of BigWigFileList objects, each list element will be drawn as a single track with the individual files within each BigWigFileList overlaid within each track.
Cytogenetic band information must be in the structure required by IdeogramTrack, with data for both GRCh37 and GRCh38 provided in this package (grch37.cytobands, grch38.cytobands).
A highlight overlay over the GRanges provided as the gr
argument will be
added if a colour is provided. If set to NULL, no highlight will be added.
A Gviz object
The available arguments for displaying HiC Interactions are defined below.
If hic
is supplied, a single InteractionTrack
will be added displaying
all interactions with an anchor within the range specified by gr
.
Only interactions with an anchor explicitly overlapping gr
will be shown.
If no interactions are found within gr
, the track will not be displayed.
The plotting range will expand to incorporate these interactions, with
the paramater max
providing an upper limit on the displayed range.
This is the GInteractions
object required for inclusion of
a HiC track in the final output. Will be ignored if not supplied
Determines the colours used for display of anchors and interactions
Relative size of the track compared to others
The name to display on the LHS panel
The maximum width of the plotted region. If multiple long-range
interactions are identified, this provides an upper limit for the display.
This defaults to 10Mb
.
If wanting to add an AnnotationTrack with regions defined as
'features', the following arguments are highly relevant.
All are ignored if features
is not provided.
A named GRangesList
. Each element will be considered as
a separate feature and drawn as a block in a distinct colour. Any mcols
data will be ignored.
A named vector (or list) providing a colour for each
element of features
The name to display on the LHS panel
Stacking to be applied to all supplied features
Relative size of the track compared to others
To display genes or transcripts, simply provide a single GRanges
object if
you wish to display all genes on a single track.
The mcols
element of this object should contain the columns feature
,
gene
, exon
, transcript
and symbol
as seen on the
GeneRegionTrack help page.
Alternatively, a GRangesList
can be provided to display genes on separate
tracks based on their category.
This can be useful for separating and colouring Up/Down regulated genes in a
precise way.
All elements should be as described above.
Again, all parameters associated with this track-set will be ignored of no
object is supplied to this argument.
A GRanges
or GRangesList
object as described above
A single colour if supplying a GRanges
object, or a
named vector/list of colours matching the GRangesList
Relative size of the track compared to others
Passed to all tracks. See the GeneRegionTrack
section in settings for detail regarding possible arguments.
If genes is a GRangesList
, can be a named vector/list with names
matching the names of the genes
object.
This section contains the most flexibility and can take two types of input.
The first option is a BigWigFileList
, which will lead to each BigWig file
being plotted on it's own track.
An alternative is a list of BigWigFileList
objects.
In this case, each list element will be plotted as a separate track,
with all individual BigWig
files within each list element
overlaid within the relevant track.
In addition to the coverage tracks, annotations can be added to each
BigWigFileList
in the form of coloured ranges, indicating anything of the
users choice. Common usage may be to indicate regions with binding of a
ChIP target is found to be detected, unchanged, gained or lost.
A BigWigFileList
or list
of BigWigFileList
objects.
A single BigWigFileList
will be displayed with each individual file on a
separate track with independent y-axes. Each element of the
BigWigFileList
must be named and these names will be displayed on the
LHS panels
A list of BigWigFileList
objects will be displayed with each list element
as a separate track, with any BigWig
files overlaid using the same
y-axis. The list must be named with these names displayed on the LHS
panel. Each internal BigWig
within a BigWigFileList
must also be named.
Currently only lines (covtype = "l"
) and
heatmaps (covtype = "heatmap"
) are supported. Colours can be
specified using the arguments below
Can be a single colour applied to all tracks, or a named
vector (or list) of colours. If coverage
is a single BigWigFileList
,
these names should match the names of this object exactly.
If coverage
is a list of BigWigFileList
objects, linecol
should be
a list with matching names. Each element of this list should also be a
named vector with names that exactly match those of each corresponding
BigWigFileList
.
A colour gradient applied to all heatmap tracks. No specific structure is required beyond a vector of colours.
Relative size of the tracks compared to others
Can be a vector of length 2 applied to all coverage tracks.
Alternatively, if passing a list of BigWigFlieList
objects to the
coverage
argument, this can be a named list of numeric vectors with
names matching coverage
Each BigWigFileList
needs annotations to be passed to
this argument as a named GRangesList
, with names being used to
associate unique colours with that set of ranges. If coverage
is a
BigWigFileList
a simple GRangesList
would be supplied and a single
'annotation' track will appear at the top of the set of coverage tracks.
If coverage
is a list
, then a named list of GRangesList
objects
should be supplied, with each being displayed above the corresponding track
from the coverage
object.
A vector of colours corresponding to all names within all
GRangesList
elements supplied as annotation
. It is assumed that the
same colour scheme will be applied to all annotation tracks and, as such,
the colours should not be provided as a list which matches the
coverage tracks. Instead, every named element anywhere in the annotation
GRanges, across all of the tracks must be included as a colour
Relative size of the tracks compared to others
library(rtracklayer) ## Make sure we have the cytobands active data(grch37.cytobands) ## Prepare the HiC, promoter & transcript information data(ex_hic, ex_trans, ex_prom) ex_features <- GRangesList(Promoter = ex_prom) featcol <- c(Promoter = "red") ## Prepare the coverage fl <- system.file( "extdata", "bigwig", c("ex1.bw", "ex2.bw"), package = "extraChIPs" ) bwfl <- BigWigFileList(fl) names(bwfl) <- c("ex1", "ex2") bw_col <- c(ex1 = "#4B0055", ex2 = "#007094") ## Define the plotting range gr <- GRanges("chr10:103862000-103900000") ## Now create the basic plot plotHFGC( gr, hic = ex_hic, features = ex_features, genes = ex_trans, coverage = bwfl, featcol = featcol, linecol = bw_col, cytobands = grch37.cytobands ) plotHFGC( gr, hic = ex_hic, features = ex_features, genes = ex_trans, coverage = bwfl, featcol = featcol, linecol = bw_col, cytobands = grch37.cytobands, maxTrans = 1 )
library(rtracklayer) ## Make sure we have the cytobands active data(grch37.cytobands) ## Prepare the HiC, promoter & transcript information data(ex_hic, ex_trans, ex_prom) ex_features <- GRangesList(Promoter = ex_prom) featcol <- c(Promoter = "red") ## Prepare the coverage fl <- system.file( "extdata", "bigwig", c("ex1.bw", "ex2.bw"), package = "extraChIPs" ) bwfl <- BigWigFileList(fl) names(bwfl) <- c("ex1", "ex2") bw_col <- c(ex1 = "#4B0055", ex2 = "#007094") ## Define the plotting range gr <- GRanges("chr10:103862000-103900000") ## Now create the basic plot plotHFGC( gr, hic = ex_hic, features = ex_features, genes = ex_trans, coverage = bwfl, featcol = featcol, linecol = bw_col, cytobands = grch37.cytobands ) plotHFGC( gr, hic = ex_hic, features = ex_features, genes = ex_trans, coverage = bwfl, featcol = featcol, linecol = bw_col, cytobands = grch37.cytobands, maxTrans = 1 )
Plot Overlaps between list elements as an upset or Venn diagram
plotOverlaps(x, ...) ## S4 method for signature 'GRangesList' plotOverlaps( x, type = c("auto", "venn", "upset"), var = NULL, f = c("mean", "median", "max", "min", "sd"), set_col = NULL, ..., .sort_sets = "ascending", hj_sets = 1.15, sz_sets = 3.5, exp_sets = 0.25, merge_within = 1L, ignore.strand = TRUE ) ## S4 method for signature 'list' plotOverlaps( x, type = c("auto", "venn", "upset"), set_col = NULL, ..., .sort_sets = "ascending", hj_sets = 1.15, sz_sets = 3.5, exp_sets = 0.25 )
plotOverlaps(x, ...) ## S4 method for signature 'GRangesList' plotOverlaps( x, type = c("auto", "venn", "upset"), var = NULL, f = c("mean", "median", "max", "min", "sd"), set_col = NULL, ..., .sort_sets = "ascending", hj_sets = 1.15, sz_sets = 3.5, exp_sets = 0.25, merge_within = 1L, ignore.strand = TRUE ) ## S4 method for signature 'list' plotOverlaps( x, type = c("auto", "venn", "upset"), set_col = NULL, ..., .sort_sets = "ascending", hj_sets = 1.15, sz_sets = 3.5, exp_sets = 0.25 )
x |
GRangesList of S3 list able to be coerced to character vectors |
... |
Passed to draw.pairwise.venn (or
|
type |
The type of plot to be produced |
var |
Column to summarised as a boxplot in an upper panel (UpSet plot only) |
f |
Summarisation function. Must return a single value from any numeric vector |
set_col |
Colours to be assigned to each set |
.sort_sets |
passed to |
hj_sets |
Horizontal adjustment of set size labels |
sz_sets |
Text size for set size labels. Passed internally to
|
exp_sets |
X-axis expansion for set size panel |
merge_within |
Passed to makeConsensus |
ignore.strand |
Passed to reduce |
This function should give the capability to show overlaps for any number of replicates or groups, or a list of items such as gene names. For n = 2, a scaled Venn Diagram will be produced, however no scaling is implemented for n = 3
UpSet plots are possible for any lists with length > 1, and are the only implemented possibility for lists > 3.
If the input is a GRangesList
an additional boxplot can be requested
using any numeric column within the existing mcols()
element.
Values will be summarised across all elements using the requested function
and the boxplot will be included as an upper panel above the intersections
Either a VennDiagram (i.e. grid) object, or a ComplexUpset plot
## Examples using a list of character vectors ex <- list( x = letters[1:5], y = letters[c(6:15, 26)], z = letters[c(2, 10:25)] ) plotOverlaps(ex, type = "upset") plotOverlaps(ex, type = "venn", set_col = 1:3, alpha = 0.3) plotOverlaps(ex, type = "upset", set_col = 1:3, labeller = stringr::str_to_title) plotOverlaps(ex[1:2]) ## GRangesList object will produce a boxplot of summarised values in the ## upper panel data("peaks") grl <- peaks[1:3] names(grl) <- gsub("_peaks.+", "", names(grl)) plotOverlaps(grl, type = 'upset', var = 'score', f = 'max') ## If only two samples are present, a VennDiagram will be produced plotOverlaps(grl[1:2], set_col = c("green", "blue"))
## Examples using a list of character vectors ex <- list( x = letters[1:5], y = letters[c(6:15, 26)], z = letters[c(2, 10:25)] ) plotOverlaps(ex, type = "upset") plotOverlaps(ex, type = "venn", set_col = 1:3, alpha = 0.3) plotOverlaps(ex, type = "upset", set_col = 1:3, labeller = stringr::str_to_title) plotOverlaps(ex[1:2]) ## GRangesList object will produce a boxplot of summarised values in the ## upper panel data("peaks") grl <- peaks[1:3] names(grl) <- gsub("_peaks.+", "", names(grl)) plotOverlaps(grl, type = 'upset', var = 'score', f = 'max') ## If only two samples are present, a VennDiagram will be produced plotOverlaps(grl[1:2], set_col = c("green", "blue"))
Plot Pairwise Values from a GRangeList by overlapping GRanges
plotPairwise( x, var, colour, label, index = c(1, 2), p = 0, method = "union", ignore.strand = TRUE, min_width = 0, xside = c("boxplot", "density", "violin", "none"), yside = c("boxplot", "density", "violin", "none"), side_panel_width = c(0.3, 0.4), side_alpha = 1, xside_axis_pos = "right", yside_axis_label = scales::label_wrap(10), smooth = TRUE, rho_geom = c("text", "label", "none"), rho_col = "black", rho_size = 4, rho_pos = c(0.05, 0.95), rho_alpha = 1, label_geom = c("label_repel", "label", "text_repel", "text", "none"), label_width = 20, label_sep = "; ", label_size = 3.5, label_alpha = 0.7, min_d = 1, group_sep = " - ", simplify_equal = TRUE, name_sep = " ", plot_theme = theme_get(), ... )
plotPairwise( x, var, colour, label, index = c(1, 2), p = 0, method = "union", ignore.strand = TRUE, min_width = 0, xside = c("boxplot", "density", "violin", "none"), yside = c("boxplot", "density", "violin", "none"), side_panel_width = c(0.3, 0.4), side_alpha = 1, xside_axis_pos = "right", yside_axis_label = scales::label_wrap(10), smooth = TRUE, rho_geom = c("text", "label", "none"), rho_col = "black", rho_size = 4, rho_pos = c(0.05, 0.95), rho_alpha = 1, label_geom = c("label_repel", "label", "text_repel", "text", "none"), label_width = 20, label_sep = "; ", label_size = 3.5, label_alpha = 0.7, min_d = 1, group_sep = " - ", simplify_equal = TRUE, name_sep = " ", plot_theme = theme_get(), ... )
x |
A GRangesList |
var |
The colunm to compare between list elements |
colour |
Optional column to use for combining across elements and setting point colour |
label |
Optional column to use for labelling ranges with the most extreme changes |
index |
Which list elements to compare |
p , method , ignore.strand , min_width
|
Passed to |
xside , yside
|
Will call geom_(x/y)side* from the package ggside and show additional panels on the top and right of the plot respectively |
side_panel_width |
Set the relative widths of the side panels |
side_alpha |
Set fill transparency in side panels |
xside_axis_pos |
Position for axis_labels in the top panel when using a discrete axis |
yside_axis_label |
Wrapping for axis labels on the right-side panel when using a discrete axis. Set to waiver() to turn off wrapping |
smooth |
logical(1). If TRUE a regression line will be drawn using geom_smooth. To add this manually, set to FALSE and call this geom with any custom parameters after creating the plot |
rho_geom |
Used to add correlation coefficients for the two values |
rho_col , rho_size , rho_alpha
|
Parameters for displaying the correlation |
rho_pos |
Place the correlation coefficient within the plotting region |
label_geom |
Used to add labels from the column specified in label |
label_width |
Label text will be truncated to this length |
label_sep |
If multiple values (e.g. genes) are mapped to a range, separate values using this string |
label_size , label_alpha
|
Passed to the geom used for adding labels |
min_d |
Labels will only be added if the points lie circle beyond a sircle of this radius |
group_sep |
Text separator used to separate categories when specifying colour |
simplify_equal |
logical(1) When combining columns from both elements for the colour categories, should shared values be annotated as 'Both ...' instead of having longer, more difficult to read annotations. |
name_sep |
Character string to separate names of the GRangesList and the selected column. Will appear as axis-labels |
plot_theme |
Sets the initial theme by using the default theme for the current R session via get_theme() |
... |
Passed to |
This function enables pairwise plotting of two elements within a GRangesList. All elements of the GRangesList will contain the same columns, so a set of consensus ranges are first formed, before then taking all values from each GRangesList element which overlap the range and producing a piarwise plot.
Given that not all ranges will have values in both elements, side panels are produced which can show the distribution of plotted values, along with those which are only found in one of the foundational GRanges. These can take the form of density, violin or boxplots.
Addition columns, such as Differential Signal status can also be used to form pairwise groups and colour the points.
If a column in the GRangesList is suitable for labelling points, such as a
column with genes mapped to each range, this can be specified using the
argument label = "col_to_label"
.
Only the furthest point from the origin will be labelled within each group
used to colour the points.
Labels will only be added if they lie beyond a circle of radius min_d
from
the origin.
If multiple genes are mapped to the range, these will be separated by the
string provided in the label_sep
argument.
A regression line and correlation co-efficient are added to the plot by default, but can be hidden easily if preferred
A ggside
or ggplot2
object
theme_set(theme_bw()) set.seed(100) gr1 <- GRanges(paste0("chr1:", seq(10, 150, by = 10))) width(gr1) <- 5 gr1$logFC <- rnorm(length(gr1)) gr1$FDR <- rbeta(length(gr1), 1, 8) gr2 <- GRanges(paste0("chr1:", seq(51, 250, by = 15))) width(gr2) <- 4 gr2$logFC <- rnorm(length(gr2)) gr2$FDR <- rbeta(length(gr2), 1, 8) grl <- GRangesList(TF1 = gr1, TF2 = gr2) grl <- addDiffStatus(grl) # Using the defaults plotPairwise(grl, var = "logFC") # Density plots on the side panels plotPairwise( grl, var = "logFC", xside = "density", yside = "density", side_alpha = 0.5 ) # Turning off side panels, regression line and correlations plotPairwise( grl, var = "logFC", xside = "none", yside = "none", rho_geom = "none", smooth = FALSE ) # Add colours using the status column plotPairwise(grl, var = "logFC", colour = "status") + scale_fill_manual(values = rep_len(c("blue", "red", "white", "grey"), 8)) + guides(fill = "none")
theme_set(theme_bw()) set.seed(100) gr1 <- GRanges(paste0("chr1:", seq(10, 150, by = 10))) width(gr1) <- 5 gr1$logFC <- rnorm(length(gr1)) gr1$FDR <- rbeta(length(gr1), 1, 8) gr2 <- GRanges(paste0("chr1:", seq(51, 250, by = 15))) width(gr2) <- 4 gr2$logFC <- rnorm(length(gr2)) gr2$FDR <- rbeta(length(gr2), 1, 8) grl <- GRangesList(TF1 = gr1, TF2 = gr2) grl <- addDiffStatus(grl) # Using the defaults plotPairwise(grl, var = "logFC") # Density plots on the side panels plotPairwise( grl, var = "logFC", xside = "density", yside = "density", side_alpha = 0.5 ) # Turning off side panels, regression line and correlations plotPairwise( grl, var = "logFC", xside = "none", yside = "none", rho_geom = "none", smooth = FALSE ) # Add colours using the status column plotPairwise(grl, var = "logFC", colour = "status") + scale_fill_manual(values = rep_len(c("blue", "red", "white", "grey"), 8)) + guides(fill = "none")
Draw Pie Graphs based one or more data.frame columns
plotPie(object, ...) ## S4 method for signature 'GRanges' plotPie(object, scale_by = c("n", "width"), ...) ## S4 method for signature 'DataFrame' plotPie(object, ...) ## S4 method for signature 'data.frame' plotPie( object, fill, x, y, scale_by, scale_factor = 1000, width = 0.8, total_geom = c("label", "text", "none"), total_glue = "{comma(N)}", total_colour = "black", total_fill = "white", total_alpha = 1, total_size = 3, min_p = 0.01, max_p = 1, cat_geom = c("label", "text", "none"), cat_glue = "{.data[[fill]]}\n{comma(n, 1)}\n({percent(p, 0.1)})", cat_colour = "black", cat_fill = "white", cat_size = 3, cat_alpha = 1, cat_adj = 0, hole_width = 0, ... )
plotPie(object, ...) ## S4 method for signature 'GRanges' plotPie(object, scale_by = c("n", "width"), ...) ## S4 method for signature 'DataFrame' plotPie(object, ...) ## S4 method for signature 'data.frame' plotPie( object, fill, x, y, scale_by, scale_factor = 1000, width = 0.8, total_geom = c("label", "text", "none"), total_glue = "{comma(N)}", total_colour = "black", total_fill = "white", total_alpha = 1, total_size = 3, min_p = 0.01, max_p = 1, cat_geom = c("label", "text", "none"), cat_glue = "{.data[[fill]]}\n{comma(n, 1)}\n({percent(p, 0.1)})", cat_colour = "black", cat_fill = "white", cat_size = 3, cat_alpha = 1, cat_adj = 0, hole_width = 0, ... )
object |
An object ( |
... |
Not used |
scale_by |
Scale the counts by this column. In this case of a GRanges object this defaults to the count (scale_by = "n") but can also be specified as being width of each range (scale_by = "width"). If choosing width, width will be displayed in Kb |
fill |
The category/column used to fill the slices of the pie charts |
x |
The second (optional) category/column to place along the x-axis |
y |
The final (optional) category/column to plce along the y-axis |
scale_factor |
When scaling by another column, such as width, totals will be divided by this value, with 1000 being the default to provide output in kb. |
width |
Scale the width of all pies |
total_geom |
The geom_* to use for the totals at the centre of each pie. Setting this to 'none' will disable totals |
total_glue |
glue syntax to use for the totals in the centre of each pie. The column 'N' will produce the totals and any other values or formatting may be added here. |
total_colour , total_fill , total_alpha , total_size
|
Colour, fill, alpha and size for the main totals in the centre of each pie chart |
min_p |
The minimum proportion of the total required for adding labels. Effectively removes labels from pie charts with few members. Alternatively when only one column is specified, categories below this will not be shown around the edge of the plot |
max_p |
only display labels for segments representing less than this proportion of the total. |
cat_geom |
The geom_* to use for category labels corresponding to each slice of the pie. Setting this to 'none' will disable category labels |
cat_glue |
glue syntax to use for the category labels corresponding to each slice of the pie charts. The columns 'n' and 'p' can be used to print totals and proportions for each slice. |
cat_colour , cat_fill , cat_size , cat_alpha
|
Colour, fill, size and alpha for category labels |
cat_adj |
Adjust category labels |
hole_width |
Add a hole in the middle to turn the plot into a donut. Values between zero and 1 work best. Only implemented for pie charts using one value (i.e. fill) |
Using a data.frame
as input, this function will draw pie graphs based
on one ore more columns, by simply counting the values in combination
across these columns.
One column must be selected for the fill as a bare minimum, with up to three
being possible.
Additional columns can be set for the x-axis to draw a series of pie-graphs
in a row, with a further column able to added to layout a series of pie
graphs in a grid
If only one column/category is chosen, category labels will be added around the edge of the plot
If show_total = TRUE
the overall counts for each pie graph will be added
in the centre using geom_label.
Parameters for these labels are customisable
A ggplot2 object able to be customised with colour scales and themes.
Also note that the $data element of the returned object will contain the
data.frame used for plotting. The additional column label_radians
represents the mid-point of each pie slice and can be used for manually
adding labels to each pie.
Only applies when plotting across the x
or y
axes
set.seed(200) df <- data.frame( feature = sample( c("Promoter", "Enhancer", "Intergenic"), 200, replace = TRUE ), TF1 = sample(c("Up", "Down", "Unchanged"), 200, replace = TRUE), TF2 = sample(c("Up", "Down", "Unchanged"), 200, replace = TRUE), w = rexp(200) ) plotPie(df, fill = "feature", total_glue = "N = {comma(N)}") plotPie( df, fill = "feature", scale_by = "w", total_geom = "none", cat_glue = "{percent(p)}", cat_size = 5 ) plotPie(df, fill = "feature", x = "TF1") plotPie( df, fill = "feature", x = "TF1", y = "TF2", min_p = 0.02, total_geom = "none", cat_glue = "{n} / {N}" ) + scale_fill_viridis_d() + theme_bw() ## And using a GRanges object data("ex_prom") gr <- ex_prom mcols(gr) <- df[seq_along(gr),] ## Show values by counts plotPie(gr, fill = "feature", total_size = 5) ## Show values scaled by width of each range as a donut plot plotPie( gr, fill = "feature", scale_by = "width", total_glue = "{round(N, 1)}kb", cat_glue = "{percent(p, 0.1)}", cat_size = 4, total_size = 5, hole_width = 0.2 )
set.seed(200) df <- data.frame( feature = sample( c("Promoter", "Enhancer", "Intergenic"), 200, replace = TRUE ), TF1 = sample(c("Up", "Down", "Unchanged"), 200, replace = TRUE), TF2 = sample(c("Up", "Down", "Unchanged"), 200, replace = TRUE), w = rexp(200) ) plotPie(df, fill = "feature", total_glue = "N = {comma(N)}") plotPie( df, fill = "feature", scale_by = "w", total_geom = "none", cat_glue = "{percent(p)}", cat_size = 5 ) plotPie(df, fill = "feature", x = "TF1") plotPie( df, fill = "feature", x = "TF1", y = "TF2", min_p = 0.02, total_geom = "none", cat_glue = "{n} / {N}" ) + scale_fill_viridis_d() + theme_bw() ## And using a GRanges object data("ex_prom") gr <- ex_prom mcols(gr) <- df[seq_along(gr),] ## Show values by counts plotPie(gr, fill = "feature", total_size = 5) ## Show values scaled by width of each range as a donut plot plotPie( gr, fill = "feature", scale_by = "width", total_glue = "{round(N, 1)}kb", cat_glue = "{percent(p, 0.1)}", cat_size = 4, total_size = 5, hole_width = 0.2 )
Plot a coverage Profile Heatmap across multiple ranges
plotProfileHeatmap(object, ...) ## S4 method for signature 'GenomicRangesList' plotProfileHeatmap( object, profileCol = "profile_data", xValue = "bp", fillValue = "score", facetX = NULL, facetY = NULL, colour = facetY, linetype = NULL, summariseBy = c("mean", "median", "min", "max", "none"), xLab = xValue, yLab = NULL, fillLab = fillValue, labelFunX = waiver(), relHeight = 0.3, sortFilter = NULL, maxDist = 100, ... ) ## S4 method for signature 'GenomicRanges' plotProfileHeatmap( object, profileCol = "profile_data", xValue = "bp", fillValue = "score", facetX = NULL, facetY = NULL, colour = facetY, linetype = NULL, summariseBy = c("mean", "median", "min", "max", "none"), xLab = xValue, yLab = NULL, fillLab = fillValue, labelFunX = waiver(), relHeight = 0.3, summaryLabelSide = "left", respectLevels = FALSE, sortFilter = NULL, maxDist = 100, ... )
plotProfileHeatmap(object, ...) ## S4 method for signature 'GenomicRangesList' plotProfileHeatmap( object, profileCol = "profile_data", xValue = "bp", fillValue = "score", facetX = NULL, facetY = NULL, colour = facetY, linetype = NULL, summariseBy = c("mean", "median", "min", "max", "none"), xLab = xValue, yLab = NULL, fillLab = fillValue, labelFunX = waiver(), relHeight = 0.3, sortFilter = NULL, maxDist = 100, ... ) ## S4 method for signature 'GenomicRanges' plotProfileHeatmap( object, profileCol = "profile_data", xValue = "bp", fillValue = "score", facetX = NULL, facetY = NULL, colour = facetY, linetype = NULL, summariseBy = c("mean", "median", "min", "max", "none"), xLab = xValue, yLab = NULL, fillLab = fillValue, labelFunX = waiver(), relHeight = 0.3, summaryLabelSide = "left", respectLevels = FALSE, sortFilter = NULL, maxDist = 100, ... )
object |
A GRanges or GRangesList object |
... |
Passed to facet_grid internally. Can be utilised for switching panel strips or passing a labeller function |
profileCol |
Column name specifying where to find the profile DataFrames |
xValue , fillValue
|
Columns within the profile DataFrames for heatmaps |
facetX , facetY
|
Columns used for faceting across the x- or y-axis respectively |
colour |
Column used for colouring lines in the summary panel. Defaults to any column used for facetY |
linetype |
Column used for linetypes in the summary panel |
summariseBy |
Function for creating the summary plot in the top panel. If set to 'none', no summary plot will be drawn. Otherwise the top panel will contain a line-plot representing this summary value for each x-axis bin |
xLab , yLab , fillLab
|
Labels for plotting aesthetics. Can be overwritten using labs() on any returned object |
labelFunX |
Function for formatting x-axis labels |
relHeight |
The relative height of the top summary panel. Represents the fraction of the plotting area taken up by the summary panel. |
sortFilter |
If calling on a GRangesList, a method for subsetting the original object (e.g. 1:2). If calling on a GRanges object should be and expression able to be parsed as a filtering expression using eval_tidy. This is applied when sorting the range order down the heatmap such that ranges can be sorted by one or specific samples, or all. Ranges will always be sorted such that those with the strongest signal are at the top of the plot |
maxDist |
Maximum distance from the centre to find the strongest signal when arranging the ranges |
summaryLabelSide |
Side to place y-axis for the summary plot in the top panel |
respectLevels |
logical(1) If FALSE, facets along the y-axis will be arranged in descending order of signal, otherwise any original factor levels will be retained |
Convenience function for plotting coverage heatmaps across a common set of ranges, shared between one or more samples. These are most commonly the coverage values from merged samples within a treatment group. THe input data structure is based on that obtained from getProfileData, and can be provided either as a GRanges object (generally for one sample) or as a GRangesList.
A 'profile DataFrame' here refers to a data.frame (or tibble, or DataFrame) with a coverage value in one column that corresponds to a genomic bin of a fixed size denoted in another, as generated by getProfileData. Given that multiple ranges are most likely to be drawn, each profile data frame must be the same size in terms of the number of bins, each of which represent a fixed number of nucleotides. At a minimum this is a two column data frame although getProfileData will provide three columns for each specified genomic region.
If using a GRangesList, each list element will be drawn as a separate panel by default. These panels will appear in the same order as the list elements of the GRangesList, although this can easily be overwritten by passing a column name to the facetX argument. The default approach will add the original element names as the column "name" which can be seen in the $data element of any resultant ggplot object produced by this function.
A ggplot2
object, able to be customised using standard ggplot2
syntax
library(rtracklayer) fl <- system.file( "extdata", "bigwig", c("ex1.bw", "ex2.bw"), package = "extraChIPs" ) bwfl <- BigWigFileList(fl) names(bwfl) <- c("ex1", "ex2") gr <- GRanges( c( "chr10:103880281-103880460", "chr10:103892581-103892760", "chr10:103877281-103877460" ) ) pd <- getProfileData(bwfl, gr) plotProfileHeatmap(pd, "profile_data") + scale_fill_viridis_c(option = "inferno", direction = -1) + labs(fill = "Coverage")
library(rtracklayer) fl <- system.file( "extdata", "bigwig", c("ex1.bw", "ex2.bw"), package = "extraChIPs" ) bwfl <- BigWigFileList(fl) names(bwfl) <- c("ex1", "ex2") gr <- GRanges( c( "chr10:103880281-103880460", "chr10:103892581-103892760", "chr10:103877281-103877460" ) ) pd <- getProfileData(bwfl, gr) plotProfileHeatmap(pd, "profile_data") + scale_fill_viridis_c(option = "inferno", direction = -1) + labs(fill = "Coverage")
Create Donut charts based on one or two columns in a data frame
plotSplitDonut(object, ...) ## S4 method for signature 'GRanges' plotSplitDonut(object, scale_by = c("n", "width"), ...) ## S4 method for signature 'DataFrame' plotSplitDonut(object, ...) ## S4 method for signature 'data.frame' plotSplitDonut( object, inner, outer, scale_by, scale_factor = 1000, r_centre = 0.5, r_inner = 1, r_outer = 1, total_glue = "{comma(N)}", total_size = 5, total_colour = "black", inner_glue = "{inner} {.data[[inner]]}\n{percent(p,0.1)}", outer_glue = "{outer} {.data[[outer]]}\n{percent(p,0.1)}", total_label = c("label", "text", "none"), inner_label = c("label", "text", "none"), outer_label = c("label", "text", "none"), label_alpha = 1, inner_label_alpha = NULL, outer_label_alpha = NULL, label_size = 3, inner_label_size = NULL, outer_label_size = NULL, label_colour = "black", inner_label_colour = NULL, outer_label_colour = NULL, min_p = 0.05, inner_min_p = NULL, outer_min_p = NULL, max_p = 1, inner_max_p = NULL, outer_max_p = NULL, inner_pattern = ".", outer_pattern = ".", inner_rotate = FALSE, outer_rotate = FALSE, explode_inner = NULL, explode_outer = NULL, explode_query = c("AND", "OR"), explode_x = 0, explode_y = 0, explode_r = 0, nudge_r = 0.5, inner_nudge_r = NULL, outer_nudge_r = NULL, expand = 0.1, inner_palette = NULL, outer_palette = NULL, inner_legend = TRUE, outer_legend = TRUE, outer_p_by = c("all", "inner"), layout = c(main = area(1, 1, 12, 12), lg1 = area(2, 12), lg2 = area(11, 12)), ... )
plotSplitDonut(object, ...) ## S4 method for signature 'GRanges' plotSplitDonut(object, scale_by = c("n", "width"), ...) ## S4 method for signature 'DataFrame' plotSplitDonut(object, ...) ## S4 method for signature 'data.frame' plotSplitDonut( object, inner, outer, scale_by, scale_factor = 1000, r_centre = 0.5, r_inner = 1, r_outer = 1, total_glue = "{comma(N)}", total_size = 5, total_colour = "black", inner_glue = "{inner} {.data[[inner]]}\n{percent(p,0.1)}", outer_glue = "{outer} {.data[[outer]]}\n{percent(p,0.1)}", total_label = c("label", "text", "none"), inner_label = c("label", "text", "none"), outer_label = c("label", "text", "none"), label_alpha = 1, inner_label_alpha = NULL, outer_label_alpha = NULL, label_size = 3, inner_label_size = NULL, outer_label_size = NULL, label_colour = "black", inner_label_colour = NULL, outer_label_colour = NULL, min_p = 0.05, inner_min_p = NULL, outer_min_p = NULL, max_p = 1, inner_max_p = NULL, outer_max_p = NULL, inner_pattern = ".", outer_pattern = ".", inner_rotate = FALSE, outer_rotate = FALSE, explode_inner = NULL, explode_outer = NULL, explode_query = c("AND", "OR"), explode_x = 0, explode_y = 0, explode_r = 0, nudge_r = 0.5, inner_nudge_r = NULL, outer_nudge_r = NULL, expand = 0.1, inner_palette = NULL, outer_palette = NULL, inner_legend = TRUE, outer_legend = TRUE, outer_p_by = c("all", "inner"), layout = c(main = area(1, 1, 12, 12), lg1 = area(2, 12), lg2 = area(11, 12)), ... )
object |
A |
... |
Not used |
scale_by |
Column to scale values by. If provided, values in this column will be summed, instead of simply counting entries. Any label in the centre of the plot will also reflect this difference |
inner |
Column name to create the inner ring |
outer |
Column name to create the outer ring, subset by the inner ring |
scale_factor |
When scaling by another column, such as width, totals will be divided by this value, with 1000 being the default to provide output in kb. |
r_centre |
The radius of the hole in the centre. Setting to zero will create a Pie chart |
r_inner , r_outer
|
The radii of the inner/outer rings |
total_glue |
glue-syntax for formatting the total which
appears in the centre of the plot. Internally, the value |
total_size |
Label size total number of entries in the centre of the plot. |
total_colour |
Label colour for the summary total in the centre |
inner_glue , outer_glue
|
glue-syntax for formatting labels
which appear on each inner/outer segment Internally, the values |
total_label , inner_label , outer_label
|
Can take values 'text', 'label'
or 'none'. If setting one the first two values, the labelling function
|
label_alpha , inner_label_alpha , outer_label_alpha
|
transparency for labels |
label_size , inner_label_size , outer_label_size
|
Size of all text labels |
label_colour , inner_label_colour , outer_label_colour
|
Takes any colour specification, with the additional option of 'palette'. In this special case, the same palette as is used for each segment will be applied. |
min_p , inner_min_p , outer_min_p
|
only display labels for segments
representing greater than this proportion of the total. If inner/outer values
are specified, the values in |
max_p , inner_max_p , outer_max_p
|
only display labels for segments
representing less than this proportion of the total. If inner/outer values
are specified, the values in |
inner_pattern , outer_pattern
|
Regular expressions which are combined with max_p and min_p values for accurately choosing labels |
inner_rotate , outer_rotate
|
logical(1). Rotate labels for inner or outer rings. This will be ignored by when setting the geom as "label". See geom_text |
explode_inner , explode_outer
|
Regular expressions from either the inner or outer ring for which segments will be 'exploded' |
explode_query |
Setting to AND and specifying values for both the inner and outer ring will require matches in both categories |
explode_x , explode_y
|
Numeric values for shifting exploded values |
explode_r |
Radius expansion for exploded values |
nudge_r , inner_nudge_r , outer_nudge_r
|
Radius expansion for labels |
expand |
Passed to expansion for both x and y axes. Can be helpful if labels are clipped by plot limits |
inner_palette |
Colour palette for the inner ring |
outer_palette |
Optional colour palette for the outer ring |
inner_legend , outer_legend
|
logical(1). Show legends for either layer |
outer_p_by |
Scale the proportions for outer segments by the complete dataset, or within each inner segment |
layout |
Passed to plot_layout |
Using a data.frame or GRanges object, this function enables creation of a Pie/Donut chart with an inner and outer ring. The function itself is extremely flexible allowing for separate colour palettes in the inner and outer rings, as well as highly customisable labels.
Sections can be exploded using a value from the inner ring or outer ring
separately, or in combination by setting explode_query = "AND"
.
Exploded sections can be shifted by expanding the radius (explode_r
), or
along the x/y co-ordinates using explode_x/y
, allowing for detailed
placement of sections.
If only the inner palette is specified, segments in the outer ring will be assigned the same colours as the inner segments, but with increased transparency. Only a single legend will be drawn in this scenario. If an outer palette is specified, both colour palettes are completely distinct and two distinct legends will be drawn. The placement of these legends, along with the larger donut plot, can be manually specified by providing a layout as defined in plot_layout. Names are not required on this layout, but may be beneficial for code reproducibility.
The inner label denoting the total can also be heavily customised using the
glue syntax to present the calculated value N
along with any
additional text, such as 'kb' if scaling GenomicRanges by width. The same
approach can be taken for the inner and outer labels, where totals are
held in the value n
, proportions are held in the value p
and the values
corresponding to each segment can be accessed using .data[[inner]]
or
.data[[outer]]
. Column titles can be added using {inner}
/{outer}
.
Values from the inner segments can be added to the outer
labels using this strategy enabling a wide variety of labelling approaches
to be utilised.
A patchwork object consisting of both ggplot2 objects and legend grobs
set.seed(200) df <- data.frame( feature = sample( c("Promoter", "Enhancer", "Intergenic"), 200, replace = TRUE ), TF1 = sample(c("Up", "Down", "Unchanged"), 200, replace = TRUE), TF2 = sample(c("Up", "Down", "Unchanged"), 200, replace = TRUE) ) ## The standard plot plotSplitDonut(df, inner = "TF1", outer = "TF2", inner_legend = FALSE) ## Adding an exploded section along with an outer palette & customisation plotSplitDonut( df, inner = "TF1", outer = "feature", total_label = "none", inner_label_alpha = 0.5, r_centre = 0, outer_glue = "{.data[[outer]]}\n(n = {n})", outer_label = "text", explode_inner = "Up", explode_outer = "Prom|Enh", explode_query = "AND", explode_r = 0.4, outer_rotate = TRUE, inner_palette = hcl.colors(3, "Spectral", rev = TRUE), outer_palette = hcl.colors(3, "Cividis") )
set.seed(200) df <- data.frame( feature = sample( c("Promoter", "Enhancer", "Intergenic"), 200, replace = TRUE ), TF1 = sample(c("Up", "Down", "Unchanged"), 200, replace = TRUE), TF2 = sample(c("Up", "Down", "Unchanged"), 200, replace = TRUE) ) ## The standard plot plotSplitDonut(df, inner = "TF1", outer = "TF2", inner_legend = FALSE) ## Adding an exploded section along with an outer palette & customisation plotSplitDonut( df, inner = "TF1", outer = "feature", total_label = "none", inner_label_alpha = 0.5, r_centre = 0, outer_glue = "{.data[[outer]]}\n(n = {n})", outer_label = "text", explode_inner = "Up", explode_outer = "Prom|Enh", explode_query = "AND", explode_r = 0.4, outer_rotate = TRUE, inner_palette = hcl.colors(3, "Spectral", rev = TRUE), outer_palette = hcl.colors(3, "Cividis") )
Find the proportion of a query reange which overlaps the subject
propOverlap(x, y, ...) ## S4 method for signature 'GRanges,GRanges' propOverlap(x, y, ignore.strand = FALSE, ...)
propOverlap(x, y, ...) ## S4 method for signature 'GRanges,GRanges' propOverlap(x, y, ignore.strand = FALSE, ...)
x , y
|
A GenomicRanges object |
... |
Not used |
ignore.strand |
If set to TRUE, then the strand of x and y is set to "*" prior to any computation. |
This behaves similarly to overlapsAny except the proportion of the query range which overlaps one or more subject ranges is returned instead of a logical vector
Numeric vector the same length as x
x <- GRanges("chr1:1-10") y <- GRanges("chr1:1-5") propOverlap(x, y) propOverlap(y, x)
x <- GRanges("chr1:1-10") y <- GRanges("chr1:1-5") propOverlap(x, y) propOverlap(y, x)
Reduce ranges retaining mcols
reduceMC(x, ignore.strand = FALSE, simplify = TRUE, ...)
reduceMC(x, ignore.strand = FALSE, simplify = TRUE, ...)
x |
A GenomicRanges object |
ignore.strand |
If set to TRUE, then the strand of x and y is set to "*" prior to any computation. |
simplify |
logical(1). Attempt to simplify returned columns where possible |
... |
Passed to reduce |
This function extends reduce so that all mcols
are returned in the output.
Where the reduced ranges map to multiple ranges in the original range,
mcols
will be returned as CompressedList
columns.
If simplify = TRUE
columns will be returned as vectors where possible.
A GRanges object
x <- GRanges(c("chr1:1-10:+", "chr1:6-12:-")) x$id <- c("range1", "range2") reduceMC(x) reduceMC(x, ignore.strand = TRUE)
x <- GRanges(c("chr1:1-10:+", "chr1:6-12:-")) x$id <- c("range1", "range2") reduceMC(x) reduceMC(x, ignore.strand = TRUE)
Perform set operations retaining all mcols from the query range
setdiffMC(x, y, ...) intersectMC(x, y, ...) unionMC(x, y, ...) ## S4 method for signature 'GRanges,GRanges' setdiffMC(x, y, ignore.strand = FALSE, simplify = TRUE, ...) ## S4 method for signature 'GRanges,GRanges' intersectMC(x, y, ignore.strand = FALSE, simplify = TRUE, ...) ## S4 method for signature 'GRanges,GRanges' unionMC(x, y, ignore.strand = FALSE, simplify = TRUE, ...)
setdiffMC(x, y, ...) intersectMC(x, y, ...) unionMC(x, y, ...) ## S4 method for signature 'GRanges,GRanges' setdiffMC(x, y, ignore.strand = FALSE, simplify = TRUE, ...) ## S4 method for signature 'GRanges,GRanges' intersectMC(x, y, ignore.strand = FALSE, simplify = TRUE, ...) ## S4 method for signature 'GRanges,GRanges' unionMC(x, y, ignore.strand = FALSE, simplify = TRUE, ...)
x , y
|
GenomicRanges objects |
... |
Not used |
ignore.strand |
If set to TRUE, then the strand of x and y is set to "*" prior to any computation. |
simplify |
logical(1) If TRUE, any List columns will be returned as vectors where possible. This can only occur if single, unique entries are present in all initial elements. |
This extends the methods provided by setdiff,
intersect and union so that
mcols
from x
will be returned as part of the output.
Where output ranges map back to multiple ranges in x
, CompressedList
columns will be returned.
By default, these will be simplified if possible, however this behaviour
can be disabled by setting simplify = FALSE
.
All columns will be returned which can also be time-consuming.
A wise approach is to only provide columns you require as part of the
query ranges x
.
If more nuanced approaches are required, the returned columns can be
further modified by many functions included in the plyranges
package,
such as mutate()
.
A GRanges object with all mcols returned form the original object. If a range obtained by setdiff maps back to two or more ranges in the original set of Ranges, mcols will be returned as CompressedList columns
x <- GRanges("chr1:1-100:+") x$id <- "range1" y <- GRanges(c("chr1:51-60:+", "chr1:21-30:-")) setdiffMC(x, y) setdiffMC(x, y, ignore.strand = TRUE) # The intersection works similarly intersectMC(x, y) # Union may contain ranges not initially in x unionMC(x, y) unionMC(x, y, ignore.strand = TRUE)
x <- GRanges("chr1:1-100:+") x$id <- "range1" y <- GRanges(c("chr1:51-60:+", "chr1:21-30:-")) setdiffMC(x, y) setdiffMC(x, y, ignore.strand = TRUE) # The intersection works similarly intersectMC(x, y) # Union may contain ranges not initially in x unionMC(x, y) unionMC(x, y, ignore.strand = TRUE)
Stitch together ranges within a given distance, using excluded ranges as barriers that cannot be crossed
stitchRanges(x, exclude, maxgap = 12500L, ignore.strand = TRUE)
stitchRanges(x, exclude, maxgap = 12500L, ignore.strand = TRUE)
x |
Ranges to be stitched together |
exclude |
Ranges to exclude |
maxgap |
The maximum distance between ranges to be stitched |
ignore.strand |
logical |
Stitches together ranges within a given distance, using any ranges provided for exclusion as barriers between stitched ranges. This may be particularly useful if wanting to stitch enhancers whilst excluding promoters.
All inputs and outputs are Genomic Ranges objects
A GRanges object
x <- GRanges(c("chr1:1-10", "chr1:101-110", "chr1:201-210", "chr2:1-10")) y <- GRanges("chr1:200:+") stitchRanges(x, exclude = y, maxgap = 100)
x <- GRanges(c("chr1:1-10", "chr1:101-110", "chr1:201-210", "chr2:1-10")) y <- GRanges("chr1:200:+") stitchRanges(x, exclude = y, maxgap = 100)
Estimate voom precision weights directly From CPM values
voomWeightsFromCPM( cpm, design = NULL, w0 = NULL, lib.size = NULL, isLogCPM = TRUE, span = 0.5, ... )
voomWeightsFromCPM( cpm, design = NULL, w0 = NULL, lib.size = NULL, isLogCPM = TRUE, span = 0.5, ... )
cpm |
Matrix of CPM or logCPM values |
design |
The design matrix for the experiment |
w0 |
Initial vector of sample weights. Should be calculated using arrayWeights |
lib.size |
Initial library sizes. Must be provided as these are no estimable from CPM values |
isLogCPM |
logical(1). Indicates whether the data is log2 transformed already. Most commonly (e.g. if using the output of cqn) it will be, |
span |
Width of the smoothing window used for the lowess mean-variance trend. Expressed as a proportion between 0 and 1. |
... |
Passed to lmFit internally |
This function takes CPM or logCPM values and estimates the precision weights as would be done by providing counts directly to voom. Using this function enables the use of logCPM values which have been normalised using other methods such as Conditional-Quantile or Smooth-Quantile Normalisation.
The precision weights are returned as part of the EList
output, and
these are automatically passed to the function lmFit
during model fitting.
This will ensure that the mean-variance relationship is appropriate for
the linear modelling steps as performed by limma.
Initial sample weights can be passed to the function, and should be calculated using arrayWeights called on the normalised logCPM values. The returned sample weights will be different to these, given that the function voomWithQualityWeights performs two rounds of estimation. The first is on the initial data, with the inappropriate mean-variance relationship, whilst the second round is after incorporation of the precision weights.
An object of class EList
as would be output by voom.
Importantly, there will be no genes
element, although this can be
added later.
Similarly, the returned targets
element will only contain sample
names and library sizes.
This can be incorporated with any other metadata as required.
Plotting data is always returned, noting the the value sx
has
been offset by the library sizes and will be simple logCPM values.
As such, the fitted Amean
is also returned in this list element.
If initial sample weights were provided, modified weights will also be returned, as the initial function voomWithQualityWeights performs two rounds of estimation of sample weights. Here we would simply provide the initial weights a priori, with the second round performed within the function. Importantly, this second round of sample weight estimation uses the precision weights ensuring the correct mean-variance relationship is used for the final estimation of sample weights
bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw") wc <- csaw::windowCounts(bamFiles, filter=1) cpm <- edgeR::cpm(wc, log = TRUE) el <- voomWeightsFromCPM(cpm, lib.size = wc$totals)
bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw") wc <- csaw::windowCounts(bamFiles, filter=1) cpm <- edgeR::cpm(wc, log = TRUE) el <- voomWeightsFromCPM(cpm, lib.size = wc$totals)