Title: | Differential Binding Analysis of ChIP-Seq Peak Data |
---|---|
Description: | Compute differentially bound sites from multiple ChIP-seq experiments using affinity (quantitative) data. Also enables occupancy (overlap) analysis and plotting functions. |
Authors: | Rory Stark [aut, cre], Gord Brown [aut] |
Maintainer: | Rory Stark <[email protected]> |
License: | Artistic-2.0 |
Version: | 3.17.0 |
Built: | 2024-11-29 07:18:45 UTC |
Source: | https://github.com/bioc/DiffBind |
Differential binding analysis of ChIP-seq peaksets
Computes differentially bound sites from multiple ChIP-seq experiments using affinity (quantitative) data. Also enables occupancy (overlap) analysis and plotting functions.
Entry Points:
dba : |
Construct a dba object |
dba.peakset : |
Add a peakset to, or retrieve a peakset from, a dba object |
dba.overlap : |
Compute binding site overlaps and/or correlations |
dba.blacklist : |
Filter peaks using blacklists and greylists |
dba.count : |
Count reads in binding sites |
dba.contrast : |
Establish design and contrast(s) for analysis |
dba.normalize : |
Normalize count data for analysis |
dba.analyze : |
Execute quantitative analysis |
dba.report : |
Generate results report for a contrast analysis |
dba.plotHeatmap : |
Heatmap plot |
dba.plotPCA : |
Principal Components plot |
dba.plotBox : |
Boxplots |
dba.plotMA : |
MA/scatter plot |
dba.plotVenn : |
Venn diagram plot |
dba.plotVolcano : |
Volcano plot |
dba.plotProfile : |
Peak profile heatmaps |
dba.show : |
Show dba metadata |
dba.mask : |
Mask samples or sites |
dba.save : |
Save dba object |
dba.load : |
Load dba object |
Rory Stark <rory.stark @at@ cruk.cam.ac.uk> and Gord Brown
Constructs a new DBA object from a sample sheet, or based on an existing DBA object
dba(DBA,mask, minOverlap=2, sampleSheet="dba_samples.csv", config=data.frame(AnalysisMethod=DBA_DESEQ2,th=0.05, DataType=DBA_DATA_GRANGES, RunParallel=TRUE, minQCth=15, fragmentSize=125, bCorPlot=FALSE, reportInit="DBA", bUsePval=FALSE, design=TRUE, doBlacklist=TRUE, doGreylist=TRUE), peakCaller="raw", peakFormat, scoreCol, bLowerScoreBetter, filter, skipLines=0, bAddCallerConsensus=FALSE, bRemoveM=TRUE, bRemoveRandom=TRUE, bSummarizedExperiment=FALSE, attributes, dir)
dba(DBA,mask, minOverlap=2, sampleSheet="dba_samples.csv", config=data.frame(AnalysisMethod=DBA_DESEQ2,th=0.05, DataType=DBA_DATA_GRANGES, RunParallel=TRUE, minQCth=15, fragmentSize=125, bCorPlot=FALSE, reportInit="DBA", bUsePval=FALSE, design=TRUE, doBlacklist=TRUE, doGreylist=TRUE), peakCaller="raw", peakFormat, scoreCol, bLowerScoreBetter, filter, skipLines=0, bAddCallerConsensus=FALSE, bRemoveM=TRUE, bRemoveRandom=TRUE, bSummarizedExperiment=FALSE, attributes, dir)
DBA |
existing DBA object – if present, will return a fully-constructed DBA object
based on the passed one,
using criteria specified in the |
mask |
logical or numerical vector indicating which peaksets to include
in the resulting model if basing DBA object on an existing one.
See |
minOverlap |
only include peaks in at least this many peaksets in the main binding matrix
if basing DBA object on an existing one.
If |
sampleSheet |
data frame containing sample sheet, or file name of sample sheet to load (ignored if DBA is specified). Columns names in sample sheet may include:
For sample sheets loaded from a file, the accepted formats are comma-separated values
(column headers, followed by one line per sample),
or Excel-formatted spreadsheets ( |
config |
See DBA-config for full set of options. Relevant fields include:
|
peakCaller |
if a |
peakFormat |
if a |
scoreCol |
if a |
bLowerScoreBetter |
if a |
filter |
if a |
skipLines |
if a |
bAddCallerConsensus |
add a consensus peakset for each sample with more than one peakset
(i.e. different peak callers) when constructing a new DBA object from a
|
bRemoveM |
logical indicating whether to remove peaks on chrM (mitochondria) when constructing a new DBA object from a sample sheet. |
bRemoveRandom |
logical indicating whether to remove peaks on chrN_random when constructing a new DBA object from a sample sheet. |
bSummarizedExperiment |
logical indicating whether to return resulting object as a |
bCorPlot |
logical indicating that a correlation heatmap should be plotted before returning.
If |
attributes |
vector of attributes to use subsequently as defaults when generating labels in plotting functions:
|
dir |
Directory path.
If supplied, files referenced in the |
MODE: Construct a new DBA object from a samplesheet:
dba(sampleSheet, config,
bAddCallerConsensus, bRemoveM, bRemoveRandom,
attributes)
MODE: Construct a DBA object based on an existing one:
dba(DBA, mask, attributes)
MODE: Convert a DBA object to a SummarizedExperiment object:
dba(DBA, bSummarizedExperiment=TRUE)
DBA object
Rory Stark and Gordon Brown
dba.peakset
, dba.show
, DBA.config
.
# Create DBA object from a samplesheet ## Not run: basedir <- system.file("extra", package="DiffBind") tamoxifen <- dba(sampleSheet="tamoxifen.csv", dir=basedir) tamoxifen tamoxifen <- dba(sampleSheet="tamoxifen_allfields.csv") tamoxifen tamoxifen <- dba(sampleSheet="tamoxifen_allfields.csv",config="config.csv") tamoxifen ## End(Not run) #Create a DBA object with a subset of samples data(tamoxifen_peaks) Responsive <- dba(tamoxifen,tamoxifen$masks$Responsive) Responsive # change peak caller but leave peak format the same basedir <- system.file("extra", package="DiffBind") tamoxifen <- dba(sampleSheet="tamoxifen.csv", dir=basedir, peakCaller="macs", peakFormat="raw", scoreCol=5 ) dba.show(tamoxifen, attributes=c(DBA_TISSUE,DBA_CONDITION,DBA_REPLICATE,DBA_CALLER)) # Convert DBA object to SummarizedExperiment data(tamoxifen_counts) sset <- dba(tamoxifen,bSummarizedExperiment=TRUE) sset
# Create DBA object from a samplesheet ## Not run: basedir <- system.file("extra", package="DiffBind") tamoxifen <- dba(sampleSheet="tamoxifen.csv", dir=basedir) tamoxifen tamoxifen <- dba(sampleSheet="tamoxifen_allfields.csv") tamoxifen tamoxifen <- dba(sampleSheet="tamoxifen_allfields.csv",config="config.csv") tamoxifen ## End(Not run) #Create a DBA object with a subset of samples data(tamoxifen_peaks) Responsive <- dba(tamoxifen,tamoxifen$masks$Responsive) Responsive # change peak caller but leave peak format the same basedir <- system.file("extra", package="DiffBind") tamoxifen <- dba(sampleSheet="tamoxifen.csv", dir=basedir, peakCaller="macs", peakFormat="raw", scoreCol=5 ) dba.show(tamoxifen, attributes=c(DBA_TISSUE,DBA_CONDITION,DBA_REPLICATE,DBA_CALLER)) # Convert DBA object to SummarizedExperiment data(tamoxifen_counts) sset <- dba(tamoxifen,bSummarizedExperiment=TRUE) sset
Standard S3 methods for DBA object.
## S3 method for class 'DBA' print(x, ...) ## S3 method for class 'DBA' summary(object, ...) ## S3 method for class 'DBA' plot(x, ...)
## S3 method for class 'DBA' print(x, ...) ## S3 method for class 'DBA' summary(object, ...) ## S3 method for class 'DBA' plot(x, ...)
x |
DBA object |
object |
DBA object |
... |
Arguments passed on to parent methods |
S3 methods for DBA object from the DiffBind
package.
DBA objects are usually constructed using the dba
function.
There are a number of internal parameters that can be set,
and defaults overridden, by setting DBA$config
options:
DBA$config$AnalysisMethod:
either DBA_DESEQ2
or DBA_EDGER
.
DBA$config$th:
default threshold for reporting and plotting analysis results.
DBA$config$DataType:
default class for peaks and reports
(DBA_DATA_GRANGES, DBA_DATA_RANGEDDATA, or DBA_DATA_FRAME
).
DBA$config$RunParallel:
logical indicating if counting and analysis
operations should be run in parallel using multicore by default.
DBA$config$cores:
number of cores to use when performing multi-core parallel processing.
DBA$config$minQCth:
numeric, for filtering reads based on mapping
quality score; only reads with a mapping quality score
greater than or equal to this will be counted.
DBA$config$fragmentSize:
numeric indicating mean fragment size for single-end counting.
Reads will be extended to this length before counting overlaps.
May be a vector of lengths, one for each sample.
DBA$config$bCorPlot:
logical indicating that a correlation heatmap
should be plotted automatically
DBA$config$ReportInit:
string to append to the beginning of saved report file names.
DBA$config$bUsePval:
logical, default indicating whether to use FDR
(FALSE
) or p-values (TRUE
).
DBA$config$doBlacklist:
logical, whether to attempt to find and apply
a blacklist if none is present when running dba.analyze
.
DBA$config$doGreylist
logical, whether to attempt to generate and apply
a greylist if none is present when running dba.analyze
.
DBA$config$DataType
The class of object for returned reports and peaksets:
DBA$config$mergeOverlap:
The overlap (in basepairs) between peaks to merge when generating
a consensus peakset.
A positive valuecontrols how many basepairs peaks
must overlap to be merged,
while a negative value will result in non-overlapping peaks to be merged,
If absent, the default value of 1
will result in any peaks
overlapping by at least one basepair to be merged into a single interval.
DBA$config$design:
When calling dba.contrast
, if design
parameter is
missing, this will be used as the value for that parameter.
DBA$config$edgeR$bTagwise:
logical
indicating if edgeR::estimateGLMTagwiseDisp
should
be called when performing an edgeR
analysis.
If absent the default is TRUE
, so setting this to FALSE
prevents the tagwise dispersion estimate form being calculated.
DBA$config$DESeq2$fitType:
logical
indicating the fitType
to be used in DESeq2::estimateDispersions
when performing a DESeq2
analysis.
If absent the default is local
.
DBA$config$savePrefix:
When calling dba.save
or dba.load
,
this value (if present)
will override the default value for the pre
parameter.
DBA$config$saveExt:
When calling dba.save
or dba.load
,
this value (if present)
will override the default value for the ext
parameter.
DBA$config$greylist.pval:
pvalue cutoff to use when generating a greylist
using GreyListChIP::calcThreshold
.
If missing, the default is 0.999
DBA$config$saveExt:
When calling dba.save
, this value (if present)
will override the default value for the ext
parameter.
DBA$config$yieldSize:
yieldSize indicating how many reads to process at one time; default is 5000000.
The lower this value, the less memory will be used, but the more time it will take to
complete the count operation.
DBA$config$intersectMode:
mode indicating which overlap algorithm to use;
default is "IntersectionNotEmpty"
DBA$config$singleEnd:
logical
indicating if reads are single end;
if NULL
, status will be automatically detected.
DBA$config$fragments:
logical
indicating how unmatched reads are counted;
default is FALSE
.
DBA$config$scanbamparam:
ScanBamParam
object to pass to summarizeOverlaps
.
If present, bRemoveDuplicates
is ignored.
DBA$config$pp.style:
Sets style
parameter for profileplyr::BamBigwig_to_chipProfile
when calling dba.plotProfile.
DBA$config$pp.nOfWindows:
Sets nOfWindow
parameter for
profileplyr::BamBigwig_to_chipProfile
when calling dba.plotProfile.
DBA$config$bin_size:
Sets bin_size
parameter for
profileplyr::BamBigwig_to_chipProfile
when calling dba.plotProfile.
DBA$config$distanceAround:
Sets distanceAround
parameter for
profileplyr::BamBigwig_to_chipProfile
when calling dba.plotProfile.
DBA$config$distanceUp:
Sets distanceUp
parameter for
profileplyr::BamBigwig_to_chipProfile
when calling dba.plotProfile.
DBA$config$distanceDown:
Sets distanceDown
parameter for
profileplyr::BamBigwig_to_chipProfile
when calling dba.plotProfile.
DBA$config$id:
character
string to use to replace "ID"
when displaying a DBA
object (dba.show
)
DBA$config$group:
character
string to use to replace "Group"
when displaying a DBA
object (dba.show
)
DBA$config$tissue:
character
string to use to replace "Tissue"
when displaying a DBA
object (dba.show
)
DBA$config$factor:
character
string to use to replace "Factor"
when displaying a DBA
object (dba.show
)
DBA$config$condition:
character
string to use to replace "Condition"
when displaying a DBA
object (dba.show
)
DBA$config$treatment:
character
string to use to replace "Treatment"
when displaying a DBA
object (dba.show
)
DBA$config$replicate:
character
string to use to replace "Replicate"
when displaying a DBA
object (dba.show
)
DBA$config$caller:
character
string to use to replace "Caller"
when displaying a DBA
object (dba.show
)
DBA$config$reads:
character
string to use to replace "Reads"
when displaying a DBA
object (dba.show
)
Rory Stark
data(tamoxifen_peaks) tamoxifen data(tamoxifen_counts) tamoxifen
data(tamoxifen_peaks) tamoxifen data(tamoxifen_counts) tamoxifen
Tamoxifen resistance dataset used for DBA examples
data(tamoxifen_peaks) data(tamoxifen_counts) data(tamoxifen_analysis) data(tamoxifen_greylist)
data(tamoxifen_peaks) data(tamoxifen_counts) data(tamoxifen_analysis) data(tamoxifen_greylist)
tamoxifen_peaks |
load tamoxifen resistance dataset DBA object with peak (occupancy) data |
tamoxifen_counts |
load tamoxifen resistance dataset DBA object with count (affinity) data. Also includes background bins counts for background normalization. |
tamoxifen_analysis |
load tamoxifen resistance dataset DBA object with count (affinity) data
and |
tamoxifen_greylist |
load greylist for tamoxifen dataset.
Generated as shown in |
The tamoxifen resistance dataset is used for the DBA vignette and man page examples.
Data used to create these objects can be downloaded at https://content.cruk.cam.ac.uk/bioinformatics/software/DiffBind/DiffBind_vignette_data.tar.gz.
loads a DBA object named tamoxifen
(or tamoxifen.greylist
).
The script for generating these files (GenerateDataFiles.R
)
is included with the package in the inst/extras
directory.
Rory Stark
data(tamoxifen_peaks) tamoxifen data(tamoxifen_counts) plot(tamoxifen) data(tamoxifen_analysis) dba.plotMA(tamoxifen) data(tamoxifen_greylist) tamoxifen.greylist$master
data(tamoxifen_peaks) tamoxifen data(tamoxifen_counts) plot(tamoxifen) data(tamoxifen_analysis) dba.plotMA(tamoxifen) data(tamoxifen_greylist) tamoxifen.greylist$master
Performs differential binding affinity analysis. Performs default generation of a consensus peakset, read counting, normalization, and setting up of contrasts if they have not been specified.
dba.analyze(DBA, method=DBA$config$AnalysisMethod, design, bBlacklist=DBA$config$doBlacklist, bGreylist=DBA$config$doGreylist, bRetrieveAnalysis=FALSE, bReduceObjects=TRUE, bParallel=DBA$config$RunParallel)
dba.analyze(DBA, method=DBA$config$AnalysisMethod, design, bBlacklist=DBA$config$doBlacklist, bGreylist=DBA$config$doGreylist, bRetrieveAnalysis=FALSE, bReduceObjects=TRUE, bParallel=DBA$config$RunParallel)
DBA |
Either a If no blacklist or greylists are included,
a call will be made to
If no counts are included, a default consensus will be formed
and read counts computed via a call to
If no normalization has been specified,
the reads will be normalized via a call to
If no contrasts are specified ( |
method |
Underlying method, or vector of methods, by which to analyze differential binding affinity. Supported methods:
|
design |
If present and a character string, will be used as the design formula for the analysis, replacing any previously established design if present. If See |
bBlacklist |
If |
bGreylist |
If |
bRetrieveAnalysis |
If changed from
An analysis object will only be successfully returned if there is at
least one contrast utilizing an explicit design
(see |
bReduceObjects |
logical indicating whether strip the analysis objects of unnecessary
fields to save memory.
If it is desired to use the |
bParallel |
logical indicating that the analyses is to be done in parallel using multicore (one process for each contrast for each method, plus an additional process per method). |
In general, prior to calling dba.analyze
, dba.count
should have been run.
If no contrasts have been established prior to invoking dba.analyze
,
then the default set of contrasts will be added using (dba.contrast
).
If no normalization parameters have been supplied by
calling dba.normalize
,
default normalization parameters will be used.
See the DBA User Guide for more details on how
the edgeR
and DESeq2
analyses are carried out.
DBA object with results of analysis added to DBA$contrasts
.
Alternatively, an analysis object (either a DESeqDataSet
or a DGEList
) if bRetrieveAnalysis
if not FALSE
.
If there is a blocking factor for the contrast(s) specified using a
previous call to dba.contrast
with design=FALSE
,
a multi-factor analysis will automatically be carried out in addition
to a single factor analysis.
Rory Stark
dba.blacklist
, dba.count
,
dba.contrast
, dba.normalize
,
dba.report
, DBA.config
.
data(tamoxifen_counts) dba.analyze(tamoxifen) tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS, design="~Tissue + Condition") dba.show(tamoxifen, bContrasts=TRUE) dba.analyze(tamoxifen, bRetrieveAnalysis=TRUE) edger.object <- dba.analyze(tamoxifen, bRetrieveAnalysis=DBA_EDGER) class(edger.object)
data(tamoxifen_counts) dba.analyze(tamoxifen) tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS, design="~Tissue + Condition") dba.show(tamoxifen, bContrasts=TRUE) dba.analyze(tamoxifen, bRetrieveAnalysis=TRUE) edger.object <- dba.analyze(tamoxifen, bRetrieveAnalysis=DBA_EDGER) class(edger.object)
Filters peak intervals that overlap a blacklist (from ENCODE or user supplied.) Filter peak intervals that overlap a greylist, either user supplied or generated based on aligned reads for control samples (e.g. Inputs).
dba.blacklist(DBA, blacklist=DBA$config$doBlacklist, greylist=DBA$config$doGreylist, Retrieve, cores=DBA$config$cores)
dba.blacklist(DBA, blacklist=DBA$config$doBlacklist, greylist=DBA$config$doGreylist, Retrieve, cores=DBA$config$cores)
DBA |
DBA object |
blacklist |
If not equal to If equal to A user specified blacklist can be specified by setting this parameter to
a Otherwise, this parameter may be set to one of the following constants, indicating which of the ENCODE blacklists should be applied:
|
greylist |
If not equal to If equal to The The following constants map to a subset of possible
A user specified greylist can also be specified by setting this parameter to
a |
Retrieve |
If present, some aspects of a previous run of the function is retrieved instead
of returning a If If If
Note that the if |
cores |
Parallel cores to use when running greylist generation. |
This function is intended to filter peak intervals that fall in regions of the genome that are known to be problematic for ChIP analysis. Blacklists, which are derived for a reference genome and should be applied for any experiments that use that reference, are distinguished from greylists, which are derived on a per-experiment basis using anomalous pileups in the control tracks (such as Inputs).
A core set of blacklists have been defined as part of the ENCODE project (see references).
Greylists can be generated using this function, which serves as a front-end to the GreyListChIP
package.
See the details of that package for more information on how it works.
Note that the GreyListChIP
package can be utilized separately to generate greylists
with more fine-grained control, with the results passed back to DiffBind
to filter peaks.
DBA
object, with peaks filtered (unless Retrieve
is specified.)
The p
threshold can be altered by setting DBA$config$greylist.pval
.
The default is 0.999
.
See GreyListChIP::
calcThreshold
for details.
Ideally, Blacklists and Greylists will be applied to the aligned reads prior
to calling peaks, as removing reads in anomalous regions
will yield better background noise models.
Once greylists have been generated, peaks can be
re-called and read into DiffBind
.
Rory Stark with thanks to Gord Brown
Amemiya HM, Kundaje A, Boyle AP. The ENCODE blacklist: identification of problematic regions of the genome. Sci Rep. 2019 Dec; 9(1) 9354 DOI: 10.1038/s41598-019-45839-z
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247.
Brown, Gord. Generating Grey Lists from Input Libraries. Bioconductor. https://bioconductor.org/packages/release/bioc/html/GreyListChIP.html
GreyListChIP (GreyList)
, BSgenome
,
DBA.config
.
data(tamoxifen_peaks) ## Not run: tamoxifen <- dba.blacklist(tamoxifen, blacklist=TRUE, greylist="BSgenome.Hsapiens.UCSC.hg19") ## End(Not run) data(tamoxifen_greylist) tamoxifen <- dba.blacklist(tamoxifen, blacklist=DBA_BLACKLIST_HG19, greylist=tamoxifen.greylist$master) dba.blacklist(tamoxifen,Retrieve=DBA_GREYLIST) data(tamoxifen_counts) tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_CONTROL_READS) tamoxifen <- dba.blacklist(tamoxifen, blacklist=DBA_BLACKLIST_HG19, greylist=tamoxifen.greylist$master) blacklisted <- dba.blacklist(tamoxifen, Retrieve=DBA_BLACKLISTED_PEAKS) mean(blacklisted[[1]]$cReads) mean(dba.peakset(tamoxifen,peaks=1,bRetrieve=TRUE)$Score)
data(tamoxifen_peaks) ## Not run: tamoxifen <- dba.blacklist(tamoxifen, blacklist=TRUE, greylist="BSgenome.Hsapiens.UCSC.hg19") ## End(Not run) data(tamoxifen_greylist) tamoxifen <- dba.blacklist(tamoxifen, blacklist=DBA_BLACKLIST_HG19, greylist=tamoxifen.greylist$master) dba.blacklist(tamoxifen,Retrieve=DBA_GREYLIST) data(tamoxifen_counts) tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_CONTROL_READS) tamoxifen <- dba.blacklist(tamoxifen, blacklist=DBA_BLACKLIST_HG19, greylist=tamoxifen.greylist$master) blacklisted <- dba.blacklist(tamoxifen, Retrieve=DBA_BLACKLISTED_PEAKS) mean(blacklisted[[1]]$cReads) mean(dba.peakset(tamoxifen,peaks=1,bRetrieve=TRUE)$Score)
Sets up contrasts for differential binding affinity analysis
dba.contrast(DBA, design=missing(group1), contrast, group1, group2=!group1, name1, name2, minMembers=3, block, bNot=FALSE, bComplex=FALSE, categories=c(DBA_TISSUE,DBA_FACTOR,DBA_CONDITION,DBA_TREATMENT), bGetCoefficients=FALSE, reorderMeta)
dba.contrast(DBA, design=missing(group1), contrast, group1, group2=!group1, name1, name2, minMembers=3, block, bNot=FALSE, bComplex=FALSE, categories=c(DBA_TISSUE,DBA_FACTOR,DBA_CONDITION,DBA_TREATMENT), bGetCoefficients=FALSE, reorderMeta)
DBA |
DBA object with count data |
design |
Either a logical value, or a character string containing a valid design formula. If a logical value is specified, If a design formula is specified, it must be composed from the following allowable factors:
|
If design
is not explictly specified, and no group
is
specified, then design
will be set to
the value of DBA$config$design
, if present (see DiffBind3
).
contrast |
If a
|
group1 |
mask of samples in first group (when adding a specific contrast).
See |
group2 |
mask of samples in second group (when adding a specific contrast).
See |
name1 |
label for samples in first group (when adding a specific contrast). |
name2 |
label for samples in second group (when adding a specific contrast). |
minMembers |
when automatically generating contrasts, minimum number of unique samples in a group. Must be at least 2, as replicates are strongly advised. If you wish to do an analysis with no replicates, you can set the group1 and group2 parameters explicitly. |
bNot |
include contrasts consisting of a group and all other samples not in that group (indicated by a ! in the contrast name). |
bComplex |
include complex contrasts where groups include samples with the same values for multiple factors. |
categories |
when automatically generating contrasts, attribute or vector of attributes to base contrasts on:
|
block |
blocking attribute for multi-factor analysis. This may be specified as either a value, a vector, or a list. If block is a value, the specified metadata field is used to derive the blocking factor. One of:
If block is a vector, it can either be a mask (logical vector) or a vector of peakset numbers. In this case, the peaksets indicated in the blocking vector are all given the same factor value (true), while any peaksets not included in the vector take the alternative factor value (false). If block is a list, it should be a list of vectors (either logical masks or vectors of peakset numbers), with each indicating a set of peaksets that should share the same value. Each peakset should appear at most once, and any peaksets not specified will be given an default value (other). |
bGetCoefficients |
If |
reorderMeta |
By default, the metadata factor levels will be ordered in the
order they appear in the sample sheet.
They can be re-ordered using this parameter.
If the vector of factor values contains a subset of the possible values,
the specified values will be set to be ordered first, with the remaining values
following in their default order.
If only one factor value is supplied, it will be set as the reference
(or "control") value.
Contrasts that are no longer valid will be removed
(and a warning issued) if detected.
These include contrasts specified as a numeric vector of coefficients,
or contrasts specified using coefficient names that no longer
exists after reordering the metadata factor levels.
Any existing analysis will be removed when metadata factor levels
are reordered, necessitating
another call to |
MODE: Set up a specific contrast using a design:
dba.contrast(DBA, design, contrast)
MODE: Set up all possible contrasts:
dba.contrast(DBA, minMembers, categories)
MODE: Set up a specific contrast without an explicit design:
dba.contrast(DBA, design=FALSE, group1, group2, name1, name2, block)
DBA object with contrast(s) set as DBA$contrasts
.
Contrast list can be retrieved using dba.show(DBA, bContrasts=TRUE)
.
Contrasts will only be set up for peaksets where DBA_CALLER == "counts"
.
Contrasts can be cleared by DBA$contrasts <- NULL
.
Rory Stark
# Set up an explicit contrast data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen, contrast=c("Condition","Responsive","Resistant")) tamoxifen tamoxifen <- dba.analyze(tamoxifen) dba.show(tamoxifen,bContrasts=TRUE) # Add another contrast tamoxifen <- dba.contrast(tamoxifen, contrast=c("Tissue","MCF7","BT474")) dba.show(tamoxifen,bDesign=TRUE) # Change design tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition") tamoxifen <- dba.analyze(tamoxifen) tamoxifen # Automatically add all contrasts between sample groups # where at least THREE samples have the same factor value data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen) tamoxifen # Automatically add all contrasts between sample groups # where at least TWO samples have the same factor value tamoxifen <- dba.contrast(tamoxifen, minMembers=2) dba.show(tamoxifen,bContrasts=TRUE) ### Use of complex contrasts data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen, contrast=c("Tissue","BT474","MCF7")) dba.contrast(tamoxifen, bGetCoefficients=TRUE) #Change design and factor ordering tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition", reorderMeta=list(Condition="Responsive", Tissue=c("MCF7","ZR75","T47D","BT474"))) dba.contrast(tamoxifen, bGetCoefficients=TRUE) tamoxifen <- dba.contrast(tamoxifen,contrast="Tissue_BT474_vs_MCF7") tamoxifen <- dba.contrast(tamoxifen,contrast=list("Tissue_BT474_vs_MCF7")) tamoxifen <- dba.contrast(tamoxifen,contrast=c(0,0,0,1,0)) tamoxifen <- dba.contrast(tamoxifen, contrast=list("Tissue_BT474_vs_MCF7","Tissue_T47D_vs_MCF7")) tamoxifen <- dba.contrast(tamoxifen,contrast=c(0,0,-1,1,0)) tamoxifen <- dba.contrast(tamoxifen,contrast=c(0,0,0,0,1)) dba.show(tamoxifen,bContrasts=TRUE) tamoxifen <- dba.analyze(tamoxifen) tamoxifen tamoxifen <- dba.contrast(tamoxifen, contrast=c("Condition","Responsive","Resistant")) tamoxifen <- dba.analyze(tamoxifen) dba.show(tamoxifen,bContrasts=TRUE)[7:8,] dba.plotVenn(tamoxifen, contrast=7:8, bDB=TRUE, bAll=FALSE, bGain=TRUE, bLoss=TRUE) ## Explicit contrast, without design data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen, design=FALSE, group1=tamoxifen$masks$Responsive, name1="Responsive", group2=tamoxifen$masks$Resistant, name2="Resistant", block=DBA_TISSUE) dba.show(tamoxifen, bContrasts=TRUE) tamoxifen <- dba.analyze(tamoxifen) dba.show(tamoxifen,bContrasts=TRUE) dba.plotVenn(tamoxifen,contrast=1,method=c(DBA_DESEQ2,DBA_DESEQ2_BLOCK))
# Set up an explicit contrast data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen, contrast=c("Condition","Responsive","Resistant")) tamoxifen tamoxifen <- dba.analyze(tamoxifen) dba.show(tamoxifen,bContrasts=TRUE) # Add another contrast tamoxifen <- dba.contrast(tamoxifen, contrast=c("Tissue","MCF7","BT474")) dba.show(tamoxifen,bDesign=TRUE) # Change design tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition") tamoxifen <- dba.analyze(tamoxifen) tamoxifen # Automatically add all contrasts between sample groups # where at least THREE samples have the same factor value data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen) tamoxifen # Automatically add all contrasts between sample groups # where at least TWO samples have the same factor value tamoxifen <- dba.contrast(tamoxifen, minMembers=2) dba.show(tamoxifen,bContrasts=TRUE) ### Use of complex contrasts data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen, contrast=c("Tissue","BT474","MCF7")) dba.contrast(tamoxifen, bGetCoefficients=TRUE) #Change design and factor ordering tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition", reorderMeta=list(Condition="Responsive", Tissue=c("MCF7","ZR75","T47D","BT474"))) dba.contrast(tamoxifen, bGetCoefficients=TRUE) tamoxifen <- dba.contrast(tamoxifen,contrast="Tissue_BT474_vs_MCF7") tamoxifen <- dba.contrast(tamoxifen,contrast=list("Tissue_BT474_vs_MCF7")) tamoxifen <- dba.contrast(tamoxifen,contrast=c(0,0,0,1,0)) tamoxifen <- dba.contrast(tamoxifen, contrast=list("Tissue_BT474_vs_MCF7","Tissue_T47D_vs_MCF7")) tamoxifen <- dba.contrast(tamoxifen,contrast=c(0,0,-1,1,0)) tamoxifen <- dba.contrast(tamoxifen,contrast=c(0,0,0,0,1)) dba.show(tamoxifen,bContrasts=TRUE) tamoxifen <- dba.analyze(tamoxifen) tamoxifen tamoxifen <- dba.contrast(tamoxifen, contrast=c("Condition","Responsive","Resistant")) tamoxifen <- dba.analyze(tamoxifen) dba.show(tamoxifen,bContrasts=TRUE)[7:8,] dba.plotVenn(tamoxifen, contrast=7:8, bDB=TRUE, bAll=FALSE, bGain=TRUE, bLoss=TRUE) ## Explicit contrast, without design data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen, design=FALSE, group1=tamoxifen$masks$Responsive, name1="Responsive", group2=tamoxifen$masks$Resistant, name2="Resistant", block=DBA_TISSUE) dba.show(tamoxifen, bContrasts=TRUE) tamoxifen <- dba.analyze(tamoxifen) dba.show(tamoxifen,bContrasts=TRUE) dba.plotVenn(tamoxifen,contrast=1,method=c(DBA_DESEQ2,DBA_DESEQ2_BLOCK))
Counts reads in binding site intervals. Files must be one of bam, bed and gzip-compressed bed. File suffixes must be ".bam", ".bed", or ".bed.gz" respectively.
dba.count(DBA, peaks, minOverlap=2, score=DBA_SCORE_NORMALIZED, fragmentSize=DBA$config$fragmentSize, summits=200, filter=1, bRemoveDuplicates=FALSE, bScaleControl=TRUE, bSubControl = is.null(DBA$greylist), mapQCth=DBA$config$mapQCth, filterFun=max, minCount=0, bLog=FALSE, bUseSummarizeOverlaps=TRUE, readFormat=DBA_READS_DEFAULT, bParallel=DBA$config$RunParallel)
dba.count(DBA, peaks, minOverlap=2, score=DBA_SCORE_NORMALIZED, fragmentSize=DBA$config$fragmentSize, summits=200, filter=1, bRemoveDuplicates=FALSE, bScaleControl=TRUE, bSubControl = is.null(DBA$greylist), mapQCth=DBA$config$mapQCth, filterFun=max, minCount=0, bLog=FALSE, bUseSummarizeOverlaps=TRUE, readFormat=DBA_READS_DEFAULT, bParallel=DBA$config$RunParallel)
DBA |
DBA object |
||||||||||||||||||||||||||||||||||||
peaks |
If GRanges, RangedData, dataframe, or matrix, this parameter contains the intervals to use for counting. If character string, it specifies a file containing the intervals to use (with the first three columns specifying chromosome, startpos, endpos).If missing or a mask, generates a consensus peakset using minOverlap parameter (after applying the mask if present). If NULL, the |
||||||||||||||||||||||||||||||||||||
minOverlap |
only include peaks in at least this many peaksets when generating consensus peakset (i.e. when peaks parameter is missing). If minOverlap is between zero and one, peak will be included from at least this proportion of peaksets. |
||||||||||||||||||||||||||||||||||||
score |
which score to use in the binding affinity matrix.
Note that all raw read counts are maintained for use by
If
|
||||||||||||||||||||||||||||||||||||
fragmentSize |
This value will be used as the length of the reads.
Each read will be extended from its endpoint along the appropriate strand by this many bases.
If set to zero, the read size indicated in the BAM/BED file will be used.
|
||||||||||||||||||||||||||||||||||||
summits |
unless set to If the value of Note that if |
||||||||||||||||||||||||||||||||||||
filter |
value to use for filtering intervals with low read counts.
The NB: the filtering will be based on RPKM values.
If |
||||||||||||||||||||||||||||||||||||
bRemoveDuplicates |
logical indicating if duplicate reads (ones that map to exactly the same genomic position) should be removed.
If |
||||||||||||||||||||||||||||||||||||
bScaleControl |
logical indicating if the Control reads should be scaled based on relative library sizes.
If |
||||||||||||||||||||||||||||||||||||
bSubControl |
logical indicating whether Control read counts are subtracted for
each site in each sample.
If there are more overlapping control reads than ChIP reads,
the count will be set to the If |
||||||||||||||||||||||||||||||||||||
mapQCth |
for filtering by mapping quality (mapqc).
Only alignments with mapping scores of at least this value will be included.
Only applicable for bam files when |
||||||||||||||||||||||||||||||||||||
filterFun |
function that will be invoked for each interval with a
vector of scores for each sample.
Returns a score that will be evaluated against the |
||||||||||||||||||||||||||||||||||||
minCount |
minimum read count value. Any interval with fewer than this many overlapping reads will be set to have this count. Also applies to scores. |
||||||||||||||||||||||||||||||||||||
bLog |
logical indicating whether log2 of score should be used (only applies to DBA_SCORE_RPKM_FOLD and DBA_SCORE_READS_FOLD). |
||||||||||||||||||||||||||||||||||||
bUseSummarizeOverlaps |
logical indicating that See notes for when the Five additional parameters for
|
||||||||||||||||||||||||||||||||||||
readFormat |
Specify the file type of the read files, over-riding the file extension. Possible values:
Note that if |
||||||||||||||||||||||||||||||||||||
bParallel |
if |
DBA object with binding affinity matrix based on read count scores.
Rory Stark and Gordon Brown
# These won't run unless you have the reads available in a BAM or BED file data(tamoxifen_peaks) ## Not run: tamoxifen <- dba.count(tamoxifen) # Count using a peakset made up of only peaks in all responsive MCF7 replicates data(tamoxifen_peaks) mcf7Common <- dba.overlap(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive) ## Not run: tamoxifen <- dba.count(tamoxifen,peaks=mcf7Common$inAll) tamoxifen #First make consensus peaksets from each set of replicates, #then derive master consensus set for counting from those data(tamoxifen_peaks) tamoxifen <- dba.peakset(tamoxifen,consensus = -DBA_REPLICATE) ## Not run: tamoxifen <- dba.count(tamoxifen, peaks=tamoxifen$masks$Consensus) tamoxifen # Change binding affinity scores data(tamoxifen_counts) dba.peakset(tamoxifen, bRetrieve=TRUE) # default: DBA_SCORE_NORMALIZED tamoxifen <- dba.count(tamoxifen,peaks=NULL,score=DBA_SCORE_READS) dba.peakset(tamoxifen, bRetrieve=TRUE) tamoxifen <- dba.count(tamoxifen,peaks=NULL,score=DBA_SCORE_RPKM_MINUS) dba.peakset(tamoxifen, bRetrieve=TRUE) # Plot effect of a range of filter values and then apply filter data(tamoxifen_counts) rate.max <- dba.count(tamoxifen, peaks=NULL, filter=0:250) rate.sum <- dba.count(tamoxifen, peaks=NULL, filter=0:250,filterFun=sum) plot(0:250,rate.max/rate.max[1],type='l',xlab="Filter Value",ylab="Proportion Retained Sites") lines(0:250,rate.sum/rate.sum[1],col=2) tamoxifen <- dba.count(tamoxifen,peaks=NULL,filter=125,filterFun=sum) tamoxifen # Calculate summits data(tamoxifen_counts) # pre-counted with summits=250 or 501bp intervals as.numeric(dba.show(tamoxifen)$FRiP) ## Not run: tamoxifen <- dba.count(tamoxifen,peaks=NULL,summits=50) # re-counted with summits=50 or 101bp intervals as.numeric(dba.show(tamoxifen)$FRiP)
# These won't run unless you have the reads available in a BAM or BED file data(tamoxifen_peaks) ## Not run: tamoxifen <- dba.count(tamoxifen) # Count using a peakset made up of only peaks in all responsive MCF7 replicates data(tamoxifen_peaks) mcf7Common <- dba.overlap(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive) ## Not run: tamoxifen <- dba.count(tamoxifen,peaks=mcf7Common$inAll) tamoxifen #First make consensus peaksets from each set of replicates, #then derive master consensus set for counting from those data(tamoxifen_peaks) tamoxifen <- dba.peakset(tamoxifen,consensus = -DBA_REPLICATE) ## Not run: tamoxifen <- dba.count(tamoxifen, peaks=tamoxifen$masks$Consensus) tamoxifen # Change binding affinity scores data(tamoxifen_counts) dba.peakset(tamoxifen, bRetrieve=TRUE) # default: DBA_SCORE_NORMALIZED tamoxifen <- dba.count(tamoxifen,peaks=NULL,score=DBA_SCORE_READS) dba.peakset(tamoxifen, bRetrieve=TRUE) tamoxifen <- dba.count(tamoxifen,peaks=NULL,score=DBA_SCORE_RPKM_MINUS) dba.peakset(tamoxifen, bRetrieve=TRUE) # Plot effect of a range of filter values and then apply filter data(tamoxifen_counts) rate.max <- dba.count(tamoxifen, peaks=NULL, filter=0:250) rate.sum <- dba.count(tamoxifen, peaks=NULL, filter=0:250,filterFun=sum) plot(0:250,rate.max/rate.max[1],type='l',xlab="Filter Value",ylab="Proportion Retained Sites") lines(0:250,rate.sum/rate.sum[1],col=2) tamoxifen <- dba.count(tamoxifen,peaks=NULL,filter=125,filterFun=sum) tamoxifen # Calculate summits data(tamoxifen_counts) # pre-counted with summits=250 or 501bp intervals as.numeric(dba.show(tamoxifen)$FRiP) ## Not run: tamoxifen <- dba.count(tamoxifen,peaks=NULL,summits=50) # re-counted with summits=50 or 101bp intervals as.numeric(dba.show(tamoxifen)$FRiP)
Reads in saved DBA object
dba.load(file='DBA', dir='.', pre='dba_', ext='RData')
dba.load(file='DBA', dir='.', pre='dba_', ext='RData')
file |
main filename |
dir |
directory in which to save model |
pre |
string to pre-pend to filename |
ext |
file extension to use |
loaded DBA object
Rory Stark
data(tamoxifen_peaks) savefile <- dba.save(tamoxifen,'tamoxifenPeaks') savefile rm(tamoxifen) tamoxifen <- dba.load('tamoxifenPeaks') tamoxifen unlink(savefile)
data(tamoxifen_peaks) savefile <- dba.save(tamoxifen,'tamoxifenPeaks') savefile rm(tamoxifen) tamoxifen <- dba.load('tamoxifenPeaks') tamoxifen unlink(savefile)
Derives a mask to define a subset of peaksets or sites for a DBA object.
dba.mask(DBA, attribute, value, combine='or', mask, merge='or', bApply=FALSE, peakset, minValue=-1)
dba.mask(DBA, attribute, value, combine='or', mask, merge='or', bApply=FALSE, peakset, minValue=-1)
DBA |
DBA object |
attribute |
when deriving a peakset mask, attribute to base mask on:
|
value |
when deriving a peakset/sample mask, attribute value (or vector of attribute values) to match. |
combine |
when deriving a peakset/sample mask, if value is a vector, OR when deriving a site mask, and peaksets is a vector, this is method for combining result of each value:
|
mask |
when deriving a peakset/sample mask, this specifies an existing mask to merge with; if missing, create new mask |
merge |
when deriving a peakset/sample mask, and an existing mask is supplied, this specifies the method for combining new mask with supplied mask:
|
bApply |
when deriving a peakset/sample mask, a logical indicating that a new DBA object with the mask applied will be returned. |
peakset |
when deriving a peak/site mask, this specifies a peakset number, or a vector of peakset numbers. The resulting mask will indicate which of the overall sites were called as peaks in this peakset or set of peaksets. If a vector, the masks for each of the peaksets will be combined using the method specified in the combine parameter. |
minValue |
when deriving a peak/site mask, scores greater than this value will be considered as indicating that the site corresponds to a called peakset. |
MODE: Derive a a mask of peaksets/samples:
dba.mask(DBA, attribute, value, combine, mask, merge, bApply)
MODE: Derive a mask of peaks/sites:
dba.mask(DBA, combine, mask, merge,bApply, peakset, minValue)
either a logical mask, or new DBA object if bApply
is TRUE
.
dba automatically generates masks for each unique value of DBA_TISSUE
, DBA_FACTOR
, DBA_CONDITION
, DBA_TREATMENT
, DBA_CALLER
, and DBA_REPLICATE
. These are accessible using masks field of the DBA object (DBA$masks)
, and can be viewed using names(DBA$masks)
.
Rory Stark
data(tamoxifen_peaks) # Pre-made masks names(tamoxifen$masks) dba.show(tamoxifen,tamoxifen$masks$MCF7) # New masks mcf7Mask <- dba.mask(tamoxifen,DBA_TISSUE, "MCF7") mcf7DerivedMask <- dba.mask(tamoxifen,DBA_TISSUE,"TAMR",mask=mcf7Mask) mcf7Derived <- dba(tamoxifen,mcf7DerivedMask) mcf7Derived
data(tamoxifen_peaks) # Pre-made masks names(tamoxifen$masks) dba.show(tamoxifen,tamoxifen$masks$MCF7) # New masks mcf7Mask <- dba.mask(tamoxifen,DBA_TISSUE, "MCF7") mcf7DerivedMask <- dba.mask(tamoxifen,DBA_TISSUE,"TAMR",mask=mcf7Mask) mcf7Derived <- dba(tamoxifen,mcf7DerivedMask) mcf7Derived
Enables normalization of datasets using a variety of methods, including background, spike-in, and parallel factor normalization. Alternatively, allows a user to specify library sizes and normalization factors directly, or retrieve computed ones.
dba.normalize(DBA, method = DBA$config$AnalysisMethod, normalize = DBA_NORM_DEFAULT, library = DBA_LIBSIZE_DEFAULT, background = FALSE, spikein = FALSE, offsets = FALSE, libFun=mean, bRetrieve=FALSE, ...)
dba.normalize(DBA, method = DBA$config$AnalysisMethod, normalize = DBA_NORM_DEFAULT, library = DBA_LIBSIZE_DEFAULT, background = FALSE, spikein = FALSE, offsets = FALSE, libFun=mean, bRetrieve=FALSE, ...)
DBA |
DBA object that includes count data for a consensus peakset. |
method |
Underlying method, or vector of methods, for which to normalize. Supported methods:
|
normalize |
Either user-supplied normalization factors in a numeric vector, or a specification of a method to use to calculate normalization factors. Methods can be specified using one of the following:
|
library |
Either user-supplied library sizes in a numeric vector, or a specification of a method to use to calculate library sizes. Library sizes can be based on one of the following:
|
background |
This parameter controls the option to use "background" bins, which should not have differential enrichment between samples, as the basis for normalizing (instead of using reads counts overlapping consensus peaks). When enabled, the chromosomes for which there are peaks in the consensus peakset are tiled into large bins and reads overlapping these bins are counted. If present, If If If After counting (or setting) background bins,
both the If If |
spikein |
Either a logical value, a character vector of chromosome names,
a If If If If If Note that if |
offsets |
This parameter controls the use of offsets (matrix of normalization factors)
instead of a single normalization factor for each sample. It can either
be a logical value, a If it is a logical value and set to Alternatively, user-calculated normalization offsets can be supplied
as a |
libFun |
When For |
bRetrieve |
If set to If
If If
If
|
... |
Extra parameters to be passed to |
The default normalization parameters are as follows:
normalize=DBA_NORM_LIB
library=DBA_LIBSIZE_FULL
background=FALSE
If background=TRUE
, then the default becomes
library=DBA_LIBSIZE_BACKGROUND
.
If dba.contrast
has been
used to set up contrasts with design=FALSE
(pre-3.0 mode),
then the defaults are:
normalize=DBA_NORM_DEFAULT
library=DBA_LIBSIZE_FULL
background=FALSE
In this case, normalize=DBA_NORM_LIB
will be set for
method=DBA_DESEQ2
for backwards compatibility.
Either a DBA
object with normalization terms added,
or (if bRetrieve=TRUE
), a record
or normalization details.
The csaw
package is used to compute
background
bins and offsets
based on
limma::loessFit
.
See the DiffBind
vignette for technical details of how this
is done, and the csaw
vignette for details on
background bins and loess offsets can be used to address
different biases in ChIP-seq data.
Rory Stark
dba.count
, dba.analyze
, dba.save
# load DBA object with counts data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition") # default normalization: Full library sizes tamoxifen <- dba.normalize(tamoxifen) dba.normalize(tamoxifen, bRetrieve=TRUE) dba.analyze(tamoxifen) # RLE/TMM using Reads in Peaks tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS, normalize=DBA_NORM_NATIVE, library=DBA_LIBSIZE_PEAKREADS) dba.normalize(tamoxifen, method=DBA_DESEQ2, bRetrieve=TRUE) dba.normalize(tamoxifen, method=DBA_EDGER, bRetrieve=TRUE) tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS) dba.show(tamoxifen,bContrasts=TRUE) dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS,bDB=TRUE) # TMM in Background using precomputed background norm <- dba.normalize(tamoxifen,method=DBA_ALL_METHODS,bRetrieve=TRUE) tamoxifen <- dba.normalize(tamoxifen, background=norm$background, normalize="TMM", method=DBA_ALL_METHODS) tamoxifen <- dba.analyze(tamoxifen) dba.show(tamoxifen,bContrasts=TRUE) dba.plotMA(tamoxifen) # LOESS offsets tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS, offsets=TRUE) tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS) dba.show(tamoxifen,bContrasts=TRUE) par(mfrow=c(3,1)) dba.plotMA(tamoxifen,th=0,bNormalized=FALSE) dba.plotMA(tamoxifen,method=DBA_DESEQ2) dba.plotMA(tamoxifen,method=DBA_EDGER)
# load DBA object with counts data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition") # default normalization: Full library sizes tamoxifen <- dba.normalize(tamoxifen) dba.normalize(tamoxifen, bRetrieve=TRUE) dba.analyze(tamoxifen) # RLE/TMM using Reads in Peaks tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS, normalize=DBA_NORM_NATIVE, library=DBA_LIBSIZE_PEAKREADS) dba.normalize(tamoxifen, method=DBA_DESEQ2, bRetrieve=TRUE) dba.normalize(tamoxifen, method=DBA_EDGER, bRetrieve=TRUE) tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS) dba.show(tamoxifen,bContrasts=TRUE) dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS,bDB=TRUE) # TMM in Background using precomputed background norm <- dba.normalize(tamoxifen,method=DBA_ALL_METHODS,bRetrieve=TRUE) tamoxifen <- dba.normalize(tamoxifen, background=norm$background, normalize="TMM", method=DBA_ALL_METHODS) tamoxifen <- dba.analyze(tamoxifen) dba.show(tamoxifen,bContrasts=TRUE) dba.plotMA(tamoxifen) # LOESS offsets tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS, offsets=TRUE) tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS) dba.show(tamoxifen,bContrasts=TRUE) par(mfrow=c(3,1)) dba.plotMA(tamoxifen,th=0,bNormalized=FALSE) dba.plotMA(tamoxifen,method=DBA_DESEQ2) dba.plotMA(tamoxifen,method=DBA_EDGER)
Computes binding overlaps and co-occupancy statistics
dba.overlap(DBA, mask, mode=DBA_OLAP_PEAKS, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, report, byAttribute, bCorOnly=TRUE, CorMethod="pearson", DataType=DBA$config$DataType)
dba.overlap(DBA, mask, mode=DBA_OLAP_PEAKS, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, report, byAttribute, bCorOnly=TRUE, CorMethod="pearson", DataType=DBA$config$DataType)
DBA |
DBA object |
mask |
mask or vector of peakset numbers indicating a subset of peaksets to use (see |
mode |
indicates which results should be returned (see MODES below). One of: |
contrast |
contrast number to use. Only specified if contrast data is to be used when |
method |
if contrast is specified and |
th |
if contrast is specified and |
bUsePval |
if contrast is specified and |
report |
if contrast is specified and |
byAttribute |
when computing co-occupancy statistics ( |
bCorOnly |
when computing co-occupancy statistics ( |
CorMethod |
when computing co-occupancy statistics ( |
DataType |
if Can be set as default behavior by setting |
MODE: Generate overlapping/unique peaksets:
dba.overlap(DBA, mask, mode=DBA_OLAP_PEAKS, minVal)
MODE: Compute correlation and co-occupancy statistics (e.g. for dba.plotHeatmap):
dba.overlap(DBA, mask, mode=DBA_OLAP_ALL, byAttribute, minVal,
attributes, bCorOnly, CorMethod)
MODE: Compute correlation and co-occupancy statistics using significantly differentially bound sites (e.g. for dba.plotHeatmap):
dba.overlap(DBA, mask, mode=DBA_OLAP_ALL, byAttribute, minVal,
contrast, method, th=, bUsePval,
attributes, bCorOnly, CorMethod)
Note that the scores from the global binding affinity matrix will be used for correlations unless a report containing count data is specified.
MODE: Compute overlap rates at different stringency thresholds:
dba.overlap(DBA, mask, mode=DBA_OLAP_RATE, minVal)
Value depends on the mode specified in the mode parameter.
If mode=DBA_OLAP_PEAKS
, Value is an overlap record: a list of three peaksets for an A-B overlap, seven peaksets for a A-B-C overlap, and fifteen peaksets for a A-B-C-D overlap:
inAll |
peaks in all peaksets |
onlyA |
peaks unique to peakset A |
onlyB |
peaks unique to peakset B |
onlyC |
peaks unique to peakset C |
onlyD |
peaks unique to peakset D |
notA |
peaks in all peaksets except peakset A |
notB |
peaks in all peaksets except peakset B |
notC |
peaks in all peaksets except peakset C |
notD |
peaks in all peaksets except peakset D |
AandB |
peaks in peaksets A and B but not in peaksets C or D |
AandC |
peaks in peaksets A and C but not in peaksets B or D |
AandD |
peaks in peaksets A and D but not in peaksets B or C |
BandC |
peaks in peaksets B and C but not in peaksets A or D |
BandD |
peaks in peaksets B and D but not in peaksets A or C |
CandD |
peaks in peaksets C and D but not in peaksets A or B |
If mode=DBA_OLAP_ALL
, Value is a correlation record:
a matrix with a row for each pair of peaksets and the following columns:
A |
peakset number of first peakset in overlap |
B |
peakset number of second peakset in overlap |
onlyA |
number of sites unique to peakset A |
onlyB |
number of sites unique to peakset B |
inAll |
number of peaks in both peakset A and B (merged) |
R2 |
correlation value A vs B |
Overlap |
percentage overlap (number of overlapping sites divided by number of peaks unique to smaller peakset |
If mode=DBA_OLAP_RATE
, Value is a vector whose length is the number of peaksets,
containing the number of overlapping peaks at the corresponding
minOverlaps
threshold (i.e., Value[1] is the total number of unique sites, Value[2] is the number of unique sites appearing in at least two peaksets, Value[3] the number of sites overlapping in at least three peaksets, etc.).
Rory Stark
data(tamoxifen_peaks) # default mode: DBA_OLAP_PEAKS -- get overlapping/non overlapping peaksets mcf7 <- dba.overlap(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive) names(mcf7) mcf7$inAll # mode: DBA_OLAP_ALL -- get correlation record mcf7 <- dba(tamoxifen,tamoxifen$masks$MCF7) mcf7.corRec <- dba.overlap(mcf7,mode=DBA_OLAP_ALL,bCorOnly=FALSE) mcf7.corRec # mode: DBA_OLAP_RATE -- get overlap rate vector data(tamoxifen_peaks) rate <- dba.overlap(tamoxifen, mode=DBA_OLAP_RATE) rate plot(rate,type='b',xlab="# peaksets",ylab="# common peaks", main="Tamoxifen dataset overlap rate")
data(tamoxifen_peaks) # default mode: DBA_OLAP_PEAKS -- get overlapping/non overlapping peaksets mcf7 <- dba.overlap(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive) names(mcf7) mcf7$inAll # mode: DBA_OLAP_ALL -- get correlation record mcf7 <- dba(tamoxifen,tamoxifen$masks$MCF7) mcf7.corRec <- dba.overlap(mcf7,mode=DBA_OLAP_ALL,bCorOnly=FALSE) mcf7.corRec # mode: DBA_OLAP_RATE -- get overlap rate vector data(tamoxifen_peaks) rate <- dba.overlap(tamoxifen, mode=DBA_OLAP_RATE) rate plot(rate,type='b',xlab="# peaksets",ylab="# common peaks", main="Tamoxifen dataset overlap rate")
Adds a peakset to, or retrieves a peakset from, a DBA object
dba.peakset(DBA=NULL, peaks, sampID, tissue, factor, condition, treatment, replicate, control, peak.caller, peak.format, reads=0, consensus=FALSE, bamReads, bamControl, spikein, scoreCol, bLowerScoreBetter, filter, counts, bRemoveM=TRUE, bRemoveRandom=TRUE, minOverlap=2, bMerge=TRUE, bRetrieve=FALSE, writeFile, numCols=4, DataType=DBA$config$DataType)
dba.peakset(DBA=NULL, peaks, sampID, tissue, factor, condition, treatment, replicate, control, peak.caller, peak.format, reads=0, consensus=FALSE, bamReads, bamControl, spikein, scoreCol, bLowerScoreBetter, filter, counts, bRemoveM=TRUE, bRemoveRandom=TRUE, minOverlap=2, bMerge=TRUE, bRetrieve=FALSE, writeFile, numCols=4, DataType=DBA$config$DataType)
DBA |
DBA object. Required unless creating a new DBA object by adding an initial peakset. |
peaks |
When adding a specified peakset: set of peaks, either a When adding a consensus peakset: a sample mask or vector of peakset numbers to include in the consensus.
If missing or When adding and empty peakset (zero peaks), set When adding a set of consensus peaksets: a sample mask or vector of peakset numbers. Sample sets will be derived only from subsets of these peaksets. When adding all the peaks from one DBA object to another: a DBA object.
In this case, the only other parameter to have an effect is When retrieving and/or writing a peakset: either a |
sampID |
ID string for the peakset being added; if missing, one is assigned (a serial number for a new peakset, or a concatenation of IDs for a consensus peakset). Must be unique for each sample. |
tissue |
tissue name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of tissues). |
factor |
factor name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of factors). |
condition |
condition name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of conditions). |
treatment |
treatment name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of treatment). |
replicate |
replicate number for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of replicate numbers). |
control |
control name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of control names). |
peak.caller |
peak caller name string. If peaks is specified as a file, and peak.format is missing, a default fie format for the caller will be used (see peak.format). Supported values:
When adding a consensus peakset, a default value (a concatenation of peak caller names) is assigned if this is missing. |
peak.format |
peak format string. If specified, overrides the default file format for the specified peak caller. Supported formats (with default score column):
|
reads |
total number of ChIPed library reads for the peakset being added. |
consensus |
either the logical value of the consensus attribute when adding a specific peakset
(set to
|
bamReads |
file path of the BAM/BED file containing the aligned reads for the peakset being added. |
bamControl |
file path of the BAM/BED file containing the aligned reads for the control used for the peakset being added. |
spikein |
file path of the BAM/BED file containing the aligned reads for the spike-ins used for the peakset being added. |
scoreCol |
peak column to normalize to 0...1 scale when adding a peakset; 0 indicates no normalization |
bLowerScoreBetter |
Logical indicating that lower scores indicate higher confidence peaks; default is that higher scores indicate better peaks. |
filter |
Numeric indicating a filter value for peaks.
If present, any peaks with a score less than this value
(or higher if |
counts |
Used for adding externally computed peak counts.
Can be a filename or a dataframe.
Can consist of a single column (or vector) with the counts, or two columns,
with an ID for each interval in the first column and the counts in the second column,
or four columns (chr, start, end, counts).
When |
bRemoveM |
logical indicating whether to remove peaks on chrM when adding a peakset |
bRemoveRandom |
logical indicating whether to remove peaks on chrN_random when adding a peakset |
minOverlap |
the minimum number of peaksets a peak must be in to be included when adding a consensus peakset.
When retrieving, if the peaks parameter is
a vector (logical mask or vector of peakset numbers), a binding matrix will be retrieved
including all peaks in at least this many peaksets.
If |
bMerge |
logical indicating whether global binding matrix should be compiled after adding the peakset.
When adding several peaksets via successive calls to |
bRetrieve |
logical indicating that a peakset is being retrieved and/or written, not added. |
writeFile |
file to write retrieved peakset. |
numCols |
number of columns to include when writing out peakset. First four columns are chr, start, end, score; the remainder are maintained from the original peakset. Ignored when writing out complete binding matrix. |
DataType |
The class of object for returned peaksets:
Can be set as default behavior by setting |
MODE: Add a specified peakset:
dba.peakset(DBA=NULL, peaks, sampID, tissue, factor, condition, replicate,
control, peak.caller, reads, consensus, bamReads, bamControl,
normCol, bRemoveM, bRemoveRandom)
MODE: Add a consensus peakset (derived from overlapping peaks in peaksets already present):
dba.peakset(DBA, peaks, minOverlap)
MODE: Add a sets of consensus peaksets bases on sample sets that share or differ in specified attributes
dba.peakset(DBA, peaks, consensus, minOverlap)
MODE: Retrieve a peakset:
dba.peakset(DBA, peaks, bRetrieve=T)
MODE: Write a peakset out to a file:
dba.peakset(DBA, peaks, bRetrieve=T, writeFile, numCols)
DBA object when adding a peakset.
Peakset matrix or GRanges
object when retrieving and/or writing a peakset.
Rory Stark
to add peaksets using a sample sheet, see dba
.
$config$
options are described in DBA.config
.
# create a new DBA object by adding three peaksets mcf7 <- dba.peakset(NULL, peaks=system.file("extra/peaks/MCF7_ER_1.bed.gz", package="DiffBind"), peak.caller="bed", sampID="MCF7.1",tissue="MCF7", factor="ER",condition="Responsive",replicate=1) mcf7 <- dba.peakset(mcf7, peaks=system.file("extra/peaks/MCF7_ER_2.bed.gz", package="DiffBind"), peak.caller="bed", sampID="MCF7.2",tissue="MCF7", factor="ER",condition="Responsive",replicate=2) mcf7 <- dba.peakset(mcf7, peaks=system.file("extra/peaks/MCF7_ER_3.bed.gz", package="DiffBind"), peak.caller="bed", sampID="MCF7.3",tissue="MCF7", factor="ER",condition="Responsive",replicate=3) mcf7 #retrieve peaks that are in all three peaksets mcf7.consensus <- dba.peakset(mcf7, 1:3, minOverlap=3, bRetrieve=TRUE) mcf7.consensus #add a consensus peakset -- peaks in all three replicates mcf7 <- dba.peakset(mcf7, 1:3, minOverlap=3,sampID="MCF7_3of3") mcf7 #add consensus peaksets for all sample types by combining replicates data(tamoxifen_peaks) tamoxifen <- dba.peakset(tamoxifen,consensus = -DBA_REPLICATE) dba.show(tamoxifen,mask=tamoxifen$masks$Consensus) #add consensus peaksets for all sample types by (same tissue and condition) data(tamoxifen_peaks) tamoxifen <- dba.peakset(tamoxifen,consensus = c(DBA_TISSUE,DBA_CONDITION)) dba.show(tamoxifen,mask=tamoxifen$masks$Consensus) dba.plotVenn(tamoxifen,tamoxifen$masks$Responsive & tamoxifen$masks$Consensus) #create consensus peaksets from sample type consensuses for Responsive and Resistant sample groups tamoxifen <- dba.peakset(tamoxifen,peaks=tamoxifen$masks$Consensus,consensus=DBA_CONDITION) dba.show(tamoxifen,mask=tamoxifen$masks$Consensus) dba.plotVenn(tamoxifen,17:18) #retrieve the consensus peakset as GRanges object mcf7.consensus <- dba.peakset(mcf7,mcf7$masks$Consensus,bRetrieve=TRUE) mcf7.consensus
# create a new DBA object by adding three peaksets mcf7 <- dba.peakset(NULL, peaks=system.file("extra/peaks/MCF7_ER_1.bed.gz", package="DiffBind"), peak.caller="bed", sampID="MCF7.1",tissue="MCF7", factor="ER",condition="Responsive",replicate=1) mcf7 <- dba.peakset(mcf7, peaks=system.file("extra/peaks/MCF7_ER_2.bed.gz", package="DiffBind"), peak.caller="bed", sampID="MCF7.2",tissue="MCF7", factor="ER",condition="Responsive",replicate=2) mcf7 <- dba.peakset(mcf7, peaks=system.file("extra/peaks/MCF7_ER_3.bed.gz", package="DiffBind"), peak.caller="bed", sampID="MCF7.3",tissue="MCF7", factor="ER",condition="Responsive",replicate=3) mcf7 #retrieve peaks that are in all three peaksets mcf7.consensus <- dba.peakset(mcf7, 1:3, minOverlap=3, bRetrieve=TRUE) mcf7.consensus #add a consensus peakset -- peaks in all three replicates mcf7 <- dba.peakset(mcf7, 1:3, minOverlap=3,sampID="MCF7_3of3") mcf7 #add consensus peaksets for all sample types by combining replicates data(tamoxifen_peaks) tamoxifen <- dba.peakset(tamoxifen,consensus = -DBA_REPLICATE) dba.show(tamoxifen,mask=tamoxifen$masks$Consensus) #add consensus peaksets for all sample types by (same tissue and condition) data(tamoxifen_peaks) tamoxifen <- dba.peakset(tamoxifen,consensus = c(DBA_TISSUE,DBA_CONDITION)) dba.show(tamoxifen,mask=tamoxifen$masks$Consensus) dba.plotVenn(tamoxifen,tamoxifen$masks$Responsive & tamoxifen$masks$Consensus) #create consensus peaksets from sample type consensuses for Responsive and Resistant sample groups tamoxifen <- dba.peakset(tamoxifen,peaks=tamoxifen$masks$Consensus,consensus=DBA_CONDITION) dba.show(tamoxifen,mask=tamoxifen$masks$Consensus) dba.plotVenn(tamoxifen,17:18) #retrieve the consensus peakset as GRanges object mcf7.consensus <- dba.peakset(mcf7,mcf7$masks$Consensus,bRetrieve=TRUE) mcf7.consensus
Boxplots for read count distributions within differentially bound sites
dba.plotBox(DBA, contrast=1, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, bNormalized=TRUE, attribute=DBA_GROUP, mask, bAll=FALSE, bAllIncreased=FALSE, bAllDecreased=FALSE, bDB=TRUE, bDBIncreased=TRUE, bDBDecreased=TRUE, pvalMethod=wilcox.test, bReversePos=FALSE, attribOrder, vColors, varwidth=TRUE, notch=TRUE, ...)
dba.plotBox(DBA, contrast=1, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, bNormalized=TRUE, attribute=DBA_GROUP, mask, bAll=FALSE, bAllIncreased=FALSE, bAllDecreased=FALSE, bDB=TRUE, bDBIncreased=TRUE, bDBDecreased=TRUE, pvalMethod=wilcox.test, bReversePos=FALSE, attribOrder, vColors, varwidth=TRUE, notch=TRUE, ...)
DBA |
DBA object. |
contrast |
number of contrast to use for boxplot. |
method |
method used for analysis (used in conjunction with contrast): |
th |
significance threshold; all sites with FDR (or p-values, see bUsePval) less than or equal to this value will be included in the boxplot. |
bUsePval |
logical indicating whether to use FDR (FALSE) or p-value (TRUE) for thresholding. |
bNormalized |
logical indicating that normalized data (using normalization factors computed by differential analysis method)
should be plotted. |
attribute |
attribute to use for determining groups of samples.
Default ( |
mask |
logical mask of samples to include when no groups are present. |
bAll |
logical indicating if plot should include a set of boxplots using all counts, regardless of whether or not they pass the significance threshold. |
bAllIncreased |
logical indicating if plot should include a set of boxplots using all counts that increase in affinity, regardless of whether or not they pass the significance threshold. |
bAllDecreased |
logical indicating if plot should include a set of boxplots using all counts that decrease in affinity, regardless of whether or not they pass the significance threshold. |
bDB |
logical indicating if plot should include a set of boxplots using all counts in significantly differentially bound sites (i.e. those that pass the significance threshold), regardless of whether they increase or decrease in affinity. |
bDBIncreased |
logical indicating if plot should include a set of boxplots using all counts in significantly differentially bound sites that increase in affinity. |
bDBDecreased |
logical indicating if plot should include a set of boxplots using all counts in significantly differentially bound sites that decrease in affinity. |
pvalMethod |
method to use when computing matrix of p-values.
If |
bReversePos |
logical indicating if the default definition of positive affinity (higher affinity in the second group of the contrast) should be reversed (i.e. positive affinity is defined as being higher in the first group of the contrast). |
attribOrder |
vector of group numbers used to change the order that groups are plotted.
If |
vColors |
vector of custom colors; if absent, default colors will be used. |
varwidth |
passed to boxplot |
notch |
passed to boxplot |
... |
other arguments passed to boxplot |
Draws a boxplot showing distributions of read counts for various groups of samples under various conditions. In default mode, draws six boxes: one pair of boxes showing the distribution of read counts within all significantly differentially bound sites (one box for each sample group), one pair of boxes showing the distribution of read counts for significantly differentially bound sites that increase affinity in the second group, and a second pair of boxes showing the distribution of read counts for significantly differentially bound sites that have higher mean affinity in the first group.
if pvalMethod
is not NULL
, returns a matrix of p-values indicating the significance of the difference between each pair of distributions.
Rory Stark
data(tamoxifen_analysis) #default boxplot includes all DB sites, then divided into those increasing # affinity in each group dba.plotBox(tamoxifen) # plot non-normalized data for DB sites by tissue # (changing order to place Resistant samples last) dba.plotBox(tamoxifen, attribute=DBA_CONDITION, bDBIncreased=FALSE, bDBDecreased=FALSE, attribOrder=c(2,1), bNormalized=FALSE)
data(tamoxifen_analysis) #default boxplot includes all DB sites, then divided into those increasing # affinity in each group dba.plotBox(tamoxifen) # plot non-normalized data for DB sites by tissue # (changing order to place Resistant samples last) dba.plotBox(tamoxifen, attribute=DBA_CONDITION, bDBIncreased=FALSE, bDBDecreased=FALSE, attribOrder=c(2,1), bNormalized=FALSE)
Draws a binding site heatmap
dba.plotHeatmap(DBA, attributes=DBA$attributes, maxSites=1000, minval, maxval, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, report, score, bLog=TRUE, mask, sites, sortFun=sd, correlations=TRUE, olPlot=DBA_COR, ColAttributes,RowAttributes, colSideCols, rowSideCols=colSideCols, margin=10, colScheme="Greens", distMethod="pearson", ...)
dba.plotHeatmap(DBA, attributes=DBA$attributes, maxSites=1000, minval, maxval, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, report, score, bLog=TRUE, mask, sites, sortFun=sd, correlations=TRUE, olPlot=DBA_COR, ColAttributes,RowAttributes, colSideCols, rowSideCols=colSideCols, margin=10, colScheme="Greens", distMethod="pearson", ...)
DBA |
DBA object. |
attributes |
attribute or vector of attributes to use for column labels: |
maxSites |
maximum number of binding sites to use in heatmap.
Only used when not drawing a correlation heatmap ( |
minval |
Set all scores less than this to minval |
maxval |
Set all scores greater than this to maxval |
contrast |
number of contrast to report on; if present, draws a heatmap based on a differential binding affinity analysis
(see |
method |
analysis method (used in conjunction with contrast): |
th |
significance threshold; all sites with FDR (or p-values, see b |
bUsePval |
logical indicating whether to use FDR ( |
report |
report (obtained from |
score |
Score to use for count data. Only used when plotting the global binding matrix (no contrast specified). One of: |
bLog |
Logical indicating that log2 values should be used. Only applicable with read count scores (not peak scores). |
mask |
mask indicating a subset of peaksets to use when using global binding matrix scores.
If a |
sites |
logical vector indicating which sites to include; first |
sortFun |
function taking a vector of scores and returning a single value.
Only relevant when using global binding matrix ( |
correlations |
logical indicating that a correlation heatmap should be plotted ( |
olPlot |
if correlations is specified as a dataframe returned by |
ColAttributes |
Attribute or vector of attributes to plot for column color bars.
If missing, all attributes with two or more unique non-NA values will be plotted.
(For correlation heatmaps, |
RowAttributes |
Attribute or vector of attributes for row color bars.
Row color bars are only allowed for correlation heatmaps.
Same values as for |
rowSideCols |
Vector of colors to use in row color bars. Uses default colors if missing. Can also be a list of color vectors. |
colSideCols |
Vector of colors to use in column color bars. Uses default colors if missing. Can also be a list of color vectors. |
margin |
margin size of plot |
colScheme |
Color scheme; see |
distMethod |
distance method for clustering; see |
... |
passed on to |
MODE: Correlation Heatmap plot using statistics for global binding matrix:
dba.plotHeatmap(DBA, attributes=DBA$attributes, minval, maxval,
correlations, olPlot,
colScheme="Greens", distMethod="pearson", ...)
MODE: Correlation Heatmap plot using statistics for significantly differentially bound sites:
dba.plotHeatmap(DBA, attributes=DBA$attributes, minval, maxval,
contrast, method=DBA_DESEQ2, th=0.05, bUsePval=F, mask,
overlaps, olPlot=DBA_COR,
colScheme="Greens", distMethod="pearson", ...)
MODE: Binding heatmap plot using significantly differentially bound sites:
dba.plotHeatmap(DBA, attributes, maxSites, minval, maxval,
contrast, method, th, bUsePval, correlations=FALSE,
colScheme, distMethod, ...)
MODE: Binding heatmap plot using the global binding matrix:
dba.plotHeatmap(DBA, attributes, maxSites, minval, maxval,
mask, sites, correlations=FALSE, sortFun,
colScheme, distMethod, ...)
if correlations is not FALSE
, the overlap/correlation matrix is returned.
if correlations is FALSE
, the sites used in the heatmap are returned in a
GRanges
object,
in the row order they appear (top to bottom).
The metadata contains a column for
each sample (also in the order they are appear in the clustering plot),
with the values being the actual plotted values.
Rory Stark
data(tamoxifen_peaks) # peak overlap correlation heatmap dba.plotHeatmap(tamoxifen) data(tamoxifen_counts) # counts correlation heatmap dba.plotHeatmap(tamoxifen) data(tamoxifen_analysis) #correlation heatmap based on all normalized data dba.plotHeatmap(tamoxifen,contrast=1,th=1) #correlation heatmap based on DB sites only dba.plotHeatmap(tamoxifen,contrast=1) #binding heatmap based on DB sites dba.plotHeatmap(tamoxifen,contrast=1,correlations=FALSE) #binding heatmap based on 1,000 sites with highest variance sites <- dba.plotHeatmap(tamoxifen,contrast=1,th=1, correlations=FALSE,sortFun=var) sites data(tamoxifen_counts) #Examples of heatmaps using DB sites with different subsets of samples #exclude T47D tamoxifen <- dba.contrast(tamoxifen,design=FALSE, group1=tamoxifen$masks$Resistant, group2=c(3:5,10:11)) tamoxifen <- dba.analyze(tamoxifen) # regular heatmaps with samples from two contrast groups only dba.plotHeatmap(tamoxifen, contrast=1) #also include the T47D samples dba.plotHeatmap(tamoxifen,contrast=1,mask=tamoxifen$masks$All) #correlation heatmap without MCF7 plot(tamoxifen,contrast=1,mask=!tamoxifen$masks$MCF7) # binding heatmap using only the MCF7 samples dba.plotHeatmap(tamoxifen,contrast=1,mask=tamoxifen$masks$MCF7,correlations=FALSE)
data(tamoxifen_peaks) # peak overlap correlation heatmap dba.plotHeatmap(tamoxifen) data(tamoxifen_counts) # counts correlation heatmap dba.plotHeatmap(tamoxifen) data(tamoxifen_analysis) #correlation heatmap based on all normalized data dba.plotHeatmap(tamoxifen,contrast=1,th=1) #correlation heatmap based on DB sites only dba.plotHeatmap(tamoxifen,contrast=1) #binding heatmap based on DB sites dba.plotHeatmap(tamoxifen,contrast=1,correlations=FALSE) #binding heatmap based on 1,000 sites with highest variance sites <- dba.plotHeatmap(tamoxifen,contrast=1,th=1, correlations=FALSE,sortFun=var) sites data(tamoxifen_counts) #Examples of heatmaps using DB sites with different subsets of samples #exclude T47D tamoxifen <- dba.contrast(tamoxifen,design=FALSE, group1=tamoxifen$masks$Resistant, group2=c(3:5,10:11)) tamoxifen <- dba.analyze(tamoxifen) # regular heatmaps with samples from two contrast groups only dba.plotHeatmap(tamoxifen, contrast=1) #also include the T47D samples dba.plotHeatmap(tamoxifen,contrast=1,mask=tamoxifen$masks$All) #correlation heatmap without MCF7 plot(tamoxifen,contrast=1,mask=!tamoxifen$masks$MCF7) # binding heatmap using only the MCF7 samples dba.plotHeatmap(tamoxifen,contrast=1,mask=tamoxifen$masks$MCF7,correlations=FALSE)
Generates MA and scatter plots of differential binding analysis results.
dba.plotMA(DBA, contrast=1, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, fold=0, bNormalized=TRUE, factor="", bFlip=FALSE, bXY=FALSE, dotSize=.45, bSignificant=TRUE, highlight=NULL, bSmooth=TRUE, bLoess=TRUE, xrange, yrange, ...)
dba.plotMA(DBA, contrast=1, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, fold=0, bNormalized=TRUE, factor="", bFlip=FALSE, bXY=FALSE, dotSize=.45, bSignificant=TRUE, highlight=NULL, bSmooth=TRUE, bLoess=TRUE, xrange, yrange, ...)
DBA |
DBA object, on which |
contrast |
number of contrast to report on.
See Alternatively, an MA plot can be generated without a specific contrast,
plotting one set of samples against another.
In this case, |
method |
method or vector of methods to plot results for: |
th |
significance threshold; all sites with FDR (or p-values, see |
bUsePval |
logical indicating whether to use FDR ( |
fold |
will only include sites with fold change greater than this as significant (colored red). If |
bNormalized |
logical indicating whether to plot normalized data using normalization factors
computed by differential analysis method ( |
factor |
string to be prepended to plot main title; e.g. factor name. |
bFlip |
logical indicating that order of groups in contrast should be "flipped", allowing control of which sample group will have positive and which will have negative fold changes. |
bXY |
logical indicating whether to draw MA plot ( |
dotSize |
size of points on plot ( |
bSignificant |
Logical indicating if points corresponding to significantly
differentially bound sites (based on |
highlight |
|
bSmooth |
logical indicating that basic plot should be plotted using
|
bLoess |
logical indicating that a MA plot should include a fitted loess curve. |
xrange |
vector of length 2 containing the desired minimum and maximum concentrations to plot. |
yrange |
vector of length 2 containing the desired minimum and maximum fold changes to plot. |
... |
passed to underlying plotting functions. |
Rory Stark
data(tamoxifen_analysis) # default MA plot dba.plotMA(tamoxifen) # Show different normalizations tamoxifen <- dba.normalize(tamoxifen,method=DBA_ALL_METHODS, library=DBA_LIBSIZE_PEAKREADS, background=FALSE) tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS) par(mfrow=c(3,2)) dba.plotMA(tamoxifen,th=0,bNormalized=FALSE,sub="NON-NORMALIZED") dba.plotMA(tamoxifen,th=0,bNormalized=FALSE,sub="NON-NORMALIZED") dba.plotMA(tamoxifen,method=DBA_DESEQ2,bNormalized=TRUE, sub="DESeq2_RLE-RiP") dba.plotMA(tamoxifen,method=DBA_EDGER,bNormalized=TRUE, sub="edgeR_TMM-RiP") tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS, normalize=DBA_NORM_LIB, background=FALSE) tamoxifen <- dba.analyze(tamoxifen,method=DBA_ALL_METHODS) dba.plotMA(tamoxifen,method=DBA_DESEQ2,bNormalized=TRUE, sub="DESeq2_LIB-FULL") dba.plotMA(tamoxifen,method=DBA_EDGER,bNormalized=TRUE, sub="edgeR_LIB-FULL") # MA plots of samples without a contrast data(tamoxifen_counts) par(mfrow=c(2,2)) dba.plotMA(tamoxifen,list(Resistant=tamoxifen$masks$Resistant, Responsive=tamoxifen$masks$Responsive), bNormalized=FALSE) dba.plotMA(tamoxifen,list(MCF7=tamoxifen$masks$MCF7), bNormalized=FALSE) dba.plotMA(tamoxifen, list(Sample1=1), bNormalized=FALSE) dba.plotMA(tamoxifen, list(Random=sample(1:11,5)), bNormalized=FALSE) #XY plots (with raw and normalized data) data(tamoxifen_analysis) par(mfrow=c(1,2)) dba.plotMA(tamoxifen,bXY=TRUE,bSmooth=FALSE,bNormalized=FALSE, sub="NON_NORMALIZED") dba.plotMA(tamoxifen,bXY=TRUE,bSmooth=FALSE,bNormalized=TRUE, sub="DESeq2-RLE-Background")
data(tamoxifen_analysis) # default MA plot dba.plotMA(tamoxifen) # Show different normalizations tamoxifen <- dba.normalize(tamoxifen,method=DBA_ALL_METHODS, library=DBA_LIBSIZE_PEAKREADS, background=FALSE) tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS) par(mfrow=c(3,2)) dba.plotMA(tamoxifen,th=0,bNormalized=FALSE,sub="NON-NORMALIZED") dba.plotMA(tamoxifen,th=0,bNormalized=FALSE,sub="NON-NORMALIZED") dba.plotMA(tamoxifen,method=DBA_DESEQ2,bNormalized=TRUE, sub="DESeq2_RLE-RiP") dba.plotMA(tamoxifen,method=DBA_EDGER,bNormalized=TRUE, sub="edgeR_TMM-RiP") tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS, normalize=DBA_NORM_LIB, background=FALSE) tamoxifen <- dba.analyze(tamoxifen,method=DBA_ALL_METHODS) dba.plotMA(tamoxifen,method=DBA_DESEQ2,bNormalized=TRUE, sub="DESeq2_LIB-FULL") dba.plotMA(tamoxifen,method=DBA_EDGER,bNormalized=TRUE, sub="edgeR_LIB-FULL") # MA plots of samples without a contrast data(tamoxifen_counts) par(mfrow=c(2,2)) dba.plotMA(tamoxifen,list(Resistant=tamoxifen$masks$Resistant, Responsive=tamoxifen$masks$Responsive), bNormalized=FALSE) dba.plotMA(tamoxifen,list(MCF7=tamoxifen$masks$MCF7), bNormalized=FALSE) dba.plotMA(tamoxifen, list(Sample1=1), bNormalized=FALSE) dba.plotMA(tamoxifen, list(Random=sample(1:11,5)), bNormalized=FALSE) #XY plots (with raw and normalized data) data(tamoxifen_analysis) par(mfrow=c(1,2)) dba.plotMA(tamoxifen,bXY=TRUE,bSmooth=FALSE,bNormalized=FALSE, sub="NON_NORMALIZED") dba.plotMA(tamoxifen,bXY=TRUE,bSmooth=FALSE,bNormalized=TRUE, sub="DESeq2-RLE-Background")
Principal Component Analysis plot
dba.plotPCA(DBA, attributes, minval, maxval, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, report, score, bLog=TRUE, mask, sites, label, cor=FALSE, b3D=FALSE, vColors, dotSize, labelSize, labelCols, components=1:3, ...)
dba.plotPCA(DBA, attributes, minval, maxval, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, report, score, bLog=TRUE, mask, sites, label, cor=FALSE, b3D=FALSE, vColors, dotSize, labelSize, labelCols, components=1:3, ...)
DBA |
DBA object. |
attributes |
attribute or vector of attributes to use to color plotted points. Each unique combination of attribute values will be assigned a color. Chosen from: Note that |
minval |
Set all scores less than this to minval |
maxval |
Set all scores greater than this to maxval |
contrast |
number of contrast to use for PCA; if present, plots a PCA based on a differential binding affinity analysis
(see |
method |
method used for analysis (used in conjunction with contrast): |
th |
significance threshold; all sites with FDR (or p-values, see |
bUsePval |
if |
report |
report (obtained from |
score |
Score to use for count data.
Only used when plotting the global binding matrix (no |
bLog |
Logical indicating that log2 values should be used. Only applicable to read count scores (not peak scores). |
mask |
mask indicating a subset of peaksets to use when using global binding matrix scores.
If a |
sites |
logical vector indicating which sites to include in PCA.
Only relevant when using global binding matrix ( |
label |
A metadata field to use as a label in 2D plots.
The value for this field will be written directly on the plot near the dot for each sample.
Values can be any of those valid for the |
cor |
a logical value indicating whether the calculation should use the correlation matrix or the covariance matrix. Passed into princomp. |
b3D |
logical indicating that three principal components should be plotted (requires package |
vColors |
vector of custom colors; is absent, default colors will be used. |
dotSize |
size of dots to plot; is absent, a default will be calculated. |
labelSize |
Scaling factor for labels if present. Default is 0.8. |
labelCols |
Vector of colors to use for labels. Default is "black". |
components |
Number(s) of the components to plot.
Can be a vector of two or three component numbers, or a single integer.
If an integer, that component,
in addition to the succeeding one ( |
... |
arguments passed to |
MODE: PCA plot using significantly differentially bound sites:
dba.plotPCA(DBA, attributes, minval, maxval,
contrast, method, th, bUsePval,
b3D=F, vColors, dotSize, ...)
MODE: PCA plot using global binding matrix:
dba.plotPCA(DBA, attributes, minval, maxval,
mask, sites,
b3D=F, vColors, dotSize, ...)
trellis
plot from lattice
package; see xyplot
uses rgl package for 3D plots (if available)
Rory Stark
data(tamoxifen_peaks) # peakcaller scores PCA dba.plotPCA(tamoxifen) # raw count correlation PCA data(tamoxifen_analysis) dba.plotPCA(tamoxifen) #PCA based on normalized data for all sites dba.plotPCA(tamoxifen,contrast=1,th=1) #PCA based on DB sites only p <- dba.plotPCA(tamoxifen,contrast=1) p <- dba.plotPCA(tamoxifen,contrast=1,attributes=DBA_TISSUE) p <- dba.plotPCA(tamoxifen,contrast=1,attributes=DBA_TISSUE,label=DBA_CONDITION) p <- dba.plotPCA(tamoxifen,contrast=1,attributes=DBA_CONDITION,label=DBA_TISSUE) p <- dba.plotPCA(tamoxifen,contrast=1,attributes=c(DBA_TISSUE,DBA_CONDITION), label=DBA_REPLICATE)
data(tamoxifen_peaks) # peakcaller scores PCA dba.plotPCA(tamoxifen) # raw count correlation PCA data(tamoxifen_analysis) dba.plotPCA(tamoxifen) #PCA based on normalized data for all sites dba.plotPCA(tamoxifen,contrast=1,th=1) #PCA based on DB sites only p <- dba.plotPCA(tamoxifen,contrast=1) p <- dba.plotPCA(tamoxifen,contrast=1,attributes=DBA_TISSUE) p <- dba.plotPCA(tamoxifen,contrast=1,attributes=DBA_TISSUE,label=DBA_CONDITION) p <- dba.plotPCA(tamoxifen,contrast=1,attributes=DBA_CONDITION,label=DBA_TISSUE) p <- dba.plotPCA(tamoxifen,contrast=1,attributes=c(DBA_TISSUE,DBA_CONDITION), label=DBA_REPLICATE)
Generates profiles and makes heatmap plots.
dba.plotProfile(Object, samples, sites, scores, labels, normalize=TRUE, merge=DBA_REPLICATE, maxSites=1000, absScores=TRUE, doPlot=is(Object,"profileplyr"), ...)
dba.plotProfile(Object, samples, sites, scores, labels, normalize=TRUE, merge=DBA_REPLICATE, maxSites=1000, absScores=TRUE, doPlot=is(Object,"profileplyr"), ...)
Object |
Either a |
samples |
Sample mask. A vector of If absent, all samples will be included;
if Some groups of samples may be merged as indicated
in the |
sites |
If |
scores |
These can be any of the If |
labels |
Either a vector of sample label names (one for each sample in the plot), or a set of attributes to include (positive values) or exclude (negative values). Attributes include: |
normalize |
Can also be a vector of normalization factors, once for each sample. All counts for a sample will be divided by the normalization factor for that sample. |
merge |
Set of attributes to be used to determine which samples should be merged. All samples that share the same values for all other attributes except those specified will be merged by taking their mean count score (after normalizing, if specified), and included as a single sample column. Can also be specified as a list of vectors of sample numbers (relative to their
order in |
maxSites |
Maximum number of sites to include in a heatmap group. The top-scoring sites will be retained. |
absScores |
If |
doPlot |
|
... |
additional parameters passed on to |
This function enables the computation of peakset profiles and
the plotting of complex heatmaps.
It serves as a front-end to enable experiments analyzed using
DiffBind
to more easily use the
profiling and plotting functionality provided by the profileplyr
package
written by Tom Carroll and Doug Barrows.
Processing proceeds in two phases.
In the first phase, specific peaksets are extracted from a
DiffBind
DBA
object,
and profiles are calculated for these peaks for set of samples
in the DiffBind
experiment.
Profiles are calculated by counting the number of overlapping reads
in a series of bins upstream and downstream of each peak center.
In the second phase, the derived profiles are plotted in a series of complex heatmaps showing the relative intensity of overlapping peaks in each bin for each peak in each sample, along with summary plots showing the average profile across the sites for each sample.
Due to the computational cost of this function,
it is advised that the calculation of profiles
and the plotting of heatmaps be separated into two calls,
so that the profiles do not need to be re-generated if something goes wrong
in the plotting.
By default, when a DBA
object is passed in to generate profiles,
plotting is turned off and a profileplyr
object is returned.
When dba.plotProfile
is called with a profileplyr
object,
a plot is generated by default.
More detailed documentation is included in a markdown demonstration script
included with the DiffBind
package. This can be located as
follows:
system.file('extra/plotProfileDemo.Rmd',package='DiffBind')
An HTML version of the demonstration notebook can be accessed online at https://content.cruk.cam.ac.uk/bioinformatics/software/DiffBind/plotProfileDemo.html
silently returns a profileplyr-class
object.
Rory Stark
Carroll T, Barrows D (2020). profileplyr: Visualization and annotation of read signal over genomic ranges with profileplyr. DOI: 10.18129/B9.bioc.profileplyr
profileplyr::profileplyr-class
profileplyr::BamBigwig_to_chipProfile
profileplyr::generateEnrichedHeatmap
profileplyr::profileplyr
(Vignette)
# See plotProfileDemo notebook: # system.file('extra/plotProfileDemo.Rmd',package='DiffBind') data(tamoxifen_analysis) # default Profile plot ## Not run: dba.plotProfile(tamoxifen)
# See plotProfileDemo notebook: # system.file('extra/plotProfileDemo.Rmd',package='DiffBind') data(tamoxifen_analysis) # default Profile plot ## Not run: dba.plotProfile(tamoxifen)
Draws 2-way, 3-way, or 4-way Venn diagrams of overlaps
dba.plotVenn(DBA, mask, overlaps, label1, label2, label3, label4, main, sub, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, bDB=TRUE, bNotDB, bAll=TRUE, bGain=FALSE, bLoss=FALSE, labelAttributes, DataType=DBA$config$DataType)
dba.plotVenn(DBA, mask, overlaps, label1, label2, label3, label4, main, sub, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, bDB=TRUE, bNotDB, bAll=TRUE, bGain=FALSE, bLoss=FALSE, labelAttributes, DataType=DBA$config$DataType)
DBA |
DBA object; if present, only the mask parameter will apply. |
mask |
mask or vector of peakset numbers indicating which peaksets to include in Venn diagram.
Only 2 or 3 peaksets should be included.
See |
overlaps |
overlap record, as computed by |
label1 |
label for first peakset in diagram |
label2 |
label for second peakset in diagram |
label3 |
label for third peakset in diagram |
label4 |
label for fourth peakset in diagram |
main |
main title for plot |
sub |
subtitle for plot |
contrast |
contrast number(s) to use for results-based plots.
This can be a vector of contrast numbers.
See |
method |
if |
th |
if |
bUsePval |
if |
bDB |
if |
bNotDB |
if |
bAll |
if |
bGain |
if |
bLoss |
if |
labelAttributes |
if Only specified attributes that differ between peaksets will be used for labels; the ones that have the same value for all peaksets will be used as the default subtitle. |
DataType |
if Can be set as default behavior by setting Alternatively, this can be set to: to return a results-based DBA object, if a |
Either a list of peaksets is returned invisibly (as described in dba.overlap
), or, if DataType=DBA_DATA_DBAOBJECT
, a results-based DBA object.
When working with results overlaps (a least one contrast
is specified), and results-oriented DBA object is generated internally (as described in dba.report
). In some cases, it may be better to generate the DBA object explicitly (using dba.report
or setting bReturnPeaksets=TRUE
and DataType=DBA_DATA_DBAOBJECT
). This include the case where several plots are being made of the same results set, and it takes a long time to generate the results-based DBA object, as well as the case where there are more than four results peaksets and a mask needs to be specified. I
This function relies on vennPlot
in the systemPipeR
package, written by Thomas Girke.
Rory Stark
dba.analyze
, dba.overlap
,
dba.report
, dba.plotPCA
,
vennPlot
data(tamoxifen_peaks) par(mfrow=c(2,2)) # 2-way Venn dba.plotVenn(tamoxifen,6:7) dba.plotVenn(tamoxifen,tamoxifen$masks$ZR75) # 3-way Venn (done two different ways) dba.plotVenn(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive) olaps <- dba.overlap(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive) dba.plotVenn(tamoxifen,overlaps=olaps, label1="Rep 1",label2="Rep 2",label3="Rep 3", main="MCF7 (Responsive) Replicates") #Venn of overlaps Responsive=dba(tamoxifen,tamoxifen$masks$Responsive) Responsive Responsive <- dba.peakset(Responsive,1:3,sampID="MCF7") Responsive <- dba.peakset(Responsive,4:5,sampID="T47D") Responsive <- dba.peakset(Responsive,6:7,sampID="ZR75") par(mfrow=c(1,1)) dba.plotVenn(Responsive,Responsive$masks$Consensus) #4-way overlap data(tamoxifen_peaks) tamoxifen <- dba.peakset(tamoxifen, consensus=DBA_TISSUE) par(mfrow=c(1,1)) dba.plotVenn(tamoxifen,tamoxifen$masks$Consensus, main="Tissue consensus overlaps") #Venns of differentially bound sites data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen,design="~Tissue+Condition") tamoxifen <- dba.analyze(tamoxifen,method=c(DBA_EDGER,DBA_DESEQ2)) dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS, bAll=FALSE,bGain=TRUE,bLoss=TRUE) par(mfrow=c(2,1)) dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS, bAll=FALSE,bGain=TRUE,bLoss=FALSE) dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS, bAll=FALSE,bGain=FALSE,bLoss=TRUE) data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen,design=FALSE,block=DBA_TISSUE) tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition", contrast=c("Condition","Responsive","Resistant")) tamoxifen <- dba.analyze(tamoxifen,method=DBA_ALL_METHODS) dba.plotVenn(tamoxifen,contrast=1:2,method=c(DBA_DESEQ2,DBA_DESEQ2_BLOCK)) tamoxifen.db <- dba.report(tamoxifen,contrast=1:2,method=DBA_ALL_METHODS_BLOCK, bDB=TRUE) dba.plotVenn(tamoxifen.db,mask=1:2) dba.plotVenn(tamoxifen.db,mask=3:6)
data(tamoxifen_peaks) par(mfrow=c(2,2)) # 2-way Venn dba.plotVenn(tamoxifen,6:7) dba.plotVenn(tamoxifen,tamoxifen$masks$ZR75) # 3-way Venn (done two different ways) dba.plotVenn(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive) olaps <- dba.overlap(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive) dba.plotVenn(tamoxifen,overlaps=olaps, label1="Rep 1",label2="Rep 2",label3="Rep 3", main="MCF7 (Responsive) Replicates") #Venn of overlaps Responsive=dba(tamoxifen,tamoxifen$masks$Responsive) Responsive Responsive <- dba.peakset(Responsive,1:3,sampID="MCF7") Responsive <- dba.peakset(Responsive,4:5,sampID="T47D") Responsive <- dba.peakset(Responsive,6:7,sampID="ZR75") par(mfrow=c(1,1)) dba.plotVenn(Responsive,Responsive$masks$Consensus) #4-way overlap data(tamoxifen_peaks) tamoxifen <- dba.peakset(tamoxifen, consensus=DBA_TISSUE) par(mfrow=c(1,1)) dba.plotVenn(tamoxifen,tamoxifen$masks$Consensus, main="Tissue consensus overlaps") #Venns of differentially bound sites data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen,design="~Tissue+Condition") tamoxifen <- dba.analyze(tamoxifen,method=c(DBA_EDGER,DBA_DESEQ2)) dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS, bAll=FALSE,bGain=TRUE,bLoss=TRUE) par(mfrow=c(2,1)) dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS, bAll=FALSE,bGain=TRUE,bLoss=FALSE) dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS, bAll=FALSE,bGain=FALSE,bLoss=TRUE) data(tamoxifen_counts) tamoxifen <- dba.contrast(tamoxifen,design=FALSE,block=DBA_TISSUE) tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition", contrast=c("Condition","Responsive","Resistant")) tamoxifen <- dba.analyze(tamoxifen,method=DBA_ALL_METHODS) dba.plotVenn(tamoxifen,contrast=1:2,method=c(DBA_DESEQ2,DBA_DESEQ2_BLOCK)) tamoxifen.db <- dba.report(tamoxifen,contrast=1:2,method=DBA_ALL_METHODS_BLOCK, bDB=TRUE) dba.plotVenn(tamoxifen.db,mask=1:2) dba.plotVenn(tamoxifen.db,mask=3:6)
Generates volcano plots of differential binding analysis results.
dba.plotVolcano(DBA, contrast=1, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, fold=0, factor="", bFlip=FALSE, bLabels=FALSE, maxLabels=50, dotSize=1, bReturnSites=TRUE)
dba.plotVolcano(DBA, contrast=1, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, fold=0, factor="", bFlip=FALSE, bLabels=FALSE, maxLabels=50, dotSize=1, bReturnSites=TRUE)
DBA |
DBA object, on which |
contrast |
number of contrast to report on.
See |
method |
method or vector of methods to plot results for: |
th |
significance threshold; sites with FDR (or p-values, see |
bUsePval |
logical indicating whether to use FDR ( |
fold |
will only include sites with fold change greater than this as significant (colored red). If |
factor |
string to be prepended to plot main title; e.g. factor name. |
bFlip |
logical indicating that order of groups in contrast should be "flipped", allowing control of which sample group will have positive and which will have negative fold changes. |
bLabels |
logical indicating that labels should be drawn on the plot.
The labels are the site numbers, the row index in the (silently) returned set of
significant sites. The maximum number of sites can be specified
using |
maxLabels |
The maximum number of labels to use in the plot.
Ignored if |
dotSize |
size of points on plot. |
bReturnSites |
If |
Makes a volcano plot.
Silently returns wither a GRanges
object of the sites highlighted in red or a ggplot
object.
Rory Stark
data(tamoxifen_analysis) # default volcano plot dba.plotVolcano(tamoxifen) # only highlight significant sites with at least 3x Fold Change sigSites <- dba.plotVolcano(tamoxifen, fold=log2(3)) # use labels to find outlier sites sigSites <- dba.plotVolcano(tamoxifen, fold=log2(5), th=0.01, bLabels=TRUE) sigSites
data(tamoxifen_analysis) # default volcano plot dba.plotVolcano(tamoxifen) # only highlight significant sites with at least 3x Fold Change sigSites <- dba.plotVolcano(tamoxifen, fold=log2(3)) # use labels to find outlier sites sigSites <- dba.plotVolcano(tamoxifen, fold=log2(5), th=0.01, bLabels=TRUE) sigSites
Generates a report for a differential binding affinity analysis
dba.report(DBA, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, fold=0, bNormalized=TRUE, bFlip=FALSE, precision, bCalled=FALSE, bCounts=FALSE, bCalledDetail=FALSE, bDB, bNotDB, bAll=TRUE, bGain=FALSE, bLoss=FALSE, file,initString=DBA$config$reportInit,ext='csv', DataType=DBA$config$DataType)
dba.report(DBA, contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, bUsePval=DBA$config$bUsePval, fold=0, bNormalized=TRUE, bFlip=FALSE, precision, bCalled=FALSE, bCounts=FALSE, bCalledDetail=FALSE, bDB, bNotDB, bAll=TRUE, bGain=FALSE, bLoss=FALSE, file,initString=DBA$config$reportInit,ext='csv', DataType=DBA$config$DataType)
DBA |
DBA object.
A differential binding affinity analysis needs to have been previously carried out
(see |
contrast |
contrast number to report on.
When generating a report-based DBA object, this can be a vector of contrast numbers.
If missing, defaults to first contrast for reports,
and all contrasts when generating a report-based DBA object.
See |
method |
method used for analysis: When generating a report-based DBA object (see |
th |
significance threshold; all sites with FDR (or p-values, see |
bUsePval |
logical indicating whether to use FDR ( |
fold |
only sites with an absolute log Fold value greater than equal to this
will be included in the report.
This should be supplied as a If |
bNormalized |
logical indicating that normalized data (using normalization factors computed by differential analysis method) should be reported. When When Confidence statistics (p-value/FDR) are always reported as computed by the underlying analysis package, which incorporate normalization factors. |
bFlip |
logical indicating that order of groups in contrast should be "flipped", allowing control of which sample group will have positive and which will have negative fold changes. |
precision |
If present, alters the default precision for the
|
bCalled |
logical indicating that peak caller status should be included.
This will add a column for each group, each indicating the number of samples in the group identified as a peak in the original peaksets.
Note that this option is only available if the consensus peakset was calculated by
|
bCounts |
logical indicating that count data for individual samples should be reported as well as group statistics. Columns are added for each sample in the first group, followed by columns for each sample in the second group. |
bCalledDetail |
logical indicating that peak caller status should be included for each sample (if available). Columns are added for each sample in the first group, followed by columns for each sample in the second group. |
bDB |
logical indicating that a report-based DBA object should be generated, and that it should include Differentially Bound (DB) sites (respecting the |
bNotDB |
logical indicating that a report-based DBA object should be generated, and that it should include non-Differentially Bound (non-DB) sites (respecting the |
bAll |
logical indicating that a report-based DBA object should be generated, and that it should include peaksets combining peaks with both positive and negative fold changes. |
bGain |
logical indicating that a report-based DBA object should be generated, and that it should include peaksets with only positive fold changes. |
bLoss |
logical indicating that a report-based DBA object should be generated, and that it should include peaksets with only negative fold changes. |
file |
if present, also save the report to a comma separated value (csv) file, using this filename. |
initString |
if saving to a file, pre-pend this string to the filename. |
ext |
if saving to a file, append this extension to the filename. |
DataType |
The class of object for returned report: If set to Can be set as default behavior by setting |
if neither bDB
or bNotDB
is set to TRUE
,
a report dataframe or GRanges
object is returned,
with a row for each binding site within the thresholding parameters,
and the following columns:
Chr |
Chromosome of binding site |
Start |
Starting base position of binding site |
End |
End base position of binding site |
Conc |
Concentration – mean (log) reads across all samples in both groups |
Conc_group1 |
Group 1 Concentration – mean (log) reads across all samples first group |
Conc_group2 |
Group 2 Concentration – mean (log) reads across all samples in second group |
Fold |
Fold difference – mean fold difference of binding affinity of group 1 over group 2 (Conc1 - Conc2). Absolute value indicates magnitude of the difference, and sign indicates which one is bound with higher affinity, with a positive value indicating higher affinity in the first group |
p-value |
p-value calculation – statistic indicating significance of difference (likelihood difference is not attributable to chance) |
FDR |
adjusted p-value calculation – p-value subjected to multiple-testing correction |
If bCalled
is TRUE
and caller status is available, two more columns will follow:
Called1 |
Number of samples in group 1 that identified this binding site as a peak |
Called2 |
Number of samples in group 2 that identified this binding site as a peak |
If bCounts
is TRUE
, a column will be present for each sample in group 1, followed by each sample in group 2, if present.
The SampleID
will be used as the column header.
This column contains the read counts for the sample.
If bCalledDetail
is TRUE
, a column will be present for each sample in group 1,
followed by each sample in group 2, if present.
The SampleID
will be used as the column header.
This column contains a "+" to indicate for which sites the sample was called as a peak,
and a "-" if it was not so identified.
If bDB
or bNotDB
is set to TRUE
, a special DBA object
is returned,
containing peaksets based on sites determined to be differentially bound (or not)
as specified using the bDB
, bNotDB
, bGain
, bLoss
, and bAll
parameters.
In this DBA object
, the Tissue
value will specify the direction of the change (Gain
for positive fold changes, Loss
for negative fold changes, and All
for any fold change).
The Factor
value specifies if the peaks are differentially bound
(DB
) or not (!DB
).
The Condition
value specifies the analysis method (e.g. edgeR
),
and the Treatment
value is blank for unblocked analyses and set to block
for blocked analyses.
Rory Stark
data(tamoxifen_analysis) #Retrieve DB sites with FDR < 0.05 tamoxifen.DB <- dba.report(tamoxifen) tamoxifen.DB #Retrieve DB sites with p-value < 0.05 and Fold > 2 tamoxifen.DB <- dba.report(tamoxifen, th=.05, bUsePval=TRUE, fold=2) tamoxifen.DB #Retrieve all sites with confidence stats # and how many times each site was identified as a peak tamoxifen.DB <- dba.report(tamoxifen, th=1, bCalled=TRUE) tamoxifen.DB #Retrieve all sites with confidence stats and normalized counts tamoxifen.DB <- dba.report(tamoxifen, th=1, bCounts=TRUE) tamoxifen.DB #Retrieve all sites with confidence stats and raw counts tamoxifen.DB <- dba.report(tamoxifen, th=1, bCounts=TRUE,bNormalized=FALSE) tamoxifen.DB #Retrieve report as a SummarizedObject tamoxifen.sset <- dba.report(tamoxifen, DataType=DBA_DATA_SUMMARIZED_EXPERIMENT) tamoxifen.sset #Retrieve report-based DBA object data(tamoxifen_analysis) tamoxifen <- dba.analyze(tamoxifen,method=DBA_ALL_METHODS) tamoxifen.DB <- dba.report(tamoxifen,method=c(DBA_EDGER,DBA_DESEQ2), bDB=TRUE) tamoxifen.DB dba.plotVenn(tamoxifen.DB,1:2)
data(tamoxifen_analysis) #Retrieve DB sites with FDR < 0.05 tamoxifen.DB <- dba.report(tamoxifen) tamoxifen.DB #Retrieve DB sites with p-value < 0.05 and Fold > 2 tamoxifen.DB <- dba.report(tamoxifen, th=.05, bUsePval=TRUE, fold=2) tamoxifen.DB #Retrieve all sites with confidence stats # and how many times each site was identified as a peak tamoxifen.DB <- dba.report(tamoxifen, th=1, bCalled=TRUE) tamoxifen.DB #Retrieve all sites with confidence stats and normalized counts tamoxifen.DB <- dba.report(tamoxifen, th=1, bCounts=TRUE) tamoxifen.DB #Retrieve all sites with confidence stats and raw counts tamoxifen.DB <- dba.report(tamoxifen, th=1, bCounts=TRUE,bNormalized=FALSE) tamoxifen.DB #Retrieve report as a SummarizedObject tamoxifen.sset <- dba.report(tamoxifen, DataType=DBA_DATA_SUMMARIZED_EXPERIMENT) tamoxifen.sset #Retrieve report-based DBA object data(tamoxifen_analysis) tamoxifen <- dba.analyze(tamoxifen,method=DBA_ALL_METHODS) tamoxifen.DB <- dba.report(tamoxifen,method=c(DBA_EDGER,DBA_DESEQ2), bDB=TRUE) tamoxifen.DB dba.plotVenn(tamoxifen.DB,1:2)
Writes out DBA object
dba.save(DBA, file='DBA', dir='.', pre='dba_', ext='RData', bRemoveAnalysis=FALSE, bRemoveBackground=FALSE, bCompress=FALSE)
dba.save(DBA, file='DBA', dir='.', pre='dba_', ext='RData', bRemoveAnalysis=FALSE, bRemoveBackground=FALSE, bCompress=FALSE)
DBA |
DBA object |
file |
main filename |
dir |
directory to save model in |
pre |
string to pre-pend to filename |
ext |
extensions to use |
bRemoveAnalysis |
if |
bRemoveBackground |
if |
bCompress |
logical indicating saved DBA object should be compressed as much as possible. |
string containing full path and filename.
Rory Stark
## Not run: data(tamoxifen_peaks) savefile <- dba.save(tamoxifen,'tamoxifenPeaks') savefile rm(tamoxifen) tamoxifen <- dba.load('tamoxifenPeaks') unlink(savefile) ## End(Not run)
## Not run: data(tamoxifen_peaks) savefile <- dba.save(tamoxifen,'tamoxifenPeaks') savefile rm(tamoxifen) tamoxifen <- dba.load('tamoxifenPeaks') unlink(savefile) ## End(Not run)
Returns attributes of peaksets and/or contrasts associated with a DBA object.
dba.show(DBA, mask, attributes, bContrasts=FALSE, bDesign=FALSE, th=DBA$config$th)
dba.show(DBA, mask, attributes, bContrasts=FALSE, bDesign=FALSE, th=DBA$config$th)
DBA |
DBA object |
mask |
mask of peaksets for which to get attributes (used when obtaining peakset attributes,
i.e. |
attributes |
attribute or vector of attributes to retrieve.
Number of intervals is always shown.
Used when obtaining peakset attributes, i.e. |
bContrasts |
logical indicating whether peaksets or contrast attributes are to be retrieved.
|
bDesign |
logical indicating whether the model design should be returned, if present.
|
th |
if |
MODE: Return attributes of peaksets associated with a DBA object:
dba.show(DBA, mask, attributes)
MODE: Return contrasts associated with a DBA object:
dba.show(DBA,bContrasts=TRUE, th)
MODE: Return design associated with a DBA object:
dba.show(DBA,bDesign=TRUE)
dataframe with peakset attributes.
If bContrasts == FALSE
, each row represents a peakset, and each column is an attributes, with the final column, Intervals, indicating how many sites there are in the peakset.
If bContrasts == TRUE
, each row represent a contrast, with the following columns:
Group1 |
Label for first group of contrast |
Members1 |
Number of samples in first group of contrast |
Group2 |
Label for first group of contrast |
Members3 |
Number of samples in first group of contrast |
if dba.analyze
has been successfully run, there there will be up to
four more columns showing the number of significant differentially bound (DB) sites identified for
DB.edgeR |
Number of significantly differentially bound sites identified using edgeR |
DB.DESeq |
Number of significantly differentially bound sites identified using DESeq |
DB.edgeR.block |
Number of significantly differentially bound sites identified for blocking analysis using edgeR |
DB.DESeq.block |
Number of significantly differentially bound sites identified for blocking analysis using DESeq |
Rory Stark
dba
, dba.peakset
, dba.contrast
dba.analyze
, DBA.config
.
data(tamoxifen_peaks) dba.show(tamoxifen) dba.show(tamoxifen,tamoxifen$masks$Responsive) dba.show(tamoxifen,attributes=c(DBA_TISSUE,DBA_REPLICATE,DBA_CONDITION)) data(tamoxifen_analysis) dba.show(tamoxifen,bContrasts=TRUE) #alternatively: data(tamoxifen_analysis) tamoxifen tamoxifen$config$th <- .01 tamoxifen
data(tamoxifen_peaks) dba.show(tamoxifen) dba.show(tamoxifen,tamoxifen$masks$Responsive) dba.show(tamoxifen,attributes=c(DBA_TISSUE,DBA_REPLICATE,DBA_CONDITION)) data(tamoxifen_analysis) dba.show(tamoxifen,bContrasts=TRUE) #alternatively: data(tamoxifen_analysis) tamoxifen tamoxifen$config$th <- .01 tamoxifen
Constant variables used in DiffBind package
DBA_ID DBA_FACTOR DBA_TISSUE DBA_CONDITION DBA_TREATMENT DBA_REPLICATE DBA_CALLER DBA_CONSENSUS DBA_CONTROL DBA_READS DBA_ALL_ATTRIBUTES DBA_INTERVALS DBA_FRIP DBA_GROUP DBA_OLAP_PEAKS DBA_OLAP_ALL DBA_OLAP_RATE DBA_COR DBA_OLAP DBA_INALL DBA_SCORE_READS DBA_SCORE_NORMALIZED DBA_SCORE_CONTROL_READS DBA_SCORE_READS_MINUS DBA_SCORE_READS_FULL DBA_SCORE_READS_MINUS_FULL DBA_SCORE_READS_EFFECTIVE DBA_SCORE_READS_MINUS_EFFECTIVE DBA_SCORE_READS_FOLD DBA_SCORE_RPKM DBA_SCORE_RPKM_FOLD DBA_SCORE_RPKM_MINUS DBA_SCORE_TMM_READS_FULL DBA_SCORE_TMM_READS_EFFECTIVE DBA_SCORE_TMM_MINUS_FULL DBA_SCORE_TMM_MINUS_EFFECTIVE DBA_SCORE_TMM_READS_FULL_CPM DBA_SCORE_TMM_READS_EFFECTIVE_CPM DBA_SCORE_TMM_MINUS_FULL_CPM DBA_SCORE_TMM_MINUS_EFFECTIVE_CPM DBA_SCORE_SUMMIT DBA_SCORE_SUMMIT_ADJ DBA_SCORE_SUMMIT_POS DBA_SCORE_FOLD DBA_SCORE_CONCENTRATION DBA_SCORE_CONC_NUMERATOR DBA_SCORE_CONC_DENOMINATOR DBA_SCORE_PVAL DBA_SCORE_FDR DBA_READS_DEFAULT DBA_READS_BAM DBA_READS_BED DBA_EDGER DBA_DESEQ2 DBA_EDGER_BLOCK DBA_DESEQ2_BLOCK DBA_EDGER_GLM DBA_ALL_METHODS DBA_ALL_BLOCK DBA_ALL_METHODS_BLOCK DBA_DATA_FRAME DBA_DATA_GRANGES DBA_DATA_RANGEDDATA DBA_DATA_SUMMARIZED_EXPERIMENT DBA_DATA_DBAOBJECT DBA_BLACKLIST_CE10 DBA_BLACKLIST_CE11 DBA_BLACKLIST_DM3 DBA_BLACKLIST_DM6 DBA_BLACKLIST_GRCH37 DBA_BLACKLIST_GRCH38 DBA_BLACKLIST_HG19 DBA_BLACKLIST_HG38 DBA_BLACKLIST_MM9 DBA_BLACKLIST_MM10 DBA_BLACKLIST DBA_GREYLIST DBA_BLACKLISTED_PEAKS DBA_LIBSIZE_DEFAULT DBA_LIBSIZE_FULL DBA_LIBSIZE_PEAKREADS DBA_LIBSIZE_BACKGROUND DBA_LIBSIZE_USER DBA_NORM_DEFAULT DBA_NORM_NATIVE DBA_NORM_LIB DBA_NORM_TMM DBA_NORM_RLE DBA_NORM_SPIKEIN DBA_NORM_USER DBA_NORM_OFFSETS DBA_NORM_OFFSETS_ADJUST DBA_OFFSETS_LOESS DBA_OFFSETS_USER
DBA_ID DBA_FACTOR DBA_TISSUE DBA_CONDITION DBA_TREATMENT DBA_REPLICATE DBA_CALLER DBA_CONSENSUS DBA_CONTROL DBA_READS DBA_ALL_ATTRIBUTES DBA_INTERVALS DBA_FRIP DBA_GROUP DBA_OLAP_PEAKS DBA_OLAP_ALL DBA_OLAP_RATE DBA_COR DBA_OLAP DBA_INALL DBA_SCORE_READS DBA_SCORE_NORMALIZED DBA_SCORE_CONTROL_READS DBA_SCORE_READS_MINUS DBA_SCORE_READS_FULL DBA_SCORE_READS_MINUS_FULL DBA_SCORE_READS_EFFECTIVE DBA_SCORE_READS_MINUS_EFFECTIVE DBA_SCORE_READS_FOLD DBA_SCORE_RPKM DBA_SCORE_RPKM_FOLD DBA_SCORE_RPKM_MINUS DBA_SCORE_TMM_READS_FULL DBA_SCORE_TMM_READS_EFFECTIVE DBA_SCORE_TMM_MINUS_FULL DBA_SCORE_TMM_MINUS_EFFECTIVE DBA_SCORE_TMM_READS_FULL_CPM DBA_SCORE_TMM_READS_EFFECTIVE_CPM DBA_SCORE_TMM_MINUS_FULL_CPM DBA_SCORE_TMM_MINUS_EFFECTIVE_CPM DBA_SCORE_SUMMIT DBA_SCORE_SUMMIT_ADJ DBA_SCORE_SUMMIT_POS DBA_SCORE_FOLD DBA_SCORE_CONCENTRATION DBA_SCORE_CONC_NUMERATOR DBA_SCORE_CONC_DENOMINATOR DBA_SCORE_PVAL DBA_SCORE_FDR DBA_READS_DEFAULT DBA_READS_BAM DBA_READS_BED DBA_EDGER DBA_DESEQ2 DBA_EDGER_BLOCK DBA_DESEQ2_BLOCK DBA_EDGER_GLM DBA_ALL_METHODS DBA_ALL_BLOCK DBA_ALL_METHODS_BLOCK DBA_DATA_FRAME DBA_DATA_GRANGES DBA_DATA_RANGEDDATA DBA_DATA_SUMMARIZED_EXPERIMENT DBA_DATA_DBAOBJECT DBA_BLACKLIST_CE10 DBA_BLACKLIST_CE11 DBA_BLACKLIST_DM3 DBA_BLACKLIST_DM6 DBA_BLACKLIST_GRCH37 DBA_BLACKLIST_GRCH38 DBA_BLACKLIST_HG19 DBA_BLACKLIST_HG38 DBA_BLACKLIST_MM9 DBA_BLACKLIST_MM10 DBA_BLACKLIST DBA_GREYLIST DBA_BLACKLISTED_PEAKS DBA_LIBSIZE_DEFAULT DBA_LIBSIZE_FULL DBA_LIBSIZE_PEAKREADS DBA_LIBSIZE_BACKGROUND DBA_LIBSIZE_USER DBA_NORM_DEFAULT DBA_NORM_NATIVE DBA_NORM_LIB DBA_NORM_TMM DBA_NORM_RLE DBA_NORM_SPIKEIN DBA_NORM_USER DBA_NORM_OFFSETS DBA_NORM_OFFSETS_ADJUST DBA_OFFSETS_LOESS DBA_OFFSETS_USER
DBA_ID |
DBA peakset metadata: Peakset ID |
DBA_FACTOR |
DBA peakset metadata: Factor |
DBA_TISSUE |
DBA peakset metadata: Tissue |
DBA_CONDITION |
DBA peakset metadata: Condition |
DBA_TREATMENT |
DBA peakset metadata: Treatment |
DBA_REPLICATE |
DBA peakset metadata: Replicate |
DBA_CALLER |
DBA peakset metadata: Peak Caller |
DBA_CONSENSUS |
DBA peakset metadata: Is this a consensus peakset? |
DBA_CONTROL |
DBA peakset metadata: ID of Control sample |
DBA_READS |
Number of reads counted in BAM file. |
DBA_ALL_ATTRIBUTES |
DBA peakset metadata: all attributes that can be used in certain plot labels (cf |
DBA_INTERVALS |
DBA peakset metadata: Number of intervals in peakset |
DBA_FRIP |
DBA peakset metadata: Fraction of Reads in Peaks (number of reads in intervals divided by total number of reads in library) |
DBA_GROUP |
DBA peakset metadata: color PCA plot using contras groups |
DBA_OLAP_PEAKS |
dba.overlap mode: return overlapping/unique peaksets |
DBA_OLAP_ALL |
dba.overlap mode: return report of correlations/overlaps for each pair of samples |
DBA_OLAP_RATE |
dba.overlap mode: return overlap rates |
DBA_COR |
When plotting a heatmap from an overlap record, use the correlation value. |
DBA_OLAP |
When plotting a heatmap from an overlap record, use the percentage overlap value. |
DBA_INALL |
When plotting a heatmap from an overlap record, use the number of peaks in common to both samples. |
DBA_SCORE_READS |
dba.count score is number of reads in ChIP |
DBA_SCORE_CONTROL_READS |
dba.count score is number of reads in Control |
DBA_SCORE_READS_FOLD |
dba.count score is number of reads in ChIP divided by number of reads in Control |
DBA_SCORE_READS_MINUS |
dba.count score is number of reads in ChIP minus number of reads in Control |
DBA_SCORE_READS_FULL |
dba.count score is normalized ChIP read counts, using Full Library size |
DBA_SCORE_READS_MINUS_FULL |
dba.count score is normalized ChIP read counts minus Control read counts, using Full Library size |
DBA_SCORE_READS_EFFECTIVE |
dba.count score is normalized ChIP read counts, using Effective Library size |
DBA_SCORE_READS_MINUS_EFFECTIVE |
dba.count score is normalized ChIP read counts minus Control read counts, using Effective Library size |
DBA_SCORE_NORMALIZED |
dba.count score is normalized reads |
DBA_SCORE_RPKM |
dba.count score is RPKM of ChIP |
DBA_SCORE_RPKM_FOLD |
dba.count score is RPKM of ChIP divided by RPKM of Control |
DBA_SCORE_RPKM_MINUS |
dba.count score is RPKM of ChIP minus RPKM of Control |
DBA_SCORE_TMM_READS_FULL |
dba.count score is TMM normalized (using edgeR), using ChIP read counts and Full Library size |
DBA_SCORE_TMM_READS_EFFECTIVE |
dba.count score is TMM normalized (using edgeR), using ChIP read counts and Effective Library size |
DBA_SCORE_TMM_MINUS_FULL |
dba.count score is TMM normalized (using edgeR), using ChIP read counts minus Control read counts and Full Library size |
DBA_SCORE_TMM_MINUS_EFFECTIVE |
dba.count score is TMM normalized (using edgeR), using ChIP read counts minus Control read counts and Effective Library size |
DBA_SCORE_TMM_READS_FULL_CPM |
dba.count score is TMM normalized (using edgeR), using ChIP read counts and Full Library size, reported in counts-per-million. |
DBA_SCORE_TMM_READS_EFFECTIVE_CPM |
dba.count score is TMM normalized (using edgeR), using ChIP read counts and Effective Library size, reported in counts-per-million. |
DBA_SCORE_TMM_MINUS_FULL_CPM |
dba.count score is TMM normalized (using edgeR), using ChIP read counts minus Control read counts and Full Library size, reported in counts-per-million. |
DBA_SCORE_TMM_MINUS_EFFECTIVE_CPM |
dba.count score is TMM normalized (using edgeR), using ChIP read counts minus Control read counts and Effective Library size, reported in counts-per-million. |
DBA_SCORE_SUMMIT |
dba.count score is summit height (highest pile-up). |
DBA_SCORE_SUMMIT_ADJ |
dba.count score is summit height (highest pile-up), adjusted for library size. |
DBA_SCORE_SUMMIT_POS |
dba.count score is summit location (position of highest pile-up). |
DBA_SCORE_FOLD |
score for report-based DBA object is Log Fold Change. |
DBA_SCORE_CONCENTRATION |
score for report-based DBA object is Log Mean Concentration. |
DBA_SCORE_CONC_NUMERATOR |
score for report-based DBA object is Log Mean Concentration of numerator (first group in contrast). |
DBA_SCORE_CONC_DENOMINATOR |
score for report-based DBA object isLog Mean Concentration of denominator (second group in contrast). |
DBA_SCORE_PVAL |
score for report-based DBA object is p-value. |
DBA_SCORE_FDR |
score for report-based DBA object is FDR. |
DBA_READS_DEFAULT |
When counting read files, use the file extension to determine the file type. |
DBA_READS_BAM |
When counting read files, assume the file type is BAM, regardless of the file extension. |
DBA_READS_BED |
When counting read files, assume the file type is BED (or zipped BED), regardless of the file extension. |
DBA_EDGER |
differential analysis method: edgeR (default: DBA_EDGER_GLM) |
DBA_DESEQ2 |
differential analysis method: DESeq2 (using a single-factor GLM) |
DBA_EDGER_BLOCK |
differential analysis method: edgeR with blocking factors (GLM) |
DBA_DESEQ2_BLOCK |
differential analysis method: DESeq2 with blocking factors (GLM) |
DBA_EDGER_GLM |
differential analysis method: use GLM in edgeR for two-group comparisons |
DBA_ALL_METHODS |
use both analysis methods: |
DBA_ALL_BLOCK |
report on block results for both analysis methods: |
DBA_ALL_METHODS_BLOCK |
report on block results for all analysis methods, both blocked and unblocked: |
DBA_DATA_GRANGES |
Use GRanges class for peaksets and reports. This is the default (DBA$config$DataType = DBA_DATA_GRANGES). |
DBA_DATA_RANGEDDATA |
Use RangedData class for peaksets and reports. Can be set as default (DBA$config$DataType = DBA_DATA_RANGEDDATA). |
DBA_DATA_FRAME |
Use data.frame class for peaksets and reports. Can be set as default (DBA$config$DataType = DBA_DATA_FRAME). |
DBA_DATA_SUMMARIZED_EXPERIMENT |
Return report as a |
DBA_DATA_DBAOBJECT |
Return a result-based DBA object from |
DBA_BLACKLIST_HG19 |
Homo sapiens 19 (chromosomes have "chr") |
DBA_BLACKLIST_HG38 |
Homo sapiens 38 (chromosomes have "chr") |
DBA_BLACKLIST_GRCH37 |
Homo sapiens 37 (chromosomes are numbers) |
DBA_BLACKLIST_GRCH38 |
Homo sapiens 38 (chromosomes are numbers) |
DBA_BLACKLIST_MM9 |
Mus musculus 9 |
DBA_BLACKLIST_MM10 |
Mus musculus 10 |
DBA_BLACKLIST_CE10 |
C. elegans 10 |
DBA_BLACKLIST_CE11 |
C. elegans 11 |
DBA_BLACKLIST_DM3 |
Drosophila melanogaster 3 |
DBA_BLACKLIST_DM6 |
Drosophila melanogaster 6 |
DBA_BLACKLIST |
Retrieve blacklist |
DBA_GREYLIST |
Retrieve greylist |
DBA_BLACKLISTED_PEAKS |
Retrieve blacklisted peaks |
DBA_LIBSIZE_DEFAULT |
Default library size
( |
DBA_LIBSIZE_FULL |
Full library size (all reads in library) |
DBA_LIBSIZE_PEAKREADS |
Library size is Reads in Peaks |
DBA_LIBSIZE_BACKGROUND |
Library size is Reads in Background |
DBA_LIBSIZE_USER |
User supplied library sizes |
DBA_NORM_DEFAULT |
Default normalization method |
DBA_NORM_NATIVE |
"Native"" normalization method
(TMM for |
DBA_NORM_LIB |
Normalize by library size only |
DBA_NORM_TMM |
Normalize using TMM method ( |
DBA_NORM_RLE |
Normalize using RLE method ( |
DBA_NORM_SPIKEIN |
Normalize based on spike-ins |
DBA_NORM_USER |
User supplied normalization factors |
DBA_NORM_OFFSETS |
Use offsets instead of normalization factors |
DBA_NORM_OFFSETS_ADJUST |
Use offsets instead of normalization factors; adjust based on library size (DESeq) |
DBA_OFFSETS_LOESS |
Compute offsets using loess fit |
DBA_OFFSETS_USER |
Use offsetrs supplied by user |
Variables with ALL CAP names are used as constants within DiffBind
.
Rory Stark
Notes on the differences between DiffBind 3.0 and previous versions, and how run in a "backward compatible" manner.
Beginning with version 3.0, DiffBind
introduces
substantial updates and new features that may cause scripts
written for earlier versions to function differently
(or not at all), as well as altering the results.
This page givens details on these changes,
and how to approximate results computed
with earlier version if desired.
The major change in version 3.0 is in how the data are modeled. In previous versions, a separate model was derived for each contrast, including data only for those samples present in the contrast. Model design options were implicit and limited to either a single factor, or a subset of two-factor "blocked" designs. Starting in version 3.0, the default mode is to include all the data in a single model, allowing for any allowable design formula and any set of allowable contrasts.
Another change starting from version 3.0 is in how normalization is done. There are more normalization options, and more explicit control over them. The default normalization options have also changed, so reproducing a pre-3.0 analysis requires that normalization parameters to be specified.
It is recommended that existing analyses be re-run with the current software.
Existing scripts should execute (with the exception of two
normalization parameters which have been moved from dba.analyze
to the new interface function dba.normalize
.)
See the DiffBind
vignette for more information on
processing and analyzing ChIP-seq (and ATAC-seq) experiments.
blacklist is applied by default, if available, using automatic detection of reference genome.
greylists are generated from controls and applied by default.
minimum read counts are now 0 instead of being rounded up to 1 (this is now controllable).
centering peaks around summits is now done by default using 401-bp wide peaks (recommend to use 'summits=100' for ATAC-seq).
read counting is now performed by 'summarizeOverlaps()' by default, with single-end/paired-end counting automatically detected.
filtering is performed by default; consensus peaks where no peak has and RPKM value of at least 1 in any sample are filtered.
control read subtraction is now turned off by default if a greylist is present
normalization is based on full library sizes by default for both 'edgeR' and 'DESeq2'analyses.
score is set to normalized values by default.
Most existing DiffBind
scripts and saved objects will
run correctly using version 3.0, but there may be differences
in the results.
This section describes how to approximate earlier results for existing scripts and objects.
If a DBA
object was created with an earlier version of
DiffBind
, and saved using the dba.save
function, and loaded using the dba.load
function,
all settings should be preserved, such that running the analysis anew
will yield the same results.
In order to re-run the analysis using the post-version 3.0 settings,
the original script should be used to re-create the DBA
object.
By default, if you re-run a DiffBind
script, it will use the
new defaults from version 3.0 and beyond.
In order to re-analyze an experiment in the pre-version 3.0 mode,
a number of defaults need to be changed.
When calling dba.count
, the following defaults are changed:
summits
: This parameter is now set by default. Setting
summits=FALSE
will preempt re-centering each
peak interval around its point of highest pileup.
filter
: The new default for this parameter is 1
and is based on RPKM
values; previously
it was set to filter=0
and was based on read counts.
minCount
: This is a new parameter representing a minimum read count value.
It now default to 0
;
to get the previous behavior, set minCount=1
.
The easiest way to perform subsequent processing in a pre-version 3.0 manner is to set a configuration option:
DBA$config$design <- FALSE
This will result in the appropriate defaults being set for
the new interface function, dba.normalize
(which
does not need to be invoked explicitly.)
The pre-version 3.0 settings for dba.normalize
parameters
are as follows:
normalize
: DBA_NORM_DEFAULT
library
: DBA_LIBSIZE_FULL
background
: FALSE
Note that two parameters that used to be available when calling
dba.analyze
have been moved:
bSubControl
: now integrated into dba.count
. FALSE
by default (unless a greylist has
been added using dba.blacklist
).
bFullLibrarySize
: now integrated into
dba.normalize
as an option for the library
parameter.
library=DBA_LIBSIZE_FULL
is equivalent to bFullLibrarySize=TRUE
,
and library=DBA_LIBSIZE_PEAKREADS
is equivalent to
bFullLibrarySize=FALSE
.
Rory Stark
The DiffBind
vignette has been updated to show how to
analyze experiments using version 3.0.