Rollover bugfixes
Add option to return a ggplot object from dba.plotVolcano
Rollover bugfixes from Release branch
Re-normalize after applying blacklists/greylists
Apply Rhtslib patch
Update vignette
Remove contact info for Gord
Change default spike-in normalization to NATIVE (RLE/TMM)
Change spike-in normalization to use reference library sizes
Fix bug involving Called columns in reports
Maintain FRiP calculations
Improve dba.plotProfile sample labels
Fix issue with package:parallel
Fix error/warning in dba.blacklist relating to non-matching chromosome names
Update man page for dba.report to clarify how fold changes are reported when bNormalized=FALSE.
Add some new test conditions to GenerateDataFiles.R
Fix bFlip issues
Fix bug where mode was not passed to summarizeOverlaps
Add $inter.feature config parameter
Re-compute FDR when fold change other than 0 is specified
Remove most gc() calls for performance
New type of plot: dba.plotProfile()
Can mix single-end and paired-end bam files
Various bug fixes
This is a major release of DiffBind; the new version is 3.0.
The main upgrade involves how modelling and normalization are done. DiffBind now supports models and contrasts of arbitrary complexity using DESeq2, edgeR, or both, as well as a wide range of normalization options.
NB:
The previous modelling methods are maintained for backward compatibility; however, they are not the default. To repeat earlier analyses, dba.contrast() must be called explicitly with design=FALSE. See ?DiffBind3 for more information.
The default mode for dba.count() is now to center around summits (resulting in 401bp intervals). To avoid recentering around summits, as was the previous default, set summits=FALSE (or set summits to a more appropriate value).
Normalization options have been moved from dba.analyze() to the new interface function dba.normalize(). Any non-default normalization options must be specified using dba.normalize().
Summary of Changes:
dba.analyze():
Automatic mode can start at any point (including from a sample sheet) and continue with the default analysis
Remove normalization options bSubControl, bFullLibrarySize, filter, and filterFun from dba.analyze(); these are now set in dba.normalize().
Add support for analysis using a full model design formula.
Update DESeq2 analysis method.
Update edgeR analysis method.
Move edgeR bTagwise parameter to a $config option
Remove support for DESeq analysis method.
Add ability to retrieve DESeq2 or edgeR object
dba.contrast():
Add design parameter to set design formula
Add contrast parameter to specify a contrast using a variety of methods
Add reorderMeta parameter to set factor value order
Add bGetCoefficients parameter to get design coefficient names to use in contrasts
NEW: dba.normalize():
Support TMM, RLE, and library size normalization for both DESeq2 and edgeR
Support background bin normalization using csaw
Support offsets and loess fit normalization
Support spike-in normalization with combined or separate reference genomes
Support parallel factor normalization
bSubControl default depends on the presence of a greylist
dba.count():
Change default to summits=250; to avoid recentering around summits, set summits=FALSE
Default for bUseSummarizeOverlaps in dba.count is now TRUE
Automatically detect single-end/paired-end in dba.count
Automatically index unindexed bams in dba.count and dba.normalize
Move bSubControl parameter
Default score is now the new DBA_SCORE_NORMALIZED
Add minCount parameter to dba.count(), default now 0 instead of 1
Filtering peaks by read count thresholds is only available in dba.count()
Fix bug in dba.count() with user-supplied peakset and summits=n
NEW: dba.blacklist():
Apply ENCODE blacklist
Automatically detect reference genome for blacklist
Apply Greylists
Generate Greylists from controls using GreyListChIP package
Plotting changes:
Add loess fit line to dba.plotMA()
Add ability in dba.plotMA() to plot arbitrary samples (without a contrast)
Add mask parameter to dba.plotBox()
Support negative scores, e.g. fold changes in report-based objects, to enable fold-change heatmaps.
Remove bCorPlot parameter from dba(), dba.count(), and dba.analyze(); use DBA$config$bCorPlot instead.
dba.show() / print changes:
Updated dba.show() and print() to deal with designs and different contrast types
Add ability to retrieve design formula in dba.show()
Remove bUsePval parameter from dba.show()
Add constant DBA_READS to access library sizes
Vignette and help pages:
Replace multi-factor analysis section
Add extensive normalization section
Add blacklist/greylist section.
Add spike-in and parallel normalization examples
Add DiffBind3 help page and vignette section with information on backward compatibility.
Update technical details sections
General updates to all sections
Add GenerateDataFiles.R to package
Various bugfixes and cosmetic changes.
Fix issue when extracting SummarizedExperiment from DBA
Change dba.plotPCA to use proper loading
Features
dba.report: add precision option
dba.plot*: make plots use same precision as reports when thresholding
dba: add dir option
Documentation updates
Vignette: change default to bFullLibrarySize=TRUE in description of DESeq2 analysis
Vignette: update so the vignette does not change the working directory
dba.report: clean up description of bCalled in man page
dba.report: modify example in man page to be clearer
dba: update man page so examples do not change the working directory
dba.save: mark example code that writes into LIB as dontrun
dba: mark example code that sets the working directory to LIB as dontrun
Bug fixes
dba.report: sort report by FDR instead of p-value
dba.peakset: fix bug when adding consensus peaks with chromosomes not present in some peaksets
dba.count: fix bug when recentering passed-in peaks
dba.report: fix bug when using filtering from dba.analyze
dba.count: fix bug when passing in peaks using factors
dba.count: fix bug caused by not registering one of the C routines correctly.
Feature changes
Change sortFun parameter in dba.plotHeatmap to default to sd, with FALSE as an option for no sorting
Sort peaks when adding directly via dba.peakset
Don't add _ if no initString is specified in dba.report
Internal feature: alternate peak counts: pv.resetCounts
Bug Fixes
Fix bug in dba.plotHeatmap if all values in a row are zero
Change authors in vignette to conform to new standard
Fix single peak boundary conditions
Subset config$fragmentSize when masking
Update example peaks to match data objects
Fix bug in dba.plotHeatmap if all values in a row are zero
Bugfix when returning report as GRanges with only one site
Bugfix when plotting venns of consensus peaksets
Fix buffer overrun causing segfault on MacOS
Feature: add new plot - dba.plotVolcano
Feature: Control which principal components are plotted using components parameter in dba.plotPCA
Feature: Control axis range using xrange and yrange parameters in dba.plotMA
Feature: Filtering per contrast using filter and filterFun parameters in dba.analyze
Feature: Flip which group in contrast shows gain/loss (sign of fold change) using bFlip parameter in dba.report
Feature: Flip which group in contrast shows gain/loss (sign of fold change) using bFlip parameter in dba.plotMA
Feature changes
Change default analysis method to DESeq2 for its more conservative normalization
Designate DESeq method as obsolete (in favor of DESeq2); alter documentation and vignette accordingly.
Change default FDR threshold to 0.05
Add bNot parameter to dba.contrast to exclude "!" (not) contrasts by default
Remove bReturnPeaksets parameter from dba.plotVenn (does this by default)
Change bCorPlot default to FALSE (no more automatic clustering heatmaps)
Internal changes
Bump version number to 2.0
Update vignette
Remove $allvectors and $vectors; replace with $merged (without score matrix) and $binding
Upgrade peaksort to use in-place peakOrder
Optimize peak merging memory usage
Change PCA method from princomp() to prcomp()
maxGap implemented in merge
Include the beginnings of some unit tests using the testthat package.
Bug fixes
Fix bug in retrieving SummarizedExperiment
Fix bug when no peaks
Fix bugs in non-standard chromosome names and chromosome order
Fix bugs in Called logical vectors
Ensure loading a sample sheet doesn't treat '#' as a comment character.
Tildes in file paths now accepted.
Spaces trimmed from entries in sample sheets (with warning).
Functions added to importFrom stats to satisfy BiocCheck.
Roll up bugfixes
dba.plotHeatmap returns binding sites in row order
Add support for reading Excel-format sample sheets (.xls, .xlsx extensions)
Update DESeq2 reference in vignette; fix vignette samplesheet
Use vennPlot from systemPipeR
Fix Makevars to avoid GNU-specific extensions
Replace 'require' with 'requireNamespace' to eliminate NOTEs regarding misuse of 'require'
Remove non-ASCII characters from a couple of comments
Change Gord's email address
New: color vector lists for dba.plotHeatmap and colors for dba.plotPCA labels
Fix: bug causing two plots when changing score in dba.plotHeatmap and dba.plotPCA
Mostly bug fixes!
Counting
New: option to compute summits
New: option to center peaks with fixed width around summits
New: scores for summits (height, position) and CPM for TMM values
New: filter reads by mapping quality (mapQCth)
New: support for PE bam data using summarizeOverlaps
Remove: bCalledMask (now always TRUE)
Change: insertLength to fragmentSize
Add: fragmentSize can be a vector with a size for each sample
Change: fragmentSize default is 125 bp
Plotting
Change: colors based on CRUK color scheme
PCA plots
New: legend
New: label parameter for adding text labels of points in 2D plot
Venn diagrams
New: plot overlaps of differentially bound sites by specifying contrasts, thresholds, etc.
New: able to return overlapping peaksets as GRanges directly
New: able to generate new DBA object consisting of overlapping peaks
New: labelAttributes for controlling default labels
New: default main and sub titles
Heatmaps
Fix: don’t plot column vector for attributes where every sample has a different value
General
New: add attribute value: DBA_ALL_ATTRIBUTES
Change: SN (signal/noise) to FRIP (fraction of reads in peaks)
Change: “Down” to “Loss” and “Up” to “Gain”
Vignette
Change: vignette uses BiocStyle and dynamically generated figures
Change: example data based on hg19 instead of hg18
Change: example reads from bam files instead of bed files
New: section on using DiffBind and ChIPQC together
New configuration default options (DBA$config):
Metadata name strings: ID, Tissue, Factor, Condition, Treatment, Caller
th: significance threshold
bUsePval: use p-value instead of FDR for thresholding
fragmentSize
mapQCth: filter reads by mapping quality
fragments (for summarizeOverlaps)
Bugs/Issues
Fix: bRemoveDuplicates had some unpredictable behaviour
Fix: chrN_random were being counted against chrN
Disable: tamoxifen_GEO.R (no longer works after SRA changed the format of archived data)
Add support for DESeq2:
New: Add DBA_DESEQ2, DBA_ALL_METHODS and DBA_ALL_BLOCK method constants
Change: dba.analyze can analyze using DESeq2
Change: all reporting and plotting functions support DESeq2 results
Change: vignette includes comparison of edgeR, DESeq, and DESeq2
Changes to counting using dba.count:
Change: optimize built-in counting code to use much less memory and run faster
Change: deprecate bLowMem, replaced with bUseSummarizeOverlaps
New: add readFormat parameter to specify read file type (instead of using file suffix)
New: generation of result-based DBA object using dba.report (makes it easier to work with differentially bound peaksets)
Changes to defaults:
Change: default score is now DBA_SCORE_TMM_MINUS_FULL instead of DBA_SCORE_TMM_MINUS_EFFECTIVE in dba.count
Change: default value for bFullLibrarySize is now TRUE in dba.analyze
New: add bCorPlot option to DBA$config to turn off CorPlot by default
Various bugfixes, improved warnings, updated documentation
New: Low memory counting of bam files using Rsamtools and summarizeOverlaps (bLowMem in dba.count)
New: Ability to read in externally derived counts (e.g. from HTSeq) (dba.count)
Improved: Features to deal with filtering intervals based on read scores (dba.count)
Change parameter name: maxFilter -> filter
Allow maxFilter to be a numerical vector to retrieve filtering rate
Add parameter: filterFun to control filtering method
New: Support for SummarizedExperiment objects (dba and dba.report)
Add bSummarizedExperiment option to dba() to convert DBA object
Add DataType = DBA_DATA_SUMMARIZED_REPORT option to dba.report() to return SummarizedExperiment
Documentation:
Add section to vignette showing how to obtain the full tamoxifen resistance dataset
Add script (tamoxifen_GEO.R) and sample sheet (tamoxifen_GEO.csv) to extras for full tamoxifen dataset
Add examples to man page for dba.count to show filtering
Add examples to man pages for dba and dba.report to show retrieval of SummarizedExperiment objects
Update and cleanup vignette and man pages
Various bugfixes and improved warnings
Plotting
dba.plotMA
Smooth plots now default
Added fold parameter in addition to th (threshold)
dba.plotHeatmap
Side colorbars added
Add support for specifying sample mask to include any subset of samples in a contrast plot, including samples that were not in the original contrast
dba.plotVenn
Changed plotter from limma to T. Girke's overLapper
Added support for 4-way Venns (also in dba/overlap)
dba.plotPCA
Add support for specifying sample mask to include any subset of samples in a contrast plot, including samples that were not in the original contrast
Peaksets (dba and dba.peakset)
Peakset formats
narrowPeak format supported
Can override file format, score column, and score orientation defaults for supported peak callers
Consensus peaksets
Added ability to generate sets of consensus peaksets based on metadata attributes: for example create consensus peaksets for each tissue type and/or condition, or for all unique samples by taking the consensus of their replicate peaksets
Read counting (dba.count)
Compute Signal-to-Noise ratio when counting
Added bScaleControl to down-scale control reads by default
Add option to specify a mask in the peaks parameter to limit which peaksets are used for a consensus by overlap. Works with new consensus peakset options in dba.peakset
Remove references to support for SAM files
Analysis (dba.analyze)
edgeR: updated calls to match changes in edgeR; updated vignette and references
DESeq: updated to work with current DESeq; use pooled-CR dispersion estimation method for blocking analysis; update vignette
Various bug fixes; more informative warnings; update documentation including vignette, new examples and cross-referencing in man pages
GRanges is now the default class for peaksets and reports instead of RangedData, controlled by the DataType parameter.
Both analysis methods (edgeR and DESeq) use generalized linear models (GLMs) for two-group contrasts by default.
Blocking factors (for two-factor analysis) can be specified flexibly such that arbitrary blocking factors can be used.
Section added to vignette showing an analysis using a blocking factor.
Added new metadata type, DBA_TREATMENT.
New DBA_SCORE_ options for specifying scoring method, including TMM normalized counts, and ability to change scoring method on the fly in dba.plotHeatmap and dba.plotPCA when plotting global binding matrix.
bRemoveDuplicates parameter in dba.count allows duplicate reads to be discarded when computing counts
More efficient use of memory when analyzing (controlled by bReduceObjects parameter in dba.analyze).
Various bugs fixed, man pages updated, and warning messages added.
Initial release