NEWS

chipenrich 2.16.0

Transition to Kai Wang as maintainer.

chipenrich 2.10.0

NEW FEATURES

A new test, proxReg(), tests for genomic region binding proximity to either gene transcription start sites or enhancer regions within gene sets. It can be used as an addendum to any gene set enrichment test, not only those in this package.

IMPROVEMENTS

Poly-Enrich now uses the likelihood ratio test instead of the Wald test, as LRT is more robust when using a negative binomial GLM.

BUG FIXES

The Poly-Enrich approximate method, which uses the score test, now uses the correct formula.

chipenrich 2.4.0

NEW FEATURES

A new function, peaks2genes(), runs the analysis up to, but not including, the enrichment testing. Useful for checking QC plots, assessing the quality of peak-to-gene assignments, and running custom tests more easily.

SIGNIFICANT USER-LEVEL CHANGES

The hybridenrich() method now returns the same format as chipenrich() and polyenrich()

IMPROVEMENTS

Vignette now describes all available gene sets

BUG FIXES

Fixed multiAssign weighting method to use the correct weights.

chipenrich 2.2.0

NEW FEATURES

polyenrich now supports weighting peaks by signal value
A hybrid test, hybridenrich(), is available for those unsure of which test to use, chipenrich() or polyenrich().
A function, hybrid.join(), joins two different results files and gives an adjusted set of p-values and FDR-adjusted p-values using the two.
A new approximation method using the score test is available for quick results for chipenrich and polyenrich. It is about 30x faster, but is recommended only for significantly enriched results, not depleted results.

IMPROVEMENTS

Several updates to the vignette

chipenrich 2.0.0

NEW FEATURES

A new method for enrichment, polyenrich(), is designed for gene set enrichment of experiments where the presence of multiple peaks in a gene is accounted for in the model. Use the polyenrich() function for this method.
New features resulting from chipenrich.data 2.0.0:
New genomes in chipenrich.data: danRer10, dm6, hg38, rn5, and rn6.
Reactome for fly in chipenrich.data.
Added locus definitions, GO gene sets, and Reactome gene sets for zebrafish.
All genomes have the following locus definitions: nearest_tss, nearest_gene, exon, intron, 1kb, 5kb, 10kb, 1kb_outside_upstream, 5kb_outside_upstream, 10kb_outside_upstream, 1kb_outside, 5kb_outside, and 10kb_outside.

IMPROVEMENTS

The chipenrich method is now significantly faster. Chris Lee figured out that spline calculations in chipenrich are not required for each gene set. Now a spline is calculated as peak ~ s(log10_length) and used for all gene sets. The correlation between the resulting p-values is nearly always 1. Unfortunately, this approach cannot be used for broadenrich().
The chipenrich(..., method='chipenrich', ...) function automatically uses this faster method.
Clarified documentation for the supported_locusdefs() to give explanations for what each locus definition is.
Use sys.call() to report options used in chipenrich() in opts.tab output. We previously used as.list(environment()) which would also output entire data.frames if peaks were loaded in as a data.frame.
Various updates to the vignette to reflect new features.
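
The difference between the two option-reporting approaches mentioned above can be sketched in base R. The function names below are hypothetical and are not chipenrich internals; the sketch only assumes that peaks are passed in as a data.frame.

```r
# Hypothetical functions for illustration; not chipenrich internals.

# Records the call as the user typed it: arguments stay as unevaluated
# expressions, so a data.frame argument appears only by name.
opts_from_call <- function(peaks, genome) {
  deparse(sys.call())
}

# Captures the evaluated arguments: the whole data.frame is included.
opts_from_environment <- function(peaks, genome) {
  as.list(environment())
}

peaks_df <- data.frame(chr = "chr1", start = c(100, 500), end = c(200, 650))

call_text <- opts_from_call(peaks_df, genome = "hg19")
env_opts  <- opts_from_environment(peaks_df, genome = "hg19")

print(call_text)                      # call text, with peaks_df shown by name
print(is.data.frame(env_opts$peaks))  # the full data.frame was captured
```

Recording the call keeps the options table small regardless of how large the peaks object is.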

SIGNIFICANT USER-LEVEL CHANGES

As a result of updates to chipenrich.data, ENRICHMENT RESULTS MAY DIFFER between chipenrich 1.Y.Z and chipenrich 2.Y.Z. This is because revised versions of all genomes have been used to update LocusDefinitions, and GO and Reactome gene sets have been updated to more recent versions.
The broadenrich method is now its own function, broadenrich(), instead of chipenrich(..., method = 'broadenrich', ...).
The user interface for mappability has been streamlined. The 'mappability' parameter in the broadenrich(), chipenrich(), and polyenrich() functions replaces the three parameters previously used: 'use_mappability', 'mappa_file', and 'read_length'. The unified 'mappability' parameter can be 'NULL', a file path, or a string indicating the read length for mappability, e.g. '24'.
A formerly hidden API for randomizations to assess Type I Error rates for data sets is now exposed to the user. Each of the enrich functions has a 'randomization' parameter. See documentation and vignette for details.
Many functions with the 'genome' parameter had a default of 'hg19', which was not ideal. Now users must specify a genome and it is checked against supported_genomes().
Input files are read according to their file extension. Supported extensions are bed, gff3, wig, bedGraph, narrowPeak, and broadPeak. Arbitrary extensions are also supported, but there can be no header, and the first three columns must be chr, start, and end.
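
As an illustration of the three forms the unified 'mappability' parameter can take, here is a minimal base R sketch. The helper classify_mappability() is hypothetical and not part of chipenrich, and the rule that any all-digit string is a read length is an assumption made for the example.

```r
# Hypothetical helper, not part of chipenrich: classify a unified
# 'mappability' argument into one of the three documented cases.
classify_mappability <- function(mappability = NULL) {
  if (is.null(mappability)) {
    return("none")            # NULL: no mappability adjustment
  }
  mappability <- as.character(mappability)
  if (grepl("^[0-9]+$", mappability)) {
    return("read_length")     # all digits, e.g. '24' (assumed rule)
  }
  "file"                      # anything else is treated as a file path
}

print(classify_mappability(NULL))
print(classify_mappability("24"))
print(classify_mappability("my_mappability.txt"))
```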

SIGNIFICANT BACKEND CHANGES

Harmonize all code touching LocusDefinition and tss objects to reflect changes in chipenrich.data 2.0.0.
Alter setup_ldef() function to add symbol column. If a valid genome is used, orgDb is used to get eg2symbol mappings and fill them in for the user. Users can give their own symbol column, which will override using orgDb. Finally, if neither a symbol column nor a valid genome is used, symbols are set to NA.
Any instance of 'geneid' or 'names' referring to Entrez Gene IDs is now 'gene_id' for consistency.
Refactor read_bed() function as a wrapper for rtracklayer::import().
Automatic extension handling of BED3-6, gff3, wig, or bedGraph.
With some additional code, automatic extension handling of narrowPeak and broadPeak.
Backwards compatible with arbitrary extensions: this still assumes that the first three columns are chr, start, end.
The purpose of this refactor is to enable additional covariates for the peaks for possible use in future methods.
Refactor load_peaks() to use GenomicRanges::makeGRangesFromDataFrame().
Filtering gene sets is now based on the locus definition, and can be done from below (min) or above (max). Defaults are 15 and 2000, respectively.
Randomizations are all done on the LocusDefinition object.
Added lots of unit tests to increase test coverage.
Make Travis builds use sartorlab/chipenrich.data version of data package for faster testing.
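
The gene set filtering described above can be sketched in base R as follows. The helper filter_genesets() and its argument names are hypothetical; treating both bounds as inclusive is an assumption, and only the defaults of 15 and 2000 come from the notes above.

```r
# Hypothetical helper, not part of chipenrich: keep gene sets whose size,
# counted over genes present in the locus definition, lies within
# [min_size, max_size] (inclusive bounds are an assumption).
filter_genesets <- function(genesets, ldef_genes, min_size = 15, max_size = 2000) {
  sizes <- vapply(
    genesets,
    function(ids) sum(unique(ids) %in% ldef_genes),
    integer(1)
  )
  genesets[sizes >= min_size & sizes <= max_size]
}

# Toy data: Entrez-like IDs as character strings.
ldef_genes <- as.character(1:100)
genesets <- list(
  too_small = as.character(1:5),
  kept      = as.character(1:20),
  outside   = as.character(101:130)  # 30 genes, none in the locus definition
)

filtered <- filter_genesets(genesets, ldef_genes)
print(names(filtered))
```

Counting only genes present in the locus definition means a gene set that looks large enough on paper can still be dropped, as with the "outside" set here.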

DEPRECATED AND DEFUNCT

Calling the broadenrich method with chipenrich(..., method = 'broadenrich', ...) is no longer valid. Instead, use broadenrich().
Various utility functions that were used in the original development have been removed. Users never saw or used them.

BUG FIXES

Fixed a bug in randomization with length bins where randomizations would, as an artifact, sort genes by Entrez ID, introducing problems in the Type I error rate.
Fixed a bug where the dependent variable used in the enrichment model was used to name the rows of the enrichment results. This could be confusing for users. Now, rownames are simply integers.
Fixed a bug that expected the result of read_bed() to be a list of IRanges, a holdover from initial development. This gives a large speed-up.

chipenrich 1.12.1

BUG FIXES

Fixed a bug in the check for proper organism + geneset combinations. The bug prevented combinations that are actually valid from running.

chipenrich 1.12.0

IMPROVEMENTS

Improve supported_*() functions to report and check combinations of genome, organism, genesets, locusdef, and mappability read length.
Cleanup DESCRIPTION and NAMESPACE to avoid loading entire packages.
Assign peaks using a GenomicRanges object rather than a list of IRanges.
Follow data() best practices.

USER-INVISIBLE CHANGES

Transition documentation to roxygen2 blocks.
Improve commenting in chipenrich() function.
Rewrite package vignette in Rmarkdown and render with knitr.

chipenrich 1.4.0

NEW FEATURES

A new method, broadenrich, is available in the chipenrich function which is designed for gene set enrichment on broad genomic regions, such as peaks resulting from histone modification-based ChIP-seq experiments.
Methods chipenrich and broadenrich are available in multicore versions (on every platform except Windows). The user selects the number of cores when calling the chipenrich function.
Peaks downloaded from the ENCODE Consortium as .broadPeak or .narrowPeak files are supported directly.
Peaks downloaded from the modENCODE Consortium as .bed.gff or .bed.gff3 files are also supported directly.
Support for D. melanogaster (dm3) genome and enrichment testing for GO terms from all three branches (GOBP, GOCC, and GOMF).
New gene sets from Reactome (http://www.reactome.org) for human, mouse, and rat.
New example histone data set, peaks_H3K4me3_GM12878, based on hg19.
New locus definitions including: introns, 10kb within TSS, and 10kb upstream of TSS.

chipenrich 1.0

PKG FEATURES

chipenrich performs gene set enrichment tests on peaks called from a ChIP-seq experiment
chipenrich empirically corrects for confounding factors such as the length of genes and mappability of sequence surrounding genes
Use multiple definitions of a gene "locus" when testing for enrichment, or provide your own definition
Test for enrichment using chipenrich or Fisher's exact test (should only be used for datasets where peaks are close to TSSs, see docs)
Test multiple sets of genesets (Gene Ontology, KEGG, Biocarta, OMIM, etc.)
Multiple plots to describe binding distance and likelihood of a peak as a function of gene length
Support for human (hg19), mouse (mm9), and rat (rn4) genomes
Many conveniences such as seeing which peaks were assigned to genes, their position relative to those genes and their TSS, etc.
See how many peaks were assigned to each gene along with the length and mappability of the gene

chipenrich 0.99.2

USER-VISIBLE CHANGES

Updated examples for various functions to be runnable (removed donttest)
Updated DESCRIPTION to use Imports: rather than Depends:
Updated license to GPL-3
Updated NEWS file for bioconductor guidelines

BUG FIXES

Added a correction for the case where a small gene set has a peak in every gene. This makes a very small number of tests slightly conservative, with the benefit of actually being able to return a p-value for them.

chipenrich 0.99.1

USER-VISIBLE CHANGES

Minor updates to documentation for Bioconductor

chipenrich 0.99.0

NEW FEATURES

Initial submission to Bioconductor

chipenrich 0.9.6

NEW FEATURES

Added peaks per gene as a returned object / output file

chipenrich 0.9.5

BUG FIXES

Update to handle Bioconductor/IRanges' new "functionality" for distanceToNearest and distance

USER-VISIBLE CHANGES

Changed sorting of results to put enriched terms first (sorted by p-value), then depleted (also sorted by p-value)
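
The new ordering can be sketched in base R with a two-key sort. The data frame and its column names (geneset, status, p_value) are hypothetical and need not match the actual results columns.

```r
# Hypothetical results table; column names are illustrative only.
results <- data.frame(
  geneset = c("A", "B", "C", "D"),
  status  = c("depleted", "enriched", "enriched", "depleted"),
  p_value = c(0.001, 0.04, 0.002, 0.2),
  stringsAsFactors = FALSE
)

# First key: enriched rows before all others (FALSE sorts before TRUE).
# Second key: ascending p-value within each status group.
sorted <- results[order(results$status != "enriched", results$p_value), ]
print(sorted)
```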

chipenrich 0.9.4

USER-VISIBLE CHANGES

Minor changes to vignette and documentation

chipenrich 0.9.3

NEW FEATURES

Addition of rat genome

BUG FIXES

chipenrich() will correctly open both .bed and .bed.gz files now

chipenrich 0.9.2

NEW FEATURES

Added ability for user to input their own locus definition file (pass the full path to a file as the locusdef argument)
Added a data frame to the results object that gives the arguments/values passed to chipenrich, also written to file *_opts.tab
For FET and chipenrich methods, the outcome variable can be recoded to be >= 1 peak, 2 peaks, 3 peaks, etc. using the num_peak_threshold parameter
Added a parameter to set the maximum size of gene set that should be tested (defaults to 2000)
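
The recoding controlled by num_peak_threshold can be sketched as below. The helper recode_outcome() is hypothetical, and the sketch assumes the outcome is 1 when a gene has at least num_peak_threshold peaks and 0 otherwise.

```r
# Hypothetical helper, not part of chipenrich: recode per-gene peak counts
# into a 0/1 outcome (1 when the count reaches the threshold).
recode_outcome <- function(num_peaks, num_peak_threshold = 1) {
  as.integer(num_peaks >= num_peak_threshold)
}

num_peaks <- c(0, 1, 2, 5)

print(recode_outcome(num_peaks))                          # 0 1 1 1
print(recode_outcome(num_peaks, num_peak_threshold = 2))  # 0 0 1 1
```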

USER-VISIBLE CHANGES

Previously only peak midpoints were given in the peak --> gene assignments file; now the original peak start/ends are also given
Updated help/man with new parameters and more information about the results

BUG FIXES

Fixed an issue where status in results was not reported as enriched when the odds ratio was infinite, or as depleted when the odds ratio was exactly zero

chipenrich 0.9.1

NEW FEATURES

Added a QC plot for expected # of peaks and actual # of peaks vs. gene locus length. This will be automatically created if qc_plots is TRUE, or the plots can be created using the plot_expected_peaks function.
Distance to TSS is now signed for upstream (-) and downstream (+) of TSS
Column added to indicate whether the geneset is enriched or depleted
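
A signed distance to TSS following the convention above (upstream negative, downstream positive) might be computed as in this base R sketch. The helper signed_dist_to_tss() is hypothetical, and flipping the sign for minus-strand genes is an assumption about how "upstream" is defined.

```r
# Hypothetical helper, not part of chipenrich. Upstream is taken relative to
# the direction of transcription, so the sign flips on the minus strand
# (an assumption for this sketch).
signed_dist_to_tss <- function(peak_mid, tss, strand) {
  d <- peak_mid - tss
  ifelse(strand == "-", -d, d)
}

peak_mid <- c(900, 1100, 1100)
tss      <- c(1000, 1000, 1000)
strand   <- c("+", "+", "-")

dist_to_tss <- signed_dist_to_tss(peak_mid, tss, strand)
print(dist_to_tss)
```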

chipenrich 0.9

NEW FEATURES

Added support for reading BED files natively

BUG FIXES

Fixed bug where invalid geneset in chipenrich() wasn't detected properly

chipenrich 0.8

BUG FIXES

Fixed crash when mappability contained an NA (will be removed from DB in future version)

chipenrich 0.7

USER-VISIBLE CHANGES

Updated binomial test to sum gene locus lengths to get genome length and remove genes that are not present in the set of genes being tested
Updated spline fit plot to take into account mappability if requested (log mappable locus length plotted instead of simply log locus length)
Removed SAMPLEABLE_GENOME* constants since they are no longer needed
Updated help files to reflect changes to plot_spline_length and chipenrich functions
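
The updated binomial test setup can be sketched with base R's binom.test(). All data below are made up. The null probability formula and the one-sided alternative are assumptions for illustration, not a statement of the package's exact implementation.

```r
# Hypothetical locus definition: one row per tested gene with its locus length.
ldef <- data.frame(
  gene_id = c("1", "2", "3", "4"),
  length  = c(1000, 3000, 2000, 4000),
  stringsAsFactors = FALSE
)
peaks_per_gene <- c("1" = 4, "2" = 1, "3" = 0, "4" = 1)  # made-up counts
geneset <- c("1", "3", "99")  # "99" is not among the genes being tested

# Genome length is the sum of the gene locus lengths.
genome_length <- sum(ldef$length)

# Remove gene set members that are not present in the tested genes.
geneset <- intersect(geneset, ldef$gene_id)

# Assumed null: a peak lands in the gene set with probability equal to the
# fraction of the summed locus length that the gene set covers.
p_null  <- sum(ldef$length[ldef$gene_id %in% geneset]) / genome_length
n_peaks <- sum(peaks_per_gene)
n_hits  <- sum(peaks_per_gene[geneset])

fit <- binom.test(n_hits, n_peaks, p = p_null, alternative = "greater")
print(fit$p.value)
```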

BUG FIXES

Fixed bug where results for multiple gene set types (e.g. doing BioCarta and KEGG together) were not sorted by p-value

chipenrich 0.6

BUG FIXES

Fixed bug where 1kb/5kb locusdefs could fail if not all peaks were assigned to a gene

chipenrich 0.5

USER-VISIBLE CHANGES

Updated help to explain new mappability model
Changed how mappability is handled - now multiplies gene locus length by mappability, rather than adjusting as a spline term
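
The mappability adjustment described above amounts to scaling each gene locus length by its mappability before it enters the model. A minimal sketch with made-up values and hypothetical column names:

```r
# Made-up gene table; column names are illustrative only.
genes <- data.frame(
  gene_id     = c("1", "2", "3"),
  length      = c(10000, 50000, 20000),
  mappability = c(1.0, 0.5, 0.8),
  stringsAsFactors = FALSE
)

# Mappable locus length: locus length multiplied by mappability.
genes$mappable_length <- genes$length * genes$mappability

# The log of the mappable length then replaces the log of the raw length.
genes$log10_mappable_length <- log10(genes$mappable_length)

print(genes)
```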