NEWS
chipenrich 2.16.0
- Transition to Kai Wang as maintainer.
chipenrich 2.10.0
NEW FEATURES
- A new test, proxReg(), tests for genomic region binding proximity to either
gene transcription start sites or enhancer regions within gene sets. Used as an
addendum to any gene set enrichment test, not exclusive to those in this package.
IMPROVEMENTS
- Poly-Enrich now uses the likelihood ratio test instead of the Wald test, as
LRT is more robust when using a negative binomial GLM.
BUG FIXES
- Poly-Enrich Approximate method that uses the score test now uses the correct
formula.
chipenrich 2.4.0
NEW FEATURES
- A new function, peaks2genes(), to run the analysis up to, but not including,
the enrichment testing. Useful for checking QC plots, checking the quality of
peak-to-gene assignments, and running custom tests more easily.
SIGNIFICANT USER-LEVEL CHANGES
- The hybridenrich() method now returns the same format as chipenrich() and
polyenrich()
IMPROVEMENTS
- Vignette now describes all available gene sets
BUG FIXES
- Fixed multiAssign weighting method to use the correct weights.
chipenrich 2.2.0
NEW FEATURES
- polyenrich now supports weighting peaks by signal value
- A hybrid test, hybridenrich(), is available for those unsure of which test
to use, chipenrich() or polyenrich().
- A new function, hybrid.join(), joins two different results files and gives
an adjusted set of p-values and FDR-adjusted p-values using the two.
- A new approximation method using the Score test is available for
quick results for chipenrich and polyenrich. Only recommended for
significantly enriched results, and not depleted results. ~30x faster.
IMPROVEMENTS
- Several updates to the vignette
chipenrich 2.0.0
NEW FEATURES
- A new method for enrichment, polyenrich(), is designed for gene set enrichment
of experiments where the presence of multiple peaks in a gene is accounted
for in the model. Use the polyenrich() function for this method.
- New features resulting from chipenrich.data 2.0.0:
- New genomes in chipenrich.data: danRer10, dm6, hg38, rn5, and rn6.
- Reactome for fly in chipenrich.data.
- Added locus definitions, GO gene sets, and Reactome gene sets for zebrafish.
- All genomes have the following locus definitions: nearest_tss, nearest_gene,
exon, intron, 1kb, 5kb, 10kb, 1kb_outside_upstream, 5kb_outside_upstream,
10kb_outside_upstream, 1kb_outside, 5kb_outside, and 10kb_outside.
IMPROVEMENTS
- The chipenrich method is now significantly faster. Chris Lee figured out
that spline calculations in chipenrich are not required for each gene set.
Now a spline is calculated as peak ~ s(log10_length) and used for all gene
sets. The correlation between the resulting p-values is nearly always 1.
Unfortunately, this approach cannot be used for broadenrich().
- The chipenrich(..., method='chipenrich', ...) function automatically
uses this faster method.
- Clarified documentation for the supported_locusdefs() to give explanations
for what each locus definition is.
- Use sys.call() to report options used in chipenrich() in opts.tab output. We
previously used as.list(environment()) which would also output entire
data.frames if peaks were loaded in as a data.frame.
- Various updates to the vignette to reflect new features.
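As an illustration of the opts.tab change above, the following base R sketch
contrasts the two approaches. The functions opts_from_env() and
opts_from_call() are hypothetical and are not part of chipenrich; they only
show why recording the call keeps a data.frame argument out of the stored
options.

```r
# Hypothetical helpers, not chipenrich code: two ways to record the options
# a function was called with.
opts_from_env <- function(peaks, genome) {
  # as.list(environment()) returns the evaluated arguments, so a data.frame
  # passed as `peaks` is stored in full.
  as.list(environment())
}

opts_from_call <- function(peaks, genome) {
  # sys.call() returns the unevaluated call, so only the expression the user
  # wrote for each argument is recorded.
  args <- as.list(sys.call())[-1]
  vapply(args, function(a) paste(deparse(a), collapse = ""), character(1))
}

my_peaks <- data.frame(chr = "chr1", start = c(100L, 500L), end = c(200L, 600L))

env_opts <- opts_from_env(peaks = my_peaks, genome = "hg19")
call_opts <- opts_from_call(peaks = my_peaks, genome = "hg19")

is.data.frame(env_opts$peaks)  # the whole data.frame was captured
call_opts[["peaks"]]           # only the text of the argument expression
```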
SIGNIFICANT USER-LEVEL CHANGES
- As a result of updates to chipenrich.data, ENRICHMENT RESULTS MAY DIFFER
between chipenrich 1.Y.Z and chipenrich 2.Y.Z. This is because revised
versions of all genomes have been used to update LocusDefinitions, and
GO and Reactome gene sets have been updated to more recent versions.
- The broadenrich method is now its own function, broadenrich(), instead
of chipenrich(..., method = 'broadenrich', ...).
- User interface for mappability has been streamlined. 'mappability' parameter
in broadenrich(), chipenrich(), and polyenrich() functions replaces the
three parameters previously used: 'use_mappability', 'mappa_file', and
'read_length'. The unified 'mappability' parameter can be 'NULL', a file path,
or a string indicating the read length for mappability, e.g. '24'.
- A formerly hidden API for randomizations to assess Type I Error rates for
data sets is now exposed to the user. Each of the enrich functions has a
'randomization' parameter. See documentation and vignette for details.
- Many functions with the 'genome' parameter had a default of 'hg19', which
was not ideal. Now users must specify a genome and it is checked against
supported_genomes().
- Input files are read according to their file extension. Supported extensions
are bed, gff3, wig, bedGraph, narrowPeak, and broadPeak. Arbitrary extensions
are also supported, but there can be no header, and the first three columns
must be chr, start, and end.
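The three accepted forms of the unified 'mappability' value described above
(NULL, a file path, or a read length string such as '24') can be told apart
with a few checks. The sketch below is only an illustration in base R;
classify_mappability() is a hypothetical helper and does not reflect how the
package itself interprets the parameter.

```r
# Hypothetical helper, not chipenrich code: decide which of the three
# documented forms a 'mappability' value takes.
classify_mappability <- function(mappability) {
  if (is.null(mappability)) {
    return("none")
  }
  if (grepl("^[0-9]+$", mappability)) {
    # A string of digits is treated as a read length, e.g. '24'.
    return("read_length")
  }
  if (file.exists(mappability)) {
    return("file")
  }
  stop("mappability must be NULL, a file path, or a read length string")
}

user_file <- tempfile(fileext = ".txt")
writeLines("gene_id\tmappa", user_file)

classify_mappability(NULL)
classify_mappability("24")
classify_mappability(user_file)
```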
SIGNIFICANT BACKEND CHANGES
- Harmonize all code touching LocusDefinition and tss objects to reflect
changes in chipenrich.data 2.0.0.
- Alter setup_ldef() function to add symbol column. If a valid genome is
used, use orgDb to get eg2symbol mappings and fill in for the user. Users
can give their own symbol column, which will override using orgDb. Finally,
if neither a symbol column nor a valid genome is used, symbols are set to NA.
- Any instance of 'geneid' or 'names' used to refer to Entrez Gene IDs is now
'gene_id' for consistency.
- Refactor read_bed() function as a wrapper for rtracklayer::import().
- Automatic extension handling of BED3-6, gff3, wig, or bedGraph.
- With some additional code, automatic extension handling of narrowPeak
and broadPeak.
- Backwards compatible with arbitrary extensions: this still assumes that
the first three columns are chr, start, end.
- The purpose of this refactor is to enable additional covariates for the
peaks for possible use in future methods.
- Refactor load_peaks() to use GenomicRanges::makeGRangesFromDataFrame().
- Filtering gene sets is now based on the locus definition, and can be done
from below (min) or above (max). Defaults are 15 and 2000, respectively.
- Randomizations are all done on the LocusDefinition object.
- Added lots of unit tests to increase test coverage.
- Make Travis builds use sartorlab/chipenrich.data version of data package
for faster testing.
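The gene set size filter mentioned above (defaults of 15 and 2000) can be
pictured with the base R sketch below. It is not the package implementation:
filter_genesets() is a hypothetical helper, and it assumes that set size is
counted over genes present in the locus definition and that both bounds are
inclusive.

```r
# Hypothetical helper, not the chipenrich implementation: keep only gene sets
# whose size, counted over genes present in the locus definition, lies in
# [min_size, max_size] (inclusive bounds are an assumption).
filter_genesets <- function(genesets, ldef_genes, min_size = 15, max_size = 2000) {
  # Restrict every gene set to genes covered by the locus definition.
  restricted <- lapply(genesets, intersect, ldef_genes)
  sizes <- lengths(restricted)
  restricted[sizes >= min_size & sizes <= max_size]
}

ldef_genes <- 1:3000  # toy Entrez Gene IDs covered by a locus definition
genesets <- list(
  too_small = 1:10,
  kept      = 1:100,
  too_large = 1:2500,
  off_ldef  = 2990:3020  # only 11 of these IDs are in the locus definition
)

names(filter_genesets(genesets, ldef_genes))
```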
DEPRECATED AND DEFUNCT
- Calling the broadenrich method with chipenrich(..., method = 'broadenrich', ...)
is no longer valid. Instead, use broadenrich().
- Various utility functions that were used in the original development have
been removed. Users never saw or used them.
BUG FIXES
- Fixed a bug in randomization with length bins where randomizations would
sort genes by Entrez ID, an artifact that distorted the Type I error rate.
- Fixed a bug where the dependent variable used in the enrichment model
was used to name the rows of the enrichment results. This could be confusing
for users. Now, rownames are simply integers.
- Fixed a bug, left over from initial development, that expected the result
of read_bed() to be a list of IRanges. Large speed improvement.
chipenrich 1.12.1
BUG FIXES
- Fixed a bug in the check for proper organism + geneset combinations. The bug
prevented combinations that are actually valid from running.
chipenrich 1.12.0
IMPROVEMENTS
- Improve supported_*() functions to report and check combinations of genome,
organism, genesets, locusdef, and mappability read length.
- Cleanup DESCRIPTION and NAMESPACE to avoid loading entire packages.
- Assign peaks using a GenomicRanges object rather than a list of IRanges.
- Follow data() best practices.
USER-INVISIBLE CHANGES
- Transition documentation to roxygen2 blocks.
- Improve commenting in chipenrich() function.
- Rewrite package vignette in Rmarkdown and render with knitr.
chipenrich 1.4.0
NEW FEATURES
- A new method, broadenrich, is available in the chipenrich function which is
designed for gene set enrichment on broad genomic regions, such as peaks resulting
from histone modification based ChIP-seq experiments.
- Methods chipenrich and broadenrich are available in multicore versions (on every
platform except Windows). The user selects the number of cores when calling
the chipenrich function.
- Peaks downloaded from the ENCODE Consortium as .broadPeak or .narrowPeak files
are supported directly.
- Peaks downloaded from the modENCODE Consortium as .bed.gff or .bed.gff3 files are
also supported directly.
- Support for D. melanogaster (dm3) genome and enrichment testing for GO terms
from all three branches (GOBP, GOCC, and GOMF).
- New gene sets from Reactome (http://www.reactome.org) for human, mouse, and rat.
- New example histone data set, peaks_H3K4me3_GM12878, based on hg19.
- New locus definitions including: introns, 10kb within TSS, and 10kb upstream of TSS.
chipenrich 1.0
PKG FEATURES
- chipenrich performs gene set enrichment tests on peaks called from
a ChIP-seq experiment
- chipenrich empirically corrects for confounding factors such as
the length of genes and mappability of sequence surrounding genes
- Use multiple definitions of a gene "locus" when testing for enrichment,
or provide your own definition
- Test for enrichment using chipenrich or Fisher's exact test (should only
be used for datasets where peaks are close to TSSs, see docs)
- Test multiple sets of genesets (Gene Ontology, KEGG, Biocarta, OMIM, etc.)
- Multiple plots to describe binding distance and likelihood of a peak
as a function of gene length
- Support for human (hg19), mouse (mm9), and rat (rn4) genomes
- Many conveniences such as seeing which peaks were assigned to genes,
their position relative to those genes and their TSS, etc.
- See how many peaks were assigned to each gene along with the length and
mappability of the gene
chipenrich 0.99.2
USER-VISIBLE CHANGES
- Updated examples for various functions to be runnable (removed donttest)
- Updated DESCRIPTION to use Imports: rather than Depends:
- Updated license to GPL-3
- Updated NEWS file for bioconductor guidelines
BUG FIXES
- Added a correction for the case where a small gene set has a peak in
every gene. This makes a very small number of tests slightly conservative,
with the benefit of actually being able to return a p-value for them.
chipenrich 0.99.1
USER-VISIBLE CHANGES
- Minor updates to documentation for Bioconductor
chipenrich 0.99.0
NEW FEATURES
- Initial submission to Bioconductor
chipenrich 0.9.6
NEW FEATURES
- Added peaks per gene as a returned object / output file
chipenrich 0.9.5
BUG FIXES
- Update to handle Bioconductor IRanges' new "functionality" for distanceToNearest and distance
USER-VISIBLE CHANGES
- Changed sorting of results to put enriched terms first (sorted by p-value), then depleted (also sorted by p-value)
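The sort order described above (enriched terms first, then depleted, each
group ordered by p-value) can be reproduced with base R order(). The table and
its column names below are made up for illustration and are not the package's
output schema.

```r
# Toy results table; column names are illustrative only.
results <- data.frame(
  geneset = c("A", "B", "C", "D"),
  status  = c("depleted", "enriched", "enriched", "depleted"),
  p_value = c(0.001, 0.04, 0.002, 0.03),
  stringsAsFactors = FALSE
)

# Enriched rows first, then depleted; ascending p-value within each group.
ord <- order(results$status != "enriched", results$p_value)
sorted <- results[ord, ]
sorted$geneset
```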
chipenrich 0.9.4
USER-VISIBLE CHANGES
- Minor changes to vignette and documentation
chipenrich 0.9.3
BUG FIXES
- chipenrich() will correctly open both .bed and .bed.gz files now
chipenrich 0.9.2
NEW FEATURES
- Added ability for user to input their own locus definition file (pass the full path to a file as the locusdef argument)
- Added a data frame to the results object that gives the arguments/values passed to chipenrich, also written to file *_opts.tab
- For FET and chipenrich methods, the outcome variable can be recoded to be >= 1 peak, 2 peaks, 3 peaks, etc. using the num_peak_threshold parameter
- Added a parameter to set the maximum size of gene set that should be tested (defaults to 2000)
USER-VISIBLE CHANGES
- Previously only peak midpoints were given in the peak --> gene assignments file, now the original peak start/ends are also given
- Updated help/man with new parameters and more information about the results
BUG FIXES
- Fixed an issue where the status in results was not set to enriched when the odds ratio was infinite, or to depleted when the odds ratio was exactly zero
chipenrich 0.9.1
NEW FEATURES
- Added a QC plot for expected # of peaks and actual # of peaks vs. gene locus length. This will be automatically created if qc_plots is TRUE, or the plots can be created using the plot_expected_peaks function.
- Distance to TSS is now signed for upstream (-) and downstream (+) of TSS
- Column added to indicate whether the geneset is enriched or depleted
chipenrich 0.9
NEW FEATURES
- Added support for reading BED files natively
BUG FIXES
- Fixed bug where invalid geneset in chipenrich() wasn't detected properly
chipenrich 0.8
BUG FIXES
- Fixed crash when mappability contained an NA (will be removed from DB in future version)
chipenrich 0.7
USER-VISIBLE CHANGES
- Updated binomial test to sum gene locus lengths to get genome length and remove genes that are not present in the set of genes being tested
- Updated spline fit plot to take into account mappability if requested (log mappable locus length plotted instead of simply log locus length)
- Removed SAMPLEABLE_GENOME* constants since they are no longer needed
- Updated help files to reflect changes to plot_spline_length and chipenrich functions
BUG FIXES
- Fixed bug where results for multiple gene set types (e.g. doing BioCarta and KEGG together) were not sorted by p-value
chipenrich 0.6
BUG FIXES
- Fixed bug where 1kb/5kb locusdefs could fail if not all peaks were assigned to a gene
chipenrich 0.5
USER-VISIBLE CHANGES
- Updated help to explain new mappability model
- Changed how mappability is handled - now multiplies gene locus length by mappability, rather than adjusting as a spline term
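As a small worked example of the mappability change above, the base R sketch
below scales each gene locus length by its mappability and takes the log of
the result. The data and column names are made up for illustration.

```r
# Toy gene loci; values and column names are illustrative only.
loci <- data.frame(
  gene_id     = c(1L, 2L, 3L),
  length      = c(10000, 50000, 200000),
  mappability = c(1.0, 0.8, 0.5)
)

# Mappable locus length = locus length multiplied by mappability.
loci$mappable_length <- loci$length * loci$mappability
loci$log10_mappable_length <- log10(loci$mappable_length)
loci$mappable_length
```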