Title: | Normalization and difference calling in ChIP-seq data |
---|---|
Description: | Robust normalization and difference calling procedures for ChIP-seq and alike data. Read counts are modeled jointly as a binomial mixture model with a user-specified number of components. A fitted background estimate accounts for the effect of enrichment in certain regions and, therefore, represents an appropriate null hypothesis. This robust background is used to identify significantly enriched or depleted regions. |
Authors: | Johannes Helmuth [aut, cre], Ho-Ryun Chung [aut] |
Maintainer: | Johannes Helmuth <[email protected]> |
License: | GPL-2 |
Version: | 1.33.0 |
Built: | 2024-12-13 06:15:39 UTC |
Source: | https://github.com/bioc/normr |
Difference calling between treatment
(ChIP-seq 1) and control
(ChIP-seq 2) in normR is done by fitting background and two conditional
enrichments simultaenously. Therefore, a mixture of three binomials is fit
to the data with Expectation Maximization (EM). After convergence of the EM,
the fitted background component is used to calculate significance for
treatment and control count pair. Based on this statistic, user can extract
significantly enriched/depleted regions in a condition with a desired
significance level. These regions can be further analyzed within R or
exported (see NormRFit-class
). Furthermore, diffR
calculates a standardized conditional-specific enrichment given the
fitted background component. See also Details
diffR(treatment, control, genome, ...) ## S4 method for signature 'integer,integer,GenomicRanges' diffR(treatment, control, genome, procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,GenomicRanges' diffR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,data.frame' diffR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,character' diffR(treatment, control, genome = "", countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05)
diffR(treatment, control, genome, ...) ## S4 method for signature 'integer,integer,GenomicRanges' diffR(treatment, control, genome, procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,GenomicRanges' diffR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,data.frame' diffR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,character' diffR(treatment, control, genome = "", countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05)
treatment |
An |
control |
An |
genome |
Either |
... |
Optional arguments for the respective implementations of
|
procs |
An |
verbose |
A |
eps |
A |
iterations |
An |
minP |
An |
countConfig |
A |
Supplied count vectors for treatment and control should be of same length
and of type integer
.
For convenience, read count vectors can be obtained directly from bam files.
In this case, please specify a bam file for treatment and control each and a
genome
. Bam files should be indexed using samtools (i.e.
samtools index file file.bai). Furthermore, bam files should contain a valid
header with given chromosome names. If genome == NULL
(default),
chromosome names will be read from treatment bamheader. Please be aware that
bamheader might contain irregular contigs and chrM which influence the fit.
Also be sure that treatment and control contain the same chromosomes.
Otherwise an error will be thrown. If genome
is a character
,
fetchExtendedChromInfoFromUCSC
is used to
resolve this to a valid UCSC genome identifier (see
https://genome.ucsc.edu/cgi-bin/hgGateway for available genomes). In
this case, only assembled molecules will be considered (no circular). Please
check if your bam files obey this annotation. If genome
is a
data.frame
, it represents the chromosome specification. The first
column will be used as chromosome ID and the second column will be used as
the chromosome lengths. If genome
is a GenomicRanges
, it
should contain the equally sized genomic loci to count in, e.g. promoters.
The binsize in the supplied NormRCountConfig is ignore in this case.
bamCountConfig
is an instance of class NormRCountConfig
specifying settings for read counting on bam files. You can specify the
binsize, minimum mapping quality, shifting of read ends etc.. Please refer
to NormRFit-class
for details.
A NormRFit
container holding results of the fit
with type diffR
.
Johannes Helmuth [email protected]
NormRFit-class
for functions on accessing and
exporting the diffR fit. NormRCountConfig-class
for
configuration of the read counting procedure (binsize, mapping quality,...).
require(GenomicRanges) ### enrichR(): Calling Enrichment over Input #load some example bamfiles input <- system.file("extdata", "K562_Input.bam", package="normr") chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr") #region to count in (example files contain information only in this region) gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000)) #configure your counting strategy (see BamCountConfig-class) countConfiguration <- countConfigSingleEnd(binsize = 1000, mapq = 30, shift = 100) #invoke enrichR to call enrichment enrich <- enrichR(treatment = chipK4, control = input, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) #inspect the fit enrich summary(enrich) ## Not run: #write significant regions to bed #exportR(enrich, filename = "enrich.bed", fdr = 0.01) #write normalized enrichment to bigWig #exportR(enrich, filename = "enrich.bw") ## End(**Not run**) ### diffR(): Calling differences between two conditions chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr") diff <- diffR(treatment = chipK36, control = chipK4, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(diff) ### regimeR(): Identification of broad and peak enrichment regime <- regimeR(treatment = chipK36, control = input, models = 3, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(regime)
require(GenomicRanges) ### enrichR(): Calling Enrichment over Input #load some example bamfiles input <- system.file("extdata", "K562_Input.bam", package="normr") chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr") #region to count in (example files contain information only in this region) gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000)) #configure your counting strategy (see BamCountConfig-class) countConfiguration <- countConfigSingleEnd(binsize = 1000, mapq = 30, shift = 100) #invoke enrichR to call enrichment enrich <- enrichR(treatment = chipK4, control = input, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) #inspect the fit enrich summary(enrich) ## Not run: #write significant regions to bed #exportR(enrich, filename = "enrich.bed", fdr = 0.01) #write normalized enrichment to bigWig #exportR(enrich, filename = "enrich.bw") ## End(**Not run**) ### diffR(): Calling differences between two conditions chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr") diff <- diffR(treatment = chipK36, control = chipK4, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(diff) ### regimeR(): Identification of broad and peak enrichment regime <- regimeR(treatment = chipK36, control = input, models = 3, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(regime)
Enrichment calling between treatment
(ChIP-seq) and control
(Input) in normR is done by fitting background and enrichment
simultaenously. Therefore, a mixture of two binomials is fit to the data
with Expectation Maximization (EM). After convergence of the EM, the fitted
background component is used to calculate significance for treatment and
control count pair. Based on this statistic, user can extract significantly
enriched regions with a desired significance level. These regions can be
further analyzed within R or exported (see NormRFit-class
).
Furthermore, enrichR calculates a standardized enrichment given the fitted
background component. See also Details
enrichR(treatment, control, genome, ...) ## S4 method for signature 'integer,integer,GenomicRanges' enrichR(treatment, control, genome, procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,GenomicRanges' enrichR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,data.frame' enrichR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,character' enrichR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05)
enrichR(treatment, control, genome, ...) ## S4 method for signature 'integer,integer,GenomicRanges' enrichR(treatment, control, genome, procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,GenomicRanges' enrichR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,data.frame' enrichR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,character' enrichR(treatment, control, genome, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05)
treatment |
An |
control |
An |
genome |
Either |
... |
Optional arguments for the respective implementations of
|
procs |
An |
verbose |
A |
eps |
A |
iterations |
An |
minP |
An |
countConfig |
A |
Supplied count vectors for treatment and control should be of same length
and of type integer
.
For convenience, read count vectors can be obtained directly from bam files.
In this case, please specify a bam file for treatment and control each and a
genome
. Bam files should be indexed using samtools (i.e.
samtools index file file.bai). Furthermore, bam files should contain a valid
header with given chromosome names. If genome == NULL
(default),
chromosome names will be read from treatment bamheader. Please be aware that
bamheader might contain irregular contigs and chrM which influence the fit.
Also be sure that treatment and control contain the same chromosomes.
Otherwise an error will be thrown. If genome
is a character
,
fetchExtendedChromInfoFromUCSC
is used to
resolve this to a valid UCSC genome identifier (see
https://genome.ucsc.edu/cgi-bin/hgGateway for available genomes). In
this case, only assembled molecules will be considered (no circular). Please
check if your bam files obey this annotation. If genome
is a
data.frame
, it represents the chromosome specification. The first
column will be used as chromosome ID and the second column will be used as
the chromosome lengths. If genome
is a GenomicRanges
, it
should contain the equally sized genomic loci to count in, e.g. promoters.
The binsize in the supplied NormRCountConfig is ignore in this case.
bamCountConfig
is an instance of class NormRCountConfig
specifying settings for read counting on bam files. You can specify the
binsize, minimum mapping quality, shifting of read ends etc.. Please refer
to NormRFit-class
for details.
A NormRFit
container holding results of the fit
with type enrichR
.
Johannes Helmuth [email protected]
NormRFit-class
for functions on accessing and
exporting the enrichR fit. NormRCountConfig-class
for
configuration of the read counting procedure (binsize, mapping quality,...).
require(GenomicRanges) ### enrichR(): Calling Enrichment over Input #load some example bamfiles input <- system.file("extdata", "K562_Input.bam", package="normr") chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr") #region to count in (example files contain information only in this region) gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000)) #configure your counting strategy (see BamCountConfig-class) countConfiguration <- countConfigSingleEnd(binsize = 1000, mapq = 30, shift = 100) #invoke enrichR to call enrichment enrich <- enrichR(treatment = chipK4, control = input, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) #inspect the fit enrich summary(enrich) ## Not run: #write significant regions to bed #exportR(enrich, filename = "enrich.bed", fdr = 0.01) #write normalized enrichment to bigWig #exportR(enrich, filename = "enrich.bw") ## End(**Not run**) ### diffR(): Calling differences between two conditions chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr") diff <- diffR(treatment = chipK36, control = chipK4, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(diff) ### regimeR(): Identification of broad and peak enrichment regime <- regimeR(treatment = chipK36, control = input, models = 3, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(regime)
require(GenomicRanges) ### enrichR(): Calling Enrichment over Input #load some example bamfiles input <- system.file("extdata", "K562_Input.bam", package="normr") chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr") #region to count in (example files contain information only in this region) gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000)) #configure your counting strategy (see BamCountConfig-class) countConfiguration <- countConfigSingleEnd(binsize = 1000, mapq = 30, shift = 100) #invoke enrichR to call enrichment enrich <- enrichR(treatment = chipK4, control = input, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) #inspect the fit enrich summary(enrich) ## Not run: #write significant regions to bed #exportR(enrich, filename = "enrich.bed", fdr = 0.01) #write normalized enrichment to bigWig #exportR(enrich, filename = "enrich.bw") ## End(**Not run**) ### diffR(): Calling differences between two conditions chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr") diff <- diffR(treatment = chipK36, control = chipK4, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(diff) ### regimeR(): Identification of broad and peak enrichment regime <- regimeR(treatment = chipK36, control = input, models = 3, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(regime)
A correct background estimation is crucial for calling enrichment and
differences in ChIP-seq data. normR
provides robust
normalization and difference calling in ChIP-seq and alike data. In brief, a
binomial mixture model with a given number of components is fit to read
count data for a treatment and control experiment. Therein, computational
performance is improved by fitting a log-space model via Expectation
Maximization in C++. Convergence is achieved by a threshold on
the minimum change in model loglikelihood. After the model fit has
converged, a robust background estimate is obtained. This estimate accounts
for the effect of enrichment in certain regions and, therefore, represents
an appropriate null hypothesis. This robust background is used to identify
significantly enriched or depleted regions with respect to control.
Moreover, a standardized enrichment for each bin is calculated based on the
fitted background component. For convenience, read count vectors can be
obtained directly from bam files when a compliant chromosome annotation is
given. Please refer to the individual documentations of functions for
enrichment calling (enrichR
), difference calling
(diffR
) and enrichment regime calling (regimeR
).
Available functions are
enrichR
: Enrichment calling between treatment
(e.g. ChIP-seq) and control
(e.g. Input).
diffR
: Difference calling between treatment
(e.g. ChIP-seq condition 1) and control
(e.g. ChIP-seq
condition 2).
regimeR
: Enrichment regime calling between treatment
(e.g. ChIP-seq) and control
(e.g. Input) with a
given number of model components. For example, 3 regimes recover background,
broad and peak enrichment.
The computational performance is improved by fitting a log-space model in C++. Parallization is achieved in C++ via OpenMP (http://openmp.org).
Johannes Helmuth [email protected]
NormRFit-class
for functions on accessing and
exporting the normR fit. NormRCountConfig-class
for
configuration of the read counting procedure (binsize, mapping quality,...).
require(GenomicRanges) ### enrichR(): Calling Enrichment over Input #load some example bamfiles input <- system.file("extdata", "K562_Input.bam", package="normr") chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr") #region to count in (example files contain information only in this region) gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000)) #configure your counting strategy (see BamCountConfig-class) countConfiguration <- countConfigSingleEnd(binsize = 1000, mapq = 30, shift = 100) #invoke enrichR to call enrichment enrich <- enrichR(treatment = chipK4, control = input, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) #inspect the fit enrich summary(enrich) ## Not run: #write significant regions to bed #exportR(enrich, filename = "enrich.bed", fdr = 0.01) #write normalized enrichment to bigWig #exportR(enrich, filename = "enrich.bw") ## End(**Not run**) ### diffR(): Calling differences between two conditions chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr") diff <- diffR(treatment = chipK36, control = chipK4, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(diff) ### regimeR(): Identification of broad and peak enrichment regime <- regimeR(treatment = chipK36, control = input, models = 3, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(regime)
require(GenomicRanges) ### enrichR(): Calling Enrichment over Input #load some example bamfiles input <- system.file("extdata", "K562_Input.bam", package="normr") chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr") #region to count in (example files contain information only in this region) gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000)) #configure your counting strategy (see BamCountConfig-class) countConfiguration <- countConfigSingleEnd(binsize = 1000, mapq = 30, shift = 100) #invoke enrichR to call enrichment enrich <- enrichR(treatment = chipK4, control = input, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) #inspect the fit enrich summary(enrich) ## Not run: #write significant regions to bed #exportR(enrich, filename = "enrich.bed", fdr = 0.01) #write normalized enrichment to bigWig #exportR(enrich, filename = "enrich.bw") ## End(**Not run**) ### diffR(): Calling differences between two conditions chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr") diff <- diffR(treatment = chipK36, control = chipK4, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(diff) ### regimeR(): Identification of broad and peak enrichment regime <- regimeR(treatment = chipK36, control = input, models = 3, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(regime)
This S4 class is a small wrapper for a configuration on obtaining counts from
bamfiles with
bamsignals::bamProfile()
. Herein,
two functions provide help for creating an instance of this class:
countConfigSingleEnd
creates a configuration for single end reads;
and countConfigPairedEnd
creates a configuration for paired end
reads.
## S4 method for signature 'ANY' countConfigSingleEnd(binsize = 250L, mapq = 20L, filteredFlag = 1024L, shift = 0L) ## S4 method for signature 'ANY' countConfigPairedEnd(binsize = 250L, mapq = 20L, filteredFlag = 1024L, shift = 0L, midpoint = TRUE, tlenFilter = c(70L, 200L)) ## S4 method for signature 'NormRCountConfig' getFilter(x) ## S4 method for signature 'NormRCountConfig' print(x, ...) ## S4 method for signature 'NormRCountConfig' show(object)
## S4 method for signature 'ANY' countConfigSingleEnd(binsize = 250L, mapq = 20L, filteredFlag = 1024L, shift = 0L) ## S4 method for signature 'ANY' countConfigPairedEnd(binsize = 250L, mapq = 20L, filteredFlag = 1024L, shift = 0L, midpoint = TRUE, tlenFilter = c(70L, 200L)) ## S4 method for signature 'NormRCountConfig' getFilter(x) ## S4 method for signature 'NormRCountConfig' print(x, ...) ## S4 method for signature 'NormRCountConfig' show(object)
binsize |
An |
mapq |
An |
filteredFlag |
An |
shift |
An |
midpoint |
Paired End data only: A |
tlenFilter |
An |
x |
A |
... |
optional arguments to be passed directly to the inherited function without alteration and with the original names preserved. |
object |
A |
A NormRCountConfig
with specified counting parameters
for normr
methods (enrichR
, diffR
,
regimeR
countConfigSingleEnd
: Setup single end count configuration
countConfigPairedEnd
: Setup paired end count configuration
getFilter
: Get the filter compliant to
bamsignals::bamProfile()
print
: Prints a given BamCounConfig
show
: Shows a given BamCounConfig
type
A character
of value paired.end
or
single.end
.
binsize
An integer
specifying the binsize in bp.
mapq
An integer
specifying the minimal mapping quality for a
read to be counted.
filteredFlag
An integer
to filter for in the SAMFLAG field.
For example, 1024 filters out marked duplicates (default). Refer to
https://broadinstitute.github.io/picard/explain-flags.html for
details.
shift
An integer
specifing a shift of the read counting
position in 3'-direction. This can be handy in the analysis of chip-seq
data.
midpoint
Paired End data only: A logical
indicating whether
fragment midpoints instead of 5'-ends should be counted.
tlenFilter
Paired End data only: An integer
of length two
specifying the lower and upper length bound for a fragment to be considered.
The fragment length as estimated from alignment in paired end experiments
and written into the TLEN column.
Johannes Helmuth [email protected]
normr for functions that use this object.
### Create an instance of this class (see below for helper functions) # 250bp bins; single end reads; MAPQ>=10; no duplicates countConfig <- new("NormRCountConfig", type = "single.end", binsize = 250L, mapq = 10L, filteredFlag = 1024L, shift = 0L, midpoint = FALSE, tlenFilter = NULL) ### Counting configuration for Single End alignment files # 250bp bins (default); only reads with MAPQ>=20; move counting position 100bp countConfigurationSE <- countConfigSingleEnd(mapq = 20, shift = 100) countConfigurationSE ### Counting configuration for Paired End alignment files # 250bp bins; count center of fragments; only fragments with 70bp<=length<=200 countConfigurationPE <- countConfigPairedEnd(midpoint = TRUE, tlenFilter = c(70, 200)) countConfigurationPE
### Create an instance of this class (see below for helper functions) # 250bp bins; single end reads; MAPQ>=10; no duplicates countConfig <- new("NormRCountConfig", type = "single.end", binsize = 250L, mapq = 10L, filteredFlag = 1024L, shift = 0L, midpoint = FALSE, tlenFilter = NULL) ### Counting configuration for Single End alignment files # 250bp bins (default); only reads with MAPQ>=20; move counting position 100bp countConfigurationSE <- countConfigSingleEnd(mapq = 20, shift = 100) countConfigurationSE ### Counting configuration for Paired End alignment files # 250bp bins; count center of fragments; only fragments with 70bp<=length<=200 countConfigurationPE <- countConfigPairedEnd(midpoint = TRUE, tlenFilter = c(70, 200)) countConfigurationPE
This S4 class wraps a normR
fit containing counts, fit
configuration and results of the fit. Herein, functions for printing,
summarization and accessing are provided. The
functions enrichR
, diffR
and
regimeR
generate a container of this class to save results of
a normR binomial mixture fitting. Please refer to their documentation for
conventional usage of the normR package.
## S4 method for signature 'NormRFit,character' exportR(x, filename, fdr = 0.01, color = NA, type = c(NA, "bed", "bedGraph", "bigWig")) ## S4 method for signature 'NormRFit,missing' plot(x, y, ...) ## S4 method for signature 'NormRFit' getCounts(x) ## S4 method for signature 'NormRFit' getRanges(x, fdr = NA, k = NULL) ## S4 method for signature 'NormRFit' getPosteriors(x) ## S4 method for signature 'NormRFit' getEnrichment(x, B = NA, F = NA, standardized = TRUE, procs = 1L) ## S4 method for signature 'NormRFit' getPvalues(x, filtered = FALSE) ## S4 method for signature 'NormRFit' getQvalues(x) ## S4 method for signature 'NormRFit' getClasses(x, fdr = NA) ## S4 method for signature 'NormRFit' length(x) ## S4 method for signature 'NormRFit' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S4 method for signature 'NormRFit' show(object) ## S4 method for signature 'NormRFit' summary(object, print = TRUE, digits = 3, ...)
## S4 method for signature 'NormRFit,character' exportR(x, filename, fdr = 0.01, color = NA, type = c(NA, "bed", "bedGraph", "bigWig")) ## S4 method for signature 'NormRFit,missing' plot(x, y, ...) ## S4 method for signature 'NormRFit' getCounts(x) ## S4 method for signature 'NormRFit' getRanges(x, fdr = NA, k = NULL) ## S4 method for signature 'NormRFit' getPosteriors(x) ## S4 method for signature 'NormRFit' getEnrichment(x, B = NA, F = NA, standardized = TRUE, procs = 1L) ## S4 method for signature 'NormRFit' getPvalues(x, filtered = FALSE) ## S4 method for signature 'NormRFit' getQvalues(x) ## S4 method for signature 'NormRFit' getClasses(x, fdr = NA) ## S4 method for signature 'NormRFit' length(x) ## S4 method for signature 'NormRFit' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S4 method for signature 'NormRFit' show(object) ## S4 method for signature 'NormRFit' summary(object, print = TRUE, digits = 3, ...)
x |
A |
filename |
A |
fdr |
|
color |
Specified color(s) when printing a bed file. If |
type |
A |
y |
not used. |
... |
optional arguments to be passed directly to the inherited function without alteration and with the original names preserved. |
k |
|
B |
An |
F |
An |
standardized |
A |
procs |
An |
filtered |
A |
digits |
Number of digits to show in number formatting. |
object |
A |
print |
|
When working with instances of this S4 class, it is recommended to only use functions to access contents of this object. Internally, the class holds a map structure of unique elements to reduce memory requirements. #'
getCounts: A list
of length 2 with integer
for control
and treatment each.
getRanges: A GenomicRanges
object.
getPosteriors: A matrix
of posteriors for x@k
mixture
components
getEnrichment: A numeric
of length length(x@n)
giving
the normR computed enrichment.
getPvalues: A numeric
of length length(x@n)
giving
the normR computed Pvalues.
getQvalues: A numeric
of length length(x@filteredT)
giving the FDR-corrected q-values using Storey's method.
getClasses: A integer
specifying assignments of regions to
the mixture model. If x@type == "enrichR"
, it contains 1
for
enriched regions and NA
for non-enriched regions. If x@type ==
"diffR"
, it contains 1
for control-enriched regions, 2
for
treatment-enriched regions and NA
for non-enriched regions. If
x@type == "regimeR"
, it contains >= 1
for regime-enriched
regions and NA
for non-enriched regions.
exportR
: Export results of a normR fit to common file formats.
plot
: Plot a NormRFit.
getCounts
: Get count data for control and treatment.
getRanges
: Get the genomic coordinates of regions analyzed with
information about component assignment.
getPosteriors
: Get computed posteriors for each mixture component.
getEnrichment
: Get normalized enrichment.
getPvalues
: Get normR-computed P-values.
getQvalues
: Get FDR-corrected q-values.
getClasses
: Get component assignments for each region analyzed.
length
: Returns the number of regions analyzed.
print
: Prints a small summary on a NormRFit.
show
: Shows a small summary on a NormRFit.
summary
: Prints a concise summary of a NormRFit.
type
A character
representing the type of fit. One of
c("enrichR","diffR", "regimeR")
.
n
An integer
specifying the number of regions.
ranges
A GenomicRanges
specifying the genomic coordinates of
the regions.
k
An integer
giving the number of binomial mixture components.
B
An integer
specifying the index of the background component.
map
A vector of integer
holding a map to map back
counts
, lnposteriors
, lnenrichment
, lnpvals
,
lnqvals
and classes
. See low level function
normr:::map2uniquePairs
for how the map is generated.
counts
A list
of length two containing a vector of
integer
holding unique counts for control and treatment each. Use
getCounts
to retrieve original count matrix.
amount
A vector of integer
specifying the number of occurences
of each unique control / treatment count pair.
names
A character
of length two specifying the names for
control and treatment.
thetastar
A numeric
giving the calculated naive background
estimation, i.e. sum(getCounts(obj)[2,])/sum(getCounts(obj))
theta
A numeric
of length k
giving the normR fitted
parametrization of k
binomial mixture components.
mixtures
A numeric
of length k
giving the normR fitted
mixture proportions of k
binomial mixture components. Should add up
to one.
lnL
A vector of numeric
holding the log-likelihood-trace of
a normR model fit.
eps
A numeric
used as threshold for normR fit EM convergence.
lnposteriors
A matrix
with length(amount)
rows and
k
columns. It contains the ln posterior probabilities for each unique
control / treatment count pair. Use getPosteriors
to get the
posterior matrix for the original data.
lnenrichment
A numeric
of length length(amount)
holding
calculared normalized enrichment for each unique control / treatment count
pair. The enrichment is calculated with respect to the fitted component
B
. Use getEnrichment
to retrieve enrichment for the
original data.
lnpvals
A numeric
of length length(amount)
holding ln
P-values for each unique control / treatment count pair. Given
theta
of B
the signifcane of enrichment is assigned. Use
getPvalues
to retrieve Pvalues for original data.
thresholdT
An integer
giving the threshold used to filter
P-values for FDR correction. The T-Filter threshold is a calculated
population size for which the null hypothesis (theta
of B
) can
be rejected. eps
specifies the significance level.
filteredT
A vector of integer
giving indices of P-values
passing thresholdT
. Only these P-values will be considered for FDR
correction.
lnqvals
A numeric
of length length(filteredT)
holding
ln q-values (FDR correction). P-values are corrected for multiple testing
using Storey's method.
classes
A integer
of length length(amount)
specifying
the class assignments for each unique control / treatment count pair. These
class assignments are based on the normR model fit. For type ==
"enrichR"
, this vector contains either NA
(not enriched) or 1
(enriched). For type == "diffR"
, this vector contains NA
(unchanged), 1
(differential in ChIP-seq 1) and 2
(differential in ChIP-seq 2). For type == "regimeR"
, this vector
contains NA
(not enriched) and an arbitary number of enrichment class
>= 1
.
Johannes Helmuth [email protected]
normr for function creating this container
require(GenomicRanges) #Create a toy instance of type 'enrichR' fit <- new("NormRFit", type="enrichR", n=10L, ranges=GRanges("chr1", IRanges(seq(1,100,10), width=10)), k=2L, B=1L, map=rep(1:5,2), counts=list(1:5, 1:5), amount=rep(2L,5), names=c("chip", "input"), thetastar=.35, theta=c(.15,.55), mixtures=c(.9,.1), lnL=seq(-50,-1,10), eps=.001, lnposteriors=log(matrix(runif(10), ncol=2)), lnenrichment=log(runif(5,0,.2)), lnpvals=log(runif(5)), filteredT=2:5, thresholdT=1L, lnqvals=log(runif(5,0,.2)), classes=sample(1:2,5,TRUE)) #print some statistics on fits fit summary(fit) ## Not run: #write significant regions to bed #exportR(fit, filename = "enrich.bed", fdr = 0.1) #write normalized enrichment to bigWig #exportR(fit, filename = "enrich.bw") ## End(**Not run**) ###AccessorMethods #get original counts getCounts(fit) #get genomic coordinates for significant ranges as a GenomicRanges instance getRanges(fit, fdr = .1) getPosteriors(fit) getEnrichment(fit) getPvalues(fit) getQvalues(fit) getClasses(fit)
require(GenomicRanges) #Create a toy instance of type 'enrichR' fit <- new("NormRFit", type="enrichR", n=10L, ranges=GRanges("chr1", IRanges(seq(1,100,10), width=10)), k=2L, B=1L, map=rep(1:5,2), counts=list(1:5, 1:5), amount=rep(2L,5), names=c("chip", "input"), thetastar=.35, theta=c(.15,.55), mixtures=c(.9,.1), lnL=seq(-50,-1,10), eps=.001, lnposteriors=log(matrix(runif(10), ncol=2)), lnenrichment=log(runif(5,0,.2)), lnpvals=log(runif(5)), filteredT=2:5, thresholdT=1L, lnqvals=log(runif(5,0,.2)), classes=sample(1:2,5,TRUE)) #print some statistics on fits fit summary(fit) ## Not run: #write significant regions to bed #exportR(fit, filename = "enrich.bed", fdr = 0.1) #write normalized enrichment to bigWig #exportR(fit, filename = "enrich.bw") ## End(**Not run**) ###AccessorMethods #get original counts getCounts(fit) #get genomic coordinates for significant ranges as a GenomicRanges instance getRanges(fit, fdr = .1) getPosteriors(fit) getEnrichment(fit) getPvalues(fit) getQvalues(fit) getClasses(fit)
Regime enrichment calling between treatment
(ChIP-seq) and
control
(Input) in normR is done by fitting background and multiple
enrichment regimes simultaenously. Therefore, a mixture of models
binomials is fit to the data with Expectation Maximization (EM). After
convergence of the EM, the fitted background component is used to calculate
significance for treatment and control count pair. Based on this statistic,
user can extract significantly enriched regions with a desired significance
level. Regime assignments are done by Maximum A Posteriori. Regions can be
further analyzed within R or exported (see NormRFit-class
).
Furthermore, regimeR calculates a standardized enrichment given the fitted
background component. For example, 3 regimes discriminate background, broad
and peak enrichment. See also Details.
regimeR(treatment, control, genome, models, ...) ## S4 method for signature 'integer,integer,GenomicRanges,numeric' regimeR(treatment, control, genome, models = 3, procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,GenomicRanges,numeric' regimeR(treatment, control, genome, models = 3, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,data.frame,numeric' regimeR(treatment, control, genome, models = 3, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,character,numeric' regimeR(treatment, control, genome = "", models = 3, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05)
regimeR(treatment, control, genome, models, ...) ## S4 method for signature 'integer,integer,GenomicRanges,numeric' regimeR(treatment, control, genome, models = 3, procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,GenomicRanges,numeric' regimeR(treatment, control, genome, models = 3, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,data.frame,numeric' regimeR(treatment, control, genome, models = 3, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05) ## S4 method for signature 'character,character,character,numeric' regimeR(treatment, control, genome = "", models = 3, countConfig = countConfigSingleEnd(), procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05)
treatment |
An |
control |
An |
genome |
Either |
models |
An |
... |
Optional arguments for the respective implementations of
|
procs |
An |
verbose |
A |
eps |
A |
iterations |
An |
minP |
An |
countConfig |
A |
Supplied count vectors for treatment and control should be of same length
and of type integer
.
For convenience, read count vectors can be obtained directly from bam files.
In this case, please specify a bam file for treatment and control each and a
genome
. Bam files should be indexed using samtools (i.e.
samtools index file file.bai). Furthermore, bam files should contain a valid
header with given chromosome names. If genome == NULL
(default),
chromosome names will be read from treatment bamheader. Please be aware that
bamheader might contain irregular contigs and chrM which influence the fit.
Also be sure that treatment and control contain the same chromosomes.
Otherwise an error will be thrown. If genome
is a character
,
fetchExtendedChromInfoFromUCSC
is used to
resolve this to a valid UCSC genome identifier (see
https://genome.ucsc.edu/cgi-bin/hgGateway for available genomes). In
this case, only assembled molecules will be considered (no circular). Please
check if your bam files obey this annotation. If genome
is a
data.frame
, it represents the chromosome specification. The first
column will be used as chromosome ID and the second column will be used as
the chromosome lengths. If genome
is a GenomicRanges
, it
should contain the equally sized genomic loci to count in, e.g. promoters.
The binsize in the supplied NormRCountConfig is ignore in this case.
bamCountConfig
is an instance of class NormRCountConfig
specifying settings for read counting on bam files. You can specify the
binsize, minimum mapping quality, shifting of read ends etc.. Please refer
to NormRFit-class
for details.
A NormRFit
container holding results of the fit
with type regimeR
.
Johannes Helmuth [email protected]
NormRFit-class
for functions on accessing and
exporting the regimeR fit. NormRCountConfig-class
for
configuration of the read counting procedure (binsize, mapping quality,...).
require(GenomicRanges) ### enrichR(): Calling Enrichment over Input #load some example bamfiles input <- system.file("extdata", "K562_Input.bam", package="normr") chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr") #region to count in (example files contain information only in this region) gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000)) #configure your counting strategy (see BamCountConfig-class) countConfiguration <- countConfigSingleEnd(binsize = 1000, mapq = 30, shift = 100) #invoke enrichR to call enrichment enrich <- enrichR(treatment = chipK4, control = input, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) #inspect the fit enrich summary(enrich) ## Not run: #write significant regions to bed #exportR(enrich, filename = "enrich.bed", fdr = 0.01) #write normalized enrichment to bigWig #exportR(enrich, filename = "enrich.bw") ## End(**Not run**) ### diffR(): Calling differences between two conditions chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr") diff <- diffR(treatment = chipK36, control = chipK4, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(diff) ### regimeR(): Identification of broad and peak enrichment regime <- regimeR(treatment = chipK36, control = input, models = 3, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(regime)
require(GenomicRanges) ### enrichR(): Calling Enrichment over Input #load some example bamfiles input <- system.file("extdata", "K562_Input.bam", package="normr") chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr") #region to count in (example files contain information only in this region) gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000)) #configure your counting strategy (see BamCountConfig-class) countConfiguration <- countConfigSingleEnd(binsize = 1000, mapq = 30, shift = 100) #invoke enrichR to call enrichment enrich <- enrichR(treatment = chipK4, control = input, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) #inspect the fit enrich summary(enrich) ## Not run: #write significant regions to bed #exportR(enrich, filename = "enrich.bed", fdr = 0.01) #write normalized enrichment to bigWig #exportR(enrich, filename = "enrich.bw") ## End(**Not run**) ### diffR(): Calling differences between two conditions chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr") diff <- diffR(treatment = chipK36, control = chipK4, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(diff) ### regimeR(): Identification of broad and peak enrichment regime <- regimeR(treatment = chipK36, control = input, models = 3, genome = gr, countConfig = countConfiguration, iterations = 10, procs = 1, verbose = TRUE) summary(regime)