Title: | Combinatorial and Differential Chromatin State Analysis for ChIP-Seq Data |
---|---|
Description: | This package implements functions for combinatorial and differential analysis of ChIP-seq data. It includes uni- and multivariate peak-calling, export to genome browser viewable files, and functions for enrichment analyses. |
Authors: | Aaron Taudt, Maria Colome Tatche, Matthias Heinig, Minh Anh Nguyen |
Maintainer: | Aaron Taudt <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.33.1 |
Built: | 2024-12-18 08:40:52 UTC |
Source: | https://github.com/bioc/chromstaR |
This package implements functions for the combinatorial and differential analysis of ChIP-seq data. It was developed for histone modifications with a broad profile but is also suitable for the analysis of transcription factor binding data. A Hidden Markov Model with a mixture of Negative Binomials as emission densities is used to call peaks. Please refer to our manuscript at http://dx.doi.org/10.1101/038612 for a detailed description of the method.
The main function of this package is Chromstar
. For a detailed introduction type browseVignettes("chromstaR")
and read the vignette. Here is an overview of all plotting
functions.
Aaron Taudt, Maria Colome-Tatche, Matthias Heinig, Minh Anh Nguyen
A GRanges-class
object which contains binned read counts as meta data column counts
. It is output of the binReads
function.
Convert aligned reads in .bam or .bed(.gz) format into read counts in equidistant windows.
binReads( file, experiment.table = NULL, ID = NULL, assembly, bamindex = file, chromosomes = NULL, pairedEndReads = FALSE, min.mapq = 10, remove.duplicate.reads = TRUE, max.fragment.width = 1000, blacklist = NULL, binsizes = 1000, stepsizes = binsizes/2, reads.per.bin = NULL, bins = NULL, variable.width.reference = NULL, use.bamsignals = TRUE, format = NULL )
binReads( file, experiment.table = NULL, ID = NULL, assembly, bamindex = file, chromosomes = NULL, pairedEndReads = FALSE, min.mapq = 10, remove.duplicate.reads = TRUE, max.fragment.width = 1000, blacklist = NULL, binsizes = 1000, stepsizes = binsizes/2, reads.per.bin = NULL, bins = NULL, variable.width.reference = NULL, use.bamsignals = TRUE, format = NULL )
file |
A file with aligned reads. Alternatively a |
experiment.table |
An |
ID |
Optional ID to select a row from the |
assembly |
Please see |
bamindex |
BAM index file. Can be specified without the .bai ending. If the index file does not exist it will be created and a warning is issued. |
chromosomes |
If only a subset of the chromosomes should be binned, specify them here. |
pairedEndReads |
Set to |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
remove.duplicate.reads |
A logical indicating whether or not duplicate reads should be removed. |
max.fragment.width |
Maximum allowed fragment length. This is to filter out erroneously wrong fragments due to mapping errors of paired end reads. |
blacklist |
A |
binsizes |
An integer vector specifying the bin sizes to use. |
stepsizes |
An integer vector specifying the step size. One number can be given for each element in |
reads.per.bin |
Approximate number of desired reads per bin. The bin size will be selected accordingly. |
bins |
A |
variable.width.reference |
A BAM file that is used as reference to produce variable width bins. See |
use.bamsignals |
If |
format |
One of |
Convert aligned reads from .bam or .bed(.gz) files into read counts in equidistant windows (bins). This function uses GenomicRanges::countOverlaps
to calculate the read counts, or alternatively bamsignals::bamProfile
if option use.bamsignals
is set (only effective for .bam files).
If only one bin size was specified for option binsizes
, the function returns a single GRanges-class
object with meta data column 'counts' that contains the read count. If multiple binsizes
were specified , the function returns a named list()
of GRanges-class objects.
## Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the file into bin size 1000bp data(rn4_chrominfo) data(experiment_table) binned <- binReads(file, experiment.table=experiment_table, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') print(binned)
## Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the file into bin size 1000bp data(rn4_chrominfo) data(experiment_table) binned <- binReads(file, experiment.table=experiment_table, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') print(binned)
Fit a HMM to multiple ChIP-seq samples to determine the combinatorial state of genomic regions. Input is a list of uniHMM
s generated by callPeaksUnivariate
.
callPeaksMultivariate( hmms, use.states, max.states = NULL, per.chrom = TRUE, chromosomes = NULL, eps = 0.01, keep.posteriors = FALSE, num.threads = 1, max.time = NULL, max.iter = NULL, keep.densities = FALSE, verbosity = 1, temp.savedir = NULL )
callPeaksMultivariate( hmms, use.states, max.states = NULL, per.chrom = TRUE, chromosomes = NULL, eps = 0.01, keep.posteriors = FALSE, num.threads = 1, max.time = NULL, max.iter = NULL, keep.densities = FALSE, verbosity = 1, temp.savedir = NULL )
hmms |
A list of |
use.states |
A data.frame with combinatorial states which are used in the multivariate HMM, generated by function |
max.states |
Maximum number of combinatorial states to use in the multivariate HMM. The states are ordered by occurrence as determined from the combination of univariate state calls. |
per.chrom |
If |
chromosomes |
A vector specifying the chromosomes to use from the models in |
eps |
Convergence threshold for the Baum-Welch algorithm. |
keep.posteriors |
If set to |
num.threads |
Number of threads to use. Setting this to >1 may give increased performance. |
max.time |
The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default |
max.iter |
The maximum number of iterations for the Baum-Welch algorithm. The default |
keep.densities |
If set to |
verbosity |
Verbosity level for the fitting procedure. 0 - No output, 1 - Iterations are printed. |
temp.savedir |
A directory where to store intermediate results if |
Emission distributions from the univariate HMMs are used with a Gaussian copula to generate a multivariate emission distribution for each combinatorial state. This multivariate distribution is then kept fixed and the transition probabilities are fitted with a Baum-Welch. Please refer to our manuscript at http://dx.doi.org/10.1101/038612 for a detailed description of the method.
A multiHMM
object.
Aaron Taudt, Maria Colome Tatche
multiHMM
, callPeaksUnivariate
, callPeaksReplicates
# Get example BAM files for 2 different marks in hypertensive rat file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='SHR.*bam$')[c(1:2,6)] # Construct experiment structure exp <- data.frame(file=files, mark=c("H3K27me3","H3K27me3","H3K4me3"), condition=rep("SHR",3), replicate=c(1:2,1), pairedEndReads=FALSE, controlFiles=NA) states <- stateBrewer(exp, mode='combinatorial') # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, experiment.table=exp, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } # Call multivariate peaks multimodel <- callPeaksMultivariate(models, use.states=states, eps=1, max.time=60) # Check some plots heatmapTransitionProbs(multimodel) heatmapCountCorrelation(multimodel)
# Get example BAM files for 2 different marks in hypertensive rat file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='SHR.*bam$')[c(1:2,6)] # Construct experiment structure exp <- data.frame(file=files, mark=c("H3K27me3","H3K27me3","H3K4me3"), condition=rep("SHR",3), replicate=c(1:2,1), pairedEndReads=FALSE, controlFiles=NA) states <- stateBrewer(exp, mode='combinatorial') # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, experiment.table=exp, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } # Call multivariate peaks multimodel <- callPeaksMultivariate(models, use.states=states, eps=1, max.time=60) # Check some plots heatmapTransitionProbs(multimodel) heatmapCountCorrelation(multimodel)
Fit an HMM to multiple ChIP-seq replicates and derive correlation measures. Input is a list of uniHMM
s generated by callPeaksUnivariate
.
callPeaksReplicates( hmm.list, max.states = 32, force.equal = FALSE, eps = 0.01, max.iter = NULL, max.time = NULL, keep.posteriors = TRUE, num.threads = 1, max.distance = 0.2, per.chrom = TRUE )
callPeaksReplicates( hmm.list, max.states = 32, force.equal = FALSE, eps = 0.01, max.iter = NULL, max.time = NULL, keep.posteriors = TRUE, num.threads = 1, max.distance = 0.2, per.chrom = TRUE )
hmm.list |
A list of |
max.states |
The maximum number of combinatorial states to consider. The default (32) is sufficient to treat up to 5 replicates exactly and more than 5 replicates approximately. |
force.equal |
The default ( |
eps |
Convergence threshold for the Baum-Welch algorithm. |
max.iter |
The maximum number of iterations for the Baum-Welch algorithm. The default |
max.time |
The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default |
keep.posteriors |
If set to |
num.threads |
Number of threads to use. Setting this to >1 may give increased performance. |
max.distance |
This number is used as a cutoff to group replicates based on their distance matrix. The lower this number, the more similar replicates have to be to be grouped together. |
per.chrom |
If |
Output is a multiHMM
object with additional entry replicateInfo
. If only one uniHMM
was given as input, a simple list() with the replicateInfo
is returned.
Aaron Taudt
multiHMM
, callPeaksUnivariate
, callPeaksMultivariate
# Let's get some example data with 3 replicates file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, pattern="H3K27me3.*SHR.*bam$", full.names=TRUE)[1:3] # Obtain chromosome lengths. This is only necessary for BED files. BAM files are # handled automatically. data(rn4_chrominfo) # Define experiment structure exp <- data.frame(file=files, mark='H3K27me3', condition='SHR', replicate=1:3, pairedEndReads=FALSE, controlFiles=NA) # We use bin size 1000bp and chromosome 12 to keep the example quick binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, experiment.table=exp, assembly=rn4_chrominfo, chromosomes='chr12') } # The univariate fit is obtained for each replicate models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } # Obtain peak calls considering information from all replicates multi.model <- callPeaksReplicates(models, force.equal=TRUE, max.time=60, eps=1)
# Let's get some example data with 3 replicates file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, pattern="H3K27me3.*SHR.*bam$", full.names=TRUE)[1:3] # Obtain chromosome lengths. This is only necessary for BED files. BAM files are # handled automatically. data(rn4_chrominfo) # Define experiment structure exp <- data.frame(file=files, mark='H3K27me3', condition='SHR', replicate=1:3, pairedEndReads=FALSE, controlFiles=NA) # We use bin size 1000bp and chromosome 12 to keep the example quick binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, experiment.table=exp, assembly=rn4_chrominfo, chromosomes='chr12') } # The univariate fit is obtained for each replicate models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } # Obtain peak calls considering information from all replicates multi.model <- callPeaksReplicates(models, force.equal=TRUE, max.time=60, eps=1)
Fit a HMM to a ChIP-seq sample to determine the modification state of genomic regions, e.g. call peaks in the sample.
callPeaksUnivariate( binned.data, control.data = NULL, prefit.on.chr = NULL, short = TRUE, eps = 0.1, init = "standard", max.time = NULL, max.iter = 5000, num.trials = 1, eps.try = NULL, num.threads = 1, read.cutoff = TRUE, read.cutoff.quantile = 1, read.cutoff.absolute = 500, max.mean = Inf, post.cutoff = 0.5, control = FALSE, keep.posteriors = FALSE, keep.densities = FALSE, verbosity = 1 )
callPeaksUnivariate( binned.data, control.data = NULL, prefit.on.chr = NULL, short = TRUE, eps = 0.1, init = "standard", max.time = NULL, max.iter = 5000, num.trials = 1, eps.try = NULL, num.threads = 1, read.cutoff = TRUE, read.cutoff.quantile = 1, read.cutoff.absolute = 500, max.mean = Inf, post.cutoff = 0.5, control = FALSE, keep.posteriors = FALSE, keep.densities = FALSE, verbosity = 1 )
binned.data |
A |
control.data |
Input control for the experiment. A |
prefit.on.chr |
A chromosome that is used to pre-fit the Hidden Markov Model. Set to |
short |
If |
eps |
Convergence threshold for the Baum-Welch algorithm. |
init |
One of the following initialization procedures:
|
max.time |
The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default |
max.iter |
The maximum number of iterations for the Baum-Welch algorithm. The default |
num.trials |
The number of trials to run the HMM. Each time, the HMM is seeded with different random initial values. The HMM with the best likelihood is given as output. |
eps.try |
If code num.trials is set to greater than 1, |
num.threads |
Number of threads to use. Setting this to >1 may give increased performance. |
read.cutoff |
The default ( |
read.cutoff.quantile |
A quantile between 0 and 1. Should be near 1. Read counts above this quantile will be set to the read count specified by this quantile. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option |
read.cutoff.absolute |
Read counts above this value will be set to the read count specified by this value. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option |
max.mean |
If |
post.cutoff |
False discovery rate. codeNULL means that the state with maximum posterior probability will be chosen, irrespective of its absolute probability (default=codeNULL). |
control |
If set to |
keep.posteriors |
If set to |
keep.densities |
If set to |
verbosity |
Verbosity level for the fitting procedure. 0 - No output, 1 - Iterations are printed. |
This function is similar to callPeaksUnivariateAllChr
but allows to pre-fit on a single chromosome instead of the whole genome. This gives a significant performance increase and can help to converge into a better fit in case of unsteady quality for some chromosomes.
A uniHMM
object.
Aaron Taudt, Maria Colome Tatche
## Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the BED file into bin size 1000bp data(rn4_chrominfo) data(experiment_table) binned <- binReads(file, experiment.table=experiment_table, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') ## Fit the univariate Hidden Markov Model hmm <- callPeaksUnivariate(binned, max.time=60, eps=1) ## Check if the fit is ok plotHistogram(hmm)
## Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the BED file into bin size 1000bp data(rn4_chrominfo) data(experiment_table) binned <- binReads(file, experiment.table=experiment_table, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') ## Fit the univariate Hidden Markov Model hmm <- callPeaksUnivariate(binned, max.time=60, eps=1) ## Check if the fit is ok plotHistogram(hmm)
Fit a HMM to a ChIP-seq sample to determine the modification state of genomic regions, e.g. call peaks in the sample.
callPeaksUnivariateAllChr( binned.data, control.data = NULL, eps = 0.01, init = "standard", max.time = NULL, max.iter = NULL, num.trials = 1, eps.try = NULL, num.threads = 1, read.cutoff = TRUE, read.cutoff.quantile = 1, read.cutoff.absolute = 500, max.mean = Inf, post.cutoff = 0.5, control = FALSE, keep.posteriors = FALSE, keep.densities = FALSE, verbosity = 1 )
callPeaksUnivariateAllChr( binned.data, control.data = NULL, eps = 0.01, init = "standard", max.time = NULL, max.iter = NULL, num.trials = 1, eps.try = NULL, num.threads = 1, read.cutoff = TRUE, read.cutoff.quantile = 1, read.cutoff.absolute = 500, max.mean = Inf, post.cutoff = 0.5, control = FALSE, keep.posteriors = FALSE, keep.densities = FALSE, verbosity = 1 )
binned.data |
A |
control.data |
Input control for the experiment. A |
eps |
Convergence threshold for the Baum-Welch algorithm. |
init |
One of the following initialization procedures:
|
max.time |
The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default |
max.iter |
The maximum number of iterations for the Baum-Welch algorithm. The default |
num.trials |
The number of trials to run the HMM. Each time, the HMM is seeded with different random initial values. The HMM with the best likelihood is given as output. |
eps.try |
If code num.trials is set to greater than 1, |
num.threads |
Number of threads to use. Setting this to >1 may give increased performance. |
read.cutoff |
The default ( |
read.cutoff.quantile |
A quantile between 0 and 1. Should be near 1. Read counts above this quantile will be set to the read count specified by this quantile. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option |
read.cutoff.absolute |
Read counts above this value will be set to the read count specified by this value. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option |
max.mean |
If |
post.cutoff |
False discovery rate. codeNULL means that the state with maximum posterior probability will be chosen, irrespective of its absolute probability (default=codeNULL). |
control |
If set to |
keep.posteriors |
If set to |
keep.densities |
If set to |
verbosity |
Verbosity level for the fitting procedure. 0 - No output, 1 - Iterations are printed. |
The Hidden Markov Model which is used to classify the bins uses 3 states: state 'zero-inflation' with a delta function as emission densitiy (only zero read counts), 'unmodified' and 'modified' with Negative Binomials as emission densities. A Baum-Welch algorithm is employed to estimate the parameters of the distributions. Please refer to our manuscript at http://dx.doi.org/10.1101/038612 for a detailed description of the method.
A uniHMM
object.
Aaron Taudt, Maria Coome Tatche
Adjusts the peak calls of a uniHMM
, multiHMM
or combinedMultiHMM
object with a cutoff on the maximum-posterior within each peak. Higher values of maxPost.cutoff
mean less sensitive and more precise peak calls. Remaining peaks are kept intact, as opposed to function changePostCutoff
, where broad peaks are fragmented. This function was formerly called 'changeFDR' and is still available for backwards compatibiltiy.
changeMaxPostCutoff(model, maxPost.cutoff = 0.99, invert = FALSE) changeFDR(model, fdr = 0.01, invert = FALSE)
changeMaxPostCutoff(model, maxPost.cutoff = 0.99, invert = FALSE) changeFDR(model, fdr = 0.01, invert = FALSE)
model |
|
maxPost.cutoff |
A vector of values between 0 and 1 for each column in |
invert |
Select peaks below ( |
fdr |
Same as |
Each peak has a maximum-posterior (maxPostInPeak, between 0 and 1) associated. The sensitivity is adjusted with a simple cutoff on maxPostInPeak, e.g. for maxPost.cutoff = 0.99
only peaks with maxPostInPeak >= 0.99
will be selected.
The input object is returned with adjusted peak calls.
changeFDR
: This function was renamed to 'changeMaxPostCutoff' in chromstaR 1.5.1 but it still available for backwards compatibility.
Aaron Taudt
## Get an example uniHMM ## file <- system.file("data","H3K27me3-BN-rep1.RData", package="chromstaR") model <- get(load(file)) ## Compare fits with different fdrs plotHistogram(model) + ylim(0,0.25) + ylim(0,0.3) plotHistogram(changeMaxPostCutoff(model, maxPost.cutoff=0.99)) + ylim(0,0.3) plotHistogram(changeMaxPostCutoff(model, maxPost.cutoff=1-1e-12)) + ylim(0,0.3) ## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model) model.new <- changeMaxPostCutoff(model, maxPost.cutoff=0.9999, invert=FALSE) genomicFrequencies(model.new) ## Get an example combinedMultiHMM ## file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model) model.new <- changeMaxPostCutoff(model, maxPost.cutoff=0.9999, invert=FALSE) genomicFrequencies(model.new)
## Get an example uniHMM ## file <- system.file("data","H3K27me3-BN-rep1.RData", package="chromstaR") model <- get(load(file)) ## Compare fits with different fdrs plotHistogram(model) + ylim(0,0.25) + ylim(0,0.3) plotHistogram(changeMaxPostCutoff(model, maxPost.cutoff=0.99)) + ylim(0,0.3) plotHistogram(changeMaxPostCutoff(model, maxPost.cutoff=1-1e-12)) + ylim(0,0.3) ## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model) model.new <- changeMaxPostCutoff(model, maxPost.cutoff=0.9999, invert=FALSE) genomicFrequencies(model.new) ## Get an example combinedMultiHMM ## file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model) model.new <- changeMaxPostCutoff(model, maxPost.cutoff=0.9999, invert=FALSE) genomicFrequencies(model.new)
Adjusts the peak calls of a uniHMM
, multiHMM
or combinedMultiHMM
object with the given posterior cutoff.
changePostCutoff(model, post.cutoff = 0.5)
changePostCutoff(model, post.cutoff = 0.5)
model |
|
post.cutoff |
A vector of posterior cutoff values between 0 and 1 the same length as |
Posterior probabilities are between 0 and 1. Peaks are called if the posteriors for a state (univariate) or sample (multivariate) are >= post.cutoff
.
The input object is returned with adjusted peak calls.
Aaron Taudt
## Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the BED file into bin size 1000bp data(rn4_chrominfo) data(experiment_table) binned <- binReads(file, experiment.table=experiment_table, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') plotHistogram(binned) ## Fit HMM model <- callPeaksUnivariate(binned, keep.posteriors=TRUE, verbosity=0) ## Compare fits with different post.cutoffs plotHistogram(changePostCutoff(model, post.cutoff=0.01)) + ylim(0,0.25) plotHistogram(model) + ylim(0,0.25) plotHistogram(changePostCutoff(model, post.cutoff=0.99)) + ylim(0,0.25) ## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model) model.new <- changePostCutoff(model, post.cutoff=0.9999) genomicFrequencies(model.new) ## Get an example combinedMultiHMM ## file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model) model.new <- changePostCutoff(model, post.cutoff=0.9999) genomicFrequencies(model.new)
## Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the BED file into bin size 1000bp data(rn4_chrominfo) data(experiment_table) binned <- binReads(file, experiment.table=experiment_table, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') plotHistogram(binned) ## Fit HMM model <- callPeaksUnivariate(binned, keep.posteriors=TRUE, verbosity=0) ## Compare fits with different post.cutoffs plotHistogram(changePostCutoff(model, post.cutoff=0.01)) + ylim(0,0.25) plotHistogram(model) + ylim(0,0.25) plotHistogram(changePostCutoff(model, post.cutoff=0.99)) + ylim(0,0.25) ## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model) model.new <- changePostCutoff(model, post.cutoff=0.9999) genomicFrequencies(model.new) ## Get an example combinedMultiHMM ## file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model) model.new <- changePostCutoff(model, post.cutoff=0.9999) genomicFrequencies(model.new)
This function performs binning
, univariate peak calling
and multivariate peak calling
from a list of input files.
Chromstar( inputfolder, experiment.table, outputfolder, configfile = NULL, numCPU = 1, binsize = 1000, stepsize = binsize/2, assembly = NULL, chromosomes = NULL, remove.duplicate.reads = TRUE, min.mapq = 10, format = NULL, prefit.on.chr = NULL, eps.univariate = 0.1, max.time = NULL, max.iter = 5000, read.cutoff.absolute = 500, keep.posteriors = TRUE, mode = "differential", max.states = 128, per.chrom = TRUE, eps.multivariate = 0.01, exclusive.table = NULL )
Chromstar( inputfolder, experiment.table, outputfolder, configfile = NULL, numCPU = 1, binsize = 1000, stepsize = binsize/2, assembly = NULL, chromosomes = NULL, remove.duplicate.reads = TRUE, min.mapq = 10, format = NULL, prefit.on.chr = NULL, eps.univariate = 0.1, max.time = NULL, max.iter = 5000, read.cutoff.absolute = 500, keep.posteriors = TRUE, mode = "differential", max.states = 128, per.chrom = TRUE, eps.multivariate = 0.01, exclusive.table = NULL )
inputfolder |
Folder with either BAM or BED-6 (see |
experiment.table |
A |
outputfolder |
Folder where the results and intermediate files will be written to. |
configfile |
A file specifying the parameters of this function (without |
numCPU |
Number of threads to use for the analysis. Beware that more CPUs also means more memory is needed. If you experience crashes of R with higher numbers of this parameter, leave it at |
binsize |
An integer specifying the bin size that is used for the analysis. |
stepsize |
An integer specifying the step size for analysis. |
assembly |
A |
chromosomes |
If only a subset of the chromosomes should be imported, specify them here. |
remove.duplicate.reads |
A logical indicating whether or not duplicate reads should be removed. |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
format |
One of |
prefit.on.chr |
A chromosome that is used to pre-fit the Hidden Markov Model. Set to |
eps.univariate |
Convergence threshold for the univariate Baum-Welch algorithm. |
max.time |
The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default |
max.iter |
The maximum number of iterations for the Baum-Welch algorithm. The default |
read.cutoff.absolute |
Read counts above this value will be set to the read count specified by this value. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option |
keep.posteriors |
If set to |
mode |
One of
|
max.states |
The maximum number of states to use in the multivariate part. If set to |
per.chrom |
If set to |
eps.multivariate |
Convergence threshold for the multivariate Baum-Welch algorithm. |
exclusive.table |
A |
NULL
## Prepare the file paths. Exchange this with your input and output directories. inputfolder <- system.file("extdata","euratrans", package="chromstaRData") outputfolder <- file.path(tempdir(), 'SHR-example') ## Define experiment structure data(experiment_table_SHR) ## Define assembly # This is only necessary if you have BED files, BAM files are handled automatically. # For common assemblies you can also specify them as 'hg19' for example. data(rn4_chrominfo) ## Run ChromstaR Chromstar(inputfolder, experiment.table=experiment_table_SHR, outputfolder=outputfolder, numCPU=4, binsize=1000, assembly=rn4_chrominfo, prefit.on.chr='chr12', chromosomes='chr12', mode='combinatorial', eps.univariate=1, eps.multivariate=1)
## Prepare the file paths. Exchange this with your input and output directories. inputfolder <- system.file("extdata","euratrans", package="chromstaRData") outputfolder <- file.path(tempdir(), 'SHR-example') ## Define experiment structure data(experiment_table_SHR) ## Define assembly # This is only necessary if you have BED files, BAM files are handled automatically. # For common assemblies you can also specify them as 'hg19' for example. data(rn4_chrominfo) ## Run ChromstaR Chromstar(inputfolder, experiment.table=experiment_table_SHR, outputfolder=outputfolder, numCPU=4, binsize=1000, assembly=rn4_chrominfo, prefit.on.chr='chr12', chromosomes='chr12', mode='combinatorial', eps.univariate=1, eps.multivariate=1)
chromstaR defines several objects.
uniHMM
: Returned by callPeaksUnivariate
.
multiHMM
: Returned by callPeaksMultivariate
and callPeaksReplicates
.
combinedMultiHMM
: Returned by combineMultivariates
.
The function will collapse consecutive bins which have, for example, the same combinatorial state.
collapseBins( data, column2collapseBy = NULL, columns2sumUp = NULL, columns2average = NULL, columns2getMax = NULL, columns2drop = NULL )
collapseBins( data, column2collapseBy = NULL, columns2sumUp = NULL, columns2average = NULL, columns2getMax = NULL, columns2drop = NULL )
data |
A data.frame containing the genomic coordinates in the first three columns. |
column2collapseBy |
The number of the column which will be used to collapse all other inputs. If a set of consecutive bins has the same value in this column, they will be aggregated into one bin with adjusted genomic coordinates. |
columns2sumUp |
Column numbers that will be summed during the aggregation process. |
columns2average |
Column numbers that will be averaged during the aggregation process. |
columns2getMax |
Column numbers where the maximum will be chosen during the aggregation process. |
columns2drop |
Column numbers that will be dropped after the aggregation process. |
The following tables illustrate the principle of the collapsing:
Input data:
seqnames | start | end | column2collapseBy | moreColumns | columns2sumUp |
chr1 | 0 | 199 | 2 | 1 10 | 1 3 |
chr1 | 200 | 399 | 2 | 2 11 | 0 3 |
chr1 | 400 | 599 | 2 | 3 12 | 1 3 |
chr1 | 600 | 799 | 1 | 4 13 | 0 3 |
chr1 | 800 | 999 | 1 | 5 14 | 1 3 |
Output data:
seqnames | start | end | column2collapseBy | moreColumns | columns2sumUp |
chr1 | 0 | 599 | 2 | 1 10 | 2 9 |
chr1 | 600 | 999 | 1 | 4 13 | 1 6 |
A data.frame.
Aaron Taudt
## Load example data ## Get an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) df <- as.data.frame(model$bins) shortdf <- collapseBins(df, column2collapseBy='state', columns2sumUp='width', columns2average=6:9)
## Load example data ## Get an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) df <- as.data.frame(model$bins) shortdf <- collapseBins(df, column2collapseBy='state', columns2sumUp='width', columns2average=6:9)
Get the combinatorial states of a list of models generated by callPeaksUnivariate
. The function returns the decimal combinatorial states for each bin (see details for an explanation of combinatorial state).
combinatorialStates(hmm.list, binary = FALSE)
combinatorialStates(hmm.list, binary = FALSE)
hmm.list |
A list of models generated by |
binary |
If |
For a given model, each genomic bin can be either called 'unmodified' or 'modified', depending on the posterior probabilities estimated by the Baum-Welch. Thus, a list of models defines a binary combinatorial state for each bin. This binary combinatorial state can be expressed as a decimal number. Example: We have 4 histone modifications, and we run the univariate HMM for each of them. Then we use a false discovery rate of 0.5 to call each bin either 'unmodified' or 'modified'. The resulting binary combinatorial states can then be converted to decimal representation. The following table illustrates this:
bin | modification state | decimal state | |||
model1 | model2 | model3 | model4 | ||
1 | 0 | 0 | 1 | 0 | 2 |
2 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 1 | 1 | 0 | 6 |
4 | 0 | 1 | 1 | 1 | 7 |
Output is a vector of integers representing the combinatorial state of each bin.
Aaron Taudt
# Get example BAM files for 3 different marks in hypertensive rat (SHR) file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='SHR.*bam$')[c(1,4,6)] # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } ## Get the decimal representation of the combinatorial state of this combination of models states <- chromstaR:::combinatorialStates(models, binary=FALSE) ## Show number of each state table(states)
# Get example BAM files for 3 different marks in hypertensive rat (SHR) file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='SHR.*bam$')[c(1,4,6)] # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } ## Get the decimal representation of the combinatorial state of this combination of models states <- chromstaR:::combinatorialStates(models, binary=FALSE) ## Show number of each state table(states)
The combined multivariate HMM object is output of the function combineMultivariates
and is a list()
with various entries. The class()
attribute of this list was set to "combinedMultiHMM". For a given hmm, the entries can be accessed with the list operators 'hmm[[]]' or 'hmm$'.
A list()
with the following entries:
info |
Experiment table for this object. |
bins |
A |
segments |
Same as |
segments.per.condition |
A |
peaks |
A |
frequencies |
Genomic frequencies of combinations. |
mode |
Mode of analysis. |
combineMultivariates
, uniHMM
, multiHMM
Combine combinatorial states from several multiHMM
objects. Combinatorial states can be combined for objects containing multiple marks (mode='combinatorial'
) or multiple conditions (mode='differential'
).
combineMultivariates(hmms, mode)
combineMultivariates(hmms, mode)
hmms |
A |
mode |
Mode of combination. See |
A combinedMultiHMM
objects with combinatorial states for each condition.
Aaron Taudt
### Multivariate peak calling for spontaneous hypertensive rat (SHR) ### # Get example BAM files for 2 different marks in hypertensive rat (SHR) file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='SHR.*bam$')[c(1:2,4:5)] # Construct experiment structure exp <- data.frame(file=files, mark=c("H3K27me3","H3K27me3","H3K4me3","H3K4me3"), condition=rep("SHR",4), replicate=c(1:2,1:2), pairedEndReads=FALSE, controlFiles=NA) states <- stateBrewer(exp, mode='combinatorial') # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, experiment.table=exp, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } # Call multivariate peaks multimodel.SHR <- callPeaksMultivariate(models, use.states=states, eps=1, max.time=60) #'### Multivariate peak calling for brown norway (BN) rat ### # Get example BAM files for 2 different marks in brown norway rat file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='BN.*bam$')[c(1:2,3:4)] # Construct experiment structure exp <- data.frame(file=files, mark=c("H3K27me3","H3K27me3","H3K4me3","H3K4me3"), condition=rep("BN",4), replicate=c(1:2,1:2), pairedEndReads=FALSE, controlFiles=NA) states <- stateBrewer(exp, mode='combinatorial') # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, experiment.table=exp, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } # Call multivariate peaks multimodel.BN <- callPeaksMultivariate(models, use.states=states, eps=1, max.time=60) ### Combine multivariates ### hmms <- list(multimodel.SHR, multimodel.BN) comb.model <- combineMultivariates(hmms, mode='combinatorial')
### Multivariate peak calling for spontaneous hypertensive rat (SHR) ### # Get example BAM files for 2 different marks in hypertensive rat (SHR) file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='SHR.*bam$')[c(1:2,4:5)] # Construct experiment structure exp <- data.frame(file=files, mark=c("H3K27me3","H3K27me3","H3K4me3","H3K4me3"), condition=rep("SHR",4), replicate=c(1:2,1:2), pairedEndReads=FALSE, controlFiles=NA) states <- stateBrewer(exp, mode='combinatorial') # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, experiment.table=exp, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } # Call multivariate peaks multimodel.SHR <- callPeaksMultivariate(models, use.states=states, eps=1, max.time=60) #'### Multivariate peak calling for brown norway (BN) rat ### # Get example BAM files for 2 different marks in brown norway rat file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='BN.*bam$')[c(1:2,3:4)] # Construct experiment structure exp <- data.frame(file=files, mark=c("H3K27me3","H3K27me3","H3K4me3","H3K4me3"), condition=rep("BN",4), replicate=c(1:2,1:2), pairedEndReads=FALSE, controlFiles=NA) states <- stateBrewer(exp, mode='combinatorial') # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, experiment.table=exp, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } # Call multivariate peaks multimodel.BN <- callPeaksMultivariate(models, use.states=states, eps=1, max.time=60) ### Combine multivariates ### hmms <- list(multimodel.SHR, multimodel.BN) comb.model <- combineMultivariates(hmms, mode='combinatorial')
Convert combinatorial states in decimal representation to combinatorial states in binary representation and vice versa.
dec2bin(dec, colnames = NULL, ndigits = NULL) bin2dec(bin)
dec2bin(dec, colnames = NULL, ndigits = NULL) bin2dec(bin)
dec |
A vector with whole numbers. |
colnames |
The column names for the returned matrix. If specified, |
ndigits |
The number of digits that the binary representation should have. If unspecified, the shortest possible representation will be chosen. |
bin |
A matrix with only 0 and 1 (or TRUE and FALSE) as entries. One combinatorial state per row. |
chromstaR uses decimal numbers to represent combinatorial states of peaks. These functions serve as a convenient way to get from the efficient decimal representation to a more human-readable binary representation.
A vector of integers for bin2dec
and a matrix of logicals with one state per row for dec2bin
.
dec2bin
: Decimal to binary conversion.
bin2dec
: Binary to decimal conversion.
Aaron Taudt
decimal.states <- c(0:31) binary.states <- dec2bin(decimal.states, colnames=paste0('mark',1:5)) control.decimal.states <- bin2dec(binary.states)
decimal.states <- c(0:31) binary.states <- dec2bin(decimal.states, colnames=paste0('mark',1:5)) control.decimal.states <- bin2dec(binary.states)
Plotting functions for enrichment analysis of multiHMM
or combinedMultiHMM
objects with any annotation of interest, specified as a GRanges-class
object.
plotFoldEnrichHeatmap( hmm, annotations, what = "combinations", combinations = NULL, marks = NULL, plot = TRUE, logscale = TRUE ) plotEnrichCountHeatmap( hmm, annotation, bp.around.annotation = 10000, max.rows = 1000, combinations = NULL, colorByCombinations = sortByCombinations, sortByCombinations = is.null(sortByColumns), sortByColumns = NULL ) plotEnrichment( hmm, annotation, bp.around.annotation = 10000, region = c("start", "inside", "end"), num.intervals = 20, what = "combinations", combinations = NULL, marks = NULL, statistic = "fold", logscale = TRUE )
plotFoldEnrichHeatmap( hmm, annotations, what = "combinations", combinations = NULL, marks = NULL, plot = TRUE, logscale = TRUE ) plotEnrichCountHeatmap( hmm, annotation, bp.around.annotation = 10000, max.rows = 1000, combinations = NULL, colorByCombinations = sortByCombinations, sortByCombinations = is.null(sortByColumns), sortByColumns = NULL ) plotEnrichment( hmm, annotation, bp.around.annotation = 10000, region = c("start", "inside", "end"), num.intervals = 20, what = "combinations", combinations = NULL, marks = NULL, statistic = "fold", logscale = TRUE )
hmm |
A |
annotations |
A |
what |
One of |
combinations |
A vector with combinations for which the enrichment will be calculated, e.g. |
marks |
A vector with marks for which the enrichment is plotted. If |
plot |
A logical indicating whether the plot or an array with the fold enrichment values is returned. |
logscale |
Set to |
annotation |
A |
bp.around.annotation |
An integer specifying the number of basepairs up- and downstream of the annotation for which the enrichment will be calculated. |
max.rows |
An integer specifying the number of randomly subsampled rows that are plotted from the |
colorByCombinations |
A logical indicating whether or not to color the heatmap by combinations. |
sortByCombinations |
A logical indicating whether or not to sort the heatmap by combinations. |
sortByColumns |
An integer vector specifying the column numbers by which to sort the rows. If |
region |
A combination of |
num.intervals |
Number of intervals for enrichment 'inside' of annotation. |
statistic |
The statistic to calculate. Either 'fold' for fold enrichments or 'fraction' for fraction of bins falling into the annotation. |
A ggplot
object containing the plot or a list() with ggplot
objects if several plots are returned. For plotFoldEnrichHeatmap
a named array with fold enrichments if plot=FALSE
.
plotFoldEnrichHeatmap
: Compute the fold enrichment of combinatorial states for multiple annotations.
plotEnrichCountHeatmap
: Plot read counts around annotation as heatmap.
plotEnrichment
: Plot fold enrichment of combinatorial states around and inside of annotation.
Aaron Taudt
### Get an example multiHMM ### file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ### Obtain gene coordinates for rat from biomaRt ### library(biomaRt) ensembl <- useEnsembl(biomart='ENSEMBL_MART_ENSEMBL', dataset='rnorvegicus_gene_ensembl') genes <- getBM(attributes=c('ensembl_gene_id', 'chromosome_name', 'start_position', 'end_position', 'strand', 'external_gene_name', 'gene_biotype'), mart=ensembl) # Transform to GRanges for easier handling genes <- GRanges(seqnames=paste0('chr',genes$chromosome_name), ranges=IRanges(start=genes$start, end=genes$end), strand=genes$strand, name=genes$external_gene_name, biotype=genes$gene_biotype) # Rename chrMT to chrM seqlevels(genes)[seqlevels(genes)=='chrMT'] <- 'chrM' print(genes) ### Make the enrichment plots ### # We expect promoter [H3K4me3] and bivalent-promoter signatures [H3K4me3+H3K27me3] # to be enriched at transcription start sites. plotEnrichment(hmm = model, annotation = genes, bp.around.annotation = 15000) + ggtitle('Fold enrichment around genes') + xlab('distance from gene body') # Plot enrichment only at TSS. We make use of the fact that TSS is the start of a gene. plotEnrichment(model, genes, region = 'start') + ggtitle('Fold enrichment around TSS') + xlab('distance from TSS in [bp]') # Note: If you want to facet the plot because you have many combinatorial states you # can do that with plotEnrichment(model, genes, region = 'start') + facet_wrap(~ combination) # Another form of visualization that shows every TSS in a heatmap # If transparency is not supported try to plot to pdf() instead. tss <- resize(genes, width = 3, fix = 'start') plotEnrichCountHeatmap(model, tss) + theme(strip.text.x = element_text(size=6)) # Fold enrichment with different biotypes, showing that protein coding genes are # enriched with (bivalent) promoter combinations [H3K4me3] and [H3K4me3+H3K27me3], # while rRNA is enriched with the empty [] and repressive combinations [H3K27me3]. tss <- resize(genes, width = 3, fix = 'start') biotypes <- split(tss, tss$biotype) plotFoldEnrichHeatmap(model, annotations=biotypes) + coord_flip()
### Get an example multiHMM ### file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ### Obtain gene coordinates for rat from biomaRt ### library(biomaRt) ensembl <- useEnsembl(biomart='ENSEMBL_MART_ENSEMBL', dataset='rnorvegicus_gene_ensembl') genes <- getBM(attributes=c('ensembl_gene_id', 'chromosome_name', 'start_position', 'end_position', 'strand', 'external_gene_name', 'gene_biotype'), mart=ensembl) # Transform to GRanges for easier handling genes <- GRanges(seqnames=paste0('chr',genes$chromosome_name), ranges=IRanges(start=genes$start, end=genes$end), strand=genes$strand, name=genes$external_gene_name, biotype=genes$gene_biotype) # Rename chrMT to chrM seqlevels(genes)[seqlevels(genes)=='chrMT'] <- 'chrM' print(genes) ### Make the enrichment plots ### # We expect promoter [H3K4me3] and bivalent-promoter signatures [H3K4me3+H3K27me3] # to be enriched at transcription start sites. plotEnrichment(hmm = model, annotation = genes, bp.around.annotation = 15000) + ggtitle('Fold enrichment around genes') + xlab('distance from gene body') # Plot enrichment only at TSS. We make use of the fact that TSS is the start of a gene. plotEnrichment(model, genes, region = 'start') + ggtitle('Fold enrichment around TSS') + xlab('distance from TSS in [bp]') # Note: If you want to facet the plot because you have many combinatorial states you # can do that with plotEnrichment(model, genes, region = 'start') + facet_wrap(~ combination) # Another form of visualization that shows every TSS in a heatmap # If transparency is not supported try to plot to pdf() instead. tss <- resize(genes, width = 3, fix = 'start') plotEnrichCountHeatmap(model, tss) + theme(strip.text.x = element_text(size=6)) # Fold enrichment with different biotypes, showing that protein coding genes are # enriched with (bivalent) promoter combinations [H3K4me3] and [H3K4me3+H3K27me3], # while rRNA is enriched with the empty [] and repressive combinations [H3K27me3]. tss <- resize(genes, width = 3, fix = 'start') biotypes <- split(tss, tss$biotype) plotFoldEnrichHeatmap(model, annotations=biotypes) + coord_flip()
The function calculates the enrichment of a genomic feature with peaks or combinatorial states. Input is a multiHMM
object (containing the peak calls and combinatorial states) and a GRanges-class
object containing the annotation of interest (e.g. transcription start sites or genes).
enrichmentAtAnnotation( bins, info, annotation, bp.around.annotation = 10000, region = c("start", "inside", "end"), what = "combinations", num.intervals = 21, statistic = "fold" )
enrichmentAtAnnotation( bins, info, annotation, bp.around.annotation = 10000, region = c("start", "inside", "end"), what = "combinations", num.intervals = 21, statistic = "fold" )
bins |
The |
info |
The |
annotation |
A |
bp.around.annotation |
An integer specifying the number of basepairs up- and downstream of the annotation for which the enrichment will be calculated. |
region |
A combination of |
what |
One of |
num.intervals |
Number of intervals for enrichment 'inside' of annotation. |
statistic |
The statistic to calculate. Either 'fold' for fold enrichments or 'fraction' for fraction of bins falling into the annotation. |
A list()
containing data.frame()
s for enrichment of combinatorial states and binary states at the start, end and inside of the annotation.
Aaron Taudt
A data.frame
specifying the structure of the experiment.
A data.frame
with columns 'file', 'mark', 'condition', 'replicate', 'pairedEndReads' and 'controlFiles'. Avoid the use of special characters like '-' or '+' as this will confuse the internal file management.
data(experiment_table) print(experiment_table)
data(experiment_table) print(experiment_table)
These functions allow to export chromstaR-objects
as files which can be uploaded to a genome browser. Peak calls are exported in BED format (.bed.gz), read counts in wiggle format (.wig.gz) as RPKM values, and combinatorial states are exported in BED format (.bed.gz).
exportPeaks( model, filename, header = TRUE, separate.files = TRUE, trackname = NULL ) exportCounts( model, filename, header = TRUE, separate.files = TRUE, trackname = NULL ) exportCombinations( model, filename, header = TRUE, separate.files = TRUE, trackname = NULL, exclude.states = "[]", include.states = NULL )
exportPeaks( model, filename, header = TRUE, separate.files = TRUE, trackname = NULL ) exportCounts( model, filename, header = TRUE, separate.files = TRUE, trackname = NULL ) exportCombinations( model, filename, header = TRUE, separate.files = TRUE, trackname = NULL, exclude.states = "[]", include.states = NULL )
model |
|
filename |
The name of the file that will be written. The appropriate ending will be appended, either "_peaks.bed.gz" for peak-calls or "_counts.wig.gz" for read counts or "_combinations.bed.gz" for combinatorial states. Any existing file will be overwritten. |
header |
A logical indicating whether the output file will have a heading track line ( |
separate.files |
A logical indicating whether or not to produce separate files for each track. |
trackname |
Name that will be used in the "track name" field of the BED file. |
exclude.states |
A character vector with combinatorial states that will be excluded from export. |
include.states |
A character vector with combinatorial states that will be exported. If specified, |
NULL
exportPeaks
: Export peak calls in BED format.
exportCounts
: Export read counts as RPKM values in wiggle format.
exportCombinations
: Export combinatorial states in BED format.
## Get an example multiHMM file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) ## Export peak calls and combinatorial states exportPeaks(model, filename=tempfile()) exportCombinations(model, filename=tempfile())
## Get an example multiHMM file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) ## Export peak calls and combinatorial states exportPeaks(model, filename=tempfile()) exportCombinations(model, filename=tempfile())
Export GRanges as genome browser viewable file
exportGRangesAsBedFile( gr, trackname, filename, namecol = "combination", scorecol = "score", colorcol = NULL, colors = NULL, header = TRUE, append = FALSE )
exportGRangesAsBedFile( gr, trackname, filename, namecol = "combination", scorecol = "score", colorcol = NULL, colors = NULL, header = TRUE, append = FALSE )
gr |
A |
trackname |
The name that will be used as track name and description in the header. |
filename |
The name of the file that will be written. The ending ".bed.gz". Any existing file will be overwritten. |
namecol |
A character specifying the column that is used as name-column. |
scorecol |
A character specifying the column that is used as score-column. The score should contain integers in the interval [0,1000] for compatibility with the UCSC genome browser convention. |
colorcol |
A character specifying the column that is used for coloring the track. There will be one color for each unique element in |
colors |
A character vector with the colors that are used for the unique elements in |
header |
A logical indicating whether the output file will have a heading track line ( |
append |
Whether or not to append to an existing file. |
Export regions from GRanges-class
as a file which can be uploaded into a genome browser. Regions are exported in BED format (.bed.gz).
NULL
Aaron Taudt
exportPeaks
, exportCounts
, exportCombinations
### Export regions with read counts above 20 ### # Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") # Bin the file into bin size 1000bp data(rn4_chrominfo) binned <- binReads(file, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') plotHistogram(binned) # Export regions with read count above 20 exportGRangesAsBedFile(binned[binned$counts[,1] > 20], filename=tempfile(), trackname='read counts above 20')
### Export regions with read counts above 20 ### # Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") # Bin the file into bin size 1000bp data(rn4_chrominfo) binned <- binReads(file, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') plotHistogram(binned) # Export regions with read count above 20 exportGRangesAsBedFile(binned[binned$counts[,1] > 20], filename=tempfile(), trackname='read counts above 20')
Make fixed-width bins based on given bin size.
fixedWidthBins( bamfile = NULL, assembly = NULL, chrom.lengths = NULL, chromosome.format, binsizes = 1e+06, chromosomes = NULL )
fixedWidthBins( bamfile = NULL, assembly = NULL, chrom.lengths = NULL, chromosome.format, binsizes = 1e+06, chromosomes = NULL )
bamfile |
A BAM file from which the header is read to determine the chromosome lengths. If a |
assembly |
An assembly from which the chromosome lengths are determined. Please see |
chrom.lengths |
A named character vector with chromosome lengths. Names correspond to chromosomes. |
chromosome.format |
A character specifying the format of the chromosomes if |
binsizes |
A vector of bin sizes in base pairs. |
chromosomes |
A subset of chromosomes for which the bins are generated. |
A list()
of GRanges-class
objects with fixed-width bins.
Aaron Taudt
## Make fixed-width bins of size 500kb and 1Mb data(rn4_chrominfo) chrom.lengths <- rn4_chrominfo$length names(chrom.lengths) <- rn4_chrominfo$chromosome bins <- fixedWidthBins(chrom.lengths=chrom.lengths, binsizes=c(5e5,1e6)) bins ## Make bins using NCBI server (requires internet connection) # bins <- fixedWidthBins(assembly='mm10', chromosome.format='NCBI', binsizes=c(5e5,1e6))
## Make fixed-width bins of size 500kb and 1Mb data(rn4_chrominfo) chrom.lengths <- rn4_chrominfo$length names(chrom.lengths) <- rn4_chrominfo$chromosome bins <- fixedWidthBins(chrom.lengths=chrom.lengths, binsizes=c(5e5,1e6)) bins ## Make bins using NCBI server (requires internet connection) # bins <- fixedWidthBins(assembly='mm10', chromosome.format='NCBI', binsizes=c(5e5,1e6))
A data.frame containing gene coordinates and biotypes of the rn4 assembly.
A data.frame.
data(genes_rn4) head(genes_rn4)
data(genes_rn4) head(genes_rn4)
Get the genomewide frequency of each combinatorial state.
genomicFrequencies(multi.hmm, combinations = NULL, per.mark = FALSE)
genomicFrequencies(multi.hmm, combinations = NULL, per.mark = FALSE)
multi.hmm |
A |
combinations |
A vector with combinations for which the frequency will be calculated. If |
per.mark |
Set to |
A table with frequencies of each combinatorial state.
Aaron Taudt
## Get an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model)
## Get an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) genomicFrequencies(model)
Get a DataFrame with combinations from a GRanges-class
object.
getCombinations(gr)
getCombinations(gr)
gr |
A |
A DataFrame.
### Get an example multiHMM ### file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ### Get the combinations bin.combs <- getCombinations(model$bins) print(bin.combs) seg.combs <- getCombinations(model$segments) print(seg.combs)
### Get an example multiHMM ### file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ### Get the combinations bin.combs <- getCombinations(model$bins) print(bin.combs) seg.combs <- getCombinations(model$segments) print(seg.combs)
Get a set of distinct colors selected from colors
.
getDistinctColors( n, start.color = "blue4", exclude.colors = c("white", "black", "gray", "grey", "\\<yellow\\>", "yellow1", "lemonchiffon"), exclude.brightness.above = 1, exclude.rgb.above = 210 )
getDistinctColors( n, start.color = "blue4", exclude.colors = c("white", "black", "gray", "grey", "\\<yellow\\>", "yellow1", "lemonchiffon"), exclude.brightness.above = 1, exclude.rgb.above = 210 )
n |
Number of colors to select. If |
start.color |
Color to start the selection process from. |
exclude.colors |
Character vector with colors that should not be used. |
exclude.brightness.above |
Exclude colors where the 'brightness' value in HSV space is above. This is useful to obtain a matt palette. |
exclude.rgb.above |
Exclude colors where all RGB values are above. This is useful to exclude whitish colors. |
The function computes the euclidian distance between all colors
and iteratively selects those that have the furthest closes distance to the set of already selected colors.
A character vector with colors.
Aaron Taudt
cols <- getDistinctColors(5) pie(rep(1,5), labels=cols, col=cols)
cols <- getDistinctColors(5) pie(rep(1,5), labels=cols, col=cols)
Get the colors that are used for plotting.
getStateColors(labels = NULL)
getStateColors(labels = NULL)
labels |
Any combination of |
A character vector with colors.
cols <- getStateColors() pie(1:length(cols), col=cols, labels=names(cols))
cols <- getStateColors() pie(1:length(cols), col=cols, labels=names(cols))
Plot a heatmap that shows the binary presence/absence of marks for the different combinations.
heatmapCombinations(model = NULL, marks = NULL, emissionProbs = NULL)
heatmapCombinations(model = NULL, marks = NULL, emissionProbs = NULL)
model |
A |
marks |
A character vector with histone marks. If specified, |
emissionProbs |
A matrix with emission probabilities where |
A ggplot
object.
Aaron Taudt
marks <- c('H3K4me3','H3K27me3','H4K20me1') heatmapCombinations(marks=marks) file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") heatmapCombinations(file)
marks <- c('H3K4me3','H3K27me3','H4K20me1') heatmapCombinations(marks=marks) file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") heatmapCombinations(file)
Heatmap of read count correlations (see cor
).
heatmapCountCorrelation(model, cluster = TRUE)
heatmapCountCorrelation(model, cluster = TRUE)
model |
A |
cluster |
Logical indicating whether or not to cluster the heatmap. |
A ggplot
object.
## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ## Plot count correlations as heatmap heatmapCountCorrelation(model)
## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ## Plot count correlations as heatmap heatmapCountCorrelation(model)
Plot a heatmap of transition probabilities for a multiHMM
model.
heatmapTransitionProbs( model = NULL, reorder.states = TRUE, transitionProbs = NULL )
heatmapTransitionProbs( model = NULL, reorder.states = TRUE, transitionProbs = NULL )
model |
A |
reorder.states |
Whether or not to reorder the states. |
transitionProbs |
A matrix with transition probabilities where |
A ggplot
object.
## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ## Plot transition probabilites as heatmap heatmapTransitionProbs(model, reorder.states=TRUE)
## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ## Plot transition probabilites as heatmap heatmapTransitionProbs(model, reorder.states=TRUE)
Wrapper to load chromstaR objects from file and check the class of the loaded objects.
loadHmmsFromFiles( files, check.class = c("GRanges", "uniHMM", "multiHMM", "combinedMultiHMM") )
loadHmmsFromFiles( files, check.class = c("GRanges", "uniHMM", "multiHMM", "combinedMultiHMM") )
files |
A list of |
check.class |
Any combination of |
A list of chromstaR-object
.
## Get an example BAM file file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the file into bin size 1000bp data(rn4_chrominfo) binned <- binReads(file, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') ## Fit the univariate Hidden Markov Model hmm <- callPeaksUnivariate(binned, max.time=60, eps=1) temp.file <- tempfile() save(hmm, file=temp.file) loaded.hmm <- loadHmmsFromFiles(temp.file)[[1]] class(loaded.hmm)
## Get an example BAM file file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the file into bin size 1000bp data(rn4_chrominfo) binned <- binReads(file, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') ## Fit the univariate Hidden Markov Model hmm <- callPeaksUnivariate(binned, max.time=60, eps=1) temp.file <- tempfile() save(hmm, file=temp.file) loaded.hmm <- loadHmmsFromFiles(temp.file)[[1]] class(loaded.hmm)
multiHMM
s into one objectMerge several multiHMM
s into one object. This can be done to merge fits for separate chromosomes into one object for easier handling. Merging will only be done if all models have the same IDs.
mergeChroms(multi.hmm.list, filename = NULL)
mergeChroms(multi.hmm.list, filename = NULL)
multi.hmm.list |
A list of |
filename |
The file name where the merged object will be stored. If |
A multiHMM
object or NULL, depending on option filename
.
Aaron Taudt
A combinedMultiHMM
object for demonstration purposes in examples of package chromstaR.
A combinedMultiHMM
object.
## Get an example combinedMultiHMM file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file))
## Get an example combinedMultiHMM file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file))
A multiHMM
object for demonstration purposes in examples of package chromstaR.
A multiHMM
object.
## Get an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file))
## Get an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file))
A uniHMM
object for demonstration purposes in examples of package chromstaR.
A uniHMM
object.
## Get an example uniHMM file <- system.file("data","H3K27me3-BN-rep1.RData", package="chromstaR") model <- get(load(file))
## Get an example uniHMM file <- system.file("data","H3K27me3-BN-rep1.RData", package="chromstaR") model <- get(load(file))
The multivariate HMM object is output of the function callPeaksMultivariate
and is a list()
with various entries. The class()
attribute of this list was set to "multiHMM". For a given hmm, the entries can be accessed with the list operators 'hmm[[]]' or 'hmm$'.
A list()
with the following entries:
info |
Experiment table for this object. |
bincounts |
A |
bins |
A |
segments |
Same as |
peaks |
A |
mapping |
A named vector giving the mapping from decimal combinatorial states to human readable combinations. |
weights |
Weight for each component. Same as |
weights.univariate |
Weights of the univariate HMMs. |
transitionProbs |
Matrix of transition probabilities from each state (row) into each state (column). |
transitionProbs.initial |
Initial |
startProbs |
Probabilities for the first bin. Same as |
startProbs.initial |
Initial |
distributions |
Emission distributions used for this model. |
convergenceInfo |
Contains information about the convergence of the Baum-Welch algorithm. |
convergenceInfo$eps |
Convergence threshold for the Baum-Welch. |
convergenceInfo$loglik |
Final loglikelihood after the last iteration. |
convergenceInfo$loglik.delta |
Change in loglikelihood after the last iteration (should be smaller than |
convergenceInfo$num.iterations |
Number of iterations that the Baum-Welch needed to converge to the desired |
convergenceInfo$time.sec |
Time in seconds that the Baum-Welch needed to converge to the desired |
correlation.matrix |
Correlation matrix of transformed reads. |
callPeaksMultivariate
, uniHMM
, combinedMultiHMM
## Get an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file))
## Get an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file))
Make segmentation from bins for a multiHMM
object.
multivariateSegmentation(bins, column2collapseBy = "state")
multivariateSegmentation(bins, column2collapseBy = "state")
bins |
A |
column2collapseBy |
The number of the column which will be used to collapse all other inputs. If a set of consecutive bins has the same value in this column, they will be aggregated into one bin with adjusted genomic coordinates. |
A GRanges-class
with segmented regions.
Get the expression values that overlap with each combinatorial state.
plotExpression(hmm, expression, combinations = NULL, return.marks = FALSE)
plotExpression(hmm, expression, combinations = NULL, return.marks = FALSE)
hmm |
A |
expression |
A |
combinations |
A vector with combinations for which the expression overlap will be calculated. If |
return.marks |
Set to |
A ggplot2
object if a multiHMM
was given or a named list with ggplot2
objects if a combinedMultiHMM
was given.
Aaron Taudt
## Load an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ## Obtain expression data data(expression_lv) head(expression_lv) ## We need to get coordinates for each of the genes library(biomaRt) ensembl <- useEnsembl(biomart='ENSEMBL_MART_ENSEMBL', dataset='rnorvegicus_gene_ensembl') genes <- getBM(attributes=c('ensembl_gene_id', 'chromosome_name', 'start_position', 'end_position', 'strand', 'external_gene_name', 'gene_biotype'), mart=ensembl) expr <- merge(genes, expression_lv, by='ensembl_gene_id') # Transform to GRanges expression.SHR <- GRanges(seqnames=paste0('chr',expr$chromosome_name), ranges=IRanges(start=expr$start, end=expr$end), strand=expr$strand, name=expr$external_gene_name, biotype=expr$gene_biotype, expression=expr$expression_SHR) # We apply an asinh transformation to reduce the effect of outliers expression.SHR$expression <- asinh(expression.SHR$expression) ## Plot plotExpression(model, expression.SHR) + theme(axis.text.x=element_text(angle=0, hjust=0.5)) + ggtitle('Expression of genes overlapping combinatorial states') plotExpression(model, expression.SHR, return.marks=TRUE) + ggtitle('Expression of marks overlapping combinatorial states')
## Load an example multiHMM file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) ## Obtain expression data data(expression_lv) head(expression_lv) ## We need to get coordinates for each of the genes library(biomaRt) ensembl <- useEnsembl(biomart='ENSEMBL_MART_ENSEMBL', dataset='rnorvegicus_gene_ensembl') genes <- getBM(attributes=c('ensembl_gene_id', 'chromosome_name', 'start_position', 'end_position', 'strand', 'external_gene_name', 'gene_biotype'), mart=ensembl) expr <- merge(genes, expression_lv, by='ensembl_gene_id') # Transform to GRanges expression.SHR <- GRanges(seqnames=paste0('chr',expr$chromosome_name), ranges=IRanges(start=expr$start, end=expr$end), strand=expr$strand, name=expr$external_gene_name, biotype=expr$gene_biotype, expression=expr$expression_SHR) # We apply an asinh transformation to reduce the effect of outliers expression.SHR$expression <- asinh(expression.SHR$expression) ## Plot plotExpression(model, expression.SHR) + theme(axis.text.x=element_text(angle=0, hjust=0.5)) + ggtitle('Expression of genes overlapping combinatorial states') plotExpression(model, expression.SHR, return.marks=TRUE) + ggtitle('Expression of marks overlapping combinatorial states')
GRanges-class
object with meta-data column 'counts'.
#' @param peaklist A named list() of GRanges-class
objects containing peak coordinates.
#' @param chr,start,end Chromosome, start and end coordinates for the plot.
#' @param countcol A character giving the color for the counts.
#' @param peakcols A character vector with colors for the peaks in peaklist
.
#' @param style One of c('peaks', 'density')
.
#' @param peakTrackHeight Relative height of the tracks given in peaklist
compared to the counts
.
#' @return A ggplot
object.
#' @examples
#'## Get an example multiHMM ##
#'file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData",
#' package="chromstaR")
#'model <- get(load(file))
#'## Plot genome browser snapshot
#'bins <- model$bins
#'bins$counts <- model$bins$counts.rpkm[,1]
#'plotGenomeBrowser(counts=bins, peaklist=model$peaks,
#' chr='chr12', start=1, end=1e6)
#'
plotGenomeBrowser2 <- function(counts, peaklist=NULL, chr, start, end, countcol='black', peakcols=NULL, style='peaks', peakTrackHeight=5)
## Select ranges to plot
ranges2plot <- reduce(counts[counts@seqnames == chr & start(counts) >= start & start(counts) <= end])
## Counts
counts <- subsetByOverlaps(counts, ranges2plot)
if (style == 'peaks')
df <- data.frame(x=(start(counts)+end(counts))/2, counts=counts$counts) # plot triangles centered at middle of the bin
ggplt <- ggplot(df) + geom_area(aes_string(x='x', y='counts')) + theme(panel.grid = element_blank(), panel.background = element_blank(), axis.text.x = element_blank(), axis.title = element_blank(), axis.ticks.x = element_blank(), axis.line = element_blank())
maxcounts <- max(counts$counts)
ggplt <- ggplt + scale_y_continuous(breaks=c(0, maxcounts))
else if (style == 'density')
df <- data.frame(xmin=start(counts), xmax=end(counts), counts=counts$counts)
ggplt <- ggplot(df) + geom_rect(aes_string(xmin='xmin', xmax='xmax', ymin=0, ymax=4, alpha='counts')) + theme(panel.grid = element_blank(), panel.background = element_blank(), axis.text = element_blank(), axis.title = element_blank(), axis.ticks = element_blank(), axis.line = element_blank())
else
stop("Unknown value '", style, "' for parameter 'style'. Must be one of c('peaks', 'density').")
## Peaks
if (!is.null(peaklist))
if (is.null(peakcols))
peakcols <- getDistinctColors(length(peaklist))
for (i1 in 1:length(peaklist))
p <- peakTrackHeight
peaks <- subsetByOverlaps(peaklist[[i1]], ranges2plot)
if (length(peaks) > 0)
df <- data.frame(start=start(peaks), end=end(peaks), ymin=-p*i1, ymax=-p*i1+0.9*p)
ggplt <- ggplt + geom_rect(data=df, mapping=aes_string(xmin='start', xmax='end', ymin='ymin', ymax='ymax'), col=peakcols[i1], fill=peakcols[i1])
trackname <- names(peaklist)[i1]
df <- data.frame(x=start(counts)[1], y=-p*i1+0.5*p, label=trackname)
ggplt <- ggplt + geom_text(data=df, mapping=aes_string(x='x', y='y', label='label'), vjust=0.5, hjust=0.5, col=peakcols[i1])
return(ggplt)
Plot a genome browser viewPlot a simple genome browser view of chromstaR-objects
. This is useful for scripted genome browser snapshots.
plotGenomeBrowser( model, chr, start, end, style = "peaks", peakHeight = 0.2, peakColor = "blue", same.yaxis = TRUE )
plotGenomeBrowser( model, chr, start, end, style = "peaks", peakHeight = 0.2, peakColor = "blue", same.yaxis = TRUE )
model |
A |
chr , start , end
|
Chromosome, start and end coordinates for the plot. |
style |
One of |
peakHeight |
Height of the peak track relative to the count track. |
peakColor |
Color for the peak track. |
same.yaxis |
Whether or not the plots for the same mark have the same y-axis. |
A list()
of ggplot
objects.
## Get an example uniHMM ## file <- system.file("data","H3K27me3-BN-rep1.RData", package="chromstaR") model <- get(load(file)) plotGenomeBrowser(model, chr='chr12', start=1, end=1e6, style='peaks', peakHeight=0.1) ## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) plotGenomeBrowser(model, chr='chr12', start=1, end=1e6, style='peaks', peakHeight=0.1) ## Get an example combinedMultiHMM ## file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) plotlist <- plotGenomeBrowser(model, chr='chr12', start=1, end=1e6, style='peaks', peakHeight=0.1)
## Get an example uniHMM ## file <- system.file("data","H3K27me3-BN-rep1.RData", package="chromstaR") model <- get(load(file)) plotGenomeBrowser(model, chr='chr12', start=1, end=1e6, style='peaks', peakHeight=0.1) ## Get an example multiHMM ## file <- system.file("data","multivariate_mode-combinatorial_condition-SHR.RData", package="chromstaR") model <- get(load(file)) plotGenomeBrowser(model, chr='chr12', start=1, end=1e6, style='peaks', peakHeight=0.1) ## Get an example combinedMultiHMM ## file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) plotlist <- plotGenomeBrowser(model, chr='chr12', start=1, end=1e6, style='peaks', peakHeight=0.1)
Plot a histogram of binned read counts with fitted mixture distributions from a uniHMM
object.
plotHistogram( model, state = NULL, chromosomes = NULL, start = NULL, end = NULL, linewidth = 1 )
plotHistogram( model, state = NULL, chromosomes = NULL, start = NULL, end = NULL, linewidth = 1 )
model |
A |
state |
Plot the histogram only for the specified state. One of |
chromosomes , start , end
|
Plot the histogram only for the specified chromosomes, start and end position. |
linewidth |
Width of the distribution lines. |
A ggplot
object.
## Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the BED file into bin size 1000bp data(rn4_chrominfo) data(experiment_table) binned <- binReads(file, experiment.table=experiment_table, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') plotHistogram(binned) ## Fit the univariate Hidden Markov Model hmm <- callPeaksUnivariate(binned, max.time=60, eps=1) ## Check if the fit is ok plotHistogram(hmm)
## Get an example BAM file with ChIP-seq reads file <- system.file("extdata", "euratrans", "lv-H3K27me3-BN-male-bio2-tech1.bam", package="chromstaRData") ## Bin the BED file into bin size 1000bp data(rn4_chrominfo) data(experiment_table) binned <- binReads(file, experiment.table=experiment_table, assembly=rn4_chrominfo, binsizes=1000, stepsizes=500, chromosomes='chr12') plotHistogram(binned) ## Fit the univariate Hidden Markov Model hmm <- callPeaksUnivariate(binned, max.time=60, eps=1) ## Check if the fit is ok plotHistogram(hmm)
Plot histograms of binned read counts with fitted mixture distributions from a multiHMM
object.
plotHistograms(model, ...)
plotHistograms(model, ...)
model |
A |
... |
Additional arguments (see |
A ggplot
object.
This page provides an overview of all chromstaR plotting functions.
Plotting functions that work on uniHMM
objects:
plotHistogram
Read count histogram with fitted mixture distributions.
Plotting functions that work on multiHMM
objects:
heatmapCountCorrelation
Heatmap of read count correlations.
heatmapTransitionProbs
Heatmap of transition probabilities of the Hidden Markov Model.
heatmapCombinations
Binary presence/absence pattern of combinatorial states.
plotExpression
Boxplot of expression values that overlap combinatorial states.
Plotting functions that work on multiHMM
and combinedMultiHMM
objects:
heatmapCountCorrelation
Heatmap of read count correlations.
plotEnrichCountHeatmap
Heatmap of read counts around annotation.
plotEnrichment
Enrichment of combinatorial states around annotation.
plotFoldEnrichHeatmap
Enrichment of combinatorial states at multiple annotations.
plotExpression
Boxplot of expression values that overlap combinatorial states.
Other plotting functions:
heatmapCombinations
Binary presence/absence pattern of combinatorial states.
Print combinedMultiHMM object
## S3 method for class 'combinedMultiHMM' print(x, ...)
## S3 method for class 'combinedMultiHMM' print(x, ...)
x |
An |
... |
Ignored. |
An invisible NULL
.
Print multiHMM object
## S3 method for class 'multiHMM' print(x, ...)
## S3 method for class 'multiHMM' print(x, ...)
x |
An |
... |
Ignored. |
An invisible NULL
.
Print uniHMM object
## S3 method for class 'uniHMM' print(x, ...)
## S3 method for class 'uniHMM' print(x, ...)
x |
An |
... |
Ignored. |
An invisible NULL
.
Import aligned reads from a BAM file into a GRanges-class
object.
readBamFileAsGRanges( bamfile, bamindex = bamfile, chromosomes = NULL, pairedEndReads = FALSE, remove.duplicate.reads = FALSE, min.mapq = 10, max.fragment.width = 1000, blacklist = NULL, what = "mapq" )
readBamFileAsGRanges( bamfile, bamindex = bamfile, chromosomes = NULL, pairedEndReads = FALSE, remove.duplicate.reads = FALSE, min.mapq = 10, max.fragment.width = 1000, blacklist = NULL, what = "mapq" )
bamfile |
A sorted BAM file. |
bamindex |
BAM index file. Can be specified without the .bai ending. If the index file does not exist it will be created and a warning is issued. |
chromosomes |
If only a subset of the chromosomes should be imported, specify them here. |
pairedEndReads |
Set to |
remove.duplicate.reads |
A logical indicating whether or not duplicate reads should be removed. |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
max.fragment.width |
Maximum allowed fragment length. This is to filter out erroneously wrong fragments due to mapping errors of paired end reads. |
blacklist |
A |
what |
A character vector of fields that are returned. Uses the |
A GRanges-class
object containing the reads.
## Get an example BAM file with ChIP-seq reads bamfile <- system.file("extdata", "euratrans", "lv-H3K4me3-BN-female-bio1-tech1.bam", package="chromstaRData") ## Read the file into a GRanges object reads <- readBamFileAsGRanges(bamfile, chromosomes='chr12', pairedEndReads=FALSE, min.mapq=10, remove.duplicate.reads=TRUE) print(reads)
## Get an example BAM file with ChIP-seq reads bamfile <- system.file("extdata", "euratrans", "lv-H3K4me3-BN-female-bio1-tech1.bam", package="chromstaRData") ## Read the file into a GRanges object reads <- readBamFileAsGRanges(bamfile, chromosomes='chr12', pairedEndReads=FALSE, min.mapq=10, remove.duplicate.reads=TRUE) print(reads)
Import aligned reads from a BED file into a GRanges-class
object.
readBedFileAsGRanges( bedfile, assembly, chromosomes = NULL, remove.duplicate.reads = FALSE, min.mapq = 10, max.fragment.width = 1000, blacklist = NULL )
readBedFileAsGRanges( bedfile, assembly, chromosomes = NULL, remove.duplicate.reads = FALSE, min.mapq = 10, max.fragment.width = 1000, blacklist = NULL )
bedfile |
A file with aligned reads in BED-6 format. The columns have to be c('chromosome','start','end','description','mapq','strand'). |
assembly |
Please see |
chromosomes |
If only a subset of the chromosomes should be imported, specify them here. |
remove.duplicate.reads |
A logical indicating whether or not duplicate reads should be removed. |
min.mapq |
Minimum mapping quality when importing from BAM files. Set |
max.fragment.width |
Maximum allowed fragment length. This is to filter out erroneously wrong fragments. |
blacklist |
A |
A GRanges-class
object containing the reads.
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "liver-H3K4me3-BN-male-bio2-tech1.bed.gz", package="chromstaRData") ## Read the file into a GRanges object data(rn4_chrominfo) reads <- readBedFileAsGRanges(bedfile, assembly=rn4_chrominfo, chromosomes='chr12', min.mapq=10, remove.duplicate.reads=TRUE) print(reads)
## Get an example BED file with single-cell-sequencing reads bedfile <- system.file("extdata", "liver-H3K4me3-BN-male-bio2-tech1.bed.gz", package="chromstaRData") ## Read the file into a GRanges object data(rn4_chrominfo) reads <- readBedFileAsGRanges(bedfile, assembly=rn4_chrominfo, chromosomes='chr12', min.mapq=10, remove.duplicate.reads=TRUE) print(reads)
Read a chromstaR configuration file into a list structure. The configuration file has to be specified in INI format. R expressions can be used and will be evaluated.
readConfig(configfile)
readConfig(configfile)
configfile |
Path to the configuration file |
A list
with one entry for each element in configfile
.
Aaron Taudt
This is a simple convenience function to read a bed(.gz)-file into a GRanges-class
object. The bed-file is expected to have the following fields: chromosome, start, end, name, score, strand
.
readCustomBedFile( bedfile, col.names = c("chromosome", "start", "end", "name", "score", "strand"), col.classes = NULL, skip = 0, chromosome.format = "NCBI", sep = "" )
readCustomBedFile( bedfile, col.names = c("chromosome", "start", "end", "name", "score", "strand"), col.classes = NULL, skip = 0, chromosome.format = "NCBI", sep = "" )
bedfile |
Filename of the bed or bed.gz file. |
col.names |
A character vector giving the names of the columns in the |
col.classes |
A character vector giving the classes of the columns in |
skip |
Number of lines to skip at the beginning. |
chromosome.format |
Desired format of the chromosomes. Either 'NCBI' for (1,2,3 ...) or 'UCSC' for (chr1,chr2,chr3 ...) or |
sep |
Field separator from |
A GRanges-class
object with the contents of the bed-file.
Aaron Taudt
## Get an example BED file bedfile <- system.file("extdata", "liver-H3K4me3-BN-male-bio2-tech1.bed.gz", package="chromstaRData") ## Import the file and skip the first 10 lines data <- readCustomBedFile(bedfile, skip=10)
## Get an example BED file bedfile <- system.file("extdata", "liver-H3K4me3-BN-male-bio2-tech1.bed.gz", package="chromstaRData") ## Import the file and skip the first 10 lines data <- readCustomBedFile(bedfile, skip=10)
Remove a condition from a combinedMultiHMM
object.
removeCondition(model, conditions)
removeCondition(model, conditions)
model |
A |
conditions |
A character vector with the condition(s) to be removed. |
The input combinedMultiHMM
object with specified conditions removed.
## Get an example HMM file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) ## Print available conditions print(unique(model$info$condition)) ## Remove condition SHR new.model <- removeCondition(model, conditions='SHR')
## Get an example HMM file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) ## Print available conditions print(unique(model$info$condition)) ## Remove condition SHR new.model <- removeCondition(model, conditions='SHR')
Use simulations to find the best bin size among a set of input files. There is no guarantee that the bin size will be the best for your data, since it is only "best" in terms of fewest miscalls for simulated data. However, it can give you a hint what bin size to choose.
scanBinsizes( files.binned, outputfolder, chromosomes = "chr10", eps = 0.01, max.iter = 100, max.time = 300, repetitions = 3, plot.progress = FALSE )
scanBinsizes( files.binned, outputfolder, chromosomes = "chr10", eps = 0.01, max.iter = 100, max.time = 300, repetitions = 3, plot.progress = FALSE )
files.binned |
A vector with files that contain |
outputfolder |
Name of the folder where all files will be written to. |
chromosomes |
A vector of chromosomes to use for the simulation. |
eps |
Convergence threshold for the Baum-Welch algorithm. |
max.iter |
The maximum number of iterations for the Baum-Welch algorithm. The default -1 is no limit. |
max.time |
The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default -1 is no limit. |
repetitions |
Number of repetitions for each simulation. |
plot.progress |
If TRUE, the plot will be updated each time a simulation has finished. If FALSE, the plot will be returned only at the end. |
The function first runs callPeaksUnivariate
on the given binned.data files. From the estimated parameters it generates simulated data and calls the peaks on this simulated data. Because the data is simulated, the fraction of miscalls can be precisely calculated.
A ggplot
object with a bar plot of the number of miscalls dependent on the bin size.
Aaron Taudt
Various scores used in chromstaR
.
differentialScoreMax(mat, info, FUN = "-") differentialScoreSum(mat, info, FUN = "-")
differentialScoreMax(mat, info, FUN = "-") differentialScoreSum(mat, info, FUN = "-")
mat |
A matrix with posterior probabilities, read counts or any other matrix with these dimensions. Column names must correspond to the ID entries in |
info |
An |
FUN |
A function to compute the score with. |
A numeric vector.
differentialScoreMax
: Maximum differential score. Values are between 0 and 1. A value of 1 means that at least one mark is maximally different between conditions.
differentialScoreSum
: Additive differential score. Values are between 0 and N, where N is the number of marks. A value around 1 means that approximately 1 mark is different, a value of 2 means that 2 marks are different etc.
Aaron Taudt
Simulate known states, read counts and read coordinates using a multivariate Hidden Markov Model.
simulateMultivariate( bins, transition, emissions, weights, correlationMatrices, combstates, IDs, fragLen = 50 )
simulateMultivariate( bins, transition, emissions, weights, correlationMatrices, combstates, IDs, fragLen = 50 )
bins |
A |
transition |
A matrix with transition probabilities. |
emissions |
A list() with data.frames with emission distributions (see |
weights |
A list() with weights for the three univariate states. |
correlationMatrices |
A list with correlation matrices. |
combstates |
A vector with combinatorial states. |
IDs |
A character vector with IDs. |
fragLen |
Length of the simulated read fragments. |
A list()
with entries $bins containing the simulated states and read count, $reads with simulated read coordinates.
Simulate read coordinates using read counts as input.
simulateReadsFromCounts(bins, fragLen = 50)
simulateReadsFromCounts(bins, fragLen = 50)
bins |
A |
fragLen |
Length of the simulated read fragments. |
A GRanges-class
with read coordinates.
Simulate known states, read counts and read coordinates using a univariate Hidden Markov Model with three states ("zero-inflation", "unmodified" and "modified").
simulateUnivariate(bins, transition, emission, fragLen = 50)
simulateUnivariate(bins, transition, emission, fragLen = 50)
bins |
A |
transition |
A matrix with transition probabilities. |
emission |
A data.frame with emission distributions (see |
fragLen |
Length of the simulated read fragments. |
A list
with entries $bins containing the simulated states and read count, $reads with simulated read coordinates and $transition and $emission.
This function returns all combinatorial (decimal) states that are consistent with a given abstract specification.
state.brewer( replicates = NULL, differential.states = FALSE, min.diff = 1, common.states = FALSE, conditions = NULL, tracks2compare = NULL, sep = "+", statespec = NULL, diffstatespec = NULL, exclusive.table = NULL, binary.matrix = NULL )
state.brewer( replicates = NULL, differential.states = FALSE, min.diff = 1, common.states = FALSE, conditions = NULL, tracks2compare = NULL, sep = "+", statespec = NULL, diffstatespec = NULL, exclusive.table = NULL, binary.matrix = NULL )
replicates |
A vector specifying the replicate structure. Similar entries will be treated as replicates. |
differential.states |
A logical specifying whether differential states shall be returned. |
min.diff |
The minimum number of differences between conditions. |
common.states |
A logical specifying whether common states shall be returned. |
conditions |
A vector with the same length as |
tracks2compare |
A vector with the same length as |
sep |
Separator used to separate the tracknames in the combinations. The default '+' should not be changed because it is assumed in follow-up functions. |
statespec |
If this parameter is specified,
|
diffstatespec |
A vector composed of any combination of the following entries:
|
exclusive.table |
A |
binary.matrix |
A logical matrix produced by |
The binary modification state (unmodified=0 or modified=1) of multiple ChIP-seq samples defines a (decimal) combinatorial state such as:
sample1 | sample2 | sample3 | sample4 | sample5 | combinatorial state | |
bin1 | 0 | 0 | 1 | 0 | 0 | 4 |
bin2 | 0 | 0 | 0 | 0 | 0 | 0 |
bin3 | 0 | 1 | 0 | 1 | 0 | 10 |
bin4 | 0 | 1 | 1 | 1 | 1 | 15 |
bin5 | 0 | 0 | 1 | 0 | 1 | 5 |
A data.frame with combinations and their corresponding (decimal) combinatorial states.
Aaron Taudt, David Widmann
# Get all combinatorial states where sample1=0, sample2=1, sample3=(0 or 1), # sample4=sample5 chromstaR:::state.brewer(statespec=c('0.A','1.B','x.C','r.D','r.D')) # Get all combinatorial states where sample1=sample2=sample3, sample4=sample5 chromstaR:::state.brewer(statespec=c('r.A','r.A','r.A','r.B','r.B')) # Get all combinatorial states where sample1=sample5, sample2=sample3=1, # sample4=(0 or 1) chromstaR:::state.brewer(statespec=c('r.A','1.B','1.C','x.D','r.A'))
# Get all combinatorial states where sample1=0, sample2=1, sample3=(0 or 1), # sample4=sample5 chromstaR:::state.brewer(statespec=c('0.A','1.B','x.C','r.D','r.D')) # Get all combinatorial states where sample1=sample2=sample3, sample4=sample5 chromstaR:::state.brewer(statespec=c('r.A','r.A','r.A','r.B','r.B')) # Get all combinatorial states where sample1=sample5, sample2=sample3=1, # sample4=(0 or 1) chromstaR:::state.brewer(statespec=c('r.A','1.B','1.C','x.D','r.A'))
This function computes combinatorial states from an experiment.table
.
stateBrewer( experiment.table, mode, differential.states = FALSE, common.states = FALSE, exclusive.table = NULL, binary.matrix = NULL )
stateBrewer( experiment.table, mode, differential.states = FALSE, common.states = FALSE, exclusive.table = NULL, binary.matrix = NULL )
experiment.table |
A |
mode |
Mode of brewing. See |
differential.states |
A logical specifying whether differential states shall be returned. |
common.states |
A logical specifying whether common states shall be returned. |
exclusive.table |
A |
binary.matrix |
A logical matrix produced by |
The binary modification state (unmodified=0 or modified=1) of multiple ChIP-seq samples defines a (decimal) combinatorial state such as:
sample1 | sample2 | sample3 | sample4 | sample5 | combinatorial state | |
bin1 | 0 | 0 | 1 | 0 | 0 | 4 |
bin2 | 0 | 0 | 0 | 0 | 0 | 0 |
bin3 | 0 | 1 | 0 | 1 | 0 | 10 |
bin4 | 0 | 1 | 1 | 1 | 1 | 15 |
bin5 | 0 | 0 | 1 | 0 | 1 | 5 |
A data.frame with combinations and their corresponding (decimal) combinatorial states.
Aaron Taudt
## Construct an experiment table data(experiment_table) print(experiment_table) ## Construct combinatorial states stateBrewer(experiment_table, mode='combinatorial') stateBrewer(experiment_table, mode='differential') stateBrewer(experiment_table, mode='full', common.states=TRUE) ## Exclude states with exclusive.table excl <- data.frame(mark=c('H3K4me3','H3K27me3'), group=c(1,1)) stateBrewer(experiment_table, mode='full', exclusive.table=excl)
## Construct an experiment table data(experiment_table) print(experiment_table) ## Construct combinatorial states stateBrewer(experiment_table, mode='combinatorial') stateBrewer(experiment_table, mode='differential') stateBrewer(experiment_table, mode='full', common.states=TRUE) ## Exclude states with exclusive.table excl <- data.frame(mark=c('H3K4me3','H3K27me3'), group=c(1,1)) stateBrewer(experiment_table, mode='full', exclusive.table=excl)
Normalize read counts to a given read depth. Reads counts are randomly removed from the input to match the specified read depth.
subsample(binned.data, sample.reads)
subsample(binned.data, sample.reads)
binned.data |
A |
sample.reads |
The number of reads that will be retained. |
A GRanges-class
object with downsampled read counts.
Aaron Taudt
Get a table of transition frequencies between combinatorial states of different multiHMM
s.
transitionFrequencies( multi.hmms = NULL, combined.hmm = NULL, zero.states = "[]", combstates = NULL )
transitionFrequencies( multi.hmms = NULL, combined.hmm = NULL, zero.states = "[]", combstates = NULL )
multi.hmms |
A named list with |
combined.hmm |
A |
zero.states |
The string(s) which identifies the zero.states. |
combstates |
Alternative input instead of |
A data.frame with transition frequencies.
Aaron Taudt
## Get an example combinedMultiHMM file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) freqs <- transitionFrequencies(combined.hmm=model) freqs$table
## Get an example combinedMultiHMM file <- system.file("data","combined_mode-differential.RData", package="chromstaR") model <- get(load(file)) freqs <- transitionFrequencies(combined.hmm=model) freqs$table
The univariate HMM object is output of the function callPeaksUnivariate
and is a list()
with various entries. The class()
attribute of this list was set to "uniHMM". For a given hmm, the entries can be accessed with the list operators 'hmm[[]]' or 'hmm$'.
A list()
with the following entries:
info |
Experiment table for this object. |
bincounts |
A |
bins |
A |
peaks |
A |
weights |
Weight for each component. Same as |
transitionProbs |
Matrix of transition probabilities from each state (row) into each state (column). |
transitionProbs.initial |
Initial |
startProbs |
Probabilities for the first bin. Same as |
startProbs.initial |
Initial |
distributions |
Estimated parameters of the emission distributions. |
distributions.initial |
Distribution parameters at the beginning of the Baum-Welch. |
post.cutoff |
Cutoff for posterior probabilities to call peaks. |
convergenceInfo |
Contains information about the convergence of the Baum-Welch algorithm. |
convergenceInfo$eps |
Convergence threshold for the Baum-Welch. |
convergenceInfo$loglik |
Final loglikelihood after the last iteration. |
convergenceInfo$loglik.delta |
Change in loglikelihood after the last iteration (should be smaller than |
convergenceInfo$num.iterations |
Number of iterations that the Baum-Welch needed to converge to the desired |
convergenceInfo$time.sec |
Time in seconds that the Baum-Welch needed to converge to the desired |
convergenceInfo$max.mean |
Value of parameter |
convergenceInfo$read.cutoff |
Cutoff value for read counts. |
callPeaksUnivariate
, multiHMM
, combinedMultiHMM
Combine multiple uniHMM
s to a multiHMM
without running callPeaksMultivariate
. This should only be done for comparison purposes.
unis2pseudomulti(hmms)
unis2pseudomulti(hmms)
hmms |
A named list of |
Use this function if you want to combine ChIP-seq samples without actually running a multivariate Hidden Markov Model. The resulting object will be of class multiHMM
but will not be truly multivariate.
A multiHMM
object.
Aaron Taudt
# Get example BAM files for 2 different marks in hypertensive rat (SHR) file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='SHR.*bam$')[c(1,4)] # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } ## Combine the univariate HMMs without fitting a multivariate HMM names(models) <- c('H3K27me3','H3K4me3') pseudo.multi.HMM <- unis2pseudomulti(models) ## Compare frequencies with real multivariate HMM exp <- data.frame(file=files, mark=c("H3K27me3","H3K4me3"), condition=rep("SHR",2), replicate=c(1,1), pairedEndReads=FALSE, controlFiles=NA) states <- stateBrewer(exp, mode='combinatorial') real.multi.HMM <- callPeaksMultivariate(models, use.states=states, eps=1, max.time=60) genomicFrequencies(real.multi.HMM) genomicFrequencies(pseudo.multi.HMM)
# Get example BAM files for 2 different marks in hypertensive rat (SHR) file.path <- system.file("extdata","euratrans", package='chromstaRData') files <- list.files(file.path, full.names=TRUE, pattern='SHR.*bam$')[c(1,4)] # Bin the data data(rn4_chrominfo) binned.data <- list() for (file in files) { binned.data[[basename(file)]] <- binReads(file, binsizes=1000, stepsizes=500, assembly=rn4_chrominfo, chromosomes='chr12') } # Obtain the univariate fits models <- list() for (i1 in 1:length(binned.data)) { models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1) } ## Combine the univariate HMMs without fitting a multivariate HMM names(models) <- c('H3K27me3','H3K4me3') pseudo.multi.HMM <- unis2pseudomulti(models) ## Compare frequencies with real multivariate HMM exp <- data.frame(file=files, mark=c("H3K27me3","H3K4me3"), condition=rep("SHR",2), replicate=c(1,1), pairedEndReads=FALSE, controlFiles=NA) states <- stateBrewer(exp, mode='combinatorial') real.multi.HMM <- callPeaksMultivariate(models, use.states=states, eps=1, max.time=60) genomicFrequencies(real.multi.HMM) genomicFrequencies(pseudo.multi.HMM)
Make variable-width bins based on a reference BAM file. This can be a simulated file (produced by TODO: insert link and aligned with your favourite aligner) or a real reference.
variableWidthBins(reads, binsizes, chromosomes = NULL)
variableWidthBins(reads, binsizes, chromosomes = NULL)
reads |
A |
binsizes |
A vector with binsizes. Resulting bins will be close to the specified binsizes. |
chromosomes |
A subset of chromosomes for which the bins are generated. |
Variable-width bins are produced by first binning the reference BAM file with fixed-width bins and selecting the desired number of reads per bin as the (non-zero) maximum of the histogram. A new set of bins is then generated such that every bin contains the desired number of reads.
A list()
of GRanges-class
objects with variable-width bins.
Aaron Taudt
## Get an example BAM file with ChIP-seq reads bamfile <- system.file("extdata", "euratrans", "lv-H3K4me3-BN-female-bio1-tech1.bam", package="chromstaRData") ## Read the file into a GRanges object reads <- readBamFileAsGRanges(bamfile, chromosomes='chr12', pairedEndReads=FALSE, min.mapq=10, remove.duplicate.reads=TRUE) ## Make variable-width bins of size 1000bp bins <- variableWidthBins(reads, binsizes=1000) ## Plot the distribution of binsizes hist(width(bins[['1000']]), breaks=50)
## Get an example BAM file with ChIP-seq reads bamfile <- system.file("extdata", "euratrans", "lv-H3K4me3-BN-female-bio1-tech1.bam", package="chromstaRData") ## Read the file into a GRanges object reads <- readBamFileAsGRanges(bamfile, chromosomes='chr12', pairedEndReads=FALSE, min.mapq=10, remove.duplicate.reads=TRUE) ## Make variable-width bins of size 1000bp bins <- variableWidthBins(reads, binsizes=1000) ## Plot the distribution of binsizes hist(width(bins[['1000']]), breaks=50)
Write a chromstaR configuration file from a list structure.
writeConfig(conf, configfile)
writeConfig(conf, configfile)
conf |
A list structure with parameter values. Each entry will be written in one line. |
configfile |
Filename of the outputfile. |
NULL
Aaron Taudt
Density, distribution function, quantile function and random
generation for the zero-inflated negative binomial distribution with parameters
w
, size
and prob
.
dzinbinom(x, w, size, prob, mu) pzinbinom(q, w, size, prob, mu, lower.tail = TRUE) qzinbinom(p, w, size, prob, mu, lower.tail = TRUE) rzinbinom(n, w, size, prob, mu)
dzinbinom(x, w, size, prob, mu) pzinbinom(q, w, size, prob, mu, lower.tail = TRUE) qzinbinom(p, w, size, prob, mu, lower.tail = TRUE) rzinbinom(n, w, size, prob, mu)
x |
Vector of (non-negative integer) quantiles. |
w |
Weight of the zero-inflation. |
size |
Target for number of successful trials, or dispersion parameter (the shape parameter of the gamma mixing distribution). Must be strictly positive, need not be integer. |
prob |
Probability of success in each trial. |
mu |
Alternative parametrization via mean: see ‘Details’. |
q |
Vector of quantiles. |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
Vector of probabilities. |
n |
number of observations. If |
The zero-inflated negative binomial distribution with size
and
prob
has density
for ,
,
and
.
for ,
,
and
.
dzinbinom gives the density, pzinbinom gives the distribution function, qzinbinom gives the quantile function, and rzinbinom generates random deviates.
dzinbinom
: gives the density
pzinbinom
: gives the cumulative distribution function
qzinbinom
: gives the quantile function
rzinbinom
: random number generation
Matthias Heinig, Aaron Taudt
Distributions for standard distributions, including
dbinom
for the binomial, dnbinom
for the negative binomial, dpois
for the
Poisson and dgeom
for the geometric distribution, which
is a special case of the negative binomial.