Package 'chipseq' reference manual

Title:	chipseq: A package for analyzing chipseq data
Description:	Tools for helping process short read data for chipseq experiments.
Authors:	Deepayan Sarkar [aut], Robert Gentleman [aut], Michael Lawrence [aut], Zizhen Yao [aut], Oluwabukola Bamigbade [ctb] (Converted vignette from Sweave to R Markdown / HTML.), Bioconductor Package Maintainer [cre]
Maintainer:	Bioconductor Package Maintainer <[email protected]>
License:	Artistic-2.0
Version:	1.57.0
Built:	2025-03-25 06:25:39 UTC
Source:	https://github.com/bioc/chipseq

Filtering ChIP-seq reads

Description

A convenience function for creating an SRFilter object suited to ChIP-seq data. Typically, the result is passed to readAligned when loading reads.

Usage

chipseqFilter(exclude = "[_MXY]", uniqueness = c("location", "sequence", "location*sequence", "none"), hasStrand = TRUE)

Arguments

`exclude`	A regular expression for excluding chromosomes by name. This is the same as the corresponding parameter to `bsapply`.
`uniqueness`	The criteria used to determine whether a read is unique. Depending on this setting, a read is considered unique if it maps to a unique `location`, has a unique `sequence`, or both (`location*sequence`). Specifying `none` skips this test entirely.
`hasStrand`	Whether to require that the read is mapped to a strand, which usually translates to whether the read was mapped at all.
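The effect of the default `exclude` pattern is easiest to see by applying it to some typical chromosome names. This base R sketch assumes the pattern is matched against each chromosome name with `grepl()` semantics, so any name containing a match is dropped; the chromosome names are illustrative only.

```r
# Illustrative chromosome names (not taken from any particular genome build)
chroms <- c("chr1", "chr19", "chrX", "chrY", "chrM", "chr1_random")

# Default exclude = "[_MXY]": any name containing "_", "M", "X" or "Y" is dropped
kept_default <- chroms[!grepl("[_MXY]", chroms)]

# exclude = "[M_]" (as in the Examples): only "_" and "M" trigger exclusion,
# so the sex chromosomes chrX and chrY are retained
kept_with_sex <- chroms[!grepl("[M_]", chroms)]

print(kept_default)
print(kept_with_sex)
```

Because the pattern is a character class matched anywhere in the name, names containing an underscore, such as random or unplaced contigs, are excluded by both patterns.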

Value

An SRFilter object

Author(s)

M. Lawrence

Examples

sp <- SolexaPath(system.file("extdata", package="ShortRead"))

filter <- chipseqFilter()
aln <- readAligned(sp, "s_2_export.txt", filter=filter)
## allow mapping to the same location (but only if sequence is different)
filter <- chipseqFilter(uniqueness = "sequence")
aln <- readAligned(sp, "s_2_export.txt", filter=filter)
## allow sex chromosomes
filter <- chipseqFilter(exclude = "[M_]")
aln <- readAligned(sp, "s_2_export.txt", filter=filter)

Plot coverage on a small interval.

Description

A function that plots one or two coverage vectors over a relatively small interval in the genome.

Usage

coverageplot(peaks1, peaks2 = NULL, i = 1,
             xlab = "Position", ylab = "Coverage",
             opposite = TRUE, ...)

Arguments

`peaks1`, `peaks2`	Sets of peaks, described as ranges (views) over a coverage vector, such as `RleViews` objects.
`i`	Which peak to use.
`xlab`, `ylab`	Axis labels.
`opposite`	Logical specifying whether the two peaks should be plotted on opposite sides (appropriate for positive- and negative-strand peaks).
`...`	Extra arguments.
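The peaks passed to `coverageplot` in the Examples come from `slice(cov, 3)`, which keeps the maximal runs where coverage is at least 3. The base R sketch below reproduces those ranges from the same coverage values without `IRanges`; `slice_runs` is a hypothetical helper, and the mirrored drawing is only a rough approximation of what `opposite = TRUE` produces, not the package's plotting code.

```r
# Coverage values from the Examples: Rle(values, lengths) expanded with rep()
cov <- rep(c(1:10, seq(10, 1, -2), seq(1, 5, 2), 4:1), rep(1:2, 11))

# Hypothetical helper: start/end of each maximal run with x >= lower,
# mimicking slice(x, lower) with its default includeLower = TRUE
slice_runs <- function(x, lower) {
  r <- rle(x >= lower)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  data.frame(start = starts[r$values], end = ends[r$values])
}

peaks <- slice_runs(cov, 3)
print(peaks)

# Rough picture of opposite = TRUE: the first peak above the axis and the
# last peak (the first element of rev(peaks.cov) in the Examples) mirrored below it
if (interactive()) {
  p1 <- cov[peaks$start[1]:peaks$end[1]]
  p2 <- cov[peaks$start[2]:peaks$end[2]]
  plot(seq_along(p1), p1, type = "s", ylim = c(-max(p2), max(p1)),
       xlab = "Position", ylab = "Coverage")
  lines(seq_along(p2), -p2, type = "s")
  abline(h = 0, lty = 2)
}
```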

Author(s)

Deepayan Sarkar

Examples

cov <- Rle(c(1:10, seq(10, 1, -2), seq(1,5,2), 4:1), rep(1:2, 11))
peaks <- slice(cov, 3)
peaks.cov <- Views(cov, ranges(peaks))
peaks.cov.rev <- rev(peaks.cov)
coverageplot(peaks.cov, peaks.cov.rev, ylab = "Example")

A test ChIP-Seq dataset

Description

A small subset of a ChIP-Seq dataset downloaded from the Short Read Archive.

Usage

data(cstest)

Format

The dataset is an object of class GRangesList containing read alignments to three chromosomes from two lanes, representing CTCF and GFP pull-downs in mouse.

Source

Short Read Archive, GEO accession number GSM288351 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM288351

References

Chen X., Xu H., Yuan P., Fang F., Huss M., Vega V.B., Wong E., Orlov Y.L., Zhang W., Jiang J., Loh Y.H., Yeo H.C., Yeo Z.X., Narang V., Govindarajan K.R., Leong B., Shahab A.S., Ruan Y., Bourque G., Sung W.K., Clarke N.D., Wei C.L., Ng H.H. (2008), “Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells”. Cell, 133:1106-1117.

Examples

data(cstest)
names(cstest)
cstest$gfp

A function to identify and produce summary statistics for differentially expressed peaks.

Description

Given two sets of peaks, this function combines them and summarizes the individual coverage vectors under the combined peak set.

Usage

diffPeakSummary(ranges1, ranges2,
                viewSummary = list(sums = viewSums, maxs = viewMaxs))

Arguments

`ranges1`	First set of peaks (typically an `RleViewsList`).
`ranges2`	Second set of peaks (typically an `RleViewsList`).
`viewSummary`	A named list of per-peak summary functions (by default `viewSums` and `viewMaxs`).

Value

A data.frame with one row for each peak in the combined data. Columns give the chromosome, the start and end nucleotide positions (on the + strand), and the requested summary statistics.
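The combine-then-summarize step can be sketched in base R on two toy coverage vectors. In this sketch, peaks are runs with coverage of at least 2, the two peak sets are merged into their union (overlapping or adjacent ranges are joined, as `IRanges::reduce` does by default), and each coverage vector is then summarized over every combined peak. The helper names and the `sums1`/`maxs2`-style column names are hypothetical and need not match the package's output.

```r
# Toy per-base coverage for two samples on one chromosome
cov1 <- c(0, 0, 5, 6, 7, 0, 0, 0, 0, 0)
cov2 <- c(0, 0, 0, 4, 8, 9, 0, 0, 2, 3)

# Hypothetical helper: maximal runs with x >= lower, as start/end positions
slice_runs <- function(x, lower) {
  r <- rle(x >= lower)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  data.frame(start = starts[r$values], end = ends[r$values])
}

# Hypothetical helper: union of two sets of [start, end] ranges,
# joining ranges that overlap or are directly adjacent
union_ranges <- function(a, b) {
  s <- c(a$start, b$start)
  e <- c(a$end, b$end)
  o <- order(s)
  s <- s[o]
  e <- e[o]
  out_s <- s[1]
  out_e <- e[1]
  for (i in seq_along(s)[-1]) {
    last <- length(out_e)
    if (s[i] <= out_e[last] + 1L) {
      out_e[last] <- max(out_e[last], e[i])   # extend the current combined peak
    } else {
      out_s <- c(out_s, s[i])                 # start a new combined peak
      out_e <- c(out_e, e[i])
    }
  }
  data.frame(start = out_s, end = out_e)
}

# Apply a summary function f to cov over each combined range
summarise_views <- function(cov, ranges, f) {
  mapply(function(s, e) f(cov[s:e]), ranges$start, ranges$end)
}

combined <- union_ranges(slice_runs(cov1, 2), slice_runs(cov2, 2))
result <- data.frame(combined,
                     sums1 = summarise_views(cov1, combined, sum),
                     sums2 = summarise_views(cov2, combined, sum),
                     maxs1 = summarise_views(cov1, combined, max),
                     maxs2 = summarise_views(cov2, combined, max))
print(result)
```

The second combined peak (positions 9 to 10) exists only in the second sample, so its first-sample summaries are zero.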

Author(s)

D. Sarkar

Examples

data(cstest)
library(BSgenome.Mmusculus.UCSC.mm9)
seqlevels(cstest) <- seqlevels(Mmusculus)
seqlengths(cstest) <- seqlengths(Mmusculus)
## find peaks
findPeaks <- function(reads) {
  reads.ext <- resize(reads, width = 200)
  slice(coverage(reads.ext), lower = 8)
}
peakSummary <- diffPeakSummary(findPeaks(cstest$gfp), findPeaks(cstest$ctcf))

Estimate summaries of the distribution of fragment lengths in a short-read experiment. The methods are designed for ChIP-Seq experiments and may not work well in data without peaks.

Description

estimate.mean.fraglen implements three methods for estimating the mean fragment length. The other functions are related helpers that implement the individual methods; they may also be useful on their own for diagnostic purposes. Many of these operations can be slow.

sparse.density is intended to be similar to density, but returns the results in a run-length encoded form. This is useful when long stretches of the range of the data have zero density.
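The idea behind sparse.density can be illustrated in base R: evaluate an Epanechnikov kernel density on an integer grid and run-length encode the result, so the long zero stretches between clusters of points collapse into single runs. This sketch uses base R `rle()` instead of `Rle`, treats `width` as the kernel half-width, and picks its own `from`/`to` padding; it is not the package's implementation.

```r
# Epanechnikov kernel on the scaled distance u; zero outside |u| < 1
epanechnikov <- function(u) ifelse(abs(u) < 1, 0.75 * (1 - u^2), 0)

# Hypothetical helper: kernel density on an integer grid, returned as an rle
sparse_density_sketch <- function(x, width = 50,
                                  from = min(x) - width - 10L,
                                  to = max(x) + width + 10L) {
  grid <- from:to
  dens <- vapply(grid, function(t) sum(epanechnikov((t - x) / width)),
                 numeric(1)) / (length(x) * width)
  rle(dens)
}

x <- c(100L, 120L, 5000L)   # two nearby points and one isolated point
d <- sparse_density_sketch(x)
print(length(d$lengths))    # number of runs, far fewer than grid points
print(sum(d$lengths))       # number of grid points
```

Almost all of the grid between the two clusters has density exactly zero, which is why the run-length form is so much smaller than the dense vector.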

Usage

estimate.mean.fraglen(x, method = c("SISSR", "coverage", "correlation"),
                      ...)

basesCovered(x, shift = seq(5, 300, 5), seqLen = 100, verbose = FALSE)

densityCorr(x, shift = seq(0, 500, 5), center = FALSE,
            width = seqLen *2L, seqLen=100L, maxDist = 500L, ...)

sparse.density(x, width = 50, kernel = "epanechnikov",
               from = start(rix)[1] - 10L,
               to = end(rix)[length(rix)] + 10L)

Arguments

`x`	For `estimate.mean.fraglen`, typically an AlignedRead or a GRanges object. For `basesCovered` and `densityCorr`, a list with elements `"+"` and `"-"` giving the locations of reads aligned to the positive and negative strands (the values should be integers denoting the position where the first sequenced base matched). `densityCorr` also accepts a `GRanges` object directly. For `sparse.density`, a numeric or integer vector for which the density is to be computed.
`method`	Character string giving the method to be used. `method = "SISSR"` implements the method described in Jothi et al. (see References below). `method = "correlation"` implements the method described in Kharchenko et al. (see References below): the density of tag start positions is computed separately for each strand, and the shift that maximizes the correlation between the two densities is determined. `method = "coverage"` finds the shift that minimizes the number of bases covered by at least one read.
`shift`	Integer vector giving amount of shifts to be tried when optimizing. The current algorithm simply evaluates all supplied values and reports the one giving minimum coverage or maximum correlation.
`seqLen`	For the `"coverage"` method, the assumed length of each read when computing coverage; typically the read length. This value is added to the shift estimated by the `"coverage"` and `"correlation"` methods to obtain the fragment length.
`verbose`	Logical specifying whether progress information should be printed during execution.
`center`	For the `"correlation"` method, whether the calculations should incorporate centering by the mean density. The default is not to do so; as the density is zero over most of the genome, this slightly improves efficiency at negligible loss in accuracy.
`width`	Half-bandwidth used in the computation. It must be specified as an integer; data-driven bandwidth rules are not supported.
`kernel`	A character string giving the density kernel.
`from`, `to`	The range over which the density is to be computed.
`maxDist`	If distance to nearest neighbor is more than this, the position is discarded. This removes isolated points, which are not very informative.
`...`	Extra arguments, passed on as appropriate to other functions.
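A small simulation makes the `"coverage"` criterion concrete. The sketch assumes each fragment yields a positive-strand read covering its first `seqLen` bases and a negative-strand read covering its last `seqLen` bases, both recorded by their leftmost coordinate. Shifting the positive-strand reads right by the fragment length minus `seqLen` makes them coincide with the negative-strand reads, which minimizes the number of covered bases; adding `seqLen` back recovers the fragment length. The simulated data and helper names are hypothetical, and bases are counted in base R rather than with `IRanges`.

```r
frag_len <- 150L   # hypothetical true fragment length
read_len <- 50L    # plays the role of seqLen
sites <- c(1000L, 5000L, 9000L)
# Five fragments per binding site, staggered by 10 bp
frag_starts <- as.vector(outer(c(-20L, -10L, 0L, 10L, 20L), sites, `+`))
plus_left  <- frag_starts                        # + read: first read_len bases
minus_left <- frag_starts + frag_len - read_len  # - read: last read_len bases

# Hypothetical helper: bases covered by at least one read, for each shift
bases_covered_sketch <- function(plus_left, minus_left, shift, read_len) {
  vapply(shift, function(s) {
    lefts <- c(plus_left + s, minus_left)
    covered <- unlist(lapply(lefts, function(a) a:(a + read_len - 1L)))
    length(unique(covered))
  }, integer(1))
}

shifts <- seq(5L, 300L, 5L)
covered <- bases_covered_sketch(plus_left, minus_left, shifts, read_len)
best_shift <- shifts[which.min(covered)]
frag_est <- best_shift + read_len
print(frag_est)
```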

Details

For the correlation method, the range over which densities are computed covers only the range spanned by the reads; that is, the beginning and end of each chromosome are excluded.
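The `"correlation"` idea can be sketched on simulated data: tabulate read positions separately for each strand, correlate the positive-strand profile with the negative-strand profile shifted by each candidate amount, and keep the shift with the largest correlation. As a simplification, this sketch uses raw per-base counts and Pearson correlation (which always centers) instead of kernel densities. Positions are leftmost coordinates, so, as described for `seqLen` above, the read length is added back at the end. All names and data are hypothetical.

```r
frag_len <- 150L   # hypothetical true fragment length
read_len <- 50L    # plays the role of seqLen
sites <- c(1000L, 5000L, 9000L)
frag_starts <- as.vector(outer(c(-20L, -10L, 0L, 10L, 20L), sites, `+`))
plus_pos  <- frag_starts                        # leftmost base of + reads
minus_pos <- frag_starts + frag_len - read_len  # leftmost base of - reads

# Per-base read counts for each strand over positions 1..n
n <- max(minus_pos) + read_len
p <- tabulate(plus_pos, nbins = n)
m <- tabulate(minus_pos, nbins = n)

# Correlation between + counts and - counts shifted left by s
strand_cor <- function(p, m, shift) {
  n <- length(p)
  vapply(shift, function(s) cor(p[1:(n - s)], m[(1 + s):n]), numeric(1))
}

shifts <- seq(0L, 500L, 5L)
cors <- strand_cor(p, m, shifts)
best_shift <- shifts[which.max(cors)]
frag_est <- best_shift + read_len
print(frag_est)
```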

Value

estimate.mean.fraglen gives an estimate of the mean fragment length.

basesCovered and densityCorr give a vector of the corresponding objective function evaluated at the supplied values of shift.

sparse.density returns an object of class "Rle".

Author(s)

Deepayan Sarkar, Michael Lawrence

References

R. Jothi, S. Cuddapah, A. Barski, K. Cui, and K. Zhao. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Research, 36:5221–31, 2008.

P. V. Kharchenko, M. Y. Tolstorukov, and P. J. Park. Design and analysis of ChIP experiments for DNA-binding proteins. Nature Biotechnology, 26:1351–1359, 2008.

Examples

data(cstest)
estimate.mean.fraglen(cstest[["ctcf"]], method = "coverage")

Plot island depth distribution

Description

Plots the distribution of island depths using points for the observed islands and a line for the Poisson estimate of the noise. Useful for choosing a depth corresponding to a desired FDR.

Usage

islandDepthPlot(x, maxDepth = 20L)

Arguments

`x`	A coverage object, e.g., `RleList`.
`maxDepth`	The maximum depth to plot (there are usually some outliers).
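The quantities behind this plot can be sketched in base R. Islands are maximal runs of nonzero coverage, and an island's depth is its maximum coverage. For a Poisson distribution, P(X = 2)/P(X = 1) = λ/2, so λ can be estimated as 2·n2/n1 from the numbers of islands of depth 1 and 2; expected noise counts at other depths then follow by scaling the Poisson probabilities to match n1. This estimator, the helper name and the toy coverage vector are assumptions for illustration, not necessarily the package's exact computation.

```r
cov <- c(0, 1, 1, 0, 2, 1, 0, 1, 0, 3, 3, 0, 1, 0, 2, 0)

# Hypothetical helper: maximum coverage of each maximal run of nonzero coverage
island_depths <- function(cov) {
  r <- rle(cov > 0)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  mapply(function(s, e) max(cov[s:e]), starts[r$values], ends[r$values])
}

depths <- island_depths(cov)
max_depth <- 5L
observed <- as.vector(table(factor(depths, levels = seq_len(max_depth))))

# Poisson rate from singleton and doubleton islands, then expected noise counts
lambda <- 2 * observed[2] / observed[1]
expected <- observed[1] / dpois(1, lambda) * dpois(seq_len(max_depth), lambda)

# Points for observed islands, a line for the Poisson noise estimate
if (interactive()) {
  keep <- observed > 0
  plot(seq_len(max_depth)[keep], observed[keep], log = "y",
       xlab = "Island depth", ylab = "Number of islands")
  lines(seq_len(max_depth), expected)
}
```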

Author(s)

D. Sarkar, M. Lawrence

Examples

data(cstest)
cov <- coverage(resize(cstest$ctcf, width=200))
islandDepthPlot(cov)

Subsample short read alignment locations

Description

Subsamples data from multiple lanes on a per-chromosome basis.

Usage

laneSubsample(lane1, lane2, fudge = 0.05)

Arguments

`lane1`, `lane2`	Two lanes of data, each of class `"GRanges"`.
`fudge`	A numeric fudge factor. For each chromosome, if the difference in the number of reads between the two lanes, relative to the number in the first lane, is less than `fudge`, no subsampling is done.
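The per-chromosome rule described for `fudge` can be sketched in base R on plain vectors of chromosome names, one entry per read. The helper name is hypothetical; it returns logical keep-masks rather than subsampled `GRanges` objects, and it assumes that chromosomes present in only one lane are left untouched.

```r
# Hypothetical helper: which reads to keep in each lane, decided per chromosome.
# chr1, chr2 give the chromosome of every read in lane 1 and lane 2.
lane_subsample_sketch <- function(chr1, chr2, fudge = 0.05) {
  keep1 <- rep(TRUE, length(chr1))
  keep2 <- rep(TRUE, length(chr2))
  for (chr in intersect(chr1, chr2)) {
    i1 <- which(chr1 == chr)
    i2 <- which(chr2 == chr)
    n1 <- length(i1)
    n2 <- length(i2)
    # Relative size difference is measured against lane 1
    if (abs(n1 - n2) / n1 < fudge) next
    if (n1 > n2) {
      keep1[i1[sample.int(n1, n1 - n2)]] <- FALSE  # drop random extra reads from lane 1
    } else {
      keep2[i2[sample.int(n2, n2 - n1)]] <- FALSE  # drop random extra reads from lane 2
    }
  }
  list(lane1 = keep1, lane2 = keep2)
}

set.seed(1)
chr1 <- rep(c("chr1", "chr2", "chr3"), c(100, 50, 200))
chr2 <- rep(c("chr1", "chr2", "chr3"), c(80, 51, 150))
res <- lane_subsample_sketch(chr1, chr2)
print(table(chr1[res$lane1]))
print(table(chr2[res$lane2]))
```

Here chr2 differs by only 1 read out of 50 (2%), which is below the 5% fudge factor, so neither lane is subsampled on that chromosome.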

Value

laneSubsample returns a list similar to its input, but with the larger dataset subsampled to be similar to the smaller one.

Author(s)

D. Sarkar

Examples

data(cstest)
## subsample to compare lanes
cstest.sub <- laneSubsample(cstest[[1]], cstest[[2]])
unlist(cstest.sub)

Calculate a peak cutoff

Description

Calculates a peak cutoff value given an FDR, assuming a Poisson noise distribution estimated from the frequency of singleton and doubleton islands.

Usage

peakCutoff(cov, fdr.cutoff = 0.001, k = 2:20)

Arguments

`cov`	The coverage object, e.g., an `RleList` object.
`fdr.cutoff`	The maximum-allowed FDR for calculating the cutoff.
`k`	The coverage levels at which to estimate an FDR value. The maximal value that is less than `fdr.cutoff` is chosen for calculating the cutoff. Usually best left to the default.
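A Poisson-based cutoff of this kind can be sketched in base R starting from a vector of island depths. The sketch assumes λ is estimated as 2·n2/n1 from singleton and doubleton islands (since P(X = 2)/P(X = 1) = λ/2), takes the FDR at depth k to be the expected number of noise islands with depth of at least k divided by the observed number, and returns the smallest depth in `k` whose FDR falls below `fdr.cutoff`. These choices, the helper name and the toy depths are assumptions; the package's exact rule may differ.

```r
# Hypothetical sketch of a Poisson-based depth cutoff (not the package code).
# depths: maximum coverage of each island (maximal run of nonzero coverage).
peak_cutoff_sketch <- function(depths, fdr.cutoff = 0.001, k = 2:20) {
  n1 <- sum(depths == 1)              # singleton islands
  n2 <- sum(depths == 2)              # doubleton islands
  lambda <- 2 * n2 / n1               # from P(2) / P(1) = lambda / 2
  n_noise <- n1 / dpois(1, lambda)    # Poisson population size implied by n1
  fdr <- vapply(k, function(kk) {
    expected <- n_noise * ppois(kk - 1, lambda, lower.tail = FALSE)  # noise with depth >= kk
    observed <- sum(depths >= kk)
    expected / observed
  }, numeric(1))
  k[which(fdr < fdr.cutoff)[1]]       # smallest depth meeting the FDR target
}

# Toy island depths: mostly noise at depths 1 and 2, plus a few deeper islands
depths <- rep(c(1, 2, 3, 4, 5, 6, 8, 10), c(1000, 100, 10, 5, 5, 5, 5, 5))
cutoff <- peak_cutoff_sketch(depths)
print(cutoff)
```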

Value

A numeric value to use for calling peaks

Author(s)

D. Sarkar and M. Lawrence

Examples

data(cstest)
cov <- coverage(resize(cstest$ctcf, width=200))
peakCutoff(cov)

Summarizing peak sets

Description

Summarizes a set of peaks into a GRanges object with columns of statistics like the peak maxima and integrals (sums).

Usage

peakSummary(x, ...)

Arguments

`x`	An object containing peaks, usually a `RleViewsList`.
`...`	Arguments to pass to methods

Value

A GRanges object of the peaks, with columns named max, maxpos (position of the maximum, centered), and sum.
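The three summary columns can be illustrated in base R using the coverage vector from the `coverageplot` Examples, with peaks taken as runs of coverage of at least 3. The manual does not define "centered" precisely; this sketch assumes `maxpos` is the midpoint between the first and last positions attaining the maximum. The helper name is hypothetical, and the result is a plain data frame rather than a GRanges object.

```r
# Coverage values from the coverageplot Examples, expanded with rep()
cov <- rep(c(1:10, seq(10, 1, -2), seq(1, 5, 2), 4:1), rep(1:2, 11))

# Hypothetical helper: maximal runs with x >= lower, as start/end positions
slice_runs <- function(x, lower) {
  r <- rle(x >= lower)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  data.frame(start = starts[r$values], end = ends[r$values])
}

peaks <- slice_runs(cov, 3)
summ <- as.data.frame(t(mapply(function(s, e) {
  seg <- cov[s:e]
  at_max <- which(seg == max(seg)) + s - 1L   # genomic positions of the maximum
  c(start = s, end = e, max = max(seg),
    maxpos = (min(at_max) + max(at_max)) / 2, # assumed meaning of "centered"
    sum = sum(seg))
}, peaks$start, peaks$end)))
print(summ)
```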

Compute summaries for cumulative subsets of a short-read data set.

Description

THIS FUNCTION IS DEFUNCT!

Divides a short-read dataset into several subsets, and computes various summaries cumulatively. The goal is to study the characteristics of the data as a function of sample size.

Usage

subsetSummary(x, chr, nstep, props = seq(0.1, 1, 0.1),
              chromlens = seqlengths(x), fg.cutoff = 6, seqLen = 200,
              fdr.cutoff = 0.001, use.fdr = FALSE, resample = TRUE,
              islands = TRUE, verbose = getOption("verbose"))

Arguments

`x`	A `"GRanges"` object representing alignment locations at the sample level.
`chr`	The chromosome for which the summaries are to be obtained. Must name a chromosome present in `x`.
`nstep`	The number of mapped reads in each increment for the full dataset (not per-chromosome). This will be translated to a per-chromosome number proportionally.
`props`	Alternatively, an increasing sequence of proportions determining the size of each subset. Overrides `nstep`.
`chromlens`	A named vector of per-chromosome lengths, typically the result of `seqlengths`.
`fg.cutoff`	The coverage depth above which a region would be considered foreground.
`seqLen`	The number of bases to which to extend each read before computing coverage.
`resample`	Logical; whether to randomly reorder the reads before dividing them up into subsets. Useful to remove potential order effects (for example, if data from two lanes were combined to produce `x`).
`fdr.cutoff`	The maximum false discovery rate for a region that is considered to be foreground.
`use.fdr`	Whether to use the FDR detected peaks when calling foreground and background.
`islands`	Logical. If `TRUE`, the whole island would be considered foreground if the maximum depth equals or exceeds `fg.cutoff`. If `FALSE`, only the region above the cutoff would be considered foreground.
`verbose`	logical controlling whether progress information will be shown during computation (which is potentially long-running).

Value

A data frame with various per-subset summaries.

Note

This function should be considered preliminary, in that it might change significantly or simply be removed in a subsequent version. If you like it the way it is, please notify the maintainer.

Author(s)

Deepayan Sarkar, Michael Lawrence
