Title: | Processing and analyzing bisulfite sequencing data |
---|---|
Description: | The BiSeq package provides useful classes and functions to handle and analyze targeted bisulfite sequencing (BS) data such as reduced-representation bisulfite sequencing (RRBS) data. In particular, it implements an algorithm to detect differentially methylated regions (DMRs). The package takes already aligned BS data from one or multiple samples. |
Authors: | Katja Hebestreit, Hans-Ulrich Klein |
Maintainer: | Katja Hebestreit <[email protected]> |
License: | LGPL-3 |
Version: | 1.47.0 |
Built: | 2024-11-29 04:10:40 UTC |
Source: | https://github.com/bioc/BiSeq |
GRanges
object by means of a second GRanges
objectEach genomic location of object
is checked for overlapping with
genomic ranges of regions
. In case of an overlapping, this
genomic location is marked as TRUE
, or with the identifier of
respective the regions
object (if any).
annotateGRanges(object, regions, name, regionInfo)
annotateGRanges(object, regions, name, regionInfo)
object |
A |
regions |
A |
name |
A string specifying the name of the metadata
column with the overlapping information to be added to
|
regionInfo |
OPTIONAL: A string or integer specifying the metadata column
column of |
If multiple ranges of regions
overlap with a genomic
region in object
, the identifier names of the overlapping regions
are seperated by ','.
A GRanges
object similar to object
containing an
additional metadata column with the overlapping information.
Katja Hebestreit
GRanges-class
# load detected DMRs: data(DMRs) # annotate the DMRs with a GRanges object: data(promoters) DMRs.anno <- annotateGRanges(object = DMRs, regions = promoters, name = 'Promoter', regionInfo = 'acc_no') DMRs.anno
# load detected DMRs: data(DMRs) # annotate the DMRs with a GRanges object: data(promoters) DMRs.anno <- annotateGRanges(object = DMRs, regions = promoters, name = 'Promoter', regionInfo = 'acc_no') DMRs.anno
This function models the methylation level within a beta regression. The
first independent variable in formula
is tested to be
unequal to zero.
betaRegression(formula, link, object, mc.cores, ...)
betaRegression(formula, link, object, mc.cores, ...)
formula |
Symbolic description of the model. For the first independent variable the P value (Wald test) and the effect on methylation is returned. For details see below. |
link |
A character specifying the link function in the mean
model (mu). Currently, |
object |
A |
mc.cores |
Passed to |
... |
Other parameters passed to the |
See betareg
function for details.
mclapply
A data.frame
containing the position, chromosome, P value, estimated
methylation level in group 1 and group 2 and methylation difference of
group 1 and group 2.
Katja Hebestreit
Hebestreit, K., Dugas, M., and Klein HU. Detection of significantly differentially methylated regions in targeted bisulfite sequencing Data. In preparation. Bioinformatics. 2013 Jul 1;29(13):1647-53.
See also reference list in the documentation of betareg
.
betareg
# load RRBS data, subset to save time, find CpG clusters and smooth methylation data: data(rrbs) rrbs.small <- rrbs[1:1000,] rrbs.clust.unlim <- clusterSites(object = rrbs.small, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 20, max.dist = 100) ind.cov <- totalReads(rrbs.clust.unlim) > 0 quant <- quantile(totalReads(rrbs.clust.unlim)[ind.cov], 0.9) rrbs.clust.lim <- limitCov(rrbs.clust.unlim, maxCov = quant) # with a small subset to save calculation time: rrbs.part <- rrbs.clust.lim[1:100,] predictedMeth <- predictMeth(object=rrbs.part) betaResults <- betaRegression(formula = ~group, link = "probit", object = predictedMeth, type="BR")
# load RRBS data, subset to save time, find CpG clusters and smooth methylation data: data(rrbs) rrbs.small <- rrbs[1:1000,] rrbs.clust.unlim <- clusterSites(object = rrbs.small, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 20, max.dist = 100) ind.cov <- totalReads(rrbs.clust.unlim) > 0 quant <- quantile(totalReads(rrbs.clust.unlim)[ind.cov], 0.9) rrbs.clust.lim <- limitCov(rrbs.clust.unlim, maxCov = quant) # with a small subset to save calculation time: rrbs.part <- rrbs.clust.lim[1:100,] predictedMeth <- predictMeth(object=rrbs.part) betaResults <- betaRegression(formula = ~group, link = "probit", object = predictedMeth, type="BR")
betaRegression
Please see the package vignette for description.
data(betaResults)
data(betaResults)
A data frame with 4276 observations on the following 10 variables:
chr
a factor with levels chr1
chr2
pos
a numeric vector
p.val
a numeric vector
meth.group1
a numeric vector
meth.group2
a numeric vector
meth.diff
a numeric vector
estimate
a numeric vector
std.error
a numeric vector
pseudo.R.sqrt
a numeric vector
cluster.id
a character vector
data(betaResults) head(betaResults)
data(betaResults) head(betaResults)
betaRegression
for resampled data
Please see the package vignette for description.
data(betaResultsNull)
data(betaResultsNull)
A data frame with 4276 observations on the following 10 variables.
chr
a factor with levels chr1
chr2
pos
a numeric vector
p.val
a numeric vector
meth.group1
a numeric vector
meth.group2
a numeric vector
meth.diff
a numeric vector
estimate
a numeric vector
std.error
a numeric vector
pseudo.R.sqrt
a numeric vector
cluster.id
a character vector
data(betaResultsNull) head(betaResultsNull)
data(betaResultsNull) head(betaResultsNull)
For a given set of binomial random variables with 1-dimensional coordinates, this function calculates the local likelihood estimation of the success probability p at a given point. For this purpose, a weighted likelihood estimation with weights obtained by a triangular kernel with given bandwidth is used. This can be used to predict values at points where no variable has been observed and/or to smooth observations using neighboured observations.
binomLikelihoodSmooth(pred.pos, pos, m, n, h)
binomLikelihoodSmooth(pred.pos, pos, m, n, h)
pred.pos |
A vector of positions where p should be estimated. |
pos |
A vector of positions where binomial variables have been observed. |
m |
A vector of length |
n |
A vector of length |
h |
The bandwidth of the kernel. |
For a given position x, the weighted likelihood for parameter
is maximized. B denotes the binomial probability function. The weights
are calculated using a triangular kernel with bandwidth
:
A vector of length pred.pos
giving the local likelihood estimation of
the success probability p at the given positions.
Hans-Ulrich Klein
n = rpois(100, lambda=10) E = c(rep(0.4, 30), rep(0.8, 40), rep(0.1, 30)) m = rbinom(100, n, E) pos = 1:100 p_10 = binomLikelihoodSmooth(pos, pos, m, n, h=10) p_20 = binomLikelihoodSmooth(pos, pos, m, n, h=20) ## Not run: plot(x=pos, y=m/n) points(x=pos, y=p_10, col="green") lines(x=pos, y=p_10, col="green") points(x=pos, y=p_20, col="red") lines(x=pos, y=p_20, col="red") ## End(Not run)
n = rpois(100, lambda=10) E = c(rep(0.4, 30), rep(0.8, 40), rep(0.1, 30)) m = rbinom(100, n, E) pos = 1:100 p_10 = binomLikelihoodSmooth(pos, pos, m, n, h=10) p_20 = binomLikelihoodSmooth(pos, pos, m, n, h=20) ## Not run: plot(x=pos, y=m/n) points(x=pos, y=p_10, col="green") lines(x=pos, y=p_10, col="green") points(x=pos, y=p_20, col="red") lines(x=pos, y=p_20, col="red") ## End(Not run)
The BSraw
class is derived from
RangedSummarizedExperiment
and contains a SimpleList
of
matrices named methReads
and totalReads
as assays
.
Objects can be created by calls of the form
BSraw(metadata = list(),
rowRanges,
colData = DataFrame(row.names=colnames(methReads)),
methReads,
totalReads,
...)
.
However, one will most likely create a BSraw
object when use
readBismark
to load data.
metadata
:An optional list
of arbitrary
content describing the overall experiment.
rowRanges
:Object of class "GRanges"
containing the genome positions of CpG-sites covered by bisulfite
sequencing. WARNING: The accessor for this slot is rowRanges
,
not rowRanges
!
colData
:Object of class "DataFrame"
containing information on variable values of the samples.
assays
:Object of class SimpleList
of two
matrices, named totalReads
and methReads
. The matrix
totalReads
contains the number of reads spanning a
CpG-site. The rows represent the CpG sites in rowRanges
and
the columns represent the samples in colData
. The matrix
methReads
contains the number of methylated reads spanning
a CpG-site.
Class "RangedSummarizedExperiment"
, directly.
signature(x = "BSraw")
: Gets the totalReads
slot.
signature(x = "BSraw", value = "matrix")
:
Sets the totalReads
slot.
signature(x = "BSraw")
: Gets the methReads
slot.
signature(x = "BSraw", value = "matrix")
:
Sets the methReads
slot.
signature(x = "BSraw", y = "BSraw")
: Combines two
BSraw
objects.
Katja Hebestreit
RangedSummarizedExperiment, BSrel-class
, readBismark
showClass("BSraw") ## How to create a BSraw object by hand: metadata <- list(Sequencer = "Sequencer", Year = "2013") rowRanges <- GRanges(seqnames = "chr1", ranges = IRanges(start = c(1,2,3), end = c(1,2,3))) colData <- DataFrame(group = c("cancer", "control"), row.names = c("sample_1", "sample_2")) totalReads <- matrix(c(rep(10L, 3), rep(5L, 3)), ncol = 2) methReads <- matrix(c(rep(5L, 3), rep(5L, 3)), ncol = 2) BSraw(metadata = metadata, rowRanges = rowRanges, colData = colData, totalReads = totalReads, methReads = methReads) ## A more realistic example can be loaded: data(rrbs) rrbs head(totalReads(rrbs)) head(methReads(rrbs))
showClass("BSraw") ## How to create a BSraw object by hand: metadata <- list(Sequencer = "Sequencer", Year = "2013") rowRanges <- GRanges(seqnames = "chr1", ranges = IRanges(start = c(1,2,3), end = c(1,2,3))) colData <- DataFrame(group = c("cancer", "control"), row.names = c("sample_1", "sample_2")) totalReads <- matrix(c(rep(10L, 3), rep(5L, 3)), ncol = 2) methReads <- matrix(c(rep(5L, 3), rep(5L, 3)), ncol = 2) BSraw(metadata = metadata, rowRanges = rowRanges, colData = colData, totalReads = totalReads, methReads = methReads) ## A more realistic example can be loaded: data(rrbs) rrbs head(totalReads(rrbs)) head(methReads(rrbs))
The BSrel
class is derived from
RangedSummarizedExperiment
and contains a SimpleList
of one
matrix named methLevel
as assays
.
Objects can be created by calls of the form
BSrel(metadata = list(),
rowRanges,
colData = DataFrame(row.names=colnames(methLevel)),
methLevel,
...)
.
However, one will most likely create a BSraw
object when use
readBismark
to load data.
metadata
:An optional list
of arbitrary
content describing the overall experiment.
rowRanges
:Object of class "GRanges"
containing the genome positions of CpG-sites covered by bisulfite
sequencing. WARNING: The accessor for this slot is rowRanges
,
not rowRanges
!
colData
:Object of class "DataFrame"
containing information on variable values of the samples.
assays
:Object of class SimpleList
of a
matrix, named methLevel
containing the methylation levels
(between 0 and 1) per CpG site. The rows represent the CpG sites
in rowRanges
and the columns represent the samples in colData
.
Class "RangedSummarizedExperiment"
, directly.
signature(x = "BSrel")
: Gets the methLevel
slot.
signature(x = "BSrel", value = "matrix")
:
Sets the methLevel
slot.
signature(x = "BSrel", y = "BSrel")
: Combines two
BSrel
objects.
Katja Hebestreit
RangedSummarizedExperiment, BSraw-class
, readBismark
showClass("BSrel") ## How to create a BSrel object by hand: metadata <- list(Sequencer = "Sequencer", Year = "2013") rowRanges <- GRanges(seqnames = "chr1", ranges = IRanges(start = c(1,2,3), end = c(1,2,3))) colData <- DataFrame(group = c("cancer", "control"), row.names = c("sample_1", "sample_2")) methLevel <- matrix(c(rep(0.5, 3), rep(1, 3)), ncol = 2) BSrel(metadata = metadata, rowRanges = rowRanges, colData = colData, methLevel = methLevel) # Or get a BSrel object out of a BSraw object: data(rrbs) rrbs.rel <- rawToRel(rrbs)
showClass("BSrel") ## How to create a BSrel object by hand: metadata <- list(Sequencer = "Sequencer", Year = "2013") rowRanges <- GRanges(seqnames = "chr1", ranges = IRanges(start = c(1,2,3), end = c(1,2,3))) colData <- DataFrame(group = c("cancer", "control"), row.names = c("sample_1", "sample_2")) methLevel <- matrix(c(rep(0.5, 3), rep(1, 3)), ncol = 2) BSrel(metadata = metadata, rowRanges = rowRanges, colData = colData, methLevel = methLevel) # Or get a BSrel object out of a BSraw object: data(rrbs) rrbs.rel <- rawToRel(rrbs)
BSraw
objects
Within a BSraw
object clusterSites
searches for agglomerations of CpG sites across
all samples. In a first step the data is reduced to CpG sites covered in
round(perc.samples*ncol(object))
samples, these are
called 'frequently covered CpG sites'. In a second step regions are detected
where not less than min.sites
frequently covered CpG sites are sufficiantly
close to each other (max.dist
). Note, that the frequently covered CpG sites
are considered to define the boundaries of the CpG clusters only. For the subsequent
analysis the methylation data of all CpG sites within these clusters
are used.
clusterSites(object, groups, perc.samples, min.sites, max.dist, mc.cores, ...)
clusterSites(object, groups, perc.samples, min.sites, max.dist, mc.cores, ...)
object |
A |
groups |
OPTIONAL. A factor specifying two or more sample groups within the given object. See Details. |
perc.samples |
A numeric between 0 and 1. Is passed to |
min.sites |
A numeric. Clusters should comprise at least |
max.dist |
A numeric. CpG sites which are covered in at least
|
mc.cores |
Passed to |
... |
Further arguments passed to the |
There are three parameters that are important: perc.samples
, min.sites
and max.dist
.
For example, if perc.samples=0.5
, the algorithm detects all CpG sites that are covered in at least 50%
of the samples. Those CpG sites are called frequently covered CpG sites. In the next step the algorithm
determines the distances between neighboured frequently covered CpG sites.
When they are closer than (or close as) max.dist
base pairs to each other,
those frequently covered CpG sites and all other, less frequently covered CpG sites that are
in between, belong to the same cluster. In the third step, each cluster is checked
for the number of frequently covered CpG sites. If this number is less than min.sites
,
the cluster is discarded.
In other words:
1. The perc.samples
parameter defines which are the frequently covered CpG sites.
2. The frequently covered CpG sites determine the boundaries of the clusters,
depending on their distance to each other.
3. Clusters are discarded if they have too less frequently covered CpG sites.
If argument group
is given, perc.samples
, or no.samples
, are
applied for all group levels.
A BSraw
object reduced to CpG sites within CpG cluster regions. A cluster.id
metadata column on the rowRanges
assigns cluster memberships per CpG site.
Katja Hebestreit
filterBySharedRegions
, mclapply
data(rrbs) rrbs.clust <- clusterSites(object = rrbs, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 20, max.dist = 100)
data(rrbs) rrbs.clust <- clusterSites(object = rrbs, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 20, max.dist = 100)
GRanges
object of CpG clusters from BSraw
and BSrel
objects
This function allows to get the start and end positions of CpG clusters
from a BSraw
or BSrel
object, when there is a cluster.id
column in the rowRanges
slot.
clusterSitesToGR(object)
clusterSitesToGR(object)
object |
A |
An object of class GRanges
is returned.
Katja Hebestreit
data(rrbs) rrbs.clustered <- clusterSites(rrbs) clusterSitesToGR(rrbs.clustered)
data(rrbs) rrbs.clustered <- clusterSites(rrbs) clusterSitesToGR(rrbs.clustered)
Determines the differences of (smoothed) methylation levels between two samples and aggregates the sites surpassing a minimum difference to DMRs.
compareTwoSamples(object, sample1, sample2, minDiff, max.dist)
compareTwoSamples(object, sample1, sample2, minDiff, max.dist)
object |
A |
sample1 |
A numeric or character specifying the first sample to be used. |
sample2 |
A numeric or character specifying the second sample to be used. |
minDiff |
A numeric greater than 0 and smaller or equal to 1. |
max.dist |
Numeric. The maximum distance between two CpG sites (or grid points)
with absolute methylation differences greater or equal
than |
This function determines the differences between the methylation levels of sample1 and sample2 for each site. Successive sites with methylation differences smaller or equal to minDiff
are summarized.
A GRanges object.
Katja Hebestreit
data(rrbs) rrbs <- rrbs[, c(1,6)] CpG.clusters <- clusterSites(object = rrbs, perc.samples = 1, min.sites = 20, max.dist = 100) predictedMeth <- predictMeth(object = CpG.clusters) DMRs <- compareTwoSamples(predictedMeth, sample1 = 1, sample2 = 2, minDiff = 0.3, max.dist = 100)
data(rrbs) rrbs <- rrbs[, c(1,6)] CpG.clusters <- clusterSites(object = rrbs, perc.samples = 1, min.sites = 20, max.dist = 100) predictedMeth <- predictMeth(object = CpG.clusters) DMRs <- compareTwoSamples(predictedMeth, sample1 = 1, sample2 = 2, minDiff = 0.3, max.dist = 100)
A boxplot per sample is plotted for the coverages of CpG-sites.
It is constrained to CpG-sites which are covered in the respective sample (coverage != 0
and not NA
).
R covBoxplots(object, ...)
R covBoxplots(object, ...)
object |
A |
... |
Other graphical parameters passed to the |
Katja Hebestreit
boxplot
data(rrbs) covBoxplots(rrbs)
data(rrbs) covBoxplots(rrbs)
This function produces information per samples about 1.) the covered CpG-sites 2.) the median of their coverages.
covStatistics(object)
covStatistics(object)
object |
A |
Katja Hebestreit
data(rrbs) covStatistics(rrbs)
data(rrbs) covStatistics(rrbs)
findDMRs
Please see the package vignette for description.
data(DMRs)
data(DMRs)
A GRanges
of the chromosomes, start and end positions of the
detected DMRs together with information (in the metadata columns) on
DMRs: median.p
, codemedian.meth.group1,
codemedian.meth.group2, median.meth.diff
.
data(DMRs) head(DMRs)
data(DMRs) head(DMRs)
For each location the correlation of this location's z-score to
of its CpG cluster is estimated.
estLocCor(vario.sm)
estLocCor(vario.sm)
vario.sm |
Output of |
A list:
variogram |
A variogram matrix, usually created by
|
pValsList |
A list of the test results per CpG cluster. |
sigma.cluster |
The standard deviations of z-scores within each cluster. |
Z.cluster |
The arithmetic means of the z-scores for each cluster. |
length.cluster |
The widths (number of pase pairs) of each cluster. |
Katja Hebestreit
Yoav Benjamini and Ruth Heller (2007): False Discovery Rates for Spatial Signals. American Statistical Association, 102 (480): 1272-81.
makeVariogram
, smoothVariogram
data(betaResultsNull) vario <- makeVariogram(betaResultsNull) vario.sm <- smoothVariogram(vario, sill = 1) locCor <- estLocCor(vario.sm)
data(betaResultsNull) vario <- makeVariogram(betaResultsNull) vario.sm <- smoothVariogram(vario, sill = 1) locCor <- estLocCor(vario.sm)
BSraw
object with a minimum
coverage
This method reduces a BSraw
object to its regions (or single CpGs) with a minimum number of reads.
filterByCov(object, minCov, global)
filterByCov(object, minCov, global)
object |
A |
minCov |
Minimum number of reads overlapping the CpG sites. |
global |
A logical indicating whether the regions should achieve
the minimum coverage in each sample. If |
A BSraw
object containing the CpGs or regions achieving
the minimum coverage in all (if global=TRUE
) or at least one
(if global=FALSE
) samples.
Katja Hebestreit
data(rrbs) rrbs.reduced <- filterByCov(object=rrbs, minCov=10, global=TRUE)
data(rrbs) rrbs.reduced <- filterByCov(object=rrbs, minCov=10, global=TRUE)
This function aggregates CpG sites to DMRs on the basis of their P values.
findDMRs(test.out, alpha, max.dist, diff.dir)
findDMRs(test.out, alpha, max.dist, diff.dir)
test.out |
An object returned by |
alpha |
OPTIONAL. A DMR contains CpG sites with P values smaller or equal than |
max.dist |
Numeric. The maximum distance between two P values smaller than
|
diff.dir |
Logical. Should DMRs be seperated if the direction of methylation
differences changes? If |
A GRanges
object storing the start and end positions of the DMRs with information in metadata columns:
median.p |
median of P values |
median.meth.group1 |
median of modeled methylation level of group1. |
median.meth.group1 |
median of modeled methylation level of group2. |
median.meth.diff |
median of difference of modeled methylation levels of group1 and group2. |
Katja Hebestreit
## Variogram under Null hypothesis (for resampled data): data(vario) plot(vario$variogram$v) vario.sm <- smoothVariogram(vario, sill=0.9) # auxiliary object to get the pValsList for the test # results of interest: data(betaResults) vario.aux <- makeVariogram(betaResults, make.variogram=FALSE) # Replace the pValsList slot: vario.sm$pValsList <- vario.aux$pValsList ## vario.sm contains the smoothed variogram under the Null hypothesis as ## well as the p Values that the group has an effect on DNA methylation. locCor <- estLocCor(vario.sm) clusters.rej <- testClusters(locCor, FDR.cluster = 0.1) clusters.trimmed <- trimClusters(clusters.rej, FDR.loc = 0.05) DMRs <- findDMRs(clusters.trimmed, max.dist=100, diff.dir=TRUE)
## Variogram under Null hypothesis (for resampled data): data(vario) plot(vario$variogram$v) vario.sm <- smoothVariogram(vario, sill=0.9) # auxiliary object to get the pValsList for the test # results of interest: data(betaResults) vario.aux <- makeVariogram(betaResults, make.variogram=FALSE) # Replace the pValsList slot: vario.sm$pValsList <- vario.aux$pValsList ## vario.sm contains the smoothed variogram under the Null hypothesis as ## well as the p Values that the group has an effect on DNA methylation. locCor <- estLocCor(vario.sm) clusters.rej <- testClusters(locCor, FDR.cluster = 0.1) clusters.trimmed <- trimClusters(clusters.rej, FDR.loc = 0.05) DMRs <- findDMRs(clusters.trimmed, max.dist=100, diff.dir=TRUE)
This method is a wrapper for conveniently invoking the
globaltest method gt
on a BSrel-class
object. The
globaltest can be applied to test against a high dimensional
alternative in various regression models. E.g., it can be used to test
whether at least one CpG is differentially methylated between two
groups.
globalTest(response, alternative, ...)
globalTest(response, alternative, ...)
response |
The response vector of the regression model. May be
supplied as a vector or as a |
alternative |
An object of |
... |
Other arguments passed to the |
For details see the documentation of the gt
method in
package globaltest.
The function returns an object of class gt.object
. Several
operations and diagnostic plots for this class are provided by the
globaltest package.
Hans-Ulrich Klein
Goeman, J. J., van de Geer, S. A., and van Houwelingen, J. C. (2006). Testing against a high-dimensional alternative. Journal of the Royal Statistical Society Series B- Statistical Methodology, 68(3):477-493.
link{gt}
, link{BSrel}
data(rrbs) rrbs <- rawToRel(rrbs) regions <- GRanges(IRanges(start=c(850000, 1920000, 500), end=c(879000, 1980000, 600)), seqnames=c("chr1", "chr2", "chr3")) globalTest(group~1, rrbs) globalTest(group~1, rrbs, subsets=regions)
data(rrbs) rrbs <- rawToRel(rrbs) regions <- GRanges(IRanges(start=c(850000, 1920000, 500), end=c(879000, 1980000, 600)), seqnames=c("chr1", "chr2", "chr3")) globalTest(group~1, rrbs) globalTest(group~1, rrbs, subsets=regions)
BSraw
objectNumber of methylated and unmethylated reads
of a CpG site with coverage above maxCov
are reduced such that
the methylation level remains unchanged.
limitCov(object, maxCov)
limitCov(object, maxCov)
object |
A |
maxCov |
The maximum number of reads a CpG should have. All coverages above this threshold are limited. (Default is 50) |
This function might be useful prior to the use of
predictMeth
to limit the weights of CpGs with extremly
high coverages. See binomLikelihoodSmooth
for details.
A BSraw
object.
Katja Hebestreit
predictMeth
, binomLikelihoodSmooth
data(rrbs) rrbs.clust.unlim <- clusterSites(object = rrbs, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 20, max.dist = 100) covBoxplots(rrbs.clust.unlim) # 90% quantile of coverage is 39x quantile(totalReads(rrbs.clust.unlim)[totalReads(rrbs.clust.unlim)>0], 0.9) rrbs.clust.lim <- limitCov(rrbs.clust.unlim, maxCov = 39) covBoxplots(rrbs.clust.lim)
data(rrbs) rrbs.clust.unlim <- clusterSites(object = rrbs, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 20, max.dist = 100) covBoxplots(rrbs.clust.unlim) # 90% quantile of coverage is 39x quantile(totalReads(rrbs.clust.unlim)[totalReads(rrbs.clust.unlim)>0], 0.9) rrbs.clust.lim <- limitCov(rrbs.clust.unlim, maxCov = 39) covBoxplots(rrbs.clust.lim)
It is used to fit a linear model on the log odds of each (smoothed) methylation level. The first independent variable in formula
is tested to be unequal to zero.
logisticRegression(formula, link, object, mc.cores)
logisticRegression(formula, link, object, mc.cores)
formula |
An object of class |
link |
A character specifying the link function. Currently,
|
object |
A |
mc.cores |
Passed to |
A data.frame
containing the position, chromosome, P value, estimated
methylation level in group 1 and group 2 and methylation difference of
group 1 and group 2.
Katja Hebestreit
mclapply
, glm
data(predictedMeth) logisticResults <- logisticRegression(formula = ~group, link = "logit", object = predictedMeth)
data(predictedMeth) logisticResults <- logisticRegression(formula = ~group, link = "logit", object = predictedMeth)
A function which estimates the variogram of the z-scores in the given data frame.
makeVariogram(test.out, make.variogram, sample.clusters, max.dist)
makeVariogram(test.out, make.variogram, sample.clusters, max.dist)
test.out |
A |
make.variogram |
A |
sample.clusters |
Can speed up variogram estimation
significantly. Default is |
max.dist |
Can speed up variogram estimation significantly. The variogram is estimated for distances until this threshold. Default is 500 base pairs, since the variogram usually does not change for distances larger than 100 base pairs, because methylation of CpG sites further away are not correlated anymore. Especially useful if there are large clusters. |
For each CpG site the z-score is determined by qnorm
(1 -
P value). The variogram of the z-scores of locations
and
within one cluster is estimated robustly by
.
A list:
variogram |
A |
pValsList |
A |
Katja Hebestreit
Yoav Benjamini and Ruth Heller (2007): False Discovery Rates for Spatial Signals. American Statistical Association, 102 (480): 1272-81.
data(betaResults) vario <- makeVariogram(betaResults) plot(vario$variogram$v)
data(betaResults) vario <- makeVariogram(betaResults) plot(vario$variogram$v)
plotBindingSites
takes several genomic regions (e.g. protein binding
sites), centers them such that the position 0 refers to the center of each region
and finally calculates the mean methylation of all regions for each given sample.
If several samples are given, the median of the samples' methylation values and
optionally other quantiles are plotted.
plotBindingSites(object, regions, width, groups, quantiles, bandwidth, ...)
plotBindingSites(object, regions, width, groups, quantiles, bandwidth, ...)
object |
An object of class |
regions |
Regions given a |
width |
The width of the genomic region that is plotted. Default value is the width of the largest given region. |
groups |
An optional factor defining two or more groups within the given object. The mean methylation is than plotted for each group separately. |
quantiles |
Other quantiles to be plotted besides the median. Default are the 25% and the 75% quantiles. |
bandwidth |
The bandwidth of the kernel smoother used for smoothing methylation values. Default value is 1/8 |
... |
Other graphical parameters passed to the |
First, all regions were expanded or shrinked to the given width
by adding
or removing base pairs symmetrically at both ends of the regions (not by scaling).
A new coordinate system is centered at the middle of the equally sized regions.
Next, the relative methylation values for each sample are averaged accross all
regions. That means, if there are several CpGs from different regions lying the
same position, the mean methylation value is calculated for that position. Then,
the median of these methylation values across all samples is calculated.
Optionally, other quantiles are calculated, too. The median of the methylation is
then plotted for each position after smoothing using a gaussian kernel with the
given bandwidth.
If the given regions correspond to binding sites of a certain protein, the plot can be used to discover whether the protein induces changes in the DNA methylation in the proximity of its binding sites.
Hans-Ulrich Klein
data(rrbs) data(promoters) plotBindingSites(object=rrbs, regions=promoters, width=4000, groups=colData(rrbs)$group)
data(rrbs) data(promoters) plotBindingSites(object=rrbs, regions=promoters, width=4000, groups=colData(rrbs)$group)
This function plots the raw and the smoothed methylation data for one sample and a given region. The smoothed data is shown as a line (one line per CpG cluster) and the raw data is shown as points with color intensities proportional to the coverage.
plotMeth(object.raw, object.rel, region, col.lines, lwd.lines, col.points, ...)
plotMeth(object.raw, object.rel, region, col.lines, lwd.lines, col.points, ...)
object.raw |
A |
object.rel |
A |
region |
A |
col.lines |
OPTIONAL. The color for the line representing the smoothed methylation values. |
lwd.lines |
OPTIONAL. The line width for the line representing the smoothed methylation values. |
col.points |
OPTIONAL. The color for the points representing the raw methylation levels. |
... |
Other graphical parameters passed to the |
Katja Hebestreit
plotSmoothMeth
, plot
data(rrbs) data(predictedMeth) region <- GRanges(seqnames="chr1", ranges=IRanges(start = 875200, end = 875500)) plotMeth(object.raw = rrbs[,6], object.rel = predictedMeth[,6], region = region)
data(rrbs) data(predictedMeth) region <- GRanges(seqnames="chr1", ranges=IRanges(start = 875200, end = 875500)) plotMeth(object.raw = rrbs[,6], object.rel = predictedMeth[,6], region = region)
A heatmap like plot is generated showing the relative methylation of single CpG sites. Samples are clustered hierarchically.
plotMethMap(object, region, groups, intervals, ...)
plotMethMap(object, region, groups, intervals, ...)
object |
A |
region |
A |
groups |
OPTIONAL. A |
intervals |
OPTIONAL. A |
... |
Further arguments passed to the |
The relative methylation values are passed to the heatmap function. Default
colors are green (not methylated), black and red (methylated). To ensure
that a relative methylation of 0 corresponds to green, 0.5 to black and
1 to red, the default value for the zlim
argument of the
heatmap
function is set to c(0,1)
. And the default
for the scale
parameter is set to "none"
.
If argument intervals
is set to TRUE
, region should not
be too large (< 1kb) and respect the resolution of your screen.
Hans-Ulrich Klein
heatmap
, BSraw-class
, BSrel-class
, filterBySharedRegions
,
filterByCov
data(rrbs) data(predictedMeth) data(DMRs) plotMethMap(rrbs, region = DMRs[4], groups = colData(rrbs)[, "group"]) plotMethMap(predictedMeth, region = DMRs[4], groups = colData(rrbs)[,"group"], intervals = FALSE)
data(rrbs) data(predictedMeth) data(DMRs) plotMethMap(rrbs, region = DMRs[4], groups = colData(rrbs)[, "group"]) plotMethMap(predictedMeth, region = DMRs[4], groups = colData(rrbs)[,"group"], intervals = FALSE)
This function plots the smoothed methylation data as lines for a given region and all given samples. It is also possible to average the data for groups of samples.
plotSmoothMeth(object.rel, region, groups, group.average, ...)
plotSmoothMeth(object.rel, region, groups, group.average, ...)
object.rel |
A |
region |
A |
groups |
OPTIONAL. A |
group.average |
OPTIONAL. A |
... |
Other graphical parameters passed to the |
Katja Hebestreit
plotMeth
, plot
data(predictedMeth) data(DMRs) plotSmoothMeth(object.rel = predictedMeth, region = DMRs[3] + 200, groups = colData(predictedMeth)$group, col=c("magenta", "blue")) legend("topright", lty=1, legend=levels(colData(predictedMeth)$group), col=c("magenta", "blue"))
data(predictedMeth) data(DMRs) plotSmoothMeth(object.rel = predictedMeth, region = DMRs[3] + 200, groups = colData(predictedMeth)$group, col=c("magenta", "blue")) legend("topright", lty=1, legend=levels(colData(predictedMeth)$group), col=c("magenta", "blue"))
predictMeth
Please see the package vignette for description.
data(predictedMeth)
data(predictedMeth)
A BSrel
object with the smoothed methylation data.
data(predictedMeth) show(predictedMeth)
data(predictedMeth) show(predictedMeth)
Uses local regression to predict methylation levels per sample.
predictMeth(object, h, grid.dist, mc.cores)
predictMeth(object, h, grid.dist, mc.cores)
object |
A |
h |
Bandwidth in base pairs. Large values produce a smoother curve. Default is 80. |
grid.dist |
OPTIONAL. If |
mc.cores |
Passed to |
Uses binomLikelihoodSmooth
with pos
= CpG position, m
= number
methylated reads and n
= number of reads. pred.pos
corresponds to all CpG positions, or to the grid sites respectively, within the CpG clusters.
A BSrel
object containing the predicted methylation levels in
the methLevel
slot.
Katja Hebestreit
clusterSites
, binomLikelihoodSmooth
, mclapply
data(rrbs) rrbs.clust.unlim <- clusterSites(object = rrbs, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 20, max.dist = 100) ind.cov <- totalReads(rrbs.clust.unlim) > 0 quant <- quantile(totalReads(rrbs.clust.unlim)[ind.cov], 0.9) rrbs.clust.lim <- limitCov(rrbs.clust.unlim, maxCov = quant) # with a small subset to save calculation time: rrbs.part <- rrbs.clust.lim[1:100,] predictedMeth <- predictMeth(object=rrbs.part)
data(rrbs) rrbs.clust.unlim <- clusterSites(object = rrbs, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 20, max.dist = 100) ind.cov <- totalReads(rrbs.clust.unlim) > 0 quant <- quantile(totalReads(rrbs.clust.unlim)[ind.cov], 0.9) rrbs.clust.lim <- limitCov(rrbs.clust.unlim, maxCov = quant) # with a small subset to save calculation time: rrbs.part <- rrbs.clust.lim[1:100,] predictedMeth <- predictMeth(object=rrbs.part)
GRanges
of promoters of the human genome
Please see the package vignette for description.
data(promoters)
data(promoters)
A GRanges
object with the chromosomes, start and end positions of
defined human promoter regions together with an accesion number stored in a
metadata column.
data(promoters) head(promoters)
data(promoters) head(promoters)
BSraw
object to a BSrel
objectDetermines the methLevel
matrix via: methReads(object) / totalReads(object)
.
rawToRel(object)
rawToRel(object)
object |
A |
A BSrel
.
Katja Hebestreit
data(rrbs) rrbs.rel <- rawToRel(rrbs)
data(rrbs) rrbs.rel <- rawToRel(rrbs)
Bismark is a bisulfite read mapper and methylation caller. This method reads Bismark's
output files and returns a BSraw
object.
readBismark(files, colData)
readBismark(files, colData)
files |
A |
colData |
Samples' names plus additional sample information
as |
Input files are created with Bismark as follows (from the command line):
bismark_methylation_extractor -s --comprehensive test_sample.sam
bismark2bedGraph -o CpG_context_test_sample.bedGraph
CpG_context_test_sample.txt
This will output two files, a .bedGraph
and a .cov
file.
We will import the CpG_context_test_sample.cov
using readBismark
.
The colData
argument should specify the sample names as
character
. Alternatively, a data.frame
or
DataFrame
can be given. Then, the row names are used as
sample names and the data frame is passed to the final
BSraw
object.
A BSraw
object storing coverage and methylation information.
Hans-Ulrich Klein
http://www.bioinformatics.bbsrc.ac.uk/projects/bismark/
file <- system.file("extdata", "CpG_context_test_sample.cov", package = "BiSeq") rrbs <- readBismark(file, colData= DataFrame(row.names="sample_1"))
file <- system.file("extdata", "CpG_context_test_sample.cov", package = "BiSeq") rrbs <- readBismark(file, colData= DataFrame(row.names="sample_1"))
RRBS data of the CpG sites CpG sites from genomic regions on p arms of chromosome 1 and 2 covered in at least one sample. Data was obtained from 5 APL patient samples and 5 control samples (APL in remission). RRBS data was preprocessed with the Bismark software version 0.5.
rrbs
rrbs
A BSraw-class
object.
Schoofs T, Rohde C, Hebestreit K, Klein HU, Goellner S, Schulze I, Lerdrup M, Dietrich N, Agrawal-Singh S, Witten A, Stoll M, Lengfelder E, Hofmann WK, Schlenke P, Buechner T, Hansen K, Berdel WE, Rosenbauer F, Dugas M, Mueller-Tidow C (2012). DNA methylation changes are a late event in acute promyelocytic leukemia and coincide with loss of transcription factor binding. Blood.
Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27:1571-1572.
data(rrbs) show(rrbs)
data(rrbs) show(rrbs)
Nonparametric smoothing with kernel regression estimators and adaptable bandwidth for variogram smoothing.
smoothVariogram(variogram, sill, bandwidth)
smoothVariogram(variogram, sill, bandwidth)
variogram |
A |
sill |
A |
bandwidth |
A numeric vector of same length as the variogram (number of
rows). Default: |
It is necessary to smooth the variogram. Especially for greater
h
the variogram tends to oscillate strongly. This is the reason
why the default bandwidth increases with increasing
h
. Nevertheless, the smoothed variogram may further increase or
decrease after a horizontal part (sill). This is mostly due to the small number
of observations for high distances. To wipe out this bias it is useful to
set the smoothed variogram to a fixed value above a certain h
, usually the mean value of the
horizontal part. If a smoothed value is greater than
sill
for distance , this
and all other
smoothed values with
are set to
sill
. Internally, the function lokerns
from package lokerns
is used for smoothing.
The variogram matrix (or a list with the variogram matrix) with an additional column of the smoothed
v
values.
Katja Hebestreit
makeVariogram
, lokerns
data(vario) # Find out the sill (this is more obvious for larger data sets): plot(vario$variogram$v) vario.sm <- smoothVariogram(vario, sill = 0.9) plot(vario$variogram$v) lines(vario.sm$variogram[,c("h", "v.sm")], col = "red")
data(vario) # Find out the sill (this is more obvious for larger data sets): plot(vario$variogram$v) vario.sm <- smoothVariogram(vario, sill = 0.9) plot(vario$variogram$v) lines(vario.sm$variogram[,c("h", "v.sm")], col = "red")
This method summarizes the methylation states of single CpG sites to a single methylation state for a given genomic region.
summarizeRegions(object, regions, outputAll)
summarizeRegions(object, regions, outputAll)
object |
An |
regions |
A |
outputAll |
A logical. If |
When the given object is of class BSraw-class
, all
(methylated) reads of all CpG site lying within a region are summed up and
assign as total number of (methylated) reads to that region. It is recommended
to use limitCov
before applying summarizeRegions
to an
BSraw-class
object in order to avoid an excessive influence of a
single CpG site on the methylation value of a region. When the
given object is of class BSrel-class
, the mean relative methylation
of all CpGs within a region is assign to that region.
The rowRanges
slot of the returned object is the given object
regions
with all columns preserved.
An BSraw
or an BSrel
object storing methylation information about
the given regions.
Hans-Ulrich Klein
BSraw-class
, BSrel-class
, limitCov
data(rrbs) rrbs.clustered <- clusterSites(rrbs) regions <- clusterSitesToGR(rrbs.clustered) rrbs <- limitCov(rrbs, maxCov=50) rrbsRegion <- summarizeRegions(rrbs, regions) totalReads(rrbsRegion)
data(rrbs) rrbs.clustered <- clusterSites(rrbs) regions <- clusterSitesToGR(rrbs.clustered) rrbs <- limitCov(rrbs, maxCov=50) rrbsRegion <- summarizeRegions(rrbs, regions) totalReads(rrbsRegion)
CpG clusters are tested with a cluster-wise FDR level.
testClusters(locCor, FDR.cluster)
testClusters(locCor, FDR.cluster)
locCor |
Output of |
FDR.cluster |
A |
CpG clusters containing at least one differentially methylated location are detected.
A list is returned:
FDR.cluster |
Chosen WFDR (weighted FDR) for clusters. |
CpGs.clust.reject |
A |
CpGs.clust.not.reject |
A |
clusters.reject |
A |
clusters.not.reject |
A |
sigma.clusters.reject |
The standard deviations for z-scores within each rejected cluster. |
variogram |
The variogram matrix. |
m |
Number of clusters tested. |
k |
Number of clusters rejected. |
u.1 |
Cutoff point of the largest P value rejected. |
Katja Hebestreit
Yoav Benjamini and Ruth Heller (2007): False Discovery Rates for Spatial Signals. American Statistical Association, 102 (480): 1272-81.
## Variogram under Null hypothesis (for resampled data): data(vario) plot(vario$variogram$v) vario.sm <- smoothVariogram(vario, sill=0.9) # auxiliary object to get the pValsList for the test # results of interest: data(betaResults) vario.aux <- makeVariogram(betaResults, make.variogram=FALSE) # Replace the pValsList slot: vario.sm$pValsList <- vario.aux$pValsList ## vario.sm contains the smoothed variogram under the Null hypothesis as ## well as the p Values that the group has an effect on DNA methylation. locCor <- estLocCor(vario.sm) clusters.rej <- testClusters(locCor, FDR.cluster = 0.1)
## Variogram under Null hypothesis (for resampled data): data(vario) plot(vario$variogram$v) vario.sm <- smoothVariogram(vario, sill=0.9) # auxiliary object to get the pValsList for the test # results of interest: data(betaResults) vario.aux <- makeVariogram(betaResults, make.variogram=FALSE) # Replace the pValsList slot: vario.sm$pValsList <- vario.aux$pValsList ## vario.sm contains the smoothed variogram under the Null hypothesis as ## well as the p Values that the group has an effect on DNA methylation. locCor <- estLocCor(vario.sm) clusters.rej <- testClusters(locCor, FDR.cluster = 0.1)
CpG clusters rejected in a previous step are trimmed.
trimClusters(clusters.rej, FDR.loc)
trimClusters(clusters.rej, FDR.loc)
clusters.rej |
Output of |
FDR.loc |
Location-wise FDR level. Default is 0.2. |
Not differentially methylated CpG sites are removed within the CpG
clusters rejected by testClusters
.
A data.frame
containing the differentially methylated CpG sites.
Katja Hebestreit
Yoav Benjamini and Ruth Heller (2007): False Discovery Rates for Spatial Signals. American Statistical Association, 102 (480): 1272-81.
## Variogram under Null hypothesis (for resampled data): data(vario) plot(vario$variogram$v) vario.sm <- smoothVariogram(vario, sill=0.9) # auxiliary object to get the pValsList for the test # results of interest: data(betaResults) vario.aux <- makeVariogram(betaResults, make.variogram=FALSE) # Replace the pValsList slot: vario.sm$pValsList <- vario.aux$pValsList ## vario.sm contains the smoothed variogram under the Null hypothesis as ## well as the p Values that the group has an effect on DNA methylation. locCor <- estLocCor(vario.sm) clusters.rej <- testClusters(locCor, FDR.cluster = 0.1) clusters.trimmed <- trimClusters(clusters.rej, FDR.loc = 0.05)
## Variogram under Null hypothesis (for resampled data): data(vario) plot(vario$variogram$v) vario.sm <- smoothVariogram(vario, sill=0.9) # auxiliary object to get the pValsList for the test # results of interest: data(betaResults) vario.aux <- makeVariogram(betaResults, make.variogram=FALSE) # Replace the pValsList slot: vario.sm$pValsList <- vario.aux$pValsList ## vario.sm contains the smoothed variogram under the Null hypothesis as ## well as the p Values that the group has an effect on DNA methylation. locCor <- estLocCor(vario.sm) clusters.rej <- testClusters(locCor, FDR.cluster = 0.1) clusters.trimmed <- trimClusters(clusters.rej, FDR.loc = 0.05)
makeVariogram
Please see the package vignette for description.
data(vario)
data(vario)
A list
consisting of the variogram
(a matrix
) and
the pValsList
(a list
of the data frames of test results).
data(vario) names(vario)
data(vario) names(vario)
BSraw
and BSrel
data to a bed file suitable for the IGVThe created bed files contains an entry for each CpG site. Strand information, relative methylation and absolute number of reads covering the CpG sites are stored. The relative methylation is indicated by colors: green via black to red for unmethylated to methylated.
writeBED(object, name, file)
writeBED(object, name, file)
object |
A |
name |
Track names (sample names) written to the bed file's header. |
file |
Character vector with names of the bed file. |
The written bed file contains the following extra information:
score: the relative methylation of the CpG site
name: the coverage of the CpG site
itemRgb: a color value visualizing the methylation score
A separate bed file is created for each sample in the
given object. The lengths of the arguments
name
and file
should equal the number of
samples.
Nothing. Bed files are written.
Hans-Ulrich Klein
data(rrbs) s1 <- rrbs[,1] out <- tempfile(, fileext = ".bed") writeBED(s1, name = colnames(s1), file = out)
data(rrbs) s1 <- rrbs[,1] out <- tempfile(, fileext = ".bed") writeBED(s1, name = colnames(s1), file = out)