Package 'SOMNiBUS' reference manual

Title:	Smooth modeling of bisulfite sequencing
Description:	This package aims to analyse count-based methylation data on predefined genomic regions, such as those obtained by targeted sequencing, and thus to identify differentially methylated regions (DMRs) that are associated with phenotypes or traits. The method is built a rich flexible model that allows for the effects, on the methylation levels, of multiple covariates to vary smoothly along genomic regions. At the same time, this method also allows for sequencing errors and can adjust for variability in cell type mixture.
Authors:	Kaiqiong Zhao [aut], Kathleen Klein [cre], Audrey Lemaçon [ctb, ctr], Simon Laurin-Lemay [ctb, ctr], My Intelligent Machines Inc. [ctr], Celia Greenwood [ths, aut]
Maintainer:	Kathleen Klein <[email protected]>
License:	MIT + file LICENSE
Version:	1.15.1
Built:	2025-03-02 04:08:25 UTC
Source:	https://github.com/bioc/SOMNiBUS

A smoothed-EM algorithm to estimate covariate effects and test regional association in Bisulfite Sequencing-derived methylation data

Description

This function fits a (dispersion-adjusted) binomial regression model to regional methylation data, and reports the estimated smooth covariate effects and regional p-values for the test of DMRs (differentially methylation regions). Over or under dispersion across loci is accounted for in the model by the combination of a multiplicative dispersion parameter (or scale parameter) and a sample-specific random effect.

This method can deal with outcomes, i.e. the number of methylated reads in a region, that are contaminated by known false methylation calling rate (p0) and false non-methylation calling rate (1-p1).

The covariate effects are assumed to smoothly vary across genomic regions. In order to estimate them, the algorithm first represents the functional parameters by a linear combination of a set of restricted cubic splines (with dimension n.k), and a smoothness penalization term which depends on the smoothing parameters lambdas is also added to control smoothness. The estimation is performed by an iterated EM algorithm. Each M step constitutes an outer Newton's iteration to estimate smoothing parameters lambdas and an inner P-IRLS iteration to estimate spline coefficients alpha for the covariate effects. Currently, the computation in the M step depends the implementation of gam() in package mgcv.

Usage

binomRegMethModel(
  data,
  n.k,
  p0 = 0.003,
  p1 = 0.9,
  Quasi = TRUE,
  epsilon = 10^(-6),
  epsilon.lambda = 10^(-3),
  maxStep = 200,
  binom.link = "logit",
  method = "REML",
  covs = NULL,
  RanEff = TRUE,
  reml.scale = FALSE,
  scale = -2,
  verbose = TRUE
)
binomRegMethModel(
  data,
  n.k,
  p0 = 0.003,
  p1 = 0.9,
  Quasi = TRUE,
  epsilon = 10^(-6),
  epsilon.lambda = 10^(-3),
  maxStep = 200,
  binom.link = "logit",
  method = "REML",
  covs = NULL,
  RanEff = TRUE,
  reml.scale = FALSE,
  scale = -2,
  verbose = TRUE
)

Arguments

`data`	a data frame with rows as individual CpGs appearing in all the samples. The first 4 columns should contain the information of `Meth_Counts` (methylated counts), `Total_Counts` (read depths), `Position` (Genomic position for the CpG site) and `ID` (sample ID). The covariate information, such as disease status or cell type composition, are listed in column 5 and onwards.
`n.k`	a vector of basis dimensions for the intercept and individual covariates. `n.k` specifies an upper limit of the degrees of each functional parameters. The length of n.k should equal to the number of covariates plus 1 (for the intercept)). We recommend basis dimensions n.k, approximately equal to the number of unique CpGs in the region divided by 20. This parameter will be computed automatically, when several regions are generated by the partitioning function.
`p0`	the probability of observing a methylated read when the underlying true status is unmethylated. `p0` is the rate of false methylation calls, i.e. false positive rate.
`p1`	the probability of observing a methylated read when the underlying true status is methylated. `1-p1` is the rate of false non-methylation calls, i.e. false negative rate.
`Quasi`	whether a Quasi-likelihood estimation approach will be used; in other words, whether a multiplicative dispersion is added in the model or not.
`epsilon`	numeric; stopping criterion for the closeness of estimates of spline coefficients from two consecutive iterations.
`epsilon.lambda`	numeric; stopping criterion for the closeness of estimates of smoothing parameter `lambda` from two consecutive iterations.
`maxStep`	the algorithm will step if the iteration steps exceed `maxStep`.
`binom.link`	the link function used in the binomial regression model; the default is the logit link.
`method`	the method used to estimate the smoothing parameters. The default is the 'REML' method which is generally better than prediction based criterion `GCV.cp`.
`covs`	a vector of covariate names. The covariates with names in `covs` will be included in the model and their covariate effects will be estimated. The default is to fit all covariates in `dat`
`RanEff`	whether sample-level random effects are added or not
`reml.scale`	whether a REML-based scale (dispersion) estimator is used. The default is Fletcher-based estimator.
`scale`	negative values mean scale parameter should be estimated; if a positive value is provided, a fixed scale will be used.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

This function return a list including objects:

est: estimates of the spline basis coefficients alpha
lambda: estimates of the smoothing parameters for each functional parameters
est.pi: predicted methylation levels for each row in the input data
ite.points: estimates of est, lambda at each EM iteration
cov1: estimated variance-covariance matrix of the basis coefficients alphas
reg.out: regional testing output obtained using Fletcher-based dispersion estimate; an additional 'ID' row would appear if RanEff is TRUE
reg.out.reml.scale:regional testing output obtained using REML-based dispersion estimate;
reg.out.gam:regional testing output obtained using (Fletcher-based) dispersion estimate from mgcv package;
phi_fletcher: Fletcher-based estimate of the (multiplicative) dispersion parameter;
phi_reml: REML-based estimate of the (multiplicative) dispersion parameter;
phi_gam: Estimated dispersion parameter reported by mgcv;
SE.out: a matrix of the estimated pointwise Standard Errors (SE); number of rows are the number of unique CpG sites in the input data and the number of columns equal to the total number of covariates fitted in the model (the first one is the intercept);
SE.out.REML.scale: a matrix of the estimated pointwise Standard Errors (SE); the SE calculated from the REML-based dispersion estimates
uni.pos: the genomic postions for each row of CpG sites in the matrix SE.out;
Beta.out: a matrix of the estimated covariate effects beta(t), where t denotes the genomic positions;
ncovs: number of functional paramters in the model (including the intercept);
sigma00: estimated variance for the random effect if RanEff is TRUE; NA if RanEff is FALSE.

Author(s)

Kaiqiong Zhao

Examples

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
out <- binomRegMethModel(
  data=RAdat.f, n.k=rep(5,3), p0=0.003307034, p1=0.9,
  epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200
)
#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
out <- binomRegMethModel(
  data=RAdat.f, n.k=rep(5,3), p0=0.003307034, p1=0.9,
  epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200
)

Plot the smooth covariate effect

Description

This function accepts an output object from function binomRegMethModel and print out a plot of the estimated effect across the region for each test covariate.

Usage

binomRegMethModelPlot(
  BEM.obj,
  mfrow = NULL,
  same.range = FALSE,
  title = "Smooth covariate effects",
  covs = NULL,
  save = NULL,
  verbose = TRUE
)
binomRegMethModelPlot(
  BEM.obj,
  mfrow = NULL,
  same.range = FALSE,
  title = "Smooth covariate effects",
  covs = NULL,
  save = NULL,
  verbose = TRUE
)

Arguments

`BEM.obj`	an output object from function `binomRegMethModel`
`mfrow`	A vector of the form c(nr, nc). Subsequent figures will be drawn in an nr-by-nc array on the device.
`same.range`	specify whether the plots should be in the same vertical scale
`title`	the text for the title
`covs`	a vector of covariate names. The covariates with names in `covs` will be included in the plot. When the value is set to NULL all the covariates and the Intercept will be represented. The default value is NULL.
`save`	file name to create on disk. When the value is set to NULL, the plot is not saved. The default value is NULL.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

This function prints out a plot of smooth covariate effects and its pointwise confidence intervals

Author(s)

Kaiqiong Zhao, Audrey Lemaçon

Examples

#------------------------------------------------------------#
data(RAdat)
head(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
out <- binomRegMethModel(
  data=RAdat.f, n.k=rep(5, 3), p0=0.003307034, p1=0.9,
  epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200,
  Quasi = FALSE, RanEff = FALSE
)
binomRegMethModelPlot(out, same.range=FALSE)


#------------------------------------------------------------#
data(RAdat)
head(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
out <- binomRegMethModel(
  data=RAdat.f, n.k=rep(5, 3), p0=0.003307034, p1=0.9,
  epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200,
  Quasi = FALSE, RanEff = FALSE
)
binomRegMethModelPlot(out, same.range=FALSE)

A smoothed-EM algorithm to estimate covariate effects and test regional association in Bisulfite Sequencing-derived methylation data

Description

This function returns the predicted methylation levels

Usage

binomRegMethModelPred(
  BEM.obj,
  newdata = NULL,
  type = "proportion",
  verbose = TRUE
)
binomRegMethModelPred(
  BEM.obj,
  newdata = NULL,
  type = "proportion",
  verbose = TRUE
)

Arguments

`BEM.obj`	an output from the function `binomRegMethModel`
`newdata`	the data set whose predictions are calculated; with columns 'Position', covariate names matching the BEM.obj
`type`	return the predicted methylation proportion or the predicted response (in logit or other binom.link scale)
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

This function returns the predicted methylation levels

Author(s)

Kaiqiong Zhao

Examples

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
out <- binomRegMethModel(
  data=RAdat.f, n.k=rep(5, 3), p0=0.003307034, p1=0.9,
  epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200,
  Quasi = FALSE, RanEff = FALSE
)
binomRegMethModelPred(out)
#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
out <- binomRegMethModel(
  data=RAdat.f, n.k=rep(5, 3), p0=0.003307034, p1=0.9,
  epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200,
  Quasi = FALSE, RanEff = FALSE
)
binomRegMethModelPred(out)

Simulate Bisulfite sequencing data from specified smooth covariate effects

Description

Simulate Bisulfite sequencing data from a Generalized Additive Model with functional parameters varying with the genomic position. Both the true methylated counts and observed methylated counts are generated, given the error/conversion rate parameters p0 and p1. In addition, the true methylated counts can be simulated from a binomial or a dispersed binomial distribution (Beta-binomial distribution).

Usage

binomRegMethModelSim(
  n,
  posit,
  theta.0,
  beta,
  phi,
  random.eff = FALSE,
  mu.e = 0,
  sigma.ee = 1,
  p0 = 0.003,
  p1 = 0.9,
  X,
  Z,
  binom.link = "logit",
  verbose = TRUE
)
binomRegMethModelSim(
  n,
  posit,
  theta.0,
  beta,
  phi,
  random.eff = FALSE,
  mu.e = 0,
  sigma.ee = 1,
  p0 = 0.003,
  p1 = 0.9,
  X,
  Z,
  binom.link = "logit",
  verbose = TRUE
)

Arguments

`n`	sample size
`posit`	a numeric vector of size `p` (the number of CpG sites in the considered region) containing the genomic positions;
`theta.0`	numeric vector of size `p` which is a functional parameter for the intercept of the GAMM model.
`beta`	numeric vector of size `p` which is a functional parameter for the slope of cell type composition.
`phi`	a vector of length `p` determining the multiplicative dispersion parameter for each loci in a region. The dispersed-Binomial counts are simulated from beta-binomial distribution, so each element of phi has to be greater than 1.
`random.eff`	indicates whether adding the subject-specific random effect term `e`.
`mu.e`	number, the mean of the random effect.
`sigma.ee`	positive number, variance of the random effect
`p0`	the probability of observing a methylated read when the underlying true status is unmethylated. `p0` is the rate of false methylation calls, i.e. false positive rate.
`p1`	the probability of observing a methylated read when the underlying true status is methylated. `1-p1` is the rate of false non-methylation calls, i.e. false negative rate.
`X`	the matrix of the read coverage for each CpG in each sample; a matrix of n rows and `p` columns.
`Z`	numeric matrix with `p` columns and `n` rows storing the covariate information.
`binom.link`	the link function used for simulation
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

The function returns a list of following objects

S a numeric matrix of n rows and p columns containing the true methylation counts;
Y a numeric matrix of n rows and p columns containing the observed methylation counts;
theta a numeric matrix of n rows and p columns containing the methylation parameter (after the logit transformation);
pi a numeric matrix of n rows and p columns containing the true methylation proportions used to simulate the data.

Author(s)

Kaiqiong Zhao

Examples

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
out <- binomRegMethModel(
   data=RAdat.f, n.k=rep(5, 3), p0=0, p1=1,
   epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200, RanEff = FALSE
)
Z = as.matrix(RAdat.f[match(unique(RAdat.f$ID), RAdat.f$ID),
c('T_cell', 'RA')])
set.seed(123)
X = matrix(sample(80, nrow(Z)*length(out$uni.pos), replace = TRUE),
nrow = nrow(Z), ncol = length(out$uni.pos))+10
simdat = binomRegMethModelSim(n=nrow(Z), posit= out$uni.pos,
theta.0=out$Beta.out[,1], beta= out$Beta.out[,-1], random.eff=FALSE,
mu.e=0,sigma.ee=1, p0=0.003, p1=0.9,X=X , Z=Z, binom.link='logit',
phi = rep(1, length(out$uni.pos)))
#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
out <- binomRegMethModel(
   data=RAdat.f, n.k=rep(5, 3), p0=0, p1=1,
   epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200, RanEff = FALSE
)
Z = as.matrix(RAdat.f[match(unique(RAdat.f$ID), RAdat.f$ID),
c('T_cell', 'RA')])
set.seed(123)
X = matrix(sample(80, nrow(Z)*length(out$uni.pos), replace = TRUE),
nrow = nrow(Z), ncol = length(out$uni.pos))+10
simdat = binomRegMethModelSim(n=nrow(Z), posit= out$uni.pos,
theta.0=out$Beta.out[,1], beta= out$Beta.out[,-1], random.eff=FALSE,
mu.e=0,sigma.ee=1, p0=0.003, p1=0.9,X=X , Z=Z, binom.link='logit',
phi = rep(1, length(out$uni.pos)))

Plot the predicted methylation levels

Description

This function accepts the data.frame used as an input for the function binomRegMethModelPred with additional columns containing the predictions generated by the function binomRegMethModelPred and columns containing the name of each experimental group and returns a plot representing the predicted methylation levels according to each experimental group.

Usage

binomRegMethPredPlot(
  pred,
  pred.type = "proportion",
  pred.col = "pred",
  group.col = NULL,
  title = "Predicted methylation levels",
  style = NULL,
  save = NULL,
  verbose = TRUE
)
binomRegMethPredPlot(
  pred,
  pred.type = "proportion",
  pred.col = "pred",
  group.col = NULL,
  title = "Predicted methylation levels",
  style = NULL,
  save = NULL,
  verbose = TRUE
)

Arguments

`pred`	`data.frame` used as an input for the function `binomRegMethModelPred` (with columns 'Position', covariate names matching the original output from the function `binomRegMethModel`) with additional columns containing the predictions generated by the function `binomRegMethModelPred` and columns containing the name of each experimental group. Rows without a valid group name (empty character `""` or `NA`) are ignored
`pred.type`	type of prediction returned by the function `binomRegMethModelPred`: `proportion` or `link.scale`. The default value is "proportion".
`pred.col`	character defines the name of the column containing the prediction values. The default value is "pred".
`group.col`	character defines the name of the column containing the experimental groups. If the group.col is set to NULL, the resulting plot will be a simple scatter plot representing all predicted values disregarding any experimental design. The default value is NULL.
`title`	the text for the title
`style`	named list containing the wanted style (color and line type) for each experimental groups. The first level list is named according each experimental group and for each experimental group there is a list containing the `color` and the `type` of the line. The line types should be among the following types: `twodash`, `solid`, `longdash`, `dotted`, `dotdash`, `dashed`, `blank`. The function accepts color name and its hexadecimal code. The default value is NULL meaning that the colors will be chosen randomly and the line style will be set to `solid`.
`save`	file name to create on disk. When the value is set to NULL, the plot is not saved. The default value is NULL.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

This function prints out a plot of the predicted methylation levels according to preset experimental groups.

Author(s)

Audrey Lemaçon

Examples

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
BEM.obj <- binomRegMethModel(
  data=RAdat.f, n.k=rep(5, 3), p0=0.003307034, p1=0.9,
  epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200,
  Quasi = FALSE, RanEff = FALSE, verbose = FALSE
)
pos <- BEM.obj$uni.pos
newdata <- expand.grid(pos, c(0, 1), c(0, 1))
colnames(newdata) <- c("Position", "T_cell", "RA")
my.pred <- binomRegMethModelPred(BEM.obj, newdata, type = "link.scale", 
verbose = FALSE)
newdata$group <- ""
newdata[(newdata$RA == 0 & newdata$T_cell == 0),]$group <- "CTRL MONO"
newdata[(newdata$RA == 0 & newdata$T_cell == 1),]$group <- "CTRL TCELL"
newdata[(newdata$RA == 1 & newdata$T_cell == 0),]$group <- "RA MONO"
newdata[(newdata$RA == 1 & newdata$T_cell == 1),]$group <- "RA TCELL"
pred <- cbind(newdata, Pred = my.pred)
style <- list("CTRL MONO" = list(color = "blue", type = "dashed"),
"CTRL TCELL" = list(color = "green", type = "dashed"),
"RA MONO" = list(color = "blue", type = "solid"),
"RA TCELL" = list(color = "green", type = "solid"))
g <- binomRegMethPredPlot(pred, pred.col = "Pred", group.col = "group",
style = style, save = NULL, verbose = FALSE)

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
BEM.obj <- binomRegMethModel(
  data=RAdat.f, n.k=rep(5, 3), p0=0.003307034, p1=0.9,
  epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200,
  Quasi = FALSE, RanEff = FALSE, verbose = FALSE
)
pos <- BEM.obj$uni.pos
newdata <- expand.grid(pos, c(0, 1), c(0, 1))
colnames(newdata) <- c("Position", "T_cell", "RA")
my.pred <- binomRegMethModelPred(BEM.obj, newdata, type = "link.scale", 
verbose = FALSE)
newdata$group <- ""
newdata[(newdata$RA == 0 & newdata$T_cell == 0),]$group <- "CTRL MONO"
newdata[(newdata$RA == 0 & newdata$T_cell == 1),]$group <- "CTRL TCELL"
newdata[(newdata$RA == 1 & newdata$T_cell == 0),]$group <- "RA MONO"
newdata[(newdata$RA == 1 & newdata$T_cell == 1),]$group <- "RA TCELL"
pred <- cbind(newdata, Pred = my.pred)
style <- list("CTRL MONO" = list(color = "blue", type = "dashed"),
"CTRL TCELL" = list(color = "green", type = "dashed"),
"RA MONO" = list(color = "blue", type = "solid"),
"RA TCELL" = list(color = "green", type = "solid"))
g <- binomRegMethPredPlot(pred, pred.col = "Pred", group.col = "group",
style = style, save = NULL, verbose = FALSE)

Parsing output from the Bismark alignment suite

Description

This function reads and converts Bismark's 'genome wide cytosine report' and 'coverage' into a list of data.frames (one per chromosome) to a format compatible with SOMNiBUS' main functions runSOMNiBUS and binomRegMethModel.

Usage

formatFromBismark(..., verbose = TRUE)
formatFromBismark(..., verbose = TRUE)

Arguments

`...`	parameters from `bsseq::read.bismark()` function
`verbose`	logical indicates the level of information provided by the algorithm during the process. The default value is TRUE.

Value

This function returns a list of data.frames (one per chromosome). Each data.frame contains rows as individual CpGs appearing in all the samples. The first 4 columns contain the information of Meth_Counts (methylated counts), Total_Counts (read depths), Position (Genomic position for the CpG site) and ID (sample ID). The additional information (such as disease status, sex, age) extracted from the BSseq object are listed in column 5 and onwards and will be considered as covariate information by SOMNiBUS algorithms.

Author(s)

Audrey Lemaçon

Examples

infile <- system.file("extdata/test_data.fastq_bismark.bismark.cov.gz",
package = "bsseq")
dat <- formatFromBismark(infile, verbose = FALSE)
infile <- system.file("extdata/test_data.fastq_bismark.bismark.cov.gz",
package = "bsseq")
dat <- formatFromBismark(infile, verbose = FALSE)

Parsing output from the BSseq package

Description

This function reads and converts a BSseq object into a list of data.frames (one per chromosome) to a format compatible with SOMNiBUS' main functions runSOMNiBUS and binomRegMethModel.

Usage

formatFromBSseq(bsseq_dat, verbose = TRUE)
formatFromBSseq(bsseq_dat, verbose = TRUE)

Arguments

`bsseq_dat`	an object of class BSseq.
`verbose`	logical indicates the level of information provided by the algorithm during the process. The default value is TRUE.

Value

Author(s)

Audrey Lemaçon

Examples

M <- matrix(1:9, 3,3)
colnames(M) <- c("A1", "A2", "A3")
BStest <- bsseq::BSseq(pos = 1:3, chr = c("chr1", "chr2", "chr1"), 
M = M, Cov = M + 2)
dat <- formatFromBSseq(BStest, verbose = FALSE)
M <- matrix(1:9, 3,3)
colnames(M) <- c("A1", "A2", "A3")
BStest <- bsseq::BSseq(pos = 1:3, chr = c("chr1", "chr2", "chr1"), 
M = M, Cov = M + 2)
dat <- formatFromBSseq(BStest, verbose = FALSE)

A simulated methylation dataset based on a real data.

Description

A dataset containing methylation levels on one targeted region on chromosome 4 near gene BANK1 from cases with rheumatoid arthritis (RA) and controls.

Usage

RAdat
RAdat

Format

A data frame of 5289 rows and 6 columns. Each row represents a CpG site for a sample. Columns include in order:

Meth_Counts: Number of methylated reads
Total_Counts: Total number of reads; read-depth
Position: Genomic position (in bp) for the CpG site
ID: indicates which sample the CpG site belongs to
T_cell: whether a sample is from T cell or monocyte
RA: whether a sample is an RA patient or control

Details

This example data include methylation levels of cell type separated blood samples of 22 rheumatoid arthritis (RA) patients and 21 healthy individuals. In the data set, 123 CpG sites are measured and there are 25 samples from circulating T cells and 18 samples from monocytes.

It should be noted that this data example is only for illustration purposes. The forward and reverse reads were not merged in this dataset. Try not to overinterpret the results of this simple example.

Source

simulation is based a real data set provided by Dr. Marie Hudson (McGill University).

A simulated methylation dataset based on a real data.

Description

This example data include methylation levels on a region with 208 CpGs for 116 blood samples.

Usage

RAdat2
RAdat2

Format

A data frame of 6064 rows and 13 columns. Each row represents a CpG site for a sample. Columns include in order:

Meth_Counts: Number of methylated reads
Total_Counts: Total number of reads; read-depth
Position: Genomic position (in bp) for the CpG site
ID: indicates which sample the CpG site belongs to
ACPA4: binary indicator for a biomarker anti-citrullinated protein antibody
Age: Age
Sex: 2-female; 1-male
Smoking: 1-current or ex-smoker; 0-non-smoker
Smoking_NA: 1-Smoking info is NA; 0-Smoking info is available
PC1: PC1 for the cell type proportions
PC2: PC2 for the cell type proportions
PC3: PC3 for the cell type proportions
PC4: PC4 for the cell type proportions

Source

simulation is based a real data set provided by PI Dr. Sasha Bernatsky (McGill University)

Wrapper function running the smoothed-EM algorithm to estimate covariate effects and test regional association in Bisulfite Sequencing-derived methylation data

Description

This function splits the methylation data into regions (according to different approaches) and, for each region, fits a (dispersion-adjusted) binomial regression model to regional methylation data, and reports the estimated smooth covariate effects and regional p-values for the test of DMRs (differentially methylation regions). Over or under dispersion across loci is accounted for in the model by the combination of a multiplicative dispersion parameter (or scale parameter) and a sample-specific random effect.

Usage

runSOMNiBUS(
  dat,
  split = list(approach = "region"),
  min.cpgs = 50,
  max.cpgs = 2000,
  n.k,
  p0 = 0.003,
  p1 = 0.9,
  Quasi = TRUE,
  epsilon = 10^(-6),
  epsilon.lambda = 10^(-3),
  maxStep = 200,
  binom.link = "logit",
  method = "REML",
  covs = NULL,
  RanEff = TRUE,
  reml.scale = FALSE,
  scale = -2,
  verbose = TRUE
)
runSOMNiBUS(
  dat,
  split = list(approach = "region"),
  min.cpgs = 50,
  max.cpgs = 2000,
  n.k,
  p0 = 0.003,
  p1 = 0.9,
  Quasi = TRUE,
  epsilon = 10^(-6),
  epsilon.lambda = 10^(-3),
  maxStep = 200,
  binom.link = "logit",
  method = "REML",
  covs = NULL,
  RanEff = TRUE,
  reml.scale = FALSE,
  scale = -2,
  verbose = TRUE
)

Arguments

`dat`	a data frame with rows as individual CpGs appearing in all the samples. The first 4 columns should contain the information of `Meth_Counts` (methylated counts), `Total_Counts` (read depths), `Position` (Genomic position for the CpG site) and `ID` (sample ID). The covariate information, such as disease status or cell type composition, are listed in column 5 and onwards.
`split`	this `list` must contain at least the element `approach` which corresponds to the partitioning approach used to split the data into independent regions. The partitioning methods available are: `"region"` (partitioning based on the spacing of CpGs), `"density"` (partitioning based on CpG density), `"chromatin"` (partitioning based on chromatin states), `"gene"` (partitioning based on gene regions), `"granges"` (partitioning based on user-specific annotations provided as a `GenomicRanges` object), `"bed"` (partitioning based on user-specific annotations provided in a BED file). This list should also contain additional parameters specific to each partitioning approach (see the documentation of each approach for details).
`min.cpgs`	positive integer defining the minimum number of CpGs within a region for the algorithm to perform optimally. The default value is 50.
`max.cpgs`	positive integer defining the maximum number of CpGs within a region for the algorithm to perform optimally. The default value is 2000.
`n.k`	a vector of basis dimensions for the intercept and individual covariates. `n.k` specifies an upper limit of the degrees of each functional parameters. The length of n.k should equal to the number of covariates plus 1 (for the intercept)). We recommend basis dimensions n.k, approximately equal to the number of unique CpGs in the region divided by 20. This parameter will be computed automatically, when several regions are generated by the partitioning function.
`p0`	the probability of observing a methylated read when the underlying true status is unmethylated. `p0` is the rate of false methylation calls, i.e. false positive rate.
`p1`	the probability of observing a methylated read when the underlying true status is methylated. `1-p1` is the rate of false non-methylation calls, i.e. false negative rate.
`Quasi`	whether a Quasi-likelihood estimation approach will be used; in other words, whether a multiplicative dispersion is added in the model or not.
`epsilon`	numeric; stopping criterion for the closeness of estimates of spline coefficients from two consecutive iterations.
`epsilon.lambda`	numeric; stopping criterion for the closeness of estimates of smoothing parameter `lambda` from two consecutive iterations.
`maxStep`	the algorithm will step if the iteration steps exceed `maxStep`.
`binom.link`	the link function used in the binomial regression model; the default is the logit link.
`method`	the method used to estimate the smoothing parameters. The default is the 'REML' method which is generally better than prediction based criterion `GCV.cp`.
`covs`	a vector of covariate names. The covariates with names in `covs` will be included in the model and their covariate effects will be estimated. The default is to fit all covariates in `dat`
`RanEff`	whether sample-level random effects are added or not
`reml.scale`	whether a REML-based scale (dispersion) estimator is used. The default is Fletcher-based estimator.
`scale`	negative values mean scale parameter should be estimated; if a positive value is provided, a fixed scale will be used.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

This function returns a list of models (one by independent region) including objects:

est: estimates of the spline basis coefficients alpha
lambda: estimates of the smoothing parameters for each functional parameters
est.pi: predicted methylation levels for each row in the input data
ite.points: estimates of est, lambda at each EM iteration
cov1: estimated variance-covariance matrix of the basis coefficients alphas
reg.out: regional testing output obtained using Fletcher-based dispersion estimate; an additional 'ID' row would appear if RanEff is TRUE
reg.out.reml.scale: regional testing output obtained using REML-based dispersion estimate;
reg.out.gam: regional testing output obtained using (Fletcher-based) dispersion estimate from mgcv package;
phi_fletcher: Fletcher-based estimate of the (multiplicative) dispersion parameter;
phi_reml: REML-based estimate of the (multiplicative) dispersion parameter;
phi_gam: Estimated dispersion parameter reported by mgcv;
SE.out: a matrix of the estimated pointwise Standard Errors (SE); number of rows are the number of unique CpG sites in the input data and the number of columns equal to the total number of covariates fitted in the model (the first one is the intercept);
SE.out.REML.scale: a matrix of the estimated pointwise Standard Errors (SE); the SE calculated from the REML-based dispersion estimates
uni.pos: the genomic postions for each row of CpG sites in the matrix SE.out;
Beta.out: a matrix of the estimated covariate effects beta(t), where t denotes the genomic positions;
ncovs: number of functional paramters in the model (including the intercept);
sigma00: estimated variance for the random effect if RanEff is TRUE; NA if RanEff is FALSE.

Author(s)

Audrey Lemaçon

Examples

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
outs <- runSOMNiBUS(
  dat=RAdat.f, split = list(approach = "region", gap = 1e6), min.cpgs = 5,
  n.k = rep(5,3), p0 = 0.003, p1 = 0.9
)

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
outs <- runSOMNiBUS(
  dat=RAdat.f, split = list(approach = "region", gap = 1e6), min.cpgs = 5,
  n.k = rep(5,3), p0 = 0.003, p1 = 0.9
)

Split methylation data into regions based on the genomic annotations

Description

This function splits the methylation data into regions based on the genomic annotation provided under the form of a 1-based BED file

Usage

splitDataByBed(
  dat,
  chr,
  bed,
  gap = -1,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)
splitDataByBed(
  dat,
  chr,
  bed,
  gap = -1,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)

Arguments

`dat`	a data frame with rows as individual CpGs appearing in all the samples. The first 4 columns should contain the information of `Meth_Counts` (methylated counts), `Total_Counts` (read depths), `Position` (Genomic position for the CpG site) and `ID` (sample ID). The covariate information, such as disease status or cell type composition, are listed in column 5 and onwards.
`chr`	character vector containing the chromosome information. Its length should be equal to the number of rows in `dat`.
`bed`	character, path to the 1-based BED file containing the annotations
`gap`	integer defining the maximum gap that is allowed between two regions to be considered as overlapping. According to the `GenomicRanges::findOverlaps` function, the gap between 2 ranges is the number of positions that separate them. The gap between 2 adjacent ranges is 0. By convention when one range has its start or end strictly inside the other (i.e. non-disjoint ranges), the gap is considered to be -1. Decimal values will be rounded to the nearest integer. The default value is `-1` (meaning strict overlapping).
`min.cpgs`	positive integer defining the minimum number of CpGs within a region for the algorithm to perform optimally. The default value is 50.
`max.cpgs`	positive integer defining the maximum number of CpGs within a region for the algorithm to perform optimally. The default value is 2000.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

A named list of data.frame containing the data of each independent region.

Author(s)

Audrey Lemaçon

Split methylation data into regions based on the chromatin states

Description

This function splits the methylation data into regions based on the chromatin states predicted by ChromHMM software (Ernst and Kellis (2012)). The annotations come from the Bioconductor package annnotatr. Chromatin states determined by chromHMM are available in hg19 for nine cell lines (Gm12878, H1hesc, Hepg2, Hmec, Hsmm, Huvec, K562, Nhek, and Nhlf).

Usage

splitDataByChromatin(
  dat,
  chr,
  cell.line,
  states,
  gap = -1,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)
splitDataByChromatin(
  dat,
  chr,
  cell.line,
  states,
  gap = -1,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)

Arguments

`dat`	a data frame with rows as individual CpGs appearing in all the samples. The first 4 columns should contain the information of `Meth_Counts` (methylated counts), `Total_Counts` (read depths), `Position` (Genomic position for the CpG site) and `ID`(sample ID). The covariate information, such as disease status or cell type composition, are listed in column 5 and onwards.
`chr`	character vector containing the chromosome information. Its length should be equal to the number of rows in `dat`.
`cell.line`	character defining the cell line of interest. Nine cell lines are available: `"gm12878"`: Lymphoblastoid cells GM12878, `"h1hesc"`: Embryonic cells H1 hESC, `"hepg2"`: Liver carcinoma HepG2, `"hmec"`, Mammary epithelial cells HMEC, `"hsmm"`, Skeletal muscle myoblasts HSMM, `"huvec"`: Umbilical vein endothelial HUVEC, `"k562"`: Myelogenous leukemia K562, `"nhek"`: Keratinocytes NHEK, `"nhlf"`: Normal human lung fibroblasts NHLF.
`states`	character vector defining the chromatin states of interest among the following available options: `"ActivePromoter"`: Active Promoter `"WeakPromoter"`: Weak Promoter `"PoisedPromoter"`: Poised Promoter `"StrongEnhancer"`: Strong Enhancer `"WeakEnhancer"`: Weak/poised Enhancer `"Insulator"`: Insulator `"TxnTransition"`: Transcriptional Transition `"TxnElongation"`: Transcriptional Elongation `"WeakTxn"`: Weak Transcribed `"Repressed"`: Polycomb-Repressed `"Heterochrom"`: Heterochromatin; low signal `"RepetitiveCNV"`: Repetitive/Copy Number Variation Use `state="all"` to select all the states simultaneously.
`gap`	this integer defines the maximum gap that is allowed between two regions to be considered as overlapping. According to the `GenomicRanges::findOverlaps` function, the gap between 2 ranges is the number of positions that separate them. The gap between 2 adjacent ranges is 0. By convention when one range has its start or end strictly inside the other (i.e. non-disjoint ranges), the gap is considered to be -1. Decimal values will be rounded to the nearest integer. The default value is `-1`.
`min.cpgs`	positive integer defining the minimum number of CpGs within a region for the algorithm to perform optimally. The default value is 50.
`max.cpgs`	positive integer defining the maximum number of CpGs within a region for the algorithm to perform optimally. The default value is 2000.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

A list of data.frame containing the data of each independent region.

Author(s)

Audrey Lemaçon

Examples

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
results <- splitDataByChromatin(dat = RAdat.f, 
cell.line = "huvec", chr = rep(x = "chr4", times = nrow(RAdat.f)),
states = "Insulator", verbose = FALSE)

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
results <- splitDataByChromatin(dat = RAdat.f, 
cell.line = "huvec", chr = rep(x = "chr4", times = nrow(RAdat.f)),
states = "Insulator", verbose = FALSE)

Split methylation data into regions based on the density of CpGs

Description

This function splits the methylation data into regions based on the density of CpGs.

Usage

splitDataByDensity(
  dat,
  window.size = 100,
  by = 1,
  min.density = 5,
  gap = 10,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)
splitDataByDensity(
  dat,
  window.size = 100,
  by = 1,
  min.density = 5,
  gap = 10,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)

Arguments

`dat`	a data frame with rows as individual CpGs appearing in all the samples. The first 4 columns should contain the information of `Meth_Counts` (methylated counts), `Total_Counts` (read depths), `Position` (Genomic position for the CpG site) and `ID` (sample ID). The covariate information, such as disease status or cell type composition, are listed in column 5 and onwards.
`window.size`	this positive integer defines the size of the sliding window in bp. Decimal values will be rounded to the nearest integer. The value should be greater than 10. The default value is `100` (100 bp)
`by`	positive integer defines by how many base pairs the window moves at each increment. Decimal values will be rounded to the nearest integer. The default value is `1` (1 bp).
`min.density`	positive integer defines the minimum density threshold for each window. Decimal values will be rounded to the nearest integer. The default value is `5` (5 CpGs/`window.size`).
`gap`	positive integer defining the gap width beyond which we consider that two regions are independent. Decimal values will be rounded to the nearest integer. The default value is `10` (10bp).
`min.cpgs`	positive integer defining the minimum number of CpGs within a region for the algorithm to perform optimally. The default value is 50.
`max.cpgs`	positive integer defining the maximum number of CpGs within a region for the algorithm to perform optimally. The default value is 2000.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

A named list of data.frame containing the data of each independent region.

Author(s)

Audrey Lemaçon

Examples

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
results <- splitDataByDensity(dat = RAdat.f, window.size = 100, by = 1, 
min.density = 5, gap = 10, min.cpgs = 50, verbose = FALSE)
#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
results <- splitDataByDensity(dat = RAdat.f, window.size = 100, by = 1, 
min.density = 5, gap = 10, min.cpgs = 50, verbose = FALSE)

Split methylation data into regions based on the genes annotations

Description

This function splits the methylation data into regions based on the genes. The annotations are coming from the Bioconductor package annnotatr.

Usage

splitDataByGene(
  dat,
  chr,
  organism = "human",
  build = "hg38",
  types = "promoter",
  gap = -1,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)
splitDataByGene(
  dat,
  chr,
  organism = "human",
  build = "hg38",
  types = "promoter",
  gap = -1,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)

Arguments

`dat`	a data frame with rows as individual CpGs appearing in all the samples. The first 4 columns should contain the information of `Meth_Counts` (methylated counts), `Total_Counts` (read depths), `Position` (Genomic position for the CpG site) and `ID`(sample ID). The covariate information, such as disease status or cell type composition, are listed in column 5 and onwards.
`chr`	character vector containing the chromosome information. Its length should be equal to the number of rows in `dat`.
`organism`	character defining the organism of interest Only Homo sapiens (`"human"`) is available. Additional packages are required for Mus musculus (`"mouse"`), Rattus norvegicus (`"rat"`) and Drosophila melanogaster (`"fly"`). The matching is case-insensitive. The default value is `"human"`.
`build`	character defining the version of the genome build on which the methylation data have been mapped. By default, the build is set to `"hg38"`, however the build `"hg19"` is also available for Homo sapiens: Once the additional packages are installed, the following organisms and builds are available: `"mm9"` and `"mm10"` for Mus musculus; `"rn4"`, `"rn5"` and `"rn6"` for Rattus norvegicus; `"dm3"` and `"dm6"` for Drosophila melanogaster;
`types`	character vector defining the type of genic annotations to use among the following options: `"upstream"` for the annotations included 1-5Kb upstream of the TSS; `"promoter"` for the annotations included < 1Kb upstream of the TSS; `"threeprime"` for the annotations included in 3' UTR; `"fiveprime"` for the annotations included in the 5' UTR; `"exon"` for the annotations included in the exons; `"intron"` for the annotations included in the introns; `"all"` for all the annotations aforementioned. The default value is `"promoter"`.
`gap`	this integer defines the maximum gap allowed between two regions to be considered as overlapping. According to the `GenomicRanges::findOverlaps` function, the gap between 2 ranges is the number of positions that separate them. The gap between 2 adjacent ranges is 0. By convention when one range has its start or end strictly inside the other (i.e. non-disjoint ranges), the gap is considered to be -1. Decimal values will be rounded to the nearest integer. The default value is `-1`.
`min.cpgs`	positive integer defining the minimum number of CpGs within a region for the algorithm to perform optimally. The default value is 50.
`max.cpgs`	positive integer defining the maximum number of CpGs within a region for the algorithm to perform optimally. The default value is 2000.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

A named list of data.frame containing the data of each independent region.

Author(s)

Audrey Lemaçon

Examples

#------------------------------------------------------------#
data(RAdat)
# Add a column containing the chromosome information
RAdat$Chr <- "chr4"
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
results <- splitDataByGene(dat = RAdat.f, 
chr = rep(x = "chr1", times = nrow(RAdat.f)), verbose = FALSE)

#------------------------------------------------------------#
data(RAdat)
# Add a column containing the chromosome information
RAdat$Chr <- "chr4"
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
results <- splitDataByGene(dat = RAdat.f, 
chr = rep(x = "chr1", times = nrow(RAdat.f)), verbose = FALSE)

Split methylation data into regions based on the genomic annotations

Description

This function splits the methylation data into regions based on the genomic annotations provided under the form of a GenomicRanges object.

Usage

splitDataByGRanges(
  dat,
  chr,
  annots,
  gap = -1,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)
splitDataByGRanges(
  dat,
  chr,
  annots,
  gap = -1,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)

Arguments

`dat`	a data frame with rows as individual CpGs appearing in all the samples. The first 4 columns should contain the information of `Meth_Counts` (methylated counts), `Total_Counts` (read depths), `Position` (Genomic position for the CpG site) and `ID` (sample ID). The covariate information, such as disease status or cell type composition, are listed in column 5 and onwards.
`chr`	character vector containing the chromosome information. Its length should be equal to the number of rows in `dat`.
`annots`	GenomicRanges object containing the annotations
`gap`	integer defining the maximum gap that is allowed between two regions to be considered as overlapping. According to the `GenomicRanges::findOverlaps` function, the gap between 2 ranges is the number of positions that separate them. The gap between 2 adjacent ranges is 0. By convention when one range has its start or end strictly inside the other (i.e. non-disjoint ranges), the gap is considered to be -1. Decimal values will be rounded to the nearest integer. The default value is `-1` (meaning strict overlaping).
`min.cpgs`	positive integer defining the minimum number of CpGs within a region for the algorithm to perform optimally. The default value is 50.
`max.cpgs`	positive integer defining the maximum number of CpGs within a region for the algorithm to perform optimally. The default value is 2000.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

A named list of data.frame containing the data of each independent region.

Author(s)

Audrey Lemaçon

Examples

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
annot <- GenomicRanges::GRanges(seqnames = "chr1", IRanges::IRanges(
start = c(102711720,102711844,102712006,102712503,102712702), 
end = c(102711757,102711909,102712195,102712637,102712712)
))
results <- splitDataByGRanges(dat = RAdat.f, 
chr = rep(x = "chr1", times = nrow(RAdat.f)), 
annots = annot, gap = -1, min.cpgs = 5)
#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
annot <- GenomicRanges::GRanges(seqnames = "chr1", IRanges::IRanges(
start = c(102711720,102711844,102712006,102712503,102712702), 
end = c(102711757,102711909,102712195,102712637,102712712)
))
results <- splitDataByGRanges(dat = RAdat.f, 
chr = rep(x = "chr1", times = nrow(RAdat.f)), 
annots = annot, gap = -1, min.cpgs = 5)

Split methylation data into regions based on the spacing of CpGs

Description

This function splits the methylation data into regions based on the spacing of CpGs.

Usage

splitDataByRegion(
  dat,
  gap = 1e+06,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)
splitDataByRegion(
  dat,
  gap = 1e+06,
  min.cpgs = 50,
  max.cpgs = 2000,
  verbose = TRUE
)

Arguments

`dat`	a data frame with rows as individual CpGs appearing in all the samples. The first 4 columns should contain the information of `Meth_Counts` (methylated counts), `Total_Counts` (read depths), `Position` (Genomic position for the CpG site) and `ID` (sample ID). The covariate information, such as disease status or cell type composition, are listed in column 5 and onwards.
`gap`	positive integer defining the gap width beyond which we consider that two regions are independent. Odd and decimal values will be rounded to the next even numbers (e.g. 8.2 and 8.7 become gaps of 8 and 10 respectively). The default value is `1e+6` (1Mb).
`min.cpgs`	positive integer defining the minimum number of CpGs within a region for the algorithm to perform optimally. The default value is 50.
`max.cpgs`	positive integer defining the maximum number of CpGs within a region for the algorithm to perform optimally. The default value is 2000.
`verbose`	logical indicates if the algorithm should provide progress report information. The default value is TRUE.

Value

A named list of data.frame containing the data of each independent region.

Author(s)

Audrey Lemaçon

Examples

#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
results <- splitDataByRegion( dat=RAdat.f, gap = 1e6, min.cpgs = 5, 
verbose = FALSE)
#------------------------------------------------------------#
data(RAdat)
RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ])
results <- splitDataByRegion( dat=RAdat.f, gap = 1e6, min.cpgs = 5, 
verbose = FALSE)

Package 'SOMNiBUS'

Help Index

A smoothed-EM algorithm to estimate covariate effects and test regional association in Bisulfite Sequencing-derived methylation data

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Plot the smooth covariate effect

Description

Usage

Arguments

Value

Author(s)

Examples

A smoothed-EM algorithm to estimate covariate effects and test regional association in Bisulfite Sequencing-derived methylation data

Description

Usage

Arguments

Value

Author(s)

Examples

Simulate Bisulfite sequencing data from specified smooth covariate effects

Description

Usage

Arguments

Value

Author(s)

Examples

Plot the predicted methylation levels

Description

Usage

Arguments

Value

Author(s)

Examples

Parsing output from the Bismark alignment suite

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Parsing output from the BSseq package

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

A simulated methylation dataset based on a real data.

Description

Usage

Format

Details

Source

A simulated methylation dataset based on a real data.

Description

Usage

Format

Source

Wrapper function running the smoothed-EM algorithm to estimate covariate effects and test regional association in Bisulfite Sequencing-derived methylation data

Description

Usage

Arguments

Value

Author(s)

Examples

Split methylation data into regions based on the genomic annotations

Description

Usage

Arguments

Value

Author(s)

Split methylation data into regions based on the chromatin states

Description