| Title: | MethylSig: Differential Methylation Testing for WGBS and RRBS Data |
|---|---|
| Description: | MethylSig is a package for testing for differentially methylated cytosines (DMCs) or regions (DMRs) in whole-genome bisulfite sequencing (WGBS) or reduced representation bisulfite sequencing (RRBS) experiments. MethylSig uses a beta binomial model to test for significant differences between groups of samples. Several options exist for either site-specific or sliding window tests, and variance estimation. |
| Authors: | Yongseok Park [aut], Raymond G. Cavalcante [aut, cre] |
| Maintainer: | Raymond G. Cavalcante <[email protected]> |
| License: | GPL-3 |
| Version: | 1.25.0 |
| Built: | 2026-05-30 09:40:48 UTC |
| Source: | https://github.com/bioc/methylSig |
Data contains 6 methylation loci and 2 samples
bsseq_destrandedbsseq_destranded
A BSseq object
data-raw/02-create_bsseq_rda.R
data(bsseq_destranded, package = 'methylSig')data(bsseq_destranded, package = 'methylSig')
Data contains 4 methylation loci for 2 samples on 2 chromosomes
bsseq_multichrombsseq_multichrom
A BSseq object
data-raw/02-create_bsseq_rda.R
data(bsseq_multichrom, package = 'methylSig')data(bsseq_multichrom, package = 'methylSig')
Data contains 11 methylation loci and 2 samples
bsseq_strandedbsseq_stranded
A BSseq object
data-raw/02-create_bsseq_rda.R
data(bsseq_stranded, package = 'methylSig')data(bsseq_stranded, package = 'methylSig')
This function calculates differential methylation statistics using a binomial-based approach. See ‘Warning’ message below.
diff_binomial(bs, group_column, comparison_groups)diff_binomial(bs, group_column, comparison_groups)
bs |
A |
group_column |
a |
comparison_groups |
a named |
This function uses a binomial-based model to calculate differential methylation statistics. It is nearly identical to the methylKit::calculateDiffMeth function in the methylKit R package except that only the likelihood ratio test and p.adjust(..., method='BH') are used to calculate significance levels. It is significantly faster than methylKit::calculateDiffMeth function.
A GRanges object containing the following mcols:
Methylation estimate for case.
Methylation estimate for control.
The difference meth_case - meth_control.
The group for which the lcous is hyper-methylated. Note, this is not subject to significance thresholds.
The p-value from the t-test (t_approx = TRUE) or the Chi-Square test (t_approx = FALSE).
The Benjamini-Hochberg adjusted p-values using p.adjust(method = 'BH').
The log likelihood ratio.
This function does not take into account the variability among samples in each group being compared.
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_gr = diff_binomial( bs = small_test, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'))data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_gr = diff_binomial( bs = small_test, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'))
This function is a wrapper for DSS::DMLfit.multiFactor.
diff_dss_fit(bs, design, formula)diff_dss_fit(bs, design, formula)
bs |
a |
design |
a |
formula |
a formula for the linear model. It should refer to column names from |
A list object with:
a GRanges object with loci fit.
the data.frame input as the experimental design.
the formula representing the model. Can be character or formula.
the design matrix used in regression based on the design and formula. This should be consulted to determine the appropriate contrast to use in dss_fit_test().
a list with model fitting results. It has components beta, the estimated coefficients, and var.beta the estimated variance/covariance matrix for beta.
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_fit = diff_dss_fit( bs = small_test, design = bsseq::pData(bs), formula = '~ Type')data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_fit = diff_dss_fit( bs = small_test, design = bsseq::pData(bs), formula = '~ Type')
This function is a wrapper for DSS::DMLtest.multiFactor with the added feature of reporting methylation rates alongside the test results via the methylation_group_column and methylation_groups parameters. See documentation below.
diff_dss_test( bs, diff_fit, contrast, methylation_group_column = NA, methylation_groups = NA )diff_dss_test( bs, diff_fit, contrast, methylation_group_column = NA, methylation_groups = NA )
bs |
a |
diff_fit |
a |
contrast |
a contrast matrix for hypothesis testing. The number of rows should match the number of columns |
methylation_group_column |
Optionally, a column from |
methylation_groups |
Optionally, a named |
A GRanges object containing the following mcols:
The test statistic.
The p-value.
The Benjamini-Hochberg adjusted p-values using p.adjust(method = 'BH').
If methylation_group_column is specified, also the following mcols:
Methylation estimate for case.
Methylation estimate for control.
The difference meth_case - meth_control.
The group for which the locus is hyper-methylated. Note, this is not subject to significance thresholds.
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_fit = diff_dss_fit( bs = small_test, design = bsseq::pData(bs), formula = '~ Type') result = diff_dss_test( bs = small_test, diff_fit = diff_fit, contrast = matrix(c(0,1), ncol = 1) ) result_with_meth = diff_dss_test( bs = small_test, diff_fit = diff_fit, contrast = matrix(c(0,1), ncol = 1), methylation_group_column = 'Type', methylation_groups = c('case' = 'cancer', 'control' = 'normal') )data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_fit = diff_dss_fit( bs = small_test, design = bsseq::pData(bs), formula = '~ Type') result = diff_dss_test( bs = small_test, diff_fit = diff_fit, contrast = matrix(c(0,1), ncol = 1) ) result_with_meth = diff_dss_test( bs = small_test, diff_fit = diff_fit, contrast = matrix(c(0,1), ncol = 1), methylation_group_column = 'Type', methylation_groups = c('case' = 'cancer', 'control' = 'normal') )
The function calculates differential methylation statistics between two groups of samples using a beta-binomial approach to calculate differential methylation statistics, accounting for variation among samples within each group. The function can be applied to a BSseq object subjected to filter_loci_by_coverage(), filter_loci_by_snps(), filter_loci_by_group_coverage() or any combination thereof. Moreover, the function can be applied to a BSseq object which has been tiled with tile_by_regions() or tile_by_windows().
diff_methylsig( bs, group_column, comparison_groups, disp_groups, local_window_size = 0, local_weight_function, t_approx = TRUE, n_cores = 1 )diff_methylsig( bs, group_column, comparison_groups, disp_groups, local_window_size = 0, local_weight_function, t_approx = TRUE, n_cores = 1 )
bs |
a |
group_column |
a |
comparison_groups |
a named |
disp_groups |
a named |
local_window_size |
an |
local_weight_function |
a weight kernel function. The default is the tri-weight kernel function defined as |
t_approx |
a |
n_cores |
an |
A GRanges object containing the following mcols:
Methylation estimate for case.
Methylation estimate for control.
The difference meth_case - meth_control.
The group for which the locus is hyper-methylated. Note, this is not subject to significance thresholds.
The p-value from the t-test (t_approx = TRUE) or the Chi-Square test (t_approx = FALSE).
The Benjamini-Hochberg adjusted p-values using p.adjust(method = 'BH').
The dispersion estimate.
The log likelihood ratio.
Degrees of freedom used when t_approx = TRUE.
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[seq(50)] diff_gr = diff_methylsig( bs = small_test, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'), disp_groups = c('case' = TRUE, 'control' = TRUE), local_window_size = 0, t_approx = TRUE, n_cores = 1)data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[seq(50)] diff_gr = diff_methylsig( bs = small_test, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'), disp_groups = c('case' = TRUE, 'control' = TRUE), local_window_size = 0, t_approx = TRUE, n_cores = 1)
Used after bsseq::read.bismark to mark loci in samples below min_count or above max_count to 0. These loci will then be removed prior to differential analysis by filter_loci_by_group_coverage() if there are not a sufficient number of samples with appropriate coverage.
filter_loci_by_coverage(bs, min_count = 5, max_count = 500)filter_loci_by_coverage(bs, min_count = 5, max_count = 500)
bs |
a |
min_count |
an |
max_count |
an |
A BSseq object with samples/loci in the coverage and methylation matrix set to 0 where the coverage was less than min_count or greater than max_count. The number of samples and loci are conserved.
bis_cov_file1 = system.file('extdata', 'bis_cov1.cov', package = 'methylSig') bis_cov_file2 = system.file('extdata', 'bis_cov2.cov', package = 'methylSig') test = bsseq::read.bismark( files = c(bis_cov_file1, bis_cov_file2), colData = data.frame(row.names = c('test1','test2')), rmZeroCov = FALSE, strandCollapse = FALSE ) test = filter_loci_by_coverage(bs = test, min_count = 10, max_count = 500)bis_cov_file1 = system.file('extdata', 'bis_cov1.cov', package = 'methylSig') bis_cov_file2 = system.file('extdata', 'bis_cov2.cov', package = 'methylSig') test = bsseq::read.bismark( files = c(bis_cov_file1, bis_cov_file2), colData = data.frame(row.names = c('test1','test2')), rmZeroCov = FALSE, strandCollapse = FALSE ) test = filter_loci_by_coverage(bs = test, min_count = 10, max_count = 500)
An optional function to remove loci not satisfying coverage thresholds from filter_loci_by_coverage in a minimum number of samples per group.
filter_loci_by_group_coverage(bs, group_column, min_samples_per_group)filter_loci_by_group_coverage(bs, group_column, min_samples_per_group)
bs |
a |
group_column |
a |
min_samples_per_group |
a named |
The filter_loci_by_coverage function marked locus/sample pairs in the coverage matrix as 0 if said pair had coverage less than minCount or more than maxCount. This function enforces a threshold on the minimum number of samples per group required for a locus to be tested in downstream testing functions.
A BSseq object with only those loci having min_samples_per_group.
data(BS.cancer.ex, package = 'bsseqData') filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', min_samples_per_group = c('cancer' = 3, 'normal' = 3) )data(BS.cancer.ex, package = 'bsseqData') filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', min_samples_per_group = c('cancer' = 3, 'normal' = 3) )
GRanges objectA function to remove loci from a BSseq object based on intersection with loci in a GRanges object.
filter_loci_by_location(bs, gr)filter_loci_by_location(bs, gr)
bs |
a |
gr |
a |
A BSseq object with loci intersecting gr removed.
data(bsseq_stranded, package = 'methylSig') regions = GenomicRanges::GRanges( seqnames = c('chr1','chr1','chr1','chr1'), ranges = IRanges::IRanges( start = c(5,25,45,70), end = c(15,40,55,80) ) ) filtered = filter_loci_by_location(bs = bsseq_stranded, gr = regions)data(bsseq_stranded, package = 'methylSig') regions = GenomicRanges::GRanges( seqnames = c('chr1','chr1','chr1','chr1'), ranges = IRanges::IRanges( start = c(5,25,45,70), end = c(15,40,55,80) ) ) filtered = filter_loci_by_location(bs = bsseq_stranded, gr = regions)
Data contains 1466 promoters for use in the vignette
promoters_grpromoters_gr
A GRanges object
data-raw/02-create_bsseq_rda.R
data(promoters_gr, package = 'methylSig')data(promoters_gr, package = 'methylSig')
An optional function to aggregate cytosine / CpG level data into regions based on a GRanges set of genomic regions.
tile_by_regions(bs, gr)tile_by_regions(bs, gr)
bs |
a |
gr |
a |
A BSseq object with loci of regions matching gr. Coverage and methylation read count matrices are aggregated by the sums of the cytosines / CpGs in the regions per sample.
data(bsseq_stranded, package = 'methylSig') regions = GenomicRanges::GRanges( seqnames = c('chr1','chr1','chr1'), ranges = IRanges::IRanges( start = c(5,35,75), end = c(30,70,80) ) ) tiled = tile_by_regions(bs = bsseq_stranded, gr = regions)data(bsseq_stranded, package = 'methylSig') regions = GenomicRanges::GRanges( seqnames = c('chr1','chr1','chr1'), ranges = IRanges::IRanges( start = c(5,35,75), end = c(30,70,80) ) ) tiled = tile_by_regions(bs = bsseq_stranded, gr = regions)
An optional function to aggregate cytosine / CpG level data into regions based on a tiling of the genome by win_size.
tile_by_windows(bs, win_size = 200)tile_by_windows(bs, win_size = 200)
bs |
a |
win_size |
an |
A BSseq object with loci consisting of a tiling of the genome by win_size bp tiles. Coverage and methylation read count matrices are aggregated by the sums of the cytosines / CpGs in the regions per sample.
data(bsseq_stranded, package = 'methylSig') tiled = tile_by_windows(bs = bsseq_stranded, win_size = 50)data(bsseq_stranded, package = 'methylSig') tiled = tile_by_windows(bs = bsseq_stranded, win_size = 50)