Title: | MethylSig: Differential Methylation Testing for WGBS and RRBS Data |
---|---|
Description: | MethylSig is a package for testing for differentially methylated cytosines (DMCs) or regions (DMRs) in whole-genome bisulfite sequencing (WGBS) or reduced representation bisulfite sequencing (RRBS) experiments. MethylSig uses a beta binomial model to test for significant differences between groups of samples. Several options exist for either site-specific or sliding window tests, and variance estimation. |
Authors: | Yongseok Park [aut], Raymond G. Cavalcante [aut, cre] |
Maintainer: | Raymond G. Cavalcante <[email protected]> |
License: | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-11-29 07:43:36 UTC |
Source: | https://github.com/bioc/methylSig |
Data contains 6 methylation loci and 2 samples
bsseq_destranded
bsseq_destranded
A BSseq object
data-raw/02-create_bsseq_rda.R
data(bsseq_destranded, package = 'methylSig')
data(bsseq_destranded, package = 'methylSig')
Data contains 4 methylation loci for 2 samples on 2 chromosomes
bsseq_multichrom
bsseq_multichrom
A BSseq object
data-raw/02-create_bsseq_rda.R
data(bsseq_multichrom, package = 'methylSig')
data(bsseq_multichrom, package = 'methylSig')
Data contains 11 methylation loci and 2 samples
bsseq_stranded
bsseq_stranded
A BSseq object
data-raw/02-create_bsseq_rda.R
data(bsseq_stranded, package = 'methylSig')
data(bsseq_stranded, package = 'methylSig')
This function calculates differential methylation statistics using a binomial-based approach. See ‘Warning’ message below.
diff_binomial(bs, group_column, comparison_groups)
diff_binomial(bs, group_column, comparison_groups)
bs |
A |
group_column |
a |
comparison_groups |
a named |
This function uses a binomial-based model to calculate differential methylation statistics. It is nearly identical to the methylKit::calculateDiffMeth
function in the methylKit
R package except that only the likelihood ratio test and p.adjust(..., method='BH')
are used to calculate significance levels. It is significantly faster than methylKit::calculateDiffMeth
function.
A GRanges
object containing the following mcols
:
Methylation estimate for case.
Methylation estimate for control.
The difference meth_case - meth_control
.
The group for which the lcous is hyper-methylated. Note, this is not subject to significance thresholds.
The p-value from the t-test (t_approx = TRUE
) or the Chi-Square test (t_approx = FALSE
).
The Benjamini-Hochberg adjusted p-values using p.adjust(method = 'BH')
.
The log likelihood ratio.
This function does not take into account the variability among samples in each group being compared.
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_gr = diff_binomial( bs = small_test, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'))
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_gr = diff_binomial( bs = small_test, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'))
This function is a wrapper for DSS::DMLfit.multiFactor
.
diff_dss_fit(bs, design, formula)
diff_dss_fit(bs, design, formula)
bs |
a |
design |
a |
formula |
a formula for the linear model. It should refer to column names from |
A list
object with:
a GRanges
object with loci fit.
the data.frame
input as the experimental design.
the formula
representing the model. Can be character
or formula
.
the design matrix
used in regression based on the design
and formula
. This should be consulted to determine the appropriate contrast to use in dss_fit_test()
.
a list
with model fitting results. It has components beta
, the estimated coefficients, and var.beta
the estimated variance/covariance matrix for beta
.
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_fit = diff_dss_fit( bs = small_test, design = bsseq::pData(bs), formula = '~ Type')
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_fit = diff_dss_fit( bs = small_test, design = bsseq::pData(bs), formula = '~ Type')
This function is a wrapper for DSS::DMLtest.multiFactor
with the added feature of reporting methylation rates alongside the test results via the methylation_group_column
and methylation_groups
parameters. See documentation below.
diff_dss_test( bs, diff_fit, contrast, methylation_group_column = NA, methylation_groups = NA )
diff_dss_test( bs, diff_fit, contrast, methylation_group_column = NA, methylation_groups = NA )
bs |
a |
diff_fit |
a |
contrast |
a contrast matrix for hypothesis testing. The number of rows should match the number of columns |
methylation_group_column |
Optionally, a column from |
methylation_groups |
Optionally, a named |
A GRanges
object containing the following mcols
:
The test statistic.
The p-value.
The Benjamini-Hochberg adjusted p-values using p.adjust(method = 'BH')
.
If methylation_group_column
is specified, also the following mcols
:
Methylation estimate for case.
Methylation estimate for control.
The difference meth_case - meth_control
.
The group for which the locus is hyper-methylated. Note, this is not subject to significance thresholds.
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_fit = diff_dss_fit( bs = small_test, design = bsseq::pData(bs), formula = '~ Type') result = diff_dss_test( bs = small_test, diff_fit = diff_fit, contrast = matrix(c(0,1), ncol = 1) ) result_with_meth = diff_dss_test( bs = small_test, diff_fit = diff_fit, contrast = matrix(c(0,1), ncol = 1), methylation_group_column = 'Type', methylation_groups = c('case' = 'cancer', 'control' = 'normal') )
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[1:50] diff_fit = diff_dss_fit( bs = small_test, design = bsseq::pData(bs), formula = '~ Type') result = diff_dss_test( bs = small_test, diff_fit = diff_fit, contrast = matrix(c(0,1), ncol = 1) ) result_with_meth = diff_dss_test( bs = small_test, diff_fit = diff_fit, contrast = matrix(c(0,1), ncol = 1), methylation_group_column = 'Type', methylation_groups = c('case' = 'cancer', 'control' = 'normal') )
The function calculates differential methylation statistics between two groups of samples using a beta-binomial approach to calculate differential methylation statistics, accounting for variation among samples within each group. The function can be applied to a BSseq
object subjected to filter_loci_by_coverage()
, filter_loci_by_snps()
, filter_loci_by_group_coverage()
or any combination thereof. Moreover, the function can be applied to a BSseq
object which has been tiled with tile_by_regions()
or tile_by_windows()
.
diff_methylsig( bs, group_column, comparison_groups, disp_groups, local_window_size = 0, local_weight_function, t_approx = TRUE, n_cores = 1 )
diff_methylsig( bs, group_column, comparison_groups, disp_groups, local_window_size = 0, local_weight_function, t_approx = TRUE, n_cores = 1 )
bs |
a |
group_column |
a |
comparison_groups |
a named |
disp_groups |
a named |
local_window_size |
an |
local_weight_function |
a weight kernel function. The default is the tri-weight kernel function defined as |
t_approx |
a |
n_cores |
an |
A GRanges
object containing the following mcols
:
Methylation estimate for case.
Methylation estimate for control.
The difference meth_case - meth_control
.
The group for which the locus is hyper-methylated. Note, this is not subject to significance thresholds.
The p-value from the t-test (t_approx = TRUE
) or the Chi-Square test (t_approx = FALSE
).
The Benjamini-Hochberg adjusted p-values using p.adjust(method = 'BH')
.
The dispersion estimate.
The log likelihood ratio.
Degrees of freedom used when t_approx = TRUE
.
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[seq(50)] diff_gr = diff_methylsig( bs = small_test, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'), disp_groups = c('case' = TRUE, 'control' = TRUE), local_window_size = 0, t_approx = TRUE, n_cores = 1)
data(BS.cancer.ex, package = 'bsseqData') bs = filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) small_test = bs[seq(50)] diff_gr = diff_methylsig( bs = small_test, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'), disp_groups = c('case' = TRUE, 'control' = TRUE), local_window_size = 0, t_approx = TRUE, n_cores = 1)
Used after bsseq::read.bismark
to mark loci in samples below min_count
or above max_count
to 0. These loci will then be removed prior to differential analysis by filter_loci_by_group_coverage()
if there are not a sufficient number of samples with appropriate coverage.
filter_loci_by_coverage(bs, min_count = 5, max_count = 500)
filter_loci_by_coverage(bs, min_count = 5, max_count = 500)
bs |
a |
min_count |
an |
max_count |
an |
A BSseq
object with samples/loci in the coverage and methylation matrix set to 0 where the coverage was less than min_count
or greater than max_count
. The number of samples and loci are conserved.
bis_cov_file1 = system.file('extdata', 'bis_cov1.cov', package = 'methylSig') bis_cov_file2 = system.file('extdata', 'bis_cov2.cov', package = 'methylSig') test = bsseq::read.bismark( files = c(bis_cov_file1, bis_cov_file2), colData = data.frame(row.names = c('test1','test2')), rmZeroCov = FALSE, strandCollapse = FALSE ) test = filter_loci_by_coverage(bs = test, min_count = 10, max_count = 500)
bis_cov_file1 = system.file('extdata', 'bis_cov1.cov', package = 'methylSig') bis_cov_file2 = system.file('extdata', 'bis_cov2.cov', package = 'methylSig') test = bsseq::read.bismark( files = c(bis_cov_file1, bis_cov_file2), colData = data.frame(row.names = c('test1','test2')), rmZeroCov = FALSE, strandCollapse = FALSE ) test = filter_loci_by_coverage(bs = test, min_count = 10, max_count = 500)
An optional function to remove loci not satisfying coverage thresholds from filter_loci_by_coverage
in a minimum number of samples per group.
filter_loci_by_group_coverage(bs, group_column, min_samples_per_group)
filter_loci_by_group_coverage(bs, group_column, min_samples_per_group)
bs |
a |
group_column |
a |
min_samples_per_group |
a named |
The filter_loci_by_coverage
function marked locus/sample pairs in the coverage matrix as 0 if said pair had coverage less than minCount
or more than maxCount
. This function enforces a threshold on the minimum number of samples per group required for a locus to be tested in downstream testing functions.
A BSseq
object with only those loci having min_samples_per_group
.
data(BS.cancer.ex, package = 'bsseqData') filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', min_samples_per_group = c('cancer' = 3, 'normal' = 3) )
data(BS.cancer.ex, package = 'bsseqData') filter_loci_by_group_coverage( bs = BS.cancer.ex, group_column = 'Type', min_samples_per_group = c('cancer' = 3, 'normal' = 3) )
GRanges
objectA function to remove loci from a BSseq
object based on intersection with loci in a GRanges
object.
filter_loci_by_location(bs, gr)
filter_loci_by_location(bs, gr)
bs |
a |
gr |
a |
A BSseq
object with loci intersecting gr
removed.
data(bsseq_stranded, package = 'methylSig') regions = GenomicRanges::GRanges( seqnames = c('chr1','chr1','chr1','chr1'), ranges = IRanges::IRanges( start = c(5,25,45,70), end = c(15,40,55,80) ) ) filtered = filter_loci_by_location(bs = bsseq_stranded, gr = regions)
data(bsseq_stranded, package = 'methylSig') regions = GenomicRanges::GRanges( seqnames = c('chr1','chr1','chr1','chr1'), ranges = IRanges::IRanges( start = c(5,25,45,70), end = c(15,40,55,80) ) ) filtered = filter_loci_by_location(bs = bsseq_stranded, gr = regions)
Data contains 1466 promoters for use in the vignette
promoters_gr
promoters_gr
A GRanges object
data-raw/02-create_bsseq_rda.R
data(promoters_gr, package = 'methylSig')
data(promoters_gr, package = 'methylSig')
An optional function to aggregate cytosine / CpG level data into regions based on a GRanges
set of genomic regions.
tile_by_regions(bs, gr)
tile_by_regions(bs, gr)
bs |
a |
gr |
a |
A BSseq
object with loci of regions matching gr
. Coverage and methylation read count matrices are aggregated by the sums of the cytosines / CpGs in the regions per sample.
data(bsseq_stranded, package = 'methylSig') regions = GenomicRanges::GRanges( seqnames = c('chr1','chr1','chr1'), ranges = IRanges::IRanges( start = c(5,35,75), end = c(30,70,80) ) ) tiled = tile_by_regions(bs = bsseq_stranded, gr = regions)
data(bsseq_stranded, package = 'methylSig') regions = GenomicRanges::GRanges( seqnames = c('chr1','chr1','chr1'), ranges = IRanges::IRanges( start = c(5,35,75), end = c(30,70,80) ) ) tiled = tile_by_regions(bs = bsseq_stranded, gr = regions)
An optional function to aggregate cytosine / CpG level data into regions based on a tiling of the genome by win_size
.
tile_by_windows(bs, win_size = 200)
tile_by_windows(bs, win_size = 200)
bs |
a |
win_size |
an |
A BSseq
object with loci consisting of a tiling of the genome by win_size
bp tiles. Coverage and methylation read count matrices are aggregated by the sums of the cytosines / CpGs in the regions per sample.
data(bsseq_stranded, package = 'methylSig') tiled = tile_by_windows(bs = bsseq_stranded, win_size = 50)
data(bsseq_stranded, package = 'methylSig') tiled = tile_by_windows(bs = bsseq_stranded, win_size = 50)