| Title: | hierarchical Bayesian modeling for cell type deconvolution of immunoprecipitation-based DNA methylome |
|---|---|
| Description: | The R package decemedip is a novel computational paradigm developed for inferring the relative abundances of cell types and tissues measure by methylated DNA immunoprecipitation sequencing (MeDIP-Seq). This paradigm allows using reference data from other technologies such as microarray or WGBS. |
| Authors: | Ning Shen [aut, cre] (ORCID: <https://orcid.org/0000-0002-2974-1086>) |
| Maintainer: | Ning Shen <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-28 06:05:29 UTC |
| Source: | https://github.com/bioc/decemedip |
The R package decemedip is a novel computational paradigm developed for inferring the relative abundances of cell types and tissues from tissue bulk or circulating cell-free DNA (cfDNA) measure by methylated DNA immunoprecipitation sequencing (MeDIP-Seq). This paradigm allows using reference data from other technologies such as microarray or WGBS.
Maintainer: Ning Shen [email protected] (ORCID)
Stan Development Team (NA). RStan: the R interface to Stan. R package version 2.32.6. https://mc-stan.org
Useful links:
Main function to perform cell type deconvolution with MeDIP-seq data
decemedip( sample_bam_file = NULL, paired_end = NULL, counts_cts = c(), counts_anc = c(), ref_assembly = "hg19", ref_cts = NULL, ref_anc = NULL, weight_cts = 1, weight_anc = 0.5, diagnostics = TRUE, seed = 2024, cores = 1, chains = 4, iter = 2000, stan_input_params = list(s_mu = 3, s_sigma = 3, n_knot_z = 0, degree_z = 3, Xi = cor(as.matrix(SummarizedExperiment::assay((ref_cts)))), s_theta = 3, s_tau = 3), stan_control = list(adapt_delta = 0.95, max_treedepth = 15), timeout_sec = 2 * (diagnostics + 1) * iter * chains, max_retries = 3, ... )decemedip( sample_bam_file = NULL, paired_end = NULL, counts_cts = c(), counts_anc = c(), ref_assembly = "hg19", ref_cts = NULL, ref_anc = NULL, weight_cts = 1, weight_anc = 0.5, diagnostics = TRUE, seed = 2024, cores = 1, chains = 4, iter = 2000, stan_input_params = list(s_mu = 3, s_sigma = 3, n_knot_z = 0, degree_z = 3, Xi = cor(as.matrix(SummarizedExperiment::assay((ref_cts)))), s_theta = 3, s_tau = 3), stan_control = list(adapt_delta = 0.95, max_treedepth = 15), timeout_sec = 2 * (diagnostics + 1) * iter * chains, max_retries = 3, ... )
sample_bam_file |
A string value that indicates the file path to the bam file of a
MeDIP-seq sample of interest. If |
paired_end |
A logic value that indicates whether the bam file is from paired-end
reads or single-end. Specify |
counts_cts |
An atomic vector of integer values that indicates the read counts of
a MeDIP-seq sample on reference sites/regions. If |
counts_anc |
An atomic vector of integer values that indicates the read counts of
a MeDIP-seq sample on reference sites/regions. If |
ref_assembly |
A string that represents the genome assembly that should be used for cell type-specific
sites in the reference panel ('hg19' or 'hg38'). Default to 'hg19'. The default reference is explained
in the manuscript of decemedip. Alternatively, if the user want to provide their own
reference panel by using the |
ref_cts |
A |
ref_anc |
Same as |
weight_cts |
A numeric value indicating the weights that should be put on cell type-specific sites/regions. Default is 0.5. |
weight_anc |
A numeric value indicating the weights that should be put on cell type-specific sites/regions. Default is 1. |
diagnostics |
A logic value that indicates whether to include components of the stan model
in the output that are necessary for future diagnostics of the model, such as posterior
predictive checks. For details, please refer to the function |
seed |
The seed for random number generation in MCMC sampling. |
cores |
A positive integer specifying the number of cores that can be used for MCMC sampling. The default is 1. |
chains |
A positive integer specifying the number of Markov chains. The default is 4. |
iter |
A positive integer specifying the number of iterations for each chain (including warmup). The default is 2000. |
stan_input_params |
A named list of parameters that specifies the prior of the decemedip model. |
stan_control |
A named list of parameters to control the sampler's behavior in
Stan. See the details in the documentation for the control argument in |
timeout_sec |
A numerical value indicating the CPU/processor time (in seconds) allowed for the MCMC to run before restarting the chains with a new random seed. |
max_retries |
An integer value indicating the maximum number of tries with different seed for MCMC if it fails to converge. |
... |
Other parameters that can be passed to the |
A list of two elements:
data_list: An organized list of variables used as input to the Stan posterior sampling function.
posterior: An stanfit object produced by Stan representing the fitted posteriors.
# Here we use a lightweighted example that only contains 8 cell types to avoid long running time: data(example.hg19.ref.cts.se) data(example.hg19.ref.anc.se) data(example.pdx.counts.cts.se) data(example.pdx.counts.anc.se) # read counts of cell type-specific CpGs of the sample 'LuCaP_147CR' counts_cts <- SummarizedExperiment::assays(example.pdx.counts.cts.se)$counts[, "LuCaP_147CR"] # read counts of anchor CpGs of the sample 'LuCaP_147CR' counts_anc <- SummarizedExperiment::assays(example.pdx.counts.anc.se)$counts[, "LuCaP_147CR"] ## Fit decemedip model output <- decemedip(counts_cts = counts_cts, counts_anc = counts_anc, ref_cts = example.hg19.ref.cts.se, ref_anc = example.hg19.ref.anc.se, iter = 500, cores = 1, chains = 1) ## IMPORTANT NOTE (PLEASE READ): For usual cases, you may skip specifying ref_cts and ref_anc, ## as the default panels contain 25 cell types, as used in the manuscript, e.g., run the function as: ## Not run: ## By default, the functions uses 2000 iterations, 4 cores and 4 chains output <- decemedip(counts_cts = counts_cts, counts_anc = counts_anc) ## or use BAM files as input (paired=TRUE is file is paired-end) output <- decemedip(sample_bam_file = "path/to/bam/files", paired = TRUE) ## End(Not run)# Here we use a lightweighted example that only contains 8 cell types to avoid long running time: data(example.hg19.ref.cts.se) data(example.hg19.ref.anc.se) data(example.pdx.counts.cts.se) data(example.pdx.counts.anc.se) # read counts of cell type-specific CpGs of the sample 'LuCaP_147CR' counts_cts <- SummarizedExperiment::assays(example.pdx.counts.cts.se)$counts[, "LuCaP_147CR"] # read counts of anchor CpGs of the sample 'LuCaP_147CR' counts_anc <- SummarizedExperiment::assays(example.pdx.counts.anc.se)$counts[, "LuCaP_147CR"] ## Fit decemedip model output <- decemedip(counts_cts = counts_cts, counts_anc = counts_anc, ref_cts = example.hg19.ref.cts.se, ref_anc = example.hg19.ref.anc.se, iter = 500, cores = 1, chains = 1) ## IMPORTANT NOTE (PLEASE READ): For usual cases, you may skip specifying ref_cts and ref_anc, ## as the default panels contain 25 cell types, as used in the manuscript, e.g., run the function as: ## Not run: ## By default, the functions uses 2000 iterations, 4 cores and 4 chains output <- decemedip(counts_cts = counts_cts, counts_anc = counts_anc) ## or use BAM files as input (paired=TRUE is file is paired-end) output <- decemedip(sample_bam_file = "path/to/bam/files", paired = TRUE) ## End(Not run)
Subset of hg19.ref.anc.se as a lightweighted example, only contains the blood cell types and prostate.
data(example.hg19.ref.anc.se)data(example.hg19.ref.anc.se)
An object of class SummarizedExperiment.
All coordinates are in hg19.
Subset of hg19.ref.cts.se as a lightweighted example, only contains the blood cell types and prostate.
data(example.hg19.ref.cts.se)data(example.hg19.ref.cts.se)
An object of class SummarizedExperiment.
All coordinates are in hg19.
Subset of pdx.counts.anc.se as a lightweighted example, only contains the blood cell types and prostate.
data(example.pdx.counts.anc.se)data(example.pdx.counts.anc.se)
An object of class SummarizedExperiment.
All coordinates are in hg19.
Subset of pdx.counts.cts.se as a lightweighted example, only contains the blood cell types and prostate.
data(example.pdx.counts.cts.se)data(example.pdx.counts.cts.se)
An object of class SummarizedExperiment.
All coordinates are in hg19.
Obtain read counts of regions of interest from MeDIP-seq bam files
getRoiReadCount( sample_bam_files, sample_names, sample_paired, roi, col_data = NULL, row_data = NULL, bs_genome = "BSgenome.Hsapiens.UCSC.hg19", ... )getRoiReadCount( sample_bam_files, sample_names, sample_paired, roi, col_data = NULL, row_data = NULL, bs_genome = "BSgenome.Hsapiens.UCSC.hg19", ... )
sample_bam_files |
An atomic vector that contains directory paths of the MeDIP-seq sorted bam files. |
sample_names |
An atomic vector of strings that indicates the sample names.
Default is NULL. If not NULL, please make sure that |
sample_paired |
A logic value that indicates sample_paired-end reads (if |
roi |
A |
col_data |
A |
row_data |
A |
bs_genome |
A character value that indicates the reference genome name as
defined by |
... |
Additional arguments passed into |
An object of class SummarizedExperiment with read count matrix stored
as an assay named 'counts' (can be extracted using SummarizedExperiment::assays)
se <- getRoiReadCount( sample_bam_files = c("dir/to/bam1", "dir/to/bam2"), sample_names = c("sample1", "sample2"), sample_paired = TRUE )se <- getRoiReadCount( sample_bam_files = c("dir/to/bam1", "dir/to/bam2"), sample_names = c("sample1", "sample2"), sample_paired = TRUE )
Extract summary statistics and diagnostics on fitted cell type proportions
getSummaryOnPi( posterior, probs = c(0.025, 0.25, 0.5, 0.75, 0.975), digits_summary = 5, cell_type_names = NULL, ... )getSummaryOnPi( posterior, probs = c(0.025, 0.25, 0.5, 0.75, 0.975), digits_summary = 5, cell_type_names = NULL, ... )
posterior |
The fitted posterior object from decemedip model. |
probs |
A numeric vector specifying quantiles of interest. The defaults is c(0.025,0.25,0.5,0.75,0.975). |
digits_summary |
The number of significant digits to use in the summary, defaulting to 5. |
cell_type_names |
Name of the cell types in reference panel. The order should align with order in the reference panel. |
... |
Additional arguments that get passed into |
A data.frame object containg summary statistics and diagnostic statistics of the fitted cell type proportions.
data(pdx.counts.cts.se) data(pdx.counts.anc.se) # read counts of cell type-specific CpGs of the sample 'LuCaP_147CR' counts_cts <- SummarizedExperiment::assays(pdx.counts.cts.se)$counts[, "LuCaP_147CR"] # read counts of anchor CpGs of the sample 'LuCaP_147CR' counts_anc <- SummarizedExperiment::assays(pdx.counts.anc.se)$counts[, "LuCaP_147CR"] ## The following functions are commented due to Bioconductor's time constraints on package building ## Fit decemedip model # output <- decemedip(counts_cts = counts_cts, counts_anc = counts_anc) ## Get summary stats # smr_pi.df <- getSummaryOnPi(output$posterior)data(pdx.counts.cts.se) data(pdx.counts.anc.se) # read counts of cell type-specific CpGs of the sample 'LuCaP_147CR' counts_cts <- SummarizedExperiment::assays(pdx.counts.cts.se)$counts[, "LuCaP_147CR"] # read counts of anchor CpGs of the sample 'LuCaP_147CR' counts_anc <- SummarizedExperiment::assays(pdx.counts.anc.se)$counts[, "LuCaP_147CR"] ## The following functions are commented due to Bioconductor's time constraints on package building ## Fit decemedip model # output <- decemedip(counts_cts = counts_cts, counts_anc = counts_anc) ## Get summary stats # smr_pi.df <- getSummaryOnPi(output$posterior)
decemedip.This dataset represents a GRanges object that contains the collection of Illumina HumanMethylation450K probes that have methylation level less than 0.1 or greater than 0.9 in all tissue present in the atlas. Data source is from the MethAtlas GitHub repo (https://github.com/nloyfer/meth_atlas).
data(hg19.ref.anc.se)data(hg19.ref.anc.se)
An object of class SummarizedExperiment.
All coordinates are in hg19.
Moss, J., Magenheim, J., Neiman, D. et al. Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nat Commun 9, 5068 (2018).
data(hg19.ref.anc.se) hg19.ref.anc.sedata(hg19.ref.anc.se) hg19.ref.anc.se
Default reference cell type-specific markers used as default in
decemedip. This dataset represents a GRanges object that contains the collection
of Illumina HumanMethylation450K probes that have methylation level less than 0.1 or
greater than 0.9 in all tissue present in the atlas. Data source of the methylation atlas is
from the MethAtlas GitHub repo (https://github.com/nloyfer/meth_atlas). For details
of how the marker CpGs are selected, please refer to the decemedip manuscript.
data(hg19.ref.cts.se)data(hg19.ref.cts.se)
An object of class SummarizedExperiment. rowData(hg19.ref.cts.se) contains
information of the selected probes.
All coordinates are in hg19.
Moss, J., Magenheim, J., Neiman, D. et al. Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nat Commun 9, 5068 (2018). https://doi.org/10.1038/s41467-018-07466-6
data(hg19.ref.cts.se) hg19.ref.cts.sedata(hg19.ref.cts.se) hg19.ref.cts.se
Same as data(hg19.ref.anc.se) but lifted over to hg38.
data(hg38.ref.anc.se)data(hg38.ref.anc.se)
An object of class SummarizedExperiment.
An object of class SummarizedExperiment.
All coordinates are in hg38.
data(hg38.ref.anc.se) hg38.ref.anc.sedata(hg38.ref.anc.se) hg38.ref.anc.se
Same as data(hg19.ref.cts.se) but lifted over to hg38.
data(hg38.ref.cts.se)data(hg38.ref.cts.se)
An object of class SummarizedExperiment.
An object of class SummarizedExperiment.
All coordinates are in hg38.
data(hg38.ref.cts.se) hg38.ref.cts.sedata(hg38.ref.cts.se) hg38.ref.cts.se
decemedip function)Function for assembling a SummarizedExperiment object of reference panel in hg19 for
cell type deconvolution (which is used in decemedip function)
makeReferencePanel( row_ranges, beta_matrix, cpg_coords, col_names = NULL, row_names = NULL, col_data = NULL, row_data = NULL )makeReferencePanel( row_ranges, beta_matrix, cpg_coords, col_names = NULL, row_names = NULL, col_data = NULL, row_data = NULL )
row_ranges |
A |
beta_matrix |
A matrix that contains the beta values of reference regions. Each row
is a region and each column is a cell type. If |
cpg_coords |
A |
col_names |
An atomic vector of strings that indicates the column names, i.e.,
names of the cell types. Default is NULL. If not NULL, the column names of |
row_names |
An atomic vector of strings that indicates the row names, i.e.,
names of the reference regions. Default is NULL. If not NULL, the row names of |
col_data |
A |
row_data |
A |
An SummarizedExperiment object with each row represents a reference region
and an assay named 'beta_matrix' that stores the beta values of reference regions.
row_ranges <- GenomicRanges::GRanges( seqnames = S4Vectors::Rle(c("chr1", "chr2", "chr3")), ranges = IRanges::IRanges( start = c(100, 200, 300), end = c(100, 200, 300) ), cpg_id = c("cpg_1", "cpg_2", "cpg_3") # CpG site IDs ) cpg_coords <- GenomicRanges::GRanges( seqnames = S4Vectors::Rle(c("chr1", "chr1", "chr2", "chr2", "chr3", "chr3")), ranges = IRanges::IRanges( start = c(100, 101, 200, 201, 300, 301), end = c(100, 101, 200, 201, 300, 301) ) ) beta_matrix <- matrix(runif(6), nrow = 3) makeReferencePanel( row_ranges = row_ranges, beta_matrix = beta_matrix, cpg_coords = cpg_coords )row_ranges <- GenomicRanges::GRanges( seqnames = S4Vectors::Rle(c("chr1", "chr2", "chr3")), ranges = IRanges::IRanges( start = c(100, 200, 300), end = c(100, 200, 300) ), cpg_id = c("cpg_1", "cpg_2", "cpg_3") # CpG site IDs ) cpg_coords <- GenomicRanges::GRanges( seqnames = S4Vectors::Rle(c("chr1", "chr1", "chr2", "chr2", "chr3", "chr3")), ranges = IRanges::IRanges( start = c(100, 101, 200, 201, 300, 301), end = c(100, 101, 200, 201, 300, 301) ) ) beta_matrix <- matrix(runif(6), nrow = 3) makeReferencePanel( row_ranges = row_ranges, beta_matrix = beta_matrix, cpg_coords = cpg_coords )
This dataset represents a SummarizedExperiment object that contains
MeDIP-seq read counts on reference anchor CpGs of 3 PDX samples from Berchuck
et al. 2022. Each row is a CpG.
data(pdx.counts.anc.se)data(pdx.counts.anc.se)
An object of class SummarizedExperiment.
All coordinates are in hg19.
data(pdx.counts.anc.se) pdx.counts.anc.sedata(pdx.counts.anc.se) pdx.counts.anc.se
This dataset represents a SummarizedExperiment object that contains
MeDIP-seq read counts on reference cell type-specific CpGs of 3 PDX samples from Berchuck
et al. 2022. Each row is a CpG.
data(pdx.counts.cts.se)data(pdx.counts.cts.se)
An object of class SummarizedExperiment.
All coordinates are in hg19.
data(pdx.counts.cts.se) pdx.counts.cts.sedata(pdx.counts.cts.se) pdx.counts.cts.se
decemedip()
Diagnostics for model fitting in decemedip()
plotDiagnostics( decemedip_output, plot_type, model_fit_n_samples = 100, model_fit_label_size = 12, model_fit_align = "hv", ... )plotDiagnostics( decemedip_output, plot_type, model_fit_n_samples = 100, model_fit_label_size = 12, model_fit_align = "hv", ... )
decemedip_output |
The output from |
plot_type |
A string value, either 'y_fit' or 'model_fit'. |
model_fit_n_samples |
Number of randomly selected posterior samples for
plotting the diagnostic plots of stan fit. For |
model_fit_label_size |
Label size in the plot grid. For |
model_fit_align |
Specifies how graphs in the grid should be aligned. See
the argument |
... |
Additional arguments to be fed into cowplot::plot_grid
in the case of |
An ggplot object.
data(pdx.counts.cts.se) data(pdx.counts.anc.se) # read counts of cell type-specific CpGs of the sample 'LuCaP_147CR' counts_cts <- SummarizedExperiment::assays(pdx.counts.cts.se)$counts[, "LuCaP_147CR"] # read counts of anchor CpGs of the sample 'LuCaP_147CR' counts_anc <- SummarizedExperiment::assays(pdx.counts.anc.se)$counts[, "LuCaP_147CR"] ## The following function is commented due to Bioconductor's time constraints on package building ## Fit decemedip model # output <- decemedip(counts_cts = counts_cts, counts_anc = counts_anc) ## Plot diagnostic plots # plotDiagnostics(output, plot_type = "y_fit")data(pdx.counts.cts.se) data(pdx.counts.anc.se) # read counts of cell type-specific CpGs of the sample 'LuCaP_147CR' counts_cts <- SummarizedExperiment::assays(pdx.counts.cts.se)$counts[, "LuCaP_147CR"] # read counts of anchor CpGs of the sample 'LuCaP_147CR' counts_anc <- SummarizedExperiment::assays(pdx.counts.anc.se)$counts[, "LuCaP_147CR"] ## The following function is commented due to Bioconductor's time constraints on package building ## Fit decemedip model # output <- decemedip(counts_cts = counts_cts, counts_anc = counts_anc) ## Plot diagnostic plots # plotDiagnostics(output, plot_type = "y_fit")