Title: | Estimate the cell composition of whole blood in DNA methylation samples |
---|---|
Description: | A tool to estimate the cell composition of DNA methylation whole blood sample measured on any platform technology (microarray and sequencing). |
Authors: | Stephanie C. Hicks [aut, cre] , Rafael Irizarry [aut] |
Maintainer: | Stephanie C. Hicks <[email protected]> |
License: | GPL-3 |
Version: | 1.21.0 |
Built: | 2024-12-31 03:08:20 UTC |
Source: | https://github.com/bioc/methylCC |
Extract the methylation values and GRanges objects
.extract_raw_data(object)
.extract_raw_data(object)
object |
an object can be a |
A list preprocessed objects from the
RGChannelSet
, GenomicMethylSet
or
BSseq
objects to be used in .preprocess_estimatecc()
.
This function uses the FlowSorted.Blood.450k
whole blood reference methylomes with six cell types
to identify differentially methylated regions.
.find_dmrs(verbose = TRUE, gr_target = NULL, include_cpgs = FALSE, include_dmrs = TRUE, num_cpgs = 50, num_regions = 50, bumphunter_beta_cutoff = 0.2, dmr_up_cutoff = 0.5, dmr_down_cutoff = 0.4, dmr_pval_cutoff = 1e-11, cpg_pval_cutoff = 1e-08, cpg_up_dm_cutoff = 0, cpg_down_dm_cutoff = 0, pairwise_comparison = FALSE, mset_train_flow_sort = NULL)
.find_dmrs(verbose = TRUE, gr_target = NULL, include_cpgs = FALSE, include_dmrs = TRUE, num_cpgs = 50, num_regions = 50, bumphunter_beta_cutoff = 0.2, dmr_up_cutoff = 0.5, dmr_down_cutoff = 0.4, dmr_pval_cutoff = 1e-11, cpg_pval_cutoff = 1e-08, cpg_up_dm_cutoff = 0, cpg_down_dm_cutoff = 0, pairwise_comparison = FALSE, mset_train_flow_sort = NULL)
verbose |
TRUE/FALSE argument specifying if verbose messages should be returned or not. Default is TRUE. |
gr_target |
Default is NULL. However, the user
can provide a GRanges object from the |
include_cpgs |
TRUE/FALSE. Should individual CpGs be returned. Default is FALSE. |
include_dmrs |
TRUE/FALSE. Should differentially methylated regions be returned. Default is TRUE. User can turn this to FALSE and search for only CpGs. |
num_cpgs |
The max number of CpGs to return for each cell type. Default is 50. |
num_regions |
The max number of DMRs to return for each cell type. Default is 50. |
bumphunter_beta_cutoff |
The |
dmr_up_cutoff |
A cutoff threshold for identifying DMRs that are methylated in one cell type, but not in the other cell types. |
dmr_down_cutoff |
A cutoff threshold for identifying DMRs that are not methylated in one cell type, but methylated in the other cell types. |
dmr_pval_cutoff |
A cutoff threshold for the p-values when identifying DMRs that are methylated in one cell type, but not in the other cell types (or vice versa). |
cpg_pval_cutoff |
A cutoff threshold for the p-values when identifying differentially methylated CpGs that are methylated in one cell type, but not in the other cell types (or vice versa). |
cpg_up_dm_cutoff |
A cutoff threshold for identifying differentially methylated CpGs that are methylated in one cell type, but not in the other cell types. |
cpg_down_dm_cutoff |
A cutoff threshold for identifying differentially methylated CpGs that are not methylated in one cell type, but are methylated in the other cell types. |
pairwise_comparison |
TRUE/FAlSE of whether all pairwise comparisons (e.g. methylated in Granulocytes and Monocytes, but not methylated in other cell types). Default if FALSE. |
mset_train_flow_sort |
Default is NULL. However, a user
can provide a |
A list of data frames and GRanges objects.
Creates a container with initial theta parameter estimates
.initialize_theta(n, K, alpha0 = NULL, alpha1 = NULL, sig0 = NULL, sig1 = NULL, tau = NULL)
.initialize_theta(n, K, alpha0 = NULL, alpha1 = NULL, sig0 = NULL, sig1 = NULL, tau = NULL)
n |
Number of samples |
K |
Number of cell types |
alpha0 |
Default NULL. Initial mean methylation level in unmethylated regions |
alpha1 |
Default NULL. Initial mean methylation level in methylated regions |
sig0 |
Default NULL. Initial var methylation level in unmethylated regions |
sig1 |
Default NULL. Initial var methylation level in methylated regions |
tau |
Default NULL. Initial var for measurement error |
A data frame with initial parameter estimates
to be used in .initializeMLEs()
.
Helper functions to initialize
MLEs in estimatecc()
.
.initializeMLEs(init_param_method, n, K, Ys, Zs, a0init, a1init, sig0init, sig1init, tauinit)
.initializeMLEs(init_param_method, n, K, Ys, Zs, a0init, a1init, sig0init, sig1init, tauinit)
init_param_method |
method to initialize parameter estimates. Choose between "random" (randomly sample) or "known_regions" (uses unmethyalted and methylated regions that were identified based on Reinus et al. (2012) cell sorted data.). Defaults to "random". |
n |
Number of samples |
K |
Number of cell types |
Ys |
observed methylation levels in samples provided by user of dimension R x n |
Zs |
Cell type specific regions of dimension R x K |
a0init |
Default NULL. Initial mean methylation level in unmethylated regions |
a1init |
Default NULL. Initial mean methylation level in methylated regions |
sig0init |
Default NULL. Initial var methylation level in unmethylated regions |
sig1init |
Default NULL. Initial var methylation level in methylated regions |
tauinit |
Default NULL. Initial var for measurement error |
A list of MLE estimates to be used in estimatecc()
.
Helper function for estimatecc
.methylcc_engine(Ys, Zs, current_pi_mle, current_theta, epsilon, max_iter)
.methylcc_engine(Ys, Zs, current_pi_mle, current_theta, epsilon, max_iter)
Ys |
observed methylation levels in samples provided by user of dimension R x n |
Zs |
Cell type specific regions of dimension R x K |
current_pi_mle |
cell composition MLE estimates of dimension K x n |
current_theta |
other parameter estimates in EM algorithm |
epsilon |
Add here. |
max_iter |
Add here. |
A list of MLE estimates that is used in
estimatecc()
.
Expectation step in EM algorithm for methylCC
.methylcc_estep(Ys, Zs, current_pi_mle, current_theta, meth_status = 0)
.methylcc_estep(Ys, Zs, current_pi_mle, current_theta, meth_status = 0)
Ys |
observed methylation levels in samples provided by user of dimension R x n |
Zs |
Cell type specific regions of dimension R x K |
current_pi_mle |
cell composition MLE estimates of dimension K x n |
current_theta |
other parameter estimates in EM algorithm |
meth_status |
Indicator function corresponding to regions that are unmethylated (meth_status=0) or methylated (meth_status=1) |
List of expected value of the first two moments
of the random effects (or the E-Step in the EM algorithm)
used in .methylcc_engine()
Maximization step in EM Algorithm for methylCC
.methylcc_mstep(Ys, Zs, current_pi_mle, current_theta, estep0, estep1)
.methylcc_mstep(Ys, Zs, current_pi_mle, current_theta, estep0, estep1)
Ys |
observed methylation levels in samples provided by user of dimension R x n |
Zs |
Cell type specific regions of dimension R x K |
current_pi_mle |
cell composition MLE estimates of dimension K x n |
current_theta |
other parameter estimates in EM algorithm |
estep0 |
Results from expectation step for unmethylated regions |
estep1 |
Results from expectation step for methylated regions |
A list of the updated MLEs (or the M-Step
in the EM algorithm) used in .methylcc_engine()
Pick probes from target data using
the indices in dmp_regions
.pick_target_positions(target_granges, target_object = NULL, target_cvg = NULL, dmp_regions)
.pick_target_positions(target_granges, target_object = NULL, target_cvg = NULL, dmp_regions)
target_granges |
add more here. |
target_object |
an optional argument which
contains the meta-data for |
target_cvg |
coverage reads for the target object |
dmp_regions |
differentially methylated regions |
A list of GRanges objects to be used in
.preprocess_estimatecc()
This function preprocesses the data
before the estimatecc()
function
.preprocess_estimatecc(object, verbose = TRUE, init_param_method = "random", celltype_specific_dmrs = celltype_specific_dmrs)
.preprocess_estimatecc(object, verbose = TRUE, init_param_method = "random", celltype_specific_dmrs = celltype_specific_dmrs)
object |
an object can be a |
verbose |
TRUE/FALSE argument specifying if verbose messages should be returned or not. Default is TRUE. |
init_param_method |
method to initialize parameter estimates. Choose between "random" (randomly sample) or "known_regions" (uses unmethyalted and methylated regions that were identified based on Reinus et al. (2012) cell sorted data.). Defaults to "random". |
celltype_specific_dmrs |
cell type specific differentially methylated regions (DMRs). |
A list of object to be used in estimatecc
helper function to split along a variable
.splitit(x)
.splitit(x)
x |
a vector |
A list to be used in find_dmrs()
Helper function which is the product of Z and pi_mle
.WFun(Zs, pi_mle)
.WFun(Zs, pi_mle)
Zs |
Cell type specific regions of dimension R x K |
pi_mle |
cell composition MLE estimates |
A list of output after taking the product of Z
and cell composition mle estimates to be used in
.methylcc_estep()
.
Given a estimatecc object, this function returns the cell composition estimates
Accessors for the 'cell_counts' slot of a estimatecc object.
cell_counts(object) ## S4 method for signature 'estimatecc' cell_counts(object)
cell_counts(object) ## S4 method for signature 'estimatecc' cell_counts(object)
object |
an object of class |
Returns the cell composition estimates
# This is a reduced version of the FlowSorted.Blood.450k # dataset available by using BiocManager::install("FlowSorted.Blood.450k), # but for purposes of the example, we use the smaller version # and we set \code{demo=TRUE}. For any case outside of this example for # the package, you should set \code{demo=FALSE} (the default). dir <- system.file("data", package="methylCC") files <- file.path(dir, "FlowSorted.Blood.450k.sub.RData") if(file.exists(files)){ load(file = files) set.seed(12345) est <- estimatecc(object = FlowSorted.Blood.450k.sub, demo = TRUE) cell_counts(est) }
# This is a reduced version of the FlowSorted.Blood.450k # dataset available by using BiocManager::install("FlowSorted.Blood.450k), # but for purposes of the example, we use the smaller version # and we set \code{demo=TRUE}. For any case outside of this example for # the package, you should set \code{demo=FALSE} (the default). dir <- system.file("data", package="methylCC") files <- file.path(dir, "FlowSorted.Blood.450k.sub.RData") if(file.exists(files)){ load(file = files) set.seed(12345) est <- estimatecc(object = FlowSorted.Blood.450k.sub, demo = TRUE) cell_counts(est) }
Estimate cell composition from DNAm data
estimatecc(object, find_dmrs_object = NULL, verbose = TRUE, epsilon = 0.01, max_iter = 100, take_intersection = FALSE, include_cpgs = FALSE, include_dmrs = TRUE, init_param_method = "random", a0init = NULL, a1init = NULL, sig0init = NULL, sig1init = NULL, tauinit = NULL, demo = FALSE)
estimatecc(object, find_dmrs_object = NULL, verbose = TRUE, epsilon = 0.01, max_iter = 100, take_intersection = FALSE, include_cpgs = FALSE, include_dmrs = TRUE, init_param_method = "random", a0init = NULL, a1init = NULL, sig0init = NULL, sig1init = NULL, tauinit = NULL, demo = FALSE)
object |
an object can be a |
find_dmrs_object |
If the user would like to supply
different differentially methylated regions, they can
use the output from the |
verbose |
TRUE/FALSE argument specifying if verbose messages should be returned or not. Default is TRUE. |
epsilon |
Threshold for EM algorithm to check for convergence. Default is 0.01. |
max_iter |
Maximum number of iterations for EM algorithm. Default is 100 iterations. |
take_intersection |
TRUE/FALSE asking if only the CpGs
included in |
include_cpgs |
TRUE/FALSE. Should individual CpGs be returned. Default is FALSE. |
include_dmrs |
TRUE/FALSE. Should differentially methylated regions be returned. Default is TRUE. |
init_param_method |
method to initialize parameter estimates. Choose between "random" (randomly sample) or "known_regions" (uses unmethyalted and methylated regions that were identified based on Reinus et al. (2012) cell sorted data.). Defaults to "random". |
a0init |
Default NULL. Initial mean methylation level in unmethylated regions |
a1init |
Default NULL. Initial mean methylation level in methylated regions |
sig0init |
Default NULL. Initial var methylation level in unmethylated regions |
sig1init |
Default NULL. Initial var methylation level in methylated regions |
tauinit |
Default NULL. Initial var for measurement error |
demo |
TRUE/FALSE. Should the function be used in demo mode to shorten examples in package. Defaults to FALSE. |
A object of the class estimatecc
that
contains information about the cell composition
estimation (in the summary
slot) and
the cell composition estimates themselves
(in the cell_counts
slot).
# This is a reduced version of the FlowSorted.Blood.450k # dataset available by using BiocManager::install("FlowSorted.Blood.450k), # but for purposes of the example, we use the smaller version # and we set \code{demo=TRUE}. For any case outside of this example for # the package, you should set \code{demo=FALSE} (the default). dir <- system.file("data", package="methylCC") files <- file.path(dir, "FlowSorted.Blood.450k.sub.RData") if(file.exists(files)){ load(file = files) set.seed(12345) est <- estimatecc(object = FlowSorted.Blood.450k.sub, demo = TRUE) cell_counts(est) }
# This is a reduced version of the FlowSorted.Blood.450k # dataset available by using BiocManager::install("FlowSorted.Blood.450k), # but for purposes of the example, we use the smaller version # and we set \code{demo=TRUE}. For any case outside of this example for # the package, you should set \code{demo=FALSE} (the default). dir <- system.file("data", package="methylCC") files <- file.path(dir, "FlowSorted.Blood.450k.sub.RData") if(file.exists(files)){ load(file = files) set.seed(12345) est <- estimatecc(object = FlowSorted.Blood.450k.sub, demo = TRUE) cell_counts(est) }
Objects of this class store all the values needed information to work with a estimatecc object
summary
returns the summary information
about the cell composition estimate procedure and
cell_counts
returns the cell composition estimates
summary
information about the samples and regions used to estimate cell composition
cell_counts
cell composition estimates
# This is a reduced version of the FlowSorted.Blood.450k # dataset available by using BiocManager::install("FlowSorted.Blood.450k), # but for purposes of the example, we use the smaller version # and we set \code{demo=TRUE}. For any case outside of this example for # the package, you should set \code{demo=FALSE} (the default). dir <- system.file("data", package="methylCC") files <- file.path(dir, "FlowSorted.Blood.450k.sub.RData") if(file.exists(files)){ load(file = files) set.seed(12345) est <- estimatecc(object = FlowSorted.Blood.450k.sub, demo = TRUE) cell_counts(est) }
# This is a reduced version of the FlowSorted.Blood.450k # dataset available by using BiocManager::install("FlowSorted.Blood.450k), # but for purposes of the example, we use the smaller version # and we set \code{demo=TRUE}. For any case outside of this example for # the package, you should set \code{demo=FALSE} (the default). dir <- system.file("data", package="methylCC") files <- file.path(dir, "FlowSorted.Blood.450k.sub.RData") if(file.exists(files)){ load(file = files) set.seed(12345) est <- estimatecc(object = FlowSorted.Blood.450k.sub, demo = TRUE) cell_counts(est) }
A reduced size of the
FlowSorted.Blood.450k
dataset
The object was created using the script in /inst and located in the /data folder.
A RGset object with 2e5 rows (probes) and 6 columns (whole blood samples).
This is the script used to create the
offMethRegions
data set.
The purpose is use in the estimate_cc()
function.
The object was created using the script in /inst and located in the /data folder.
add more here.
This is the script used to create the
onMethRegions
data set.
The purpose is use in the estimate_cc()
function.
The object was created using the script in /inst and located in the /data folder.
add more here.