Package 'methylCC' reference manual

Title:	Estimate the cell composition of whole blood in DNA methylation samples
Description:	A tool to estimate the cell composition of whole blood samples from DNA methylation data measured on any platform technology (microarray or sequencing).
Authors:	Stephanie C. Hicks [aut, cre] , Rafael Irizarry [aut]
Maintainer:	Stephanie C. Hicks <[email protected]>
License:	GPL-3
Version:	1.21.0
Built:	2025-03-31 04:16:36 UTC
Source:	https://github.com/bioc/methylCC

Extract raw data

Description

Extract the methylation values and GRanges objects

Usage

.extract_raw_data(object)

Arguments

object

an RGChannelSet, GenomicMethylSet or BSseq object

Value

A list of preprocessed objects extracted from the RGChannelSet, GenomicMethylSet or BSseq object, to be used in .preprocess_estimatecc().

Finding differentially methylated regions

Description

This function uses the FlowSorted.Blood.450k whole blood reference methylomes with six cell types to identify differentially methylated regions.

Usage

.find_dmrs(verbose = TRUE, gr_target = NULL, include_cpgs = FALSE,
  include_dmrs = TRUE, num_cpgs = 50, num_regions = 50,
  bumphunter_beta_cutoff = 0.2, dmr_up_cutoff = 0.5,
  dmr_down_cutoff = 0.4, dmr_pval_cutoff = 1e-11,
  cpg_pval_cutoff = 1e-08, cpg_up_dm_cutoff = 0,
  cpg_down_dm_cutoff = 0, pairwise_comparison = FALSE,
  mset_train_flow_sort = NULL)

Arguments

`verbose`	TRUE/FALSE argument specifying if verbose messages should be returned or not. Default is TRUE.
`gr_target`	Default is NULL. However, the user can provide a GRanges object from the `object` in `estimatecc`. Before starting the procedure to find differentially methylated regions, the intersection of `gr_target` and the GRanges object from the reference methylomes (`FlowSorted.Blood.450k`) is taken.
`include_cpgs`	TRUE/FALSE. Should individual CpGs be returned. Default is FALSE.
`include_dmrs`	TRUE/FALSE. Should differentially methylated regions be returned. Default is TRUE. User can turn this to FALSE and search for only CpGs.
`num_cpgs`	The max number of CpGs to return for each cell type. Default is 50.
`num_regions`	The max number of DMRs to return for each cell type. Default is 50.
`bumphunter_beta_cutoff`	The `cutoff` threshold in `bumphunter()` in the `bumphunter` package.
`dmr_up_cutoff`	A cutoff threshold for identifying DMRs that are methylated in one cell type, but not in the other cell types.
`dmr_down_cutoff`	A cutoff threshold for identifying DMRs that are not methylated in one cell type, but methylated in the other cell types.
`dmr_pval_cutoff`	A cutoff threshold for the p-values when identifying DMRs that are methylated in one cell type, but not in the other cell types (or vice versa).
`cpg_pval_cutoff`	A cutoff threshold for the p-values when identifying differentially methylated CpGs that are methylated in one cell type, but not in the other cell types (or vice versa).
`cpg_up_dm_cutoff`	A cutoff threshold for identifying differentially methylated CpGs that are methylated in one cell type, but not in the other cell types.
`cpg_down_dm_cutoff`	A cutoff threshold for identifying differentially methylated CpGs that are not methylated in one cell type, but are methylated in the other cell types.
`pairwise_comparison`	TRUE/FALSE. Should all pairwise comparisons be made (e.g. methylated in Granulocytes and Monocytes, but not methylated in other cell types)? Default is FALSE.
`mset_train_flow_sort`	Default is NULL. However, a user can provide a `MethylSet` object after processing the `FlowSorted.Blood.450k` dataset. The default normalization is `preprocessIllumina()`.

Value

A list of data frames and GRanges objects.
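
The role of the cutoff arguments can be pictured with a small base R sketch. This is not the package's code: the `candidates` table, its column names (`cell_type`, `p_value`, `mean_diff`) and the helper `select_regions()` are hypothetical, and the direction of each comparison is an assumption. It only illustrates applying a p-value cutoff, an effect-size cutoff, and a per-cell-type cap in the spirit of `dmr_pval_cutoff`, `dmr_up_cutoff` and `num_regions`.

```r
# Hypothetical table of candidate regions (not produced by methylCC).
candidates <- data.frame(
  cell_type = c("Gran", "Gran", "Gran", "Mono", "Mono"),
  p_value   = c(1e-15, 1e-12, 1e-05, 1e-13, 1e-20),
  mean_diff = c(0.70, 0.55, 0.80, 0.30, 0.65),
  stringsAsFactors = FALSE
)

# Keep regions passing both cutoffs, then cap the number kept per cell type.
select_regions <- function(df, pval_cutoff = 1e-11, up_cutoff = 0.5,
                           num_regions = 50) {
  keep <- df[df$p_value < pval_cutoff & df$mean_diff > up_cutoff, , drop = FALSE]
  # Order by effect size so the strongest regions come first within a cell type.
  keep <- keep[order(keep$cell_type, -keep$mean_diff), , drop = FALSE]
  capped <- lapply(split(keep, keep$cell_type),
                   function(d) head(d, num_regions))
  out <- do.call(rbind, capped)
  rownames(out) <- NULL
  out
}

selected <- select_regions(candidates, num_regions = 1)
print(selected)
```

With `num_regions = 1`, at most one region per cell type survives, which mirrors how a small cap trades coverage for specificity.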

.initialize_theta

Description

Creates a container with initial theta parameter estimates

Usage

.initialize_theta(n, K, alpha0 = NULL, alpha1 = NULL, sig0 = NULL,
  sig1 = NULL, tau = NULL)

Arguments

`n`	Number of samples
`K`	Number of cell types
`alpha0`	Default NULL. Initial mean methylation level in unmethylated regions
`alpha1`	Default NULL. Initial mean methylation level in methylated regions
`sig0`	Default NULL. Initial variance of the methylation level in unmethylated regions
`sig1`	Default NULL. Initial variance of the methylation level in methylated regions
`tau`	Default NULL. Initial variance of the measurement error

Value

A data frame with initial parameter estimates to be used in .initializeMLEs().
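
As an illustration of the "NULL means not supplied" convention used by these arguments, here is a minimal base R sketch. The function `init_theta_sketch()` and its fallback values are hypothetical and chosen only for the example; the real helper also takes `n` and `K`, whose role is not reproduced here.

```r
# Hypothetical stand-in for a parameter container; not the package's code.
init_theta_sketch <- function(alpha0 = NULL, alpha1 = NULL, sig0 = NULL,
                              sig1 = NULL, tau = NULL) {
  # Use an illustrative starting value whenever the caller supplies NULL.
  pick <- function(value, fallback) if (is.null(value)) fallback else value
  data.frame(
    alpha0 = pick(alpha0, 0.1),   # mean level in unmethylated regions
    alpha1 = pick(alpha1, 0.9),   # mean level in methylated regions
    sig0   = pick(sig0, 0.05),    # variance in unmethylated regions
    sig1   = pick(sig1, 0.05),    # variance in methylated regions
    tau    = pick(tau, 0.01)      # measurement error variance
  )
}

theta <- init_theta_sketch(alpha1 = 0.8)
print(theta)
```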

.initializeMLEs

Description

Helper functions to initialize MLEs in estimatecc().

Usage

.initializeMLEs(init_param_method, n, K, Ys, Zs, a0init, a1init, sig0init,
  sig1init, tauinit)

Arguments

`init_param_method`	method to initialize parameter estimates. Choose between "random" (randomly sample) or "known_regions" (uses unmethylated and methylated regions that were identified based on Reinius et al. (2012) cell sorted data). Defaults to "random".
`n`	Number of samples
`K`	Number of cell types
`Ys`	observed methylation levels in samples provided by user of dimension R x n
`Zs`	Cell type specific regions of dimension R x K
`a0init`	Default NULL. Initial mean methylation level in unmethylated regions
`a1init`	Default NULL. Initial mean methylation level in methylated regions
`sig0init`	Default NULL. Initial variance of the methylation level in unmethylated regions
`sig1init`	Default NULL. Initial variance of the methylation level in methylated regions
`tauinit`	Default NULL. Initial variance of the measurement error

Value

A list of MLE estimates to be used in estimatecc().

.methylcc_engine

Description

Helper function for estimatecc

Usage

.methylcc_engine(Ys, Zs, current_pi_mle, current_theta, epsilon, max_iter)

Arguments

`Ys`	observed methylation levels in samples provided by user of dimension R x n
`Zs`	Cell type specific regions of dimension R x K
`current_pi_mle`	cell composition MLE estimates of dimension K x n
`current_theta`	other parameter estimates in EM algorithm
`epsilon`	Threshold for EM algorithm to check for convergence.
`max_iter`	Maximum number of iterations for EM algorithm.

Value

A list of MLE estimates that is used in estimatecc().
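
How a convergence threshold and an iteration cap typically interact in an EM loop can be sketched on a toy problem. The example below is not the methylCC model: it estimates the mixing weight of a two-component Gaussian mixture with known means and standard deviation, and the function `em_weight()` and its stopping rule (absolute change in the estimate below `epsilon`) are assumptions made for illustration.

```r
set.seed(1)
# Toy data: mixture of N(0, 0.1^2) and N(1, 0.1^2); true weight of the
# second component is 0.3.
z <- rbinom(500, 1, 0.3)
y <- rnorm(500, mean = z, sd = 0.1)

em_weight <- function(y, epsilon = 0.01, max_iter = 100) {
  p <- 0.5            # starting value for the mixing weight
  iter <- 0L
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each point is from component 2.
    d1 <- p * dnorm(y, mean = 1, sd = 0.1)
    d0 <- (1 - p) * dnorm(y, mean = 0, sd = 0.1)
    resp <- d1 / (d0 + d1)
    # M-step: the updated weight is the average responsibility.
    p_new <- mean(resp)
    delta <- abs(p_new - p)
    p <- p_new
    # Stop early once the update is smaller than the threshold.
    if (delta < epsilon) {
      converged <- TRUE
      break
    }
  }
  list(p = p, iterations = iter, converged = converged)
}

fit <- em_weight(y)
print(fit)
```

A smaller `epsilon` demands a tighter fit before stopping, while `max_iter` bounds the run time if the threshold is never reached.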

Expectation step

Description

Expectation step in EM algorithm for methylCC

Usage

.methylcc_estep(Ys, Zs, current_pi_mle, current_theta, meth_status = 0)

Arguments

`Ys`	observed methylation levels in samples provided by user of dimension R x n
`Zs`	Cell type specific regions of dimension R x K
`current_pi_mle`	cell composition MLE estimates of dimension K x n
`current_theta`	other parameter estimates in EM algorithm
`meth_status`	Indicator function corresponding to regions that are unmethylated (meth_status=0) or methylated (meth_status=1)

Value

A list of the expected values of the first two moments of the random effects (the E-step of the EM algorithm), used in .methylcc_engine().

Maximization step

Description

Maximization step in EM Algorithm for methylCC

Usage

.methylcc_mstep(Ys, Zs, current_pi_mle, current_theta, estep0, estep1)

Arguments

`Ys`	observed methylation levels in samples provided by user of dimension R x n
`Zs`	Cell type specific regions of dimension R x K
`current_pi_mle`	cell composition MLE estimates of dimension K x n
`current_theta`	other parameter estimates in EM algorithm
`estep0`	Results from expectation step for unmethylated regions
`estep1`	Results from expectation step for methylated regions

Value

A list of the updated MLEs (or the M-Step in the EM algorithm) used in .methylcc_engine()

Pick target positions

Description

Pick probes from target data using the indices in dmp_regions

Usage

.pick_target_positions(target_granges, target_object = NULL,
  target_cvg = NULL, dmp_regions)

Arguments

`target_granges`	add more here.
`target_object`	an optional argument which contains the meta-data for `target_granges`. If `target_granges` already contains the meta-data, `target_object` does not need to be supplied.
`target_cvg`	coverage reads for the target object
`dmp_regions`	differentially methylated regions

Value

A list of GRanges objects to be used in .preprocess_estimatecc()

.preprocess_estimatecc

Description

This function preprocesses the data before the estimatecc() function

Usage

.preprocess_estimatecc(object, verbose = TRUE,
  init_param_method = "random",
  celltype_specific_dmrs = celltype_specific_dmrs)

Arguments

`object`	an `RGChannelSet`, `GenomicMethylSet` or `BSseq` object
`verbose`	TRUE/FALSE argument specifying if verbose messages should be returned or not. Default is TRUE.
`init_param_method`	method to initialize parameter estimates. Choose between "random" (randomly sample) or "known_regions" (uses unmethylated and methylated regions that were identified based on Reinius et al. (2012) cell sorted data). Defaults to "random".
`celltype_specific_dmrs`	cell type specific differentially methylated regions (DMRs).

Value

A list of objects to be used in estimatecc().

.splitit

Description

helper function to split along a variable

Usage

.splitit(x)

Arguments

x

a vector

Value

A list to be used in find_dmrs()
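
Splitting "along a variable" is commonly done in R by grouping the positions of a vector by its values. The sketch below shows that idiom; `splitit_sketch()` is a hypothetical name, and whether the package helper is implemented exactly this way is an assumption.

```r
# Group the positions of a vector by the value found at each position.
splitit_sketch <- function(x) split(seq_along(x), x)

cell_type <- c("Gran", "Mono", "Gran", "Bcell", "Mono")
idx <- splitit_sketch(cell_type)
print(idx)
```

Each list element holds the indices of the samples belonging to one group, which is convenient for computing per-group summaries.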

Helper function to take the product of Z and cell composition estimates

Description

Helper function which is the product of Z and pi_mle

Usage

.WFun(Zs, pi_mle)

Arguments

`Zs`	Cell type specific regions of dimension R x K
`pi_mle`	cell composition MLE estimates

Value

A list of output after taking the product of Z and cell composition mle estimates to be used in .methylcc_estep().
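
The dimensions involved in the product can be checked with a toy example. This sketch assumes `pi_mle` is K x n, as documented for `current_pi_mle` elsewhere in this manual, and it returns a plain matrix; the structure of the list returned by the package helper is not reproduced here.

```r
# Toy sizes: R = 4 regions, K = 2 cell types, n = 3 samples.
# Each row of Zs marks which cell type a region is specific to.
Zs <- matrix(c(1, 0,
               1, 0,
               0, 1,
               0, 1), nrow = 4, byrow = TRUE)

# Columns are samples; each column holds that sample's cell type proportions.
pi_mle <- matrix(c(0.7, 0.3,
                   0.5, 0.5,
                   0.2, 0.8), nrow = 2)

# (R x K) %*% (K x n) gives an R x n matrix.
W <- Zs %*% pi_mle
print(W)
```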

Generic function that returns the cell composition estimates

Description

Given an estimatecc object, this function returns the cell composition estimates.

Accessor for the 'cell_counts' slot of an estimatecc object.

Usage

cell_counts(object)

## S4 method for signature 'estimatecc'
cell_counts(object)

Arguments

object

an object of class estimatecc.

Value

Returns the cell composition estimates

Examples

# This is a reduced version of the FlowSorted.Blood.450k
# dataset, which is available via BiocManager::install("FlowSorted.Blood.450k").
# For the purposes of this example we use the smaller version
# and set demo=TRUE. For any use outside of this package example,
# you should set demo=FALSE (the default).

dir <- system.file("data", package="methylCC")
files <- file.path(dir, "FlowSorted.Blood.450k.sub.RData") 
if(file.exists(files)){
    load(file = files)

    set.seed(12345)
    est <- estimatecc(object = FlowSorted.Blood.450k.sub, demo = TRUE) 
    cell_counts(est)
 }   

Estimate cell composition from DNAm data

Description

Estimate cell composition from DNAm data

Usage

estimatecc(object, find_dmrs_object = NULL, verbose = TRUE,
  epsilon = 0.01, max_iter = 100, take_intersection = FALSE,
  include_cpgs = FALSE, include_dmrs = TRUE,
  init_param_method = "random", a0init = NULL, a1init = NULL,
  sig0init = NULL, sig1init = NULL, tauinit = NULL, demo = FALSE)
estimatecc(object, find_dmrs_object = NULL, verbose = TRUE,
  epsilon = 0.01, max_iter = 100, take_intersection = FALSE,
  include_cpgs = FALSE, include_dmrs = TRUE,
  init_param_method = "random", a0init = NULL, a1init = NULL,
  sig0init = NULL, sig1init = NULL, tauinit = NULL, demo = FALSE)

Arguments

`object`	an `RGChannelSet`, `GenomicMethylSet` or `BSseq` object
`find_dmrs_object`	If the user would like to supply different differentially methylated regions, they can use the output from the `find_dmrs` function to supply different regions to `estimatecc`.
`verbose`	TRUE/FALSE argument specifying if verbose messages should be returned or not. Default is TRUE.
`epsilon`	Threshold for EM algorithm to check for convergence. Default is 0.01.
`max_iter`	Maximum number of iterations for EM algorithm. Default is 100 iterations.
`take_intersection`	TRUE/FALSE asking if only the CpGs included in `object` should be used to find DMRs. Default is FALSE.
`include_cpgs`	TRUE/FALSE. Should individual CpGs be returned. Default is FALSE.
`include_dmrs`	TRUE/FALSE. Should differentially methylated regions be returned. Default is TRUE.
`init_param_method`	method to initialize parameter estimates. Choose between "random" (randomly sample) or "known_regions" (uses unmethylated and methylated regions that were identified based on Reinius et al. (2012) cell sorted data). Defaults to "random".
`a0init`	Default NULL. Initial mean methylation level in unmethylated regions
`a1init`	Default NULL. Initial mean methylation level in methylated regions
`sig0init`	Default NULL. Initial variance of the methylation level in unmethylated regions
`sig1init`	Default NULL. Initial variance of the methylation level in methylated regions
`tauinit`	Default NULL. Initial variance of the measurement error
`demo`	TRUE/FALSE. Should the function be used in demo mode to shorten examples in package. Defaults to FALSE.

Value

An object of the class estimatecc that contains information about the cell composition estimation (in the summary slot) and the cell composition estimates themselves (in the cell_counts slot).

Examples

# This is a reduced version of the FlowSorted.Blood.450k
# dataset, which is available via BiocManager::install("FlowSorted.Blood.450k").
# For the purposes of this example we use the smaller version
# and set demo=TRUE. For any use outside of this package example,
# you should set demo=FALSE (the default).

dir <- system.file("data", package="methylCC")
files <- file.path(dir, "FlowSorted.Blood.450k.sub.RData") 
if(file.exists(files)){
    load(file = files)

    set.seed(12345)
    est <- estimatecc(object = FlowSorted.Blood.450k.sub, demo = TRUE) 
    cell_counts(est)
 }   

The estimatecc class

Description

Objects of this class store all the information needed to work with an estimatecc object.

Value

summary returns the summary information about the cell composition estimate procedure and cell_counts returns the cell composition estimates

Slots

summary: information about the samples and regions used to estimate cell composition
cell_counts: cell composition estimates
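
The slot-plus-accessor pattern described here can be sketched with the `methods` package. The class name `estimatecc_sketch`, the generic `cell_counts_sketch()` and the slot types (`list` and `data.frame`) are hypothetical, chosen so the example does not clash with or misrepresent the real class definition.

```r
library(methods)

# Hypothetical class with the two documented slot names; slot types are assumed.
setClass("estimatecc_sketch",
         representation(summary = "list", cell_counts = "data.frame"))

# Accessor generic and method returning the contents of one slot.
setGeneric("cell_counts_sketch",
           function(object) standardGeneric("cell_counts_sketch"))
setMethod("cell_counts_sketch", "estimatecc_sketch",
          function(object) object@cell_counts)

obj <- new("estimatecc_sketch",
           summary = list(n_samples = 2L),
           cell_counts = data.frame(Gran = c(0.6, 0.5), Mono = c(0.4, 0.5)))

counts <- cell_counts_sketch(obj)
print(counts)
```

Using an accessor rather than `@` directly keeps calling code independent of how the slot is stored.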

Examples

# This is a reduced version of the FlowSorted.Blood.450k
# dataset, which is available via BiocManager::install("FlowSorted.Blood.450k").
# For the purposes of this example we use the smaller version
# and set demo=TRUE. For any use outside of this package example,
# you should set demo=FALSE (the default).

dir <- system.file("data", package="methylCC")
files <- file.path(dir, "FlowSorted.Blood.450k.sub.RData") 
if(file.exists(files)){
    load(file = files)

    set.seed(12345)
    est <- estimatecc(object = FlowSorted.Blood.450k.sub, demo = TRUE) 
    cell_counts(est)
 }   

A reduced size of the FlowSorted.Blood.450k dataset

Description

A reduced size of the FlowSorted.Blood.450k dataset

The object was created using the script in /inst and is located in the /data folder.

Format

An RGset object with 2e5 rows (probes) and 6 columns (whole blood samples).

Unmethylated regions for all celltypes

Description

The offMethRegions data set, containing regions that are unmethylated in all cell types. It is intended for use in the estimatecc() function.

The object was created using the script in /inst and is located in the /data folder.

Format

add more here.

Methylated regions for all celltypes

Description

The onMethRegions data set, containing regions that are methylated in all cell types. It is intended for use in the estimatecc() function.

The object was created using the script in /inst and is located in the /data folder.

Format

add more here.
