Package 'scMET' reference manual

Title:	Bayesian modelling of cell-to-cell DNA methylation heterogeneity
Description:	High-throughput single-cell measurements of DNA methylomes can quantify methylation heterogeneity and uncover its role in gene regulation. However, technical limitations and sparse coverage can preclude this task. scMET is a hierarchical Bayesian model which overcomes sparsity, sharing information across cells and genomic features to robustly quantify genuine biological heterogeneity. scMET can identify highly variable features that drive epigenetic heterogeneity, and perform differential methylation and variability analyses. We illustrate how scMET facilitates the characterization of epigenetically distinct cell populations and how it enables the formulation of novel hypotheses on the epigenetic regulation of gene expression.
Authors:	Andreas C. Kapourani [aut, cre] , John Riddell [ctb]
Maintainer:	Andreas C. Kapourani <[email protected]>
License:	GPL-3
Version:	1.9.0
Built:	2025-03-30 06:59:49 UTC
Source:	https://github.com/bioc/scMET

`scMET`: Bayesian modelling of DNA methylation at single-cell resolution.

Description

Package for analysing single-cell DNA methylation datasets. scMET performs feature selection, by identifying highly variable features, and also differential testing, based on mean but also more importantly on variability between two groups of cells.

Value

scMET main package documentation.

Author(s)

C.A.Kapourani [email protected]

References

Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.19.3. https://mc-stan.org

Beta binomial maximum likelihood estimation (BB MLE)

Description

Maximum Likelihood Estimate (MLE) of Beta-Binomial (BB) model. Some details about this model can be found on the following tutorial https://rpubs.com/cakapourani/beta-binomial

Usage

bb_mle(x, w = NULL, n_starts = 10, lower_thresh = 0.001)
bb_mle(x, w = NULL, n_starts = 10, lower_thresh = 0.001)

Arguments

`x`	An n x 2 data.table or matrix, where 1st column keeps total number of trials and 2nd column number of successes, n is the total number of samples.
`w`	Vector with initial values of `alpha` and `beta`, if NULL the method of moments is used to initialize them.
`n_starts`	Total number of restarts when optimisation fails.
`lower_thresh`	Threshold when to stop optimisation.

Value

A list with the following elements:

gamma: The overdispersion parameter. This is the most important parameter, since it tells us if and how much overdispersion we observe in the data that cannot be explained by the Binomial model.
mu: The mean parameter, i.e. success probability of the beta binomial.
alpha: Alpha parameter, when taking the different parametrisation of the BB.
beta: Beta parameter, when taking the different parametrisation of the BB.
is_conv: Logical, whether or not the optimisation converged.
lrt: The likelihood ratio test statistic, for testing whether the Binomial or the Beta-Binomial fit better the data.
chi2_test: The p-value from the Chi-squared test obtained from the LRT statistics.
Z_score: The Z score statistic proposed by Tarone (1979). Seems more stable than LRT, in test whether we have overdispersion in our data.
z_test: The p-value obtain from the Z-score statistic.
bb_ll: Beta binomial log likelihood (used internally to compute the LRT statistic and the BIC)
BIC_bb: The Bayes Information Criterion for beta binomial model
bin_ll: Binomial log likelihood (used internally to compute the LRT statistic and the BIC.)
BIC_bin|: The Bayes Information Criterion for binomial model

Author(s)

C.A.Kapourani [email protected]

Examples

# Extract data from a single Feature
x <- scmet_dt$Y[Feature == "Feature_1", c("total_reads", "met_reads")]
fit_mle <- bb_mle(x)
# Extract data from a single Feature
x <- scmet_dt$Y[Feature == "Feature_1", c("total_reads", "met_reads")]
fit_mle <- bb_mle(x)

Create design matrix

Description

Generic function for crating a radial basis function (RBF) design matrix for input vector X.

Usage

create_design_matrix(L, X, c = 1.2)
create_design_matrix(L, X, c = 1.2)

Arguments

`L`	Total number of basis functions, including the bias term.
`X`	Vector of covariates
`c`	Scaling parameter for variance of RBFs

Value

A design matrix object H.

Author(s)

C.A.Kapourani [email protected]

Examples

# Extract
H <- create_design_matrix(L = 4, X = scmet_dt$X)

# Extract
H <- create_design_matrix(L = 4, X = scmet_dt$X)

Convert from SingleCellExperiment to scmet object

Description

Helper function that converts SCE objects to scmet objects that can be used as input to the scmet function. The structure of the SCE object to store single cell methylation data is the following. We create two sparse assays, met storing methylated CpGs and total storing total number of CpGs. Rows correspond to features and columns to cells, similar to scRNA-seq convention.To distinguish between a feature (in a cell) having zero methylated CpGs vs not having CpG coverage at all (missing value), we check if the corresponding entry in total is zero as well. The rownames and colnames slots should store the feature and cell names, respectively. Covariates X that might explain variability in mean (methylation) should be stored in ⁠metadata(rowData(sce)X⁠.

Usage

sce_to_scmet(sce)
sce_to_scmet(sce)

Arguments

sce

SummarizedExperiment object

Value

A named list containing the matrix Y (methylation data in format required by the scmet function) and the covariates X.

Author(s)

C.A.Kapourani [email protected]

Examples

# Extract
sce <- scmet_to_sce(Y = scmet_dt$Y, X = scmet_dt$X)

df <- sce_to_scmet(sce)

# Extract
sce <- scmet_to_sce(Y = scmet_dt$Y, X = scmet_dt$X)

df <- sce_to_scmet(sce)

Perform inference with scMET

Description

Compute posterior of scMET model. This is the main function which infers model parameters and corrects for the mean-overdispersion relationship. The most important parameters the user should focus are X, L, user_mcmc and iter. Advanced users may want to optimise the model by changing the prior parameters. For small datasets, we recommend using MCMC implementation of scMET since it is more stable.

Usage

scmet(
  Y,
  X = NULL,
  L = 4,
  use_mcmc = FALSE,
  use_eb = TRUE,
  iter = 5000,
  algorithm = "meanfield",
  output_samples = 2000,
  chains = 4,
  m_wmu = rep(0, NCOL(X)),
  s_wmu = 2,
  s_mu = 1.5,
  m_wgamma = rep(0, L),
  s_wgamma = 2,
  a_sgamma = 2,
  b_sgamma = 3,
  rbf_c = 1,
  init_using_eb = TRUE,
  tol_rel_obj = 1e-04,
  n_cores = 2,
  lambda = 4,
  seed = sample.int(.Machine$integer.max, 1),
  ...
)
scmet(
  Y,
  X = NULL,
  L = 4,
  use_mcmc = FALSE,
  use_eb = TRUE,
  iter = 5000,
  algorithm = "meanfield",
  output_samples = 2000,
  chains = 4,
  m_wmu = rep(0, NCOL(X)),
  s_wmu = 2,
  s_mu = 1.5,
  m_wgamma = rep(0, L),
  s_wgamma = 2,
  a_sgamma = 2,
  b_sgamma = 3,
  rbf_c = 1,
  init_using_eb = TRUE,
  tol_rel_obj = 1e-04,
  n_cores = 2,
  lambda = 4,
  seed = sample.int(.Machine$integer.max, 1),
  ...
)

Arguments

`Y`	Observed data (methylated reads and total reads) for each feature and cell, in a long format `data.table`. That is it should have 4 named columns: (Feature, Cell, total_reads, met_reads).
`X`	Covariates which might explain variability in mean (methylation). If X = NULL, then we do not perform any correction on the mean estimates. NOTE that if X is provided, `rownames` of X should be the unique feature names in Y. If the dimensions or all feature names do not match, an error will be thrown.
`L`	Total number of basis function to fit the mean-overdispersion trend. For L = 1, this reduces to a model that does not correct for the mean-overdispersion relationship.
`use_mcmc`	Logical, whether to use the MCMC implementation for posterior inference. If FALSE, we run the VB implementation (default). For small datasets, we recommend using MCMC implementation since it is more stable.
`use_eb`	Logical, whether to use 'Empirical Bayes' for parameter initialization. If `TRUE` (default), it will intialise the `m_wmu` and `m_wgamma` parameters below.
`iter`	Total number of iterations, either MCMC or VB algorithm. NOTE: The STAN implementation of VB relies on black-box variational inference and potentially with relatively small sample sizes sometimes tends to 'search' around the local/global minima. We've seen that with larger sample sizes (thousands of cells), it tends to converge much faster, e.g. around 2-3k iterations.
`algorithm`	Stan algorithm to be used by Stan. If MCMC: Possible values are: "NUTS", "HMC". If VB: Possible values are: "meanfield" and "fullrank".
`output_samples`	If VB algorithm, the number of posterior samples to draw and save.
`chains`	Total number of chains.
`m_wmu`	Prior mean of regression coefficients for covariates X.
`s_wmu`	Prior standard deviation of regression coefficients for covariates X.
`s_mu`	Prior standard deviation for mean parameter `mu`.
`m_wgamma`	Prior mean of regression coefficients of the basis functions.
`s_wgamma`	Prior standard deviation of regression coefficients of the basis functions.
`a_sgamma`	Gamma prior (shape) for standard deviation for dispersion parameter `gamma`.
`b_sgamma`	Gamma prior (rate) for standard deviation for dispersion parameter `gamma`.
`rbf_c`	Scale parameter for empirically computing the variance of the RBFs.
`init_using_eb`	Logical, initial values of parameters for STAN posterior inference. Preferably this should be set always to TRUE, to lower the chances of VB/MCMC initialisations being far away from posterior mass.
`tol_rel_obj`	If VB algorithm, the convergence tolerance on the relative norm of the objective.
`n_cores`	Total number of cores.
`lambda`	The penalty term to fit the RBF coefficients for the mean-overdispersion trend when initialising hyper-parameter with EB.
`seed`	The seed for random number generation.
`...`	Additional parameters passed to `Stan` fitting functions.

Value

An object of class scmet_mcmc or scmet_vb with the following elements:

posterior: A list of matrices containing the samples from the posterior. Each matrix corresponds to a different parameter returned from scMET.
Y: The observed data Y.
feature_names: A vector of feature names.
theta_priors: A list with all prior parameter values, for reproducibility purposes.
opts: A list of all additional parameters when running scMET. For reproducibility purposes.

Author(s)

C.A.Kapourani [email protected]

Examples

# Fit scMET (in practice 'iter' should be much larger)
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 300)

# Fit scMET (in practice 'iter' should be much larger)
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 300)

Synthetic methylation data from two groups of cells

Description

Small synthetic data for quick analysis, mostly useful for showing the differential analysis one can perform using scMET.

Usage

scmet_diff_dt
scmet_diff_dt

Format

An object of class scmet_simulate_diff of length 9.

Value

A list object with simulated data.

Differential testing using scMET

Description

Function for performing differential methylation testing to identify differentially methylted (DM) and differentially variable (DV) features across two groups of pre-specified cell populations.

Usage

scmet_differential(
  obj_A,
  obj_B,
  psi_m = log(1.5),
  psi_e = log(1.5),
  psi_g = log(1.5),
  evidence_thresh_m = 0.8,
  evidence_thresh_e = 0.8,
  evidence_thresh_g = 0.8,
  efdr_m = 0.05,
  efdr_e = 0.05,
  efdr_g = 0.05,
  group_label_A = "GroupA",
  group_label_B = "GroupB",
  features_selected = NULL,
  filter_outlier_features = FALSE,
  outlier_m = 0.05,
  outlier_g = 0.05
)
scmet_differential(
  obj_A,
  obj_B,
  psi_m = log(1.5),
  psi_e = log(1.5),
  psi_g = log(1.5),
  evidence_thresh_m = 0.8,
  evidence_thresh_e = 0.8,
  evidence_thresh_g = 0.8,
  efdr_m = 0.05,
  efdr_e = 0.05,
  efdr_g = 0.05,
  group_label_A = "GroupA",
  group_label_B = "GroupB",
  features_selected = NULL,
  filter_outlier_features = FALSE,
  outlier_m = 0.05,
  outlier_g = 0.05
)

Arguments

`obj_A`	The scMET posterior object for group A.
`obj_B`	The scMET posterior object for group B.
`psi_m`	Minimum log odds ratio tolerance threshold for detecting changes in overall methylation (positive real number). Default value: `psi_m = log(1.5)` (i.e. 50% increase).
`psi_e`	Minimum log odds ratio tolerance threshold for detecting changes in residual over-dispersion (positive real number).
`psi_g`	Minimum log odds ratio tolerance threshold for detecting changes in biological over-dispersion (positive real number).
`evidence_thresh_m`	Optional parameter. Posterior evidence probability threshold parameter `⁠alpha_{M}⁠` for detecting changes in overall methylation (between 0.6 and 1). If `efdr_m = NULL`, then threshold will be set to `evidence_thresh_m`. If a value for `EFDR_M` is provided, the posterior probability threshold is chosen to achieve an EFDR equal to `efdr_m` and `evidence_thresh_m` defines a minimum probability threshold for this calibration (this avoids low values of `evidence_thresh_m` to be chosen by the EFDR calibration. Default value `evidence_thresh_m = 0.8`.
`evidence_thresh_e`	Optional parameter. Posterior evidence probability threshold parameter `⁠alpha_{G}⁠` for detecting changes in cell-to-cell residual over-dispersion. Same usage as above.
`evidence_thresh_g`	Optional parameter. Posterior evidence probability threshold parameter `⁠alpha_{G}⁠` for detecting changes in cell-to-cell biological over-dispersion. Same usage as above.
`efdr_m`	Target for expected false discovery rate related to the comparison of means. If `efdr_m = NULL`, no calibration is performed, and `⁠alpha_{M}⁠` is set to `evidence_thresh_m`. Default value: `efdr_m = 0.05`.
`efdr_e`	Target for expected false discovery rate related to the comparison of residual over-dispersions If `efdr_e = NULL`, no calibration is performed, and 'alpha_E“ is set to `evidence_thresh_e`. Default value: `efdr_e = 0.05`.
`efdr_g`	Target for expected false discovery rate related to the comparison of biological over-dispersions If `efdr_g = NULL`, no calibration is performed, and `⁠alpha_{G}⁠` is set to `evidence_thresh_g`. Default value: `efdr_g = 0.05`.
`group_label_A`	Label assigned to group A.
`group_label_B`	Label assigned to group B.
`features_selected`	User defined list of selected features to perform differential analysis. Should be the same length as the total number of features, with TRUE for features included in the differential analysis, and FALSE for those excluded from further analysis.
`filter_outlier_features`	Logical, whether to filter features that have either mean methylation levels `mu` or overdispersion `gamma` across both groups near the range edges, i.e. taking values near 0 or 1. This mostly is an issue due to taking the logit transformation which effectively makes small changes in actual space (0, 1) to look really large in transformed space (-Inf, Inf). In general we expect this will not remove many interesting features with biological information.
`outlier_m`	Value of average mean methylation across both groups so a feature is considered as outlier. I.e. if set to 0.05, then will remove features with `mu` < 0.05 or `mu` > 1 - 0.05. Only used if `filter_outlier_features = TRUE`.
`outlier_g`	Value of average overdispersion `gamma` across groups so a feature is considered as outlier. Same as `outlier_m` parameter above.

Value

An scmet_differential object which is a list containing the following elements:

diff_mu_summary: A data.frame containing differential mean methylation output information per feature (rows), including posterior median parameters for each group and mu_LOR containing the log odds-ratio between the groups. The mu_tail_prob column contains the posterior tail probability of a feature being called as DM. The mu_diff_test column informs the outcomes of the test.
diff_epsilon_summary: Same as above, but for differential variability based on residual overdispersion.
diff_gamma_summary: The same as above but for DV analysis based on overdispersion.
diff_mu_thresh: Information about optimal posterior evidence threshold search for mean methylation mu.
diff_epsilon_thresh: Same as above but for residual overdispersion epsilon..
diff_gamma_thresh: Same as above but for overdispersion gamma.
opts: The parameters used for testing. For reproducibility purposes.

Author(s)

C.A.Kapourani [email protected]

Examples

## Not run: 
# Fit scMET for each group
fit_A <- scmet(Y = scmet_diff_dt$scmet_dt_A$Y,
X = scmet_diff_dt$scmet_dt_A$X, L = 4, iter = 50, seed = 12)
fit_B <- scmet(Y = scmet_diff_dt$scmet_dt_B$Y,
X = scmet_diff_dt$scmet_dt_B$X, L = 4, iter = 50, seed = 12)

# Run differential test
diff_obj <- scmet_differential(obj_A = fit_A, obj_B = fit_B)

## End(Not run)

## Not run: 
# Fit scMET for each group
fit_A <- scmet(Y = scmet_diff_dt$scmet_dt_A$Y,
X = scmet_diff_dt$scmet_dt_A$X, L = 4, iter = 50, seed = 12)
fit_B <- scmet(Y = scmet_diff_dt$scmet_dt_B$Y,
X = scmet_diff_dt$scmet_dt_B$X, L = 4, iter = 50, seed = 12)

# Run differential test
diff_obj <- scmet_differential(obj_A = fit_A, obj_B = fit_B)

## End(Not run)

Synthetic methylation data from a single population

Description

Small synthetic data for quick analysis, mostly useful for performing feature selection and capturing mean-variance relationship with scMET.

Usage

scmet_dt
scmet_dt

Format

An object of class scmet_simulate of length 5.

Value

A list object with simulated data.

Detect highly (or lowly) variable features with scMET

Description

Function for calling features as highly (or lowly) variable within a datasert or cell population. This can be thought as a feature selection step, where the highly variable features (HVF) can be used for diverse downstream tasks, such as clustering or visualisation. Two approaches for identifying HVFs (or LVFs): (1) If we correct for mean-dispersion relationship, then we work directly on residual dispersions epsilon, and define a percentile threshold delta_e. This is the preferred option since the residual overdispersion is not confounded by mean methylation levels. (2) Work directly with the overdispersion parameter gamma and define an overdispersion contribution threshold delta_g, above (below) of which we call HVFs (LVFs).

Usage

scmet_hvf(
  scmet_obj,
  delta_e = 0.9,
  delta_g = NULL,
  evidence_thresh = 0.8,
  efdr = 0.1
)

scmet_lvf(
  scmet_obj,
  delta_e = 0.1,
  delta_g = NULL,
  evidence_thresh = 0.8,
  efdr = 0.1
)
scmet_hvf(
  scmet_obj,
  delta_e = 0.9,
  delta_g = NULL,
  evidence_thresh = 0.8,
  efdr = 0.1
)

scmet_lvf(
  scmet_obj,
  delta_e = 0.1,
  delta_g = NULL,
  evidence_thresh = 0.8,
  efdr = 0.1
)

Arguments

`scmet_obj`	The scMET posterior object after performing inference, i.e. after calling `scmet` function.
`delta_e`	Percentile threshold for residual overdispersion to detect variable features (between 0 and 1). Default: 0.9 for HVF and 0.1 for LVF (top 10%). NOTE: This parameter should be used when correcting for mean-dispersion relationship.
`delta_g`	Overdispersion contribution threshold (between 0 and 1).
`evidence_thresh`	Optional parameter. Posterior evidence probability threshold parameter `⁠alpha_{H}⁠` (between 0.6 and 1).
`efdr`	Target for expected false discovery rate related to HVF/LVF detection (default = 0.1).

Value

The scMET posterior object with an additional element named hvf or lvf according to the analysis performed. This is a list object containing the following elements:

summary: A data.frame containing HVF or LVF analysis output information per feature, including posterior medians for mu, gamma, and epsilon. The tail_prob column contains the posterior tail probability of a feature being called as HVF or LVF. The logical is_variable column informs whether the feature is called as variable or not.
evidence_thresh: The optimal evidence threshold.
efdr: The EFDR value.
efnr: The EFNR value.
efdr_grid: The EFDR values for the grid search.
efnr_grid: The EFNR values for the grid search.
evidence_thresh_grid: The grid where we searched for optimal evidence threshold.

Author(s)

C.A.Kapourani [email protected]

Examples

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)

# Run HVF analysis
obj <- scmet_hvf(scmet_obj = obj)

# Run LVF analysis
obj <- scmet_lvf(scmet_obj = obj)

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)

# Run HVF analysis
obj <- scmet_hvf(scmet_obj = obj)

# Run LVF analysis
obj <- scmet_lvf(scmet_obj = obj)

Plot EFDR/EFNR grid

Description

Function for plotting the grid search performed to obtain the optimal posterior evidence threshold to achieve a specific EFDR.

Usage

scmet_plot_efdr_efnr_grid(obj, task = "hvf")
scmet_plot_efdr_efnr_grid(obj, task = "hvf")

Arguments

`obj`	Either the scMET object after calling the `scmet_hvf_lvf` functions or the object from calling the `scmet_differential` function.
`task`	String. When calling variable features, i.e. output of `scmet_hvf_lvf`, it can be either "hvf" or "lvf". For differential analysis, i.e. output of `scmet_differential`, it can be either: (1) "diff_mu" for diff mean methylation, (2) "diff_epsilon" for residual overdispersion, or (3) "diff_gamma" for overdispersion analysis.

Value

A ggplot2 object.

Author(s)

C.A.Kapourani [email protected]

Examples

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)
obj <- scmet_hvf(scmet_obj = obj, delta_e = 0.7)
scmet_plot_vf_tail_prob(obj = obj, task = "hvf")

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)
obj <- scmet_hvf(scmet_obj = obj, delta_e = 0.7)
scmet_plot_vf_tail_prob(obj = obj, task = "hvf")

Plot true versus inferred parameter estimated.

Description

Function for plotting true on x-axis and inferred parameter estimates on y-axis (either mean methylation or overdispersion). Along with posterior medians, the 80 high posterior density is shown as error bars. Wehn MLE estimates are provided, a plot showing the shrinkage introduced by scMET is shown as arrows.

Usage

scmet_plot_estimated_vs_true(
  obj,
  sim_dt,
  param = "mu",
  mle_fit = NULL,
  diff_feat_idx = NULL,
  hpd_thresh = 0.8,
  title = NULL,
  nfeatures = NULL
)
scmet_plot_estimated_vs_true(
  obj,
  sim_dt,
  param = "mu",
  mle_fit = NULL,
  diff_feat_idx = NULL,
  hpd_thresh = 0.8,
  title = NULL,
  nfeatures = NULL
)

Arguments

`obj`	The scMET object after calling the `scmet` function.
`sim_dt`	The simulated data object. E.g. after calling the `scmet_simulate` function.
`param`	The parameter to plot posterior estimates, either "mu" or "gamma".
`mle_fit`	A three column matrix of beta-binomial maximum likelihood estimates. First column feature name, second column mean methylation and third column overdispersion estimates. Number of features should match the ones used by scMET.
`diff_feat_idx`	Vector with locations of features that were simulated to be differentially variable or methylated. This is stored in the object after calling the `scmet_simulate_diff` function.
`hpd_thresh`	The high posterior density threshold, as computed by the `HPDinterval` function.
`title`	Optional title, default NULL.
`nfeatures`	Optional parameter, denoting a subset of number of features to plot. Mostly to reduce over-plotting.

Value

A ggplot2 object.

Author(s)

C.A.Kapourani [email protected]

Examples

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)
scmet_plot_estimated_vs_true(obj = obj, sim_dt = scmet_dt, param = "mu")

# BB MLE fit to compare with scMET
mle_fit <- scmet_dt$Y[, bb_mle(cbind(total_reads, met_reads))
[c("mu", "gamma")], by = c("Feature")]
scmet_plot_estimated_vs_true(obj = obj, sim_dt = scmet_dt, param = "mu",
mle_fit = mle_fit)

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)
scmet_plot_estimated_vs_true(obj = obj, sim_dt = scmet_dt, param = "mu")

# BB MLE fit to compare with scMET
mle_fit <- scmet_dt$Y[, bb_mle(cbind(total_reads, met_reads))
[c("mu", "gamma")], by = c("Feature")]
scmet_plot_estimated_vs_true(obj = obj, sim_dt = scmet_dt, param = "mu",
mle_fit = mle_fit)

MA plot for differential analysis

Description

Function showing MA plots for differential analysis. The y-axis shows difference between measurements across two groups and the x-axis shows the average measurements across the two groups.

Usage

scmet_plot_ma(
  diff_obj,
  task = "diff_epsilon",
  x = "mu",
  xlab = NULL,
  ylab = NULL,
  title = NULL,
  nfeatures = NULL
)
scmet_plot_ma(
  diff_obj,
  task = "diff_epsilon",
  x = "mu",
  xlab = NULL,
  ylab = NULL,
  title = NULL,
  nfeatures = NULL
)

Arguments

`diff_obj`	The differential scMET object after calling the `scmet_differential` function.
`task`	The differential test to plot. For differential mean methylation: `diff_mu` that plots the LOR(mu_A, mu_B) on y-axis. For differential variability: either (1) `diff_epsilon` that plots the change (epsilon_A - epsilon_B), or (2) `diff_gamma` that plots the LOR(gamma_A, gamma_B) on y-axis.
`x`	The average parameter across the two populations to plot on the x-axis. Can be either `mu`, `epsilon` or `gamma`. When `task = epsilon`, x can be either `mu` or `epsilon`. When `task = gamma`, x can be either `mu` or `gamma`. When `task = mu`, x can be only `mu`.
`xlab`	Optional x-axis label.
`ylab`	Optional y-axis label.
`title`	Optional title, default NULL.
`nfeatures`	Optional parameter, denoting a subset of number of features to plot (only for non-differential features). Mostly to reduce over-plotting.

Value

A ggplot2 object.

Author(s)

C.A.Kapourani [email protected]

Examples

## Not run: 
# Fit scMET for each group
fit_A <- scmet(Y = scmet_diff_dt$scmet_dt_A$Y,
X = scmet_diff_dt$scmet_dt_A$X, L = 4, iter = 100, seed = 12)
fit_B <- scmet(Y = scmet_diff_dt$scmet_dt_B$Y,
X = scmet_diff_dt$scmet_dt_B$X, L = 4, iter = 100, seed = 12)

# Run differential test
diff_obj <- scmet_differential(obj_A = fit_A, obj_B = fit_B)
# Create volcano plot
scmet_plot_ma(diff_obj, task = "diff_epsilon")

## End(Not run)

## Not run: 
# Fit scMET for each group
fit_A <- scmet(Y = scmet_diff_dt$scmet_dt_A$Y,
X = scmet_diff_dt$scmet_dt_A$X, L = 4, iter = 100, seed = 12)
fit_B <- scmet(Y = scmet_diff_dt$scmet_dt_B$Y,
X = scmet_diff_dt$scmet_dt_B$X, L = 4, iter = 100, seed = 12)

# Run differential test
diff_obj <- scmet_differential(obj_A = fit_A, obj_B = fit_B)
# Create volcano plot
scmet_plot_ma(diff_obj, task = "diff_epsilon")

## End(Not run)

Plotting mean-variability relationship

Description

Function for plotting mean methylation on x-axis and variability on y-axis (either overdispersion or residual overdispersion). If HVF/LVF analysis is performed, points will be also coloured accordingly.

Usage

scmet_plot_mean_var(
  obj,
  y = "gamma",
  task = NULL,
  show_fit = TRUE,
  title = NULL,
  nfeatures = NULL,
  n = 80
)
scmet_plot_mean_var(
  obj,
  y = "gamma",
  task = NULL,
  show_fit = TRUE,
  title = NULL,
  nfeatures = NULL,
  n = 80
)

Arguments

`obj`	The scMET object after calling the `scmet_hvf_lvf` function.
`y`	The parameter to plot on the y-axis. Values can be `gamma` (default) or `epsilon`.
`task`	If NULL (default) the mean-variability relationship is plotted. If set to "hvf" or "lvf", points are coloured according the HVF/LVF analysis task.
`show_fit`	Logical, whether to show the fitted mean-overdispersion trend. Applicable only when `y = gamma` and `task = NULL`.
`title`	Optional title, default NULL.
`nfeatures`	Optional parameter, denoting a subset of number of features to plot. Mostly to reduce over-plotting. When `⁠task = hvf or lvf⁠`, the subsampling is performed on the features that are not called as HVF or LVF (i.e. not interesting features).
`n`	Optional integer denoting the number of grid points to colour them by density. Used by `kde2d` function. Used only when `task = NULL`.

Value

A ggplot2 object.

Author(s)

C.A.Kapourani [email protected]

Examples

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)
scmet_plot_mean_var(obj = obj, y = "gamma")

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)
scmet_plot_mean_var(obj = obj, y = "gamma")

Plot tail probabilities for variable feature analysis

Description

Function for plotting the tail probabilities associated with the HVF/LVF analysis. The tail probabilities are plotted on the y-axis, and the user can choose which parameter can be plotted on the x-axis, using the x parameter.

Usage

scmet_plot_vf_tail_prob(
  obj,
  x = "mu",
  task = "hvf",
  title = NULL,
  nfeatures = NULL
)
scmet_plot_vf_tail_prob(
  obj,
  x = "mu",
  task = "hvf",
  title = NULL,
  nfeatures = NULL
)

Arguments

`obj`	The scMET object after calling the `scmet_hvf_lvf` function.
`x`	The parameter to plot on the x-axis. Values can be `mu` (default), `epsilon` or `gamma`.
`task`	The task for identifying variable, either "hvf" or "lvf".
`title`	Optional title, default NULL.
`nfeatures`	Optional parameter, denoting a subset of number of features to plot (only for non HVF/LVF features). Mostly to reduce over-plotting.

Value

A ggplot2 object.

Author(s)

C.A.Kapourani [email protected]

Examples

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)
obj <- scmet_hvf(scmet_obj = obj, delta_e = 0.7)
scmet_plot_vf_tail_prob(obj = obj, x = "mu")

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)
obj <- scmet_hvf(scmet_obj = obj, delta_e = 0.7)
scmet_plot_vf_tail_prob(obj = obj, x = "mu")

Volcano plot for differential analysis

Description

Function showing volcano plots for differential analysis. The posterior tail probabilities are ploteted on the y-axis, and depending on the differential test to plot the effect size will be plotted on the x-axis. For differential variability (DV) analysis we recommend using the epsilon parameter.

Usage

scmet_plot_volcano(
  diff_obj,
  task = "diff_epsilon",
  xlab = NULL,
  ylab = "Posterior tail probability",
  title = NULL,
  nfeatures = NULL
)
scmet_plot_volcano(
  diff_obj,
  task = "diff_epsilon",
  xlab = NULL,
  ylab = "Posterior tail probability",
  title = NULL,
  nfeatures = NULL
)

Arguments

`diff_obj`	The differential scMET object after calling the `scmet_differential` function.
`task`	The differential test to plot. For differential mean methylation: `diff_mu` that plots the LOR(mu_A, mu_B) on x-axis. For differential variability: either (1) `diff_epsilon` that plots the change (epsilon_A - epsilon_B), or (2) `diff_gamma` that plots the LOR(gamma_A, gamma_B) on x-axis.
`xlab`	Optional x-axis label.
`ylab`	Optional y-axis label.
`title`	Optional title, default NULL.
`nfeatures`	Optional parameter, denoting a subset of number of features to plot (only for non-differential features). Mostly to reduce over-plotting.

Value

A ggplot2 object.

Author(s)

C.A.Kapourani [email protected]

Examples

## Not run: 
# Fit scMET for each group
fit_A <- scmet(Y = scmet_diff_dt$scmet_dt_A$Y,
X = scmet_diff_dt$scmet_dt_A$X, L = 4, iter = 100, seed = 12)
fit_B <- scmet(Y = scmet_diff_dt$scmet_dt_B$Y,
X = scmet_diff_dt$scmet_dt_B$X, L = 4, iter = 100, seed = 12)

# Run differential test
diff_obj <- scmet_differential(obj_A = fit_A, obj_B = fit_B)
# Create volcano plot
scmet_plot_volcano(diff_obj, task = "diff_epsilon")

## End(Not run)

## Not run: 
# Fit scMET for each group
fit_A <- scmet(Y = scmet_diff_dt$scmet_dt_A$Y,
X = scmet_diff_dt$scmet_dt_A$X, L = 4, iter = 100, seed = 12)
fit_B <- scmet(Y = scmet_diff_dt$scmet_dt_B$Y,
X = scmet_diff_dt$scmet_dt_B$X, L = 4, iter = 100, seed = 12)

# Run differential test
diff_obj <- scmet_differential(obj_A = fit_A, obj_B = fit_B)
# Create volcano plot
scmet_plot_volcano(diff_obj, task = "diff_epsilon")

## End(Not run)

Simulate methylation data from scMET.

Description

General function for simulating datasets with diverse proprties. This for instance include, adding covariates X that explain differences in mean methylation levels. Or also defining the trend for the mean - overdispersion relationship.

Usage

scmet_simulate(
  N_feat = 100,
  N_cells = 50,
  N_cpgs = 15,
  L = 4,
  X = NULL,
  w_mu = c(-0.5, -1.5),
  s_mu = 1,
  w_gamma = NULL,
  s_gamma = 0.3,
  rbf_c = 1,
  cells_range = c(0.4, 0.8),
  cpgs_range = c(0.4, 0.8)
)
scmet_simulate(
  N_feat = 100,
  N_cells = 50,
  N_cpgs = 15,
  L = 4,
  X = NULL,
  w_mu = c(-0.5, -1.5),
  s_mu = 1,
  w_gamma = NULL,
  s_gamma = 0.3,
  rbf_c = 1,
  cells_range = c(0.4, 0.8),
  cpgs_range = c(0.4, 0.8)
)

Arguments

`N_feat`	Total number of features (genomics regions).
`N_cells`	Maximum number of cells.
`N_cpgs`	Maximum number of CpGs per cell and feature.
`L`	Total number of radial basis functions (RBFs) to fit the mean-overdispersion trend. For L = 1, this reduces to a model that does not correct for the mean-overdispersion relationship.
`X`	Covariates which might explain variability in mean (methylation). If X = NULL, a 2-dim matrix will be generated, first column containing intercept term (all values = 1), and second colunn random generated covariates.
`w_mu`	Regression coefficients for covariates X. Should match number of columns of X.
`s_mu`	Standard deviation for mean parameter `mu`.
`w_gamma`	Regression coefficients of the basis functions. Should match the value of L. If NULL, random coefficients will be generated.
`s_gamma`	Standard deviation of dispersion parameter `gamma`.
`rbf_c`	Scale parameter for empirically computing the variance of the RBFs.
`cells_range`	Range (betwen 0 and 1) to randomly (sub)sample the number of cells per feature.
`cpgs_range`	Range (betwen 0 and 1) to randomly (sub)sample the number of CpGs per cell and feature.

Value

A simulated dataset and additional information for reproducibility purposes.

Examples

sim <- scmet_simulate(N_feat = 150, N_cells = 50, N_cpgs = 15, L = 4)

sim <- scmet_simulate(N_feat = 150, N_cells = 50, N_cpgs = 15, L = 4)

Simulate differential methylation data from scMET.

Description

General function for simulating two methylation datasets for performing differential methylation analysis. Differential analysis can be either performed in detecting changes in mean or variability of methylation patterns between the two groups. Similar to scmet_simulate, the function allows inclusion of covariates X that explain differences in mean methylation levels. Or also defining the trend for the mean - overdispersion relationship.

Usage

scmet_simulate_diff(
  N_feat = 100,
  N_cells = 50,
  N_cpgs = 15,
  L = 4,
  diff_feat_prcg_mu = 0,
  diff_feat_prcg_gamma = 0.2,
  OR_change_mu = 3,
  OR_change_gamma = 3,
  X = NULL,
  w_mu = c(-0.5, -1.5),
  s_mu = 1,
  w_gamma = NULL,
  s_gamma = 0.3,
  rbf_c = 1,
  cells_range = c(0.4, 0.8),
  cpgs_range = c(0.4, 0.8)
)
scmet_simulate_diff(
  N_feat = 100,
  N_cells = 50,
  N_cpgs = 15,
  L = 4,
  diff_feat_prcg_mu = 0,
  diff_feat_prcg_gamma = 0.2,
  OR_change_mu = 3,
  OR_change_gamma = 3,
  X = NULL,
  w_mu = c(-0.5, -1.5),
  s_mu = 1,
  w_gamma = NULL,
  s_gamma = 0.3,
  rbf_c = 1,
  cells_range = c(0.4, 0.8),
  cpgs_range = c(0.4, 0.8)
)

Arguments

`N_feat`	Total number of features (genomics regions).
`N_cells`	Maximum number of cells.
`N_cpgs`	Maximum number of CpGs per cell and feature.
`L`	Total number of radial basis functions (RBFs) to fit the mean-overdispersion trend. For L = 1, this reduces to a model that does not correct for the mean-overdispersion relationship.
`diff_feat_prcg_mu`	Percentage of features (betwen 0 and 1) that show differential mean methylation between the two groups.
`diff_feat_prcg_gamma`	Percentage of features (betwen 0 and 1) that show differential variability between the two groups.
`OR_change_mu`	Effect size change (in terms of odds ratio) of mean methylation between the two groups.
`OR_change_gamma`	Effect size change (in terms of odds ratio) of methylation variability between the two groups.
`X`	Covariates which might explain variability in mean (methylation). If X = NULL, a 2-dim matrix will be generated, first column containing intercept term (all values = 1), and second colunn random generated covariates.
`w_mu`	Regression coefficients for covariates X. Should match number of columns of X.
`s_mu`	Standard deviation for mean parameter `mu`.
`w_gamma`	Regression coefficients of the basis functions. Should match the value of L. If NULL, random coefficients will be generated.
`s_gamma`	Standard deviation of dispersion parameter `gamma`.
`rbf_c`	Scale parameter for empirically computing the variance of the RBFs.
`cells_range`	Range (betwen 0 and 1) to randomly (sub)sample the number of cells per feature.
`cpgs_range`	Range (betwen 0 and 1) to randomly (sub)sample the number of CpGs per cell and feature.

Value

Methylation data from two cell populations/conditions.

Examples

sim_diff <- scmet_simulate_diff(N_feat = 150, N_cells = 100, N_cpgs = 15, L = 4)

sim_diff <- scmet_simulate_diff(N_feat = 150, N_cells = 100, N_cpgs = 15, L = 4)

Convert from scmet to SingleCellExperiment object.

Description

Helper function that converts an scmet to SCE object. The structure of the SCE object to store single cell methylation data is the following. We create two assays, met storing methylated CpGs and total storing total number of CpGs. Rows correspond to features and columns to cells, similar to scRNA-seq convention. The rownames and colnames slots should store the feature and cell names, respectively. Covariates X that might explain variability in mean (methylation) should be stored in ⁠metadata(rowData(sce)$X⁠.

Usage

scmet_to_sce(Y, X = NULL)
scmet_to_sce(Y, X = NULL)

Arguments

`Y`	Methylation data in data.table format.
`X`	(Optional) Matrix of covariates.

Value

An SCE object with the structure described above.

Author(s)

C.A.Kapourani [email protected]

Examples

# Extract
sce <- scmet_to_sce(Y = scmet_dt$Y, X = scmet_dt$X)

# Extract
sce <- scmet_to_sce(Y = scmet_dt$Y, X = scmet_dt$X)

Package 'scMET'

Help Index

scMET: Bayesian modelling of DNA methylation at single-cell resolution.

Description

Value

Author(s)

References

See Also

Beta binomial maximum likelihood estimation (BB MLE)

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Create design matrix

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Convert from SingleCellExperiment to scmet object

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Perform inference with scMET

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Synthetic methylation data from two groups of cells

Description

Usage

Format

Value

Differential testing using scMET

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Synthetic methylation data from a single population

Description

Usage

Format

Value

Detect highly (or lowly) variable features with scMET

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Plot EFDR/EFNR grid

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Plot true versus inferred parameter estimated.

Description

Usage

Arguments

Value

Author(s)

`scMET`: Bayesian modelling of DNA methylation at single-cell resolution.