Title: | Bayesian hierarchical mixture of factor analyzers for modelling genomic bifurcations |
---|---|
Description: | MFA models genomic bifurcations using a Bayesian hierarchical mixture of factor analysers. |
Authors: | Kieran Campbell [aut, cre] |
Maintainer: | Kieran Campbell <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.29.0 |
Built: | 2024-10-30 08:47:23 UTC |
Source: | https://github.com/bioc/mfa |
Calculates a data frame of the MAP estimates of .
calculate_chi(m)
calculate_chi(m)
m |
A fit returned from |
A data_frame
with one entry for the feature names and one
for the MAP estimates of chi (using the posterior.mode
function
from MCMCglmm
).
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) chi_map <- calculate_chi(m)
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) chi_map <- calculate_chi(m)
Create synthetic bifurcating data for two branches. Optionally incorporate zero inflation and transient gene expression.
create_synthetic(C = 100, G = 40, p_transient = 0, zero_negative = TRUE, model_dropout = FALSE, lambda = 1)
create_synthetic(C = 100, G = 40, p_transient = 0, zero_negative = TRUE, model_dropout = FALSE, lambda = 1)
C |
Number of cells to simulate |
G |
Number of genes to simulate |
p_transient |
Propotion of genes that exhibit transient expression |
zero_negative |
Logical: should expression generated less than zero be set to zero? This will zero-inflate the data |
model_dropout |
Logical: if true, expression will be set to zero with
the exponential dropout formula dependent on the latent expression using
dropout parameter |
lambda |
The dropout parameter |
A list with the following entries:
X
A cell-by-feature expression matrix
branch
A vector of length C
assigning cells to branches
pst
A vector of pseudotimes for each cell
k
The parameters
phi
The parameters
delta
The parameters
p_transient
The proportion of genes simulated as transient
according to the original function call
synth <- create_synthetic()
synth <- create_synthetic()
Estimate the dropout parameter
empirical_lambda(y, lower_limit = 0)
empirical_lambda(y, lower_limit = 0)
y |
A cell-by-gene expression matrix |
lower_limit |
The limit below which expression counts as 'dropout' |
The estimated lambda
synth <- create_synthetic(C = 20, G = 5, zero_negative = TRUE, model_dropout = TRUE) lambda <- empirical_lambda(synth$X)
synth <- create_synthetic(C = 20, G = 5, zero_negative = TRUE, model_dropout = TRUE) lambda <- empirical_lambda(synth$X)
Perform Gibbs sampling inference for a hierarchical Bayesian mixture of factor analysers to identify bifurcations in single-cell expression data.
mfa(y, iter = 2000, thin = 1, burn = iter/2, b = 2, zero_inflation = FALSE, pc_initialise = 1, prop_collapse = 0, scale_input = !zero_inflation, lambda = NULL, eta_tilde = NULL, alpha = 0.1, beta = 0.1, theta_tilde = 0, tau_eta = 1, tau_theta = 1, tau_c = 1, alpha_chi = 0.01, beta_chi = 0.01, w_alpha = 1/b, clamp_pseudotimes = FALSE)
mfa(y, iter = 2000, thin = 1, burn = iter/2, b = 2, zero_inflation = FALSE, pc_initialise = 1, prop_collapse = 0, scale_input = !zero_inflation, lambda = NULL, eta_tilde = NULL, alpha = 0.1, beta = 0.1, theta_tilde = 0, tau_eta = 1, tau_theta = 1, tau_c = 1, alpha_chi = 0.01, beta_chi = 0.01, w_alpha = 1/b, clamp_pseudotimes = FALSE)
y |
A cell-by-gene single-cell expression matrix or an ExpressionSet object |
iter |
Number of MCMC iterations |
thin |
MCMC samples to thin |
burn |
Number of MCMC samples to throw away |
b |
Number of branches to model |
zero_inflation |
Logical, should zero inflation be enabled? |
pc_initialise |
Which principal component to initialise pseudotimes to |
prop_collapse |
Proportion of Gibbs samples which should marginalise over c |
scale_input |
Logical. If true, input is scaled to have mean 0 variance 1 |
lambda |
The dropout parameter - by default estimated using the function |
eta_tilde |
Hyperparameter |
alpha |
Hyperparameter |
beta |
Hyperparameter |
theta_tilde |
Hyperparameter |
tau_eta |
Hyperparameter |
tau_theta |
Hyperparameter |
tau_c |
Hyperparameter |
alpha_chi |
Hyperparameter |
beta_chi |
Hyperparameter |
w_alpha |
Hyperparameter |
clamp_pseudotimes |
This clamps the pseudotimes to their initial values and doesn't perform sampling. Should be FALSE except for diagnostics. |
The column names of Y are used as feature (gene/transcript) names while the row names are used as cell names. If either of these is undefined then the corresponding names are set to cell_x or feature_y.
It is recommended the form of Y is analogous to log-expression to mitigate the impact of outliers.
In the absence of prior information, three valid local maxima in the posterior likelihood exist (see manuscript). Setting the initial values to a principal component typically fixes sampling to one of them, analogous to specifying a root cell in similar methods.
The hyper-parameter eta_tilde
represents the expected expression in the absence of
any actual expression measurements. While a Bayesian purist might reason this based on
knowledge of the measurement technology, simply taking the mean of the input matrix in
an Empirical Bayes style seems reasonable.
The degree of shrinkage of the factor loading matrices to a common value is given by the
gamma prior on chi
. The mean of this is alpha_chi / beta_chi
while the variance
alpha_chi / beta_chi^2
. Therefore, to obtain higher levels of shrinkage increase
alpha_chi
with respect to beta_chi
.
The collapsed Gibbs sampling option given by collapse
involves marginalising out
c
(the factor loading intercepts) when updating the branch assignment parameters
gamma
which tends to soften the branch assignments.
If zero inflation is enabled using the zero_inflation
parameter then scaling should
*not* be enabled.
An S3 structure with the following entries:
traces
A list of iteration-by-dim trace matrices for
several important variables
iter
Number of iterations
thin
Thinning applied
burn
Burn period at the start of MCMC
b
Number of branches modelled
prop_collapse
Proportion of updates for gamma that are collapsed
N
Number of cells
G
Number of features (genes/transcripts)
feature_names
Names of features
cell_names
Names of cells
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X)
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X)
Plot posterior precision parameters
plot_chi(m, nfeatures = m$G)
plot_chi(m, nfeatures = m$G)
m |
A fit returned from |
nfeatures |
Top number of |
A ggplot2
bar-plot showing the map
estimates of
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) plot_chi(m)
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) plot_chi(m)
Plot the dropout relationship
plot_dropout_relationship(y, lambda = empirical_lambda(y))
plot_dropout_relationship(y, lambda = empirical_lambda(y))
y |
The input data matrix |
lambda |
The estimated value of lambda |
A ggplot2
plot showing the estimated dropout relationship
synth <- create_synthetic(C = 20, G = 5, zero_negative = TRUE, model_dropout = TRUE) lambda <- empirical_lambda(synth$X) plot_dropout_relationship(synth$X, lambda)
synth <- create_synthetic(C = 20, G = 5, zero_negative = TRUE, model_dropout = TRUE) lambda <- empirical_lambda(synth$X) plot_dropout_relationship(synth$X, lambda)
Plots the autocorrelation of the posterior log-likelihood.
plot_mfa_autocorr(m)
plot_mfa_autocorr(m)
m |
A fit returned from |
A ggplot2
plot returned by the ggmcmc
package plotting
the autocorrelation of the posterior log-likelihood.
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) plot_mfa_autocorr(m)
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) plot_mfa_autocorr(m)
Plots the trace of the posterior log-likelihood.
plot_mfa_trace(m)
plot_mfa_trace(m)
m |
A fit returned from |
A ggplot2
plot plotting
the trace of the posterior log-likelihood.
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) plot_mfa_trace(m)
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) plot_mfa_trace(m)
Print an mfa fit
## S3 method for class 'mfa' print(x, ...)
## S3 method for class 'mfa' print(x, ...)
x |
An MFA fit returned by |
... |
Additional arguments |
A string representation of an mfa
object.
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) print(m)
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) print(m)
Returns summary statistics of an mfa fit, including MAP pseudotime and branch allocations along with uncertainties.
## S3 method for class 'mfa' summary(object, ...)
## S3 method for class 'mfa' summary(object, ...)
object |
An MFA fit returned by a call to |
... |
Additional arguments |
A data_frame
with the following columns:
pseudotime
The MAP pseudotime estimate
branch
The MAP branch estimate
branch_certainty
The proportion of traces for which the cell
is assigned to its MAP branch
pseudotime_lower
The lower bound on the 95
(HPD) credible interval
pseudotime_upper
The upper bound on the 95
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) ms <- summary(m)
synth <- create_synthetic(C = 20, G = 5) m <- mfa(synth$X) ms <- summary(m)
ggmcmc
objectTurn a trace list to a ggmcmc
object
to_ggmcmc(g)
to_ggmcmc(g)
g |
A list of trace matrices |
The trace list converted into a ggs
object for
input to ggmcmc
.