| Title: | Uncertainty-aware quantitative analysis of high-throughput live cell migration data |
|---|---|
| Description: | High-throughput cell imaging facilitates the analysis of cell migration across many wells treated under different biological conditions. These workflows generate considerable technical noise and biological variability, and therefore technical and biological replicates are necessary, leading to large, hierarchically structured datasets, i.e., cells are nested within technical replicates that are nested within biological replicates. Current statistical analyses of such data usually ignore the hierarchical structure of the data and fail to explicitly quantify uncertainty arising from technical or biological variability. To address this gap, we present cellmig, an R package implementing Bayesian hierarchical models for migration analysis. cellmig quantifies condition- specific velocity changes (e.g., drug effects) while modeling nested data structures and technical artifacts. It further enables synthetic data generation for experimental design optimization. |
| Authors: | Simo Kitanovski [aut, cre] (ORCID: <https://orcid.org/0000-0003-2909-5376>) |
| Maintainer: | Simo Kitanovski <[email protected]> |
| License: | GPL-3 + file LICENSE |
| Version: | 1.3.4 |
| Built: | 2026-05-23 15:08:41 UTC |
| Source: | https://github.com/bioc/cellmig |
The cellmig package implements Bayesian hierarchical models for the
analysis of cell migration speed. It quantifies condition-specific effects
(e.g., drug treatments) on cell migration speed while explicitly modeling
nested data structures and technical artifacts, and provides uncertainty-aware
estimates via posterior credible intervals.
This package contains functions for modeling, clustering, and visualization of cell migration speed data derived from high-throughput migration assays.
Authors and maintainers:
Simo Kitanovski [email protected] (ORCID)
Useful links:
# load package input data data(d, package = "cellmig") o <- cellmig(x = d, control = list(mcmc_warmup = 200, mcmc_steps = 700, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) str(o)# load package input data data(d, package = "cellmig") o <- cellmig(x = d, control = list(mcmc_warmup = 200, mcmc_steps = 700, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) str(o)
The cellmig function estimates cell migration speed using a
Bayesian model implemented in rstan. It takes a data.frame as
input and allows users to configure the Markov Chain Monte Carlo (MCMC)
sampling procedure via the control parameter. The function
returns a list containing:
fit - the fitted model as an rstan object.
data - the input/processed input data
posteriors - a summary of model parameters: means, medians, and
95% credible intervals.
control - the list of input controls
cellmig(x, control = NULL)cellmig(x, control = NULL)
x |
A data.frame containing the following columns. Each row represents the features of a cell:
|
control |
A list configuring MCMC sampling settings and parameter priors with the following default values:
|
A list containing:
fit = The fitted model as an rstan object.
data = The raw and processed input data.
posteriors = A summary of model parameters, including means
and 95% credible intervals.
alpha_p: batch effect on plate p
delta_t: overall treatment effects relative to the selected
control
delta_pt: plate-specific treatment effects relative to the
selected
control
well_mu: mean of cell speed distribution per well
well_kappa: shape of cell speed distribution per well
kappa_mu, kappa_sigma: mean/standard deviation
parameters of the population of well_kappas
sigma_bio: variability between biological replicates
sigma_tech: variability between technical replicates
sigma_delta: variability between delta_t parameters
yhat: posterior predictions
control = The list of input controls
data(d, package = "cellmig") o <- cellmig(x = d, control = list(mcmc_warmup = 200, mcmc_steps = 700, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) str(o)data(d, package = "cellmig") o <- cellmig(x = d, control = list(mcmc_warmup = 200, mcmc_steps = 700, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) str(o)
d
The dataset d contains simulated cell migration speed data from an
imaginary experiment. It includes cell migration speed measurements
(column v) for individual cells measured in different wells,
treated with chemical compounds (compound) at varying doses
(dose) and measured on experimental plates (plate).
The dataset d_mini is a reduced subset of d, containing data
for four compounds (C1–C4) and three doses.
data("d", package = "cellmig") data(d_mini, package = "cellmig")data("d", package = "cellmig") data(d_mini, package = "cellmig")
A data frame with cells (rows) having with the following features.
vNumeric. Cell migration speed (micrometers per minute).
wellCharacter. Well identifier.
compoundCharacter. Compound identifier.
doseCharacter. Treatment dose.
plateCharacter. Plate identifier.
offsetInteger (0 or 1). Indicates control samples used for batch correction across plates (1 = control, 0 = non-control).
Both datasets were generated using a simulation script located at
inst/scripts/sim_s.R.
Simulated data generated using inst/scripts/sim_s.R.
Simulated data generated using inst/scripts/sim_s.R.
data(d, package = "cellmig") data(d_mini, package = "cellmig") str(d) str(d_mini)data(d, package = "cellmig") data(d_mini, package = "cellmig") str(d) str(d_mini)
Simulates cell migration speed data from a hierarchical Bayesian model implemented in Stan, where model parameters are drawn from their prior distributions.
gen_full(control = list(N_biorep = 3, N_techrep = 3, N_cell = 50, N_group = 5, prior_alpha_p_M = -0.5, prior_alpha_p_SD = 1, prior_kappa_mu_M = 1.5, prior_kappa_mu_SD = 1, prior_kappa_sigma_M = 0, prior_kappa_sigma_SD = 1, prior_sigma_bio_M = 0, prior_sigma_bio_SD = 1, prior_sigma_tech_M = 0, prior_sigma_tech_SD = 1, prior_sigma_delta_M = 0, prior_sigma_delta_SD = 1))gen_full(control = list(N_biorep = 3, N_techrep = 3, N_cell = 50, N_group = 5, prior_alpha_p_M = -0.5, prior_alpha_p_SD = 1, prior_kappa_mu_M = 1.5, prior_kappa_mu_SD = 1, prior_kappa_sigma_M = 0, prior_kappa_sigma_SD = 1, prior_sigma_bio_M = 0, prior_sigma_bio_SD = 1, prior_sigma_tech_M = 0, prior_sigma_tech_SD = 1, prior_sigma_delta_M = 0, prior_sigma_delta_SD = 1))
control |
A named list specifying simulation settings and prior distributions. Default values are:
|
The function constructs a hierarchical synthetic dataset by simulating
cell migration speed values from a fully generative Stan model. Metadata
for plates, wells, treatment groups, and replicates are generated, and
cell-level observations are simulated using the sampling function
from the rstan package.
A data frame with one row per cell and the following columns:
vSimulated cell migration speed.
well_idUnique well identifier.
group_idTreatment group identifier.
plate_idBiological replicate (plate) identifier.
trep_idTechnical replicate identifier.
f <- gen_full(control = list(N_biorep = 3, N_techrep = 3, N_cell = 50, N_group = 5, prior_alpha_p_M = -0.5, prior_alpha_p_SD = 1.0, prior_kappa_mu_M = 1.5, prior_kappa_mu_SD = 1.0, prior_kappa_sigma_M = 0, prior_kappa_sigma_SD = 1.0, prior_sigma_bio_M = 0.0, prior_sigma_bio_SD = 1.0, prior_sigma_tech_M = 0.0, prior_sigma_tech_SD = 1.0, prior_sigma_delta_M = 0.0, prior_sigma_delta_SD = 1.0)) str(f)f <- gen_full(control = list(N_biorep = 3, N_techrep = 3, N_cell = 50, N_group = 5, prior_alpha_p_M = -0.5, prior_alpha_p_SD = 1.0, prior_kappa_mu_M = 1.5, prior_kappa_mu_SD = 1.0, prior_kappa_sigma_M = 0, prior_kappa_sigma_SD = 1.0, prior_sigma_bio_M = 0.0, prior_sigma_bio_SD = 1.0, prior_sigma_tech_M = 0.0, prior_sigma_tech_SD = 1.0, prior_sigma_delta_M = 0.0, prior_sigma_delta_SD = 1.0)) str(f)
Simulates cell migration speed data from a hierarchical Bayesian model implemented in Stan, where some model parameters are fixed by the user and the remaining parameters are drawn from their prior distributions.
gen_partial(control = list(N_biorep = 3, N_techrep = 3, N_cell = 50, delta, sigma_bio = 0.1, sigma_tech = 0.05, offset = 1, prior_alpha_p_M = -0.5, prior_alpha_p_SD = 1.0, prior_kappa_mu_M = 1.5, prior_kappa_mu_SD = 1.0, prior_kappa_sigma_M = 0, prior_kappa_sigma_SD = 0.3))gen_partial(control = list(N_biorep = 3, N_techrep = 3, N_cell = 50, delta, sigma_bio = 0.1, sigma_tech = 0.05, offset = 1, prior_alpha_p_M = -0.5, prior_alpha_p_SD = 1.0, prior_kappa_mu_M = 1.5, prior_kappa_mu_SD = 1.0, prior_kappa_sigma_M = 0, prior_kappa_sigma_SD = 0.3))
control |
A named list specifying simulation settings, fixed parameter values, and prior distributions. Default values are:
|
The function constructs a hierarchical synthetic dataset by simulating
cell migration speed values from a partially generative Stan model.
User-specified parameters (e.g., treatment effects and variability terms)
are held fixed, while remaining parameters are sampled from their prior
distributions. Cell-level observations are simulated using the
sampling function from the rstan package.
A data frame with one row per simulated observation and the following columns:
iterationSimulation iteration index.
well_idUnique well identifier.
ySimulated cell migration speed.
group_idTreatment group identifier.
plate_idBiological replicate (plate) identifier.
g <- gen_partial(control = list(N_biorep = 3, N_techrep = 3, N_cell = 50, delta=c(0, -0.4, -0.2, -0.1, 0, 0.1, 0.2, 0.4), sigma_bio = 0.2, sigma_tech = 0.05, offset = 1, prior_alpha_p_M = -0.5, prior_alpha_p_SD = 0.5, prior_kappa_mu_M = 1.7, prior_kappa_mu_SD = 0.5, prior_kappa_sigma_M = 0, prior_kappa_sigma_SD = 0.3)) str(g)g <- gen_partial(control = list(N_biorep = 3, N_techrep = 3, N_cell = 50, delta=c(0, -0.4, -0.2, -0.1, 0, 0.1, 0.2, 0.4), sigma_bio = 0.2, sigma_tech = 0.05, offset = 1, prior_alpha_p_M = -0.5, prior_alpha_p_SD = 0.5, prior_kappa_mu_M = 1.7, prior_kappa_mu_SD = 0.5, prior_kappa_sigma_M = 0, prior_kappa_sigma_SD = 0.3)) str(g)
The function takes as its main input (x) the output of the
cellmig function. Users can select a subset of treatment
groups (compounds + dose combinations) using groups. To
construct the hierarchical dendrogram, one has to select a distance
metric, hc_dist, (e.g. hc_dist = "euclidean"); and a
linkage function, hc_link, (e.g. hc_link = "average").
get_dose_response_profile(x, hc_link = "average", hc_dist = "euclidean", groups, B = 1000, exponentiate)get_dose_response_profile(x, hc_link = "average", hc_dist = "euclidean", groups, B = 1000, exponentiate)
x |
An object generated by |
hc_link |
Character. Linkage method used for hierarchical clustering
(e.g., |
hc_dist |
Character. Distance metric used for hierarchical clustering
(e.g., |
groups |
Optional character vector specifying a subset of treatment groups (compound–dose combinations) to include in the analysis. |
B |
Integer. Number of posterior samples drawn for visualization. |
exponentiate |
Logical. If |
The function get_dose_response_profile visualizes compound
effect profiles, identifying compounds with similar effects on cell
migration speed across different doses. These compounds are clustered
together in the hierarchical dendrogram.
A patchwork plot containing:
A: Hierarchical dendrogram comparing compound effect curves.
B: Mean effects of compounds and doses on cell migration speed with 95% credible intervals.
C: Mean effects of compounds and doses on cell migration speed in each replicate with 95% credible intervals.
data(d, package = "cellmig") o <- cellmig(x = d, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) p <- get_dose_response_profile(x = o, hc_link = "average", hc_dist = "euclidean", B = 100, exponentiate = TRUE) pdata(d, package = "cellmig") o <- cellmig(x = d, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) p <- get_dose_response_profile(x = o, hc_link = "average", hc_dist = "euclidean", B = 100, exponentiate = TRUE) p
This function extracts treatment group metadata from an object generated by
cellmig.
get_groups(x)get_groups(x)
x |
An object generated by |
The function retrieves group metadata such as group identifiers, names, compound information, and dose levels.
A data frame containing the following columns:
group_id |
Unique identifier for each treatment group. |
group |
Name of the treatment group. |
compound |
Compound associated with the treatment group. |
dose |
Dose level of the compound. |
cellmig, get_pairs, get_groups, get_violins
data(d_mini, package = "cellmig") o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) u <- get_groups(x = o)data(d_mini, package = "cellmig") o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) u <- get_groups(x = o)
The function get_pairs performs pairwise comparisons of overall
treatment effects by computing differences between their posterior
distributions.
get_pairs(x, groups, exponentiate)get_pairs(x, groups, exponentiate)
x |
An object generated by |
groups |
Optional character vector specifying treatment groups to include.
Available groups can be obtained using |
exponentiate |
Logical. If |
The main input, x, is the output of the cellmig function.
For each pair of treatment groups, get_pairs extracts the posterior
distributions of their cell migration speed effects and computes the
difference between them. This difference defines a posterior distribution
denoted by .
The 95% credible interval of is summarized by its lower
() and upper () bounds.
The posterior probability of differential effects, , is defined as:
If and its 95% credible interval lie mostly or entirely below zero,
there is strong evidence for a lower cell migration speed effect in treatment
group compared to . Conversely, if and its 95%
credible interval lie mostly or entirely above zero, there is strong evidence
for a higher effect in treatment group compared to .
If the 95% credible interval of overlaps zero substantially, the
data do not provide clear evidence for a difference between treatment groups.
Note that lack of clear evidence does not imply absence of an effect, as wide
credible intervals may still include biologically relevant differences.
A list containing:
comparisons: A data frame with one row per pairwise comparison,
containing:
group_xFirst treatment group.
group_ySecond treatment group.
Posterior mean of the difference in treatment effects.
Lower bound of the 95% credible interval.
Upper bound of the 95% credible interval.
Posterior probability of differential effects.
plot_rho: A heatmap (ggplot2 object) of values for all
treatment group pairs.
plot_pi: A heatmap (ggplot2 object) of values for all
treatment group pairs.
plot_volcano: A volcano plot (ggplot2 object) of vs.
for all treatment group pairs.
data(d_mini, package = "cellmig") o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) u <- get_pairs(x = o, exponentiate = FALSE) head(u)data(d_mini, package = "cellmig") o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) u <- get_pairs(x = o, exponentiate = FALSE) head(u)
The get_ppc_means function visualizes the comparison between observed and
posterior predicted migration means. The function aggregates observed migration
values by well, merges them with posterior predictive means, and generates a
scatter plot with error bars representing the 95% HDI.
get_ppc_means(x)get_ppc_means(x)
x |
The function takes as its main input ( |
The function follows these steps:
Compute mean observed (scaled) cell migration in each well
Compute mean predicted (scaled) cell migration in each well + 95% credible intervals
Generate a ggplot2 visualization with:
A dashed diagonal indicating perfect agreement.
Scatter points representing observed vs. predicted means.
Error bars (dark gray) showing the 95% HDI for predictions.
A ggplot2 object displaying:
A scatter plot comparing observed vs. predicted migration means.
95% highest density interval (HDI) error bars for the predicted means.
A dashed reference line (y = x) for ideal agreement.
# Load example dataset data(d_mini, package = "cellmig") # Run cellmig with MCMC sampling o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) # Generate PPC means plot p <- get_ppc_means(x = o) print(p)# Load example dataset data(d_mini, package = "cellmig") # Run cellmig with MCMC sampling o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) # Generate PPC means plot p <- get_ppc_means(x = o) print(p)
The get_ppc_violins function visualizes posterior predictive checks (PPC) by
comparing observed data with posterior predictive distributions. The function
extracts simulated data from the fitted Stan model and overlays them with
observed values, grouped by compound and plate.
get_ppc_violins(x, wrap = FALSE, ncol = 4)get_ppc_violins(x, wrap = FALSE, ncol = 4)
x |
The function takes as its main input ( |
wrap |
logical, if wrap = FALSE (default) we will use facet_grid, and if wrap = TRUE we will use facet_wrap to split data into compound x plate pairs. |
ncol |
In case wrap = TRUE, how many columns should the plot have? |
The function extracts posterior predictive samples using rstan::extract,
reshapes them into long format using reshape2::melt, and merges them
with metadata from the input object. It then generates a ggplot2
visualization, where:
Violin plots (dashed red) represent posterior predictive distributions.
Overlaid sina plots (black) represent observed data points.
Facets are arranged by compound and plate for better comparison.
A ggplot2 object displaying:
Violin plots of posterior predictive distributions for each compound and plate.
Overlaid scatter plots (sina plot) of observed values.
data(d_mini, package = "cellmig") o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) p <- get_ppc_violins(x = o, wrap = TRUE, ncol = 4) print(p)data(d_mini, package = "cellmig") o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) p <- get_ppc_violins(x = o, wrap = TRUE, ncol = 4) print(p)
This function generates violin plots for comparing posterior distributions of treatment group effects on cell migration.
get_violins(x, from_groups, to_group, exponentiate)get_violins(x, from_groups, to_group, exponentiate)
x |
An object generated by |
from_groups |
A vector of group names to compare against the target group. |
to_group |
A single group name serving as the reference for comparisons. |
exponentiate |
logical, should the effect be exponentiating turning log-fold-changes into more interpretable fold-changes |
The function extracts posterior samples of treatment group-level means and computes differences between specified groups. It generates a violin plot illustrating these differences and returns both the computed data and the plot object.
A list with two elements:
ds |
A data frame containing computed differences, treatment group identifiers, and associated statistics. |
plot |
A ggplot2 object representing the violin plot of the computed treatment differences. |
cellmig, get_pairs, get_groups
data(d_mini, package = "cellmig") o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) str(get_groups(x = o)) u <- get_violins(x = o, from_groups = c("C2|D3", "C2|D4"), to_group = "C3|D3", exponentiate = FALSE)data(d_mini, package = "cellmig") o <- cellmig(x = d_mini, control = list(mcmc_warmup = 200, mcmc_steps = 500, mcmc_chains = 2, mcmc_cores = 2, mcmc_algorithm = "NUTS", adapt_delta = 0.8, max_treedepth = 10)) str(get_groups(x = o)) u <- get_violins(x = o, from_groups = c("C2|D3", "C2|D4"), to_group = "C3|D3", exponentiate = FALSE)