Title: | Conditional Differential Analysis for Flow and Mass Cytometry Experiments |
---|---|
Description: | The CytoGLMM R package implements two multiple regression strategies: a bootstrapped generalized linear model (GLM) and a generalized linear mixed model (GLMM). Most current data analysis tools compare expression levels across many computationally discovered cell types. CytoGLMM focuses on just one cell type. Our narrower field of application allows us to define a more specific statistical model whose statistical guarantees are easier to control. As a result, CytoGLMM finds differential proteins in flow and mass cytometry data while reducing biases arising from marker correlations and safeguarding against false discoveries induced by patient heterogeneity. |
Authors: | Christof Seiler [aut, cre] |
Maintainer: | Christof Seiler <[email protected]> |
License: | LGPL-3 |
Version: | 1.15.0 |
Built: | 2024-10-30 05:22:23 UTC |
Source: | https://github.com/bioc/CytoGLMM |
Check whether the inputs to a cytoxxx function contain errors
cyto_check(cell_n_subsample, cell_n_min, protein_names)
cell_n_subsample | Subsample each sample to at most this cell count |
cell_n_min | Remove samples that are below this cell count threshold |
protein_names | A vector of column names of proteins to use in the analysis |
NULL.
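The rules that cyto_check enforces are not listed above. Purely as an illustration, the hypothetical validator `check_inputs_sketch` below takes the same three arguments and applies checks that seem natural for them: single non-negative numbers for the two cell counts, and a non-empty character vector of unique, syntactically valid protein names. These rules are assumptions, not CytoGLMM's documented behaviour.

```r
# Hypothetical validator, not CytoGLMM's cyto_check(); every rule is an assumption.
check_inputs_sketch <- function(cell_n_subsample, cell_n_min, protein_names) {
  is_count <- function(x) is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 0
  if (!is_count(cell_n_subsample))
    stop("cell_n_subsample must be a single non-negative number")
  if (!is_count(cell_n_min))
    stop("cell_n_min must be a single non-negative number")
  if (!is.character(protein_names) || length(protein_names) == 0)
    stop("protein_names must be a non-empty character vector")
  if (anyDuplicated(protein_names) > 0)
    stop("protein_names contains duplicates")
  # names such as "pSTAT-5" are not syntactically valid R names
  bad <- protein_names[make.names(protein_names) != protein_names]
  if (length(bad) > 0)
    stop("protein names are not syntactically valid: ", paste(bad, collapse = ", "))
  invisible(NULL)
}

check_inputs_sketch(cell_n_subsample = 0, cell_n_min = Inf,
                    protein_names = c("CD4", "CD8"))  # passes silently
msg <- tryCatch(check_inputs_sketch(0, Inf, c("CD4", "pSTAT-5")),
                error = conditionMessage)
msg
```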
Logistic mixture regression
cytoflexmix(df_samples_subset, protein_names, condition, group = "donor", cell_n_min = Inf, cell_n_subsample = 0, ks = seq_len(10), num_cores = 1)
df_samples_subset | Data frame or tibble with protein counts, cell condition, and group information |
protein_names | A vector of column names of proteins to use in the analysis |
condition | The column name of the condition variable |
group | The column name of the group variable |
cell_n_min | Remove samples that are below this cell count threshold |
cell_n_subsample | Subsample each sample to at most this cell count |
ks | A vector of numbers of clusters (mixture components) to fit |
num_cores | Number of computing cores |
A list of class cytoflexmix containing
flexmixfits | list of |
df_samples_subset | possibly subsampled df_samples_subset table |
protein_names | input protein names |
condition | input condition variable |
group | input group names |
cell_n_min | input cell_n_min |
cell_n_subsample | input cell_n_subsample |
ks | input ks |
num_cores | input num_cores |
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
mix_fit <- CytoGLMM::cytoflexmix(df, protein_names = protein_names, condition = "condition", group = "donor", ks = 2)
mix_fit
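Every example in this manual applies the arcsinh transform asinh(x/5) to the protein columns before fitting. A minimal base-R sketch of that step follows. The cofactor 5 comes from the examples, while the helper name `asinh_transform` and the toy counts are hypothetical. The base-R column assignment stands in for the `dplyr::mutate_at()` call.

```r
# Hypothetical helper: arcsinh transform with a cofactor, as in the examples.
asinh_transform <- function(x, cofactor = 5) asinh(x / cofactor)

# Toy marker counts, made up for illustration
toy <- data.frame(donor = c("d1", "d1", "d2"),
                  CD4 = c(0, 50, 500),
                  CD8 = c(5, 10, 5000))
protein_names <- c("CD4", "CD8")

# Base-R equivalent of dplyr::mutate_at(toy, protein_names, asinh_transform)
toy[protein_names] <- lapply(toy[protein_names], asinh_transform)
toy
```

The transform is close to linear near zero and close to logarithmic for large counts, which compresses the long right tail of cytometry intensities while keeping zeros at zero.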
Fit GLM with bootstrap resampling
cytoglm(df_samples_subset, protein_names, condition, group = "donor", covariate_names = NULL, cell_n_min = Inf, cell_n_subsample = 0, num_boot = 100, num_cores = 1)
df_samples_subset | Data frame or tibble with protein counts, cell condition, and group information |
protein_names | A vector of column names of proteins to use in the analysis |
condition | The column name of the condition variable |
group | The column name of the group variable |
covariate_names | The column names of covariates |
cell_n_min | Remove samples that are below this cell count threshold |
cell_n_subsample | Subsample each sample to at most this cell count |
num_boot | Number of bootstrap samples |
num_cores | Number of computing cores |
A list of class cytoglm containing
tb_coef | coefficient table |
df_samples_subset | possibly subsampled df_samples_subset table |
protein_names | input protein names |
condition | input condition variable |
group | input group names |
covariate_names | input covariates |
cell_n_min | input cell_n_min |
cell_n_subsample | input cell_n_subsample |
unpaired | TRUE if unpaired samples were provided as input |
num_boot | input num_boot |
num_cores | input num_cores |
formula_str | formula used in the regression model |
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glm_fit <- CytoGLMM::cytoglm(df, protein_names = protein_names, condition = "condition", group = "donor", num_boot = 10) # in practice >=1000
glm_fit
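cytoglm treats the condition as the response, the proteins as predictors, and repeats the fit on bootstrap resamples. The base-R sketch below illustrates that idea on simulated data. The resampling scheme (cells drawn with replacement within each donor), the simulated markers m1 and m2, and the helper name `boot_glm` are assumptions for illustration, not CytoGLMM's implementation.

```r
# Sketch of a bootstrapped logistic GLM in base R (not CytoGLMM's code).
# Assumption: cells are resampled with replacement within each donor.
set.seed(23)
n_donor <- 4
n_cell <- 200  # cells per donor and condition
sim <- data.frame(
  donor = rep(paste0("d", seq_len(n_donor)), each = 2 * n_cell),
  condition = rep(rep(c(0, 1), each = n_cell), times = n_donor)
)
sim$m1 <- rnorm(nrow(sim), mean = 0.5 * sim$condition)  # shifted in condition 1
sim$m2 <- rnorm(nrow(sim))                              # no shift

boot_glm <- function(df, markers, num_boot = 10) {
  # condition (0/1) is the response, markers are the predictors
  model <- reformulate(markers, response = "condition")
  rows_by_donor <- split(seq_len(nrow(df)), df$donor)
  coefs <- replicate(num_boot, {
    idx <- unlist(lapply(rows_by_donor, function(r) r[sample.int(length(r), replace = TRUE)]))
    coef(glm(model, family = binomial(), data = df[idx, ]))
  })
  t(coefs)  # one row per bootstrap replicate, one column per coefficient
}

boot_coefs <- boot_glm(sim, c("m1", "m2"), num_boot = 10)  # in practice >= 1000
apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
```

Resampling within donors keeps every donor in every replicate; other schemes, such as resampling whole donors, are equally plausible designs.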
Fit GLMM with method of moments
cytoglmm(df_samples_subset, protein_names, condition, group = "donor", covariate_names = NULL, cell_n_min = Inf, cell_n_subsample = 0, num_cores = 1)
df_samples_subset | Data frame or tibble with protein counts, cell condition, and group information |
protein_names | A vector of column names of proteins to use in the analysis |
condition | The column name of the condition variable |
group | The column name of the group variable |
covariate_names | The column names of covariates |
cell_n_min | Remove samples that are below this cell count threshold |
cell_n_subsample | Subsample each sample to at most this cell count |
num_cores | Number of computing cores |
A list of class cytoglmm containing
glmmfit | |
df_samples_subset | possibly subsampled df_samples_subset table |
protein_names | input protein names |
condition | input condition variable |
group | input group names |
covariate_names | input covariates |
cell_n_min | input cell_n_min |
cell_n_subsample | input cell_n_subsample |
num_cores | input num_cores |
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glmm_fit <- CytoGLMM::cytoglmm(df, protein_names = protein_names, condition = "condition", group = "donor")
glmm_fit
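The mixed model is meant to guard against false discoveries caused by patient heterogeneity. To make that concrete, the hypothetical simulation below draws data from a logistic model in which the marker effect varies by donor around a shared fixed effect `beta1`; separate per-donor fits then scatter around `beta1`. This is a toy random-slope illustration, not `generate_data()` and not the method-of-moments estimator used by cytoglmm.

```r
# Hypothetical random-slope simulation, for illustration only.
set.seed(7)
n_donor <- 6
n_cell <- 300
beta1 <- 1                                # shared (fixed) marker effect
donor <- rep(seq_len(n_donor), each = n_cell)
v <- rnorm(n_donor, sd = 0.5)             # donor-specific slope deviations
x <- rnorm(n_donor * n_cell)              # one transformed marker
p <- plogis((beta1 + v[donor]) * x)
y <- rbinom(length(p), size = 1, prob = p)

# Per-donor slope estimates scatter around beta1 because of v
per_donor_slope <- sapply(split(data.frame(x, y), donor), function(d) {
  coef(glm(y ~ x, family = binomial(), data = d))[["x"]]
})
round(per_donor_slope, 2)
```

A GLMM estimates `beta1` while modelling the spread of the donor-specific effects, instead of treating all cells as independent draws from one population.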
Group-specific fixed effects model
cytogroup(df_samples_subset, protein_names, condition, group = "donor", cell_n_min = Inf, cell_n_subsample = 0)
df_samples_subset | Data frame or tibble with protein counts, cell condition, and group information |
protein_names | A vector of column names of proteins to use in the analysis |
condition | The column name of the condition variable |
group | The column name of the group variable |
cell_n_min | Remove samples that are below this cell count threshold |
cell_n_subsample | Subsample each sample to at most this cell count |
A list of class cytogroup containing
groupfit | |
df_samples_subset | possibly subsampled df_samples_subset table |
protein_names | input protein names |
condition | input condition variable |
group | input group names |
cell_n_min | input cell_n_min |
cell_n_subsample | input cell_n_subsample |
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
group_fit <- CytoGLMM::cytogroup(df, protein_names = protein_names, condition = "condition", group = "donor")
group_fit
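One reading of a group-specific fixed effects model is a logistic regression with a separate intercept and a separate marker slope for every donor, all estimated as ordinary fixed effects. The base-R sketch below fits that reading to simulated data; the formula, the donor-specific shifts, and the data are assumptions, since cytogroup's exact model specification is not shown here.

```r
# Hypothetical sketch: donor-specific intercepts and slopes as fixed effects.
set.seed(11)
sim <- data.frame(
  donor = factor(rep(c("d1", "d2", "d3"), each = 400)),
  condition = rep(rep(c(0, 1), each = 200), times = 3)
)
# each donor shifts marker m1 by a different amount in condition 1
shift <- c(d1 = 0.2, d2 = 0.6, d3 = 1.0)[as.character(sim$donor)]
sim$m1 <- rnorm(nrow(sim), mean = shift * sim$condition)

fit <- glm(condition ~ 0 + donor + donor:m1, family = binomial(), data = sim)
slopes <- coef(fit)[grep(":m1$", names(coef(fit)))]
round(slopes, 2)  # one m1 coefficient per donor
```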
Evaluate parameter stability with respect to the gating scheme
cytostab(df_samples_subset, protein_names, condition, group = "donor", cell_n_min = Inf, cell_n_subsample = 0)
df_samples_subset | Data frame or tibble with protein counts, cell condition, and group information |
protein_names | A vector of column names of proteins to use in the analysis |
condition | The column name of the condition variable |
group | The column name of the group variable |
cell_n_min | Remove samples that are below this cell count threshold |
cell_n_subsample | Subsample each sample to at most this cell count |
A data frame
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
stab <- CytoGLMM::cytostab(df, protein_names = protein_names, condition = "condition", group = "donor")
stab
Generate dataset for vignettes and simulation studies
generate_data()
A tibble data frame
library(CytoGLMM)
set.seed(23)
df <- generate_data()
str(df)
df
Generalized linear mixed model with method of moments
glmm_moment(df_samples, protein_names, response, group = "donor", covariate_names = NULL, num_cores = 1)
df_samples | Data frame or tibble with protein counts, cell condition, and group information |
protein_names | A vector of column names of proteins to use in the analysis |
response | The column name of the condition variable |
group | The column name of the group variable |
covariate_names | The column names of covariates |
num_cores | Number of computing cores |
An mbest object
Check whether samples are matched (paired) on condition
is_unpaired(df_samples_subset, condition, group)
df_samples_subset | Data frame or tibble with protein counts, cell condition, and group information |
condition | The column name of the condition variable |
group | The column name of the group variable |
A boolean
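A paired design observes every donor under every condition. The hypothetical helper `is_unpaired_sketch` below tests that property with a contingency table. Treating any empty donor-by-condition cell as "unpaired" is an assumption, and CytoGLMM's own is_unpaired may use a different rule.

```r
# Hypothetical re-implementation for illustration only.
is_unpaired_sketch <- function(df, condition, group) {
  tab <- table(df[[group]], df[[condition]])  # cells per donor x condition
  any(tab == 0)                               # a missing combination breaks pairing
}

paired <- data.frame(donor = rep(c("d1", "d2"), each = 2),
                     condition = rep(c("A", "B"), times = 2))
unpaired <- data.frame(donor = c("d1", "d1", "d2", "d3"),
                       condition = c("A", "A", "B", "B"))
is_unpaired_sketch(paired, "condition", "donor")    # FALSE
is_unpaired_sketch(unpaired, "condition", "donor")  # TRUE
```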
Helper function to plot regression coefficients
plot_coeff(tb, title_str, title_str_right, xlab_str, redline = 0, order = FALSE, separate = FALSE)
tb | A data frame |
title_str | Title string for summary plot |
title_str_right | Title for bootstrap sample plot |
xlab_str | Label on x-axis |
redline | Point on x-axis to draw the red line |
order | Order the markers according to the magnitude of the coefficients |
separate | Plot both summary and bootstrap samples |
A ggplot2 object, or a list of two objects if separate is TRUE
Heatmap of median marker expression
plot_heatmap(df_samples, sample_info_names, protein_names, arrange_by_1, arrange_by_2 = "", cluster_cols = FALSE, fun = median)
df_samples | Data frame or tibble with protein counts, cell condition, and group information |
sample_info_names | Column names that contain information about the cell, e.g. donor, condition, file name, or cell type |
protein_names | A vector of column names of proteins to use in the analysis |
arrange_by_1 | Column name |
arrange_by_2 | Column name |
cluster_cols | Apply hierarchical clustering to columns |
fun | Summary statistic of marker expression |
A pheatmap object
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
CytoGLMM::plot_heatmap(df, protein_names = protein_names, sample_info_names = c("donor", "condition"), arrange_by_1 = "condition")
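The heatmap summarizes each sample by its median marker expression. The base-R sketch below builds that kind of table with one row per sample (donor by condition) and one column per marker, using made-up data. Sorting the rows by condition mimics what `arrange_by_1 = "condition"` appears to do, but that reading of the argument is an assumption.

```r
# Illustrative sample-by-marker median table on toy data.
set.seed(3)
toy <- data.frame(
  donor = rep(c("d1", "d2"), each = 100),
  condition = rep(rep(c("A", "B"), each = 50), times = 2),
  m1 = rnorm(200),
  m2 = rnorm(200, mean = 1)
)
protein_names <- c("m1", "m2")

# one row per donor x condition sample, one column per marker
med <- aggregate(toy[protein_names], by = toy[c("donor", "condition")], FUN = median)
med <- med[order(med$condition), ]  # rows sorted by condition (assumed arrange_by_1 behaviour)
med
```

`as.matrix(med[protein_names])` could then be passed to any heatmap function.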
LDA on marker expression
plot_lda(df_samples, protein_names, group, cor_scaling_factor = 1, arrow_color = "black", marker_color = "black", marker_size = 5)
df_samples | Data frame or tibble with protein counts, cell condition, and group information |
protein_names | A vector of column names of proteins to use in the analysis |
group | The column name of the group variable |
cor_scaling_factor | Scaling factor of the correlation circle |
arrow_color | Color of the correlation circle |
marker_color | Color of marker names |
marker_size | Size of marker names |
A ggplot2 object
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
df$condition <- rep(c("A", "B", "C", "D"), each = length(df$condition)/4)
CytoGLMM::plot_lda(df, protein_names = protein_names, group = "condition", cor_scaling_factor = 2)
MDS on median marker expression
plot_mds(df_samples, protein_names, sample_info_names, color, sample_label = "")
df_samples | Data frame or tibble with protein counts, cell condition, and group information |
protein_names | A vector of column names of proteins to use in the analysis |
sample_info_names | Column names that contain information about the cell, e.g. donor, condition, file name, or cell type |
color | Column name |
sample_label | Column name |
A cowplot object
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
CytoGLMM::plot_mds(df, protein_names = protein_names, sample_info_names = c("donor", "condition"), color = "condition")
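MDS on median marker expression places samples close together when their marker medians are similar. The base-R sketch below shows one way to do this: classical MDS (`cmdscale`) on Euclidean distances between per-sample medians of toy data. Whether plot_mds uses classical MDS and Euclidean distance is an assumption.

```r
# Illustrative classical MDS on per-sample medians (toy data).
set.seed(5)
toy <- data.frame(
  donor = rep(paste0("d", 1:4), each = 100),
  condition = rep(rep(c("A", "B"), each = 50), times = 4),
  m1 = rnorm(400), m2 = rnorm(400), m3 = rnorm(400)
)
toy$m1 <- toy$m1 + ifelse(toy$condition == "B", 1, 0)  # condition B shifts m1
protein_names <- c("m1", "m2", "m3")

med <- aggregate(toy[protein_names], by = toy[c("donor", "condition")], FUN = median)
coords <- cmdscale(dist(med[protein_names]), k = 2)  # 2-D embedding of 8 samples
data.frame(med[c("donor", "condition")], MDS1 = coords[, 1], MDS2 = coords[, 2])
```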
Plot model selection to choose the optimal number of clusters
plot_model_selection(fit, k = NULL)
fit | A |
k | Number of clusters |
A cowplot object
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
mix_fit <- CytoGLMM::cytoflexmix(df, protein_names = protein_names, condition = "condition", group = "donor", ks = 1:2)
plot_model_selection(mix_fit)
Plot PCA of subsampled data using ggplot
plot_prcomp(df_samples, protein_names, color_var = "treatment", subsample_size = 10000, repel = TRUE)
df_samples | Data frame or tibble with protein counts, cell condition, and group information |
protein_names | A vector of column names of proteins to use in the analysis |
color_var | A column name |
subsample_size | Number of cells to subsample per level of color_var |
repel | Repel labels |
A cowplot object
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
CytoGLMM::plot_prcomp(df, protein_names = protein_names, color_var = "condition")
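Subsampling per level of `color_var` keeps a large group from dominating the PCA. The base-R sketch below caps each condition at `subsample_size` cells and then runs `prcomp`. It assumes the cap applies per level and that the data are centered but not scaled; both are illustrative choices, not confirmed details of plot_prcomp.

```r
# Illustrative per-group subsampling followed by PCA (toy data).
set.seed(9)
toy <- data.frame(
  condition = rep(c("A", "B"), times = c(3000, 500)),
  m1 = rnorm(3500), m2 = rnorm(3500), m3 = rnorm(3500)
)
protein_names <- c("m1", "m2", "m3")
subsample_size <- 1000

# keep at most subsample_size cells per level of the colouring variable
idx <- unlist(lapply(split(seq_len(nrow(toy)), toy$condition), function(r) {
  if (length(r) > subsample_size) r[sample.int(length(r), subsample_size)] else r
}))
sub <- toy[idx, ]
pca <- prcomp(sub[protein_names], center = TRUE, scale. = FALSE)
table(sub$condition)  # A capped at 1000, B keeps its 500 cells
head(pca$x[, 1:2])
```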
Plot all components of mixture regression
## S3 method for class 'cytoflexmix'
plot(x, k = NULL, separate = FALSE, ...)
x | A |
k | Number of clusters |
separate | create two separate |
... | Other parameters |
A ggplot2 object
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
mix_fit <- CytoGLMM::cytoflexmix(df, protein_names = protein_names, condition = "condition", group = "donor", ks = 2)
plot(mix_fit)
Plot bootstrapped coefficients
## S3 method for class 'cytoglm'
plot(x, order = FALSE, separate = FALSE, ...)
x | A |
order | Order the markers according to the magnitude of the coefficients |
separate | create two separate |
... | Other parameters |
A ggplot2 object
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glm_fit <- CytoGLMM::cytoglm(df, protein_names = protein_names, condition = "condition", group = "donor", num_boot = 10) # in practice >=1000
plot(glm_fit)
Plot fixed-effect coefficients of the random effects model
## S3 method for class 'cytoglmm'
plot(x, order = FALSE, separate = FALSE, ...)
x | A |
order | Order the markers according to the magnitude of the coefficients |
separate | create two separate |
... | Other parameters |
A ggplot2 object
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glmm_fit <- CytoGLMM::cytoglmm(df, protein_names = protein_names, condition = "condition", group = "donor")
plot(glmm_fit)
Plot fixed-effect coefficients of the group-specific fixed effects model
## S3 method for class 'cytogroup'
plot(x, order = FALSE, separate = FALSE, ...)
x | A |
order | Order the markers according to the magnitude of the coefficients |
separate | create two separate |
... | Other parameters |
A ggplot2 object
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
group_fit <- CytoGLMM::cytogroup(df, protein_names = protein_names, condition = "condition", group = "donor")
plot(group_fit)
Extract and print a bootstrap GLM fit
## S3 method for class 'cytoglm'
print(x, ...)
x | A |
... | Other parameters |
NULL.
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glm_fit <- CytoGLMM::cytoglm(df, protein_names = protein_names, condition = "condition", group = "donor", num_boot = 10) # in practice >=1000
print(glm_fit)
Extract and print a GLMM fit
## S3 method for class 'cytoglmm'
print(x, ...)
x | A |
... | Other parameters |
NULL.
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glmm_fit <- CytoGLMM::cytoglmm(df, protein_names = protein_names, condition = "condition", group = "donor")
print(glmm_fit)
Remove samples based on low cell counts
remove_samples(df_samples_subset, condition, group, unpaired, cell_n_min)
df_samples_subset | Data frame or tibble with protein counts, cell condition, and group information |
condition | The column name of the condition variable |
group | The column name of the group variable |
unpaired | TRUE if unpaired samples were provided as input |
cell_n_min | Remove samples that are below this cell count threshold |
NULL.
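The `unpaired` flag matters when dropping low-count samples. In a paired design, removing only one condition of a donor would break the pairing. The hypothetical `remove_samples_sketch` below drops a whole donor in the paired case and individual donor-by-condition samples in the unpaired case. This behaviour is an assumption made for illustration, not taken from CytoGLMM's source.

```r
# Hypothetical sketch of low-count filtering, not CytoGLMM's remove_samples().
remove_samples_sketch <- function(df, condition, group, unpaired, cell_n_min) {
  # number of cells in the sample (group x condition) that each row belongs to
  n_cells <- ave(seq_len(nrow(df)), df[[group]], df[[condition]], FUN = length)
  if (unpaired) {
    df[n_cells >= cell_n_min, , drop = FALSE]
  } else {
    # paired: keep a donor only if all of its samples pass the threshold
    smallest <- tapply(n_cells, df[[group]], min)
    keep <- names(smallest)[smallest >= cell_n_min]
    df[df[[group]] %in% keep, , drop = FALSE]
  }
}

toy <- data.frame(
  donor = rep(c("d1", "d2", "d3"), times = c(200, 200, 130)),
  condition = c(rep(c("A", "B"), each = 100),
                rep(c("A", "B"), each = 100),
                rep(c("A", "B"), times = c(100, 30)))  # d3 has only 30 cells in B
)
paired_out <- remove_samples_sketch(toy, "condition", "donor", unpaired = FALSE, cell_n_min = 50)
unpaired_out <- remove_samples_sketch(toy, "condition", "donor", unpaired = TRUE, cell_n_min = 50)
c(paired = nrow(paired_out), unpaired = nrow(unpaired_out))
```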
Extract and calculate p-values of a bootstrap GLM fit
## S3 method for class 'cytoglm'
summary(object, method = "BH", ...)
object | A |
method | Multiple comparison adjustment method |
... | Other parameters |
A tibble data frame
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glm_fit <- CytoGLMM::cytoglm(df, protein_names = protein_names, condition = "condition", group = "donor", num_boot = 10) # in practice >=1000
summary(glm_fit)
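The default `method = "BH"` names the Benjamini-Hochberg procedure, which in base R is available through `p.adjust`. The sketch below shows one common way to turn bootstrap replicates into two-sided p-values and then adjust them. The percentile-style p-value rule, the helper `boot_pvalue`, and the simulated replicates are assumptions; summary.cytoglm may compute its p-values differently.

```r
# Illustrative bootstrap p-values followed by Benjamini-Hochberg adjustment.
boot_pvalue <- function(b) {
  # two-sided: twice the smaller tail mass of the bootstrap distribution at zero
  min(1, 2 * min(mean(b <= 0), mean(b >= 0)))
}

set.seed(2)
boot <- cbind(m1 = rnorm(1000, mean = 1, sd = 0.3),  # clearly away from zero
              m2 = rnorm(1000, mean = 0, sd = 0.3))  # centred on zero
pvals <- apply(boot, 2, boot_pvalue)
p_adj <- p.adjust(pvals, method = "BH")
rbind(raw = pvals, BH = p_adj)
```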
Extract and calculate p-values of a GLMM fit
## S3 method for class 'cytoglmm'
summary(object, method = "BH", ...)
object | A |
method | Multiple comparison adjustment method |
... | Other parameters |
A tibble data frame
library(CytoGLMM)
set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glmm_fit <- CytoGLMM::cytoglmm(df, protein_names = protein_names, condition = "condition", group = "donor")
summary(glmm_fit)