Package 'CytoGLMM' reference manual

Title:	Conditional Differential Analysis for Flow and Mass Cytometry Experiments
Description:	The CytoGLMM R package implements two multiple regression strategies: A bootstrapped generalized linear model (GLM) and a generalized linear mixed model (GLMM). Most current data analysis tools compare expressions across many computationally discovered cell types. CytoGLMM focuses on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier to control statistical guarantees. As a result, CytoGLMM finds differential proteins in flow and mass cytometry data while reducing biases arising from marker correlations and safeguarding against false discoveries induced by patient heterogeneity.
Authors:	Christof Seiler [aut, cre]
Maintainer:	Christof Seiler <[email protected]>
License:	LGPL-3
Version:	1.15.1
Built:	2025-04-03 07:34:47 UTC
Source:	https://github.com/bioc/CytoGLMM

Check whether inputs to the cytoxxx functions have errors

Description

Check whether inputs to the cytoxxx functions have errors

Usage

cyto_check(cell_n_subsample, cell_n_min, protein_names)

Arguments

`cell_n_subsample`	Subsample each sample to at most this many cells
`cell_n_min`	Remove samples that are below this cell count threshold
`protein_names`	A vector of protein column names to use in the analysis

Value

NULL.

Logistic mixture regression

Description

Logistic mixture regression

Usage

cytoflexmix(
  df_samples_subset,
  protein_names,
  condition,
  group = "donor",
  cell_n_min = Inf,
  cell_n_subsample = 0,
  ks = seq_len(10),
  num_cores = 1
)

Arguments

`df_samples_subset`	Data frame or tibble with protein counts, cell condition, and group information
`protein_names`	A vector of protein column names to use in the analysis
`condition`	The column name of the condition variable
`group`	The column name of the group variable
`cell_n_min`	Remove samples that are below this cell count threshold
`cell_n_subsample`	Subsample each sample to at most this many cells
`ks`	A vector of cluster sizes
`num_cores`	Number of computing cores

Value

A list of class cytoflexmix containing

`flexmixfits`	list of `flexmix` objects
`df_samples_subset`	possibly subsampled df_samples_subset table
`protein_names`	input protein names
`condition`	input condition variable
`group`	input group names
`cell_n_min`	input cell_n_min
`cell_n_subsample`	input cell_n_subsample
`ks`	input ks
`num_cores`	input num_cores

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
mix_fit <- CytoGLMM::cytoflexmix(df,
                                 protein_names = protein_names,
                                 condition = "condition",
                                 group = "donor",
                                 ks = 2)
mix_fit

Fit GLM with bootstrap resampling

Description

Fit GLM with bootstrap resampling

Usage

cytoglm(
  df_samples_subset,
  protein_names,
  condition,
  group = "donor",
  covariate_names = NULL,
  cell_n_min = Inf,
  cell_n_subsample = 0,
  num_boot = 100,
  num_cores = 1
)

Arguments

`df_samples_subset`	Data frame or tibble with protein counts, cell condition, and group information
`protein_names`	A vector of protein column names to use in the analysis
`condition`	The column name of the condition variable
`group`	The column name of the group variable
`covariate_names`	The column names of covariates
`cell_n_min`	Remove samples that are below this cell count threshold
`cell_n_subsample`	Subsample each sample to at most this many cells
`num_boot`	Number of bootstrap samples
`num_cores`	Number of computing cores

Value

A list of class cytoglm containing

`tb_coef`	coefficient table
`df_samples_subset`	possibly subsampled df_samples_subset table
`protein_names`	input protein names
`condition`	input condition variable
`group`	input group names
`covariate_names`	input covariates
`cell_n_min`	input cell_n_min
`cell_n_subsample`	input cell_n_subsample
`unpaired`	true if unpaired samples were provided as input
`num_boot`	input num_boot
`num_cores`	input num_cores
`formula_str`	formula used in the regression model

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glm_fit <- CytoGLMM::cytoglm(df,
                             protein_names = protein_names,
                             condition = "condition",
                             group = "donor",
                             num_boot = 10) # in practice >=1000
glm_fit

Group-specific fixed effects model

Description

Group-specific fixed effects model

Usage

cytogroup(
  df_samples_subset,
  protein_names,
  condition,
  group = "donor",
  cell_n_min = Inf,
  cell_n_subsample = 0
)

Arguments

`df_samples_subset`	Data frame or tibble with protein counts, cell condition, and group information
`protein_names`	A vector of protein column names to use in the analysis
`condition`	The column name of the condition variable
`group`	The column name of the group variable
`cell_n_min`	Remove samples that are below this cell count threshold
`cell_n_subsample`	Subsample each sample to at most this many cells

Value

A list of class cytogroup containing

`groupfit`	`glm` object
`df_samples_subset`	possibly subsampled df_samples_subset table
`protein_names`	input protein names
`condition`	input condition variable
`group`	input group names
`cell_n_min`	input cell_n_min
`cell_n_subsample`	input cell_n_subsample

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
group_fit <- CytoGLMM::cytogroup(df,
                                 protein_names = protein_names,
                                 condition = "condition",
                                 group = "donor")
group_fit

Evaluate parameter stability with respect to gating scheme

Description

Evaluate parameter stability with respect to gating scheme

Usage

cytostab(
  df_samples_subset,
  protein_names,
  condition,
  group = "donor",
  cell_n_min = Inf,
  cell_n_subsample = 0
)

Arguments

`df_samples_subset`	Data frame or tibble with protein counts, cell condition, and group information
`protein_names`	A vector of protein column names to use in the analysis
`condition`	The column name of the condition variable
`group`	The column name of the group variable
`cell_n_min`	Remove samples that are below this cell count threshold
`cell_n_subsample`	Subsample each sample to at most this many cells

Value

A data frame

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
stab <- CytoGLMM::cytostab(df,
                           protein_names = protein_names,
                           condition = "condition",
                           group = "donor")
stab

Generate dataset for vignettes and simulation studies

Description

Generate dataset for vignettes and simulation studies

Usage

generate_data()

Value

tibble data frame

Examples

set.seed(23)
df <- generate_data()
str(df)
df

Check whether samples are matched or paired on condition

Description

Check whether samples are matched or paired on condition

Usage

is_unpaired(df_samples_subset, condition, group)

Arguments

`df_samples_subset`	Data frame or tibble with protein counts, cell condition, and group information
`condition`	The column name of the condition variable
`group`	The column name of the group variable

Value

A boolean
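
The idea behind a pairing check can be pictured with a small base R sketch. This is an illustration only, not the package's implementation: the helper name `looks_unpaired`, its rule (a data set counts as unpaired when some group has no cells under at least one condition), and the toy data are all assumptions made for this example.

```r
# Hypothetical helper, not part of CytoGLMM: flags a data set as unpaired
# when at least one group is missing from at least one condition.
looks_unpaired <- function(df, condition, group) {
  # Cross-tabulate cells by group (rows) and condition (columns)
  counts <- table(df[[group]], df[[condition]])
  # A zero cell means that group was not measured under that condition
  any(counts == 0)
}

# Toy data: every donor is measured under both conditions
paired <- data.frame(
  donor = rep(c("d1", "d2"), each = 4),
  condition = rep(c("control", "treatment"), times = 4)
)

# Toy data: each donor is measured under one condition only
unpaired <- data.frame(
  donor = rep(c("d1", "d2"), each = 4),
  condition = rep(c("control", "treatment"), each = 4)
)

looks_unpaired(paired, "condition", "donor")
looks_unpaired(unpaired, "condition", "donor")
```

The first call reports a paired design and the second an unpaired one, which is the kind of distinction the `unpaired` flag records in the fitted objects.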

Helper function to plot regression coefficients

Description

Helper function to plot regression coefficients

Usage

plot_coeff(
  tb,
  title_str,
  title_str_right,
  xlab_str,
  redline = 0,
  order = FALSE,
  separate = FALSE
)

Arguments

`tb`	A data frame
`title_str`	Title string for summary plot
`title_str_right`	Title for bootstrap sample plot
`xlab_str`	Label on x-axis
`redline`	Point on x-axis to draw the red line
`order`	Order the markers according to the magnitude of the coefficients
`separate`	Plot both summary and bootstrap samples

Value

ggplot2 object or list of two objects if separate is true

Heatmap of median marker expression

Description

Heatmap of median marker expression

Usage

plot_heatmap(
  df_samples,
  sample_info_names,
  protein_names,
  arrange_by_1,
  arrange_by_2 = "",
  cluster_cols = FALSE,
  fun = median
)

Arguments

`df_samples`	Data frame or tibble with protein counts, cell condition, and group information
`sample_info_names`	Column names that contain information about the cell, e.g. donor, condition, file name, or cell type
`protein_names`	A vector of protein column names to use in the analysis
`arrange_by_1`	Column name
`arrange_by_2`	Column name
`cluster_cols`	Apply hierarchical cluster to columns
`fun`	Summary statistics of marker expression

Value

pheatmap object

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
CytoGLMM::plot_heatmap(df,
                       protein_names = protein_names,
                       sample_info_names = c("donor", "condition"),
                       arrange_by_1 = "condition")

LDA on marker expression

Description

LDA on marker expression

Usage

plot_lda(
  df_samples,
  protein_names,
  group,
  cor_scaling_factor = 1,
  arrow_color = "black",
  marker_color = "black",
  marker_size = 5
)

Arguments

`df_samples`	Data frame or tibble with protein counts, cell condition, and group information
`protein_names`	A vector of protein column names to use in the analysis
`group`	The column name of the group variable
`cor_scaling_factor`	Scaling factor of circle of correlations
`arrow_color`	Color of correlation circle
`marker_color`	Colors of marker names
`marker_size`	Size of marker names

Value

ggplot2 object

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
df$condition <- rep(c("A", "B", "C", "D"), each = length(df$condition)/4)
CytoGLMM::plot_lda(df,
                   protein_names = protein_names,
                   group = "condition",
                   cor_scaling_factor = 2)

MDS on median marker expression

Description

MDS on median marker expression

Usage

plot_mds(
  df_samples,
  protein_names,
  sample_info_names,
  color,
  sample_label = ""
)

Arguments

`df_samples`	Data frame or tibble with protein counts, cell condition, and group information
`protein_names`	A vector of protein column names to use in the analysis
`sample_info_names`	Column names that contain information about the cell, e.g. donor, condition, file name, or cell type
`color`	Column name
`sample_label`	Column name

Value

cowplot object

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
CytoGLMM::plot_mds(df,
                   protein_names = protein_names,
                   sample_info_names = c("donor", "condition"),
                   color = "condition")

Plot model selection to choose the optimal number of clusters

Description

Plot model selection to choose the optimal number of clusters

Usage

plot_model_selection(fit, k = NULL)

Arguments

`fit`	A `cytoflexmix` class
`k`	Number of clusters

Value

cowplot object

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
mix_fit <- CytoGLMM::cytoflexmix(df,
                                 protein_names = protein_names,
                                 condition = "condition",
                                 group = "donor",
                                 ks = 1:2)
plot_model_selection(mix_fit)

Plot PCA of subsampled data using ggplot

Description

Plot PCA of subsampled data using ggplot

Usage

plot_prcomp(
  df_samples,
  protein_names,
  color_var = "treatment",
  subsample_size = 10000,
  repel = TRUE
)

Arguments

`df_samples`	Data frame or tibble with protein counts, cell condition, and group information
`protein_names`	A vector of protein column names to use in the analysis
`color_var`	A column name
`subsample_size`	Subsample per color_var variable
`repel`	Repel labels

Value

cowplot object

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
CytoGLMM::plot_prcomp(df,
                      protein_names = protein_names,
                      color_var = "condition")

Plot all components of mixture regression

Description

Plot all components of mixture regression

Usage

## S3 method for class 'cytoflexmix'
plot(x, k = NULL, separate = FALSE, ...)

Arguments

`x`	A `cytoflexmix` class
`k`	Number of clusters
`separate`	create two separate `ggplot2` objects
`...`	Other parameters

Value

ggplot2 object

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
mix_fit <- CytoGLMM::cytoflexmix(df,
                                 protein_names = protein_names,
                                 condition = "condition",
                                 group = "donor",
                                 ks = 2)
plot(mix_fit)

Plot bootstrapped coefficients

Description

Plot bootstrapped coefficients

Usage

## S3 method for class 'cytoglm'
plot(x, order = FALSE, separate = FALSE, ...)

Arguments

`x`	A `cytoglm` class
`order`	Order the markers according to the magnitude of the coefficients
`separate`	create two separate `ggplot2` objects
`...`	Other parameters

Value

ggplot2 object

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glm_fit <- CytoGLMM::cytoglm(df,
                             protein_names = protein_names,
                             condition = "condition",
                             group = "donor",
                             num_boot = 10) # in practice >=1000
plot(glm_fit)

Plot fixed coefficients of group-specific fixed effects model

Description

Plot fixed coefficients of group-specific fixed effects model

Usage

## S3 method for class 'cytogroup'
plot(x, order = FALSE, separate = FALSE, ...)

Arguments

`x`	A `cytogroup` class
`order`	Order the markers according to the magnitude of the coefficients
`separate`	create two separate `ggplot2` objects
`...`	Other parameters

Value

ggplot2 object

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
group_fit <- CytoGLMM::cytogroup(df,
                                 protein_names = protein_names,
                                 condition = "condition",
                                 group = "donor")
plot(group_fit)

Extract and print bootstrap GLM fit

Description

Extract and print bootstrap GLM fit

Usage

## S3 method for class 'cytoglm'
print(x, ...)

Arguments

`x`	A `cytoglm` class
`...`	Other parameters

Value

NULL.

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glm_fit <- CytoGLMM::cytoglm(df,
                             protein_names = protein_names,
                             condition = "condition",
                             group = "donor",
                             num_boot = 10) # in practice >=1000
print(glm_fit)

Remove samples based on low cell counts

Description

Remove samples based on low cell counts

Usage

remove_samples(df_samples_subset, condition, group, unpaired, cell_n_min)

Arguments

`df_samples_subset`	Data frame or tibble with protein counts, cell condition, and group information
`condition`	The column name of the condition variable
`group`	The column name of the group variable
`unpaired`	true if unpaired samples were provided as input
`cell_n_min`	Remove samples that are below this cell count threshold

Value

NULL.
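
To make the cell count threshold concrete, here is a minimal base R sketch of filtering by sample size. It is not the package's code: the helper `drop_small_samples` is hypothetical, it assumes a finite `cell_n_min`, it treats one sample as one combination of group and condition, and it ignores the `unpaired` argument.

```r
# Hypothetical helper, not part of CytoGLMM: keeps only cells that belong
# to samples with at least `cell_n_min` cells.
drop_small_samples <- function(df, condition, group, cell_n_min) {
  # One sample = one combination of group and condition
  sample_id <- interaction(df[[group]], df[[condition]], drop = TRUE)
  # For every row, the number of cells in the sample it belongs to
  cells_per_row <- ave(seq_len(nrow(df)), sample_id, FUN = length)
  df[cells_per_row >= cell_n_min, , drop = FALSE]
}

# Toy data: donor d2 has only two control cells
cells <- data.frame(
  donor = c(rep("d1", 10), rep("d2", 7)),
  condition = c(rep(c("control", "treatment"), each = 5),
                rep("control", 2), rep("treatment", 5))
)

kept <- drop_small_samples(cells, "condition", "donor", cell_n_min = 3)
table(kept$donor, kept$condition)
```

With a threshold of 3, the two control cells of donor d2 are dropped while the other three samples are kept whole. In a paired design one might instead drop the whole donor so that both conditions stay matched; the sketch does not attempt that.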

Extract and calculate p-values of bootstrap GLM fit

Description

Extract and calculate p-values of bootstrap GLM fit

Usage

## S3 method for class 'cytoglm'
summary(object, method = "BH", ...)

Arguments

`object`	A `cytoglm` class
`method`	Multiple comparison adjustment method
`...`	Other parameters

Value

tibble data frame
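
The default `method = "BH"` matches the name of the Benjamini-Hochberg option in base R's `stats::p.adjust()`. Assuming the `method` argument accepts the same names (an assumption, not something this page states), the effect of the adjustment can be sketched on a made-up vector of raw p-values:

```r
# Made-up raw p-values, one per marker (illustrative only)
raw_p <- c(0.001, 0.008, 0.039, 0.041, 0.20)

# Benjamini-Hochberg adjustment controls the false discovery rate
adj_bh <- p.adjust(raw_p, method = "BH")

# Bonferroni controls the family-wise error rate and is more conservative
adj_bonf <- p.adjust(raw_p, method = "bonferroni")

data.frame(raw_p, adj_bh, adj_bonf)
```

Adjusted values are never smaller than the raw ones, and the BH values are never larger than the Bonferroni values, so the choice of `method` changes how many markers pass a given cutoff.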

Examples

set.seed(23)
df <- generate_data()
protein_names <- names(df)[3:12]
df <- dplyr::mutate_at(df, protein_names, function(x) asinh(x/5))
glm_fit <- CytoGLMM::cytoglm(df,
                             protein_names = protein_names,
                             condition = "condition",
                             group = "donor",
                             num_boot = 10) # in practice >=1000
summary(glm_fit)
