| Title: | Batch Effects Quality Control Software |
|---|---|
| Description: | Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA. |
| Authors: | Jessica Anderson [aut] (ORCID: <https://orcid.org/0000-0002-0542-9872>), W. Evan Johnson [aut, fnd] (ORCID: <https://orcid.org/0000-0002-6247-6595>), Yaoan Leng [ctb, cre] (ORCID: <https://orcid.org/0009-0002-3957-5250>), Solaiappan Manimaran [aut], Heather Selby [ctb], Claire Ruberman [ctb], Kwame Okrah [ctb], Hector Corrada Bravo [ctb], Michael Silverstein [ctb], Regan Conrad [ctb], Zhaorong Li [ctb], Evan Holmes [ctb], Solomon Joseph [ctb], Howard Fan [ctb], Sean Lu [ctb] |
| Maintainer: | Yaoan Leng <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 2.9.0 |
| Built: | 2026-05-11 10:05:59 UTC |
| Source: | https://github.com/bioc/BatchQC |
This function creates a boxplot of all the AIC values for each gene under each tested distribution to aid in identifying outliers
AIC_boxplots(AIC_data, num_methods)AIC_boxplots(AIC_data, num_methods)
AIC_data |
dataframe with the data to be plotted |
num_methods |
integer representing the number of distribution methods |
AIC_boxplot; a boxplot for each method showing distribution of data
Batch Correct This function allows you to Add batch corrected count matrix to the SE object
batch_correct(se, method, assay_to_normalize, batch, group = NULL, covar, output_assay_name, psva, num_sv, limit, numrepeats)batch_correct(se, method, assay_to_normalize, batch, group = NULL, covar, output_assay_name, psva, num_sv, limit, numrepeats)
se |
SummarizedExperiment object |
method |
Normalization Method ("ComBat-Seq", "ComBat", "limma", "sva", svaseq, "Harman") |
assay_to_normalize |
Which assay use to do normalization |
batch |
The batch |
group |
The group variable |
covar |
list of covariates |
output_assay_name |
name of results assay |
psva |
boolean; default: FALSE. Only used if normalization method is "sva". If set to TRUE and no covariate input, psva function from the sva package will be used to remove batch effect. |
num_sv |
boolean; default: FALSE. Only used if normalization method is "svaseq". The number of estimated latent factor is set to 1 for a small number of samples. If set to TRUE, svaseq function will estimate the number of latent factors for you. |
limit |
numeric; default: 0.95. Only used if normalization method is "Harman". Indicates the limit of confidence in which to stop removing a batch effect. Must be between 0 and 1. |
numrepeats |
integer; default: 100000L. Only used if normalization method is "Harman". The number of repeats in which to run the simulated batch mean distribution estimator using the random selection algorithm. |
a summarized experiment object with normalized assay appended
library(scran) se <- mockSCE() se <- BatchQC::batch_correct(se, method = "ComBat-Seq", assay_to_normalize = "counts", batch = "Mutation_Status", covar = "Treatment", output_assay_name = "ComBat_Seq_Corrected") se <- BatchQC::batch_correct(se, method = "ComBat", assay_to_normalize = "counts", batch = "Mutation_Status", covar = "Treatment", output_assay_name = "ComBat_Corrected") selibrary(scran) se <- mockSCE() se <- BatchQC::batch_correct(se, method = "ComBat-Seq", assay_to_normalize = "counts", batch = "Mutation_Status", covar = "Treatment", output_assay_name = "ComBat_Seq_Corrected") se <- BatchQC::batch_correct(se, method = "ComBat", assay_to_normalize = "counts", batch = "Mutation_Status", covar = "Treatment", output_assay_name = "ComBat_Corrected") se
This function allows you to make a batch design matrix
batch_design(se, batch, covariate)batch_design(se, batch, covariate)
se |
summarized experiment object |
batch |
string, batch variable |
covariate |
string, biological covariate |
design table
library(scran) se <- mockSCE() batch_design_tibble <- batch_design(se, batch = "Mutation_Status", covariate = "Treatment") batch_design_tibblelibrary(scran) se <- mockSCE() batch_design_tibble <- batch_design(se, batch = "Mutation_Status", covariate = "Treatment") batch_design_tibble
This dataset is from signature data captured when activating different growth pathway genes in human mammary epithelial cells (GEO accession: GSE73628). This data consists of three batches and ten different conditions corresponding to control and nine different pathways.
data(batch_indicator)data(batch_indicator)
A data frame with 89 rows and 2 variables:
batch
condition
Run BatchQC shiny app
BatchQC(dev = FALSE)BatchQC(dev = FALSE)
dev |
Run the application in developer mode |
The shiny app will open
if(interactive()){ BatchQC() }if(interactive()){ BatchQC() }
Returns a list of explained variation by batch and condition combinations
batchqc_explained_variation(se, batch, condition = NULL, assay_name)batchqc_explained_variation(se, batch, condition = NULL, assay_name)
se |
Summarized experiment object |
batch |
Batch covariate |
condition |
Condition covariate(s) of interest if desired, default is NULL |
assay_name |
Assay of choice |
List of explained variation by batch and condition
library(scran) se <- mockSCE() batchqc_explained_variation <- BatchQC::batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") batchqc_explained_variationlibrary(scran) se <- mockSCE() batchqc_explained_variation <- BatchQC::batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") batchqc_explained_variation
adapted from kBET package (https://github.com/theislab/kBET).
Provides recursive bisection algorithm for an arbitrary function. It
evaluates the function foo at the bounds and replaces one of the
boundaries until a maximum is found or the interval becomes too small
bisect(foo, bounds, known = NULL, ..., tolx = 5, toly = 0.01)bisect(foo, bounds, known = NULL, ..., tolx = 5, toly = 0.01)
foo |
a function mapping a one-dim argument to one-dim value |
bounds |
a vector of length 2 with real valued numbers
(i.e. two arguments of |
known |
tells for which of the arguments a value is known (defaults to NULL) |
... |
additional parameters for |
tolx |
break condition for argument (defaults to 10) |
toly |
break condition for value (defaults to 0.01) |
A range of bounds where foo is maximal.
get_maximum <- bisect(function(x) { -(x - 2)^2 }, c(-5, 50))get_maximum <- bisect(function(x) { -(x - 2)^2 }, c(-5, 50))
Bladder data upload This function uploads the Bladder data set from the bladderbatch package. This dataset is from bladder cancer data with 22,283 different microarray gene expression data. It has 57 bladder samples with 3 metadata variables (batch, outcome and cancer). It contains 5 batches, 3 cancer types (cancer, biopsy, control), and 5 outcomes (Biopsy, mTCC, sTCC-CIS, sTCC+CIS, and Normal). Batch 1 contains only cancer, 2 has cancer and controls, 3 has only controls, 4 contains only biopsy, and 5 contains cancer and biopsy
bladder_data_upload()bladder_data_upload()
a SE object with counts data and metadata
library(bladderbatch) se_object <- bladder_data_upload()library(bladderbatch) se_object <- bladder_data_upload()
This function returns BMI data that comes form the data in "Comparing tuberculosis gene signatures in malnourished individuals using the TBSignatureProfiler" paper. Subject IDs were matched as shown on "github.com/jessmcc22/BatchQCv2_Manuscript/blob/devel/R/subjectID_match.R"
BMI_data(meta)BMI_data(meta)
meta |
dataframe; metadata that needs to be matched to BMI |
dataframe provided as input with BMI info added
Helper function to save variables as factors if not already factors
check_valid_input(se, batch, condition)check_valid_input(se, batch, condition)
se |
se object |
batch |
batch |
condition |
condition |
se se object
This function creates the base color palette used in BatchQC
color_palette(n, first_hue = 25, last_hue = 360)color_palette(n, first_hue = 25, last_hue = 360)
n |
numeric object representing number of colors to be created |
first_hue |
numeric object to set the first hue value |
last_hue |
numeric object to set the final hue value |
color_list list of colors generated
library(scran) n <- 100 color_list <- color_palette(n) color_listlibrary(scran) n <- 100 color_list <- color_palette(n) color_list
ComBat Correction This function applies ComBat correction to your summarized experiment object
ComBat_correction(se, assay_to_normalize, batch, covar, output_assay_name)ComBat_correction(se, assay_to_normalize, batch, covar, output_assay_name)
se |
SummarizedExperiment object |
assay_to_normalize |
Assay that should be corrected |
batch |
The variable that represents batch |
covar |
list of covariates |
output_assay_name |
name of results assay |
SE object with an added ComBat corrected array
ComBat-Seq Correction This function applies ComBat-seq correction to your summarized experiment object
ComBat_seq_correction(se, assay_to_normalize, batch, group, covar, output_assay_name)ComBat_seq_correction(se, assay_to_normalize, batch, group, covar, output_assay_name)
se |
SummarizedExperiment object |
assay_to_normalize |
Assay that should be corrected |
batch |
The variable that represents batch |
group |
The group variable |
covar |
list of covariates |
output_assay_name |
name of results assay |
SE object with an added ComBat-seq corrected array
This function creates the commentary recommendation when there are more than 20 samples.
commentary(nb_fit_pval, count_below_value_pval, proportion, low_pval, method)commentary(nb_fit_pval, count_below_value_pval, proportion, low_pval, method)
nb_fit_pval |
Boolean representing if the p-val count is below threshold |
count_below_value_pval |
number of features below p-val threshold |
proportion |
numeric; proportion of genes below the p-value |
low_pval |
pval threshold |
method |
string; method utilized for the parametric or non-parametric test; either "DESeq2" or "edgeR" |
a commentary string statement
A vector contains the AIC based on negative binomial model for individual genes.
A vector contains the AIC based on lognormal model for individual genes.
A vector contains the AIC based on voom transformation for individual genes.
The sum of AICs across all genes for the models in comparison.
The number of minimum AIC across the models in comparison for individual genes.
The ratios between total_AIC and min_AIC for the models in comparison. The optimal model is the one has minimum value.
compute_aic( se, assay_of_interest, batchind, groupind, maxit = 25, zero_filt_percent = 100 )compute_aic( se, assay_of_interest, batchind, groupind, maxit = 25, zero_filt_percent = 100 )
se |
SummarizedExperiment object |
assay_of_interest |
The assay name from se that you are interested in analyzing. This assay need to be a counts assay containing only non-negative integers. |
batchind |
Factor or numeric vector of length = ncol(dat); batch indicator for each sample. |
groupind |
Factor or numeric vector of length = ncol(dat); biological group label/indicator for each sample. |
maxit |
Integer giving the maximal number of IWLS iterations. Default is 25. |
zero_filt_percent |
Numeric value between 0 and 100, the percentage of zeros allowed for each gene to be included in the AIC calculation. Genes with more than this percentage of zeros will be filtered out. Default is 100. |
This function calculates the AIC based on lognormal distribution, negative binomial distribution as well as the voom transformation. It then compares the AICs of the three models across different genes and yields AIC-based metric values. Must have non-negative data and non-discrete data will only analyze the lognormal and voom distribution.
A list with a boxplot of the AIC_boxplot and a dataframe containing the total_AIC values, the frequency of the min_AIC values, the AIC_score, and the median_AIC value, for the following three elements:
The metric value calculated based on the negative binomial model.
The metric value calculated based on the lognormal model.
The metric value calculated based on the voom-based model.
library(scran) se <- mockSCE() compare_aic <- compute_aic(se, assay_of_interest = "counts", batchind = "Cell_Cycle", groupind = c("Treatment", "Mutation_Status")) print(compare_aic)library(scran) se <- mockSCE() compare_aic <- compute_aic(se, assay_of_interest = "counts", batchind = "Cell_Cycle", groupind = c("Treatment", "Mutation_Status")) print(compare_aic)
This function calculates the proportions of variation explained by batch, group, and residual for each gene using two-way ANOVA and computes the lambda index based on these three proportions.
compute_lambda(dat, batchind, groupind)compute_lambda(dat, batchind, groupind)
dat |
Numeric matrix of dimension (genes x samples) where each row represents one gene's expression across samples. |
batchind |
Factor or numeric vector of length = ncol(dat); batch indicator for each sample |
groupind |
Factor or numeric vector of length = ncol(dat); biological group label/indicator for each sample. |
dataframe with columns:
Proportion of total variance explained by batch effects.
Proportion of total variance explained by group effects.
Proportion of total variance that is residual noise.
Raw lambda index = total SS_batch / total SS_group.
Adjusted lambda = lambda_raw * ResidV/(1-ResidV).
library(scran) se <- mockSCE() res <- BatchQC::compute_lambda(assays(se)[["counts"]], colData(se)$Mutation_Status, colData(se)$Treatment) print(res)library(scran) se <- mockSCE() res <- BatchQC::compute_lambda(assays(se)[["counts"]], colData(se)$Mutation_Status, colData(se)$Treatment) print(res)
Combine std. Pearson correlation coefficient and Cramer's V
confound_metrics(se, batch)confound_metrics(se, batch)
se |
summarized experiment |
batch |
batch variable |
metrics of confounding
library(scran) se <- mockSCE() confound_table <- BatchQC::confound_metrics(se, batch = "Mutation_Status") confound_tablelibrary(scran) se <- mockSCE() confound_table <- BatchQC::confound_metrics(se, batch = "Mutation_Status") confound_table
This function allows you to calculate correlation properties
cor_props(bd)cor_props(bd)
bd |
batch design |
correlation properties
library(scran) se <- mockSCE() batch_design_tibble <- batch_design(se, batch = "Mutation_Status", covariate = "Treatment") correlation_property <- BatchQC::cor_props(batch_design_tibble) correlation_propertylibrary(scran) se <- mockSCE() batch_design_tibble <- batch_design(se, batch = "Mutation_Status", covariate = "Treatment") correlation_property <- BatchQC::cor_props(batch_design_tibble) correlation_property
Returns list of covariates not confounded by batch; helper function for explained variation and for populating shiny app condition options
covariates_not_confounded(se, batch)covariates_not_confounded(se, batch)
se |
Summarized experiment object |
batch |
Batch variable |
List of explained variation by batch and condition
library(scran) se <- mockSCE() covariates_not_confounded <- BatchQC::covariates_not_confounded(se, batch = "Mutation_Status") covariates_not_confoundedlibrary(scran) se <- mockSCE() covariates_not_confounded <- BatchQC::covariates_not_confounded(se, batch = "Mutation_Status") covariates_not_confounded
This function allows you to calculate Cramer's V
cramers_v(bd)cramers_v(bd)
bd |
batch design |
Cramer's V
library(scran) se <- mockSCE() batch_design_tibble <- batch_design(se, batch = "Mutation_Status", covariate = "Treatment") cramers_v_result <- BatchQC::cramers_v(batch_design_tibble) cramers_v_resultlibrary(scran) se <- mockSCE() batch_design_tibble <- batch_design(se, batch = "Mutation_Status", covariate = "Treatment") cramers_v_result <- BatchQC::cramers_v(batch_design_tibble) cramers_v_result
This function runs DE analysis on a count matrix (DESeq), a normalized log or log-CPM matrix (limma), an edgeR TMM-normalized matrix (edgeR) or perform ANOVA or Kruskal-Wallis test on the data contained in the se object.
DE_analyze(se, method, batch, conditions, assay_to_analyze, padj_method)DE_analyze(se, method, batch, conditions, assay_to_analyze, padj_method)
se |
SummarizedExperiment object |
method |
DE analysis method option ('DESeq2', 'limma', 'edgeR', 'ANOVA', or 'Kruskal-Wallis') |
batch |
metadata column in the se object representing batch |
conditions |
metadata columns in the se object representing additional analysis covariates |
assay_to_analyze |
Assay in the se object (either counts for DESeq2 or normalized data for limma or edgeR) for DE analysis |
padj_method |
correction method for adjusted p-value from p.adjust.methods |
A named list containing the log2FoldChange, fvalue (ANOVA only), pvalue and adjusted pvalue (padj) for each analysis returned by DESeq2, limma, edgeR, ANOVA, or Kruskal-Wallis.
library(scran) se <- mockSCE() differential_expression <- BatchQC::DE_analyze(se = se, method = "DESeq2", batch = "Treatment", conditions = c( "Mutation_Status"), assay_to_analyze = "counts", padj_method = "BH") pval_summary(differential_expression) pval_plotter(differential_expression)library(scran) se <- mockSCE() differential_expression <- BatchQC::DE_analyze(se = se, method = "DESeq2", batch = "Treatment", conditions = c( "Mutation_Status"), assay_to_analyze = "counts", padj_method = "BH") pval_summary(differential_expression) pval_plotter(differential_expression)
This function checks if there is any numeric or strings for plotting legend
dendrogram_alpha_numeric_check(dendro_var)dendrogram_alpha_numeric_check(dendro_var)
dendro_var |
column from dendrogram object representing category |
geom_label label for the legend of category variable
library(scran) se <- mockSCE() dendro_alpha_numeric_check <- dendrogram_alpha_numeric_check( dendro_var = "Treatment") dendro_alpha_numeric_checklibrary(scran) se <- mockSCE() dendro_alpha_numeric_check <- dendrogram_alpha_numeric_check( dendro_var = "Treatment") dendro_alpha_numeric_check
This function creates the color palette used in the dendrogram plotter
dendrogram_color_palette(col, dendrogram_info)dendrogram_color_palette(col, dendrogram_info)
col |
string object representing color of the label |
dendrogram_info |
dendrogram_ends object |
annotation_color vector of colors corresponding to col variable
library(scran) se <- mockSCE() process_dendro <- BatchQC::process_dendrogram(se, "counts") dendrogram_ends <- process_dendro$dendrogram_ends col <- process_dendro$condition_var dendro_colors <- dendrogram_color_palette(col = "Treatment", dendrogram_info = dendrogram_ends) dendro_colorslibrary(scran) se <- mockSCE() process_dendro <- BatchQC::process_dendrogram(se, "counts") dendrogram_ends <- process_dendro$dendrogram_ends col <- process_dendro$condition_var dendro_colors <- dendrogram_color_palette(col = "Treatment", dendrogram_info = dendrogram_ends) dendro_colors
This function creates a dendrogram plot
dendrogram_plotter(se, assay, batch_var, category_var)dendrogram_plotter(se, assay, batch_var, category_var)
se |
SummarizedExperiment object |
assay |
assay to plot |
batch_var |
sample metadata column representing batch |
category_var |
sample metadata column representing category of interest |
named list of dendrogram plots
dendrogram is a dendrogram ggplot
circular_dendrogram is a circular dendrogram ggplot
library(scran) se <- mockSCE() dendrogram_plot <- BatchQC::dendrogram_plotter(se, "counts", "Mutation_Status", "Treatment") dendrogram_plot$dendrogram dendrogram_plot$circular_dendrogramlibrary(scran) se <- mockSCE() dendrogram_plot <- BatchQC::dendrogram_plotter(se, "counts", "Mutation_Status", "Treatment") dendrogram_plot$dendrogram dendrogram_plot$circular_dendrogram
This function calculated the goodness of fit of DESeq2 for larger sample sizes (intended for more than 150 samples).
DESeq_large_analysis( count_matrix, condition, other_variables, conditions_df, model_formula, num_samples, sampled, small_sample_cutoff )DESeq_large_analysis( count_matrix, condition, other_variables, conditions_df, model_formula, num_samples, sampled, small_sample_cutoff )
count_matrix |
matrix containing the data to be analyzed |
condition |
a vector containing a factor of the condition of interest (typically batch) |
other_variables |
a vector of strings of other variables of interest |
conditions_df |
data frame containing information for the other variables of interest (columns in order of the other_variables vector) |
model_formula |
the stat formula to be used in the DESeq analysis |
num_samples |
total number of samples to analyze |
sampled |
the down sampled matrix |
small_sample_cutoff |
value at which non-parametric test was used (considered "large sample size") vs parametric was used (considered "small sample size") |
a list containing the string recommendation
This function calculated the goodness of fit of DESeq2 for small sample sizes (intended for less than 20 samples).
DESeq2_small_size( count_matrix, condition, other_variables, conditions_df, model_formula, num_samples, small_sample_cutoff )DESeq2_small_size( count_matrix, condition, other_variables, conditions_df, model_formula, num_samples, small_sample_cutoff )
count_matrix |
matrix containing the data to be analyzed |
condition |
a vector containing a factor of the condition of interest (typically batch) |
other_variables |
a vector of strings of other variables of interest |
conditions_df |
data frame containing information for the other variables of interest (columns in order of the other_variables vector) |
model_formula |
the stat formula to be used in the DESeq analysis |
num_samples |
total number of samples to analyze |
small_sample_cutoff |
value at which non-parametric test was used (considered "large sample size") vs parametric was used (considered "small sample size") |
a list containing the string recommendation, the histogram and a reference for the original source of the test
This function calculated the goodness of fit of edgeR for larger sample sizes (intended for more than 150 samples).
edgeR_large_analysis( count_matrix, condition, other_variables, conditions_df, model_formula, num_samples, sampled, small_sample_cutoff )edgeR_large_analysis( count_matrix, condition, other_variables, conditions_df, model_formula, num_samples, sampled, small_sample_cutoff )
count_matrix |
matrix containing the data to be analyzed |
condition |
a vector containing a factor of the condition of interest (typically batch) |
other_variables |
a vector of strings of other variables of interest |
conditions_df |
data frame containing information for the other variables of interest (columns in order of the other_variables vector) |
model_formula |
the stat formula to be used in the DESeq analysis |
num_samples |
total number of samples to analyze |
sampled |
the down sampled matrix |
small_sample_cutoff |
value at which non-parametric test was used (considered "large sample size") vs parametric was used (considered "small sample size") |
a list containing the string recommendation
This function calculated the goodness of fit of edgeR for small sample sizes (intended for less than or equal to 20 samples).
edgeR_small_size( count_matrix, condition, other_variables, conditions_df, model_formula, num_samples, small_sample_cutoff )edgeR_small_size( count_matrix, condition, other_variables, conditions_df, model_formula, num_samples, small_sample_cutoff )
count_matrix |
matrix containing the data to be analyzed |
condition |
a vector containing a factor of the condition of interest (typically batch) |
other_variables |
a vector of strings of other variables of interest |
conditions_df |
data frame containing information for the other variables of interest (columns in order of the other_variables vector) |
model_formula |
the stat formula to be used in the DESeq analysis |
num_samples |
total number of samples to analyze |
small_sample_cutoff |
value at which non-parametric test was used (considered "large sample size") vs parametric was used (considered "small sample size") |
a list containing the string recommendation, the histogram and a reference for the original source of the test
This function allows you to plot explained variation
EV_plotter(batchqc_ev)EV_plotter(batchqc_ev)
batchqc_ev |
table of explained variation from batchqc_explained_variation |
boxplot of explained variation
library(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) expl_var_result <- batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") EV_boxplot <- BatchQC::EV_plotter(expl_var_result[[1]]) EV_boxplotlibrary(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) expl_var_result <- batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") EV_boxplot <- BatchQC::EV_plotter(expl_var_result[[1]]) EV_boxplot
EV Table Returns table with percent variation explained for specified number of genes
EV_table(batchqc_ev)EV_table(batchqc_ev)
batchqc_ev |
explained variation results from batchqc_explained_variation |
List of explained variation by batch and condition
library(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) exp_var_result <- BatchQC::batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") EV_table <- BatchQC::EV_table(exp_var_result[[1]]) EV_tablelibrary(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) exp_var_result <- BatchQC::batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") EV_table <- BatchQC::EV_table(exp_var_result[[1]]) EV_table
Helper function to get residuals
get.res(y, X)get.res(y, X)
y |
assay |
X |
model matrix design |
residuals
This function calculates goodness-of-fit pvalues for all genes by looking at how the NB model by edgeR or DESeq2 fit the data
goodness_of_fit_nb( se, count_matrix, condition, other_variables = NULL, method = "edgeR", num_genes = 500, small_sample_cutoff = 21 )goodness_of_fit_nb( se, count_matrix, condition, other_variables = NULL, method = "edgeR", num_genes = 500, small_sample_cutoff = 21 )
se |
the se object; se object where all the data is contained |
count_matrix |
string; name of the assay with gene expression matrix (in counts) |
condition |
string; name of the se colData with the condition status |
other_variables |
string; name of the se colData containing other variables of interest that should be considered in the model |
method |
string; method to use for the parametric or non-parametric test;either "DESeq2" or "edgeR"; default is "edgeR" |
num_genes |
down sample value, default is 500 (or all genes if less) |
small_sample_cutoff |
value at which non-parametric test will be used (considered "large sample size") vs parametric will be used (considered "small sample size); default is 21 |
a matrix of p-values where each row is a gene and each column is a level within the condition of interest
# example code for small sample library(scran) se <- mockSCE(ncells = 20) se$Treatment <- as.factor(se$Treatment) se$Mutation_Status <- as.factor(se$Mutation_Status) nb_results <- goodness_of_fit_nb(se = se, count_matrix = "counts", condition = "Treatment", other_variables = "Mutation_Status", method = "edgeR") nb_results[1] nb_results[2] nb_results[3] # example code for large sample library(scran) se <- mockSCE(ncells = 150) se$Treatment <- as.factor(se$Treatment) se$Mutation_Status <- as.factor(se$Mutation_Status) nb_results <- goodness_of_fit_nb(se = se, count_matrix = "counts", condition = "Treatment", other_variables = "Mutation_Status", method = "edgeR") nb_results[1] nb_results[2] nb_results[3]# example code for small sample library(scran) se <- mockSCE(ncells = 20) se$Treatment <- as.factor(se$Treatment) se$Mutation_Status <- as.factor(se$Mutation_Status) nb_results <- goodness_of_fit_nb(se = se, count_matrix = "counts", condition = "Treatment", other_variables = "Mutation_Status", method = "edgeR") nb_results[1] nb_results[2] nb_results[3] # example code for large sample library(scran) se <- mockSCE(ncells = 150) se$Treatment <- as.factor(se$Treatment) se$Mutation_Status <- as.factor(se$Mutation_Status) nb_results <- goodness_of_fit_nb(se = se, count_matrix = "counts", condition = "Treatment", other_variables = "Mutation_Status", method = "edgeR") nb_results[1] nb_results[2] nb_results[3]
Harman Correction This function applies Harman correction to a summarized experiment object with and reconstructs the data back into the original feature space
Harman_correction( se, assay_to_normalize, batch, covar, output_assay_name, limit = 0.95, numrepeats = 100000L )Harman_correction( se, assay_to_normalize, batch, covar, output_assay_name, limit = 0.95, numrepeats = 100000L )
se |
SummarizedExperiment object |
assay_to_normalize |
string; name of assay that should be corrected |
batch |
factor; The variable that represents batch |
covar |
string; name of covariate. |
output_assay_name |
string; name of results assay |
limit |
numeric; default: 0.95 Indicates the limit of confidence in which to stop removing a batch effect. Must be between 0 and 1 |
numrepeats |
integer; default: 100000L the number of repeats in which to run the simulated batch mean distribution estimator using the random selection algorithm. |
SE object with an added Harman corrected reconstructed data
This function converts any found numerics to characters
heatmap_num_to_char_converter(ann_col)heatmap_num_to_char_converter(ann_col)
ann_col |
column data of heatmap |
ann_col modified column data of heatmap
library(scran) se <- mockSCE() col_info <- colData(se) ann_col <- heatmap_num_to_char_converter(ann_col = col_info) ann_collibrary(scran) se <- mockSCE() col_info <- colData(se) ann_col <- heatmap_num_to_char_converter(ann_col = col_info) ann_col
This function allows you to plot a heatmap
heatmap_plotter(se, assay, nfeature, annotation_column, log_option)heatmap_plotter(se, assay, nfeature, annotation_column, log_option)
se |
SummarizedExperiment |
assay |
normalized or corrected assay |
nfeature |
number of features to display |
annotation_column |
choose column |
log_option |
TRUE if data should be logged before plotting (recommended for sequencing counts), FALSE if data should not be logged (for instance, data is already logged) |
heatmap plot
library(scran) se <- mockSCE() heatmaps <- BatchQC::heatmap_plotter(se, assay = "counts", nfeature = 15, annotation_column = c("Mutation_Status", "Treatment"), log_option = FALSE) correlation_heatmap <- heatmaps$correlation_heatmap correlation_heatmap heatmap <- heatmaps$topn_heatmap heatmaplibrary(scran) se <- mockSCE() heatmaps <- BatchQC::heatmap_plotter(se, assay = "counts", nfeature = 15, annotation_column = c("Mutation_Status", "Treatment"), log_option = FALSE) correlation_heatmap <- heatmaps$correlation_heatmap correlation_heatmap heatmap <- heatmaps$topn_heatmap heatmap
Used in conjunction with the lambda
is_design_balanced(se, batch, covariate)is_design_balanced(se, batch, covariate)
se |
summarized experiment object |
batch |
string, batch variable |
covariate |
string, biological covariate |
Boolean Value, TRUE if the experimental design is balanced, FALSE if the experimental design is not balanced
library(scran) se <- mockSCE() balanced_design_check <- is_design_balanced(se, batch = "Mutation_Status", covariate = "Treatment") balanced_design_checklibrary(scran) se <- mockSCE() balanced_design_check <- is_design_balanced(se, batch = "Mutation_Status", covariate = "Treatment") balanced_design_check
adapted from kBET package (https://github.com/theislab/kBET).
kBET runs a chi square test to evaluate
the probability of a batch effect.
kBET( df, batch, k0 = NULL, knn = NULL, testSize = NULL, do.pca = TRUE, dim.pca = 50, heuristic = TRUE, n_repeat = 100, alpha = 0.05, addTest = FALSE, verbose = FALSE, plot = TRUE, adapt = TRUE )kBET( df, batch, k0 = NULL, knn = NULL, testSize = NULL, do.pca = TRUE, dim.pca = 50, heuristic = TRUE, n_repeat = 100, alpha = 0.05, addTest = FALSE, verbose = FALSE, plot = TRUE, adapt = TRUE )
df |
dataset (rows: cells, columns: features) |
batch |
batch id for each cell or a data frame with both condition and replicates |
k0 |
number of nearest neighbours to test on (neighbourhood size) |
knn |
an n x k matrix of nearest neighbours for each cell (optional) |
testSize |
number of data points to test, (10 percent sample size default, but at least 25) |
do.pca |
perform a pca prior to knn search? (defaults to TRUE) |
dim.pca |
if do.pca=TRUE, choose the number of dimensions to consider (defaults to 50) |
heuristic |
compute an optimal neighbourhood size k (defaults to TRUE) |
n_repeat |
to create a statistics on batch estimates, evaluate 'n_repeat' subsets |
alpha |
significance level |
addTest |
perform an LRT-approximation to the multinomial test AND a multinomial exact test (if appropriate) |
verbose |
displays stages of current computation (defaults to FALSE) |
plot |
if stats > 10, then a boxplot of the resulting rejection rates is created |
adapt |
In some cases, a number of cells do not contribute to any neighbourhood and this may cause an imbalance in observed and expected batch label frequencies. Frequencies will be adapted if adapt=TRUE (default). |
list object
summary - a rejection rate for the data,
an expected rejection rate for random
labeling and the significance for the observed result
results - detailed list for each tested cells;
p-values for expected and observed label distribution
average.pval - significance level over the averaged
batch label distribution in all neighbourhoods
stats - extended test summary for every sample
params - list of input parameters and adapted parameters,
respectively
outsider - only shown if adapt=TRUE. List of samples
without mutual nearest neighbour:
index - index of each outsider sample)
categories - tabularised labels of outsiders
p.val - Significance level of outsider batch label
distribution vs expected frequencies.
If the significance level is lower than alpha,
expected frequencies will be adapted
If the optimal neighbourhood size (k0) is smaller than 10, NA is returned.
library(scran) se <- mockSCE() df <- as.matrix(assays(se)[["counts"]]) batch <- data.frame(colData(se))[, "Treatment"] batch.estimate <- kBET(df, batch)library(scran) se <- mockSCE() df <- as.matrix(assays(se)[["counts"]]) batch <- data.frame(colData(se))[, "Treatment"] batch.estimate <- kBET(df, batch)
Limma Correction This function applies limma batch correction to your provided assay
limma_correction(se, assay_to_normalize, batch, covar, output_assay_name)limma_correction(se, assay_to_normalize, batch, covar, output_assay_name)
se |
SummarizedExperiment object |
assay_to_normalize |
Log assay that should be corrected |
batch |
Factor containing batch information |
covar |
list of covariates |
output_assay_name |
name of results assay |
SE object with an added limma corrected array
This is support data for the TB data set that contains the BMI data and ID numbers from both the curatedTBData database and the original study the data was used in
data(merged_IDs)data(merged_IDs)
A data frame with 91 rows and 3 columns
Subject ID found in curatedTBData
Subject ID in the original paper
subject's BMI from the original study
This function creates a histogram from the negative binomial goodness-of-fit adjusted pvalues.
nb_histogram(p_val_table)nb_histogram(p_val_table)
p_val_table |
table of adjusted p-values from the nb test |
a histogram of the number of genes within a p-value range
This function determines the proportion of p-values below a specific value and compares to the previously determined threshold
nb_proportion( p_val_table, low_pval, threshold, num_samples, small_sample_cutoff, method )nb_proportion( p_val_table, low_pval, threshold, num_samples, small_sample_cutoff, method )
p_val_table |
table of p-values from the nb test |
low_pval |
value of the p-value cut off to use in proportion |
threshold |
the value to compare the proportion of p-values to for data sets |
num_samples |
the number of samples in the analysis |
small_sample_cutoff |
value at which non-parametric test will be used (considered "large sample size") vs parametric will be used (considered "small sample size); default is 20 |
method |
string; method utilized for the parametric or non-parametric test; either "DESeq2" or "edgeR" |
a statement about whether DESeq2 is appropriate to use for analysis
This function allows you to add normalized count matrix to the SE object
normalize_SE( se, method, log_bool, assay_to_normalize, output_assay_name, condition = NULL, batch = NULL )normalize_SE( se, method, log_bool, assay_to_normalize, output_assay_name, condition = NULL, batch = NULL )
se |
SummarizedExperiment Object |
method |
string; Normalization Method, either 'CPM', 'DESeq', 'edgeR', 'voom', or 'none' for log(x+1) only |
log_bool |
True or False; True to log normalize the data set after normalization method |
assay_to_normalize |
string; SE assay to do normalization on |
output_assay_name |
string; name for the resulting normalized assay |
condition |
string; the biological variable of interest, required for voom, default 'NULL' |
batch |
string; the batch variable, required for voom, default 'NULL' |
the original SE object with normalized assay appended
library(scran) se <- mockSCE() se_CPM_normalized <- BatchQC::normalize_SE(se, method = "CPM", log_bool = FALSE, assay_to_normalize = "counts", output_assay_name = "CPM_normalized_counts") se_DESeq_normalized <- BatchQC::normalize_SE(se, method = "DESeq", log_bool = FALSE, assay_to_normalize = "counts", output_assay_name = "DESeq_normalized_counts") se_CPM_normalized se_DESeq_normalizedlibrary(scran) se <- mockSCE() se_CPM_normalized <- BatchQC::normalize_SE(se, method = "CPM", log_bool = FALSE, assay_to_normalize = "counts", output_assay_name = "CPM_normalized_counts") se_DESeq_normalized <- BatchQC::normalize_SE(se, method = "DESeq", log_bool = FALSE, assay_to_normalize = "counts", output_assay_name = "DESeq_normalized_counts") se_CPM_normalized se_DESeq_normalized
This function allows you to plot PCA
PCA_plotter(se, nfeature, color, shape, batch, assays, xaxisPC, yaxisPC, log_option = FALSE)PCA_plotter(se, nfeature, color, shape, batch, assays, xaxisPC, yaxisPC, log_option = FALSE)
se |
SummarizedExperiment object |
nfeature |
number of features |
color |
choose a color |
shape |
choose a shape |
batch |
variable representing batch (for ellipses) |
assays |
array of assay names from |
xaxisPC |
the PC to plot as the x axis |
yaxisPC |
the PC to plot as the y axis |
log_option |
TRUE if data should be logged before plotting (recommended for sequencing counts), FALSE if data should not be logged (for instance, data is already logged); FALSE by default |
List containing PCA info, PCA variance and PCA plot
library(scran) se <- mockSCE() se_object_ComBat_Seq <- BatchQC::batch_correct(se, method = "ComBat-Seq", assay_to_normalize = "counts", batch = "Mutation_Status", covar = "Treatment", output_assay_name = "ComBat_Seq_Corrected") pca_plot <- BatchQC::PCA_plotter(se = se_object_ComBat_Seq, nfeature = 2, color = "Mutation_Status", shape = "Treatment", batch = "batch", assays = c("counts", "ComBat_Seq_Corrected"), xaxisPC = 1, yaxisPC = 2, log_option = FALSE) pca_plot$plot pca_plot$var_explainedlibrary(scran) se <- mockSCE() se_object_ComBat_Seq <- BatchQC::batch_correct(se, method = "ComBat-Seq", assay_to_normalize = "counts", batch = "Mutation_Status", covar = "Treatment", output_assay_name = "ComBat_Seq_Corrected") pca_plot <- BatchQC::PCA_plotter(se = se_object_ComBat_Seq, nfeature = 2, color = "Mutation_Status", shape = "Treatment", batch = "batch", assays = c("counts", "ComBat_Seq_Corrected"), xaxisPC = 1, yaxisPC = 2, log_option = FALSE) pca_plot$plot pca_plot$var_explained
This function performs DESeq on the permuted dataset.
permuted_DESeq( count_matrix, condition, other_variables, conditions_df, model_formula )permuted_DESeq( count_matrix, condition, other_variables, conditions_df, model_formula )
count_matrix |
matrix containing the data to be analyzed |
condition |
a vector containing a factor of the condition of interest (typically batch) |
other_variables |
a vector of strings of other variables of interest |
conditions_df |
data frame containing information for the other variables of interest (columns in order of the other_variables vector) |
model_formula |
the stat formula to be used in the DESeq analysis |
a DESeq2 object
This function performs edgeR on the permuted dataset adjusted pvalues.
permuted_edgeR( count_matrix, condition, other_variables, conditions_df, model_formula )permuted_edgeR( count_matrix, condition, other_variables, conditions_df, model_formula )
count_matrix |
matrix containing the data to be analyzed |
condition |
a vector containing a factor of the condition of interest (typically batch) |
other_variables |
a vector of strings of other variables of interest |
conditions_df |
data frame containing information for the other variables of interest (columns in order of the other_variables vector) |
model_formula |
the stat formula to be used in the DESeq analysis |
edgeR fit
This function formats the PCA plot using ggplot
plot_data(pca_plot_data, color, shape, batch, xaxisPC, yaxisPC)plot_data(pca_plot_data, color, shape, batch, xaxisPC, yaxisPC)
pca_plot_data |
Data for all assays to plot |
color |
variable that will be plotted as color |
shape |
variable that will be plotted as shape |
batch |
variable representing batch for the ellipses |
xaxisPC |
the PC to plot as the x axis |
yaxisPC |
the PC to plot as the y axis |
PCA plot
This function generates a boxplot of observed and expected rejection rates for the provided kBET output list object
plot_kBET(kBET_res)plot_kBET(kBET_res)
kBET_res |
list object output from kBET function |
ggplot object containing kBET rejection boxplot
library(scran) se <- mockSCE() df <- as.matrix(assays(se)[["counts"]]) batch <- data.frame(colData(se))[, "Treatment"] batch.estimate <- kBET(df, batch) plot_kBET(batch.estimate)library(scran) se <- mockSCE() df <- as.matrix(assays(se)[["counts"]]) batch <- data.frame(colData(se))[, "Treatment"] batch.estimate <- kBET(df, batch) plot_kBET(batch.estimate)
Create potential min_distance values for exploratory analysis based on the value of spread
possible_distances(spread)possible_distances(spread)
spread |
numeric; the value of spread used in the exploratory analysis |
vector of min_distance values to use in exploratory analysis
Create a vector of possible nearest neighbor values from 5, 15, 25, 50, and 100
possible_k_neighbors(data_size)possible_k_neighbors(data_size)
data_size |
size of the data set used to create umaps |
k nearest neighbor list
Preprocess assay data
preprocess(se, assay, nfeature, log_option)preprocess(se, assay, nfeature, log_option)
se |
Summarized Experiment object |
assay |
Assay from SummarizedExperiment object |
nfeature |
Number of variable features to use |
log_option |
"True" if data should be logged, "False" otherwise |
Returns processed data
This function processes count data for dendrogram plotting
process_dendrogram(se, assay)process_dendrogram(se, assay)
se |
SummarizedExperiment object |
assay |
assay to plot |
named list of dendrogram data
dendrogram_segments is data representing segments of the dendrogram
dendrogram_ends is data representing ends of the dendrogram
library(scran) se <- mockSCE() process_dendro <- BatchQC::process_dendrogram(se, "counts") process_dendrolibrary(scran) se <- mockSCE() process_dendro <- BatchQC::process_dendrogram(se, "counts") process_dendro
This data consists of two batches and two conditions corresponding to case and control. The columns are case/control samples, and the rows represent 39 different proteins.
data(protein_data)data(protein_data)
A data frame with 39 rows and 24 variables
This data consists of two batches and two conditions corresponding to case and control for the protein expression data
data(protein_sample_info)data(protein_sample_info)
A data frame with 24 rows and 2 variables:
Batch Indicator
Condition (Case vs Control) Indicator
P-value Plotter This function allows you to plot p-values of explained variation
pval_plotter(DE_results)pval_plotter(DE_results)
DE_results |
Differential Expression analysis result (a named list of dataframes corresponding to each analysis completed with a "pvalue" column) |
boxplots of pvalues for each condition
library(scran) se <- mockSCE() differential_expression <- BatchQC::DE_analyze(se = se, method = "DESeq2", batch = "Treatment", conditions = c( "Mutation_Status"), assay_to_analyze = "counts", padj_method = "BH") pval_summary(differential_expression) pval_plotter(differential_expression)library(scran) se <- mockSCE() differential_expression <- BatchQC::DE_analyze(se = se, method = "DESeq2", batch = "Treatment", conditions = c( "Mutation_Status"), assay_to_analyze = "counts", padj_method = "BH") pval_summary(differential_expression) pval_plotter(differential_expression)
Returns summary table for p-values of explained variation
pval_summary(res_list)pval_summary(res_list)
res_list |
Differential Expression analysis result (a named list of dataframes corresponding to each analysis completed with a "pvalue" column) |
summary table for p-values of explained variation for each analysis
library(scran) se <- mockSCE() differential_expression <- BatchQC::DE_analyze(se = se, method = "DESeq2", batch = "Treatment", conditions = c( "Mutation_Status"), assay_to_analyze = "counts", padj_method = "BH") pval_summary(differential_expression)library(scran) se <- mockSCE() differential_expression <- BatchQC::DE_analyze(se = se, method = "DESeq2", batch = "Treatment", conditions = c( "Mutation_Status"), assay_to_analyze = "counts", padj_method = "BH") pval_summary(differential_expression)
This function allows you to plot ratios of explained variation
ratio_plotter(ev_ratio)ratio_plotter(ev_ratio)
ev_ratio |
table of ratios from variation_ratios() |
boxplot of ratios
library(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) expl_var_result <- batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") ratios_results <- variation_ratios(expl_var_result[[1]], batch = "Mutation_Status") ratio_boxplot <- BatchQC::ratio_plotter(ratios_results) ratio_boxplotlibrary(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) expl_var_result <- batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") ratios_results <- variation_ratios(expl_var_result[[1]], batch = "Mutation_Status") ratio_boxplot <- BatchQC::ratio_plotter(ratios_results) ratio_boxplot
compute_aic
Helper function that contains the code to run the lognormal, voom, and
negative binomial AIC models for compute_aic
run_AIC_models(dat, design, nb_test, maxit)run_AIC_models(dat, design, nb_test, maxit)
dat |
dataframe of the data to analyze |
design |
stats design model to be used in the analyses |
nb_test |
boolean; should negative binomial run (must be discrete data) |
maxit |
integer; the max number of IWLS iterations |
data frame; containing the AIC results of each method
This function runs the k-nearest neighbor batch effect test (kBET) to evaluate whether the data has detectable batch effect.
run_kBET( se, assay_to_normalize, batch, k0 = NULL, knn = NULL, testSize = NULL, do.pca = TRUE, dim.pca = 50, heuristic = TRUE, n_repeat = 100, alpha = 0.05, addTest = FALSE, verbose = FALSE, adapt = TRUE )run_kBET( se, assay_to_normalize, batch, k0 = NULL, knn = NULL, testSize = NULL, do.pca = TRUE, dim.pca = 50, heuristic = TRUE, n_repeat = 100, alpha = 0.05, addTest = FALSE, verbose = FALSE, adapt = TRUE )
se |
SummarizedExperiment object |
assay_to_normalize |
string; assay from se object to do normalization |
batch |
character string of column name that represents batch |
k0 |
integer representing number of nearest neighbors to test on (neighborhood size) |
knn |
n x k matrix of nearest neighbors for each cell (optional) |
testSize |
integer representing number of data points to test |
do.pca |
Boolean, if TRUE, perform a pca prior to knn search (defaults to TRUE) |
dim.pca |
Boolean, if do.pca=TRUE, choose the number of dimensions to consider (defaults to 50) |
heuristic |
Boolean, if true, compute an optimal neighborhood size k (defaults to TRUE) |
n_repeat |
numeric representing 'n_repeat' subsets to evaluate in order to create a statistics on batch estimates |
alpha |
numeric for significance level |
addTest |
Boolean, if TRUE, perform an LRT-approximation to the multinomial test AND a multinomial exact test (if appropriate) |
verbose |
Boolean, if TRUE, display stages of current computation (defaults to FALSE) |
adapt |
Boolean, if TRUE, frequencies will be adapted (defaults to TRUE) |
list object from kBET() function
summary - a rejection rate for the data,
an expected rejection rate for random
labeling and the significance for the observed result
results - detailed list for each tested cells;
p-values for expected and observed label distribution
average.pval - significance level over the averaged
batch label distribution in all neighbourhoods
stats - extended test summary for every sample
params - list of input parameters and adapted parameters,
respectively
outsider - only shown if adapt=TRUE. List of samples
without mutual nearest neighbour:
index - index of each outsider sample)
categories - tabularised labels of outsiders
p.val - Significance level of outsider batch label
distribution vs expected frequencies.
If the significance level is lower than alpha,
expected frequencies will be adapted
library(scran) se <- mockSCE() kBET_result <- BatchQC::run_kBET( se=se, assay_to_normalize="counts", batch="Treatment" ) BatchQC::plot_kBET(kBET_result)library(scran) se <- mockSCE() kBET_result <- BatchQC::run_kBET( se=se, assay_to_normalize="counts", batch="Treatment" ) BatchQC::plot_kBET(kBET_result)
This functions determines if an experimental design is balanced, then calculates the lambda statistic for balanced designs and provides a recommendation on if batch correction should be utilized. In general, unbalanced designs always benefit from batch correction, while balanced designs with a lambda greater than -2 benefit from batch correction.
run_lambda(se, assay, batch, condition)run_lambda(se, assay, batch, condition)
se |
summarized experiment object |
assay |
string, the assay to analyze |
batch |
string, batch variable |
condition |
string, condition variable |
a named list with:
provides the output of compute_lambda function
string, rec for batch correction
a list with 2 parameters, 'lambda_stat' which contains the adj lambda value from lambda_compute (ln(lambda)) and 'correction_recommendation' which contains a string with a recommendation on if batch correction should be completed
library(scran) se <- mockSCE() lambda_calculation <- run_lambda(se, assay = "counts", batch = "Mutation_Status", condition = "Treatment") print(lambda_calculation$correction_recommendation) print(lambda_calculation$lambda_stat)library(scran) se <- mockSCE() lambda_calculation <- run_lambda(se, assay = "counts", batch = "Mutation_Status", condition = "Treatment") print(lambda_calculation$correction_recommendation) print(lambda_calculation$lambda_stat)
This data consists of three batches and ten conditions. The columns are samples, and the rows represent 1600 different genes.
data(signature_data)data(signature_data)
A data frame with 1600 rows and 89 variables
Calculate a standardized Pearson correlation coefficient
std_pearson_corr_coef(bd)std_pearson_corr_coef(bd)
bd |
batch design |
standardized Pearson correlation coefficient
library(scran) se <- mockSCE() batch_design_tibble <- batch_design(se, batch = "Mutation_Status", covariate = "Treatment") pearson_cor_result <- BatchQC::std_pearson_corr_coef(batch_design_tibble) pearson_cor_resultlibrary(scran) se <- mockSCE() batch_design_tibble <- batch_design(se, batch = "Mutation_Status", covariate = "Treatment") pearson_cor_result <- BatchQC::std_pearson_corr_coef(batch_design_tibble) pearson_cor_result
This function creates a summarized experiment object from count and metadata files uploaded by the user
summarized_experiment(counts, columndata)summarized_experiment(counts, columndata)
counts |
counts matrix |
columndata |
metadata dataframe |
a summarized experiment object
data(protein_data) data(protein_sample_info) se_object <- summarized_experiment(protein_data, protein_sample_info)data(protein_data) data(protein_sample_info) se_object <- summarized_experiment(protein_data, protein_sample_info)
Summary Stats EV Table Returns table with min, 1st quartile, mean, 2nd quartile, and max for each variable in the explained variation boxplot
summary_stats_EV_table(batchqc_ev)summary_stats_EV_table(batchqc_ev)
batchqc_ev |
explained variation results from |
summary_stats_table dataframe containing the min, 1st quartile, mean, 2nd quartile, and max for each variable in the explained variation boxplot
library(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) exp_var_result <- BatchQC::batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") summary_stats_table <- BatchQC::summary_stats_EV_table(exp_var_result[[1]]) summary_stats_tablelibrary(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) exp_var_result <- BatchQC::batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") summary_stats_table <- BatchQC::summary_stats_EV_table(exp_var_result[[1]]) summary_stats_table
sva Correction This function applies sva correction to a summarized experiment object (implementation adapted from sva::psva)
sva_correction( se, assay_to_normalize, var_of_interest, covar, output_assay_name, psva = FALSE )sva_correction( se, assay_to_normalize, var_of_interest, covar, output_assay_name, psva = FALSE )
se |
SummarizedExperiment object |
assay_to_normalize |
string; name of assay that should be corrected |
var_of_interest |
string; name of experimental variable of interest |
covar |
list; sting list of covariates to include in sva analysis |
output_assay_name |
string; name of results assay |
psva |
boolean; default: FALSE. If set to TRUE and no covariate input, psva function from the sva package will be used to remove batch effect. |
SE object with an added sva corrected array
svaseq Correction This function applies sva correction to a summarized experiment object with count based RNA-seq data
svaseq_correction( se, assay_to_normalize, var_of_interest, covar, output_assay_name, num_sv = FALSE )svaseq_correction( se, assay_to_normalize, var_of_interest, covar, output_assay_name, num_sv = FALSE )
se |
SummarizedExperiment object |
assay_to_normalize |
string; name of assay that should be corrected |
var_of_interest |
string; name of experimental variable of interest |
covar |
list; sting list of covariates to include in sva analysis |
output_assay_name |
string; name of results assay |
num_sv |
boolean; Default is FALSE: the number of estimated latent factor is set to 1 for a small number of samples. If set to TRUE, svaseq function will estimate the number of latent factors for you. |
SE object with an added sva corrected array
TB data upload This function uploads the TB data set from the curatedTBData package.
tb_data_upload()tb_data_upload()
a SE object with raw counts data and metadata
library(curatedTBData) se_object <- tb_data_upload()library(curatedTBData) se_object <- tb_data_upload()
Create a umap plot; wrapper function for umap package plus custom plotting
umap( se_object, assay_of_interest, batch, covar, neighbors = 15, min_distance = 0.1, spread = 1, exploratory = FALSE, log_option = FALSE )umap( se_object, assay_of_interest, batch, covar, neighbors = 15, min_distance = 0.1, spread = 1, exploratory = FALSE, log_option = FALSE )
se_object |
se_object; containing data of interest |
assay_of_interest |
string; the assay in the se_object to plot |
batch |
string; representing batch |
covar |
string; representing biological variable |
neighbors |
integer; number of nearest neighbors, default 15 per umap; lower values prioritize local structure, higher values will represent bigger picture but lose finer details |
min_distance |
numeric; how close points appear in final layout; higher values puts less emphasis on global structure; must be less than spread |
spread |
numeric; dispersion of points in umap |
exploratory |
Boolean; default is FALSE, if TRUE, a 5x5 grid with k = 15, 25, 50, 100 and min_distance = 0.1, .2, .5, .75, .99 will be plotted |
log_option |
Boolean; default is FALSE, if TRUE, log(assay_of_interest + 1) is computed and used for plotting; useful for count data |
umap plot
library(scran) se <- mockSCE() se$Treatment <- as.factor(se$Treatment) se$Mutation_Status <- as.factor(se$Mutation_Status) umap_plot <- BatchQC::umap(se_object = se, assay_of_interest = "counts", batch = "Treatment", covar = "Mutation_Status", log_option = TRUE) umap_plotlibrary(scran) se <- mockSCE() se$Treatment <- as.factor(se$Treatment) se$Mutation_Status <- as.factor(se$Mutation_Status) umap_plot <- BatchQC::umap(se_object = se, assay_of_interest = "counts", batch = "Treatment", covar = "Mutation_Status", log_option = TRUE) umap_plot
Creates Ratios of batch to variable variation statistic
variation_ratios(ex_variation_table, batch)variation_ratios(ex_variation_table, batch)
ex_variation_table |
table of explained variation results from batchqc_explained_variation |
batch |
batch |
dataframe with condition/batch ratios
library(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) expl_var_result <- batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") ratios_results <- variation_ratios(expl_var_result[[1]], batch = "Mutation_Status") ratios_resultslibrary(scran) se <- mockSCE() se$Mutation_Status <- as.factor(se$Mutation_Status) se$Treatment <- as.factor(se$Treatment) expl_var_result <- batchqc_explained_variation(se, batch = "Mutation_Status", condition = "Treatment", assay_name = "counts") ratios_results <- variation_ratios(expl_var_result[[1]], batch = "Mutation_Status") ratios_results
This function allows you to plot DE analysis results as a volcano plot
volcano_plot(DE_results, pslider = 0.05, fcslider)volcano_plot(DE_results, pslider = 0.05, fcslider)
DE_results |
a dataframe with the results of one of the DE Analysis; must include "log2FoldChange" and "pvalue" columns |
pslider |
Magnitude of significance value threshold, default is 0.05 |
fcslider |
Magnitude of expression change value threshold |
A volcano plot of expression change and significance value data
library(scran) se <- mockSCE() differential_expression <- BatchQC::DE_analyze(se = se, method = "DESeq2", batch = "Treatment", conditions = c( "Mutation_Status", "Cell_Cycle"), assay_to_analyze = "counts", padj_method = "BH") value <- round((max(abs( differential_expression[[length(differential_expression)]][, 1])) + min(abs( differential_expression[[length(differential_expression)]][, 1]))) / 2) volcano_plot(differential_expression[[1]], pslider = 0.05, fcslider = value)library(scran) se <- mockSCE() differential_expression <- BatchQC::DE_analyze(se = se, method = "DESeq2", batch = "Treatment", conditions = c( "Mutation_Status", "Cell_Cycle"), assay_to_analyze = "counts", padj_method = "BH") value <- round((max(abs( differential_expression[[length(differential_expression)]][, 1])) + min(abs( differential_expression[[length(differential_expression)]][, 1]))) / 2) volcano_plot(differential_expression[[1]], pslider = 0.05, fcslider = value)