| Title: | Automatic Generation of Functional Screen Analyses |
|---|---|
| Description: | Implements pipelines for generating functional screen analysis reports in the augere framework. This uses voom to test for differential abundance of barcodes with consolidation into gene-level statistics. Each pipeline function generates a self-contained Rmarkdown report with all of the steps required to reproduce the analysis. |
| Authors: | Aaron Lun [cre, aut] (ORCID: <https://orcid.org/0000-0002-3564-4813>) |
| Maintainer: | Aaron Lun <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.2 |
| Built: | 2026-05-15 06:34:42 UTC |
| Source: | https://github.com/bioc/augere.screen |
Implements pipelines for generating functional screen analysis reports in the augere framework. This uses voom to test for differential abundance of barcodes with consolidation into gene-level statistics. Each pipeline function generates a self-contained Rmarkdown report with all of the steps required to reproduce the analysis.
Maintainer: Aaron Lun [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/augere-bioinfo/augere.screen/issues
Use voom to test for differential abundance of barcodes in a functional screen.
runScreen( x, groups, comparisons, covariates = NULL, block = NULL, subset.factor = NULL, subset.levels = NULL, subset.groups = TRUE, design = NULL, contrasts = NULL, dc.block = NULL, robust = TRUE, trend = FALSE, quality = TRUE, lfc.threshold = 0, filter.reference.factor = NULL, filter.reference.levels = NULL, filter.num.mads = 3, filter.default = TRUE, norm.control.barcodes = NULL, norm.control.genes = NULL, norm.control.type.field = NULL, norm.control.types = NULL, norm.tmm = TRUE, gene.field = NULL, report.summary = TRUE, summary.threshold = 0.05, consolidation = c("simes", "fry", "holm-min"), fry.args = list(), holm.min.n = 3, holm.min.prop = 0.3, assay = 1, row.data = gene.field, gene.data = NULL, metadata = NULL, output.dir = "voom", author = NULL, dry.run = FALSE, save.results = TRUE, suppress.plots = FALSE )runScreen( x, groups, comparisons, covariates = NULL, block = NULL, subset.factor = NULL, subset.levels = NULL, subset.groups = TRUE, design = NULL, contrasts = NULL, dc.block = NULL, robust = TRUE, trend = FALSE, quality = TRUE, lfc.threshold = 0, filter.reference.factor = NULL, filter.reference.levels = NULL, filter.num.mads = 3, filter.default = TRUE, norm.control.barcodes = NULL, norm.control.genes = NULL, norm.control.type.field = NULL, norm.control.types = NULL, norm.tmm = TRUE, gene.field = NULL, report.summary = TRUE, summary.threshold = 0.05, consolidation = c("simes", "fry", "holm-min"), fry.args = list(), holm.min.n = 3, holm.min.prop = 0.3, assay = 1, row.data = gene.field, gene.data = NULL, metadata = NULL, output.dir = "voom", author = NULL, dry.run = FALSE, save.results = TRUE, suppress.plots = FALSE )
x |
A SummarizedExperiment object containing read counts for each barcode (row) and sample (column).
Alternatively, the output of |
groups |
String specifying the |
comparisons |
Character vector specifying two or more groups to compare from |
covariates |
Character vector specifying the |
block |
Character vector specifying the |
subset.factor |
String specifying the |
subset.levels |
Vector containing the levels of the |
subset.groups |
Boolean indicating whether to automatically subset the dataset to only those samples assigned to groups in |
design |
Matrix, function, or formula specifying the experimental design, see |
contrasts |
String, function or matrix specifying a custom contrast, or a list of such objects; see |
dc.block |
String specifying the blocking factor to use in |
robust |
Boolean indicating whether robust empirical Bayes shrinkage should be used in |
trend |
Boolean indicating whether variances should be shrunk towards a trend in |
quality |
Boolean indicating whether quality weighting should be performed. This reduces the influence of low-quality samples at the cost of more computational work. |
lfc.threshold |
Number specifying a threshold on the log-fold change in |
filter.reference.factor |
String specifying |
filter.reference.levels |
Character vector containing the reference levels in the |
filter.num.mads |
Number specifying the number of median absolute deviations below the median to define an outlier threshold for filtering barcodes.
Only used if |
filter.default |
Boolean indicating whether to filter barcodes using the default edgeR method, i.e., |
norm.control.barcodes |
A named list of character vectors containing one or more sets of control barcodes.
Each vector should contain the names of barcodes, corresponding to the row names of |
norm.control.genes |
A named list containing one or more sets of control genes.
Each vector should contain the names of genes, corresponding to the entries of the |
norm.control.type.field |
String specifying the |
norm.control.types |
Character vector containing the control feature types in the |
norm.tmm |
Boolean indicating whether TMM normalization should be used.
If |
gene.field |
String specifying the |
report.summary |
Boolean indicating whether to report barcode-level summaries within the per-gene data frames.
Only used if |
summary.threshold |
Number specifying the FDR threshold when summarizing the number of significant barcodes per gene.
Only used if |
consolidation |
Character vector specifying the consolidation method(s) to convert per-barcode statistics into per-gene results. This can be zero, one or more of:
Only used if |
fry.args |
Named list of additional arguments to pass to |
holm.min.n |
Integer specifying the minimum number of significant barcodes per gene when |
holm.min.prop |
Integer specifying the minimum proportion of significant barcodes per gene when |
assay |
String or integer specifying the assay of |
row.data |
Character vector specifying the |
gene.data |
Character vector containing the names of |
metadata |
Named list of additional metadata to store alongside each result. |
output.dir |
String containing the path to an output directory in which to write the Rmarkdown file and save results. |
author |
Character vector containg the names of the authors.
If |
dry.run |
Boolean indicating whether to perform a dry run.
This generates the Rmarkdown report in |
save.results |
Boolean indicating whether the results should be saved to file. |
suppress.plots |
Boolean indicating whether plots should be suppressed.
This can be set to |
A Rmarkdown report named report.Rmd is written inside output.dir that contains the analysis commands.
If dry.run=FALSE, a list is returned containing:
barcode, a named list of DataFrames containing barcode-level result tables.
Each DataFrame corresponds to a comparison/contrast where each row corresponds to a barcode (i.e., row) in se.
Each DataFrame contains the following columns:
AveAb, the average abundance.
t, the t-statistic.
(For non-ANOVA-like contrasts only.)
F, the F-statistic.
(For ANOVA-like contrasts only.)
LogFC, the log-fold change.
(For non-ANOVA-like contrasts only.)
LogFC.<COLUMN>, the log-fold change corresponding to each column of the contrast matrix.
(For ANOVA-like contrasts only.)
PValue, the p-value;
FDR, the Benjamini-Hochberg-adjusted p-value.
gene, a named list of lists containing one entry per method in consolidation.
Each inner list contains one DataFrame per contrast where each row corresponds to a gene (not barcode).
Each DataFrame contains at least the following columns:
NumBarcodes, the number of barcodes per gene (possibly after filtering).
PValue, the p-value for differential abundance across all barcodes for the gene.
FDR, the Benjamini-Hochberg-adjusted p-value.
Direction, the overall direction of change for all barcodes of the gene.
(For non-ANOVA-like contrasts only.)
NumUp, the number of barcodes with a significant increase in abundance.
(For non-ANOVA-like contrasts only, when report.summary=TRUE.)
NumDown, the number of barcodes with a significant decrease in abundance.
(For non-ANOVA-like contrasts only, when report.summary=TRUE.)
LogFC, the log-fold change of the barcode with the lowest p-value.
(For non-ANOVA-like contrasts only, when report.summary=TRUE.)
NumSig, the number of significant barcodes.
(For ANOVA-like contrasts only, when report.summary=TRUE.)
AveAb, the average abundance of the barcode with the lowest p-value.
Only reported if gene.field is not NULL.
normalized, a copy of x with normalized abundance values, possibly subsetted by sample.
This contains:
lib.size and norm.factors columns in its colData,
containing the library sizes and normalization factors, respectively.
a retained column in its rowData,
indicating whether a gene was retained after filtering.
a logcounts assay, containing the log-normalized counts.
normalized may be subsetted by sample, depending on subset.factor, subset.group, etc.
If save.results=TRUE, the results are saved in a results directory inside output.
If dry.run=TRUE, NULL is returned.
Only the Rmarkdown report is saved to file.
A reference sample should represent the barcode distribution before any perturbation, e.g., time zero in a time course. All barcodes should have similar abundances in the reference samples as they should be present at the same molarity in the original barcode library. If a barcode has very low abundance, it was probably missing from the original library due to some manufacturing defect.
To identify these low outliers, we compute the average abundance across reference samples for each barcode. We define a filtering threshold based on the median absolute deviation (MAD) below the median of the average abundances across barcodes. Barcodes with average abundances below this threshold are discarded.
We also use filterByExpr to remove low-abundance barcodes.
This ensures that all remaining barcodes have sufficiently large counts for voom,
especially if the reference samples themselves are not used in the rest of the analysis.
Control barcodes might not target any gene (non-targeting controls, or NTCs) or they might target non-essential genes (NEGs).
We assume that such barcodes do not genuinely change in abundance across conditions, allowing us to use them to compute normalization factors.
The choice of control barcodes can be made by one or more of the norm.barcodes, norm.genes and norm.types arguments.
If more than one of these modes is provided, the union of barcodes is used.
By default, we use TMM normalization (see calcNormFactors) on the control barcodes.
This avoids composition biases and provides some robustness against changes in abundance due to inadvertent biological activity.
If norm.tmm=FALSE, we instead use the total sum of counts across the controls to derive normalization factors.
This can be more precise but is more sensitive to genuine changes in abundance.
If no control barcodes are specified, normalization uses all barcodes that remain after filtering.
If norm.tmm=TRUE, TMM normalization is performed on all (filtered) barcodes.
Otherwise, the total sum of counts across all (filtered) barcodes is used.
Aaron Lun, Jean-Philippe Fortin
# Creating an example dataset. mu <- 2^runif(1000, 0, 5) grouping <- rep(c("A", "B", "ctrl"), each=3) counts <- matrix(rnbinom(length(mu)* length(grouping), mu=mu, size=20), ncol=length(grouping)) rownames(counts) <- sprintf("BARCODE-%s", seq_along(mu)) library(SummarizedExperiment) se <- SummarizedExperiment(list(counts=counts)) rowData(se)$type <- sample(c("NTC", "NEG", "other"), length(mu), replace=TRUE) rowData(se)$gene <- paste0("GENE-", sample(LETTERS, length(mu), replace=TRUE)) colData(se)$group <- grouping output <- tempfile() res <- runScreen( se, groups="group", comparisons=c("A", "B"), norm.control.type.field="type", norm.control.types="NEG", filter.reference.factor="group", filter.reference.levels="ctrl", gene.field="gene", output.dir=output ) list.files(output, recursive=TRUE) res$barcode[[1]] res$gene$simes[[1]] res$normalized# Creating an example dataset. mu <- 2^runif(1000, 0, 5) grouping <- rep(c("A", "B", "ctrl"), each=3) counts <- matrix(rnbinom(length(mu)* length(grouping), mu=mu, size=20), ncol=length(grouping)) rownames(counts) <- sprintf("BARCODE-%s", seq_along(mu)) library(SummarizedExperiment) se <- SummarizedExperiment(list(counts=counts)) rowData(se)$type <- sample(c("NTC", "NEG", "other"), length(mu), replace=TRUE) rowData(se)$gene <- paste0("GENE-", sample(LETTERS, length(mu), replace=TRUE)) colData(se)$group <- grouping output <- tempfile() res <- runScreen( se, groups="group", comparisons=c("A", "B"), norm.control.type.field="type", norm.control.types="NEG", filter.reference.factor="group", filter.reference.levels="ctrl", gene.field="gene", output.dir=output ) list.files(output, recursive=TRUE) res$barcode[[1]] res$gene$simes[[1]] res$normalized