| Title: | Automatic Generation of Differential Expression Analyses |
|---|---|
| Description: | Implements pipelines for generating differential expression analysis reports in the augere framework. This includes analyses with edgeR or voom-limma, with a variety of options for contrasts, blocking and covariates. Each pipeline function generates a self-contained Rmarkdown report with all of the steps required to reproduce the DE analysis. |
| Authors: | Aaron Lun [cre, aut] (ORCID: <https://orcid.org/0000-0002-3564-4813>) |
| Maintainer: | Aaron Lun <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.2 |
| Built: | 2026-05-15 06:35:02 UTC |
| Source: | https://github.com/bioc/augere.de |
Useful links:
Report bugs at https://github.com/augere-bioinfo/augere.de/issues
List all groups involved in contrasts, to use with subset.groups=TRUE.
This is mostly intended for use by developers of differential analysis pipelines.
findSubsetGroups(contrast.info)
contrast.info |
List of contrast information as returned by |
No subsetting by group is performed if covariate-based contrasts are present, as all samples are potentially informative in such an analysis.
Character vector of the groups involved in "versus" or "anova" contrasts.
If any covariate-based contrasts are present, NULL is always returned.
Aaron Lun
findSubsetGroups(processSimpleComparisons(c("disease", "healthy")))
findSubsetGroups(processSimpleComparisons(c("treated1", "treated2", "healthy")))
findSubsetGroups(processSimpleComparisons(c("age")))
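The rule described above (collect the groups from "versus" and "anova" contrasts, but return NULL as soon as a covariate-based contrast appears) can be sketched in base R. This is only an illustration, not the package's implementation: `find_subset_groups_sketch` is a hypothetical name, the contrast lists are written by hand using the field names documented for processSimpleComparisons, and any other contrast type is simply ignored here.

```r
# Hypothetical stand-in for findSubsetGroups(); not the package's own code.
find_subset_groups_sketch <- function(contrast.info) {
    collected <- character(0)
    for (info in contrast.info) {
        if (info$type == "covariate") {
            # All samples are potentially informative, so no subsetting.
            return(NULL)
        } else if (info$type == "versus") {
            collected <- c(collected, unlist(info$left), unlist(info$right))
        } else if (info$type == "anova") {
            collected <- c(collected, unlist(info$groups))
        }
        # Other types are ignored in this sketch (an assumption).
    }
    unique(collected)
}

# Hand-written contrast information (an assumption, not real package output).
versus <- list(type = "versus", left = list("disease"), right = list("healthy"))
anova.info <- list(type = "anova",
    groups = list(list("treated1"), list("treated2"), list("healthy")))
covar <- list(type = "covariate", covariate = list("age"))

only.groups <- find_subset_groups_sketch(list(versus, anova.info))
with.covariate <- find_subset_groups_sketch(list(versus, covar))
```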
Load an example RNA-seq dataset from the airway package. This is a convenience wrapper that strips out some unnecessary metadata.
loadExampleDataset()
A RangedSummarizedExperiment containing the airway dataset.
Aaron Lun
loadExampleDataset()
Create R commands to define the contrast metadata for the results of runEdgeR and runVoom.
processContrastMetadata(info)
info |
List containing information for a single contrast.
This is typically an entry of the list returned by |
Character vector containing arguments for constructing the gpsa.differential_gene_expression sublist of the result metadata.
Aaron Lun
contrast.info <- processSimpleComparisons(list(
    main=c("disease", "healthy"),
    secondary="dosage"
))
cat(processContrastMetadata(contrast.info[[1]]), sep="\n")
cat(processContrastMetadata(contrast.info[[2]]), sep="\n")
Construct R commands to define custom contrast vectors/matrices.
processCustomContrasts(
    contrasts,
    design.name = "design",
    contrast.name = "con"
)
contrasts |
Numeric matrix or vector, a function to generate such a matrix/vector,
or a character vector to be passed to |
design.name |
String containing the variable name of the design matrix. |
contrast.name |
String containing the variable name of the contrast vector/matrix. |
If contrasts is a character vector, it will be passed to makeContrasts with levels= set to the design matrix.
A vector of length 2 or more represents an ANOVA-like comparison.
If contrasts is a numeric vector, it should have names that match some or all of the column names of the design matrix.
The values of this vector represent the entries of the contrast vector for the named coefficients; all other entries of the contrast vector are set to zero.
If contrasts is a numeric matrix, its row names should match some or all of the column names of the design matrix.
The rows of this matrix represent the rows of the contrast matrix for the named coefficients; all other entries of the contrast matrix are set to zero.
The column names of the contrast matrix are set to those of contrasts, if available.
If contrasts is a function, it should accept the design matrix and return a contrast vector/matrix.
This mode is useful when the deparsed design matrix is difficult to read in the Rmarkdown report.
A list containing one entry per comparison, where each entry contains:
title, the title for the comparison.
If contrasts is named, the corresponding (non-empty) name is used here, otherwise an appropriate title is automatically generated.
type, the type of the comparison.
This is always set to "custom".
commands, character vector of R commands that produce the desired contrast vector/matrix.
This assumes that the evaluation environment has a design matrix named design.name.
The newly defined contrast vector/matrix will be stored as contrast.name in the environment.
processCustomContrasts(c(coef1 = 0.5, coef2 = 0.5, coef3 = -1))

mat <- matrix(0, 3, 2)
rownames(mat) <- c("grp1", "grp2", "grp3")
mat["grp1",] <- 1
mat["grp2",1] <- -1
mat["grp3",2] <- -1
processCustomContrasts(mat)

processCustomContrasts("TREATMENT - CONTROL")

processCustomContrasts(function(design) {
    out <- numeric(ncol(design))
    names(out) <- colnames(design)
    out[c("treated", "control")] <- c(1, -1)
    out
})
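To make the zero-filling rule for named numeric vectors and matrices concrete, the following base R sketch expands partial contrasts against a toy design matrix. The design, the sample annotations and the variable names (`partial`, `partial.mat`) are made up for illustration; the commands that processCustomContrasts actually generates may look different.

```r
# Toy design with one column per group plus a covariate (illustrative only).
group <- factor(rep(c("grp1", "grp2", "grp3"), each = 2))
age <- c(30, 42, 35, 51, 28, 47)
design <- model.matrix(~ 0 + group + age)
colnames(design) <- c(levels(group), "age")

# Named numeric vector: named coefficients get the supplied values,
# every other coefficient is set to zero.
partial <- c(grp1 = 1, grp3 = -1)
con <- numeric(ncol(design))
names(con) <- colnames(design)
con[names(partial)] <- partial

# Numeric matrix: row names select coefficients, each column is one contrast.
partial.mat <- matrix(c(1, 0, -1, 0, 1, -1), nrow = 3,
    dimnames = list(c("grp1", "grp2", "grp3"), c("A", "B")))
con.mat <- matrix(0, ncol(design), ncol(partial.mat),
    dimnames = list(colnames(design), colnames(partial.mat)))
con.mat[rownames(partial.mat), ] <- partial.mat

con
con.mat
```

In both cases the unnamed `age` coefficient ends up with a zero entry, so the covariate does not contribute to the tested comparison.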
Construct R commands to create a custom design matrix, to be inserted into the generated Rmarkdown report.
processCustomDesignMatrix(design, se.name, design.name = "design")
design |
Function, formula or matrix specifying the experimental design. |
se.name |
String containing the variable name of the SummarizedExperiment object. |
design.name |
String containing the variable name of the design matrix. |
If design is a function, it should accept a SummarizedExperiment and return the matrix.
If design is a formula, it should use the column names of the SummarizedExperiment's colData.
This is passed to model.matrix using data=colData(se) for a SummarizedExperiment se.
If design is a numeric matrix, it is deparsed and used verbatim.
The number of rows of this matrix should be equal to the number of samples.
The design matrix is expected to be of full column rank.
Character vector containing R commands that, upon evaluation, create a design matrix in the evaluation environment.
Aaron Lun
processSimpleDesignMatrix, for creating simple design matrices.
processCustomContrasts, to create custom contrasts.
cat(processCustomDesignMatrix(~ batch + treatment, "se"))

cat(processCustomDesignMatrix(function(x) {
    model.matrix(~ batch + treatment, data=colData(x))
}, "se"))

batch <- factor(rep(1:3, each=2))
treatment <- rep(c("Drg", "Ctrl"), 3)
mat <- model.matrix(~ batch + treatment)
cat(processCustomDesignMatrix(mat, "se"), sep="\n")
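The two requirements on a custom design matrix (one row per sample, full column rank) can be checked by hand before passing it to the pipeline. The sketch below uses a plain data.frame named `coldata` as a stand-in for the SummarizedExperiment's colData, and a hypothetical `site` column to provoke a rank deficiency; neither is part of the package.

```r
# Stand-in for colData(se): a plain data.frame of sample annotations.
coldata <- data.frame(
    batch = factor(rep(1:3, each = 2)),
    treatment = factor(rep(c("Drg", "Ctrl"), 3))
)
design <- model.matrix(~ batch + treatment, data = coldata)

# One row per sample, and as many independent columns as coefficients.
rows.ok <- nrow(design) == nrow(coldata)
rank.ok <- qr(design)$rank == ncol(design)

# A confounded design: 'site' (hypothetical) duplicates 'batch',
# so some coefficients cannot be estimated.
coldata$site <- coldata$batch
bad <- model.matrix(~ batch + site, data = coldata)
bad.rank.deficient <- qr(bad)$rank < ncol(bad)
```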
Construct R commands to define contrast vectors/matrices for simple comparisons between groups or covariates.
processSimpleComparisons(
    comparisons,
    design.name = "design",
    contrast.name = "con"
)
comparisons |
Character vector specifying the groups to compare or the covariates to test. Non-character vectors will be coerced into character vectors. Alternatively, a (possibly named) list of such vectors where each entry represents a different comparison. |
design.name |
String containing the variable name of the design matrix. |
contrast.name |
String containing the variable name of the contrast vector/matrix. |
If a vector in comparisons is unnamed and of length 1, the sole entry is assumed to refer to a covariate in the covariates from processSimpleDesignMatrix.
The null hypothesis is that the coefficient of the design matrix with the same name is zero, i.e., the covariate has no effect.
If a comparison vector is unnamed and of length 2, the entries represent the names of two groups in the specified groups factor.
The null hypothesis is that there is no differential expression between the two groups.
The log-fold change is defined as the first group (left) over the second (right).
If a comparison vector is unnamed and of length 3 or greater, the null hypothesis is that all of the specified levels of groups are equal.
This is an ANOVA-like contrast where contrasts are formulated with respect to the last level,
i.e., for n coefficients, n-1 log-fold changes are reported representing the differences relative to the last coefficient.
If the comparison vector is named, all entries with the same name are assumed to represent a group of coefficients.
If there is only one unique name, all entries of the vector are assumed to refer to entries of covariates.
The null hypothesis is that the average of the coefficients for the specified covariates is equal to zero.
If there are exactly two unique names, these are assumed to refer to two sets of entries of groups.
The null hypothesis is that the averages of the per-group coefficients are equal between the two sets.
If there are three or more names, these are assumed to refer to multiple sets of entries of groups.
The null hypothesis is that the averages of the per-group coefficients are equal across all sets, i.e., an ANOVA-like comparison.
A list containing one entry per comparison, where each entry contains:
title, the title for the comparison.
If comparisons is named, the corresponding (non-empty) name is used here, otherwise an appropriate title is automatically generated.
type, the type of the comparison.
This is one of "covariate" (for covariates), "versus" (for comparison between two groups or two sets of groups) or "anova" (for ANOVAs).
left and right, lists of strings specifying the groups on the left (numerator) or right (denominator) of a “versus” comparison.
Only present if type = "versus".
groups, list of lists of strings specifying the groups involved in an ANOVA-like comparison.
Only present if type = "anova".
covariate, list of strings specifying the covariate(s) being tested.
Only present if type = "covariate".
commands, character vector of R commands that produce the desired contrast vector/matrix.
This assumes that the evaluation environment has a design matrix named design.name.
The newly defined contrast vector/matrix will be stored as contrast.name in the environment.
processSimpleDesignMatrix, to generate the corresponding design matrix.
processCustomContrasts, to define more complex custom contrasts.
processSimpleComparisons(c("disease", "healthy"))
processSimpleComparisons("dosage")
processSimpleComparisons(c("untreated", "treated", "healthy"))
processSimpleComparisons(c(treated="treatment1", treated="treatment2", healthy="healthy"))
processSimpleComparisons(list(
    main=c("disease", "healthy"),
    secondary="dosage"
))
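The contrast rules described above can be written out by hand for a small case. The sketch below assumes a no-intercept design whose coefficients are named `A` to `D` (hypothetical names) and builds the contrast for a two-group comparison, an ANOVA-like comparison relative to the last level, and a comparison between two named sets. The column names of the ANOVA-like matrix are a choice made here, not necessarily what the package emits.

```r
# Hypothetical coefficient names for a no-intercept design with four groups.
coefs <- c("A", "B", "C", "D")

# Two groups: log-fold change of the first (left) over the second (right).
versus <- setNames(numeric(length(coefs)), coefs)
versus[c("A", "B")] <- c(1, -1)

# Three groups: one column per difference relative to the last level.
lev <- c("A", "B", "C")
anova.con <- matrix(0, length(coefs), length(lev) - 1,
    dimnames = list(coefs, head(lev, -1)))
for (g in head(lev, -1)) {
    anova.con[g, g] <- 1
    anova.con[tail(lev, 1), g] <- -1
}

# Two named sets: the average of A and B against C.
sets <- c(treated = "A", treated = "B", control = "C")
left <- sets[names(sets) == "treated"]
right <- sets[names(sets) == "control"]
averaged <- setNames(numeric(length(coefs)), coefs)
averaged[left] <- 1 / length(left)
averaged[right] <- -1 / length(right)

versus
anova.con
averaged
```

Every contrast sums to zero across the group coefficients, which is what makes each one a comparison between groups rather than a test of absolute expression.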
Construct R commands to create a simple design matrix, to be inserted into the generated Rmarkdown report.
processSimpleDesignMatrix(
    groups,
    covariates,
    block,
    se.name,
    design.name = "design"
)
groups |
String specifying the |
covariates |
Character vector specifying the |
block |
Character vector specifying |
se.name |
String containing the variable name of the SummarizedExperiment object. |
design.name |
String containing the variable name of the design matrix. |
When creating the design matrix, groups, covariates and block are treated as additive factors.
If groups is specified, the design matrix will not have an intercept.
The first few columns will correspond to the levels of groups and are named by concatenating groups with the factor level.
If groups is not supplied, the design matrix will have an intercept in the first column.
If covariates are specified, they are represented by the columns after the per-group columns (if groups is supplied) or the intercept (otherwise).
All remaining columns will correspond to the various levels of the block factors.
Syntactically invalid column names and levels can be used for all arguments.
The design matrix is expected to be of full column rank, e.g., groups is not confounded with elements of block.
Character vector containing R commands that, upon evaluation, create a design matrix in the evaluation environment.
Aaron Lun
processCustomDesignMatrix, for creating more complex design matrices.
processSimpleComparisons, to create contrasts for this design matrix.
cat(processSimpleDesignMatrix("my_groups", "age", "batch", "se"), sep="\n")
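The column layout described above (per-group columns without an intercept, then covariates, then blocking levels) resembles what base R's model.matrix produces for an additive formula. The sketch below builds such matrices directly from a made-up `coldata` data.frame; it illustrates the layout only and does not reproduce the commands that processSimpleDesignMatrix generates.

```r
# Made-up sample annotations standing in for colData(se).
coldata <- data.frame(
    my_groups = factor(rep(c("ctrl", "trt"), each = 3)),
    age = c(31, 45, 52, 29, 48, 55),
    batch = factor(rep(c("b1", "b2", "b3"), 2))
)

# With a grouping factor: no intercept, one column per group level first.
with.groups <- model.matrix(~ 0 + my_groups + age + batch, data = coldata)
colnames(with.groups)

# Without a grouping factor: intercept first, then covariates and blocks.
no.groups <- model.matrix(~ age + batch, data = coldata)
colnames(no.groups)
```

Dropping the intercept when groups are present gives each group its own coefficient, which is what lets a comparison be written as a simple difference between group columns.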
Test for differentially expressed (DE) genes from an RNA-seq count matrix using the quasi-likelihood (QL) framework in edgeR.
runEdgeR(
    x,
    groups,
    comparisons,
    covariates = NULL,
    block = NULL,
    subset.factor = NULL,
    subset.levels = NULL,
    subset.groups = TRUE,
    design = NULL,
    contrasts = NULL,
    robust = TRUE,
    trend = TRUE,
    lfc.threshold = 0,
    assay = 1,
    row.data = NULL,
    metadata = NULL,
    output.dir = "edgeR",
    author = NULL,
    dry.run = FALSE,
    save.results = TRUE,
    suppress.plots = FALSE
)
x |
A SummarizedExperiment object containing a count matrix where genes and samples are in rows and columns, respectively.
Alternatively, the output of |
groups |
String specifying the |
comparisons |
Character vector specifying two or more groups to compare from |
covariates |
Character vector specifying the |
block |
Character vector specifying the |
subset.factor |
String specifying the |
subset.levels |
Vector containing the levels of the |
subset.groups |
Boolean indicating whether to automatically subset the dataset to only those samples assigned to groups in |
design |
Matrix, function, or formula specifying the experimental design, see |
contrasts |
String, function or matrix specifying a custom contrast, or a list of such objects; see |
robust |
Boolean indicating whether robust empirical Bayes shrinkage should be used in |
trend |
Boolean indicating whether to shrink the QL dispersions towards a trend fitted to the mean in |
lfc.threshold |
Number specifying a log-fold change threshold for |
assay |
String or integer specifying the assay of |
row.data |
Character vector specifying the |
metadata |
Named list of additional metadata to store alongside each result. |
output.dir |
String containing the path to an output directory in which to write the Rmarkdown file and save results. |
author |
Character vector containing the names of the authors.
If |
dry.run |
Boolean indicating whether to perform a dry run.
This generates the Rmarkdown report in |
save.results |
Boolean indicating whether the results should be saved to file. |
suppress.plots |
Boolean indicating whether plots should be suppressed.
This can be set to |
An Rmarkdown report named report.Rmd is written inside output.dir that contains the analysis commands.
If dry.run=FALSE, a list is returned containing:
results, a list of DataFrames of tables from all contrasts.
Each DataFrame corresponds to a comparison/contrast where each row corresponds to a gene (i.e., row) in x.
Each DataFrame contains the following columns:
AveExpr, the average abundance.
F, the F-statistic.
LogFC, the log-fold change.
(For non-ANOVA-like contrasts only.)
LogFC.<COLUMN>, the log-fold change corresponding to each column of the contrast matrix.
(For ANOVA-like contrasts only.)
PValue, the p-value.
FDR, the Benjamini-Hochberg-adjusted p-value.
normalized, a copy of x with normalized expression values.
This contains:
lib.size and norm.factors columns in its colData,
containing the library sizes and normalization factors, respectively.
a retained column in its rowData,
indicating whether a gene was retained after filtering.
a logCPM assay, containing the log-counts-per-million after normalization.
normalized may be subsetted by sample, depending on subset.factor, subset.groups, etc.
If save.results=TRUE, the results are saved in a results directory inside output.dir.
If dry.run=TRUE, NULL is returned.
Only the Rmarkdown report is saved to file.
Aaron Lun
x <- loadExampleDataset()
tmp <- tempfile()
out <- runEdgeR(
    x,
    groups="dex",
    comparisons=c("trt", "untrt"),
    output=tmp
)
list.files(tmp, recursive=TRUE)
out
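The FDR column is described above as the Benjamini-Hochberg-adjusted p-value. As a small illustration of that adjustment, independent of edgeR and using made-up p-values, the sketch below compares base R's `p.adjust` with a manual calculation.

```r
# Made-up p-values for five genes.
pvalues <- c(0.001, 0.02, 0.03, 0.5, 0.8)
fdr <- p.adjust(pvalues, method = "BH")

# Manual version: scale each p-value by n / rank, then enforce
# monotonicity by taking running minima from the largest p-value down.
n <- length(pvalues)
o <- order(pvalues, decreasing = TRUE)
manual <- numeric(n)
manual[o] <- pmin(1, cummin(pvalues[o] * n / (n:1)))

all.equal(fdr, manual)
```

Genes that were filtered out before testing would not contribute to `n`, so the adjustment only reflects the genes that were actually tested.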
Test for differentially expressed (DE) genes from an RNA-seq count matrix using the voom algorithm from limma.
runVoom(
    x,
    groups,
    comparisons,
    covariates = NULL,
    block = NULL,
    subset.factor = NULL,
    subset.levels = NULL,
    subset.groups = TRUE,
    design = NULL,
    contrasts = NULL,
    dc.block = NULL,
    robust = TRUE,
    trend = FALSE,
    quality = TRUE,
    lfc.threshold = 0,
    assay = 1,
    row.data = NULL,
    metadata = NULL,
    output.dir = "voom",
    author = NULL,
    dry.run = FALSE,
    save.results = TRUE,
    suppress.plots = FALSE
)
x |
A SummarizedExperiment object containing a count matrix where genes and samples are in rows and columns, respectively.
Alternatively, the output of |
groups |
String specifying the |
comparisons |
Character vector specifying two or more groups to compare from |
covariates |
Character vector specifying the |
block |
Character vector specifying the |
subset.factor |
String specifying the |
subset.levels |
Vector containing the levels of the |
subset.groups |
Boolean indicating whether to automatically subset the dataset to only those samples assigned to groups in |
design |
Matrix, function, or formula specifying the experimental design, see |
contrasts |
String, function or matrix specifying a custom contrast, or a list of such objects; see |
dc.block |
String specifying the blocking factor to use in |
robust |
Boolean indicating whether robust empirical Bayes shrinkage should be used in |
trend |
Boolean indicating whether variances should be shrunk towards a trend in |
quality |
Boolean indicating whether quality weighting should be performed. This reduces the influence of low-quality samples at the cost of more computational work. |
lfc.threshold |
Number specifying a threshold on the log-fold change in |
assay |
String or integer specifying the assay of |
row.data |
Character vector specifying the |
metadata |
Named list of additional metadata to store alongside each result. |
output.dir |
String containing the path to an output directory in which to write the Rmarkdown file and save results. |
author |
Character vector containing the names of the authors.
If |
dry.run |
Boolean indicating whether to perform a dry run.
This generates the Rmarkdown report in |
save.results |
Boolean indicating whether the results should be saved to file. |
suppress.plots |
Boolean indicating whether plots should be suppressed.
This can be set to |
An Rmarkdown report named report.Rmd is written inside output.dir that contains the analysis commands.
If dry.run=FALSE, a list is returned containing:
results, a list of DataFrames of tables from all contrasts.
Each DataFrame corresponds to a comparison/contrast where each row corresponds to a gene (i.e., row) in x.
Each DataFrame contains the following columns:
AveExpr, the average abundance.
t, the t-statistic.
(For non-ANOVA-like contrasts only.)
F, the F-statistic.
(For ANOVA-like contrasts only.)
LogFC, the log-fold change.
(For non-ANOVA-like contrasts only.)
LogFC.<COLUMN>, the log-fold change corresponding to each column of the contrast matrix.
(For ANOVA-like contrasts only.)
PValue, the p-value.
FDR, the Benjamini-Hochberg-adjusted p-value.
normalized, a copy of x with normalized expression values.
This contains:
lib.size and norm.factors columns in its colData,
containing the library sizes and normalization factors, respectively.
a retained column in its rowData,
indicating whether a gene was retained after filtering.
a logCPM assay, containing the log-counts-per-million after normalization.
normalized may be subsetted by sample, depending on subset.factor, subset.groups, etc.
If save.results=TRUE, the results are saved in a results directory inside output.dir.
If dry.run=TRUE, NULL is returned.
Only the Rmarkdown report is saved to file.
Aaron Lun
x <- loadExampleDataset()
tmp <- tempfile()
out <- runVoom(
    x,
    groups="dex",
    comparisons=c("trt", "untrt"),
    output=tmp
)
list.files(tmp, recursive=TRUE)
out
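To show how library sizes and normalization factors feed into log-counts-per-million values, the sketch below applies a voom-style transformation, log2((count + 0.5) / (effective library size + 1) * 1e6), to a tiny made-up count matrix. The counts and normalization factors are invented, and the logCPM assay stored by the pipeline may be computed with a different prior count, so the values are illustrative only.

```r
# Made-up counts: three genes (rows) by two samples (columns).
counts <- matrix(c(10, 0, 250, 40, 5, 300), nrow = 3,
    dimnames = list(c("gene1", "gene2", "gene3"), c("s1", "s2")))

lib.size <- colSums(counts)
norm.factors <- c(s1 = 1.05, s2 = 1 / 1.05)  # invented for illustration
eff.lib <- lib.size * norm.factors           # effective library sizes

# Transpose so that samples are rows, divide each sample by its effective
# library size, then transpose back to genes-by-samples.
logcpm <- t(log2(t(counts + 0.5) / (eff.lib + 1) * 1e6))
logcpm
```

The offsets of 0.5 and 1 keep zero counts finite on the log scale.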