| Title: | Fast Donor-Weighted Pseudo-Bulk Differential Expression for scRNA-seq |
|---|---|
| Description: | scFastDE provides fast, donor-weighted pseudo-bulk differential expression analysis for multi-donor single-cell RNA-seq experiments. Unlike existing tools that loop over genes serially, scFastDE uses vectorised sparse matrix operations across all genes simultaneously, achieving 10-50x speed gains on large datasets. Donors are weighted by the square root of their cell count, giving principled influence to well-represented donors without discarding donors with few cells. Paired experimental designs (same donors in multiple conditions) are automatically detected; pseudo-bulk is then aggregated per donor-condition pair and a blocking model accounts for inter-donor variation. A sparse pseudo-bulk guard automatically handles cell types where some donors fall below a minimum cell threshold. All functions operate natively on SingleCellExperiment objects. |
| Authors: | Subhadip Jana [aut, cre] (ORCID: <https://orcid.org/0009-0003-7860-2853>) |
| Maintainer: | Subhadip Jana <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.3 |
| Built: | 2026-06-28 11:12:42 UTC |
| Source: | https://github.com/bioc/scFastDE |
scFastDE provides fast, donor-weighted pseudo-bulk differential expression analysis for multi-donor single-cell RNA-seq experiments. Unlike existing tools that test genes serially in a loop, scFastDE uses vectorised sparse matrix operations across all genes simultaneously, giving 10-50x speed gains on large datasets (30k+ genes, 100k+ cells).
All genes are tested simultaneously via matrix algebra, with no per-gene loop.
Each sample's pseudo-bulk profile is weighted by sqrt(n_cells), giving principled influence to well-represented samples.
When the same donors appear in multiple conditions, fastDE automatically aggregates per donor-condition pair and uses a blocking model (~ 0 + condition + donor) to account for inter-donor variation.
Donors with fewer than min_cells cells per cell type are flagged and optionally removed before aggregation.
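As a rough illustration of the square-root weighting described above, the following base R sketch compares sqrt(n_cells) weights with weighting by raw cell count. The donor names and cell counts are hypothetical values chosen for the arithmetic, not package output.

```r
# Hypothetical cell counts for three donors (illustrative values only).
n_cells <- c(D1 = 25, D2 = 100, D3 = 400)

# Square-root weighting, as described for scFastDE sample weights.
w <- sqrt(n_cells)

# Share of total weight under sqrt weighting versus raw cell counts.
share_sqrt <- w / sum(w)
share_raw <- n_cells / sum(n_cells)

print(w)
print(round(rbind(sqrt = share_sqrt, raw = share_raw), 3))
```

In this toy case D3 has 16 times as many cells as D1 but only 4 times the weight, so the small donor keeps a larger share of influence than it would under raw cell-count weighting.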
filterSparseDonors: Remove donors below the minimum cell threshold per cell type.
fastPseudobulk: Build a donor-weighted pseudo-bulk matrix from a SingleCellExperiment. Supports per-donor or per-donor-condition aggregation.
fastDE: Run vectorised DE across all genes. Auto-detects paired vs unpaired designs.
plotDEResults: Volcano plot of DE results.
Crowell HL et al. (2020). muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nature Communications, 11, 6077.
Useful links:
Report bugs at https://github.com/SubhadipJana1409/scFastDE/issues
Returns the per-gene differential expression statistics
DataFrame from an FDEResult object.

    deTable(x, ...)

    ## S4 method for signature 'FDEResult'
    deTable(x, ...)
| x | An FDEResult object. |
|---|---|
| ... | Additional arguments (not used). |
A DataFrame with columns logFC, AveExpr,
t, P.Value, adj.P.Val, and B.
    library(S4Vectors)
    de <- DataFrame(logFC = c(1.2, -0.5),
                    P.Value = c(0.001, 0.5),
                    adj.P.Val = c(0.01, 0.8))
    pb <- matrix(rpois(20, 10), nrow = 2, ncol = 10)
    rownames(pb) <- paste0("Gene", 1:2)
    colnames(pb) <- paste0("Donor", 1:10)
    obj <- FDEResult(de, pb, setNames(sqrt(1:10), paste0("Donor", 1:10)))
    deTable(obj)
Returns the per-donor weight vector from an
FDEResult object.

    donorWeights(x, ...)

    ## S4 method for signature 'FDEResult'
    donorWeights(x, ...)
| x | An FDEResult object. |
|---|---|
| ... | Additional arguments (not used). |
A named numeric vector of donor weights.
    library(S4Vectors)
    de <- DataFrame(logFC = c(1.2, -0.5),
                    P.Value = c(0.001, 0.5),
                    adj.P.Val = c(0.01, 0.8))
    pb <- matrix(rpois(20, 10), nrow = 2, ncol = 10)
    rownames(pb) <- paste0("Gene", 1:2)
    colnames(pb) <- paste0("Donor", 1:10)
    obj <- FDEResult(de, pb, setNames(sqrt(1:10), paste0("Donor", 1:10)))
    donorWeights(obj)
Runs differential expression analysis across all genes simultaneously
using a weighted limma-voom linear model. Unlike tools that loop over
genes serially, fastDE passes the entire pseudo-bulk matrix to
limma, which uses LAPACK routines to fit all gene models in one
vectorised call.
The function automatically detects whether the experimental design is paired (same donors in multiple conditions) or unpaired (each donor in one condition only) and builds the appropriate linear model:
Paired: pseudo-bulk is aggregated per donor-condition
pair, and a ~ 0 + condition + donor model accounts for
donor-level variation.
Unpaired: pseudo-bulk is aggregated per donor, and a
~ 0 + condition model is used.
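The choice between the two designs can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper name build_design and the detection rule (a design counts as paired when any donor appears in more than one condition) are assumptions made for the example.

```r
# Hypothetical helper (not part of scFastDE): decide whether a design is
# paired and build the matching design matrix with stats::model.matrix.
build_design <- function(sample_info) {
  donor <- factor(sample_info$donor)
  condition <- factor(sample_info$condition)
  # Number of distinct conditions each donor contributes to.
  n_cond <- rowSums(table(donor, condition) > 0)
  # Assumed rule: paired if any donor is seen in more than one condition.
  paired <- any(n_cond > 1)
  design <- if (paired) {
    model.matrix(~ 0 + condition + donor)  # blocking on donor
  } else {
    model.matrix(~ 0 + condition)
  }
  list(paired = paired, design = design)
}

# Three donors measured under both conditions (paired).
paired_info <- data.frame(
  donor = rep(c("D1", "D2", "D3"), times = 2),
  condition = rep(c("ctrl", "treat"), each = 3)
)
# Six donors, each in a single condition (unpaired).
unpaired_info <- data.frame(
  donor = paste0("D", 1:6),
  condition = rep(c("ctrl", "treat"), each = 3)
)

paired_fit <- build_design(paired_info)
unpaired_fit <- build_design(unpaired_info)
print(colnames(paired_fit$design))
print(colnames(unpaired_fit$design))
```

In the paired case the design gains one column per additional donor, which is what lets the model absorb donor-level baseline differences before the conditions are compared.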
    fastDE(
      sce,
      donor = NULL,
      cell_type = NULL,
      condition = NULL,
      target_type = NULL,
      contrast = NULL,
      min_cpm = 1,
      min_donors = 2L,
      min_cells = 10L,
      BPPARAM = SerialParam()
    )
| sce | A |
|---|---|
| donor | A |
| cell_type | A |
| condition | A |
| target_type | A |
| contrast | A |
| min_cpm | A |
| min_donors | An |
| min_cells | An |
| BPPARAM | A |
The analysis pipeline is:
1. Filter lowly expressed genes (CPM > min_cpm in at least min_donors samples).
2. Apply limma::voom with sample cell-count weights.
3. Fit a weighted linear model with the appropriate design.
4. Extract the contrast of interest with limma::makeContrasts.
5. Apply empirical Bayes moderation with limma::eBayes.
6. Return all results as an FDEResult object.
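The first pipeline step, the expression filter, can be sketched in base R as follows. This is an illustration under assumptions: the toy matrix is made up, CPM is computed by hand from column totals, and the package may compute or normalise CPM differently.

```r
# Toy pseudo-bulk matrix: 5 genes x 4 samples (illustrative values only).
pb <- matrix(
  c(  0,   0,   0,   1,
     50,  60,  55,  70,
      0,   3,   0,   0,
    200, 180, 220, 210,
      5,   0,   6,   0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:5), paste0("S", 1:4))
)

min_cpm <- 1
min_donors <- 2L

# Counts per million: divide each column by its library size, scale by 1e6.
cpm <- sweep(pb, 2, colSums(pb), "/") * 1e6

# Keep genes whose CPM exceeds min_cpm in at least min_donors samples.
keep <- rowSums(cpm > min_cpm) >= min_donors
pb_filtered <- pb[keep, , drop = FALSE]
print(keep)
```

Because the toy library sizes are tiny, any non-zero count clears the CPM threshold here, so the filter reduces to "detected in at least two samples". With realistic library sizes the min_cpm threshold does more of the work.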
An FDEResult object containing:
deTable: per-gene statistics (logFC, AveExpr, t, P.Value, adj.P.Val, B).
pseudobulk: the aggregated count matrix.
donorWeights: the per-sample sqrt(n_cells) weights.
params: the analysis parameters.
See also: fastPseudobulk, filterSparseDonors, plotDEResults
    library(SingleCellExperiment)
    set.seed(42)
    n_genes <- 200
    counts <- matrix(rpois(n_genes * 60, 8), nrow = n_genes, ncol = 60)
    rownames(counts) <- paste0("Gene", seq_len(n_genes))
    colnames(counts) <- paste0("Cell", seq_len(60))
    # Inject DE signal into first 10 genes for treatment group
    counts[1:10, 31:60] <- counts[1:10, 31:60] * 3L
    sce <- SingleCellExperiment(assays = list(counts = counts))
    sce$donor <- rep(paste0("D", 1:6), each = 10)
    sce$cell_type <- "Tcell"
    sce$condition <- rep(c("ctrl", "treat"), each = 30)
    result <- fastDE(sce, donor = "donor", cell_type = "cell_type",
                     condition = "condition", target_type = "Tcell",
                     min_cells = 5)
    result
    head(deTable(result))
Aggregates single-cell counts into pseudo-bulk profiles for a specified cell type, using vectorised sparse matrix operations.
When condition is provided, aggregation is performed per
donor-condition pair (one column per sample, e.g. D1_ctrl,
D1_stim). This is the correct approach for paired experimental
designs where the same donors contribute cells under multiple conditions.
When condition is NULL (default), aggregation is performed
per donor only (backward-compatible behaviour for unpaired designs).
Each sample's weight equals sqrt(n_cells), giving more influence
to well-represented samples while not discarding those with fewer cells.
    fastPseudobulk(
      sce,
      donor = NULL,
      cell_type = NULL,
      target_type = NULL,
      condition = NULL,
      assay_name = "counts"
    )
| sce | A |
|---|---|
| donor | A |
| cell_type | A |
| target_type | A |
| condition | A |
| assay_name | A |
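Pseudo-bulk aggregation sums the raw counts for all cells belonging to each donor (or donor-condition pair) within a given cell type:

$$Y_{g,s} = \sum_{c \in \mathcal{C}_{s,k}} X_{g,c}$$

Sample weights are then computed as:

$$w_s = \sqrt{n_{s,k}}$$

where $X_{g,c}$ is the raw count of gene $g$ in cell $c$, $\mathcal{C}_{s,k}$ is the set of cells from sample $s$ in cell type $k$, and $n_{s,k} = |\mathcal{C}_{s,k}|$ is the number of cells for sample $s$ in cell type $k$.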
The aggregation uses sparse matrix column-sums grouped by sample, avoiding a slow R-level loop.
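One way to express this grouped aggregation without a loop is a single matrix product with a cells-by-samples indicator matrix. The base R sketch below uses a small dense matrix for clarity; the toy data and variable names are assumptions, and the package itself is described as working on sparse matrices (for which a sparse matrix class, such as those in the Matrix package, would be the natural choice).

```r
set.seed(42)
# Toy data: 4 genes x 12 cells, three samples of four cells each.
counts <- matrix(
  rpois(4 * 12, 5), nrow = 4,
  dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:12))
)
sample_id <- factor(rep(c("D1", "D2", "D3"), each = 4))

# Cells x samples indicator: entry (c, s) is 1 when cell c belongs to sample s.
indicator <- model.matrix(~ 0 + sample_id)
colnames(indicator) <- levels(sample_id)

# One matrix product sums counts within each sample for all genes at once.
pb <- counts %*% indicator

# Cells per sample and the corresponding sqrt(n_cells) weights.
n_cells <- colSums(indicator)
w <- sqrt(n_cells)

# Cross-check against an explicit per-sample loop.
pb_loop <- sapply(levels(sample_id), function(s) {
  rowSums(counts[, sample_id == s, drop = FALSE])
})
print(pb)
```

The loop version is only there as a check; the matrix product gives the same genes-by-samples totals in one call.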
A list with elements:
pseudobulk: A matrix of aggregated counts, rows = genes, columns = samples.
sample_weights: A named numeric vector of per-sample weights (sqrt(n_cells)).
sample_ncells: A named integer vector giving the raw cell count per sample.
sample_info: A data.frame with columns sample_id, donor, and (if applicable) condition, with one row per pseudo-bulk sample.
For backward compatibility, donor_weights and donor_ncells are also provided as aliases when condition is NULL.
    library(SingleCellExperiment)
    set.seed(42)
    counts <- matrix(rpois(500 * 60, 5), nrow = 500, ncol = 60)
    rownames(counts) <- paste0("Gene", seq_len(500))
    colnames(counts) <- paste0("Cell", seq_len(60))
    sce <- SingleCellExperiment(assays = list(counts = counts))
    sce$donor <- rep(paste0("D", 1:6), each = 10)
    sce$cell_type <- "Tcell"
    sce$condition <- rep(c("ctrl", "treat"), each = 30)

    # Unpaired (aggregate by donor only)
    pb <- fastPseudobulk(sce, donor = "donor", cell_type = "cell_type",
                         target_type = "Tcell")
    dim(pb$pseudobulk)

    # Paired (aggregate by donor x condition)
    pb2 <- fastPseudobulk(sce, donor = "donor", cell_type = "cell_type",
                          target_type = "Tcell", condition = "condition")
    dim(pb2$pseudobulk)
    pb2$sample_info
Create a new FDEResult object.
    FDEResult(deTable, pseudobulk, donorWeights, params = list())
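| deTable | A DataFrame of per-gene DE statistics. |
|---|---|
| pseudobulk | A matrix of pseudo-bulk counts (genes x donors). |
| donorWeights | A named numeric vector of per-donor weights. |
| params | A list of analysis parameters. |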
An FDEResult object.
    library(S4Vectors)
    de <- DataFrame(
      logFC = c(1.2, -0.5, 0.8),
      P.Value = c(0.001, 0.5, 0.02),
      adj.P.Val = c(0.01, 0.8, 0.05)
    )
    pb <- matrix(rpois(30, 10), nrow = 3, ncol = 10)
    rownames(pb) <- paste0("Gene", 1:3)
    colnames(pb) <- paste0("Donor", 1:10)
    obj <- FDEResult(
      deTable = de,
      pseudobulk = pb,
      donorWeights = setNames(sqrt(1:10), paste0("Donor", 1:10)),
      params = list(cell_type = "T_cells", condition = "group")
    )
    obj
An S4 class to store the output of fastDE. Slots hold
per-gene DE statistics, the pseudo-bulk matrix used, donor weights,
and the analysis parameters.
deTable: A DataFrame with one row per gene containing columns logFC, AveExpr, t, P.Value, adj.P.Val, and B.
pseudobulk: A matrix of pseudo-bulk counts (genes x donors) used as input to the DE model.
donorWeights: A named numeric vector of per-donor weights (sqrt of cell count) used in the linear model.
params: A list of analysis parameters including cell_type, condition, donor, min_cells.
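For readers unfamiliar with S4, the slot layout above can be mimicked with a small stand-alone class. This is a sketch only: the names ToyResult and toyWeights are hypothetical, a plain data.frame stands in for the S4Vectors DataFrame, and the validity rule is an assumption rather than the package's actual check.

```r
library(methods)

# Hypothetical stand-in class with the same four slots as described above.
setClass(
  "ToyResult",
  slots = c(
    deTable = "data.frame",
    pseudobulk = "matrix",
    donorWeights = "numeric",
    params = "list"
  ),
  # Assumed validity rule: one weight per pseudo-bulk column.
  validity = function(object) {
    if (length(object@donorWeights) != ncol(object@pseudobulk)) {
      "donorWeights must have one entry per pseudobulk column"
    } else {
      TRUE
    }
  }
)

# Accessor generic and method, in the style of donorWeights(x, ...).
setGeneric("toyWeights", function(x, ...) standardGeneric("toyWeights"))
setMethod("toyWeights", "ToyResult", function(x, ...) x@donorWeights)

obj <- new(
  "ToyResult",
  deTable = data.frame(logFC = c(1.2, -0.5)),
  pseudobulk = matrix(1:4, nrow = 2, dimnames = list(NULL, c("D1", "D2"))),
  donorWeights = c(D1 = 2, D2 = 3),
  params = list(cell_type = "Tcell")
)
print(toyWeights(obj))

# A mismatched weight vector is rejected by the validity function.
bad <- try(
  new("ToyResult", pseudobulk = matrix(1:4, nrow = 2), donorWeights = 1),
  silent = TRUE
)
```

Accessing slots through generics rather than with @ keeps calling code independent of how the object stores its data.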
Removes or flags donors that have fewer than min_cells cells
for a given cell type. When forming pseudo-bulk profiles, donors with
very few cells produce highly variable, unreliable aggregated counts
that can inflate false-positive DE calls. This function provides a
principled pre-filtering step before calling fastPseudobulk.
    filterSparseDonors(
      sce,
      donor = NULL,
      cell_type = NULL,
      min_cells = 10L,
      action = c("remove", "flag")
    )
| sce | A |
|---|---|
| donor | A |
| cell_type | A |
| min_cells | A |
| action | A |
For each combination of cell_type level and donor level,
the function counts the number of cells in the sce and compares
it to min_cells. Donors below the threshold are either removed
(when action = "remove") or flagged in a new colData
column (when action = "flag").
As a rule of thumb, min_cells = 10 is a reasonable default for
common cell types. For rare cell types (< 1% frequency), consider
lowering to min_cells = 5.
The input sce with sparse donor-cell type combinations
either removed or flagged depending on action.
When action = "flag", a column scFastDE_sparse is
added to colData: TRUE means the cell belongs to a
sparse donor-cell type combination.
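The counting and thresholding logic described above can be sketched on a plain data.frame of cell annotations. The toy data and variable names are assumptions for illustration; the package operates on the colData of a SingleCellExperiment instead.

```r
# Toy cell-level annotations (illustrative only).
cells <- data.frame(
  donor = c(rep("D1", 6), rep("D2", 6), rep("D3", 4)),
  cell_type = c(rep("Tcell", 5), "Bcell",
                rep("Tcell", 3), rep("Bcell", 3),
                rep("Tcell", 2), rep("Bcell", 2))
)
min_cells <- 3L

# Number of cells sharing each cell's donor x cell type combination.
n_per_cell <- ave(rep(1L, nrow(cells)), cells$donor, cells$cell_type,
                  FUN = sum)

# A cell is "sparse" when its combination falls below the threshold.
sparse <- n_per_cell < min_cells

flagged <- cbind(cells, sparse = sparse)  # analogue of action = "flag"
kept <- cells[!sparse, ]                  # analogue of action = "remove"

print(table(flagged$donor, flagged$cell_type, flagged$sparse))
```

Here D1's single B cell and both of D3's two-cell groups fall below the threshold, while D1's T cells and all of D2 are retained.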
    library(SingleCellExperiment)
    set.seed(42)
    counts <- matrix(rpois(500 * 80, 5), nrow = 500, ncol = 80)
    rownames(counts) <- paste0("Gene", seq_len(500))
    colnames(counts) <- paste0("Cell", seq_len(80))
    sce <- SingleCellExperiment(assays = list(counts = counts))
    sce$donor <- rep(paste0("D", 1:8), each = 10)
    sce$cell_type <- rep(c("Tcell", "Bcell"), times = 40)
    # Remove donor-cell type combos with fewer than 5 cells
    sce_filtered <- filterSparseDonors(sce, donor = "donor",
                                       cell_type = "cell_type",
                                       min_cells = 5)
    ncol(sce_filtered)
Returns the analysis parameter list from an
FDEResult object.

    ## S4 method for signature 'FDEResult'
    params(x, ...)
| x | An FDEResult object. |
|---|---|
| ... | Additional arguments (not used). |
A list of analysis parameters.
    library(S4Vectors)
    de <- DataFrame(logFC = c(1.2, -0.5),
                    P.Value = c(0.001, 0.5),
                    adj.P.Val = c(0.01, 0.8))
    pb <- matrix(rpois(20, 10), nrow = 2, ncol = 10)
    rownames(pb) <- paste0("Gene", 1:2)
    colnames(pb) <- paste0("Donor", 1:10)
    obj <- FDEResult(
      de, pb,
      setNames(sqrt(1:10), paste0("Donor", 1:10)),
      params = list(cell_type = "T_cells", condition = "group")
    )
    params(obj)
Produces a volcano plot from an FDEResult object. Points are
coloured by whether they pass both the lfc_thresh and
fdr_thresh thresholds, and genes can optionally be labelled.
    plotDEResults(
      result,
      fdr_thresh = 0.05,
      lfc_thresh = 1,
      top_n = 10L,
      point_size = 1,
      point_alpha = 0.6,
      colours = c(up = "#E24B4A", down = "#378ADD", ns = "#888780")
    )
| result | |
|---|---|
| fdr_thresh | A |
| lfc_thresh | A |
| top_n | An |
| point_size | A |
| point_alpha | A |
| colours | A named |
A ggplot2 object.
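The up / down / ns colouring implied by the default colours argument can be sketched as a simple classification step. This is an assumed reading of the thresholds (adjusted p-value below fdr_thresh and absolute logFC above lfc_thresh, with strict inequalities); the package's exact rule and axis choices may differ, and the toy table is made up.

```r
# Toy DE table (illustrative values only).
de <- data.frame(
  gene = paste0("Gene", 1:5),
  logFC = c(2.1, -1.8, 0.4, 1.5, -0.2),
  adj.P.Val = c(0.001, 0.01, 0.02, 0.20, 0.90)
)
fdr_thresh <- 0.05
lfc_thresh <- 1

# A gene is significant only when it passes both thresholds.
sig <- de$adj.P.Val < fdr_thresh & abs(de$logFC) > lfc_thresh
de$status <- ifelse(!sig, "ns", ifelse(de$logFC > 0, "up", "down"))

# Typical volcano y-axis: -log10 of the adjusted p-value (assumed here).
de$neg_log10_fdr <- -log10(de$adj.P.Val)

# Map each status to the default colours from the usage above.
colours <- c(up = "#E24B4A", down = "#378ADD", ns = "#888780")
de$colour <- unname(colours[de$status])
print(de)
```

Gene3 passes the FDR threshold but not the fold-change threshold, and Gene4 the reverse, so both end up in the non-significant group.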
    library(SingleCellExperiment)
    set.seed(42)
    n_genes <- 200
    counts <- matrix(rpois(n_genes * 60, 8), nrow = n_genes, ncol = 60)
    rownames(counts) <- paste0("Gene", seq_len(n_genes))
    colnames(counts) <- paste0("Cell", seq_len(60))
    counts[1:10, 31:60] <- counts[1:10, 31:60] * 3L
    sce <- SingleCellExperiment(assays = list(counts = counts))
    sce$donor <- rep(paste0("D", 1:6), each = 10)
    sce$cell_type <- "Tcell"
    sce$condition <- rep(c("ctrl", "treat"), each = 30)
    result <- fastDE(sce, donor = "donor", cell_type = "cell_type",
                     condition = "condition", target_type = "Tcell",
                     min_cells = 5)
    plotDEResults(result)
Returns the pseudo-bulk count matrix from an
FDEResult object.

    pseudobulk(x, ...)

    ## S4 method for signature 'FDEResult'
    pseudobulk(x, ...)
| x | An FDEResult object. |
|---|---|
| ... | Additional arguments (not used). |
A matrix of pseudo-bulk counts (genes x donors).
    library(S4Vectors)
    de <- DataFrame(logFC = c(1.2, -0.5),
                    P.Value = c(0.001, 0.5),
                    adj.P.Val = c(0.01, 0.8))
    pb <- matrix(rpois(20, 10), nrow = 2, ncol = 10)
    rownames(pb) <- paste0("Gene", 1:2)
    colnames(pb) <- paste0("Donor", 1:10)
    obj <- FDEResult(de, pb, setNames(sqrt(1:10), paste0("Donor", 1:10)))
    pseudobulk(obj)
Prints a compact summary of an FDEResult object.

    ## S4 method for signature 'FDEResult'
    show(object)
| object | An FDEResult object. |
|---|---|
Invisibly returns object.