| Title: | Curated Latent-variable Analysis with Molecular Priors |
|---|---|
| Description: | CLAMP performs prior-informed latent variable decomposition of high-dimensional transcriptomic data. It integrates curated gene sets to learn biologically interpretable latent variables, supports file-backed matrices for large datasets, and provides tools for preprocessing, normalization, projection, and evaluation of latent structures. CLAMP is designed to scale to tens of thousands of samples, making it suitable for large public resources such as recount3 and ARCHS4. It enables researchers to uncover biologically meaningful patterns that connect genes, pathways, and complex traits in transcriptomics studies. |
| Authors: | Marc Subirana-Granes [aut, cre] (ORCID: <https://orcid.org/0000-0003-3934-839X>), Maria Chikina [aut], National Human Genome Research Institute [fnd] (R00 HG011898 to M.P.; R01 HG009299-6A1 to M.C.), Eunice Kennedy Shriver National Institute of Child Health and Human Development [fnd] (R01 HD109765 to M.P.), National Science Foundation [fnd] (NSF 2238125 to M.C.), National Eye Institute [fnd] (NIH R01 EY030546-01A1 to M.C.) |
| Maintainer: | Marc Subirana-Granes <[email protected]> |
| License: | GPL-3 |
| Version: | 0.99.3 |
| Built: | 2026-06-21 00:44:05 UTC |
| Source: | https://github.com/bioc/CLAMP |
Calculates the area under the ROC curve (AUC) for all pairs of columns
between a prediction matrix B and binary targets in target.
Each column of B is ranked, and AUC is computed based on how well
the ranks separate positive vs. negative samples in each target column.
allAgainstAllAUCs(B, target)
B |
A numeric matrix of predictions (samples × features). |
target |
A binary matrix with the same number of rows as B. |
A numeric matrix of AUC values (features × targets).
set.seed(1)
B <- matrix(rnorm(100), nrow = 20)
target <- matrix(sample(0:1, 40, replace = TRUE), nrow = 20)
allAgainstAllAUCs(B, target)
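As a rough illustration of the ranking idea described above, the following base-R sketch computes an AUC for every (column of B, column of target) pair from the rank-sum formula. The helper name rank_auc_matrix is hypothetical and not part of CLAMP; the package's own implementation may handle ties and single-class targets differently.

```r
# Hypothetical helper (not part of CLAMP): rank-based AUC for every
# column of B against every binary column of target.
rank_auc_matrix <- function(B, target) {
  stopifnot(nrow(B) == nrow(target))
  R <- apply(B, 2, rank)            # rank each prediction column (ties averaged)
  pos <- (target > 0) * 1           # numeric 0/1 indicator of positives
  n_pos <- colSums(pos)             # positives per target column
  n_neg <- nrow(pos) - n_pos        # negatives per target column
  rank_sums <- crossprod(R, pos)    # features x targets: rank sum of positives
  U <- sweep(rank_sums, 2, n_pos * (n_pos + 1) / 2, "-")  # Mann-Whitney U
  sweep(U, 2, n_pos * n_neg, "/")   # AUC = U / (n_pos * n_neg)
}

set.seed(1)
B <- matrix(rnorm(100), nrow = 20)
target <- matrix(sample(0:1, 40, replace = TRUE), nrow = 20)
aucs <- rank_auc_matrix(B, target)
dim(aucs)  # features x targets
```

In this sketch a target column containing only one class yields NaN, because the denominator is zero.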
Computes the area under the ROC curve (AUC) by applying a Wilcoxon rank-sum test between predicted values for positive and negative labels. The AUC equals the Mann-Whitney U statistic divided by the product of the two class sizes.
AUC(labels, values)
labels |
A numeric or logical vector indicating class labels. Values > 0 are treated as positive. |
values |
A numeric vector of prediction scores corresponding to labels. |
A list with:
auc: Estimated AUC, or 0.5 if one class is missing.
pval: Wilcoxon test p-value, or NA if one class is missing.
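The relationship between the Wilcoxon statistic and the AUC can be sketched in base R as follows. The function auc_wilcox is a hypothetical stand-in written for illustration, not the package's AUC(); it assumes labels contains no missing values.

```r
# Hypothetical helper (not CLAMP's AUC()): AUC and p-value from a
# Wilcoxon rank-sum test, with the fallbacks described above.
auc_wilcox <- function(labels, values) {
  pos <- values[labels > 0]
  neg <- values[!(labels > 0)]
  if (length(pos) == 0 || length(neg) == 0) {
    return(list(auc = 0.5, pval = NA))   # one class is missing
  }
  wt <- wilcox.test(pos, neg, exact = FALSE)
  # W counts (positive, negative) pairs in which the positive score is
  # larger (ties count 1/2); dividing by the number of pairs gives the AUC.
  list(auc = unname(wt$statistic) / (length(pos) * length(neg)),
       pval = wt$p.value)
}

res <- auc_wilcox(labels = c(1, 1, 1, 0, 0, 0),
                  values = c(0.9, 0.8, 0.4, 0.5, 0.3, 0.1))
res$auc  # 8 of the 9 positive/negative pairs are ordered correctly
```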
Applies the BH (Benjamini-Hochberg) correction for multiple hypothesis testing.
BH(p)
p |
Numeric vector of p-values. |
Adjusted p-values.
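For reference, the Benjamini-Hochberg adjustment can be written out by hand and checked against stats::p.adjust(). The helper bh_manual is illustrative only and assumes p contains no NA values; whether BH() wraps p.adjust() internally is not stated here.

```r
# Benjamini-Hochberg by hand, for comparison with stats::p.adjust().
bh_manual <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)          # largest p-value first
  adj <- pmin(1, cummin(n / (n:1) * p[o]))  # scale by n / rank, keep monotone
  adj[order(o)]                             # restore the input order
}

p <- c(0.001, 0.02, 0.03, 0.04, 0.5)
bh_manual(p)
p.adjust(p, method = "BH")  # same values
```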
Keeps only the top-ranked values in each column of a matrix (as many as given by top), setting
all others to 0.
binarizeTop(Z, top, keepVals = TRUE)
Z |
A numeric matrix. |
top |
Number of top entries to keep in each column. |
keepVals |
If |
A modified matrix with only top entries retained per column.
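A minimal base-R sketch of per-column top-value filtering is shown below. It is not the package's code: keep_top_per_column is a hypothetical name, "top" is assumed to mean the largest values, and keepVals = FALSE is assumed to replace retained entries with 1.

```r
# Hypothetical re-implementation for illustration; not CLAMP's code.
# Assumes "top" means the largest values in each column and that
# keepVals = FALSE replaces the retained entries with 1.
keep_top_per_column <- function(Z, top, keepVals = TRUE) {
  out <- matrix(0, nrow(Z), ncol(Z), dimnames = dimnames(Z))
  for (j in seq_len(ncol(Z))) {
    idx <- order(Z[, j], decreasing = TRUE)[seq_len(min(top, nrow(Z)))]
    out[idx, j] <- if (keepVals) Z[idx, j] else 1
  }
  out
}

Z <- matrix(c(5, 1, 3, 2,
              0, 4, 9, 7), nrow = 4)
kept <- keep_top_per_column(Z, top = 2)
kept
```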
A numeric matrix of estimated cell-type proportions for whole-blood samples. Rows correspond to sample IDs and columns to major immune cell types. This dataset can be used for validation or illustrative purposes in CLAMP analyses.
data(celltypeTargets)
A numeric matrix with samples as rows and cell types as columns. Row names are sample identifiers.
A numeric matrix of cell-type proportions.
data(celltypeTargets)
Runs the core matrix factorization procedure of CLAMP,
decomposing the gene expression matrix Y into latent variables
Z and loadings B. It supports sparse, dense, and Filebacked
Big Matrices (FBM) as input and includes options for
adaptive sparsity, positive constraints, and regularization.
CLAMPbase(
  Y, clamp_k = NULL, svd_k = NULL, svdres = NULL, L1 = NULL, L2 = NULL,
  Zpos = TRUE, max.iter = 200, tol = 5e-04, trace = FALSE, rseed = NULL,
  B = NULL, scale = 1, pos.adj = 3, adaptive.p = 0.05, adaptive.iter = 20,
  cutoff = 0, ncores = 1,
  clamp_k_method = c("elbow", "permutation", "gavish_donoho", "scaleSVs")
)
Y |
Input gene expression matrix (genes x samples). Can be dense,
sparse (dgCMatrix), or FBM. |
clamp_k |
Number of latent variables for CLAMP (final model rank).
If |
svd_k |
Number of singular values/components to compute in the SVD.
If |
svdres |
Optional precomputed SVD result. If not supplied, it is computed internally. |
L1 |
L1 regularization strength for Z. Defaults to scaled singular value. |
L2 |
L2 regularization strength for B. Defaults to scaled singular value. |
Zpos |
Logical; if |
max.iter |
Maximum number of optimization iterations. Default is 200. |
tol |
Convergence tolerance for B update. Default is 5e-4. |
trace |
Logical; if |
rseed |
Optional integer for reproducible random initialization of B. |
B |
Optional initial matrix for B. If not provided, initialized from SVD. |
scale |
Scaling factor for L1 and L2 when not provided. Default is 1. |
pos.adj |
Positive constraint adjustment divisor for L1. Default is 3. |
adaptive.p |
Controls adaptive sparsity in |
adaptive.iter |
Number of iterations before adaptive sparsity is applied. Default is 20. |
cutoff |
Scalar threshold to zero Z values when |
ncores |
Number of cores to use for parallel computation (only used if Y is an FBM). Default is 1. |
clamp_k_method |
Method for selecting |
This function is the low-level implementation of CLAMP. It alternates
between solving for Z given B and solving for B
given Z, with optional sparsity and non-negativity constraints on
Z. Convergence is assessed via relative change in B.
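The alternating scheme can be pictured with a toy base-R loop. This is a simplified stand-in, not CLAMP's solver: it assumes the model Y ≈ Z B with Z as genes x k and B as k x samples (the convention used in the CLAMPfull return value), treats both L1 and L2 as plain ridge penalties, and omits adaptive sparsity entirely.

```r
# Toy alternating ridge updates; a simplified stand-in, not CLAMP's solver.
# Assumed model: Y (genes x samples) ~ Z (genes x k) %*% B (k x samples).
set.seed(1)
Y <- matrix(rnorm(50 * 12), nrow = 50)
k <- 3; L1 <- 1; L2 <- 1; tol <- 5e-4; max_iter <- 200

sv <- svd(Y, nu = k, nv = k)
B <- diag(sv$d[1:k]) %*% t(sv$v)            # initialize B from the SVD

for (iter in seq_len(max_iter)) {
  # Solve for Z given B (ridge penalty L1), then clip to non-negative values
  Z <- Y %*% t(B) %*% solve(B %*% t(B) + L1 * diag(k))
  Z[Z < 0] <- 0
  # Solve for B given Z (ridge penalty L2)
  B_new <- solve(crossprod(Z) + L2 * diag(k), crossprod(Z, Y))
  # Relative change in B decides convergence
  rel_change <- sum((B_new - B)^2) / sum(B^2)
  B <- B_new
  if (rel_change < tol) break
}
dim(Z)  # genes x k
dim(B)  # k x samples
```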
A list with components:
B: Latent variable loadings on samples (LVs x samples)
Z: Gene loadings (genes x LVs)
Zraw: Raw Z matrix before thresholding
L1: Final value of L1 used
L2: Final value of L2 used
# small toy dataset: 5 genes x 4 samples
Y <- matrix(rnorm(5 * 4), nrow = 5, ncol = 4)
# run a single iteration for speed
res <- CLAMPbase(Y, clamp_k = 2, max.iter = 1, trace = FALSE)
# inspect dimensions of B and Z
dim(res$B)
dim(res$Z)
Lollipop-style dot plot showing the top pathways associated with one
selected LV. Dot size encodes AUC; dot colour encodes -log10(FDR).
The x-axis and pathway ordering can use either AUC or -log10(FDR).
CLAMPdotplot(
  clampRes, lv = 1, top = 20, auc.cutoff = 0.6, fdr.cutoff = 0.05,
  max.name.len = 50, x.axis = c("AUC", "-log10(FDR)", "log10FDR"),
  order.by = x.axis
)
clampRes |
A CLAMP result list containing a summary data frame. |
lv |
LV to plot: either a numeric index (e.g. |
top |
Maximum number of pathways to display, chosen by order.by. |
auc.cutoff |
Minimum AUC to display. Default 0.6. |
fdr.cutoff |
Maximum FDR to display. Default 0.05. |
max.name.len |
Maximum characters for pathway label trimming.
Default 50. |
x.axis |
Metric to place on the x-axis. Use |
order.by |
Metric used to choose the top pathways and order the y-axis.
Use |
Invisibly returns a ggplot2::ggplot() object.
set.seed(9)
pathways <- paste0("Pathway_", 1:20)
lvs <- paste0("LV", 1:5)
nr <- length(pathways) * length(lvs)
summ <- data.frame(
  pathway = rep(pathways, length(lvs)),
  LV = rep(lvs, each = length(pathways)),
  AUC = runif(nr, 0.5, 1.0),
  FDR = runif(nr, 0, 0.05),
  stringsAsFactors = FALSE
)
CLAMPdotplot(list(summary = summ), lv = 1, top = 10)
CLAMPdotplot(list(summary = summ),
  lv = "LV2", x.axis = "-log10(FDR)",
  order.by = "-log10(FDR)"
)
Produces a dot plot where each point represents one pathway × LV pair.
Dot size encodes the AUC value and dot colour encodes -log10(FDR). Only
associations passing auc.cutoff and fdr.cutoff are shown.
CLAMPdotplotAll(
  clampRes, auc.cutoff = 0.6, fdr.cutoff = 0.05, top.per.lv = NULL,
  max.name.len = 40
)
clampRes |
A CLAMP result list containing a summary data frame. |
auc.cutoff |
Minimum AUC to display. Default 0.6. |
fdr.cutoff |
Maximum FDR to display. Default 0.05. |
top.per.lv |
Maximum number of pathways to display per LV, chosen by
highest AUC. |
max.name.len |
Maximum characters for pathway label trimming.
Default 40. |
Invisibly returns a ggplot2::ggplot() object.
set.seed(9)
pathways <- paste0("Pathway_", 1:20)
lvs <- paste0("LV", 1:5)
nr <- length(pathways) * length(lvs)
summ <- data.frame(
  pathway = rep(pathways, length(lvs)),
  LV = rep(lvs, each = length(pathways)),
  AUC = runif(nr, 0.5, 1.0),
  FDR = runif(nr, 0, 0.2),
  stringsAsFactors = FALSE
)
CLAMPdotplotAll(list(summary = summ), auc.cutoff = 0.6, fdr.cutoff = 0.15)
This version performs latent-variable decomposition of a gene expression
matrix Y guided by prior pathway annotations priorMat, with
simplified and lighter regularization compared to the original extended
CLAMP variant. The algorithm alternates updates of Z, B,
and U, where U captures pathway-latent variable
associations inferred directly from the data without ridge-regularized
projections (Chat is not used).
CLAMPfull(
  Y, priorMat, Chat = NULL, svdres = NULL, clamp.base.result = NULL,
  clamp_k = NULL, svd_k = NULL, L1 = NULL, L2 = NULL, cvn = 5, max.iter = 30,
  trace = TRUE, maxPath = 10, doCrossval = TRUE,
  penalty.factor = rep(1, ncol(priorMat)), glm_alpha = 0.9, minGenes = 0,
  tol = 5e-04, seed = 123456, allGenes = FALSE, rseed = NULL,
  max.U.updates = Inf, pathwaySelection = c("fast", "complete"),
  multiplier = 5, adaptive.p = 0.05, useNNLS = TRUE, useRaw = TRUE,
  refitEvery = 3, useSE = FALSE, var.prior = TRUE, Uscale = FALSE,
  robust.vp = TRUE, use_cpp = FALSE,
  clamp_k_method = c("elbow", "permutation", "gavish_donoho", "scaleSVs")
)
Y |
Gene expression matrix (genes x samples). Can be dense, sparse (dgCMatrix), or FBM. |
priorMat |
Binary or weighted prior matrix (genes x pathways) linking genes to pathways. |
Chat |
Ignored in this version (kept for interface compatibility). |
svdres |
Optional precomputed SVD result for initialization. |
clamp.base.result |
Optional result from CLAMPbase(). |
clamp_k |
Number of latent variables for CLAMP (final model rank).
If |
svd_k |
Number of singular values/components to compute in the SVD.
If |
L1, L2 |
Regularization parameters for Z (L1) and B (L2). |
cvn |
Number of folds for pathway-level cross-validation. Default: 5. |
max.iter |
Maximum number of outer iterations. Default: 30. |
trace |
Logical; print iteration progress. Default: TRUE. |
maxPath |
Maximum number of pathways per LV during U-fitting. Default: 10. |
doCrossval |
Whether to mask prior entries for CV evaluation.
Default: TRUE. |
penalty.factor |
Optional vector of per-pathway penalties for glmnet. Default: 1. |
glm_alpha |
Elastic net mixing parameter for U estimation. Default: 0.9. |
minGenes |
Minimum number of genes per pathway. Default: 0. |
tol |
Convergence tolerance for B updates. Default: 5e-4. |
seed |
Random seed for CV masking. Default: 123456. |
allGenes |
If |
rseed |
Optional seed for reproducibility; coordinate descent updates are performed in random order. |
max.U.updates |
Maximum number of U updates (capped by max.iter). |
pathwaySelection |
Pathway selection mode for U fitting
("fast" or "complete"). |
multiplier |
Variance-prior scaling factor. Default: 5. |
adaptive.p |
Quantile of negative Z values used to define adaptive thresholding. Default: 0.05. |
useNNLS |
Whether to use non-negative least squares for U estimation.
Default: TRUE. |
useRaw |
If |
refitEvery |
Frequency (in U updates) of full refits. Default: 3. |
useSE |
Logical; whether to use the 1-standard-error rule for internal glmnet fitting. Default is FALSE. |
var.prior |
Logical; if |
Uscale |
Logical; whether to scale U columns. Default: FALSE. |
robust.vp |
Logical; winsorize prior-predicted Z2 values to reduce
outlier effects. Default: TRUE. |
use_cpp |
Logical; if TRUE, use C++ implementation for Z updates. Default is FALSE. |
clamp_k_method |
Method for selecting |
Cross-validation can be used to evaluate pathway-LV specificity, and a
variance-based prior (var.prior = TRUE) introduces adaptive
shrinkage of Z based on how strongly each latent component aligns
with prior pathways. The scaling factor multiplier (default 5)
controls the strength of this adaptive shrinkage.
This implementation omits ridge-projected priors (Chat) and uses
a lighter variance prior with a lower default multiplier = 5,
allowing more flexible latent representations. Setting
var.prior = FALSE reproduces standard CLAMP-like updates.
Cross-validation, if enabled, masks 20% of gene-pathway associations per
column to estimate pathway-LV specificity (reported via AUC and p-values).
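The masking step can be pictured with a small base-R sketch that hides a fifth of each pathway's annotated genes. The function mask_prior and its output names are hypothetical, chosen to echo the return fields listed for this function; the package's actual fold assignment may differ.

```r
# Hypothetical illustration of holding out annotations; not CLAMP's code.
# For every pathway (column), hide roughly 20% of its annotated genes.
mask_prior <- function(priorMat, frac = 0.2, seed = 123456) {
  set.seed(seed)
  masked <- priorMat
  held_out <- vector("list", ncol(priorMat))
  for (j in seq_len(ncol(priorMat))) {
    members <- which(priorMat[, j] > 0)          # genes annotated to pathway j
    n_hide <- round(frac * length(members))
    hide <- members[sample.int(length(members), n_hide)]
    masked[hide, j] <- 0                         # hidden during training
    held_out[[j]] <- hide                        # kept for evaluation
  }
  list(priorMatCV = masked, heldOutGenes = held_out)
}

prior <- matrix(0, nrow = 20, ncol = 2)
prior[1:10, 1] <- 1
prior[6:20, 2] <- 1
cv <- mask_prior(prior)
colSums(prior) - colSums(cv$priorMatCV)  # entries hidden per pathway
```

The held-out genes can then be scored against the rest of the genes, for example with a rank-based AUC on the corresponding column of Z.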
A list with elements:
B: LV loadings on samples (k × samples)
Z: Gene loadings (genes × k)
U: Pathway loadings (pathways × k)
C: Prior matrix used during training (masked if CV)
Z2: Predicted Z from pathway priors
heldOutGenes: Held-out gene lists per pathway (CV mode)
Uauc, Up, summary: CV evaluation metrics if CV is enabled
priorMat, priorMatCV: Final and masked prior matrices
call: Function call
set.seed(1)
mat <- matrix(rnorm(100), nrow = 10, ncol = 10)
base <- CLAMPbase(mat, clamp_k = 5, trace = FALSE, max.iter = 5)
prior <- matrix(sample(0:1, 10 * 6, TRUE, prob = c(0.9, 0.1)),
  nrow = 10, ncol = 6
)
fit <- CLAMPfull(
  Y = mat, priorMat = prior, clamp.base.result = base,
  doCrossval = FALSE, adaptive.p = 0, max.U.updates = 0,
  max.iter = 1, trace = FALSE
)
Runs the full CLAMP model using a gene expression matrix and prior pathway annotation matrix. This function performs latent variable decomposition guided by prior knowledge and includes optional cross-validation to evaluate pathway associations.
CLAMPfullnVP(
  Y, priorMat, svdres = NULL, clamp.base.result = NULL, clamp_k = NULL,
  svd_k = NULL, L1 = NULL, L2 = NULL, top = NULL, cvn = 5, max.iter = 350,
  trace = FALSE, Chat = NULL, maxPath = 10, doCrossval = TRUE,
  penalty.factor = rep(1, ncol(priorMat)), glm_alpha = 0.9, minGenes = 10,
  tol = 5e-04, seed = 123456, allGenes = FALSE, rseed = NULL,
  max.U.updates = 5, pathwaySelection = c("fast", "complete"),
  multiplier = 1, adaptive.p = 0.05, useNNLS = TRUE, useRaw = TRUE,
  refitAll = FALSE, useSE = FALSE, ncores = 1,
  clamp_k_method = c("elbow", "permutation", "gavish_donoho", "scaleSVs")
)
Y |
Gene expression matrix (genes x samples). Can be dense, sparse (dgCMatrix), or FBM. |
priorMat |
Binary matrix (genes x pathways) representing prior annotations. |
svdres |
Optional SVD result used for initialization. |
clamp.base.result |
Optional result from CLAMPbase(). |
clamp_k |
Number of latent variables for CLAMP (final model rank).
If |
svd_k |
Number of singular values/components to compute in the SVD.
If |
L1 |
Regularization strength for Z. If |
L2 |
Regularization strength for B. If |
top |
If set, keeps only top-n values per column in Z during U updates. |
cvn |
Number of folds for cross-validation in U updates. Default is 5. |
max.iter |
Maximum number of iterations. Default is 350. |
trace |
Logical; if |
Chat |
Optional precomputed matrix for solving U. |
maxPath |
Maximum number of pathways/features selected per LV. Default is 10. |
doCrossval |
Whether to perform pathway-level cross-validation.
Default is TRUE. |
penalty.factor |
Vector of feature-specific penalties for glmnet. Default: all ones. |
glm_alpha |
Elastic net mixing parameter for glmnet. Default is 0.9. |
minGenes |
Minimum number of genes per pathway to retain. Default is 10. |
tol |
Convergence tolerance on relative change in B. Default is 5e-4. |
seed |
Seed for reproducibility of cross-validation masking. Default is 123456. |
allGenes |
If |
rseed |
Optional seed for randomly reinitializing B and Z. |
max.U.updates |
Maximum number of U updates. Default is 5. |
pathwaySelection |
Pathway selection mode: "fast" or "complete". |
multiplier |
Scaling factor for adjusting L1 and L2. |
adaptive.p |
Quantile threshold for adaptively zeroing small Z
values. After each update, the |
useNNLS |
If |
useRaw |
If |
refitAll |
If |
useSE |
Logical; passed to the internal |
ncores |
Number of cores to use for parallel computation (only used if Y is an FBM). Default is 1. |
clamp_k_method |
Method for selecting |
The model alternates between solving Z, B, and U.
Adaptive sparsity is applied to Z using a dynamic threshold based
on the negative tail of its distribution. Cross-validation is used to
hold out gene annotations in priorMat and evaluate latent variable
specificity.
A list with the following components:
B: Latent variable loadings on samples (LVs x samples)
Z: Gene loadings (genes x LVs)
U: Pathway loadings matrix (pathways x LVs)
C: Masked prior matrix used for training
L1, L2: Regularization parameters
heldOutGenes: List of held-out genes per pathway (if CV is enabled)
Uauc: AUC matrix from CV evaluation (if enabled)
Up: -log10(p) values from CV evaluation (if enabled)
summary: Data frame of AUC, p-values, and FDR per pathway x LV (if enabled)
priorMatCV: Masked prior matrix used during CV
priorMat: Final filtered prior matrix
withPrior: Indices of LVs with non-zero pathway loadings
call: Function call
mat <- matrix(rnorm(100), 10, 10)
svdres <- rsvd::rsvd(mat, k = 5)
base <- CLAMPbase(Y = mat, clamp_k = 5, svdres = svdres, trace = FALSE)
priorMat <- matrix(1, nrow(mat), 5)
full <- CLAMPfullnVP(
  Y = mat, priorMat = priorMat, svdres = svdres,
  clamp.base.result = base, clamp_k = 5, doCrossval = FALSE,
  trace = FALSE, max.U.updates = 0
)
For each selected latent variable, ranks genes by their Z loading and plots
the top genes as a loading-versus-rank scatter plot. The highest ranking
genes are labelled with ggrepel.
CLAMPplotTopZ(
  clampRes, data = NULL, priorMat = NULL, top = 50, index = NULL,
  allLVs = FALSE, label.top = min(10, top), max.name.len = 50
)
clampRes |
A CLAMP result list containing at least |
data |
Deprecated; retained for backward compatibility and ignored. |
priorMat |
Deprecated; retained for backward compatibility and ignored. |
top |
Number of top genes to plot per LV. Default 50. |
index |
Integer or character vector of LV columns to include. |
allLVs |
Logical; if |
label.top |
Number of top genes to label per LV. Default min(10, top). |
max.name.len |
Maximum characters for displayed gene labels. |
A ggplot2::ggplot() object for one LV, or a patchwork object for
multiple LVs.
set.seed(1)
genes <- paste0("Gene", 1:80)
lvs <- paste0("LV", 1:4)
paths <- paste0("Path", 1:10)
Z <- matrix(abs(rnorm(80 * 4)),
  nrow = 80,
  dimnames = list(genes, lvs)
)
U <- matrix(abs(rnorm(10 * 4)),
  nrow = 10,
  dimnames = list(paths, lvs)
)
clampRes <- list(Z = Z, U = U)
CLAMPplotTopZ(clampRes, top = 20, index = 1:2)
Displays the pathway loading matrix U after filtering by AUC and FDR
thresholds. Only the top pathways per LV (as many as given by top) are shown.
CLAMPplotU(
  clampRes, auc.cutoff = 0.6, fdr.cutoff = 0.05, indexCol = NULL,
  indexRow = NULL, top = 3, sort.row = FALSE, cluster.rows = TRUE
)
clampRes |
A CLAMP result list containing at least |
auc.cutoff |
Minimum AUC threshold; entries below this are set to zero.
Default 0.6. |
fdr.cutoff |
Maximum FDR threshold for pathway significance filtering.
Default 0.05. |
indexCol |
Integer vector of LV column indices to include. |
indexRow |
Integer vector of pathway row indices to include. |
top |
Number of top pathways to retain per LV. Default 3. |
sort.row |
Logical; if |
cluster.rows |
Logical; if |
Invisibly returns a ggplot2::ggplot() object.
set.seed(42)
pathways <- paste0("Path", 1:30)
lvs <- paste0("LV", 1:5)
U <- matrix(abs(rnorm(30 * 5)),
  nrow = 30,
  dimnames = list(pathways, lvs)
)
Uauc <- matrix(runif(30 * 5, 0.5, 1.0),
  nrow = 30,
  dimnames = list(pathways, lvs)
)
Up <- matrix(runif(30 * 5, 0, 3),
  nrow = 30,
  dimnames = list(pathways, lvs)
)
# Build a minimal summary table
nr <- 30 * 5
summ <- data.frame(
  pathway = rep(pathways, 5),
  LV = rep(lvs, each = 30),
  AUC = as.vector(Uauc),
  p_value = runif(nr, 0, 0.1),
  FDR = runif(nr, 0, 0.1),
  stringsAsFactors = FALSE
)
clampRes <- list(U = U, Uauc = Uauc, Up = Up, summary = summ)
CLAMPplotU(clampRes, auc.cutoff = 0.6, fdr.cutoff = 0.1, top = 3)
This function inspects an FBM to determine if log-transformation is needed (based on value range) and whether NA values are present. If the maximum value is >= 100, it applies a log2(x + 1) transformation in-place. If any NA values are detected, they are replaced with 0.
cleanFBM(fbm, ncores = 1)
fbm |
A bigstatsr::FBM object. |
ncores |
Integer; number of cores to use for parallel operations (default 1). |
Modifies the FBM in place. Uses bigstatsr::big_apply() to process in
parallel-safe chunks.
A list with:
The maximum value encountered in the FBM (after log transformation if applied).
Logical indicating whether any NA values were found and filled.
fbm <- bigstatsr::FBM(3, 4, init = matrix(
  c(0, 1, 2, NA, 100, 200, 300, 400, 5, 6, 7, 8),
  nrow = 3
))
cleanFBM(fbm, ncores = 1)
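The cleaning rule can be tried on an ordinary in-memory matrix without bigstatsr. The helper clean_matrix and the names of its return fields are hypothetical; cleanFBM() itself modifies the FBM in place and processes it in chunks.

```r
# In-memory analogue of the cleaning rule, for illustration only.
# clean_matrix is a hypothetical helper; cleanFBM() works on an FBM in place.
clean_matrix <- function(x) {
  had_na <- anyNA(x)
  if (max(x, na.rm = TRUE) >= 100) {
    x <- log2(x + 1)            # values look like raw counts: log-transform
  }
  x[is.na(x)] <- 0              # fill missing values with 0
  list(x = x, max = max(x), na_filled = had_na)
}

m <- matrix(c(0, 1, 2, NA, 100, 200, 300, 400, 5, 6, 7, 8), nrow = 3)
out <- clean_matrix(m)
out$max        # log2(401), because the transform was applied
out$na_filled  # TRUE
```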
Returns the intersection of row names shared by two input objects.
commonRows(data1, data2)
data1 |
A matrix, data frame, or similar object with row names. |
data2 |
A matrix, data frame, or similar object with row names. |
A character vector of row names common to both inputs.
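In base R the same result can be obtained with intersect(), which is handy for aligning an expression matrix with a prior matrix before fitting. Whether commonRows() returns the names in the order of the first input, as intersect() does, is an assumption; the objects a and b are toy data.

```r
# Base-R equivalent, assuming both inputs carry row names.
a <- matrix(1:6, nrow = 3, dimnames = list(c("g1", "g2", "g3"), NULL))
b <- matrix(1:4, nrow = 2, dimnames = list(c("g3", "g2"), NULL))
shared <- intersect(rownames(a), rownames(b))
shared                            # order follows the first input
a_sub <- a[shared, , drop = FALSE]  # subset both objects to the shared rows
b_sub <- b[shared, , drop = FALSE]
```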
Compares correspondence between two result matrices (e.g., factor loadings) across multiple targets using correlation, AUC, or t-statistics. Produces a paired comparison plot and summary statistics.
compareBs(
  res1, res2, target, method = c("p", "s", "a", "t"), xlab = "1",
  ylab = "2", stat.method = c("t", "wilcox"), oneToOne = TRUE
)
res1 |
Result object or matrix containing factor loadings.
If a list, must contain element |
res2 |
Second result object or matrix to compare against. |
target |
Numeric matrix of target variables (samples × targets). |
method |
Character, one of "p", "s", "a", or "t". |
xlab |
Label for x-axis (used in plot). |
ylab |
Label for y-axis (used in plot). |
stat.method |
Statistical test to compare correlations
("t" or "wilcox"). |
oneToOne |
Logical, whether to apply one-to-one masking of associations. |
A list with:
A ggplot object comparing maximal correlations across targets.
A data.frame with per-target metrics and (if available) top genes.
set.seed(123)
# Simulated example with 50 genes × 20 samples
Y <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
# Run two CLAMP-like decompositions (here using simple SVD)
svd1 <- rsvd::rsvd(Y, k = 5)
svd2 <- rsvd::rsvd(Y + matrix(rnorm(50 * 20, 0, 0.1), 50, 20), k = 5)
# Define a target variable (e.g., binary or continuous trait)
target <- matrix(rnorm(20 * 3), ncol = 3)
colnames(target) <- c("Trait1", "Trait2", "Trait3")
# Compare the two sets of embeddings
res <- compareBs(svd1, svd2, target,
  method = "p",
  xlab = "SVD1", ylab = "SVD2"
)
By default, the SVD backend is auto-detected from the class of Y:
bigstatsr::big_SVD for FBM objects, irlba::irlba for sparse
dgCMatrix objects, and rsvd::rsvd otherwise. A specific backend
can be forced via method. Used by the CLAMP solvers so that the SVD
step is handled in one place.
compute_svd(Y, k = NULL, method = NULL)
Y |
A matrix-like object (dense matrix, dgCMatrix, or FBM). |
k |
Integer number of components to compute. If |
method |
One of |
A list with d, u, v components (structure depends on the
backend but these three fields are always present).
set.seed(1)
Y <- matrix(rnorm(100), nrow = 20, ncol = 5)
res <- compute_svd(Y, k = 3)
length(res$d)
# Force a specific backend:
res2 <- compute_svd(Y, k = 3, method = "rsvd")
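The class-based auto-detection can be sketched as a small dispatch function. This is only an illustration: pick_svd_backend is a hypothetical helper, and the returned labels other than "rsvd" (which appears in the example above) are assumed names, not documented values of method.

```r
# Hypothetical dispatch helper mirroring the rule described above.
# It only reports which backend would be chosen; it does not run an SVD.
pick_svd_backend <- function(Y, method = NULL) {
  if (!is.null(method)) return(method)        # an explicit choice wins
  if (inherits(Y, "FBM")) return("big_SVD")   # file-backed matrix
  if (inherits(Y, "dgCMatrix")) return("irlba")  # sparse matrix
  "rsvd"                                      # everything else
}

pick_svd_backend(matrix(rnorm(20), 4))                    # dense matrix
pick_svd_backend(matrix(rnorm(20), 4), method = "irlba")  # forced backend
```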
Efficiently computes row sums and row sum-of-squares for a
bigstatsr::FBM using column-wise chunking, suitable for large datasets
that cannot be loaded fully into memory.
computeRowStatsFBM(fbm, ncores = 1)
fbm |
A bigstatsr::FBM object. |
ncores |
Integer; number of cores to use for parallel operations (default 1). |
A list with two numeric vectors:
Sum of each row.
Sum of squares of each row.
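The column-wise chunking idea, and how the two returned vectors give row means and variances, can be sketched on an ordinary matrix. The function row_stats_chunked and its field names are hypothetical; the real function reads blocks from a file-backed matrix.

```r
# Column-chunked row statistics on an ordinary matrix; the same pattern
# applies when each block is read from a file-backed matrix.
row_stats_chunked <- function(x, block_size = 2) {
  sums <- numeric(nrow(x))
  sumsq <- numeric(nrow(x))
  for (s in seq(1, ncol(x), by = block_size)) {
    cols <- s:min(s + block_size - 1, ncol(x))
    block <- x[, cols, drop = FALSE]        # only this block is processed
    sums <- sums + rowSums(block)
    sumsq <- sumsq + rowSums(block^2)
  }
  list(sum = sums, sumsq = sumsq)
}

x <- matrix(1:12, nrow = 3)
st <- row_stats_chunked(x)
n <- ncol(x)
row_mean <- st$sum / n
row_var <- (st$sumsq - n * row_mean^2) / (n - 1)   # sample variance per row
```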
Compute counts-per-million (CPM) for CLAMP pipelines
cpmCLAMP(counts)
counts |
A numeric matrix or data.frame of raw counts (genes x samples). |
A numeric matrix of CPM values (same dimensions), ready for CLAMP input.
mat <- matrix(seq_len(12), nrow = 3)
cpmCLAMP(mat)
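The standard CPM definition can be written in two lines of base R. This sketch assumes cpmCLAMP() follows that definition (each sample scaled by its own library size); any additional adjustment the package may apply is not covered here.

```r
# CPM by hand: scale each sample (column) so its counts sum to one million.
counts <- matrix(seq_len(12), nrow = 3)
lib_size <- colSums(counts)                      # total counts per sample
cpm <- sweep(counts, 2, lib_size, "/") * 1e6
colSums(cpm)  # every column sums to 1e6
```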
Compute CPM on a file-backed matrix for CLAMP (in-place)
cpmCLAMPFBM(fbm_counts, block_size = 1000, ncores = 1)
fbm_counts |
A bigstatsr::FBM of raw counts (genes x samples). |
block_size |
Integer; columns per block (default 1000). |
ncores |
Integer; number of cores to use for parallel operations (default 1). |
Invisibly returns the modified FBM (now holding CPM values).
library(bigstatsr)
mat <- matrix(c(10, 20, 30, 40, 50, 60),
  nrow = 2,
  dimnames = list(c("gene1", "gene2"), paste0("sample", seq_len(3)))
)
fbm <- FBM(nrow(mat), ncol(mat), init = mat)
cpmCLAMPFBM(fbm, block_size = 1)
Computes Z^T Y, handling FBM objects from bigstatsr
as well as base R matrices.
cross_ZY(Y, Z)
Y |
Gene expression matrix (genes x samples), dense or FBM. |
Z |
Latent variable matrix (genes x k). |
A numeric matrix giving Z^T Y.
set.seed(123)
genes <- 40
samples <- 10
k <- 4
Y <- matrix(rnorm(genes * samples), nrow = genes)
Z <- matrix(rnorm(genes * k), nrow = genes)
# Compute Z^T Y
res1 <- cross_ZY(Y, Z)
dim(res1) # k × samples
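For a matrix too large to hold in memory, the same product can be assembled one block of samples at a time. The loop below is an illustrative base-R sketch on a dense matrix with an arbitrary block size of 3; it is not how cross_ZY() is necessarily implemented for FBM input.

```r
# Z^T Y assembled one block of samples at a time (illustrative only).
set.seed(123)
Y <- matrix(rnorm(40 * 10), nrow = 40)   # genes x samples
Z <- matrix(rnorm(40 * 4), nrow = 40)    # genes x k

ZtY <- matrix(0, ncol(Z), ncol(Y))       # k x samples result
blocks <- split(seq_len(ncol(Y)), ceiling(seq_len(ncol(Y)) / 3))
for (cols in blocks) {
  ZtY[, cols] <- crossprod(Z, Y[, cols, drop = FALSE])
}
all.equal(ZtY, crossprod(Z, Y))          # matches the one-shot product
```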
Evaluates how well each latent variable in a CLAMP model captures
held-out pathway annotations, using cross-validation over the prior
matrix. For each latent variable and associated pathway, held-out genes
are selected and the AUC is computed using their scores in
clampRes$Z.
crossVal(clampRes, priorMat, priorMatcv)
clampRes |
A list containing |
priorMat |
A binary matrix (genes x pathways) indicating original pathway annotations. |
priorMatcv |
A version of |
A list with:
Uauc: Matrix of AUC values (pathways x LVs)
Upval: Matrix of -log10(p) values (pathways x LVs)
summary: Data frame with pathway, LV index, AUC, p-value, and FDR
A numeric matrix of whole-blood gene expression where rows correspond to genes and columns to samples.
data(dataWholeBlood)
A numeric matrix with G genes (rows) and N samples (columns). Row names are gene symbols, and column names are sample IDs.
This object is a whole-blood RNA-seq expression matrix. The source data are
publicly available from NCBI GEO under accession
GSE130824
(Homo sapiens whole-blood RNA-seq, 36 samples). Raw sequencing data are
deposited in SRA, and the processed normalized expression file is provided
as GSE130824_dataNormedFiltered.txt.gz. The matrix bundled with
CLAMP is derived from that processed file and serves as a compact example
dataset for package demonstrations and unit tests.
A numeric matrix of expression values.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE130824
data(dataWholeBlood)
For each row of a CLAMP B matrix, compares mean activity in a
reference sample group against all other samples using a Wilcoxon
rank-sum test, with Benjamini–Hochberg FDR adjustment.
differentialLVActivity(
    x,
    metadata,
    sample_col = "id",
    group_col = "type",
    reference
)
| x | A CLAMP result list (with element |
| metadata | Data frame with sample identifiers and group labels. |
| sample_col | Name of the column in |
| group_col | Name of the column in |
| reference | Label of the reference group. All other samples are treated as the comparison group. |
A data frame with one row per latent variable, ordered by FDR.
Columns: LV, Mean_Reference, Mean_Comparison,
Mean_Diff, P_Value, FDR.
B <- matrix(rnorm(30), nrow = 3)
rownames(B) <- paste0("LV", seq_len(3))
colnames(B) <- paste0("S", seq_len(10))
meta <- data.frame(
    id = colnames(B),
    type = rep(c("Control", "Case"), each = 5)
)
differentialLVActivity(B, meta, reference = "Control")
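The per-LV test described above can be sketched with base R alone: one Wilcoxon rank-sum test per row of B, followed by Benjamini-Hochberg adjustment. This is only an illustration under assumed toy data; it omits the Mean_Diff column because the sign convention of that column is not stated here.

```r
set.seed(42)
B <- matrix(rnorm(30), nrow = 3,
            dimnames = list(paste0("LV", 1:3), paste0("S", 1:10)))
group <- rep(c("Control", "Case"), each = 5)
is_ref <- group == "Control"

# One Wilcoxon rank-sum test per latent variable (row of B)
p_values <- apply(B, 1, function(b) wilcox.test(b[is_ref], b[!is_ref])$p.value)

res <- data.frame(
  LV = rownames(B),
  Mean_Reference = rowMeans(B[, is_ref, drop = FALSE]),
  Mean_Comparison = rowMeans(B[, !is_ref, drop = FALSE]),
  P_Value = p_values,
  FDR = p.adjust(p_values, method = "BH")
)
res <- res[order(res$FDR), ] # most significant LVs first
```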
Filters an FBM based on row-level mean and variance thresholds, returning a new FBM with only the selected rows.
filterFBM(
    fbm,
    rowStats,
    keep_samples_idx = NULL,
    mean_cutoff = NULL,
    var_cutoff = NULL,
    backingfile = "filtered_fbm"
)
| fbm | A |
| rowStats | A list with numeric vectors |
| keep_samples_idx | Optional integer vector of column indices to retain. Default is NULL. |
| mean_cutoff | Optional minimum mean threshold; rows with means below this are removed. |
| var_cutoff | Optional minimum variance threshold; rows with variances below this are removed. |
| backingfile | A character string specifying the filename (without extension) for the new FBM. |
This function creates a new FBM and copies over only the rows that pass the filtering criteria. The original FBM is unchanged.
A list with:
A new FBM object containing only filtered rows.
Indices of rows retained in the filtering step.
fbm <- bigstatsr::FBM(5, 3, init = matrix(rnorm(15), nrow = 5))
rs <- list(
    row_means = rowMeans(fbm[]),
    row_variances = apply(fbm[], 1, var)
)
out <- filterFBM(fbm, rs,
    mean_cutoff = -0.2, var_cutoff = 0.5,
    backingfile = tempfile()
)
Find the location of the maximum of a smoothing spline
findSplineMax(x, y, n = 1000, spar = NULL)
| x | Numeric vector of x values. |
| y | Numeric vector of y values (same length as |
| n | Integer, number of grid points to evaluate (default 1000). |
| spar | Smoothing parameter passed to |
A named list with components:
The x coordinate at which the spline reaches its maximum.
The corresponding maximum fitted y value.
x <- seq_len(10)
y <- sin(x) + rnorm(10, 0, 0.1)
findSplineMax(x, y)
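The idea of locating a spline maximum on a grid can be sketched with `stats::smooth.spline()` and `predict()`. The component names `x` and `y` of the result list below are assumptions made for this illustration, and the toy data are not from the package.

```r
set.seed(1)
x <- seq(0, 3, length.out = 30)
y <- sin(x) + rnorm(30, 0, 0.05)

fit <- smooth.spline(x, y)                     # smoothing level chosen by R
grid <- seq(min(x), max(x), length.out = 1000) # 1000 evaluation points
pred <- predict(fit, grid)

best <- which.max(pred$y)
spline_max <- list(x = pred$x[best], y = pred$y[best])
```

A finer grid gives a more precise location of the maximum at the cost of more evaluations.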
Given a summary data frame from cross-validation, reports the number of latent variables with maximum AUC exceeding 0.7, 0.8, and 0.9.
getAUCstats(summary)
| summary | A data frame with |
A named numeric vector with counts for thresholds 0.7, 0.8, and 0.9.
Computes the transformation matrix Chat used to map from observed
data to latent space, based on a pseudo-inverse of the prior annotation
matrix. Optionally standardizes the columns
of priorMat before computing.
getChat(priorMat, scale = TRUE)
| priorMat | A numeric or sparse matrix (features x pathways) containing prior annotations. |
| scale | Logical; if |
A numeric matrix Chat of dimensions (pathways x features).
# simple toy prior: 3 features x 2 pathways
priorMat <- matrix(
    c(1, 0, 1, 0, 1, 0),
    nrow = 3, ncol = 2,
    dimnames = list(
        paste0("gene", seq_len(3)),
        paste0("path", seq_len(2))
    )
)
# compute Chat (2 pathways x 3 features)
Chat <- getChat(priorMat)
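To make the role of a pseudo-inverse concrete, the sketch below builds a (pathways x features) matrix as (C'C + lambda I)^(-1) C' in base R, where C is the prior matrix. The tiny ridge term `lambda` and the name `Chat_sketch` are assumptions for this illustration; the regularization and optional column scaling used by getChat() are not reproduced here.

```r
priorMat <- matrix(c(1, 0, 1, 0, 1, 0),
                   nrow = 3, ncol = 2,
                   dimnames = list(paste0("gene", 1:3), paste0("path", 1:2)))

lambda <- 1e-8 # assumed tiny ridge term, for numerical stability only
CtC <- crossprod(priorMat) # pathways x pathways
Chat_sketch <- solve(CtC + diag(lambda, ncol(priorMat)), t(priorMat))

# Chat maps a gene-level vector onto pathway-level coefficients
z <- c(gene1 = 2, gene2 = -1, gene3 = 2)
u <- Chat_sketch %*% z
```

Here path1 contains gene1 and gene3, so its coefficient is their average (2), while path2 contains only gene2 and receives -1.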
Downloads a Gene Matrix Transposed (GMT) file from a specified URL, reads it into R as a list, and removes the temporary file afterward.
getGMT(url, name = NULL, cache_dir = NULL, redownload = FALSE)
| url | A character string specifying the URL to a GMT file. |
| name | Optional name for the GMT file (defaults to the portion after the last '=' in the URL). |
| cache_dir | Optional directory in which to cache/download the GMT file. |
| redownload | Logical; if |
A named list where each element is a character vector of gene names for a given gene set.
url <- paste0(
    "https://maayanlab.cloud/Enrichr/geneSetLibrary",
    "?mode=text&libraryName=KEGG_2019_Human"
)
gmt_list <- getGMT(url)
# list available gene sets
names(gmt_list)
# inspect the first few genes in the first gene set
head(gmt_list[[1]])
Filters a gene-by-pathway annotation matrix to retain only pathways
with sufficient overlap with a given gene set. The result is a sparse matrix
aligned to new.genes, with columns (pathways) retained only if they
have at least min.genes matched genes.
getMatchedPathwayMat(pathMat, new.genes, min.genes = 10)
| pathMat | A sparse binary matrix of genes (rows) x pathways (columns). |
| new.genes | Character vector of gene names to match. |
| min.genes | Minimum number of overlapping genes required to keep a pathway. |
A sparse matrix of dimensions length(new.genes) x
filtered pathways.
library(Matrix)
# create a toy gene-by-pathway sparse matrix
genes <- paste0("g", seq_len(6))
pathways <- c("Path1", "Path2", "Path3")
# Path1: g1, g2; Path2: g2, g3, g4; Path3: g5
pathMat <- sparseMatrix(
    i = c(1, 2, 2, 3, 4, 5),
    j = c(1, 1, 2, 2, 2, 3),
    dims = c(length(genes), length(pathways)),
    dimnames = list(genes, pathways)
)
new.genes <- genes
filtered <- getMatchedPathwayMat(pathMat, new.genes, min.genes = 2)
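The align-then-filter logic can be illustrated with a dense base R matrix. How the package treats genes in new.genes that are absent from pathMat is an assumption here (they are given all-zero rows), and the real function returns a sparse matrix rather than a dense one.

```r
genes <- paste0("g", 1:6)
pathMat <- matrix(0, nrow = 6, ncol = 3,
                  dimnames = list(genes, c("Path1", "Path2", "Path3")))
pathMat[c("g1", "g2"), "Path1"] <- 1
pathMat[c("g2", "g3", "g4"), "Path2"] <- 1
pathMat["g5", "Path3"] <- 1

new.genes <- c("g2", "g3", "g4", "g5", "g7") # g7 is absent from pathMat
min.genes <- 2

# Align rows to new.genes; unmatched genes keep all-zero rows (assumption)
aligned <- matrix(0, nrow = length(new.genes), ncol = ncol(pathMat),
                  dimnames = list(new.genes, colnames(pathMat)))
shared <- intersect(new.genes, rownames(pathMat))
aligned[shared, ] <- pathMat[shared, ]

# Keep pathways with at least min.genes matched genes
keep <- colSums(aligned) >= min.genes
filtered <- aligned[, keep, drop = FALSE]
```

In this toy case only Path2 survives, because Path1 and Path3 each retain a single matched gene.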
Filters gene-by-pathway annotation matrices to retain only pathways
with sufficient overlap with a given gene set. The result is a sparse matrix
aligned to new.genes, combining all inputs column-wise.
getMatchedPathwayMat2(..., new.genes, min.genes = 10)
| ... | One or more sparse binary matrices (genes x pathways). |
| new.genes | Character vector of gene names to match. |
| min.genes | Minimum number of overlapping genes required to keep a pathway. |
A sparse matrix with rows = new.genes and columns =
filtered pathways from all inputs.
Filters one or more gene-by-pathway annotation matrices to retain only
pathways
with sufficient overlap with a given target gene set. Each input matrix is
restricted to new.genes, and pathways with fewer than min.genes
overlapping genes are removed. The resulting matrices are column-bound into a
single sparse matrix aligned to new.genes.
getMatchedPathwayMatList(..., new.genes, min.genes = 10)
| ... | One or more binary matrices (genes x pathways), either base |
| new.genes | Character vector of gene names to align all pathway matrices to. |
| min.genes | Integer; minimum number of overlapping genes required for a pathway to be retained. Default is 10. |
A sparse binary matrix with rows equal to new.genes and
columns equal to
the union of filtered pathways from all input matrices.
set.seed(123)
library(Matrix)
# Simulate two small pathway matrices (genes × pathways)
genes <- paste0("Gene", 1:100)
pathways1 <- paste0("Path", 1:5)
pathways2 <- paste0("Path", 6:10)
mat1 <- matrix(sample(c(0, 1), 100 * 5, replace = TRUE, prob = c(0.9, 0.1)),
    nrow = 100, ncol = 5,
    dimnames = list(genes, pathways1)
)
mat2 <- matrix(sample(c(0, 1), 100 * 5, replace = TRUE, prob = c(0.9, 0.1)),
    nrow = 100, ncol = 5,
    dimnames = list(genes, pathways2)
)
# Define target genes (subset of total)
new.genes <- sample(genes, 50)
# Match and filter pathways with at least 5 genes
matched <- getMatchedPathwayMatList(mat1, mat2,
    new.genes = new.genes,
    min.genes = 5
)
Filters a gene-by-pathway annotation matrix to retain only pathways
with sufficient overlap with a given gene set. The result is a sparse matrix
aligned to new.genes, with columns (pathways) retained only if they
have at least min.genes matched genes.
getMatchedPathwayMatOld(pathMat, new.genes, min.genes = 10)
| pathMat | A sparse binary matrix of genes (rows) x pathways (columns). |
| new.genes | Character vector of gene names to match. |
| min.genes | Minimum number of overlapping genes required to keep a pathway. |
A sparse matrix of dimensions length(new.genes) x
filtered pathways.
Summarizes a cross-validation results data frame to extract the highest AUC value associated with each latent variable (LV).
getMaxAUC(summary, verbose = FALSE)
| summary | A data frame (e.g., from |
| verbose | Logical; if |
A data frame with columns LV index and max_AUC.
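A per-LV maximum and the threshold counts reported by getAUCstats() can be sketched with base R as follows. The column names `LV` and `AUC` in the toy summary are assumptions; the actual names in a crossVal() summary may differ.

```r
# Toy cross-validation summary; the column names are assumptions
cv_summary <- data.frame(
  pathway = c("P1", "P2", "P3", "P4", "P5"),
  LV = c(1, 1, 2, 3, 3),
  AUC = c(0.65, 0.85, 0.72, 0.95, 0.60)
)

# Highest AUC per latent variable
max_auc <- aggregate(AUC ~ LV, data = cv_summary, FUN = max)
names(max_auc) <- c("LV", "max_AUC")

# Number of LVs whose maximum AUC exceeds each threshold
thresholds <- c(0.7, 0.8, 0.9)
counts <- vapply(thresholds, function(t) sum(max_auc$max_AUC > t), numeric(1))
names(counts) <- paste0("AUC>", thresholds)
```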
This function estimates a characteristic scale from a vector of singular
values by fitting a linear model to the tail and extrapolating to length n.
The median of the extrapolated values is returned as the estimate.
If no sufficiently linear tail is detected (based on min_r2) or
the extrapolation produces negative values, a fallback estimate is returned
using the singular value at the 75% position of sv.
getScaleFromSVs(sv, n, min_r2 = 0.95)
| sv | Numeric vector of singular values sorted in decreasing order. |
| n | Integer, total length to which the linear tail is extrapolated. |
| min_r2 | Numeric, minimum R-squared value required for accepting the linear tail fit. Defaults to 0.95. |
A numeric scalar giving the estimated scale. If linear extrapolation
is unreliable, returns sv[ceiling(0.75 * length(sv))].
sv <- exp(-seq(0, 5, length.out = 50)) + rnorm(50, 0, 0.01)
getScaleFromSVs(sv, n = 100)
Converts a list of named gene sets (e.g., from getGMT()) into a sparse
binary matrix where rows are genes, columns are gene sets, and entries
are 1 if the gene is in the set.
gmtListToSparseMat(gmtList)
| gmtList | A nested list of gene sets. Outer names are gene set names; each entry is a character vector of gene names. |
A sparse binary matrix with genes as rows and gene sets as columns.
# define a simple nested GMT list
gmt1 <- list(
    PathwayA = c("Gene1", "Gene2", "Gene3"),
    PathwayB = c("Gene2", "Gene4")
)
gmt2 <- list(
    PathwayC = c("Gene1", "Gene4"),
    PathwayD = c("Gene3", "Gene5")
)
# combine into a nested list
nestedList <- list(gmt1 = gmt1, gmt2 = gmt2)
# convert to sparse matrix
sparseMat <- gmtListToSparseMat(nestedList)
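The gene-by-set membership encoding can be illustrated in base R with a dense matrix. This sketch uses a flat (non-nested) list and a dense result for simplicity, both of which are simplifications relative to gmtListToSparseMat().

```r
gmt <- list(
  PathwayA = c("Gene1", "Gene2", "Gene3"),
  PathwayB = c("Gene2", "Gene4")
)

all_genes <- sort(unique(unlist(gmt)))

# One column per gene set; an entry is 1 if the gene belongs to the set
membership <- vapply(gmt, function(gs) as.numeric(all_genes %in% gs),
                     numeric(length(all_genes)))
rownames(membership) <- all_genes
```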
A factor vector annotating samples by major cell type. This dataset provides cell-type labels corresponding to each sample in the reference expression matrix.
data(majorCellTypes)
A factor (or character) vector of length N, where each element corresponds to a sample and indicates its major cell type.
A factor (or character) vector of major cell-type labels.
data(majorCellTypes)
Multiplies two matrices, using optimized multiplication if the first is a Filebacked Big Matrix (FBM).
mat_mult(mat1, mat2, ncores = 1)
| mat1 | A matrix or an object of class |
| mat2 | A numeric matrix. |
| ncores | Number of cores to use for parallel computation (only used if mat1 is an FBM). Default is 1. |
Matrix product of mat1 and mat2.
set.seed(123)
mat1 <- matrix(rnorm(20), nrow = 4)
mat2 <- matrix(rnorm(15), nrow = 5)
res1 <- mat_mult(mat1, mat2)
res1
Finds a one-to-one assignment (permutation) between rows and columns of a square correlation matrix that maximizes the total correlation score, using a greedy algorithm.
max_correspondence_greedy(cor_mat)
| cor_mat | A square numeric matrix of pairwise correlations (rows = items, cols = items). |
A vector of assignments (integer indices).
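A greedy one-to-one matching of this kind can be sketched as follows: repeatedly take the largest remaining entry, record the pair, and exclude its row and column. The helper `greedy_match` is hypothetical, and the convention that element i of the result is the column assigned to row i is an assumption about the output format.

```r
# Hypothetical helper: greedy row-to-column assignment
greedy_match <- function(cor_mat) {
  n <- nrow(cor_mat)
  assignment <- integer(n)
  work <- cor_mat
  for (step in seq_len(n)) {
    idx <- which(work == max(work, na.rm = TRUE), arr.ind = TRUE)[1, ]
    assignment[idx["row"]] <- idx["col"]
    work[idx["row"], ] <- NA # this row is now used
    work[, idx["col"]] <- NA # this column is now used
  }
  assignment
}

cor_mat <- matrix(c(0.20, 0.90, 0.10,
                    0.80, 0.85, 0.30,
                    0.10, 0.20, 0.60), nrow = 3, byrow = TRUE)
matched <- greedy_match(cor_mat)
```

Row 2 ends up paired with column 1 (0.80) even though its best entry is column 2 (0.85), because column 2 was already taken by row 1. A greedy pass is fast but is not guaranteed to find the assignment with the highest total score.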
Wrapper around message() that pastes arguments together into a single
string.
mymessage(...)
| ... | Character strings to concatenate and print. |
Invisibly returns NULL. Called for side effects (messages).
Estimate number of principal components via elbow or permutation method
num.pc(data, method = c("elbow", "permutation"), B = 20, seed = NULL)
| data | Either a matrix (e.g. z-scored data) or an SVD result (list with $d). |
| method | One of "elbow" (fast) or "permutation" (slower). |
| B | Number of permutations (for method = "permutation"). |
| seed | Seed for reproducibility. |
Estimated number of PCs.
# generate a small random matrix: 5 features x 10 samples
mat <- matrix(rnorm(5 * 10), nrow = 5)
# fast elbow estimate
num.pc(mat, method = "elbow")
# slower permutation estimate (use fewer perms for example speed)
num.pc(mat, method = "permutation", B = 5)
Selects the highest-scoring one-to-one pairs between rows and columns
of a matrix, similar to a greedy bipartite matching. All other entries
are set to a sentinel value (default -100).
oneToOneMask(cc)
| cc | A numeric matrix of association scores. |
A numeric matrix of the same dimensions as cc,
where only the selected one-to-one maxima are retained
and all other entries are set to -100.
set.seed(1)
m <- matrix(runif(16), 4, 4)
oneToOneMask(m)
A list of curated pathway and biological process gene sets used as prior knowledge for CLAMP and other latent-variable models.
data(panDB)
A named list of length M, where each element is a gene-set collection.
Each element corresponds to a functional collection (e.g., KEGG, Reactome, GO), where each entry contains a character vector of gene symbols.
A list of gene-set collections used as priors for pathway-informed modeling.
data(panDB)
Computes a stable pseudoinverse of a symmetric positive semi-definite matrix using singular value decomposition (SVD) and ridge regularization. This is useful when the matrix is ill-conditioned or rank-deficient.
pinv.ridge(m, alpha = 0)
| m | A symmetric numeric matrix (e.g., from |
| alpha | Non-negative scalar specifying the ridge penalty. A small positive value stabilizes the inversion by damping the contribution of small singular values. |
A numeric matrix representing the ridge-regularized
pseudoinverse of m.
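One common way to write an SVD-based ridge pseudoinverse replaces each inverse singular value 1/d with d / (d^2 + alpha^2). The sketch below uses that form as an assumption; the exact formula used by pinv.ridge() is not given here, and the name `pinv_ridge_sketch` is hypothetical.

```r
# Hypothetical sketch of a ridge-regularized pseudoinverse via SVD
pinv_ridge_sketch <- function(m, alpha = 0) {
  s <- svd(m)
  # Discard numerically null directions so that alpha = 0 is safe
  keep <- s$d > max(dim(m)) * max(s$d) * .Machine$double.eps
  d_inv <- s$d[keep] / (s$d[keep]^2 + alpha^2) # assumed ridge form
  s$v[, keep, drop = FALSE] %*% (d_inv * t(s$u[, keep, drop = FALSE]))
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 10)
m <- crossprod(X) # symmetric positive semi-definite, 2 x 2

plain <- pinv_ridge_sketch(m)            # equals solve(m) for invertible m
ridged <- pinv_ridge_sketch(m, alpha = 1) # shrunken inverse
```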
Produces a combined heatmap showing expression, pathway membership, and optionally Z-loadings for top genes per LV using ComplexHeatmap.
plotTopZ_Complex(
    clampRes,
    data,
    priorMat,
    top = 10,
    top.pathway = 5,
    index = NULL,
    allLVs = FALSE,
    Zheat = FALSE,
    LV.names = NULL,
    max.genes = 100,
    max.col = 50,
    seed = 1234
)
| clampRes | A CLAMP result list containing matrices |
| data | Expression matrix with genes as rows. |
| priorMat | Binary gene × pathway matrix. |
| top | Integer, number of top genes per LV. |
| top.pathway | Integer, number of top pathways per LV to annotate. |
| index | Optional vector of LVs to include. |
| allLVs | Logical; include all LVs. |
| Zheat | Logical; whether to include the Z matrix as an additional heatmap. |
| LV.names | Optional character vector for LV names. |
| max.genes | Maximum number of genes allowed in the plot. |
| max.col | Maximum number of columns (samples). |
| seed | Random seed for column subsampling. |
Invisibly returns the drawn ComplexHeatmap object.
library(ComplexHeatmap)
library(Matrix)
# Simulate small CLAMP-like results
set.seed(123)
genes <- paste0("Gene", 1:100)
samples <- paste0("S", 1:20)
lvs <- paste0("LV", 1:3)
# Simulated Z (gene loadings) and U (pathway loadings)
Z <- matrix(rnorm(100 * 3), nrow = 100, dimnames = list(genes, lvs))
U <- matrix(abs(rnorm(50 * 3)),
    nrow = 50,
    dimnames = list(paste0("Path", 1:50), lvs)
)
# Expression data
expr_data <- matrix(rnorm(100 * 20),
    nrow = 100,
    dimnames = list(genes, samples)
)
# Binary gene × pathway matrix
priorMat <- matrix(sample(0:1, 100 * 50, replace = TRUE, prob = c(0.9, 0.1)),
    nrow = 100, ncol = 50,
    dimnames = list(genes, paste0("Path", 1:50))
)
# Create a CLAMP-like result list
clampRes <- list(Z = Z, U = U)
# Plot top genes and pathway memberships
plotTopZ_Complex(clampRes, expr_data, priorMat,
    top = 5, top.pathway = 3, index = 1:2, Zheat = TRUE
)
Filters genes by mean expression and variance, returning the filtered matrix and per-gene statistics.
preprocessCLAMP(Y, mean_cutoff = 0, var_cutoff = 0)
| Y | Numeric matrix of gene expression (rows = genes, cols = samples) |
| mean_cutoff | Numeric. Minimum row-mean required to keep a gene (default 0). |
| var_cutoff | Numeric. Minimum row-variance required to keep a gene (default 0). |
A list with components:
Y_filtered: filtered matrix (genes x samples)
rowStats: data.frame with columns mean and variance for each kept gene
kept_rows: integer vector of the original row indices that were kept
# construct a small example matrix
mat <- matrix(
    c(1, 5, 10, 2, 6, 11, 3, 7, 12, 4, 8, 13),
    nrow = 4, byrow = FALSE,
    dimnames = list(paste0("gene", seq_len(4)), paste0("sample", seq_len(3)))
)
# keep genes with mean >= 6 and variance >= 2
res <- preprocessCLAMP(mat, mean_cutoff = 6, var_cutoff = 2)
Makes a writable copy of the input FBM, cleans it (log2 transform if needed, fill NAs), filters rows by mean/variance, and returns the filtered FBM plus stats and indices.
preprocessCLAMPFBM(
    fbm,
    mean_cutoff = NULL,
    var_cutoff = NULL,
    backingfile = NULL,
    block_size = 1000,
    ncores = 1
)
| fbm | A bigstatsr::FBM (genes x samples), possibly read-only. |
| mean_cutoff | Numeric or NULL. Minimum row mean to keep (NULL = no mean filter). |
| var_cutoff | Numeric or NULL. Minimum row variance to keep (NULL = no var filter). |
| backingfile | Character or NULL. Base name for the copy FBM and filtered FBM on disk. If NULL, defaults to paste0(fbm$backingfile, '_preproc') and '_filtered'. |
| block_size | Number of rows to process at a time when copying data. Default is 1000. |
| ncores | Integer; number of cores to use for parallel operations (default 1). |
A list with:
| fbm_filtered | The filtered FBM (writable). |
| rowStats | List with row_means & row_variances for fbm_filtered. |
| kept_rows | Integer vector of original row indices that were retained. |
library(bigstatsr)
# create a toy matrix and back it with an FBM
mat <- matrix(
    c(
        1, 2, 3, # geneA
        10, 20, 30, # geneB
        100, 200, 300 # geneC
    ),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", seq_len(3)))
)
fbm <- FBM(nrow(mat), ncol(mat), init = mat)
# preprocess without filtering (all genes kept)
res_all <- preprocessCLAMPFBM(fbm)
Computes the latent loadings B for new gene expression data using
the latent variables Z from a fitted CLAMP model. This allows
transfer of the learned latent structure to new datasets with matched genes.
projectCLAMP(
    CLAMPres,
    newdata,
    scale = 1,
    ncores = 1,
    align = TRUE,
    verbose = TRUE
)
| CLAMPres | A result object from |
| newdata | A gene expression matrix (genes x samples) to be projected. Can be a standard matrix, sparse matrix, or FBM/big.matrix. |
| scale | Optional numeric multiplier for the L2 regularization terms. Default is 1. |
| ncores | Number of cores to use for parallel computation (only used if newdata is an FBM). Default is 1. |
| align | Logical; if |
| verbose | Logical; if |
This function uses ridge-regularized least squares to compute
B = solve(Z'Z + L2 * I) * Z'Y, where Z is the latent matrix
from the trained CLAMP model and Y is the new dataset. If
newdata is a Filebacked Big Matrix (FBM) and does not need row-name
subsetting, the computation is optimized using
bigstatsr::big_cprodMat().
A matrix B of projected latent loadings (LVs x samples)
for the new dataset.
# fit a tiny CLAMP model for projection
Y0 <- matrix(rnorm(5 * 3),
    nrow = 5,
    dimnames = list(paste0("Gene", 1:5), paste0("S", 1:3))
)
base <- CLAMPbase(Y0, clamp_k = 2, max.iter = 1, trace = FALSE)
# new data can be provided in a different row order
newY <- matrix(rnorm(5 * 2),
    nrow = 5,
    dimnames = list(rev(rownames(Y0)), paste0("N", 1:2))
)
projB <- projectCLAMP(base, newdata = newY)
# check dimensions: 2 latent vars x 2 samples
dim(projB)
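The projection formula B = solve(Z'Z + L2 * I) * Z'Y can be written out in base R for dense matrices, including the row alignment by gene name. The value of `L2` and the toy Z below are assumptions for this illustration; a fitted CLAMP model would supply its own Z and penalty.

```r
set.seed(7)
genes <- paste0("Gene", 1:6)
Z <- matrix(rnorm(6 * 2), nrow = 6, dimnames = list(genes, c("LV1", "LV2")))
newY <- matrix(rnorm(6 * 3), nrow = 6,
               dimnames = list(rev(genes), paste0("N", 1:3)))

# Align the rows of the new data to the gene order of Z
Y_aligned <- newY[rownames(Z), , drop = FALSE]

L2 <- 0.5 # assumed penalty value for this sketch
lhs <- crossprod(Z) + L2 * diag(ncol(Z)) # Z'Z + L2 * I  (k x k)
rhs <- crossprod(Z, Y_aligned)           # Z'Y           (k x samples)
B_proj <- solve(lhs, rhs)
```

Passing both sides to `solve()` avoids forming the explicit inverse, which is usually the more stable choice.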
Parses a local GMT file and returns a list of gene sets. Each gene set is represented as a character vector of unique gene names.
read_gmt(filename)
| filename | A |
A list where each element is a character vector of gene names,
named by the gene set ID.
# Bioconductor requires runnable examples.
# We create a dummy GMT file for this example:
gmt_file <- tempfile(fileext = ".gmt")
writeLines(
    c(
        "PATHWAY_A\thttp://link.com\tGENE1\tGENE2\tGENE3",
        "PATHWAY_B\thttp://link.com\tGENE2\tGENE4"
    ),
    con = gmt_file
)
# Run the function
gs <- read_gmt(gmt_file)
# Inspect results
length(gs)
names(gs)
gs[["PATHWAY_A"]]
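For readers unfamiliar with the GMT layout (set name, description, then tab-separated genes on each line), a minimal base R parser could look like the sketch below. It is an illustration of the file format rather than the code used by read_gmt().

```r
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c("PATHWAY_A\thttp://link.com\tGENE1\tGENE2\tGENE2",
             "PATHWAY_B\thttp://link.com\tGENE2\tGENE4"), con = gmt_file)

fields <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)

# Drop the name and description fields, keep unique gene names
gene_sets <- lapply(fields, function(f) unique(f[-(1:2)]))
names(gene_sets) <- vapply(fields, `[`, character(1), 1)

unlink(gmt_file) # remove the temporary file
```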
Solves (Z'Z + L2) B = Z'Y for B, with ridge penalty matrix L2.
ridge_B(Y, Z, L2k)
| Y | Gene expression matrix (genes x samples), dense or FBM. |
| Z | Latent variable matrix (genes x k). |
| L2k | Ridge penalty matrix (k x k). |
A numeric matrix of size k x samples.
set.seed(123)
genes <- paste0("Gene", 1:50)
samples <- paste0("S", 1:20)
k <- 5
Y <- matrix(rnorm(50 * 20), nrow = 50, dimnames = list(genes, samples))
Z <- matrix(rnorm(50 * k),
    nrow = 50,
    dimnames = list(genes, paste0("LV", 1:k))
)
lambda <- 0.1
L2k <- diag(lambda, k)
# Solve for B = (Z'Z + L2)^(-1) Z'Y
B <- ridge_B(Y, Z, L2k)
Ensures consistency in SVD output by flipping signs so that each left singular vector has a majority of positive entries.
rotateSVD(svdres)
| svdres | A list as returned by |
A modified svd-result list where each column of $u
has been sign-flipped so that its entries sum to a nonnegative value;
$v is flipped correspondingly.
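The sign convention stated in the return value (each column of $u sums to a nonnegative value, with $v flipped to match) can be sketched in base R as follows. Flipping the same column in both factors leaves the reconstructed matrix unchanged.

```r
set.seed(3)
X <- matrix(rnorm(12), nrow = 4)
s <- svd(X)

# -1 for columns of u whose entries sum to a negative value, +1 otherwise
flip <- ifelse(colSums(s$u) < 0, -1, 1)
s$u <- sweep(s$u, 2, flip, `*`)
s$v <- sweep(s$v, 2, flip, `*`)

# The product u diag(d) v' is unaffected by paired sign flips
X_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
```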
Computes the Pearson correlation for each row between matrices
A and B.
row_cor(A, B)
| A | A numeric matrix. |
| B | A numeric matrix of the same dimensions as |
A numeric vector of correlations, one per row.
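Row-wise Pearson correlation can be computed without a loop by centering each row and using row sums. The helper name `row_cor_sketch` is hypothetical, and the sketch assumes complete data with no missing values.

```r
# Hypothetical vectorized row-wise Pearson correlation
row_cor_sketch <- function(A, B) {
  Ac <- A - rowMeans(A) # center each row of A
  Bc <- B - rowMeans(B) # center each row of B
  rowSums(Ac * Bc) / sqrt(rowSums(Ac^2) * rowSums(Bc^2))
}

set.seed(5)
A <- matrix(rnorm(12), nrow = 3)
B <- matrix(rnorm(12), nrow = 3)
r <- row_cor_sketch(A, B)
```

Each element of `r` matches `cor(A[i, ], B[i, ])` for the corresponding row i.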
Run elbow method to estimate number of PCs
run_elbow(d)
| d | Vector of singular values |
Estimated number of PCs via elbow
Run permutation method to estimate number of PCs
run_permutation(data, d, B = 20)
| data | Raw data matrix (row-normalized) |
| d | Vector of singular values |
| B | Number of permutations |
Estimated number of PCs via permutation test
Chooses the default clamp_k used by the CLAMP solvers when the user
does not provide one, and returns the scale used for downstream L1/L2
regularization. Multiple methods are available via method:
select_clamp_k(
    svdres,
    n_samples,
    svd_k,
    method = c("elbow", "permutation", "gavish_donoho", "scaleSVs"),
    data = NULL,
    B = 20
)
| svdres | An SVD result with a |
| n_samples | Integer number of samples in the original matrix (i.e. |
| svd_k | Integer upper bound on |
| method | One of |
| data | Raw data matrix. Required for |
| B | Number of permutations for |
"elbow" (default): Elbow heuristic on the singular-value spectrum via num.pc(svdres, method = "elbow"). scale = svdres$d[clamp_k].
"permutation": Permutation test via num.pc(data, method = "permutation", B = B). Requires the raw row-normalized data matrix. scale = svdres$d[clamp_k].
"gavish_donoho": Gavish-Donoho optimal singular-value threshold via PCAtools::chooseGavishDonoho(). Requires the raw data matrix (used for n_genes). scale = svdres$d[clamp_k].
"scaleSVs": Previous behavior: getScaleFromSVs() linear-tail fit, clamp_k <- min(floor(k * 1.5), svd_k), scale from the fit.
An integer: the selected number of latent variables.
set.seed(1)
Y <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)
svdres <- compute_svd(Y, k = 25)
select_clamp_k(svdres, n_samples = ncol(Y), svd_k = 25)
Returns the default number of components to compute in a truncated SVD
for the given input matrix. Used by the CLAMP solvers when svd_k is
not provided explicitly. Other SVD contexts use their own heuristics.
select_svd_k(Y)
| Y | A matrix-like object (dense matrix, |
An integer: max(2, floor((min(nrow(Y), ncol(Y)) - 1) / 4)).
select_svd_k(matrix(0, nrow = 100, ncol = 20))
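The documented rule max(2, floor((min(nrow(Y), ncol(Y)) - 1) / 4)) can be checked by hand for a couple of shapes. The helper `svd_k_rule` below is hypothetical and simply restates that formula in terms of the two dimensions.

```r
# Hypothetical restatement of the documented default rule
svd_k_rule <- function(n_rows, n_cols) {
  max(2, floor((min(n_rows, n_cols) - 1) / 4))
}

k_wide <- svd_k_rule(100, 20) # smaller dimension 20: floor(19 / 4) = 4
k_tiny <- svd_k_rule(5, 3)    # floor(2 / 4) = 0, so the lower bound 2 applies
```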
For each column of a target matrix Z, fits a regularized (glmnet)
regression on the pathway or prior annotation priorMat. The
regularization strength is selected by cross-validation, and both
continuous and binary response models are supported. Relaxed refitting
is supported for final coefficient estimation.
solveU(
    Z,
    Chat = NULL,
    priorMat,
    penalty.factor,
    pathwaySelection = c("fast", "complete"),
    alpha = 0.9,
    maxPath = 10,
    nfolds = 5,
    useSE = FALSE,
    top = NULL,
    binary = FALSE,
    nlambda = 20,
    scale = TRUE,
    refit = TRUE,
    Uprev = NULL,
    useAUC = TRUE,
    intercept = TRUE,
    ...
)
| Z | A numeric matrix with features (rows) and samples (columns). |
| Chat | (Optional) Precomputed pseudo-inverse of |
| priorMat | A numeric matrix with prior information (features x pathways). |
| penalty.factor | Optional penalty weights for features in |
| pathwaySelection | Method to select candidate pathways: |
| alpha | Elastic net mixing parameter (0 = ridge, 1 = lasso). Default is 0.9. |
| maxPath | Maximum number of pathways/features selected per column. Default is 10. |
| nfolds | Number of cross-validation folds. Default is 5. |
| useSE | Whether to use the 1-standard-error rule for lambda selection. Default is FALSE. |
| top | If set, sets to 0 all but the top entries of |
| binary | If |
| nlambda | Number of lambda values for glmnet. Default is 20. |
| scale | Whether to standardize predictors in glmnet. Default is TRUE. |
| refit | Whether to perform relaxed refitting using selected predictors. Default is TRUE. |
| Uprev | (Optional) Previous U matrix to reuse. In this mode only the columns of |
| useAUC | Logical; whether to compute pathway-LV associations using AUC (default TRUE) instead of OLS. |
| intercept | Logical; whether to include an intercept term in glmnet models. Default is TRUE. |
| ... | Additional arguments passed to |
A list with one element:
U: A matrix of loadings (features x components).
Columns are named LV1, LV2, ...
set.seed(123)
genes <- paste0("G", 1:200)
lvs <- paste0("LV", 1:4)
paths <- paste0("Path", 1:60)
Z <- matrix(rnorm(200 * 4), nrow = 200, dimnames = list(genes, lvs))
priorMat <- matrix(rbinom(200 * 60, 1, 0.07),
  nrow = 200,
  dimnames = list(genes, paths)
)
fit1 <- solveU(
  Z = Z,
  priorMat = priorMat,
  pathwaySelection = "fast",
  alpha = 0.9,
  maxPath = 10,
  nfolds = 5,
  binary = FALSE,
  refit = TRUE
)
Squash extreme z-scores
squashZscore(zdata, maxScore = 2)
zdata |
Numeric vector or matrix of z-scores. |
maxScore |
Numeric scalar, maximum absolute score (default 2). |
A numeric object of same dimensions as zdata, with values
shrunk by a hyperbolic tangent transformation.
z <- rnorm(10, 0, 5)
squashZscore(z)
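To give a feel for what a hyperbolic tangent squash does, here is a minimal base-R sketch. The formula `max_score * tanh(z / max_score)` is an assumption chosen for illustration, and `squash_sketch` is a hypothetical name; the exact expression used inside `squashZscore()` may differ.

```r
# Assumed form of a tanh squash (illustrative; not taken from CLAMP's source).
# Values near 0 pass through almost unchanged, while large values
# saturate smoothly toward +/- max_score.
squash_sketch <- function(z, max_score = 2) {
  max_score * tanh(z / max_score)
}

z <- c(-10, -1, 0, 0.1, 1, 10)
squashed <- squash_sketch(z)
```

Under this assumed form the transformation is monotone, so the ranking of the scores is preserved while outliers lose their leverage.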
Standardizes each row of a numeric matrix to have mean 0 and standard deviation 1. Missing values are ignored in the computation of the mean and standard deviation.
tscale(x)
x |
A numeric matrix. Each row will be scaled independently. |
A numeric matrix of the same dimensions as x, where each row
has mean 0 and standard deviation 1 (ignoring NAs). If a row
has zero variance, it is returned unchanged.
mat <- matrix(seq_len(9), nrow = 3)
tscale(mat)
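The behaviour described above (row-wise standardization, NAs ignored, zero-variance rows left unchanged) can be sketched in base R as follows. `tscale_sketch` is a hypothetical re-implementation for illustration, not the package's source code.

```r
# Hypothetical re-implementation of the documented behaviour (not CLAMP code).
tscale_sketch <- function(x) {
  mu <- rowMeans(x, na.rm = TRUE)
  s <- apply(x, 1, sd, na.rm = TRUE)
  ok <- !is.na(s) & s > 0  # rows with a usable, non-zero standard deviation
  out <- x
  # Subtracting a length-nrow vector from a matrix recycles down the rows,
  # so each row is centered and scaled by its own statistics.
  out[ok, ] <- (x[ok, , drop = FALSE] - mu[ok]) / s[ok]
  out
}

mat <- matrix(c(1, 4, 7,    # ordinary row
                5, 5, 5,    # zero-variance row, returned unchanged
                2, NA, 6),  # row with a missing value
              nrow = 3, byrow = TRUE)
scaled <- tscale_sketch(mat)
```

The first row becomes -1, 0, 1; the constant row is passed through; and the missing entry in the third row stays NA while the remaining values are scaled using the two observed entries.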
For each column, replaces values greater than the k-th largest entry with that threshold.
winsor_topk(M, k)
M |
A numeric matrix. |
k |
Integer; number of top elements to cap. Must be >= 1 and <= nrow(M). |
A numeric matrix of the same dimensions as M, winsorized
per column.
set.seed(123)
M <- matrix(rnorm(20 * 5, mean = 0, sd = 2),
  nrow = 20,
  dimnames = list(paste0("Gene", 1:20), paste0("S", 1:5))
)
# Display column maxima before winsorization
apply(M, 2, max)
# Winsorize each column by capping top 3 values
M_winsor <- winsor_topk(M, k = 3)
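The capping rule (values above the k-th largest entry of a column are replaced by that entry) can be written in a few lines of base R. This is a sketch of the documented behaviour under the name `winsor_topk_sketch`, which is hypothetical and not the package implementation.

```r
# Hypothetical sketch of per-column top-k winsorization (not CLAMP code).
winsor_topk_sketch <- function(M, k) {
  stopifnot(is.matrix(M), k >= 1, k <= nrow(M))
  for (j in seq_len(ncol(M))) {
    thr <- sort(M[, j], decreasing = TRUE)[k]  # k-th largest value in column j
    M[, j] <- pmin(M[, j], thr)                # cap everything above the threshold
  }
  M
}

M_small <- cbind(a = c(1, 5, 3, 9, 7), b = c(10, 2, 8, 4, 6))
W <- winsor_topk_sketch(M_small, k = 3)
```

With `k = 3`, the threshold in column `a` is 5 and in column `b` is 6, so only the two entries strictly above each threshold change.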
A numeric matrix of cell-type signatures derived from xCell, used to represent the transcriptional profiles of distinct immune and stromal cell populations.
data(xCell)
A numeric matrix with G genes (rows) and C cell types (columns). Row names are gene symbols; column names are xCell cell-type labels.
A matrix of xCell-derived reference expression signatures.
data(xCell)
Centers each gene to mean 0 and scales to unit variance.
zscoreCLAMP(Y_filtered, rowStats)
Y_filtered |
Numeric matrix (genes x samples) returned by preprocessCLAMP |
rowStats |
Data frame with numeric columns |
Numeric matrix of the same dimensions as Y_filtered, with each row centered and scaled.
# simple 2 genes x 3 samples matrix
Y <- matrix(
  c(
    2, 4, 6, # gene1 counts
    8, 10, 12 # gene2 counts
  ),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"), paste0("sample", seq_len(3)))
)
# compute per-gene mean and variance
rowStats <- data.frame(
  mean = rowMeans(Y),
  variance = apply(Y, 1, var),
  row.names = rownames(Y)
)
# z-score each row
Y_z <- zscoreCLAMP(Y, rowStats)
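For reference, z-scoring rows from precomputed statistics amounts to subtracting each row's mean and dividing by the square root of its variance. The base-R sketch below assumes the `mean` and `variance` column names used in the example above; `zscore_sketch` is a hypothetical name and any additional handling inside `zscoreCLAMP()` (for example of zero-variance genes) is not reproduced.

```r
# Hypothetical sketch (not CLAMP code): row-wise z-score from precomputed stats.
zscore_sketch <- function(Y, rowStats) {
  # A length-nrow vector recycles down the rows of a matrix,
  # so each gene uses its own mean and variance.
  (Y - rowStats$mean) / sqrt(rowStats$variance)
}

Y_demo <- matrix(
  c(2, 4, 6,
    8, 10, 12),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"), paste0("sample", seq_len(3)))
)
stats_demo <- data.frame(
  mean = rowMeans(Y_demo),
  variance = apply(Y_demo, 1, var)
)
Y_demo_z <- zscore_sketch(Y_demo, stats_demo)
```

Both genes have variance 4, so each row maps to -1, 0, 1 even though their raw means differ.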
Standardizes each row of an FBM using provided row means and variances.
zscoreCLAMPFBM(fbm_filtered, rowStats, chunk_size = 1000, ncores = 1)
fbm_filtered |
A bigstatsr::FBM produced by preprocessCLAMPFBM(). |
rowStats |
A list with row_means and row_variances from that FBM. |
chunk_size |
Columns per block (default 1000). |
ncores |
Integer; number of cores to use for parallel operations (default 1). |
A normalized FBM with z-scored rows.
library(bigstatsr)
fbm <- FBM(
  nrow = 2, ncol = 4,
  init = matrix(seq_len(8), nrow = 2)
)
stats <- list(
  row_means = rowMeans(fbm[]),
  row_variances = apply(fbm[], 1, var)
)
zscoreCLAMPFBM(fbm, stats, chunk_size = 2)
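The idea behind `chunk_size` is that only a block of columns needs to be in memory at a time. The sketch below illustrates that pattern on an ordinary in-memory matrix standing in for the FBM; `zscore_by_chunks` is a hypothetical function, and the real implementation's block handling and parallelism are not shown.

```r
# Hypothetical sketch (not CLAMP code): z-score rows one column block at a time.
# A plain matrix is used here as a stand-in for a file-backed matrix.
zscore_by_chunks <- function(X, row_means, row_variances, chunk_size = 1000) {
  starts <- seq(1, ncol(X), by = chunk_size)
  for (s in starts) {
    cols <- s:min(s + chunk_size - 1, ncol(X))  # columns in this block
    block <- X[, cols, drop = FALSE]
    X[, cols] <- (block - row_means) / sqrt(row_variances)
  }
  X
}

X <- matrix(as.numeric(seq_len(8)), nrow = 2)
Z_chunked <- zscore_by_chunks(X, rowMeans(X), apply(X, 1, var), chunk_size = 3)
```

Because the row statistics are computed once over the full matrix and reused for every block, the result does not depend on the chunk size.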