| Title: | comparative single-cell transcriptomics between disease models and a human condition |
|---|---|
| Description: | Provides with toolkits to implement a full singIST analysis with pseudobulked Seurat objects of disease models and human data. |
| Authors: | Aitor Moruno-Cuenca [aut, cre] (ORCID: <https://orcid.org/0009-0009-8133-2552>), Dr. Sergio Picart-Armada [rev] (ORCID: <https://orcid.org/0000-0002-6426-8204>), Dr. Alexandre Perera-Lluna [ths] (ORCID: <https://orcid.org/0000-0001-6427-851X>), Dr. Francesc Fernández-Albert [ths] (ORCID: <https://orcid.org/0000-0001-5561-0701>) |
| Maintainer: | Aitor Moruno-Cuenca <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-30 09:47:47 UTC |
| Source: | https://git.bioconductor.org/packages/singIST |
Ensure all celltype–sample combinations are present in the pseudobulkmatrix
add_missing_psb_rows(mat, celltypes, sample_ids)add_missing_psb_rows(mat, celltypes, sample_ids)
mat |
A numeric matrix with rownames in the form “celltype_sample”. |
celltypes |
Character vector of the celltypes you intend to include
(in the exact order of |
sample_ids |
Character vector of all sample identifiers
(in the order of |
A numeric matrix with length(celltypes) * length(sample_ids)
rows, in the canonical paste(celltypes, sample_ids, sep = "_")
order, where newly added rows are filled with NA_real_.
Implements stratified K‐fold cross‐validation with repetitions, mirroring the
structure of asmbPLSDA.cv.loo but using K k and ncv instead of LOO.
asmbPLSDA.cv.kcv( X.matrix, Y.matrix, PLS_term = 2, X.dim, quantile.comb.table, k = 4, ncv = 10, outcome.type = c("binary", "multiclass"), Method = NULL, measure = "B_accuracy", parallel = FALSE, expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100 )asmbPLSDA.cv.kcv( X.matrix, Y.matrix, PLS_term = 2, X.dim, quantile.comb.table, k = 4, ncv = 10, outcome.type = c("binary", "multiclass"), Method = NULL, measure = "B_accuracy", parallel = FALSE, expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100 )
X.matrix |
Predictor matrix (n×p) |
Y.matrix |
Response one‐hot matrix (n×q) |
PLS_term |
Integer: maximum number of PLS components |
X.dim |
Vector: feature counts per block |
quantile.comb.table |
Matrix (C×length(X.dim)): quantile combinations |
k |
Integer: number of CV k (K) |
ncv |
Integer: number of ncv |
outcome.type |
"binary" or "multiclass" |
Method |
Prediction method |
measure |
"B_accuracy","accuracy","precision","recall","F1" |
parallel |
Logical: TRUE to parallelize per-fold |
expected.measure.increase |
Numeric: min performance gain to add PLS |
center |
Logical: center predictors |
scale |
Logical: scale predictors |
maxiter |
Integer: max iterations for asmbPLSDA.fit |
A list with:
quantile_table_CV |
Matrix (PLS_term × (blocks + metrics)) of optimal quantiles and CV metrics |
optimal_nPLS |
Integer: selected number of PLS components |
splits |
List of length (k*ncv) of train/validation splits |
# example code file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input matrices <- matrixToBlock(data) X.matrix <- matrices$block_predictor Y.matrix <- matrices$matrix_response X.dim <- matrices$block_dim quantile.comb.table <- data$hyperparameters_info$quantile_comb_table quantile.comb.table <- rbind(quantile.comb.table, c(0.1, 0.2)) # Add 2 cases outcome.type <- data$hyperparameters_info$outcome_type asmbPLSDA.cv.kcv(X.matrix, Y.matrix, PLS_term = 1, X.dim,quantile.comb.table,Method = NULL, measure = "B_accuracy", parallel = TRUE, outcome.type = outcome.type, expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100)# example code file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input matrices <- matrixToBlock(data) X.matrix <- matrices$block_predictor Y.matrix <- matrices$matrix_response X.dim <- matrices$block_dim quantile.comb.table <- data$hyperparameters_info$quantile_comb_table quantile.comb.table <- rbind(quantile.comb.table, c(0.1, 0.2)) # Add 2 cases outcome.type <- data$hyperparameters_info$outcome_type asmbPLSDA.cv.kcv(X.matrix, Y.matrix, PLS_term = 1, X.dim,quantile.comb.table,Method = NULL, measure = "B_accuracy", parallel = TRUE, outcome.type = outcome.type, expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100)
Leave-one-out Cross-validation
asmbPLSDA.cv.loo( X.matrix, Y.matrix, PLS_term = 1, X.dim, quantile.comb.table, outcome.type = c("binary", "multiclass"), Method = NULL, measure = "B_accuracy", parallel = FALSE, expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100 )asmbPLSDA.cv.loo( X.matrix, Y.matrix, PLS_term = 1, X.dim, quantile.comb.table, outcome.type = c("binary", "multiclass"), Method = NULL, measure = "B_accuracy", parallel = FALSE, expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100 )
X.matrix |
Predictor block matrix from |
Y.matrix |
Response matrix from |
PLS_term |
An integer with the number of PLS components to use passed from hyperparameter list |
X.dim |
A list with the observed gene set size for each cell type
from |
quantile.comb.table |
A matrix with the quantile comb table passed from hyperparameters list object |
outcome.type |
A character indicating |
Method |
A parameter passed from |
measure |
A parameter passed from |
parallel |
A parameter passed from |
expected.measure.increase |
A parameter passed from |
center |
A parameter passed from |
scale |
A parameter passed from |
maxiter |
A parameter passed from |
A list containing the optimal quantiles for each PLS component and the optimal number of PLS components.
file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input matrices <- matrixToBlock(data) X.matrix <- matrices$block_predictor Y.matrix <- matrices$matrix_response X.dim <- matrices$block_dim quantile.comb.table <- data$hyperparameters_info$quantile_comb_table outcome.type <- data$hyperparameters_info$outcome_type asmbPLSDA.cv.loo(X.matrix, Y.matrix, PLS_term = 1, X.dim,quantile.comb.table, Method = NULL, measure = "B_accuracy", parallel = TRUE, outcome.type = outcome.type, expected.measure.increase = 0.005, center = TRUE, scale = TRUE,maxiter = 100)file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input matrices <- matrixToBlock(data) X.matrix <- matrices$block_predictor Y.matrix <- matrices$matrix_response X.dim <- matrices$block_dim quantile.comb.table <- data$hyperparameters_info$quantile_comb_table outcome.type <- data$hyperparameters_info$outcome_type asmbPLSDA.cv.loo(X.matrix, Y.matrix, PLS_term = 1, X.dim,quantile.comb.table, Method = NULL, measure = "B_accuracy", parallel = TRUE, outcome.type = outcome.type, expected.measure.increase = 0.005, center = TRUE, scale = TRUE,maxiter = 100)
Maps the organism information in mapping organism list and superpathway fit model list to obtain the "singIST treated samples" with the simulated human. The biological link function involves the cell type mapping, orthology mapping and fold change computation.
biological_link_function( object, model_object, object_gene_identifiers = "external_gene_name", model_species = "hsapiens", FC_list = NULL, ... )biological_link_function( object, model_object, object_gene_identifiers = "external_gene_name", model_species = "hsapiens", FC_list = NULL, ... )
object |
A mapping organism list with the disease model data |
model_object |
A superpathway fit model list with the fitted model |
object_gene_identifiers |
Annotation of gene identifiers used in
|
model_species |
Organism for which |
FC_list |
Optional parameter with list of matrices containing Fold
Changes provided by the user. If such list is provided, logFC will not be
computed via diff_expressed and the list input will be used instead.
The list length must match the number and order of human cell types modelled.
Each element of the list must be a |
... |
Other parameters to pass onto diff_expressed |
A list with; ortholog gene sets as returned by orthology_mapping; a list with the Fold Changes used; singIST treated samples as returned by singIST_treat
file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data_model <- example_superpathway_fit_model biological_link_function(data_organism, data_model)file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data_model <- example_superpathway_fit_model biological_link_function(data_organism, data_model)
For a given mapping organism list it updates the variable
celltype_cluster so that each element of it is updated accordingly
to the mapped cell types as indicated in object$celltype_mapping.
celltype_mapping(object)celltype_mapping(object)
object |
A mapping organism list |
A mapping organism list with the object$counts
slot updated, for the variable celltype_cluster with the cell types
according to the mapping defined in object$celltype_mapping.
file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data <- example_mapping_organism new_object <- celltype_mapping(data) new_object$counts$celltype_clusterfile <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data <- example_mapping_organism new_object <- celltype_mapping(data) new_object$counts$celltype_cluster
Derive cell type recapitulation
celltype_recap(model_object, data_original, data_singIST)celltype_recap(model_object, data_original, data_singIST)
model_object |
A superpathway fit model list passed from singISTrecapitulations |
data_original |
A matrix with the cell type contributions as returned by derive_contributions for the non-singIST treated samples, passed from singISTrecapitulations |
data_singIST |
A matrix with the cell type contributions as returned by derive_contributions for the singIST treated samples, passed from singISTrecapitulations |
A data.frame object with the variables: pathway name, celltype with
the cell type name, recapitulation with the cell type recapitulation, and
reference with the cell type reference recapitulation
file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism singIST_samples <- biological_link_function(mapped, model)$singIST_samples original <- derive_contributions(model, singIST_samples) derived <- derive_contributions(model, model$model_fit$predictor_block) celltype_recap(model, original$celltype_contribution, derived$celltype_contribution)file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism singIST_samples <- biological_link_function(mapped, model)$singIST_samples original <- derive_contributions(model, singIST_samples) derived <- derive_contributions(model, model$model_fit$predictor_block) celltype_recap(model, original$celltype_contribution, derived$celltype_contribution)
Checks that all provided fields for a fitted asmbPLS-DA model meet the expected properties:
superpathway_input must be a valid superpathway_input object (validated with check_superpathway_input()).
hyperparameters_fit must be valid hyperparameters (validated with check_hyperparameters()).
model_fit and model_validation must be lists.
check_fit_model( superpathway_input, hyperparameters_fit, model_fit, model_validation )check_fit_model( superpathway_input, hyperparameters_fit, model_fit, model_validation )
superpathway_input |
List. A superpathway object created by create_superpathway(). |
hyperparameters_fit |
List. A hyperparameters object created by create_hyperparameters(). |
model_fit |
List. Fitted model details. |
model_validation |
List. Validation metrics of the fitted model. |
TRUE if all checks pass; otherwise, an error is thrown.
Checks that all provided hyperparameters meet the expected properties:
quantile_comb_table must be a matrix with at least one row.
outcome_type must be "binary" or "multiclass".
number_PLS must be an integer >= 0.
folds_CV and repetition_CV must be integers >= 0 (or NULL).
check_hyperparameters( quantile_comb_table, outcome_type, number_PLS, folds_CV, repetition_CV )check_hyperparameters( quantile_comb_table, outcome_type, number_PLS, folds_CV, repetition_CV )
quantile_comb_table |
Matrix. Quantile (lambda) sparsity values for CV. |
outcome_type |
Character. Either "binary" or "multiclass". |
number_PLS |
Integer. Maximum number of PLS components. |
folds_CV |
Integer or NULL. Number of folds for CV (default 5). |
repetition_CV |
Integer or NULL. Number of repetitions for CV (default 10). |
TRUE if all checks pass; otherwise, an error is thrown.
quantile_comb_table <- base::as.matrix(RcppAlgos::permuteGeneral(seq(0.05, 0.95, by = 0.50))) check_hyperparameters(quantile_comb_table, "binary", 3L, 1L, 1L)quantile_comb_table <- base::as.matrix(RcppAlgos::permuteGeneral(seq(0.05, 0.95, by = 0.50))) check_hyperparameters(quantile_comb_table, "binary", 3L, 1L, 1L)
Validates that all provided fields for a mapping organism meet expected properties:
organism, target_class, base_class are scalar characters.
celltype_mapping is a list with non-empty names; each entry is a character vector (length 0 allowed).
counts is a Seurat or SingleCellExperiment object with metadata columns: 'class' and 'celltype_cluster'.
target_class and base_class exist in meta$class and are different (after normalization).
all clusters referenced in celltype_mapping exist in meta$celltype_cluster.
check_mapping_organism( organism, target_class, base_class, celltype_mapping, counts )check_mapping_organism( organism, target_class, base_class, celltype_mapping, counts )
organism |
Character(1). Scientific Latin name of the organism. |
target_class |
Character(1). Name of the target class for this organism. |
base_class |
Character(1). Name of the base class for this organism. |
celltype_mapping |
List. Mapping of cell types (keys) to cluster names (character vectors). |
counts |
Seurat or SingleCellExperiment. Contains scRNA-seq counts and metadata. |
TRUE if all checks pass; otherwise an informative error is thrown.
# Seurat example counts <- SeuratObject::pbmc_small colnames(slot(counts, "meta.data"))[1] <- "donor" colnames(slot(counts, "meta.data"))[6] <- "class" colnames(slot(counts, "meta.data"))[7] <- "celltype_cluster" celltype_mapping <- list("T-cell" = c("T"), "Dendritic Cell" = character(0)) check_mapping_organism("Mus musculus", "g1", "g2", celltype_mapping, counts)# Seurat example counts <- SeuratObject::pbmc_small colnames(slot(counts, "meta.data"))[1] <- "donor" colnames(slot(counts, "meta.data"))[6] <- "class" colnames(slot(counts, "meta.data"))[7] <- "celltype_cluster" celltype_mapping <- list("T-cell" = c("T"), "Dendritic Cell" = character(0)) check_mapping_organism("Mus musculus", "g1", "g2", celltype_mapping, counts)
Checks that all provided fields for a pathway meet the expected properties.
check_pathway(standard_name, dbsource, collection, subcollection)check_pathway(standard_name, dbsource, collection, subcollection)
standard_name |
Character. Pathway standard name from MsigDB. |
dbsource |
Character. Database source (KEGG, PID, REACTOME, BIOCARTA, WIKIPATHWAYS). |
collection |
Character. MsigDB collection (c2 or m2). |
subcollection |
Character. MsigDB subcollection (CP). |
TRUE if all checks pass; otherwise, an error is thrown.
check_pathway( standard_name = "KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", dbsource = "KEGG", collection = "c2", subcollection = "CP" )check_pathway( standard_name = "KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", dbsource = "KEGG", collection = "c2", subcollection = "CP" )
Checks that all provided fields for a superpathway meet the expected properties:
pathway_info must be a valid pathway (validated with check_pathway()).
celltypes must be a character vector with at least 2 elements.
gene_sets_celltype must be a list (or NULL) with same length as celltypes.
check_superpathway(pathway_info, celltypes, gene_sets_celltype)check_superpathway(pathway_info, celltypes, gene_sets_celltype)
pathway_info |
List. A pathway object created by create_pathway(). |
celltypes |
Character vector. Each element represents a cell type. |
gene_sets_celltype |
List of character vectors. Each element corresponds to gene sets for each cell type. Can be NULL. |
TRUE if all checks pass; otherwise, an error is thrown.
my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") check_superpathway(my_pathway, c("T-cell", "Dendritic Cell"), list(c("IL4", "IL5"), c("IL13")))my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") check_superpathway(my_pathway, c("T-cell", "Dendritic Cell"), list(c("IL4", "IL5"), c("IL13")))
Checks the validity of the inputs. This version assumes that
superpathway_info and hyperparameters_info are plain lists
validated by check_superpathway() and check_hyperparameters().
check_superpathway_input( superpathway_info, hyperparameters_info, pseudobulk_lognorm, sample_id, sample_class, base_class, target_class )check_superpathway_input( superpathway_info, hyperparameters_info, pseudobulk_lognorm, sample_id, sample_class, base_class, target_class )
superpathway_info |
A list representing a superpathway object. |
hyperparameters_info |
A list representing a hyperparameters object. |
pseudobulk_lognorm |
A pseudobulk matrix (rows: "Celltype_Sampleid", cols: genes in "HGNC" format or similar). |
sample_id |
A character vector of sample ids. |
sample_class |
A character vector with the class of each sample. |
base_class |
A character scalar indicating the base class. |
target_class |
A character scalar indicating the target class. |
Invisibly returns TRUE if all checks pass; otherwise errors.
# ---- Superpathway info (list) ---- my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") celltypes <- c("T-cell", "Dendritic Cell") my_superpathway <- create_superpathway(my_pathway, celltypes, list(c("IL4", "IL5"), c("IL13"))) # ---- Hyperparameters info (list) ---- quantile_comb_table <- base::as.matrix( RcppAlgos::permuteGeneral(seq(0.05, 0.95, by = 0.50)), ncol = length(celltypes) ) my_hyperparameters <- create_hyperparameters( quantile_comb_table = quantile_comb_table, outcome_type = "binary", number_PLS = as.integer(3), folds_CV = as.integer(1), repetition_CV = as.integer(1) ) # ---- Pseudobulk + labels ---- sample_id <- c("AD1", "AD2", "HC1", "HC2") sample_class <- c("AD", "AD", "HC", "HC") base_class <- "HC" target_class <- "AD" pseudobulk_lognorm <- matrix( rnorm(length(celltypes) * length(sample_id)), nrow = length(celltypes) * length(sample_id), ncol = length(celltypes) ) rownames(pseudobulk_lognorm) <- as.vector(t(outer( celltypes, sample_id, function(x, y) paste(x, y, sep = "_") ))) check_superpathway_input( superpathway_info = my_superpathway, hyperparameters_info = my_hyperparameters, pseudobulk_lognorm = pseudobulk_lognorm, sample_id = sample_id, sample_class = sample_class, base_class = base_class, target_class = target_class )# ---- Superpathway info (list) ---- my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") celltypes <- c("T-cell", "Dendritic Cell") my_superpathway <- create_superpathway(my_pathway, celltypes, list(c("IL4", "IL5"), c("IL13"))) # ---- Hyperparameters info (list) ---- quantile_comb_table <- base::as.matrix( RcppAlgos::permuteGeneral(seq(0.05, 0.95, by = 0.50)), ncol = length(celltypes) ) my_hyperparameters <- create_hyperparameters( quantile_comb_table = quantile_comb_table, outcome_type = "binary", number_PLS = as.integer(3), folds_CV = as.integer(1), repetition_CV = as.integer(1) ) # ---- Pseudobulk + labels ---- sample_id <- c("AD1", "AD2", "HC1", "HC2") sample_class <- c("AD", "AD", "HC", "HC") base_class <- "HC" target_class <- "AD" pseudobulk_lognorm <- matrix( rnorm(length(celltypes) * length(sample_id)), nrow = length(celltypes) * length(sample_id), ncol = length(celltypes) ) rownames(pseudobulk_lognorm) <- as.vector(t(outer( celltypes, sample_id, function(x, y) paste(x, y, sep = "_") ))) check_superpathway_input( superpathway_info = my_superpathway, hyperparameters_info = my_hyperparameters, pseudobulk_lognorm = pseudobulk_lognorm, sample_id = sample_id, sample_class = sample_class, base_class = base_class, target_class = target_class )
Computes CIP and GIP metrics from a superpathway fit model list for the target class
CIP_GIP(object)CIP_GIP(object)
object |
A superpathway fit model list |
A list with the CIP and GIP metrics for all cell types. The metrics are computed for the target class.
file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data <- example_superpathway_fit_model CIP_GIP(data)file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data <- example_superpathway_fit_model CIP_GIP(data)
Computes Cell and Gene Importance Projection observed distribution from fitted asmbPLSDA, and its associated null distributions by permuting the block of predictor matrices. Returns a pvalue of the Mann-Whitney Wilcoxon between the observed and null distribution for each CIP and GIP.
CIP_GIP_test( object, npermut = 100, maxiter = 100, type = c("jackknife", "subsampling"), nsubsampling = 100, ... )CIP_GIP_test( object, npermut = 100, maxiter = 100, type = c("jackknife", "subsampling"), nsubsampling = 100, ... )
object |
A superpathway fit model list |
npermut |
Number of permutations on response block matrices |
maxiter |
An integer indicating the maximum number of iterations.
If |
type |
Either |
nsubsampling |
Number of subsamples to generate CIP and GIP observed distributions. By default 100. |
... |
Other parameters to be passed onto wilcox_CIP_GIP |
A list containing; observed distributions of CIP and GIP (variability_param); its associated null distributions generated by permutations (NULL_CIP_GIP); the unadjusted pvalue of Mann-Whitney Wilcoxon for CIP distribution (CIP_pvalue); and for GIP distribution (GIP_pvalue).
file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data <- example_superpathway_fit_model CIP_GIP_test(data, npermut = 3, type = "jackknife")file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data <- example_superpathway_fit_model CIP_GIP_test(data, npermut = 3, type = "jackknife")
Converts non‐finite values to NA, removes any column that is 100% NA or has zero variance, and returns a logical mask of kept columns.
clean_mfa_data(X)clean_mfa_data(X)
X |
A numeric matrix or data.frame of predictors (samples × features). |
A list with components:
X_cleanCleaned matrix (only kept columns).
keep_colsLogical vector, TRUE for columns kept.
removedInteger indices of removed columns.
clean_mfa_data(matrix(matrix(c(0,0,0,NA, 1,2), ncol = 2, nrow = 3)))clean_mfa_data(matrix(matrix(c(0,0,0,NA, 1,2), ncol = 2, nrow = 3)))
Creates a simple list representing a fitted asmbPLS-DA model, after validating its components.
create_fit_model( superpathway_input, hyperparameters_fit, model_fit, model_validation )create_fit_model( superpathway_input, hyperparameters_fit, model_fit, model_validation )
superpathway_input |
List. A superpathway object created by create_superpathway(). |
hyperparameters_fit |
List. A hyperparameters object created by create_hyperparameters(). |
model_fit |
List. Fitted model details. |
model_validation |
List. Validation metrics of the fitted model. |
A list with elements: superpathway_input, hyperparameters_fit, model_fit, model_validation.
my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") celltypes <- c("T-cell", "Dendritic Cell") my_superpathway <- create_superpathway(my_pathway, celltypes, list(c("IL4", "IL5"), c("IL13"))) my_hyperparameters <- create_hyperparameters(matrix(1:4, nrow = 2), "binary", 3L, 5L, 10L) #' # ---- Pseudobulk + labels ---- sample_id <- c("AD1", "AD2", "HC1", "HC2") sample_class <- c("AD", "AD", "HC", "HC") base_class <- "HC" target_class <- "AD" pseudobulk_lognorm <- matrix( rnorm(length(celltypes) * length(sample_id)), nrow = length(celltypes) * length(sample_id), ncol = length(celltypes) ) rownames(pseudobulk_lognorm) <- as.vector(t(outer( celltypes, sample_id, function(x, y) paste(x, y, sep = "_") ))) my_superpathway_input <- create_superpathway_input( superpathway_info = my_superpathway, hyperparameters_info = my_hyperparameters, pseudobulk_lognorm = pseudobulk_lognorm, sample_id = sample_id, sample_class = sample_class, base_class = base_class, target_class = target_class ) my_fit <- create_fit_model(my_superpathway_input, my_hyperparameters, list(model = "fit"), list(accuracy = 0.95))my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") celltypes <- c("T-cell", "Dendritic Cell") my_superpathway <- create_superpathway(my_pathway, celltypes, list(c("IL4", "IL5"), c("IL13"))) my_hyperparameters <- create_hyperparameters(matrix(1:4, nrow = 2), "binary", 3L, 5L, 10L) #' # ---- Pseudobulk + labels ---- sample_id <- c("AD1", "AD2", "HC1", "HC2") sample_class <- c("AD", "AD", "HC", "HC") base_class <- "HC" target_class <- "AD" pseudobulk_lognorm <- matrix( rnorm(length(celltypes) * length(sample_id)), nrow = length(celltypes) * length(sample_id), ncol = length(celltypes) ) rownames(pseudobulk_lognorm) <- as.vector(t(outer( celltypes, sample_id, function(x, y) paste(x, y, sep = "_") ))) my_superpathway_input <- create_superpathway_input( superpathway_info = my_superpathway, hyperparameters_info = my_hyperparameters, pseudobulk_lognorm = pseudobulk_lognorm, sample_id = sample_id, sample_class = sample_class, base_class = base_class, target_class = target_class ) my_fit <- create_fit_model(my_superpathway_input, my_hyperparameters, list(model = "fit"), list(accuracy = 0.95))
Creates a simple list representing hyperparameters for asmbPLS-DA, after validating them.
create_hyperparameters( quantile_comb_table, outcome_type, number_PLS, folds_CV = 5L, repetition_CV = 10L )create_hyperparameters( quantile_comb_table, outcome_type, number_PLS, folds_CV = 5L, repetition_CV = 10L )
quantile_comb_table |
Matrix. Quantile (lambda) sparsity values for CV. |
outcome_type |
Character. Either "binary" or "multiclass". |
number_PLS |
Integer. Maximum number of PLS components. |
folds_CV |
Integer or NULL. Number of folds for CV (default 5). |
repetition_CV |
Integer or NULL. Number of repetitions for CV (default 10). |
A list with elements: quantile_comb_table, outcome_type, number_PLS, folds_CV, repetition_CV.
quantile_comb_table <- base::as.matrix(RcppAlgos::permuteGeneral(seq(0.05, 0.95, by = 0.50))) my_hyperparameters <- create_hyperparameters(quantile_comb_table,"binary",3L, 1L, 1L) print(my_hyperparameters)quantile_comb_table <- base::as.matrix(RcppAlgos::permuteGeneral(seq(0.05, 0.95, by = 0.50))) my_hyperparameters <- create_hyperparameters(quantile_comb_table,"binary",3L, 1L, 1L) print(my_hyperparameters)
Creates a simple S3 object (list with class "mapping.organism") after validating its components.
create_mapping_organism( organism, target_class, base_class, celltype_mapping, counts )create_mapping_organism( organism, target_class, base_class, celltype_mapping, counts )
organism |
Character(1). Scientific Latin name of the organism. |
target_class |
Character(1). Name of the target class for this organism. |
base_class |
Character(1). Name of the base class for this organism. |
celltype_mapping |
List. Mapping of cell types to clusters (character vectors). |
counts |
|
A list of class "mapping.organism" with elements:
organism
target_class
base_class
celltype_mapping
counts
counts <- SeuratObject::pbmc_small colnames(slot(counts, "meta.data"))[1] <- "donor" colnames(slot(counts, "meta.data"))[6] <- "class" colnames(slot(counts, "meta.data"))[7] <- "celltype_cluster" celltype_mapping <- list("T-cell" = c("T"), "Dendritic Cell" = character(0)) obj <- create_mapping_organism("Mus musculus", "g1", "g2", celltype_mapping, counts)counts <- SeuratObject::pbmc_small colnames(slot(counts, "meta.data"))[1] <- "donor" colnames(slot(counts, "meta.data"))[6] <- "class" colnames(slot(counts, "meta.data"))[7] <- "celltype_cluster" celltype_mapping <- list("T-cell" = c("T"), "Dendritic Cell" = character(0)) obj <- create_mapping_organism("Mus musculus", "g1", "g2", celltype_mapping, counts)
Creates a simple list representing a pathway, after validating its fields.
create_pathway(standard_name, dbsource, collection, subcollection)create_pathway(standard_name, dbsource, collection, subcollection)
standard_name |
Character. Pathway standard name from MsigDB. |
dbsource |
Character. Database source (KEGG, PID, REACTOME, BIOCARTA, WIKIPATHWAYS). |
collection |
Character. MsigDB collection (c2 or m2). |
subcollection |
Character. MsigDB subcollection (CP). |
A list with elements: standard_name, dbsource, collection, subcollection.
my_pathway <- create_pathway( standard_name = "KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", dbsource = "KEGG", collection = "c2", subcollection = "CP" ) print(my_pathway)my_pathway <- create_pathway( standard_name = "KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", dbsource = "KEGG", collection = "c2", subcollection = "CP" ) print(my_pathway)
Creates a simple list representing a superpathway, after validating its fields.
create_superpathway(pathway_info, celltypes, gene_sets_celltype)create_superpathway(pathway_info, celltypes, gene_sets_celltype)
pathway_info |
List. A pathway object created by create_pathway(). |
celltypes |
Character vector. Each element represents a cell type. |
gene_sets_celltype |
List of character vectors. Each element corresponds to gene sets for each cell type. Can be NULL. |
A list with elements: pathway_info, celltypes, gene_sets_celltype.
my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") my_superpathway <- create_superpathway(my_pathway, c("T-cell", "Dendritic Cell"), list(c("IL4", "IL5"), c("IL13"))) print(my_superpathway)my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") my_superpathway <- create_superpathway(my_pathway, c("T-cell", "Dendritic Cell"), list(c("IL4", "IL5"), c("IL13"))) print(my_superpathway)
Creates a plain list containing the fields for superpathway input object
and validates the content with check_superpathway_input.
create_superpathway_input( superpathway_info, hyperparameters_info, pseudobulk_lognorm, sample_id, sample_class, base_class, target_class )create_superpathway_input( superpathway_info, hyperparameters_info, pseudobulk_lognorm, sample_id, sample_class, base_class, target_class )
superpathway_info |
A list representing a superpathway object (formerly
|
hyperparameters_info |
A list representing a hyperparameters object
(formerly |
pseudobulk_lognorm |
A pseudobulk matrix. |
sample_id |
A character vector of sample ids. |
sample_class |
A character vector with the class of each sample. |
base_class |
A character scalar indicating the base class. |
target_class |
A character scalar indicating the target class. |
A list with elements:
superpathway_info
hyperparameters_info
pseudobulk_lognorm
sample_id
sample_class
base_class
target_class
# ---- Superpathway info (list) ---- my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") celltypes <- c("T-cell", "Dendritic Cell") my_superpathway <- create_superpathway(my_pathway, celltypes, list(c("IL4", "IL5"), c("IL13"))) # ---- Hyperparameters info (list) ---- quantile_comb_table <- base::as.matrix( RcppAlgos::permuteGeneral(seq(0.05, 0.95, by = 0.50)), ncol = length(celltypes) ) my_hyperparameters <- create_hyperparameters( quantile_comb_table = quantile_comb_table, outcome_type = "binary", number_PLS = as.integer(3), folds_CV = as.integer(1), repetition_CV = as.integer(1) ) # ---- Pseudobulk + labels ---- sample_id <- c("AD1", "AD2", "HC1", "HC2") sample_class <- c("AD", "AD", "HC", "HC") base_class <- "HC" target_class <- "AD" pseudobulk_lognorm <- matrix( rnorm(length(celltypes) * length(sample_id)), nrow = length(celltypes) * length(sample_id), ncol = length(celltypes) ) rownames(pseudobulk_lognorm) <- as.vector(t(outer( celltypes, sample_id, function(x, y) paste(x, y, sep = "_") ))) my_superpathway_input <- create_superpathway_input( superpathway_info = my_superpathway, hyperparameters_info = my_hyperparameters, pseudobulk_lognorm = pseudobulk_lognorm, sample_id = sample_id, sample_class = sample_class, base_class = base_class, target_class = target_class )# ---- Superpathway info (list) ---- my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") celltypes <- c("T-cell", "Dendritic Cell") my_superpathway <- create_superpathway(my_pathway, celltypes, list(c("IL4", "IL5"), c("IL13"))) # ---- Hyperparameters info (list) ---- quantile_comb_table <- base::as.matrix( RcppAlgos::permuteGeneral(seq(0.05, 0.95, by = 0.50)), ncol = length(celltypes) ) my_hyperparameters <- create_hyperparameters( quantile_comb_table = quantile_comb_table, outcome_type = "binary", number_PLS = as.integer(3), folds_CV = as.integer(1), repetition_CV = as.integer(1) ) # ---- Pseudobulk + labels ---- sample_id <- c("AD1", "AD2", "HC1", "HC2") sample_class <- c("AD", "AD", "HC", "HC") base_class <- "HC" target_class <- "AD" pseudobulk_lognorm <- matrix( rnorm(length(celltypes) * length(sample_id)), nrow = length(celltypes) * length(sample_id), ncol = length(celltypes) ) rownames(pseudobulk_lognorm) <- as.vector(t(outer( celltypes, sample_id, function(x, y) paste(x, y, sep = "_") ))) my_superpathway_input <- create_superpathway_input( superpathway_info = my_superpathway, hyperparameters_info = my_hyperparameters, pseudobulk_lognorm = pseudobulk_lognorm, sample_id = sample_id, sample_class = sample_class, base_class = base_class, target_class = target_class )
Computes the superpathway score, its cell type contribution and gene contribution for a block of predictor matrices for its later use to compute recapitulations
derive_contributions(model_object, data)derive_contributions(model_object, data)
model_object |
A superpathway fit model list with the fitted asmbPLSDA |
data |
A matrix with the block of predictor matrices to compute score and contributions from |
A list with the superpathway score for each sample, cell type and gene contributions to the former
file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model singIST_samples <- biological_link_function(mapped, model)$singIST_samples derive_contributions(model, singIST_samples)file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model singIST_samples <- biological_link_function(mapped, model)$singIST_samples derive_contributions(model, singIST_samples)
Computes scores from the predictor to later derive the superpathway's score, cell type contribution and gene contributions, for the target class
derive_scores(object, data, sample)derive_scores(object, data, sample)
object |
A superpathway fit model list passed from derive_contributions |
data |
Block of predictor matrices to compute scores from |
sample |
Current sample from |
A list containing the needed parameters to compute superpathway's score, cell type contributions and needed scores to compute gene contributions
file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model singIST_samples <- biological_link_function(mapped, model)$singIST_samples # Derive the scores for sample 2 derive_scores(model, singIST_samples, 2)file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model singIST_samples <- biological_link_function(mapped, model)$singIST_samples # Derive the scores for sample 2 derive_scores(model, singIST_samples, 2)
Computes differentially expressed genes with Seurat::FindMarkers, for
Seurat objects, or scran::findMarkers, for SingleCellExperiment
objects, for the conditions indicated. Note that Seurat::FindMarkers will
compute Wilcoxon Signed Rank Test by default, while scran::findMarkers will
perform t-test by default instead. The reported logFC values are difference
of means of log-normalized expression values with
Seurat::AggregateExpression or
SingleCellExperiment::aggregateAcrossCells. This logFC is consistent with
the human log2FC computation by asmbPLS-DA.
diff_expressed( object, condition_1 = c(), condition_2 = c(), logfc.treshold = 0.25, assay = "RNA", ... )diff_expressed( object, condition_1 = c(), condition_2 = c(), logfc.treshold = 0.25, assay = "RNA", ... )
object |
A mapping organism list. If a |
condition_1 |
A vector with the elements of the first factor to perform
the hypothesis test. By default the mapped cell types
|
condition_2 |
A vector with the elements of the second factor to perform
the hypothesis test with. By default the class of the organism
|
logfc.treshold |
Sets the minimum FindMarkers log-fold change (logFC)
cutoff for identifying differentially expressed genes (DEGs). By default
|
assay |
Specific assay being used for analysis. By default
|
... |
Other parameters to pass onto |
A list where each element is a data.frame for a cell type containing: p_val
p-value of test, avg_log2FC descriptive point estimate of logFC,
pct.1 percentage of cells where the gene is detected in the base class,
pct.2 percentage of cells where the gene is detected in the target class,
p_val_adj FDR.
# Set the identities file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism data <- celltype_mapping(data_organism) data$counts$test <- paste0(data$counts$celltype_cluster, "_", data$counts$class) SeuratObject::Idents(data$counts) <- "test" diff_expressed(data)# Set the identities file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism data <- celltype_mapping(data_organism) data$counts$test <- paste0(data$counts$celltype_cluster, "_", data$counts$class) SeuratObject::Idents(data$counts) <- "test" diff_expressed(data)
Runs a full EM‐based imputeMFA() on the training set
then fits a pure MFA to extract the final means and loadings.
fit_mfa_imputer(X_train, group, ncp = 2, method = "Regularized")fit_mfa_imputer(X_train, group, ncp = 2, method = "Regularized")
X_train |
Numeric matrix (train samples × features), may contain NAs. |
group |
Integer vector of block sizes (must sum to
|
ncp |
Number of MFA components to use for imputation (default |
method |
Method for |
A list with components:
imputedMatrix X_train with NAs filled.
muNumeric vector of column means (length = ncol).
loadingsNumeric matrix of loadings (ncol × ncp).
fit_mfa_imputer(matrix(c(NA,runif(19)), nrow = 5, ncol = 4), c(2,2))fit_mfa_imputer(matrix(c(NA,runif(19)), nrow = 5, ncol = 4), c(2,2))
Performs Cross Validation of the provided superpathway input, fits the optimal model and computes its validation metrics. The Cross Validation can either be Leave One-Out Cross Validation (LOOCV) or K-Fold Cross Validation (KCV). A LOOCV is performed if the number of folds was set to 1 or if the number of samples per class is less than 3 for any class. A K-Fold Cross Validation (KCV) is performed if the number of folds is greater or equal than 3 and the number of samples per class is always greater than the number of folds. If the number of samples is low for some of the classes LOOCV is recommended. If KCV is performed, missing values are automatically imputed in the K-CV process. The training set is imputed via missMDA::imputeMFA(), and an FactoMineR::MFA() is trained on the imputed training set from which we extract the mean of each gene and the estimated loadings. We then estimate the validation set by projecting the samples onto MFA space of the training set. Gene whose variance is 0 are excluded from the imputation, if a gene has null variance and full of 0 values, the NA were imputed to 0.
fitOptimal( object, parallel = FALSE, measure = "B_accuracy", Method = NULL, expected_measure_increase = 0.005, maxiter = 100, global_significance_full = FALSE, CIP.GIP_significance_full = FALSE, npermut = 100, nbObsPermut = NULL, type = "jackknife", nsubsampling = 100, ... )fitOptimal( object, parallel = FALSE, measure = "B_accuracy", Method = NULL, expected_measure_increase = 0.005, maxiter = 100, global_significance_full = FALSE, CIP.GIP_significance_full = FALSE, npermut = 100, nbObsPermut = NULL, type = "jackknife", nsubsampling = 100, ... )
object |
A superpathway input list to fit optimal asmbPLSDA. |
parallel |
A boolean indicating whether to parallelize ( |
measure |
Accuracy measure to be used to select optimal asmbPLSDA model. Default is F1 measure. Options are: F1, accuracy, B_accuracy, precision and recall. |
Method |
Decision rule used for prediction. For binary outcome
|
expected_measure_increase |
A double indicating the measure you expect
to decrease by percent after including one more PLS component, this will
affect the selection of optimal number of PLS components. If |
maxiter |
An integer indicating the maximum number of iterations.
If |
global_significance_full |
A boolean indicating whether to return a list
with information of each permutation for the global
significance test of asmbPLSDA. By default |
CIP.GIP_significance_full |
A boolean indicating whether to return a
list with the observed and null distributions of CIP and GIP or only the
pvalue and adjusted pvalue. By default |
npermut |
Number of permutations for the tests. By default 100. Parameter passed onto permut_asmbplsda and CIP_GIP_test. |
nbObsPermut |
An integer indicating the number of samples to permute
in each permutation. By default |
type |
Either |
nsubsampling |
Number of subsamples to generate CIP and GIP observed distributions. By default 100. Passed onto CIP_GIP_test. |
... |
Other parameters to be passed onto wilcox_CIP_GIP, wilcox test of GIP statistical tests |
A superpathway fit model list object with; a
superpathway input list object used for CV and model fit;
a hyperparameters list object with the hyperparameters used to fit
the optimal model (includes optimal quantiles and PLS components from the
CV step); a list with the fitted model information including: predictor
and response matrices, observed gene sets, from matrixToBlock,
and asmbPLSDA output; a list with the validaton metrics of fitted model.
# fitOptimal with jackknife for CIP/GIP statistics and 10 permutations # for the global significance test of the optimal model file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input fitOptimal(data, npermut = 10, type = "jackknife") # fitOptimal with subsampling for CIP/GIP statistics with # 10 subsamples and 50 permutations for the global significance test of the # optimal model fitOptimal(data, npermut = 50, type = "subsampling", nsubsampling = 10)# fitOptimal with jackknife for CIP/GIP statistics and 10 permutations # for the global significance test of the optimal model file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input fitOptimal(data, npermut = 10, type = "jackknife") # fitOptimal with subsampling for CIP/GIP statistics with # 10 subsamples and 50 permutations for the global significance test of the # optimal model fitOptimal(data, npermut = 50, type = "subsampling", nsubsampling = 10)
Derive gene contribution to cell type recapitulation
gene_contrib(model_object, data_original, data_singIST, cell_reference)gene_contrib(model_object, data_original, data_singIST, cell_reference)
model_object |
A superpathway fit model list passed from singISTrecapitulations |
data_original |
A matrix with the gene contributions to superpathway's score as returned by derive_contributions for the non-singIST treated samples, passed from singISTrecapitulations |
data_singIST |
A matrix with the gene contributions to superpathway's score as returned by derive_contributions for the singIST treated samples, passed from singISTrecapitulations |
cell_reference |
A matrix with the cell type recapitulations as returned by celltype_recap |
A data.frame object with the variables: pathway name, celltype name,
gene name, contribution gene contribution to cell type recapitulation
file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism singIST_samples <- biological_link_function(mapped, model)$singIST_samples original <- derive_contributions(model, singIST_samples) derived <- derive_contributions(model, model$model_fit$predictor_block) # Derive cell type reference cell <- celltype_recap(model, original$celltype_contribution, derived$celltype_contribution) # Compute gene contributions gene_contrib(model, original$gene_contribution, derived$gene_contribution, cell)file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism singIST_samples <- biological_link_function(mapped, model)$singIST_samples original <- derive_contributions(model, singIST_samples) derived <- derive_contributions(model, model$model_fit$predictor_block) # Derive cell type reference cell <- celltype_recap(model, original$celltype_contribution, derived$celltype_contribution) # Compute gene contributions gene_contrib(model, original$gene_contribution, derived$gene_contribution, cell)
Fill up matrix with the corresponding expression values
get_measure_index() returns the index associated to each performance
measure
Splits the predictor and response matrices into training and validation sets for leave-one-out cross-validation.
Function to train and validate asmbPLSDA excluding one observation parallelized for each quantile combination provided
Computes the prediction accuracy for different quantile combinations by fitting the asmbPLSDA model and making predictions.
Performs leave-one-out cross-validation (LOO-CV) in parallel.
Computes the performance measure selected between the training LOOCV samples and the validation LOOCV samples for all the quantile combination
For an optimal quantile combination and PLS component it computes its performance metrics between the training and validation sets
Selects the optimal number of PLS according to the performance measure
Iterates over all quantiles to generate the fitted asmbPLSDA for each and its associated predicted values
Creates a structured list to store permutation results.
Performs random permutations of the response matrix.
Calculates correlation, percentage change, and RV coefficient.
Selects sample indices for training and validation.
Fits the asmbPLS-DA model using permuted data.
Computes the p-value for the observed CV error against the null distribution of errors generated from permutation testing.
Calculates the 95% confidence interval for the null distribution of permutation errors.
Perform the jackknife resampling procedure for CIP/GIP calculations.
Perform the subsampling procedure for CIP/GIP calculations.
Generate null distributions of CIP and GIP using permutations.
Permute the X matrix to generate a null distribution.
Compute p-values by applying the Mann-Whitney test.
This helper function performs either Leave-One-Out Cross Validation (LOOCV) or K-Fold Cross Validation (KCV) on the given dataset and returns the optimal hyperparameters based on the specified accuracy measure.
This helper function computes various validation metrics, including global significance, CIP/GIP significance, and adjusted p-values for the fitted model based on cross-validation results.
For a given gene set it identifies the annotation of the genes, it does so if the genes have more than 50% match with a given annotation. Annotation must be either Ensembl, Entrez or Gene Symbols.
Retrieves one to one orthologs between from_species and to_species of orthology_mapping
Applies the biological link function conditions onto a predictor block matrix. The resulting gene expression of the predictor block are the cases defined in the biological link function.
Centers and scales each column of the predictor block matrices. The
centering and scaling is according to the centroid and variance estimated in
fit_asmb.
Given a block and the dimensions of all blocks it returns the indices of the genes belonging to that block within the predictor block matrix
Performs loading deflation for a given predictor block and PLS component
update_block( celltype, observed_gene_sets, block_predictor = block_predictor, matrix = matrix ) get_measure_index(measure) get_train_val_sets(X.matrix, Y.matrix, validation_index) quantile_computation( j, ..., results_CV_summary_n, F_matrix_validation_bind, X.matrix, Y.matrix, PLS_term = 1, X.dim, quantile.comb.table, outcome.type = c("binary", "multiclass"), quantile_table_CV, K, n_quantile_comb, Method = NULL, measure = "B_accuracy", expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100 ) evaluate_quantile_combinations( j, results_CV_summary_n, F_matrix_validation_bind, E_matrix_training, F_matrix_training, E_matrix_validation, F_matrix_validation, quantile_table_CV, i, X.dim, quantile.comb.table, outcome.type, center, scale, maxiter, Method ) execute_parallel_cv( K, results_CV_summary_n, F_matrix_validation_bind, X.matrix, Y.matrix, PLS_term, X.dim, quantile.comb.table, outcome.type, quantile_table_CV, Method, measure, expected.measure.increase, center, scale, maxiter, BPPARAM = BiocParallel::bpparam() ) performance_measures( n_quantile_comb, results_CV_summary_n, F_matrix_validation_bind, outcome.type, measure_selected ) compute_final_measures( K, X.matrix, Y.matrix, i, X.dim, quantile_table_CV, outcome.type, center, scale, maxiter, Method ) select_optimal_PLS( PLS_term, quantile_table_CV, X.dim, measure_selected, expected.measure.increase ) execute_sequential_cv( K, n_quantile_comb, results_CV_summary_n, F_matrix_validation_bind, X.matrix, Y.matrix, PLS_term, X.dim, quantile.comb.table, outcome.type, quantile_table_CV, measure, expected.measure.increase, center, scale, maxiter, Method ) initialize_results(npermut, q) permute_Y_matrix(Y.matrix, nr, nbObsPermut, j) compute_permutation_stats(res, Y.matrix, Ypermut, j, q, nr) select_samples(object, nr, Nc) fit_permuted_model(object, X_train, Y_train, maxiter) evaluate_performance( res, Modelpermut, X_train, X_val, Y.matrix, s, measure, j, nr, Method, object ) compute_pvalue(null_errors, CV_error) compute_IC95(m) jackknife_CIP_GIP(object, X.matrix, Y.matrix, K, maxiter, X.dim) subsampling_CIP_GIP( object, X.matrix, Y.matrix, K, M, nsubsampling, maxiter, X.dim ) generate_null_distributions( object, X.matrix, Y.matrix, npermut, K, X.dim, maxiter ) permute_X_matrix(X.matrix, K, X.dim) calculate_pvalues(variability, null_dist, test_func, ...) perform_cv( object, model_block_matrices, nFC, measure, parallel, expected_measure_increase, maxiter, Method ) compute_validation_metrics( output, optimal_hyperparameters, model_block_matrices, npermut, nbObsPermut, maxiter, global_significance_full, CIP.GIP_significance_full, type, nsubsampling, measure, Method ) detect_gene_type(gene_set, mart) retrieve_one2one_orthologs( annotation, gene_set, mart, from_species, to_species ) FCtoExpression(model_object, b, samples, predictor_block, FC) center_scale(data, fit_asmb) get_indices(j, X.dim) deflate_prediction(data, PLS, delta_cbind, fit_asmb)update_block( celltype, observed_gene_sets, block_predictor = block_predictor, matrix = matrix ) get_measure_index(measure) get_train_val_sets(X.matrix, Y.matrix, validation_index) quantile_computation( j, ..., results_CV_summary_n, F_matrix_validation_bind, X.matrix, Y.matrix, PLS_term = 1, X.dim, quantile.comb.table, outcome.type = c("binary", "multiclass"), quantile_table_CV, K, n_quantile_comb, Method = NULL, measure = "B_accuracy", expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100 ) evaluate_quantile_combinations( j, results_CV_summary_n, F_matrix_validation_bind, E_matrix_training, F_matrix_training, E_matrix_validation, F_matrix_validation, quantile_table_CV, i, X.dim, quantile.comb.table, outcome.type, center, scale, maxiter, Method ) execute_parallel_cv( K, results_CV_summary_n, F_matrix_validation_bind, X.matrix, Y.matrix, PLS_term, X.dim, quantile.comb.table, outcome.type, quantile_table_CV, Method, measure, expected.measure.increase, center, scale, maxiter, BPPARAM = BiocParallel::bpparam() ) performance_measures( n_quantile_comb, results_CV_summary_n, F_matrix_validation_bind, outcome.type, measure_selected ) compute_final_measures( K, X.matrix, Y.matrix, i, X.dim, quantile_table_CV, outcome.type, center, scale, maxiter, Method ) select_optimal_PLS( PLS_term, quantile_table_CV, X.dim, measure_selected, expected.measure.increase ) execute_sequential_cv( K, n_quantile_comb, results_CV_summary_n, F_matrix_validation_bind, X.matrix, Y.matrix, PLS_term, X.dim, quantile.comb.table, outcome.type, quantile_table_CV, measure, expected.measure.increase, center, scale, maxiter, Method ) initialize_results(npermut, q) permute_Y_matrix(Y.matrix, nr, nbObsPermut, j) compute_permutation_stats(res, Y.matrix, Ypermut, j, q, nr) select_samples(object, nr, Nc) fit_permuted_model(object, X_train, Y_train, maxiter) evaluate_performance( res, Modelpermut, X_train, X_val, Y.matrix, s, measure, j, nr, Method, object ) compute_pvalue(null_errors, CV_error) compute_IC95(m) jackknife_CIP_GIP(object, X.matrix, Y.matrix, K, maxiter, X.dim) subsampling_CIP_GIP( object, X.matrix, Y.matrix, K, M, nsubsampling, maxiter, X.dim ) generate_null_distributions( object, X.matrix, Y.matrix, npermut, K, X.dim, maxiter ) permute_X_matrix(X.matrix, K, X.dim) calculate_pvalues(variability, null_dist, test_func, ...) perform_cv( object, model_block_matrices, nFC, measure, parallel, expected_measure_increase, maxiter, Method ) compute_validation_metrics( output, optimal_hyperparameters, model_block_matrices, npermut, nbObsPermut, maxiter, global_significance_full, CIP.GIP_significance_full, type, nsubsampling, measure, Method ) detect_gene_type(gene_set, mart) retrieve_one2one_orthologs( annotation, gene_set, mart, from_species, to_species ) FCtoExpression(model_object, b, samples, predictor_block, FC) center_scale(data, fit_asmb) get_indices(j, X.dim) deflate_prediction(data, PLS, delta_cbind, fit_asmb)
celltype |
Cell types modelled |
observed_gene_sets |
Gene sets observed from your dataset |
block_predictor |
Block of predictor matrices to update |
matrix |
To iteratively update with block_predictor values |
measure |
The accuracy measure used for validation. Default is "F1". |
X.matrix |
Predictor matrix. |
Y.matrix |
Response matrix. |
validation_index |
Index of the validation sample. |
j |
Block to return indices for |
... |
Other parameters of |
results_CV_summary_n |
Passed from asmbPLSDA.cv.loo |
F_matrix_validation_bind |
Passed from asmbPLSDA.cv.loo |
PLS_term |
Passed from asmbPLSDA.cv.loo |
X.dim |
Vector with number of genes of each block |
quantile.comb.table |
Passed from asmbPLSDA.cv.loo |
outcome.type |
Passed from asmbPLSDA.cv.loo |
quantile_table_CV |
Passed from asmbPLSDA.cv.loo |
K |
Number of samples. |
n_quantile_comb |
Passed from asmbPLSDA.cv.loo |
Method |
The decision rule for prediction (e.g., "fixed_cutoff", "Euclidean_distance_X", etc.). |
expected.measure.increase |
Passed from asmbPLSDA.cv.loo |
center |
Passed from asmbPLSDA.cv.loo |
scale |
Passed from asmbPLSDA.cv.loo |
maxiter |
The maximum number of iterations for validation tests. Default is 100. |
E_matrix_training |
Training predictor matrix. |
F_matrix_training |
Training response matrix. |
E_matrix_validation |
Validation predictor matrix. |
F_matrix_validation |
Validation response matrix |
i |
Passed from asmbPLSDA.cv.loo |
BPPARAM |
A |
measure_selected |
Passed from asmbPLSDA.cv.loo |
npermut |
The number of permutations for significance testing. |
q |
Number of classes. |
nr |
Number of samples |
nbObsPermut |
The number of samples to permute in each permutation.
Default is |
res |
List of results to store statistics |
Ypermut |
Permuted response matrix. |
object |
A superpathway input list containing the data to be used for the cross-validation. |
Nc |
Number of samples to drop at each permutation. |
X_train |
Training predictor blocks |
Y_train |
Training response matrix. |
Modelpermut |
Permuted asmbPLSDA model |
X_val |
Validation predictor blocks |
s |
Validation samples |
null_errors |
A vector of errors from the null distribution (permuted errors). |
CV_error |
The observed cross-validation error. |
m |
A vector of errors from the null distribution (permuted errors). |
M |
Number of classes. |
nsubsampling |
The number of subsamples for CIP/GIP testing. Default is 100. |
variability |
A list of CIP or GIP values for observed distributions. |
null_dist |
A list of CIP or GIP values for null distributions. |
test_func |
The test function to use (typically Wilcoxon). |
model_block_matrices |
A list containing the model block matrices (predictor and response matrices). |
nFC |
The number of folds for K-fold cross-validation. If |
parallel |
A logical value indicating whether parallel computation should be used. |
expected_measure_increase |
Expected decrease in measure per additional PLS component. Default is 0.005. |
output |
The superpathway fit model list that contains the fitted model and validation information. |
optimal_hyperparameters |
The optimal hyperparameters obtained from cross-validation. |
global_significance_full |
Boolean flag indicating whether to return full global significance results. |
CIP.GIP_significance_full |
Boolean flag indicating whether to return full CIP/GIP significance results. |
type |
The procedure type for generating CIP/GIP distributions. Can be "jackknife" or "subsampling". |
gene_set |
A parameter passed from orthology_mapping |
mart |
A parameter passed from orthology_mapping |
annotation |
A parameter passed from orthology_mapping, it indicates the annotation of the gene set provided |
from_species |
A parameter passed from orthology_mapping |
to_species |
A parameter passed from orthology_mapping |
model_object |
A superpathway fit model list |
b |
A parameter passed from singIST_treat. The index of current iteration block. |
samples |
A parameter passed from singIST_treat. The samples
to modify its gene expression from |
predictor_block |
A parameter passed from singIST_treat. The predictor block of matrices from asmbPLSDA to modify its gene expression. |
FC |
A parameter passed from singIST_treat. A |
data |
Matrix of predictor block to deflate |
fit_asmb |
asmbPLSDA fitted model |
PLS |
Numeric value indicating the PLS component |
delta_cbind |
Gene contributions (loadings) used to deflate the blocks |
A list containing the training and validation sets:
E_matrix_validation |
Validation predictor matrix |
F_matrix_validation |
Validation response matrix |
E_matrix_training |
Training predictor matrix |
F_matrix_training |
Training response matrix |
A numeric vector containing predicted values for validation samples.
A list containing updated results_CV_summary_n and
F_matrix_validation_bind matrices.
A vector with the performance measure of each quantile combination
Optimal quantile table for each PLS with all its performance measures
An integer with the optimal number of PLS
A list with the true class of each LOOCV sample and its predicted class for each quantile combination
A list containing initialized data frames for permutation statistics.
A permuted response matrix.
Updated result list with permutation statistics.
A vector of selected sample indices.
The fitted asmbPLS-DA model.
Res list including the performance measure of the permuted model
The computed p-value.
A numeric vector containing the lower and upper bounds of the 95% confidence interval.
A list with the observed CIP and GIP distributions.
A list with the observed CIP and GIP distributions.
A list with the null CIP and GIP distributions.
A permuted X matrix.
A data frame of p-values.
A list containing the optimal hyperparameters and associated quantile table.
The updated superpathway.fit.model object with the computed
validation metrics.
The identified gene annotation or NULL if it was not identified
A data.table object with the Ensembl identifiers of gene set for
from_species and to_species with only one to one orthologs
The predictor block matrix updated with the FC translation
The object data centered and scaled.
A vector with the indices of the predictor block matrix for the requestes block
The data matrix loading deflated
measure <- "F1" get_measure_index(measure) X <- matrix(rnorm(100), nrow = 10, ncol = 10) Y <- matrix(sample(0:1, 10, replace = TRUE), ncol = 1) result <- get_train_val_sets(X, Y, validation_index = 2) str(result) E_train <- matrix(rnorm(100), nrow = 10, ncol = 10) F_train <- matrix(sample(0:1, 10, replace = TRUE), ncol = 1) E_valid <- matrix(rnorm(10), nrow = 1, ncol = 10) F_valid <- matrix(1, nrow = 1, ncol = 1) quantile_table <- matrix(runif(2), nrow = 1, ncol = 2) quantile_table_CV <- matrix(runif(7), nrow = 1, ncol = 7) results_CV_summary_n <- matrix(0, nrow = 1, ncol = 2) F_matrix_validation_bind <- matrix(0, nrow = 1, ncol = 2) result <- evaluate_quantile_combinations(j=1, E_matrix_training = E_train, F_matrix_training = F_train, E_matrix_validation = E_valid, F_matrix_validation = F_valid, F_matrix_validation_bind = F_matrix_validation_bind, results_CV_summary_n = results_CV_summary_n, quantile_table_CV=quantile_table_CV, i = 1, X.dim = c(5,5), quantile.comb.table =quantile_table, outcome.type = "binary", center = TRUE, scale = TRUE, maxiter = 100, Method = NULL) print(result) set.seed(123) K <- 5 X <- matrix(rnorm(50), nrow = 5, ncol = 10) Y <- matrix(sample(0:1, 5, replace = TRUE), ncol = 1) quantile_comb_table <- matrix(runif(10), nrow = 2, ncol = 10) results_CV_summary_n <- matrix(0, nrow = 2, ncol = K) F_matrix_validation_bind <- matrix(0, nrow = 2, ncol = K) # Parallelization options library(BiocParallel) register(SnowParam(workers = 2, exportglobals = FALSE, progressbar = TRUE), default = TRUE) output <- execute_parallel_cv(K, results_CV_summary_n, F_matrix_validation_bind, X, Y, PLS_term = 1, X.dim = c(5,5), quantile.comb.table = quantile_comb_table, outcome.type = "binary", quantile_table_CV = quantile_comb_table, measure = "B_accuracy", expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100, Method = NULL) register(SerialParam(), default = TRUE) # disable parallelization str(output) initialize_results(100, 3) permute_Y_matrix(matrix(rnorm(100), 10, 10), nr = 10, nbObsPermut = 3, j = 2) res <- initialize_results(100, 3) compute_permutation_stats(res, matrix(rnorm(100), 10, 10), matrix(rnorm(100), 10, 10), j = 2, q = 3, nr = 10) null_errors <- c(0.3, 0.4, 0.35, 0.33) CV_error <- 0.32 compute_pvalue(null_errors, CV_error) null_errors <- c(0.3, 0.4, 0.35, 0.33) compute_IC95(null_errors) library(biomaRt) gene_set <- c("IL13", "IL4", "IL5", "IL21") mart <- biomaRt::useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl") detect_gene_type(gene_set, mart) annotation <- "external_gene_name" gene_set <- c("IL13", "IL4", "IL5") mart <- biomaRt::useMart(biomart = "ensembl", dataset = paste0("hsapiens", "_gene_ensembl")) retrieve_one2one_orthologs(annotation, gene_set, mart, "hsapiens", "mmusculus") X.dim <- c(30,40,60) j <- 2 get_indices(j, X.dim)measure <- "F1" get_measure_index(measure) X <- matrix(rnorm(100), nrow = 10, ncol = 10) Y <- matrix(sample(0:1, 10, replace = TRUE), ncol = 1) result <- get_train_val_sets(X, Y, validation_index = 2) str(result) E_train <- matrix(rnorm(100), nrow = 10, ncol = 10) F_train <- matrix(sample(0:1, 10, replace = TRUE), ncol = 1) E_valid <- matrix(rnorm(10), nrow = 1, ncol = 10) F_valid <- matrix(1, nrow = 1, ncol = 1) quantile_table <- matrix(runif(2), nrow = 1, ncol = 2) quantile_table_CV <- matrix(runif(7), nrow = 1, ncol = 7) results_CV_summary_n <- matrix(0, nrow = 1, ncol = 2) F_matrix_validation_bind <- matrix(0, nrow = 1, ncol = 2) result <- evaluate_quantile_combinations(j=1, E_matrix_training = E_train, F_matrix_training = F_train, E_matrix_validation = E_valid, F_matrix_validation = F_valid, F_matrix_validation_bind = F_matrix_validation_bind, results_CV_summary_n = results_CV_summary_n, quantile_table_CV=quantile_table_CV, i = 1, X.dim = c(5,5), quantile.comb.table =quantile_table, outcome.type = "binary", center = TRUE, scale = TRUE, maxiter = 100, Method = NULL) print(result) set.seed(123) K <- 5 X <- matrix(rnorm(50), nrow = 5, ncol = 10) Y <- matrix(sample(0:1, 5, replace = TRUE), ncol = 1) quantile_comb_table <- matrix(runif(10), nrow = 2, ncol = 10) results_CV_summary_n <- matrix(0, nrow = 2, ncol = K) F_matrix_validation_bind <- matrix(0, nrow = 2, ncol = K) # Parallelization options library(BiocParallel) register(SnowParam(workers = 2, exportglobals = FALSE, progressbar = TRUE), default = TRUE) output <- execute_parallel_cv(K, results_CV_summary_n, F_matrix_validation_bind, X, Y, PLS_term = 1, X.dim = c(5,5), quantile.comb.table = quantile_comb_table, outcome.type = "binary", quantile_table_CV = quantile_comb_table, measure = "B_accuracy", expected.measure.increase = 0.005, center = TRUE, scale = TRUE, maxiter = 100, Method = NULL) register(SerialParam(), default = TRUE) # disable parallelization str(output) initialize_results(100, 3) permute_Y_matrix(matrix(rnorm(100), 10, 10), nr = 10, nbObsPermut = 3, j = 2) res <- initialize_results(100, 3) compute_permutation_stats(res, matrix(rnorm(100), 10, 10), matrix(rnorm(100), 10, 10), j = 2, q = 3, nr = 10) null_errors <- c(0.3, 0.4, 0.35, 0.33) CV_error <- 0.32 compute_pvalue(null_errors, CV_error) null_errors <- c(0.3, 0.4, 0.35, 0.33) compute_IC95(null_errors) library(biomaRt) gene_set <- c("IL13", "IL4", "IL5", "IL21") mart <- biomaRt::useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl") detect_gene_type(gene_set, mart) annotation <- "external_gene_name" gene_set <- c("IL13", "IL4", "IL5") mart <- biomaRt::useMart(biomart = "ensembl", dataset = paste0("hsapiens", "_gene_ensembl")) retrieve_one2one_orthologs(annotation, gene_set, mart, "hsapiens", "mmusculus") X.dim <- c(30,40,60) j <- 2 get_indices(j, X.dim)
Builds the predictor block and matrix response for its fit in asmbPLS-DA
matrixToBlock(object)matrixToBlock(object)
object |
A superpathway input list object |
A list containing the predictor block, response matrix, dimension of each block and observed gene sets with respect to gene_sets_celltype for the original pseudobulk_lognorm matrix, for its use in asmbPLS-DA fit
file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input matrixToBlock(data)file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input matrixToBlock(data)
For the wrapper functions multiple_fitOptimal and
multiple_singISTrecapitulations one must pass multiple parameters. To
check for the consistency of such parameters we use this function with the
logic; if the parameter passed is NULL or its length is 1, it is assumed
that the desired list of parameters is the repetition of such; if the
parameter passed is a vector whose length is the number of objects that
the wrappers iterate on, then the function returns a list whose elements
are each of the vector elements; otherwise if the parameters are a vector
whose length does not match with the number of objects to iterate on
then the function stops
multiple_check(parameter, objectLength)multiple_check(parameter, objectLength)
parameter |
The parameters passed from either multiple_fitOptimal or multiple_singISTrecapitulations |
objectLength |
The number of objects that the wrapper functions iterate on |
A list with the repetition of the parameter
# NULL parameter case parameter <- NULL objectLength <- 10 multiple_check(parameter, objectLength) # Parameter equal for all elements parameter <- FALSE objectLength <- 5 multiple_check(parameter, objectLength) # Parameter differing for all elements parameter <- c(1, 6, 7, 8, 9) objectLength <- 5 multiple_check(parameter, objectLength)# NULL parameter case parameter <- NULL objectLength <- 10 multiple_check(parameter, objectLength) # Parameter equal for all elements parameter <- FALSE objectLength <- 5 multiple_check(parameter, objectLength) # Parameter differing for all elements parameter <- c(1, 6, 7, 8, 9) objectLength <- 5 multiple_check(parameter, objectLength)
Use fitOptimal for multiple superpathway input list objects. This wrapper is useful if one wants to assess multiple superpathways for analyses and needs to train its respective optimal models.
multiple_fitOptimal( object = list(), parallel = c(FALSE), measure = c("B_accuracy"), expected_measure_increase = c(0.005), maxiter = c(100), global_significance_full = c(FALSE), CIP.GIP_significance_full = c(FALSE), npermut = c(100), nbObsPermut = c(NULL), type = c("jackknife"), nsubsampling = c(100), Method = c(NULL), ... )multiple_fitOptimal( object = list(), parallel = c(FALSE), measure = c("B_accuracy"), expected_measure_increase = c(0.005), maxiter = c(100), global_significance_full = c(FALSE), CIP.GIP_significance_full = c(FALSE), npermut = c(100), nbObsPermut = c(NULL), type = c("jackknife"), nsubsampling = c(100), Method = c(NULL), ... )
object |
A list whose elements are superpathway input lists objects to use fitOptimal |
parallel |
A vector whose elements are |
measure |
A vector whose elements are |
expected_measure_increase |
A vector whose elements are
|
maxiter |
A vector whose elements are |
global_significance_full |
A vector whose elements are
|
CIP.GIP_significance_full |
A vector whose elements are
|
npermut |
A vector whose elements are |
nbObsPermut |
A vector whose elements are |
type |
A vector whose elements are |
nsubsampling |
A vector whose elements are |
Method |
A vector whose elements are |
... |
Other parameters to be passed onto fitOptimal |
A list of superpathway fit model list
file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input models <- list(data, data) # Example with different options multiple_model <- multiple_fitOptimal(models, type = c("jackknife", "subsampling"), nsubsampling = c(NULL, 10), npermut = c(10,15))file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data <- example_superpathway_input models <- list(data, data) # Example with different options multiple_model <- multiple_fitOptimal(models, type = c("jackknife", "subsampling"), nsubsampling = c(NULL, 10), npermut = c(10,15))
Use singISTrecapitulations for multiple superpathway fit model list against the same mapping organism list. This wrapper is useful if one wants to assess multiple superpathways against the same mapping organism list.
multiple_singISTrecapitulations( object, model_object = list(), model_species = list("hsapiens"), ... )multiple_singISTrecapitulations( object, model_object = list(), model_species = list("hsapiens"), ... )
object |
A mapping organism list |
model_object |
A list whose elements are superpathway fit model list |
model_species |
A list of characters indicating the organism of each
|
... |
Other parameters to pass onto biological_link_function |
A list with the row binded data.frame for each superpathway assessed for
the superpathway and cell type recapitulations, and gene contributions
to the former.
file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data_model <- example_superpathway_input models <- list(data_model, data_model) # Example with different options multiple_model <- multiple_fitOptimal(models, type = c("jackknife", "subsampling"), nsubsampling = c(NULL, 10), npermut = c(10,15)) file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism multiple_singISTrecapitulations(data_organism, multiple_model, model_species = list("hsapiens", "hsapiens"))file <- system.file("extdata", "example_superpathway_input.rda", package = "singIST") load(file) data_model <- example_superpathway_input models <- list(data_model, data_model) # Example with different options multiple_model <- multiple_fitOptimal(models, type = c("jackknife", "subsampling"), nsubsampling = c(NULL, 10), npermut = c(10,15)) file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism multiple_singISTrecapitulations(data_organism, multiple_model, model_species = list("hsapiens", "hsapiens"))
Performs the one-to-one orthology mapping between the mapped disease model list to the reference (human) organism of the superpathway fit model list.
orthology_mapping( object, model_object, from_species, to_species = "mmusculus", annotation_to_species = "external_gene_name" )orthology_mapping( object, model_object, from_species, to_species = "mmusculus", annotation_to_species = "external_gene_name" )
object |
A mapping organism list |
model_object |
A superpathway fit model list |
from_species |
A character indicating the reference organism for which
the parameter |
to_species |
A character indicating the mapped organism for which the
parameter |
annotation_to_species |
A character indicating the gene identifier
annotation used for the |
A list with the gene sets per cell type with the one-to-one orthology
# Case without stating the gene annotation of the mapping.organisms object # note this will take longer to execute file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data_model <- example_superpathway_fit_model orthology_mapping(data_organism, data_model, "hsapiens", annotation_to_species = NULL) # Case assuming the gene annotation of the mapping.organism object is # by default "external_gene_name" this is faster orthology_mapping(data_organism, data_model, "hsapiens")# Case without stating the gene annotation of the mapping.organisms object # note this will take longer to execute file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data_model <- example_superpathway_fit_model orthology_mapping(data_organism, data_model, "hsapiens", annotation_to_species = NULL) # Case assuming the gene annotation of the mapping.organism object is # by default "external_gene_name" this is faster orthology_mapping(data_organism, data_model, "hsapiens")
Performs permutation testing for asmbPLS-DA to evaluate model validity.
permut_asmbplsda( object, npermut = 100, nbObsPermut = NULL, Nc = 1, CV_error, measure = "B_accuracy", Method = NULL, maxiter = 100 )permut_asmbplsda( object, npermut = 100, nbObsPermut = NULL, Nc = 1, CV_error, measure = "B_accuracy", Method = NULL, maxiter = 100 )
object |
A superpathway fit model list. |
npermut |
Number of permutations (default: 100). |
nbObsPermut |
Number of samples to permute per iteration (default: NULL). |
Nc |
Number of samples dropped per permutation (default: 1). |
CV_error |
Cross-validation error of the fitted model. |
measure |
Accuracy measure ( |
Method |
Decision rule for prediction (default: NULL). |
maxiter |
Maximum iterations (default: 100). |
A list with permutation statistics, p-value, and confidence intervals.
file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data <- example_superpathway_fit_model permut_asmbplsda(data, npermut = 5, Nc = 1, CV_error = 1)file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data <- example_superpathway_fit_model permut_asmbplsda(data, npermut = 5, Nc = 1, CV_error = 1)
If splits=NULL, runs LOOCV‐based permutation. Otherwise treats splits as
a list of train/validate splits (e.g. from make_splits_R()) and does a fixed
splits K‐fold×repeats permutation test.
permut_asmbplsda_kcv( object, npermut = 100, splits = NULL, measure = "B_accuracy", nbObsPermut = NULL, Nc = 1, Method = NULL, maxiter = 100, CV_error = NULL, ... )permut_asmbplsda_kcv( object, npermut = 100, splits = NULL, measure = "B_accuracy", nbObsPermut = NULL, Nc = 1, Method = NULL, maxiter = 100, CV_error = NULL, ... )
object |
A superpathway fit model list |
npermut |
Number of permutations (default: 100) |
splits |
Optional list of splits; if NULL uses LOOCV branch |
measure |
Accuracy measure ( |
nbObsPermut |
Number of samples to permute per iteration (default: NULL). |
Nc |
Number of samples dropped per permutation (default: 1 if LOOCV). |
Method |
Decision rule for prediction (default: NULL). |
maxiter |
Maximum iterations (default: 100). |
CV_error |
Error obtained from optimal model CV process |
... |
Other args passed to LOOCV or to evaluate_performance |
A list with null distribution, p-value, and (for KCV) splits
Projects each new sample into the latent space learned on training, then reconstructs its missing entries.
predict_mfa_imputer(X_new, mu, loadings)predict_mfa_imputer(X_new, mu, loadings)
X_new |
Numeric matrix (new samples × same features), may contain NAs. |
mu |
Numeric vector of column means (as returned by
|
loadings |
Numeric matrix of loadings (columns = components). |
Matrix X_new with NAs replaced by reconstructed values.
Retrieves the gene set associated with a pathway or superpathway from MsigDB.
pullGeneSet(object, gse = NULL, ...)pullGeneSet(object, gse = NULL, ...)
object |
List. A pathway (from create_pathway()) or superpathway (from create_superpathway()). |
gse |
Gene Set Collection from MsigDB. If NULL, loads default (human, ENTREZ + SYM IDs). |
... |
Additional arguments passed to msigdb::subsetCollection(). |
A character vector with gene IDs for the specified pathway.
my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") pullGeneSet(my_pathway)my_pathway <- create_pathway("KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", "KEGG", "c2", "CP") pullGeneSet(my_pathway)
Render output of multiple_singISTrecapitulations for multiple disease models and superpathways, the output is friendly for visualizing the results
render_multiple_outputs(objects = list())render_multiple_outputs(objects = list())
objects |
A list as retuned by multiple_singISTrecapitulations |
A list with the row binded data.frame for each superpathway assessed for
the superpathway and cell type recapitulations, gene contributions
to the former, and fold changes. These row binds are performed for all
disease models and superpathways.
After cleaning and imputing a subset of predictors (removing any columns with zero variance or 100% missing), this function re‐inserts those removed columns in their original order. For each re‐inserted column, original non‐missing values are kept and entries that were originally missing are set to zero.
restore_removed_columns(X_imp_clean, X_raw, keep_cols)restore_removed_columns(X_imp_clean, X_raw, keep_cols)
X_imp_clean |
Numeric matrix (n_samples × p_cleaned) of imputed values for the columns that were kept. |
X_raw |
Numeric matrix or data.frame (n_samples × p_oril) of ginathe original predictor data before cleaning. |
keep_cols |
Logical vector of length |
A numeric matrix (n_samples × p_original) with all original columns
in their original order. Columns where keep_cols == TRUE contain
the values from X_imp_clean. Columns where
keep_cols == FALSE contain the original non‐NA values, and any
entries that were originally NA are set to zero.
Compute performance metrics of predicted asmbPLSDA
Results_comparison_measure( Y_predict, Y_true, outcome.type = c("binary", "multiclass") )Results_comparison_measure( Y_predict, Y_true, outcome.type = c("binary", "multiclass") )
Y_predict |
Predicted matrix from asmbPLSDA |
Y_true |
True class used to fit asmbPLSDA |
outcome.type |
Outcome type either |
A vector with accuracy, balanced accuracy, precision, recall and F1 metric
Results_comparison_measure(c(1,0,1,0,1), c(0,0,1,1,1), outcome.type = "binary")Results_comparison_measure(c(1,0,1,0,1), c(0,0,1,1,1), outcome.type = "binary")
Updates the gene_sets_celltype element of a superpathway object, ensuring validity:
The number of gene sets must match the number of cell types.
setGeneSetsCelltype(object, value = NULL, ...)setGeneSetsCelltype(object, value = NULL, ...)
object |
List. A superpathway object created by create_superpathway(), or a pathway object created by create_pathway(). |
value |
List. A list of genes to incorporate in each cell type slot. By
default |
... |
other parameters to pass onto pullGeneSet(). |
The updated superpathway object (list).
my_pathway <- create_pathway( standard_name = "KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", dbsource = "KEGG", collection = "c2", subcollection = "CP" ) my_superpathway <- create_superpathway(my_pathway, c("T-cell", "Dendritic Cell"), list()) my_superpathway <- setGeneSetsCelltype(my_superpathway, list(c("IL2"), c("IL4")))my_pathway <- create_pathway( standard_name = "KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION", dbsource = "KEGG", collection = "c2", subcollection = "CP" ) my_superpathway <- create_superpathway(my_pathway, c("T-cell", "Dendritic Cell"), list()) my_superpathway <- setGeneSetsCelltype(my_superpathway, list(c("IL2"), c("IL4")))
Derive singIST treated samples
singIST_treat(object, model_object, orthologs, logFC)singIST_treat(object, model_object, orthologs, logFC)
object |
A mapping organism list passed from biological_link_function. |
model_object |
A superpathway fit model list passed from biological_link_function |
orthologs |
A list of |
logFC |
A list of |
A list object with the singIST treated samples predictor block matrix and a list of Fold Changes for each cell type used to compute the singIST treated samples.
# Orthology mapping file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data_model <- example_superpathway_fit_model orthologs <- orthology_mapping(data_organism, data_model, "hsapiens") # Set the identities # Cell type mapping data <- celltype_mapping(data_organism) data$counts$test <- paste0(data$counts$celltype_cluster, "_", data$counts$class) SeuratObject::Idents(data$counts) <- "test" logFC <- diff_expressed(data) singIST_treat(data_organism, data_model, orthologs, logFC)# Orthology mapping file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data_model <- example_superpathway_fit_model orthologs <- orthology_mapping(data_organism, data_model, "hsapiens") # Set the identities # Cell type mapping data <- celltype_mapping(data_organism) data$counts$test <- paste0(data$counts$celltype_cluster, "_", data$counts$class) SeuratObject::Idents(data$counts) <- "test" logFC <- diff_expressed(data) singIST_treat(data_organism, data_model, orthologs, logFC)
This method provides with all singIST recapitulations; superpathway recapitulations; cell type recapitulations; gene contributions to cell type recapitulations. The procedure encompasses the execution of the biological link function, the derivation of the predictor scores (superpathway, cell type and gene scores), and their use to compute the predicted recapitulations as a fraction of the reference recapitulation.
singISTrecapitulations(object, model_object, ...)singISTrecapitulations(object, model_object, ...)
object |
A mapping organism list for which to calculate the recapitulations against the fitted superpathway model |
model_object |
A superpathway fit model list used to calculate the recapitulations |
... |
Other parameters to pass onto biological_link_function |
A list with; a data.frame object with the superpathway recapitulation,
containing variables pathway name, recapitulation, p_val with the
global significance test of the fitted model as provided in model_object,
and target_organism with the target class of the disease model as provided
in model_object; a data.frame with the cell type recapitulation,
containing variables pathway name, celltype, recapitulation,
orthology with the percentage of observed one-to-one orthology coverage
if all cell types have the same gene set this value is constant -, and
target_organism; a data.frame object with the gene contributions to
cell type recapitulation, containing variables pathway, celltype,
gene name, contribution indicating the gene contribution to cell type
recapitulation, and target_organism; orthologs a data.frame with
the one-to-one orthology mapping for each cell type gene set.
file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data_model <- example_superpathway_fit_model singISTrecapitulations(data_organism, data_model)file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) data_organism <- example_mapping_organism file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) data_model <- example_superpathway_fit_model singISTrecapitulations(data_organism, data_model)
Derive superpathway recapitulation
superpathway_recap(model_object, data_original, data_singIST)superpathway_recap(model_object, data_original, data_singIST)
model_object |
A superpathway fit model list passed from singISTrecapitulations |
data_original |
A matrix with the superpathway's score as returned by derive_contributions for the non-singIST treated samples, passed from singISTrecapitulations |
data_singIST |
A matrix with the superpathway's score as returned by derive_contributions for the singIST treated samples, passed from singISTrecapitulations |
An object data.frame with the variables: pathway name as indicated in
model_object, recapitulation with the superpathway recapitulation
file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism singIST_samples <- biological_link_function(mapped, model)$singIST_samples original <- derive_contributions(model, singIST_samples) derived <- derive_contributions(model,model$model_fit$predictor_block) superpathway_recap(model, original$superpathway_score, derived$superpathway_score)file <- system.file("extdata", "example_superpathway_fit_model.rda", package = "singIST") load(file) model <- example_superpathway_fit_model file <- system.file("extdata", "example_mapping_organism.rda", package = "singIST") load(file) mapped <- example_mapping_organism singIST_samples <- biological_link_function(mapped, model)$singIST_samples original <- derive_contributions(model, singIST_samples) derived <- derive_contributions(model,model$model_fit$predictor_block) superpathway_recap(model, original$superpathway_score, derived$superpathway_score)
Given an original vector of block lengths and a mask of kept columns, recomputes the new block lengths.
update_group_sizes(group_orig, keep_cols)update_group_sizes(group_orig, keep_cols)
group_orig |
Integer vector of original block sizes (sum group_orig = ncol before cleaning). |
keep_cols |
Logical vector of length |
Integer vector of same length as group_orig with updated block
sizes.
update_group_sizes(c(2,3), c(TRUE, TRUE, FALSE, TRUE, TRUE))update_group_sizes(c(2,3), c(TRUE, TRUE, FALSE, TRUE, TRUE))
Mann-Whitney Wilcoxon test p-value
wilcox_CIP_GIP(ref_distr, null_distr, ...)wilcox_CIP_GIP(ref_distr, null_distr, ...)
ref_distr |
A vector with the reference distribution |
null_distr |
A vector with the null distribution |
... |
Other parameters to be passed onto wilcox.test |
A pvalue with the Mann-Whitney Wilcoxon test with the "greater" as the alternative hypothesis
ref_distr <- rnorm(100, mean = 30, sd = 2) null_distr <- rnorm(100, mean = 0, sd = 1) wilcox_CIP_GIP(ref_distr, null_distr)ref_distr <- rnorm(100, mean = 30, sd = 2) null_distr <- rnorm(100, mean = 0, sd = 1) wilcox_CIP_GIP(ref_distr, null_distr)