Package 'decoupleR' reference manual

Title:	decoupleR: Ensemble of computational methods to infer biological activities from omics data
Description:	Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability. Here, we present decoupleR, a Bioconductor package containing different statistical methods to extract these signatures within a unified framework. decoupleR allows the user to flexibly test any method with any resource. It incorporates methods that take into account the sign and weight of network interactions. decoupleR can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge. For example, in transcriptomics gene sets regulated by a transcription factor, or in phospho-proteomics phosphosites that are targeted by a kinase.
Authors:	Pau Badia-i-Mompel [aut, cre] , Jesús Vélez-Santiago [aut] , Jana Braunger [aut] , Celina Geiss [aut] , Daniel Dimitrov [aut] , Sophia Müller-Dott [aut] , Petr Taus [aut] , Aurélien Dugourd [aut] , Christian H. Holland [aut] , Ricardo O. Ramirez Flores [aut] , Julio Saez-Rodriguez [aut]
Maintainer:	Pau Badia-i-Mompel <[email protected]>
License:	GPL-3 + file LICENSE
Version:	2.13.0
Built:	2025-02-12 03:44:24 UTC
Source:	https://github.com/bioc/decoupleR

Pre-processing for methods that fit networks.

Description

If center is true, then the expression values are centered by the mean of expression across the samples.

Usage

.fit_preprocessing(network, mat, center, na.rm, sparse)
.fit_preprocessing(network, mat, center, na.rm, sparse)

Arguments

`network`	Tibble or dataframe with edges and it's associated metadata.
`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`center`	Logical value indicating if `mat` must be centered by `base::rowMeans()`.
`na.rm`	Should missing values (including NaN) be omitted from the calculations of `base::rowMeans()`?
`sparse`	Deprecated parameter.

Value

A named list of matrices to evaluate in methods that fit models, like .mlm_analysis().

mat: Features as rows and samples as columns.
mor_mat: Features as rows and columns as source.

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
net <- rename_net(net, source, target, mor)
.fit_preprocessing(net, mat, center = FALSE, na.rm = FALSE, sparse = FALSE)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
net <- rename_net(net, source, target, mor)
.fit_preprocessing(net, mat, center = FALSE, na.rm = FALSE, sparse = FALSE)

Check correlation (colinearity)

Description

Checks the correlation across the regulators in a network.

Usage

check_corr(
  network,
  .source = "source",
  .target = "target",
  .mor = "mor",
  .likelihood = NULL
)
check_corr(
  network,
  .source = "source",
  .target = "target",
  .mor = "mor",
  .likelihood = NULL
)

Arguments

`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`.mor`	Column with edge mode of regulation (i.e. mor).
`.likelihood`	Deprecated argument. Now it will always be set to 1.

Value

Correlation pairs tibble.

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
net <- readRDS(file.path(inputs_dir, "net.rds"))
check_corr(net, .source='source')
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
net <- readRDS(file.path(inputs_dir, "net.rds"))
check_corr(net, .source='source')

Rename columns and add defaults values if column not present

Description

convert_f_defaults() combine the dplyr::rename() way of working and with the tibble::add_column() to add columns with default values in case they don't exist after renaming data.

Usage

convert_f_defaults(.data, ..., .def_col_val = c(), .use_dots = TRUE)
convert_f_defaults(.data, ..., .def_col_val = c(), .use_dots = TRUE)

Arguments

`.data`	A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
`...`	For `rename()`: <`tidy-select`> Use `new_name = old_name` to rename selected variables. For `rename_with()`: additional arguments passed onto `.fn`.
`.def_col_val`	Named vector with columns with default values if none exist after rename.
`.use_dots`	Should a dot prefix be added to renamed variables? This will allow swapping of columns.

Details

The objective of using .use_dots is to be able to swap columns which, by default, is not allowed by the dplyr::rename() function. The same behavior can be replicated by simply using the dplyr::select(), however, the select evaluation allows much more flexibility so that unexpected results could be obtained. Despite this, a future implementation will consider this form of execution to allow renaming the same column to multiple ones (i.e. extend dataframe extension).

Value

An object of the same type as .data. The output has the following properties:

Rows are not affected.
Column names are changed.
Column order is the same as that of the function call.

Examples


df <- tibble::tibble(x = 1, y = 2, z = 3)

# Rename columns
df <- tibble::tibble(x = 1, y = 2)
convert_f_defaults(
    .data = df,
    new_x = x,
    new_y = y,
    new_z = NULL,
    .def_col_val = c(new_z = 3)
)
df <- tibble::tibble(x = 1, y = 2, z = 3)

# Rename columns
df <- tibble::tibble(x = 1, y = 2)
convert_f_defaults(
    .data = df,
    new_x = x,
    new_y = y,
    new_z = NULL,
    .def_col_val = c(new_z = 3)
)

Evaluate multiple statistics with same input data

Description

Calculate the source activity per sample out of a gene expression matrix by coupling a regulatory network with a variety of statistics.

Usage

decouple(
  mat,
  network,
  .source = source,
  .target = target,
  statistics = NULL,
  args = list(NULL),
  consensus_score = TRUE,
  consensus_stats = NULL,
  include_time = FALSE,
  show_toy_call = FALSE,
  minsize = 5
)
decouple(
  mat,
  network,
  .source = source,
  .target = target,
  statistics = NULL,
  args = list(NULL),
  consensus_score = TRUE,
  consensus_stats = NULL,
  include_time = FALSE,
  show_toy_call = FALSE,
  minsize = 5
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`statistics`	Statistical methods to be run sequentially. If none are provided, only top performer methods are run (mlm, ulm and wsum).
`args`	A list of argument-lists the same length as `statistics` (or length 1). The default argument, list(NULL), will be recycled to the same length as `statistics`, and will call each function with no arguments (apart from `mat`, `network`, `.source` and, `.target`).
`consensus_score`	Boolean whether to run a consensus score between methods.
`consensus_stats`	List of estimate names to use for the calculation of the consensus score. This is used to filter out extra estimations from some methods, for example wsum returns wsum, corr_wsum and norm_wsum. If none are provided, and also no statstics where provided, only top performer methods are used (mlm, ulm and norm_wsum). Else, it will use all available estimates after running all methods in the statistics argument.
`include_time`	Should the time per statistic evaluated be informed?
`show_toy_call`	The call of each statistic must be informed?
`minsize`	Integer indicating the minimum number of targets per source.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

run_id: Indicates the order in which the methods have been executed.
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
statistic_time: If requested, internal execution time indicator.
p_value: p-value (if available) of the obtained score.

Examples

if (FALSE) {
    inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

    mat <- readRDS(file.path(inputs_dir, "mat.rds"))
    net <- readRDS(file.path(inputs_dir, "net.rds"))

    decouple(
        mat = mat,
        network = net,
        .source = "source",
        .target = "target",
        statistics = c("gsva", "wmean", "wsum", "ulm", "aucell"),
        args = list(
            gsva = list(verbose = FALSE),
            wmean = list(.mor = "mor", .likelihood = "likelihood"),
            wsum = list(.mor = "mor"),
            ulm = list(.mor = "mor")
        ),
        minsize = 0
    )
}
if (FALSE) {
    inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

    mat <- readRDS(file.path(inputs_dir, "mat.rds"))
    net <- readRDS(file.path(inputs_dir, "net.rds"))

    decouple(
        mat = mat,
        network = net,
        .source = "source",
        .target = "target",
        statistics = c("gsva", "wmean", "wsum", "ulm", "aucell"),
        args = list(
            gsva = list(verbose = FALSE),
            wmean = list(.mor = "mor", .likelihood = "likelihood"),
            wsum = list(.mor = "mor"),
            ulm = list(.mor = "mor")
        ),
        minsize = 0
    )
}

Extract sets

Description

Extracts feature sets from a renamed network (see rename_net).

Usage

extract_sets(network)
extract_sets(network)

Arguments

network

Tibble or dataframe with edges and it's associated metadata.

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
net <- rename_net(net, source, target, mor)
extract_sets(net)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
net <- rename_net(net, source, target, mor)
extract_sets(net)

Filter sources with minsize targets

Description

Filter sources of a net with less than minsize targets

Usage

filt_minsize(mat_f_names, network, minsize = 5)
filt_minsize(mat_f_names, network, minsize = 5)

Arguments

`mat_f_names`	Feature names of mat.
`network`	Tibble or dataframe with edges and it's associated metadata.
`minsize`	Integer indicating the minimum number of targets per source.

Value

Filtered network.

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
net <- rename_net(net, source, target, mor)
filt_minsize(rownames(mat), net, minsize = 4)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
net <- rename_net(net, source, target, mor)
filt_minsize(rownames(mat), net, minsize = 4)

CollecTRI gene regulatory network. Wrapper to access CollecTRI gene regulatory network. CollecTRI is a comprehensive resource containing a curated collection of transcription factors (TFs) and their target genes. It is an expansion of DoRothEA. Each interaction is weighted by its mode of regulation (either positive or negative).

Description

CollecTRI gene regulatory network. Wrapper to access CollecTRI gene regulatory network. CollecTRI is a comprehensive resource containing a curated collection of transcription factors (TFs) and their target genes. It is an expansion of DoRothEA. Each interaction is weighted by its mode of regulation (either positive or negative).

Usage

get_collectri(organism = "human", split_complexes = FALSE, ...)
get_collectri(organism = "human", split_complexes = FALSE, ...)

Arguments

`organism`	Which organism to use. Only human, mouse and rat are available.
`split_complexes`	Whether to split complexes into subunits. By default complexes are kept as they are.
`...`	Ignored.

Examples

collectri <- get_collectri(organism='human', split_complexes=FALSE)
collectri <- get_collectri(organism='human', split_complexes=FALSE)

DoRothEA gene regulatory network.

Description

Wrapper to access DoRothEA gene regulatory network. DoRothEA is a comprehensive resource containing a curated collection of transcription factors (TFs) and their target genes. Each interaction is weighted by its mode of regulation (either positive or negative) and by its confidence level

Usage

get_dorothea(
  organism = "human",
  levels = c("A", "B", "C"),
  weight_dict = list(A = 1, B = 2, C = 3, D = 4)
)
get_dorothea(
  organism = "human",
  levels = c("A", "B", "C"),
  weight_dict = list(A = 1, B = 2, C = 3, D = 4)
)

Arguments

`organism`	Which organism to use. Only human, mouse and rat are available.
`levels`	List of confidence levels to return. Goes from A to D, A being the most confident and D being the less.
`weight_dict`	Dictionary of values to divide the mode of regulation (-1 or 1), one for each confidence level. Bigger values will generate weights close to zero.

Examples

dorothea <- get_dorothea(organism='human', levels=c('A', 'B'))
dorothea <- get_dorothea(organism='human', levels=c('A', 'B'))

OmniPath kinase-substrate network

Description

Retrieve a ready to use, curated kinase-substrate Network from the OmniPath database.

Usage

get_ksn_omnipath(...)
get_ksn_omnipath(...)

Arguments

...

Passed to OmnipathR::import_omnipath_enzsub.

Details

Import enzyme-PTM network from OmniPath, then filter out anything that is not phospho or dephosphorilation. Then format the columns for use with decoupleR functions.

Pathway RespOnsive GENes for activity inference (PROGENy).

Description

Wrapper to access PROGENy model gene weights. Each pathway is defined with a collection of target genes, each interaction has an associated p-value and weight. The top significant interactions per pathway are returned.

Usage

get_progeny(organism = "human", top = 500)
get_progeny(organism = "human", top = 500)

Arguments

`organism`	Which organism to use. Only human and mouse are available.
`top`	Number of genes per pathway to return.

Examples

progeny <- get_progeny(organism='human', top=500)
progeny <- get_progeny(organism='human', top=500)

Wrapper to access resources inside Omnipath. This wrapper allows to easily query different prior knowledge resources. To check available resources run `decoupleR::show_resources()`. For more information visit the official website for Omnipath.

Description

Wrapper to access resources inside Omnipath. This wrapper allows to easily query different prior knowledge resources. To check available resources run decoupleR::show_resources(). For more information visit the official website for Omnipath.

Usage

get_resource(name, organism = "human", ...)
get_resource(name, organism = "human", ...)

Arguments

`name`	Name of the resource to query.
`organism`	Organism name or NCBI Taxonomy ID.
`...`	Passed to `OmnipathR::import_omnipath_annotations`.

Examples

df <- decoupleR::get_resource('SIGNOR')
df <- decoupleR::get_resource('SIGNOR')

Generate a toy `mat` and `network`.

Description

Generate a toy mat and network.

Usage

get_toy_data(n_samples = 24, seed = 42)
get_toy_data(n_samples = 24, seed = 42)

Arguments

`n_samples`	Number of samples to simulate.
`seed`	A single value, interpreted as an integer, or NULL for random number generation.

Value

List containing mat and network.

Examples

data <- get_toy_data()
mat <- data$mat
network <- data$network
data <- get_toy_data()
mat <- data$mat
network <- data$network

Intersect network target features with input matrix.

Description

Keep only edges which its target features belong to the input matrix.

Usage

intersect_regulons(mat, network, .source, .target, minsize)
intersect_regulons(mat, network, .source, .target, minsize)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`minsize`	Minimum number of targets per source allowed.

Value

Filtered tibble.

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
intersect_regulons(mat, net, source, target, minsize=4)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
intersect_regulons(mat, net, source, target, minsize=4)

Rename network

Description

Renames a given network to these column names: .source, .target, .mor, If .mor is not provided, then the function sets them to default values.

Usage

rename_net(
  network,
  .source,
  .target,
  .mor = NULL,
  .likelihood = NULL,
  def_mor = 1
)
rename_net(
  network,
  .source,
  .target,
  .mor = NULL,
  .likelihood = NULL,
  def_mor = 1
)

Arguments

`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`.mor`	Column with edge mode of regulation (i.e. mor).
`.likelihood`	Deprecated argument. Now it will always be set to 1.
`def_mor`	Default value for .mor when not provided.

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
rename_net(net, source, target, mor)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
rename_net(net, source, target, mor)

AUCell

Description

Calculates regulatory activities using AUCell.

Usage

run_aucell(
  mat,
  network,
  .source = source,
  .target = target,
  aucMaxRank = ceiling(0.05 * nrow(rankings)),
  nproc = availableCores(),
  seed = 42,
  minsize = 5
)
run_aucell(
  mat,
  network,
  .source = source,
  .target = target,
  aucMaxRank = ceiling(0.05 * nrow(rankings)),
  nproc = availableCores(),
  seed = 42,
  minsize = 5
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`aucMaxRank`	Threshold to calculate the AUC.
`nproc`	Number of cores to use for computation.
`seed`	A single value, interpreted as an integer, or NULL for random number generation.
`minsize`	Integer indicating the minimum number of targets per source.

Details

AUCell (Aibar et al., 2017) uses the Area Under the Curve (AUC) to calculate whether a set of targets is enriched within the molecular readouts of each sample. To do so, AUCell first ranks the molecular features of each sample from highest to lowest value, resolving ties randomly. Then, an AUC can be calculated using by default the top 5% molecular features in the ranking. Therefore, this metric, aucell, represents the proportion of abundant molecular features in the target set, and their relative abundance value compared to the other features within the sample.

Aibar S. et al. (2017) Scenic: single-cell regulatory network inference and clustering. Nat. Methods, 14, 1083–1086.

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_aucell(mat, net, minsize=0, nproc=1, aucMaxRank=3)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_aucell(mat, net, minsize=0, nproc=1, aucMaxRank=3)

Consensus score between methods

Description

Function to generate a consensus score between methods from the result of the decouple function.

Usage

run_consensus(df, include_time = FALSE, seed = NULL)
run_consensus(df, include_time = FALSE, seed = NULL)

Arguments

`df`	`decouple` data frame result
`include_time`	Should the time per statistic evaluated be informed?
`seed`	Deprecated parameter.

Value

Updated tibble with the computed consensus score between methods

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

results <- decouple(
   mat = mat,
   network = net,
   .source = "source",
   .target = "target",
   statistics = c("wmean", "ulm"),
   args = list(
            wmean = list(.mor = "mor", .likelihood = "likelihood"),
            ulm = list(.mor = "mor", .likelihood = "likelihood")
        ),
   consensus_score = FALSE,
   minsize = 0
   )
run_consensus(results)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

results <- decouple(
   mat = mat,
   network = net,
   .source = "source",
   .target = "target",
   statistics = c("wmean", "ulm"),
   args = list(
            wmean = list(.mor = "mor", .likelihood = "likelihood"),
            ulm = list(.mor = "mor", .likelihood = "likelihood")
        ),
   consensus_score = FALSE,
   minsize = 0
   )
run_consensus(results)

Fast Gene Set Enrichment Analysis (FGSEA)

Description

Calculates regulatory activities using FGSEA.

Usage

run_fgsea(
  mat,
  network,
  .source = source,
  .target = target,
  times = 100,
  nproc = availableCores(),
  seed = 42,
  minsize = 5,
  ...
)
run_fgsea(
  mat,
  network,
  .source = source,
  .target = target,
  times = 100,
  nproc = availableCores(),
  seed = 42,
  minsize = 5,
  ...
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`times`	How many permutations to do?
`nproc`	Number of cores to use for computation.
`seed`	A single value, interpreted as an integer, or NULL.
`minsize`	Integer indicating the minimum number of targets per source.
`...`	Arguments passed on to `fgsea::fgseaMultilevel` `sampleSize` The size of a random set of genes which in turn has size = pathwaySize `minSize` Minimal size of a gene set to test. All pathways below the threshold are excluded. `maxSize` Maximal size of a gene set to test. All pathways above the threshold are excluded. `eps` This parameter sets the boundary for calculating the p value. `scoreType` This parameter defines the GSEA score type. Possible options are ("std", "pos", "neg"). By default ("std") the enrichment score is computed as in the original GSEA. The "pos" and "neg" score types are intended to be used for one-tailed tests (i.e. when one is interested only in positive ("pos") or negateive ("neg") enrichment). `gseaParam` GSEA parameter value, all gene-level statis are raised to the power of 'gseaParam' before calculation of GSEA enrichment scores. `BPPARAM` Parallelization parameter used in bplapply. Can be used to specify cluster to run. If not initialized explicitly or by setting 'nproc' default value 'bpparam()' is used. `absEps` deprecated, use 'eps' parameter instead

Details

GSEA (Aravind et al., 2005) starts by transforming the input molecular readouts in mat to ranks for each sample. Then, an enrichment score fgsea is calculated by walking down the list of features, increasing a running-sum statistic when a feature in the target feature set is encountered and decreasing it when it is not. The final score is the maximum deviation from zero encountered in the random walk. Finally, a normalized score norm_fgsea, can be obtained by computing the z-score of the estimate compared to a null distribution obtained from N random permutations. The used implementation is taken from the package fgsea (Korotkevich et al., 2021).

Aravind S. et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 102, 43.

Korotkevich G. et al. (2021) Fast gene set enrichment analysis. bioRxiv. DOI: https://doi.org/10.1101/060012.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_fgsea(mat, net, minsize=0, nproc=1)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_fgsea(mat, net, minsize=0, nproc=1)

Gene Set Variation Analysis (GSVA)

Description

Calculates regulatory activities using GSVA.

Usage

run_gsva(
  mat,
  network,
  .source = source,
  .target = target,
  verbose = FALSE,
  method = c("gsva", "plage", "ssgsea", "zscore"),
  minsize = 5L,
  maxsize = Inf,
  ...
)
run_gsva(
  mat,
  network,
  .source = source,
  .target = target,
  verbose = FALSE,
  method = c("gsva", "plage", "ssgsea", "zscore"),
  minsize = 5L,
  maxsize = Inf,
  ...
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`verbose`	Gives information about each calculation step. Default: FALSE.
`method`	Method to employ in the estimation of gene-set enrichment. scores per sample. By default this is set to gsva (Hänzelmann et al, 2013). Further available methods are "plage", "ssgsea" and "zscore". Read more in the manual of `GSVA::gsva`.
`minsize`	Integer indicating the minimum number of targets per source. Must be greater than 0.
`maxsize`	Integer indicating the maximum number of targets per source.
`...`	Arguments passed on to `GSVA::gsvaParam`, `GSVA::ssgseaParam` `assay` The name of the assay to use in case `exprData` is a multi-assay container, otherwise ignored. By default, the first assay is used. `annotation` The name of a Bioconductor annotation package for the gene identifiers occurring in the row names of the expression data matrix. This can be used to map gene identifiers occurring in the gene sets if those are provided in a `GeneSetCollection`. By default gene identifiers used in expression data matrix and gene sets are matched directly. `kcdf` Character vector of length 1 denoting the kernel to use during the non-parametric estimation of the cumulative distribution function of expression levels across samples. By default, `kcdf="Gaussian"` which is suitable when input expression values are continuous, such as microarray fluorescent units in logarithmic scale, RNA-seq log-CPMs, log-RPKMs or log-TPMs. When input expression values are integer counts, such as those derived from RNA-seq experiments, then this argument should be set to `kcdf="Poisson"`. `tau` Numeric vector of length 1. The exponent defining the weight of the tail in the random walk performed by the `GSVA` (Hänzelmann et al., 2013) method. The default value is 1 as described in the paper. `maxDiff` Logical vector of length 1 which offers two approaches to calculate the enrichment statistic (ES) from the KS random walk statistic. `FALSE`: ES is calculated as the maximum distance of the random walk from 0. `TRUE` (the default): ES is calculated as the magnitude difference between the largest positive and negative random walk deviations. `absRanking` Logical vector of length 1 used only when `maxDiff=TRUE`. When `absRanking=FALSE` (default) a modified Kuiper statistic is used to calculate enrichment scores, taking the magnitude difference between the largest positive and negative random walk deviations. When `absRanking=TRUE` the original Kuiper statistic that sums the largest positive and negative random walk deviations, is used. In this latter case, gene sets with genes enriched on either extreme (high or low) will be regarded as ’highly’ activated. `alpha` Numeric vector of length 1. The exponent defining the weight of the tail in the random walk performed by the `ssGSEA` (Barbie et al., 2009) method. The default value is 0.25 as described in the paper. `normalize` Logical vector of length 1; if `TRUE` runs the `ssGSEA` method from Barbie et al. (2009) normalizing the scores by the absolute difference between the minimum and the maximum, as described in their paper. Otherwise this last normalization step is skipped.

Details

GSVA (Hänzelmann et al., 2013) starts by transforming the input molecular readouts in mat to a readout-level statistic using Gaussian kernel estimation of the cumulative density function. Then, readout-level statistics are ranked per sample and normalized to up-weight the two tails of the rank distribution. Afterwards, an enrichment score gsva is calculated using a running sum statistic that is normalized by subtracting the largest negative estimate from the largest positive one.

Hänzelmann S. et al. (2013) GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 14, 7.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_gsva(mat, net, minsize=1, verbose = FALSE)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_gsva(mat, net, minsize=1, verbose = FALSE)

Multivariate Decision Trees (MDT)

Description

Calculates regulatory activities using MDT.

Usage

run_mdt(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  sparse = FALSE,
  center = FALSE,
  na.rm = FALSE,
  trees = 10,
  min_n = 20,
  nproc = availableCores(),
  seed = 42,
  minsize = 5
)
run_mdt(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  sparse = FALSE,
  center = FALSE,
  na.rm = FALSE,
  trees = 10,
  min_n = 20,
  nproc = availableCores(),
  seed = 42,
  minsize = 5
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`.mor`	Column with edge mode of regulation (i.e. mor).
`.likelihood`	Deprecated argument. Now it will always be set to 1.
`sparse`	Deprecated parameter.
`center`	Logical value indicating if `mat` must be centered by `base::rowMeans()`.
`na.rm`	Should missing values (including NaN) be omitted from the calculations of `base::rowMeans()`?
`trees`	An integer for the number of trees contained in the ensemble.
`min_n`	An integer for the minimum number of data points in a node that are required for the node to be split further.
`nproc`	Number of cores to use for computation.
`seed`	A single value, interpreted as an integer, or NULL for random number generation.
`minsize`	Integer indicating the minimum number of targets per source.

Details

MDT fits a multivariate regression random forest for each sample, where the observed molecular readouts in mat are the response variable and the regulator weights in net are the covariates. Target features with no associated weight are set to zero. The obtained feature importances from the fitted model are the activities mdt of the regulators in net.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_mdt(mat, net, minsize=0)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_mdt(mat, net, minsize=0)

Multivariate Linear Model (MLM)

Description

Calculates regulatory activities using MLM.

Usage

run_mlm(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  sparse = FALSE,
  center = FALSE,
  na.rm = FALSE,
  minsize = 5
)
run_mlm(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  sparse = FALSE,
  center = FALSE,
  na.rm = FALSE,
  minsize = 5
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`.mor`	Column with edge mode of regulation (i.e. mor).
`.likelihood`	Deprecated argument. Now it will always be set to 1.
`sparse`	Deprecated parameter.
`center`	Logical value indicating if `mat` must be centered by `base::rowMeans()`.
`na.rm`	Should missing values (including NaN) be omitted from the calculations of `base::rowMeans()`?
`minsize`	Integer indicating the minimum number of targets per source.

Details

MLM fits a multivariate linear model for each sample, where the observed molecular readouts in mat are the response variable and the regulator weights in net are the covariates. Target features with no associated weight are set to zero. The obtained t-values from the fitted model are the activities (mlm) of the regulators in net.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_mlm(mat, net, minsize=0)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_mlm(mat, net, minsize=0)

Over Representation Analysis (ORA)

Description

Calculates regulatory activities using ORA.

Usage

run_ora(
  mat,
  network,
  .source = source,
  .target = target,
  n_up = ceiling(0.05 * nrow(mat)),
  n_bottom = 0,
  n_background = 20000,
  with_ties = TRUE,
  seed = 42,
  minsize = 5,
  ...
)
run_ora(
  mat,
  network,
  .source = source,
  .target = target,
  n_up = ceiling(0.05 * nrow(mat)),
  n_bottom = 0,
  n_background = 20000,
  with_ties = TRUE,
  seed = 42,
  minsize = 5,
  ...
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`n_up`	Integer indicating the number of top targets to slice from mat.
`n_bottom`	Integer indicating the number of bottom targets to slice from mat.
`n_background`	Integer indicating the background size of the sliced targets. If not specified the number of background targets is determined by the total number of unique targets in the union of `mat` and `network`.
`with_ties`	Should ties be kept together? The default, `TRUE`, may return more rows than you request. Use `FALSE` to ignore ties, and return the first `n` rows.
`seed`	A single value, interpreted as an integer, or NULL for random number generation.
`minsize`	Integer indicating the minimum number of targets per source.
`...`	Arguments passed on to `stats::fisher.test` `workspace` an integer specifying the size of the workspace used in the network algorithm. In units of 4 bytes. Only used for non-simulated p-values larger than $2 \times 2$ tables. Since R version 3.5.0, this also increases the internal stack size which allows larger problems to be solved, however sometimes needing hours. In such cases, `simulate.p.values=TRUE` may be more reasonable. `hybrid` a logical. Only used for larger than $2 \times 2$ tables, in which cases it indicates whether the exact probabilities (default) or a hybrid approximation thereof should be computed. `hybridPars` a numeric vector of length 3, by default describing “Cochran's conditions” for the validity of the chisquare approximation, see ‘Details’. `control` a list with named components for low level algorithm control. At present the only one used is `"mult"`, a positive integer $\ge 2$ with default 30 used only for larger than $2 \times 2$ tables. This says how many times as much space should be allocated to paths as to keys: see file ‘fexact.c’ in the sources of this package. `or` the hypothesized odds ratio. Only used in the $2 \times 2$ case. `alternative` indicates the alternative hypothesis and must be one of `"two.sided"`, `"greater"` or `"less"`. You can specify just the initial letter. Only used in the $2 \times 2$ case. `conf.int` logical indicating if a confidence interval for the odds ratio in a $2 \times 2$ table should be computed (and returned). `conf.level` confidence level for the returned confidence interval. Only used in the $2 \times 2$ case and if `conf.int = TRUE`. `simulate.p.value` a logical indicating whether to compute p-values by Monte Carlo simulation, in larger than $2 \times 2$ tables. `B` an integer specifying the number of replicates used in the Monte Carlo test.

Details

ORA measures the overlap between the target feature set and a list of most altered molecular features in mat. The most altered molecular features can be selected from the top and or bottom of the molecular readout distribution, by default it is the top 5% positive values. With these, a contingency table is build and a one-tailed Fisher’s exact test is computed to determine if a regulator’s set of features are over-represented in the selected features from the data. The resulting score, ora, is the minus log10 of the obtained p-value.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_ora(mat, net, minsize=0)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_ora(mat, net, minsize=0)

Univariate Decision Tree (UDT)

Description

Calculates regulatory activities by using UDT.

Usage

run_udt(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  sparse = FALSE,
  center = FALSE,
  na.rm = FALSE,
  min_n = 20,
  seed = 42,
  minsize = 5
)
run_udt(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  sparse = FALSE,
  center = FALSE,
  na.rm = FALSE,
  min_n = 20,
  seed = 42,
  minsize = 5
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`.mor`	Column with edge mode of regulation (i.e. mor).
`.likelihood`	Deprecated argument. Now it will always be set to 1.
`sparse`	Deprecated parameter.
`center`	Logical value indicating if `mat` must be centered by `base::rowMeans()`.
`na.rm`	Should missing values (including NaN) be omitted from the calculations of `base::rowMeans()`?
`min_n`	An integer for the minimum number of data points in a node that are required for the node to be split further.
`seed`	A single value, interpreted as an integer, or NULL for random number generation.
`minsize`	Integer indicating the minimum number of targets per source.

Details

UDT fits a single regression decision tree for each sample and regulator, where the observed molecular readouts in mat are the response variable and the regulator weights in net are the explanatory one. Target features with no associated weight are set to zero. The obtained feature importance from the fitted model is the activity udt of a given regulator.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_udt(mat, net, minsize=0)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_udt(mat, net, minsize=0)

Univariate Linear Model (ULM)

Description

Calculates regulatory activities using ULM.

Usage

run_ulm(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  sparse = FALSE,
  center = FALSE,
  na.rm = FALSE,
  minsize = 5L
)
run_ulm(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  sparse = FALSE,
  center = FALSE,
  na.rm = FALSE,
  minsize = 5L
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`.mor`	Column with edge mode of regulation (i.e. mor).
`.likelihood`	Deprecated argument. Now it will always be set to 1.
`sparse`	Deprecated parameter.
`center`	Logical value indicating if `mat` must be centered by `base::rowMeans()`.
`na.rm`	Should missing values (including NaN) be omitted from the calculations of `base::rowMeans()`?
`minsize`	Integer indicating the minimum number of targets per source.

Details

ULM fits a linear model for each sample and regulator, where the observed molecular readouts in mat are the response variable and the regulator weights in net are the explanatory one. Target features with no associated weight are set to zero. The obtained t-value from the fitted model is the activity ulm of a given regulator.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_ulm(mat, net, minsize=0)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_ulm(mat, net, minsize=0)

Virtual Inference of Protein-activity by Enriched Regulon analysis (VIPER)

Description

Calculates regulatory activities using VIPER.

Usage

run_viper(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  verbose = FALSE,
  minsize = 5,
  pleiotropy = TRUE,
  eset.filter = FALSE,
  ...
)
run_viper(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  verbose = FALSE,
  minsize = 5,
  pleiotropy = TRUE,
  eset.filter = FALSE,
  ...
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`.mor`	Column with edge mode of regulation (i.e. mor).
`.likelihood`	Deprecated argument. Now it will always be set to 1.
`verbose`	Logical, whether progression messages should be printed in the terminal.
`minsize`	Integer indicating the minimum number of targets per source.
`pleiotropy`	Logical, whether correction for pleiotropic regulation should be performed.
`eset.filter`	Logical, whether the dataset should be limited only to the genes represented in the interactome.
`...`	Arguments passed on to `viper::viper` `dnull` Numeric matrix for the null model, usually generated by `nullTtest` `nes` Logical, whether the enrichment score reported should be normalized `method` Character string indicating the method for computing the single samples signature, either scale, rank, mad, ttest or none `bootstraps` Integer indicating the number of bootstraps iterations to perform. Only the scale method is implemented with bootstraps. `adaptive.size` Logical, whether the weighting scores should be taken into account for computing the regulon size `pleiotropyArgs` list of 5 numbers for the pleotropy correction indicating: regulators p-value threshold, pleiotropic interaction p-value threshold, minimum number of targets in the overlap between pleiotropic regulators, penalty for the pleiotropic interactions and the method for computing the pleiotropy, either absolute or adaptive `cores` Integer indicating the number of cores to use (only 1 in Windows-based systems)

Details

VIPER (Alvarez et al., 2016) estimates biological activities by performing a three-tailed enrichment score calculation. For further information check the supplementary information of the decoupler manuscript or the original publication.

Alvarez M.J.et al. (2016) Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat. Genet., 48, 838–847.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_viper(mat, net, minsize=0, verbose = FALSE)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_viper(mat, net, minsize=0, verbose = FALSE)

Weighted Mean (WMEAN)

Description

Calculates regulatory activities using WMEAN.

Usage

run_wmean(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  times = 100,
  seed = 42,
  sparse = TRUE,
  randomize_type = "rows",
  minsize = 5
)
run_wmean(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  times = 100,
  seed = 42,
  sparse = TRUE,
  randomize_type = "rows",
  minsize = 5
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`.mor`	Column with edge mode of regulation (i.e. mor).
`.likelihood`	Deprecated argument. Now it will always be set to 1.
`times`	How many permutations to do?
`seed`	A single value, interpreted as an integer, or NULL for random number generation.
`sparse`	Should the matrices used for the calculation be sparse?
`randomize_type`	How to randomize the expression matrix.
`minsize`	Integer indicating the minimum number of targets per source.

Details

WMEAN infers regulator activities by first multiplying each target feature by its associated weight which then are summed to an enrichment score wmean. Furthermore, permutations of random target features can be performed to obtain a null distribution that can be used to compute a z-score norm_wmean, or a corrected estimate corr_wmean by multiplying wmean by the minus log10 of the obtained empirical p-value.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
p_value: p-value for the score of the method.

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_wmean(mat, net, minsize=0)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_wmean(mat, net, minsize=0)

Weighted Sum (WSUM)

Description

Calculates regulatory activities using WSUM.

Usage

run_wsum(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  times = 100,
  seed = 42,
  sparse = TRUE,
  randomize_type = "rows",
  minsize = 5
)
run_wsum(
  mat,
  network,
  .source = source,
  .target = target,
  .mor = mor,
  .likelihood = likelihood,
  times = 100,
  seed = 42,
  sparse = TRUE,
  randomize_type = "rows",
  minsize = 5
)

Arguments

`mat`	Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. `rownames(mat)` must have at least one intersection with the elements in `network` `.target` column.
`network`	Tibble or dataframe with edges and it's associated metadata.
`.source`	Column with source nodes.
`.target`	Column with target nodes.
`.mor`	Column with edge mode of regulation (i.e. mor).
`.likelihood`	Deprecated argument. Now it will always be set to 1.
`times`	How many permutations to do?
`seed`	A single value, interpreted as an integer, or NULL for random number generation.
`sparse`	Should the matrices used for the calculation be sparse?
`randomize_type`	How to randomize the expression matrix.
`minsize`	Integer indicating the minimum number of targets per source.

Details

WSUM infers regulator activities by first multiplying each target feature by its associated weight which then are summed to an enrichment score wsum. Furthermore, permutations of random target features can be performed to obtain a null distribution that can be used to compute a z-score norm_wsum, or a corrected estimate corr_wsum by multiplying wsum by the minus log10 of the obtained empirical p-value.

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
p_value: p-value for the score of the method.

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_wsum(mat, net, minsize=0)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_wsum(mat, net, minsize=0)

Show methods

Description

Prints the methods available in decoupleR. The first column correspond to the function name in decoupleR and the second to the method's full name.

Usage

show_methods()
show_methods()

Examples

show_methods()
show_methods()

Shows available resources in Omnipath. For more information visit the official website for Omnipath.

Description

Shows available resources in Omnipath. For more information visit the official website for Omnipath.

Usage

show_resources()
show_resources()

Examples

decoupleR::show_resources()
decoupleR::show_resources()

Package 'decoupleR'

Help Index

Pre-processing for methods that fit networks.

Description

Usage

Arguments

Value

Examples

Check correlation (colinearity)

Description

Usage

Arguments

Value

Examples

Rename columns and add defaults values if column not present

Description

Usage

Arguments

Details

Value

Examples

Evaluate multiple statistics with same input data

Description

Usage

Arguments

Value

See Also

Examples

Extract sets

Description

Usage

Arguments

Examples

Filter sources with minsize targets

Description

Usage

Arguments

Value

Examples

Description

Usage

Arguments

Examples

DoRothEA gene regulatory network.

Description

Usage

Arguments

Examples

OmniPath kinase-substrate network

Description

Usage

Arguments

Details

Pathway RespOnsive GENes for activity inference (PROGENy).

Description

Usage

Arguments

Examples

Wrapper to access resources inside Omnipath. This wrapper allows to easily query different prior knowledge resources. To check available resources run decoupleR::show_resources(). For more information visit the official website for Omnipath.

Description

Usage

Arguments

Examples

Generate a toy mat and network.

Description

Usage

Arguments

Value

Examples

Intersect network target features with input matrix.

Description

Usage

Arguments

Value

Examples

Rename network

Description

Usage

Arguments

Examples

Wrapper to access resources inside Omnipath. This wrapper allows to easily query different prior knowledge resources. To check available resources run `decoupleR::show_resources()`. For more information visit the official website for Omnipath.

Generate a toy `mat` and `network`.