Title: | decoupleR: Ensemble of computational methods to infer biological activities from omics data |
---|---|
Description: | Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability. Here, we present decoupleR, a Bioconductor package containing different statistical methods to extract these signatures within a unified framework. decoupleR allows the user to flexibly test any method with any resource. It incorporates methods that take into account the sign and weight of network interactions. decoupleR can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge. Examples include gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phospho-proteomics. |
Authors: | Pau Badia-i-Mompel [aut, cre] , Jesús Vélez-Santiago [aut] , Jana Braunger [aut] , Celina Geiss [aut] , Daniel Dimitrov [aut] , Sophia Müller-Dott [aut] , Petr Taus [aut] , Aurélien Dugourd [aut] , Christian H. Holland [aut] , Ricardo O. Ramirez Flores [aut] , Julio Saez-Rodriguez [aut] |
Maintainer: | Pau Badia-i-Mompel <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 2.13.0 |
Built: | 2024-11-14 05:51:09 UTC |
Source: | https://github.com/bioc/decoupleR |
If center is true, then the expression values are centered by the mean of expression across the samples.
.fit_preprocessing(network, mat, center, na.rm, sparse)
network |
Tibble or dataframe with edges and its associated metadata. |
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
center |
Logical value indicating if |
na.rm |
Should missing values (including NaN) be omitted from the
calculations of |
sparse |
Deprecated parameter. |
A named list of matrices to evaluate in methods that fit models, like .mlm_analysis().
mat: Features as rows and samples as columns.
mor_mat: Features as rows and sources as columns.
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
net <- rename_net(net, source, target, mor)
.fit_preprocessing(net, mat, center = FALSE, na.rm = FALSE, sparse = FALSE)
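As a rough illustration of the two matrices described above, the following base R sketch centers a toy matrix by each feature's mean across samples and builds a features-by-sources weight matrix. It is not the package's internal code: the toy data, the variable names and the choice to fill missing edges with zero are assumptions made for illustration.

```r
# Toy expression matrix: features (targets) in rows, samples in columns.
mat <- matrix(
    c(1, 3, 5,
      2, 2, 2,
      0, 4, 8),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3"))
)

# Toy network in long format with source, target and mor columns.
net <- data.frame(
    source = c("TF1", "TF1", "TF2"),
    target = c("g1", "g2", "g3"),
    mor = c(1, -1, 1),
    stringsAsFactors = FALSE
)

# Centering: subtract each feature's mean across samples (row means).
mat_centered <- mat - rowMeans(mat, na.rm = FALSE)

# Features-by-sources weight matrix; edges absent from the network stay at 0
# (an assumption of this sketch).
sources <- unique(net$source)
mor_mat <- matrix(
    0,
    nrow = nrow(mat), ncol = length(sources),
    dimnames = list(rownames(mat), sources)
)
mor_mat[cbind(net$target, net$source)] <- net$mor

inputs <- list(mat = mat_centered, mor_mat = mor_mat)
print(inputs)
```

Both matrices share the same row order, which is what allows a model to be fitted per sample with the columns of mor_mat as covariates.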
Checks the correlation across the regulators in a network.
check_corr( network, .source = "source", .target = "target", .mor = "mor", .likelihood = NULL )
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
.mor |
Column with edge mode of regulation (i.e. mor). |
.likelihood |
Deprecated argument. Now it will always be set to 1. |
Correlation pairs tibble.
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

net <- readRDS(file.path(inputs_dir, "net.rds"))
check_corr(net, .source='source')
convert_f_defaults() combines the renaming behavior of dplyr::rename() with tibble::add_column(), adding columns with default values when they do not exist after renaming the data.
convert_f_defaults(.data, ..., .def_col_val = c(), .use_dots = TRUE)
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
For For |
.def_col_val |
Named vector with columns with default values if none exist after rename. |
.use_dots |
Should a dot prefix be added to renamed variables? This will allow swapping of columns. |
The purpose of .use_dots is to allow swapping columns, which dplyr::rename() does not allow by default.
The same behavior can be replicated with dplyr::select(); however, select evaluation is much more flexible, so unexpected results could be obtained. Despite this, a future implementation will consider this form of execution to allow renaming the same column to multiple ones (i.e. extend dataframe extension).
An object of the same type as .data. The output has the following properties:
Rows are not affected.
Column names are changed.
Column order is the same as that of the function call.
df <- tibble::tibble(x = 1, y = 2, z = 3)

# Rename columns
df <- tibble::tibble(x = 1, y = 2)
convert_f_defaults(
    .data = df,
    new_x = x,
    new_y = y,
    new_z = NULL,
    .def_col_val = c(new_z = 3)
)
Calculate the source activity per sample out of a gene expression matrix by coupling a regulatory network with a variety of statistics.
decouple( mat, network, .source = source, .target = target, statistics = NULL, args = list(NULL), consensus_score = TRUE, consensus_stats = NULL, include_time = FALSE, show_toy_call = FALSE, minsize = 5 )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
statistics |
Statistical methods to be run sequentially. If none are provided, only top performer methods are run (mlm, ulm and wsum). |
args |
A list of argument-lists the same length as |
consensus_score |
Boolean whether to run a consensus score between methods. |
consensus_stats |
List of estimate names to use for the calculation of the consensus score. This is used to filter out extra estimations from some methods, for example wsum returns wsum, corr_wsum and norm_wsum. If none are provided, and no statistics were provided either, only top performer methods are used (mlm, ulm and norm_wsum). Otherwise, all available estimates obtained after running the methods in the statistics argument are used. |
include_time |
Should the execution time of each evaluated statistic be reported? |
show_toy_call |
Should the call of each statistic be shown? |
minsize |
Integer indicating the minimum number of targets per source. |
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
run_id: Indicates the order in which the methods have been executed.
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
statistic_time: If requested, internal execution time indicator.
p_value: p-value (if available) of the obtained score.
Other decoupleR statistics: run_aucell(), run_fgsea(), run_gsva(), run_mdt(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wmean(), run_wsum()
if (FALSE) {
    inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

    mat <- readRDS(file.path(inputs_dir, "mat.rds"))
    net <- readRDS(file.path(inputs_dir, "net.rds"))

    decouple(
        mat = mat,
        network = net,
        .source = "source",
        .target = "target",
        statistics = c("gsva", "wmean", "wsum", "ulm", "aucell"),
        args = list(
            gsva = list(verbose = FALSE),
            wmean = list(.mor = "mor", .likelihood = "likelihood"),
            wsum = list(.mor = "mor"),
            ulm = list(.mor = "mor")
        ),
        minsize = 0
    )
}
Extracts feature sets from a renamed network (see rename_net).
extract_sets(network)
network |
Tibble or dataframe with edges and its associated metadata. |
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
net <- rename_net(net, source, target, mor)
extract_sets(net)
Filters out sources of a network that have fewer than minsize targets.
filt_minsize(mat_f_names, network, minsize = 5)
mat_f_names |
Feature names of mat. |
network |
Tibble or dataframe with edges and its associated metadata. |
minsize |
Integer indicating the minimum number of targets per source. |
Filtered network.
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
net <- rename_net(net, source, target, mor)
filt_minsize(rownames(mat), net, minsize = 4)
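The filtering idea can be sketched in base R without loading the package. The helper below, filter_by_minsize(), is hypothetical and is not the package function: it assumes that targets are counted only among the measured features and that all edges of a retained source are returned, which may differ from what filt_minsize() actually does.

```r
# Hypothetical helper (not part of decoupleR): keep sources that have at
# least `minsize` targets among the measured features.
filter_by_minsize <- function(feature_names, network, minsize = 5) {
    # Edges whose target is present in the matrix.
    measured <- network[network$target %in% feature_names, , drop = FALSE]
    # Number of measured targets per source.
    n_targets <- table(measured$source)
    keep <- names(n_targets)[as.vector(n_targets) >= minsize]
    # Assumption: every edge of a retained source is returned.
    network[network$source %in% keep, , drop = FALSE]
}

net <- data.frame(
    source = c("TF1", "TF1", "TF1", "TF2", "TF2"),
    target = c("g1", "g2", "g3", "g1", "g9"),
    mor = c(1, 1, -1, 1, -1),
    stringsAsFactors = FALSE
)

# TF2 has a single measured target (g1), so it is dropped with minsize = 2.
filtered <- filter_by_minsize(c("g1", "g2", "g3"), net, minsize = 2)
print(filtered)
```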
CollecTRI gene regulatory network. Wrapper to access CollecTRI gene regulatory network. CollecTRI is a comprehensive resource containing a curated collection of transcription factors (TFs) and their target genes. It is an expansion of DoRothEA. Each interaction is weighted by its mode of regulation (either positive or negative).
get_collectri(organism = "human", split_complexes = FALSE, ...)
organism |
Which organism to use. Only human, mouse and rat are available. |
split_complexes |
Whether to split complexes into subunits. By default complexes are kept as they are. |
... |
Ignored. |
collectri <- get_collectri(organism='human', split_complexes=FALSE)
Wrapper to access DoRothEA gene regulatory network. DoRothEA is a comprehensive resource containing a curated collection of transcription factors (TFs) and their target genes. Each interaction is weighted by its mode of regulation (either positive or negative) and by its confidence level.
get_dorothea( organism = "human", levels = c("A", "B", "C"), weight_dict = list(A = 1, B = 2, C = 3, D = 4) )
organism |
Which organism to use. Only human, mouse and rat are available. |
levels |
List of confidence levels to return. Goes from A to D, A being the most confident and D being the least. |
weight_dict |
Dictionary of values to divide the mode of regulation (-1 or 1), one for each confidence level. Bigger values will generate weights close to zero. |
dorothea <- get_dorothea(organism='human', levels=c('A', 'B'))
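To make the role of weight_dict concrete, here is a minimal base R sketch of dividing the mode of regulation by a per-level value, as the argument description states. The toy edge table and its column names (confidence, mor, weight) are assumptions for illustration and are not taken from the package output.

```r
# Default dictionary from the usage above: one divisor per confidence level.
weight_dict <- list(A = 1, B = 2, C = 3, D = 4)

# Toy edges with a confidence level and a signed mode of regulation.
edges <- data.frame(
    source = c("TF1", "TF1", "TF2"),
    target = c("g1", "g2", "g3"),
    confidence = c("A", "B", "D"),
    mor = c(1, -1, 1),
    stringsAsFactors = FALSE
)

# Look up the divisor of each edge and divide the mode of regulation by it.
divisors <- unname(unlist(weight_dict[edges$confidence]))
edges$weight <- edges$mor / divisors
print(edges)
```

Lower-confidence edges receive larger divisors, so their weights move toward zero while keeping the sign of the interaction.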
Retrieves a ready-to-use, curated kinase-substrate network from the OmniPath database.
get_ksn_omnipath(...)
... |
Passed to |
Imports the enzyme-PTM network from OmniPath, keeps only phosphorylation and dephosphorylation interactions, and formats the columns for use with decoupleR functions.
Wrapper to access PROGENy model gene weights. Each pathway is defined with a collection of target genes, each interaction has an associated p-value and weight. The top significant interactions per pathway are returned.
get_progeny(organism = "human", top = 500)
organism |
Which organism to use. Only human and mouse are available. |
top |
Number of genes per pathway to return. |
progeny <- get_progeny(organism='human', top=500)
Wrapper to access resources inside Omnipath.
This wrapper allows easy querying of different prior knowledge resources.
To check available resources, run decoupleR::show_resources(). For more information, visit the official website for Omnipath.
get_resource(name, organism = "human", ...)
name |
Name of the resource to query. |
organism |
Organism name or NCBI Taxonomy ID. |
... |
Passed to |
df <- decoupleR::get_resource('SIGNOR')
Generate a toy mat and network.
get_toy_data(n_samples = 24, seed = 42)
n_samples |
Number of samples to simulate. |
seed |
A single value, interpreted as an integer, or NULL for random number generation. |
List containing mat and network.
data <- get_toy_data()
mat <- data$mat
network <- data$network
Keeps only the edges whose target features are present in the input matrix.
intersect_regulons(mat, network, .source, .target, minsize)
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
minsize |
Minimum number of targets per source allowed. |
Filtered tibble.
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
intersect_regulons(mat, net, source, target, minsize=4)
Renames a given network to these column names: .source, .target, .mor. If .mor is not provided, the function sets it to a default value.
rename_net( network, .source, .target, .mor = NULL, .likelihood = NULL, def_mor = 1 )
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
.mor |
Column with edge mode of regulation (i.e. mor). |
.likelihood |
Deprecated argument. Now it will always be set to 1. |
def_mor |
Default value for .mor when not provided. |
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
rename_net(net, source, target, mor)
Calculates regulatory activities using AUCell.
run_aucell( mat, network, .source = source, .target = target, aucMaxRank = ceiling(0.05 * nrow(rankings)), nproc = availableCores(), seed = 42, minsize = 5 )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
aucMaxRank |
Threshold to calculate the AUC. |
nproc |
Number of cores to use for computation. |
seed |
A single value, interpreted as an integer, or NULL for random number generation. |
minsize |
Integer indicating the minimum number of targets per source. |
AUCell (Aibar et al., 2017) uses the Area Under the Curve (AUC) to calculate
whether a set of targets is enriched within the molecular readouts of each
sample. To do so, AUCell first ranks the molecular features of each sample
from highest to lowest value, resolving ties randomly. Then, an AUC is
calculated using, by default, the top 5% of molecular features in the ranking.
Therefore, this metric, aucell, represents the proportion of
abundant molecular features in the target set, and their relative abundance
value compared to the other features within the sample.
Aibar S. et al. (2017) SCENIC: single-cell regulatory network inference and clustering. Nat. Methods, 14, 1083–1086.
Other decoupleR statistics: decouple(), run_fgsea(), run_gsva(), run_mdt(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wmean(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_aucell(mat, net, minsize=0, nproc=1, aucMaxRank=3)
Function to generate a consensus score between methods from the result of the decouple function.
run_consensus(df, include_time = FALSE, seed = NULL)
df |
|
include_time |
Should the execution time of each evaluated statistic be reported? |
seed |
Deprecated parameter. |
Updated tibble with the computed consensus score between methods
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

results <- decouple(
    mat = mat,
    network = net,
    .source = "source",
    .target = "target",
    statistics = c("wmean", "ulm"),
    args = list(
        wmean = list(.mor = "mor", .likelihood = "likelihood"),
        ulm = list(.mor = "mor", .likelihood = "likelihood")
    ),
    consensus_score = FALSE,
    minsize = 0
)
run_consensus(results)
Calculates regulatory activities using FGSEA.
run_fgsea( mat, network, .source = source, .target = target, times = 100, nproc = availableCores(), seed = 42, minsize = 5, ... )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
times |
How many permutations to do? |
nproc |
Number of cores to use for computation. |
seed |
A single value, interpreted as an integer, or NULL. |
minsize |
Integer indicating the minimum number of targets per source. |
... |
Arguments passed on to
|
GSEA (Subramanian et al., 2005) starts by transforming the input molecular
readouts in mat to ranks for each sample. Then, an enrichment score
fgsea is calculated by walking down the list of features, increasing
a running-sum statistic when a feature in the target feature set is
encountered and decreasing it when it is not. The final score is the maximum
deviation from zero encountered in the random walk. Finally, a normalized
score, norm_fgsea, can be obtained by computing the z-score of the estimate
compared to a null distribution obtained from N random permutations. The
implementation used is taken from the package fgsea (Korotkevich et al., 2021).
Subramanian A. et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 102, 43.
Korotkevich G. et al. (2021) Fast gene set enrichment analysis. bioRxiv. DOI: https://doi.org/10.1101/060012.
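The running-sum statistic described above can be sketched in a few lines of base R. This is a simplified, unweighted version written for illustration only: the function name running_sum_es() is hypothetical, each hit and miss contributes a constant step, and it does not reproduce the weighting or permutation scheme used by the fgsea package.

```r
# Hypothetical helper: unweighted running-sum enrichment score for one sample.
running_sum_es <- function(stats, target_set) {
    # Rank features from highest to lowest value.
    ranked <- sort(stats, decreasing = TRUE)
    hits <- names(ranked) %in% target_set
    n_hit <- sum(hits)
    n_miss <- sum(!hits)
    stopifnot(n_hit > 0, n_miss > 0)
    # Increase the running sum on a hit, decrease it on a miss.
    steps <- ifelse(hits, 1 / n_hit, -1 / n_miss)
    running <- cumsum(steps)
    # The score is the maximum deviation from zero, keeping its sign.
    running[which.max(abs(running))]
}

stats <- c(g1 = 5, g2 = 4, g3 = 3, g4 = 2, g5 = 1, g6 = 0)
es_top <- running_sum_es(stats, c("g1", "g2"))
es_bottom <- running_sum_es(stats, c("g5", "g6"))
print(c(top = unname(es_top), bottom = unname(es_bottom)))
```

A target set concentrated at the top of the ranking yields a positive score, and one concentrated at the bottom yields a negative score.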
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
Other decoupleR statistics: decouple(), run_aucell(), run_gsva(), run_mdt(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wmean(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_fgsea(mat, net, minsize=0, nproc=1)
Calculates regulatory activities using GSVA.
run_gsva( mat, network, .source = source, .target = target, verbose = FALSE, method = c("gsva", "plage", "ssgsea", "zscore"), minsize = 5L, maxsize = Inf, ... )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
verbose |
Gives information about each calculation step. Default: FALSE. |
method |
Method to employ in the estimation of gene-set enrichment scores per sample.
By default this is set to gsva (Hänzelmann et al., 2013).
Further available methods are "plage", "ssgsea" and "zscore". Read more in
the manual of |
minsize |
Integer indicating the minimum number of targets per source. Must be greater than 0. |
maxsize |
Integer indicating the maximum number of targets per source. |
... |
Arguments passed on to
|
GSVA (Hänzelmann et al., 2013) starts by transforming the input molecular
readouts in mat to a readout-level statistic using Gaussian kernel estimation
of the cumulative density function. Then, readout-level statistics are
ranked per sample and normalized to up-weight the two tails of the rank
distribution. Afterwards, an enrichment score gsva is calculated
using a running sum statistic that is normalized by subtracting the largest
negative estimate from the largest positive one.
Hänzelmann S. et al. (2013) GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 14, 7.
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_mdt(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wmean(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_gsva(mat, net, minsize=1, verbose = FALSE)
Calculates regulatory activities using MDT.
run_mdt( mat, network, .source = source, .target = target, .mor = mor, .likelihood = likelihood, sparse = FALSE, center = FALSE, na.rm = FALSE, trees = 10, min_n = 20, nproc = availableCores(), seed = 42, minsize = 5 )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
.mor |
Column with edge mode of regulation (i.e. mor). |
.likelihood |
Deprecated argument. Now it will always be set to 1. |
sparse |
Deprecated parameter. |
center |
Logical value indicating if |
na.rm |
Should missing values (including NaN) be omitted from the
calculations of |
trees |
An integer for the number of trees contained in the ensemble. |
min_n |
An integer for the minimum number of data points in a node that are required for the node to be split further. |
nproc |
Number of cores to use for computation. |
seed |
A single value, interpreted as an integer, or NULL for random number generation. |
minsize |
Integer indicating the minimum number of targets per source. |
MDT fits a multivariate regression random forest for each sample, where the
observed molecular readouts in mat are the response variable and the
regulator weights in net are the covariates. Target features with no
associated weight are set to zero. The obtained feature importances from the
fitted model are the activities mdt of the regulators in net.
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_gsva(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wmean(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_mdt(mat, net, minsize=0)
Calculates regulatory activities using MLM.
run_mlm( mat, network, .source = source, .target = target, .mor = mor, .likelihood = likelihood, sparse = FALSE, center = FALSE, na.rm = FALSE, minsize = 5 )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
.mor |
Column with edge mode of regulation (i.e. mor). |
.likelihood |
Deprecated argument. Now it will always be set to 1. |
sparse |
Deprecated parameter. |
center |
Logical value indicating if |
na.rm |
Should missing values (including NaN) be omitted from the
calculations of |
minsize |
Integer indicating the minimum number of targets per source. |
MLM fits a multivariate linear model for each sample, where the observed
molecular readouts in mat are the response variable and the regulator weights
in net are the covariates. Target features with no associated weight are set
to zero. The obtained t-values from the fitted model are the activities
(mlm) of the regulators in net.
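For a single sample, the idea can be sketched with stats::lm() in base R. The simulated data, the variable names and the inclusion of an intercept are assumptions of this sketch; it illustrates the description above and is not the package's internal implementation.

```r
set.seed(42)

# Features-by-sources weight matrix; features without an edge have weight 0.
mor_mat <- cbind(
    TF1 = rep(c(1, 0, 0), times = 10),
    TF2 = rep(c(0, 1, 0), times = 10)
)
rownames(mor_mat) <- paste0("g", seq_len(nrow(mor_mat)))

# Simulated readouts of one sample: TF1 targets are up, TF2 targets are down.
sample_values <- 2 * mor_mat[, "TF1"] - 1 * mor_mat[, "TF2"] +
    rnorm(nrow(mor_mat), sd = 0.1)

# Readouts are the response, regulator weights are the covariates.
fit <- lm(y ~ ., data = data.frame(y = sample_values, mor_mat))

# The t-value of each covariate is taken as the regulator activity
# (the intercept row is dropped).
t_values <- summary(fit)$coefficients[-1, "t value"]
print(t_values)
```

Repeating the fit for every column of mat would give one activity per source and condition.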
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_gsva(), run_mdt(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wmean(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_mlm(mat, net, minsize=0)
Calculates regulatory activities using ORA.
run_ora( mat, network, .source = source, .target = target, n_up = ceiling(0.05 * nrow(mat)), n_bottom = 0, n_background = 20000, with_ties = TRUE, seed = 42, minsize = 5, ... )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or dataframe with edges and its associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
n_up |
Integer indicating the number of top targets to slice from mat. |
n_bottom |
Integer indicating the number of bottom targets to slice from mat. |
n_background |
Integer indicating the background size of the sliced
targets. If not specified the number of background targets is determined by
the total number of unique targets in the union of |
with_ties |
Should ties be kept together? The default, |
seed |
A single value, interpreted as an integer, or NULL for random number generation. |
minsize |
Integer indicating the minimum number of targets per source. |
... |
Arguments passed on to
|
ORA measures the overlap between the target feature set and a list of the most
altered molecular features in mat. The most altered molecular features can
be selected from the top and/or bottom of the molecular readout distribution;
by default, the top 5% positive values are used. With these, a contingency table
is built and a one-tailed Fisher’s exact test is computed to determine if a
regulator’s set of features is over-represented in the selected features
from the data. The resulting score, ora, is the minus log10 of the
obtained p-value.
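The contingency table and test described above can be sketched with stats::fisher.test() in base R. The helper ora_score() is hypothetical, and the way the background size enters the table is an assumption of this sketch rather than the package's exact bookkeeping.

```r
# Hypothetical helper: ORA score of one target set in one sample.
ora_score <- function(stats, target_set, n_up, n_background) {
    # Select the top n_up features of the readout distribution.
    top <- names(sort(stats, decreasing = TRUE))[seq_len(n_up)]
    in_both <- sum(top %in% target_set)
    top_only <- length(top) - in_both
    set_only <- length(target_set) - in_both
    neither <- n_background - in_both - top_only - set_only
    # Rows: in set / not in set. Columns: selected / not selected.
    contingency <- matrix(
        c(in_both, top_only, set_only, neither),
        nrow = 2
    )
    # One-tailed test for over-representation.
    p_value <- fisher.test(contingency, alternative = "greater")$p.value
    -log10(p_value)
}

stats <- setNames(20:1, paste0("g", 1:20))
enriched <- ora_score(stats, paste0("g", 1:5), n_up = 5, n_background = 20)
depleted <- ora_score(stats, paste0("g", 16:20), n_up = 5, n_background = 20)
print(c(enriched = enriched, depleted = depleted))
```

A set that coincides with the selected features receives a high score, while a set with no overlap has a p-value of 1 and therefore a score of 0.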
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_gsva(), run_mdt(), run_mlm(), run_udt(), run_ulm(), run_viper(), run_wmean(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR") mat <- readRDS(file.path(inputs_dir, "mat.rds")) net <- readRDS(file.path(inputs_dir, "net.rds")) run_ora(mat, net, minsize=0)
Calculates regulatory activities using a Univariate Decision Tree (UDT).
run_udt( mat, network, .source = source, .target = target, .mor = mor, .likelihood = likelihood, sparse = FALSE, center = FALSE, na.rm = FALSE, min_n = 20, seed = 42, minsize = 5 )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or data frame with edges and their associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
.mor |
Column with edge mode of regulation (i.e. mor). |
.likelihood |
Deprecated argument. Now it will always be set to 1. |
sparse |
Deprecated parameter. |
center |
Logical value indicating if |
na.rm |
Should missing values (including NaN) be omitted from the
calculations of |
min_n |
An integer for the minimum number of data points in a node that are required for the node to be split further. |
seed |
A single value, interpreted as an integer, or NULL for random number generation. |
minsize |
Integer indicating the minimum number of targets per source. |
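To make the expected input shapes concrete, here is a small base R sketch of a toy mat and network, together with one plausible reading of minsize. The data are invented, and the assumption that only targets present in mat count towards minsize is this sketch's interpretation, not something stated above.

```r
# Toy network in long format: one row per source-target edge (hypothetical).
network <- data.frame(
    source = c("TF1", "TF1", "TF1", "TF2", "TF2"),
    target = c("g1", "g2", "g3", "g1", "g9"),
    mor = c(1, -1, 1, 1, 1)
)

# Toy matrix: target nodes in rows, conditions in columns.
mat <- matrix(
    c(2.0, -1.5, 0.5, 0.1, 1.0, -0.5, 0.2, 0.3),
    nrow = 4,
    dimnames = list(c("g1", "g2", "g3", "g4"), c("S1", "S2"))
)

# Keep only edges whose target is measured, then count targets per source.
measured <- network[network$target %in% rownames(mat), ]
n_targets <- table(measured$source)

# Drop sources with fewer than `minsize` measured targets (assumed behaviour).
minsize <- 2
kept <- measured[measured$source %in% names(n_targets)[n_targets >= minsize], ]
```

In this toy case TF2 is dropped because only one of its two targets appears in mat.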
UDT fits a single regression decision tree for each sample and regulator,
where the observed molecular readouts in mat are the response variable and
the regulator weights in network are the explanatory variable. Target features
with no associated weight are set to zero. The feature importance obtained
from the fitted model is the activity, udt, of a given regulator.
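The idea can be sketched for one sample and one regulator with the rpart package. Treat this as an assumption-laden illustration: rpart is chosen here only because it ships with standard R installations, the toy data are simulated, and mapping min_n onto `minsplit` is this sketch's interpretation rather than a statement about the package internals.

```r
library(rpart)

set.seed(42)

# Toy data for one sample and one regulator: 100 features, 30 of which are
# targets (weight 1); features without an associated weight are set to 0.
weights <- c(rep(1, 30), rep(0, 70))
readout <- 2 * weights + rnorm(100, sd = 0.5)
dat <- data.frame(readout = readout, weights = weights)

# Regression tree: the readout is the response, the weight the only predictor.
# minsplit plays the role of min_n in this sketch (an assumption).
fit <- rpart(
    readout ~ weights,
    data = dat,
    method = "anova",
    control = rpart.control(minsplit = 20)
)

# Feature importance of the single predictor; 0 if the tree never split.
importance <- fit$variable.importance
udt <- if (is.null(importance)) 0 else unname(importance["weights"])
```

Because importance is non-negative, a score of this kind reflects how strongly the weights explain the readout but not the direction of the effect.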
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_gsva(), run_mdt(), run_mlm(), run_ora(), run_ulm(), run_viper(), run_wmean(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
run_udt(mat, net, minsize = 0)
Calculates regulatory activities using a Univariate Linear Model (ULM).
run_ulm( mat, network, .source = source, .target = target, .mor = mor, .likelihood = likelihood, sparse = FALSE, center = FALSE, na.rm = FALSE, minsize = 5L )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or data frame with edges and their associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
.mor |
Column with edge mode of regulation (i.e. mor). |
.likelihood |
Deprecated argument. Now it will always be set to 1. |
sparse |
Deprecated parameter. |
center |
Logical value indicating if |
na.rm |
Should missing values (including NaN) be omitted from the
calculations of |
minsize |
Integer indicating the minimum number of targets per source. |
ULM fits a linear model for each sample and regulator, where the observed
molecular readouts in mat are the response variable and the regulator weights
in network are the explanatory variable. Target features with no associated
weight are set to zero. The t-value obtained from the fitted model is the
activity, ulm, of a given regulator.
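For a single sample and regulator, the model described above amounts to an ordinary least squares fit whose slope t-value is the score. The base R sketch below uses simulated data and hypothetical variable names; it shows the statistic itself, not the package's implementation.

```r
set.seed(1)

# Toy data for one sample and one regulator: 50 features. The first 10 are
# targets with weights +1 or -1; features without a weight are set to 0.
weights <- c(rep(1, 6), rep(-1, 4), rep(0, 40))
readout <- 1.5 * weights + rnorm(50)

# Linear model with the readout as response and the weights as predictor.
fit <- lm(readout ~ weights)

# The t-value of the slope is the activity score for this regulator.
coefs <- summary(fit)$coefficients
ulm <- coefs["weights", "t value"]
```

A positive t-value indicates that readouts follow the sign of the weights, whereas a negative one indicates the opposite pattern.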
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_gsva(), run_mdt(), run_mlm(), run_ora(), run_udt(), run_viper(), run_wmean(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
run_ulm(mat, net, minsize = 0)
Calculates regulatory activities using Virtual Inference of Protein-activity by Enriched Regulon analysis (VIPER).
run_viper( mat, network, .source = source, .target = target, .mor = mor, .likelihood = likelihood, verbose = FALSE, minsize = 5, pleiotropy = TRUE, eset.filter = FALSE, ... )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or data frame with edges and their associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
.mor |
Column with edge mode of regulation (i.e. mor). |
.likelihood |
Deprecated argument. Now it will always be set to 1. |
verbose |
Logical, whether progression messages should be printed in the terminal. |
minsize |
Integer indicating the minimum number of targets per source. |
pleiotropy |
Logical, whether correction for pleiotropic regulation should be performed. |
eset.filter |
Logical, whether the dataset should be limited only to the genes represented in the interactome. |
... |
Arguments passed on to
|
VIPER (Alvarez et al., 2016) estimates biological activities by performing a three-tailed enrichment score calculation. For further information, check the supplementary information of the decoupleR manuscript or the original publication.
Alvarez M.J. et al. (2016) Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat. Genet., 48, 838–847.
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_gsva(), run_mdt(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_wmean(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
run_viper(mat, net, minsize = 0, verbose = FALSE)
Calculates regulatory activities using Weighted Mean (WMEAN).
run_wmean( mat, network, .source = source, .target = target, .mor = mor, .likelihood = likelihood, times = 100, seed = 42, sparse = TRUE, randomize_type = "rows", minsize = 5 )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or data frame with edges and their associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
.mor |
Column with edge mode of regulation (i.e. mor). |
.likelihood |
Deprecated argument. Now it will always be set to 1. |
times |
How many permutations to do? |
seed |
A single value, interpreted as an integer, or NULL for random number generation. |
sparse |
Should the matrices used for the calculation be sparse? |
randomize_type |
How to randomize the expression matrix. |
minsize |
Integer indicating the minimum number of targets per source. |
WMEAN infers regulator activities by first multiplying each target feature by
its associated weight; the products are then summed to an enrichment score,
wmean. Furthermore, permutations of random target features can be performed
to obtain a null distribution, which can be used to compute a z-score,
norm_wmean, or a corrected estimate, corr_wmean, obtained by multiplying
wmean by the minus log10 of the empirical p-value.
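A minimal base R sketch of the weighted mean and its permutation-based z-score for one sample and one regulator follows. Several points are assumptions of the sketch rather than facts from the text above: dividing the weighted sum by the sum of absolute weights, drawing random features with `sample()` to build the null, and all toy data and variable names.

```r
set.seed(42)

# Toy readout for one sample; the ten hypothetical targets are shifted upward.
readout <- setNames(rnorm(200), paste0("g", 1:200))
readout[1:10] <- readout[1:10] + 3
weights <- setNames(c(rep(1, 5), rep(0.5, 5)), paste0("g", 1:10))

# Weighted mean: weighted sum divided by the sum of absolute weights
# (this normalisation is an assumption of the sketch).
weighted_mean <- function(x, w) sum(x * w) / sum(abs(w))
wmean <- weighted_mean(readout[names(weights)], weights)

# Null distribution: the same weights applied to randomly drawn features.
times <- 100
null <- replicate(
    times,
    weighted_mean(sample(readout, length(weights)), weights)
)

# z-score of the observed value against the null distribution.
norm_wmean <- (wmean - mean(null)) / sd(null)
```

Increasing `times` gives a smoother null distribution at the cost of run time, and fixing the seed keeps the permutations reproducible.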
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
p_value: p-value for the score of the method.
Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_gsva(), run_mdt(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wsum()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
run_wmean(mat, net, minsize = 0)
Calculates regulatory activities using Weighted Sum (WSUM).
run_wsum( mat, network, .source = source, .target = target, .mor = mor, .likelihood = likelihood, times = 100, seed = 42, sparse = TRUE, randomize_type = "rows", minsize = 5 )
mat |
Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
|
network |
Tibble or data frame with edges and their associated metadata. |
.source |
Column with source nodes. |
.target |
Column with target nodes. |
.mor |
Column with edge mode of regulation (i.e. mor). |
.likelihood |
Deprecated argument. Now it will always be set to 1. |
times |
How many permutations to do? |
seed |
A single value, interpreted as an integer, or NULL for random number generation. |
sparse |
Should the matrices used for the calculation be sparse? |
randomize_type |
How to randomize the expression matrix. |
minsize |
Integer indicating the minimum number of targets per source. |
WSUM infers regulator activities by first multiplying each target feature by
its associated weight; the products are then summed to an enrichment score,
wsum. Furthermore, permutations of random target features can be performed
to obtain a null distribution, which can be used to compute a z-score,
norm_wsum, or a corrected estimate, corr_wsum, obtained by multiplying
wsum by the minus log10 of the empirical p-value.
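The empirical p-value and the corrected estimate can be illustrated for one sample and one regulator in base R. The toy data are simulated, and the two-sided p-value with an add-one adjustment is a common convention adopted here as an assumption; the package may compute it differently.

```r
set.seed(42)

# Toy readout for one sample. The hypothetical regulator activates seven
# targets (shifted up) and represses three (shifted down).
readout <- setNames(rnorm(200), paste0("g", 1:200))
readout[1:7] <- readout[1:7] + 3
readout[8:10] <- readout[8:10] - 3
weights <- setNames(c(rep(1, 7), rep(-1, 3)), paste0("g", 1:10))

# Weighted sum of the target readouts.
wsum <- sum(readout[names(weights)] * weights)

# Null distribution: the same weights applied to randomly drawn features.
times <- 100
null <- replicate(times, sum(sample(readout, length(weights)) * weights))

# Two-sided empirical p-value with an add-one adjustment so it is never 0
# (a convention assumed by this sketch).
p_value <- (sum(abs(null) >= abs(wsum)) + 1) / (times + 1)

# Corrected estimate: the score scaled by the minus log10 of the p-value.
corr_wsum <- wsum * -log10(p_value)
```

Because the smallest attainable p-value is bounded by the number of permutations, the scaling factor applied to wsum is capped for a given `times`.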
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
statistic: Indicates which method is associated with which score.
source: Source nodes of network.
condition: Condition representing each column of mat.
score: Regulatory activity (enrichment score).
p_value: p-value for the score of the method.
Other decoupleR statistics: decouple(), run_aucell(), run_fgsea(), run_gsva(), run_mdt(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wmean()
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
run_wsum(mat, net, minsize = 0)
Prints the methods available in decoupleR. The first column corresponds to the function name in decoupleR and the second to the method's full name.
show_methods()
show_methods()
Shows available resources in Omnipath. For more information visit the official website for Omnipath.
show_resources()
decoupleR::show_resources()