| Title: | LimROTS: A Hybrid Method Integrating Empirical Bayes and Reproducibility-Optimized Statistics for Robust Differential Expression Analysis |
|---|---|
| Description: | Differential expression analysis is commonly used to study diverse biological datasets. The reproducibility-optimized test statistic (ROTS) (Elo et al., 2008, <doi:10.1109/tcbb.2007.1078>) uses a modified t-statistic to prioritise features that differ between two or more groups. However, the ROTS Bioconductor implementation (Suomi et al., 2017, <doi:10.1371/journal.pcbi.1005562>) did not accommodate technical or biological covariates. LimROTS (Anwar et al., 2025, <doi:10.1093/bioinformatics/btaf570>) addressed this limitation by combining a reproducibility-optimized test statistic with the limma empirical Bayes approach (Ritchie et al., 2015, <doi:10.1093/nar/gkv007>). This enables the analysis of more complex experimental designs and the incorporation of covariates. |
| Authors: | Ali Mostafa Anwar [aut, cre] (ORCID: <https://orcid.org/0000-0002-5201-387X>), Leo Lahti [aut, ths] (ORCID: <https://orcid.org/0000-0001-5537-637X>), Akewak Jeba [aut, ctb] (ORCID: <https://orcid.org/0009-0007-1347-7552>), Eleanor Coffey [aut, ths] (ORCID: <https://orcid.org/0000-0002-9717-5610>), Rasmus Hindström [ctb] (ORCID: <https://orcid.org/0009-0004-5731-178X>) |
| Maintainer: | Ali Mostafa Anwar <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.5.0 |
| Built: | 2026-05-30 09:39:30 UTC |
| Source: | https://github.com/bioc/LimROTS |
Parallel processing handling function
Boot_parallel( BPPARAM = NULL, samples, data, formula.str, group, groups, meta.info, a1, a2, pSamples, correlation_block = NULL )
BPPARAM |
A parallel BPPARAM object for distributed computation. |
samples |
bootstrapped samples matrix |
data |
A |
formula.str |
A formula string used when covariates are present in meta.info for modeling. It should include "~ 0 + ..." to exclude the intercept from the model. |
group |
A string specifying the column in |
groups |
groups information from |
meta.info |
A data frame containing sample-level metadata, where each
row corresponds to a sample. It should include the grouping variable
specified in |
a1 |
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs. |
a2 |
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs. |
pSamples |
A permuted list of samples |
correlation_block |
Character or NULL. The name of a column in
|
A list containing D and S for the bootstrapped data, and pD and pS
for the permuted data.
Parallel processing handling function for LimROTS survival
Boot_parallel_survival( BPPARAM = NULL, samples, data, formula.str, meta.info, a1, a2, pSamples, competing_risks )
BPPARAM |
A parallel BPPARAM object for distributed computation. |
samples |
bootstrapped samples matrix |
data |
A |
formula.str |
A formula string used when covariates are present in meta.info for modeling. It should include "~ 0 + ..." to exclude the intercept from the model. |
meta.info |
A data frame containing sample-level metadata, where each
row corresponds to a sample. It should include the grouping variable
specified in |
a1 |
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs. |
a2 |
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs. |
pSamples |
A permuted list of samples |
competing_risks |
Logical. If |
A list containing D and S for the bootstrapped data, and pD and pS
for the permuted data.
Perform Per-Feature Survival Modeling on Bootstrap Resamples
bootstrap_survival(x, meta.info, formula.str, competing_risks)
x |
A list of data matrices (bootstrap resample), where each element corresponds to a resampled group. Rows represent features (e.g., proteins, metabolites) and columns represent samples. |
meta.info |
A data frame containing the metadata for the samples,
including |
formula.str |
A string specifying the formula to be used in model
fitting. Must include a |
competing_risks |
Logical. If |
For each feature (row), the function constructs a temporary data frame
combining the bootstrap-resampled expression values (y) with the
sample metadata. It then fits either a Cox proportional hazards model
(competing_risks = FALSE) or a Fine-Gray subdistribution
hazard model (competing_risks = TRUE) per feature, extracting
the coefficient and its standard error for the feature term y.
A list containing the following elements:
d |
A numeric vector of absolute coefficients
( |
s |
A numeric vector of standard errors of the coefficients for each feature. |
This function generates bootstrap samples from the input metadata. It samples with replacement within each group defined in the metadata, and optionally adjusts for paired groups.
bootstrapS(niter, meta.info, group)
niter |
Integer. The number of bootstrap samples to generate. |
meta.info |
Data frame. Metadata containing sample information, where each row corresponds to a sample. |
group |
Character. The name of the column in |
The function works by resampling the row names of the metadata for each group separately.
A matrix of dimension niter x n, where n is the
number of samples. Each row corresponds to a bootstrap sample, and each
entry is a resampled row name from the metadata.
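The within-group resampling described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `bootstrap_within_groups` and the toy metadata are assumptions introduced here.

```r
# Hypothetical helper (not the package's bootstrapS): resample row names
# with replacement inside each level of a grouping column.
bootstrap_within_groups <- function(niter, meta.info, group) {
  n <- nrow(meta.info)
  out <- matrix(NA_character_, nrow = niter, ncol = n)
  # Positions of the samples belonging to each group level
  idx_by_group <- split(seq_len(n), meta.info[[group]])
  for (i in seq_len(niter)) {
    for (idx in idx_by_group) {
      ids <- rownames(meta.info)[idx]
      # sample.int avoids the single-element pitfall of sample()
      out[i, idx] <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
    }
  }
  out
}

set.seed(1)
meta <- data.frame(
  group = factor(rep(c("A", "B"), each = 3)),
  row.names = paste0("s", 1:6)
)
boot <- bootstrap_within_groups(niter = 4, meta.info = meta, group = "group")
dim(boot) # niter rows, one column per sample
```

Each row of `boot` is one bootstrap draw, and every entry comes from the same group as the sample position it fills.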
This function generates stratified bootstrap samples based on the groupings and additional factors in the metadata. The function ensures that samples are drawn proportionally based on strata defined by the interaction of factor columns in the metadata.
bootstrapSamples_limRots(niter, meta.info, group)
niter |
Integer. The number of bootstrap samples to generate. |
meta.info |
Data frame. Metadata containing sample information,
where each row corresponds to a sample. Factor columns in |
group |
Character. The name of the column in |
The function works by first identifying the factors in the meta.info data
frame that are used to create strata for sampling. Within each group defined
by group, the function samples according to the strata proportions,
ensuring that samples are drawn from the correct groups and strata in a
proportional manner.
A matrix of dimension niter x n, where n is the
number of samples. Each row corresponds to a bootstrap sample, and each
entry is a resampled row name from the metadata, stratified by group and
additional factors.
This function generates stratified bootstrap samples identical to
bootstrapSamples_limRots, but additionally supports correlation
blocks. When correlation_block is specified, all samples sharing
the same block ID are always selected together during resampling.
When correlation_block is NULL, the function delegates entirely
to bootstrapSamples_limRots.
bootstrapSamples_limRots_block( niter, meta.info, group, correlation_block = NULL )
niter |
Integer. The number of bootstrap samples to generate. |
meta.info |
Data frame. Metadata containing sample information,
where each row corresponds to a sample. Factor columns in |
group |
Character. The name of the column in |
correlation_block |
Character or NULL. The name of a column in
|
The function follows the same logic as bootstrapSamples_limRots:
within each group defined by group, it identifies factor
columns to create strata, then samples proportionally within each stratum.
When correlation_block is not NULL, entire blocks (e.g., repeated
measures from the same subject) are resampled together as a unit instead
of individual samples.
A matrix of dimension niter x n, where n is the
number of samples. Each row corresponds to a bootstrap sample, and each
entry is a resampled row name from the metadata, stratified by group and
additional factors.
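As a rough illustration of block resampling, the sketch below draws whole blocks with replacement inside each group so that repeated measures stay together. The helper `resample_blocks`, the `subject` column, and the equal block sizes (which keep the number of drawn samples constant) are assumptions made for this example; the package's own handling of strata and unequal blocks may differ.

```r
# Hypothetical helper: resample entire blocks (e.g. subjects) with
# replacement within each group, keeping each block's samples together.
resample_blocks <- function(meta.info, group, correlation_block) {
  ids <- rownames(meta.info)
  by_group <- split(seq_along(ids), meta.info[[group]])
  unlist(lapply(by_group, function(idx) {
    block_id <- as.character(meta.info[[correlation_block]][idx])
    blocks <- split(ids[idx], block_id)
    picked <- sample.int(length(blocks), length(blocks), replace = TRUE)
    unlist(blocks[picked], use.names = FALSE)
  }), use.names = FALSE)
}

set.seed(1)
meta <- data.frame(
  group = factor(rep(c("A", "B"), each = 4)),
  subject = rep(c("p1", "p2", "p3", "p4"), each = 2),
  row.names = paste0("s", 1:8)
)
draw <- resample_blocks(meta, group = "group", correlation_block = "subject")
draw
```

Because blocks are the sampling unit, the two measurements of a subject always appear side by side in `draw`.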
This function generates stratified bootstrap samples similar to
bootstrapSamples_limRots, but additionally supports correlation
blocks. When correlation_block is specified, all samples sharing
the same block ID are always selected together during resampling.
bootstrapSamples_limRots_cox(niter, meta.info, correlation_block = NULL)
niter |
Integer. The number of bootstrap samples to generate. |
meta.info |
Data frame. Metadata containing sample information,
where each row corresponds to a sample. Factor columns in |
correlation_block |
Character or NULL. The name of a column in
|
When correlation_block is not NULL, the function groups samples by
their block ID within each group/stratum and resamples entire blocks with
replacement, so that correlated samples (e.g., repeated measures from the
same subject) are always kept together.
A matrix of dimension niter x n, where n is the
number of samples. Each row corresponds to a bootstrap sample, and each
entry is a resampled row name from the metadata.
This function calculates the false discovery rate (FDR) by comparing observed values to permuted values. The function sorts observed values, compares them against permuted data, and computes FDR using the median of permutation results.
calculateFalseDiscoveryRate(observedValues, permutedValues)
observedValues |
Numeric vector. The observed test statistics or values to be evaluated for significance. |
permutedValues |
Numeric matrix. The permuted test statistics or values,
with rows corresponding to the same values as in |
A numeric vector of the same length as observedValues, containing
the estimated FDR for each observed value.
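A permutation-based FDR of this kind can be sketched in base R as follows. The function `perm_fdr` is a hypothetical simplification written for illustration (it assumes no ties among the observed values and adds a monotonicity step in the style of ROTS); it is not the package's code.

```r
# Hypothetical sketch of a median-over-permutations FDR estimate.
perm_fdr <- function(observed, permuted) {
  obs <- abs(observed)
  perm <- abs(permuted)
  ord <- order(obs, decreasing = TRUE)
  sorted <- obs[ord]
  # Per permutation column: (# permuted >= threshold) / (# observed >= threshold)
  ratios <- apply(perm, 2, function(col) {
    vapply(sorted, function(t) sum(col >= t), numeric(1)) / seq_along(sorted)
  })
  # Median across permutations, capped at 1
  fdr_sorted <- pmin(apply(ratios, 1, median), 1)
  # Make the estimate non-decreasing down the ranked list
  fdr_sorted <- rev(cummin(rev(fdr_sorted)))
  fdr <- numeric(length(obs))
  fdr[ord] <- fdr_sorted
  fdr
}

set.seed(2)
observed <- c(5, 0.1, 3, 0.2)
permuted <- matrix(rnorm(4 * 10), nrow = 4)
fdr <- perm_fdr(observed, permuted)
fdr
```

The result is returned in the original order of `observed`, so it can be attached directly to a feature table.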
This function calculates the overlap between observed and permuted data for two sets of comparisons. It computes the ratio of overlap between pairs of vectors (res1/res2 and pres1/pres2) after sorting the values.
calOverlaps(D, S, pD, pS, nrow, N, N_len, ssq, niter, overlaps, overlaps_P)
D |
Numeric vector. Observed data values (e.g., differences). |
S |
Numeric vector. Standard errors or related values associated with the observed data. |
pD |
Numeric vector. Permuted data values (e.g., differences). |
pS |
Numeric vector. Standard errors or related values associated with the permuted data. |
nrow |
Integer. Number of rows in each block of data. |
N |
Integer vector. Number of top values to consider for overlap calculation. |
N_len |
Integer. Length of the |
ssq |
Numeric. A small constant added to standard errors for stability. |
niter |
Integer. Number of bootstrap samples or resampling iterations. |
overlaps |
Numeric matrix. Matrix to store overlap results for observed data. |
overlaps_P |
Numeric matrix. Matrix to store overlap results for permuted data. |
The function calculates overlaps for two sets of comparisons: one for
observed data (res1/res2) and one for permuted data (pres1/pres2). For each
bootstrap sample, the function orders the two vectors being compared, then
calculates the proportion of overlap for the top N values.
A list containing two matrices: overlaps for observed data and
overlaps_P for permuted data.
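The top-N overlap between two rankings can be illustrated with a few lines of base R. The helper `top_overlap` and the toy vectors are assumptions for this sketch; the statistic is formed here as `D / (S + ssq)`, which is one plausible reading of how the smoothing constant enters.

```r
# Hypothetical helper: fraction of shared features among the top N of two rankings.
top_overlap <- function(stat1, stat2, N) {
  top1 <- order(abs(stat1), decreasing = TRUE)[seq_len(N)]
  top2 <- order(abs(stat2), decreasing = TRUE)[seq_len(N)]
  length(intersect(top1, top2)) / N
}

# Two toy bootstrap halves: differences (D) and standard errors (S)
D1 <- c(3.0, 0.2, 2.5, 0.1, 1.8)
D2 <- c(2.8, 0.3, 0.1, 2.9, 1.5)
S1 <- rep(1, 5)
S2 <- rep(1, 5)
ssq <- 0.5 # smoothing constant added to the standard errors

res1 <- D1 / (S1 + ssq)
res2 <- D2 / (S2 + ssq)
top_overlap(res1, res2, N = 2) # 0.5: only feature 1 is in both top-2 lists
```

Repeating this over all bootstrap iterations and all values of N fills one row of an overlap matrix per iteration.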
This function computes the overlap between two sets of observed and permuted values for single-label replicates (SLR). It calculates the proportion of overlap between pairs of vectors (res1/res2 and pres1/pres2) after sorting them.
calOverlaps_slr(D, pD, nrow, N, N_len, niter, overlaps, overlaps_P)
D |
Numeric vector. Observed data values (e.g., differences). |
pD |
Numeric vector. Permuted data values. |
nrow |
Integer. Number of rows in each block of data. |
N |
Integer vector. Number of top values to consider for overlap calculation. |
N_len |
Integer. Length of the |
niter |
Integer. Number of bootstrap samples or resampling iterations. |
overlaps |
Numeric matrix. Matrix to store overlap results for observed data. |
overlaps_P |
Numeric matrix. Matrix to store overlap results for permuted data. |
The function calculates the overlap for two sets of comparisons: one for
observed data (res1/res2) and one for permuted data (pres1/pres2).
For each bootstrap sample, the function orders the two vectors being
compared, then computes the proportion of overlap for the top N values.
A list containing two matrices: overlaps for observed data and
overlaps_P for permuted data.
Check if meta info is correct
Check_meta_info(meta.info, data, log)
meta.info |
Data frame. Metadata associated with the samples
(columns of |
data |
A matrix-like object or a |
log |
Logical, indicating whether the data is already log-transformed. Default is TRUE. |
Logical
Check if SummarizedExperiment or data is correct
Check_SummarizedExperiment( x, assay.type = NULL, meta.info, group, survival = FALSE )
x |
A matrix-like object or a |
assay.type |
A character string or numeric index specifying the assay
to use if |
meta.info |
Data frame. Metadata associated with the samples
(columns of |
group |
Character. Column name in |
survival |
Logical, indicating whether the analysis is survival
analysis. Default is |
A list containing data, groups, and meta.info.
This helper function compares observed values against permuted values and counts the number of permuted values that are greater than or equal to each observed value.
countLargerThan(observedVec, permutedVec)
observedVec |
Numeric vector. The observed values. |
permutedVec |
Numeric vector. The permuted values to compare against the observed values. |
A numeric vector containing the counts of permuted values greater than or equal to the corresponding observed values.
This function fits a per-feature survival model to the full (non-resampled) data matrix using the observed sample metadata, producing the final statistics used for ranking features.
fit_survival(x, meta.info, formula.str, competing_risks)
x |
A data matrix where rows represent features (e.g., proteins, metabolites) and columns represent samples. |
meta.info |
A data frame containing the metadata for the samples.
Must include |
formula.str |
A string specifying the formula to be used in model
fitting. Must include a |
competing_risks |
Logical. If |
For each feature (row), the function appends the feature expression values
as y to the sample metadata and fits either a Cox proportional
hazards model (competing_risks = FALSE) or a
Fine-Gray subdistribution hazard model (competing_risks = TRUE). The
coefficient, its standard error, and the exponentiated coefficient
(hazard ratio) for the feature term y are extracted.
Unlike permutating_survival and bootstrap_survival, this
function operates on the observed (non-permuted, non-resampled) data to
produce the final statistics used for feature ranking.
A list containing the following elements:
d |
A numeric vector of absolute coefficients
( |
s |
A numeric vector of standard errors of the coefficients for each feature. |
exp_coef |
A numeric vector of exponentiated coefficients (hazard
ratios, |
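The per-feature Cox fit described above can be sketched with the survival package. This is an illustrative outline under stated assumptions: the helper `fit_one`, the simulated data, and the simple formula `Surv(time, event) ~ y` are introduced here and are not taken from the package source.

```r
library(survival)

set.seed(42)
# Toy data: 5 features x 30 samples
x <- matrix(rnorm(5 * 30), nrow = 5,
            dimnames = list(paste0("f", 1:5), paste0("s", 1:30)))
meta <- data.frame(
  time = rexp(30, rate = 0.1),
  event = rbinom(30, 1, 0.7),
  row.names = colnames(x)
)

# Hypothetical helper: fit one Cox model with the feature values as covariate y
fit_one <- function(y) {
  df <- cbind(meta, y = y)
  fit <- coxph(Surv(time, event) ~ y, data = df)
  co <- summary(fit)$coefficients["y", ]
  c(d = abs(unname(co["coef"])),          # absolute coefficient
    s = unname(co["se(coef)"]),           # standard error
    exp_coef = unname(co["exp(coef)"]))   # hazard ratio
}

res <- t(apply(x, 1, fit_one))
res
```

Each row of `res` holds the three quantities listed above for one feature; covariates from the metadata could be added to the right-hand side of the formula in the same way.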
This function performs linear modeling using the Limma package while
accounting for covariates specified
in the meta.info. It supports two-group comparisons and multi-group
analysis, incorporating covariates
through a design matrix.
Limma_bootstrap(x, group, meta.info, formula.str, correlation_block = NULL)
x |
A list containing two or more data matrices where rows represent features (e.g., genes, proteins) and columns represent samples. The list should contain at least two matrices for pairwise group comparison. |
group |
A character string indicating the name of the group
variable in |
meta.info |
A data frame containing the metadata for the samples. This includes sample grouping and any covariates to be included in the model. |
formula.str |
A string specifying the formula to be used in model
fitting. It should follow the standard R formula syntax
(e.g., |
correlation_block |
Character or NULL. The name of a column in
|
This function first combines the data matrices from different groups and
prepares a design matrix based on the covariates specified in meta.info
using the provided formula. It fits a linear model using Limma,
computes contrasts between groups, and applies empirical Bayes moderation.
For two-group comparisons, the function returns log-fold changes and
associated statistics. In multi-group settings with a single covariate,
it calculates pairwise contrasts and moderated F-statistics.
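For the two-group case, the limma steps named above (design matrix, linear fit, contrast, empirical Bayes moderation) might look like the following sketch. The simulated data, the `batch` covariate, and the way `d` and `s` are extracted are assumptions for illustration; the package may extract its statistics differently.

```r
library(limma)

set.seed(1)
# Toy data: 100 features x 6 samples, two groups plus a batch covariate
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("f", 1:100), paste0("s", 1:6)))
meta <- data.frame(
  group = factor(rep(c("g1", "g2"), each = 3)),
  batch = factor(rep(c("b1", "b2", "b1"), times = 2)),
  row.names = colnames(expr)
)

# Design without intercept, as required by the "~ 0 + ..." convention
design <- model.matrix(~ 0 + group + batch, data = meta)
fit <- lmFit(expr, design)
contrast <- makeContrasts(groupg1 - groupg2, levels = design)
fit <- eBayes(contrasts.fit(fit, contrast))

d <- fit$coefficients[, 1]                        # log-fold changes
s <- sqrt(fit$s2.post) * fit$stdev.unscaled[, 1]  # moderated standard errors
head(cbind(d = d, s = s))
```

With these definitions, `d / s` reproduces limma's moderated t-statistic, which is the starting point that the ROTS-style parameters later modify.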
A list containing the following elements:
d |
A vector of the test statistics (log-fold changes or F-statistics) for each feature. |
s |
A vector of the standard deviations for each feature, adjusted by the empirical Bayes procedure. |
lmFit, eBayes,
topTable,
makeContrasts
This function performs linear modeling using the Limma package, incorporating covariates in the model fitting process. It is designed to handle both two-group comparisons and multi-group settings with covariates.
Limma_fit( x, group, meta.info, formula.str, trend, robust, correlation_block = NULL )
x |
A list containing two or more data matrices where rows represent features (e.g., genes, proteins) and columns represent samples. The list should contain at least two matrices for pairwise group comparison. |
group |
A character string indicating the name of the group
variable in |
meta.info |
A data frame containing the metadata for the samples. This includes sample grouping and any covariates to be included in the model. |
formula.str |
A string specifying the formula to be used in model
fitting. It should follow the standard R formula syntax
(e.g., |
trend |
A logical value indicating whether to allow for an intensity-dependent trend in the prior variance. |
robust |
A logical value indicating whether to use a robust fitting procedure to protect against outliers. |
correlation_block |
Character or NULL. The name of a column in
|
This function combines the data matrices from different groups and fits a
linear model using covariates provided in the meta.info. For two-group
comparisons, the function computes contrasts between the two groups and
applies empirical Bayes moderation. For multi-group analysis with a single
covariate, pairwise contrasts are computed, and the moderated F-statistic is
calculated for each feature.
When formula.str contains a | character (random effects term), the
function bypasses the standard limma workflow and instead uses
dream to fit a linear mixed model. This
path is only supported for two-group comparisons. Passing more than two
groups with a random-effects formula raises an informative error directing
the user to use the correlation_block argument with a fixed-effects
formula instead.
A list containing the following elements:
d |
A vector of the test statistics (log-fold changes or F-statistics) for each feature. |
s |
A vector of the standard deviations for each feature, adjusted by the empirical Bayes procedure. |
corrected.logfc |
The log-fold changes for each feature after fitting the model. |
lmFit,
eBayes,
topTable,
makeContrasts,
dream,
makeContrastsDream
This function performs linear modeling using the Limma package with permutation of the covariates to evaluate the test statistics under random assignments. It handles two-group comparisons and multi-group settings.
Limma_permutating(x, group, meta.info, formula.str, correlation_block = NULL)
x |
A list of data matrices where rows represent features (e.g., genes, proteins) and columns represent samples. The list should contain at least two matrices for pairwise group comparison. |
group |
A character string indicating the name of the group
variable in |
meta.info |
A data frame containing the metadata for the samples. This includes sample grouping and any covariates to be included in the model. |
formula.str |
A string specifying the formula to be used in model
fitting. It should follow the standard R formula syntax
(e.g., |
correlation_block |
Character or NULL. The name of a column in
|
This function combines the data matrices from different groups and permutes
the covariates from meta.info before fitting a linear model using Limma.
Permutation helps assess how the covariates behave under random conditions,
providing a null distribution of the test statistics. For two-group
comparisons, the function computes contrasts between the two groups and
applies empirical Bayes moderation. For multi-group analysis with a single
covariate, pairwise contrasts are computed, and the moderated F-statistic is
calculated for each feature.
A list containing the following elements:
d |
A vector of the test statistics (log-fold changes or F-statistics) for each feature. |
s |
A vector of the standard deviations for each feature, adjusted by the empirical Bayes procedure. |
lmFit, eBayes,
topTable,
makeContrasts
LimROTS: A Hybrid Method Integrating Empirical Bayes and
Reproducibility-Optimized Statistics for Robust
Differential Expression Analysis
LimROTS( x, assay.type = NULL, niter = 1000, K = NULL, a1 = NULL, a2 = NULL, log = TRUE, verbose = TRUE, meta.info, BPPARAM = NULL, group, formula.str, robust = TRUE, trend = TRUE, permutating.group = FALSE, correlation_block = NULL )
x |
A |
assay.type |
A character string or numeric index specifying the assay
to use if |
niter |
An integer representing the number of bootstrap iterations. Default is 1000. |
K |
An optional integer representing the top list size for ranking. If not specified, it is set to one-fourth of the number of features. |
a1 |
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs. |
a2 |
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs. |
log |
Logical, indicating whether the data is already log-transformed.
Default is |
verbose |
Logical, indicating whether to display messages during the
function's execution. Default is |
meta.info |
a character vector of the metadata needed for the
model to run and can be retrieved using |
BPPARAM |
A |
group |
A string specifying the column in |
formula.str |
A formula string for modeling.
It should include "~ 0 + ..." to exclude the intercept from the model.
All the model parameters must be present in |
robust |
Logical, indicating whether robust fitting should be used. Default is TRUE; see eBayes. |
trend |
Logical, indicating whether to include trend fitting in the differential expression analysis. Default is TRUE; see eBayes. |
permutating.group |
Logical, If |
correlation_block |
Character or NULL. The name of a column in
|
The LimROTS approach initially uses limma package functionality to simulate
the intensity data of proteins and metabolites. A linear model is
subsequently fitted using the design matrix, and empirical Bayes variance
shrinkage is then applied. To obtain the moderated t-statistics, the adjusted
standard error $SE_{post(p)}$ for each feature $p$ is computed, along with
the regression coefficient $\beta_{(p)}$ for each feature (indicating the
impact of variations in the experimental settings). A
reproducibility-optimized technique known as ROTS is then adapted to
establish an optimality criterion based on the largest overlap of top-ranked
features within group-preserving bootstrap datasets. Finally, based on the
optimized parameters $\alpha_1$ and $\alpha_2$ (returned as a1 and a2), the
final statistic is calculated as:

$$t_{\alpha(p)} = \frac{\beta_{(p)}}{\alpha_1 + \alpha_2 \times SE_{post(p)}}$$

where $t_{\alpha(p)}$ is the final statistic for each feature,
$\beta_{(p)}$ is the coefficient, and $SE_{post(p)}$ is the adjusted
standard error. LimROTS generates p-values from permutation samples
using the implementation available in the qvalue package, along with an
internal implementation of FDR adapted from the ROTS package. Additionally,
the qvalue package is used to calculate q-values, where the proportion of
true null p-values is estimated with the bootstrap method of pi0est.
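As a small worked example, the final statistic is assumed here to take the ROTS form, coefficient divided by (a1 + a2 × adjusted standard error). The function name `limrots_stat` and the numbers are hypothetical.

```r
# Hypothetical helper: ROTS-style statistic from a coefficient and its
# adjusted standard error, given the two optimized parameters.
limrots_stat <- function(beta, se_post, a1, a2) {
  beta / (a1 + a2 * se_post)
}

beta <- c(1.2, -0.4, 0.05)
se_post <- c(0.3, 0.2, 0.5)

limrots_stat(beta, se_post, a1 = 0, a2 = 1)   # 4, -2, 0.1: plain moderated t
limrots_stat(beta, se_post, a1 = 1, a2 = 0)   # reduces to the coefficient itself
limrots_stat(beta, se_post, a1 = 0.2, a2 = 1) # a1 damps features with small SE
```

The two extreme settings show why the parameters are optimized: they interpolate between ranking by a t-like statistic and ranking by effect size alone.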
This function processes a dataset using parallel computation. It leverages the BiocParallel framework to distribute tasks across multiple workers, which can significantly reduce runtime for large datasets.
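The snippet below is a minimal sketch of how a BiocParallel backend object is created and used. `SerialParam()` runs tasks sequentially and is chosen here only so the example works everywhere; a multi-worker backend such as `SnowParam(workers = 2)` could be substituted. The commented call at the end is an assumption based on the usage signature above.

```r
library(BiocParallel)

# Backend object; replace with e.g. SnowParam(workers = 2) for real parallelism
bp <- SerialParam()

# Distribute a toy task over the backend
squares <- bplapply(1:4, function(i) i^2, BPPARAM = bp)
unlist(squares)

# The same object would be passed on through the BPPARAM argument, e.g.
# result <- LimROTS(x, meta.info = meta.info, group = "group",
#                   formula.str = "~ 0 + group", BPPARAM = bp)
```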
An object of class "SummarizedExperiment" with the
following elements:
data |
The original data matrix. |
niter |
The number of bootstrap samples used. |
statistics |
The optimized statistics for each feature. |
pvalue |
P-values computed based on the permutation samples. |
FDR |
False discovery rate estimates. |
a1 |
Optimized parameter used in differential expression ranking. |
a2 |
Optimized parameter used in differential expression ranking. |
k |
Top list size used for ranking. |
corrected.logfc |
Estimate of the log2-fold-change corresponding to the effect, corrected by the model; see topTable. |
q_values |
Estimated q-values using the |
BH.pvalue |
Benjamini-Hochberg adjusted p-values. |
null.statistics |
The optimized null statistics for each feature. |
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47
Suomi T, Seyednasrollah F, Jaakkola M, Faux T, Elo L (2017). “ROTS: An R package for reproducibility-optimized statistical testing.” PLoS Computational Biology, 13(5), e1005562. https://doi.org/10.1371/journal.pcbi.1005562, http://www.ncbi.nlm.nih.gov/pubmed/28542205
Elo LL, Filen S, Lahesmaa R, Aittokallio T. Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Trans Comput Biol Bioinform. 2008;5(3):423-431. https://doi.org/10.1109/tcbb.2007.1078
# Example usage:
# Simulated data: 100 features x 10 samples
data <- data.frame(matrix(rnorm(1000), nrow = 100, ncol = 10))
meta.info <- data.frame(
    group = factor(rep(1:2, each = 5)),
    row.names = colnames(data)
)
formula.str <- "~ 0 + group"
result <- LimROTS(data,
    meta.info = meta.info, group = "group",
    formula.str = formula.str, niter = 10
)
LimROTS_survival: A Hybrid Method Integrating Empirical Bayes and
Reproducibility-Optimized Statistics for Robust
Survival Analysis in Omics Data
LimROTS_survival( x, assay.type = NULL, niter = 1000, K = NULL, a1 = NULL, a2 = NULL, verbose = TRUE, meta.info, BPPARAM = NULL, formula.str, competing_risks = FALSE, correlation_block = NULL )
x |
A |
assay.type |
A character string or numeric index specifying the assay
to use if |
niter |
An integer representing the number of bootstrap iterations. Default is 1000. |
K |
An optional integer representing the top list size for ranking. If not specified, it is set to one-fourth of the number of features. |
a1 |
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs. |
a2 |
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs. |
verbose |
Logical, indicating whether to display messages during the
function's execution. Default is |
meta.info |
a character vector of the metadata needed for the
model to run and can be retrieved using |
BPPARAM |
A |
formula.str |
A formula string for modeling.
It should include "~ 0 + ..." to exclude the intercept from the model.
All the model parameters must be present in |
competing_risks |
Logical. If |
correlation_block |
Character or NULL. The name of a column in
|
LimROTS_survival applies the reproducibility-optimized
statistic framework to survival analysis. For each bootstrap resample,
a Cox proportional hazards model (or a competing risks model
via crr from cmprsk when competing_risks = TRUE)
is fitted per feature using the supplied formula.str. The
coefficient $\beta_{(p)}$ and its standard error $SE_{(p)}$ are
extracted for each feature $p$ across bootstrap datasets.
Reproducibility-optimized parameters $\alpha_1$ and $\alpha_2$
are then selected by maximizing the overlap of top-ranked features across
group-preserving bootstrap datasets, following the
ROTS approach. The final statistic for each feature is:

$$t_{\alpha(p)} = \frac{\beta_{(p)}}{\alpha_1 + \alpha_2 \times SE_{(p)}}$$

where $t_{\alpha(p)}$ is the final statistic,
$\beta_{(p)}$ is the Cox model coefficient, and $SE_{(p)}$ is its standard error.
P-values are computed from permutation-based null distributions using empPvals from the qvalue package, alongside an internal FDR implementation adapted from the ROTS package. Q-values are estimated via qvalue with the proportion of true null p-values set using the bootstrap method pi0est.
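To make the permutation-based p-value step concrete, here is a simplified base-R sketch of a pooled empirical p-value followed by Benjamini-Hochberg adjustment with `p.adjust`. The helper `emp_pvalues` is hypothetical and only approximates what a pooled empirical p-value does; it is not the qvalue implementation, and the toy statistics are simulated.

```r
# Hypothetical helper: pooled empirical p-values from a null distribution.
emp_pvalues <- function(stat, null_stat) {
  null_stat <- as.vector(null_stat)
  m0 <- length(null_stat)
  # Fraction of null statistics at least as large as each observed one
  p <- vapply(stat, function(s) sum(null_stat >= s), numeric(1)) / m0
  # Avoid p-values of exactly zero
  pmax(p, 1 / m0)
}

set.seed(3)
obs <- c(4.1, 0.3, 2.2, 0.9)                       # observed |statistics|
null <- abs(matrix(rnorm(4 * 50), nrow = 4))       # 50 permutations per feature

p <- emp_pvalues(obs, null)
bh <- p.adjust(p, method = "BH")
cbind(statistic = obs, pvalue = p, BH = bh)
```

Larger observed statistics receive smaller p-values, and the BH-adjusted values are never smaller than the raw ones.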
This function processes a dataset using parallel computation. It leverages the BiocParallel framework to distribute tasks across multiple workers, which can significantly reduce runtime for large datasets.
A SummarizedExperiment when x is a
SummarizedExperiment (results added to rowData and
metadata), or a list with the following elements:
data |
The original data matrix. |
niter |
The number of bootstrap samples used. |
statistics |
The optimized statistics for each feature. |
pvalue |
P-values computed based on the permutation samples. |
FDR |
False discovery rate estimates. |
a1 |
Optimized parameter used in survival ranking. |
a2 |
Optimized parameter used in survival ranking. |
k |
Top list size used for ranking. |
R |
Reproducibility score for the optimized parameters. |
Z |
Z-score corresponding to the optimized parameters. |
ztable |
Table of z-scores across the parameter grid. |
exp_coef |
Exponentiated coefficient (hazard ratio estimate) from the Cox proportional hazards model for each feature. |
q_values |
Estimated q-values using the |
BH.pvalue |
Benjamini-Hochberg adjusted p-values. |
null.statistics |
The optimized null statistics for each feature. |
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47
Suomi T, Seyednasrollah F, Jaakkola M, Faux T, Elo L (2017). “ROTS: An R package for reproducibility-optimized statistical testing.” PLoS Computational Biology, 13(5), e1005562. https://doi.org/10.1371/journal.pcbi.1005562, http://www.ncbi.nlm.nih.gov/pubmed/28542205
Elo LL, Filen S, Lahesmaa R, Aittokallio T. Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Trans Comput Biol Bioinform. 2008;5(3):423-431. https://doi.org/10.1109/tcbb.2007.1078
set.seed(123)
nsamples <- 20
nfeatures <- 50
sim_data <- matrix(rnorm(nfeatures * nsamples), nrow = nfeatures)
colnames(sim_data) <- paste0("sample", seq_len(nsamples))
rownames(sim_data) <- paste0("gene", seq_len(nfeatures))
meta_data <- data.frame(
    time = abs(rnorm(nsamples, mean = 5, sd = 2)),
    event = sample(0:1, nsamples, replace = TRUE),
    group = factor(rep(seq_len(2), each = nsamples / 2)),
    row.names = colnames(sim_data)
)
formula.str <- "~ Surv(time, event) + group"
result <- LimROTS_survival(
    x = sim_data, meta.info = meta_data,
    formula.str = formula.str, niter = 10,
    verbose = FALSE, competing_risks = FALSE
)
This function optimizes parameters by calculating overlaps between observed
and permuted data for multiple values of a smoothing constant (ssq) and for
the signal log-ratio (SLR) comparison.
Optimizing(niter, ssq, N, D, S, pD, pS, verbose)
niter |
Integer. Number of bootstrap samples or resampling iterations. |
ssq |
Numeric vector. Smoothing constants to be evaluated. |
N |
Integer vector. Number of top values to consider for overlap calculation. |
D |
Numeric matrix. Observed data values. |
S |
Numeric matrix. Standard errors or related values for observed data. |
pD |
Numeric matrix. Permuted data values. |
pS |
Numeric matrix. Standard errors or related values for permuted data. |
verbose |
Logical. If |
The function calculates overlaps for a range of smoothing constants and
identifies the optimal set of parameters by maximizing a z-score-based
metric, which compares the overlap of observed data to permuted data.
It computes overlap matrices for both observed (D and S) and permuted
(pD and pS) data and returns the optimal parameters based on the
highest z-score.
A list containing the optimal parameters:
a1: Optimal smoothing constant or 1 for SLR.
a2: SLR flag (1 if smoothing constant is optimal,
0 if SLR is optimal).
k: Optimal number of top values to consider for overlap.
R: Optimal overlap value.
Z: Optimal z-score.
ztable: Matrix of z-scores for all evaluated parameters.
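The z-score search described above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: it assumes a ROTS-style statistic `|D| / (a1 + a2 * S)`, assumes that column `i` and column `i + niter` of each matrix form one resampled pair, and uses the hypothetical helpers `overlap_at_k`, `mean_overlaps`, and `z_for_params`. The toy matrices are simulated.

```r
# Hypothetical helper: share of features common to the top-k of two rankings.
overlap_at_k <- function(stat1, stat2, k) {
  top1 <- order(stat1, decreasing = TRUE)[seq_len(k)]
  top2 <- order(stat2, decreasing = TRUE)[seq_len(k)]
  length(intersect(top1, top2)) / k
}

# Hypothetical helper: top-k overlap for every resampled pair (i, i + niter).
mean_overlaps <- function(stat, k, niter) {
  vapply(seq_len(niter), function(i) {
    overlap_at_k(stat[, i], stat[, i + niter], k)
  }, numeric(1))
}

# Hypothetical helper: z-score of observed versus permuted overlap.
z_for_params <- function(D, S, pD, pS, a1, a2, k, niter) {
  R  <- mean_overlaps(abs(D)  / (a1 + a2 * S),  k, niter)
  R0 <- mean_overlaps(abs(pD) / (a1 + a2 * pS), k, niter)
  (mean(R) - mean(R0)) / sd(R)
}

# Simulated toy input: 100 features, 20 resampled pairs, 10 features with signal.
set.seed(1)
n_features <- 100
niter <- 20
n_cols <- 2 * niter
D <- matrix(rnorm(n_features * n_cols), nrow = n_features)
D[1:10, ] <- D[1:10, ] + 1.5
S  <- matrix(runif(n_features * n_cols, 0.5, 1.5), nrow = n_features)
pD <- matrix(rnorm(n_features * n_cols), nrow = n_features)
pS <- matrix(runif(n_features * n_cols, 0.5, 1.5), nrow = n_features)

ssq <- c(0, 0.5, 1)   # smoothing constants to evaluate (a2 = 1)
N <- c(10, 20)        # top-list sizes to evaluate
# One row per candidate; the last row is the SLR candidate (a1 = 1, a2 = 0).
params <- rbind(cbind(a1 = ssq, a2 = 1), c(a1 = 1, a2 = 0))

# Rows of ztable follow params, columns follow N.
ztable <- sapply(N, function(k) {
  apply(params, 1, function(p) {
    z_for_params(D, S, pD, pS, p[["a1"]], p[["a2"]], k, niter)
  })
})

# Pick the combination with the highest z-score.
idx <- arrayInd(which.max(ztable), dim(ztable))
best <- list(
  a1 = params[idx[1, 1], "a1"],
  a2 = params[idx[1, 1], "a2"],
  k  = N[idx[1, 2]],
  Z  = ztable[idx[1, 1], idx[1, 2]]
)
```

In this sketch `best` mirrors the documented `a1`, `a2`, `k`, and `Z` elements; the grid values for `ssq` and `N` are arbitrary choices made for the toy data.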
This function fits a per-feature survival model to the original data matrix using permuted sample metadata, generating a null distribution of test statistics for downstream FDR and p-value estimation.
permutating_survival(x, meta.info, formula.str, competing_risks)
x |
A data matrix where rows represent features (e.g., proteins, metabolites) and columns represent samples. |
meta.info |
A data frame of permuted sample metadata, where each row
corresponds to a sample. Must include |
formula.str |
A string specifying the formula to be used in model
fitting. Must include a |
competing_risks |
Logical. If |
For each feature (row), the function appends the feature expression values
as y to the permuted metadata and fits either a Cox proportional
hazards model (competing_risks = FALSE) or a
subdistribution hazard model (competing_risks = TRUE). Because
the metadata is permuted, the resulting statistics form the null distribution
used to compute empirical p-values and FDR.
Unlike bootstrap_survival, this function operates directly on the
full data matrix without group splitting or resampling.
A list containing the following elements:
d |
A numeric vector of absolute Cox coefficients
( |
s |
A numeric vector of standard errors of the coefficients for each feature. |
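The per-feature fitting described above can be illustrated with a minimal sketch for the Cox case (competing_risks = FALSE). It is not the package's implementation: the data are simulated, the permutation is done inside the sketch for convenience (the documented function receives metadata that is already permuted), and the model formula `Surv(time, event) ~ y` is an assumption chosen for the example. It relies only on `coxph()` and `Surv()` from the survival package.

```r
library(survival)

# Simulated toy input: 5 features (rows) by 30 samples (columns).
set.seed(42)
n_samples <- 30
n_features <- 5
x <- matrix(
  rnorm(n_features * n_samples),
  nrow = n_features,
  dimnames = list(
    paste0("feat", seq_len(n_features)),
    paste0("s", seq_len(n_samples))
  )
)
meta <- data.frame(
  time = rexp(n_samples, rate = 0.2),
  event = rbinom(n_samples, 1, 0.7),
  row.names = colnames(x)
)

# Shuffle the metadata rows so that any real link to x is broken.
perm_meta <- meta[sample(nrow(meta)), ]

# For each feature, append its values as y and fit a Cox model.
null_stats <- t(vapply(seq_len(nrow(x)), function(i) {
  df <- perm_meta
  df$y <- x[i, ]
  fit <- coxph(Surv(time, event) ~ y, data = df)
  est <- summary(fit)$coefficients
  c(d = abs(est["y", "coef"]), s = est["y", "se(coef)"])
}, numeric(2)))
rownames(null_stats) <- rownames(x)
```

The columns `d` and `s` of `null_stats` correspond to the two documented return elements. The subdistribution hazard branch is not shown here.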
This function performs a series of checks and initial setups for input data, metadata, and parameters, ensuring everything is correctly formatted for downstream analysis.
SanityChecK( x, assay.type = NULL, niter = 1000, K = NULL, meta.info, group, formula.str, verbose = TRUE, log = TRUE, survival = FALSE )
x |
A matrix-like object or a |
assay.type |
A character string or numeric index specifying the assay
to use if |
niter |
Integer. Number of bootstrap samples or resampling iterations. Default is 1000. |
K |
Integer. Top list size. If NULL, it will be set to a quarter of the number of rows in the data matrix. Default is NULL. |
meta.info |
Data frame. Metadata associated with the samples
(columns of |
group |
Character. Column name in |
formula.str |
Optional character string representing the formula for the model. |
verbose |
Logical, indicating whether to display messages during the function's execution. Default is TRUE. |
log |
Logical, indicating whether the data is already log-transformed. Default is TRUE. |
survival |
Logical, indicating whether the analysis is a survival analysis.
Default is |
This function checks whether the input data and metadata are in the correct
format, processes metadata from a SummarizedExperiment object if provided,
and ensures that group information is correctly specified. If no top list
size (K) is provided, it defaults to a quarter of the number of rows in
the data.
A list containing:
meta.info: Processed metadata.
data: Processed data matrix.
groups: Numeric or factor vector indicating group assignments.
K: Top list size to be used in the analysis.
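Two of the checks described above can be sketched as small base R helpers. Both `default_top_list_size` and `check_sample_alignment` are hypothetical names used only for illustration; rounding down when the row count is not divisible by four, and requiring one metadata row per data column, are assumptions rather than documented behaviour.

```r
# Hypothetical helper: default K to a quarter of the number of rows.
# Rounding down is an assumption; the documentation only says "a quarter".
default_top_list_size <- function(x, K = NULL) {
  if (is.null(K)) {
    K <- floor(nrow(x) / 4)
  }
  K
}

# Hypothetical helper: samples are columns of x and rows of meta.info,
# so the two counts are assumed to match.
check_sample_alignment <- function(x, meta.info) {
  stopifnot(is.matrix(x), is.data.frame(meta.info))
  if (ncol(x) != nrow(meta.info)) {
    stop("Number of samples in x and meta.info differ.")
  }
  invisible(TRUE)
}

# Toy input: 50 features by 4 samples.
x <- matrix(rnorm(200), nrow = 50, ncol = 4)
meta.info <- data.frame(group = factor(c(1, 1, 2, 2)))

check_sample_alignment(x, meta.info)
K_default <- default_top_list_size(x)         # quarter of 50 rows, rounded down
K_user <- default_top_list_size(x, K = 10)    # an explicit K is kept as given
```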
A SummarizedExperiment object containing DIA proteomics data from a
UPS1-spiked E. coli protein mixture, processed separately with Spectronaut
and ScaffoldDIA and then merged.
An instance of the SummarizedExperiment class with the following assays:
This assay includes log2 protein intensities calculated by averaging the peptides derived from the same protein
The object also contains colData and rowData:
A DataFrame with metadata for samples.
A DataFrame with metadata for proteins.
The colData contains the following columns:
Unique identifier for each sample.
Experimental condition or group for each sample, representing different concentrations of UPS1-spiked proteins.
Software tool used, Spectronaut or ScaffoldDIA.
A fake digestion batch.
Generated with Spectronaut and ScaffoldDIA separately, using a mixed-mode acquisition method and FASTA mode, for demonstration purposes.
Gotti, C., Roux-Dalvai, F., Joly-Beauparlant, C., Mangnier, L., Leclercq, M., & Droit, A. (2022). DIA proteomics data from a UPS1-spiked E.coli protein mixture processed with six software tools. In Data in Brief (Vol. 41, p. 107829). Elsevier BV. https://doi.org/10.1016/j.dib.2022.107829