Title: | Random Rotation Methods for High Dimensional Data with Batch Structure |
---|---|
Description: | A collection of methods for performing random rotations on high-dimensional, normally distributed data (e.g. microarray or RNA-seq data) with batch structure. The random rotation approach allows exact testing of dependent test statistics with linear models following arbitrary batch effect correction methods. |
Authors: | Peter Hettegger [aut, cre] |
Maintainer: | Peter Hettegger <[email protected]> |
License: | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-10-31 04:22:16 UTC |
Source: | https://github.com/bioc/randRotation |
A collection of methods for performing random rotations on high-dimensional, normally distributed data (e.g. microarray or RNA-seq data) with batch structure. The random rotation approach allows exact testing of dependent test statistics with linear models following arbitrary batch effect correction methods.
Please refer to the package vignette for further details on usage and for
a "quick start". rotateStat
is
the central function of the package. Methods are described in
(Hettegger et al. 2021).
Maintainer: Peter Hettegger [email protected] (ORCID)
Hettegger P, Vierlinger K, Weinhaeusel A (2021). “Random rotation for identifying differentially expressed genes with linear models following batch effect correction.” Bioinformatics. ISSN 1367-4803, doi:10.1093/bioinformatics/btab063.
Useful links:
Report bugs at https://github.com/phettegger/randRotation/issues
This function takes a model matrix X
and a contrast matrix C
and creates a transformed model matrix corresponding to a transformed set
of coefficients.
contrastModel(X, C, coef.h = seq_len(ncol(C)))
contrastModel(X, C, coef.h = seq_len(ncol(C)))
X |
|
C |
|
coef.h |
column numbers of contrasts (in |
The last n coefficients of the transformed model matrix correspond to the n
contrasts. By default, all contrasts are set as coef.h
.
See package vignette for examples of data rotations with contrasts.
A transformed model matrix with coef.h
set as attribute.
Peter Hettegger
group <- c("A", "A", "B", "B") X <- model.matrix(~0+group) C <- cbind(contrast1 = c(1, -1)) X2 <- contrastModel(X, C)
group <- c("A", "A", "B", "B") X <- model.matrix(~0+group) C <- cbind(contrast1 = c(1, -1)) X2 <- contrastModel(X, C)
Retrieve the dimensions of an object.
## S4 method for signature 'initRandrot' dim(x) ## S4 method for signature 'initBatchRandrot' dim(x)
## S4 method for signature 'initRandrot' dim(x) ## S4 method for signature 'initBatchRandrot' dim(x)
x |
An object of class
|
Vector of length two with number of features
and number of
samples
. See also
initRandrot
.
Retrieve the dimnames of an object.
## S4 method for signature 'initRandrot' dimnames(x) ## S4 method for signature 'initBatchRandrot' dimnames(x)
## S4 method for signature 'initRandrot' dimnames(x) ## S4 method for signature 'initBatchRandrot' dimnames(x)
x |
An object of class
|
A list
with names of features
and samples
, see
initRandrot
.
This class contains initRandrot
or initRandrotW
class
objects for each batch. See also descriptions in
initRandrot
and
initRandrot-class
.
batch.obj
List of initRandrot
or initRandrotW
class objects for each batch.
split.by
List of sample indices for each batch.
Peter Hettegger
Initialization of a linear model for subsequent generation of randomly
rotated data (randrot
) associated with
the null hypothesis .
Basics of rotation tests are found in (Hettegger et al. 2021) and
(Langsrud 2005).
initRandrot(Y = NULL, X = NULL, coef.h = NULL, weights = NULL, cormat = NULL) initBatchRandrot( Y = NULL, X = NULL, coef.h = NULL, batch = NULL, weights = NULL, cormat = NULL ) ## S4 method for signature 'list' initBatchRandrot( Y = NULL, X = Y$design, coef.h = NULL, batch = NULL, weights = Y$weights, cormat = NULL ) ## S4 method for signature 'list' initRandrot( Y = NULL, X = Y$design, coef.h = NULL, weights = Y$weights, cormat = NULL )
initRandrot(Y = NULL, X = NULL, coef.h = NULL, weights = NULL, cormat = NULL) initBatchRandrot( Y = NULL, X = NULL, coef.h = NULL, batch = NULL, weights = NULL, cormat = NULL ) ## S4 method for signature 'list' initBatchRandrot( Y = NULL, X = Y$design, coef.h = NULL, batch = NULL, weights = Y$weights, cormat = NULL ) ## S4 method for signature 'list' initRandrot( Y = NULL, X = Y$design, coef.h = NULL, weights = Y$weights, cormat = NULL )
Y |
a data matrix with |
X |
the design matrix of the experiment with |
coef.h |
single integer or vector of integers specifying the "hypothesis
coefficients" ( |
weights |
numerical matrix of finite positive weights > 0 (as in
weighted least squares regression. Dimensions must be equal to dimensions
of |
cormat |
the sample correlation matrix with |
batch |
Batch covariate of the same length as |
This function performs basic initial checks and preparatory calculations for random rotation data generation. Nomenclature of variables is mainly as in (Langsrud 2005) and (Hettegger et al. 2021). See also package vignette for application examples.
Y
can also be a list with elements E
, design
and
weights
. Y$E
is thereby used as Y
, Y$design
is
used as X
and Y$weights
is used as weights
. By this, the
functions are compatible with results from e.g. voom
(limma
package), see Examples
.
coef.h
specifies the model coefficients associated with the null
hypothesis ("hypothesis coefficients"). All other model coefficients are
considered as "determined coefficients" coef.d
(Langsrud 2005). The design matrix is rearranged so
that coef.h
correspond to the last columns of the design matrix and
coef.d
correspond to the first columns of the design matrix. This
is necessary for adequate transformation of the combined null-hypothesis
by QR decomposition.
If
X[,coef.d]
does not have full rank, a warning is generated and
coef.d
is set to coef.d <- seq_len(qr(X[,coef.d])$rank)
.
Weights must be finite positive numerics
greater zero. This is
necessary for model (QR) decomposition and for back transformation of the
rotated data into the original variance structure, see also
randrot
. Weights as estimated e.g. by
voom (Law et al. 2014) are suitable and can be used without
further processing. Note that due to the whitening transformation (i.e. by
using the arguments weights
and/or cormat
) the rank of the
transformed (whitened) design matrix X
could change (become smaller),
which could become dangerous for the fitting procedures. If you get errors
using weights
and/or cormat
, try the routine without using
weights
and/or cormat
to exclude this source of errors.
The following section provides a brief summary how rotations are calculated.
A more general introduction is given in
(Langsrud 2005). For reasons of readability, we omit
writing %*%
for matrix multiplication and write *
for
transposed matrix. The rotation is done by multiplying the features x
samples
data matrix Y
with the transpose of the restricted random
rotation matrix Rt
Rt = Xd Xd* + [Xh Xe] R [Xh Xe]*
with R
being a (reduced) random rotation matrix and Xd
,
Xh
and Xe
being columns of the full QR decomposition of the
design matrix X
. [Xd Xh Xe] = qr.Q(qr(X), complete = TRUE)
,
where Xd
correspond to columns coef.d
, Xh
to columns
coef.h
and Xe
to the remaining columns.
If weights
and/or cormat
are specified, each feature
Y[i,]
and the design matrix X
are whitening transformed before
rotation. The whitening matrix T
is defined as T = solve(C) w
,
where solve(C)
is the inverse Cholesky decompostion of the correlation
matrix (cormat = CC*
) and w
is a diagonal matrix of the square
roots of the sample weights for the according feature (w =
diag(sqrt(weights[i,]
))).
The rotated data for one feature y.r[i,]
is thus calculated as
y.r[i,] = ( solve(T) Rt T (y[i,])* )*
and [Xd Xh Xe] =
qr.Q(qr(TX), complete = TRUE)
For weights = NULL
and cormat = NULL
, T
is the identity
matrix.
Note that a separate QR decomposition is calculated for each feature if
weights
are specified. The restricted random orthogonal matrix
Rt
is calculated with the same reduced random orthogonal matrix R
for all features.
When using initBatchRandrot
, initRandrot
is called
for each batch separately. When using initBatchRandrot
with
cormat
, cormat
needs to be a list of correlation matrices
with one matrix for each batch. Note that this implicitly assumes a block
design of the sample correlation matrix, where sample correlation
coefficients between batches are zero. For a more general sample
correlation matrix, allowing non-zero sample correlation coefficients
between batches, see package vignette. Batches are split according to
split(seq_along(batch), batch)
.
An initialised
initRandrot
,
initRandrotW
or
initBatchRandrot
object.
Peter Hettegger
Hettegger P, Vierlinger K, Weinhaeusel A (2021).
“Random rotation for identifying differentially expressed genes with linear models following batch effect correction.”
Bioinformatics.
ISSN 1367-4803, doi:10.1093/bioinformatics/btab063.
Langsrud O (2005).
“Rotation tests.”
Statistics and Computing, 15(1), 53–60.
ISSN 09603174, doi:10.1007/s11222-005-4789-5.
Law CW, Chen Y, Shi W, Smyth GK (2014).
“Voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.”
Genome Biology, 15(2), 1–17.
ISSN 1474760X, doi:10.1186/gb-2014-15-2-r29, http://www.ncbi.nlm.nih.gov/pubmed/24485249.
# For further examples see '?rotateStat' and package vignette. # Example 1: Compatibility with limma::voom ## Not run: v <- voom(counts, design) ir <- initRandrot(v) ## End(Not run) # Example 2: #set.seed(0) # Dataframe of phenotype data (sample information) # We simulate 2 sample classes processed in 3 batches pdata <- data.frame(batch = rep(1:3, c(10,10,10)), phenotype = rep(c("Control", "Cancer"), c(5,5))) features <- 100 # Matrix with random gene expression data edata <- matrix(rnorm(features * nrow(pdata)), features) rownames(edata) <- paste("feature", 1:nrow(edata)) mod1 <- model.matrix(~phenotype, pdata) # Initialisation of the random rotation class init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch) init1 # See '?rotateStat'
# For further examples see '?rotateStat' and package vignette. # Example 1: Compatibility with limma::voom ## Not run: v <- voom(counts, design) ir <- initRandrot(v) ## End(Not run) # Example 2: #set.seed(0) # Dataframe of phenotype data (sample information) # We simulate 2 sample classes processed in 3 batches pdata <- data.frame(batch = rep(1:3, c(10,10,10)), phenotype = rep(c("Control", "Cancer"), c(5,5))) features <- 100 # Matrix with random gene expression data edata <- matrix(rnorm(features * nrow(pdata)), features) rownames(edata) <- paste("feature", 1:nrow(edata)) mod1 <- model.matrix(~phenotype, pdata) # Initialisation of the random rotation class init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch) init1 # See '?rotateStat'
List-based S4 class containing all information necessary to generate randomly
rotated data with the randrot
method.
initRandrot
and initRandrotW
objects are created with the
initRandrot
method.
initRandrotW
is organised as its base class initRandrot
,
altough some components are changed or added.
The following components are included as list elements:
X
Original (non-transformed) design matrix.
Xhe
, Xhe.Y.w
, Yd
Pre-multiplied matrix products needed for generation of rotated data (randrot
).
coef.h
, coef.d
Indices of coefficients (
coef.h
or "hypothesis coefficients") and indices of all other coefficients (coef.d
or "determined coefficients").
cormat
Correlation matrix, see initRandrot
.
tcholC
Cholesky decomposition of cormat: cormat = crossprod(tcholC)
.
rank
Rank of the qr decomposition of (transformed/whitened) X
The following components are changed or added in initRandrotW-class
as compared to initRandrot-class
:
decomp.list
List containing Xd
, Xhe
and rank of the transformed/whitened design matrix for each feature, see also X_decomp
.
w
Numeric matrix with dimensions features x samples
containing component wise square root of the weight matrix, see initRandrot
.
Peter Hettegger
This function calculates either (1) resampling based p-values with subsequent
p-value adjustment using stats::p.adjust
or (2)
resampling based false-discovery-rates (FDRs) for rotated statistics from a
rotateStat
object.
pFdr(obj, method = "none", pooled = TRUE, na.rm = FALSE, beta = 0.05)
pFdr(obj, method = "none", pooled = TRUE, na.rm = FALSE, beta = 0.05)
obj |
A |
method |
Can be either |
pooled |
|
na.rm |
|
beta |
|
Larger values of obj$s0 are considered more significant when
compared to the empirical distribution. E.g. for calculation of resampling
based p-values (with pooled = FALSE
) we in principle use
p.val <- (rowSums(obj$stats >= obj$s0)+1)/(ncol(obj$stats)+1)
according to
(Phipson and Smyth 2010).
method = "fdr.q"
and method = "fdr.qu"
are resampling based
fdr estimates and can only be used with pooled = TRUE
. method
= "fdr.q"
is the FDR local estimator and method = "fdr.qu"
is the
FDR upper limit, see (Reiner et al. 2003; Yekutieli and Benjamini 1999).
For all other method
arguments resampling based p-values are
calculated and passed to stats::p.adjust
for
p-value adjustment. So these methods provide resampling based p-values with
(non-resampling based) p-value adjustment.
method = "fdr.q"
and method = "fdr.qu"
were
adapted from package fdrame
(Benjamini et al. 2019; Reiner et al. 2003).
When pooled = TRUE
,
marginal distributions of the test statistics are considered exchangeable
for all features. The resampling based p-values of each feature are then
calculated from all rotated statistics (all features, all rotations). For
these cases, if the number of features is reasonably large, usually only
few resamples (argument R
in
rotateStat
) are required.
We want to emphasize that in order for the marginal distributions to be
exchangeable, the statistics must be a pivotal quantity (i.e. it must be
scale independent).
Pivotal quantities are e.g. t values. Using e.g. linear models
with coef
as statistics is questionable if the different features
are measured on different scales. The resampled coefficients then
have different variances and pooled = TRUE
is not applicable.
We thus highly recommend using pivotal quantities as statistics
in
rotateStat
if possible.
When pooled = FALSE
the resampling based p-values are calculcated
for each feature separately. This is required if one expects the resampling
based statistics to be distributed differently for individual features. For
most common applications this should not be the case and the marginal
distribution are exchangeable for all features, hence pooled = TRUE
by default.
If method = "fdr.q"
or method = "fdr.qu"
and
weights
were specified when initialising the random rotation
object (see parameter initialised.obj
in
rotateStat
), a warning is displayed.
The correlation structure (dependence structure) of linear model
coefficients between different features is not generally preserved if
different weights are used for different features.
Methods fdr.q
and fdr.qu
rely on preserved correlation
structure of dependent statistics and thus should not be used if statistics
based on model coefficients (e.g. t statistics of model coefficients) are
used in combination with different weights.
P-values and FDRs are calculated for each column of obj$s0
separately.
A numeric
matrix of corrected p-values or FDRs with dimension dim(obj$s0)
.
Peter Hettegger
Benjamini Y, Kenigsberg E, Reiner A, Yekutieli D (2019).
fdrame: FDR adjustments of Microarray Experiments (FDR-AME).
R package version 1.56.0.
Phipson B, Smyth GK (2010).
“Permutation P-values should never be zero: Calculating exact P-values when permutations are randomly drawn.”
Statistical Applications in Genetics and Molecular Biology, 9(1).
ISSN 15446115, doi:10.2202/1544-6115.1585, 1603.05766, http://www.ncbi.nlm.nih.gov/pubmed/21044043.
Reiner A, Yekutieli D, Benjamini Y (2003).
“Identifying differentially expressed genes using false discovery rate controlling procedures.”
Bioinformatics, 19(3), 368–375.
ISSN 13674803, doi:10.1093/bioinformatics/btf877, http://www.ncbi.nlm.nih.gov/pubmed/12584122.
Yekutieli D, Benjamini Y (1999).
“Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics.”
Journal of Statistical Planning and Inference, 82(1-2), 171–196.
ISSN 03783758, doi:10.1016/S0378-3758(99)00041-5, http://www.ncbi.nlm.nih.gov/pubmed/83580500015.
# See also '?rotateStat': #set.seed(0) # Dataframe of phenotype data (sample information) # We simulate 2 sample classes processed in 3 batches pdata <- data.frame(batch = rep(1:3, c(10,10,10)), phenotype = rep(c("Control", "Cancer"), c(5,5))) features <- 100 # Matrix with random gene expression data edata <- matrix(rnorm(features * nrow(pdata)), features) rownames(edata) <- paste("feature", 1:nrow(edata)) mod1 <- model.matrix(~phenotype, pdata) # Initialisation of the random rotation class init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch) init1 # Definition of the batch effect correction procedure with subsequent calculation # of two-sided test statistics statistic <- function(., batch, mod, coef){ # The "capture.output" and "suppressMessages" simply suppress any output capture.output(suppressMessages( Y.tmp <- sva::ComBat(., batch = batch, mod) )) fit1 <- lm.fit(mod, t(Y.tmp)) abs(coef(fit1)[coef,]) } # We calculate test statistics for the second coefficient res1 <- rotateStat(initialised.obj = init1, R = 10, statistic = statistic, batch = pdata$batch, mod = mod1, coef = 2) hist(pFdr(res1))
# See also '?rotateStat': #set.seed(0) # Dataframe of phenotype data (sample information) # We simulate 2 sample classes processed in 3 batches pdata <- data.frame(batch = rep(1:3, c(10,10,10)), phenotype = rep(c("Control", "Cancer"), c(5,5))) features <- 100 # Matrix with random gene expression data edata <- matrix(rnorm(features * nrow(pdata)), features) rownames(edata) <- paste("feature", 1:nrow(edata)) mod1 <- model.matrix(~phenotype, pdata) # Initialisation of the random rotation class init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch) init1 # Definition of the batch effect correction procedure with subsequent calculation # of two-sided test statistics statistic <- function(., batch, mod, coef){ # The "capture.output" and "suppressMessages" simply suppress any output capture.output(suppressMessages( Y.tmp <- sva::ComBat(., batch = batch, mod) )) fit1 <- lm.fit(mod, t(Y.tmp)) abs(coef(fit1)[coef,]) } # We calculate test statistics for the second coefficient res1 <- rotateStat(initialised.obj = init1, R = 10, statistic = statistic, batch = pdata$batch, mod = mod1, coef = 2) hist(pFdr(res1))
qqunif
produces a QQ plot of the values in ps
against the
theoretical quantiles of the uniform distribution.
qqunif( ps, log = "xy", pch = 20, xlab = "theoretical quantiles", ylab = "sample quantiles", plot.it = TRUE, ... )
qqunif( ps, log = "xy", pch = 20, xlab = "theoretical quantiles", ylab = "sample quantiles", plot.it = TRUE, ... )
ps |
|
log |
|
pch |
Point symbol, see |
xlab |
Label for the x axis. |
ylab |
Label for the y axis. |
plot.it |
|
... |
Graphical parameters forwarded to |
This function can e.g. be used for comparing p-values against the uniform distribution. The log scale of the x and y axes allow a closer look at low p-values.
This function is a modified version of the examples in the
qqnorm
documentation page.
A list of x
and y coordinates, as in qqplot
.
qqunif(runif(100))
qqunif(runif(100))
Generation of a random orthogonal n x n
matrix.
randorth(n, type = c("orthonormal", "unitary"), I.matrix = FALSE)
randorth(n, type = c("orthonormal", "unitary"), I.matrix = FALSE)
n |
|
type |
Either |
I.matrix |
If |
A random orthogonal matrix R
is generated in order that t(R)
(for "orthonormal"
) or Conj(t(R))
(for "unitary"
) equals the inverse matrix of R
.
This function was adapted from the pracma package (pracma::randortho
).
The random orthogonal matrices are distributed with Haar measure over
O(n)
, where O(n)
is the set of orthogonal matrices of order n
.
The random orthogonal matrices are basically distributed "uniformly" in the
space of random orthogonal matrices of dimension n x n
.
See also the Examples
and (Stewart 1980; Mezzadri 2007).
A random orthogonal matrix of dimension n x n
.
Peter Hettegger
Mezzadri F (2007).
“How to generate random matrices from the classical compact groups.”
Notices of the American Mathematical Society, 54(5), 592–604.
ISSN 1088-9477, 0609050, http://arxiv.org/abs/math-ph/0609050.
Stewart GW (1980).
“The Efficient Generation of Random Orthogonal Matrices with an Application to Condition Estimators.”
SIAM Journal on Numerical Analysis.
ISSN 0036-1429, doi:10.1137/0717034.
# The following example shows the orthogonality of the random orthogonal matrix: R1 <- randorth(4) zapsmall(t(R1) %*% R1) R1 <- randorth(4, "unitary") zapsmall(Conj(t(R1)) %*% R1) # The following example shows the distribution of 2-dimensional random orthogonal vectors # on the unit circle. tmp1 <- vapply(1:400, function(i)randorth(2)[,1], numeric(2)) plot(t(tmp1), xlab = "x", ylab = "y")
# The following example shows the orthogonality of the random orthogonal matrix: R1 <- randorth(4) zapsmall(t(R1) %*% R1) R1 <- randorth(4, "unitary") zapsmall(Conj(t(R1)) %*% R1) # The following example shows the distribution of 2-dimensional random orthogonal vectors # on the unit circle. tmp1 <- vapply(1:400, function(i)randorth(2)[,1], numeric(2)) plot(t(tmp1), xlab = "x", ylab = "y")
Generate a random permutation matrix for n
samples.
randpermut(n)
randpermut(n)
n |
Number of samples |
This methods generates an orthogonal matrix with only
one entry in each row and column being 1
, all other entries being
0
.
A random permutation matrix of dimension n x n
Peter Hettegger
tmp1 <- randpermut(5) t(tmp1) %*% tmp1
tmp1 <- randpermut(5) t(tmp1) %*% tmp1
Perform random data rotation of a previously initialised object (see
initRandrot
) associated with the
null hypothesis .
randrot(object, ...) ## S4 method for signature 'initRandrot' randrot(object, ...) ## S4 method for signature 'initRandrotW' randrot(object, ...) ## S4 method for signature 'initBatchRandrot' randrot(object, ...)
randrot(object, ...) ## S4 method for signature 'initRandrot' randrot(object, ...) ## S4 method for signature 'initRandrotW' randrot(object, ...) ## S4 method for signature 'initBatchRandrot' randrot(object, ...)
object |
An initialised object of class
|
... |
further arguments passed to
|
This function generates a randomly rotated dataset from an initialised
randrot object (see initRandrot
).
See also package vignette for application examples. Only the numerical matrix
of rotated data is returned, no design matrix, weights or other info is
return for efficiency purposes. Please consider that, if you e.g. use
weights
or if you use
rotateStat
, you may need to forward
the design matrix X
, weights
etc. to subsequent analyses. See
the example in rotateStat
.
Details on the calculation of a rotated dataset are given in
initRandrot
,
(Langsrud 2005) and (Hettegger et al. 2021).
numeric
matrix of rotated data under the specified combined
null hypothesis.
Peter Hettegger
Hettegger P, Vierlinger K, Weinhaeusel A (2021).
“Random rotation for identifying differentially expressed genes with linear models following batch effect correction.”
Bioinformatics.
ISSN 1367-4803, doi:10.1093/bioinformatics/btab063.
Langsrud O (2005).
“Rotation tests.”
Statistics and Computing, 15(1), 53–60.
ISSN 09603174, doi:10.1007/s11222-005-4789-5.
# For further examples see '?rotateStat' and package vignette. #set.seed(0) # Dataframe of phenotype data (sample information) # We simulate 2 sample classes processed in 3 batches pdata <- data.frame(batch = rep(1:3, c(10,10,10)), phenotype = rep(c("Control", "Cancer"), c(5,5))) features <- 100 # Matrix with random gene expression data edata <- matrix(rnorm(features * nrow(pdata)), features) rownames(edata) <- paste("feature", 1:nrow(edata)) mod1 <- model.matrix(~phenotype, pdata) # Initialisation of the random rotation class init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch) init1 ### Fit model to original data fit.orig <- lm.fit(mod1, t(edata)) head(t(coef(fit.orig))) ### Fit model to rotated data edata.rot <- randrot(init1) fit.rot <- lm.fit(mod1, t(edata.rot)) head(t(coef(fit.rot))) # Note that the coefficients stay equal if we regress only on the # non-hypothesis coefficients mod0 <- model.matrix(~1, pdata) fit.orig0 <- lm.fit(mod0, t(edata)) fit.rot0 <- lm.fit(mod0, t(edata.rot)) head(t(coef(fit.orig0))) head(t(coef(fit.rot0)))
# For further examples see '?rotateStat' and package vignette. #set.seed(0) # Dataframe of phenotype data (sample information) # We simulate 2 sample classes processed in 3 batches pdata <- data.frame(batch = rep(1:3, c(10,10,10)), phenotype = rep(c("Control", "Cancer"), c(5,5))) features <- 100 # Matrix with random gene expression data edata <- matrix(rnorm(features * nrow(pdata)), features) rownames(edata) <- paste("feature", 1:nrow(edata)) mod1 <- model.matrix(~phenotype, pdata) # Initialisation of the random rotation class init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch) init1 ### Fit model to original data fit.orig <- lm.fit(mod1, t(edata)) head(t(coef(fit.orig))) ### Fit model to rotated data edata.rot <- randrot(init1) fit.rot <- lm.fit(mod1, t(edata.rot)) head(t(coef(fit.rot))) # Note that the coefficients stay equal if we regress only on the # non-hypothesis coefficients mod0 <- model.matrix(~1, pdata) fit.orig0 <- lm.fit(mod0, t(edata)) fit.rot0 <- lm.fit(mod0, t(edata.rot)) head(t(coef(fit.orig0))) head(t(coef(fit.rot0)))
These functions are defunct and no longer available.
Defunct functions are: df_estimate
This function generates rotations of data and calculates the provided statistic
on each rotation and the non-rotated (original) data.
This is the central function of the package.
rotateStat( initialised.obj, R = 10, statistic, ..., parallel = FALSE, BPPARAM = BiocParallel::bpparam() )
rotateStat( initialised.obj, R = 10, statistic, ..., parallel = FALSE, BPPARAM = BiocParallel::bpparam() )
initialised.obj |
An initialised random rotation object as returned by |
R |
The number of resamples/rotations. Single |
statistic |
A function which takes a data matrix (same dimensions as |
... |
Further named arguments for |
parallel |
|
BPPARAM |
An optional |
The function takes an initialised randrot object
(initRandrot
) and a function that
calculates a statistic on the data. The statistic function thereby takes the
a matrix Y
as first argument. Any further arguments are passed to it
by ...
.
Together with pFdr
, this function implements
the workflow
described in (Hettegger et al. 2021).
Be aware that only data is rotated (see also
randrot
), so any additional information
including weights
, X
etc. need to be provided to
statistic
. See also package vignette and Examples
.
Parallel processing is implemented with the BiocParallel package of Bioconductor.
The default argument BiocParallel::bpparam()
for BPPARAM
returns the registered default backend.
See package documentation for further information and usage options.
If parallel = TRUE
the function calls in statistic
need to be called explicitly
with package name and "::". So e.g. calling lmFit
from the
limma
package is done with limma::lmFit(...)
, see also the
examples in the package vignette.
An object of class rotateStat
.
Peter Hettegger
Hettegger P, Vierlinger K, Weinhaeusel A (2021). “Random rotation for identifying differentially expressed genes with linear models following batch effect correction.” Bioinformatics. ISSN 1367-4803, doi:10.1093/bioinformatics/btab063.
#set.seed(0) # Dataframe of phenotype data (sample information) # We simulate 2 sample classes processed in 3 batches pdata <- data.frame(batch = rep(1:3, c(10,10,10)), phenotype = rep(c("Control", "Cancer"), c(5,5))) features <- 100 # Matrix with random gene expression data edata <- matrix(rnorm(features * nrow(pdata)), features) rownames(edata) <- paste("feature", 1:nrow(edata)) mod1 <- model.matrix(~phenotype, pdata) # Initialisation of the random rotation class init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch) init1 # Definition of the batch effect correction procedure with subsequent calculation # of two-sided test statistics statistic <- function(., batch, mod, coef){ # The "capture.output" and "suppressMessages" simply suppress any output capture.output(suppressMessages( Y.tmp <- sva::ComBat(., batch = batch, mod) )) fit1 <- lm.fit(mod, t(Y.tmp)) abs(coef(fit1)[coef,]) } # We calculate test statistics for the second coefficient res1 <- rotateStat(initialised.obj = init1, R = 10, statistic = statistic, batch = pdata$batch, mod = mod1, coef = 2) hist(pFdr(res1))
#set.seed(0) # Dataframe of phenotype data (sample information) # We simulate 2 sample classes processed in 3 batches pdata <- data.frame(batch = rep(1:3, c(10,10,10)), phenotype = rep(c("Control", "Cancer"), c(5,5))) features <- 100 # Matrix with random gene expression data edata <- matrix(rnorm(features * nrow(pdata)), features) rownames(edata) <- paste("feature", 1:nrow(edata)) mod1 <- model.matrix(~phenotype, pdata) # Initialisation of the random rotation class init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch) init1 # Definition of the batch effect correction procedure with subsequent calculation # of two-sided test statistics statistic <- function(., batch, mod, coef){ # The "capture.output" and "suppressMessages" simply suppress any output capture.output(suppressMessages( Y.tmp <- sva::ComBat(., batch = batch, mod) )) fit1 <- lm.fit(mod, t(Y.tmp)) abs(coef(fit1)[coef,]) } # We calculate test statistics for the second coefficient res1 <- rotateStat(initialised.obj = init1, R = 10, statistic = statistic, batch = pdata$batch, mod = mod1, coef = 2) hist(pFdr(res1))
This list based class contains calculated statistics for the original data
(s0
) and rotated data (stats
). See also
rotateStat
.
s0
Calculated statistics for original (non-rotated) data as returned by the statistic
function (rotateStat
).
stats
List of length ncol.s
containing statistics on rotated data for each column returned by the statistic
function.
ncol.s
Number of columns returned by the statistic
function.
R
Number of resamples/rotations.
Peter Hettegger
Display the object by printing structured summary information.
## S4 method for signature 'initRandrot' show(object) ## S4 method for signature 'initBatchRandrot' show(object) ## S4 method for signature 'rotateStat' show(object)
## S4 method for signature 'initRandrot' show(object) ## S4 method for signature 'initBatchRandrot' show(object) ## S4 method for signature 'rotateStat' show(object)
object |
An object of class
|
The show method always displays the original design matrix
(X
), not the transformed (whitened) versions.
show
returns an invisible NULL
.
weights
is a generic function which extracts fitting weights from objects returned by modeling functions.
NOTE: This man page is for the weights S4 generic function defined in the randRotation package.
## S4 method for signature 'initRandrot' weights(object, ...) ## S4 method for signature 'initBatchRandrot' weights(object, ...)
## S4 method for signature 'initRandrot' weights(object, ...) ## S4 method for signature 'initBatchRandrot' weights(object, ...)
object |
An object of class
|
... |
Kept for compatibility with the default method, see
|
Weights extracted from the object object
. NULL
if no
weights were specified.
See ?stats::weights
for the value returned by the default method.
weights showMethods("weights") selectMethod("weights", "ANY") # the default method
weights showMethods("weights") selectMethod("weights", "ANY") # the default method
Full QR decomposition of the design matrix X
. No argument checks are
performed, see Details
.
X_decomp(X = NULL, coef.d = seq_len(ncol(X) - 1))
X_decomp(X = NULL, coef.d = seq_len(ncol(X) - 1))
X |
Design matrix as generated by
|
coef.d |
Non- |
The design matrix X
is QR decomposed into X = Xq Xr
.
By performing a full QR decomposition, Xq
is automatically extended to
a full basis. Xq
is further split into Xd
and Xhe
, where
Xd
corresponds to columns coef.d
(non-H0
or
non-Null-Hypothesis columns) and Xhe
correspond to all other columns
(H0
and error columns), see initRandrot
.
No argument checks are performed for reasons
of performance as this function is called frequently by
initRandrot
when weights are used.
See (Hettegger et al. 2021) and (Langsrud 2005) for further details.
A list
object containing matrices Xd
,
Xhe
and rank of the qr decomposition.
Peter Hettegger
Hettegger P, Vierlinger K, Weinhaeusel A (2021).
“Random rotation for identifying differentially expressed genes with linear models following batch effect correction.”
Bioinformatics.
ISSN 1367-4803, doi:10.1093/bioinformatics/btab063.
Langsrud O (2005).
“Rotation tests.”
Statistics and Computing, 15(1), 53–60.
ISSN 09603174, doi:10.1007/s11222-005-4789-5.
design <- cbind(1, rep(0:1, 5)) X_decomp(design)
design <- cbind(1, rep(0:1, 5)) X_decomp(design)