Here we describe the improvements to limma’s
diffSplice
method. diffSplice
works on the
bin-wise coefficient of the linear model which corresponds to the log2
fold changes between conditions. It compares the log2(fold change) β̂k, g
of a bin k belonging to gene
g, to a weighted average of
log2(fold change) of all the other bins of the same gene combined B̂k, g
(the subscript g will be
henceforth omitted for ease of reading). The weighted average of all the
other bins in the same gene is calculated by where $w_{i}=\frac{1}{u_{i}^2}$ and ui refers to the
diagonal elements of the unscaled covariance matrix (XTVX)−1.
X is the design matrix and
V corresponds to the weight
matrix estimated by voom
. The difference of log2 fold
changes, which is also the coefficient returned by
diffSplice()
is then calculated by Ĉk = β̂k − B̂k.
Instead of calculating the t-statistic with Ĉk, this value
is scaled again in the original code:
and the t-statistic is calculated as:
sg2 refers to the posterior residual variance of gene g, which is calculated by averaging the sample values of the residual variances of all the bins in the gene, and then squeezing these residual variances of all genes using empirical Bayes method. This assumes that the residual variance is constant across all bins of the same gene.
In , we applied three changes to the above method. First, the residual variances are not assumed to be constant across all bins of the same gene. This results in the sample values of the residual variances of every bin now being squeezed using empirical Bayes method, resulting in posterior variances si2 for every individual bin i. Second, the weights wi, used to calculate B̂k, now incorporate the individual variances by $w_{i}=\frac{1}{s_{i}^2u_{i}^2}$. Third, the Ĉk value is directly used to calculate the t-statistic, which after all these changes now corresponds to
Using a simulation from a previous DEU benchmark Soneson et al. 2016, we show that our modifications improve the accuracy of the original limma method:
Of note, however, in most settings the accuracy of
diffSplice2
is still inferior to that of DEXSeq,
which therefore remains the method of choice for DEU. However, as the
computationally-heavy DEXSeq
does not scale well and is not applicable in all situations, we
recommend diffSplice2
as the next best
method.