modified: Sat Jan 20 08:18:27 2018 compiled: Wed Oct 30 03:33:57 2024
bacon can be used to remove inflation and bias often observed in epigenome- and transcriptome-wide association studies (Iterson, Zwet, and Heijmans 2017).
To this end bacon constructs an empirical null distribution using a Gibbs Sampling algorithm by fitting a three-component normal mixture on z-scores. One component is forced, using prior knowledge, to represent the null distribution with mean and standard deviation representing the bias and inflation. The other two components are necessary to capture the amount of true associations present in the data, which we assume unknown but small.
bacon provides functionality to inspect the output of the Gibbs Sampling algorithm, i.e., plots of traces, posterior distributions and the mixture fit, are provided. Furthermore, inflation- and bias-corrected test-statistics, effect-sizes and standard errors, or P-values are extracted easily. In addition, functionality for performing fixed-effect meta-analysis and obtaining inflation- and bias-corrected statistics with a 95% Confidence Interval (CI) are provided as well.
The function bacon
requires a vector or a matrix of
z-scores and/or effect-sizes and standard errors, e.g., those extracted
from association analyses using a linear regression approach. For
fixed-effect meta-analysis a matrix of effect-sizes and standard-errors
is required.
This vignette illustrates the use of bacon using simulated z-scores, effect-sizes and standard errors to avoid long run-times. If multiple sets of test-statisics or effect-sizes and standard-errors are provided, the Gibbs Sampler algorithm can be executed in parallel to reduce computation time using functionality provide by BiocParallel-package.
A vector containing 5000 z-scores is
generated from a normal mixture distribution, 90% of the z-scores were drawn from a biased
and inflated null distribution, 𝒩(0.2, 1.3), and the remaining z-scores from
𝒩(μ, 1), where μ ∼ 𝒩(4, 1). The
rnormmix
-function provided by Bacon
generates
a vector of random test-statistics described above optionally with
different parameters.
The function bacon
executes the Gibbs Sampler algorithm
and stores all in- and out-put in an object of class Bacon
.
Several accessor-functions are available to access data contained in the
Bacon
-object, e.g. for obtaining the estimated parameters
of the mixture fit or explicitly the bias and inflation. Actually, the
latter two are the mean and standard deviation of the null component
(mu.0 and sigma.0).
## Bacon-object containing 1 set(s) of 5000 test-statistics.
## ...estimated bias: 0.16.
## ...estimated inflation: 1.3.
##
## Empirical null estimates are based on 5000 iterations with a burnin-period of 2000.
## p.0 p.1 p.2 mu.0 mu.1 mu.2 sigma.0 sigma.1 sigma.2
## [1,] 0.906 0.0659 0.0278 0.163 3.01 -3.03 1.28 2.79 2.6
## sigma.0
## 1.28
## mu.0
## 0.163
Several methods are provided to inspect the output of the Gibbs Sampler algorithm, such as traces-plots of all estimates, plots of posterior distributions, provide as a scatter plot between two parameters, and the actual fit of the three component mixture to the histogram of z-scores.
The previous three plots can be use as diagnostic tools to inspect the Gibbs sampling process.
There is also a generic plot function that can generate two types of plots; a histogram of the z-scores and a qq-plot. The histogram of the z-scores shows on top the standard normal distribution and the Gibbs Sampling estimated empirical null distribution. The quantile-quantile plot shows the −log10 transformed P-values. Default values are raw, not controlled for bias and inflation, z-scores and P-values.
## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## ℹ The deprecated feature was likely used in the bacon package.
## Please report the issue to the authors.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Matrices containing 5000 × 6
effect-sizes and standard errors are generated to simulated data for a
fixed-effect meta-analyses. This is a toy-example just to illustrate the
capabilities of bacon
in handling multiple sets of
test-statics.
set.seed(12345)
biases <- runif(6, -0.2, 0.2)
inflations <- runif(6, 1, 1.3)
es <- matrix(nrow=5000, ncol=6)
for(i in 1:6)
es[,i] <- rnormmix(5000, c(0.9, biases[i], inflations[i], 0, 4, 1), shuffle=FALSE)
se <- replicate(6, 0.8*sqrt(4/rchisq(5000,df=4)))
colnames(es) <- colnames(se) <- LETTERS[1:ncol(se)]
rownames(es) <- rownames(se) <- 1:5000
head(rownames(es))
## [1] "1" "2" "3" "4" "5" "6"
## [1] "A" "B" "C" "D" "E" "F"
By default the function bacon
detects the number of
cores/nodes registered, as described in the BiocParallel,
to perform bacon in parallel. To run the vignette in general we set it
here for convenience to 1 node.
## Warning in .local(.Object, ...): test-statistics were not provided:
## teststatistics = effectsizes/standarderrors
## Did you registered a biocparallel back-end?
## Continuing serial!
## Bacon-object containing 6 set(s) of 5000 test-statistics.
## ...estimated bias: 0.065,0.092,0.088,0.051,0.018,-0.076.
## ...estimated inflation: 1.2,1.3,1.3,1.3,1.1,1.1.
##
## Empirical null estimates are based on 5000 iterations with a burnin-period of 2000.
p.0 | p.1 | p.2 | mu.0 | mu.1 | mu.2 | sigma.0 | sigma.1 | sigma.2 | |
---|---|---|---|---|---|---|---|---|---|
A | 0.868 | 0.072 | 0.060 | 0.065 | 2.65 | -2.66 | 1.19 | 3.64 | 3.21 |
B | 0.877 | 0.071 | 0.052 | 0.092 | 2.80 | -2.75 | 1.29 | 3.02 | 3.70 |
C | 0.852 | 0.081 | 0.066 | 0.088 | 2.62 | -2.73 | 1.30 | 3.22 | 3.37 |
D | 0.832 | 0.059 | 0.109 | 0.051 | 3.02 | -1.16 | 1.33 | 1.55 | 4.61 |
E | 0.881 | 0.058 | 0.060 | 0.018 | 2.69 | -2.61 | 1.15 | 4.04 | 3.45 |
F | 0.861 | 0.061 | 0.079 | -0.076 | 2.79 | -2.65 | 1.15 | 3.54 | 3.25 |
## A B C D E F
## 1.19 1.29 1.30 1.33 1.15 1.15
## A B C D E F
## 0.0649 0.0923 0.0876 0.0514 0.0178 -0.0760
A | B | C | D | E | F |
---|---|---|---|---|---|
-0.669 | 0.609 | -0.613 | -0.722 | 0.182 | -0.986 |
0.360 | 0.261 | 0.243 | -3.209 | -0.783 | 2.517 |
-0.488 | -0.036 | -0.134 | -0.803 | 0.793 | -0.272 |
0.115 | -2.721 | -0.911 | -1.584 | 0.461 | 0.296 |
0.568 | 0.909 | 1.925 | 0.841 | 2.025 | -1.191 |
A | B | C | D | E | F |
---|---|---|---|---|---|
0.503 | 0.543 | 0.540 | 0.471 | 0.855 | 0.324 |
0.719 | 0.794 | 0.808 | 0.001 | 0.434 | 0.012 |
0.625 | 0.972 | 0.894 | 0.422 | 0.428 | 0.786 |
0.908 | 0.007 | 0.362 | 0.113 | 0.645 | 0.767 |
0.570 | 0.363 | 0.054 | 0.401 | 0.043 | 0.234 |
A | B | C | D | E | F |
---|---|---|---|---|---|
1.058 | 0.917 | 1.888 | 1.812 | 1.255 | 0.899 |
0.856 | 1.664 | 1.947 | 1.045 | 0.898 | 0.802 |
1.342 | 1.448 | 0.881 | 1.281 | 1.036 | 1.081 |
2.067 | 1.237 | 1.704 | 0.760 | 0.799 | 2.268 |
2.527 | 1.180 | 0.680 | 0.738 | 0.825 | 0.983 |
A | B | C | D | E | F |
---|---|---|---|---|---|
-0.708 | 0.559 | -1.157 | -1.307 | 0.229 | -0.887 |
0.308 | 0.434 | 0.474 | -3.355 | -0.704 | 2.018 |
-0.655 | -0.052 | -0.118 | -1.029 | 0.822 | -0.294 |
0.239 | -3.367 | -1.552 | -1.203 | 0.369 | 0.671 |
1.435 | 1.073 | 1.310 | 0.621 | 1.671 | -1.170 |
The accessor-function return as expected matrices of estimates. For the plotting functions an additional index of the ith study or z-score is required.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The following code chunk shows how to perform fixed-effect meta-analysis and the inspection of results.
## A B C D E F meta
## 1 0.503 0.5425 0.5400 0.47052 0.8553 0.3241 0.4406
## 2 0.719 0.7942 0.8077 0.00133 0.4336 0.0118 0.9664
## 3 0.625 0.9716 0.8937 0.42186 0.4276 0.7856 0.7636
## 4 0.908 0.0065 0.3625 0.11325 0.6448 0.7674 0.0620
## 5 0.570 0.3631 0.0542 0.40057 0.0429 0.2336 0.0223
## 6 0.279 0.1885 0.7669 0.56507 0.0229 0.3063 0.0111
## eff.size.meta std.err.meta pval.adj.meta pval.org.meta tstat.meta
## 4976 -5.87 0.359 2.46e-56 4.93e-60 -16.3
## 4820 4.19 0.322 4.85e-35 9.69e-39 13.0
## 4617 5.26 0.404 5.10e-35 1.02e-38 13.0
## 4520 3.89 0.321 3.50e-30 7.00e-34 12.1
## 4919 4.54 0.378 1.58e-29 3.15e-33 12.0
## 4804 5.25 0.437 1.67e-29 3.35e-33 12.0
## 4562 4.59 0.384 3.39e-29 6.78e-33 11.9
## 4918 -4.20 0.366 9.85e-27 1.97e-30 -11.5
## 4567 -4.33 0.395 2.86e-24 5.71e-28 -11.0
## 4585 -3.42 0.312 3.46e-24 6.92e-28 -10.9
## eff.size.A std.err.A pval.A tstat.A eff.size.B std.err.B pval.B
## 4976 -0.6577 1.400 6.38e-01 -0.4699 -2.803 1.594 7.87e-02
## 4820 2.1473 0.805 7.66e-03 2.6668 -5.913 0.933 2.39e-10
## 4617 7.6367 0.948 7.65e-16 8.0596 1.249 0.976 2.01e-01
## 4520 0.6468 1.561 6.79e-01 0.4142 0.755 0.699 2.80e-01
## 4919 8.1045 0.571 1.05e-45 14.1905 -0.605 1.427 6.72e-01
## 4804 4.0377 1.029 8.78e-05 3.9220 -0.754 2.329 7.46e-01
## 4562 2.8163 0.848 8.98e-04 3.3208 8.665 0.607 2.87e-46
## 4918 -0.4046 1.467 7.83e-01 -0.2758 -7.779 1.046 1.01e-13
## 4567 0.0691 1.887 9.71e-01 0.0366 -6.144 0.861 9.68e-13
## 4585 -2.7347 0.966 4.65e-03 -2.8305 1.968 0.879 2.52e-02
## tstat.B eff.size.C std.err.C pval.C tstat.C eff.size.D std.err.D
## 4976 -1.758 -7.0701 1.336 1.21e-07 -5.2926 2.892 1.019
## 4820 -6.334 -9.1400 0.967 3.23e-21 -9.4551 1.069 2.114
## 4617 1.280 -2.2194 1.391 1.11e-01 -1.5958 -0.359 1.953
## 4520 1.081 1.4007 1.601 3.82e-01 0.8750 -0.502 0.673
## 4919 -0.424 1.8069 0.848 3.31e-02 2.1311 -3.252 1.175
## 4804 -0.324 1.7019 1.232 1.67e-01 1.3816 9.846 0.867
## 4562 14.281 5.7253 2.162 8.08e-03 2.6485 3.656 1.538
## 4918 -7.439 -1.4093 1.802 4.34e-01 -0.7819 -3.416 0.762
## 4567 -7.135 -0.0962 1.635 9.53e-01 -0.0588 -5.254 0.706
## 4585 2.239 5.1220 0.927 3.25e-08 5.5274 1.534 1.248
## pval.D tstat.D eff.size.E std.err.E pval.E tstat.E eff.size.F
## 4976 4.54e-03 2.838 -11.24 0.539 2.33e-96 -20.83 -2.26
## 4820 6.13e-01 0.506 10.65 0.454 1.98e-121 23.43 2.29
## 4617 8.54e-01 -0.184 5.17 0.861 2.01e-09 6.00 9.09
## 4520 4.56e-01 -0.746 10.24 0.526 2.89e-84 19.45 -0.35
## 4919 5.65e-03 -2.768 5.30 1.035 2.99e-07 5.12 5.50
## 4804 7.32e-30 11.351 5.76 0.840 7.38e-12 6.85 2.04
## 4562 1.74e-02 2.378 -4.46 1.364 1.07e-03 -3.27 2.19
## 4918 7.44e-06 -4.481 -6.21 0.637 1.69e-22 -9.76 -1.49
## 4567 9.51e-14 -7.448 -9.74 0.841 5.35e-31 -11.58 4.87
## 4585 2.19e-01 1.229 -7.20 0.425 2.75e-64 -16.93 -3.27
## std.err.F pval.F tstat.F
## 4976 0.730 1.96e-03 -3.097
## 4820 1.105 3.80e-02 2.075
## 4617 0.739 7.89e-35 12.311
## 4520 0.975 7.20e-01 -0.359
## 4919 1.627 7.23e-04 3.381
## 4804 1.200 8.97e-02 1.697
## 4562 0.815 7.17e-03 2.689
## 4918 0.798 6.16e-02 -1.869
## 4567 1.003 1.21e-06 4.854
## 4585 1.347 1.53e-02 -2.425
Here we describe how inflation- and bias-corrected statistics with a 95% Confidence Interval (CI) can be constructed.
Given a vector of z-scores. Replicate the z-scores, minimal 30 times, and store in a matrix.
This is a toy-example just to illustrate how bacon
can
be used to create a sampling distributions of metrics output from
bacon
therefore we just use 10 replicates to speedup the
calculations.
zs <- cbind(zs, matrix(zs, nrow=5000, ncol=9))
colnames(zs) <- c(paste0("A", 1:10))
rownames(zs) <- 1:5000
head(rownames(zs))
## [1] "1" "2" "3" "4" "5" "6"
## [1] "A1" "A2" "A3" "A4" "A5" "A6"
By default the function bacon
sets a global seed for
random number generation (RNG) that occurs in the Gibbs Sampling
process. The global seed is sufficient for controlling the RNG when the
input is a single-vector or a matrix so that each call to bacon produces
the same results. Since BiocParallel
performs RNG independently of the global environment, the
globalSeed
can be set to NULL
to allow RNG
across each parallel process that calls bacon
, and the
parallelSeed
(default 42) used by BiocParallel
will control the RNG so that a separate call to bacon
with
the same input matrix will produce the same ‘randomized’ results.
library(BiocParallel)
register(MulticoreParam(1, log=TRUE))
bc <- bacon(teststatistics = zs,
globalSeed = NULL,
parallelSeed = 42)
## Did you registered a biocparallel back-end?
## Continuing serial!
## Bacon-object containing 10 set(s) of 5000 test-statistics.
## ...estimated bias: 0.15,0.15,0.15,0.14,0.15,0.15,0.14,0.15,0.14,0.15.
## ...estimated inflation: 1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3.
##
## Empirical null estimates are based on 5000 iterations with a burnin-period of 2000.
## A1 A2 A3 A4 A5 A6
## 1.32 1.32 1.32 1.32 1.32 1.32
## A1 A2 A3 A4 A5 A6
## 0.146 0.146 0.146 0.145 0.145 0.145
The resulting sampling distributions can be used to calculate the average estimated inflation and bias metrics with a 95% CI. Averaged inflation- and bias- adjusted z-scores and p-values can also be easily extracted.
For example, overall average bias and inflation:
## [1] 0.145
## [1] 1.32
The 95% confidence intervals:
## 2.5% 97.5%
## 0.145 0.146
## 2.5% 97.5%
## 1.32 1.32
And average inflation- and bias-corrected z-scores and P-values:
## 1 2 3 4 5
## 7.730 1.946 -1.331 0.174 -0.428
## 1 2 3 4 5
## 1.08e-14 5.17e-02 1.83e-01 8.62e-01 6.69e-01
Here is the output of sessionInfo()
on the system on
which this document was compiled:
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bacon_1.35.0 ellipse_0.5.0 BiocParallel_1.39.0
## [4] ggplot2_3.5.1 BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_1.8.9 crayon_1.5.3
## [4] highr_0.11 compiler_4.4.1 BiocManager_1.30.25
## [7] parallel_4.4.1 jquerylib_0.1.4 scales_1.3.0
## [10] yaml_2.3.10 fastmap_1.2.0 R6_2.5.1
## [13] labeling_0.4.3 knitr_1.48 tibble_3.2.1
## [16] maketools_1.3.1 munsell_0.5.1 bslib_0.8.0
## [19] pillar_1.9.0 rlang_1.1.4 utf8_1.2.4
## [22] cachem_1.1.0 xfun_0.48 sass_0.4.9
## [25] sys_3.4.3 cli_3.6.3 withr_3.0.2
## [28] magrittr_2.0.3 digest_0.6.37 grid_4.4.1
## [31] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.1
## [34] glue_1.8.0 farver_2.1.2 codetools_0.2-20
## [37] buildtools_1.0.0 fansi_1.0.6 colorspace_2.1-1
## [40] rmarkdown_2.28 tools_4.4.1 pkgconfig_2.0.3
## [43] htmltools_0.5.8.1