Meta-analyses on p-values of various differential tests

Overview

The metapod package provides utilities for meta-analyses on precomputed p-values, typicall generated from differential analyses of some sort. This enables users to consolidate multiple related tests into a single metric for easier interpretation and prioritization of interesting events. The functionality here was originally developed as part of the csaw package, to combine inferences from adjacent windows inside a genomic region; as well as from the scran package, to merge statistics from multiple batches or from multiple pairwise comparisons during marker gene detection. Most functions in this package involve combining multiple (independent or not) p-values into a single combined p-value. Additional functions are also provided to summarize the direction of any effect.

Meta-analyses on p-values

metapod provides two families of functions to combine a group of p-values into a single statistic. The first family considers parallel vectors of p-values where each group of p-values is defined from corresponding elements across vectors. This is useful for combining different tests on the same set of features, e.g., DE analyses with the same gene annotation. We illustrate this below by using Simes’ method to combine two parallel vectors:

library(metapod)
p1 <- rbeta(10000, 0.5, 1)
p2 <- rbeta(10000, 0.5, 1)
par.combined <- parallelSimes(list(p1, p2))
str(par.combined)
## List of 3
##  $ p.value       : num [1:10000] 0.353945 0.219269 0.000693 0.016756 0.301057 ...
##  $ representative: int [1:10000] 2 2 1 2 2 2 2 1 2 2 ...
##  $ influential   :List of 2
##   ..$ : logi [1:10000] TRUE FALSE TRUE FALSE TRUE TRUE ...
##   ..$ : logi [1:10000] TRUE TRUE FALSE TRUE TRUE TRUE ...

The other approach is to consider groups of p-values within a single numeric vector. This is useful when the p-values to be combined are generated as part of a single analysis. The motivating example is that of adjacent windows in the same genomic region for ChIP-seq, but one can also imagine the same situation for other features, e.g., exons in RNA-seq. In this situation, we need a grouping factor to define the groups within our vector:

p <- rbeta(10000, 0.5, 1)
g <- sample(100, length(p), replace=TRUE)
gr.combined <- groupedSimes(p, g)
str(gr.combined)
## List of 3
##  $ p.value       : Named num [1:100] 0.011366 0.002741 0.000457 0.001041 0.00011 ...
##   ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
##  $ representative: Named int [1:100] 1401 6974 6781 1992 3515 7978 4294 5948 241 4089 ...
##   ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
##  $ influential   : logi [1:10000] FALSE FALSE FALSE FALSE FALSE FALSE ...

Both families of functions return the combined p-value along with some other helpful statistics. The representative vector specifies the index of the representative test for each group; this is a test that is particularly important to the calculation of the combined p-value and can be used to, e.g., obtain a representative effect size. Similarly, the influential vector indicates which of the individual tests in each group are “influential” in computing the combined p-value. This is defined as the subset of tests that, if their p-values were set to unity, would alter the combined p-value.

p-value combining strategies

The following methods yield a significant result for groups where any of the individual tests are significant:

  • Simes’ method (parallelSimes() and groupedSimes()) is robust to dependencies between tests and supports weighting of individual tests.
  • Fisher’s method (parallelFisher() and groupedFisher()) assumes that tests are independent and does not support weighting.
  • Pearson’s method (parallelPearson() and groupedPearson()) assumes that tests are independent and does not support weighting.
  • Stouffer’s Z-score method (parallelStouffer() and groupedStouffer()) assumes that tests are independent and does support weighting.

Fisher’s method is most sensitive to the smallest p-value while Pearson’s method is most sensitive to the largest p-value. Stouffer’s method serves as a compromise between these two extremes. Simes’ method is more conservative than all three but is still functional in the presence of dependencies.

The following methods yield a significant result for groups where some of the individual tests are significant (i.e., a minimum number or proportion of individual tests reject the null hypothesis):

  • Wilkinson’s method (parallelWilkinson() and groupedWilkinson()) assumes that tests are independent and does not support weighting.
  • The minimum Holm approach (parallelHolmMin() and groupedHolmMin()) does not require independence and supports weighting.

Technically, Wilkinson’s method can reject if any of the individual nulls are significant. However, it has a dramatic increase in detection power when the specified minimum is attained, hence its inclusion in this category.

Finally, Berger’s intersection union test will yield a significant result for groups where all of the individual tests are significant. This does not require independence but does not support weighting.

Summarizing the direction

The summarizeParallelDirection() and summarizeGroupedDirection() functions will summarize the direction of effect of all influential tests for each group. For example, if all influential tests have positive log-fold changes, the group would be reported as having a direction of "up". Alternatively, if influential tests have both positive and negative log-fold changes, the reported direction would be "mixed". By only considering the influential tests, we avoid noise from “less significant” tests that do not contribute to the final p-value.

logfc1 <- rnorm(10000)
logfc2 <- rnorm(10000)

par.dir <- summarizeParallelDirection(list(logfc1, logfc2), 
    influential = par.combined$influential)
table(par.dir)
## par.dir
##  down mixed    up 
##  4294  1455  4251

countParallelDirection() and countGroupedDirection will just count the number of significant tests changing in each direction. This is done after correcting for the number of tests in each group with the Benjamini-Hochberg or Holm methods. These counts are somewhat simpler to interpret than the summarized direction but have their own caveats - see ?countParallelDirection for details.

par.dir2 <- countParallelDirection(list(p1, p2), list(logfc1, logfc2)) 
str(par.dir2)
## List of 2
##  $ down: int [1:10000] 0 0 0 0 0 0 0 0 0 0 ...
##  $ up  : int [1:10000] 0 0 1 2 0 0 0 0 0 0 ...

Another approach is to just use the direction of effect of the representative test. This is usually the best approach if a single direction and effect size is required, e.g., for visualization purposes.

Further comments

All functions support log.p=TRUE to accept log-transformed p-values as input and to return log-transformed output. This is helpful in situations where underflow would cause generation of zero p-values.

Several functions support weighting of the individual tests within each group. This is helpful if some of the tests are a priori known to be more important than others.

The combineParallelPValues() and combineGroupedPValues() wrapper functions will dispatch to the requested method, making it easier to switch between methods inside other packages:

gr.combined2 <- combineGroupedPValues(p, g, method="holm-min")
str(gr.combined2)
## List of 3
##  $ p.value       : Named num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
##  $ representative: Named int [1:100] 2505 7087 1299 4253 8410 5711 3743 3209 307 4020 ...
##   ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
##  $ influential   : logi [1:10000] FALSE TRUE TRUE TRUE FALSE TRUE ...

Session information

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] metapod_1.15.0   knitr_1.49       BiocStyle_2.35.0
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37       R6_2.5.1            fastmap_1.2.0      
##  [4] xfun_0.49           maketools_1.3.1     cachem_1.1.0       
##  [7] htmltools_0.5.8.1   rmarkdown_2.29      buildtools_1.0.0   
## [10] lifecycle_1.0.4     cli_3.6.3           sass_0.4.9         
## [13] jquerylib_0.1.4     compiler_4.4.2      sys_3.4.3          
## [16] tools_4.4.2         evaluate_1.0.1      bslib_0.8.0        
## [19] Rcpp_1.0.13-1       yaml_2.3.10         BiocManager_1.30.25
## [22] jsonlite_1.8.9      rlang_1.1.4