There are four testing scenarios, depending on whether the query set and the database sets are continuous or discrete. They are shown with the respective test in the table below. testEnrichment and testEnrichmentSEA perform Fisher’s exact test and Set Enrichment Analysis, respectively.
| | Continuous Database Set | Discrete Database Set |
|---|---|---|
| Continuous Query | Correlation-based | Set Enrichment Analysis |
| Discrete Query | Set Enrichment Analysis | Fisher’s Exact Test |
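The table can be read as a simple decision rule. The sketch below is only an illustration: choose_test is a hypothetical helper, not a knowYourCG function, and it assumes that discrete sets are character vectors of probe IDs while continuous sets are named numeric vectors.

```r
# Hypothetical helper (not part of knowYourCG): return the name of the test
# implied by the table above, given the types of the query and database set.
choose_test <- function(query, database) {
    query_continuous <- is.numeric(query)
    db_continuous <- is.numeric(database)
    if (query_continuous && db_continuous) {
        "Correlation-based"
    } else if (!query_continuous && !db_continuous) {
        "Fisher's Exact Test"
    } else {
        "Set Enrichment Analysis"
    }
}

# Toy inputs with made-up probe IDs.
discrete_set <- c("cg01", "cg02", "cg03")
continuous_set <- c(cg01 = 0.1, cg02 = 0.8, cg03 = 0.5)

choose_test(discrete_set, discrete_set)
choose_test(continuous_set, discrete_set)
choose_test(continuous_set, continuous_set)
```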
The query may be a named continuous vector. In that case, either a set enrichment score is calculated (if the database is discrete) or a Spearman correlation is calculated (if the database is also continuous). The three other cases are shown below using biologically relevant examples.
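For the continuous-versus-continuous case, the idea can be sketched in base R. This is a minimal illustration with made-up probe IDs and values, not the package's own code path: the two named vectors are restricted to the probes they share and then compared with a Spearman correlation.

```r
# Toy named numeric vectors; probe IDs and values are invented for illustration.
query <- c(cg01 = 0.10, cg02 = 0.35, cg03 = 0.60, cg04 = 0.90, cg05 = 0.20)
database <- c(cg02 = 5.0, cg03 = 4.8, cg04 = 7.5, cg05 = 1.9, cg06 = 2.2)

# Restrict both vectors to the probes they share, in the same order.
shared <- intersect(names(query), names(database))

# Spearman correlation is the Pearson correlation of the ranks.
rho <- cor(query[shared], database[shared], method = "spearman")
rho
```

Only probes present in both vectors contribute, so the overlap size determines how much data supports the estimate.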
To display this functionality, let’s load two numeric database sets individually. One is a database set for CpG density and the other is a database set corresponding to the distance of the nearest transcriptional start site (TSS) to each probe.
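# Cache the numeric (continuous) database sets.
sesameDataCache(data_titles = c("KYCG.MM285.seqContextN.20210630"))
# query is the discrete query set (here the designed TSS CpGs), assumed to be defined earlier.
res <- testEnrichmentSEA(query, "MM285.seqContextN")
# Show only the main summary columns.
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[,main_stats]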
## dbname test estimate FDR nQ nD overlap
## 2 distToTSS Set Enrichment Score 0.7486501 0.00000000 69236 303421 69236
## 1 CpGDesity50 Set Enrichment Score -0.2626335 0.04790685 69236 297415 69236
The estimate here is the enrichment score.
NOTE: A negative enrichment score suggests enrichment of the categorical database with the higher values (in the numerical database). A positive enrichment score represents enrichment with the smaller values. As expected, the designed TSS CpGs are significantly enriched in smaller TSS distance and higher CpG density.
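The sign convention in the note can be made concrete with a toy running-sum statistic. The sketch below is a Kolmogorov-Smirnov-style score written for illustration only; enrichment_score is a hypothetical helper and is not claimed to match how testEnrichmentSEA computes its estimate. Probes are walked from the smallest to the largest value, stepping up on set members and down on non-members, and the signed largest deviation is reported.

```r
# Hypothetical helper (not knowYourCG's implementation): signed
# Kolmogorov-Smirnov-style running-sum score.
enrichment_score <- function(values, in_set) {
    ord <- order(values)                  # walk probes from smallest to largest value
    hits <- in_set[ord]
    step_up <- hits / sum(hits)           # each set member adds 1 / (number of members)
    step_down <- (!hits) / sum(!hits)     # each non-member subtracts 1 / (number of non-members)
    running <- cumsum(step_up - step_down)
    running[which.max(abs(running))]      # signed largest deviation from zero
}

values <- 1:10
score_small <- enrichment_score(values, values <= 3)  # members hold the smallest values
score_large <- enrichment_score(values, values >= 8)  # members hold the largest values
c(score_small, score_large)
```

With this construction, a set concentrated at small values yields a positive score and a set concentrated at large values yields a negative one, matching the direction described in the note.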
Alternatively, one can test the enrichment of a continuous query with discrete databases. Here we will use the methylation level from a sample as the query and test it against the chromHMM chromatin states.
library(sesame)
sesameDataCache(data_titles = c("MM285.1.SigDF"))
beta_values <- getBetas(sesameDataGet("MM285.1.SigDF"))
res <- testEnrichmentSEA(beta_values, "MM285.chromHMM")
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[,main_stats]
## dbname test estimate FDR nQ nD overlap
## 14 Tss Set Enrichment Score 0.8010037 0.000000e+00 41675 296070 41672
## 15 TssBiv Set Enrichment Score 0.6609816 0.000000e+00 12278 296070 12278
## 10 Quies4 Set Enrichment Score 0.3407788 0.000000e+00 6751 296070 6751
## 1 Enh Set Enrichment Score 0.3277562 0.000000e+00 8269 296070 8269
## 5 EnhPr Set Enrichment Score 0.2930447 0.000000e+00 5912 296070 5912
## 16 TssFlnk Set Enrichment Score 0.2873390 0.000000e+00 9462 296070 9461
## 12 ReprPC Set Enrichment Score 0.2365804 0.000000e+00 8858 296070 8858
## 3 EnhLo Set Enrichment Score 0.1898612 6.534576e-133 1808 296070 1808
## 6 Het Set Enrichment Score -0.1748840 3.468312e-02 3575 296070 3575
## 11 QuiesG Set Enrichment Score -0.2460352 4.490718e-02 35428 296070 35423
## 13 ReprPCWk Set Enrichment Score -0.2297174 4.490718e-02 9806 296070 9805
## 17 Tx Set Enrichment Score -0.4111345 5.000525e-02 17801 296070 17801
## 18 TxWk Set Enrichment Score -0.3113600 5.000525e-02 14167 296070 14165
## 7 Quies Set Enrichment Score -0.3181956 7.782078e-02 96622 296070 96602
## 2 EnhG Set Enrichment Score -0.1601814 5.365994e-01 3640 296070 3640
## 9 Quies3 Set Enrichment Score -0.1629812 1.000000e+00 4113 296070 4112
## 4 EnhPois Set Enrichment Score -0.1866245 1.000000e+00 12317 296070 12317
## 8 Quies2 Set Enrichment Score -0.1213437 1.000000e+00 2603 296070 2601
As expected, the chromatin states Tss and Enh have positive enrichment scores, meaning these databases are associated with small values of the query (DNA methylation level). In contrast, the Het and Quies states have negative enrichment scores and are associated with high methylation levels.
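The same reading can be applied programmatically by splitting results on the sign of the estimate. The sketch below copies four rows from the output above into a small data frame (res_subset is a name introduced here for illustration) and uses the convention from the earlier note that positive scores correspond to smaller query values.

```r
# Four rows copied from the result table above.
res_subset <- data.frame(
    dbname = c("Tss", "Enh", "Het", "Quies"),
    estimate = c(0.8010037, 0.3277562, -0.1748840, -0.3181956),
    FDR = c(0, 0, 3.468312e-02, 7.782078e-02),
    stringsAsFactors = FALSE
)

# Positive scores: states associated with low methylation.
low_meth <- res_subset$dbname[res_subset$estimate > 0]
# Negative scores: states associated with high methylation.
high_meth <- res_subset$dbname[res_subset$estimate < 0]

low_meth
high_meth
```

The FDR column is kept so that the direction of each score can be weighed against its significance.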
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] sesame_1.25.1 knitr_1.49
## [3] gprofiler2_0.2.3 SummarizedExperiment_1.37.0
## [5] Biobase_2.67.0 GenomicRanges_1.59.1
## [7] GenomeInfoDb_1.43.2 IRanges_2.41.2
## [9] S4Vectors_0.45.2 MatrixGenerics_1.19.0
## [11] matrixStats_1.4.1 sesameData_1.24.0
## [13] ExperimentHub_2.15.0 AnnotationHub_3.15.0
## [15] BiocFileCache_2.15.0 dbplyr_2.5.0
## [17] BiocGenerics_0.53.3 generics_0.1.3
## [19] knowYourCG_1.3.5
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 bitops_1.0-9 rlang_1.1.4
## [4] magrittr_2.0.3 compiler_4.4.2 RSQLite_2.3.9
## [7] png_0.1-8 vctrs_0.6.5 reshape2_1.4.4
## [10] stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.3
## [13] fastmap_1.2.0 XVector_0.47.1 utf8_1.2.4
## [16] rmarkdown_2.29 tzdb_0.4.0 preprocessCore_1.69.0
## [19] UCSC.utils_1.3.0 purrr_1.0.2 bit_4.5.0.1
## [22] xfun_0.49 cachem_1.1.0 jsonlite_1.8.9
## [25] blob_1.2.4 DelayedArray_0.33.3 BiocParallel_1.41.0
## [28] parallel_4.4.2 R6_2.5.1 bslib_0.8.0
## [31] stringi_1.8.4 RColorBrewer_1.1-3 jquerylib_0.1.4
## [34] Rcpp_1.0.13-1 wheatmap_0.2.0 readr_2.1.5
## [37] Matrix_1.7-1 tidyselect_1.2.1 abind_1.4-8
## [40] yaml_2.3.10 codetools_0.2-20 curl_6.0.1
## [43] lattice_0.22-6 tibble_3.2.1 plyr_1.8.9
## [46] withr_3.0.2 KEGGREST_1.47.0 evaluate_1.0.1
## [49] Biostrings_2.75.3 pillar_1.10.0 BiocManager_1.30.25
## [52] filelock_1.0.3 plotly_4.10.4 RCurl_1.98-1.16
## [55] BiocVersion_3.21.1 hms_1.1.3 ggplot2_3.5.1
## [58] munsell_0.5.1 scales_1.3.0 glue_1.8.0
## [61] lazyeval_0.2.2 maketools_1.3.1 tools_4.4.2
## [64] sys_3.4.3 data.table_1.16.4 buildtools_1.0.0
## [67] grid_4.4.2 tidyr_1.3.1 AnnotationDbi_1.69.0
## [70] colorspace_2.1-1 GenomeInfoDbData_1.2.13 cli_3.6.3
## [73] rappdirs_0.3.3 S4Arrays_1.7.1 viridisLite_0.4.2
## [76] dplyr_1.1.4 gtable_0.3.6 sass_0.4.9
## [79] digest_0.6.37 SparseArray_1.7.2 ggrepel_0.9.6
## [82] htmlwidgets_1.6.4 memoise_2.0.1 htmltools_0.5.8.1
## [85] lifecycle_1.0.4 httr_1.4.7 mime_0.12
## [88] bit64_4.5.2