Continuous Data Analysis

There are four testing scenarios, depending on whether the query set and the database sets are continuous or discrete. They are summarized in the table below. testEnrichment and testEnrichmentSEA implement Fisher’s exact test and Set Enrichment Analysis, respectively.

Four knowYourCG Testing Scenarios
                     Continuous Database Set    Discrete Database Set
Continuous Query     Correlation-based          Set Enrichment Analysis
Discrete Query       Set Enrichment Analysis    Fisher’s Exact Test
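For the discrete-query, discrete-database case, Fisher’s exact test asks whether the query and the database set overlap more than expected by chance. The base-R sketch below builds the 2x2 contingency table by hand from a toy universe of probe IDs (all names and sets are hypothetical) and calls fisher.test(). It illustrates the idea only and is not a reproduction of testEnrichment() internals.

```r
# Toy universe of probe IDs (hypothetical, for illustration only)
universe <- sprintf("cg%04d", 1:100)
query    <- universe[1:20]    # discrete query set
database <- universe[11:40]   # discrete database set

in_q  <- universe %in% query
in_db <- universe %in% database

# 2x2 table: rows = in query (TRUE/FALSE), columns = in database (TRUE/FALSE)
tab <- table(query = factor(in_q, levels = c(TRUE, FALSE)),
             database = factor(in_db, levels = c(TRUE, FALSE)))
print(tab)

overlap <- tab[1, 1]      # probes in both the query and the database
ft <- fisher.test(tab)
ft$estimate               # conditional maximum-likelihood odds ratio
ft$p.value
```

Here 10 of the 20 query probes fall in the 30-probe database set, whereas about 6 would be expected by chance, so the estimated odds ratio is above 1.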

CONTINUOUS VARIABLE ENRICHMENT

The query may be a named continuous vector. In that case, either a set enrichment score is calculated (if the database is discrete) or a Spearman correlation is calculated (if the database is also continuous). The two set enrichment cases, a discrete query against continuous databases and a continuous query against discrete databases, are shown below using biologically relevant examples.
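For the continuous-continuous case, the key step is aligning the two named vectors on their shared probe names before computing a rank correlation. The sketch below uses base R's cor(method = "spearman") on hypothetical probe names and values; how knowYourCG itself handles missing values or unmatched probes is not shown here and may differ.

```r
# Hypothetical continuous query and continuous database, both named by probe ID
query    <- c(cg01 = 0.10, cg02 = 0.35, cg03 = 0.50, cg04 = 0.90, cg05 = 0.70)
database <- c(cg04 = 120, cg02 = 40, cg05 = 95, cg01 = 15, cg06 = 300)

# Keep only probes present in both vectors, in the same order
shared <- intersect(names(query), names(database))

# Spearman correlation = Pearson correlation of the ranks
rho <- cor(query[shared], database[shared], method = "spearman")
rho
```

Probe cg03 (missing from the database) and cg06 (missing from the query) are dropped; on the four shared probes both vectors have identical rank order, so rho is 1.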

To demonstrate this, we use the probes designed to target TSSs as a discrete query and test them against two numeric database sets: one for CpG density and one for the distance from each probe to the nearest transcription start site (TSS).

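library(knowYourCG)
library(sesameData)
# Discrete query: probes in the TSS design group of the MM285 array
query <- getDBs("KYCG.MM285.designGroup")[["TSS"]]
# Cache the numeric sequence-context databases before testing
sesameDataCache(data_titles = c("KYCG.MM285.seqContextN.20210630"))
res <- testEnrichmentSEA(query, "MM285.seqContextN")
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[,main_stats]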
##        dbname                 test   estimate        FDR    nQ     nD overlap
## 2   distToTSS Set Enrichment Score  0.7486501 0.00000000 69236 303421   69236
## 1 CpGDesity50 Set Enrichment Score -0.2626335 0.04790685 69236 297415   69236

The estimate column here is the set enrichment score.

NOTE: A negative enrichment score indicates that the categorical set is enriched among the higher values of the numerical variable, while a positive enrichment score indicates enrichment among the smaller values. As expected, the designed TSS CpGs are significantly enriched for smaller distances to the TSS (positive score for distToTSS) and for higher CpG density (negative score for CpGDesity50).
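To make the sign convention concrete, here is a simplified Kolmogorov-Smirnov-style running-sum statistic of the kind used in GSEA-like methods. The helper ks_enrichment_score() is hypothetical and is not knowYourCG's implementation; its sign is chosen to match the NOTE above, so a positive score means set members concentrate at the small end of the values.

```r
# Hypothetical helper: simplified KS-style running-sum enrichment score.
# Not knowYourCG's implementation. Positive score = set members concentrate
# at the SMALL end of `values`; negative score = at the LARGE end.
ks_enrichment_score <- function(values, in_set) {
  stopifnot(length(values) == length(in_set), any(in_set), any(!in_set))
  ord  <- order(values)                  # walk from smallest to largest value
  hits <- in_set[ord]
  # Each set member adds 1/nHit, each non-member subtracts 1/nMiss,
  # so the running sum starts and ends at 0
  step    <- ifelse(hits, 1 / sum(hits), -1 / sum(!hits))
  running <- cumsum(step)
  running[which.max(abs(running))]       # signed maximum deviation from 0
}

# Set members at the small end -> score near +1
ks_enrichment_score(1:10, 1:10 <= 3)
# Set members at the large end -> score near -1
ks_enrichment_score(1:10, 1:10 >= 8)
```

A set spread evenly across the value range keeps the running sum close to 0, giving a score near zero.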

Alternatively, one can test the enrichment of a continuous query against discrete databases. Here we use the methylation levels (beta values) of one sample as the query and test them against the chromHMM chromatin states.

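library(sesame)
library(knowYourCG)
# Cache and load an example MM285 sample, then compute its beta values
sesameDataCache(data_titles = c("MM285.1.SigDF"))
beta_values <- getBetas(sesameDataGet("MM285.1.SigDF"))
# Continuous query (named beta values) against discrete chromHMM databases
res <- testEnrichmentSEA(beta_values, "MM285.chromHMM")
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[,main_stats]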
##      dbname                 test   estimate           FDR    nQ     nD overlap
## 14      Tss Set Enrichment Score  0.8010037  0.000000e+00 41675 296070   41672
## 15   TssBiv Set Enrichment Score  0.6609816  0.000000e+00 12278 296070   12278
## 10   Quies4 Set Enrichment Score  0.3407788  0.000000e+00  6751 296070    6751
## 1       Enh Set Enrichment Score  0.3277562  0.000000e+00  8269 296070    8269
## 5     EnhPr Set Enrichment Score  0.2930447  0.000000e+00  5912 296070    5912
## 16  TssFlnk Set Enrichment Score  0.2873390  0.000000e+00  9462 296070    9461
## 12   ReprPC Set Enrichment Score  0.2365804  0.000000e+00  8858 296070    8858
## 3     EnhLo Set Enrichment Score  0.1898612 6.534576e-133  1808 296070    1808
## 6       Het Set Enrichment Score -0.1748840  3.468312e-02  3575 296070    3575
## 11   QuiesG Set Enrichment Score -0.2460352  4.490718e-02 35428 296070   35423
## 13 ReprPCWk Set Enrichment Score -0.2297174  4.490718e-02  9806 296070    9805
## 17       Tx Set Enrichment Score -0.4111345  5.000525e-02 17801 296070   17801
## 18     TxWk Set Enrichment Score -0.3113600  5.000525e-02 14167 296070   14165
## 7     Quies Set Enrichment Score -0.3181956  7.782078e-02 96622 296070   96602
## 2      EnhG Set Enrichment Score -0.1601814  5.365994e-01  3640 296070    3640
## 9    Quies3 Set Enrichment Score -0.1629812  1.000000e+00  4113 296070    4112
## 4   EnhPois Set Enrichment Score -0.1866245  1.000000e+00 12317 296070   12317
## 8    Quies2 Set Enrichment Score -0.1213437  1.000000e+00  2603 296070    2601

As expected, the Tss and Enh chromatin states have positive enrichment scores, meaning these databases are associated with small values of the query (low DNA methylation). Conversely, the Het and Quies states have negative enrichment scores and are associated with high methylation levels, although the Quies result only reaches an FDR of about 0.08.
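The interpretation above can be applied systematically by labelling each state with the direction implied by the sign of its score and flagging significance. The sketch below copies a few rows from the chromHMM output above into a plain data.frame; the 0.05 FDR cutoff and the direction labels are illustrative choices, not part of the knowYourCG output.

```r
# A few rows copied from the chromHMM output above (dbname, estimate, FDR)
res_small <- data.frame(
  dbname   = c("Tss", "Enh", "Het", "Quies", "Tx"),
  estimate = c(0.8010037, 0.3277562, -0.1748840, -0.3181956, -0.4111345),
  FDR      = c(0, 0, 3.468312e-02, 7.782078e-02, 5.000525e-02)
)

fdr_cutoff <- 0.05  # conventional threshold, chosen here for illustration

# Positive score -> set enriched among small beta values (lower methylation)
res_small$direction <- ifelse(res_small$estimate > 0,
                              "lower methylation", "higher methylation")
res_small$significant <- res_small$FDR < fdr_cutoff
res_small
```

With this cutoff, Tss, Enh and Het are flagged as significant, while Quies and Tx (FDR just above 0.05) are not.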

SESSION INFO

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] sesame_1.25.1               knitr_1.49                 
##  [3] gprofiler2_0.2.3            SummarizedExperiment_1.37.0
##  [5] Biobase_2.67.0              GenomicRanges_1.59.1       
##  [7] GenomeInfoDb_1.43.2         IRanges_2.41.2             
##  [9] S4Vectors_0.45.2            MatrixGenerics_1.19.0      
## [11] matrixStats_1.4.1           sesameData_1.24.0          
## [13] ExperimentHub_2.15.0        AnnotationHub_3.15.0       
## [15] BiocFileCache_2.15.0        dbplyr_2.5.0               
## [17] BiocGenerics_0.53.3         generics_0.1.3             
## [19] knowYourCG_1.3.5           
## 
## loaded via a namespace (and not attached):
##  [1] DBI_1.2.3               bitops_1.0-9            rlang_1.1.4            
##  [4] magrittr_2.0.3          compiler_4.4.2          RSQLite_2.3.9          
##  [7] png_0.1-8               vctrs_0.6.5             reshape2_1.4.4         
## [10] stringr_1.5.1           pkgconfig_2.0.3         crayon_1.5.3           
## [13] fastmap_1.2.0           XVector_0.47.1          utf8_1.2.4             
## [16] rmarkdown_2.29          tzdb_0.4.0              preprocessCore_1.69.0  
## [19] UCSC.utils_1.3.0        purrr_1.0.2             bit_4.5.0.1            
## [22] xfun_0.49               cachem_1.1.0            jsonlite_1.8.9         
## [25] blob_1.2.4              DelayedArray_0.33.3     BiocParallel_1.41.0    
## [28] parallel_4.4.2          R6_2.5.1                bslib_0.8.0            
## [31] stringi_1.8.4           RColorBrewer_1.1-3      jquerylib_0.1.4        
## [34] Rcpp_1.0.13-1           wheatmap_0.2.0          readr_2.1.5            
## [37] Matrix_1.7-1            tidyselect_1.2.1        abind_1.4-8            
## [40] yaml_2.3.10             codetools_0.2-20        curl_6.0.1             
## [43] lattice_0.22-6          tibble_3.2.1            plyr_1.8.9             
## [46] withr_3.0.2             KEGGREST_1.47.0         evaluate_1.0.1         
## [49] Biostrings_2.75.3       pillar_1.10.0           BiocManager_1.30.25    
## [52] filelock_1.0.3          plotly_4.10.4           RCurl_1.98-1.16        
## [55] BiocVersion_3.21.1      hms_1.1.3               ggplot2_3.5.1          
## [58] munsell_0.5.1           scales_1.3.0            glue_1.8.0             
## [61] lazyeval_0.2.2          maketools_1.3.1         tools_4.4.2            
## [64] sys_3.4.3               data.table_1.16.4       buildtools_1.0.0       
## [67] grid_4.4.2              tidyr_1.3.1             AnnotationDbi_1.69.0   
## [70] colorspace_2.1-1        GenomeInfoDbData_1.2.13 cli_3.6.3              
## [73] rappdirs_0.3.3          S4Arrays_1.7.1          viridisLite_0.4.2      
## [76] dplyr_1.1.4             gtable_0.3.6            sass_0.4.9             
## [79] digest_0.6.37           SparseArray_1.7.2       ggrepel_0.9.6          
## [82] htmlwidgets_1.6.4       memoise_2.0.1           htmltools_0.5.8.1      
## [85] lifecycle_1.0.4         httr_1.4.7              mime_0.12              
## [88] bit64_4.5.2