Overview of pathway network databases

Introduction

Load required packages

Load the required packages with the library function.

library(tidyverse)
library(ggplot2)

library(dce)

set.seed(42)

Pathway database overview

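dce provides access to topological pathway databases in a processed format, retrieved using graphite (Sales et al. 2012). The data frame dce::df_pathway_statistics summarizes each pathway by database, identifier, name, and node and edge counts. The ten largest pathways by node count are: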

dce::df_pathway_statistics %>%
  arrange(desc(node_num)) %>%
  head(10) %>%
  knitr::kable()
| database | pathway_id    | pathway_name                            | node_num | edge_num |
|:---------|:--------------|:----------------------------------------|---------:|---------:|
| reactome | R-HSA-162582  | Signaling Pathways                      |     2488 |    62068 |
| reactome | R-HSA-1430728 | Metabolism                              |     2047 |    85543 |
| reactome | R-HSA-392499  | Metabolism of proteins                  |     1894 |    52807 |
| reactome | R-HSA-1643685 | Disease                                 |     1774 |    55469 |
| reactome | R-HSA-168256  | Immune System                           |     1771 |    58277 |
| panther  | P00057        | Wnt signaling pathway                   |     1644 |   195344 |
| reactome | R-HSA-74160   | Gene expression (Transcription)         |     1472 |    32493 |
| reactome | R-HSA-597592  | Post-translational protein modification |     1394 |    26399 |
| kegg     | hsa:01100     | Metabolic pathways                      |     1343 |    22504 |
| reactome | R-HSA-73857   | RNA Polymerase II Transcription         |     1339 |    25294 |
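
The node and edge counts can be combined into a rough density measure, which shows how differently connected pathways of similar size can be. The following is a minimal sketch: the three rows are copied from the table above, while the edge_density column and the formula edges / (nodes * (nodes - 1)), which assumes a directed graph without self-loops, are illustrative assumptions and not part of dce.

```r
library(dplyr)

# Three rows copied from the table above.
top_pathways <- tibble::tribble(
  ~database,  ~pathway_id,    ~pathway_name,          ~node_num, ~edge_num,
  "reactome", "R-HSA-162582", "Signaling Pathways",        2488,     62068,
  "panther",  "P00057",       "Wnt signaling pathway",     1644,    195344,
  "kegg",     "hsa:01100",    "Metabolic pathways",        1343,     22504
)

# Assumption: edges are directed and self-loops are excluded, so the maximum
# possible number of edges is node_num * (node_num - 1).
pathway_density <- top_pathways %>%
  mutate(edge_density = edge_num / (node_num * (node_num - 1))) %>%
  arrange(desc(edge_density))

pathway_density
```

With dce installed, the same mutate call could be applied to dce::df_pathway_statistics directly.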

Let’s see how many pathways each database provides:

dce::df_pathway_statistics %>%
  count(database, sort = TRUE, name = "pathway_number") %>%
  knitr::kable()
| database     | pathway_number |
|:-------------|---------------:|
| pathbank     |          48685 |
| smpdb        |          48671 |
| reactome     |           2406 |
| wikipathways |            640 |
| kegg         |            323 |
| panther      |             94 |
| pharmgkb     |             90 |
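
Raw counts this uneven are easier to compare as proportions. As a small sketch, the counts from the table above are re-entered by hand and converted to shares of all pathways; the share column is an illustrative name, not something dce provides.

```r
library(dplyr)

# Counts copied from the table above.
pathway_counts <- tibble::tribble(
  ~database,      ~pathway_number,
  "pathbank",     48685,
  "smpdb",        48671,
  "reactome",     2406,
  "wikipathways", 640,
  "kegg",         323,
  "panther",      94,
  "pharmgkb",     90
)

# Fraction of all listed pathways contributed by each database.
pathway_shares <- pathway_counts %>%
  mutate(share = pathway_number / sum(pathway_number))

pathway_shares
```

PathBank and SMPDB together account for more than 96% of the listed pathways, so summaries computed over all databases at once are dominated by these two sources.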

Next, we can see how the pathway sizes are distributed for each database:

dce::df_pathway_statistics %>%
  ggplot(aes(x = node_num)) +
    geom_histogram(bins = 30) +
    facet_wrap(~ database, scales = "free") +
    theme_minimal()
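
A numeric summary can complement the histograms. The sketch below defines a hypothetical helper, summarise_pathway_sizes, which is not part of dce and only assumes a data frame with database and node_num columns. It is demonstrated on four rows copied from the first table, so the resulting values are not representative of the full databases.

```r
library(dplyr)

# Hypothetical helper (not part of dce): per-database size summary.
summarise_pathway_sizes <- function(stats) {
  stats %>%
    group_by(database) %>%
    summarise(
      pathways = n(),
      median_nodes = median(node_num),
      max_nodes = max(node_num),
      .groups = "drop"
    ) %>%
    arrange(desc(median_nodes))
}

# Demo input: four rows taken from the table of largest pathways.
toy_stats <- tibble::tribble(
  ~database,  ~node_num,
  "reactome", 2488,
  "reactome", 2047,
  "panther",  1644,
  "kegg",     1343
)

size_summary <- summarise_pathway_sizes(toy_stats)
size_summary
```

With dce installed, calling summarise_pathway_sizes(dce::df_pathway_statistics) should give the same kind of summary for all databases.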

Plotting pathways

Pathways can be retrieved by name with get_pathways and then plotted with plot_network:

pathways <- get_pathways(
  pathway_list = list(
    pathbank = c("Lactose Synthesis"),
    kegg = c("Fatty acid biosynthesis")
  )
)

lapply(pathways, function(x) {
  plot_network(
    as(x$graph, "matrix"),
    visualize_edge_weights = FALSE,
    arrow_size = 0.02,
    shadowtext = TRUE
  ) +
    ggtitle(x$pathway_name)
})
## [[1]]
## [figure: network plot of the first pathway]
## 
## [[2]]
## [figure: network plot of the second pathway]
Session information

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] dce_1.13.0                  graph_1.83.0               
##  [3] cowplot_1.1.3               lubridate_1.9.3            
##  [5] forcats_1.0.0               stringr_1.5.1              
##  [7] dplyr_1.1.4                 purrr_1.0.2                
##  [9] readr_2.1.5                 tidyr_1.3.1                
## [11] tibble_3.2.1                tidyverse_2.0.0            
## [13] TCGAutils_1.25.1            curatedTCGAData_1.27.0     
## [15] MultiAssayExperiment_1.31.5 SummarizedExperiment_1.35.1
## [17] Biobase_2.65.1              GenomicRanges_1.57.1       
## [19] GenomeInfoDb_1.41.1         IRanges_2.39.2             
## [21] S4Vectors_0.43.2            BiocGenerics_0.51.1        
## [23] MatrixGenerics_1.17.0       matrixStats_1.3.0          
## [25] ggraph_2.2.1                ggplot2_3.5.1              
## [27] BiocStyle_2.33.1           
## 
## loaded via a namespace (and not attached):
##   [1] bitops_1.0-8              httr_1.4.7               
##   [3] GenomicDataCommons_1.29.4 prabclus_2.3-3           
##   [5] Rgraphviz_2.49.0          numDeriv_2016.8-1.1      
##   [7] tools_4.4.1               utf8_1.2.4               
##   [9] R6_2.5.1                  vegan_2.6-8              
##  [11] mgcv_1.9-1                sn_2.1.1                 
##  [13] permute_0.9-7             withr_3.0.1              
##  [15] graphite_1.51.0           gridExtra_2.3            
##  [17] flexclust_1.4-2           cli_3.6.3                
##  [19] sandwich_3.1-0            labeling_0.4.3           
##  [21] sass_0.4.9                diptest_0.77-1           
##  [23] mvtnorm_1.3-0             robustbase_0.99-4        
##  [25] proxy_0.4-27              Rsamtools_2.21.1         
##  [27] FMStable_0.1-4            Linnorm_2.29.0           
##  [29] plotrix_3.8-4             limma_3.61.9             
##  [31] RSQLite_2.3.7             generics_0.1.3           
##  [33] BiocIO_1.15.2             gtools_3.9.5             
##  [35] wesanderson_0.3.7         Matrix_1.7-0             
##  [37] fansi_1.0.6               logger_0.3.0             
##  [39] abind_1.4-5               lifecycle_1.0.4          
##  [41] multcomp_1.4-26           yaml_2.3.10              
##  [43] edgeR_4.3.14              mathjaxr_1.6-0           
##  [45] SparseArray_1.5.31        BiocFileCache_2.13.0     
##  [47] Rtsne_0.17                grid_4.4.1               
##  [49] blob_1.2.4                gdata_3.0.0              
##  [51] ppcor_1.1                 bdsmatrix_1.3-7          
##  [53] ExperimentHub_2.13.1      crayon_1.5.3             
##  [55] lattice_0.22-6            GenomicFeatures_1.57.0   
##  [57] KEGGREST_1.45.1           sys_3.4.2                
##  [59] maketools_1.3.0           pillar_1.9.0             
##  [61] knitr_1.48                rjson_0.2.22             
##  [63] fpc_2.2-12                corpcor_1.6.10           
##  [65] codetools_0.2-20          mutoss_0.1-13            
##  [67] glue_1.7.0                RcppArmadillo_14.0.0-1   
##  [69] data.table_1.16.0         vctrs_0.6.5              
##  [71] png_0.1-8                 Rdpack_2.6.1             
##  [73] mnem_1.21.0               gtable_0.3.5             
##  [75] kernlab_0.9-33            assertthat_0.2.1         
##  [77] amap_0.8-19               cachem_1.1.0             
##  [79] xfun_0.47                 mime_0.12                
##  [81] rbibutils_2.2.16          S4Arrays_1.5.7           
##  [83] RcppEigen_0.3.4.0.2       tidygraph_1.3.1          
##  [85] survival_3.7-0            fastICA_1.2-5.1          
##  [87] statmod_1.5.0             TH.data_1.1-2            
##  [89] nlme_3.1-166              tsne_0.1-3.1             
##  [91] naturalsort_0.1.3         bit64_4.0.5              
##  [93] gmodels_2.19.1            filelock_1.0.3           
##  [95] bslib_0.8.0               colorspace_2.1-1         
##  [97] DBI_1.2.3                 nnet_7.3-19              
##  [99] mnormt_2.1.1              tidyselect_1.2.1         
## [101] bit_4.0.5                 compiler_4.4.1           
## [103] curl_5.2.2                rvest_1.0.4              
## [105] expm_1.0-0                xml2_1.3.6               
## [107] TFisher_0.2.0             ggdendro_0.2.0           
## [109] DelayedArray_0.31.11      shadowtext_0.1.4         
## [111] rtracklayer_1.65.0        harmonicmeanp_3.0.1      
## [113] sfsmisc_1.1-19            scales_1.3.0             
## [115] DEoptimR_1.1-3            RBGL_1.81.0              
## [117] rappdirs_0.3.3            apcluster_1.4.13         
## [119] digest_0.6.37             snowfall_1.84-6.3        
## [121] rmarkdown_2.28            XVector_0.45.0           
## [123] htmltools_0.5.8.1         pkgconfig_2.0.3          
## [125] highr_0.11                dbplyr_2.5.0             
## [127] fastmap_1.2.0             rlang_1.1.4              
## [129] UCSC.utils_1.1.0          farver_2.1.2             
## [131] jquerylib_0.1.4           zoo_1.8-12               
## [133] jsonlite_1.8.8            BiocParallel_1.39.0      
## [135] mclust_6.1.1              RCurl_1.98-1.16          
## [137] magrittr_2.0.3            modeltools_0.2-23        
## [139] GenomeInfoDbData_1.2.12   munsell_0.5.1            
## [141] Rcpp_1.0.13               viridis_0.6.5            
## [143] stringi_1.8.4             zlibbioc_1.51.1          
## [145] MASS_7.3-61               plyr_1.8.9               
## [147] AnnotationHub_3.13.3      org.Hs.eg.db_3.19.1      
## [149] flexmix_2.3-19            parallel_4.4.1           
## [151] ggrepel_0.9.5             Biostrings_2.73.1        
## [153] graphlayouts_1.1.1        splines_4.4.1            
## [155] multtest_2.61.0           hms_1.1.3                
## [157] locfit_1.5-9.10           qqconf_1.3.2             
## [159] igraph_2.0.3              fastcluster_1.2.6        
## [161] buildtools_1.0.0          reshape2_1.4.4           
## [163] BiocVersion_3.20.0        XML_3.99-0.17            
## [165] evaluate_0.24.0           metap_1.11               
## [167] pcalg_2.7-11              BiocManager_1.30.25      
## [169] tzdb_0.4.0                tweenr_2.0.3             
## [171] polyclip_1.10-7           clue_0.3-65              
## [173] BiocBaseUtils_1.7.3       ggforce_0.4.2            
## [175] restfulr_0.0.15           e1071_1.7-14             
## [177] viridisLite_0.4.2         class_7.3-22             
## [179] snow_0.4-4                ggm_2.5.1                
## [181] memoise_2.0.1             AnnotationDbi_1.67.0     
## [183] GenomicAlignments_1.41.0  ellipse_0.5.0            
## [185] cluster_2.1.6             timechange_0.3.0

References

Sales, Gabriele, Enrica Calura, Duccio Cavalieri, and Chiara Romualdi. 2012. “graphite - a Bioconductor Package to Convert Pathway Topology to Gene Network.” BMC Bioinformatics 13 (1): 20.