To see a full list of datasets, call the omnipath_show_db function:
## # A tibble: 23 × 10
## name last_used lifetime package loader loader_param latest_param loaded db key
## <chr> <dttm> <dbl> <chr> <chr> <list> <list> <lgl> <list> <chr>
## 1 Gene Ontol… 2024-11-18 03:29:15 300 Omnipa… go_on… <named list> <named list> TRUE <named list> go_b…
## 2 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_f…
## 3 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_a…
## 4 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_a…
## 5 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_s…
## 6 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_c…
## 7 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_d…
## 8 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_c…
## 9 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_m…
## 10 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_p…
## # ℹ 13 more rows
It returns a tibble in which each dataset has a human-readable name and a key that can be used to refer to it. The table also shows whether the dataset is currently loaded, when it was last used, and the loader function with its arguments.
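The printed tibble is cut off after ten rows and the keys are abbreviated. A minimal sketch for viewing the complete listing, assuming OmnipathR is installed; the column names are taken from the header printed above, and `cols` is only an illustrative variable:

```r
# Columns of interest; names as they appear in the printed tibble header
cols <- c("name", "key", "loaded", "last_used", "loader")

if (requireNamespace("OmnipathR", quietly = TRUE)) {
    db_table <- OmnipathR::omnipath_show_db()
    # Plain bracket subsetting avoids extra dependencies;
    # n = Inf asks the tibble print method to show every row
    print(db_table[, cols], n = Inf)
}
```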
Datasets can be accessed with the get_db function. Ideally, you should call this function every time you use the dataset. The first call loads the dataset; subsequent calls return the already loaded copy. This way each access is registered and extends the expiry time. Let’s load the human UniProt-GeneSymbol table. Its key, as reported by omnipath_show_db, is up_gs.
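The call itself is not shown here. Assuming OmnipathR is installed and the session can download data, it would look roughly like this (`key` is an illustrative variable name):

```r
# Key of the human UniProt-GeneSymbol table, as named in the text
key <- "up_gs"

if (requireNamespace("OmnipathR", quietly = TRUE)) {
    # First call: the table is loaded (this may require network access).
    # Later calls: the copy already held in memory is returned.
    up_gs <- OmnipathR::get_db(key)
    print(up_gs)
}
```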
## # A tibble: 20,471 × 2
## From To
## <chr> <chr>
## 1 A0A087X1C5 CYP2D7
## 2 A0A0B4J2F0 PIGBOS1
## 3 A0A0B4J2F2 SIK1B
## 4 A0A0C5B5G6 MT-RNR1
## 5 A0A0K2S4Q6 CD300H
## 6 A0A0U1RRE5 NBDY
## 7 A0A1B0GTW7 CIROP
## 8 A0AV02 SLC12A8
## 9 A0AV96 RBM47
## 10 A0AVF1 IFT56
## # ℹ 20,461 more rows
This dataset is a two-column data frame of SwissProt IDs and gene symbols. Looking at the datasets again, we find that this dataset is now loaded and its last_used timestamp is set to the time we called get_db:
## # A tibble: 23 × 10
## name last_used lifetime package loader loader_param latest_param loaded db key
## <chr> <dttm> <dbl> <chr> <chr> <list> <list> <lgl> <list> <chr>
## 1 Gene Ontol… 2024-11-18 03:29:15 300 Omnipa… go_on… <named list> <named list> TRUE <named list> go_b…
## 2 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_f…
## 3 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_a…
## 4 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_a…
## 5 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_s…
## 6 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_c…
## 7 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_d…
## 8 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_c…
## 9 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_m…
## 10 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_p…
## # ℹ 13 more rows
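Because the printout stops after ten rows, the row for up_gs is not visible above. A small sketch for checking the status of a single dataset, assuming OmnipathR is installed and that the column names match the printed header:

```r
key <- "up_gs"
status_cols <- c("name", "key", "loaded", "last_used")

if (requireNamespace("OmnipathR", quietly = TRUE)) {
    db_table <- OmnipathR::omnipath_show_db()
    # Keep only the row of the requested dataset
    print(db_table[db_table$key == key, status_cols])
}
```

After a call to get_db with this key, the loaded column should read TRUE and last_used should hold the time of the most recent access.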
The above table also contains a reference to the dataset and the arguments passed to the loader function:
## # A tibble: 20,471 × 2
## From To
## <chr> <chr>
## 1 A0A087X1C5 CYP2D7
## 2 A0A0B4J2F0 PIGBOS1
## 3 A0A0B4J2F2 SIK1B
## 4 A0A0C5B5G6 MT-RNR1
## 5 A0A0K2S4Q6 CD300H
## 6 A0A0U1RRE5 NBDY
## 7 A0A1B0GTW7 CIROP
## 8 A0AV02 SLC12A8
## 9 A0AV96 RBM47
## 10 A0AVF1 IFT56
## # ℹ 20,461 more rows
## $to
## [1] "genesymbol"
##
## $organism
## [1] 9606
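The db and latest_param columns are list columns, so each entry has to be extracted with double brackets. The sketch below shows one way to do this by key; `list_col_entry` is a hypothetical helper written for this example, not part of OmnipathR:

```r
# Hypothetical helper: pull one entry out of a list column by dataset key
list_col_entry <- function(tbl, key, column) {
    i <- which(tbl$key == key)
    stopifnot(length(i) == 1L)  # the key must match exactly one row
    tbl[[column]][[i]]
}

if (requireNamespace("OmnipathR", quietly = TRUE)) {
    db_table <- OmnipathR::omnipath_show_db()
    # The dataset itself, if it has been loaded with get_db
    print(list_col_entry(db_table, "up_gs", "db"))
    # The arguments most recently passed to the loader
    print(list_col_entry(db_table, "up_gs", "latest_param"))
}
```

Looking rows up by key rather than by position keeps the code valid if the order of the dataset definitions changes.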
If we call get_db again, the timestamp is updated, resetting the expiry counter:
## # A tibble: 23 × 10
## name last_used lifetime package loader loader_param latest_param loaded db key
## <chr> <dttm> <dbl> <chr> <chr> <list> <list> <lgl> <list> <chr>
## 1 Gene Ontol… 2024-11-18 03:29:15 300 Omnipa… go_on… <named list> <named list> TRUE <named list> go_b…
## 2 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_f…
## 3 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_a…
## 4 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_a…
## 5 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_s…
## 6 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_c…
## 7 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_d…
## 8 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_c…
## 9 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_m…
## 10 Gene Ontol… NA 300 Omnipa… go_on… <named list> <lgl [1]> FALSE <lgl [1]> go_p…
## # ℹ 13 more rows
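The load-once, refresh-on-access behaviour can be illustrated with a toy model in base R. This is a sketch of the idea only, not the package's implementation: `registry`, `toy_get_db` and `slow_loader` are hypothetical names, the lifetime is assumed to be in seconds, and expiry is checked lazily at access time, which may differ from how OmnipathR removes expired datasets.

```r
# Toy registry: one environment holding an entry per dataset key
registry <- new.env(parent = emptyenv())

toy_get_db <- function(key, loader, lifetime = 300, now = Sys.time()) {
    entry <- registry[[key]]
    expired <- !is.null(entry) &&
        as.numeric(difftime(now, entry$last_used, units = "secs")) > entry$lifetime
    if (is.null(entry) || expired) {
        # First access, or the previous copy has expired: run the loader
        entry <- list(db = loader(), lifetime = lifetime)
    }
    # Every access refreshes the timestamp, which restarts the expiry countdown
    entry$last_used <- now
    registry[[key]] <- entry
    entry$db
}

# Stand-in loader that counts how often it is actually run
load_count <- 0
slow_loader <- function() {
    load_count <<- load_count + 1
    data.frame(From = "A0AV02", To = "SLC12A8")
}

t0 <- as.POSIXct("2024-11-18 03:29:15", tz = "UTC")
toy_get_db("up_gs", slow_loader, now = t0)        # loads the data
toy_get_db("up_gs", slow_loader, now = t0 + 120)  # cached; timestamp refreshed
toy_get_db("up_gs", slow_loader, now = t0 + 400)  # 280 s since last access: cached
toy_get_db("up_gs", slow_loader, now = t0 + 800)  # 400 s since last access: reloaded
load_count
```

In this model the third call still hits the cache although 400 seconds have passed since the first load, because the second call moved the timestamp forward.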
The loaded datasets live in an environment which belongs to the OmnipathR package. Normally users don’t need to access this environment. As we see below, omnipath_show_db presents all the information available by looking directly at the environment:
## $name
## [1] "UniProt-GeneSymbol table"
##
## $last_used
## [1] "2024-11-18 03:31:18 UTC"
##
## $lifetime
## [1] 300
##
## $package
## [1] "OmnipathR"
##
## $loader
## [1] "uniprot_full_id_mapping_table"
##
## $loader_param
## $loader_param$to
## [1] "genesymbol"
##
## $loader_param$organism
## [1] 9606
##
##
## $latest_param
## $latest_param$to
## [1] "genesymbol"
##
## $latest_param$organism
## [1] 9606
##
##
## $loaded
## [1] TRUE
##
## $db
## # A tibble: 20,471 × 2
## From To
## <chr> <chr>
## 1 A0A087X1C5 CYP2D7
## 2 A0A0B4J2F0 PIGBOS1
## 3 A0A0B4J2F2 SIK1B
## 4 A0A0C5B5G6 MT-RNR1
## 5 A0A0K2S4Q6 CD300H
## 6 A0A0U1RRE5 NBDY
## 7 A0A1B0GTW7 CIROP
## 8 A0AV02 SLC12A8
## 9 A0AV96 RBM47
## 10 A0AVF1 IFT56
## # ℹ 20,461 more rows
The default expiry of datasets is given by the option omnipath.db_lifetime. If you call omnipath_save_config, this option is saved to the default config file and applies to all subsequent sessions. Otherwise it’s valid only in the current session.
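A short sketch of changing the default, using the option name as given above. The value 600 is an arbitrary example, the unit is assumed to be seconds, and `persist` is an illustrative flag that guards the write to the config file:

```r
# Load the namespace first, in case package start-up sets its own defaults
has_omnipath <- requireNamespace("OmnipathR", quietly = TRUE)

# Set the default lifetime for the current session
options(omnipath.db_lifetime = 600)
getOption("omnipath.db_lifetime")

# Set to TRUE to write the current options to the default config file,
# so that they also apply in future sessions
persist <- FALSE
if (persist && has_omnipath) {
    OmnipathR::omnipath_save_config()
}
```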
The built-in dataset definitions are in a JSON file shipped with the package. The easiest way to view it is through the git web interface.
There is currently no API for adding custom dataset definitions, but it would be easy to implement. It would be a matter of providing a JSON similar to the one above, or calling a function. Please open an issue if you are interested in this feature.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Matrix_1.7-1 OmnipathR_3.15.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] xfun_0.49 bslib_0.8.0 lattice_0.22-6 tzdb_0.4.0 vctrs_0.6.5
## [6] tools_4.4.2 generics_0.1.3 parallel_4.4.2 curl_6.0.1 tibble_3.2.1
## [11] fansi_1.0.6 RSQLite_2.3.8 blob_1.2.4 pkgconfig_2.0.3 R.oo_1.27.0
## [16] checkmate_2.3.2 readxl_1.4.3 lifecycle_1.0.4 compiler_4.4.2 stringr_1.5.1
## [21] progress_1.2.3 htmltools_0.5.8.1 sys_3.4.3 buildtools_1.0.0 sass_0.4.9
## [26] yaml_2.3.10 later_1.3.2 pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
## [31] tidyr_1.3.1 R.utils_2.12.3 cachem_1.1.0 zip_2.3.1 tidyselect_1.2.1
## [36] rvest_1.0.4 digest_0.6.37 stringi_1.8.4 dplyr_1.1.4 purrr_1.0.2
## [41] maketools_1.3.1 grid_4.4.2 fastmap_1.2.0 cli_3.6.3 logger_0.4.0
## [46] magrittr_2.0.3 XML_3.99-0.17 utf8_1.2.4 readr_2.1.5 withr_3.0.2
## [51] prettyunits_1.2.0 backports_1.5.0 rappdirs_0.3.3 bit64_4.5.2 lubridate_1.9.3
## [56] timechange_0.3.0 rmarkdown_2.29 httr_1.4.7 igraph_2.1.1 bit_4.5.0
## [61] R.matlab_3.7.0 cellranger_1.1.0 R.methodsS3_1.8.2 hms_1.1.3 memoise_2.0.1
## [66] evaluate_1.0.1 knitr_1.49 rlang_1.1.4 Rcpp_1.0.13-1 glue_1.8.0
## [71] DBI_1.2.3 selectr_0.4-2 BiocManager_1.30.25 xml2_1.3.6 vroom_1.6.5
## [76] jsonlite_1.8.9 R6_2.5.1