biodbChebi is a biodb extension package that implements a connector to the ChEBI database (Degtyarenko et al. 2007; Hastings et al. 2012).
Install using Bioconductor:
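The installation command itself is not shown here. A minimal sketch, assuming the standard Bioconductor workflow through the BiocManager package:

```r
# Install BiocManager first if it is not available yet.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install biodbChebi (and its dependencies, including biodb) from Bioconductor.
BiocManager::install("biodbChebi")
```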
The first step in using biodbChebi is to create an instance of the biodb class BiodbMain from the main biodb package. This is done by calling the constructor of the class:
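A minimal sketch of this step, assuming the instance is created with the `biodb::newInst()` helper and stored in a variable named `mybiodb`, the name used in the rest of this vignette:

```r
# Create the main biodb instance: sets up configuration and cache,
# and loads the installed extension packages.
mybiodb <- biodb::newInst()
```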
During this step the configuration is set up, the cache system is initialized and extension packages are loaded.
We will see at the end of this vignette that the biodb instance needs to be terminated with a call to the terminate() method.
In biodb, the connection to a database is handled by a connector instance that you get from the factory. biodbChebi implements a connector to the remote ChEBI database, so no database URL has to be provided when creating the connector:
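A minimal sketch of the connector creation, assuming `mybiodb` is the BiodbMain instance and that the connector is stored in a variable named `chebi`, as in the later calls of this vignette:

```r
# Ask the factory of the biodb instance for a ChEBI connector.
chebi <- mybiodb$getFactory()$createConn('chebi')
```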
## Loading required package: biodbChebi
Using accession numbers, we request entries from ChEBI:
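The request can be sketched as follows, assuming `chebi` is the ChEBI connector. The three accession numbers are the ones that appear in the data frame output further down in this vignette:

```r
# Retrieve three ChEBI entries by accession number; the result is a list
# of entry objects, in the same order as the requested accessions.
entries <- chebi$getEntry(c('2528', '17053', '15440'))
```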
## INFO [03:50:26.566] Create cache folder "/github/home/.cache/R/biodb/chebi-0c5076ac2a43d16dbce503a44b09f649" for "chebi-0c5076ac2a43d16dbce503a44b09f649".
Field values of an entry are retrieved with the getFieldValue() method. Here we retrieve the SMILES field of the first of the ChEBI entries obtained previously:
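A minimal sketch of the call, assuming `entries` is the list of entries obtained above and that the field is named `smiles`, as in the data frame column shown later:

```r
# Read the SMILES string of the first entry.
entries[[1]]$getFieldValue('smiles')
```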
## [1] "CC(C)CC[C@@H](O)[C@](C)(O)[C@H]1CC[C@@]2(O)C3=CC(=O)[C@@H]4C[C@@H](O)[C@@H](O)C[C@]4(C)[C@H]3[C@H](O)C[C@]12C"
We can convert an entry into a data frame:
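A minimal sketch, assuming the conversion is done with the entry method `getFieldsAsDataframe()` and that `entries` is the list obtained above. The variable name `x` is only illustrative:

```r
# Convert the first entry into a one-row data frame, one column per field.
x <- entries[[1]]$getFieldsAsDataframe()
x
```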
## accession charge formula
## 1 2528 0 C27H44O7
## inchi
## 1 InChI=1S/C27H44O7/c1-14(2)6-7-22(32)26(5,33)21-8-9-27(34)16-11-17(28)15-10-18(29)19(30)12-24(15,3)23(16)20(31)13-25(21,27)4/h11,14-15,18-23,29-34H,6-10,12-13H2,1-5H3/t15-,18+,19-,20+,21-,22+,23+,24-,25+,26+,27+/m0/s1
## inchikey kegg.compound.id molecular.mass monoisotopic.mass
## 1 LQGNCUXDDPRDJH-UKTRSHMFSA-N C08811 480.635 480.3087
## name n.stars
## 1 Ajugasterone C 2
## smiles
## 1 CC(C)CC[C@@H](O)[C@](C)(O)[C@H]1CC[C@@]2(O)C3=CC(=O)[C@@H]4C[C@@H](O)[C@@H](O)C[C@]4(C)[C@H]3[C@H](O)C[C@]12C
## chebi.id
## 1 2528
Building a data frame for a list of entries is also possible:
mybiodb$entriesToDataframe(entries, fields=c('accession', 'formula',
'molecular.mass', 'inchikey', 'kegg.compound.id'))
## accession formula molecular.mass inchikey
## 1 2528 C27H44O7 480.6350 LQGNCUXDDPRDJH-UKTRSHMFSA-N
## 2 17053 C4H7NO4 133.1027 CKLJMWTZIZZHCS-REOHCLBHSA-N
## 3 15440 C30H50 410.7300 YYGNTYWPHWGJRM-AAJYLUCBSA-N
## kegg.compound.id
## 1 C08811
## 2 C00049
## 3 C00751
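Searching by name and/or mass is done through the searchCompound() method. As the warning shown below indicates, searchCompound() has been deprecated since biodb 1.0.0 in favour of searchForEntries().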
Searching by name:
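A minimal sketch of a name search, assuming the same `name` and `max.results` values as in the combined name and mass search shown later in this section:

```r
# Search ChEBI for entries whose name matches 'aspartic',
# keeping at most three results.
ids <- chebi$searchCompound(name='aspartic', max.results=3)
ids
```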
## Warning: `searchCompound()` was deprecated in biodb 1.0.0.
## ℹ Please use `searchForEntries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "22660" "74653" "136722"
Searching by mass:
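A minimal sketch of a mass search, assuming the same mass parameters as in the combined name and mass search shown later in this section:

```r
# Search on the molecular mass field for a mass of 133,
# with a tolerance of 0.3, keeping at most three results.
ids <- chebi$searchCompound(mass=133, mass.field='molecular.mass',
                            mass.tol=0.3, max.results=3)
ids
```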
## [1] "231506" "30514" "37130"
Searching by name and mass:
ids <- chebi$searchCompound(name='aspartic', mass=133,
mass.field='molecular.mass', mass.tol=0.3, max.results=3)
Display the results in a data frame:
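One way to build such a data frame, sketched under the assumption that `ids` holds the ChEBI IDs returned by the search and that the displayed fields are `accession`, `molecular.mass` and `name`, as the column names in the output suggest. The variable name `df` is only illustrative:

```r
# Fetch the entries for the IDs found, then tabulate the selected fields.
df <- mybiodb$entriesToDataframe(chebi$getEntry(ids),
                                 fields=c('accession', 'molecular.mass', 'name'))
df
```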
## accession molecular.mass
## 1 22660 133.1027
## 2 17364 133.1027
## 3 17053 133.1027
## name
## 1 aspartic acid;(+-)-Aspartic acid;(R,S)-Aspartic acid;2-aminobutanedioic acid;Asp;D;DL-Aminosuccinic acid;DL-Asparagic acid
## 2 D-aspartic acid;(R)-2-aminobutanedioic acid;(R)-2-aminosuccinic acid;aspartic acid D-form;D-Asparaginsaeure;DAS
## 3 L-aspartic acid;(S)-2-aminobutanedioic acid;(S)-2-aminosuccinic acid;2-Aminosuccinic acid;Asp;ASPARTIC ACID;D;L-Asparaginsaeure
Converting CAS IDs to ChEBI IDs:
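A minimal sketch of the conversion with the connector method convCasToChebi(). The input CAS numbers are not shown in this text, so the two used below are assumptions chosen for illustration, and the IDs returned may differ from the output that follows:

```r
# Convert a vector of CAS registry numbers (assumed values) into ChEBI IDs.
chebi$convCasToChebi(c('87605-72-9', '51-41-2'))
```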
## [1] "10000" "18357"
If more than one ChEBI ID is found for a CAS ID, then a list of character vectors is returned:
## [[1]]
## [1] "40356" "28037"
To always get a list, whatever the number of ChEBI IDs found, set the simplify parameter to FALSE.
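A minimal sketch of the effect of `simplify`. The CAS number 51-41-2 (noradrenaline) is an assumption chosen for illustration, and the variable name `res` is hypothetical:

```r
# With simplify=FALSE the result is a list holding one character vector
# per input CAS number, even when a single ChEBI ID is found.
res <- chebi$convCasToChebi('51-41-2', simplify=FALSE)
res
```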
Converting InChI to ChEBI IDs is done with the convInchiToChebi() method, which works like convCasToChebi():
chebi$convInchiToChebi('InChI=1S/C8H11NO3/c9-4-8(12)5-1-2-6(10)7(11)3-5/h1-3,8,10-12H,4,9H2/t8-/m0/s1')
## [1] "18357"
You can also use an InChI key:
## [1] "28913"
Do not forget to terminate your biodb instance once you are done with it:
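A minimal sketch of this last step, assuming `mybiodb` is the BiodbMain instance created at the beginning of this vignette:

```r
# Terminate the biodb instance; this also deletes its connectors.
mybiodb$terminate()
```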
## INFO [03:50:51.442] Closing BiodbMain instance...
## INFO [03:50:51.443] Connector "chebi" deleted.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] biodbChebi_1.13.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 sass_0.4.9 generics_0.1.3
## [4] bitops_1.0-9 stringi_1.8.4 RSQLite_2.3.9
## [7] hms_1.1.3 digest_0.6.37 magrittr_2.0.3
## [10] evaluate_1.0.1 fastmap_1.2.0 blob_1.2.4
## [13] plyr_1.8.9 jsonlite_1.8.9 progress_1.2.3
## [16] DBI_1.2.3 BiocManager_1.30.25 httr_1.4.7
## [19] XML_3.99-0.17 jquerylib_0.1.4 cli_3.6.3
## [22] rlang_1.1.4 chk_0.9.2 crayon_1.5.3
## [25] dbplyr_2.5.0 bit64_4.5.2 withr_3.0.2
## [28] cachem_1.1.0 yaml_2.3.10 tools_4.4.2
## [31] memoise_2.0.1 biodb_1.15.0 dplyr_1.1.4
## [34] filelock_1.0.3 curl_6.0.1 buildtools_1.0.0
## [37] vctrs_0.6.5 R6_2.5.1 BiocFileCache_2.15.0
## [40] lifecycle_1.0.4 stringr_1.5.1 bit_4.5.0.1
## [43] pkgconfig_2.0.3 pillar_1.10.0 bslib_0.8.0
## [46] glue_1.8.0 Rcpp_1.0.13-1 lgr_0.4.4
## [49] xfun_0.49 tibble_3.2.1 tidyselect_1.2.1
## [52] sys_3.4.3 knitr_1.49 htmltools_0.5.8.1
## [55] rmarkdown_2.29 maketools_1.3.1 compiler_4.4.2
## [58] prettyunits_1.2.0 askpass_1.2.1 RCurl_1.98-1.16
## [61] openssl_2.3.0