An introduction to biodbHmdb

Introduction

biodbHmdb is a biodb extension package that implements a connector to HMDB Metabolites.

We present here the different ways to search for HMDB (Wishart et al. 2012) entries with this package.

Note that biodb downloads the whole HMDB database and stores it on disk, since this is the only way to access HMDB programmatically. Any search on HMDB therefore currently runs on the local machine.

Installation

Install using Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbHmdb')

Initialization

The first step in using biodbHmdb is to create an instance of the biodb class BiodbMain from the main biodb package. This is done by calling the constructor of the class:

mybiodb <- biodb::newInst()

During this step the configuration is set up, the cache system is initialized and extension packages are loaded.

We will see at the end of this vignette that the biodb instance needs to be terminated with a call to the terminate() method.

Creating a connector to HMDB Metabolites

In biodb the connection to a database is handled by a connector instance that you can get from the factory. biodbHmdb implements a connector to a remote database. Here is the code to instantiate a connector:

conn <- mybiodb$getFactory()$createConn('hmdb.metabolites')
## Loading required package: biodbHmdb

For this vignette, we avoid downloading the full HMDB Metabolites database and instead use an extract containing a few entries:

dbExtract <- system.file("extdata", 'generated', "hmdb_extract.zip",
    package="biodbHmdb")
conn$setPropValSlot('urls', 'db.zip.url', dbExtract)
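
Before pointing the connector at the extract, it can be worth checking that the file was actually found, because system.file() returns an empty string rather than raising an error when the file or the package is missing. The following is a minimal sketch of such a check; the message text is illustrative and not part of biodbHmdb:

```r
# Locate the extract shipped with biodbHmdb (same call as above).
dbExtract <- system.file("extdata", "generated", "hmdb_extract.zip",
    package = "biodbHmdb")

# system.file() yields "" when nothing is found, so test for that explicitly.
if (nzchar(dbExtract) && file.exists(dbExtract)) {
    message("Using HMDB extract at ", dbExtract)
} else {
    message("HMDB extract not found; is biodbHmdb installed?")
}
```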

Accessing entries

To get the number of entries stored inside the database, run:

conn$getNbEntries()
## INFO  [04:22:55.655] Create cache folder "/github/home/.cache/R/biodb/hmdb.metabolites-9e094cf1e5980f9ac69d952373fb6d34" for "hmdb.metabolites-9e094cf1e5980f9ac69d952373fb6d34".
## INFO  [04:22:55.656] Downloading whole database of hmdb.metabolites.
## INFO  [04:22:55.657] Downloading HMDB metabolite database at "/tmp/RtmpXHjEe9/Rinst14fb39a57903/biodbHmdb/extdata/generated/hmdb_extract.zip" ...
## INFO  [04:22:55.661] Extract whole database of hmdb.metabolites.
## INFO  [04:22:55.661] Extracting content of downloaded HMDB metabolite database...
## [1] 2

To get some of the first entry IDs (accession numbers) from the database, run:

ids <- conn$getEntryIds(2)
ids
## [1] "HMDB0000001" "HMDB0000002"

To retrieve entries, use:

entries <- conn$getEntry(ids)
entries
## [[1]]
## Biodb HMDB Metabolites entry instance HMDB0000001.
## 
## [[2]]
## Biodb HMDB Metabolites entry instance HMDB0000002.

To convert a list of entries into a dataframe, run:

x <- mybiodb$entriesToDataframe(entries, compute=FALSE)
x
##     accession
## 1 HMDB0000001
## 2 HMDB0000002
##                                                          secondary.accessions
## 1 HMDB00001;HMDB0004935;HMDB0006703;HMDB0006704;HMDB04935;HMDB06703;HMDB06704
## 2                                             HMDB00002;HMDB0060172;HMDB60172
##   average.mass   cas.id chebi.id chemspider.id kegg.compound.id
## 1     169.1811 332-80-9    50599         83153           C01152
## 2      74.1249 109-76-2    15725           415           C00986
##   ncbi.pubchem.comp.id                                     comp.iupac.name.syst
## 1                92105 (2S)-2-amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid
## 2                  428                                      propane-1,3-diamine
##   comp.iupac.name.trad   formula
## 1    1 methylhistidine C7H11N3O2
## 2   α,ω-propanediamine   C3H10N2
##                                                                                     inchi
## 1 InChI=1S/C7H11N3O2/c1-10-3-5(9-4-10)2-6(8)7(11)12/h3-4,6H,2,8H2,1H3,(H,11,12)/t6-/m0/s1
## 2                                                      InChI=1S/C3H10N2/c4-2-1-3-5/h1-5H2
##                      inchikey monoisotopic.mass
## 1 BRMWTNUJHUMWMS-LURJTMIESA-N          169.0851
## 2 XFNJVJPLKCPIBV-UHFFFAOYSA-N           74.0844
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              name
## 1 1-Methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid;Pi-methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoate;1 Methylhistidine;1-Methyl histidine;1-Methyl-histidine;1-Methyl-L-histidine;1-MHis;1-N-Methyl-L-histidine;L-1-Methylhistidine;N1-Methyl-L-histidine;1-Methylhistidine dihydrochloride;Renal disease;Nephropathy;Non-insulin-dependent diabetes mellitus;Niddm;Adult-onset diabetes;Striated muscle;Fecal;Stool;Faecal;Faeces;Csf;Cytoplasma
## 2                                                             1,3-Diaminopropane;1,3-Propanediamine;1,3-Propylenediamine;Propane-1,3-diamine;tn;Trimethylenediamine;1,3-diamino-N-Propane;1,3-Trimethylenediamine;3-Aminopropylamine;a,W-Propanediamine;Trimethylenediamine hydrochloride;Trimethylenediamine dihydrochloride;1,3-Diaminepropane;Leukaemia;Digestion;Flora;Gramineae;Papilionoideae;Legume;Soy;Soya;Soybean;Soya bean;Cucurbits;Gourds;Fauna;Fecal;Stool;Faecal;Faeces;Cytoplasma
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         description
## 1 1-Methylhistidine, also known as 1-MHis, belongs to the class of organic compounds known as histidine and derivatives. Histidine and derivatives are compounds containing cysteine or a derivative thereof resulting from a reaction of cysteine at the amino group or the carboxy group, or from the replacement of any hydrogen of glycine by a heteroatom. 1-Methylhistidine is derived mainly from the anserine of dietary flesh sources, especially poultry. The enzyme, carnosinase, splits anserine into beta-alanine and 1-MHis. High levels of 1-MHis tend to inhibit the enzyme carnosinase and increase anserine levels. Conversely, genetic variants with deficient carnosinase activity in plasma show increased 1-MHis excretions when they consume a high meat diet. Reduced serum carnosinase activity is also found in patients with Parkinson's disease and multiple sclerosis and patients following a cerebrovascular accident. Vitamin E deficiency can lead to 1-methylhistidinuria from increased oxidative effects in skeletal muscle. 1-Methylhistidine is a biomarker for the consumption of meat, especially red meat.
## 2                                                                                                                                                                                                                                                          1,3-Diaminopropane, also known as DAP or trimethylenediamine, belongs to the class of organic compounds known as monoalkylamines. These are organic compounds containing a primary aliphatic amine group. 1,3-Diaminopropane is a stable, flammable, and highly hygroscopic fluid. It is a polyamine that is normally quite toxic if swallowed, inhaled, or absorbed through the skin. It is a catabolic byproduct of spermidine. It is also a precursor in the enzymatic synthesis of beta-alanine. 1,3-Diaminopropane is involved in the arginine/proline metabolic pathways and the beta-alanine metabolic pathway. 1,3-Diaminopropane has been detected, but not quantified in, several different foods, such as cassava, shiitakes, oyster mushrooms, muscadine grapes, and cinnamons. This could make 1,3-diaminopropane a potential biomarker for the consumption of these foods.
##                        smiles              comp.super.class hmdb.metabolites.id
## 1 CN1C=NC(C[C@H](N)C(O)=O)=C1 Organic acids and derivatives         HMDB0000001
## 2                       NCCCN    Organic nitrogen compounds         HMDB0000002
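
In the output above, fields that hold several values (such as secondary.accessions or name) are collapsed into a single string with ";" as separator. As a minimal sketch in base R, assuming no individual value itself contains a semicolon, such a column can be split back into one character vector per entry. The small data frame below only copies two columns from the output above so that the example is self-contained:

```r
# Stand-in for the data frame returned by entriesToDataframe(),
# restricted to two columns copied from the output above.
x <- data.frame(
    accession = c("HMDB0000001", "HMDB0000002"),
    secondary.accessions = c(
        "HMDB00001;HMDB0004935;HMDB0006703;HMDB0006704;HMDB04935;HMDB06703;HMDB06704",
        "HMDB00002;HMDB0060172;HMDB60172"),
    stringsAsFactors = FALSE)

# Split each collapsed string on the literal ";" separator.
secondary <- strsplit(x$secondary.accessions, ";", fixed = TRUE)
names(secondary) <- x$accession

# Number of secondary accessions per entry.
lengths(secondary)
```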

Searching by name

We use here the generic biodb method searchForEntries() to search for entries by name:

id <- conn$searchForEntries(list(name='1-Methylhistidine'), max.results=1)
id
## [1] "HMDB0000001"

We limit the search result to one entry with the max.results parameter.

The first parameter is the filtering criterion, expressed as a list whose single key (in our case) is the biodb field on which we want to filter. The value is the text we want to search for. See the documentation of searchForEntries() inside ?biodb::BiodbConn.

You can also search for several strings at once, in which case an entry is matched only if its field value contains all the specified strings:

conn$searchForEntries(list(name=c('propanoic', 'acid')), max.results=1)
## [1] "HMDB0000001"
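
The "contains all the specified strings" rule can be illustrated with a few lines of base R. This is only a sketch of the matching logic and not biodb's implementation: containsAll() is a hypothetical helper, and the literal, case-insensitive comparison it uses is an assumption. The name strings are shortened copies of the values shown earlier:

```r
# Hypothetical helper: TRUE when `value` contains every string in `patterns`.
# Literal (fixed) and case-insensitive matching is an assumption made here.
containsAll <- function(value, patterns) {
    all(vapply(patterns,
        function(p) grepl(tolower(p), tolower(value), fixed = TRUE),
        logical(1)))
}

# Shortened name fields of the two entries in the extract.
entryNames <- c(
    HMDB0000001 = "1-Methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid",
    HMDB0000002 = "1,3-Diaminopropane;1,3-Propanediamine;Propane-1,3-diamine")

# Keep the accessions whose name field contains both strings.
isMatch <- vapply(entryNames, containsAll, logical(1),
    patterns = c('propanoic', 'acid'))
matches <- names(entryNames)[isMatch]
matches
```

Only the first entry contains both 'propanoic' and 'acid', which agrees with the result returned by searchForEntries() above.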

To look at the values of the entry, you may convert it to a data frame:

entryDf <- conn$getEntry(id)$getFieldsAsDataframe(fields=c('accession', 'name'))

See the table below for the content of this data frame.

The entry returned when searching by name.
accession name
HMDB0000001 1-Methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid;Pi-methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoate;1 Methylhistidine;1-Methyl histidine;1-Methyl-histidine;1-Methyl-L-histidine;1-MHis;1-N-Methyl-L-histidine;L-1-Methylhistidine;N1-Methyl-L-histidine;1-Methylhistidine dihydrochloride;Renal disease;Nephropathy;Non-insulin-dependent diabetes mellitus;Niddm;Adult-onset diabetes;Striated muscle;Fecal;Stool;Faecal;Faeces;Csf;Cytoplasma

Searching inside the “description” field

Searching inside the description field can be done in the same way as for the name field. Here is a search with multiple strings to match:

id <- conn$searchForEntries(list(description=c('Parkinson', 'sclerosis')), max.results=1)
id
## [1] "HMDB0000001"

Again, you can look at the values of the entry through a data frame:

entryDf <- conn$getEntry(id)$getFieldsAsDataframe(fields=c('accession', 'name', 'description'))

See the table below for the content of this data frame.

The entry returned when searching by description.
accession name description
HMDB0000001 1-Methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid;Pi-methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoate;1 Methylhistidine;1-Methyl histidine;1-Methyl-histidine;1-Methyl-L-histidine;1-MHis;1-N-Methyl-L-histidine;L-1-Methylhistidine;N1-Methyl-L-histidine;1-Methylhistidine dihydrochloride;Renal disease;Nephropathy;Non-insulin-dependent diabetes mellitus;Niddm;Adult-onset diabetes;Striated muscle;Fecal;Stool;Faecal;Faeces;Csf;Cytoplasma 1-Methylhistidine, also known as 1-MHis, belongs to the class of organic compounds known as histidine and derivatives. Histidine and derivatives are compounds containing cysteine or a derivative thereof resulting from a reaction of cysteine at the amino group or the carboxy group, or from the replacement of any hydrogen of glycine by a heteroatom. 1-Methylhistidine is derived mainly from the anserine of dietary flesh sources, especially poultry. The enzyme, carnosinase, splits anserine into beta-alanine and 1-MHis. High levels of 1-MHis tend to inhibit the enzyme carnosinase and increase anserine levels. Conversely, genetic variants with deficient carnosinase activity in plasma show increased 1-MHis excretions when they consume a high meat diet. Reduced serum carnosinase activity is also found in patients with Parkinson’s disease and multiple sclerosis and patients following a cerebrovascular accident. Vitamin E deficiency can lead to 1-methylhistidinuria from increased oxidative effects in skeletal muscle. 1-Methylhistidine is a biomarker for the consumption of meat, especially red meat.

Closing biodb instance

When you are done with your biodb instance, you have to terminate it to ensure that its resources (file handles, database connections, etc.) are released:

mybiodb$terminate()
## INFO  [04:22:56.392] Closing BiodbMain instance...
## INFO  [04:22:56.397] Connector "hmdb.metabolites" deleted.
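
In scripts and functions, an error raised between newInst() and terminate() would leave the instance open. One possible pattern, sketched here with a hypothetical wrapper named withBiodb() that is not part of biodb, is to register the call to terminate() with on.exit() so that it runs even when the work fails. The usage part only runs if biodbHmdb is installed and relies solely on the calls shown in this vignette:

```r
# Hypothetical wrapper: create a biodb instance, run `fun` on it, and
# always terminate the instance afterwards, even if `fun` raises an error.
withBiodb <- function(fun) {
    mybiodb <- biodb::newInst()
    on.exit(mybiodb$terminate(), add = TRUE)
    fun(mybiodb)
}

# Usage, guarded so that the example is skipped when biodbHmdb is missing.
if (requireNamespace("biodbHmdb", quietly = TRUE)) {
    nbEntries <- withBiodb(function(mybiodb) {
        conn <- mybiodb$getFactory()$createConn('hmdb.metabolites')
        dbExtract <- system.file("extdata", 'generated', "hmdb_extract.zip",
            package = "biodbHmdb")
        conn$setPropValSlot('urls', 'db.zip.url', dbExtract)
        conn$getNbEntries()
    })
}
```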

Session information

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] biodbHmdb_1.13.0 BiocStyle_2.33.1
## 
## loaded via a namespace (and not attached):
##  [1] rappdirs_0.3.3       sass_0.4.9           utf8_1.2.4          
##  [4] generics_0.1.3       stringi_1.8.4        RSQLite_2.3.7       
##  [7] hms_1.1.3            digest_0.6.37        magrittr_2.0.3      
## [10] evaluate_1.0.1       fastmap_1.2.0        blob_1.2.4          
## [13] plyr_1.8.9           jsonlite_1.8.9       zip_2.3.1           
## [16] progress_1.2.3       DBI_1.2.3            BiocManager_1.30.25 
## [19] httr_1.4.7           fansi_1.0.6          XML_3.99-0.17       
## [22] jquerylib_0.1.4      cli_3.6.3            rlang_1.1.4         
## [25] chk_0.9.2            crayon_1.5.3         dbplyr_2.5.0        
## [28] bit64_4.5.2          withr_3.0.2          cachem_1.1.0        
## [31] yaml_2.3.10          tools_4.4.1          memoise_2.0.1       
## [34] biodb_1.13.0         dplyr_1.1.4          filelock_1.0.3      
## [37] curl_5.2.3           buildtools_1.0.0     vctrs_0.6.5         
## [40] R6_2.5.1             BiocFileCache_2.13.2 lifecycle_1.0.4     
## [43] stringr_1.5.1        bit_4.5.0            pkgconfig_2.0.3     
## [46] pillar_1.9.0         bslib_0.8.0          glue_1.8.0          
## [49] Rcpp_1.0.13          lgr_0.4.4            xfun_0.48           
## [52] tibble_3.2.1         tidyselect_1.2.1     sys_3.4.3           
## [55] knitr_1.48           htmltools_0.5.8.1    rmarkdown_2.28      
## [58] maketools_1.3.1      compiler_4.4.1       prettyunits_1.2.0   
## [61] askpass_1.2.1        openssl_2.2.2

References

Wishart, David S., Timothy Jewison, An Chi Guo, Michael Wilson, Craig Knox, Yifeng Liu, Yannick Djoumbou, et al. 2012. “HMDB 3.0—The Human Metabolome Database in 2013.” Nucleic Acids Research 41 (D1): D801–7. https://doi.org/10.1093/nar/gks1065.