MassBank Data for AnnotationHub

Authors: Johannes Rainer [cre] (https://orcid.org/0000-0002-6977-7147)
Compiled: Mon Nov 18 03:11:37 2024

Introduction

MassBank is an open access, community-maintained annotation database for small compounds. Its annotations comprise names, chemical formulas, exact masses and other chemical properties of small compounds (including metabolites, therapeutic agents and others). In addition, MassBank provides fragment spectra, which are crucial for the annotation of untargeted mass spectrometry data. The CompoundDb Bioconductor package supports conversion of MassBank data into the CompDb (SQLite) format, which simplifies distribution of the resource and its integration into Bioconductor-based annotation workflows.

Fetch MassBank CompDb Databases from AnnotationHub

The AHMassBank package provides the metadata for all CompDb SQLite databases with MassBank annotations in AnnotationHub. To get and use MassBank annotations, we first load (or update) the AnnotationHub resource.

library(AnnotationHub)
ah <- AnnotationHub()

Next we list all MassBank entries from AnnotationHub.

query(ah, "MassBank")
## AnnotationHub with 6 records
## # snapshotDate(): 2024-10-28
## # $dataprovider: MassBank
## # $species: NA
## # $rdataclass: CompDb
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH107048"]]' 
## 
##              title                                
##   AH107048 | MassBank CompDb for release 2021.03  
##   AH107049 | MassBank CompDb for release 2022.06  
##   AH111334 | MassBank CompDb for release 2022.12.1
##   AH116164 | MassBank CompDb for release 2023.06  
##   AH116165 | MassBank CompDb for release 2023.09  
##   AH116166 | MassBank CompDb for release 2023.11
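The listing contains one record per MassBank release. Below is a minimal sketch of how the most recent release could be picked programmatically rather than by reading the titles. latest_massbank_release() is a hypothetical helper, not part of AnnotationHub or AHMassBank. It assumes that every title ends in "release <version>", as in the listing returned by query().

```r
# Hypothetical helper (not part of AnnotationHub/AHMassBank): return the
# title of the most recent MassBank release from a character vector of
# record titles. Assumes each title ends with "release <version>".
latest_massbank_release <- function(titles) {
    # Keep only the version string that follows "release "
    rel <- sub("^.*release ", "", titles)
    # numeric_version compares component-wise, so "2023.11" > "2022.12.1"
    vers <- numeric_version(rel)
    titles[vers == max(vers)][1L]
}

# Titles as listed by query(ah, "MassBank") for the 2024-10-28 snapshot
titles <- c("MassBank CompDb for release 2021.03",
            "MassBank CompDb for release 2022.06",
            "MassBank CompDb for release 2022.12.1",
            "MassBank CompDb for release 2023.06",
            "MassBank CompDb for release 2023.09",
            "MassBank CompDb for release 2023.11")
latest_massbank_release(titles)
## [1] "MassBank CompDb for release 2023.11"
```

In a live session, the same helper could be applied to query(ah, "MassBank")$title. Which releases are available depends on the snapshot date of the hub.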

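We fetch the CompDb with MassBank annotations for release 2021.03. Accessing the record downloads the SQLite file (or loads it from the local cache) and returns a CompDb object. The CompoundDb package is loaded automatically.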

qr <- query(ah, c("MassBank", "2021.03"))
cdb <- qr[[1]]
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## require("CompoundDb")
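The retrieved CompDb object can be queried with the accessors of the CompoundDb package. The sketch below assumes a CompoundDb version that provides compounds() and a Spectra() method for CompDb objects (recent releases do). It extracts the database metadata, a table of compound annotations and the fragment spectra. The selected columns are an illustrative choice; compoundVariables(cdb) lists the ones actually available.

```r
library(AnnotationHub)
library(CompoundDb)
library(Spectra)

ah <- AnnotationHub()
# Fetch the CompDb for MassBank release 2021.03 (downloaded or loaded from cache)
cdb <- query(ah, c("MassBank", "2021.03"))[[1]]

# Metadata stored in the database (source, version, date, ...)
metadata(cdb)

# Compound annotations as a data.frame; columns chosen for illustration
cmps <- compounds(cdb, columns = c("name", "formula", "exactmass"))
head(cmps)

# Fragment spectra stored in the database as a Spectra object
sps <- Spectra(cdb)
sps
```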

Creating CompDb Databases from MassBank

MassBank provides its annotation database as a MySQL dump. To simplify its usage (also for users who are not familiar with MySQL or with the specific MassBank database layout), MassBank annotations can be converted into the SQLite-based CompDb format, which can be used directly with the CompoundDb package. The steps to convert a MassBank MySQL database into a CompDb SQLite database are described below.

First, the MySQL database dump needs to be downloaded from the MassBank GitHub page. The dump must then be imported into a local MySQL/MariaDB database server, for example with mysql -h localhost -u <username> -p < MassBank.sql, where <username> is the name of a user with write access to the database server.

The MassBank data can then be transferred from the MySQL database into a CompDb database with the massbank_to_compdb() helper function, which is provided as a script in the CompoundDb package.

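library(RMariaDB)
# Connect to the local MassBank database; replace the placeholders with
# your MySQL/MariaDB credentials
con <- dbConnect(MariaDB(), host = "localhost", username = "<username>",
                 password = "<password>", dbname = "MassBank")
# Load the conversion helper shipped with CompoundDb and run it
source(system.file("scripts", "massbank_to_compdb.R", package = "CompoundDb"))
massbank_to_compdb(con)
# Close the connection to the MySQL/MariaDB server
dbDisconnect(con)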

Session Information

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] CompoundDb_1.11.0       S4Vectors_0.45.2        AnnotationFilter_1.31.0
## [4] AnnotationHub_3.15.0    BiocFileCache_2.15.0    dbplyr_2.5.0           
## [7] BiocGenerics_0.53.3     generics_0.1.3          BiocStyle_2.35.0       
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1        dplyr_1.1.4             blob_1.2.4             
##  [4] filelock_1.0.3          Biostrings_2.75.1       bitops_1.0-9           
##  [7] fastmap_1.2.0           lazyeval_0.2.2          RCurl_1.98-1.16        
## [10] digest_0.6.37           mime_0.12               lifecycle_1.0.4        
## [13] cluster_2.1.6           ProtGenerics_1.39.0     rsvg_2.6.1             
## [16] KEGGREST_1.47.0         RSQLite_2.3.8           magrittr_2.0.3         
## [19] compiler_4.4.2          rlang_1.1.4             sass_0.4.9             
## [22] tools_4.4.2             utf8_1.2.4              yaml_2.3.10            
## [25] knitr_1.49              htmlwidgets_1.6.4       bit_4.5.0              
## [28] curl_6.0.1              xml2_1.3.6              BiocParallel_1.41.0    
## [31] withr_3.0.2             purrr_1.0.2             sys_3.4.3              
## [34] grid_4.4.2              fansi_1.0.6             colorspace_2.1-1       
## [37] ggplot2_3.5.1           MASS_7.3-61             scales_1.3.0           
## [40] cli_3.6.3               rmarkdown_2.29          crayon_1.5.3           
## [43] httr_1.4.7              rjson_0.2.23            DBI_1.2.3              
## [46] cachem_1.1.0            zlibbioc_1.52.0         parallel_4.4.2         
## [49] AnnotationDbi_1.69.0    BiocManager_1.30.25     XVector_0.47.0         
## [52] base64enc_0.1-3         vctrs_0.6.5             jsonlite_1.8.9         
## [55] IRanges_2.41.1          bit64_4.5.2             clue_0.3-66            
## [58] maketools_1.3.1         jquerylib_0.1.4         glue_1.8.0             
## [61] codetools_0.2-20        DT_0.33                 Spectra_1.17.0         
## [64] gtable_0.3.6            BiocVersion_3.21.1      GenomeInfoDb_1.43.0    
## [67] GenomicRanges_1.59.0    UCSC.utils_1.3.0        munsell_0.5.1          
## [70] tibble_3.2.1            pillar_1.9.0            rappdirs_0.3.3         
## [73] htmltools_0.5.8.1       GenomeInfoDbData_1.2.13 R6_2.5.1               
## [76] evaluate_1.0.1          Biobase_2.67.0          png_0.1-8              
## [79] memoise_2.0.1           bslib_0.8.0             MetaboCoreUtils_1.15.0 
## [82] Rcpp_1.0.13-1           gridExtra_2.3           ChemmineR_3.59.0       
## [85] xfun_0.49               fs_1.6.5                MsCoreUtils_1.19.0     
## [88] buildtools_1.0.0        pkgconfig_2.0.3