Simple food over representation analysis (ORA)

Compiled date: 2024-11-28

Last edited: 2022-01-12

License: GPL-3

Installation

Run the following code to install the Bioconductor version of the package.

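# install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("fobitools")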

Load fobitools

library(fobitools)

You can also load some additional packages that will be very useful in this vignette.

library(dplyr)
library(kableExtra)

metaboliteUniverse and metaboliteList

In microarray studies, for example, almost all the genes of an organism can be measured in a sample, so it makes sense to perform an over representation analysis (ORA) against all the genes annotated in Gene Ontology (GO), since most GO terms will be represented by at least one gene on the microarray.

This is different in nutrimetabolomics. Targeted nutrimetabolomics studies typically quantify sets of about 200-500 diet-related metabolites, so it would not make sense to use all known metabolites (for example, those in HMDB or ChEBI) as the background of an ORA, as most of them would not have been quantified in the study.
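The effect of the universe on an ORA can be illustrated with a one-sided hypergeometric test, which is a common way to compute over representation p-values. The sketch below is only an illustration under assumed, hypothetical counts: the helper `ora_pval` and all numbers are made up for this example and are not taken from the fobitools source. It shows that the same overlap looks far more significant when the universe is inflated with metabolites that were never measured.

```r
# Hypothetical helper: one-sided hypergeometric p-value, P(X >= overlap).
ora_pval <- function(overlap, class_size, universe_size, n_selected) {
  stats::phyper(
    q = overlap - 1,                 # P(X > overlap - 1) equals P(X >= overlap)
    m = class_size,                  # universe metabolites annotated to the food
    n = universe_size - class_size,  # universe metabolites not annotated to it
    k = n_selected,                  # size of the selected metabolite list
    lower.tail = FALSE
  )
}

# Hypothetical counts, chosen only for illustration.
p_study <- ora_pval(overlap = 6, class_size = 14, universe_size = 300, n_selected = 10)
p_all   <- ora_pval(overlap = 6, class_size = 14, universe_size = 100000, n_selected = 10)

# The inflated universe yields a much smaller, overly optimistic p-value.
print(c(study_universe = p_study, inflated_universe = p_all))
```

Restricting the universe to the metabolites actually included in the analysis keeps the background probability of each food class realistic.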

In nutrimetabolomic studies, it may be more interesting to ask which foods or food groups are enriched (over represented) among the metabolites selected by the statistical analysis than which metabolic pathways are enriched, a question that makes more sense in genomics or other metabolomics studies.

The Food-Biomarker Ontology (FOBI) provides the biological knowledge needed to conduct these enrichment analyses in nutrimetabolomic studies, as it describes the relationships between foods and their associated dietary metabolites (Castellano-Escuder et al. 2020).

Accordingly, performing an ORA with the fobitools package requires two inputs: a metabolite universe (all metabolites included in the statistical analysis) and a list of selected metabolites (those that meet a statistical criterion).
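In a real study, both inputs would typically be derived from the table of statistical results. The following is a minimal sketch in base R, assuming a hypothetical data frame `stat_results` with made-up placeholder identifiers and p-values, and an assumed selection rule (Benjamini-Hochberg adjusted p-value below 0.05); none of these choices are prescribed by fobitools.

```r
# Hypothetical results of a per-metabolite statistical test.
# `id` holds placeholder FOBI-style identifiers (made up), `pvalue` the raw p-values.
stat_results <- data.frame(
  id     = c("FOBI:000001", "FOBI:000002", "FOBI:000003", "FOBI:000004", "FOBI:000005"),
  pvalue = c(0.001, 0.200, 0.004, 0.650, 0.040),
  stringsAsFactors = FALSE
)

# Universe: every metabolite that entered the statistical analysis.
metaboliteUniverse <- stat_results$id

# Selected list: metabolites passing the assumed adjusted p-value threshold.
stat_results$padj <- p.adjust(stat_results$pvalue, method = "BH")
metaboliteList <- stat_results$id[stat_results$padj < 0.05]

print(metaboliteList)
```

By construction the selected list is a subset of the universe, which is what an ORA expects.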

Here is an example that uses randomly sampled FOBI metabolites:

# select 300 random metabolites from FOBI
idx_universe <- sample(nrow(fobitools::idmap), 300, replace = FALSE)
metaboliteUniverse <- fobitools::idmap %>%
  dplyr::slice(idx_universe) %>%
  pull(FOBI)

# select 10 random metabolites from metaboliteUniverse that are associated with
# 'Red meat' (FOBI:0193), 'Lean meat' (FOBI:0185), 'egg food product' (FOODON:00001274),
# or 'grape (whole, raw)' (FOODON:03301702)
fobi_subset <- fobitools::fobi %>% # equivalent to `parse_fobi()`
  filter(FOBI %in% metaboliteUniverse) %>%
  filter(id_BiomarkerOf %in% c("FOBI:0193", "FOBI:0185", "FOODON:00001274", "FOODON:03301702")) %>%
  dplyr::slice(sample(nrow(.), 10, replace = FALSE))

metaboliteList <- fobi_subset %>%
  pull(FOBI)

# run the ORA on the food sub-ontology
fobitools::ora(metaboliteList = metaboliteList,
               metaboliteUniverse = metaboliteUniverse,
               subOntology = "food",
               pvalCutoff = 0.01)
className                            classSize  overlap  pval       padj       overlapMetabolites
grapefruit (whole, raw)              14         6        0.0000006  0.0000486  FOBI:030….
stem or spear vegetable              4          4        0.0000006  0.0000486  FOBI:030….
apple juice                          9          5        0.0000015  0.0000586  FOBI:030….
orange juice                         9          5        0.0000015  0.0000586  FOBI:030….
White fish                           5          4        0.0000031  0.0000597  FOBI:030….
herb                                 5          4        0.0000031  0.0000597  FOBI:030….
white bread                          5          4        0.0000031  0.0000597  FOBI:030….
white wine                           5          4        0.0000031  0.0000597  FOBI:030….
vinegar                              6          4        0.0000092  0.0001410  FOBI:030….
white sugar                          6          4        0.0000092  0.0001410  FOBI:030….
Red meat                             13         5        0.0000148  0.0002054  FOBI:030….
black tea leaf (dry)                 7          4        0.0000212  0.0002490  FOBI:030….
kale leaf (raw)                      7          4        0.0000212  0.0002490  FOBI:030….
black coffee                         3          3        0.0000269  0.0002747  FOBI:030….
black turtle bean (whole)            3          3        0.0000269  0.0002747  FOBI:030….
blueberry (whole, raw)               8          4        0.0000416  0.0003746  FOBI:030….
raspberry (whole, raw)               8          4        0.0000416  0.0003746  FOBI:030….
green tea leaf (dry)                 9          4        0.0000737  0.0005637  FOBI:030….
red tea                              9          4        0.0000737  0.0005637  FOBI:030….
red velvet                           9          4        0.0000737  0.0005637  FOBI:030….
lemon (whole, raw)                   18         5        0.0000914  0.0006658  FOBI:030….
carrot root (whole, raw)             10         4        0.0001208  0.0007791  FOBI:030….
dairy food product                   10         4        0.0001208  0.0007791  FOBI:030….
wine (food product)                  19         5        0.0001222  0.0007791  FOBI:030….
cherry (whole, raw)                  12         4        0.0002754  0.0015609  FOBI:030….
grain plant                          12         4        0.0002754  0.0015609  FOBI:030….
grain product                        12         4        0.0002754  0.0015609  FOBI:030….
coffee (liquid drink)                13         4        0.0003913  0.0020645  FOBI:030….
strawberry (whole, raw)              13         4        0.0003913  0.0020645  FOBI:030….
soybean (whole)                      24         5        0.0004148  0.0021156  FOBI:030….
black pepper food product            6          3        0.0005106  0.0025201  FOBI:030….
sweet potato vegetable food product  14         4        0.0005388  0.0025760  FOBI:030….
cumin seed (whole, dried)            15         4        0.0007225  0.0032514  FOBI:030….
tomato (whole, raw)                  15         4        0.0007225  0.0032514  FOBI:030….
almond (whole, raw)                  7          3        0.0008777  0.0036294  FOBI:030….
cocoa                                7          3        0.0008777  0.0036294  FOBI:030….
egg food product                     7          3        0.0008777  0.0036294  FOBI:030….
beer                                 16         4        0.0009474  0.0038144  FOBI:030….
ale                                  8          3        0.0013793  0.0054112  FOBI:030….
flour                                19         4        0.0019184  0.0070663  FOBI:030….
blackberry (whole, raw)              9          3        0.0020322  0.0070663  FOBI:030….
bread food product                   9          3        0.0020322  0.0070663  FOBI:030….
pear (whole, raw)                    9          3        0.0020322  0.0070663  FOBI:030….
whole bread                          9          3        0.0020322  0.0070663  FOBI:030….
black currant (whole, raw)           10         3        0.0028513  0.0092305  FOBI:030….
rye food product                     10         3        0.0028513  0.0092305  FOBI:030….
lentil (whole)                       3          2        0.0029562  0.0092305  FOBI:030….
soybean oil                          3          2        0.0029562  0.0092305  FOBI:030….
turnip (whole, raw)                  3          2        0.0029562  0.0092305  FOBI:030….
oregano (ground)                     11         3        0.0038505  0.0117826  FOBI:030….
olive (whole, ripe)                  12         3        0.0050422  0.0148358  FOBI:030….
tea food product                     12         3        0.0050422  0.0148358  FOBI:030….
cauliflower (whole, raw)             4          2        0.0058065  0.0161526  FOBI:030….
pea (whole)                          4          2        0.0058065  0.0161526  FOBI:030….
pomegranate (whole, raw)             4          2        0.0058065  0.0161526  FOBI:030….
grape (whole, raw)                   14         3        0.0080462  0.0219835  FOBI:030….
eggplant (whole, raw)                5          2        0.0095042  0.0250714  FOBI:030….
rice grain food product              5          2        0.0095042  0.0250714  FOBI:030….
meat food product                    15         3        0.0098772  0.0256137  FOBI:030….
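In the output above, every row has a pval below the 0.01 cutoff while several padj values exceed it, which suggests that pvalCutoff acts on the raw p-values (an inference from this output, not a documented guarantee). If control for multiple testing is wanted, the result can be filtered on padj afterwards. A minimal sketch, using four rows copied from the output into a hypothetical data frame `res` rather than a live `ora()` result:

```r
# Four rows copied from the ORA output above; `res` is a stand-in for the real result.
res <- data.frame(
  className = c("grapefruit (whole, raw)", "Red meat",
                "grape (whole, raw)", "meat food product"),
  pval      = c(0.0000006, 0.0000148, 0.0080462, 0.0098772),
  padj      = c(0.0000486, 0.0002054, 0.0219835, 0.0256137),
  stringsAsFactors = FALSE
)

# Keep only the classes whose adjusted p-value is below the chosen threshold.
res_strict <- res[res$padj < 0.01, ]

print(res_strict$className)
```

The 0.01 threshold on padj is an arbitrary choice for this illustration; any other cutoff can be substituted.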

Network visualization of metaboliteList terms

The fobi_graph function can then be used to visualize the metaboliteList terms together with their corresponding FOBI relationships.

terms <- fobi_subset %>%
  pull(id_code)

# create the associated graph
fobitools::fobi_graph(terms = terms, 
                      get = "anc",
                      labels = TRUE,
                      legend = TRUE)
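The get = "anc" argument presumably asks fobi_graph for the ancestors of the given terms, that is, the broader classes reached by following parent relationships upward in the ontology (this reading is an assumption; the function documentation lists the accepted values). The idea of collecting ancestors can be sketched in base R with a hypothetical child-to-parent map. The term names, the `parents` list and the `get_ancestors` helper below are illustrative only and are not part of fobitools.

```r
# Hypothetical child -> parent map; the relationships are illustrative only.
parents <- list(
  "Red meat"          = "meat food product",
  "Lean meat"         = "meat food product",
  "meat food product" = "food product",
  "food product"      = character(0)
)

# Collect all ancestors of a term by walking up the parent map recursively.
get_ancestors <- function(term, parents) {
  direct <- parents[[term]]
  if (length(direct) == 0) {
    return(character(0))  # root term, or a term missing from the map
  }
  unique(c(direct, unlist(lapply(direct, get_ancestors, parents = parents))))
}

print(get_ancestors("Red meat", parents))
```

A graph built from a set of terms plus their ancestors shows how specific foods roll up into broader food groups.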

Session Information

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] SummarizedExperiment_1.37.0   Biobase_2.67.0               
#>  [3] GenomicRanges_1.59.1          GenomeInfoDb_1.43.1          
#>  [5] IRanges_2.41.1                S4Vectors_0.45.2             
#>  [7] BiocGenerics_0.53.3           generics_0.1.3               
#>  [9] MatrixGenerics_1.19.0         matrixStats_1.4.1            
#> [11] metabolomicsWorkbenchR_1.17.0 POMA_1.17.6                  
#> [13] ggrepel_0.9.6                 rvest_1.0.4                  
#> [15] kableExtra_1.4.0              lubridate_1.9.3              
#> [17] forcats_1.0.0                 stringr_1.5.1                
#> [19] dplyr_1.1.4                   purrr_1.0.2                  
#> [21] readr_2.1.5                   tidyr_1.3.1                  
#> [23] tibble_3.2.1                  ggplot2_3.5.1                
#> [25] tidyverse_2.0.0               fobitools_1.15.1             
#> [27] BiocStyle_2.35.0             
#> 
#> loaded via a namespace (and not attached):
#>   [1] sys_3.4.3                   rstudioapi_0.17.1          
#>   [3] jsonlite_1.8.9              MultiAssayExperiment_1.33.1
#>   [5] magrittr_2.0.3              farver_2.1.2               
#>   [7] rmarkdown_2.29              zlibbioc_1.52.0            
#>   [9] vctrs_0.6.5                 memoise_2.0.1              
#>  [11] S4Arrays_1.7.1              htmltools_0.5.8.1          
#>  [13] curl_6.0.1                  qdapRegex_0.7.8            
#>  [15] tictoc_1.2.1                SparseArray_1.7.2          
#>  [17] sass_0.4.9                  parallelly_1.39.0          
#>  [19] bslib_0.8.0                 impute_1.81.0              
#>  [21] RecordLinkage_0.4-12.4      cachem_1.1.0               
#>  [23] buildtools_1.0.0            igraph_2.1.1               
#>  [25] lifecycle_1.0.4             pkgconfig_2.0.3            
#>  [27] Matrix_1.7-1                R6_2.5.1                   
#>  [29] fastmap_1.2.0               GenomeInfoDbData_1.2.13    
#>  [31] future_1.34.0               selectr_0.4-2              
#>  [33] digest_0.6.37               syuzhet_1.0.7              
#>  [35] colorspace_2.1-1            RSQLite_2.3.8              
#>  [37] labeling_0.4.3              fansi_1.0.6                
#>  [39] timechange_0.3.0            abind_1.4-8                
#>  [41] httr_1.4.7                  polyclip_1.10-7            
#>  [43] compiler_4.4.2              proxy_0.4-27               
#>  [45] bit64_4.5.2                 withr_3.0.2                
#>  [47] BiocParallel_1.41.0         viridis_0.6.5              
#>  [49] DBI_1.2.3                   ggforce_0.4.2              
#>  [51] MASS_7.3-61                 lava_1.8.0                 
#>  [53] DelayedArray_0.33.2         textclean_0.9.3            
#>  [55] tools_4.4.2                 future.apply_1.11.3        
#>  [57] nnet_7.3-19                 glue_1.8.0                 
#>  [59] grid_4.4.2                  fgsea_1.33.0               
#>  [61] gtable_0.3.6                lexicon_1.2.1              
#>  [63] tzdb_0.4.0                  class_7.3-22               
#>  [65] data.table_1.16.2           hms_1.1.3                  
#>  [67] tidygraph_1.3.1             xml2_1.3.6                 
#>  [69] utf8_1.2.4                  XVector_0.47.0             
#>  [71] pillar_1.9.0                limma_3.63.2               
#>  [73] vroom_1.6.5                 splines_4.4.2              
#>  [75] tweenr_2.0.3                lattice_0.22-6             
#>  [77] survival_3.7-0              bit_4.5.0                  
#>  [79] tidyselect_1.2.1            maketools_1.3.1            
#>  [81] knitr_1.49                  gridExtra_2.3              
#>  [83] svglite_2.1.3               xfun_0.49                  
#>  [85] graphlayouts_1.2.1          statmod_1.5.0              
#>  [87] stringi_1.8.4               UCSC.utils_1.3.0           
#>  [89] yaml_2.3.10                 evaluate_1.0.1             
#>  [91] codetools_0.2-20            evd_2.3-7.1                
#>  [93] ggraph_2.2.1                BiocManager_1.30.25        
#>  [95] cli_3.6.3                   ontologyIndex_2.12         
#>  [97] rpart_4.1.23                xtable_1.8-4               
#>  [99] systemfonts_1.1.0           struct_1.19.0              
#> [101] munsell_0.5.1               jquerylib_0.1.4            
#> [103] Rcpp_1.0.13-1               globals_0.16.3             
#> [105] parallel_4.4.2              blob_1.2.4                 
#> [107] ff_4.5.0                    listenv_0.9.1              
#> [109] viridisLite_0.4.2           ipred_0.9-15               
#> [111] scales_1.3.0                prodlim_2024.06.25         
#> [113] e1071_1.7-16                crayon_1.5.3               
#> [115] clisymbols_1.2.0            rlang_1.1.4                
#> [117] ada_2.0-5                   cowplot_1.1.3              
#> [119] fastmatch_1.1-4

References

Castellano-Escuder, Pol, Raúl González-Domínguez, David S Wishart, Cristina Andrés-Lacueva, and Alex Sánchez-Pla. 2020. “FOBI: An Ontology to Represent Food Intake Data and Associate It with Metabolomic Data.” Database 2020.