---
title: "Exploring the MTox700+ library"
author: "Gavin Rhys Lloyd"
date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
    number_sections: true
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Exploring the MTox700+ library}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    message = FALSE,
    warning = FALSE,
    fig.align = "center"
)

.DT <- function(x) {
    dt_options <- list(
        scrollX = TRUE,
        pageLength = 6,
        dom = "t",
        initComplete = DT::JS(
            "function(settings, json) {",
            "$(this.api().table().header()).css({'font-size':'10pt'});",
            "}"
        )
    )

    x %>%
        DT::datatable(options = dt_options, rownames = FALSE) %>%
        DT::formatStyle(
            columns = colnames(x),
            fontSize = "10pt"
        )
}

library(BiocStyle)
```

# Getting Started

The latest versions of `r Biocpkg("struct")` and `MetMashR` that are compatible
with your current R version can be installed using BiocManager.

```{r,eval = FALSE, include = TRUE}
# install BiocManager if not present
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# install MetMashR and dependencies
BiocManager::install("MetMashR")
```

Once installed you can activate the packages in the usual way:

```{r, eval=TRUE, include=FALSE}
suppressWarnings(suppressPackageStartupMessages({
    # load the packages
    library(MetMashR)
    library(ggplot2)
    library(structToolbox)
    library(dplyr)
    library(DT)
}))
```

```{r, eval=FALSE, include=TRUE}
# load the packages
library(MetMashR)
library(ggplot2)
library(structToolbox)
library(dplyr)
library(DT)
```

# Introduction

> MTox700+ is a list of toxicologically relevant metabolites derived from
publications, public databases and relevant toxicological assays.

In this vignette we import the MTox700+ database and combine, merge and "mash"
it with other databases to explore its contents and its coverage of chemical,
biological and toxicological space.

# Importing the MTox700+ database

The MTox700+ database can be imported using the `MTox700plus_database` object.
It can be imported to a data.frame using the `read_database` method.

```{r}
# prep object
MT <- MTox700plus_database(
    version = "latest",
    tag = "MTox700+"
)

# import
df <- read_database(MT)

# show contents
.DT(df)
```

```{r}
# prepare workflow that uses MTox700+ as a source
M <-
    import_source() +
    trim_whitespace(
        column_name = ".all",
        which = "both",
        whitespace = "[\\h\\v]"
    )

# apply
M <- model_apply(M, MT)
```

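The whitespace clean-up applied in the workflow above can be illustrated outside of `MetMashR` with base R. This is only a minimal sketch on a made-up table: it assumes the `which` and `whitespace` values mean the same as the arguments of the same name in base R's `trimws`, and it is not the package's own implementation.

```r
# Toy table standing in for an imported database (all values are made up)
toy <- data.frame(
    metabolite_name = c("  Glycolic acid", "Citric acid\t", " Urea "),
    inchikey = c("AAA ", " BBB", "CCC"),
    stringsAsFactors = FALSE
)

# Trim horizontal and vertical whitespace from both ends of every column.
# Assigning into trimmed[] keeps the data.frame structure.
trimmed <- toy
trimmed[] <- lapply(
    toy,
    function(x) trimws(x, which = "both", whitespace = "[\\h\\v]")
)

trimmed$metabolite_name
trimmed$inchikey
```

Trimming before any matching step matters because identifiers such as inchikeys are compared as exact strings, so a stray trailing space would prevent a match.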
# Exploring the chemical space

The chemical (or "metabolite") space covered by the MTox700+ database can be
explored in several ways using the data included in the database.

For example, we can generate images of the molecules using the SMILES included
in the database. Here we generate images of the first 6 metabolites in the
database.

```{r}
# prepare chart
C <- openbabel_structure(
    smiles_column = "smiles",
    row_index = 1,
    scale_to_fit = FALSE,
    image_size = 300,
    title_column = "metabolite_name",
    view_port = 400
)

# first six
G <- list()
for (k in 1:6) {
    # set row idx
    C$row_index <- k
    # plot
    G[[k]] <- chart_plot(C, predicted(M))
}

# layout
cowplot::plot_grid(plotlist = G, nrow = 2)
```

The MTox700+ database also contains information about the structural
classification of the metabolites based on ChemOnt (a chemical taxonomy) and
ClassyFire (software to compute the taxonomy of a structure)
[10.1186/s13321-016-0174-y].

In this plot we show the number of metabolites in the MTox700+ database that
are assigned to a "superclass" of molecules.

```{r,fig.height=7.5,fig.width=7.5}
# initialise chart object
C <- annotation_bar_chart(
    factor_name = "superclass",
    label_rotation = TRUE,
    label_location = "outside",
    label_type = "percent",
    legend = TRUE
)

# plot
g <- chart_plot(C, predicted(M)) + ylim(c(0, 600)) +
    guides(fill = guide_legend(nrow = 6, title = element_blank())) +
    theme(legend.position = "bottom", legend.margin = margin())

# layout
leg <- cowplot::get_legend(g)
cowplot::plot_grid(g + theme(legend.position = "none"), leg,
    nrow = 2,
    rel_heights = c(75, 25)
)
```

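The counts and percentages shown on a bar chart of this kind can be tabulated directly with base R. The sketch below uses a made-up `superclass` vector rather than the real MTox700+ column, and it only illustrates the arithmetic behind the labels, not how the chart object computes them.

```r
# Made-up superclass assignments for six hypothetical metabolites
superclass <- c(
    "Organic acids", "Lipids", "Organic acids",
    "Benzenoids", "Lipids", "Organic acids"
)

# Count the records per superclass and convert to percentages
counts <- table(superclass)
percent <- round(100 * prop.table(counts), 1)

# Collect into a small summary table (levels are in alphabetical order)
summary_df <- data.frame(
    superclass = names(counts),
    n = as.integer(counts),
    percent = as.numeric(percent),
    stringsAsFactors = FALSE
)

summary_df
```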
# Exploring the biological space

To explore the biological space covered by the metabolites in MTox700+ we need
to mash the database with additional information about the biological pathways
that the metabolites are part of.

We use [PathBank](https://pathbank.org/) for this purpose. A `struct_database`
object for PathBank is already included in `MetMashR`.

## Importing PathBank

`MetMashR` provides the `PathBank_metabolite_database` object to import the
PathBank database. You can choose to import:

- The "primary" database. This is a smaller version of the database restricted
to primary pathways.
- The "complete" database, which includes all pathways in the database.

The "complete" database is a download of more than 50 MB, and is larger than
1 GB once unzipped. Unzipping and caching of the database is handled by
`r Biocpkg("BiocFileCache")`.

For the vignette we restrict to the "primary" PathBank database to keep file
sizes and downloads to a minimum.

We can use the database in two ways:

1. convert it to a source and "mash" it with other sources
2. use it as a lookup table to add information to an existing source.

To explore the biological space covered by MTox700+ we will do both.

## Comparing PathBank and MTox700+

It is useful to visualise the overlap between PathBank and MTox700+. MTox700+
is a much smaller database because it is a curated list of metabolites with
toxicological relevance, whereas PathBank is more general.

In the example below we import PathBank as a source, and use a Venn diagram and
an UpSet plot to compare the overlap between inchikey identifiers in PathBank
and MTox700+.

```{r,fig.width=8,warning=FALSE}
# object M already contains the MTox700+ database as a source

# prepare PathBank as a source
P <- PathBank_metabolite_database(
    version = "primary",
    tag = "PathBank"
)

# import
P <- read_source(P)

# prepare chart
C <- annotation_venn_chart(
    factor_name = c("inchikey", "InChI.Key"),
    legend = FALSE,
    fill_colour = ".group",
    line_colour = "white"
)

# plot
g1 <- chart_plot(C, predicted(M), P)

C <- annotation_upset_chart(
    factor_name = c("inchikey", "InChI.Key")
)
g2 <- chart_plot(C, predicted(M), P)
cowplot::plot_grid(g1, g2, nrow = 1, labels = c("Venn diagram", "UpSet plot"))
```

The charts show that less than half of the metabolites in MTox700+ are also
present in the PathBank database for primary pathways.

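The numbers behind an overlap chart like this are simple set operations on the two identifier columns. The following sketch uses made-up keys (not real inchikeys) and base R only. It assumes that duplicated and missing identifiers are ignored when counting, which may differ from how the chart objects treat them.

```r
# Made-up identifiers standing in for the two inchikey columns
mtox_keys <- c("K1", "K2", "K3", "K4", "K5", NA)
pathbank_keys <- c("K2", "K2", "K4", "K6", "K7") # a key can appear many times

# Keep each identifier once and drop missing values (an assumption)
a <- unique(mtox_keys[!is.na(mtox_keys)])
b <- unique(pathbank_keys[!is.na(pathbank_keys)])

# The three regions of a two-set Venn diagram
shared <- intersect(a, b)
only_mtox <- setdiff(a, b)
only_pathbank <- setdiff(b, a)

# Fraction of the first set that is also present in the second
fraction_shared <- length(shared) / length(a)

list(
    shared = shared,
    only_mtox = only_mtox,
    only_pathbank = only_pathbank,
    fraction_shared = fraction_shared
)
```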
## Combining MTox700+ with PathBank

To combine the pathway information in PathBank with the MTox700+ database we
can use PathBank as a lookup table based on inchikeys. To do this we use the
`database_lookup` object. Note that PathBank is not downloaded a second time;
it is automatically retrieved from the cache.

We request a number of columns from PathBank, including pathway information and
additional identifiers such as HMDB ID and KEGG ID.

```{r}
# prepare object
X <- database_lookup(
    query_column = "inchikey",
    database = P$data,
    database_column = "InChI.Key",
    include = c(
        "PathBank.ID", "Pathway.Name", "Pathway.Subject", "Species",
        "HMDB.ID", "KEGG.ID", "ChEBI.ID", "DrugBank.ID", "SMILES"
    ),
    suffix = ""
)

# apply
X <- model_apply(X, predicted(M))
```

We can now visualise, for example, the subject of the pathways captured by the
MTox700+ database.

```{r}
C <- annotation_bar_chart(
    factor_name = "Pathway.Subject",
    label_rotation = TRUE,
    label_location = "outside",
    label_type = "percent",
    legend = TRUE
)

chart_plot(C, predicted(X)) + ylim(c(0, 17500))
```

We can see that MTox700+ largely focuses on metabolites related to disease
metabolism and general metabolism, which is consistent with the database being
curated to contain metabolites relevant to toxicology in humans.

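Conceptually, a lookup of this kind resembles a join between two tables on a shared key. The sketch below reproduces the idea with base R's `merge` on made-up tables. Treating it as a left join (`all.x = TRUE`, so records without a match are kept with `NA`) is an assumption made for illustration and is not a statement about how `database_lookup` is implemented.

```r
# Made-up query table standing in for MTox700+
mtox <- data.frame(
    metabolite_name = c("Glycolic acid", "Urea", "Toy compound"),
    inchikey = c("K1", "K2", "K3"),
    stringsAsFactors = FALSE
)

# Made-up lookup table standing in for PathBank: one row per pathway
pathbank <- data.frame(
    InChI.Key = c("K1", "K1", "K2"),
    Pathway.Name = c("Pathway A", "Pathway B", "Pathway A"),
    stringsAsFactors = FALSE
)

# Left join on the key columns; all.x = TRUE keeps unmatched query records
looked_up <- merge(
    mtox, pathbank,
    by.x = "inchikey", by.y = "InChI.Key",
    all.x = TRUE
)

nrow(mtox)
nrow(looked_up)
looked_up
```

In this toy example the record with key `K1` matches two pathways and therefore appears twice in the result, which is why a table produced this way can have more rows than the table that was queried.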
## Combining records

Metabolites can appear in multiple pathways. The PathBank database therefore
contains multiple records for the same metabolite, and the relationship between
MTox700+ and PathBank is one-to-many.

After obtaining pathway information from PathBank the new table has many more
rows than the original MTox700+ database, as each MTox700+ record has been
replicated for each match in the PathBank database.

For example, after importing MTox700+ the number of records was:

```{r}
# Number in MTox700+
nrow(predicted(M)$data)
```

After combining with PathBank the number of records is:

```{r}
# Number after PathBank lookup
nrow(predicted(X)$data)
```

Sometimes it is useful to collapse this information into a single record per
metabolite. We can use the `combine_records` object and its helper functions to
do this in a `MetMashR` workflow.

```{r}
# prepare object
X <- database_lookup(
    query_column = "inchikey",
    database = P$data,
    database_column = "InChI.Key",
    include = c(
        "PathBank.ID", "Pathway.Name", "Pathway.Subject", "Species",
        "HMDB.ID", "KEGG.ID", "ChEBI.ID", "DrugBank.ID", "SMILES"
    ),
    suffix = ""
) +
    combine_records(
        group_by = "inchikey",
        default_fcn = fuse_unique(" || ")
    )

# apply
X <- model_apply(X, predicted(M))
```

We have used the `fuse_unique` helper function so that records for each
inchikey are combined into a single record by only retaining unique values in
each field (column). If there are multiple unique values for a field then they
are combined into a single string using the " || " separator.

We can now extract the pathways associated with a particular metabolite. For
example, Glycolic acid:

```{r}
# get index of metabolite
w <- which(predicted(X)$data$metabolite_name == "Glycolic acid")
```

The pathways associated with Glycolic acid are:

```{r}
# print list of pathways
predicted(X)$data$Pathway.Name[w]
```

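The effect of collapsing records in this way can be imitated in base R with `aggregate`. This is a minimal sketch on a made-up table, and the helper `collapse_unique` is a hypothetical stand-in written for this example rather than the `MetMashR` helper itself.

```r
# Made-up one-to-many table, as produced by a lookup against pathway data
looked_up <- data.frame(
    inchikey = c("K1", "K1", "K1", "K2"),
    metabolite_name = c("Glycolic acid", "Glycolic acid", "Glycolic acid", "Urea"),
    Pathway.Name = c("Pathway A", "Pathway B", "Pathway A", "Pathway C"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep unique values and join them with a separator
collapse_unique <- function(x, sep = " || ") {
    paste(unique(x), collapse = sep)
}

# One record per inchikey; the helper is applied to every other column
combined <- aggregate(
    looked_up[c("metabolite_name", "Pathway.Name")],
    by = list(inchikey = looked_up$inchikey),
    FUN = collapse_unique
)

combined

# Recover the individual pathways for one metabolite from the fused string
w <- which(combined$metabolite_name == "Glycolic acid")
strsplit(combined$Pathway.Name[w], " || ", fixed = TRUE)[[1]]
```

Fusing values into a delimited string keeps one row per metabolite at the cost of needing to split the string again when individual values are required, so the separator should be one that cannot occur in the data.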
# Session Info

```{r}
sessionInfo()
```