--- title: "ggkegg" author: "Noriaki Sato" output: BiocStyle::html_document: toc: true toc_float: true vignette: > %\VignetteIndexEntry{ggkegg} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, fig.width=12, fig.height=6, warning=FALSE, message=FALSE) ``` # ggkegg This package aims to import, parse, and analyze KEGG data such as KEGG PATHWAY and KEGG MODULE. The package supports visualizing KEGG information using ggplot2 and ggraph through using the grammar of graphics. The package enables the direct visualization of the results from various omics analysis packages and the connection to the other tidy manipulation packages. In this documentation, the basic usage of `ggkegg` is presented. Please refer to [the documentation](https://noriakis.github.io/software/ggkegg) for the detailed usage. ## Introduction There are many great packages performing KEGG PATHWAY analysis in R. `r BiocStyle::Biocpkg("pathview")` fetches KEGG PATHWAY information, enabling the output of images reflecting various user-defined values on the map. `r BiocStyle::Biocpkg("KEGGlincs")` can overlay LINCS data to KEGG PATHWAY, and examine the map using Cytoscape. `r BiocStyle::Biocpkg("graphite")` acquires pathways including KEGG and Reactome, convert them into graphNEL format, and provides an interface for topological analysis. `r BiocStyle::Biocpkg("KEGGgraph")` also downloads KEGG PATHWAY information and converts it into a format analyzable in R. Extending to these packages, the purpose of developing this package, `ggkegg`, is to allow for tidy manipulation of KEGG information by the power of `tidygraph`, to plot the relevant information in flexible and customizable ways using grammar of graphics, to examine the global and overview maps consisting of compounds and reactions. ## Pathway The users can obtain a KEGG PATHWAY `tbl_graph` by `pathway` function. If you want to cache the file, please specify `use_cache=TRUE`, and if you already have the XML files of the pathway, please specify the directory of the file with `directory` argument. Here, we obtain `Cell cycle` pathway (`hsa04110`) using cache. `pathway_id` column is inserted to node and edge by default, which allows for the identification of the pathway ID in the other functions. ```{r pathway1, message=FALSE, warning=FALSE, fig.width=6, fig.height=3} library(ggkegg) library(tidygraph) library(dplyr) graph <- ggkegg::pathway("hsa04110", use_cache=TRUE) graph ``` The output can be analysed readily using `tidygraph` and `dplyr` verbs. For example, centrality calculations can be performed as follows. ```{r pathway1_1, message=FALSE, warning=FALSE, fig.width=6, fig.height=3} graph |> mutate(degree=centrality_degree(mode="all"), betweenness=centrality_betweenness()) |> activate(nodes) |> filter(type=="gene") |> arrange(desc(degree)) |> as_tibble() |> relocate(degree, betweenness) ``` ### Plot the pathway using `ggraph` The parsed `tbl_graph` can be used to plot the information by `ggraph` using the grammar of graphics. The components in the graph such as nodes, edges, and text can be plotted layer by layer. ```{r plot_pathway1, message=FALSE, warning=FALSE, fig.width=10, fig.height=8} graph <- graph |> mutate(showname=strsplit(graphics_name, ",") |> vapply("[", 1, FUN.VALUE="a")) ggraph(graph, layout="manual", x=x, y=y)+ geom_edge_parallel(aes(linetype=subtype_name), arrow=arrow(length=unit(1,"mm"), type="closed"), end_cap=circle(1,"cm"), start_cap=circle(1,"cm"))+ geom_node_rect(aes(fill=I(bgcolor), filter=type == "gene"), color="black")+ geom_node_text(aes(label=showname, filter=type == "gene"), size=2)+ theme_void() ``` Besides the default ordering, various layout functions in `igraph` and `ggraph` can be used. ```{r plot_pathway2, message=FALSE, warning=FALSE, fig.width=10, fig.height=10} graph |> mutate(x=NULL, y=NULL) |> ggraph(layout="nicely")+ geom_edge_parallel(aes(color=subtype_name), arrow=arrow(length=unit(1,"mm"), type="closed"), end_cap=circle(0.1,"cm"), start_cap=circle(0.1,"cm"))+ geom_node_point(aes(filter=type == "gene"), color="black")+ geom_node_point(aes(filter=type == "group"), color="tomato")+ geom_node_text(aes(label=showname, filter=type == "gene"), size=3, repel=TRUE, bg.colour="white")+ scale_edge_color_viridis(discrete=TRUE)+ theme_void() ``` ## Converting identifiers In the above example, `graphics_name` column in the node table were used, which are available in the default KGML file. Some of them are truncated, and the user can convert identifiers using `convert_id` function to be used in `mutate`. One can pipe the functions to convert `name` column consisting of `hsa` KEGG gene IDs in node table of `tbl_graph`. ```{r convert, message=FALSE, warning=FALSE} graph |> activate(nodes) |> mutate(hsa=convert_id("hsa")) |> filter(type == "gene") |> as_tibble() |> relocate(hsa) ``` ### Highlighting set of nodes and edges `highlight_set_nodes()` and `highlight_set_edges()` can be used to identify nodes that satisfy query IDs. Nodes often have multiple IDs, and user can choose `how="any"` (if one of identifiers in the nodes matches the query) or `how="all"` (if all of the identifiers in the nodes match the query) to highlight. ```{r highlight, message=FALSE, warning=FALSE, fig.width=10, fig.height=8} graph |> activate(nodes) |> mutate(highlight=highlight_set_nodes("hsa:7157")) |> ggraph(layout="manual", x=x, y=y)+ geom_node_rect(aes(fill=I(bgcolor), filter=type == "gene"), color="black")+ geom_node_rect(aes(fill="tomato", filter=highlight), color="black")+ geom_node_text(aes(label=showname, filter=type == "gene"), size=2)+ geom_edge_parallel(aes(linetype=subtype_name), arrow=arrow(length=unit(1,"mm"), type="closed"), end_cap=circle(1,"cm"), start_cap=circle(1,"cm"))+ theme_void() ``` ### Overlaying raw KEGG image We can use `overlay_raw_map` to overlay the raw KEGG images on the created `ggraph`. The node and text can be directly customized by using various geoms, effects such as `ggfx`, and scaling functions. The code below creates nodes using default parsed background color and just overlay the image. ```{r example_raw, message=FALSE, warning=FALSE, eval=TRUE} graph |> mutate(degree=centrality_degree(mode="all")) |> ggraph(graph, layout="manual", x=x, y=y)+ geom_node_rect(aes(fill=degree, filter=type == "gene"))+ overlay_raw_map()+ scale_fill_viridis_c()+ theme_void() ``` ## Module and Network ### Parsing module KEGG MODULE can be parsed and used in the analysis. The formula to obtain module is the same as pathway. Here, we use test pathway which contains two KEGG ORTHOLOGY, two compounds and one reaction. This will create `kegg_module` class object storing definition and reactions. ```{r module2, eval=TRUE} mod <- module("M00002", use_cache=TRUE) mod ``` ### Visualizing module The module can be visualized by text-based or network-based, depicting how the KOs interact each other. For text based visualization like the one shown in the original KEGG website, `module_text` can be used. ```{r mod_vis1, message=FALSE, warning=FALSE, fig.width=8, fig.height=4} ## Text-based mod |> module_text() |> ## return data.frame plot_module_text() ``` For network based visualization, `obtain_sequential_module_definition` can be used. ```{r mod_vis2, message=FALSE, warning=FALSE, fig.width=8, fig.height=8} ## Network-based mod |> obtain_sequential_module_definition() |> ## return tbl_graph plot_module_blocks() ``` We can assess module completeness, as well as user-defined module abundances. Please refer to [the module section of documentation](https://noriakis.github.io/software/ggkegg/module.html). The network can be created by the same way, and create `kegg_network` class object storing information. ### Use with the other omics packages The package supports direct importing and visualization, and investigation of the results of the other packages such as enrichment analysis results from `clusterProfiler` and differential expression analysis results from `DESeq2`. Pplease refer to [use cases](https://noriakis.github.io/software/ggkegg/usecases.html) in the documentation for more detailed use cases. ## Wrapper function `ggkegg` `ggkegg` function can be used with various input. For example, if the user provides pathway ID, the function automatically returns the `ggraph` with the original layout, which can be used directly for stacking geoms. The other supported IDs are module, network, and also the `enrichResult` object, and the other options such as converting IDs are available. ```{r ggkegg, fig.width=6, fig.height=6} ggkegg("bpsp00270") |> class() ## Returns ggraph ggkegg("N00002") ## Returns the KEGG NETWORK plot ``` ```{r} sessionInfo() ```