MetaboSignal 2: merging KEGG with additional interaction resources
Andrea Rodriguez Martinez, Maryam Anwar, Rafael Ayala, Joram M. Posma, Ana L. Neves, Marc-Emmanuel Dumas
May 22, 2017
MetaboSignal is an R package designed to investigate the genetic
regulation of the metabolome, using KEGG as primary reference database.
The main goal of this vignette is to illustrate how KEGG interactions
can be merged with two large literature-curated resources of human
regulatory interactions: OmniPath and TRRUST.
Metabolites are organized in biochemical pathways regulated by signal-transduction pathways, allowing the organism to adapt to environmental changes and maintain homeostasis. We developed MetaboSignal (Rodriguez-Martinez et al. 2017) as a tool to explore the relationships between genes (both enzymatic and signaling) and metabolites, using the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa & Goto 2000) as the primary reference database. To generate a more complete picture of the genetic regulation of the metabolome, we have now updated and standardized the functionalities of MetaboSignal to facilitate its integration with additional resources of molecular interactions. In this vignette we show how KEGG interactions can be merged with human regulatory interactions from two large literature-curated resources: OmniPath (Turei et al. 2016) and TRRUST (Han et al. 2015).
We begin by loading the MetaboSignal package:
## Load MetaboSignal
library(MetaboSignal)
We then load the “regulatory_interactions” and “kegg_pathways” datasets,
containing the following information:
- regulatory_interactions:
matrix containing a set of regulatory interactions reported in OmniPath
(directed protein-protein and signaling interactions) and TRRUST
(transcription factor-target interactions). For each interaction, both
literature reference(s) and primary database reference(s) are reported.
Users are responsible for respecting the terms of the licences of these
databases and for citing them when required. Note that the databases
are sometimes inconsistent in the direction and sign of an interaction.
This is likely due to curation errors, or to the fact that some
interactions might be bidirectional or have a different sign depending
on the tissue. Users can update or edit this matrix as required.
- kegg_pathways:
matrix containing the identifiers (IDs) of relevant metabolic (n = 85)
and signaling (n = 126) human KEGG pathways. These IDs were retrieved
using the function “MS_getPathIds( )”.
## Regulatory interactions
data("regulatory_interactions")
head(regulatory_interactions[, c(1, 3, 5)])
##      source_entrez target_entrez interaction_type
## [1,] "351"         "2"           "o_Unknown"
## [2,] "3576"        "2"           "o_Unknown"
## [3,] "7040"        "2"           "o_Unknown"
## [4,] "7042"        "2"           "o_Unknown"
## [5,] "2064"        "12"          "o_Unknown"
## [6,] "3817"        "12"          "o_Unknown"
## KEGG metabolic pathways
data("kegg_pathways")
head(kegg_pathways[, c("Path_id", "Path_category", "Path_type")])
##      Path_id    Path_category                         Path_type
## [1,] "hsa00010" "Metabolism; Carbohydrate metabolism" "metabolic"
## [2,] "hsa00020" "Metabolism; Carbohydrate metabolism" "metabolic"
## [3,] "hsa00030" "Metabolism; Carbohydrate metabolism" "metabolic"
## [4,] "hsa00040" "Metabolism; Carbohydrate metabolism" "metabolic"
## [5,] "hsa00051" "Metabolism; Carbohydrate metabolism" "metabolic"
## [6,] "hsa00052" "Metabolism; Carbohydrate metabolism" "metabolic"
## KEGG signaling pathways
tail(kegg_pathways[, c("Path_id", "Path_category", "Path_type")])
##        Path_id    Path_category                          Path_type
## [206,] "hsa04964" "Organismal Systems; Excretory system" "signaling"
## [207,] "hsa04966" "Organismal Systems; Excretory system" "signaling"
## [208,] "hsa04970" "Organismal Systems; Digestive system" "signaling"
## [209,] "hsa04971" "Organismal Systems; Digestive system" "signaling"
## [210,] "hsa04972" "Organismal Systems; Digestive system" "signaling"
## [211,] "hsa04976" "Organismal Systems; Digestive system" "signaling"
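The direction inconsistencies mentioned in the description of the “regulatory_interactions” dataset can be checked directly. The following is a minimal sketch on a toy matrix that mimics the three columns printed above; the values are made up for illustration, and the helper `find_bidirectional` is hypothetical (it is not part of MetaboSignal). It flags gene pairs that are reported in both directions.

```r
## Toy matrix mimicking the columns of regulatory_interactions
## (values are illustrative, not taken from the real dataset)
toy_reg <- matrix(c("1", "2", "o_activation",
                    "2", "1", "t_Repression",
                    "3", "4", "o_Unknown",
                    "5", "3", "t_Activation"),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("source_entrez", "target_entrez",
                                          "interaction_type")))

## Hypothetical helper: return the rows whose reversed pair is also present
find_bidirectional <- function(net) {
    forward <- paste(net[, "source_entrez"], net[, "target_entrez"], sep = "_")
    reverse <- paste(net[, "target_entrez"], net[, "source_entrez"], sep = "_")
    ## Self-loops trivially equal their own reverse, so they are excluded
    not_loop <- net[, "source_entrez"] != net[, "target_entrez"]
    net[forward %in% reverse & not_loop, , drop = FALSE]
}

bidirectional <- find_bidirectional(toy_reg)
print(bidirectional)
```

Assuming the real matrix uses the column names shown in the output above, the same helper could be applied to `regulatory_interactions` to list candidate pairs for manual review.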
We use the function “MS_getPathIds( )” to retrieve the IDs of all human metabolic and signaling KEGG pathways.
## Get IDs of metabolic and signaling human pathways
hsa_paths <- MS_getPathIds(organism_code = "hsa")
This function generates a “.txt” file in the working directory named “hsa_pathways.txt”. We recommend that users take some time to inspect this file and carefully select the metabolic and signaling pathways that will be used to build the network. In this example, we selected the pathways stored in the “kegg_pathways” dataset.
Next, we use the function “MS_keggNetwork( )” to build a MetaboSignal network, by merging the selected metabolic and signaling KEGG pathways stored in the “kegg_pathways” dataset:
## Create metabo_paths and signaling_paths vectors
metabo_paths <- kegg_pathways[kegg_pathways[, "Path_type"] == "metabolic", "Path_id"]
signaling_paths <- kegg_pathways[kegg_pathways[, "Path_type"] == "signaling", "Path_id"]
## Build KEGG network (might take a while)
keggNet_example <- MS_keggNetwork(metabo_paths, signaling_paths,
                                  expand_genes = TRUE, convert_entrez = TRUE)
## See network format
head(keggNet_example)
##      source       target interaction_type
## [1,] "cpd:C00084" "217"  "k_compound:reversible"
## [2,] "cpd:C00084" "224"  "k_compound:reversible"
## [3,] "cpd:C00084" "221"  "k_compound:reversible"
## [4,] "cpd:C00084" "219"  "k_compound:reversible"
## [5,] "cpd:C00084" "222"  "k_compound:reversible"
## [6,] "cpd:C00084" "220"  "k_compound:reversible"
The network is formatted as a three-column matrix where each row represents an edge between two nodes (from source to target). The nodes represent the following molecular entities: chemical compounds (KEGG IDs), reactions (KEGG IDs), signaling genes (Entrez IDs) and metabolic genes (Entrez IDs). The type of interaction is reported in the “interaction_type” column. Compound-gene (or gene-compound) interactions are designated as: “k_compound:reversible” or “k_compound:irreversible”, depending on the direction of the interaction. Other types of interactions correspond to gene-gene interactions. When KEGG reports various types of interaction for the same interactant pair, the “interaction_type” is collapsed using “/”.
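To make the three-column format concrete, here is a small sketch using a toy matrix with the same column names. The edges are invented for illustration only. It separates compound nodes (recognized here by the “cpd:” prefix seen in the example output) from the remaining nodes, and splits collapsed interaction types on “/” before counting them.

```r
## Toy network in the three-column format described above
## (edges are illustrative, not real KEGG content)
toy_kegg <- matrix(c("cpd:C00084", "217", "k_compound:reversible",
                     "217", "cpd:C00033", "k_compound:irreversible",
                     "5566", "217", "k_activation/k_phosphorylation"),
                   ncol = 3, byrow = TRUE,
                   dimnames = list(NULL, c("source", "target",
                                           "interaction_type")))

## Compound nodes carry the "cpd:" prefix in the example output
nodes <- unique(c(toy_kegg[, "source"], toy_kegg[, "target"]))
is_compound <- grepl("^cpd:", nodes)
compound_nodes <- nodes[is_compound]
other_nodes <- nodes[!is_compound]

## Split collapsed interaction types on "/" and count each type
type_list <- strsplit(toy_kegg[, "interaction_type"], "/", fixed = TRUE)
type_counts <- table(unlist(type_list))
print(type_counts)
```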
Notice that when transforming KEGG signaling maps into binary interactions, a number of indirect interactions are introduced, such as interactions involving all members of a protein complex or proteins interacting via an intermediary compound (e.g. AC and PKA, via cAMP). We recommend excluding these indirect interactions, as they might distort subsequent topological analyses. In this example, we remove interactions classified as: “unknown”, “indirect-compound”, “indirect-effect”, “dissociation”, “state-change”, “binding”, “association”.
## Get all types of interaction
all_types <- unique(unlist(strsplit(keggNet_example[, "interaction_type"], "/")))
all_types <- gsub("k_", "", all_types)
## Select wanted interactions
wanted_types <- setdiff(all_types, c("unknown", "indirect-compound",
                                     "indirect-effect", "dissociation",
                                     "state-change", "binding", "association"))
print(wanted_types) # interactions that will be retained
## [1] "compound:reversible" "compound:irreversible" "expression"
## [4] "activation" "phosphorylation" "dephosphorylation"
## [7] "inhibition" "repression" "ubiquitination"
## [10] "methylation" "glycosylation"
## Filter keggNet_example to retain only wanted interactions
wanted_types <- paste(wanted_types, collapse = "|")
keggNet_clean <- keggNet_example[grep(wanted_types, keggNet_example[, 3]), ]
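Because `grep()` matches the pattern anywhere in the string, an edge whose collapsed type combines a wanted and an unwanted interaction (for instance a hypothetical “k_activation/k_binding”) is retained by the filter above. If a stricter rule is preferred, one option is to keep an edge only when none of its collapsed types is unwanted. The sketch below illustrates this on a toy network; `keep_if_all_wanted` is a hypothetical helper, not a MetaboSignal function.

```r
## Toy network (illustrative edges only)
toy_net <- matrix(c("10", "20", "k_activation",
                    "10", "30", "k_activation/k_binding",
                    "40", "50", "k_indirect-effect",
                    "cpd:C00084", "217", "k_compound:reversible"),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("source", "target",
                                          "interaction_type")))

unwanted <- c("unknown", "indirect-compound", "indirect-effect",
              "dissociation", "state-change", "binding", "association")

## Hypothetical helper: drop an edge if ANY of its collapsed types is unwanted
keep_if_all_wanted <- function(net, unwanted_types) {
    stripped <- gsub("k_", "", net[, "interaction_type"], fixed = TRUE)
    types <- strsplit(stripped, "/", fixed = TRUE)
    keep <- vapply(types, function(x) !any(x %in% unwanted_types), logical(1))
    net[keep, , drop = FALSE]
}

strict_net <- keep_if_all_wanted(toy_net, unwanted)
print(strict_net)
```

Whether the stricter or the more permissive rule is appropriate depends on the analysis; the permissive filter keeps more edges, while the strict one avoids carrying over any interaction annotated as indirect.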
We then use the function “MS2_ppiNetwork( )” to generate a regulatory network, by merging the signaling interactions from OmniPath and TRRUST, or by selecting the interactions of only one of these databases. Some examples are shown below:
## Build regulatory network of TRRUST interactions
trrustNet_example <- MS2_ppiNetwork(datasets = "trrust")
## Build regulatory network of OmniPath interactions
omnipathNet_example <- MS2_ppiNetwork(datasets = "omnipath")
## Build regulatory network by merging OmniPath and TRRUST interactions
ppiNet_example <- MS2_ppiNetwork(datasets = "all")
## See network format
head(ppiNet_example)
##      source target interaction_type
## [1,] "351"  "2"    "o_Unknown"
## [2,] "3576" "2"    "o_Unknown"
## [3,] "7040" "2"    "o_Unknown"
## [4,] "7042" "2"    "o_Unknown"
## [5,] "2064" "12"   "o_Unknown"
## [6,] "3817" "12"   "o_Unknown"
Each of these networks is formatted as a three-column matrix where each row represents an edge between two nodes (from source to target). The third column indicates the interaction type and the source of the interaction (OmniPath: “o_”, TRRUST: “t_”). Notice that common interactions between both databases are collapsed, and the interaction type is reported as: “o_; t_;”.
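The database prefixes make it possible to tally where each edge comes from. The sketch below works on a toy matrix; the interaction labels, and in particular the exact layout of the collapsed string in the third row, are assumptions made for illustration based on the description above.

```r
## Toy regulatory network (labels and collapsed format are assumed)
toy_ppi <- matrix(c("351", "2", "o_Unknown",
                    "7040", "2", "t_Activation",
                    "2064", "12", "o_stimulation; t_Activation;"),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("source", "target",
                                          "interaction_type")))

types <- toy_ppi[, "interaction_type"]

## \\b anchors the prefix at the start of a label, so an "o_" or "t_"
## occurring in the middle of a label is not counted
from_omnipath <- grepl("\\bo_", types, perl = TRUE)
from_trrust <- grepl("\\bt_", types, perl = TRUE)

provenance <- ifelse(from_omnipath & from_trrust, "both",
                     ifelse(from_omnipath, "omnipath", "trrust"))
print(table(provenance))
```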
Finally, we use the function “MS2_mergeNetworks( )” to merge the KEGG-based network with the regulatory network.
## Merge networks
mergedNet_example <- MS2_mergeNetworks(keggNet_clean, ppiNet_example)
## See network format
head(mergedNet_example)
##      source       target interaction_type
## [1,] "cpd:C00084" "217"  "k_compound:reversible"
## [2,] "cpd:C00084" "224"  "k_compound:reversible"
## [3,] "cpd:C00084" "221"  "k_compound:reversible"
## [4,] "cpd:C00084" "219"  "k_compound:reversible"
## [5,] "cpd:C00084" "222"  "k_compound:reversible"
## [6,] "cpd:C00084" "220"  "k_compound:reversible"
The network is formatted as a three-column matrix where each row represents an edge between two nodes (from source to target). The third column indicates the interaction type and the source of the interaction (KEGG: “k_”, OmniPath: “o_”, TRRUST: “t_”). Notice that interactions shared between databases are collapsed, and the interaction type is reported as: “k_;o_;t_;”. This network can be further customized and subsequently used to explore gene-metabolite associations, as described in the introductory vignette of the package.
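To illustrate what collapsing shared interactions means, the following sketch merges two toy networks with base R. It is not the implementation used by “MS2_mergeNetworks( )”, which is not shown in this vignette; the helper `merge_toy`, the separator, and the edges are all assumptions chosen for illustration.

```r
## Two toy networks sharing the edge 1 -> 2 (illustrative edges only)
cols <- c("source", "target", "interaction_type")
net_a <- matrix(c("1", "2", "k_activation",
                  "3", "4", "k_inhibition"),
                ncol = 3, byrow = TRUE, dimnames = list(NULL, cols))
net_b <- matrix(c("1", "2", "o_stimulation",
                  "5", "6", "t_Activation"),
                ncol = 3, byrow = TRUE, dimnames = list(NULL, cols))

## Hypothetical helper: stack both networks and collapse duplicated edges
merge_toy <- function(net_1, net_2, sep = ";") {
    combined <- rbind(net_1, net_2)
    edge_id <- paste(combined[, "source"], combined[, "target"], sep = "|")
    ## Paste together the interaction types of rows describing the same edge
    collapsed <- tapply(combined[, "interaction_type"], edge_id,
                        paste, collapse = sep)
    first_row <- !duplicated(edge_id)
    out <- combined[first_row, , drop = FALSE]
    out[, "interaction_type"] <- as.character(collapsed[edge_id[first_row]])
    out
}

merged_toy <- merge_toy(net_a, net_b)
print(merged_toy)
```

Each source-target pair appears once in the result, and the prefixes in the third column record which toy network contributed the edge.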