Title: | MetaboSignal: a network-based approach to overlay and explore metabolic and signaling KEGG pathways |
---|---|
Description: | MetaboSignal is an R package that allows merging, analyzing and customizing metabolic and signaling KEGG pathways. It is a network-based approach designed to explore the topological relationship between genes (signaling- or enzymatic-genes) and metabolites, representing a powerful tool to investigate the genetic landscape and regulatory networks of metabolic phenotypes. |
Authors: | Andrea Rodriguez-Martinez, Rafael Ayala, Joram M. Posma, Ana L. Neves, Maryam Anwar, Jeremy K. Nicholson, Marc-Emmanuel Dumas |
Maintainer: | Andrea Rodriguez-Martinez <[email protected]>, Rafael Ayala <[email protected]> |
License: | GPL-3 |
Version: | 1.37.0 |
Built: | 2024-11-29 08:16:57 UTC |
Source: | https://github.com/bioc/MetaboSignal |
This matrix contains a set of KEGG reactions with incorrect/inconsistent directionality. The directionality of these reactions has been corrected based on published literature. This matrix can be updated or edited by the user if required.
directionality_reactions
Matrix
This data frame contains tissue expression data of human proteins, based on the Human Protein Atlas project. This data frame was obtained from the hpar package, and it is used in MetaboSignal to filter signaling genes based on tissue expression.
data(hpaNormalTissue)
Data.frame
This matrix contains examples of metabolic and signaling human KEGG pathways. This matrix was generated with the function "MS_getPathIds( )".
kegg_pathways
Matrix
KEGG network generated using the metabolic and signaling pathways stored in kegg_pathways.
keggNet_example
Matrix
Network generated by merging "keggNet_example" and "ppiNet_example" in the vignette.
mergedNet_example
Matrix
This network-table was generated using two metabo_paths ("rno00010", "rno00562") and two signaling_paths ("rno04910", "rno04151"). Note that, due to KEGG updates, this network may differ from the one generated when running the vignette.
data(MetaboSignal_table)
Matrix
This function allows transforming KEGG IDs of genes or compounds into their corresponding common names (for compounds) or symbols (for genes).
MS_changeNames(nodes, organism_code)
nodes |
character vector or matrix containing the KEGG IDs of metabolites, genes (organism-specific or orthology IDs), or reactions. Human Entrez gene IDs are also converted into symbols. |
organism_code |
character vector containing the KEGG code for the organism of interest. For example the KEGG code for the rat is "rno". See the function "MS_keggFinder( )". This argument is ignored when nodes are metabolites. |
A character string or a matrix containing the common metabolite names or gene symbols corresponding to the input KEGG IDs. Reaction IDs remain unchanged.
http://www.kegg.jp/kegg/docs/keggapi.html
MS_changeNames(c("rno:84482", "K01084", "cpd:C00267"), "rno")
MS_changeNames("K01082", organism_code = "rno")
This function allows transforming Entrez gene IDs or official gene symbols into KEGG IDs (orthology IDs or organism-specific gene IDs). The transformed KEGG IDs can be stored and used as source genes in the functions "MS_distances( )" or "MS_shortestpathsNetwork( )".
MS_convertGene(genes, organism_code, organism_name, output = "vector", orthology = TRUE)
genes |
character vector containing the Entrez IDs or official symbols of the genes of interest. All genes need to be in the same ID format (i.e. Entrez or symbols). It is preferable to use Entrez IDs rather than gene symbols, since some gene symbols are not unique. |
organism_code |
character vector containing the KEGG code for the organism of interest. For example the KEGG code for the rat is "rno". See the function "MS_keggFinder( )". |
organism_name |
character vector containing the common name of the organism of interest (e.g. "rat", "mouse", "human", "zebrafish") or its taxonomy ID. For more details, check: http://docs.mygene.info/en/latest/doc/data.html#species. This argument is only required when gene symbols are used. |
output |
character constant indicating whether the function will return a vector containing mapped and transformed KEGG IDs (output = "vector"), or a matrix containing both mapped Entrez IDs or gene symbols and their corresponding KEGG IDs (output = "matrix"). |
orthology |
logical scalar indicating whether the gene IDs will be transformed into orthology IDs or into organism-specific gene IDs. |
A character vector containing mapped and transformed KEGG IDs or a matrix containing both mapped Entrez IDs or gene symbols and their corresponding KEGG IDs.
Carlson, M. org.Hs.eg.db: Genome wide annotation for Human. R package version >= 3.2.3.
Mark, A., et al. (2014). mygene: Access MyGene.Info services. R package version >= 1.6.0.
http://www.kegg.jp/kegg/docs/keggapi.html
# Transform gene symbol Hoga1 (293949) into rat-specific KEGG ID
MS_convertGene(genes = "Hoga1", organism_code = "rno", organism_name = "rat", orthology = FALSE)
MS_convertGene(genes = "Hoga1", "rno", "rat", output = "matrix", orthology = FALSE)
# Transform Entrez ID 293949 into orthology KEGG ID
MS_convertGene(genes = "293949", organism_code = "rno", output = "matrix")
This function generates a distance matrix containing the length of all shortest paths from a set of genes (or reactions) to a set of metabolites. The shortest path length between two nodes is defined as the minimum number of edges between these two nodes.
MS_distances(network_table, organism_code, mode = "SP", source_genes = "all", target_metabolites = "all", names = FALSE)
network_table |
three-column matrix where each row represents an edge between two nodes. See function "MS_keggNetwork( )". |
organism_code |
character vector containing the KEGG code for the organism of interest. For example the KEGG code for the rat is "rno". See the function "MS_keggFinder( )". |
mode |
character constant indicating whether a directed or an undirected network will be considered. "all" indicates that all the edges of the network will be considered as undirected. "out" indicates that all the edges of the network will be considered as directed. "SP" indicates that the whole network will be considered as directed, except the edges linked to the target metabolite, which will be considered as undirected. The difference between the "out" and "SP" options is that the latter helps reach target metabolites that are substrates of irreversible reactions. |
source_genes |
character vector containing the genes from which the shortest paths will be calculated. Remember that Entrez IDs or gene symbols can be transformed into KEGG IDs using the function "MS_convertGene( )". By default, source_genes = "all", indicating that all the genes of the network will be used. |
target_metabolites |
character vector containing the KEGG IDs of the metabolites to which the shortest paths will be calculated. Compound KEGG IDs can be obtained using the function "MS_keggFinder( )". By default, target_metabolites = "all", indicating that all the metabolites of the network will be used. |
names |
logical scalar indicating whether metabolite or gene KEGG IDs will be transformed into common metabolite names or gene symbols. Reaction IDs remain unchanged. |
A matrix containing the shortest path length from the genes or reactions (rows) to the metabolites (columns). Unreachable metabolites are assigned Inf.
Csardi, G. & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.
data(MetaboSignal_table)
# Distances from Ship2 (65038) and Ppp2r5b (309179) to D-glucose ("cpd:C00031")
MS_convertGene(genes = c("65038", "309179"), "rno", "rat", output = "matrix")
distances_targets <- MS_distances(MetaboSignal_table, organism_code = "rno", source_genes = c("K15909", "K11584"), target_metabolites = "cpd:C00031", names = TRUE)
# Distances from all genes to all metabolites of the network
distances_all <- MS_distances(MetaboSignal_table, organism_code = "rno")
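The distance definition above (minimum number of edges) and the three `mode` options can be illustrated with a small base-R sketch. It is not MetaboSignal's implementation, which relies on igraph; the toy network-table, its node IDs and the helper names `bfs_dist()`, `edges_for_mode()` and `distance_matrix()` are hypothetical. In the toy network, "cpd:M1" is the substrate of an irreversible reaction catalyzed by "geneA", so it is unreachable in "out" mode but reachable in "SP" mode.

```r
# Minimal sketch, not MetaboSignal's implementation (which uses igraph).
# Shortest path length = minimum number of edges, found by breadth-first search.

# Breadth-first search over a two-column matrix of directed edges.
bfs_dist <- function(edges, from, to) {
  dist <- setNames(0, from)                  # distance of each visited node
  frontier <- from
  while (length(frontier) > 0) {
    nxt <- setdiff(unique(edges[edges[, 1] %in% frontier, 2]), names(dist))
    if (length(nxt) == 0) break
    dist[nxt] <- max(dist) + 1               # new nodes are one edge further away
    frontier <- nxt
  }
  if (to %in% names(dist)) unname(dist[to]) else Inf   # Inf when unreachable
}

# Edge set used for each mode, following the descriptions of "out", "all" and "SP".
edges_for_mode <- function(network_table, mode, target) {
  e <- network_table[, 1:2, drop = FALSE]
  rev_e <- e[, 2:1, drop = FALSE]
  if (mode == "all") return(rbind(e, rev_e))             # fully undirected
  if (mode == "SP") {
    linked <- e[, 1] == target | e[, 2] == target         # edges touching the target
    return(rbind(e, rev_e[linked, , drop = FALSE]))       # only these become undirected
  }
  e                                                       # "out": fully directed
}

distance_matrix <- function(network_table, sources, targets, mode = "SP") {
  m <- matrix(Inf, length(sources), length(targets),
              dimnames = list(sources, targets))
  for (s in sources) for (t in targets)
    m[s, t] <- bfs_dist(edges_for_mode(network_table, mode, t), s, t)
  m
}

# Toy network-table (hypothetical IDs; third-column labels are illustrative).
net <- rbind(
  c("cpd:M1", "geneA",  "compound"),    # M1 is the substrate of an irreversible reaction
  c("geneA",  "cpd:M2", "compound"),    # M2 is its product
  c("geneB",  "geneA",  "activation")   # signaling gene B acts on geneA
)
genes <- c("geneA", "geneB")
metabolites <- c("cpd:M1", "cpd:M2")
d_out <- distance_matrix(net, genes, metabolites, mode = "out")
d_sp  <- distance_matrix(net, genes, metabolites, mode = "SP")
print(d_sp)
```

Here `d_out["geneB", "cpd:M1"]` is Inf, while `d_sp["geneB", "cpd:M1"]` is 2, which mirrors how "SP" lets a path end at a metabolite that is only a substrate of an irreversible reaction.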
This function generates a network file and two attribute files ("NodesType.txt", "TargetNodes.txt"), which can be imported into Cytoscape to visualize the network. The first attribute file allows customizing the nodes of the network based on the molecular entity they represent: compound, reaction, metabolic-gene or signaling-gene. The second attribute file allows highlighting a set of nodes of interest.
MS_exportCytoscape(network_table, organism_code, names = TRUE, targets = NULL, file_name = "MS")
network_table |
three-column matrix where each row represents an edge between two nodes. Nodes must be KEGG IDs, not common names. See function "MS_keggNetwork( )". For human networks, Entrez gene IDs are also allowed. |
organism_code |
character vector containing the KEGG code for the organism of interest. For example the KEGG code for the rat is "rno". See function "MS_keggFinder( )". |
names |
logical scalar indicating whether metabolite or gene KEGG IDs will be transformed into common metabolite names or gene symbols. Reaction IDs remain unchanged. |
targets |
optional character vector containing the IDs of the target nodes to be discriminated from the other nodes of the network. |
file_name |
character vector that allows customizing the name of the exported files. |
A data frame where each row represents an edge between two nodes (from source to target). The function also generates and exports a network file ("MS_Network.txt") and two attribute files ("MS_NodesType.txt", "MS_TargetNodes.txt"), which can be imported into Cytoscape to visualize the network.
Shannon, P., et al. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 13, 2498-2504.
data(MetaboSignal_table)
MS_exportCytoscape(MetaboSignal_table, organism_code = "rno", names = FALSE)
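To show what a node-type attribute file encodes, the following base-R sketch classifies nodes into compound, reaction, metabolic-gene or signaling-gene and writes tab-delimited files into a temporary directory. It is a hypothetical illustration, not the package's exporter: the prefix rules ("cpd:", "dr:", "gl:" for compounds, "rn:" for reactions), the user-supplied `metabolic_genes` vector, the column layout and the helper names are all assumptions; only the file names follow the text above, and the target-node file is omitted.

```r
# Minimal sketch, not MetaboSignal's code: prefix rules, column layout and the
# user-supplied set of metabolic genes are assumptions made for illustration.
classify_nodes <- function(network_table, metabolic_genes) {
  nodes <- unique(c(network_table[, 1], network_table[, 2]))
  type <- ifelse(grepl("^(cpd|dr|gl):", nodes), "compound",
          ifelse(grepl("^rn:", nodes), "reaction",
          ifelse(nodes %in% metabolic_genes, "metabolic-gene", "signaling-gene")))
  data.frame(node = nodes, type = type, stringsAsFactors = FALSE)
}

# Write the edge list and the node-type table as tab-delimited text files.
export_for_cytoscape <- function(network_table, metabolic_genes, dir = tempdir()) {
  edges <- data.frame(source = network_table[, 1], target = network_table[, 2],
                      interaction = network_table[, 3], stringsAsFactors = FALSE)
  types <- classify_nodes(network_table, metabolic_genes)
  write.table(edges, file.path(dir, "MS_Network.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE)
  write.table(types, file.path(dir, "MS_NodesType.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(types)
}

# Toy network-table with hypothetical IDs and illustrative interaction labels.
net <- rbind(c("cpd:M1", "geneA", "compound"),
             c("geneA", "cpd:M2", "compound"),
             c("geneB", "geneA", "activation"))
types <- export_for_cytoscape(net, metabolic_genes = "geneA")
print(types)
```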
This function can be used to find out if a set of genes or metabolites of interest can be mapped onto the network.
MS_findMappedNodes(nodes, network_table)
nodes |
character vector containing the IDs of the genes or the metabolites to be mapped onto the network. Remember that Entrez IDs or gene symbols can be transformed into KEGG IDs using the function "MS_convertGene( )". |
network_table |
three-column matrix where each row represents an edge between two nodes. See function "MS_keggNetwork( )". |
A list reporting which genes or metabolites can or cannot be mapped onto the network.
Carlson, M. org.Hs.eg.db: Genome wide annotation for Human. R package version >= 3.2.3.
Mark, A., et al. (2014). mygene: Access MyGene.Info services. R package version >= 1.6.0.
http://www.kegg.jp/kegg/docs/keggapi.html
data(MetaboSignal_table)
# Map D-glucose ("cpd:C00031"), taurine ("cpd:C00245"), and aldh ("K00128") onto the network
MS_findMappedNodes(nodes = c("cpd:C00031", "cpd:C00245", "K00128"), MetaboSignal_table)
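Conceptually, a node is mapped when it appears in either of the first two columns of the network-table. The base-R sketch below shows that check on a toy table; the helper name `find_mapped()` and the list element names `mapped`/`unmapped` are hypothetical and need not match what MS_findMappedNodes returns.

```r
# Minimal sketch (hypothetical helper; list element names are assumptions).
find_mapped <- function(nodes, network_table) {
  network_nodes <- unique(c(network_table[, 1], network_table[, 2]))
  list(mapped   = nodes[nodes %in% network_nodes],
       unmapped = nodes[!nodes %in% network_nodes])
}

# Toy network-table with hypothetical IDs and illustrative interaction labels.
net <- rbind(c("cpd:M1", "geneA", "compound"),
             c("geneA", "cpd:M2", "compound"))
res <- find_mapped(c("cpd:M1", "cpd:M9", "geneA"), net)
str(res)
```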
This function retrieves the identifiers (IDs) of all metabolic and signaling KEGG pathways of a given organism. These pathway IDs can be used to build a MetaboSignal network with the function "MS_keggNetwork( )".
MS_getPathIds(organism_code)
organism_code |
character vector containing the KEGG code for the organism of interest. For example the KEGG code for the rat is "rno". See the function "MS_keggFinder( )". |
This function returns a matrix, where each row contains the ID, description, category, and type (i.e. "metabolic" or "signaling") of each pathway. This matrix is also exported in a file named "organism-code_pathways.txt".
Tenenbaum, D. KEGGREST: Client-side REST access to KEGG. R package version >= 1.17.0.
rat_paths <- MS_getPathIds(organism_code = "rno")
human_paths <- MS_getPathIds(organism_code = "hsa")
This function returns a list of entries corresponding to one of the following KEGG databases: "compound", "organism", "pathway". It can also find entries with matching query keywords in a given database.
MS_keggFinder(KEGG_database, match = NULL, organism_code)
KEGG_database |
character vector containing the name of the KEGG database of interest: "compound", "organism", "pathway". |
match |
character vector containing one or more keywords to be matched against the entries of the selected database (e.g. compound, organism or pathway names). |
organism_code |
character vector containing the KEGG code for the organism of interest. For example the KEGG code for the rat is "rno". This argument is only required for KEGG_database = "pathway". |
By default, a matrix where each row contains a KEGG entry of the database of interest. When match is provided, a list is returned, each element of which contains information on the matched entries.
MS_keggFinder(KEGG_database = "compound", match = "acetoacetic acid")
MS_keggFinder(KEGG_database = "organism", match = c("rattus", "human"))
MS_keggFinder(KEGG_database = "pathway", match = c("glycol", "insulin signal", "akt"), organism_code = "rno")
This function generates a directed network-table (i.e. a three-column matrix), where each row represents an edge connecting two nodes (from source to target). Nodes represent different molecular entities: metabolic-genes (i.e. genes encoding enzymes that catalyze metabolic reactions), signaling-genes (e.g. kinases), reactions and compounds (metabolites, drugs or glycans). The third column of the matrix indicates the interaction type. Compound-gene (or gene-compound) interactions are designated as "k_compound:reversible" or "k_compound:irreversible", depending on whether the underlying reaction is reversible or irreversible. All other interaction types correspond to gene-gene interactions. When KEGG reports several interaction types for the same gene pair, they are collapsed into a single "interaction_type" value separated by "/".
The network-table generated with this function can be customized based on several criteria. For instance, undesired nodes can be removed or replaced using the functions "MS_removeNode( )" or "MS_replaceNode( )" respectively. Also, the network can be filtered according to different topological parameters (e.g. node betweenness) using the function "MS_topologyFilter( )".
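The collapsing rule for gene pairs with several interaction types can be reproduced on a toy edge list with base R's `aggregate()`. This is a hypothetical sketch, not the package's code: the gene IDs and interaction labels are illustrative, and only the "/" separator and the three-column output shape come from the description above.

```r
# Minimal sketch with hypothetical gene IDs and illustrative interaction labels.
edges <- data.frame(
  source = c("geneA", "geneA", "geneB"),
  target = c("geneB", "geneB", "geneC"),
  interaction_type = c("activation", "phosphorylation", "inhibition"),
  stringsAsFactors = FALSE
)

# One row per (source, target) pair; distinct interaction types joined with "/".
collapsed <- aggregate(interaction_type ~ source + target, data = edges,
                       FUN = function(x) paste(unique(x), collapse = "/"))

# Three-column character matrix, i.e. the network-table shape described above.
network_table <- as.matrix(collapsed)
print(network_table)
```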
MS_keggNetwork(metabo_paths, signaling_paths, expand_genes = FALSE, convert_entrez = FALSE)
metabo_paths |
character vector containing the KEGG IDs of the metabolic pathways of interest (organism-specific). Pathway IDs take the form: "organism code + 5-digit number". For example, the ID of the rat "glycolysis/gluconeogenesis" pathway is "rno00010". See functions "MS_keggFinder( )" and "MS_getPathIds( )". |
signaling_paths |
character vector containing the KEGG IDs for the signaling pathways of interest (organism-specific). For example, the ID for the pathway "insulin signaling pathway" in the rat is "rno04910". See functions "MS_keggFinder( )" and "MS_getPathIds( )". |
expand_genes |
logical scalar indicating whether the gene nodes will represent orthology IDs (FALSE) or organism-specific gene IDs (TRUE). |
convert_entrez |
logical scalar indicating whether the KEGG gene IDs will be transformed into Entrez IDs. This argument will be ignored if expand_genes = FALSE, or if the input paths are not human-specific. |
A three-column matrix where each row represents an edge between two nodes.
Reaction directionality reported in KEGG has been cross-validated with published literature (Duarte et al., 2007).
Davidovic, L., et al. (2011). A metabolomic and systems biology perspective on the brain of the fragile X syndrome mouse model. Genome Research, 21, 2190-2202.
Duarte, N.C., et al. (2007). Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proceedings of the National Academy of Sciences, 104, 1777-1782.
Posma, J.M., et al. (2014). MetaboNetworks, an interactive Matlab-based toolbox for creating, customizing and exploring sub-networks from KEGG. Bioinformatics, 30, 893-895.
Zhang, J.D. & Wiemann, S. (2009). KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor. Bioinformatics, 25, 1470-1471.
http://www.kegg.jp/kegg/docs/keggapi.html
# MetaboSignal network-table with organism-specific gene nodes
MS_netIsoforms <- MS_keggNetwork(metabo_paths = c("rno00010", "rno00562"), signaling_paths = c("rno04910", "rno04151"), expand_genes = TRUE)
# MetaboSignal network-table with orthology gene nodes
MS_netK <- MS_keggNetwork(metabo_paths = c("rno00010", "rno00562"), signaling_paths = c("rno04910", "rno04151"))
# MetaboSignal network-table with human Entrez gene IDs
MS_netEntrez <- MS_keggNetwork(metabo_paths = c("hsa00010", "hsa00562"), signaling_paths = c("hsa04910", "hsa04151"), expand_genes = TRUE, convert_entrez = TRUE)
This function calculates the betweenness of each node of the network.
MS_nodeBW(network_table, mode = "all", normalized = TRUE)
network_table |
three-column matrix where each row represents an edge between two nodes. See function "MS_keggNetwork( )". |
mode |
character constant indicating whether a directed ("out") or undirected ("all") network will be considered. |
normalized |
logical scalar indicating whether to normalize the betweenness scores. If TRUE, normalized betweenness scores will be returned. If FALSE, raw betweenness scores will be returned. |
A numeric vector containing the betweenness of each node of the network. The function also produces a histogram showing the distribution of node betweenness.
Csardi, G. & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.
data(MetaboSignal_table)
MS_nodeBW(MetaboSignal_table)
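To make the betweenness score concrete, the following base-R sketch implements Brandes' algorithm for unweighted graphs, treating mode = "all" as undirected and mode = "out" as directed. MetaboSignal itself relies on igraph; the helper names are hypothetical, and the normalization below (dividing by (n-1)(n-2) for directed and (n-1)(n-2)/2 for undirected graphs, as igraph does) is assumed rather than confirmed for MS_nodeBW.

```r
# Minimal sketch of Brandes' betweenness algorithm (unweighted, base R only).
# Helper names are hypothetical; MetaboSignal itself relies on igraph.
brandes_bw <- function(network_table, mode = "all") {
  e <- network_table[, 1:2, drop = FALSE]
  if (mode == "all") e <- rbind(e, e[, 2:1, drop = FALSE])   # undirected
  nodes <- unique(c(e[, 1], e[, 2]))
  adj <- split(e[, 2], factor(e[, 1], levels = nodes))       # out-neighbours
  bw <- setNames(numeric(length(nodes)), nodes)
  for (s in nodes) {
    sigma <- setNames(numeric(length(nodes)), nodes); sigma[s] <- 1  # path counts
    dist  <- setNames(rep(-1, length(nodes)), nodes); dist[s] <- 0   # BFS levels
    preds <- setNames(vector("list", length(nodes)), nodes)          # predecessors
    visited <- character(0); queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]; visited <- c(visited, v)
      for (w in unique(adj[[v]])) {
        if (dist[w] < 0) { dist[w] <- dist[v] + 1; queue <- c(queue, w) }
        if (dist[w] == dist[v] + 1) {
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    delta <- setNames(numeric(length(nodes)), nodes)   # dependency of s on each node
    for (w in rev(visited)) {
      for (v in preds[[w]]) delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      if (w != s) bw[w] <- bw[w] + delta[w]
    }
  }
  if (mode == "all") bw <- bw / 2     # each undirected pair was counted twice
  bw
}

normalize_bw <- function(bw, mode = "all") {
  n <- length(bw)
  denom <- (n - 1) * (n - 2)               # ordered pairs of the other nodes
  if (mode == "all") denom <- denom / 2    # unordered pairs when undirected
  bw / denom
}

# Toy chain A -> B -> C (hypothetical IDs, illustrative labels).
net <- rbind(c("A", "B", "activation"), c("B", "C", "activation"))
bw_all <- normalize_bw(brandes_bw(net, mode = "all"), mode = "all")
bw_out <- normalize_bw(brandes_bw(net, mode = "out"), mode = "out")
print(bw_all)
print(bw_out)
```

In this chain only "B" lies on a shortest path between two other nodes, so it is the only node with a non-zero score.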
This function generates a directed reaction-compound network. The network is formalized as a three-column matrix, where each row represents an edge connecting two nodes (from source to target).
MS_reactionNetwork(metabo_paths)
metabo_paths |
character vector containing the KEGG IDs of the metabolic pathways of interest. See functions "MS_keggFinder( )" and "MS_getPathIds( )". |
A three-column matrix where each row represents an edge between two nodes.
Reaction directionality reported in KEGG has been cross-validated with published literature (Duarte et al., 2007).
reaction_network <- MS_reactionNetwork(metabo_paths = c("rno00010", "rno00562"))
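The shape of a directed reaction-compound network can be sketched in base R from a small reaction table. Everything here is a hypothetical illustration rather than the package's method: the input columns, the substrate -> reaction -> product layout, the choice to mirror edges for reversible reactions, the "reversible"/"irreversible" labels, and the helper name `reactions_to_edges()`. The literature-based directionality corrections mentioned above are not modeled.

```r
# Minimal sketch; the reaction table is hypothetical toy input, and the edge
# layout (substrate -> reaction -> product, mirrored for reversible reactions)
# is an assumption about how such a network can be built.
reactions_to_edges <- function(reactions) {
  rows <- lapply(seq_len(nrow(reactions)), function(i) {
    r <- reactions[i, ]
    subs  <- strsplit(r$substrates, ";")[[1]]
    prods <- strsplit(r$products, ";")[[1]]
    e <- rbind(cbind(subs, r$id), cbind(r$id, prods))         # forward direction
    if (r$reversible) e <- rbind(e, e[, 2:1, drop = FALSE])   # add reverse edges
    cbind(e, if (r$reversible) "reversible" else "irreversible")
  })
  out <- do.call(rbind, rows)
  dimnames(out) <- NULL
  out
}

reactions <- data.frame(
  id = c("rn:R1", "rn:R2"),
  substrates = c("cpd:M1;cpd:M2", "cpd:M3"),
  products = c("cpd:M3", "cpd:M4"),
  reversible = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)
reaction_edges <- reactions_to_edges(reactions)
print(reaction_edges)
```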
This function allows removing edges containing drug ("dr:") nodes.
MS_removeDrugs(network_table)
network_table |
three-column matrix where each row represents an edge between two nodes. See function "MS_keggNetwork( )". |
A three-column matrix corresponding to the input network-table without the drug nodes.
data(MetaboSignal_table)
# Remove drug nodes if present
drugsRemoved <- MS_removeDrugs(MetaboSignal_table)
This function allows removing undesired nodes from the network-table.
MS_removeNode(nodes, network_table)
nodes |
character vector containing the node IDs to be removed. |
network_table |
three-column matrix where each row represents an edge between two nodes. See function "MS_keggNetwork( )". |
A three-column matrix corresponding to the input network-table without the undesired nodes.
data(MetaboSignal_table)
# Remove glucose nodes
glucoseRemoved <- MS_removeNode(nodes = c("cpd:C00267", "cpd:C00221", "cpd:C00031"), MetaboSignal_table)
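Removing a node from a network-table amounts to dropping every edge in which it appears, and drug nodes can be recognized by their "dr:" prefix. The base-R sketch below illustrates both operations on a toy table; the helper names `remove_nodes()` and `remove_drugs()`, the node IDs other than "cpd:C00031", and the interaction labels are hypothetical, and this is not the code of MS_removeNode or MS_removeDrugs.

```r
# Minimal sketch (hypothetical helpers, not the package's code): removing a
# node means dropping every edge in which it appears as source or target.
remove_nodes <- function(nodes, network_table) {
  hit <- network_table[, 1] %in% nodes | network_table[, 2] %in% nodes
  network_table[!hit, , drop = FALSE]
}

# Drug nodes are recognized by the KEGG "dr:" prefix.
remove_drugs <- function(network_table) {
  hit <- grepl("^dr:", network_table[, 1]) | grepl("^dr:", network_table[, 2])
  network_table[!hit, , drop = FALSE]
}

# Toy network-table with hypothetical edges and illustrative labels.
net <- rbind(c("cpd:C00031", "geneA", "compound"),
             c("geneA", "cpd:M2", "compound"),
             c("dr:D1", "geneA", "inhibition"),
             c("geneB", "geneA", "activation"))
no_glucose <- remove_nodes("cpd:C00031", net)
no_drugs <- remove_drugs(net)
print(no_drugs)
```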
This function allows replacing node IDs of a network-table. It can be used to cluster the IDs of chemical isomers (e.g. alpha-D-glucose ("cpd:C00267"), D-glucose ("cpd:C00031"), and beta-D-glucose ("cpd:C00221")) into a single ID.
MS_replaceNode(node1, node2, network_table)
node1 |
character vector containing the node IDs to be replaced. |
node2 |
character vector containing the ID that will be used as a replacement. |
network_table |
three-column matrix where each row represents an edge between two nodes. See function "MS_keggNetwork( )". |
A three-column matrix corresponding to the input network-table with replaced nodes.
data(MetaboSignal_table)
# Cluster D-glucose isomers ("cpd:C00267", "cpd:C00221", "cpd:C00031")
glucoseClustered <- MS_replaceNode(node1 = c("cpd:C00267", "cpd:C00221"), node2 = "cpd:C00031", MetaboSignal_table)
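The isomer-clustering idea can be sketched in base R: every ID listed in `node1` is overwritten by `node2` in the first two columns. Dropping the duplicated edges that this merge creates is an assumption of this sketch, as are the helper name `replace_node()` and the toy edges (only the glucose IDs come from the text above).

```r
# Minimal sketch (hypothetical helper): every ID in node1 is replaced by node2
# in the first two columns; duplicated edges created by the merge are then
# dropped, which is an assumption about the desired behaviour.
replace_node <- function(node1, node2, network_table) {
  out <- network_table
  for (j in 1:2) out[out[, j] %in% node1, j] <- node2
  unique(out)
}

# Toy network-table: the edges themselves are hypothetical.
net <- rbind(c("cpd:C00267", "geneA", "compound"),
             c("cpd:C00221", "geneA", "compound"),
             c("cpd:C00031", "geneB", "compound"))
clustered <- replace_node(c("cpd:C00267", "cpd:C00221"), "cpd:C00031", net)
print(clustered)
```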
This function calculates the shortest path(s) between any two reachable nodes of a network-table.
MS_shortestPaths(network_table, source_node, target_node, mode = "out", type = "first")
network_table |
three-column matrix where each row represents an edge between two nodes. See function "MS_keggNetwork( )". |
source_node |
character vector containing the node from which the shortest paths will be calculated. |
target_node |
character vector containing the node to which the shortest paths will be calculated. |
mode |
character constant indicating whether a directed or an undirected network will be considered. "all" indicates that all the edges of the network will be considered as undirected. "out" indicates that all the edges of the network will be considered as directed. "SP" indicates that the whole network will be considered as directed, except the edges linked to the target metabolite, which will be considered as undirected. The difference between the "out" and "SP" options is that the latter helps reach target metabolites that are substrates of irreversible reactions. |
type |
character constant indicating whether all shortest paths or a single shortest path will be considered when there are several shortest paths between the source_node and the target_node. If type = "all", all shortest paths will be considered. If type = "first", a single path will be considered. If type = "bw", the path with the highest betweenness score will be considered. The betweenness score is calculated as the average betweenness of the gene nodes of the path. Using type = "bw" increases the time required to compute this function. |
A vector or a matrix where each row contains a shortest path from the source_node to the target_node. KEGG IDs can be transformed into common names using the function "MS_changeNames( )".
Csardi, G. & Nepusz, T. (2015). igraph package. The Comprehensive R Archive Network, v1.0.1.
data(MetaboSignal_table)
# Shortest path from HK ("K00844") to alpha-D-glucose ("cpd:C00267")
path1 <- MS_shortestPaths(MetaboSignal_table, "K00844", "cpd:C00267", mode = "SP")
path2 <- MS_shortestPaths(MetaboSignal_table, "K00844", "cpd:C00267", mode = "out")
# Shortest paths from G6PC ("K01084") to pyruvate ("cpd:C00022")
path3 <- MS_shortestPaths(MetaboSignal_table, "K01084", "cpd:C00022", type = "all")
path4 <- MS_shortestPaths(MetaboSignal_table, "K01084", "cpd:C00022", type = "bw")
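The selection rule behind type = "bw" (keep the candidate path whose gene nodes have the highest average betweenness) can be sketched in base R. The candidate paths and betweenness values below are toy inputs, the helper names are hypothetical, and treating any node without a "cpd:", "rn:", "dr:" or "gl:" prefix as a gene is an assumption of this sketch.

```r
# Minimal sketch of the type = "bw" rule: among several candidate shortest
# paths, keep the one whose gene nodes have the highest average betweenness.
# Helper names are hypothetical; the gene-recognition rule is an assumption.
is_gene <- function(ids) !grepl("^(cpd|rn|dr|gl):", ids)

pick_bw_path <- function(paths, bw) {
  scores <- vapply(paths, function(p) {
    genes <- p[is_gene(p)]
    if (length(genes) == 0) 0 else mean(bw[genes])
  }, numeric(1))
  paths[[which.max(scores)]]     # ties resolve to the first path
}

# Toy candidate paths and betweenness scores (hypothetical values).
candidates <- list(c("geneA", "geneB", "cpd:M1"),
                   c("geneA", "geneC", "cpd:M1"))
bw <- c(geneA = 0.1, geneB = 0.5, geneC = 0.2)
best <- pick_bw_path(candidates, bw)
print(best)
```

Computing betweenness for every gene before scoring the paths is what makes this option slower than type = "first".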
This function allows calculating the shortest paths from a set of source nodes to a set of target nodes, and representing them as a network. By default, the function exports a network file and two attribute files ("NodesType.txt", "TargetNodes.txt"), which can be imported into Cytoscape to visualize the network. The first attribute file allows customizing the nodes of the network based on the molecular entity they represent: signaling-gene, metabolic-gene, reaction or compound. The second attribute file allows highlighting the source and target nodes.
MS_shortestPathsNetwork(network_table, organism_code, source_nodes, target_nodes, mode = "out", type = "first", distance_th = Inf, names = TRUE, export_cytoscape = TRUE, file_name = "MS")
network_table |
three-column matrix where each row represents an edge between two nodes. See function "MS_keggNetwork( )". |
organism_code |
character vector containing the KEGG code for the organism of interest. For example the KEGG code for the rat is "rno". See the function "MS_keggFinder( )". |
source_nodes |
character vector containing the node IDs (typically genes) from which the shortest paths will be calculated. When using gene IDs make sure that they are consistent with the format of the network (i.e. organism-specific gene IDs or orthology IDs). Remember that Entrez IDs and gene symbols can be transformed into KEGG IDs with the function "MS_convertGene( )". |
target_nodes |
character vector containing the node IDs (typically compounds) to which the shortest paths will be calculated. Compound KEGG IDs can be obtained using the function "MS_keggFinder( )". |
mode |
character constant indicating whether a directed (mode = "out") or semi-directed (mode = "SP") network will be considered. "out" indicates that all the edges of the network will be considered as directed. "SP" indicates that the whole network will be considered as directed, except the edges linked to the target_node, which will be considered as undirected. The difference between the "out" and "SP" options is that the latter helps reach target metabolites that are substrates of irreversible reactions. |
type |
character constant indicating whether all shortest paths or a single shortest path will be considered when there are several shortest paths between a source node and a target node. If type = "all", all shortest paths will be considered. If type = "first" a single path will be considered. If type = "bw" the path with the highest betweenness score will be considered. The betweenness score is calculated as the average betweenness of the gene nodes of the path. Note that using type = "bw" increases the time required to compute this function. |
distance_th |
numeric value establishing a shortest path length threshold. Only shortest paths with a length below this threshold will be included in the network. |
names |
logical scalar indicating whether metabolite or gene KEGG IDs will be transformed into common metabolite names or gene symbols. Reaction IDs remain unchanged. |
export_cytoscape |
logical scalar indicating whether network and attribute Cytoscape files will be generated and exported. |
file_name |
character vector that allows customizing the name of the exported files. |
A matrix where each row represents an edge between two nodes. By default, the function also generates a network file ("MS_Network.txt") and two attribute files ("MS_NodesType.txt", "MS_TargetNodes.txt"), which can be imported into Cytoscape to visualize the network.
The network-table generated with this function can also be visualized in R using the igraph package, after transforming it into an igraph object with the igraph function "graph.data.frame( )".
Csardi, G. & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.
Shannon, P., et al. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 13, 2498-2504.
data(MetaboSignal_table)
# Shortest paths from G6PC ("K01084") to pyruvate ("cpd:C00022") and
# to alpha-D-glucose ("cpd:C00267")
subnet_first <- MS_shortestPathsNetwork(MetaboSignal_table, organism_code = "rno", source_nodes = "K01084", target_nodes = c("cpd:C00022", "cpd:C00267"), mode = "SP", type = "first")
subnet_all <- MS_shortestPathsNetwork(MetaboSignal_table, organism_code = "rno", source_nodes = "K01084", target_nodes = c("cpd:C00022", "cpd:C00267"), mode = "SP", type = "all")
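Representing a set of shortest paths as a network amounts to keeping the paths that pass the distance_th filter and turning consecutive node pairs into edges. The base-R sketch below shows this on toy paths; the helper name `paths_to_network()` is hypothetical, path length is counted in edges, and unlike the real output the sketch does not carry over the third (interaction type) column.

```r
# Minimal sketch (hypothetical helper): keep paths whose length, counted in
# edges, is below distance_th, then turn consecutive node pairs into edges.
paths_to_network <- function(paths, distance_th = Inf) {
  keep <- Filter(function(p) (length(p) - 1) < distance_th, paths)
  if (length(keep) == 0) return(matrix(character(0), ncol = 2))
  edges <- do.call(rbind, lapply(keep, function(p) cbind(head(p, -1), tail(p, -1))))
  unique(edges)                       # shared edges appear only once
}

# Toy shortest paths with hypothetical node IDs (lengths 1, 3 and 2 edges).
paths <- list(c("geneA", "cpd:M1"),
              c("geneA", "geneB", "rn:R1", "cpd:M2"),
              c("geneA", "geneB", "cpd:M3"))
subnet <- paths_to_network(paths, distance_th = 3)
print(subnet)
```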
This function filters a network based on tissue expression data from the Human Protein Atlas, by removing signaling genes that are not detected in the target tissue(s) (reliability = "approved" or "supported"). This function can only be used to filter human networks.
MS_tissueFilter(network_table, tissue, input_format = "kegg", expand_genes = FALSE)
network_table |
three-column matrix where each row represents an edge between two nodes. The gene nodes of this network must be human-specific gene IDs (not orthology IDs). To obtain such a network, use the function "MS_keggNetwork( )" with expand_genes = TRUE. |
tissue |
character vector indicating the tissue(s) of interest. Signaling genes (i.e. non-enzymatic genes) not detected in the target tissue(s) (reliability = "approved" or "supported") will be removed from the network. The available tissues are listed in the "hpaNormalTissue" dataset. |
input_format |
character vector indicating the gene format in the input network_table ("entrez" or "kegg"). |
expand_genes |
logical scalar indicating whether the gene nodes in the filtered network will represent orthology IDs (expand_genes = FALSE) or organism-specific gene IDs (expand_genes = TRUE). |
A three-column matrix where each row represents an edge between two nodes.
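The filtering rule described above (drop signaling genes not detected in the tissue, keep everything else) can be illustrated with a small base-R sketch. It is not the package's implementation: the toy network, its node IDs and edge labels are made up, and the vectors signaling_genes and detected_genes are hypothetical inputs that MS_tissueFilter( ) would instead derive from KEGG and the "hpaNormalTissue" dataset.

```r
# Hypothetical toy network-table: column 1 = source, column 2 = target,
# column 3 = edge type (placeholder labels, not real KEGG edge types)
net <- matrix(c(
  "hsa:1", "hsa:2",      "placeholder",
  "hsa:2", "cpd:C00031", "placeholder",
  "hsa:3", "hsa:2",      "placeholder"
), ncol = 3, byrow = TRUE)

# Hypothetical inputs: which gene nodes are signaling (non-enzymatic) genes,
# and which genes are detected in the target tissue
signaling_genes <- c("hsa:1", "hsa:3")
detected_genes  <- c("hsa:1", "hsa:2")

# Only signaling genes that are not detected are dropped;
# enzymatic genes and metabolites are always kept
drop_nodes <- setdiff(signaling_genes, detected_genes)

# Removing a node means removing every edge (row) that contains it
keep_edge <- !(net[, 1] %in% drop_nodes | net[, 2] %in% drop_nodes)
net_filtered <- net[keep_edge, , drop = FALSE]
print(net_filtered)
```

Here "hsa:3" is the only undetected signaling gene, so its single edge is removed and the two remaining edges are kept.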
Gatto, L. hpar: Human Protein Atlas in R. R package version 1.12.0.
http://www.kegg.jp/kegg/docs/keggapi.html
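library(MetaboSignal)
# Build a human network with organism-specific gene IDs
net <- MS_keggNetwork(metabo_paths = "hsa00010", signaling_paths = "hsa04014",
                      expand_genes = TRUE)
# Filter the network by liver expression; with expand_genes = FALSE (default),
# the genes of the filtered network are clustered by orthology
net_filtered <- MS_tissueFilter(net, tissue = "liver")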
This function reduces the dimensionality of a network by removing nodes (and therefore the edges containing them) that do not meet the established distance and/or node betweenness criteria.
MS_topologyFilter(network_table, mode = "all", type, target_node, distance_th, bw_th)
network_table |
three-column matrix where each row represents an edge between two nodes. See function "MS_keggNetwork( )". |
mode |
character constant indicating whether a directed ("out") or undirected ("all") network will be considered. |
type |
character constant establishing the criteria used to filter the network. "bw" indicates that edges (i.e. rows of the network_table) containing at least one node with betweenness below bw_th will be removed. "distance" indicates that edges containing at least one node whose shortest path length to the target_node is above distance_th will be removed. "all" indicates that edges containing at least one node with either betweenness below bw_th or distance above distance_th will be removed. |
target_node |
character vector containing the ID of the node to which the distances will be calculated. |
distance_th |
numeric value corresponding to the distance threshold. Nodes with shortest path length to the target_node above this threshold will be removed from the network-table. |
bw_th |
numeric value corresponding to the normalized-betweenness threshold. Nodes with betweenness below this threshold will be removed from the network-table. See also "MS_nodeBW( )". |
A three-column matrix where each row represents an edge between two nodes.
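The "distance" criterion with mode = "all" can be sketched in base R as a breadth-first search from target_node over an undirected view of the network-table, followed by removal of every edge that touches a node farther than distance_th. This is an illustration of the idea only: the package relies on igraph, the toy network and the helper names bfs_distances and distance_filter are hypothetical, and the betweenness criterion is not covered.

```r
# Hypothetical toy network-table (column 3 holds placeholder edge types)
net <- matrix(c(
  "A", "B",     "x",
  "B", "cpd:T", "x",
  "C", "A",     "x",
  "D", "C",     "x"
), ncol = 3, byrow = TRUE)

# Hypothetical helper: shortest path lengths from target_node,
# treating edges as undirected (mode = "all"); Inf = unreachable
bfs_distances <- function(edges, target_node) {
  nodes <- unique(c(edges[, 1], edges[, 2]))
  dists <- setNames(rep(Inf, length(nodes)), nodes)
  dists[target_node] <- 0
  queue <- target_node
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    # Neighbors are the nodes at the other end of any edge touching current
    nbrs <- c(edges[edges[, 1] == current, 2], edges[edges[, 2] == current, 1])
    for (n in nbrs) {
      if (is.infinite(dists[n])) {
        dists[n] <- dists[current] + 1
        queue <- c(queue, n)
      }
    }
  }
  dists
}

# Hypothetical helper: drop edges containing a node with distance > distance_th
distance_filter <- function(edges, target_node, distance_th) {
  d <- bfs_distances(edges, target_node)
  far <- names(d)[d > distance_th]
  keep <- !(edges[, 1] %in% far | edges[, 2] %in% far)
  edges[keep, , drop = FALSE]
}

filtered <- distance_filter(net, target_node = "cpd:T", distance_th = 2)
print(filtered)
```

Nodes "C" (distance 3) and "D" (distance 4) exceed the threshold, so only the edges A-B and B-cpd:T remain. A directed variant (mode = "out") would follow edges in a single direction only.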
Csardi, G. & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.
library(MetaboSignal)
data(MetaboSignal_table)
# Remove edges containing nodes with distance to D-glucose ("cpd:C00031") > 2
network_filtered1 <- MS_topologyFilter(MetaboSignal_table, type = "distance",
                                       target_node = "cpd:C00031", distance_th = 2)
# Remove edges containing nodes with distance to D-glucose ("cpd:C00031") > 2 or
# normalized betweenness < 0.00005
network_filtered2 <- MS_topologyFilter(MetaboSignal_table, type = "all",
                                       target_node = "cpd:C00031", distance_th = 2,
                                       bw_th = 0.00005)
# "cpd:C00031" has betweenness = 0, so it is kept in network_filtered1
# but removed in network_filtered2:
setdiff(as.vector(network_filtered1[, 1:2]), as.vector(network_filtered2[, 1:2]))
This function merges two network-tables of interest.
MS2_mergeNetworks(network_table1, network_table2)
network_table1 |
three-column matrix where each row represents an edge between two nodes. See functions "MS_keggNetwork()" and "MS2_ppiNetwork()". |
network_table2 |
three-column matrix where each row represents an edge between two nodes. See functions "MS_keggNetwork()" and "MS2_ppiNetwork()". |
A three-column matrix where each row represents an edge between two nodes.
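Conceptually, merging two network-tables amounts to stacking their rows and discarding exact duplicates. The base-R sketch below shows only that step, assuming both tables already use the same node ID format; whether MS2_mergeNetworks( ) performs additional steps is not covered here. The toy tables and their labels are hypothetical.

```r
# Hypothetical network-tables sharing the edge b -> c
net1 <- matrix(c("a", "b", "t1",
                 "b", "c", "t1"), ncol = 3, byrow = TRUE)
net2 <- matrix(c("b", "c", "t1",
                 "c", "d", "t2"), ncol = 3, byrow = TRUE)

# Stack both tables, then drop rows that are exact duplicates
merged <- unique(rbind(net1, net2))
print(merged)
```

Note that unique( ) only collapses rows identical in all three columns, so the same pair of nodes linked by two different interaction types is kept as two edges.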
library(MetaboSignal)
data(keggNet_example)
data(ppiNet_example)
# Fast example using subsets
global_network1 <- MS2_mergeNetworks(keggNet_example[1:10, ], ppiNet_example[1:10, ])
# Example using full datasets
global_network2 <- MS2_mergeNetworks(keggNet_example, ppiNet_example)
This function generates a directed regulatory network by merging interactions reported in two literature-curated resources: OmniPath and TRRUST. The network is formalized as a three-column matrix, where each row represents an edge connecting two nodes (from source to target). The third column indicates the type of interaction, as well as the source of the interaction (OmniPath = "o_", TRRUST = "t_"). Nodes represent gene Entrez IDs.
MS2_ppiNetwork(datasets = "all")
datasets |
character vector indicating the datasets used to build the network ("all", "omnipath", "trrust"). Individual databases included within OmniPath can also be selected (e.g. datasets = c("biogrid", "string")). |
A three-column matrix where each row represents an edge between two nodes.
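Because the third column encodes both the interaction type and its source ("o_" for OmniPath, "t_" for TRRUST), the edges can be split by resource with base R string functions. In this sketch the Entrez-like IDs and the interaction labels after each prefix are hypothetical placeholders; only the two prefixes come from the description above.

```r
# Hypothetical regulatory network-table: source, target, prefixed interaction
ppi <- matrix(c(
  "1000", "2000", "o_activation",
  "3000", "2000", "t_repression",
  "2000", "4000", "o_inhibition"
), ncol = 3, byrow = TRUE)

# Recover the resource from the prefix of column 3
source_db <- ifelse(startsWith(ppi[, 3], "o_"), "OmniPath",
                    ifelse(startsWith(ppi[, 3], "t_"), "TRRUST", NA_character_))

# Strip the prefix to keep only the interaction type
interaction <- sub("^[ot]_", "", ppi[, 3])

# Keep only OmniPath edges (which( ) ignores any NA labels)
omnipath_only <- ppi[which(source_db == "OmniPath"), , drop = FALSE]
print(table(source_db))
```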
The dataset "regulatory_interactions" contains the primary database reference(s) and literature reference(s) of each regulatory interaction. Users are fully responsible for respecting the terms of these databases and for citing them when required. Users can edit or update this dataset if needed.
Ceol, A., et al. (2007). DOMINO: a database of domain-peptide interactions. Nucleic Acids Research, 35, D557-60.
Cui, Q., et al. (2007). A map of human cancer signaling. Molecular Systems Biology, 3, 152.
Diella, F., et al. (2004). Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics, 5, 79.
Dinkel, H., et al. (2012). ELM–the database of eukaryotic linear motifs. Nucleic Acids Research, 40, D242-51.
Han, H., et al. (2015). TRRUST: a reference database of human transcriptional regulatory interactions. Scientific Reports, 5, 11432.
Hornbeck, P.V., et al. (2012). PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Research, 40, D261-70.
Korcsmaros, T., et al. (2010). Uniformly curated signaling pathways reveal tissue-specific cross-talks and support drug target discovery. Bioinformatics, 26, 2042-2050.
Lynn, D.J., et al. (2008). InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Molecular Systems Biology, 4, 218.
Orchard, S., et al. (2014). The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Research, 42, D358-63.
Pagel, P., et al. (2005). The MIPS mammalian protein-protein interaction database. Bioinformatics, 21, 832-834.
Papp, D., et al. (2012). The NRF2-related interactome and regulome contain multifunctional proteins and fine-tuned autoregulatory loops. FEBS Letters, 586, 1795-802.
Pawson, A.J., et al. (2014). The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Research, 42, D1098-106.
Peri, S., et al. (2003). Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Research, 13, 2363-2371.
Turei, D., et al. (2015). Autophagy Regulatory Network - a systems-level bioinformatics resource for studying the mechanism and regulation of autophagy. Autophagy, 11, 155-165.
Turei, D., et al. (2016). OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nature Methods, 13, 966-967.
Sarkar, D., et al. (2015). LMPID: a manually curated database of linear motifs mediating protein-protein interactions. Database (Oxford), pii: bav014.
Shin, Y.C., et al. (2011). TRIP Database: a manually curated database of protein-protein interactions for mammalian TRP channels. Nucleic Acids Research, 39, D356-61.
Snel, B., et al. (2000). STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Research, 28, 3442-3444.
Xenarios, I., et al. (2002). DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Research, 30, 303-305.
library(MetaboSignal)
# Build regulatory network using the OmniPath dataset only
omnipath_net <- MS2_ppiNetwork(datasets = "omnipath")
# Build regulatory network using the TRRUST dataset only
trrust_net <- MS2_ppiNetwork(datasets = "trrust")
# Build regulatory network using interactions from STRING and BioGRID
biogridstring_net <- MS2_ppiNetwork(datasets = c("biogrid", "string"))
Signal-transduction network generated by merging the interactions from the OmniPath and TRRUST databases.
ppiNet_example
Matrix
This matrix contains a set of human regulatory interactions compiled from two literature-curated resources: OmniPath (directed protein-protein and signaling interactions reported in databases with an appropriate licence) and TRRUST (transcription factor-target interactions). For each interaction, both literature references and primary database references are reported. Users are responsible for respecting the terms of these licences and for citing the databases when required. Users can edit or update this matrix if required.
regulatory_interactions
Matrix
Ceol, A., et al. (2007). DOMINO: a database of domain-peptide interactions. Nucleic Acids Research, 35, D557-60.
Cui, Q., et al. (2007). A map of human cancer signaling. Molecular Systems Biology, 3, 152.
Diella, F., et al. (2004). Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics, 5, 79.
Dinkel, H., et al. (2012). ELM–the database of eukaryotic linear motifs. Nucleic Acids Research, 40, D242-51.
Han, H., et al. (2015). TRRUST: a reference database of human transcriptional regulatory interactions. Scientific Reports, 5, 11432.
Hornbeck, P.V., et al. (2012). PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Research, 40, D261-70.
Korcsmaros, T., et al. (2010). Uniformly curated signaling pathways reveal tissue-specific cross-talks and support drug target discovery. Bioinformatics, 26, 2042-2050.
Lynn, D.J., et al. (2008). InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Molecular Systems Biology, 4, 218.
Orchard, S., et al. (2014). The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Research, 42, D358-63.
Pagel, P., et al. (2005). The MIPS mammalian protein-protein interaction database. Bioinformatics, 21, 832-834.
Papp, D., et al. (2012). The NRF2-related interactome and regulome contain multifunctional proteins and fine-tuned autoregulatory loops. FEBS Letters, 586, 1795-802.
Pawson, A.J., et al. (2014). The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Research, 42, D1098-106.
Peri, S., et al. (2003). Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Research, 13, 2363-2371.
Turei, D., et al. (2015). Autophagy Regulatory Network - a systems-level bioinformatics resource for studying the mechanism and regulation of autophagy. Autophagy, 11, 155-165.
Turei, D., et al. (2016). OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nature Methods, 13, 966-967.
Sarkar, D., et al. (2015). LMPID: a manually curated database of linear motifs mediating protein-protein interactions. Database (Oxford), pii: bav014.
Shin, Y.C., et al. (2011). TRIP Database: a manually curated database of protein-protein interactions for mammalian TRP channels. Nucleic Acids Research, 39, D356-61.
Snel, B., et al. (2000). STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Research, 28, 3442-3444.
Xenarios, I., et al. (2002). DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Research, 30, 303-305.