This tutorial describes an R-package for finding active metabolic modules based on high throughput data. The pipeline takes as input transcriptional and/or metabolic data and finds a metabolic subnetwork (module) most regulated between the two conditions of interest.
The package relies on the active module analysis framework developed in BioNet package, but extends it to work with metabolic reaction networks. Further, it illustrates the usage of mwcsr package which provides a number of solvers for Maximum Weight Connected Subgraph problem and its variants.
Example of using the pipeline include:
More details on the pipeline are available in Sergushichev et al, 2016 and Emelianova et al, 2022.
You can install gatom via
BiocManager
:
In this example we will find an active metabolic module based on macrophage activation gene expression and metabolomics data (Jha et al, 2015). For improved performance here we will consider a simplified version of the data. See Example on full data and full network section for the real-scale analysis.
First let’s load the example tables with input differential gene expression and metabolite abundance data for LPS-IFNg stimulated macrophages compared to controls:
## # A tibble: 6 × 4
## ID pval log2FC baseMean
## <chr> <dbl> <dbl> <dbl>
## 1 NM_008392 8.83e-16 10.3 16589.
## 2 NM_001110517 1.12e-12 5.97 153.
## 3 NM_001253809 3.22e-11 -5.87 43.6
## 4 NM_013693 1.88e-10 4.92 326.
## 5 NM_009114 8.59e-10 -3.21 473.
## 6 NM_021491 2.36e- 9 -6.69 50.9
## # A tibble: 6 × 4
## ID pval log2FC baseMean
## <chr> <dbl> <dbl> <dbl>
## 1 HMDB02092 8.83e-34 3.12 17.1
## 2 HMDB00749 8.83e-34 3.12 17.1
## 3 HMDB00263 3.16e-19 1.41 12.8
## 4 HMDB01316 2.88e-13 -1.23 13.2
## 5 HMDB00191 4.39e-13 0.735 16.4
## 6 HMDB01112 6.60e-13 0.653 15.2
Next we will load example network related objects: global reaction
network (networkEx
object), metabolite annotations
(met.kegg.dbEx
), and organism-specific enzyme annotations
for mouse (org.Mm.eg.gatom.annoEx
).
Here networkEx
object contain information about 204 KEGG
reactions, their atom mappings and relation to enzymes:
## List of 5
## $ reactions :Classes 'data.table' and 'data.frame': 204 obs. of 4 variables:
## $ enzyme2reaction:Classes 'data.table' and 'data.frame': 243 obs. of 2 variables:
## $ reaction2align :Classes 'data.table' and 'data.frame': 3919 obs. of 3 variables:
## $ atoms :Classes 'data.table' and 'data.frame': 1689 obs. of 3 variables:
## $ metabolite2atom:Classes 'data.table' and 'data.frame': 1689 obs. of 2 variables:
Object met.kegg.dbEx
contains information about 138 KEGG
metabolites, including mappings from HMDB and ChEBI:
## List of 4
## $ baseId : chr "KEGG"
## $ metabolites:Classes 'data.table' and 'data.frame': 138 obs. of 3 variables:
## ..$ metabolite : chr [1:138] "C00002" "C00003" "C00004" "C00005" ...
## ..$ metabolite_name: chr [1:138] "ATP" "NAD+" "NADH" "NADPH" ...
## ..$ metabolite_url : chr [1:138] "http://www.kegg.jp/entry/C00002" "http://www.kegg.jp/entry/C00003" "http://www.kegg.jp/entry/C00004" "http://www.kegg.jp/entry/C00005" ...
## $ anomers :List of 2
## ..$ metabolite2base_metabolite:Classes 'data.table' and 'data.frame': 238 obs. of 2 variables:
## ..$ base_metabolite2metabolite:Classes 'data.table' and 'data.frame': 238 obs. of 2 variables:
## $ mapFrom :List of 2
## ..$ HMDB :Classes 'data.table' and 'data.frame': 212 obs. of 2 variables:
## ..$ ChEBI:Classes 'data.table' and 'data.frame': 161 obs. of 2 variables:
Object org.Mm.eg.gatom.annoEx
contains mouse-specific
mapping between enzyme classes and genes, as well as mapping between
different types of gene identifiers:
## List of 4
## $ baseId : chr "Entrez"
## $ gene2enzyme:Classes 'data.table' and 'data.frame': 150 obs. of 2 variables:
## ..$ gene : chr [1:150] "100042025" "100198" "100198" "100678" ...
## ..$ enzyme: chr [1:150] "1.2.1.12" "3.1.1.31" "1.1.1.47" "3.1.3.3" ...
## $ genes :Classes 'data.table' and 'data.frame': 135 obs. of 2 variables:
## ..$ gene : chr [1:135] "100042025" "100198" "100678" "100705" ...
## ..$ symbol: chr [1:135] "Gapdh-ps15" "H6pd" "Psph" "Acacb" ...
## $ mapFrom :List of 3
## ..$ RefSeq :Classes 'data.table' and 'data.frame': 724 obs. of 2 variables:
## ..$ Ensembl:Classes 'data.table' and 'data.frame': 135 obs. of 2 variables:
## ..$ Symbol :Classes 'data.table' and 'data.frame': 135 obs. of 2 variables:
Then we create a metabolic graph with atom topology from the loaded
data. Depending on topology
parameter, the graph vertices
can correspond either to atoms
or metabolites
.
For metabolite topology, see Using
metabolite-level network section.
g <- makeMetabolicGraph(network=networkEx,
topology="atoms",
org.gatom.anno=org.Mm.eg.gatom.annoEx,
gene.de=gene.de.rawEx,
met.db=met.kegg.dbEx,
met.de=met.de.rawEx)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
## IGRAPH f0faaee UN-- 176 190 --
## + attr: name (v/c), metabolite (v/c), element (v/c), label (v/c), url
## | (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC (v/n), baseMean
## | (v/n), logPval (v/n), signal (v/c), label (e/c), pval (e/n), origin
## | (e/n), RefSeq (e/c), gene (e/c), enzyme (e/c), reaction_name (e/c),
## | reaction_equation (e/c), url (e/c), reaction (e/c), log2FC (e/n),
## | baseMean (e/n), logPval (e/n), signal (e/c), signalRank (e/n)
## + edges from f0faaee (vertex names):
## [1] C00025_-0.3248_2.8125 --C00026_-0.3248_2.8125
## [2] C00025_-1.6238_3.5625 --C00026_-1.6238_3.5625
## [3] C00025_-2.9228_2.8125 --C00026_-2.9228_2.8125
## + ... omitted several edges
After creating the metabolic graph, we then score it, obtaining an instance of Signal Generalized Maximum Weight Subgraph (SGMWCS) problem.
The size of the module can be controlled by changing scoring
parameters k.gene
and k.met
. The higher the
values of scoring parameters are, the bigger the module is going to
be.
Then we initialize an SMGWCS solver. Here, we use a heuristic
relax-and-cut solver rnc_solver
for simplicity.
See mwcsr
package documentation for more solver options,
or Using exact solver section for the
recommended way.
Then we find an active metabolic module with chosen solver and scored graph:
The result module is an igraph
object that captures the
most regulated reactions:
## IGRAPH a2b623e UN-- 33 32 --
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), reaction_name (e/c), reaction_equation (e/c), url
## | (e/c), reaction (e/c), log2FC (e/n), baseMean (e/n), logPval (e/n),
## | signal (e/c), signalRank (e/n), score (e/n)
## + edges from a2b623e (vertex names):
## [1] C00025_-1.6238_3.5625--C00026_-1.6238_3.5625
## [2] C00022_-1.299_2.25 --C00041_-1.299_2.25
## + ... omitted several edges
## [1] "Psat1" "Gpt2" "Got2" "Shmt1" "Pkm" "Tpi1"
## [1] "Pyruvate" "L-Glutamate" "2-Oxoglutarate" "Oxaloacetate"
## [5] "Glycine" "L-Alanine"
The module can be plotted in R Viewer with
createShinyCyJSWidget()
. Here, red color corresponds to
up-regulation (positive log-2 fold change) and green to down-regulation
(negative log-2 fold change). Blue nodes and edges come from data with
absent log-2 fold change values. Bigger size of nodes and width of edges
reflect lower p-values.
We can save the module to graphml format with
write_graph()
function from igraph
:
Or it can be saved to an interactive html widget:
We can also save the module to dot format:
Such dot file can be further used to generate svg file using
neato
tool from graphviz suite if it is installed on the
system:
system(paste0("neato -Tsvg ", file.path(tempdir(), "M0.vs.M1.dot"),
" > ", file.path(tempdir(), "M0.vs.M1.svg")),
ignore.stderr=TRUE)
Alternatively, the module can be saved to pdf format with a nice layout.
You may vary the meaning of repel force and the number of iterations of repel algorithm for label layout. Note, that the larger your graph is the softer force you should use.
You may also set different seed for different variants of edge layout
with set.seed()
.
Let’s now look at how the analysis will work with the full dataset and the full network. For this case we will be using the combined network instead of KEGG network (see Networks for details on the network types).
The full macrophage LPS+IFNG-activation dataset can be downloaded from http://artyomovlab.wustl.edu/publications/supp_materials/GAM/:
met.de.raw <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GAM/Ctrl.vs.MandLPSandIFNg.met.de.tsv.gz")
gene.de.raw <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GAM/Ctrl.vs.MandLPSandIFNg.gene.de.tsv.gz")
Full pre-generated combined network, corresponding metabolite annotation, and enzyme annotation can be downloaded from http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/:
network.combined <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.combined.rds"))
met.combined.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.combined.db.rds"))
org.Mm.eg.gatom.anno <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/org.Mm.eg.gatom.anno.rds"))
For better work of the combined network we highly recommend using additional supplementary gene files (see Supplementary Genes).
gene2reaction.extra <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/gene2reaction.combined.mmu.eg.tsv", colClasses="character")
Running gatom
on the combined network and the full
dataset:
cg <- makeMetabolicGraph(network=network.combined,
topology="atoms",
org.gatom.anno=org.Mm.eg.gatom.anno,
gene.de=gene.de.raw,
met.db=met.combined.db,
met.de=met.de.raw,
gene2reaction.extra=gene2reaction.extra)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
## Metabolite p-value threshold: 0.038971
## Metabolite BU alpha: 0.078854
## FDR for metabolites: 0.054345
## Gene p-value threshold: 0.001204
## Gene BU alpha: 0.153709
## FDR for genes: 0.006234
## IGRAPH c751eb3 UN-- 75 77 --
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), reaction_name (e/c), reaction_equation (e/c), url
## | (e/c), reaction (e/c), log2FC (e/n), baseMean (e/n), logPval (e/n),
## | signal (e/c), signalRank (e/n), score (e/n)
## + edges from c751eb3 (vertex names):
## [1] C00025_1.299_0.75--C00025_1.299_0.75 C00022_1.299_0.75--C00026_-1.299_0.75
## [3] C00025_1.299_0.75--C00026_-1.299_0.75 C00025_0_0 --C00026_1.299_0.75
## + ... omitted several edges
The result module for combined network:
We provide four types of networks that can be used for analysis:
KEGG network consists of network.kegg.rds
&
met.kegg.db.rds
files and is based on KEGG database.
Both metabolites and reactions in KEGG network have KEGG IDs.
This network was generated with the pipeline available here. For extra details on KEGG network you can also reference shinyGatom and GAM articles.
network <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.kegg.rds"))
met.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.kegg.db.rds"))
Running gatom
with KEGG network on full dataset:
kg <- makeMetabolicGraph(network=network,
topology="atoms",
org.gatom.anno=org.Mm.eg.gatom.anno,
gene.de=gene.de.raw,
met.db=met.db,
met.de=met.de.raw)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
## Metabolite p-value threshold: 0.055133
## Metabolite BU alpha: 0.087937
## FDR for metabolites: 0.070765
## Gene p-value threshold: 0.002471
## Gene BU alpha: 0.164266
## FDR for genes: 0.010231
## IGRAPH 044115c UN-- 67 67 --
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), reaction_name (e/c), reaction_equation (e/c), url
## | (e/c), reaction (e/c), log2FC (e/n), baseMean (e/n), logPval (e/n),
## | signal (e/c), signalRank (e/n), score (e/n)
## + edges from 044115c (vertex names):
## [1] C00025_-4.2219_3.5625--C00026_-4.2219_3.5625
## [2] C00025_0.9743_3.5625 --C00026_-4.2219_3.5625
## + ... omitted several edges
Rhea network consists of network.rhea.rds
&
met.rhea.db.rds
files and is based on Rhea database.
Reactions in Rhea have their own IDs, but unlike KEGG, metabolite IDs come from a separate database – ChEBI database.
This network was generated with the pipeline available here. For extra details on Rhea network you can also reference shinyGatom article.
To use Rhea network download the following files:
network.rhea <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.rhea.rds"))
met.rhea.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.rhea.db.rds"))
For proper work of the Rhea network we also need a corresponding supplementary gene file (ref. Supplementary Genes).
gene2reaction.extra <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/gene2reaction.rhea.mmu.eg.tsv", colClasses="character")
And run gatom
on Rhea network:
rg <- makeMetabolicGraph(network=network.rhea,
topology="atoms",
org.gatom.anno=org.Mm.eg.gatom.anno,
gene.de=gene.de.raw,
met.db=met.rhea.db,
met.de=met.de.raw,
gene2reaction.extra=gene2reaction.extra)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
## Metabolite p-value threshold: 0.046206
## Metabolite BU alpha: 0.082053
## FDR for metabolites: 0.092911
## Gene p-value threshold: 0.000695
## Gene BU alpha: 0.161730
## FDR for genes: 0.004909
## IGRAPH 60f6d9b UN-- 46 46 --
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), url (e/c), reaction (e/c), log2FC (e/n), baseMean
## | (e/n), logPval (e/n), signal (e/c), signalRank (e/n), score (e/n)
## + edges from 60f6d9b (vertex names):
## [1] CHEBI:15562_-2.9228_2.8125--CHEBI:16383_-0.3248_2.8125
## [2] CHEBI:15361_-1.299_3.75 --CHEBI:16452_-1.9486_3.375
## [3] CHEBI:15562_-2.9228_2.8125--CHEBI:16810_-0.3248_2.8125
## + ... omitted several edges
Result Rhea network module:
Combined network comprises not only KEGG and Rhea reactions, but also transport reactions from BIGG database.
This means that reactions in such network have either KEGG or Rhea or BIGG IDs, and metabolite IDs are KEGGs and ChEBIs.
Rhea lipid subnetwork is subset of Rhea reactions that involve
lipids, and it consists of network.rhea.lipids.rds
&
met.rhea.lipids.db.rds
files.
This network was generated with the pipeline available here. For extra details on Rhea lipid subnetwork you can also reference shinyGatom article.
network.lipids <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.rhea.lipids.rds"))
met.lipids.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.lipids.db.rds"))
For proper work of the lipid network we will also need a corresponding supplementary gene file (ref. Supplementary Genes)
gene2reaction.extra <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/gene2reaction.rhea.mmu.eg.tsv", colClasses="character")
To test lipid network we will use example lipidomics data for WT mice control vs high fat diet comparison from ST001289 dataset.
met.de.lipids <- fread("https://artyomovlab.wustl.edu/publications/supp_materials/GATOM/Ctrl.vs.HighFat.lipid.de.csv")
For lipid network we recommend setting topology to
metabolites
(ref. Using
metabolite-level network):
lg <- makeMetabolicGraph(network=network.lipids,
topology="metabolites",
org.gatom.anno=org.Mm.eg.gatom.anno,
gene.de=NULL,
met.db=met.lipids.db,
met.de=met.de.lipids,
gene2reaction.extra=gene2reaction.extra)
## Found DE table for metabolites with Species IDs
## Metabolite p-value threshold: 0.009204
## Metabolite BU alpha: 0.145591
## FDR for metabolites: 0.006065
## IGRAPH b3d9441 UN-- 47 46 --
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), LipidMaps_ID (v/c), SwissLipids_ID (v/c),
## | Class_LipidMaps (v/c), Abbreviation_LipidMaps (v/c),
## | Synonyms_LipidMaps (v/c), Class_SwissLipids (v/c),
## | Abbreviation_SwissLipids (v/c), pval (v/n), origin (v/n), Species
## | (v/c), Mean(ctrl) (v/n), Mean(exp) (v/n), FC (v/n), log2FC (v/n),
## | -Log10(p-value) (v/n), -Log10(padj) (v/n), adj.P.Val (v/n),
## | Significance(padj) (v/c), Significance(p-value) (v/c), logPval (v/n),
## | signal (v/c), score (v/n), label (e/c), pval (e/l), gene (e/c),
## | enzyme (e/c), url (e/c), reaction (e/c), score (e/n), signal (e/c)
## + edges from b3d9441 (vertex names):
Result lipid subnetwork module:
If IDs for metabolite differential abundance data are of type
Species
we can process metabolite labels into more readable
ones:
For combined, Rhea and lipid networks we provide supplementary files with genes that either come from proteome or are not linked to a specific enzyme. These files are organism-specific and are also available at http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/.
network.combined <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.combined.rds"))
met.combined.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.combined.db.rds"))
gene2reaction.extra <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/gene2reaction.combined.mmu.eg.tsv", colClasses="character")
gg <- makeMetabolicGraph(network=network.combined,
topology="atoms",
org.gatom.anno=org.Mm.eg.gatom.anno,
gene.de=gene.de.raw,
met.db=met.combined.db,
met.de=met.de.raw,
gene2reaction.extra=gene2reaction.extra)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
## IGRAPH 3f7f3ef UN-- 5047 6374 --
## + attr: name (v/c), metabolite (v/c), element (v/c), label (v/c), url
## | (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC (v/n), baseMean
## | (v/n), logPval (v/n), signal (v/c), label (e/c), pval (e/n), origin
## | (e/n), RefSeq (e/c), gene (e/c), enzyme (e/c), reaction_name (e/c),
## | reaction_equation (e/c), url (e/c), reaction (e/c), log2FC (e/n),
## | baseMean (e/n), logPval (e/n), signal (e/c), signalRank (e/n)
## + edges from 3f7f3ef (vertex names):
## [1] C00019_-0.75_-1.299--C00019_-0.75_-1.299
## [2] C00019_-1.5_0 --C00019_-1.5_0
## [3] C00019_0.75_-1.299 --C00019_0.75_-1.299
## + ... omitted several edges
Optionally, we can also preserve non-enzymatic reactions that are
found in the network. This can be done by setting
keepReactionsWithoutEnzymes
to TRUE
:
ge <- makeMetabolicGraph(network=network.combined,
topology="atoms",
org.gatom.anno=org.Mm.eg.gatom.anno,
gene.de=gene.de.raw,
met.db=met.combined.db,
met.de=met.de.raw,
gene2reaction.extra=gene2reaction.extra,
keepReactionsWithoutEnzymes=TRUE)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
## IGRAPH 23ff3c6 UN-- 5171 6519 --
## + attr: name (v/c), metabolite (v/c), element (v/c), label (v/c), url
## | (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC (v/n), baseMean
## | (v/n), logPval (v/n), signal (v/c), label (e/c), pval (e/n), origin
## | (e/n), RefSeq (e/c), gene (e/c), enzyme (e/c), reaction_name (e/c),
## | reaction_equation (e/c), url (e/c), reaction (e/c), log2FC (e/n),
## | baseMean (e/n), logPval (e/n), signal (e/c), signalRank (e/n)
## + edges from 23ff3c6 (vertex names):
## [1] C00019_-0.75_-1.299--C00019_-0.75_-1.299
## [2] C00019_-1.5_0 --C00019_-1.5_0
## [3] C00019_0.75_-1.299 --C00019_0.75_-1.299
## + ... omitted several edges
For proper analysis quality we recommend to use exact SGMWCS solver
virgo_solver()
, which requires Java (version >= 11) and
CPLEX (version >= 12.7) to be installed. If the requirements are met
you can then find a module as following:
vsolver <- virgo_solver(cplex_dir=Sys.getenv("CPLEX_HOME"),
threads=4, penalty=0.001, log=1)
sol <- solve_mwcsp(vsolver, gs)
m <- sol$graph
Edge penalty option there is used to remove excessive redundancy in genes.
If there is no metabolite data in your experiment assign
met.de
and k.met
to NULL
:
g <- makeMetabolicGraph(network=networkEx,
topology="atoms",
org.gatom.anno=org.Mm.eg.gatom.annoEx,
gene.de=gene.de.rawEx,
met.db=met.kegg.dbEx,
met.de=NULL)
## Found DE table for genes with RefSeq IDs
## Warning in bumOptim(x = x, starts): One or both parameters are on the limit of
## the defined parameter space
## Gene p-value threshold: 0.130626
## Gene BU alpha: 0.364527
## FDR for genes: 0.100000
If there is no gene data in your experiment assign
gene.de
and k.gene
to NULL
:
g <- makeMetabolicGraph(network=networkEx,
topology="atoms",
org.gatom.anno=org.Mm.eg.gatom.annoEx,
gene.de=NULL,
met.db=met.kegg.dbEx,
met.de=met.de.rawEx)
## Found DE table for metabolites with HMDB IDs
## Metabolite p-value threshold: 0.260385
## Metabolite BU alpha: 0.080774
## FDR for metabolites: 0.100000
Sometimes it could make sense to work with metabolite-metabolite topology of the network, not atom-atom one. Such network is less structured, but contains more genes.
gm <- makeMetabolicGraph(network=network,
topology="metabolite",
org.gatom.anno=org.Mm.eg.gatom.anno,
gene.de=gene.de.raw,
met.db=met.db,
met.de=met.de.raw)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
## Metabolite p-value threshold: 0.037216
## Metabolite BU alpha: 0.078983
## FDR for metabolites: 0.062353
## Gene p-value threshold: 0.001241
## Gene BU alpha: 0.163383
## FDR for genes: 0.006075
## IGRAPH 72c289b UN-- 63 63 --
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), reaction_name (e/c), reaction_equation (e/c), url
## | (e/c), reaction (e/c), log2FC (e/n), baseMean (e/n), logPval (e/n),
## | signal (e/c), signalRank (e/n), score (e/n)
## + edges from 72c289b (vertex names):
## [1] C00219--C05956 C00062--C05933 C00327--C05933 C00417--C00490 C00062--C00086
## [6] C00157--C00162 C00157--C00219 C00162--C00350 C00550--C00588 C00669--C01879
## + ... omitted several edges
To get functional annotation of obtained modules by KEGG and Reactome
metabolic pathways, we can use hypergeometric test with
fora()
function from fgsea
package.
foraRes <- fgsea::fora(pathways=org.Mm.eg.gatom.anno$pathways,
genes=E(km)$gene,
universe=unique(E(kg)$gene),
minSize=5)
foraRes[padj < 0.05]
## pathway
## <char>
## 1: mmu_M00002: Glycolysis, core module involving three-carbon compounds
## 2: mmu_M00003: Gluconeogenesis, oxaloacetate => fructose-6P
## 3: mmu_M00001: Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
## 4: mmu00565: Ether lipid metabolism
## 5: mmu00010: Glycolysis / Gluconeogenesis
## 6: mmu00030: Pentose phosphate pathway
## 7: mmu00020: Citrate cycle (TCA cycle)
## 8: mmu00564: Glycerophospholipid metabolism
## 9: mmu_M00004: Pentose phosphate pathway (Pentose phosphate cycle)
## pval padj foldEnrichment overlap size overlapGenes
## <num> <num> <num> <int> <int> <list>
## 1: 5.046444e-05 0.003532511 7.017241 5 5 21991, 1....
## 2: 2.695203e-04 0.005915625 5.847701 5 6 74551, 1....
## 3: 3.925355e-04 0.005915625 4.678161 6 9 212032, ....
## 4: 3.925355e-04 0.005915625 4.678161 6 9 18783, 2....
## 5: 4.225447e-04 0.005915625 3.050975 10 23 13806, 1....
## 6: 2.154500e-03 0.025135833 3.274713 7 15 110208, ....
## 7: 3.034903e-03 0.026789277 3.508621 6 12 11428, 1....
## 8: 3.061632e-03 0.026789277 2.631466 9 24 14571, 1....
## 9: 3.996354e-03 0.031082754 3.898467 5 9 110208, ....
Optionally, redundancy in pathways can be decreased with
collapsePathwaysORA()
function:
mainPathways <- fgsea::collapsePathwaysORA(
foraRes[padj < 0.05],
pathways=org.Mm.eg.gatom.anno$pathways,
genes=E(km)$gene,
universe=unique(E(kg)$gene))
foraRes[pathway %in% mainPathways$mainPathways]
## pathway
## <char>
## 1: mmu_M00002: Glycolysis, core module involving three-carbon compounds
## 2: mmu00565: Ether lipid metabolism
## 3: mmu00030: Pentose phosphate pathway
## 4: mmu00020: Citrate cycle (TCA cycle)
## pval padj foldEnrichment overlap size overlapGenes
## <num> <num> <num> <int> <int> <list>
## 1: 5.046444e-05 0.003532511 7.017241 5 5 21991, 1....
## 2: 3.925355e-04 0.005915625 4.678161 6 9 18783, 2....
## 3: 2.154500e-03 0.025135833 3.274713 7 15 110208, ....
## 4: 3.034903e-03 0.026789277 3.508621 6 12 11428, 1....
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] R.utils_2.12.3 R.oo_1.26.0 R.methodsS3_1.8.2 mwcsr_0.1.9
## [5] igraph_2.1.1 data.table_1.16.2 gatom_1.5.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 farver_2.1.2 dplyr_1.1.4
## [4] blob_1.2.4 Biostrings_2.75.0 fastmap_1.2.0
## [7] GGally_2.2.1 XML_3.99-0.17 digest_0.6.37
## [10] lifecycle_1.0.4 KEGGREST_1.45.1 RSQLite_2.3.7
## [13] magrittr_2.0.3 compiler_4.4.1 rlang_1.1.4
## [16] sass_0.4.9 tools_4.4.1 utf8_1.2.4
## [19] yaml_2.3.10 sna_2.8 knitr_1.48
## [22] htmlwidgets_1.6.4 bit_4.5.0 plyr_1.8.9
## [25] RColorBrewer_1.1-3 BiocParallel_1.41.0 withr_3.0.2
## [28] purrr_1.0.2 BiocGenerics_0.53.0 shinyCyJS_1.0.0
## [31] sys_3.4.3 grid_4.4.1 stats4_4.4.1
## [34] fansi_1.0.6 colorspace_2.1-1 ggplot2_3.5.1
## [37] scales_1.3.0 cli_3.6.3 rmarkdown_2.28
## [40] crayon_1.5.3 generics_0.1.3 httr_1.4.7
## [43] DBI_1.2.3 cachem_1.1.0 stringr_1.5.1
## [46] zlibbioc_1.51.2 network_1.18.2 parallel_4.4.1
## [49] AnnotationDbi_1.69.0 BiocManager_1.30.25 XVector_0.45.0
## [52] vctrs_0.6.5 Matrix_1.7-1 jsonlite_1.8.9
## [55] IRanges_2.39.2 S4Vectors_0.43.2 bit64_4.5.2
## [58] BioNet_1.67.0 maketools_1.3.1 jquerylib_0.1.4
## [61] tidyr_1.3.1 intergraph_2.0-4 glue_1.8.0
## [64] statnet.common_4.10.0 ggstats_0.7.0 codetools_0.2-20
## [67] cowplot_1.1.3 stringi_1.8.4 gtable_0.3.6
## [70] GenomeInfoDb_1.41.2 UCSC.utils_1.1.0 munsell_0.5.1
## [73] tibble_3.2.1 pillar_1.9.0 htmltools_0.5.8.1
## [76] fgsea_1.31.6 GenomeInfoDbData_1.2.13 R6_2.5.1
## [79] lattice_0.22-6 evaluate_1.0.1 Biobase_2.67.0
## [82] png_0.1-8 memoise_2.0.1 pryr_0.1.6
## [85] bslib_0.8.0 fastmatch_1.1-4 Rcpp_1.0.13
## [88] coda_0.19-4.1 xfun_0.48 buildtools_1.0.0
## [91] pkgconfig_2.0.3