---
title: "From functional enrichment results to biological networks"
author: Astrid Deschênes, Pascal Belleau, Robert L Faure and Maria J Fernandes
output:
BiocStyle::html_document:
toc: true
toc_depth: 3
pkgdown:
as_is: true
bibliography: enrichViewNet.bibtex
vignette: >
%\VignetteIndexEntry{From functional enrichment results to biological networks}
%\VignettePackage{enrichViewNet}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r style, echo = FALSE, results = 'hide', warning=FALSE, message=FALSE}
BiocStyle::markdown()
suppressPackageStartupMessages({
library(knitr)
library(enrichViewNet)
library(gprofiler2)
library(ggplot2)
})
## Set it globally
options(ggrepel.max.overlaps = Inf)
set.seed(1214)
```
**Package**: `r Rpackage("enrichViewNet")`
**Authors**: `r packageDescription("enrichViewNet")[["Author"]]`
**Version**: `r packageDescription("enrichViewNet")$Version`
**Compiled date**: `r Sys.Date()`
**License**: `r packageDescription("enrichViewNet")[["License"]]`
# Licensing
The `r Biocpkg("enrichViewNet")` package and the underlying
`r Biocpkg("enrichViewNet")` code are distributed under the
Artistic license 2.0. You are free to use and redistribute this software.
# Citing
If you use this package for a publication, we would ask you to cite the
following:
> Deschênes A, Belleau P, Faure RL, Fernandes MJ, Krasnitz A,
Tuveson DA (2023).
enrichViewNet: From functional enrichment results to biological networks.
doi:10.18129/B9.bioc.enrichViewNet,
https://bioconductor.org/packages/enrichViewNet.
# Introduction
High-throughput technologies are routinely used in basic and applied
research and are key drivers of scientific discovery. A major challenge
in using these experimental approaches is the analysis of the large amount
of data generated. These include lists of proteins or genes generated by
mass spectrometry, single-cell RNA sequencing and/or microarray analysis,
respectively. There is thus a need for robust bioinformatic and statistical
tools that can analyze these large datasets and display the data in the form
of networks that illustrate the biological and conceptual links with findings
in the literature. This gap has been partially addressed by several
bioinformatic tools that perform enrichment analysis of the data and/or
present it in the form of networks.
Functional enrichment analysis tools, such as _Enrichr_ [@Kuleshov2016] and
_DAVID_ [@Dennis2003], are specialized in positioning novel findings against
well curated data sources of biological processes and pathways.
Most specifically, those tools identify functional gene sets that
are statistically over- (or under-) represented in a gene list (functional
enrichment). The traditional output of a significant enrichment analysis
tool is a table containing
the significant gene sets with their associated statistics. While those
results are extremely useful, their interpretation is challenging. The visual
representation of these results as a network can greatly facilitate the
interpretation of the data.
Biological network models are visual representations of various biological
interacting elements which are based on mathematical graphs. In those
networks, the biological elements are generally represented by nodes while
the interactions and relationships are represented by edges. One of the
widely used network tools in the quantitative biology community is the
open source software _Cytoscape_ [@PaulShannon2003]. In addition of
biological data visualization and network
analysis, _Cytoscape_ can be expended through the use of
specialized plug-ins such as _BiNGO_ that calculates over-represented GO terms
in a network [@Maere2005] or _CentiScaPe_ that identifies relevant network
nodes [@Scardoni2009].
The _g:Profiler_ enrichment analysis tool [@Raudvere2019] is web based and has
the particularity of being accompanied by the CRAN package
_gprofiler2_ [@Kolberg2020]. The _gprofiler2_ package gives the opportunity to
researchers to incorporate functional enrichment analysis into automated
analysis pipelines written in R. This greatly facilitates research
reproducibility.
The **enrichViewNet** package enables the visualization
of functional enrichment results as network graphs. Visualization of
enriched terms aims to
facilitate the analyses of complex results.
Compared to popular enrichment visualization graphs such as bar plots and
dot plots, network graphs unveil the connection between the
terms as significant terms often share one or multiple genes. Moreover, the
**enrichViewNet** package takes advantage of a powerful network
visualization tool which is _Cytoscape_. By doing so, all the
functionalities of this mature software can be used to personalize
and analyze the enrichment networks.
First, the **enrichViewNet** package enables the
visualization of enrichment results, in a format corresponding to the one
generated by _gprofiler2_, as a
customizable _Cytoscape_ network [@PaulShannon2003]. In the biological
networks generated by **enrichViewNet**, both
gene datasets (GO terms/pathways/protein complexes) and genes associated
to the datasets are represented as nodes. While the edges connect each gene
to its dataset(s). Only genes present in the query used for the
enrichment analysis are shown.
```{r graphDemo01, echo = FALSE, fig.align="center", fig.cap="A network where significant GO terms and genes are presented as nodes while edges connect each gene to its associated term(s).", out.width = '90%'}
knitr::include_graphics("demo01.jpeg")
```
The **enrichViewNet** package
offers the option to generate a network for only a portion of the
significant terms by selecting the source or by providing a
specific list of terms.Once the network is created, the user can
personalize the visual attributes
and integrate external information such as expression profiles, phenotypes
and other molecular states. The user can also perform network analysis.
In addition, the **enrichViewNet** package also provides the option to
create enrichment maps from functional enrichment results.
The enrichment maps have been introduced in the Bioconductor
_enrichplot_ package [@Yu2022].
Enrichment maps enable the visualization of enriched terms into a network
with edges connecting overlapping genes. Thus, enriched terms with overlapping
genes cluster together. This type of graphs facilitate the
identification of functional modules.
```{r graphDemo02, echo=FALSE, fig.align="center", fig.cap="An enrichment map using significant Kegg terms where edges are connecting terms with overlapping genes.", out.width = '95%'}
knitr::include_graphics("demo_KEGG_emap_v03.jpg")
```
**enrichViewNet** has been submitted to
[Bioconductor](https://www.bioconductor.org/) to aid researchers in carrying
out reproducible network analyses using functional enrichment results.
# Installation
To install this package
from [Bioconductor](https://bioconductor.org), start R
(version 4.3 or later) and enter:
```{r installDemo01, eval=FALSE, warning=FALSE, message=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("enrichViewNet")
```
# General workflow
The following workflow gives an overview of the capabilities of
**enrichViewNet**:
```{r graphWorkflow, echo=FALSE, fig.align="center", fig.cap="The enrichViewNet general workflow", out.width = '100%'}
knitr::include_graphics("Figure_enrichViewNet_workflow_v05.jpg")
```
The principal input of **enrichViewNet** is a functional enrichment
result in a format identical to the one generated by the CRAN
_gprofiler2_ package.
From an enrichment result, the **enrichViewNet** offers two options:
* generate a gene-term network that can be loaded in *Cytoscape* software
* generate a enrichment map
For the gene-term network, the installation of *Cytoscape* software is
highly recommended.
# Transforming enrichment results into a gene-term network loadable in Cytoscape
The following workflow gives an overview of the steps associated to the
creation of an gene-term network loadable in *Cytoscape*.
```{r graphListToGraph01, echo = FALSE, fig.align="left", fig.cap="From an enrichment list (A) to a Cytoscape network (B).", out.width = '100%'}
knitr::include_graphics("FromListToGraph_v03.jpg")
```
The key steps for the workflow are:
Step | Function
-------------------------------- | --------------------------------------------
Run an enrichment analysis | `gprofiler2::gost()`
Start *Cytoscape* | outside R
Create a gene-term network | `createNetwork()`
The `package::function()` notation is used for functions from other packages.
## Run an enrichment analysis
The first step consists in running an enrichment analysis with
`r CRANpkg("gprofiler2")` package. The output of the _gprofiler2::gost()_
is a _list_ and should be saved.
```{r gprofiler, echo=TRUE, warning=FALSE, message=FALSE, collapse=F, eval=TRUE}
## Required library
library(gprofiler2)
## The dataset of differentially expressed genes done between
## napabucasin treated and DMSO control parental (Froeling et al 2019)
## All genes tested are present
data("parentalNapaVsDMSODEG")
## Retain significant results
## (absolute fold change superior to 1 and adjusted p-value inferior to 0.05)
retained <- which(abs(parentalNapaVsDMSODEG$log2FoldChange) > 1 &
parentalNapaVsDMSODEG$padj < 0.05)
signRes <- parentalNapaVsDMSODEG[retained, ]
## Run one functional enrichment analysis using all significant genes
## The species is homo sapiens ("hsapiens")
## The g:SCS multiple testing correction method (Raudvere U et al 2019)
## The WikiPathways database is used
## Only the significant results are retained (significant=TRUE)
## The evidence codes are included in the results (evcodes=TRUE)
## A custom background included the tested genes is used
gostres <- gprofiler2::gost(
query=list(parental_napa_vs_DMSO=unique(signRes$EnsemblID)),
organism="hsapiens",
correction_method="g_SCS",
sources=c("WP"),
significant=TRUE,
evcodes=TRUE,
custom_bg=unique(parentalNapaVsDMSODEG$EnsemblID))
```
The gost() function returns an named list of 2 entries:
* The __result__ entry contains the enrichment results.
* The __meta__ entry contains the metadata information.
```{r gostResult, echo=TRUE, eval=TRUE}
## The 'gostres' object is a list of 2 entries
## The 'result' entry contains the enrichment results
## The 'meta' entry contains the metadata information
## Some columns of interest in the results
gostres$result[1:4, c("query", "p_value", "term_size",
"query_size", "intersection_size", "term_id")]
## The term names can be longer than the one shown
gostres$result[19:22, c("term_id", "source", "term_name")]
```
## Start Cytoscape
[Cytoscape](https://cytoscape.org/) is an open source software for
visualizing networks. It enables network integration with any type of attribute
data. The Cytoscape software
is available at the [Cytoscape website](https://cytoscape.org/).
```{r cytoscapeLogo01, echo = FALSE, fig.align="center", fig.cap="Cytoscape software logo.", out.width = '55%'}
knitr::include_graphics("cy3sticker.png")
```
The Cytoscape network generated by
`r Githubpkg("adeschen/enrichViewNet")`
will be automatically loaded into the [Cytoscape](https://cytoscape.org/)
software when the application is running.
If the application is not running, a CX JSON file will be created (standard
Cytoscape file format). The file can then be loaded manually into
the [Cytoscape](https://cytoscape.org/) software.
## Create a gene-term network
The gene-term network can be created with the _createNetwork()_ function. If
_Cytoscape_ is opened, the network should automatically be loaded in the
application. Otherwise, a CX JSON file is created. The CX JSON can be manually
be opened in _Cytoscape_.
The following figure shows what the gene-term network looks like
in _Cytoscape_. As there are numerous significant Reactome terms, the network
is a bit hectic.
```{r runCreateNetwork, echo=TRUE, eval=TRUE, message=FALSE}
## Load saved enrichment results between parental Napa vs DMSO
data("parentalNapaVsDMSOEnrichment")
## Create network for REACTOME significant terms
## The 'removeRoot=TRUE' parameter removes the root term from the network
## The network will either by created in Cytoscape (if the application is open)
## or a CX file will be created in the temporary directory
createNetwork(gostObject=parentalNapaVsDMSOEnrichment, source="REAC",
removeRoot=TRUE, title="REACTOME_All",
collection="parental_napa_vs_DMSO",
fileName=file.path(tempdir(), "parentalNapaVsDMSOEnrichment.cx"))
```
This is an example of the Reactome network in _Cytoscape_.
```{r networkInCytoscape, echo=FALSE, fig.align="center", fig.cap="All reactome terms in a gene-term network loaded in Cytoscape.", out.width = '110%'}
knitr::include_graphics("cytoscape_reactome_all_parental_napa_vs_DMSO.png")
```
To address this situation, an updated gene-term network containing only
Reactome terms of interest is created.
```{r runCreateNetworkSelected, echo=TRUE, eval=TRUE, message=FALSE}
## Load saved enrichment results between parental Napa vs DMSO
data("parentalNapaVsDMSOEnrichment")
## List of terms of interest
reactomeSelected <- c("REAC:R-HSA-9031628", "REAC:R-HSA-198725",
"REAC:R-HSA-9614085", "REAC:R-HSA-9617828",
"REAC:R-HSA-9614657", "REAC:R-HSA-73857",
"REAC:R-HSA-74160", "REAC:R-HSA-381340")
## All enrichment results
results <- parentalNapaVsDMSOEnrichment$result
## Retain selected results
selectedRes <- results[which(results$term_id %in% reactomeSelected), ]
## Print the first selected terms
selectedRes[, c("term_name")]
```
```{r runCreateNetworkSelected2, echo=TRUE, eval=TRUE, message=FALSE, fig.align="center", fig.cap="Enrichment map."}
## Create network for REACTOME selected terms
## The 'source="TERM_ID"' parameter enable to specify a personalized
## list of terms of interest
## The network will either by created in Cytoscape (if the application is open)
## or a CX file will be created in the temporary directory
createNetwork(gostObject=parentalNapaVsDMSOEnrichment, source="TERM_ID",
termIDs=selectedRes$term_id, title="REACTOME_Selected",
collection="parental_napa_vs_DMSO",
fileName=file.path(tempdir(), "parentalNapaVsDMSO_REACTOME.cx"))
```
The updated Reactome network in _Cytoscape_.
```{r networkInCytoscapeSelected, echo=FALSE, fig.align="center", fig.cap="Selected Reactome terms in a gene-term network loaded in Cytoscape.", out.width = '110%'}
knitr::include_graphics("cytoscape_with_selected_REACTOME_v01.png")
```
In _Cytoscape_, the appearance of a network is easily customized. As example,
default color and shape for all nodes can be modified. For this example,
the nodes have been moved to clarify their relation with the Reactome terms.
```{r networkFinalReactome, echo=FALSE, fig.align="center", fig.cap="Final Reactome network after customization inside Cytoscape.", out.width = '100%'}
knitr::include_graphics("REACTOME_Selected.jpeg")
```
The final Reactome network, after customization inside Cytoscape, shows that
multiple transcription enriched terms (*FOXO-mediated transcription*,
*FOXO-mediated transcription of cell cycle genes*,
_transcription regulation of white adipocyte differentiation_,
_RNA polymerase II transcription_ and
_NGF-stimulated transcription_ terms) are linked through enriched genes.
# Transforming enrichment results into an enrichment map
The following workflow gives an overview of the steps associated to the
creation of an enrichment map.
The key steps for the workflow are:
Step | Function
-------------------------------- | ------------------------------------------
Run an enrichment analysis | `gprofiler2::gost()`
Create an enrichment map | `createEnrichMap()`
The `package::function()` notation is used for functions from other packages.
## Run an enrichment analysis
The first step consists in running an enrichment analysis with
`r CRANpkg("gprofiler2")` package. The output of the _gprofiler2::gost()_
is a _list_ and should be saved.
```{r gprofiler2, echo=TRUE, warning=FALSE, message=FALSE, collapse=F, eval=TRUE}
## Required library
library(gprofiler2)
## The dataset of differentially expressed genes done between
## napabucasin treated and DMSO control parental (Froeling et al 2019)
## All genes tested are present
data("parentalNapaVsDMSODEG")
## Retain significant results
## (absolute fold change superior to 1 and adjusted p-value inferior to 0.05)
retained <- which(abs(parentalNapaVsDMSODEG$log2FoldChange) > 1 &
parentalNapaVsDMSODEG$padj < 0.05)
signRes <- parentalNapaVsDMSODEG[retained, ]
## Run one functional enrichment analysis using all significant genes
## The species is homo sapiens ("hsapiens")
## The g:SCS multiple testing correction method (Raudvere U et al 2019)
## The WikiPathways database is used
## Only the significant results are retained (significant=TRUE)
## The evidence codes are included in the results (evcodes=TRUE)
## A custom background included the tested genes is used
gostres <- gprofiler2::gost(
query=list(parental_napa_vs_DMSO=unique(signRes$EnsemblID)),
organism="hsapiens",
correction_method="g_SCS",
sources=c("WP"),
significant=TRUE,
evcodes=TRUE,
custom_bg=unique(parentalNapaVsDMSODEG$EnsemblID))
```
The _gost()_ function returns an named list of 2 entries:
* The __result__ entry contains the enrichment results.
* The __meta__ entry contains the metadata information.
```{r gostResult2, echo=TRUE, eval=TRUE}
## The 'gostres' object is a list of 2 entries
## The 'result' entry contains the enrichment results
## The 'meta' entry contains the metadata information
## Some columns of interest in the results
gostres$result[1:4, c("query", "p_value", "term_size",
"query_size", "intersection_size", "term_id")]
## The term names can be longer than the one shown
gostres$result[19:22, c("term_id", "source", "term_name")]
```
## Create an enrichment map
The enrichment map can be created with the _createEnrichMap()_ function. The
function generates a _ggplot_ object.
In an enrichment map, terms with overlapping significant genes tend to
cluster together. The Jaccard correlation coefficient is used as a metric of
similarity. Terms with high similarity (similarity metric > 0.2) are linked
together by edges. The edges are shorter and thicker when the similarity
metric is high.
```{r runCreateEmap01, echo=TRUE, eval=TRUE, fig.cap="A Kegg enrichment map where terms with overlapping significant genes cluster together.", fig.align="center"}
## Load saved enrichment results between parental Napa vs DMSO
data(parentalNapaVsDMSOEnrichment)
## Set seed to ensure reproducible results
set.seed(121)
## Create network for all Kegg terms
## All terms will be shown even if there is overlapping
createEnrichMap(gostObject=parentalNapaVsDMSOEnrichment,
query="parental_napa_vs_DMSO",
source="KEGG")
```
The Kegg enrichment map shows that the _MAPK signaling pathway_ term is
highly influential in the network. In addition, the
_Transcriptional misregulation in cancer_ term is the only isolated node.
### Using list of term IDs
Instead of using a specific source, the enrichment map can be generated from a
personalized array of term IDs. To do so, the _source_ parameter must be set
to _"TERM_ID"_ and an array of term IDs must be passed to the _termIDs_
parameter. The terms can be pick-up from different sources. If more than one
terms have the same description, the term ID will be added at the end of the
description for clarity.
```{r runCreateEmapTerms, echo=TRUE, eval=TRUE, fig.cap="An enrichment map showing only the user selected terms.", fig.align="center"}
## Load saved enrichment results between parental Napa vs DMSO
data(parentalNapaVsDMSOEnrichment)
## The term IDs must correspond to the IDs present in the "term_id" column
head(parentalNapaVsDMSOEnrichment$result[, c("query", "term_id", "term_name")],
n=3)
## List of selected terms from different sources
termID <- c("KEGG:04115", "WP:WP4963", "KEGG:04010",
"REAC:R-HSA-5675221", "REAC:R-HSA-112409", "WP:WP382")
## Set seed to ensure reproducible results
set.seed(222)
## Create network for all selected terms
createEnrichMap(gostObject=parentalNapaVsDMSOEnrichment,
query="parental_napa_vs_DMSO",
source="TERM_ID", termIDs=termID)
```
As the description "MAPK signaling pathway" is present twice, the ID of each
term has been added to the end of the description. Hence, the MAPK pathway
from WikiPathway can be distinguished from the Kegg pathway.
### Effect of _seed_ value
The layering of the nodes is not always optimal. As the layering is affected by
the _seed_ value, you might want to test few _seed_ values, using the
_set.seed()_ function, before selecting the final graph.
```{r runCreateEmap02, echo=TRUE, eval=TRUE, message=FALSE, warning=FALSE, fig.cap="An enrichment map with a different seed.", fig.align="center"}
## Set seed to ensure reproducible results
set.seed(91)
## Create network for all Kegg terms
createEnrichMap(gostObject=parentalNapaVsDMSOEnrichment,
query="parental_napa_vs_DMSO",
source="KEGG")
```
### Enrichment map customization
The output of the _createEnrichMap()_ function is a _ggplot_ object.
This means that the graph can be personalized. For example, the default
colors can be changed:
```{r runCreateEmap03, echo=TRUE, eval=TRUE, message=FALSE, warning=FALSE, fig.cap="An enrichment map with personalized colors.", fig.align="center"}
## The ggplot2 library is required
library(ggplot2)
## Set seed to ensure reproducible results
set.seed(91)
## Create network for all Kegg terms
graphKegg <- createEnrichMap(gostObject=parentalNapaVsDMSOEnrichment,
query="parental_napa_vs_DMSO",
source="KEGG")
## Nodes with lowest p-values will be in orange and highest p-values in black
## The title of the legend is also modified
graphKegg + scale_color_continuous(name="P-value adjusted", low="orange",
high="black")
```
# Transforming enrichment results into an enrichment map with groups from different enrichment analyses
The following workflow gives an overview of the steps associated to the
creation of an enrichment map with groups. Groups can be created from the
same enrichment analysis or can come from different enrichment analyses. This
section is specifically for enrichment maps created using multiple enrichment
analyses.
The key steps for the workflow are:
Step | Function
--------------------------------------------- | ------------------------------
Run multiple enrichment analyses | `gprofiler2::gost()`
Create an enrichment map with groups | `createEnrichMapMultiBasic()`
The `package::function()` notation is used for functions from other packages.
The first step has been presented in the previous section.
## Create an enrichment map using multiple enrichment analyses
An enrichment map can show enrichment results from multiple enrichment
analyses. A different color is used for each analysis. This enables to
highlight the similarities and differences between the analyses.
```{r emapMulti01, echo=TRUE, warning=FALSE, message=FALSE, collapse=F, eval=TRUE, fig.cap="An enrichment map containing Kegg enrichment results for 2 different experiments.", fig.align="center"}
## Set seed to ensure reproducible results
set.seed(2121)
## The dataset of functional enriched terms for two experiments:
## napabucasin treated and DMSO control parental and
## napabucasin treated and DMSO control expressing Rosa26 control vector
## (Froeling et al 2019)
data("parentalNapaVsDMSOEnrichment")
data("rosaNapaVsDMSOEnrichment")
## The gostObjectList is a list containing all
## the functional enrichment objects
gostObjectList <- list(parentalNapaVsDMSOEnrichment,
rosaNapaVsDMSOEnrichment)
## The queryList is a list of query names retained for each of the enrichment
## object (same order). Beware that a enrichment object can contain more than
## one query.
query_01 <- unique(parentalNapaVsDMSOEnrichment$result$query)[1]
query_02 <- unique(rosaNapaVsDMSOEnrichment$result$query)[1]
queryList <- list(query_01, query_02)
## Enrichment map where the groups are the KEGG results for the 2 different
## experiments
createEnrichMapMultiBasic(gostObjectList=gostObjectList,
queryList=queryList, source="KEGG", removeRoot=TRUE)
```
There are 4 KEGG enrichment terms that are shared by the 2 experiments.
In additions, the _Viral carinogenesis_ term is the only term that is not
related to the other KEGG terms.
## Effect of _seed_ value
The layering of the nodes is not always optimal. As the layering is affected by
the _seed_ value, you might want to test few _seed_ values, using the
_set.seed()_ function, before selecting the final graph.
```{r emapMultiSeed, echo=TRUE, eval=TRUE, message=FALSE, warning=FALSE, fig.cap="An enrichment map using the same data fromt the previous one but with a different seed.", fig.align="center"}
## Set seed to ensure reproducible results
set.seed(5)
## Enrichment map where the groups are the KEGG results for the 2 different
## experiments
createEnrichMapMultiBasic(gostObjectList=gostObjectList,
queryList=queryList, source="KEGG", removeRoot=TRUE)
```
## Enrichment map customization
The output of the _createEnrichMapMultiBasic()_ function is a _ggplot_ object.
This means that the graph can be personalized. For example, the default
colors, legend title and legend font can be changed:
```{r emapMultiCustom, echo=TRUE, warning=FALSE, message=FALSE, collapse=F, eval=TRUE, fig.cap="An enrichment map using KEGG terms from two enrichment analyses with personalized colors and legend."}
## Required library
library(ggplot2)
## Set seed to ensure reproducible results
set.seed(5)
## Enrichment map where the groups are the KEGG results for the 2 different
## experiments
createEnrichMapMultiBasic(gostObjectList=gostObjectList,
queryList=queryList, source="KEGG", removeRoot=TRUE) +
scale_fill_manual(name="Groups",
breaks = queryList,
values = c("cyan4", "bisque3"),
labels = c("parental", "rosa")) +
theme(legend.title = element_text(face="bold"))
```
# Transforming enrichment results into an enrichment map with groups from same enrichment analysis or from complex designs
The following workflow gives an overview of the steps associated to the
creation of an enrichment map with groups. Groups can be created from the
same enrichment analysis or can come from different enrichment analyses. This
section is specifically for enrichment maps created using complex designs.
The key steps for the workflow are:
Step | Function
------------------------------------------- | ------------------------------
Run one ro multiple enrichment analyses | `gprofiler2::gost()`
Create an enrichment map with group | `createEnrichMapMultiComplex()`
The `package::function()` notation is used for functions from other packages.
The first step has been presented in the previous section.
## Create an enrichment map using mutliple subsections of one enrichment analysis
The _createEnrichMapMultiComplex()_ function enables enrichment map to be
generate with multiple groups from one enrichment analysis. As an example,
terms from different sources can be drawn in different colors.
```{r emapMultiComplex01, echo=TRUE, warning=FALSE, message=FALSE, collapse=F, eval=TRUE, fig.cap="An enrichment map containing Kegg and Reactome results from the rosa Napa vs DMSO analysis.", fig.align="center"}
## Set seed to ensure reproducible results
set.seed(3221)
## The dataset of functional enriched terms for one experiment:
## napabucasin treated and DMSO control expressing Rosa26 control vector
## (Froeling et al 2019)
data("rosaNapaVsDMSOEnrichment")
## The gostObjectList is a list containing all
## the functional enrichment objects
## In this case, the same enrichment object is used twice
gostObjectList <- list(rosaNapaVsDMSOEnrichment,
rosaNapaVsDMSOEnrichment)
## Extract the query name from the enrichment object
query_01 <- unique(rosaNapaVsDMSOEnrichment$result$query)[1]
## The query information is a data frame containing the information required
## to extract the specific terms for each enrichment object.
## The number of rows must correspond to the number of enrichment objects/
## The query name must be present in the enrichment object.
## The source can be: "GO:BP" for Gene Ontology Biological Process,
## "GO:CC" for Gene Ontology Cellular Component, "GO:MF" for Gene Ontology
## Molecular Function, "KEGG" for Kegg, "REAC" for Reactome,
## "TF" for TRANSFAC, "MIRNA" for miRTarBase, "CORUM" for CORUM database,
## "HP" for Human phenotype ontology and "WP" for WikiPathways or
## "TERM_ID" when a list of terms is specified.
## The termsIDs is an empty string except when the source is set to "TERM_ID".
## The group names are going to be used in the legend and should be unique to
## each group.
queryInfo <- data.frame(queryName=c(query_01, query_01),
source=c("KEGG", "REAC"),
removeRoot=c(TRUE, TRUE),
termIDs=c("", ""),
groupName=c("Kegg", "Reactome"),
stringsAsFactors=FALSE)
## Enrichment map where the groups are the KEGG and Reactome results for the
## same experiment
createEnrichMapMultiComplex(gostObjectList=gostObjectList,
queryInfo=queryInfo)
```
The significant genes are overlapping between the Kegg and Reactome pathways
as shown by the connections between the pathways.
The only isolated term is the Kegg Viral carcinogenesis pathway.
## Create an enrichment map using a complex design
Complex designs can be used with _createEnrichMapMultiComplex()_. Results
from different analyses as well as split results from the same analysis can
be showcase together.
In this example, the enrichment map will contain 4 groups from 2 experiments:
- MAP kinases related terms in rosa napabucasin treated vs DMSO control
- MAP kinases related terms in parental napabucasin treated vs DMSO control
- Interleukin related terms in rosa napabucasin treated vs DMSO control
- Interleukin related terms in parental napabucasin treated vs DMSO control
```{r emapMultiCustom2, echo=TRUE, warning=FALSE, message=FALSE, collapse=FALSE, eval=TRUE, fig.cap="An enrichment map using selected terms related to MAP kinases and interleukin in two different experiments."}
## Set seed to ensure reproducible results
set.seed(28)
## The datasets of functional enriched terms for the two experiments:
## napabucasin treated and DMSO control expressing Rosa26 control vector and
## napabucasin treated and DMSO control parental MiaPaCa2 cells
## (Froeling et al 2019)
data("rosaNapaVsDMSOEnrichment")
data("parentalNapaVsDMSOEnrichment")
## The gostObjectList is a list containing all
## the functional enrichment objects
## In this case, the same enrichment object is used twice
## The order of the objects must respect the order on the queryInfo data frame
## In this case:
## 1. rosa dataset (for MAP kinases)
## 2. parental dataset (for MAP kinases)
## 3. rosa dataset (for interleukin)
## 4. parental dataset (for interleukin)
gostObjectList <- list(rosaNapaVsDMSOEnrichment, parentalNapaVsDMSOEnrichment,
rosaNapaVsDMSOEnrichment, parentalNapaVsDMSOEnrichment)
## Extract the query name from the enrichment object
query_rosa <- unique(rosaNapaVsDMSOEnrichment$result$query)[1]
query_parental <- unique(parentalNapaVsDMSOEnrichment$result$query)[1]
## List of selected terms that will be shown in each group
rosa_mapk <- "GO:0017017,GO:0033549,KEGG:04010,WP:WP382"
rosa_il <- "KEGG:04657,WP:WP4754"
parental_mapk <- paste0("GO:0017017,GO:0033549,KEGG:04010,",
"REAC:R-HSA-5675221,REAC:R-HSA-112409,WP:WP382")
parental_il <- "WP:WP4754,WP:WP395"
## The query information is a data frame containing the information required
## to extract the specific terms for each enrichment object
## The number of rows must correspond to the number of enrichment objects
## The query name must be present in the enrichment object
## The source is set to "TERM_ID" so that the terms present in termIDs column
## will be used
## The group name will be used for the legend, the same name cannot be
## used twice
queryInfo <- data.frame(queryName=c(query_rosa, query_parental,
query_rosa, query_parental),
source=c("TERM_ID", "TERM_ID", "TERM_ID", "TERM_ID"),
removeRoot=c(FALSE, FALSE, FALSE, FALSE),
termIDs=c(rosa_mapk, parental_mapk, rosa_il, parental_il),
groupName=c("rosa - MAP kinases", "parental - MAP kinases",
"rosa - Interleukin", "parental - Interleukin"),
stringsAsFactors=FALSE)
## Enrichment map where the groups TODO
createEnrichMapMultiComplex(gostObjectList=gostObjectList,
queryInfo=queryInfo)
```
While the two experiments have a lot of similar enriched terms, there are
still terms that are unique to each experiment.
# Acknowledgments
The differentially expressed genes between napabucasin-treated cells (0.5 uM)
and DMSO as vehicle control are reprinted from
Clinical Cancer Research, 2019, 25 (23), 7162–7174,
Fieke E.M. Froeling, Manojit Mosur Swamynathan, Astrid Deschênes,
Iok In Christine Chio, Erin Brosnan, Melissa A. Yao, Priya Alagesan,
Matthew Lucito, Juying Li, An-Yun Chang, Lloyd C. Trotman, Pascal Belleau,
Youngkyu Park, Harry A. Rogoff, James D. Watson, David A. Tuveson,
Bioactivation of napabucasin triggers reactive oxygen species–mediated
cancer cell death, with permission from AACR.
Robert L. Faure is also supported by the National Sciences Engineering
Research Council of Canada (NSERCC): 155751-1501.
# Session info
Here is the output of sessionInfo() on the system on which this document
was compiled:
```{r sessionInfo, echo=FALSE}
sessionInfo()
```
# References