Package 'GSEAmining' reference manual

Title:	Make Biological Sense of Gene Set Enrichment Analysis Outputs
Description:	Gene Set Enrichment Analysis (GSEA) is a powerful computational method that relates differentially expressed genes to biological processes. Although it was designed to help researchers interpret gene expression data, it can generate large numbers of results whose biological meaning is difficult to interpret. Many available tools rely on the hierarchically structured Gene Ontology (GO) classification to reduce redundancy in the results. However, owing to the popularity of GSEA, many more gene set collections, such as those in the Molecular Signatures Database, are emerging. Since these collections are not organized like those in GO, using them for GSEA does not always give a straightforward answer; in other words, extracting all the meaningful information can be challenging with currently available tools. For these reasons, GSEAmining was created as an easy tool for producing reproducible reports that help researchers make biological sense of GSEA outputs. Given the results of GSEA, GSEAmining clusters the gene sets based on the presence of the same genes in the leading edge (core) subset. The leading edge subset consists of the genes that contribute most to the enrichment score of each gene set. Gene sets that participate in similar biological processes should therefore share genes and, in turn, cluster together. GSEAmining then identifies and represents for each cluster: (1) the most enriched terms in the names of gene sets (as wordclouds) and (2) the most enriched genes in the leading edge subsets (as bar plots). In each case, positive and negative enrichments are shown in different colors, so it is easy to distinguish biological processes or genes that may be of interest in a particular study.
Authors:	Oriol Arqués [aut, cre]
Maintainer:	Oriol Arqués <[email protected]>
License:	GPL-3 | file LICENSE
Version:	1.17.0
Built:	2025-01-28 04:26:44 UTC
Source:	https://github.com/bioc/GSEAmining

clust_group_cores

Description

Takes the output of clust_groups, a data frame, and processes it to obtain the enrichment of genes in the core enrichment (leading edge) subsets within each cluster. The output is used by the functions gm_enrichcores and gm_enrichreport.

Usage

clust_group_cores(cg, top = 3)

Arguments

`cg`	A data frame output from the GSEAmining clust_groups function.
`top`	An integer giving the number of most enriched genes to plot per cluster. Default is 3.

Value

A tibble with four variables (Cluster, Enrichment, lead_token, n).

Examples

data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
clust.groups <- clust_groups(genesets_sel, gs.cl)
clust.lead <- clust_group_cores(clust.groups, top = 3)
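
The counting step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the input table `cg`, its column names and its gene symbols are made up, and the split by positive and negative enrichment (the Enrichment variable) is omitted for brevity.

```r
# Made-up input: gene sets already assigned to clusters (hypothetical columns).
cg <- data.frame(
  ID = c("SET_A", "SET_B", "SET_C"),
  Cluster = c(1, 1, 2),
  core_enrichment = c("TP53/MYC/EGFR", "TP53/MYC", "CDK1/CCNB1"),
  stringsAsFactors = FALSE
)
top <- 2

# Split each leading-edge string on "/" and build one row per (cluster, gene).
cores <- strsplit(cg$core_enrichment, "/", fixed = TRUE)
long <- data.frame(
  Cluster = rep(cg$Cluster, lengths(cores)),
  lead_token = unlist(cores),
  stringsAsFactors = FALSE
)

# Count how often each gene appears in the leading edges of each cluster.
counts <- as.data.frame(
  table(Cluster = long$Cluster, lead_token = long$lead_token),
  responseName = "n", stringsAsFactors = FALSE
)
counts <- counts[counts$n > 0, ]

# Keep the `top` most frequent genes per cluster.
counts <- counts[order(counts$Cluster, -counts$n), ]
top_genes <- do.call(rbind, lapply(split(counts, counts$Cluster), head, n = top))
print(top_genes)
```

Ties are broken here by alphabetical order of the gene symbol, which is a choice of this sketch only.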

clust_group_terms

Description

Takes the output of clust_groups, a data frame, and processes it to obtain the enrichment of terms in gene set names within each cluster. The output is used by the functions gm_enrichterms and gm_enrichreport.

Usage

clust_group_terms(cg)

Arguments

`cg`	A data frame output from the GSEAmining clust_groups function.

Value

A tibble with four variables (Cluster, Enrichment, monogram, n).

Examples

data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
clust.groups <- clust_groups(genesets_sel, gs.cl)
clust.groups.wordcloud <- clust_group_terms(clust.groups)
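
The idea of extracting single-word terms (monograms) from gene set names can be sketched in base R as follows. The gene set names and the short stop-word list are invented for illustration, and lowercasing the names is an assumption of this sketch; the package's own stop words are returned by stop_words().

```r
# Made-up gene set names in MSigDB style.
ids <- c("HALLMARK_E2F_TARGETS", "REACTOME_CELL_CYCLE", "GO_CELL_CYCLE_PROCESS")

# A tiny illustrative stop-word list (hypothetical, not the package's list).
stops <- c("of", "the", "process")

# Split names into words on "_".
tokens <- strsplit(tolower(ids), "_", fixed = TRUE)

# Drop the first word, which names the source collection (e.g. HALLMARK).
tokens <- lapply(tokens, function(x) x[-1])

# Flatten and remove stop words.
monogram <- unlist(tokens)
monogram <- monogram[!monogram %in% stops]

# Word frequencies, most frequent first.
counts <- sort(table(monogram), decreasing = TRUE)
print(counts)
```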

clust_groups

Description

Takes the output of gm_clust, which is an hclust class object, and returns a data frame that is used by the other GSEAmining functions: gm_enrichreport, gm_enrichterms and gm_enrichcores.

Usage

clust_groups(df, hc)

Arguments

`df`	Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'.
`hc`	The output of gm_clust, which is an hclust class object.

Value

A data.frame containing the cluster each gene set belongs to.

Examples

data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
clust.groups <- clust_groups(genesets_sel, gs.cl)
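
One generic way to turn an hclust object into a table of cluster memberships is stats::cutree. The sketch below uses a made-up membership matrix; the cut height and the output column names are assumptions of this sketch and may differ from what clust_groups does internally.

```r
# Made-up membership matrix: rows are gene sets, columns are leading-edge genes.
membership <- rbind(
  SET_A = c(1, 1, 1, 0, 0),
  SET_B = c(1, 1, 0, 0, 0),
  SET_C = c(0, 0, 0, 1, 1)
)
hc <- hclust(dist(membership, method = "binary"), method = "complete")

# Cut just below height 1 so that gene sets sharing no genes
# (binary distance of exactly 1) end up in different clusters.
# The cut height is an assumption of this sketch.
cluster <- cutree(hc, h = 0.999)

# One row per gene set with its cluster number (illustrative column names).
groups <- data.frame(
  ID = names(cluster),
  Cluster = unname(cluster),
  stringsAsFactors = FALSE
)
print(groups)
```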

Selected gene sets as test

Description

Data from a GSEA analysis of differentially expressed genes in treated versus control samples from the HGPalmer-PDX-P30 experiment. Differential gene expression was obtained using the oligo and limma R packages. GSEA was performed with the clusterProfiler R package using the MSigDB collections C2, C5 and Hallmarks.

Usage

data(genesets_sel)

Format

An object of class data.frame with 52 observations and 4 variables:

ID: Name of the gene set
NES: Normalized Enrichment Score
p.adjust: False discovery rate
core_enrichment: Genes that are in the leading edge subset

Source

ArrayExpress

References

Arqués et al. Clinical Cancer Research. 2016 Feb 1;22(3):644-56. doi: 10.1158/1078-0432.CCR-14-3081. Epub 2015 Jul 29.

Examples

data(genesets_sel)

gs.cl <- gm_clust(genesets_sel)

gm_dendplot(genesets_sel, gs.cl)

gm_enrichterms(genesets_sel, gs.cl)

gm_enrichcores(genesets_sel, gs.cl)

## Not run: gm_enrichreport(genesets_sel, gs.cl)

gm_clust: GSEAmining cluster object

Description

Takes the output of gm_filter, or a data frame with the results of a GSEA analysis, and returns an hclust object that can be plotted using the gm_dendplot function.

Usage

gm_clust(df)

Arguments

`df`	Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'.

Value

An object of class hclust that contains the clustering of the gene sets by their core enriched genes. First, a distance matrix is calculated using the 'binary' method; then a clustering is created with the 'complete' method.

Examples

data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
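
The two steps named in the Value section, a 'binary' distance matrix followed by 'complete' linkage, can be reproduced in base R on a toy table. This is a minimal sketch, not the package source: the gene sets and gene symbols are made up, and the way the membership matrix is built is an assumption.

```r
# Toy GSEA-like table; gene sets and genes are made up for illustration.
toy <- data.frame(
  ID = c("SET_A", "SET_B", "SET_C"),
  NES = c(2.1, 1.8, -1.9),
  core_enrichment = c("TP53/MYC/EGFR", "TP53/MYC/KRAS", "CDK1/CCNB1"),
  stringsAsFactors = FALSE
)

# Split each leading-edge string on "/" to get one gene vector per gene set.
cores <- strsplit(toy$core_enrichment, "/", fixed = TRUE)
genes <- sort(unique(unlist(cores)))

# Binary membership matrix: rows are gene sets, columns are genes.
membership <- t(vapply(cores, function(g) as.integer(genes %in% g),
                       integer(length(genes))))
dimnames(membership) <- list(toy$ID, genes)

# 'binary' distance: among genes present in at least one of two gene sets,
# the proportion present in only one of them.
d <- dist(membership, method = "binary")

# 'complete' linkage hierarchical clustering on that distance.
hc <- hclust(d, method = "complete")
print(round(as.matrix(d), 2))
```

In this toy case SET_A and SET_B share two of four genes, so their distance is 0.5, while SET_C shares none and sits at distance 1 from both.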


gm_dendplot: GSEAmining dendrogram plotter

Description

Takes the output of gm_clust, which is an hclust class object, and plots the dendrogram using the dendextend package.

Usage

gm_dendplot(
  df,
  hc,
  col_pos = "red",
  col_neg = "blue",
  dend_len = 30,
  rect = TRUE,
  rect_len = 2
)

Arguments

`df`	Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'.
`hc`	The output of gm_clust, which is an hclust class object.
`col_pos`	Color to represent the positively enriched gene sets. Default is red.
`col_neg`	Color to represent the negatively enriched gene sets. Default is blue.
`dend_len`	An integer that defines the length of the dendrogram. Default is 30. The closer to zero, the longer the dendrogram.
`rect`	A logical value indicating whether rectangles should be drawn around the clusters to help differentiate them. Default is TRUE.
`rect_len`	An integer that specifies the length of the rectangle around the cluster and the gene set label. Default is 2. The closer to zero, the smaller the rectangle.

Value

Invisibly returns a list with all the elements necessary to plot a dendrogram.

Examples

data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
gm_dendplot(genesets_sel, gs.cl)

gm_enrichcores: GSEAmining core enrichment genes

Description

Takes the output of gm_clust, which is an hclust class object, and plots the top n genes in core enrichment (leading edge analysis). Two options are available, either separate barplots by clusters or all together in one plot.

Usage

gm_enrichcores(
  df,
  hc,
  clust = TRUE,
  col_pos = "red",
  col_neg = "blue",
  top = 3
)

Arguments

`df`	Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'.
`hc`	The output of gm_clust, which is an hclust class object.
`clust`	A logical value indicating whether bar plots should be separated by clusters or not. Default is TRUE.
`col_pos`	Color to represent positively enriched gene sets. Default is red.
`col_neg`	Color to represent negatively enriched gene sets. Default is blue.
`top`	An integer giving the number of most enriched genes to plot per cluster. Default is 3.

Value

Returns a ggplot object.

Examples

data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
gm_enrichcores(genesets_sel, gs.cl)

gm_enrichreport: GSEAmining Enrichment Report

Description

Takes the output of gm_clust, which is an hclust class object, and creates a report in pdf that contains enriched terms and enriched core genes in gene sets for each cluster. The results of each cluster are plotted in an independent page.

Usage

gm_enrichreport(
  df,
  hc,
  col_pos = "red",
  col_neg = "blue",
  top = 3,
  output = "gm_report"
)

Arguments

`df`	Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'.
`hc`	The output of gm_clust, which is an hclust class object.
`col_pos`	Color to represent positively enriched gene sets. Default is red.
`col_neg`	Color to represent negatively enriched gene sets. Default is blue.
`top`	An integer giving the number of most enriched genes to plot per cluster. Default is 3.
`output`	A string to name the output pdf file.

Value

Generates a pdf file.

Examples

data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
## Not run: gm_enrichreport(genesets_sel, gs.cl)

gm_enrichterms: GSEAmining enriched terms

Description

Takes the output of gm_clust, which is an hclust class object, and plots gene set enriched terms as wordclouds. Two options are available, either separate enrichments by clusters or plot them together in a single plot.

Usage

gm_enrichterms(df, hc, clust = TRUE, col_pos = "red", col_neg = "blue")

Arguments

`df`	Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'.
`hc`	The output of gm_clust, which is an hclust class object.
`clust`	A logical value indicating if wordclouds should be separated by clusters or not. Default value is TRUE.
`col_pos`	Color to represent positively enriched gene sets. Default is red.
`col_neg`	Color to represent negatively enriched gene sets. Default is blue.

Value

Returns a ggplot object.

Examples

data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
gm_enrichterms(genesets_sel, gs.cl)

gm_filter: GSEAmining GSEA output filter

Description

Filters a data frame containing the results of GSEA analysis.

Usage

gm_filter(df, p.adj = 0.05, neg_NES = 1, pos_NES = 1)

Arguments

`df`	Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'.
`p.adj`	A number that sets the cutoff for the adjusted p-value (or false discovery rate, FDR). Default is 0.05.
`neg_NES`	A positive number that sets the cutoff for negative NES. Default is 1.
`pos_NES`	A positive number that sets the cutoff for positive NES. Default is 1.

Value

A data frame.

Examples

data(genesets_sel)
gs.filt <- gm_filter(genesets_sel, p.adj = 0.05, neg_NES = 2.6, pos_NES = 2)
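
Read from the argument descriptions, the filter keeps gene sets below the adjusted p-value cutoff whose NES is above `pos_NES` or below minus `neg_NES`. A base R sketch of that logic follows; the toy table is made up, and whether the package uses strict or non-strict comparisons is an assumption here.

```r
# Toy GSEA-like table; values are made up for illustration.
toy <- data.frame(
  ID = c("SET_A", "SET_B", "SET_C", "SET_D"),
  NES = c(2.4, -2.8, 1.2, -3.0),
  p.adjust = c(0.01, 0.02, 0.01, 0.20),
  stringsAsFactors = FALSE
)

p.adj <- 0.05
neg_NES <- 2.6
pos_NES <- 2

# Keep significant gene sets with a sufficiently large positive NES
# or a sufficiently large negative NES (strict comparisons assumed).
keep <- toy$p.adjust < p.adj & (toy$NES > pos_NES | toy$NES < -neg_NES)
filtered <- toy[keep, ]
print(filtered)
```

Here SET_C is dropped because its NES is too small, and SET_D because its adjusted p-value exceeds the cutoff.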

stop_words: Stop words

Description

Stop words. Used to eliminate the first word of MSigDB gene set names, which relates to the origin of the gene set, as well as words that carry little meaning, such as prepositions and adverbs.

Usage

stop_words()

Value

Returns a tibble with 2 variables.
