Title: | Make Biological Sense of Gene Set Enrichment Analysis Outputs |
---|---|
Description: | Gene Set Enrichment Analysis (GSEA) is a powerful computational method that relates differentially expressed genes to biological processes. Although it was designed to help researchers interpret gene expression data, it can generate large numbers of results whose biological meaning is difficult to interpret. Many available tools rely on the hierarchically structured Gene Ontology (GO) classification to reduce redundancy in the results. However, due to the popularity of GSEA, many more gene set collections are emerging, such as those in the Molecular Signatures Database (MSigDB). Since these collections are not organized like those in GO, using them for GSEA does not always give a straightforward answer; in other words, extracting all the meaningful information can be challenging with the currently available tools. For these reasons, GSEAmining was created as an easy tool for producing reproducible reports that help researchers make biological sense of GSEA outputs. Given the results of GSEA, GSEAmining clusters the gene sets based on the presence of the same genes in the leading edge (core) subset. The leading edge subset of a gene set contains the genes that contribute most to its enrichment score. Gene sets that participate in similar biological processes should therefore share genes and, in turn, cluster together. GSEAmining then identifies and represents for each cluster: - The most enriched terms in the names of gene sets (as wordclouds) - The most enriched genes in the leading edge subsets (as bar plots). In each case, positive and negative enrichments are shown in different colors, so it is easy to distinguish biological processes or genes that may be of interest in that particular study. |
Authors: | Oriol Arqués [aut, cre] |
Maintainer: | Oriol Arqués <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 1.17.0 |
Built: | 2024-10-30 07:22:22 UTC |
Source: | https://github.com/bioc/GSEAmining |
Takes the output of clust_groups, a data frame, and processes it to obtain the enrichment of genes in the core enrichment (leading edge) subsets within each cluster. The output is used by the functions gm_enrichcores and gm_enrichreport.
clust_group_cores(cg, top = 3)
cg |
A data frame output from the GSEAmining clust_groups function. |
top |
An integer giving the number of most enriched genes to plot per cluster. The default is 3. |
A tibble with four variables (Cluster, Enrichment, lead_token, n).
data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
clust.groups <- clust_groups(genesets_sel, gs.cl)
clust.lead <- clust_group_cores(clust.groups, top = 3)
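The counting step described for clust_group_cores can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data, the "Up"/"Down" labels and the alphabetical tie-breaking are assumptions made for illustration. Only the '/' separator and the output column names (Cluster, Enrichment, lead_token, n) come from the documentation above.

```r
# Toy input: one row per gene set, with assumed Cluster and Enrichment labels.
cg <- data.frame(
  Cluster = c(1, 1, 1, 2, 2),
  Enrichment = c("Up", "Up", "Up", "Down", "Down"),
  core_enrichment = c("TP53/MYC/CDK1", "TP53/MYC", "TP53/CCNB1",
                      "AXIN2/LGR5", "AXIN2/NKD1"),
  stringsAsFactors = FALSE
)

# One row per (gene set, gene): split each leading edge on "/".
tokens <- strsplit(cg$core_enrichment, "/", fixed = TRUE)
long <- data.frame(
  Cluster = rep(cg$Cluster, lengths(tokens)),
  Enrichment = rep(cg$Enrichment, lengths(tokens)),
  lead_token = unlist(tokens),
  stringsAsFactors = FALSE
)

# Count how often each gene appears per cluster and enrichment direction.
counts <- aggregate(n ~ Cluster + Enrichment + lead_token,
                    data = transform(long, n = 1L), FUN = sum)

# Keep the `top` most frequent genes within each cluster
# (ties are broken alphabetically here, which is an arbitrary choice).
top <- 2
counts <- counts[order(counts$Cluster, -counts$n, counts$lead_token), ]
top_genes <- do.call(rbind, lapply(split(counts, counts$Cluster), head, n = top))
rownames(top_genes) <- NULL
print(top_genes)
```

Genes that recur across the leading edges of a cluster rise to the top of the table, which is the information the bar plots are meant to convey.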
Takes the output of clust_groups, a data frame, and processes it to obtain the enrichment of terms in gene set names within each cluster. The output is used by the functions gm_enrichterms and gm_enrichreport.
clust_group_terms(cg)
cg |
A data frame output from the GSEAmining clust_groups function. |
A tibble with four variables (Cluster, Enrichment, monogram, n).
data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
clust.groups <- clust_groups(genesets_sel, gs.cl)
clust.groups.wordcloud <- clust_group_terms(clust.groups)
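As a rough illustration of term enrichment, the base R sketch below splits gene set names into single words (monograms), drops the leading collection prefix and a few filler words, and counts what remains per cluster. The gene set names, the `stop_list` vector and the "Up"/"Down" labels are hypothetical. The package's own stop-word list and tokenisation may differ.

```r
# Toy input with MSigDB-style names; Cluster and Enrichment are assumed labels.
cg <- data.frame(
  ID = c("HALLMARK_MYC_TARGETS_V1", "KEGG_CELL_CYCLE",
         "REACTOME_CELL_CYCLE_MITOTIC", "HALLMARK_WNT_BETA_CATENIN_SIGNALING"),
  Cluster = c(1, 1, 1, 2),
  Enrichment = c("Up", "Up", "Up", "Down"),
  stringsAsFactors = FALSE
)

# Split names on "_" and drop the first word (the collection of origin).
words <- lapply(strsplit(cg$ID, "_", fixed = TRUE), function(w) w[-1])

long <- data.frame(
  Cluster = rep(cg$Cluster, lengths(words)),
  Enrichment = rep(cg$Enrichment, lengths(words)),
  monogram = unlist(words),
  stringsAsFactors = FALSE
)

# Hypothetical stop-word list, for illustration only.
stop_list <- c("V1", "OF", "IN", "AND")
long <- long[!long$monogram %in% stop_list, ]

# Count each remaining word per cluster and enrichment direction.
term_counts <- aggregate(n ~ Cluster + Enrichment + monogram,
                         data = transform(long, n = 1L), FUN = sum)
term_counts <- term_counts[order(term_counts$Cluster, -term_counts$n,
                                 term_counts$monogram), ]
print(term_counts)
```

Words shared by several gene set names in a cluster (here CELL and CYCLE) get the highest counts, so they would appear largest in a wordcloud.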
Takes the output of gm_clust, which is an hclust class object, and returns a data frame that is used by the other GSEAmining functions gm_enrichreport, gm_enrichterms and gm_enrichcores.
clust_groups(df, hc)
df |
Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'. |
hc |
The output of gm_clust, which is an hclust class object. |
A data.frame containing the cluster each gene set belongs to.
data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
clust.groups <- clust_groups(genesets_sel, gs.cl)
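The documentation does not state how cluster membership is derived from the hclust object. The sketch below shows one plausible approach, assuming the tree is cut at a fixed height with cutree. The cut height of 0.9, the toy data and the "Up"/"Down" labels are all illustrative assumptions, not the package's behaviour.

```r
# Toy presence/absence matrix of leading-edge genes (rows are gene sets).
membership <- rbind(
  SET_A = c(1, 1, 1, 0, 0),
  SET_B = c(1, 1, 0, 0, 0),
  SET_C = c(0, 0, 0, 1, 1),
  SET_D = c(0, 0, 0, 1, 0)
)
nes <- c(SET_A = 2.1, SET_B = 1.8, SET_C = -1.9, SET_D = -2.3)

hc <- hclust(dist(membership, method = "binary"), method = "complete")

# Assumption: clusters are obtained by cutting the tree at a fixed height.
cluster <- cutree(hc, h = 0.9)

# Attach the cluster and the sign of the NES to each gene set.
ids <- names(cluster)
groups <- data.frame(
  ID = ids,
  Cluster = unname(cluster),
  NES = unname(nes[ids]),
  Enrichment = unname(ifelse(nes[ids] > 0, "Up", "Down")),
  stringsAsFactors = FALSE
)
print(groups)
```

Gene sets that share no leading-edge genes have a binary distance of 1, so any cut height below 1 keeps them in separate clusters.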
Data from a GSEA analysis of differentially expressed genes in treated versus control samples from the HGPalmer-PDX-P30 experiment. Differential gene expression was obtained using the oligo and limma R packages. GSEA was performed with the clusterProfiler R package using the MSigDB collections C2, C5 and Hallmarks.
data(genesets_sel)
An object of class data.frame with 52 observations and 4 variables:
Name of the gene set
Normalized Enrichment Score
False discovery rate
Genes that are in the leading edge subset
Arqués et al. Clinical Cancer Research. 2016 Feb 1;22(3):644-56. doi: 10.1158/1078-0432.CCR-14-3081. Epub 2015 Jul 29.
data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
gm_dendplot(genesets_sel, gs.cl)
gm_enrichterms(genesets_sel, gs.cl)
gm_enrichcores(genesets_sel, gs.cl)
## Not run:
gm_enrichreport(genesets_sel, gs.cl)
## End(Not run)
Takes the output of gm_filter, or a data frame with the results of a GSEA analysis, and returns an hclust object that can be plotted using the gm_dendplot function.
gm_clust(df)
df |
Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'. |
An object of class hclust that contains the clustering of the gene sets by their core enriched genes. First a distance matrix is calculated using the 'binary' method, and then hierarchical clustering is performed with the 'complete' method.
data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
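The two steps named in the return value ('binary' distance, then 'complete' clustering) can be sketched with base R's dist and hclust. This is a minimal illustration on toy data, assuming the leading-edge genes are first turned into a presence/absence matrix. How the package builds that matrix internally is not documented here.

```r
# Toy GSEA-like table; column names follow the df argument described above.
gsea <- data.frame(
  ID = c("SET_A", "SET_B", "SET_C", "SET_D"),
  NES = c(2.1, 1.8, -1.9, -2.3),
  core_enrichment = c("TP53/MYC/CDK1", "TP53/MYC/CCNB1",
                      "AXIN2/LGR5", "AXIN2/LGR5/NKD1"),
  stringsAsFactors = FALSE
)

# Split each leading edge on "/" to get one vector of genes per gene set.
cores <- strsplit(gsea$core_enrichment, "/", fixed = TRUE)
genes <- sort(unique(unlist(cores)))

# Build a gene set x gene presence/absence matrix.
membership <- t(vapply(cores, function(g) as.integer(genes %in% g),
                       integer(length(genes))))
dimnames(membership) <- list(gsea$ID, genes)

# Binary distance between gene sets, then complete-linkage clustering.
d <- dist(membership, method = "binary")
hc <- hclust(d, method = "complete")
print(round(as.matrix(d), 2))
```

With the 'binary' method, the distance between two gene sets is the fraction of genes found in exactly one of them among the genes found in at least one, so SET_A and SET_B (two shared genes out of four) are 0.5 apart.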
Takes the output of gm_clust, which is an hclust class object, and plots the dendrogram using the dendextend package.
gm_dendplot( df, hc, col_pos = "red", col_neg = "blue", dend_len = 30, rect = TRUE, rect_len = 2 )
df |
Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'. |
hc |
The output of gm_clust, which is an hclust class object. |
col_pos |
Color to represent the positively enriched gene sets. Default is red. |
col_neg |
Color to represent the negatively enriched gene sets. Default is blue. |
dend_len |
An integer that defines the length of the dendrogram. Default value is 30. The closer to zero, the longer the dendrogram. |
rect |
A logical value indicating whether rectangles should be drawn around the clusters to help differentiate them. Default is TRUE. |
rect_len |
An integer to specify the length of the rectangle around the cluster and the gene set label. Default is 2. The closer to zero, the smaller the rectangle. |
Invisibly returns a list with all the elements necessary to plot a dendrogram.
data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
gm_dendplot(genesets_sel, gs.cl)
Takes the output of gm_clust, which is an hclust class object, and plots the top n genes in core enrichment (leading edge analysis). Two options are available, either separate barplots by clusters or all together in one plot.
gm_enrichcores( df, hc, clust = TRUE, col_pos = "red", col_neg = "blue", top = 3 )
df |
Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'. |
hc |
The output of gm_clust, which is an hclust class object. |
clust |
A logical value indicating whether bar plots should be separated by clusters or not. Default value is TRUE. |
col_pos |
Color to represent positively enriched gene sets. Default is red. |
col_neg |
Color to represent negatively enriched gene sets. Default is blue. |
top |
An integer giving the number of most enriched genes to plot per cluster. The default is 3. |
Returns a ggplot object.
data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
gm_enrichcores(genesets_sel, gs.cl)
Takes the output of gm_clust, which is an hclust class object, and creates a PDF report that contains the enriched terms and enriched core genes of the gene sets in each cluster. The results for each cluster are plotted on a separate page.
gm_enrichreport( df, hc, col_pos = "red", col_neg = "blue", top = 3, output = "gm_report" )
df |
Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'. |
hc |
The output of gm_clust, which is an hclust class object. |
col_pos |
Color to represent positively enriched gene sets. Default is red. |
col_neg |
Color to represent negatively enriched gene sets. Default is blue. |
top |
An integer giving the number of most enriched genes to plot per cluster. The default is 3. |
output |
A string to name the output PDF file. Default is "gm_report". |
Generates a pdf file.
data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
## Not run:
gm_enrichreport(genesets_sel, gs.cl)
## End(Not run)
Takes the output of gm_clust, which is an hclust class object, and plots gene set enriched terms as wordclouds. Two options are available, either separate enrichments by clusters or plot them together in a single plot.
gm_enrichterms(df, hc, clust = TRUE, col_pos = "red", col_neg = "blue")
df |
Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'. |
hc |
The output of gm_clust, which is an hclust class object. |
clust |
A logical value indicating if wordclouds should be separated by clusters or not. Default value is TRUE. |
col_pos |
Color to represent positively enriched gene sets. Default is red. |
col_neg |
Color to represent negatively enriched gene sets. Default is blue. |
Returns a ggplot object.
data(genesets_sel)
gs.cl <- gm_clust(genesets_sel)
gm_enrichterms(genesets_sel, gs.cl)
Filters a data frame containing the results of GSEA analysis.
gm_filter(df, p.adj = 0.05, neg_NES = 1, pos_NES = 1)
df |
Data frame that contains at least three columns: an ID column for the gene set names, a NES column with the normalized enrichment score and a core_enrichment column containing the genes in the leading edge of each gene set separated by '/'. |
p.adj |
A numeric value to set the limit of the adjusted p-value (or false discovery rate, FDR). Default value is 0.05. |
neg_NES |
A positive number to set the limit of negative NES. Default is 1. |
pos_NES |
A positive number to set the limit of positive NES. Default is 1. |
A data frame.
data(genesets_sel)
gs.filt <- gm_filter(genesets_sel, p.adj = 0.05, neg_NES = 2.6, pos_NES = 2)
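The effect of the three thresholds can be sketched in base R. This is an assumption about how they combine, not the package's code: the sketch keeps gene sets with an adjusted p-value below p.adj whose NES is either above pos_NES or below -neg_NES. The column name p.adjust follows clusterProfiler's output, and the strict inequalities are a guess.

```r
# Toy GSEA results; p.adjust is the column name used by clusterProfiler.
gsea <- data.frame(
  ID = c("SET_A", "SET_B", "SET_C", "SET_D"),
  NES = c(2.4, 1.2, -2.8, -1.5),
  p.adjust = c(0.01, 0.01, 0.001, 0.20),
  stringsAsFactors = FALSE
)

# Thresholds taken from the example above.
p.adj <- 0.05
neg_NES <- 2.6
pos_NES <- 2

# Assumed rule: significant AND strongly enriched in either direction.
keep <- gsea$p.adjust < p.adj &
  (gsea$NES > pos_NES | gsea$NES < -neg_NES)
filtered <- gsea[keep, ]
print(filtered)
```

Separate thresholds for each direction allow a stricter cutoff on one side, which is useful when negative and positive enrichments differ in number or strength.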
Stop words. Eliminates the first word of MSigDB gene set names, which indicates the origin of the gene set. It also eliminates words that add little meaning, such as prepositions and adverbs, among others.
stop_words()
Returns a tibble with 2 variables.