Title: | Reduce + Visualize GO |
---|---|
Description: | Reduce and visualize lists of Gene Ontology terms by identifying redudance based on semantic similarity. |
Authors: | Sergi Sayols [aut, cre], Sara Elmeligy [ctb] |
Maintainer: | Sergi Sayols <[email protected]> |
License: | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-10-31 04:35:42 UTC |
Source: | https://github.com/bioc/rrvgo |
calculateSimMatrix Calculate the score similarity matrix between terms
calculateSimMatrix( x, orgdb, keytype = "ENTREZID", semdata = GOSemSim::godata(orgdb, ont = ont, keytype = keytype), ont = c("BP", "MF", "CC"), method = c("Resnik", "Lin", "Rel", "Jiang", "Wang") )
calculateSimMatrix( x, orgdb, keytype = "ENTREZID", semdata = GOSemSim::godata(orgdb, ont = ont, keytype = keytype), ont = c("BP", "MF", "CC"), method = c("Resnik", "Lin", "Rel", "Jiang", "Wang") )
x |
vector of GO terms |
orgdb |
one of org.* Bioconductor packages (the package name, or the package itself) |
keytype |
keytype passed to AnnotationDbi::keys to retrieve GO terms associated to gene ids in your orgdb |
semdata |
object with prepared GO DATA for measuring semantic similarity |
ont |
ontlogy. One of c("BP", "MF", "CC") |
method |
distance method. One of the supported methods by GOSemSim: c("Resnik", "Lin", "Rel", "Jiang", "Wang") |
All similarity measures available are those implemented in the [GOSemSim package](https://www.bioconductor.org/packages/release/bioc/html/GOSemSim.html), namely the Resnik, Lin, Relevance, Jiang and Wang methods. See the [Semantic Similarity Measurement Based on GO](https://www.bioconductor.org/packages/release/bioc/vignettes/GOSemSim/inst/doc/GOSemSim.html#semantic-similarity-measurement-based-on-go) section from the GOSeSim documentation for more details.
a square matrix with similarity scores between terms
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
getGoSize Get GO term size (# of genes)
getGoSize(terms, orgdb, keytype, children)
getGoSize(terms, orgdb, keytype, children)
terms |
GO terms |
orgdb |
one of org.* Bioconductor packages (the package name, or the package itself) |
keytype |
keytype passed to AnnotationDbi::keys to retrieve GO terms associated to gene ids in your orgdb |
children |
include genes in children terms (based on relationships in the GO DAG hierarchy) |
number of genes associated with each term
getGoTerm Get the description of a GO term
getGoTerm(x)
getGoTerm(x)
x |
GO terms |
the Term slot in GO.db::GOTERM[[x]]
getTermDisp Calculate the term dispensability score, defined as the semantic similarity threshold a term was assigned to a cluster (namely, the similarity of a term to the cluster representative term).
getTermDisp(simMatrix, cluster, clusterRep)
getTermDisp(simMatrix, cluster, clusterRep)
simMatrix |
a (square) similarity matrix |
cluster |
the cluster assignment for each term |
clusterRep |
the cluster representative term |
a vector of term dispensability scores
getTermUniq Calculate the term uniqueness score, defined as 1 minus the average semantic similarity of a term to all other terms.
getTermUniq(simMatrix, cluster = NULL)
getTermUniq(simMatrix, cluster = NULL)
simMatrix |
a (square) similarity matrix |
cluster |
vector with the cluster each entry in the simMatrix belongs to. If NULL, a |
a vector of term uniqueness scores
gg_color_hue Emulate ggplot2 color palette.
gg_color_hue(n)
gg_color_hue(n)
n |
number of colors |
It is just equally spaced hues around the color wheel, starting from 15:
a vector with colors (alphanumeric)
## Not run: plot(1:10, pch=16, cex=2, col=gg_color_hue(10)) ## End(Not run)
## Not run: plot(1:10, pch=16, cex=2, col=gg_color_hue(10)) ## End(Not run)
heatmapPlot Plot similarity matrix as a heatmap
heatmapPlot( simMatrix, reducedTerms = NULL, annotateParent = TRUE, annotationLabel = "parentTerm", ... )
heatmapPlot( simMatrix, reducedTerms = NULL, annotateParent = TRUE, annotationLabel = "parentTerm", ... )
simMatrix |
a (square) similarity matrix. |
reducedTerms |
a data.frame with the reduced terms from reduceSimMatrix() |
annotateParent |
whether to add annotation of the parent |
annotationLabel |
display "parent" ids or "parentTerm" string |
... |
other parameters sent to pheatmap::pheatmap() |
Matrix with similarity scores between terms is represented as a heatmap.
Invisibly a pheatmap object that is a list with components
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db") heatmapPlot(simMatrix, reducedTerms, annotateParent=TRUE, annotationLabel="parentTerm", fontsize=6)
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db") heatmapPlot(simMatrix, reducedTerms, annotateParent=TRUE, annotationLabel="parentTerm", fontsize=6)
loadOrgdb Load an orgdb object
loadOrgdb(orgdb)
loadOrgdb(orgdb)
orgdb |
one of org.* Bioconductor packages |
the loaded orgdb
reduceSimMatrix Reduce a set of GO terms based on their semantic similarity and scores.
reduceSimMatrix( simMatrix, scores = c("uniqueness", "size"), threshold = 0.7, orgdb, keytype = "ENTREZID", children = TRUE )
reduceSimMatrix( simMatrix, scores = c("uniqueness", "size"), threshold = 0.7, orgdb, keytype = "ENTREZID", children = TRUE )
simMatrix |
a (square) similarity matrix |
scores |
one of c("uniqueness", "size"), or a *named* vector with scores provided for each term, where higher values favor choosing the term as the cluster representative. The default "uniqueness" uses a score reflecting how unique the term is. Note: if you like to use p-values as scores, consider -1*log-transforming them ('-log(p)') |
threshold |
similarity threshold (0-1). Some guidance: Large (allowed similarity=0.9), Medium (0.7), Small (0.5), Tiny (0.4) Defaults to Medium (0.7) |
orgdb |
one of org.* Bioconductor packages (the package name, or the orgdb object itself) |
keytype |
keytype passed to AnnotationDbi::keys to retrieve GO terms associated to gene ids in your orgdb |
children |
when retrieving GO term size, include genes in children terms. (based on relationships in the GO DAG hierarchy). Defaults to TRUE |
Group terms which are at least within a similarity below 'threshold'. Decide which term remains based on a score. If no score is provided, then decide based on the "uniqueness" or the term "size".
Currently, rrvgo uses the similarity between pairs of terms to compute a distance matrix, defined as (1-simMatrix). The terms are then hierarchically clustered using complete linkage, and the tree is cut at the desired threshold, picking the term with the highest score as the representative of each group.
Therefore, higher thresholds lead to fewer groups, and the threshold should be read as the minimum similarity between group representatives.
a data.frame identifying the different clusters of terms, the parent term representing the cluster, and some metrics of importance describing how unique and dispensable a term is.
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db")
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db")
scatterPlot Plot GO terms as scattered points.
scatterPlot( simMatrix, reducedTerms, algorithm = c("pca", "umap"), onlyParents = FALSE, size = "score", addLabel = TRUE, labelSize = 3 )
scatterPlot( simMatrix, reducedTerms, algorithm = c("pca", "umap"), onlyParents = FALSE, size = "score", addLabel = TRUE, labelSize = 3 )
simMatrix |
a (square) similarity matrix. |
reducedTerms |
a data.frame with the reduced terms from reduceSimMatrix() |
algorithm |
algorithm for dimensionality reduction. Either pca or umap. |
onlyParents |
plot only parent terms. Point size is the number of aggregated terms under the parent. |
size |
what to use as point size. Can be either GO term's "size" or "score". |
addLabel |
add labels with the most representative term of the group. |
labelSize |
text size in the label. |
Distances between points represent the similarity between terms. Axes are the first 2 components of applying one of this dimensionality reduction algorithms: - a PCoA to the (di)similarity matrix. - a UMAP (Uniform Manifold Approximation and Projection,[1]) Size of the point represents the provided scores or, in its absence, the number of genes the GO term contains.
ggplot2 object ready to be printed (or manipulated)
[1] Konopka T (2022). _umap: Uniform Manifold Approximation and Projection_. R package version 0.2.8.0, https://CRAN.R-project.org/package=umap.
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db") scatterPlot(simMatrix, reducedTerms)
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db") scatterPlot(simMatrix, reducedTerms)
shiny_rrvgo Launch an interactive web interface.
shiny_rrvgo(...)
shiny_rrvgo(...)
... |
other params sent to shiny::runApp(). |
Nothing
treemapPlot Plot GO terms as a treemap.
treemapPlot(reducedTerms, size = "score", title = "", ...)
treemapPlot(reducedTerms, size = "score", title = "", ...)
reducedTerms |
a data.frame with the reduced terms from reduceSimMatrix() |
size |
what to use as point size. Can be either GO term's "size" or "score" |
title |
title of the plot. Defaults to nothing |
... |
other parameters sent to treemap::treemap() |
A list from the call to the 'treemap()' function is silently returned
## Not run: go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db") treemapPlot(reducedTerms) ## End(Not run)
## Not run: go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db") treemapPlot(reducedTerms) ## End(Not run)
wordlcoudPlot Plot GO reduced terms as a wordcloud.
wordcloudPlot(reducedTerms, onlyParents = TRUE, ...)
wordcloudPlot(reducedTerms, onlyParents = TRUE, ...)
reducedTerms |
a data.frame with the reduced terms from reduceSimMatrix(). |
onlyParents |
use only parent terms to calculate frequencies. |
... |
other parameters sent to wordcloud::wordcloud() |
Nothing
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db") wordcloudPlot(reducedTerms, min.freq=1, colors="black")
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo")) simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel") scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID) reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db") wordcloudPlot(reducedTerms, min.freq=1, colors="black")