Package 'rrvgo' reference manual

Title:	Reduce + Visualize GO
Description:	Reduce and visualize lists of Gene Ontology terms by identifying redundancy based on semantic similarity.
Authors:	Sergi Sayols [aut, cre], Sara Elmeligy [ctb]
Maintainer:	Sergi Sayols <[email protected]>
License:	GPL-3
Version:	1.19.0
Built:	2025-03-30 07:19:23 UTC
Source:	https://github.com/bioc/rrvgo

calculateSimMatrix Calculate the score similarity matrix between terms

Description

calculateSimMatrix Calculate the score similarity matrix between terms

Usage

calculateSimMatrix(
  x,
  orgdb,
  keytype = "ENTREZID",
  semdata = GOSemSim::godata(orgdb, ont = ont, keytype = keytype),
  ont = c("BP", "MF", "CC"),
  method = c("Resnik", "Lin", "Rel", "Jiang", "Wang")
)

Arguments

`x`	vector of GO terms
`orgdb`	one of org.* Bioconductor packages (the package name, or the package itself)
`keytype`	keytype passed to AnnotationDbi::keys to retrieve GO terms associated to gene ids in your orgdb
`semdata`	object with prepared GO DATA for measuring semantic similarity
`ont`	ontology. One of c("BP", "MF", "CC")
`method`	similarity method. One of the methods supported by GOSemSim: c("Resnik", "Lin", "Rel", "Jiang", "Wang")

Details

The available similarity measures are those implemented in the [GOSemSim package](https://www.bioconductor.org/packages/release/bioc/html/GOSemSim.html), namely the Resnik, Lin, Relevance, Jiang and Wang methods. See the [Semantic Similarity Measurement Based on GO](https://www.bioconductor.org/packages/release/bioc/vignettes/GOSemSim/inst/doc/GOSemSim.html#semantic-similarity-measurement-based-on-go) section of the GOSemSim documentation for more details.

Value

a square matrix with similarity scores between terms

Examples

go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo"))
simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")

getGoSize Get GO term size (# of genes)

Description

getGoSize Get GO term size (# of genes)

Usage

getGoSize(terms, orgdb, keytype, children)

Arguments

`terms`	GO terms
`orgdb`	one of org.* Bioconductor packages (the package name, or the package itself)
`keytype`	keytype passed to AnnotationDbi::keys to retrieve GO terms associated to gene ids in your orgdb
`children`	include genes in children terms (based on relationships in the GO DAG hierarchy)

Value

number of genes associated with each term

getGoTerm Get the description of a GO term

Description

getGoTerm Get the description of a GO term

Usage

getGoTerm(x)

Arguments

`x`	GO terms

Value

the Term slot in GO.db::GOTERM[[x]]

getTermDisp Calculate the term dispensability score, defined as the semantic similarity threshold a term was assigned to a cluster (namely, the similarity of a term to the cluster representative term).

Description

getTermDisp Calculate the term dispensability score, defined as the semantic similarity threshold a term was assigned to a cluster (namely, the similarity of a term to the cluster representative term).

Usage

getTermDisp(simMatrix, cluster, clusterRep)

Arguments

`simMatrix`	a (square) similarity matrix
`cluster`	the cluster assignment for each term
`clusterRep`	the cluster representative term

Value

a vector of term dispensability scores
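As a rough illustration of the definition above (the similarity of each term to its cluster representative), here is a minimal base R sketch. The toy similarity values, the `reps` lookup and the variable names are assumptions made for the example; this is not the package's implementation, and how the package scores the representative itself is not shown here.

```r
# Toy similarity matrix for four hypothetical terms (values are made up).
terms <- c("A", "B", "C", "D")
simMatrix <- matrix(c(1.0, 0.8, 0.2, 0.1,
                      0.8, 1.0, 0.3, 0.2,
                      0.2, 0.3, 1.0, 0.9,
                      0.1, 0.2, 0.9, 1.0),
                    nrow = 4, dimnames = list(terms, terms))

# Hypothetical cluster assignment and one representative per cluster.
cluster <- c(A = 1, B = 1, C = 2, D = 2)
reps    <- c("1" = "A", "2" = "D")

# Representative of the cluster each term belongs to.
repOfTerm <- unname(reps[as.character(cluster)])

# Similarity of each term to its own cluster representative,
# looked up with a (row name, column name) index matrix.
disp <- simMatrix[cbind(terms, repOfTerm)]
names(disp) <- terms
disp
```

In this sketch a representative is compared with itself and therefore gets the diagonal value of the matrix.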

getTermUniq Calculate the term uniqueness score, defined as 1 minus the average semantic similarity of a term to all other terms.

Description

getTermUniq Calculate the term uniqueness score, defined as 1 minus the average semantic similarity of a term to all other terms.

Usage

getTermUniq(simMatrix, cluster = NULL)

Arguments

`simMatrix`	a (square) similarity matrix
`cluster`	vector with the cluster each entry in the simMatrix belongs to. If NULL, a

Value

a vector of term uniqueness scores
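The definition above (1 minus the average similarity of a term to all other terms) can be sketched in base R as follows. The toy matrix and variable names are assumptions for illustration, and the sketch ignores the `cluster` argument; it is not the package's own code.

```r
# Toy similarity matrix for four hypothetical terms (values are made up).
terms <- c("A", "B", "C", "D")
simMatrix <- matrix(c(1.0, 0.8, 0.2, 0.1,
                      0.8, 1.0, 0.3, 0.2,
                      0.2, 0.3, 1.0, 0.9,
                      0.1, 0.2, 0.9, 1.0),
                    nrow = 4, dimnames = list(terms, terms))

# Exclude each term's similarity to itself before averaging.
simNoSelf <- simMatrix
diag(simNoSelf) <- NA

# Uniqueness: 1 minus the mean similarity to all other terms.
uniq <- 1 - rowMeans(simNoSelf, na.rm = TRUE)
uniq
```

Terms that resemble few others (here "A" and "D") end up with the higher uniqueness values.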

gg_color_hue Emulate ggplot2 color palette.

Description

gg_color_hue Emulate ggplot2 color palette.

Usage

gg_color_hue(n)

Arguments

`n`	number of colors

Details

It is just equally spaced hues around the color wheel, starting from 15:
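A minimal sketch of this idea, using `grDevices::hcl()`. The function name `hue_sketch` is hypothetical, and the luminance (65) and chroma (100) values are assumptions borrowed from the commonly cited ggplot2 defaults rather than taken from this manual.

```r
# Hypothetical re-implementation: n equally spaced hues starting at 15 degrees.
hue_sketch <- function(n) {
  # n + 1 points on [15, 375]; the last one coincides with the first
  # on the color wheel, so it is dropped.
  hues <- seq(15, 375, length.out = n + 1)
  grDevices::hcl(h = hues, l = 65, c = 100)[seq_len(n)]
}

cols <- hue_sketch(3)
cols
```

The result is a character vector of hexadecimal color codes that can be passed to any `col=` argument.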

Value

a vector with colors (alphanumeric)

Examples

## Not run: 
plot(1:10, pch=16, cex=2, col=gg_color_hue(10))

## End(Not run)

heatmapPlot Plot similarity matrix as a heatmap

Description

heatmapPlot Plot similarity matrix as a heatmap

Usage

heatmapPlot(
  simMatrix,
  reducedTerms = NULL,
  annotateParent = TRUE,
  annotationLabel = "parentTerm",
  ...
)

Arguments

`simMatrix`	a (square) similarity matrix.
`reducedTerms`	a data.frame with the reduced terms from reduceSimMatrix()
`annotateParent`	whether to add annotation of the parent
`annotationLabel`	display "parent" ids or "parentTerm" string
`...`	other parameters sent to pheatmap::pheatmap()

Details

Matrix with similarity scores between terms is represented as a heatmap.

Value

Invisibly a pheatmap object that is a list with components

Examples

go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo"))
simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID)
reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db")
heatmapPlot(simMatrix, reducedTerms, annotateParent=TRUE, annotationLabel="parentTerm", fontsize=6)

loadOrgdb Load an orgdb object

Description

loadOrgdb Load an orgdb object

Usage

loadOrgdb(orgdb)

Arguments

`orgdb`	one of org.* Bioconductor packages

Value

the loaded orgdb

reduceSimMatrix Reduce a set of GO terms based on their semantic similarity and scores.

Description

reduceSimMatrix Reduce a set of GO terms based on their semantic similarity and scores.

Usage

reduceSimMatrix(
  simMatrix,
  scores = c("uniqueness", "size"),
  threshold = 0.7,
  orgdb,
  keytype = "ENTREZID",
  children = TRUE
)

Arguments

`simMatrix`	a (square) similarity matrix
`scores`	one of c("uniqueness", "size"), or a named vector with scores provided for each term, where higher values favor choosing the term as the cluster representative. The default "uniqueness" uses a score reflecting how unique the term is. Note: if you would like to use p-values as scores, consider transforming them with '-log(p)' so that smaller p-values give higher scores
`threshold`	similarity threshold (0-1). Some guidance: Large (allowed similarity=0.9), Medium (0.7), Small (0.5), Tiny (0.4). Defaults to Medium (0.7)
`orgdb`	one of org.* Bioconductor packages (the package name, or the orgdb object itself)
`keytype`	keytype passed to AnnotationDbi::keys to retrieve GO terms associated to gene ids in your orgdb
`children`	when retrieving GO term size, include genes in children terms (based on relationships in the GO DAG hierarchy). Defaults to TRUE

Details

Group terms that are similar to each other, with 'threshold' controlling how much dissimilarity is allowed within a group, and decide which term remains as the group representative based on a score. If no scores are provided, decide based on the term "uniqueness" or the term "size".

Currently, rrvgo uses the similarity between pairs of terms to compute a distance matrix, defined as (1-simMatrix). The terms are then hierarchically clustered using complete linkage, and the tree is cut at the desired threshold, picking the term with the highest score as the representative of each group.

Therefore, higher thresholds lead to fewer groups, and the threshold should be read as the minimum similarity between group representatives.
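The procedure described above can be sketched with base R's `hclust()` and `cutree()`. The toy similarity matrix, the scores and the variable names are assumptions for illustration; the snippet mirrors the description in this section and is not the package's own code.

```r
# Toy similarity matrix and scores for four hypothetical terms (made up).
terms <- c("A", "B", "C", "D")
simMatrix <- matrix(c(1.0, 0.8, 0.2, 0.1,
                      0.8, 1.0, 0.3, 0.2,
                      0.2, 0.3, 1.0, 0.9,
                      0.1, 0.2, 0.9, 1.0),
                    nrow = 4, dimnames = list(terms, terms))
scores    <- c(A = 2, B = 5, C = 1, D = 3)
threshold <- 0.7

# Distance matrix defined as (1 - simMatrix), clustered with complete linkage.
tree <- stats::hclust(stats::as.dist(1 - simMatrix), method = "complete")

# Cut the tree at the desired threshold to obtain the groups.
cluster <- stats::cutree(tree, h = threshold)

# Within each group, pick the term with the highest score as representative.
groups <- split(names(cluster), cluster)
reps   <- vapply(groups, function(ts) ts[which.max(scores[ts])], character(1))

# Representative ("parent") of the group each term belongs to.
parent <- unname(reps[as.character(cluster)])
data.frame(term = terms, cluster = unname(cluster), parent = parent)
```

With these toy values the tree is cut into two groups, {A, B} and {C, D}, represented by "B" and "D" because they carry the highest scores in their groups.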

Value

a data.frame identifying the different clusters of terms, the parent term representing the cluster, and some metrics of importance describing how unique and dispensable a term is.

Examples

go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo"))
simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID)
reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db")

scatterPlot Plot GO terms as scattered points.

Description

scatterPlot Plot GO terms as scattered points.

Usage

scatterPlot(
  simMatrix,
  reducedTerms,
  algorithm = c("pca", "umap"),
  onlyParents = FALSE,
  size = "score",
  addLabel = TRUE,
  labelSize = 3
)

Arguments

`simMatrix`	a (square) similarity matrix.
`reducedTerms`	a data.frame with the reduced terms from reduceSimMatrix()
`algorithm`	algorithm for dimensionality reduction. Either pca or umap.
`onlyParents`	plot only parent terms. Point size is the number of aggregated terms under the parent.
`size`	what to use as point size. Can be either GO term's "size" or "score".
`addLabel`	add labels with the most representative term of the group.
`labelSize`	text size in the label.

Details

Distances between points represent the similarity between terms. Axes are the first 2 components of applying one of these dimensionality reduction algorithms:

- a PCoA to the (dis)similarity matrix.
- a UMAP (Uniform Manifold Approximation and Projection, [1]).

Size of the point represents the provided scores or, in their absence, the number of genes the GO term contains.
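For the PCoA case, the idea can be sketched with base R's `cmdscale()` (classical multidimensional scaling) applied to the dissimilarity matrix. The toy similarity values are assumptions, chosen so that two dimensions reproduce the distances exactly; this is an illustration, not the package's plotting code.

```r
# Toy similarity matrix for four hypothetical terms (values are made up).
terms <- c("A", "B", "C", "D")
simMatrix <- matrix(c(1.0, 0.7, 0.6, 0.5,
                      0.7, 1.0, 0.5, 0.6,
                      0.6, 0.5, 1.0, 0.7,
                      0.5, 0.6, 0.7, 1.0),
                    nrow = 4, dimnames = list(terms, terms))

# Dissimilarity between terms, defined as 1 - similarity.
d <- stats::as.dist(1 - simMatrix)

# PCoA: keep the first two components as plotting coordinates.
coords <- stats::cmdscale(d, k = 2)
colnames(coords) <- c("V1", "V2")
coords
```

Each row of `coords` gives the position of one term; terms with higher similarity end up closer together.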

Value

ggplot2 object ready to be printed (or manipulated)

References

[1] Konopka T (2022). _umap: Uniform Manifold Approximation and Projection_. R package version 0.2.8.0, https://CRAN.R-project.org/package=umap.

Examples

go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo"))
simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID)
reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db")
scatterPlot(simMatrix, reducedTerms)

shiny_rrvgo Launch an interactive web interface.

Description

shiny_rrvgo Launch an interactive web interface.

Usage

shiny_rrvgo(...)

Arguments

`...`	other params sent to shiny::runApp().

Value

Nothing

treemapPlot Plot GO terms as a treemap.

Description

treemapPlot Plot GO terms as a treemap.

Usage

treemapPlot(reducedTerms, size = "score", title = "", ...)

Arguments

`reducedTerms`	a data.frame with the reduced terms from reduceSimMatrix()
`size`	what to use as point size. Can be either GO term's "size" or "score"
`title`	title of the plot. Defaults to nothing
`...`	other parameters sent to treemap::treemap()

Value

A list from the call to the 'treemap()' function is silently returned

Examples

## Not run: 
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo"))
simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID)
reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db")
treemapPlot(reducedTerms)

## End(Not run)

wordcloudPlot Plot GO reduced terms as a wordcloud.

Description

wordcloudPlot Plot GO reduced terms as a wordcloud.

Usage

wordcloudPlot(reducedTerms, onlyParents = TRUE, ...)

Arguments

`reducedTerms`	a data.frame with the reduced terms from reduceSimMatrix().
`onlyParents`	use only parent terms to calculate frequencies.
`...`	other parameters sent to wordcloud::wordcloud()

Value

Nothing

Examples

go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo"))
simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID)
reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db")
wordcloudPlot(reducedTerms, min.freq=1, colors="black")
