| Title: | Defining and visualizing the distances between different genesets |
|---|---|
| Description: | The package provides different distance measures to calculate the difference between genesets. Based on these scores, the genesets are clustered and visualized as a graph. All of this is presented in an interactive Shiny application for easy usage. |
| Authors: | Annekathrin Nedwed [aut, cre] (ORCID: <https://orcid.org/0000-0002-2475-4945>), Federico Marini [aut] (ORCID: <https://orcid.org/0000-0003-3252-7758>) |
| Maintainer: | Annekathrin Nedwed <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.9.0 |
| Built: | 2026-05-30 09:44:09 UTC |
| Source: | https://github.com/bioc/GeDi |
Check if the input genesets have the expected format for this app
.checkGenesets(genesets, col_name_genesets = "Genesets", col_name_genes = "Genes")
genesets |
a |
col_name_genesets |
character, the name of the column in which the geneset ids are listed. Defaults to "Genesets". |
col_name_genes |
character, the name of the column in which the genes are listed. Defaults to "Genes". |
A validated and formatted genesets data frame.
Check if the provided GeneTonic List object has the expected format for the app and extract the functional enrichment results
.checkGTL(gtl)
gtl |
A |
A validated and renamed geneset data.frame.
Check if the Protein-Protein-interaction (PPI) has the expected format for this app
.checkPPI(ppi)
ppi |
a |
A validated and formatted PPI data frame.
Check if the provided distance scores have the expected format for this app
.checkScores(genesets, distance_scores)
genesets |
a |
distance_scores |
A |
A validated and formatted distance_scores Matrix::Matrix().
This function implements the Markov Clustering (MCL) algorithm for finding community
structure, in an analogous way to other existing algorithms in igraph.
.cluster_markov(
  g,
  add_self_loops = TRUE,
  loop_value = 1,
  mcl_expansion = 2,
  mcl_inflation = 2,
  allow_singletons = TRUE,
  max_iter = 100,
  return_node_names = TRUE,
  return_esm = FALSE
)
g |
The input graph object |
add_self_loops |
Logical, whether to add self-loops to the matrix by
setting the diagonal to |
loop_value |
Numeric, the value to use for self-loops |
mcl_expansion |
Numeric, cluster expansion factor for the Markov clustering iteration - defaults to 2 |
mcl_inflation |
Numeric, cluster inflation factor for the Markov clustering iteration - defaults to 2 |
allow_singletons |
Logical; if |
max_iter |
Numeric value for the maximum number of iterations for the Markov clustering |
return_node_names |
Logical, if the graph is named and set to |
return_esm |
Logical, controlling whether the equilibrium state matrix should be returned |
This implementation has been driven by the nice explanations provided in
https://sites.cs.ucsb.edu/~xyan/classes/CS595D-2009winter/MCL_Presentation2.pdf
https://medium.com/analytics-vidhya/demystifying-markov-clustering-aeb6cdabbfc7
https://github.com/GuyAllard/markov_clustering (python implementation)
More info on the MCL: https://micans.org/mcl/index.html, and https://micans.org/mcl/sec_description1.html
This function returns a communities object, containing the numbers of
the assigned membership (in the slot membership). Please see the
igraph::communities() manual page for additional details
van Dongen, S.M., Graph clustering by flow simulation (2000) PhD thesis, Utrecht University Repository - https://dspace.library.uu.nl/handle/1874/848
Enright AJ, van Dongen SM, Ouzounis CA, An efficient algorithm for large-scale detection of protein families (2002) Nucleic Acids Research, Volume 30, Issue 7, 1 April 2002, Pages 1575–1584, https://doi.org/10.1093/nar/30.7.1575
library("igraph")
g <- make_full_graph(5) %du% make_full_graph(5) %du% make_full_graph(5)
g <- add_edges(g, c(1, 6, 1, 11, 6, 11))
.cluster_markov(g)
V(g)$color <- .cluster_markov(g)$membership
plot(g)
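The expansion and inflation steps described above can be illustrated with a small base R sketch. This is not the code behind `.cluster_markov()`: the function name `mcl_sketch`, the convergence tolerance `tol`, and the rule used to read clusters off the final matrix are all assumptions made for illustration.

```r
# Illustrative Markov Clustering sketch in base R (hypothetical helper,
# not the GeDi implementation).
mcl_sketch <- function(adj, expansion = 2, inflation = 2,
                       loop_value = 1, max_iter = 100, tol = 1e-8) {
  m <- adj
  diag(m) <- loop_value                      # add self-loops
  m <- sweep(m, 2, colSums(m), "/")          # make columns sum to 1
  for (i in seq_len(max_iter)) {
    prev <- m
    # Expansion: raise the matrix to the given power (flow along longer walks)
    expanded <- m
    for (k in seq_len(expansion - 1)) expanded <- expanded %*% m
    # Inflation: element-wise power, then re-normalize each column
    m <- expanded^inflation
    m <- sweep(m, 2, colSums(m), "/")
    if (max(abs(m - prev)) < tol) break      # stop once the matrix is stable
  }
  # Assumed read-out rule: label each node (column) by the first row that
  # still sends noticeable flow into it, then relabel as 1, 2, ...
  first_attractor <- apply(m > 1e-6, 2, function(col) which(col)[1])
  as.integer(factor(first_attractor))
}

# Two triangles (nodes 1-3 and 4-6) joined by a single edge between 3 and 4
adj <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1                       # make the matrix symmetric
membership <- mcl_sketch(adj)
```

Larger values of `inflation` tend to produce more and smaller clusters, which is why the parameter is exposed in the usage above.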
Filter a preselected list of genesets from a data.frame of genesets
.filterGenesets(remove, df_genesets)
remove |
a |
df_genesets |
a |
A data.frame containing information about filtered genesets
This function tries to guess which separator was used in a list of delimited strings.
.findSeparator(stringList, sepList = c(",", "\t", ";", " ", "/"))
stringList |
|
sepList |
|
character, corresponding to the guessed separator. One of "," (comma), "\t" (tab), ";" (semicolon), " " (whitespace) or "/" (slash).
See https://github.com/federicomarini/ideal for details on the original implementation.
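One simple way to guess a separator is to count how often each candidate occurs and pick the most frequent one. The sketch below assumes that heuristic; the helper `guess_separator` is hypothetical and may differ from what `.findSeparator()` or the original implementation in ideal actually do.

```r
# Hypothetical helper: pick the candidate separator that occurs most often.
guess_separator <- function(strings, candidates = c(",", "\t", ";", " ", "/")) {
  counts <- vapply(candidates, function(sep) {
    # Count literal (non-regex) occurrences of `sep` across all strings
    hits <- regmatches(strings, gregexpr(sep, strings, fixed = TRUE))
    sum(lengths(hits))
  }, numeric(1))
  candidates[which.max(counts)]
}

sep <- guess_separator(c("PDHB,VARS2,IARS2", "AAAS,ABCE1"))
```

Ties are resolved in favour of the candidate listed first, so the order of `candidates` matters when several separators are equally frequent.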
Map each geneset to the cluster(s) it belongs to and return the information as
a data.frame
.getClusterDatatable(cluster, gs_names, gs_description)
cluster |
A |
gs_names |
A vector of geneset names |
gs_description |
A vector of descriptions for each geneset |
A data.frame mapping each geneset to the cluster(s) it belongs to
Extracts gene set descriptions from a provided gene set object. The function prioritizes columns "Term", "Description", or "Genesets" to find the appropriate descriptions. If any descriptions are duplicated, the function appends a suffix to make them unique.
.getGenesetDescriptions(genesets)
genesets |
a |
a list of geneset descriptions
Determine the number of CPU cores the scoring functions should use when computing the distance scores.
.getNumberCores(n_cores = NULL)
n_cores |
numeric, number of cores to use for the function.
Defaults to |
Number of CPU cores to be used.
Generate a data.frame of the graph metrics degree, betweenness,
harmonic centrality and clustering coefficient for each node
in a given graph.
.graphMetricsGenesetsDT(g, genesets)
g |
A |
genesets |
A |
A data.frame of geneset extended by columns for the degree,
betweenness, harmonic centrality and clustering coefficient for each
geneset.
Maps numeric continuous values to values in a color palette
.map_to_color(x, pal, symmetric = TRUE, limits = NULL)
x |
A vector of numeric values (e.g. log2FoldChange values) to be converted to a vector of colors |
pal |
A vector of characters specifying the definition of colors for the
palette, e.g. obtained via |
symmetric |
Logical value, whether to return a palette which is symmetrical
with respect to the minimum and maximum values - "respecting" the zero.
Defaults to |
limits |
A vector containing the limits of the values to be mapped. If
not specified, defaults to the range of values in the |
A vector of colors, each corresponding to an element in the original vector
a <- 1:9
pal <- RColorBrewer::brewer.pal(9, "Set1")
.map_to_color(a, pal)
plot(a, col = .map_to_color(a, pal), pch = 20, cex = 4)

b <- 1:50
pal2 <- grDevices::colorRampPalette(
  RColorBrewer::brewer.pal(name = "RdYlBu", 11)
)(50)
plot(b, col = .map_to_color(b, pal2), pch = 20, cex = 3)
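Mapping continuous values to a palette usually comes down to cutting the value range into as many equal-width bins as there are colors. The sketch below shows that idea in base R; `value_to_color` is a hypothetical helper, it leaves out the `symmetric` option, and it is not necessarily how `.map_to_color()` is implemented.

```r
# Hypothetical helper: bin values into length(pal) equal-width intervals
# and return the color of the bin each value falls into.
value_to_color <- function(x, pal, limits = range(x)) {
  breaks <- seq(limits[1], limits[2], length.out = length(pal) + 1)
  # all.inside = TRUE keeps the maximum value in the last bin
  pal[findInterval(x, breaks, all.inside = TRUE)]
}

pal <- c("blue", "lightblue", "white", "pink", "red")
cols <- value_to_color(c(0, 2.5, 5, 7.5, 10), pal)
```

Passing explicit `limits` makes colors comparable across several vectors, since every vector is then binned against the same range.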
This function tries to guess which separator was used in a text delimited file.
.sepguesser(file, sep_list = c(",", "\t", ";", " ", "/"))
file |
a character, location of a file to read data from. |
sep_list |
a |
A character, corresponding to the guessed separator. One of "," (comma), "\t" (tab), ";" (semicolon), " " (whitespace) or "/" (slash).
See https://github.com/federicomarini/ideal for details on the original implementation.
Build an igraph graph from cluster information, connecting nodes which belong to
the same cluster.
buildClusterGraph(cluster, geneset_df, gs_ids, color_by = NULL, gs_names = NULL)
cluster |
list, a |
geneset_df |
|
gs_ids |
vector, a vector of geneset identifiers, e.g. the |
color_by |
character, a column name of |
gs_names |
vector, a vector of geneset descriptions/names, e.g. the
|
An igraph object to be further manipulated or processed/plotted
(e.g. via igraph::plot.igraph() or
visNetwork::visIgraph())
cluster <- list(c(1:5), c(6:9, 1))
genes <- list(
  c("PDHB", "VARS2"), c("IARS2", "PDHA1"),
  c("AAAS", "ABCE1"), c("ABI1", "AAR2"), c("AATF", "AMFR"),
  c("BMS1", "DAP3"), c("AURKAIP1", "CHCHD1"), c("IARS2"),
  c("AHI1", "ALMS1")
)
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
gs_ids <- c(1:9)
geneset_df <- data.frame(
  Genesets = gs_names,
  value = rep(1, 9)
)
geneset_df$Genes <- genes
graph <- buildClusterGraph(
  cluster = cluster,
  geneset_df = geneset_df,
  gs_ids = gs_ids,
  color_by = "value",
  gs_names = gs_names
)
Construct a graph from a given adjacency matrix
buildGraph(adjMatrix, geneset_df = NULL, gs_names = NULL, weighted = FALSE)
adjMatrix |
A |
geneset_df |
|
gs_names |
vector, a vector of geneset descriptions/names, e.g. the
|
weighted |
logical value, whether or not the resulting graph should have
weighted edges. If TRUE, the |
An igraph object to be further manipulated or processed/plotted
(e.g. via igraph::plot.igraph() or
visNetwork::visIgraph())
adj <- Matrix::Matrix(0, 100, 100)
adj[c(80:100), c(80:100)] <- 1
geneset_names <- as.character(stats::runif(100, min = 0, max = 1))
rownames(adj) <- colnames(adj) <- geneset_names
graph <- buildGraph(adj)
Prepare the data for the gsHistogram() by generating a data.frame
which maps geneset names / identifiers to their size.
buildHistogramData(genesets, gs_names, gs_description = NULL, start = 0, end = 0)
genesets |
a |
gs_names |
character vector, Name / identifier of the genesets in
|
gs_description |
Optional, a character vector containing a short description for each geneset |
start |
numeric, Optional, describes the minimum gene set size to include. Defaults to 0. |
end |
numeric, Optional, describes the maximum gene set size to include. Defaults to 0. |
A data.frame mapping geneset names to sizes
## Mock example showing how the data should look like
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
genesets <- list(
  c("PDHB", "VARS2"), c("IARS2", "PDHA1"),
  c("AAAS", "ABCE1"), c("ABI1", "AAR2"), c("AATF", "AMFR"),
  c("BMS1", "DAP3"), c("AURKAIP1", "CHCHD1"), c("IARS2"),
  c("AHI1", "ALMS1")
)
p <- buildHistogramData(genesets, gs_names)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
p <- buildHistogramData(genes, macrophage_topGO_example_small$Genesets)
Remove subsets from a given list of sets, i.e. remove sets which are completely contained in any other larger set in the list.
checkInclusion(seeds)
seeds |
A |
A list of unique sets
## Mock example showing how the data should look like
seeds <- list(c(1:5), c(2:5), c(6:10))
s <- checkInclusion(seeds)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
seeds <- seedFinding(scores_macrophage_topGO_example_small,
  simThreshold = 0.3,
  memThreshold = 0.5
)
seeds <- checkInclusion(seeds)
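The subset removal described above can be sketched in base R as follows. The helper `drop_subsets` is hypothetical and only illustrates the idea of discarding sets that are fully contained in another set; it is not the code of `checkInclusion()`.

```r
# Hypothetical helper: keep only sets that are not contained in another set.
drop_subsets <- function(sets) {
  # Normalize each set and drop exact duplicates first, so that any
  # remaining containment is a proper subset relation
  sets <- unique(lapply(sets, function(s) sort(unique(s))))
  keep <- vapply(seq_along(sets), function(i) {
    contained <- vapply(seq_along(sets), function(j) {
      i != j && all(sets[[i]] %in% sets[[j]])
    }, logical(1))
    !any(contained)
  }, logical(1))
  sets[keep]
}

res <- drop_subsets(list(c(1:5), c(2:5), c(6:10)))
```

In the mock input, `c(2:5)` is contained in `c(1:5)` and is therefore dropped, while the two remaining sets are kept.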
This function performs clustering on a set of scores using either the Louvain or Markov method.
clustering(scores, threshold, cluster_method = "louvain")
scores |
A |
threshold |
numerical, A threshold used to determine which genesets are considered similar. Genesets are considered similar if (distance) score <= threshold. |
cluster_method |
character, the clustering method to use. The options
are |
A list of clusters
## Mock example showing how the data should look like
m <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(m) <- colnames(m) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
cluster <- clustering(m, 0.3, "markov")

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
clustering <- clustering(scores_macrophage_topGO_example_small, threshold = 0.5)
Functions that are on their way to the function afterlife. Their successors are also listed.
... |
Ignored arguments. |
The successors of these functions are likely coming from a renaming of the functions to more intuitive function names
Each function throws a warning with a deprecation message pointing to its successor (if available).
getGenes(), now replaced by the more intuitive name
prepareGenesetData(). The only change in its functionality concerns the
function name.
Annekathrin Nedwed
# try(getGenes())
Plot a dendrogram of a matrix of (distance) scores.
distanceDendro(distance_scores, cluster_method = "average")
distance_scores |
A |
cluster_method |
character, indicating the clustering method
for the |
A ggdendro::ggdendrogram() plot object.
## Mock example showing how the data should look like
distance_scores <- Matrix::Matrix(0.5, 20, 20)
distance_scores[c(11:15), c(2:6)] <- 0.2
dendro <- distanceDendro(distance_scores, cluster_method = "single")

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
dendro <- distanceDendro(scores_macrophage_topGO_example_small,
  cluster_method = "average"
)
Plot a heatmap of a matrix of (distance) scores of the input genesets
distanceHeatmap(
  distance_scores,
  chars_limit = 50,
  plot_labels = TRUE,
  cluster_rows = TRUE,
  cluster_columns = TRUE,
  title = "Distance Scores",
  display_similarity = FALSE,
  quantile_limits = c(0.025, 0.975)
)
distance_scores |
A |
chars_limit |
Numeric value, Indicates how many characters of the
row and column names of |
plot_labels |
Logical, Indicates if row and column labels should be plotted. Defaults to TRUE |
cluster_rows |
Logical, Indicates whether or not the rows should be clustered based on the distance scores. Defaults to TRUE |
cluster_columns |
Logical, Indicates whether or not the columns should be clustered based on the distance scores. Defaults to TRUE |
title |
character, a title for the figure. Defaults to "Distance Scores" |
display_similarity |
Logical, Indicates whether or not the scores should be plotted as a distance matrix (i.e. 0 indicates completely identical sets) or as a similarity matrix (i.e. 1 indicates completely identical sets). Defaults to FALSE (i.e. a distance matrix). |
quantile_limits |
Numerical vector, Used to scale the colors in the heatmap between the given quantiles. Can be helpful in order to reduce the effect of the diagonal or other outlier values on the color scheme. Defaults to c(0.025, 0.975). |
A ComplexHeatmap::Heatmap() plot object.
## Mock example showing how the data should look like
distance_scores <- Matrix::Matrix(0.5, 20, 20)
distance_scores[c(11:15), c(2:6)] <- 0.2
rownames(distance_scores) <- colnames(distance_scores) <- as.character(c(1:20))
p <- distanceHeatmap(distance_scores)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
p <- distanceHeatmap(scores_macrophage_topGO_example_small)
Visualize the results of an enrichment analysis as a word cloud. The word cloud highlights the most frequent terms associated with the description of the genesets in the enrichment analysis.
enrichmentWordcloud(genesets_df, remove_generic_terms = FALSE, terms_to_remove = c())
genesets_df |
A |
remove_generic_terms |
Logical, whether generic terms like "via", "protein", "factor", "side", "type", "specific", "regulation" and "process" should be removed from the wordcloud. Defaults to FALSE. |
terms_to_remove |
Character vector, A vector of additional terms that should be removed from the wordcloud. |
A wordcloud2::wordcloud2() plot object
## Mock example showing how the data should look like
## If no "Term" or "Description" column is available,
## the rownames of the data frame will be used.
geneset_df <- data.frame(
  Genesets = c("GO:0002503", "GO:0045087", "GO:0019886"),
  Genes = c("B2M, HLA-DMA, HLA-DMB", "ACOD1, ADAM8, AIM2", "B2M, CD74, CTSS")
)
rownames(geneset_df) <- geneset_df$Genesets
wordcloud <- enrichmentWordcloud(geneset_df)

## With available "Term" column.
geneset_df <- data.frame(
  Genesets = c("GO:0002503", "GO:0045087", "GO:0019886"),
  Genes = c("B2M, HLA-DMA, HLA-DMB", "ACOD1, ADAM8, AIM2", "B2M, CD74, CTSS"),
  Term = c(
    "peptide antigen assembly with MHC class II protein complex",
    "innate immune response",
    "antigen processing and presentation of exogenous peptide antigen via MHC class II"
  )
)
wordcloud <- enrichmentWordcloud(geneset_df)

## Example using the data available in the package
data(macrophage_topGO_example, package = "GeDi", envir = environment())
wordcloud <- enrichmentWordcloud(macrophage_topGO_example)
Merge the initially determined seeds to clusters.
fuzzyClustering(seeds, threshold)
seeds |
A |
threshold |
numerical, A threshold for merging seeds |
A list of clusters
See https://david.ncifcrf.gov/helps/functional_classification.html#clustering for details on the original implementation
## Mock example showing how the data should look like
seeds <- list(c(1:5), c(6:10))
cluster <- fuzzyClustering(seeds, 0.5)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
seeds <- seedFinding(scores_macrophage_topGO_example_small,
  simThreshold = 0.3,
  memThreshold = 0.5
)
cluster <- fuzzyClustering(seeds, threshold = 0.5)
GeDi main function
GeDi(
  genesets = NULL,
  ppi_df = NULL,
  distance_scores = NULL,
  gtl = NULL,
  col_name_genesets = "Genesets",
  col_name_genes = "Genes"
)
genesets |
a |
ppi_df |
a |
distance_scores |
A |
gtl |
A |
col_name_genesets |
character, the name of the column in which the geneset ids are listed. Defaults to "Genesets". |
col_name_genes |
character, the name of the column in which the genes are listed. Defaults to "Genes". |
A Shiny app object is returned
if (interactive()) {
  GeDi()
}
# Alternatively, you can also start the application with your data directly
# loaded.
data("macrophage_topGO_example", package = "GeDi")
if (interactive()) {
  GeDi(genesets = macrophage_topGO_example)
}
Construct an adjacency matrix from the (distance) scores and a given threshold.
getAdjacencyMatrix(distanceMatrix, cutOff, weighted = FALSE)
distanceMatrix |
A |
cutOff |
Numeric value, indicating for which pair of entries in the
|
weighted |
logical value, indicating whether or not the resulting
adjacency matrix should be weighted. If TRUE, the matrix will
be weighted by the distance scores in |
A Matrix::Matrix() of adjacency status
m <- Matrix::Matrix(stats::runif(1000, 0, 1), 100, 100)
geneset_names <- as.character(stats::runif(100, min = 0, max = 1))
rownames(m) <- colnames(m) <- geneset_names
threshold <- 0.3
adj <- getAdjacencyMatrix(m, threshold)
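Thresholding a distance matrix into an adjacency matrix can be sketched in a few lines of base R. The sketch assumes that two genesets are adjacent when their distance is at most the cutoff and that the diagonal is excluded; the helper `adjacency_from_distance` is hypothetical and uses a plain matrix rather than a `Matrix::Matrix()` object.

```r
# Hypothetical helper: 1 where distance <= cutoff, 0 elsewhere.
# Assumption: self-connections on the diagonal are not wanted.
adjacency_from_distance <- function(dist_mat, cutoff, weighted = FALSE) {
  adj <- (dist_mat <= cutoff) * 1
  diag(adj) <- 0
  # Optionally keep the distance score as the edge weight
  if (weighted) adj <- adj * dist_mat
  adj
}

d <- matrix(
  c(0.0, 0.2, 0.8,
    0.2, 0.0, 0.4,
    0.8, 0.4, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)
adj <- adjacency_from_distance(d, cutoff = 0.3)
adj_w <- adjacency_from_distance(d, cutoff = 0.3, weighted = TRUE)
```

With a cutoff of 0.3 only the pair `a` and `b` is connected; in the weighted variant that edge carries the distance 0.2 instead of 1.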
Get the annotation of a STRINGdb object, i.e. the aliases of the protein
information
getAnnotation(stringdb)
stringdb |
the |
A data.frame mapping STRINGdb ids to gene names
stringdb <- getStringDB(9606)
stringdb
anno_df <- getAnnotation(stringdb)
Construct a bipartite graph from cluster information, mapping the cluster to its members
getBipartiteGraph(cluster, gs_names, genes)
cluster |
|
gs_names |
vector, a vector of (geneset) identifiers/names to map the
numeric member value in |
genes |
|
An igraph object to be further manipulated or processed/plotted
(e.g. via igraph::plot.igraph() or
visNetwork::visIgraph())
cluster <- list(c(1:5), c(6:9))
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
genes <- list(
  c("PDHB", "VARS2"), c("IARS2", "PDHA1"),
  c("AAAS", "ABCE1"), c("ABI1", "AAR2"), c("AATF", "AMFR"),
  c("BMS1", "DAP3"), c("AURKAIP1", "CHCHD1"), c("IARS2"),
  c("AHI1", "ALMS1")
)
g <- getBipartiteGraph(cluster, gs_names, genes)
Construct an adjacency matrix from a list of clusters.
getClusterAdjacencyMatrix(cluster, gs_names)
cluster |
A |
gs_names |
A vector of geneset names |
A Matrix::Matrix() of adjacency status
cluster <- list(c(1:5), c(6:9))
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
adj <- getClusterAdjacencyMatrix(cluster, gs_names)
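Building an adjacency matrix from cluster membership can be pictured as setting a block of ones for every cluster. The base R sketch below follows that idea; `cluster_adjacency` is a hypothetical helper, it returns a plain matrix, and leaving the diagonal at zero is an assumption.

```r
# Hypothetical helper: members of the same cluster are marked as adjacent.
cluster_adjacency <- function(cluster, gs_names) {
  n <- length(gs_names)
  adj <- matrix(0, n, n, dimnames = list(gs_names, gs_names))
  for (members in cluster) {
    adj[members, members] <- 1   # connect every pair within the cluster
  }
  diag(adj) <- 0                 # assumption: no self-connections
  adj
}

cluster <- list(c(1:5), c(6:9))
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
adj_sketch <- cluster_adjacency(cluster, gs_names)
```

A geneset that appears in more than one cluster simply ends up connected to the members of all of its clusters.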
Split a long string of space separated genes into a list of individual
genes.
getGenes(genesets, gene_name = NULL)
genesets |
a |
gene_name |
a character, Alternative name for the column containing the
genes in |
A list containing for each geneset in the Geneset column a
list of the included genes.
Build up the title for the graph nodes to display the available information of each geneset.
getGraphTitle(geneset_df = NULL, node_ids, gs_ids, gs_names = NULL, cluster_id = NULL)
geneset_df |
A |
node_ids |
vector, a vector of ids of the nodes in the graph for which the node title should be built. |
gs_ids |
vector, a vector of geneset identifiers, e.g. the |
gs_names |
vector, a vector of geneset descriptions/names, e.g. the
|
cluster_id |
vector, a vector of cluster ids for each of the genesets |
A list of titles for a graph with nodes given by node_ids.
genes <- list(
  c("PDHB", "VARS2"), c("IARS2", "PDHA1"),
  c("AAAS", "ABCE1"), c("ABI1", "AAR2"), c("AATF", "AMFR"),
  c("BMS1", "DAP3"), c("AURKAIP1", "CHCHD1"), c("IARS2"),
  c("AHI1", "ALMS1")
)
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
geneset_df <- data.frame(
  Genesets = gs_names,
  value = rep(1, 9)
)
geneset_df$Genes <- genes
graph <- getGraphTitle(
  geneset_df = geneset_df,
  node_ids = c(1:9),
  gs_ids = c(1:9),
  gs_names = gs_names
)
Get the NCBI ID of a species
getId(species, version = "12.0", cache = FALSE)
species |
character, the species of your input data |
version |
character, the version of STRING you want to use, defaults to the current version of STRING |
cache |
Logical value, defining whether to use the
BiocFileCache for retrieval of the files underlying
the |
A character of the NCBI ID of species
species <- "Homo sapiens"
id <- getId(species = species)

species <- "Mus musculus"
id <- getId(species = species)
The function calculates an interaction score between two sets of genes based on a protein-protein interaction network.
getInteractionScore(a, b, ppi, maxInteract)
a, b |
character vector, set of gene identifiers. |
ppi |
a |
maxInteract |
numeric, Maximum interaction value in the PPI. |
Interaction score between the two gene sets.
See https://doi.org/10.1186/s12864-019-5738-6 for details on the original implementation.
## Mock example showing how the data should look like
a <- c("PDHB", "VARS2", "IARS2")
b <- c("IARS2", "PDHA1")
ppi <- data.frame(
  Gene1 = c("PDHB", "VARS2", "IARS2"),
  Gene2 = c("IARS2", "PDHA1", "CD3"),
  combined_score = c(0.5, 0.2, 0.1)
)
maxInteract <- max(ppi$combined_score)
interaction <- getInteractionScore(a, b, ppi, maxInteract)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
data(ppi_macrophage_topGO_example_small, package = "GeDi", envir = environment())
maxInteract <- max(ppi_macrophage_topGO_example_small$combined_score)
## Use the packaged PPI here so that it matches the maxInteract computed above
interaction <- getInteractionScore(
  genes[1], genes[2],
  ppi_macrophage_topGO_example_small, maxInteract
)
Calculate the Jaccard distance of all combinations of genesets in a given data set of genesets.
getJaccardMatrix(genesets)
genesets |
a |
A Matrix::Matrix() with Jaccard distance rounded to 2 decimal
places.
## Mock example showing what the data should look like
genesets <- list(list("PDHB", "VARS2"), list("IARS2", "PDHA1"))
m <- getJaccardMatrix(genesets)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
jaccard <- getJaccardMatrix(genes)
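The pairwise computation described for getJaccardMatrix() can be sketched in base R. This is a minimal illustration that assumes the distance is defined as 1 - |A ∩ B| / |A ∪ B|; `jaccard_distance` and `pairwise_jaccard` are hypothetical helpers, not GeDi functions, and GeDi's own handling of empty sets may differ.

```r
# Hypothetical helpers, not part of GeDi.
# Jaccard distance between two gene sets: 1 - |A n B| / |A u B|.
jaccard_distance <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  union_size <- length(union(a, b))
  # Assumption: two empty sets are treated as maximally distant.
  if (union_size == 0) {
    return(1)
  }
  1 - length(intersect(a, b)) / union_size
}

# Distance for every combination of gene sets, rounded to 2 decimal places.
pairwise_jaccard <- function(genesets) {
  n <- length(genesets)
  m <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        m[i, j] <- jaccard_distance(genesets[[i]], genesets[[j]])
      }
    }
  }
  round(m, 2)
}

genesets <- list(
  c("PDHB", "VARS2", "IARS2"),
  c("IARS2", "PDHA1"),
  c("AAAS", "ABCE1")
)
m <- pairwise_jaccard(genesets)
```

The first two sets share one of four distinct genes, so their distance is 0.75; sets without any shared gene have distance 1.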
Calculate the Kappa distance of all combinations of genesets in a given data set of genesets. The Kappa distance is normalized to the (0, 1) interval.
getKappaMatrix(genesets)
genesets |
a |
A Matrix::Matrix() with Kappa distance rounded to 2 decimal
places.
## Mock example showing what the data should look like
genesets <- list(list("PDHB", "VARS2"), list("IARS2", "PDHA1"))
m <- getKappaMatrix(genesets)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
kappa <- getKappaMatrix(genes)
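To illustrate the idea behind a normalized Kappa distance, the sketch below computes Cohen's kappa for two gene sets over a shared gene universe and rescales it linearly from [-1, 1] to a distance in [0, 1]. Both the rescaling and the helper name `kappa_distance` are assumptions made for illustration; the normalization used inside getKappaMatrix() may differ.

```r
# Hypothetical helper, not part of GeDi.
# Cohen's kappa compares the observed agreement of two membership vectors
# with the agreement expected by chance.
kappa_distance <- function(a, b, universe) {
  in_a <- universe %in% a
  in_b <- universe %in% b
  n <- length(universe)
  # Fraction of genes on which both sets agree (both in, or both out).
  observed <- sum(in_a == in_b) / n
  # Agreement expected if membership in a and b were independent.
  expected <- (sum(in_a) * sum(in_b) + sum(!in_a) * sum(!in_b)) / n^2
  # Assumption: kappa is set to 0 where it is undefined (expected == 1).
  kappa <- if (expected == 1) 0 else (observed - expected) / (1 - expected)
  # Assumed linear rescaling: kappa = 1 gives distance 0, kappa = -1 gives 1.
  1 - (kappa + 1) / 2
}

universe <- c("PDHB", "VARS2", "IARS2", "PDHA1", "CD3", "AAAS")
a <- c("PDHB", "VARS2", "IARS2")
b <- c("IARS2", "PDHA1")
d <- kappa_distance(a, b, universe)
```

Here the observed and the expected agreement are both 0.5, so kappa is 0 and the rescaled distance is 0.5.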
Calculate the Meet-Min distance of all combinations of genesets in a given data set of genesets.
getMeetMinMatrix(genesets)
genesets |
a |
A Matrix::Matrix() with Meet-Min distance rounded to 2 decimal
places.
## Mock example showing what the data should look like
genesets <- list(list("PDHB", "VARS2"), list("IARS2", "PDHA1"))
m <- getMeetMinMatrix(genesets)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
mm <- getMeetMinMatrix(genes)
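For a single pair of gene sets, the Meet-Min distance can be sketched as follows, assuming the usual definition 1 - |A ∩ B| / min(|A|, |B|). `meet_min_distance` is a hypothetical helper and not part of GeDi.

```r
# Hypothetical helper, not part of GeDi.
meet_min_distance <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  smaller <- min(length(a), length(b))
  # Assumption: an empty set is treated as maximally distant from any set.
  if (smaller == 0) {
    return(1)
  }
  1 - length(intersect(a, b)) / smaller
}

# One shared gene, smaller set has two genes: 1 - 1/2
partial <- meet_min_distance(c("PDHB", "VARS2", "IARS2"), c("IARS2", "PDHA1"))

# The smaller set is fully contained in the larger one: distance 0
nested <- meet_min_distance(c("PDHB", "VARS2", "IARS2"), c("PDHB", "VARS2"))
```

Because the overlap is divided by the size of the smaller set, a gene set that is a subset of another gets distance 0, which is the main difference from the Jaccard distance.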
Calculate the pMM distance of all combinations of genesets in a given data set of genesets.
getpMMMatrix(
  genesets,
  ppi,
  alpha = 1,
  progress = NULL,
  BPPARAM = BiocParallel::SerialParam()
)
genesets |
a |
ppi |
a |
alpha |
numeric, Scaling factor for controlling the influence of the interaction score. Defaults to 1. |
progress |
a |
BPPARAM |
A BiocParallel |
A Matrix::Matrix() with pMM distance rounded to 2 decimal places.
See https://doi.org/10.1186/s12864-019-5738-6 for details on the original implementation.
## Mock example showing what the data should look like
genesets <- list(c("PDHB", "VARS2"), c("IARS2", "PDHA1"))
ppi <- data.frame(
  Gene1 = c("PDHB", "VARS2"),
  Gene2 = c("IARS2", "PDHA1"),
  combined_score = c(0.5, 0.2)
)
pMM <- getpMMMatrix(genesets, ppi)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
data(ppi_macrophage_topGO_example_small, package = "GeDi", envir = environment())
## Use the packaged PPI that was just loaded
pMM <- getpMMMatrix(genes, ppi_macrophage_topGO_example_small)
Download the Protein-Protein Interaction (PPI) information of a STRINGdb
object
getPPI(genes, stringdb, anno_df)
genes |
a |
stringdb |
A |
anno_df |
An annotation |
A data.frame of Protein-Protein interactions
## Mock example showing what the data should look like
genes <- c(c("CFTR", "RALA"), c("CACNG3", "ITGA3"), c("DVL2"))
stringdb <- getStringDB(9606, cache_location = FALSE)
# stringdb
anno_df <- getAnnotation(stringdb)
ppi <- getPPI(genes, stringdb, anno_df)

## Example using the data available in the package
## Not run: 
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
stringdb <- getStringDB(9606)
stringdb
anno_df <- getAnnotation(stringdb)
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
ppi <- getPPI(genes, stringdb, anno_df)
## End(Not run)
Calculate the Sorensen-Dice distance of all combinations of genesets in a given data set of genesets.
getSorensenDiceMatrix(genesets)
genesets |
a |
A Matrix::Matrix() with Sorensen-Dice distance rounded to 2 decimal
places.
## Mock example showing what the data should look like
genesets <- list(list("PDHB", "VARS2"), list("IARS2", "PDHA1"))
m <- getSorensenDiceMatrix(genesets)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
sd_matrix <- getSorensenDiceMatrix(genes)
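The Sorensen-Dice distance for one pair of gene sets can be sketched in the same way, assuming the definition 1 - 2|A ∩ B| / (|A| + |B|). `sorensen_dice_distance` is a hypothetical helper and not part of GeDi.

```r
# Hypothetical helper, not part of GeDi.
sorensen_dice_distance <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  total <- length(a) + length(b)
  # Assumption: two empty sets are treated as maximally distant.
  if (total == 0) {
    return(1)
  }
  1 - 2 * length(intersect(a, b)) / total
}

a <- c("PDHB", "VARS2", "IARS2")
b <- c("IARS2", "PDHA1")
# One shared gene, five genes in total across both sets: 1 - 2/5
d <- sorensen_dice_distance(a, b)
```

Compared with the Jaccard distance on the same pair (0.75), the Sorensen-Dice distance (0.6) weights the shared genes more heavily.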
Get the respective STRINGdb object of your species of interest
getStringDB(
  species,
  version = "12.0",
  score_threshold = 0,
  cache_location = FALSE
)
species |
numeric, the NCBI ID of the species of interest |
version |
character, the STRINGdb version to use. Defaults to "12.0". |
score_threshold |
numeric, A score threshold to cut the retrieved interactions, defaults to 0 (all interactions) |
cache_location |
Logical value, defining whether to use the
BiocFileCache for retrieval of the files underlying
the |
A STRINGdb object for the given species.
species <- getId(species = "Homo sapiens")
stringdb <- getStringDB(as.numeric(species))
Calculate the pairwise similarity of GO terms
goDistance(
  geneset_ids,
  method = "Wang",
  ontology = "BP",
  species = "org.Hs.eg.db"
)
geneset_ids |
|
method |
character, the method to calculate the GO distance. Possible options are "Resnik", "Lin", "Rel", "Jiang", "Wang". |
ontology |
character, the ontology to use. Possible options are "BP", "MF" and "CC". |
species |
character, the species of your data. Indicated as org.XX.eg.db package from Bioconductor. |
A Matrix::Matrix() with the pairwise GO distance of each
geneset pair.
## Mock example showing what the data should look like
go_ids <- c(
  "GO:0002503", "GO:0045087", "GO:0019886",
  "GO:0002250", "GO:0001916", "GO:0019885"
)
similarity <- goDistance(go_ids)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi")
go_ids <- macrophage_topGO_example_small$Genesets
## Not run: 
similarity <- goDistance(go_ids)
## End(Not run)
Create a histogram plot to plot geneset names / identifiers against their size.
gsHistogram(
  genesets,
  gs_names,
  gs_description = NULL,
  start = 0,
  end = 0,
  binwidth = 5,
  color = "#0092AC"
)
genesets |
a |
gs_names |
character vector, Name / identifier of the genesets in
|
gs_description |
Optional, a character vector containing a short description for each geneset |
start |
numeric, Optional, describes the minimum gene set size to include. Defaults to 0. |
end |
numeric, Optional, describes the maximum gene set size to include. Defaults to 0. |
binwidth |
numeric, Width of histogram bins. Defaults to 5. |
color |
character, Fill color for histogram bars. Defaults to #0092AC. |
A ggplot2::ggplot() plot object.
## Mock example showing what the data should look like
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h")
genesets <- list(
  c("PDHB", "VARS2", "IARS2", "PDHA1"),
  c("AAAS", "ABCE1"),
  c("ABI1", "AAR2", "AATF"),
  c("AMFR"),
  c("BMS1", "DAP3"),
  c("AURKAIP1", "CHCHD1"),
  c("IARS2"),
  c("AHI1", "ALMS1")
)
p <- gsHistogram(genesets, gs_names, binwidth = 1)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
p <- gsHistogram(genes, macrophage_topGO_example_small$Genesets)
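The quantity plotted by gsHistogram() is the size of each geneset. The base R sketch below reproduces only those sizes and the bin counts for the mock data with a bin width of 1; it does not create the ggplot2 object that gsHistogram() returns, and the binning conventions (right-closed bins starting at 0) are assumptions of this sketch.

```r
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h")
genesets <- list(
  c("PDHB", "VARS2", "IARS2", "PDHA1"),
  c("AAAS", "ABCE1"),
  c("ABI1", "AAR2", "AATF"),
  c("AMFR"),
  c("BMS1", "DAP3"),
  c("AURKAIP1", "CHCHD1"),
  c("IARS2"),
  c("AHI1", "ALMS1")
)

# Number of genes per geneset, labelled with the geneset identifiers.
sizes <- setNames(lengths(genesets), gs_names)

# Bin the sizes with a bin width of 1 without drawing the plot.
h <- hist(sizes, breaks = seq(0, max(sizes), by = 1), plot = FALSE)
counts <- h$counts
```

With this mock data, two genesets contain one gene, four contain two genes, and one each contains three and four genes.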
This function performs kMeans clustering on a set of scores.
kMeansClustering(scores, k, iter = 500, nstart = 50)
scores |
A |
k |
numerical, the number of centers to start with. This number will correlate with the resulting number of clusters. |
iter |
numerical, number of iterations for refinement. Defaults to 500. |
nstart |
numerical, how often the start points should be switched. Ensures a robust clustering, as clustering is influenced by the start points. Defaults to 50. |
A list of clusters
## Mock example showing what the data should look like
scores <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(scores) <- colnames(scores) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
cluster <- kMeansClustering(scores, k = 3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
cluster <- kMeansClustering(scores_macrophage_topGO_example_small, k = 5)
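As a rough illustration of how the `k`, `iter` and `nstart` parameters map onto k-means, the sketch below runs stats::kmeans() directly on the rows of a small dense distance matrix and groups the geneset names by cluster. Treating each row of the score matrix as a feature vector and returning the clusters as a list of names are assumptions of this sketch, not a description of GeDi's internals.

```r
set.seed(42)  # k-means uses random start points

# Toy distance matrix with two obvious groups: {a, b, c} and {d, e, f}.
ids <- c("a", "b", "c", "d", "e", "f")
scores <- matrix(0.9, nrow = 6, ncol = 6, dimnames = list(ids, ids))
scores[1:3, 1:3] <- 0.1
scores[4:6, 4:6] <- 0.1
diag(scores) <- 0

# centers  ~ k, iter.max ~ iter, nstart ~ nstart
km <- stats::kmeans(scores, centers = 2, iter.max = 500, nstart = 50)

# Collect the geneset names belonging to each cluster.
clusters <- unname(split(rownames(scores), km$cluster))
```

A larger `nstart` repeats the procedure from more random start points and keeps the best solution, which is why the result is stable here despite the random initialization.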
This function performs k-Nearest Neighbors (kNN) clustering on a set of scores.
kNN_clustering(scores, k)
scores |
A |
k |
numerical, the number of neighbors |
A list of clusters
## Mock example showing what the data should look like
scores <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(scores) <- colnames(scores) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
cluster <- kNN_clustering(scores, k = 3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
kNN <- kNN_clustering(scores_macrophage_topGO_example_small, k = 5)
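The neighbor lookup behind a kNN grouping can be sketched in base R: for each geneset, take the `k` genesets with the smallest distance. `knn_neighbors` is a hypothetical helper; excluding the geneset itself and breaking ties by position are assumptions, and kNN_clustering() may organize its output differently.

```r
# Hypothetical helper, not part of GeDi.
# Returns, for each geneset, the names of its k nearest neighbors.
knn_neighbors <- function(scores, k) {
  ids <- rownames(scores)
  out <- lapply(seq_len(nrow(scores)), function(i) {
    d <- scores[i, ]
    d[i] <- Inf  # assumption: a geneset is not its own neighbor
    ids[order(d)[seq_len(k)]]
  })
  names(out) <- ids
  out
}

ids <- c("a", "b", "c", "d")
scores <- matrix(c(
  0.0, 0.1, 0.5, 0.9,
  0.1, 0.0, 0.4, 0.8,
  0.5, 0.4, 0.0, 0.2,
  0.9, 0.8, 0.2, 0.0
), nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

neighbors <- knn_neighbors(scores, k = 2)
```

A sparse Matrix::Matrix() object would need to be converted with as.matrix() before being passed to this sketch.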
This function is a wrapper for Louvain clustering.
The actual computation of the clustering is done in the GeDi::clustering()
function. The wrapper exists for stand-alone use of GeDi functionality
and to keep the individual clustering algorithms clearly
distinguished.
louvainClustering(scores, threshold)
scores |
A |
threshold |
numerical, A threshold used to determine which genesets are considered similar. Genesets are considered similar if (distance) score <= threshold. |
A list of clusters
## Mock example showing what the data should look like
m <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(m) <- colnames(m) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
louvainCluster <- louvainClustering(m, 0.3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
louvainCluster <- louvainClustering(
  scores_macrophage_topGO_example_small,
  threshold = 0.5
)
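The role of `threshold` can be illustrated without a graph library: genesets whose distance is at or below the threshold are connected by an edge, and a community detection algorithm such as Louvain is then run on the resulting graph. The sketch below covers only the thresholding step in base R; the variable names are illustrative and the way GeDi builds its graph internally is not shown here.

```r
ids <- c("a", "b", "c", "d")
scores <- matrix(c(
  0.0, 0.1, 0.5, 0.9,
  0.1, 0.0, 0.4, 0.8,
  0.5, 0.4, 0.0, 0.2,
  0.9, 0.8, 0.2, 0.0
), nrow = 4, byrow = TRUE, dimnames = list(ids, ids))
threshold <- 0.3

# Two genesets are considered similar if their distance is <= threshold.
adjacency <- scores <= threshold
diag(adjacency) <- FALSE  # no self-loops

# List each undirected edge once by looking at the upper triangle only.
edge_idx <- which(adjacency & upper.tri(adjacency), arr.ind = TRUE)
edges <- data.frame(
  from = ids[edge_idx[, "row"]],
  to = ids[edge_idx[, "col"]],
  stringsAsFactors = FALSE
)
```

With a threshold of 0.3 only the pairs a-b and c-d are connected, so any community detection on this graph would be expected to separate those two pairs.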
A sample input RData file generated from the macrophage dataset.
A data.frame object
This sample input contains data from the macrophage package found on Bioconductor. The exact steps used to generate this file can be found in the package vignette. The database used for the enrichment was the KEGG database.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
A sample input RData file generated from the macrophage dataset.
A data.frame object
This sample input contains data from the macrophage package found on Bioconductor. The exact steps used to generate this file can be found in the package vignette. The database used for the enrichment was the Reactome database.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
A sample input RData file generated from the macrophage dataset.
A data.frame object
This sample input contains data from the macrophage package found on Bioconductor. The exact steps used to generate this file can be found in the package vignette.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
A small sample input RData file generated from the macrophage dataset.
A data.frame object
This sample input contains data from the macrophage package found on
Bioconductor. It is a small version of the
macrophage_topGO_example and only contains the first 50 rows of
this example. It can be used for fast testing of the application.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
This function is a wrapper for Markov clustering.
The actual computation of the clustering is done in the GeDi::clustering()
function. The wrapper exists for stand-alone use of GeDi functionality
and to keep the individual clustering algorithms clearly
distinguished.
markovClustering(scores, threshold)
scores |
A |
threshold |
numerical, A threshold used to determine which genesets are considered similar. Genesets are considered similar if (distance) score <= threshold. |
A list of clusters
## Mock example showing what the data should look like
m <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(m) <- colnames(m) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
markovCluster <- markovClustering(m, 0.3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
markovCluster <- markovClustering(
  scores_macrophage_topGO_example_small,
  threshold = 0.5
)
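For readers unfamiliar with Markov clustering, the following is a minimal base R sketch of the textbook MCL iteration (add self-loops, column-normalize, then alternate expansion and inflation) on a small adjacency matrix. `mcl_sketch`, its default parameters and the attractor-based cluster assignment are assumptions for illustration only; this is not the implementation used by markovClustering().

```r
# Hypothetical helper, not part of GeDi.
mcl_sketch <- function(adjacency, inflation = 2, max_iter = 100, tol = 1e-8) {
  n <- nrow(adjacency)
  m <- adjacency + diag(n)            # add self-loops
  m <- sweep(m, 2, colSums(m), "/")   # make every column sum to 1
  for (iter in seq_len(max_iter)) {
    expanded <- m %*% m               # expansion: two-step random walks
    inflated <- expanded^inflation    # inflation: strengthen strong flows
    inflated <- sweep(inflated, 2, colSums(inflated), "/")
    converged <- max(abs(inflated - m)) < tol
    m <- inflated
    if (converged) break
  }
  # Assign each node to the row (attractor) that holds most of its flow.
  membership <- apply(m, 2, which.max)
  names(membership) <- colnames(adjacency)
  membership
}

# Toy graph: a path a - b - c and a separate pair d - e.
ids <- c("a", "b", "c", "d", "e")
adjacency <- matrix(0, nrow = 5, ncol = 5, dimnames = list(ids, ids))
adjacency["a", "b"] <- adjacency["b", "a"] <- 1
adjacency["b", "c"] <- adjacency["c", "b"] <- 1
adjacency["d", "e"] <- adjacency["e", "d"] <- 1

membership <- mcl_sketch(adjacency)
clusters <- unname(split(names(membership), membership))
```

The adjacency matrix would typically come from thresholding the distance scores, with an edge wherever the distance is at or below `threshold`.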
This function performs Partitioning Around Medoids (PAM) clustering on a set of scores.
pamClustering(scores, k)
scores |
A |
k |
numerical, the number of centers to start with. This number will correlate with the resulting number of clusters. |
A list of clusters
## Mock example showing what the data should look like
scores <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(scores) <- colnames(scores) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
cluster <- pamClustering(scores, k = 3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
cluster <- pamClustering(scores_macrophage_topGO_example_small, k = 5)
Prepare enrichment analysis output data for GeDi
path_to_GeDi(genesets_df, enrichment_package)
genesets_df |
Dataframe, a dataframe of the enrichment analysis output which should be prepared for the use in GeDi. |
enrichment_package |
character, the name of the R package used to conduct the enrichment analysis. Supported packages are currently topGO, clusterProfiler, ReactomePA, enrichR and fgsea. |
A data.frame of the input data which can be used directly with the GeDi package.
## data() loads the object into the environment; its return value is only the name
data("macrophage_Reactome_example", package = "GeDi")
genesets_df <- path_to_GeDi(
  genesets_df = macrophage_Reactome_example,
  enrichment_package = "clusterProfiler"
)
Calculate the local pMM distance of two genesets.
pMMlocal(a, b, ppi, maxInteract, alpha)
a, b
|
character vector, set of gene identifiers. |
ppi |
a |
maxInteract |
numeric, Maximum interaction value in the PPI. |
alpha |
numeric, Scaling factor for controlling the influence of the interaction score. Defaults to 1. |
The pMMlocal score between the two gene sets.
See https://doi.org/10.1186/s12864-019-5738-6 for details on the original implementation.
## Mock example showing what the data should look like
a <- c("PDHB", "VARS2")
b <- c("IARS2", "PDHA1")
ppi <- data.frame(
  Gene1 = c("PDHB", "VARS2"),
  Gene2 = c("IARS2", "PDHA1"),
  combined_score = c(0.5, 0.2)
)
maxInteract <- max(ppi$combined_score)
pMM_score <- pMMlocal(a, b, ppi, alpha = 1, maxInteract)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
data(ppi_macrophage_topGO_example_small, package = "GeDi", envir = environment())
maxInteract <- max(ppi_macrophage_topGO_example_small$combined_score)
## Use the packaged PPI that maxInteract was computed from,
## and avoid masking the pMMlocal() function with the result
pMM_score <- pMMlocal(
  genes[1], genes[2],
  ppi_macrophage_topGO_example_small,
  alpha = 1, maxInteract
)
A file containing a Protein-Protein Interaction (PPI) data.frame for the
macrophage_topGO_example_small.
A data.frame object
This sample input contains a PPI for the
macrophage_topGO_example_small. The PPI has been downloaded using
the functions to download a PPI matrix. Please check out the
vignette for further information.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
Split a long string of space separated genes into a list of individual
genes.
prepareGenesetData(genesets, gene_name = NULL)
genesets |
a |
gene_name |
a character, Alternative name for the column containing the
genes in |
A list containing for each geneset in the Geneset column a
list of the included genes.
## Mock example showing what the data should look like
df <- data.frame(
  Geneset = c("Cell Cycle", "Biological Process", "Mitosis"),
  Genes = c(
    c("PDHB,VARS2,IARS2"),
    c("LARS,LARS2"),
    c("IARS,SUV3")
  )
)
genes <- prepareGenesetData(df)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- prepareGenesetData(macrophage_topGO_example_small)
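The splitting step can be sketched with base R's strsplit(). The separator pattern below (commas, semicolons or whitespace) is an assumption chosen so that both the comma-separated mock data and space-separated input are handled; the separators actually accepted by prepareGenesetData() may differ.

```r
df <- data.frame(
  Geneset = c("Cell Cycle", "Biological Process", "Mitosis"),
  Genes = c("PDHB,VARS2,IARS2", "LARS,LARS2", "IARS,SUV3"),
  stringsAsFactors = FALSE
)

# Split each gene string into individual gene identifiers.
# Assumed separators: commas, semicolons and any whitespace.
genes <- strsplit(df$Genes, "[,;[:space:]]+")
names(genes) <- df$Geneset

n_genes <- lengths(genes)
```

The result is a list with one character vector of genes per geneset, which is the shape expected by the distance functions in the examples.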
A sample input text file taken from the GScluster package
Text file
This sample input text file contains data from the GScluster package. It is identical to the sample_geneset.txt file found on the GitHub page of the package.
Yoon, S., Kim, J., Kim, SK. et al. GScluster: network-weighted gene-set clustering analysis. BMC Genomics 20, 352 (2019). https://doi.org/10.1186/s12864-019-5738-6
A broken input text file to test the application
Text file
This sample input text file is broken and used for testing the application.
An empty input text file to test the application
Text file
This sample input text file is empty and used for testing the application.
A sample input text file taken from the GScluster package, which is reduced to a smaller number of entries for faster testing of the application.
Text file
This sample input text file contains data from the GScluster package. It was taken from the sample_geneset.txt file found on the GitHub page of the package and then reduced to a smaller number of entries for faster testing of the application.
Yoon, S., Kim, J., Kim, SK. et al. GScluster: network-weighted gene-set clustering analysis. BMC Genomics 20, 352 (2019). https://doi.org/10.1186/s12864-019-5738-6
A file containing sample distance scores for the
macrophage_topGO_example_small.
A sparse matrix (dgCMatrix)
This sample input contains scores for the
macrophage_topGO_example_small. Distance scores have been
calculated using the getJaccardMatrix() method.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
Determine initial seeds for the clustering from the distance score matrix.
seedFinding(distances, simThreshold, memThreshold)
distances |
A |
simThreshold |
numerical, A threshold to determine which genesets are
considered close (i.e. have a distance <= simThreshold)
in the |
memThreshold |
numerical, A threshold used to ensure that enough members of a potential seed set are close/similar to each other. Only if this condition is met, the set is considered a seed. |
A list of seeds which can be used for clustering
See https://david.ncifcrf.gov/helps/functional_classification.html#clustering for details on the original implementation
## Mock example showing what the data should look like
m <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
seeds <- seedFinding(distances = m, simThreshold = 0.3, memThreshold = 0.5)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
seeds <- seedFinding(
  scores_macrophage_topGO_example_small,
  simThreshold = 0.3,
  memThreshold = 0.5
)
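The interplay of `simThreshold` and `memThreshold` can be sketched as follows: each geneset together with its close neighbors forms a candidate group, and the group is kept as a seed only if a large enough fraction of its member pairs are close to each other. `find_seeds_sketch` is a hypothetical helper written under that reading of the parameter descriptions; the exact criterion used by seedFinding() and by the DAVID procedure it follows may differ.

```r
# Hypothetical helper, not part of GeDi.
find_seeds_sketch <- function(distances, simThreshold, memThreshold) {
  close <- distances <= simThreshold
  diag(close) <- FALSE
  seeds <- list()
  for (i in seq_len(nrow(distances))) {
    # Candidate group: geneset i plus every geneset close to it.
    members <- sort(unname(c(i, which(close[i, ]))))
    if (length(members) < 2) next
    sub <- close[members, members]
    # Fraction of member pairs that are close to each other.
    frac_close <- sum(sub[upper.tri(sub)]) / choose(length(members), 2)
    if (frac_close >= memThreshold) {
      seeds[[length(seeds) + 1]] <- members
    }
  }
  unique(seeds)  # the same group can be found from several genesets
}

distances <- matrix(c(
  0.0, 0.1, 0.5, 0.9,
  0.1, 0.0, 0.4, 0.8,
  0.5, 0.4, 0.0, 0.2,
  0.9, 0.8, 0.2, 0.0
), nrow = 4, byrow = TRUE)

seeds <- find_seeds_sketch(distances, simThreshold = 0.3, memThreshold = 0.5)
```

In this toy matrix genesets 1 and 2 are close to each other, as are 3 and 4, so two seeds are found. The seeds are reported as row indices rather than geneset names.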