Title: | Defining and visualizing the distances between different genesets |
---|---|
Description: | The package provides different distance measures to calculate the difference between genesets. Based on these scores, the genesets are clustered and visualized as a graph. This is all presented in an interactive Shiny application for easy usage. |
Authors: | Annekathrin Nedwed [aut, cre] , Federico Marini [aut] |
Maintainer: | Annekathrin Nedwed <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.0 |
Built: | 2024-10-30 07:28:10 UTC |
Source: | https://github.com/bioc/GeDi |
Check if the input genesets have the expected format for this app
.checkGenesets( genesets, col_name_genesets = "Genesets", col_name_genes = "Genes" )
genesets |
a |
col_name_genesets |
character, the name of the column in which the geneset ids are listed. Defaults to "Genesets". |
col_name_genes |
character, the name of the column in which the genes are listed. Defaults to "Genes". |
A validated and formatted genesets data frame.
Check if the provided GeneTonic List object has the expected format for the app and extract the functional enrichment results
.checkGTL(gtl)
gtl |
A |
A validated and renamed geneset data.frame.
Check if the Protein-Protein-interaction (PPI) has the expected format for this app
.checkPPI(ppi)
ppi |
a |
A validated and formatted PPI data frame.
Check if the provided distance scores have the expected format for this app
.checkScores(genesets, distance_scores)
genesets |
a |
distance_scores |
A |
A validated and formatted distance_scores Matrix::Matrix().
Filter a preselected list of genesets from a data.frame of genesets
.filterGenesets(remove, df_genesets)
remove |
a |
df_genesets |
a |
A data.frame containing information about filtered genesets
This function tries to guess which separator was used in a list of delimited strings.
.findSeparator(stringList, sepList = c(",", "\t", ";", " ", "/"))
stringList |
|
sepList |
|
character, corresponding to the guessed separator. One of "," (comma), "\t" (tab), ";" (semicolon), " " (whitespace) or "/" (forward slash).
See https://github.com/federicomarini/ideal for details on the original implementation.
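One way such a guess could work is to count how often each candidate separator occurs and pick the most frequent one. The sketch below illustrates that idea in base R; `guess_separator()` is a hypothetical name, and the counting rule is an assumption rather than the GeDi or ideal implementation.

```r
# Hypothetical helper (not part of GeDi): choose the candidate separator
# that occurs most often across all input strings.
guess_separator <- function(string_list, sep_list = c(",", "\t", ";", " ", "/")) {
  counts <- vapply(sep_list, function(sep) {
    # fixed = TRUE matches the separator literally instead of as a regex
    matches <- regmatches(string_list, gregexpr(sep, string_list, fixed = TRUE))
    sum(lengths(matches))
  }, numeric(1))
  # Ties are resolved in favour of the first candidate in sep_list
  sep_list[which.max(counts)]
}

guess_separator(c("PDHB,VARS2,IARS2", "AAAS,ABCE1"))
```

Counting literal occurrences keeps the sketch simple; a real implementation may weigh candidates differently, for example by how consistently they split each line.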
Map each geneset to the cluster it belongs to and return the information as a data.frame
.getClusterDatatable(cluster, gs_names, gs_description)
cluster |
A |
gs_names |
A vector of geneset names |
gs_description |
A vector of descriptions for each geneset |
A data.frame mapping each geneset to the cluster(s) it belongs to
Extracts gene set descriptions from a provided gene set object. The function prioritizes columns "Term", "Description", or "Genesets" to find the appropriate descriptions. If any descriptions are duplicated, the function appends a suffix to make them unique.
.getGenesetDescriptions(genesets)
genesets |
a |
A list of geneset descriptions
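The column priority and the de-duplication described above can be sketched in base R as follows. `pick_descriptions()` is a hypothetical name, and using `make.unique()` for the suffix is an assumption; the suffix format GeDi actually appends is not stated here.

```r
# Hypothetical helper (not part of GeDi): take descriptions from the first
# available column among "Term", "Description", "Genesets".
pick_descriptions <- function(genesets) {
  candidates <- c("Term", "Description", "Genesets")
  col <- candidates[candidates %in% colnames(genesets)][1]
  if (is.na(col)) {
    stop("None of the columns Term, Description or Genesets was found.")
  }
  # make.unique() appends ".1", ".2", ... to repeated entries (assumed suffix)
  as.list(make.unique(as.character(genesets[[col]])))
}

df <- data.frame(
  Genesets = c("GO:1", "GO:2", "GO:3"),
  Term = c("immune response", "immune response", "antigen processing")
)
pick_descriptions(df)
```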
Determine the number of CPU cores the scoring functions should use when computing the distance scores.
.getNumberCores(n_cores = NULL)
n_cores |
numeric, number of cores to use for the function.
Defaults to |
Number of CPU cores to be used.
Generate a data.frame of the graph metrics degree, betweenness, harmonic centrality and clustering coefficient for each node in a given graph.
.graphMetricsGenesetsDT(g, genesets)
g |
An igraph graph object |
genesets |
A |
A data.frame of genesets extended by columns for the degree, betweenness, harmonic centrality and clustering coefficient for each geneset.
This function tries to guess which separator was used in a text delimited file.
.sepguesser(file, sep_list = c(",", "\t", ";", " ", "/"))
file |
a character, location of a file to read data from. |
sep_list |
a |
A character, corresponding to the guessed separator. One of "," (comma), "\t" (tab), ";" (semicolon), " " (whitespace) or "/" (forward slash).
See https://github.com/federicomarini/ideal for details on the original implementation.
Build an igraph graph from cluster information, connecting nodes which belong to the same cluster.
buildClusterGraph( cluster, geneset_df, gs_ids, color_by = NULL, gs_names = NULL )
cluster |
list, a |
geneset_df |
|
gs_ids |
vector, a vector of geneset identifiers, e.g. the |
color_by |
character, a column name of |
gs_names |
vector, a vector of geneset descriptions/names, e.g. the
|
An igraph object to be further manipulated or processed/plotted (e.g. via igraph::plot.igraph() or visNetwork::visIgraph())
cluster <- list(c(1:5), c(6:9, 1))
genes <- list(
  c("PDHB", "VARS2"), c("IARS2", "PDHA1"),
  c("AAAS", "ABCE1"), c("ABI1", "AAR2"),
  c("AATF", "AMFR"), c("BMS1", "DAP3"),
  c("AURKAIP1", "CHCHD1"), c("IARS2"),
  c("AHI1", "ALMS1")
)
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
gs_ids <- c(1:9)
geneset_df <- data.frame(
  Genesets = gs_names,
  value = rep(1, 9)
)
geneset_df$Genes <- genes
graph <- buildClusterGraph(
  cluster = cluster,
  geneset_df = geneset_df,
  gs_ids = gs_ids,
  color_by = "value",
  gs_names = gs_names
)
Construct a graph from a given adjacency matrix
buildGraph(adjMatrix, geneset_df = NULL, gs_names = NULL, weighted = FALSE)
adjMatrix |
A |
geneset_df |
|
gs_names |
vector, a vector of geneset descriptions/names, e.g. the
|
weighted |
logical value, whether or not the resulting graph should have
weighted edges. If TRUE, the |
An igraph object to be further manipulated or processed/plotted (e.g. via igraph::plot.igraph() or visNetwork::visIgraph())
adj <- Matrix::Matrix(0, 100, 100)
adj[c(80:100), c(80:100)] <- 1
geneset_names <- as.character(stats::runif(100, min = 0, max = 1))
rownames(adj) <- colnames(adj) <- geneset_names
graph <- buildGraph(adj)
Prepare the data for gsHistogram() by generating a data.frame which maps geneset names / identifiers to their sizes.
buildHistogramData( genesets, gs_names, gs_description = NULL, start = 0, end = 0 )
genesets |
a |
gs_names |
character vector, Name / identifier of the genesets in
|
gs_description |
Optional, a character vector containing a short description for each geneset |
start |
numeric, Optional, describes the minimum gene set size to include. Defaults to 0. |
end |
numeric, Optional, describes the maximum gene set size to include. Defaults to 0. |
A data.frame mapping geneset names to sizes
## Mock example showing how the data should look like
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
genesets <- list(
  c("PDHB", "VARS2"), c("IARS2", "PDHA1"),
  c("AAAS", "ABCE1"), c("ABI1", "AAR2"),
  c("AATF", "AMFR"), c("BMS1", "DAP3"),
  c("AURKAIP1", "CHCHD1"), c("IARS2"),
  c("AHI1", "ALMS1")
)
p <- buildHistogramData(genesets, gs_names)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
p <- buildHistogramData(genes, macrophage_topGO_example_small$Genesets)
Calculate the Jaccard distance between two genesets.
calculateJaccard(a, b)
a , b
|
character vector, set of gene identifiers. |
The Jaccard distance of the sets.
## Mock example showing how the data should look like
a <- c("PDHB", "VARS2")
b <- c("IARS2", "PDHA1")
c <- calculateJaccard(a, b)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
## Use [[ ]] so that each argument is a character vector, as documented
jaccard <- calculateJaccard(genes[[1]], genes[[2]])
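For orientation, the Jaccard distance is commonly defined as one minus the share of genes that both sets have in common, relative to all genes in either set. The base-R sketch below follows that textbook definition; `jaccard_distance()` is a hypothetical name, and the handling of two empty sets is an assumption, so the result may differ from calculateJaccard() in edge cases.

```r
# Hypothetical helper (not part of GeDi): textbook Jaccard distance.
jaccard_distance <- function(a, b) {
  union_size <- length(union(a, b))
  # Assumption: two empty sets are treated as maximally distant
  if (union_size == 0) {
    return(1)
  }
  1 - length(intersect(a, b)) / union_size
}

jaccard_distance(c("PDHB", "VARS2"), c("IARS2", "PDHA1")) # disjoint sets
jaccard_distance(c("PDHB", "VARS2"), c("VARS2", "IARS2")) # one shared gene
```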
Calculate the Kappa distance between two genesets.
calculateKappa(a, b, all_genes)
a , b
|
character vector, set of gene identifiers. |
all_genes |
character vector, list of all (unique) genes available in the input data. |
The Kappa distance of the sets.
## Mock example showing how the data should look like
a <- c("PDHB", "VARS2")
b <- c("IARS2", "PDHA1")
all_genes <- c("PDHB", "VARS2", "IARS2", "PDHA1")
c <- calculateKappa(a, b, all_genes)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
## all_genes is a character vector of all unique genes in the input data
c <- calculateKappa(genes[[1]], genes[[2]], unique(unlist(genes)))
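A Kappa-based distance can be sketched from Cohen's kappa, computed on the membership of every gene in `all_genes` in each of the two sets. The code below is an illustration only: `kappa_distance()` is a hypothetical name, and the rescaling `(1 - kappa) / 2` is one possible way to map kappa from [-1, 1] to a distance in [0, 1]; it is an assumption, not necessarily the normalization GeDi applies.

```r
# Hypothetical helper (not part of GeDi): distance derived from Cohen's kappa.
kappa_distance <- function(a, b, all_genes) {
  stopifnot(length(all_genes) > 0)
  in_a <- all_genes %in% a
  in_b <- all_genes %in% b
  # Observed agreement: genes that are in both sets or in neither
  observed <- mean(in_a == in_b)
  # Agreement expected by chance from the marginal frequencies
  expected <- mean(in_a) * mean(in_b) + mean(!in_a) * mean(!in_b)
  # Assumption: if chance agreement is already perfect, report distance 0
  if (expected == 1) {
    return(0)
  }
  kappa <- (observed - expected) / (1 - expected)
  # Assumed rescaling from [-1, 1] to [0, 1]
  (1 - kappa) / 2
}

all_genes <- c("PDHB", "VARS2", "IARS2", "PDHA1")
kappa_distance(c("PDHB", "VARS2"), c("IARS2", "PDHA1"), all_genes)
```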
Calculate the Sorensen-Dice distance between two genesets.
calculateSorensenDice(a, b)
a , b
|
character vector, set of gene identifiers. |
The Sorensen-Dice distance of the sets.
## Mock example showing how the data should look like
a <- c("PDHB", "VARS2")
b <- c("IARS2", "PDHA1")
c <- calculateSorensenDice(a, b)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
## Use [[ ]] so that each argument is a character vector, as documented
sd <- calculateSorensenDice(genes[[1]], genes[[2]])
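The Sorensen-Dice distance is usually defined as one minus twice the number of shared genes divided by the summed set sizes. The sketch below implements that textbook formula in base R; `sorensen_dice_distance()` is a hypothetical name, and the treatment of two empty sets is an assumption, so edge cases may differ from calculateSorensenDice().

```r
# Hypothetical helper (not part of GeDi): textbook Sorensen-Dice distance.
sorensen_dice_distance <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  total <- length(a) + length(b)
  # Assumption: two empty sets are treated as maximally distant
  if (total == 0) {
    return(1)
  }
  1 - 2 * length(intersect(a, b)) / total
}

sorensen_dice_distance(c("PDHB", "VARS2"), c("VARS2", "IARS2"))
```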
Remove subsets from a given list of sets, i.e. remove sets which are completely contained in any other larger set in the list.
checkInclusion(seeds)
seeds |
A |
A list of unique sets
## Mock example showing how the data should look like
seeds <- list(c(1:5), c(2:5), c(6:10))
s <- checkInclusion(seeds)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
seeds <- seedFinding(scores_macrophage_topGO_example_small,
  simThreshold = 0.3,
  memThreshold = 0.5
)
seeds <- checkInclusion(seeds)
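The subset removal described for checkInclusion() can be illustrated with a short base-R sketch. `remove_subsets()` is a hypothetical name, and the quadratic pairwise comparison is chosen for clarity; it is not taken from the GeDi source.

```r
# Hypothetical helper (not part of GeDi): drop every set that is fully
# contained in another set of the list.
remove_subsets <- function(sets) {
  # Normalize and de-duplicate first, so any remaining containment is proper
  sets <- unique(lapply(sets, function(s) sort(unique(s))))
  keep <- vapply(seq_along(sets), function(i) {
    contained <- vapply(seq_along(sets), function(j) {
      i != j && all(sets[[i]] %in% sets[[j]])
    }, logical(1))
    !any(contained)
  }, logical(1))
  sets[keep]
}

remove_subsets(list(c(1:5), c(2:5), c(6:10)))
```

In this call the set 2:5 is dropped because all of its members also occur in 1:5.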
This function performs clustering on a set of scores using either the Louvain or Markov method.
clustering(scores, threshold, cluster_method = "louvain")
scores |
A |
threshold |
numerical, A threshold used to determine which genesets are considered similar. Genesets are considered similar if (distance) score <= threshold. |
cluster_method |
character, the clustering method to use. The options
are |
A list of clusters
## Mock example showing how the data should look like
m <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(m) <- colnames(m) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
cluster <- clustering(m, 0.3, "markov")

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
## Store the result under a name that does not shadow clustering()
cluster <- clustering(scores_macrophage_topGO_example_small, threshold = 0.5)
Functions that are on their way to the function afterlife. Their successors are also listed.
... |
Ignored arguments. |
The successors of these functions are likely coming from a renaming of the functions to more intuitive function names.
All functions throw a warning, with a deprecation message pointing towards their successor (if available).
getGenes(), now replaced by the more intuitive name prepareGenesetData(). The only change in its functionality concerns the function name.
Annekathrin Nedwed
# try(getGenes())
Plot a dendrogram of a matrix of (distance) scores.
distanceDendro(distance_scores, cluster_method = "average")
distance_scores |
A |
cluster_method |
character, indicating the clustering method
for the |
A ggdendro::ggdendrogram() plot object.
## Mock example showing how the data should look like
distance_scores <- Matrix::Matrix(0.5, 20, 20)
distance_scores[c(11:15), c(2:6)] <- 0.2
dendro <- distanceDendro(distance_scores, cluster_method = "single")

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
dendro <- distanceDendro(scores_macrophage_topGO_example_small,
  cluster_method = "average"
)
Plot a heatmap of a matrix of (distance) scores of the input genesets
distanceHeatmap( distance_scores, chars_limit = 50, plot_labels = TRUE, cluster_rows = TRUE, cluster_columns = TRUE, title = "Distance Scores" )
distance_scores |
A |
chars_limit |
Numeric value, Indicates how many characters of the
row and column names of |
plot_labels |
Logical, Indicates if row and column labels should be plotted. Defaults to TRUE |
cluster_rows |
Logical, Indicates whether or not the rows should be clustered based on the distance scores. Defaults to TRUE |
cluster_columns |
Logical, Indicates whether or not the columns should be clustered based on the distance scores. Defaults to TRUE |
title |
character, a title for the figure. Defaults to "Distance Scores" |
A ComplexHeatmap::Heatmap() plot object.
## Mock example showing how the data should look like
distance_scores <- Matrix::Matrix(0.5, 20, 20)
distance_scores[c(11:15), c(2:6)] <- 0.2
rownames(distance_scores) <- colnames(distance_scores) <- as.character(c(1:20))
p <- distanceHeatmap(distance_scores)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
p <- distanceHeatmap(scores_macrophage_topGO_example_small)
Visualize the results of an enrichment analysis as a word cloud. The word cloud highlights the most frequent terms associated with the description of the genesets in the enrichment analysis.
enrichmentWordcloud(genesets_df)
genesets_df |
A |
A wordcloud2::wordcloud2() plot object
## Mock example showing how the data should look like

## If no "Term" or "Description" column is available,
## the rownames of the data frame will be used.
geneset_df <- data.frame(
  Genesets = c("GO:0002503", "GO:0045087", "GO:0019886"),
  Genes = c("B2M, HLA-DMA, HLA-DMB", "ACOD1, ADAM8, AIM2", "B2M, CD74, CTSS")
)
rownames(geneset_df) <- geneset_df$Genesets
wordcloud <- enrichmentWordcloud(geneset_df)

## With available "Term" column.
geneset_df <- data.frame(
  Genesets = c("GO:0002503", "GO:0045087", "GO:0019886"),
  Genes = c("B2M, HLA-DMA, HLA-DMB", "ACOD1, ADAM8, AIM2", "B2M, CD74, CTSS"),
  Term = c(
    "peptide antigen assembly with MHC class II protein complex",
    "innate immune response",
    "antigen processing and presentation of exogenous peptide antigen via MHC class II"
  )
)
wordcloud <- enrichmentWordcloud(geneset_df)

## Example using the data available in the package
data(macrophage_topGO_example, package = "GeDi", envir = environment())
wordcloud <- enrichmentWordcloud(macrophage_topGO_example)
Merge the initially determined seeds to clusters.
fuzzyClustering(seeds, threshold)
seeds |
A |
threshold |
numerical, A threshold for merging seeds |
A list of clusters
See https://david.ncifcrf.gov/helps/functional_classification.html#clustering for details on the original implementation
## Mock example showing how the data should look like
seeds <- list(c(1:5), c(6:10))
cluster <- fuzzyClustering(seeds, 0.5)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
seeds <- seedFinding(scores_macrophage_topGO_example_small,
  simThreshold = 0.3,
  memThreshold = 0.5
)
cluster <- fuzzyClustering(seeds, threshold = 0.5)
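To give an idea of what merging seeds into clusters can look like, the sketch below repeatedly merges any two seeds whose overlap reaches the threshold. `merge_seeds()` is a hypothetical name, and the overlap criterion (shared members divided by the size of the union) is an assumption made for illustration; the criterion used by fuzzyClustering() and by DAVID may differ.

```r
# Hypothetical helper (not part of GeDi): iteratively merge overlapping seeds.
merge_seeds <- function(seeds, threshold) {
  merged <- TRUE
  while (merged) {
    merged <- FALSE
    n <- length(seeds)
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (j <= i) next
        shared <- length(intersect(seeds[[i]], seeds[[j]]))
        total <- length(union(seeds[[i]], seeds[[j]]))
        # Assumed criterion: overlap relative to the union of both seeds
        if (total > 0 && shared / total >= threshold) {
          seeds[[i]] <- sort(union(seeds[[i]], seeds[[j]]))
          seeds[[j]] <- NULL
          merged <- TRUE
          break
        }
      }
      # Restart the scan after every merge, because the list has changed
      if (merged) break
    }
  }
  seeds
}

merge_seeds(list(c(1:5), c(2:6), c(10:12)), threshold = 0.5)
```

Here the first two seeds share four of six distinct members and are merged, while the third seed stays separate.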
GeDi main function
GeDi( genesets = NULL, ppi_df = NULL, distance_scores = NULL, gtl = NULL, col_name_genesets = "Genesets", col_name_genes = "Genes" )
genesets |
a |
ppi_df |
a |
distance_scores |
A |
gtl |
A |
col_name_genesets |
character, the name of the column in which the geneset ids are listed. Defaults to "Genesets". |
col_name_genes |
character, the name of the column in which the genes are listed. Defaults to "Genes". |
A Shiny app object is returned
if (interactive()) {
  GeDi()
}
# Alternatively, you can also start the application with your data directly
# loaded.
data("macrophage_topGO_example", package = "GeDi")
if (interactive()) {
  GeDi(genesets = macrophage_topGO_example)
}
Construct an adjacency matrix from the (distance) scores and a given threshold.
getAdjacencyMatrix(distanceMatrix, cutOff, weighted = FALSE)
distanceMatrix |
A |
cutOff |
Numeric value, indicating for which pair of entries in the
|
weighted |
logical value, indicating whether or not the resulting
adjacency matrix should be weighted. If TRUE, the matrix will
be weighted by the distance scores in |
A Matrix::Matrix() of adjacency status
## 100 x 100 matrix, so 10000 random scores are drawn
m <- Matrix::Matrix(stats::runif(10000, 0, 1), 100, 100)
geneset_names <- as.character(stats::runif(100, min = 0, max = 1))
rownames(m) <- colnames(m) <- geneset_names
threshold <- 0.3
adj <- getAdjacencyMatrix(m, threshold)
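The thresholding idea behind an adjacency matrix can be sketched in a few lines of base R. `adjacency_from_distances()` is a hypothetical name; treating pairs with a score at or below the cutoff as adjacent and removing self-loops are both assumptions for this illustration and may not match getAdjacencyMatrix() exactly.

```r
# Hypothetical helper (not part of GeDi): unweighted adjacency from distances.
adjacency_from_distances <- function(distance_matrix, cut_off) {
  # 1 where the distance is at or below the cutoff, 0 otherwise
  adj <- (distance_matrix <= cut_off) * 1
  # Assumption: a geneset is not considered adjacent to itself
  diag(adj) <- 0
  adj
}

d <- matrix(
  c(
    0.0, 0.2, 0.8,
    0.2, 0.0, 0.5,
    0.8, 0.5, 0.0
  ),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)
adjacency_from_distances(d, cut_off = 0.3)
```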
Get the annotation of a STRINGdb object, i.e. the aliases of the protein information
getAnnotation(stringdb)
stringdb |
the STRINGdb object |
A data.frame mapping STRINGdb ids to gene names
stringdb <- getStringDB(9606)
stringdb
anno_df <- getAnnotation(stringdb)
Construct a bipartite graph from cluster information, mapping the cluster to its members
getBipartiteGraph(cluster, gs_names, genes)
cluster |
|
gs_names |
vector, a vector of (geneset) identifiers/names to map the
numeric member value in |
genes |
|
An igraph object to be further manipulated or processed/plotted (e.g. via igraph::plot.igraph() or visNetwork::visIgraph())
cluster <- list(c(1:5), c(6:9))
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
genes <- list(
  c("PDHB", "VARS2"), c("IARS2", "PDHA1"),
  c("AAAS", "ABCE1"), c("ABI1", "AAR2"),
  c("AATF", "AMFR"), c("BMS1", "DAP3"),
  c("AURKAIP1", "CHCHD1"), c("IARS2"),
  c("AHI1", "ALMS1")
)
g <- getBipartiteGraph(cluster, gs_names, genes)
Construct an adjacency matrix from a list of clusters.
getClusterAdjacencyMatrix(cluster, gs_names)
cluster |
A |
gs_names |
A vector of geneset names |
A Matrix::Matrix() of adjacency status
cluster <- list(c(1:5), c(6:9))
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
adj <- getClusterAdjacencyMatrix(cluster, gs_names)
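As an illustration of how cluster membership translates into adjacency, the base-R sketch below marks every pair of genesets that share a cluster. `cluster_adjacency()` is a hypothetical name, and the result is a plain base matrix without self-loops; both choices are assumptions, not the GeDi implementation.

```r
# Hypothetical helper (not part of GeDi): adjacency from cluster membership.
cluster_adjacency <- function(cluster, gs_names) {
  n <- length(gs_names)
  adj <- matrix(0, n, n, dimnames = list(gs_names, gs_names))
  for (members in cluster) {
    # Connect all members of the same cluster with each other
    adj[members, members] <- 1
  }
  # Assumption: no self-loops
  diag(adj) <- 0
  adj
}

cluster <- list(c(1:5), c(6:9))
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
adj <- cluster_adjacency(cluster, gs_names)
adj["a", "b"] # same cluster
adj["a", "f"] # different clusters
```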
Split a long string of space separated genes into a list of individual genes.
getGenes(genesets, gene_name = NULL)
genesets |
a |
gene_name |
a character, Alternative name for the column containing the
genes in |
A list containing for each geneset in the Geneset column a list of the included genes.
Build up the title for the graph nodes to display the available information of each geneset.
getGraphTitle( geneset_df = NULL, node_ids, gs_ids, gs_names = NULL, cluster_id = NULL )
geneset_df |
A |
node_ids |
vector, a vector of ids of the nodes in the graph for which the node title should be built. |
gs_ids |
vector, a vector of geneset identifiers, e.g. the |
gs_names |
vector, a vector of geneset descriptions/names, e.g. the
|
cluster_id |
vector, a vector of cluster ids for each of the genesets |
A list of titles for a graph with nodes given by node_ids.
genes <- list(
  c("PDHB", "VARS2"), c("IARS2", "PDHA1"),
  c("AAAS", "ABCE1"), c("ABI1", "AAR2"),
  c("AATF", "AMFR"), c("BMS1", "DAP3"),
  c("AURKAIP1", "CHCHD1"), c("IARS2"),
  c("AHI1", "ALMS1")
)
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
geneset_df <- data.frame(
  Genesets = gs_names,
  value = rep(1, 9)
)
geneset_df$Genes <- genes
graph <- getGraphTitle(
  geneset_df = geneset_df,
  node_ids = c(1:9),
  gs_ids = c(1:9),
  gs_names = gs_names
)
Get the NCBI ID of a species
getId(species, version = "12.0", cache = FALSE)
species |
character, the species of your input data |
version |
character, the version of STRING you want to use, defaults to the current version of STRING |
cache |
Logical value, defining whether to use the
BiocFileCache for retrieval of the files underlying
the STRINGdb object. Defaults to |
A character of the NCBI ID of species
species <- "Homo sapiens"
id <- getId(species = species)

species <- "Mus musculus"
id <- getId(species = species)
The function calculates an interaction score between two sets of genes based on a protein-protein interaction network.
getInteractionScore(a, b, ppi, maxInteract)
a , b
|
character vector, set of gene identifiers. |
ppi |
a |
maxInteract |
numeric, Maximum interaction value in the PPI. |
Interaction score between the two gene sets.
See https://doi.org/10.1186/s12864-019-5738-6 for details on the original implementation.
## Mock example showing how the data should look like
a <- c("PDHB", "VARS2", "IARS2")
b <- c("IARS2", "PDHA1")
ppi <- data.frame(
  Gene1 = c("PDHB", "VARS2", "IARS2"),
  Gene2 = c("IARS2", "PDHA1", "CD3"),
  combined_score = c(0.5, 0.2, 0.1)
)
maxInteract <- max(ppi$combined_score)
interaction <- getInteractionScore(a, b, ppi, maxInteract)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
data(ppi_macrophage_topGO_example_small, package = "GeDi", envir = environment())
maxInteract <- max(ppi_macrophage_topGO_example_small$combined_score)
## Pass the packaged PPI, matching the data used to compute maxInteract
interaction <- getInteractionScore(
  genes[[1]], genes[[2]],
  ppi_macrophage_topGO_example_small, maxInteract
)
Calculate the Jaccard distance of all combinations of genesets in a given data set of genesets.
getJaccardMatrix( genesets, progress = NULL, BPPARAM = BiocParallel::SerialParam() )
genesets |
a |
progress |
a |
BPPARAM |
A BiocParallel |
A Matrix::Matrix() with Jaccard distance rounded to 2 decimal places.
## Mock example showing how the data should look like
genesets <- list(list("PDHB", "VARS2"), list("IARS2", "PDHA1"))
m <- getJaccardMatrix(genesets)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
jaccard <- getJaccardMatrix(genes)
Calculate the Kappa distance of all combinations of genesets in a given data set of genesets. The Kappa distance is normalized to the (0, 1) interval.
getKappaMatrix( genesets, progress = NULL, BPPARAM = BiocParallel::SerialParam() )
genesets |
a |
progress |
a |
BPPARAM |
A BiocParallel |
A Matrix::Matrix() with Kappa distance rounded to 2 decimal places.
## Mock example showing how the data should look like
genesets <- list(list("PDHB", "VARS2"), list("IARS2", "PDHA1"))
m <- getKappaMatrix(genesets)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
kappa <- getKappaMatrix(genes)
Calculate the Meet-Min distance of all combinations of genesets in a given data set of genesets.
getMeetMinMatrix( genesets, progress = NULL, BPPARAM = BiocParallel::SerialParam() )
genesets |
a |
progress |
a |
BPPARAM |
A BiocParallel |
A Matrix::Matrix() with Meet-Min distance rounded to 2 decimal places.
## Mock example showing how the data should look like
genesets <- list(list("PDHB", "VARS2"), list("IARS2", "PDHA1"))
m <- getMeetMinMatrix(genesets)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
mm <- getMeetMinMatrix(genes)
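For orientation, the Meet-Min distance of two genesets A and B is commonly defined as 1 - |A ∩ B| / min(|A|, |B|). The following base R sketch applies that definition to three toy genesets. The helper `meet_min_distance()` is a hypothetical name and not part of GeDi, and the treatment of empty genesets is an assumption.

```r
# Hypothetical helper (not part of GeDi): Meet-Min distance of two gene vectors
meet_min_distance <- function(a, b) {
  smaller <- min(length(a), length(b))
  if (smaller == 0) return(1) # assumption: an empty geneset is maximally distant
  1 - length(intersect(a, b)) / smaller
}

genesets <- list(
  gs1 = c("PDHB", "VARS2"),
  gs2 = c("PDHB", "VARS2", "IARS2", "PDHA1"),
  gs3 = c("IARS2", "AAAS")
)

# Pairwise matrix via nested sapply(), rounded to 2 decimals
mm_sketch <- round(
  sapply(genesets, function(b) {
    sapply(genesets, function(a) meet_min_distance(a, b))
  }),
  2
)
```

Because the overlap is divided by the size of the smaller set, a geneset that is fully contained in another one (here `gs1` in `gs2`) has a distance of 0 to it.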
Calculate the pMM distance of all combinations of genesets in a given data set of genesets.
getpMMMatrix( genesets, ppi, alpha = 1, progress = NULL, BPPARAM = BiocParallel::SerialParam() )
genesets |
a |
ppi |
a |
alpha |
numeric, Scaling factor for controlling the influence of the interaction score. Defaults to 1. |
progress |
a |
BPPARAM |
A BiocParallel |
A Matrix::Matrix() with pMM distance rounded to 2 decimal places.
See https://doi.org/10.1186/s12864-019-5738-6 for details on the original implementation.
## Mock example showing how the data should look like
genesets <- list(c("PDHB", "VARS2"), c("IARS2", "PDHA1"))
ppi <- data.frame(
  Gene1 = c("PDHB", "VARS2"),
  Gene2 = c("IARS2", "PDHA1"),
  combined_score = c(0.5, 0.2)
)
pMM <- getpMMMatrix(genesets, ppi)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
data(ppi_macrophage_topGO_example_small, package = "GeDi", envir = environment())
pMM <- getpMMMatrix(genes, ppi_macrophage_topGO_example_small)
Download the Protein-Protein Interaction (PPI) information of a STRINGdb object
getPPI(genes, stringdb, anno_df)
genes |
a |
stringdb |
A STRINGdb object, the species of the object should match
the species of |
anno_df |
An annotation |
A data.frame of Protein-Protein interactions
## Mock example showing how the data should look like
genes <- c(c("CFTR", "RALA"), c("CACNG3", "ITGA3"), c("DVL2"))
stringdb <- getStringDB(9606, cache_location = FALSE)
# stringdb
anno_df <- getAnnotation(stringdb)
ppi <- getPPI(genes, stringdb, anno_df)

## Example using the data available in the package
## Not run:
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
stringdb <- getStringDB(9606)
stringdb
anno_df <- getAnnotation(stringdb)
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
ppi <- getPPI(genes, stringdb, anno_df)
## End(Not run)
Calculate the Sorensen-Dice distance of all combinations of genesets in a given data set of genesets.
getSorensenDiceMatrix( genesets, progress = NULL, BPPARAM = BiocParallel::SerialParam() )
genesets |
a |
progress |
a |
BPPARAM |
A BiocParallel |
A Matrix::Matrix() with Sorensen-Dice distance rounded to 2 decimal places.
## Mock example showing how the data should look like
genesets <- list(list("PDHB", "VARS2"), list("IARS2", "PDHA1"))
m <- getSorensenDiceMatrix(genesets)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
sd_matrix <- getSorensenDiceMatrix(genes)
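The Sorensen-Dice distance of two genesets A and B is commonly defined as 1 - 2|A ∩ B| / (|A| + |B|). A minimal base R sketch of that definition is given below. The helper `sorensen_dice_distance()` is a hypothetical name and not part of GeDi, and the handling of two empty genesets is an assumption.

```r
# Hypothetical helper (not part of GeDi): Sorensen-Dice distance of two gene vectors
sorensen_dice_distance <- function(a, b) {
  total <- length(a) + length(b)
  if (total == 0) return(1) # assumption: two empty sets are maximally distant
  1 - 2 * length(intersect(a, b)) / total
}

a <- c("PDHB", "VARS2", "IARS2")
b <- c("VARS2", "IARS2", "PDHA1")

# Two of three genes are shared: 1 - (2 * 2) / (3 + 3), rounded to 2 decimals
sd_sketch <- round(sorensen_dice_distance(a, b), 2)
```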
Get the respective STRINGdb object of your species of interest
getStringDB( species, version = "12.0", score_threshold = 0, cache_location = FALSE )
species |
numeric, the NCBI ID of the species of interest |
version |
character, The STRINGdb version to use, defaults to the current version |
score_threshold |
numeric, A score threshold to cut the retrieved interactions, defaults to 0 (all interactions) |
cache_location |
Logical value, defining whether to use the
BiocFileCache for retrieval of the files underlying
the STRINGdb object. Defaults to |
a STRINGdb object of species
species <- getId(species = "Homo sapiens")
stringdb <- getStringDB(as.numeric(species))
Calculate the pairwise similarity of GO terms
goDistance( geneset_ids, method = "Wang", ontology = "BP", species = "org.Hs.eg.db", progress = NULL, BPPARAM = BiocParallel::SerialParam() )
geneset_ids |
|
method |
character, the method to calculate the GO distance. See GOSemSim::goSim measure parameter for possibilities. |
ontology |
character, the ontology to use. See GOSemSim::goSim ont parameter for possibilities. |
species |
character, the species of your data. Indicated as org.XX.eg.db package from Bioconductor. |
progress |
|
BPPARAM |
A BiocParallel |
A Matrix::Matrix() with the pairwise GO distance of each geneset pair.
## Mock example showing how the data should look like
go_ids <- c(
  "GO:0002503", "GO:0045087", "GO:0019886",
  "GO:0002250", "GO:0001916", "GO:0019885"
)
similarity <- goDistance(go_ids)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi")
go_ids <- macrophage_topGO_example_small$Genesets
## Not run:
similarity <- goDistance(go_ids)
## End(Not run)
Create a histogram plot to plot geneset names / identifiers against their size.
gsHistogram( genesets, gs_names, gs_description = NULL, start = 0, end = 0, binwidth = 5, color = "#0092AC" )
genesets |
a |
gs_names |
character vector, Name / identifier of the genesets in
|
gs_description |
Optional, a character vector containing a short description for each geneset |
start |
numeric, Optional, describes the minimum gene set size to include. Defaults to 0. |
end |
numeric, Optional, describes the maximum gene set size to include. Defaults to 0. |
binwidth |
numeric, Width of histogram bins. Defaults to 5. |
color |
character, Fill color for histogram bars. Defaults to #0092AC. |
A ggplot2::ggplot() plot object.
## Mock example showing how the data should look like
gs_names <- c("a", "b", "c", "d", "e", "f", "g", "h")
genesets <- list(
  c("PDHB", "VARS2", "IARS2", "PDHA1"),
  c("AAAS", "ABCE1"),
  c("ABI1", "AAR2", "AATF"),
  c("AMFR"),
  c("BMS1", "DAP3"),
  c("AURKAIP1", "CHCHD1"),
  c("IARS2"),
  c("AHI1", "ALMS1")
)
p <- gsHistogram(genesets, gs_names, binwidth = 1)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
p <- gsHistogram(genes, macrophage_topGO_example_small$Genesets)
This function performs kMeans clustering on a set of scores.
kMeansClustering(scores, k, iter = 500, nstart = 50)
scores |
A |
k |
numerical, the number of centers to start with. This number will correlate with the resulting number of clusters. |
iter |
numerical, number of iterations for refinement. Defaults to 500. |
nstart |
numerical, how often the start points should be switched. Ensures a robust clustering, as clustering is influenced by the start points. Defaults to 50. |
A list of clusters
## Mock example showing how the data should look like
scores <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(scores) <- colnames(scores) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
cluster <- kMeansClustering(scores, k = 3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
cluster <- kMeansClustering(scores_macrophage_topGO_example_small, k = 5)
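To make the roles of `k`, `iter` and `nstart` more concrete, the sketch below runs `stats::kmeans()` directly on a toy distance matrix, treating each row as the feature vector of one geneset. Whether GeDi calls `stats::kmeans()` in exactly this way, and whether it returns names or indices, is an assumption; the toy matrix is made up for illustration.

```r
set.seed(1)

# Toy symmetric distance matrix with two obvious groups: {a, b, c} and {d, e, f}
scores <- matrix(0.9, 6, 6, dimnames = list(letters[1:6], letters[1:6]))
scores[1:3, 1:3] <- 0.1
scores[4:6, 4:6] <- 0.1
diag(scores) <- 0

# centers plays the role of k, iter.max of iter, and nstart is the number of
# random restarts from which the best solution is kept
fit <- stats::kmeans(scores, centers = 2, iter.max = 500, nstart = 50)

# Convert the membership vector into a list with one entry per cluster
clusters <- unname(split(names(fit$cluster), fit$cluster))
```

Each list element holds the names of the genesets assigned to one cluster; the order of the clusters depends on the random start.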
This function performs k-Nearest Neighbors (kNN) clustering on a set of scores.
kNN_clustering(scores, k)
scores |
A |
k |
numerical, the number of neighbors |
A list of clusters
## Mock example showing how the data should look like
scores <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(scores) <- colnames(scores) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
cluster <- kNN_clustering(scores, k = 3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
kNN <- kNN_clustering(scores_macrophage_topGO_example_small, k = 5)
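The neighbour search underlying such a clustering can be sketched in base R: for every geneset, pick the `k` genesets with the smallest distance. The helper `knn_sketch()` is a hypothetical name, not part of GeDi, and only illustrates the idea; the package may rely on a dedicated kNN implementation and a different output layout.

```r
# Hypothetical helper (not part of GeDi): for every geneset, return the indices
# of its k nearest neighbours according to a distance matrix
knn_sketch <- function(scores, k) {
  scores <- as.matrix(scores)
  lapply(seq_len(nrow(scores)), function(i) {
    d <- scores[i, ]
    d[i] <- Inf          # exclude the geneset itself
    order(d)[seq_len(k)] # indices of the k smallest distances
  })
}

scores <- matrix(
  c(
    0.0, 0.2, 0.9, 0.8,
    0.2, 0.0, 0.7, 0.6,
    0.9, 0.7, 0.0, 0.1,
    0.8, 0.6, 0.1, 0.0
  ),
  nrow = 4, byrow = TRUE
)
neighbours <- knn_sketch(scores, k = 2)
```

Here `neighbours[[1]]` lists the two genesets closest to the first one, ordered from nearest to farthest.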
This function is a wrapper for the Louvain clustering.
The actual computation of the clustering is done in the GeDi::clustering()
function. The wrapper exists for stand-alone use of GeDi functionality and
allows for a clearer distinction of the individual clustering algorithms.
louvainClustering(scores, threshold)
scores |
A |
threshold |
numerical, A threshold used to determine which genesets are considered similar. Genesets are considered similar if (distance) score <= threshold. |
A list of clusters
## Mock example showing how the data should look like
m <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(m) <- colnames(m) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
louvainCluster <- louvainClustering(m, 0.3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
louvainCluster <- louvainClustering(scores_macrophage_topGO_example_small,
  threshold = 0.5
)
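The effect of `threshold` can be illustrated without any graph library: pairs of genesets whose distance is at or below the threshold become edges of a similarity graph. The sketch below only shows this thresholding step on a made-up matrix. How GeDi builds its graph internally is an assumption here; a community detection routine such as `igraph::cluster_louvain()` could then be run on a graph built from such an edge list.

```r
# Toy distance matrix for four genesets
scores <- matrix(
  c(
    0.0, 0.2, 0.9, 0.8,
    0.2, 0.0, 0.7, 0.6,
    0.9, 0.7, 0.0, 0.1,
    0.8, 0.6, 0.1, 0.0
  ),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d"))
)
threshold <- 0.3

# Genesets count as similar when their distance is at or below the threshold
adjacency <- (scores <= threshold) * 1
diag(adjacency) <- 0 # no self-loops

# Edge list of similar pairs (upper triangle only, so each pair appears once)
idx <- which(adjacency == 1 & upper.tri(adjacency), arr.ind = TRUE)
edges <- data.frame(
  from = rownames(scores)[idx[, "row"]],
  to = colnames(scores)[idx[, "col"]],
  stringsAsFactors = FALSE
)
```

A lower threshold removes edges and tends to produce more, smaller clusters; a higher threshold connects more genesets.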
A sample input RData file generated from the macrophage dataset.
A data.frame object
This sample input contains data from the macrophage package found on Bioconductor. The exact steps used to generate this file can be found in the package vignette. The database used for the enrichment was KEGG.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
A sample input RData file generated from the macrophage dataset.
A data.frame object
This sample input contains data from the macrophage package found on Bioconductor. The exact steps used to generate this file can be found in the package vignette. The database used for the enrichment was Reactome.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
A sample input RData file generated from the macrophage dataset.
A data.frame object
This sample input contains data from the macrophage package found on Bioconductor. The exact steps used to generate this file can be found in the package vignette.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
A small sample input RData file generated from the macrophage dataset.
A data.frame object
This sample input contains data from the macrophage package found on Bioconductor. It is a small version of the macrophage_topGO_example and only contains the first 50 rows of this example. It can be used for fast testing of the application.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
This function is a wrapper for the Markov clustering.
The actual computation of the clustering is done in the GeDi::clustering()
function. The wrapper exists for stand-alone use of GeDi functionality and
allows for a clearer distinction of the individual clustering algorithms.
markovClustering(scores, threshold)
scores |
A |
threshold |
numerical, A threshold used to determine which genesets are considered similar. Genesets are considered similar if (distance) score <= threshold. |
A list of clusters
## Mock example showing how the data should look like
m <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(m) <- colnames(m) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
markovCluster <- markovClustering(m, 0.3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
markovCluster <- markovClustering(scores_macrophage_topGO_example_small,
  threshold = 0.5
)
This function performs Partitioning Around Medoids (PAM) clustering on a set of scores.
pamClustering(scores, k)
scores |
A |
k |
numerical, the number of centers to start with. This number will correlate with the resulting number of clusters. |
A list of clusters
## Mock example showing how the data should look like
scores <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
rownames(scores) <- colnames(scores) <- c(
  "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
)
cluster <- pamClustering(scores, k = 3)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
cluster <- pamClustering(scores_macrophage_topGO_example_small, k = 5)
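For comparison, PAM can also be run directly on a distance matrix with `cluster::pam()` from the recommended `cluster` package. The sketch below does so on a made-up toy matrix. Whether GeDi uses `cluster::pam()` internally, and in which form it returns the clusters, is an assumption.

```r
# Toy symmetric distance matrix with two obvious groups: {a, b, c} and {d, e, f}
scores <- matrix(0.9, 6, 6, dimnames = list(letters[1:6], letters[1:6]))
scores[1:3, 1:3] <- 0.1
scores[4:6, 4:6] <- 0.1
diag(scores) <- 0

# diss = TRUE tells pam() that the input holds dissimilarities, not observations
fit <- cluster::pam(stats::as.dist(scores), k = 2, diss = TRUE)

# One list entry per cluster, holding the names of the member genesets
clusters <- unname(split(rownames(scores), fit$clustering))

# Unlike k-means centers, medoids are actual genesets from the input
medoids <- rownames(scores)[fit$id.med]
```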
Calculate the local pMM distance of two genesets.
pMMlocal(a, b, ppi, maxInteract, alpha)
a , b
|
character vector, set of gene identifiers. |
ppi |
a |
maxInteract |
numeric, Maximum interaction value in the PPI. |
alpha |
numeric, Scaling factor for controlling the influence of the interaction score. Defaults to 1. |
The pMMlocal score between the two gene sets.
See https://doi.org/10.1186/s12864-019-5738-6 for details on the original implementation.
## Mock example showing how the data should look like
a <- c("PDHB", "VARS2")
b <- c("IARS2", "PDHA1")
ppi <- data.frame(
  Gene1 = c("PDHB", "VARS2"),
  Gene2 = c("IARS2", "PDHA1"),
  combined_score = c(0.5, 0.2)
)
maxInteract <- max(ppi$combined_score)
pMM_score <- pMMlocal(a, b, ppi, maxInteract = maxInteract, alpha = 1)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- GeDi::prepareGenesetData(macrophage_topGO_example_small)
data(ppi_macrophage_topGO_example_small, package = "GeDi", envir = environment())
maxInteract <- max(ppi_macrophage_topGO_example_small$combined_score)
pMM_score <- pMMlocal(genes[[1]], genes[[2]],
  ppi_macrophage_topGO_example_small,
  maxInteract = maxInteract, alpha = 1
)
A file containing a Protein-Protein Interaction (PPI) data.frame for the macrophage_topGO_example_small.
A data.frame object
This sample input contains a PPI for the macrophage_topGO_example_small. The PPI has been downloaded using the package's functions for retrieving a PPI (see getPPI()). Please check out the vignette for further information.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
Split a long string of space separated genes into a list of individual genes.
prepareGenesetData(genesets, gene_name = NULL)
genesets |
a |
gene_name |
a character, Alternative name for the column containing the
genes in |
A list containing for each geneset in the Geneset column a list of the included genes.
## Mock example showing how the data should look like
df <- data.frame(
  Geneset = c("Cell Cycle", "Biological Process", "Mitosis"),
  Genes = c(
    c("PDHB,VARS2,IARS2"),
    c("LARS,LARS2"),
    c("IARS,SUV3")
  )
)
genes <- prepareGenesetData(df)

## Example using the data available in the package
data(macrophage_topGO_example_small, package = "GeDi", envir = environment())
genes <- prepareGenesetData(macrophage_topGO_example_small)
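The core of this preparation step, turning one delimited string per geneset into a vector of gene identifiers, can be sketched with `base::strsplit()`. The sketch assumes that the separator is already known to be a comma and does not use the package's separator-guessing helper; the object name `genes_sketch` is made up for illustration.

```r
df <- data.frame(
  Genesets = c("Cell Cycle", "Mitosis"),
  Genes = c("PDHB,VARS2,IARS2", "IARS,SUV3"),
  stringsAsFactors = FALSE
)

# Assumption: the separator is a comma; fixed = TRUE avoids regex interpretation
genes_sketch <- strsplit(df$Genes, split = ",", fixed = TRUE)

# Name each entry after its geneset so results can be looked up by identifier
names(genes_sketch) <- df$Genesets
```

The result is a list with one character vector per geneset, which is the general shape the distance functions in this package take as `genesets` input.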
A sample input text file taken from the GScluster package
Text file
This sample input text file contains data from the GScluster package. It is identical to the sample_geneset.txt file found on the GitHub page of the package.
Yoon, S., Kim, J., Kim, SK. et al. GScluster: network-weighted gene-set clustering analysis. BMC Genomics 20, 352 (2019). https://doi.org/10.1186/s12864-019-5738-6
A broken input text file to test the application
Text file
This sample input text file is broken and used for testing the application.
An empty input text file to test the application
Text file
This sample input text file is empty and used for testing the application.
A sample input text file taken from the GScluster package, which is reduced to a smaller number of entries for faster testing of the application.
Text file
This sample input text file contains data from the GScluster package. It was taken from the sample_geneset.txt file found on the GitHub page of the package and then reduced to a smaller number of entries for faster testing of the application.
Yoon, S., Kim, J., Kim, SK. et al. GScluster: network-weighted gene-set clustering analysis. BMC Genomics 20, 352 (2019). https://doi.org/10.1186/s12864-019-5738-6
A method to scale a matrix of distance scores with the GO term similarity of the associated genesets.
scaleGO( scores, geneset_ids, method = "Wang", ontology = "BP", species = "org.Hs.eg.db", BPPARAM = BiocParallel::SerialParam() )
scores |
a |
geneset_ids |
|
method |
character, the method to calculate the GO distance. See GOSemSim::goSim measure parameter for possibilities. |
ontology |
character, the ontology to use. See GOSemSim::goSim ont parameter for possibilities. |
species |
character, the species of your data. Indicated as org.XX.eg.db package from Bioconductor. |
BPPARAM |
A BiocParallelParam object specifying how parallelization should be handled |
A Matrix::Matrix() of scaled values.
## Mock example showing how the data should look like
go_ids <- c(
  "GO:0002503", "GO:0045087", "GO:0019886",
  "GO:0002250", "GO:0001916", "GO:0019885"
)
set.seed(42)
scores <- Matrix::Matrix(stats::runif(36, min = 0, max = 1), 6, 6)
similarity <- scaleGO(scores, go_ids)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi")
data(macrophage_topGO_example_small, package = "GeDi")
go_ids <- macrophage_topGO_example_small$Genesets
## Not run:
scores_scaled <- scaleGO(scores_macrophage_topGO_example_small, go_ids)
## End(Not run)
A file containing sample distance scores for the macrophage_topGO_example_small.
A sparse matrix (dgCMatrix)
This sample input contains scores for the macrophage_topGO_example_small. Distance scores have been calculated using the getJaccardMatrix() method.
Alasoo, K., Rodrigues, J., Mukhopadhyay, S. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat Genet 50, 424–431 (2018). https://doi.org/10.1038/s41588-018-0046-7
Determine initial seeds for the clustering from the distance score matrix.
seedFinding(distances, simThreshold, memThreshold)
distances |
A |
simThreshold |
numerical, A threshold to determine which genesets are
considered close (i.e. have a distance <= simThreshold)
in the |
memThreshold |
numerical, A threshold used to ensure that enough members of a potential seed set are close/similar to each other. Only if this condition is met, the set is considered a seed. |
A list of seeds which can be used for clustering
See https://david.ncifcrf.gov/helps/functional_classification.html#clustering for details on the original implementation
## Mock example showing how the data should look like
m <- Matrix::Matrix(stats::runif(100, min = 0, max = 1), 10, 10)
seeds <- seedFinding(distances = m, simThreshold = 0.3, memThreshold = 0.5)

## Example using the data available in the package
data(scores_macrophage_topGO_example_small, package = "GeDi", envir = environment())
seeds <- seedFinding(scores_macrophage_topGO_example_small,
  simThreshold = 0.3,
  memThreshold = 0.5
)
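The interplay of `simThreshold` and `memThreshold` can be sketched in base R along the lines of the DAVID heuristic referenced above: each geneset and its close partners form a candidate seed, which is kept only if a sufficient fraction of the partners are also close to each other. The helper `seed_sketch()` is a hypothetical name and a simplified reading of that heuristic, not the GeDi implementation; in particular, the minimum of two close partners is an assumption.

```r
# Hypothetical helper (not part of GeDi): simplified DAVID-style seed finding
seed_sketch <- function(distances, simThreshold, memThreshold) {
  d <- as.matrix(distances)
  close <- d <= simThreshold # TRUE where two genesets are considered close
  diag(close) <- FALSE
  seeds <- list()
  for (i in seq_len(nrow(d))) {
    members <- unname(which(close[i, ]))
    if (length(members) < 2) next # assumption: need at least two close partners
    sub <- close[members, members, drop = FALSE]
    n_pairs <- length(members) * (length(members) - 1)
    # Keep the candidate if enough ordered member pairs are themselves close
    if (sum(sub) / n_pairs >= memThreshold) {
      seeds[[length(seeds) + 1]] <- sort(c(i, members))
    }
  }
  unique(seeds) # identical seeds found from different start points collapse
}

# Toy distances: genesets 1 to 3 are mutually close, 4 and 5 are isolated
d <- matrix(0.9, 5, 5)
d[1:3, 1:3] <- 0.1
diag(d) <- 0
seeds_sketch <- seed_sketch(d, simThreshold = 0.3, memThreshold = 0.5)
```

In this toy case a single seed containing genesets 1, 2 and 3 remains, while the isolated genesets 4 and 5 do not form seeds.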