Title: | Functional similarities |
---|---|
Description: | Calculates functional similarities based on the pathways described on KEGG and REACTOME or in gene sets. These similarities can be calculated for pathways or gene sets, genes, or clusters and combined with other similarities. They can be used to improve networks, gene selection, testing relationships... |
Authors: | Lluís Revilla Sancho [aut, cre] , Pau Sancho-Bru [ths] , Juan José Salvatella Lozano [ths] |
Maintainer: | Lluís Revilla Sancho <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.31.0 |
Built: | 2024-10-30 04:51:14 UTC |
Source: | https://github.com/bioc/BioCor |
Calculates a functional similarity measure between gene identifiers based on the pathways described on KEGG and REACTOME.
pathSim(): Calculates the similarity between two pathways.
geneSim(): Calculates the similarity (based on pathSim) between two genes.
clusterSim(): Calculates the similarity between two clusters of genes by joining the pathways of each gene.
clusterGeneSim(): Calculates the similarity between two clusters of genes by comparing the similarity between the genes of each cluster.
similarities(): Combines the values of several similarity matrices.
conversions(): Two functions to convert between similarity measures.
weighted(): Functions provided to combine similarities.
Maintainer: Lluís Revilla Sancho [email protected] (ORCID)
Other contributors: Pau Sancho-Bru [thesis advisor], Juan José Salvatella Lozano [thesis advisor]
Useful links:
Report bugs at https://github.com/llrs/BioCor/issues
Combines the previously calculated similarities into a single similarity matrix.
addSimilarities(x, bio_mat, weights = c(0.5, 0.18, 0.1, 0.22))
x |
A matrix with the similarity of expression |
bio_mat |
A list of matrices of the same dimension as x. |
weights |
A numeric vector of weights to multiply each similarity |
The total weight can't be higher than 1 to prevent values above 1 but can be below 1. It uses weighted.sum with abs = TRUE internally.
A square matrix of the same dimensions as the input matrices.
Lluís Revilla
set.seed(100)
a <- seq2mat(LETTERS[1:5], rnorm(10))
b <- seq2mat(LETTERS[1:5], seq(from = 0.1, to = 1, by = 0.1))
sim <- list(b)
addSimilarities(a, sim, c(0.5, 0.5))
Insert values from a matrix into another matrix based on the rownames and colnames replacing the values.
AintoB(A, B)
A |
A matrix to be inserted. |
B |
A matrix to insert in. |
Useful when the similarities for all the genes with pathway information are already calculated but you would like to use more genes in the analysis: insert the values you have already calculated into the larger matrix of genes.
A matrix with the values of A in the matrix B.
Lluís Revilla
B <- matrix(
  ncol = 10, nrow = 10,
  dimnames = list(letters[1:10], letters[1:10])
)
A <- matrix(c(1:15),
  byrow = TRUE, nrow = 5,
  dimnames = list(letters[1:5], letters[1:3])
)
AintoB(A, B)

# Mixed orders
colnames(A) <- c("c", "h", "e")
rownames(A) <- c("b", "a", "f", "c", "j")
AintoB(A, B)

# Missing columns or rows
colnames(A) <- c("d", "f", "k")
AintoB(A, B)
Looks for the similarity between genes of a group and then between each group's genes.
clusterGeneSim(cluster1, cluster2, info, method = c("max", "rcmax.avg"), ...)

## S4 method for signature 'character,character,GeneSetCollection'
clusterGeneSim(cluster1, cluster2, info, method = c("max", "rcmax.avg"), ...)
cluster1 , cluster2
|
A vector with genes. |
info |
A GeneSetCollection or a list of genes and the pathways they are involved in. |
method |
A vector with one or two arguments to be passed to combineScores: the first one is used to summarize the similarities of genes, the second one for clusters. |
... |
Other arguments passed to |
It differs from clusterSim in that the similarity of each combination of genes is calculated first, and these values are then used to compare the two clusters. Thus combineScores is applied twice: once at the gene level and once at the cluster level.
Returns a similarity score between the genes of the two clusters.
clusterGeneSim(cluster1 = character, cluster2 = character, info = GeneSetCollection): Calculates the gene similarities in a GeneSetCollection and combines them using combineScoresPar()
Lluís Revilla
mclusterGeneSim(), combineScores() and clusterSim()
if (require("org.Hs.eg.db")) {
  # Extract the paths of all genes of org.Hs.eg.db from KEGG (last update in
  # data of June 31st 2011)
  genes.kegg <- as.list(org.Hs.egPATH)
  clusterGeneSim(c("18", "81", "10"), c("100", "10", "1"), genes.kegg)
  clusterGeneSim(
    c("18", "81", "10"), c("100", "10", "1"), genes.kegg,
    c("avg", "avg")
  )
  clusterGeneSim(
    c("18", "81", "10"), c("100", "10", "1"), genes.kegg,
    c("avg", "rcmax.avg")
  )
  (clus <- clusterGeneSim(
    c("18", "81", "10"), c("100", "10", "1"), genes.kegg,
    "avg"
  ))
  combineScores(clus, "rcmax.avg")
} else {
  warning("You need org.Hs.eg.db package for this example")
}
Looks for the similarity between genes in groups
clusterSim(cluster1, cluster2, info, method = "max", ...)

## S4 method for signature 'character,character,GeneSetCollection'
clusterSim(cluster1, cluster2, info, method = "max", ...)
cluster1 , cluster2
|
A vector with genes. |
info |
A GeneSetCollection or a list of genes and the pathways they are involved in. |
method |
one of |
... |
Other arguments passed to |
Once the pathways for each cluster are found they are combined using combineScores().
clusterSim returns a similarity score of the two clusters.
clusterSim(cluster1 = character, cluster2 = character, info = GeneSetCollection): Calculates all the similarities of the GeneSetCollection and combines them using combineScoresPar()
Lluís Revilla
For a different approach see clusterGeneSim(), combineScores() and conversions()
if (require("org.Hs.eg.db")) {
  # Extract the paths of all genes of org.Hs.eg.db from KEGG (last update in
  # data of June 31st 2011)
  genes.kegg <- as.list(org.Hs.egPATH)
  clusterSim(c("9", "15", "10"), c("33", "19", "20"), genes.kegg)
  clusterSim(c("9", "15", "10"), c("33", "19", "20"), genes.kegg, NULL)
  clusterSim(c("9", "15", "10"), c("33", "19", "20"), genes.kegg, "avg")
} else {
  warning("You need org.Hs.eg.db package for this example")
}
Function similar to combn but for larger vectors. To avoid allocating a big vector with all the combinations each one can be computed with this function.
combinadic(n, r, i)
n |
Elements to extract the combination from |
r |
Number of elements per combination |
i |
ith combination |
The ith combination of the elements
Joshua Ulrich
StackOverflow answer 4494469/2886003
# Output of all combinations
combn(LETTERS[1:5], 2)
# Output of the second combination
combinadic(LETTERS[1:5], 2, 2)
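The idea of computing a single combination without enumerating all of them can be sketched in base R. This is an illustrative lexicographic unranking, not the code used by combinadic; the name nth_combination is hypothetical, and it returns index positions in the order used by combn rather than the elements themselves.

```r
# Hypothetical helper: index positions of the i-th combination (1-based),
# in the lexicographic order used by combn(), without building them all.
nth_combination <- function(n, r, i) {
  stopifnot(i >= 1, i <= choose(n, r))
  res <- integer(r)
  candidate <- 1
  for (pos in seq_len(r)) {
    repeat {
      # Number of combinations that have `candidate` at this position
      block <- choose(n - candidate, r - pos)
      if (i <= block) break
      i <- i - block
      candidate <- candidate + 1
    }
    res[pos] <- candidate
    candidate <- candidate + 1
  }
  res
}

# Map the positions back to the elements
LETTERS[1:5][nth_combination(5, 2, 2)]
```

Only r positions are stored at any time, so the memory use does not grow with the number of combinations.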
Combine several similarities into one using several methods.
combineScores(
  scores,
  method = c("max", "avg", "rcmax", "rcmax.avg", "BMA", "reciprocal"),
  round = FALSE,
  t = 0
)

combineScoresPar(scores, method, subSets = NULL, BPPARAM = NULL, ...)
scores |
Matrix of scores to be combined |
method |
one of |
round |
Should the resulting value be rounded to the third digit? |
t |
Numeric value to filter scores below this value. Only used in the reciprocal method. |
subSets |
List of combinations as info in other functions. |
BPPARAM |
BiocParallel back-end parameters.
By default ( |
... |
Other arguments passed to |
The input matrix can be a base matrix or a matrix from package Matrix. The methods return:
avg: The average or mean value.
max: The max value.
rcmax: The max of the column means or row means.
rcmax.avg: The sum of the max values by rows and columns divided by the number of columns and rows.
BMA: The same as rcmax.avg.
reciprocal: The double of the sum of the reciprocal maximal similarities (above a threshold) divided by the number of elements. See equation 3 of the Tao et al 2007 article.
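As a rough illustration of three of the methods listed above, the following base R sketch computes avg, max and rcmax.avg on a small matrix. The helper combine_sketch and the toy scores are hypothetical and are not the package's implementation; rcmax and reciprocal are left out on purpose.

```r
# Toy score matrix: rows are items of one group, columns items of the other.
scores <- matrix(
  c(0.4, 0.6, 0.2,
    0.4, 0.4, 0.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("a", "b"), c("d", "e", "f"))
)

# Hypothetical helper following the textual definitions only.
combine_sketch <- function(scores, method = c("avg", "max", "rcmax.avg")) {
  method <- match.arg(method)
  row_max <- apply(scores, 1, max, na.rm = TRUE)
  col_max <- apply(scores, 2, max, na.rm = TRUE)
  switch(method,
    avg = mean(scores, na.rm = TRUE),
    max = max(scores, na.rm = TRUE),
    # Sum of row and column maxima over the number of rows plus columns
    rcmax.avg = (sum(row_max) + sum(col_max)) / (nrow(scores) + ncol(scores))
  )
}

sapply(c("avg", "max", "rcmax.avg"), combine_sketch, scores = scores)
```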
A numeric value as described in details.
combineScores is a version of the function of the same name in package GOSemSim (GOSemSim::combineScores()) with optional rounding and some internal differences.
Lluís Revilla based on Guangchuang Yu.
Ying Tao, Lee Sam, Jianrong Li, Carol Friedman, Yves A. Lussier; Information theory applied to the sparse gene ontology annotation network to predict novel gene function. Bioinformatics 2007; 23 (13): i529-i538. doi: 10.1093/bioinformatics/btm195
See register in BiocParallel for the arguments accepted by BPPARAM.
(d <- structure(c(
  0.4, 0.6, 0.222222222222222, 0.4, 0.4, 0, 0.25, 0.5,
  0.285714285714286
),
.Dim = c(3L, 3L),
.Dimnames = list(c("a", "b", "c"), c("d", "e", "f"))
))
e <- d
sapply(c("avg", "max", "rcmax", "rcmax.avg", "BMA", "reciprocal"),
  combineScores,
  scores = d
)
d[1, 2] <- NA
sapply(c("avg", "max", "rcmax", "rcmax.avg", "BMA", "reciprocal"),
  combineScores,
  scores = d
)
colnames(e) <- rownames(e)
combineScoresPar(e, list(a = c("a", "b"), b = c("b", "c")),
  method = "max"
)
Given several sources of pathways that use the same gene identifiers, it merges them.
combineSources(...)
... |
Lists of genes and their pathways. |
It assumes that the identifiers of the genes are the same in all sources; if many of them aren't equal it issues a warning. Only unique pathway identifiers are returned.
A single list with the pathways of each source on the same gene.
DB1 <- list(g1 = letters[6:8], g2 = letters[1:5], g3 = letters[4:7])
DB2 <- list(
  g1 = c("one", "two"), g2 = c("three", "four"),
  g3 = c("another", "two")
)
combineSources(DB1, DB2)
combineSources(DB1, DB1)
DB3 <- list(
  g1 = c("one", "two"), g2 = c("three", "four"),
  g4 = c("five", "six", "seven"), g5 = c("another", "two")
)
combineSources(DB1, DB3) # A warning is expected
Functions to convert the similarity coefficients between Jaccard and Dice. D2J is the opposite of J2D.
D2J(D)

J2D(J)
D |
Dice coefficient, as returned by |
J |
Jaccard coefficient |
A numeric value.
Lluís Revilla
D2J(0.5)
J2D(0.5)
D2J(J2D(0.5))
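For reference, the standard relationship between the two coefficients is J = D / (2 - D) and D = 2J / (1 + J). The base R sketch below uses hypothetical function names to show that the two conversions are inverses of each other; it is not the package's source code.

```r
# Hypothetical helpers implementing the standard Dice/Jaccard identities.
dice_to_jaccard <- function(D) D / (2 - D)
jaccard_to_dice <- function(J) 2 * J / (1 + J)

dice_to_jaccard(0.5)                  # 1/3
jaccard_to_dice(0.5)                  # 2/3
dice_to_jaccard(jaccard_to_dice(0.5)) # back to 0.5
```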
Estimates how much two lists of genes overlap by looking at how many of the nodes are shared. Calculates the Dice similarity.
diceSim(g1, g2)
g1 , g2
|
A character list with the names of the proteins in each pathway. |
It requires a vector of characters, otherwise it will return an NA.
A score between 0 and 1 calculated as the double of the proteins shared by g1 and g2 divided by the number of genes in both groups.
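The score described above can be written in a few lines of base R. This is a minimal sketch with a hypothetical name (dice_sketch), assuming neither vector contains duplicated identifiers; it does not reproduce the input checks of diceSim.

```r
# Hypothetical helper: twice the shared elements over the total size.
dice_sketch <- function(g1, g2) {
  2 * length(intersect(g1, g2)) / (length(g1) + length(g2))
}

genes.id2 <- c("52", "11342", "80895", "57654", "548953", "11586", "45985")
genes.id1 <- c(
  "52", "11342", "80895", "57654", "58493", "1164", "1163",
  "4150", "2130", "159"
)
dice_sketch(genes.id1, genes.id2) # 4 shared genes: 2 * 4 / (10 + 7)
dice_sketch(genes.id2, genes.id2) # identical sets give 1
```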
Lluís Revilla
Used for geneSim(), see conversions() help page to transform Dice score to Jaccard score.
genes.id2 <- c("52", "11342", "80895", "57654", "548953", "11586", "45985")
genes.id1 <- c(
  "52", "11342", "80895", "57654", "58493", "1164", "1163",
  "4150", "2130", "159"
)
diceSim(genes.id1, genes.id2)
diceSim(genes.id2, genes.id2)
Finds the indices of duplicated elements in the vector given.
duplicateIndices(vec)
vec |
Vector of identifiers presumably duplicated |
For each duplicated identifier it returns the indices where it appears, either as a list or, if all the duplication events are of the same length, as a matrix where each column corresponds to one duplicated identifier.
The format is determined by the simplify2array
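A comparable result can be obtained in base R by grouping positions by value, as in the sketch below. The helper name dup_idx_sketch is hypothetical and the element order may differ from what duplicateIndices returns; the call to simplify2array only illustrates how a list of equal-length index vectors collapses into a matrix.

```r
# Hypothetical helper: positions of every value that appears more than once.
dup_idx_sketch <- function(vec) {
  positions <- split(seq_along(vec), vec)
  positions[lengths(positions) > 1]
}

dup_idx_sketch(c("52", "52", "53", "55")) # list with one element, "52"

# Two duplicated ids, each seen twice: the list simplifies to a matrix
idx <- dup_idx_sketch(c("52", "55", "53", "55", "52"))
simplify2array(idx)
```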
Lluís Revilla
duplicateIndices(c("52", "52", "53", "55")) # One repeated element
duplicateIndices(c("52", "52", "53", "55", "55")) # Repeated elements
duplicateIndices(c("52", "55", "53", "55", "52")) # Mixed repeated elements
Given two genes, calculates the Dice similarity between each pathway which is combined to obtain a similarity between the genes.
geneSim(gene1, gene2, info, method = "max", ...)

## S4 method for signature 'character,character,GeneSetCollection'
geneSim(gene1, gene2, info, method = "max", ...)
gene1 , gene2
|
Ids of the genes to calculate the similarity, to be found in genes. |
info |
A GeneSetCollection or a list of genes and the pathways they are involved in. |
method |
one of |
... |
Other arguments passed to |
Given the information about the genes and their pathways, uses the ids of the genes to find the Dice similarity score for each pathway comparison between the genes. Later these similarities are combined using combineScoresPar().
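The procedure described above can be sketched end to end in base R. The toy annotation, the identifiers and the helper names (dice, gene_sim_sketch) are hypothetical, and the sketch combines the scores with max only; it is not the package's implementation.

```r
# Toy annotation: gene -> pathways (hypothetical identifiers).
info <- list(
  g1 = c("p1", "p2"),
  g2 = c("p2", "p3"),
  g3 = c("p1", "p3"),
  g4 = c("p3")
)

dice <- function(a, b) 2 * length(intersect(a, b)) / (length(a) + length(b))

gene_sim_sketch <- function(gene1, gene2, info, combine = max) {
  p1 <- info[[gene1]]
  p2 <- info[[gene2]]
  # No pathway information for one of the genes
  if (is.null(p1) || is.null(p2)) return(NA_real_)
  # Invert the annotation: pathway -> genes
  by_pathway <- split(rep(names(info), lengths(info)), unlist(info))
  # Dice score for every pair of pathways of the two genes
  scores <- outer(p1, p2, Vectorize(function(x, y) {
    dice(by_pathway[[x]], by_pathway[[y]])
  }))
  combine(scores)
}

gene_sim_sketch("g1", "g2", info) # both genes share pathway p2
gene_sim_sketch("g1", "g4", info) # no shared pathway, but overlapping ones
```

Passing a different function as combine, for example mean, would give a rough analogue of the "avg" method.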
The highest Dice score of all the combinations of pathways between the two ids compared if a method to combine scores is provided, or NA if there isn't information for one gene.
If an NA is returned this means that there isn't information available for any pathways for one of the genes. Otherwise a number between 0 and 1 (both included) is returned. Note that there isn't a negative value of similarity.
geneSim(gene1 = character, gene2 = character, info = GeneSetCollection): Calculates all the similarities of the GeneSetCollection and combines them using combineScoresPar()
Lluís Revilla
mgeneSim(), conversions() help page to transform Dice score to Jaccard score. For the method to combine the scores see combineScoresPar().
if (require("org.Hs.eg.db") && require("reactome.db")) {
  # Extract the paths of all genes of org.Hs.eg.db from KEGG
  # (last update in data of June 31st 2011)
  genes.kegg <- as.list(org.Hs.egPATH)
  # Extracts the paths of all genes of org.Hs.eg.db from reactome
  genes.react <- as.list(reactomeEXTID2PATHID)
  geneSim("81", "18", genes.react)
  geneSim("81", "18", genes.kegg)
  geneSim("81", "18", genes.react, NULL)
  geneSim("81", "18", genes.kegg, NULL)
} else {
  warning("You need reactome.db and org.Hs.eg.db package for this example")
}
Calculates the pathways per gene of a list.
inverseList(x)
x |
A list with genes as names and names of pathways as values of the list |
The number of pathways each gene has.
Lluís Revilla
Looks for the similarity between genes of a group and then between each group's genes.
mclusterGeneSim(clusters, info, method = c("max", "rcmax.avg"), ...)

## S4 method for signature 'list,GeneSetCollection'
mclusterGeneSim(clusters, info, method = c("max", "rcmax.avg"), ...)
clusters |
A list of clusters of genes to be found in |
info |
A GeneSetCollection or a list of genes and the pathways they are involved in. |
method |
A vector with one or two arguments to be passed to combineScores: the first one is used to summarize the similarities of genes, the second one for clusters. |
... |
Other arguments passed to |
Returns a matrix with the similarity scores for each cluster comparison.
mclusterGeneSim(clusters = list, info = GeneSetCollection): Calculates all the similarities of the GeneSetCollection and combines them using combineScoresPar()
Lluís Revilla
clusterGeneSim(), clusterSim() and combineScores()
if (require("org.Hs.eg.db")) {
  genes.kegg <- as.list(org.Hs.egPATH)
  clusters <- list(
    cluster1 = c("18", "81", "10"),
    cluster2 = c("100", "594", "836"),
    cluster3 = c("18", "10", "83")
  )
  mclusterGeneSim(clusters, genes.kegg)
  mclusterGeneSim(clusters, genes.kegg, c("max", "avg"))
  mclusterGeneSim(clusters, genes.kegg, c("max", "BMA"))
} else {
  warning("You need org.Hs.eg.db package for this example")
}
Looks for the similarity between genes in groups. Once the pathways for each cluster are found they are combined using combineScores().
mclusterSim(clusters, info, method = "max", ...)

## S4 method for signature 'list,GeneSetCollection'
mclusterSim(clusters, info, method = "max", ...)
clusters |
A list of clusters of genes to be found in |
info |
A GeneSetCollection or a list of genes and the pathways they are involved in. |
method |
one of |
... |
Other arguments passed to |
mclusterSim returns a matrix with the similarity scores for each cluster comparison.
mclusterSim(clusters = list, info = GeneSetCollection): Calculates all the similarities of the GeneSetCollection and combines them using combineScoresPar()
Lluís Revilla
For a different approach see clusterGeneSim(), combineScores() and conversions()
if (require("org.Hs.eg.db")) {
  # Extract the paths of all genes of org.Hs.eg.db from KEGG (last update in
  # data of June 31st 2011)
  genes.kegg <- as.list(org.Hs.egPATH)
  clusters <- list(
    cluster1 = c("18", "81", "10"),
    cluster2 = c("100", "10", "1"),
    cluster3 = c("18", "10", "83")
  )
  mclusterSim(clusters, genes.kegg)
  mclusterSim(clusters, genes.kegg, "avg")
} else {
  warning("You need org.Hs.eg.db package for this example")
}
Given two genes, calculates the Dice similarity between each pathway which is combined to obtain a similarity between the genes.
mgeneSim(genes, info, method = "max", ...)

## S4 method for signature 'character,GeneSetCollection'
mgeneSim(genes, info, method = "max", ...)

## S4 method for signature 'missing,GeneSetCollection'
mgeneSim(genes, info, method = "max", ...)
genes |
A vector of genes. |
info |
A GeneSetCollection or a list of genes and the pathways they are involved in. |
method |
one of |
... |
Other arguments passed to |
Given the information about the genes and their pathways, uses the ids of the genes to find the Dice similarity score for each pathway comparison between the genes. Later these similarities are combined using combineScoresPar().
mgeneSim returns the matrix of similarities between the genes in the vector.
mgeneSim(genes = character, info = GeneSetCollection): Calculates all the similarities of the list and combines them using combineScoresPar()
mgeneSim(genes = missing, info = GeneSetCollection): Calculates all the similarities of the list and combines them using combineScoresPar()
genes accepts a named character vector, and the output will then use the names of the genes.
geneSim(), conversions() help page to transform Dice score to Jaccard score. For the method to combine the scores see combineScoresPar().
if (require("org.Hs.eg.db") && require("reactome.db")) {
  # Extract the paths of all genes of org.Hs.eg.db from KEGG
  # (last update in data of June 31st 2011)
  genes.kegg <- as.list(org.Hs.egPATH)
  # Extracts the paths of all genes of org.Hs.eg.db from reactome
  genes.react <- as.list(reactomeEXTID2PATHID)
  mgeneSim(c("81", "18", "10"), genes.react)
  mgeneSim(c("81", "18", "10"), genes.react, "avg")
  named_genes <- structure(c("81", "18", "10"),
    .Names = c("ACTN4", "ABAT", "NAT2")
  )
  mgeneSim(named_genes, genes.react, "max")
} else {
  warning("You need reactome.db and org.Hs.eg.db package for this example")
}
Calculates the similarity between several pathways using the Dice similarity score. If one needs the matrix of similarities between pathways, set the argument method to NULL.
mpathSim(pathways, info, method = NULL, ...)

## S4 method for signature 'character,GeneSetCollection,ANY'
mpathSim(pathways, info, method = NULL, ...)

## S4 method for signature 'missing,GeneSetCollection,ANY'
mpathSim(pathways, info, method = NULL, ...)

## S4 method for signature 'missing,list,ANY'
mpathSim(pathways, info, method = NULL, ...)

## S4 method for signature 'missing,list,missing'
mpathSim(pathways, info, method = NULL, ...)
pathways |
Pathways to calculate the similarity for |
info |
A list of genes and the pathways they are involved in, or a GeneSetCollection object |
method |
To combine the scores of each pathway, one of |
... |
Other arguments passed to |
The similarity between those pathways or all the similarities between each comparison.
mpathSim(pathways = character, info = GeneSetCollection, method = ANY): Calculates the similarity between the provided pathways of the GeneSetCollection using combineScoresPar
mpathSim(pathways = missing, info = GeneSetCollection, method = ANY): Calculates all the similarities of the GeneSetCollection and combines them using combineScoresPar
mpathSim(pathways = missing, info = list, method = ANY): Calculates all the similarities of the list and combines them using combineScoresPar
mpathSim(pathways = missing, info = list, method = missing): Calculates all the similarities of the list
pathways accepts a named character vector, and the output will then have the names.
pathSim() for a single pairwise comparison.
conversions() to convert the Dice similarity to Jaccard similarity.
if (require("reactome.db")) {
  genes.react <- as.list(reactomeEXTID2PATHID)
  (pathways <- sample(unique(unlist(genes.react)), 10))
  mpathSim(pathways, genes.react, NULL)
  named_paths <- structure(
    c("R-HSA-112310", "R-HSA-112316", "R-HSA-112315"),
    .Names = c(
      "Neurotransmitter Release Cycle",
      "Neuronal System",
      "Transmission across Chemical Synapses"
    )
  )
  mpathSim(named_paths, genes.react, NULL)
  many_pathways <- sample(unique(unlist(genes.react)), 152)
  mpathSim(many_pathways, genes.react, "avg")
} else {
  warning("You need reactome.db package for this example")
}
Calculates the similarity between pathways using the Dice similarity score. diceSim is used to calculate similarities between the two pathways.
pathSim(pathway1, pathway2, info)

## S4 method for signature 'character,character,GeneSetCollection'
pathSim(pathway1, pathway2, info)
pathway1 , pathway2
|
A single pathway to calculate the similarity |
info |
A GeneSetCollection or a list of genes and the pathways they are involved in. |
The similarity between those pathways or all the similarities between each comparison.
pathSim(pathway1 = character, pathway2 = character, info = GeneSetCollection): Calculates all the similarities of a GeneSetCollection and combines them using combineScoresPar
Lluís Revilla
conversions() help page to transform Dice score to Jaccard score.
mpathSim() for multiple pairwise comparison of pathways.
if (require("reactome.db")) {
  # Extracts the paths of all genes of org.Hs.eg.db from reactome
  genes.react <- as.list(reactomeEXTID2PATHID)
  (paths <- sample(unique(unlist(genes.react)), 2))
  pathSim(paths[1], paths[2], genes.react)
} else {
  warning("You need reactome.db package for this example")
}
Plots how similar the data are. The position of the nodes is based on the similarity between them.
plot_data(x, top)

plot_similarity(pd)
x |
Matrix with the similarities. |
top |
a number between 0 and 1 to select the edges relating the elements of the matrix. |
pd |
The plot data from |
plot_data returns a list with two elements:
nodes: The position and name of the nodes
edges: The information about the selected edges
plot_similarity returns a ggplot object.
if (require("org.Hs.eg.db") && require("reactome.db")) {
  # Extract the paths of all genes of org.Hs.eg.db from KEGG
  # (last update in data of June 31st 2011)
  genes.kegg <- as.list(org.Hs.egPATH)
  # Extracts the paths of all genes of org.Hs.eg.db from reactome
  genes.react <- as.list(reactomeEXTID2PATHID)
  sim <- mgeneSim(c("81", "18", "10"), genes.react)
  pd <- plot_data(sim, top = 0.25)
  if (requireNamespace("ggplot2", quietly = TRUE)) {
    plot_similarity(pd)
  }
}
Given the indices of the duplicated entries, removes the duplicated columns and rows until just one is left. It keeps the duplicate with the highest absolute mean value.
removeDup(cor_mat, dupli)
cor_mat |
List of matrices |
dupli |
List of indices with duplicated entries |
A matrix with only one of the duplicated columns and rows kept.
Lluís Revilla
duplicateIndices() to obtain the list of indices with duplicated entries.
a <- seq2mat(c("52", "52", "53", "55"), runif(choose(4, 2)))
b <- seq2mat(c("52", "52", "53", "55"), runif(choose(4, 2)))
mat <- list("kegg" = a, "react" = b)
mat
dupli <- duplicateIndices(rownames(a))
remat <- removeDup(mat, dupli)
remat
Fills a matrix of ncol = length(x) and nrow = length(x) with the values in dat and sets the diagonal to 1.
seq2mat(x, dat)
x |
names of columns and rows, used to define the size of the matrix |
dat |
Data to fill the matrix with, except the diagonal. |
dat should be at least choose(length(x), 2) of length. It assumes that the data provided comes from using the row and column id to obtain it.
A square matrix with the diagonal set to 1 and dat on the upper and lower triangle, with the column ids and row ids from x.
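The construction described above can be sketched in base R as follows. The helper fill_symmetric is hypothetical, and the fill order (R's column-major order over the upper triangle) is an assumption of this sketch rather than a documented property of seq2mat.

```r
# Hypothetical helper: symmetric matrix with unit diagonal built from a vector.
fill_symmetric <- function(x, dat) {
  n <- length(x)
  stopifnot(length(dat) >= choose(n, 2))
  m <- matrix(1, nrow = n, ncol = n, dimnames = list(x, x))
  # Assumption: values go into the upper triangle in column-major order
  m[upper.tri(m)] <- dat[seq_len(choose(n, 2))]
  # Mirror the upper triangle into the lower one
  m[lower.tri(m)] <- t(m)[lower.tri(m)]
  m
}

m <- fill_symmetric(c("A", "B", "C"), c(0.1, 0.2, 0.3))
m
```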
Lluís Revilla
seq2mat(LETTERS[1:5], 1:10)
seq2mat(LETTERS[1:5], seq(from = 0.1, to = 1, by = 0.1))
Function to join list of similarities by a function provided by the user.
similarities(sim, func, ...)
sim |
list of similarities to be joined. All similarities must have the same dimensions. The genes are assumed to be in the same order for all the matrices. |
func |
function to perform on those similarities: |
... |
Other arguments passed to the function |
A matrix of the size of the similarities
It doesn't check that the columns and rows of the matrices are in the same order or are the same.
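Joining matrices cell by cell can be illustrated with base R alone. In this sketch the helper combine_cells and the toy matrices are hypothetical; it stacks the matrices into a three-dimensional array and applies the user function to the values found at each position, which is one possible way to obtain this behaviour and not necessarily how similarities does it.

```r
ids <- c("g1", "g2")
a <- matrix(c(1, 0.2, 0.2, 1), nrow = 2, dimnames = list(ids, ids))
b <- matrix(c(1, 0.8, 0.8, 1), nrow = 2, dimnames = list(ids, ids))

# Hypothetical helper: apply `func` to the values of the same cell.
combine_cells <- function(sim, func, ...) {
  stacked <- simplify2array(sim) # rows x columns x number of matrices
  apply(stacked, c(1, 2), func, ...)
}

# Weighted sum of the two matrices with equal weights
res <- combine_cells(list(a, b), function(v, w) sum(v * w), w = c(0.5, 0.5))
res
```

As in the note above, nothing in the sketch verifies that the rows and columns of the matrices refer to the same genes, so the inputs must already be aligned.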
Lluís Revilla
weighted() for functions that can be used, and addSimilarities() for a wrapper to one of them.
set.seed(100)
a <- seq2mat(LETTERS[1:5], rnorm(10))
b <- seq2mat(LETTERS[1:5], seq(from = 0.1, to = 1, by = 0.1))
sim <- list(b, a)
similarities(sim, weighted.prod, c(0.5, 0.5))
# Note the differences in the sign of some values
similarities(sim, weighted.sum, c(0.5, 0.5))
Calculates the weighted sum or product of x. Each value should have its weight, otherwise it will throw an error.
weighted.sum(x, w, abs = TRUE)

weighted.prod(x, w)
x |
an object containing the values whose weighted operations is to be computed |
w |
a numerical vector of weights the same length as x |
abs |
If any |
These functions are meant to be used with similarities(). As some similarities might be positive and others negative, the argument abs is provided for weighted.sum, assuming that only one similarity will be negative (usually the one coming from expression correlation).
weighted.sum returns the sum of the products x*weights, removing all NA values. See the parameter abs if there are any negative values.
weighted.prod returns the product of the products x*weights, removing all NA values.
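The two return values described above can be sketched in base R for non-negative inputs. The helper names are hypothetical, and the special handling of negative values controlled by abs is deliberately left out because it is not spelled out here.

```r
# Hypothetical helpers: sum and product of x * w, ignoring NA values.
weighted_sum_sketch <- function(x, w) {
  stopifnot(length(x) == length(w)) # every value needs its weight
  sum(x * w, na.rm = TRUE)
}
weighted_prod_sketch <- function(x, w) {
  stopifnot(length(x) == length(w))
  prod(x * w, na.rm = TRUE)
}

x <- c(0.3, 0.5, NA, 0.8)
w <- c(0.5, 0.2, 0.1, 0.2)
weighted_sum_sketch(x, w)  # 0.15 + 0.10 + 0.16
weighted_prod_sketch(x, w) # 0.15 * 0.10 * 0.16
```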
Lluís Revilla
weighted.mean(), similarities() and addSimilarities()
expr <- c(-0.2, 0.3, 0.5, 0.8, 0.1)
weighted.sum(expr, c(0.5, 0.2, 0.1, 0.1, 0.1))
weighted.sum(expr, c(0.5, 0.2, 0.1, 0.2, 0.1), FALSE)
weighted.sum(expr, c(0.4, 0.2, 0.1, 0.2, 0.1))
weighted.sum(expr, c(0.4, 0.2, 0.1, 0.2, 0.1), FALSE)
weighted.sum(expr, c(0.4, 0.2, 0, 0.2, 0.1))
weighted.sum(expr, c(0.5, 0.2, 0, 0.2, 0.1))
# Compared to weighted.prod:
weighted.prod(expr, c(0.5, 0.2, 0.1, 0.1, 0.1))
weighted.prod(expr, c(0.4, 0.2, 0.1, 0.2, 0.1))
weighted.prod(expr, c(0.4, 0.2, 0, 0.2, 0.1))
weighted.prod(expr, c(0.5, 0.2, 0, 0.2, 0.1))