Title: | Extending guilt by association by degree |
---|---|
Description: | The package implements a series of highly efficient tools to calculate functional properties of networks based on guilt by association methods. |
Authors: | Sara Ballouz [aut, cre], Melanie Weber [aut, ctb], Paul Pavlidis [aut], Jesse Gillis [aut, ctb] |
Maintainer: | Sara Ballouz <[email protected]> |
License: | GPL-2 |
Version: | 1.35.0 |
Built: | 2024-11-29 05:25:00 UTC |
Source: | https://github.com/bioc/EGAD |
The function calculates the assortativity of a network, that measures the preference of interactions between similar nodes. As in most literature, 'similarity' is here defined in terms of node degrees.
assortativity(network)
assortativity(network)
network |
matrix indicating network structure (symmetric) |
Numeric value
network <- matrix( sample(c(0,1),36, replace=TRUE), nrow=6,byrow=TRUE) assort_value <- assortativity(network)
network <- matrix( sample(c(0,1),36, replace=TRUE), nrow=6,byrow=TRUE) assort_value <- assortativity(network)
A dataset containing identifiers for gene transcripts
A data frame with 60483 rows and 10 variables:
chromosome
chromosomal start position, in base pairs
chromosomal end position, in base pairs
chromosomal strand, + or -
unknown
ENSEMBL identifier
type of transcript
status of transcript
HUGO identifier
Entrez identifier
@source ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_22/
A dataset containing identifiers for gene transcripts
A data frame with 46517 rows and 10 variables:
chromosome
chromosomal start position, in base pairs
chromosomal end position, in base pairs
chromosomal strand, + or -
unknown
ENSEMBL identifier
type of transcript
status of transcript
HUGO identifier
Entrez identifier
@source ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M7/
The function calculates the AUC for a functional group analytically using an optimal ranked list of genes that indicates association between genes and groups.
auc_multifunc(annotations, optimallist)
auc_multifunc(annotations, optimallist)
annotations |
binary matrix indicating which list elements are in which functional groups. |
optimallist |
Ranked list (multifunctionality analysis, see |
aucs array of aucs for each group in annotations
annotations <- c(rep(0,10)) annotations[c(1,3,5)] <- 1 optimallist <- 10:1 aurocs_mf <- auc_multifunc(annotations, optimallist)
annotations <- c(rep(0,10)) annotations[c(1,3,5)] <- 1 optimallist <- 10:1 aurocs_mf <- auc_multifunc(annotations, optimallist)
The function calculates the area under the precision-recall curve
auprc(scores, labels)
auprc(scores, labels)
scores |
numeric array |
labels |
binary array |
auprc Numeric value
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 auprc <- auprc(scores, labels)
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 auprc <- auprc(scores, labels)
The function calculates the area under the receiver operating characteristic (ROC) curve analytically
auroc_analytic(ranks, labels)
auroc_analytic(ranks, labels)
ranks |
numeric array |
labels |
binary array |
auroc Numeric value
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 auroc <- auroc_analytic(scores, labels)
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 auroc <- auroc_analytic(scores, labels)
A data frame containing protein-protein interactions
A data frame with 211506 rows and 2 variables:
List of Entrez identifiers, interactor A
List of Entrez identifiers, interactor B
@source http://thebiogrid.org/
The function creates a gene-by-gene matrix with binary entries indicating interaction (1) or no interaction (0) between the genes.
build_binary_network(data, list)
build_binary_network(data, list)
data |
2-column matrix, each row a pair indicating a relationship or interaction |
list |
string array of genes/labels/ids |
net matrix binary characterizing interactions
data <- cbind(edgeA=c('gene1','gene2'),edgeB=c('gene3','gene3')) list <- c('gene1','gene2','gene3') network <- build_binary_network(data,list)
data <- cbind(edgeA=c('gene1','gene2'),edgeB=c('gene3','gene3')) list <- c('gene1','gene2','gene3') network <- build_binary_network(data,list)
The function generates a dense coexpression network from expression data stored in
the expressionSet data type. Correlation coefficicents are used as to weight the edges
of the nodes (genes). Calls build_coexp_network
.
build_coexp_expressionSet( exprsSet, gene.list, method = "spearman", flag = "rank" )
build_coexp_expressionSet( exprsSet, gene.list, method = "spearman", flag = "rank" )
exprsSet |
data class ExpressionSet |
gene.list |
array of gene labels |
method |
correlation method to use, default Spearman's rho |
flag |
string to indicate if the network should be ranked |
net Matrix symmetric
exprs <- matrix( rnorm(1000), ncol=10,byrow=TRUE) gene.list <- paste('gene',1:100, sep='') sample.list <- paste('sample',1:10, sep='') rownames(exprs) <- gene.list colnames(exprs) <- sample.list network <- build_coexp_expressionSet(exprs, gene.list, method='pearson')
exprs <- matrix( rnorm(1000), ncol=10,byrow=TRUE) gene.list <- paste('gene',1:100, sep='') sample.list <- paste('sample',1:10, sep='') rownames(exprs) <- gene.list colnames(exprs) <- sample.list network <- build_coexp_expressionSet(exprs, gene.list, method='pearson')
The function generates a dense coexpression network from expression data stored in
GEO. The expression data is downloaded from GEO. Correlation coefficicents are used
as to weight the edges of the nodes (genes). Calls get_expression_matrix_from_GEO
and build_coexp_network
.
build_coexp_GEOID(gseid, gene.list, method = "spearman", flag = "rank")
build_coexp_GEOID(gseid, gene.list, method = "spearman", flag = "rank")
gseid |
string GEO ID of expression experiment |
gene.list |
array of gene labels |
method |
correlation method to use, default Spearman's rho |
flag |
string to indicate if the network should be ranked |
net Matrix symmetric
The function generates a dense coexpression network from expression data stored as a
matrix, with the genes as row labels, and samples as column labels.
Correlation coefficicents are used as to weight the edges of the nodes (genes).
Calls cor
.
build_coexp_network(exprs, gene.list, method = "spearman", flag = "rank")
build_coexp_network(exprs, gene.list, method = "spearman", flag = "rank")
exprs |
matrix of expression data |
gene.list |
array of gene labels |
method |
correlation method to use, default Spearman's rho |
flag |
string to indicate if the network should be ranked |
net Matrix symmetric
exprs <- matrix( rnorm(1000), ncol=10,byrow=TRUE) gene.list <- paste('gene',1:100, sep='') sample.list <- paste('sample',1:10, sep='') rownames(exprs) <- gene.list colnames(exprs) <- sample.list network <- build_coexp_network(exprs, gene.list)
exprs <- matrix( rnorm(1000), ncol=10,byrow=TRUE) gene.list <- paste('gene',1:100, sep='') sample.list <- paste('sample',1:10, sep='') rownames(exprs) <- gene.list colnames(exprs) <- sample.list network <- build_coexp_network(exprs, gene.list)
The function builds a semantic similarity network given a data and labels
build_semantic_similarity_network(genes.labels, genes)
build_semantic_similarity_network(genes.labels, genes)
genes.labels |
matrix with rows as genes and columns as a function/label |
genes |
array of gene IDs |
net Numeric value
genes.labels <- matrix( sample(c(0,1), 100, replace=TRUE),ncol=10,nrow=10) rownames(genes.labels) <- 1:10 genes <- 1:10 net <- build_semantic_similarity_network(genes.labels, genes)
genes.labels <- matrix( sample(c(0,1), 100, replace=TRUE),ncol=10,nrow=10) rownames(genes.labels) <- 1:10 genes <- 1:10 net <- build_semantic_similarity_network(genes.labels, genes)
The function creates a gene-by-gene matrix with binary entries indicating interaction (1) or no interaction (0) between the genes.
build_weighted_network(data, list)
build_weighted_network(data, list)
data |
3-column matrix, each row a pair indicating a relationship or interaction, and the last column the weight |
list |
string array of genes/labels/ids |
net matrix characterizing interactions
data <- cbind(edgeA=c('gene1','gene2'),edgeB=c('gene3','gene3'), weight=c(0.5, 0.9)) list <- c('gene1','gene2','gene3') network <- build_weighted_network(data,list)
data <- cbind(edgeA=c('gene1','gene2'),edgeB=c('gene3','gene3'), weight=c(0.5, 0.9)) list <- c('gene1','gene2','gene3') network <- build_weighted_network(data,list)
The function performs multifunctionality analysis ([1]) for a set of annotated genes and creates a rank based optimallist. For annotations use an ontology that is large enough to serve as a prior (e.g. GO, Phenocarta).
calculate_multifunc(genes.labels)
calculate_multifunc(genes.labels)
genes.labels |
Annotation matrix |
gene.mfs Returns matrix with evaluation of gene function prediction by given labels:
genes.labels <- matrix( sample(c(0,1), 100, replace=TRUE),ncol=10,nrow=10) rownames(genes.labels) = paste('gene', 1:10, sep='') colnames(genes.labels) = paste('label', 1:10, sep='') mf <- calculate_multifunc(genes.labels)
genes.labels <- matrix( sample(c(0,1), 100, replace=TRUE),ncol=10,nrow=10) rownames(genes.labels) = paste('gene', 1:10, sep='') colnames(genes.labels) = paste('label', 1:10, sep='') mf <- calculate_multifunc(genes.labels)
The function plots a smoothed curve using the convolve
function.
conv_smoother(X, Y, window, raw = FALSE, output = FALSE, ...)
conv_smoother(X, Y, window, raw = FALSE, output = FALSE, ...)
X |
numeric array |
Y |
numeric array |
window |
numeric value indicating size of window to use |
raw |
boolean |
output |
boolean |
... |
other input into the plot function |
smoothed X,Y and std Y matrix
x <- 1:1000 y <- rnorm(1000) conv <- conv_smoother(x,y,10)
x <- 1:1000 y <- rnorm(1000) conv <- conv_smoother(x,y,10)
This dataset includes
chromosomal start position, in base pairs
HUGO gene identifier
species
disease
The function extends a binary network by using the inverse of the path length between nodes as a weighted edge
extend_network(net, max = 6)
extend_network(net, max = 6)
net |
matrix binary and symmetric |
max |
numeric maximum number of jumps |
ext_net matrix dense and symmetric
net <- matrix( sample(c(0,1),36, replace=TRUE), nrow=6,byrow=TRUE) ext_net <- extend_network(net)
net <- matrix( sample(c(0,1),36, replace=TRUE), nrow=6,byrow=TRUE) ext_net <- extend_network(net)
The function filters out the rows or columns of a matrix such that the size of the group is exclusively between given min and max values
filter_network(network, flag = 1, min = 0, max = 1, ids = NA)
filter_network(network, flag = 1, min = 0, max = 1, ids = NA)
network |
numeric matrix |
flag |
numeric 1 for row filtering, 2 for column filtering |
min |
numeric value |
max |
numeric value |
ids |
array to filter on |
network numeric matrix
net <- matrix( rnorm(10000), nrow=100) filt_net <- filter_network(net,1,10,100)
net <- matrix( rnorm(10000), nrow=100) filt_net <- filter_network(net,1,10,100)
The function filters out the columns of a matrix such that the size of the group is exclusively between given min and max values
filter_network_cols(network, min = 0, max = 1, ids = NA)
filter_network_cols(network, min = 0, max = 1, ids = NA)
network |
numeric matrix |
min |
numeric value |
max |
numeric value |
ids |
array |
network numeric matrix
genes.labels <- matrix( sample( c(0,1), 10000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:100, sep='') genes.labels <- filter_network_cols(genes.labels,50,200) genes.labels <- matrix( sample( c(0,1), 10000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:100, sep='') genes.labels <- filter_network_cols(genes.labels,ids = paste('function', 1:20, sep=''))
genes.labels <- matrix( sample( c(0,1), 10000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:100, sep='') genes.labels <- filter_network_cols(genes.labels,50,200) genes.labels <- matrix( sample( c(0,1), 10000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:100, sep='') genes.labels <- filter_network_cols(genes.labels,ids = paste('function', 1:20, sep=''))
The function filters out the rows of a matrix such that the size of the group is exclusively between given min and max values
filter_network_rows(network, min = 0, max = 1, ids = NA)
filter_network_rows(network, min = 0, max = 1, ids = NA)
network |
numeric matrix |
min |
numeric value |
max |
numeric value |
ids |
array to filter on |
network numeric matrix
genes.labels <- matrix( sample( c(0,1), 10000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:100, sep='') genes.labels <- filter_network_rows(genes.labels,50,200) genes.labels <- matrix( sample( c(0,1), 10000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:100, sep='') genes.labels <- filter_network_rows(genes.labels,ids = paste('gene', 1:20, sep=''))
genes.labels <- matrix( sample( c(0,1), 10000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:100, sep='') genes.labels <- filter_network_rows(genes.labels,50,200) genes.labels <- matrix( sample( c(0,1), 10000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:100, sep='') genes.labels <- filter_network_rows(genes.labels,ids = paste('gene', 1:20, sep=''))
The function filters away the labels for the genes that are not in the orthologs list
filter_orthologs(annotations, genelist, orthologs)
filter_orthologs(annotations, genelist, orthologs)
annotations |
binary matrix |
genelist |
array of gene ids |
orthologs |
array to filter on |
annotations_filtered binary matrix
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') gene.list <- paste('gene', 1:100, sep='') orthologs <- paste('gene', (1:50)*2, sep='') genes.labels.filt <- filter_orthologs(genes.labels, gene.list, orthologs)
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') gene.list <- paste('gene', 1:100, sep='') orthologs <- paste('gene', (1:50)*2, sep='') genes.labels.filt <- filter_orthologs(genes.labels, gene.list, orthologs)
The function calculates fmeasure for a given beta of a precision-recall curve
fmeasure(recall, precis, beta = 1)
fmeasure(recall, precis, beta = 1)
recall |
numeric array |
precis |
numeric array |
beta |
numeric value, default is 1 |
fmeasure Numeric value
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 prc <- get_prc(scores, labels) fm <- fmeasure(prc[,1], prc[,2])
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 prc <- get_prc(scores, labels) fm <- fmeasure(prc[,1], prc[,2])
An array containing identifiers for genes in biogrid
Array
List of Entrez identifiers
@source http://thebiogrid.org/
The function calculates the area under the curve defined by x and y
get_auc(x, y)
get_auc(x, y)
x |
numeric array |
y |
numeric array |
auc numeric value
x <- 1:100 y <- 1:100 auc <- get_auc(x,y)
x <- 1:100 y <- 1:100 auc <- get_auc(x,y)
The function downloads the specifed version of biogrid for a particular taxon
get_biogrid(species = "9606", version = "3.5.181", interactions = "physical")
get_biogrid(species = "9606", version = "3.5.181", interactions = "physical")
species |
numeric taxon of species |
version |
string of biogrid version |
interactions |
string stating either physical or genetic interactions |
biogrid data.frame with interactions
The function formats the count distribution from the histogram function
get_counts(hist)
get_counts(hist)
hist |
histogram |
x,y
x <- runif(1000) counts <- get_counts( hist(x, plot=FALSE))
x <- runif(1000) counts <- get_counts( hist(x, plot=FALSE))
The function formats the density distribution from the histogram function
get_density(hist)
get_density(hist)
hist |
histogram |
array
x <- runif(1000) density <- get_density( hist(x, plot=FALSE))
x <- runif(1000) density <- get_density( hist(x, plot=FALSE))
The function downloads and parses the expression matrix from the GEMMA database, specified by the GEO ID
get_expression_data_gemma(gseid, filtered = "true")
get_expression_data_gemma(gseid, filtered = "true")
gseid |
GEO ID of the expression experiment |
filtered |
flag to indicate whether or not the data is QC |
list of genes and the expression matrix
The function downloads and parses the expression matrix from the GEO file specified by the GEO ID
get_expression_matrix_from_GEO(gseid)
get_expression_matrix_from_GEO(gseid)
gseid |
GEO ID of the expression experiment |
list of genes and the expression matrix
The function downloads the latest version of phenocarta
get_phenocarta(species = "human", type = "all")
get_phenocarta(species = "human", type = "all")
species |
string |
type |
string |
data data.frame with phenocarta data
The function calculates the recall and precision
get_prc(ranks, labels)
get_prc(ranks, labels)
ranks |
numeric array |
labels |
binary array |
recall,precision numeric arrays
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 ranks <- rank(scores) prc <- get_prc(ranks, labels)
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 ranks <- rank(scores) prc <- get_prc(ranks, labels)
The function calculates the FPR and TRPR for the receiver operating characteristic (ROC)
get_roc(ranks, labels)
get_roc(ranks, labels)
ranks |
numeric array |
labels |
binary array |
FPR,TPR numeric arrays
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 ranks <- rank(scores) roc <- get_roc(ranks, labels)
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 ranks <- rank(scores) roc <- get_roc(ranks, labels)
A dataset of the gene GO associations
A data frame with 2511938 rows and 4 variables:
gene symbol
entrez identifier
gene ontology term ID
evidence code
@source http://geneontology.org/
A dataset of the gene GO associations
A data frame with 2086086 rows and 4 variables:
gene symbol
entrez identifier
gene ontology term ID
evidence code
@source http://geneontology.org/
A dataset of the gene ontology vocabulary
A data frame with 42266 rows and 3 variables:
GO identifier
GO description
GO domain
@source http://geneontology.org/
The function annotates a list of genes according to a given ontology. It creates a binary matrix associating genes (rows) with labels (columns).
make_annotations(data, listA, listB)
make_annotations(data, listA, listB)
data |
2-column matrix, each row a pair indicating a relationship or interaction |
listA |
string array of genes |
listB |
string array of labels/functions |
net matrix binary
gene.list <- paste('gene', 1:100, sep='') labels.list <- paste('labels', 1:10, sep='') data <- matrix(0,nrow=100, ncol=2) data[,1] <- sample(gene.list, 100, replace=TRUE) data[,2] <- sample(labels.list, 100, replace=TRUE) net <- make_annotations(data, gene.list, labels.list)
gene.list <- paste('gene', 1:100, sep='') labels.list <- paste('labels', 1:10, sep='') data <- matrix(0,nrow=100, ncol=2) data[,1] <- sample(gene.list, 100, replace=TRUE) data[,2] <- sample(labels.list, 100, replace=TRUE) net <- make_annotations(data, gene.list, labels.list)
The function creates a gene-by-gene matrix with binary entries indicating interaction (1) or no interaction (0) between the genes.
make_gene_network(data, list)
make_gene_network(data, list)
data |
2-column matrix, each row a pair indicating a relationship or interaction |
list |
string array of genes |
net matrix binary characterizing interactions
gene.list <- paste('gene', 1:100, sep='') data <- matrix(0,nrow=100, ncol=2) data[,1] <- sample(gene.list, 100) data[,2] <- sample(gene.list, 100) net <- make_gene_network(data, gene.list)
gene.list <- paste('gene', 1:100, sep='') data <- matrix(0,nrow=100, ncol=2) data[,1] <- sample(gene.list, 100) data[,2] <- sample(gene.list, 100) net <- make_gene_network(data, gene.list)
The function extracts the list of all genes in the data set
make_genelist(gene_data_interacting)
make_genelist(gene_data_interacting)
gene_data_interacting |
2-column matrix, each row a pair indicating a relationship or interaction |
list array of data labels
gene.list <- paste('gene', 1:100, sep='') data <- matrix(0,nrow=100, ncol=2) data[,1] <- sample(gene.list, 50, replace=TRUE) data[,2] <- sample(gene.list, 50, replace=TRUE) genes <- make_genelist(data)
gene.list <- paste('gene', 1:100, sep='') data <- matrix(0,nrow=100, ncol=2) data[,1] <- sample(gene.list, 50, replace=TRUE) data[,2] <- sample(gene.list, 50, replace=TRUE) genes <- make_genelist(data)
Make a color transparent (Taken from an answer on StackOverflow by Nick Sabbe)
make_transparent(color, alpha = 100)
make_transparent(color, alpha = 100)
color |
color number, string or hexidecimal code |
alpha |
numeric transparency |
someColor rgb
The function performs gene function prediction based on 'guilt by association' using cross validation ([1]). Performance and significance are evaluated by calculating the AUROC or AUPRC of each functional group.
neighbor_voting( genes.labels, network, nFold = 3, output = "AUROC", FLAG_DRAW = FALSE )
neighbor_voting( genes.labels, network, nFold = 3, output = "AUROC", FLAG_DRAW = FALSE )
genes.labels |
numeric array |
network |
numeric array symmetric, gene-by-gene matrix |
nFold |
numeric value, default is 3 |
output |
string, default is AUROC |
FLAG_DRAW |
binary flag to draw roc plot |
scores numeric matrix with a row for each gene label and columns auc: the average area under the ROC or PR curve for the neighbor voting predictor across cross validation replicates avg_node_degree: the average node degree degree_null_auc: the area the ROC or PR curve for the node degree predictor
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') net <- cor( matrix( rnorm(10000), ncol=100), method='spearman') rownames(net) <- paste('gene', 1:100, sep='') colnames(net) <- paste('gene', 1:100, sep='') aurocs <- neighbor_voting(genes.labels, net, output = 'AUROC') avgprcs <- neighbor_voting(genes.labels, net, output = 'PR')
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') net <- cor( matrix( rnorm(10000), ncol=100), method='spearman') rownames(net) <- paste('gene', 1:100, sep='') colnames(net) <- paste('gene', 1:100, sep='') aurocs <- neighbor_voting(genes.labels, net, output = 'AUROC') avgprcs <- neighbor_voting(genes.labels, net, output = 'PR')
The function calculates the node degree of a network
node_degree(net)
node_degree(net)
net |
numeric matrix |
node_degree numeric array
net <- cor( matrix(rnorm(1000), ncol=10)) n <- 10 net <- matrix(rank(net, na.last = 'keep', ties.method = 'average'), nrow = n, ncol = n) net <- net/max(net, na.rm=TRUE) nd <- node_degree(net)
net <- cor( matrix(rnorm(1000), ncol=10)) n <- 10 net <- matrix(rank(net, na.last = 'keep', ties.method = 'average'), nrow = n, ncol = n) net <- net/max(net, na.rm=TRUE) nd <- node_degree(net)
A list containing identifiers for the subsets of gene orthologs
List orthologs for 5 species
List of Entrez identifiers, Drosophila
List of Entrez identifiers, C. elegans
List of Entrez identifiers, Yeast
List of Entrez identifiers, Mouse
List of Entrez identifiers, Zebrafish
@source http://useast.ensembl.org/index.html/
A dataset of gene disease associations
A data frame with 142272 rows and 4 variables:
chromosomal start position, in base pairs
HUGO gene identifier
species
disease
@source http://www.chibi.ubc.ca/Gemma/phenotypes.html
The function plots multiple density curves and compares their modes
plot_densities( hists, id, col = c("lightgrey"), xlab = "", ylab = "Density", mode = "hist" )
plot_densities( hists, id, col = c("lightgrey"), xlab = "", ylab = "Density", mode = "hist" )
hists |
list of histogram objects or density objects |
id |
string |
col |
color for shading |
xlab |
string x-axis label |
ylab |
string y-axis label |
mode |
flag indicating histogram or density |
null
aurocsA <- density((runif(1000)+runif(1000)+runif(1000)+runif(1000))/4) aurocsB <- density((runif(1000)+runif(1000)+runif(1000))/3) aurocsC <- density(runif(1000)) hists <- list(aurocsA, aurocsB, aurocsC) temp <- plot_densities(hists,'', mode='density')
aurocsA <- density((runif(1000)+runif(1000)+runif(1000)+runif(1000))/4) aurocsB <- density((runif(1000)+runif(1000)+runif(1000))/3) aurocsC <- density(runif(1000)) hists <- list(aurocsA, aurocsB, aurocsC) temp <- plot_densities(hists,'', mode='density')
The function plots two density curves and compares their modes
plot_density_compare( aucA, aucB, col = "lightgrey", xlab = "AUROC (neighbor voting)", ylab = "Density", mode = TRUE )
plot_density_compare( aucA, aucB, col = "lightgrey", xlab = "AUROC (neighbor voting)", ylab = "Density", mode = TRUE )
aucA |
numeric array of aurocs |
aucB |
numeric array of aurocs |
col |
color of lines |
xlab |
string label |
ylab |
string label |
mode |
boolean to plot mode or mean |
null
aurocsA <- (runif(1000)+runif(1000)+runif(1000)+runif(1000))/4 aurocsB <- runif(1000) plot_density_compare(aurocsA, aurocsB)
aurocsA <- (runif(1000)+runif(1000)+runif(1000)+runif(1000))/4 aurocsB <- runif(1000) plot_density_compare(aurocsA, aurocsB)
The function plots a the distribution of AUROCs
plot_distribution( auc, b = 20, col = "lightgrey", xlab = "", ylab = "Density", xlim = c(0.4, 1), ylim = c(0, 5), med = TRUE, avg = TRUE, density = TRUE, bars = FALSE )
plot_distribution( auc, b = 20, col = "lightgrey", xlab = "", ylab = "Density", xlim = c(0.4, 1), ylim = c(0, 5), med = TRUE, avg = TRUE, density = TRUE, bars = FALSE )
auc |
numeric aucs |
b |
array of breaks |
col |
color of line |
xlab |
string label |
ylab |
string label |
xlim |
range of values for xaxis |
ylim |
range of values for yaxis |
med |
boolean to plot median auc |
avg |
boolean to plot average auc |
density |
boolean |
bars |
boolena for barplot |
auc list and quartiles
aurocs <- (runif(1000)+runif(1000)+runif(1000)+runif(1000))/4 d <- plot_distribution(aurocs)
aurocs <- (runif(1000)+runif(1000)+runif(1000)+runif(1000))/4 d <- plot_distribution(aurocs)
The function draws a heatmap to visualize a network
plot_network_heatmap(net, colrs)
plot_network_heatmap(net, colrs)
net |
a numeric matrix of edge weights |
colrs |
a range of colors to plot the network |
null
network <- cor(matrix( rnorm(10000), nrow=100)) plot_network_heatmap(network)
network <- cor(matrix( rnorm(10000), nrow=100)) plot_network_heatmap(network)
The function calculates the precision and recall and plots the curve
plot_prc(scores, labels)
plot_prc(scores, labels)
scores |
numeric array |
labels |
binary array |
prc numeric arrays
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 roc <- plot_prc(scores, labels)
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 roc <- plot_prc(scores, labels)
The function calculates the FPR and TRPR for the receiver operating characteristic (ROC) and plots the curve
plot_roc(scores, labels)
plot_roc(scores, labels)
scores |
numeric array |
labels |
binary array |
FPR,TPR numeric arrays
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 roc <- plot_roc(scores, labels)
labels <- c(rep(0,10)) labels[c(1,3,5)] <- 1 scores <- 10:1 roc <- plot_roc(scores, labels)
The function plots a density overlay of ROCs given the scores and labels
plot_roc_overlay(scores.mat, labels.mat, nbins = 100)
plot_roc_overlay(scores.mat, labels.mat, nbins = 100)
scores.mat |
numeric array |
labels.mat |
numeric array |
nbins |
numeric value |
list of Z(matrix) and roc_sum (average ROC curve)
genes.labels <- matrix( c(rep(1, 1000), rep(0,9000)), nrow=1000, byrow=TRUE) rownames(genes.labels) = paste('gene', 1:1000, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') scores <- matrix( rnorm(10000), nrow=1000) scores <- apply(scores, 2, rank) rownames(scores) = paste('gene', 1:1000, sep='') colnames(scores) = paste('function', 1:10, sep='') z <- plot_roc_overlay(scores, genes.labels)
genes.labels <- matrix( c(rep(1, 1000), rep(0,9000)), nrow=1000, byrow=TRUE) rownames(genes.labels) = paste('gene', 1:1000, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') scores <- matrix( rnorm(10000), nrow=1000) scores <- apply(scores, 2, rank) rownames(scores) = paste('gene', 1:1000, sep='') colnames(scores) = paste('function', 1:10, sep='') z <- plot_roc_overlay(scores, genes.labels)
The function plots a scatter
plot_value_compare( aucA, aucB, xlab = "AUROC", ylab = "AUROC", xlim = c(0, 1), ylim = c(0, 1) )
plot_value_compare( aucA, aucB, xlab = "AUROC", ylab = "AUROC", xlim = c(0, 1), ylim = c(0, 1) )
aucA |
numeric array of aucs |
aucB |
numeric array of aucs |
xlab |
string label |
ylab |
string label |
xlim |
range of values for xaxis |
ylim |
range of values for yaxis |
null
The function performs gene function prediction on the whole data set using the 'guilt by association'-principle ([1]).
predictions(genes.labels, network)
predictions(genes.labels, network)
genes.labels |
numeric array |
network |
numeric array symmetric, gene-by-gene matrix |
scores numeric matrix
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') net <- cor( matrix( rnorm(10000), ncol=100), method='spearman') rownames(net) <- paste('gene', 1:100, sep='') colnames(net) <- paste('gene', 1:100, sep='') preds <- predictions(genes.labels, net)
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') net <- cor( matrix( rnorm(10000), ncol=100), method='spearman') rownames(net) <- paste('gene', 1:100, sep='') colnames(net) <- paste('gene', 1:100, sep='') preds <- predictions(genes.labels, net)
The function generates a matrix by binding the columns and rows
repmat(X, m, n)
repmat(X, m, n)
X |
numeric matrix |
m |
numeric value, repeat rows m times |
n |
numeric value, repeat columns n times |
list of genes and the expression matrix
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) expand <- repmat( genes.labels, 1,2)
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) expand <- repmat( genes.labels, 1,2)
The function runs and evaluates gene function prediction based on the
'guilt by association'-principle using neighbor voting (neighbor_voting
)
[1]. As a measure of performance and significance of results, AUCs of all
evaluated functional groups are calculated.
run_GBA(network, labels, min = 20, max = 1000, nfold = 3)
run_GBA(network, labels, min = 20, max = 1000, nfold = 3)
network |
numeric array symmetric, gene-by-gene matrix |
labels |
numeric array |
min |
numeric value to limit gene function size |
max |
numeric value to limit gene function size |
nfold |
numeric value, default is 3 |
list roc.sub, genes, auroc
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') net <- cor( matrix( rnorm(10000), ncol=100), method='spearman') rownames(net) <- paste('gene', 1:100, sep='') colnames(net) <- paste('gene', 1:100, sep='') gba <- run_GBA(net, genes.labels, min=10)
genes.labels <- matrix( sample( c(0,1), 1000, replace=TRUE), nrow=100) rownames(genes.labels) = paste('gene', 1:100, sep='') colnames(genes.labels) = paste('function', 1:10, sep='') net <- cor( matrix( rnorm(10000), ncol=100), method='spearman') rownames(net) <- paste('gene', 1:100, sep='') colnames(net) <- paste('gene', 1:100, sep='') gba <- run_GBA(net, genes.labels, min=10)