Title: | Biological Network Reconstruction Omnibus |
---|---|
Description: | BioNERO aims to integrate all aspects of biological network inference in a single package, including data preprocessing, exploratory analyses, network inference, and analyses for biological interpretations. BioNERO can be used to infer gene coexpression networks (GCNs) and gene regulatory networks (GRNs) from gene expression data. Additionally, it can be used to explore topological properties of protein-protein interaction (PPI) networks. GCN inference relies on the popular WGCNA algorithm. GRN inference is based on the "wisdom of the crowds" principle, which consists in inferring GRNs with multiple algorithms (here, CLR, GENIE3 and ARACNE) and calculating the average rank for each interaction pair. As all steps of network analyses are included in this package, BioNERO makes users avoid having to learn the syntaxes of several packages and how to communicate between them. Finally, users can also identify consensus modules across independent expression sets and calculate intra and interspecies module preservation statistics between different networks. |
Authors: | Fabricio Almeida-Silva [cre, aut] , Thiago Venancio [aut] |
Maintainer: | Fabricio Almeida-Silva <[email protected]> |
License: | GPL-3 |
Version: | 1.15.0 |
Built: | 2024-10-30 04:27:58 UTC |
Source: | https://github.com/bioc/BioNERO |
Check scale-free topology fit for a given network
check_SFT(edgelist, net_type = "gcn")
check_SFT(edgelist, net_type = "gcn")
edgelist |
Edge list as a data frame containing node 1, node 2 and edge weight. |
net_type |
Type of biological network. One of "gcn", "grn", or "ppi". Default: gcn. |
A list with SFT fit statistics and a message indicating if the network is scale-free.
set.seed(1) exp <- t(matrix(rnorm(10000), ncol=1000, nrow=200)) rownames(exp) <- paste0("Gene", 1:nrow(exp)) colnames(exp) <- paste0("Sample", 1:ncol(exp)) cormat <- cor(t(exp)) edges <- cormat_to_edgelist(cormat) edges <- edges[abs(edges$Weight) > 0.10, ] check_SFT(edges)
set.seed(1) exp <- t(matrix(rnorm(10000), ncol=1000, nrow=200)) rownames(exp) <- paste0("Gene", 1:nrow(exp)) colnames(exp) <- paste0("Sample", 1:ncol(exp)) cormat <- cor(t(exp)) edges <- cormat_to_edgelist(cormat) edges <- edges[abs(edges$Weight) > 0.10, ] check_SFT(edges)
Identify consensus modules across independent data sets
consensus_modules( exp_list, metadata, power, cor_method = "spearman", net_type = "signed hybrid", module_merging_threshold = 0.8, TOM_type = NULL, verbose = FALSE )
consensus_modules( exp_list, metadata, power, cor_method = "spearman", net_type = "signed hybrid", module_merging_threshold = 0.8, TOM_type = NULL, verbose = FALSE )
exp_list |
A list containing the expression data frames with genes in
row names and samples in column names or 'SummarizedExperiment' objects.
The list can be created by using |
metadata |
A data frame containing sample names in row names and sample annotation in the first column. Ignored if 'exp_list' is a list of 'SummarizedExperiment' objects, since the function will extract colData. |
power |
Numeric vector of beta power for each expression set
as calculated by |
cor_method |
Correlation method used for network reconstruction. One of "spearman" (default), "biweight", or "pearson". |
net_type |
Network type. One of "signed hybrid" (default), "signed" or "unsigned". |
module_merging_threshold |
Correlation threshold to merge similar modules into a single one. Default: 0.8. |
TOM_type |
Character indicating the type of Topological Overlap Matrix to (TOM) create. One of 'unsigned', 'signed', 'signed Nowick', 'unsigned 2', 'signed 2', and 'signed Nowick 2'. By default, TOM type is automatically selected based on network type. |
verbose |
Logical indicating whether to display progress messages or not. Default: FALSE. |
A list containing 4 elements:
Consensus module eigengenes
Description of the multi-set object returned by the function WGCNA::checkSets
Metadata for each expression set
Data frame of genes and consensus modules
Objects to be used in dendrogram plotting
set.seed(12) data(zma.se) filt.zma <- filter_by_variance(zma.se, n=500) zma.set1 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] zma.set2 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] list.sets <- list(zma.set1, zma.set2) # SFT power previously identified with consensus_SFT_fit() cons_mod <- consensus_modules(list.sets, power = c(11, 13), cor_method = "pearson")
set.seed(12) data(zma.se) filt.zma <- filter_by_variance(zma.se, n=500) zma.set1 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] zma.set2 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] list.sets <- list(zma.set1, zma.set2) # SFT power previously identified with consensus_SFT_fit() cons_mod <- consensus_modules(list.sets, power = c(11, 13), cor_method = "pearson")
Pick power to fit networks to scale-free topology
consensus_SFT_fit( exp_list, setLabels = NULL, metadata = NULL, cor_method = "spearman", net_type = "signed hybrid", rsquared = 0.8 )
consensus_SFT_fit( exp_list, setLabels = NULL, metadata = NULL, cor_method = "spearman", net_type = "signed hybrid", rsquared = 0.8 )
exp_list |
A list of expression data frames or
SummarizedExperiment objects.
If input is a list of data frames, row names must correspond to gene IDs
and column names to samples.
The list can be created with |
setLabels |
Character vector containing labels for each expression set. |
metadata |
A data frame containing sample names in row names and sample annotation in the first column. Ignored if 'exp_list' is a list of 'SummarizedExperiment' objects, since the function will extract colData. |
cor_method |
Correlation method used for network reconstruction. One of "spearman" (default), "biweight", or "pearson". |
net_type |
Network type. One of "signed hybrid" (default), "signed" or "unsigned". |
rsquared |
Minimum R squared to consider the network similar to a scale-free topology. Default is 0.8. |
A list of 2 elements:
Numeric vector of optimal beta powers to fit networks to SFT
A ggplot object displaying main statistics of the SFT fit test
set.seed(12) data(zma.se) filt.zma <- filter_by_variance(zma.se, n=500) zma.set1 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] zma.set2 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] list.sets <- list(zma.set1, zma.set2) cons_sft <- consensus_SFT_fit(list.sets, setLabels = c("Maize1", "Maize2"), cor_method = "pearson")
set.seed(12) data(zma.se) filt.zma <- filter_by_variance(zma.se, n=500) zma.set1 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] zma.set2 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] list.sets <- list(zma.set1, zma.set2) cons_sft <- consensus_SFT_fit(list.sets, setLabels = c("Maize1", "Maize2"), cor_method = "pearson")
Correlate set-specific modules and consensus modules to sample information
consensus_trait_cor(consensus, cor_method = "pearson", metadata_cols = NULL)
consensus_trait_cor(consensus, cor_method = "pearson", metadata_cols = NULL)
consensus |
Consensus network returned by |
cor_method |
Correlation method to be used. One of 'spearman' or 'pearson'. Default: 'pearson'. |
metadata_cols |
A vector (either numeric or character) indicating which columns should be extracted from column metadata if exp is a 'SummarizedExperiment' object. The vector can contain column indices (numeric) or column names (character). By default, all columns are used. |
Data frame of consensus module-trait correlations and p-values, with the following variables:
Factor, trait name. Each trait corresponds to a variable of the sample metadata (if numeric) or levels of a variable (if categorical).
Factor, module eigengene.
Numeric, correlation.
Numeric, correlation P-values.
Character, name of the metadata variable.
set.seed(12) data(zma.se) filt.zma <- filter_by_variance(zma.se, n=500) zma.set1 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] zma.set2 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] list.sets <- list(zma.set1, zma.set2) # SFT power previously identified with consensus_SFT_fit() consensus <- consensus_modules(list.sets, power = c(11, 13), cor_method = "pearson") consensus_trait <- consensus_trait_cor(consensus, cor_method = "pearson")
set.seed(12) data(zma.se) filt.zma <- filter_by_variance(zma.se, n=500) zma.set1 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] zma.set2 <- filt.zma[, sample(colnames(filt.zma), size=20, replace=FALSE)] list.sets <- list(zma.set1, zma.set2) # SFT power previously identified with consensus_SFT_fit() consensus <- consensus_modules(list.sets, power = c(11, 13), cor_method = "pearson") consensus_trait <- consensus_trait_cor(consensus, cor_method = "pearson")
Calculate an adjacency matrix from a correlation matrix
cor2adj(cor_matrix, beta, net_type = "signed hybrid")
cor2adj(cor_matrix, beta, net_type = "signed hybrid")
cor_matrix |
A numeric, symmetric matrix with pairwise correlations between genes (i.e., a 'correlation matrix'). |
beta |
Numeric scalar indicating the value of the |
net_type |
Character indicating the type of network to infer. Default: "signed hybrid". |
A numeric, symmetric matrix with network adjacency values between genes.
# Simulate an expression matrix with 100 genes and 50 samples exp <- matrix( rnorm(100 * 50, mean = 10, sd = 2), nrow = 100, dimnames = list( paste0("gene", seq_len(100)), paste0("sample", seq_len(50)) ) ) # Calculate correlation matrix cor_mat <- exp2cor(exp) # Calculate adjacency matrix (random value for beta) adj <- cor2adj(cor_mat, beta = 9)
# Simulate an expression matrix with 100 genes and 50 samples exp <- matrix( rnorm(100 * 50, mean = 10, sd = 2), nrow = 100, dimnames = list( paste0("gene", seq_len(100)), paste0("sample", seq_len(50)) ) ) # Calculate correlation matrix cor_mat <- exp2cor(exp) # Calculate adjacency matrix (random value for beta) adj <- cor2adj(cor_mat, beta = 9)
Transform a correlation matrix to an edge list
cormat_to_edgelist(matrix)
cormat_to_edgelist(matrix)
matrix |
Symmetrical correlation matrix. |
A 2-column data frame containing node 1, node 2 and edge weight.
data(filt.se) cor_mat <- cor(t(SummarizedExperiment::assay(filt.se))) edgelist <- cormat_to_edgelist(cor_mat)
data(filt.se) cor_mat <- cor(t(SummarizedExperiment::assay(filt.se))) edgelist <- cormat_to_edgelist(cor_mat)
Detect communities in a network
detect_communities(edgelist, method = igraph::cluster_infomap, directed = TRUE)
detect_communities(edgelist, method = igraph::cluster_infomap, directed = TRUE)
edgelist |
Data frame containing the network as an edge list. First column must be node 1 and second column must be node 2. Additional columns will be interpreted as edge attributes and will be modified by this function. |
method |
igraph function to be used for community detection. Available functions are cluster_infomap, cluster_edge_betweenness, cluster_fast_greedy, cluster_walktrap, cluster_spinglass, cluster_leading_eigen, cluster_louvain, and cluster_label_prop. Default is cluster_infomap. |
directed |
Logical indicating whether the network is directed (GRN only) or not (GCN and PPI networks). Default: TRUE. |
A data frame containing node names in the first column, and communities to which nodes belong in the second column.
Fabricio Almeida-Silva
cluster_infomap
,
cluster_edge_betweenness
,
cluster_fast_greedy
,
cluster_walktrap
,
cluster_spinglass
,
cluster_leading_eigen
,
cluster_louvain
,
cluster_label_prop
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_edges <- grn_infer(filt.se, method = "clr", regulators = tfs) com <- detect_communities(grn_edges, directed=TRUE)
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_edges <- grn_infer(filt.se, method = "clr", regulators = tfs) com <- detect_communities(grn_edges, directed=TRUE)
This function reads multiple expression tables (.tsv files) in a directory and combines them into a single gene expression data frame.
dfs2one(mypath, pattern = ".tsv$")
dfs2one(mypath, pattern = ".tsv$")
mypath |
Path to directory containing .tsv files. Files must have the first column in common, e.g. "Gene_ID". Rows are gene IDs and columns are sample names. |
pattern |
Pattern contained in each expression file. Default is '.tsv$', which means that all files ending in '.tsv' in the specified directory will be considered expression files. |
Data frame with gene IDs as row names and their expression values in each sample (columns).
Fabricio Almeida-Silva
# Simulate two expression data frames of 100 genes and 30 samples genes <- paste0(rep("Gene", 100), 1:100) samples1 <- paste0(rep("Sample", 30), 1:30) samples2 <- paste0(rep("Sample", 30), 31:60) exp1 <- cbind(genes, as.data.frame(matrix(rnorm(100*30),nrow=100,ncol=30))) exp2 <- cbind(genes, as.data.frame(matrix(rnorm(100*30),nrow=100,ncol=30))) colnames(exp1) <- c("Gene", samples1) colnames(exp2) <- c("Gene", samples2) # Write data frames to temporary files tmpdir <- tempdir() tmp1 <- tempfile(tmpdir = tmpdir, fileext = ".exp.tsv") tmp2 <- tempfile(tmpdir = tmpdir, fileext = ".exp.tsv") write.table(exp1, file=tmp1, quote=FALSE, sep="\t") write.table(exp2, file=tmp2, quote=FALSE, sep="\t") # Load the files into one exp <- dfs2one(mypath = tmpdir, pattern=".exp.tsv")
# Simulate two expression data frames of 100 genes and 30 samples genes <- paste0(rep("Gene", 100), 1:100) samples1 <- paste0(rep("Sample", 30), 1:30) samples2 <- paste0(rep("Sample", 30), 31:60) exp1 <- cbind(genes, as.data.frame(matrix(rnorm(100*30),nrow=100,ncol=30))) exp2 <- cbind(genes, as.data.frame(matrix(rnorm(100*30),nrow=100,ncol=30))) colnames(exp1) <- c("Gene", samples1) colnames(exp2) <- c("Gene", samples2) # Write data frames to temporary files tmpdir <- tempdir() tmp1 <- tempfile(tmpdir = tmpdir, fileext = ".exp.tsv") tmp2 <- tempfile(tmpdir = tmpdir, fileext = ".exp.tsv") write.table(exp1, file=tmp1, quote=FALSE, sep="\t") write.table(exp2, file=tmp2, quote=FALSE, sep="\t") # Load the files into one exp <- dfs2one(mypath = tmpdir, pattern=".exp.tsv")
Perform overrepresentation analysis for a set of genes
enrichment_analysis( genes, background_genes, annotation, column = NULL, correction = "BH", p = 0.05, min_setsize = 10, max_setsize = 500, bp_param = BiocParallel::SerialParam() )
enrichment_analysis( genes, background_genes, annotation, column = NULL, correction = "BH", p = 0.05, min_setsize = 10, max_setsize = 500, bp_param = BiocParallel::SerialParam() )
genes |
Character vector containing genes for overrepresentation analysis. |
background_genes |
Character vector of genes to be used as background for the overrepresentation analysis. |
annotation |
Annotation data frame with genes in the first column and functional annotation in the other columns. This data frame can be exported from Biomart or similar databases. |
column |
Column or columns of annotation to be used for enrichment. Both character or numeric values with column indices can be used. If users want to supply more than one column, input a character or numeric vector. Default: all columns from annotation. |
correction |
Multiple testing correction method. One of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" or "none". Default is "BH". |
p |
P-value threshold. P-values below this threshold will be considered significant. Default: 0.05. |
min_setsize |
Numeric indicating the minimum gene set size to be considered. Gene sets correspond to levels of each variable in annotation). Default: 10. |
max_setsize |
Numeric indicating the maximum gene set size to be considered. Gene sets correspond to levels of each variable in annotation). Default: 500. |
bp_param |
BiocParallel back-end to be used. Default: BiocParallel::SerialParam() |
A data frame of overrepresentation results with the following variables:
character, functional term ID/name.
numeric, intersection length between input genes and genes in a particular functional term.
numeric, number of all genes in a particular functional term.
numeric, P-value for the hypergeometric test.
numeric, P-value adjusted for multiple comparisons using the method specified in parameter adj.
character, name of the grouping variable (i.e., column name of annotation).
Fabricio Almeida-Silva
data(filt.se) data(zma.interpro) genes <- rownames(filt.se)[1:50] background_genes <- rownames(filt.se) annotation <- zma.interpro # Using p = 1 to show all results enrich <- enrichment_analysis(genes, background_genes, annotation, p = 1)
data(filt.se) data(zma.interpro) genes <- rownames(filt.se)[1:50] background_genes <- rownames(filt.se) annotation <- zma.interpro # Using p = 1 to show all results enrich <- enrichment_analysis(genes, background_genes, annotation, p = 1)
For a given list of expression data, this function replaces genes with their corresponding orthogroups to allow inter-species comparisons.
exp_genes2orthogroups(explist = NULL, og = NULL, summarize = "median")
exp_genes2orthogroups(explist = NULL, og = NULL, summarize = "median")
explist |
List of expression data frames or SummarizedExperiment objects. |
og |
Data frame of 3 columns corresponding to orthogroup, species ID, and gene ID, respectively. Species IDs must be the same as the names of the expression list. |
summarize |
Centrality measure to summarize multiple paralogous genes in the same orthogroup. One of "median" or "mean". Default: "median". |
List of expression data frames for each species with expression summarized at the orthogroup level.
data(og.zma.osa) data(zma.se) data(osa.se) explist <- list(zma = zma.se, osa = osa.se) og <- og.zma.osa exp_ortho <- exp_genes2orthogroups(explist, og, summarize = "mean")
data(og.zma.osa) data(zma.se) data(osa.se) explist <- list(zma = zma.se, osa = osa.se) og <- og.zma.osa exp_ortho <- exp_genes2orthogroups(explist, og, summarize = "mean")
Preprocess expression data for network reconstruction
exp_preprocess( exp, NA_rm = TRUE, replaceby = 0, Zk_filtering = TRUE, zk = -2, cor_method = "spearman", remove_nonexpressed = TRUE, method = "median", min_exp = 1, min_percentage_samples = 0.25, remove_confounders = TRUE, variance_filter = FALSE, n = NULL, percentile = NULL, vstransform = FALSE )
exp_preprocess( exp, NA_rm = TRUE, replaceby = 0, Zk_filtering = TRUE, zk = -2, cor_method = "spearman", remove_nonexpressed = TRUE, method = "median", min_exp = 1, min_percentage_samples = 0.25, remove_confounders = TRUE, variance_filter = FALSE, n = NULL, percentile = NULL, vstransform = FALSE )
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
NA_rm |
Logical. It specifies whether to remove missing values from the expression data frame or not. Default = TRUE. |
replaceby |
If NA_rm is TRUE, what to use instead of NAs. One of 0 or 'mean'. Default is 0. |
Zk_filtering |
Logical. It specifies whether to filter outlying samples by Zk or not. Default: TRUE. |
zk |
If Zk_filtering is TRUE, the standardized connectivity threshold. Samples below this threshold will be considered outliers. Default is -2. |
cor_method |
If Zk_filtering is TRUE, the correlation method to use. One of 'spearman', 'bicor', or 'pearson'. Default is 'spearman'. |
remove_nonexpressed |
Logical. It specifies whether non-expressed genes should be removed or not. Default is TRUE. |
method |
If remove_nonexpressed is TRUE, the criterion to filter non-expressed genes out. One of "mean", "median", "percentage", or "allsamples". Default is 'median'. |
min_exp |
If method is 'mean', 'median', or 'allsamples', the minimum value for a gene to be considered expressed. If method is 'percentage', the minimum value each gene must have in at least n percent of samples to be considered expressed. |
min_percentage_samples |
If method is 'percentage', expressed genes must have expression >= min_exp in at least this percentage. Values must range from 0 to 1. Default = 0.25. |
remove_confounders |
Logical. If TRUE, it removes principal components that add noise to the data. |
variance_filter |
Logical. If TRUE, it will filter genes by variance. Default is FALSE. |
n |
If variance_filter is TRUE, the number of most variable genes to keep. |
percentile |
If variance_filter is TRUE, the percentage of most variable genes to keep. |
vstransform |
Logical indicating if data should be variance stabilizing transformed. This parameter can only be set to TRUE if data is a matrix of raw read counts. |
Processed gene expression data frame with gene IDs in row names and sample names in column names or 'SummarizedExperiment' object.
Fabricio Almeida-Silva
Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology, 15(12), 1-21.
varianceStabilizingTransformation
data(zma.se) exp <- exp_preprocess(zma.se, variance_filter=TRUE, n=1000)
data(zma.se) exp <- exp_preprocess(zma.se, variance_filter=TRUE, n=1000)
Calculate pairwise correlations between genes in a matrix
exp2cor(exp, cor_method = "pearson")
exp2cor(exp, cor_method = "pearson")
exp |
A numeric matrix containing a gene expression matrix, with genes in rows and samples in columns. |
cor_method |
Character indicating the correlation method to use. One of "pearson", "spearman", or "biweight". Default: "pearson". |
A numeric, symmetric matrix with pairwise correlations between genes.
# Simulate an expression matrix with 100 genes and 50 samples exp <- matrix( rnorm(100 * 50, mean = 10, sd = 2), nrow = 100, dimnames = list( paste0("gene", seq_len(100)), paste0("sample", seq_len(50)) ) ) # Calculate correlation matrix cor_mat <- exp2cor(exp)
# Simulate an expression matrix with 100 genes and 50 samples exp <- matrix( rnorm(100 * 50, mean = 10, sd = 2), nrow = 100, dimnames = list( paste0("gene", seq_len(100)), paste0("sample", seq_len(50)) ) ) # Calculate correlation matrix cor_mat <- exp2cor(exp)
Infer gene coexpression network from gene expression
exp2gcn( exp, net_type = "signed", module_merging_threshold = 0.8, SFTpower = NULL, cor_method = "spearman", TOM_type = NULL, min_module_size = 30, return_cormat = TRUE, verbose = FALSE )
exp2gcn( exp, net_type = "signed", module_merging_threshold = 0.8, SFTpower = NULL, cor_method = "spearman", TOM_type = NULL, min_module_size = 30, return_cormat = TRUE, verbose = FALSE )
exp |
Either a 'SummarizedExperiment' object, or a gene expression matrix/data frame with genes in row names and samples in column names. |
net_type |
Character indicating the type of network to infer. One of 'signed', 'signed hybrid' or 'unsigned'. Default: 'signed'. |
module_merging_threshold |
Numeric indicating the minimum correlation threshold to merge similar modules into a single one. Default: 0.8. |
SFTpower |
Numeric scalar indicating the value of the |
cor_method |
Character with correlation method to use. One of "pearson", "biweight" or "spearman". Default: "spearman". |
TOM_type |
Character specifying the method to use to calculate a topological overlap matrix (TOM). If NULL, TOM type will be automatically inferred from network type specified in net_type. Default: NULL. |
min_module_size |
Numeric indicating the minimum module size. Default: 30. |
return_cormat |
Logical indicating whether the correlation matrix should be returned. If TRUE (default), an element named 'correlation_matrix' containing the correlation matrix will be included in the result list. |
verbose |
Logical indicating whether to display progress messages or not. Default: FALSE. |
A list containing the following elements:
adjacency_matrix Numeric matrix with network adjacencies.
MEs Data frame of module eigengenes, with samples in rows, and module eigengenes in columns.
genes_and_modules Data frame with columns 'Genes' (character) and 'Modules' (character) indicating the genes and the modules to which they belong.
kIN Data frame of degree centrality for each gene, with columns 'kTotal' (total degree), 'kWithin' (intramodular degree), 'kOut' (extra-modular degree), and 'kDiff' (difference between the intra- and extra-modular degree).
correlation_matrix Numeric matrix with pairwise correlation coefficients between genes. If parameter return_cormat is FALSE, this will be NULL.
params List with network inference parameters passed as input.
dendro_plot_objects List with objects to plot the dendrogram
in plot_dendro_and_colors
. Elements are named 'tree' (an hclust
object with gene dendrogram), 'Unmerged' (character with per-gene module
assignments before merging similar modules), and 'Merged' (character
with per-gene module assignments after merging similar modules).
Fabricio Almeida-Silva
data(filt.se) # The SFT fit was previously calculated and the optimal power was 16 gcn <- exp2gcn(exp = filt.se, SFTpower = 18, cor_method = "pearson")
data(filt.se) # The SFT fit was previously calculated and the optimal power was 16 gcn <- exp2gcn(exp = filt.se, SFTpower = 18, cor_method = "pearson")
Infer gene coexpression network from gene expression in a blockwise manner
exp2gcn_blockwise( exp, net_type = "signed", module_merging_threshold = 0.8, SFTpower = NULL, cor_method = "pearson", TOM_type = NULL, max_block_size = 5000, min_module_size = 30, ... )
exp2gcn_blockwise( exp, net_type = "signed", module_merging_threshold = 0.8, SFTpower = NULL, cor_method = "pearson", TOM_type = NULL, max_block_size = 5000, min_module_size = 30, ... )
exp |
Either a 'SummarizedExperiment' object, or a gene expression matrix/data frame with genes in row names and samples in column names. |
net_type |
Character indicating the type of network to infer. One of 'signed', 'signed hybrid' or 'unsigned'. Default: 'signed'. |
module_merging_threshold |
Numeric indicating the minimum correlation threshold to merge similar modules into a single one. Default: 0.8. |
SFTpower |
Numeric scalar indicating the value of the |
cor_method |
Character with correlation method to use. One of "pearson" or "biweight". Default: "pearson". |
TOM_type |
Character specifying the method to use to calculate a topological overlap matrix (TOM). If NULL, TOM type will be automatically inferred from network type specified in net_type. Default: NULL. |
max_block_size |
Numeric indicating the maximum block size for module detection. |
min_module_size |
Numeric indicating the minimum module size. Default: 30. |
... |
Additional arguments to |
A list containing the following elements:
MEs Data frame of module eigengenes, with samples in rows, and module eigengenes in columns.
genes_and_modules Data frame with columns 'Genes' (character) and 'Modules' (character) indicating the genes and the modules to which they belong.
params List with network inference parameters passed as input.
dendro_plot_objects List with objects to plot the dendrogram
in plot_dendro_and_colors
. Elements are named 'tree' (an hclust
object with gene dendrogram), 'Unmerged' (character with per-gene module
assignments before merging similar modules), and 'Merged' (character
with per-gene module assignments after merging similar modules).
Fabricio Almeida-Silva
data(filt.se) # The SFT fit was previously calculated and the optimal power was 16 cor <- WGCNA::cor gcn <- exp2gcn_blockwise( exp = filt.se, SFTpower = 18, cor_method = "pearson" )
data(filt.se) # The SFT fit was previously calculated and the optimal power was 16 cor <- WGCNA::cor gcn <- exp2gcn_blockwise( exp = filt.se, SFTpower = 18, cor_method = "pearson" )
Infer gene regulatory network from expression data
exp2grn( exp, regulators = NULL, eps = 0, estimator_aracne = "spearman", estimator_clr = "pearson", remove_zero = TRUE, nsplit = 10, ... )
exp2grn( exp, regulators = NULL, eps = 0, estimator_aracne = "spearman", estimator_clr = "pearson", remove_zero = TRUE, nsplit = 10, ... )
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
regulators |
A character vector of regulators (e.g., transcription factors or miRNAs). All regulators must be included in 'exp'. |
eps |
Numeric value indicating the threshold used when removing an edge: for each triplet of nodes (i,j,k), the weakest edge, say (ij), is removed if its weight is below min(ik),(jk) - eps. Default: 0. |
estimator_aracne |
Entropy estimator to be used in ARACNE inference. One of "mi.empirical", "mi.mm", "mi.shrink", "mi.sg", "pearson", "spearman", or "kendall". Default: "spearman". |
estimator_clr |
Entropy estimator to be used in CLR inference. One of "mi.empirical", "mi.mm", "mi.shrink", "mi.sg", "pearson", "spearman", or "kendall". Default: "pearson". |
remove_zero |
Logical indicating whether to remove edges whose weight is exactly zero. Zero values indicate edges that were removed by ARACNE. Default: TRUE. |
nsplit |
Number of groups in which the edge list will be split. Default: 10. |
... |
Additional arguments passed to 'GENIE3::GENIE3()'. |
This function infers GRNs with ARACNE, GENIE3 and CLR, ranks correlation weights for each GRN and calculates the average rank for each edge. Then, the resulting GRN is filtered to keep the top n edges that lead to the optimal scale-free topology fit.
A filtered edge list with regulators in the first column and targets in the second column.
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) # Test with small number of trees for demonstration purpose grn <- exp2grn(filt.se, regulators = tfs, nTrees=2, nsplit=2)
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) # Test with small number of trees for demonstration purpose grn <- exp2grn(filt.se, regulators = tfs, nTrees=2, nsplit=2)
Filtered expression data in transcripts per million (TPM) from
Shin et al., 2021. This is the same data set described in zma.se
,
but it only contains the top 500 genes with the highest variances.
This data set was created to be used in unit tests and examples.
data(filt.se)
data(filt.se)
An object of class SummarizedExperiment
Shin, J., Marx, H., Richards, A., Vaneechoutte, D., Jayaraman, D., Maeda, J., ... & Roy, S. (2021). A network-based comparative framework to study conservation and divergence of proteomes in plant phylogenies. Nucleic Acids Research, 49(1), e3-e3.
data(filt.se)
data(filt.se)
Keep only genes with the highest variances
filter_by_variance(exp, n = NULL, percentile = NULL)
filter_by_variance(exp, n = NULL, percentile = NULL)
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
n |
Number of most variable genes (e.g., n=5000 will keep the top 5000 most variable genes). |
percentile |
Percentile of most highly variable genes (e.g., percentile=0.1 will keep the top 10 percent most variable genes). Values must range from 0 to 1. |
Expression data frame or 'SummarizedExperiment' object with the most variable genes in row names and samples in column names.
Fabricio Almeida-Silva
data(zma.se) filt_exp <- filter_by_variance(zma.se, p=0.1)
data(zma.se) filt_exp <- filter_by_variance(zma.se, p=0.1)
Calculate gene significance for a given group of genes
gene_significance( exp, metadata, metadata_cols = NULL, genes = NULL, alpha = 0.05, cor_method = "pearson", min_cor = 0.2, use_abs = TRUE )
gene_significance( exp, metadata, metadata_cols = NULL, genes = NULL, alpha = 0.05, cor_method = "pearson", min_cor = 0.2, use_abs = TRUE )
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
metadata |
A data frame containing sample names in row names and sample annotation in the first column. Ignored if 'exp' is a 'SummarizedExperiment' object, since the function will extract colData. |
metadata_cols |
A vector (either numeric or character) indicating which columns should be extracted from column metadata if exp is a 'SummarizedExperiment' object. The vector can contain column indices (numeric) or column names (character). By default, all columns are used. |
genes |
Character vector of genes to be correlated with traits. If not given, all genes in 'exp' will be considered. |
alpha |
Significance level. Default is 0.05. |
cor_method |
Method to calculate correlation. One of 'pearson', 'spearman' or 'kendall'. Default is 'spearman'. |
min_cor |
Minimum correlation coefficient. Default is 0.2. |
use_abs |
Logical indicating whether to filter by correlation using
absolute value or not. If TRUE, a |
A data frame with correlation and correlation p-values for each pair of gene and trait, with the following variables:
Factor, gene ID.
Factor, trait name. Each trait corresponds to a variable of the sample metadata (if numeric) or levels of a variable (if categorical).
Numeric, correlation.
Numeric, correlation P-values.
Character, name of the metadata variable.
Fabricio Almeida-Silva
data(filt.se) gs <- gene_significance(filt.se)
data(filt.se) gs <- gene_significance(filt.se)
Get edge list from an adjacency matrix for a group of genes
get_edge_list( net, genes = NULL, module = NULL, filter = FALSE, method = "optimalSFT", r_optimal_test = seq(0.4, 0.9, by = 0.1), Zcutoff = 1.96, pvalue_cutoff = 0.05, rcutoff = 0.7, nSamples = NULL, check_SFT = FALSE, bp_param = BiocParallel::SerialParam() )
get_edge_list( net, genes = NULL, module = NULL, filter = FALSE, method = "optimalSFT", r_optimal_test = seq(0.4, 0.9, by = 0.1), Zcutoff = 1.96, pvalue_cutoff = 0.05, rcutoff = 0.7, nSamples = NULL, check_SFT = FALSE, bp_param = BiocParallel::SerialParam() )
net |
List object returned by |
genes |
Character vector containing a subset of genes from which edges will be extracted. It can be ignored if the user wants to extract an edge list for a given module instead of individual genes. |
module |
Character with module name from which edges will be extracted. To include 2 or more modules, input the names in a character vector. |
filter |
Logical indicating whether to filter the edge list or not. |
method |
Method to filter spurious correlations. One of "Zscore", "optimalSFT", "pvalue" or "min_cor". See details for more information on the methods. Default: 'optimalSFT' |
r_optimal_test |
Numeric vector with the correlation thresholds
to be tested for optimal scale-free topology fit. Only valid
if |
Zcutoff |
Minimum Z-score threshold. Only valid if
|
pvalue_cutoff |
Maximum P-value threshold. Only valid
if |
rcutoff |
Minimum correlation threshold. Only valid
if |
nSamples |
Number of samples in the data set from which
the correlation matrix was calculated. Only required
if |
check_SFT |
Logical indicating whether to test if the resulting network is close to a scale-free topology or not. Default: FALSE. |
bp_param |
BiocParallel back-end to be used. Default: BiocParallel::SerialParam() |
The default method ("optimalSFT") will create several different
edge lists by filtering the original correlation matrix by the thresholds
specified in r_optimal_test
. Then, it will calculate a scale-free
topology fit index for each of the possible networks and return the network
that best fits the scale-free topology.
The method "Zscore" will apply a Fisher Z-transformation for the correlation
coefficients and remove the Z-scores below the threshold specified
in Zcutoff
.
The method "pvalue" will calculate Student asymptotic p-value for the
correlations and remove correlations whose p-values are above the threshold
specified in pvalue_cutoff
.
The method "min_cor" will remove correlations below the minimum correlation
threshold specified in rcutoff
.
Data frame with edge list for the input genes.
Fabricio Almeida-Silva
SFT_fit
exp2gcn
.
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") genes <- rownames(filt.se)[1:50] edges <- get_edge_list(gcn, genes=genes, filter = FALSE)
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") genes <- rownames(filt.se)[1:50] edges <- get_edge_list(gcn, genes=genes, filter = FALSE)
Get housekeeping genes from global expression profile
get_HK(exp)
get_HK(exp)
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
This function identifies housekeeping genes, which are broadly expressed genes with low variation in a global scale across samples. For some cases, users would want to remove these genes as they are not interesting for coexpression network analyses. See references for more details.
Character vector of housekeeping gene IDs.
Fabricio Almeida-Silva
Machado, F.B., Moharana, K.C., Almeida‐Silva, F., Gazara, R.K., Pedrosa‐Silva, F., Coelho, F.S., Grativol, C. and Venancio, T.M. (2020), Systematic analysis of 1298 RNA‐Seq samples and construction of a comprehensive soybean (Glycine max) expression atlas. Plant J, 103: 1894-1909.
data(zma.se) hk <- get_HK(zma.se)
data(zma.se) hk <- get_HK(zma.se)
Get GCN hubs
get_hubs_gcn(exp, net)
get_hubs_gcn(exp, net)
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
net |
List object returned by |
Data frame containing gene IDs, modules and intramodular connectivity of all hubs.
Fabricio Almeida-Silva
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") hubs <- get_hubs_gcn(filt.se, gcn)
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") hubs <- get_hubs_gcn(filt.se, gcn)
Get hubs for gene regulatory network
Get hubs for protein-protein interaction network
get_hubs_grn( edgelist, top_percentile = 0.1, top_n = NULL, return_degree = FALSE, ranked = TRUE ) get_hubs_ppi( edgelist, top_percentile = 0.1, top_n = NULL, return_degree = FALSE )
get_hubs_grn( edgelist, top_percentile = 0.1, top_n = NULL, return_degree = FALSE, ranked = TRUE ) get_hubs_ppi( edgelist, top_percentile = 0.1, top_n = NULL, return_degree = FALSE )
edgelist |
A protein-protein interaction network represented as an edge list. |
top_percentile |
Numeric from 0 to 1 indicating the percentage of proteins with the highest degree to consider hubs. Default: 0.1. |
top_n |
Numeric indicating the number of proteins with the highest degree to consider hubs. |
return_degree |
Logical indicating whether to return a data frame of degree for all proteins. If TRUE, the function will return a list instead of a data frame. Default: FALSE. |
ranked |
Logical indicating whether to treat third column of the edge list (edge weights) as ranked values. Ignored if the edge list only contains 2 columns. Default: TRUE. |
A data frame with gene ID in the first column and out degree in the second column or a list of two data frames with hubs and degree for all genes, respectively.
A data frame with protein ID in the first column and degree in the second column or a list of two data frames with hubs and degree for all genes, respectively.
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_list <- grn_combined(filt.se, regulators=tfs, nTrees=2) ranked_grn <- grn_average_rank(grn_list) # split in only 2 groups for demonstration purposes filtered_edges <- grn_filter(ranked_grn, nsplit=2) hubs <- get_hubs_grn(filtered_edges) ppi_edges <- igraph::sample_pa(n = 500) ppi_edges <- igraph::as_edgelist(ppi_edges) hubs <- get_hubs_ppi(ppi_edges, return_degree = TRUE)
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_list <- grn_combined(filt.se, regulators=tfs, nTrees=2) ranked_grn <- grn_average_rank(grn_list) # split in only 2 groups for demonstration purposes filtered_edges <- grn_filter(ranked_grn, nsplit=2) hubs <- get_hubs_grn(filtered_edges) ppi_edges <- igraph::sample_pa(n = 500) ppi_edges <- igraph::as_edgelist(ppi_edges) hubs <- get_hubs_ppi(ppi_edges, return_degree = TRUE)
Get 1st-order neighbors of a given gene or group of genes
get_neighbors(genes, net, cor_threshold = 0.7)
get_neighbors(genes, net, cor_threshold = 0.7)
genes |
Character vector containing genes from which direct neighbors will be extracted. |
net |
List object returned by |
cor_threshold |
Correlation threshold to filter connections. As a weighted network is a fully connected graph, a cutoff must be selected. Default is 0.7. |
List containing 1st-order neighbors for each input gene.
Fabricio Almeida-Silva
exp2gcn
SFT_fit
data(filt.se) genes <- rownames(filt.se)[1:10] gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") neighbors <- get_neighbors(genes, gcn)
data(filt.se) genes <- rownames(filt.se)[1:10] gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") neighbors <- get_neighbors(genes, gcn)
Rank edge weights for GRNs and calculate average across different methods
grn_average_rank(list_edges)
grn_average_rank(list_edges)
list_edges |
List containing edge lists as returned by
the function |
Edge list containing regulator, target and mean rank from all algorithms.
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_list <- grn_combined(filt.se, regulators=tfs, nTrees=2) ranked_grn <- grn_average_rank(grn_list)
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_list <- grn_combined(filt.se, regulators=tfs, nTrees=2) ranked_grn <- grn_average_rank(grn_list)
Infer gene regulatory network with multiple algorithms and combine results in a list
grn_combined( exp, regulators = NULL, eps = 0.1, estimator_aracne = "spearman", estimator_clr = "pearson", remove_zero = TRUE, ... )
grn_combined( exp, regulators = NULL, eps = 0.1, estimator_aracne = "spearman", estimator_clr = "pearson", remove_zero = TRUE, ... )
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
regulators |
A character vector of regulators (e.g., transcription factors or miRNAs). All regulators must be included in 'exp'. |
eps |
Numeric value indicating the threshold used when removing an edge: for each triplet of nodes (i,j,k), the weakest edge, say (ij), is removed if its weight is below min(ik),(jk) - eps. Default: 0.1. |
estimator_aracne |
Entropy estimator to be used in ARACNE inference. One of "mi.empirical", "mi.mm", "mi.shrink", "mi.sg", "pearson", "spearman", or "kendall". Default: "spearman". |
estimator_clr |
Entropy estimator to be used in CLR inference. One of "mi.empirical", "mi.mm", "mi.shrink", "mi.sg", "pearson", "spearman", or "kendall". Default: "pearson". |
remove_zero |
Logical indicating whether to remove edges whose weight is exactly zero. Zero values indicate edges that were removed by ARACNE. Default: TRUE. |
... |
Additional arguments passed to 'GENIE3::GENIE3()'. |
A list of data frames representing edge lists. Each list element is an edge list for a specific method.
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_list <- grn_combined(filt.se, regulators=tfs, nTrees=2)
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_list <- grn_combined(filt.se, regulators=tfs, nTrees=2)
Filter a gene regulatory network based on optimal scale-free topology fit
grn_filter(edgelist, nsplit = 10, bp_param = BiocParallel::SerialParam())
grn_filter(edgelist, nsplit = 10, bp_param = BiocParallel::SerialParam())
edgelist |
A gene regulatory network represented as an edge list. |
nsplit |
Number of groups in which the edge list will be split. Default: 10. |
bp_param |
BiocParallel back-end to be used. Default: BiocParallel::SerialParam() |
The edge list will be split in n groups and the scale-free topology fit will be tested for each subset of the edge list. For instance, if an edge list of 10000 rows is used as input, the function will test SFT fit for the top 1000 edges, then top 2000 edges, and so on up to the whole edge list.
The edge list that best fits the scale-free topology.
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_list <- grn_combined(filt.se, regulators=tfs, nTrees=2) ranked_grn <- grn_average_rank(grn_list) # split in only 2 groups for demonstration purposes filtered_edges <- grn_filter(ranked_grn, nsplit=2)
data(filt.se) tfs <- sample(rownames(filt.se), size=50, replace=FALSE) grn_list <- grn_combined(filt.se, regulators=tfs, nTrees=2) ranked_grn <- grn_average_rank(grn_list) # split in only 2 groups for demonstration purposes filtered_edges <- grn_filter(ranked_grn, nsplit=2)
The available algorithms are Context Likelihood of Relatedness (CLR), ARACNE, or GENIE3.
grn_infer( exp, regulators = NULL, method = c("clr", "aracne", "genie3"), estimator_clr = "pearson", estimator_aracne = "spearman", eps = 0.1, remove_zero = TRUE, ... )
grn_infer( exp, regulators = NULL, method = c("clr", "aracne", "genie3"), estimator_clr = "pearson", estimator_aracne = "spearman", eps = 0.1, remove_zero = TRUE, ... )
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
regulators |
A character vector of regulators (e.g., transcription factors or miRNAs). All regulators must be included in 'exp'. |
method |
GRN inference algorithm to be used. One of "clr", "aracne", or "genie3". |
estimator_clr |
Entropy estimator to be used. One of "mi.empirical", "mi.mm", "mi.shrink", "mi.sg", "pearson", "spearman", or "kendall". Default: "pearson". |
estimator_aracne |
Entropy estimator to be used. One of "mi.empirical", "mi.mm", "mi.shrink", "mi.sg", "pearson", "spearman", or "kendall". Default: "spearman". |
eps |
Numeric value indicating the threshold used when removing an edge: for each triplet of nodes (i,j,k), the weakest edge, say (ij), is removed if its weight is below min(ik),(jk) - eps. Default: 0.1. |
remove_zero |
Logical indicating whether to remove edges whose weight is exactly zero. Default: TRUE |
... |
Additional arguments passed to 'GENIE3::GENIE3()'. |
A gene regulatory network represented as an edge list.
data(filt.se) tfs <- sample(rownames(filt.se), size=20, replace=FALSE) clr <- grn_infer(filt.se, method = "clr", regulators=tfs) aracne <- grn_infer(filt.se, method = "aracne", regulators=tfs) # only 2 trees for demonstration purposes genie3 <- grn_infer(filt.se, method = "genie3", regulators=tfs, nTrees=2)
data(filt.se) tfs <- sample(rownames(filt.se), size=20, replace=FALSE) clr <- grn_infer(filt.se, method = "clr", regulators=tfs) aracne <- grn_infer(filt.se, method = "aracne", regulators=tfs) # only 2 trees for demonstration purposes genie3 <- grn_infer(filt.se, method = "genie3", regulators=tfs, nTrees=2)
Logical expression to check if gene or gene set is singleton or not
is_singleton(genes, og)
is_singleton(genes, og)
genes |
Character containing gene or group of genes to be evaluated. |
og |
Data frame of 3 columns corresponding to orthogroup, species ID, and gene ID, respectively. |
Vector of logical values indicating if gene or group of genes is singleton or not.
Fabricio Almeida-Silva
is_duplicated
data(og.zma.osa) data(filt.se) genes <- tail(rownames(filt.se), n = 100) is_singleton(genes, og.zma.osa)
data(og.zma.osa) data(filt.se) genes <- tail(rownames(filt.se), n = 100) is_singleton(genes, og.zma.osa)
Calculate module preservation between two expression data sets using NetRep's algorithm
modPres_netrep( explist, ref_net = NULL, test_net = NULL, nPerm = 1000, nThreads = 1 )
modPres_netrep( explist, ref_net = NULL, test_net = NULL, nPerm = 1000, nThreads = 1 )
explist |
List of expression data frames or SummarizedExperiment objects. |
ref_net |
Reference network object returned by the function |
test_net |
Test network object returned by the function |
nPerm |
Number of permutations. Default: 1000 |
nThreads |
Number of threads to be used for parallel computing. Default: 1 |
Output list from NetRep::modulePreservation
and a message in
user's standard output stating which modules are preserved.
set.seed(1) data(og.zma.osa) data(zma.se) data(osa.se) og <- og.zma.osa exp_ortho <- exp_genes2orthogroups(explist, og, summarize = "mean") exp_ortho <- lapply(exp_ortho, function(x) filter_by_variance(x, n=1500)) # Previously calculated SFT powers powers <- c(13, 15) gcn_osa <- exp2gcn(exp_ortho$osa, net_type = "signed hybrid", SFTpower = powers[1], cor_method = "pearson") gcn_zma <- exp2gcn(exp_ortho$zma, net_type = "signed hybrid", SFTpower = powers[2], cor_method = "pearson") explist <- exp_ortho ref_net <- gcn_osa test_net <- gcn_zma # 10 permutations for demonstration purposes pres_netrep <- modPres_netrep(explist, ref_net, test_net, nPerm=10, nThreads = 2)
set.seed(1) data(og.zma.osa) data(zma.se) data(osa.se) og <- og.zma.osa exp_ortho <- exp_genes2orthogroups(explist, og, summarize = "mean") exp_ortho <- lapply(exp_ortho, function(x) filter_by_variance(x, n=1500)) # Previously calculated SFT powers powers <- c(13, 15) gcn_osa <- exp2gcn(exp_ortho$osa, net_type = "signed hybrid", SFTpower = powers[1], cor_method = "pearson") gcn_zma <- exp2gcn(exp_ortho$zma, net_type = "signed hybrid", SFTpower = powers[2], cor_method = "pearson") explist <- exp_ortho ref_net <- gcn_osa test_net <- gcn_zma # 10 permutations for demonstration purposes pres_netrep <- modPres_netrep(explist, ref_net, test_net, nPerm=10, nThreads = 2)
Calculate module preservation between two expression data sets using WGCNA's algorithm
modPres_WGCNA(explist, ref_net, nPerm = 200)
modPres_WGCNA(explist, ref_net, nPerm = 200)
explist |
List of expression data frames or SummarizedExperiment objects. |
ref_net |
Reference network object returned by the function |
nPerm |
Number of permutations for the module preservation statistics. It must be greater than 1. Default: 200. |
A ggplot object with module preservation statistics.
set.seed(1) data(og.zma.osa) data(zma.se) data(osa.se) explist <- list(Zma = zma.se, Osa = osa.se) og <- og.zma.osa exp_ortho <- exp_genes2orthogroups(explist, og, summarize = "mean") exp_ortho <- lapply(exp_ortho, function(x) filter_by_variance(x, n=1500)) # Previously calculated power powers <- c(13, 15) gcn_osa <- exp2gcn(exp_ortho$Osa, net_type = "signed hybrid", SFTpower = powers[1], cor_method = "pearson") explist <- exp_ortho ref_net <- gcn_osa # 5 permutations for demonstration purposes pres_wgcna <- modPres_WGCNA(explist, ref_net, nPerm=5)
set.seed(1) data(og.zma.osa) data(zma.se) data(osa.se) explist <- list(Zma = zma.se, Osa = osa.se) og <- og.zma.osa exp_ortho <- exp_genes2orthogroups(explist, og, summarize = "mean") exp_ortho <- lapply(exp_ortho, function(x) filter_by_variance(x, n=1500)) # Previously calculated power powers <- c(13, 15) gcn_osa <- exp2gcn(exp_ortho$Osa, net_type = "signed hybrid", SFTpower = powers[1], cor_method = "pearson") explist <- exp_ortho ref_net <- gcn_osa # 5 permutations for demonstration purposes pres_wgcna <- modPres_WGCNA(explist, ref_net, nPerm=5)
Perform enrichment analysis for coexpression network modules
module_enrichment( net = NULL, background_genes, annotation, column = NULL, correction = "BH", p = 0.05, min_setsize = 10, max_setsize = 500, bp_param = BiocParallel::SerialParam() )
module_enrichment( net = NULL, background_genes, annotation, column = NULL, correction = "BH", p = 0.05, min_setsize = 10, max_setsize = 500, bp_param = BiocParallel::SerialParam() )
net |
List object returned by |
background_genes |
Character vector of genes to be used as background for the Fisher's Exact Test. |
annotation |
Annotation data frame with genes in the first column and functional annotation in the other columns. This data frame can be exported from Biomart or similar databases. |
column |
Column or columns of |
correction |
Multiple testing correction method. One of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" or "none". Default is "BH". |
p |
P-value threshold. P-values below this threshold will be considered significant. Default is 0.05. |
min_setsize |
Numeric indicating the minimum gene set size to be considered. Gene sets correspond to levels of each variable in annotation). Default: 10. |
max_setsize |
Numeric indicating the maximum gene set size to be considered. Gene sets correspond to levels of each variable in annotation). Default: 500. |
bp_param |
BiocParallel back-end to be used. Default: BiocParallel::SerialParam() |
A data frame of overrepresentation results with the following variables:
character, functional term ID/name.
numeric, intersection length between input genes and genes in a particular functional term.
numeric, number of all genes in a particular functional term.
numeric, P-value for the hypergeometric test.
numeric, P-value adjusted for multiple comparisons using the method specified in parameter adj.
character, name of the grouping variable (i.e., column name of annotation).
character, module name.
Fabricio Almeida-Silva
data(filt.se) data(zma.interpro) background <- rownames(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") mod_enrich <- module_enrichment(gcn, background, zma.interpro, p=1)
data(filt.se) data(zma.interpro) background <- rownames(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") mod_enrich <- module_enrichment(gcn, background, zma.interpro, p=1)
Calculate network preservation between two expression data sets
module_preservation( explist, ref_net = NULL, test_net = NULL, algorithm = "netrep", nPerm = 1000, nThreads = 1 )
module_preservation( explist, ref_net = NULL, test_net = NULL, algorithm = "netrep", nPerm = 1000, nThreads = 1 )
explist |
List of SummarizedExperiment objects or expression data frames with genes (or orthogroups) in row names and samples in column names. |
ref_net |
Reference network object returned by
the function |
test_net |
Test network object returned by the function |
algorithm |
Module preservation algorithm to be used. One of 'netrep' (default, permutation-based) or WGCNA. |
nPerm |
Number of permutations. Default: 1000 |
nThreads |
Number of threads to be used for parallel computing. Default: 1 |
A list containing the preservation statistics (netrep) or a ggplot
object with preservation statistics.
See WGCNA::modulePreservation
or NetRep::modulePreservation
for more info.
set.seed(1) data(og.zma.osa) data(zma.se) data(osa.se) og <- og.zma.osa exp_ortho <- exp_genes2orthogroups(explist, og, summarize = "mean") exp_ortho <- lapply(exp_ortho, function(x) filter_by_variance(x, n=1500)) # Previously calculated SFT powers powers <- c(13, 15) gcn_osa <- exp2gcn(exp_ortho$osa, net_type = "signed hybrid", SFTpower = powers[1], cor_method = "pearson") gcn_zma <- exp2gcn(exp_ortho$zma, net_type = "signed hybrid", SFTpower = powers[2], cor_method = "pearson") explist <- exp_ortho ref_net <- gcn_osa test_net <- gcn_zma # 10 permutations for demonstration purposes pres <- module_preservation(explist, ref_net, test_net, nPerm=10)
set.seed(1) data(og.zma.osa) data(zma.se) data(osa.se) og <- og.zma.osa exp_ortho <- exp_genes2orthogroups(explist, og, summarize = "mean") exp_ortho <- lapply(exp_ortho, function(x) filter_by_variance(x, n=1500)) # Previously calculated SFT powers powers <- c(13, 15) gcn_osa <- exp2gcn(exp_ortho$osa, net_type = "signed hybrid", SFTpower = powers[1], cor_method = "pearson") gcn_zma <- exp2gcn(exp_ortho$zma, net_type = "signed hybrid", SFTpower = powers[2], cor_method = "pearson") explist <- exp_ortho ref_net <- gcn_osa test_net <- gcn_zma # 10 permutations for demonstration purposes pres <- module_preservation(explist, ref_net, test_net, nPerm=10)
Perform module stability analysis
module_stability(exp, net, nRuns = 20)
module_stability(exp, net, nRuns = 20)
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
net |
List object returned by |
nRuns |
Number of times to resample. Default is 20. |
A base plot with the module stability results.
data(filt.se) filt <- filt.se[1:100, ] # reducing even further for testing purposes # The SFT fit was previously calculated and the optimal power was 16 gcn <- exp2gcn(filt, SFTpower = 16, cor_method = "pearson") # For simplicity, only 2 runs module_stability(exp = filt, net = gcn, nRuns = 2)
data(filt.se) filt <- filt.se[1:100, ] # reducing even further for testing purposes # The SFT fit was previously calculated and the optimal power was 16 gcn <- exp2gcn(filt, SFTpower = 16, cor_method = "pearson") # For simplicity, only 2 runs module_stability(exp = filt, net = gcn, nRuns = 2)
Correlate module eigengenes to trait
module_trait_cor( exp, metadata, MEs, metadata_cols = NULL, cor_method = "pearson" )
module_trait_cor( exp, metadata, MEs, metadata_cols = NULL, cor_method = "pearson" )
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
metadata |
A data frame containing sample names in row names and sample annotation in the first column. Ignored if 'exp' is a 'SummarizedExperiment' object, since the function will extract colData. |
MEs |
Module eigengenes. It is the 2nd element of the result list
generated by the function |
metadata_cols |
A vector (either numeric or character) indicating which columns should be extracted from column metadata if exp is a 'SummarizedExperiment' object. The vector can contain column indices (numeric) or column names (character). By default, all columns are used. |
cor_method |
Method to calculate correlation. One of 'pearson', 'spearman' or 'kendall'. Default is 'spearman'. |
A data frame with correlation and correlation p-values for each pair of ME and trait, with the following variables:
Factor, module eigengene.
Factor, trait name. Each trait corresponds to a variable of the sample metadata (if numeric) or levels of a variable (if categorical).
Numeric, correlation.
Numeric, correlation P-values.
Character, name of the metadata variable.
Fabricio Almeida-Silva
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") module_trait_cor(filt.se, MEs = gcn$MEs)
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") module_trait_cor(filt.se, MEs = gcn$MEs)
Calculate network statistics
net_stats( adj_matrix = NULL, net_type = c("gcn", "ppi", "grn"), calculate_additional = FALSE )
net_stats( adj_matrix = NULL, net_type = c("gcn", "ppi", "grn"), calculate_additional = FALSE )
adj_matrix |
Adjacency matrix that represents the network. |
net_type |
One of "gcn" (gene coexpression network), "ppi" (protein-protein interaction), or "grn" (gene regulatory network). |
calculate_additional |
Logical indicating whether to calculate additional network statistics (betweenness and closeness). Default is FALSE. |
A list containing the following elements:
Connectivity
ScaledConnectivity
ClusterCoef
MAR (for gcn only)
Density
Centralization
Heterogeneity (gcn only)
Diameter
Betweenness
Closeness
graph_from_adjacency_matrix
,
cliques
,diameter
,
estimate_betweenness
,V
,
closeness
,degree
,
transitivity
,edge_density
,
centr_degree
fundamentalNetworkConcepts
data(filt.se) set.seed(12) filt.se <- exp_preprocess( filt.se, Zk_filtering = FALSE, variance_filter = TRUE, n = 200 ) gcn <- exp2gcn( filt.se, SFTpower = 7, cor_method = "pearson", net_type = "signed hybrid" ) stats <- net_stats(gcn$adjacency_matrix, net_type = "gcn")
data(filt.se) set.seed(12) filt.se <- exp_preprocess( filt.se, Zk_filtering = FALSE, variance_filter = TRUE, n = 200 ) gcn <- exp2gcn( filt.se, SFTpower = 7, cor_method = "pearson", net_type = "signed hybrid" ) stats <- net_stats(gcn$adjacency_matrix, net_type = "gcn")
The orthogroups were downloaded from the PLAZA 4.0 Monocots database.
data(og.zma.osa)
data(og.zma.osa)
A 3-column data frame with orthogroups, species IDs and gene IDs.
Van Bel, M., Diels, T., Vancaester, E., Kreft, L., Botzki, A., Van de Peer, Y., ... & Vandepoele, K. (2018). PLAZA 4.0: an integrative resource for functional, evolutionary and comparative plant genomics. Nucleic acids research, 46(D1), D1190-D1196.
data(og.zma.osa)
data(og.zma.osa)
Filtered expression data in transcripts per million (TPM) from Shin et al., 2021. Genes with TPM values <5 in more than 60 were removed to reduce package size. The expression data and associated sample metadata are stored in a SummarizedExperiment object.
data(osa.se)
data(osa.se)
An object of class SummarizedExperiment
Shin, J., Marx, H., Richards, A., Vaneechoutte, D., Jayaraman, D., Maeda, J., ... & Roy, S. (2021). A network-based comparative framework to study conservation and divergence of proteomes in plant phylogenies. Nucleic Acids Research, 49(1), e3-e3.
data(osa.se)
data(osa.se)
This function converts the orthogroups file named Orthogroups.tsv to a 3-column data frame that can be interpreted by BioNERO.
parse_orthofinder(file_path = NULL)
parse_orthofinder(file_path = NULL)
file_path |
Path to Orthogroups/Orthogroups.tsv file generated by OrthoFinder. |
A 3-column data frame with orthogroups, species IDs and gene IDs, respectively.
Fabricio Almeida-Silva
path <- system.file("extdata", "Orthogroups.tsv", package = "BioNERO") og <- parse_orthofinder(path)
path <- system.file("extdata", "Orthogroups.tsv", package = "BioNERO") og <- parse_orthofinder(path)
Apply Principal Component (PC)-based correction for confounding artifacts
PC_correction(exp, verbose = FALSE)
PC_correction(exp, verbose = FALSE)
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
verbose |
Logical indicating whether to display progress messages or not. Default: FALSE. |
Corrected expression data frame or 'SummarizedExperiment' object.
Fabricio Almeida-Silva
Parsana, P., Ruberman, C., Jaffe, A. E., Schatz, M. C., Battle, A., & Leek, J. T. (2019). Addressing confounding artifacts in reconstruction of gene co-expression networks. Genome biology, 20(1), 1-6.
data(zma.se) exp <- filter_by_variance(zma.se, n=500) exp <- PC_correction(exp)
data(zma.se) exp <- filter_by_variance(zma.se, n=500) exp <- PC_correction(exp)
Plot dendrogram of genes and modules
plot_dendro_and_colors(gcn)
plot_dendro_and_colors(gcn)
gcn |
List object returned by |
A base plot with the gene dendrogram and modules.
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") plot_dendro_and_colors(gcn)
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") plot_dendro_and_colors(gcn)
Plot eigengene network
plot_eigengene_network(gcn, palette = "PRGn")
plot_eigengene_network(gcn, palette = "PRGn")
gcn |
List object returned by |
palette |
Character indicating the name of the RColorBrewer palette to use. Default: "PRGn". |
A base plot with the eigengene network
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") plot_eigengene_network(gcn)
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") plot_eigengene_network(gcn)
Plot expression profile of given genes across samples
plot_expression_profile( genes, exp, metadata, metadata_cols = 1, plot_module = TRUE, net, modulename, bg_line = "mean" )
plot_expression_profile( genes, exp, metadata, metadata_cols = 1, plot_module = TRUE, net, modulename, bg_line = "mean" )
genes |
Character vector containing a subset of genes from which
edges will be extracted. It can be ignored if |
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
metadata |
A data frame of sample metadata containing sample names in row names and sample annotation in subsequent columns. Ignored if 'exp' is a 'SummarizedExperiment' object, since colData will be automatically extracted. |
metadata_cols |
A character or numeric scalar indicating which column should be extracted from column metadata if exp is a 'SummarizedExperiment' object. The column to be extracted can be represented by indices (numeric) or column names (character). By default, the first column is used. |
plot_module |
Logical indicating whether to plot a whole module or not.
If set to FALSE, |
net |
List object returned by |
modulename |
Name of the module to plot. |
bg_line |
Character indicating what to show in the background (black) line. One of "mean" or "median". Default: "mean". |
A ggplot object showing the expression profile of some genes across all samples.
Fabricio Almeida-Silva
data(zma.se) data(filt.se) genes <- rownames(filt.se) plot_expression_profile(genes = genes, exp = zma.se, plot_module = FALSE)
data(zma.se) data(filt.se) genes <- rownames(filt.se) plot_expression_profile(genes = genes, exp = zma.se, plot_module = FALSE)
Plot gene coexpression network from edge list
plot_gcn( edgelist_gcn, net, color_by = "module", hubs = NULL, show_labels = "tophubs", top_n_hubs = 5, curvature = 0, interactive = FALSE, dim_interactive = c(600, 600) )
plot_gcn( edgelist_gcn, net, color_by = "module", hubs = NULL, show_labels = "tophubs", top_n_hubs = 5, curvature = 0, interactive = FALSE, dim_interactive = c(600, 600) )
edgelist_gcn |
Data frame containing the edge list for the GCN.
The edge list can be generated with |
net |
List object returned by |
color_by |
How should nodes be colored? It must be either "module" (nodes will have the colors of their modules) or a 2-column data frame containing genes in the first column and a custom gene annotation in the second column. Default: "module". |
hubs |
Data frame containing hub genes in the first column, their modules in the second column, and intramodular connectivity in the third column. |
show_labels |
Character indicating which nodes will be labeled. One of "all", "allhubs", "tophubs", or "none". Default: tophubs. |
top_n_hubs |
Number of top hubs to be labeled. It is only valid
if |
curvature |
Numeric indicating the amount of curvature in edges. Negative values produce left-hand curves, positive values produce right-hand curves, and zero produces a straight line. Default: 0.1. |
interactive |
Logical indicating whether the network should be interactive or not. Default is FALSE. |
dim_interactive |
Numeric vector with width and height of window for interactive plotting. Default: c(600,600). |
A ggplot object.
Fabricio Almeida-Silva
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") gcn_edges <- get_edge_list(gcn, module="brown", filter=TRUE, method="min_cor") hubs <- get_hubs_gcn(filt.se, gcn) p <- plot_gcn(gcn_edges, gcn, hubs = hubs)
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") gcn_edges <- get_edge_list(gcn, module="brown", filter=TRUE, method="min_cor") hubs <- get_hubs_gcn(filt.se, gcn) p <- plot_gcn(gcn_edges, gcn, hubs = hubs)
Plot a heatmap of gene significance
plot_gene_significance(corandp, palette = "RdYlBu", transpose = FALSE, ...)
plot_gene_significance(corandp, palette = "RdYlBu", transpose = FALSE, ...)
corandp |
A data frame of gene-trait correlations as returned
by |
palette |
Character indicating which RColorBrewer palette to use. Default: 'RdYlBu'. |
transpose |
Logical indicating whether to transpose the heatmap or not. |
... |
Additional arguments to |
Significance levels: 1 asterisk: significant at alpha = 0.05. 2 asterisks: significant at alpha = 0.01. 3 asterisks: significant at alpha = 0.001. no asterisk: not significant.
A 'Heatmap' object created by ComplexHeatmap::pheatmap()
.
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") corandp <- gene_significance(filt.se) plot_gene_significance(corandp, show_rownames = FALSE)
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") corandp <- gene_significance(filt.se) plot_gene_significance(corandp, show_rownames = FALSE)
Plot gene regulatory network from edge list
plot_grn( edgelist_grn, show_labels = "tophubs", top_n_hubs = 5, layout = igraph::with_kk, arrow.gap = 0.01, ranked = TRUE, curvature = 0.1, interactive = FALSE, dim_interactive = c(600, 600) )
plot_grn( edgelist_grn, show_labels = "tophubs", top_n_hubs = 5, layout = igraph::with_kk, arrow.gap = 0.01, ranked = TRUE, curvature = 0.1, interactive = FALSE, dim_interactive = c(600, 600) )
edgelist_grn |
Data frame containing the edge list for the GRN network. First column is the TF and second column is the target gene. All other columns are interpreted as edge attributes. |
show_labels |
Character indicating which nodes will be labeled. One of "all", "allhubs", "tophubs", or "none". |
top_n_hubs |
Number of top hubs to be labeled. It is only valid
if |
layout |
igraph function for the network layout. One of with_dh, with_drl, with_gem, with_lgl, with_fr, with_graphopt, with_kk and with_mds. Default is with_kk. |
arrow.gap |
Numeric indicating the distance between nodes and arrows. Default is 0.2. |
ranked |
Logical indicating whether to treat third column of the edge list (edge weights) as ranked values. Default: TRUE. |
curvature |
Numeric indicating the amount of curvature in edges. Negative values produce left-hand curves, positive values produce right-hand curves, and zero produces a straight line. Default: 0.1. |
interactive |
Logical indicating whether the network should be interactive or not. Default is FALSE. |
dim_interactive |
Numeric vector with width and height of window for interactive plotting. Default: c(600,600). |
A ggplot object containing the network.
Fabricio Almeida-Silva
data(filt.se) tfs <- sample(rownames(filt.se), size = 50, replace = FALSE) grn_edges <- grn_infer(filt.se, method = "clr", regulators = tfs) p <- plot_grn(grn_edges, ranked = FALSE)
data(filt.se) tfs <- sample(rownames(filt.se), size = 50, replace = FALSE) grn_edges <- grn_infer(filt.se, method = "clr", regulators = tfs) p <- plot_grn(grn_edges, ranked = FALSE)
Plot heatmap of hierarchically clustered sample correlations or gene expression
plot_heatmap( exp, col_metadata = NA, row_metadata = NA, coldata_cols = NULL, rowdata_cols = NULL, type = "samplecor", cor_method = "spearman", palette = NULL, log_trans = FALSE, ... )
plot_heatmap( exp, col_metadata = NA, row_metadata = NA, coldata_cols = NULL, rowdata_cols = NULL, type = "samplecor", cor_method = "spearman", palette = NULL, log_trans = FALSE, ... )
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
col_metadata |
A data frame containing sample names in row names and sample annotation in the subsequent columns. The maximum number of columns is 3 to ensure legends can be visualized. Ignored if 'exp' is a 'SummarizedExperiment' object, since the function will extract colData. Default: NA. |
row_metadata |
A data frame containing gene IDs in row names and gene functional classification in the first column. The maximum number of columns is 3 to ensure legends can be visualized. Default: NA. |
coldata_cols |
A vector (either numeric or character) indicating which columns should be extracted from column metadata if exp is a 'SummarizedExperiment' object. The vector can contain column indices (numeric) or column names (character). By default, all columns are used. |
rowdata_cols |
A vector (either numeric or character) indicating which columns should be extracted from row metadata if exp is a 'SummarizedExperiment' object. The vector can contain column indices (numeric) or column names (character). By default, all columns are used. |
type |
Type of heatmap to plot. One of 'samplecor' (sample correlations) or 'expr'. Default: 'samplecor'. |
cor_method |
Correlation method to use in case type is "samplecor". One of 'spearman' or 'pearson'. Default is 'spearman'. |
palette |
RColorBrewer palette to use. Default is "Blues" for sample correlation heatmaps and "YlOrRd" for gene expression heatmaps. |
log_trans |
Logical indicating whether to log transform the expression data or not. Default: FALSE. |
... |
Additional arguments to be passed
to |
A heatmap of sample correlations or gene expression.
Fabricio Almeida-Silva
data(filt.se) plot_heatmap(filt.se)
data(filt.se) plot_heatmap(filt.se)
Plot a heatmap of module-trait correlations
plot_module_trait_cor(corandp, palette = "RdYlBu", transpose = FALSE)
plot_module_trait_cor(corandp, palette = "RdYlBu", transpose = FALSE)
corandp |
A data frame of module-trait correlations as returned
by |
palette |
Character indicating which RColorBrewer palette to use. Default: 'RdYlBu'. |
transpose |
Logical indicating whether to transpose the heatmap or not. |
Significance levels: 1 asterisk: significant at alpha = 0.05. 2 asterisks: significant at alpha = 0.01. 3 asterisks: significant at alpha = 0.001. no asterisk: not significant.
A 'Heatmap' object created by ComplexHeatmap::pheatmap()
.
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") corandp <- module_trait_cor(filt.se, MEs = gcn$MEs) plot_module_trait_cor(corandp)
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") corandp <- module_trait_cor(filt.se, MEs = gcn$MEs) plot_module_trait_cor(corandp)
Plot number of genes per module
plot_ngenes_per_module(net = NULL)
plot_ngenes_per_module(net = NULL)
net |
List object returned by |
A ggplot object with a bar plot of gene number in each module.
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") plot_ngenes_per_module(gcn)
data(filt.se) gcn <- exp2gcn(filt.se, SFTpower = 18, cor_method = "pearson") plot_ngenes_per_module(gcn)
Plot Principal Component Analysis (PCA) of samples
plot_PCA( exp, metadata, metadata_cols = NULL, log_trans = FALSE, PCs = c(1, 2), size = 2 )
plot_PCA( exp, metadata, metadata_cols = NULL, log_trans = FALSE, PCs = c(1, 2), size = 2 )
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
metadata |
A data frame of sample metadata containing sample names in row names and sample annotation in subsequent columns. Ignored if 'exp' is a 'SummarizedExperiment' object, since colData will be automatically extracted. |
metadata_cols |
A vector (either numeric or character) indicating which columns should be extracted from column metadata if exp is a 'SummarizedExperiment' object. The vector can contain column indices (numeric) or column names (character). By default, all columns are used. |
log_trans |
Logical indicating whether the gene expression matrix
should be log transformed using |
PCs |
Numeric vector of length 2 indicating the principal components
to be plotted on the x-axis and y-axis, respectively.
Default: |
size |
Numeric indicating the point size. Default is 2. |
A ggplot object with the PCA plot.
Fabricio Almeida-Silva
data(zma.se) plot_PCA(zma.se, log_trans = TRUE)
data(zma.se) plot_PCA(zma.se, log_trans = TRUE)
Plot protein-protein interaction network from edge list
plot_ppi( edgelist_int, color_by = "community", clustering_method = igraph::cluster_infomap, show_labels = "tophubs", top_n_hubs = 5, add_color_legend = TRUE, curvature = 0, interactive = FALSE, dim_interactive = c(600, 600) )
plot_ppi( edgelist_int, color_by = "community", clustering_method = igraph::cluster_infomap, show_labels = "tophubs", top_n_hubs = 5, add_color_legend = TRUE, curvature = 0, interactive = FALSE, dim_interactive = c(600, 600) )
edgelist_int |
Data frame containing the edge list for the PPI network. First column is the protein 1 and second column is the protein 2. All other columns are interpreted as edge attributes. |
color_by |
How should nodes be colored? It must be either "community" or a 2-column data frame containing proteins in the first column and a custom annotation in the second column. If "community", a clustering algorithm will be applied. Default: "community". |
clustering_method |
igraph function to be used for community detection. Available functions are cluster_infomap, cluster_edge_betweenness, cluster_fast_greedy, cluster_walktrap, cluster_spinglass, cluster_leading_eigen, cluster_louvain, and cluster_label_prop. Default is cluster_infomap. |
show_labels |
Character indicating which nodes will be labeled. One of "all", "allhubs", "tophubs", or "none". |
top_n_hubs |
Number of top hubs to be labeled. It is only valid
if |
add_color_legend |
Logical indicating whether to add a color legend for nodes. Default: TRUE. |
curvature |
Numeric indicating the amount of curvature in edges. Negative values produce left-hand curves, positive values produce right-hand curves, and zero produces a straight line. Default: 0. |
interactive |
Logical indicating whether the network should be interactive or not. Default is FALSE. |
dim_interactive |
Numeric vector with width and height of window for interactive plotting. Default: c(600,600). |
A ggplot object.
Fabricio Almeida-Silva
ppi_edges <- igraph::sample_pa(n = 500) ppi_edges <- igraph::as_edgelist(ppi_edges) p <- plot_ppi(ppi_edges, add_color_legend = FALSE)
ppi_edges <- igraph::sample_pa(n = 500) ppi_edges <- igraph::as_edgelist(ppi_edges) p <- plot_ppi(ppi_edges, add_color_legend = FALSE)
Quantile normalize the expression data
q_normalize(exp)
q_normalize(exp)
exp |
A gene expression data frame with genes in row names and samples in column names. |
Expression matrix with normalized values
data(zma.se) exp <- SummarizedExperiment::assay(zma.se) norm_exp <- q_normalize(exp)
data(zma.se) exp <- SummarizedExperiment::assay(zma.se) norm_exp <- q_normalize(exp)
Remove genes that are not expressed based on a user-defined threshold
remove_nonexp( exp, method = "median", min_exp = 1, min_percentage_samples = 0.25 )
remove_nonexp( exp, method = "median", min_exp = 1, min_percentage_samples = 0.25 )
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
method |
Criterion to filter non-expressed genes out. One of "mean", "median", "percentage", or "allsamples". Default is "median". |
min_exp |
If method is 'mean', 'median', or 'allsamples', the minimum value for a gene to be considered expressed. If method is 'percentage', the minimum value each gene must have in at least n percent of samples to be considered expressed. |
min_percentage_samples |
In case the user chooses 'percentage' as method, expressed genes must have expression >= min_exp in at least this percentage. Values must range from 0 to 1. |
Filtered gene expression data frame or 'SummarizedExperiment' object.
Fabricio Almeida-Silva
data(zma.se) filt_exp <- remove_nonexp(zma.se, min_exp = 5)
data(zma.se) filt_exp <- remove_nonexp(zma.se, min_exp = 5)
Remove missing values in a gene expression data frame
replace_na(exp, replaceby = 0)
replace_na(exp, replaceby = 0)
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
replaceby |
What to use instead of NAs. One of 0 or 'mean'. Default is 0. |
Gene expression data frame or 'SummarizedExperiment' object with all NAs replaced according to the argument 'replaceby'.
Fabricio Almeida-Silva
data(zma.se) exp <- replace_na(zma.se) sum(is.na(exp))
data(zma.se) exp <- replace_na(zma.se) sum(is.na(exp))
Pick power to fit network to a scale-free topology
SFT_fit(exp, net_type = "signed", rsquared = 0.8, cor_method = "spearman")
SFT_fit(exp, net_type = "signed", rsquared = 0.8, cor_method = "spearman")
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
net_type |
Network type. One of 'signed', 'signed hybrid' or 'unsigned'. Default is signed. |
rsquared |
R squared cutoff. Default is 0.8. |
cor_method |
Correlation method. One of "pearson", "biweight" or "spearman". Default is "spearman". |
A list containing:
powerOptimal power based on scale-free topology fit
plotA ggplot object displaying main statistics of the SFT fit test
Fabricio Almeida-Silva
data(filt.se) sft <- SFT_fit(filt.se, cor_method = "pearson")
data(filt.se) sft <- SFT_fit(filt.se, cor_method = "pearson")
Filter outlying samples based on the standardized connectivity (Zk) method
ZKfiltering(exp, zk = -2, cor_method = "spearman")
ZKfiltering(exp, zk = -2, cor_method = "spearman")
exp |
A gene expression data frame with genes in row names and samples in column names or a 'SummarizedExperiment' object. |
zk |
Standardized connectivity threshold. Default is -2. |
cor_method |
Correlation method. One of "pearson", "biweight" or "spearman". Default is "spearman". |
Filtered gene expression data frame or 'SummarizedExperiment' object.
Fabricio Almeida-Silva
Oldham, M. C., Langfelder, P., & Horvath, S. (2012). Network methods for describing sample relationships in genomic datasets: application to Huntington’s disease. BMC systems biology, 6(1), 1-18.
data(zma.se) filt_exp <- ZKfiltering(zma.se)
data(zma.se) filt_exp <- ZKfiltering(zma.se)
Interpro protein domain annotation retrieved from the
PLAZA Monocots 4.0 database.
Only genes included in zma.se
are present in this subset.
data(zma.interpro)
data(zma.interpro)
A 2-column data frame containing gene IDs and their associated Interpro annotations.
Van Bel, M., Diels, T., Vancaester, E., Kreft, L., Botzki, A., Van de Peer, Y., ... & Vandepoele, K. (2018). PLAZA 4.0: an integrative resource for functional, evolutionary and comparative plant genomics. Nucleic acids research, 46(D1), D1190-D1196.
data(zma.interpro)
data(zma.interpro)
Filtered expression data in transcripts per million (TPM) from Shin et al., 2021. Genes with TPM values <5 in more than 60 were removed to reduce package size. The expression data and associated sample metadata are stored in a SummarizedExperiment object.
data(zma.se)
data(zma.se)
An object of class SummarizedExperiment
Shin, J., Marx, H., Richards, A., Vaneechoutte, D., Jayaraman, D., Maeda, J., ... & Roy, S. (2021). A network-based comparative framework to study conservation and divergence of proteomes in plant phylogenies. Nucleic Acids Research, 49(1), e3-e3.
data(zma.se)
data(zma.se)
Transcription factors and their families were downloaded from PlantTFDB 4.0.
data(zma.tfs)
data(zma.tfs)
A data frame with gene IDs of TFs and their associated families.
Jin, J., Tian, F., Yang, D. C., Meng, Y. Q., Kong, L., Luo, J., & Gao, G. (2016). PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic acids research, gkw982.
data(zma.tfs)
data(zma.tfs)