Title: | Gene regulator enrichment analysis |
---|---|
Description: | This package is a pipeline to identify the key gene regulators in a biological process, for example in cell differentiation and in cell development after stimulation. There are four major steps in this pipeline: (1) differential expression analysis; (2) regulator-target network inference; (3) enrichment analysis; and (4) regulators scoring and ranking. |
Authors: | Weiyang Tao [cre, aut], Aridaman Pandit [aut] |
Maintainer: | Weiyang Tao <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.17.0 |
Built: | 2024-12-17 06:17:22 UTC |
Source: | https://github.com/bioc/RegEnrich |
DeaSet class
colData
DataFrame object of sample information. Its row names correspond to the column names of the expression matrix in the assays slot.
assays
SimpleList object of one or more matrices. This slot stores the expression data after filtering (and after Variance Stabilizing Transformation, i.e. VST, if the differential analysis method is 'Wald_DESeq2' or 'LRT_DESeq2'). The expression matrix is used for network inference and plotting.
NAMES
row names of the expression data in the assays slot and of the elementMetadata slot.
elementMetadata
feature information, contains at least a DataFrame of three columns, i.e. 'gene', 'p' and 'logFC', which stores gene names/IDs, differential p values and log2 expression fold changes, respectively.
metadata
DataFrame object, information of feature columns.
assayRaw
a slot for saving the raw expression data.
nrows = 100
ncols = 6
counts = matrix(rnbinom(nrows * ncols, size = 2, mu = 500), nrow = nrows)
assays = SimpleList(assayData = counts)
colData = DataFrame(Condition = rep(c("treatment", "ctrl"), 3),
                    row.names = LETTERS[1:6])
geneNames = sprintf("G%03s", seq(nrows))
elementMetadata = DataFrame(gene = geneNames, p = numeric(nrows),
                            logFC = numeric(nrows))
ds = new("DeaSet", assays = Assays(assays), colData = colData,
         assayRaw = counts, elementMetadata = elementMetadata,
         NAMES = geneNames)
ds
Dimension of 'TopNetwork' object
## S4 method for signature 'TopNetwork'
dim(x)
x |
a 'TopNetwork' object. |
Dimension of the regulator-target network edge table.
nw = newTopNetwork()
dim(nw)
The 'Enrich' object stores enrichment analysis results produced by either the 'FET' method or the 'GSEA' method.
topResult
data frame. The enrichment results that pass thresholds (default threshold is 0.05).
allResult
data frame. The enrichment results by FET or GSEA methods.
gene
character vector indicating the genes used for enrichment analysis.
namedScores
numeric vector of scores ranked in descending order; the names of the scores are the genes used for enrichment analysis. Here the scores are the p-values of the genes.
type
character indicating enrichment method, either 'FET' or 'GSEA'.
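To make the 'FET' option concrete, the following base-R sketch tests whether the targets of one regulator are over-represented among differentially expressed genes using Fisher's exact test. The gene names and set sizes are invented for illustration, and the code calls stats::fisher.test directly; it is not RegEnrich's internal implementation.

```r
# Hypothetical universe of 100 genes (names are illustrative only)
universe <- sprintf("g%03d", 1:100)
targets  <- universe[1:20]             # assumed targets of one regulator
deGenes  <- universe[c(1:15, 50:54)]   # assumed differentially expressed genes

# 2 x 2 contingency table: target membership vs. DE status
isTarget <- factor(universe %in% targets, levels = c(TRUE, FALSE))
isDE     <- factor(universe %in% deGenes, levels = c(TRUE, FALSE))
tab <- table(isTarget, isDE)

# One-sided test: are DE genes over-represented among the targets?
fet <- fisher.test(tab, alternative = "greater")
overlap <- length(intersect(targets, deGenes))
fet$p.value
```

A small p-value here would mark the regulator as enriched; in an 'Enrich' object such per-regulator results would populate allResult, with those passing the threshold kept in topResult.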
Infer the names of the results of a DESeq analysis from a formula (or model matrix) and sample information
getResultsNames(design, pData = NULL)
design |
either a formula or a model matrix. |
pData |
a data frame containing the information of each sample. If design is a formula, pData must include columns whose names are identical to the terms of the design formula. If design is a model matrix, then pData is not used. Default is NULL. |
The names of the contrast parameter (in list of character format) that the regenrich_diffExpr and results functions can use. It is the same as the value that the resultsNames function returns.
# formula with intercept
design = ~condition
pData = data.frame(condition = factor(c('A', 'A', 'A', 'B', 'B', 'B'),
                                      c('A', 'B')))
getResultsNames(design, pData)
# formula without intercept
design = ~0+condition
getResultsNames(design, pData)
# formula with two terms
design = ~condition+treatment
pData = data.frame(condition = factor(rep(c('A', 'B'), each= 4), c('A', 'B')),
                   treatment = factor(rep_len(c('Ctrl', 'Treat'), 8),
                                      c('Ctrl', 'Treat')))
getResultsNames(design, pData)
# formula with two terms and an interaction term
design = ~condition+treatment+condition:treatment
getResultsNames(design, pData)
# design is a model matrix
pData = data.frame(condition = factor(rep(c('A', 'B'), each= 4), c('A', 'B')),
                   treatment = factor(rep_len(c('Ctrl', 'Treat'), 8),
                                      c('Ctrl', 'Treat')))
design = model.matrix(~condition+treatment, pData)
getResultsNames(design)
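As background for the examples above, this base-R sketch shows how a design formula expands into model-matrix columns, which is where coefficient names come from. It uses only stats::model.matrix; the names returned by getResultsNames may be formatted differently, so treat this as an illustration of the underlying design rather than of the function's exact output.

```r
# Sample information with one two-level factor (reference level 'A')
pData <- data.frame(
  condition = factor(c("A", "A", "A", "B", "B", "B"), levels = c("A", "B"))
)

# With an intercept: one baseline column plus one column for B relative to A
withIntercept <- colnames(model.matrix(~condition, pData))

# Without an intercept: one column per level
noIntercept <- colnames(model.matrix(~0 + condition, pData))

withIntercept  # "(Intercept)" "conditionB"
noIntercept    # "conditionA"  "conditionB"
```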
head or tail of Score object
## S4 method for signature 'Score'
head(x, ...)
## S4 method for signature 'Score'
tail(x, ...)
x |
an |
... |
arguments to be passed to or from other methods. |
Head or tail table of Score object.
s = newScore(letters, seq(26), seq(26), seq(26), seq(2, 0, len = 26))
s1 = head(s)
s1
s2 = tail(s)
s2
Data from an RNA sequencing experiment on peripheral blood mononuclear cells (PBMCs) of Lyme disease patients and healthy controls. It contains a gene expression (FPKM) table (data frame) and a sample information table (data frame).
data(Lyme_GSE63085)
A list of 2 elements: FPKM and sampleInfo. FPKM is the 'Fragments Per Kilobase of transcript per Million mapped reads' data, a 5000 (genes) * 52 (samples) data frame. sampleInfo is the sample information, a 52 (samples) * 9 (features) data frame. The full version of the FPKM table contains 23615 rows and can be downloaded from the GEO database.
Bouquet et al. (2016) mBio 7(1): e00100-16 (PubMed)
DeaSet object creator
newDeaSet( assayRaw = matrix(nrow = 0, ncol = 0), rowData = NULL, assays = SimpleList(), colData = DataFrame(), metadata = list() )
assayRaw |
A matrix of gene expression data. This can be the same
as the matrix-like element in |
rowData |
A DataFrame object describing the rows. |
assays |
A list or SimpleList of matrix-like element,
or a matrix-like object. The matrix-like element can be the same
as |
colData |
A DataFrame describing the sample information. |
metadata |
An optional list of arbitrary content describing the overall experiment. |
A DeaSet object.
# Empty DeaSet object
newDeaSet()
# 100 * 6 DeaSet object
nrows = 100
ncols = 6
counts = matrix(rnbinom(nrows * ncols, size = 2, mu = 500), nrow = nrows)
assays = SimpleList(counts=counts)
colData = DataFrame(Condition = rep(c("treatment", "ctrl"), 3),
                    row.names=LETTERS[1:6])
geneNames = sprintf("G%03s", seq(nrows))
elementMetadata = DataFrame(gene = geneNames, p = numeric(nrows),
                            logFC = numeric(nrows))
newDeaSet(assayRaw = counts, rowData = elementMetadata,
          assays = SimpleList(assayData = counts), colData = colData)
This function creates a 'TopNetwork' object from a 3-column edge table.
newTopNetwork( networkEdgeTable, reg = "", directed = TRUE, networkConstruction = c("new", "COEN", "GRN"), percent = 100 )
networkEdgeTable |
a data frame of 3 columns, representing 'from.gene' ('regulators'), 'to.gene' ('targets') and 'weight', respectively. |
reg |
a vector of gene regulators. |
directed |
logical, whether the network is directed. Default is TRUE. |
networkConstruction |
the method used to construct this network. Possible values are: 'COEN', coexpression network; 'GRN', gene regulatory network by random forest; 'new' (default), meaning a network provided by the user rather than inferred from the expression data. |
percent |
the percentage of edges in the original whole network. Default is 100, meaning 100% edges in whole network. |
An object of 'TopNetwork' class.
data(TFs)
edge = data.frame(from = rep(TFs$TF_name[seq(3)], seq(3)),
                  to = TFs$TF_name[11:16], weight = 0.1*(6:1))
object = newTopNetwork(edge, networkConstruction = 'new', percent = 100)
object
str(object)
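The 'percent' argument describes what share of the whole network's edges an edge table represents. As a rough illustration, the base-R sketch below keeps the top share of edges by weight from a 3-column edge table. Ranking by weight is an assumption made here for illustration (the source does not state how top edges are chosen), and the gene names and the 25% value are hypothetical.

```r
set.seed(42)

# Hypothetical 3-column edge table: regulator, target, weight
edges <- data.frame(
  from.gene = rep(c("TF1", "TF2"), each = 10),
  to.gene   = sprintf("gene%02d", 1:20),
  weight    = runif(20),
  stringsAsFactors = FALSE
)

# Assumed selection rule: keep the strongest 'percent' of edges by weight
percent <- 25
nKeep <- ceiling(nrow(edges) * percent / 100)
topEdges <- edges[order(edges$weight, decreasing = TRUE)[seq_len(nKeep)], ]
topEdges
```

A table such as topEdges has the shape that networkEdgeTable expects, so it could then be passed to newTopNetwork together with the matching percent value.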
Plot FET/GSEA enrichment results. If the FET method is applied, the top 'showCategory' regulator will be plotted. If the GSEA method is applied, the GSEA graph of regulator 'reg' will be plotted.
plot_Enrich(object, ...)
## S4 method for signature 'RegenrichSet'
plot_Enrich(object, reg = NULL, showCategory = 20, regDescription = NULL, font.size = 12)
object |
a |
... |
other parameters. |
reg |
The regulator to plot. This only works when the GSEA enrichment method has been used. |
showCategory |
the number of regulators to plot. |
regDescription |
NULL or a two-column data frame, in which first column is the regulator IDs (for example ENSEMBL IDs), and the second column is the description of regulators (for example gene name). Default is NULL, meaning both columns are the same regulator names/IDs in the network. |
font.size |
font size of axis labels and axis tick mark labels, default is 12. |
A ggplot object showing the FET or GSEA enrichment result.
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")
data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo
# Take first 2000 rows for example
data1 = data[seq(2000), ]
design = model.matrix(~0 + patientID + week, data = colData)
object = RegenrichSet(expr = data1, colData = colData, method = "limma",
                      minMeanExpr = 0, design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = "COEN", enrichTest = "FET")
# Differential expression analysis
object = regenrich_diffExpr(object)
# Network inference using "COEN" method
object = regenrich_network(object)
# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object)
# plot
plot_Enrich(object)
# Enrichment analysis by gene set enrichment analysis (GSEA)
object = regenrich_enrich(object, enrichTest = "GSEA")
# plot
plot_Enrich(object)
Compare the orders of two vectors
plotOrders(name1, name2)
name1 |
a vector with first order. |
name2 |
a vector with the second order. |
A plot comparing the orders of two vectors.
a = c('a1', 'a2', 'a5', 'a4')
b = c( 'a2', 'a5', 'a7', 'a4', 'a6')
plotOrders(a, b)
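To see what such a comparison involves numerically, the following base-R sketch tabulates the position of each shared element in the two vectors from the example above. It is only an illustration of the idea; how plotOrders treats elements that appear in just one vector is not shown here.

```r
a <- c("a1", "a2", "a5", "a4")
b <- c("a2", "a5", "a7", "a4", "a6")

# Elements present in both vectors, in the order they appear in 'a'
shared <- intersect(a, b)

# Position of each shared element in each vector
ranks <- data.frame(
  name    = shared,
  rankInA = match(shared, a),
  rankInB = match(shared, b)
)
ranks
```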
Plot regulator and its targets expression
plotRegTarExpr( object, reg, n = 1000, scale = TRUE, tarCol = "black", tarColAlpha = 0.1, regCol = "#ffaa00", xlab = "Samples", ylab = "Z-scores", ... )
object |
a RegenrichSet object, to which at
least |
reg |
a regulator to plot. |
n |
the maximum number of targets to plot. |
scale |
logical, whether gene expression is z-score normalized. |
tarCol |
the color of the lines for the targets of the regulator. |
tarColAlpha |
numeric, ranging from 0 to 1, indicating transparency of target lines. |
regCol |
the color of the line for the 'reg'. |
xlab |
x label of plot. |
ylab |
y label of plot. |
... |
other parameters in |
a ggplot object.
# constructing a RegenrichSet object
colData = data.frame(patientID = paste0('Sample_', seq(50)),
                     week = rep(c('0', '1'), each = 25),
                     row.names = paste0('Sample_', seq(50)),
                     stringsAsFactors = TRUE)
design = ~week
reduced = ~1
set.seed(123)
cnts = matrix(as.integer(rnbinom(n=1000*50, mu=100, size=1/0.1)), ncol=50,
              dimnames = list(paste0('gene', seq(1000)), rownames(colData)))
cnts[5,26:50] = cnts[5,26:50] + 50L # add reads to gene5 in some samples.
id = sample(31:1000, 20) # randomly select 20 rows, and assign reads.
cnts[id,] = vapply(cnts[5,], function(x){
  as.integer(rnbinom(n = 20, size = 1/0.02, mu = x))},
  FUN.VALUE = rep(1L, 20))
object = RegenrichSet(expr = cnts, colData = colData, method = 'LRT_DESeq2',
                      minMeanExpr = 0, design = design, reduced = reduced,
                      fitType = 'local', networkConstruction = 'COEN',
                      enrichTest = 'FET', reg = paste0('gene', seq(30)))
## RegEnrich analysis
object = regenrich_diffExpr(object)
# Set a random softPower, otherwise it is difficult to achieve a
# scale-free network because of the randomly generated count data.
object = regenrich_network(object, softPower = 3)
object = regenrich_enrich(object)
object = regenrich_rankScore(object)
## plot expression of a regulator and its targets.
plotRegTarExpr(object, reg = 'gene5')
plotRegTarExpr(object, reg = 'gene27')
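The 'scale' argument refers to z-score normalization of gene expression. A minimal base-R sketch of row-wise z-scoring follows, assuming genes are in rows and samples in columns; the simulated matrix is illustrative, and plotRegTarExpr may differ in details such as handling of constant rows.

```r
set.seed(1)

# Simulated expression matrix: 5 genes (rows) x 8 samples (columns)
expr <- matrix(rnorm(5 * 8, mean = 10, sd = 2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("S", 1:8)))

# scale() standardizes columns, so transpose to standardize each gene,
# then transpose back to the genes-by-samples layout
z <- t(scale(t(expr)))

round(rowMeans(z), 10)   # each gene now has mean 0
apply(z, 1, sd)          # and standard deviation 1
```

Putting every gene on the same scale is what lets a regulator and its targets be drawn on a shared y axis labelled "Z-scores".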
Plot soft power and corresponding scale free topology fitting index to find a proper soft power for WGCNA analysis.
plotSoftPower( expr, rowSample = FALSE, weights = NULL, powerVector = c(seq(10), seq(12, 20, by = 2)), RsquaredCut = 0.85, networkType = "unsigned", removeFirst = FALSE, nBreaks = 10, corFnc = WGCNA::cor, corOptions = list(use = "p") )
expr |
Gene expression data, either a matrix or a data frame. By default, each row represents a gene, each column represents a sample. |
rowSample |
logic. If |
weights |
optional observation weights for |
powerVector |
a vector of soft thresholding powers for which the scale free topology fit indices are to be calculated. |
RsquaredCut |
desired minimum scale free topology fitting index R^2. The default is 0.85. |
networkType |
character, network type. Allowed values are
(unique abbreviations of) "unsigned" (default), "signed", "signed hybrid".
See |
removeFirst |
should the first bin be removed from the connectivity histogram? The default is FALSE. |
nBreaks |
number of bins in connectivity histograms. The default is 10. |
corFnc |
correlation function to be used in adjacency calculation.
The default is the |
corOptions |
a named list of options to the correlation function specified in corFnc. The default is list(use = "p"). |
A list of three elements: powerEstimate, fitIndices, and plot. powerEstimate is an estimate of an appropriate soft-thresholding power. fitIndices is a data frame containing the fit indices for scale free topology. plot is a ggplot object.
data(Lyme_GSE63085)
log2FPKM = log2(Lyme_GSE63085$FPKM + 1)
log2FPKMhi = log2FPKM[rowMeans(log2FPKM) >= 10^-3, , drop = FALSE]
log2FPKMhi = head(log2FPKMhi, 3000) # First 3000 genes for example
softP = plotSoftPower(log2FPKMhi, RsquaredCut = 0.85)
print(softP)
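For intuition about the scale free topology fitting index, here is a simplified base-R sketch: for each soft power it builds an unsigned adjacency, bins gene connectivity into a histogram, and reports the R^2 of a log-log linear fit. The helper scaleFreeR2 and the simulated data are hypothetical, and the calculation is a simplification; the index computed by WGCNA and plotSoftPower may differ in detail.

```r
set.seed(1)
expr <- matrix(rnorm(200 * 20), nrow = 200)  # 200 genes x 20 samples (simulated)

# Hypothetical helper: R^2 of log10(p(k)) against log10(k) for one power
scaleFreeR2 <- function(expr, power, nBreaks = 10) {
  adj <- abs(cor(t(expr)))^power       # unsigned adjacency between genes
  k <- rowSums(adj) - 1                # connectivity, excluding self-correlation
  bins <- cut(k, breaks = nBreaks)     # connectivity histogram
  pk <- as.numeric(table(bins)) / length(k)
  meanK <- as.numeric(tapply(k, bins, mean))
  keep <- pk > 0                       # drop empty bins before taking logs
  fit <- lm(log10(pk[keep]) ~ log10(meanK[keep]))
  summary(fit)$r.squared
}

powers <- c(1, 2, 4, 6)
r2 <- vapply(powers, function(p) scaleFreeR2(expr, p), numeric(1))
data.frame(power = powers, Rsquared = r2)
```

In this simplified picture, a power would be chosen as the smallest one whose fit index reaches RsquaredCut; purely random data like this often never reaches it, which is why the plotRegTarExpr example sets softPower by hand.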
Print Score object
## S3 method for class 'Score'
print(x, ...)
x |
a Score object. |
... |
optional arguments to print. |
print.Score returns a Score object.
x = newScore(letters[1:5], 1:5, 1:5, -2:2, seq(2, 1, len = 5))
print(x)
This is the first step of RegEnrich analysis. Differential expression analysis with this function must be performed on a 'RegenrichSet' object.
regenrich_diffExpr(object, ...)
## S4 method for signature 'RegenrichSet'
regenrich_diffExpr(object, ...)
object |
a 'RegenrichSet' object, which is initialized by
|
... |
arguments for differential analysis. After constructing a 'RegenrichSet' object, all arguments for RegEnrich analysis have been initialized and stored in the 'paramsIn' slot, while the arguments for differential analysis can be re-specified here. |
This function returns a 'RegenrichSet' object with an updated 'resDEA' slot, which is a 'DeaSet' object, and an updated 'paramsIn' slot. See the newDeaSet function for more details about the 'DeaSet' class.
If an argument not in the above list is specified in the regenrich_diffExpr function, a warning or error will be raised.
Initialization of a 'RegenrichSet' object: RegenrichSet, and next step: regenrich_network.
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")
data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo
# Take first 2000 rows for example
data1 = data[seq(2000), ]
design = model.matrix(~0 + patientID + week, data = colData)
# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1, colData = colData, method = 'limma',
                      minMeanExpr = 0, design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN', enrichTest = 'FET')
# Using the predefined parameters in the previous step
(object = regenrich_diffExpr(object))
# re-specifying parameter 'minMeanExpr'
print(slot(object, 'paramsIn')$minMeanExpr)
(object = regenrich_diffExpr(object, minMeanExpr = 1))
print(slot(object, 'paramsIn')$minMeanExpr)
# Unrecognized argument 'unrecognizedArg' (Error)
# object = regenrich_diffExpr(object, minMeanExpr = 1,
#                             unrecognizedArg = 23)
# Argument not for differential expression analysis (Warning)
# print(slot(object, 'paramsIn')$networkConstruction)
# (object = regenrich_diffExpr(object, minMeanExpr = 1,
#                              networkConstruction = 'GRN'))
# print(slot(object, 'paramsIn')$networkConstruction) # not changed
As the third step of RegEnrich analysis, enrichment analysis follows differential expression analysis (regenrich_diffExpr) and regulator-target network inference (regenrich_network).
regenrich_enrich(object, ...)
## S4 method for signature 'RegenrichSet'
regenrich_enrich(object, ...)
object |
a 'RegenrichSet' object, to which
|
... |
arguments for enrichment analysis.
After constructing a 'RegenrichSet' object using |
This function returns a 'RegenrichSet' object with an updated 'resEnrich' slot, which is an 'Enrich' object, and an updated 'paramsIn' slot. See Enrich-class for more details about the 'Enrich' class.
Previous step regenrich_network, and next step regenrich_rankScore.
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")
data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo
# Take first 2000 rows for example
data1 = data[seq(2000), ]
design = model.matrix(~0 + patientID + week, data = colData)
# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1, colData = colData, method = 'limma',
                      minMeanExpr = 0, design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN', enrichTest = 'FET')
# Differential expression analysis
object = regenrich_diffExpr(object)
# Network inference using 'COEN' method
object = regenrich_network(object)
# Enrichment analysis by Fisher's exact test (FET)
(object = regenrich_enrich(object))
# Enrichment analysis by gene set enrichment analysis (GSEA)
(object = regenrich_enrich(object, enrichTest = "GSEA"))
As the second step of RegEnrich analysis, network inference follows differential expression analysis (regenrich_diffExpr).
Provide a network to 'RegenrichSet' object.
regenrich_network(object, ...)
## S4 method for signature 'RegenrichSet'
regenrich_network(object, ...)
regenrich_network(object) <- value
## S4 replacement method for signature 'RegenrichSet,TopNetwork'
regenrich_network(object) <- value
## S4 replacement method for signature 'RegenrichSet,data.frame'
regenrich_network(object) <- value
object |
a 'RegenrichSet' object, to which
|
... |
arguments for network inference.
After constructing a 'RegenrichSet' object using |
value |
either a 'TopNetwork' object or 'data.frame' object. If value is a 'data.frame' object, then the number of columns of |
This function returns a 'RegenrichSet' object with updated 'network' and 'topNetP' slots, which are 'TopNetwork' objects, and an updated 'paramsIn' slot. See TopNetwork-class for more details.
Previous step regenrich_diffExpr, and next step regenrich_enrich. User-defined network: regenrich_network<-
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")
data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo
# Take first 2000 rows for example
data1 = data[seq(2000), ]
design = model.matrix(~0 + patientID + week, data = colData)
# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1, colData = colData, method = 'limma',
                      minMeanExpr = 0, design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN', enrichTest = 'FET')
# Differential expression analysis
(object = regenrich_diffExpr(object))
# Network inference using 'COEN' method
(object = regenrich_network(object))
As the fourth step of RegEnrich analysis, regulator ranking follows differential expression analysis (regenrich_diffExpr), regulator-target network inference (regenrich_network), and enrichment analysis (regenrich_enrich).
regenrich_rankScore(object)
## S4 method for signature 'RegenrichSet'
regenrich_rankScore(object)
object |
a 'RegenrichSet' object, to which
|
This function returns a 'RegenrichSet' object with an updated 'resScore' slot, which is a 'regEnrichScore' (also 'data.frame') object, and an updated 'paramsIn' slot. The 'regEnrichScore' object has five columns: 'reg' (regulator), 'negLogPDEA' (-log10(p values of differential expression analysis)), 'negLogPEnrich' (-log10(p values of enrichment analysis)), 'logFC' (log2 fold changes), and 'score' (RegEnrich ranking score).
Previous step regenrich_enrich.
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")
data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo
# Take first 2000 rows for example
data1 = data[seq(2000), ]
design = model.matrix(~0 + patientID + week, data = colData)
# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1, colData = colData, method = 'limma',
                      minMeanExpr = 0, design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN', enrichTest = 'FET')
# Differential expression analysis
object = regenrich_diffExpr(object)
# Network inference using 'COEN' method
object = regenrich_network(object)
# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object)
# Regulators ranking
(object = regenrich_rankScore(object))
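The column layout of the ranking table can be illustrated with a small base-R sketch. The scoring rule used here, the sum of the min-max-normalized 'negLogPDEA' and 'negLogPEnrich' columns, is an assumption for illustration only; this page does not state the formula that regenrich_rankScore applies. The regulator names, p values, and the helper minMax are hypothetical.

```r
# Hypothetical per-regulator p values from the two earlier steps
regs <- data.frame(
  reg     = c("TF1", "TF2", "TF3", "TF4"),
  pDEA    = c(1e-6, 1e-3, 0.01, 0.5),
  pEnrich = c(1e-4, 1e-8, 0.05, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rescale a numeric vector to the range [0, 1]
minMax <- function(x) (x - min(x)) / (max(x) - min(x))

regs$negLogPDEA    <- -log10(regs$pDEA)
regs$negLogPEnrich <- -log10(regs$pEnrich)

# Assumed rule: add the two normalized evidence columns (range 0 to 2)
regs$score <- minMax(regs$negLogPDEA) + minMax(regs$negLogPEnrich)

ranked <- regs[order(regs$score, decreasing = TRUE), ]
ranked[, c("reg", "negLogPDEA", "negLogPEnrich", "score")]
```

Under this assumed rule a regulator ranks highly only when it is both differentially expressed and enriched among its targets.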
This is the creator function for 'RegenrichSet' objects.
There are four types of parameters in this function.
First, parameters to provide raw data and sample information:
'expr' and 'colData'.
Second, parameters to perform differential expression analysis:
'method', 'minMeanExpr', 'design', 'reduced', 'contrast',
'coef', 'name', 'fitType', 'sfType', 'betaPrior', 'minReplicatesForReplace',
'useT', 'minmu', 'parallel', 'BPPARAM' (also for network inference),
'altHypothesis', 'listValues', 'cooksCutoff', 'independentFiltering',
'alpha', 'filter', 'theta', 'filterFun', 'addMLE', 'blind', 'ndups',
'spacing', 'block', 'correlation', 'weights', 'proportion',
'stdev.coef.lim', 'trend', 'robust', and 'winsor.tail.p'.
Third, parameters to perform regulator-target network inference:
'reg', 'networkConstruction', 'topNetPercent', 'directed', 'rowSample',
'softPower', 'networkType', 'TOMDenom', 'RsquaredCut', 'edgeThreshold',
'K', 'nbTrees', 'importanceMeasure', 'trace',
'BPPARAM' (also for differential expression analysis), and 'minR'.
Fourth, parameters to perform enrichment analysis:
'enrichTest', 'namedScoresCutoffs', 'minSize', 'maxSize', 'pvalueCutoff',
'qvalueCutoff', 'regAltName', 'universe', and 'nperm'.
RegenrichSet( expr, colData, rowData = NULL, method = c("Wald_DESeq2", "LRT_DESeq2", "limma", "LRT_LM"), minMeanExpr = NULL, design, reduced, contrast, coef = NULL, name, fitType = c("parametric", "local", "mean"), sfType = c("ratio", "poscounts", "iterate"), betaPrior, minReplicatesForReplace = 7, useT = FALSE, minmu = 0.5, parallel = FALSE, BPPARAM = bpparam(), altHypothesis = c("greaterAbs", "lessAbs", "greater", "less"), listValues = c(1, -1), cooksCutoff, independentFiltering = TRUE, alpha = 0.1, filter, theta, filterFun, addMLE = FALSE, blind = FALSE, ndups = 1, spacing = 1, block = NULL, correlation, weights = NULL, proportion = 0.01, stdev.coef.lim = c(0.1, 4), trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1), reg = TFs$TF_name, networkConstruction = c("COEN", "GRN", "new"), topNetPercent = 5, directed = FALSE, rowSample = FALSE, softPower = NULL, networkType = "unsigned", TOMDenom = "min", RsquaredCut = 0.85, edgeThreshold = NULL, K = "sqrt", nbTrees = 1000, importanceMeasure = "IncNodePurity", trace = FALSE, minR = 0.3, enrichTest = c("FET", "GSEA"), namedScoresCutoffs = 0.05, minSize = 5, maxSize = 5000, pvalueCutoff = 0.05, qvalueCutoff = 0.2, regAltName = NULL, universe = NULL, nperm = 10000 )
expr |
matrix or data.frame, expression profile of a set of
genes or a set of proteins. If the |
colData |
data frame, sample phenotype data. The rows of colData must correspond to the columns of expr. |
rowData |
NULL or data frame, information of each row/gene. Default is NULL, which will generate a DataFrame of three columns, i.e., "gene", "p", and "logFC". |
method |
either 'Wald_DESeq2', 'LRT_DESeq2', 'limma', or 'LRT_LM' for the differential expression analysis.
|
minMeanExpr |
numeric, the cutoff of gene average expression for pre-filtering. Rows of 'expr' with average expression < minMeanExpr are removed. The higher 'minMeanExpr' is, the more genes are excluded from testing. |
design |
either model formula or model matrix. For method = 'LRT_DESeq2' or 'LRT_LM', the design is the full model formula/matrix. For method = 'limma', and if design is a formula, the model matrix is constructed using model.matrix(design, colData), so the name of each term in the design formula must be included in the column names of 'colData'. |
reduced |
The argument is used only when method = 'LRT_DESeq2' or 'LRT_LM'. It is a reduced formula/matrix to compare against. If the design is a model matrix, 'reduced' must also be a model matrix. |
contrast |
The argument is used only when method = 'LRT_DESeq2',
'Wald_DESeq2', or 'limma'.
When method = 'limma', it can be one of the following two formats:
|
coef |
The argument is used only when method = 'limma'. (Vector of) column number or column name specifying which coefficient or contrast of the linear model is of interest. Default is NULL. |
name |
The argument is used only when method = 'LRT_DESeq2' or
'Wald_DESeq2'.
the name of the individual effect (coefficient) for building a results
table.
Use this argument rather than contrast for continuous variables,
individual
effects or for individual interaction terms. The value provided to
name must
be an element of |
fitType |
either 'parametric', 'local', or 'mean' for the type of
fitting
of dispersions to the mean intensity. This argument is used only when
method =
'Wald_DESeq2' or 'LRT_DESeq2'. See |
sfType |
either 'ratio', 'poscounts', or 'iterate' for the type
of size
factor estimation. This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
betaPrior |
This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
minReplicatesForReplace |
This argument is used only when method
= either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
useT |
This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
minmu |
This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
parallel |
logical, whether to run the computation in parallel (only for differential analysis with method = "Wald_DESeq2" or "LRT_DESeq2"). Default is FALSE. |
BPPARAM |
parameters for parallel computing (default is
|
altHypothesis |
= c('greaterAbs', 'lessAbs', 'greater', 'less').
This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
listValues |
This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
cooksCutoff |
threshold on Cook's distance, such that if one or
more
samples for a row have a distance higher, the p-value for the row is
set to NA.
This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
independentFiltering |
logical, whether independent filtering
should be
applied automatically. This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
alpha |
the significance cutoff used for optimizing the independent
filtering.
This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
filter |
the vector of filter statistics over which the independent
filtering is optimized. By default the mean of normalized counts is used.
This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
theta |
the quantiles at which to assess the number of rejections
from
independent filtering. This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
filterFun |
an optional custom function for performing independent
filtering
and p-value adjustment. This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
addMLE |
if betaPrior=TRUE was used, whether the 'unshrunken' maximum
likelihood estimates (MLE) of log2 fold change should be added as a column
to the results table. This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
blind |
logical, whether to blind the transformation to the
experimental
design. This argument is used only when method = either
'Wald_DESeq2' or 'LRT_DESeq2'. See |
ndups |
positive integer giving the number of times each distinct
probe is
printed on each array. This argument is used only when method = 'limma'.
See |
spacing |
positive integer giving the spacing between duplicate
occurrences of
the same probe, spacing=1 for consecutive rows. This argument is used only
when method = 'limma'. See |
block |
vector or factor specifying a blocking variable on the arrays.
Has length equal to the number of arrays. Must be NULL if ndups > 2.
This argument is used only when method = 'limma'. See |
correlation |
the inter-duplicate or inter-technical replicate
correlation.
The correlation value should be estimated using the
|
weights |
non-negative precision weights. Can be a numeric matrix of
individual weights of same size as the object expression matrix, or a
numeric
vector of array weights with length equal to ncol of the expression matrix,
or a numeric vector of gene weights with length equal to nrow of the
expression
matrix. This argument is used only when method = 'limma' or 'LRT_LM'.
See |
proportion |
numeric value between 0 and 1, assumed proportion of
genes which
are differentially expressed. This argument is used only when method =
'limma'.
See |
stdev.coef.lim |
numeric vector of length 2, assumed lower and
upper limits
for the standard deviation of log2-fold-changes for differentially
expressed
genes. This argument is used only when method = 'limma'.
See |
trend |
logical, should an intensity-trend be allowed for the prior
variance?
This argument is used only when method = 'limma'. See |
robust |
logical, should the estimation of df.prior and var.prior be
robustified against outlier sample variances? This argument is used only
when method = 'limma'. See |
winsor.tail.p |
numeric vector of length 1 or 2, giving left and right
tail proportions of x to Winsorize. Used only when method = 'limma' and
robust=TRUE. See |
reg |
a vector of regulator names (IDs). By default, these are transcription (co-)factors defined by three publications/databases, namely RegNet, TRRUST, and Marbach2016. The type of names or IDs of these regulators (for example ENSEMBL gene ID, Entrez gene ID, or gene symbol/name) must be the same as the type of names or IDs in the regulator-target network. |
networkConstruction |
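the method to construct this network.
Possible values are 'COEN' (coexpression network using WGCNA), 'GRN' (gene regulatory network using random forest), or 'new' (a network provided by the user). |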
topNetPercent |
numeric, what percentage of the top edges in the full network is retained. Default is 5, meaning the top 5% of edges. This value must be between 0 and 100. |
directed |
logical, whether the network is directed. Default is FALSE. |
rowSample |
logical, if TRUE, each row represents a sample. Otherwise, each column represents a sample. Default is FALSE. |
softPower |
numeric, a soft power to achieve scale free topology.
If not provided, the parameter will be picked automatically by
|
networkType |
network type. Allowed values are (unique abbreviations
of)
'unsigned' (default), 'signed', 'signed hybrid'.
See |
TOMDenom |
a character string specifying the TOM variant to be used. Recognized values are 'min' giving the standard TOM described in Zhang and Horvath (2005), and 'mean' in which the min function in the denominator is replaced by mean. The 'mean' may produce better results but at this time should be considered experimental. |
RsquaredCut |
desired minimum scale free topology fitting index R^2. Default is 0.85. |
edgeThreshold |
numeric, the threshold below which low-weight edges are removed. Default is NULL, which means no edges will be removed. |
K |
integer or character. The number of features in each tree; can be either an integer, 'sqrt', or 'all'. 'sqrt' denotes sqrt(the number of 'reg'), 'all' means the number of 'reg'. Default is 'sqrt'. |
nbTrees |
integer. The number of trees. Default is 1000. |
importanceMeasure |
character. importanceMeasure can be '%IncMSE'
or 'IncNodePurity', corresponding to type = 1 and 2 in
|
trace |
logical. Whether to show the progress. Default is FALSE. |
minR |
numeric. The minimum correlation coefficient of prediction, used to control model accuracy. Default is 0.3. |
enrichTest |
character, specifying the enrichment analysis method, which is either 'FET' (Fisher's exact test) or 'GSEA' (gene set enrichment analysis). |
namedScoresCutoffs |
numeric, the significance cutoff for the differential analysis p value. Default is 0.05. |
minSize |
The minimum number (default 5) of target genes. |
maxSize |
The maximum number (default 5000) of target genes. |
pvalueCutoff |
numeric, the significance cutoff for adjusted enrichment p value. This is used for obtaining the 'topResult' slot in the final 'Enrich' object. Default is 0.05. |
qvalueCutoff |
numeric, the significance cutoff of enrichment q-value. Default is 0.2. |
regAltName |
alternative name for regulator. Default is NULL. |
universe |
a character vector of background target genes. |
nperm |
integer, number of permutations. The minimal possible nominal p-value is about 1/nperm. Default is 10000. |
an object of RegenrichSet class.
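The pre-filtering rule described for 'minMeanExpr' can be illustrated with base R alone. This is a minimal sketch on made-up values, assuming the rule is a plain row-mean comparison as the argument description states; it is not taken from the package's internal code, and the names `expr`, `keep`, and `exprFiltered` are only illustrative.

```r
# Toy expression matrix: 5 genes x 4 samples (values are made up)
expr <- matrix(c(0, 0, 1, 0,
                 5, 6, 7, 8,
                 2, 2, 2, 2,
                 0, 1, 0, 1,
                 9, 9, 9, 9),
               nrow = 5, byrow = TRUE,
               dimnames = list(paste0("G", 1:5), paste0("S", 1:4)))

minMeanExpr <- 1

# Rows whose average expression is below the cutoff are dropped
keep <- rowMeans(expr) >= minMeanExpr
exprFiltered <- expr[keep, , drop = FALSE]

rownames(exprFiltered)
```

Raising `minMeanExpr` in this toy example removes more rows, which matches the statement that a higher cutoff excludes more genes from testing.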
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")
data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]
design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')
object
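The contrast vector in the example, `c(rep(0, ncol(design) - 1), 1)`, selects the last column of the design matrix. The sketch below shows this on a small hypothetical sample sheet (three patients, two time points; the levels "P1" to "P3", "0", and "3" are invented and need not match the Lyme_GSE63085 data).

```r
# Hypothetical paired design: 3 patients, each sampled at two time points
colData <- data.frame(
  patientID = factor(rep(c("P1", "P2", "P3"), each = 2)),
  week = factor(rep(c("0", "3"), times = 3))
)

# Same formula as in the example: no intercept, patient effect plus week effect
design <- model.matrix(~0 + patientID + week, data = colData)
colnames(design)

# A 1 in the last position picks the 'week' coefficient; all others are 0
contrast <- c(rep(0, ncol(design) - 1), 1)
names(contrast) <- colnames(design)
contrast
```

With this formula the patient columns come first and the week effect comes last, so the contrast tests the week effect while adjusting for patient.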
The RegenrichSet is the fundamental class that the RegEnrich package works with.
assayRaw
matrix
, the initial raw expression data.
colData
DataFrame
object, indicating sample information.
Each row represents a sample and each column represents a feature of the samples.
assays
SimpleList
object, containing the expression data after
filtering (and after Variance Stabilizing Transformation, i.e. VST, if the
differential analysis method is 'Wald_DESeq2' or 'LRT_DESeq2').
elementMetadata
DataFrame object, a slot for saving the results of differential expression analysis, containing at least three columns: 'gene', 'p' and 'logFC'.
topNetwork
TopNetwork
object, a slot for saving top network
edges. After regulator-target network inference,
a TopNetwork-class
object is assigned to this slot, containing only top ranked edges
in the full network. Default is NULL.
resEnrich
Enrich
object, a slot for saving enrichment analysis
either by Fisher's exact test (FET) or gene set enrichment analysis (GSEA).
resScore
Score
object, a slot for saving regulator ranking
results. It contains five components,
which are 'reg' (regulator), 'negLogPDEA' (-log10(p values of differential
expression analysis)), 'negLogPEnrich' (-log10(p values of enrichment
analysis)), 'logFC' (log2 fold changes), and 'score' (RegEnrich ranking
score).
paramsIn
list. The parameters used in the whole RegEnrich analysis. This slot can be updated by respecifying arguments in each step of RegEnrich analysis.
paramsOut
a list of four elements: DeaMethod (differential expression method), networkType (regulator-target network construction method), percent (what percentage of edges from the full network is used), and enrichTest (enrichment method). By default, each element is NULL.
network
TopNetwork
object, a slot for saving a full network.
results_expr accesses raw expression data.
results_DEA accesses results from differential expression analysis.
results_topNet accesses results from network inference.
results_enrich accesses results from FET/GSEA enrichment analysis.
results_score accesses results from regulator scoring and ranking.
results_expr(object)
results_DEA(object)
results_topNet(object)
results_enrich(object)
results_score(object)
object |
RegenrichSet object. |
results_expr returns an expression matrix.
results_DEA returns a list of results of the differential analysis.
results_topNet returns a TopNetwork object.
results_enrich returns an Enrich object by either FET or GSEA method.
results_score returns a data frame of summarized ranking scores of regulators.
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")
data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]
design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')

# Differential expression analysis
object = regenrich_diffExpr(object)
results_expr(object)
results_DEA(object)

# Network inference using 'COEN' method
object = regenrich_network(object)
results_topNet(object)

# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object)
results_enrich(object)

# Regulators ranking
object = regenrich_rankScore(object)
results_score(object)
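The enrichment step with enrichTest = 'FET' is based on Fisher's exact test. As a rough illustration of that kind of test for a single regulator, the sketch below uses base R's `fisher.test` on invented gene sets. The gene names, set sizes, and the one-sided alternative are assumptions made for the example; the package's own implementation may differ in detail.

```r
# Hypothetical background of 20 genes
universe <- paste0("G", 1:20)
deGenes <- paste0("G", 1:8)          # hypothetical differentially expressed genes
targets <- paste0("G", c(1:6, 15))   # hypothetical targets of one regulator

inTargets <- universe %in% targets
isDE <- universe %in% deGenes

# 2 x 2 contingency table: target membership versus differential expression
tab <- table(target = factor(inTargets, levels = c(TRUE, FALSE)),
             DE = factor(isDE, levels = c(TRUE, FALSE)))
tab

# One-sided test: are the regulator's targets over-represented among DE genes?
res <- fisher.test(tab, alternative = "greater")
res$p.value
```

In a full analysis one such test would be run per regulator, and the resulting p values adjusted for multiple testing before applying cutoffs such as 'pvalueCutoff' and 'qvalueCutoff'.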
The 'Score' class inherits from tibble ("tbl"). Objects of the 'Score' class store regulator ranking scores.
newScore( reg = character(), negLogPDEA = numeric(), negLogPEnrich = numeric(), logFC = numeric(), score = numeric() )
reg |
character, regulator IDs. |
negLogPDEA |
numeric, -log10(p_DEA). |
negLogPEnrich |
numeric, -log10(p_Enrich). |
logFC |
numeric, log2 fold change. |
score |
numeric, RegEnrich ranking score. |
newScore function returns a Score
object.
names
character vector, containing "reg", "negLogPDEA", "negLogPEnrich", "logFC", and "score".
.Data
a list of length 5, in which each element corresponds to an entry of the names slot.
row.names
character, regulators corresponding to .Data
slot.
.S3Class
character vector, containing "tbl_df", "tbl", "data.frame", indicating the classes that 'Score' class inherits.
newScore()
newScore(letters[1:5], 1:5, 1:5, -2:2, seq(2, 1, len = 5))
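To make the five columns concrete, the sketch below builds a plain data frame with the same column names from made-up p values. The -log10 transforms follow the slot descriptions. The way the final 'score' is combined here (sum of the two min-max rescaled columns) is an assumption made for illustration only and is not confirmed by this documentation; `rescale01` and `scoreTable` are hypothetical names.

```r
# Made-up p values and fold changes for three hypothetical regulators
pDEA <- c(R1 = 0.001, R2 = 0.05, R3 = 0.5)
pEnrich <- c(R1 = 0.001, R2 = 0.0001, R3 = 0.2)
logFC <- c(R1 = 1.5, R2 = -0.8, R3 = 0.1)

# Hypothetical helper: min-max rescaling to the range [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

scoreTable <- data.frame(
  reg = names(pDEA),
  negLogPDEA = -log10(pDEA),
  negLogPEnrich = -log10(pEnrich),
  logFC = logFC,
  stringsAsFactors = FALSE
)

# ASSUMPTION: combine the two rescaled evidence columns by simple addition
scoreTable$score <- rescale01(scoreTable$negLogPDEA) +
  rescale01(scoreTable$negLogPEnrich)

# Rank regulators from highest to lowest score
scoreTable <- scoreTable[order(scoreTable$score, decreasing = TRUE), ]
scoreTable
```

The resulting columns could be passed to `newScore()` in the same order as its arguments, provided the RegEnrich package is loaded.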
Methods for the generic function "show".
## S4 method for signature 'DeaSet'
show(object)

## S4 method for signature 'TopNetwork'
show(object)

## S4 method for signature 'Enrich'
show(object)

## S4 method for signature 'Score'
show(object)

## S4 method for signature 'RegenrichSet'
show(object)
object |
one object of either |
show returns the original object invisibly.
x = newScore(letters[1:5], 1:5, 1:5, -2:2, seq(2, 1, len = 5))
show(x)
The transcription factors and co-factors in humans are considered the regulators in RegEnrich. These regulators were obtained from Han et al. (2015), Liu et al. (2015), and Marbach et al. (2016).
data(TFs)
A data.frame with two columns. The first column is the ENSEMBL ID of the gene regulators. The second column is the gene name of the gene regulators. The row names of this data frame are identical to the ENSEMBL ID column.
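Because the row names equal the ID column, such a table can be used as a lookup from IDs to gene names. The sketch below shows this on a toy data frame with placeholder values. The IDs and names are invented, and the first column name `TF` is an assumption; only `TF_name` appears elsewhere in this documentation.

```r
# Toy stand-in for the TFs table (placeholder IDs and names, not real genes)
toyTFs <- data.frame(
  TF = c("ENSG_A", "ENSG_B", "ENSG_C"),      # assumed name for the ID column
  TF_name = c("geneA", "geneB", "geneC"),
  row.names = c("ENSG_A", "ENSG_B", "ENSG_C"),
  stringsAsFactors = FALSE
)

# Look up gene names by ID via the row names
ids <- c("ENSG_C", "ENSG_A")
regNames <- toyTFs[ids, "TF_name"]
regNames
```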
Han et al. (2015) Scientific Reports, 5:11432; Liu et al. (2015) Database, bav095; Marbach et al. (2016) Nature Methods, 13(4):366-70.
The 'TopNetwork' object stores either a full network (the percentage of top edges is 100) or a network of top edges only (the percentage is greater than 0 and less than 100).
element
tibble, the pool of targets in the network.
set
tibble, the pool of valid regulators.
elementset
tibble, regulator-target edges with edge weights, pairing each target (element) with its regulator (set).
directed
logical, whether the network is directed.
networkConstruction
character, by which method this network is constructed. Either 'COEN' (coexpression network using WGCNA), or 'GRN' (gene regulatory network using random forest), or 'new' (a network provided by the user).
percent
numeric, what percentage of the top edges is retained. The value must be between 0 (exclusive) and 100 (inclusive).
active
character, which data table is activated, the default is "elementset".
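The idea of keeping only a top percentage of edges can be sketched in base R on a random edge table. The column names mirror the 'set' (regulator), 'element' (target), and weight layout described above, but the data are simulated, and rounding the number of kept edges with `ceiling` is an assumption; the package may select the cutoff differently.

```r
set.seed(1)

# Simulated full network: 100 edges with random weights
edges <- data.frame(
  set = rep(c("TF1", "TF2"), each = 50),   # hypothetical regulators
  element = paste0("G", 1:100),            # hypothetical targets
  weight = runif(100),
  stringsAsFactors = FALSE
)

topNetPercent <- 5

# Number of edges to keep (ASSUMPTION: round up)
nKeep <- ceiling(nrow(edges) * topNetPercent / 100)

# Sort by decreasing weight and keep the strongest edges
topEdges <- edges[order(edges$weight, decreasing = TRUE), ][seq_len(nKeep), ]
nrow(topEdges)
```

Setting `topNetPercent` to 100 in this sketch returns every edge, which corresponds to the full-network case.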