Package 'RegEnrich' reference manual

Title:	Gene regulator enrichment analysis
Description:	This package is a pipeline to identify the key gene regulators in a biological process, for example in cell differentiation and in cell development after stimulation. There are four major steps in this pipeline: (1) differential expression analysis; (2) regulator-target network inference; (3) enrichment analysis; and (4) regulators scoring and ranking.
Authors:	Weiyang Tao [cre, aut], Aridaman Pandit [aut]
Maintainer:	Weiyang Tao <weiyangtao1513@gmail.com>
License:	GPL (>= 2)
Version:	1.17.0
Built:	2025-02-15 05:45:02 UTC
Source:	https://github.com/bioc/RegEnrich

DeaSet class

Description

DeaSet class

Slots

colData: DataFrame object, sample information, the row name is corresponding to the column names of expression matrix in the assays slot.
assays: SimpleList object of one/multiple matrix/matrices, this is the slot for storing the expression data after filtering (and after Variance Stabilizing Transformation, i.e. VST, if the differential analysis method is 'Wald_DESeq2' or 'LRT_DESeq2'). And the expression matrix is used for network inference and plotting.
NAMES: row names of expression data in assays slot and elementMetadata slot.
elementMetadata: feature information, contains at least a DataFrame of three columns, i.e. 'gene', 'p' and 'logFC', which stores gene names/IDs, differential p values and log2 expression fold changes, respectively.
metadata: DataFrame object, information of feature columns.
assayRaw: a slot for saving the raw expression data.

Examples

nrows = 100
ncols = 6
counts = matrix(rnbinom(nrows * ncols, size = 2, mu = 500),
                nrow = nrows)
assays = SimpleList(assayData = counts)

colData = DataFrame(Condition = rep(c("treatment", "ctrl"), 3),
                    row.names=LETTERS[1:6])
geneNames = sprintf("G%03s", seq(nrows))
elementMetadata = DataFrame(gene = geneNames,
                            p = numeric(nrows),
                            logFC = numeric(nrows))

ds = new("DeaSet",
         assays = Assays(assays),
         colData = colData,
         assayRaw = counts,
         elementMetadata = elementMetadata,
         NAMES = geneNames)
ds


nrows = 100
ncols = 6
counts = matrix(rnbinom(nrows * ncols, size = 2, mu = 500),
                nrow = nrows)
assays = SimpleList(assayData = counts)

colData = DataFrame(Condition = rep(c("treatment", "ctrl"), 3),
                    row.names=LETTERS[1:6])
geneNames = sprintf("G%03s", seq(nrows))
elementMetadata = DataFrame(gene = geneNames,
                            p = numeric(nrows),
                            logFC = numeric(nrows))

ds = new("DeaSet",
         assays = Assays(assays),
         colData = colData,
         assayRaw = counts,
         elementMetadata = elementMetadata,
         NAMES = geneNames)
ds

dimention of 'TopNetwork' object

Description

dimention of 'TopNetwork' object

Usage

## S4 method for signature 'TopNetwork'
dim(x)
## S4 method for signature 'TopNetwork'
dim(x)

Arguments

`x`	a 'TopNetwork' object.

Value

Dimention of regulator-target network edge table.

Examples

nw = newTopNetwork()
dim(nw)
nw = newTopNetwork()
dim(nw)

Enrich class

Description

The 'Enrich' object is to store enrichment analysis results by either 'FET' method or 'GSEA' method.

Slots

topResult: data frame. The enrichment results that pass thresholds (default threshold is 0.05).
allResult: data frame. The enrichment results by FET or GSEA methods.
gene: character vector indicating the genes used for enrichment analysis.
namedScores: numeric vector, a vector of ranked scores (decendent), the names of the scores are the genes to perform enrichment analysis. Here the scores are p-value of each gene.
type: character indicating enrichment method, either 'FET' or 'GSEA'.

Inference the name of results of DESeq analysis by a formula (or model matrix) and sample information

Description

Inference the name of results of DESeq analysis by a formula (or model matrix) and sample information

Usage

getResultsNames(design, pData = NULL)
getResultsNames(design, pData = NULL)

Arguments

`design`	either a formula or a model matrix.
`pData`	a data frame, showing the information of each sample. If design is a formula, the pData must be include the columns that identical to the terms of the design formula. If design is a model matrix, then pData is not used. Default is NULL.

Value

the names of contrast parameter (list of character format) that regenrich_diffExpr and results function can use, and it is the same as the value that resultsNames function returns.

Examples

# formula with intercept
design = ~condition
pData = data.frame(condition = factor(c('A', 'A', 'A', 'B', 'B', 'B'),
                                      c('A', 'B')))
getResultsNames(design, pData)

# formula without intercept
design = ~0+condition
getResultsNames(design, pData)

# formula with two terms
design = ~condition+treatment
pData = data.frame(condition = factor(rep(c('A', 'B'), each= 4),
                                      c('A', 'B')),
                   treatment = factor(rep_len(c('Ctrl', 'Treat'), 8),
                                      c('Ctrl', 'Treat')))
getResultsNames(design, pData)

# formula with two terms and an interaction term
design = ~condition+treatment+condition:treatment
getResultsNames(design, pData)

# design is a model matrix
pData = data.frame(condition = factor(rep(c('A', 'B'), each= 4),
                                      c('A', 'B')),
                   treatment = factor(rep_len(c('Ctrl', 'Treat'), 8),
                                      c('Ctrl', 'Treat')))
design = model.matrix(~condition+treatment, pData)
getResultsNames(design)
# formula with intercept
design = ~condition
pData = data.frame(condition = factor(c('A', 'A', 'A', 'B', 'B', 'B'),
                                      c('A', 'B')))
getResultsNames(design, pData)

# formula without intercept
design = ~0+condition
getResultsNames(design, pData)

# formula with two terms
design = ~condition+treatment
pData = data.frame(condition = factor(rep(c('A', 'B'), each= 4),
                                      c('A', 'B')),
                   treatment = factor(rep_len(c('Ctrl', 'Treat'), 8),
                                      c('Ctrl', 'Treat')))
getResultsNames(design, pData)

# formula with two terms and an interaction term
design = ~condition+treatment+condition:treatment
getResultsNames(design, pData)

# design is a model matrix
pData = data.frame(condition = factor(rep(c('A', 'B'), each= 4),
                                      c('A', 'B')),
                   treatment = factor(rep_len(c('Ctrl', 'Treat'), 8),
                                      c('Ctrl', 'Treat')))
design = model.matrix(~condition+treatment, pData)
getResultsNames(design)

head or tail of Score object

Description

head or tail of Score object

Usage

## S4 method for signature 'Score'
head(x, ...)

## S4 method for signature 'Score'
tail(x, ...)
## S4 method for signature 'Score'
head(x, ...)

## S4 method for signature 'Score'
tail(x, ...)

Arguments

`x`	an `Score` object.
`...`	arguments to be passed to or from other methods.

Value

Head or tail table of Score object.

Examples

s = newScore(letters, seq(26), seq(26), seq(26), seq(2, 0, len = 26))
s1 = head(s)
s1

s2 = tail(s)
s2

s = newScore(letters, seq(26), seq(26), seq(26), seq(2, 0, len = 26))
s1 = head(s)
s1

s2 = tail(s)
s2

Example RNAseq dataset [Human]

Description

Data from an RNA sequencing experiment on peripheral mononuclear blood cells (PBMC) of Lyme disease patients against healthy controls. It contains a gene expression (FPKM) table (data frame) and a sample information table (data frame).

Usage

data(Lyme_GSE63085)
data(Lyme_GSE63085)

Format

A list of 2 elements: FPKM and sampleInfo. FPKM is the 'Fragments Per Kilobase of transcript per Million mapped reads' data, which is a 5000 (genes) * 52 (samples) data frame. sampleInfo is the information of samples, which is 52 (samples) * 9 (features) data frame. The full version of FPKM table contains 23615 rows, which can be downloaded from GEO database.

Source

URL

References

Bouquet et al. (2016) mBio 7(1): e00100-16 (PubMed)

DeaSet object creator

Description

DeaSet object creator

Usage

newDeaSet(
  assayRaw = matrix(nrow = 0, ncol = 0),
  rowData = NULL,
  assays = SimpleList(),
  colData = DataFrame(),
  metadata = list()
)
newDeaSet(
  assayRaw = matrix(nrow = 0, ncol = 0),
  rowData = NULL,
  assays = SimpleList(),
  colData = DataFrame(),
  metadata = list()
)

Arguments

`assayRaw`	A matrix of gene expression data. This can be the same as the matrix-like element in `assays` parameter.
`rowData`	A DataFrame object describing the rows.
`assays`	A list or SimpleList of matrix-like element, or a matrix-like object. The matrix-like element can be the same as `assayRaw` parameter.
`colData`	A DataFrame describing the sample information.
`metadata`	An optional list of arbitrary content describing the overall experiment.

Value

A DeaSet object.

Examples

# Empty DeaSet object
newDeaSet()

# 100 * 6 DeaSet object
nrows = 100
ncols = 6
counts = matrix(rnbinom(nrows * ncols, size = 2, mu = 500),
                nrow = nrows)
assays = SimpleList(counts=counts)

colData = DataFrame(Condition = rep(c("treatment", "ctrl"), 3),
                    row.names=LETTERS[1:6])
geneNames = sprintf("G%03s", seq(nrows))
elementMetadata = DataFrame(gene = geneNames,
                            p = numeric(nrows),
                            logFC = numeric(nrows))
newDeaSet(assayRaw = counts, 
          rowData = elementMetadata,
          assays = SimpleList(assayData = counts),
          colData = colData)
# Empty DeaSet object
newDeaSet()

# 100 * 6 DeaSet object
nrows = 100
ncols = 6
counts = matrix(rnbinom(nrows * ncols, size = 2, mu = 500),
                nrow = nrows)
assays = SimpleList(counts=counts)

colData = DataFrame(Condition = rep(c("treatment", "ctrl"), 3),
                    row.names=LETTERS[1:6])
geneNames = sprintf("G%03s", seq(nrows))
elementMetadata = DataFrame(gene = geneNames,
                            p = numeric(nrows),
                            logFC = numeric(nrows))
newDeaSet(assayRaw = counts, 
          rowData = elementMetadata,
          assays = SimpleList(assayData = counts),
          colData = colData)

TopNetwork object creator

Description

This function create 'TopNetwork' object using 3-column edge table.

Usage

newTopNetwork(
  networkEdgeTable,
  reg = "",
  directed = TRUE,
  networkConstruction = c("new", "COEN", "GRN"),
  percent = 100
)
newTopNetwork(
  networkEdgeTable,
  reg = "",
  directed = TRUE,
  networkConstruction = c("new", "COEN", "GRN"),
  percent = 100
)

Arguments

`networkEdgeTable`	a data frame of 3 columns, representing 'from.gene' ('regulators'), 'to.gene' ('targets') and 'weight', respectively.
`reg`	a vector of gene regulators.
`directed`	logical, whether the network is directed. Default is TRUE.
`networkConstruction`	the method to construct this network. Possible can be: 'COEN', coexpression network; 'GRN', gene regulatory network by random forest; 'new' (default), meaning a network provided by user, rather than infered based on the expression data.
`percent`	the percentage of edges in the original whole network. Default is 100, meaning 100% edges in whole network.

Value

an object of topNetwork class.

Examples

data(TFs)
edge = data.frame(from = rep(TFs$TF_name[seq(3)], seq(3)),
                  to = TFs$TF_name[11:16], weight = 0.1*(6:1))
object = newTopNetwork(edge, networkConstruction = 'new', percent = 100)
object
str(object)
data(TFs)
edge = data.frame(from = rep(TFs$TF_name[seq(3)], seq(3)),
                  to = TFs$TF_name[11:16], weight = 0.1*(6:1))
object = newTopNetwork(edge, networkConstruction = 'new', percent = 100)
object
str(object)

Plot results of FET/GSEA enrichment analysis

Description

Plot FET/GSEA enrichment results. If the FET method is applied, the top 'showCategory' regulator will be plotted. If the GSEA method is applied, the GSEA graph of regulator 'reg' will be plotted.

Usage

plot_Enrich(object, ...)

## S4 method for signature 'RegenrichSet'
plot_Enrich(
  object,
  reg = NULL,
  showCategory = 20,
  regDescription = NULL,
  font.size = 12
)
plot_Enrich(object, ...)

## S4 method for signature 'RegenrichSet'
plot_Enrich(
  object,
  reg = NULL,
  showCategory = 20,
  regDescription = NULL,
  font.size = 12
)

Arguments

`object`	a `RegenrichSet` object.
`...`	other parameters.
`reg`	The regulator to plot. This only works when the GSEA enrichment method has used.
`showCategory`	the number of regulator to plot.
`regDescription`	NULL or a two-column data frame, in which first column is the regulator IDs (for example ENSEMBL IDs), and the second column is the description of regulators (for example gene name). Default is NULL, meaning both columns are the same regulator names/IDs in the network.
`font.size`	font size of axis labels and axis tick mark labels, default is 12.

Value

a ggplot object of plotting FET or GSEA enrichment result.

Examples

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = "limma", minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = "COEN",
                      enrichTest = "FET")
# Differential expression analysis
object = regenrich_diffExpr(object)
# Network inference using "COEN" method
object = regenrich_network(object)
# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object)
# plot
plot_Enrich(object)

# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object, enrichTest = "GSEA")
# plot
plot_Enrich(object)
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = "limma", minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = "COEN",
                      enrichTest = "FET")
# Differential expression analysis
object = regenrich_diffExpr(object)
# Network inference using "COEN" method
object = regenrich_network(object)
# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object)
# plot
plot_Enrich(object)

# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object, enrichTest = "GSEA")
# plot
plot_Enrich(object)

Compare the orders of two vectors

Description

compare the orders of two vectors

Usage

plotOrders(name1, name2)
plotOrders(name1, name2)

Arguments

`name1`	a vector with first order.
`name2`	a vector with anothoer second order.

Value

A plot of comparing two orders of vectors.

Examples

a = c('a1', 'a2', 'a5', 'a4')
b = c( 'a2', 'a5', 'a7', 'a4', 'a6')
plotOrders(a, b)
a = c('a1', 'a2', 'a5', 'a4')
b = c( 'a2', 'a5', 'a7', 'a4', 'a6')
plotOrders(a, b)

Plot regulator and its targets expression

Description

Plot regulator and its targets expression

Usage

plotRegTarExpr(
  object,
  reg,
  n = 1000,
  scale = TRUE,
  tarCol = "black",
  tarColAlpha = 0.1,
  regCol = "#ffaa00",
  xlab = "Samples",
  ylab = "Z-scores",
  ...
)
plotRegTarExpr(
  object,
  reg,
  n = 1000,
  scale = TRUE,
  tarCol = "black",
  tarColAlpha = 0.1,
  regCol = "#ffaa00",
  xlab = "Samples",
  ylab = "Z-scores",
  ...
)

Arguments

`object`	a RegenrichSet object, to which at least `regenrich_diffExpr` and `regenrich_network` functions have been applied.
`reg`	a regulator to plot.
`n`	the maximun number of targets to plot.
`scale`	logical, whether gene expression is z-score normalized.
`tarCol`	the color of the lines for the targets of the regulator.
`tarColAlpha`	numeric, ranging from 0 to 1, indicating transparancy of target lines.
`regCol`	the color of the line for the 'reg'.
`xlab`	x label of plot.
`ylab`	y label of plot.
`...`	other parameters in `ggplot` function.

Value

a ggplot object.

Examples

# constructing a RegenrichSet object
colData = data.frame(patientID = paste0('Sample_', seq(50)),
                     week = rep(c('0', '1'), each = 25),
                     row.names = paste0('Sample_', seq(50)), 
                     stringsAsFactors = TRUE)
design = ~week
reduced = ~1
set.seed(123)
cnts = matrix(as.integer(rnbinom(n=1000*50, mu=100, size=1/0.1)), ncol=50,
              dimnames = list(paste0('gene', seq(1000)), rownames(colData)))

cnts[5,26:50] = cnts[5,26:50] + 50L # add reads to gene5 in some samples.
id = sample(31:1000, 20) # randomly select 20 rows, and assign reads.
cnts[id,] = vapply(cnts[5,], function(x){
  as.integer(rnbinom(n = 20, size = 1/0.02, mu = x))},
  FUN.VALUE = rep(1L, 20))

object = RegenrichSet(expr = cnts,
                      colData = colData,
                      method = 'LRT_DESeq2', minMeanExpr = 0,
                      design = design, reduced = reduced, fitType = 'local',
                      networkConstruction = 'COEN',
                      enrichTest = 'FET',
                      reg = paste0('gene', seq(30)))

## RegEnrich analysis
object = regenrich_diffExpr(object)

# Set a random softPower, otherwise it is difficult to achive a
# scale-free network because of a randomly generated count data.
object = regenrich_network(object, softPower = 3)
object = regenrich_enrich(object)
object = regenrich_rankScore(object)

## plot expression of a regulator and its targets.
plotRegTarExpr(object, reg = 'gene5')
plotRegTarExpr(object, reg = 'gene27')
# constructing a RegenrichSet object
colData = data.frame(patientID = paste0('Sample_', seq(50)),
                     week = rep(c('0', '1'), each = 25),
                     row.names = paste0('Sample_', seq(50)), 
                     stringsAsFactors = TRUE)
design = ~week
reduced = ~1
set.seed(123)
cnts = matrix(as.integer(rnbinom(n=1000*50, mu=100, size=1/0.1)), ncol=50,
              dimnames = list(paste0('gene', seq(1000)), rownames(colData)))

cnts[5,26:50] = cnts[5,26:50] + 50L # add reads to gene5 in some samples.
id = sample(31:1000, 20) # randomly select 20 rows, and assign reads.
cnts[id,] = vapply(cnts[5,], function(x){
  as.integer(rnbinom(n = 20, size = 1/0.02, mu = x))},
  FUN.VALUE = rep(1L, 20))

object = RegenrichSet(expr = cnts,
                      colData = colData,
                      method = 'LRT_DESeq2', minMeanExpr = 0,
                      design = design, reduced = reduced, fitType = 'local',
                      networkConstruction = 'COEN',
                      enrichTest = 'FET',
                      reg = paste0('gene', seq(30)))

## RegEnrich analysis
object = regenrich_diffExpr(object)

# Set a random softPower, otherwise it is difficult to achive a
# scale-free network because of a randomly generated count data.
object = regenrich_network(object, softPower = 3)
object = regenrich_enrich(object)
object = regenrich_rankScore(object)

## plot expression of a regulator and its targets.
plotRegTarExpr(object, reg = 'gene5')
plotRegTarExpr(object, reg = 'gene27')

Plot soft power for WGCNA analysis

Description

Plot soft power and corresponding scale free topology fitting index to find a proper soft power for WGCNA analysis.

Usage

plotSoftPower(
  expr,
  rowSample = FALSE,
  weights = NULL,
  powerVector = c(seq(10), seq(12, 20, by = 2)),
  RsquaredCut = 0.85,
  networkType = "unsigned",
  removeFirst = FALSE,
  nBreaks = 10,
  corFnc = WGCNA::cor,
  corOptions = list(use = "p")
)
plotSoftPower(
  expr,
  rowSample = FALSE,
  weights = NULL,
  powerVector = c(seq(10), seq(12, 20, by = 2)),
  RsquaredCut = 0.85,
  networkType = "unsigned",
  removeFirst = FALSE,
  nBreaks = 10,
  corFnc = WGCNA::cor,
  corOptions = list(use = "p")
)

Arguments

`expr`	Gene expression data, either a matrix or a data frame. By default, each row represents a gene, each column represents a sample.
`rowSample`	logic. If `TRUE`, each row represents a sample. The default is `FALSE`.
`weights`	optional observation weights for `expr` to be used in correlation calculation.
`powerVector`	a vector of soft thresholding powers for which the scale free topology fit indices are to be calculated.
`RsquaredCut`	desired minimum scale free topology fitting index R^2. The default is 0.85.
`networkType`	character, network type. Allowed values are (unique abbreviations of) "unsigned" (default), "signed", "signed hybrid". See `adjacency`.
`removeFirst`	should the first bin be removed from the connectivity histogram? The default is FALSE.
`nBreaks`	number of bins in connectivity histograms. The default is 10.
`corFnc`	correlation function to be used in adjacency calculation. The default is the `cor` function in WGCNA.
`corOptions`	a named list of options to the correlation function specified in corFnc. The default is list(use = "p").

Value

a list of three elements: powerEstimate, fitIndices, and plot. powerEstimate is an estimate of an appropriate soft-thresholding power. fitIndices is a data frame containing the fit indices for scale free topology. The plot is a ggplot object.

Examples

data(Lyme_GSE63085)
log2FPKM = log2(Lyme_GSE63085$FPKM + 1)
log2FPKMhi = log2FPKM[rowMeans(log2FPKM) >= 10^-3, , drop = FALSE]
log2FPKMhi = head(log2FPKMhi, 3000) # First 3000 genes for example

softP = plotSoftPower(log2FPKMhi, RsquaredCut = 0.85)
print(softP)

data(Lyme_GSE63085)
log2FPKM = log2(Lyme_GSE63085$FPKM + 1)
log2FPKMhi = log2FPKM[rowMeans(log2FPKM) >= 10^-3, , drop = FALSE]
log2FPKMhi = head(log2FPKMhi, 3000) # First 3000 genes for example

softP = plotSoftPower(log2FPKMhi, RsquaredCut = 0.85)
print(softP)

Print Score object

Description

Print Score object

Usage

## S3 method for class 'Score'
print(x, ...)
## S3 method for class 'Score'
print(x, ...)

Arguments

`x`	a Score object.
`...`	optional arguments to print.

Value

print.Score returns the a Score object

Examples

x = newScore(letters[1:5], 1:5, 1:5, -2:2, seq(2, 1, len = 5))
print(x)
x = newScore(letters[1:5], 1:5, 1:5, -2:2, seq(2, 1, len = 5))
print(x)

Differential expression analysis step

Description

This is the first step of RegEnrich analysis. differential expression analysis by this function needs to be performed on a 'RegenrichSet' object.

Usage

regenrich_diffExpr(object, ...)

## S4 method for signature 'RegenrichSet'
regenrich_diffExpr(object, ...)
regenrich_diffExpr(object, ...)

## S4 method for signature 'RegenrichSet'
regenrich_diffExpr(object, ...)

Arguments

object

a 'RegenrichSet' object, which is initialized by RegenrichSet function.

...

arguments for differential analysis. After constructing a 'RegenrichSet' object, all arguments for RegEnrich analysis have been initialized and stored in 'paramsIn“ slot. while the arguments for differential analysis can be re-specified here.

These arguments include 'method', 'minMeanExpr', 'design', 'reduced', 'contrast', 'coef', 'name', 'fitType', 'sfType', 'betaPrior', 'minReplicatesForReplace', 'useT', 'minmu', 'parallel', 'BPPARAM', 'altHypothesis', 'listValues', 'cooksCutoff', 'independentFiltering', 'alpha', 'filter', 'theta', 'filterFun', 'addMLE', 'blind', 'ndups', 'spacing', 'block', 'correlation', 'weights', 'proportion', 'stdev.coef.lim', 'trend', 'robust', and 'winsor.tail.p'.
See RegenrichSet function for more details about these arguments.

Value

This function returns a 'RegenrichSet' object with an updated 'resDEA' slot, which is a 'DeaSet' object, and an updated 'paramsIn' slot. See newDeaSet function for more details about 'DeaSet' class. If an argument not in the above list is specified in the regenrich_diffExpr function, a warning or error will be raised.

Examples

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')
# Using the predifined parameters in the previous step
(object = regenrich_diffExpr(object))

# re-specifying parameter 'minMeanExpr'
print(slot(object, 'paramsIn')$minMeanExpr)
(object = regenrich_diffExpr(object, minMeanExpr = 1))
print(slot(object, 'paramsIn')$minMeanExpr)

# Unrecognized argument 'unrecognizedArg' (Error)
# object = regenrich_diffExpr(object, minMeanExpr = 1,
#                             unrecognizedArg = 23)

# Argument not for differential expression analysis (Warning)
# print(slot(object, 'paramsIn')$networkConstruction)
# (object = regenrich_diffExpr(object, minMeanExpr = 1,
#                              networkConstruction = 'GRN'))
# print(slot(object, 'paramsIn')$networkConstruction) # not changed
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')
# Using the predifined parameters in the previous step
(object = regenrich_diffExpr(object))

# re-specifying parameter 'minMeanExpr'
print(slot(object, 'paramsIn')$minMeanExpr)
(object = regenrich_diffExpr(object, minMeanExpr = 1))
print(slot(object, 'paramsIn')$minMeanExpr)

# Unrecognized argument 'unrecognizedArg' (Error)
# object = regenrich_diffExpr(object, minMeanExpr = 1,
#                             unrecognizedArg = 23)

# Argument not for differential expression analysis (Warning)
# print(slot(object, 'paramsIn')$networkConstruction)
# (object = regenrich_diffExpr(object, minMeanExpr = 1,
#                              networkConstruction = 'GRN'))
# print(slot(object, 'paramsIn')$networkConstruction) # not changed

Enrichment analysis step

Description

As the thrid step of RegEnrich analysis, enrichment analysis is followed by differential expression analysis (regenrich_diffExpr), and regulator-target network inference (regenrich_network).

Usage

regenrich_enrich(object, ...)

## S4 method for signature 'RegenrichSet'
regenrich_enrich(object, ...)
regenrich_enrich(object, ...)

## S4 method for signature 'RegenrichSet'
regenrich_enrich(object, ...)

Arguments

object

a 'RegenrichSet' object, to which regenrich_diffExpr, and regenrich_network, functions have been already applied.

...

arguments for enrichment analysis. After constructing a 'RegenrichSet' object using RegenrichSet function, all arguments for RegEnrich analysis have been initialized and stored in 'paramsIn“ slot. The arguments for enrichment analysis can be re-specified here.

These arguments include 'enrichTest', 'namedScoresCutoffs', 'minSize', 'maxSize', 'pvalueCutoff','qvalueCutoff', 'regAltName', 'universe', 'minSize', 'maxSize', 'pvalueCutoff', and 'nperm'.

See RegenrichSet function for more details about these arguments.

Value

This function returns a 'RegenrichSet' object with an updated 'resEnrich' slots, which is 'Enrich' objects, and an updated 'paramsIn' slot. See Enrich-class function for more details about 'Enrich' class.

Examples

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')


# Differential expression analysis
object = regenrich_diffExpr(object)

# Network inference using 'COEN' method
object = regenrich_network(object)

# Enrichment analysis by Fisher's exact test (FET)
(object = regenrich_enrich(object))

# Enrichment analysis by Fisher's exact test (GSEA)
(object = regenrich_enrich(object, enrichTest = "GSEA"))

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')


# Differential expression analysis
object = regenrich_diffExpr(object)

# Network inference using 'COEN' method
object = regenrich_network(object)

# Enrichment analysis by Fisher's exact test (FET)
(object = regenrich_enrich(object))

# Enrichment analysis by Fisher's exact test (GSEA)
(object = regenrich_enrich(object, enrichTest = "GSEA"))

Regulator-target network inference step

Description

As the second step of RegEnrich analysis, network inference is followed by differential expression analysis (regenrich_diffExpr).

Provide a network to 'RegenrichSet' object.

Usage

regenrich_network(object, ...)

## S4 method for signature 'RegenrichSet'
regenrich_network(object, ...)

regenrich_network(object) <- value

## S4 replacement method for signature 'RegenrichSet,TopNetwork'
regenrich_network(object) <- value

## S4 replacement method for signature 'RegenrichSet,data.frame'
regenrich_network(object) <- value
regenrich_network(object, ...)

## S4 method for signature 'RegenrichSet'
regenrich_network(object, ...)

regenrich_network(object) <- value

## S4 replacement method for signature 'RegenrichSet,TopNetwork'
regenrich_network(object) <- value

## S4 replacement method for signature 'RegenrichSet,data.frame'
regenrich_network(object) <- value

Arguments

`object`	a 'RegenrichSet' object, to which `regenrich_diffExpr` function has been already applied.
`...`	arguments for network inference. After constructing a 'RegenrichSet' object using `RegenrichSet` function, all arguments for RegEnrich analysis have been initialized and stored in 'paramsIn“ slot. The arguments for network inference can be re-specified here. These arguments include 'networkConstruction', 'reg', 'rowSample', 'softPower', 'networkType', 'TOMDenom', 'RsquaredCut', 'edgeThreshold', 'K', 'nbTrees', 'importanceMeasure', 'trace', 'BPPARAM', 'minR', 'topNetPercent', and 'directed'. See `RegenrichSet` function for more details about these arguments.
`value`	either a 'TopNetwork' object or 'data.frame' object. If value is a 'data.frame' object, then the number of columns of

Value

This function returns a 'RegenrichSet' object with an updated 'network' and 'topNetP' slots, which are 'TopNetwork' objects, and an updated 'paramsIn' slot. See TopNetwork-class class for more details.

Examples

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')


# Differential expression analysis
(object = regenrich_diffExpr(object))

# Network inference using 'COEN' method
(object = regenrich_network(object))

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')


# Differential expression analysis
(object = regenrich_diffExpr(object))

# Network inference using 'COEN' method
(object = regenrich_network(object))

Regulator scoring and ranking

Description

As the fourth step of RegEnrich analysis, regulator ranking is followed by differential expression analysis (regenrich_diffExpr), regulator-target network inference (regenrich_network), and enrichment analysis (regenrich_enrich).

Usage

regenrich_rankScore(object)

## S4 method for signature 'RegenrichSet'
regenrich_rankScore(object)
regenrich_rankScore(object)

## S4 method for signature 'RegenrichSet'
regenrich_rankScore(object)

Arguments

object

a 'RegenrichSet' object, to which regenrich_diffExpr, regenrich_network, and regenrich_enrich functions all have been already applied.

Value

This function returns a 'RegenrichSet' object with an updated 'resScore' slots, which is a 'regEnrichScore' (also 'data.frame') object, and an updated 'paramsIn' slot. In the 'regEnrichScore' object there are five columns, which are 'reg' (regulator), 'negLogPDEA' (-log10(p values of differential expression analysis)), 'negLogPEnrich' (-log10(p values of enrichment analysis), 'logFC' (log2 fold changes), and 'score' (RegEnrich ranking score).

Examples

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')


# Differential expression analysis
object = regenrich_diffExpr(object)

# Network inference using 'COEN' method
object = regenrich_network(object)

# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object)

# Regulators ranking
(object = regenrich_rankScore(object))

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')


# Differential expression analysis
object = regenrich_diffExpr(object)

# Network inference using 'COEN' method
object = regenrich_network(object)

# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object)

# Regulators ranking
(object = regenrich_rankScore(object))

RegenrichSet object creator

Description

This is 'RegenrichSet' object creator function. There are four types of parameters in this function.
First, parameters to provide raw data and sample information;
'expr' and 'colData'.

Second, parameters to perform differential expression analysis;
'method', 'minMeanExpr', 'design', 'reduced', 'contrast', 'coef', 'name', 'fitType', 'sfType', 'betaPrior', 'minReplicatesForReplace', 'useT', 'minmu', 'parallel', 'BPPARAM' (also for network inference), 'altHypothesis', 'listValues', 'cooksCutoff', 'independentFiltering', 'alpha', 'filter', 'theta', 'filterFun', 'addMLE', 'blind', 'ndups', 'spacing', 'block', 'correlation', 'weights', 'proportion', 'stdev.coef.lim', 'trend', 'robust', and 'winsor.tail.p'.

Thrid, parameters to perform regulator-target network inference;
'reg', 'networkConstruction', 'topNetPercent', 'directed', 'rowSample', 'softPower', 'networkType', 'TOMDenom', 'RsquaredCut', 'edgeThreshold', 'K', 'nbTrees', 'importanceMeasure', 'trace', 'BPPARAM' (also for differential expression analysis), and 'minR'.

Fourth, parameters to perform enrichment analysis:
'enrichTest', 'namedScoresCutoffs', 'minSize', 'maxSize', 'pvalueCutoff', 'qvalueCutoff', 'regAltName', 'universe', and 'nperm'.

Usage

RegenrichSet(
  expr,
  colData,
  rowData = NULL,
  method = c("Wald_DESeq2", "LRT_DESeq2", "limma", "LRT_LM"),
  minMeanExpr = NULL,
  design,
  reduced,
  contrast,
  coef = NULL,
  name,
  fitType = c("parametric", "local", "mean"),
  sfType = c("ratio", "poscounts", "iterate"),
  betaPrior,
  minReplicatesForReplace = 7,
  useT = FALSE,
  minmu = 0.5,
  parallel = FALSE,
  BPPARAM = bpparam(),
  altHypothesis = c("greaterAbs", "lessAbs", "greater", "less"),
  listValues = c(1, -1),
  cooksCutoff,
  independentFiltering = TRUE,
  alpha = 0.1,
  filter,
  theta,
  filterFun,
  addMLE = FALSE,
  blind = FALSE,
  ndups = 1,
  spacing = 1,
  block = NULL,
  correlation,
  weights = NULL,
  proportion = 0.01,
  stdev.coef.lim = c(0.1, 4),
  trend = FALSE,
  robust = FALSE,
  winsor.tail.p = c(0.05, 0.1),
  reg = TFs$TF_name,
  networkConstruction = c("COEN", "GRN", "new"),
  topNetPercent = 5,
  directed = FALSE,
  rowSample = FALSE,
  softPower = NULL,
  networkType = "unsigned",
  TOMDenom = "min",
  RsquaredCut = 0.85,
  edgeThreshold = NULL,
  K = "sqrt",
  nbTrees = 1000,
  importanceMeasure = "IncNodePurity",
  trace = FALSE,
  minR = 0.3,
  enrichTest = c("FET", "GSEA"),
  namedScoresCutoffs = 0.05,
  minSize = 5,
  maxSize = 5000,
  pvalueCutoff = 0.05,
  qvalueCutoff = 0.2,
  regAltName = NULL,
  universe = NULL,
  nperm = 10000
)
RegenrichSet(
  expr,
  colData,
  rowData = NULL,
  method = c("Wald_DESeq2", "LRT_DESeq2", "limma", "LRT_LM"),
  minMeanExpr = NULL,
  design,
  reduced,
  contrast,
  coef = NULL,
  name,
  fitType = c("parametric", "local", "mean"),
  sfType = c("ratio", "poscounts", "iterate"),
  betaPrior,
  minReplicatesForReplace = 7,
  useT = FALSE,
  minmu = 0.5,
  parallel = FALSE,
  BPPARAM = bpparam(),
  altHypothesis = c("greaterAbs", "lessAbs", "greater", "less"),
  listValues = c(1, -1),
  cooksCutoff,
  independentFiltering = TRUE,
  alpha = 0.1,
  filter,
  theta,
  filterFun,
  addMLE = FALSE,
  blind = FALSE,
  ndups = 1,
  spacing = 1,
  block = NULL,
  correlation,
  weights = NULL,
  proportion = 0.01,
  stdev.coef.lim = c(0.1, 4),
  trend = FALSE,
  robust = FALSE,
  winsor.tail.p = c(0.05, 0.1),
  reg = TFs$TF_name,
  networkConstruction = c("COEN", "GRN", "new"),
  topNetPercent = 5,
  directed = FALSE,
  rowSample = FALSE,
  softPower = NULL,
  networkType = "unsigned",
  TOMDenom = "min",
  RsquaredCut = 0.85,
  edgeThreshold = NULL,
  K = "sqrt",
  nbTrees = 1000,
  importanceMeasure = "IncNodePurity",
  trace = FALSE,
  minR = 0.3,
  enrichTest = c("FET", "GSEA"),
  namedScoresCutoffs = 0.05,
  minSize = 5,
  maxSize = 5000,
  pvalueCutoff = 0.05,
  qvalueCutoff = 0.2,
  regAltName = NULL,
  universe = NULL,
  nperm = 10000
)

Arguments

`expr`	matrix or data.frame, expression profile of a set of genes or a set of proteins. If the `method = 'Wald_DESeq2' or 'LRT_DESeq2'` only non-negative integer matrix (read counts by RNA sequencing) is accepted.
`colData`	data frame, sample phenotype data. The rows of colData must correspond to the columns of expr.
`rowData`	NULL or data frame, information of each row/gene. Default is NULL, which will generate a DataFrame of three columns, i.e., "gene", "p", and "logFC".
`method`	either 'Wald_DESeq2', 'LRT_DESeq2', 'limma', or 'LRT_LM' for the differential expression analysis. When method = 'Wald_DESeq2', the Wald test in DESeq2 package is used; When method = 'LRT_DESeq2', the likelihood ratio test (LRT) in DESeq2 package is used; When method = 'limma', the 'ls' method and empirical Bayes method in limma package are used to calculate moderated t-statistics and differential p-values; When method = 'LRT_LM', a likelihood ratio test is performed for each row of 'expr' to compare two linear model specified by 'design' and 'reduced' arguments. In this case, the fold changes are not calculated but set to 0.
`minMeanExpr`	numeric, the cutoff of gene average expression for pre-filtering. The rows of 'expr' with everage expression < minMeanExpr is removed. The higher 'minMeanExpr' is, the more genes are not included for testing.
`design`	either model formula or model matrix. For method = 'LRT_DESeq2' or 'LRT_LM', the design is the full model formula/matrix. For method = 'limma', and if design is a formula, the model matrix is constructed using model.matrix(design, colData), so the name of each term in the design formula must be included in the column names of 'colData'.
`reduced`	The argument is used only when method = 'LRT_DESeq2' or 'LRT_LM', it is a reduced formula/matrix to compare against. If the design is a model matrix, 'reduced' must also be a model matrix.
`contrast`	The argument is used only when method = 'LRT_DESeq2', 'Wald_DESeq2', or 'limma'. When method = 'LRT_DESeq2', or 'Wald_DESeq2', it specifies what comparison to extract from the 'DESeqDataSet' object to build a results table (when method = 'LRT_DESeq2', this does not affect the value of 'stat', 'pvalue', or 'padj'). It can be one of following three formats: a character vector with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change; a list of 1 or 2 character vector(s): the first element specifies the names of the fold changes for the numerator, and the second element (optional) specifies the names of the fold changes for the denominator. These names should be elements of `getResultsNames(design, colData)`; a numeric contrast vector with one element for each element in `getResultsNames(design, colData)`. When method = 'limma', It can be one of following two formats: a numeric matrix with rows corresponding to coefficients in design matrix and columns containing contrasts; a numeric vector if there is only one contrast. Each element of the vector corresponds to coefficients in design matrix. This is similar to the third format of contrast when method = 'LRT_DESeq2', or 'Wald_DESeq2'.
`coef`	The argument is used only when method = 'limma'. (Vector of) column number or column name specifying which coefficient or contrast of the linear model is of interest. Default is NULL.
`name`	The argument is used only when method = 'LRT_DESeq2' or 'Wald_DESeq2'. the name of the individual effect (coefficient) for building a results table. Use this argument rather than contrast for continuous variables, individual effects or for individual interaction terms. The value provided to name must be an element of `getResultsNames(design, colData)`.
`fitType`	either 'parametric', 'local', or 'mean' for the type of fitting of dispersions to the mean intensity. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is 'parametric'.
`sfType`	either 'ratio', 'poscounts', or 'iterate' for the type of size factor estimation. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is 'ratio'.
`betaPrior`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details.
`minReplicatesForReplace`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is 7.
`useT`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is FALSE,
`minmu`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is 0.5.
`parallel`	whether computing (only for differential analysis with method = "Wald_DESeq2" or "LRT_DESeq2") is parallel (default is FALSE).
`BPPARAM`	parameters for parallel computing (default is `bpparam()`).
`altHypothesis`	= c('greaterAbs', 'lessAbs', 'greater', 'less'). This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is 'greaterAbs'.
`listValues`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is c(1, -1),
`cooksCutoff`	theshold on Cook's distance, such that if one or more samples for a row have a distance higher, the p-value for the row is set to NA. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details.
`independentFiltering`	logical, whether independent filtering should be applied automatically. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is TRUE.
`alpha`	the significance cutoff used for optimizing the independent filtering. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is 0.1,
`filter`	the vector of filter statistics over which the independent filtering is optimized. By default the mean of normalized counts is used. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details.
`theta`	the quantiles at which to assess the number of rejections from independent filtering. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details.
`filterFun`	an optional custom function for performing independent filtering and p-value adjustment. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details.
`addMLE`	if betaPrior=TRUE was used, whether the 'unshrunken' maximum likelihood estimates (MLE) of log2 fold change should be added as a column to the results table. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is FALSE.
`blind`	logical, whether to blind the transformation to the experimental design. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `vst` from DESeq2 package for more details. Default is FALSE, which is different from the default of vst function.
`ndups`	positive integer giving the number of times each distinct probe is printed on each array. This argument is used only when method = 'limma'. See `lmFit` from limma package for more details. Default is 1.
`spacing`	positive integer giving the spacing between duplicate occurrences of the same probe, spacing=1 for consecutive rows. This argument is used only when method = 'limma'. See `lmFit` from limma package for more details. Default is 1.
`block`	vector or factor specifying a blocking variable on the arrays. Has length equal to the number of arrays. Must be NULL if ndups > 2. This argument is used only when method = 'limma'. See `lmFit` from limma package for more details. Default is NULL.
`correlation`	the inter-duplicate or inter-technical replicate correlation. The correlation value should be estimated using the `duplicateCorrelation` function. This argument is used only when method = 'limma'. See `lmFit` from limma package for more details.
`weights`	non-negative precision weights. Can be a numeric matrix of individual weights of same size as the object expression matrix, or a numeric vector of array weights with length equal to ncol of the expression matrix, or a numeric vector of gene weights with length equal to nrow of the expression matrix. This argument is used only when method = 'limma' or 'LRT_LM'. See `lmFit` from limma package for more details. Default is NULL.
`proportion`	numeric value between 0 and 1, assumed proportion of genes which are differentially expressed. This argument is used only when method = 'limma'. See `eBayes` from limma package for more details. Default is 0.01.
`stdev.coef.lim`	numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes. This argument is used only when method = 'limma'. See `eBayes` from limma package for more details. Default is c(0.1, 4).
`trend`	logical, should an intensity-trend be allowed for the prior variance? This argument is used only when method = 'limma'. See `eBayes` from limma package for more details. Default is FALSE, meaning that the prior variance is constant.
`robust`	logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances? This argument is used only when method = 'limma'. See `eBayes` from limma package for more details. Default is FALSE.
`winsor.tail.p`	numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when method = 'limma' and robust=TRUE. See `eBayes` from limma package for more details. Default is c(0.05,0.1)
`reg`	a vector of regulator names (ID). By default, these are transcription (co-)factors defined by three literatures/databases, namely RegNet, TRRUST, and Marbach2016. The type (for example ENSEMBL gene ID, Entrez gene ID, or gene symble/name) of names or IDs of these regulators must be the same as the type of names or IDs in the regulator-target network.
`networkConstruction`	the method to construct this network. Possible can be: 'COEN', coexpression network; 'GRN', gene regulatory network by random forest; 'new' (default), meaning a network provided by user, rather than infered based on the expression data.
`topNetPercent`	numeric, what percentage of the top edges in the full network is ratained. Default is 5, meaning top 5% of edges. This value must be between 0 and 100.
`directed`	logical, whether the network is directed. Default is FALSE.
`rowSample`	logic, if TRUE, each row represents a sample. Otherwise, each column represents a sample. Default is FALSE.
`softPower`	numeric, a soft power to achieve scale free topology. If not provided, the parameter will be picked automatically by `plotSoftPower` function.
`networkType`	network type. Allowed values are (unique abbreviations of) 'unsigned' (default), 'signed', 'signed hybrid'. See `adjacency`.
`TOMDenom`	a character string specifying the TOM variant to be used. Recognized values are 'min' giving the standard TOM described in Zhang and Horvath (2005), and 'mean' in which the min function in the denominator is replaced by mean. The 'mean' may produce better results but at this time should be considered experimental.
`RsquaredCut`	desired minimum scale free topology fitting index R^2. Default is 0.85.
`edgeThreshold`	numeric, the threshold to remove the low weighted edges, Default is NULL, which means no edges will be removed.
`K`	integer or character. The number of features in each tree, can be either a integer number, 'sqrt', or 'all'. 'sqrt' denotes sqrt(the number of 'reg'), 'all' means the number of 'reg'. Default is 'sqrt'.
`nbTrees`	integer. The number of trees. Default is 1000.
`importanceMeasure`	character. importanceMeasure can be '%IncMSE' or 'IncNodePurity', corresponding to type = 1 and 2 in `importance` function, respectively. Default is 'IncNodePurity'(decrease in node impurity), which is faster than '%IncMSE' (decrease in accuracy).
`trace`	logical. To show the progress or not (default).
`minR`	numeric. The minimum correlation coefficient of prediction is to control model accuracy. Default is 0.3.
`enrichTest`	character, specifying the enrichment analysis method, which is either ‘FET' (Fisher’s exact test) or 'GSEA' (gene set enrichment analysis).
`namedScoresCutoffs`	numeric, the significance cutoff for the differential analysis p value. Default is 0.05.
`minSize`	The minimum number (default 5) of target genes.
`maxSize`	The maximum number (default 5000) of target genes.
`pvalueCutoff`	numeric, the significance cutoff for adjusted enrichment p value. This is used for obtaining the 'topResult' slot in the final 'Enrich' object. Default is 0.05.
`qvalueCutoff`	numeric, the significance cutoff of enrichment q-value. Default is 0.2.
`regAltName`	alternative name for regulator. Default is NULL.
`universe`	a vector of charactors. Background target genes.
`nperm`	integer, number of permutations. The minimial possible nominal p-value is about 1/nperm. Default is 10000.

Value

an object of RegenrichSet class.

Examples

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')
object
# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')
object

RegenrichSet class

Description

The RegenrichSet is the fundamental class that RegEnrich package is working with.

Slots

assayRaw: matrix, the initial raw expression data.
colData: DataFrame object, indicating sample information. Each row represent a sample and each column represent a feature of samples.
assays: SimpleList object, containing the expression data after filtering (and after Variance Stabilizing Transformation, i.e. VST, if the differential analysis method is 'Wald_DESeq2' or 'LRT_DESeq2').
elementMetadata: DataFrame object, a slot for saving results by differential expression analysis, containing at least three columns:'gene', 'p' and 'logFC'.
topNetwork: TopNetwork object, a slot for saving top network edges. After regulator-target network inference, a TopNetwork-class object is assigned to this slot, containing only top ranked edges in the full network. Default is NULL.
resEnrich: Enrich object, a slot for saving enrichment analysis either by Fisher's exact test (FET) or gene set enrichment analysis (GSEA).
resScore: Score object, a slot for saving regulator ranking results. It contains five components, which are 'reg' (regulator), 'negLogPDEA' (-log10(p values of differential expression analysis)), 'negLogPEnrich' (-log10(p values of enrichment analysis)), 'logFC' (log2 fold changes), and 'score' (RegEnrich ranking score).
paramsIn: list. The parameters used in the whole RegEnrich analysis. This slot can be updated by respecifying arguments in each step of RegEnrich analysis.
paramsOut: a list of four elements: DeaMethod (differential expression method), networkType (regulator-target network construction method), percent (what percentage of edges from the full network is used), and enrichTest (enrichment method). By default, each element is NULL.
network: TopNetwork object, a slot for saving a full network.

Result accessor functions

Description

results_expr accesses raw expression data.
results_DEA accesses results from differential expression analysis.
results_topNet accesses results from network inference.
retults_enrich accesses results from FET/GSEA enrichment analysis.
results_score accesses results from regulator scoring and ranking.

Usage

results_expr(object)

results_DEA(object)

results_topNet(object)

results_enrich(object)

results_score(object)
results_expr(object)

results_DEA(object)

results_topNet(object)

results_enrich(object)

results_score(object)

Arguments

object

RegenrichSet object.

Value

results_expr retures an expression matrix.

results_DEA returns a list result of differentila analysis.

results_topNet returns a TopNetwork object.

results_enrich returns an Enrich object by either FET or GSEA method.

results_score returns an data frame of summarized ranking scores of regulators.

Examples

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')


# Differential expression analysis
object = regenrich_diffExpr(object)
results_expr(object)
results_DEA(object)

# Network inference using 'COEN' method
object = regenrich_network(object)
results_topNet(object)

# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object)
results_enrich(object)

# Regulators ranking
object = regenrich_rankScore(object)
results_score(object)

# library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')


# Differential expression analysis
object = regenrich_diffExpr(object)
results_expr(object)
results_DEA(object)

# Network inference using 'COEN' method
object = regenrich_network(object)
results_topNet(object)

# Enrichment analysis by Fisher's exact test (FET)
object = regenrich_enrich(object)
results_enrich(object)

# Regulators ranking
object = regenrich_rankScore(object)
results_score(object)

Score class

Description

'Score' class inherits tibble ("tbl"). The objects of 'Score' class are to store information of regulator ranking scores.

Usage

newScore(
  reg = character(),
  negLogPDEA = numeric(),
  negLogPEnrich = numeric(),
  logFC = numeric(),
  score = numeric()
)
newScore(
  reg = character(),
  negLogPDEA = numeric(),
  negLogPEnrich = numeric(),
  logFC = numeric(),
  score = numeric()
)

Arguments

`reg`	character, regulator IDs.
`negLogPDEA`	numeric, -log(p_DEA).
`negLogPEnrich`	numeric, -log(p_Enrich).
`logFC`	numeric, log2 fold change.
`score`	numeric, RegEnrich ranking score.

Value

newScore function returns a Score object.

Slots

names: character vector, containing "reg", "negLogPDEA", "negLogPEnrich", "logFC", and "score".
.Data: a list of length 5, each elements corresponds to the names slots.
row.names: character, regulators corresponding to .Data slot.
.S3Class: character vector, containing "tbl_df", "tbl", "data.frame", indicating the classes that 'Score' class inherits.

Examples

newScore()
newScore(letters[1:5], 1:5, 1:5, -2:2, seq(2, 1, len = 5))
newScore()
newScore(letters[1:5], 1:5, 1:5, -2:2, seq(2, 1, len = 5))

methods of generic function "show"

Description

methods of generic function "show"

Usage

## S4 method for signature 'DeaSet'
show(object)

## S4 method for signature 'TopNetwork'
show(object)

## S4 method for signature 'Enrich'
show(object)

## S4 method for signature 'Score'
show(object)

## S4 method for signature 'RegenrichSet'
show(object)
## S4 method for signature 'DeaSet'
show(object)

## S4 method for signature 'TopNetwork'
show(object)

## S4 method for signature 'Enrich'
show(object)

## S4 method for signature 'Score'
show(object)

## S4 method for signature 'RegenrichSet'
show(object)

Arguments

object

one object of either DeaSet, TopNetwork, Enrich, Score, or RegenrichSet class.

Value

show returns an invisible original object.

Examples

x = newScore(letters[1:5], 1:5, 1:5, -2:2, seq(2, 1, len = 5))
show(x)

x = newScore(letters[1:5], 1:5, 1:5, -2:2, seq(2, 1, len = 5))
show(x)

Human gene regulators

Description

The transcrpiton factors and co-factors in humans are considered the regulators in RegEnrich. And these regulators are obtained from (Han et al. 2015; Marbach et al. 2016; and Liu et al. 2015).

Usage

data(TFs)
data(TFs)

Format

An object of 2-column data.frame; The first column is ENSEMBL ID of gene regulators. The second column is gene name of gene regulators. The row name of this data frame is identical to the ENSEMBL ID column.

References

Han et al. (2015) Scientific Reports, 5:11432 (PubMed), Liu et al. (2015) Database, bav095 (PubMed), Marbach et al. (2016) Nature Methods, 13(4):366-70 (PubMed).

TopNetwork class

Description

The 'TopNetwork' object is to store either a full network (the percentage of top edges is 100 between 0 to 10).

Slots

element: tibble, the pool of targets in the network.
set: tibble, the pool of valid regulators.
elementset: tibble, regulator-target edges with edge weights. and the elements are regulators of the targets indicated by the element name.
directed: logical, whether the network is directed.
networkConstruction: character, by which method this network is constructed. Either 'COEN' (coexpression network using WGCNA), or 'GRN' (gene regulatory network using random forest), or 'new' (a network provided by the user).
percent: numeric, what percentage of the top edges are remained. The value must be between 0 (excluding) and 100 (including).
active: character, which data table is activated, the default is "elementset".

Package 'RegEnrich'

Help Index

DeaSet class

Description

Slots

Examples

dimention of 'TopNetwork' object

Description

Usage

Arguments

Value

Examples

Enrich class

Description

Slots

Inference the name of results of DESeq analysis by a formula (or model matrix) and sample information

Description

Usage

Arguments

Value

Examples

head or tail of Score object

Description

Usage

Arguments

Value

Examples

Example RNAseq dataset [Human]

Description

Usage

Format

Source

References

DeaSet object creator

Description

Usage

Arguments

Value

Examples

TopNetwork object creator

Description

Usage

Arguments

Value

Examples

Plot results of FET/GSEA enrichment analysis

Description

Usage

Arguments

Value

Examples

Compare the orders of two vectors

Description

Usage

Arguments

Value

Examples

Plot regulator and its targets expression

Description

Usage

Arguments

Value

Examples

Plot soft power for WGCNA analysis

Description

Usage

Arguments

Value

Examples

Print Score object

Description

Usage

Arguments

Value

Examples

Differential expression analysis step

Description

Usage

Arguments

Value