Title: | Coexpression for Transcription Factors using Regulatory Impact Factors and Partial Correlation and Information Theory analysis |
---|---|
Description: | This package provides the necessary functions for performing the Partial Correlation coefficient with Information Theory (PCIT) (Reverter and Chan 2008) and Regulatory Impact Factors (RIF) (Reverter et al. 2010) algorithm. The PCIT algorithm identifies meaningful correlations to define edges in a weighted network and can be applied to any correlation-based network including but not limited to gene co-expression networks, while the RIF algorithm identify critical Transcription Factors (TF) from gene expression data. These two algorithms when combined provide a very relevant layer of information for gene expression studies (Microarray, RNA-seq and single-cell RNA-seq data). |
Authors: | Carlos Alberto Oliveira de Biagi Junior [aut, cre], Ricardo Perecin Nociti [aut], Breno Osvaldo Funicheli [aut], João Paulo Bianchi Ximenez [ctb], Patrícia de Cássia Ruy [ctb], Marcelo Gomes de Paula [ctb], Rafael dos Santos Bezerra [ctb], Wilson Araújo da Silva Junior [aut, ths] |
Maintainer: | Carlos Alberto Oliveira de Biagi Junior <[email protected]> |
License: | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-10-30 05:06:21 UTC |
Source: | https://github.com/bioc/CeTF |
Read two columns of data values (say X and Y) and computes summary statistics including N, Mean, SD, Min and Max for X and Y, as well as the correlation between X and Y and the regression of Y on X.
bivar.awk(x)
bivar.awk(x)
x |
A dataframe with two columns (variables). |
Returns an summary statistics for two variables.
# creating a random dataframe with two columns (variables) tab <- data.frame(a = sample(1:1000, 100, replace=TRUE), b = sample(1:1000, 100, replace=TRUE)) # running bivar.awk function bivar.awk(tab)
# creating a random dataframe with two columns (variables) tab <- data.frame(a = sample(1:1000, 100, replace=TRUE), b = sample(1:1000, 100, replace=TRUE)) # running bivar.awk function bivar.awk(tab)
The CeTF class is data storage class that stores all the
results from runAnalysis
function.
Returns an CeTF object.
Data
Includes the raw, tpm and norm (see normExp
) data.
DE
Includes the uniquely differentially expressed genes/TFs and
the statistics for all genes (see expDiff
).
Input
Includes input matrices for RIF (see RIF
) and PCIT
(see PCIT
) for both conditions in runAnalysis
function analysis.
Output
Includes the matrix output from RIF analysis
(see RIF
) and a matrix with PCIT output, and other two matrix
with raw and significant adjacency (see PCIT
) for both conditions
inside of runAnalysis
function analysis.
Network
Network with Gene-Gene and Gene-TF interactions for both
conditions (see PCIT
), main TFs resulted from the complete
analysis, all the TFs identified in the input data and matrix annotating all
genes and TFs.
A CeTFdemo class object to run the examples in functions. This object was generated after running the runAnalysis function. This object is the same generated in vignette for complete analysis. Note that this example data is reduced and don't have the Data slot.
data(CeTFdemo)
data(CeTFdemo)
An CeTF class object
data(CeTFdemo)
data(CeTFdemo)
Generate an plot for Transcription Targets (TFs) or any gene targets. This plot consists of sorting all the chromosomes of any specie based in GTF annotation file and showing how the selected TF(s)/gene(s) targets are distributed. If a target is connected to the same chromosome as the selected one so the connection is defined as cis, otherwise it is a trans connection.
CircosTargets(object, file, nomenclature, selection, cond)
CircosTargets(object, file, nomenclature, selection, cond)
object |
CeTF class object resulted from |
file |
GTF file or path. |
nomenclature |
Gene nomenclature: SYMBOL or ENSEMBL. |
selection |
Specify a single or up to 4 TF/gene to be visualized for. |
cond |
The options are condition1 or condition2 based on
the conditions previously defined in |
The black links are between different chromosomes while the red links are between the same chromosome.
Returns an plot with a specific(s) TF/gene and its targets in order to visualize the chromosome location of each one.
## Not run: CircosTargets(object = out, file = '/path/to/gtf/specie.gtf', nomenclature = 'SYMBOL', selection = 'TCF4', cond = 'condition1') ## End(Not run)
## Not run: CircosTargets(object = out, file = '/path/to/gtf/specie.gtf', nomenclature = 'SYMBOL', selection = 'TCF4', cond = 'condition1') ## End(Not run)
Calculate the clustering coefficient for an adjacency matrix.
clustCoef(mat)
clustCoef(mat)
mat |
An adjacency matrix. Calculating the clustering coefficient only makes sense if some connections are zero i.e. no connection. |
Returns the clustering coefficient(s) for the adjacency matrix.
Nathan S. Watson-Haigh, Haja N. Kadarmideen, and Antonio Reverter (2010). PCIT: an R package for weighted gene co-expression networks based on partial correlation and information theory approaches. Bioinformatics. 26(3) 411-413. https://academic.oup.com/bioinformatics/article/26/3/411/215002
# loading a simulated counts data data('simNorm') # running PCIT analysis results <- PCIT(simNorm) # getting the clustering coefficient clustCoef(results$adj_sig)
# loading a simulated counts data data('simNorm') # running PCIT analysis results <- PCIT(simNorm) # getting the clustering coefficient clustCoef(results$adj_sig)
Given an adjacency matrix, calculate the clustering coefficient as a percentage of non-zero adjacencies.
clustCoefPercentage(mat)
clustCoefPercentage(mat)
mat |
An adjacency matrix. Calculating the clustering coefficient percentage only makes sense if some connections are zero i.e. no connection. |
Returns the clustering coefficient as a porcentage.
Nathan S. Watson-Haigh, Haja N. Kadarmideen, and Antonio Reverter (2010). PCIT: an R package for weighted gene co-expression networks based on partial correlation and information theory approaches. Bioinformatics. 26(3) 411-413. https://academic.oup.com/bioinformatics/article/26/3/411/215002
# loading a simulated counts data data('simNorm') # running PCIT analysis results <- PCIT(simNorm) # getting the clustering coefficient as percentage clustCoefPercentage(results$adj_sig)
# loading a simulated counts data data('simNorm') # running PCIT analysis results <- PCIT(simNorm) # getting the clustering coefficient as percentage clustCoefPercentage(results$adj_sig)
Generate the density plot for adjacency matrices. This
function uses the raw adjacency matrix and significant adjacency
matrix resulted from PCIT
function.
densityPlot(mat1, mat2, threshold = 0.5)
densityPlot(mat1, mat2, threshold = 0.5)
mat1 |
Raw adjacency matrix. |
mat2 |
Significant adjacency matrix. |
threshold |
Threshold of correlation module to plot (default: 0.5). |
Returns an density plot of raw correlation with significant PCIT values.
# loading a simulated normalized data data('simNorm') # getting the PCIT results results <- PCIT(simNorm[1:20, ]) # using the PCIT results to get density distribution of correlation coefficients densityPlot(mat1 = results$adj_raw, mat2 = results$adj_sig, threshold = 0.5)
# loading a simulated normalized data data('simNorm') # getting the PCIT results results <- PCIT(simNorm[1:20, ]) # using the PCIT results to get density distribution of correlation coefficients densityPlot(mat1 = results$adj_raw, mat2 = results$adj_sig, threshold = 0.5)
Expand node selection using network propagation algorithms generating the expanded network for a core of genes and the network plot of this subnetwork.
diffusion(object, cond, genes, cyPath, name = "top_diffusion", label = TRUE)
diffusion(object, cond, genes, cyPath, name = "top_diffusion", label = TRUE)
object |
CeTF object resulted from |
cond |
Which conditions to be used to perform the diffusion analysis. The options are: network1 (1th condition) and network2 (2th condition). |
genes |
A single gene or a vector of characters indicating which genes will be used to perform diffusion analysis. |
cyPath |
System path of Cytoscape software (see details for further informations). |
name |
Network output name (default: top_diffusion) |
label |
If label is TRUE, shows the names of nodes (default: TRUE). |
To perform the diffusion analysis is necessary to install the latest Cytoscape software version (https://cytoscape.org/).
The cyPath argument varies depending on the operating system used, for example:
For Windows users: C:/Program Files/Cytoscape_v3.8.0/Cytoscape.exe
For Linux users: /home/user/Cytoscape_v3.8.0/Cytoscape
For macOS users: /Applications/Cytoscape_v3.8.0/cytoscape.sh
Returns a list with the plot of the network and a table with the diffusion network.
## Not run: data(CeTFdemo) result <- diffusion(object = CeTFdemo, cond = 'network1', genes = c('ENSG00000185591', 'ENSG00000179094'), cyPath = 'C:/Program Files/Cytoscape_v3.7.2/Cytoscape.exe', name = 'top_diffusion', label = TRUE) ## End(Not run)
## Not run: data(CeTFdemo) result <- diffusion(object = CeTFdemo, cond = 'network1', genes = c('ENSG00000185591', 'ENSG00000179094'), cyPath = 'C:/Program Files/Cytoscape_v3.7.2/Cytoscape.exe', name = 'top_diffusion', label = TRUE) ## End(Not run)
Enrichemnt result from CeTFdemo
using the genes
of condition 1 network.
data(enrichdemo)
data(enrichdemo)
An list
data(enrichdemo)
data(enrichdemo)
Generate three types of plots to visualize the enrichment
analysis results from getEnrich
function. The plots are an
circular barplot, barplot and dotplot.
enrichPlot(res, showCategory = 10, type = "circle")
enrichPlot(res, showCategory = 10, type = "circle")
res |
A dataframe with |
showCategory |
Number of enriched terms to display (default: 10). |
type |
Type of plot: circle, bar or dot (default: circle). |
Returns a circle, bar or dot plot of enrichment analysis results.
# loading enrichdemo data(enrichdemo) # circle barplot enrichPlot(res = enrichdemo$results, showCategory = 10, type = 'circle') # barplot enrichPlot(res = enrichdemo$results, showCategory = 10, type = 'bar') # dotplot enrichPlot(res = enrichdemo$results, showCategory = 10, type = 'dot')
# loading enrichdemo data(enrichdemo) # circle barplot enrichPlot(res = enrichdemo$results, showCategory = 10, type = 'circle') # barplot enrichPlot(res = enrichdemo$results, showCategory = 10, type = 'bar') # dotplot enrichPlot(res = enrichdemo$results, showCategory = 10, type = 'dot')
This function returns the differentially expressed genes when comparing two conditions.
expDiff( exp, anno = NULL, conditions = NULL, lfc = 1.5, padj = 0.05, diffMethod = "Reverter" )
expDiff( exp, anno = NULL, conditions = NULL, lfc = 1.5, padj = 0.05, diffMethod = "Reverter" )
exp |
Count data where the rows are genes and coluns the samples. |
anno |
A single column dataframe. The column name must be 'cond', and the rownames must be the names of samples. |
conditions |
A character vector containing the name of the two conditions. The first name will be selected as reference. |
lfc |
log2 fold change module threshold to define a gene as differentially expressed (default: 1.5). |
padj |
Significance value to define a gene as differentially expressed (default: 0.05). |
diffMethod |
Choose between Reverter or DESeq2 method (default: 'Reverter'). The DESeq2 method is only for counts data (see details). |
The Reverter option to diffMethod parameter works as follows:
Calculation of mean between samples of each condition for all genes;
Subtraction between mean of control condition relative to other condition;
Calculation of variance of subtraction previously obtained;
The last step calculates the differential expression using the following formula, where x is the result of substraction (item 2) and var is the variance calculated in item 3:
The DESeq2 option to diffMethod parameter is recommended only for
count data. This method apply the differential expression analysis based
on the negative binomial distribution (see DESeq
).
Returns an list with all calculations of differentially expressed genes and the subsetted differentially expressed genes by lfc and/or padj.
REVERTER, Antonio et al. Simultaneous identification of differential gene expression and connectivity in inflammation, adipogenesis and cancer. Bioinformatics, v. 22, n. 19, p. 2396-2404, 2006. https://academic.oup.com/bioinformatics/article/22/19/2396/240742
# loading a simulated counts data data('simCounts') # creating the dataframe with annotation for each sample anno <- data.frame(cond = c(rep('cond1', 10), rep('cond2', 10))) # renaming colums of simulated counts data colnames(simCounts) <- paste(colnames(simCounts), anno$cond, sep = '_') # renaming anno rows rownames(anno) <- colnames(simCounts) # performing differential expression analysis using Reverter method out <- expDiff(exp = simCounts, anno = anno, conditions = c('cond1', 'cond2'), lfc = 2, padj = 0.05, diffMethod = 'Reverter')
# loading a simulated counts data data('simCounts') # creating the dataframe with annotation for each sample anno <- data.frame(cond = c(rep('cond1', 10), rep('cond2', 10))) # renaming colums of simulated counts data colnames(simCounts) <- paste(colnames(simCounts), anno$cond, sep = '_') # renaming anno rows rownames(anno) <- colnames(simCounts) # performing differential expression analysis using Reverter method out <- expDiff(exp = simCounts, anno = anno, conditions = c('cond1', 'cond2'), lfc = 2, padj = 0.05, diffMethod = 'Reverter')
The Data accessor access the raw, tpm and normalized data from
runAnalysis
function analysis.
getData(x, type = "raw") ## S4 method for signature 'CeTF' getData(x, type = "raw")
getData(x, type = "raw") ## S4 method for signature 'CeTF' getData(x, type = "raw")
x |
|
type |
Type of data: raw, tpm or norm (default: raw) |
Returns the raw, tpm or normalized data.
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) getData(CeTFdemo)
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) getData(CeTFdemo)
The DE accessor access the differential expression resulted from
runAnalysis
function analysis.
getDE(x, type = "unique") ## S4 method for signature 'CeTF' getDE(x, type = "unique")
getDE(x, type = "unique") ## S4 method for signature 'CeTF' getDE(x, type = "unique")
x |
|
type |
Type of DE matrix: unique and all (default: unique) |
Returns the DE genes with the statistics.
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) getDE(CeTFdemo)
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) getDE(CeTFdemo)
Enrichment analysis of a set of genes derived from the network of any condition using clusterProfiler. Given a vector of genes, this function will return the enrichment related to the selected database.
getEnrich( genes, organismDB, keyType, ont, fdrMethod = "BH", fdrThr = 0.05, minGSSize = 5, maxGSSize = 500 )
getEnrich( genes, organismDB, keyType, ont, fdrMethod = "BH", fdrThr = 0.05, minGSSize = 5, maxGSSize = 500 )
genes |
Should be an R vector object containing the interesting gene list. |
organismDB |
clusterProfiler supports a lot of different organisms. Users can check the following link (https://www.bioconductor.org/packages/release/data/annotation/) and search for annotations starting with *org.*. |
keyType |
The ID type of the input genes (i.e. SYMBOL, ENTREZID, ENSEMBL, etc.). |
ont |
The functional categories for the enrichment analysis. The available ontologies are Biological Process (BP), Molecular Function (MF) and Cellular Component (CC). |
fdrMethod |
Has five FDR methods: holm, hochberg, hommel, bonferroni, BH, BY, fdr and none(default: BH). |
fdrThr |
The significant threshold for selected pathways (default: 0.05). |
minGSSize |
Will be exclude the categories with the number of annotated genes less than minGSSize for enrichment analysis (default: 5). |
maxGSSize |
Will be exclude the categories with the number of annotated genes larger than maxGSSize for enrichment analysis (default: 500). |
Returns an list with the results of the enrichment analysis of the genes and a network with the database ID (column 1) and the corresponding genes (column 2).
## Not run: # load the CeTF class object resulted from runAnalysis function library(org.Hs.eg.db) data(CeTFdemo) # getting the genes in network of condition 1 genes <- unique(c(as.character(NetworkData(CeTFdemo, 'network1')[, 'gene1']), as.character(NetworkData(CeTFdemo, 'network1')[, 'gene2']))) # performing getEnrich analysis cond1 <- getEnrich(genes = genes, organismDB = org.Hs.eg.db, keyType = 'ENSEMBL', ont = 'BP', fdrMethod = "BH", fdrThr = 0.05, minGSSize = 5, maxGSSize = 500) ## End(Not run)
## Not run: # load the CeTF class object resulted from runAnalysis function library(org.Hs.eg.db) data(CeTFdemo) # getting the genes in network of condition 1 genes <- unique(c(as.character(NetworkData(CeTFdemo, 'network1')[, 'gene1']), as.character(NetworkData(CeTFdemo, 'network1')[, 'gene2']))) # performing getEnrich analysis cond1 <- getEnrich(genes = genes, organismDB = org.Hs.eg.db, keyType = 'ENSEMBL', ont = 'BP', fdrMethod = "BH", fdrThr = 0.05, minGSSize = 5, maxGSSize = 500) ## End(Not run)
Functional Profile of a gene set at specific GO level. Given a vector of genes, this function will return the GO profile at a specific level.
getGroupGO(genes, ont = "BP", keyType, annoPkg, level = 3)
getGroupGO(genes, ont = "BP", keyType, annoPkg, level = 3)
genes |
Character vector with the genes to perform the functional profile. |
ont |
One of 'MF', 'BP', and 'CC' subontologies (default: 'BP'). |
keyType |
Key type of inputted genes (i.e. 'ENSEMBL', 'SYMBOL', 'ENTREZID'). |
annoPkg |
Package of annotation of specific organism (i.e. org.Hs.eg.db, org.Bt.eg.db, org.Rn.eg.db, etc). |
level |
Specific GO Level (default: 3). |
Returns an list with the results of the functional profile of the genes and a network with the ontologies (column 1) and the corresponding genes (column 2).
## Not run: # load the annotation package library(org.Hs.eg.db) # load the CeTF class object resulted from runAnalysis function data(CeTFdemo) # getting the genes in network of condition 1 genes <- unique(c(as.character(NetworkData(CeTFdemo, 'network1')[, 'gene1']), as.character(NetworkData(CeTFdemo, 'network1')[, 'gene2']))) # performing getGroupGO analysis cond1 <- getGroupGO(genes = genes, ont = 'BP', keyType = 'ENSEMBL', annoPkg = org.Hs.eg.db, level = 3) ## End(Not run)
## Not run: # load the annotation package library(org.Hs.eg.db) # load the CeTF class object resulted from runAnalysis function data(CeTFdemo) # getting the genes in network of condition 1 genes <- unique(c(as.character(NetworkData(CeTFdemo, 'network1')[, 'gene1']), as.character(NetworkData(CeTFdemo, 'network1')[, 'gene2']))) # performing getGroupGO analysis cond1 <- getGroupGO(genes = genes, ont = 'BP', keyType = 'ENSEMBL', annoPkg = org.Hs.eg.db, level = 3) ## End(Not run)
Converts a GTF to BED format.
gtfToBed(gtf)
gtfToBed(gtf)
gtf |
A GTF as data.frame. |
Returns a data.frame in BED format.
Heatmap-like functional classification to visualize the
enrichment analysis results from getEnrich
function. The plot
contains the heatmap with the associated pathways genes, the significance
of the enrichment and a barplot with the enrichment ratio.
heatPlot(res, diff, showCategory = 10, font_size = 6)
heatPlot(res, diff, showCategory = 10, font_size = 6)
res |
A dataframe with |
diff |
A dataframe with all differentialy expressed genes obtained
from |
showCategory |
Number of enriched terms to display (default: 10). |
font_size |
Size of gene row names (default: 6). |
Returns a Heatmap-like functional classification
# loading enrichdemo and CeTFdemo object data(enrichdemo) data(CeTFdemo) heatPlot(res = enrichdemo$results, diff = getDE(CeTFdemo, 'all'), showCategory = 10)
# loading enrichdemo and CeTFdemo object data(enrichdemo) data(CeTFdemo) heatPlot(res = enrichdemo$results, diff = getDE(CeTFdemo, 'all'), showCategory = 10)
Generate the histogram for adjacency matrix to show the clustering coefficient distribution.
histPlot(mat)
histPlot(mat)
mat |
Adjacency matrix resulting from PCIT analysis in which has some zero values. |
Returns the histogram of connectivity distribution.
# loading a simulated normalized data data(simNorm) # getting the PCIT results for first 30 genes results <- PCIT(simNorm[1:30, ]) # plotting the histogram for PCIT significance results histPlot(results$adj_sig)
# loading a simulated normalized data data(simNorm) # getting the PCIT results for first 30 genes results <- PCIT(simNorm[1:30, ]) # plotting the histogram for PCIT significance results histPlot(results$adj_sig)
The input accessor access the input matrices used for RIF and
PCIT analysis to both conditions resulted from runAnalysis
function analysis.
InputData(x, analysis = "rif") ## S4 method for signature 'CeTF' InputData(x, analysis = "rif")
InputData(x, analysis = "rif") ## S4 method for signature 'CeTF' InputData(x, analysis = "rif")
x |
|
analysis |
Type of analysis: rif, pcit1, pcit2. The numbers 1 and 2 correspond to the respective condition (default: rif). |
Returns the Inputs used for RIF and PCIT.
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) InputData(CeTFdemo)
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) InputData(CeTFdemo)
Generate the network plot of gene-gene/gene-TFs interactions for both conditions.
netConditionsPlot(x)
netConditionsPlot(x)
x |
CeTF object resulted from |
Returns the network plot for both conditions.
# loading a simulated counts data data('simCounts') # performing runAnalysis function out <- runAnalysis(mat = simCounts, conditions=c('cond1', 'cond2'), lfc = 3, padj = 0.05, TFs = paste0('TF_', 1:1000), nSamples1 = 10, nSamples2= 10, tolType = 'mean', diffMethod = 'Reverter', data.type = 'counts') # plotting networks conditions netConditionsPlot(out)
# loading a simulated counts data data('simCounts') # performing runAnalysis function out <- runAnalysis(mat = simCounts, conditions=c('cond1', 'cond2'), lfc = 3, padj = 0.05, TFs = paste0('TF_', 1:1000), nSamples1 = 10, nSamples2= 10, tolType = 'mean', diffMethod = 'Reverter', data.type = 'counts') # plotting networks conditions netConditionsPlot(out)
Generate the plot of groupGO network result of
getGroupGO
function, and the integrated network of genes, GOs
and TFs.
netGOTFPlot( netCond, resultsGO, netGO, anno, groupBy = "pathways", TFs = NULL, genes = NULL, keyTFs = NULL, size = 0.5, type = NULL )
netGOTFPlot( netCond, resultsGO, netGO, anno, groupBy = "pathways", TFs = NULL, genes = NULL, keyTFs = NULL, size = 0.5, type = NULL )
netCond |
Network of a specific condition. Can be found in
result of |
resultsGO |
Dataframe with the results of |
netGO |
Dataframe with the results of |
anno |
Annotation of gene or TFs. Can be found in result of
|
groupBy |
Which variables do you want to group in GO type? The options are: 'pathways', 'TFs' and 'genes' (default: 'pathways'). |
TFs |
A character with selected TFs. |
genes |
A character with selected genes. |
keyTFs |
TFs identified as importants by |
size |
Size of nodes labels (default: 0.5). |
type |
Type of plot selected (GO or Integrated). If GO will plot the associated GO grouped by some variable, and if Integrated will plot a integrated network with genes, GO and TFs. |
Returns a list with the plot of the network for GO or integrated output under a condition and the table used to plot the network.
## Not run: # load the annotation package library(org.Hs.eg.db) # load the CeTF class object resulted from runAnalysis function data(CeTFdemo) # getting the genes in network of condition 1 genes <- unique(c(as.character(NetworkData(CeTFdemo, 'network1')[, 'gene1']), as.character(NetworkData(CeTFdemo, 'network1')[, 'gene2']))) # performing getGroupGO analysis cond1 <- getGroupGO(genes = genes, ont = 'BP', keyType = 'ENSEMBL', annoPkg = org.Hs.eg.db, level = 3) # selecting only first 12 pathways t1 <- head(cond1$results, 12) # subsetting the network to have only the first 12 pathways t2 <- subset(cond1$netGO, cond1$netGO$gene1 %in% as.character(t1[, 'ID'])) # generate the GO plot grouping by pathways pt <- netGOTFPlot(netCond = NetworkData(CeTFdemo, 'network1'), resultsGO = t1, netGO = t2, anno = NetworkData(CeTFdemo, 'annotation'), groupBy = 'pathways', keyTFs = NetworkData(CeTFdemo, 'keytfs'), type = 'GO') pt$plot head(pt$tab$`GO:0006807`) # generate the Integrated plot pt <- netGOTFPlot(netCond = NetworkData(CeTFdemo, 'network1'), resultsGO = t1, netGO = t2, anno = NetworkData(CeTFdemo, 'annotation'), groupBy = 'pathways', keyTFs = NetworkData(CeTFdemo, 'keytfs'), type = 'Integrated') pt$plot head(pt$tab) ## End(Not run)
## Not run: # load the annotation package library(org.Hs.eg.db) # load the CeTF class object resulted from runAnalysis function data(CeTFdemo) # getting the genes in network of condition 1 genes <- unique(c(as.character(NetworkData(CeTFdemo, 'network1')[, 'gene1']), as.character(NetworkData(CeTFdemo, 'network1')[, 'gene2']))) # performing getGroupGO analysis cond1 <- getGroupGO(genes = genes, ont = 'BP', keyType = 'ENSEMBL', annoPkg = org.Hs.eg.db, level = 3) # selecting only first 12 pathways t1 <- head(cond1$results, 12) # subsetting the network to have only the first 12 pathways t2 <- subset(cond1$netGO, cond1$netGO$gene1 %in% as.character(t1[, 'ID'])) # generate the GO plot grouping by pathways pt <- netGOTFPlot(netCond = NetworkData(CeTFdemo, 'network1'), resultsGO = t1, netGO = t2, anno = NetworkData(CeTFdemo, 'annotation'), groupBy = 'pathways', keyTFs = NetworkData(CeTFdemo, 'keytfs'), type = 'GO') pt$plot head(pt$tab$`GO:0006807`) # generate the Integrated plot pt <- netGOTFPlot(netCond = NetworkData(CeTFdemo, 'network1'), resultsGO = t1, netGO = t2, anno = NetworkData(CeTFdemo, 'annotation'), groupBy = 'pathways', keyTFs = NetworkData(CeTFdemo, 'keytfs'), type = 'Integrated') pt$plot head(pt$tab) ## End(Not run)
The networks accessor access the networks, key TFs and
annotations for each gene and TF resulted from PCIT analysis and
runAnalysis
function analysis.
NetworkData(x, type = "network1") ## S4 method for signature 'CeTF' NetworkData(x, type = "network1")
NetworkData(x, type = "network1") ## S4 method for signature 'CeTF' NetworkData(x, type = "network1")
x |
|
type |
Type of data: network1, network2, keytfs, tfs or annotation.The numbers 1 and 2 correspond to the respective condition (default: network1). |
Returns the Outputs used for RIF and PCIT.
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) NetworkData(CeTFdemo)
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) NetworkData(CeTFdemo)
Normalize the expression data of any type of experiment by columns, applying log(x + 1)/log(2).
normExp(tab)
normExp(tab)
tab |
A matrix or dataframe of expression data (i.e. TPM, counts, FPKM). |
Returns a table with normalized values.
# loading a simulated counts data data('simCounts') # getting the TPM matrix from counts tpm <- apply(simCounts, 2, function(x) { (1e+06 * x)/sum(x) }) # normalizing TPM data norm <- normExp(tpm)
# loading a simulated counts data data('simCounts') # getting the TPM matrix from counts tpm <- apply(simCounts, 2, function(x) { (1e+06 * x)/sum(x) }) # normalizing TPM data norm <- normExp(tpm)
The output accessor access the output matrices and lists
used for RIF and PCIT analysis to both conditions resulted from
runAnalysis
function analysis.
OutputData(x, analysis = "rif", type = "tab") ## S4 method for signature 'CeTF' OutputData(x, analysis = "rif", type = "tab")
OutputData(x, analysis = "rif", type = "tab") ## S4 method for signature 'CeTF' OutputData(x, analysis = "rif", type = "tab")
x |
|
analysis |
Type of analysis: rif, pcit1, pcit2. The numbers 1 and 2 correspond to the respective condition (default: rif). |
type |
Type of matrix for PCIT output: tab, adj_raw or adj_sig (default: tab). |
Returns the Outputs used for RIF and PCIT.
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) OutputData(CeTFdemo)
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) OutputData(CeTFdemo)
The PCIT algorithm is used for reconstruction of gene co-expression networks (GCN) that combines the concept partial correlation coefficient with information theory to identify significant gene to gene associations defining edges in the reconstruction of GCN.
PCIT(input, tolType = "mean")
PCIT(input, tolType = "mean")
input |
A correlation matrix. |
tolType |
Type of tolerance (default: 'mean') given the 3 pairwise correlations
(see |
Returns an list with the significant correlations, raw adjacency matrix and significant adjacency matrix.
REVERTER, Antonio; CHAN, Eva KF. Combining partial correlation and an information theory approach to the reversed engineering of gene co-expression networks. Bioinformatics, v. 24, n. 21, p. 2491-2497, 2008. https://academic.oup.com/bioinformatics/article/24/21/2491/192682
# loading a simulated normalized data data('simNorm') # getting the PCIT results for first 30 genes results <- PCIT(simNorm[1:30, ]) # printing PCIT output first 15 rows head(results$tab, 15)
# loading a simulated normalized data data('simNorm') # getting the PCIT results for first 30 genes results <- PCIT(simNorm[1:30, ]) # printing PCIT output first 15 rows head(results$tab, 15)
Calculates the correlation matrix using PCIT algorithm
pcitC(cor, tolType)
pcitC(cor, tolType)
cor |
A correlation matrix. |
tolType |
Type of tolerance (default: 'mean') given the 3 pairwise correlations
(see |
Correlation matrix resulted from PCIT algorithm.
(see PCIT
)
library(Matrix) # loading a simulated normalized data data('simNorm') # calculating the correlation matrix suppressWarnings(gene_corr <- cor(t(simNorm[1:30, ]))) gene_corr[is.na(gene_corr)] <- 0 # getting the PCIT correlation results for first 30 genes results <- pcitC(cor = Matrix(gene_corr, sparse = TRUE), tolType = 1)
library(Matrix) # loading a simulated normalized data data('simNorm') # calculating the correlation matrix suppressWarnings(gene_corr <- cor(t(simNorm[1:30, ]))) gene_corr[is.na(gene_corr)] <- 0 # getting the PCIT correlation results for first 30 genes results <- pcitC(cor = Matrix(gene_corr, sparse = TRUE), tolType = 1)
List with protein codings for 5 organisms that must be used as reference genes for functional enrichment. This list was generated using Ensembl GTF. The organisms are: Human (Homo sapiens), Mouse (Mus musculus), Zebrafish (Danio rerio), Cow (Bos taurus) and Rat (Rattus norvegicus).
data(refGenes)
data(refGenes)
An list.
https://www.ensembl.org/info/data/ftp/index.html
data(refGenes)
data(refGenes)
The RIF algorithm identify critical transcript factors (TF) from gene expression data.
RIF(input, nta = NULL, ntf = NULL, nSamples1 = NULL, nSamples2 = NULL)
RIF(input, nta = NULL, ntf = NULL, nSamples1 = NULL, nSamples2 = NULL)
input |
A matrix of expression with differentially expressed genes and transcript factors in rows, and the samples in columns. |
nta |
Number of Differentially Expressed (DE) genes. |
ntf |
Number of Transcription Factors (TFs). |
nSamples1 |
Number of samples of condition 1. |
nSamples2 |
Number of samples of condition 2. |
The input matrix must have the rows and columns ordered by the following request:
rows: DE genes followed by TFs;
columns: samples of condition1 followed by samples of condition2.
Returns an dataframe with the regulatory impact factors metric for each transcript factor.
REVERTER, Antonio et al. Regulatory impact factors: unraveling the transcriptional regulation of complex traits from expression data. Bioinformatics, v. 26, n. 7, p. 896-904, 2010. https://academic.oup.com/bioinformatics/article/26/7/896/212064
# load RIF input example data('RIF_input') # performing RIF analysis RIF_out <- RIF(input = RIF_input, nta = 104, ntf = 50, nSamples1 = 10, nSamples2 = 10)
# load RIF input example data('RIF_input') # performing RIF analysis RIF_out <- RIF(input = RIF_input, nta = 104, ntf = 50, nSamples1 = 10, nSamples2 = 10)
Data used to the examples of RIF analysis. This data was generated based on simulated counts and normalized data.
data(RIF_input)
data(RIF_input)
An dataframe.
data(RIF_input)
data(RIF_input)
Generate plots for the relationship between the RIF output analysis (RIF1 and RIF2) and for differentially expressed genes (DE).
RIFPlot(object, color = "darkblue", type = "RIF")
RIFPlot(object, color = "darkblue", type = "RIF")
object |
CeTF object resulted from |
color |
Color of points (default: darkblue) |
type |
Type of plot. The available options are: RIF or DE (default: RIF) |
This function can only be used after using the runAnalysis
function as it uses the CeTF class object as input.
Returns a relationship plot between RIF1 and RIF2 or a plot with the relationship between RIF1 or RIF2 with DE genes.
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) # performing RIFPlot for RIF RIFPlot(object = CeTFdemo, color = 'darkblue', type = 'RIF') # performing RIFPlot for DE RIFPlot(object = CeTFdemo, color = 'darkblue', type = 'DE')
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) # performing RIFPlot for RIF RIFPlot(object = CeTFdemo, color = 'darkblue', type = 'RIF') # performing RIFPlot for DE RIFPlot(object = CeTFdemo, color = 'darkblue', type = 'DE')
This function uses RIF and PCIT algorithms to run the whole pipeline analysis. The pipeline is composed by 4 steps:
Step 1: Data adjustment;
Step 2: Differential expression analysis;
Step 3: Regulatory Impact Factors analysis;
Step 4: Partial Correlation and Information Theory analysis.
runAnalysis( mat, conditions = NULL, lfc = 2.57, padj = 0.05, TFs = NULL, nSamples1 = NULL, nSamples2 = NULL, tolType = "mean", diffMethod = "Reverter", data.type = NULL )
runAnalysis( mat, conditions = NULL, lfc = 2.57, padj = 0.05, TFs = NULL, nSamples1 = NULL, nSamples2 = NULL, tolType = "mean", diffMethod = "Reverter", data.type = NULL )
mat |
Count data where the rows are genes and coluns the samples (conditions). |
conditions |
A vector of characters identifying the names of conditions (i.e. c('normal', 'tumor')). |
lfc |
logFoldChange module threshold to define a gene as differentially expressed (default: 2.57). |
padj |
Significance value to define a gene as differentially expressed (default: 0.05). |
TFs |
A vector of character with all transcripts factors of specific organism. |
nSamples1 |
Number of samples that correspond to first condition. |
nSamples2 |
Number of samples that correspond to second condition. |
tolType |
Tolerance calculation type (see |
diffMethod |
Method to calculate Differentially Expressed (DE) genes (see |
data.type |
Type of input data. If is expression (FPKM, TPM, etc) or counts. |
Returns an CeTF class object with output variables of each step of analysis.
data('simCounts') out <- runAnalysis(mat = simCounts, conditions=c('cond1', 'cond2'), lfc = 3, padj = 0.05, TFs = paste0('TF_', 1:1000), nSamples1 = 10, nSamples2= 10, tolType = 'mean', diffMethod = 'Reverter', data.type = 'counts')
data('simCounts') out <- runAnalysis(mat = simCounts, conditions=c('cond1', 'cond2'), lfc = 3, padj = 0.05, TFs = paste0('TF_', 1:1000), nSamples1 = 10, nSamples2= 10, tolType = 'mean', diffMethod = 'Reverter', data.type = 'counts')
Simulated counts data created using PROPER package. This data contains 21,000 genes, 1,000 transcript factors and 20 samples (divided in two conditions).
data(simCounts)
data(simCounts)
An dataframe.
data(simCounts)
data(simCounts)
Simulated normalized data created using PROPER package. This data contains 69 genes, and 10 samples (correspondent to only one condition).
data(simNorm)
data(simNorm)
An dataframe.
data(simNorm)
data(simNorm)
Generate an plot for Differentially Expressed (DE) genes and for specific TF that shows the relationship between log(baseMean) and Difference of Expression or log2FoldChange. This plot enables to visualize the distribution of DE genes and TF in both conditions.
SmearPlot( object, diffMethod, lfc = 1.5, conditions, TF = NULL, padjust = 0.05, label = FALSE, type = NULL )
SmearPlot( object, diffMethod, lfc = 1.5, conditions, TF = NULL, padjust = 0.05, label = FALSE, type = NULL )
object |
CeTF class object resulted from |
diffMethod |
Method used to calculate Differentially Expressed (DE)
genes (see |
lfc |
logFoldChange module threshold to define a gene as differentially expressed (default: 1.5). |
conditions |
A vector of characters identifying the names of conditions (i.e. c('normal', 'tumoral')). |
TF |
Specify a single TF to be visualized for (used only if type argument equals TF). |
padjust |
Significance value to define a gene as differentially expressed in DESeq2 diffMethod option (default: 0.05). |
label |
If label is TRUE, shows the names of single TF and its respectives (default: FALSE). |
type |
Type of plot (DE or TF). If DE, will plot the smear plot for all differentally expressed genes and TFs for both conditions, and if TF, will plot the smear plot for a specific TF and their targets. targets for both conditions (default: FALSE). |
Returns an plot of log2(baseMean) by log2FoldChange or difference of expression for genes and TFs differentially expressed or for a single TF and its targets for both conditions.
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) #performing SmearPlot for DE genes and TFs SmearPlot(object = CeTFdemo, diffMethod = 'Reverter', lfc = 1.5, conditions = c('untrt', 'trt'), type = 'DE') #performing SmearPlot for DE genes and TFs SmearPlot(object = CeTFdemo, diffMethod = 'Reverter', lfc = 1.5, conditions = c('untrt', 'trt'), TF = 'ENSG00000205189', label = FALSE, type = 'TF')
# load the CeTF class object resulted from runAnalysis function data(CeTFdemo) #performing SmearPlot for DE genes and TFs SmearPlot(object = CeTFdemo, diffMethod = 'Reverter', lfc = 1.5, conditions = c('untrt', 'trt'), type = 'DE') #performing SmearPlot for DE genes and TFs SmearPlot(object = CeTFdemo, diffMethod = 'Reverter', lfc = 1.5, conditions = c('untrt', 'trt'), TF = 'ENSG00000205189', label = FALSE, type = 'TF')
Transcripition Factors data from Kai Wang and Hiroki Nishida, 2015 for Homo sapiens.
data(TFs)
data(TFs)
An character vector with TFs for Homo sapiens.
See https://doi.org/10.1186/s12859-015-0552-x
data(TFs)
data(TFs)
Calculates the local tolerance for every trio of genes.
tolerance(a, b, c, tolType)
tolerance(a, b, c, tolType)
a |
Correlation value between the genes A and B. |
b |
Correlation value between the genes B and C. |
c |
Correlation value between the genes A and C. |
tolType |
Calculation type for tolerance (1 for mean, 2 for min and 3 for max). |
Returns the value of tolerance.
See vignette for more details about the pairwise correlations.
tolerance(0.5, -0.65, 0.23, tolType = 1) tolerance(0.5, -0.65, 0.23, tolType = 2) tolerance(0.5, -0.65, 0.23, tolType = 3)
tolerance(0.5, -0.65, 0.23, tolType = 1) tolerance(0.5, -0.65, 0.23, tolType = 2) tolerance(0.5, -0.65, 0.23, tolType = 3)