Title: | a Global Weighted Model for Disease Ontology Enrichment Analysis |
---|---|
Description: | This package implements disease ontology (DO) enrichment analysis. It presents a double weighted model based on the latest annotations of the human genome with DO terms and integrates the DO graph topology on a global scale. The package achieves high accuracy in that it identifies more specific DO terms, which alleviates the over-enrichment problem. It includes various statistical models and visualization schemes for discovering associations between genes and diseases from biological big data. |
Authors: | Liang Cheng [aut], Haixiu Yang [aut], Hongyu Fu [cre] |
Maintainer: | Hongyu Fu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.1 |
Built: | 2024-12-29 07:56:09 UTC |
Source: | https://github.com/bioc/EnrichDO |
Liang Cheng, Haixiu Yang, Hongyu Fu
Maintainer: Haixiu Yang [email protected]
Prepares the result written by writeResult for convenient drawing.
convDraw(resultDO)
resultDO | A data frame of enrichment results. |
DataFrame
Haixiu Yang
# Draw from writeResult output files
# First, read the writeResult output file using the following two lines
data <- read.delim(file.path(system.file('examples', package = 'EnrichDO'), 'result.txt'))
enrich <- convDraw(resultDO = data)
# Then, use the drawing function you need
drawGraphViz(enrich=enrich)    # Tree diagram
drawPointGraph(enrich=enrich)  # Bubble diagram
drawBarGraph(enrich=enrich)    # Bar plot
Given a vector of human protein-coding genes in NCBI ENTREZID format, this function performs enrichment analysis that incorporates the topological properties of the disease ontology structure.
doEnrich(
  interestGenes,
  test = c("hypergeomTest", "fisherTest", "binomTest", "chisqTest", "logoddTest"),
  method = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr", "none"),
  m = 1,
  maxGsize = 5000,
  minGsize = 5,
  traditional = FALSE,
  delta = 0.01,
  penalize = TRUE,
  allDOTerms = FALSE
)
interestGenes | A vector of gene IDs. The genes of interest should be protein-coding genes in the NCBI ENTREZID format. |
test | The statistical model: one of 'hypergeomTest', 'fisherTest', 'binomTest', 'chisqTest' and 'logoddTest'. Default is 'hypergeomTest'. |
method | The P value correction method: one of 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', 'fdr' and 'none'. |
m | The maximum number of ancestor layers used for ontology enrichment. Default is 1. |
maxGsize | DO terms annotated with more genes than maxGsize are ignored, and their P value is set to 1. |
minGsize | DO terms annotated with fewer genes than minGsize are ignored, and their P value is set to 1. |
traditional | Logical value: TRUE for traditional enrichment analysis, FALSE for enrichment analysis with weights. Default is FALSE. |
delta | The significance threshold for nodes. If the P value of a DO term is greater than delta, the node is not significant and is not weighted. Default is 0.01. |
penalize | Logical value, used to alleviate the impact of different magnitudes of P values. Default is TRUE. When set to FALSE, the weight of non-significant nodes is reduced to a lesser degree. |
allDOTerms | Logical value: whether to store all DO terms in the EnrichResult. Default is FALSE (only significant nodes are retained). |
An EnrichResult instance.
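To make the `test`, `method`, `minGsize` and `maxGsize` arguments more concrete, the following base-R sketch runs a plain over-representation analysis by hand. It is not the package's weighted model and does not call EnrichDO. The toy annotation `term_genes`, the background `universe` and the helper `ora_sketch` are hypothetical names invented for illustration, and only the hypergeometric test with BH correction is shown.

```r
# Hypothetical toy annotation: each made-up term maps to a set of gene IDs.
universe <- as.character(1:200)
term_genes <- list(
  termA = as.character(1:20),    # 20 annotated genes
  termB = as.character(15:60),   # 46 annotated genes
  termC = as.character(1:3),     # fewer genes than minGsize
  termD = as.character(1:150)    # more genes than maxGsize
)
interest <- as.character(c(1:10, 100, 150))

ora_sketch <- function(interest, term_genes, universe,
                       minGsize = 5, maxGsize = 100, method = "BH") {
  interest <- intersect(interest, universe)
  sizes <- lengths(term_genes)
  p <- vapply(term_genes, function(genes) {
    k <- length(intersect(interest, genes))  # genes shared with the term
    # Upper tail P(X >= k) of the hypergeometric distribution
    phyper(k - 1, length(genes), length(universe) - length(genes),
           length(interest), lower.tail = FALSE)
  }, numeric(1))
  # Terms outside the size limits are ignored: their P value is set to 1.
  p[sizes < minGsize | sizes > maxGsize] <- 1
  data.frame(term = names(term_genes), size = unname(sizes), p = unname(p),
             p.adjust = p.adjust(unname(p), method = method),
             stringsAsFactors = FALSE)
}

res <- ora_sketch(interest, term_genes, universe)
print(res[order(res$p), ])
```

In this toy input only `termA` shares many genes with the interest set, so it is the only term with a small P value; `termC` and `termD` are excluded by the size limits.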
Haixiu Yang
## Input data case
# The inputdata_demo variable stores validated protein-coding genes associated with Alzheimer's disease.
Alzheimer <- read.delim(file.path(system.file('extdata', package='EnrichDO'), 'Alzheimer_curated.csv'), header = FALSE)
inputdata_demo <- Alzheimer[,1]
## doEnrich case
# The enrichment results were obtained by using demo.data
demo.data <- c(1636,351,102,2932,3077,348,4137,54209)
demo_result <- doEnrich(interestGenes=demo.data,maxGsize = 100, minGsize=10)
A dataset that includes 15106 genes.
dotermgenes
A character array with 15106 elements.
A dataset that includes 4831 DO terms with hierarchical information, annotated gene information, and weight information.
doterms
A data frame with 4813 rows and 10 variables:
the DOterm ID on enrichment
the hierarchy of the DOterm in the DAG graph
all genes related to the DOterm
gene weights in each node
the parent node of the DOterm
the number of parent.arr
child nodes of the DOterm
the number of child.arr
the number of all genes related to the DOterm
the standard name of the DOterm
The enrichment results are shown in a bar chart
drawBarGraph(EnrichResult = NULL, enrich = NULL, n = 10, delta = 1e-15)
EnrichResult | The EnrichResult object. |
enrich | A data frame of enrichment results. |
n | The number of bars. |
delta | The P value threshold. |
bar graph
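The roles of `n` and `delta` can be illustrated with a small base-R sketch: keep the terms whose P value does not exceed the threshold, then plot at most `n` of the most significant ones. The data frame `toy`, its column names (`term`, `p`) and the helper `top_terms` are hypothetical and do not reflect the package's actual result layout. Whether the package applies the threshold strictly or inclusively is not stated here, so the inclusive comparison is an assumption.

```r
# Hypothetical enrichment table; column names are invented for this sketch.
toy <- data.frame(
  term = paste0("term", 1:6),
  p    = c(1e-20, 3e-4, 0.2, 1e-8, 0.04, 1e-16),
  stringsAsFactors = FALSE
)

top_terms <- function(enrich, n = 10, delta = 0.05) {
  keep <- enrich[enrich$p <= delta, , drop = FALSE]  # apply the threshold
  keep <- keep[order(keep$p), , drop = FALSE]        # most significant first
  head(keep, n)                                      # at most n rows
}

sel <- top_terms(toy, n = 3, delta = 0.05)

# Horizontal bars of -log10(p); reversing puts the most significant term on top.
barplot(rev(-log10(sel$p)), names.arg = rev(sel$term), horiz = TRUE,
        las = 1, xlab = "-log10(p)")
```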
Haixiu Yang
demo.data <- c(1636,351,102,2932,3077,348,4137,54209)
sample1 <- doEnrich(interestGenes=demo.data,maxGsize = 100, minGsize=10)
drawBarGraph(EnrichResult=sample1, n=10, delta=0.05)
The enrichment results are shown in a tree diagram
drawGraphViz(
  EnrichResult = NULL,
  enrich = NULL,
  n = 10,
  labelfontsize = 14,
  numview = TRUE,
  pview = TRUE
)
EnrichResult | The EnrichResult object. |
enrich | A data frame of enrichment results. |
n | The number of most significant nodes. |
labelfontsize | The font size of the node labels. |
numview | Logical value: whether to display the number of genes shared between the interest set and each DO term. |
pview | Logical value: whether to display the P value of each DO term. |
tree diagram
Haixiu Yang
demo.data <- c(1636,351,102,2932,3077,348,4137,54209)
sample5 <- doEnrich(interestGenes=demo.data,maxGsize = 100, minGsize=10)
drawGraphViz(EnrichResult =sample5)
# The p-value and the number of intersections are not visible
drawGraphViz(EnrichResult=sample5, numview = FALSE, pview = FALSE)
For the top DOID_n nodes in the enrichment results, the heat map shows the top gene_n genes with the highest weight sum.
drawHeatmap(
  interestGenes,
  EnrichResult = NULL,
  DOID_n = 10,
  gene_n = 50,
  fontsize_row = 10,
  readable = TRUE,
  ...
)
interestGenes | A vector of genes of interest. |
EnrichResult | The EnrichResult object. |
DOID_n | The number of most significant nodes to take from the enrichment results. |
gene_n | The number of genes to show: among the selected DOID_n nodes, the gene_n genes with the highest weight sum are displayed. |
fontsize_row | The font size of the gene labels. |
readable | Logical value that controls whether the gene labels are shown in symbol format. |
... | Other parameters of the pheatmap function also apply. |
heat map
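The selection described above (keep the genes with the highest weight sum, then draw them against the chosen nodes) might be sketched in base R as follows. The weight matrix `w` is random toy data, the helper `top_genes_by_weight` is a hypothetical name, and base `heatmap` is used here only as a stand-in for pheatmap so that the sketch needs no extra packages. How the package computes its weights is not shown.

```r
set.seed(1)
# Hypothetical weight matrix: rows are genes, columns are DO terms.
w <- matrix(runif(8 * 4), nrow = 8,
            dimnames = list(paste0("gene", 1:8), paste0("term", 1:4)))

top_genes_by_weight <- function(w, gene_n = 5) {
  total <- rowSums(w)                         # weight sum per gene
  idx <- order(total, decreasing = TRUE)      # highest sum first
  w[idx[seq_len(min(gene_n, nrow(w)))], , drop = FALSE]
}

top <- top_genes_by_weight(w, gene_n = 5)

# Plain heat map without clustering or scaling.
heatmap(top, Rowv = NA, Colv = NA, scale = "none")
```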
Haixiu Yang
demo.data <- c(1636,351,102,2932,3077,348,4137,54209)
sample6 <- doEnrich(interestGenes=demo.data,maxGsize = 100, minGsize=10)
drawHeatmap(interestGenes=demo.data, EnrichResult = sample6, gene_n = 10)
The enrichment results are shown in a scatter plot
drawPointGraph(EnrichResult = NULL, enrich = NULL, n = 10, delta = 1e-15)
EnrichResult | The EnrichResult object. |
enrich | A data frame of enrichment results. |
n | The number of points. |
delta | The P value threshold. |
scatter graph
Haixiu Yang
demo.data <- c(1636,351,102,2932,3077,348,4137,54209)
sample2 <- doEnrich(interestGenes=demo.data,maxGsize = 100, minGsize=10)
drawPointGraph(EnrichResult=sample2, n=10, delta=0.05)
Class 'EnrichResult'. This class represents the result of an enrichment analysis.
enrich
a data frame of enrichment result
test
Statistical test
method
Multiple test correction methods
m
the maximum number of ancestor layers for ontology enrichment
maxGsize
The maximum number of DOTerm genes in enrichment analysis
minGsize
The minimum number of DOTerm genes in enrichment analysis
traditional
Indicates whether the traditional ORA method is used
delta
The highest p-value of significance for each node
penalize
Whether to use penalty function in enrichment analysis
interestGenes
A valid interest gene set
Haixiu Yang
show method for an EnrichResult instance
## S4 method for signature 'EnrichResult'
show(object)
object | An EnrichResult instance. |
print info
Haixiu Yang
show DOterms
showDoTerms(doterms = doterms)
doterms | A data frame of DO terms. |
text
Haixiu Yang
showDoTerms(doterms)
Internal calculation of enrichment analysis
TermStruct(resultDO)
resultDO | The content of the file output by the writeResult function, used to display the enrichment results visually without running the enrichment again. |
An EnrichResult instance.
Haixiu Yang
Output enrichment result as text
writeResult(EnrichResult = NULL, file, Q = 1, P = 1)
EnrichResult | The EnrichResult object. |
file | The path and name of the output file. |
Q | Output only the DO terms with p.adjust values less than or equal to Q. |
P | Output only the DO terms with P values less than or equal to P. |
text
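The effect of the `Q` and `P` cutoffs can be sketched without the package: filter a result table on both columns, write it as tab-delimited text, and read it back with `read.delim`. The table `toy`, its column names and the helper `write_filtered` are hypothetical, and the tab-delimited layout is an assumption based on the `read.delim` call in the convDraw example rather than a description of the real output file.

```r
# Hypothetical result table; column names are invented for this sketch.
toy <- data.frame(
  term     = c("termA", "termB", "termC"),
  p        = c(1e-6, 0.03, 0.4),
  p.adjust = c(3e-6, 0.045, 0.4),
  stringsAsFactors = FALSE
)

write_filtered <- function(enrich, file, Q = 1, P = 1) {
  # Keep rows with p.adjust <= Q and p <= P; the defaults keep everything.
  keep <- enrich[enrich$p.adjust <= Q & enrich$p <= P, , drop = FALSE]
  write.table(keep, file = file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(keep)
}

out <- file.path(tempdir(), "toy_result.txt")
write_filtered(toy, out, Q = 0.05)

# Read the file back as a data frame.
back <- read.delim(out)
print(back)
```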
Haixiu Yang
demo.data <- c(1636,351,102,2932,3077,348,4137,54209)
sample4 <- doEnrich(interestGenes=demo.data,maxGsize = 100, minGsize=10)
writeResult(EnrichResult=sample4, file=file.path(tempdir(), 'result.txt'))