Title: | Fisher's Test for Enrichment and Depletion of User-Defined Pathways |
---|---|
Description: | An R package that tests for enrichment and depletion of user-defined pathways using Fisher's exact test. The method accepts several pathway annotation formats (e.g. gmt, txt, xlsx) so that users can run pathway analysis on custom annotations. The package is also integrated with Cytoscape to provide network-based pathway visualization that enhances the interpretability of the results. |
Authors: | Catherine Ross [aut, cre] |
Maintainer: | Catherine Ross <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.15.0 |
Built: | 2024-10-30 07:48:46 UTC |
Source: | https://github.com/bioc/fedup |
Raw Excel data file (Sup Tab 2) is available from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566881/
data(geneDouble)
a named list with three vector elements, one common background gene vector and two test gene vectors.
Script to prepare data system.file("script", "genes.R", package = "fedup")
Raw Excel data file (Sup Tab 2) is available from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566881/
data(geneMulti)
a named list with thirteen vector elements, one common background gene vector and twelve test gene vectors.
Script to prepare data system.file("script", "genes.R", package = "fedup")
Raw Excel data file (Sup Tab 2) is available from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566881/
data(geneSingle)
a named list with two vector elements, one common background gene vector and one test gene vector.
Script to prepare data system.file("script", "genes.R", package = "fedup")
GMT file is available from http://download.baderlab.org/EM_Genesets/November_17_2020/Human/symbol
data(pathwaysGMT)
a named list of 1437 vectors
Raw data location system.file("extdata", "Human_Reactome_November_17_2020_symbol.gmt", package = "fedup")
Script to prepare data system.file("script", "pathwaysGMT.R", package = "fedup")
Raw data file (S5) is available from https://boonelab.ccbr.utoronto.ca/supplement/costanzo2016/
data(pathwaysTXT)
a named list of 317 vectors
Raw data location system.file("extdata", "SAFE_terms.txt", package = "fedup")
Script to prepare data system.file("script", "pathwaysTXT.R", package = "fedup")
Raw data file (S5) is available from https://boonelab.ccbr.utoronto.ca/supplement/costanzo2016/
data(pathwaysXLSX)
a named list of 317 vectors
Raw data location system.file("extdata", "SAFE_terms.xlsx", package = "fedup")
Script to prepare data system.file("script", "pathwaysXLSX.R", package = "fedup")
This function supports any combination of numeric x-y variables to plot from fedup results. The list returned by runFedup must first be converted to a data.frame before plotting (see examples for sample use).
plotDotPlot( df, xVar, yVar, xLab = xVar, yLab = NULL, pTitle = NULL, fillVar = NULL, fillCol = NULL, fillLab = fillVar, sizeVar = NULL, sizeLab = sizeVar )
df |
(data.frame) table with fedup results generated via runFedup |
xVar |
(char) x-axis variable (must be a column value in df) |
yVar |
(char) y-axis variable (must be a column value in df) |
xLab |
(char) x-axis label (default xVar) |
yLab |
(char) y-axis label (default NULL) |
pTitle |
(char) plot title (default NULL) |
fillVar |
(char) point fill variable (default NULL) |
fillCol |
(char) point fill colours (default NULL) |
fillLab |
(char) point fill label (default fillVar) |
sizeVar |
(char) point size variable (default NULL) |
sizeLab |
(char) point size label (default sizeVar) |
Object returned from ggplot with the enrichment dot plot.
# Load example data
data(geneDouble)
data(pathwaysGMT)
# Load external libraries
suppressMessages(library(dplyr))
suppressMessages(library(tidyr))
# Run fedup
fedupRes <- runFedup(geneDouble, pathwaysGMT)
# Prepare dataframe from fedup results
fedupPlot <- fedupRes %>%
    bind_rows(.id = "set") %>%
    separate(col = "set", into = c("set", "sign"), sep = "_") %>%
    subset(qvalue < 0.01) %>%
    mutate(log10qvalue = -log10(qvalue)) %>%
    mutate(pathway = gsub("\\%.*", "", pathway)) %>%
    as.data.frame()
# Plot
p <- plotDotPlot(
    df = fedupPlot,
    xVar = "log10qvalue",
    yVar = "pathway",
    xLab = "-log10(qvalue)",
    fillVar = "sign",
    fillLab = "Genetic interaction",
    fillCol = c("#0077f1", "#fcde24"),
    sizeVar = "fold_enrichment",
    sizeLab = "Fold enrichment"
)
Draws a network representation of overlaps among pathway enrichment results using EnrichmentMap (EM) in Cytoscape.
plotFemap( gmtFile, resultsFolder, pvalue = 1, qvalue = 1, formSim = "COMBINED", edgeSim = 0.375, combSim = 0.5, chartData = "NES_VALUE", clustAlg = "MCL", clustWords = 3, hideNodeLabels = FALSE, netName = "generic", netFile = "png" )
gmtFile |
(char) absolute path to GMT file (generated via writePathways) |
resultsFolder |
(char) absolute path to folder with fedup results (generated via writeFemap) |
pvalue |
(numeric) pvalue cutoff (value between 0 and 1; default 1) |
qvalue |
(numeric) qvalue cutoff (value between 0 and 1; default 1) |
formSim |
(character) formula to calculate similarity score (one of OVERLAP, JACCARD, COMBINED; default COMBINED) |
edgeSim |
(numeric) edge similarity score cutoff (value between 0 and 1; default 0.375) |
combSim |
(numeric) when formSim is COMBINED, this parameter sets the relative weight given to JACCARD and OVERLAP when combining their values (value between 0 and 1; default 0.5) |
chartData |
(char) node chart data (one of NES_VALUE, P_VALUE, FDR_VALUE, PHENOTYPES, DATA_SET, EXPRESSION_SET, or NONE; default NES_VALUE) |
clustAlg |
(character) clusterMaker algorithm (one of AFFINITY_PROPAGATION, CLUSTER_FIZZIFIER, GLAY, CONNECTED_COMPONENTS, MCL, SCPS; default MCL) |
clustWords |
(integer) maximum words to include in AutoAnnotate cluster label (default 3) |
hideNodeLabels |
(logical) if TRUE hides the node label in the EM; cluster labels generated via AutoAnnotate remain visible (default FALSE) |
netName |
(char) name for EM in Cytoscape (default generic) |
netFile |
(char) name of output image (supports png, pdf, svg, jpeg image formats) |
File name of the image to which the network is exported. As a side effect, the EM is drawn in an open session of Cytoscape. NULL if Cytoscape is not running locally.
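The formSim, combSim and edgeSim arguments above control which pathway pairs are connected in the network. The following is a minimal sketch of the three similarity scores in base R, not the EnrichmentMap implementation. It assumes that combSim weights the overlap term and (1 - combSim) weights the Jaccard term; the helper names and toy gene sets are hypothetical.

```r
# Toy gene sets for two pathways (hypothetical data)
p1 <- c("A", "B", "C", "D")
p2 <- c("C", "D", "E", "F", "G")

# Jaccard: shared genes over the union of both sets
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Overlap: shared genes over the size of the smaller set
overlap <- function(a, b) length(intersect(a, b)) / min(length(a), length(b))

# Combined: weighted mix of the two (ASSUMPTION: combSim weights overlap)
combined <- function(a, b, combSim = 0.5) {
  combSim * overlap(a, b) + (1 - combSim) * jaccard(a, b)
}

sim <- combined(p1, p2, combSim = 0.5)

# An edge is kept when the score reaches the edgeSim cutoff
keep_edge <- sim >= 0.375
```

With these toy sets the overlap score is higher than the Jaccard score because the smaller pathway shares half of its genes, so raising combSim makes an edge between them more likely.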
# Load example data
data(geneDouble)
data(pathwaysGMT)
# Run fedup
fedupRes <- runFedup(geneDouble, pathwaysGMT)
# Write out results to temp folder
resultsFolder <- tempdir()
writeFemap(fedupRes, resultsFolder)
# Write out GMT-formatted pathway annotations to temp file
gmtFile <- tempfile("pathwaysGMT", fileext = ".gmt")
writePathways(pathwaysGMT, gmtFile)
# Plot enrichment map
netFile <- tempfile("fedup_EM", fileext = ".png")
plotFemap(
    gmtFile = gmtFile,
    resultsFolder = resultsFolder,
    qvalue = 0.05,
    hideNodeLabels = TRUE,
    netName = "fedup_EM",
    netFile = netFile
)
This function takes any number of test gene sets and a common background set of genes and formats them to pass to the genes argument of runFedup.
prepInput(setName, ...)
setName |
(char) character vector naming gene vectors passed to ... |
... |
(char) gene vectors: a common background set of genes and any number of test gene sets |
List of length n with gene vectors corresponding to those passed to ...
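The returned structure is a plain named list, with one element per gene vector. A minimal base R sketch of that shape is shown here; make_gene_list is a hypothetical helper written for illustration and is not the prepInput source, which may perform additional checks.

```r
# Hypothetical helper: pair each gene vector with its name
make_gene_list <- function(setName, ...) {
  sets <- list(...)
  # One name is required per gene vector
  stopifnot(length(sets) == length(setName))
  setNames(sets, setName)
}

# Toy gene vectors (hypothetical data)
background <- c("G1", "G2", "G3", "G4", "G5")
negative <- c("G1", "G2")
positive <- c("G4")

genes <- make_gene_list(
  c("background", "negative", "positive"),
  background, negative, positive
)
```

Here genes$negative returns the first test vector, and names(genes) follows the order in which the names were supplied.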
# Raw gene data file
genesFile <- system.file("extdata", "NIHMS1587165-supplement-1587165_Sup_Tab_2.txt", package = "fedup")
genes <- read.delim(genesFile, header = TRUE, as.is = TRUE)
# Prepare gene vectors
b <- unique(genes[, "gene"])
set1 <- unique(genes[which(genes$FASN_merge < -0.4), "gene"])
set2 <- unique(genes[which(genes$FASN_merge > 0.4), "gene"])
setName <- c("background", "negative", "positive")
# Generate input list with background genes and two test sets of genes
geneDouble <- prepInput(setName, b, set1, set2)
This function supports custom pathway annotations to use for fedup pathway enrichment analysis. Current file formats supported are gmt, txt, and xlsx.
readPathways( pathwayFile, header = FALSE, pathCol = NULL, geneCol = NULL, minGene = 1L, maxGene = Inf )
pathwayFile |
(char) path to file with pathway annotations |
header |
(logical) whether pathwayFile has a header (default FALSE) |
pathCol |
(char or int) column name or number with pathway identifiers; for use with non-GMT input files (e.g. "Pathway.ID" or 2; default NULL) |
geneCol |
(char or int) column name or number with gene identifiers; for use with non-GMT input files (e.g. "Gene.ID" or 5; default NULL) |
minGene |
(integer) minimum number of genes to be considered in a pathway (default 1) |
maxGene |
(integer) maximum number of genes to be considered in a pathway (default Inf) |
A list of vectors with pathway annotations.
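To make the returned structure and the minGene and maxGene filters concrete, the following base R sketch parses a few GMT-style lines into a named list and filters by pathway size. It is an illustration only, not the readPathways implementation; the toy lines are hypothetical and the size bounds are assumed to be inclusive.

```r
# Toy GMT content: pathway ID, description, then tab-delimited genes
gmt_lines <- c(
  "PATH_A\tfirst pathway\tG1\tG2\tG3",
  "PATH_B\tsecond pathway\tG2",
  "PATH_C\tthird pathway\tG1\tG4\tG5\tG6"
)

# Split each line on tabs and drop the ID and description fields
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
pathways <- lapply(fields, function(x) unique(x[-(1:2)]))
names(pathways) <- vapply(fields, `[`, character(1), 1)

# Keep pathways whose size lies within the bounds (ASSUMPTION: inclusive)
minGene <- 2
maxGene <- 3
sizes <- lengths(pathways)
pathways <- pathways[sizes >= minGene & sizes <= maxGene]
```

Only PATH_A survives the filter, since PATH_B has one gene and PATH_C has four.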
# Generate pathway list from GMT annotation file
pathways <- readPathways(
    system.file("extdata", "Human_Reactome_November_17_2020_symbol.gmt",
        package = "fedup"
    ),
    minGene = 10, maxGene = 500
)
# Generate pathway list from XLSX annotation file
pathways <- readPathways(
    system.file("extdata", "SAFE_terms.xlsx", package = "fedup"),
    header = TRUE, pathCol = "Enriched.GO.names", geneCol = "Gene.ID"
)
This function takes a list of test genes and a common background set to calculate enrichment and depletion for a list of pathways. The method allows for fast and efficient testing of multiple gene sets of interest.
runFedup(genes, pathways)
genes |
(list) named list of vectors with background genes and one or more sets of test genes |
pathways |
(list) named list of vectors with pathway annotations. |
List of length n with table(s) of pathway enrichment and depletion results. Rows represent tested pathways. Columns represent:
pathway – name of the pathway, corresponds to names(pathways);
size – size of the pathway;
real_frac – fraction of test gene members in pathway;
expected_frac – fraction of background gene members in pathway;
fold_enrichment – fold enrichment measure, evaluates as real_frac / expected_frac;
status – indicator that pathway is enriched or depleted for test gene members;
real_gene – vector of test gene members annotated to pathways;
pvalue – enrichment p-value calculated via Fisher's exact test;
qvalue – BH-adjusted p-value
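The columns above can be reproduced on toy data with base R. This is a minimal sketch under stated assumptions, not the runFedup source: the 2x2 table compares test genes against the remaining background genes, fractions are reported as proportions rather than percentages, and the "enriched"/"depleted" labels and the test_pathway helper are hypothetical. The package may construct its table or encode its columns differently.

```r
# Toy data (hypothetical): 20 background genes, 6 of them in the test set
background <- paste0("g", 1:20)
test_set <- paste0("g", 1:6)
pathways <- list(
  pathA = paste0("g", c(1:5, 11)),
  pathB = paste0("g", 7:16)
)

# Hypothetical helper: test one pathway against the test set
test_pathway <- function(members, test, background) {
  non_test <- setdiff(background, test)
  # 2x2 table: (test, non-test) by (in pathway, not in pathway)
  tab <- matrix(
    c(sum(test %in% members), sum(!test %in% members),
      sum(non_test %in% members), sum(!non_test %in% members)),
    nrow = 2
  )
  real_frac <- mean(test %in% members)
  expected_frac <- mean(background %in% members)
  data.frame(
    size = length(members),
    real_frac = real_frac,
    expected_frac = expected_frac,
    fold_enrichment = real_frac / expected_frac,
    pvalue = fisher.test(tab)$p.value
  )
}

res <- do.call(
  rbind,
  lapply(pathways, test_pathway, test = test_set, background = background)
)
res$status <- ifelse(res$fold_enrichment > 1, "enriched", "depleted")
# Benjamini-Hochberg adjustment across all tested pathways
res$qvalue <- p.adjust(res$pvalue, method = "BH")
```

In this toy setup pathA holds five of the six test genes and comes out as enriched, while pathB holds none of them and comes out as depleted.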
# Load pathway annotations
data(pathwaysGMT)
# Run fedup with a single test set
data(geneSingle)
fedupRes <- runFedup(geneSingle, pathwaysGMT)
# Run fedup with two test sets
data(geneDouble)
fedupRes <- runFedup(geneDouble, pathwaysGMT)
# Run fedup with multiple test sets
data(geneMulti)
fedupRes <- runFedup(geneMulti, pathwaysGMT)
Writes an enrichment dataset file for use in Cytoscape EnrichmentMap.
writeFemap(results, resultsFolder)
results |
(list) list with output results from runFedup |
resultsFolder |
(char) name of folder to store result file(s) |
Table of pathway enrichment and depletion results formatted as a 'Generic results file'. Rows represent tested pathways. Columns represent:
pathway – pathway ID (must match pathway IDs in the GMT file provided to plotFemap);
description – pathway name or description;
pvalue – enrichment pvalue;
qvalue – BH-corrected pvalue;
status – +1 or -1, to identify enriched or depleted pathways (+1 maps to red, -1 maps to blue)
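As an illustration of this layout, the sketch below builds a table with the five columns from a toy results data.frame and writes it as tab-delimited text. It is not the writeFemap implementation. The input status labels ("enriched", "depleted"), the reuse of the pathway ID as description, and the output file name are all assumptions made for the example.

```r
# Toy results for one test set (hypothetical values and labels)
res <- data.frame(
  pathway = c("PATH_A", "PATH_B"),
  status = c("enriched", "depleted"),
  pvalue = c(0.002, 0.010),
  qvalue = c(0.004, 0.010)
)

# Arrange columns in the generic results layout
generic <- data.frame(
  pathway = res$pathway,
  description = res$pathway, # ASSUMPTION: ID reused as description
  pvalue = res$pvalue,
  qvalue = res$qvalue,
  status = ifelse(res$status == "enriched", 1, -1)
)

# Write as tab-delimited text (hypothetical file name)
out <- file.path(tempdir(), "toy_generic_results.txt")
write.table(generic, out, sep = "\t", quote = FALSE, row.names = FALSE)
```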
# Load example data
data(geneDouble)
data(pathwaysGMT)
# Run fedup
fedupRes <- runFedup(geneDouble, pathwaysGMT)
# Write out results to temp folder
resultsFolder <- tempdir()
writeFemap(fedupRes, resultsFolder)
Writes a set of pathways (list of vectors) to a GMT file.
writePathways(pathways, gmtFile)
pathways |
(list) named list of vectors |
gmtFile |
(char) name of output GMT file |
GMT-formatted file. Rows represent pathways. Columns represent:
pathway ID;
description;
a list of tab-delimited genes
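The row layout described above can be sketched in base R as follows. This is an illustration of the GMT format rather than the writePathways source; the toy pathways are hypothetical, and the description field is assumed to repeat the pathway ID.

```r
# Toy pathway list (hypothetical data)
pathways <- list(
  PATH_A = c("G1", "G2", "G3"),
  PATH_B = c("G2", "G4")
)

# One line per pathway: ID, description, then the genes, all tab-delimited
gmt_lines <- vapply(
  names(pathways),
  function(id) paste(c(id, id, pathways[[id]]), collapse = "\t"),
  character(1)
)

out <- tempfile("toy_pathways", fileext = ".gmt")
writeLines(gmt_lines, out)
```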
data(pathwaysXLSX)
writePathways(pathwaysXLSX, tempfile("pathwaysXLSX", fileext = ".gmt"))