Package 'fedup'

Title: Fisher's Test for Enrichment and Depletion of User-Defined Pathways
Description: An R package that tests for enrichment and depletion of user-defined pathways using a Fisher's exact test. The method is designed for versatile pathway annotation formats (eg. gmt, txt, xlsx) to allow the user to run pathway analysis on custom annotations. This package is also integrated with Cytoscape to provide network-based pathway visualization that enhances the interpretability of the results.
Authors: Catherine Ross [aut, cre]
Maintainer: Catherine Ross <[email protected]>
License: MIT + file LICENSE
Version: 1.15.0
Built: 2024-11-27 04:43:07 UTC
Source: https://github.com/bioc/fedup

Help Index


Example list of a background set and two sets of test genes to use for enrichment analysis.

Description

Raw Excel data file (Sup Tab 2) is available from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566881/

Usage

data(geneDouble)

Format

a named list with three vector elements, one common background gene vector and two test gene vectors.

Details

Script to prepare data system.file("script", "genes.R", package = "fedup")


Example list of a background set and multiple sets of test genes to use for enrichment analysis.

Description

Raw Excel data file (Sup Tab 2) is available from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566881/

Usage

data(geneMulti)

Format

a named list with thirteen vector elements, one common background gene vector and twelve test gene vectors.

Details

Script to prepare data system.file("script", "genes.R", package = "fedup")


Example list of a background set and single set of test genes to use for enrichment analysis.

Description

Raw Excel data file (Sup Tab 2) is available from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566881/

Usage

data(geneSingle)

Format

a named list with two vector elements, one common background gene vector and one test gene vector.

Details

Script to prepare data system.file("script", "genes.R", package = "fedup")


Named list of human pathway annotations obtained from a GMT file.

Description

GMT file is available from http://download.baderlab.org/EM_Genesets/November_17_2020/Human/symbol

Usage

data(pathwaysGMT)

Format

a named list of 1437 vectors

Details

Raw data location system.file("extdata", "Human_Reactome_November_17_2020_symbol.gmt", package = "fedup") Script to prepare data system.file("script", "pathwaysGMT.R", package = "fedup")


Example list of SAFE terms obtained from a TXT file.

Description

Raw data file (S5) is available from https://boonelab.ccbr.utoronto.ca/supplement/costanzo2016/

Usage

data(pathwaysTXT)

Format

a named list of 317 vectors

Details

Raw data location system.file("extdata", "SAFE_terms.txt", package = "fedup")

Script to prepare data system.file("script", "pathwaysTXT.R", package = "fedup")


Example list of SAFE terms obtained from a XLSX file.

Description

Raw data file (S5) is available from https://boonelab.ccbr.utoronto.ca/supplement/costanzo2016/

Usage

data(pathwaysXLSX)

Format

a named list of 317 vectors

Details

Raw data location system.file("extdata", "SAFE_terms.xlsx", package = "fedup")

Script to prepare data system.file("script", "pathwaysXLSX.R", package = "fedup")


Visualizes fedup enrichment and depletion results using ggplot.

Description

This function supports any combination of numeric x-y variables to plot from fedup results. The list outputted by runFedup must first be converted to a data.frame before plotting (see examples for sample use).

Usage

plotDotPlot(
    df,
    xVar,
    yVar,
    xLab = xVar,
    yLab = NULL,
    pTitle = NULL,
    fillVar = NULL,
    fillCol = NULL,
    fillLab = fillVar,
    sizeVar = NULL,
    sizeLab = sizeVar
)

Arguments

df

(data.frame) table with fedup results generated via runFedup

xVar

(char) x-axis variable (must be a column value in df)

yVar

(char) y-axis variable (must be a column value in df)

xLab

(char) x-axis label (default xVar value)

yLab

(char) y-axis label (default NULL)

pTitle

(char) plot title (default NULL)

fillVar

(char) point fill variable (default NULL)

fillCol

(char) point fill colours (default NULL)

fillLab

(char) point fill label (default fillVar value)

sizeVar

(char) point size variable (default NULL)

sizeLab

(char) point size label (default sizeVar value)

Value

Object returned from ggplot with the enrichment dot plot.

Examples

# Load example data
data(geneDouble)
data(pathwaysGMT)
# Load external libraries
suppressMessages(library(dplyr))
suppressMessages(library(tidyr))
# Run fedup
fedupRes <- runFedup(geneDouble, pathwaysGMT)
# Prepare dataframe from fedup results
fedupPlot <- fedupRes %>%
    bind_rows(.id = "set") %>%
    separate(col = "set", into = c("set", "sign"), sep = "_") %>%
    subset(qvalue < 0.01) %>%
    mutate(log10qvalue = -log10(qvalue)) %>%
    mutate(pathway = gsub("\\%.*", "", pathway)) %>%
    as.data.frame()
# Plot
p <- plotDotPlot(
    df = fedupPlot,
    xVar = "log10qvalue",
    yVar = "pathway",
    xLab = "-log10(qvalue)",
    fillVar = "sign",
    fillLab = "Genetic interaction",
    fillCol = c("#0077f1", "#fcde24"),
    sizeVar = "fold_enrichment",
    sizeLab = "Fold enrichment"
)

Draws a network representation of overlaps among pathway enrichment results using EnrichmentMap (EM) in Cytoscape.

Description

Draws a network representation of overlaps among pathway enrichment results using EnrichmentMap (EM) in Cytoscape.

Usage

plotFemap(
    gmtFile,
    resultsFolder,
    pvalue = 1,
    qvalue = 1,
    formSim = "COMBINED",
    edgeSim = 0.375,
    combSim = 0.5,
    chartData = "NES_VALUE",
    clustAlg = "MCL",
    clustWords = 3,
    hideNodeLabels = FALSE,
    netName = "generic",
    netFile = "png"
)

Arguments

gmtFile

(char) absolute path to GMT file (generated via writePathways)

resultsFolder

(char) absolute path to folder with fedup results (generated via writeFemap)

pvalue

(numeric) pvalue cutoff (value between 0 and 1; default 1)

qvalue

(numeric) qvalue cutoff (value between 0 and 1; default 1)

formSim

(character) formula to calculate similarity score (one of OVERLAP, JACCARD, COMBINED; default COMBINED)

edgeSim

(numeric) edge similarity score cutoff (value between 0 and 1; default 0.375)

combSim

(numeric) when coefficients=COMBINED this parameter is used to determine what percentage to use for JACCARD and OVERLAP when combining their value (value between 0 to 1; default 0.5)

chartData

(char) node chart data (one of NES_VALUE, P_VALUE, FDR_VALUE, PHENOTYPES, DATA_SET, EXPRESSION_SET, or NONE; default NES_VALUE)

clustAlg

(character) clusterMaker algorith (one of AFFINITY_PROPAGATION, CLUSTER_FIZZIFIER, GLAY, CONNECTED_COMPONENTS, MCL, SCPS; default MCL)

clustWords

(integer) maximum words to include in autoAnnotate cluster label (default 3)

hideNodeLabels

(logical) if TRUE hides the node label in the EM; cluster labels generated via AutoAnnotate remain visible

netName

(char) name for EM in Cytoscape (default generic)

netFile

(char) name of output image (supports png, pdf, svg, jpeg image formats)

Value

File name of image to which the network is exported and an open session of Cytoscape (side effect of plotting EM). NULL if Cytoscape is not running locally.

Examples

# Load example data
data(geneDouble)
data(pathwaysGMT)
# Run fedup
fedupRes <- runFedup(geneDouble, pathwaysGMT)
# Write out results to temp folder
resultsFolder <- tempdir()
writeFemap(fedupRes, resultsFolder)
# Write out gmt formatted pathawy annotations to temp file
gmtFile <- tempfile("pathwaysGMT", fileext = ".gmt")
writePathways(pathwaysGMT, gmtFile)
# Plot enrichment map
netFile <- tempfile("fedup_EM", fileext = ".png")
plotFemap(
    gmtFile = gmtFile,
    resultsFolder = resultsFolder,
    qvalue = 0.05,
    hideNodeLabels = TRUE,
    netName = "fedup_EM",
    netFile = netFile
)

Prepares input gene list for runFedup.

Description

This function takes any number of test genes and a common background set of genes and properly formats them for to pass to runFedup gene argument.

Usage

prepInput(setName, ...)

Arguments

setName

(char) character vector naming gene vectors passed to ... (must include "background" (case sensitive)).

...

(char) n vectors with at least one background set of genes and any number of test genes.

Value

List of length n with gene vectors corresponding to those passed to ....

Examples

# Raw gene data file
genesFile <- system.file("extdata",
    "NIHMS1587165-supplement-1587165_Sup_Tab_2.txt", package = "fedup")
genes <- read.delim(genesFile, h = TRUE, as.is = TRUE)
# Prepare gene vectors
b <- unique(genes[, "gene"])
set1 <- unique(genes[which(genes$FASN_merge < -0.4), "gene"])
set2 <- unique(genes[which(genes$FASN_merge > 0.4), "gene"])
setName <- c("background", "negative", "positive")
# Generate input list with background genes and two test set of genes
geneDouble <- prepInput(setName, b, set1, set2)

Returns a list of pathways from various file formats.

Description

This function supports custom pathway annotations to use for fedup pathway enrichment analysis. Current file formats supported are gmt, txt, and xlsx.

Usage

readPathways(
    pathwayFile,
    header = FALSE,
    pathCol = NULL,
    geneCol = NULL,
    minGene = 1L,
    maxGene = Inf
)

Arguments

pathwayFile

(char) path to file with pathway annotations

header

(logical) whether pathwayFile has a header (default FALSE)

pathCol

(char or int) column name or number with pathway identifiers (for use with non-GMT input files (eg "Pathway.ID" or 2; default NULL))

geneCol

(char or int) column name or number with gene identifiers (for use with non-GMT input files (eg "Gene.ID" or 5; default NULL))

minGene

(integer) minimum number of genes to be considered in a pathway (default 1)

maxGene

(integer) maximum number of genes to be considered in a pathway (default Inf)

Value

A list of vectors with pathway annotations.

Examples

# Generate pathway list from GMT annotation file
pathways <- readPathways(
    system.file("extdata", "Human_Reactome_November_17_2020_symbol.gmt",
        package = "fedup"
    ),
    minGene = 10, maxGene = 500
)
# Generate pathway list from XLSX annotation file
pathways <- readPathways(
    system.file("extdata", "SAFE_terms.xlsx", package = "fedup"),
    header = TRUE, pathCol = "Enriched.GO.names", geneCol = "Gene.ID"
)

Runs pathway enrichment and depletion analysis using a Fisher's exact test.

Description

This function takes a list of test genes and a common background set to calculate enrichment and depletion for a list of pathways. The method allows for fast and efficient testing of multiple gene sets of interest.

Usage

runFedup(genes, pathways)

Arguments

genes

(list) named list of vectors with background genes and n test genes.

pathways

(list) named list of vectors with pathway annotations.

Value

List of length n with table(s) of pathway enrichment and depletion results. Rows represent tested pathways. Columns represent:

  • pathway – name of the pathway, corresponds to names(pathways);

  • size – size of the pathway;

  • real_frac – fraction of test gene members in pathway;

  • expected_frac – fraction of background gene members in pathway;

  • fold_enrichment – fold enrichment measure, evaluates as real_frac / expected_frac;

  • status – indicator that pathway is enriched or depleted for test gene members;

  • real_gene – vector of test gene members annotated to pathways;

  • pvalue – enrichment p-value calculated via Fisher's exact test;

  • qvalue – BH-adjusted p-value

Examples

# Load pathway annotations
data(pathwaysGMT)
# Run fedup with a single test set
data(geneSingle)
fedupRes <- runFedup(geneSingle, pathwaysGMT)
# Run fedup with two test sets
data(geneDouble)
fedupRes <- runFedup(geneDouble, pathwaysGMT)
# Run fedup with multiple test sets
data(geneMulti)
fedupRes <- runFedup(geneMulti, pathwaysGMT)

Writes an enrichment dataset file for use in Cytoscape EnrichmentMap.

Description

Writes an enrichment dataset file for use in Cytoscape EnrichmentMap.

Usage

writeFemap(results, resultsFolder)

Arguments

results

(list) list with ouput results from runFedup

resultsFolder

(char) name of folder to store result file(s)

Value

Table of pathway enrichment and depletion results formatted as a 'Generic results file'. Rows represent tested pathways. Columns represent:

  • pathway – pathway ID (must match pathway IDs in the GMT file provided to plotFemap;

  • description – pathway name or description;

  • pvalue – enrichment pvalue;

  • qvalue – BH-corrected pvalue;

  • status – +1 or -1, to identify enriched or depleted pathways (+1 maps to red, -1 maps to blue)

Examples

# Load example data
data(geneDouble)
data(pathwaysGMT)
# Run fedup
fedupRes <- runFedup(geneDouble, pathwaysGMT)
# Write out results to temp folder
resultsFolder <- tempdir()
writeFemap(fedupRes, resultsFolder)

Writes a set of pathways (list of vectors) to a GMT file.

Description

Writes a set of pathways (list of vectors) to a GMT file.

Usage

writePathways(pathways, gmtFile)

Arguments

pathways

(list) named list of vectors

gmtFile

(char) name of output GMT file

Value

GMT-formatted file. Rows represent pathways. Columns represent:

  • pathway ID;

  • description;

  • a list of tab-delimited genes

Examples

data(pathwaysXLSX)
writePathways(pathwaysXLSX, tempfile("pathwaysXLSX", fileext = ".gmt"))