Title: | Fast functional enrichment for interactive applications |
---|---|
Description: | Perform fast functional enrichment on feature lists (like genes or proteins) using the hypergeometric distribution. Tailored for speed, this package is ideal for interactive platforms such as Shiny. It supports the retrieval of functional data from sources like GO, KEGG, Reactome, Bioplanet and WikiPathways. By downloading and preparing data first, it allows for rapid successive tests on various feature selections without the need for repetitive, time-consuming preparatory steps typical of other packages. |
Authors: | Marek Gierlinski [aut, cre] |
Maintainer: | Marek Gierlinski <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.5.0 |
Built: | 2024-10-31 06:08:46 UTC |
Source: | https://github.com/bioc/fenr |
Small Shiny app serving as example for fast enrichment
enrichment_interactive(de, term_data)
enrichment_interactive(de, term_data)
de |
Differential expression results, |
term_data |
A list of |
An interactive Shiny app
## Not run: data(yeast_de) term_data <- fetch_terms_for_example(yeast_de) enrichment_interactive(yeast_de, term_data) ## End(Not run)
## Not run: data(yeast_de) term_data <- fetch_terms_for_example(yeast_de) enrichment_interactive(yeast_de, term_data) ## End(Not run)
A set of gene names for proteins that could be detected in a typical proteomics experiment on yeast samples.
data(exmpl_all)
data(exmpl_all)
A character vector with 6985 elements.
A vector with background gene names.
A set of gene names manually selected to illustrate functional enrichment.
data(exmpl_sel)
data(exmpl_sel)
A character vector with 21 elements.
A vector with selected gene names.
Download term information (term ID and name) and gene-pathway mapping (NCBI gene ID, gene symbol and pathway ID) from BioPlanet.
fetch_bp(use_cache = TRUE, on_error = c("stop", "warn", "ignore"))
fetch_bp(use_cache = TRUE, on_error = c("stop", "warn", "ignore"))
use_cache |
Logical, if TRUE, the remote file will be cached locally. |
on_error |
A character string indicating the error handling strategy: either "stop" to halt execution, "warn" to issue a warning and return 'NULL' or "ignore" to return 'NULL' without warnings. Defaults to "stop". |
A list with terms
and mapping
tibbles.
bioplanet_data <- fetch_bp(on_error = "warn")
bioplanet_data <- fetch_bp(on_error = "warn")
This function downloads term information (GO term ID and name) and gene-term mapping (gene ID, symbol, and GO term ID) from either the Ensembl database (using BioMart) or the Gene Ontology database (using GAF files), depending on the provided argument.
fetch_go( species = NULL, dataset = NULL, use_cache = TRUE, on_error = c("stop", "warn", "ignore") )
fetch_go( species = NULL, dataset = NULL, use_cache = TRUE, on_error = c("stop", "warn", "ignore") )
species |
(Optional) Species designation. Examples are |
dataset |
(Optional) A string representing the dataset passed to Ensebml's Biomart, e.g. 'scerevisiae_gene_ensembl'. To see the different datasets available within a biomaRt you can e.g. do: mart <- biomaRt::useEnsembl(biomart = "ensembl"), followed by biomaRt::listDatasets(mart). |
use_cache |
Logical, if TRUE, the remote data will be cached locally. |
on_error |
A character string indicating the error handling strategy: either "stop" to halt execution, "warn" to issue a warning and return 'NULL' or "ignore" to return 'NULL' without warnings. Defaults to "stop". |
If species
is provided, mapping from a Gene Ontology GAF file
will be downloaded. GAF files contain more generic information than gene
symbols. In this function, the third column of the GAF file (DB Object
Symbol) is returned as gene_symbol
, but, depending on the
species
argument it can contain other entities, e.g. RNA or protein
complex names. Similarly, the eleventh column of the GAF file (DB Object
Synonym) is returned as gene_id
. It is up to the user to select
the appropriate database.
Alternatively, if dataset
is provided, mapping will be downloaded
from Ensembl database. It will gene symbol and Ensembl gene ID.
A list with terms
and mapping
tibbles.
# Fetch GO data from Ensembl go_data_ensembl <- fetch_go(dataset = "scerevisiae_gene_ensembl", on_error = "warn") # Fetch GO data from Gene Ontology go_data_go <- fetch_go(species = "sgd", on_error = "warn")
# Fetch GO data from Ensembl go_data_ensembl <- fetch_go(dataset = "scerevisiae_gene_ensembl", on_error = "warn") # Fetch GO data from Gene Ontology go_data_go <- fetch_go(species = "sgd", on_error = "warn")
This function attempts to scrape HTML web page containing a table of available species and corresponding file names. If the structure of the page changes one day and the function stops working, go to http://current.geneontology.org/products/pages/downloads.html and check file names. The species designation used in this package is the GAF file name without extension (e.g. for a file ‘goa_chicken.gaf’ the designation is ‘goa_chicken’).
fetch_go_species(on_error = c("stop", "warn", "ignore"))
fetch_go_species(on_error = c("stop", "warn", "ignore"))
on_error |
A character string indicating the error handling strategy: either "stop" to halt execution, "warn" to issue a warning and return 'NULL' or "ignore" to return 'NULL' without warnings. Defaults to "stop". |
A tibble with columns species
and designation
.
go_species <- fetch_go_species(on_error = "warn")
go_species <- fetch_go_species(on_error = "warn")
Download information (pathway ID and name) and gene-pathway mapping (entrez gene ID, gene symbol and pathway ID) from KEGG. Gene symbols are extracted from gene descriptions. For some species (e.g. yeast), gene symbols are returned instead of entrez IDs and not in gene description.
fetch_kegg(species, batch_size = 10, on_error = c("stop", "warn", "ignore"))
fetch_kegg(species, batch_size = 10, on_error = c("stop", "warn", "ignore"))
species |
KEGG species code, for example "hsa" for human. The
full list of available KEGG species can be found by using |
batch_size |
Number of pathways sent to KEGG database in one query. The maximum allowed is 10. |
on_error |
A character string indicating the error handling strategy: either "stop" to halt execution, "warn" to issue a warning and return 'NULL' or "ignore" to return 'NULL' without warnings. Defaults to "stop". |
A list with terms
and mapping
tibbles.
kegg_data <- fetch_kegg("mge", on_error = "warn")
kegg_data <- fetch_kegg("mge", on_error = "warn")
Find all species available from KEGG
fetch_kegg_species(on_error = c("stop", "warn", "ignore"))
fetch_kegg_species(on_error = c("stop", "warn", "ignore"))
on_error |
A character string indicating the error handling strategy: either "stop" to halt execution, "warn" to issue a warning and return 'NULL' or "ignore" to return 'NULL' without warnings. Defaults to "stop". |
A tibble, in which column designation
contains species
designations used in function fetch_kegg
.
spe <- fetch_kegg_species(on_error = "warn")
spe <- fetch_kegg_species(on_error = "warn")
Download term information (pathway ID and name) and gene-pathway mapping (Ensembl gene ID or gene symbol and pathway ID) from Reactome.
fetch_reactome( species, source = c("ensembl", "api", "gene_association"), use_cache = TRUE, on_error = c("stop", "warn", "ignore") )
fetch_reactome( species, source = c("ensembl", "api", "gene_association"), use_cache = TRUE, on_error = c("stop", "warn", "ignore") )
species |
Reactome species designation, for example "Homo sapiens" for
human. Full list of available species can be found using
|
source |
How to download the mapping. If 'ensembl' or 'gene_association', one mapping file provided by Reactome will be downloaded, if 'api', then Reactome API will be used. See details. |
use_cache |
Logical, if TRUE, the remote file will be cached locally. |
on_error |
A character string indicating the error handling strategy: either "stop" to halt execution, "warn" to issue a warning and return 'NULL' or "ignore" to return 'NULL' without warnings. Defaults to "stop". |
Reactome makes mapping between Ensembl ID and pathway ID available
in form of one downloadable file. This mapping contains gene symbols as
well. Also, a gene association file with mapping between UniProt accession
number, gene symbol and Reactome term is available. If source =
"ensembl"
or source =
"gene_association"
is set, one large file will be downloaded and parsed.
If source = "api"
is set, then Reactome APIs will be interrogated
for each pathway available. This method is considerably slower, especially
for large genomes. However, gene association file contains far fewer
mappings than can be extracted using API. If gene symbols are needed, we
recommend using source = "ensembl"
.
A list with terms
and mapping
tibbles
reactome_data <- fetch_reactome("Saccharomyces cerevisiae", on_error = "warn")
reactome_data <- fetch_reactome("Saccharomyces cerevisiae", on_error = "warn")
List of available Reactome species
fetch_reactome_species(on_error = c("stop", "warn", "ignore"))
fetch_reactome_species(on_error = c("stop", "warn", "ignore"))
on_error |
A character string indicating the error handling strategy: either "stop" to halt execution, "warn" to issue a warning and return 'NULL' or "ignore" to return 'NULL' without warnings. Defaults to "stop". |
A tibble with species names used by Reactome.
re <- fetch_reactome_species(on_error = "warn")
re <- fetch_reactome_species(on_error = "warn")
Create term data for interactive example
fetch_terms_for_example(de)
fetch_terms_for_example(de)
de |
Differential expression results, use |
A list of objects containing functional terms for GO and Reactome.
## Not run: data(yeast_de) term_data <- fetch_terms_for_example(yeast_de) ## End(Not run)
## Not run: data(yeast_de) term_data <- fetch_terms_for_example(yeast_de) ## End(Not run)
Download term information (pathway ID and name) and gene-pathway mapping (gene symbol and pathway ID) from WikiPathways.
fetch_wiki( species, databases = c("Ensembl", "Entrez Gene", "HGNC", "HGNC Accession number", "Uniprot-TrEMBL"), types = c("GeneProduct", "Protein", "Rna", "RNA"), on_error = c("stop", "warn", "ignore") )
fetch_wiki( species, databases = c("Ensembl", "Entrez Gene", "HGNC", "HGNC Accession number", "Uniprot-TrEMBL"), types = c("GeneProduct", "Protein", "Rna", "RNA"), on_error = c("stop", "warn", "ignore") )
species |
WikiPathways species designation, for example "Homo sapiens"
for human. Full list of available species can be found using
|
databases |
A character vector with database names to pre-filter mapping data. See details. Full result will be returned if NULL. |
types |
A character vector with types of entities to pre-filter mapping data. See details. Full result will be returned if NULL. |
on_error |
A character string indicating the error handling strategy: either "stop" to halt execution, "warn" to issue a warning and return 'NULL' or "ignore" to return 'NULL' without warnings. Defaults to "stop". |
WikiPathways contain mapping between pathways and a variety of
entities from various databases. Typically a gene symbol is returned in
column text_label
and some sort of ID in column id
, but this
depends on the species and databases used. For gene/protein enrichment,
these should be filtered to contain gene symbols only. This can be done by
selecting a desired databases and types. The default values for parameters
databases
and types
attempt to select information from generic
databases, but there are organism-specific databases not included in the
selection. We suggest to run this function with databases = NULL,
types = NULL
to see what types and databases are available before making
selection.
A list with terms
and mapping
tibbles.
wiki_data <- fetch_wiki("Bacillus subtilis", on_error = "warn")
wiki_data <- fetch_wiki("Bacillus subtilis", on_error = "warn")
List of available WikiPathways species
fetch_wiki_species(on_error = c("stop", "warn", "ignore"))
fetch_wiki_species(on_error = c("stop", "warn", "ignore"))
on_error |
A character string indicating the error handling strategy: either "stop" to halt execution, "warn" to issue a warning and return 'NULL' or "ignore" to return 'NULL' without warnings. Defaults to "stop". |
A character vector with species names used by WikiPathways.
spec <- fetch_wiki_species(on_error = "warn")
spec <- fetch_wiki_species(on_error = "warn")
Perform fast functional enrichment analysis based on the hypergeometric distribution. Designed for use in interactive applications.
functional_enrichment(feat_all, feat_sel, term_data, feat2name = NULL)
functional_enrichment(feat_all, feat_sel, term_data, feat2name = NULL)
feat_all |
A character vector with all feature identifiers, serving as the background for enrichment. |
feat_sel |
A character vector with feature identifiers in the selection. |
term_data |
An object of class |
feat2name |
An optional named list to convert feature IDs into feature names. |
This function carries out functional enrichment analysis on a
selection of features (e.g., differentially expressed genes) using the
hypergeometric probability distribution (Fisher's exact test). Features can
be genes, proteins, etc. The term_data
object contains functional
term information and feature-term mapping.
A tibble with enrichment results, providing the following information for each term:
N_with
- number of features with this term among all features
n_with_sel
- number of features with this term in the selection
n_expect
- expected number of features with this term in the selection,
under the null hypothesis that terms are mapped to features randomly
enrichment
- ratio of n_with_sel / n_expect
odds_ratio
- odds ratio for enrichment; is infinite when all
features with the given term are in the selection
p_value
- p-value from a single hypergeometric test
p_adjust
- p-value adjusted for multiple tests using the
Benjamini-Hochberg approach
.
## Not run: data(exmpl_all, exmpl_sel) go <- fetch_go(species = "sgd") go_terms <- prepare_for_enrichment(go$terms, go$mapping, exmpl_all, feature_name = "gene_symbol") enr <- functional_enrichment(exmpl_all, exmpl_sel, go_terms) ## End(Not run)
## Not run: data(exmpl_all, exmpl_sel) go <- fetch_go(species = "sgd") go_terms <- prepare_for_enrichment(go$terms, go$mapping, exmpl_all, feature_name = "gene_symbol") enr <- functional_enrichment(exmpl_all, exmpl_sel, go_terms) ## End(Not run)
Get terms annotating a given feature.
get_feature_terms(term_data, feature_id)
get_feature_terms(term_data, feature_id)
term_data |
An object class |
feature_id |
A string with a feature ID |
A character vector containing functional term IDs annotating given feature.
## Not run: go_data <- fetch_go(species = "sgd") go_terms <- prepare_for_enrichment(go_data$terms, go_data$mapping, feature = "gene_symbol") trms <- get_feature_terms(go_terms, "GEM1") ## End(Not run)
## Not run: go_data <- fetch_go(species = "sgd") go_terms <- prepare_for_enrichment(go_data$terms, go_data$mapping, feature = "gene_symbol") trms <- get_feature_terms(go_terms, "GEM1") ## End(Not run)
Get features annotated with a given term.
get_term_features(term_data, term_id)
get_term_features(term_data, term_id)
term_data |
An object class |
term_id |
A string with a functional term ID. |
A character vector containing feature IDs annotated with the term ID.
## Not run: go_data <- fetch_go(species = "sgd") go_terms <- prepare_for_enrichment(go_data$terms, go_data$mapping, feature = "gene_symbol") feats <- get_term_features(go_terms, "GO:0000001") ## End(Not run)
## Not run: go_data <- fetch_go(species = "sgd") go_terms <- prepare_for_enrichment(go_data$terms, go_data$mapping, feature = "gene_symbol") feats <- get_term_features(go_terms, "GO:0000001") ## End(Not run)
Downloaded using go <- fetch_go(species = "sgd")
data(go)
data(go)
A list of two tibbles
Contains GO-term descriptions and gene mapping.
Downloaded using go_species <- fetch_go_species()
data(go_species)
data(go_species)
A tibble
Contains species available through Gene Ontology
Process term data downloaded with the fetch_*
functions, preparing it
for fast enrichment analysis using functional_enrichment
.
prepare_for_enrichment( terms, mapping, all_features = NULL, feature_name = "gene_id" )
prepare_for_enrichment( terms, mapping, all_features = NULL, feature_name = "gene_id" )
terms |
A tibble with at least two columns: |
mapping |
A tibble with at least two columns, containing the mapping
between functional terms and features. One column must be named
|
all_features |
A vector with all feature IDs used as the background for
enrichment. If not specified, all features found in |
feature_name |
The name of the column in the |
This function takes two tibbles containing functional term information
(terms
) and feature mapping (mapping
), and converts them into
an object required by functional_enrichment
for efficient analysis.
Terms and mapping can be generated with the database access functions
included in this package, such as fetch_reactome
or
fetch_go_from_go
.
An object of class fenr_terms
required by
functional_enrichment
.
## Not run: data(exmpl_all) go <- fetch_go(species = "sgd") go_terms <- prepare_for_enrichment(go$terms, go$mapping, exmpl_all, feature_name = "gene_symbol") ## End(Not run)
## Not run: data(exmpl_all) go <- fetch_go(species = "sgd") go_terms <- prepare_for_enrichment(go$terms, go$mapping, exmpl_all, feature_name = "gene_symbol") ## End(Not run)
This function will remove all cached data used by 'fenr'. The user will be prompted for confirmation. Use only when you suspect the cache was corrupted. Use with caution!
remove_cache(ask = TRUE)
remove_cache(ask = TRUE)
ask |
Logical, whether to ask user for confirmation. |
TRUE if successfully removed.
## Not run: remove_cache() ## End(Not run)
## Not run: remove_cache() ## End(Not run)
A subset of 6 + 6 replicates was selected from data set reported in https://doi.org/10.1093/bioinformatics/btv425
data(yeast_de)
data(yeast_de)
A tibble with 5 columns
Results for differential expression for yeast RNA-seq.