| Title: | Gene List Enrichment Analysis using the ToppGene Suite |
|---|---|
| Description: | The ToppGene Suite is a one-stop portal for gene list enrichment analysis and candidate gene prioritization based on functional annotations and protein interactions network. Although the ToppCluster web application provides convenient graphical access to the ToppGene Suite, the OpenAPI 3.0 compliant interface of ToppGene is better suited for automation and reproducibility. This package includes Bioconductor class interfaces and biological examples. |
| Authors: | Pariksheet Nanda [aut, cre] (ORCID: <https://orcid.org/0000-0001-9726-4552>), Jason Shoemaker [fnd] (ORCID: <https://orcid.org/0000-0003-3315-7103>) |
| Maintainer: | Pariksheet Nanda <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.1.1 |
| Built: | 2026-05-30 09:35:19 UTC |
| Source: | https://github.com/bioc/toppgene |
Specialized [DataFrame] class with the following additional constraints to represent ToppGene parameters to run a ToppGene [enrich()] query:
Fixed number of rows with fixed order for each category.
Columns can only be set to the allowed numerical and set values that are described when showing the object.
The DataFrame semantics allow quickly setting multiple parameters at a time. Unlike typical DataFrame behavior, [show()] displays all 19 rows instead of using ellipses.

    CategoriesDataFrame(...)

    ## S4 method for signature 'CategoriesDataFrame'
    show(object)

    default(x)

    ## S4 method for signature 'CategoriesDataFrame'
    default(x)

    ## S4 replacement method for signature 'CategoriesDataFrame'
    x[i, j, ...] <- value

    ## S4 replacement method for signature 'CategoriesDataFrame,ANY,ANY,ANY'
    x[[i, j, ...]] <- value

    ## S4 replacement method for signature 'CategoriesDataFrame'
    rownames(x) <- value

    ## S4 method for signature 'CategoriesDataFrame'
    subset(x, ...)

| Argument | Description |
|---|---|
| ... | Arguments passed on to inherited methods. |
| object | 'CategoriesDataFrame' object used in [validObject()] and [show()] function calls. |
| x | 'CategoriesDataFrame' object. |
| i | With 'j' (x[i, j]), the row slice(s) / row name(s); otherwise (x[[i]]) the column slice / name. |
| j | The column slice / name (x[i, j]). |
| value | Atomic or vector assigned to 'x'. |
'CategoriesDataFrame' with 19 rows (categories: CoExpression, ..., ToppCell) and 5 columns (PValue, MinGenes, MaxGenes, MaxResults, Correction).
[DataFrame::DataFrame()]

    library(DFplyr)
    cats <- CategoriesDataFrame()
    cats <- cats |>
        mutate(
            PValue = 0.001,
            MaxResults = case_when(
                grepl("Onto", rownames(cats)) ~ 25L,
                .default = MaxResults))
    cats

The ToppGene API returns many [CATEGORIES] of gene list enrichment.

    enrich(entrez_ids, categories = CategoriesDataFrame(), max_tries = 3L)

| Argument | Description |
|---|---|
| entrez_ids | Integer vector of genes. |
| categories | If no categories are provided, return all categories. |
| max_tries | Number of attempts passed on to httr2::req_retry. |
DataFrame with 15 columns containing the enrichment Category, ID, and associated data.

    # Sample functional enrichment calls of the ToppGene API specification:
    enrich(2L)
    enrich(as.integer(c(1482, 4205, 2626, 9421, 9464, 6910, 6722)))

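
Because `entrez_ids` is documented as an integer vector, gene IDs read from a file or spreadsheet as text may need cleaning first. The following is a minimal base R sketch of one way to do that. The helper name `as_entrez` is hypothetical and not part of this package.

```r
# Hypothetical helper (not part of toppgene): keep only values made up
# entirely of digits, convert them to integer, and drop duplicates.
as_entrez <- function(ids) {
    ids <- trimws(as.character(ids))
    ok <- grepl("^[0-9]+$", ids)
    if (!all(ok)) {
        warning("Dropping ", sum(!ok), " value(s) that are not whole numbers")
    }
    unique(as.integer(ids[ok]))
}

# Text input with one duplicate and one invalid entry.
ids <- suppressWarnings(
    as_entrez(c("1482", "4205", "2626", "4205", "not-an-id")))
ids
```

The resulting integer vector could then be passed as the first argument of enrich().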
ToppGene's gene identifier lookup differs from Bioconductor's identifier lookup, so this function uses the web API instead of the typical Bioconductor functions [GSEABase::mapIdentifiers()], [AnnotationDbi::mapIds()], etc.

    lookup(symbols, max_tries = 3L)

| Argument | Description |
|---|---|
| symbols | Character vector of genes. |
| max_tries | Number of attempts passed on to httr2::req_retry. |
DataFrame with 4 columns: "Submitted" symbol character vector in the same order as the symbols input parameter, corresponding "Entrez" integer IDs, "OfficialSymbol" character vector, and "Description" of gene.

    # Sample lookup call of the ToppGene API specification:
    # - FLDB is an obsolete symbol for APOB.
    # - APOE is the current symbol for APOE.
    # - ENSG00000113196 is an ensembl gene symbol for HAND1.
    # - ENSMUSG00000020287 is a mouse gene MPG.
    lookup(c("FLDB", "APOE", "ENSG00000113196", "ENSMUSG00000020287"))

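
Both enrich() and lookup() pass `max_tries` on to httr2::req_retry. The sketch below shows roughly what a retry policy looks like on a plain httr2 request. It is not the package's internal code, and the endpoint URL is an assumption used only for illustration.

```r
library(httr2)

# Assumption: illustrative endpoint URL, not taken from the package source.
url <- "https://toppgene.cchmc.org/API/lookup"

# Build (but do not send) a request that is attempted at most 3 times.
# By default httr2 treats HTTP 429 and 503 responses as transient.
req <- request(url) |>
    req_retry(max_tries = 3L)

# Printing shows the request; nothing is sent until req_perform(req) is called.
req
```

Raising `max_tries` may help on an unreliable connection, at the cost of longer waits when the service is actually down.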
Many downstream drug analyses in Bioconductor use PubChem CIDs, but ToppGene drug identifiers require changes before conversion, and the conversion itself is involved when scaling to large lists of identifiers.

    lookup_pubchem(df)

| Argument | Description |
|---|---|
| df | DataFrame with 15 columns containing the enrichment Category, ID, and associated data. |
Therefore this function submits queries to the PubChem Power User Gateway (PUG) in parallel for each query.
DataFrame with 2 columns subset to Category == Drug containing input Source, ID, and output CID.

    library(DFplyr)
    cats <- CategoriesDataFrame()
    cats <- cats |>
        mutate(
            PValue = case_when(
                grepl("Drug", rownames(cats)) ~ PValue,
                .default = 1e-100),
            MaxResults = case_when(
                grepl("Drug", rownames(cats)) ~ 1000L,
                .default = 1L))
    # EGFR gene that has hits in all drug databases.
    df_enrich <- toppgene::enrich(1956L, cats)
    df_cid <- lookup_pubchem(df_enrich)
    df_cid

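
As a rough illustration of the preparation step, the sketch below subsets an enrichment-like table to drug rows and groups the identifiers by source so that each group could be submitted as its own query. It uses a base data.frame as a stand-in for the real DataFrame. The presence of a `Source` column in the input, the idea of one batch per source, and every value shown are assumptions for illustration only.

```r
# Stand-in for a few rows of enrich() output; all values are placeholders.
df <- data.frame(
    Category = c("Drug", "Drug", "Pathway", "Drug"),
    Source = c("SourceA", "SourceB", "SourceC", "SourceA"),
    ID = c("id1", "id2", "id3", "id4"))

# Keep only drug rows, then collect the IDs of each source into one batch.
drugs <- subset(df, Category == "Drug")
batches <- split(drugs$ID, drugs$Source)
batches
```

Each element of `batches` is a character vector of identifiers sharing one source.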
Map external Registry IDs to PubChem CIDs using the PubChem Power User Gateway (PUG) (https://pubchem.ncbi.nlm.nih.gov/docs/power-user-gateway), also specified by the NCBI identifier exchange service: https://pubchem.ncbi.nlm.nih.gov/idexchange/

    lookup_pubchem_(ids, registry = NULL)

| Argument | Description |
|---|---|
| ids | character vector of one or more Registry identifiers. |
| registry | optional character vector of length one with Registry name. Not specifying this argument falls back to a PubChem synonym lookup. |
DataFrame with PubChem CIDs or NAs, guaranteed to be the same length as the input when using a registry. The DataFrame may have more rows than the input when using a non-registry synonym lookup.
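
The difference between the two result shapes can be sketched in base R. This is not the function's implementation: it uses plain data.frames instead of DataFrame, and the identifiers and CIDs are made-up placeholders.

```r
# Placeholder input identifiers.
ids <- c("A1", "B2", "C3")

# Registry-style result: at most one CID per ID, returned in arbitrary order,
# with "B2" unmatched. match() realigns to the input and fills gaps with NA.
hits <- data.frame(ID = c("C3", "A1"), CID = c(111L, 222L))
aligned <- data.frame(ID = ids, CID = hits$CID[match(ids, hits$ID)])
aligned

# Synonym-style result: "A1" maps to two CIDs, so a left join yields more
# rows than there are inputs.
hits_many <- data.frame(ID = c("A1", "A1", "C3"), CID = c(1L, 2L, 3L))
expanded <- merge(data.frame(ID = ids), hits_many, by = "ID", all.x = TRUE)
expanded
```

With a one-to-one result, `aligned` can be bound column-wise to the input table, whereas `expanded` has to be joined by `ID`.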