Title: | processing of ontologies of anatomy, cell lines, and so on |
---|---|
Description: | Support harvesting of diverse bioinformatic ontologies, making particular use of the ontologyIndex package on CRAN. We provide snapshots of key ontologies for terms about cells, cell lines, chemical compounds, and anatomy, to help analyze genome-scale experiments, particularly cell x compound screens. Another purpose is to strengthen development of compelling use cases for richer interfaces to emerging ontologies. |
Authors: | Vincent Carey [ctb, cre], Sara Stankiewicz [ctb], Victor Tarca [ctb] |
Maintainer: | Vincent Carey <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.1.3 |
Built: | 2024-12-05 21:45:01 UTC |
Source: | https://github.com/bioc/ontoProc |
subset method
## S3 method for class 'owlents'
x[i, j, drop = FALSE]
x | owlents instance |
i | character or numeric vector |
j | not used |
drop | not used |
allGOterms: data.frame with ids and terms
allGOterms
data.frame instance
This is a snapshot of all the terms available from GO.db (3.4.2), August 2017, using keys(GO.db, keytype="TERM").
data(allGOterms)
head(allGOterms)
retrieve ancestor 'sets'
ancestors(oe)
oe | owlents instance |
a list of sets
pa = get_ordo_owl_path()
o2 = try(reticulate::import("owlready2"), silent=TRUE)
if (!inherits(o2, "try-error")) {
  orde = setup_entities(pa)
  orde
  ancestors(orde[1:5])
  labels(orde[1:5])
}
obtain list of names of a set of ancestors
ancestors_names(anclist)
anclist | output of 'ancestors' |
list of vectors of character()
non-entities are removed and names are extracted
pa = get_ordo_owl_path()
o2 = try(reticulate::import("owlready2"), silent=TRUE)
if (!inherits(o2, "try-error")) {
  orde = setup_entities(pa)
  al = ancestors(orde[1001:1002])
  ancestors_names(al)
}
add mapping from informal to formal cell type tags to a SummarizedExperiment colData
bind_formal_tags(se, informal, tagmap, force = FALSE)
se | SummarizedExperiment instance |
informal | character(1) name of colData element with uncontrolled vocabulary |
tagmap | data.frame with columns 'informal' and 'formal' |
force | logical(1), defaults to FALSE; if TRUE, allows clobbering existing colData variable named "formal" |
SummarizedExperiment instance with a new colData column 'label.ont' giving the formal tags associated with each sample
This function fails if the value of 'informal' is not among the colData variable names, or if "formal" is already among the colData variable names and force is FALSE.
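The lookup described above can be sketched in base R on a plain data.frame standing in for colData. This is only an illustration under stated assumptions: the helper name bind_tags_sketch, the toy sample table and the output column name are hypothetical, and the package function operates on a SummarizedExperiment rather than a data.frame.

```r
# Toy stand-in for colData: one row per sample, with an uncontrolled label column.
cd <- data.frame(sample = c("s1", "s2", "s3"),
                 celltype = c("B cells", "T cells", "B cells"),
                 stringsAsFactors = FALSE)

# Tag map with the documented columns 'informal' and 'formal'.
tagmap <- data.frame(informal = c("B cells", "T cells"),
                     formal = c("CL:0000236", "CL:0000084"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: check the preconditions, then attach formal tags by lookup.
bind_tags_sketch <- function(cd, informal, tagmap, newcol = "label.ont", force = FALSE) {
  stopifnot(informal %in% names(cd))                 # informal column must exist
  if (!force) stopifnot(!(newcol %in% names(cd)))    # refuse to clobber unless forced
  cd[[newcol]] <- tagmap$formal[match(cd[[informal]], tagmap$informal)]
  cd
}

out <- bind_tags_sketch(cd, "celltype", tagmap)
out$label.ont
```

Samples whose informal label is absent from the tag map would receive NA under this sketch.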
produce bioregistry_ols table
bioregistry_ols_resources()
data.frame
This uses the 'resources' method of the bioregistry module from pip to isolate resources with a non-null 'ols' component.
tab = bioregistry_ols_resources()
head(tab[,1:3])
combine TermSet instances
## S4 method for signature 'TermSet'
c(x, ...)
x | TermSet instance |
... | additional instances |
TermSet instance
utilities for approximate matching of cell type terms to GO categories and annotations
cellTypeToGO(celltypeString, gotab, ...)
cellTypeToGenes(
  celltypeString,
  gotab,
  orgDb,
  cols = c("ENSEMBL", "SYMBOL"),
  ...
)
celltypeString | character atom to be used to search GO terms using |
gotab | a data.frame with columns GO (goids) and TERM (term strings) |
... | additional arguments to |
orgDb | instances of orgDb |
cols | columns to be retrieved in select operation |
data.frame
data.frame
Very primitive, uses agrep to try to find relevant terms.
library(org.Hs.eg.db)
data(allGOterms)
head(cellTypeToGO("serotonergic neuron", allGOterms))
head(cellTypeToGenes("serotonergic neuron", allGOterms, org.Hs.eg.db))
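As a rough illustration of the agrep-based search mentioned above, the following base R sketch filters a toy table with the documented GO and TERM columns. The identifiers and term strings are made up for the example, and the package functions may post-process the hits differently.

```r
# Toy table with the documented columns; ids and terms are invented placeholders.
gotab <- data.frame(
  GO = c("GO:toy1", "GO:toy2", "GO:toy3"),
  TERM = c("serotonergic neuron differentiation",
           "mitochondrion inheritance",
           "serotonergic neuron axon guidance"),
  stringsAsFactors = FALSE)

# agrep returns the indices of approximately matching strings.
hits <- agrep("serotonergic neuron", gotab$TERM)
gotab[hits, ]
```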
obtain list of names of a set of subclasses/children
children_names(sclist)
sclist | output of 'subclasses' |
list of vectors of character()
non-entities are removed and names are extracted
pa = get_ordo_owl_path()
o2 = try(reticulate::import("owlready2"), silent=TRUE)
if (!inherits(o2, "try-error")) {
  orde = setup_entities(pa)
  al = subclasses(orde[100:120])
  children_names(al)
}
obtain named character vector of terms from Cell Line Ontology, omitting obsolete and trailing 'cell'
cleanCLOnames()
character()
cleanCLOnames()[1:10]
produce a data.frame of features relevant to a Cell Ontology class
CLfeats(ont, tag = "CL:0001054", pr, go)
ont | instance of ontologyIndex ontology |
tag | character(1) a CL: class tag |
pr | instance of ontologyIndex PRO protein ontology |
go | instance of ontologyIndex GO gene ontology |
a data.frame instance
This function will look in the intersection_of and has_part, lacks_part components of the CL entry to find properties asserted of or inherited by the cell type identified in 'tag'. As of 1.19, this function does not look in global environment for ontologies. We use 2021 versions in the examples because some changes in ontologies omit important relationships; revisions to package code after 1.19.4 will attempt to address these.
cl = getOnto("cellOnto", year_added="2021")
pr = getOnto("Pronto", "2021") # legacy tag, for 2022 would be PROnto
go = getOnto("goOnto", "2021")
CLfeats(cl, tag="CL:0001054", pr=pr, go=go)
list and count samples with common ontological annotation in two SEs
common_classes(ont, se1, se2)
ont | instance of ontologyIndex ontology |
se1 | a SummarizedExperiment using 'label.ont' in colData to provide ontological tags (from 'ont') for samples |
se2 | a SummarizedExperiment using 'label.ont' in colData to provide ontological tags (from 'ont') for samples |
a data.frame with rownames given by the common tags, the class names as column 'clname', and counts of samples bearing the given tags in remaining columns.
if (requireNamespace("celldex")) {
  imm = celldex::ImmGenData()
  if ("label.ont" %in% names(SummarizedExperiment::colData(imm))) {
    cl = getOnto("cellOnto")
    blu = celldex::BlueprintEncodeData()
    common_classes( cl, imm, blu )
  }
}
connect ontological categories between related, annotated SummarizedExperiments
connect_classes(ont, se1, se2)
ont | an ontologyIndex ontology instance |
se1 | SummarizedExperiment instance with 'label.ont' among colData columns |
se2 | SummarizedExperiment instance with 'label.ont' among colData columns |
a list with two sublists mapping from terms in one SE to descendant terms in the other SE
app to review molecular properties of cell types via cell ontology
ctmarks(cl, pr, go)
cl | an import of a Cell Ontology (or extended Cell Ontology) in ontology_index form |
pr | an import of a Protein Ontology in ontology_index form |
go | an import of a Gene Ontology in ontology_index form |
a data.frame with features for selected cell types
Prototype of harvesting of cell ontology by searching has_part, has_plasma_membrane_part, intersection_of and allied ontology relationships. Uses shiny. Can perform better if getPROnto() and getGeneOnto() values are in .GlobalEnv as pr and go respectively.
if (interactive()) {
  co = getOnto("cellOnto", year_added="2023") # has plasma membrane relations
  go = getOnto("goOnto", "2023")
  pr = getOnto("Pronto", "2021") # peculiar tag used in legacy, would be PROnto with 2022
  ctmarks(co, pr = pr, go = go) # name pr and go so they match the (cl, pr, go) signature
}
as in Bakken et al. (2017 PMID 29322913) create gene signatures for k cell types, each of which fails to express all but one gene in a set of k genes
cyclicSigset(
  idvec,
  conds = c("hasExp", "lacksExp"),
  tags = paste0("CL:X", 1:length(idvec))
)
idvec | character vector of identifiers, must have names() set to identify cells bearing genes |
conds | character(2) tokens used to indicate condition to which signature element contributes |
tags | character vector of cell-type identifiers; for Cell Ontology use CL: as prefix, one element for each element of idvec |
a long data.frame
sigels = c("CL:X01"="GRIK3", "CL:X02"="NTNG1", "CL:X03"="BAGE2",
   "CL:X04"="MC4R", "CL:X05"="PAX6", "CL:X06"="TSPAN12", "CL:X07"="hSHISA8",
   "CL:X08"="SNCG", "CL:X09"="ARHGEF28", "CL:X10"="EGF")
sigdf = cyclicSigset(sigels)
head(sigdf)
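To make the idea of a cyclic signature concrete, here is a base R sketch that builds a long data.frame in which each cell type has one expressed gene and lacks all the others. The function name cyclic_sketch and the column names celltype, gene and cond are assumptions for illustration; the columns produced by cyclicSigset may differ.

```r
# Hypothetical helper: one block of rows per cell type, one row per gene.
cyclic_sketch <- function(idvec, conds = c("hasExp", "lacksExp")) {
  stopifnot(!is.null(names(idvec)))   # names identify the cell types
  rows <- lapply(seq_along(idvec), function(i) {
    data.frame(celltype = names(idvec)[i],
               gene = unname(idvec),
               # the i-th gene is expressed in the i-th type, all others are lacking
               cond = ifelse(seq_along(idvec) == i, conds[1], conds[2]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

sig <- cyclic_sketch(c("CL:X01" = "GRIK3", "CL:X02" = "NTNG1", "CL:X03" = "BAGE2"))
sig
```

With k cell types the sketch yields k * k rows, of which exactly k carry the first condition token.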
demonstrate the use of makeSelectInput
demoApp()
Run only for side effect of starting a shiny app.
if (interactive()) {
  require(shiny)
  print(demoApp())
}
dropStop is a utility for removing certain words from text data
dropStop(x, drop, lower = TRUE, splitby = " ")
x | character vector of strings to be cleaned |
drop | character vector of words to scrub |
lower | logical, if TRUE, x converted with |
splitby | character, used with strsplit to tokenize |
a list with one element per input string, split by " ", with elements in drop removed
data(minicorpus)
minicorpus[1:3]
dropStop(minicorpus)[1:3]
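The tokenize-and-filter behavior described above can be approximated in a few lines of base R. This is a sketch, not the package code: the name drop_stop_sketch is hypothetical, and the use of tolower for the 'lower' option is an assumption.

```r
# Hypothetical helper mirroring the documented arguments.
drop_stop_sketch <- function(x, drop, lower = TRUE, splitby = " ") {
  if (lower) x <- tolower(x)                       # assumed case normalization
  lapply(strsplit(x, splitby),                     # tokenize each string
         function(tokens) tokens[!(tokens %in% drop)])  # scrub unwanted words
}

res <- drop_stop_sketch(c("RNA-seq of the human liver", "A study of T cells"),
                        drop = c("of", "the", "a"))
res
```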
some fields of interest are lists, and grep per se should not be used – this function checks and uses grep within vapply when appropriate
fastGrep(patt, onto, field, ...)
patt | a regular expression whose presence in field should be checked |
onto | an ontologyIndex instance |
field | the ontologyIndex component to be searched |
... | passed to grep |
logical vector indicating vector or list elements where a match is found
cheb = getOnto("chebi_lite")
ind = fastGrep("tanespimycin", cheb, "name")
cheb$name[ind]
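The distinction between list-valued and vector-valued fields can be sketched as follows. The helper fast_grep_sketch and the toy synonym list are hypothetical; the sketch takes the field directly rather than an ontology plus a field name.

```r
# Hypothetical helper: grep inside vapply for lists, plain grepl for vectors.
fast_grep_sketch <- function(patt, field, ...) {
  if (is.list(field)) {
    # one logical per list element: TRUE if any string in the element matches
    vapply(field, function(el) length(grep(patt, el, ...)) > 0, logical(1))
  } else {
    grepl(patt, field, ...)
  }
}

# Toy list-valued field with invented tags; the second entry has no synonyms.
syn <- list("T:1" = c("alpha", "beta"), "T:2" = character(0), "T:3" = "betaine")
fast_grep_sketch("^beta", syn)
```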
Given a set of ontology terms, find their latest common ancestors based on the term hierarchy.
findCommonAncestors(..., g, remove.self = TRUE, descriptions = NULL)
... | One or more (possibly named) character vectors containing ontology terms. |
g | A graph object containing the hierarchy of all ontology terms. |
remove.self | Logical scalar indicating whether to ignore ancestors containing only a single term (themselves). |
descriptions | Named character vector containing plain-English descriptions for each term. Names should be the term identifier while the values are the descriptions. |
This function identifies all terms in 'g' that are the latest common ancestor (LCA) of any subset of terms in '...'. An LCA is one that has no children that have the exact same set of descendent terms in '...', i.e., it is the most specific term for that set of observed descendents. Knowing the LCA is useful for deciding how terms should be rolled up to broader definitions in downstream applications, usually when the exact terms in '...' are too specific for practical use.
The 'descendents' DataFrame in each row of the output describes the descendents for each LCA, stratified by their presence or absence in each entry of '...'. This is particularly useful for seeing how different sets of terms would be aggregated into broader terms, e.g., when harmonizing annotation from different datasets or studies. Note that any names for '...' will be reflected in the columns of the DataFrame for each LCA.
A DataFrame where each row corresponds to a common ancestor term. This contains the columns 'number', the number of descendent terms across all vectors in '...'; and 'descendents', a List of DataFrames containing the identities of the descendents. It may also contain the column 'description', containing the description for each term.
Aaron Lun
co <- getOnto("cellOnto")
# TODO: wrap in utility function.
parents <- co$parents
self <- rep(names(parents), lengths(parents))
library(igraph)
g <- make_graph(rbind(unlist(parents), self))
# Selecting random terms:
LCA <- ontoProc:::findCommonAncestors(A=sample(names(V(g)), 20),
   B=sample(names(V(g)), 20), g=g)
LCA[1,]
LCA[1,"descendents"][[1]]
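The LCA definition given above can be traced on a toy hierarchy in base R. This sketch is an assumption-laden illustration: it uses a plain parent list instead of an igraph object, the term names are invented, and it only reports which terms qualify rather than building the DataFrame output.

```r
# Toy hierarchy: each entry lists the parents of a term (invented names).
parents <- list(root = character(0), A = "root", B = "root",
                A1 = "A", A2 = "A", B1 = "B")

# A term together with all of its ancestors.
anc_or_self <- function(term) {
  out <- term
  todo <- parents[[term]]
  while (length(todo) > 0) {
    out <- union(out, todo)
    todo <- unlist(parents[todo], use.names = FALSE)
  }
  out
}

observed <- c("A1", "A2", "B1")   # terms supplied across all input vectors

# For every term in the hierarchy, the observed terms at or below it.
desc <- lapply(names(parents), function(node)
  observed[vapply(observed, function(o) node %in% anc_or_self(o), logical(1))])
names(desc) <- names(parents)

# A term is an LCA if it covers something and no child covers the same set.
is_lca <- vapply(names(parents), function(node) {
  kids <- names(parents)[vapply(parents, function(p) node %in% p, logical(1))]
  length(desc[[node]]) > 0 &&
    !any(vapply(kids, function(k) setequal(desc[[k]], desc[[node]]), logical(1)))
}, logical(1))
lca <- names(is_lca)[is_lca]

# Analogue of remove.self = TRUE: drop terms whose only descendant is themselves.
lca_noself <- lca[!vapply(lca, function(n) identical(desc[[n]], n), logical(1))]
lca_noself
```

In this toy case B is not reported because its child B1 covers the same observed set, so B1 is the more specific term.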
return a generator with ontology classes
get_classes(owlfile)
owlfile | reference to OWL file, can be URL, will be processed by owlready2.get_ontology |
generator with output of classes() on the loaded ontology
decompress ordo owl file
get_ordo_owl_path(target = tempdir())
target | character(1) path to where decompressed owl will live |
basic getters in old style, retained 2023 for deprecation interval
getChebiLite()
getCellosaurusOnto()
getUBERON_NE()
getChebiOnto()
getOncotreeOnto()
getDiseaseOnto()
getGeneOnto()
getHCAOnto()
getPROnto()
getPATOnto()
getMondoOnto()
getSIOOnto()
instance of ontology_index (S3) from ontologyIndex
getChebiOnto loads ontoRda/chebi_full.rda
getOncotreeOnto loads ontoRda/oncotree.rda
getDiseaseOnto loads ontoRda/diseaseOnto.rda
getHCAOnto loads ontoRda/hcaOnto.rda produced from hcao.owl at https://github.com/HumanCellAtlas/ontology/releases/tag/1.0.6 2/11/2019, python pronto was used to convert OWL to OBO.
getPROnto loads ontoRda/PRonto.rda, produced from http://purl.obolibrary.org/obo/pr.obo 'reasoned' ontology from OBO foundry, 02-08-2019. In contrast to other ontologies, this is imported via get_OBO with extract_tags='minimal'.
getPATOnto loads ontoRda/patoOnto.rda, produced from https://raw.githubusercontent.com/pato-ontology/pato/master/pato.obo from OBO foundry, 02-08-2019.
obtain childless descendents of a term (including query)
getLeavesFromTerm(x, ont)
x | a character(1) id element for ontology_index instance |
ont | an ontology_index instance as defined in ontologyIndex package |
character vector of 'leaves' of ontology tree
ch = getOnto("chebi_lite")
alldr = getLeavesFromTerm("CHEBI:23888", ch)
head(ch$name[alldr[1:15]])
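A minimal sketch of collecting childless descendants, assuming the hierarchy is available as a list of children per term (the toy list and the name leaves_sketch are hypothetical; the package function works on an ontology_index instance):

```r
# Toy hierarchy: each entry lists the children of a term (invented names).
children <- list(root = c("a", "b"), a = c("a1", "a2"),
                 b = character(0), a1 = character(0), a2 = character(0))

# Hypothetical helper: walk down from x and keep terms that have no children.
leaves_sketch <- function(x, children) {
  todo <- x
  leaves <- character(0)
  while (length(todo) > 0) {
    cur <- todo[1]
    todo <- todo[-1]
    kids <- children[[cur]]
    if (length(kids) == 0) leaves <- c(leaves, cur) else todo <- c(todo, kids)
  }
  unique(leaves)
}

leaves_sketch("root", children)
leaves_sketch("b", children)   # a childless query is returned as its own leaf
```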
get the ontology based on a short tag and year
getOnto(ontoname = "cellOnto", year_added = "2023")
ontoname | character(1) must be an element in 'valid_ontonames()' |
year_added | character(1) refers to 'rdatadateadded' in AnnotationHub metadata |
This queries AnnotationHub for "ontoProcData" and then filters to find the AnnotationHub accession number and retrieves the ontologyIndex serialization of the associated OBO representation of the ontology.
co = getOnto()
tail(co$name[1000:1500])
humrna: a data.frame of SRA metadata related to RNA-seq in humans
humrna
data.frame
arbitrarily chosen from RNA-seq studies for taxon 9606
NCBI SRA
data(humrna)
names(humrna)
head(humrna[,1:5])
inject linefeeds for node names for graph, with textual annotation from ontology
improveNodes(g, ont)
g | graphNEL instance |
ont | instance of ontology from ontologyIndex |
retrieve labels with names
## S3 method for class 'owlents'
labels(object, ...)
object | owlents instance |
... | not used |
When multiple labels are present, only the first is returned, without warning. Note that reticulate 1.35.0 made a change that appears to imply that '[0]' can be used to retrieve the desired components. To get ontology tags, use 'names(labels(...))'. Note: This function was revised Jul 12 2024 to allow terms that lack labels (like CHEBI references in cl.owl) to be processed, returning NA. The previous implementation, which failed on such terms, is available, not exported, as labelsOLD.owlents.
clont_path = owl2cache(url="http://purl.obolibrary.org/obo/cl.owl")
o2 = try(reticulate::import("owlready2"), silent=TRUE)
if (!inherits(o2, "try-error")) {
  clont = setup_entities(clont_path)
  labels(clont[1:5])
  labels(clont[51:55])
}
use output of cyclicSigset to generate a series of character vectors constituting OBO terms
ldfToTerms(
  ldf,
  propmap,
  sigels,
  prologMaker = function(id, ...) sprintf("id: %s", id)
)
ldf | a 'long format' data.frame as created by cyclicSigset |
propmap | a character vector with names of elements corresponding to 'abbreviated' relationship tokens and element values corresponding to full relationship-naming strings |
sigels | a named character vector associating cell types (names) to genes expressed in a cyclic set, one element per type |
prologMaker | a function with arguments (id, ...), in which id is character(1), that generates a vector of strings that will be used for each cell type-specific term. |
a character vector, strings can be concatenated to OBO
ldfToTerms is not sufficiently general to produce terms for any reasonably populated long data frame/propmap combination, but it is a working example for the cyclic set context.
# a set of cell types -- names are cell type token, values are genes expressed in a
# cyclic set -- each cell type expresses exactly one gene in the set and fails to
# express all the other genes in the set. See Figs 3 and 4 of Bakken et al [PMID 29322913].
sigels = c("CL:X01"="GRIK3", "CL:X02"="NTNG1", "CL:X03"="BAGE2",
   "CL:X04"="MC4R", "CL:X05"="PAX6", "CL:X06"="TSPAN12", "CL:X07"="hSHISA8",
   "CL:X08"="SNCG", "CL:X09"="ARHGEF28", "CL:X10"="EGF")
# create the associated long data frame
ldf = cyclicSigset(sigels)
# describe the abbreviations
pmap = c("hasExp"="has_expression_of", lacksExp="lacks_expression_of")
# now define the prolog for each cell type
makeIntnProlog = function(id, ...) {
  # make type-specific prologs as key-value pairs
  c(
    sprintf("id: %s", id),
    sprintf("name: %s-expressing cortical layer 1 interneuron, human", ...),
    sprintf("def: '%s-expressing cortical layer 1 interneuron, human described via RNA-seq observations' [PMID 29322913]", ...),
    "is_a: CL:0000099 ! interneuron",
    "intersection_of: CL:0000099 ! interneuron")
}
tms = ldfToTerms(ldf, pmap, sigels, makeIntnProlog)
cat(tms[[1]], sep="\n")
Produce a data.frame with a set of naive terms mapped to all matching ontology ids and their formal terms
liberalMap(terms, onto, useAgrep = FALSE, ...)
terms | character() vector, can use grep-compatible regular expressions |
onto | an instance of ontologyIndex::ontology_index |
useAgrep | logical(1) if TRUE, agrep will be used |
... | passed to agrep if used |
a data.frame
cands = c("astrocyte$", "oligodendrocyte", "oligodendrocyte precursor",
   "neoplastic", "^neuron$", "^vascular", "badterm")
#co = ontoProc::getCellOnto()
co = getOnto("cellOnto", year_added="2023")
liberalMap(cands, co)
obtain graphNEL from ontology_plot instance of ontologyPlot
make_graphNEL_from_ontology_plot(x)
x | instance of S3 class ontology_plot |
instance of S4 graphNEL class
requireNamespace("Rgraphviz")
requireNamespace("graph")
cl = getOnto("cellOnto")
cl3k = c("CL:0000492", "CL:0001054", "CL:0000236",
   "CL:0000625", "CL:0000576", "CL:0000623",
   "CL:0000451", "CL:0000556")
p3k = ontologyPlot::onto_plot(cl, cl3k)
gnel = make_graphNEL_from_ontology_plot(p3k)
gnel = improveNodes(gnel, cl)
graph::graph.par(list(nodes=list(shape="plaintext", cex=.8)))
gnel = Rgraphviz::layoutGraph(gnel)
Rgraphviz::renderGraph(gnel)
generate a selectInput control for an ontologyIndex slice
makeSelectInput(
  onto,
  term,
  type = "siblings",
  inputId,
  label,
  multiple = TRUE,
  ...
)
onto | ontologyIndex instance |
term | character(1) term used as basis for term list option set in the control |
type | character(1) 'siblings' or 'children', relationship to 'term' that the options will satisfy |
inputId | character(1) for use in server |
label | character(1) for labeling in ui |
multiple | logical(1) passed to |
... | additional parameters passed to |
a selectInput control
makeSelectInput
use prose terminology with output of connect_classes
map2prose(x, cl)
x | a component of connect_classes output |
cl | an ontologyIndex ontology instance |
a decorated list
use grep or agrep to find a match for a naive token into ontology
mapOneNaive(naive, onto, useAgrep = FALSE, ...)
naive | character(1) |
onto | an instance of ontologyIndex::ontology_index |
useAgrep | logical(1) if TRUE, agrep will be used |
... | passed to agrep if used |
if a match is found, the result of grep/agrep with value=TRUE is returned; otherwise a named NA_character_ is returned
named vector, names are ontology identifiers, values are matched strings
#co = ontoProc::getCellOnto()
co = getOnto("cellOnto", year_added="2023")
mapOneNaive("astrocyte", co)
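The matching behavior described above can be sketched on a named character vector standing in for the 'name' component of an ontology. The helper name map_one_sketch, the toy tags, and the choice to name the NA result after the query are assumptions made for the example.

```r
# Toy stand-in for onto$name: names are tags, values are term labels.
termnames <- c("T:1" = "astrocyte", "T:2" = "neuron",
               "T:3" = "astrocyte of the cerebellum")

# Hypothetical helper: grep or agrep with value = TRUE, named NA when nothing matches.
map_one_sketch <- function(naive, termnames, useAgrep = FALSE, ...) {
  hits <- if (useAgrep) {
    agrep(naive, termnames, value = TRUE, ...)
  } else {
    grep(naive, termnames, value = TRUE)
  }
  if (length(hits) == 0) return(structure(NA_character_, names = naive))
  hits   # names (tags) are preserved by value = TRUE
}

map_one_sketch("astrocyte", termnames)
map_one_sketch("badterm", termnames)
```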
minicorpus: a vector of annotation strings found in 'study title' of SRA metadata.
minicorpus
character vector
arbitrarily chosen from titles of RNA-seq studies for taxon 9606
NCBI SRA
data(minicorpus)
head(minicorpus)
repair nomenclature mismatches (to curated term set) in a vector of terms
nomenCheckup(cand, namedOffic, n = 1, tagcolname = "tag", ...)
cand | character vector of candidate terms |
namedOffic | named character vector of curated terms, the names are regarded as tags, intended to be identifiers in curated ontologies |
n | numeric(1) number of nearest neighbors to return |
tagcolname | character(1) prefix used to name columns for tags in output |
... | passed to |
a data.frame instance with 2n+1 columns (column 1 is the candidate; the remaining n pairs of columns are (term, tag) for the n nearest neighbors as measured by adist).
candidates = c("JHH7", "HUT102", "HS739T", "NCIH716")
# the candidates are cell line names returned in the text dump from
# https://portals.broadinstitute.org/ccle/page?gene=AHR
# note that one must travel to the third nearest neighbor
# to find the match (and tag) for Hs 739.T
# in this example, we compare to cell line names in Cell Line Ontology
nomenCheckup(candidates, cleanCLOnames(), n=3, tagcolname="clo")
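A nearest-neighbor lookup by edit distance can be sketched with utils::adist as below. The helper nearest_sketch, the long output layout, the case-insensitive comparison and the placeholder tags are assumptions; nomenCheckup itself returns a wide data.frame as described above.

```r
# Curated terms with placeholder tags (not real Cell Line Ontology identifiers).
offic <- c("TAG:1" = "JHH-7", "TAG:2" = "HuT 102", "TAG:3" = "Hs 739.T")

# Hypothetical helper: for each candidate, report the n closest curated terms.
nearest_sketch <- function(cand, namedOffic, n = 1) {
  d <- utils::adist(cand, namedOffic, ignore.case = TRUE)  # candidates x curated terms
  do.call(rbind, lapply(seq_along(cand), function(i) {
    ord <- order(d[i, ])[seq_len(n)]                       # indices of the n smallest distances
    data.frame(candidate = cand[i], rank = seq_len(n),
               term = unname(namedOffic[ord]), tag = names(namedOffic)[ord],
               stringsAsFactors = FALSE)
  }))
}

res <- nearest_sketch(c("JHH7", "HS739T"), offic, n = 1)
res
```

Raising n returns further neighbors per candidate, which helps when punctuation or spacing differences push the intended term past the first rank.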
high-level use of graph/Rgraphviz for rendering ontology relations
onto_plot2(ont, terms2use, cex = 0.8, ...)
ont | instance of ontology from ontologyIndex |
terms2use | character vector |
cex | numeric(1) defaults to .8, supplied to Rgraphviz::graph.par |
... | passed to onto_plot of ontologyPlot |
graphNEL instance (invisibly)
cl = getOnto("cellOnto")
cl3k = c("CL:0000492", "CL:0001054", "CL:0000236",
   "CL:0000625", "CL:0000576", "CL:0000623",
   "CL:0000451", "CL:0000556")
onto_plot2(cl, cl3k)
list parentless nodes in ontology_index instance
onto_roots(x)
x | an ontology_index instance |
a report (produced by cat()) of root ids and associated names
onto_roots
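Finding parentless nodes amounts to selecting terms whose parent set is empty. The sketch below assumes toy 'parents' and 'name' components shaped like those of an ontology_index instance; the tags and labels are invented.

```r
# Toy components: parents per term and a label per term (invented values).
parents <- list("T:1" = character(0), "T:2" = "T:1", "T:3" = "T:1",
                "T:4" = character(0), "T:5" = "T:4")
term_names <- c("T:1" = "alpha", "T:2" = "beta", "T:3" = "gamma",
                "T:4" = "delta", "T:5" = "epsilon")

# Roots are the terms with no parents.
roots <- names(parents)[lengths(parents) == 0]

# Report ids with their names, one per line.
cat(sprintf("%s (%s)\n", roots, term_names[roots]), sep = "")
```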
Highlights in green the terms that are present in the new ontology but not the old one
ontoDiff(newonto, oldonto, terms2use, cex = 0.8, ...)
newonto | the newest version of the ontology |
oldonto | the old version of the ontology |
terms2use | terms of interest |
cex | numeric(1) defaults to .8, supplied to Rgraphviz::graph.par |
... | passed to onto_plot of ontologyPlot |
onto_plot2 style plot with version differences highlighted
cl = getOnto("diseaseOnto")
cl2 = getOnto(ontoname = "diseaseOnto", year_added = "2021")
cl3k = c("DOID:0040064","DOID:0040076","DOID:0081127",
   "DOID:0081126","DOID:0081131","DOID:0060034")
ontoDiff(cl,cl2,cl3k)
cache an owl file accessible via URL
owl2cache(cache = BiocFileCache::BiocFileCache(), url)
cache | BiocFileCache instance or equivalent |
url | character(1) |
This function will check for presence of url in cache using bfcquery; if a hit is found, returns the rpath associated with the last matching record. etags can be available for use with bfcneedsupdate.
ca = BiocFileCache::BiocFileCache()
o2 = try(reticulate::import("owlready2"), silent=TRUE)
if (!inherits(o2, "try-error")) {
  hppa = owl2cache(ca, url="http://purl.obolibrary.org/obo/hp/releases/2023-10-09/hp-base.owl")
  setup_entities(hppa)
}
packDesc2019: overview of ontoProc resources
packDesc2019
data.frame instance
Brief survey of functions available to load serialized ontology_index instances imported from OBO.
data(packDesc2019)
head(packDesc2019)
packDesc2021: overview of ontoProc resources
packDesc2021
data.frame instance
Brief survey of functions available to load serialized ontology_index instances imported from OBO. Focus is on versions added in 2021.
data(packDesc2021)
head(packDesc2021)
packDesc2022: overview of ontoProc resources
packDesc2022
data.frame instance
Brief survey of functions available to load serialized ontology_index instances imported from OBO. Focus is on versions added in 2022.
data(packDesc2022)
head(packDesc2022)
packDesc2023: overview of ontoProc resources
packDesc2023
data.frame instance
Brief survey of functions available to load serialized ontology_index instances imported from OBO. Focus is on versions added in 2023. Several manual interventions were needed – cellosaurus was too large to use the script in inst/scripts/desc.R, and a number of ontologies do not have 2023 versions.
data(packDesc2023)
head(packDesc2023)
retrieve is_a
parents(oe)
oe | owlents instance |
list of vectors of tags of parents
pa = get_ordo_owl_path()
o2 = try(reticulate::import("owlready2"), silent=TRUE)
if (!inherits(o2, "try-error")) {
  orde = setup_entities(pa)
  orde
  parents(orde[1000:1001])
  labels(orde[1000:1001])
}
visualize ontology selection via onto_plot2, based on owlents
plot.owlents(x, y, ..., dropThing = TRUE)
x | owlents instance |
y | character() vector of entries in x$clnames |
... | passed to onto_plot2 |
dropThing | logical(1) defaults to TRUE; if "Thing" is present in terms to display, it is removed |
cl3k = c("CL:0000492", "CL:0001054", "CL:0000236",
   "CL:0000625", "CL:0000576", "CL:0000623",
   "CL:0000451", "CL:0000556")
cl3k = gsub(":", "_", cl3k)
clont_path = owl2cache(url="http://purl.obolibrary.org/obo/cl.owl")
o2 = try(reticulate::import("owlready2"), silent=TRUE)
if (!inherits(o2, "try-error")) {
  clont = setup_entities(clont_path)
  plot(clont,cl3k)
}
short printer
## S3 method for class 'owlents'
print(x, ...)
x | owlents instance |
... | not used |
PROSYM: HGNC symbol synonyms for PR (protein ontology) entries identified in Cell Ontology
PROSYM
data.frame instance
This is a snapshot of the synonyms component of an extract_tags='everything' import of PR. The 'EXACT.*PRO-short.*:DNx' pattern is used to retrieve HGNC symbols. See ?getPROnto for more provenance information.
OBO Foundry
data(PROSYM)
head(PROSYM)
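The pattern quoted in the description can be tried on a few made-up strings. This is a minimal sketch that assumes the synonym entries resemble OBO synonym fields; the strings below are hypothetical and are not copied from the PROSYM snapshot, and the extraction step is an illustration rather than the package's own code.

```r
# Hypothetical synonym strings in the style of OBO 'synonym' fields;
# illustrative only, not taken from the PROSYM data.
syns <- c(
  '"ITGAM" EXACT PRO-short-label [PRO:DNx]',
  '"integrin alpha-M" EXACT []',
  '"FOXP3" EXACT PRO-short-label [PRO:DNx]',
  '"scurfin" RELATED []'
)

# Keep only entries matching the pattern quoted in the description.
hits <- grep("EXACT.*PRO-short.*:DNx", syns, value = TRUE)

# Pull the quoted token at the start of each retained string.
symbols <- sub('^"([^"]+)".*$', "\\1", hits)
symbols
```

Only the two entries carrying the PRO-short and DNx markers survive the filter, so the other synonyms are dropped before the symbol is extracted.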
enumerate ontological relationships used in ontoProc utilities
recognizedPredicates()
character vector; names of elements are abbreviated tokens that may be used in code
head(recognizedPredicates())
use owlready2 ontology search facility on term labels
search_labels(ontopath, regexp, case_sensitive = TRUE)
ontopath | character(1) path to owl file |
regexp | character(1) simple regular expression |
case_sensitive | logical(1) should case be respected in search? |
A named list: term labels are elements, tags are names of elements. Will return NULL if nothing is found.
pa = get_ordo_owl_path()
ol = search_labels(pa, "*Immunog*")
orde = setup_entities2(pa)
onto_plot2(orde, names(ol))
simple generation of children of 'choices' given as terms, returned as TermSet
secLevGen(choices, ont)
choices | vector of terms |
ont | instance of ontology_index (S3) from ontologyIndex package |
TermSet instance
efoOnto = getOnto("efoOnto")
secLevGen( "disease", efoOnto )
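The idea of generating the children of chosen terms can be sketched in base R without loading any ontology. The list `toy` below is a made-up stand-in that mimics the `name` and `children` components of an ontology_index object, and `second_level` is a hypothetical helper, not the package function; it returns a plain named vector rather than a TermSet.

```r
# Toy stand-in for an ontology_index: 'name' maps tags to labels and
# 'children' maps tags to child tags. All tags below are made up.
toy <- list(
  name = c("T:1" = "disease", "T:2" = "cancer", "T:3" = "infection",
           "T:4" = "lung cancer"),
  children = list("T:1" = c("T:2", "T:3"), "T:2" = "T:4",
                  "T:3" = character(0), "T:4" = character(0))
)

# Hypothetical helper: find the tags whose labels equal the chosen
# terms, then return the labels of their direct children.
second_level <- function(choices, ont) {
  tags <- names(ont$name)[ont$name %in% choices]
  kids <- unique(unlist(ont$children[tags], use.names = FALSE))
  ont$name[kids]
}

second_level("disease", toy)
```

Only direct children are returned; "lung cancer" is a grandchild of "disease" in the toy data and is therefore not included.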
select a set of elements from a term 'map' and return a contribution to a data.frame
selectFromMap(namedvec, index)
namedvec | named character vector, as returned from |
index | numeric() or integer(), typically of length one |
a data.frame; if index does not inherit from numeric, a data.frame of one row with columns 'ontoid' and 'term' populated with NA_character_ is returned, otherwise a similarly named data.frame is returned with contents from the selected elements of namedvec
#co = ontoProc::getCellOnto()
co = getOnto("cellOnto", year_added="2023")
mast = mapOneNaive("astrocyte", co)
selectFromMap(mast, 1)
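The documented return behavior can be sketched as follows. This is not the package source: `select_from_map_sketch` is a hypothetical name, the check uses `is.numeric()` so that integer indices are accepted, and it is assumed that the names of `namedvec` hold ontology tags while the values hold terms.

```r
# Sketch of the documented behavior; not the package implementation.
# Assumption: names of 'namedvec' are ontology tags, values are terms.
select_from_map_sketch <- function(namedvec, index) {
  if (!is.numeric(index)) {
    # Non-numeric index (for example NA): one row of missing values.
    return(data.frame(ontoid = NA_character_, term = NA_character_,
                      stringsAsFactors = FALSE))
  }
  picked <- namedvec[index]
  data.frame(ontoid = names(picked), term = unname(picked),
             stringsAsFactors = FALSE)
}

# Made-up map from tags to terms, for illustration only.
toy_map <- c("X:0001" = "astrocyte", "X:0002" = "mature astrocyte")
select_from_map_sketch(toy_map, 1)
select_from_map_sketch(toy_map, NA)
```

Returning a one-row data.frame of NA values in the fallback case keeps the column layout constant, so results for several terms can be stacked with `rbind` even when some terms have no match.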
construct owlents instance from an owl file
setup_entities(owlfn)
owlfn | character(1) path to valid owl ontology |
instance of owlents, which is a list with clnames (a vector of term names in the form '[namespace]_[tag]'), allents (a list of Python references to owlready2 entities that can be operated on using owlready2.EntityClass methods), owlfn (filename), iri (IRI), and call (record of the call producing the entity).
pa = get_ordo_owl_path()
o2 = try(reticulate::import("owlready2"), silent=TRUE)
if (!inherits(o2, "try-error")) {
  orde = setup_entities(pa)
  orde
  ancestors(orde[1000:1001])
  labels(orde[1000:1001])
}
In preparation for a small number of entry points to owlready2 mediated by basilisk, this setup function ingests OWL, enumerates classes and their names, and produces the 'parents' list, which can then be used with ontology_index to produce a functional ontology representation.
setup_entities2(owlfn, cache_object = TRUE)
owlfn | character(1) path to OWL file |
cache_object | logical(1) if TRUE, cache the 'ontology_index' instance in BiocFileCache::BiocFileCache() |
Production of an 'ontology_index' instance will often throw a warning when "Thing" is part of the ontology. suppressWarnings has been used in the code to suppress this. This may be too aggressive an approach.
pa = get_ordo_owl_path()
orde = setup_entities2(pa)
orde
tabulate the basic outcome of PBMC 3K tutorial of Seurat
seur3kTab()
a data.frame
seur3kTab()
generate a TermSet with siblings of a given term, excluding that term by default
acquire the label of an ontology subject tag
acquire the labels of children of an ontology subject tag
siblings_TAG(Tagstring = "EFO:1001209", ontology, justSibs = TRUE)
label_TAG(Tagstring = "EFO:0000311", ontology)
children_TAG(Tagstring = "EFO:1001209", ontology)
Tagstring | a character(1) that identifies a term |
ontology | instance of ontology_index (S3) from ontologyIndex |
justSibs | logical(1); defaults to TRUE, in which case the given term is excluded from the result |
TermSet instance (siblings_TAG)
character(1) (label_TAG)
TermSet instance (children_TAG)
for label_TAG, Tagstring may be a vector
efoOnto = getOnto("efoOnto")
siblings_TAG( "EFO:1001209", efoOnto )
label_TAG( "EFO:0000311", efoOnto )
children_TAG( ontology = efoOnto )
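The three lookups can be illustrated in base R on a made-up hierarchy. The list `toy` mimics the `name`, `parents`, and `children` components of an ontology_index object, and the `*_sketch` functions are hypothetical helpers returning plain character vectors, not TermSet instances; the interpretation of `justSibs` follows the title above.

```r
# Toy hierarchy: T:1 is the root, T:2 and T:3 are its children,
# and T:4 is a child of T:2. All tags and labels are made up.
toy <- list(
  name = c("T:1" = "root", "T:2" = "left", "T:3" = "right", "T:4" = "leaf"),
  parents = list("T:1" = character(0), "T:2" = "T:1", "T:3" = "T:1",
                 "T:4" = "T:2"),
  children = list("T:1" = c("T:2", "T:3"), "T:2" = "T:4",
                  "T:3" = character(0), "T:4" = character(0))
)

# Label of one or more tags.
label_sketch <- function(tag, ont) unname(ont$name[tag])

# Direct children of a single tag.
children_sketch <- function(tag, ont) ont$children[[tag]]

# Siblings are the children of the tag's parents; with justSibs = TRUE
# the tag itself is dropped from the result.
siblings_sketch <- function(tag, ont, justSibs = TRUE) {
  sibs <- unique(unlist(ont$children[ont$parents[[tag]]], use.names = FALSE))
  if (justSibs) setdiff(sibs, tag) else sibs
}

siblings_sketch("T:2", toy)
siblings_sketch("T:2", toy, justSibs = FALSE)
label_sketch(c("T:3", "T:4"), toy)
children_sketch("T:1", toy)
```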
stopWords: vector of stop words from xpo6.com
stopWords
character vector
"Stop words" are English words that are assumed to contribute limited semantic value in the analysis of free text.
http://xpo6.com/list-of-english-stop-words/
data(stopWords)
head(stopWords)
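A typical use of such a vector is to drop stop words from tokenized free text before matching against ontology terms. The sketch below hard-codes a handful of stop words as `toy_stops` instead of loading the data set, and the sentence is invented for illustration.

```r
# A few common English stop words, hard-coded for illustration;
# the stopWords data set is not loaded here.
toy_stops <- c("the", "of", "in", "a", "and")

txt <- "Expression of the marker in a subset of T cells"

# Split on runs of non-alphanumeric characters and lower-case the tokens.
tokens <- tolower(strsplit(txt, "[^[:alnum:]]+")[[1]])

# Drop empty tokens and anything found in the stop word list.
kept <- tokens[nzchar(tokens) & !(tokens %in% toy_stops)]
kept
```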
retrieve subclass entities
subclasses(oe)
oe | owlents instance |
pa = get_ordo_owl_path()
o2 = try(reticulate::import("owlready2"), silent=TRUE)
if (!inherits(o2, "try-error")) {
  orde = setup_entities(pa)
  orde
  sc <- subclasses(orde[1:5])
  labels(orde[3])
  o3 = reticulate::iterate(sc[[3]])
  print(length(o3))
  o3[[2]]
  labels(orde["Orphanet_100011"])
}
subset a SummarizedExperiment to which ontology tags have been bound using 'bind_formal_tags', obtaining the 'descendants' of the class of interest
subset_descendants(se, onto, class_name, class_tag, formal_cd_name = "label.ont")
se | SummarizedExperiment instance |
onto | representation of an ontology using representation from ontologyIndex package |
class_name | character(1) if 'class_tag' is missing, this will be grepped in onto[["name"]] to find class and its descendants |
class_tag | character(1) used if given to identify "ontological descendants" of this term in se |
formal_cd_name | character(1) tells name used for ontology tag column in 'colData(se)' |
instance of SummarizedExperiment
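The subsetting logic can be sketched in base R, with a matrix and a data.frame standing in for the assay and colData of a SummarizedExperiment. Everything here is illustrative: `toy`, `descendants_sketch`, the tags, and the sample names are made up, and whether the class of interest itself is retained alongside its descendants is an assumption of this sketch.

```r
# Toy ontology stand-in: 'name' maps tags to labels, 'children' maps
# tags to child tags. Tags and labels are made up.
toy <- list(
  name = c("T:1" = "cell", "T:2" = "lymphocyte", "T:3" = "B cell",
           "T:4" = "T cell", "T:5" = "neuron"),
  children = list("T:1" = c("T:2", "T:5"), "T:2" = c("T:3", "T:4"),
                  "T:3" = character(0), "T:4" = character(0),
                  "T:5" = character(0))
)

# Collect a tag together with everything reachable through 'children'.
descendants_sketch <- function(tag, ont) {
  found <- character(0)
  todo <- tag
  while (length(todo) > 0) {
    cur <- todo[1]
    todo <- todo[-1]
    if (!(cur %in% found)) {
      found <- c(found, cur)
      todo <- c(todo, ont$children[[cur]])
    }
  }
  found
}

# Stand-ins for assay data and colData(se): columns are samples.
counts <- matrix(1:8, nrow = 2, dimnames = list(NULL, paste0("s", 1:4)))
cd <- data.frame(label.ont = c("T:3", "T:5", "T:4", "T:1"),
                 row.names = colnames(counts))

# Find the class by grepping its name, then keep the matching samples.
tag <- names(toy$name)[grep("lymphocyte", toy$name)][1]
keep <- cd$label.ont %in% descendants_sketch(tag, toy)
counts[, keep, drop = FALSE]
```

Samples tagged with the parent class "cell" or the unrelated class "neuron" are dropped, because neither is a descendant of "lymphocyte" in the toy hierarchy.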
use Cell Ontology and Protein Ontology to identify cell-type defining conditions in which a given gene is named
sym2CellOnto(sym, cl, pr)
sym | gene symbol, must be used in protein ontology as a PRO:DNx exact match token |
cl | result of getOnto("cellOnto") |
pr | result of getOnto("PROnto") |
DataFrame if any hits are found. A field 'cond' abbreviates the identified conditions: (has/lacks)PMP (plasma membrane part), (hi/lo)PMAmt (plasma membrane amount), and (has/lacks)Part.
Currently just checks for *plasma_membrane_part, *plasma_membrane_amount, and *Part conditions.
if (!exists("cl")) cl = getOnto("cellOnto")
if (!exists("pr")) pr = getOnto("PROnto")
sym2CellOnto("ITGAM", cl, pr)
sym2CellOnto("FOXP3", cl, pr)
manage ontological data with tags and a DataFrame instance
abbreviated display for TermSet instances
## S4 method for signature 'TermSet'
show(object)
object | instance of TermSet class |
instance of TermSet
efoOnto = getOnto("efoOnto")
defsibs = siblings_TAG("EFO:1001209", efoOnto)
class(defsibs)
defsibs
check that a URL can get a 200 for a HEAD request
url_ok(url)
url | character(1) |
logical(1)
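A check of this kind can be sketched with base R's `curlGetHeaders`, which retrieves only the response headers and reports the final status code in a "status" attribute. The function name `url_ok_sketch` is hypothetical, and the package's own implementation may use a different HTTP client.

```r
# Sketch only; the package's own implementation may differ.
# Returns TRUE when the header request ends with status 200, and FALSE
# for any other status or when the request fails altogether.
url_ok_sketch <- function(url) {
  status <- tryCatch(
    suppressWarnings(attr(curlGetHeaders(url), "status")),
    error = function(e) NA_integer_
  )
  isTRUE(status == 200L)
}

# The reserved '.invalid' top-level domain never resolves.
url_ok_sketch("http://nonexistent.invalid/")
```

Wrapping the request in `tryCatch` means that unreachable hosts and malformed URLs yield FALSE instead of an error, which suits guarding examples that download ontologies.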
give a vector of valid 'names' of ontoProc ontologies
valid_ontonames()
head(valid_ontonames())