rols is a Bioconductor package and should hence be installed using the dedicated functionality.
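A minimal sketch of the installation, assuming the standard BiocManager workflow used for Bioconductor packages:

```r
## Install BiocManager from CRAN if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

## Install rols from Bioconductor
BiocManager::install("rols")
```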
To get help, either post your question on the Bioconductor support site or open an issue on the rols GitHub page.
The Ontology Lookup Service (OLS) [1, 2] was originally a spin-off of the PRoteomics IDEntifications database (PRIDE) service, located at the EBI, and is now developed and maintained by the Samples, Phenotypes and Ontologies team at EMBL-EBI.
The OLS provides a REST interface to hundreds of ontologies from a single location with a unified output format. The rols package makes this possible from within R. To do so, it relies on the httr package to query the REST interface, and access and retrieve data.
There are 266 ontologies available in the OLS, listed in the table below. Their names are used to define which ontology to query.
The rols package is built around a few classes that make it possible to query the OLS and retrieve, store and manipulate data. Each of these classes is described in more detail in its respective manual page. We start by loading the package.
The `Ontology` and `Ontologies` classes can store information about single or multiple ontologies. The latter can be easily subset using `[` and `[[`, as one would for lists.
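The calls that plausibly produce the output that follows can be sketched as below. This assumes network access to the OLS; the variable name `ol` is chosen here for illustration.

```r
library("rols")

## Retrieve all ontologies available in the OLS (queries the REST API)
ol <- Ontologies()
ol

## Short names (namespaces) of the first few ontologies
head(olsNamespace(ol))

## Extract a single ontology by name, as one would from a list
ol[["bspo"]]
```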
## Object of class 'Ontologies' with 266 entries
## NULL, NULL ... NULL, NULL
## [1] "hra" "dcat" "biolink" "addicto" "dhba" "hba"
## Ontology: Biological Spatial Ontology (bspo)
## An ontology for respresenting spatial concepts, anatomical axes,
## gradients, regions, planes, sides and surfaces. These concepts can be
## used at multiple biological scales and in a diversity of taxa,
## including plants, animals and fungi. The BSPO is used to provide a
## source of anatomical location descriptors for logically defining
## anatomical entity classes in anatomy ontologies.
## Loaded: 2024-12-02 Updated: 2024-12-02 Version: 2023-05-27
## 169 terms 236 properties 18 individuals
It is also possible to initialise a single ontology.
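A short sketch, assuming network access; the variable name `bspo` is illustrative:

```r
library("rols")

## Initialise a single ontology directly from its name
bspo <- Ontology("bspo")
bspo
```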
## Ontology: Biological Spatial Ontology (bspo)
## An ontology for respresenting spatial concepts, anatomical axes,
## gradients, regions, planes, sides and surfaces. These concepts can be
## used at multiple biological scales and in a diversity of taxa,
## including plants, animals and fungi. The BSPO is used to provide a
## source of anatomical location descriptors for logically defining
## anatomical entity classes in anatomy ontologies.
## Loaded: 2024-12-02 Updated: 2024-12-02 Version: 2023-05-27
## 169 terms 236 properties 18 individuals
Single ontology terms are stored in `Term` objects. When more terms need to be manipulated, they are stored as `Terms` objects. It is easy to obtain all terms of an ontology of interest, and the resulting `Terms` object can be subset using `[` and `[[`, as one would for lists.
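As an illustration, the terms of the BSPO ontology could be retrieved and subset as follows. This is a sketch that assumes network access; `bspotrms` is a name chosen here.

```r
library("rols")

bspo <- Ontology("bspo")

## Retrieve all terms of the ontology (this may take a moment)
bspotrms <- Terms(bspo)
bspotrms

## Subset like a list: the first ten terms, then a single term by identifier
bspotrms[1:10]
bspotrms[["BSPO:0000092"]]
```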
## Object of class 'Terms' with 169 entries
## From the BSPO ontology
## BFO:0000002, BFO:0000003 ... IAO:0000409, PATO:0000001
## Object of class 'Terms' with 10 entries
## From the BSPO ontology
## BFO:0000002, BFO:0000003 ... BFO:0000023, BFO:0000031
## A Term from the BSPO ontology: BSPO:0000092
## Label: anatomical compartment boundary
## to be merged into CARO
It is also possible to initialise a single term.
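A minimal sketch, assuming network access, that builds one term and reads its identifier and label:

```r
library("rols")

bspo <- Ontology("bspo")

## Initialise a single term from its ontology and identifier
trm <- Term(bspo, "BSPO:0000092")

## Accessors for the term identifier and its human-readable label
termId(trm)
termLabel(trm)
```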
## [1] "BSPO:0000092"
## [1] "anatomical compartment boundary"
It is then possible to extract the `ancestors`, `descendants`, `parents` and `children` terms. Each of these functions returns a `Terms` object.
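For the term used above, the direct relatives could be obtained as sketched here (network access assumed):

```r
library("rols")

trm <- Term("bspo", "BSPO:0000092")

## Direct parents and children, each returned as a Terms object
parents(trm)
children(trm)
```

`ancestors(trm)` and `descendants(trm)` follow the same pattern for the full lineage.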
## Object of class 'Terms' with 1 entries
## From the BSPO ontology
## CARO:0000010
## Object of class 'Terms' with 6 entries
## From the BSPO ontology
## BSPO:0000094, BSPO:0000093 ... BSPO:0000041, BSPO:0000040
Finally, a single term or terms object can be coerced to a `data.frame` using `as(x, "data.frame")`.
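A sketch of the coercion on a small subset of terms, assuming network access; the columns of the resulting table are not listed here because they depend on the OLS response.

```r
library("rols")

## Keep the first ten terms of the BSPO ontology
bspotrms <- Terms(Ontology("bspo"))[1:10]

## Coerce the Terms object to a data.frame, one row per term
trms_df <- as(bspotrms, "data.frame")
head(trms_df)
```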
Properties (relationships) of single or multiple terms or complete ontologies can be queried with the `Properties` method, as briefly illustrated below.
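The output that follows is consistent with a query such as this sketch for the UBERON liver term (network access assumed; `p` is an illustrative name):

```r
library("rols")

## A single term from the UBERON ontology
trm <- Term("uberon", "UBERON:0002107")
trm

## Retrieve the properties of that term
p <- Properties(trm)
p

## Properties can be subset like a list and have a label
p[[1]]
termLabel(p[[1]])
```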
## A Term from the UBERON ontology: UBERON:0002107
## Label: liver
## An exocrine gland which secretes bile and functions in metabolism of
## protein and carbohydrate and fat, synthesizes substances involved in
## the clotting of the blood, synthesizes vitamin A, detoxifies poisonous
## substances, stores glycogen, and breaks down worn-out erythrocytes[GO].
## Object of class 'Properties' with 160 entries
## From the UBERON ontology
## abdomen, digestive system ... liver serosa, liver subserosa
## A Property from the UBERON ontology: UBERON:0000916
## Label: abdomen
## [1] "abdomen"
A researcher might be interested in the trans-Golgi network.
Searching the OLS is assured by the `OlsSearch` and `olsSearch` classes/functions. The first step is to define the search query with `OlsSearch`, as shown below. This creates a search object of class `OlsSearch` that stores the query and its parameters. It records the number of requested results (default is 20) and the total number of possible results (there are 17500 results across all ontologies, in this case). At this stage, the results have not yet been downloaded, as shown by the 0 responses.
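A sketch of defining such a query, assuming network access (the total number of hits is requested from the OLS when the object is created):

```r
library("rols")

## Define the query; no results are downloaded at this stage
qry_all <- OlsSearch(q = "trans-golgi network")
qry_all
```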
## Object of class 'OlsSearch':
## query: trans-golgi network
## requested: 20 (out of 17500)
## response(s): 0
17500 results are probably too many to be relevant. Below we show how to perform an exact search by setting `exact = TRUE`, and limiting the search to the GO ontology by specifying `ontology = "GO"`, or doing both.
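The three variants can be sketched as follows (network access assumed; the object names are illustrative):

```r
library("rols")

## Exact matches only, across all ontologies
q_exact <- OlsSearch(q = "trans-golgi network", exact = TRUE)

## Any match, restricted to the GO ontology
q_go <- OlsSearch(q = "trans-golgi network", ontology = "GO")

## Exact matches within GO only
q_both <- OlsSearch(q = "trans-golgi network", ontology = "GO", exact = TRUE)

q_exact
q_go
q_both
```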
## Object of class 'OlsSearch':
## query: trans-golgi network
## requested: 20 (out of 227)
## response(s): 0
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 20 (out of 1042)
## response(s): 0
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 20 (out of 25)
## response(s): 0
One can set the `rows` argument to define the number of desired results.
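For instance, requesting 200 results from GO might look like this sketch (network access assumed):

```r
library("rols")

## Request up to 200 results instead of the default 20
q_rows <- OlsSearch(q = "trans-golgi network", ontology = "GO", rows = 200)
q_rows
```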
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 200 (out of 1042)
## response(s): 0
See `?OlsSearch` for details about retrieving many results.
Let's proceed with the exact search and retrieve the results. With the default of 20 requested results, only the first 20 of the 227 matching results are retrieved. The `olsSearch` function updates the previously created object (called `qry` below) by adding the results to it.
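A sketch of the two steps, defining the query and then running it (network access assumed):

```r
library("rols")

## Step 1: define the exact query (nothing downloaded yet)
qry <- OlsSearch(q = "trans-golgi network", exact = TRUE)

## Step 2: run the search; the returned object now holds the responses
qry <- olsSearch(qry)
qry
```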
## Object of class 'OlsSearch':
## query: trans-golgi network
## requested: 20 (out of 227)
## response(s): 20
We can now transform this search result object into a fully fledged `Terms` object or a `data.frame`.
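Both coercions can be sketched as below, assuming network access; `qtrms` and `qdrf` are names chosen for illustration.

```r
library("rols")

qry <- olsSearch(OlsSearch(q = "trans-golgi network", exact = TRUE))

## Coerce the search results to a Terms object; results that cannot be
## instantiated as terms may be dropped with a warning
qtrms <- as(qry, "Terms")
qtrms

## Or coerce to a data.frame and inspect its structure
qdrf <- as(qry, "data.frame")
str(qdrf)
```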
## Warning in asMethod(object): 4 term failed to be instantiated.
## Object of class 'Terms' with 16 entries
## From 9 ontologies
## GO:0005802, NCIT:C33802 ... GO:0012510, GO:0098564
## 'data.frame': 20 obs. of 8 variables:
## $ iri : chr "http://purl.obolibrary.org/obo/GO_0005802" "http://purl.obolibrary.org/obo/NCIT_C33802" "http://purl.obolibrary.org/obo/OMIT_0020822" "http://purl.obolibrary.org/obo/PR_O43493" ...
## $ ontology_name : chr "go" "ncit" "omit" "pr" ...
## $ ontology_prefix: chr "GO" "NCIT" "OMIT" "PR" ...
## $ short_form : chr "GO_0005802" "NCIT_C33802" "OMIT_0020822" "PR_O43493" ...
## $ description :List of 20
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__ "There are different opinions about whether the TGN should be considered part of the Golgi apparatus or not. We "| __truncated__
## ..$ : chr "A network of membrane components where vesicles bud off the Golgi apparatus to bring proteins, membranes and ot"| __truncated__
## ..$ : chr
## ..$ : chr "A trans-Golgi network integral membrane protein 2 that is encoded in the genome of human." "Category=organism-gene."
## ..$ : chr "The lipid bilayer surrounding any of the compartments that make up the trans-Golgi network."
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__ "There are different opinions about whether the TGN should be considered part of the Golgi apparatus or not. We"| __truncated__
## ..$ : chr
## ..$ : chr
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr
## ..$ : chr "Abnormal(ly) mislocalised (of) enterocyte of trans-Golgi network."
## ..$ : chr "A vesicle that mediates transport between the trans-Golgi network and other parts of the cell."
## ..$ : chr "The amount of a trans-Golgi network integral membrane protein 2 when measured in blood."
## ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the lumen."
## ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the cytoplasm."
## ..$ : chr "The lipid bilayer surrounding a vesicle transporting substances between the trans-Golgi network and other parts of the cell."
## ..$ : chr "The volume enclosed within the membrane of a trans-Golgi network transport vesicle."
## $ label : chr "trans-Golgi network" "Trans-Golgi Network" "trans-Golgi Network" "trans-Golgi network integral membrane protein 2 (human)" ...
## $ obo_id : chr "GO:0005802" "NCIT:C33802" "OMIT:0020822" "PR:O43493" ...
## $ type : chr "class" "class" "class" "class" ...
In this case, we can see that we actually retrieve the same term used across different ontologies. In such cases, it might be useful to keep only non-redundant term instances. Here, this would have been equivalent to restricting the search to the go, ncit, omit, pr, fma, zp and oba ontologies.
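One way to keep only non-redundant terms and then inspect where they come from is sketched here (network access assumed):

```r
library("rols")

qry <- olsSearch(OlsSearch(q = "trans-golgi network", exact = TRUE))
qtrms <- as(qry, "Terms")

## Drop redundant term instances
qtrms <- unique(qtrms)

## Ontology and namespace of each remaining term
termOntology(qtrms)
termNamespace(qtrms)
```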
## GO:0005802 NCIT:C33802 OMIT:0020822 PR:O43493 GO:0032588 FMA:61756
## "go" "ncit" "omit" "pr" "go" "fma"
## ZP:0142408 GO:0030140 OBA:2051789 GO:0098540 GO:0098541 GO:0012510
## "zp" "go" "oba" "go" "go" "go"
## GO:0098564
## "go"
## $`GO:0005802`
## [1] "cellular_component"
##
## $`NCIT:C33802`
## NULL
##
## $`OMIT:0020822`
## NULL
##
## $`PR:O43493`
## [1] "protein"
##
## $`GO:0032588`
## [1] "cellular_component"
##
## $`FMA:61756`
## [1] "fma"
##
## $`ZP:0142408`
## NULL
##
## $`GO:0030140`
## [1] "cellular_component"
##
## $`OBA:2051789`
## NULL
##
## $`GO:0098540`
## [1] "cellular_component"
##
## $`GO:0098541`
## [1] "cellular_component"
##
## $`GO:0012510`
## [1] "cellular_component"
##
## $`GO:0098564`
## [1] "cellular_component"
Below, we execute the same query using the GO.db package.
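With GO.db, the lookup is a local one against the packaged annotation snapshot. A minimal sketch, assuming the GO.db package is installed:

```r
library("GO.db")

## Look up the GO term by identifier in the local annotation database
go_term <- GOTERM[["GO:0005802"]]
go_term
```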
## GOID: GO:0005802
## Term: trans-Golgi network
## Ontology: CC
## Definition: The network of interconnected tubular and cisternal
## structures located within the Golgi apparatus on the side distal to
## the endoplasmic reticulum, from which secretory vesicles emerge.
## The trans-Golgi network is important in the later stages of protein
## secretion where it is thought to play a key role in the sorting and
## targeting of secreted proteins to the correct destination.
## Synonym: TGN
## Synonym: trans Golgi network
## Synonym: Golgi trans face
## Synonym: Golgi trans-face
## Synonym: late Golgi
## Synonym: maturing face
## Synonym: trans face
It is possible to observe different results with rols and GO.db, as a result of the different ways they access the data. rols or biomaRt perform direct online queries, while GO.db and other annotation packages use database snapshots that are updated with every release.
Both approaches have advantages. While online queries make it possible to obtain the most up-to-date information, such approaches rely on network availability and quality. If reproducibility is a major issue, the version of the database to be queried can easily be controlled with off-line approaches. In the case of rols, although the load date of a specific ontology can be queried with `olsVersion`, it is not possible to query a specific version of an ontology.
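To record which state of an ontology was queried, the version and dates can be stored alongside the results. This is a sketch assuming network access, using the accessors named in this document:

```r
library("rols")

bspo <- Ontology("bspo")

## Version of the ontology as reported by the OLS
olsVersion(bspo)

## Dates at which the ontology was loaded into and updated in the OLS
olsLoaded(bspo)
olsUpdated(bspo)
```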
rols 2.0 has substantially changed. While the table below shows some correspondence between the old and new interface, not every old function has a direct counterpart. The new interface relies on the `Ontology`/`Ontologies`, `Term`/`Terms` and `OlsSearch` classes, that need to be instantiated and can then be queried, as described above.
version < 1.99 | version >= 1.99 |
---|---|
ontologyLoadDate | olsLoaded and olsUpdated |
ontologyNames | Ontologies |
olsVersion | olsVersion |
allIds | terms |
isIdObsolete | isObsolete |
rootId | olsRoot |
olsQuery | OlsSearch and olsSearch |
Not all functionality is currently available. If there is anything that you need but that is not available in the new version, please contact the maintainer by opening an issue on the package development site.
rols version >= 2.99 has been refactored to use the OLS4 REST API. Related changes concern httr, the `Term()` and `Terms()` constructors, and `Properties()`.
The `CVParam` class is used to handle controlled vocabulary. It can be used for user-defined parameters
## [, , A user param, the value]
or official controlled vocabulary (which triggers a query to the OLS service)
## [MS, MS:1000073, electrospray ionization, ]
See `?CVParam` for more details and examples.
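The two uses of `CVParam` described above can be sketched as follows. The second call queries the OLS and therefore assumes network access; the object names are illustrative.

```r
library("rols")

## A user-defined parameter: only a name and a value, no OLS query
user_param <- CVParam(name = "A user param", value = "the value")
user_param

## An official controlled vocabulary term, resolved through the OLS
ms_param <- CVParam(label = "MS", accession = "MS:1000073")
ms_param
```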
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DT_0.33 rols_3.3.0 GO.db_3.20.0
## [4] AnnotationDbi_1.69.0 IRanges_2.41.2 S4Vectors_0.45.2
## [7] Biobase_2.67.0 BiocGenerics_0.53.3 generics_0.1.3
## [10] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 sass_0.4.9 RSQLite_2.3.9
## [4] digest_0.6.37 magrittr_2.0.3 evaluate_1.0.1
## [7] fastmap_1.2.0 blob_1.2.4 jsonlite_1.8.9
## [10] GenomeInfoDb_1.43.2 DBI_1.2.3 BiocManager_1.30.25
## [13] httr_1.4.7 crosstalk_1.2.1 UCSC.utils_1.3.0
## [16] Biostrings_2.75.2 httr2_1.0.7 jquerylib_0.1.4
## [19] cli_3.6.3 rlang_1.1.4 crayon_1.5.3
## [22] XVector_0.47.0 bit64_4.5.2 cachem_1.1.0
## [25] yaml_2.3.10 tools_4.4.2 memoise_2.0.1
## [28] GenomeInfoDbData_1.2.13 curl_6.0.1 buildtools_1.0.0
## [31] vctrs_0.6.5 R6_2.5.1 png_0.1-8
## [34] lifecycle_1.0.4 zlibbioc_1.52.0 KEGGREST_1.47.0
## [37] htmlwidgets_1.6.4 bit_4.5.0.1 pkgconfig_2.0.3
## [40] bslib_0.8.0 glue_1.8.0 xfun_0.49
## [43] sys_3.4.3 knitr_1.49 htmltools_0.5.8.1
## [46] rmarkdown_2.29 maketools_1.3.1 compiler_4.4.2