An R interface to the Ontology Lookup Service

Introduction

Installation

rols is a Bioconductor package and should therefore be installed using BiocManager:

## install BiocManager if needed, then use it to install rols
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("rols")

Getting help

To get help, either post your question on the Bioconductor support site or open an issue on the rols GitHub page.

The resource

The Ontology Lookup Service (OLS) [1, 2] originated as a spin-off of the PRoteomics IDEntifications database (PRIDE) service, located at the EBI, and is now developed and maintained by the Samples, Phenotypes and Ontologies team at EMBL-EBI.

The package

The OLS provides a REST interface to hundreds of ontologies from a single location with a unified output format. The rols package makes this interface available from within R. To do so, it relies on the httr2 package to query the REST interface and to access and retrieve data.

There are 266 ontologies available in the OLS, listed in the table below. Their names are used to define which ontology to query.

A brief rols overview

The rols package is built around a few classes that make it possible to query the OLS and to retrieve, store and manipulate data. Each of these classes is described in more detail in its respective manual page. We start by loading the package.

library("rols")

Ontologies

The Ontology and Ontologies classes can store information about single or multiple ontologies. The latter can easily be subset using [ and [[, as one would for lists.

ol <- Ontologies()
ol
## Object of class 'Ontologies' with 266 entries
##    NULL, NULL ... NULL, NULL
head(olsNamespace(ol))
## [1] "hra"     "dcat"    "biolink" "addicto" "dhba"    "hba"
ol[["bspo"]]
## Ontology: Biological Spatial Ontology (bspo)  
##   An ontology for respresenting spatial concepts, anatomical axes,
##   gradients, regions, planes, sides and surfaces. These concepts can be
##   used at multiple biological scales and in a diversity of taxa,
##   including plants, animals and fungi. The BSPO is used to provide a
##   source of anatomical location descriptors for logically defining
##   anatomical entity classes in anatomy ontologies.
##    Loaded: 2024-12-02 Updated: 2024-12-02 Version: 2023-05-27 
##    169 terms  236 properties  18 individuals
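Only [[ is shown above. A minimal sketch of subsetting with [ as well, assuming the OLS is reachable and that integer indices behave as for lists (as the text above states); the namespaces returned depend on the current OLS content, so no output is shown.

```r
library("rols")

## retrieve the descriptions of all ontologies available in the OLS
ol <- Ontologies()

## `[` subsets like a list and returns a smaller Ontologies object
ol3 <- ol[1:3]
olsNamespace(ol3)

## `[[` extracts a single Ontology, here selected by its namespace
go <- ol[["go"]]
go
```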

It is also possible to initialise a single ontology:

bspo <- Ontology("bspo")
bspo
## Ontology: Biological Spatial Ontology (bspo)  
##   An ontology for respresenting spatial concepts, anatomical axes,
##   gradients, regions, planes, sides and surfaces. These concepts can be
##   used at multiple biological scales and in a diversity of taxa,
##   including plants, animals and fungi. The BSPO is used to provide a
##   source of anatomical location descriptors for logically defining
##   anatomical entity classes in anatomy ontologies.
##    Loaded: 2024-12-02 Updated: 2024-12-02 Version: 2023-05-27 
##    169 terms  236 properties  18 individuals

Terms

Single ontology terms are stored in Term objects, and multiple terms in Terms objects. It is easy to obtain all terms of an ontology of interest, and the resulting Terms object can be subset using [ and [[, as one would for lists.

bspotrms <- Terms(bspo) ## or Terms("bspo")
bspotrms
## Object of class 'Terms' with 169 entries
##  From the BSPO ontology
##   BFO:0000002, BFO:0000003 ... IAO:0000409, PATO:0000001
bspotrms[1:10]
## Object of class 'Terms' with 10 entries
##  From the BSPO ontology
##   BFO:0000002, BFO:0000003 ... BFO:0000023, BFO:0000031
bspotrms[["BSPO:0000092"]]
## A Term from the BSPO ontology: BSPO:0000092 
##  Label: anatomical compartment boundary
##   to be merged into CARO

It is also possible to initialise a single term:

trm <- Term(bspo, "BSPO:0000092")
termId(trm)
## [1] "BSPO:0000092"
termLabel(trm)
## [1] "anatomical compartment boundary"

It is then possible to extract the ancestors, descendants, parents and children terms. Each of these functions returns a Terms object:

parents(trm)
## Object of class 'Terms' with 1 entries
##  From the BSPO ontology
## CARO:0000010
children(trm)
## Object of class 'Terms' with 6 entries
##  From the BSPO ontology
##   BSPO:0000094, BSPO:0000093 ... BSPO:0000041, BSPO:0000040
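The ancestors() and descendants() functions mentioned above are used in the same way as parents() and children(). A minimal sketch, assuming the OLS is reachable; the number of terms returned depends on the current BSPO release, so no output is shown.

```r
library("rols")

## the BSPO term used above
trm <- Term("bspo", "BSPO:0000092")

## all terms above and below trm in the ontology hierarchy
anc <- ancestors(trm)
desc <- descendants(trm)

## both results are Terms objects, like those of parents() and children()
anc
desc
termLabel(anc)
```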

Finally, a single Term or a Terms object can be coerced to a data.frame using as(x, "data.frame").
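A minimal sketch of both coercions, assuming the OLS is reachable. The exact columns of the resulting data.frames depend on the rols version and on the fields returned by the OLS, so only the column names are inspected here.

```r
library("rols")

bspo <- Ontology("bspo")
trm <- Term(bspo, "BSPO:0000092")

## coerce a single Term
trmdf <- as(trm, "data.frame")

## coerce a Terms object, here the first 10 BSPO terms
bspodf <- as(Terms(bspo)[1:10], "data.frame")

names(trmdf)
names(bspodf)
```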

Properties

Properties (relationships) of single or multiple terms, or of complete ontologies, can be queried with the Properties function, as briefly illustrated below.

trm <- Term("uberon", "UBERON:0002107")
trm
## A Term from the UBERON ontology: UBERON:0002107 
##  Label: liver
##   An exocrine gland which secretes bile and functions in metabolism of
##   protein and carbohydrate and fat, synthesizes substances involved in
##   the clotting of the blood, synthesizes vitamin A, detoxifies poisonous
##   substances, stores glycogen, and breaks down worn-out erythrocytes[GO].
p <- Properties(trm)
p
## Object of class 'Properties' with 160 entries
##  From the UBERON ontology
##   abdomen, digestive system ... liver serosa, liver subserosa
p[[1]]
## A Property from the UBERON ontology: UBERON:0000916 
##  Label: abdomen
termLabel(p[[1]])
## [1] "abdomen"
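The same function can be applied to a complete ontology rather than a single term. A minimal sketch, assuming the OLS is reachable and that Properties() accepts an Ontology object, as the sentence above suggests; the number of properties returned depends on the current ontology release.

```r
library("rols")

## properties of a complete ontology
bspo <- Ontology("bspo")
bspoprops <- Properties(bspo)
bspoprops

## individual properties are extracted and inspected as shown above
termLabel(bspoprops[[1]])
```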

Use case

A researcher might be interested in the trans-Golgi network. Searching the OLS is handled by the OlsSearch class and the olsSearch function. The first step is to define the search query with OlsSearch, as shown below. This creates a search object of class OlsSearch that stores the query and its parameters. It records the number of requested results (default is 20) and the total number of possible results (17500 across all ontologies, in this case). At this stage, the results have not yet been downloaded, as shown by the 0 responses.

OlsSearch(q = "trans-golgi network")
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 17500)
##   response(s): 0

17500 results are probably too many to be relevant. Below we show how to perform an exact search by setting exact = TRUE, how to limit the search to the GO ontology by specifying ontology = "GO", and how to do both.

OlsSearch(q = "trans-golgi network", exact = TRUE)
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 227)
##   response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO")
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 20 (out of 1042)
##   response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO", exact = TRUE)
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 20 (out of 25)
##   response(s): 0

One can set the rows argument to specify the number of desired results.

OlsSearch(q = "trans-golgi network", ontology = "GO", rows = 200)
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 200 (out of 1042)
##   response(s): 0

See ?OlsSearch for details about retrieving many results.
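For instance, the exact GO search above reported 25 results in total, so requesting 25 rows should retrieve all of them in one call. A minimal sketch, assuming the OLS is reachable; the value 25 is taken from the output above and will change as GO evolves, which is why ?OlsSearch remains the reference for large result sets.

```r
library("rols")

## exact search restricted to GO, requesting as many rows as reported above
qry <- OlsSearch(q = "trans-golgi network", ontology = "GO",
                 exact = TRUE, rows = 25)

## download the results into the search object
qry <- olsSearch(qry)
qry

## one row per retrieved result
res <- as(qry, "data.frame")
res[, c("obo_id", "label")]
```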

Let’s proceed with the exact search and retrieve the results. As we keep the default of 20 requested results, only the first 20 of the 227 matches are retrieved. The olsSearch function updates the previously created object (called qry below) by adding the results to it.

qry <- OlsSearch(q = "trans-golgi network", exact = TRUE)
(qry <- olsSearch(qry))
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 227)
##   response(s): 20

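We can now transform this search result object into a fully fledged Terms object or a data.frame. Results that cannot be instantiated as terms are dropped with a warning, which is why the Terms object below has 16 entries for 20 results.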

(qtrms <- as(qry, "Terms"))
## Warning in asMethod(object): 4 term failed to be instantiated.
## Object of class 'Terms' with 16 entries
##  From  9 ontologies
##   GO:0005802, NCIT:C33802 ... GO:0012510, GO:0098564
str(qdrf <- as(qry, "data.frame"))
## 'data.frame':    20 obs. of  8 variables:
##  $ iri            : chr  "http://purl.obolibrary.org/obo/GO_0005802" "http://purl.obolibrary.org/obo/NCIT_C33802" "http://purl.obolibrary.org/obo/OMIT_0020822" "http://purl.obolibrary.org/obo/PR_O43493" ...
##  $ ontology_name  : chr  "go" "ncit" "omit" "pr" ...
##  $ ontology_prefix: chr  "GO" "NCIT" "OMIT" "PR" ...
##  $ short_form     : chr  "GO_0005802" "NCIT_C33802" "OMIT_0020822" "PR_O43493" ...
##  $ description    :List of 20
##   ..$ : chr  "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__ "There are different opinions about whether the TGN should be considered part of the Golgi apparatus or not. We "| __truncated__
##   ..$ : chr "A network of membrane components where vesicles bud off the Golgi apparatus to bring proteins, membranes and ot"| __truncated__
##   ..$ : chr 
##   ..$ : chr  "A trans-Golgi network integral membrane protein 2 that is encoded in the genome of human." "Category=organism-gene."
##   ..$ : chr "The lipid bilayer surrounding any of the compartments that make up the trans-Golgi network."
##   ..$ : chr  "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__ "There are different opinions about whether the TGN should be considered part of the Golgi apparatus or not.  We"| __truncated__
##   ..$ : chr 
##   ..$ : chr 
##   ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
##   ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
##   ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
##   ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
##   ..$ : chr 
##   ..$ : chr "Abnormal(ly) mislocalised (of) enterocyte of trans-Golgi network."
##   ..$ : chr "A vesicle that mediates transport between the trans-Golgi network and other parts of the cell."
##   ..$ : chr "The amount of a trans-Golgi network integral membrane protein 2 when measured in blood."
##   ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the lumen."
##   ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the cytoplasm."
##   ..$ : chr "The lipid bilayer surrounding a vesicle transporting substances between the trans-Golgi network and other parts of the cell."
##   ..$ : chr "The volume enclosed within the membrane of a trans-Golgi network transport vesicle."
##  $ label          : chr  "trans-Golgi network" "Trans-Golgi Network" "trans-Golgi Network" "trans-Golgi network integral membrane protein 2 (human)" ...
##  $ obo_id         : chr  "GO:0005802" "NCIT:C33802" "OMIT:0020822" "PR:O43493" ...
##  $ type           : chr  "class" "class" "class" "class" ...

In this case, we can see that we actually retrieved the same term used across different ontologies. In such cases, it might be useful to keep only non-redundant term instances. Here, this would have been equivalent to searching the go, ncit, omit, pr, fma, zp and oba ontologies.

qtrms <- unique(qtrms)
termOntology(qtrms)
##   GO:0005802  NCIT:C33802 OMIT:0020822    PR:O43493   GO:0032588    FMA:61756 
##         "go"       "ncit"       "omit"         "pr"         "go"        "fma" 
##   ZP:0142408   GO:0030140  OBA:2051789   GO:0098540   GO:0098541   GO:0012510 
##         "zp"         "go"        "oba"         "go"         "go"         "go" 
##   GO:0098564 
##         "go"
termNamespace(qtrms)
## $`GO:0005802`
## [1] "cellular_component"
## 
## $`NCIT:C33802`
## NULL
## 
## $`OMIT:0020822`
## NULL
## 
## $`PR:O43493`
## [1] "protein"
## 
## $`GO:0032588`
## [1] "cellular_component"
## 
## $`FMA:61756`
## [1] "fma"
## 
## $`ZP:0142408`
## NULL
## 
## $`GO:0030140`
## [1] "cellular_component"
## 
## $`OBA:2051789`
## NULL
## 
## $`GO:0098540`
## [1] "cellular_component"
## 
## $`GO:0098541`
## [1] "cellular_component"
## 
## $`GO:0012510`
## [1] "cellular_component"
## 
## $`GO:0098564`
## [1] "cellular_component"
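The distinct ontologies contributing to the non-redundant result can be obtained with unique(termOntology(qtrms)), and table() counts the terms per ontology. To keep this sketch runnable without a network connection, the named vector printed by termOntology(qtrms) above is reproduced literally as ont; in a live session, ont would simply be termOntology(qtrms).

```r
## literal copy of the termOntology(qtrms) output shown above
ont <- c("GO:0005802" = "go", "NCIT:C33802" = "ncit",
         "OMIT:0020822" = "omit", "PR:O43493" = "pr",
         "GO:0032588" = "go", "FMA:61756" = "fma",
         "ZP:0142408" = "zp", "GO:0030140" = "go",
         "OBA:2051789" = "oba", "GO:0098540" = "go",
         "GO:0098541" = "go", "GO:0012510" = "go",
         "GO:0098564" = "go")

## ontologies contributing at least one term, in order of first appearance
## (unique() drops the term identifiers used as names)
unique(ont)

## number of terms per ontology
table(ont)
```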

Below, we look up the same GO term using the GO.db package.

library("GO.db")
GOTERM[["GO:0005802"]]
## GOID: GO:0005802
## Term: trans-Golgi network
## Ontology: CC
## Definition: The network of interconnected tubular and cisternal
##     structures located within the Golgi apparatus on the side distal to
##     the endoplasmic reticulum, from which secretory vesicles emerge.
##     The trans-Golgi network is important in the later stages of protein
##     secretion where it is thought to play a key role in the sorting and
##     targeting of secreted proteins to the correct destination.
## Synonym: TGN
## Synonym: trans Golgi network
## Synonym: Golgi trans face
## Synonym: Golgi trans-face
## Synonym: late Golgi
## Synonym: maturing face
## Synonym: trans face

On-line vs. off-line data

It is possible to observe different results with rols and GO.db, as a result of the different ways they access the data. rols and biomaRt perform direct online queries, while GO.db and other annotation packages use database snapshots that are updated with every release.
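A minimal way to see this in practice is to fetch the label of the same GO term from both sources and compare them. Note that rols and AnnotationDbi (attached with GO.db) both export a function named Term(), so whichever package is attached last may mask the other; the sketch below uses explicit namespaces to avoid the ambiguity. It assumes the OLS is reachable; the two labels usually agree, but need not if GO has changed since the GO.db snapshot was built.

```r
library("rols")
library("GO.db")

goid <- "GO:0005802"

## live query to the OLS
olsLabel <- termLabel(rols::Term("go", goid))

## lookup in the local GO.db snapshot
godbLabel <- AnnotationDbi::Term(GOTERM[[goid]])

olsLabel
godbLabel
identical(unname(olsLabel), godbLabel)
```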

Both approaches have advantages. While online queries give access to the latest information, they rely on network availability and quality. If reproducibility is a major concern, the version of the database being queried can easily be controlled with off-line approaches. In the case of rols, although the version and load date of a specific ontology can be queried with olsVersion and olsLoaded, it is not possible to query a specific version of an ontology.
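Since a specific ontology version cannot be requested, a practical compromise is to record the version and dates of each ontology used, for instance alongside the session information. A minimal sketch using the olsVersion(), olsLoaded() and olsUpdated() accessors; the helpers asScalar() and ontologyProvenance() are hypothetical and not part of rols, and the format of the returned values may differ between rols versions, hence the conversion to character.

```r
library("rols")

## hypothetical helper (not part of rols): first element of x as a
## character string, or NA if the field is missing
asScalar <- function(x) if (length(x)) as.character(x[[1]]) else NA_character_

## hypothetical helper (not part of rols): one row of provenance per ontology
ontologyProvenance <- function(ids) {
    rows <- lapply(ids, function(id) {
        ont <- Ontology(id)
        data.frame(ontology = id,
                   version  = asScalar(olsVersion(ont)),
                   loaded   = asScalar(olsLoaded(ont)),
                   updated  = asScalar(olsUpdated(ont)))
    })
    do.call(rbind, rows)
}

prov <- ontologyProvenance(c("bspo", "go"))
prov
```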

Changes

Version 2.0

The rols 2.0 interface changed substantially. The table below shows correspondences between the old and new interfaces, although there is not always a direct equivalent. The new interface relies on the Ontology/Ontologies, Term/Terms and OlsSearch classes, which need to be instantiated and can then be queried, as described above.

version < 1.99      version >= 1.99
ontologyLoadDate    olsLoaded and olsUpdated
ontologyNames       Ontologies
olsVersion          olsVersion
allIds              terms
isIdObsolete        isObsolete
rootId              olsRoot
olsQuery            OlsSearch and olsSearch

Not all functionality is currently available. If anything you need is not available in the new version, please contact the maintainer by opening an issue on the package development site.

Version 2.99

  • rols version >= 2.99 has been refactored to use the OLS4 REST API.
  • REST queries now use httr2 instead of the superseded httr.
  • The term(s) constructors are capitalised as Term() and Terms().
  • The properties constructor is capitalised as Properties().
  • Some class definitions have been updated to accommodate changes in the data received from the OLS. Some functions have been dropped.

CVParams

The CVParam class is used to handle controlled vocabulary terms. It can be used for user-defined parameters

CVParam(name = "A user param", value = "the value")
## [, , A user param, the value]

or for official controlled vocabulary terms (which triggers a query to the OLS):

CVParam(label = "MS", accession = "MS:1000073")
## [MS, MS:1000073, electrospray ionization, ]

See ?CVParam for more details and examples.

Session information

## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] DT_0.33              rols_3.3.0           GO.db_3.20.0        
##  [4] AnnotationDbi_1.69.0 IRanges_2.41.2       S4Vectors_0.45.2    
##  [7] Biobase_2.67.0       BiocGenerics_0.53.3  generics_0.1.3      
## [10] BiocStyle_2.35.0    
## 
## loaded via a namespace (and not attached):
##  [1] rappdirs_0.3.3          sass_0.4.9              RSQLite_2.3.9          
##  [4] digest_0.6.37           magrittr_2.0.3          evaluate_1.0.1         
##  [7] fastmap_1.2.0           blob_1.2.4              jsonlite_1.8.9         
## [10] GenomeInfoDb_1.43.2     DBI_1.2.3               BiocManager_1.30.25    
## [13] httr_1.4.7              crosstalk_1.2.1         UCSC.utils_1.3.0       
## [16] Biostrings_2.75.2       httr2_1.0.7             jquerylib_0.1.4        
## [19] cli_3.6.3               rlang_1.1.4             crayon_1.5.3           
## [22] XVector_0.47.0          bit64_4.5.2             cachem_1.1.0           
## [25] yaml_2.3.10             tools_4.4.2             memoise_2.0.1          
## [28] GenomeInfoDbData_1.2.13 curl_6.0.1              buildtools_1.0.0       
## [31] vctrs_0.6.5             R6_2.5.1                png_0.1-8              
## [34] lifecycle_1.0.4         zlibbioc_1.52.0         KEGGREST_1.47.0        
## [37] htmlwidgets_1.6.4       bit_4.5.0.1             pkgconfig_2.0.3        
## [40] bslib_0.8.0             glue_1.8.0              xfun_0.49              
## [43] sys_3.4.3               knitr_1.49              htmltools_0.5.8.1      
## [46] rmarkdown_2.29          maketools_1.3.1         compiler_4.4.2