Database knowledge is essential for omics data analysis and modeling. Despite being an important factor, contributing to the outcome of studies, often subject to little attention. With OmniPath our aim is to raise awarness of the diversity of available resources and facilitate access to these resources in a uniform and transparent way. OmniPath has been developed in a close contact to mechanistic modeling applications and functional omics analysis, hence it is especially suitable for these fields. OmniPath has been used for the analysis of various omics data. In the Saez-Rodriguez group we often use it in a pipeline with our footprint based methods DoRothEA and PROGENy and our causal reasoning method CARNIVAL to infer signaling mechanisms from transcriptomics data.
One recent novelty of OmniPath is a collection of intercellular communication interactions. Apart from simply merging data from existing resources, OmniPath defines a number of intercellular communication roles, such as ligand, receptor, adhesion, enzyme, matrix, etc, and generalizes the terms ligand and receptor by introducing the terms transmitter, receiver and mediator. This unique knowledge base is especially adequate for the emerging field of cell-cell communication analysis, typically from single cell transcriptomics, but also from other kinds of data.
No special pre-requisites apart from basic knowledge of R. OmniPath, the database resource in the focus of this workshop has been published in [1,2], however you don’t need to know anything about OmniPath to benefit from the workshop. In the workshop we will demonstrate the R/Bioconductor package OmnipathR. If you would like to try the examples yourself we recommend to install the latest version of the package before the workshop:
In the workshop we will present the design and some important features of the OmniPath database, so can be confident you get the most out of it. Then we will demonstrate further useful features of the OmnipathR package, such as accessing other resources, building graphs. Participants are encouraged to experiment with the examples and shape the contents of the workshop by asking questions. We are happy to recieve questions and topic suggestions by email also before the workshop. These could help us to adjust the contents to the interests of the participants.
Total: 45 minutes
Activity | Time |
---|---|
OmniPath database overview | 5m |
Network datasets | 10m |
Other OmniPath databases | 5m |
Intercellular communication | 10m |
Igraph integration | 5m |
Further resources | 10m |
In this workshop you will get familiar with the design and features of the OmniPath databases. For example, to know some important details about the datasets and parameters which help you to query the database the most suitable way according to your purposes. You will also learn about functionalities of the OmnipathR package which might make your work easier.
OmniPath consists of five major databases, each combining many original resources. The five databases are:
The parameters for each database (query type) are available in the web service, for example: https://omnipathdb.org/queries/interactions. The R package supports all features of the web service and the parameter names and values usually correspond to the web service parameters which you would use in a HTTP query string.
The network database contains protein-protein, gene regulatory and miRNA-mRNA interactions. Soon more interaction types will be added. Some of these categories can be further divided into datasets which are defined by the type of evidences. A full list of network datasets:
The functions accessing the above datasets are listed here.
Not individual interactions but resource are classified into the
datasets above, so these can overlap. Each interaction type and dataset
has its dedicated function in OmnipathR
, above we provide
links to their help pages. As an example, let’s see the gene regulatory
interactions:
## # A tibble: 128,218 × 16
## source target source_genesymbol target_genesymbol is_directed is_stimulation is_inhibition
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 P01106 O14746 MYC TERT 1 1 0
## 2 P17947 P02818 SPI1 BGLAP 1 1 0
## 3 COMPLEX:P15407_P17275 P05412 FOSL1_JUNB JUN 1 1 0
## 4 COMPLEX:P01100_P05412 P05412 FOS_JUN JUN 1 1 0
## 5 COMPLEX:P01100_P17275 P05412 FOS_JUNB JUN 1 1 0
## 6 COMPLEX:P15408_P17535 P05412 FOSL2_JUND JUN 1 1 0
## 7 COMPLEX:P05412_P15408 P05412 FOSL2_JUN JUN 1 1 0
## 8 COMPLEX:P05412 P05412 JUN JUN 1 1 0
## 9 COMPLEX:P17275_P53539 P05412 FOSB_JUNB JUN 1 1 0
## 10 COMPLEX:P17275 P05412 JUNB JUN 1 1 0
## # ℹ 128,208 more rows
## # ℹ 9 more variables: consensus_direction <dbl>, consensus_stimulation <dbl>, consensus_inhibition <dbl>,
## # sources <chr>, references <chr>, curation_effort <dbl>, dorothea_level <chr>, n_references <dbl>,
## # n_resources <int>
The interaction data frame contains the UniProt IDs and Gene Symbols of the interacting partners, the list of resources and references (PubMed IDs) for each interaction, and whether the interaction is directed, stimulatory or inhibitory.
The network data frames can be converted to igraph graph objects, so you can make use of the graph and visualization methods of igraph:
## IGRAPH 9dfbcd2 DN-- 15679 128218 --
## + attr: name (v/c), up_ids (v/c), is_directed (e/n), is_stimulation (e/n), is_inhibition (e/n),
## | consensus_direction (e/n), consensus_stimulation (e/n), consensus_inhibition (e/n), sources
## | (e/x), references (e/x), curation_effort (e/n), dorothea_level (e/c), n_references (e/n),
## | n_resources (e/n)
## + edges from 9dfbcd2 (vertex names):
## [1] MYC ->TERT SPI1 ->BGLAP FOSL1_JUNB->JUN FOS_JUN ->JUN FOS_JUNB ->JUN
## [6] FOSL2_JUND->JUN FOSL2_JUN ->JUN JUN ->JUN FOSB_JUNB ->JUN JUNB ->JUN
## [11] FOSL2_JUNB->JUN JUND ->JUN JUN_JUND ->JUN JUNB_JUND ->JUN FOSL1_JUND->JUN
## [16] FOSL1_JUN ->JUN FOSB_JUND ->JUN FOSB_JUN ->JUN JUN_JUNB ->JUN FOS_JUND ->JUN
## [21] SMAD3 ->JUN SMAD4 ->JUN STAT5A ->IL2 STAT5B ->IL2 RELA ->FAS
## + ... omitted several edges
On this network we can use OmnipathR
’s
find_all_paths
function, which is able to look up all paths
up to a certain length between two set of nodes:
paths <- find_all_paths(
graph = gr_graph,
start = c('EGFR', 'STAT3'),
end = c('AKT1', 'ULK1'),
attr = 'name'
)
As this is a gene regulatory network, the paths are TFs regulating the transcription of other TFs.
Enzyme-substrate interactions are also available also in the interactions query, but the enzyme-substrate query type provides additional information about the PTM types and residues.
## # A tibble: 43,269 × 12
## enzyme substrate enzyme_genesymbol substrate_genesymbol residue_type residue_offset modification sources
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 P06239 O14543 LCK SOCS3 Y 204 phosphorylation KEA;MI…
## 2 P06239 O14543 LCK SOCS3 Y 221 phosphorylation KEA;MI…
## 3 P12931 O14746 SRC TERT Y 707 phosphorylation BEL-La…
## 4 P06241 O15117 FYN FYB1 Y 651 phosphorylation HPRD;K…
## 5 P06241 O15117 FYN FYB1 Y 595 phosphorylation HPRD;K…
## 6 P06241 O15117 FYN FYB1 Y 697 phosphorylation HPRD;K…
## 7 P06241 O15117 FYN FYB1 Y 625 phosphorylation Phosph…
## 8 P06241 O15117 FYN FYB1 Y 571 phosphorylation Phosph…
## 9 P06241 O15117 FYN FYB1 Y 771 phosphorylation Phosph…
## 10 P06241 O15117 FYN FYB1 Y 559 phosphorylation Phosph…
## # ℹ 43,259 more rows
## # ℹ 4 more variables: references <chr>, curation_effort <dbl>, n_references <dbl>, n_resources <int>
This data frame also can be converted to an igraph object:
## IGRAPH afd7e44 DN-- 4793 43269 --
## + attr: name (v/c), up_ids (v/c), residue_type (e/c), residue_offset (e/n), modification (e/c),
## | sources (e/x), references (e/x), curation_effort (e/n), n_references (e/n), n_resources (e/n)
## + edges from afd7e44 (vertex names):
## [1] LCK ->SOCS3 LCK ->SOCS3 SRC ->TERT FYN ->FYB1 FYN ->FYB1 FYN ->FYB1 FYN ->FYB1
## [8] FYN ->FYB1 FYN ->FYB1 FYN ->FYB1 FYN ->FYB1 FYN ->FYB1 FYN ->FYB1 FYN ->FYB1
## [15] FYN ->FYB1 FYN ->FYB1 FYN ->FYB1 FYN ->FYB1 FYN ->FYB1 FYN ->FYB1 FYN ->FYB1
## [22] ABL1 ->PLSCR1 ABL1 ->PLSCR1 SRC ->PLSCR1 SRC ->PLSCR1 ABL1 ->TP73 CDK2 ->TP73 CHEK1->TP73
## [29] AURKB->BIRC5 AURKB->BIRC5 AURKB->BIRC5 CDK1 ->BIRC5 PDPK1->PDPK1 PDPK1->PDPK1 PDPK1->PDPK1
## [36] PDPK1->PDPK1 PDPK1->PDPK1 PDPK1->PDPK1 PDPK1->PDPK1 PDPK1->PDPK1 PDPK1->PDPK1 PDPK1->PDPK1
## [43] PDPK1->PDPK1 PDPK1->PDPK1 SRC ->PDPK1 SRC ->PDPK1 SRC ->PDPK1 SRC ->PDPK1 SRC ->PDPK1
## + ... omitted several edges
It is also possible to add effect signs (stimulatory or inhibitory) to enzyme-PTM relationships:
## # A tibble: 35,459 × 7
## name components components_genesymbols stoichiometry sources references identifiers
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 NFY P23511_P25208_Q13952 NFYA_NFYB_NFYC 1:1:1 CORUM;… 9372932;1… CORUM:4478…
## 2 mTORC2 P42345_P68104_P85299_Q6R3… DEPTOR_EEF1A1_MLST8_M… 0:0:0:0:0:0 SIGNOR <NA> SIGNOR:SIG…
## 3 mTORC1 P42345_Q8N122_Q8TB45_Q96B… AKT1S1_DEPTOR_MLST8_M… 0:0:0:0:0 SIGNOR <NA> SIGNOR:SIG…
## 4 SCF-betaTRCP P63208_Q13616_Q9Y297 BTRC_CUL1_SKP1 1:1:1 CORUM;… 9990852 CORUM:227;…
## 5 CBP/p300 Q09472_Q92793 CREBBP_EP300 0:0 SIGNOR <NA> SIGNOR:SIG…
## 6 P300/PCAF Q09472_Q92793_Q92831 CREBBP_EP300_KAT2B 0:0:0 SIGNOR <NA> SIGNOR:SIG…
## 7 SMAD2/SMAD4 Q13485_Q15796 SMAD2_SMAD4 1:2 Comple… 12923550;… PDB:1u7v;S…
## 8 SMAD3/SMAD4 P84022_Q13485 SMAD3_SMAD4 2:1 Comple… 12923550;… PDB:1U7F;P…
## 9 SMAD4/JUN P05412_Q13485 JUN_SMAD4 0:0 SIGNOR <NA> SIGNOR:SIG…
## 10 SMAD2/SMURF2 Q15796_Q9HAU4 SMAD2_SMURF2 1:1 Comple… 11389444 Compleat:H…
## # ℹ 35,449 more rows
The resulted data frame provides the constitution and stoichiometry of protein complexes, with references.
The annotations query type includes a diverse set of resources (about 60 of them), about protein function, localization, structure and expression. For most use cases it is better to convert the data into wide data frames, as these correspond to the original format of the resources. If you load more than one resources into wide data frames, the result will be a list of data frames, otherwise one data frame. See a few examples with localization data from UniProt, tissue expression from Human Protein Atlas and pathway information from SignaLink:
## # A tibble: 67,934 × 5
## uniprot genesymbol entity_type location features
## <chr> <chr> <chr> <chr> <chr>
## 1 Q96JT2 SLC45A3 protein Membrane Multi-pass membrane protein
## 2 Q9UP95 SLC12A4 protein Membrane Multi-pass membrane protein
## 3 Q08357 SLC20A2 protein Cell membrane Multi-pass membrane protein
## 4 O94855 SEC24D protein Cytoplasm <NA>
## 5 O94855 SEC24D protein Endoplasmic reticulum membrane Peripheral membrane protein;Cytoplasmic side
## 6 O94855 SEC24D protein COPII-coated vesicle membrane Peripheral membrane protein;Cytoplasmic side
## 7 O94855 SEC24D protein Cytosol <NA>
## 8 O94855 SEC24D protein Cytoplasmic vesicle <NA>
## 9 Q8N2U9 SLC66A2 protein Membrane Multi-pass membrane protein
## 10 Q96CW6 SLC7A6OS protein Cytoplasm <NA>
## # ℹ 67,924 more rows
The entity_type
field can be protein, mirna or complex.
Protein complexes mostly annotated based on the consensus of their
members, we should be aware that this is an in silico
inference.
In case of spelling mistake either in parameter names or values
OmnipathR
either corrects the mistake or gives a warning or
error:
## Warning in omnipath_check_param(.): The following resources are not available: Uniprot_location. Check the
## resource names for spelling mistakes.
Above the name of the resource is wrong. If the parameter name is wrong, it throws an error:
## Error in annotations(resuorces = "UniProt_location", wide = TRUE): Downloading the entire annotations database is not allowed by default because of its huge size (>1GB). If you really want to do that, you find static files at https://archive.omnipathdb.org/. However we recommend to query a set of proteins or a few resources, depending on your interest.
Singular vs. plural forms and a few synonyms are automatically corrected:
Another example with tissue expression from Human Protein Atlas:
hpa_tissue <- annotations(
resources = 'HPA_tissue',
wide = TRUE,
# Limiting to a handful of proteins for a faster vignette build:
proteins = c('DLL1', 'MEIS2', 'PHOX2A', 'BACH1', 'KLF11', 'FOXO3', 'MEFV')
)
hpa_tissue
## # A tibble: 529 × 15
## uniprot genesymbol entity_type organ tissue level status prognostic favourable pathology n_not_detected
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <dbl>
## 1 O43524 FOXO3 protein placenta endot… Not … Suppo… FALSE FALSE FALSE NA
## 2 O43524 FOXO3 protein cerebral… glial… Not … Suppo… FALSE FALSE FALSE NA
## 3 O43524 FOXO3 protein lung alveo… High Suppo… FALSE FALSE FALSE NA
## 4 O43524 FOXO3 protein urotheli… uroth… Medi… <NA> FALSE TRUE TRUE 3
## 5 O43524 FOXO3 protein seminal … gland… Medi… Suppo… FALSE FALSE FALSE NA
## 6 O43524 FOXO3 protein skin 1 fibro… Not … Suppo… FALSE FALSE FALSE NA
## 7 O43524 FOXO3 protein colon gland… Medi… Suppo… FALSE FALSE FALSE NA
## 8 O43524 FOXO3 protein skeletal… myocy… Medi… Suppo… FALSE FALSE FALSE NA
## 9 O43524 FOXO3 protein colon endot… Medi… Suppo… FALSE FALSE FALSE NA
## 10 O43524 FOXO3 protein testis Leydi… High Suppo… FALSE FALSE FALSE NA
## # ℹ 519 more rows
## # ℹ 4 more variables: n_low <dbl>, n_medium <dbl>, n_high <dbl>, score <dbl>
And pathway annotations from SignaLink:
## # A tibble: 2,487 × 4
## uniprot genesymbol entity_type pathway
## <chr> <chr> <chr> <chr>
## 1 P20963 CD247 protein T-cell receptor
## 2 P43403 ZAP70 protein Receptor tyrosine kinase
## 3 P43403 ZAP70 protein T-cell receptor
## 4 Q9NYJ8 TAB2 protein Toll-like receptor
## 5 Q9NYJ8 TAB2 protein Innate immune pathways
## 6 Q9NYJ8 TAB2 protein JAK/STAT
## 7 Q9NYJ8 TAB2 protein Receptor tyrosine kinase
## 8 O43318 MAP3K7 protein TNF pathway
## 9 O43318 MAP3K7 protein T-cell receptor
## 10 O43318 MAP3K7 protein Receptor tyrosine kinase
## # ℹ 2,477 more rows
Annotations can be easily added to network data frames, in this case both the source and target nodes will have their annotation data. This function accepts either the name of an annotation resource or an annotation data frame:
network <- omnipath()
network_slk_pw <- annotated_network(network, 'SignaLink_pathway')
network_slk_pw
## # A tibble: 116,712 × 17
## source target source_genesymbol target_genesymbol is_directed is_stimulation is_inhibition
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 P0DP23 P48995 CALM1 TRPC1 1 0 1
## 2 P0DP25 P48995 CALM3 TRPC1 1 0 1
## 3 P0DP24 P48995 CALM2 TRPC1 1 0 1
## 4 Q03135 P48995 CAV1 TRPC1 1 1 0
## 5 P14416 P48995 DRD2 TRPC1 1 1 0
## 6 Q99750 P48995 MDFI TRPC1 1 0 1
## 7 Q14571 P48995 ITPR2 TRPC1 1 1 0
## 8 P29966 P48995 MARCKS TRPC1 1 0 1
## 9 Q13255 P48995 GRM1 TRPC1 1 1 0
## 10 Q13586 P48995 STIM1 TRPC1 1 1 0
## # ℹ 116,702 more rows
## # ℹ 10 more variables: consensus_direction <dbl>, consensus_stimulation <dbl>, consensus_inhibition <dbl>,
## # sources <chr>, references <chr>, curation_effort <dbl>, n_references <int>, n_resources <int>,
## # pathway_source <chr>, pathway_target <chr>
The intercell
database assigns roles to proteins such as
ligand, receptor, adhesion, transporter, ECM, etc. The design of this
database is far from being simple, best is to check the description in
the recent OmniPath paper [1].
## # A tibble: 332,447 × 15
## category parent database scope aspect source uniprot genesymbol entity_type consensus_score transmitter
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <lgl>
## 1 transmembra… trans… UniProt… gene… locat… resou… Q8N661 TMEM86B protein 4 FALSE
## 2 transmembra… trans… UniProt… gene… locat… resou… Q8IWU2 LMTK2 protein 7 FALSE
## 3 transmembra… trans… UniProt… gene… locat… resou… P41273 TNFSF9 protein 7 FALSE
## 4 transmembra… trans… UniProt… gene… locat… resou… Q9Y661 HS3ST4 protein 4 FALSE
## 5 transmembra… trans… UniProt… gene… locat… resou… Q9UPX0 IGSF9B protein 5 FALSE
## 6 transmembra… trans… UniProt… gene… locat… resou… Q9NYV7 TAS2R16 protein 5 FALSE
## 7 transmembra… trans… UniProt… gene… locat… resou… P01911 HLA-DRB1 protein 8 FALSE
## 8 transmembra… trans… UniProt… gene… locat… resou… Q6P9B9 INTS5 protein 4 FALSE
## 9 transmembra… trans… UniProt… gene… locat… resou… P05496 ATP5MC1 protein 5 FALSE
## 10 transmembra… trans… UniProt… gene… locat… resou… P55344 LIM2 protein 4 FALSE
## # ℹ 332,437 more rows
## # ℹ 4 more variables: receiver <lgl>, secreted <lgl>, plasma_membrane_transmembrane <lgl>,
## # plasma_membrane_peripheral <lgl>
This data frame is about individual proteins. To create a network of
intercellular communication, we provide the
intercell_network
function:
## # A tibble: 18,089 × 45
## category_intercell_sou…¹ parent_intercell_sou…² source target category_intercell_t…³ parent_intercell_tar…⁴
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 activating_cofactor receptor_regulator O14786 P35968 adhesion adhesion
## 2 activating_cofactor receptor_regulator O14786 P35968 cell_adhesion cell_adhesion
## 3 activating_cofactor receptor_regulator O14786 P35968 matrix_adhesion matrix_adhesion
## 4 activating_cofactor receptor_regulator O14786 P35968 receptor receptor
## 5 activating_cofactor receptor_regulator P08138 P04629 adhesion adhesion
## 6 activating_cofactor receptor_regulator P08138 P04629 cell_adhesion cell_adhesion
## 7 activating_cofactor receptor_regulator P08138 P04629 receptor receptor
## 8 activating_cofactor receptor_regulator P34925 Q9H461 receptor receptor
## 9 activating_cofactor receptor_regulator Q01973 P21860 adhesion adhesion
## 10 activating_cofactor receptor_regulator Q01973 P21860 cell_adhesion cell_adhesion
## # ℹ 18,079 more rows
## # ℹ abbreviated names: ¹category_intercell_source, ²parent_intercell_source, ³category_intercell_target,
## # ⁴parent_intercell_target
## # ℹ 39 more variables: target_genesymbol <chr>, source_genesymbol <chr>, is_directed <dbl>,
## # is_stimulation <dbl>, is_inhibition <dbl>, consensus_direction <dbl>, consensus_stimulation <dbl>,
## # consensus_inhibition <dbl>, omnipath <lgl>, ligrecextra <lgl>, sources <chr>, references <chr>,
## # curation_effort <dbl>, n_references <dbl>, n_resources <int>, database_intercell_source <chr>, …
The result is similar to the annotated_network
, each
interacting partner has its intercell annotations. In the
intercell
database, OmniPath aims to ship all available
information, which means it might contain quite some false positives.
The high_confidence
option is a shortcut to stringent
filter settings based on the number and consensus of provenances. Using
instead the filter_intercell_network
function, you can have
a fine control over the quality filters. It has many options which are
described in the manual.
icn <- intercell_network()
icn_hc <- filter_intercell_network(
icn,
ligand_receptor = TRUE,
consensus_percentile = 30,
loc_consensus_percentile = 50,
simplify = TRUE
)
The filter_intecell
function does a similar procedure on
an intercell annotation data frame.
The list of available resources for each query type can be retrieved
by the ..._resources
functions. For example, the annotation
resources are:
## [1] "Adhesome" "Almen2009" "Baccin2019" "CORUM_Funcat"
## [5] "CORUM_GO" "CSPA" "CSPA_celltype" "CancerDrugsDB"
## [9] "CancerGeneCensus" "CancerSEA" "CellCall" "CellCellInteractions"
## [13] "CellChatDB" "CellChatDB_complex" "CellPhoneDB" "CellPhoneDB_complex"
## [17] "CellTalkDB" "CellTypist" "Cellinker" "Cellinker_complex"
## [21] "ComPPI" "CytoSig" "DGIdb" "DisGeNet"
## [25] "EMBRACE" "Exocarta" "GO_Intercell" "GPCRdb"
## [29] "Guide2Pharma" "HGNC" "HPA_secretome" "HPA_subcellular"
## [33] "HPA_tissue" "HPMR" "HumanCellMap" "ICELLNET"
## [37] "ICELLNET_complex" "IntOGen" "Integrins" "InterPro"
## [41] "KEGG-PC" "Kirouac2010" "LOCATE" "LRdb"
## [45] "Lambert2018" "MCAM" "MSigDB" "Matrisome"
## [49] "MatrixDB" "Membranome" "NetPath" "OPM"
## [53] "PROGENy" "PanglaoDB" "Phobius" "Phosphatome"
## [57] "Ramilowski2015" "Ramilowski_location" "SIGNOR" "SignaLink_function"
## [61] "SignaLink_pathway" "Surfaceome" "TCDB" "TFcensus"
## [65] "TopDB" "UniProt_family" "UniProt_keyword" "UniProt_location"
## [69] "UniProt_tissue" "UniProt_topology" "Vesiclepedia" "Wang"
## [73] "Zhong2015" "connectomeDB2020" "iTALK" "kinase.com"
## [77] "scConnect" "scConnect_complex" "talklr"
Categories in the intercell
query also can be
listed:
## [1] "transmembrane" "transmembrane_predicted"
## [3] "peripheral" "plasma_membrane"
## [5] "plasma_membrane_transmembrane" "plasma_membrane_regulator"
## [7] "plasma_membrane_peripheral" "secreted"
## [9] "cell_surface" "ecm"
## [11] "ligand" "receptor"
## [13] "secreted_enzyme" "secreted_peptidase"
## [15] "extracellular" "intracellular"
## [17] "receptor_regulator" "secreted_receptor"
## [19] "sparc_ecm_regulator" "ecm_regulator"
## [21] "ligand_regulator" "cell_surface_ligand"
## [23] "cell_adhesion" "matrix_adhesion"
## [25] "adhesion" "matrix_adhesion_regulator"
## [27] "cell_surface_enzyme" "cell_surface_peptidase"
## [29] "secreted_enyzme" "extracellular_peptidase"
## [31] "secreted_peptidase_inhibitor" "transporter"
## [33] "ion_channel" "ion_channel_regulator"
## [35] "gap_junction" "tight_junction"
## [37] "adherens_junction" "desmosome"
## [39] "intracellular_intercellular_related"
An increasing number of other resources (currently around 20) can be
directly accessed by OmnipathR
(not from the omnipathdb.org
domain, but from their original providers). As an example,
OmnipathR
uses UniProt data to translate identifiers.
You may find a list of the available identifiers in the manual page of
translate_ids
function. The evaluation of the parameters is
tidyverse style, and both UniProt’s notation and a simple internal
notation can be used. Furthermore, it can handle vectors, data frames or
list of vectors.
d <- data.frame(uniprot_id = c('P00533', 'Q9ULV1', 'P43897', 'Q9Y2P5'))
d <- translate_ids(
d,
uniprot_id = uniprot, # the source ID type and column name
genesymbol # the target ID type using OmniPath's notation
)
d
## uniprot_id genesymbol
## 1 P00533 EGFR
## 2 Q9ULV1 FZD4
## 3 P43897 TSFM
## 4 Q9Y2P5 SLC27A5
It is possible to have one source ID type and column in one call, but multiple target ID types and columns: to translate a network, two calls are necessary. Note: certain functionality fails recently due to changes in other packages, will be fixed in a few days.
OmnipathR
is able to look up ancestors and descendants
in ontology trees, and also exposes the ontology tree in three different
formats: as a data frame, as a list of lists or as an igraph graph
object. All these can have two directions: child-to-parent
(c2p
) or parent-to-child (p2c
).
## # A tibble: 55,114 × 3
## term relation parents
## <fct> <chr> <list>
## 1 GO:0000001 is_a <chr [2]>
## 2 GO:0000002 is_a <chr [1]>
## 3 GO:0000006 is_a <chr [1]>
## 4 GO:0000007 is_a <chr [1]>
## 5 GO:0000009 is_a <chr [1]>
## 6 GO:0000010 is_a <chr [1]>
## 7 GO:0000011 is_a <chr [2]>
## 8 GO:0000012 is_a <chr [1]>
## 9 GO:0000014 is_a <chr [1]>
## 10 GO:0000015 is_a <chr [1]>
## # ℹ 55,104 more rows
To convert the relations to list or graph format, use the
relations_table_to_list
or
relations_table_to_graph
functions. To swap between
c2p
and p2c
use the
swap_relations
function.
## IGRAPH 84619aa DN-- 40665 78698 --
## + attr: name (v/c), relation (e/c)
## + edges from 84619aa (vertex names):
## [1] GO:0000001->GO:0048308 GO:0000001->GO:0048311 GO:0000002->GO:0007005 GO:0000006->GO:0005385
## [5] GO:0000007->GO:0005385 GO:0000009->GO:0000030 GO:0000010->GO:0004659 GO:0000011->GO:0007033
## [9] GO:0000011->GO:0048308 GO:0000012->GO:0006281 GO:0000014->GO:0004520 GO:0000015->GO:1902494
## [13] GO:0000015->GO:0005829 GO:0000016->GO:0004553 GO:0000017->GO:0042946 GO:0000018->GO:0051052
## [17] GO:0000018->GO:0006310 GO:0000019->GO:0000018 GO:0000019->GO:0006312 GO:0000022->GO:0051231
## [21] GO:0000022->GO:1903047 GO:0000022->GO:0000070 GO:0000022->GO:0007052 GO:0000023->GO:0005984
## [25] GO:0000024->GO:0000023 GO:0000024->GO:0046351 GO:0000025->GO:0000023 GO:0000025->GO:0046352
## [29] GO:0000026->GO:0000030 GO:0000027->GO:0022618 GO:0000027->GO:0042255 GO:0000027->GO:0042273
## + ... omitted several edges
It can also translate term IDs to term names:
## [1] "mitotic spindle elongation"
The first call takes a few seconds as it loads the database, subsequent calls are faster.
OmnipathR
features a logging facility, a YML
configuration file and a cache directory. By default the highest level
messages are printed to the console, and you can browse the full log
from R by calling omnipath_log()
. The cache can be
controlled by a number of functions, for example you can search for
cache files by omnipath_cache_search()
, and delete them by
omnipath_cache_remove
:
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$key
## [1] "8e1fed15bbe7704374f40d278e719e18b4a9d60f"
##
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$url
## [1] "https://omnipathdb.org/interactions?genesymbols=yes&datasets=dorothea,tf_target,collectri&organisms=9606&dorothea_levels=A,B&fields=sources,references,curation_effort,dorothea_level&license=academic"
##
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$post
## list()
##
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$payload
## list()
##
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$ext
## [1] "rds"
##
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$versions
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$versions$`1`
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$versions$`1`$number
## [1] "1"
##
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$versions$`1`$path
## [1] "/github/home/.cache/OmnipathR/8e1fed15bbe7704374f40d278e719e18b4a9d60f-1.rds"
##
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$versions$`1`$dl_started
## [1] "2024-12-18 03:56:20 UTC"
##
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$versions$`1`$status
## [1] "ready"
##
## $`8e1fed15bbe7704374f40d278e719e18b4a9d60f`$versions$`1`$dl_finished
## [1] "2024-12-18 03:56:21 UTC"
The configuration can be set by options
, all options are
prefixed with omnipath.
, and can be saved by
omnipath_save_config
. For example, to exclude all OmniPath
resources which don’t allow for-profit use:
The internal state is contained by the omnipathr.env
environment.
Find more examples in the other vignettes and the manual. For
example, the NicheNet vignette presents the integratation between
OmnipathR
and nichenetr
, a method for
prediction of ligand-target gene connections. Another Bioconductor
package wppi
is able to add context specific scores to
networks, based on genes of interest, functional annotations and network
proximity (random walks with restart). The new paths
vignette presents some approaches to construct pathways from networks.
The design of the OmniPath database is described in our recent paper
[1], while an in depth analysis of the pathway resources is available in
the first OmniPath paper [2].
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] OmnipathR_3.15.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] xfun_0.49 bslib_0.8.0 tzdb_0.4.0 vctrs_0.6.5 tools_4.4.2
## [6] generics_0.1.3 parallel_4.4.2 curl_6.0.1 tibble_3.2.1 RSQLite_2.3.9
## [11] blob_1.2.4 pkgconfig_2.0.3 R.oo_1.27.0 checkmate_2.3.2 readxl_1.4.3
## [16] lifecycle_1.0.4 compiler_4.4.2 stringr_1.5.1 progress_1.2.3 htmltools_0.5.8.1
## [21] sys_3.4.3 buildtools_1.0.0 sass_0.4.9 yaml_2.3.10 later_1.4.1
## [26] pillar_1.10.0 crayon_1.5.3 jquerylib_0.1.4 tidyr_1.3.1 R.utils_2.12.3
## [31] cachem_1.1.0 tidyselect_1.2.1 rvest_1.0.4 zip_2.3.1 digest_0.6.37
## [36] stringi_1.8.4 dplyr_1.1.4 purrr_1.0.2 maketools_1.3.1 fastmap_1.2.0
## [41] cli_3.6.3 logger_0.4.0 magrittr_2.0.3 utf8_1.2.4 XML_3.99-0.17
## [46] readr_2.1.5 withr_3.0.2 prettyunits_1.2.0 backports_1.5.0 rappdirs_0.3.3
## [51] bit64_4.5.2 lubridate_1.9.4 timechange_0.3.0 rmarkdown_2.29 httr_1.4.7
## [56] igraph_2.1.2 bit_4.5.0.1 cellranger_1.1.0 R.methodsS3_1.8.2 hms_1.1.3
## [61] memoise_2.0.1 evaluate_1.0.1 knitr_1.49 rlang_1.1.4 Rcpp_1.0.13-1
## [66] glue_1.8.0 DBI_1.2.3 selectr_0.4-2 BiocManager_1.30.25 xml2_1.3.6
## [71] vroom_1.6.5 jsonlite_1.8.9 R6_2.5.1
[1] D Turei, A Valdeolivas, L Gul, N Palacio-Escat, M Klein, O Ivanova, M Olbei, A Gabor, F Theis, D Modos, T Korcsmaros and J Saez-Rodriguez (2021) Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Molecular Systems Biology 17:e9923
[2] D Turei, T Korcsmaros and J Saez-Rodriguez (2016) OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nature Methods 13(12)