The wppi package depends on the OmnipathR package. Because it relies on
features more recent than the latest Bioconductor release (OmnipathR
2.0.0 in Bioconductor 3.12), we recommend installing OmnipathR from git
until Bioconductor 3.13 is released.
The score_candidate_genes_from_PPI function executes the full wppi
workflow. The only mandatory input is a set of genes of interest. It
returns a table, ordered by score, with the similarity scores of the new
genes found in the neighborhood of the genes of interest. A higher score
indicates a higher functional similarity between the new gene and the
given ones.
library(wppi)
# example gene set
genes_interest <- c(
'ERCC8', 'AKT3', 'NOL3', 'TTK',
'GFI1B', 'CDC25A', 'TPX2', 'SHE'
)
scores <- score_candidate_genes_from_PPI(genes_interest)
## Warning in (function (...) : 'function (...)
## {
## .Deprecated("post_translational")
## post_translational(...)
## }' is deprecated.
## Use 'post_translational' instead.
## See help("Deprecated")
## 'as(<dgCMatrix>, "dgTMatrix")' is deprecated.
## Use 'as(., "TsparseMatrix")' instead.
## See help("Deprecated") and help("Matrix-deprecated").
scores
# # A tibble: 295 x 3
# score gene_symbol uniprot
# <dbl> <chr> <chr>
# 1 0.247 KNL1 Q8NG31
# 2 0.247 HTRA2 O43464
# 3 0.247 KAT6A Q92794
# 4 0.247 BABAM1 Q9NWV8
# 5 0.247 SKI P12755
# 6 0.247 FOXA2 Q9Y261
# 7 0.247 CLK2 P49760
# 8 0.247 HNRNPA1 P09651
# 9 0.247 HK1 P19367
# 10 0.180 SH3RF1 Q7Z6J0
# # … with 285 more rows
The database knowledge is provided by wppi_data. By default, all
directed protein-protein interactions from OmniPath are used. The
network can be customized by passing various options. See the
documentation of the OmnipathR package for more details, especially the
import_post_translational_interactions function. For example, to use
only the literature curated interactions, one can pass the
datasets = 'omnipath' parameter.
The wppi_data function retrieves all database data at once. Parameters
to customize the network can be passed directly to this function.
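A minimal sketch of such a call, using the `datasets` parameter mentioned above. The dataset name `'kinaseextra'` is an assumption based on OmniPath's dataset naming, and the call downloads data, so it needs network access. The sketch inspects the result rather than relying on particular element names; see `?wppi_data` for the authoritative signature.

```r
library(wppi)

# Retrieves all database data in one call (requires network access).
# Assumption: extra arguments such as `datasets` are forwarded to OmnipathR,
# and 'kinaseextra' is a valid OmniPath dataset name.
db <- wppi_data(datasets = c('omnipath', 'kinaseextra'))

# Inspect which tables were retrieved.
names(db)
```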
Optionally, the Human Phenotype Ontology (HPO) annotations relevant to the context of interest can be selected, for example the annotations related to diabetes.
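To illustrate this selection step, the sketch below filters a small, made-up annotation table by keyword. The table `hpo_toy`, its columns `Gene_Symbol` and `Name`, and the gene labels are hypothetical stand-ins; the annotation table provided by the package may use different column names.

```r
# Hypothetical stand-in for an HPO annotation table (not real wppi output).
hpo_toy <- data.frame(
    Gene_Symbol = c('GENE_A', 'GENE_B', 'GENE_C', 'GENE_D'),
    Name = c(
        'Type II diabetes mellitus',
        'Short stature',
        'Maternal diabetes',
        'Type II diabetes mellitus'
    ),
    stringsAsFactors = FALSE
)

# Keep the unique annotation names that mention diabetes (case-insensitive).
HPO_interest <- unique(
    hpo_toy$Name[grepl('diabetes', hpo_toy$Name, ignore.case = TRUE)]
)
HPO_interest
```

A vector of annotation names selected this way can then serve as the context of interest for the rest of the workflow.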
To work further with the interactions, we first convert them to an
igraph graph object.
Then we select a subgraph around the genes of interest. The size of the
subgraph is determined by the range of this neighborhood (the sub_level
argument of the subgraph_op function).
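The two steps above can be sketched on a toy edge list using igraph directly. The helper `neighborhood_of` and the node names are hypothetical. This is not the implementation of `subgraph_op`, only an illustration of how the neighborhood range changes the size of the subgraph.

```r
library(igraph)

# Hypothetical directed interactions (source -> target), not real OmniPath data.
edges <- data.frame(
    source = c('A', 'B', 'C', 'D', 'E'),
    target = c('B', 'C', 'D', 'E', 'F'),
    stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Induced subgraph of all vertices within `sub_level` steps of the given
# genes, ignoring edge direction (an assumption made for this sketch).
neighborhood_of <- function(graph, genes, sub_level = 1L) {
    hoods <- ego(graph, order = sub_level, nodes = genes, mode = 'all')
    keep <- unique(unlist(lapply(hoods, as_ids)))
    induced_subgraph(graph, keep)
}

sub_1 <- neighborhood_of(g, 'A', sub_level = 1L)
sub_2 <- neighborhood_of(g, 'A', sub_level = 2L)
vcount(sub_1)
vcount(sub_2)
```

Increasing the range from one to two steps adds the second-order neighbors, so the subgraph grows with the range.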
The next step is to assign weights to each interaction. The weights are calculated based on the number of common neighbors and the similarities of the annotations of the interacting partners.
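As a rough illustration of the two ingredients named above, the sketch below counts shared neighbors with a matrix product and measures annotation similarity with a Jaccard index on a toy network. The Jaccard index and the simple sum used to combine the two signals are assumptions made for this example, not the formula used by the package.

```r
# Toy undirected network as a symmetric 0/1 adjacency matrix.
nodes <- c('A', 'B', 'C', 'D')
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
adj['A', 'B'] <- adj['B', 'A'] <- 1
adj['A', 'C'] <- adj['C', 'A'] <- 1
adj['B', 'C'] <- adj['C', 'B'] <- 1
adj['C', 'D'] <- adj['D', 'C'] <- 1

# Entry [i, j] of adj %*% adj counts the neighbors shared by i and j.
common <- adj %*% adj

# Hypothetical annotation terms per node.
annot <- list(
    A = c('t1', 't2'),
    B = c('t2', 't3'),
    C = c('t1', 't2', 't3'),
    D = c('t4')
)

# Jaccard similarity of two annotation sets (illustrative choice).
jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))
sim <- sapply(nodes, function(j) {
    sapply(nodes, function(i) jaccard(annot[[i]], annot[[j]]))
})

# Combine both signals and keep weights only where an edge exists.
weights <- adj * (common + sim)
weights
```

Multiplying by `adj` at the end restricts the weights to existing interactions, so node pairs without an edge stay at zero.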
The random walk with restarts algorithm uses the edge weights to score the overall connections between pairs of genes. The result also takes indirect connections into account, integrating the information in the graph topology.
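A generic random walk with restart can be sketched in a few lines of base R. The function `rwr`, its default restart probability, and the convergence threshold are assumptions chosen for illustration and may differ from the package's implementation.

```r
# Random walk with restart on a weighted adjacency matrix `w`.
# Column j of the result is the stationary distribution of a walker
# that restarts at node j with probability `restart` at every step.
rwr <- function(w, restart = 0.4, tol = 1e-10, max_iter = 1000L) {
    cs <- colSums(w)
    cs[cs == 0] <- 1  # avoid division by zero for isolated nodes
    trans <- sweep(w, 2, cs, '/')  # column-normalized transition matrix
    start <- diag(nrow(w))  # one walk started from each node
    p <- start
    for (i in seq_len(max_iter)) {
        p_next <- (1 - restart) * trans %*% p + restart * start
        delta <- max(abs(p_next - p))
        p <- p_next
        if (delta < tol) break
    }
    dimnames(p) <- dimnames(w)
    p
}

# Toy path graph A - B - C - D with unit weights.
nodes <- c('A', 'B', 'C', 'D')
w <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
w['A', 'B'] <- w['B', 'A'] <- 1
w['B', 'C'] <- w['C', 'B'] <- 1
w['C', 'D'] <- w['D', 'C'] <- 1

p <- rwr(w)
round(p[, 'A'], 3)
```

In this toy graph, D receives a non-zero score for a walk started at A even though the two nodes share no edge, which is the sense in which indirect connections contribute.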
Finally, we summarize the scores for each protein by taking the sum of all adjacent connections. The resulting table lists the proteins prioritized by their predicted importance in the context of interest (disease or condition).
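One possible reading of this summary step is sketched below: for every candidate, the pairwise scores towards the genes of interest are summed, and the candidates are sorted by the total. The score matrix, the labels `SEED_*` and `CAND_*`, and the restriction to connections with the genes of interest are assumptions made for the example.

```r
# Hypothetical pairwise scores between two genes of interest (seeds)
# and two candidate genes.
nodes <- c('SEED_1', 'SEED_2', 'CAND_1', 'CAND_2')
pair_scores <- matrix(
    c(0.0, 0.2, 0.3, 0.1,
      0.2, 0.0, 0.4, 0.0,
      0.3, 0.4, 0.0, 0.2,
      0.1, 0.0, 0.2, 0.0),
    nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes)
)
seeds <- c('SEED_1', 'SEED_2')
candidates <- setdiff(nodes, seeds)

# Sum each candidate's scores towards the genes of interest.
totals <- rowSums(pair_scores[candidates, seeds, drop = FALSE])

# Build a table ordered by decreasing score.
ranking <- data.frame(
    score = unname(totals),
    gene_symbol = names(totals),
    stringsAsFactors = FALSE
)
ranking <- ranking[order(ranking$score, decreasing = TRUE), ]
ranking
```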
## # A tibble: 158 × 3
## score gene_symbol uniprot
## <dbl> <chr> <chr>
## 1 0.251 KAT6A Q92794
## 2 0.251 MKRN1 Q9UHC7
## 3 0.251 STXBP4 Q6ZWJ1
## 4 0.251 TBX3 O15119
## 5 0.251 PHF20 Q9BVI0
## 6 0.251 AGO1 Q9UL18
## 7 0.247 AZU1 P20160
## 8 0.247 TACC2 O95359
## 9 0.247 RMI2 Q96E14
## 10 0.247 TUBB1 Q9H4B7
## # ℹ 148 more rows
# # A tibble: 249 x 3
# score gene_symbol uniprot
# <dbl> <chr> <chr>
# 1 0.251 HTRA2 O43464
# 2 0.251 KAT6A Q92794
# 3 0.251 BABAM1 Q9NWV8
# 4 0.251 SKI P12755
# 5 0.251 CLK2 P49760
# 6 0.248 TUBB P07437
# 7 0.248 KNL1 Q8NG31
# 8 0.189 SH3RF1 Q7Z6J0
# 9 0.189 SRPK2 P78362
# 10 0.150 CSNK1D P48730
# # … with 239 more rows
The top genes in the first order neighborhood of the genes of interest can be visualized in the PPI network:
library(igraph)

# Neighbors are the vertices that are not among the genes of interest
idx_neighbors <- which(!V(graph_op_1)$Gene_Symbol %in% genes_interest)
# Genes of interest in blue, scored neighbors in green
cols <- rep("lightsteelblue2", vcount(graph_op_1))
cols[idx_neighbors] <- "#57da83"
# Scale the size of each neighbor by its similarity score
scores.vertex <- rep(1, vcount(graph_op_1))
scores.vertex[idx_neighbors] <- 8 * scores[
    na.omit(match(V(graph_op_1)$Gene_Symbol, scores$gene_symbol)),
]$score
par(mar = c(0.1, 0.1, 0.1, 0.1))
plot(
    graph_op_1,
    vertex.label = ifelse(scores.vertex >= 1, V(graph_op_1)$Gene_Symbol, NA),
    layout = layout_with_fr,
    vertex.color = cols,
    vertex.size = 7 * scores.vertex,
    edge.width = 0.5,
    edge.arrow.mode = 0,
    vertex.label.font = 1,
    vertex.label.cex = 0.45
)
Figure: PPI network visualization of genes of interest (blue nodes) and their first neighbors with similarity scores (green nodes).
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] wppi_1.15.0
##
## loaded via a namespace (and not attached):
## [1] xfun_0.48 bslib_0.8.0 lattice_0.22-6 OmnipathR_3.15.0
## [5] tzdb_0.4.0 bitops_1.0-9 vctrs_0.6.5 tools_4.4.1
## [9] generics_0.1.3 parallel_4.4.1 curl_5.2.3 tibble_3.2.1
## [13] fansi_1.0.6 RSQLite_2.3.7 blob_1.2.4 pkgconfig_2.0.3
## [17] R.oo_1.26.0 Matrix_1.7-1 checkmate_2.3.2 readxl_1.4.3
## [21] lifecycle_1.0.4 compiler_4.4.1 stringr_1.5.1 progress_1.2.3
## [25] htmltools_0.5.8.1 sys_3.4.3 buildtools_1.0.0 sass_0.4.9
## [29] RCurl_1.98-1.16 yaml_2.3.10 later_1.3.2 pillar_1.9.0
## [33] crayon_1.5.3 jquerylib_0.1.4 tidyr_1.3.1 R.utils_2.12.3
## [37] cachem_1.1.0 tidyselect_1.2.1 rvest_1.0.4 zip_2.3.1
## [41] digest_0.6.37 stringi_1.8.4 dplyr_1.1.4 purrr_1.0.2
## [45] maketools_1.3.1 fastmap_1.2.0 grid_4.4.1 cli_3.6.3
## [49] logger_0.4.0 magrittr_2.0.3 XML_3.99-0.17 utf8_1.2.4
## [53] readr_2.1.5 withr_3.0.2 prettyunits_1.2.0 backports_1.5.0
## [57] rappdirs_0.3.3 bit64_4.5.2 lubridate_1.9.3 timechange_0.3.0
## [61] rmarkdown_2.28 httr_1.4.7 igraph_2.1.1 bit_4.5.0
## [65] cellranger_1.1.0 R.methodsS3_1.8.2 hms_1.1.3 memoise_2.0.1
## [69] evaluate_1.0.1 knitr_1.48 rlang_1.1.4 Rcpp_1.0.13
## [73] glue_1.8.0 DBI_1.2.3 selectr_0.4-2 xml2_1.3.6
## [77] vroom_1.6.5 jsonlite_1.8.9 R6_2.5.1