To install this package, start R (a version newer than 4.0) and enter:
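A typical Bioconductor installation looks like the following. It assumes the package is distributed on Bioconductor under the name iPath and that the machine has network access.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install iPath from Bioconductor (package name assumed to be "iPath")
BiocManager::install("iPath")
```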
If you have any iPath-related questions, please post them on the GitHub Issues page of iPath at https://github.com/suke18/iPath/issues. Your feedback helps us improve iPath.
Identifying biomarkers to predict the clinical outcomes of individual patients is a fundamental problem in clinical oncology. Multiple single-gene biomarkers have already been identified and used in the clinic. However, multiple oncogenes or tumor-suppressor genes are involved in tumorigenesis. Additionally, the efficacy of single-gene biomarkers is limited by the highly variable expression levels measured by high-throughput assays. In this study, we hypothesize that in individual tumor samples, the disruption of transcription homeostasis in key pathways or gene sets plays an important role in tumorigenesis and has profound implications for the patient’s clinical outcome. We devised a computational method named iPath to identify, at the individual sample level, which pathways or gene sets significantly deviate from their norms. We conducted a pan-cancer analysis and demonstrated that iPath is capable of identifying highly predictive biomarkers for clinical outcomes, including overall survival, tumor subtypes, and tumor stage classifications.
iPath requires a normalized expression matrix with rows representing genes and columns representing samples. To preprocess the expression matrix, iPath filters out genes based on their standard deviations (sd). Here, we use a sample of the TCGA PRAD dataset for illustration. Note that iPath requires a gene set database (GSDB) as a second input, which can be obtained from the MSigDB database.
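The sd-based preprocessing described above can be sketched in base R. This is a minimal illustration on a simulated matrix, not iPath's internal code: the matrix, the gene names, and the cutoff `sd_cutoff` are all assumptions, and the threshold iPath actually applies is not stated here.

```r
set.seed(1)

# Simulated stand-in for a normalized expression matrix:
# 6 genes (rows) x 8 samples (columns)
exprs <- matrix(rnorm(48, mean = 5, sd = 2), nrow = 6,
                dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))
exprs["gene3", ] <- 5                        # constant gene, sd = 0
exprs["gene5", ] <- 5 + rnorm(8, sd = 0.01)  # nearly constant gene

# Per-gene standard deviation across samples
gene_sd <- apply(exprs, 1, sd)

# Hypothetical cutoff; genes at or below it carry little signal and are dropped
sd_cutoff <- 0.1
filtered <- exprs[gene_sd > sd_cutoff, , drop = FALSE]

dim(filtered)
rownames(filtered)
```

Keeping `drop = FALSE` preserves the matrix layout even if only one gene passes the filter.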
The PRAD_data dataset is loaded with three objects: the RPKM expression matrix (prad_exprs), the corresponding phenotype information (prad_inds), and a simulated clinical dataset (prad_cli). prad_inds is a binary vector with 0 representing a normal sample and 1 representing a tumor sample.
The core of iPath is to calculate the iES score for each patient and pathway. The function iES_cal2 requires two inputs: an expression matrix and a gene set database (GSDB). The returned matrix contains the iES values, with rows corresponding to pathways and columns corresponding to samples.
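To make the input and output shapes concrete, here is a small base R sketch. It does not compute the iES statistic: it uses a plain mean z-score per gene set as a stand-in, and the named-list layout of the toy GSDB, the pathway names, and the helper `score_one` are assumptions made for illustration. The format that iES_cal2 expects may differ.

```r
set.seed(2)

# Simulated expression matrix: 10 genes (rows) x 6 samples (columns)
exprs <- matrix(rnorm(60), nrow = 10,
                dimnames = list(paste0("g", 1:10), paste0("s", 1:6)))

# Toy GSDB (assumed layout): a named list of gene identifiers per pathway
gsdb <- list(PathwayA = c("g1", "g2", "g3"),
             PathwayB = c("g4", "g5", "g6", "g7", "not_measured"))

# Stand-in score (NOT the iES statistic): mean z-score of the member genes
z <- t(scale(t(exprs)))                      # standardize each gene across samples
score_one <- function(genes) {
  genes <- intersect(genes, rownames(z))     # ignore genes absent from the matrix
  colMeans(z[genes, , drop = FALSE])
}
score_mat <- t(vapply(gsdb, score_one, numeric(ncol(z))))

dim(score_mat)        # pathways in rows, samples in columns
```

The result has one row per pathway and one column per sample, which is the same orientation as the matrix described above.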
After computing the iES matrix, it is important to investigate whether
the normal-like and perturbed groups differ significantly in terms of
survival outcomes. To classify the tumor samples, we use the normal
samples as a reference and fit a Gaussian mixture model. The
investigation is conducted for each pathway individually.
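The normal-like versus perturbed split described above can be sketched for a single pathway as follows. This is a simplified base R illustration under stated assumptions, not iPath's implementation: the scores are simulated, the reference component is estimated from the normal samples and held fixed, and only the second component is updated by a hand-written EM loop. A 0.5 posterior cutoff is also an assumption.

```r
set.seed(3)

# Simulated scores for ONE pathway (hypothetical values)
normal_scores <- rnorm(30, mean = 0, sd = 1)
tumor_scores  <- c(rnorm(25, mean = 0, sd = 1),   # tumors resembling normals
                   rnorm(15, mean = 5, sd = 1))   # tumors with a shifted score

# Reference component: Gaussian estimated from the normal samples, held fixed
mu0 <- mean(normal_scores)
sd0 <- sd(normal_scores)

# Second component: initialized away from the reference, then updated by EM
mu1 <- max(tumor_scores)
sd1 <- sd(tumor_scores)
p1  <- 0.5                       # mixing weight of the perturbed component

for (iter in 1:200) {
  # E-step: posterior probability that each tumor is in the perturbed component
  d0   <- (1 - p1) * dnorm(tumor_scores, mu0, sd0)
  d1   <- p1 * dnorm(tumor_scores, mu1, sd1)
  post <- d1 / (d0 + d1)

  # M-step: update only the perturbed component and the mixing weight
  p1  <- mean(post)
  mu1 <- sum(post * tumor_scores) / sum(post)
  sd1 <- sqrt(sum(post * (tumor_scores - mu1)^2) / sum(post))
}

# Assign each tumor to the more probable component
group <- ifelse(post > 0.5, "perturbed", "normal-like")
table(group)
```

Repeating this per pathway gives one grouping of the tumor samples for every row of the score matrix.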
The iES_surv function takes the iES matrix from the iES_cal2 step, the
clinical data, and a binary vector indicating each sample's phenotype
(0 for a normal sample, 1 for a tumor sample).
surv_outcomes = iES_surv(iES_mat = iES_mat, cli = prad_cli, indVec = prad_inds)
#> Warning in coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
#> Loglik converged before variable 1 ; coefficient may be infinite.
head(surv_outcomes)
#>             nPerturb   c-index        coef      pval
#> SimPathway1        8 0.5418719  -0.6874565 0.5041147
#> SimPathway2        5 0.5615764 -18.2110687 0.1970832
#> SimPathway3       11 0.4802956  -0.2586764 0.7386278
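For readers who want to see where columns like these can come from, the sketch below fits a Cox model to a simulated cohort with the survival package and collects comparable quantities. It is an illustration only: the data are simulated, the variable names are hypothetical, and using the score (log-rank) test for the p-value is an assumption, since the test behind the pval column is not stated here.

```r
library(survival)

set.seed(4)
n <- 60

# Hypothetical tumor cohort: 0 = normal-like, 1 = perturbed
perturbed <- rep(c(0, 1), times = c(40, 20))

# Simulated follow-up: perturbed tumors are given a higher hazard
time   <- rexp(n, rate = ifelse(perturbed == 1, 0.10, 0.04))
status <- rbinom(n, size = 1, prob = 0.7)    # 1 = event, 0 = censored

# Cox proportional hazards model with group membership as the only covariate
fit <- coxph(Surv(time, status) ~ perturbed)
s   <- summary(fit)

result <- c(nPerturb  = sum(perturbed),
            "c-index" = unname(s$concordance["C"]),
            coef      = unname(coef(fit)),
            pval      = unname(s$sctest["pvalue"]))
result
```

The coxph warning shown in the output above usually appears when one group has very few or no events, so the coefficient estimate drifts toward infinity. That is consistent with the large negative coefficient reported for SimPathway2, where only 5 samples are perturbed, and such rows are best interpreted with caution.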
We also provide two forms of visualization for iES scores. One is the waterfall plot, in which samples are ranked from the smallest to the largest score.
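The idea behind a waterfall plot can be sketched with base graphics. This is not iPath's plotting function: the scores are simulated, and the colors and variable names are arbitrary choices made for illustration.

```r
set.seed(5)

# Simulated scores for one pathway across 20 samples (hypothetical values)
scores <- rnorm(20)
names(scores) <- paste0("s", 1:20)
is_tumor <- rep(c(0, 1), each = 10)      # 0 = normal, 1 = tumor

# Rank samples from the smallest to the largest score
ord     <- order(scores)
ranked  <- scores[ord]
bar_col <- ifelse(is_tumor[ord] == 1, "firebrick", "steelblue")

# One bar per sample, colored by phenotype
barplot(ranked, col = bar_col, border = NA, las = 2,
        ylab = "score", main = "Waterfall plot (simulated)")
legend("topleft", legend = c("normal", "tumor"),
       fill = c("steelblue", "firebrick"), bty = "n")
```

Sorting before plotting is what gives the plot its shape: samples whose scores sit far from the bulk end up at the two ends of the axis.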
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] iPath_1.13.0 survival_3.8-3 BiocParallel_1.41.0
#> [4] mclust_6.1.1 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 xfun_0.49 bslib_0.8.0
#> [4] ggplot2_3.5.1 rstatix_0.7.2 lattice_0.22-6
#> [7] vctrs_0.6.5 tools_4.4.2 generics_0.1.3
#> [10] parallel_4.4.2 tibble_3.2.1 pkgconfig_2.0.3
#> [13] Matrix_1.7-1 data.table_1.16.4 lifecycle_1.0.4
#> [16] farver_2.1.2 compiler_4.4.2 munsell_0.5.1
#> [19] codetools_0.2-20 carData_3.0-5 htmltools_0.5.8.1
#> [22] sys_3.4.3 buildtools_1.0.0 sass_0.4.9
#> [25] yaml_2.3.10 Formula_1.2-5 pillar_1.10.0
#> [28] car_3.1-3 ggpubr_0.6.0 jquerylib_0.1.4
#> [31] tidyr_1.3.1 cachem_1.1.0 survminer_0.5.0
#> [34] abind_1.4-8 km.ci_0.5-6 tidyselect_1.2.1
#> [37] digest_0.6.37 dplyr_1.1.4 purrr_1.0.2
#> [40] labeling_0.4.3 maketools_1.3.1 splines_4.4.2
#> [43] fastmap_1.2.0 grid_4.4.2 colorspace_2.1-1
#> [46] cli_3.6.3 magrittr_2.0.3 broom_1.0.7
#> [49] withr_3.0.2 scales_1.3.0 backports_1.5.0
#> [52] rmarkdown_2.29 matrixStats_1.4.1 gridExtra_2.3
#> [55] ggsignif_0.6.4 zoo_1.8-12 evaluate_1.0.1
#> [58] knitr_1.49 KMsurv_0.1-5 survMisc_0.5.6
#> [61] rlang_1.1.4 Rcpp_1.0.13-1 xtable_1.8-4
#> [64] glue_1.8.0 BiocManager_1.30.25 jsonlite_1.8.9
#> [67] R6_2.5.1