Exploratory, statistical and survival analysis of cancer genomic data can lead to new discoveries, such as the identification of novel genomic prognostic markers, that have the potential to advance our understanding of cancer and ultimately benefit patients. These analyses are often performed on data available from consortia websites such as the cBio Cancer Genomics Portal (cBioPortal), one of the best known and most commonly used curated resources hosting data from large consortium efforts. While cBioPortal provides both a graphical user interface (GUI) and a representational state transfer (REST) API for researchers to explore and analyse clinical and genomics data, its capabilities are limited. To explore specific hypotheses, users often need to perform a more sophisticated ‘off site’ analysis that typically requires some prior programming experience.
To overcome these limitations and provide a GUI that facilitates the visualisation and interrogation of cancer genomics data, particularly cBioPortal-hosted data, using standard biostatistical methodologies, we developed an R Shiny app called GeNomics explOrer using StatistIcal and Survival analysis in R (GNOSIS). GNOSIS was initially developed as part of our study, using the METABRIC data, to investigate whether survival outcomes are associated with genomic instability in luminal breast cancers and was further developed to enable the exploration, analysis and incorporation of a diverse range of genomic features with clinical data in a research or clinical setting.
GNOSIS leverages a number of R packages and provides an intuitive GUI with multiple tab panels supporting a range of functionalities, including data upload and initial exploration, data recoding and subsetting, data visualisations, statistical analysis, mutation analysis and, in particular, survival analysis to identify prognostic markers. GNOSIS also helps researchers carry out reproducible research by providing downloadable input logs (Shiny_Log.txt) from each session.
GNOSIS has been submitted to Bioconductor to aid researchers in carrying out a reproducible, comprehensive statistical and survival analysis using data obtained from cBioPortal, or otherwise.
The GNOSIS GUI has four main elements: (1) A sidebar where each analysis tab can be selected; in this example, the Exploratory Tables tab is selected and displayed. (2) Tab panels within each tab, allowing multiple operations to be carried out and viewed in the one tab. (3) A box sidebar allowing users to select inputs, alter arguments, and customise and export visualisations. (4) A main viewing panel displaying output.
Users can upload their own clinical, copy number alteration (CNA) or mutation data stored on their local machine, or select a cBioPortal study to upload:
A preview of the uploaded/selected data is provided in the GNOSIS viewing panel to ensure that the data has been read in correctly:
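Outside the app, a comparable upload-and-preview step can be sketched in base R. This is a minimal sketch, not GNOSIS's own code: it assumes the cBioPortal convention of `#`-prefixed metadata lines above a tab-delimited table, and the file contents and column names are made up for illustration.

```r
# Write a tiny, made-up clinical file in the cBioPortal style:
# metadata lines start with '#', followed by a tab-delimited table.
clinical_file <- tempfile(fileext = ".txt")
writeLines(c(
  "#Patient Identifier\tAge at Diagnosis\tOverall Survival Status",
  "#STRING\tNUMBER\tSTRING",
  "PATIENT_ID\tAGE_AT_DIAGNOSIS\tOS_STATUS",
  "P-001\t61.2\t0:LIVING",
  "P-002\t48.9\t1:DECEASED",
  "P-003\t70.4\t1:DECEASED"
), clinical_file)

# comment.char = "#" skips the metadata lines so the real header is used
clinical <- read.delim(clinical_file, comment.char = "#",
                       stringsAsFactors = FALSE)

# Preview the data and check that column types were read as expected
head(clinical)
str(clinical)
```

Checking `str()` at this point is a quick way to spot columns that were read as character when they should be numeric.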
If a cBioPortal study does not contain CNA and/or mutation annotation format (MAF) data, GNOSIS produces a warning to alert users.
In addition, users can select specific columns of each dataframe to inspect:
To prepare the data for downstream analysis, a number of things can be done. First, users can change the type of variables to numeric or factor using the box sidebar, which contains a space to select relevant variables:
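For readers who want to see what this type conversion amounts to in plain R, the following is a minimal sketch. The data frame and its column names are hypothetical.

```r
# Hypothetical clinical data: age was read in as text, grade as a number
clinical <- data.frame(
  PATIENT_ID = c("P-001", "P-002", "P-003", "P-004"),
  AGE_AT_DIAGNOSIS = c("61.2", "48.9", "70.4", "55.0"),
  GRADE = c(2, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Continuous variable: convert text to numeric
clinical$AGE_AT_DIAGNOSIS <- as.numeric(clinical$AGE_AT_DIAGNOSIS)

# Categorical variable stored as a number: convert to a factor
clinical$GRADE <- factor(clinical$GRADE)

str(clinical)
```

Treating a coded category such as grade as a factor matters downstream, because tests and models handle numeric and categorical variables differently.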
Subsequently, users can subset the data based on up to three categorical variables and carry out survival variable recoding.
Here we filter the data to only include patients who received chemotherapy:
We also recode the overall and disease-specific survival to 0/1:
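The same filtering and recoding can be sketched in base R. The column names and status labels below are assumptions modelled on typical cBioPortal clinical files; the labels in a given study may differ, so they should be checked before recoding.

```r
# Hypothetical clinical data with cBioPortal-style status labels
clinical <- data.frame(
  PATIENT_ID = paste0("P-", 1:6),
  CHEMOTHERAPY = c("YES", "NO", "YES", "YES", "NO", "YES"),
  OS_STATUS = c("0:LIVING", "1:DECEASED", "1:DECEASED",
                "0:LIVING", "1:DECEASED", "1:DECEASED"),
  VITAL_STATUS = c("Living", "Died of Disease", "Died of Other Causes",
                   "Living", "Died of Disease", "Died of Disease"),
  stringsAsFactors = FALSE
)

# Keep only patients who received chemotherapy
chemo <- subset(clinical, CHEMOTHERAPY == "YES")

# Overall survival: 1 = death from any cause, 0 = censored
chemo$OS <- ifelse(chemo$OS_STATUS == "1:DECEASED", 1, 0)

# Disease-specific survival: 1 = death from the disease, 0 = censored
chemo$DSS <- ifelse(chemo$VITAL_STATUS == "Died of Disease", 1, 0)

chemo[, c("PATIENT_ID", "OS", "DSS")]
```

Note that a patient who died of other causes counts as an event for overall survival but is censored for disease-specific survival.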
In cases where CNA data is uploaded, users may produce and segment CNA metrics for each patient, as well as select and extract specific genes for further analysis:
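As an illustration of what per-patient CNA metrics can look like, here is a minimal base R sketch. The specific metrics (sum of absolute CNA calls, fraction of genes altered) and the median split are assumptions chosen for illustration, not necessarily the definitions GNOSIS uses; the CNA matrix is made up.

```r
# Made-up discrete CNA calls (-2 to 2): rows are genes, columns are patients
cna <- matrix(
  c( 0,  1, -2,
     2,  0, -1,
     0,  0,  0,
     1, -1,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("TP53", "MYC", "PTEN", "ERBB2"),
                  c("P-001", "P-002", "P-003"))
)

# Per-patient metrics (illustrative definitions)
cna_score <- colSums(abs(cna))        # total absolute CNA burden
fraction_altered <- colMeans(cna != 0) # share of genes with any alteration

# Segment patients into two groups at the median score
cna_group <- ifelse(cna_score > median(cna_score), "high", "low")

# Extract selected genes as patient-level columns for further analysis
selected <- as.data.frame(t(cna[c("MYC", "ERBB2"), , drop = FALSE]))

per_patient <- data.frame(
  PATIENT_ID = colnames(cna),
  CNA_SCORE = cna_score,
  FRACTION_ALTERED = fraction_altered,
  CNA_GROUP = cna_group,
  selected,
  row.names = NULL
)
per_patient
```

The resulting table has one row per patient, so it can be merged with clinical data on the patient identifier.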
Users can produce a range of visualisations including boxplots, scatterplots, barplots, histograms and density plots.
Here is an example of a customisable boxplot that can also be downloaded:
The primary function offered by GNOSIS is statistically robust survival analysis. GNOSIS contains several step-wise tabs to provide a complete survival analysis of the data under investigation.
Users can produce Kaplan-Meier (KM) survival curves and the corresponding log-rank tests to identify survival-associated categorical variables, both visually and statistically.
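The underlying analysis can be sketched with the survival package. This example uses the `lung` dataset shipped with survival rather than cBioPortal data, and base graphics rather than the plots GNOSIS produces.

```r
library(survival)

# Example data shipped with the survival package
# (status is coded 1 = censored, 2 = dead, which Surv() understands)
lung2 <- lung
lung2$sex <- factor(lung2$sex, levels = 1:2, labels = c("male", "female"))

# Kaplan-Meier estimate of survival, stratified by sex
km <- survfit(Surv(time, status) ~ sex, data = lung2)

curve_cols <- c("steelblue", "firebrick")
plot(km, col = curve_cols, xlab = "Days", ylab = "Survival probability")
legend("topright", legend = levels(lung2$sex), col = curve_cols, lty = 1)

# Log-rank test for a difference between the two curves
lr <- survdiff(Surv(time, status) ~ sex, data = lung2)
lr

# p-value from the chi-squared statistic (groups - 1 degrees of freedom)
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
p_value
```

The plot gives the visual comparison and the log-rank p-value the statistical one.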
Users can perform a selection of association tests to identify variables that are associated with each other. This enables users to identify potential confounding variables in the analysis.
Statistical association tests available include the Chi-squared test, Fisher’s exact test, simulated Fisher’s exact test, ANOVA, Kruskal-Wallis test, pairwise t-test and Dunn’s test.
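Most of these tests are available in base R. The sketch below runs them on the `lung` example data from the survival package; the choice of variables is purely illustrative. Dunn's test is omitted because it requires an add-on package.

```r
library(survival)

# Example data; drop the sparse ECOG = 3 category and missing values
d <- subset(lung, !is.na(ph.ecog) & ph.ecog < 3)
d$sex <- factor(d$sex, levels = 1:2, labels = c("male", "female"))
d$ecog <- factor(d$ph.ecog)

# Categorical vs categorical
tab <- table(d$sex, d$ecog)
chi <- chisq.test(tab)
fis <- fisher.test(tab)
set.seed(42)  # the simulated p-value depends on random draws
fis_sim <- fisher.test(tab, simulate.p.value = TRUE, B = 5000)

# Continuous vs categorical
anova_fit <- aov(age ~ ecog, data = d)
kw <- kruskal.test(age ~ ecog, data = d)
pw <- pairwise.t.test(d$age, d$ecog, p.adjust.method = "holm")

chi$p.value
fis$p.value
fis_sim$p.value
summary(anova_fit)
kw$p.value
pw$p.value
```

The exact and simulated Fisher tests are useful when expected cell counts are too small for the Chi-squared approximation, and the Kruskal-Wallis test is the rank-based alternative to ANOVA.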
Users can produce both univariate and multivariable Cox models to identify survival-associated variables, and test the assumptions of these models using graphical diagnostics based on the scaled Schoenfeld residuals:
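A minimal sketch of this workflow with the survival package is shown here, again on the `lung` example data rather than cBioPortal data. The covariates are chosen for illustration only.

```r
library(survival)

# Complete cases only, so every model is fitted to the same patients
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
d$sex <- factor(d$sex, levels = 1:2, labels = c("male", "female"))

# Univariate Cox model: one covariate at a time
uni <- coxph(Surv(time, status) ~ sex, data = d)
summary(uni)

# Multivariable Cox model: covariates adjusted for each other
multi <- coxph(Surv(time, status) ~ sex + age + ph.ecog, data = d)
summary(multi)

# Proportional hazards check based on scaled Schoenfeld residuals:
# one test per covariate plus a global test
ph <- cox.zph(multi)
ph

# Residuals against time; a roughly flat trend supports the PH assumption
plot(ph)
```

A small p-value for a covariate in the `cox.zph()` table suggests that its effect changes over time, which would violate the assumption.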
The corresponding adjusted survival curves, that is, survival curves adjusted for the covariates in the multivariable Cox model, can also be produced and customised:
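One common way to obtain adjusted curves is direct (marginal) adjustment: predict a survival curve for every patient as if they belonged to a given group, then average those curves. The sketch below implements this idea with the survival package on the `lung` example data. It is an assumption that this matches the adjustment method used in GNOSIS, and the helper `adjusted_curve()` is hypothetical.

```r
library(survival)

d <- na.omit(lung[, c("time", "status", "age", "sex")])
d$sex <- factor(d$sex, levels = 1:2, labels = c("male", "female"))

fit <- coxph(Surv(time, status) ~ sex + age, data = d)

# Hypothetical helper: average predicted curve if everyone had this sex
adjusted_curve <- function(level) {
  nd <- d
  nd$sex <- factor(level, levels = levels(d$sex))
  sf <- survfit(fit, newdata = nd)  # one predicted curve per patient
  rowMeans(sf$surv)                 # average over the observed ages
}

# One column of adjusted survival probabilities per group
adjusted <- sapply(levels(d$sex), adjusted_curve)
event_times <- survfit(fit)$time

matplot(event_times, adjusted, type = "s", lty = 1,
        col = c("steelblue", "firebrick"),
        xlab = "Days", ylab = "Adjusted survival probability")
legend("topright", legend = colnames(adjusted),
       col = c("steelblue", "firebrick"), lty = 1)
```

Because both curves are averaged over the same age distribution, the difference between them reflects the modelled effect of sex rather than differences in age between the groups.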
Where the proportional hazards (PH) assumption of the multivariable Cox model is violated, users can apply recursive partitioning survival trees:
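A survival tree can be sketched with the rpart package, which accepts a `Surv` response directly. rpart is used here as one readily available implementation; the variables, the `minbucket` setting and the example data are illustrative assumptions.

```r
library(survival)
library(rpart)

d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog", "wt.loss")])

# With a Surv() response, rpart fits an exponential-scaling survival tree
tree <- rpart(Surv(time, status) ~ age + sex + ph.ecog + wt.loss,
              data = d,
              control = rpart.control(minbucket = 20))
print(tree)

# Plot only if at least one split was found (a lone root cannot be drawn)
if (nrow(tree$frame) > 1) {
  plot(tree, uniform = TRUE, margin = 0.1)
  text(tree, use.n = TRUE)
}

# Terminal node of each patient, usable as a grouping variable
d$node <- factor(tree$where)
table(d$node)
```

The terminal nodes define patient subgroups with differing risk, which can then be compared with KM curves without relying on the PH assumption.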
An additional function of GNOSIS is the ability to summarise, analyse and visualise mutation annotation format (MAF) files using maftools.
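For orientation, a few core maftools calls are sketched below using the example TCGA LAML MAF file shipped with maftools. This illustrates the package in general and is not a record of the exact calls GNOSIS makes.

```r
library(maftools)

# Example MAF file distributed with the maftools package
laml_maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml_maf)

# Tabular summaries: mutations per sample and per gene
getSampleSummary(laml)
getGeneSummary(laml)

# Overview plot of variant classes, types and the most mutated genes
plotmafSummary(maf = laml)

# Oncoplot of the ten most frequently mutated genes
oncoplot(maf = laml, top = 10)
```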
GNOSIS facilitates reproducible research by allowing users to download an input log containing information on all the inputs selected throughout the session:
For details on the implementation, layout and application of GNOSIS see the corresponding publication. Demonstration videos providing a walkthrough of GNOSIS are also provided on Zenodo.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] GNOSIS_1.5.0 maftools_2.23.0 operator.tools_1.6.3
## [4] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
## [7] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
## [10] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
## [13] tidyverse_2.0.0 shinymeta_0.2.0.3 shinyWidgets_0.8.7
## [16] dashboardthemes_1.1.6 shinydashboardPlus_2.0.5 shinydashboard_0.7.2
## [19] shiny_1.9.1 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] matrixStats_1.4.1 bitops_1.0-9
## [3] fontawesome_0.5.3 httr_1.4.7
## [5] RColorBrewer_1.1-3 GenomicDataCommons_1.31.0
## [7] tools_4.4.2 backports_1.5.0
## [9] utf8_1.2.4 R6_2.5.1
## [11] DT_0.33 jomo_2.7-6
## [13] withr_3.0.2 gridExtra_2.3
## [15] TCGAutils_1.27.0 cli_3.6.3
## [17] Biobase_2.67.0 textshaping_0.4.0
## [19] formatR_1.14 shinyjs_2.1.0
## [21] officer_0.6.7 sass_0.4.9
## [23] BWStest_0.2.3 survMisc_0.5.6
## [25] mvtnorm_1.3-2 proxy_0.4-27
## [27] askpass_1.2.1 rapiclient_0.1.8
## [29] Rsamtools_2.23.0 systemfonts_1.1.0
## [31] svglite_2.1.3 R.utils_2.12.3
## [33] sourcetools_0.1.7-1 styler_1.10.3
## [35] readxl_1.4.3 rstudioapi_0.17.1
## [37] RSQLite_2.3.8 generics_0.1.3
## [39] shape_1.4.6.1 BiocIO_1.17.0
## [41] car_3.1-3 zip_2.3.1
## [43] Matrix_1.7-1 futile.logger_1.4.3
## [45] fansi_1.0.6 DescTools_0.99.58
## [47] S4Vectors_0.45.2 abind_1.4-8
## [49] R.methodsS3_1.8.2 lifecycle_1.0.4
## [51] yaml_2.3.10 inum_1.0-5
## [53] carData_3.0-5 SummarizedExperiment_1.37.0
## [55] RaggedExperiment_1.31.0 SparseArray_1.7.2
## [57] BiocFileCache_2.15.0 grid_4.4.2
## [59] blob_1.2.4 promises_1.3.0
## [61] crayon_1.5.3 mitml_0.4-5
## [63] miniUI_0.1.1.1 lattice_0.22-6
## [65] haven_2.5.4 AnVIL_1.19.3
## [67] GenomicFeatures_1.59.1 KEGGREST_1.47.0
## [69] sys_3.4.3 maketools_1.3.1
## [71] pillar_1.9.0 knitr_1.49
## [73] GenomicRanges_1.59.0 rjson_0.2.23
## [75] boot_1.3-31 gld_2.6.6
## [77] kSamples_1.2-10 codetools_0.2-20
## [79] cBioPortalData_2.19.0 compareGroups_4.9.1
## [81] pan_1.9 glue_1.8.0
## [83] fontLiberation_0.1.0 data.table_1.16.2
## [85] MultiAssayExperiment_1.33.0 vctrs_0.6.5
## [87] png_0.1-8 cellranger_1.1.0
## [89] gtable_0.3.6 cachem_1.1.0
## [91] xfun_0.49 S4Arrays_1.7.1
## [93] mime_0.12 libcoin_1.0-10
## [95] survival_3.7-0 RTCGAToolbox_2.37.0
## [97] iterators_1.0.14 KMsurv_0.1-5
## [99] gmp_0.7-5 nlme_3.1-166
## [101] RcppCCTZ_0.2.12 fontquiver_0.2.1
## [103] bit64_4.5.2 filelock_1.0.3
## [105] GenomeInfoDb_1.43.1 R.cache_0.16.0
## [107] bslib_0.8.0 rpart_4.1.23
## [109] colorspace_2.1-1 BiocGenerics_0.53.3
## [111] DBI_1.2.3 nnet_7.3-19
## [113] DNAcopy_1.81.0 Exact_3.3
## [115] tidyselect_1.2.1 bit_4.5.0
## [117] compiler_4.4.2 curl_6.0.1
## [119] chron_2.3-61 glmnet_4.1-8
## [121] rvest_1.0.4 httr2_1.0.6
## [123] AnVILBase_1.1.0 HardyWeinberg_1.7.8
## [125] flextable_0.9.7 mice_3.16.0
## [127] expm_1.0-0 xml2_1.3.6
## [129] nanotime_0.3.10 fontBitstreamVera_0.1.1
## [131] DelayedArray_0.33.2 rtracklayer_1.67.0
## [133] scales_1.3.0 multcompView_0.1-10
## [135] rappdirs_0.3.3 digest_0.6.37
## [137] minqa_1.2.8 rmarkdown_2.29
## [139] XVector_0.47.0 htmltools_0.5.8.1
## [141] pkgconfig_2.0.3 lme4_1.1-35.5
## [143] MatrixGenerics_1.19.0 dbplyr_2.5.0
## [145] fastmap_1.2.0 rlang_1.1.4
## [147] htmlwidgets_1.6.4 UCSC.utils_1.3.0
## [149] SuppDists_1.1-9.8 jquerylib_0.1.4
## [151] zoo_1.8-12 jsonlite_1.8.9
## [153] BiocParallel_1.41.0 R.oo_1.27.0
## [155] RCurl_1.98-1.16 magrittr_2.0.3
## [157] kableExtra_1.4.0 Formula_1.2-5
## [159] GenomeInfoDbData_1.2.13 munsell_0.5.1
## [161] Rcpp_1.0.13-1 shinycssloaders_1.1.0
## [163] gdtools_0.4.1 partykit_1.2-22
## [165] stringi_1.8.4 rootSolve_1.8.2.4
## [167] RJSONIO_1.3-1.9 zlibbioc_1.52.0
## [169] MASS_7.3-61 plyr_1.8.9
## [171] parallel_4.4.2 shinylogs_0.2.1
## [173] lmom_3.2 survminer_0.5.0
## [175] PMCMRplus_1.9.12 Biostrings_2.75.1
## [177] splines_4.4.2 hms_1.1.3
## [179] anytime_0.3.9 ggpubr_0.6.0
## [181] uuid_1.2-1 ggsignif_0.6.4
## [183] buildtools_1.0.0 reshape2_1.4.4
## [185] stats4_4.4.2 futile.options_1.0.1
## [187] XML_3.99-0.17 evaluate_1.0.1
## [189] lambda.r_1.2.4 BiocManager_1.30.25
## [191] fabricatr_1.0.2 nloptr_2.1.1
## [193] tzdb_0.4.0 foreach_1.5.2
## [195] httpuv_1.6.15 openssl_2.2.2
## [197] km.ci_0.5-6 BiocBaseUtils_1.9.0
## [199] broom_1.0.7 xtable_1.8-4
## [201] Rmpfr_0.9-5 restfulr_0.0.15
## [203] e1071_1.7-16 rstatix_0.7.2
## [205] Rsolnp_1.16 later_1.3.2
## [207] viridisLite_0.4.2 class_7.3-22
## [209] ragg_1.3.3 truncnorm_1.0-9
## [211] memoise_2.0.1 AnnotationDbi_1.69.0
## [213] GenomicAlignments_1.43.0 IRanges_2.41.1
## [215] writexl_1.5.1 timechange_0.3.0