MOSClip vignette

Abstract

In the last years we have witnessed a dramatic change in the clinical treatment of patients thanks to molecular and personalized medicine. Many medical institutes are starting to adopt routine genome-wide screenings to complement and help diagnosis and treatment choices. As the number of datasets grows, we need to adapt and improve the methods to cope with the complexity, amount and multi-level structure of available information. Integrating these type of data remains challenging due to their dimensionality and diverse features. Moreover, focusing on dysregulated biological processes rather than individual genes can offer deeper insights into complex diseases like cancer. MOSClip is a multi-omic statistical approach based on pathway topology that can deal with this complexity. It integrates multiple omics - such as expression, methylation, mutation, or copy number variation - using various dimensionality reduction strategies. The analysis can be performed at the level of entire pathways or pathway modules, allowing for a more detailed examination of the dysregulated mechanisms within large biological processes. MOSClip pathway analysis serves two primary purposes: testing the survival association of pathways or modules using the Cox proportional hazard model, and conducting a two-class analysis with a generalized linear model. Additionally, the package offers valuable graphical tools to visualize and interpret the results.

Introduction

In recent years many efforts have been dedicated on multi-omic data integration, with several tools available for pathway analysis in a multi-omic framework; however, their purpose is mainly limited to two-class comparison. MOSClip (Multi-Omic Survival Clip) was originally developed for survival pathway analysis, combining both multi-omic data integration and graphical model theory to keep track of gene interactions among a pathway (Martini et al. 2019). With this purpose, MOSClip allows to test survival association of pathways or their connected components, that we called modules, in a multi-omic framework. A second test has been implemented to perform also two-class comparison, to investigate pathways or modules association with a specific condition. Multi-omics gene measurements are tested as covariates of a Cox proportional hazard model or a generalized linear model, after dimensionality reduction of data. MOSClip is highly flexible thanks to its modular structure, allowing the use of one or multiple different omics, as well as different data reduction strategies and tests. In brief, MOSClip comprises four main components: pathway topology, multi-omic data and survival or two-class analysis. The final goal is to find biological processes impacting patient’s survival or patient’s association with a specific condition. Furthermore, several graphical tools have been implemented in MOSClip to browse, manage and provide help in the interpretation of the results.

In this vignette, we will show an example of survival analysis on four omics: transcriptome, methylome, genomic mutations and genomic copy number variations, testing if these omics can be synergically involved in pathways with survival or two-class prognostication power.

Installation

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MOSClip")

How to use MOSClip for survival analysis

We start by loading example data provided by MOSClip package to run our analysis. reactSmall object is a list of three interesting Reactome pathways in graphite::Pathway format that we will use in our analysis. More information can be found on the object documentation (?reactSmall). multiOmics object is a multi-omic dataset containing matrices, genes per patients, of methylation status, somatic mutations, CNVs, and transcripts expression levels, for 50 ovarian cancer patients from TCGA. Genes and patients were manually selected to generate this small dataset intended to illustrate the functionality of the package, allowing users to explore its methods and tools in a simplified context. Information on how the dataset was generated can be found in package documentation with ?multiOmics. Moreover, additional information on how to pre-process datasets for a MOSClip analysis are available in this GitHub tutorial or in the Materials and Methods section of the paper (Martini et al. 2019).

library(MOSClip)
library(MultiAssayExperiment)
library(kableExtra)

data("reactSmall")
data("multiOmics")

multiOmics is an Omics class object. This MOSClip specific object derives from a MultiAssayExperiment object. It contains standard slots ExperimentList, sampleMap, colData, and specific slots for MOSClip analysis: modelInfo, where the user should specify the dimensionality reduction method to process each omic dataset, and specificArgs, where additional parameters can be set for each method, as described in the help of each reduction function. An Omics class object can be generated with the function makeOmics starting from data matrices and additional annotations.

Available methods for dimensionality reduction can be conveniently visualized with availableOmicMethods function.

availableOmicMethods()
#> [1] "summarizeToBinaryEvents"             
#> [2] "summarizeToNumberOfEvents"           
#> [3] "summarizeWithPca"                    
#> [4] "summarizeInCluster"                  
#> [5] "summarizeToBinaryDirectionalEvents"  
#> [6] "summarizeToNumberOfDirectionalEvents"

In this vignette, we chose to use PCA for expression data, cluster analysis for methylation data, vote counting for mutations and CNVs (for detail see MOSClip paper (Martini et al. 2019)). This data transformations are easily applied calling MOSClip functions, thus, here we need only to provide the name of the needed function. Given the nature of the methylation data, we expect to have more than a CpG cluster associated to a gene. For this reason, MOSClip provides the possibility to include a dictionary to associate the methylation level of multiple CpG clusters to a single gene. Thus, in the methylation specific arguments you need to provide the dictionary to convert cluster names into genes.

Module analysis

We are now ready to perform the survival analysis on modules using the function multiOmicsSurvivalModuleTest. Required inputs are a multi-omic dataset and a pathway in Pathway or graphNEL format. Alternatively, it is possible to run the analysis also on gene-sets; in this case the topological information is lost, thus, only the function for pathway test must be used. In this example, we choose to test only genes that have at least expression data, as specified with useTheseGenes. This a priori filter, however, is not mandatory. Moreover, it could be useful to set a seed in order to have reproducible results.

genesToConsider <- row.names(experiments(multiOmics)$exp)

moduleSurv <- lapply(reactSmall, function(g) {
  set.seed(1234)
  fcl = multiOmicsSurvivalModuleTest(multiOmics, g, 
                                     useTheseGenes = genesToConsider)
  fcl
 })

Once the analysis is complete, we can plot the tabular summary of the top 10 modules selected by p-value of the Cox proportional hazard model using the function multiPathwayModuleReport.

moduleSummary <- multiPathwayModuleReport(moduleSurv)
module pvalue cnvNEG cnvPOS expPC1 expPC2 expPC3 met2k2 met3k2 met3k3 mut
Activation of Matrix Metalloproteinases.4 4 0.0008290 NA 0.0026686 0.0509299 0.0002584 0.5032458 0.0037543 NA NA 0.7128964
FGFR1 mutant receptor activation.3 3 0.0027138 NA 0.1441293 0.2163497 NA NA 0.0010768 NA NA 0.1113289
FGFR1 mutant receptor activation.2 2 0.0044104 NA 0.7453814 0.2635201 NA NA NA 0.0051868 0.2507027 0.9970709
Activation of Matrix Metalloproteinases.6 6 0.0232918 NA 0.0083832 0.0500642 0.0008553 0.5538676 0.1345511 NA NA 0.5097630
FGFR1 mutant receptor activation.1 1 0.0248414 0.3413743 0.8067974 0.6359349 0.4539220 NA NA 0.4315512 0.0028007 0.0874373
Activation of Matrix Metalloproteinases.5 5 0.0513868 NA 0.0239773 0.0623145 0.0028872 0.6663908 0.4852597 NA NA 0.7326374
Activation of Matrix Metalloproteinases.7 7 0.0652934 0.8248342 0.0360290 0.9254547 0.0017038 0.1847143 0.3620709 NA NA NA
Activation of Matrix Metalloproteinases.9 9 0.0727559 NA 0.0783999 0.0225378 0.0094491 0.6212418 NA 0.7186513 0.4393132 0.2449149
VEGFA-VEGFR2 Pathway.1 1 0.0825774 0.4602803 0.2725537 0.0850171 0.3649652 0.0081958 0.2580115 NA NA 0.0478975
VEGFA-VEGFR2 Pathway.8 8 0.1015215 NA 0.0354221 0.1335834 0.1266076 NA 0.1198248 NA NA 0.6545505

Graphical exploration of MOSClip module results

MOSClip has plenty of functions to visually explore the results. In the following paragraphs we will show some examples.

We can visualize the table of module test results for a specific pathway, e.g., Activation of Matrix Metalloproteinases.

plotModuleReport(moduleSurv[["Activation of Matrix Metalloproteinases"]], 
                 MOcolors = c(exp = "red", met = "green", 
                              cnv = "yellow", mut = "blue"))

As you can see, among others, the module number 4 of this pathway is significant, and, in particular, covariates expPC2, met2k2 and cnvPOS are driving this significance.

We can have a look at the pathway graph and the module position in the pathway using plotModuleInGraph.

plotModuleInGraph(moduleSurv[["Activation of Matrix Metalloproteinases"]], 
                  pathList = reactSmall, 
                  moduleNumber = 4)

Then, we can check at the differences in survival using Kaplan-Meier curves dividing patients in groups with different omics patterns. Here we consider only covariates expPC2 and met2k.

plotModuleKM(moduleSurv[["Activation of Matrix Metalloproteinases"]], 
             moduleNumber = 4, 
             formula = "Surv(days, status) ~ expPC2 + met2k", 
             paletteNames = "Paired", risk.table = TRUE, inYears = TRUE)

We can explore the most important genes of the pathway and their original profiles across the omics using an heatmap plot of the original values. The most important genes are selected as described in Martini et al. (Martini et al. 2019).

additionalA <- colData(multiOmics)
additionalA$status[additionalA$status == 1] <- "event"
additionalA$status[additionalA$status == 0] <- "no_event"
additionalA$PFS <- as.factor(additionalA$status)
additionalA$status <- NULL
additionalA$years <- round(additionalA$days/365.24, 0)
additionalA$days <- NULL

plotModuleHeat(moduleSurv[["Activation of Matrix Metalloproteinases"]], 
               moduleNumber = 4, 
               paletteNames = c(exp = "red", met = "green", 
                                cnv = "yellow", mut = "blue"),
               additionalAnnotations = additionalA, 
               additionalPaletteNames = list(PFS = "violet", years = "teal"), 
               sortBy = c("expPC2", "met2k", "PFS", "years"), 
               withSampleNames = FALSE)

In second instance, we can ask if two or more omics are significant in the same module simultaneously and if this omic interaction is more frequent than those expected by chance. To perform this test we use the runSupertest function. A circle plot is returned with the frequency of all significant omic combinations and their significance levels.

runSupertest(moduleSummary, pvalueThr = 0.05, zscoreThr = 0.05, 
             excludeColumns = c("pathway", "module"))

Pathway analysis

In pathway test the pathway topology (the in and out connections of the pathway genes) can be exploited to guide the data reduction step. To do that, we suggest to use the topological PCA instead of the sparse PCA changing the settings in the omics object.

Then, we are ready to run the analysis using the function multiOmicsSurvivalPathwayTest.

data("multiOmicsTopo")

pathwaySurv <- lapply(reactSmall, function(g) {
  set.seed(1234)
  fcl = multiOmicsSurvivalPathwayTest(multiOmicsTopo, g, 
                                      useTheseGenes = genesToConsider)
  })

Graphical exploration of MOSClip pathway results

We can plot a report of the first 10 pathways, sorted by pvalue of the Cox proportional hazard model.

plotMultiPathwayReport(pathwaySurv, 
                       MOcolors = c(exp = "red", mut = "blue", 
                                    cnv = "yellow", met = "green"))

Finally, we can look at the predictive genes using a heatmap with patient additional annotations.

plotPathwayHeat(pathwaySurv[["Activation of Matrix Metalloproteinases"]], 
                sortBy = c("expPC1", "cnvPOS", "PFS"), 
                paletteNames = c(exp = "red", met = "green",
                                 mut = "blue", cnv = "yellow"), 
                additionalAnnotations = additionalA, 
                additionalPaletteNames = list(PFS = "violet", years = "teal"), 
                withSampleNames = FALSE)

Then, we can also check for differences in survival using Kaplan-Meier curves dividing patients in groups with different omics patterns (e.g. patients with different methylation pattern and high and low levels of PC2 in expression).

plotPathwayKM(pathwaySurv[["Activation of Matrix Metalloproteinases"]], 
              formula = "Surv(days, status) ~ expPC1 + cnvPOS", 
              paletteNames = "Paired")

Additional functionalities

MOSClip gives the possibility to prioritize most important and stable pathway or module results, running a resampling procedure that can be found on the extended tutorial on GitHub.

More tutorials are also available on how to perform a two-class analysis with MOSClip, as well as examples of more plots that were not shown in this vignette.

Session Information

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] kableExtra_1.4.0            MultiAssayExperiment_1.33.1
#>  [3] SummarizedExperiment_1.37.0 Biobase_2.67.0             
#>  [5] GenomicRanges_1.59.1        GenomeInfoDb_1.43.2        
#>  [7] IRanges_2.41.2              S4Vectors_0.45.2           
#>  [9] BiocGenerics_0.53.3         generics_0.1.3             
#> [11] MatrixGenerics_1.19.0       matrixStats_1.4.1          
#> [13] MOSClip_1.1.0               BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] splines_4.4.2            BiocIO_1.17.1            bitops_1.0-9            
#>   [4] ggplotify_0.1.2          tibble_3.2.1             qpgraph_2.41.0          
#>   [7] graph_1.85.0             XML_3.99-0.17            lifecycle_1.0.4         
#>  [10] rstatix_0.7.2            doParallel_1.0.17        lattice_0.22-6          
#>  [13] MASS_7.3-61              flashClust_1.01-2        SuperExactTest_1.1.0    
#>  [16] NbClust_3.0.1            backports_1.5.0          magrittr_2.0.3          
#>  [19] sass_0.4.9               rmarkdown_2.29           jquerylib_0.1.4         
#>  [22] yaml_2.3.10              DBI_1.2.3                buildtools_1.0.0        
#>  [25] RColorBrewer_1.1-3       abind_1.4-8              zlibbioc_1.52.0         
#>  [28] purrr_1.0.2              elasticnet_1.3           RCurl_1.98-1.16         
#>  [31] yulab.utils_0.1.8        rappdirs_0.3.3           circlize_0.4.16         
#>  [34] GenomeInfoDbData_1.2.13  KMsurv_0.1-5             ggrepel_0.9.6           
#>  [37] gRbase_2.0.3             maketools_1.3.1          pheatmap_1.0.12         
#>  [40] annotate_1.85.0          commonmark_1.9.2         svglite_2.1.3           
#>  [43] codetools_0.2-20         DelayedArray_0.33.3      ggtext_0.1.2            
#>  [46] xml2_1.3.6               DT_0.33                  tidyselect_1.2.1        
#>  [49] shape_1.4.6.1            farver_2.1.2             UCSC.utils_1.3.0        
#>  [52] GenomicAlignments_1.43.0 jsonlite_1.8.9           GetoptLong_1.0.5        
#>  [55] Formula_1.2-5            survival_3.8-3           iterators_1.0.14        
#>  [58] emmeans_1.10.6           coxrobust_1.0.1          systemfonts_1.1.0       
#>  [61] foreach_1.5.2            tools_4.4.2              Rcpp_1.0.13-1           
#>  [64] glue_1.8.0               gridExtra_2.3            SparseArray_1.7.2       
#>  [67] xfun_0.49                dplyr_1.1.4              withr_3.0.2             
#>  [70] BiocManager_1.30.25      fastmap_1.2.0            digest_0.6.37           
#>  [73] R6_2.5.1                 gridGraphics_0.5-1       estimability_1.5.1      
#>  [76] colorspace_2.1-1         markdown_1.13            RSQLite_2.3.9           
#>  [79] tidyr_1.3.1              data.table_1.16.4        corpcor_1.6.10          
#>  [82] rtracklayer_1.67.0       httr_1.4.7               htmlwidgets_1.6.4       
#>  [85] S4Arrays_1.7.1           scatterplot3d_0.3-44     graphite_1.53.0         
#>  [88] pkgconfig_2.0.3          gtable_0.3.6             blob_1.2.4              
#>  [91] ComplexHeatmap_2.23.0    XVector_0.47.0           sys_3.4.3               
#>  [94] survMisc_0.5.6           htmltools_0.5.8.1        carData_3.0-5           
#>  [97] multcompView_0.1-10      clue_0.3-66              scales_1.3.0            
#> [100] leaps_3.2                png_0.1-8                knitr_1.49              
#> [103] km.ci_0.5-6              rstudioapi_0.17.1        rjson_0.2.23            
#> [106] checkmate_2.3.2          curl_6.0.1               org.Hs.eg.db_3.20.0     
#> [109] cachem_1.1.0             zoo_1.8-12               GlobalOptions_0.1.2     
#> [112] stringr_1.5.1            parallel_4.4.2           AnnotationDbi_1.69.0    
#> [115] restfulr_0.0.15          pillar_1.10.0            grid_4.4.2              
#> [118] reshape_0.8.9            vctrs_0.6.5              maxstat_0.7-25          
#> [121] ggpubr_0.6.0             car_3.1-3                xtable_1.8-4            
#> [124] cluster_2.1.8            Rgraphviz_2.51.0         evaluate_1.0.1          
#> [127] GenomicFeatures_1.59.1   mvtnorm_1.3-2            cli_3.6.3               
#> [130] compiler_4.4.2           Rsamtools_2.23.1         rlang_1.1.4             
#> [133] crayon_1.5.3             ggsignif_0.6.4           labeling_0.4.3          
#> [136] survminer_0.5.0          plyr_1.8.9               fs_1.6.5                
#> [139] stringi_1.8.4            viridisLite_0.4.2        BiocParallel_1.41.0     
#> [142] munsell_0.5.1            Biostrings_2.75.3        qtl_1.70                
#> [145] Matrix_1.7-1             lars_1.3                 bit64_4.5.2             
#> [148] ggplot2_3.5.1            KEGGREST_1.47.0          FactoMineR_2.11         
#> [151] gridtext_0.1.5           exactRankTests_0.8-35    igraph_2.1.2            
#> [154] broom_1.0.7              memoise_2.0.1            bslib_0.8.0             
#> [157] bit_4.5.0.1

References

Martini, Paolo, Monica Chiogna, Enrica Calura, and Chiara Romualdi. 2019. MOSClip: Multi-Omic and Survival Pathway Analysis for the Identification of Survival Associated Gene and Modules.” Nucleic Acids Research 47 (14): e80. https://doi.org/10.1093/nar/gkz324.