In the last years we have witnessed a dramatic change in the clinical
treatment of patients thanks to molecular and personalized medicine.
Many medical institutes are starting to adopt routine genome-wide
screenings to complement and help diagnosis and treatment choices. As
the number of datasets grows, we need to adapt and improve the methods
to cope with the complexity, amount and multi-level structure of
available information. Integrating these type of data remains
challenging due to their dimensionality and diverse features. Moreover,
focusing on dysregulated biological processes rather than individual
genes can offer deeper insights into complex diseases like cancer.
MOSClip
is a multi-omic statistical approach based on
pathway topology that can deal with this complexity. It integrates
multiple omics - such as expression, methylation, mutation, or copy
number variation - using various dimensionality reduction strategies.
The analysis can be performed at the level of entire pathways or pathway
modules, allowing for a more detailed examination of the dysregulated
mechanisms within large biological processes. MOSClip
pathway analysis serves two primary purposes: testing the survival
association of pathways or modules using the Cox proportional hazard
model, and conducting a two-class analysis with a generalized linear
model. Additionally, the package offers valuable graphical tools to
visualize and interpret the results.
In recent years many efforts have been dedicated on multi-omic data
integration, with several tools available for pathway analysis in a
multi-omic framework; however, their purpose is mainly limited to
two-class comparison. MOSClip
(Multi-Omic Survival Clip)
was originally developed for survival pathway analysis, combining both
multi-omic data integration and graphical model theory to keep track of
gene interactions among a pathway (Martini et al.
2019). With this purpose, MOSClip
allows to test
survival association of pathways or their connected components, that we
called modules, in a multi-omic framework. A second test has been
implemented to perform also two-class comparison, to investigate
pathways or modules association with a specific condition. Multi-omics
gene measurements are tested as covariates of a Cox proportional hazard
model or a generalized linear model, after dimensionality reduction of
data. MOSClip
is highly flexible thanks to its modular
structure, allowing the use of one or multiple different omics, as well
as different data reduction strategies and tests. In brief,
MOSClip
comprises four main components: pathway topology,
multi-omic data and survival or two-class analysis. The final goal is to
find biological processes impacting patient’s survival or patient’s
association with a specific condition. Furthermore, several graphical
tools have been implemented in MOSClip
to browse, manage
and provide help in the interpretation of the results.
In this vignette, we will show an example of survival analysis on four omics: transcriptome, methylome, genomic mutations and genomic copy number variations, testing if these omics can be synergically involved in pathways with survival or two-class prognostication power.
We start by loading example data provided by MOSClip
package to run our analysis. reactSmall
object is a list of
three interesting Reactome pathways in graphite::Pathway
format that we will use in our analysis. More information can be found
on the object documentation (?reactSmall
).
multiOmics
object is a multi-omic dataset containing
matrices, genes per patients, of methylation status, somatic mutations,
CNVs, and transcripts expression levels, for 50 ovarian cancer patients
from TCGA. Genes and patients were manually selected to generate this
small dataset intended to illustrate the functionality of the package,
allowing users to explore its methods and tools in a simplified context.
Information on how the dataset was generated can be found in package
documentation with ?multiOmics
. Moreover, additional
information on how to pre-process datasets for a MOSClip
analysis are available in this GitHub tutorial
or in the Materials and Methods section of the paper (Martini et al. 2019).
library(MOSClip)
#> Warning: multiple methods tables found for 'union'
#> Warning: multiple methods tables found for 'intersect'
#> Warning: multiple methods tables found for 'setdiff'
library(MultiAssayExperiment)
library(kableExtra)
data("reactSmall")
data("multiOmics")
multiOmics
is an Omics
class object. This
MOSClip
specific object derives from a
MultiAssayExperiment
object. It contains standard slots
ExperimentList
, sampleMap
,
colData
, and specific slots for MOSClip
analysis: modelInfo
, where the user should specify the
dimensionality reduction method to process each omic dataset, and
specificArgs
, where additional parameters can be set for
each method, as described in the help of each reduction function. An
Omics
class object can be generated with the function
makeOmics
starting from data matrices and additional
annotations.
Available methods for dimensionality reduction can be conveniently
visualized with availableOmicMethods
function.
availableOmicMethods()
#> [1] "summarizeToBinaryEvents"
#> [2] "summarizeToNumberOfEvents"
#> [3] "summarizeWithPca"
#> [4] "summarizeInCluster"
#> [5] "summarizeToBinaryDirectionalEvents"
#> [6] "summarizeToNumberOfDirectionalEvents"
In this vignette, we chose to use PCA for expression data, cluster
analysis for methylation data, vote counting for mutations and CNVs (for
detail see MOSClip
paper (Martini et
al. 2019)). This data transformations are easily applied calling
MOSClip
functions, thus, here we need only to provide the
name of the needed function. Given the nature of the methylation data,
we expect to have more than a CpG cluster associated to a gene. For this
reason, MOSClip
provides the possibility to include a
dictionary to associate the methylation level of multiple CpG clusters
to a single gene. Thus, in the methylation specific arguments you need
to provide the dictionary to convert cluster names into genes.
We are now ready to perform the survival analysis on modules using
the function multiOmicsSurvivalModuleTest
. Required inputs
are a multi-omic dataset and a pathway in Pathway
or
graphNEL
format. Alternatively, it is possible to run the
analysis also on gene-sets; in this case the topological information is
lost, thus, only the function for pathway test must be used. In this
example, we choose to test only genes that have at least expression
data, as specified with useTheseGenes
. This a priori
filter, however, is not mandatory. Moreover, it could be useful to set a
seed in order to have reproducible results.
genesToConsider <- row.names(experiments(multiOmics)$exp)
moduleSurv <- lapply(reactSmall, function(g) {
set.seed(1234)
fcl = multiOmicsSurvivalModuleTest(multiOmics, g,
useTheseGenes = genesToConsider)
fcl
})
Once the analysis is complete, we can plot the tabular summary of the
top 10 modules selected by p-value of the Cox proportional hazard model
using the function multiPathwayModuleReport
.
module | pvalue | cnvNEG | cnvPOS | expPC1 | expPC2 | expPC3 | met2k2 | met3k2 | met3k3 | mut | |
---|---|---|---|---|---|---|---|---|---|---|---|
Activation of Matrix Metalloproteinases.4 | 4 | 0.0008290 | NA | 0.0026686 | 0.0509299 | 0.0002584 | 0.5032458 | 0.0037543 | NA | NA | 0.7128964 |
FGFR1 mutant receptor activation.3 | 3 | 0.0027138 | NA | 0.1441293 | 0.2163497 | NA | NA | 0.0010768 | NA | NA | 0.1113289 |
FGFR1 mutant receptor activation.2 | 2 | 0.0044104 | NA | 0.7453814 | 0.2635201 | NA | NA | NA | 0.0051868 | 0.2507027 | 0.9970709 |
Activation of Matrix Metalloproteinases.6 | 6 | 0.0232918 | NA | 0.0083832 | 0.0500642 | 0.0008553 | 0.5538676 | 0.1345511 | NA | NA | 0.5097630 |
FGFR1 mutant receptor activation.1 | 1 | 0.0248414 | 0.3413743 | 0.8067974 | 0.6359349 | 0.4539220 | NA | NA | 0.4315512 | 0.0028007 | 0.0874373 |
Activation of Matrix Metalloproteinases.5 | 5 | 0.0513868 | NA | 0.0239773 | 0.0623145 | 0.0028872 | 0.6663908 | 0.4852597 | NA | NA | 0.7326374 |
Activation of Matrix Metalloproteinases.7 | 7 | 0.0652934 | 0.8248342 | 0.0360290 | 0.9254547 | 0.0017038 | 0.1847143 | 0.3620709 | NA | NA | NA |
Activation of Matrix Metalloproteinases.9 | 9 | 0.0727559 | NA | 0.0783999 | 0.0225378 | 0.0094491 | 0.6212418 | NA | 0.7186513 | 0.4393132 | 0.2449149 |
VEGFA-VEGFR2 Pathway.1 | 1 | 0.0825774 | 0.4602803 | 0.2725537 | 0.0850171 | 0.3649652 | 0.0081958 | 0.2580115 | NA | NA | 0.0478975 |
VEGFA-VEGFR2 Pathway.8 | 8 | 0.1015215 | NA | 0.0354221 | 0.1335834 | 0.1266076 | NA | 0.1198248 | NA | NA | 0.6545505 |
MOSClip
has plenty of functions to visually explore the
results. In the following paragraphs we will show some examples.
We can visualize the table of module test results for a specific pathway, e.g., Activation of Matrix Metalloproteinases.
plotModuleReport(moduleSurv[["Activation of Matrix Metalloproteinases"]],
MOcolors = c(exp = "red", met = "green",
cnv = "yellow", mut = "blue"))
As you can see, among others, the module number 4 of this pathway is significant, and, in particular, covariates expPC2, met2k2 and cnvPOS are driving this significance.
We can have a look at the pathway graph and the module position in
the pathway using plotModuleInGraph
.
plotModuleInGraph(moduleSurv[["Activation of Matrix Metalloproteinases"]],
pathList = reactSmall,
moduleNumber = 4)
Then, we can check at the differences in survival using Kaplan-Meier curves dividing patients in groups with different omics patterns. Here we consider only covariates expPC2 and met2k.
plotModuleKM(moduleSurv[["Activation of Matrix Metalloproteinases"]],
moduleNumber = 4,
formula = "Surv(days, status) ~ expPC2 + met2k",
paletteNames = "Paired", risk.table = TRUE, inYears = TRUE)
We can explore the most important genes of the pathway and their original profiles across the omics using an heatmap plot of the original values. The most important genes are selected as described in Martini et al. (Martini et al. 2019).
additionalA <- colData(multiOmics)
additionalA$status[additionalA$status == 1] <- "event"
additionalA$status[additionalA$status == 0] <- "no_event"
additionalA$PFS <- as.factor(additionalA$status)
additionalA$status <- NULL
additionalA$years <- round(additionalA$days/365.24, 0)
additionalA$days <- NULL
plotModuleHeat(moduleSurv[["Activation of Matrix Metalloproteinases"]],
moduleNumber = 4,
paletteNames = c(exp = "red", met = "green",
cnv = "yellow", mut = "blue"),
additionalAnnotations = additionalA,
additionalPaletteNames = list(PFS = "violet", years = "teal"),
sortBy = c("expPC2", "met2k", "PFS", "years"),
withSampleNames = FALSE)
In second instance, we can ask if two or more omics are significant
in the same module simultaneously and if this omic interaction is more
frequent than those expected by chance. To perform this test we use the
runSupertest
function. A circle plot is returned with the
frequency of all significant omic combinations and their significance
levels.
In pathway test the pathway topology (the in and out connections of the pathway genes) can be exploited to guide the data reduction step. To do that, we suggest to use the topological PCA instead of the sparse PCA changing the settings in the omics object.
Then, we are ready to run the analysis using the function
multiOmicsSurvivalPathwayTest
.
data("multiOmicsTopo")
pathwaySurv <- lapply(reactSmall, function(g) {
set.seed(1234)
fcl = multiOmicsSurvivalPathwayTest(multiOmicsTopo, g,
useTheseGenes = genesToConsider)
})
We can plot a report of the first 10 pathways, sorted by pvalue of the Cox proportional hazard model.
plotMultiPathwayReport(pathwaySurv,
MOcolors = c(exp = "red", mut = "blue",
cnv = "yellow", met = "green"))
Finally, we can look at the predictive genes using a heatmap with patient additional annotations.
plotPathwayHeat(pathwaySurv[["Activation of Matrix Metalloproteinases"]],
sortBy = c("expPC1", "cnvPOS", "PFS"),
paletteNames = c(exp = "red", met = "green",
mut = "blue", cnv = "yellow"),
additionalAnnotations = additionalA,
additionalPaletteNames = list(PFS = "violet", years = "teal"),
withSampleNames = FALSE)
Then, we can also check for differences in survival using Kaplan-Meier curves dividing patients in groups with different omics patterns (e.g. patients with different methylation pattern and high and low levels of PC2 in expression).
MOSClip
gives the possibility to prioritize most
important and stable pathway or module results, running a resampling
procedure that can be found on the extended tutorial
on GitHub.
More tutorials are also available on how to perform a two-class
analysis with MOSClip
, as well as examples of more plots
that were not shown in this vignette.
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] kableExtra_1.4.0 MultiAssayExperiment_1.33.0
#> [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
#> [5] GenomicRanges_1.59.0 GenomeInfoDb_1.43.1
#> [7] IRanges_2.41.1 S4Vectors_0.45.2
#> [9] BiocGenerics_0.53.3 generics_0.1.3
#> [11] MatrixGenerics_1.19.0 matrixStats_1.4.1
#> [13] MOSClip_1.1.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.4.2 BiocIO_1.17.0 bitops_1.0-9
#> [4] ggplotify_0.1.2 tibble_3.2.1 qpgraph_2.41.0
#> [7] graph_1.85.0 XML_3.99-0.17 lifecycle_1.0.4
#> [10] rstatix_0.7.2 doParallel_1.0.17 lattice_0.22-6
#> [13] MASS_7.3-61 flashClust_1.01-2 SuperExactTest_1.1.0
#> [16] NbClust_3.0.1 backports_1.5.0 magrittr_2.0.3
#> [19] sass_0.4.9 rmarkdown_2.29 jquerylib_0.1.4
#> [22] yaml_2.3.10 DBI_1.2.3 buildtools_1.0.0
#> [25] RColorBrewer_1.1-3 abind_1.4-8 zlibbioc_1.52.0
#> [28] purrr_1.0.2 elasticnet_1.3 RCurl_1.98-1.16
#> [31] yulab.utils_0.1.8 rappdirs_0.3.3 circlize_0.4.16
#> [34] GenomeInfoDbData_1.2.13 KMsurv_0.1-5 ggrepel_0.9.6
#> [37] gRbase_2.0.3 maketools_1.3.1 pheatmap_1.0.12
#> [40] annotate_1.85.0 commonmark_1.9.2 svglite_2.1.3
#> [43] codetools_0.2-20 DelayedArray_0.33.2 ggtext_0.1.2
#> [46] xml2_1.3.6 DT_0.33 tidyselect_1.2.1
#> [49] shape_1.4.6.1 farver_2.1.2 UCSC.utils_1.3.0
#> [52] GenomicAlignments_1.43.0 jsonlite_1.8.9 GetoptLong_1.0.5
#> [55] Formula_1.2-5 survival_3.7-0 iterators_1.0.14
#> [58] emmeans_1.10.5 coxrobust_1.0.1 systemfonts_1.1.0
#> [61] foreach_1.5.2 tools_4.4.2 Rcpp_1.0.13-1
#> [64] glue_1.8.0 gridExtra_2.3 SparseArray_1.7.2
#> [67] xfun_0.49 dplyr_1.1.4 withr_3.0.2
#> [70] BiocManager_1.30.25 fastmap_1.2.0 fansi_1.0.6
#> [73] digest_0.6.37 R6_2.5.1 gridGraphics_0.5-1
#> [76] estimability_1.5.1 colorspace_2.1-1 markdown_1.13
#> [79] RSQLite_2.3.8 utf8_1.2.4 tidyr_1.3.1
#> [82] data.table_1.16.2 corpcor_1.6.10 rtracklayer_1.67.0
#> [85] httr_1.4.7 htmlwidgets_1.6.4 S4Arrays_1.7.1
#> [88] scatterplot3d_0.3-44 graphite_1.53.0 pkgconfig_2.0.3
#> [91] gtable_0.3.6 blob_1.2.4 ComplexHeatmap_2.23.0
#> [94] XVector_0.47.0 sys_3.4.3 survMisc_0.5.6
#> [97] htmltools_0.5.8.1 carData_3.0-5 multcompView_0.1-10
#> [100] clue_0.3-66 scales_1.3.0 leaps_3.2
#> [103] png_0.1-8 knitr_1.49 km.ci_0.5-6
#> [106] rstudioapi_0.17.1 rjson_0.2.23 checkmate_2.3.2
#> [109] curl_6.0.1 org.Hs.eg.db_3.20.0 cachem_1.1.0
#> [112] zoo_1.8-12 GlobalOptions_0.1.2 stringr_1.5.1
#> [115] parallel_4.4.2 AnnotationDbi_1.69.0 restfulr_0.0.15
#> [118] pillar_1.9.0 grid_4.4.2 reshape_0.8.9
#> [121] vctrs_0.6.5 maxstat_0.7-25 ggpubr_0.6.0
#> [124] car_3.1-3 xtable_1.8-4 cluster_2.1.6
#> [127] Rgraphviz_2.51.0 evaluate_1.0.1 GenomicFeatures_1.59.1
#> [130] mvtnorm_1.3-2 cli_3.6.3 compiler_4.4.2
#> [133] Rsamtools_2.23.0 rlang_1.1.4 crayon_1.5.3
#> [136] ggsignif_0.6.4 labeling_0.4.3 survminer_0.5.0
#> [139] plyr_1.8.9 fs_1.6.5 stringi_1.8.4
#> [142] viridisLite_0.4.2 BiocParallel_1.41.0 munsell_0.5.1
#> [145] Biostrings_2.75.1 qtl_1.70 Matrix_1.7-1
#> [148] lars_1.3 bit64_4.5.2 ggplot2_3.5.1
#> [151] KEGGREST_1.47.0 FactoMineR_2.11 gridtext_0.1.5
#> [154] exactRankTests_0.8-35 igraph_2.1.1 broom_1.0.7
#> [157] memoise_2.0.1 bslib_0.8.0 bit_4.5.0