MOMA is a tool for inferring connections between Master Regulator proteins and genomic driver events in cancer. Master regulators are regulatory proteins (primarily transcription factors and co-transcription factors) that control cell state. In cancer and other disease states, transcription factors have been shown to be key drivers in maintaining the disease state and can be targets for intervention. Often these master regulators are not mutated themselves but are downstream of mutations and other genomic alterations that ultimately dysregulate the regulator's normal activity. MOMA integrates multiple sources of information to infer these connections and to improve the predictive value of master regulator analysis.
For more information on Master Regulators and VIPER, our tool for calculating their activity, see the VIPER paper published in Nature Genetics in 2016 and the VIPER package.
First, install the package and load it into the R session. Make sure all dependent packages are installed to ensure full functionality, including the graphics and plotting functions.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MOMA")
library(MOMA)
#> Warning: multiple methods tables found for 'union'
#> Warning: multiple methods tables found for 'intersect'
#> Warning: multiple methods tables found for 'setdiff'
Explore the test data saved in the MultiAssayExperiment object. Confirm that we have 50 test samples and 2505 VIPER-inferred proteins in the test VIPER matrix, which was generated by running VIPER on the GBM gene expression signature. The matrix has the samples across the columns and the regulators in the rows. The other assays needed for the experiment, the copy number (cnv) and mutation (mut) matrices, are listed in the object summary:
example.gbm.mae
#> A MultiAssayExperiment object of 3 listed
#> experiments with user-defined names and respective classes.
#> Containing an ExperimentList class object of length 3:
#> [1] viper: matrix with 2505 rows and 50 columns
#> [2] cnv: matrix with 598 rows and 50 columns
#> [3] mut: matrix with 60 rows and 50 columns
#> Functionality:
#> experiments() - obtain the ExperimentList instance
#> colData() - the primary/phenotype DataFrame
#> sampleMap() - the sample coordination DataFrame
#> `$`, `[`, `[[` - extract colData columns, subset, or experiment
#> *Format() - convert into a long or wide DataFrame
#> assays() - convert ExperimentList to a SimpleList of matrices
#> exportClass() - save data to flat files
MultiAssayExperiment::assays(example.gbm.mae)$viper[1:3, 1:3]
#> TCGA-14-1825-01 TCGA-16-0846-01 TCGA-32-2634-01
#> 165 -3.0369845 0.5837581 -1.982579
#> 196 -2.4667300 -3.7642532 -1.941096
#> 257 0.7872116 0.3260199 1.683252
Next, we select the biological pathways we’d like to use to aid in our inference task. In this case, we’re using the CINDy modulator inference algorithm [paper] as well as protein-protein interactions predicted by the structure-based PrePPI algorithm [paper]. The output of the CINDy algorithm is a likelihood (p value) for the association of an upstream modulator with a particular regulator's activity. The relevant output of PrePPI is a likelihood (p value) that a modulator structurally binds to a regulator.
To build the MOMA object, the pathway data must be supplied as a named list of lists: one element per pathway, each indexed by regulator, with each regulator's entry holding its associated partners.
Note: CINDy values are context specific and must be obtained for a particular tissue or expression set. Values calculated from the tumor types in TCGA can be found [here] for applying this analysis to other datasets. The PrePPI values are not context specific, so the values from this example analysis can be used with other datasets.
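A minimal sketch of that shape, using made-up Entrez-style IDs and p values. All names and values here are hypothetical, and the exact contents of gbm.pathways may differ: each pathway is a list named by regulator, and each regulator's entry is a named numeric vector of partner p values.

```r
# Hypothetical pathway data: a named list (one entry per pathway) of named
# lists (one entry per regulator). Each regulator maps to a named numeric
# vector whose names are partner (modulator) IDs and whose values are p values.
pathways <- list(
  cindy = list(
    "1001" = c("2001" = 0.003, "2002" = 0.04),
    "1002" = c("2003" = 0.01)
  ),
  preppi = list(
    "1001" = c("2004" = 0.02),
    "1003" = c("2001" = 0.001, "2005" = 0.03)
  )
)

# Inspect the structure: pathway names, then the regulators in one pathway
names(pathways)          # "cindy" "preppi"
names(pathways$cindy)    # "1001" "1002"

# Partners of regulator 1001 that pass a p < 0.01 cutoff in CINDy
names(which(pathways$cindy[["1001"]] < 0.01))   # "2001"
```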
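Initialize the MOMA object with the assays and pathway data to start the analysis. The required inputs are the MultiAssayExperiment object holding the assays (here example.gbm.mae) and the named list of pathways (here gbm.pathways).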
Other optional inputs include:
momaObj <- MomaConstructor(example.gbm.mae, gbm.pathways)
#> Found the following assays:viper, cnv, mut
#> Common samples across main assays: 50
#> Checking labels on pathway cindy
#> Found labels for 364 TFs in VIPER matrix
#> Checking labels on pathway preppi
#> Found labels for 2505 TFs in VIPER matrix
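The constructor output reports the samples shared by all assays and how many pathway regulators match VIPER row names. A base-R sketch of those two checks on toy matrices; the object names and contents are hypothetical, and MOMA's internal implementation may differ:

```r
# Hypothetical toy assays (samples in columns), standing in for viper/cnv/mut
viper <- matrix(0, nrow = 3, ncol = 4,
                dimnames = list(c("1001", "1002", "1004"), paste0("S", 1:4)))
cnv <- matrix(0, nrow = 2, ncol = 3,
              dimnames = list(c("g1", "g2"), c("S2", "S3", "S4")))
mut <- matrix(0, nrow = 2, ncol = 4,
              dimnames = list(c("g1", "g3"), paste0("S", 1:4)))

# Samples present in every assay
common <- Reduce(intersect, lapply(list(viper, cnv, mut), colnames))
length(common)   # 3 (S2, S3, S4)

# How many regulators of a (hypothetical) pathway have a row in the VIPER matrix
cindy_regulators <- c("1001", "1002", "1003")
sum(cindy_regulators %in% rownames(viper))   # 2
```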
The first step, runDIGGIT(), runs the DIGGIT inference algorithm [paper] to find statistical interactions between VIPER-inferred proteins and genomic events.
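DIGGIT itself is described in the cited paper and is not reproduced here. As a loose illustration of what a statistical interaction between a VIPER-inferred protein and a genomic event means, the sketch below tests, for each regulator and each binary event, whether activity differs between samples with and without the event. It swaps in a two-sided Wilcoxon rank-sum test (DIGGIT's actual statistics differ), and the helper event_association() and all data are hypothetical.

```r
# Hypothetical helper: p values for every regulator (rows of vipermat) against
# every binary genomic event (rows of eventmat). Samples are in columns.
event_association <- function(vipermat, eventmat) {
  stopifnot(identical(colnames(vipermat), colnames(eventmat)))
  pvals <- matrix(NA_real_, nrow(vipermat), nrow(eventmat),
                  dimnames = list(rownames(vipermat), rownames(eventmat)))
  for (r in rownames(vipermat)) {
    for (e in rownames(eventmat)) {
      has_event <- eventmat[e, ] == 1
      # An event present in all or none of the samples cannot be tested
      if (all(has_event) || !any(has_event)) next
      pvals[r, e] <- wilcox.test(vipermat[r, has_event],
                                 vipermat[r, !has_event])$p.value
    }
  }
  pvals
}

# Toy data: TF1 is strongly active exactly in the samples carrying mutA
samples <- paste0("S", 1:10)
vipermat <- rbind(
  TF1 = c(2.1, 1.8, 2.5, 1.9, 2.2, -0.3, 0.1, -0.5, 0.4, -0.1),
  TF2 = c(0.2, -0.4, 0.3, 0.0, -0.2, 0.1, -0.3, 0.5, -0.1, 0.25)
)
eventmat <- rbind(
  mutA = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
  delB = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)
colnames(vipermat) <- colnames(eventmat) <- samples

pv <- event_association(vipermat, eventmat)
pv["TF1", "mutA"]   # about 0.0079 (exact test, complete separation)
pv["TF1", "delB"]   # NA: delB occurs in no sample
```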
The makeInteractions() function then infers robust computational predictions using all the provided data, including the pathway evidence from the Conditional Inference of Network Dynamics (CINDy) algorithm.
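The exact procedure inside makeInteractions() is not spelled out here. As a simplified, hypothetical illustration of using pathway evidence to prioritize DIGGIT hits, the sketch keeps, for each regulator, only the associated genes that the pathway data also list as partners of that regulator. Both inputs use the named-list-of-named-vectors shape described earlier; the helper support_filter() and all names and values are made up.

```r
# Hypothetical DIGGIT-style hits: for each regulator, p values of the genes
# whose events were associated with its activity
diggit_hits <- list(
  TF1 = c(GENE_A = 0.001, GENE_B = 0.02, GENE_C = 0.04),
  TF2 = c(GENE_B = 0.003)
)
# Hypothetical pathway evidence in the same shape (e.g. CINDy partners)
cindy <- list(
  TF1 = c(GENE_A = 0.01, GENE_D = 0.03),
  TF2 = c(GENE_E = 0.02)
)

# Keep only hits whose gene is also a pathway partner of the same regulator
support_filter <- function(hits, pathway) {
  setNames(lapply(names(hits), function(tf) {
    partners <- names(pathway[[tf]])   # NULL if the regulator is absent
    h <- hits[[tf]]
    h[names(h) %in% partners]
  }), names(hits))
}

supported <- support_filter(diggit_hits, cindy)
supported$TF1   # GENE_A 0.001
supported$TF2   # empty: no overlap with TF2's pathway partners
```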
The Rank() function then creates a final ranking of candidate Master Regulators for this cohort of patient samples.
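One standard way to merge several per-regulator scores into a single ranking is Stouffer's method, which sums the z-scores from k evidence sources and divides by sqrt(k). Whether and how Rank() applies it is an assumption here; the column names and scores below are hypothetical.

```r
# Stouffer's method: combine z-scores across evidence sources (columns)
stouffer <- function(zmat) rowSums(zmat) / sqrt(ncol(zmat))

# Hypothetical z-scores for three regulators from two evidence sources
zmat <- cbind(viper   = c(TF1 = 3.0, TF2 = 1.0, TF3 = 2.0),
              genomic = c(TF1 = 1.0, TF2 = 0.5, TF3 = 2.5))

integrated <- stouffer(zmat)
ranking <- names(sort(integrated, decreasing = TRUE))
ranking   # "TF3" "TF1" "TF2"
```

TF3 overtakes TF1 because it is supported by both sources, which is the usual motivation for integrating evidence rather than ranking on activity alone.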
Clustering of the samples, using the protein ranks computed in the last step, can then be performed with Cluster(). Multiple cluster solutions are calculated, ranging from 2 to 15 clusters by default. The reliability or average silhouette scores of each can be assessed to determine an optimal number of clusters, k. By default, the clustering solution with the maximum reliability is saved to the object, but any other solution can be saved afterwards.
momaObj$Cluster()
#> using pearson correlation with weighted vipermat
#> testing clustering options, k = 2..15
#> Using reliability scores to select best solution.
#> Best solution is: 6clusters
# Explore the reliability scores
momaObj$clustering.results$all.cluster.reliability
#> 2clusters 3clusters 4clusters 5clusters 6clusters 7clusters 8clusters
#> 0.6144845 0.7529229 0.7456256 0.7775375 0.7907407 0.6766867 0.6616316
#> 9clusters 10clusters 11clusters 12clusters 13clusters 14clusters 15clusters
#> 0.6022523 0.5793794 0.5645946 0.5290190 0.5175175 0.5476476 0.5363463
# Save the 3-cluster solution instead of the default best solution
momaObj$sample.clustering <- momaObj$clustering.results$`3clusters`$clustering
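The selection logic described above can be sketched in base R with the cluster package, which ships with R: compute a Pearson-correlation distance between samples, run PAM for each k, and keep the k with the highest average silhouette width. This uses the silhouette score rather than MOMA's reliability score, and the toy matrix is hypothetical.

```r
library(cluster)

set.seed(42)
# Toy "VIPER matrix" (hypothetical): 6 proteins (rows) x 9 samples (columns),
# built from three distinct activity patterns plus a little noise
p1 <- c(3, 2, 1, 0, -1, -2)
p2 <- rev(p1)
p3 <- c(0, 3, -3, 0, 3, -3)
patterns <- cbind(p1, p1, p1, p2, p2, p2, p3, p3, p3)
vipermat <- patterns + matrix(rnorm(length(patterns), sd = 0.05),
                              nrow = nrow(patterns))
colnames(vipermat) <- paste0("S", 1:9)

# Pearson-correlation distance between samples (columns)
d <- as.dist(1 - cor(vipermat))

# Average silhouette width for each candidate k
ks <- 2:5
avg_sil <- vapply(ks, function(k) pam(d, k = k)$silinfo$avg.width, numeric(1))
names(avg_sil) <- paste0(ks, "clusters")

# Keep the solution with the highest score
best_k <- ks[which.max(avg_sil)]
best_clustering <- pam(d, k = best_k)$clustering
best_k   # 3
```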
Genomic saturation analysis is then performed on each cluster with the saturationCalculation() function, allowing us to find the key proteins that are downstream of the majority of genomic events in the samples within a particular cluster. These regulators make up that cluster’s checkpoint.
The results of this analysis can be accessed directly in the following result fields:
momaObj$checkpoints
momaObj$genomic.saturation
momaObj$coverage.summaryStats
These will be used for plotting the genomic saturation curves as well.
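A genomic saturation curve tracks the fraction of a cluster's genomic events that are upstream of at least one of the top-n regulators as n grows. A simplified sketch, assuming each ranked regulator comes with the set of events found upstream of it; the input data and the cutoff rule (stop where the curve first reaches its maximum) are hypothetical, and MOMA's own criterion may differ.

```r
# Hypothetical input: regulators in rank order (best first), each with the
# genomic events in one cluster's samples that were found upstream of it
upstream_events <- list(
  TF1 = c("mutA", "ampB"),
  TF2 = c("ampB", "delC"),
  TF3 = c("mutD"),
  TF4 = c("mutA")
)
all_events <- c("mutA", "ampB", "delC", "mutD", "mutE")

# Cumulative set of covered events as regulators are added one at a time
covered <- Reduce(union, upstream_events, accumulate = TRUE)
saturation <- vapply(covered, function(ev) mean(all_events %in% ev), numeric(1))
names(saturation) <- names(upstream_events)
saturation   # TF1 0.4, TF2 0.6, TF3 0.8, TF4 0.8

# Assumed cutoff rule: keep regulators up to where the curve first peaks
checkpoint <- names(upstream_events)[seq_len(which.max(saturation))]
checkpoint   # "TF1" "TF2" "TF3"
```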
cluster1.checkpoint <- momaObj$checkpoints[[1]]
print(cluster1.checkpoint[1:5])
#> [1] "5702" "55170" "7014" "23028" "29128"
The primary results of the analysis are the master regulators of each cluster’s checkpoint, as displayed above. You can plot the original VIPER matrix, subset to only these regulators, using any heatmap function you like.
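A minimal sketch of that subsetting with base R's heatmap(). The toy matrix and checkpoint vector are hypothetical stand-ins; with the vignette objects they would be MultiAssayExperiment::assays(example.gbm.mae)$viper and momaObj$checkpoints[[1]].

```r
# Hypothetical stand-ins for the VIPER matrix and one cluster's checkpoint
vipermat <- matrix(seq(-2, 1.8, by = 0.2), nrow = 4,
                   dimnames = list(c("5702", "55170", "7014", "1234"),
                                   paste0("S", 1:5)))
checkpoint <- c("5702", "55170", "7014")

# Keep only checkpoint regulators present in the matrix, in checkpoint order
regs <- intersect(checkpoint, rownames(vipermat))
checkpoint_mat <- vipermat[regs, , drop = FALSE]

# Base R heatmap; any other heatmap function can be used instead
heatmap(checkpoint_mat, scale = "none")
```

intersect() guards against checkpoint IDs that are missing from the matrix, and drop = FALSE keeps a matrix even if only one regulator survives.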
The other important results of the analysis are the statistically significant genomic events found to be upstream of each checkpoint master regulator. The makeSaturationPlots() function takes the subtype-specific genomic event interactions calculated in the previous step and makes plots for viewing the results. Because a large number of events is usually detected, only the most frequently occurring events are plotted. (Note: because amplifications and deletions can span multiple genes, these events are considered on a cytoband basis.)
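The frequency filtering and cytoband collapsing described above can be sketched as follows: copy-number events are labeled by cytoband, mutations stay gene-level, each event is counted once per sample, and only the most frequent events are kept. The event table and the top-2 cutoff are hypothetical.

```r
# Hypothetical event calls in one cluster: one row per (sample, gene, event)
events <- data.frame(
  sample   = c("S1", "S1", "S2", "S2", "S3", "S3", "S3"),
  gene     = c("EGFR", "TP53", "EGFR", "SEC61G", "PTEN", "CDKN2A", "CDKN2B"),
  type     = c("amp", "mut", "amp", "amp", "mut", "del", "del"),
  cytoband = c("7p11.2", "17p13.1", "7p11.2", "7p11.2", "10q23.31",
               "9p21.3", "9p21.3")
)

# Amplifications and deletions are collapsed to cytobands; mutations are not
events$label <- ifelse(events$type %in% c("amp", "del"),
                       paste(events$cytoband, events$type, sep = "_"),
                       paste(events$gene, events$type, sep = "_"))

# Count each event once per sample, then keep the most frequent ones
per_sample <- unique(events[, c("sample", "label")])
freq <- sort(table(per_sample$label), decreasing = TRUE)
top_events <- head(names(freq), 2)
names(freq)[1]   # "7p11.2_amp" (seen in 2 samples)
```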
The two plot types are:
The data for these plots are stored as grobs and ggplot objects, so layers or other graphic modifications can be added afterwards for further customization.
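Because the plots are ggplot objects, extra layers can be added with the + operator. A hypothetical example on a toy saturation-style curve; the data frame and column names are made up, and with MOMA you would start from one of the returned plot objects instead.

```r
library(ggplot2)

# Toy curve standing in for a returned plot object
curve_df <- data.frame(n_regulators = 1:4, coverage = c(0.4, 0.6, 0.8, 0.8))
p <- ggplot(curve_df, aes(x = n_regulators, y = coverage)) +
  geom_line()

# Customize afterwards: add a reference line, axis labels and a theme
p2 <- p +
  geom_hline(yintercept = 0.8, linetype = "dashed") +
  labs(x = "Number of top regulators", y = "Fraction of events covered") +
  theme_minimal()
```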
Any of the results fields can be saved using the saveData() function. Pass in the names of the results fields to save, and they will be written to files in the designated output folder. If no names are passed, all results are saved.
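A base-R sketch of the behavior described: save the named fields, or all of them when none are given, into an output folder. save_fields() is a hypothetical helper, not MOMA's saveData(), and the one-RDS-file-per-field layout is an assumption.

```r
# Hypothetical helper mirroring the described behavior
save_fields <- function(results, fields = NULL, out_dir) {
  if (is.null(fields)) fields <- names(results)   # no names: save everything
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- file.path(out_dir, paste0(fields, ".rds"))
  for (i in seq_along(fields)) saveRDS(results[[fields[i]]], paths[i])
  invisible(paths)
}

# Toy results list standing in for the MOMA result fields
results <- list(checkpoints = list(c("5702", "55170")),
                genomic.saturation = c(0.4, 0.8))

out_all <- save_fields(results, out_dir = file.path(tempdir(), "moma_all"))
out_one <- save_fields(results, fields = "checkpoints",
                       out_dir = file.path(tempdir(), "moma_one"))
```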
Here is the output of sessionInfo() on the system on which this package and documentation were compiled.
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] grid stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ggplot2_3.5.1 MOMA_1.19.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 farver_2.1.2
#> [3] dplyr_1.1.4 fastmap_1.2.0
#> [5] digest_0.6.37 lifecycle_1.0.4
#> [7] cluster_2.1.6 statmod_1.5.0
#> [9] magrittr_2.0.3 compiler_4.4.2
#> [11] rlang_1.1.4 sass_0.4.9
#> [13] tools_4.4.2 utf8_1.2.4
#> [15] yaml_2.3.10 knitr_1.49
#> [17] labeling_0.4.3 S4Arrays_1.7.1
#> [19] DelayedArray_0.33.2 plyr_1.8.9
#> [21] RColorBrewer_1.1-3 abind_1.4-8
#> [23] withr_3.0.2 purrr_1.0.2
#> [25] BiocGenerics_0.53.3 sys_3.4.3
#> [27] stats4_4.4.2 fansi_1.0.6
#> [29] colorspace_2.1-1 scales_1.3.0
#> [31] iterators_1.0.14 MultiAssayExperiment_1.33.0
#> [33] SummarizedExperiment_1.37.0 cli_3.6.3
#> [35] rmarkdown_2.29 crayon_1.5.3
#> [37] generics_0.1.3 robustbase_0.99-4-1
#> [39] httr_1.4.7 reshape2_1.4.4
#> [41] tzdb_0.4.0 rjson_0.2.23
#> [43] BiocBaseUtils_1.9.0 qvalue_2.39.0
#> [45] cachem_1.1.0 stringr_1.5.1
#> [47] zlibbioc_1.52.0 splines_4.4.2
#> [49] parallel_4.4.2 BiocManager_1.30.25
#> [51] XVector_0.47.0 matrixStats_1.4.1
#> [53] vctrs_0.6.5 Matrix_1.7-1
#> [55] jsonlite_1.8.9 IRanges_2.41.1
#> [57] GetoptLong_1.0.5 hms_1.1.3
#> [59] S4Vectors_0.45.2 clue_0.3-66
#> [61] maketools_1.3.1 foreach_1.5.2
#> [63] limma_3.63.2 tidyr_1.3.1
#> [65] jquerylib_0.1.4 glue_1.8.0
#> [67] DEoptimR_1.1-3 MKmisc_1.9
#> [69] codetools_0.2-20 stringi_1.8.4
#> [71] shape_1.4.6.1 gtable_0.3.6
#> [73] GenomeInfoDb_1.43.1 GenomicRanges_1.59.0
#> [75] UCSC.utils_1.3.0 ComplexHeatmap_2.23.0
#> [77] munsell_0.5.1 tibble_3.2.1
#> [79] pillar_1.9.0 htmltools_0.5.8.1
#> [81] GenomeInfoDbData_1.2.13 circlize_0.4.16
#> [83] R6_2.5.1 doParallel_1.0.17
#> [85] lattice_0.22-6 evaluate_1.0.1
#> [87] Biobase_2.67.0 readr_2.1.5
#> [89] png_0.1-8 bslib_0.8.0
#> [91] Rcpp_1.0.13-1 SparseArray_1.7.2
#> [93] xfun_0.49 MatrixGenerics_1.19.0
#> [95] buildtools_1.0.0 pkgconfig_2.0.3
#> [97] GlobalOptions_0.1.2