Transcriptional networks are important tools to visualize complex biological systems that involve large groups of genes and multiple regulators. In a previous study, we have implemented the RTN R/Bioconductor package to reconstruct transcriptional regulatory networks (Fletcher et al. 2013). This package reconstructs regulons, consisting of a regulator and its target genes. A regulon can be further used to investigate, for example, the association of its expression on survival probabilities.
RTNsurvival is a tool for integrating regulons generated by the RTN package with survival information for the same set of samples used in the reconstruction of the transcriptional regulatory network. There are two main survival analysis pipelines: a Cox Proportional Hazards approach used to model regulons as predictors of survival time (Figure 1), and a Kaplan-Meier analysis showing the stratification of a cohort based on the regulon activity (Figure 2). For a given regulon, the 2-tailed GSEA approach is used to estimate regulon activity (differential Enrichment Score - dES) for each individual sample (Castro et al. 2016), and the dES distribution of all samples is then used to assess the survival statistics for the cohort. The plots can be fine-tuned to the user’s specifications.
This Vignette uses precomputed regulons available in the RTN package:
The stni
is a TNI-class
object created from
a subset of the expression data available in the
Fletcher2013b
package. The stni
object
contains a minimal toy reconstruction of 5 regulons (PGR, MYB, E2F2,
FOXM1 and PTTG1).
More information about the parameters used to build the toy regulons
can be viewed by calling the summary of the stni
object.
The RTNsurvival package provides a survival table
for the samples in the stni
object, including clinical data
from the METABRIC study (Curtis et al.
2012) where the expression data was originally obtained.
In order to run the analysis pipelines, the input data must be
evaluated by the tnsPreprocess
method in order to build a
TNS-class
object. Note that the survival table must be
provided with time and event columns, and key covariates can also be
specified for subsequent use in the Cox analysis.
The survival analysis pipeline depends on the 2-tailed GSEA approach,
which estimates regulon activity (dES) for all samples in the cohort.
The tnsGSEA2
function calls the tni.gsea2
method available in the RTN package.
Once the dES metric has been computed by tnsGSEA2
function, then it is possible to run the Cox analysis.
The tnsCox
method runs a Cox multivariate regression
analysis and shows the proportional hazards of each of the specified
regulons and the provided key covariates, indicating the contribution of
each variable to survival (Figure 1). The method uses
the Bioconductor survival package to fit the Cox model.
Figure 1 - The plot shows the Hazard Ratio for all key covariates and regulons. Lines that are completely to the right of the grey line, shown in red, are positively associated with hazard. This means that samples with high expression of this regulon have poor prognosis. The further to the right or left of the grey line, the more significant is the association.
The tnsKM
method generates a Kaplan-Meier plot, which
consists of three panels put together: a ranked dES plot for the cohort,
a status of key attributes plot (optional) and a Kaplan-Meier plot,
showing survival curves for lower and higher dES samples (Figure
2).
rtns <- tnsKM(rtns, sections = 2)
tnsPlotKM(rtns, regs="FOXM1", attribs = list(c("ER+","ER-"),c("PR+","PR-"),c("G1","G2","G3"),"HT"))
Figure 2 - The Kaplan-Meier plot for FOXM1 shows that samples with high regulon activity (red and pink) have poorer prognosis, as their survival probability is lower than the samples that have low regulon activity (light and dark blue).
Individual sample differential enrichment analysis can be
investigated using the tnsPlotGSEA2
function. This will
generate a 2-tailed GSEA plot for the differential expression of both
positive and negative targets of a regulon (Figure 3).
This step takes a little longer because the GSEA is recomputed for a
selected regulon, and because tnsPlotGSEA2
is a wrapper for
the RTN function tna.plot.gsea2
, which
generates the GSEA plot.
Figure 3 - The 2-tailed GSEA plot for the MB-5116 sample. It shows that the positive targets of the FOXM1 regulator are positively enriched, while the negative targets are negatively enriched.
An overview of regulon activity can be obtained by plotting a heatmap
with all evaluated samples and regulons. First, we need to obtain the
matrix of dES values from the TNS object. Then, we can plot the heatmap
using the pheatmap
function from the
pheatmap package. In this example, we also illustrate
how to incorporate sample features from the survival data.
library(pheatmap)
enrichmentScores <- tnsGet(rtns, "regulonActivity")
survival.data <- tnsGet(rtns, "survivalData")
annotation_col <- survival.data[,c("ER+", "ER-")]
pheatmap(t(enrichmentScores$dif),
annotation_col = annotation_col,
show_colnames = FALSE,
annotation_legend = FALSE)
Figure 4 - Regulon activity of individual tumour samples. This heatmap shows two regulon clusters. The PGR and MYB regulons are repressed in the ER- samples and activated in ER+ samples. The PTTG1, E2F2 and FOXM1 regulons, on the other hand, are activated in ER- samples and repressed in ER+ samples.
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RTNsurvival_1.31.0 RTNduals_1.30.0 RTN_2.30.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 viridisLite_0.4.2
## [3] dplyr_1.1.4 mixtools_2.0.0
## [5] fastmap_1.2.0 lazyeval_0.2.2
## [7] egg_0.4.5 digest_0.6.37
## [9] lifecycle_1.0.4 survival_3.7-0
## [11] statmod_1.5.0 magrittr_2.0.3
## [13] kernlab_0.9-33 compiler_4.4.1
## [15] rlang_1.1.4 sass_0.4.9
## [17] tools_4.4.1 igraph_2.1.1
## [19] utf8_1.2.4 yaml_2.3.10
## [21] data.table_1.16.2 knitr_1.48
## [23] S4Arrays_1.6.0 htmlwidgets_1.6.4
## [25] DelayedArray_0.33.1 RColorBrewer_1.1-3
## [27] abind_1.4-8 KernSmooth_2.23-24
## [29] purrr_1.0.2 viper_1.40.0
## [31] BiocGenerics_0.53.0 sys_3.4.3
## [33] grid_4.4.1 stats4_4.4.1
## [35] fansi_1.0.6 dunn.test_1.3.6
## [37] e1071_1.7-16 colorspace_2.1-1
## [39] ggplot2_3.5.1 scales_1.3.0
## [41] MASS_7.3-61 SummarizedExperiment_1.36.0
## [43] cli_3.6.3 rmarkdown_2.28
## [45] crayon_1.5.3 generics_0.1.3
## [47] httr_1.4.7 cachem_1.1.0
## [49] proxy_0.4-27 zlibbioc_1.52.0
## [51] splines_4.4.1 parallel_4.4.1
## [53] BiocManager_1.30.25 XVector_0.46.0
## [55] matrixStats_1.4.1 vctrs_0.6.5
## [57] Matrix_1.7-1 jsonlite_1.8.9
## [59] carData_3.0-5 pwr_1.3-0
## [61] car_3.1-3 IRanges_2.41.0
## [63] S4Vectors_0.44.0 minet_3.65.0
## [65] Formula_1.2-5 maketools_1.3.1
## [67] limma_3.63.0 plotly_4.10.4
## [69] tidyr_1.3.1 jquerylib_0.1.4
## [71] snow_0.4-4 glue_1.8.0
## [73] gtable_0.3.6 GenomeInfoDb_1.43.0
## [75] GenomicRanges_1.59.0 UCSC.utils_1.2.0
## [77] munsell_0.5.1 tibble_3.2.1
## [79] pillar_1.9.0 htmltools_0.5.8.1
## [81] GenomeInfoDbData_1.2.13 R6_2.5.1
## [83] evaluate_1.0.1 lattice_0.22-6
## [85] Biobase_2.67.0 highr_0.11
## [87] pheatmap_1.0.12 segmented_2.1-3
## [89] RedeR_3.2.0 bslib_0.8.0
## [91] class_7.3-22 gridExtra_2.3
## [93] SparseArray_1.6.0 nlme_3.1-166
## [95] xfun_0.48 MatrixGenerics_1.19.0
## [97] buildtools_1.0.0 pkgconfig_2.0.3