Recurrent patterns in biological networks may reflect critical roles in multiple biological processes (Bracken, Scott, and Goodall 2016); one example is the regulatory loops between transcription factors and microRNAs (Zhang et al. 2015). RTNduals searches for regulatory patterns between pairs of regulators, using regulatory networks generated by the RTN package (Castro et al. 2016; for details, please refer to the RTN documentation). In such a network, each regulator has an associated set of target genes (i.e. a regulon). Assessing the targets shared by a pair of regulators yields triplets, each formed by two regulators and one shared target. The target may be regulated in a positive or negative direction, with the regulators either co-operating or competing in the regulatory network.

The inference of dual regulons requires three complementary statistics: (1) Targets are assigned to regulons based on the mutual information (MI) between the regulator and the target. The significance of the MI statistics is assessed by permutation and bootstrap analysis. (2) Shared targets between any two regulons are identified, and the similarity in regulation (i.e. positive or negative direction) is assessed by correlation analysis. (3) A test is carried out to determine whether the number of shared targets is higher than expected by chance.

The schematics in Figure 1 show two triplets formed between regulators. In (a) the two regulators co-operate by influencing shared targets in the same direction (co-activation or co-repression), while in (b) they compete, influencing targets in opposite directions. For gene expression data, typical regulators might include transcription factors, miRNAs, eRNAs and lncRNAs.
Figure 1. Examples of regulators and predicted associations. This figure illustrates four triplets formed between regulators. (a) Regulators R1 and R2 co-activate or co-repress shared targets. (b) Regulators R1 and R2 compete, influencing targets in opposite directions.
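The distinction between co-operation and competition can be illustrated with a minimal base-R sketch. It uses simulated expression profiles and the sign of a Pearson correlation as a stand-in for the direction of regulation; the helper `classify_triplet` and all variable names are hypothetical and are not part of RTN or RTNduals.

```r
# Toy illustration only: simulated data, hypothetical helper function.
set.seed(3)
n <- 120                                   # number of samples
r1 <- rnorm(n)                             # regulator R1
r2 <- 0.7 * r1 + rnorm(n, sd = 0.7)        # a regulator that tracks R1
r3 <- -0.7 * r1 + rnorm(n, sd = 0.7)       # a regulator that opposes R1
target <- r1 + rnorm(n, sd = 0.5)          # a shared target driven by R1

# Label a triplet by comparing the sign of each regulator-target correlation.
classify_triplet <- function(regA, regB, target) {
  signA <- sign(cor(regA, target))
  signB <- sign(cor(regB, target))
  if (signA == signB) "co-operation" else "competition"
}

same_direction <- classify_triplet(r1, r2, target)
opposite_direction <- classify_triplet(r1, r3, target)
print(c(same_direction, opposite_direction))
```

In this sketch a single target stands for the whole set of shared targets; the package assesses agreement across all shared targets of a regulator pair.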
The RTNduals workflow starts with a preprocessing step that generates a TNI-class object from an expression matrix and a list of regulators. The expression matrix is typically obtained from multiple samples (e.g. transcriptomes from a cancer cohort), while the list of regulators represents prior biological information indicating which genes in the expression matrix should be regarded as regulators. The workflow can also handle different classes of regulators, for example genes and microRNAs. In this case, the expression matrix should comprise both mRNA and miRNA expression values.
This example provides the data required to generate a TNI-class object. The dataset tniData is available from the RTN package and consists of an R list with 6 objects, 3 of which will be used in the subsequent analysis: (1) expData, a named gene expression matrix with 120 samples (genes in rows, samples in columns), (2) rowAnnotation, a data.frame with Probe-to-ENTREZ annotation, and (3) tfs, a character vector listing transcription factors. These datasets were extracted, pre-processed and size-reduced from Fletcher et al. (2013), and should be regarded as examples for demonstration purposes only.
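To make the shape of these three inputs concrete, the following base-R sketch builds a small mock dataset with the same layout. The object names mirror those described above, but the values, the dimensions and the annotation column names (`PROBEID`, `ENTREZ`, `SYMBOL`) are assumptions chosen for illustration, not the contents of tniData.

```r
# Mock inputs with the layout described above (illustrative values only).
set.seed(1)
probes  <- paste0("probe", 1:6)
samples <- paste0("S", 1:5)

# Expression matrix: genes (probes) in rows, samples in columns.
expData <- matrix(rnorm(length(probes) * length(samples)),
                  nrow = length(probes),
                  dimnames = list(probes, samples))

# Row annotation: one row per probe; column names are assumed here.
rowAnnotation <- data.frame(PROBEID = probes,
                            ENTREZ  = as.character(101:106),
                            SYMBOL  = paste0("GENE", 1:6),
                            row.names = probes,
                            stringsAsFactors = FALSE)

# Regulators: a character vector naming a subset of the annotated genes.
tfs <- c("GENE1", "GENE4")

# Basic consistency checks between the three objects.
probes_annotated <- all(rownames(expData) %in% rownames(rowAnnotation))
tfs_annotated    <- all(tfs %in% rowAnnotation$SYMBOL)
print(c(probes_annotated, tfs_annotated))
```

Checks of this kind (every expression row annotated, every regulator present in the annotation) are the sort of consistency the preprocessing step is meant to guarantee.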
The gexp data matrix and the corresponding annotation are evaluated by the tni.constructor function in order to check the consistency of the input data. This step generates a pre-processed TNI-class object whose status is updated to ‘Preprocess [x]’.
The tni.permutation method takes the pre-processed TNI-class object and returns a regulatory network inferred by mutual information analysis (with multiple hypothesis testing corrections).
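The idea behind a permutation test on mutual information can be sketched in base R. The estimator below (equal-frequency binning into three bins) and the helper `mi_binned` are assumptions made for illustration; they are not the estimator used inside RTN.

```r
# Hypothetical MI estimator: equal-frequency binning, natural-log units.
mi_binned <- function(x, y, bins = 3) {
  n  <- length(x)
  xb <- cut(rank(x, ties.method = "first"), breaks = bins, labels = FALSE)
  yb <- cut(rank(y, ties.method = "first"), breaks = bins, labels = FALSE)
  joint <- table(factor(xb, levels = seq_len(bins)),
                 factor(yb, levels = seq_len(bins)))
  pxy  <- unclass(joint) / n                      # joint probabilities
  pxpy <- outer(rowSums(pxy), colSums(pxy))       # product of marginals
  nz   <- pxy > 0                                 # skip empty cells
  sum(pxy[nz] * log(pxy[nz] / pxpy[nz]))
}

set.seed(42)
n <- 120
regulator <- rnorm(n)
target    <- 0.8 * regulator + rnorm(n, sd = 0.5)  # simulated dependent target

observed <- mi_binned(regulator, target)

# Null distribution: shuffle the target to break any association.
null_mi <- replicate(1000, mi_binned(regulator, sample(target)))

# Empirical p-value with the usual +1 correction.
p_value <- (sum(null_mi >= observed) + 1) / (length(null_mi) + 1)
print(c(observed = observed, p_value = p_value))
```

When many regulator-target pairs are tested, the resulting p-values would then be adjusted for multiple testing, for example with `p.adjust(p_values, method = "BH")`.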
In addition to the permutation analysis, the stability of the regulatory network is assessed by bootstrapping using the tni.bootstrap function.
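As a rough illustration of what bootstrap stability means for a single regulator-target edge, the sketch below resamples the samples with replacement and records how often the association survives a cutoff. The association measure (absolute Spearman correlation) and the cutoff of 0.3 are arbitrary assumptions for this toy example and do not reflect the internals of tni.bootstrap.

```r
# Toy bootstrap: how often does an edge pass a (hypothetical) cutoff?
set.seed(7)
n <- 120
regulator <- rnorm(n)
target    <- 0.8 * regulator + rnorm(n, sd = 0.5)  # simulated dependent target

threshold <- 0.3   # assumed cutoff on |Spearman correlation|
n_boot    <- 200   # number of bootstrap resamples

kept <- replicate(n_boot, {
  idx <- sample(n, replace = TRUE)                 # resample the samples
  abs(cor(regulator[idx], target[idx], method = "spearman")) > threshold
})

# Fraction of resamples in which the edge is retained.
stability <- mean(kept)
print(stability)
```

An edge retained in nearly all resamples is considered stable; an edge that appears only in a minority of resamples depends on a few samples and would be a candidate for removal.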
In a given regulatory network each target can be linked to multiple regulators as a result of both direct and indirect interactions. The Data Processing Inequality (DPI) algorithm (Meyer, Lafitte, and Bontempi 2008) is used to remove the weakest interaction between two regulators and a common target.
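The DPI step can be sketched on a single triangle formed by two regulators and a common target. The function `dpi_filter` below is a hypothetical, simplified version written for illustration: it uses made-up MI values and omits any tolerance parameter that a production implementation may apply.

```r
# Symmetric MI matrix for one triangle (illustrative values).
nodes <- c("R1", "R2", "T1")
mi <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
mi["R1", "R2"] <- mi["R2", "R1"] <- 0.60
mi["R1", "T1"] <- mi["T1", "R1"] <- 0.45
mi["R2", "T1"] <- mi["T1", "R2"] <- 0.10

# Simplified DPI: in every fully connected triplet, drop the weakest edge.
dpi_filter <- function(mi) {
  out <- mi
  triplets <- combn(rownames(mi), 3)
  for (k in seq_len(ncol(triplets))) {
    tri   <- triplets[, k]
    edges <- rbind(tri[c(1, 2)], tri[c(1, 3)], tri[c(2, 3)])
    w     <- mi[edges]                    # MI of the three edges
    if (all(w > 0)) {
      weakest <- edges[which.min(w), ]
      out[weakest[1], weakest[2]] <- 0
      out[weakest[2], weakest[1]] <- 0
    }
  }
  out
}

filtered <- dpi_filter(mi)
print(filtered)
```

Here the R2-T1 edge is the weakest of the three and is removed, on the reasoning that it is more likely an indirect effect mediated by R1.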
The mbrAssociation method takes the transcriptional network computed in the previous steps and enumerates all triplets formed by two regulators and one shared target. The method retrieves the mutual information between regulators and assesses the agreement between the predicted downstream effects using correlation analysis. A Fisher’s exact test is used to evaluate whether the number of shared targets is greater than expected by chance.
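The overlap test can be illustrated with base R's `fisher.test` on two mock regulons. The gene universe, regulon sizes and overlap below are invented for the example; only the use of a one-sided Fisher's exact test on a 2x2 membership table follows the description above.

```r
# Mock gene universe and two overlapping regulons (illustrative sizes).
universe <- paste0("g", 1:200)
regulon1 <- universe[1:60]
regulon2 <- universe[31:90]

# Membership of every gene in each regulon.
in1 <- universe %in% regulon1
in2 <- universe %in% regulon2
n_shared <- sum(in1 & in2)

# 2x2 contingency table and one-sided test for over-representation.
tab <- table(inRegulon1 = in1, inRegulon2 = in2)
ft  <- fisher.test(tab, alternative = "greater")

print(tab)
print(ft$p.value)
```

With 60 genes in each regulon out of 200, about 18 shared targets would be expected by chance (60 x 60 / 200), so the 30 shared targets in this mock example yield a small p-value.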
A summary of the results can be accessed from ‘rmbr’ using the mbrGet function.
## $MBR
## $MBR$Duals
##       Tested Predicted
## Duals     21         4
##
##
## $TNI
## $TNI$tnet
##          Regulators Targets Edges
## tnet.ref          7    2375  7117
## tnet.dpi          7    2375  3869
##
## $TNI$regulonSize
##          Min. 1st Qu. Median      Mean 3rd Qu. Max.
## tnet.ref  796     959   1041 1016.7143  1103.5 1155
## tnet.dpi  380     459    543  552.7143   666.0  696
##--- get results
overlap <- mbrGet(rmbr, what="dualsOverlap")
correlation <- mbrGet(rmbr, what="dualsCorrelation")
When prior evidence is available, this method can also add a ‘supplementaryTable’ describing the association between regulators. The ‘supplementaryTable’ is a ‘data.frame’ listing unique relationships between any two regulators (please refer to the documentation for details on the input data format).
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RTNduals_1.31.0 RTN_2.30.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 viridisLite_0.4.2
## [3] dplyr_1.1.4 mixtools_2.0.0
## [5] fastmap_1.2.0 lazyeval_0.2.2
## [7] digest_0.6.37 lifecycle_1.0.4
## [9] survival_3.7-0 statmod_1.5.0
## [11] magrittr_2.0.3 kernlab_0.9-33
## [13] compiler_4.4.1 rlang_1.1.4
## [15] sass_0.4.9 tools_4.4.1
## [17] igraph_2.1.1 utf8_1.2.4
## [19] yaml_2.3.10 data.table_1.16.2
## [21] knitr_1.48 S4Arrays_1.6.0
## [23] htmlwidgets_1.6.4 DelayedArray_0.33.1
## [25] RColorBrewer_1.1-3 abind_1.4-8
## [27] KernSmooth_2.23-24 purrr_1.0.2
## [29] viper_1.40.0 BiocGenerics_0.53.0
## [31] sys_3.4.3 grid_4.4.1
## [33] stats4_4.4.1 fansi_1.0.6
## [35] e1071_1.7-16 colorspace_2.1-1
## [37] ggplot2_3.5.1 scales_1.3.0
## [39] MASS_7.3-61 SummarizedExperiment_1.36.0
## [41] cli_3.6.3 rmarkdown_2.28
## [43] crayon_1.5.3 generics_0.1.3
## [45] httr_1.4.7 cachem_1.1.0
## [47] proxy_0.4-27 zlibbioc_1.52.0
## [49] splines_4.4.1 parallel_4.4.1
## [51] BiocManager_1.30.25 XVector_0.46.0
## [53] matrixStats_1.4.1 vctrs_0.6.5
## [55] Matrix_1.7-1 jsonlite_1.8.9
## [57] carData_3.0-5 pwr_1.3-0
## [59] car_3.1-3 IRanges_2.41.0
## [61] S4Vectors_0.44.0 minet_3.65.0
## [63] Formula_1.2-5 maketools_1.3.1
## [65] limma_3.63.0 plotly_4.10.4
## [67] tidyr_1.3.1 jquerylib_0.1.4
## [69] snow_0.4-4 glue_1.8.0
## [71] gtable_0.3.6 GenomeInfoDb_1.43.0
## [73] GenomicRanges_1.59.0 UCSC.utils_1.2.0
## [75] munsell_0.5.1 tibble_3.2.1
## [77] pillar_1.9.0 htmltools_0.5.8.1
## [79] GenomeInfoDbData_1.2.13 R6_2.5.1
## [81] evaluate_1.0.1 lattice_0.22-6
## [83] Biobase_2.67.0 pheatmap_1.0.12
## [85] segmented_2.1-3 RedeR_3.2.0
## [87] bslib_0.8.0 class_7.3-22
## [89] SparseArray_1.6.0 nlme_3.1-166
## [91] xfun_0.48 MatrixGenerics_1.19.0
## [93] buildtools_1.0.0 pkgconfig_2.0.3