TDbasedUFEadv is an advanced extension of TDbasedUFE (Taguchi 2023), so please become familiar with the contents of TDbasedUFE before trying this package.
Since the publication of the book (Taguchi 2020) describing the methodology, I have published numerous papers using this method. Despite that, only a very limited number of researchers have used it, possibly because of unfamiliarity with the mathematical concept underlying the methodology, tensors. I therefore decided to develop packages with which users can apply the methods without detailed knowledge of tensors.
The main purpose of this package is to select features (typically genes) based on the provided omics data sets. In this sense, its functionality is apparently similar to that of DESeq2 or limma, which identify differentially expressed genes. In contrast to those supervised methods, the present method is unsupervised: it shows users what kinds of profiles are observed across samples, and users are advised to choose the profiles of interest, based on which features are selected. In addition, the present method is suited to data with a small number of samples and a large number of features. Since this situation is very common in genomics, the present method should be well suited to genomics, although it does not look like a method specific to genomic science. In fact, we have published a number of papers using the methods implemented in the present package. I hope that you can make use of this package for your own research.
Suppose we have two omics profiles $$ x_{ij} \in \mathbb{R}^{N \times M} \\ x_{ik} \in \mathbb{R}^{N \times K} $$ that represent the values of the $i$th feature of the $j$th and $k$th objects, respectively (i.e., these two profiles share the features). In this case, we generate a tensor, $x_{ijk}$, as the product of the two profiles: $$ x_{ijk} = x_{ij} x_{ik} \in \mathbb{R}^{N \times M \times K} $$
HOSVD is then applied to $x_{ijk}$ to get $$ x_{ijk} = \sum_{\ell_1} \sum_{\ell_2} \sum_{\ell_3} G(\ell_1 \ell_2 \ell_3) u_{\ell_1 i} u_{\ell_2 j} u_{\ell_3 k} $$ After that, we can follow the standard procedure to select the features, $i$s, associated with the desired properties represented by the selected singular value vectors, $u_{\ell_2 j}$ and $u_{\ell_3 k}$, attributed to the objects, $j$s and $k$s.
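The construction above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the sizes and the random matrices `x_ij` and `x_ik` are made up, the helper `mode_prod` is a hypothetical name, and HOSVD is computed here from the SVD of each unfolding rather than with the functions of this package. In the code, `U[[1]][i, l1]` plays the role of $u_{\ell_1 i}$, and likewise for the other two modes.

```r
set.seed(1)
N <- 20; M <- 4; K <- 3                     # hypothetical sizes
x_ij <- matrix(rnorm(N * M), N, M)          # profile 1: features i x objects j
x_ik <- matrix(rnorm(N * K), N, K)          # profile 2: features i x objects k

# Product tensor x_ijk = x_ij * x_ik (the features i are shared)
x_ijk <- array(0, dim = c(N, M, K))
for (k in seq_len(K)) x_ijk[, , k] <- x_ij * x_ik[, k]

# Multiply a matrix A along one mode of a 3-way array (hypothetical helper)
mode_prod <- function(x, A, mode) {
  d <- dim(x)
  perm <- c(mode, setdiff(1:3, mode))
  y <- A %*% matrix(aperm(x, perm), nrow = d[mode])
  aperm(array(y, c(nrow(A), d[-mode])), order(perm))
}

# HOSVD: left singular vectors of each unfolding give the factor matrices
U <- lapply(1:3, function(m) {
  svd(matrix(aperm(x_ijk, c(m, setdiff(1:3, m))), nrow = dim(x_ijk)[m]))$u
})

# Core tensor G(l1, l2, l3): project the tensor onto the factor matrices
G <- x_ijk
for (m in 1:3) G <- mode_prod(G, t(U[[m]]), m)

# Reconstruct x_ijk from G and the factor matrices to check the decomposition
x_hat <- G
for (m in 1:3) x_hat <- mode_prod(x_hat, U[[m]], m)
recon_error <- max(abs(x_hat - x_ijk))
```

If the decomposition is correct, `recon_error` should be at the level of floating-point rounding. The columns of `U[[2]]` and `U[[3]]` are the candidates among which the vectors of interest would be chosen, and the corresponding columns of `U[[1]]` are the ones used for selecting features.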
In the above, we dealt with the full tensor. It is often difficult to handle the full tensor, since it is as large as $N \times M \times K$. In this case, we can take an alternative approach. To this end, we define a reduced matrix by taking the partial summation $$ x_{jk} = \sum_i x_{ijk} $$ and apply SVD to $x_{jk}$ as $$ x_{jk} = \sum_\ell \lambda_\ell u_{\ell j} v_{\ell k} $$ and compute the singular value vectors attributed to features as $$ u^{(j)}_{\ell i} = \sum_j u_{\ell j} x_{ij} \\ u^{(k)}_{\ell i} = \sum_k v_{\ell k} x_{ik} $$ In this case, singular value vectors are attributed separately to the features associated with objects $j$ and $k$, respectively.
Feature selection can then be performed using the feature-level singular value vectors that correspond to the selected singular value vectors attributed to samples $j$ and $k$.
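A minimal base R sketch of this reduced-matrix route follows. The sizes and the random inputs are assumptions made for illustration, and the variable names (`u_feat_j`, `u_feat_k`) are hypothetical; this is not the implementation used by the package. The point it illustrates is that $x_{jk}$ is a plain cross-product of the two profiles, so the $N \times M \times K$ tensor never has to be built.

```r
set.seed(1)
N <- 1000; M <- 6; K <- 5                   # hypothetical sizes, N much larger than M and K
x_ij <- matrix(rnorm(N * M), N, M)          # profile 1: features i x objects j
x_ik <- matrix(rnorm(N * K), N, K)          # profile 2: features i x objects k

# Reduced matrix x_jk = sum_i x_ij * x_ik, computed without forming the tensor
x_jk <- crossprod(x_ij, x_ik)               # M x K

# SVD of the reduced matrix: x_jk = sum_l lambda_l * u_lj * v_lk
sv <- svd(x_jk)

# Singular value vectors attributed to features, one set per profile:
# column l of u_feat_j holds u^(j)_{l i}, column l of u_feat_k holds u^(k)_{l i}
u_feat_j <- x_ij %*% sv$u                   # N x min(M, K)
u_feat_k <- x_ik %*% sv$v                   # N x min(M, K)
```

Here the columns of `sv$u` and `sv$v` are the vectors attributed to objects $j$ and $k$. Once a component $\ell$ of interest has been chosen by inspecting them, the matching columns of `u_feat_j` and `u_feat_k` are the quantities on which feature selection would be based.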
In the case where samples, rather than features, are shared between the two omics data sets, we can do something similar. $$ x_{ij} \in \mathbb{R}^{N \times M} \\ x_{kj} \in \mathbb{R}^{K \times M} $$
In this case, we generate a tensor, $x_{ijk}$, as the product of the two profiles: $$ x_{ijk} = x_{ij} x_{kj} \in \mathbb{R}^{N \times M \times K} $$
HOSVD is then applied to $x_{ijk}$ to get $$ x_{ijk} = \sum_{\ell_1} \sum_{\ell_2} \sum_{\ell_3} G(\ell_1 \ell_2 \ell_3) u_{\ell_1 i} u_{\ell_2 j} u_{\ell_3 k} $$ After that, we can follow the standard procedure to select the features, $i$s and $k$s, associated with the desired properties represented by the selected singular value vectors, $u_{\ell_2 j}$, attributed to the objects, $j$s.
In the above, we dealt with the full tensor. It is often difficult to handle the full tensor, since it is as large as $N \times M \times K$. In this case, we can take an alternative approach. To this end, we define a reduced matrix by taking the partial summation $$ x_{ik} = \sum_j x_{ijk} $$ and apply SVD to $x_{ik}$ as $$ x_{ik} = \sum_\ell \lambda_\ell u_{\ell i} v_{\ell k} $$ The $i$s and $k$s are selected with $u_{\ell i}$ and $v_{\ell k}$, respectively. Singular value vectors attributed to samples can be computed as $$ u^{(i)}_{\ell j} = \sum_i u_{\ell i} x_{ij} \\ u^{(k)}_{\ell j} = \sum_k v_{\ell k} x_{kj} $$
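The shared-sample case can be sketched in base R as follows. Sizes, random inputs, and variable names are assumptions for illustration only, and the package's own functions are not used. The full tensor is built here solely to confirm, on a small example, that summing it over $j$ gives the same matrix as the direct product of the two profiles.

```r
set.seed(1)
N <- 30; K <- 20; M <- 5                    # hypothetical sizes
x_ij <- matrix(rnorm(N * M), N, M)          # omics 1: features i x samples j
x_kj <- matrix(rnorm(K * M), K, M)          # omics 2: features k x samples j

# Reduced matrix x_ik = sum_j x_ij * x_kj, without forming the tensor
x_ik <- tcrossprod(x_ij, x_kj)              # N x K

# Small-scale check against the full tensor x_ijk = x_ij * x_kj
x_ijk <- array(0, dim = c(N, M, K))
for (k in seq_len(K)) x_ijk[, , k] <- sweep(x_ij, 2, x_kj[k, ], `*`)
same_as_tensor <- isTRUE(all.equal(apply(x_ijk, c(1, 3), sum), x_ik))

# SVD of the reduced matrix: x_ik = sum_l lambda_l * u_li * v_lk
sv <- svd(x_ik)

# Singular value vectors attributed to samples:
# column l of u_samp_i holds u^(i)_{l j}, column l of u_samp_k holds u^(k)_{l j}
u_samp_i <- crossprod(x_ij, sv$u)           # M x min(N, K)
u_samp_k <- crossprod(x_kj, sv$v)           # M x min(N, K)
```

Because $x_{ik}$ is a product of an $N \times M$ and an $M \times K$ matrix, it has at most $M$ non-zero singular values, so only the first $M$ components carry information. After a component is chosen by inspecting `u_samp_i` and `u_samp_k`, the matching columns of `sv$u` and `sv$v` are the ones used to select the $i$s and $k$s.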
Here we would like to propose an alternative strategy to integrate multiple tensors using projection with SVD.
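Suppose we have multiomics data as $x_{i_k j k} \in \mathbb{R}^{N_k \times M \times K}$ for the $i_k$th feature of the $j$th sample in the $k$th omics data set.
In order to bundle them into a tensor, we apply SVD to them as $$ x_{i_k j k} = \sum_\ell \lambda^{(k)}_\ell u_{\ell i_k} v^{(k)}_{\ell j} $$
Then we apply HOSVD to $v^{(k)}_{\ell j}$ as $$ v^{(k)}_{\ell j} = \sum_{\ell_1} \sum_{\ell_2} \sum_{\ell_3} G(\ell_1 \ell_2 \ell_3) u_{\ell_1 \ell} u_{\ell_2 j} u_{\ell_3 k} $$ After identifying the $u_{\ell_2 j}$ and $u_{\ell_3 k}$ of interest, we can compute $u_{\ell_2 i_k}$ as $$ u_{\ell_2 i_k} = \sum_j u_{\ell_2 j} x_{i_k j k} $$ Then the $i_k$s can be selected as usual.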
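The projection strategy above can be sketched in base R as follows. Everything concrete here is an assumption made for illustration: the number of layers, the feature counts `N_k`, the number `L` of components kept per layer, the random inputs, and the helper name `unfold`. The HOSVD factor matrices are obtained from the SVD of each unfolding, which is not necessarily how the package computes them.

```r
set.seed(1)
M <- 8                                      # samples shared by all omics layers
N_k <- c(50, 30, 40)                        # hypothetical feature counts per layer
K <- length(N_k)
x <- lapply(N_k, function(n) matrix(rnorm(n * M), n, M))  # x[[k]] is N_k x M

L <- 4                                      # components kept per layer (assumption)

# SVD of each layer; stack the sample-side vectors v^(k)_{l j} into an L x M x K array
v <- array(0, dim = c(L, M, K))
for (k in seq_len(K)) v[, , k] <- t(svd(x[[k]])$v[, seq_len(L)])

# Mode-n unfolding of a 3-way array (hypothetical helper)
unfold <- function(a, mode) {
  matrix(aperm(a, c(mode, setdiff(1:3, mode))), nrow = dim(a)[mode])
}

# HOSVD factor matrices for the sample mode and the omics mode
U_j <- svd(unfold(v, 2))$u                  # M x M, column l2 holds u_{l2 j}
U_k <- svd(unfold(v, 3))$u                  # K x K, column l3 holds u_{l3 k}

# Project each layer back: u_{l2 i_k} = sum_j u_{l2 j} * x_{i_k j k}
u_feat <- lapply(x, function(xk) xk %*% U_j)   # u_feat[[k]] is N_k x M
```

Each layer keeps its own number of rows in `u_feat[[k]]`, so features can be selected separately per omics layer from the column that matches the chosen $u_{\ell_2 j}$.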
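Suppose we have multiple sets of samples as $x_{i j_k k} \in \mathbb{R}^{N \times M_k \times K}$
In order to bundle them into a tensor, we apply SVD to $x_{i j_k k}$ as $$ x_{i j_k k} = \sum_\ell \lambda^{(k)}_\ell u^{(k)}_{\ell i} u_{\ell j_k} $$ HOSVD is applied to $u^{(k)}_{\ell i}$ as $$ u^{(k)}_{\ell i} = \sum_{\ell_1} \sum_{\ell_2} \sum_{\ell_3} G(\ell_1 \ell_2 \ell_3) u_{\ell_1 \ell} u_{\ell_2 i} u_{\ell_3 k} $$ and $u_{\ell_2 j_k}$ is generated as $$ u_{\ell_2 j_k} = \sum_i u_{\ell_2 i} x_{i j_k k} $$
After identifying the $u_{\ell_2 j_k}$s of interest, we select the $i$s using $u_{\ell_2 i}$.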
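The same idea with shared features and different sample sets can be sketched in base R as follows. The sample counts `M_k`, the number `L` of components kept per set, the random inputs, and the helper name `unfold` are all assumptions for illustration, and the code does not call the package's own functions.

```r
set.seed(1)
N <- 40                                     # features shared by all sample sets
M_k <- c(6, 9, 7)                           # hypothetical sample counts per set
K <- length(M_k)
x <- lapply(M_k, function(m) matrix(rnorm(N * m), N, m))  # x[[k]] is N x M_k

L <- 4                                      # components kept per set (assumption)

# SVD of each set; stack the feature-side vectors u^(k)_{l i} into an L x N x K array
u <- array(0, dim = c(L, N, K))
for (k in seq_len(K)) u[, , k] <- t(svd(x[[k]])$u[, seq_len(L)])

# Mode-n unfolding of a 3-way array (hypothetical helper)
unfold <- function(a, mode) {
  matrix(aperm(a, c(mode, setdiff(1:3, mode))), nrow = dim(a)[mode])
}

# HOSVD factor matrix for the feature mode: column l2 holds u_{l2 i}
U_i <- svd(unfold(u, 2))$u                  # N x (L * K)

# Sample-level vectors per set: u_{l2 j_k} = sum_i u_{l2 i} * x_{i j_k k}
u_samp <- lapply(x, function(xk) crossprod(xk, U_i))   # u_samp[[k]] is M_k x (L * K)
```

The matrices in `u_samp` are what one would inspect, set by set, to decide which $\ell_2$ is of interest; the matching column of `U_i` is then used to select the $i$s.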
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] STRINGdb_2.19.0 enrichR_3.2
#> [3] RTCGA.clinical_20151101.36.0 RTCGA.rnaseq_20151101.36.0
#> [5] RTCGA_1.37.0 enrichplot_1.27.1
#> [7] DOSE_4.1.0 TDbasedUFEadv_1.7.0
#> [9] TDbasedUFE_1.7.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] rTensor_1.4.8 splines_4.4.2 later_1.3.2
#> [4] bitops_1.0-9 ggplotify_0.1.2 tibble_3.2.1
#> [7] R.oo_1.27.0 XML_3.99-0.17 lifecycle_1.0.4
#> [10] rstatix_0.7.2 lattice_0.22-6 backports_1.5.0
#> [13] magrittr_2.0.3 sass_0.4.9 rmarkdown_2.29
#> [16] jquerylib_0.1.4 yaml_2.3.10 plotrix_3.8-4
#> [19] httpuv_1.6.15 ggtangle_0.0.4 cowplot_1.1.3
#> [22] DBI_1.2.3 buildtools_1.0.0 RColorBrewer_1.1-3
#> [25] abind_1.4-8 MOFAdata_1.22.0 zlibbioc_1.52.0
#> [28] rvest_1.0.4 GenomicRanges_1.59.1 purrr_1.0.2
#> [31] R.utils_2.12.3 BiocGenerics_0.53.3 RCurl_1.98-1.16
#> [34] hash_2.2.6.3 yulab.utils_0.1.8 WriteXLS_6.7.0
#> [37] GenomeInfoDbData_1.2.13 IRanges_2.41.1 KMsurv_0.1-5
#> [40] S4Vectors_0.45.2 ggrepel_0.9.6 tidytree_0.4.6
#> [43] maketools_1.3.1 proto_1.0.0 codetools_0.2-20
#> [46] xml2_1.3.6 tximportData_1.34.0 tidyselect_1.2.1
#> [49] aplot_0.2.3 UCSC.utils_1.3.0 farver_2.1.2
#> [52] viridis_0.6.5 stats4_4.4.2 jsonlite_1.8.9
#> [55] Formula_1.2-5 survival_3.7-0 tools_4.4.2
#> [58] chron_2.3-61 treeio_1.31.0 Rcpp_1.0.13-1
#> [61] glue_1.8.0 gridExtra_2.3 xfun_0.49
#> [64] qvalue_2.39.0 ggthemes_5.1.0 GenomeInfoDb_1.43.1
#> [67] dplyr_1.1.4 withr_3.0.2 BiocManager_1.30.25
#> [70] fastmap_1.2.0 fansi_1.0.6 caTools_1.18.3
#> [73] digest_0.6.37 R6_2.5.1 mime_0.12
#> [76] gridGraphics_0.5-1 colorspace_2.1-1 GO.db_3.20.0
#> [79] gtools_3.9.5 RSQLite_2.3.8 R.methodsS3_1.8.2
#> [82] utf8_1.2.4 tidyr_1.3.1 generics_0.1.3
#> [85] data.table_1.16.2 httr_1.4.7 sqldf_0.4-11
#> [88] pkgconfig_2.0.3 gtable_0.3.6 blob_1.2.4
#> [91] XVector_0.47.0 sys_3.4.3 survMisc_0.5.6
#> [94] htmltools_0.5.8.1 carData_3.0-5 fgsea_1.33.0
#> [97] scales_1.3.0 Biobase_2.67.0 png_0.1-8
#> [100] ggfun_0.1.7 knitr_1.49 km.ci_0.5-6
#> [103] tzdb_0.4.0 reshape2_1.4.4 rjson_0.2.23
#> [106] nlme_3.1-166 curl_6.0.1 cachem_1.1.0
#> [109] zoo_1.8-12 stringr_1.5.1 KernSmooth_2.23-24
#> [112] parallel_4.4.2 AnnotationDbi_1.69.0 pillar_1.9.0
#> [115] grid_4.4.2 vctrs_0.6.5 gplots_3.2.0
#> [118] promises_1.3.0 ggpubr_0.6.0 car_3.1-3
#> [121] xtable_1.8-4 tximport_1.35.0 evaluate_1.0.1
#> [124] readr_2.1.5 gsubfn_0.7 cli_3.6.3
#> [127] compiler_4.4.2 rlang_1.1.4 crayon_1.5.3
#> [130] ggsignif_0.6.4 labeling_0.4.3 survminer_0.5.0
#> [133] plyr_1.8.9 fs_1.6.5 stringi_1.8.4
#> [136] viridisLite_0.4.2 BiocParallel_1.41.0 assertthat_0.2.1
#> [139] munsell_0.5.1 Biostrings_2.75.1 lazyeval_0.2.2
#> [142] GOSemSim_2.33.0 Matrix_1.7-1 hms_1.1.3
#> [145] patchwork_1.3.0 bit64_4.5.2 ggplot2_3.5.1
#> [148] KEGGREST_1.47.0 shiny_1.9.1 igraph_2.1.1
#> [151] broom_1.0.7 memoise_2.0.1 bslib_0.8.0
#> [154] ggtree_3.15.0 fastmatch_1.1-4 bit_4.5.0
#> [157] ape_5.8