#library(TDbasedUFEadv)

Introduction

TDbasedUFEadv is an advanced extension of TDbasedUFE (Taguchi 2023), so please become familiar with the contents of TDbasedUFE before trying this package.

Motivations

Since the publication of the book (Taguchi 2020) describing the methodology, I have published numerous papers using this method. Nevertheless, only a very limited number of researchers have used it, possibly because of unfamiliarity with the mathematical concept it relies on, tensors. I therefore decided to develop packages with which users can apply the methods without detailed knowledge of tensors.

Integrated analysis of two omics data sets

When features are shared.

Full tensor

Tensor decomposition towards tensor generated from two matrices

Suppose we have two omics profiles
$$ x_{ij} \in \mathbb{R}^{N \times M} \\ x_{ik} \in \mathbb{R}^{N \times K} $$
that represent the values of the $i$th feature for the $j$th and $k$th objects, respectively (i.e., the two profiles share the features). In this case, we generate a tensor, $x_{ijk}$, as the product of the two profiles:
$$ x_{ijk} = x_{ij} x_{ik} \in \mathbb{R}^{N \times M \times K} $$

HOSVD is then applied to $x_{ijk}$ to get
$$ x_{ijk} = \sum_{\ell_1} \sum_{\ell_2} \sum_{\ell_3} G(\ell_1 \ell_2 \ell_3) u_{\ell_1 i} u_{\ell_2 j} u_{\ell_3 k} $$
After that, we can follow the standard procedure to select the features, $i$s, associated with the desired properties represented by the selected singular value vectors, $u_{\ell_2 j}$ and $u_{\ell_3 k}$, attributed to the objects, $j$s and $k$s.
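The construction above can be sketched in base R. This is only an illustrative sketch, not the package's own implementation: the toy sizes, the random data, and the helper names `unfold` and `mode_prod` are assumptions introduced here, and HOSVD is computed by hand from the SVD of each mode unfolding.

```r
set.seed(1)
N <- 6; M <- 4; K <- 3                    # toy sizes, chosen for illustration
X1 <- matrix(rnorm(N * M), N, M)          # x_ij: N features x M objects
X2 <- matrix(rnorm(N * K), N, K)          # x_ik: N features x K objects

# Build x_ijk = x_ij * x_ik, one feature slice at a time
Z <- array(0, dim = c(N, M, K))
for (i in seq_len(N)) Z[i, , ] <- outer(X1[i, ], X2[i, ])

# Mode-n unfolding: move mode n to the front and flatten the other modes
# (hypothetical helper)
unfold <- function(A, n) {
  d <- dim(A)
  matrix(aperm(A, c(n, seq_along(d)[-n])), nrow = d[n])
}

# Mode-n product of tensor A with matrix B, where ncol(B) == dim(A)[n]
# (hypothetical helper)
mode_prod <- function(A, B, n) {
  d <- dim(A)
  perm <- c(n, seq_along(d)[-n])
  out <- array(B %*% unfold(A, n), dim = c(nrow(B), d[-n]))
  aperm(out, order(perm))                 # restore the original mode order
}

# Singular value vectors: left singular vectors of each unfolding
# U[[1]] ~ u_{l1 i}, U[[2]] ~ u_{l2 j}, U[[3]] ~ u_{l3 k} (stored as columns)
U <- lapply(1:3, function(n) svd(unfold(Z, n))$u)

# Core tensor G(l1 l2 l3): project every mode onto its singular vectors
G <- Z
for (n in 1:3) G <- mode_prod(G, t(U[[n]]), n)

# Reconstruct the tensor from G and U to check the decomposition
Z_hat <- G
for (n in 1:3) Z_hat <- mode_prod(Z_hat, U[[n]], n)
max(abs(Z_hat - Z))                       # should be close to zero
```

Columns of `U[[2]]` and `U[[3]]` would be inspected for the desired dependence on the objects, and the matching column of `U[[1]]` would then be used for feature selection.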

Matrix generated by partial summation

After partial summation of tensor

In the above, we dealt with the full tensor. The full tensor is often difficult to handle, since it is as large as $N \times M \times K$. In this case, we can take an alternative approach: we define a reduced matrix by taking the partial summation
$$ x_{jk} = \sum_i x_{ijk} $$
and apply SVD to $x_{jk}$ as
$$ x_{jk} = \sum_\ell \lambda_\ell u_{\ell j} v_{\ell k} $$
The singular value vectors attributed to features are then computed as
$$ u^{(j)}_{\ell i} = \sum_j u_{\ell j} x_{ij} \\ u^{(k)}_{\ell i} = \sum_k v_{\ell k} x_{ik} $$
In this case, singular value vectors are attributed to the features separately for the objects $j$ and $k$.

Features can then be selected using the feature-attributed singular value vectors, $u^{(j)}_{\ell i}$ and $u^{(k)}_{\ell i}$, that correspond to the selected singular value vectors attributed to the samples, $u_{\ell j}$ and $v_{\ell k}$.
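As a minimal sketch of the partial-summation route, assuming toy random matrices and base R only (the variable names are illustrative and not part of the package), the reduced matrix is simply a cross product, so the full tensor never has to be stored:

```r
set.seed(1)
N <- 6; M <- 4; K <- 3                    # toy sizes, chosen for illustration
X1 <- matrix(rnorm(N * M), N, M)          # x_ij
X2 <- matrix(rnorm(N * K), N, K)          # x_ik

# x_jk = sum_i x_ij x_ik, i.e. t(X1) %*% X2 (M x K)
Xjk <- crossprod(X1, X2)

# SVD of the reduced matrix: s$d ~ lambda_l, s$u ~ u_{lj}, s$v ~ v_{lk}
s <- svd(Xjk)

# Feature-attributed singular value vectors, one set per profile
U_j <- X1 %*% s$u                         # u^{(j)}_{li}: N x L
U_k <- X2 %*% s$v                         # u^{(k)}_{li}: N x L
dim(U_j)
```

Here the number of components equals `min(M, K)`, and column `l` of `U_j` and `U_k` is paired with column `l` of `s$u` and `s$v`.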

When samples are shared.

Full tensor

Tensor decomposition towards tensor generated from two matrices

When samples, rather than features, are shared between the two omics data sets, we can proceed in a similar way. Suppose we have $$ x_{ij} \in \mathbb{R}^{N \times M} \\ x_{kj} \in \mathbb{R}^{K \times M} $$

In this case, we generate a tensor, $x_{ijk}$, as the product of the two profiles:
$$ x_{ijk} = x_{ij} x_{kj} \in \mathbb{R}^{N \times M \times K} $$

HOSVD is then applied to $x_{ijk}$ to get
$$ x_{ijk} = \sum_{\ell_1} \sum_{\ell_2} \sum_{\ell_3} G(\ell_1 \ell_2 \ell_3) u_{\ell_1 i} u_{\ell_2 j} u_{\ell_3 k} $$
After that, we can follow the standard procedure to select the features, $i$s and $k$s, associated with the desired properties represented by the selected singular value vectors, $u_{\ell_2 j}$, attributed to the objects, $j$s.

Matrix generated from partial summation

After partial summation of tensor

In the above, we dealt with the full tensor. The full tensor is often difficult to handle, since it is as large as $N \times M \times K$. In this case, we can take an alternative approach: we define a reduced matrix by taking the partial summation
$$ x_{ik} = \sum_j x_{ijk} $$
and apply SVD to $x_{ik}$ as
$$ x_{ik} = \sum_\ell \lambda_\ell u_{\ell i} v_{\ell k} $$
The features, $i$s and $k$s, are selected with $u_{\ell i}$ and $v_{\ell k}$, respectively. Singular value vectors attributed to samples can be computed as
$$ u^{(i)}_{\ell j} = \sum_i u_{\ell i} x_{ij} \\ u^{(k)}_{\ell j} = \sum_k v_{\ell k} x_{kj} $$
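The following base R sketch, which uses toy random data and illustrative variable names that are not part of the package, checks that the partial summation over the shared samples equals a plain matrix product, and then computes the sample-attributed singular value vectors from it:

```r
set.seed(2)
N <- 5; K <- 4; M <- 3                    # toy sizes, chosen for illustration
X1 <- matrix(rnorm(N * M), N, M)          # x_ij: N features x M samples
X2 <- matrix(rnorm(K * M), K, M)          # x_kj: K features x M samples

# Full tensor x_ijk = x_ij * x_kj, stored as N x M x K
Z <- array(0, dim = c(N, M, K))
for (j in seq_len(M)) Z[, j, ] <- outer(X1[, j], X2[, j])

# Partial summation over j, computed from the tensor and directly
Xik_from_tensor <- apply(Z, c(1, 3), sum) # N x K
Xik <- tcrossprod(X1, X2)                 # X1 %*% t(X2), no tensor needed
max(abs(Xik_from_tensor - Xik))           # should be close to zero

# SVD of the reduced matrix: s$u ~ u_{li}, s$v ~ v_{lk}
s <- svd(Xik)

# Sample-attributed singular value vectors
V_i <- crossprod(X1, s$u)                 # u^{(i)}_{lj}: M x L
V_k <- crossprod(X2, s$v)                 # u^{(k)}_{lj}: M x L
```

Only the `N x K` matrix `Xik` is needed for the decomposition, which is the point of the partial summation when $M$ is large.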

Integrated analysis using projection

Here we would like to propose an alternative strategy to integrate multiple tensors using projection with SVD.

When samples are shared.

Projection when samples are shared

Suppose we have multiomics data $x_{i_k j k} \in \mathbb{R}^{N_k \times M \times K}$ for the $i_k$th feature of the $j$th sample in the $k$th omics data set.

In order to bundle them into a tensor, we apply SVD to each of them as
$$ x_{i_k j k} = \sum_\ell \lambda^{(k)}_\ell u_{\ell i_k} v^{(k)}_{\ell j} $$

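Then HOSVD is applied to $v^{(k)}_{\ell j}$ as
$$ v^{(k)}_{\ell j} = \sum_{\ell_1} \sum_{\ell_2} \sum_{\ell_3} G(\ell_1 \ell_2 \ell_3) u_{\ell_1 \ell} u_{\ell_2 j} u_{\ell_3 k} $$
After identifying the $u_{\ell_2 j}$ and $u_{\ell_3 k}$ of interest, we can compute the singular value vectors attributed to features as
$$ u_{\ell_2 i_k} = \sum_j u_{\ell_2 j} x_{i_k j k} $$
Then the $i_k$s can be selected as usual.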
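A minimal base R sketch of this projection is given below. It rests on several assumptions made only for illustration: random toy data, three omics layers, a fixed number `L` of singular value vectors kept per layer, and a hypothetical helper `unfold`. Only the sample mode of the HOSVD is computed, since that is the one used for the projection.

```r
set.seed(3)
M <- 5                                    # samples shared by all layers
L <- 3                                    # vectors kept per layer (assumption)
N_k <- c(8, 6, 7)                         # feature counts differ per layer
X <- lapply(N_k, function(n) matrix(rnorm(n * M), n, M))  # x_{i_k j k}
K <- length(X)

# SVD per layer; stack the sample vectors v_{lj}^{(k)} into an L x M x K array
V <- array(0, dim = c(L, M, K))
for (k in seq_len(K)) V[, , k] <- t(svd(X[[k]])$v[, seq_len(L)])

# Mode-n unfolding (hypothetical helper)
unfold <- function(A, n) {
  d <- dim(A)
  matrix(aperm(A, c(n, seq_along(d)[-n])), nrow = d[n])
}

# HOSVD sample mode: columns of U2 are u_{l2 j}
U2 <- svd(unfold(V, 2))$u                 # M x M

# Project each layer back: u_{l2 i_k} = sum_j u_{l2 j} x_{i_k j k}
U_feat <- lapply(X, function(x) x %*% U2) # list of N_k x M matrices
sapply(U_feat, dim)
```

Each element of `U_feat` has one row per feature of its own layer, so layers with different numbers of features can share the same sample-attributed vectors.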

When features are shared.

Projection when features are shared

Suppose we have multiple sets of samples as $x_{i j_k k} \in \mathbb{R}^{N \times M_k \times K}$ for the $i$th feature of the $j_k$th sample in the $k$th data set.

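In order to bundle them into a tensor, we apply SVD to $x_{i j_k k}$ as
$$ x_{i j_k k} = \sum_\ell \lambda^{(k)}_\ell u^{(k)}_{\ell i} u_{\ell j_k} $$
HOSVD is applied to $u^{(k)}_{\ell i}$ as
$$ u^{(k)}_{\ell i} = \sum_{\ell_1} \sum_{\ell_2} \sum_{\ell_3} G(\ell_1 \ell_2 \ell_3) u_{\ell_1 \ell} u_{\ell_2 i} u_{\ell_3 k} $$
and $u_{\ell_2 j_k}$ is generated as
$$ u_{\ell_2 j_k} = \sum_i u_{\ell_2 i} x_{i j_k k} $$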

After identifying the $u_{\ell_2 j_k}$s of interest, we select the features, $i$s, using $u_{\ell_2 i}$.
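The shared-feature case can be sketched the same way in base R. As before, the random toy data, the number `L` of vectors kept per data set, and the helper `unfold` are assumptions made for illustration and not the package's interface.

```r
set.seed(4)
N <- 10                                   # features shared by all data sets
L <- 2                                    # vectors kept per data set (assumption)
M_k <- c(4, 5, 6)                         # sample counts differ per data set
X <- lapply(M_k, function(m) matrix(rnorm(N * m), N, m))  # x_{i j_k k}
K <- length(X)

# SVD per data set; stack the feature vectors u_{li}^{(k)} into L x N x K
Uarr <- array(0, dim = c(L, N, K))
for (k in seq_len(K)) Uarr[, , k] <- t(svd(X[[k]])$u[, seq_len(L)])

# Mode-n unfolding (hypothetical helper)
unfold <- function(A, n) {
  d <- dim(A)
  matrix(aperm(A, c(n, seq_along(d)[-n])), nrow = d[n])
}

# HOSVD feature mode: columns of U2 are u_{l2 i}
# The unfolding is N x (L * K), so at most L * K vectors are returned
U2 <- svd(unfold(Uarr, 2))$u              # N x min(N, L * K)

# Sample-side vectors: u_{l2 j_k} = sum_i u_{l2 i} x_{i j_k k}
U_samp <- lapply(X, function(x) crossprod(x, U2))  # list of M_k x ncol(U2)
dim(U2)
```

The columns of each `U_samp[[k]]` would be inspected to find the components of interest, and the matching column of `U2` is then the one used for selecting features.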

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] STRINGdb_2.19.0              enrichR_3.4                 
#>  [3] RTCGA.clinical_20151101.36.0 RTCGA.rnaseq_20151101.36.0  
#>  [5] RTCGA_1.37.0                 enrichplot_1.27.4           
#>  [7] DOSE_4.1.0                   TDbasedUFEadv_1.7.0         
#>  [9] TDbasedUFE_1.7.0             BiocStyle_2.35.0            
#> 
#> loaded via a namespace (and not attached):
#>   [1] rTensor_1.4.8           splines_4.4.2           later_1.4.1            
#>   [4] bitops_1.0-9            ggplotify_0.1.2         tibble_3.2.1           
#>   [7] R.oo_1.27.0             XML_3.99-0.18           lifecycle_1.0.4        
#>  [10] rstatix_0.7.2           lattice_0.22-6          backports_1.5.0        
#>  [13] magrittr_2.0.3          sass_0.4.9              rmarkdown_2.29         
#>  [16] jquerylib_0.1.4         yaml_2.3.10             plotrix_3.8-4          
#>  [19] httpuv_1.6.15           ggtangle_0.0.6          cowplot_1.1.3          
#>  [22] DBI_1.2.3               buildtools_1.0.0        RColorBrewer_1.1-3     
#>  [25] abind_1.4-8             MOFAdata_1.22.0         rvest_1.0.4            
#>  [28] GenomicRanges_1.59.1    purrr_1.0.4             R.utils_2.12.3         
#>  [31] BiocGenerics_0.53.6     RCurl_1.98-1.16         hash_2.2.6.3           
#>  [34] yulab.utils_0.2.0       WriteXLS_6.7.0          GenomeInfoDbData_1.2.13
#>  [37] IRanges_2.41.3          KMsurv_0.1-5            S4Vectors_0.45.4       
#>  [40] ggrepel_0.9.6           tidytree_0.4.6          maketools_1.3.2        
#>  [43] proto_1.0.0             codetools_0.2-20        xml2_1.3.6             
#>  [46] tximportData_1.34.0     tidyselect_1.2.1        aplot_0.2.4            
#>  [49] UCSC.utils_1.3.1        farver_2.1.2            viridis_0.6.5          
#>  [52] stats4_4.4.2            jsonlite_1.8.9          Formula_1.2-5          
#>  [55] survival_3.8-3          tools_4.4.2             chron_2.3-62           
#>  [58] treeio_1.31.0           Rcpp_1.0.14             glue_1.8.0             
#>  [61] gridExtra_2.3           xfun_0.50               qvalue_2.39.0          
#>  [64] ggthemes_5.1.0          GenomeInfoDb_1.43.4     dplyr_1.1.4            
#>  [67] withr_3.0.2             BiocManager_1.30.25     fastmap_1.2.0          
#>  [70] caTools_1.18.3          digest_0.6.37           R6_2.6.1               
#>  [73] mime_0.12               gridGraphics_0.5-1      colorspace_2.1-1       
#>  [76] GO.db_3.20.0            gtools_3.9.5            RSQLite_2.3.9          
#>  [79] R.methodsS3_1.8.2       tidyr_1.3.1             generics_0.1.3         
#>  [82] data.table_1.16.4       httr_1.4.7              sqldf_0.4-11           
#>  [85] pkgconfig_2.0.3         gtable_0.3.6            blob_1.2.4             
#>  [88] XVector_0.47.2          sys_3.4.3               survMisc_0.5.6         
#>  [91] htmltools_0.5.8.1       carData_3.0-5           fgsea_1.33.2           
#>  [94] scales_1.3.0            Biobase_2.67.0          png_0.1-8              
#>  [97] ggfun_0.1.8             knitr_1.49              km.ci_0.5-6            
#> [100] tzdb_0.4.0              reshape2_1.4.4          rjson_0.2.23           
#> [103] nlme_3.1-167            curl_6.2.0              cachem_1.1.0           
#> [106] zoo_1.8-12              stringr_1.5.1           KernSmooth_2.23-26     
#> [109] parallel_4.4.2          AnnotationDbi_1.69.0    pillar_1.10.1          
#> [112] grid_4.4.2              vctrs_0.6.5             gplots_3.2.0           
#> [115] promises_1.3.2          ggpubr_0.6.0            car_3.1-3              
#> [118] xtable_1.8-4            tximport_1.35.0         evaluate_1.0.3         
#> [121] readr_2.1.5             gsubfn_0.7              cli_3.6.4              
#> [124] compiler_4.4.2          rlang_1.1.5             crayon_1.5.3           
#> [127] ggsignif_0.6.4          labeling_0.4.3          survminer_0.5.0        
#> [130] plyr_1.8.9              fs_1.6.5                stringi_1.8.4          
#> [133] viridisLite_0.4.2       BiocParallel_1.41.1     assertthat_0.2.1       
#> [136] munsell_0.5.1           Biostrings_2.75.3       lazyeval_0.2.2         
#> [139] GOSemSim_2.33.0         Matrix_1.7-2            hms_1.1.3              
#> [142] patchwork_1.3.0         bit64_4.6.0-1           ggplot2_3.5.1          
#> [145] KEGGREST_1.47.0         shiny_1.10.0            igraph_2.1.4           
#> [148] broom_1.0.7             memoise_2.0.1           bslib_0.9.0            
#> [151] ggtree_3.15.0           fastmatch_1.1-6         bit_4.5.0.1            
#> [154] ape_5.8-1

Taguchi, Y-H. 2020. Unsupervised Feature Extraction Applied to Bioinformatics. Springer International Publishing. https://doi.org/10.1007/978-3-030-22456-1.

Taguchi, Y-H. 2023. TDbasedUFE: Tensor Decomposition Based Unsupervised Feature Extraction. https://github.com/tagtag/TDbasedUFE.
