BiocSingular implements several useful DelayedMatrix backends for
dealing with principal components analysis (PCA). This vignette aims to
provide an overview of these classes and how they can be used in other
packages to improve efficiency prior to or after PCA.
DeferredMatrix class
This has now been moved to its own ScaledMatrix package. Check it out, it’s pretty cool.
LowRankMatrix class
Once a PCA is performed, it is occasionally desirable to obtain a
low-rank approximation of the input matrix by taking the cross-product
of the rotation vectors and PC scores. Naively doing so results in the
formation of a dense matrix as large as the input, which may be
prohibitively memory-consuming for a large data set. Instead, we can
construct a LowRankMatrix object that mimics the output of the
cross-product without actually computing it.
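library(BiocSingular)
library(Matrix)
a <- rsparsematrix(10000, 1000, density=0.01)
out <- runPCA(a, rank=5, BSPARAM=IrlbaParam(deferred=TRUE)) # deferring for speed.
# Mimics the cross-product of the rotation vectors and PC scores without computing it.
recon <- LowRankMatrix(out$rotation, out$x)
recon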
## <1000 x 10000> LowRankMatrix object of type "double":
## [,1] [,2] [,3] ... [,9999]
## [1,] -6.607941e-05 -3.309660e-03 -2.509974e-04 . 0.0006029616
## [2,] -7.794603e-04 -1.556315e-03 4.616200e-04 . 0.0028565633
## [3,] -5.427039e-04 9.868561e-04 -1.863603e-03 . -0.0031250885
## [4,] -3.779873e-03 -8.401240e-05 -2.049993e-03 . 0.0028219708
## [5,] 1.047063e-03 2.053309e-03 1.583250e-03 . -0.0010803753
## ... . . . . .
## [996,] 4.693951e-03 -1.163595e-02 6.549077e-03 . -7.600833e-03
## [997,] 2.929351e-04 -3.241344e-03 1.434765e-03 . 7.972958e-06
## [998,] -2.177646e-03 -9.159261e-04 -1.859687e-03 . -1.206412e-04
## [999,] -6.839735e-05 2.387144e-03 4.641709e-04 . 1.017343e-03
## [1000,] 2.363621e-03 7.373208e-04 5.527227e-04 . -5.160522e-03
## [,10000]
## [1,] 0.0049807597
## [2,] 0.0003094826
## [3,] 0.0045562989
## [4,] 0.0026209457
## [5,] -0.0034867483
## ... .
## [996,] 9.029046e-03
## [997,] 4.252964e-04
## [998,] 5.170815e-03
## [999,] -4.509402e-03
## [1000,] 4.960680e-04
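To make the memory argument concrete, the following is a minimal sketch in base R. It uses small random matrices as stand-ins for out$rotation and out$x; the names `rotation` and `scores` and the sizes are illustrative assumptions, not part of BiocSingular. It forms the dense product that a LowRankMatrix avoids and compares how many values each representation has to store.

```r
set.seed(100)

# Hypothetical stand-ins for the two factors of a rank-5 PCA.
rotation <- matrix(rnorm(200 * 5), nrow = 200)   # 200 x 5
scores <- matrix(rnorm(3000 * 5), nrow = 3000)   # 3000 x 5

# Naive approach: materialise the full dense product.
dense <- tcrossprod(rotation, scores)            # 200 x 3000
dim(dense)

# Number of stored values: two factors versus the dense product.
n.factors <- length(rotation) + length(scores)
n.dense <- length(dense)
n.dense / n.factors
```

In the example above, the printed object is 1000 x 10000, so a dense version would hold 10^7 values (about 80 MB as doubles), whereas the two rank-5 factors hold only 1000 * 5 + 10000 * 5 = 55,000 values. A LowRankMatrix keeps only the two factors.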
The LowRankMatrix allows convenient extraction of row or column vectors
without needing to manually perform a cross-product. It is thus directly
interoperable with downstream procedures (e.g., for visualization) that
expect a matrix of the same dimensionality as the input.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5.118e-02 -1.946e-03 -5.443e-05 -1.168e-04 1.766e-03 7.390e-02
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.090e-02 -1.191e-03 -5.502e-05 0.000e+00 1.110e-03 5.060e-02
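As a minimal sketch of this kind of extraction, assuming small random factors in place of a real PCA result (`rotation`, `scores` and `lr` are hypothetical names chosen for illustration), a single column or row can be pulled out of a LowRankMatrix and checked against the corresponding manual product:

```r
library(BiocSingular)

set.seed(42)

# Hypothetical small factors standing in for out$rotation and out$x.
rotation <- matrix(rnorm(50 * 3), nrow = 50)
scores <- matrix(rnorm(200 * 3), nrow = 200)

# 50 x 200 matrix represented by its two factors only.
lr <- LowRankMatrix(rotation, scores)

# Column j pairs every row of 'rotation' with row j of 'scores'.
col1 <- as.numeric(lr[, 1])
manual.col1 <- as.numeric(rotation %*% scores[1, ])

# Row i pairs row i of 'rotation' with every row of 'scores'.
row2 <- as.numeric(lr[2, ])
manual.row2 <- as.numeric(scores %*% rotation[2, ])

all.equal(col1, manual.col1)
all.equal(row2, manual.row2)
summary(col1)
```

Only the requested row or column needs to be computed here, so the full 50 x 200 product is never required.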
Most operations will cause the LowRankMatrix to collapse gracefully
into a DelayedMatrix for further processing.
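The sketch below illustrates this behaviour under the assumption that ordinary arithmetic is one of the operations in question; the factors are small random matrices and all object names are hypothetical. It applies a delayed operation to a LowRankMatrix, inspects the class of the result, and confirms that the realised values match the dense computation.

```r
library(BiocSingular)
library(DelayedArray)

set.seed(7)

# Hypothetical small factors, not derived from a real PCA.
rotation <- matrix(rnorm(20 * 2), nrow = 20)
scores <- matrix(rnorm(30 * 2), nrow = 30)
lr <- LowRankMatrix(rotation, scores)

# A delayed arithmetic operation; nothing is realised yet.
shifted <- lr + 1
class(shifted)
is(shifted, "DelayedMatrix")

# Realising the result gives the same values as the dense computation.
dense <- tcrossprod(rotation, scores) + 1
all.equal(as.matrix(shifted), dense, check.attributes = FALSE)
```

Printing `class(shifted)` shows which representation the result has taken; downstream code that accepts any DelayedMatrix should be able to use it either way.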
ResidualMatrix class
This has now been moved to its own ResidualMatrix package. Check it out, it’s pretty cool.
Session information
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Matrix_1.7-1 BiocParallel_1.41.0 BiocSingular_1.23.0
## [4] knitr_1.49 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.9 compiler_4.4.2 BiocManager_1.30.25
## [4] crayon_1.5.3 rsvd_1.0.5 Rcpp_1.0.13-1
## [7] parallel_4.4.2 jquerylib_0.1.4 IRanges_2.41.1
## [10] yaml_2.3.10 fastmap_1.2.0 lattice_0.22-6
## [13] XVector_0.47.0 R6_2.5.1 S4Arrays_1.7.1
## [16] generics_0.1.3 ScaledMatrix_1.15.0 BiocGenerics_0.53.3
## [19] DelayedArray_0.33.2 MatrixGenerics_1.19.0 maketools_1.3.1
## [22] bslib_0.8.0 rlang_1.1.4 cachem_1.1.0
## [25] xfun_0.49 sass_0.4.9 sys_3.4.3
## [28] SparseArray_1.7.2 cli_3.6.3 zlibbioc_1.52.0
## [31] digest_0.6.37 grid_4.4.2 irlba_2.3.5.1
## [34] lifecycle_1.0.4 S4Vectors_0.45.2 evaluate_1.0.1
## [37] codetools_0.2-20 buildtools_1.0.0 beachmat_2.23.1
## [40] abind_1.4-8 stats4_4.4.2 rmarkdown_2.29
## [43] matrixStats_1.4.1 tools_4.4.2 htmltools_0.5.8.1