TileDB implements a framework for local and remote storage of dense
and sparse arrays. We can use this as a DelayedArray
backend to provide an array-level abstraction, thus allowing the data to
be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray
framework.
Creating a TileDBArray is as easy as:
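The call itself is not reproduced alongside the printed output here. A minimal sketch, assuming `X` is an ordinary 100 x 10 matrix of random normal values (the input and the variable names are our assumptions, and the values will differ from those printed because no seed is known):

```r
library(TileDBArray)

# Assumed input: a dense 100 x 10 matrix of standard normal draws.
X <- matrix(rnorm(1000), ncol = 10)

# Write X to a new TileDB backend and get back an object referring to it.
# No path is given, so the location is left to the package defaults.
tdb <- writeTileDBArray(X)
tdb
```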
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.09566966 0.36662422 -0.52579544 . -1.0609520 -0.5018745
## [2,] -0.54318098 -0.77874219 -0.17460032 . 0.5051455 -1.0689912
## [3,] 0.65699690 0.27394275 0.65663796 . 0.5176544 1.6036746
## [4,] 1.17203856 1.97772831 1.17681317 . 0.8060545 -0.6489925
## [5,] -0.70135328 1.88627420 -1.17325867 . -1.0834520 -1.1130286
## ... . . . . . .
## [96,] -0.52451351 0.08490191 -1.22181997 . -0.436845410 0.062160418
## [97,] -1.54794681 -0.46512657 -1.44103279 . -0.449074354 0.202489080
## [98,] 0.71473878 -0.11511416 0.75036302 . -1.233149190 -0.123587800
## [99,] 0.77938131 -1.03747090 1.12620720 . -0.112822959 -0.770304056
## [100,] 0.48698773 0.76270102 -0.70517579 . 1.075469858 0.002154998
Alternatively, we can use coercion methods:
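A sketch of the coercion route, again assuming a 100 x 10 numeric matrix `X` as input (our assumption, since the original call is not shown):

```r
library(TileDBArray)

# Assumed input, as before.
X <- matrix(rnorm(1000), ncol = 10)

# Coercion writes the matrix to a TileDB backend behind the scenes.
tdb_coerced <- as(X, "TileDBArray")
tdb_coerced
```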
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.09566966 0.36662422 -0.52579544 . -1.0609520 -0.5018745
## [2,] -0.54318098 -0.77874219 -0.17460032 . 0.5051455 -1.0689912
## [3,] 0.65699690 0.27394275 0.65663796 . 0.5176544 1.6036746
## [4,] 1.17203856 1.97772831 1.17681317 . 0.8060545 -0.6489925
## [5,] -0.70135328 1.88627420 -1.17325867 . -1.0834520 -1.1130286
## ... . . . . . .
## [96,] -0.52451351 0.08490191 -1.22181997 . -0.436845410 0.062160418
## [97,] -1.54794681 -0.46512657 -1.44103279 . -0.449074354 0.202489080
## [98,] 0.71473878 -0.11511416 0.75036302 . -1.233149190 -0.123587800
## [99,] 0.77938131 -1.03747090 1.12620720 . -0.112822959 -0.770304056
## [100,] 0.48698773 0.76270102 -0.70517579 . 1.075469858 0.002154998
This process also works for sparse matrices:
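For illustration, a sketch that assumes the sparse input is a 1000 x 1000 matrix built with `Matrix::rsparsematrix()` at 1% density (the generator and the density are our assumptions; only the dimensions are taken from the printed output):

```r
library(TileDBArray)

# Assumed input: a random sparse matrix with roughly 1% non-zero entries.
Y <- Matrix::rsparsematrix(1000, 1000, density = 0.01)

# The same writer is used; the result is reported as a sparse TileDBMatrix.
tdb_sparse <- writeTileDBArray(Y)
tdb_sparse
```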
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
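A sketch for the logical case, assuming the logical matrix is derived by thresholding a sparse numeric matrix `Y` (our assumption; the integer case is not shown here):

```r
library(TileDBArray)

# Assumed input: a random sparse numeric matrix.
Y <- Matrix::rsparsematrix(1000, 1000, density = 0.01)

# Comparing against zero yields a sparse logical matrix,
# which is then written to a TileDB backend.
tdb_lgl <- writeTileDBArray(Y > 0)
tdb_lgl
```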
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.09566966 0.36662422 -0.52579544 . -1.0609520 -0.5018745
## GENE_2 -0.54318098 -0.77874219 -0.17460032 . 0.5051455 -1.0689912
## GENE_3 0.65699690 0.27394275 0.65663796 . 0.5176544 1.6036746
## GENE_4 1.17203856 1.97772831 1.17681317 . 0.8060545 -0.6489925
## GENE_5 -0.70135328 1.88627420 -1.17325867 . -1.0834520 -1.1130286
## ... . . . . . .
## GENE_96 -0.52451351 0.08490191 -1.22181997 . -0.436845410 0.062160418
## GENE_97 -1.54794681 -0.46512657 -1.44103279 . -0.449074354 0.202489080
## GENE_98 0.71473878 -0.11511416 0.75036302 . -1.233149190 -0.123587800
## GENE_99 0.77938131 -1.03747090 1.12620720 . -0.112822959 -0.770304056
## GENE_100 0.48698773 0.76270102 -0.70517579 . 1.075469858 0.002154998
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
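The printed results below are consistent with calls along the following lines. This is a sketch that assumes the object is named `out` and was created by coercing the named matrix `X` (both names are our assumptions):

```r
library(TileDBArray)

# Assumed input: a 100 x 10 matrix with gene and sample names.
X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

dim(out)            # dimensions of the array
head(rownames(out)) # first few row names
head(out[, 1])      # first few values of the first column, as a named vector
```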
## [1] 100 10
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.09566966 -0.54318098 0.65699690 1.17203856 -0.70135328 -0.72424777
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required, hence
the creation of the DelayedMatrix object.
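A sketch of such delayed operations, assuming a TileDB-backed matrix named `out` built from a named 100 x 10 matrix (the names and input are our assumptions):

```r
library(TileDBArray)

# Assumed input: a 100 x 10 matrix with gene and sample names.
X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Subsetting is recorded as a delayed operation; the backend is untouched.
sub <- out[1:5, 1:5]
sub

# Arithmetic is likewise delayed until the values are needed.
doubled <- out * 2
doubled
```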
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.09566966 0.36662422 -0.52579544 -0.03782422 -0.55120656
## GENE_2 -0.54318098 -0.77874219 -0.17460032 -0.03902491 0.24580866
## GENE_3 0.65699690 0.27394275 0.65663796 0.78150748 0.23543995
## GENE_4 1.17203856 1.97772831 1.17681317 -0.07128138 -0.90644024
## GENE_5 -0.70135328 1.88627420 -1.17325867 0.04288611 0.30862583
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.1913393 0.7332484 -1.0515909 . -2.121904 -1.003749
## GENE_2 -1.0863620 -1.5574844 -0.3492006 . 1.010291 -2.137982
## GENE_3 1.3139938 0.5478855 1.3132759 . 1.035309 3.207349
## GENE_4 2.3440771 3.9554566 2.3536263 . 1.612109 -1.297985
## GENE_5 -1.4027066 3.7725484 -2.3465173 . -2.166904 -2.226057
## ... . . . . . .
## GENE_96 -1.0490270 0.1698038 -2.4436399 . -0.873690820 0.124320837
## GENE_97 -3.0958936 -0.9302531 -2.8820656 . -0.898148708 0.404978160
## GENE_98 1.4294776 -0.2302283 1.5007260 . -2.466298380 -0.247175599
## GENE_99 1.5587626 -2.0749418 2.2524144 . -0.225645917 -1.540608111
## GENE_100 0.9739755 1.5254020 -1.4103516 . 2.150939716 0.004309996
We can also do more complex matrix operations that are supported by DelayedArray:
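The two results below look like column sums and a matrix product with a vector. A sketch under that reading, assuming `out` as before and a random weight vector (the vector is our assumption):

```r
library(TileDBArray)

# Assumed input: a 100 x 10 matrix with gene and sample names.
X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Column sums, returned as a named numeric vector.
sums <- colSums(out)
sums

# Matrix product with one weight per column, giving a 100 x 1 matrix.
weights <- runif(ncol(out))
prod <- out %*% weights
prod
```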
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 3.2950162 4.6005877 -4.2995780 5.2147931 0.8022364 12.9272288 11.0719735
## SAMP_8 SAMP_9 SAMP_10
## 7.3226745 -0.1245271 -3.3126678
## [,1]
## GENE_1 0.299699796
## GENE_2 -0.830431618
## GENE_3 2.309283730
## GENE_4 3.742552076
## GENE_5 0.022497046
## GENE_6 2.419709995
## GENE_7 -1.268748873
## GENE_8 -2.773860721
## GENE_9 0.450934843
## GENE_10 -1.843584568
## GENE_11 2.387310087
## GENE_12 0.809592531
## GENE_13 -3.062207269
## GENE_14 2.600582537
## GENE_15 -1.375473167
## GENE_16 1.421952799
## GENE_17 0.557332744
## GENE_18 -0.094969369
## GENE_19 2.225717227
## GENE_20 -0.429769383
## GENE_21 0.351732684
## GENE_22 -0.251097924
## GENE_23 0.083737496
## GENE_24 0.835793413
## GENE_25 1.525999380
## GENE_26 -0.533268178
## GENE_27 -0.378498999
## GENE_28 -1.288782942
## GENE_29 0.438495036
## GENE_30 3.679041489
## GENE_31 1.494100071
## GENE_32 2.524674151
## GENE_33 0.727276149
## GENE_34 -0.324944879
## GENE_35 -2.555733644
## GENE_36 -0.135123594
## GENE_37 2.746354236
## GENE_38 0.688827341
## GENE_39 0.826843461
## GENE_40 2.092039639
## GENE_41 -0.554017907
## GENE_42 -2.228571195
## GENE_43 2.064394686
## GENE_44 -2.270524370
## GENE_45 1.215629382
## GENE_46 0.865602947
## GENE_47 -1.263889289
## GENE_48 -1.704217727
## GENE_49 2.073636932
## GENE_50 1.119929778
## GENE_51 2.956231835
## GENE_52 3.033213468
## GENE_53 2.580838299
## GENE_54 1.621281137
## GENE_55 0.845488706
## GENE_56 1.302263402
## GENE_57 -0.478690246
## GENE_58 -1.078767801
## GENE_59 2.424261473
## GENE_60 0.160977845
## GENE_61 -1.370618731
## GENE_62 1.547684277
## GENE_63 -0.607965102
## GENE_64 -0.550743257
## GENE_65 0.752064481
## GENE_66 -1.341096124
## GENE_67 1.645586678
## GENE_68 0.618405571
## GENE_69 1.816905092
## GENE_70 -3.351320045
## GENE_71 0.918430184
## GENE_72 2.413737983
## GENE_73 -1.579869168
## GENE_74 -2.131577448
## GENE_75 0.061930771
## GENE_76 1.836209293
## GENE_77 -1.623216951
## GENE_78 1.439936481
## GENE_79 -2.058959346
## GENE_80 0.519167107
## GENE_81 -2.204022828
## GENE_82 -2.387991393
## GENE_83 0.533055042
## GENE_84 -1.266084377
## GENE_85 -0.474594010
## GENE_86 -2.938422468
## GENE_87 1.172028899
## GENE_88 0.177213023
## GENE_89 2.866762699
## GENE_90 0.920876710
## GENE_91 -2.583421472
## GENE_92 -0.939716184
## GENE_93 -2.990517059
## GENE_94 -0.532059478
## GENE_95 -1.059485829
## GENE_96 0.078016432
## GENE_97 -0.175172898
## GENE_98 0.006095439
## GENE_99 -2.570636962
## GENE_100 1.318434690
We can adjust some parameters for creating the backend with
appropriate arguments to writeTileDBArray(). For example, we can
control the path to the backend as well as the name of the attribute
containing the data:
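A sketch of such a call, assuming the `path` and `attr` arguments of `writeTileDBArray()`. The temporary path and the attribute name `"my_values"` are hypothetical choices for illustration:

```r
library(TileDBArray)

# Assumed input: a fresh 100 x 10 matrix of standard normal draws.
X <- matrix(rnorm(1000), ncol = 10)

# Hypothetical location for the backend; any writable, unused path will do.
path <- tempfile()

# Store the data at 'path' under a hypothetical attribute name.
tdb_custom <- writeTileDBArray(X, path = path, attr = "my_values")
tdb_custom
```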
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.77930573 -0.04202972 2.23361321 . -1.6942175 -1.9945002
## [2,] 1.26954185 1.57268242 -0.84536629 . 0.3236437 0.1573491
## [3,] -0.21297701 1.42915158 0.32437196 . -0.7833516 0.7082655
## [4,] 0.19299864 1.13604425 0.66682440 . 0.3949870 0.3447885
## [5,] 1.62339290 -0.81591351 -0.65900280 . -0.6173345 -1.1650758
## ... . . . . . .
## [96,] -0.77016517 -0.31946276 -0.74177540 . 0.94892159 -0.53379816
## [97,] 1.89574910 -1.63039571 1.54742651 . -0.83661330 0.91749703
## [98,] -0.45331328 -0.10580894 -1.33238834 . 0.69994839 0.19166611
## [99,] -0.55910190 -0.65965452 -1.19074037 . -0.06386042 -0.17679921
## [100,] 0.07623104 -1.21422073 -1.09061679 . 0.64849752 -1.26362789
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
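A sketch of the global route, assuming the package's `setTileDBPath()` setter and assuming that passing `NULL` unsets the value (the path is a hypothetical choice):

```r
library(TileDBArray)

# Assumed input: a 100 x 10 matrix of standard normal draws.
X <- matrix(rnorm(1000), ncol = 10)

# Hypothetical location for the backend.
path2 <- tempfile()

# Set the global path; subsequent coercions are expected to write here.
setTileDBPath(path2)
tdb_global <- as(X, "TileDBArray")
tdb_global

# Unset the global path again (assumed to be done by passing NULL).
setTileDBPath(NULL)
```

Unsetting the variable afterwards keeps later coercions from trying to write to the same location.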
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.77930573 -0.04202972 2.23361321 . -1.6942175 -1.9945002
## [2,] 1.26954185 1.57268242 -0.84536629 . 0.3236437 0.1573491
## [3,] -0.21297701 1.42915158 0.32437196 . -0.7833516 0.7082655
## [4,] 0.19299864 1.13604425 0.66682440 . 0.3949870 0.3447885
## [5,] 1.62339290 -0.81591351 -0.65900280 . -0.6173345 -1.1650758
## ... . . . . . .
## [96,] -0.77016517 -0.31946276 -0.74177540 . 0.94892159 -0.53379816
## [97,] 1.89574910 -1.63039571 1.54742651 . -0.83661330 0.91749703
## [98,] -0.45331328 -0.10580894 -1.33238834 . 0.69994839 0.19166611
## [99,] -0.55910190 -0.65965452 -1.19074037 . -0.06386042 -0.17679921
## [100,] 0.07623104 -1.21422073 -1.09061679 . 0.64849752 -1.26362789
Session information:
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.17 TileDBArray_1.15.2 DelayedArray_0.31.11
## [4] SparseArray_1.5.31 S4Arrays_1.5.7 IRanges_2.39.2
## [7] abind_1.4-5 S4Vectors_0.43.2 MatrixGenerics_1.17.0
## [10] matrixStats_1.4.0 BiocGenerics_0.51.1 Matrix_1.7-0
## [13] BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.8 compiler_4.4.1
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13
## [7] nanoarrow_0.5.0.1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 RcppCCTZ_0.2.12
## [13] R6_2.5.1 XVector_0.45.0 tiledb_0.29.0
## [16] knitr_1.48 maketools_1.3.0 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.47
## [22] sass_0.4.9 sys_3.4.2 bit64_4.0.5
## [25] cli_3.6.3 zlibbioc_1.51.1 spdl_0.0.5
## [28] digest_0.6.37 grid_4.4.1 lifecycle_1.0.4
## [31] data.table_1.16.0 evaluate_0.24.0 nanotime_0.3.9
## [34] zoo_1.8-12 buildtools_1.0.0 rmarkdown_2.28
## [37] tools_4.4.1 htmltools_0.5.8.1