TileDB implements a framework for local and remote storage of dense and sparse arrays. We can use this as a DelayedArray backend to provide an array-level abstraction, thus allowing the data to be used in many places where an ordinary array or matrix might be used. The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as passing an ordinary matrix (here, a 100 x 10 numeric matrix X) to writeTileDBArray():
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.8295081 -0.2147574 -0.3002395 . 0.102939774 -0.168677676
## [2,] -0.2407711 -1.2097642 -1.1797586 . -1.118352074 -0.587042972
## [3,] 1.0263293 -1.0696864 0.3858150 . 0.590606217 0.713438738
## [4,] -0.5359748 -0.5911862 2.6854679 . 1.502702596 0.604758596
## [5,] -1.1328836 0.1993387 0.5054101 . 0.411818535 -0.005032609
## ... . . . . . .
## [96,] -1.71042055 -1.14873390 1.72807932 . 1.36891428 -0.88829525
## [97,] -0.76932335 0.57253368 -1.25070096 . 1.19989766 -0.50688952
## [98,] 0.22847915 0.02245715 -2.40541314 . -0.45450586 0.02232690
## [99,] -0.84393151 1.49146165 -1.07175593 . -0.08675869 0.10505145
## [100,] 0.12460620 -0.10918715 0.61969006 . -0.29433759 0.98021964
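The following is a minimal sketch of the kind of call that produces output like the above. It assumes X is a 100 x 10 matrix of standard normal values. The seed and the object name `out` are illustrative choices, so the printed numbers will not match those shown.

```r
library(TileDBArray)

# Hypothetical input: a 100 x 10 matrix of standard normal values.
# The seed is arbitrary, so the values differ from the output above.
set.seed(100)
X <- matrix(rnorm(1000), ncol = 10)

# Write X to a new TileDB array. No 'path' is given, so the package
# default location is used.
out <- writeTileDBArray(X)
out
```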
Alternatively, we can coerce the matrix with as(X, "TileDBArray"), which gives the same result:
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.8295081 -0.2147574 -0.3002395 . 0.102939774 -0.168677676
## [2,] -0.2407711 -1.2097642 -1.1797586 . -1.118352074 -0.587042972
## [3,] 1.0263293 -1.0696864 0.3858150 . 0.590606217 0.713438738
## [4,] -0.5359748 -0.5911862 2.6854679 . 1.502702596 0.604758596
## [5,] -1.1328836 0.1993387 0.5054101 . 0.411818535 -0.005032609
## ... . . . . . .
## [96,] -1.71042055 -1.14873390 1.72807932 . 1.36891428 -0.88829525
## [97,] -0.76932335 0.57253368 -1.25070096 . 1.19989766 -0.50688952
## [98,] 0.22847915 0.02245715 -2.40541314 . -0.45450586 0.02232690
## [99,] -0.84393151 1.49146165 -1.07175593 . -0.08675869 0.10505145
## [100,] 0.12460620 -0.10918715 0.61969006 . -0.29433759 0.98021964
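Here is a minimal sketch of the coercion route, assuming the same kind of 100 x 10 input as before (the seed and the name `out` are illustrative). The final line checks that the coerced object holds exactly the values of the original matrix.

```r
library(TileDBArray)

set.seed(100)  # arbitrary seed for the hypothetical input
X <- matrix(rnorm(1000), ncol = 10)

# Coerce rather than calling writeTileDBArray() directly; the values are
# still written to a TileDB array behind the scenes.
out <- as(X, "TileDBArray")
out

# The TileDB-backed object holds the same values as the input matrix.
all.equal(as.matrix(out), X)
```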
This also works for sparse matrices, in which case a sparse TileDBMatrix is created:
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
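The following is a minimal sketch of the sparse case. It assumes a 1000 x 1000 random sparse matrix from the Matrix package with about 1% non-zero entries; the density and seed are illustrative choices. Sparsity of the input is detected automatically, so no extra argument is needed.

```r
library(TileDBArray)
library(Matrix)

set.seed(100)  # arbitrary seed
# Hypothetical input: a 1000 x 1000 dgCMatrix with ~1% non-zero entries.
Y <- rsparsematrix(1000, 1000, density = 0.01)

# The input is sparse, so a sparse TileDB array is written.
out <- writeTileDBArray(Y)
out
is_sparse(out)
```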
Logical and integer matrices are supported:
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
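As a minimal sketch of the logical and integer cases, the example below compares a hypothetical sparse numeric matrix with zero, which yields a sparse logical matrix. It also writes a small dense integer matrix. The object names `lgl` and `int_mat` are illustrative. type() reports the stored data type.

```r
library(TileDBArray)
library(Matrix)

set.seed(100)  # arbitrary seed
Y <- rsparsematrix(1000, 1000, density = 0.01)

# 'Y > 0' is a sparse logical matrix (0 > 0 is FALSE, so zeros stay zero).
lgl <- writeTileDBArray(Y > 0)
type(lgl)

# Hypothetical small dense integer matrix.
int_mat <- writeTileDBArray(matrix(1:20, nrow = 4))
type(int_mat)
```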
As are matrices with dimension names:
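# X is the 100 x 10 numeric matrix used above
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))  # GENE_1, GENE_2, ...
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))  # SAMP_1, SAMP_2, ...
writeTileDBArray(X)  # dimension names are kept in the result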
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.8295081 -0.2147574 -0.3002395 . 0.102939774 -0.168677676
## GENE_2 -0.2407711 -1.2097642 -1.1797586 . -1.118352074 -0.587042972
## GENE_3 1.0263293 -1.0696864 0.3858150 . 0.590606217 0.713438738
## GENE_4 -0.5359748 -0.5911862 2.6854679 . 1.502702596 0.604758596
## GENE_5 -1.1328836 0.1993387 0.5054101 . 0.411818535 -0.005032609
## ... . . . . . .
## GENE_96 -1.71042055 -1.14873390 1.72807932 . 1.36891428 -0.88829525
## GENE_97 -0.76932335 0.57253368 -1.25070096 . 1.19989766 -0.50688952
## GENE_98 0.22847915 0.02245715 -2.40541314 . -0.45450586 0.02232690
## GENE_99 -0.84393151 1.49146165 -1.07175593 . -0.08675869 0.10505145
## GENE_100 0.12460620 -0.10918715 0.61969006 . -0.29433759 0.98021964
TileDBArrays are simply DelayedArray objects and can be manipulated as such. The usual conventions for extracting data from matrix-like objects work as expected. Below, we query the dimensions, the first few row names, and the first few entries of the first column:
## [1] 100 10
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.82950807 -0.24077114 1.02632933 -0.53597485 -1.13288363 0.09083834
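The following is a minimal sketch of these queries. It assumes a hypothetical 100 x 10 matrix with the GENE_/SAMP_ dimension names used above; the seed and the name `out` are illustrative.

```r
library(TileDBArray)

set.seed(100)  # arbitrary seed
X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

dim(out)             # 100 10
head(rownames(out))  # "GENE_1" ... "GENE_6"
head(out[, 1])       # first column as a named vector, first six values
```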
We can also perform manipulations like subsetting and arithmetic. These operations do not affect the data in the TileDB backend. Instead, they are delayed until the values are explicitly required, which is why the results are DelayedMatrix objects. Below, the first five rows and columns are extracted, and then every entry of the matrix is doubled:
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.8295081 -0.2147574 -0.3002395 -1.0774882 0.6537696
## GENE_2 -0.2407711 -1.2097642 -1.1797586 1.7847679 0.1864016
## GENE_3 1.0263293 -1.0696864 0.3858150 1.3461287 -0.4562356
## GENE_4 -0.5359748 -0.5911862 2.6854679 0.1675964 -0.8506202
## GENE_5 -1.1328836 0.1993387 0.5054101 0.2749176 0.2738543
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.6590161 -0.4295147 -0.6004790 . 0.20587955 -0.33735535
## GENE_2 -0.4815423 -2.4195283 -2.3595171 . -2.23670415 -1.17408594
## GENE_3 2.0526587 -2.1393728 0.7716301 . 1.18121243 1.42687748
## GENE_4 -1.0719497 -1.1823724 5.3709358 . 3.00540519 1.20951719
## GENE_5 -2.2657673 0.3986775 1.0108202 . 0.82363707 -0.01006522
## ... . . . . . .
## GENE_96 -3.4208411 -2.2974678 3.4561586 . 2.7378286 -1.7765905
## GENE_97 -1.5386467 1.1450674 -2.5014019 . 2.3997953 -1.0137790
## GENE_98 0.4569583 0.0449143 -4.8108263 . -0.9090117 0.0446538
## GENE_99 -1.6878630 2.9829233 -2.1435119 . -0.1735174 0.2101029
## GENE_100 0.2492124 -0.2183743 1.2393801 . -0.5886752 1.9604393
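To see the delayed behaviour directly, here is a minimal sketch on a hypothetical 100 x 10 matrix. The names `sub` and `dbl` are illustrative. Both results are DelayedMatrix objects rather than TileDBMatrix objects, and values are only computed on request, for example by as.matrix().

```r
library(TileDBArray)

set.seed(100)  # arbitrary seed
X <- matrix(rnorm(1000), ncol = 10)
out <- as(X, "TileDBArray")

sub <- out[1:5, 1:5]  # delayed subsetting
dbl <- out * 2        # delayed arithmetic

# Neither operation modifies the TileDB array on disk; the results only
# record the operations to apply.
class(sub)
class(dbl)

# Realize (part of) the result as an ordinary matrix.
as.matrix(dbl)[1:3, 1:3]
```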
We can also perform more complex matrix operations supported by DelayedArray, such as computing column sums and matrix products:
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## -12.6005057 -6.0172943 6.2192479 8.3612061 -0.6681432 15.2435245
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## 4.8253229 13.0271567 18.7574745 0.2463580
## [,1]
## GENE_1 -0.56482193
## GENE_2 -0.55863361
## GENE_3 1.91464938
## GENE_4 0.61829817
## GENE_5 -0.02174770
## GENE_6 -0.29459161
## GENE_7 -3.22897569
## GENE_8 -2.17774336
## GENE_9 -1.52392116
## GENE_10 3.41372521
## GENE_11 -1.64944292
## GENE_12 0.16778062
## GENE_13 0.28528499
## GENE_14 -0.91064416
## GENE_15 0.86830952
## GENE_16 0.20360999
## GENE_17 1.72638493
## GENE_18 -1.48174065
## GENE_19 1.00644833
## GENE_20 -0.42683026
## GENE_21 1.28457381
## GENE_22 -0.78728767
## GENE_23 1.29227875
## GENE_24 0.86339912
## GENE_25 2.74510196
## GENE_26 0.49556971
## GENE_27 -1.05877466
## GENE_28 2.54735914
## GENE_29 0.23340466
## GENE_30 0.43694875
## GENE_31 -1.24981834
## GENE_32 0.51989784
## GENE_33 1.64682898
## GENE_34 0.92623886
## GENE_35 0.33224472
## GENE_36 -0.64260985
## GENE_37 -0.58365591
## GENE_38 0.68992390
## GENE_39 1.45066154
## GENE_40 1.54820900
## GENE_41 -0.12801916
## GENE_42 0.14638490
## GENE_43 -0.92690780
## GENE_44 1.47090237
## GENE_45 -2.37768161
## GENE_46 -0.70774621
## GENE_47 -1.45915947
## GENE_48 1.74153934
## GENE_49 2.79313076
## GENE_50 1.67314313
## GENE_51 -0.64490910
## GENE_52 -1.55406801
## GENE_53 -0.95653393
## GENE_54 1.36044450
## GENE_55 2.19449036
## GENE_56 -0.37836080
## GENE_57 -1.31214820
## GENE_58 0.96166617
## GENE_59 -2.64363973
## GENE_60 0.65443415
## GENE_61 -0.77098511
## GENE_62 -1.26812287
## GENE_63 1.32434308
## GENE_64 0.31805195
## GENE_65 -0.57461181
## GENE_66 0.69898725
## GENE_67 0.30786525
## GENE_68 0.05536005
## GENE_69 1.22100313
## GENE_70 -0.53207604
## GENE_71 -2.36187306
## GENE_72 3.89458011
## GENE_73 -0.77781900
## GENE_74 -1.16438689
## GENE_75 0.81978117
## GENE_76 -0.08128711
## GENE_77 2.74323099
## GENE_78 0.31894519
## GENE_79 0.65792681
## GENE_80 0.13567093
## GENE_81 0.31995332
## GENE_82 1.18094604
## GENE_83 -2.06697385
## GENE_84 -0.03116369
## GENE_85 1.67989754
## GENE_86 0.52550085
## GENE_87 0.31064241
## GENE_88 -0.86240340
## GENE_89 -0.37420519
## GENE_90 3.41922556
## GENE_91 2.47171009
## GENE_92 2.61909181
## GENE_93 -0.96563998
## GENE_94 2.94273202
## GENE_95 -0.44186342
## GENE_96 -0.75297976
## GENE_97 0.29153814
## GENE_98 1.73971073
## GENE_99 2.02233835
## GENE_100 1.07930374
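A minimal sketch of these two operations on a hypothetical 100 x 10 matrix follows. The weights `w` are an illustrative random single-column matrix. Both results are checked against the same computation on the ordinary in-memory matrix.

```r
library(TileDBArray)

set.seed(100)  # arbitrary seed
X <- matrix(rnorm(1000), ncol = 10)
out <- as(X, "TileDBArray")

# Column sums computed from the TileDB-backed data.
cs <- colSums(out)

# Product with a hypothetical 10 x 1 weight matrix gives a 100 x 1 result.
w <- matrix(runif(ncol(out)), ncol = 1)
res <- out %*% w
dim(res)
```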
We can adjust some parameters of the backend through the appropriate arguments to writeTileDBArray(). For example, we can control the path to the backend as well as the name of the attribute containing the data:
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.87102604 0.07066709 -0.56560177 . -0.79734338 0.61720101
## [2,] 0.59805783 0.04436053 0.72806056 . 1.95409521 1.30955788
## [3,] -0.60467245 -0.19008403 1.04320014 . 0.42933648 0.07489165
## [4,] 0.46813679 -0.93201854 1.93860771 . 2.42273904 0.67061926
## [5,] 0.28518683 -1.57033363 -0.38763063 . 2.17745278 1.51697938
## ... . . . . . .
## [96,] -0.14717892 0.44631943 -1.25755472 . 0.71493626 0.75256468
## [97,] 1.43208899 -0.38277914 0.12367752 . -0.49366989 0.70556228
## [98,] -0.29797638 -0.04856172 0.26533687 . -0.30511982 0.01711105
## [99,] -0.92980601 -0.75440606 -1.11732765 . -1.59507149 -0.72676847
## [100,] -0.65094852 1.51233457 0.86040559 . -1.74825701 1.08737305
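The following is a minimal sketch of such a call, assuming a hypothetical 100 x 10 input. The attribute name "WHEE" is an arbitrary example, and tempfile() is used only to obtain a fresh location for the array.

```r
library(TileDBArray)

set.seed(100)  # arbitrary seed
X <- matrix(rnorm(1000), ncol = 10)

# Choose the storage location explicitly and name the attribute that
# holds the values ("WHEE" is an arbitrary example name).
path <- tempfile()
out <- writeTileDBArray(X, path = path, attr = "WHEE")
out
```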
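As these arguments cannot be passed during coercion, the package instead provides global settings that can be set or unset to affect the outcome. For example, a backend path set with setTileDBPath() is used by a subsequent as(X, "TileDBArray"):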
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.87102604 0.07066709 -0.56560177 . -0.79734338 0.61720101
## [2,] 0.59805783 0.04436053 0.72806056 . 1.95409521 1.30955788
## [3,] -0.60467245 -0.19008403 1.04320014 . 0.42933648 0.07489165
## [4,] 0.46813679 -0.93201854 1.93860771 . 2.42273904 0.67061926
## [5,] 0.28518683 -1.57033363 -0.38763063 . 2.17745278 1.51697938
## ... . . . . . .
## [96,] -0.14717892 0.44631943 -1.25755472 . 0.71493626 0.75256468
## [97,] 1.43208899 -0.38277914 0.12367752 . -0.49366989 0.70556228
## [98,] -0.29797638 -0.04856172 0.26533687 . -0.30511982 0.01711105
## [99,] -0.92980601 -0.75440606 -1.11732765 . -1.59507149 -0.72676847
## [100,] -0.65094852 1.51233457 0.86040559 . -1.74825701 1.08737305
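The following is a minimal sketch of the global-setting approach, assuming a hypothetical 100 x 10 input and an illustrative location `path2`. The final reset assumes that passing NULL to setTileDBPath() restores the default behaviour. Only one array is written to `path2` here, since writing a second array to the same location would clash with the first.

```r
library(TileDBArray)

set.seed(100)  # arbitrary seed
X <- matrix(rnorm(1000), ncol = 10)

# as() has no 'path' argument, so set the global default first.
path2 <- tempfile()
setTileDBPath(path2)
out <- as(X, "TileDBArray")  # stored at path2

# Unset the global path (assumption: NULL restores the default).
setTileDBPath(NULL)
```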
Session information:
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.19 TileDBArray_1.17.0 DelayedArray_0.33.4
## [4] SparseArray_1.7.4 S4Arrays_1.7.1 IRanges_2.41.2
## [7] abind_1.4-8 S4Vectors_0.45.2 MatrixGenerics_1.19.1
## [10] matrixStats_1.5.0 BiocGenerics_0.53.5 generics_0.1.3
## [13] Matrix_1.7-2 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0.1 jsonlite_1.8.9 compiler_4.4.2
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 RcppCCTZ_0.2.13
## [13] R6_2.5.1 XVector_0.47.2 tiledb_0.30.2
## [16] knitr_1.49 maketools_1.3.1 bslib_0.8.0
## [19] rlang_1.1.5 cachem_1.1.0 xfun_0.50
## [22] sass_0.4.9 sys_3.4.3 bit64_4.6.0-1
## [25] cli_3.6.3 spdl_0.0.5 digest_0.6.37
## [28] grid_4.4.2 lifecycle_1.0.4 data.table_1.16.4
## [31] evaluate_1.0.3 nanotime_0.3.11 zoo_1.8-12
## [34] buildtools_1.0.0 rmarkdown_2.29 tools_4.4.2
## [37] htmltools_0.5.8.1