TileDB implements a framework for local and remote storage of dense and sparse arrays. We can use this as a DelayedArray backend to provide an array-level abstraction, thus allowing the data to be used in many places where an ordinary array or matrix might be used. The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.52739164 1.28347320 0.24042244 . 2.0958765 -0.5515157
## [2,] -0.15734949 -1.43924530 0.09707408 . 0.7357530 -0.4001625
## [3,] -0.73208902 -0.36031226 -2.15564995 . 0.7343245 -1.0001622
## [4,] 0.06851473 2.79196020 -0.60396409 . -0.2689149 -1.3812130
## [5,] 1.29000210 -0.67459504 -0.15964475 . -1.1189701 -0.3515991
## ... . . . . . .
## [96,] -0.44094554 1.49795908 0.40692310 . -0.68061908 -0.26412489
## [97,] -0.55004952 0.50585480 0.03954544 . -0.20471961 -0.10663073
## [98,] 0.77215696 0.85577917 -1.22481093 . 0.04978616 -0.78743782
## [99,] -0.27891139 1.43219148 0.87718531 . 1.14324270 -0.12210975
## [100,] -0.21227860 1.42561098 0.07815333 . -0.88870052 0.31391888
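The code that produced this output is not shown. Below is a minimal sketch of the same step. It assumes `X` is a 100 x 10 matrix of standard normal values. The name `Z` is illustrative, and the printed numbers will differ unless the same random seed is used.

```r
# Assumes the TileDBArray package (Bioconductor) is installed.
library(TileDBArray)

# Assumed input: a dense 100 x 10 matrix of random normal values.
X <- matrix(rnorm(1000), ncol = 10)

# Write X into a new TileDB array and get back a TileDBMatrix that refers to it.
# With no 'path' argument, we assume a temporary location is used.
Z <- writeTileDBArray(X)
Z
```

`Z` only refers to the on-disk array, so `as.matrix(Z)` should read the values back into memory.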
Alternatively, we can use coercion methods:
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.52739164 1.28347320 0.24042244 . 2.0958765 -0.5515157
## [2,] -0.15734949 -1.43924530 0.09707408 . 0.7357530 -0.4001625
## [3,] -0.73208902 -0.36031226 -2.15564995 . 0.7343245 -1.0001622
## [4,] 0.06851473 2.79196020 -0.60396409 . -0.2689149 -1.3812130
## [5,] 1.29000210 -0.67459504 -0.15964475 . -1.1189701 -0.3515991
## ... . . . . . .
## [96,] -0.44094554 1.49795908 0.40692310 . -0.68061908 -0.26412489
## [97,] -0.55004952 0.50585480 0.03954544 . -0.20471961 -0.10663073
## [98,] 0.77215696 0.85577917 -1.22481093 . 0.04978616 -0.78743782
## [99,] -0.27891139 1.43219148 0.87718531 . 1.14324270 -0.12210975
## [100,] -0.21227860 1.42561098 0.07815333 . -0.88870052 0.31391888
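Here is a sketch of the coercion route, again assuming a random 100 x 10 input matrix `X`. `as()` accepts no storage arguments, so the backend's default settings apply.

```r
library(TileDBArray)

# Assumed input: a dense 100 x 10 matrix of random normal values.
X <- matrix(rnorm(1000), ncol = 10)

# Coerce the ordinary matrix; the data are written to a TileDB array
# using the default backend settings.
Z <- as(X, "TileDBArray")
Z
```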
This also works for sparse matrices:
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
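The sparse case can be sketched with a random `dgCMatrix` from the Matrix package. The 1% density is an assumption, since the input used for the output above is not stated. Because the input is sparse, the result should be a sparse TileDBMatrix, as in the printout.

```r
library(TileDBArray)

# Assumed input: a 1000 x 1000 sparse matrix with about 1% non-zero entries.
Y <- Matrix::rsparsematrix(1000, 1000, density = 0.01)

# Write the sparse matrix to TileDB; the result keeps its sparsity.
Z <- writeTileDBArray(Y)
Z
```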
Logical and integer matrices are also supported, as shown here for a sparse logical matrix:
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
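The following sketch covers both types. It assumes a sparse logical matrix obtained by comparing a random sparse matrix with zero, and a dense integer matrix of Poisson counts. Both inputs, and the names `lgl` and `cnt`, are illustrative.

```r
library(TileDBArray)

# Comparing a sparse dgCMatrix with zero gives a sparse logical matrix.
Y <- Matrix::rsparsematrix(1000, 1000, density = 0.01)
lgl <- writeTileDBArray(Y > 0)

# rpois() returns integers, so this is an integer matrix.
cnt <- writeTileDBArray(matrix(rpois(1000, lambda = 5), ncol = 10))

# type() reports the element type of each TileDB-backed array.
type(lgl)
type(cnt)
```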
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.52739164 1.28347320 0.24042244 . 2.0958765 -0.5515157
## GENE_2 -0.15734949 -1.43924530 0.09707408 . 0.7357530 -0.4001625
## GENE_3 -0.73208902 -0.36031226 -2.15564995 . 0.7343245 -1.0001622
## GENE_4 0.06851473 2.79196020 -0.60396409 . -0.2689149 -1.3812130
## GENE_5 1.29000210 -0.67459504 -0.15964475 . -1.1189701 -0.3515991
## ... . . . . . .
## GENE_96 -0.44094554 1.49795908 0.40692310 . -0.68061908 -0.26412489
## GENE_97 -0.55004952 0.50585480 0.03954544 . -0.20471961 -0.10663073
## GENE_98 0.77215696 0.85577917 -1.22481093 . 0.04978616 -0.78743782
## GENE_99 -0.27891139 1.43219148 0.87718531 . 1.14324270 -0.12210975
## GENE_100 -0.21227860 1.42561098 0.07815333 . -0.88870052 0.31391888
TileDBArrays are simply DelayedArray objects and can be manipulated as such. The usual conventions for extracting data from matrix-like objects work as expected:
## [1] 100 10
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.52739164 -0.15734949 -0.73208902 0.06851473 1.29000210 -0.91490744
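The calls behind these three outputs are not shown. The sketch below assumes `df` is the named TileDBMatrix written in the previous step. It is recreated here with random values so the block runs on its own, and the name `df` is illustrative.

```r
library(TileDBArray)

# Recreate a named 100 x 10 matrix and write it to TileDB.
X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
df <- writeTileDBArray(X)

dim(df)             # number of rows and columns
head(rownames(df))  # first six row names
head(df[, 1])       # first six values of the first column, named by row
```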
We can also perform manipulations such as subsetting and arithmetic. These operations do not modify the data in the TileDB backend; instead, they are delayed until the values are explicitly required, which is why the results are DelayedMatrix objects.
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.52739164 1.28347320 0.24042244 -0.20686277 0.26528099
## GENE_2 -0.15734949 -1.43924530 0.09707408 1.53268069 1.23875988
## GENE_3 -0.73208902 -0.36031226 -2.15564995 0.38297151 0.79730386
## GENE_4 0.06851473 2.79196020 -0.60396409 -0.99771041 0.45815756
## GENE_5 1.29000210 -0.67459504 -0.15964475 0.22394278 -0.70747844
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.0547833 2.5669464 0.4808449 . 4.1917531 -1.1030314
## GENE_2 -0.3146990 -2.8784906 0.1941482 . 1.4715059 -0.8003249
## GENE_3 -1.4641780 -0.7206245 -4.3112999 . 1.4686489 -2.0003245
## GENE_4 0.1370295 5.5839204 -1.2079282 . -0.5378298 -2.7624259
## GENE_5 2.5800042 -1.3491901 -0.3192895 . -2.2379403 -0.7031981
## ... . . . . . .
## GENE_96 -0.88189108 2.99591815 0.81384620 . -1.36123815 -0.52824978
## GENE_97 -1.10009903 1.01170961 0.07909088 . -0.40943922 -0.21326147
## GENE_98 1.54431392 1.71155834 -2.44962186 . 0.09957231 -1.57487563
## GENE_99 -0.55782277 2.86438295 1.75437063 . 2.28648540 -0.24421950
## GENE_100 -0.42455719 2.85122196 0.15630665 . -1.77740104 0.62783776
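The two outputs above correspond to a subset and to an arithmetic operation. The sketch below infers `df * 2` from the printed values, which are twice the originals. The named 100 x 10 input is recreated with random values, and the object names are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
df <- writeTileDBArray(X)

# Both operations are recorded as delayed operations on top of the TileDB data.
sub <- df[1:5, 1:5]
doubled <- df * 2

class(sub)
class(doubled)

# The values stored in the TileDB backend are unchanged.
all.equal(as.matrix(df), X)
```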
We can also perform more complex matrix operations supported by DelayedArray, such as column sums and matrix multiplication:
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 2.2639884 4.7122248 -13.7032483 5.1164527 5.7100693 -11.7194416
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -7.7681385 8.9957249 6.6945354 0.2435777
## [,1]
## GENE_1 0.857068894
## GENE_2 2.396750066
## GENE_3 -0.253343765
## GENE_4 3.477471421
## GENE_5 -1.591711668
## GENE_6 -1.694599022
## GENE_7 1.576580421
## GENE_8 -0.341955153
## GENE_9 -1.588338766
## GENE_10 -0.569422053
## GENE_11 0.212487723
## GENE_12 -1.780875756
## GENE_13 0.004867817
## GENE_14 0.064374917
## GENE_15 0.851194158
## GENE_16 -0.633402245
## GENE_17 -0.218062515
## GENE_18 2.240311457
## GENE_19 -0.374215410
## GENE_20 1.598516605
## GENE_21 -0.629927999
## GENE_22 -1.741390144
## GENE_23 -1.598373005
## GENE_24 -2.346343222
## GENE_25 1.710065154
## GENE_26 -1.825896614
## GENE_27 1.578766657
## GENE_28 -0.086082837
## GENE_29 0.868694636
## GENE_30 1.466226908
## GENE_31 -0.646999741
## GENE_32 1.145661886
## GENE_33 1.424882242
## GENE_34 1.628434869
## GENE_35 0.273718421
## GENE_36 0.360363834
## GENE_37 2.233743198
## GENE_38 -0.827021302
## GENE_39 0.469448491
## GENE_40 -0.352306404
## GENE_41 -2.426977045
## GENE_42 -1.709782245
## GENE_43 -0.713250444
## GENE_44 -1.646989386
## GENE_45 -0.992484651
## GENE_46 0.555586863
## GENE_47 0.225292821
## GENE_48 -0.431727308
## GENE_49 0.735568094
## GENE_50 0.264595212
## GENE_51 -0.404455180
## GENE_52 -1.950940182
## GENE_53 -1.069844435
## GENE_54 -1.722255242
## GENE_55 2.718434335
## GENE_56 -1.283943049
## GENE_57 -1.724886263
## GENE_58 0.941873342
## GENE_59 -0.937725157
## GENE_60 -1.305952996
## GENE_61 1.728179162
## GENE_62 0.728956643
## GENE_63 -0.399113314
## GENE_64 0.671608440
## GENE_65 -0.308466103
## GENE_66 -1.404873475
## GENE_67 -1.930834181
## GENE_68 1.369003161
## GENE_69 1.926157679
## GENE_70 2.005058584
## GENE_71 1.872603304
## GENE_72 -0.776434316
## GENE_73 -0.897078151
## GENE_74 3.232756628
## GENE_75 1.415445426
## GENE_76 -1.264838965
## GENE_77 -1.746462156
## GENE_78 -1.541315523
## GENE_79 -1.804878377
## GENE_80 0.026070141
## GENE_81 -0.133896918
## GENE_82 -2.437976810
## GENE_83 -1.270892153
## GENE_84 1.769164982
## GENE_85 -0.968370074
## GENE_86 2.916874867
## GENE_87 -1.423910051
## GENE_88 -2.138016471
## GENE_89 -0.291134712
## GENE_90 1.021926921
## GENE_91 -2.907305325
## GENE_92 0.418864047
## GENE_93 1.769834404
## GENE_94 -0.205542248
## GENE_95 -1.525411669
## GENE_96 4.912960336
## GENE_97 -0.334940717
## GENE_98 1.095967835
## GENE_99 -0.263029578
## GENE_100 1.187721783
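Judging from the output, the first result is a set of column sums and the second is a matrix product with a vector of length `ncol(df)`. The sketch below works under that assumption. A vector of uniform random values stands in for the unknown multiplier, and the names `cs`, `v` and `out` are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
df <- writeTileDBArray(X)

# Column sums via DelayedArray's colSums() method; names come from colnames(df).
cs <- colSums(df)

# Matrix product with a numeric vector (hypothetical multiplier); the result
# has one column and keeps the row names of df.
v <- runif(ncol(df))
out <- df %*% v
head(out)
```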
Some parameters of the backend can be adjusted by passing the appropriate arguments to writeTileDBArray(). For example, we can control the path to the backend as well as the name of the attribute that contains the data:
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.0310723 3.0698032 -0.7770261 . -0.4890212 -0.8264628
## [2,] 0.7131760 0.3379398 -1.5087706 . 0.7972373 -0.2493980
## [3,] -0.8164945 1.2808272 -1.5016590 . -0.4244787 -0.9873177
## [4,] -2.0595909 1.0585495 2.0570800 . 0.8214420 1.5852594
## [5,] 0.2122694 -0.6529390 -0.1714979 . 2.4478122 -0.7582382
## ... . . . . . .
## [96,] 0.5372695 0.4672385 0.3934034 . 1.16282750 0.27681299
## [97,] -0.1986432 -0.6483247 0.2708613 . -0.67175437 -1.36846436
## [98,] -0.4718827 2.6193819 -0.4960531 . 0.15029578 0.09244555
## [99,] 1.1201927 0.2829049 1.5238887 . 1.24813821 -0.10667334
## [100,] -1.0413931 -0.4542847 0.8310756 . -2.32736187 0.02997124
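The sketch below shows this call with a temporary directory for the path and a hypothetical attribute name `"values"`. The values used for the output above are not shown. Reopening the array with the `TileDBArray()` constructor and an `attr` argument is an assumption about the package interface, so check `?TileDBArray` before relying on it.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)

# Store the array at a chosen location under a chosen attribute name
# ("values" is a hypothetical name).
path <- tempfile()
Z <- writeTileDBArray(X, path = path, attr = "values")

# Assumed interface: reopen the stored array from its path and attribute.
Z2 <- TileDBArray(path, attr = "values")
Z2
```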
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.0310723 3.0698032 -0.7770261 . -0.4890212 -0.8264628
## [2,] 0.7131760 0.3379398 -1.5087706 . 0.7972373 -0.2493980
## [3,] -0.8164945 1.2808272 -1.5016590 . -0.4244787 -0.9873177
## [4,] -2.0595909 1.0585495 2.0570800 . 0.8214420 1.5852594
## [5,] 0.2122694 -0.6529390 -0.1714979 . 2.4478122 -0.7582382
## ... . . . . . .
## [96,] 0.5372695 0.4672385 0.3934034 . 1.16282750 0.27681299
## [97,] -0.1986432 -0.6483247 0.2708613 . -0.67175437 -1.36846436
## [98,] -0.4718827 2.6193819 -0.4960531 . 0.15029578 0.09244555
## [99,] 1.1201927 0.2829049 1.5238887 . 1.24813821 -0.10667334
## [100,] -1.0413931 -0.4542847 0.8310756 . -2.32736187 0.02997124
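Here is a sketch of the global-variable route. It assumes `setTileDBPath()` is the relevant setter and that passing `NULL` unsets it. Both points should be checked against the package documentation, and the name `path2` is illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)

# Set the location used when the next array is created by coercion.
path2 <- tempfile()
setTileDBPath(path2)
Z <- as(X, "TileDBArray")

# Unset the global path so that later arrays fall back to the default.
setTileDBPath(NULL)
```

The global setting is a side effect on the whole session. Resetting it afterwards keeps later coercions from targeting the same location.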
Session information:
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.18 TileDBArray_1.17.0 DelayedArray_0.33.1
## [4] SparseArray_1.6.0 S4Arrays_1.6.0 IRanges_2.41.0
## [7] abind_1.4-8 S4Vectors_0.44.0 MatrixGenerics_1.19.0
## [10] matrixStats_1.4.1 BiocGenerics_0.53.1 generics_0.1.3
## [13] Matrix_1.7-1 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0 jsonlite_1.8.9 compiler_4.4.1
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 RcppCCTZ_0.2.12
## [13] R6_2.5.1 XVector_0.46.0 tiledb_0.30.2
## [16] knitr_1.48 maketools_1.3.1 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.48
## [22] sass_0.4.9 sys_3.4.3 bit64_4.5.2
## [25] cli_3.6.3 zlibbioc_1.52.0 spdl_0.0.5
## [28] digest_0.6.37 grid_4.4.1 lifecycle_1.0.4
## [31] data.table_1.16.2 nanotime_0.3.10 evaluate_1.0.1
## [34] zoo_1.8-12 buildtools_1.0.0 rmarkdown_2.28
## [37] tools_4.4.1 htmltools_0.5.8.1