TileDB implements a framework for local and remote storage of dense
and sparse arrays. We can use this as a DelayedArray
backend to provide an array-level abstraction, thus allowing the data to
be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R to
support read/write operations on TileDB arrays within the DelayedArray
framework.
Creating a TileDBArray is as easy as:
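The code chunk for this step did not survive in the text, so here is a minimal sketch of what it might look like. It assumes a 100 x 10 matrix of random normal values (the name `X` matches its later use; the seed-free `rnorm()` call is an assumption, so the printed values will differ from those shown).

```r
library(TileDBArray)

# Assumed input: an ordinary 100 x 10 numeric matrix of random values.
X <- matrix(rnorm(1000), ncol = 10)

# Write the matrix to a TileDB backend (a temporary location by default)
# and get back a TileDBMatrix that points at it.
tdb <- writeTileDBArray(X)
tdb
```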
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.0980690 0.3737262 -2.0417734 . -1.2850166 -0.8650878
## [2,] 1.1499984 -3.0586088 -0.7456296 . -0.9424875 0.6705240
## [3,] 0.3864939 0.7536042 -1.1251407 . 1.2201857 0.3976605
## [4,] 0.1680516 0.4671242 -0.8067544 . -0.9340041 0.6842486
## [5,] 0.5509990 0.7466213 0.1592049 . -0.2004421 0.5552144
## ... . . . . . .
## [96,] 0.762025116 -0.003355968 0.126421659 . -1.54332010 -0.13786982
## [97,] 0.383790777 0.708645296 -1.222077790 . -0.03434548 1.06258324
## [98,] -1.117983902 -0.612733756 -0.198165321 . -1.47603549 -1.24214161
## [99,] -0.829250010 -0.512751585 0.050848093 . 0.73825832 0.70796545
## [100,] 0.237634172 0.997480306 1.736221133 . 1.56305324 0.62874966
Alternatively, we can use coercion methods:
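A sketch of the coercion route, assuming the same kind of 100 x 10 numeric matrix as before (the matrix is regenerated here so the block runs on its own, which means the values will not match the printout):

```r
library(TileDBArray)

# Assumed input matrix, as in the previous step.
X <- matrix(rnorm(1000), ncol = 10)

# Coercion writes X to a new TileDB backend and returns the wrapper object.
tdb <- as(X, "TileDBArray")
tdb
```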
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.0980690 0.3737262 -2.0417734 . -1.2850166 -0.8650878
## [2,] 1.1499984 -3.0586088 -0.7456296 . -0.9424875 0.6705240
## [3,] 0.3864939 0.7536042 -1.1251407 . 1.2201857 0.3976605
## [4,] 0.1680516 0.4671242 -0.8067544 . -0.9340041 0.6842486
## [5,] 0.5509990 0.7466213 0.1592049 . -0.2004421 0.5552144
## ... . . . . . .
## [96,] 0.762025116 -0.003355968 0.126421659 . -1.54332010 -0.13786982
## [97,] 0.383790777 0.708645296 -1.222077790 . -0.03434548 1.06258324
## [98,] -1.117983902 -0.612733756 -0.198165321 . -1.47603549 -1.24214161
## [99,] -0.829250010 -0.512751585 0.050848093 . 0.73825832 0.70796545
## [100,] 0.237634172 0.997480306 1.736221133 . 1.56305324 0.62874966
This process also works for sparse matrices:
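As a rough illustration, the sparse case could be set up as follows. The use of `Matrix::rsparsematrix()` and the 1% density are assumptions chosen to give a 1000 x 1000 input like the one printed.

```r
library(TileDBArray)

# Assumed input: a random 1000 x 1000 sparse matrix with ~1% non-zero entries.
Y <- Matrix::rsparsematrix(1000, 1000, density = 0.01)

# The sparse input yields a sparse TileDBMatrix.
sp <- writeTileDBArray(Y)
sp
```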
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
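One way to obtain a sparse logical input is to compare a sparse numeric matrix against zero. The sketch below assumes the same kind of random sparse matrix as in the previous step.

```r
library(TileDBArray)

# Assumed input: a random sparse numeric matrix.
Y <- Matrix::rsparsematrix(1000, 1000, density = 0.01)

# `Y > 0` is a sparse logical matrix; writing it keeps the logical type.
lg <- writeTileDBArray(Y > 0)
lg
```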
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.0980690 0.3737262 -2.0417734 . -1.2850166 -0.8650878
## GENE_2 1.1499984 -3.0586088 -0.7456296 . -0.9424875 0.6705240
## GENE_3 0.3864939 0.7536042 -1.1251407 . 1.2201857 0.3976605
## GENE_4 0.1680516 0.4671242 -0.8067544 . -0.9340041 0.6842486
## GENE_5 0.5509990 0.7466213 0.1592049 . -0.2004421 0.5552144
## ... . . . . . .
## GENE_96 0.762025116 -0.003355968 0.126421659 . -1.54332010 -0.13786982
## GENE_97 0.383790777 0.708645296 -1.222077790 . -0.03434548 1.06258324
## GENE_98 -1.117983902 -0.612733756 -0.198165321 . -1.47603549 -1.24214161
## GENE_99 -0.829250010 -0.512751585 0.050848093 . 0.73825832 0.70796545
## GENE_100 0.237634172 0.997480306 1.736221133 . 1.56305324 0.62874966
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as
expected:
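The three results printed next are consistent with calls along these lines. This is a sketch: the object name `out` is an assumption, and the input matrix is regenerated so the block is self-contained.

```r
library(TileDBArray)

# Assumed input: a named 100 x 10 numeric matrix.
X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))

out <- as(X, "TileDBArray")

dim(out)             # dimensions of the array
head(rownames(out))  # first few row names
head(out[, 1])       # first few values of the first column, as a named vector
```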
## [1] 100 10
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -1.0980690 1.1499984 0.3864939 0.1680516 0.5509990 -1.3202584
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required, which is
why the result is a DelayedMatrix object.
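A sketch of the two delayed operations whose results are printed next, assuming a named 100 x 10 input and an object called `out` (both names are illustrative):

```r
library(TileDBArray)

# Assumed input: a named 100 x 10 numeric matrix.
X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Subsetting is recorded as a delayed operation; the backend is untouched.
sub <- out[1:5, 1:5]
sub

# Arithmetic is delayed too; values are computed only when requested.
doubled <- out * 2
doubled
```

Calling `as.matrix()` on either result would realize the delayed operations into an ordinary in-memory matrix.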
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -1.0980690 0.3737262 -2.0417734 -1.6682418 0.5768231
## GENE_2 1.1499984 -3.0586088 -0.7456296 -1.6251950 0.7343314
## GENE_3 0.3864939 0.7536042 -1.1251407 -0.3444799 -0.3209462
## GENE_4 0.1680516 0.4671242 -0.8067544 -1.2651838 -0.3112803
## GENE_5 0.5509990 0.7466213 0.1592049 -1.3204951 -0.6614292
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -2.1961380 0.7474524 -4.0835469 . -2.5700333 -1.7301755
## GENE_2 2.2999968 -6.1172177 -1.4912592 . -1.8849750 1.3410479
## GENE_3 0.7729877 1.5072084 -2.2502814 . 2.4403713 0.7953210
## GENE_4 0.3361033 0.9342485 -1.6135087 . -1.8680081 1.3684973
## GENE_5 1.1019980 1.4932426 0.3184099 . -0.4008842 1.1104287
## ... . . . . . .
## GENE_96 1.524050233 -0.006711936 0.252843318 . -3.08664020 -0.27573963
## GENE_97 0.767581554 1.417290592 -2.444155579 . -0.06869097 2.12516648
## GENE_98 -2.235967804 -1.225467512 -0.396330641 . -2.95207099 -2.48428323
## GENE_99 -1.658500020 -1.025503170 0.101696186 . 1.47651664 1.41593090
## GENE_100 0.475268344 1.994960613 3.472442265 . 3.12610649 1.25749931
We can also do more complex matrix operations that are supported by DelayedArray:
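For instance, column sums and a matrix-vector product could be computed as sketched here. The random vector `v` and the object names are assumptions for illustration.

```r
library(TileDBArray)

# Assumed input: a named 100 x 10 numeric matrix.
X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Column sums, computed block by block over the backend.
cs <- colSums(out)
cs

# Matrix product with an ordinary numeric vector (one weight per column).
v <- runif(ncol(out))
mv <- out %*% v
mv
```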
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -5.375598 -8.113603 7.900775 2.311052 4.747431 -0.342669 12.392944
## SAMP_8 SAMP_9 SAMP_10
## -14.974653 1.957300 -1.525254
## [,1]
## GENE_1 -2.74727166
## GENE_2 -2.50475238
## GENE_3 -0.61169560
## GENE_4 -1.19755542
## GENE_5 -0.49849427
## GENE_6 -2.51232501
## GENE_7 1.84754234
## GENE_8 -2.26148442
## GENE_9 0.22198750
## GENE_10 0.11713612
## GENE_11 1.79074886
## GENE_12 -1.36852197
## GENE_13 0.36017075
## GENE_14 -0.61393683
## GENE_15 0.59631056
## GENE_16 -0.09789762
## GENE_17 1.09524344
## GENE_18 1.67409272
## GENE_19 0.80876534
## GENE_20 0.21307732
## GENE_21 -0.05593080
## GENE_22 0.73731744
## GENE_23 1.68929656
## GENE_24 -0.16424304
## GENE_25 1.12222377
## GENE_26 2.88903553
## GENE_27 -0.38447949
## GENE_28 1.22496823
## GENE_29 -1.44211685
## GENE_30 3.23809037
## GENE_31 2.39628865
## GENE_32 -3.27021959
## GENE_33 -0.42247556
## GENE_34 -0.20489820
## GENE_35 -0.22768172
## GENE_36 -0.62702425
## GENE_37 -1.87140184
## GENE_38 1.05222057
## GENE_39 0.15476318
## GENE_40 1.18451708
## GENE_41 -4.30488882
## GENE_42 0.44961023
## GENE_43 0.73391496
## GENE_44 2.29374976
## GENE_45 -1.91549059
## GENE_46 2.52978282
## GENE_47 -0.85988592
## GENE_48 2.25581457
## GENE_49 -1.34673362
## GENE_50 2.42197222
## GENE_51 1.78469597
## GENE_52 -1.63042881
## GENE_53 3.57259880
## GENE_54 -0.76750894
## GENE_55 1.41078500
## GENE_56 2.46836960
## GENE_57 -2.60899509
## GENE_58 1.54602225
## GENE_59 -0.32380936
## GENE_60 1.38565152
## GENE_61 -0.01547708
## GENE_62 -1.14699090
## GENE_63 -0.62738372
## GENE_64 1.69336682
## GENE_65 2.77960150
## GENE_66 -1.58730915
## GENE_67 0.08578080
## GENE_68 0.22947005
## GENE_69 -1.11615834
## GENE_70 -0.62171950
## GENE_71 -1.27760854
## GENE_72 -2.34267146
## GENE_73 0.95320889
## GENE_74 0.19087706
## GENE_75 0.09586077
## GENE_76 0.70327334
## GENE_77 0.48271729
## GENE_78 -3.11158148
## GENE_79 -0.50702246
## GENE_80 0.60809570
## GENE_81 2.47893406
## GENE_82 -2.09526945
## GENE_83 0.02244614
## GENE_84 1.10329823
## GENE_85 -0.60702937
## GENE_86 0.43691814
## GENE_87 -2.25739185
## GENE_88 0.31643011
## GENE_89 -1.17796355
## GENE_90 -1.39669067
## GENE_91 -0.67156585
## GENE_92 1.79666685
## GENE_93 -0.20460173
## GENE_94 -1.34338358
## GENE_95 0.45637824
## GENE_96 0.41490620
## GENE_97 0.44872238
## GENE_98 -4.08770581
## GENE_99 3.03967022
## GENE_100 3.58818078
We can adjust some parameters for creating the backend with
appropriate arguments to writeTileDBArray(). For example, we can control
the path to the backend as well as the name of the attribute containing
the data.
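A minimal sketch of such a call, assuming the relevant arguments are named `path` and `attr`. The attribute name "WHEE" and the temporary path are arbitrary choices for illustration.

```r
library(TileDBArray)

# Assumed input: a fresh 100 x 10 numeric matrix.
X <- matrix(rnorm(1000), ncol = 10)

# Choose where the backend lives and what the data attribute is called.
path <- tempfile()
custom <- writeTileDBArray(X, path = path, attr = "WHEE")
custom
```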
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.03096172 1.77421207 0.17191501 . -0.6945926 -0.6800004
## [2,] -0.62916299 -0.13931708 0.48272873 . 1.4286237 -1.3165881
## [3,] -2.34621209 0.92339418 -0.22777383 . -0.1005461 -0.1978022
## [4,] 0.60296619 0.01731837 2.14738258 . 0.8033065 -0.7035307
## [5,] -0.27924668 -0.17308041 -0.64001258 . -0.9912655 0.2840542
## ... . . . . . .
## [96,] -1.3724414 0.5044666 0.3016977 . -0.88577982 0.95495447
## [97,] -0.1577134 2.3188311 1.9801966 . 1.13409371 -0.90914540
## [98,] 0.3893308 0.5562913 0.2681973 . -0.15257106 0.10713512
## [99,] -0.4951212 0.9612113 1.6072360 . -0.76596025 -0.07702746
## [100,] -0.3230275 -1.1657553 1.1901853 . -1.74630881 1.24084805
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
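The sketch below shows how this might look for the backend path, assuming the setter is `setTileDBPath()` and that passing `NULL` restores the default behavior. Treat both as assumptions to check against the package documentation.

```r
library(TileDBArray)

# Assumed input: a 100 x 10 numeric matrix.
X <- matrix(rnorm(1000), ncol = 10)

# Set the global path, then coerce; the backend is written to path2.
path2 <- tempfile()
setTileDBPath(path2)
coerced <- as(X, "TileDBArray")
coerced

# Unset the global so later writes go back to the default location.
setTileDBPath(NULL)
```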
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.03096172 1.77421207 0.17191501 . -0.6945926 -0.6800004
## [2,] -0.62916299 -0.13931708 0.48272873 . 1.4286237 -1.3165881
## [3,] -2.34621209 0.92339418 -0.22777383 . -0.1005461 -0.1978022
## [4,] 0.60296619 0.01731837 2.14738258 . 0.8033065 -0.7035307
## [5,] -0.27924668 -0.17308041 -0.64001258 . -0.9912655 0.2840542
## ... . . . . . .
## [96,] -1.3724414 0.5044666 0.3016977 . -0.88577982 0.95495447
## [97,] -0.1577134 2.3188311 1.9801966 . 1.13409371 -0.90914540
## [98,] 0.3893308 0.5562913 0.2681973 . -0.15257106 0.10713512
## [99,] -0.4951212 0.9612113 1.6072360 . -0.76596025 -0.07702746
## [100,] -0.3230275 -1.1657553 1.1901853 . -1.74630881 1.24084805
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.20 TileDBArray_1.17.0 DelayedArray_0.33.6
## [4] SparseArray_1.7.7 S4Arrays_1.7.3 IRanges_2.41.3
## [7] abind_1.4-8 S4Vectors_0.45.4 MatrixGenerics_1.19.1
## [10] matrixStats_1.5.0 BiocGenerics_0.53.6 generics_0.1.3
## [13] Matrix_1.7-3 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_1.9.1 compiler_4.4.3
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 RcppCCTZ_0.2.13
## [13] R6_2.6.1 XVector_0.47.2 tiledb_0.30.2
## [16] knitr_1.50 maketools_1.3.2 bslib_0.9.0
## [19] rlang_1.1.5 cachem_1.1.0 xfun_0.51
## [22] sass_0.4.9 sys_3.4.3 bit64_4.6.0-1
## [25] cli_3.6.4 spdl_0.0.5 digest_0.6.37
## [28] grid_4.4.3 lifecycle_1.0.4 data.table_1.17.0
## [31] evaluate_1.0.3 nanotime_0.3.11 zoo_1.8-13
## [34] buildtools_1.0.0 rmarkdown_2.29 tools_4.4.3
## [37] htmltools_0.5.8.1