TileDB implements a framework for local and remote storage of dense
and sparse arrays. We can use this as a DelayedArray
backend to provide an array-level abstraction, thus allowing the data to
be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R to
support read/write operations on TileDB arrays within the DelayedArray
framework.
Creating a TileDBArray is as easy as:
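A minimal sketch of such a call, assuming X is a 100-by-10 matrix of random normal values. The use of rnorm() and the name tdb are assumptions chosen to match the shape of the printed output; the printed values come from a different random draw.

```r
library(TileDBArray)

# Assumed input: a 100 x 10 matrix of random normal values.
X <- matrix(rnorm(1000), ncol = 10)

# Write X to a TileDB backend and get back a TileDBMatrix pointing to it.
# No path is given here, so the package chooses the storage location.
tdb <- writeTileDBArray(X)
tdb
```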
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.11121241 -0.77746233 0.29669910 . -0.9738874 -2.7032837
## [2,] -0.23093286 0.02575036 -0.98082034 . 0.2640319 0.1376174
## [3,] 0.85523721 -0.22571451 0.13172056 . -0.7500873 -0.3277719
## [4,] -0.34815154 0.95986226 0.26786391 . 0.8398497 2.0524982
## [5,] -0.95172452 -0.21975246 -1.16025641 . 0.5027632 -0.2299850
## ... . . . . . .
## [96,] -0.86690460 -0.35127102 -0.46650964 . 1.35521665 -1.99238794
## [97,] 0.60434322 -0.05384068 1.41577319 . -0.75597351 0.24366122
## [98,] -0.42257607 0.15399382 -2.10086689 . 0.05599346 -0.09222779
## [99,] 1.31168909 -0.26242372 -1.11907884 . 0.92059253 0.80331219
## [100,] -0.20768590 -0.93412086 -1.59665759 . -1.36133141 -1.11780215
Alternatively, we can use coercion methods:
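As a sketch, the coercion could look like the following, again assuming X is an ordinary 100-by-10 numeric matrix (the values will differ from the printed output because the data are random):

```r
library(TileDBArray)

# Assumed input: a 100 x 10 matrix of random normal values.
X <- matrix(rnorm(1000), ncol = 10)

# Coerce the in-memory matrix to a TileDB-backed array.
tdb <- as(X, "TileDBArray")
tdb
```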
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.11121241 -0.77746233 0.29669910 . -0.9738874 -2.7032837
## [2,] -0.23093286 0.02575036 -0.98082034 . 0.2640319 0.1376174
## [3,] 0.85523721 -0.22571451 0.13172056 . -0.7500873 -0.3277719
## [4,] -0.34815154 0.95986226 0.26786391 . 0.8398497 2.0524982
## [5,] -0.95172452 -0.21975246 -1.16025641 . 0.5027632 -0.2299850
## ... . . . . . .
## [96,] -0.86690460 -0.35127102 -0.46650964 . 1.35521665 -1.99238794
## [97,] 0.60434322 -0.05384068 1.41577319 . -0.75597351 0.24366122
## [98,] -0.42257607 0.15399382 -2.10086689 . 0.05599346 -0.09222779
## [99,] 1.31168909 -0.26242372 -1.11907884 . 0.92059253 0.80331219
## [100,] -0.20768590 -0.93412086 -1.59665759 . -1.36133141 -1.11780215
This process also works for sparse matrices:
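One way to try this, assuming a random 1000-by-1000 sparse matrix generated with Matrix::rsparsematrix(). The name Y and the density of 0.01 are illustrative assumptions, not values confirmed by the text.

```r
library(TileDBArray)

# Hypothetical sparse input: roughly 1% of entries are non-zero.
Y <- Matrix::rsparsematrix(1000, 1000, density = 0.01)

# Writing a sparse matrix yields a sparse TileDBMatrix.
tdb_sparse <- writeTileDBArray(Y)
tdb_sparse
```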
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0.0 0.0
## [2,] 0 0 0 . 0.0 -0.9
## [3,] 0 0 0 . 0.0 0.0
## [4,] 0 0 0 . 0.0 0.0
## [5,] 0 0 0 . 0.0 0.0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
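A small sketch of both cases, assuming a logical sparse matrix obtained by thresholding a random sparse matrix and a small dense integer matrix. The names Y, tdb_lgl and tdb_int are hypothetical.

```r
library(TileDBArray)

# Hypothetical sparse input, thresholded to give a logical sparse matrix.
Y <- Matrix::rsparsematrix(1000, 1000, density = 0.01)
tdb_lgl <- writeTileDBArray(Y > 0)
tdb_lgl

# A small dense integer matrix, written in the same way.
tdb_int <- writeTileDBArray(matrix(1:20, ncol = 4))
tdb_int
```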
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.11121241 -0.77746233 0.29669910 . -0.9738874 -2.7032837
## GENE_2 -0.23093286 0.02575036 -0.98082034 . 0.2640319 0.1376174
## GENE_3 0.85523721 -0.22571451 0.13172056 . -0.7500873 -0.3277719
## GENE_4 -0.34815154 0.95986226 0.26786391 . 0.8398497 2.0524982
## GENE_5 -0.95172452 -0.21975246 -1.16025641 . 0.5027632 -0.2299850
## ... . . . . . .
## GENE_96 -0.86690460 -0.35127102 -0.46650964 . 1.35521665 -1.99238794
## GENE_97 0.60434322 -0.05384068 1.41577319 . -0.75597351 0.24366122
## GENE_98 -0.42257607 0.15399382 -2.10086689 . 0.05599346 -0.09222779
## GENE_99 1.31168909 -0.26242372 -1.11907884 . 0.92059253 0.80331219
## GENE_100 -0.20768590 -0.93412086 -1.59665759 . -1.36133141 -1.11780215
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as
expected:
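The calls below are a sketch of the kind of extraction meant here. The name out is an assumption, and X is rebuilt with the GENE_/SAMP_ dimension names so that the block runs on its own.

```r
library(TileDBArray)

# Rebuild a named 100 x 10 matrix (random values, so output will vary).
X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))

out <- as(X, "TileDBArray")

dim(out)             # dimensions of the array
head(rownames(out))  # first few row names
head(out[, 1])       # first few values of the first column
```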
## [1] 100 10
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.1112124 -0.2309329 0.8552372 -0.3481515 -0.9517245 -0.9119683
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required, hence
the creation of the DelayedMatrix object.
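A sketch of two such delayed operations, assuming out is a TileDB-backed version of a named 100-by-10 matrix X (the names out, sub and doubled are hypothetical):

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Subsetting returns a DelayedMatrix; nothing is rewritten on disk.
sub <- out[1:5, 1:5]
sub

# Arithmetic is also recorded as a delayed operation.
doubled <- out * 2
doubled
```

The values are only computed when they are needed, for example when printing or when calling as.matrix() on the result.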
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.11121241 -0.77746233 0.29669910 1.11727798 1.44274006
## GENE_2 -0.23093286 0.02575036 -0.98082034 -1.19631089 -0.50190029
## GENE_3 0.85523721 -0.22571451 0.13172056 2.52263301 0.30427252
## GENE_4 -0.34815154 0.95986226 0.26786391 1.02778496 -1.24317573
## GENE_5 -0.95172452 -0.21975246 -1.16025641 -2.02210094 0.11605730
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.22242482 -1.55492466 0.59339821 . -1.9477749 -5.4065675
## GENE_2 -0.46186572 0.05150072 -1.96164067 . 0.5280639 0.2752348
## GENE_3 1.71047443 -0.45142901 0.26344112 . -1.5001746 -0.6555438
## GENE_4 -0.69630308 1.91972453 0.53572781 . 1.6796994 4.1049963
## GENE_5 -1.90344903 -0.43950493 -2.32051281 . 1.0055264 -0.4599701
## ... . . . . . .
## GENE_96 -1.7338092 -0.7025420 -0.9330193 . 2.7104333 -3.9847759
## GENE_97 1.2086864 -0.1076814 2.8315464 . -1.5119470 0.4873224
## GENE_98 -0.8451521 0.3079876 -4.2017338 . 0.1119869 -0.1844556
## GENE_99 2.6233782 -0.5248474 -2.2381577 . 1.8411851 1.6066244
## GENE_100 -0.4153718 -1.8682417 -3.1933152 . -2.7226628 -2.2356043
We can also do more complex matrix operations that are supported by DelayedArray:
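For instance, column sums and a matrix-vector product could be computed as sketched below. The names out, v, cs and res are assumptions, and the random inputs mean the numbers will not match the printed output.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Column sums, computed by DelayedArray over the TileDB-backed data.
cs <- colSums(out)
cs

# Matrix product with a random vector of matching length.
v <- runif(ncol(out))
res <- out %*% v
res
```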
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 4.0247200 -9.9007397 -3.2759337 20.4835213 11.9521906 -4.6486200
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -15.3281753 8.3305574 -0.4708035 5.1513811
## [,1]
## GENE_1 -2.07306141
## GENE_2 -1.13744558
## GENE_3 1.19385474
## GENE_4 1.74772401
## GENE_5 -2.70507139
## GENE_6 -1.24468844
## GENE_7 -2.46896879
## GENE_8 -2.38969539
## GENE_9 0.85382708
## GENE_10 -1.26553381
## GENE_11 1.39231007
## GENE_12 0.09184206
## GENE_13 -2.69070781
## GENE_14 0.87617312
## GENE_15 3.82056058
## GENE_16 3.08100590
## GENE_17 0.91517083
## GENE_18 -3.03217694
## GENE_19 1.03096145
## GENE_20 0.80431963
## GENE_21 -0.88055384
## GENE_22 2.40725446
## GENE_23 -2.52043915
## GENE_24 2.71405829
## GENE_25 0.06694443
## GENE_26 -0.37187935
## GENE_27 -1.34507802
## GENE_28 -1.01654195
## GENE_29 2.17419086
## GENE_30 3.09700290
## GENE_31 -0.25689141
## GENE_32 1.73447020
## GENE_33 -0.98349832
## GENE_34 -2.83331370
## GENE_35 -0.63139973
## GENE_36 0.07398955
## GENE_37 1.43402949
## GENE_38 1.06934741
## GENE_39 0.88832680
## GENE_40 2.76792430
## GENE_41 -0.52399859
## GENE_42 -1.09666715
## GENE_43 -0.56630550
## GENE_44 -3.22914014
## GENE_45 3.60383393
## GENE_46 1.25056174
## GENE_47 1.27045360
## GENE_48 -0.30516531
## GENE_49 -1.86816601
## GENE_50 -1.43756093
## GENE_51 0.56406488
## GENE_52 1.33293976
## GENE_53 2.65235962
## GENE_54 1.37614554
## GENE_55 0.14727544
## GENE_56 -1.26503957
## GENE_57 -0.38551870
## GENE_58 0.43341436
## GENE_59 3.13403895
## GENE_60 -0.80235113
## GENE_61 2.51029354
## GENE_62 -2.57983726
## GENE_63 0.34038257
## GENE_64 -0.22035871
## GENE_65 -0.47585042
## GENE_66 0.42184305
## GENE_67 -0.09164513
## GENE_68 2.94527935
## GENE_69 0.35231461
## GENE_70 0.75293902
## GENE_71 1.70363783
## GENE_72 -1.10630782
## GENE_73 -0.60488522
## GENE_74 0.26086200
## GENE_75 -0.78843748
## GENE_76 1.87028284
## GENE_77 -2.41052366
## GENE_78 5.45841638
## GENE_79 -1.72319643
## GENE_80 -0.96963619
## GENE_81 -2.76035520
## GENE_82 0.40611305
## GENE_83 -1.14252005
## GENE_84 -1.93212494
## GENE_85 3.02394974
## GENE_86 -1.82956832
## GENE_87 0.17887289
## GENE_88 -0.45044999
## GENE_89 2.01656209
## GENE_90 -0.66609592
## GENE_91 -1.57852119
## GENE_92 -0.49054954
## GENE_93 -0.02542750
## GENE_94 0.88581799
## GENE_95 1.63361948
## GENE_96 -2.70488405
## GENE_97 1.23584606
## GENE_98 0.91880363
## GENE_99 3.66069000
## GENE_100 -0.94163715
We can adjust some parameters for creating the backend with
appropriate arguments to writeTileDBArray(). For example, we can control
the path to the backend as well as the name of the attribute containing
the data.
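A sketch of such a call, using the path and attr arguments of writeTileDBArray(). The temporary path and the attribute name "WHEE" are arbitrary placeholders chosen for illustration.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)

# Choose where the TileDB array is stored and what its attribute is called.
path <- tempfile()
tdb <- writeTileDBArray(X, path = path, attr = "WHEE")
tdb
```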
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.21173505 -1.37146049 -0.51407793 . -0.78034288 -0.18791472
## [2,] 0.46154773 0.84134383 -0.05557603 . 0.75320089 -1.10434819
## [3,] 0.30418644 -2.00162138 1.61663070 . 1.45651635 0.90185634
## [4,] -0.80779822 -1.17597650 -0.45567107 . -0.03366864 -1.21638865
## [5,] 1.49203272 -0.03433720 0.55329419 . -1.23297182 0.46143426
## ... . . . . . .
## [96,] -0.4858482 0.8634975 1.8656026 . -1.04110412 -0.02490801
## [97,] -0.2430335 -0.5367983 -0.5645933 . -0.98074324 0.72234220
## [98,] 0.1335198 -0.4421754 -0.3929330 . 0.37562041 -0.15347705
## [99,] -0.5356884 0.5433323 0.7353919 . 0.82538861 0.03581916
## [100,] -0.5295905 -0.4243385 -1.5902678 . 2.24817456 -1.01119971
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
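As a sketch, the global path could be set with setTileDBPath() before coercion. The name path2 is hypothetical, and resetting with setTileDBPath(NULL) afterwards is an assumption about good practice rather than something the text requires.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)

# Set the global path; the next coercion stores its backend there.
path2 <- tempfile()
setTileDBPath(path2)
tdb <- as(X, "TileDBArray")
tdb

# Unset the global path so later writes do not reuse the same location.
setTileDBPath(NULL)
```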
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.21173505 -1.37146049 -0.51407793 . -0.78034288 -0.18791472
## [2,] 0.46154773 0.84134383 -0.05557603 . 0.75320089 -1.10434819
## [3,] 0.30418644 -2.00162138 1.61663070 . 1.45651635 0.90185634
## [4,] -0.80779822 -1.17597650 -0.45567107 . -0.03366864 -1.21638865
## [5,] 1.49203272 -0.03433720 0.55329419 . -1.23297182 0.46143426
## ... . . . . . .
## [96,] -0.4858482 0.8634975 1.8656026 . -1.04110412 -0.02490801
## [97,] -0.2430335 -0.5367983 -0.5645933 . -0.98074324 0.72234220
## [98,] 0.1335198 -0.4421754 -0.3929330 . 0.37562041 -0.15347705
## [99,] -0.5356884 0.5433323 0.7353919 . 0.82538861 0.03581916
## [100,] -0.5295905 -0.4243385 -1.5902678 . 2.24817456 -1.01119971
Session information:
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.19 TileDBArray_1.17.0 DelayedArray_0.33.3
## [4] SparseArray_1.7.2 S4Arrays_1.7.1 IRanges_2.41.2
## [7] abind_1.4-8 S4Vectors_0.45.2 MatrixGenerics_1.19.0
## [10] matrixStats_1.4.1 BiocGenerics_0.53.3 generics_0.1.3
## [13] Matrix_1.7-1 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0.1 jsonlite_1.8.9 compiler_4.4.2
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13-1
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 RcppCCTZ_0.2.13
## [13] R6_2.5.1 XVector_0.47.1 tiledb_0.30.2
## [16] knitr_1.49 maketools_1.3.1 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.49
## [22] sass_0.4.9 sys_3.4.3 bit64_4.5.2
## [25] cli_3.6.3 spdl_0.0.5 digest_0.6.37
## [28] grid_4.4.2 lifecycle_1.0.4 data.table_1.16.4
## [31] evaluate_1.0.1 nanotime_0.3.10 zoo_1.8-12
## [34] buildtools_1.0.0 rmarkdown_2.29 tools_4.4.2
## [37] htmltools_0.5.8.1