DelayedMatrixStats ports the matrixStats API to work with DelayedMatrix objects from the DelayedArray package. It provides high-performing functions operating on rows and columns of DelayedMatrix objects, including all subclasses such as RleArray (from the DelayedArray package) and HDF5Array (from the HDF5Array) as well as supporting all types of seeds, such as matrix (from the base package) and Matrix (from the Matrix package).
The DelayedArray package allows developers to store array-like data using in-memory or on-disk representations (e.g., in HDF5 files) and provides a common and familiar array-like interface for interacting with these data.
The DelayedMatrixStats package is designed to make life easier for Bioconductor developers wanting to use DelayedArray by providing a rich set of column-wise and row-wise summary functions.
We briefly demonstrate and explain these two features using a simple example. We’ll simulate some (unrealistic) RNA-seq read counts data from 10,000 genes and 20 samples and store it on disk as a HDF5Array:
library(DelayedArray)
x <- do.call(cbind, lapply(1:20, function(j) {
rpois(n = 10000, lambda = sample(20:40, 10000, replace = TRUE))
}))
colnames(x) <- paste0("S", 1:20)
x <- realize(x, "HDF5Array")
x
#> <10000 x 20> HDF5Matrix object of type "integer":
#> S1 S2 S3 S4 ... S17 S18 S19 S20
#> [1,] 22 29 20 38 . 27 32 23 22
#> [2,] 29 27 21 26 . 24 41 27 29
#> [3,] 36 27 34 32 . 19 23 30 30
#> [4,] 22 34 45 37 . 25 38 39 34
#> [5,] 24 24 36 44 . 28 19 26 28
#> ... . . . . . . . . .
#> [9996,] 33 21 24 24 . 20 25 40 24
#> [9997,] 23 25 25 34 . 43 40 24 32
#> [9998,] 27 35 25 17 . 31 33 30 24
#> [9999,] 20 41 34 24 . 20 30 24 24
#> [10000,] 25 49 34 38 . 27 25 40 16
Suppose you wish to compute the standard deviation of the read counts for each gene.
You might think to use apply()
like in the
following:
system.time(row_sds <- apply(x, 1, sd))
#> user system elapsed
#> 133.249 8.520 141.776
head(row_sds)
#> [1] 7.762257 7.387044 7.859791 8.316755 8.277427 10.753090
This works, but takes quite a while.
Or perhaps you already know that the matrixStats
package provides a rowSds()
function:
matrixStats::rowSds(x)
#> Error in rowVars(x, rows = rows, cols = cols, na.rm = na.rm, refine = refine, : Argument 'x' must be a matrix or a vector
Unfortunately (and perhaps unsurprisingly) this doesn’t work. matrixStats is designed for use on in-memory matrix objects. Well, why don’t we just first realize our data in-memory and then use matrixStats
system.time(row_sds <- matrixStats::rowSds(as.matrix(x)))
#> user system elapsed
#> 0.008 0.000 0.008
head(row_sds)
#> [1] 7.762257 7.387044 7.859791 8.316755 8.277427 10.753090
This works and is many times faster than the
apply()
-based approach! However, it rather defeats the
purpose of using a HDF5Array for storing the data since we have
to bring all the data into memory at once to compute the result.
Instead, we can use DelayedMatrixStats::rowSds()
, which
has the speed benefits of matrixStats::rowSds()
1 but
without having to load the entire data into memory at once2:
library(DelayedMatrixStats)
system.time(row_sds <- rowSds(x))
#> user system elapsed
#> 0.018 0.000 0.018
head(row_sds)
#> [1] 7.762257 7.387044 7.859791 8.316755 8.277427 10.753090
Finally, by using DelayedMatrixStats
we can use the same code, (colMedians(x)
) regardless of
whether the input is an ordinary matrix or a
DelayedMatrix. This is useful for packages wishing to support
both types of objects, e.g., packages wanting to retain backward
compatibility or during a transition period from matrix-based
to DelayeMatrix-based objects.
The initial release of DelayedMatrixStats supports the complete column-wise and row-wise API matrixStats API3. Please see the matrixStats vignette (available online) for a summary these methods. The following table documents the API coverage and availability of ‘seed-aware’ methods in the current version of DelayedMatrixStats, where:
Method | Block processing | base::matrix optimized | Matrix::dgCMatrix optimized | Matrix::lgCMatrix optimized | DelayedArray::RleArray (SolidRleArraySeed) optimized | DelayedArray::RleArray (ChunkedRleArraySeed) optimized | HDF5Array::HDF5Matrix optimized | base::data.frame optimized | S4Vectors::DataFrame optimized |
---|---|---|---|---|---|---|---|---|---|
colAlls() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colAnyMissings() |
❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colAnyNAs() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colAnys() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colAvgsPerRowSet() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colCollapse() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colCounts() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colCummaxs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colCummins() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colCumprods() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colCumsums() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colIQRDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colIQRs() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colLogSumExps() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colMadDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colMads() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colMaxs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colMeans2() |
✔ | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ |
colMedians() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colMins() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colOrderStats() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colProds() |
✔ | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ |
colQuantiles() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colRanges() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colRanks() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colSdDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colSds() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colSums2() |
✔ | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ |
colTabulates() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colVarDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colVars() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colWeightedMads() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colWeightedMeans() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colWeightedMedians() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
colWeightedSds() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colWeightedVars() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
colsum() |
☑️ | ❌ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowAlls() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowAnyMissings() |
❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowAnyNAs() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowAnys() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowAvgsPerColSet() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowCollapse() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowCounts() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowCummaxs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowCummins() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowCumprods() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowCumsums() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowIQRDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowIQRs() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowLogSumExps() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowMadDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowMads() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowMaxs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowMeans2() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowMedians() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowMins() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowOrderStats() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowProds() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowQuantiles() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowRanges() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowRanks() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowSdDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowSds() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowSums2() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowTabulates() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowVarDiffs() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowVars() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowWeightedMads() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowWeightedMeans() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowWeightedMedians() |
✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowWeightedSds() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowWeightedVars() |
✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |
rowsum() |
☑️ | ❌ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
As well as offering a familiar API, DelayedMatrixStats provides ‘seed-aware’ methods that are optimized for specific types of DelayedMatrix objects.
To illustrate this idea, we will compare two ways of computing the column sums of a DelayedMatrix object:
force_block_processing
argumentWe will demonstrate this by computing the column sums matrices with 20,000 rows and 600 columns where the data have different structure and are stored in DelayedMatrix objects with different types of seed:
We use the microbenchmark package to measure running time and the profmem package to measure the total memory allocations of each method.
In each case, the ‘seed-aware’ method is many times faster and allocates substantially lower total memory.
library(DelayedMatrixStats)
library(sparseMatrixStats)
library(microbenchmark)
library(profmem)
set.seed(666)
# -----------------------------------------------------------------------------
# Dense with values in (0, 1)
# Fast, memory-efficient column sums of DelayedMatrix with ordinary matrix seed
#
# Generate some data
dense_matrix <- matrix(runif(20000 * 600),
nrow = 20000,
ncol = 600)
# Benchmark
dm_matrix <- DelayedArray(dense_matrix)
class(seed(dm_matrix))
#> [1] "matrix" "array"
dm_matrix
#> <20000 x 600> DelayedMatrix object of type "double":
#> [,1] [,2] [,3] ... [,599] [,600]
#> [1,] 0.7743685 0.6601787 0.4098798 . 0.89118118 0.05776471
#> [2,] 0.1972242 0.8436035 0.9198450 . 0.31799523 0.63099417
#> [3,] 0.9780138 0.2017589 0.4696158 . 0.31783791 0.02830454
#> [4,] 0.2013274 0.8797239 0.6474768 . 0.55217184 0.09678816
#> [5,] 0.3612444 0.8158778 0.5928599 . 0.08530977 0.39224147
#> ... . . . . . .
#> [19996,] 0.19490291 0.07763570 0.56391725 . 0.09703424 0.62659353
#> [19997,] 0.61182993 0.01910121 0.04046034 . 0.59708388 0.88389731
#> [19998,] 0.12932744 0.21155070 0.19344085 . 0.51682032 0.13378223
#> [19999,] 0.18985573 0.41716539 0.35110782 . 0.62939661 0.94601427
#> [20000,] 0.87889047 0.25308041 0.54666920 . 0.81630322 0.73272217
microbenchmark(
block_processing = colSums2(dm_matrix, force_block_processing = TRUE),
seed_aware = colSums2(dm_matrix),
times = 10)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> block_processing 37.07903 38.73802 103.10016 40.29768 43.82515 358.60450 10
#> seed_aware 11.94733 11.97567 12.07432 12.11604 12.14295 12.21361 10
total(profmem(colSums2(dm_matrix, force_block_processing = TRUE)))
#> [1] 96100848
total(profmem(colSums2(dm_matrix)))
#> [1] 6064
# -----------------------------------------------------------------------------
# Sparse (60% zero) with values in (0, 1)
# Fast, memory-efficient column sums of DelayedMatrix with ordinary matrix seed
#
# Generate some data
sparse_matrix <- dense_matrix
zero_idx <- sample(length(sparse_matrix), 0.6 * length(sparse_matrix))
sparse_matrix[zero_idx] <- 0
# Benchmark
dm_dgCMatrix <- DelayedArray(Matrix(sparse_matrix, sparse = TRUE))
class(seed(dm_dgCMatrix))
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"
dm_dgCMatrix
#> <20000 x 600> sparse DelayedMatrix object of type "double":
#> [,1] [,2] [,3] ... [,599] [,600]
#> [1,] 0.7743685 0.0000000 0.0000000 . 0.89118118 0.00000000
#> [2,] 0.1972242 0.0000000 0.9198450 . 0.00000000 0.00000000
#> [3,] 0.9780138 0.0000000 0.4696158 . 0.31783791 0.00000000
#> [4,] 0.0000000 0.8797239 0.6474768 . 0.55217184 0.00000000
#> [5,] 0.3612444 0.0000000 0.0000000 . 0.08530977 0.39224147
#> ... . . . . . .
#> [19996,] 0.1949029 0.0776357 0.0000000 . 0.09703424 0.00000000
#> [19997,] 0.0000000 0.0000000 0.0000000 . 0.00000000 0.88389731
#> [19998,] 0.0000000 0.2115507 0.1934408 . 0.00000000 0.00000000
#> [19999,] 0.1898557 0.0000000 0.3511078 . 0.62939661 0.94601427
#> [20000,] 0.8788905 0.2530804 0.0000000 . 0.00000000 0.73272217
microbenchmark(
block_processing = colSums2(dm_dgCMatrix, force_block_processing = TRUE),
seed_aware = colSums2(dm_dgCMatrix),
times = 10)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> block_processing 69.64262 76.60401 116.0571 87.26354 97.29511 401.24965 10
#> seed_aware 19.02165 19.14325 19.1913 19.16863 19.23355 19.53455 10
total(profmem(colSums2(dm_dgCMatrix, force_block_processing = TRUE)))
#> [1] 153758032
total(profmem(colSums2(dm_dgCMatrix)))
#> [1] 4848
# -----------------------------------------------------------------------------
# Dense with values in {0, 100} featuring runs of identical values
# Fast, memory-efficient column sums of DelayedMatrix with Rle-based seed
#
# Generate some data
runs <- rep(sample(100, 500000, replace = TRUE), rpois(500000, 100))
runs <- runs[seq_len(20000 * 600)]
runs_matrix <- matrix(runs,
nrow = 20000,
ncol = 600)
# Benchmark
dm_rle <- RleArray(Rle(runs),
dim = c(20000, 600))
class(seed(dm_rle))
#> [1] "SolidRleArraySeed"
#> attr(,"package")
#> [1] "DelayedArray"
dm_rle
#> <20000 x 600> RleMatrix object of type "integer":
#> [,1] [,2] [,3] [,4] ... [,597] [,598] [,599] [,600]
#> [1,] 20 65 9 50 . 43 74 85 14
#> [2,] 20 65 9 50 . 43 74 85 14
#> [3,] 20 65 9 50 . 43 74 85 14
#> [4,] 20 65 9 50 . 43 74 85 14
#> [5,] 20 65 9 50 . 43 74 85 14
#> ... . . . . . . . . .
#> [19996,] 65 9 50 16 . 74 85 14 55
#> [19997,] 65 9 50 16 . 74 85 14 55
#> [19998,] 65 9 50 16 . 74 85 14 55
#> [19999,] 65 9 50 16 . 74 85 14 55
#> [20000,] 65 9 50 16 . 74 85 14 55
microbenchmark(
block_processing = colSums2(dm_rle, force_block_processing = TRUE),
seed_aware = colSums2(dm_rle),
times = 10)
#> Unit: milliseconds
#> expr min lq mean median uq
#> block_processing 311.889584 312.738261 347.590351 313.973030 317.138524
#> seed_aware 3.369649 3.492389 4.769061 3.612187 4.080206
#> max neval
#> 640.43532 10
#> 14.32996 10
total(profmem(colSums2(dm_rle, force_block_processing = TRUE)))
#> [1] 144256800
total(profmem(colSums2(dm_rle)))
#> [1] 41048
The development of ‘seed-aware’ methods is ongoing work (see the Roadmap), and for now only a few methods and seed-types have a ‘seed-aware’ method.
An extensive set of benchmarks is under development at http://peterhickey.org/BenchmarkingDelayedMatrixStats/.
A key feature of a DelayedArray is the ability to register
‘delayed operations’. For example, let’s compute
sin(dm_matrix)
:
This instantaneous because the operation is not actually performed,
rather it is registered and only performed when the object is
realized. All methods in DelayedMatrixStats
will correctly realise these delayed operations before computing the
final result. For example, let’s compute
colSums2(sin_dm_matrix)
and compare check we get the
correct answer:
The initial version of DelayedMatrixStats provides complete coverage of the matrixStats column-wise and row-wise API4, allowing package developers to use these functions with DelayedMatrix objects as well as with ordinary matrix objects. This should simplify package development and assist authors to support to their software for large datasets stored in disk-backed data structures such as HDF5Array. Such large datasets are increasingly common with the rise of single-cell genomics.
Future releases of DelayedMatrixStats
will improve the performance of these methods, specifically by
developing additional ‘seed-aware’ methods. The plan is to prioritise
commonly used methods (e.g.,
colMeans2()
/rowMeans2()
,
colSums2()
/rowSums2()
, etc.) and the
development of ‘seed-aware’ methods for the HDF5Matrix class.
To do so, we will leverage the beachmat
package. Proof-of-concept code has shown that this can greatly increase
the performance when analysing such disk-backed data.
Importantly, all package developers using methods from DelayedMatrixStats will immediately gain from performance improvements to these low-level routines. By using DelayedMatrixStats, package developers will be able to focus on higher level programming tasks and address important scientific questions and technological challenges in high-throughput biology.
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] profmem_0.6.0 microbenchmark_1.5.0
#> [3] sparseMatrixStats_1.19.0 DelayedMatrixStats_1.29.0
#> [5] DelayedArray_0.33.2 SparseArray_1.7.2
#> [7] S4Arrays_1.7.1 abind_1.4-8
#> [9] IRanges_2.41.1 S4Vectors_0.45.2
#> [11] MatrixGenerics_1.19.0 matrixStats_1.4.1
#> [13] BiocGenerics_0.53.3 generics_0.1.3
#> [15] Matrix_1.7-1 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] jsonlite_1.8.9 compiler_4.4.2 BiocManager_1.30.25
#> [4] crayon_1.5.3 Rcpp_1.0.13-1 rhdf5filters_1.19.0
#> [7] jquerylib_0.1.4 yaml_2.3.10 fastmap_1.2.0
#> [10] lattice_0.22-6 R6_2.5.1 XVector_0.47.0
#> [13] knitr_1.49 maketools_1.3.1 bslib_0.8.0
#> [16] rlang_1.1.4 cachem_1.1.0 HDF5Array_1.35.1
#> [19] xfun_0.49 sass_0.4.9 sys_3.4.3
#> [22] cli_3.6.3 Rhdf5lib_1.29.0 zlibbioc_1.52.0
#> [25] digest_0.6.37 grid_4.4.2 rhdf5_2.51.0
#> [28] lifecycle_1.0.4 evaluate_1.0.1 buildtools_1.0.0
#> [31] rmarkdown_2.29 tools_4.4.2 htmltools_0.5.8.1
In fact, it currently uses
matrixStats::rowSds()
under the hood.↩︎
In this case, it loads blocks of data row-by-row. The
amount of data loaded into memory at any one time is controlled by the
default block size global setting; see
?DelayedArray::getAutoBlockSize
for details. Notably, if
the data are small enough (and the default block size is large enough)
then all the data is loaded as a single block, but this approach
generalizes and still works when the data are too large to be loaded
into memory in one block.↩︎
Some of the API is covered via inheritance to functionality in DelayedArray↩︎
Some of the API is covered via inheritance to functionality in DelayedArray↩︎