| Title: | Compiling Bioconductor to Handle Each Matrix Type |
|---|---|
| Description: | Provides a consistent C++ class interface for reading from a variety of commonly used matrix types. Ordinary matrices and several sparse/dense Matrix classes are directly supported, along with a subset of the delayed operations implemented in the DelayedArray package. All other matrix-like objects are supported by calling back into R. |
| Authors: | Aaron Lun [aut, cre], Hervé Pagès [aut], Mike Smith [aut] |
| Maintainer: | Aaron Lun <[email protected]> |
| License: | GPL-3 |
| Version: | 2.29.0 |
| Built: | 2026-05-29 08:35:11 UTC |
| Source: | https://github.com/bioc/beachmat |
Check the in-memory cache for a pre-existing initialized C++ object, and initialize it if it does not exist.
This is typically used in initializeCpp methods of file-backed representations to avoid redundant reads of the entire matrix.
flushMemoryCache() checkMemoryCache(namespace, key, fun)flushMemoryCache() checkMemoryCache(namespace, key, fun)
namespace |
String containing the namespace, typically the name of the package implementing the method. |
key |
String containing the key for a specific matrix instance. |
fun |
Function that accepts no arguments and returns an external pointer like those returned by |
For representations where data extraction is costly (e.g., from file), initializeCpp methods may consider realizing the entire matrix into memory.
This effectively pays a one-time up-front cost to improve efficiency for downstream operations that pass through the matrix multiple times.
If such an option is enabled, initializeCpp methods are expected to cache the in-memory instance using checkMemoryCache.
This ensures that all subsequent calls to the same initializeCpp method will return the same instance, avoiding redundant memory loads when the same matrix is used in multiple functions.
Of course, this process comes at the expense of increased memory usage.
If too many instances are stored in the cache, they can be cleared from memory using the flushMemoryCache function.
For checkMemoryCache, the output of fun (possibly from an existing cache) is returned.
For flushMemoryCache, all existing cached objects are removed and NULL is invisibly returned.
Aaron Lun
The hdf5.realize= and tiledb.realize= options in the initializeCpp methods of beachmat.hdf5 and beachmat.tiledb.
# Mocking up a class with some kind of uniquely identifying aspect. setClass("UnknownMatrix", slots=c(contents="dgCMatrix", uuid="character")) X <- new("UnknownMatrix", contents=Matrix::rsparsematrix(10, 10, 0.1), uuid=as.character(sample(1e8, 1))) # Defining our initialization method. setMethod("initializeCpp", "UnknownMatrix", function(x, ..., memorize=FALSE) { if (memorize) { checkMemoryCache("my_package", x@uuid, function() initializeCpp(x@contents)) } else { initializeCpp(x@contents) } }) # Same pointer is returned multiple times. initializeCpp(X, memorize=TRUE) initializeCpp(X, memorize=TRUE) # Flushing the cache. flushMemoryCache()# Mocking up a class with some kind of uniquely identifying aspect. setClass("UnknownMatrix", slots=c(contents="dgCMatrix", uuid="character")) X <- new("UnknownMatrix", contents=Matrix::rsparsematrix(10, 10, 0.1), uuid=as.character(sample(1e8, 1))) # Defining our initialization method. setMethod("initializeCpp", "UnknownMatrix", function(x, ..., memorize=FALSE) { if (memorize) { checkMemoryCache("my_package", x@uuid, function() initializeCpp(x@contents)) } else { initializeCpp(x@contents) } }) # Same pointer is returned multiple times. initializeCpp(X, memorize=TRUE) initializeCpp(X, memorize=TRUE) # Flushing the cache. flushMemoryCache()
Apply a function over blocks of columns or rows using DelayedArray's block processing mechanism.
colBlockApply( x, FUN, ..., grid = NULL, coerce.sparse = TRUE, BPPARAM = getAutoBPPARAM() ) rowBlockApply( x, FUN, ..., grid = NULL, coerce.sparse = TRUE, BPPARAM = getAutoBPPARAM() )colBlockApply( x, FUN, ..., grid = NULL, coerce.sparse = TRUE, BPPARAM = getAutoBPPARAM() ) rowBlockApply( x, FUN, ..., grid = NULL, coerce.sparse = TRUE, BPPARAM = getAutoBPPARAM() )
x |
A matrix-like object to be split into blocks and looped over. This can be of any class that respects the matrix contract. |
FUN |
A function that operates on columns or rows in |
... |
Further arguments to pass to |
grid |
An ArrayGrid object specifying how |
coerce.sparse |
Logical scalar indicating whether blocks of a sparse DelayedMatrix |
BPPARAM |
A BiocParallelParam object from the BiocParallel package, specifying how parallelization should be performed across blocks. |
This is a wrapper around blockApply that is dedicated to looping across rows or columns of x.
The aim is to provide a simpler interface for the common task of applying across a matrix,
along with a few modifications to improve efficiency for parallel processing and for natively supported x.
Note that the fragmentation of x into blocks is not easily predictable,
meaning that FUN should be capable of operating on each row/column independently.
Users can retrieve the current location of each block of x by calling currentViewport inside FUN.
If grid is not explicitly set to an ArrayGrid object, it can take several values:
If TRUE, the function will choose a grid that (i) respects the memory limits in getAutoBlockSize
and (ii) fragments x into sufficiently fine chunks that every worker in BPPARAM gets to do something.
If FUN might make large allocations, this mode should be used to constrain memory usage.
The default grid=NULL is very similar to TRUE
except that that memory limits are ignored when x is of any type that can be passed directly to FUN.
This avoids unnecessary copies of x and is best used when FUN itself does not make large allocations.
If FALSE, the function will choose a grid that covers the entire x.
This is provided for completeness and is only really useful for debugging.
The default of coerce.sparse=TRUE will generate dgCMatrix objects during block processing of a sparse DelayedMatrix x.
This is convenient as it avoids the need for FUN to specially handle SparseMatrix objects from the SparseArray package.
If the coercion is not desired (e.g., to preserve integer values in x), it can be disabled with coerce.sparse=FALSE.
A list of length equal to the number of blocks,
where each entry is the output of FUN for the results of processing each the rows/columns in the corresponding block.
blockApply, for the original DelayedArray implementation.
toCsparse, to convert SparseMatrix objects to CsparseMatrix objects prior to further processing in FUN.
x <- matrix(runif(10000), ncol=10) str(colBlockApply(x, colSums)) str(rowBlockApply(x, rowSums)) library(Matrix) y <- rsparsematrix(10000, 10000, density=0.01) str(colBlockApply(y, colSums)) str(rowBlockApply(y, rowSums)) library(DelayedArray) z <- DelayedArray(y) + 1 str(colBlockApply(z, colSums)) str(rowBlockApply(z, rowSums)) # We can also force multiple blocks: library(BiocParallel) BPPARAM <- SnowParam(2) str(colBlockApply(x, colSums, BPPARAM=BPPARAM)) str(rowBlockApply(x, rowSums, BPPARAM=BPPARAM))x <- matrix(runif(10000), ncol=10) str(colBlockApply(x, colSums)) str(rowBlockApply(x, rowSums)) library(Matrix) y <- rsparsematrix(10000, 10000, density=0.01) str(colBlockApply(y, colSums)) str(rowBlockApply(y, rowSums)) library(DelayedArray) z <- DelayedArray(y) + 1 str(colBlockApply(z, colSums)) str(rowBlockApply(z, rowSums)) # We can also force multiple blocks: library(BiocParallel) BPPARAM <- SnowParam(2) str(colBlockApply(x, colSums, BPPARAM=BPPARAM)) str(rowBlockApply(x, rowSums, BPPARAM=BPPARAM))
Get the executor object for safe execution of R code in parallel sections.
This should be set by Rtatami::set_executor() in the .onLoad function of downstream packages.
getExecutor()getExecutor()
An external pointer to be passed to Rtatami::set_executor.
Aaron Lun
getExecutor()getExecutor()
Initialize a tatami matrix object in C++ memory space from an abstract numeric R matrix. This object simply references the R memory space and avoids making any copies of its own, so it can be cheaply re-created when needed inside each function.
initializeCpp(x, ...)initializeCpp(x, ...)
x |
A matrix-like object, typically from the Matrix or DelayedArray packages.
Alternatively, an external pointer from a previous call to |
... |
Further arguments used by specific methods, such as:
If a |
Do not attempt to serialize the return value; it contains a pointer to external memory, and will not be valid after a save/load cycle.
Users should not be exposed to the returned pointers; rather, developers should call initializeCpp at the start to obtain a C++ object for further processing.
The initialization process should be cheap so there is no downside from just recreating the object within each function body.
An external pointer to a C++ object containing a tatami matrix.
# Mocking up a count matrix: x <- Matrix::rsparsematrix(1000, 100, 0.1) y <- round(abs(x)) stuff <- initializeCpp(y) stuff# Mocking up a count matrix: x <- Matrix::rsparsematrix(1000, 100, 0.1) y <- round(abs(x)) stuff <- initializeCpp(y) stuff
Realize a file-backed DelayedMatrix into its corresponding in-memory format.
realizeFileBackedMatrix(x) isFileBackedMatrix(x)realizeFileBackedMatrix(x) isFileBackedMatrix(x)
x |
A DelayedMatrix object. |
A file-backed matrix representation is recognized based on whether it has a path method for any one of its seeds.
If so, and the "beachmat.realizeFileBackedMatrix" option is not FALSE, we will load it into memory.
This is intended for DelayedMatrix objects that have already been subsetted (e.g., to highly variable genes),
which can be feasibly loaded into memory for rapid calculations.
For realizeFileBackedMatrix, an ordinary matrix or a dgCMatrix, depending on whether is_sparse(x).
For isFileBackedMatrix, a logical scalar indicating whether x has file-backed components.
Aaron Lun
mat <- matrix(rnorm(50), ncol=5) realizeFileBackedMatrix(mat) # no effect library(HDF5Array) mat2 <- as(mat, "HDF5Array") realizeFileBackedMatrix(mat2) # realized into memorymat <- matrix(rnorm(50), ncol=5) realizeFileBackedMatrix(mat) # no effect library(HDF5Array) mat2 <- as(mat, "HDF5Array") realizeFileBackedMatrix(mat2) # realized into memory
Utility functions that directly operate on the pointers produced by initializeCpp.
Some of these are used internally by initializeCpp methods operating on DelayedArray classes.
tatami.bind(xs, by.row) tatami.transpose(x) tatami.subset(x, subset, by.row) tatami.arith(x, op, val, by.row, right) tatami.compare(x, op, val, by.row, right) tatami.logic(x, op, val, by.row) tatami.round(x) tatami.log(x, base) tatami.math(x, op) tatami.not(x) tatami.binary(x, y, op) tatami.dim(x) tatami.get(x, i, row) tatami.is.sparse(x) tatami.prefer.rows(x) tatami.realize(x, num.threads) tatami.multiply(x, val, right, num.threads) tatami.sums(x, row, num.threads) tatami.sums.by.group(x, group, row, num.threads) tatami.medians(x, row, num.threads) tatami.nan.counts(x, row, num.threads)tatami.bind(xs, by.row) tatami.transpose(x) tatami.subset(x, subset, by.row) tatami.arith(x, op, val, by.row, right) tatami.compare(x, op, val, by.row, right) tatami.logic(x, op, val, by.row) tatami.round(x) tatami.log(x, base) tatami.math(x, op) tatami.not(x) tatami.binary(x, y, op) tatami.dim(x) tatami.get(x, i, row) tatami.is.sparse(x) tatami.prefer.rows(x) tatami.realize(x, num.threads) tatami.multiply(x, val, right, num.threads) tatami.sums(x, row, num.threads) tatami.sums.by.group(x, group, row, num.threads) tatami.medians(x, row, num.threads) tatami.nan.counts(x, row, num.threads)
xs |
A list of pointers produced by |
by.row |
Logical scalar indicating whether to apply the operation on the rows.
|
x |
A pointer produced by |
subset |
Integer vector containing the subset of interest.
These should be 1-based row or column indices depending on |
op |
String specifying the operation to perform.
|
val |
For
For
|
right |
For For |
base |
Numeric scalar specifying the base of the log-transformation. |
y |
A pointer produced by |
i |
Integer scalar containing the 1-based index of the row (for |
row |
For For |
num.threads |
Integer scalar specifying the number of threads to use. |
group |
Integer vector of length equal to the number of columns (if |
For tatami.dim, an integer vector containing the dimensions of the matrix.
For tatami.is.sparse, a logical scalar indicating whether the matrix is sparse.
For tatami.prefer.rows, a logical scalar indicating whether the matrix prefers iteration by row.
For tatami.get, a numeric vector containing the contents of row or column i, respectively.
For tatami.realize, a numeric matrix or dgCMatrix with the matrix contents.
The exact class depends on whether x refers to a sparse matrix.
For tatami.multiply, a numeric matrix containing the matrix product of x and other.
For tatami.sums, a numeric vector containing the row or column sums, respectively.
For tatami.sums.by.group, a numeric matrix is returned.
If row=TRUE, its number of rows is equal to that of x and its number of columns is equal to the number of groups.
Each column corresponds to a group and contains the row sums across all columns of x assigned to that group.
If row=FALSE, its number of columns is equal to that of x and its number of rows is equal to the number of groups.
Each row corresponds to a group and contains the column sums across all columns of x assigned to that group.
For tatami.medians, a numeric vector containing the row or column medians, respectively.
For tatami.nan.counts, a numeric vector containing the number of NaNs in each row or column, respectively.
For all other functions, a new pointer to a matrix with the requested operations applied to x or xs.
Aaron Lun
x <- Matrix::rsparsematrix(1000, 100, 0.1) ptr <- initializeCpp(x) tatami.dim(ptr) tatami.get(ptr, 1, row=TRUE) rounded <- tatami.round(ptr) tatami.get(rounded, 1, row=TRUE) tatami.sums(ptr, row=FALSE, num.threads=2) tatami.medians(ptr, row=FALSE, num.threads=2)x <- Matrix::rsparsematrix(1000, 100, 0.1) ptr <- initializeCpp(x) tatami.dim(ptr) tatami.get(ptr, 1, row=TRUE) rounded <- tatami.round(ptr) tatami.get(rounded, 1, row=TRUE) tatami.sums(ptr, row=FALSE, num.threads=2) tatami.medians(ptr, row=FALSE, num.threads=2)
Exactly what it says in the title.
toCsparse(x)toCsparse(x)
x |
Any object produced by block processing with |
This is intended for use inside functions to be passed to colBlockApply or rowBlockApply.
The idea is to pre-process blocks for user-defined functions that don't know how to deal with SparseMatrix objects,
which is often the case for R-defined functions that do not benefit from beachmat's C++ abstraction.
x is returned unless it is a SparseMatrix object,
in which case an appropriate CsparseMatrix object is returned instead.
Aaron Lun
library(SparseArray) out <- COO_SparseArray(c(10, 10), nzcoo=cbind(1:10, sample(10)), nzdata=runif(10)) toCsparse(out)library(SparseArray) out <- COO_SparseArray(c(10, 10), nzcoo=cbind(1:10, sample(10)), nzdata=runif(10)) toCsparse(out)
This function is soft-deprecated; users are advised to use nzwhich and nzvals instead.
whichNonZero(x, ...)whichNonZero(x, ...)
x |
A numeric matrix-like object, usually sparse in content if not in representation. |
... |
Further arguments, ignored. |
A list containing i, an integer vector of the row indices of all non-zero entries;
j, an integer vector of the column indices of all non-zero entries;
and x, a (usually atomic) vector of the values of the non-zero entries.
Aaron Lun
which, obviously.
x <- Matrix::rsparsematrix(1e6, 1e6, 0.000001) out <- whichNonZero(x) str(out)x <- Matrix::rsparsematrix(1e6, 1e6, 0.000001) out <- whichNonZero(x) str(out)