Title: | Compiling Bioconductor to Handle Each Matrix Type |
---|---|
Description: | Provides a consistent C++ class interface for reading from a variety of commonly used matrix types. Ordinary matrices and several sparse/dense Matrix classes are directly supported, along with a subset of the delayed operations implemented in the DelayedArray package. All other matrix-like objects are supported by calling back into R. |
Authors: | Aaron Lun [aut, cre], Hervé Pagès [aut], Mike Smith [aut] |
Maintainer: | Aaron Lun <[email protected]> |
License: | GPL-3 |
Version: | 2.23.6 |
Built: | 2025-01-15 02:56:24 UTC |
Source: | https://github.com/bioc/beachmat |
Check the in-memory cache for a pre-existing initialized C++ object, and initialize it if it does not exist.
This is typically used in initializeCpp
methods of file-backed representations to avoid redundant reads of the entire matrix.
flushMemoryCache() checkMemoryCache(namespace, key, fun)
flushMemoryCache() checkMemoryCache(namespace, key, fun)
namespace |
String containing the namespace, typically the name of the package implementing the method. |
key |
String containing the key for a specific matrix instance. |
fun |
Function that accepts no arguments and returns an external pointer like those returned by |
For representations where data extraction is costly (e.g., from file), initializeCpp
methods may provide a memorize=
option.
Setting this to TRUE
will load the entire matrix into memory, effectively paying a one-time up-front cost to improve efficiency for downstream operations that pass through the matrix multiple times.
If this option is provided, initializeCpp
methods are expected to cache the in-memory instance using checkMemoryCache
.
This ensures that all subsequent calls to the same initializeCpp
method will return the same instance, avoiding redundant memory loads when the same matrix is used in multiple functions.
Of course, this process saves time at the expense of increased memory usage.
If too many instances are being cached, they can be cleared from memory using the flushMemoryCache
function.
For checkMemoryCache
, the output of fun
(possibly from an existing cache) is returned.
For flushMemoryCache
, all existing cached objects are removed and NULL
is invisibly returned.
Aaron Lun
# Mocking up a class with some kind of uniquely identifying aspect. setClass("UnknownMatrix", slots=c(contents="dgCMatrix", uuid="character")) X <- new("UnknownMatrix", contents=Matrix::rsparsematrix(10, 10, 0.1), uuid=as.character(sample(1e8, 1))) # Defining our initialization method. setMethod("initializeCpp", "UnknownMatrix", function(x, ..., memorize=FALSE) { if (memorize) { checkMemoryCache("my_package", x@uuid, function() initializeCpp(x@contents)) } else { initializeCpp(x@contents) } }) # Same pointer is returned multiple times. initializeCpp(X, memorize=TRUE) initializeCpp(X, memorize=TRUE) # Flushing the cache. flushMemoryCache()
# Mocking up a class with some kind of uniquely identifying aspect. setClass("UnknownMatrix", slots=c(contents="dgCMatrix", uuid="character")) X <- new("UnknownMatrix", contents=Matrix::rsparsematrix(10, 10, 0.1), uuid=as.character(sample(1e8, 1))) # Defining our initialization method. setMethod("initializeCpp", "UnknownMatrix", function(x, ..., memorize=FALSE) { if (memorize) { checkMemoryCache("my_package", x@uuid, function() initializeCpp(x@contents)) } else { initializeCpp(x@contents) } }) # Same pointer is returned multiple times. initializeCpp(X, memorize=TRUE) initializeCpp(X, memorize=TRUE) # Flushing the cache. flushMemoryCache()
Apply a function over blocks of columns or rows using DelayedArray's block processing mechanism.
colBlockApply( x, FUN, ..., grid = NULL, coerce.sparse = TRUE, BPPARAM = getAutoBPPARAM() ) rowBlockApply( x, FUN, ..., grid = NULL, coerce.sparse = TRUE, BPPARAM = getAutoBPPARAM() )
colBlockApply( x, FUN, ..., grid = NULL, coerce.sparse = TRUE, BPPARAM = getAutoBPPARAM() ) rowBlockApply( x, FUN, ..., grid = NULL, coerce.sparse = TRUE, BPPARAM = getAutoBPPARAM() )
x |
A matrix-like object to be split into blocks and looped over. This can be of any class that respects the matrix contract. |
FUN |
A function that operates on columns or rows in |
... |
Further arguments to pass to |
grid |
An ArrayGrid object specifying how |
coerce.sparse |
Logical scalar indicating whether blocks of a sparse DelayedMatrix |
BPPARAM |
A BiocParallelParam object from the BiocParallel package, specifying how parallelization should be performed across blocks. |
This is a wrapper around blockApply
that is dedicated to looping across rows or columns of x
.
The aim is to provide a simpler interface for the common task of apply
ing across a matrix,
along with a few modifications to improve efficiency for parallel processing and for natively supported x
.
Note that the fragmentation of x
into blocks is not easily predictable,
meaning that FUN
should be capable of operating on each row/column independently.
Users can retrieve the current location of each block of x
by calling currentViewport
inside FUN
.
If grid
is not explicitly set to an ArrayGrid object, it can take several values:
If TRUE
, the function will choose a grid that (i) respects the memory limits in getAutoBlockSize
and (ii) fragments x
into sufficiently fine chunks that every worker in BPPARAM
gets to do something.
If FUN
might make large allocations, this mode should be used to constrain memory usage.
The default grid=NULL
is very similar to TRUE
except that that memory limits are ignored when x
is of any type that can be passed directly to FUN
.
This avoids unnecessary copies of x
and is best used when FUN
itself does not make large allocations.
If FALSE
, the function will choose a grid that covers the entire x
.
This is provided for completeness and is only really useful for debugging.
The default of coerce.sparse=TRUE
will generate dgCMatrix objects during block processing of a sparse DelayedMatrix x
.
This is convenient as it avoids the need for FUN
to specially handle SparseMatrix objects from the SparseArray package.
If the coercion is not desired (e.g., to preserve integer values in x
), it can be disabled with coerce.sparse=FALSE
.
A list of length equal to the number of blocks,
where each entry is the output of FUN
for the results of processing each the rows/columns in the corresponding block.
blockApply
, for the original DelayedArray implementation.
toCsparse
, to convert SparseMatrix objects to CsparseMatrix objects prior to further processing in FUN
.
x <- matrix(runif(10000), ncol=10) str(colBlockApply(x, colSums)) str(rowBlockApply(x, rowSums)) library(Matrix) y <- rsparsematrix(10000, 10000, density=0.01) str(colBlockApply(y, colSums)) str(rowBlockApply(y, rowSums)) library(DelayedArray) z <- DelayedArray(y) + 1 str(colBlockApply(z, colSums)) str(rowBlockApply(z, rowSums)) # We can also force multiple blocks: library(BiocParallel) BPPARAM <- SnowParam(2) str(colBlockApply(x, colSums, BPPARAM=BPPARAM)) str(rowBlockApply(x, rowSums, BPPARAM=BPPARAM))
x <- matrix(runif(10000), ncol=10) str(colBlockApply(x, colSums)) str(rowBlockApply(x, rowSums)) library(Matrix) y <- rsparsematrix(10000, 10000, density=0.01) str(colBlockApply(y, colSums)) str(rowBlockApply(y, rowSums)) library(DelayedArray) z <- DelayedArray(y) + 1 str(colBlockApply(z, colSums)) str(rowBlockApply(z, rowSums)) # We can also force multiple blocks: library(BiocParallel) BPPARAM <- SnowParam(2) str(colBlockApply(x, colSums, BPPARAM=BPPARAM)) str(rowBlockApply(x, rowSums, BPPARAM=BPPARAM))
Initialize a tatami matrix object in C++ memory space from an abstract numeric R matrix. This object simply references the R memory space and avoids making any copies of its own, so it can be cheaply re-created when needed inside each function.
initializeCpp(x, ...)
initializeCpp(x, ...)
x |
A matrix-like object, typically from the Matrix or DelayedArray packages.
Alternatively, an external pointer from a previous call to |
... |
Further arguments used by specific methods, such as:
Fields should generally be prefixed by the matrix type, to avoid conflicts with arguments from other packages.
For example, |
Do not attempt to serialize the return value; it contains a pointer to external memory, and will not be valid after a save/load cycle.
Users should not be exposed to the returned pointers; rather, developers should call initialize
at the start to obtain a C++ object for further processing.
As mentioned before, this initialization process is very cheap so there is no downside from just recreating the object within each function body.
An external pointer to a C++ object containing a tatami matrix.
# Mocking up a count matrix: x <- Matrix::rsparsematrix(1000, 100, 0.1) y <- round(abs(x)) stuff <- initializeCpp(y) stuff
# Mocking up a count matrix: x <- Matrix::rsparsematrix(1000, 100, 0.1) y <- round(abs(x)) stuff <- initializeCpp(y) stuff
Realize a file-backed DelayedMatrix into its corresponding in-memory format.
realizeFileBackedMatrix(x) isFileBackedMatrix(x)
realizeFileBackedMatrix(x) isFileBackedMatrix(x)
x |
A DelayedMatrix object. |
A file-backed matrix representation is recognized based on whether it has a path
method for any one of its seeds.
If so, and the "beachmat.realizeFileBackedMatrix"
option is not FALSE
, we will load it into memory.
This is intended for DelayedMatrix objects that have already been subsetted (e.g., to highly variable genes),
which can be feasibly loaded into memory for rapid calculations.
For realizeFileBackedMatrix
, an ordinary matrix or a dgCMatrix, depending on whether is_sparse(x)
.
For isFileBackedMatrix
, a logical scalar indicating whether x
has file-backed components.
Aaron Lun
mat <- matrix(rnorm(50), ncol=5) realizeFileBackedMatrix(mat) # no effect library(HDF5Array) mat2 <- as(mat, "HDF5Array") realizeFileBackedMatrix(mat2) # realized into memory
mat <- matrix(rnorm(50), ncol=5) realizeFileBackedMatrix(mat) # no effect library(HDF5Array) mat2 <- as(mat, "HDF5Array") realizeFileBackedMatrix(mat2) # realized into memory
Utility functions that directly operate on the pointers produced by initializeCpp
.
Some of these are used internally by initializeCpp
methods operating on DelayedArray classes.
tatami.bind(xs, by.row) tatami.transpose(x) tatami.subset(x, subset, by.row) tatami.arith(x, op, val, by.row, right) tatami.compare(x, op, val, by.row, right) tatami.logic(x, op, val, by.row) tatami.round(x) tatami.log(x, base) tatami.math(x, op) tatami.not(x) tatami.binary(x, y, op) tatami.dim(x) tatami.row(x, i) tatami.column(x, i) tatami.row.sums(x, num.threads) tatami.column.sums(x, num.threads) tatami.row.nan.counts(x, num.threads) tatami.column.nan.counts(x, num.threads) tatami.is.sparse(x) tatami.prefer.rows(x) tatami.realize(x, num.threads) tatami.multiply(x, val, right, num.threads)
tatami.bind(xs, by.row) tatami.transpose(x) tatami.subset(x, subset, by.row) tatami.arith(x, op, val, by.row, right) tatami.compare(x, op, val, by.row, right) tatami.logic(x, op, val, by.row) tatami.round(x) tatami.log(x, base) tatami.math(x, op) tatami.not(x) tatami.binary(x, y, op) tatami.dim(x) tatami.row(x, i) tatami.column(x, i) tatami.row.sums(x, num.threads) tatami.column.sums(x, num.threads) tatami.row.nan.counts(x, num.threads) tatami.column.nan.counts(x, num.threads) tatami.is.sparse(x) tatami.prefer.rows(x) tatami.realize(x, num.threads) tatami.multiply(x, val, right, num.threads)
xs |
A list of pointers produced by |
by.row |
Logical scalar indicating whether to apply the operation on the rows.
|
x |
A pointer produced by |
subset |
Integer vector containing the subset of interest.
These should be 1-based row or column indices depending on |
op |
String specifying the operation to perform.
|
val |
For
For
|
right |
For For |
base |
Numeric scalar specifying the base of the log-transformation. |
y |
A pointer produced by |
i |
Integer scalar containing the 1-based index of the row (for |
num.threads |
Integer scalar specifying the number of threads to use. |
For tatami.dim
, an integer vector containing the dimensions of the matrix.
For tatami.is.sparse
, a logical scalar indicating whether the matrix is sparse.
For tatami.prefer.rows
, a logical scalar indicating whether the matrix prefers iteration by row.
For tatami.row
or tatami.column
, a numeric vector containing the contents of row or column i
, respectively.
For tatami.row.sums
or tatami.column.sums
, a numeric vector containing the row or column sums, respectively.
For tatami.row.nan.counts
or tatami.column.nan.counts
, a numeric vector containing the number of NaNs in each row or column, respectively.
For tatami.realize
, a numeric matrix or dgCMatrix with the matrix contents.
The exact class depends on whether x
refers to a sparse matrix.
For tatami.multiply
, a numeric matrix containing the matrix product of x
and other
.
For all other functions, a new pointer to a matrix with the requested operations applied to x
or xs
.
Aaron Lun
x <- Matrix::rsparsematrix(1000, 100, 0.1) ptr <- initializeCpp(x) tatami.dim(ptr) tatami.row(ptr, 1) rounded <- tatami.round(ptr) tatami.row(rounded, 1)
x <- Matrix::rsparsematrix(1000, 100, 0.1) ptr <- initializeCpp(x) tatami.dim(ptr) tatami.row(ptr, 1) rounded <- tatami.round(ptr) tatami.row(rounded, 1)
Exactly what it says in the title.
toCsparse(x)
toCsparse(x)
x |
Any object produced by block processing with |
This is intended for use inside functions to be passed to colBlockApply
or rowBlockApply
.
The idea is to pre-process blocks for user-defined functions that don't know how to deal with SparseMatrix objects,
which is often the case for R-defined functions that do not benefit from beachmat's C++ abstraction.
x
is returned unless it is a SparseMatrix object from the SparseArray package,
in which case an appropriate CsparseMatrix object is returned instead.
Aaron Lun
library(SparseArray) out <- COO_SparseArray(c(10, 10), nzcoo=cbind(1:10, sample(10)), nzdata=runif(10)) toCsparse(out)
library(SparseArray) out <- COO_SparseArray(c(10, 10), nzcoo=cbind(1:10, sample(10)), nzdata=runif(10)) toCsparse(out)
This function is soft-deprecated; users are advised to use nzwhich
and nzvals
instead.
whichNonZero(x, ...)
whichNonZero(x, ...)
x |
A numeric matrix-like object, usually sparse in content if not in representation. |
... |
Further arguments, ignored. |
A list containing i
, an integer vector of the row indices of all non-zero entries;
j
, an integer vector of the column indices of all non-zero entries;
and x
, a (usually atomic) vector of the values of the non-zero entries.
Aaron Lun
which
, obviously.
x <- Matrix::rsparsematrix(1e6, 1e6, 0.000001) out <- whichNonZero(x) str(out)
x <- Matrix::rsparsematrix(1e6, 1e6, 0.000001) out <- whichNonZero(x) str(out)