The beachmat package provides a C++ API to extract numeric data from matrix-like R objects, based on the matrix representations in the tatami library. This enables Bioconductor packages to use C++ for high-performance processing of data in arbitrary R matrices, including:
matrix
objectsDelayedMatrix
objects with delayed operations from the
DelayedArray
packageDelayedArray::extract_array()
method.Where possible, beachmat will map the R object to its C++ representation, bypassing the R interpreter to directly extract the matrix contents. This provides fast access by avoiding R-level evaluation, saves memory by avoiding block processing and memory allocations, and permits more fine-grained parallelization. For objects without native support, the R interpreter is called in a thread-safe manner to ensure that any downstream C++ code still works.
Packages that use beachmat’s API should set the following:
The compiler should support the C++17 standard. Developers need
to tell the build system to use C++17, by modifying the
SystemRequirements
field of the DESCRIPTION
file:
SystemRequirements: C++17
… or modifying the PKG_CPPFLAGS
in the
Makevars
with the relevant flags.
Rcpp should be
installed. Developers need to ensure that Rcpp is loaded
when the package is loaded. This requires addition of Rcpp
to the Imports
field of the DESCRIPTION
file:
Imports: Rcpp
… and a corresponding importFrom
specification in the
NAMESPACE
file:
importFrom(Rcpp, sourceCpp)
(The exact function to be imported doesn’t matter, as long as the namespace is loaded. Check out the Rcpp documentation for more details.)
Linking to beachmat is as simple as writing:
LinkingTo: Rcpp, assorthead, beachmat
… in the DESCRIPTION
file.
Given an arbitrary matrix-like object, we create its C++
representation using the initializeCpp()
function. This
process is very cheap as no data is copied, so the C++ object only holds
views on the memory allocated and owned by R itself.
# Mocking up some kind of matrix-like object.
library(Matrix)
x <- round(rsparsematrix(1000, 10, 0.2))
# Initializing it in C++.
library(beachmat)
ptr <- initializeCpp(x)
ptr
now refers to a BoundNumericMatrix
object, the composition of which can be found in the
Rtatami.h
header. Of particular relevance is the
ptr
member, which contains a pointer to a
tatami::NumericMatrix
object derived from the argument to
initializeCpp()
. Developers can read the documentation in
the header file for more details:
Now we can write a function that uses tatami to
operate on ptr
. All functionality described in
tatami’s documentation can be
used here; the only beachmat-specific
factor is that developers should include the Rtatami.h
header first (which takes care of the tatami headers).
Let’s just say we want to compute column sums - a simple implementation
might look like this:
#include "Rtatami.h"
#include <vector>
#include <algorithm>
// Not necessary in a package context, it's only used for this vignette:
// [[Rcpp::depends(beachmat, assorthead)]]
// [[Rcpp::export(rng=false)]]
Rcpp::NumericVector column_sums(Rcpp::RObject initmat) {
Rtatami::BoundNumericPointer parsed(initmat);
const auto& ptr = parsed->ptr;
auto NR = ptr->nrow();
auto NC = ptr->ncol();
std::vector<double> buffer(NR);
Rcpp::NumericVector output(NC);
auto wrk = ptr->dense_column();
for (int i = 0; i < NC; ++i) {
auto extracted = wrk->fetch(i, buffer.data());
output[i] = std::accumulate(extracted, extracted + NR, 0.0);
}
return output;
}
Let’s compile this function with Rcpp and put it
to work. We can just pass in the ptr
that we created
earlier:
## [1] 10 -12 14 -14 21 3 -28 22 -19 -5
In general, we suggest calling initializeCpp()
within
package functions rather than asking users to call it themselves. The
external pointers should never be exposed to the user, as they do not
behave like regular objects, e.g., they are not serializable.
Fortunately, the initializeCpp()
calls are very cheap and
can be performed at the start of any R function that needs to do matrix
operations in C++.
tatami calls are normally thread-safe, but if the
tatami::NumericMatrix
is constructed from an unsupported
object, it needs to call R to extract the matrix contents. The R
interpreter is strictly single-threaded, which requires some care when
defining our chosen parallelization scheme. The easiest way to achieve
parallelization is to use the tatami::parallelize()
function:
#include "Rtatami.h"
#include <vector>
#include <algorithm>
// Not necessary in a package context, it's only used for this vignette:
// [[Rcpp::depends(beachmat, assorthead)]]
// [[Rcpp::export(rng=false)]]
Rcpp::NumericVector parallel_column_sums(Rcpp::RObject initmat, int nthreads) {
Rtatami::BoundNumericPointer parsed(initmat);
const auto& ptr = parsed->ptr;
auto NR = ptr->nrow();
auto NC = ptr->ncol();
Rcpp::NumericVector output(NC);
tatami::parallelize([&](int thread, int start, int length) {
std::vector<double> buffer(NR);
auto wrk = ptr->dense_column();
for (int i = start, end = start + length; i < end; ++i) {
auto extracted = wrk->fetch(i, buffer.data());
output[i] = std::accumulate(extracted, extracted + NR, 0.0);
}
}, NC, nthreads);
return output;
}
Now we put it to work with 2 threads. Note that
tatami::parallelize()
(or specifically, the macros in
Rtatami.h
) will use the standard
<thread>
library to handle parallelization, so it
will work even on toolchains that do not have OpenMP support.
## [1] 10 -12 14 -14 21 3 -28 22 -19 -5
More advanced users can check out the parallelization-related documentation in the tatami_r repository.
The conventional approach to iterating over a matrix is to use DelayedArray’s
block processing mechanism, i.e., DelayedArray::blockApply
.
This is implemented completely in R and is convenient for developers as
it can be used directly with any R function. However, its performance
only goes so far, and several years of experience with its use has
revealed a few shortcomings:
Shifting the iteration into C++ with beachmat
avoids many of these issues. The looping overhead is effectively
eliminated as the R interpreter is no longer involved. Only one
row/column is extracted at a time for most tatami::Matrix
classes, minimizing memory overhead and avoiding the need for manual
block size management in most cases. Parallelization is also much easier
with the standard <thread>
library, as we do not need
to spin up or fork to create a separate process.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] beachmat_2.23.5 Matrix_1.7-1 knitr_1.49 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.9 compiler_4.4.2 BiocManager_1.30.25
## [4] crayon_1.5.3 Rcpp_1.0.13-1 jquerylib_0.1.4
## [7] IRanges_2.41.2 assorthead_1.1.5 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] XVector_0.47.0 S4Arrays_1.7.1 generics_0.1.3
## [16] BiocGenerics_0.53.3 DelayedArray_0.33.3 MatrixGenerics_1.19.0
## [19] maketools_1.3.1 bslib_0.8.0 rlang_1.1.4
## [22] cachem_1.1.0 xfun_0.49 sass_0.4.9
## [25] sys_3.4.3 SparseArray_1.7.2 cli_3.6.3
## [28] zlibbioc_1.52.0 digest_0.6.37 grid_4.4.2
## [31] lifecycle_1.0.4 S4Vectors_0.45.2 evaluate_1.0.1
## [34] buildtools_1.0.0 abind_1.4-8 stats4_4.4.2
## [37] rmarkdown_2.29 matrixStats_1.4.1 tools_4.4.2
## [40] htmltools_0.5.8.1