| Title: | Bring Zarr datasets in R as DelayedArray objects |
|---|---|
| Description: | The ZarrArray package leverages the Rarr package to bring Zarr datasets in R as DelayedArray objects. The main class in the package is the ZarrArray class. A ZarrArray object is an array-like object that represents a Zarr dataset in R. ZarrArray objects are DelayedArray derivatives and therefore support all operations (delayed or block-processed) supported by DelayedArray objects. |
| Authors: | Hervé Pagès [aut, cre] (ORCID: <https://orcid.org/0009-0002-8272-4522>), Mike Smith [aut] (ORCID: <https://orcid.org/0000-0002-7800-3848>), Hugo Gruson [aut] (ORCID: <https://orcid.org/0000-0002-4094-1476>), Artür Manukyan [aut] (ORCID: <https://orcid.org/0000-0002-0441-9517>), Levi Waldron [fnd] (ORCID: <https://orcid.org/0000-0003-2725-0694>) |
| Maintainer: | Hervé Pagès |
| License: | Artistic-2.0 |
| Version: | 1.1.0 |
| Built: | 2026-06-04 11:12:43 UTC |
| Source: | https://github.com/bioc/ZarrArray |
A function for writing an array-like object to disk in Zarr format.
writeZarrArray(x, zarr_path=NULL, chunkdim=NULL, nchar=NULL, zarr_version=3, verbose=NA)
| Argument | Description |
|---|---|
| x | The array-like object to write to disk in Zarr format. If |
| zarr_path | |
| chunkdim | The dimensions of the physical chunks to use when writing the data to disk. See |
| nchar | When |
| zarr_version | The version of the Zarr specification to use. Currently, either |
| verbose | Whether block-processing progress should be displayed or not. If set to |
writeZarrArray() leverages lower-level functionality implemented
in the Rarr package, such as create_empty_zarr_array() and
update_zarr_array().
Please note that, depending on the size of the data to write to disk
and the performance of the disk, writeZarrArray() can take a
long time to complete. Use verbose=TRUE to see its progress.
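As a rough illustration of the arguments described above, the sketch below writes a small in-memory matrix with an explicitly chosen chunk geometry and with progress reporting turned on. It only uses arguments from the documented signature (zarr_path, chunkdim, verbose); the 5 x 10 chunk shape is an arbitrary choice made for the example.

```r
library(ZarrArray)

m <- matrix(runif(200), nrow=20)       # 20 x 10 ordinary matrix
zarr_path <- tempfile(fileext=".zarr")

## Explicit chunk geometry: 5 x 10 chunks, so the matrix is stored as
## 4 chunks. verbose=TRUE displays block-processing progress.
Z <- writeZarrArray(m, zarr_path, chunkdim=c(5L, 10L), verbose=TRUE)

chunkdim(Z)                  # the chunk geometry recorded on disk
all.equal(as.array(Z), m)    # the data round-trips unchanged
```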
A ZarrArray object that points to the newly written Zarr dataset on disk.
IMPORTANT NOTE: The dimnames of x are NOT propagated to the
returned ZarrArray object at the moment! This is a temporary
situation that will be addressed in future versions of the
ZarrArray package.
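Until dimnames are propagated, one possible workaround (a sketch, not something the package prescribes) is to reattach them on the returned object. Setting dimnames on a DelayedArray derivative is itself a delayed operation, so nothing is rewritten on disk.

```r
library(ZarrArray)

m <- matrix(1:6, nrow=2, dimnames=list(c("a", "b"), c("x", "y", "z")))
Z <- writeZarrArray(m)
dimnames(Z)               # NULL: the dimnames of m were not propagated

## Reattach them as a delayed operation (nothing is written to disk):
dimnames(Z) <- dimnames(m)
dimnames(Z)
```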
When passed a DelayedArray object, writeZarrArray()
realizes it on disk, that is, all the delayed operations carried
by the object are executed on the fly while the object is written to disk.
This uses a block-processing strategy so that the full object is not
realized at once in memory. Instead, the object is processed block by block,
i.e., the blocks are realized in memory and written to disk one at a time.
In other words, writeZarrArray(x, ...) is semantically equivalent
to writeZarrArray(as.array(x), ...), except that as.array(x)
is not called because this would realize the full object at once in memory.
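The block-by-block realization can be observed by shrinking the automatic block size before writing a DelayedMatrix. The sketch below uses getAutoBlockSize() and setAutoBlockSize() from the DelayedArray package; the 960-byte block size and the 3 x 20 chunk geometry are arbitrary values chosen so that several blocks are processed, and the exact number of blocks reported by verbose=TRUE is not something this example asserts.

```r
library(ZarrArray)
library(DelayedArray)   # getAutoBlockSize(), setAutoBlockSize()

m0 <- matrix(runif(180, min=-1), ncol=9)     # 20 x 9 ordinary matrix
M2 <- log(t(writeZarrArray(m0)) + 1)         # 9 x 20 DelayedMatrix

old_size <- getAutoBlockSize()
setAutoBlockSize(960)   # 960 bytes = 120 doubles per block (deliberately tiny)
M3 <- writeZarrArray(M2, chunkdim=c(3L, 20L), verbose=TRUE)
setAutoBlockSize(old_size)                   # restore the previous block size

## Same result as realizing everything in memory first:
all.equal(as.array(M3), log(t(m0) + 1))
```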
See ?DelayedArray for general information about
DelayedArray objects.
writeZarrArray_auto_args to control the automatic values of
writeZarrArray() arguments such as zarr_path and chunkdim.
ZarrArray objects.
## ---------------------------------------------------------------------
## Write an ordinary matrix to disk in Zarr format
## ---------------------------------------------------------------------

m0 <- matrix(runif(180, min=-1), ncol=9)
path <- tempfile(fileext=".zarr")
M1 <- writeZarrArray(m0, path)
M1  # ZarrMatrix object

path(M1)
chunkdim(M1)

as(m0, "ZarrArray")  # equivalent to writeZarrArray(m0)

## ---------------------------------------------------------------------
## Transform a Zarr dataset and write it back in Zarr format
## ---------------------------------------------------------------------

M2 <- log(t(M1) + 1)  # DelayedMatrix object
M3 <- writeZarrArray(M2)
M3  # ZarrMatrix object

as(M2, "ZarrArray")  # equivalent to writeZarrArray(M2)

## ---------------------------------------------------------------------
## Use writeZarrArray() to convert an HDF5 dataset to Zarr format
## ---------------------------------------------------------------------

## The HDF5Array package includes an HDF5 file with some toy HDF5
## datasets:
library(HDF5Array)
h5_path <- system.file(package="HDF5Array", "extdata", "toy.h5")
h5ls(h5_path)

## Let's convert the M1 dataset to Zarr:
M1 <- HDF5Array(h5_path, "M1")
zarr_path <- file.path(tempdir(), "M1.zarr")
writeZarrArray(M1, zarr_path)

## Note that writeZarrArray() uses a block-processing strategy so that
## the original HDF5 dataset is not loaded at once in memory. Instead,
## the object is loaded block by block and the blocks are written to
## disk one at a time. In other words, writeZarrArray() can operate with
## a limited amount of memory regardless of the size of the original
## dataset. This amount of memory depends on the size of the blocks, which
## can be controlled with setAutoBlockSize(). See '?setAutoBlockSize' in
## the DelayedArray package for more information.
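After such a conversion it can be worth checking that the Zarr copy matches the HDF5 original. The sketch below compares dimensions and contents; check.attributes=FALSE is used because dimnames are not propagated by writeZarrArray(), and the chunk geometries are printed rather than asserted since the toy dataset's chunking is not documented here.

```r
library(ZarrArray)
library(HDF5Array)

h5_path <- system.file(package="HDF5Array", "extdata", "toy.h5")
M1 <- HDF5Array(h5_path, "M1")
Z1 <- writeZarrArray(M1, tempfile(fileext=".zarr"))

## Compare the geometry and the contents of the two datasets:
chunkdim(M1)
chunkdim(Z1)
all(dim(Z1) == dim(M1))
all.equal(as.array(Z1), as.array(M1), check.attributes=FALSE)
```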
get_writeZarrArray_auto_path() and
get_writeZarrArray_auto_chunkdim() are used internally by
writeZarrArray() and ZarrRealizationSink()
to obtain automatic values for their zarr_path
and chunkdim arguments when those arguments are not supplied.
## Used internally by writeZarrArray() to obtain "automatic values" for
## arguments 'zarr_path' and 'chunkdim':
get_writeZarrArray_auto_path()
get_writeZarrArray_auto_chunkdim(dim)

## Control the value returned by get_writeZarrArray_auto_path():
set_writeZarrArray_dump_dir(dir)

## Control the value returned by get_writeZarrArray_auto_chunkdim():
set_writeZarrArray_chunk_maxlen(maxlen=1000000L)
set_writeZarrArray_chunk_shape(shape="scale")

## The "get" functions that correspond to the "set" functions above:
get_writeZarrArray_dump_dir()
get_writeZarrArray_chunk_maxlen()
get_writeZarrArray_chunk_shape()
| Argument | Description |
|---|---|
| dim | The dimensions (as an integer vector) of the array-like object to be realized to disk in Zarr format. |
| dir | The path (as a single string) to the "realization dump", that is, to the directory where realization of array-like objects in Zarr format should happen by default. If |
| maxlen | The "maximum chunk length", that is, the maximum number of array elements per physical chunk when realizing an array-like object to disk in Zarr format. |
| shape | A string describing the shape of the physical chunks to use by default when realizing an array-like object to disk in Zarr format. See |
Here's how writeZarrArray() obtains its automatic
argument values:
The automatic value for zarr_path is obtained with
get_writeZarrArray_auto_path().
The automatic value for chunkdim is obtained with
chunkdim(x), where x is the array-like object
passed to writeZarrArray().
If chunkdim(x) returns NULL (which can happen if
x is an in-memory object or if the dimensions of its
physical chunks cannot be determined), then chunkdim
is obtained with get_writeZarrArray_auto_chunkdim(dim(x)).
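The two-step rule above can be mirrored in a few lines. The helper resolve_chunkdim() below is hypothetical (it is not part of ZarrArray) and only restates the documented fallback; it is not the package's internal code.

```r
library(ZarrArray)

## Hypothetical helper (not part of ZarrArray) restating the rule above:
## use the chunk geometry of x if known, else the automatic one.
resolve_chunkdim <- function(x) {
    cd <- chunkdim(x)
    if (is.null(cd))
        cd <- get_writeZarrArray_auto_chunkdim(dim(x))
    cd
}

m <- matrix(runif(60), nrow=6)          # in-memory: chunkdim(m) is NULL
resolve_chunkdim(m)                     # falls back to the automatic value

Z <- writeZarrArray(m, chunkdim=c(3L, 5L))
resolve_chunkdim(Z)                     # taken from the on-disk chunks
```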
The ZarrArray package provides a set of utility functions to
control the values returned by get_writeZarrArray_auto_path()
and get_writeZarrArray_auto_chunkdim():
The value returned by get_writeZarrArray_auto_path()
is controlled by set_writeZarrArray_dump_dir().
The value returned by get_writeZarrArray_auto_chunkdim()
is controlled by set_writeZarrArray_chunk_maxlen() and
set_writeZarrArray_chunk_shape().
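To see how the two settings influence get_writeZarrArray_auto_chunkdim(), the sketch below lowers the maximum chunk length and changes the chunk shape, then restores both. The shape name "first-dim-grows-first" is an assumption: it is taken from the shapes accepted by makeCappedVolumeBox() in the DelayedArray package, which is where this documentation points for chunk shapes. Only the documented cap on the number of elements per chunk is relied on.

```r
library(ZarrArray)

d <- c(1000L, 500L)
get_writeZarrArray_auto_chunkdim(d)        # with the current settings

prev_maxlen <- set_writeZarrArray_chunk_maxlen(1000L)
## Assumed to be a valid shape (see ?makeCappedVolumeBox):
prev_shape <- set_writeZarrArray_chunk_shape("first-dim-grows-first")
cd <- get_writeZarrArray_auto_chunkdim(d)
cd
prod(cd)                                   # at most 1000 elements per chunk

## Restore the previous settings:
set_writeZarrArray_chunk_maxlen(prev_maxlen)
set_writeZarrArray_chunk_shape(prev_shape)
```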
In other words, the set_writeZarrArray_*() utility functions
provide some control over the behavior of writeZarrArray()
and ZarrRealizationSink() when only their first argument
is specified, as in:
a <- array(101:160, dim=5:3)
A <- writeZarrArray(a)
or in:
ZarrRealizationSink(dim(a))
Consequently, they also provide some control over the coercion of an
arbitrary array-like object to ZarrArray (i.e., over
as(a, "ZarrArray")), since this coercion simply calls
writeZarrArray() on the supplied object.
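Since the coercion goes through writeZarrArray(), a lowered maximum chunk length also affects as(). A small sketch, assuming the default "scale" shape is in effect; the cap of 20 elements is arbitrary and only the documented upper bound on chunk size is relied on.

```r
library(ZarrArray)

a <- array(101:160, dim=5:3)    # in-memory, so chunkdim(a) is NULL

prev_maxlen <- set_writeZarrArray_chunk_maxlen(20L)
A <- as(a, "ZarrArray")         # equivalent to writeZarrArray(a)
chunkdim(A)                     # chunks hold at most 20 elements
set_writeZarrArray_chunk_maxlen(prev_maxlen)
```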
get_writeZarrArray_auto_path() returns a single string
containing the automatic path used by writeZarrArray()
when its zarr_path argument is not specified.
Note that the function is used internally by writeZarrArray()
and is not meant to be used directly by the user.
get_writeZarrArray_auto_chunkdim() returns an integer vector
containing the automatic chunk dimensions used by
writeZarrArray() when its chunkdim argument is
not specified.
Note that the function is used internally by writeZarrArray()
and is not meant to be used directly by the user.
get_writeZarrArray_dump_dir() returns a single string containing
the path to the "realization dump".
set_writeZarrArray_dump_dir() invisibly returns a single string
containing the previous path to the "realization dump". In other words,
prev_dir <- set_writeZarrArray_dump_dir(dir)
is equivalent to
prev_dir <- get_writeZarrArray_dump_dir()
set_writeZarrArray_dump_dir(dir)
get_writeZarrArray_chunk_maxlen() returns the "maximum chunk length"
(i.e. maximum number of array elements) of the physical chunks to use by
default when realizing an array-like object to disk in Zarr format.
set_writeZarrArray_chunk_maxlen() invisibly returns the previous
"maximum chunk length". In other words,
prev_maxlen <- set_writeZarrArray_chunk_maxlen(maxlen)
is equivalent to
prev_maxlen <- get_writeZarrArray_chunk_maxlen()
set_writeZarrArray_chunk_maxlen(maxlen)
get_writeZarrArray_chunk_shape() returns a single string describing
the "chunk shape", that is, the shape of the physical chunks to use by
default when realizing an array-like object to disk in Zarr format.
set_writeZarrArray_chunk_shape() invisibly returns a string
describing the previous "chunk shape". In other words,
prev_shape <- set_writeZarrArray_chunk_shape(shape)
is equivalent to
prev_shape <- get_writeZarrArray_chunk_shape()
set_writeZarrArray_chunk_shape(shape)
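Because each setter returns the previous value, temporary settings can be scoped with on.exit() so they are restored even if an error occurs. The wrapper with_zarr_chunk_settings() below is hypothetical (not part of ZarrArray), and "hypercube" is assumed to be a valid shape based on makeCappedVolumeBox() in the DelayedArray package.

```r
library(ZarrArray)

## Hypothetical helper (not part of ZarrArray): evaluate 'expr' under
## temporary chunk settings, restoring the previous ones on exit.
with_zarr_chunk_settings <- function(maxlen, shape, expr) {
    prev_maxlen <- set_writeZarrArray_chunk_maxlen(maxlen)
    on.exit(set_writeZarrArray_chunk_maxlen(prev_maxlen), add=TRUE)
    prev_shape <- set_writeZarrArray_chunk_shape(shape)
    on.exit(set_writeZarrArray_chunk_shape(prev_shape), add=TRUE)
    expr  # lazily evaluated here, i.e. under the temporary settings
}

maxlen_before <- get_writeZarrArray_chunk_maxlen()
a <- array(101:160, dim=5:3)
A <- with_zarr_chunk_settings(12L, "hypercube", writeZarrArray(a))
chunkdim(A)
get_writeZarrArray_chunk_maxlen()   # back to its previous value
```

Relying on lazy evaluation of expr keeps the call site short; an alternative is to pass a function and call it inside the wrapper.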
writeZarrArray for writing an array-like object
to disk in Zarr format.
ZarrArray objects.
makeCappedVolumeBox in the
DelayedArray package.
a <- array(101:160, dim=5:3)

get_writeZarrArray_dump_dir()  # default "Zarr realization dump"
A1 <- writeZarrArray(a)
path(A1)

## Take control of where writeZarrArray() should write Zarr datasets
## by default:
my_zarr_dump <- file.path(tempdir(), "my_zarr_dump")
set_writeZarrArray_dump_dir(my_zarr_dump)

A2 <- writeZarrArray(a)
path(A2)

m <- matrix(101:140, ncol=8)
M <- as(m, "ZarrArray")  # equivalent to writeZarrArray(m)
path(M)

## Set "Zarr realization dump" to the default:
set_writeZarrArray_dump_dir()
The ZarrArray class is a DelayedArray extension for representing and operating on a Zarr dataset.
All the operations available for DelayedArray objects work on ZarrArray objects.
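A brief sketch of the two kinds of DelayedArray operations on a ZarrArray: arithmetic and math functions are recorded as delayed operations, while summaries such as colSums() are computed block by block. showtree() from the DelayedArray package displays the recorded operations.

```r
library(ZarrArray)

m <- matrix(runif(60), nrow=6)
Z <- writeZarrArray(m)
is(Z, "DelayedArray")      # ZarrArray extends DelayedArray

D <- log1p(Z * 10)         # delayed: no data is read or computed yet
showtree(D)                # displays the tree of delayed operations
colSums(D)                 # block-processed summary
```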
## Constructor function:
ZarrArray(zarr_path, s3_client=NULL)
| Argument | Description |
|---|---|
| zarr_path | The path (as a single string) to the Zarr dataset. |
| s3_client | Object created by paws.storage::s3(). |
A ZarrArray (or ZarrMatrix) object. (Note that ZarrMatrix extends ZarrArray.)
DelayedArray objects in the DelayedArray package.
s3 in the paws.storage package
for how to create a client for the S3 service.
writeZarrArray for writing an array-like object
to disk in Zarr format.
The ZarrArraySeed helper class.
zarr_path <- system.file(package="Rarr", "extdata", "zarr_examples",
                         "column-first", "int32.zarr")
A <- ZarrArray(zarr_path)
A  # 3D ZarrArray object

path(A)
dim(A)
type(A)
chunkdim(A)

aperm(A)  # multidimensional transposition
chunkdim(aperm(A))

A[ , , 1]
log1p(t(A[ , , 1]))
rowSums(log1p(t(A[ , , 1])))

## Sanity check:
stopifnot(
    identical(dim(aperm(A)), rev(dim(A))),
    identical(chunkdim(aperm(A)), rev(chunkdim(A))),
    identical(rowSums(log1p(t(A[ , , 1]))),
              rowSums(log1p(t(as.array(A)[ , , 1]))))
)
ZarrArraySeed is a low-level helper class for representing a pointer to a Zarr dataset.
Note that a ZarrArraySeed object is not intended to be used directly.
Most end users will typically create and manipulate a higher-level
ZarrArray object instead. See ?ZarrArray for
more information.
## --- Constructor function ---
ZarrArraySeed(zarr_path, s3_client=NULL)

## --- Accessors --------------

## S4 method for signature 'ZarrArraySeed'
path(object)

## S4 method for signature 'ZarrArraySeed'
dim(x)

## S4 method for signature 'ZarrArraySeed'
type(x)

## S4 method for signature 'ZarrArraySeed'
chunkdim(x)

## --- Data extraction --------

## S4 method for signature 'ZarrArraySeed'
extract_array(x, index)
| Argument | Description |
|---|---|
| zarr_path, s3_client | See |
| object, x | A ZarrArraySeed object. |
| index | See |
ZarrArraySeed objects only support a limited set of methods:
path(): Returns the path to the Zarr dataset.
Note that the path() generic
is defined and documented in the BiocGenerics package.
dim(), type(), chunkdim(). Note that
the type() generic is defined and
documented in the BiocGenerics package, and the
chunkdim() generic is defined and
documented in the DelayedArray package.
extract_array(), as.array(), is_sparse():
Note that these generics are defined and documented
in other packages e.g. in S4Arrays for
extract_array() and
is_sparse(), and in base
for as.array().
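As a sketch of the extract_array() contract from the S4Arrays package: index is a list with one element per dimension, where NULL selects everything along that dimension, and the result is an ordinary array whose dimensions are kept (no dropping). The example reuses the Rarr sample dataset that appears elsewhere on this page.

```r
library(ZarrArray)

zarr_path <- system.file(package="Rarr", "extdata", "zarr_examples",
                         "column-first", "int32.zarr")
seed <- ZarrArraySeed(zarr_path)

## One index element per dimension; NULL means "take everything".
block <- extract_array(seed, list(1:2, NULL, 1L))
dim(block)    # 2 x dim(seed)[2] x 1: dimensions are not dropped

## Same values as subsetting the fully realized array:
all.equal(block, as.array(seed)[1:2, , 1, drop=FALSE])
```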
In order to access the full set of operations that are available
for DelayedArray objects, one needs to wrap
a ZarrArraySeed object in a DelayedArray object,
typically by calling the DelayedArray()
constructor on it.
Note that this is exactly what the ZarrArray() constructor
function does.
The result of this wrapping is a ZarrArray object, a DelayedArray derivative that simply represents a ZarrArraySeed object wrapped in a DelayedArray object.
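The equivalence between manual wrapping and the ZarrArray() constructor can be checked directly. seed() is the DelayedArray accessor that returns the wrapped seed; the comparison below is a sketch using the Rarr sample dataset.

```r
library(ZarrArray)

zarr_path <- system.file(package="Rarr", "extdata", "zarr_examples",
                         "column-first", "int32.zarr")
A1 <- DelayedArray(ZarrArraySeed(zarr_path))   # manual wrapping
A2 <- ZarrArray(zarr_path)                     # what the constructor does

class(A1)
class(seed(A2))      # the wrapped ZarrArraySeed object
all.equal(as.array(A1), as.array(A2))
```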
ZarrArraySeed() returns a ZarrArraySeed object.
ZarrArray objects.
type, extract_array,
and is_sparse, in the S4Arrays
package.
chunkdim in the DelayedArray
package.
zarr_path <- system.file(package="Rarr", "extdata", "zarr_examples",
                         "column-first", "int32.zarr")
seed <- ZarrArraySeed(zarr_path)
seed  # ZarrArraySeed object

path(seed)
dim(seed)
type(seed)
chunkdim(seed)

DelayedArray(seed)  # ZarrArray object

## Sanity checks:
stopifnot(class(seed) == "ZarrArraySeed",
          class(DelayedArray(seed)) == "ZarrArray")