NEWS

HDF5Array 1.34.0

NEW FEATURES

Add 'as.vector' argument to h5mread().

SIGNIFICANT USER-VISIBLE CHANGES

Improvements to coercions from CSC_H5SparseMatrixSeed, H5SparseMatrix, TENxMatrix, or H5ADMatrix to SparseArray:
- should be significantly more efficient, thanks to various tweaks that happened in the SparseArray and DelayedArray packages;
- now support coercing an object with more than 2^31 nonzero values.
Coercion from any of the classes above to a sparseMatrix derivative now fails early if the object to coerce has >= 2^31 nonzero values.
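Why 2^31 is the cut-off: a dgCMatrix from the Matrix package stores its column pointers (slot 'p') and row indices (slot 'i') as R integer vectors, so it cannot index more than .Machine$integer.max (2^31 - 1) nonzero values. The helper below is a hypothetical sketch of that kind of early check, written in base R. It is not the code HDF5Array actually uses, and the name check_nnz_for_sparseMatrix() is made up for illustration.

```r
# Hypothetical helper, NOT part of HDF5Array: refuses a coercion to a
# sparseMatrix derivative when the nonzero count cannot be indexed by
# R integers (i.e. when it is >= 2^31).
check_nnz_for_sparseMatrix <- function(nnz) {
  # 'nnz' is expected to be a double: counts >= 2^31 don't fit in an integer
  if (nnz >= 2^31)
    stop("object has >= 2^31 nonzero values: too many for ",
         "a sparseMatrix derivative (e.g. dgCMatrix)")
  invisible(TRUE)
}

check_nnz_for_sparseMatrix(1e6)       # passes silently
# check_nnz_for_sparseMatrix(2^31)    # would signal an error
```

Failing before any data is read avoids spending time loading a large dataset only to hit an integer overflow at the end.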
All *Seed classes in the package now extend the new OutOfMemoryObject class defined in BiocGenerics (virtual class with no slots).

BUG FIXES

Fix long-standing bug in the t() methods for CSC_H5SparseMatrixSeed and CSR_H5SparseMatrixSeed objects.
Replace internal calls to rhdf5::H5Fopen(), rhdf5::H5Dopen(), and rhdf5::H5Gopen() with calls to new internal helpers .H5Fopen(), .H5Dopen(), and .H5Gopen(), respectively. See commit 31a7e06 for more information.

HDF5Array 1.32.0

NEW FEATURES

Some light refactoring of the HDF5 dump management utilities:
- All the settings controlled by the get/setHDF5Dump*() functions are now formally treated as global options (i.e. they're stored in the global .Options vector). The benefit is that the settings will always get passed to the workers in the context of parallel evaluation, even when using a parallel back-end like BiocParallel::SnowParam. In other words, all the workers are now guaranteed to use the same settings as the main R process.
- In addition, getHDF5DumpFile() was further modified to make sure that it generates unique "automatic dump files" across workers.

SIGNIFICANT USER-VISIBLE CHANGES

Change 'with.dimnames' default to TRUE (was FALSE) in writeHDF5Array().

BUG FIXES

Make sure that chunkdim(x) on a TENxRealizationSink, CSC_H5SparseMatrixSeed, or CSR_H5SparseMatrixSeed object 'x' **always** returns dimensions that are at most dim(x), even when 'x' has 0 rows and/or columns.

HDF5Array 1.30.0

NEW FEATURES

Add 'dim' and 'sparse.layout' args to H5SparseMatrixSeed().

SIGNIFICANT USER-VISIBLE CHANGES

HDF5Array now imports S4Arrays.

HDF5Array 1.28.0

No changes in this version.

HDF5Array 1.26.0

SIGNIFICANT USER-VISIBLE CHANGES

Try harder to find and load the matrix rownames of a 10x Genomics dataset. See commit abafbb9e99ad54a64e5013305486b97daa9442bc.

BUG FIXES

Handle HDF5 sparse matrices where shape is not an integer vector. When the shape returned by internal helper .read_h5sparse_dim() is a double vector it is now coerced to an integer vector. Integer overflows resulting from this coercion trigger an error with an informative error message. See GitHub issue #48.
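The fix above guards against a base R behavior: as.integer() on a double larger than .Machine$integer.max returns NA with only a warning, which would silently corrupt the matrix dimensions. A minimal base-R sketch of the guarded coercion follows. It is a hypothetical illustration, not the package's internal .read_h5sparse_dim(), and the name shape_as_integer() is made up.

```r
# Hypothetical sketch, NOT HDF5Array's .read_h5sparse_dim(): coerce a shape
# read from an HDF5 file to an integer vector, turning R's NA-on-overflow
# into an informative error.
shape_as_integer <- function(shape) {
  if (is.integer(shape))
    return(shape)
  if (anyNA(shape) || any(shape > .Machine$integer.max))
    stop("HDF5 sparse matrix shape (", paste(shape, collapse = " x "),
         ") cannot be represented as an integer vector")
  as.integer(shape)
}

shape_as_integer(c(33538, 5000))  # returns c(33538L, 5000L)
```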

HDF5Array 1.24.0

SIGNIFICANT USER-VISIBLE CHANGES

Improve error reporting in internal helper .h5openlocalfile().

BUG FIXES

Make sure updateObject() handles very old HDF5ArraySeed instances.

HDF5Array 1.22.0

No changes in this version.

HDF5Array 1.20.0

NEW FEATURES

Implement the H5SparseMatrix class and H5SparseMatrix() constructor function. H5SparseMatrix is a DelayedMatrix subclass for representing and operating on an HDF5 sparse matrix stored in CSR/CSC/Yale format.
Implement the H5ADMatrix class and H5ADMatrix() constructor function. H5ADMatrix is a DelayedMatrix subclass for representing and operating on the central matrix of an ‘h5ad’ file, or any matrix in its '/layers' group.
Implement H5File objects. The H5File class provides a formal representation of an HDF5 file (local or remote, including a file stored in an Amazon S3 bucket).
HDF5Array objects now work with files on Amazon S3 (via use of H5File()).

BUG FIXES

Remove "global counter" files at unload time (commit f7913043).

HDF5Array 1.18.0

NEW FEATURES

Add 'as.sparse' argument to h5mread(), HDF5Array(), HDF5ArraySeed(), writeHDF5Array(), saveHDF5SummarizedExperiment(), and HDF5RealizationSink(). Even though it won't change how the data is stored in the HDF5 file (data will still be stored the usual dense way), the 'as.sparse' argument allows the user to control whether the HDF5 dataset should be considered sparse (and treated as such) or not. More precisely, when HDF5Array() is called with 'as.sparse=TRUE', the returned object will be considered sparse i.e. blocks in the object will be loaded as sparse objects during block processing. This should lead to less memory usage and hopefully overall better performance.
Add is_sparse() setter for HDF5Array and HDF5ArraySeed objects.

SIGNIFICANT USER-VISIBLE CHANGES

Change default value of 'verbose' argument from FALSE to NA for writeHDF5Array(), saveHDF5SummarizedExperiment(), and writeTENxMatrix().

BUG FIXES

Fix handling of logical NAs in h5mread().
Fix bug in saveHDF5SummarizedExperiment() when 'chunkdim' is specified.

HDF5Array 1.16.0

NEW FEATURES

New h5writeDimnames()/h5readDimnames() functions for writing/reading the dimnames of an HDF5 dataset to/from the HDF5 file. See ?h5writeDimnames for more information.
Add full support for HDF5Array objects of type "raw":
- writeHDF5Array() now works on a DelayedArray object of type "raw" (it creates an H5 dataset of type H5T_STD_U8LE).
- The HDF5Array() constructor should now return an HDF5Array object of type "raw" when pointed to an H5 dataset with an 8-bit width type (e.g. H5T_STD_U8LE, H5T_STD_U8BE, H5T_STD_I8LE, H5T_STD_I8BE, H5T_STD_B8LE, H5T_STD_B8BE, etc.).
Add 'H5type' argument to writeHDF5Array().
h5mread() now supports contiguous (i.e. unchunked) string data.

SIGNIFICANT USER-VISIBLE CHANGES

HDF5Array objects now find their dimnames in the HDF5 file. writeHDF5Array() and as(x, "HDF5Array") know how to write the dimnames to the HDF5 file, and the HDF5Array() constructor knows how to find them. See ?writeHDF5Array for more information.

BUG FIXES

Fix bug causing character data to be truncated when written to HDF5 file.
Fix h5mread() inefficiency when the user selection covers full chunks.
h5mread() now handles character NAs consistently with rhdf5::h5read().
Fix writeHDF5Array() error on character array filled with NAs.

HDF5Array 1.14.0

NEW FEATURES

Add coercions from TENxMatrix (or TENxMatrixSeed) to dgCMatrix.

SIGNIFICANT USER-VISIBLE CHANGES

h5mread() argument 'starts' now defaults to NULL.

BUG FIXES

h5mread() now supports datasets with contiguous layout (i.e. not chunked).

HDF5Array 1.12.0

NEW FEATURES

Add 'prefix' arg to save/loadHDF5SummarizedExperiment().
Add quickResaveHDF5SummarizedExperiment() for fast re-saving after initial saveHDF5SummarizedExperiment(). See ?quickResaveHDF5SummarizedExperiment for more information.
Add h5mread() as a faster alternative to rhdf5::h5read(). It is now the workhorse behind the extract_array() method for HDF5ArraySeed objects. This change should significantly speed up block processing of HDF5ArraySeed-based DelayedArray objects (including HDF5Array objects).

HDF5Array 1.10.0

NEW FEATURES

Implement the TENxMatrix container (DelayedArray backend for the HDF5-based sparse matrix representation used by 10x Genomics). Also add writeTENxMatrix() and coercion to TENxMatrix.

SIGNIFICANT USER-VISIBLE CHANGES

By default, automatic HDF5 datasets (e.g. the dataset that gets written to disk when calling 'as(x, "HDF5Array")') are now created with chunks of 1 million array elements (previous default was 1/75 of 'getAutoBlockLength(x)'). This can be controlled with new low-level utilities get/setHDF5DumpChunkLength().
By default, automatic HDF5 datasets are now created with chunks of shape "scale" instead of "first-dim-grows-first". This can be controlled with new low-level utilities get/setHDF5DumpChunkShape().
getHDF5DumpChunkDim() loses the 'type' and 'ratio' arguments (only 'dim' is left).