NEWS
HDF5Array 1.34.0
NEW FEATURES
- Add 'as.vector' argument to h5mread().
SIGNIFICANT USER-VISIBLE CHANGES
- Improvements to coercions from CSC_H5SparseMatrixSeed, H5SparseMatrix,
TENxMatrix, or H5ADMatrix to SparseArray:
- should be significantly more efficient, thanks to various tweaks that
happened in the SparseArray and DelayedArray packages;
- support coercing an object with more than 2^31 nonzero values.
- Coercion from any of the classes above to a sparseMatrix derivative
now fails early if the object to coerce has >= 2^31 nonzero values.
- All *Seed classes in the package now extend the new OutOfMemoryObject
class defined in BiocGenerics (virtual class with no slots).
BUG FIXES
- Fix long-standing bug in t() methods for CSC_H5SparseMatrixSeed and
CSR_H5SparseMatrixSeed objects.
- Replace internal calls to rhdf5::H5Fopen(), rhdf5::H5Dopen(), and
rhdf5::H5Gopen(), with calls to new internal helpers .H5Fopen(),
.H5Dopen(), and .H5Gopen(), respectively.
See commit 31a7e06 for more information.
HDF5Array 1.32.0
NEW FEATURES
- Some light refactoring of the HDF5 dump management utilities:
- All the settings controlled by the get/setHDF5Dump*() functions are
now formally treated as global options (i.e. they're stored in the
global .Options vector). The benefit is that the settings will always
get passed to the workers in the context of parallel evaluation, even
when using a parallel back-end like BiocParallel::SnowParam.
In other words, all the workers are now guaranteed to use the same
settings as the main R process.
- In addition, getHDF5DumpFile() was further modified to make sure that
it will generate unique "automatic dump files" across workers.
SIGNIFICANT USER-VISIBLE CHANGES
- Change 'with.dimnames' default to TRUE (was FALSE) in writeHDF5Array().
BUG FIXES
- Make sure that chunkdim(x) on a TENxRealizationSink,
CSC_H5SparseMatrixSeed, or CSR_H5SparseMatrixSeed object 'x'
**always** returns dimensions that are at most dim(x), even
when 'x' has 0 rows and/or columns.
HDF5Array 1.30.0
NEW FEATURES
- Add 'dim' and 'sparse.layout' args to H5SparseMatrixSeed().
SIGNIFICANT USER-VISIBLE CHANGES
- HDF5Array now imports S4Arrays.
HDF5Array 1.28.0
- No changes in this version.
HDF5Array 1.26.0
SIGNIFICANT USER-VISIBLE CHANGES
- Try harder to find and load the matrix rownames of a 10x Genomics dataset.
See commit abafbb9e99ad54a64e5013305486b97daa9442bc.
BUG FIXES
- Handle HDF5 sparse matrices where shape is not an integer vector.
When the shape returned by internal helper .read_h5sparse_dim() is a
double vector it is now coerced to an integer vector. Integer overflows
resulting from this coercion trigger an error with an informative error
message.
See GitHub issue #48.
HDF5Array 1.24.0
SIGNIFICANT USER-VISIBLE CHANGES
- Improve error reporting in internal helper .h5openlocalfile().
BUG FIXES
- Make sure updateObject() handles very old HDF5ArraySeed instances.
HDF5Array 1.22.0
- No changes in this version.
HDF5Array 1.20.0
NEW FEATURES
- Implement the H5SparseMatrix class and H5SparseMatrix() constructor
function. H5SparseMatrix is a DelayedMatrix subclass for representing
and operating on an HDF5 sparse matrix stored in CSR/CSC/Yale format.
- Implement the H5ADMatrix class and H5ADMatrix() constructor function.
H5ADMatrix is a DelayedMatrix subclass for representing and operating
on the central matrix of an 'h5ad' file, or any matrix in its '/layers'
group.
- Implement H5File objects. The H5File class provides a formal
representation of an HDF5 file (local or remote, including a file
stored in an Amazon S3 bucket).
- HDF5Array objects now work with files on Amazon S3 (via use of H5File()).
BUG FIXES
- Remove "global counter" files at unload time (commit f7913043).
HDF5Array 1.18.0
NEW FEATURES
- Add 'as.sparse' argument to h5mread(), HDF5Array(), HDF5ArraySeed(),
writeHDF5Array(), saveHDF5SummarizedExperiment(), and
HDF5RealizationSink().
Even though it won't change how the data is stored in the HDF5 file
(data will still be stored the usual dense way), the 'as.sparse'
argument allows the user to control whether the HDF5 dataset should
be considered sparse (and treated as such) or not. More precisely,
when HDF5Array() is called with 'as.sparse=TRUE', the returned object
will be considered sparse, i.e. blocks in the object will be loaded as
sparse objects during block processing. This should lead to less
memory usage and hopefully overall better performance.
- Add is_sparse() setter for HDF5Array and HDF5ArraySeed objects.
SIGNIFICANT USER-VISIBLE CHANGES
- Change default value of 'verbose' argument from FALSE to NA for
writeHDF5Array(), saveHDF5SummarizedExperiment(), and writeTENxMatrix().
BUG FIXES
- Fix handling of logical NAs in h5mread().
- Fix bug in saveHDF5SummarizedExperiment() when 'chunkdim' is specified.
HDF5Array 1.16.0
NEW FEATURES
- New h5writeDimnames()/h5readDimnames() functions for writing/reading
the dimnames of an HDF5 dataset to/from the HDF5 file.
See ?h5writeDimnames for more information.
- Add full support for HDF5Array objects of type "raw":
- writeHDF5Array() now works on a DelayedArray object of type "raw" (it
creates an H5 dataset of type H5T_STD_U8LE).
- The HDF5Array() constructor should now return an HDF5Array object of
type "raw" when pointed to an H5 dataset with an 8-bit width type (e.g.
H5T_STD_U8LE, H5T_STD_U8BE, H5T_STD_I8LE, H5T_STD_I8BE, H5T_STD_B8LE,
H5T_STD_B8BE, etc.).
- Add 'H5type' argument to writeHDF5Array().
- h5mread() now supports contiguous (i.e. unchunked) string data.
SIGNIFICANT USER-VISIBLE CHANGES
- HDF5Array objects now find their dimnames in the HDF5 file.
writeHDF5Array() and as(x, "HDF5Array") know how to write the dimnames
to the HDF5 file, and the HDF5Array() constructor knows how to find
them. See ?writeHDF5Array for more information.
BUG FIXES
- Fix bug causing character data to be truncated when written to HDF5 file.
- Fix h5mread() inefficiency when the user selection covers full chunks.
- h5mread() now handles character NAs consistently with rhdf5::h5read().
- Fix writeHDF5Array() error on character array filled with NAs.
HDF5Array 1.14.0
NEW FEATURES
- Add coercions from TENxMatrix (or TENxMatrixSeed) to dgCMatrix.
SIGNIFICANT USER-VISIBLE CHANGES
- h5mread() argument 'starts' now defaults to NULL.
BUG FIXES
- h5mread() now supports datasets with contiguous layout (i.e. not chunked).
HDF5Array 1.12.0
NEW FEATURES
- Add 'prefix' arg to save/loadHDF5SummarizedExperiment().
- Add quickResaveHDF5SummarizedExperiment() for fast re-saving after
initial saveHDF5SummarizedExperiment().
See ?quickResaveHDF5SummarizedExperiment for more information.
- Add h5mread() as a faster alternative to rhdf5::h5read(). It is now
the workhorse behind the extract_array() method for HDF5ArraySeed
objects. This change should significantly speed up block processing
of HDF5ArraySeed-based DelayedArray objects (including HDF5Array
objects).
HDF5Array 1.10.0
NEW FEATURES
- Implement the TENxMatrix container (DelayedArray backend for the
HDF5-based sparse matrix representation used by 10x Genomics).
Also add writeTENxMatrix() and coercion to TENxMatrix.
SIGNIFICANT USER-VISIBLE CHANGES
- By default, automatic HDF5 datasets (e.g. the dataset that gets written
to disk when calling 'as(x, "HDF5Array")') are now created with chunks
of 1 million array elements (previous default was 1/75 of
'getAutoBlockLength(x)'). This can be controlled with new low-level
utilities get/setHDF5DumpChunkLength().
- By default, automatic HDF5 datasets are now created with chunks of
shape "scale" instead of "first-dim-grows-first". This can be
controlled with new low-level utilities get/setHDF5DumpChunkShape().
- getHDF5DumpChunkDim() loses the 'type' and 'ratio' arguments (only 'dim'
is left).