NEWS
DelayedArray 0.32.0
NEW FEATURES
- Add nzwhich() method for DelayedArray objects (block-processed).
- Support %*%, crossprod(), and tcrossprod() between DelayedMatrix and
COO_SparseMatrix objects.
SIGNIFICANT USER-VISIBLE CHANGES
- The default realize() method now returns an SVT_SparseArray object
instead of a SparseArraySeed object when the array-like object to
realize is sparse and 'BACKEND' is NULL.
- Coercing a DelayedArray object or derivative to SparseArray should be
much more efficient (thanks to various tweaks that happened in the
SparseArray and HDF5Array packages).
DEPRECATED AND DEFUNCT
- Deprecate SparseArraySeed objects.
- Deprecate OLD_extract_sparse_array() and read_sparse_block() generics
and methods.
BUG FIXES
- Fix bug in coercion from DelayedArray to SparseArray when the object to
coerce has NAs. See https://github.com/Bioconductor/HDF5Array/issues/61
DelayedArray 0.30.0
NEW FEATURES
- rowsum(DelayedMatrix)/colsum(DelayedMatrix) now acknowledge the
current "automatic realization backend" and "automatic BiocParallel
BPPARAM". See '?DelayedArray::rowsum' for more information.
SIGNIFICANT USER-VISIBLE CHANGES
- Rename supportedRealizationBackends -> registeredRealizationBackends.
- Slightly modify the behavior of 'realize(x, BACKEND=NULL)'.
See https://github.com/hansenlab/minfi/issues/256
- Two important changes to matrix multiplication of DelayedMatrix objects.
1. Now it returns an ordinary matrix by default (before this change an
ordinary matrix was returned wrapped in a DelayedMatrix object). The
user can change the default behavior by setting an "automatic
realization backend". See ?DelayedArray::`%*%` for more information.
2. Better block processing strategy when only one of the two operands is
a DelayedMatrix object (or derivative). The new strategy acknowledges
the geometry of the physical chunks of the data in the object. This
can make a huge difference in some cases. For example, using a subset
of the "1.3 Million Brain Cell Dataset" from 10x Genomics:
library(HDF5Array)
library(ExperimentHub)
hub <- ExperimentHub()
tenx <- TENxMatrix(hub[["EH1039"]], group="mm10")
M <- tenx[ , 1:25000]
m <- cbind(runif(ncol(M)), runif(ncol(M)))
M %*% m
Doing 'M %*% m' now takes 7.6s and uses 1.1Gb of memory, compared to
110s / 3.1Gb before this improvement. Furthermore, the new strategy
operates in linear time and at constant memory:
            with DelayedArray   with DelayedArray
  ncol(M)        0.29.2             < 0.29.2
  -------   -----------------   -----------------
    12500     4.3s / 1.1Gb        32s / 2.1Gb
    25000     7.6s / 1.1Gb       110s / 3.1Gb
    50000    13.4s / 1.1Gb       495s / 5.6Gb
   100000    24.0s / 1.2Gb      2409s / 9.1Gb
Note that the new strategy is implemented in internal helpers
DelayedArray:::BLOCK_mult_Lgrid() and
DelayedArray:::BLOCK_mult_Rgrid(). When the two operands are
DelayedMatrix objects, the old strategy (which is implemented in
DelayedArray:::.super_BLOCK_mult()) is still used.
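The first change above (an ordinary matrix returned by default) can be checked with a small sketch. It assumes DelayedArray >= 0.30.0 is installed and that no automatic realization backend has been set; the toy matrices are made up for illustration.

```r
library(DelayedArray)

# Toy data (hypothetical): a 4 x 5 DelayedMatrix and an ordinary 5 x 2 matrix.
set.seed(1)
M <- DelayedArray(matrix(runif(20), nrow=4))
m <- matrix(runif(10), ncol=2)

# With no automatic realization backend set, the product is expected to
# come back as an ordinary matrix rather than wrapped in a DelayedMatrix.
res <- M %*% m
class(res)

# The values agree with the plain in-memory product.
all.equal(as.matrix(res), as.matrix(M) %*% m)
```

Only one operand is a DelayedMatrix here, so this is the case covered by the new block processing strategy.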
DelayedArray 0.28.0
NEW FEATURES
- Add coercion from DelayedArray to SparseArray.
- Add efficient rowVars/colVars methods for DelayedMatrix objects.
These methods, like all other row/col summarization methods implemented
in the DelayedArray package, use block processing and can handle blocks
of **arbitrary** geometry, that is, they can handle a grid of class
ArbitraryArrayGrid (the most general type of grid).
- Add 'useNames' arg to row/colMins, row/colMaxs, row/colRanges, and
row/colVars methods for DelayedMatrix objects.
- Add 'current_viewport' argument to set_grid_context().
SIGNIFICANT USER-VISIBLE CHANGES
- DelayedArray now depends on S4Arrays and SparseArray.
- Some improvements to the rowMeans/colMeans methods for DelayedMatrix
objects.
DelayedArray 0.26.0
- No changes in this version.
DelayedArray 0.24.0
SIGNIFICANT USER-VISIBLE CHANGES
- Move the aperm() S4 generic to BiocGenerics.
DelayedArray 0.22.0
DEPRECATED AND DEFUNCT
- The following functions are now defunct after being deprecated in
previous versions of the package:
- blockGrid(): replaced with defaultAutoGrid()
- rowGrid(): replaced with rowAutoGrid()
- colGrid(): replaced with colAutoGrid()
- multGrids(): replaced with defaultMultAutoGrids()
- linearInd(): replaced with Mindex2Lindex()
- viewportApply(): replaced with gridApply()
- viewportReduce(): replaced with gridReduce()
- getRealizationBackend(): replaced with getAutoRealizationBackend()
- setRealizationBackend(): replaced with setAutoRealizationBackend()
- RealizationSink(): replaced with AutoRealizationSink()
BUG FIXES
- Small tweak to updateObject() method for DelayedArray objects (see
commit abcd154).
DelayedArray 0.20.0
BUG FIXES
- Fix long-standing bugs in dense2sparse():
- mishandling of NAs/NaNs in input
- 1D case didn't work
DelayedArray 0.18.0
NEW FEATURES
- Implement ConstantArray objects. The ConstantArray class is a
DelayedArray subclass to efficiently mimic an array containing a
constant value, without actually creating said array in memory.
- Add scale() method for DelayedMatrix objects.
- Add sinkApply(), a convenience function for walking on a RealizationSink
derivative and filling it with blocks of data.
- Proper support for dgRMatrix and lgRMatrix objects as DelayedArray
object seeds:
- is_sparse() now returns TRUE on dgRMatrix and lgRMatrix objects.
- Support coercion back and forth between SparseArraySeed objects
and dgRMatrix/lgRMatrix objects.
- Add extract_sparse_array() methods for dgRMatrix and lgRMatrix
objects.
These changes bring the treatment of dgRMatrix and lgRMatrix objects
to the same level as dgCMatrix and lgCMatrix objects. For example,
wrapping a dgRMatrix or lgRMatrix object in a DelayedArray object will
trigger the same sparse-optimized mechanisms during block processing
as when wrapping a dgCMatrix or lgCMatrix object.
- rbind() and cbind() on sparse DelayedArray objects are now fully
supported.
- Delayed operations of type DelayedUnaryIsoOpWithArgs now preserve
sparsity when appropriate.
- Implement DummyArrayGrid and DummyArrayViewport objects.
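As an illustration of the ConstantArray entry above, here is a minimal sketch. It assumes the constructor takes the dimensions first and the constant through a 'value' argument; see ?ConstantArray to confirm.

```r
library(DelayedArray)

# A 1e6 x 1e6 array of doubles would need about 8 TB if materialized.
# The ConstantArray object only records its dimensions and the value.
x <- ConstantArray(c(1e6, 1e6), value=NA_real_)
dim(x)

# Subsetting is delayed; only the small selection is realized here.
as.matrix(x[1:2, 1:3])
```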
SIGNIFICANT USER-VISIBLE CHANGES
- Rename viewportApply()/viewportReduce() -> gridApply()/gridReduce().
BUG FIXES
- Subsetting of a DelayedArray object now propagates the names/dimnames,
even when drop=TRUE and the result has only 1 dimension (issue #78).
- log() on a DelayedArray object now handles the 'base' argument.
- Fix issue in is_sparse() methods for DelayedUnaryIsoOpStack and
DelayedNaryIsoOp objects.
- cbind()/rbind() no longer coerce supplied objects to type of 1st object
(commit f1279e07).
- Fix small issue in dim() setter (commit c9488537).
DelayedArray 0.16.0
NEW FEATURES
- Added 'as.sparse' argument to read_block() (see ?read_block) and to
AutoRealizationSink() (see ?AutoRealizationSink).
- SparseArraySeed objects now can hold dimnames. As a consequence
read_block() now also propagates the dimnames to sparse blocks,
not just to dense blocks.
- Matrix multiplication is now sparse-aware via sparseMatrices.
- Added is_sparse<- generic (with methods for HDF5Array/HDF5ArraySeed
objects only, see ?HDF5Array in the HDF5Array package).
- Added viewportApply() and viewportReduce() to the blockApply() family.
- Added set_grid_context() for testing/debugging callback functions passed
to blockApply() and family.
SIGNIFICANT USER-VISIBLE CHANGES
- Renamed first write_block() argument 'x' -> 'sink'
- Renamed:
RealizationSink() -> AutoRealizationSink()
get/setRealizationBackend() -> get/setAutoRealizationBackend()
blockGrid() -> defaultAutoGrid()
row/colGrid() -> row/colAutoGrid()
- Improved support of sparse data:
- Slightly more efficient coercion from SparseArraySeed to
dgCMatrix/lgCMatrix (small speedup and memory footprint reduction).
This provides a minor speedup to the sparse aware block-processed
row/col summarization methods for DelayedMatrix objects when the
object is sparse. (These methods are: row/colSums(), row/colMeans(),
row/colMins(), row/colMaxs(), and row/colRanges(). The methods defined
in DelayedMatrixStats are not sparse aware yet so are not affected.)
- Made the following block-processed operations on DelayedArray objects
sparse aware: anyNA(), which(), max(), min(), range(), sum(), prod(),
any(), all(), and mean(). This typically gives a 50%-60% speedup when
the DelayedArray object is sparse.
- Implemented a bunch of methods to operate natively on SparseArraySeed
objects. Their main purpose is to support the above i.e. to support
block processed methods for DelayedArray objects like sum(), mean(),
which(), etc... when the object is sparse. Note that more are needed
to also support the sparse aware block-processed row/col summarization
methods for DelayedMatrix objects so we can finally ditch the costly
coercion from SparseArraySeed to dgCMatrix/lgCMatrix that they currently
rely on.
- The utility functions for retrieving grid context for the current
block/viewport should now be called with no argument (previously
one needed to pass the current block to them). These functions are
effectiveGrid(), currentBlockId(), and currentViewport().
- DelayedArray now depends on the MatrixGenerics package.
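The no-argument form of the grid context helpers mentioned above can be sketched as follows. The toy matrix and the two-column grid are illustrative choices, not taken from the package documentation.

```r
library(DelayedArray)

x <- DelayedArray(matrix(1:20, nrow=4))   # 4 x 5 toy matrix

# Walk the matrix two columns at a time. The callback queries the grid
# context directly, without passing the current block to the helper.
grid <- colAutoGrid(x, ncol=2)
ids <- blockApply(x, function(block) currentBlockId(), grid=grid)
unlist(ids)
```

effectiveGrid() and currentViewport() are called the same way from inside the callback.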
BUG FIXES
- Various fixes and improvements to block processing of sparse logical
DelayedMatrix objects (e.g. DelayedMatrix object with a lgCMatrix
seed from the Matrix package).
- Fix extract_sparse_array() inefficiency on dgCMatrix and lgCMatrix
objects.
- Switch matrix multiplication to bplapply2() from bpiterate() to fix
error handling.
DelayedArray 0.14.0
NEW FEATURES
- Support 'type(x) <- new_type' to change the type of a DelayedArray
object.
- 1D-style single bracket subsetting of DelayedArray objects now supports
subsetting by a numeric matrix with one column per dimension.
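A minimal sketch of the two features above, using a small made-up matrix. The comparison with the ordinary matrix is only there to show that the same elements are selected.

```r
library(DelayedArray)

a <- matrix(1:12, nrow=3)
x <- DelayedArray(a)

# Changing the type is a delayed operation; nothing is realized yet.
type(x) <- "double"
type(x)

# 1D-style subsetting by a numeric matrix: one row per element to
# extract, one column per dimension of 'x'.
Mindex <- rbind(c(1, 2), c(3, 4))
x[Mindex]
a[Mindex]   # same elements, taken from the ordinary matrix
```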
SIGNIFICANT USER-VISIBLE CHANGES
- No more parallel evaluation by default, that is, getAutoBPPARAM() now
returns NULL on a fresh session instead of one of the parallelization
backends defined in BiocParallel. It is now the responsibility of the
user to set the parallelization backend (with setAutoBPPARAM()) if they
wish things like matrix multiplication, rowsum() or rowSums() to use
parallel evaluation again.
Also BiocParallel has been moved from Depends to Suggests.
- Replace arrayInd2() and linearInd() with Lindex2Mindex() and
Mindex2Lindex(). The new functions are implemented in C for better
performance and they properly handle L-index values greater than
INT_MAX (2^31 - 1) in the input and output.
- 2x speedup to coercion from DelayedArray to SparseArraySeed or dgCMatrix.
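Opting back in to parallel evaluation, as described in the first item above, might look like the sketch below. It assumes BiocParallel is installed; SnowParam with 2 workers is an arbitrary choice, and any BiocParallelParam object should be usable instead.

```r
library(DelayedArray)

getAutoBPPARAM()   # expected to be NULL on a fresh session

# Opt in: block-processed operations that honor the automatic BPPARAM
# can use this backend from now on.
setAutoBPPARAM(BiocParallel::SnowParam(workers=2))
class(getAutoBPPARAM())

# Opt out again.
setAutoBPPARAM(NULL)
getAutoBPPARAM()
```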
DEPRECATED AND DEFUNCT
- arrayInd2() and linearInd() are now deprecated in favor of
Lindex2Mindex() and Mindex2Lindex().
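The mapping that Lindex2Mindex() and Mindex2Lindex() perform can be illustrated in base R. This is only a sketch of the column-major index arithmetic, not the package's C implementation; doing it in double precision is what allows values above INT_MAX.

```r
# Dimensions of a hypothetical 2 x 3 x 4 array.
dims <- c(2, 3, 4)

# M-index -> L-index: 1 + sum((m_k - 1) * stride_k), where stride_k is
# the product of the dimensions that precede dimension k.
strides <- cumprod(c(1, dims[-length(dims)]))
Mindex <- rbind(c(1, 1, 1), c(1, 3, 1), c(2, 3, 4))
Lindex <- as.vector(1 + (Mindex - 1) %*% strides)
Lindex

# L-index -> M-index: base::arrayInd() performs the reverse mapping.
arrayInd(Lindex, dims)
```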
BUG FIXES
- Fix handling of linear indices >= 2^31 in 1D-style single bracket
subsetting of DelayedArray objects.
- rowsum() & colsum() methods for DelayedArray objects now respect factor
level ordering (issue #59).
- Coercion from DelayedMatrix to dgCMatrix now propagates the dimnames.
- No more quotes around the NA values of a DelayedArray of type "character".
- Better error message when Ops methods for DelayedArray objects reject
their operands.
DelayedArray 0.12.0
NEW FEATURES
- Add isPristine()
- Delayed subassignment now accepts a right value with dimensions that are
not strictly the same as the dimensions of the selection as long as the
"effective dimensions" are the same
- Small improvement to delayed dimnames setter: atomic vectors or factors
in the supplied 'dimnames' list are now accepted and passed thru
as.character()
SIGNIFICANT USER-VISIBLE CHANGES
- Improve show() method for DelayedArray objects (see commit 54540856)
BUG FIXES
- Setting and getting the dimnames of a DelayedArray object or derivative
now preserves the names on the dimnames
- Some fixes related to DelayedArray objects with list array seeds (see
commit 6c94eac7)
DelayedArray 0.10.0
NEW FEATURES
- Many improvements to matrix multiplication (%*%) of DelayedMatrix
objects by Aaron Lun. Also add limited support for (t)crossprod methods.
- Add rowsum() and colsum() methods for DelayedMatrix objects.
These methods are block-processed operations.
- Many improvements to the RleArray() constructor (see messages for
commits 582234a7 and 0a36ee01 for more info).
- Add seedApply()
- Add multGrids() utility (still a work-in-progress, not documented yet)
DelayedArray 0.8.0
NEW FEATURES
- Add get/setAutoBlockSize(), getAutoBlockLength(),
get/setAutoBlockShape() and get/setAutoGridMaker().
- Add rowGrid() and colGrid(), in addition to blockGrid().
- Add get/setAutoBPPARAM() to control the automatic 'BPPARAM' used by
blockApply().
- Reduce memory usage when realizing a sparse DelayedArray to disk.
On-disk realization of a DelayedArray object that is reported to be sparse
(by is_sparse()) to a "sparsity-optimized" backend (i.e. to a backend with
a memory efficient write_sparse_block() like the TENxMatrix backend
implemented in the HDF5Array package) now preserves sparse representation of
the data all the way. More precisely, each block of data is now kept in
a sparse form during the 3 steps that it goes thru: read from seed,
realize in memory, and write to disk.
- showtree() now displays whether a tree node or leaf is considered sparse
or not.
- Enhance "aperm" method and dim() setter for DelayedArray objects. In
addition to allowing dropping "ineffective dimensions" (i.e. dimensions
equal to 1) from a DelayedArray object, aperm() and the dim() setter now
allow adding "ineffective dimensions" to it.
- Enhance subassignment to a DelayedArray object.
So far subassignment to a DelayedArray object only supported the **linear
form** (i.e. x[i] <- value) with strong restrictions (the subscript 'i'
must be a logical DelayedArray of the same dimensions as 'x', and 'value'
must be an ordinary vector of length 1).
In addition to this linear form, subassignment to a DelayedArray object
now supports the **multi-dimensional form** (e.g. x[3:1, , 6] <- 0). In
this form, one subscript per dimension is supplied, and each subscript
can be missing or be anything that multi-dimensional subassignment to
an ordinary array supports. The replacement value (a.k.a. the right
value) can be an array-like object (e.g. ordinary array, dgCMatrix object,
DelayedArray object, etc...) or an ordinary vector of length 1. Like the
linear form, the multi-dimensional form is also implemented as a delayed
operation.
- Re-implement internal helper simple_abind() in C and support long arrays.
simple_abind() is the workhorse behind realization of arbind() and
acbind() operations on DelayedArray objects.
- Add "table" and (restricted) "unique" methods for DelayedArray objects,
both block-processed.
- range() (block-processed) now supports the 'finite' argument on a
DelayedArray object.
- %*% (block-processed) now works between a DelayedMatrix object and an
ordinary vector.
- Improve support for DelayedArray of type "list".
- Add TENxMatrix to list of supported realization backends.
- Add backend-agnostic RealizationSink() constructor.
- Add linearInd() utility for turning array indices into linear indices.
Note that linearInd() performs the reverse transformation of
base::arrayInd().
- Add low-level utilities mapToGrid() and mapToRef() for mapping reference
array positions to grid positions and vice-versa.
- Add downsample() for reducing the "resolution" of an ArrayGrid object.
- Add maxlength() generic and methods for ArrayGrid objects.
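The two forms of subassignment described in the "Enhance subassignment" item above can be sketched on a small made-up array. Both are delayed operations, so the result is compared with the same assignments made on the ordinary array.

```r
library(DelayedArray)

a <- array(1:72, dim=c(3, 4, 6))
x <- DelayedArray(a)

# Multi-dimensional form: one subscript per dimension, some missing.
x[3:1, , 6] <- 0L

# Linear form: logical DelayedArray subscript with the same dimensions
# as 'x', and a replacement value of length 1.
x[x > 50L] <- -1L

# Same assignments on the ordinary array, for comparison.
a[3:1, , 6] <- 0L
a[a > 50L] <- -1L
all.equal(as.array(x), a)
```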
SIGNIFICANT USER-VISIBLE CHANGES
- Multi-dimensional subsetting is no longer delayed when drop=TRUE and the
result has only one dimension. In this case the result now is returned
as an **ordinary** vector (atomic or list). This is the only case of
multi-dimensional single bracket subsetting that is not delayed.
- Rename defaultGrid() -> blockGrid(). The 'max.block.length' argument
is replaced with the 'block.length' argument. 2 new arguments are
added: 'chunk.grid' and 'block.shape'.
- Major improvements to the block processing mechanism.
All block-processed operations (except realization by block) now support
blocks of **arbitrary** geometry instead of column-oriented blocks only.
'blockGrid(x)', which is called by the block-processed operations to get
the grid of blocks to use on 'x', has the following new features:
1) It's "chunk aware". This means that, when the chunk grid is known (i.e.
when 'chunkGrid(x)' is not NULL), 'blockGrid(x)' defines blocks that
are "compatible" with the chunks i.e. that any chunk is fully contained
in a block. In other words, blocks are chosen so that chunks don't
cross their boundaries.
2) When the chunk grid is unknown (i.e. when 'chunkGrid(x)' is NULL),
blocks are "isotropic", that is, they're as close as possible to a
hypercube instead of being "column-oriented" (column-oriented blocks,
also known as "linear blocks", are elongated along the 1st dimension,
then along the 2nd dimension, etc...)
3) The returned grid has the lowest "resolution" compatible with
'getAutoBlockSize()', that is, the blocks are made as big as possible
as long as their size in memory doesn't exceed 'getAutoBlockSize()'.
Note that this is not a new feature. What is new though is that an
exception now is made when the chunk grid is known and some chunks
are >= 'getAutoBlockSize()', in which case 'blockGrid(x)' returns a
grid that is the same as the chunk grid.
These new features are supposed to make the returned grid "optimal" for
block processing. (Some benchmarks still need to be done to
confirm/quantify this.)
- The automatic block size now is set to 100 Mb (instead of 4.5 Mb
previously) at package startup. Use setAutoBlockSize() to change the
automatic block size.
- No more 'BPREDO' argument to blockApply().
- Replace block_APPLY_and_COMBINE() with blockReduce().
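The grid selection described in the "Major improvements to the block processing mechanism" item can be explored with the sketch below. It uses defaultAutoGrid(), the name blockGrid() was later given (see the 0.16.0 and 0.22.0 entries), and assumes a recent version of the package. The seed is an in-memory matrix, so no chunk grid is known, and the cap of 10000 elements per block is an arbitrary choice.

```r
library(DelayedArray)

x <- DelayedArray(matrix(0, nrow=1000, ncol=500))

# Cap the blocks at 10000 array elements instead of deriving the cap
# from getAutoBlockSize().
grid <- defaultAutoGrid(x, block.length=10000)

dim(grid[[1L]])    # dimensions of the first block
length(grid)       # number of blocks in the grid
maxlength(grid)    # length of the largest block
```

Passing block.shape (one of the arguments added in this release) changes the geometry of the blocks; see ?defaultAutoGrid for the supported values.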
BUG FIXES
- No-op operations on a DelayedArray derivative really act like no-ops.
Operating on a DelayedArray derivative (e.g. RleArray, HDF5Array or
GDSArray) will now return an object of the original class if the result
is "pristine" (i.e. if it doesn't carry delayed operations) instead of
degrading the object to a DelayedArray instance. This applies for example
to 't(t(x))' or 'dimnames(x) <- dimnames(x)' etc...