Title: | High-performance sparse data representation and manipulation in R |
---|---|
Description: | The SparseArray package provides array-like containers for efficient in-memory representation of multidimensional sparse data in R (arrays and matrices). The package defines the SparseArray virtual class and two concrete subclasses: COO_SparseArray and SVT_SparseArray. Each subclass uses its own internal representation of the nonzero multidimensional data: the "COO layout" and the "SVT layout", respectively. SVT_SparseArray objects mimic as much as possible the behavior of ordinary matrix and array objects in base R. In particular, they support most of the "standard matrix and array API" defined in base R and in the matrixStats package from CRAN. |
Authors: | Hervé Pagès [aut, cre] , Vince Carey [fnd] , Rafael A. Irizarry [fnd] , Jacques Serizay [ctb] |
Maintainer: | Hervé Pagès <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.7.2 |
Built: | 2024-11-15 03:34:43 UTC |
Source: | https://github.com/bioc/SparseArray |
The COO_SparseArray class is a container for efficient in-memory representation of multidimensional sparse arrays. It uses the COO layout to represent the nonzero data internally.
A COO_SparseMatrix object is a COO_SparseArray object of 2 dimensions.
IMPORTANT NOTE: COO_SparseArray and COO_SparseMatrix objects are now superseded by the new and more efficient SVT_SparseArray and SVT_SparseMatrix objects.
## Constructor function:
COO_SparseArray(dim, nzcoo=NULL, nzdata=NULL, dimnames=NULL, check=TRUE)

## Getters (in addition to dim(), length(), and dimnames()):
nzcoo(x)
nzdata(x)
dim | The dimensions (supplied as an integer vector) of the COO_SparseArray or COO_SparseMatrix object to construct. |
nzcoo | A matrix containing the array coordinates of the nonzero elements. This must be an integer matrix of array coordinates like one returned by base::arrayInd or S4Arrays::Lindex2Mindex, with one column per dimension and one row per nonzero element. |
nzdata | A vector (atomic or list) of length nrow(nzcoo) containing the nonzero elements. |
dimnames | The dimnames of the object to construct. Must be NULL or a list with one element per dimension. |
check | Should the object be validated upon construction? |
x | A COO_SparseArray or COO_SparseMatrix object. |
For COO_SparseArray(): A COO_SparseArray or COO_SparseMatrix object.
For nzcoo(): A matrix with one column per dimension containing the array coordinates of the nonzero elements.
For nzdata(): A vector parallel to nzcoo(x) (i.e. with one element per row in nzcoo(x)) containing the nonzero elements.
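To make the COO layout concrete, here is a minimal base-R sketch of its three pieces (the dimensions, a coordinate matrix playing the role of nzcoo, and a parallel value vector playing the role of nzdata) and of the round trip to and from a dense array. The helpers dense_to_coo() and coo_to_dense() are hypothetical and are not part of SparseArray; they only model the layout described above, using base::which(arr.ind=TRUE), which relies on base::arrayInd, to produce the coordinates.

```r
## Hypothetical helpers (NOT part of SparseArray): a base-R model of the
## COO layout, i.e. the dimensions, an integer matrix 'nzcoo' with one
## row of array coordinates per nonzero element, and a vector 'nzdata'
## parallel to the rows of 'nzcoo'.
dense_to_coo <- function(a) {
  nzcoo <- which(a != 0, arr.ind=TRUE, useNames=FALSE)
  list(dim=dim(a), nzcoo=nzcoo, nzdata=a[nzcoo])
}

coo_to_dense <- function(coo) {
  ## All-zero array of the same type as 'nzdata' (vector() fills with
  ## 0L, 0, FALSE, etc. depending on the type).
  a <- array(vector(typeof(coo$nzdata), prod(coo$dim)), dim=coo$dim)
  a[coo$nzcoo] <- coo$nzdata  # matrix subscript: one row per element
  a
}

m <- matrix(c(0L, 3L, 0L, 0L, 0L, 7L), nrow=2)
coo <- dense_to_coo(m)
coo$nzcoo                        # rows (2, 1) and (2, 3)
coo$nzdata                       # 3 7
identical(coo_to_dense(coo), m)  # TRUE
```

The same helpers work unchanged for arrays with more than 2 dimensions, since matrix subscripts accept one column per dimension.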
The new SVT_SparseArray class for a replacement of the COO_SparseArray class.
The SparseArray class for the virtual parent class of COO_SparseArray and SVT_SparseArray.
S4 classes dgCMatrix and lgCMatrix defined in the Matrix package, for the de facto standard of sparse matrix representations in the R ecosystem.
base::arrayInd in the base package.
S4Arrays::Lindex2Mindex in the S4Arrays package for an improved (faster) version of base::arrayInd.
Ordinary array objects in base R.
library(SparseArray)

## ---------------------------------------------------------------------
## EXAMPLE 1
## ---------------------------------------------------------------------
dim1 <- 5:3
nzcoo1 <- Lindex2Mindex(sample(60, 8), 5:3)
nzdata1 <- 11.11 * seq_len(nrow(nzcoo1))
coo1 <- COO_SparseArray(dim1, nzcoo1, nzdata1)
coo1
nzcoo(coo1)
nzdata(coo1)
type(coo1)
sparsity(coo1)

as.array(coo1)  # back to a dense representation
#as.matrix(coo1)  # error!

## ---------------------------------------------------------------------
## EXAMPLE 2
## ---------------------------------------------------------------------
m2 <- matrix(c(5:-2, rep.int(c(0L, 99L), 11)), ncol=6)
coo2 <- as(m2, "COO_SparseArray")
class(coo2)
dim(coo2)
length(coo2)
nzcoo(coo2)
nzdata(coo2)
type(coo2)
sparsity(coo2)

stopifnot(identical(as.matrix(coo2), m2))

t(coo2)
stopifnot(identical(as.matrix(t(coo2)), t(as.matrix(coo2))))

## ---------------------------------------------------------------------
## COERCION FROM/TO dg[C|R]Matrix OR lg[C|R]Matrix OBJECTS
## ---------------------------------------------------------------------
## dg[C|R]Matrix and lg[C|R]Matrix objects are defined in the Matrix
## package.

## dgCMatrix/dgRMatrix:
M2C <- as(coo2, "dgCMatrix")
stopifnot(identical(M2C, as(m2, "dgCMatrix")))

coo2C <- as(M2C, "COO_SparseArray")
## 'coo2C' is the same as 'coo2' except that 'nzdata(coo2C)' has
## type "double" instead of "integer":
stopifnot(all.equal(coo2, coo2C))
typeof(nzdata(coo2C))  # double
typeof(nzdata(coo2))   # integer

M2R <- as(coo2, "dgRMatrix")
stopifnot(identical(M2R, as(m2, "dgRMatrix")))

coo2R <- as(M2R, "COO_SparseArray")
stopifnot(all.equal(as.matrix(coo2), as.matrix(coo2R)))

## lgCMatrix/lgRMatrix:
m3 <- m2 == 99  # logical matrix
coo3 <- as(m3, "COO_SparseArray")
class(coo3)
type(coo3)

M3C <- as(coo3, "lgCMatrix")
stopifnot(identical(M3C, as(m3, "lgCMatrix")))

coo3C <- as(M3C, "COO_SparseArray")
identical(as.matrix(coo3), as.matrix(coo3C))

M3R <- as(coo3, "lgRMatrix")
#stopifnot(identical(M3R, as(m3, "lgRMatrix")))
coo3R <- as(M3R, "COO_SparseArray")
identical(as.matrix(coo3), as.matrix(coo3R))

## ---------------------------------------------------------------------
## A BIG COO_SparseArray OBJECT
## ---------------------------------------------------------------------
nzcoo4 <- cbind(sample(25000, 600000, replace=TRUE),
                sample(195000, 600000, replace=TRUE))
nzdata4 <- runif(600000)
coo4 <- COO_SparseArray(c(25000, 195000), nzcoo4, nzdata4)
coo4
sparsity(coo4)
A set of functions for direct manipulation of the non-NA elements of an array-like object.
is_nonna(x)
nnacount(x)
nnawhich(x, arr.ind=FALSE)
nnavals(x)
nnavals(x) <- value
x | Typically (but not necessarily) an array-like object that is non-NA sparse, like an NaArray object. However, |
arr.ind | If FALSE (the default), the indices are returned as an L-index. If TRUE, they are returned as an M-index. |
value | A vector, typically of length nnacount(x). |
nnacount(x) and nnawhich(x) are equivalent to, but typically more efficient than, sum(is_nonna(x)) and which(is_nonna(x)), respectively.
nnavals(x) is equivalent to, but typically more efficient than, x[nnawhich(x)] (or x[is_nonna(x)]).
nnavals(x) <- value replaces the values of the non-NA array elements in x with the supplied values. It's equivalent to, but typically more efficient than, x[nnawhich(x)] <- value.
Note that nnavals(x) <- nnavals(x) is guaranteed to be a no-op.
is_nonna(): An array-like object of type() "logical" and same dimensions as the input object.
nnacount(): The number of non-NA array elements in x.
nnawhich(): The indices of the non-NA array elements in x, either as an L-index (if arr.ind is FALSE) or as an M-index (if arr.ind is TRUE). Note that the indices are returned sorted in strictly ascending order.
nnavals(): A vector of the same type() as x and containing the values of the non-NA array elements in x. Note that the returned vector is guaranteed to be parallel to nnawhich(x).
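The L-index and M-index forms carry the same information. As a hedged base-R sketch, base::arrayInd() converts an L-index into an M-index, and the hypothetical helper Mindex_to_Lindex() below (not a SparseArray function) converts back using column-major strides: for an element with coordinates (i_1, ..., i_n) in an array of dimensions (d_1, ..., d_n), the linear index is 1 + sum over k of (i_k - 1) * d_1 * ... * d_(k-1).

```r
## Hypothetical helper (NOT part of SparseArray): the inverse of
## base::arrayInd(). Returns a double vector so that it also works for
## arrays with more than .Machine$integer.max elements.
Mindex_to_Lindex <- function(Mindex, dim) {
  stopifnot(is.matrix(Mindex), ncol(Mindex) == length(dim))
  ## Column-major strides: 1, d1, d1*d2, ...
  strides <- cumprod(c(1, dim[-length(dim)]))
  drop(1 + (Mindex - 1) %*% strides)
}

a <- array(NA_integer_, dim=c(5, 12, 2))
a[c(3, 17, 64, 120)] <- 1:4

Lidx <- which(!is.na(a))          # L-index (sorted, ascending)
Midx <- arrayInd(Lidx, dim(a))    # M-index: one row per element
Midx[2, ]                         # 2 4 1
Mindex_to_Lindex(Midx, dim(a))    # 3 17 64 120
```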
is_nonzero for is_nonzero() and nz*() functions nzcount(), nzwhich(), etc...
NaArray objects.
Ordinary array objects in base R.
base::which in base R.
library(SparseArray)

a <- array(NA_integer_, dim=c(5, 12, 2))
a[sample(length(a), 20)] <- (-9):10
is_nonna(a)

## Get the number of non-NA array elements in 'a':
nnacount(a)

## nnawhich() returns the indices of the non-NA array elements in 'a'.
## Either as an "L-index" i.e. an integer (or numeric) vector of
## length 'nnacount(a)' containing "linear indices":
nnaidx <- nnawhich(a)
length(nnaidx)
head(nnaidx)

## Or as an "M-index" i.e. an integer matrix with 'nnacount(a)' rows
## and one column per dimension where the rows represent "array indices"
## (a.k.a. "array coordinates"):
Mnnaidx <- nnawhich(a, arr.ind=TRUE)
dim(Mnnaidx)

## Each row in the matrix is an n-tuple representing the "array
## coordinates" of a non-NA element in 'a':
head(Mnnaidx)
tail(Mnnaidx)

## Extract the values of the non-NA array elements in 'a' and return
## them in a vector "parallel" to 'nnawhich(a)':
a_nnavals <- nnavals(a)  # equivalent to 'a[nnawhich(a)]'
length(a_nnavals)
head(a_nnavals)

nnavals(a) <- 10 ^ nnavals(a)
a

## Sanity checks:
stopifnot(
  identical(nnaidx, which(!is.na(a))),
  identical(Mnnaidx, which(!is.na(a), arr.ind=TRUE, useNames=FALSE)),
  identical(nnavals(a), a[nnaidx]),
  identical(nnavals(a), a[Mnnaidx]),
  identical(`nnavals<-`(a, nnavals(a)), a)
)
A set of functions for direct manipulation of the nonzero elements of an array-like object.
is_nonzero(x)
nzcount(x)
nzwhich(x, arr.ind=FALSE)
nzvals(x)
nzvals(x) <- value
sparsity(x)
x | Typically (but not necessarily) an array-like object that is sparse, like a SparseArray derivative, or a dg[C|R]Matrix or lg[C|R]Matrix object from the Matrix package. However, |
arr.ind | If FALSE (the default), the indices are returned as an L-index. If TRUE, they are returned as an M-index. |
value | A vector, typically of length nzcount(x). |
nzcount(x) and nzwhich(x) are equivalent to, but typically more efficient than, sum(is_nonzero(x)) and which(is_nonzero(x)), respectively.
nzvals(x) is equivalent to, but typically more efficient than, x[nzwhich(x)] (or x[is_nonzero(x)]).
nzvals(x) <- value replaces the values of the nonzero array elements in x with the supplied values. It's equivalent to, but typically more efficient than, x[nzwhich(x)] <- value.
Note that nzvals(x) <- nzvals(x) is guaranteed to be a no-op.
is_nonzero(): An array-like object of type() "logical" and same dimensions as the input object.
nzcount(): The number of nonzero array elements in x.
nzwhich(): The indices of the nonzero array elements in x, either as an L-index (if arr.ind is FALSE) or as an M-index (if arr.ind is TRUE). Note that the indices are returned sorted in strictly ascending order.
nzvals(): A vector of the same type() as x and containing the values of the nonzero array elements in x. Note that the returned vector is guaranteed to be parallel to nzwhich(x).
sparsity(x): The ratio between the number of zero-valued elements in array-like object x and its total number of elements (length(x) or prod(dim(x))). More precisely, sparsity(x) is 1 - nzcount(x)/length(x).
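The equivalences stated above can be spelled out in base R for an ordinary array. The ref_*() functions below are hypothetical reference versions written only for illustration (they are not the SparseArray functions, which are typically more efficient), and they assume x contains no NA values.

```r
## Hypothetical base-R reference versions (NOT SparseArray functions)
## of the definitions above, for an ordinary array 'x' assumed to
## contain no NA values.
ref_nzcount <- function(x) sum(x != 0)
ref_nzwhich <- function(x, arr.ind=FALSE) {
  which(x != 0, arr.ind=arr.ind, useNames=FALSE)
}
ref_nzvals <- function(x) x[ref_nzwhich(x)]
ref_sparsity <- function(x) 1 - ref_nzcount(x) / length(x)

x <- array(c(0L, 4L, 0L, 0L, 9L, 0L, 0L, 1L), dim=c(2, 2, 2))
ref_nzcount(x)    # 3
ref_nzwhich(x)    # 2 5 8
ref_nzvals(x)     # 4 9 1
ref_sparsity(x)   # 0.625 (5 zeros out of 8 elements)

## Writing the nonzero values back in place leaves 'x' unchanged,
## mirroring the no-op guarantee of 'nzvals(x) <- nzvals(x)':
x2 <- x
x2[ref_nzwhich(x2)] <- ref_nzvals(x2)
identical(x2, x)  # TRUE
```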
is_nonna for is_nonna() and nna*() functions nnacount(), nnawhich(), etc...
SparseArray objects.
S4 classes dgCMatrix, lgCMatrix, and ngCMatrix defined in the Matrix package.
Ordinary array objects in base R.
base::which in base R.
library(SparseArray)

a <- array(rpois(120, lambda=0.3), dim=c(5, 12, 2))
is_nonzero(a)

## Get the number of nonzero array elements in 'a':
nzcount(a)

## nzwhich() returns the indices of the nonzero array elements in 'a'.
## Either as an "L-index" i.e. an integer (or numeric) vector of
## length 'nzcount(a)' containing "linear indices":
nzidx <- nzwhich(a)
length(nzidx)
head(nzidx)

## Or as an "M-index" i.e. an integer matrix with 'nzcount(a)' rows
## and one column per dimension where the rows represent "array indices"
## (a.k.a. "array coordinates"):
Mnzidx <- nzwhich(a, arr.ind=TRUE)
dim(Mnzidx)

## Each row in the matrix is an n-tuple representing the "array
## coordinates" of a nonzero element in 'a':
head(Mnzidx)
tail(Mnzidx)

## Extract the values of the nonzero array elements in 'a' and return
## them in a vector "parallel" to 'nzwhich(a)':
a_nzvals <- nzvals(a)  # equivalent to 'a[nzwhich(a)]'
length(a_nzvals)
head(a_nzvals)

nzvals(a) <- log1p(nzvals(a))
a

## Sanity checks:
stopifnot(
  identical(nzidx, which(a != 0)),
  identical(Mnzidx, which(a != 0, arr.ind=TRUE, useNames=FALSE)),
  identical(nzvals(a), a[nzidx]),
  identical(nzvals(a), a[Mnzidx]),
  identical(`nzvals<-`(a, nzvals(a)), a)
)
EXPERIMENTAL!!!
Like SVT_SparseArray objects but the background value is NA instead of zero.
## Constructor function:
NaArray(x, dim=NULL, dimnames=NULL, type=NA)
x | If If |
dim | |
dimnames | The dimnames of the object to construct. Must be NULL or a list with one element per dimension. |
type | A single string specifying the requested type of the object. By default the NaArray object returned by the constructor function will have the same type() as x. Note that doing naa <- NaArray(x, type=type) is equivalent to doing naa <- NaArray(x) followed by type(naa) <- type, but the former is more convenient and will generally be more efficient. The supported types for NaArray objects are |
NaArray is a concrete subclass of the Array virtual class. This makes NaArray objects Array derivatives.
Like with SVT_SparseArray objects, the non-NA data in an NaArray object is stored in a Sparse Vector Tree. See ?SVT_SparseArray for more information.
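For intuition only, here is a simplified, hypothetical model of a column-wise tree of sparse vectors for a 2D array whose background value is NA. It is written in base R and is NOT the internal representation actually used by SparseArray (see ?SVT_SparseArray for that): each column is either NULL (an all-NA column) or a leaf holding parallel 1-based offsets and values for its non-NA elements. The helper names to_na_tree() and from_na_tree() are made up for this sketch.

```r
## Hypothetical, simplified model (NOT SparseArray's internal layout):
## one list element per column, NULL for an all-NA column, otherwise a
## leaf with parallel 1-based 'offsets' and 'values' of non-NA elements.
to_na_tree <- function(m) {
  lapply(seq_len(ncol(m)), function(j) {
    col <- m[, j]
    offs <- which(!is.na(col))
    if (length(offs) == 0L) NULL else list(offsets=offs, values=col[offs])
  })
}

from_na_tree <- function(tree, nrow, type="double") {
  ## Start from an all-NA matrix of the requested type.
  m <- matrix(as.vector(NA, mode=type), nrow=nrow, ncol=length(tree))
  for (j in seq_along(tree)) {
    leaf <- tree[[j]]
    if (!is.null(leaf))
      m[leaf$offsets, j] <- leaf$values
  }
  m
}

m <- matrix(NA_integer_, nrow=4, ncol=3)
m[2, 1] <- 5L
m[c(1, 4), 3] <- c(-1L, 8L)

tree <- to_na_tree(m)
is.null(tree[[2]])                                    # TRUE: all-NA column
tree[[3]]$offsets                                     # 1 4
identical(from_na_tree(tree, nrow(m), typeof(m)), m)  # TRUE
```

Only the 3 non-NA values are stored; the 9 NA elements are implied by the background value.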
An NaArray or NaMatrix object.
The SVT_SparseArray class.
is_nonna for is_nonna() and nna*() functions nnacount(), nnawhich(), etc...
NaArray_aperm for permuting the dimensions of an NaArray object (e.g. transposition).
NaArray_subsetting for subsetting an NaArray object.
NaArray_abind for combining 2D or multidimensional NaArray objects.
NaArray_summarization for NaArray summarization methods.
NaArray_Arith, NaArray_Compare, and NaArray_Logic, for operations from the Arith, Compare, and Logic groups on NaArray objects.
NaArray_Math for operations from the Math and Math2 groups on NaArray objects.
NaArray_misc for miscellaneous operations on an NaArray object.
NaArray_matrixStats for col/row summarization methods for NaArray objects.
Ordinary array objects in base R.
library(SparseArray)

## ---------------------------------------------------------------------
## Display details of class definition & known subclasses
## ---------------------------------------------------------------------
showClass("NaArray")

## ---------------------------------------------------------------------
## The NaArray() constructor
## ---------------------------------------------------------------------
naa1 <- NaArray(dim=5:3)  # all-NA object
naa1

naa2 <- NaArray(dim=c(35000, 2e6), type="integer")  # all-NA object
naa2

## Add some non-NA values to 'naa2':
naa2[cbind( 1:99, 2:100)] <- 1L
naa2[cbind(1:100, 1:100)] <- 0L
naa2[cbind(2:100,  1:99)] <- -1L
naa2

## The dimnames can be specified at construction time, or
## added/modified later:
naa3 <- NaArray(c(NA, NA, 1L, NA, 0:7, rep(NA, 4), 12:14, NA),
                dim=4:5, dimnames=list(letters[1:4], LETTERS[1:5]))
naa3

colnames(naa3) <- LETTERS[22:26]
naa3

## Sanity checks:
stopifnot(
  is(naa1, "NaArray"),
  identical(dim(naa1), 5:3),
  identical(as.array(naa1), array(dim=5:3)),
  is(naa2, "NaMatrix"),
  all.equal(dim(naa2), c(35000, 2e6)),
  identical(nnacount(naa2), 298L),
  is(naa3, "NaMatrix"),
  identical(dim(naa3), 4:5),
  identical(nnacount(naa3), 12L)
)
EXPERIMENTAL!!!
Like ordinary matrices and arrays in base R, NaMatrix objects can be combined by rows or columns, with rbind() or cbind(), and multidimensional NaArray objects can be bound along any dimension with abind().
Note that arbind() can also be used to combine the objects along their first dimension, and acbind() can be used to combine them along their second dimension.
# COMING SOON...
EXPERIMENTAL!!!
Transpose an NaArray object by permuting its dimensions.
COMING SOON...
# COMING SOON...
EXPERIMENTAL!!!
NaArray objects support all operations from the Arith group.
See ?S4groupGeneric in the methods package for more information about the Arith group generic.
Note that NaArray objects of type() "complex" don't support Arith operations at the moment.
Three forms of 'Arith' operations involving NaArray objects are supported:
Between an NaArray object naa and a single value y:
    naa op y
    y op naa
Between two NaArray objects naa1 and naa2 of same dimensions (a.k.a. conformable arrays):
    naa1 op naa2
Between an NaArray object naa and an SVT_SparseArray object svt of same dimensions (a.k.a. conformable arrays):
    naa op svt
    svt op naa
An NaArray object of the same dimensions as the input object(s).
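Because the result is again an NaArray, it helps to see what happens to the NA background under each operator. The base-R snippet below is a hedged illustration, not the package's implementation: in base R, every Arith operator propagates NA for an ordinary operand, which is consistent with the result keeping an NA background. It also shows two documented base-R special cases where NA does not propagate (1 ^ y and y ^ 0 are always 1); this sketch does not show how the NaArray methods handle those edge cases.

```r
## Base R only: apply each Arith operator to the NA background.
y <- 5
bg <- c(NA + y, NA - y, NA * y, NA / y,    # +, -, *, /
        NA ^ y, NA %% y, NA %/% y)         # ^, %%, %/%
all(is.na(bg))   # TRUE: the background stays NA

## Two base-R special cases where NA does not propagate:
NA ^ 0           # 1
1 ^ NA           # 1
```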
S4groupGeneric in the methods package.
NaArray objects.
SVT_SparseArray objects.
Ordinary array objects in base R.
library(SparseArray)

nam1 <- NaArray(dim=c(15, 6), type="integer")
nam1[cbind(1:15, 2)] <- 100:114
nam1[cbind(1:15, 5)] <- -(114:100)
nam1

nam1 * -0.01
nam1 * 10   # result is of type "double"
nam1 * 10L  # result is of type "integer"
nam1 / 10L
nam1 ^ 3.5
nam1 %% 5L
nam1 %/% 5L

nam2 <- NaArray(dim=dim(nam1), type="double")
nam2[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- runif(26)
nam2

nam2 + nam1
nam2 - nam1
nam2 * nam1
nam2 / nam1
nam2 ^ nam1
nam2 %% nam1
nam2 %/% nam1

## Sanity checks:
m1 <- as.matrix(nam1)
m2 <- as.matrix(nam2)
stopifnot(
  identical(as.matrix(nam1 * -0.01), m1 * -0.01),
  identical(as.matrix(nam1 * 10), m1 * 10),
  identical(as.matrix(nam1 * 10L), m1 * 10L),
  identical(as.matrix(nam1 / 10L), m1 / 10L),
  identical(as.matrix(nam1 ^ 3.5), m1 ^ 3.5),
  identical(as.matrix(nam1 %% 5L), m1 %% 5L),
  identical(as.matrix(nam1 %/% 5L), m1 %/% 5L),
  identical(as.matrix(nam2 + nam1), m2 + m1),
  identical(as.matrix(nam2 - nam1), m2 - m1),
  identical(as.matrix(nam2 * nam1), m2 * m1),
  identical(as.matrix(nam2 / nam1), m2 / m1),
  all.equal(as.matrix(nam2 ^ nam1), m2 ^ m1),
  identical(as.matrix(nam2 %% nam1), m2 %% m1),
  identical(as.matrix(nam2 %/% nam1), m2 %/% m1)
)
EXPERIMENTAL!!!
NaArray objects support all operations from the Compare group.
See ?S4groupGeneric in the methods package for more information about the Compare group generic.
Three forms of 'Compare' operations involving NaArray objects are supported:
Between an NaArray object naa and a single value y:
    naa op y
    y op naa
Between two NaArray objects naa1 and naa2 of same dimensions (a.k.a. conformable arrays):
    naa1 op naa2
Between an NaArray object naa and an SVT_SparseArray object svt of same dimensions (a.k.a. conformable arrays):
    naa op svt
    svt op naa
An NaArray object of type() "logical" and same dimensions as the input object(s).
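A short base-R check, offered as a hedged illustration rather than a description of the package internals, of why a logical result with an NA background is a natural fit: comparing NA with any value yields a logical NA, for every operator of the Compare group.

```r
## Base R only: every Compare operator maps the NA background to a
## logical NA.
y <- 0.2
bg <- c(NA_real_ == y, NA_real_ != y, NA_real_ < y,
        NA_real_ > y,  NA_real_ <= y, NA_real_ >= y)
typeof(bg)       # "logical"
all(is.na(bg))   # TRUE

## Even NA == NA is NA; use is.na() to test for missingness.
NA == NA         # NA
```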
S4groupGeneric in the methods package.
NaArray objects.
SVT_SparseArray objects.
Ordinary array objects in base R.
library(SparseArray)

nam1 <- NaArray(dim=c(15, 6), type="double")
nam1[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- runif(26)
nam1

nam1 >= 0.2
nam1 != 0

nam2 <- NaArray(dim=dim(nam1), type="integer")
nam2[cbind(1:15, 2)] <- 100:114
nam2[cbind(1:15, 5)] <- -(114:100)
nam2

nam1 < nam2

## Sanity checks:
m1 <- as.matrix(nam1)
m2 <- as.matrix(nam2)
stopifnot(
  identical(as.matrix(nam1 >= 0.2), m1 >= 0.2),
  identical(as.matrix(nam1 != 0), m1 != 0),
  identical(as.matrix(nam1 < nam2), m1 < m2)
)
EXPERIMENTAL!!!
NaArray objects support operations from the Logic group (i.e. & and |), as well as logical negation (!).
See ?S4groupGeneric in the methods package for more information about the Logic group generic.
Note that in base R, Logic operations support input of type() "logical", "integer", "double", or "complex". However, the corresponding methods for NaArray objects only support objects of type() "logical" for now.
Three forms of 'Logic' operations involving NaArray objects are supported:
Between an NaArray object naa and a single logical value y:
    naa op y
    y op naa
Between two NaArray objects naa1 and naa2 of same dimensions (a.k.a. conformable arrays):
    naa1 op naa2
Between an NaArray object naa and an SVT_SparseArray object svt of same dimensions (a.k.a. conformable arrays):
    naa op svt
    svt op naa
Note that, in this case, | returns an NaArray object but & returns an SVT_SparseArray object.
An NaArray object of type() "logical" and same dimensions as the input object(s), except when & is used between an NaArray object and an SVT_SparseArray object, in which case an SVT_SparseArray object is returned.
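The different return classes of & and | are consistent with base R's three-valued logic applied to the two background values: the NA background of an NaArray meets the FALSE (zero) background of a logical SVT_SparseArray. The base-R snippet below illustrates this; reading it as the reason for the return classes is an interpretation, not a statement from the package documentation. The same truth tables explain the comments in the example below (TRUE & NA is NA, FALSE | NA is NA).

```r
## Base R three-valued logic on the two background values:
NA & FALSE    # FALSE: a FALSE (zero) background, i.e. a sparse result
NA | FALSE    # NA: an NA background, i.e. an NaArray result

## Full truth tables (rows and columns in the order TRUE, FALSE, NA):
v <- c(TRUE, FALSE, NA)
outer(v, v, `&`)
outer(v, v, `|`)
```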
S4groupGeneric in the methods package.
NaArray objects.
SVT_SparseArray objects.
Ordinary array objects in base R.
library(SparseArray)

nam1 <- NaArray(dim=c(15, 6))
nam1[cbind(1:15, 2)] <- c(TRUE, FALSE, NA)
nam1[cbind(1:15, 5)] <- c(TRUE, NA, NA, FALSE, TRUE)
nam1

!nam1
nam1 & NA  # replaces all TRUE's with NA's
nam1 | NA  # replaces all FALSE's with NA's

nam2 <- NaArray(dim=dim(nam1))
nam2[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- c(TRUE, NA)
nam2

nam1 & nam2
nam1 | nam2

## Sanity checks:
m1 <- as.matrix(nam1)
m2 <- as.matrix(nam2)
stopifnot(
  identical(as.matrix(!nam1), !m1),
  identical(as.matrix(nam1 & NA), m1 & NA),
  identical(as.matrix(nam1 | NA), m1 | NA),
  identical(as.matrix(nam1 & nam2), m1 & m2),
  identical(as.matrix(nam1 | nam2), m1 | m2)
)
EXPERIMENTAL!!!
NaArray objects support all operations from the Math and Math2 groups with a few exceptions.
See ?S4groupGeneric in the methods package for more information about the Math and Math2 group generics.
Note that Math and Math2 operations only support NaArray objects of type() "double" at the moment. NaArray objects of type() "integer" are not supported yet.
An NaArray derivative of the same dimensions as the input object.
S4groupGeneric in the methods package.
NaArray objects.
Ordinary array objects in base R.
library(SparseArray)

nam <- NaArray(dim=c(15, 6))
nam[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- c(runif(22)*1e4, Inf, -Inf, NA, NaN)

log(nam)
exp(nam)
cos(nam)
lgamma(nam)

## Sanity checks:
m <- as.matrix(nam)
stopifnot(
  suppressWarnings(identical(as.matrix(log(nam)), log(m))),
  identical(as.matrix(exp(nam)), exp(m)),
  suppressWarnings(identical(as.matrix(cos(nam)), cos(m))),
  suppressWarnings(identical(as.matrix(lgamma(nam)), lgamma(m)))
)
EXPERIMENTAL!!!
The SparseArray package provides memory-efficient col/row summarization methods (a.k.a. matrixStats methods) for NaArray objects, like colSums(), rowSums(), colMeans(), rowMeans(), etc...
Note that these are S4 generic functions defined in the MatrixGenerics package, with methods for ordinary matrices defined in the matrixStats package. This man page documents the methods defined for NaArray objects.
IMPORTANT NOTE: This is WORK-IN-PROGRESS! All the col*() methods listed below are supported. However, among the row*() methods, only rowAnyNAs(), rowMins(), rowMaxs(), rowRanges(), rowSums(), and rowSums2() are supported on NaArray objects at the moment.
## N.B.: Showing ONLY the col*() methods (usage of row*() methods is
## the same):

## S4 method for signature 'NaArray'
colAnyNAs(x, rows=NULL, cols=NULL, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colAnys(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colAlls(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colMins(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colMaxs(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colRanges(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colSums(x, na.rm=FALSE, dims=1)

## S4 method for signature 'NaArray'
colProds(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colMeans(x, na.rm=FALSE, dims=1)

## S4 method for signature 'NaArray'
colSums2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colMeans2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colVars(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1, ..., useNames=NA)

## S4 method for signature 'NaArray'
colSds(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1, ..., useNames=NA)
x |
|
rows , cols , ...
|
Not supported. |
na.rm , useNames , center
|
See man pages of the corresponding generics in the MatrixGenerics
package (e.g. Note that, unlike the methods for ordinary matrices defined in
the matrixStats package, the |
dims |
See |
These methods are typically used with na.rm=TRUE when called on an NaMatrix or NaArray object.
All these methods operate natively on the NaArray internal representation, for maximum efficiency.
Note that more col/row summarization methods might be added in the future.
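Because the background value of an NaArray is NA rather than zero, a column sum computed with the default na.rm=FALSE is NA for any column that contains even one background element, which is why na.rm=TRUE is the usual choice. The base-R snippet below uses an ordinary matrix as a dense stand-in for a small NaMatrix; that the NaArray methods return the same values is an assumption based on the sanity checks used throughout this manual.

```r
# Dense stand-in for a 4 x 3 NaMatrix: NA is the background value
m <- matrix(NA_real_, nrow = 4, ncol = 3)
m[c(1, 6, 7)] <- c(10, 20, 30)   # the only explicit (non-NA) values

colSums(m)                # every column contains an NA, so every sum is NA
colSums(m, na.rm = TRUE)  # sums of the explicit values; an all-NA column gives 0
```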
See man pages of the corresponding generics in the MatrixGenerics package (e.g. ?MatrixGenerics::colRanges) for the value returned by these methods.
Most col*() methods for NaArray objects are multithreaded. See set_SparseArray_nthread for how to control the number of threads.
NaArray objects.
The man pages of the various generic functions defined in the MatrixGenerics package, e.g. MatrixGenerics::colVars, etc.
# COMING SOON...
This man page documents various base array operations that are supported by NaArray derivatives, and that didn't belong to any of the groups of operations documented in the other man pages of the SparseArray package.
# --- unary isometric array transformations ---

## S4 method for signature 'NaArray'
is.nan(x)

## S4 method for signature 'NaArray'
is.infinite(x)

# --- N-ary isometric array transformations ---

# COMING SOON...
x |
An NaArray object. |
More operations will be added in the future.
is.nan() and is.infinite() return a SparseArray object of type() "logical" with the same dimensions as the input object.
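One way to see why a SparseArray (rather than an NaArray) is the natural result type: both is.nan() and is.infinite() return FALSE on NA, so the NA background of the input maps to FALSE, the zero of type "logical", and only the NaN or infinite elements become nonzero. This is easy to check in base R on an ordinary array holding the same values as the example below:

```r
# Same values as in the example below; NA plays the role of the background
a <- array(c(NA, 2.77, NaN, Inf, NA, -Inf), dim = 5:3)

nan <- is.nan(a)        # TRUE only where NaN is stored; NA gives FALSE
inf <- is.infinite(a)   # TRUE only at Inf and -Inf; NA gives FALSE

sum(nan)                # 10 of the 60 elements
sum(inf)                # 20 of the 60 elements
```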
base::is.nan and base::is.infinite in base R.
NaArray objects.
SparseArray objects.
Ordinary array objects in base R.
a <- array(c(NA, 2.77, NaN, Inf, NA, -Inf), dim=5:3)
naa <- NaArray(a)  # NaArray object
naa

is.nan(naa)       # SparseArray object of type "logical"
is.infinite(naa)  # SparseArray object of type "logical"

## Sanity checks:
res <- is.nan(naa)
stopifnot(is(res, "SparseArray"), type(res) == "logical",
          identical(as.array(res), is.nan(a)))
res <- is.infinite(naa)
stopifnot(is(res, "SparseArray"), type(res) == "logical",
          identical(as.array(res), is.infinite(a)))
EXPERIMENTAL!!!
Like ordinary arrays in base R, NaArray objects support subsetting via the single bracket operator ([).
drop in base R to drop the ineffective dimensions of an array or array-like object.
Lindex2Mindex in the S4Arrays package for how to convert an L-index to an M-index and vice-versa.
NaArray objects.
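For ordinary arrays, base R's arrayInd() performs the L-index to M-index conversion, and the reverse is a short column-major stride computation. The snippet below reproduces the L-index c(60, 24, 21, 56) and the matrix subscript m used in the example below (dimensions 5 x 4 x 3), which is why the example's last sanity check finds both subscripts select the same elements. It is a base-R illustration only, not the S4Arrays Lindex2Mindex() implementation.

```r
dim_a <- c(5L, 4L, 3L)        # dimensions of 'naa' in the example below
Lidx  <- c(60, 24, 21, 56)    # L-index: linear (column-major) positions

## L-index -> M-index: one row of array coordinates per element
Midx <- arrayInd(Lidx, dim_a)
Midx                          # rows (5,4,3), (4,1,2), (1,1,2), (1,4,3)

## M-index -> L-index: 1 + sum over dimensions of (coordinate - 1) * stride,
## where the column-major strides are 1, 5, and 5*4
strides <- cumprod(c(1, head(dim_a, -1)))
Lidx2 <- as.vector(1 + (Midx - 1) %*% strides)
```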
naa <- NaArray(dim=5:3)
naa[c(1:2, 8, 10, 15:17, 20, 24, 40, 56:60)] <- (1:15)*10L

## ---------------------------------------------------------------------
## N-dimensional subsetting
## ---------------------------------------------------------------------

naa[5:3, c(4,2,4), 2:3]
naa[ , c(4,2,4), 2:3]
naa[ , c(4,2,4), -1]
naa[ , c(4,2,4), 1]

naa2 <- naa[ , c(4,2,4), 1, drop=FALSE]
naa2

## Ineffective dimensions can always be dropped as a separate step:
drop(naa2)

naa[ , c(4,2,4), integer(0)]

dimnames(naa) <- list(letters[1:5], NULL, LETTERS[1:3])

naa[c("d", "a"), c(4,2,4), "C"]

naa2 <- naa["e", c(4,2,4), , drop=FALSE]
naa2
drop(naa2)

## ---------------------------------------------------------------------
## 1D-style subsetting (a.k.a. linear subsetting)
## ---------------------------------------------------------------------

## Using a numeric vector (L-index):
naa[c(60, 24, 21, 56)]

## Using a matrix subscript (M-index):
m <- rbind(c(5, 4, 3),
           c(4, 1, 2),
           c(1, 1, 2),
           c(1, 4, 3))
naa[m]

## See '?Lindex2Mindex' in the S4Arrays package for how to convert an
## L-index to an M-index and vice-versa.

## ---------------------------------------------------------------------
## Sanity checks
## ---------------------------------------------------------------------

a <- as.array(naa)

naa2 <- naa[5:3, c(4,2,4), 2:3]
a2 <- a [5:3, c(4,2,4), 2:3]
stopifnot(identical(as.array(naa2), a2), identical(naa2, NaArray(a2)))

naa2 <- naa[ , c(4,2,4), 2:3]
a2 <- a [ , c(4,2,4), 2:3]
stopifnot(identical(as.array(naa2), a2), identical(naa2, NaArray(a2)))

naa2 <- naa[ , c(4,2,4), -1]
a2 <- a [ , c(4,2,4), -1]
stopifnot(identical(as.array(naa2), a2), identical(naa2, NaArray(a2)))

naa2 <- naa[ , c(4,2,4), 1]
a2 <- a [ , c(4,2,4), 1]
stopifnot(identical(as.array(naa2), a2), identical(naa2, NaArray(a2)))

naa2 <- naa[ , c(4,2,4), 1, drop=FALSE]
a2 <- a [ , c(4,2,4), 1, drop=FALSE]
stopifnot(identical(as.array(naa2), a2), identical(naa2, NaArray(a2)))

naa2 <- drop(naa2)
a2 <- drop(a2)
stopifnot(identical(as.array(naa2), a2), identical(naa2, NaArray(a2)))

naa2 <- naa[ , c(4,2,4), integer(0)]
a2 <- a [ , c(4,2,4), integer(0)]
stopifnot(identical(as.array(naa2), a2),
          identical(unname(naa2), unname(NaArray(a2))))

naa2 <- naa[c("d", "a"), c(4,2,4), "C"]
a2 <- a [c("d", "a"), c(4,2,4), "C"]
stopifnot(identical(as.array(naa2), a2), identical(naa2, NaArray(a2)))

naa2 <- naa["e", c(4,2,4), , drop=FALSE]
a2 <- a ["e", c(4,2,4), , drop=FALSE]
stopifnot(identical(as.array(naa2), a2), identical(naa2, NaArray(a2)))

naa2 <- drop(naa2)
a2 <- drop(a2)
stopifnot(identical(as.array(naa2), a2), identical(naa2, NaArray(a2)))

stopifnot(identical(naa[c(60, 24, 21, 56)], naa[m]))
EXPERIMENTAL!!!
The SparseArray package provides memory-efficient summarization methods for NaArray objects. The following methods are supported at the moment: anyNA(), any(), all(), min(), max(), range(), sum(), prod(), mean(), var(), sd(). More might be added in the future.
Note that these are S4 generic functions defined in base R and in the BiocGenerics package, with default methods defined in base R. This man page documents the methods defined for NaArray objects.
All these methods operate natively on the NaArray representation, for maximum efficiency.
See man pages of the corresponding default methods in the base package (e.g. ?base::range, ?base::mean, etc.) for the value returned by these methods.
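With na.rm=TRUE, summarizing an array whose background is NA only involves its explicit (non-NA) values, so a method working on the NaArray representation does not need to visit the background at all. A dense, base-R stand-in for the NaArray built in the example below makes this concrete; how the package iterates internally is not shown here.

```r
# Dense stand-in for the NaArray used in the example below (NA background)
a <- array(NA_integer_, dim = c(4, 5, 2))
a[c(1:2, 8, 10, 15:17, 24:26, 28, 40)] <- (1:12) * 10L

explicit <- a[!is.na(a)]   # the only values a summarization has to look at
sum(a)                     # NA: the background propagates when na.rm=FALSE
sum(a, na.rm = TRUE)       # same as sum(explicit)
range(a, na.rm = TRUE)     # same as range(explicit)
```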
NaArray objects.
The man pages of the various default methods defined in the base package, e.g. base::range, base::mean, base::anyNA, etc.
naa <- NaArray(dim=c(4, 5, 2))
naa[c(1:2, 8, 10, 15:17, 24:26, 28, 40)] <- (1:12)*10L
naa

anyNA(naa)
range(naa, na.rm=TRUE)
sum(naa, na.rm=TRUE)
sd(naa, na.rm=TRUE)

## Sanity checks:
a0 <- as.array(naa)
stopifnot(
  identical(anyNA(naa), anyNA(a0)),
  identical(range(naa), range(a0)),
  identical(range(naa, na.rm=TRUE), range(a0, na.rm=TRUE)),
  identical(sum(naa), sum(a0)),
  identical(sum(naa, na.rm=TRUE), sum(a0, na.rm=TRUE)),
  all.equal(sd(naa, na.rm=TRUE), sd(a0, na.rm=TRUE))
)
randomSparseArray() and poissonSparseArray() can be used to generate a random SparseArray object efficiently.
randomSparseArray(dim, density=0.05, dimnames=NULL)
poissonSparseArray(dim, lambda=-log(0.95), density=NA, dimnames=NULL)

## Convenience wrappers for the 2D case:
randomSparseMatrix(nrow, ncol, density=0.05, dimnames=NULL)
poissonSparseMatrix(nrow, ncol, lambda=-log(0.95), density=NA,
                    dimnames=NULL)
dim |
The dimensions (specified as an integer vector) of the SparseArray object to generate. |
density |
The desired density (specified as a number >= 0 and <= 1) of the
SparseArray object to generate, that is, the ratio between its
number of nonzero elements and its total number of elements.
This is Note that for |
dimnames |
The dimnames to put on the object to generate. Must be |
lambda |
The mean of the Poisson distribution. Passed internally to the calls
to Only one of When Finally note that the default value for |
nrow , ncol
|
Number of rows and columns of the SparseMatrix object to generate. |
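The default lambda=-log(0.95) shown in the usage above ties the Poisson mean to a density: for a Poisson variable X, P(X > 0) = 1 - exp(-lambda), so this default corresponds to an expected fraction of nonzero elements of 5%, presumably chosen to match the density=0.05 default of randomSparseArray(). Conversely, the lambda giving a target density d is -log(1 - d). A base-R check of that arithmetic:

```r
lambda <- -log(0.95)                # default 'lambda' of poissonSparseArray()
p_nonzero <- 1 - dpois(0, lambda)   # P(X > 0) = 1 - exp(-lambda)
p_nonzero                           # 0.05, up to floating-point rounding

## The lambda that yields a target density d:
d <- 0.1
lambda_d <- -log(1 - d)

## Empirical check on ordinary (dense) Poisson draws:
set.seed(123)
x <- rpois(1e6, lambda)
mean(x != 0)                        # close to 0.05
```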
randomSparseArray() mimics the rsparsematrix() function from the Matrix package but returns a SparseArray object instead of a dgCMatrix object.
poissonSparseArray() populates a SparseArray object with Poisson data, i.e. it's equivalent to:
a <- array(rpois(prod(dim), lambda), dim)
as(a, "SparseArray")
but is faster and more memory efficient because the intermediate dense array a is never generated.
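How poissonSparseArray() avoids the dense intermediate is not described here. One possible approach, sketched below in base R as an assumption (not necessarily what the package does), relies on each cell being nonzero independently with probability 1 - exp(-lambda): draw the number of nonzero cells from a binomial distribution, place them uniformly at random, then draw their values from the zero-truncated Poisson distribution. This yields the same distribution as the dense recipe above, though not the same random numbers for a given seed. The helper sparse_rpois_sketch() is hypothetical.

```r
## Hypothetical helper (not part of SparseArray): draws the nonzero part of
## an array of iid Poisson(lambda) values without building the array.
sparse_rpois_sketch <- function(dim, lambda) {
  n <- prod(dim)
  p0 <- exp(-lambda)                          # P(X == 0)
  k <- rbinom(1L, size = n, prob = 1 - p0)    # number of nonzero cells
  nzidx <- sort(sample.int(n, k))             # their linear indices (L-index)
  u <- runif(k, min = p0, max = 1)            # uniform over the X >= 1 part of the CDF
  nzvals <- pmax(1L, as.integer(qpois(u, lambda)))  # zero-truncated Poisson draws;
                                              # pmax() guards against rounding at the boundary
  list(dim = dim, nzidx = nzidx, nzvals = nzvals)
}

set.seed(42)
s <- sparse_rpois_sketch(c(100, 200, 5), lambda = 0.01)
length(s$nzidx) / prod(s$dim)   # close to 1 - exp(-0.01), about 0.00995

## Dense reconstruction, reasonable for small dimensions only:
a <- array(0L, dim = s$dim)
a[s$nzidx] <- s$nzvals
```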
A SparseArray derivative (of class SVT_SparseArray or SVT_SparseMatrix) with the requested dimensions and density.
The type of the returned object is "double" for randomSparseArray() and randomSparseMatrix(), and "integer" for poissonSparseArray() and poissonSparseMatrix().
Unlike with Matrix::rsparsematrix(), there's no limit on the number of nonzero elements that can be contained in the returned SparseArray object. For example, Matrix::rsparsematrix(3e5, 2e4, density=0.5) will fail with an error, but randomSparseMatrix(3e5, 2e4, density=0.5) should work (even though it will take some time and the memory footprint of the resulting object will be about 18 Gb).
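Some arithmetic behind the example above: a dgCMatrix stores cumulative nonzero counts in its integer slot @p, so it cannot hold more than .Machine$integer.max (2^31 - 1) nonzero values, while the requested matrix would need about 3e9 of them. This is presumably what makes rsparsematrix() fail. Base R only:

```r
nnz <- 3e5 * 2e4 * 0.5              # expected number of nonzero values: 3e9
.Machine$integer.max                # 2147483647, i.e. 2^31 - 1
nnz > .Machine$integer.max          # TRUE: too many for a dgCMatrix
```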
The Matrix::rsparsematrix function in the Matrix package.
The stats::rpois function in the stats package.
SVT_SparseArray objects.
## ---------------------------------------------------------------------
## randomSparseArray() / randomSparseMatrix()
## ---------------------------------------------------------------------

set.seed(123)
dgcm1 <- rsparsematrix(2500, 950, density=0.1)
set.seed(123)
svt1 <- randomSparseMatrix(2500, 950, density=0.1)
svt1
type(svt1)  # "double"

stopifnot(identical(as(svt1, "dgCMatrix"), dgcm1))

## ---------------------------------------------------------------------
## poissonSparseArray() / poissonSparseMatrix()
## ---------------------------------------------------------------------

svt2 <- poissonSparseMatrix(2500, 950, density=0.1)
svt2
type(svt2)          # "integer"
1 - sparsity(svt2)  # very close to the requested density

set.seed(123)
svt3 <- poissonSparseArray(c(600, 1700, 80), lambda=0.01)
set.seed(123)
a3 <- array(rpois(length(svt3), lambda=0.01), dim(svt3))
stopifnot(identical(svt3, SparseArray(a3)))

## The memory footprint of 'svt3' is 10x smaller than that of 'a3':
object.size(svt3)
object.size(a3)
as.double(object.size(a3) / object.size(svt3))
Read/write a sparse matrix from/to a CSV (comma-separated values) file.
writeSparseCSV(x, filepath, sep=",", transpose=FALSE,
               write.zeros=FALSE, chunknrow=250)

readSparseCSV(filepath, sep=",", transpose=FALSE)
x |
A matrix-like object, typically sparse. IMPORTANT: The object must have rownames and colnames! These will be written to the file. Another requirement is that the object must be subsettable. More
precisely: it must support 2D-style subsetting of the kind |
filepath |
The path (as a single string) to the file where to write the matrix-like object or to read it from. Compressed files are supported. If Note that |
sep |
The field separator character. Values on each line of the file are separated by this character. |
transpose |
Note that using |
write.zeros |
|
chunknrow |
|
writeSparseCSV returns an invisible NULL.
readSparseCSV returns a SparseMatrix object of class SVT_SparseMatrix.
SparseArray objects.
dgCMatrix objects implemented in the Matrix package.
## ---------------------------------------------------------------------
## writeSparseCSV()
## ---------------------------------------------------------------------

## Prepare toy matrix 'm0':
rownames0 <- LETTERS[1:6]
colnames0 <- letters[1:4]
m0 <- matrix(0L, nrow=length(rownames0), ncol=length(colnames0),
             dimnames=list(rownames0, colnames0))
m0[c(1:2, 8, 10, 15:17, 24)] <- (1:8)*10L
m0

## writeSparseCSV():
writeSparseCSV(m0, filepath="", sep="\t")
writeSparseCSV(m0, filepath="", sep="\t", write.zeros=TRUE)
writeSparseCSV(m0, filepath="", sep="\t", transpose=TRUE)

## Note that writeSparseCSV() will automatically (and silently) coerce
## non-integer values to integer by passing them through as.integer().

## Example where type(x) is "double":
m1 <- m0 * runif(length(m0))
m1
type(m1)
writeSparseCSV(m1, filepath="", sep="\t")

## Example where type(x) is "logical":
writeSparseCSV(m0 != 0, filepath="", sep="\t")

## Example where type(x) is "raw":
m2 <- m0
type(m2) <- "raw"
m2
writeSparseCSV(m2, filepath="", sep="\t")

## ---------------------------------------------------------------------
## readSparseCSV()
## ---------------------------------------------------------------------

csv_file <- tempfile()
writeSparseCSV(m0, csv_file)

svt1 <- readSparseCSV(csv_file)
svt1

svt2 <- readSparseCSV(csv_file, transpose=TRUE)
svt2

## If you need the sparse data as a dgCMatrix object, just coerce the
## returned object:
as(svt1, "dgCMatrix")
as(svt2, "dgCMatrix")

## Sanity checks:
stopifnot(identical(m0, as.matrix(svt1)))
stopifnot(identical(t(m0), as.matrix(svt2)))
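Since writeSparseCSV() passes non-integer values through as.integer() (see the note in the example above), it helps to recall what that coercion does in base R: doubles are truncated toward zero rather than rounded, so a small nonzero double such as 0.4 becomes 0L; logical values map to 0L/1L, and raw values to their integer codes.

```r
x <- c(12.7, -3.2, 0.4, 0)
as.integer(x)                       # 12 -3 0 0: truncation toward zero, not rounding

as.integer(c(TRUE, FALSE))          # 1 0
as.integer(as.raw(c(10, 255)))      # 10 255
```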
The SparseArray package provides memory-efficient rowsum() and colsum() methods for SparseMatrix and dgCMatrix objects.
## S4 method for signature 'SparseMatrix'
rowsum(x, group, reorder=TRUE, ...)

## S4 method for signature 'dgCMatrix'
rowsum(x, group, reorder=TRUE, ...)

## S4 method for signature 'SparseMatrix'
colsum(x, group, reorder=TRUE, ...)

## S4 method for signature 'dgCMatrix'
colsum(x, group, reorder=TRUE, ...)
x |
A SparseMatrix or dgCMatrix object. |
group , reorder
|
See |
... |
Like the default S3 |
An ordinary matrix, like the default rowsum() method. See ?base::rowsum for how the matrix returned by the default rowsum() method is obtained.
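What rowsum() computes, shown on an ordinary matrix with base R: rows sharing a group label are summed, and with reorder=TRUE the result has one row per group in sort(unique(group)) order. The colsum() generic from S4Arrays performs the same grouping on columns; for an ordinary matrix its result can be written as t(rowsum(t(x), group)), shown here as an equivalence rather than as the package's implementation.

```r
m <- matrix(1:6, nrow = 3)          # rows: (1, 4), (2, 5), (3, 6)
group <- c("b", "a", "b")

rs <- rowsum(m, group)              # one row per group, sorted because reorder=TRUE
rs                                  # a: (2, 5); b: (1+3, 4+6) = (4, 10)

## colsum()-style grouping of columns, written as a transposed rowsum():
cs <- t(rowsum(t(m), c("x", "x")))  # both columns in group "x"
cs                                  # per-row sums: 5, 7, 9
```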
rowsum in base R.
S4Arrays::rowsum in the S4Arrays package for the rowsum() and colsum() S4 generic functions.
SparseMatrix objects.
dgCMatrix objects implemented in the Matrix package.
svt0 <- randomSparseMatrix(7e5, 100, density=0.15)
dgcm0 <- as(svt0, "dgCMatrix")
m0 <- as.matrix(svt0)

group <- sample(10, nrow(m0), replace=TRUE)

## Calling rowsum() on the sparse representations is usually faster
## than on the dense representation:
rs1 <- rowsum(m0, group)
rs2 <- rowsum(svt0, group)   # about 3x faster
rs3 <- rowsum(dgcm0, group)  # also about 3x faster

## Sanity checks:
stopifnot(identical(rs1, rs2))
stopifnot(identical(rs1, rs3))
The SparseArray package defines the SparseArray virtual class whose purpose is to be extended by other S4 classes that aim at representing in-memory multidimensional sparse arrays.
It currently has two concrete subclasses, COO_SparseArray and SVT_SparseArray, both also defined in this package. Each subclass uses its own internal representation for the nonzero multidimensional data: the COO layout for COO_SparseArray, and the SVT layout for SVT_SparseArray. The two layouts are described in the COO_SparseArray and SVT_SparseArray man pages, respectively.
Finally, the package also defines the SparseMatrix virtual class, as a subclass of the SparseArray class, for the specific 2D case.
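To give an idea of what the COO layout stores: roughly, an integer matrix of array coordinates with one row per nonzero element and one column per dimension, plus a parallel vector of the nonzero values, mirroring the nzcoo and nzdata arguments of COO_SparseArray(). The base-R sketch below derives such a pair from an ordinary array and rebuilds the array from it; it illustrates the idea only and does not construct a COO_SparseArray object.

```r
## A small 4 x 3 x 2 integer array with 4 nonzero values:
a <- array(0L, dim = c(4, 3, 2))
a[c(2, 7, 11, 20)] <- c(5L, -1L, 8L, 3L)

## One row of array coordinates per nonzero element ...
nzcoo <- which(a != 0, arr.ind = TRUE, useNames = FALSE)
## ... and the nonzero values, in the same order as the rows of 'nzcoo':
nzdata <- a[nzcoo]

nzcoo
nzdata

## Going back to the dense form only needs 'dim', 'nzcoo', and 'nzdata':
b <- array(0L, dim = dim(a))
b[nzcoo] <- nzdata
```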
## Constructor function:
SparseArray(x, type=NA)
x |
An ordinary matrix or array, or a dg[C|R]Matrix object, or an lg[C|R]Matrix object, or any matrix-like or array-like object that supports coercion to SVT_SparseArray. |
type |
A single string specifying the requested type of the object. By default, the SparseArray object returned by the constructor
function has the same sa <- SparseArray(x, type=type) is equivalent to doing: sa <- SparseArray(x) type(sa) <- type but the former is more convenient and will generally be more efficient. Supported types are all R atomic types plus |
The SparseArray class extends the Array virtual class defined in the S4Arrays package. Here is the full SparseArray sub-hierarchy as defined in the SparseArray package (virtual classes are marked with an asterisk):
Each class extends the class it is indented under:

    Array*                          Array class hierarchy (S4Arrays)
      SparseArray*                  SparseArray sub-hierarchy
        COO_SparseArray
        SVT_SparseArray
        SparseMatrix*               SparseMatrix sub-sub-hierarchy
          COO_SparseMatrix          (also extends COO_SparseArray)
          SVT_SparseMatrix          (also extends SVT_SparseArray)
Any object that belongs to a class that extends SparseArray (e.g. an SVT_SparseArray or SVT_SparseMatrix object) is called a SparseArray derivative.
Most of the standard matrix and array API defined in base R should work on SparseArray derivatives, including dim(), length(), dimnames(), `dimnames<-`(), [, drop(), `[<-` (subassignment), t(), rbind(), cbind(), etc.
SparseArray derivatives also support type(), `type<-`(), is_sparse(), nzcount(), nzwhich(), nzvals(), `nzvals<-`(), sparsity(), arbind(), and acbind().
A SparseArray derivative, that is, an SVT_SparseArray, COO_SparseArray, SVT_SparseMatrix, or COO_SparseMatrix object.
The type() of the input object is preserved, except if a different one was requested via the type argument.
What is considered a zero depends on the type():
  "logical": zero is FALSE;
  "integer": zero is 0L;
  "double": zero is 0;
  "complex": zero is 0+0i;
  "raw": zero is raw(1);
  "character": zero is "" (empty string);
  "list": zero is NULL.
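The zeros listed above coincide with the element that base R's vector() uses to fill a new vector of each type, which gives a quick way to obtain them programmatically. This is an observation about base R, not a description of how SparseArray defines zeros internally.

```r
types <- c("logical", "integer", "double", "complex", "raw", "character", "list")

## First element of a freshly allocated length-1 vector of each type
zeros <- lapply(types, function(t) vector(t, length = 1L)[[1L]])
names(zeros) <- types
str(zeros)
```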
The COO_SparseArray and SVT_SparseArray classes.
is_nonzero for is_nonzero() and the nz*() functions nzcount(), nzwhich(), etc.
SparseArray_aperm for permuting the dimensions of a SparseArray object (e.g. transposition).
SparseArray_subsetting for subsetting a SparseArray object.
SparseArray_subassignment for SparseArray subassignment.
SparseArray_abind for combining 2D or multidimensional SparseArray objects.
SparseArray_summarization for SparseArray summarization methods.
SparseArray_Arith, SparseArray_Compare, and SparseArray_Logic, for operations from the Arith, Compare, and Logic groups on SparseArray objects.
SparseArray_Math for operations from the Math and Math2 groups on SparseArray objects.
SparseArray_Complex for operations from the Complex group on SparseArray objects.
SparseArray_misc for miscellaneous operations on a SparseArray object.
SparseArray_matrixStats for col/row summarization methods for SparseArray objects.
rowsum_methods for rowsum() methods for sparse matrices.
SparseMatrix_mult for SparseMatrix multiplication and cross-product.
randomSparseArray to generate a random SparseArray object.
readSparseCSV to read/write a sparse matrix from/to a CSV (comma-separated values) file.
S4 classes dgCMatrix, dgRMatrix, and lgCMatrix defined in the Matrix package, the de facto standard sparse matrix representations in the R ecosystem.
is_sparse in the S4Arrays package.
The Array class defined in the S4Arrays package.
Ordinary array objects in base R.
base::which in base R.
## ---------------------------------------------------------------------
## Display details of class definition & known subclasses
## ---------------------------------------------------------------------

showClass("SparseArray")

## ---------------------------------------------------------------------
## The SparseArray() constructor
## ---------------------------------------------------------------------

a <- array(rpois(9e6, lambda=0.3), dim=c(500, 3000, 6))
SparseArray(a)  # an SVT_SparseArray object

m <- matrix(rpois(9e6, lambda=0.3), ncol=500)
SparseArray(m)  # an SVT_SparseMatrix object

dgc <- sparseMatrix(i=c(4:1, 2:4, 9:12, 11:9), j=c(1:7, 1:7),
                    x=runif(14), dims=c(12, 7))
class(dgc)
SparseArray(dgc)  # an SVT_SparseMatrix object

dgr <- as(dgc, "RsparseMatrix")
class(dgr)
SparseArray(dgr)  # a COO_SparseMatrix object

## ---------------------------------------------------------------------
## nzcount(), nzwhich(), nzvals(), `nzvals<-`()
## ---------------------------------------------------------------------

x <- SparseArray(a)

## Get the number of nonzero array elements in 'x':
nzcount(x)

## nzwhich() returns the indices of the nonzero array elements in 'x'.
## Either as an integer (or numeric) vector of length 'nzcount(x)'
## containing "linear indices":
nzidx <- nzwhich(x)
length(nzidx)
head(nzidx)

## Or as an integer matrix with 'nzcount(x)' rows and one column per
## dimension where the rows represent "array indices" (a.k.a. "array
## coordinates"):
Mnzidx <- nzwhich(x, arr.ind=TRUE)
dim(Mnzidx)

## Each row in the matrix is an n-tuple representing the "array
## coordinates" of a nonzero element in 'x':
head(Mnzidx)
tail(Mnzidx)

## Extract the values of the nonzero array elements in 'x' and return
## them in a vector "parallel" to 'nzwhich(x)':
x_nzvals <- nzvals(x)  # equivalent to 'x[nzwhich(x)]'
length(x_nzvals)
head(x_nzvals)

nzvals(x) <- log1p(nzvals(x))
x

## Sanity checks:
stopifnot(identical(nzidx, which(a != 0)))
stopifnot(identical(Mnzidx, which(a != 0, arr.ind=TRUE, useNames=FALSE)))
stopifnot(identical(x_nzvals, a[nzidx]))
stopifnot(identical(x_nzvals, a[Mnzidx]))
stopifnot(identical(`nzvals<-`(x, nzvals(x)), x))
Like ordinary matrices and arrays in base R, SparseMatrix derivatives can be combined by rows or columns with rbind() or cbind(), and multidimensional SparseArray derivatives can be bound along any dimension with abind().
Note that arbind() can also be used to combine the objects along their first dimension, and acbind() can be used to combine them along their second dimension.
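What binding along the first dimension means for a multidimensional array can be spelled out in base R: move that dimension to the end, where binding reduces to plain concatenation, then move it back. The helper bind_along_dim1() below is hypothetical (not part of SparseArray or S4Arrays) and works on ordinary arrays only; arbind() and abind() operate directly on SparseArray objects.

```r
## Hypothetical base-R helper: bind two ordinary arrays along dimension 1
bind_along_dim1 <- function(a, b) {
  stopifnot(identical(dim(a)[-1L], dim(b)[-1L]))   # other extents must agree
  perm <- c(seq_along(dim(a))[-1L], 1L)            # move dimension 1 to the end
  ab <- c(aperm(a, perm), aperm(b, perm))          # binding along the last dim is just c()
  out <- array(ab, dim = c(dim(a)[-1L], dim(a)[1L] + dim(b)[1L]))
  aperm(out, order(perm))                          # move the bound dimension back to first place
}

## 2D case: same values as rbind()
m1a <- matrix(1:15, nrow = 3, ncol = 5)
m1b <- matrix(101:135, nrow = 7, ncol = 5)
r2 <- bind_along_dim1(m1a, m1b)

## 3D case: same shapes as 'a3a' and 'a3b' in the example below
a3a <- array(1:60, dim = c(3, 5, 4))
a3b <- array(101:240, dim = c(7, 5, 4))
r3 <- bind_along_dim1(a3a, a3b)
dim(r3)   # 10 5 4
```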
cbind in base R.
abind in the S4Arrays package.
SparseArray objects.
Ordinary array objects in base R.
## ---------------------------------------------------------------------
## COMBINING SparseMatrix OBJECTS
## ---------------------------------------------------------------------

m1a <- matrix(1:15, nrow=3, ncol=5,
              dimnames=list(NULL, paste0("M1y", 1:5)))
m1b <- matrix(101:135, nrow=7, ncol=5,
              dimnames=list(paste0("M2x", 1:7), paste0("M2y", 1:5)))
sm1a <- SparseArray(m1a)
sm1b <- SparseArray(m1b)

rbind(sm1a, sm1b)

## ---------------------------------------------------------------------
## COMBINING SparseArray OBJECTS WITH 3 DIMENSIONS
## ---------------------------------------------------------------------

a2a <- array(1:105, dim=c(5, 7, 3),
             dimnames=list(NULL, paste0("A1y", 1:7), NULL))
a2b <- array(1001:1105, dim=c(5, 7, 3),
             dimnames=list(paste0("A2x", 1:5), paste0("A2y", 1:7), NULL))
sa2a <- SparseArray(a2a)
sa2b <- SparseArray(a2b)

abind(sa2a, sa2b)               # same as 'abind(sa2a, sa2b, along=3)'
abind(sa2a, sa2b, rev.along=0)  # same as 'abind(sa2a, sa2b, along=4)'

a3a <- array(1:60, dim=c(3, 5, 4),
             dimnames=list(NULL, paste0("A1y", 1:5), NULL))
a3b <- array(101:240, dim=c(7, 5, 4),
             dimnames=list(paste0("A2x", 1:7), paste0("A2y", 1:5), NULL))
sa3a <- SparseArray(a3a)
sa3b <- SparseArray(a3b)

arbind(sa3a, sa3b)  # same as 'abind(sa3a, sa3b, along=1)'

## ---------------------------------------------------------------------
## Sanity checks
## ---------------------------------------------------------------------

sm1 <- rbind(sm1a, sm1b)
m1 <- rbind(m1a, m1b)
stopifnot(identical(as.array(sm1), m1), identical(sm1, SparseArray(m1)))

sa2 <- abind(sa2a, sa2b)
stopifnot(identical(sa2, abind(sa2a, sa2b, along=3)))
a2 <- abind(a2a, a2b, along=3)
stopifnot(identical(as.array(sa2), a2), identical(sa2, SparseArray(a2)))

sa2 <- abind(sa2a, sa2b, rev.along=0)
stopifnot(identical(sa2, abind(sa2a, sa2b, along=4)))
a2 <- abind(a2a, a2b, along=4)
stopifnot(identical(as.array(sa2), a2), identical(sa2, SparseArray(a2)))

sa3 <- arbind(sa3a, sa3b)
a3 <- arbind(a3a, a3b)
stopifnot(identical(as.array(sa3), a3), identical(sa3, SparseArray(a3)))
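The example above combines matrices row-wise only. The sketch below does the column-wise counterpart. It assumes that cbind() on SparseMatrix objects follows base R's cbind() the same way rbind() does above, including how dimnames are combined. The matrix m1c is made up for illustration.

```r
library(SparseArray)

## Two matrices with the same number of rows (3), so they can be
## combined column-wise.
m1a <- matrix(1:15, nrow=3, ncol=5,
              dimnames=list(NULL, paste0("M1y", 1:5)))
m1c <- matrix(c(0L, 7L, 0L, 0L, 0L, 9L), nrow=3, ncol=2,
              dimnames=list(NULL, c("M3y1", "M3y2")))
sm1a <- SparseArray(m1a)
sm1c <- SparseArray(m1c)

sm <- cbind(sm1a, sm1c)
dim(sm)  # 3 rows, 5 + 2 = 7 columns

## Compare with cbind() on the ordinary matrices:
stopifnot(identical(as.array(sm), cbind(m1a, m1c)))
```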
EXPERIMENTAL!!!
Transpose a SparseArray object by permuting its dimensions.
COMING SOON...
aperm() in base R.
SparseArray objects.
Ordinary array objects in base R.
# COMING SOON...
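Until this example is written, the following is a minimal sketch. It assumes that aperm() on an SVT_SparseArray accepts a permutation vector perm exactly like base::aperm(). The array contents are the same as in the subsetting examples of this manual.

```r
library(SparseArray)

a <- array(0L, dim=5:3)
a[c(1:2, 8, 10, 15:17, 20, 24, 40, 56:60)] <- (1:15) * 10L
svt <- SparseArray(a)

## Move the 3rd dimension first. The new dimensions are dim(a)[c(3, 1, 2)],
## i.e. 3 x 5 x 4.
svt2 <- aperm(svt, c(3, 1, 2))
dim(svt2)

## The result should match base::aperm() on the ordinary array:
stopifnot(identical(as.array(svt2), aperm(a, c(3, 1, 2))))
```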
SparseArray derivatives support operations from the Arith group, with some restrictions. See ?S4groupGeneric in the methods package for more information about the Arith group generic.
IMPORTANT NOTES:
Only SVT_SparseArray objects are supported at the moment. Support for COO_SparseArray objects might be added in the future.
SVT_SparseArray objects of type() "complex" don't support Arith operations at the moment.
Two forms of 'Arith' operations are supported:
Between an SVT_SparseArray object svt and a single value y:
svt op y
y op svt
The Arith operations that support this form are: *, /, ^, %%, %/%. Note that, except for * (for which both svt * y and y * svt are supported), the single value y must be on the right, e.g. svt ^ 3.
Between two SVT_SparseArray objects svt1 and svt2 of the same dimensions (a.k.a. conformable arrays):
svt1 op svt2
The Arith operations that support this form are: +, -, *.
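The man page does not state why these restrictions exist. They are consistent with a simple rule, offered here as an interpretation: an operation can stay sparse only if it maps the implicit zeros to zero. The base-R sketch below checks that rule for each Arith operator. zero_preserving() is a hypothetical helper, not a SparseArray function, and y = 3 is just a representative nonzero value.

```r
## Hypothetical helper (not part of SparseArray): TRUE when 'op' maps the
## implicit zero of a sparse array to zero, i.e. when the result stays sparse.
zero_preserving <- function(op, y, side=c("right", "left")) {
    side <- match.arg(side)
    res <- if (side == "right") op(0, y) else op(y, 0)
    isTRUE(res == 0)  # NaN/NA comparisons give NA, which counts as FALSE
}

ops <- list(`+`=`+`, `-`=`-`, `*`=`*`, `/`=`/`,
            `^`=`^`, `%%`=`%%`, `%/%`=`%/%`)

## svt op y (y on the right): TRUE for *, /, ^, %%, %/%
sapply(ops, zero_preserving, y=3)

## y op svt (y on the left): TRUE for * only (3/0 is Inf, 3^0 is 1, ...)
sapply(ops, zero_preserving, y=3, side="left")

## svt1 op svt2: zero op zero must be zero: TRUE for +, -, *
sapply(ops, zero_preserving, y=0)
```

The three results line up with the operator lists given for each form above.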
A SparseArray derivative of the same dimensions as the input object(s).
S4groupGeneric in the methods package.
SparseArray objects.
Ordinary array objects in base R.
## ---------------------------------------------------------------------
## Basic examples
## ---------------------------------------------------------------------

svt1 <- SVT_SparseArray(dim=c(15, 6), type="integer")
svt1[cbind(1:15, 2)] <- 100:114
svt1[cbind(1:15, 5)] <- -(114:100)
svt1

svt1 * -0.01
svt1 * 10   # result is of type "double"
svt1 * 10L  # result is of type "integer"
svt1 / 10L
svt1 ^ 3.5
svt1 %% 5L
svt1 %/% 5L

svt2 <- SVT_SparseArray(dim=dim(svt1), type="double")
svt2[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- runif(26)
svt2

svt1 + svt2
svt1 - svt2
svt1 * svt2
svt2 * (0.1 * svt1 - svt2 ^ 2) + svt1 / sum(svt2)

## Sanity checks:
m1 <- as.matrix(svt1)
m2 <- as.matrix(svt2)
stopifnot(
  identical(as.matrix(svt1 * -0.01), m1 * -0.01),
  identical(as.matrix(svt1 * 10), m1 * 10),
  identical(as.matrix(svt1 * 10L), m1 * 10L),
  identical(as.matrix(svt1 / 10L), m1 / 10L),
  identical(as.matrix(svt1 ^ 3.5), m1 ^ 3.5),
  identical(as.matrix(svt1 %% 5L), m1 %% 5L),
  identical(as.matrix(svt1 %/% 5L), m1 %/% 5L),
  identical(as.matrix(svt1 + svt2), m1 + m2),
  identical(as.matrix(svt1 - svt2), m1 - m2),
  identical(as.matrix(svt1 * svt2), m1 * m2),
  all.equal(as.matrix(svt2 * (0.1 * svt1 - svt2 ^ 2) + svt1 / sum(svt2)),
            m2 * (0.1 * m1 - m2 ^ 2) + m1 / sum(m2))
)

## ---------------------------------------------------------------------
## An example combining operations from the 'Arith', 'Compare',
## and 'Logic' groups
## ---------------------------------------------------------------------

m3 <- matrix(0L, nrow=15, ncol=6)
m3[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- 101:126
svt3 <- SparseArray(m3)

## Can be 5x or 10x faster than with a dgCMatrix object on a big
## SVT_SparseMatrix object!
svt4 <- (svt3^1.5 + svt3) %% 100 - 0.2 * svt3 > 0
svt4

## Sanity check:
m4 <- (m3^1.5 + m3) %% 100 - 0.2 * m3 > 0
stopifnot(identical(as.matrix(svt4), m4))
SparseArray derivatives support operations from the Compare group, with some restrictions. See ?S4groupGeneric in the methods package for more information about the Compare group generic.
IMPORTANT NOTE: Only SVT_SparseArray objects are supported at the moment. Support for COO_SparseArray objects might be added in the future.
Two forms of 'Compare' operations are supported:
Between an SVT_SparseArray object svt and a single value y:
svt op y
y op svt
All operations from the Compare group support this form, with the single value y either on the left or the right. However, there are some operation-dependent restrictions on the value of y.
Between two SVT_SparseArray objects svt1 and svt2 of the same dimensions (a.k.a. conformable arrays):
svt1 op svt2
The Compare operations that support this form are: !=, <, >.
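The man page does not spell out the restrictions on y. One reading, offered as an interpretation only, is that the logical result can stay sparse only when comparing an implicit zero gives FALSE. The base-R sketch below checks that condition. keeps_sparse() is a hypothetical helper, not a SparseArray function. The value 0.2 matches the svt1 >= 0.2 example of this man page.

```r
## Hypothetical helper (not part of SparseArray): TRUE when comparing an
## implicit zero with 'y' gives FALSE, so the logical result stays sparse.
keeps_sparse <- function(op, y) identical(op(0, y), FALSE)

cmp_ops <- list(`==`=`==`, `!=`=`!=`, `<`=`<`,
                `>`=`>`, `<=`=`<=`, `>=`=`>=`)

## svt op y with y = 0.2: TRUE for ==, >, >=
sapply(cmp_ops, keeps_sparse, y=0.2)

## svt1 op svt2: zero compared with zero must give FALSE: TRUE for !=, <, >
sapply(cmp_ops, keeps_sparse, y=0)
```

The second result matches the operators listed above for the two-array form.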
A SparseArray derivative of type() "logical" and the same dimensions as the input object(s).
S4groupGeneric in the methods package.
SparseArray objects.
Ordinary array objects in base R.
svt1 <- SVT_SparseArray(dim=c(15, 6), type="double")
svt1[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- runif(26)
svt1

svt1 >= 0.2
svt1 != 0

## Sanity checks:
m1 <- as.matrix(svt1)
stopifnot(
  identical(as.matrix(svt1 >= 0.2), m1 >= 0.2),
  identical(as.matrix(svt1 != 0), m1 != 0)
)
WORK-IN-PROGRESS
COMING SOON...
S4groupGeneric in the methods package.
SparseArray objects.
Ordinary array objects in base R.
# COMING SOON...
SparseArray derivatives support operations from the Logic group (i.e. & and |), with some restrictions. See ?S4groupGeneric in the methods package for more information about the Logic group generic.
IMPORTANT NOTES:
Only SVT_SparseArray objects are supported at the moment. Support for COO_SparseArray objects might be added in the future.
In base R, Logic operations support input of type() "logical", "integer", "double", or "complex". However, the corresponding methods for SVT_SparseArray objects only support objects of type() "logical" for now.
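A possible workaround for numeric input is sketched below. It assumes that the Compare method for svt != 0 accepts an SVT_SparseArray of type "integer" and returns one of type "logical". The result can then go to the Logic methods. The matrices m1 and m2 are made up for illustration.

```r
library(SparseArray)

## Two integer sparse matrices.
m1 <- matrix(0L, nrow=4, ncol=3); m1[c(1, 6, 7, 12)] <- c(5L, -2L, 3L, 1L)
m2 <- matrix(0L, nrow=4, ncol=3); m2[c(1, 2, 7, 11)] <- c(1L, 4L, 9L, 8L)
svt1 <- SparseArray(m1)
svt2 <- SparseArray(m2)

## Assumed workaround: 'svt != 0' is a Compare operation that returns an
## SVT_SparseArray of type "logical", which the Logic methods accept.
lsvt1 <- svt1 != 0
lsvt2 <- svt2 != 0
type(lsvt1)

res_and <- lsvt1 & lsvt2
res_or  <- lsvt1 | lsvt2

## Base R treats nonzero numbers as TRUE, so this should match:
stopifnot(identical(as.matrix(res_and), m1 & m2),
          identical(as.matrix(res_or), m1 | m2))
```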
A SparseArray derivative of type() "logical" and the same dimensions as the input object(s).
S4groupGeneric in the methods package.
SparseArray objects.
Ordinary array objects in base R.
svt1 <- svt2 <- SVT_SparseArray(dim=c(15, 6))
svt1[cbind(1:15, 2)] <- c(TRUE, FALSE, NA)
svt1[cbind(1:15, 5)] <- c(TRUE, NA, NA, FALSE, TRUE)
svt2[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- c(TRUE, NA)

svt1 & svt2
svt1 | svt2

## Sanity checks:
m1 <- as.matrix(svt1)
m2 <- as.matrix(svt2)
stopifnot(
  identical(as.matrix(svt1 & svt2), m1 & m2),
  identical(as.matrix(svt1 | svt2), m1 | m2)
)
SparseArray derivatives support a subset of operations from the Math and Math2 groups. See ?S4groupGeneric in the methods package for more information about the Math and Math2 group generics.
IMPORTANT NOTES:
Only operations from these groups that preserve sparsity are supported. For example, sqrt(), trunc(), log1p(), and sin() are supported, but cumsum(), log(), cos(), or gamma() are not.
Only SVT_SparseArray objects are supported at the moment. Support for COO_SparseArray objects might be added in the future.
Math and Math2 operations only support SVT_SparseArray objects of type() "double" at the moment.
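The base-R sketch below illustrates the "preserves sparsity" criterion. An element-wise function f keeps a sparse array sparse only if f(0) is 0. cumsum() is excluded for a different reason: it is not element-wise. The list fs and the helper maps_zero_to_zero() are illustrative, not part of SparseArray.

```r
## Does f(0) == 0 hold? If not, every implicit zero of a sparse array
## would become nonzero and the result would be dense.
fs <- list(sqrt=sqrt, trunc=trunc, log1p=log1p, sin=sin,
           log=log, cos=cos, gamma=gamma)
maps_zero_to_zero <- function(f) isTRUE(suppressWarnings(f(0)) == 0)

## TRUE for sqrt, trunc, log1p, sin; FALSE for log (-Inf), cos (1),
## gamma (NaN)
sapply(fs, maps_zero_to_zero)

## cumsum() maps a lone 0 to 0, but it is not element-wise: every zero
## after the first nonzero value gets filled in.
cumsum(c(0, 2, 0, 0))
```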
A SparseArray derivative of the same dimensions as the input object.
S4groupGeneric in the methods package.
SparseArray objects.
Ordinary array objects in base R.
m <- matrix(0, nrow=15, ncol=6)
m[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- c(runif(22)*1e4, Inf, -Inf, NA, NaN)
svt <- SparseArray(m)

svt2 <- trunc(sqrt(svt))
svt2

## Sanity check:
m2 <- suppressWarnings(trunc(sqrt(m)))
stopifnot(identical(as.matrix(svt2), m2))
The SparseArray package provides memory-efficient col/row summarization methods (a.k.a. matrixStats methods) for SparseArray objects, like colSums(), rowSums(), colMedians(), rowMedians(), colVars(), rowVars(), etc.
Note that these are S4 generic functions defined in the MatrixGenerics package, with methods for ordinary matrices defined in the matrixStats package. This man page documents the methods defined for SparseArray objects.
## N.B.: Showing ONLY the col*() methods (usage of row*() methods is
## the same):

## S4 method for signature 'SparseArray'
colAnyNAs(x, rows=NULL, cols=NULL, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colAnys(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colAlls(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMins(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMaxs(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colRanges(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colSums(x, na.rm=FALSE, dims=1)

## S4 method for signature 'SparseArray'
colProds(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMeans(x, na.rm=FALSE, dims=1)

## S4 method for signature 'SparseArray'
colSums2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMeans2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colVars(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1,
        ..., useNames=NA)

## S4 method for signature 'SparseArray'
colSds(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1,
       ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMedians(x, rows=NULL, cols=NULL, na.rm=FALSE, ..., useNames=NA)
x |
A SparseMatrix or SparseArray object. Note that the |
rows , cols , ...
|
Not supported. |
na.rm , useNames , center
|
See man pages of the corresponding generics in the MatrixGenerics
package (e.g. Note that, unlike the methods for ordinary matrices defined in
the matrixStats package, the |
dims |
See |
All these methods operate natively on the SVT_SparseArray internal representation, for maximum efficiency.
Note that more col/row summarization methods might be added in the future.
See man pages of the corresponding generics in the MatrixGenerics package (e.g. ?MatrixGenerics::colRanges) for the value returned by these methods.
Most col*() methods for SparseArray objects are multithreaded. See set_SparseArray_nthread for how to control the number of threads.
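The sketch below changes the thread count around a colVars() call. It assumes the get_SparseArray_nthread() / set_SparseArray_nthread() pair described in ?set_SparseArray_nthread. It also assumes the thread count affects speed only, not the values returned. The matrix size and density are arbitrary.

```r
library(SparseArray)

set.seed(123)
svt <- poissonSparseMatrix(2000, 300, density=0.1)

## Assumed API: get_SparseArray_nthread() returns the current setting and
## set_SparseArray_nthread() changes it.
old_nthread <- get_SparseArray_nthread()

set_SparseArray_nthread(1)
cv1 <- colVars(svt)
set_SparseArray_nthread(2)
cv2 <- colVars(svt)

## Restore the original setting.
set_SparseArray_nthread(old_nthread)

## Results computed with 1 or 2 threads should agree.
stopifnot(isTRUE(all.equal(cv1, cv2)))
```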
SparseArray objects.
The man pages of the various generic functions defined in the MatrixGenerics package, e.g. MatrixGenerics::colVars, etc.
## ---------------------------------------------------------------------
## 2D CASE
## ---------------------------------------------------------------------

m0 <- matrix(0L, nrow=6, ncol=4, dimnames=list(letters[1:6], LETTERS[1:4]))
m0[c(1:2, 8, 10, 15:17, 24)] <- (1:8)*10L
m0["e", "B"] <- NA
svt0 <- SparseArray(m0)
svt0

colSums(svt0)
colSums(svt0, na.rm=TRUE)

rowSums(svt0)
rowSums(svt0, na.rm=TRUE)

colMeans(svt0)
colMeans(svt0, na.rm=TRUE)

colRanges(svt0)
colRanges(svt0, useNames=FALSE)
colRanges(svt0, na.rm=TRUE)
colRanges(svt0, na.rm=TRUE, useNames=FALSE)

colVars(svt0)
colVars(svt0, useNames=FALSE)

## Sanity checks:
stopifnot(
  identical(colSums(svt0), colSums(m0)),
  identical(colSums(svt0, na.rm=TRUE), colSums(m0, na.rm=TRUE)),
  identical(rowSums(svt0), rowSums(m0)),
  identical(rowSums(svt0, na.rm=TRUE), rowSums(m0, na.rm=TRUE)),
  identical(colMeans(svt0), colMeans(m0)),
  identical(colMeans(svt0, na.rm=TRUE), colMeans(m0, na.rm=TRUE)),
  identical(colRanges(svt0), colRanges(m0, useNames=TRUE)),
  identical(colRanges(svt0, useNames=FALSE), colRanges(m0, useNames=FALSE)),
  identical(colRanges(svt0, na.rm=TRUE),
            colRanges(m0, na.rm=TRUE, useNames=TRUE)),
  identical(colVars(svt0), colVars(m0, useNames=TRUE)),
  identical(colVars(svt0, na.rm=TRUE),
            colVars(m0, na.rm=TRUE, useNames=TRUE))
)

## ---------------------------------------------------------------------
## 3D CASE (AND ARBITRARY NUMBER OF DIMENSIONS)
## ---------------------------------------------------------------------

set.seed(2009)
svt <- 6L * (poissonSparseArray(5:3, density=0.35) -
             poissonSparseArray(5:3, density=0.35))
dimnames(svt) <- list(NULL, letters[1:4], LETTERS[1:3])

cs1 <- colSums(svt)
cs1  # cs1[j , k] is equal to sum(svt[ , j, k])

cs2 <- colSums(svt, dims=2)
cs2  # cs2[k] is equal to sum(svt[ , , k])

cv1 <- colVars(svt)
cv1  # cv1[j , k] is equal to var(svt[ , j, k])

cv2 <- colVars(svt, dims=2)
cv2  # cv2[k] is equal to var(svt[ , , k])

## Sanity checks:
k_idx <- setNames(seq_len(dim(svt)[3]), dimnames(svt)[[3]])
j_idx <- setNames(seq_len(dim(svt)[2]), dimnames(svt)[[2]])
cv1b <- sapply(k_idx,
               function(k) sapply(j_idx,
                                  function(j) var(svt[ , j, k, drop=FALSE])))
cv2b <- sapply(k_idx, function(k) var(svt[ , , k]))
stopifnot(
  identical(colSums(svt), colSums(as.array(svt))),
  identical(colSums(svt, dims=2), colSums(as.array(svt), dims=2)),
  identical(cv1, cv1b),
  identical(cv2, cv2b)
)
This man page documents various base array operations that are supported by SparseArray derivatives and that don't belong to any of the groups of operations documented in the other man pages of the SparseArray package.
# --- unary isometric array transformations ---

## S4 method for signature 'COO_SparseArray'
is.na(x)
## S4 method for signature 'SVT_SparseArray'
is.na(x)

## S4 method for signature 'COO_SparseArray'
is.nan(x)
## S4 method for signature 'SVT_SparseArray'
is.nan(x)

## S4 method for signature 'COO_SparseArray'
is.infinite(x)
## S4 method for signature 'SVT_SparseArray'
is.infinite(x)

## S4 method for signature 'COO_SparseArray'
tolower(x)

## S4 method for signature 'COO_SparseArray'
toupper(x)

## S4 method for signature 'COO_SparseArray'
nchar(x, type="chars", allowNA=FALSE, keepNA=NA)

# --- N-ary isometric array transformations ---

## S4 method for signature 'SparseArray'
pmin(..., na.rm=FALSE)

## S4 method for signature 'SparseArray'
pmax(..., na.rm=FALSE)
x |
A SparseArray derivative. |
type , allowNA , keepNA
|
See |
... |
SparseArray derivatives. |
na.rm |
See |
More operations will be added in the future.
See man pages of the corresponding base functions (e.g. ?base::is.na, ?base::nchar, ?base::pmin, etc.) for the value returned by these methods.
Note that, like the base functions, the methods documented in this man page are endomorphisms, i.e. they return an array-like object of the same class as the input.
base::is.na and base::is.infinite in base R.
base::tolower in base R.
base::nchar in base R.
base::pmin in base R.
SparseArray objects.
Ordinary array objects in base R.
a <- array(c(0, 2.77, NA, 0, NaN, -Inf), dim=5:3)
svt <- SparseArray(a)  # SVT_SparseArray object
class(svt)

is.na(svt)        # SVT_SparseArray object of type "logical"
is.nan(svt)       # SVT_SparseArray object of type "logical"
is.infinite(svt)  # SVT_SparseArray object of type "logical"

svt1 <- poissonSparseMatrix(500, 20, density=0.2)
svt2 <- poissonSparseMatrix(500, 20, density=0.25) * 0.77
pmin(svt1, svt2)
pmax(svt1, svt2)

## Sanity checks:
res <- is.na(svt)
stopifnot(is(res, "SVT_SparseArray"), type(res) == "logical",
          identical(as.array(res), is.na(a)))
res <- is.nan(svt)
stopifnot(is(res, "SVT_SparseArray"), type(res) == "logical",
          identical(as.array(res), is.nan(a)))
res <- is.infinite(svt)
stopifnot(is(res, "SVT_SparseArray"), type(res) == "logical",
          identical(as.array(res), is.infinite(a)))
res <- pmin(svt1, svt2)
stopifnot(is(res, "SVT_SparseArray"),
          identical(as.array(res), pmin(as.array(svt1), as.array(svt2))))
res <- pmax(svt1, svt2)
stopifnot(is(res, "SVT_SparseArray"),
          identical(as.array(res), pmax(as.array(svt1), as.array(svt2))))
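The usage section lists tolower(), toupper(), and nchar() for COO_SparseArray objects, but the example above does not exercise them. The sketch below builds a small character COO_SparseArray with the COO_SparseArray(dim, nzcoo, nzdata) constructor. It assumes the "zero" of a character array is the empty string "", which is preserved by toupper() and has nchar() equal to 0. The strings and coordinates are made up for illustration.

```r
library(SparseArray)

## A 3 x 4 character COO_SparseArray with two nonzero elements,
## at coordinates (1, 2) and (3, 4).
coo <- COO_SparseArray(c(3L, 4L),
                       nzcoo=cbind(c(1L, 3L), c(2L, 4L)),
                       nzdata=c("Hello", "World"))

up <- toupper(coo)  # still a COO_SparseArray
nc <- nchar(coo)    # COO_SparseArray of character counts

## Compare with the base functions applied to the dense array:
a <- as.array(coo)
stopifnot(identical(as.array(up), toupper(a)),
          identical(as.array(nc), nchar(a)))
```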
Like ordinary arrays in base R, SparseArray derivatives support subassignment via the [<- operator.
[<- in base R.
SparseArray objects.
Ordinary array objects in base R.
a <- array(0L, dim=5:3)
a[c(1:2, 8, 10, 15:17, 20, 24, 40, 56:60)] <- (1:15)*10L
svt <- SparseArray(a)
svt

svt[5:3, c(4,2,4), 2:3] <- -99L

## Sanity checks:
a[5:3, c(4,2,4), 2:3] <- -99L
stopifnot(identical(as.array(svt), a), identical(svt, SparseArray(a)))
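The example above uses N-dimensional subscripts only. The sketch below also subassigns through a linear subscript (L-index) and a matrix subscript (M-index), the forms used by the Arith and Logic examples in this manual. It also assigns zeros, which presumably removes stored nonzero values. That is an assumption, since this man page does not describe the storage, so the check only compares against the same operations on an ordinary array.

```r
library(SparseArray)

a <- array(0L, dim=5:3)
a[c(1:2, 8, 10, 15:17, 20, 24, 40, 56:60)] <- (1:15) * 10L
svt <- SparseArray(a)

## L-index subassignment (elements 3 and 33 are zero before this):
svt[c(3, 33)] <- c(-1L, -2L)
## M-index subassignment of zeros (both elements are nonzero before this):
svt[rbind(c(5, 4, 3), c(4, 1, 2))] <- 0L

## The same operations on the ordinary array:
a[c(3, 33)] <- c(-1L, -2L)
a[rbind(c(5, 4, 3), c(4, 1, 2))] <- 0L

stopifnot(identical(as.array(svt), a))
```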
Like ordinary arrays in base R, SparseArray derivatives support subsetting via the single bracket operator ([).
drop in base R to drop the ineffective dimensions of an array or array-like object.
Lindex2Mindex in the S4Arrays package for how to convert an L-index to an M-index and vice-versa.
SparseArray objects.
a <- array(0L, dim=5:3)
a[c(1:2, 8, 10, 15:17, 20, 24, 40, 56:60)] <- (1:15)*10L
svt <- SparseArray(a)
svt

## ---------------------------------------------------------------------
## N-dimensional subsetting
## ---------------------------------------------------------------------

svt[5:3, c(4,2,4), 2:3]
svt[ , c(4,2,4), 2:3]
svt[ , c(4,2,4), -1]
svt[ , c(4,2,4), 1]

svt2 <- svt[ , c(4,2,4), 1, drop=FALSE]
svt2

## Ineffective dimensions can always be dropped as a separate step:
drop(svt2)

svt[ , c(4,2,4), integer(0)]

dimnames(a) <- list(letters[1:5], NULL, LETTERS[1:3])
svt <- SparseArray(a)

svt[c("d", "a"), c(4,2,4), "C"]

svt2 <- svt["e", c(4,2,4), , drop=FALSE]
svt2
drop(svt2)

## ---------------------------------------------------------------------
## 1D-style subsetting (a.k.a. linear subsetting)
## ---------------------------------------------------------------------

## Using a numeric vector (L-index):
svt[c(60, 24, 21, 56)]

## Using a matrix subscript (M-index):
m <- rbind(c(5, 4, 3),
           c(4, 1, 2),
           c(1, 1, 2),
           c(1, 4, 3))
svt[m]

## See '?Lindex2Mindex' in the S4Arrays package for how to convert an
## L-index to an M-index and vice-versa.

## ---------------------------------------------------------------------
## Sanity checks
## ---------------------------------------------------------------------

svt2 <- svt[5:3, c(4,2,4), 2:3]
a2 <- a [5:3, c(4,2,4), 2:3]
stopifnot(identical(as.array(svt2), a2), identical(svt2, SparseArray(a2)))

svt2 <- svt[ , c(4,2,4), 2:3]
a2 <- a [ , c(4,2,4), 2:3]
stopifnot(identical(as.array(svt2), a2), identical(svt2, SparseArray(a2)))

svt2 <- svt[ , c(4,2,4), -1]
a2 <- a [ , c(4,2,4), -1]
stopifnot(identical(as.array(svt2), a2), identical(svt2, SparseArray(a2)))

svt2 <- svt[ , c(4,2,4), 1]
a2 <- a [ , c(4,2,4), 1]
stopifnot(identical(as.array(svt2), a2), identical(svt2, SparseArray(a2)))

svt2 <- svt[ , c(4,2,4), 1, drop=FALSE]
a2 <- a [ , c(4,2,4), 1, drop=FALSE]
stopifnot(identical(as.array(svt2), a2), identical(svt2, SparseArray(a2)))

svt2 <- drop(svt2)
a2 <- drop(a2)
stopifnot(identical(as.array(svt2), a2), identical(svt2, SparseArray(a2)))

svt2 <- svt[ , c(4,2,4), integer(0)]
a2 <- a [ , c(4,2,4), integer(0)]
stopifnot(identical(as.array(svt2), a2),
          identical(unname(svt2), unname(SparseArray(a2))))

svt2 <- svt[c("d", "a"), c(4,2,4), "C"]
a2 <- a [c("d", "a"), c(4,2,4), "C"]
stopifnot(identical(as.array(svt2), a2), identical(svt2, SparseArray(a2)))

svt2 <- svt["e", c(4,2,4), , drop=FALSE]
a2 <- a ["e", c(4,2,4), , drop=FALSE]
stopifnot(identical(as.array(svt2), a2), identical(svt2, SparseArray(a2)))

svt2 <- drop(svt2)
a2 <- drop(a2)
stopifnot(identical(as.array(svt2), a2), identical(svt2, SparseArray(a2)))

stopifnot(identical(svt[c(60, 24, 21, 56)], svt[m]))
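For readers without S4Arrays at hand, the L-index/M-index correspondence used in the linear-subsetting example can be checked with base R alone. The sketch below uses base::arrayInd() as a stand-in for S4Arrays::Lindex2Mindex(). It does not claim the two are implemented the same way. For the reverse direction it uses the column-major offset formula.

```r
L <- c(60, 24, 21, 56)
d <- 5:3  # dim(svt) in the subsetting examples

## L-index -> M-index (one row per element, one column per dimension):
M <- arrayInd(L, .dim=d)
M

## M-index -> L-index (column-major order):
##   L = 1 + sum over k of (M[, k] - 1) * prod(d[seq_len(k - 1)])
L2 <- as.vector(1 + (M - 1) %*% cumprod(c(1, head(d, -1))))
L2
```

The rows of M are the rows of the matrix subscript m used above, which is why svt[c(60, 24, 21, 56)] and svt[m] return the same values.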
The SparseArray package provides memory-efficient summarization methods for SparseArray objects. The following methods are supported at the moment: anyNA(), any(), all(), min(), max(), range(), sum(), prod(), mean(), var(), sd().
More might be added in the future.
Note that these are S4 generic functions defined in base R and in the BiocGenerics package, with default methods defined in base R. This man page documents the methods defined for SparseArray objects.
All these methods operate natively on the COO_SparseArray or SVT_SparseArray representation, for maximum efficiency.
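To see why these summaries can be computed without expanding the data, note that the implicit zeros only contribute through their count. The base-R sketch below (a hypothetical helper, not the package's C implementation) derives sum, range, mean and sample variance from the nonzero values and the total length alone, assuming no NA values.

```r
## Hypothetical helper (not part of SparseArray): summarize a sparse
## vector of length 'len' given only its nonzero values 'nzvals'.
## Assumes 'nzvals' contains no NA values.
sparse_summary <- function(nzvals, len) {
  nzero <- len - length(nzvals)        # number of implicit zeros
  s  <- sum(nzvals)
  mu <- s / len
  list(
    sum   = s,
    ## implicit zeros take part in min()/max() only if there are any
    range = range(c(nzvals, if (nzero > 0) 0)),
    mean  = mu,
    ## sample variance via sum(x^2) - len * mu^2 (simple, but less
    ## numerically stable than a two-pass algorithm)
    var   = (sum(nzvals^2) - len * mu^2) / (len - 1)
  )
}

x <- c(0, 3, 0, 0, -2, 5, 0, 0)        # dense reference vector
res <- sparse_summary(x[x != 0], length(x))
str(res)
```

The one-pass variance formula is chosen here for brevity; a production implementation would likely prefer a numerically safer approach.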
See the man pages of the corresponding default methods in the
base package (e.g. ?base::range, ?base::mean, etc.) for the value
returned by these methods.
SparseArray objects.
The man pages of the various default methods defined in
the base package, e.g. base::range, base::mean, base::anyNA, etc.
svt0 <- SVT_SparseArray(dim=c(4, 5, 2))
svt0[c(1:2, 8, 10, 15:17, 24:26, 28, 40)] <- (1:12)*10L
svt0[4, 3, 1] <- NA
svt0

anyNA(svt0)
range(svt0)
range(svt0, na.rm=TRUE)
sum(svt0, na.rm=TRUE)
sd(svt0, na.rm=TRUE)

## Sanity checks:
a0 <- as.array(svt0)
stopifnot(
  identical(anyNA(svt0), anyNA(a0)),
  identical(range(svt0), range(a0)),
  identical(range(svt0, na.rm=TRUE), range(a0, na.rm=TRUE)),
  identical(sum(svt0), sum(a0)),
  identical(sum(svt0, na.rm=TRUE), sum(a0, na.rm=TRUE)),
  all.equal(sd(svt0, na.rm=TRUE), sd(a0, na.rm=TRUE))
)
Like ordinary matrices in base R, SparseMatrix derivatives can
be multiplied with the %*% operator. They also support
crossprod() and tcrossprod().
The %*%, crossprod() and tcrossprod() methods
for SparseMatrix objects always return an ordinary matrix
of type() "double".
Matrix multiplication and cross-product of SparseMatrix
derivatives are multithreaded. See set_SparseArray_nthread
for how to control the number of threads.
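The reason a sparse left operand saves work is that each product A %*% B only needs to visit the nonzero entries of A: column k of A contributes A[, k] * B[k, j] to column j of the result. The base-R loop below is an illustrative sketch only (the package does this in C, multithreaded); it assumes a hypothetical column-wise storage where each column of A is NULL or a list with components nzvals and nzoffs (0-based), and, like the methods described above, returns an ordinary "double" matrix.

```r
## Hypothetical helpers (not the package's implementation).
dense_to_cols <- function(m) {
  lapply(seq_len(ncol(m)), function(k) {
    nz <- which(m[, k] != 0)
    if (length(nz) == 0L) NULL else list(nzvals = m[nz, k], nzoffs = nz - 1L)
  })
}

sparse_cols_matmul <- function(cols, n_rows, B) {
  stopifnot(length(cols) == nrow(B))
  C <- matrix(0, nrow = n_rows, ncol = ncol(B))   # "double" result
  for (k in seq_along(cols)) {
    leaf <- cols[[k]]
    if (is.null(leaf)) next                       # empty column of A: no work
    rows <- leaf$nzoffs + 1L                      # 0-based -> 1-based
    for (j in seq_len(ncol(B)))
      C[rows, j] <- C[rows, j] + leaf$nzvals * B[k, j]
  }
  C
}

A <- matrix(c(0, 2, 0,  0, 0, 0,  1, 0, 3), nrow = 3)
B <- matrix(1:6, nrow = 3)
C <- sparse_cols_matmul(dense_to_cols(A), nrow(A), B)
C
```

Empty columns of A are skipped entirely, which is where the savings come from when most columns are empty.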
SparseMatrix objects.
S4Arrays::type in the S4Arrays package to get the type of the
elements of an array-like object.
Ordinary matrix objects in base R.
m1 <- matrix(0L, nrow=15, ncol=6)
m1[c(2, 6, 12:17, 22:33, 55, 59:62, 90)] <- 101:126
svt1 <- as(m1, "SVT_SparseMatrix")

set.seed(333)
svt2 <- poissonSparseMatrix(nrow=6, ncol=7, density=0.2)

svt1 %*% svt2
m1 %*% svt2

## Unary crossprod() and tcrossprod():
crossprod(svt1)   # same as t(svt1) %*% svt1
tcrossprod(svt1)  # same as svt1 %*% t(svt1)

## Binary crossprod() and tcrossprod():
crossprod(svt1[1:6, ], svt2)  # same as t(svt1[1:6, ]) %*% svt2
tcrossprod(svt1, t(svt2))     # same as svt1 %*% svt2

## Sanity checks:
m12 <- m1 %*% as.matrix(svt2)
stopifnot(
  identical(svt1 %*% svt2, m12),
  identical(m1 %*% svt2, m12),
  identical(crossprod(svt1), t(svt1) %*% svt1),
  identical(tcrossprod(svt1), svt1 %*% t(svt1)),
  identical(crossprod(svt1[1:6, ], svt2), t(svt1[1:6, ]) %*% svt2),
  identical(tcrossprod(svt1, t(svt2)), m12)
)
The SVT_SparseArray class is a new container for efficient in-memory representation of multidimensional sparse arrays. It uses the SVT layout to represent the nonzero multidimensional data internally.
An SVT_SparseMatrix object is an SVT_SparseArray object of 2 dimensions.
Note that SVT_SparseArray and SVT_SparseMatrix objects replace the older and less efficient COO_SparseArray and COO_SparseMatrix objects.
## Constructor function:
SVT_SparseArray(x, dim=NULL, dimnames=NULL, type=NA)
x |
If If |
dim |
|
dimnames |
The dimnames of the object to construct. Must be |
type |
A single string specifying the requested type of the object. Normally, the SVT_SparseArray object returned by the constructor function has the same type() as x, but the type argument can be used to request a different type. Doing svt <- SVT_SparseArray(x, type=type) is equivalent to doing svt <- SVT_SparseArray(x) followed by type(svt) <- type, but the former is more convenient and will generally be more efficient. Supported types are all R atomic types plus "list". |
SVT_SparseArray is a concrete subclass of the SparseArray virtual class. This makes SVT_SparseArray objects SparseArray derivatives.
The nonzero data in an SVT_SparseArray object is stored in a Sparse Vector Tree. We'll refer to this internal data representation as the SVT layout. See the "SVT layout" section below for more information.
The SVT layout is similar to the CSC layout (compressed, sparse, column-oriented format) used by CsparseMatrix derivatives from the Matrix package, like dgCMatrix or lgCMatrix objects, but with the following improvements:
The SVT layout supports sparse arrays of arbitrary dimensions.
With the SVT layout, the sparse data can be of any type, whereas CsparseMatrix derivatives only support sparse data of type "double" or "logical" at the moment.
The SVT layout imposes no limit on the number of nonzero elements that can be stored. With dgCMatrix/lgCMatrix objects, this number must be < 2^31.
Overall, the SVT layout allows more efficient operations on SVT_SparseArray objects.
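To make the CSC-versus-SVT comparison concrete: CSC stores all nonzero values of the matrix in one vector x, their 0-based row indices in i, and column breakpoints in p (the @x, @i and @p slots shown in the examples below), while the SVT layout gives each column its own leaf, or NULL when the column is empty. The base-R sketch below regroups CSC components into per-column leaves. The helper name and the nzvals/nzoffs component names are assumptions for illustration, not the package's internal code.

```r
## Hypothetical helper: regroup CSC components (x = nonzero values,
## i = 0-based row indices, p = column breakpoints starting at 0)
## into one SVT-style leaf per column, NULL for empty columns.
csc_to_svt_leaves <- function(x, i, p) {
  n_cols <- length(p) - 1L
  lapply(seq_len(n_cols), function(j) {
    if (p[j] == p[j + 1L]) return(NULL)          # empty column -> NULL leaf
    idx <- (p[j] + 1L):p[j + 1L]                 # 1-based positions in x and i
    list(nzvals = x[idx], nzoffs = i[idx])
  })
}

## A 4 x 3 matrix with nonzeros at [1,1]=10, [3,1]=20 and [4,3]=30:
x <- c(10, 20, 30)
i <- c(0L, 2L, 3L)
p <- c(0L, 2L, 2L, 3L)
svt <- csc_to_svt_leaves(x, i, p)
str(svt)
```

Because each column becomes an independent leaf, per-column operations no longer need to consult the global breakpoint vector p.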
An SVT_SparseArray or SVT_SparseMatrix object.
An SVT (Sparse Vector Tree) is a tree of depth N - 1 where N is the number of dimensions of the sparse array.
The leaves in the tree can only be of two kinds: NULL or leaf vector. Leaves that are leaf vectors can only be found at the deepest level in the tree (i.e. at depth N - 1). All leaves found at a lower depth must be NULLs.
A leaf vector represents a sparse vector along the first dimension (a.k.a. innermost or fastest moving dimension) of the sparse array. It contains a collection of offset/value pairs sorted by strictly ascending offset. More precisely, a leaf vector is represented by an ordinary list of 2 parallel dense vectors:
nzvals: a vector (atomic or list) of nonzero values (zeros are not allowed);
nzoffs: an integer vector of offsets (i.e. 0-based positions).
The 1st vector (nzvals) determines the type of the leaf vector, i.e. "double", "integer", "logical", etc.
All the leaf vectors in the SVT must have the same type as the sparse array.
It's useful to realize that a leaf vector simply represents a 1D SVT.
In SparseArray 1.5.4, a new type of leaf vector called lacunar leaf was introduced. A lacunar leaf is a non-empty leaf vector where the nzvals component is set to NULL. In this case the nonzero values are implicit: they're all considered to be equal to one.
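The leaf-vector rules above (0-based offsets, a NULL leaf for no nonzero data, and implicit ones for a lacunar leaf) can be illustrated by expanding a leaf into an ordinary dense vector. This is a hypothetical base-R sketch: it assumes leaves are lists with components named nzvals and nzoffs, which may not match the package's internal representation exactly.

```r
## Hypothetical helper: expand a leaf vector into a dense vector of
## length 'len' and type 'type' ("double", "integer", "logical", ...).
leaf_to_dense <- function(leaf, len, type = "double") {
  out <- vector(type, len)                  # all zeros of the requested type
  if (is.null(leaf)) return(out)            # NULL leaf: no nonzero data
  vals <- leaf$nzvals
  if (is.null(vals))                        # lacunar leaf: implicit ones
    vals <- rep(as.vector(1, type), length(leaf$nzoffs))
  out[leaf$nzoffs + 1L] <- vals             # 0-based offsets -> 1-based
  out
}

regular <- list(nzvals = c(2.5, -1), nzoffs = c(0L, 5L))
lacunar <- list(nzvals = NULL, nzoffs = c(1L, 4L))
leaf_to_dense(regular, 6L)
leaf_to_dense(lacunar, 6L, "integer")
```

A lacunar leaf stores only offsets, which roughly halves the memory needed for data made of ones (e.g. binary indicator matrices).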
Examples:
An SVT_SparseArray object with 1 dimension has its nonzero data stored in an SVT of depth 0. Such SVT is represented by a single leaf vector.
An SVT_SparseArray object with 2 dimensions has its nonzero data stored in an SVT of depth 1. Such SVT is represented by a list whose length is the extent of the 2nd dimension (number of columns). Each list element is an SVT of depth 0 (as described above), or a NULL if the corresponding column is empty (i.e. has no nonzero data).
For example, the nonzero data of an 8-column sparse matrix will be stored in an SVT that looks like this:
        .------------------list-of-length-8-----------------.
       /       /       /      |       |      \       \       \
      |       |       |       |       |       |       |       |
     leaf    leaf    NULL    leaf    leaf    leaf    leaf    NULL
    vector  vector          vector  vector  vector  vector
The NULL leaves represent the empty columns (i.e. the columns with no nonzero elements).
An SVT_SparseArray object with 3 dimensions has its nonzero data stored in an SVT of depth 2. Such SVT is represented by a list whose length is the extent of the 3rd dimension. Each list element must be an SVT of depth 1 (as described above) that stores the nonzero data of the corresponding 2D slice, or a NULL if the 2D slice is empty (i.e. has no nonzero data).
And so on...
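The recursive structure described above (depth N - 1, one subtree per slice along the last dimension, NULL for empty slices, leaves along the first dimension) can be sketched in base R. The helpers below are hypothetical and for illustration only; they assume leaves are lists with components nzvals and nzoffs, and that an all-zero subtree collapses to NULL.

```r
## Hypothetical helpers (not the package's internal code).
make_leaf <- function(v) {
  nz <- which(is.na(v) | v != 0)          # NA counts as a nonzero value
  if (length(nz) == 0L) return(NULL)      # empty: NULL leaf
  list(nzvals = v[nz], nzoffs = nz - 1L)  # 0-based offsets
}

array_to_svt <- function(a) {
  d <- dim(a)
  if (length(d) <= 1L) return(make_leaf(as.vector(a)))   # depth 0
  N <- length(d)
  n <- prod(d[-N])                        # elements per slice along dim N
  children <- lapply(seq_len(d[N]), function(k)
    array_to_svt(array(a[(k - 1) * n + seq_len(n)], dim = d[-N])))
  if (all(vapply(children, is.null, logical(1)))) NULL else children
}

svt_to_dense <- function(svt, d) {
  out <- numeric(prod(d))
  if (is.null(svt)) return(array(out, dim = d))
  N <- length(d)
  if (N == 1L) {
    out[svt$nzoffs + 1L] <- svt$nzvals
  } else {
    n <- prod(d[-N])
    for (k in seq_along(svt))
      out[(k - 1) * n + seq_len(n)] <- svt_to_dense(svt[[k]], d[-N])
  }
  array(out, dim = d)
}

a <- array(0, dim = c(4, 3, 2))
a[c(2, 5, 7, 24)] <- c(1.5, 2, -3, 4)
svt <- array_to_svt(a)   # list of 2 slices, each a list of 3 columns
str(svt)
```

Here the 3rd column of the first slice and the first two columns of the second slice come out as NULL leaves, mirroring the NULL entries in the diagram above.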
The SparseArray class for the virtual parent class of COO_SparseArray and SVT_SparseArray.
S4 classes dgCMatrix and lgCMatrix defined in the Matrix package, for the de facto standard of sparse matrix representations in the R ecosystem.
Virtual class CsparseMatrix defined in the Matrix package for the parent class of all classes that use the "CSC layout".
The Matrix::rsparsematrix function in the Matrix package.
Ordinary array objects in base R.
## ---------------------------------------------------------------------
## BASIC CONSTRUCTION
## ---------------------------------------------------------------------

SVT_SparseArray(dim=5:3)  # allzero object
SVT_SparseArray(dim=c(35000, 2e6), type="raw")  # allzero object

## Use a dgCMatrix object to fill the SVT_SparseArray object to construct:
x <- rsparsematrix(10, 16, density=0.1)  # random dgCMatrix object
SVT_SparseArray(x, dim=c(8, 5, 4))

svt1 <- SVT_SparseArray(dim=c(12, 5, 2))  # allzero object
svt1[cbind(11, 2:5, 2)] <- 22:25
svt1

svt2 <- SVT_SparseArray(dim=c(6, 4), type="integer",
                        dimnames=list(letters[1:6], LETTERS[1:4]))
svt2[c(1:2, 8, 10, 15:17, 24)] <- (1:8)*10L
svt2

## ---------------------------------------------------------------------
## CSC (Compressed Sparse Column) LAYOUT VS SVT LAYOUT
## ---------------------------------------------------------------------

## dgCMatrix objects from the Matrix package use the CSC layout:
dgcm2 <- as(svt2, "dgCMatrix")
dgcm2@x  # nonzero values
dgcm2@i  # row indices of the nonzero values
dgcm2@p  # breakpoints (0 followed by one breakpoint per column)

str(svt2)

m3 <- matrix(rpois(54e6, lambda=0.4), ncol=1200)
## Note that 'SparseArray(m3)' can also be used for this:
svt3 <- SVT_SparseArray(m3)
svt3

dgcm3 <- as(m3, "dgCMatrix")

## Compare type and memory footprint:
type(svt3)
object.size(svt3)
type(dgcm3)
object.size(dgcm3)

## Transpose:
system.time(svt <- t(t(svt3)))
system.time(dgcm <- t(t(dgcm3)))
identical(svt, svt3)
identical(dgcm, dgcm3)

## rbind():
m4 <- matrix(rpois(45e6, lambda=0.4), ncol=1200)
svt4 <- SVT_SparseArray(m4)
dgcm4 <- as(m4, "dgCMatrix")
system.time(rbind(svt3, svt4))
system.time(rbind(dgcm3, dgcm4))
Use get_SparseArray_nthread or set_SparseArray_nthread
to get or set the number of threads to be used by the multithreaded
operations implemented in the SparseArray package.
get_SparseArray_nthread()
set_SparseArray_nthread(nthread=NULL)
nthread |
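The number of threads to be used by multithreaded operations implemented in the SparseArray package. On systems where OpenMP is available, this must be NULL (the default) or an integer value >= 1. On systems where OpenMP is not available, the supplied nthread is ignored and set_SparseArray_nthread() is a no-op. |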
Multithreaded operations in the SparseArray package are implemented in C with OpenMP (https://www.openmp.org/).
Note that OpenMP is not available on all systems. On systems where it's
available, get_SparseArray_nthread() is guaranteed to return a
value >= 1. On systems where it's not available (e.g. macOS),
get_SparseArray_nthread() returns 0 and set_SparseArray_nthread()
is a no-op.
IMPORTANT: The portable way to disable multithreading is by calling
set_SparseArray_nthread(1), NOT set_SparseArray_nthread(0)
(the latter returns an error on systems where OpenMP is available).
get_SparseArray_nthread() returns an integer value >= 1 on systems
where OpenMP is available, and 0 on systems where it's not.
set_SparseArray_nthread() returns the previous nthread
value, that is, the value returned by get_SparseArray_nthread()
before the call to set_SparseArray_nthread(). Note that the value
is returned invisibly.
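Because set_SparseArray_nthread() returns the previous value, it fits the usual R pattern of temporarily changing a setting and restoring it with on.exit(), even if the wrapped computation fails. The helper with_setting() below is hypothetical (not part of SparseArray), and a stand-in setter is used so that the sketch runs without the package; the commented line shows how it could wrap the real setter.

```r
## Hypothetical helper (not part of SparseArray).
with_setting <- function(setter, value, expr) {
  prev <- setter(value)                 # setter returns the previous value
  on.exit(setter(prev), add = TRUE)     # restore it, even on error
  expr                                  # 'expr' is evaluated lazily, here
}

## Stand-in setter mimicking the documented return behavior:
state <- new.env()
state$nthread <- 4L
mock_set_nthread <- function(nthread) {
  prev <- state$nthread
  state$nthread <- nthread
  invisible(prev)
}

seen_inside <- with_setting(mock_set_nthread, 1L, state$nthread)

## With SparseArray loaded, the same helper could wrap a real call, e.g.:
## cv1 <- with_setting(set_SparseArray_nthread, 1, colVars(svt1))
```

Using 1 rather than 0 as the temporary value follows the portability advice above.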
SparseArray_matrixStats for SparseArray col/row summarization methods.
SparseMatrix_mult for SparseMatrix multiplication and cross-product.
SparseArray objects.
get_SparseArray_nthread()

if (get_SparseArray_nthread() != 0) {  # multithreading is available
    svt1 <- poissonSparseMatrix(77000L, 15000L, density=0.01)

    ## 'user' time is typically N x 'elapsed' time where N is roughly the
    ## number of threads that was effectively used:
    system.time(cv1 <- colVars(svt1))

    svt2 <- poissonSparseMatrix(77000L, 300L, density=0.3) * 0.77
    system.time(cp12 <- crossprod(svt1, svt2))

    prev_nthread <- set_SparseArray_nthread(1)  # disable multithreading
    system.time(cv1 <- colVars(svt1))
    system.time(cp12 <- crossprod(svt1, svt2))

    ## Restore previous 'nthread' value:
    set_SparseArray_nthread(prev_nthread)
}