Title: | Bumpy Matrix of Non-Scalar Objects |
---|---|
Description: | Implements the BumpyMatrix class and several subclasses for holding non-scalar objects in each entry of the matrix. This is akin to a ragged array but the raggedness is in the third dimension, much like a bumpy surface - hence the name. Of particular interest is the BumpyDataFrameMatrix, where each entry is a Bioconductor data frame. This allows us to naturally represent multivariate data in a format that is compatible with two-dimensional containers like the SummarizedExperiment and MultiAssayExperiment objects. |
Authors: | Aaron Lun [aut, cre], Genentech, Inc. [cph] |
Maintainer: | Aaron Lun <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.15.0 |
Built: | 2024-11-03 06:22:26 UTC |
Source: | https://github.com/bioc/BumpyMatrix |
A subclass of the BumpyMatrix where each entry is an atomic vector. One subclass is provided for each of the most common types.
In the following code snippets, `x` is a BumpyAtomicMatrix.
Binary and unary operations are implemented by specializing `Ops`, `Math` and related group generics, and will usually return a new BumpyAtomicMatrix of the appropriate type.
The exceptions are `Summary` methods like `max` and `min`; these return an ordinary matrix where each entry contains a scalar value for the corresponding entry of `x`.
Furthermore, `range` will return a 3-dimensional array containing the minimum and maximum for each entry of `x`.
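As a small illustration of the return types described above, the following sketch builds a 2-by-2 BumpyNumericMatrix from hand-picked values and applies an arithmetic operation and two Summary-style calls. The values and the names `vals`, `grp`, `mat`, `doubled`, `mx` and `rg` are made up for this example.

```r
library(BumpyMatrix)
library(IRanges)

# Illustrative data: four groups, filled column-major into a 2 x 2 matrix.
vals <- c(1, 5, 2, 8, 3, 9, 4)
grp <- factor(c(1, 1, 2, 2, 3, 3, 4), levels=1:4)
mat <- BumpyMatrix(splitAsList(vals, grp), c(2, 2))

# Arithmetic returns another BumpyAtomicMatrix.
doubled <- mat * 2
class(doubled)

# max() collapses each entry to a scalar, giving an ordinary matrix.
mx <- max(mat)
mx

# range() gives a 3-dimensional array with the minimum and maximum per entry.
rg <- range(mat)
dim(rg)
```

Only the number of dimensions of `rg` is relied upon here; the ordering of those dimensions is best checked with `dim(rg)` on your installed version.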
Common mathematical operations are implemented that apply to each entry of the BumpyAtomicMatrix:
`mean`, `sd`, `median`, `mad`, `var` and `IQR` take a single BumpyAtomicMatrix and return an ordinary double-precision matrix of the same dimensions containing the computed statistic for each entry of the input. This is possible as all operations are guaranteed to produce a scalar.
`quantile` takes a single BumpyAtomicMatrix as input and returns a 3-dimensional array. The first dimension contains the requested quantiles, the second dimension corresponds to the rows of `x` and the third dimension corresponds to the columns of `x`.
`which.max` and `which.min` take a single BumpyAtomicMatrix and return an ordinary integer matrix of the same dimensions containing the index of the maximum or minimum value in each entry. (This is set to `NA` if the entry of the input has length zero.)
`pmin`, `pmax`, `pmin.int` and `pmax.int` take multiple BumpyAtomicMatrix objects of the same dimensions, and return a BumpyAtomicMatrix containing the result of running the same function across corresponding entries of the input objects.
`cor`, `cov` and (optionally) `var` take two BumpyAtomicMatrix objects of the same dimensions, and return an ordinary matrix containing the computed statistic for the corresponding entries of the inputs. This is possible as all operations are guaranteed to produce a scalar.
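The per-entry statistics above can be sketched on a tiny matrix where the expected values are easy to check by hand. The data and the names `vals`, `grp`, `mat`, `avg`, `idx` and `q` are illustrative assumptions; `quantile` is called with its default quantiles, so only the shape of its result is inspected.

```r
library(BumpyMatrix)
library(IRanges)

# Four groups of two values each, filled column-major into a 2 x 2 matrix.
vals <- c(1, 5, 2, 8, 3, 9, 4, 6)
grp <- factor(rep(1:4, each=2))
mat <- BumpyMatrix(splitAsList(vals, grp), c(2, 2))

# One scalar per entry, returned as an ordinary matrix.
avg <- mean(mat)
avg

# Position of the largest value within each entry.
idx <- which.max(mat)
idx

# Quantiles per entry, returned as a 3-dimensional array.
q <- quantile(mat)
dim(q)
```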
Additionally, common operations are implemented that apply to each entry of the BumpyCharacterMatrix and return a BumpyAtomicMatrix of the same dimensions and an appropriate type.
This includes `tolower`, `toupper`, `substr`, `substring`, `sub`, `gsub`, `grepl`, `grep`, `nchar`, `chartr`, `startsWith` and `endsWith`.
We also implement `unstrsplit`, which returns an ordinary matrix of the same dimensions containing the unsplit strings.
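A minimal sketch of the string operations, assuming a small hand-made BumpyCharacterMatrix. The strings and the names `words`, `grp`, `cmat`, `upper`, `widths`, `has.e` and `joined` are invented for illustration.

```r
library(BumpyMatrix)
library(IRanges)

# Five strings in four groups, filled column-major into a 2 x 2 matrix.
words <- c("ab", "cd", "e", "fgh", "ij")
grp <- factor(c(1, 1, 2, 3, 4), levels=1:4)
cmat <- BumpyMatrix(splitAsList(words, grp), c(2, 2))

# Each of these keeps the bumpy structure and changes the element type as needed.
upper <- toupper(cmat)
widths <- nchar(cmat)
has.e <- grepl("e", cmat)

# unstrsplit() pastes the strings within each entry, giving an ordinary matrix.
joined <- unstrsplit(cmat, sep="-")
joined
```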
All methods implemented for the BumpyMatrix parent class are available here.
    library(BumpyMatrix)
    library(IRanges)

    # Mocking up a NumericList:
    x <- splitAsList(runif(1000), factor(sample(50, 1000, replace=TRUE), 1:50))

    # Creating a BumpyNumericMatrix:
    mat <- BumpyMatrix(x, c(10, 5))
    mat[,1]

    # Arithmetic operations:
    (mat * 2)[,1]
    (mat + mat * 5)[,1]

    # Logical operations:
    (mat < 0.5)[,1]
    (mat > 0.5 & mat < 1)[,1]
    (mat == mat)[,1]

    # More statistics:
    max(mat)
    min(mat)
    mean(mat)
    sd(mat)
    median(mat)

    # Handling character vectors:
    x <- splitAsList(sample(LETTERS, 100, replace=TRUE), factor(sample(20, 100, replace=TRUE), 1:20))
    cmat <- BumpyMatrix(x, c(5, 4))
    cmat[,1]
    tolower(cmat[,1])
    grepl("A|E|I|O|U", cmat)[,1]
    sub("A|E|I|O|U", "vowel", cmat)[,1]
The BumpyDataFrameMatrix provides a two-dimensional object where each entry is a DataFrame. This is useful for storing data that has a variable number of observations per sample/feature combination, e.g., for inclusion as another assay in a SummarizedExperiment object.
In the following code snippets, `x` is a BumpyDataFrameMatrix.
`commonColnames(x)` will return a character vector with the column names shared by the DataFrames in `x`. This can be modified with `commonColnames(x) <- value`.
`x[i, j, k, ..., .dropk=drop, drop=TRUE]` will subset the BumpyDataFrameMatrix:
If `k` is not specified, this will either produce another BumpyDataFrameMatrix corresponding to the specified submatrix, or a CompressedSplitDataFrameList containing the entries of interest if `drop=TRUE` and the submatrix has a dimension of length 1.
If `k` is specified, it should contain the names or indices of the columns of the underlying DataFrame to retain. For multiple fields or with `.dropk=FALSE`, a new BumpyDataFrameMatrix is returned with the specified columns in the DataFrame.
If `k` only specifies a single column and `.dropk=TRUE`, a BumpyMatrix (or CompressedList, if `drop=TRUE`) corresponding to the type of the field is returned.
`x[i, j, k, ...] <- value` will modify `x` by replacing the specified values with those in the BumpyMatrix `value` of the same dimensions.
If `k` is not specified, `value` should be a BumpyDataFrameMatrix with the same fields as `x`.
If `k` is specified, `value` should be a BumpyDataFrameMatrix with the specified fields.
If `k` contains a single field, `value` can also be a BumpyAtomicMatrix containing the values to use in that field.
All methods described for the BumpyMatrix parent class are available.
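The effect of `k` and `.dropk` can be sketched with a small deterministic DataFrame. The column values and the names `df`, `f`, `mat`, `xmat`, `xdf` and `mat2` are assumptions made for this example.

```r
library(BumpyMatrix)
library(S4Vectors)

# Eight observations in four groups, filled column-major into a 2 x 2 matrix.
df <- DataFrame(x=c(0.1, 0.9, 0.4, 0.6, 0.2, 0.8, 0.3, 0.7), y=1:8)
f <- factor(rep(letters[1:4], each=2), letters[1:4])
mat <- BumpyMatrix(split(df, f), c(2, 2))

# Column names shared by every DataFrame entry.
commonColnames(mat)

# A single field with the default .dropk gives an atomic BumpyMatrix...
xmat <- mat[,,"x"]
class(xmat)

# ...while .dropk=FALSE keeps a BumpyDataFrameMatrix with that one column.
xdf <- mat[,,"x", .dropk=FALSE]
class(xdf)

# Replacing a single field with a BumpyAtomicMatrix of matching shape.
mat2 <- mat
mat2[,,"x"] <- xmat * 10
```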
Aaron Lun
    library(BumpyMatrix)
    library(S4Vectors)

    df <- DataFrame(x=runif(100), y=runif(100))
    f <- factor(sample(letters[1:20], nrow(df), replace=TRUE), letters[1:20])
    out <- split(df, f)

    # Making our BumpyDataFrameMatrix.
    mat <- BumpyMatrix(out, c(5, 4))
    mat[,1]
    mat[1,]

    # Subsetting capabilities.
    xmat <- mat[,,"x"]
    ymat <- mat[,,"y"]
    filtered <- mat[xmat > 0.5 & ymat > 0.5]
    filtered[,1]

    # Subset replacement works as expected.
    mat2 <- mat
    mat2[,,"x"] <- mat2[,,"x"] * 2
    mat2[,1]
The BumpyMatrix provides a two-dimensional object where each entry is a Vector object. This is useful for storing data that has a variable number of observations per sample/feature combination, e.g., for inclusion as another assay in a SummarizedExperiment object.
`BumpyMatrix(x, dims, dimnames=list(NULL, NULL), proxy=NULL, reorder=TRUE)` will produce a BumpyMatrix object, given:
`x`, a CompressedList object containing one or more DFrames or atomic vectors.
`dims`, an integer vector of length 2 specifying the dimensions of the returned object.
`dimnames`, a list of length 2 containing the row and column names.
`proxy`, an integer or numeric matrix-like object specifying the location of each entry of `x` in the output matrix.
`reorder`, a logical scalar indicating whether `proxy` (if specified) should be reordered.
The type of the returned BumpyMatrix object is determined from the type of `x`.
If `proxy=NULL`, `x` should have length equal to the product of `dims`. The entries of the returned BumpyMatrix are filled with `x` in a column-major manner.
If `proxy` is specified, it should contain indices in `1:length(x)` with all other entries filled with zeros. If `reorder=FALSE`, all non-zero values should be in increasing order when encountered in column-major format; otherwise, the indices are resorted to enforce this expectation. Note that `dims` and `dimnames` are ignored when `proxy` is specified.
If `x` is missing, a BumpyIntegerMatrix is returned with zero rows and columns. If `dims` is also specified, a BumpyIntegerMatrix with the specified number of rows and columns is returned, where each entry is an empty integer vector.
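The column-major fill and the behaviour with a missing `x` can be sketched as follows. The input values, the dimnames and the names `x`, `mat`, `empty` and `blank` are illustrative assumptions; the `proxy` argument is deliberately left out of this sketch.

```r
library(BumpyMatrix)
library(IRanges)

# An IntegerList of length 4; the fourth element is empty.
x <- splitAsList(1:6, factor(c(1, 1, 1, 2, 3, 3), levels=1:4))

# Entries are filled column-major: x[[1]] -> [1,1], x[[2]] -> [2,1], and so on.
mat <- BumpyMatrix(x, c(2, 2), dimnames=list(c("r1", "r2"), c("c1", "c2")))
lengths(mat)

# With x missing, the result has zero rows and columns.
empty <- BumpyMatrix()
dim(empty)

# With only dims given, every entry is an empty integer vector.
blank <- BumpyMatrix(dims=c(2, 3))
dim(blank)
```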
In the following code snippets, `x` is an instance of a BumpyMatrix subclass.
`dim(x)` will yield a length-2 integer vector containing the number of rows and columns in `x`.
`length(x)` will yield the product of the number of columns and rows.
`dimnames(x)` will yield a list of two character vectors with the row and column names of `x`. Either or both elements of the list may be `NULL` if no names are present.
`x[i, j, ..., drop=TRUE]` will yield the specified submatrix of the same type as `x`, given integer, character or logical subsetting vectors in `i` and `j`. If the resulting submatrix has any dimension of length 1 and `drop=TRUE`, a CompressedList of the appropriate type is instead returned.
`x[i,j] <- value` will replace the specified entries in `x` with the values in another BumpyMatrix `value`. It is expected that `value` is of the same subclass as `x`. `value` can also be a CompressedList of the same class as `undim(x)`, in which case it is recycled to fill the specified entries.
`t(x)` will transpose the BumpyMatrix, returning an object of the same type.
`rbind(..., deparse.level=1)` and `cbind(..., deparse.level=1)` will combine all BumpyMatrix objects in `...`, yielding a single BumpyMatrix object containing all the rows and columns, respectively. All objects should have the same number of columns (for `rbind`) or rows (for `cbind`).
Given a BumpyMatrix `x` and an appropriate BumpyMatrix `i`, `x[i]` will return another BumpyMatrix where each entry of `x` is subsetted by the corresponding entry of `i`. This usually requires `i` to be a BumpyIntegerMatrix or a BumpyLogicalMatrix, though it is also possible to use a BumpyCharacterMatrix if each entry of `x` is named.
`undim(x)` will return the underlying CompressedList object.
`redim(flesh, skeleton)` will create a BumpyMatrix object, given a CompressedList `flesh` and an existing BumpyMatrix object `skeleton`. `flesh` is assumed to be of the same length as `undim(skeleton)`, where each entry in the former replaces the corresponding entry in the latter. The class of the output is determined based on the class of `flesh`. This method is analogous to the `relist` function for lists.
`unlist(x, ...)` will return the underlying Vector used to create the CompressedList object. This is the same as `unlist(undim(x), ...)`.
`lengths(x)` will return a numeric matrix-like object with the same dimensions and dimnames as `x`, where each entry contains the length of the corresponding entry in `x`. The output class can be anything used in the `proxy` of the constructor, e.g., a sparse matrix from the Matrix package.
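A short sketch of combining, transposing, subsetting by another BumpyMatrix and round-tripping through `undim` and `redim`, using values small enough to verify by eye. The data and the names `vals`, `grp`, `mat`, `stacked`, `flipped`, `keep`, `big`, `flesh` and `scaled` are invented for this example.

```r
library(BumpyMatrix)
library(IRanges)

# Entry lengths are 2, 2, 2 and 1 when filled column-major into a 2 x 2 matrix.
vals <- c(1, 5, 2, 8, 3, 9, 4)
grp <- factor(c(1, 1, 2, 2, 3, 3, 4), levels=1:4)
mat <- BumpyMatrix(splitAsList(vals, grp), c(2, 2))

# Combining by row doubles the number of rows.
stacked <- rbind(mat, mat)
dim(stacked)

# Transposition swaps entries across the diagonal.
flipped <- t(mat)

# Subsetting each entry with a BumpyLogicalMatrix of the same shape.
keep <- mat > 2
big <- mat[keep]
lengths(big)

# Modify the underlying list, then restore the matrix shape with redim().
flesh <- undim(mat) * 10
scaled <- redim(flesh, mat)
```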
Aaron Lun
    library(BumpyMatrix)
    library(IRanges)

    # Mocking up a NumericList:
    x <- NumericList(split(runif(1000), factor(sample(50, 1000, replace=TRUE), 1:50)))
    length(x)

    # Creating a BumpyNumericMatrix:
    mat <- BumpyMatrix(x, c(10, 5))
    mat

    # Standard subsetting works correctly:
    mat[1:10,1:2]
    mat[,1]
    mat[1,]

    # Subsetting by another BumpyMatrix.
    is.big <- x > 0.9
    i <- BumpyMatrix(is.big, dim(mat))
    out <- mat[i]
    out # same dimensions as mat...
    out[,1] # but the entries are subsetted.
    out[1,]

    # Subset replacement works correctly:
    mat[,2]
    alt <- mat
    alt[,2] <- mat[,1,drop=FALSE]
    alt[,2]

    # Combining works correctly:
    rbind(mat, mat)
    cbind(mat, mat)

    # Transposition works correctly:
    mat[1,2]
    tmat <- t(mat)
    tmat
    tmat[1,2]

    # Get the underlying objects:
    undim(mat)
    summary(unlist(mat))
Split a vector or Vector into a BumpyMatrix based on row/column factors. This facilitates the construction of a BumpyMatrix from vector-like objects.
splitAsBumpyMatrix(x, row, column, sparse = FALSE)
Argument | Description |
---|---|
x | A vector or Vector object. |
row | An object coercible into a factor, of length equal to `x`. |
column | An object coercible into a factor, of length equal to `x`. |
sparse | Logical scalar indicating whether a sparse representation should be used. |
A BumpyMatrix of the appropriate type, with number of rows and columns equal to the number of levels in `row` and `column` respectively.
Each entry of the matrix contains all elements of `x` with the corresponding indices in `row` and `column`.
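To make the mapping from `row` and `column` to matrix entries concrete, here is a minimal sketch with four values whose placement can be traced by hand. The inputs and the names `values`, `row.id`, `col.id`, `mat` and `smat` are hypothetical; the sparse variant assumes the Matrix package is installed.

```r
library(BumpyMatrix)

values <- c(10, 20, 30, 40)
row.id <- c("a", "a", "b", "a")
col.id <- c("X", "X", "Y", "Y")

# Two row levels and two column levels give a 2 x 2 BumpyNumericMatrix.
mat <- splitAsBumpyMatrix(values, row=row.id, column=col.id)
dim(mat)
dimnames(mat)

# Entry ("a", "X") holds 10 and 20, while ("b", "X") is empty.
lengths(mat)

# The same split using a sparse representation.
smat <- splitAsBumpyMatrix(values, row=row.id, column=col.id, sparse=TRUE)
dim(smat)
```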
Aaron Lun
`BumpyMatrix`, if a CompressedList has already been constructed.
`unsplitAsDataFrame`, which reverses this operation to recover a long-format DataFrame.
`splitAsList`, which inspired this function.
    library(BumpyMatrix)

    mat <- splitAsBumpyMatrix(runif(1000),
        row=sample(LETTERS, 1000, replace=TRUE),
        column=sample(10, 1000, replace=TRUE)
    )
    mat
    mat[,1]
    mat[1,]

    # Or with a sparse representation.
    mat <- splitAsBumpyMatrix(runif(10),
        row=sample(LETTERS, 10, replace=TRUE),
        column=sample(10, 10, replace=TRUE),
        sparse=TRUE
    )
    mat
    mat[,1]
    mat[1,]
Unsplit a BumpyMatrix into a DataFrame, adding back the row and column names as separate columns. This is equivalent to converting the BumpyMatrix into a “long” format.
unsplitAsDataFrame( x, row.names = TRUE, column.names = TRUE, row.field = "row", column.field = "column", value.field = "value" )
Argument | Description |
---|---|
x | A BumpyMatrix object. |
row.names, column.names | Logical scalar indicating whether the row or column names of `x` should be reported in the output. |
row.field, column.field | String indicating the field in the output DataFrame to store the row or column names. |
value.field | String specifying the field in the output DataFrame to store BumpyAtomicMatrix values. |
Denote the output of this function as `y`. Given a BumpyAtomicMatrix `x`, we would expect to be able to recover `x` by calling `splitAsBumpyMatrix(y$value, y$row, y$column)`.
The `row.field`, `column.field` and `value.field` arguments can be used to alter the column names of the output DataFrame. This can be helpful to avoid, e.g., conflicts with columns of the same name in a BumpyDataFrameMatrix `x`.
If no row/column names are present in `x` (or `row.names` or `column.names` is `FALSE`), the `row` and `column` columns instead hold integer indices specifying the matrix row/column of each DataFrame row.
A DataFrame object containing the data in `x`. This has additional `row` and `column` columns containing the row/column names for each DataFrame row.
If `x` is a BumpyAtomicMatrix, the output DataFrame contains a `value` column that holds `unlist(x)`.
Otherwise, if `x` is a BumpyDataFrameMatrix, the DataFrame contains the columns in `unlist(x)`.
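The round trip between the matrix and long formats can be sketched as follows, assuming a small named matrix built with `splitAsBumpyMatrix`. The inputs and the names `values`, `mat`, `long`, `renamed` and `back` are hypothetical, as is the replacement field name `"score"`.

```r
library(BumpyMatrix)

values <- c(10, 20, 30, 40)
mat <- splitAsBumpyMatrix(values,
    row=c("a", "a", "b", "a"),
    column=c("X", "X", "Y", "Y")
)

# One DataFrame row per value, with its matrix row and column alongside.
long <- unsplitAsDataFrame(mat)
long

# Renaming the value column, e.g., to avoid a clash with an existing field.
renamed <- unsplitAsDataFrame(mat, value.field="score")
colnames(renamed)

# Splitting the long format again recovers the same layout.
back <- splitAsBumpyMatrix(long$value, long$row, long$column)
lengths(back)
```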
Aaron Lun
`splitAsBumpyMatrix`, to do the split in the first place.
    library(BumpyMatrix)

    mat <- splitAsBumpyMatrix(runif(1000),
        row=sample(LETTERS, 1000, replace=TRUE),
        column=sample(10, 1000, replace=TRUE)
    )
    unsplitAsDataFrame(mat)