Title: | Out-of-core statistical computing and signal processing |
---|---|
Description: | Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images. |
Authors: | Kylie A. Bemis <[email protected]> |
Maintainer: | Kylie A. Bemis <[email protected]> |
License: | Artistic-2.0 | file LICENSE |
Version: | 2.9.0 |
Built: | 2024-10-30 08:42:29 UTC |
Source: | https://github.com/bioc/matter |
Resample the given data at specified points. Interpolation can be performed within a tolerance using several interpolation methods.
approx1(x, y, xout, interp = "linear", n = length(x), tol = NA_real_, tol.ref = "abs", extrap = NA_real_)
approx1(x, y, xout, interp = "linear", n = length(x), tol = NA_real_, tol.ref = "abs", extrap = NA_real_)
x , y
|
The data to be interpolated. |
xout |
A vector of values where the resampling should take place. |
interp |
Interpolation method. One of 'none', 'sum', 'mean', 'max', 'min', 'area', 'linear', 'cubic', 'gaussian', or 'lanczos'. |
n |
If |
tol |
The tolerance for the data points used for interpolation. Must be nonnegative. If |
tol.ref |
If 'abs', then comparison is done by taking the absolute difference. If 'x', then relative differences are used. |
extrap |
The value to be returned when performing extrapolation, i.e., in the case when there is no data within |
The algorithm is implemented in C and provides several fast interpolation methods. Note that interpolation is limited to using data within the given tolerance. This is also used to specify the width for kernel-based interpolation methods such as interp = "gaussian"
. The use of a tolerance also means that interpolating within the range of the data but where no data points are within the tolerance window is considered extrapolation. This can be useful when resampling sparse signals with large empty regions, by setting extrap = 0
, and setting an appropriate tolerance.
A vector of the same length as xout
, giving the resampled data.
Kylie A. Bemis
x <- c(1.11, 2.22, 3.33, 5.0, 5.1) y <- x^1.11 approx1(x, y, 2.22) # 2.42359 approx1(x, y, 3.0) # NA approx1(x, y, 3.0, tol=0.2, tol.ref="x") # 3.801133
x <- c(1.11, 2.22, 3.33, 5.0, 5.1) y <- x^1.11 approx1(x, y, 2.22) # 2.42359 approx1(x, y, 3.0) # NA approx1(x, y, 3.0, tol=0.2, tol.ref="x") # 3.801133
Resample the (potentially scattered) 2D data to a rectilinear grid at the specified coordinates. Interpolation can be performed within a tolerance using several interpolation methods.
approx2(x, y, z, xout, yout, interp = "linear", nx = length(x), ny = length(y), tol = NA_real_, tol.ref = "abs", extrap = NA_real_)
approx2(x, y, z, xout, yout, interp = "linear", nx = length(x), ny = length(y), tol = NA_real_, tol.ref = "abs", extrap = NA_real_)
x , y , z
|
The data to be interpolated. Alternatively, |
xout , yout
|
The coordinate (grid lines) where the resampling should take place. These are expanded into a rectilinear grid using |
interp |
Interpolation method. One of 'none', 'sum', 'mean', 'max', 'min', 'area', 'linear', 'cubic', 'gaussian', or 'lanczos'. |
nx , ny
|
If |
tol |
The tolerance for the data points used for interpolation. Must be nonnegative. May be length-2 to have different tolerances for |
tol.ref |
If 'abs', then comparison is done by taking the absolute difference. If 'x', then relative differences are used. |
extrap |
The value to be returned when performing extrapolation, i.e., in the case when there is no data within |
See approx1
for details of the 1D implementation. The 2D implementation is mostly the same, except it uses a kd-tree to quickly find neighboring points.
Note that interp = "linear"
and interp = "cubic"
use a kernel-based approximation. Traditionally, bilinear and bicubic interpolation use 4 and 16 neighboring points, respectively. However, to support scattered data, approx2
will use as many points as are found within the given tolerance, and scale the kernels accordingly. If the input data falls on a regular grid already, then the tolerance should be specified accordingly. Set tol
equal to the sampling rate for interp = "linear"
and twice the sampling rate for interp = "cubic"
.
A vector of the same length as xout
, giving the resampled data.
Kylie A. Bemis
expand.grid
,
asearch
,
approx
,
approx1
x <- matrix(1:25, nrow=5, ncol=5) approx2(x, nx=10, ny=10, interp="cubic") # upsampling
x <- matrix(1:25, nrow=5, ncol=5) approx2(x, nx=10, ny=10, interp="cubic") # upsampling
Calculate the mean, median, or mode of a vector.
avg(x, center = mean)
avg(x, center = mean)
x |
A vector to summarize. |
center |
A function to use to calculate the central tendency for numeric vectors. Defaults to |
Missing values are always removed before calculating the central tendency. The center
funtion will be used to calculate the central tendency for any x
for which either is.numeric
or is.complex
is TRUE
. Otherwise, the mode is calculated.
A single value summarizing x
.
Kylie A. Bemis
set.seed(1) x <- sample(LETTERS, 50, replace=TRUE) y <- runif(50) avg(x) avg(y) avg(y, median)
set.seed(1) x <- sample(LETTERS, 50, replace=TRUE) y <- runif(50) avg(x) avg(y) avg(y, median)
Combine peaks from multiple signals.
# Bin a list of peaks binpeaks(peaklist, domain = NULL, xlist = peaklist, tol = NA_real_, tol.ref = "abs", merge = FALSE, na.drop = TRUE) # Merge peaks mergepeaks(peaks, n = nobs(peaks), x = peaks, tol = NA_real_, tol.ref = "abs", na.drop = TRUE)
# Bin a list of peaks binpeaks(peaklist, domain = NULL, xlist = peaklist, tol = NA_real_, tol.ref = "abs", merge = FALSE, na.drop = TRUE) # Merge peaks mergepeaks(peaks, n = nobs(peaks), x = peaks, tol = NA_real_, tol.ref = "abs", na.drop = TRUE)
peaklist , xlist
|
A list of vectors of peak indices (or domain values), and the values to be binned according to the peak locations. |
peaks , x
|
The indices (or domain values) of peaks which should be merged, or for which the corresponding values should be averaged. If |
domain |
The domain variable of the signal. |
tol , tol.ref
|
A tolerance specifying the maximum allowed distance between binned or merged peaks. See |
merge |
Should the binned peaks be merged? |
na.drop |
Should missing values be dropped from the result? |
n |
The count of times each peak was observed. This is used to weight the averaging. Local minima in counts are also used to separate distinct peaks that are closer together than |
binpeaks()
is used to bin a list of peaks from multiple signals to a set of common peaks. The peaks (or their corresponding values) are binned to the given domain
values and are averaged within each bin. If domain
is not given, then the bins are created from the range of the peak locations and the specified tol
.
mergepeaks()
is used to merge any peaks with gaps smaller than the given tolerance and whose counts (n
) do not indicate that they should be considered separate peaks. The merged peaks are averaged together.
A numeric stream_stat
vector, giving the average locations of each peak.
Kylie A. Bemis
x <- c(0, 1, 1, 2, 3, 2, 1, 4, 5, 1, 1, 0) y <- c(0, 1, 1, 3, 2, 2, 1, 5, 4, 1, 1, 0) p1 <- findpeaks(x) p2 <- findpeaks(y) binpeaks(list(p1, p2), merge=FALSE)
x <- c(0, 1, 1, 2, 3, 2, 1, 4, 5, 1, 1, 0) y <- c(0, 1, 1, 3, 2, 2, 1, 5, 4, 1, 1, 0) p1 <- findpeaks(x) p2 <- findpeaks(y) binpeaks(list(p1, p2), merge=FALSE)
Summarize a vector in the bins at the specified indices.
binvec(x, lower, upper, stat = "sum", prob = 0.5)
binvec(x, lower, upper, stat = "sum", prob = 0.5)
x |
A numeric vector. |
lower , upper
|
The (inclusive) lower and upper indices of the bins |
stat |
The statistic used to summarize the values in each bin. Must be one of "sum", "mean", "max", "min", "sd", "var", "mad", or "quantile". |
prob |
The quantile for |
An numeric vector of the summarized (binned) values.
Kylie A. Bemis
set.seed(1) x <- sort(runif(20)) binvec(x, c(1,6,11,16), c(5,10,15,20)) binvec(x, seq(from=1, to=16, by=5), stat="mean")
set.seed(1) x <- sort(runif(20)) binvec(x, c(1,6,11,16), c(5,10,15,20)) binvec(x, seq(from=1, to=16, by=5), stat="mean")
Use a binary search to find approximate matches for the elements of its first argument among those in its second. This implementation allows for returning the index of the nearest match if there are no exact matches. It also allows specifying a tolerance for the comparison.
bsearch(x, table, tol = 0, tol.ref = "abs", nomatch = NA_integer_, nearest = FALSE) reldiff(x, y, ref = "y")
bsearch(x, table, tol = 0, tol.ref = "abs", nomatch = NA_integer_, nearest = FALSE) reldiff(x, y, ref = "y")
x |
A vector of values to be matched. Only integer, numeric, and character vectors are supported. |
y , table
|
A sorted (non-decreasing) vector of values to be matched against. Only integer, numeric, and character vectors are supported. |
tol |
The tolerance for matching doubles. Must be >= 0. |
ref , tol.ref
|
One of 'abs', 'x', or 'y'. If 'abs', then comparison is done by taking the absolute difference. If either 'x' or 'y', then relative differences are used, and this specifies which to use as the reference (target) value. For strings, this uses the Hamming distance (number of errors), normalized by the length of the reference string for relative differences. |
nomatch |
The value to be returned in the case when no match is found, coerced to an integer. (Ignored if |
nearest |
Should the index of the closest match be returned if no exact matches are found? |
The algorithm is implemented in C and currently only works for 'integer', 'numeric', and 'character' vectors. If there are multiple matches, then the first match that is found will be returned, with no guarantees. If a nonzero tolerance is provided, the closest match will be returned.
The "nearest" match for strings when there are no exact matches is decided by the match with the most initial matching characters. Tolerance is ignored for strings and integers. Behavior is undefined and results may be unexpected if values
includes NAs.
A vector of the same length as x
, giving the indexes of the matches in table
.
Kylie A. Bemis
asearch
,
match
,
pmatch
,
findInterval
a <- c(1.11, 2.22, 3.33, 5.0, 5.1) bsearch(2.22, a) # 2 bsearch(3.0, a) # NA bsearch(3.0, a, nearest=TRUE) # 3 bsearch(3.0, a, tol=0.1, tol.ref="values") # 3 b <- c("hello", "world!") bsearch("world!", b) # 2 bsearch("worl", b) # NA bsearch("worl", b, nearest=TRUE) # 2
a <- c(1.11, 2.22, 3.33, 5.0, 5.1) bsearch(2.22, a) # 2 bsearch(3.0, a) # NA bsearch(3.0, a, nearest=TRUE) # 3 bsearch(3.0, a, tol=0.1, tol.ref="values") # 3 b <- c("hello", "world!") bsearch("world!", b) # 2 bsearch("worl", b) # NA bsearch("worl", b, nearest=TRUE) # 2
This is a generic function for applying cryptographic hash functions and calculating checksums for externally-stored R objects.
checksum(x, ...) ## S4 method for signature 'character' checksum(x, algo = "sha1", ...) ## S4 method for signature 'matter_' checksum(x, algo = "sha1", ...)
checksum(x, ...) ## S4 method for signature 'character' checksum(x, algo = "sha1", ...) ## S4 method for signature 'matter_' checksum(x, algo = "sha1", ...)
x |
A file path or an object to be hashed. |
algo |
The hash function to use. |
... |
Additional arguments to be passed to the hash function. |
The method for matter
objects calculates checksums of each of the files in the object's paths
.
A character vector giving the hash or hashes of the object.
Kylie A. Bemis
x <- matter(1:10) y <- matter(1:10) checksum(x) checksum(y) # should be the same
x <- matter(1:10) y <- matter(1:10) checksum(x) checksum(y) # should be the same
Perform equivalents of apply
, lapply
, and mapply
, but over parallelized chunks of data. This is most useful if accessing the data is potentially time-consuming, such as for file-based matter
objects. Operating on chunks reduces the number of I/O operations.
## Operate on elements/rows/columns chunkApply(X, MARGIN, FUN, ..., simplify = FALSE, outpath = NULL, verbose = NA, BPPARAM = bpparam()) chunkLapply(X, FUN, ..., simplify = FALSE, outpath = NULL, verbose = NA, BPPARAM = bpparam()) chunkMapply(FUN, ..., simplify = FALSE, outpath = NULL, verbose = NA, BPPARAM = bpparam()) ## Operate on complete chunks chunk_rowapply(X, FUN, ..., simplify = "c", depends = NULL, permute = FALSE, RNG = FALSE, verbose = NA, chunkopts = list(), BPPARAM = bpparam()) chunk_colapply(X, FUN, ..., simplify = "c", depends = NULL, permute = FALSE, RNG = FALSE, verbose = NA, chunkopts = list(), BPPARAM = bpparam()) chunk_lapply(X, FUN, ..., simplify = "c", depends = NULL, permute = FALSE, RNG = FALSE, verbose = NA, chunkopts = list(), BPPARAM = bpparam()) chunk_mapply(FUN, ..., MoreArgs = NULL, simplify = "c", depends = NULL, permute = FALSE, RNG = FALSE, verbose = NA, chunkopts = list(), BPPARAM = bpparam())
## Operate on elements/rows/columns chunkApply(X, MARGIN, FUN, ..., simplify = FALSE, outpath = NULL, verbose = NA, BPPARAM = bpparam()) chunkLapply(X, FUN, ..., simplify = FALSE, outpath = NULL, verbose = NA, BPPARAM = bpparam()) chunkMapply(FUN, ..., simplify = FALSE, outpath = NULL, verbose = NA, BPPARAM = bpparam()) ## Operate on complete chunks chunk_rowapply(X, FUN, ..., simplify = "c", depends = NULL, permute = FALSE, RNG = FALSE, verbose = NA, chunkopts = list(), BPPARAM = bpparam()) chunk_colapply(X, FUN, ..., simplify = "c", depends = NULL, permute = FALSE, RNG = FALSE, verbose = NA, chunkopts = list(), BPPARAM = bpparam()) chunk_lapply(X, FUN, ..., simplify = "c", depends = NULL, permute = FALSE, RNG = FALSE, verbose = NA, chunkopts = list(), BPPARAM = bpparam()) chunk_mapply(FUN, ..., MoreArgs = NULL, simplify = "c", depends = NULL, permute = FALSE, RNG = FALSE, verbose = NA, chunkopts = list(), BPPARAM = bpparam())
X |
A matrix for |
MARGIN |
If the object is matrix-like, which dimension to iterate over. Must be 1 or 2, where 1 indicates rows and 2 indicates columns. The dimension names can also be used if |
FUN |
The function to be applied. |
MoreArgs |
A list of other arguments to |
... |
Additional arguments to be passed to |
simplify |
Should the result be simplified into a vector, matrix, or higher dimensional array? |
outpath |
If non-NULL, a file path where the results should be written as they are processed. If specified, |
verbose |
Should user messages be printed with the current chunk being processed? If |
chunkopts |
An (optional) list of chunk options including |
depends |
A list with length equal to the extent of |
permute |
Should the order of items be randomized? This may be useful for iterating over random subsets. No attempt is made to re-order the results. |
RNG |
Should the local random seed (as set by |
BPPARAM |
An optional instance of |
For chunkApply()
, chunkLapply()
, and chunkMapply()
:
For vectors and lists, the vector is broken into some number of chunks according to chunks
. The individual elements of the chunk are then passed to FUN
.
For matrices, the matrix is chunked along rows or columns, based on the number of chunks
. The individual rows or columns of the chunk are then passed to FUN
.
In this way, the first argument of FUN
is analogous to using the base apply
, lapply
, and mapply
functions.
For chunk_rowapply()
, chunk_colapply()
, chunk_lapply()
, and chunk_mapply()
:
In this situation, the entire chunk is passed to FUN
, and FUN
is responsible for knowing how to handle a sub-vector or sub-matrix of the original object. This may be useful if FUN
is already a function that could be applied to the whole object such as rowSums
or colSums
.
When this is the case, it may be useful to provide a custom simplify
function.
For convenience to the programmer, several attributes are made available when operating on a chunk.
"chunkid": The index of the chunk currently being processed by FUN
.
"chunklen": The number of elements in the chunk that should be processed.
"index": The indices of the elements of the chunk, as elements/rows/columns in the original matrix/vector.
"depends" (optional): If depends
is given, then this is a list of indices within the chunk. The length of the list is equal to the number of elements/rows/columns in the chunk. Each list element is either NULL
or a vector of indices giving the elements/rows/columns of the chunk that should be processed for that index. The indices that should be processed will be non-NULL
, and indices that should be ignored will be NULL
.
The depends
argument can be used to iterate over dependent elements of a vector, or dependent rows/columns of a matrix. This can be useful if the calculation for a particular row/column/element depends on the values of others.
When depends
is provided, multiple rows/columns/elements will be passed to FUN
. Each element of the depends
list should be a vector giving the indices that should be passed to FUN
.
For example, this can be used to implement a rolling apply function.
Several options are supported by chunkopts
to override the global options:
nchunks: The number of chunks to use. If omitted, this is taken from getOption("matter.default.nchunks")
. For IO-bound operations, using fewer chunks will often be faster, but use more memory.
chunksize: The approximate chunk size in bytes. If omitted, this is taken from getOption("matter.default.chunksize")
. For IO-bound operations, using larger chunks will often be faster, but use more memory. If set to NA_real_
, then the chunk size is determined by the number of chunks.
serialize: Whether data in virtual memory should be realized on the manager and serialized to the workers (TRUE
), passed to the workers in virtual memory as-is (FALSE
), or if matter
should decide the behavior based on the cluster configuration (NA
). If omitted, this is taken from getOption("matter.default.serialize")
. If all workers have access to the same virtual memory resources (whether file storage or shared memory), then it can be significantly faster to avoid serializing the data.
Typically, a list if simplify=FALSE
. Otherwise, the results may be coerced to a vector or array.
Kylie A. Bemis
apply
,
lapply
,
mapply
,
RNGkind
,
RNGStreams
,
SnowfastParam
register(SerialParam()) set.seed(1) x <- matrix(rnorm(1000^2), nrow=1000, ncol=1000) out <- chunkApply(x, 1L, mean, chunkopts=list(nchunks=10)) head(out)
register(SerialParam()) set.seed(1) x <- matrix(rnorm(1000^2), nrow=1000, ncol=1000) out <- chunkApply(x, 1L, mean, chunkopts=list(nchunks=10)) head(out)
The chunked
class implements chunked wrappers for vectors, arrays, and lists for parallel iteration.
## Instance creation chunked_vec(x, nchunks = NA, chunksize = NA, verbose = FALSE, permute = FALSE, depends = NULL, drop = FALSE) chunked_mat(x, margin, nchunks = NA, chunksize = NA, verbose = FALSE, permute = FALSE, depends = NULL, drop = FALSE) chunked_list(..., nchunks = NA, chunksize = NA, verbose = FALSE, permute = FALSE, depends = NULL, drop = FALSE) ## Additional methods documented below
## Instance creation chunked_vec(x, nchunks = NA, chunksize = NA, verbose = FALSE, permute = FALSE, depends = NULL, drop = FALSE) chunked_mat(x, margin, nchunks = NA, chunksize = NA, verbose = FALSE, permute = FALSE, depends = NULL, drop = FALSE) chunked_list(..., nchunks = NA, chunksize = NA, verbose = FALSE, permute = FALSE, depends = NULL, drop = FALSE) ## Additional methods documented below
x , ...
|
The data to be chunked. If multiple objects are passed via ..., then they are all recycled to the same length. |
nchunks |
The number of chunks to use. If |
chunksize |
The approximate chunk size in bytes. If |
verbose |
Should messages be printed whenever a chunk is extracted? |
permute |
Should the order of items be randomized? Alternatively, an integer vector or a list of integer vectors can be specified. If an integer vector is provided, then |
depends |
A list with length equal to the extent of |
margin |
Which array margin should be chunked. |
drop |
The value passed to |
An object derived from class chunked
.
data
:The data.
index
:The chunk indices.
verbose
:Print messages on chunk extraction?
drop
:The value passed to drop
when subsetting the chunks.
margin
:The array margin for the chunks.
chunked_vec
, chunked_mat
and chunked_list
instances can be created through chunked_vec()
, chunked_mat()
, and chunked_list()
, respectively.
Standard generic methods:
length(x):
Get number of chunks.
lengths(x):
Get chunk sizes for each chunk.
x[i, ...]
:Get chunks.
x[[i]]
:Get a single chunk.
Kylie A. Bemis
x <- matrix(runif(200), nrow=10, ncol=20) y <- chunked_mat(x, margin=2, nchunks=5) print(y)
x <- matrix(runif(200), nrow=10, ncol=20) y <- chunked_mat(x, margin=2, nchunks=5) print(y)
Apply the equivalent of scale
to either columns or rows of a matrix, using a grouping variable.
## S4 method for signature 'ANY' colscale(x, center=TRUE, scale=TRUE, group = NULL, ..., BPPARAM = bpparam()) ## S4 method for signature 'ANY' rowscale(x, center=TRUE, scale=TRUE, group = NULL, ..., BPPARAM = bpparam())
## S4 method for signature 'ANY' colscale(x, center=TRUE, scale=TRUE, group = NULL, ..., BPPARAM = bpparam()) ## S4 method for signature 'ANY' rowscale(x, center=TRUE, scale=TRUE, group = NULL, ..., BPPARAM = bpparam())
x |
A matrix-like object. |
center |
Either a logical value or a numeric vector of length equal to the number of columns of 'x' (for |
scale |
Either a logical value or a numeric vector of length equal to the number of columns of 'x' (for |
group |
A vector or factor giving the groupings with length equal to the number of rows of 'x' (for |
... |
Arguments passed to |
BPPARAM |
An optional instance of |
See scale
for details.
A matrix-like object (usually of the same class as x
) with either ‘col-scaled:center’ and ‘col-scaled:scale’ attributes or ‘row-scaled:center’ and ‘row-scaled:scale’ attributes.
Kylie A. Bemis
x <- matter(1:100, nrow=10, ncol=10) colscale(x)
x <- matter(1:100, nrow=10, ncol=10) colscale(x)
These functions perform calculation of summary statistics over matrix rows and columns for each level of a grouping variable.
## S4 method for signature 'ANY' rowStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'ANY' colStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'matter_mat' rowStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'matter_mat' colStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'sparse_mat' rowStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'sparse_mat' colStats(x, stat, ..., BPPARAM = bpparam()) .rowStats(x, stat, group = NULL, na.rm = FALSE, simplify = TRUE, drop = TRUE, iter.dim = 1L, BPPARAM = bpparam(), ...) .colStats(x, stat, group = NULL, na.rm = FALSE, simplify = TRUE, drop = TRUE, iter.dim = 2L, BPPARAM = bpparam(), ...)
## S4 method for signature 'ANY' rowStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'ANY' colStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'matter_mat' rowStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'matter_mat' colStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'sparse_mat' rowStats(x, stat, ..., BPPARAM = bpparam()) ## S4 method for signature 'sparse_mat' colStats(x, stat, ..., BPPARAM = bpparam()) .rowStats(x, stat, group = NULL, na.rm = FALSE, simplify = TRUE, drop = TRUE, iter.dim = 1L, BPPARAM = bpparam(), ...) .colStats(x, stat, group = NULL, na.rm = FALSE, simplify = TRUE, drop = TRUE, iter.dim = 2L, BPPARAM = bpparam(), ...)
x |
A matrix on which to calculate summary statistics. |
stat |
The name of summary statistics to compute over the rows or columns of a matrix. Allowable values include: "min", "max", "prod", "sum", "mean", "var", "sd", "any", "all", and "nnzero". |
group |
A factor or vector giving the grouping. If not provided, no grouping will be used. |
na.rm |
If |
simplify |
Simplify the results from a list to a vector or array. This also drops any additional attributes (besides names). |
drop |
If only a single summary statistic is calculated, return the results as a vector (or matrix) rather than a list. |
iter.dim |
The dimension to iterate over. Must be 1 or 2, where 1 indicates rows and 2 indicates columns. |
BPPARAM |
An optional instance of |
... |
Additional arguments passed to |
The summary statistics methods are calculated over chunks of the matrix using s_colstats
and s_rowstats
. For matter
objects, the iteration is performed over the major dimension for IO efficiency.
A list for each stat
requested, where each element is either a vector (if no grouping variable is provided) or a matrix where each column corresponds to a different level of group
.
If drop=TRUE
, and only a single statistic is requested, then the result will be unlisted and returned as a vector or matrix.
Kylie A. Bemis
register(SerialParam()) set.seed(1) x <- matrix(runif(100^2), nrow=100, ncol=100) g <- as.factor(rep(letters[1:5], each=20)) colStats(x, "mean", group=g)
register(SerialParam()) set.seed(1) x <- matrix(runif(100^2), nrow=100, ncol=100) g <- as.factor(rep(letters[1:5], each=20)) colStats(x, "mean", group=g)
Apply the equivalent of sweep
to either columns or rows of a matrix, using a grouping variable.
## S4 method for signature 'ANY' colsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'matter_mat' colsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'sparse_mat' colsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'ANY' rowsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'matter_mat' rowsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'sparse_mat' rowsweep(x, STATS, FUN = "-", group = NULL, ...)
## S4 method for signature 'ANY' colsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'matter_mat' colsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'sparse_mat' colsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'ANY' rowsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'matter_mat' rowsweep(x, STATS, FUN = "-", group = NULL, ...) ## S4 method for signature 'sparse_mat' rowsweep(x, STATS, FUN = "-", group = NULL, ...)
x |
A matrix-like object. |
STATS |
The summary statistic to be swept out, with length equal to the number of columns of 'x' (for |
FUN |
The function to be used to carry out the sweep. |
group |
A vector or factor giving the groupings with length equal to the number of rows of 'x' (for |
... |
Ignored. |
See sweep
for details.
A matrix-like object (usually of the same class as x
) with the statistics swept out.
Kylie A. Bemis
set.seed(1) x <- matrix(1:100, nrow=10, ncol=10) colsweep(x, colStats(x, "mean"))
set.seed(1) x <- matrix(1:100, nrow=10, ncol=10) colsweep(x, colStats(x, "mean"))
Convolve a signal with weights at arbitrary indices.
convolve_at(x, index, weights, margin = NULL, na.rm = FALSE)
convolve_at(x, index, weights, margin = NULL, na.rm = FALSE)
x |
A numeric vector or matrix. |
index |
A list or matrix of numeric vectors giving the indices to convolve. If |
weights |
A list giving the weights of the kernels to convolve for each element of |
margin |
If |
na.rm |
Should missing values (including |
This is essentially just a weighted sum defined by x[i] = sum(weights[[i]] * x[index[[i]]])
.
A numeric vector the same length as x
with the smoothed result.
Kylie A. Bemis
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=5000) y <- sin(t) + 0.6 * sin(2.6 * t) x <- y + runif(length(y)) i <- roll(seq_along(x), width=15) wt <- dnorm((-7):7, sd=7/2) wt <- wt / sum(wt) xs <- convolve_at(x, i, wt) plot(x, type="l") lines(xs, col="red")
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=5000) y <- sin(t) + 0.6 * sin(2.6 * t) x <- y + runif(length(y)) i <- roll(seq_along(x), width=15) wt <- dnorm((-7):7, sd=7/2) wt <- wt / sum(wt) xs <- convolve_at(x, i, wt) plot(x, type="l") lines(xs, col="red")
Compute Manders overlap coefficient (MOC), and Manders colocalization coefficients (M1 and M2), and Dice similarity coefficient.
coscore(x, y, threshold = NA)
coscore(x, y, threshold = NA)
x , y
|
Images to be compared. These can be numeric or logical. If numeric, then the "overlap" is defined where both images are nonzero. |
threshold |
The intensity threshold to use when comparing the images. If |
The Dice coefficient and Manders overlap coefficient are symmetric between images, while M1 and M2 measure the overlap relative to x
and y
respectively.
A numeric vector with elements named "MOC", "M1", "M2", and "Dice", and an attribute named "threshold" giving the numeric thresholds (if applicable) for converting each image to a logical mask.
Kylie A. Bemis
K. W. Dunn, M. M. Kamocka, and J. H. McDonald. “A practical guide to evaluating colocalization in biological microscop.” American Journal of Physiology: Cell Physiology, vol. 300, no. 4, pp. C732-C742, 2011.
S. V. Costes, D. Daelemans, E. H. Cho, Z. Dobbin, G. Pavlakis, and S. Lockett. “Automatic and Quantitative Measurement of Protein-Protein Colocalization in Live Cells.” Biophysical Journal, vol. 86, no. 6, pp. 3993-4003, 2004.
K. H. Zou, S. K. Warfield, A. Bharatha, C. M. C. Tempany, M. R. Kaus, S. J. Haker, W. M. Wells, III, F. A. Jolesz, and R. Kikinis. “Statistical Validation of Image Segmentation Quality Based on a Spatial Overlap Index.” Academic Radiology, vol. 11, issue 2, pp. 178-189, 2004.
set.seed(1) y <- x <- matrix(0, nrow=32, ncol=32) x[5:16,5:16] <- 1 x[17:28,17:28] <- 1 x <- x + runif(length(x)) y[4:15,4:15] <- 1 y[18:29,18:29] <- 1 y <- y + runif(length(y)) xl <- x > median(x) yl <- y > median(y) coscore(x, x) coscore(x, y) coscore(x, y, threshold=median) coscore(xl, yl)
set.seed(1) y <- x <- matrix(0, nrow=32, ncol=32) x[5:16,5:16] <- 1 x[17:28,17:28] <- 1 x <- x + runif(length(x)) y[4:15,4:15] <- 1 y[18:29,18:29] <- 1 y <- y + runif(length(y)) xl <- x > median(x) yl <- y > median(y) coscore(x, x) coscore(x, y) coscore(x, y, threshold=median) coscore(xl, yl)
These functions provide simple color palettes.
## Continuous color palettes cpal(palette = "Viridis") ## Discrete color palettes dpal(palette = "Tableau 10") ## List palettes cpals() dpals() # Add transparency to colors add_alpha(colors, alpha = 1, pow = 1)
## Continuous color palettes cpal(palette = "Viridis") ## Discrete color palettes dpal(palette = "Tableau 10") ## List palettes cpals() dpals() # Add transparency to colors add_alpha(colors, alpha = 1, pow = 1)
palette |
The name of a color palette. See |
colors |
A character vector of colors to add transparency to. |
alpha |
A numeric vector giving the level of transparency in the range [0, 1] where 0 is fully transparent and 1 is fully opaque. |
pow |
The power scaling of the alpha channel. A linear alpha scale often results in poor interpretability for superposed images, so raising the alpha channel (already in range [0, 1]) to a power > 1 can improve interpretability in these cases. Conversely, for highly skewed data, using a power < 1 can reduce the impact of extreme values. |
A character vector of colors or a function for generating n
colors.
Kylie A. Bemis
f <- cpal("viridis") cols <- f(10) add_alpha(cols, 1:10/10)
f <- cpal("viridis") cols <- f(10) add_alpha(cols, 1:10/10)
Perform k-fold cross-validation with an arbitrary modeling function.
cv_do(fit., x, y, folds, ..., mi = !is.null(bags), bags = NULL, pos = 1L, predict. = predict, transpose = FALSE, keep.models = TRUE, trainProcess = NULL, trainArgs = list(), testProcess = NULL, testArgs = list(), verbose = NA, chunkopts = list(), BPPARAM = bpparam()) ## S3 method for class 'cv' fitted(object, type = c("response", "class"), simplify = TRUE, ...)
cv_do(fit., x, y, folds, ..., mi = !is.null(bags), bags = NULL, pos = 1L, predict. = predict, transpose = FALSE, keep.models = TRUE, trainProcess = NULL, trainArgs = list(), testProcess = NULL, testArgs = list(), verbose = NA, chunkopts = list(), BPPARAM = bpparam()) ## S3 method for class 'cv' fitted(object, type = c("response", "class"), simplify = TRUE, ...)
fit. |
The function used to fit the model. |
x , y
|
The data and response variable. |
folds |
A vector coercible to a factor giving the fold for each row or column of |
mi |
Should |
bags |
If provided, subsetted and passed to |
pos |
The positive class for multiple instance learning. Only used if |
... |
Additional arguments passed to |
predict. |
The function used to predict on new data from the fitted model. The fitted model is passed as the 1st argument and the test data is passed as the 2nd argument. |
transpose |
A logical value indicating whether |
keep.models |
Should the models be kept and returned? |
trainProcess , trainArgs
|
A function and arguments used for processing the training sets. The training set is passed as the 1st argument to |
testProcess , testArgs
|
A function and arguments used for processing the test sets. The test set is passed as the 1st argument to |
verbose |
Should progress be printed for each iteration? |
chunkopts |
Passed to |
BPPARAM |
An optional instance of |
object |
An object inheriting from |
type |
The type of prediction, where |
simplify |
Should the predictions be simplified (from a list) to an array ( |
The cross-validation is not performed in parallel, because it is assumed the pre-processing functions, modeling function, and prediction function may make use of parallelization. Therefore, these functions need to be able to handle (or ignore) the arguments nchunks
and BPPARAM
, which will be passed to them.
If bags
is specified, then multiple instance learning is assumed, where observations from the same bag are all assumed to have the same label. The labels for bags are automatically pooled (from y
) so that if any observation in a bag is pos
, then the entire bag is labeled pos
. If mi=TRUE
then mi_learn
will be called by cv_do
; otherwise it is assumed fn
will handle the multiple instance learning. The accuracy metrics are calculated with the original y
labels.
An object of class cv
, with the following components:
average
: The average accuracy metrics.
scores
: The fold-specific accuracy metrics.
folds
: The fold memberships.
fitted.values
: The fold-specific predictions.
models
: (Optional) The fitted models.
Kylie A. Bemis
register(SerialParam()) set.seed(1) n <- 100 p <- 5 nfolds <- 3 y <- rep(c(rep.int("yes", 60), rep.int("no", 40)), nfolds) x <- matrix(rnorm(nfolds * n * p), nrow=nfolds * n, ncol=p) x[,1L] <- x[,1L] + 2 * ifelse(y == "yes", runif(n), -runif(n)) x[,2L] <- x[,2L] + 2 * ifelse(y == "no", runif(n), -runif(n)) folds <- rep(paste0("set", seq_len(nfolds)), each=n) cv_do(pls_nipals, x, y, k=1:5, folds=folds)
register(SerialParam()) set.seed(1) n <- 100 p <- 5 nfolds <- 3 y <- rep(c(rep.int("yes", 60), rep.int("no", 40)), nfolds) x <- matrix(rnorm(nfolds * n * p), nrow=nfolds * n, ncol=p) x[,1L] <- x[,1L] + 2 * ifelse(y == "yes", runif(n), -runif(n)) x[,2L] <- x[,2L] + 2 * ifelse(y == "no", runif(n), -runif(n)) folds <- rep(paste0("set", seq_len(nfolds)), each=n) cv_do(pls_nipals, x, y, k=1:5, folds=folds)
Some arithmetic, comparison, and logical operations are available as delayed operations on matter_arr
and sparse_arr
objects.
Currently the following delayed operations are supported:
‘Arith’: ‘+’, ‘-’, ‘*’, ‘/’, ‘^’, '
‘Math’: ‘exp’, ‘log’, ‘log2’, ‘log10’
Arithmetic operations are applied in C++ layer immediately after the elements are read from virtual memory. This means that operations that are implemented in C and/or C++ for efficiency (such as summary statistics) will also reflect the execution of the deferred arithmetic operations.
A new matter
object with the registered deferred operation. Data in storage is not modified; only object metadata is changed.
Kylie A. Bemis
Arith
,
Compare
,
Logic
,
Ops
,
Math
x <- matter(1:100) y <- 2 * x + 1 x[1:10] y[1:10] mean(x) mean(y)
x <- matter(1:100) y <- 2 * x + 1 x[1:10] y[1:10] mean(x) mean(y)
These functions are provided for compatibility with older versions of matter, and will be removed in the future.
Downsamples a signal for the purposes of visualization. A subset of the original samples are used to represent the signal. The downsampled signal is intended to resemble the original when visualized and should not typically be used for downstream processing.
downsample(x, n = length(x) / 10L, domain = NULL, method = c("lttb", "ltob", "dynamic"))
downsample(x, n = length(x) / 10L, domain = NULL, method = c("lttb", "ltob", "dynamic"))
x |
A numeric vector. |
n |
The length of the downsampled signal. |
domain |
The domain variable of the signal. |
method |
The downsampling method to be used. Must be one of |
This function implements the downsampling methods from Sveinn Steinarsson's 2013 MSc thesis Downsampling Time Series for Visual Representation, including largest-triangle-three-buckets (LTTB), largest-triangle-one-bucket (LTOB), and dynamic binning.
A vector of length n
, giving the downsampled signal.
Kylie A. Bemis
S. Steinarsson. “Downsampling Time Series for Visual Representation.” MSc, School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland, June 2013.
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=2000) x <- sin(t) + 0.6 * sin(2.6 * t) x <- x + runif(length(x)) xs <- downsample(x, n=200) s <- attr(xs, "sample") plot(x, type="l") points(s, xs, col="red", type="b")
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=2000) x <- sin(t) + 0.6 * sin(2.6 * t) x <- x + runif(length(x)) xs <- downsample(x, n=200) s <- attr(xs, "sample") plot(x, type="l") points(s, xs, col="red", type="b")
The drle
class stores delta-run-length-encoded vectors. These differ from other run-length-encoded vectors provided by other packages in that they allow for runs of values that each differ by a common difference (delta).
## Instance creation drle(x, type = "drle", cr_threshold = 0) is.drle(x) ## Additional methods documented below
## Instance creation drle(x, type = "drle", cr_threshold = 0) is.drle(x) ## Additional methods documented below
x |
An integer or numeric vector to convert to delta run length encoding for |
type |
The type of compression to use. Must be "drle", "rle", or "seq". The default ("drle") allows arbitrary deltas. Using type "rle" means that runs must consist of a single value (i.e., deltas must be 0). Using type "seq" means that deltas must be 1, -1, or 0. |
cr_threshold |
The compression ratio threshold to use when converting a vector to delta run length encoding. The default (0) always converts the object to |
An object of class drle
.
values
:The values that begin each run.
lengths
:The length of each run.
deltas
:The difference between the values of each run.
drle
instances can be created through drle()
.
Standard generic methods:
x[i]
:Get the elements of the uncompressed vector.
length(x)
:Get the length of the uncompressed vector.
c(x, ...)
:Combine vectors.
Kylie A. Bemis
## Create a drle vector x <- c(1,1,1,1,1,6,7,8,9,10,21,32,33,34,15) y <- drle(x) # Check that their elements are equal x == y[]
## Create a drle vector x <- c(1,1,1,1,1,6,7,8,9,10,21,32,33,34,15) y <- drle(x) # Check that their elements are equal x == y[]
Enhance the contrast in a 2D signal.
# Adjust by extreme values enhance_adj(x, frac = 0.01) # Histogram equalization enhance_hist(x, nbins = 256L) # Contrast-limited adaptive histogram equalization (CLAHE) enhance_adapt(x, width = sqrt(nrow(x) * ncol(x)) %/% 5L, clip = 0.1, nbins = 256L)
# Adjust by extreme values enhance_adj(x, frac = 0.01) # Histogram equalization enhance_hist(x, nbins = 256L) # Contrast-limited adaptive histogram equalization (CLAHE) enhance_adapt(x, width = sqrt(nrow(x) * ncol(x)) %/% 5L, clip = 0.1, nbins = 256L)
x |
A numeric matrix. |
frac |
The fraction of the highest and lowest pixel values to adjust by clamping the overall intensity range. |
nbins |
The number of gray levels in the output image. |
clip |
The normalized clip limit, expressed as a fraction of the neighborhood size. This is used to limit the maximum value of any bin in the adaptive histograms, in order avoid amplifying local noise. |
width |
The width of the sliding window used when calculating the local adaptive histograms. |
enhance_adj()
performs a simple adjustment of the overall image intensity range by clamping a fraction of the highest and lowest pixel values. This is useful for suppressing very bright hotspots, but may not be sufficient for images with globally poor contrast.
enhance_hist()
performs histogram equalization. Histogram equalization transforms the pixel values so that the histogram of the image is approximately flat. This is done by replacing the original pixel values with their associated probability in the image's empirical cumulative distribution.
enhance_adapt()
performs contrast-limited adaptive histogram equalization (CLAHE) from Zuiderveld (1994). While ordinary histogram equalization performs a global transformation on the image, adaptive histogram equalization calculates a histogram in a local neighborhood around each pixel to perform the transformation, thereby enhancing the local contrast across the image. However, this can amplify local noise, so to avoid this, the histogram is clipped to a maximum allowed bin value before transforming the pixel values. To speed up the computation, it is implemented here using a sliding window technique as described by Wang and Tao (2006).
These methods rescale the output image so that its median equals the median of the original image and it has equal interquartile range (IQR).
A numeric matrix the same dimensions as x
with the smoothed result.
Kylie A. Bemis
K. Zuiderveld. “Contrast Limited Adaptive Histogram Equalization.” Graphics Gems IV, Academic Press, pp. 474-485, 1994.
Z. Wang and J. Tao. “A Fast Implementation of Adaptive Histogram Equalization.” IEEE 8th international Conference on Signal Processing, Nov 2006.
set.seed(1) x <- matrix(0, nrow=32, ncol=32) x[9:24,9:24] <- 10 x <- x + runif(length(x)) y <- x + rlnorm(length(x)) z <- enhance_hist(y) par(mfcol=c(1,3)) image(x, col=hcl.colors(256), main="original") image(y, col=hcl.colors(256), main="multiplicative noise") image(z, col=hcl.colors(256), main="histogram equalization")
set.seed(1) x <- matrix(0, nrow=32, ncol=32) x[9:24,9:24] <- 10 x <- x + runif(length(x)) y <- x + rlnorm(length(x)) z <- enhance_hist(y) par(mfcol=c(1,3)) image(x, col=hcl.colors(256), main="original") image(y, col=hcl.colors(256), main="multiplicative noise") image(z, col=hcl.colors(256), main="histogram equalization")
Estimate the continuum (baseline) of a signal.
# Continuum based on local extrema estbase_loc(x, smooth = c("none", "loess", "spline"), span = 1/10, spar = NULL, upper = FALSE) # Convex hull estbase_hull(x, upper = FALSE) # Sensitive nonlinear iterative peak clipping (SNIP) estbase_snip(x, width = 100L, decreasing = TRUE) # Running medians estbase_med(x, width = 100L)
# Continuum based on local extrema estbase_loc(x, smooth = c("none", "loess", "spline"), span = 1/10, spar = NULL, upper = FALSE) # Convex hull estbase_hull(x, upper = FALSE) # Sensitive nonlinear iterative peak clipping (SNIP) estbase_snip(x, width = 100L, decreasing = TRUE) # Running medians estbase_med(x, width = 100L)
x |
A numeric vector. |
smooth |
A smoothing method to be applied after linearly interpolating the continuum. |
span , spar
|
Smoothing parameters for loess and spline smoothing, respectively. |
upper |
Should the upper continuum be estimated instead of the lower continuum? |
width |
The width of the smoothing window in number of samples. |
decreasing |
Use a decreasing clipping window instead of an increasing window. |
estbase_loc()
uses a simple method based on linearly interpolating from local extrema. It typically performs well enough for most situations. Signals with strong noise or wide peaks may require stronger smoothing after the interpolation step.
estbase_hull()
estimates the continuum by finding the lower or upper convex hull using the monotonic chain algorithm of A. M. Andrew (1979).
estbase_snip()
performs sensitive nonlinear iterative peak (SNIP) clipping using the adaptive clipping window from M. Morhac (2009).
estbase_med()
estimates the continuum from running medians.
A numeric vector the same length as x
with the estimated continuum.
Kylie A. Bemis
A. M. Andrew. “Another efficient algorithm for convex hulls in two dimensions.” Information Processing Letters, vol. 9, issue 5, pp. 216-219, Dec. 1979.
M. Morhac. “An algorithm for determination of peak regions and baseline elimination in spectroscopic data.” Nuclear Instruments and Methods in Physics Research A, vol. 600, issue 2, pp. 478-487, Mar. 2009.
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=2000) x <- sin(t) + 0.6 * sin(2.6 * t) lo <- estbase_hull(x) hi <- estbase_hull(x, upper=TRUE) plot(x, type="l") lines(lo, col="red") lines(hi, col="blue")
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=2000) x <- sin(t) + 0.6 * sin(2.6 * t) lo <- estbase_hull(x) hi <- estbase_hull(x, upper=TRUE) plot(x, type="l") lines(lo, col="red") lines(hi, col="blue")
Estimate the raster dimensions of a scattered 2D signal based on its pixel coordinates.
estdim(x, tol = 1e-6)
estdim(x, tol = 1e-6)
x |
A numeric matrix or data frame where each column gives the pixel coordinates for a different dimension. Only 2 or 3 dimensions are supported if the coordinates are irregular. Otherwise, any number of dimensions are supported. |
tol |
The tolerance allowed when estimating the resolution (i.e., pixel sizes) using |
A numeric vector giving the estimated raster dimensions.
Kylie A. Bemis
co <- expand.grid(x=1:12, y=1:9) co$x <- jitter(co$x) co$y <- jitter(co$y) estdim(co)
co <- expand.grid(x=1:12, y=1:9) co$x <- jitter(co$x) co$y <- jitter(co$y) estdim(co)
Estimate the noise across a signal.
# Difference-based noise estimation estnoise_diff(x, nbins = 1L, overlap = 0.5, index = NULL) # Dynamic noise level filtering estnoise_filt(x, nbins = 1L, overlap = 0.5, msnr = 2, threshold = 0.5, isPeaks = FALSE, index = NULL) # SD-based noise estimation estnoise_sd(x, nbins = 1, overlap = 0.5, k = 9, index = NULL) # MAD-based noise estimation estnoise_mad(x, nbins = 1, overlap = 0.5, k = 9, index = NULL) # Quantile-based noise estimation estnoise_quant(x, nbins = 1, overlap = 0.5, k = 9, prob = 0.95, index = NULL)
# Difference-based noise estimation estnoise_diff(x, nbins = 1L, overlap = 0.5, index = NULL) # Dynamic noise level filtering estnoise_filt(x, nbins = 1L, overlap = 0.5, msnr = 2, threshold = 0.5, isPeaks = FALSE, index = NULL) # SD-based noise estimation estnoise_sd(x, nbins = 1, overlap = 0.5, k = 9, index = NULL) # MAD-based noise estimation estnoise_mad(x, nbins = 1, overlap = 0.5, k = 9, index = NULL) # Quantile-based noise estimation estnoise_quant(x, nbins = 1, overlap = 0.5, k = 9, prob = 0.95, index = NULL)
x |
A numeric vector. |
nbins |
The number of bins to use if estimating a variable noise level. |
overlap |
The fraction of overlap between bins if estimating a variable noise level. |
msnr |
The minimum signal-to-noise ratio for distinguishing signal peaks from noise peaks. |
threshold |
The required signal-to-noise difference for the first non-noise peak. |
isPeaks |
Does |
index |
A matrix or list of domain vectors giving the sampling locations of |
k |
The size of the smoothing window when calculating the difference between the raw signal and a smoothed version of the signal. |
prob |
The quantile used when estimating the noise. |
estnoise_diff()
estimates the local noise from the mean absolute differences between the signal and the average of its derivative. For noisy signals, the derivative is dominated by the noise, making it a useful estimator of the noise.
estnoise_sd()
, estnoise_mad()
, and estnoise_quant()
all estimate the local noise by first smoothing the signal with a Gaussian filter, and then subtracting the raw signal from the smoothed signal to isolate the noise component. The noise is then summarized with the corresponding statistic.
estnoise_filt()
uses the dynamic noise level filtering algorithm of Xu and Freitas (2010) based on the local signal in an approach similar to Gallia et al. (2013). The peaks in the signal are sorted, and the smallest peak is assumed to be noise and is used to estimate the noise level. Each peak is then compared to the previous peak. A peak is labeled a signal peak only if it exceeds a minimum signal-to-noise ratio. Otherwise, the peak is labeled noise, and the noise level is re-estimated. This process continues until a signal peak is found, and the noise level is estimated from the noise peaks.
A numeric vector the same length as x
with the estimated local noise level.
Kylie A. Bemis
H. Xu and M. A. Freitas. “A Dynamic Noise Level Algorithm for Spectral Screening of Peptide MS/MS Spectra.” BMC Bioinformatics, vol. 11, no. 436, Aug. 2010.
J. Gallia, K. Lavrich, A. Tan-Wilson, and P. H. Madden. “Filtering of MS/MS data for peptide identification.” BMC Genomics, vol. 14, suppl. 7, Nov. 2013.
# simple signal set.seed(1) n <- 500 x <- rnorm(n) x <- x + 90 * dnorm(seq_along(x), mean=n/4) x <- x + 80 * dnorm(seq_along(x), mean=n/2) x <- x + 70 * dnorm(seq_along(x), mean=3*n/4) ns <- estnoise_quant(x) plot(x, type="l") lines(ns, col="blue") # simulated spectrum set.seed(1) x <- simspec(size=5000) ns <- estnoise_quant(x, nbins=20) plot(x, type="l") lines(ns, col="blue")
# simple signal set.seed(1) n <- 500 x <- rnorm(n) x <- x + 90 * dnorm(seq_along(x), mean=n/4) x <- x + 80 * dnorm(seq_along(x), mean=n/2) x <- x + 70 * dnorm(seq_along(x), mean=3*n/4) ns <- estnoise_quant(x) plot(x, type="l") lines(ns, col="blue") # simulated spectrum set.seed(1) x <- simspec(size=5000) ns <- estnoise_quant(x, nbins=20) plot(x, type="l") lines(ns, col="blue")
Estimate the resolution (approximate sampling rate) of a signal based on its domain values.
estres(x, tol = NA, ref = NA_character_)
estres(x, tol = NA, ref = NA_character_)
x |
A numeric vector giving the domain values of the signal. |
tol |
The tolerance allowed when determining if the estimated resolution is valid (i.e., actually matches the given domain values). Noise in the sampling rate will be allowed up to this amount. If |
ref |
If 'abs', then comparison is done by taking the absolute difference. If 'x', then relative differences are used. If missing, then the funciton will try to determine which gives a better fit to the domain values. |
A single number named "absolute" or "relative" giving the approximate constant sampling rate matching the given domain values. NA
if a sampling rate could not be determined.
Kylie A. Bemis
x <- seq_rel(501, 600, by=1e-3) estres(x)
x <- seq_rel(501, 600, by=1e-3) estres(x)
The FastMap algorithm performs approximate multidimensional scaling (MDS) based on any distance function. It is faster and more efficient than traditional MDS algorithms, scaling as O(n) rather than O(n^2). FastMap accomplishes this by finding two distant pivot objects on some hyperplane for each projected dimension, and then projecting all other objects onto the line between these pivots.
# FastMap projection fastmap(x, k = 3L, group = NULL, distfun = NULL, transpose = FALSE, pivots = 10L, niter = 10L, verbose = NA, ...) ## S3 method for class 'fastmap' predict(object, newdata, ...)
# FastMap projection fastmap(x, k = 3L, group = NULL, distfun = NULL, transpose = FALSE, pivots = 10L, niter = 10L, verbose = NA, ...) ## S3 method for class 'fastmap' predict(object, newdata, ...)
x |
A numeric matrix-like object. |
k |
The number of FastMap components to project. |
group |
Grouping variable if pivots should be guaranteed to come from different groups. |
distfun |
A distance function with the same usage (i.e., supports the same arguments and return values) as |
transpose |
A logical value indicating whether |
pivots |
The number of pivot candidates to attempt each iteration. Using more pivot candidates can help improve the quality of the pivot selection. Using fewer can help speed up computation. |
niter |
The maximum number of iterations for selecting the pivots. |
... |
Additional options passed to |
object |
An object inheriting from |
newdata |
An optional data matrix to use for the prediction. |
verbose |
Should progress be printed for each pivot iteration and FastMap component projection? |
The pivots are initialized randomly for each new dimension, so the selection of pivots (and therefore the resulting projection) can be sensitive to the random seed for some datasets.
A custom distance function can be passed via distfun
. If not provided, then this defaults to rowDists
if transpose=FALSE
or colDists
if transpose=TRUE
.
If a custom function is passed, it must support the same arguments and return values as rowDists
and colDists
.
An object of class fastmap
, with the following components:
x
: The projected variable matrix.
sdev
: The standard deviations of each column of the projected matrix x
.
pivots
: A matrix giving the indices of the pivots and the distances between them.
pivot.array
: A subset of the original data matrix containing only the pivots.
distfun
: The function used to generate the distance function.
Kylie A. Bemis
C. Faloutsos, and D. Lin. “FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets.” Proceedings of the 1995 ACM SIGMOD international conference on Management of data, pp. 163 - 174, June 1995.
rowDists
,
colDists
,
cmdscale
,
prcomp
register(SerialParam()) set.seed(1) a <- matrix(sort(runif(500)), nrow=50, ncol=10) b <- matrix(rev(sort(runif(500))), nrow=50, ncol=10) x <- cbind(a, b) fm <- fastmap(x, k=2)
register(SerialParam()) set.seed(1) a <- matrix(sort(runif(500)), nrow=50, ncol=10) b <- matrix(rev(sort(runif(500))), nrow=50, ncol=10) x <- cbind(a, b) fm <- fastmap(x, k=2)
These are generic functions for moving data in R objects between shared memory and file storage for sharing with other R processes.
fetch(object, ...) flash(object, ...)
fetch(object, ...) flash(object, ...)
object |
An object with data to move. |
... |
Additional arguments to the |
The fetch
methods for matter
objects return new matter
objects that use shared memory storage.
The flash
methods for matter
objects return new matter
objects that use file storage.
A new object (typically of the same class but not necessarily) with data in the specified storage format.
Kylie A. Bemis
set.seed(1) x <- as.matter(runif(10)) path(x) # copy x into shared memory y <- fetch(x) path(y) # copy y into file storage z <- flash(y) path(z)
set.seed(1) x <- as.matter(runif(10)) path(x) # copy x into shared memory y <- fetch(x) path(y) # copy y into file storage z <- flash(y) path(z)
Smooth a uniformly sampled 1D signal.
# Moving average filter filt1_ma(x, width = 5L) # Linear convolution filter filt1_conv(x, weights) # Gaussian filter filt1_gauss(x, width = 5L, sd = (width %/% 2) / 2) # Bilateral filter filt1_bi(x, width = 5L, sddist = (width %/% 2) / 2, sdrange = mad(x, na.rm = TRUE)) # Bilateral filter with adaptive parameters filt1_adapt(x, width = 5L, spar = 1) # Nonlinear diffusion filt1_diff(x, niter = 3L, kappa = 50, rate = 0.25, method = 1L) # Guided filter filt1_guide(x, width = 5L, guide = x, sdreg = mad(x, na.rm = TRUE)) # Peak-aware guided filter filt1_pag(x, width = 5L, guide = NULL, sdreg = mad(x, na.rm = TRUE), ftol = 1/10) # Savitzky-Golay filter filt1_sg(x, width = 5L, order = min(3L, width - 2L), deriv = 0, delta = 1)
# Moving average filter filt1_ma(x, width = 5L) # Linear convolution filter filt1_conv(x, weights) # Gaussian filter filt1_gauss(x, width = 5L, sd = (width %/% 2) / 2) # Bilateral filter filt1_bi(x, width = 5L, sddist = (width %/% 2) / 2, sdrange = mad(x, na.rm = TRUE)) # Bilateral filter with adaptive parameters filt1_adapt(x, width = 5L, spar = 1) # Nonlinear diffusion filt1_diff(x, niter = 3L, kappa = 50, rate = 0.25, method = 1L) # Guided filter filt1_guide(x, width = 5L, guide = x, sdreg = mad(x, na.rm = TRUE)) # Peak-aware guided filter filt1_pag(x, width = 5L, guide = NULL, sdreg = mad(x, na.rm = TRUE), ftol = 1/10) # Savitzky-Golay filter filt1_sg(x, width = 5L, order = min(3L, width - 2L), deriv = 0, delta = 1)
x |
A numeric vector. |
width |
The width of the smoothing window in number of samples. Must be positive. Must be odd. |
weights |
The weights of the linear convolution kernel. Length must be odd. |
sd , sddist
|
The spatial parameter for kernel-based filters. This controls the strength of smoothing for samples farther from the center of the smoothing window. |
sdrange |
The range parameter for kernel-based filters. This controls the strength of the smoothing for samples with signal values very different from the center of the smoothing window. |
spar |
The strength of the smoothing when calculating the adaptive bilateral filtering parameters. The larger the number, the stronger the smoothing. Must be positive. |
kappa |
The constant for the conduction coefficient for nonlinear diffusion. Must be positive. |
rate |
The rate of diffusion. Must be between 0 and 0.25 for stability. |
method |
The diffusivity method, where |
niter |
The number of iterations for nonlinear diffusion. Must be positive. |
guide |
The guide signal for guided filtering. This is the signal used to determine the degree of filtering for different regions of the sample. By default, it is the same as the signal to be smoothed. |
sdreg |
The regularization parameter for guided filtering. This is analagous to the range parameter for kernel-based filters. Signal regions with variance much smaller than this value are smoothed, while signal regions with varaince much larger than this value are preserved. |
ftol |
Specifies how large the signal value must be before it is considered a peak, expressed as a fraction of the maximum value in the signal. |
order |
The polynomial order for the Savitzky-Golay filter coefficients. |
deriv |
The order of the derivative for the Savitzky-Golay filter coefficients. |
delta |
The sample spacing for the Savitzky-Golay filter. Only used if |
filt1_ma()
performs mean filtering in O(n) time. This is fast and especially useful for calculating other filters that can be constructed as a combination of mean filters.
filt1_gauss()
performs Gaussian filtering.
filt1_bi()
and filt1_adapt()
perform edge-preserving bilateral filtering. The latter calculates the kernel parameters adapatively based on the local signal, using a strategy adapted from Joseph and Periyasamy (2018).
filt1_diff()
performs the nonlinear diffusion filtering of Perona and Malik (1990). Rather than relying on a filter width, it progressively diffuses (smooths) the signal over multiple iterations. More iterations will result in a smoother image.
filt1_guide()
performs edge-preserving guided filtering. Guided filtering uses a local linear model based on the structure of a so-called "guidance signal". By default, the guidance signal is often the same as the input signal. Guided filtering performs similarly to bilateral filtering, but is often faster (though with more memory use), as it is implemented as a combination of mean filters.
filt1_pag()
performs peak-aware guided filtering using a regularization parameter that focuses on preserving peaks rather than edges, using a strategy adapted from Liu and He (2022). By default, the guidance signal is generated by smoothing the input signal with nonlinear diffusion.
filt1_sg()
performs traditional Savitzky-Golay filtering, which uses a local least-squares polynomial approximation to perform the smoothing. It reduces noise while attempting to retain the peak shape and height.
A numeric vector the same length as x
with the smoothed result.
Kylie A. Bemis
J. Joseph and R. Perisamy. “An image driven bilateral filter with adaptive range and spatial parameters for denoising Magnetic Resonance Images.” Computers and Electrical Engineering, vol. 69, pp. 782-795, July 2018.
P. Perona and J. Malik. “Scale-space and edge detection using anisotropic diffusion.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, issue 7, pp. 629-639, July 1990.
K. He, J. Sun, and X. Tang. “Guided Image Filtering.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 6, pp. 1397-1409, June 2013.
D. Liu and C. He. “Peak-aware guided filtering for spectrum signal denoising.” Chemometrics and Intelligent Laboratory Systems, vol. 222, March 2022.
A. Savitzky and M. J. E. Golay. “Smoothing and Differentiation of Data by Simplified Least Squares Procedures.” Analytical Chemistry, vol. 36, no. 8, pp. 1627-1639, July 1964.
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=5000) y <- sin(t) + 0.6 * sin(2.6 * t) x <- y + runif(length(y)) xs <- filt1_gauss(x, width=25) plot(x, type="l") lines(xs, col="red")
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=5000) y <- sin(t) + 0.6 * sin(2.6 * t) x <- y + runif(length(y)) xs <- filt1_gauss(x, width=25) plot(x, type="l") lines(xs, col="red")
Smooth a uniformly sampled 2D signal.
# Moving average filter filt2_ma(x, width = 5L) # Linear convolution filter filt2_conv(x, weights) # Gaussian filter filt2_gauss(x, width = 5L, sd = (width %/% 2) / 2) # Bilateral filter filt2_bi(x, width = 5L, sddist = (width %/% 2) / 2, sdrange = mad(x, na.rm = TRUE)) # Bilateral filter with adaptive parameters filt2_adapt(x, width = 5L, spar = 1) # Nonlinear diffusion filt2_diff(x, niter = 3L, kappa = 50, rate = 0.25, method = 1L) # Guided filter filt2_guide(x, width = 5L, guide = x, sdreg = mad(x, na.rm = TRUE))
# Moving average filter filt2_ma(x, width = 5L) # Linear convolution filter filt2_conv(x, weights) # Gaussian filter filt2_gauss(x, width = 5L, sd = (width %/% 2) / 2) # Bilateral filter filt2_bi(x, width = 5L, sddist = (width %/% 2) / 2, sdrange = mad(x, na.rm = TRUE)) # Bilateral filter with adaptive parameters filt2_adapt(x, width = 5L, spar = 1) # Nonlinear diffusion filt2_diff(x, niter = 3L, kappa = 50, rate = 0.25, method = 1L) # Guided filter filt2_guide(x, width = 5L, guide = x, sdreg = mad(x, na.rm = TRUE))
x |
A numeric matrix. |
width |
The width of the smoothing window in number of samples. Must be positive. Must be odd. |
weights |
A matrix of weights for the linear convolution kernel. Dimensions must be odd. |
sd , sddist
|
The spatial parameter for kernel-based filters. This controls the strength of smoothing for samples farther from the center of the smoothing window. |
sdrange |
The range parameter for kernel-based filters. This controls the strength of the smoothing for samples with signal values very different from the center of the smoothing window. |
spar |
The strength of the smoothing when calculating the adaptive bilateral filtering parameters. The larger the number, the stronger the smoothing. Must be positive. |
kappa |
The constant for the conduction coefficient for nonlinear diffusion. Must be positive. |
rate |
The rate of diffusion. Must be between 0 and 0.25 for stability. |
method |
The diffusivity method, where |
niter |
The number of iterations for nonlinear diffusion. Must be positive. |
guide |
The guide signal for guided filtering. This is the signal used to determine the degree of filtering for different regions of the sample. By default, it is the same as the signal to be smoothed. |
sdreg |
The regularization parameter for guided filtering. This is analagous to the range parameter for kernel-based filters. Signal regions with variance much smaller than this value are smoothed, while signal regions with varaince much larger than this value are preserved. |
filt2_ma()
performs mean filtering in O(n) time. This is fast and especially useful for calculating other filters that can be constructed as a combination of mean filters.
filt2_gauss()
performs Gaussian filtering.
filt2_bi()
and filt2_adapt()
perform edge-preserving bilateral filtering. The latter calculates the kernel parameters adapatively based on the local signal, using a strategy adapted from Joseph and Periyasamy (2018).
filt2_diff()
performs the nonlinear diffusion filtering of Perona and Malik (1990). Rather than relying on a filter width, it progressively diffuses (smooths) the signal over multiple iterations. More iterations will result in a smoother image.
filt2_guide()
performs edge-preserving guided filtering. Guided filtering uses a local linear model based on the structure of a so-called "guidance signal". By default, the guidance signal is often the same as the input signal. Guided filtering performs similarly to bilateral filtering, but is often faster (though with more memory use), as it is implemented as a combination of mean filters.
A numeric matrix the same dimensions as x
with the smoothed result.
Kylie A. Bemis
J. Joseph and R. Perisamy. “An image driven bilateral filter with adaptive range and spatial parameters for denoising Magnetic Resonance Images.” Computers and Electrical Engineering, vol. 69, pp. 782-795, July 2018.
P. Perona and J. Malik. “Scale-space and edge detection using anisotropic diffusion.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, issue 7, pp. 629-639, July 1990.
K. He, J. Sun, and X. Tang. “Guided Image Filtering.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 6, pp. 1397-1409, June 2013.
set.seed(1) i <- seq(-4, 4, length.out=12) j <- seq(1, 3, length.out=9) co <- expand.grid(i=i, j=j) x <- matrix(atan(co$i / co$j), nrow=12, ncol=9) x <- 10 * (x - min(x)) / diff(range(x)) x <- x + 2.5 * runif(length(x)) xs <- filt2_gauss(x) par(mfcol=c(1,2)) image(x, col=hcl.colors(256), main="original") image(xs, col=hcl.colors(256), main="smoothed")
set.seed(1) i <- seq(-4, 4, length.out=12) j <- seq(1, 3, length.out=9) co <- expand.grid(i=i, j=j) x <- matrix(atan(co$i / co$j), nrow=12, ncol=9) x <- 10 * (x - min(x)) / diff(range(x)) x <- x + 2.5 * runif(length(x)) xs <- filt2_gauss(x) par(mfcol=c(1,2)) image(x, col=hcl.colors(256), main="original") image(xs, col=hcl.colors(256), main="smoothed")
Smooth a nonuniformly sampled signal in K-nearest neighbors.
# Moving average filter filtn_ma(x, index, k = 5L, metric = "euclidean", p = 2) # Linear convolution filter filtn_conv(x, index, weights, metric = "euclidean", p = 2) # Gaussian filter filtn_gauss(x, index, k = 5L, sd = median(r) / 2, metric = "euclidean", p = 2) # Bilateral filter filtn_bi(x, index, k = 5L, sddist = median(r) / 2, sdrange = mad(x, na.rm=TRUE), metric = "euclidean", p = 2) # Bilateral filter with adaptive parameters filtn_adapt(x, index, k = 5L, spar = 1, metric = "euclidean", p = 2)
# Moving average filter filtn_ma(x, index, k = 5L, metric = "euclidean", p = 2) # Linear convolution filter filtn_conv(x, index, weights, metric = "euclidean", p = 2) # Gaussian filter filtn_gauss(x, index, k = 5L, sd = median(r) / 2, metric = "euclidean", p = 2) # Bilateral filter filtn_bi(x, index, k = 5L, sddist = median(r) / 2, sdrange = mad(x, na.rm=TRUE), metric = "euclidean", p = 2) # Bilateral filter with adaptive parameters filtn_adapt(x, index, k = 5L, spar = 1, metric = "euclidean", p = 2)
x |
A numeric vector. |
index |
A matrix or list of domain vectors giving the sampling locations of |
k |
The number of K-nearest neighbors to use in the smoothing (including the sample being smoothed). Must be positive. |
weights |
A vector of weights for the linear convolution kernel, in order from nearest to farthest neighbors. The first weight applies to the sample being smoothed. |
sd , sddist
|
The spatial parameter for kernel-based filters. This controls the strength of smoothing for samples farther from the center of the smoothing window. The default is calculated as half the median smoothing radius (i.e., the distance to the Kth nearest neighbor). |
sdrange |
The range parameter for kernel-based filters. This controls the strength of the smoothing for samples with signal values very different from the center of the smoothing window. |
spar |
The strength of the smoothing when calculating the adaptive bilateral filtering parameters. The larger the number, the stronger the smoothing. Must be positive. |
metric |
Distance metric to use when finding the nearest neighbors. Supported metrics include "euclidean", "maximum", "manhattan", and "minkowski". |
p |
The power for the Minkowski distance. |
These functions must first perform a K-nearest neighbors search on the sample locations (index
) before smoothing the signal (x
), but they can be applied to nonuniformly sampled signals in an arbitrary number of dimensions. The nearest neighbor search is performed using a kd-tree. It is efficient for low dimensional signals, but performance will degrade in high dimensions. In general, the complexity of these filters is O(k * n log(n)).
A numeric vector the same dimensions as x
with the smoothed result.
Kylie A. Bemis
J. Joseph and R. Perisamy. “An image driven bilateral filter with adaptive range and spatial parameters for denoising Magnetic Resonance Images.” Computers and Electrical Engineering, vol. 69, pp. 782-795, July 2018.
set.seed(1) # signal intensities x <- rlnorm(100) # 3D sampling locations index <- replicate(3, runif(100)) filtn_ma(x, index, k=5)
set.seed(1) # signal intensities x <- rlnorm(100) # 3D sampling locations index <- replicate(3, runif(100)) filtn_ma(x, index, k=5)
Find peaks in a signal based on its local maxima, as determined by a sliding window.
# Find peaks findpeaks(x, width = 5L, prominence = NULL, snr = NULL, noise = "quant", bounds = TRUE, relheight = 0.005, ...) # Local maxima locmax(x, width = 5L) # Local minima locmin(x, width = 5L)
# Find peaks findpeaks(x, width = 5L, prominence = NULL, snr = NULL, noise = "quant", bounds = TRUE, relheight = 0.005, ...) # Local maxima locmax(x, width = 5L) # Local minima locmin(x, width = 5L)
x |
A numeric vector. |
width |
The number of signal elements to consider when determining if the center of the sliding window is a local extremum. |
prominence |
The minimum peak prominence used for filtering the peaks. The prominence of a peak is the height that the peak rises above the higher of its bases (i.e., its lowest contour line). A peak's bases are found as the local minima between the peak and the next higher peaks on either side. |
snr |
The minimum signal-to-noise ratio used for filtering the peaks. |
noise |
The method used to estimate the noise. See Details. |
bounds |
Whether the boundaries of each peak should be calculated and returned. A peak's boundaries are found as the nearest local minima on either side. |
relheight |
The minimum relative height (proportion of the maximum peak value) used for filtering the peaks. |
... |
Arguments passed to the noise estimation function. |
For locmax()
and locmin()
, a local extremum is defined as an element greater (or less) than all of the elements within width / 2
elements to the left of it, and greater (or less) than or equal to all of the elements within width / 2
elements to the right of it. That is, ties are resolved such that only the leftmost sample is considered the local extremum.
For findpeaks()
, the peaks are simply the local maxima of the signal. The peak boundaries are found by descending a local maximum until a local minimum is found on either side, using the same criteria as above. The peaks are optionally filtered based on their prominences.
Optionally, the signal-to-noise ratio (SNR) can be estimated and used for filtering the peaks. These use the functions estnoise_quant
, estnoise_diff
, estnoise_filt
, etc., to estimate the noise in the signal.
For locmax()
and locmin()
, an logical vector indicating whether each element is a local extremum.
For findpeaks()
, an integer vector giving the indices of the peaks, with attributes 'left_bounds' and 'right_bounds' giving the left and right boundaries of the peak as determined using the rule above.
Kylie A. Bemis
findpeaks_cwt
,
findpeaks_knn
,
estnoise_quant
,
estnoise_sd
,
estnoise_mad
,
estnoise_diff
,
estnoise_filt
,
peakwidths
,
peakareas
,
peakheights
,
binpeaks
,
mergepeaks
# simple signal x <- c(0, 1, 1, 2, 3, 2, 1, 4, 5, 1, 1, 0) locmax(x) findpeaks(x) # simulated spectrum set.seed(1) x <- simspec(size=5000) # find peaks with snr >= 3 p <- findpeaks(x, snr=3, noise="quant") plot(x, type="l") points(p, x[p], col="red") # find peaks with derivative-based noise p <- findpeaks(x, snr=3, noise="diff") plot(x, type="l") points(p, x[p], col="red")
# simple signal x <- c(0, 1, 1, 2, 3, 2, 1, 4, 5, 1, 1, 0) locmax(x) findpeaks(x) # simulated spectrum set.seed(1) x <- simspec(size=5000) # find peaks with snr >= 3 p <- findpeaks(x, snr=3, noise="quant") plot(x, type="l") points(p, x[p], col="red") # find peaks with derivative-based noise p <- findpeaks(x, snr=3, noise="diff") plot(x, type="l") points(p, x[p], col="red")
Find peaks in a signal using continuous wavelet transform (CWT).
# Find peaks with CWT findpeaks_cwt(x, snr = 2, wavelet = ricker, scales = NULL, maxdists = scales, ngaps = 3L, ridgelen = length(scales) %/% 4L, qnoise = 0.95, width = length(x) %/% 20L, bounds = TRUE) # Find ridges lines in a matrix findridges(x, maxdists, ngaps) # Continuous Wavelet Transform cwt(x, wavelet = ricker, scales = NULL)
# Find peaks with CWT findpeaks_cwt(x, snr = 2, wavelet = ricker, scales = NULL, maxdists = scales, ngaps = 3L, ridgelen = length(scales) %/% 4L, qnoise = 0.95, width = length(x) %/% 20L, bounds = TRUE) # Find ridges lines in a matrix findridges(x, maxdists, ngaps) # Continuous Wavelet Transform cwt(x, wavelet = ricker, scales = NULL)
x |
A numeric vector for |
snr |
The minimum signal-to-noise ratio used for filtering the peaks. |
wavelet |
The wavelet to be convolved with the signal. Must be a function that takes two arguments: the number of points in the wavelet |
scales |
The scales at which to perform CWT. A reasonable sequence is generated automatically if not provided. |
maxdists |
The maximum allowed shift distance between local maxima allowed when connecting maxima into ridge lines. Should be a vector the same length as |
ngaps |
The number of gaps allowed in a ridge line before it is removed from the search space. |
ridgelen |
The minimum ridge length allowed when filtering peaks. |
qnoise |
The quantile of the CWT coefficients at the smallest scale used to estimate the noise. |
width |
The width of the rolling estimation of noise quantile. |
bounds |
Whether the boundaries of each peak should be calculated and returned. A peak's boundaries are found as the nearest local minima on either side. |
findpeaks_cwt()
uses the peak detection method based on continuous wavelet transform (CWT) proposed by Du, Kibbe, and Lin (2006).
The raw signal is convolved with a wavelet (by default, a Ricker wavelet is used) at a range of different scales. This produces a matrix of CWT coefficients with a number of rows equal to the length of the original signal and each column representing a different scale of convolution.
The convolution at the smallest scales represent a good estimate of noise and peak location. The larger scales represent a smoother signal where larger peaks are prominent and smaller peaks are removed.
The method proceeds by identifying ridge lines in the CWT coefficient matrix using findridges()
. Local maxima are identified at each scale and connected across each scale, forming the ridge lines.
Finally, the local noise is estimated from the CWT coefficients at the smallest scale. The peaks are filtered based on signal-to-noise ratio and the length of their ridge lines.
For findpeaks_cwt()
, an integer vector giving the indices of the peaks, with attributes 'left_bounds' and 'right_bounds' giving the left and right boundaries of the peak as determined using the rule above.
For findridges()
, a list of matrices giving the row and column indices of the entries of each detected ridge line.
Kylie A. Bemis
findpeaks
,
peakwidths
,
peakareas
,
peakheights
,
binpeaks
,
mergepeaks
# simple signal x <- c(0, 1, 1, 2, 3, 2, 1, 4, 5, 1, 1, 0) locmax(x) findpeaks(x) # simulated spectrum set.seed(1) x <- simspec(size=5000) # find peaks with snr >= 3 p <- findpeaks_cwt(x, snr=3) plot(x, type="l") points(p, x[p], col="red") # plot ridges ridges <- attr(p, "ridges") plot(c(0, length(x)), c(0, 25), type="n") for ( ri in ridges ) lines(ri, type="o", pch=20, cex=0.5)
# simple signal x <- c(0, 1, 1, 2, 3, 2, 1, 4, 5, 1, 1, 0) locmax(x) findpeaks(x) # simulated spectrum set.seed(1) x <- simspec(size=5000) # find peaks with snr >= 3 p <- findpeaks_cwt(x, snr=3) plot(x, type="l") points(p, x[p], col="red") # plot ridges ridges <- attr(p, "ridges") plot(c(0, length(x)), c(0, 25), type="n") for ( ri in ridges ) lines(ri, type="o", pch=20, cex=0.5)
Find peaks in a signal based on its local maxima, as determined by K-nearest neighbors.
# Find peaks with KNN findpeaks_knn(x, index, k = 5L, snr = NULL, noise = "quant", relheight = 0.005, arr.ind = is.array(x), useNames = TRUE, ...) # Local maxima knnmax(x, index, k = 5L) # Local minima knnmin(x, index, k = 5L)
# Find peaks with KNN findpeaks_knn(x, index, k = 5L, snr = NULL, noise = "quant", relheight = 0.005, arr.ind = is.array(x), useNames = TRUE, ...) # Local maxima knnmax(x, index, k = 5L) # Local minima knnmin(x, index, k = 5L)
x |
A numeric vector or array. |
index |
A matrix or list of domain vectors giving the sampling locations of |
k |
The number of nearest neighbors to consider when determining whether a sample is a local extremum. |
snr |
The minimum signal-to-noise ratio used for filtering the peaks. |
noise |
The method used to estimate the noise. See Details. |
relheight |
The minimum relative height (proportion of the maximum peak value) used for filtering the peaks. |
arr.ind |
Should array indices be returned? Otherwise, return linear indices. |
useNames |
Passed to |
... |
Arguments passed to the noise estimation function. |
For knnmax()
and knnmin()
, a local extremum is defined as an element greater (or less) than all of its nearest neighbors with a lesser index, and greater (or less) than or equal to all of its nearest neighbors with a greater or equal index. That is, ties are resolved such that the sample with the lowest index is considered the local extremum.
For findpeaks_knn()
, the peaks are simply the local maxima of the signal. The peaks are optionally filtered based on their relative heights.
Optionally, the signal-to-noise ratio (SNR) can be estimated and used for filtering the peaks. These use the functions estnoise_quant
, estnoise_diff
, estnoise_filt
, etc., to estimate the noise in the signal.
For knnmax()
and knnmin()
, an logical vector or array indicating whether each element is a local extremum.
For findpeaks_knn()
, an integer vector (or matrix if arr.ind=TRUE
) giving the indices of the peaks.
Kylie A. Bemis
findpeaks
,
estnoise_quant
,
estnoise_sd
,
estnoise_mad
,
estnoise_diff
,
estnoise_filt
# simple 2D signal x <- c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 1, 4, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 3, 2, 1, 0, 0, 0, 0, 0, 0, 1, 3, 3, 0, 0, 1, 4, 4, 3, 1, 0, 0, 0, 0, 0, 0, 3, 2, 0, 1, 0, 3, 2, 3, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 2, 3, 0, 0) x <- matrix(x, nrow=7, ncol=15, byrow=TRUE) # find peaks in 2D using KNN findpeaks_knn(x)
# simple 2D signal x <- c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 1, 4, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 3, 2, 1, 0, 0, 0, 0, 0, 0, 1, 3, 3, 0, 0, 1, 4, 4, 3, 1, 0, 0, 0, 0, 0, 0, 3, 2, 0, 1, 0, 3, 2, 3, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 2, 3, 0, 0) x <- matrix(x, nrow=7, ncol=15, byrow=TRUE) # find peaks in 2D using KNN findpeaks_knn(x)
Check if a series of x-y points are contained in a closed 2D polygon.
inpoly(points, poly)
inpoly(points, poly)
points |
A 2-column numeric matrix with the points to check. |
poly |
A 2-column numeric matrix with the vertices of the polygon. |
This function works by extending a horizontal ray from each point and counting the number of times it crosses an edge of the polygon.
A logical vector that is TRUE
for points that are fully inside the polygon, a vertex, or on an edge, and FALSE
for points fully outside the polygon.
There are various public implementations of this function with no clear original source. The version implemented here is loosely based on code by W. Randolph Franklin with modifications so that vertices and points on edges are considered inside the polygon.
W. R. Franklin and Kylie A. Bemis
W. R. Franklin. “PNPOLY - Point Inclusion in Polygon Test.” https://wrfranklin.org/Research/Short_Notes/pnpoly.html, 1970.
M. Shimrat, "Algorithm 112, Position of Point Relative to Polygon", Comm. ACM 5(8), pp. 434, Aug 1962.
E. Haines. “Point in Polygon Strategies.” http://www.acm.org/pubs/tog/editors/erich/ptinpoly/, 1994.
poly <- data.frame( x=c(3,5,5,3), y=c(3,3,5,5)) xy <- data.frame( x=c(4,6,4,2,3,5,5,3,4,5,4,4, 3.5,4.5,4.0,3.5), y=c(2,4,6,4,3,3,5,5,3,4,5,3, 4.0,4.0,4.5,4.0), ref=c( rep("out", 4), rep("vertex", 4), rep("edge", 4), rep("in", 4))) xy$test <- inpoly(xy[,1:2], poly) xy
poly <- data.frame( x=c(3,5,5,3), y=c(3,3,5,5)) xy <- data.frame( x=c(4,6,4,2,3,5,5,3,4,5,4,4, 3.5,4.5,4.0,3.5), y=c(2,4,6,4,3,3,5,5,3,4,5,3, 4.0,4.0,4.5,4.0), ref=c( rep("out", 4), rep("vertex", 4), rep("edge", 4), rep("in", 4))) xy$test <- inpoly(xy[,1:2], poly) xy
Quote strings representing variable identifiers with backticks for use in formulas.
iQuote(x, q = "`")
iQuote(x, q = "`")
x |
An object coercible to a character vector. |
q |
The kind of quotes to be used. |
A character vector of the same length as x
.
Kylie A. Bemis
x1 <- "This is a non-syntactic variable name" x2 <- "This is another variable name" fm <- paste0(iQuote(x1), "~", iQuote(x2)) as.formula(fm)
x1 <- "This is a non-syntactic variable name" x2 <- "This is another variable name" fm <- paste0(iQuote(x1), "~", iQuote(x2)) as.formula(fm)
These functions modify the environment of a function to isolate a closure from its original environment.
isofun(fun, envir = baseenv()) isoclos(fun, envir = baseenv())
isofun(fun, envir = baseenv()) isoclos(fun, envir = baseenv())
fun |
A function to isolate. |
envir |
The new parent environment for the closure. |
A common challenge with parallel programming in R is unintentionally serializing large environments that have been captured by closures. The functions isofun
and isoclos
provide straightforward ways to isolate functions and closures from their original parent environments which may contain large objects.
isofun
simply replaces the environment of fun
with envir
, which defaults to the base environment. This is appropriate for simple functions that do not need to enclose any variables, instead taking all of their inputs as formal arguments.
isoclos
creates a new closure that is isolated from fun
's original environment. All objects in environment(fun)
are first copied into a new environment with parent envir
, and then this new environment is assigned to fun
.
Note that the default envir=baseenv()
means that any functions not in the base environment must be fully qualified (e.g., stats::sd
versus sd
).
A new function or closure, isolated from environment(fun)
.
Kylie A. Bemis
register(SerialParam()) bigfun <- function(x) { # isolate 'smallfun' from 'x' smallfun <- isofun(function(xi) { (xi - mean(xi)) / stats::sd(xi) }) chunkApply(x, 2L, smallfun) } set.seed(1) x <- matrix(runif(100), nrow=10, ncol=10) bigfun(x)
register(SerialParam()) bigfun <- function(x) { # isolate 'smallfun' from 'x' smallfun <- isofun(function(xi) { (xi - mean(xi)) / stats::sd(xi) }) chunkApply(x, 2L, smallfun) } set.seed(1) x <- matrix(runif(100), nrow=10, ncol=10) bigfun(x)
Search a matrix of K-dimensional data points and return the indices of the nearest neighbors or of all data points that are within a specified tolerance in each dimension.
# Nearest neighbor search knnsearch(x, data, k = 1L, metric = "euclidean", p = 2) # Range search kdsearch(x, data, tol = 0, tol.ref = "abs") # K-D tree kdtree(data)
# Nearest neighbor search knnsearch(x, data, k = 1L, metric = "euclidean", p = 2) # Range search kdsearch(x, data, tol = 0, tol.ref = "abs") # K-D tree kdtree(data)
x |
A numeric matrix of coordinates to be matched. Each column should represent a dimension. Each row should be a query point. |
data |
Either a |
k |
The number of nearest neighbors to find for each point (row) in |
metric |
Distance metric to use when finding the nearest neighbors. Supported metrics include "euclidean", "maximum", "manhattan", and "minkowski". |
p |
The power for the Minkowski distance. |
tol |
The tolerance for finding neighboring points in each dimension. May be a vector with the same length as the number of dimensions. Must be positive. |
tol.ref |
One of 'abs', 'x', or 'y'. If 'abs', then comparison is done by taking the absolute difference. If either 'x' or 'y', then relative differences are used, and this specifies which to use as the reference (target) value. |
knnsearch()
performs k-nearest neighbor searches. kdsearch()
performs range searches for points within a given tolerance of the query points.
The algorithms are implemented in C and work by building a kd-tree to perform the search. If multiple calls to kdsearch()
or knnsearch()
are expected on the same data, it can be much faster to build the tree once with kdtree()
.
A kd-tree is essentially a multidimensional generalization of a binary search tree. Building the search tree is O(n * log n) and searching for a single data point is O(log n).
For knnsearch()
, ties are broken based on the original ordering of the rows in data
.
For knnsearch()
, a matrix with rows equal to the number of rows of x
and columns equal to k
giving the indices of the k-nearest neighbors.
For kdsearch()
, a list with length equal to the number of rows of x
, where each list element is a vector of indexes of the matches in data
.
Kylie A. Bemis
d <- expand.grid(x=1:10, y=1:10) x <- rbind(c(1.11, 2.22), c(3.33, 4.44)) knnsearch(x, d, k=3)
d <- expand.grid(x=1:10, y=1:10) x <- rbind(c(1.11, 2.22), c(3.33, 4.44)) knnsearch(x, d, k=3)
The matter_arr
class implements out-of-core arrays.
## Instance creation matter_arr(data, type = "double", path = NULL, dim = NA_integer_, dimnames = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, rowMaj = FALSE, ...) matter_mat(data, type = "double", path = NULL, nrow = NA_integer_, ncol = NA_integer_, dimnames = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, rowMaj = FALSE, ...) matter_vec(data, type = "double", path = NULL, length = NA_integer_, names = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, rowMaj = FALSE, ...) ## Additional methods documented below
## Instance creation matter_arr(data, type = "double", path = NULL, dim = NA_integer_, dimnames = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, rowMaj = FALSE, ...) matter_mat(data, type = "double", path = NULL, nrow = NA_integer_, ncol = NA_integer_, dimnames = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, rowMaj = FALSE, ...) matter_vec(data, type = "double", path = NULL, length = NA_integer_, names = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, rowMaj = FALSE, ...) ## Additional methods documented below
data |
An optional data vector which will be initially written to virtual memory if provided. |
type |
A 'character' vector giving the storage mode of the data in virtual memory. See |
path |
A 'character' vector of the path(s) to the file(s) where the data are stored. If |
dim , nrow , ncol , length
|
The dimensions of the array, or the number of rows and columns, or the length. |
dimnames , names
|
The names of the matrix dimensions or vector elements. |
offset |
A vector giving the offsets in number of bytes from the beginning of each file in 'path', specifying the start of the data to be accessed for each file. |
extent |
A vector giving the length of the data for each file in 'path', specifying the number of elements of size 'type' to be accessed from each file. |
readonly |
Whether the data and file(s) should be treated as read-only or read/write. |
append |
If |
rowMaj |
Whether the data is stored in row-major or column-major order. The default is to use column-major order, which is the same as native R matrices. |
... |
Additional arguments to be passed to constructor. |
An object of class matter_arr
.
data
:This slot stores any information necessary to access the data for the object (which may include the data itself and/or paths to file locations, etc.).
type
:The storage mode of the accessed data when read into R. This is a 'factor' with levels 'raw', 'logical', 'integer', 'numeric', or 'character'.
dim
:Either 'NULL' for vectors, or an integer vector of length one of more giving the maximal indices in each dimension for matrices and arrays.
names
:The names of the data elements for vectors.
dimnames
:Either 'NULL' or the names for the dimensions. If not 'NULL', then this should be a list of character vectors of the length given by 'dim' for each dimension. This is always 'NULL' for vectors.
ops
:Deferred arithmetic operations.
transpose
:Indicates whether the data is stored in row-major order (TRUE) or column-major order (FALSE). For a matrix, switching the order that the data is read is equivalent to transposing the matrix (without changing any data).
indexed
:For matter_mat
only. Indicates whether the pointers to rows or columns are indexed for quick access or not.
matter_arr
instances can be created through matter_arr()
or matter()
. Matrices and vectors can also be created through matter_mat()
and matter_vec()
.
Class-specific methods:
path(x), path(x) <- value
:Get or set the data source names, i.e., file path(s).
fetch(x, ...)
:Pull data into shared memory.
flash(x, ...)
:Push data to a temporary file.
Standard generic methods:
length(x), length(x) <- value
:Get or set length.
dim(x), dim(x) <- value
:Get or set 'dim'.
names(x), names(x) <- value
:Get or set 'names'.
dimnames(x), dimnames(x) <- value
:Get or set 'dimnames'.
x[...], x[...] <- value
:Get or set the elements of the array.
as.vector(x)
:Coerce to a vector.
as.array(x)
:Coerce to an array.
cbind(x, ...), rbind(x, ...)
:Combine matrices by row or column.
t(x)
:Transpose a matrix. This is a quick operation which only changes metadata and does not touch the data representation.
rowMaj(x)
:Check the data orientation.
Kylie A. Bemis
x <- matter_arr(1:1000, dim=c(10,10,10)) x
x <- matter_arr(1:1000, dim=c(10,10,10)) x
The matter_fct
class implements out-of-core factors.
## Instance creation matter_fct(data, levels, type = typeof(levels), path = NULL, length = NA_integer_, names = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, labels = as.character(levels), ...) ## Additional methods documented below
## Instance creation matter_fct(data, levels, type = typeof(levels), path = NULL, length = NA_integer_, names = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, labels = as.character(levels), ...) ## Additional methods documented below
data |
An optional data vector which will be initially written to the data in virtual memory if provided. |
levels |
The levels of the factor. These should be of the same type as the data. (Use |
type |
A 'character' vector giving the storage mode of the data in virtual memory. See |
path |
A 'character' vector of the path(s) to the file(s) where the data are stored. If |
length |
The length of the factor. |
names |
The names of the data elements. |
offset |
A vector giving the offsets in number of bytes from the beginning of each file in 'path', specifying the start of the data to be accessed for each file. |
extent |
A vector giving the length of the data for each file in 'path', specifying the number of elements of size 'type' to be accessed from each file. |
readonly |
Whether the data and file(s) should be treated as read-only or read/write. |
append |
If |
labels |
An optional character vector of labels for the factor levels. |
... |
Additional arguments to be passed to constructor. |
An object of class matter_fct
.
data
:This slot stores any information necessary to access the data for the object (which may include the data itself and/or paths to file locations, etc.).
type
:The storage mode of the accessed data when read into R. This is a 'factor' with levels 'raw', 'logical', 'integer', 'numeric', or 'character'.
dim
:Either 'NULL' for vectors, or an integer vector of length one of more giving the maximal indices in each dimension for matrices and arrays.
names
:The names of the data elements for vectors.
dimnames
:Either 'NULL' or the names for the dimensions. If not 'NULL', then this should be a list of character vectors of the length given by 'dim' for each dimension. This is always 'NULL' for vectors.
levels
:The levels of the factor.
labels
:The labels for the levels.
matter_fct
instances can be created through matter_fct()
or matter()
.
Standard generic methods:
length(x), length(x) <- value
:Get or set length.
names(x), names(x) <- value
:Get or set 'names'.
x[i], x[i] <- value
:Get or set the elements of the factor.
levels(x), levels(x) <- value
:Get or set the levels of the factor.
Kylie A. Bemis
x <- matter_fct(rep(c("a", "a", "b"), 5), levels=c("a", "b", "c")) x
x <- matter_fct(rep(c("a", "a", "b"), 5), levels=c("a", "b", "c")) x
The matter_list
class implements out-of-core lists.
## Instance creation matter_list(data, type = "double", path = NULL, lengths = NA_integer_, names = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, ...) ## Additional methods documented below
## Instance creation matter_list(data, type = "double", path = NULL, lengths = NA_integer_, names = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, ...) ## Additional methods documented below
data |
An optional data vector which will be initially written to virtual memory if provided. |
type |
A 'character' vector giving the storage mode of the data in virtual memory. See |
path |
A 'character' vector of the path(s) to the file(s) where the data are stored. If |
lengths |
The lengths of the list elements. |
names |
The names of the list elements. |
offset |
A vector giving the offsets in number of bytes from the beginning of each file in 'path', specifying the start of the data to be accessed for each file. |
extent |
A vector giving the length of the data for each file in 'path', specifying the number of elements of size 'type' to be accessed from each file. |
readonly |
Whether the data and file(s) should be treated as read-only or read/write. |
append |
If |
... |
Additional arguments to be passed to constructor. |
An object of class matter_list
.
data
:This slot stores any information necessary to access the data for the object (which may include the data itself and/or paths to file locations, etc.).
type
:The storage mode of the accessed data when read into R. This is a 'factor' with levels 'raw', 'logical', 'integer', 'numeric', or 'character'.
dim
:Either 'NULL' for vectors, or an integer vector of length one of more giving the maximal indices in each dimension for matrices and arrays.
names
:The names of the data elements for vectors.
dimnames
:Either 'NULL' or the names for the dimensions. If not 'NULL', then this should be a list of character vectors of the length given by 'dim' for each dimension. This is always 'NULL' for vectors.
matter_list
instances can be created through matter_list()
or matter()
.
Class-specific methods:
path(x), path(x) <- value
:Get or set the data source names, i.e., file path(s).
fetch(x, ...)
:Pull data into shared memory.
flash(x, ...)
:Push data to a temporary file.
Standard generic methods:
x[[i]], x[[i]] <- value
:Get or set a single element of the list.
x[[i, j]]
:Get the j
th sub-elements of the i
th element of the list.
x[i], x[i] <- value
:Get or set the i
th elements of the list.
lengths(x)
:Get the lengths of all elements in the list.
Kylie A. Bemis
x <- matter_list(list(c(TRUE,FALSE), 1:5, c(1.11, 2.22, 3.33)), lengths=c(2,5,3)) x[] x[1] x[[1]] x[[3,1]] x[[2,1:3]]
x <- matter_list(list(c(TRUE,FALSE), 1:5, c(1.11, 2.22, 3.33)), lengths=c(2,5,3)) x[] x[1] x[[1]] x[[3,1]] x[[2,1:3]]
The matter_str
class implements out-of-core strings.
## Instance creation matter_str(data, encoding, type = "character", path = NULL, nchar = NA_integer_, names = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, ...) ## Additional methods documented below
## Instance creation matter_str(data, encoding, type = "character", path = NULL, nchar = NA_integer_, names = NULL, offset = 0, extent = NA_real_, readonly = NA, append = FALSE, ...) ## Additional methods documented below
data |
An optional data vector which will be initially written to virtual memory if provided. |
encoding |
The character encoding to use (if known). |
type |
A 'character' vector giving the storage mode of the data in virtual memory. See |
path |
A 'character' vector of the path(s) to the file(s) where the data are stored. If |
nchar |
A vector giving the length of each element of the character vector. |
names |
The names of the data elements. |
offset |
A vector giving the offsets in number of bytes from the beginning of each file in 'path', specifying the start of the data to be accessed for each file. |
extent |
A vector giving the length of the data for each file in 'path', specifying the number of elements of size 'type' to be accessed from each file. |
readonly |
Whether the data and file(s) should be treated as read-only or read/write. |
append |
If |
... |
Additional arguments to be passed to constructor. |
An object of class matter_str
.
data
:This slot stores any information necessary to access the data for the object (which may include the data itself and/or paths to file locations, etc.).
type
:The storage mode of the accessed data when read into R. This is a 'factor' with levels 'raw', 'logical', 'integer', 'numeric', or 'character'.
dim
:Either 'NULL' for vectors, or an integer vector of length one of more giving the maximal indices in each dimension for matrices and arrays.
names
:The names of the data elements for vectors.
dimnames
:Either 'NULL' or the names for the dimensions. If not 'NULL', then this should be a list of character vectors of the length given by 'dim' for each dimension. This is always 'NULL' for vectors.
encoding
:The string encoding used.
matter_str
instances can be created through matter_str()
or matter()
.
Standard generic methods:
x[i], x[i] <- value
:Get or set the string elements of the vector.
lengths(x)
:Get the number of characters (in bytes) of all string elements in the vector.
Kylie A. Bemis
x <- matter_str(rep(c("hello", "world!"), 50)) x
x <- matter_str(rep(c("hello", "world!"), 50)) x
The matter
class and its subclasses are designed for easy on-demand read/write access to virtual memory data structures stored in files and/or shared memory and allow working with them as vectors, matrices, arrays, and lists.
## Instance creation matter(data, ...) # Check if an object is a matter object is.matter(x) # Coerce an object to a matter object as.matter(x) # Check if an object uses shared memory is.shared(x) # Coerce an object to use shared memory as.shared(x) ## Additional methods documented below
## Instance creation matter(data, ...) # Check if an object is a matter object is.matter(x) # Coerce an object to a matter object as.matter(x) # Check if an object uses shared memory is.shared(x) # Coerce an object to use shared memory as.shared(x) ## Additional methods documented below
data |
Data passed to the subclasse constructor. |
... |
Arguments passed to subclasses. |
x |
An object to check if it is a matter object or coerce to a matter object. |
An object of class matter
.
data
:This slot stores any information necessary to access the data for the object (which may include the data itself and/or paths to file locations, etc.).
type
:The storage mode of the accessed data when read into R. This is a 'factor' with levels 'raw', 'logical', 'integer', 'numeric', or 'character'.
dim
:Either 'NULL' for vectors, or an integer vector of length one of more giving the maximal indices in each dimension for matrices and arrays.
names
:The names of the data elements for vectors.
dimnames
:Either 'NULL' or the names for the dimensions. If not 'NULL', then this should be a list of character vectors of the length given by 'dim' for each dimension. This is always 'NULL' for vectors.
matter
is a virtual class and cannot be instantiated directly, but instances of its subclasses can be created through matter()
.
Class-specific methods:
atomdata(x)
:Access the 'data' slot.
adata(x)
:An alias for atomdata(x).
type(x), type(x) <- value
:Get or set data 'type'.
Standard generic methods:
length(x), length(x) <- value
:Get or set length.
dim(x), dim(x) <- value
:Get or set 'dim'.
names(x), names(x) <- value
:Get or set 'names'.
dimnames(x), dimnames(x) <- value
:Get or set 'dimnames'.
Kylie A. Bemis
matter_arr
,
matter_mat
,
matter_vec
,
matter_fct
,
matter_list
,
matter_str
## Create a matter_vec vector x <- matter(1:100, length=100) x ## Create a matter_mat matrix y <- matter(1:100, nrow=10, ncol=10) y
## Create a matter_vec vector x <- matter(1:100, length=100) x ## Create a matter_mat matrix y <- matter(1:100, nrow=10, ncol=10) y
Set global parameters for matter
.
## Set defaults for common arguments matter_defaults(nchunks = 20L, chunksize = NA_real_, serialize = NA, verbose = FALSE)
## Set defaults for common arguments matter_defaults(nchunks = 20L, chunksize = NA_real_, serialize = NA, verbose = FALSE)
nchunks |
The number of chunks to use for chunk processing (e.g., in |
chunksize |
The approximate chunk size in bytes for chunk processing (e.g., in |
serialize |
Whether data in virtual memory should be realized on the manager and serialized to the workers ( |
verbose |
Whether progress messages should be printed. This sets |
The matter
package provides the following options:
options(matter.compress.atoms=3)
: The compression ratio threshold to be used to determine when to compress atoms in a matter object. Setting to 0 or FALSE
means that atoms are never compressed.
options(matter.default.nchunks=20L)
: The default number of chunks to use when iterating over matter objects. For IO-bound operations, using fewer chunks will often be faster, but use more memory.
options(matter.default.chunksize=NA_real_)
: The default chunk size in bytes to use when iterating over matter objects. For IO-bound operations, using larger chunks will often be faster, but use more memory. If set to NA_real_
, then the chunk size is determined by the number of chunks.
options(matter.default.serialize=NA)
: Whether virtual memory chunks should be realized on the manager and serialized to the workers (TRUE
), passed to the workers as-is FALSE
, or if matter
should decide based on the cluster configuration (NA
). If all workers have access to the same virtual memory resources (whether file storage or shared memory), then it can be significantly faster to avoid serializing the data.
options(matter.default.verbose=FALSE)
: The default verbosity for printing progress messages.
options(matter.matmul.bpparam=NULL)
: An optional BiocParallelParam
passed to bplapply
when performing matrix multiplication with matter_mat
and sparse_mat
objects.
options(matter.show.head=TRUE)
: Should a preview of the beginning of the data be displayed when the object is printed?
options(matter.show.head.n=6)
: The number of elements, rows, and/or columns to be displayed by the object preview.
options(matter.coerce.altrep=FALSE)
: When coercing matter
objects to native R objects (such as matrix
), should a matter
-backed ALTREP object be returned instead? The initial coercion will be cheap, and the result will look like a native R object. This does not guarantee that the full data is never read into memory. Not all functions are ALTREP-aware at the C-level, so some operations may still trigger the full data to be read into memory. This should only ever happen once, as long as the object is not duplicated, though.
options(matter.wrap.altrep=FALSE)
: When coercing to a matter
-backed ALTREP object, should the object be wrapped in an ALTREP wrapper? (This is always done in cases where the coercion preserves existing attributes.) This allows setting of attributes without triggering a (potentially expensive) duplication of the object when safe to do so.
options(matter.temp.dir=tempdir())
: Temporary directory where anonymous matter
object files (i.e., those created with path=NULL
) should be created.
options(matter.temp.gc=TRUE)
: If TRUE
, then anonymous matter
object files (i.e., those created with path=NULL
) are automatically cleaned up when all R objects referencing them have been garbage collected. If FALSE
, then they are only removed at the end of the R session (and only if they are in R's default temporary directory).
The matter
package defines a number of data types for translating between data elements stored in virtual memory and data elements loaded into R. These are typically set and stored via the datamode
argument and slot.
At the R level, matter
objects may be any of the following data modes:
raw
: matter
objects of this mode are typically vectors of raw bytes.
logical
: matter
object represented logical vector in R.
integer
: matter
objects represented as integer vectors in R.
numeric
: matter
objects represented as double vectors in R.
character
: matter
objects representated as character vectors in R.
list
: Not used. This type exists for inferring conversions between R types and C types, but matter_list
objects instead report the types of their elements.
In virtual memory, matter
objects may be composed of atomic units of the following data types:
char
: 8-bit signed integer; defined as char
.
uchar
: 8-bit unsigned integer; used for ‘Rbyte’ or ‘raw’; defined as unsigned char
.
int16
: 16-bit signed integer; defined as int16_t
. May be aliased as ‘short’ and ‘16-bit integer’.
uint16
: 16-bit unsigned integer; defined as uint16_t
. May be aliased as ‘ushort’ and ‘16-bit unsigned integer’.
int32
: 32-bit signed integer; defined as int32_t
. May be aliased as ‘int’ and ‘32-bit integer’.
uint32
: 32-bit unsigned integer; defined as uint32_t
. May be aliased as ‘uint’ and ‘32-bit unsigned integer’.
int64
: 64-bit signed integer; defined as int64_t
. May be aliased as ‘long’ and ‘64-bit integer’.
uint64
: 64-bit unsigned integer; defined as uint64_t
. May be aliased as ‘ulong’ and ‘64-bit unsigned integer’.
float32
: 32-bit float; defined as float
. May be aliased as ‘float’ and ‘32-bit float’.
float64
: 64-bit float; defined as double
. May be aliased as ‘double’ and ‘64-bit float’.
While a substantial effort is made to coerce data elements properly between data types, sometimes this cannot be done losslessly. Loss of precision is silent, while values outside of the representable range will generate a warning (sometimes many such warnings) and will be set to NA
if available or 0 otherwise.
Note that the unsigned data types do not support NA
; coercion between signed integer types attempts to preserve missingness. The special values NaN
, Inf
, and -Inf
are only supported by the floating-point types, and will be set to NA
for signed integral types, and to 0
for unsigned integral types.
These are utility functions for checking memory used by R and matter
in the current session and/or during the timed execution of an expression.
mem(x, reset = FALSE) memcl(cl = bpparam(), reset = FALSE) memtime(expr, verbose = NA, BPPARAM = NULL)
mem(x, reset = FALSE) memcl(cl = bpparam(), reset = FALSE) memtime(expr, verbose = NA, BPPARAM = NULL)
x |
An object, to summarize how much memory it is using. |
reset |
Should the maximum memory values be reset? |
expr |
An expression to be evaluated. |
verbose |
Should timing messages be printed? |
cl , BPPARAM
|
A |
These functions summarize the memory used by both traditional R objects and out-of-memory matter
objects. "Real" memory managed by R is summarized using gc
. "Virtual" memory managed by matter
includes shared memory allocated by matter
and temporary files created by matter
in getOption("matter.temp.dir")
.
For timing parallel code, it is useful to use memtime
in combination with its BPPARAM
argument to monitor the amount of memory used by the cluster.
For mem
called with an x
argument, a named vector summarizing the memory and storage used by the object. The named elements include:
"real"
: The amount of R memory used.
"shared"
: The amount of shared memory used.
"virtual"
: The amount of virtual memory used, including both file storage and shared memory.
For mem
called without an x
argument, a named vector summarizing the memory and storage used by the current R session. The named elements include:
"real"
: The amount of R memory used.
"shared"
: The amount of shared memory used.
"max real"
: The maximum amount of R memory used since the last reset.
"max shared"
: The maximum amount of shared memory used since the last reset.
"temp"
: The total size of temporary files managed by matter
in getOption("matter.temp.dir")
.
For memcl
, a data frame with columns corresponding to the elements described above and rows corresponding to cluster nodes.
For memtime
, a list including:
"start"
: Memory use at the start of timing.
"end"
: Memory use at the end of timing.
"cluser"
: (Optional.) A summary of the memory used by the cluster if BPPARAM
is specified.
"overhead"
: The amount of "real" memory used during the execution of expr
that is freed by the end of timing.
"change"
: The difference in "real" memory used before and after timing.
"total"
: For the current R session, the sum of "max real" and "max shared" memory used and total cluster memory used. For a cluster, the sum of "max real" memory used by all workers (not including shared memory or the managing R process).
"time"
: The execution time.
Kylie A. Bemis
x <- 1:100 mem(x) memtime(mean(x + 1))
x <- 1:100 mem(x) memtime(mean(x + 1))
Multiple instance learning is a strategy for training classifiers when the class labels are observed at a coarser level than the individual data points. For example, if an entire image is classified as "positive" or "negative" but the classifier is trained and predicts at the pixel level.
mi_learn(fn, x, y, bags, pos = 1L, ..., score = fitted, threshold = 0.01, verbose = NA)
mi_learn(fn, x, y, bags, pos = 1L, ..., score = fitted, threshold = 0.01, verbose = NA)
fn |
The function used to train the classifier. |
x |
The data matrix. |
y |
The response. This can be the same length as the data, or the same length as the number of bags. Must have exactly two levels when coerced to a factor. |
bags |
The bags to which the data points belong. The class labels are observed per-bag rather than per data point. |
pos |
The positive class label, as a string matching one of the levels of |
... |
Additional options passed to |
score |
The function used to extract the scores for prediction. |
threshold |
The stopping criterion. The learning stops when the proportion of updated labels between iterations is less than this value. |
verbose |
Should progress be printed for each iteration? Not passed to |
This is a generic wrapper for applying a multiple instance learning strategy for any classifier that satisfies certain criteria. The labels must be binary (positive and negative).
The multiple instance learning algorithm here assumes that if a single data point is positive, then the entire bag to which it belongs is labeled as positive. For example, if a single pixel in an image indicates the presence of disease, then the entire image is labeled as disease.
The model returned by fn
must support returning either a vector of probabilities or a 2-column score matrix when passed to score
.
A model object returned by fn
.
Kylie A. Bemis
D. Guo, M. C. Foell, V. Volkmann, K. Enderle-Ammour, P. Bronsert, O. Shilling, and O. Vitek. “Deep multiple instance learning classifies subtissue locations in mass spectrometry images from tissue-level annotations.” Bioinformatics, vol. 36, issue Supplement_1, pp. i300-i308, 2020.
register(SerialParam()) set.seed(1) n <- 100 p <- 5 g <- 5 bags <- rep(paste0("s", seq_len(g)), each=n %/% g) bags <- factor(rep_len(bags, n)) x <- matrix(rnorm(n * p), nrow=n, ncol=p) colnames(x) <- paste0("x", seq_len(p)) # create bagged labels y <- ifelse(bags %in% c("s1", "s2", "s3"), "pos", "neg") y <- factor(y, levels=c("pos", "neg")) ipos <- which(y == "pos") ineg <- which(y == "neg") z <- y # create "true" labels (with some within-bag noise) z[ipos] <- sample(c("pos", "neg"), length(ipos), replace=TRUE) jpos <- which(z == "pos") jneg <- which(z == "neg") # create data x[jpos,] <- x[jpos,] + rnorm(p * length(jpos), mean=1) x[jneg,] <- x[jneg,] - rnorm(p * length(jneg), mean=1) # fit ordinary NSC and mi-NSC fit0 <- nscentroids(x=x, y=y) fit1 <- mi_learn(nscentroids, x=x, y=y, bags=bags, priors=1) # improved performance on "true" labels mean(fitted(fit0, "class") == z) mean(fitted(fit1, "class") == z)
register(SerialParam()) set.seed(1) n <- 100 p <- 5 g <- 5 bags <- rep(paste0("s", seq_len(g)), each=n %/% g) bags <- factor(rep_len(bags, n)) x <- matrix(rnorm(n * p), nrow=n, ncol=p) colnames(x) <- paste0("x", seq_len(p)) # create bagged labels y <- ifelse(bags %in% c("s1", "s2", "s3"), "pos", "neg") y <- factor(y, levels=c("pos", "neg")) ipos <- which(y == "pos") ineg <- which(y == "neg") z <- y # create "true" labels (with some within-bag noise) z[ipos] <- sample(c("pos", "neg"), length(ipos), replace=TRUE) jpos <- which(z == "pos") jneg <- which(z == "neg") # create data x[jpos,] <- x[jpos,] + rnorm(p * length(jpos), mean=1) x[jneg,] <- x[jneg,] - rnorm(p * length(jneg), mean=1) # fit ordinary NSC and mi-NSC fit0 <- nscentroids(x=x, y=y) fit1 <- mi_learn(nscentroids, x=x, y=y, bags=bags, priors=1) # improved performance on "true" labels mean(fitted(fit0, "class") == z) mean(fitted(fit1, "class") == z)
Nonnegative matrix factorization (NMF) decomposes a nonnegative data matrix into a matrix of basis variables and a matrix of activations (or coefficients). The factorization is approximate and may be less accurate than alternative methods such as PCA, but can greatly improve the interpretability of the reduced dimensions.
# Alternating least squares nnmf_als(x, k = 3L, niter = 100L, transpose = FALSE, eps = 1e-9, tol = 1e-5, verbose = NA, ...) # Multiplicative updates nnmf_mult(x, k = 3L, niter = 100L, cost = c("euclidean", "KL", "IS"), transpose = FALSE, eps = 1e-9, tol = 1e-5, verbose = NA, ...) ## S3 method for class 'nnmf' predict(object, newdata, ...) # Nonnegative double SVD nndsvd(x, k = 3L, ...)
# Alternating least squares nnmf_als(x, k = 3L, niter = 100L, transpose = FALSE, eps = 1e-9, tol = 1e-5, verbose = NA, ...) # Multiplicative updates nnmf_mult(x, k = 3L, niter = 100L, cost = c("euclidean", "KL", "IS"), transpose = FALSE, eps = 1e-9, tol = 1e-5, verbose = NA, ...) ## S3 method for class 'nnmf' predict(object, newdata, ...) # Nonnegative double SVD nndsvd(x, k = 3L, ...)
x |
A nonnegative matrix. |
k |
The number of NMF components to extract. |
niter |
The maximum number of iterations. |
transpose |
A logical value indicating whether |
eps |
A small regularization parameter to prevent singularities. |
tol |
The tolerance for convergence, as measured by the Frobenius norm of the differences between the W and H matrices in successive iterations. |
verbose |
Should progress be printed for each iteration? |
cost |
The cost function (i.e., error measure between the reconstructed matrix and original |
... |
Additional options passed to |
object |
An object inheriting from |
newdata |
An optional data matrix to use for the prediction. |
These functions implement nonnegative matrix factorization (NMF) using either alternating least squares as described by Berry at al. (2007) or multiplicative updates from Lee and Seung (2000) and further described by Burred (2014). The algorithms are initialized using nonnegative double singular value decomposition (NNDSVD) from Boutsidis and Gallopoulos (2008).
The algorithm using multiplicative updates (nnmf_mult()
) tends to be more stable but converges more slowly. The alternative least squares algorithm (nnmf_als()
) tends to converge faster to more accurate results, but can be less numerically stable than the multiplicative updates algorithm.
Note for nnmf_mult()
that method = "euclidean"
is the only method that can handle out-of-memory matter_mat
and sparse_mat
matrices. x
will be coerced to an in-memory matrix for other methods.
An object of class nnmf
, with the following components:
activation
: The (transposed) coefficient matrix (H).
x
: The basis variable matrix (W).
iter
: The number of iterations performed.
Kylie A. Bemis
M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, R. J. Plemmons. “Algorithms and applications for approximate nonnegative matrix factorization.” Computational Statistics and Data Analysis, vol. 52, issue 1, pp. 155-173, Sept. 2007.
D. D. Lee and H. S. Seung. “Algorithms for non-negative matrix factorization.” Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS), pp. 535-541, Jan. 2000.
C. Boutsidis and E. Gallopoulos. “SVD based initialization: A head start for nonnegative matrix factorization.” Pattern Recognition, vol. 41, issue 4, pp. 1350-1362, Apr. 2008.
J. J. Burred. “Detailed derivation of multiplicative update rules for NMF.” Techical report, Paris, March 2014.
set.seed(1) a <- matrix(sort(runif(500)), nrow=50, ncol=10) b <- matrix(rev(sort(runif(500))), nrow=50, ncol=10) x <- cbind(a, b) mf <- nnmf_als(x, k=3)
set.seed(1) a <- matrix(sort(runif(500)), nrow=50, ncol=10) b <- matrix(rev(sort(runif(500))), nrow=50, ncol=10) x <- cbind(a, b) mf <- nnmf_als(x, k=3)
Nearest shrunken centroids performs regularized classification of high-dimensional data. Originally developed for classification of microarrays, it calculates test statistics for each feature/dimension based on the deviation between the class centroids and the global centroid. It applies regularization (via soft thresholding) to these test statistics to produce shrunken centroids for each class.
# Nearest shrunken centroids nscentroids(x, y, s = 0, distfun = NULL, priors = table(y), center = NULL, transpose = FALSE, verbose = NA, chunkopts=list(), BPPARAM = bpparam(), ...) ## S3 method for class 'nscentroids' fitted(object, type = c("response", "class"), ...) ## S3 method for class 'nscentroids' predict(object, newdata, type = c("response", "class"), priors = NULL, ...) ## S3 method for class 'nscentroids' logLik(object, ...)
# Nearest shrunken centroids nscentroids(x, y, s = 0, distfun = NULL, priors = table(y), center = NULL, transpose = FALSE, verbose = NA, chunkopts=list(), BPPARAM = bpparam(), ...) ## S3 method for class 'nscentroids' fitted(object, type = c("response", "class"), ...) ## S3 method for class 'nscentroids' predict(object, newdata, type = c("response", "class"), priors = NULL, ...) ## S3 method for class 'nscentroids' logLik(object, ...)
x |
The data matrix. |
y |
The response. (Coerced to a factor.) |
s |
The sparsity (soft thresholding) parameter used to shrink the test statistics. May be a vector. |
distfun |
A distance function with the same usage (i.e., supports the same arguments and return values) as |
priors |
The prior probabilities or sample sizes for each class. (Will be normalized.) |
center |
An optional vector giving the pre-calculated global centroid. |
transpose |
A logical value indicating whether |
verbose |
Should progress be printed for the initial centroid calculations and for each fitted model (i.e., each value of |
chunkopts |
An (optional) list of chunk options including |
BPPARAM |
An optional instance of |
... |
Additional options passed to |
object |
An object inheriting from |
newdata |
An optional data matrix to use for the prediction. |
type |
The type of prediction, where |
This functions implements nearest shrunken centroids based on the original algorithm by Tibshirani et al. (2002). It provides a sparse strategy for classification based on regularized class centroids. The class centroids are shrunken toward the global centroid. The shrunken test statistics used to perform the regularization can then be interpreted to determine which features are relevant to the classification. (Important features will have nonzero test statitistics after soft thresholding.)
A custom distance function can be passed via distfun
. If not provided, then this defaults to rowDists
if transpose=FALSE
or colDists
if transpose=TRUE
.
If a custom function is passed, it must support the same arguments and return values as rowDists
and colDists
.
An object of class nscentroids
, with the following components:
class
: The predicted classes.
probability
: A matrix of posterior class probabilities.
centers
: The shrunken class centroids used for classification.
statistic
: The shrunken test statistics.
sd
: The pooled within-class standard deviations for each feature.
priors
: The prior class probabilities.
s
: The regularization (soft thresholding) parameter.
distfun
: The function used to generate the dissimilarity function.
Kylie A. Bemis
R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Diagnosis of multiple cancer types by shrunken centroids of gene expression.” Proceedings of the National Academy of Sciences of the USA, vol. 99, no. 10, pp. 6567-6572, 2002.
R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Class prediction by nearest shrunken with applications to DNA microarrays.” Statistical Science, vol. 18, no. 1, pp. 104-117, 2003.
register(SerialParam()) set.seed(1) n <- 100 p <- 5 x <- matrix(rnorm(n * p), nrow=n, ncol=p) colnames(x) <- paste0("x", seq_len(p)) y <- ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b") nscentroids(x, y, s=1.5)
register(SerialParam()) set.seed(1) n <- 100 p <- 5 x <- matrix(rnorm(n * p), nrow=n, ncol=p) colnames(x) <- paste0("x", seq_len(p)) y <- ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b") nscentroids(x, y, s=1.5)
Summarize peaks based on their shapes and properties.
# Get peak widths peakwidths(x, peaks, domain = NULL, fmax = 0.5, ref = c("height", "prominence")) # Get peak areas peakareas(x, peaks, domain = NULL) # Get peak heights peakheights(x, peaks)
# Get peak widths peakwidths(x, peaks, domain = NULL, fmax = 0.5, ref = c("height", "prominence")) # Get peak areas peakareas(x, peaks, domain = NULL) # Get peak heights peakheights(x, peaks)
x |
A numeric vector. |
peaks |
The indices (or domain values) of peaks for which the widths or areas should be calculated. |
domain |
The domain variable of the signal. |
fmax |
The fraction of the peak's height used for determining the peak's width. |
ref |
The reference value of the peak for determining the peak width: either the peak height of the peak prominence. |
A numeric vector giving the widths, areas, or heights of the peaks with attributes 'left_bounds' and 'right_bounds' giving the left and right boundaries of the peak.
Kylie A. Bemis
findpeaks
,
findpeaks_cwt
,
binpeaks
,
mergepeaks
x <- c(0, 1, 1, 2, 3, 2, 1, 4, 5, 1, 1, 0) p <- findpeaks(x) peakareas(x, p) peakheights(x, p)
x <- c(0, 1, 1, 2, 3, 2, 1, 4, 5, 1, 1, 0) p <- findpeaks(x) peakareas(x, p) peakheights(x, p)
Calculate the Moore-Penrose pseudoinverse of a matrix.
pinv(x, tol = sqrt(.Machine$double.eps))
pinv(x, tol = sqrt(.Machine$double.eps))
x |
A matrix. |
tol |
The relative tolerance for detecting non-zero singular values. |
A matrix giving the pseudoinverse of x
.
set.seed(1) x <- diag(10) + rnorm(100) pinv(x)
set.seed(1) x <- diag(10) + rnorm(100) pinv(x)
Plot a list of superposed or faceted signals or images (in 2D or 3D).
plot_signal(x, y, z, by, group = NULL, byrow = FALSE, xlim = NULL, ylim = NULL, col = NULL, alphapow = 1, xlab = NULL, ylab = NULL, layout = NULL, free = "", n = Inf, downsampler = "lttb", key = TRUE, grid = TRUE, isPeaks = FALSE, annPeaks = 0, engine = NULL, ...) plot_image(x, y, z, vals, by, group = NULL, byrow = FALSE, zlim = NULL, xlim = NULL, ylim = NULL, col = NULL, alphapow = 1, zlab = NULL, xlab = NULL, ylab = NULL, layout = NULL, free = "", enhance = NULL, smooth = NULL, scale = NULL, key = TRUE, rasterImages = NULL, rasterParams = NULL, useRaster = !is3d, grid = TRUE, asp = 1, engine = NULL, ...)
plot_signal(x, y, z, by, group = NULL, byrow = FALSE, xlim = NULL, ylim = NULL, col = NULL, alphapow = 1, xlab = NULL, ylab = NULL, layout = NULL, free = "", n = Inf, downsampler = "lttb", key = TRUE, grid = TRUE, isPeaks = FALSE, annPeaks = 0, engine = NULL, ...) plot_image(x, y, z, vals, by, group = NULL, byrow = FALSE, zlim = NULL, xlim = NULL, ylim = NULL, col = NULL, alphapow = 1, zlab = NULL, xlab = NULL, ylab = NULL, layout = NULL, free = "", enhance = NULL, smooth = NULL, scale = NULL, key = TRUE, rasterImages = NULL, rasterParams = NULL, useRaster = !is3d, grid = TRUE, asp = 1, engine = NULL, ...)
x , y , z , vals
|
Lists of vectors to plot such that |
by |
A vector of labels indicating facets (i.e., which values should be plotted as separate sub-plots). |
group |
A vector of labels indicating groups (i.e., which values should be indicated by color as belonging to the same group). |
byrow |
If |
xlim , ylim , zlim
|
The plot limits. See |
xlab , ylab , zlab
|
Plotting labels. |
col |
A vector giving the color map for encoding the image, or a function that returns a vector of |
alphapow |
The power scaling of the alpha channel (if used). |
layout |
A vector of the form |
free |
A string specifying the free spatial dimensions during faceting. E.g., |
n , downsampler
|
See |
key |
Should a color key be generated for the image? |
grid |
Should a rectangular grid be included in the plot? |
isPeaks |
Whether the signal should be plotted as peaks or as a continuous signal. |
annPeaks |
If |
engine |
The plotting engine. Default is to use base graphics. Using "plotly" requires the |
... |
Additional graphical parameters (as in |
enhance |
The name of a contrast enhancement method, such as |
smooth |
The name of a smoothing method, such as |
scale |
If |
asp |
The aspect ratio. See |
rasterImages , rasterParams
|
A list of rasters and raster parameters (e.g., |
useRaster |
Should a bitmap raster be used for plotting? For 2D images, this is typically faster on supported devices. A fallback to polygon-based plotting is used if raster plotting is not supported. For 3D images, |
An object of class vizi_plot
.
Kylie A. Bemis
require(datasets) # plot signals set.seed(1) s <- simspec(6) plot_signal(domain(s), s, group=colnames(s)) # volcano image pos <- expand.grid(x=1:nrow(volcano), y=1:ncol(volcano)) plot_image(pos$x, pos$y, volcano, col=cpal("plasma")) # plot original and transformed images volcano2 <- trans2d(volcano, rotate=15, translate=c(-5, 5)) plot_image(list(original=volcano, transformed=volcano2))
require(datasets) # plot signals set.seed(1) s <- simspec(6) plot_signal(domain(s), s, group=colnames(s)) # volcano image pos <- expand.grid(x=1:nrow(volcano), y=1:ncol(volcano)) plot_image(pos$x, pos$y, volcano, col=cpal("plasma")) # plot original and transformed images volcano2 <- trans2d(volcano, rotate=15, translate=c(-5, 5)) plot_image(list(original=volcano, transformed=volcano2))
These functions provide plotting methods for various graphical marks. They are not intended to be called directly.
## S3 method for class 'vizi_points' plot(x, plot = NULL, ..., n = Inf, downsampler = "lttb", jitter = "", sort = is.finite(n)) ## S3 method for class 'vizi_lines' plot(x, plot = NULL, ..., n = Inf, downsampler = "lttb", jitter = "", sort = is.finite(n)) ## S3 method for class 'vizi_peaks' plot(x, plot = NULL, ..., n = Inf, downsampler = "lttb", jitter = "", sort = is.finite(n)) ## S3 method for class 'vizi_text' plot(x, plot = NULL, ..., adj = NULL, pos = NULL, offset = 0.5) ## S3 method for class 'vizi_intervals' plot(x, plot = NULL, ..., length = 0.25, angle = 90) ## S3 method for class 'vizi_rules' plot(x, plot = NULL, ...) ## S3 method for class 'vizi_bars' plot(x, plot = NULL, ..., width = 1, stack = FALSE) ## S3 method for class 'vizi_boxplot' plot(x, plot = NULL, ..., range = 1.5, notch = FALSE, width = 0.8) ## S3 method for class 'vizi_image' plot(x, plot = NULL, ..., alpha = NA, interpolate = TRUE, maxColorValue = 1) ## S3 method for class 'vizi_pixels' plot(x, plot = NULL, ..., enhance = FALSE, smooth = FALSE, scale = FALSE, useRaster = TRUE) ## S3 method for class 'vizi_voxels' plot(x, plot = NULL, ..., xslice = NULL, yslice = NULL, zslice = NULL)
## S3 method for class 'vizi_points' plot(x, plot = NULL, ..., n = Inf, downsampler = "lttb", jitter = "", sort = is.finite(n)) ## S3 method for class 'vizi_lines' plot(x, plot = NULL, ..., n = Inf, downsampler = "lttb", jitter = "", sort = is.finite(n)) ## S3 method for class 'vizi_peaks' plot(x, plot = NULL, ..., n = Inf, downsampler = "lttb", jitter = "", sort = is.finite(n)) ## S3 method for class 'vizi_text' plot(x, plot = NULL, ..., adj = NULL, pos = NULL, offset = 0.5) ## S3 method for class 'vizi_intervals' plot(x, plot = NULL, ..., length = 0.25, angle = 90) ## S3 method for class 'vizi_rules' plot(x, plot = NULL, ...) ## S3 method for class 'vizi_bars' plot(x, plot = NULL, ..., width = 1, stack = FALSE) ## S3 method for class 'vizi_boxplot' plot(x, plot = NULL, ..., range = 1.5, notch = FALSE, width = 0.8) ## S3 method for class 'vizi_image' plot(x, plot = NULL, ..., alpha = NA, interpolate = TRUE, maxColorValue = 1) ## S3 method for class 'vizi_pixels' plot(x, plot = NULL, ..., enhance = FALSE, smooth = FALSE, scale = FALSE, useRaster = TRUE) ## S3 method for class 'vizi_voxels' plot(x, plot = NULL, ..., xslice = NULL, yslice = NULL, zslice = NULL)
x |
A graphical mark. |
plot |
A |
... |
Additional graphical parameters passed to the underlying base graphics plotting function. |
n |
Transformation. Maximum number of points to plot. This is useful for downsampling series with far more data points than are useful to plot. See |
downsampler |
Transformation. If |
jitter |
Transformation. Should jitter be applied to one or more position channels? One of |
sort |
Transformation. Should the data be sorted (along the x-axis) before plotting? Mostly useful for line charts. |
width |
The width of the bars or boxplots. |
stack |
Should bars be stacked versus grouped side-by-side? |
adj , pos , offset
|
See |
length , angle
|
See |
range , notch
|
See |
alpha |
Opacity level from 0 to 1. |
interpolate |
See |
maxColorValue |
See |
enhance |
Transformation. The name of a contrast enhancement method, such as |
smooth |
Transformation. The name of a smoothing method, such as |
scale |
Transformation. If |
useRaster |
Should a bitmap raster be used for plotting? This is typically faster on supported devices. A fallback to polygon-based plotting is used if raster plotting is not supported. |
xslice , yslice , zslice
|
Numeric vectors giving the x, y, and/or z coordinates of the volumetric slices to plot. If none are provided, defaults to plotting all z-slices. |
These methods are not intended to be called directly. They are presented here to document the transformations and parameters they accept. These should be passed a list to the trans
and params
arguments in add_mark
.
See add_mark
for supported encodings.
Kylie A. Bemis
Partial least squares (PLS), also called projection to latent structures, performs multivariate regression between a data matrix and a response matrix by decomposing both matrixes in a way that explains the maximum amount of covariation between them. It is especially useful when the number of predictors is greater than the number of observations, or when the predictors are highly correlated. Orthogonal partial least squares (OPLS) is also provided.
# NIPALS algorithm pls_nipals(x, y, k = 3L, center = TRUE, scale. = FALSE, transpose = FALSE, niter = 100L, tol = 1e-5, verbose = NA, BPPARAM = bpparam(), ...) # SIMPLS algorithm pls_simpls(x, y, k = 3L, center = TRUE, scale. = FALSE, transpose = FALSE, method = 1L, retscores = TRUE, verbose = NA, BPPARAM = bpparam(), ...) # Kernel algorithm pls_kernel(x, y, k = 3L, center = TRUE, scale. = FALSE, transpose = FALSE, method = 1L, retscores = TRUE, verbose = NA, BPPARAM = bpparam(), ...) ## S3 method for class 'pls' fitted(object, type = c("response", "class"), ...) ## S3 method for class 'pls' predict(object, newdata, k, type = c("response", "class"), simplify = TRUE, ...) # O-PLS algorithm opls_nipals(x, y, k = 3L, center = TRUE, scale. = FALSE, transpose = FALSE, niter = 100L, tol = 1e-9, regression = TRUE, verbose = NA, BPPARAM = bpparam(), ...) ## S3 method for class 'opls' coef(object, ...) ## S3 method for class 'opls' residuals(object, ...) ## S3 method for class 'opls' fitted(object, type = c("response", "class", "x"), ...) ## S3 method for class 'opls' predict(object, newdata, k, type = c("response", "class", "x"), simplify = TRUE, ...) # Variable importance in projection vip(object, type = c("projection", "weights"))
# NIPALS algorithm pls_nipals(x, y, k = 3L, center = TRUE, scale. = FALSE, transpose = FALSE, niter = 100L, tol = 1e-5, verbose = NA, BPPARAM = bpparam(), ...) # SIMPLS algorithm pls_simpls(x, y, k = 3L, center = TRUE, scale. = FALSE, transpose = FALSE, method = 1L, retscores = TRUE, verbose = NA, BPPARAM = bpparam(), ...) # Kernel algorithm pls_kernel(x, y, k = 3L, center = TRUE, scale. = FALSE, transpose = FALSE, method = 1L, retscores = TRUE, verbose = NA, BPPARAM = bpparam(), ...) ## S3 method for class 'pls' fitted(object, type = c("response", "class"), ...) ## S3 method for class 'pls' predict(object, newdata, k, type = c("response", "class"), simplify = TRUE, ...) # O-PLS algorithm opls_nipals(x, y, k = 3L, center = TRUE, scale. = FALSE, transpose = FALSE, niter = 100L, tol = 1e-9, regression = TRUE, verbose = NA, BPPARAM = bpparam(), ...) ## S3 method for class 'opls' coef(object, ...) ## S3 method for class 'opls' residuals(object, ...) ## S3 method for class 'opls' fitted(object, type = c("response", "class", "x"), ...) ## S3 method for class 'opls' predict(object, newdata, k, type = c("response", "class", "x"), simplify = TRUE, ...) # Variable importance in projection vip(object, type = c("projection", "weights"))
x |
The data matrix of predictors. |
y |
The response matrix. (Can also be a factor.) |
k |
The number of PLS components to use. (Can be a vector for the |
center |
A logical value indicating whether the variables should be shifted to be zero-centered, or a centering vector of length equal to the number of columns of |
scale. |
A logical value indicating whether the variables should be scaled to have unit variance, or a scaling vector of length equal to the number of columns of |
transpose |
A logical value indicating whether |
niter |
The maximum number of iterations (per component). |
tol |
The tolerance for convergence (per component). |
verbose |
Should progress be printed for each iteration? |
method |
The kernel algorithm to use, where |
retscores |
Should the scores be computed and returned? This also computes the amount of explained covariance for each component. This is done automatically for NIPALS, but requires additional computation for the kernel algorithms. |
regression |
For O-PLS, should a 1-component PLS regression be fit to the processed data (for each orthogonal component removed). |
... |
Not currently used. |
BPPARAM |
An optional instance of |
object |
An object inheriting from |
newdata |
An optional data matrix to use for the prediction. |
type |
The type of prediction, where |
simplify |
Should the predictions be simplified (from a list) to an array ( |
These functions implement partial least squares (PLS) using the original NIPALS algorithm by Wold et al. (1983), the SIMPLS algorithm by de Jong (1993), or the kernel algorithms by Dayal and MacGregor (1997). A function for calculating orthogonal partial least squares (OPLS) processing using the NIPALS algorithm by Trygg and Wold (2002) is also provided.
Both regression and classification can be performed. If passed a factor
, then partial least squares discriminant analysis (PLS-DA) will be performed as described by M. Barker and W. Rayens (2003).
The SIMPLS algorithm (pls_simpls()
) is relatively fast as it does not require the deflation of the data matrix. However, the results will differ slightly from the NIPALS and kernel algorithms for multivariate responses. In these cases, only the first component will be identical. The differences are not meaningful in most cases, but it is worth noting.
The kernel algorithms (pls_kernel()
) tend to be faster than NIPALS for larger data matrices. The original NIPALS algorithm (pls_nipals()
) is the reference implementation. The results from these algorithms are proven to be equivalent for both univariate and multivariate responses.
Note that the NIPALS algorithms cannot handle out-of-memory matter_mat
and sparse_mat
matrices due to the need to deflate the data matrix for each component. x
will be coerced to an in-memory matrix.
Variable importance in projection (VIP) scores proposed by Wold et al. (1993) measure of the influence each variable has on the PLS model. They can be calculated with vip()
. Note that non-NIPALS models must have retscores = TRUE
for VIP to be calculated. In practice, a VIP score greater than ~1 is a useful criterion for variable selection, although there is no statistical basis for this rule.
An object of class pls
, with the following components:
coefficients
: The regression coefficients.
projection
: The projection weights of the regression used to calculate the coefficients from the y-loadings or to project the data to the scores.
residuals
: The residuals from regression.
fitted.values
: The fitted y matrix.
weights
: (Optional) The x-weights of the regression.
loadings
: The x-loadings of the latent variables.
scores
: (Optional) The x-scores of the latent variables.
y.loadings
: The y-loadings of the latent variables.
y.scores
: (Optional) The y-scores of the latent variables.
cvar
: (Optional) The covariance explained by each component.
Or, an object of class opls
, with the following components:
weights
: The orthogonal x-weights.
loadings
: The orthogonal x-loadings.
scores
: The orthogonal x-scores.
ratio
: The ratio of the orthogonal weights to the PLS loadings for each component. This provides a measure of how much orthogonal variation is being removed by each component and can be interpreted as a scree plot similar to PCA.
x
: The processed data matrix with orthogonal variation removed.
regressions
: (Optional.) The PLS 1-component regressions on the processed data.
Kylie A. Bemis
S. Wold, H. Martens, and H. Wold. “The multivariate calibration method in chemistry solved by the PLS method.” Proceedings on the Conference on Matrix Pencils, Lecture Notes in Mathematics, Heidelberg, Springer-Verlag, pp. 286 - 293, 1983.
S. de Jong. “SIMPLS: An alternative approach to partial least squares regression.” Chemometrics and Intelligent Laboratory Systems, vol. 18, issue 3, pp. 251 - 263, 1993.
B. S. Dayal and J. F. MacGregor. “Improved PLS algorithms.” Journal of Chemometrics, vol. 11, pp. 73 - 85, 1997.
M. Barker and W. Rayens. “Partial least squares for discrimination.” Journal of Chemometrics, vol. 17, pp. 166-173, 2003.
J. Trygg and S. Wold. “Orthogonal projections to latent structures.” Journal of Chemometrics, vol. 16, issue 3, pp. 119 - 128, 2002.
S. Wold, A. Johansson, and M. Cocchi. “PLS: Partial least squares projections to latent structures.” 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM Science Publishers: Leiden, pp. 523 - 550, 1993.
register(SerialParam()) x <- cbind( c(-2.18, 1.84, -0.48, 0.83), c(-2.18, -0.16, 1.52, 0.83)) y <- as.matrix(c(2, 2, 0, -4)) pls_nipals(x, y, k=2)
register(SerialParam()) x <- cbind( c(-2.18, 1.84, -0.48, 0.83), c(-2.18, -0.16, 1.52, 0.83)) y <- as.matrix(c(2, 2, 0, -4)) pls_nipals(x, y, k=2)
This method allows computation of a truncated principal components analysis of matter_mat
and sparse_mat
matrices using the implicitly restarted Lanczos method from the “irlba” package.
## S4 method for signature 'matter_mat' prcomp(x, k = 3L, retx = TRUE, center = TRUE, scale. = FALSE, ...) ## S4 method for signature 'sparse_mat' prcomp(x, k = 3L, retx = TRUE, center = TRUE, scale. = FALSE, ...) prcomp_lanczos(x, k = 3L, retx = TRUE, center = TRUE, scale. = FALSE, transpose = FALSE, verbose = NA, BPPARAM = bpparam(), ...)
## S4 method for signature 'matter_mat' prcomp(x, k = 3L, retx = TRUE, center = TRUE, scale. = FALSE, ...) ## S4 method for signature 'sparse_mat' prcomp(x, k = 3L, retx = TRUE, center = TRUE, scale. = FALSE, ...) prcomp_lanczos(x, k = 3L, retx = TRUE, center = TRUE, scale. = FALSE, transpose = FALSE, verbose = NA, BPPARAM = bpparam(), ...)
x |
A |
k |
The number of principal components to return, must be less than |
retx |
A logical value indicating whether the rotated variables should be returned. |
center |
A logical value indicating whether the variables should be shifted to be zero-centered, or a centering vector of length equal to the number of columns of |
scale. |
A logical value indicating whether the variables should be scaled to have unit variance, or a scaling vector of length equal to the number of columns of |
transpose |
A logical value indicating whether |
verbose |
Should progress messages be printed? |
... |
Additional options passed to |
BPPARAM |
An optional instance of |
An object of class ‘prcomp’. See ?prcomp
for details.
The built-in predict()
method (from the stats package) is not compatible with the argument transpose=TRUE
.
Kylie A. Bemis
register(SerialParam()) set.seed(1) x <- matter_mat(rnorm(1000), nrow=100, ncol=10) prcomp(x)
register(SerialParam()) set.seed(1) x <- matter_mat(rnorm(1000), nrow=100, ncol=10) prcomp(x)
Calculate performance metrics for predictions from classification or regression.
predscore(x, ref)
predscore(x, ref)
x |
The predicted values. |
ref |
The reference (observed) values. |
For classification, a numeric matrix with a row for each class and columns called "Recall" and "Precision".
For regression, a numeric vector with elements named "RMSE", "MAE", and "MAPE".
Kylie A. Bemis
set.seed(1) n <- 250 s <- c("a", "b", "c") x <- sample(s, n, replace=TRUE) pred <- ifelse(runif(n) > 0.1, x, sample(s, n, replace=TRUE)) predscore(pred, x)
set.seed(1) n <- 250 s <- c("a", "b", "c") x <- sample(s, n, replace=TRUE) pred <- ifelse(runif(n) > 0.1, x, sample(s, n, replace=TRUE)) predscore(pred, x)
Normalize or rescale a signal.
# Rescale the root-mean-squared of the signal rescale_rms(x, scale = 1) # Rescale the sum of (absolute values of) the signal rescale_sum(x, scale = length(x)) # Rescale according to a specific sample rescale_ref(x, ref = 1L, scale = 1, domain = NULL) # Rescale the lower and upper limits of the signal rescale_range(x, limits = c(0, 1)) # Rescale the interquartile range rescale_iqr(x, width = 1, center = 0)
# Rescale the root-mean-squared of the signal rescale_rms(x, scale = 1) # Rescale the sum of (absolute values of) the signal rescale_sum(x, scale = length(x)) # Rescale according to a specific sample rescale_ref(x, ref = 1L, scale = 1, domain = NULL) # Rescale the lower and upper limits of the signal rescale_range(x, limits = c(0, 1)) # Rescale the interquartile range rescale_iqr(x, width = 1, center = 0)
x |
A numeric vector. |
scale |
The scaling value. |
ref |
The sample (index or domain value) to use. |
domain |
The domain variable of the signal. |
limits |
The lower and upper limits to use. |
width |
The interquartile range to use. |
center |
The center to use. |
rescale_rms()
and rescale_sum()
simply divide the signal by its root-mean-square or absolute sum, respectively, before multiplying by the given scaling factor.
rescale_ref()
finds the closest matching sample to ref
in domain
if given (it is treated as an index if domain
is NULL), and then scales the entire signal to make that sample equal to scale
.
rescale_range()
simply rescales the signal to match the given upper and lower limits.
rescale_iqr()
attempts to rescale the signal so that its interquartile range is approximately width
.
A rescaled numeric vector the same length as x
.
Kylie A. Bemis
set.seed(1) x <- rnorm(100) sqrt(mean(x^2)) y <- rescale_rms(x, 1) sqrt(mean(y^2)) range(x) z <- rescale_range(x, c(-1, 1)) range(z)
set.seed(1) x <- rnorm(100) sqrt(mean(x^2)) y <- rescale_rms(x, 1) sqrt(mean(y^2)) range(x) z <- rescale_range(x, c(-1, 1)) range(z)
These functions provide utilities for working with multiple streams of pseudo-random numbers.
RNGStreams(n = length(size), size = 1L) getRNGStream() setRNGStream(seed = NULL, kind = NULL)
RNGStreams(n = length(size), size = 1L) getRNGStream() setRNGStream(seed = NULL, kind = NULL)
n |
The number of RNG streams to create. |
size |
If |
seed |
A valid RNG stream to assign to |
kind |
The |
For RNGStreams
, a list of length n
with RNG streams (with elements named seed
and kind
) for use with setRNGStream
.
Kylie A. Bemis
# create parallel RNG streams register(SerialParam()) RNGkind("L'Ecuyer-CMRG") set.seed(1) seeds <- RNGStreams(4) seeds
# create parallel RNG streams register(SerialParam()) RNGkind("L'Ecuyer-CMRG") set.seed(1) seeds <- RNGStreams(4) seeds
Calculate the area under the receiver operating characteristic curve (ROC AUC).
rocscore(x, ref, n = 32L)
rocscore(x, ref, n = 32L)
x |
The prediction scores. |
ref |
The (logical) binary response. |
n |
The number of points in the curve. |
A single number between 0 and 1 giving the ROC AUC, with an attribute called ROC
which is a data frame giving the full ROC curve.
Kylie A. Bemis
set.seed(1) x <- runif(100) y <- ifelse(x > 0.5 & runif(100) > 0.2, TRUE, FALSE) rocscore(x, y)
set.seed(1) x <- runif(100) y <- ifelse(x > 0.5 & runif(100) > 0.2, TRUE, FALSE) rocscore(x, y)
Summarize a vector in rolling windows.
rollvec(x, width, stat = "sum", prob = 0.5)
rollvec(x, width, stat = "sum", prob = 0.5)
x |
A numeric vector. |
width |
The width of the rolling window. Must be odd. |
stat |
The statistic used to summarize the values in each bin. Must be one of "sum", "mean", "max", "min", "sd", "var", "mad", or "quantile". |
prob |
The quantile for |
An numeric vector with the same length as x
with the summarized values from each rolling window.
Kylie A. Bemis
set.seed(1) x <- sort(runif(20)) rollvec(x, 5L, "mean")
set.seed(1) x <- sort(runif(20)) rollvec(x, 5L, "mean")
Compute and return the distances between specific rows or columns of matrices according to a specific distance metric.
## S4 method for signature 'matrix,matrix' rowDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'matrix,matrix' colDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'matter_mat,matrix' rowDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'matrix,matter_mat' colDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'sparse_mat,matrix' rowDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'matrix,sparse_mat' colDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'ANY,missing' rowDists(x, at, ..., simplify = TRUE, BPPARAM = bpparam()) ## S4 method for signature 'ANY,missing' colDists(x, at, ..., simplify = TRUE, BPPARAM = bpparam()) .rowDists(x, y, metric = "euclidean", p = 2, weights = NULL, iter.dim = 1L, BPPARAM = bpparam(), ...) .colDists(x, y, metric = "euclidean", p = 2, weights = NULL, iter.dim = 1L, BPPARAM = bpparam(), ...) # Low-level row/col distance functions rowdist(x, y = x, metric = "euclidean", p = 2, weights = NULL, ...) coldist(x, y = x, metric = "euclidean", p = 2, weights = NULL, ...) rowdist_at(x, ix, y = x, iy = list(1L:nrow(y)), metric = "euclidean", p = 2, weights = NULL, ...) coldist_at(x, ix, y = x, iy = list(1L:ncol(y)), metric = "euclidean", p = 2, weights = NULL, ...)
## S4 method for signature 'matrix,matrix' rowDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'matrix,matrix' colDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'matter_mat,matrix' rowDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'matrix,matter_mat' colDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'sparse_mat,matrix' rowDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'matrix,sparse_mat' colDists(x, y, ..., BPPARAM = bpparam()) ## S4 method for signature 'ANY,missing' rowDists(x, at, ..., simplify = TRUE, BPPARAM = bpparam()) ## S4 method for signature 'ANY,missing' colDists(x, at, ..., simplify = TRUE, BPPARAM = bpparam()) .rowDists(x, y, metric = "euclidean", p = 2, weights = NULL, iter.dim = 1L, BPPARAM = bpparam(), ...) .colDists(x, y, metric = "euclidean", p = 2, weights = NULL, iter.dim = 1L, BPPARAM = bpparam(), ...) # Low-level row/col distance functions rowdist(x, y = x, metric = "euclidean", p = 2, weights = NULL, ...) coldist(x, y = x, metric = "euclidean", p = 2, weights = NULL, ...) rowdist_at(x, ix, y = x, iy = list(1L:nrow(y)), metric = "euclidean", p = 2, weights = NULL, ...) coldist_at(x, ix, y = x, iy = list(1L:ncol(y)), metric = "euclidean", p = 2, weights = NULL, ...)
x , y
|
Numeric matrices for which distances should be calculated according to rows or columns. If a parallel backend is provided, then the calculation is parallelized over |
at |
A list, matrix, or vector of specific row or column indices for which to calculate the distances. Each row or column of |
simplify |
Should the result be simplified into a matrix if possible? |
metric |
Distance metric to compute. Supported metrics include "euclidean", "maximum", "manhattan", and "minkowski". |
p |
The power for the Minkowski distance. |
weights |
A numeric vector of weights for the distance components if calculating weighted distances. For example, the weighted Euclidean distance is |
iter.dim |
The dimension to iterate over. Must be 1 or 2, where 1 indicates rows and 2 indicates columns. |
ix , iy
|
A list of specific row or column indices for which to calculate the pairwise distances. Numeric vectors will be coerced to lists. Each list element should give a vector of indices to use for a distance computation. Elements of length 1 will be recycled to an appropriate length. |
BPPARAM |
An optional instance of |
... |
Additional arguments passed to |
rowdist()
and coldist()
calculate straightforward distances between each row or column, respectively in x
and y
. If y = x
(the default), then the output of rowdist()
will match the output of dist
(except it will be an ordinary matrix).
rowdist_at()
and coldist_at()
allow passing a list of specific row or column indices for which to calculate the distances.
rowDists()
and colDists()
are S4 generics. The current methods provide (optionally parallelized) versions of rowdist()
and coldist()
for matter_mat
and sparse_mat
matrices.
For rowdist()
and coldist()
, a matrix with rows equal to the number of observations in x
and columns equal to the number of observations in y
.
For rowdist_at()
and coldist_at()
, a list where each element gives the pairwise distances corresponding to the indices given by ix
and iy
.
rowDists()
and colDists()
have corresponding return values depending on whether at
has been specified (and the value of simplify
).
Kylie A. Bemis
register(SerialParam()) set.seed(1) x <- matrix(runif(25), nrow=5, ncol=5) y <- matrix(runif(25), nrow=3, ncol=5) rowDists(x) # same as as.matrix(dist(x)) rowDists(x, y) # distances between: # x[1,] vs x[,] # x[5,] vs x[,] rowdist_at(x, c(1,5)) # distances between: # x[1,] vs x[1:3,] # x[5,] vs x[3:5,] rowdist_at(x, ix=c(1,5), iy=list(1:3, 3:5)) # distances between: # x[i,] vs x[(i-1):(i+1),] rowDists(x, at=roll(1:5, width=3)) # distances between: # x[,] vs x[1,] rowDists(x, at=1)
register(SerialParam()) set.seed(1) x <- matrix(runif(25), nrow=5, ncol=5) y <- matrix(runif(25), nrow=3, ncol=5) rowDists(x) # same as as.matrix(dist(x)) rowDists(x, y) # distances between: # x[1,] vs x[,] # x[5,] vs x[,] rowdist_at(x, c(1,5)) # distances between: # x[1,] vs x[1:3,] # x[5,] vs x[3:5,] rowdist_at(x, ix=c(1,5), iy=list(1:3, 3:5)) # distances between: # x[i,] vs x[(i-1):(i+1),] rowDists(x, at=roll(1:5, width=3)) # distances between: # x[,] vs x[1,] rowDists(x, at=1)
Generate a sequence with spacing based on relative differences.
seq_rel(from, to, by)
seq_rel(from, to, by)
from |
The start of the sequence. |
to |
The end of the sequence. |
by |
The relative difference between consecutive elements. |
Because the relative differences depend on whether we treat the lesser or greater term as the "reference", the function actually treats each element as the center of a bin of width by
(relative to that element).
A numeric vector of some (algorithmically-determined) length where consecutive elements are separated by consistent relative differences.
Kyle Dahlin and Kylie A. Bemis
# create a sequence w/ 1e-4 relative diff spacing x <- seq_rel(500, 505, by=1e-4) x # relative spacing is averaged between lesser/greater element head(diff(x) / x[-1]) head(diff(x) / x[-length(x)]) dx <- 0.5 * ((diff(x) / x[-1]) + (diff(x) / x[-length(x)])) dx
# create a sequence w/ 1e-4 relative diff spacing x <- seq_rel(500, 505, by=1e-4) x # relative spacing is averaged between lesser/greater element head(diff(x) / x[-1]) head(diff(x) / x[-length(x)]) dx <- 0.5 * ((diff(x) / x[-1]) + (diff(x) / x[-length(x)])) dx
Spatially segment a single-channel image using a Dirichlet Gaussian mixture model (DGMM).
# Spatial Gaussian mixture model sgmix(x, y, vals, r = 1, k = 2, group = NULL, weights = c("gaussian", "bilateral", "adaptive"), metric = "maximum", p = 2, neighbors = NULL, annealing = TRUE, niter = 10L, tol = 1e-3, compress = FALSE, byrow = FALSE, verbose = NA, chunkopts=list(), BPPARAM = bpparam(), ...) ## S3 method for class 'sgmix' fitted(object, type = c("mu", "sigma", "class"), channel = NULL, ...) ## S3 method for class 'sgmix' logLik(object, ...)
# Spatial Gaussian mixture model sgmix(x, y, vals, r = 1, k = 2, group = NULL, weights = c("gaussian", "bilateral", "adaptive"), metric = "maximum", p = 2, neighbors = NULL, annealing = TRUE, niter = 10L, tol = 1e-3, compress = FALSE, byrow = FALSE, verbose = NA, chunkopts=list(), BPPARAM = bpparam(), ...) ## S3 method for class 'sgmix' fitted(object, type = c("mu", "sigma", "class"), channel = NULL, ...) ## S3 method for class 'sgmix' logLik(object, ...)
x , y , vals
|
Pixel coordinates ( |
r |
The spatial smoothing radius. |
k |
The number of segments (per group, if applicable). |
group |
A vector of pixel groups. Pixels belonging to each group will be segmented independently, and will be assigned to different segments. |
weights |
The type of spatial weights to use. |
metric |
Distance metric to use when finding neighboring pixels. Supported metrics include "euclidean", "maximum", "manhattan", and "minkowski". |
p |
The power for the Minkowski distance. |
neighbors |
An optional list giving the neighboring pixel indices for each pixel. |
annealing |
Should simulated annealing be attempted every iteration? (If |
niter |
The maximum number of iterations. |
tol |
The tolerance for convergence, as measured by the change in log-likelihood in successive iterations. |
compress |
Should the results be compressed? The resulting |
byrow |
If |
verbose |
Should progress be printed for each iteration? |
chunkopts |
An (optional) list of chunk options including |
BPPARAM |
An optional instance of |
... |
Ignored. |
object |
An object inheriting from |
type |
The type of fitted values to extract. |
channel |
The channel of fitted values to extract. |
Spatial segmentation is performed using a Gaussian mixture model from Guo et al. (2019) that uses Dirichlet priors to incorporate spatial dependence. The strength of the spatial smoothing depends on the smoothing radius (r
) and the type of spatial weights
. The "bilateral" and "adaptive" weights can preserve edges better than the standard "gaussian" weights at the expense of a (potentially) noisier segmentation.
The segmentation is initialized using k-means clustering. An expectation-maximization (E-M) algorithm with gradient descent is then used to estimate the model parameters based on log-likelihood. Optionally, simulated annealing can be used to prevent the model from getting stuck in local maxima.
An object of class sgmix
, with the following components:
class
: A list of class assignments for each channel.
probability
: (Optional) A matrix or array of posterior class probabilities.
mu
: The fitted class means.
sigma
: The fitted class standard deviations.
alpha
: The fitted Dirichlet priors.
beta
: The estimated strength of the spatial dependence.
group
: (Optional) The pixel groups.
Kylie A. Bemis
D. Guo, K. Bemis, C. Rawlins, J. Agar, and O. Vitek. “Unsupervised segmentation of mass spectrometric ion images characterizes morphology of tissues” Bioinformatics, vol. 35, issue 14, pp. i208-i217, 2019.
require(datasets) set.seed(1) seg <- sgmix(volcano, k=3) image(fitted(seg, "class", channel=1L))
require(datasets) set.seed(1) seg <- sgmix(volcano, k=3) image(fitted(seg, "class", channel=1L))
Shingles are a generalization of factors to continuous variables. Cleveland-style shingles distribute the range of a continuous variable into overlapping discrete intervals, which can be more useful for visualization than mutually-exclusive bins.
shingles(x, breaks, overlap = 0.5, labels = NULL)
shingles(x, breaks, overlap = 0.5, labels = NULL)
x |
A numeric vector. |
breaks |
Either the number of intervals or a matrix of breaks (as returned by |
overlap |
The fraction of overlap between intervals. |
labels |
The names of the intervals. |
A list giving the indices of x
in each shingle with the following attributes:
breaks
: A matrix where each row gives the lower and upper limits for each shingle.
counts
: The number of observations in each shingle.
mids
: The center of each shingle.
Kylie A. Bemis
W. S. Cleveland. Visualizing Data. New Jersey: Summit Press. 1993.
set.seed(1) x <- rnorm(100) shingles(x, 6)
set.seed(1) x <- rnorm(100) shingles(x, 6)
A simple logger that uses R's built-in signaling conditions.
## Instance creation simple_logger(file = NULL, bufferlimit = 50L, domain = NULL) ## Get logger used by matter matter_logger() ## Additional methods documented below
## Instance creation simple_logger(file = NULL, bufferlimit = 50L, domain = NULL) ## Get logger used by matter matter_logger() ## Additional methods documented below
file |
The name of the log file. If |
bufferlimit |
The maximum number of buffered log entries before they are flushed to the log file. |
domain |
See |
An object of reference class simple_logger
.
id
:An identifier for file synchronization.
buffer
:A character vector of buffered log entries that have not yet been flushed to the log file.
bufferlimit
:The maximum number of buffered log entries before they are flushed to the log file.
logfile
:The path to the log file.
domain
:See gettext
for details. If NA
, log entries will not be translated.
Class-specific methods:
$flush():
Flush buffered log entries to the log file.
$append(entry):
Append a plain text entry to the log.
$append_session():
Append the sessionInfo()
to the log.
$append_trace():
Append the traceback()
to the log.
$history(print = TRUE):
Print the complete log to the console (TRUE
) or return it as a character vector (FALSE
).
$log(..., signal = FALSE, call = NULL):
Record a log entry with a timestamp, optionally signaling a "message"
, "warning"
, or "error"
condition. For "warning"
and "error"
conditions (only), the call
may also be passed to be deparsed and recorded in the log entry.
$message(...):
Record a log entry with a timestamp and signal a message
condition.
$warning(...):
Record a log entry with a timestamp and signal a warning
condition.
$stop(...):
Record a log entry with a timestamp and signal an error
condition.
$move(file):
Move the log file to a new location. Moving the log file automatically appends the current sessionInfo()
.
$copy(file):
Copy the log file to a different location.
$close():
Flush buffered log entries (including sessionInfo()
) and detach the log file. This is called automatically at the end of an R session. Note that log files in the temporary directory will be deleted after the R session ends if they are not $move()
ed to a persistent directory before quitting.
Standard generic methods:
path(x), path(x) <- value:
Get or set the path to the log file. The replacement method is the same as $move()
.
Kylie A. Bemis
sl <- simple_logger(tempfile(fileext=".log")) sl$log("This is a silent log entry that doesn't signal a condition.") sl$log("This log entry signals a message condition.", signal=TRUE) sl$log("This log entry signals a message condition.", signal="message") sl$message("This log entry also signals a message condition") sl$flush() readLines(path(sl))
sl <- simple_logger(tempfile(fileext=".log")) sl$log("This is a silent log entry that doesn't signal a condition.") sl$log("This log entry signals a message condition.", signal=TRUE) sl$log("This log entry signals a message condition.", signal="message") sl$message("This log entry also signals a message condition") sl$flush() readLines(path(sl))
Simulate spectra from noise and a list of peaks.
simspec(n = 1L, npeaks = 50L, x = rlnorm(npeaks, 7, 0.3), y = rlnorm(npeaks, 1, 0.9), domain = c(0.9 * min(x), 1.1 * max(x)), size = 10000, sdx = 1e-5, sdy = sdymult * log1p(y), sdymult = 0.2, sdnoise = 0.1, resolution = 1000, fmax = 0.5, baseline = 0, decay = 10, units = "relative") simspec1(x, y, xout, peakwidths = NA_real_, sdnoise = 0, resolution = 1000, fmax = 0.5)
simspec(n = 1L, npeaks = 50L, x = rlnorm(npeaks, 7, 0.3), y = rlnorm(npeaks, 1, 0.9), domain = c(0.9 * min(x), 1.1 * max(x)), size = 10000, sdx = 1e-5, sdy = sdymult * log1p(y), sdymult = 0.2, sdnoise = 0.1, resolution = 1000, fmax = 0.5, baseline = 0, decay = 10, units = "relative") simspec1(x, y, xout, peakwidths = NA_real_, sdnoise = 0, resolution = 1000, fmax = 0.5)
n |
The number of spectra to simulate. |
npeaks |
The number of peaks to simulate. Not used if |
x , y
|
The locations and values of the spectral peaks. |
xout , domain
|
The output domain variable or its range. |
size |
The number of samples in each spectrum. |
sdx |
The standard deviation of the error in the observed locations of peaks, in units indicated by |
sdy |
The standard deviation(s) for the distributions of observed peak values on the log scale. |
sdymult |
A multiplier used to calculate |
sdnoise |
The standard deviation of the random noise in the spectra on the log scale. |
resolution |
The resolution as defined by |
fmax |
The fraction of the maximum peak height to use when defining the resolution. |
peakwidths |
The peak widths at |
baseline |
The maximum intensity of the baseline. Note that |
decay |
A constant used to calculate the exponential decay of the baseline. Larger values mean the baseline decays more sharply. |
units |
The units for |
Either a numeric vector of the same length as size
, giving the simulated spectrum, or a size
x n
matrix of simulated spectra.
Kylie A. Bemis
set.seed(1) y <- simspec(2) x <- attr(y, "domain") plot(x, y[,1], type="l", ylim=c(-max(y), max(y))) lines(x, -y[,2], col="red")
set.seed(1) y <- simspec(2) x <- attr(y, "domain") plot(x, y[,1], type="l", ylim=c(-max(y), max(y))) lines(x, -y[,2], col="red")
This class provides a enhanced version of the SnowParam
parallel backend from BiocParallel
based on the newer PSOCK cluster implementation from the parallel
package.
## Instance creation SnowfastParam(workers = snowWorkers(), tasks = 0L, stop.on.error = TRUE, progressbar = FALSE, RNGseed = NULL, timeout = WORKER_TIMEOUT, exportglobals = TRUE, exportvariables = TRUE, resultdir = NA_character_, jobname = "BPJOB", force.GC = FALSE, fallback = TRUE, useXDR = FALSE, manager.hostname = NA_character_, manager.port = NA_character_, ...) ## Additional methods documented below
## Instance creation SnowfastParam(workers = snowWorkers(), tasks = 0L, stop.on.error = TRUE, progressbar = FALSE, RNGseed = NULL, timeout = WORKER_TIMEOUT, exportglobals = TRUE, exportvariables = TRUE, resultdir = NA_character_, jobname = "BPJOB", force.GC = FALSE, fallback = TRUE, useXDR = FALSE, manager.hostname = NA_character_, manager.port = NA_character_, ...) ## Additional methods documented below
workers |
Either the number of workers or the names of cluster nodes. |
tasks |
The number of tasks per job. See |
stop.on.error |
Enable stop on error. See |
progressbar |
Enable text progress bar. See |
RNGseed |
The seed for random number generation. See |
timeout |
Time (in seconds) allowed for workers to complete a task. See |
exportglobals |
Export |
exportvariables |
Automatically export variables defined in the global environment and used by the function? See |
resultdir |
Job results directory. See |
jobname |
The name of the job for logging and results. See |
force.GC |
Whether to invoke the garbage collector after each call to |
fallback |
Fall back to using |
useXDR |
Should data be converted to network byte order when serializing from the manager to the workers? The default ( |
manager.hostname |
Host name of the manager node. |
manager.port |
Port on the manager with which workers communicate. |
... |
Additional arguments passed to |
SnowfastParam
is a faster but somewhat more limited version of SnowParam
. Like SnowParam
, it uses simple network of workstations (SNOW)-style parallelism so that it is available on all operating systems.
The workers are initialized in parallel, so cluster startup can often be significantly faster than SnowParam
if utilizing a large number of workers.
Because the workers are started using makePSOCKcluster
from the parallel
package rather than using BiocParallel
's startup script, some features of BiocParallel
-managed backends such as logging are unsupported or only partially available.
The default parameter useXDR=TRUE
assumes that all nodes are little endian so that data can be sent to workers without the overhead of conversion to network byte order. This should result in overall faster performance for large datasets on compatible clusters.
SnowfastParam
is intended to be used as a drop-in replacement for SnowParam
in most situations and as a more stable alternative to MulticoreParam
for long-running jobs in graphical environments like RStudio where forking the R process should be avoided.
An parallel backend derived from class BiocParallelParam
.
BiocParallelParam
methods:
bpstart(x):
Start the cluster.
bpstop(x):
Stop the cluster.
Additional methods documented in BiocParallelParam
.
Kylie A. Bemis
x <- replicate(1000, runif(200), simplify=FALSE) bp <- SnowfastParam(workers=4, tasks=20, progressbar=TRUE) ans <- chunkLapply(x, sum, BPPARAM=bp)
x <- replicate(1000, runif(200), simplify=FALSE) bp <- SnowfastParam(workers=4, tasks=20, progressbar=TRUE) ans <- chunkLapply(x, sum, BPPARAM=bp)
The sparse_mat
class implements sparse matrices, potentially stored out-of-memory. Both compressed-sparse-column (CSC) and compressed-sparse-row (CSR) formats are supported. Sparse vectors are also supported through the sparse_vec
class.
## Instance creation sparse_mat(data, index, type = "double", nrow = NA_integer_, ncol = NA_integer_, dimnames = NULL, pointers = NULL, domain = NULL, offset = 0L, rowMaj = FALSE, tolerance = c(abs=0), sampler = "none", ...) sparse_vec(data, index, type = "double", length = NA_integer_, names = NULL, domain = NULL, offset = 0L, rowMaj = FALSE, tolerance = c(abs=0), sampler = "none", ...) # Check if an object is a sparse matrix is.sparse(x) # Coerce an object to a sparse matrix as.sparse(x, ...) ## Additional methods documented below
## Instance creation sparse_mat(data, index, type = "double", nrow = NA_integer_, ncol = NA_integer_, dimnames = NULL, pointers = NULL, domain = NULL, offset = 0L, rowMaj = FALSE, tolerance = c(abs=0), sampler = "none", ...) sparse_vec(data, index, type = "double", length = NA_integer_, names = NULL, domain = NULL, offset = 0L, rowMaj = FALSE, tolerance = c(abs=0), sampler = "none", ...) # Check if an object is a sparse matrix is.sparse(x) # Coerce an object to a sparse matrix as.sparse(x, ...) ## Additional methods documented below
data |
Either the non-zero values of the sparse array, or (if |
index |
For |
type |
A 'character' vector giving the storage mode of the data in virtual memory such. See |
nrow , ncol , length
|
The number of rows and columns, or the length of the array. |
dimnames |
The names of the sparse matrix dimensions. |
names |
The names of the sparse vector elements. |
pointers |
The (zero-indexed) pointers to the start of either the rows or columns (depending on the value of |
domain |
Either |
offset |
If |
rowMaj |
Whether the data should be stored using compressed-sparse-row (CSR) representation (as opposed to compressed-sparse-column (CSC) representation). Defaults to 'FALSE', for efficient access to columns. Set to 'TRUE' for more efficient access to rows instead. |
tolerance |
For non- |
sampler |
For non-zero tolerances, how the |
x |
An object to check if it is a sparse matrix or coerce to a sparse matrix. |
... |
Additional arguments to be passed to constructor. |
An object of class sparse_mat
.
data
:The non-zero data values. Can be a numeric vector or a list of numeric vectors.
type
:The storage mode of the accessed data when read into R. This is a 'factor' with levels 'raw', 'logical', 'integer', 'numeric', or 'character'.
dim
:Either NULL
for vectors, or an integer vector of length one of more giving the maximal indices in each dimension for matrices and arrays.
names
:The names of the data elements for vectors.
dimnames
:Either NULL
or the names for the dimensions. If not NULL
, then this should be a list of character vectors of the length given by 'dim' for each dimension. This is always NULL
for vectors.
index
:The indices of the non-zero items. Can be a numeric vector or a list of numeric vectors.
pointers
:The pointers to the beginning of the rows or columns if index
and data
use vector storage rather than list storage.
domain
:Either NULL
or a vector with length equal to the number of rows (for CSC matrices) or the number of columns (for CSR matrices). If NULL
, then index
is assumed to be row or column indices. If a vector, then they define the how the non-zero elements are matched to rows or columns. The index
value of each non-zero element is matched against this domain using binary search. Must be numeric.
offset
:If domain
is NULL
(i.e., index
represents the actual row/column indices), then this is the index of the first row/column. The default of 0 means that index
is indexed from 0.
tolerance
:For non-NULL
domain, the tolerance used for floating-point equality when matching index
to the domain
. The vector should be named. Use 'absolute' to use absolute differences, and 'relative' to use relative differences.
sampler
:The type of summarization or interpolation performed when there are multiple index
values within the tolerance of the requested domain
value(s).
ops
:Deferred arithmetic operations.
transpose
Indicates whether the data is stored in row-major order (TRUE) or column-major order (FALSE). For a matrix, switching the order that the data is read is equivalent to transposing the matrix (without changing any data).
sparse_mat
and sparse_vec
instances can be created through sparse_mat()
and sparse_vec()
, respectively.
Class-specific methods:
atomdata(x)
:Access the 'data' slot.
adata(x)
:An alias for atomdata(x).
atomindex(x)
:Access the 'index' slot.
aindex(x)
:An alias for atomindex(x).
pointers(x)
:Access the 'pointers' slot.
domain(x)
:Access the 'domain' slot.
tolerance(x), tolerance(x) <- value
:Get or set resampling 'tolerance'.
sampler(x), sampler(x) <- value
:Get or set the 'sampler' method.
fetch(x, ...)
:Pull data into shared memory.
flash(x, ...)
:Push data to a temporary file.
Standard generic methods:
dim(x):
Get 'dim'.
dimnames(x), dimnames(x) <- value
:Get or set 'dimnames'.
x[i, j, ..., drop], x[i, j] <- value
:Get or set the elements of the sparse matrix. Use drop = NULL
to return a subset of the same class as the object.
cbind(x, ...), rbind(x, ...)
:Combine sparse matrices by row or column.
t(x)
:Transpose a matrix. This is a quick operation which only changes metadata and does not touch the data representation.
rowMaj(x)
:Check the data orientation.
Kylie A. Bemis
x <- matrix(rbinom(100, 1, 0.2), nrow=10, ncol=10) y <- sparse_mat(x) y[]
x <- matrix(rbinom(100, 1, 0.2), nrow=10, ncol=10) y <- sparse_mat(x) y[]
These functions allow calculation of streaming statistics. They are useful, for example, for calculating summary statistics on small chunks of a larger dataset, and then combining them to calculate the summary statistics for the whole dataset.
This is not particularly interesting for simpler, commutative statistics like sum()
, but it is useful for calculating non-commutative statistics like running sd()
or var()
on pieces of a larger dataset.
# calculate streaming univariate statistics s_stat(x, stat, group, na.rm = FALSE, ...) s_range(x, ..., na.rm = FALSE) s_min(x, ..., na.rm = FALSE) s_max(x, ..., na.rm = FALSE) s_prod(x, ..., na.rm = FALSE) s_sum(x, ..., na.rm = FALSE) s_mean(x, ..., na.rm = FALSE) s_var(x, ..., na.rm = FALSE) s_sd(x, ..., na.rm = FALSE) s_any(x, ..., na.rm = FALSE) s_all(x, ..., na.rm = FALSE) s_nnzero(x, ..., na.rm = FALSE) # calculate streaming matrix statistics s_rowstats(x, stat, group, na.rm = FALSE, ...) s_colstats(x, stat, group, na.rm = FALSE, ...) # calculate combined summary statistics stat_c(x, y, ...)
# calculate streaming univariate statistics s_stat(x, stat, group, na.rm = FALSE, ...) s_range(x, ..., na.rm = FALSE) s_min(x, ..., na.rm = FALSE) s_max(x, ..., na.rm = FALSE) s_prod(x, ..., na.rm = FALSE) s_sum(x, ..., na.rm = FALSE) s_mean(x, ..., na.rm = FALSE) s_var(x, ..., na.rm = FALSE) s_sd(x, ..., na.rm = FALSE) s_any(x, ..., na.rm = FALSE) s_all(x, ..., na.rm = FALSE) s_nnzero(x, ..., na.rm = FALSE) # calculate streaming matrix statistics s_rowstats(x, stat, group, na.rm = FALSE, ...) s_colstats(x, stat, group, na.rm = FALSE, ...) # calculate combined summary statistics stat_c(x, y, ...)
x , y , ...
|
Object(s) on which to calculate a summary statistic, or a summary statistic to combine. |
stat |
The name of a summary statistic to compute over the rows or columns of a matrix. Allowable values include: "range", "min", "max", "prod", "sum", "mean", "var", "sd", "any", "all", and "nnzero". |
group |
A factor or vector giving the grouping. If not provided, no grouping will be used. |
na.rm |
If |
These summary statistics methods are intended to be applied to chunks of a larger dataset. They can then be combined either with the individual summary statistic functions, or with stat_c()
, to produce the combined summary statistic for the full dataset. This is most useful for calculating running variances and standard deviations iteratively, which would be difficult or impossible to calculate on the full dataset.
The variances and standard deviations are calculated using running sum of squares formulas which can be calculated iteratively and are accurate for large floating-point datasets (see reference).
For all univariate functions except s_range()
, a single number giving the summary statistic. For s_range()
, two numbers giving the minimum and the maximum values.
For s_rowstats()
and s_colstats()
, a vector of summary statistics.
Kylie A. Bemis
B. P. Welford. “Note on a Method for Calculating Corrected Sums of Squares and Products.” Technometrics, vol. 4, no. 3, pp. 1-3, Aug. 1962.
B. O'Neill. “Some Useful Moment Results in Sampling Problems.” The American Statistician, vol. 68, no. 4, pp. 282-296, Sep. 2014.
set.seed(1) x <- sample(1:100, size=10) y <- sample(1:100, size=10) sx <- s_var(x) sy <- s_var(y) var(c(x, y)) stat_c(sx, sy) # should be the same sxy <- stat_c(sx, sy) # calculate with 1 new observation var(c(x, y, 99)) stat_c(sxy, 99) # calculate over rows of a matrix set.seed(2) A <- matrix(rnorm(100), nrow=10) B <- matrix(rnorm(100), nrow=10) sx <- s_rowstats(A, "var") sy <- s_rowstats(B, "var") apply(cbind(A, B), 1, var) stat_c(sx, sy) # should be the same
set.seed(1) x <- sample(1:100, size=10) y <- sample(1:100, size=10) sx <- s_var(x) sy <- s_var(y) var(c(x, y)) stat_c(sx, sy) # should be the same sxy <- stat_c(sx, sy) # calculate with 1 new observation var(c(x, y, 99)) stat_c(sxy, 99) # calculate over rows of a matrix set.seed(2) A <- matrix(rnorm(100), nrow=10) B <- matrix(rnorm(100), nrow=10) sx <- s_rowstats(A, "var") sy <- s_rowstats(B, "var") apply(cbind(A, B), 1, var) stat_c(sx, sy) # should be the same
This is a convenience function for creating and reading C-style structs in a single file stored in virtual memory.
struct(..., path = NULL, readonly = FALSE, offset = 0, filename)
struct(..., path = NULL, readonly = FALSE, offset = 0, filename)
... |
Named integers giving the members of the |
path , filename
|
A single string giving the name of the file. |
readonly |
Should the file be treated as read-only? |
offset |
A scalar integer giving the offset from the beginning of the file. |
This is simply a convenient wrapper around the wrapper around matter_list
that allows easy specification of C-style structs in a file.
A object of class matter_list
.
Kylie A. Bemis
x <- struct(first=c(int=1), second=c(double=1)) x$first <- 2L x$second <- 3.33 x$first x$second
x <- struct(first=c(int=1), second=c(double=1)) x$first <- 2L x$second <- 3.33 x$first x$second
These functions efficiently calculate summary statistics for matter_arr
objects. For matrices, they operate efficiently on both rows and columns.
## S4 method for signature 'matter_arr' range(x, ..., na.rm) ## S4 method for signature 'matter_arr' min(x, ..., na.rm) ## S4 method for signature 'matter_arr' max(x, ..., na.rm) ## S4 method for signature 'matter_arr' prod(x, ..., na.rm) ## S4 method for signature 'matter_arr' mean(x, ..., na.rm) ## S4 method for signature 'matter_arr' sum(x, ..., na.rm) ## S4 method for signature 'matter_arr' sd(x, na.rm) ## S4 method for signature 'matter_arr' var(x, na.rm) ## S4 method for signature 'matter_arr' any(x, ..., na.rm) ## S4 method for signature 'matter_arr' all(x, ..., na.rm) ## S4 method for signature 'matter_mat' colMeans(x, na.rm, dims = 1, ...) ## S4 method for signature 'matter_mat' colSums(x, na.rm, dims = 1, ...) ## S4 method for signature 'matter_mat' rowMeans(x, na.rm, dims = 1, ...) ## S4 method for signature 'matter_mat' rowSums(x, na.rm, dims = 1, ...)
## S4 method for signature 'matter_arr' range(x, ..., na.rm) ## S4 method for signature 'matter_arr' min(x, ..., na.rm) ## S4 method for signature 'matter_arr' max(x, ..., na.rm) ## S4 method for signature 'matter_arr' prod(x, ..., na.rm) ## S4 method for signature 'matter_arr' mean(x, ..., na.rm) ## S4 method for signature 'matter_arr' sum(x, ..., na.rm) ## S4 method for signature 'matter_arr' sd(x, na.rm) ## S4 method for signature 'matter_arr' var(x, na.rm) ## S4 method for signature 'matter_arr' any(x, ..., na.rm) ## S4 method for signature 'matter_arr' all(x, ..., na.rm) ## S4 method for signature 'matter_mat' colMeans(x, na.rm, dims = 1, ...) ## S4 method for signature 'matter_mat' colSums(x, na.rm, dims = 1, ...) ## S4 method for signature 'matter_mat' rowMeans(x, na.rm, dims = 1, ...) ## S4 method for signature 'matter_mat' rowSums(x, na.rm, dims = 1, ...)
x |
A |
... |
Arguments passed to |
na.rm |
If |
dims |
Not used. |
These summary statistics methods operate on chunks of data which are loaded into memory and then freed before reading the next chunk.
For row and column summaries on matrices, the iteration scheme is dependent on the layout of the data. Column-major matrices will always be iterated over by column, and row-major matrices will always be iterated over by row. Row statistics on column-major matrices and column statistics on row-major matrices are calculated iteratively.
Variance and standard deviation are calculated using a running sum of squares formula which can be calculated iteratively and is accurate for large floating-point datasets (see reference).
For mean
, sd
, and var
, a single number. For the column summaries, a vector of length equal to the number of columns of the matrix. For the row summaries, a vector of length equal to the number of rows of the matrix.
Kylie A. Bemis
B. P. Welford, “Note on a Method for Calculating Corrected Sums of Squares and Products,” Technometrics, vol. 4, no. 3, pp. 1-3, Aug. 1962.
register(SerialParam()) x <- matter(1:100, nrow=10, ncol=10) sum(x) mean(x) var(x) sd(x) colSums(x) colMeans(x) rowSums(x) rowMeans(x)
register(SerialParam()) x <- matter(1:100, nrow=10, ncol=10) sum(x) mean(x) var(x) sd(x) colSums(x) colMeans(x) rowSums(x) rowMeans(x)
Estimate the raster dimensions of a scattered 2D or 3D signal based on its pixel coordinates.
# Rasterize a 2D signal to_raster(x, y, vals) # Rasterize a 3D signal to_raster3(x, y, z, vals) # Check if coordinates are gridded is_gridded(x, tol = sqrt(.Machine$double.eps))
# Rasterize a 2D signal to_raster(x, y, vals) # Rasterize a 3D signal to_raster3(x, y, z, vals) # Check if coordinates are gridded is_gridded(x, tol = sqrt(.Machine$double.eps))
x , y , z
|
The coordinates of the data to be rasterized. For |
vals |
The data values to be rasterized. |
tol |
The tolerance allowed when estimating the resolution. Noise in the sampling rate will be allowed up to this amount when determining if the data is approximately gridded or not. |
This is meant to be a more efficient version of approx2()
when the data is already (approximately) gridded. Otherwise, approx2()
is used.
A numeric vector giving the estimated raster dimensions.
Kylie A. Bemis
# create an image set.seed(1) i <- seq(-4, 4, length.out=12) j <- seq(1, 3, length.out=9) co <- expand.grid(x=i, y=j) z <- matrix(atan(co$x / co$y), nrow=12, ncol=9) vals <- 10 * (z - min(z)) / diff(range(z)) # scatter coordinates and flatten image d <- expand.grid(x=jitter(1:12), y=jitter(1:9)) d$vals <- as.vector(z) # rasterize to_raster(d$x, d$y, d$vals)
# create an image set.seed(1) i <- seq(-4, 4, length.out=12) j <- seq(1, 3, length.out=9) co <- expand.grid(x=i, y=j) z <- matrix(atan(co$x / co$y), nrow=12, ncol=9) vals <- 10 * (z - min(z)) / diff(range(z)) # scatter coordinates and flatten image d <- expand.grid(x=jitter(1:12), y=jitter(1:9)) d$vals <- as.vector(z) # rasterize to_raster(d$x, d$y, d$vals)
Perform linear spatial transformations on a matrix, including rigid, similarity, and affine transformations.
trans2d(x, y, z, pmat, rotate = 0, translate = c(0, 0), scale = c(1, 1), interp = "linear", dimout = dim(z), ...)
trans2d(x, y, z, pmat, rotate = 0, translate = c(0, 0), scale = c(1, 1), interp = "linear", dimout = dim(z), ...)
x , y , z
|
The data to be interpolated. Alternatively, |
pmat |
A 3 x 2 transformation matrix for performing an affine transformation. Automatically generated from |
rotate |
Rotation in degrees. |
translate |
Translation vector, in the same units as |
scale |
Scaling factors. |
interp |
Interpolation method. See |
dimout |
The dimensions of the returned matrix. |
... |
Additional arguments passed to |
If x
is a matrix or z
is provided, returns a transformed matrix with the dimensions of dimout
.
Otherwise, only the transformed coordinates are returned in a data.frame
.
Kylie A. Bemis
set.seed(1) x <- matrix(0, nrow=32, ncol=32) x[9:24,9:24] <- 10 x <- x + runif(length(x)) xt <- trans2d(x, rotate=15, translate=c(-5, 5)) par(mfcol=c(1,2)) image(x, col=hcl.colors(256), main="original") image(xt, col=hcl.colors(256), main="transformed")
set.seed(1) x <- matrix(0, nrow=32, ncol=32) x[9:24,9:24] <- 10 x <- x + runif(length(x)) xt <- trans2d(x, rotate=15, translate=c(-5, 5)) par(mfcol=c(1,2)) image(x, col=hcl.colors(256), main="original") image(xt, col=hcl.colors(256), main="transformed")
Generate a UUIDv4.
uuid(uppercase = FALSE) hex2raw(x) raw2hex(x, uppercase = FALSE)
uuid(uppercase = FALSE) hex2raw(x) raw2hex(x, uppercase = FALSE)
x |
A vector to convert between |
uppercase |
Should the result be in uppercase? |
uuid
generates a random (version 4) universally unique identifier. A private RNG stream is used to avoid collisions due to set.seed
and to avoid altering the global .Random.seed
.
hex2raw
converts a hexadecimal string to a raw
vector.
raw2hex
converts a raw
vector to a hexadecimal string.
For uuid
, a list of length 2:
string
: A character vector giving the UUID.
bytes
: The raw
bytes of the UUID.
For hex2raw
, a raw
vector.
For raw2hex
, a character vector of length 1.
Kylie A. Bemis
id <- uuid() id hex2raw(id$string) raw2hex(id$bytes)
id <- uuid() id hex2raw(id$string) raw2hex(id$bytes)
These functions provide a simple grammar of graphics approach to programming with R's base graphics system.
## Initialize a plot vizi(data, ..., encoding = NULL, mark = NULL, params = NULL) ## Add plot components add_mark(plot, mark, ..., encoding = NULL, data = NULL, trans = NULL, params = NULL) add_facets(plot, by = NULL, data = NULL, nrow = NA, ncol = NA, labels = NULL, drop = TRUE, free = "") ## Set plot attributes set_title(plot, title) set_channel(plot, channel, label = NULL, limits = NULL, scheme = NULL, key = TRUE) set_coord(plot, xlim = NULL, ylim = NULL, zlim = NULL, rev = "", log = "", asp = NA, grid = TRUE) set_engine(plot, engine = c("base", "plotly")) set_par(plot, ..., style = NULL) ## Combine plots as_layers(plotlist, ...) as_facets(plotlist, ..., nrow = NA, ncol = NA, labels = NULL, drop = TRUE, free = "")
## Initialize a plot vizi(data, ..., encoding = NULL, mark = NULL, params = NULL) ## Add plot components add_mark(plot, mark, ..., encoding = NULL, data = NULL, trans = NULL, params = NULL) add_facets(plot, by = NULL, data = NULL, nrow = NA, ncol = NA, labels = NULL, drop = TRUE, free = "") ## Set plot attributes set_title(plot, title) set_channel(plot, channel, label = NULL, limits = NULL, scheme = NULL, key = TRUE) set_coord(plot, xlim = NULL, ylim = NULL, zlim = NULL, rev = "", log = "", asp = NA, grid = TRUE) set_engine(plot, engine = c("base", "plotly")) set_par(plot, ..., style = NULL) ## Combine plots as_layers(plotlist, ...) as_facets(plotlist, ..., nrow = NA, ncol = NA, labels = NULL, drop = TRUE, free = "")
data |
A |
... |
For |
encoding |
Encodings specified as a list rather than using .... |
mark |
The name of a supported mark, such as "points", "lines", etc. |
params |
Additional graphical parameters that are not mapped to the data, using either base R-style (e.g., pch, cex) or ggplot-style names (e.g., shape, size) |
plot , plotlist
|
A |
title |
The title of the plot. |
channel |
The channel to modify, using ggplot-style names (e.g., shape, size). |
label , labels
|
Plotting labels. |
limits |
The limits for the channel, specified as |
scheme |
A function or vector giving the scheme for encoding the channel. For example, a vector of colors, or a function that returns a vector of |
key |
Should a key be generated for the channel? |
xlim , ylim , zlim
|
The plot limits. These only affect the plotting window, not the data. See |
rev |
A string specifying spatial dimensions that should be reversed. E.g., |
log , asp
|
See |
grid |
Should a rectangular grid be included in the plot? |
engine |
The plotting engine. Default is to use base graphics. Using "plotly" requires the |
style |
The visual style to use for plotting. Currently supported styles are "light", "dark", and "classic". |
trans |
A list providing parameters for any transformations the mark supports. |
by |
A vector or formula giving the facet specification. |
nrow , ncol
|
The number of rows and columns in the facet grid. |
drop |
Should empty facets be dropped? |
free |
A string specifying the free spatial dimensions during faceting. E.g., |
Currently supported marks include:
points
: Points (i.e., scatterplots).
lines
: Lines (i.e., line charts).
peaks
: Height (histogram) lines.
text
: Text labels.
rules
: Reference lines.
bars
: Bars for bar charts or histograms.
intervals
: Line intervals for representing error bars or confidence intervals.
boxplot
: Box-and-whisker plots.
image
: Raster graphics.
pixels
: 2D image from pixels.
voxels
: 3D image from voxels.
Currently supported encodings include:
x, y, z
: Positions.
xmin, xmax, ymin, ymax
: Position limits for intervals
and image
.
image
: Rasters for image
.
shape
: Shape of points (pch).
size
: Size of points (cex).
color, colour
: Color (col, fg).
fill
: Fill color (bg).
alpha
: Opacity.
linetype, linewidth, lineend, linejoin, linemetre
: Line properties (lty, lwd, lend, ljoin, lmitre).
An object of class vizi_plot
.
Kylie A. Bemis
vizi_points
,
vizi_lines
,
vizi_peaks
,
vizi_text
,
vizi_intervals
,
vizi_rules
,
vizi_bars
,
vizi_boxplot
,
vizi_image
,
vizi_pixels
,
vizi_voxels
require(datasets) mtcars <- transform(mtcars, am=factor(am, labels=c("auto", "manual"))) # faceted scatter plot vizi(mtcars, x=~disp, y=~mpg) |> add_mark("points") |> add_facets(~mtcars$am) # faceted scatter plot with color vizi(mtcars, x=~disp, y=~mpg, color=~am) |> add_mark("points", params=list(shape=20, size=2, alpha=0.8)) |> add_facets(~mtcars$am) coords <- expand.grid(x=1:nrow(volcano), y=1:ncol(volcano)) # volcano image vizi(coords, x=~x, y=~y, color=volcano) |> add_mark("pixels") |> set_coord(grid=FALSE) |> set_par(xaxs="i", yaxs="i")
require(datasets) mtcars <- transform(mtcars, am=factor(am, labels=c("auto", "manual"))) # faceted scatter plot vizi(mtcars, x=~disp, y=~mpg) |> add_mark("points") |> add_facets(~mtcars$am) # faceted scatter plot with color vizi(mtcars, x=~disp, y=~mpg, color=~am) |> add_mark("points", params=list(shape=20, size=2, alpha=0.8)) |> add_facets(~mtcars$am) coords <- expand.grid(x=1:nrow(volcano), y=1:ncol(volcano)) # volcano image vizi(coords, x=~x, y=~y, color=volcano) |> add_mark("pixels") |> set_coord(grid=FALSE) |> set_par(xaxs="i", yaxs="i")
Set global parameters for plotting vizi
graphics.
## Set style and palettes vizi_style(style = "light", dpal = "Tableau 10", cpal = "Viridis") ## Set plotting engine vizi_engine(engine = c("base", "plotly")) ## Set graphical parameters vizi_par(..., style = getOption("matter.vizi.style"))
## Set style and palettes vizi_style(style = "light", dpal = "Tableau 10", cpal = "Viridis") ## Set plotting engine vizi_engine(engine = c("base", "plotly")) ## Set graphical parameters vizi_par(..., style = getOption("matter.vizi.style"))
style |
The visual style to use for plotting. Currently supported styles are "light", "dark", "voidlight", "voiddark", and "transparent". |
dpal , cpal
|
The name of discrete and continous color palettes. See |
engine |
The plotting engine. Default is to use base graphics. Using "plotly" requires the |
... |
These specify additional graphical parameters (as in |
A character vector or list with the current parameters.
Kylie A. Bemis
Two signals often need to be aligned in order to compare them. The signals may contain a similar pattern, but shifted or stretched by some amount. These functions warp the first signal to align it as closely as possible to the second signal.
# Signal warping based on local events warp1_loc(x, y, tx = seq_along(x), ty = seq_along(y), events = c("maxmin", "max", "min"), n = length(y), interp = c("linear", "loess", "spline"), tol = NA_real_, tol.ref = "abs") # Dynamic time warping warp1_dtw(x, y, tx = seq_along(x), ty = seq_along(y), n = length(y), tol = NA_real_, tol.ref = "abs") # Correlation optimized warping warp1_cow(x, y, tx = seq_along(x), ty = seq_along(y), nbins = NA_integer_, n = length(y), tol = NA_real_, tol.ref = "abs")
# Signal warping based on local events warp1_loc(x, y, tx = seq_along(x), ty = seq_along(y), events = c("maxmin", "max", "min"), n = length(y), interp = c("linear", "loess", "spline"), tol = NA_real_, tol.ref = "abs") # Dynamic time warping warp1_dtw(x, y, tx = seq_along(x), ty = seq_along(y), n = length(y), tol = NA_real_, tol.ref = "abs") # Correlation optimized warping warp1_cow(x, y, tx = seq_along(x), ty = seq_along(y), nbins = NA_integer_, n = length(y), tol = NA_real_, tol.ref = "abs")
x , y
|
Signals to be aligned by warping |
tx , ty
|
The domain variable of the signals. If |
events |
The type of events to use for calculating the alignment. |
n |
The number of samples in the warped |
interp |
The interpolation method used when warping the signal. |
tol , tol.ref
|
A tolerance specifying the maximum allowed distance between aligned samples. See |
nbins |
The number of signal segments used for warping. The correlation is maximized for each segment. |
warp1_loc()
uses a simple event-based alignment. Events are defined as local extrema. The events are matched between each signal based on proximity. The shift between the events in x
and y
are calculated and interpolated to find shifts for each sample. The warped signal is then calculated from these shifts. This is a simple heuristic method, but it is relatively fast and typically good enough for aligning peak locations.
warp1_dtw()
performs dynamic time warping. In dynamic time warping, each sample in x
is matched to a corresponding sample in y
using dynamic programming to find the optimal matches. The version implemented here is constrained by the given tolerance. This both reduces the necessary memory, and in practice tends to give more realistic (and therefore accurate) results than an unconstrained alignment. An unconstrained alignment can still be obtained by setting a high tolerance, but this may use a lot of memory.
warp1_cow()
performs correlation optimized. In correlation optimized warping, each signal is divided into some number of segments. Dynamic programming is then used to find the placement of the segment boundaries that maximizes the correlation of all the segments.
A numeric vector the same length as y
with the warped x
.
Kylie A. Bemis
G. Tomasi, F. van den Berg, and C. Andersson. “Correlation optimized warping and dynamic time warping as preprocessing methods for chromatographic data.” Journal of Chemometrics, vol. 18, pp. 231-241, July 2004.
N. V. Nielsen, J. M. Carstensen, and J. Smedsgaard. “Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping.” Journal of Chromatography A, vol. 805, pp. 17-35, Jan. 1998.
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=2000) dt <- 0.3 * (sin(t) + 0.6 * sin(2.6 * t)) x <- sin(t + dt) + 0.6 * sin(2.6 * (t + dt)) y <- sin(t) + 0.6 * sin(2.6 * t) xw <- warp1_dtw(x, y) plot(y, type="l") lines(x, col="blue") lines(xw, col="red", lty=2)
set.seed(1) t <- seq(from=0, to=6 * pi, length.out=2000) dt <- 0.3 * (sin(t) + 0.6 * sin(2.6 * t)) x <- sin(t + dt) + 0.6 * sin(2.6 * (t + dt)) y <- sin(t) + 0.6 * sin(2.6 * t) xw <- warp1_dtw(x, y) plot(y, type="l") lines(x, col="blue") lines(xw, col="red", lty=2)
Register two images by warping a "moving" image to align with the "fixed" image.
# Transformation-based registration warp2_trans(x, y, control = list(), trans = c("rigid", "similarity", "affine"), metric = c("cor", "mse", "mi"), nbins = 64L, scale = TRUE, dimout = dim(y)) # Mutual information mi(x, y, n = 64L)
# Transformation-based registration warp2_trans(x, y, control = list(), trans = c("rigid", "similarity", "affine"), metric = c("cor", "mse", "mi"), nbins = 64L, scale = TRUE, dimout = dim(y)) # Mutual information mi(x, y, n = 64L)
x , y
|
Images to be aligned by warping |
control |
A list of optimization control parameters. Passed to |
trans |
The type of transformation allowed: "rigid" means translation and rotation, "similarity" means translation, rotation, and scaling, and "affine" means a general affine transformation. |
metric |
The metric to optimize. |
nbins , n
|
The number of histogram bins to use when calculating mutual information. |
scale |
Should the images be normalized to have the same intensity range? |
dimout |
The dimensions of the returned matrix. |
warp2_trans()
performs a simple transformation-based registration using optim
for optimization.
A numeric vector the same length as y
with the warped x
.
Kylie A. Bemis
set.seed(1) x <- matrix(0, nrow=32, ncol=32) x[9:24,9:24] <- 10 x <- x + runif(length(x)) xt <- trans2d(x, rotate=15, translate=c(-5, 5)) xw <- warp2_trans(xt, x) par(mfcol=c(1,3)) image(x, col=hcl.colors(256), main="original") image(xt, col=hcl.colors(256), main="transformed") image(xw, col=hcl.colors(256), main="registered")
set.seed(1) x <- matrix(0, nrow=32, ncol=32) x[9:24,9:24] <- 10 x <- x + runif(length(x)) xt <- trans2d(x, rotate=15, translate=c(-5, 5)) xw <- warp2_trans(xt, x) par(mfcol=c(1,3)) image(x, col=hcl.colors(256), main="original") image(xt, col=hcl.colors(256), main="transformed") image(xw, col=hcl.colors(256), main="registered")