Title: | Save Bioconductor Objects to File |
---|---|
Description: | Save Bioconductor data structures into file artifacts, and load them back into memory. This is a more robust and portable alternative to serialization of such objects into RDS files. Each artifact is associated with metadata for further interpretation; downstream applications can enrich this metadata with context-specific properties. |
Authors: | Aaron Lun [aut, cre] |
Maintainer: | Aaron Lun <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.7.2 |
Built: | 2024-12-13 02:57:36 UTC |
Source: | https://github.com/bioc/alabaster.base |
WARNING: these functions are deprecated.
Applications are expected to handle acquisition of files before loaders are called.
Acquire a file or metadata for loading.
As one might expect, these are typically used inside a load* function.
acquireFile(project, path)
acquireMetadata(project, path)

## S4 method for signature 'character'
acquireFile(project, path)

## S4 method for signature 'character'
acquireMetadata(project, path)
project |
Any value specifying the project of interest. The default methods expect a string containing a path to a staging directory, but other objects can be used to control dispatch. |
path |
String containing a relative path to a resource inside the staging directory. |
By default, files and metadata are loaded from the same staging directory that is written to by stageObject. alabaster applications can define custom methods to obtain the files and metadata from a different location, e.g., remote databases. This is achieved by dispatching on a different class of project.
Each custom acquisition method should take two arguments. The first argument is an R object representing some concept of a "project". In the default case, this is a string containing a path to the staging directory representing the project. However, it can be anything, e.g., a number containing a database identifier, a list of identifiers and versions, and so on - as long as the custom acquisition method is capable of understanding it, the load* functions don't care.

The second argument is a string containing the relative path to the resource inside that project. This should be the path to a specific file inside the project, not the subdirectory containing the file. More concretely, it should be equivalent to the path in the output of stageObject, not the path to the subdirectory used as the input to the same function.

The return value of each custom acquisition method should match that of its local counterpart: any custom file acquisition method should return a file path, and any custom metadata acquisition method should return a named list of metadata.
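As a sketch of such a customization, a hypothetical application could dispatch on its own project class. Everything below other than the acquireFile/acquireMetadata generics is invented for illustration: the DatabaseProject class, its id slot, and the fetchFromDatabase helper are not part of alabaster.base.

```r
# Hypothetical sketch: dispatching acquisition on a custom project class.
# 'DatabaseProject' and 'fetchFromDatabase' are invented for illustration.
setClass("DatabaseProject", slots=c(id="character"))

setMethod("acquireFile", "DatabaseProject", function(project, path) {
    # Download the resource into a local cache and return its local path.
    local <- file.path(tempdir(), project@id, path)
    if (!file.exists(local)) {
        dir.create(dirname(local), recursive=TRUE, showWarnings=FALSE)
        fetchFromDatabase(project@id, path, dest=local) # hypothetical helper
    }
    local
})

setMethod("acquireMetadata", "DatabaseProject", function(project, path) {
    # Return a named list, mirroring the default method's contract.
    jsonlite::fromJSON(
        acquireFile(project, paste0(path, ".json")),
        simplifyVector=FALSE
    )
})
```

Any load* call that receives a DatabaseProject instance as its project would then transparently pull files from the remote store.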
acquireFile methods return a local path to the file corresponding to the requested resource.

acquireMetadata methods return a named list of metadata for the requested resource.
Aaron Lun
# Staging an example DataFrame:
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])

tmp <- tempfile()
dir.create(tmp)
info <- stageObject(df, tmp, path="coldata")
writeMetadata(info, tmp)

# Retrieving the metadata:
meta <- acquireMetadata(tmp, "coldata/simple.csv.gz")
str(meta)

# Retrieving the file:
acquireFile(tmp, "coldata/simple.csv.gz")
Allow alabaster applications to specify an alternative reading function in altReadObject.
altReadObject(...)
altReadObjectFunction(fun)
... |
Further arguments to pass to |
fun |
Function that can serve as a drop-in replacement for |
By default, altReadObject is just a wrapper around readObject. However, if altReadObjectFunction is called, altReadObject calls the replacement fun instead. This allows alabaster applications to inject wholesale or class-specific customizations into the reading process, e.g., to add more metadata whenever an instance of a particular class is encountered. Developers of alabaster extensions should use altReadObject (instead of readObject) to read child objects when writing their own reading functions, to ensure that application-specific customizations are respected for the children.
To motivate the use of altReadObject, consider the following scenario. We have created a reading function readX to read an instance of class X in an alabaster extension. This function may be called by readObject if instances of X are children of other objects. An alabaster application Y requires the addition of some custom metadata during the reading process for X. It defines an alternative reading function readObject2 that, upon encountering a schema for X, redirects to an application-specific reader readX2. An example implementation of readX2 would involve calling readX and decorating the result with the extra metadata.

When operating in the context of application Y, the readObject2 function is used to set altReadObjectFunction. Any calls to altReadObject in Y's context will subsequently call readObject2. So, when writing a reading function in an alabaster extension for a class that might contain instances of X as children, we use altReadObject instead of directly using readObject. This ensures that, if a child instance of X is encountered and we are operating in the context of application Y, we correctly call readObject2 and then ultimately readX2.
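The scenario above can be sketched as follows. The class name "X" and the readX/readX2 functions are the hypothetical placeholders from the text, not real alabaster.base identifiers:

```r
# Hypothetical sketch of application Y's reader. 'readX' is the extension's
# own reader for class X; 'readX2' decorates its output with Y's metadata.
readObject2 <- function(path, metadata=NULL, ...) {
    if (is.null(metadata)) {
        metadata <- readObjectFile(path)
    }
    if (identical(metadata$type, "X")) {
        readX2(path, metadata, ...)
    } else {
        readObject(path, metadata, ...)
    }
}

readX2 <- function(path, metadata, ...) {
    out <- readX(path, metadata, ...)  # the extension's reader
    attr(out, "application") <- "Y"    # hypothetical decoration
    out
}

# When operating inside application Y:
# altReadObjectFunction(readObject2)
```

With this in place, any extension that reads its children via altReadObject will pick up readX2 for child instances of X.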
The application-specific fun is free to do anything it wants as long as it understands the representation. It is usually most convenient to leverage the existing functionality in readObject, but if the application-specific saver in altSaveObject does something unusual, then fun is responsible for the correct interpretation of any custom representation.

For altReadObject, any R object similar to those returned by readObject.

For altReadObjectFunction, the alternative function (if any) is returned if fun is missing. If fun is provided, it is used to define the alternative, and the previous alternative is returned.
Aaron Lun
old <- altReadObjectFunction()

# Setting it to something.
altReadObjectFunction(function(...) {
    print("YAY")
    readObject(...)
})

# Staging an example DataFrame:
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
tmp <- tempfile()
saveObject(df, tmp)

# And now reading it - this should print our message.
altReadObject(tmp)

# Restoring the old reader:
altReadObjectFunction(old)
Allow alabaster applications to divert to a different saving generic instead of saveObject.
altSaveObject(...)
altSaveObjectFunction(generic)
... |
Further arguments to pass to |
generic |
Generic function that can serve as a drop-in replacement for |
By default, altSaveObject is just a wrapper around saveObject. However, if altSaveObjectFunction is called, altSaveObject calls the replacement generic instead. This allows alabaster applications to inject wholesale or class-specific customizations into the saving process, e.g., to save more metadata whenever an instance of a particular class is encountered. Developers of alabaster extensions should use altSaveObject to save child objects when implementing saveObject methods, to ensure that application-specific customizations are respected for the children.
To motivate the use of altSaveObject, consider the following scenario. We have created a staging method for class X, defined for the saveObject generic. An alabaster application Y requires the addition of some custom metadata during the staging process for X. It defines an alternative staging generic saveObject2 that, upon encountering an instance of X, redirects to an application-specific method (i.e., saveObject2,X-method). For example, the saveObject2 method for X could call X's saveObject method and add the necessary metadata to the result.

When operating in the context of application Y, the saveObject2 generic is used to set altSaveObjectFunction. Any calls to altSaveObject in Y's context will subsequently call saveObject2. So, when writing a saveObject method for any objects that might contain an instance of X as a child, we call altSaveObject on that X object instead of directly using saveObject. This ensures that, if a child instance of X is encountered and we are operating in the context of application Y, we correctly call saveObject2 and then ultimately the application-specific method.
The application-specific generic is free to do anything it wants as long as the custom representation is understood by the application-specific reader in altReadObject. However, it is usually most convenient to re-use the existing representations created by saveObject. This means that any customizations should not interfere with the validity of those representations, as defined by the takane specifications and enforced by validateObject. We recommend that any customizations manifest as new files starting with an underscore, as these do not interfere with any takane file specification.
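A minimal sketch of such a generic follows. The class name "X" is the hypothetical placeholder from the text, and the "_extras.json" file name is invented for illustration; only the underscore prefix matters, as the takane specifications ignore such files:

```r
# Hypothetical sketch of application Y's saving generic.
setGeneric("saveObject2", function(x, path, ...) standardGeneric("saveObject2"))

# Everything else falls through to the standard saver.
setMethod("saveObject2", "ANY", function(x, path, ...) {
    saveObject(x, path, ...)
})

# For class X, save as usual and then add an underscore-prefixed file
# carrying the application-specific metadata.
setMethod("saveObject2", "X", function(x, path, ...) {
    saveObject(x, path, ...)
    writeLines(
        as.character(jsonlite::toJSON(list(application="Y"), auto_unbox=TRUE)),
        file.path(path, "_extras.json")
    )
})

# When operating inside application Y:
# altSaveObjectFunction(saveObject2)
```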
For altSaveObject, files are created at the specified location; see saveObject for details.

For altSaveObjectFunction, the alternative generic (if any) is returned if generic is missing. If generic is provided, it is used to define the alternative, and the previous alternative is returned.
Aaron Lun
old <- altSaveObjectFunction()

# Creating a new generic for demonstration purposes:
setGeneric("superSaveObject", function(x, path, ...)
    standardGeneric("superSaveObject"))

setMethod("superSaveObject", "ANY", function(x, path, ...) {
    print("Falling back to the base method!")
    saveObject(x, path, ...)
})

altSaveObjectFunction(superSaveObject)

# Staging an example DataFrame. This should print our message.
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
tmp <- tempfile()
altSaveObject(df, tmp)

# Restoring the old loader:
altSaveObjectFunction(old)
Find missing (NA) values. This is smart enough to distinguish them from NaN values in numeric x. For all other types, it just calls is.na or anyNA.
anyMissing(x)
is.missing(x)
x |
Vector or array of atomic values. |
For anyMissing, a logical scalar indicating whether any NA values were present in x.

For is.missing, a logical vector or array of shape equal to x, indicating whether each value is NA.
Aaron Lun
anyNA(c(NaN))
anyNA(c(NA))
anyMissing(c(NaN))
anyMissing(c(NA))
is.na(c(NA, NaN))
is.missing(c(NA, NaN))
In the alabaster.* framework, we mark missing entries inside HDF5 datasets with placeholder values. This function chooses a value for the placeholder that does not overlap with anything else in a vector.
chooseMissingPlaceholderForHdf5(x, .version = 3)
x |
An atomic vector to be saved to HDF5. |
.version |
Internal use only. |
For floating-point datasets, the placeholder will not be NA if there are mixtures of NAs and NaNs. We do not rely on the NaN payload to distinguish between these two values.
Placeholder values are typically saved as scalar attributes on the HDF5 dataset in which they are used. The usual name of this attribute is "missing-value-placeholder", as encoded by missingPlaceholderName.
A placeholder value for missing values in x, guaranteed to not be equal to any non-missing value in x.
chooseMissingPlaceholderForHdf5(c(TRUE, NA, FALSE))
chooseMissingPlaceholderForHdf5(c(1L, NA, 2L))
chooseMissingPlaceholderForHdf5(c("aaron", NA, "barry"))
chooseMissingPlaceholderForHdf5(c("aaron", NA, "barry", "NA"))
chooseMissingPlaceholderForHdf5(c(1.5, NA, 2.6))
chooseMissingPlaceholderForHdf5(c(1.5, NaN, NA, 2.6))
WARNING: this function is deprecated. Redirection is no longer supported in the latest alabaster framework. Create a redirection to another path in the same staging directory. This is useful for creating short-hand aliases for resources that have inconveniently long paths.
createRedirection(dir, src, dest)
dir |
String containing the path to the staging directory. |
src |
String containing the source path relative to |
dest |
String containing the destination path relative to |
src should not correspond to an existing file inside dir. This avoids ambiguity when attempting to load src via acquireMetadata; otherwise, it would be unclear as to whether the user wants the file at src or the redirection target dest. src may correspond to existing directories, as directories cannot be used in acquireMetadata, so no such ambiguity exists.
A list of metadata that can be processed by writeMetadata.
Aaron Lun
# Staging an example DataFrame:
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
tmp <- tempfile()
dir.create(tmp)
info <- stageObject(df, tmp, path="coldata")
writeMetadata(info, tmp)

# Creating a redirection:
redirect <- createRedirection(tmp, "foobar", "coldata/simple.csv.gz")
writeMetadata(redirect, tmp)

# We can then use this redirect to pull out metadata:
info2 <- acquireMetadata(tmp, "foobar")
str(info2)
Basically just better versions of those in rhdf5, dedicated to alabaster.base and its dependents. Intended for alabaster.* developers only.
List all objects in a directory, along with their types.
listObjects(dir, include.children = FALSE)
dir |
String containing a path to a staging directory. |
include.children |
Logical scalar indicating whether to include child objects. |
DFrame where each row corresponds to an object and contains:
path, the relative path to the object's subdirectory inside dir;
type, the type of the object;
child, whether or not the object is a child of another object.
If include.children=FALSE, metadata is only returned for non-child objects.
Aaron Lun
tmp <- tempfile()
dir.create(tmp)

library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
saveObject(df, file.path(tmp, "whee"))

ll <- list(A=1, B=LETTERS, C=DataFrame(X=1:5))
saveObject(ll, file.path(tmp, "stuff"))

listObjects(tmp)
listObjects(tmp, include.children=TRUE)
WARNING: this function is deprecated; use listObjects and loop over entries with readObject instead. As the title suggests, this function loads all non-child objects in a staging directory. All loading is performed using altLoadObject to respect any application-specific overrides. Children are used to assemble their parent objects and are not reported here.
loadDirectory(dir, redirect.action = c("from", "to", "both"))
dir |
String containing a path to a staging directory. |
redirect.action |
String specifying how redirects should be handled:
|
A named list is returned containing all (non-child) R objects in dir.
Aaron Lun
tmp <- tempfile()
dir.create(tmp)

library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
meta <- stageObject(df, tmp, path="whee")
writeMetadata(meta, tmp)

ll <- list(A=1, B=LETTERS, C=DataFrame(X=1:5))
meta <- stageObject(ll, tmp, path="stuff")
writeMetadata(meta, tmp)

redirect <- createRedirection(tmp, "whoop", "whee/simple.csv.gz")
writeMetadata(redirect, tmp)

all.meta <- loadDirectory(tmp)
str(all.meta)
WARNING: this function is deprecated, as directories of non-child objects can just be moved with regular methods (e.g., file.rename) in the latest version of alabaster. Pretty much as it says in the title. This only works with non-child objects, as children are referenced by their parents and cannot be safely moved in this manner.
moveObject(dir, from, to, rename.redirections = TRUE)
dir |
String containing the path to the staging directory. |
from |
String containing the path to a non-child object inside |
to |
String containing the new path inside |
rename.redirections |
Logical scalar specifying whether redirections pointing to |
This function will look around from for JSON files containing redirections to from, and update them to point to to. More specifically, if from is a subdirectory, it will search in the same directory containing from; otherwise, it will search in the directory containing dirname(from). Redirections in other locations will not be updated automatically - these will be caught by checkValidDirectory and should be manually updated.
If rename.redirections=TRUE, this function will additionally move the redirection files so that they are named as to. In the unusual case where from is the target of multiple redirection files, the renaming process will clobber all of them such that only one of them will be present after the move.

The object represented by from is moved, along with any redirections to it. A NULL is invisibly returned.
In general, alabaster.* representations are safe to move, as only the parent object's resource.path metadata properties will contain links to the children's paths. These links are updated with the new to path after running moveObject on the parent from. However, alabaster applications may define custom data structures where the paths are present elsewhere, e.g., in the data file itself or in other metadata properties. If so, applications are responsible for updating those paths to reflect the renaming to to.
Aaron Lun
tmp <- tempfile()
dir.create(tmp)

library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
meta <- stageObject(df, tmp, path="whee")
writeMetadata(meta, tmp)

ll <- list(A=1, B=LETTERS, C=DataFrame(X=1:5))
meta <- stageObject(ll, tmp, path="stuff")
writeMetadata(meta, tmp)

redirect <- createRedirection(tmp, "whoop", "whee/simple.csv.gz")
writeMetadata(redirect, tmp)

list.files(tmp, recursive=TRUE)
moveObject(tmp, "whoop", "YAY")
list.files(tmp, recursive=TRUE)
WARNING: these functions are deprecated, as the saving/reading functions are already simple enough in the newer versions of the alabaster framework. Read and write objects from a local staging directory. These are just convenience wrappers around functions like loadObject, stageObject and writeMetadata.
quickLoadObject(dir, path, ...)
quickStageObject(x, dir, path, ...)
dir |
String containing a path to the directory. |
path |
String containing a relative path to the object of interest inside |
... |
Further arguments to pass to |
x |
Object to be saved. |
For quickLoadObject, the object at path.

For quickStageObject, the object is saved to path inside dir. All necessary directories are created if they are not already present. A NULL is returned invisibly.
Aaron Lun
local <- tempfile()

# Creating a slightly complicated object:
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
df$C <- DataFrame(D=letters[1:10], E=runif(10))

# Saving it:
quickStageObject(df, local, "FOOBAR")

# Reading it back:
quickLoadObject(local, "FOOBAR")
Quickly read and write a CSV file, usually as a part of staging or loading a larger object. This assumes that all files follow the comservatory specification.
quickReadCsv(
  path,
  expected.columns,
  expected.nrows,
  compression,
  row.names,
  parallel = TRUE
)

quickWriteCsv(
  df,
  path,
  ...,
  row.names = FALSE,
  compression = "gzip",
  validate = TRUE
)
path |
String containing a path to a CSV to read/write. |
expected.columns |
Named character vector specifying the type of each column in the CSV (excluding the first column containing row names, if |
expected.nrows |
Integer scalar specifying the expected number of rows in the CSV. |
compression |
String specifying the compression that was/will be used.
This should be either |
row.names |
For quickReadCsv, a logical scalar indicating whether the CSV file contains row names. For quickWriteCsv, a logical scalar indicating whether to include the row names of df in the file. |
parallel |
Whether reading and parsing should be performed concurrently. |
df |
A DFrame or data.frame object, containing only atomic columns. |
... |
Further arguments to pass to |
validate |
Whether to double-check that the generated CSV complies with the comservatory specification. |
For quickReadCsv, a DFrame containing the contents of path.

For quickWriteCsv, df is written to path and a NULL is invisibly returned.
Aaron Lun
library(S4Vectors)
df <- DataFrame(A=1, B="Aaron")

temp <- tempfile()
quickWriteCsv(df, path=temp, row.names=FALSE, compression="gzip")
quickReadCsv(temp, c(A="numeric", B="character"), 1, "gzip", FALSE)
Read a vector consisting of atomic elements from its on-disk representation. This is usually not directly called by users, but is instead called by dispatch in readObject.
readAtomicVector(path, metadata, ...)
path |
Path to a directory created with any of the vector methods for |
metadata |
Named list containing metadata for the object, see |
... |
Further arguments, ignored. |
The vector represented by path.
Aaron Lun
"saveObject,integer-method", for one of the staging methods.
tmp <- tempfile()
saveObject(setNames(runif(26), letters), tmp)
readObject(tmp)
Read a base R factor from its on-disk representation. This is usually not directly called by users, but is instead called by dispatch in readObject.
readBaseFactor(path, metadata, ...)
path |
String containing a path to a directory, itself created with the |
metadata |
Named list containing metadata for the object, see |
... |
Further arguments, ignored. |
The factor represented by path.
Aaron Lun
"saveObject,factor-method", for the staging method.
tmp <- tempfile()
saveObject(factor(letters[1:10], letters), tmp)
readObject(tmp)
Read a list from its on-disk representation. This is usually not directly called by users, but is instead called by dispatch in readObject.
readBaseList(path, metadata, simple_list.parallel = TRUE, ...)
path |
String containing a path to a directory, itself created with the list method for |
metadata |
Named list containing metadata for the object, see |
simple_list.parallel |
Whether to perform reading and parsing in parallel for greater speed. Only relevant for lists stored in the JSON format. |
... |
Further arguments to be passed to |
The uzuki2 specification (see https://github.com/ArtifactDB/uzuki2) allows length-1 vectors to be stored as-is or as a scalar. If the file stores a length-1 vector as-is, readBaseList will read the list element as a length-1 vector with the AsIs class. If the file stores a length-1 vector as a scalar, readBaseList will read the list element as a length-1 vector without this class. This allows downstream users to distinguish between the storage modes in the rare cases that it is necessary.
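The AsIs distinction can be demonstrated by round-tripping a list whose elements differ only in how their length-1 values are marked (I() in base R applies the AsIs class):

```r
# Round-tripping the scalar vs length-1 vector distinction.
ll <- list(scalar=1, vector=I(1))

tmp <- tempfile()
saveObject(ll, tmp)
out <- readObject(tmp)

class(out$scalar)  # a plain length-1 vector, stored as a scalar
class(out$vector)  # carries the "AsIs" class, stored as a 1-d vector
```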
The list represented by path.
Aaron Lun
"stageObject,list-method"
, for the staging method.
library(S4Vectors)
ll <- list(A=1, B=LETTERS, C=DataFrame(X=letters))

tmp <- tempfile()
saveObject(ll, tmp)
readObject(tmp)
Read a DFrame from its on-disk representation. This is usually not directly called by users, but is instead called by dispatch in readObject.
readDataFrame(path, metadata, ...)
path |
String containing a path to the directory, itself created with |
metadata |
Named list containing metadata for the object, see |
... |
Further arguments, passed to |
The DFrame represented by path.
Aaron Lun
"saveObject,DataFrame-method"
, for the staging method.
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])

tmp <- tempfile()
saveObject(df, tmp)
readObject(tmp)
Read a DataFrameFactor from its on-disk representation. This is usually not directly called by users, but is instead called by dispatch in readObject.
readDataFrameFactor(path, metadata, ...)
path |
String containing a path to a directory, itself created with the |
metadata |
Named list containing metadata for the object, see |
... |
Further arguments to pass to internal |
The DataFrameFactor represented by path.
Aaron Lun
"saveObject,DataFrameFactor-method"
, for the staging method.
library(S4Vectors)
df <- DataFrame(X=LETTERS[1:5], Y=1:5)
out <- DataFrameFactor(df[sample(5, 100, replace=TRUE),,drop=FALSE])

tmp <- tempfile()
saveObject(out, tmp)
readObject(tmp)
Read the metadata and mcols for an Annotated or Vector object, respectively. This is typically used inside loading functions for concrete subclasses.
readMetadata(x, metadata.path, mcols.path, ...)
x |
|
metadata.path |
String containing a path to a directory, itself containing an on-disk representation of a base R list to be used as the |
mcols.path |
String containing a path to a directory, itself containing an on-disk representation of a DataFrame to be used as the |
... |
Further arguments to be passed to |
x is returned, possibly with mcols and metadata added to it.
Aaron Lun
saveMetadata, which does the staging.
Read an object from its on-disk representation. This is done by dispatching to an appropriate loading function based on the type in the OBJECT file.
readObject(path, metadata = NULL, ...)

readObjectFunctionRegistry()

registerReadObjectFunction(type, fun, existing = c("old", "new", "error"))
path |
String containing a path to a directory, itself created with a |
metadata |
Named list containing metadata for the object - most importantly, the |
... |
Further arguments to pass to individual methods. |
type |
String specifying the name of type of the object. |
fun |
A loading function that accepts |
existing |
Logical scalar indicating the action to take if a function has already been registered for |
For readObject, an object created from the on-disk representation in path.

For readObjectFunctionRegistry, a named list of functions used to load each object type.

For registerReadObjectFunction, the function is added to the registry.
readObject uses an internal registry of functions to decide how an object should be loaded into memory. Developers of alabaster extensions can add extra functions to this registry, usually in the .onLoad function of their packages. Alternatively, extension developers can request the addition of their packages to the default registry.
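Registration usually looks something like the following sketch; the "my_type" type name and the readMyType body are invented for illustration:

```r
# Hypothetical sketch: registering a reader for a custom object type
# inside an extension package's .onLoad hook.
readMyType <- function(path, metadata, ...) {
    # ... reconstruct the object from the files under 'path',
    # using 'metadata' (the parsed OBJECT file) as needed ...
}

.onLoad <- function(libname, pkgname) {
    registerReadObjectFunction("my_type", readMyType)
}
```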
If a loading function makes use of additional arguments in ..., those arguments should be prefixed by the name of the object type for each method, e.g., simple_list.parallel. This avoids problems with conflicts in the interpretation of identically named arguments between different functions. Unlike the ... arguments in saveObject, we prefix by the object type instead of the output class, as the former is used for dispatch here.
When writing loading functions for complex classes, extension developers may need to load child objects to compose the output object. In such cases, developers should use altReadObject on the child subdirectories, rather than calling readObject directly. This ensures that any application-level overrides of the loading functions are respected. It is also expected that arguments in ... are forwarded to internal altReadObject calls.
Developers can manually control readObject dispatch by supplying a metadata list where metadata$type is set to the desired object type. This pattern is commonly used inside the loading function for a subclass: an instance of the base class is first constructed by an internal readObject call with the modified metadata$type, after which the subclass-specific slots are added. (In practice, base construction should be done using altReadObject so as to respect application-specific overrides.)
Application developers can override readObject by specifying a custom function in altReadObject. This can be used to point to a different registry of reading functions, to perform pre- or post-reading actions, and so on. If the customization is type-specific, the custom altReadObject function can read the type from the OBJECT file to determine the most appropriate course of action; the OBJECT metadata can then be passed to the metadata argument of any internal readObject calls to avoid a redundant read from the same file.
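A minimal sketch of such an override, assuming the altReadObjectFunction() setter from alabaster.base; the logging is purely illustrative:

```r
library(alabaster.base)

# Install a custom reader that logs each load before delegating.
altReadObjectFunction(function(path, metadata=NULL, ...) {
    if (is.null(metadata)) {
        metadata <- readObjectFile(path) # read the OBJECT file once...
    }
    message("loading a '", metadata$type, "' from ", path)
    # ...then pass it along to avoid a redundant read of the same file.
    readObject(path, metadata=metadata, ...)
})
```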
Aaron Lun
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
tmp <- tempfile()
saveObject(df, tmp)
readObject(tmp)
The OBJECT file inside each directory provides some high-level metadata of the object represented by that directory.
It is guaranteed to have a type property that specifies the object type; individual objects may add their own information to this file.
These methods are intended for developers to easily read and load information in the OBJECT file.
readObjectFile(path)
saveObjectFile(path, type, extra = list())
path |
Path to the directory representing an object. |
type |
String specifying the type of the object. |
extra |
Named list containing extra metadata to be written to the OBJECT file in path. |
readObjectFile returns a named list of metadata for path.
saveObjectFile saves the metadata to the OBJECT file inside path.
Aaron Lun
tmp <- tempfile()
dir.create(tmp)
saveObjectFile(tmp, "foo", list(bar=list(version="1.0")))
readObjectFile(tmp)
WARNING: this function is deprecated, as directories of non-child objects can just be deleted with regular methods (e.g., file.rename) in the latest version of alabaster.
Pretty much as it says in the title.
This only works with non-child objects as children are referenced by their parents and cannot be safely removed in this manner.
removeObject(dir, path)
dir |
String containing the path to the staging directory. |
path |
String containing the path to a non-child object inside dir. |
This function will search around path for JSON files containing redirections to path, and remove them. More specifically, if path is a subdirectory, it will search in the same directory containing path; otherwise, it will search in the directory containing dirname(path). Redirections in other locations will not be removed automatically; these will be caught by checkValidDirectory and should be manually removed.
The object represented by path is removed, along with any redirections to it. A NULL is invisibly returned.
Aaron Lun
tmp <- tempfile()
dir.create(tmp)
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
meta <- stageObject(df, tmp, path="whee")
writeMetadata(meta, tmp)
ll <- list(A=1, B=LETTERS, C=DataFrame(X=1:5))
meta <- stageObject(ll, tmp, path="stuff")
writeMetadata(meta, tmp)
redirect <- createRedirection(tmp, "whoop", "whee/simple.csv.gz")
writeMetadata(redirect, tmp)
list.files(tmp, recursive=TRUE)
removeObject(tmp, "whoop")
list.files(tmp, recursive=TRUE)
The Rfc3339 class is a character vector that stores Internet Date/time timestamps, formatted as described in RFC3339. It provides a faithful representation of any RFC3339-compliant string in an R session.
as.Rfc3339(x)

## S3 method for class 'character'
as.Rfc3339(x)

## Default S3 method:
as.Rfc3339(x)

## S3 method for class 'POSIXt'
as.Rfc3339(x)

## S3 method for class 'Rfc3339'
as.character(x, ...)

is.Rfc3339(x)

## S3 method for class 'Rfc3339'
as.POSIXct(x, tz = "", ...)

## S3 method for class 'Rfc3339'
as.POSIXlt(x, tz = "", ...)

## S3 method for class 'Rfc3339'
x[i]

## S3 method for class 'Rfc3339'
x[[i]]

## S3 replacement method for class 'Rfc3339'
x[i] <- value

## S3 replacement method for class 'Rfc3339'
x[[i]] <- value

## S3 method for class 'Rfc3339'
c(..., recursive = TRUE)

## S4 method for signature 'Rfc3339'
saveObject(x, path, ...)
x |
For the subset and combining methods, an Rfc3339 instance. For the coercion and is.Rfc3339 methods, an object to convert or test, respectively. |
tz, recursive, ... |
Further arguments to be passed to individual methods. |
i |
Indices specifying elements to extract or replace. |
value |
Replacement values, either as another Rfc3339 instance, a character vector or something that can be coerced into one. |
path |
String containing the path to a directory in which to save x. |
This class is motivated by the difficulty in using the various POSIXt classes to faithfully represent any RFC3339-compliant string. In particular:
The POSIXt classes do not automatically capture the string's timezone offset, instead converting all times to the local timezone. This is problematic as it discards information about the original timezone. Technically, the POSIXlt class is capable of holding this information in the gmtoff field, but it is not clear how to set this.
There is no way to distinguish between the timezones Z and +00:00. These are functionally the same, but they will introduce differences in the checksums of saved files and thus interfere with deduplication mechanisms in storage backends.
Coercion of POSIXt classes to strings may print more or fewer digits in the fractional seconds than were present in the original string. Functionally, this is probably unimportant, but it will still introduce differences in the checksums.
By comparison, the Rfc3339 class preserves all information in the original string, avoiding unexpected modifications from a round trip through readObject and saveObject. This is especially relevant for strings that were created by other languages, e.g., Node.js Date's ISO string conversion uses Z by default.
That said, users should not expect too much from this class. It is only used to provide a faithful representation of RFC3339 strings, and does not support any time-related arithmetic. Users are advised to convert to POSIXct or similar if such operations are required.
For as.Rfc3339 and the subset and combining methods, an Rfc3339 instance is returned. For the other as.* methods, an instance of the corresponding type is generated from the Rfc3339 instance.
Aaron Lun
out <- as.Rfc3339(Sys.time() + 1:10)
out
out[2:5]
out[2] <- "2"
c(out, out)
as.character(out)
as.POSIXct(out)
Save vectors containing atomic elements (or values that can be cast as such, e.g., dates and times) to an on-disk representation.
## S4 method for signature 'integer'
saveObject(x, path, ...)

## S4 method for signature 'character'
saveObject(x, path, ...)

## S4 method for signature 'logical'
saveObject(x, path, ...)

## S4 method for signature 'double'
saveObject(x, path, ...)

## S4 method for signature 'numeric'
saveObject(x, path, ...)

## S4 method for signature 'Date'
saveObject(x, path, ...)

## S4 method for signature 'POSIXlt'
saveObject(x, path, ...)

## S4 method for signature 'POSIXct'
saveObject(x, path, ...)
x |
Any of the atomic vector types, or Date objects, or time objects, e.g., POSIXct. |
path |
String containing the path to a directory in which to save x. |
... |
Further arguments that are ignored. |
x is saved inside path. NULL is invisibly returned.
Aaron Lun
readAtomicVector, to read the files back into the session.
tmp <- tempfile()
dir.create(tmp)
saveObject(LETTERS, file.path(tmp, "foo"))
saveObject(setNames(runif(26), letters), file.path(tmp, "bar"))
list.files(tmp, recursive=TRUE)
Pretty much as it says, let's save a base R factor to an on-disk representation.
## S4 method for signature 'factor'
saveObject(x, path, ...)
x |
A factor. |
path |
String containing the path to a directory in which to save x. |
... |
Further arguments that are ignored. |
x is saved inside path. NULL is invisibly returned.
Aaron Lun
readBaseFactor, to read the files back into the session.
tmp <- tempfile()
saveObject(factor(1:10, 1:30), tmp)
list.files(tmp, recursive=TRUE)
Save a list or List to a JSON or HDF5 file, with extra files created for any of the more complex list elements (e.g., DataFrames, arrays). This uses the uzuki2 specification to ensure that appropriate types are declared.
## S4 method for signature 'list'
saveObject(x, path, list.format = saveBaseListFormat(), ...)

## S4 method for signature 'List'
saveObject(x, path, list.format = saveBaseListFormat(), ...)

saveBaseListFormat(list.format)
x |
An ordinary R list, named or unnamed. Alternatively, a List to be coerced into a list. |
path |
String containing the path to a directory in which to save x. |
list.format |
String specifying the format in which to save the list. |
... |
Further arguments, passed to internal saveObject calls. |
For the saveObject method, x is saved inside path and NULL is invisibly returned.
For saveBaseListFormat: if list.format is missing, a string containing the current format is returned. If list.format is supplied, it is used to define the current format, and the previous format is returned.
If list.format="json.gz" (the default), the list is saved to a Gzip-compressed JSON file. This is an easily parsed format with low storage overhead. If list.format="hdf5", x is saved into an HDF5 file instead. This format is most useful for random access and for preserving the precision of numerical data.
The uzuki2 specification (see https://github.com/ArtifactDB/uzuki2) allows length-1 vectors to be stored as-is or as a scalar. If a list element is of length 1, saveBaseList will store it as a scalar on disk, effectively “unboxing” it for languages with a concept of scalars. Users can override this behavior by adding the AsIs class to the affected list element, which will force storage as a length-1 vector. This reflects the decisions made by readBaseList and mimics the behavior of packages like jsonlite.
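To illustrate the unboxing behavior described above, a short sketch:

```r
library(alabaster.base)

tmp <- tempfile()
# 'A' has length 1 and is stored as a scalar, while I() marks 'B' so that
# it is forced to remain a length-1 vector in the on-disk representation.
saveObject(list(A=1, B=I(1)), tmp)
readObject(tmp)
```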
Aaron Lun
https://github.com/ArtifactDB/uzuki2 for the specification.
readBaseList, to read the list back into the R session.
library(S4Vectors)
ll <- list(A=1, B=LETTERS, C=DataFrame(X=1:5))
tmp <- tempfile()
saveObject(ll, tmp)
list.files(tmp, recursive=TRUE)
Alter the format used to save DataFrames in their stageObject methods.
saveDataFrameFormat(format)
format |
String containing the format to use, or NULL to use the default format. |
The stageObject methods will treat format=NULL in the same manner as the default format. The distinction exists to allow downstream applications to set their own defaults while still responding to user specification. For example, an application can detect if the existing format is NULL, and if so, apply another default via .saveDataFrameFormat. On the other hand, if the format is not NULL, it was presumably specified explicitly by the user and should be respected by the application.
If format is missing, a string containing the current format is returned, or NULL if the default format is in use. If format is supplied, it is used to define the current format, and the previous format is returned.
Aaron Lun
(old <- .saveDataFrameFormat())
.saveDataFrameFormat("hdf5")
.saveDataFrameFormat()

# Setting it back.
.saveDataFrameFormat(old)
Save metadata and mcols for Annotated or Vector objects, respectively, to disk. These are typically used inside saveObject methods for concrete subclasses.
saveMetadata(x, metadata.path, mcols.path, ...)
x |
A Vector or Annotated object. |
metadata.path |
String containing the path in which to save the metadata. |
mcols.path |
String containing the path in which to save the mcols. |
... |
Further arguments to be passed to internal altSaveObject calls. |
If mcols(x) has no columns, nothing is saved by saveMcols. Similarly, if metadata(x) is an empty list, nothing is saved by saveMetadata. This avoids creating unnecessary files with no meaningful content.
If mcols(x) has non-NULL row names, these are removed prior to staging. These names are usually redundant with the names associated with the elements of x itself.
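As a hypothetical sketch of how this is typically used (the MyAnnotated class and "my_annotated" type are invented; the subdirectory names are illustrative):

```r
library(alabaster.base)

setMethod("saveObject", "MyAnnotated", function(x, path, ...) {
    dir.create(path)
    saveObjectFile(path, "my_annotated")

    # Writes metadata(x) and mcols(x) only if they are non-empty,
    # forwarding '...' along for any internal saving calls.
    saveMetadata(
        x,
        metadata.path=file.path(path, "other_annotations"),
        mcols.path=file.path(path, "element_annotations"),
        ...
    )
    invisible(NULL)
})
```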
The metadata for x is saved to metadata.path, and similarly for the mcols to mcols.path.
Aaron Lun
readMetadata, which restores metadata to the object.
Generic to save assorted R objects into appropriate on-disk representations. More methods may be defined by other packages to extend the alabaster.base framework to new classes.
saveObject(x, path, ...)
x |
A Bioconductor object of the specified class. |
path |
String containing the path to a directory in which to save x. |
... |
Additional named arguments to pass to specific methods. |
A directory is created at path and populated with files containing the contents of x. NULL should be invisibly returned.
Methods for the saveObject generic should create a directory at path in which the contents of x are to be saved. The files may consist of any format, though language-agnostic formats like HDF5, CSV and JSON are preferred. For more complex objects, multiple files and subdirectories may be created within path. The only strict requirements are:

There must be an OBJECT file inside path, containing a JSON object with a "type" string property that specifies the class of the object, e.g., "data_frame", "summarized_experiment". This will be used by loading functions to determine how to load the files into memory.

The names of files and subdirectories should not start with _ or ., as these are reserved for applications, e.g., to build manifests or to store additional metadata.
Callers can pass optional parameters to specific saveObject methods via .... Any options recognized by a method should be prefixed by the name of the class used in the method's signature, e.g., any options for saveObject,DataFrame-method should start with DataFrame.. This scoping avoids conflicts between otherwise identically-named options of different methods.
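For example, assuming a hypothetical DataFrame.chunk.size option recognized by a DataFrame method (the option name is invented purely for illustration), a caller would write:

```r
library(alabaster.base)
library(S4Vectors)

df <- DataFrame(A=1:10, B=LETTERS[1:10])
tmp <- tempfile()

# The 'DataFrame.' prefix scopes the hypothetical option to the DataFrame
# method; methods for other classes will ignore it.
saveObject(df, tmp, DataFrame.chunk.size=100)
```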
When developing saveObject methods for complex objects, a simple approach is to decompose x into its “child” components. Each component can then be saved into a subdirectory of path, leveraging the existing saveObject methods for the component classes. In such cases, extension developers should actually call altSaveObject on each child component, rather than calling saveObject directly. This ensures that any application-level overrides of the saving functions are respected. It is expected that each method will forward ... (possibly after modification) to any internal altSaveObject calls.
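A minimal sketch of this decomposition pattern, using an invented PairOfFrames class with two DataFrame slots:

```r
library(alabaster.base)

setMethod("saveObject", "PairOfFrames", function(x, path, ...) {
    dir.create(path)
    saveObjectFile(path, "pair_of_frames")

    # Delegate each child component to altSaveObject so that any
    # application-level overrides are respected, forwarding '...'.
    altSaveObject(x@left, file.path(path, "left"), ...)
    altSaveObject(x@right, file.path(path, "right"), ...)
    invisible(NULL)
})
```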
Application developers can override saveObject by specifying a custom function in altSaveObject. This can be used to point to a different function to handle the saving process for each class. The custom function can be as simple as a wrapper around saveObject with some additional actions (e.g., to save more metadata), or as complex as a full-fledged generic with its own methods for class-specific customizations.
Aaron Lun
library(S4Vectors)
X <- DataFrame(X=LETTERS, Y=sample(3, 26, replace=TRUE))
tmp <- tempfile()
saveObject(X, tmp)
list.files(tmp, recursive=TRUE)
Stage a DataFrame by saving it to an HDF5 file.
## S4 method for signature 'DataFrame'
saveObject(x, path, ...)

## S4 method for signature 'data.frame'
saveObject(x, path, ...)
x |
A DataFrame or data.frame. |
path |
String containing the path to a directory in which to save x. |
... |
Additional named arguments to pass to specific methods. |
This method creates a basic_columns.h5 file that contains columns for atomic vectors, factors, dates and date-times. Dates and date-times are converted to character vectors and saved as such inside the file. Factors are saved as an HDF5 group with both the codes and the levels as separate datasets.
Any non-atomic columns are saved to an other_columns subdirectory inside path via saveObject, with each column named after its zero-based positional index within x.
If metadata or mcols are present, they are saved to the other_annotations and column_annotations subdirectories, respectively, via saveObject.
In the on-disk representation, no distinction is made between DataFrame and data.frame instances of x. Calling readDataFrame will always produce a DFrame regardless of the class of x.
A named list containing the metadata for x. x itself is written to an HDF5 file inside path. Additional files may also be created inside path and referenced from the metadata.
Aaron Lun
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
tmp <- tempfile()
saveObject(df, tmp)
list.files(tmp, recursive=TRUE)
Stage a DataFrameFactor object, a generalization of the base factor where each level is a row of a DataFrame.
## S4 method for signature 'DataFrameFactor'
saveObject(x, path, ...)
x |
A DataFrameFactor object. |
path |
String containing the path to a directory in which to save x. |
... |
Further arguments, to pass to internal saveObject calls. |
x is saved to an on-disk representation inside path.
Aaron Lun
library(S4Vectors)
df <- DataFrame(X=LETTERS[1:5], Y=1:5)
out <- DataFrameFactor(df[sample(5, 100, replace=TRUE),,drop=FALSE])
tmp <- tempfile()
saveObject(out, tmp)
list.files(tmp, recursive=TRUE)
This handles type casting and missing placeholder value selection/substitution. It is primarily intended for developers of alabaster.* extensions.
transformVectorForHdf5(x, .version = 3)
x |
An atomic vector to be saved to HDF5. |
.version |
Internal use only. |
A list containing:

transformed, the transformed vector. This may be the same as x if no NA values were detected. Note that logical vectors are cast to integers.

placeholder, the placeholder value used to represent NA values. This is NULL if no NA values were detected in x; otherwise, it is the same as the output of chooseMissingPlaceholderForHdf5.
Aaron Lun
transformVectorForHdf5(c(TRUE, NA, FALSE))
transformVectorForHdf5(c(1L, NA, 2L))
transformVectorForHdf5(c(1L, NaN, 2L))
transformVectorForHdf5(c("FOO", NA, "BAR"))
transformVectorForHdf5(c("FOO", NA, "NA"))
Check whether each object in a directory is valid by calling validateObject on each non-nested object.
validateDirectory(dir, legacy = NULL, ...)
dir |
String containing the path to a directory with subdirectories populated by saveObject. |
legacy |
Logical scalar indicating whether to validate a directory with legacy objects (created by the old stageObject functions). |
... |
Further arguments to use when validating legacy directories. |
We assume that the process of validating an object will call validateObject on any nested objects. This allows us to skip explicit calls to validateObject on each component of a complex object.
Character vector of the paths inside dir that were validated, invisibly. If any validation failed, an error is raised.
Aaron Lun
# Mocking up an object:
library(S4Vectors)
ncols <- 123
df <- DataFrame(
    X = rep(LETTERS[1:3], length.out=ncols),
    Y = runif(ncols)
)
df$Z <- DataFrame(AA = sample(ncols))

# Mocking up the directory:
tmp <- tempfile()
dir.create(tmp, recursive=TRUE)
saveObject(df, file.path(tmp, "foo"))

# Checking that it's valid:
validateDirectory(tmp)

# Adding an invalid object:
dir.create(file.path(tmp, "bar"))
write(file=file.path(tmp, "bar", "OBJECT"), '[ "WHEEE" ]')
try(validateDirectory(tmp))
Validate an object's on-disk representation against the takane specifications. This is done by dispatching to an appropriate validation function based on the type in the OBJECT file.
validateObject(path, metadata = NULL)

registerValidateObjectFunction(type, fun, existing = c("old", "new", "error"))

registerValidateObjectHeightFunction(type, fun, existing = c("old", "new", "error"))

registerValidateObjectDimensionsFunction(type, fun, existing = c("old", "new", "error"))

registerValidateObjectSatisfiesInterface(type, interface, action = c("add", "remove"))

registerValidateObjectDerivedFrom(type, parent, action = c("add", "remove"))
path |
String containing a path to a directory, itself created with a saveObject method. |
metadata |
List containing metadata for the object.
If this is not supplied, it is automatically read from the |
type |
String specifying the name of type of the object. |
fun |
For registerValidateObjectFunction, a function that accepts path and metadata and raises an error if the object at path is invalid. For registerValidateObjectHeightFunction, a function that accepts path and metadata and returns the height of the object. For registerValidateObjectDimensionsFunction, a function that accepts path and metadata and returns the dimensions of the object. This may also be NULL to remove an existing entry for type. |
existing |
String specifying the action to take if a function has already been registered for type: keep the "old" function, replace it with the "new" function, or raise an "error". |
interface |
String specifying the name of the interface that is represented by type. |
action |
String specifying whether to add or remove type. |
parent |
String specifying the parent object from which type is derived. |
For validateObject, NULL is returned invisibly upon success; otherwise an error is raised.
For the registerValidateObject*Function functions, the supplied fun is added to the corresponding registry for type. If fun = NULL, any existing entry for type is removed; a logical scalar is returned indicating whether removal was performed.
For the registerValidateObjectSatisfiesInterface and registerValidateObjectDerivedFrom functions, type is added to or removed from the relevant list of types. A logical scalar is returned indicating whether type was added or removed; this may be FALSE if type was already present or absent, respectively.
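As a hypothetical sketch (the "my_thing" type and its expected contents.txt file are invented), an extension package might register its validator as follows, typically in its .onLoad function:

```r
library(alabaster.base)

registerValidateObjectFunction("my_thing", function(path, metadata) {
    # Raise an error if the on-disk representation is not as expected.
    if (!file.exists(file.path(path, "contents.txt"))) {
        stop("expected a 'contents.txt' file for a 'my_thing' object")
    }
})
```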
Aaron Lun
https://github.com/ArtifactDB/takane, for detailed specifications of the on-disk representation for various Bioconductor objects.
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
tmp <- tempfile()
saveObject(df, tmp)
validateObject(tmp)
WARNING: this function is deprecated, as newer versions of alabaster do not need to write metadata.
Helper function to write metadata from a named list to a JSON file. This is commonly used inside stageObject methods to create the metadata file for a child object.
writeMetadata(meta, dir, ignore.null = TRUE)
meta |
A named list containing metadata.
This should contain at least the |
dir |
String containing a path to the staging directory. |
ignore.null |
Logical scalar indicating whether NULL values in meta should be ignored when writing the JSON file. |
Any NULL values in meta are pruned out prior to writing when ignore.null=TRUE. This is done recursively, so any NULL values in sub-lists of meta are also ignored. Any scalars are automatically unboxed, so array values should be explicitly specified as such with I().
Any starting "./" in meta$path will be automatically removed. This allows staging methods to save in the current directory by setting path=".", without the need to pollute the paths with a "./" prefix.
The JSON-formatted metadata is validated against the schema in meta[["$schema"]] using jsonvalidate. The location of the schema is taken from the package attribute in that string, if one exists; otherwise, it is assumed to be in the alabaster.schemas package. (All schemas are assumed to live in the inst/schemas subdirectory of their indicated packages.)
We also use the schema to determine whether meta refers to an actual artifact or is a metadata-only document. If it refers to an actual file, we compute its MD5 sum and store it in the metadata for saving. We also save its associated metadata into a JSON file at a location obtained by appending ".json" to meta$path.
For artifacts, the MD5 sum calculation will be skipped if meta already contains an md5sum field. This can be useful on some occasions, e.g., to improve efficiency when the MD5 sum was already computed during staging, or if the artifact does not actually exist in its full form on the file system.
A JSON file containing the metadata is created at path. A list of resource metadata is returned, e.g., for inclusion as the "resource" property in parent schemas.
Aaron Lun
library(S4Vectors)
df <- DataFrame(A=1:10, B=LETTERS[1:10])
tmp <- tempfile()
dir.create(tmp)
info <- stageObject(df, tmp, path="coldata")
writeMetadata(info, tmp)
cat(readLines(file.path(tmp, "coldata/simple.csv.gz.json")), sep="\n")