Title: | Representing GDS files as array-like objects |
---|---|
Description: | GDS files are widely used to represent genotyping or sequence data. The GDSArray package implements the `GDSArray` class to represent nodes in GDS files in a matrix-like representation that allows easy manipulation (e.g., subsetting, mathematical transformation) in _R_. The data remains on disk until needed, so that very large files can be processed. |
Authors: | Qian Liu [aut, cre], Martin Morgan [aut], Hervé Pagès [aut], Xiuwen Zheng [aut] |
Maintainer: | Qian Liu <[email protected]> |
License: | GPL-3 |
Version: | 1.27.0 |
Built: | 2024-10-30 07:18:14 UTC |
Source: | https://github.com/bioc/GDSArray |
Acquire a (possibly cached) `gds.class` object given its path.
    acquireGDS(path, type = NULL, ...)
    releaseGDS(path, type = NULL, ...)
path | String containing a path to a GDS file. |
type | String containing the GDS file type. Case insensitive. Can be "seqgds" for a GDS file with sequencing data, or "snpgds" for a GDS file with SNP data. This argument was added for the |
... | Other arguments to be passed to |
`acquireGDS` caches the `gds.class` object in the current R session to avoid repeated initialization, which improves efficiency for repeated calls. The cached `gds.class` object for any given `path` can be deleted by calling `releaseGDS` for the same `path`.
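The caching behaviour described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: `acquire_cached`, `release_cached`, `.cache`, and the `opener`/`closer` arguments are hypothetical names introduced here, and the sketch keys the cache on the path only, ignoring `type`.

```r
# Hypothetical sketch of a path-keyed connection cache (not GDSArray's own code).
.cache <- new.env(parent = emptyenv())

# Return the cached object for `path`, creating it with `opener` on first use.
acquire_cached <- function(path, opener) {
    if (!exists(path, envir = .cache, inherits = FALSE)) {
        assign(path, opener(path), envir = .cache)
    }
    get(path, envir = .cache, inherits = FALSE)
}

# Close and drop the cached object for `path`; with path = NULL, drop everything.
release_cached <- function(path = NULL, closer = function(con) NULL) {
    keys <- if (is.null(path)) ls(.cache, all.names = TRUE) else path
    for (key in keys) {
        if (exists(key, envir = .cache, inherits = FALSE)) {
            closer(get(key, envir = .cache, inherits = FALSE))
            rm(list = key, envir = .cache)
        }
    }
    invisible(NULL)
}

# Stand-in opener that counts how often a "connection" is really created.
n_opened <- 0L
opener <- function(path) {
    n_opened <<- n_opened + 1L
    list(path = path)
}

con1 <- acquire_cached("example.gds", opener)
con2 <- acquire_cached("example.gds", opener)  # served from the cache
release_cached("example.gds")                  # clears the cached entry
con3 <- acquire_cached("example.gds", opener)  # opened again
```

Keeping the cache in a dedicated environment means a repeated request for the same path costs one lookup, while an explicit release forces the next request to reopen the file.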
By default, `acquireGDS` returns a regular `gds.class` object, identical to that returned by `gdsfmt::openfn.gds(path)`. If `type` is not NULL, it returns either a `SeqVarGDSClass` object identical to that returned by `SeqArray::seqOpen(path)`, or a `SNPGDSFileClass` object identical to that returned by `SNPRelate::snpgdsOpen(path)`. Both classes inherit from `gds.class` and add extra checking and methods.
`releaseGDS` disconnects any existing `gds.class` object for the `path`, clears it from the cache, and invisibly returns NULL. This is equivalent to calling `gdsfmt::closefn.gds()`, except that it takes `path` as input. If `path=NULL`, all cached connections are removed.
Qian Liu
    fn <- gdsExampleFileName()
    gdscon <- acquireGDS(fn)
    acquireGDS(fn) ## just re-uses the cache
    acquireGDS(fn, type = "seqgds") ## construct a new GDS connection
    releaseGDS(fn) ## clears the cache
`extract_array`: extracts data from a GDS file, taking a `GDSArraySeed` as input. This function is required by `DelayedArray` for the seed contract.

`GDSArray`: converts a GDS file into the `GDSArray` data structure.

`gdsExampleFileName`: `GDSArray` example data.
    ## S4 method for signature 'GDSArraySeed'
    extract_array(x, index)

    GDSArray(gdsfile, varname)

    gdsExampleFileName(type = c("seqgds", "snpgds"))
x | the `GDSArraySeed` object |
index | An unnamed list of subscripts as positive integer vectors, one vector per dimension in |
gdsfile | Can be a `GDSArraySeed`, a character string giving the GDS file name, or a `gds.class` R object. |
varname | A character string specifying the GDS array node to be read into the `GDSArray`. |
type | the type of GDS file, available are "seqgds" for |
A `GDSArray` class object.
    fn <- gdsExampleFileName("snpgds")
    allnodes <- gdsnodes(fn) ## print all available gds nodes in fn.
    allnodes
    GDSArray(fn, "genotype")
    GDSArray(fn, "sample.annot/pop.group")

    fn1 <- gdsExampleFileName("seqgds")
    allnodes1 <- gdsnodes(fn1) ## print all available gds nodes in fn1.
    allnodes1
    ## GDSArray(fn1, "genotype/data")
    GDSArray(fn1, "variant.id")
    GDSArray(fn1, "sample.annotation/family")
    GDSArray(fn1, "annotation/format/DP/data")
    GDSArray(fn1, "annotation/info/DP")

    gdsExampleFileName("snpgds")
    gdsExampleFileName("seqgds")
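As a hedged illustration of the seed contract mentioned for `extract_array`, the sketch below takes the same small block of the example `genotype` node in two ways: through delayed subsetting of the `GDSArray`, and by calling `extract_array` on its seed with one subscript vector per dimension. It assumes the GDSArray and DelayedArray packages are installed and that the example node has at least 5 x 3 entries; the variable names are illustrative only.

```r
library(GDSArray)
library(DelayedArray)

fn <- gdsExampleFileName("snpgds")
ga <- GDSArray(fn, "genotype")

# Subsetting a GDSArray is a delayed operation; the result is still array-like.
sub <- ga[1:5, 1:3]

# Realization loads the requested block into an ordinary in-memory array.
block <- as.array(sub)

# The same block requested directly from the seed:
# an unnamed list with one integer vector of subscripts per dimension.
block2 <- extract_array(seed(ga), list(1:5, 1:3))

dim(block)
dim(block2)
```

In everyday use only the first form is needed; `extract_array` is what `DelayedArray` calls on the seed when a block has to be realized.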
`GDSFile`: `GDSFile` is a light-weight class to represent a GDS file. It has a `$` completion method to complete any possible GDS nodes. If the `current_path` slot of the `GDSFile` object represents a valid GDS node, `$` returns the `GDSArray` of that node directly. Otherwise, it returns the `GDSFile` object with an updated `current_path`.

`GDSFile`: the `GDSFile` class constructor.

`gdsfile`: `filename` slot getter for a `GDSFile` object.

`gdsfile<-`: `filename` slot setter for a `GDSFile` object.

`gdsnodes`: gets the available GDS nodes from a GDS file name or a `GDSFile` object.
    GDSFile(file, current_path = "")

    ## S4 method for signature 'GDSFile'
    gdsfile(object)

    gdsfile(object) <- value

    ## S4 method for signature 'GDSFile'
    x$name

    ## S4 method for signature 'ANY'
    gdsnodes(x, node)
file | the GDS file path. |
current_path | the current path to the closest gds node. |
object | |
value | the new gds file path |
x | a character string for the GDS file name or a |
name | the name of gds node |
node | the node name of a gds file or |
`gdsfile`: the file path of the corresponding `GDSFile` object.

`$`: a `GDSFile` with updated `@current_path`, or a `GDSArray` object if the `current_path` is a valid GDS node.

`gdsnodes`: a character vector of all available GDS nodes within the related GDS file and the specified node.
    fn <- gdsExampleFileName("seqgds")
    gf <- GDSFile(fn)
    gdsfile(gf)

    fn <- gdsExampleFileName("seqgds")
    gdsnodes(fn)
    gdsnodes(fn, "annotation/info")
    fn1 <- gdsExampleFileName("snpgds")
    gdsnodes(fn1)
    gdsnodes(fn1, "sample.annot")
    gf <- GDSFile(fn)
    gdsnodes(gf)
    gdsnodes(gf, "genotype")
    gdsfile(gf)
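The `$` navigation on a `GDSFile` can be sketched as follows. This is an illustrative example rather than an excerpt from the package: it assumes the GDSArray package is installed and that the "seqgds" example file contains the `annotation/info/DP` node used in the other examples on this page. The variable names `info` and `dp` are arbitrary.

```r
library(GDSArray)

fn <- gdsExampleFileName("seqgds")
gf <- GDSFile(fn)

# "annotation/info" is a folder rather than an array node, so `$` is expected
# to return a GDSFile whose current_path has been updated.
info <- gf$annotation$info
is(info, "GDSFile")

# "annotation/info/DP" is an array node, so `$` is expected to return
# the GDSArray for that node.
dp <- gf$annotation$info$DP
is(dp, "GDSArray")
```

Chaining `$` this way avoids typing full node paths, while `gdsnodes(gf)` remains the way to list every node that can be reached.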
`dim`, `dimnames`: dimension and dimnames of the object contained in the GDS file.

`seed`: the `GDSArraySeed` getter for a `GDSArray` object.

`seed<-`: the `GDSArraySeed` setter for a `GDSArray` object.

`gdsfile`: on-disk location of the GDS file represented by this object.
    ## S4 method for signature 'GDSArray'
    seed(x)

    ## S4 replacement method for signature 'GDSArray'
    seed(x) <- value

    gdsfile(object)

    ## S4 method for signature 'GDSArraySeed'
    gdsfile(object)

    ## S4 method for signature 'GDSArray'
    gdsfile(object)

    ## S4 method for signature 'DelayedArray'
    gdsfile(object)
x | the |
value | the new |
object | A `GDSArray`, `GDSMatrix`, `GDSArraySeed`, `GDSFile` or `SummarizedExperiment` object. |
`dim`: the integer vector of dimensions for `GDSArray` or `GDSArraySeed` objects.

`dimnames`: the unnamed list of dimension names for `GDSArray` and `GDSArraySeed` objects.

`seed`: the `GDSArraySeed` of a `GDSArray` object.

`gdsfile`: the character string for the GDS file path.
    fn <- gdsExampleFileName("snpgds")
    ga <- GDSArray(fn, "sample.annot/pop.group")
    dim(ga)
    dimnames(ga)
    type(ga)
    seed(ga)
    dim(seed(ga))
    gdsfile(ga)