XStringSets to artifacts and back again
The alabaster.string package implements methods to save XStringSet objects to file artifacts and load them back into R. Check out the alabaster.base package for more details on the motivation and concepts of the alabaster framework.
Given an XStringSet, we can use saveObject() to save it inside a staging directory:
library(Biostrings)
x <- DNAStringSet(c(seq1="CTCNACCAGTAT", seq2="TTGA", seq3="TACCTAGAG"))
mcols(x)$score <- runif(length(x))
x
## DNAStringSet object of length 3:
## width seq names
## [1] 12 CTCNACCAGTAT seq1
## [2] 4 TTGA seq2
## [3] 9 TACCTAGAG seq3
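library(alabaster.string)
tmp <- tempfile()
saveObject(x, tmp)
list.files(tmp, recursive=TRUE)
## [1] "OBJECT"
## [2] "names.txt.gz"
## [3] "sequence_annotations/OBJECT"
## [4] "sequence_annotations/basic_columns.h5"
## [5] "sequences.fasta.gz"
We can then load it back into the session with readObject().
roundtrip <- readObject(tmp)
class(roundtrip)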
## [1] "DNAStringSet"
## attr(,"package")
## [1] "Biostrings"
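Checking the class alone does not show that the contents survived the trip. The following is a minimal sketch of a fuller comparison, assuming that sequences, names and the `score` column in `mcols()` are all restored by `readObject()`. The object is rebuilt so that the block runs on its own, and the `same_*` variable names are ours.

```r
library(Biostrings)
library(alabaster.string)

# Rebuild the example object so that this block runs on its own.
x <- DNAStringSet(c(seq1="CTCNACCAGTAT", seq2="TTGA", seq3="TACCTAGAG"))
mcols(x)$score <- runif(length(x))

# Save to a fresh staging directory and load it back.
tmp <- tempfile()
saveObject(x, tmp)
roundtrip <- readObject(tmp)

# Compare sequences, names and the per-sequence annotation column.
same_seqs <- identical(unname(as.character(roundtrip)), unname(as.character(x)))
same_names <- identical(names(roundtrip), names(x))
same_score <- isTRUE(all.equal(mcols(roundtrip)$score, mcols(x)$score))
c(sequences=same_seqs, names=same_names, score=same_score)
```

We use `all.equal()` rather than `identical()` for the numeric column so that the check tolerates any small differences introduced by storage on disk.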
More details on the metadata and on-disk layout are provided in the schema.
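To get a feel for the layout without reading the schema, we can open some of the files directly. This is only a sketch: it assumes that the top-level `OBJECT` file is JSON with a `type` field and that `sequences.fasta.gz` is an ordinary gzipped FASTA file that Biostrings can parse. The schema remains the authoritative description, and the names `meta` and `raw_seqs` are ours.

```r
library(Biostrings)
library(alabaster.string)

# Save a small named DNAStringSet to a fresh staging directory.
x <- DNAStringSet(c(seq1="CTCNACCAGTAT", seq2="TTGA", seq3="TACCTAGAG"))
tmp <- tempfile()
saveObject(x, tmp)

# Assumption: OBJECT is a JSON file whose 'type' field names the saved type.
meta <- jsonlite::fromJSON(file.path(tmp, "OBJECT"))
meta$type

# Assumption: the sequences are held in a standard gzipped FASTA file.
# The FASTA record names need not match names(x), so we only look at the
# sequences themselves here.
raw_seqs <- readDNAStringSet(file.path(tmp, "sequences.fasta.gz"))
unname(as.character(raw_seqs))
```

Reading the files by hand like this is useful for debugging, but readObject() is still the supported way to restore the object.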
The same approach works with QualityScaledXStringSet objects:
x <- DNAStringSet(c("TTGA", "CTCN"))
q <- PhredQuality(c("*+,-", "6789"))
y <- QualityScaledDNAStringSet(x, q)
library(alabaster.string)
tmp <- tempfile()
saveObject(y, tmp)
roundtrip <- readObject(tmp)
class(roundtrip)
## [1] "QualityScaledDNAStringSet"
## attr(,"package")
## [1] "Biostrings"
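As with the plain DNAStringSet, we may want to confirm that the quality strings come back unchanged, not just the class. A minimal sketch, assuming that the Phred encoding is preserved on reload. It rebuilds `y` so that the block is self-contained, and the `same_*` and `is_phred` names are ours.

```r
library(Biostrings)
library(alabaster.string)

# Rebuild the quality-scaled example object.
x <- DNAStringSet(c("TTGA", "CTCN"))
q <- PhredQuality(c("*+,-", "6789"))
y <- QualityScaledDNAStringSet(x, q)

# Save to a fresh staging directory and load it back.
tmp <- tempfile()
saveObject(y, tmp)
roundtrip <- readObject(tmp)

# Compare the sequences and the encoded quality strings.
same_seqs <- identical(unname(as.character(roundtrip)), unname(as.character(y)))
same_quals <- identical(
    unname(as.character(quality(roundtrip))),
    unname(as.character(quality(y)))
)

# Check that the qualities are still interpreted as Phred scores.
is_phred <- is(quality(roundtrip), "PhredQuality")
c(sequences=same_seqs, qualities=same_quals, phred=is_phred)
```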
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] alabaster.string_1.7.0 alabaster.base_1.7.0 Biostrings_2.75.1
## [4] GenomeInfoDb_1.43.0 XVector_0.47.0 IRanges_2.41.0
## [7] S4Vectors_0.45.0 BiocGenerics_0.53.1 generics_0.1.3
## [10] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.9 compiler_4.4.2 BiocManager_1.30.25
## [4] crayon_1.5.3 Rcpp_1.0.13-1 rhdf5filters_1.19.0
## [7] jquerylib_0.1.4 yaml_2.3.10 fastmap_1.2.0
## [10] R6_2.5.1 knitr_1.48 maketools_1.3.1
## [13] GenomeInfoDbData_1.2.13 bslib_0.8.0 rlang_1.1.4
## [16] cachem_1.1.0 xfun_0.49 sass_0.4.9
## [19] sys_3.4.3 cli_3.6.3 Rhdf5lib_1.29.0
## [22] zlibbioc_1.52.0 digest_0.6.37 alabaster.schemas_1.7.0
## [25] rhdf5_2.51.0 lifecycle_1.0.4 evaluate_1.0.1
## [28] buildtools_1.0.0 rmarkdown_2.29 httr_1.4.7
## [31] tools_4.4.2 htmltools_0.5.8.1 UCSC.utils_1.3.0