The alabaster.files package implements methods to save common bioinformatics file formats within the alabaster framework. It does not parse the files or validate them exhaustively; it just provides very lightweight wrappers for processing via alabaster.base::saveObject().
Check out the alabaster.base package for more details on the motivation and concepts behind alabaster.
We’ll start with an indexed BAM file from the Rsamtools package:
bam.file <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bam.index <- paste0(bam.file, ".bai")
We can wrap this inside a BamFileReference class:
library(alabaster.files)
library(S4Vectors)
wrapped.bam <- BamFileReference(bam.file, index=bam.index)
Then we can save it to file:
… and load it back at some later time.
## BamFileReference object
## path: /tmp/RtmpJ8KpKK/file171612874f91/file.bam
## index: /tmp/RtmpJ8KpKK/file171612874f91/file.bam.bai
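The save and load calls for this step can be sketched as follows. This is a minimal example that assumes the same saveObject() and readObject() generics from alabaster.base that are used later in this document; the variable names bam.dir and reloaded.bam are only illustrative.

```r
library(alabaster.base)   # provides saveObject() and readObject()
library(alabaster.files)

# Wrap the example BAM file and its index, as above.
bam.file <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
wrapped.bam <- BamFileReference(bam.file, index=paste0(bam.file, ".bai"))

# Save the wrapped file into a fresh temporary directory.
bam.dir <- tempfile()
saveObject(wrapped.bam, bam.dir)

# Inspect what was written to disk.
list.files(bam.dir, recursive=TRUE)

# Load it back; the result points at the saved copy rather than the original.
reloaded.bam <- readObject(bam.dir)
reloaded.bam
```

Listing the directory contents is optional, but it makes it easy to confirm that the BAM file and its index were copied alongside the metadata.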
The example above isn’t very exciting, but it demonstrates how these files can be easily added to an alabaster project. This allows us to incorporate the wrapper objects into other Bioconductor data structures, such as a DataFrame:
df <- DataFrame(Sample=LETTERS[1:4])
# Adding a column of assorted wrapper files:
df$File <- list(
    wrapped.bam,
    BigWigFileReference(system.file("tests", "test.bw", package = "rtracklayer")),
    BigBedFileReference(system.file("tests", "test.bb", package = "rtracklayer")),
    BcfFileReference(system.file("extdata", "ex1.bcf.gz", package = "Rsamtools"))
)
# Saving it all to the staging directory:
dir <- tempfile()
saveObject(df, dir)
# Now reading it back in:
roundtrip <- readObject(dir)
roundtrip$File
## [[1]]
## BamFileReference object
## path: /tmp/RtmpJ8KpKK/file1716502c8c9/other_columns/1/other_contents/0/file.bam
## index: /tmp/RtmpJ8KpKK/file1716502c8c9/other_columns/1/other_contents/0/file.bam.bai
##
## [[2]]
## BigWigFileReference object
## path: /tmp/RtmpJ8KpKK/file1716502c8c9/other_columns/1/other_contents/1/file.bw
##
## [[3]]
## BigBedFileReference object
## path: /tmp/RtmpJ8KpKK/file1716502c8c9/other_columns/1/other_contents/2/file.bb
##
## [[4]]
## BcfFileReference object
## path: /tmp/RtmpJ8KpKK/file1716502c8c9/other_columns/1/other_contents/3/file.bcf
## index: NULL
Similarly, if the staging directory is uploaded to a remote store, the wrapped files will automatically be included in the upload. This avoids the need for a separate process to handle these files.
alabaster.files will try to perform some cursory validation of the wrapped file to catch errors in user inputs. The level of validation is format-dependent but should be fast, e.g., BAM file validation is performed by scanning the header. In all cases, users should not expect an exhaustive check of file validity, as that would take too long and involve more parsing than desired for the scope of alabaster.files.
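To give a sense of what a header-level check involves, the sketch below scans the header of the example BAM file with Rsamtools::scanBamHeader(). It only illustrates the idea of a cheap check; it is not necessarily the check that alabaster.files runs internally.

```r
library(Rsamtools)

bam.file <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

# scanBamHeader() reads the header only, not the alignment records.
# It returns one list element per input file.
header <- scanBamHeader(bam.file)[[1]]

# Reference sequence names and lengths declared in the header.
header$targets
```

A file that cannot be opened as BAM should fail at the scanBamHeader() call, which is roughly the kind of user error that a cursory check is meant to catch.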
If stricter validation is required, applications calling alabaster.files should override the saveObject() methods for the relevant FileReference classes.
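As a rough illustration of what a stricter check could look like, the sketch below validates a BAM file before wrapping it, which avoids relying on any alabaster.files internals. The helper strictBamFileReference() is hypothetical and not part of alabaster.files, and rejecting files with zero records is only an example policy. An application that overrides saveObject() as described above could call a similar check from inside its own method.

```r
library(alabaster.files)
library(Rsamtools)

# Hypothetical helper (not part of alabaster.files): run a full pass over the
# BAM file before wrapping it, so that problems in the records surface early.
strictBamFileReference <- function(path, index=paste0(path, ".bai")) {
    stopifnot(file.exists(path), file.exists(index))

    # countBam() iterates over the alignment records, unlike a header-only scan.
    counts <- countBam(path)

    # Example policy only: treat a BAM file without any records as an error.
    if (counts$records == 0) {
        stop("no alignment records found in '", path, "'")
    }

    BamFileReference(path, index=index)
}

bam.file <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
strict.bam <- strictBamFileReference(bam.file)
```

The trade-off is the one noted above: a full pass over the records takes time proportional to the file size, so it is better suited to applications that can afford the cost.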
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] S4Vectors_0.45.2 BiocGenerics_0.53.3 generics_0.1.3
## [4] alabaster.files_1.5.0 alabaster.base_1.7.2 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.9 crayon_1.5.3 compiler_4.4.2
## [4] BiocManager_1.30.25 Rcpp_1.0.13-1 Biostrings_2.75.1
## [7] GenomicRanges_1.59.1 Rsamtools_2.23.1 rhdf5filters_1.19.0
## [10] bitops_1.0-9 parallel_4.4.2 jquerylib_0.1.4
## [13] IRanges_2.41.1 BiocParallel_1.41.0 yaml_2.3.10
## [16] fastmap_1.2.0 XVector_0.47.0 R6_2.5.1
## [19] GenomeInfoDb_1.43.2 knitr_1.49 maketools_1.3.1
## [22] GenomeInfoDbData_1.2.13 bslib_0.8.0 rlang_1.1.4
## [25] cachem_1.1.0 xfun_0.49 sass_0.4.9
## [28] sys_3.4.3 cli_3.6.3 Rhdf5lib_1.29.0
## [31] zlibbioc_1.52.0 digest_0.6.37 alabaster.schemas_1.7.0
## [34] rhdf5_2.51.0 lifecycle_1.0.4 evaluate_1.0.1
## [37] codetools_0.2-20 buildtools_1.0.0 rmarkdown_2.29
## [40] httr_1.4.7 tools_4.4.2 htmltools_0.5.8.1
## [43] UCSC.utils_1.3.0