--- title: Saving common bioinformatics file formats author: - name: Aaron Lun email: infinite.monkeys.with.keyboards@gmail.com package: alabaster.files date: "Revised: December 30, 2023" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Saving common file formats} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo=FALSE} library(BiocStyle) self <- Biocpkg("alabaster.files") knitr::opts_chunk$set(error=FALSE, warning=FALSE, message=FALSE) ``` # Overview The `r self` package implements methods to save common bioinformatics file formats within the **alabaster** framework. It does not perform any validation or parsing of the files, it just provides very light-weight wrappers for processing via `alabaster.base::stageObject()`. Check out the `r Biocpkg("alabaster.base")` package for more details on the motivation and concepts behind **alabaster**. # Quick start We'll start with an indexed BAM file from the `r Biocpkg("Rsamtools")` package: ```{r} bam.file <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE) bam.index <- paste0(bam.file, ".bai") ``` We can wrap this inside a `BamFileReference` class: ```{r} library(alabaster.files) library(S4Vectors) wrapped.bam <- BamFileReference(bam.file, index=bam.index) ``` Then we can save it to file: ```{r} dir <- tempfile() saveObject(wrapped.bam, dir) ``` ... and load it back at some later time. ```{r} readObject(dir) ``` # Integration with other objects The example above isn't very exciting, but it demonstrates how these files can be easily added to an **alabaster** project. This allows us to incorporate the `Wrapper` objects into other Bioconductor data structures, like: ```{r} df <- DataFrame(Sample=LETTERS[1:4]) # Adding a column of assorted wrapper files: df$File <- list( wrapped.bam, BigWigFileReference(system.file("tests", "test.bw", package = "rtracklayer")), BigBedFileReference(system.file("tests", "test.bb", package = "rtracklayer")), BcfFileReference(system.file("extdata", "ex1.bcf.gz", package = "Rsamtools")) ) # Saving it all to the staging directory: dir <- tempfile() saveObject(df, dir) # Now reading it back in: roundtrip <- readObject(dir) roundtrip$File ``` Similarly, if the staging directory is uploaded to a remote store, the wrapped files will automatically be included in the upload. This avoids the need for a separate process to handle these files. # Validation `r self` will try to perform some cursory validation of the wrapped file to catch errors in user inputs. The level of validation is format-dependent but should be fast, e.g., BAM file validation is performed by scanning the header. In all cases, users should not expect an exhaustive check of file validity, as that would take too long and involve more parsing than desired for the scope of `r self`. If stricter validation is required, applications calling `r self` should override the `saveObject()` methods for the relevant `FileReference` classes. # Session information {-} ```{r} sessionInfo() ```