Saving genomic ranges to artifacts and back again

Overview

The alabaster.ranges package implements methods to save genomic ranges (i.e., GRanges and GRangesList objects) to file artifacts and load them back into R. It also supports various CompressedList subclasses, including the somewhat useful CompressedSplitDataFrameList. Check out alabaster.base for more details on the motivation and concepts of the alabaster framework.

Quick start

Given a GRanges object, we can use saveObject() to save it inside a staging directory:

library(GenomicRanges)
gr <- GRanges("chrA", IRanges(sample(100), width=sample(100)))
mcols(gr)$score <- runif(length(gr))
metadata(gr)$genome <- "Aaron"
seqlengths(gr) <- c(chrA=1000)

library(alabaster.ranges)
tmp <- tempfile()
saveObject(gr, tmp)

list.files(tmp, recursive=TRUE)
## [1] "OBJECT"                                 
## [2] "other_annotations/OBJECT"               
## [3] "other_annotations/list_contents.json.gz"
## [4] "range_annotations/OBJECT"               
## [5] "range_annotations/basic_columns.h5"     
## [6] "ranges.h5"                              
## [7] "sequence_information/OBJECT"            
## [8] "sequence_information/info.h5"
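
Each subdirectory holds one component of the object, and every OBJECT file records the type of whatever is stored in its directory. As a minimal sketch, assuming the validateObject() and readObjectFile() helpers from alabaster.base are available in the installed version, the on-disk layout can be inspected without loading the full object. The names gr_small and dir_small are illustrative only.

```r
library(GenomicRanges)
library(alabaster.base)
library(alabaster.ranges)

# Small illustrative object; the values are arbitrary.
gr_small <- GRanges("chrA", IRanges(c(1, 11, 21), width=5))
dir_small <- tempfile()
saveObject(gr_small, dir_small)

# Check that the directory is a well-formed artifact; this raises an error if not.
validateObject(dir_small)

# Read the top-level OBJECT file to see which type was recorded.
info <- readObjectFile(dir_small)
info$type
```

Validating before reading can be useful when the directory was produced by another tool or copied between machines.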

We can then load it back in with readObject():

roundtrip <- readObject(tmp)
roundtrip
## GRanges object with 100 ranges and 1 metadata column:
##         seqnames    ranges strand |     score
##            <Rle> <IRanges>  <Rle> | <numeric>
##     [1]     chrA    98-176      * | 0.0891685
##     [2]     chrA     52-61      * | 0.1475155
##     [3]     chrA     15-33      * | 0.3373283
##     [4]     chrA    85-177      * | 0.5105193
##     [5]     chrA     28-98      * | 0.2030273
##     ...      ...       ...    ... .       ...
##    [96]     chrA     35-64      * | 0.1107076
##    [97]     chrA     49-68      * | 0.3753060
##    [98]     chrA     10-77      * | 0.0810458
##    [99]     chrA    94-152      * | 0.9425210
##   [100]     chrA     25-86      * | 0.9030519
##   -------
##   seqinfo: 1 sequence from an unspecified genome

The same approach works for GRangesList objects and CompressedList subclasses.
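
As a rough sketch of what that might look like, the example below saves and reloads a small GRangesList and an IntegerList (one of the CompressedList subclasses). The object names and values are made up for illustration; only saveObject() and readObject() are taken from the text above.

```r
library(GenomicRanges)
library(alabaster.ranges)

# A GRangesList with two named elements.
grl <- GRangesList(
    first = GRanges("chrA", IRanges(c(1, 20), width=10)),
    second = GRanges("chrA", IRanges(50, width=5))
)
grl_dir <- tempfile()
saveObject(grl, grl_dir)
grl_back <- readObject(grl_dir)

# A CompressedList subclass: IntegerList() creates a CompressedIntegerList by default.
il <- IRanges::IntegerList(a=1:3, b=4:6)
il_dir <- tempfile()
saveObject(il, il_dir)
il_back <- readObject(il_dir)

# Compare the shape of each list before and after the round-trip.
lengths(grl_back)
lengths(il_back)
```

Each list is written to its own staging directory, so the two artifacts can be moved or shared independently.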

Further comments

Metadata is preserved during this round-trip:

metadata(roundtrip)
## $genome
## [1] "Aaron"
mcols(roundtrip)
## DataFrame with 100 rows and 1 column
##         score
##     <numeric>
## 1   0.0891685
## 2   0.1475155
## 3   0.3373283
## 4   0.5105193
## 5   0.2030273
## ...       ...
## 96  0.1107076
## 97  0.3753060
## 98  0.0810458
## 99  0.9425210
## 100 0.9030519
seqinfo(roundtrip)
## Seqinfo object with 1 sequence from an unspecified genome:
##   seqnames seqlengths isCircular genome
##   chrA           1000         NA   <NA>
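
Rather than comparing printed output by eye, the round-trip can also be checked programmatically. The following is a minimal sketch on a small, hand-written object (gr_chk and dir_chk are hypothetical names) that compares each component of the original against the reloaded copy:

```r
library(GenomicRanges)
library(alabaster.ranges)

# Small object with fixed values so the comparison is deterministic.
gr_chk <- GRanges("chrA", IRanges(c(5, 50, 500), width=c(10, 20, 30)),
    strand=c("+", "-", "*"))
mcols(gr_chk)$score <- c(0.1, 0.2, 0.3)
metadata(gr_chk)$genome <- "Aaron"
seqlengths(gr_chk) <- c(chrA=1000)

dir_chk <- tempfile()
saveObject(gr_chk, dir_chk)
back <- readObject(dir_chk)

# One logical per component: TRUE means that component survived the round-trip.
checks <- c(
    ranges = identical(start(back), start(gr_chk)) &&
        identical(width(back), width(gr_chk)),
    strand = identical(as.character(strand(back)), as.character(strand(gr_chk))),
    score = isTRUE(all.equal(mcols(back)$score, mcols(gr_chk)$score)),
    metadata = identical(metadata(back)$genome, metadata(gr_chk)$genome),
    seqlengths = isTRUE(all.equal(seqlengths(back), seqlengths(gr_chk)))
)
checks
```

Comparing component by component, instead of calling identical() on the whole object, makes it easier to see which part differs if a check fails.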

Session information

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] alabaster.ranges_1.5.2 alabaster.base_1.5.8   GenomicRanges_1.57.1  
## [4] GenomeInfoDb_1.41.1    IRanges_2.39.2         S4Vectors_0.43.2      
## [7] BiocGenerics_0.51.1    BiocStyle_2.33.1      
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.7              cli_3.6.3               knitr_1.48             
##  [4] rlang_1.1.4             xfun_0.47               UCSC.utils_1.1.0       
##  [7] jsonlite_1.8.9          buildtools_1.0.0        htmltools_0.5.8.1      
## [10] maketools_1.3.0         sys_3.4.2               sass_0.4.9             
## [13] rmarkdown_2.28          evaluate_1.0.0          jquerylib_0.1.4        
## [16] fastmap_1.2.0           Rhdf5lib_1.27.0         alabaster.schemas_1.5.0
## [19] yaml_2.3.10             lifecycle_1.0.4         BiocManager_1.30.25    
## [22] compiler_4.4.1          Rcpp_1.0.13             rhdf5filters_1.17.0    
## [25] XVector_0.45.0          rhdf5_2.49.0            digest_0.6.37          
## [28] R6_2.5.1                GenomeInfoDbData_1.2.12 bslib_0.8.0            
## [31] tools_4.4.1             zlibbioc_1.51.1         cachem_1.1.0