Numerous groups are working on methodology for analyzing new data resources emerging from spatial transcriptomics experiments. The “computing technology stack” necessary to work with such data is complex. The following comment (Aug 16 2024) from the scverse group is characteristic:
The SpatialData Zarr format, which is described in our design doc, is an extension of the OME-NGFF specification, which makes use of the OME-Zarr, the AnnData Zarr and the Parquet file formats. We need to use these combination of technologies because currently OME-NGFF does not provide all the fundamentals required for storing spatial omics dataset; nevertheless, we try to stay as close as OME-NGFF as possible, and we are contributing to ultimately make spatial omics support available in pure OME-NGFF.
In Bioconductor 3.19, we might be committed to all these technologies (which include but do not mention HDF5) plus R, reticulate, basilisk, and sf to have a widely functional solution. Attaching the Voyager package leads to loading over 100 associated packages. The xenLite package is intended to explore the balance between functionality and high dependency load, specifically for the analysis of outputs from 10x Xenium experiments. The XenSPEP class extends SpatialExperiment to include references to parquet files that manage voluminous geometry data; geometry can also be handled as a DataFrame for sufficiently low-volume experiments.
As of version 0.0.7 of xenLite, a new class, XenSPEP, is provided to address parquet representation of cell, nucleus, and transcript coordinates.
This package is based on publicly available datasets.
Human dermal melanoma FFPE
Human pan tissue dataset. For this dataset, we retrieved the outs.zip file from this site and ran XenSCE::ingestXen on the contents, producing a XenSPEP instance, which was then saved in an HDF5-backed representation. This instance, along with the parquet files for cell, nucleus, and transcript coordinates, was zipped and placed in an Open Storage Network bucket for retrieval via cacheXenPdmelLite.
Human prostate adenocarcinoma FFPE
Human pan tissue dataset. For this dataset, we retrieved the outs.zip file from this site and ran XenSCE::ingestXen on the contents, producing a XenSPEP instance, which was then saved in an HDF5-backed representation. This instance, along with the parquet files for cell, nucleus, and transcript coordinates, was zipped and placed in an Open Storage Network bucket for retrieval via cacheXenProstLite.
Human Lung FFPE
Lung cancer preview. Use example(cacheXenLuad) to obtain this instance of XenSPEP.
In this example, transcript counts are held in memory in a sparse matrix, but geometry information is handled in parquet. The viewSegG2 function allows selection of two gene symbols, and cells are colored according to single or double occupancy (transcripts of only one of the two genes, or of both).
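library(xenLite)
# Retrieve (or reuse) the cached zip archive and restore the XenSPEP instance
pa <- cacheXenLuad()
luad <- restoreZipXenSPEP(pa)
# Use unique, syntactically valid gene symbols as row names
rownames(luad) <- make.names(SummarizedExperiment::rowData(luad)$Symbol, unique = TRUE)
# Plot segmentation in the selected coordinate window, coloring cells by CD4/EPCAM occupancy
out <- viewSegG2(luad, c(5800, 6300), c(1300, 1800), lwd = .5, gene1 = "CD4", gene2 = "EPCAM")
legend(5800, 1390, fill = c("purple", "cyan", "pink"), legend = c("CD4", "EPCAM", "both"))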
## [1] 2074
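The idea of single versus double occupancy can be illustrated on a toy sparse count matrix. This is only a sketch of the classification logic, not the internals of viewSegG2; the helper name classify_occupancy and the toy data are hypothetical, and only the Matrix package is assumed.

```r
library(Matrix)

# Toy genes-by-cells sparse count matrix (hypothetical data, 2 genes x 4 cells)
counts <- sparseMatrix(
  i = c(1, 1, 2, 2), j = c(1, 3, 2, 3), x = c(2, 1, 4, 1),
  dims = c(2, 4),
  dimnames = list(c("CD4", "EPCAM"), paste0("cell", 1:4))
)

# Hypothetical helper: label each cell by which of two genes it contains
classify_occupancy <- function(counts, gene1, gene2) {
  has1 <- counts[gene1, ] > 0
  has2 <- counts[gene2, ] > 0
  out <- rep("neither", ncol(counts))
  out[has1 & !has2] <- gene1
  out[!has1 & has2] <- gene2
  out[has1 & has2] <- "both"
  setNames(out, colnames(counts))
}

occ <- classify_occupancy(counts, "CD4", "EPCAM")

# Map labels to the fill colors used in the legend above; other cells get no fill
fill <- c(CD4 = "purple", EPCAM = "cyan", both = "pink", neither = NA)[occ]
```

The resulting labels could then be used to choose a fill color per cell polygon, as the legend in the example above does for the three occupied categories.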
In inst/app4, code is provided to work with the primary dermal melanoma dataset. A map of the cell coordinates can drive focused exploration. The region of interest is shown by a whitened rectangle in the upper left corner.
Cells are colored by quintile of size. Points where FBL transcripts are found are plotted as dots.
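Coloring by quintile of size can be sketched in base R as follows. This is a minimal illustration on simulated values, not the code in inst/app4; the variable cell_area and the choice of palette are assumptions.

```r
set.seed(1)

# Simulated cell areas (hypothetical values standing in for real cell sizes)
cell_area <- rexp(200, rate = 1 / 50)

# Quintile boundaries, then assign each cell to Q1..Q5
breaks <- quantile(cell_area, probs = seq(0, 1, by = 0.2))
size_quintile <- cut(cell_area, breaks = breaks, include.lowest = TRUE,
                     labels = paste0("Q", 1:5))

# One color per quintile, looked up per cell
pal <- hcl.colors(5, "Viridis")
cell_col <- pal[as.integer(size_quintile)]

# The colors could then be passed to a plotting call, e.g.
# plot(x, y, col = cell_col, pch = 19) for cell centroid coordinates x, y
```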
More work is needed to identify useful exploratory visualizations.
We want to be able to accommodate very large numbers of cells and transcripts without heavy infrastructure commitments.
We’ll use the prostate adenocarcinoma 5K dataset to demonstrate. A 900MB zip file will be cached.
prost is the path to a zip file in a BiocFileCache instance.
Create a folder to work in, and unzip.
Restore the SpatialExperiment component.
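The two steps above might look roughly like the following sketch. The helper names unpack_to_workdir and find_h5se_dirs are hypothetical and not part of xenLite, and the layout of the archive is an assumption: the sketch simply looks for a folder containing a file ending in se.rds, which is what HDF5Array::saveHDF5SummarizedExperiment writes alongside assays.h5.

```r
# Hypothetical helper (not part of xenLite): unpack a cached zip archive
# into a fresh working folder and return the folder path.
unpack_to_workdir <- function(zip_path, workdir = tempfile("xen_work_")) {
  stopifnot(is.character(zip_path), length(zip_path) == 1L, file.exists(zip_path))
  dir.create(workdir, recursive = TRUE, showWarnings = FALSE)
  utils::unzip(zip_path, exdir = workdir)
  workdir
}

# Hypothetical helper: find folders that look like HDF5-backed
# SummarizedExperiment saves, i.e. that contain a file ending in "se.rds".
find_h5se_dirs <- function(root) {
  rds <- list.files(root, pattern = "se\\.rds$", recursive = TRUE, full.names = TRUE)
  unique(dirname(rds))
}

# Intended use, assuming `prost` holds the path to the cached zip file
# and that the archive contains exactly one HDF5-backed object:
# workdir <- unpack_to_workdir(prost)
# h5dir <- find_h5se_dirs(workdir)[1]
# spe <- HDF5Array::loadHDF5SummarizedExperiment(h5dir)
```

The parquet files for cell, nucleus, and transcript coordinates would remain in the working folder after unzipping; how they are reattached to the restored object is handled by the package and is not shown here.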
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] xenLite_1.1.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 EBImage_4.49.0
## [3] dplyr_1.1.4 blob_1.2.4
## [5] arrow_17.0.0.1 filelock_1.0.3
## [7] bitops_1.0-9 fastmap_1.2.0
## [9] SingleCellExperiment_1.28.0 RCurl_1.98-1.16
## [11] BiocFileCache_2.15.0 promises_1.3.0
## [13] digest_0.6.37 mime_0.12
## [15] lifecycle_1.0.4 RSQLite_2.3.7
## [17] magrittr_2.0.3 compiler_4.4.1
## [19] rlang_1.1.4 sass_0.4.9
## [21] tools_4.4.1 utf8_1.2.4
## [23] yaml_2.3.10 knitr_1.48
## [25] S4Arrays_1.6.0 htmlwidgets_1.6.4
## [27] bit_4.5.0 curl_5.2.3
## [29] DelayedArray_0.33.1 abind_1.4-8
## [31] HDF5Array_1.35.1 withr_3.0.2
## [33] purrr_1.0.2 BiocGenerics_0.53.1
## [35] sys_3.4.3 grid_4.4.1
## [37] stats4_4.4.1 fansi_1.0.6
## [39] colorspace_2.1-1 xtable_1.8-4
## [41] ggplot2_3.5.1 Rhdf5lib_1.28.0
## [43] scales_1.3.0 SummarizedExperiment_1.36.0
## [45] cli_3.6.3 rmarkdown_2.28
## [47] crayon_1.5.3 generics_0.1.3
## [49] rjson_0.2.23 httr_1.4.7
## [51] DBI_1.2.3 cachem_1.1.0
## [53] rhdf5_2.50.0 zlibbioc_1.52.0
## [55] assertthat_0.2.1 BiocManager_1.30.25
## [57] XVector_0.46.0 tiff_0.1-12
## [59] matrixStats_1.4.1 vctrs_0.6.5
## [61] Matrix_1.7-1 jsonlite_1.8.9
## [63] IRanges_2.41.0 fftwtools_0.9-11
## [65] S4Vectors_0.44.0 bit64_4.5.2
## [67] magick_2.8.5 jpeg_0.1-10
## [69] maketools_1.3.1 locfit_1.5-9.10
## [71] jquerylib_0.1.4 glue_1.8.0
## [73] gtable_0.3.6 later_1.3.2
## [75] GenomeInfoDb_1.43.0 GenomicRanges_1.59.0
## [77] UCSC.utils_1.2.0 munsell_0.5.1
## [79] tibble_3.2.1 pillar_1.9.0
## [81] htmltools_0.5.8.1 rhdf5filters_1.18.0
## [83] GenomeInfoDbData_1.2.13 R6_2.5.1
## [85] dbplyr_2.5.0 evaluate_1.0.1
## [87] shiny_1.9.1 lattice_0.22-6
## [89] Biobase_2.67.0 highr_0.11
## [91] png_0.1-8 SpatialExperiment_1.16.0
## [93] memoise_2.0.1 httpuv_1.6.15
## [95] bslib_0.8.0 Rcpp_1.0.13
## [97] SparseArray_1.6.0 xfun_0.48
## [99] MatrixGenerics_1.19.0 buildtools_1.0.0
## [101] pkgconfig_2.0.3