The R package eds provides a single function, readEDS, for efficiently reading Alevin’s EDS format for single cell count data into R, utilizing the sparse matrix format in the Matrix package.
Note: eds provides a low-level function, readEDS, which most users will not need to use. Most users and developers will likely prefer to use tximport (for importing matrices) or tximeta (for easy conversion to SingleCellExperiment objects). This package is primarily developed in order to streamline the dependency graph for other packages.
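As a point of comparison, the higher-level route through tximport might look like the sketch below. It assumes the tximport and tximportData packages are installed and reuses the example Alevin output shipped with tximportData; the variable names files and txi are illustrative only.

```r
library(tximport)

# Locate the example Alevin output shipped with tximportData
dir0 <- system.file("extdata", package = "tximportData")
samps <- list.files(file.path(dir0, "alevin"))
files <- file.path(dir0, "alevin", samps[3], "alevin", "quants_mat.gz")

# type = "alevin" tells tximport to read the Alevin count matrix
txi <- tximport(files, type = "alevin")

# The imported counts are stored in the "counts" element of the returned list
dim(txi$counts)
```

For most analyses this is likely the more convenient entry point, since tximport locates the barcode and gene files itself.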
EDS is an acronym for Efficient single cell binary Data Storage, a format for cell-feature count matrices.
For more details on the EDS format see the following repository:
The following example is the same as found in ?readEDS. First we point to EDS files as output by Alevin:
library(tximportData)
library(eds)
## Loading required package: Matrix
dir0 <- system.file("extdata",package="tximportData")
samps <- list.files(file.path(dir0, "alevin"))
dir <- file.path(dir0,"alevin",samps[3],"alevin")
quant.mat.file <- file.path(dir, "quants_mat.gz")
barcode.file <- file.path(dir, "quants_mat_rows.txt")
gene.file <- file.path(dir, "quants_mat_cols.txt")
readEDS() requires knowing the number of cells and genes, which we find by reading in associated barcode and gene files. Again, note that a more useful convenience function for reading in Alevin data is tximport (matrices) or tximeta (for easy conversion to SingleCellExperiment).
cell.names <- readLines(barcode.file)
gene.names <- readLines(gene.file)
num.cells <- length(cell.names)
num.genes <- length(gene.names)
Finally, reading in the sparse matrix is accomplished with:
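A minimal sketch of that call is shown here. It repeats the file setup so that it runs on its own, and it assumes that the eds and tximportData packages are installed and that the returned matrix has genes as rows and cells as columns. The dimnames assignment is an illustrative extra step, not something readEDS does itself.

```r
library(eds)  # attaches Matrix as a required package

# Same example files as above, repeated so this sketch is self-contained
dir0 <- system.file("extdata", package = "tximportData")
samps <- list.files(file.path(dir0, "alevin"))
dir <- file.path(dir0, "alevin", samps[3], "alevin")
cell.names <- readLines(file.path(dir, "quants_mat_rows.txt"))
gene.names <- readLines(file.path(dir, "quants_mat_cols.txt"))

# Read the EDS file into a sparse matrix
mat <- readEDS(numOfGenes = length(gene.names),
               numOfOriginalCells = length(cell.names),
               countMatFilename = file.path(dir, "quants_mat.gz"))

# Assumption: rows are genes and columns are cells, so names attach in that order
dimnames(mat) <- list(gene.names, cell.names)
dim(mat)
```

Because the EDS file stores only the counts, the gene and cell counts have to be supplied by the caller, which is why the barcode and gene files are read first.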
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] eds_1.9.0 Matrix_1.7-1 tximportData_1.33.0
## [4] rmarkdown_2.28
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.48
## [5] lattice_0.22-6 maketools_1.3.1 cachem_1.1.0 knitr_1.48
## [9] htmltools_0.5.8.1 buildtools_1.0.0 lifecycle_1.0.4 cli_3.6.3
## [13] grid_4.4.1 sass_0.4.9 jquerylib_0.1.4 compiler_4.4.1
## [17] sys_3.4.3 tools_4.4.1 evaluate_1.0.1 bslib_0.8.0
## [21] Rcpp_1.0.13 yaml_2.3.10 jsonlite_1.8.9 rlang_1.1.4