We illustrate the application of scry methods to disk-based data from the TENxPBMCData package. Each dataset in this package is stored in an HDF5 file that is accessed through a DelayedArray interface. This avoids the need to load the entire dataset into memory for analysis.
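Disk-backed access works by processing data in blocks. The base-R sketch below accumulates row sums one block of columns at a time, so only one block needs to be held in memory.

- The `read_block` callback and `blockwise_row_sums` function are hypothetical stand-ins for reading columns from an HDF5 file.
- DelayedArray's own block-processing machinery is more general than this.
- The default block width of 52 columns only echoes the chunk width in the `chunkdim` slot reported for this dataset.

```r
# Minimal sketch of block-wise processing (hypothetical helper, not DelayedArray's API).
# read_block(cols) must return the requested columns as an ordinary matrix.
blockwise_row_sums <- function(read_block, nrow, ncol, block_ncol = 52L) {
  totals <- numeric(nrow)
  for (s in seq(1L, ncol, by = block_ncol)) {
    cols <- s:min(s + block_ncol - 1L, ncol)
    blk <- read_block(cols)          # only this block is in memory at once
    totals <- totals + rowSums(blk)  # accumulate partial row sums
  }
  totals
}

# Usage with an in-memory stand-in for the on-disk matrix
set.seed(1)
m <- matrix(rpois(200 * 130, lambda = 0.1), nrow = 200)
rs <- blockwise_row_sums(function(cols) m[, cols, drop = FALSE],
                         nrow = nrow(m), ncol = ncol(m))
all.equal(rs, rowSums(m))
```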
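The counts matrix is backed by an HDF5 file on disk. Its seed records the file path, the name of the dataset inside the file, the dimensions (32738 genes by 2700 cells), and the chunk dimensions used for storage:

```r
## An object of class "HDF5ArraySeed"
## Slot "filepath":
## [1] "/github/home/.cache/R/ExperimentHub/1be13d877140_1605"
##
## Slot "name":
## [1] "/counts"
##
## Slot "as_sparse":
## [1] TRUE
##
## Slot "type":
## [1] NA
##
## Slot "dim":
## [1] 32738  2700
##
## Slot "chunkdim":
## [1] 631  52
##
## Slot "first_val":
## [1] 0
```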
```r
# keep only genes with at least one nonzero count
h5counts <- h5counts[rowSums(h5counts) > 0, ]
system.time(h5devs <- devianceFeatureSelection(h5counts)) # ~20 s elapsed in this run
##    user  system elapsed
##  19.454   0.708  20.169
```
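The statistic being ranked here can be written down directly. The null model assumes each gene takes a constant share of every cell's total count. The per-gene binomial deviance then measures how far the observed counts depart from that null.

The base-R function below is a minimal dense-matrix sketch of that formula. It assumes genes are in rows and cells in columns. It is not scry's implementation, and its output is not guaranteed to match `devianceFeatureSelection` exactly.

```r
# Sketch (not scry's code): per-gene binomial deviance for a genes x cells matrix.
# Null: gene j takes a constant share p_j of each cell's total n_i, so mu_ij = n_i * p_j and
#   D_j = 2 * sum_i [ y_ij log(y_ij / mu_ij) + (n_i - y_ij) log((n_i - y_ij) / (n_i - mu_ij)) ]

# x * log(x / m), using the convention 0 * log(0) = 0
xlog_ratio <- function(x, m) ifelse(x > 0, x * log(x / m), 0)

binomial_deviance <- function(counts) {
  n <- colSums(counts)           # total counts per cell
  p <- rowSums(counts) / sum(n)  # each gene's overall share of counts
  one_gene <- function(j) {
    y  <- counts[j, ]
    mu <- n * p[j]               # expected counts under the null
    2 * sum(xlog_ratio(y, mu) + xlog_ratio(n - y, n - mu))
  }
  vapply(seq_len(nrow(counts)), one_gene, numeric(1))
}

# Usage on a small simulated matrix (5 genes x 40 cells)
set.seed(2)
counts <- matrix(rpois(5 * 40, lambda = 2), nrow = 5)
binomial_deviance(counts)
```

A gene whose share of counts varies strongly from cell to cell fits the constant-share null poorly and receives a large deviance. This is why ranking by deviance favours informative genes.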
We now compare the computation speed after converting the same data to an ordinary in-memory matrix. Note that this conversion would not be feasible for HDF5Array objects too large to fit in memory.
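Some back-of-the-envelope arithmetic shows why the dense conversion does not scale. The helpers below are hypothetical and rest on these assumptions:

- Dense values are stored as 8-byte doubles. Integer storage would halve the dense figures.
- The sparse estimate follows the layout of a Matrix package `dgCMatrix`: an 8-byte value and a 4-byte row index per nonzero, plus one 4-byte column pointer per column plus one.

The one-million-cell case and the 5% density are illustrative assumptions, not properties of this dataset.

```r
# Back-of-the-envelope memory estimates in bytes (illustrative assumptions):
# dense storage uses 8-byte doubles; a Matrix::dgCMatrix stores an 8-byte value
# and a 4-byte row index per nonzero, plus a 4-byte column pointer per column + 1.
dense_bytes <- function(nrow, ncol, bytes_per_entry = 8) {
  as.numeric(nrow) * as.numeric(ncol) * bytes_per_entry
}
sparse_csc_bytes <- function(nrow, ncol, density) {
  nnz <- density * as.numeric(nrow) * as.numeric(ncol)  # expected nonzeros
  nnz * (8 + 4) + 4 * (as.numeric(ncol) + 1)
}

dense_bytes(32738, 2700) / 1e9            # unfiltered dims from the seed: ~0.71 GB
dense_bytes(32738, 1e6) / 1e9             # hypothetical 1 million cells: ~262 GB
sparse_csc_bytes(32738, 1e6, 0.05) / 1e9  # same, if 5% of entries were nonzero: ~19.6 GB
```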
```r
denseCounts <- as.matrix(h5counts)  # loads all counts into RAM
system.time(denseDevs <- devianceFeatureSelection(denseCounts)) # ~3.5 s elapsed in this run
##    user  system elapsed
##   3.340   0.156   3.497
## [1] 0
```
Finally, we compare the speed when the counts are stored in a sparse in-memory `Matrix` format.
```r
## [1] 0.05091945
sparseCounts <- Matrix::Matrix(denseCounts, sparse = TRUE)
system.time(sparseDevs <- devianceFeatureSelection(sparseCounts)) # ~0.6 s elapsed in this run
##    user  system elapsed
##   0.534   0.096   0.629
## [1] 1.629815e-09
```
Disk-based data save memory but slow down computation. When the data contain mostly zeros and are not too large, the sparse in-memory `Matrix` object gives the fastest computation. The resulting deviance statistics agree across all of the data formats, up to floating-point rounding.
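Sparse input can be fast partly because many count-model statistics only need the nonzero entries. Consider the Poisson deviance with null means `mu_ij = n_i * p_j`. For every gene, the `-(y_ij - mu_ij)` terms sum to zero, so only the nonzero entries contribute.

The sketch below computes this deviance from (row, column, value) triplets in base R and checks it against a dense computation. It illustrates the principle only and does not describe how scry handles sparse matrices. The binomial deviance does not reduce this simply, because its `(n_i - y_ij)` term is nonzero even where `y_ij = 0`.

```r
# Sketch (not scry's code): Poisson deviance per gene from nonzero entries only.
# Null model: mu_ij = n_i * p_j. Because sum_i (y_ij - mu_ij) = 0 for every gene,
#   D_j = 2 * sum_i [y_ij log(y_ij / mu_ij) - (y_ij - mu_ij)]
#       = 2 * sum over nonzero y_ij of y_ij log(y_ij / mu_ij)

# Sum x within groups 1..k (empty groups give 0)
group_sum <- function(x, g, k) {
  unname(vapply(split(x, factor(g, levels = seq_len(k))), sum, numeric(1)))
}

# rows, cols, vals: triplet (coordinate) form of the nonzero counts
poisson_deviance_nz <- function(rows, cols, vals, nrow, ncol) {
  vals <- as.numeric(vals)
  n  <- group_sum(vals, cols, ncol)           # total counts per cell
  p  <- group_sum(vals, rows, nrow) / sum(n)  # each gene's overall share
  mu <- n[cols] * p[rows]                     # null means at nonzero entries
  2 * group_sum(vals * log(vals / mu), rows, nrow)
}

# Dense reference that visits every entry, zeros included
poisson_deviance_dense <- function(counts) {
  n  <- colSums(counts)
  p  <- rowSums(counts) / sum(n)
  mu <- outer(p, n)                           # genes x cells matrix of means
  term <- ifelse(counts > 0, counts * log(counts / mu), 0) - (counts - mu)
  2 * rowSums(term)
}

set.seed(3)
counts <- matrix(rpois(50 * 30, lambda = 0.1), nrow = 50)
counts <- counts[rowSums(counts) > 0, , drop = FALSE]  # drop all-zero genes
nz <- which(counts > 0, arr.ind = TRUE)                 # (row, col) of nonzeros
d_sparse <- poisson_deviance_nz(nz[, "row"], nz[, "col"], counts[nz],
                                nrow(counts), ncol(counts))
all.equal(d_sparse, poisson_deviance_dense(counts))
```

With a Matrix package `dgCMatrix`, the same triplets should be available from `summary()`. It returns the row index `i`, column index `j`, and value `x` of each stored entry.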
One can run `nullResiduals` on `HDF5Matrix` objects, `DelayedArray` matrices, and sparse matrices from the Matrix package with the same syntax used for the base matrix case. We illustrate this with the same dataset from the TENxPBMCData package.
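As with the deviance, null residuals can be sketched directly. The sketch uses the same binomial null model, with `mu_ij = n_i * p_j`:

- **Pearson residuals** divide `y_ij - mu_ij` by the binomial standard deviation `sqrt(mu_ij * (1 - p_j))`.
- **Deviance residuals** take the signed square root of each entry's deviance contribution.

The function below is a hypothetical dense-matrix illustration of these two residual types. scry's `nullResiduals` may use different defaults, options, or numerical safeguards. The sketch assumes that no gene or cell has a total count of zero.

```r
# Sketch (not scry's code): binomial null residuals for a genes x cells matrix.
# Assumes every gene and every cell has a nonzero total count.
xlog_ratio <- function(x, m) ifelse(x > 0, x * log(x / m), 0)  # 0 * log(0) = 0

binomial_null_residuals <- function(counts, type = c("deviance", "pearson")) {
  type <- match.arg(type)
  n  <- colSums(counts)                                      # cell totals n_i
  p  <- rowSums(counts) / sum(n)                             # gene shares p_j
  N  <- matrix(n, nrow(counts), ncol(counts), byrow = TRUE)  # n_i repeated in every row
  mu <- N * p                                                # p recycled down columns: mu_ij = n_i p_j
  if (type == "pearson") {
    return((counts - mu) / sqrt(mu * (1 - p)))               # binomial variance n_i p_j (1 - p_j)
  }
  d <- 2 * (xlog_ratio(counts, mu) + xlog_ratio(N - counts, N - mu))
  sign(counts - mu) * sqrt(pmax(d, 0))                       # guard tiny negative rounding
}

# Usage on a small simulated matrix (4 genes x 20 cells)
set.seed(5)
cm <- matrix(rpois(4 * 20, lambda = 3), nrow = 4)
rd <- binomial_null_residuals(cm, "deviance")
rp <- binomial_null_residuals(cm, "pearson")
rowSums(rd^2)  # per-gene binomial deviance
```

Summing a gene's squared deviance residuals recovers its binomial deviance, which ties the residuals back to deviance-based feature selection.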