Title: | Large-scale single-cell RNA-seq data analysis using GDS files and Seurat |
---|---|
Description: | Extends the Seurat classes and functions to support Genomic Data Structure (GDS) files as a DelayedArray backend for data representation. It relies on the implementation of GDS-based DelayedMatrix in the SCArray package to represent single cell RNA-seq data. The common optimized algorithms leveraging GDS-based and single cell-specific DelayedMatrix (SC_GDSMatrix) are implemented in the SCArray package. SCArray.sat introduces a new SCArrayAssay class (derived from the Seurat Assay), which wraps raw counts, normalized expressions and scaled data matrix based on GDS-specific DelayedMatrix. It is designed to integrate seamlessly with the Seurat package to provide common data analysis in the SeuratObject-based workflow. Compared with Seurat, SCArray.sat significantly reduces the memory usage without downsampling and can be applied to very large datasets. |
Authors: | Xiuwen Zheng [aut, cre] , Seurat contributors [ctb] (for the classes and methods defined in Seurat) |
Maintainer: | Xiuwen Zheng <[email protected]> |
License: | GPL-3 |
Version: | 1.5.0 |
Built: | 2024-07-01 05:19:24 UTC |
Source: | https://github.com/bioc/SCArray.sat |
The package extends the Seurat classes and functions to support GDS files as a DelayedArray backend for data representation. It introduces a new SCArrayAssay class (derived from the Seurat Assay), which wraps raw counts, normalized expressions and scaled data matrix based on DelayedMatrix. It is designed to integrate seamlessly with the SeuratObject and Seurat packages to provide common data analysis, with the optimized algorithms for GDS data files.
Package: | SCArray.sat |
Type: | Package |
License: | GPL version 3 |
Xiuwen Zheng
Create an SCArrayAssay (inherited from Assay) object from counts or prenormalized data.
CreateAssayObject2(counts, data, min.cells=0, min.features=0, key=NULL, check.matrix=FALSE, ...)
CreateAssayObject2(counts, data, min.cells=0, min.features=0, key=NULL, check.matrix=FALSE, ...)
counts |
Unnormalized raw counts (matrix, dgCMatrix or DelayedMatrix) |
data |
Prenormalized data (matrix, dgCMatrix or DelayedMatrix) |
min.cells |
if > 0, a lower cutoff for filtering cells |
min.features |
if > 0, a lower cutoff for filtering features |
check.matrix |
Check counts matrix for NA, NaN, Inf, and non-integer values |
key |
Key name for the assay |
... |
Arguments passed to |
Similar to SeuratObject::CreateAssayObject()
, except allowing
DelayedMatrix counts
or data
, and returning a SCArrayAssay
object. counts
and data
should not be provided at the same time.
Return an instance of SCArrayAssay
.
Xiuwen Zheng
fn <- system.file("extdata", "example.gds", package="SCArray") x <- scArray(fn, "counts") colnames(x) <- paste0("c", 1:ncol(x)) rownames(x) <- paste0("g", 1:nrow(x)) x a <- CreateAssayObject2(x) a scGetFiles(x) scGetFiles(a) remove(x, a)
fn <- system.file("extdata", "example.gds", package="SCArray") x <- scArray(fn, "counts") colnames(x) <- paste0("c", 1:ncol(x)) rownames(x) <- paste0("g", 1:nrow(x)) x a <- CreateAssayObject2(x) a scGetFiles(x) scGetFiles(a) remove(x, a)
Normalizes the count data in the Seurat assay.
# NormalizeData(object, ...) ## S3 method for class 'SC_GDSMatrix' NormalizeData(object, normalization.method="LogNormalize", scale.factor=1e4, margin=1, verbose=TRUE, ...)
# NormalizeData(object, ...) ## S3 method for class 'SC_GDSMatrix' NormalizeData(object, normalization.method="LogNormalize", scale.factor=1e4, margin=1, verbose=TRUE, ...)
object |
input R object (e.g., a SC_GDSMatrix object) |
normalization.method |
"LogNormalize", "CLR" or "RC";
see |
scale.factor |
the scale factor for cell-level normalization |
margin |
only applicable when |
verbose |
if |
... |
additional arguments passed to specific methods |
NormalizeData()
does not store the normalized data in a GDS file,
since the calculation is "delayed" until it is needed.
Returns a SC_GDSMatrix matrix.
Xiuwen Zheng
fn <- system.file("extdata", "example.gds", package="SCArray") d <- scNewSeuratGDS(fn) d d <- NormalizeData(d) remove(a, d)
fn <- system.file("extdata", "example.gds", package="SCArray") d <- scNewSeuratGDS(fn) d d <- NormalizeData(d) remove(a, d)
Performs PCA on a Seurat SCArrayAssay or a DelayedMatrix object.
# RunPCA(object, ...) ## S3 method for class 'SCArrayAssay' RunPCA(object, assay=NULL, features=NULL, npcs=50, rev.pca=FALSE, weight.by.var=TRUE, verbose=TRUE, ndims.print=1:5, nfeatures.print=30, reduction.key="PC_", seed.use=42, ...) ## S3 method for class 'SC_GDSMatrix' RunPCA(object, assay=NULL, npcs=50, rev.pca=FALSE, weight.by.var=TRUE, verbose=TRUE, ndims.print=1:5, nfeatures.print=30, reduction.key="PC_", seed.use=42, approx=TRUE, BPPARAM, ...)
# RunPCA(object, ...) ## S3 method for class 'SCArrayAssay' RunPCA(object, assay=NULL, features=NULL, npcs=50, rev.pca=FALSE, weight.by.var=TRUE, verbose=TRUE, ndims.print=1:5, nfeatures.print=30, reduction.key="PC_", seed.use=42, ...) ## S3 method for class 'SC_GDSMatrix' RunPCA(object, assay=NULL, npcs=50, rev.pca=FALSE, weight.by.var=TRUE, verbose=TRUE, ndims.print=1:5, nfeatures.print=30, reduction.key="PC_", seed.use=42, approx=TRUE, BPPARAM, ...)
object |
input R object (e.g., a SCArrayAssay object) |
assay |
NULL for using the active assay, or an assay name |
features |
if |
npcs |
# of top PCs to be calculated |
rev.pca |
By default ( |
weight.by.var |
if |
verbose |
if |
ndims.print |
which PCs to print genes for |
nfeatures.print |
# of genes to print for each PC |
reduction.key |
dimensional reduction key |
seed.use |
a random seed; or NULL for not setting a seed internally |
approx |
if |
BPPARAM |
NULL for non-parallel execution, or a
|
... |
additional arguments passed to specific methods |
RunPCA()
computes the covariance matrix of genes (if # of genes <=
# of cells) or the cell covariance matrix for the PCA calculation, which can
reduce the times of accessing on-disk data.
Return a data frame for reduction data (via CreateDimReducObject
).
Xiuwen Zheng
RunPCA
, CreateDimReducObject
,
BiocParallelParam
, getAutoBPPARAM
fn <- system.file("extdata", "example.gds", package="SCArray") d <- scNewSeuratGDS(fn) d <- NormalizeData(d) d <- FindVariableFeatures(d, nfeatures=250) d <- ScaleData(d) d <- RunPCA(d, ndims.print=1:2) DimPlot(d, reduction="pca") remove(a, d)
fn <- system.file("extdata", "example.gds", package="SCArray") d <- scNewSeuratGDS(fn) d <- NormalizeData(d) d <- FindVariableFeatures(d, nfeatures=250) d <- ScaleData(d) d <- RunPCA(d, ndims.print=1:2) DimPlot(d, reduction="pca") remove(a, d)
Scales and centers features or residuals in the dataset.
# ScaleData(object, ...) ## S3 method for class 'SC_GDSMatrix' ScaleData(object, features=NULL, vars.to.regress=NULL, latent.data=NULL, split.by=NULL, model.use='linear', use.umi=FALSE, do.scale=TRUE, do.center=TRUE, scale.max=10, block.size=1000, min.cells.to.block=3000, verbose=TRUE, use_gds=TRUE, rm_tmpfile=TRUE, ...)
# ScaleData(object, ...) ## S3 method for class 'SC_GDSMatrix' ScaleData(object, features=NULL, vars.to.regress=NULL, latent.data=NULL, split.by=NULL, model.use='linear', use.umi=FALSE, do.scale=TRUE, do.center=TRUE, scale.max=10, block.size=1000, min.cells.to.block=3000, verbose=TRUE, use_gds=TRUE, rm_tmpfile=TRUE, ...)
object |
input R object (e.g., a SC_GDSMatrix object) |
features |
if |
vars.to.regress |
NULL or variable names to regress out |
latent.data |
NULL or a |
split.by |
variable name in the metadata, or a vector or factor defining grouping of cells |
model.use |
regression model: |
use.umi |
only applicable when the covariates are given in
|
do.scale |
if |
do.center |
if |
scale.max |
max value in the resulting scaled data; see
|
block.size |
not used |
min.cells.to.block |
not used |
verbose |
if |
use_gds |
if |
rm_tmpfile |
if |
... |
additional arguments passed to specific methods |
ScaleData()
stores the scaled data in a GDS file when
use_gds=TRUE
or an output GDS file name is given via use_gds
.
When vars.to.regress
and split.by
are both NULL
, an output
GDS file is not needed, since the resulting DelayedMatrix can be represented
as common operations on the count matrix. If use_gds=TRUE
, an output
file name "_scale_data.gds" will be used if it does not exists, or
"_scale_data2.gds" (if not exists), "_scale_data3.gds" and so on. If
use_gds
is an output file name, the resulting data matrix will be saved
to a GDS file.
When vars.to.regress
are given, a temporary GDS file (e.g.,
"_temp_scale_data.gds", use_gds
with a prefix "_temp") will be created
to store the residuals before scaling. This temporary file will be deleted
after the calculation when rm_tmpfile=TRUE
.
Returns a SC_GDSMatrix matrix if use_gds=TRUE
or use_gds
is an
output file name, otherwise returns an in-memory matrix.
Xiuwen Zheng
fn <- system.file("extdata", "example.gds", package="SCArray") d <- scNewSeuratGDS(fn) d <- NormalizeData(d) d <- FindVariableFeatures(d, nfeatures=50) d <- ScaleData(d) GetAssayData(d, slot="scale.data") # DelayedMatrix # scale with split.by ss <- rep(c(TRUE, FALSE), length.out=ncol(d)) d <- ScaleData(d, split.by=ss) fn <- scGetFiles(d) fn[2L] # the file name storing scaled data remove(a, d) unlink(fn[grepl("^_scale", fn)], force=TRUE)
fn <- system.file("extdata", "example.gds", package="SCArray") d <- scNewSeuratGDS(fn) d <- NormalizeData(d) d <- FindVariableFeatures(d, nfeatures=50) d <- ScaleData(d) GetAssayData(d, slot="scale.data") # DelayedMatrix # scale with split.by ss <- rep(c(TRUE, FALSE), length.out=ncol(d)) d <- ScaleData(d, split.by=ss) fn <- scGetFiles(d) fn[2L] # the file name storing scaled data remove(a, d) unlink(fn[grepl("^_scale", fn)], force=TRUE)
The SCArrayAssay class extends the Assay class of Seurat with the new slots
counts2
, data2
and scale.data2
replacing counts
,
data
and scale.data
.
counts2
Unnormalized raw counts (dgCMatrix or SC_GDSMatrix),
replacing Assay@counts
data2
Normalized expression data (dgCMatrix or SC_GDSMatrix),
replacing Assay@data
scale.data2
Scaled expression data (NULL, matrix or SC_GDSMatrix),
replacing [email protected]
Xiuwen Zheng
Assay-class
, Seurat_g-class
,
GetAssayData
, SetAssayData
Gets and sets data in the Seurat Assay object.
## S3 method for class 'SCArrayAssay' GetAssayData(object, slot=c("data", "scale.data", "counts"), ...) ## S3 method for class 'SCArrayAssay' SetAssayData(object, layer, new.data, slot=c('data', 'scale.data', 'counts'), ...) ## S3 method for class 'SCArrayAssay' subset(x, cells=NULL, features=NULL, ...)
## S3 method for class 'SCArrayAssay' GetAssayData(object, slot=c("data", "scale.data", "counts"), ...) ## S3 method for class 'SCArrayAssay' SetAssayData(object, layer, new.data, slot=c('data', 'scale.data', 'counts'), ...) ## S3 method for class 'SCArrayAssay' subset(x, cells=NULL, features=NULL, ...)
object , x
|
a SCArrayAssay object (inherited from SeuratObject::Assay) |
layer |
layer |
new.data |
a new data matrix (dgCMatrix or SC_GDSMatrix) |
slot |
data matrix in the Assay object, "data" is used by default |
cells |
names or indices for selected cells |
features |
names or indices for selected features |
... |
further arguments to be passed to or from other methods |
Return a data matrix or an instance of SCArrayAssay
.
Xiuwen Zheng
Get a list of file names for DelayedArray with an on-disk backend.
scGetFiles(object, ...) ## S4 method for signature 'Assay' scGetFiles(object, ...) ## S4 method for signature 'SCArrayAssay' scGetFiles(object, ...) ## S4 method for signature 'Seurat' scGetFiles(object, ...)
scGetFiles(object, ...) ## S4 method for signature 'Assay' scGetFiles(object, ...) ## S4 method for signature 'SCArrayAssay' scGetFiles(object, ...) ## S4 method for signature 'Seurat' scGetFiles(object, ...)
object |
input R object (e.g., a Seurat object) |
... |
additional arguments passed to specific methods |
Return a character vector storing file names.
Xiuwen Zheng
fn <- system.file("extdata", "example.gds", package="SCArray") a <- scNewAssayGDS(fn) d <- Seurat::CreateSeuratObject(a) scGetFiles(a) scGetFiles(d) remove(a, d)
fn <- system.file("extdata", "example.gds", package="SCArray") a <- scNewAssayGDS(fn) d <- Seurat::CreateSeuratObject(a) scGetFiles(a) scGetFiles(d) remove(a, d)
Loads the internal data to memory for any on-disk object.
scMemory(x, ...) ## S4 method for signature 'SCArrayAssay' scMemory(x, slot=NULL, ...) ## S4 method for signature 'Seurat' scMemory(x, assay=NULL, slot=NULL, ...)
scMemory(x, ...) ## S4 method for signature 'SCArrayAssay' scMemory(x, slot=NULL, ...) ## S4 method for signature 'Seurat' scMemory(x, assay=NULL, slot=NULL, ...)
x |
input R object (e.g., a Seurat object) |
assay |
NULL for using the active assay, or a list of assay names |
slot |
NULL for all |
... |
additional arguments passed to specific methods |
If slot=NULL
, return a Assay
object instead of
SCArrayAssay
object, so it can downgrade a SCArrayAssay
object to
a Assay
object.
Return an object (it maybe a different type from class(x)
).
Xiuwen Zheng
fn <- system.file("extdata", "example.gds", package="SCArray") d1 <- scNewSeuratGDS(fn) is(GetAssay(d1)) d2 <- scMemory(d1) is(GetAssay(d2)) remove(a, d1, d2)
fn <- system.file("extdata", "example.gds", package="SCArray") d1 <- scNewSeuratGDS(fn) is(GetAssay(d1)) d2 <- scMemory(d1) is(GetAssay(d2)) remove(a, d1, d2)
Creates a new Seurat Assay object (SCArrayAssay) from a GDS file.
scNewAssayGDS(gdsfile, name="counts", key="rna_", row_data=TRUE, check=TRUE, verbose=TRUE)
scNewAssayGDS(gdsfile, name="counts", key="rna_", row_data=TRUE, check=TRUE, verbose=TRUE)
gdsfile |
a file name for the GDS file, or a |
name |
characters for the name of data matrix in the GDS file;
if |
key |
characters for the Assay key |
row_data |
if |
check |
if |
verbose |
if |
Return an instance of SCArrayAssay
.
Xiuwen Zheng
SCArrayAssay
, SCArrayFileClass
,
scExperiment
, scNewSeuratGDS
# raw count data in a GDS file fn <- system.file("extdata", "example.gds", package="SCArray") a <- scNewAssayGDS(fn) a class(a) d <- Seurat::CreateSeuratObject(a) d rm(a, d)
# raw count data in a GDS file fn <- system.file("extdata", "example.gds", package="SCArray") a <- scNewAssayGDS(fn) a class(a) d <- Seurat::CreateSeuratObject(a) d rm(a, d)
Creates a new Seurat object from a GDS file.
scNewSeuratGDS(gdsfile, assay.name=NULL, key=c(counts="rna_"), row_data=TRUE, col_data=TRUE, check=TRUE, verbose=TRUE)
scNewSeuratGDS(gdsfile, assay.name=NULL, key=c(counts="rna_"), row_data=TRUE, col_data=TRUE, check=TRUE, verbose=TRUE)
gdsfile |
a file name for the GDS file, or a |
assay.name |
characters for the name of data matrix in the GDS file;
if |
key |
a character vector for an assay key map, where its names are the GDS node names |
row_data |
if |
col_data |
if |
check |
if |
verbose |
if |
"counts"
must be in the input GDS file and it is used as the raw
count data in the active Seurat assay. If "logcounts"
exists, it is
used as normalized data associated with "counts"
. If there are other
data matrices in the GDS file, they will be added to the assay list.
Return an instance of Seurat
.
Xiuwen Zheng
SCArrayAssay
, SCArrayFileClass
,
scExperiment
, scNewAssayGDS
# raw count data in a GDS file fn <- system.file("extdata", "example.gds", package="SCArray") d <- scNewSeuratGDS(fn) d class(d) rm(d)
# raw count data in a GDS file fn <- system.file("extdata", "example.gds", package="SCArray") d <- scNewSeuratGDS(fn) d class(d) rm(d)
The Seurat_g class inherits directly from "Seurat".
setClass("Seurat_g", contains="Seurat")
Xiuwen Zheng