The SingleCellExperiment is quite a complex class that can hold
multiple aspects of the same dataset. It is possible to have multiple
assays, multiple dimensionality reduction results, and multiple
alternative Experiments, each of which can further have multiple assays
and reducedDims! In some scenarios, it may be desirable to loop over
these pieces and apply the same function to each of them. This is made
conveniently possible via the applySCE() framework.
Let’s say we have a moderately complicated SingleCellExperiment object,
containing multiple alternative Experiments for different data modalities.
library(SingleCellExperiment)
counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
sce <- SingleCellExperiment(counts)
altExp(sce, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda = 5), ncol=10, nrow=2))
altExp(sce, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda = 100), ncol=10, nrow=5))
altExp(sce, "CRISPR") <- SingleCellExperiment(matrix(rbinom(80, size=1, prob=0.1), ncol=10, nrow=8))
sce
## class: SingleCellExperiment
## dim: 10 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(3): Spike Protein CRISPR
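As a quick orientation, the pieces that applySCE() would loop over can be listed with the standard accessors. This is only a sketch: the seed and the name n.features are assumptions added here so that the block runs on its own.

```r
library(SingleCellExperiment)

# Rebuild the example object (the seed is an arbitrary assumption).
set.seed(1000)
sce <- SingleCellExperiment(matrix(rpois(100, lambda = 10), ncol=10, nrow=10))
altExp(sce, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda = 5), ncol=10, nrow=2))
altExp(sce, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda = 100), ncol=10, nrow=5))
altExp(sce, "CRISPR") <- SingleCellExperiment(matrix(rbinom(80, size=1, prob=0.1), ncol=10, nrow=8))

# Names of the alternative Experiments, in storage order.
altExpNames(sce)

# Number of features in the main Experiment and in each alternative Experiment.
n.features <- c(main=nrow(sce), vapply(as.list(altExps(sce)), nrow, numeric(1)))
n.features
```

All modalities share the same 10 columns (cells) but differ in their number of features, which is why a per-modality loop is needed.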
Assume that we want to compute the total count for each modality,
using the first assay. We might define a function that looks like the
one below. (We will come back to the purpose of multiplier= and
subset.row= later.)
totalCount <- function(x, i=1, multiplier=1, subset.row=NULL) {
    mat <- assay(x, i)
    if (!is.null(subset.row)) {
        mat <- mat[subset.row,,drop=FALSE]
    }
    colSums(mat) * multiplier
}
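To check what the function does before looping, it can be called on a single small object. The sketch below uses a hypothetical 3-by-2 toy matrix (not part of the original example) so that the column sums are easy to verify by hand.

```r
library(SingleCellExperiment)

totalCount <- function(x, i=1, multiplier=1, subset.row=NULL) {
    mat <- assay(x, i)
    if (!is.null(subset.row)) {
        mat <- mat[subset.row,,drop=FALSE]
    }
    colSums(mat) * multiplier
}

# Hypothetical toy object: columns are (1, 2, 3) and (4, 5, 6).
toy <- SingleCellExperiment(matrix(1:6, nrow=3))

plain <- totalCount(toy)                    # sums over all three rows
scaled <- totalCount(toy, multiplier=10)    # same sums, scaled by 10
subsetted <- totalCount(toy, subset.row=1:2) # sums over the first two rows only
```

Here multiplier= scales every total, while subset.row= restricts the rows that contribute to it.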
We can then easily apply this function across the main and alternative Experiments with:
totals <- applySCE(sce, FUN=totalCount)
totals
## [[1]]
## [1] 111 98 112 88 108 95 88 96 92 83
##
## $Spike
## [1] 8 12 13 11 12 12 16 9 16 16
##
## $Protein
## [1] 498 516 536 526 491 474 499 454 487 467
##
## $CRISPR
## [1] 2 0 1 1 0 1 0 1 0 0
The applySCE() call above is functionally equivalent to:
totals.manual <- list(
    totalCount(sce),
    Spike=totalCount(altExp(sce, "Spike")),
    Protein=totalCount(altExp(sce, "Protein")),
    CRISPR=totalCount(altExp(sce, "CRISPR"))
)
stopifnot(identical(totals, totals.manual))
Besides being more verbose than applySCE(), this approach does not deal
well with common arguments. Say we wanted to set multiplier=10 for all
calls. With the manual approach above, this would involve specifying the
argument multiple times:
totals10.manual <- list(
    totalCount(sce, multiplier=10),
    Spike=totalCount(altExp(sce, "Spike"), multiplier=10),
    Protein=totalCount(altExp(sce, "Protein"), multiplier=10),
    CRISPR=totalCount(altExp(sce, "CRISPR"), multiplier=10)
)
With the applySCE() approach, by contrast, we can just set it once.
This makes it easier to change and reduces the possibility of errors
when copy-pasting parameter lists across calls.
totals10.apply <- applySCE(sce, FUN=totalCount, multiplier=10)
stopifnot(identical(totals10.apply, totals10.manual))
Now, one might consider just using lapply() in this case, which also
avoids the need for repeated specification:
totals10.lapply <- lapply(c(List(sce), altExps(sce)),
    FUN=totalCount, multiplier=10)
stopifnot(identical(totals10.apply, totals10.lapply))
However, this runs into the opposite problem: it is no longer possible
to specify custom arguments for each call. For example, say we wanted to
subset to a different set of features for each main and alternative
Experiment. With applySCE(), this is still possible:
totals.custom <- applySCE(sce, FUN=totalCount, multiplier=10,
    ALT.ARGS=list(Spike=list(subset.row=2), Protein=list(subset.row=3:5)))
totals.custom
## [[1]]
## [1] 1110 980 1120 880 1080 950 880 960 920 830
##
## $Spike
## [1] 60 70 100 40 40 60 40 30 60 80
##
## $Protein
## [1] 2890 2910 3300 3200 3010 3060 3090 2670 2790 2640
##
## $CRISPR
## [1] 20 0 10 10 0 10 0 10 0 0
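To make the mixing of common and custom arguments concrete, the sketch below spells out the manual calls that the ALT.ARGS= usage corresponds to. It also uses MAIN.ARGS= and WHICH=, which, as far as I understand the applySCE() signature, supply arguments for the main Experiment only and restrict the loop to selected alternative Experiments; treat that part as an assumption and check ?applySCE. The seed and the names custom, custom.manual and partial are introduced here for illustration.

```r
library(SingleCellExperiment)

totalCount <- function(x, i=1, multiplier=1, subset.row=NULL) {
    mat <- assay(x, i)
    if (!is.null(subset.row)) {
        mat <- mat[subset.row,,drop=FALSE]
    }
    colSums(mat) * multiplier
}

# Rebuild the example object (the seed is an arbitrary assumption).
set.seed(1000)
sce <- SingleCellExperiment(matrix(rpois(100, lambda = 10), ncol=10, nrow=10))
altExp(sce, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda = 5), ncol=10, nrow=2))
altExp(sce, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda = 100), ncol=10, nrow=5))
altExp(sce, "CRISPR") <- SingleCellExperiment(matrix(rbinom(80, size=1, prob=0.1), ncol=10, nrow=8))

# Common argument (multiplier=) set once; custom arguments set per modality.
custom <- applySCE(sce, FUN=totalCount, multiplier=10,
    ALT.ARGS=list(Spike=list(subset.row=2), Protein=list(subset.row=3:5)))

# The manual calls that this corresponds to.
custom.manual <- list(
    totalCount(sce, multiplier=10),
    Spike=totalCount(altExp(sce, "Spike"), multiplier=10, subset.row=2),
    Protein=totalCount(altExp(sce, "Protein"), multiplier=10, subset.row=3:5),
    CRISPR=totalCount(altExp(sce, "CRISPR"), multiplier=10)
)

# Assumed usage: custom arguments for the main Experiment via MAIN.ARGS=,
# and looping over only the "Protein" alternative Experiment via WHICH=.
partial <- applySCE(sce, FUN=totalCount, WHICH="Protein",
    MAIN.ARGS=list(subset.row=1:5))
```

Modalities without an entry in ALT.ARGS= (CRISPR here) receive only the common arguments.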
In cases where we have a mix between custom and common arguments,
applySCE() provides a more convenient and flexible interface than manual
calls or lapply().
Simplifying to a SingleCellExperiment
The other convenient aspect of applySCE() is that, if the specified
FUN= returns a SingleCellExperiment, applySCE() will try to format the
output as a SingleCellExperiment. To demonstrate, let’s use the head()
function to take the first few features for each main and alternative
Experiment:
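A minimal sketch of such a call, rebuilding the sce object so that the block runs on its own. The variable name head.sce and the seed are assumptions made here; the n=5 argument is passed through to head() as a common argument.

```r
library(SingleCellExperiment)

# Rebuild the example object (the seed is an arbitrary assumption).
set.seed(1000)
sce <- SingleCellExperiment(matrix(rpois(100, lambda = 10), ncol=10, nrow=10))
altExp(sce, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda = 5), ncol=10, nrow=2))
altExp(sce, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda = 100), ncol=10, nrow=5))
altExp(sce, "CRISPR") <- SingleCellExperiment(matrix(rbinom(80, size=1, prob=0.1), ncol=10, nrow=8))

# Apply head() to the main and every alternative Experiment; with the default
# SIMPLIFY=TRUE the results are reassembled into one SingleCellExperiment.
head.sce <- applySCE(sce, FUN=head, n=5)
head.sce
```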
## class: SingleCellExperiment
## dim: 5 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(3): Spike Protein CRISPR
Rather than returning a list of SingleCellExperiments, we can see that
the output is neatly organized as a SingleCellExperiment with the
specified n=5 features. Moreover, each of the alternative Experiments is
also truncated to its first 5 features (or fewer, if there weren’t that
many to begin with). This output mirrors, as much as possible, the format
of the input sce, and is much more convenient to work with than a list of
objects.
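The truncated alternative Experiments can be inspected with the usual altExp() getter on the simplified result. This is a sketch under the same assumptions as before (the name head.sce and the seed are introduced here for illustration).

```r
library(SingleCellExperiment)

# Rebuild the example object (the seed is an arbitrary assumption).
set.seed(1000)
sce <- SingleCellExperiment(matrix(rpois(100, lambda = 10), ncol=10, nrow=10))
altExp(sce, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda = 5), ncol=10, nrow=2))
altExp(sce, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda = 100), ncol=10, nrow=5))
altExp(sce, "CRISPR") <- SingleCellExperiment(matrix(rbinom(80, size=1, prob=0.1), ncol=10, nrow=8))

head.sce <- applySCE(sce, FUN=head, n=5)

# Each alternative Experiment holds at most 5 features after head().
altExp(head.sce, "Spike")    # only 2 features to begin with
altExp(head.sce, "Protein")  # all 5 features kept
altExp(head.sce, "CRISPR")   # truncated from 8 to 5 features
```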
## class: SingleCellExperiment
## dim: 2 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## class: SingleCellExperiment
## dim: 5 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## class: SingleCellExperiment
## dim: 5 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
To look under the hood, we can turn off simplification and see what
happens. We see that the function indeed returns a list of
SingleCellExperiment objects corresponding to the head() of each
Experiment. When SIMPLIFY=TRUE, this list is passed through
simplifyToSCE() to attempt the reorganization into a single object.
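A sketch of the unsimplified call, using SIMPLIFY=FALSE as described above. The name head.list and the seed are assumptions introduced for this example.

```r
library(SingleCellExperiment)

# Rebuild the example object (the seed is an arbitrary assumption).
set.seed(1000)
sce <- SingleCellExperiment(matrix(rpois(100, lambda = 10), ncol=10, nrow=10))
altExp(sce, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda = 5), ncol=10, nrow=2))
altExp(sce, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda = 100), ncol=10, nrow=5))
altExp(sce, "CRISPR") <- SingleCellExperiment(matrix(rbinom(80, size=1, prob=0.1), ncol=10, nrow=8))

# Turn off simplification to obtain the raw list of per-Experiment results.
head.list <- applySCE(sce, FUN=head, n=5, SIMPLIFY=FALSE)
head.list
```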
## [[1]]
## class: SingleCellExperiment
## dim: 5 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(3): Spike Protein CRISPR
##
## $Spike
## class: SingleCellExperiment
## dim: 2 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
##
## $Protein
## class: SingleCellExperiment
## dim: 5 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
##
## $CRISPR
## class: SingleCellExperiment
## dim: 5 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
For comparison, if we had to do this manually, it would be rather
tedious and error-prone, e.g., if we forgot to set n= or if we
re-assigned the output of head() to the wrong alternative Experiment.
manual.head <- head(sce, n=5)
altExp(manual.head, "Spike") <- head(altExp(sce, "Spike"), n=5)
altExp(manual.head, "Protein") <- head(altExp(sce, "Protein"), n=5)
altExp(manual.head, "CRISPR") <- head(altExp(sce, "CRISPR"), n=5)
manual.head
## class: SingleCellExperiment
## dim: 5 10
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(3): Spike Protein CRISPR
Of course, this simplification is only possible when circumstances
permit. It requires that FUN= returns a SingleCellExperiment at each
call, and that no more than one result is generated for each alternative
Experiment. Failure to meet these conditions will result in a warning
and a non-simplified output.
Developers may prefer to set SIMPLIFY=FALSE and manually call
simplifyToSCE(), possibly with warn.level=3 to trigger an explicit error
when simplification fails.
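A sketch of that developer-oriented pattern follows. It assumes that simplifyToSCE() accepts a which.main= argument giving the position of the main Experiment's result in the list; the names head.list and simplified, and the seed, are likewise assumptions, so check ?simplifyToSCE before relying on it.

```r
library(SingleCellExperiment)

# Rebuild the example object (the seed is an arbitrary assumption).
set.seed(1000)
sce <- SingleCellExperiment(matrix(rpois(100, lambda = 10), ncol=10, nrow=10))
altExp(sce, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda = 5), ncol=10, nrow=2))
altExp(sce, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda = 100), ncol=10, nrow=5))
altExp(sce, "CRISPR") <- SingleCellExperiment(matrix(rbinom(80, size=1, prob=0.1), ncol=10, nrow=8))

# Step 1: collect the raw results without any automatic simplification.
head.list <- applySCE(sce, FUN=head, n=5, SIMPLIFY=FALSE)

# Step 2: simplify explicitly. The first entry holds the main Experiment's
# result (assumed which.main=1); warn.level=3 requests an error instead of
# a warning if the list cannot be reorganized.
simplified <- simplifyToSCE(head.list, which.main=1, warn.level=3)
simplified
```

Separating the two steps lets calling code decide how strictly to treat a failed simplification, rather than silently receiving a list.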
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0
## [3] Biobase_2.67.0 GenomicRanges_1.59.1
## [5] GenomeInfoDb_1.43.4 IRanges_2.41.2
## [7] S4Vectors_0.45.2 BiocGenerics_0.53.6
## [9] generics_0.1.3 MatrixGenerics_1.19.1
## [11] matrixStats_1.5.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] Matrix_1.7-2 jsonlite_1.8.9 compiler_4.4.2
## [4] BiocManager_1.30.25 crayon_1.5.3 jquerylib_0.1.4
## [7] yaml_2.3.10 fastmap_1.2.0 lattice_0.22-6
## [10] R6_2.5.1 XVector_0.47.2 S4Arrays_1.7.2
## [13] knitr_1.49 DelayedArray_0.33.5 maketools_1.3.1
## [16] GenomeInfoDbData_1.2.13 bslib_0.9.0 rlang_1.1.5
## [19] cachem_1.1.0 xfun_0.50 sass_0.4.9
## [22] sys_3.4.3 SparseArray_1.7.5 cli_3.6.3
## [25] grid_4.4.2 digest_0.6.37 lifecycle_1.0.4
## [28] evaluate_1.0.3 buildtools_1.0.0 abind_1.4-8
## [31] rmarkdown_2.29 httr_1.4.7 tools_4.4.2
## [34] htmltools_0.5.8.1 UCSC.utils_1.3.1