if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("ExperimentSubset")
To install the latest version from GitHub, use the following code:
Loading the package:
library(ExperimentSubset)
Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for one or more matrix-like assays along with the associated row and column data. Often only a subset of the original data is needed for downstream analysis. For example, filtering out poor-quality samples requires excluding some columns before analysis. The ExperimentSubset object is a container that efficiently manages different subsets of the same data without requiring a separate object for each new subset, and it can be used as a drop-in replacement for other experiment classes.
The ExperimentSubset package enables users to perform flexible subsetting of single-cell data that comes from the same experiment and to store these subsets back into the same object. In general, it offers the same interface as the SingleCellExperiment container, which is one of the most widely used containers for single-cell data. In addition to the features offered by SingleCellExperiment, ExperimentSubset offers subsetting features while hiding the implementation details from the users. It does so by creating references to the subset rows and columns whenever possible, instead of storing a new assay that copies the redundant data. Functions from SingleCellExperiment such as assay, rowData and colData can be used with regular assays as one would normally do, as well as with newly created subsets of the data. This allows users to work with the ExperimentSubset container as if it were a SingleCellExperiment container, with no change required to existing code.
ExperimentSubset class
The ExperimentSubset package is composed of multiple classes that support subset management, depending upon the class of the input experiment object. The currently supported experiment classes that can be used with ExperimentSubset include SummarizedExperiment, RangedSummarizedExperiment and SingleCellExperiment.
The ExperimentSubset package adds an additional slot, subsets, to objects of these classes, which enables the creation and management of subsets of data.
Each subset inside the ExperimentSubset object (more specifically, inside the subsets slot of the object) is stored as an AssaySubset instance. This AssaySubset instance holds references to the row and column indices of this particular subset against a parent (which can be the inherited parent object or another subset). When a new assay is stored against a subset, it is stored as a separate experiment object (of the same class as the inherited object) inside the subset.
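The reference mechanism described above can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's implementation: plain lists stand in for AssaySubset instances, the field names mirror the slot names used in this document, and resolveSubset is a hypothetical helper.

```r
# Toy stand-in for the assays of the parent object (not the real S4 classes).
assays <- list(counts = matrix(1:100, nrow = 10, ncol = 10))

# Each subset stores only indices plus the name of its parent,
# which may be an assay or another subset.
subsets <- list(
  subset1 = list(rowIndices = 1:5, colIndices = 1:5, parentAssay = "counts"),
  subset2 = list(rowIndices = 1:2, colIndices = 3:5, parentAssay = "subset1")
)

# Hypothetical helper: walk up the parent chain, then apply the stored indices.
resolveSubset <- function(name) {
  if (name %in% names(assays)) {
    return(assays[[name]])
  }
  s <- subsets[[name]]
  parent <- resolveSubset(s$parentAssay)
  parent[s$rowIndices, s$colIndices, drop = FALSE]
}

resolved <- resolveSubset("subset2")
dim(resolved)
```

No data is copied when a subset is defined; the values are only materialized when the subset is requested, which is the idea behind storing references rather than new assays.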
ExperimentSubset class
While all methods available with the SummarizedExperiment and SingleCellExperiment classes have been overridden to support the ExperimentSubset class with additional support for subsets, some core methods for the creation and manipulation of subsets are provided with the ExperimentSubset class.
ExperimentSubset constructor
The constructor method allows the creation of an ExperimentSubset object from an input experiment object, as long as it inherits from the SummarizedExperiment class. Additionally, if needed, a subset can be created directly from within the constructor by passing a named list to the subset parameter.
counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
sce <- SingleCellExperiment(list(counts = counts))
es <- ExperimentSubset(sce)
es
## class: SubsetSingleCellExperiment
## dim: 10 10
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## subsets(0):
## subsetAssays(0):
An ExperimentSubset object can also be created directly from loaded data such as counts matrices, which can be passed to the constructor function in a list.
## class: SubsetSingleCellExperiment
## dim: 10 10
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## subsets(0):
## subsetAssays(0):
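The call that produces an object like the one printed above is not shown here. A minimal sketch follows, assuming the constructor accepts a named list of matrices as the paragraph describes; the list name counts is illustrative.

```r
library(ExperimentSubset)

# Simulated counts matrix standing in for loaded data.
counts <- matrix(rpois(100, lambda = 10), ncol = 10, nrow = 10)

# Assumption: the matrix is passed to the constructor inside a named list.
es <- ExperimentSubset(list(counts = counts))
es
```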
createSubset
The createSubset method, as evident from the name, creates a subset from an assay already available in the object. Its standard parameters are subsetName (a character string), rows (the row indices, a numeric or character vector), cols (the column indices, a numeric or character vector) and parentAssay (a character string). If rows or cols are missing or NULL, all of the rows or columns are selected from the specified parentAssay. If parentAssay is missing or NULL, the first available assay from the parent object is linked as the parent of this subset. The parentAssay can be an assay in the parent object, a subset or an assay within a subset.
es <- createSubset(es,
                   subsetName = "subset1",
                   rows = c(1:2),
                   cols = c(1:5),
                   parentAssay = "counts")
es
## class: SubsetSingleCellExperiment
## dim: 10 10
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## subsets(1): subset1
## subsetAssays(1): subset1
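The defaults described for createSubset can be tried with a short sketch. It assumes, as stated above, that omitting rows selects every row and that omitting parentAssay links the subset to the first available assay; the subset name firstThreeCols is made up for this example.

```r
library(ExperimentSubset)

counts <- matrix(rpois(100, lambda = 10), ncol = 10, nrow = 10)
es <- ExperimentSubset(SingleCellExperiment(list(counts = counts)))

# 'rows' is omitted, so every row is kept; 'parentAssay' is omitted,
# so the first available assay ("counts" here) becomes the parent.
es <- createSubset(es, subsetName = "firstThreeCols", cols = 1:3)

subsetDim(es, "firstThreeCols")
```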
setSubsetAssay and getSubsetAssay
The setSubsetAssay method should be used when an assay needs to be stored in a previously created subset. This differs from the createSubset method, which only creates a subset by referencing a defined parentAssay and leaves the internalAssay of the subset with no assays stored. The setSubsetAssay method, in contrast, stores an assay in this internalAssay slot of the subset, which is in fact a subset experiment object of the same class as the parent object.
subset1Assay <- assay(es, "subset1")
subset1Assay[,] <- subset1Assay[,] + 1
es <- setSubsetAssay(es,
                     subsetName = "subset1",
                     inputMatrix = subset1Assay,
                     subsetAssayName = "subset1Assay")
es
## class: SubsetSingleCellExperiment
## dim: 10 10
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## subsets(1): subset1
## subsetAssays(2): subset1 subset1Assay
The parameters of interest for this method are subsetName, which specifies the name of the subset inside which the input assay should be stored; inputMatrix, which is a matrix-type object to be stored as an assay inside the subset specified by subsetName; and subsetAssayName, which is the name of the new assay.
To get a subset assay, the getSubsetAssay method can be used:
## [,1] [,2] [,3] [,4] [,5]
## [1,] 9 6 9 8 6
## [2,] 5 4 8 10 12
## [,1] [,2] [,3] [,4] [,5]
## [1,] 10 7 10 9 7
## [2,] 6 5 9 11 13
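The calls behind the two matrices printed above are not shown. The sketch below is a guess at what they look like, assuming getSubsetAssay takes the object followed by the name of a subset or of a stored subset assay; the variable names original and updated are illustrative.

```r
library(ExperimentSubset)

counts <- matrix(rpois(100, lambda = 10), ncol = 10, nrow = 10)
es <- ExperimentSubset(SingleCellExperiment(list(counts = counts)))
es <- createSubset(es, subsetName = "subset1", rows = 1:2, cols = 1:5,
                   parentAssay = "counts")

# Store a modified copy of the subset as an internal assay.
subset1Assay <- assay(es, "subset1")
subset1Assay[, ] <- subset1Assay[, ] + 1
es <- setSubsetAssay(es, subsetName = "subset1", inputMatrix = subset1Assay,
                     subsetAssayName = "subset1Assay")

# Retrieve the referenced subset and the stored internal assay.
original <- getSubsetAssay(es, "subset1")
updated <- getSubsetAssay(es, "subset1Assay")
```

The first call resolves the subset through its reference to the parent assay, while the second returns the matrix stored inside the subset.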
Apart from the setSubsetAssay and getSubsetAssay methods, the assay and assay<- methods can also be used for the same purpose. Their usage is described in the overridden methods section below.
subsetSummary
The subsetSummary method displays an overall summary of the ExperimentSubset object, including the assays in the parent object, the list of subsets along with their stored assays, reduced dimensions, alternative experiments and other supplementary information that may help users understand the current state of the object. The most important piece of information displayed by this method is the hierarchical ‘parent-subset’ link of each subset in the object.
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent Assays
## 1 subset1 2, 5 counts subset1Assay
Helper methods are provided for specific circumstances that arise while manipulating subsets of data. These helper methods and their short descriptions are given below:
subsetNames: Returns the names of all available subsets (excluding internal subset assays)
subsetAssayNames: Returns the names of all available subsets (including internal subset assays)
subsetCount: Returns the total count of the subsets (excluding internal subset assays)
subsetAssayCount: Returns the total count of the subsets (including internal subset assays)
subsetDim: Returns the dimensions of a specified subset
subsetColData: Gets or sets colData from/to a subset
subsetRowData: Gets or sets rowData from/to a subset
subsetColnames: Gets or sets colnames from/to a subset
subsetRownames: Gets or sets rownames from/to a subset
subsetParent: Returns the ‘subset-parent’ link of a specified subset
setSubsetAssay: Sets an assay to a subset
getSubsetAssay: Gets an assay from a subset
Both the subsetColData and subsetRowData getter methods take an additional logical parameter, parentColData or parentRowData respectively, that specifies whether the returned colData or rowData should also include the colData or rowData from the parent object. By default, parentColData and parentRowData are set to FALSE. The same applies to the inherited rowData and colData methods.
#store colData to parent object
colData(es) <- cbind(colData(es), sampleID = seq_len(ncol(es)))
#store colData to 'subset1' using option 1
colData(es, subsetName = "subset1") <- cbind(
  colData(es, subsetName = "subset1"),
  subsetSampleID1 = seq_len(subsetDim(es, "subset1")[2]))
#store colData to 'subset1' using option 2
subsetColData(es, "subset1") <- cbind(
  subsetColData(es, "subset1"),
  subsetSampleID2 = seq_len(subsetDim(es, "subset1")[2]))
#get colData from 'subset1' without parent colData
subsetColData(es, "subset1", parentColData = FALSE)
## DataFrame with 5 rows and 2 columns
## subsetSampleID1 subsetSampleID2
## <integer> <integer>
## 1 1 1
## 2 2 2
## 3 3 3
## 4 4 4
## 5 5 5
#get colData from 'subset1' with parent colData
subsetColData(es, "subset1", parentColData = TRUE)
## DataFrame with 5 rows and 3 columns
## sampleID subsetSampleID1 subsetSampleID2
## <integer> <integer> <integer>
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
#same applies to `colData` and `rowData` methods when using with subsets
colData(es, subsetName = "subset1", parentColData = FALSE) #without parent data
## DataFrame with 5 rows and 2 columns
## subsetSampleID1 subsetSampleID2
## <integer> <integer>
## 1 1 1
## 2 2 2
## 3 3 3
## 4 4 4
## 5 5 5
colData(es, subsetName = "subset1", parentColData = TRUE) #with parent data
## DataFrame with 5 rows and 3 columns
## sampleID subsetSampleID1 subsetSampleID2
## <integer> <integer> <integer>
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
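The helper methods listed above can be explored on a small object. This is a sketch under assumptions: the single-argument and two-argument call forms are inferred from how subsetDim is used in this document, and the exact shape of each return value is not asserted here.

```r
library(ExperimentSubset)

counts <- matrix(rpois(100, lambda = 10), ncol = 10, nrow = 10)
es <- ExperimentSubset(SingleCellExperiment(list(counts = counts)))
es <- createSubset(es, subsetName = "subset1", rows = 1:2, cols = 1:5,
                   parentAssay = "counts")
es <- setSubsetAssay(es, subsetName = "subset1",
                     inputMatrix = assay(es, "subset1") + 1,
                     subsetAssayName = "subset1Assay")

subsetNames(es)             # subset names only
subsetAssayNames(es)        # subset names plus internal subset assays
subsetCount(es)             # number of subsets
subsetAssayCount(es)        # number of subsets plus internal subset assays
subsetDim(es, "subset1")    # dimensions of one subset
subsetParent(es, "subset1") # parent link of one subset
```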
ExperimentSubset class
These are the methods that have been overridden from other classes to support the subset feature of ExperimentSubset objects by introducing an additional subsetName parameter. They can be called on any ExperimentSubset object to get or set data in the parent object, or in any subset by passing the optional subsetName parameter.
The methods include rowData, rowData<-, colData, colData<-, metadata, metadata<-, reducedDim, reducedDim<-, reducedDims, reducedDims<-, reducedDimNames, reducedDimNames<-, altExp, altExp<-, altExps, altExps<-, altExpNames and altExpNames<-.
An exception to the above approach is the use of the assay and assay<- methods, both of which have a slightly different usage, as described below.
Because the assay<- setter method, in the case of a subset, needs to store the assay inside the subset, the name of the subset in which the assay should be stored is passed as the i parameter, and the name of the new subset assay is passed as the additional subsetAssayName parameter.
#creating a dummy ES object
counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
sce <- SingleCellExperiment(list(counts = counts))
es <- ExperimentSubset(sce)
#create a subset
es <- createSubset(es, subsetName = "subset1", rows = c(1:2), cols = c(1:4))
#store an assay inside the newly created 'subset1'
#note that 'assay<-' setter has two important parameters 'x' and 'i' where
#'x' is the object and 'i' is the assay name, but in the case of storing to a
#subset we use 'x' as the object, 'i' as the subset name inside which the assay
#should be stored and an additional 'subsetAssayName' parameter which defines
#the name of the new assay
assay(
  x = es,
  i = "subset1",
  subsetAssayName = "subset1InternalAssay") <- matrix(rpois(8, lambda = 10),
                                                      ncol = 4, nrow = 2)
Using the assay getter method is simple, as no additional parameter is required, unlike in the setter method.
#assay getter has parameters 'x' which is the input object, 'i' which can either
#be an assay name in the parent object, a subset name or a subset assay name
#getting 'counts' from parent es object
assay(
  x = es,
  i = "counts"
)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 11 6 11 9 6 5 8 4 12 8
## [2,] 10 11 9 6 10 16 12 13 17 7
## [3,] 12 7 9 8 11 11 6 12 8 9
## [4,] 13 9 12 8 11 13 9 10 10 8
## [5,] 12 8 3 9 10 7 8 10 9 14
## [6,] 8 6 9 6 10 10 11 11 20 10
## [7,] 6 6 12 8 15 10 13 10 11 9
## [8,] 6 10 5 13 11 19 6 8 10 12
## [9,] 19 16 15 7 14 12 10 8 10 11
## [10,] 9 15 7 14 11 13 10 6 7 13
#getting 'subset1' by passing the subset name
assay(
  x = es,
  i = "subset1"
)
## [,1] [,2] [,3] [,4]
## [1,] 11 6 11 9
## [2,] 10 11 9 6
#getting the 'subset1InternalAssay' from inside the 'subset1'
assay(
  x = es,
  i = "subset1InternalAssay"
)
## [,1] [,2] [,3] [,4]
## [1,] 7 5 6 11
## [2,] 9 16 12 15
ExperimentSubset object: A toy example
Creating the ExperimentSubset object is as simple as passing an experiment object to the ExperimentSubset constructor:
counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
sce <- SingleCellExperiment(list(counts = counts))
es <- ExperimentSubset(sce)
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## NULL
Create a subset that includes the first 5 rows and columns only:
es <- createSubset(es,
                   subsetName = "subset1",
                   rows = c(1:5),
                   cols = c(1:5),
                   parentAssay = "counts")
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent
## 1 subset1 5, 5 counts
Create another subset from subset1 by keeping only the first two rows:
es <- createSubset(es,
                   subsetName = "subset2",
                   rows = c(1:2),
                   cols = c(1:5),
                   parentAssay = "subset1")
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent
## 1 subset1 5, 5 counts
## 2 subset2 2, 5 subset1 -> counts
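Because subset2 is defined relative to subset1, its indices are applied on top of those of subset1. The following self-contained sketch repeats the two steps above and compares the nested subset with direct indexing of the counts matrix; the variable names nested and direct are illustrative.

```r
library(ExperimentSubset)

counts <- matrix(rpois(100, lambda = 10), ncol = 10, nrow = 10)
es <- ExperimentSubset(SingleCellExperiment(list(counts = counts)))
es <- createSubset(es, subsetName = "subset1", rows = 1:5, cols = 1:5,
                   parentAssay = "counts")
es <- createSubset(es, subsetName = "subset2", rows = 1:2, cols = 1:5,
                   parentAssay = "subset1")

# Indices of subset2 are relative to subset1, which in turn indexes counts.
nested <- assay(es, "subset2")
direct <- counts[1:2, 1:5]
all(as.matrix(nested) == direct)
```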
Get the assay from subset2 and update its values:
subset2Assay <- assay(es, "subset2")
subset2Assay[,] <- subset2Assay[,] + 1
Store the updated assay back to subset2 using one of the two approaches:
#approach 1
es <- setSubsetAssay(es,
                     subsetName = "subset2",
                     inputMatrix = subset2Assay,
                     subsetAssayName = "subset2Assay_a1")
#approach 2
assay(es, "subset2", subsetAssayName = "subset2Assay_a2") <- subset2Assay
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent Assays
## 1 subset1 5, 5 counts
## 2 subset2 2, 5 subset1 -> counts subset2Assay_a1, subset2Assay_a2
Store an experiment object in the altExp slot of subset2:
altExp(x = es,
       e = "subset2_alt1",
       subsetName = "subset2") <- SingleCellExperiment(assays = list(
         counts = assay(es, "subset2")
       ))
Show the current state of the ExperimentSubset object:
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent Assays
## 1 subset1 5, 5 counts
## 2 subset2 2, 5 subset1 -> counts subset2Assay_a1, subset2Assay_a2
## AltExperiments
## 1
## 2 subset2_alt1
ExperimentSubset object: An example with real single cell RNA-seq data
Installing and loading required packages:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(version = "3.11", ask = FALSE)
BiocManager::install(c("TENxPBMCData", "scater", "scran"))
library(ExperimentSubset)
library(TENxPBMCData)
library(scater)
library(scran)
Load the PBMC4K dataset and create an ExperimentSubset object:
tenx_pbmc4k <- TENxPBMCData(dataset = "pbmc4k")
es <- ExperimentSubset(tenx_pbmc4k)
subsetSummary(es)
Compute perCellQCMetrics on the counts matrix:
#use a distinct variable name so the perCellQCMetrics function is not shadowed
qcMetrics <- perCellQCMetrics(assay(es, "counts"))
colData(es) <- cbind(colData(es), qcMetrics)
Filter out cells with a low column sum (keeping cells whose sum exceeds 1500) and create a new subset called ‘filteredCells’:
filteredCellsIndices <- which(colData(es)$sum > 1500)
es <- createSubset(es, "filteredCells", cols = filteredCellsIndices, parentAssay = "counts")
subsetSummary(es)
Normalize the ‘filteredCells’ subset using the scater library and store it back:
assay(es, "filteredCells", subsetAssayName = "filteredCellsNormalized") <- normalizeCounts(assay(es, "filteredCells"))
subsetSummary(es)
Find highly variable genes from the normalized assay of the previous step using the scran library, against the ‘filteredCells’ subset only:
topHVG1000 <- getTopHVGs(modelGeneVar(assay(es, "filteredCellsNormalized")), n = 1000)
es <- createSubset(es, "hvg1000", rows = topHVG1000, parentAssay = "filteredCellsNormalized")
subsetSummary(es)
Run ‘PCA’ on the highly variable genes computed in the last step using the scater library, against the ‘filteredCells’ subset only:
Show the current state of the ExperimentSubset object:
ExperimentSubset
ExperimentSubset constructor
createSubset
setSubsetAssay
getSubsetAssay
subsetSummary
subsetParent
subsetCount
subsetAssayCount
subsetNames
subsetAssayNames
subsetDim
subsetRowData
subsetColData
subsetColnames
subsetRownames
subsetRowData<-
subsetColData<-
subsetColnames<-
subsetRownames<-
show
assay
assay<-
rowData
rowData<-
colData
colData<-
metadata
metadata<-
reducedDim
reducedDim<-
reducedDims
reducedDims<-
reducedDimNames
reducedDimNames<-
altExp
altExp<-
altExps
altExps<-
altExpNames
altExpNames<-
subsetSpatialCoords
subsetSpatialData
subsetSpatialData<-
subsetRowLinks
subsetColLinks
spatialCoords
spatialData
spatialData<-
rowLinks
colLinks
The internal structure of an ExperimentSubset class is described below:
The ExperimentSubset object is assigned, during creation, one of the classes SubsetSummarizedExperiment, SubsetRangedSummarizedExperiment or SubsetSingleCellExperiment, each of which inherits from the class of the input object. This ensures that an ExperimentSubset object can be manipulated in a fashion similar to the input object class and so can be used as a drop-in replacement for these classes. All methods that are compatible with the input object class are compatible with ExperimentSubset objects as well.
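The inheritance described above can be checked on a toy object. A minimal sketch, assuming a SingleCellExperiment input as in the earlier examples:

```r
library(ExperimentSubset)

counts <- matrix(rpois(100, lambda = 10), ncol = 10, nrow = 10)
sce <- SingleCellExperiment(list(counts = counts))
es <- ExperimentSubset(sce)

class(es)                       # the Subset* class chosen for this input
is(es, "SingleCellExperiment")  # inherits from the class of the input object
is(es, "SummarizedExperiment")

# Accessors written for the input class keep working unchanged.
dim(es)
assayNames(es)
```

Since the object still passes is() checks for the input class, existing functions that expect that class can usually receive it without modification.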
subsets slot
The subsets slot of the ExperimentSubset object is a SimpleList, where each element is an object of an internal AssaySubset class. The slot itself is not directly accessible to users and should be accessed through the provided methods of the ExperimentSubset package. Each element represents one subset linked to the experiment object in the parent object. The structure of each subset is described below:
subsetName
A character string that represents a user-defined name of the subset.
rowIndices
A numeric vector that stores the indices of the selected rows in the linked parent assay for this subset.
colIndices
A numeric vector that stores the indices of the selected columns in the linked parent assay for this subset.
parentAssay
A character string that stores the name of the immediate parent to which the subset is linked. The parentAssay can be an assay in the parent ExperimentSubset object, any subset or any internalAssay of a subset.
internalAssay
The internalAssay slot stores an experiment object of the same type as the input object but with the dimensions of the subset. The internalAssay is initially an empty experiment object with only dimensions set to enable manipulation, but it can be used to store additional data against a subset such as assay, rowData, colData, reducedDims, altExps and metadata.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ExperimentSubset_1.17.0 TreeSummarizedExperiment_2.15.0
## [3] Biostrings_2.75.1 XVector_0.47.0
## [5] SpatialExperiment_1.17.0 SingleCellExperiment_1.29.1
## [7] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [9] GenomicRanges_1.59.1 GenomeInfoDb_1.43.2
## [11] IRanges_2.41.1 S4Vectors_0.45.2
## [13] BiocGenerics_0.53.3 generics_0.1.3
## [15] MatrixGenerics_1.19.0 matrixStats_1.4.1
## [17] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] rjson_0.2.23 xfun_0.49 bslib_0.8.0
## [4] lattice_0.22-6 yulab.utils_0.1.8 vctrs_0.6.5
## [7] tools_4.4.2 parallel_4.4.2 tibble_3.2.1
## [10] fansi_1.0.6 pkgconfig_2.0.3 Matrix_1.7-1
## [13] lifecycle_1.0.4 GenomeInfoDbData_1.2.13 compiler_4.4.2
## [16] treeio_1.31.0 codetools_0.2-20 htmltools_0.5.8.1
## [19] sys_3.4.3 buildtools_1.0.0 sass_0.4.9
## [22] lazyeval_0.2.2 yaml_2.3.10 tidyr_1.3.1
## [25] pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
## [28] BiocParallel_1.41.0 DelayedArray_0.33.2 cachem_1.1.0
## [31] magick_2.8.5 abind_1.4-8 nlme_3.1-166
## [34] tidyselect_1.2.1 digest_0.6.37 purrr_1.0.2
## [37] dplyr_1.1.4 maketools_1.3.1 fastmap_1.2.0
## [40] grid_4.4.2 cli_3.6.3 SparseArray_1.7.2
## [43] magrittr_2.0.3 S4Arrays_1.7.1 utf8_1.2.4
## [46] ape_5.8 UCSC.utils_1.3.0 rmarkdown_2.29
## [49] httr_1.4.7 evaluate_1.0.1 knitr_1.49
## [52] rlang_1.1.4 Rcpp_1.0.13-1 tidytree_0.4.6
## [55] glue_1.8.0 BiocManager_1.30.25 jsonlite_1.8.9
## [58] R6_2.5.1 fs_1.6.5 zlibbioc_1.52.0