In this vignette, we provide an overview of the basic functionality
and usage of the scds
package, which interfaces with
SingleCellExperiment
objects.
Install the scds
package using Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("scds", version = "3.9")
Or from github:
scds
takes as input a SingleCellExperiment
object (see here SingleCellExperiment),
where raw counts are stored in a counts
assay,
i.e. assay(sce,"counts")
. An example dataset created by
sub-sampling the cell-hashing cell-lines data set (see https://satijalab.org/seurat/hashing_vignette.html) is
included with the package and accessible via
data("sce")
.Note that scds
is designed to
workd with larger datasets, but for the purposes of this vignette, we
work with a smaller example dataset. We apply scds
to this
data and compare/visualize reasults:
Get example data set provided with the package.
library(scds)
library(scater)
library(rsvd)
library(Rtsne)
library(cowplot)
set.seed(30519)
data("sce_chcl")
sce = sce_chcl #- less typing
dim(sce)
## [1] 2000 2000
We see it contains 2,000 genes and 2,000 cells, 216 of which are identified as doublets:
##
## Doublet Negative Singlet
## 216 83 1701
We can visualize cells/doublets after projecting into two dimensions:
We now run the scds
doublet annotation approaches.
Briefly, we identify doublets in two complementary ways:
cxds
is based on co-expression of gene pairs and works with
absence/presence calls only, while bcds
uses the full count
information and a binary classification approach using artificially
generated doublets. cxds_bcds_hybrid
combines both
approaches, for more details please consult (this manuscript). Each of the
three methods returns a doublet score, with higher scores indicating
more “doublet-like” barcodes.
#- Annotate doublet using co-expression based doublet scoring:
sce = cxds(sce,retRes = TRUE)
sce = bcds(sce,retRes = TRUE,verb=TRUE)
sce = cxds_bcds_hybrid(sce)
par(mfcol=c(1,3))
boxplot(sce$cxds_score ~ sce$doublet_true_labels, main="cxds")
boxplot(sce$bcds_score ~ sce$doublet_true_labels, main="bcds")
boxplot(sce$hybrid_score ~ sce$doublet_true_labels, main="hybrid")
For cxds
we can identify and visualize gene pairs
driving doublet annoataions, with the expectation that the two genes in
a pair might mark different types of cells (see manuscript). In the
following we look at the top three pairs, each gene pair is a row in the
plot below:
scds =
top3 = metadata(sce)$cxds$topPairs[1:3,]
rs = rownames(sce)
hb = rowData(sce)$cxds_hvg_bool
ho = rowData(sce)$cxds_hvg_ordr[hb]
hgs = rs[ho]
l1 = ggdraw() + draw_text("Pair 1", x = 0.5, y = 0.5)
p1 = plotReducedDim(sce,"tsne",color_by=hgs[top3[1,1]])
p2 = plotReducedDim(sce,"tsne",color_by=hgs[top3[1,2]])
l2 = ggdraw() + draw_text("Pair 2", x = 0.5, y = 0.5)
p3 = plotReducedDim(sce,"tsne",color_by=hgs[top3[2,1]])
p4 = plotReducedDim(sce,"tsne",color_by=hgs[top3[2,2]])
l3 = ggdraw() + draw_text("Pair 3", x = 0.5, y = 0.5)
p5 = plotReducedDim(sce,"tsne",color_by=hgs[top3[3,1]])
p6 = plotReducedDim(sce,"tsne",color_by=hgs[top3[3,2]])
plot_grid(l1,p1,p2,l2,p3,p4,l3,p5,p6,ncol=3, rel_widths = c(1,2,2))
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] cowplot_1.1.3 Rtsne_0.17
## [3] rsvd_1.0.5 scater_1.34.0
## [5] ggplot2_3.5.1 scuttle_1.16.0
## [7] SingleCellExperiment_1.28.0 SummarizedExperiment_1.36.0
## [9] Biobase_2.67.0 GenomicRanges_1.59.0
## [11] GenomeInfoDb_1.43.0 IRanges_2.41.0
## [13] S4Vectors_0.44.0 BiocGenerics_0.53.0
## [15] MatrixGenerics_1.19.0 matrixStats_1.4.1
## [17] scds_1.23.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 viridisLite_0.4.2 farver_2.1.2
## [4] dplyr_1.1.4 vipor_0.4.7 viridis_0.6.5
## [7] fastmap_1.2.0 pROC_1.18.5 digest_0.6.37
## [10] lifecycle_1.0.4 magrittr_2.0.3 compiler_4.4.1
## [13] rlang_1.1.4 sass_0.4.9 tools_4.4.1
## [16] utf8_1.2.4 yaml_2.3.10 data.table_1.16.2
## [19] knitr_1.48 labeling_0.4.3 S4Arrays_1.6.0
## [22] xgboost_1.7.8.1 DelayedArray_0.33.1 plyr_1.8.9
## [25] abind_1.4-8 BiocParallel_1.41.0 withr_3.0.2
## [28] sys_3.4.3 grid_4.4.1 fansi_1.0.6
## [31] beachmat_2.23.0 colorspace_2.1-1 scales_1.3.0
## [34] cli_3.6.3 rmarkdown_2.28 crayon_1.5.3
## [37] generics_0.1.3 httr_1.4.7 ggbeeswarm_0.7.2
## [40] cachem_1.1.0 zlibbioc_1.52.0 parallel_4.4.1
## [43] BiocManager_1.30.25 XVector_0.46.0 vctrs_0.6.5
## [46] Matrix_1.7-1 jsonlite_1.8.9 BiocSingular_1.23.0
## [49] BiocNeighbors_2.1.0 ggrepel_0.9.6 irlba_2.3.5.1
## [52] beeswarm_0.4.0 maketools_1.3.1 jquerylib_0.1.4
## [55] glue_1.8.0 codetools_0.2-20 gtable_0.3.6
## [58] UCSC.utils_1.2.0 ScaledMatrix_1.14.0 munsell_0.5.1
## [61] tibble_3.2.1 pillar_1.9.0 htmltools_0.5.8.1
## [64] GenomeInfoDbData_1.2.13 R6_2.5.1 evaluate_1.0.1
## [67] lattice_0.22-6 highr_0.11 bslib_0.8.0
## [70] Rcpp_1.0.13 gridExtra_2.3 SparseArray_1.6.0
## [73] xfun_0.48 buildtools_1.0.0 pkgconfig_2.0.3