Mass cytometry makes it possible to count a large number of proteins
simultaneously on individual cells (Bandura et
al. 2009; Bendall et al. 2011). Mass cytometry has less
spillover— measurements from one channel overlap less with those of
another—than flow cytometry (Bagwell and Adams
1993; Novo, Grégori, and Rajwa 2013), but spillover is still a
problem and affects downstream analyses such as differential testing
(Weber et al. 2019; Seiler et al. 2021) or
dimensionality reduction (McCarthy et al.
2017). Reducing spillover by careful design of experiment is
possible (Takahashi et al. 2017), but a
purely experimental approach may not be sufficient nor efficient (Lun et al. 2017). Chevrier et al. (2018) propose a method for
addressing spillover by conducting an experiment on beads. This
experiment measures spillover by staining each bead with a single
antibody. Their solution relies on an estimate for the spillover matrix
using non-negative matrix factorization. The spillover matrix encodes
the pairwise spillover proportion between channels. We avoid this step
and directly describe the spillover channels and the channel with the
true signal using a mixture of nonparametric distributions. Our main new
assumption is that the spillover distribution—not just the spillover
proportion—from the beads experiment carries over to the biological
experiment. Here, we illustrate the spillR
R
package for spillover compensation in mass cytometry.
Our motivation to submit to Bioconductor is to make our package available to a large user base and to ensure its compatibility with other packages that address preprocessing and analysis of mass cytometry data.
Install this package.
We test our method on one of the example datasets in the
CATALYST
package. The dataset has an experiment with real
cells and a corresponding bead experiment. The experiments on real cells
has 5,000 peripheral blood mononuclear cells from healthy donors
measured on 39 channels. The experiment on beads has 10,000 cells
measured on 36 channels. They have single stained bead experiments. The
number of beads per mental label range from 112 to 241.
We compare the two methods on the same marker as in the original
CATALYST
paper (Chevrier et al.
2018) in their Figure 3B. In the original experiment, they
conjugated three proteins—CD3, CD8, and HLA-DR—with two different metal
labels. Here is the CATALYST
code to load the data into a
Single Cell Experiment (SCE).
library(spillR)
library(CATALYST)
library(dplyr)
library(ggplot2)
library(cowplot)
bc_key <- c(139, 141:156, 158:176)
sce_bead <- prepData(ss_exp)
sce_bead <- assignPrelim(sce_bead, bc_key, verbose = FALSE)
sce_bead <- applyCutoffs(estCutoffs(sce_bead))
sce_bead <- computeSpillmat(sce_bead)
# --------- experiment with real cells ---------
data(mp_cells, package = "CATALYST")
sce <- prepData(mp_cells)
The function compCytof
takes as inputs two Single Cell
Experiment (SCE) objects. One contains the real cells experiment and the
other the beads experiment. It also requires a table
marker_to_barc
that maps the channels to their barcodes
used in the beads experiment. The output is the same SCE for the real
experiments with the addition of the compensated counts and the
asinh
transformed compensated counts.
# --------- table for mapping markers and barcode ---------
marker_to_barc <-
rowData(sce_bead)[, c("channel_name", "is_bc")] |>
as_tibble() |>
dplyr::filter(is_bc == TRUE) |>
mutate(barcode = bc_key) |>
select(marker = channel_name, barcode)
# --------- compensate function from spillR package ---------
sce_spillr <-
spillR::compCytof(sce, sce_bead, marker_to_barc, impute_value = NA,
overwrite = FALSE)
# --------- 2d histogram from CATALYST package -------
as <- c("counts", "exprs", "compcounts", "compexprs")
chs <- c("Yb171Di", "Yb173Di")
ps <- lapply(as, function(a) plotScatter(sce_spillr, chs, assay = a))
plot_grid(plotlist = ps, nrow = 2)
spillR
offers the possibility to visualize the
compensation results and the internal spillover estimates. The function
plotDiagnostics
presents two plots: the frequency polygons
before and after spillover compensation, and the density plot of
spillover markers with our estimation of the spillover probability
function as a black dashed curve. This plot allows us to check the
compensation performed by our method. If the black curve captures all
the spillover makers, then that indicates a reliable spillover
estimation. If the target marker in the beads experiment overlaps with
the real cells, then that indicates a high-quality bead experiment.
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] cowplot_1.1.3 ggplot2_3.5.1
## [3] dplyr_1.1.4 spillR_1.3.0
## [5] CATALYST_1.31.0 SingleCellExperiment_1.28.0
## [7] SummarizedExperiment_1.36.0 Biobase_2.67.0
## [9] GenomicRanges_1.59.0 GenomeInfoDb_1.43.0
## [11] IRanges_2.41.0 S4Vectors_0.44.0
## [13] BiocGenerics_0.53.1 generics_0.1.3
## [15] MatrixGenerics_1.19.0 matrixStats_1.4.1
## [17] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 sys_3.4.3
## [3] jsonlite_1.8.9 shape_1.4.6.1
## [5] magrittr_2.0.3 spatstat.utils_3.1-0
## [7] ggbeeswarm_0.7.2 TH.data_1.1-2
## [9] farver_2.1.2 rmarkdown_2.28
## [11] GlobalOptions_0.1.2 zlibbioc_1.52.0
## [13] vctrs_0.6.5 rstatix_0.7.2
## [15] htmltools_0.5.8.1 S4Arrays_1.6.0
## [17] plotrix_3.8-4 BiocNeighbors_2.1.0
## [19] broom_1.0.7 SparseArray_1.6.0
## [21] Formula_1.2-5 sass_0.4.9
## [23] bslib_0.8.0 plyr_1.8.9
## [25] sandwich_3.1-1 zoo_1.8-12
## [27] cachem_1.1.0 buildtools_1.0.0
## [29] igraph_2.1.1 lifecycle_1.0.4
## [31] iterators_1.0.14 pkgconfig_2.0.3
## [33] rsvd_1.0.5 Matrix_1.7-1
## [35] R6_2.5.1 fastmap_1.2.0
## [37] GenomeInfoDbData_1.2.13 clue_0.3-65
## [39] digest_0.6.37 colorspace_2.1-1
## [41] ggnewscale_0.5.0 scater_1.34.0
## [43] irlba_2.3.5.1 ggpubr_0.6.0
## [45] beachmat_2.23.0 labeling_0.4.3
## [47] cytolib_2.19.0 fansi_1.0.6
## [49] colorRamps_2.3.4 nnls_1.6
## [51] httr_1.4.7 polyclip_1.10-7
## [53] abind_1.4-8 compiler_4.4.1
## [55] withr_3.0.2 doParallel_1.0.17
## [57] ConsensusClusterPlus_1.71.0 backports_1.5.0
## [59] BiocParallel_1.41.0 viridis_0.6.5
## [61] carData_3.0-5 highr_0.11
## [63] hexbin_1.28.4 ggforce_0.4.2
## [65] ggsignif_0.6.4 MASS_7.3-61
## [67] drc_3.0-1 DelayedArray_0.33.1
## [69] rjson_0.2.23 FlowSOM_2.15.0
## [71] gtools_3.9.5 tools_4.4.1
## [73] vipor_0.4.7 beeswarm_0.4.0
## [75] glue_1.8.0 grid_4.4.1
## [77] Rtsne_0.17 cluster_2.1.6
## [79] reshape2_1.4.4 gtable_0.3.6
## [81] tidyr_1.3.1 data.table_1.16.2
## [83] ScaledMatrix_1.14.0 BiocSingular_1.23.0
## [85] car_3.1-3 utf8_1.2.4
## [87] XVector_0.46.0 ggrepel_0.9.6
## [89] foreach_1.5.2 pillar_1.9.0
## [91] stringr_1.5.1 circlize_0.4.16
## [93] splines_4.4.1 flowCore_2.19.0
## [95] tweenr_2.0.3 lattice_0.22-6
## [97] survival_3.7-0 RProtoBufLib_2.18.0
## [99] tidyselect_1.2.1 ComplexHeatmap_2.23.0
## [101] maketools_1.3.1 scuttle_1.16.0
## [103] knitr_1.48 gridExtra_2.3
## [105] xfun_0.48 stringi_1.8.4
## [107] UCSC.utils_1.2.0 yaml_2.3.10
## [109] evaluate_1.0.1 codetools_0.2-20
## [111] tibble_3.2.1 BiocManager_1.30.25
## [113] cli_3.6.3 munsell_0.5.1
## [115] jquerylib_0.1.4 Rcpp_1.0.13
## [117] png_0.1-8 spatstat.univar_3.0-1
## [119] XML_3.99-0.17 parallel_4.4.1
## [121] viridisLite_0.4.2 mvtnorm_1.3-1
## [123] scales_1.3.0 ggridges_0.5.6
## [125] purrr_1.0.2 crayon_1.5.3
## [127] GetoptLong_1.0.5 rlang_1.1.4
## [129] multcomp_1.4-26