CATALYST
Most of the pipeline and visualizations presented herein have been adapted from Chevrier et al. (2018)’s “Compensation of Signal Spillover in Suspension and Imaging Mass Cytometry” available here.
# load required packages
library(CATALYST)
library(cowplot)
library(flowCore)
library(ggplot2)
library(SingleCellExperiment)
raw_data
is a flowSet
with 2 experiments, each
containing 2’500 raw measurements with a variation of signal over time.
Samples were mixed with DVS beads captured by mass channels 140, 151,
153, 165 and 175.sample_ff
which follows a 6-choose-3 barcoding
scheme where mass channels 102, 104, 105, 106, 108, and 110 were used
for labeling such that each of the 20 individual barcodes are positive
for exactly 3 out of the 6 barcode channels. Accompanying this,
sample_key
contains a binary code of length 6 for each
sample, e.g. 111000, as its unique identifier.mp_cells
, the
package contains 36 single-antibody stained controls in
ss_exp
where beads were stained with antibodies captured by
mass channels 139, 141 through 156, and 158 through 176, respectively,
and pooled together. Note that, to decrease running time, we downsampled
to a total of 10’000 events. Lastly, isotope_list
contains
a named list of isotopic compositions for all elements within 75 through
209 u corresponding to the CyTOF mass range at the time of writing (Coursey et al. 2015).Data used and returned throughout preprocessing are organized into an
object of the SingleCellExperiment
(SCE) class. A SCE can be constructed from a directory housing a single
or set of FCS files, a character vector of the file(s),
flowFrame
(s) or a flowSet
(from the flowCore
package) using CATALYST
’s prepData
function.
prepData
will automatically identify channels not
corresponding to masses (e.g., event times), remove them from the output
SCE’s assay data, and store them as internal event metadata
(int_colData
).
When multiple files or frames are supplied, prepData
will concatenate the data into a single object, and argument
by_time
(default TRUE
) specifies whether runs
should be ordered by their acquisition time
(keyword(x, "$BTIM")
, where x
is a
flowFrame
or flowSet
). A
"sample_id"
column will be added to the output SCE’s
colData
to track which file/frame events originally source
from.
Finally, when transform
(default TRUE
), an
arcsinh-transformation with cofactor cofactor
(defaults to
5) is applied to the input (count) data, and the resulting expression
matrix is stored in the "exprs"
assay slot of the output
SCE.
## class: SingleCellExperiment
## dim: 61 5000
## metadata(2): experiment_info chs_by_fcs
## assays(2): counts exprs
## rownames(61): BC1 Vol1 ... Pb208Di BC9
## rowData names(4): channel_name marker_name marker_class use_channel
## colnames: NULL
## colData names(1): sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
##
## raw_data_1.fcs raw_data_2.fcs
## 2500 2500
## [1] "reducedDims" "altExps" "colPairs" "Time" "Event_length"
## [6] "Center" "Offset" "Width" "Residual"
CATALYST provides an implementation of bead-based normalization as described by Finck et al. (Finck et al. 2013). Here, identification of bead-singlets (used for normalization), as well as of bead-bead and cell-bead doublets (to be removed) is automated as follows:
normCytof
: Normalization using bead standardsSince bead gating is automated here, normalization comes down to a
single function that takes a SingleCellExperiment
as input
and only requires specification of the beads
to be used for
normalization. Valid options are:
"dvs"
for bead masses 140, 151, 153, 165, 175"beta"
for bead masses 139, 141, 159, 169, 175By default, we apply a median ± 5 mad
rule to remove low- and high-signal events from the bead population used
for estimating normalization factors. The extent to which bead
populations are trimmed can be adjusted via trim
. The
population will become increasingly narrow and bead-bead doublets will
be exluded as the trim
value decreases. Notably, slight
over-trimming will not affect normalization.
It is therefore recommended to choose a trim
value that is
small enough to assure removal of doublets at the cost of a small bead
population to normalize to.
normCytof
will return the following list of SCE(s)…
data
: Input dataset including normalized counts (and
expressions, if transform = TRUE
).
remove_beads = FALSE
, colData
columns
"is_bead"
and "remove"
indicate whether an
event has been marker as a bead or for removal, respectively.beads
: Subset of identified bead events.removed
: Subset of all cells that have been from the
original dataset, including bead events as well as bead-bead and
bead-cell doublets.…and ggplot
-objects:
scatter
: Scatter plot of bead vs. DNA intensities with
indication of applied gates.lines
: Running-median smoothed bead intensities
vs. time before and after normalization.Besides general normalized parameters (beads
specifying
the normalization beads, and running median windown width
k
), normCytof
requires as input to
assays
corresponding to count- and expression-like data
respectively. Here, correction factors are computed on the linear
(count) scale, while automated bead-identification happens on the
transformed (expression) scale.
By default, normCytof
will overwrite the specified
assays
with the normalized data
(overwrite = TRUE
). In order to retain both unnormalized
and normalized data, overwrite
should be set to
FALSE
, in which case normalized counts (and expression,
when transform = TRUE
) will be written to separate assay
normcounts/exprs
, respectively.
# construct SCE
sce <- prepData(raw_data)
# apply normalization; keep raw data
res <- normCytof(sce, beads = "dvs", k = 50,
assays = c("counts", "exprs"), overwrite = FALSE)
# check number & percentage of bead / removed events
n <- ncol(sce); ns <- c(ncol(res$beads), ncol(res$removed))
data.frame(
check.names = FALSE,
"#" = c(ns[1], ns[2]),
"%" = 100*c(ns[1]/n, ns[2]/n),
row.names = c("beads", "removed"))
## # %
## beads 141 2.82
## removed 153 3.06
# extract data excluding beads & doublets,
# and including normalized intensitied
sce <- res$data
assayNames(sce)
## [1] "counts" "exprs" "normcounts" "normexprs"
CATALYST provides an implementation of the single-cell deconvolution algorithm described by Zunder et al. (Zunder et al. 2015). The package contains three functions for debarcoding and three visualizations that guide selection of thresholds and give a sense of barcode assignment quality.
In summary, events are assigned to a sample when i) their positive and negative barcode populations are separated by a distance larger than a threshold value and ii) the combination of their positive barcode channels appears in the barcoding scheme. Depending on the supplied scheme, there are two possible ways of arriving at preliminary event assignments:
All data required for debarcoding are held in objects of the SingleCellExperiment (SCE) class, allowing for the following easy-to-use workflow:
assignPrelim
will return a SCE containing the input
measurement data, barcoding scheme, and preliminary event
assignments.applyCutoffs
. It is
recommended to estimate, and possibly adjust, population-specific
separation cutoffs by running estCutoffs
prior to
this.plotYields
, plotEvents
and
plotMahal
aim to guide selection of devoncolution
parameters and to give a sense of the resulting barcode assignment
quality.assignPrelim
: Assignment of preliminary IDsThe debarcoding process commences by assigning each event a
preliminary barcode ID. assignPrelim
thereby takes either a
binary barcoding scheme or a vector of numeric masses as input, and
accordingly assigns each event the appropirate row name or mass as ID.
FCS files are read into R with read.FCS
of the flowCore
package, and are represented as an object of class
flowFrame
:
## flowFrame object 'anonymous'
## with 20000 cells and 6 observables:
## name desc range minRange maxRange
## 1 (Pd102)Di BC102 9745.80 -0.999912 9745.80
## 2 (Pd104)Di BC104 9687.52 -0.999470 9687.52
## 3 (Pd105)Di BC105 8924.64 -0.998927 8924.64
## 4 (Pd106)Di BC106 8016.67 -0.999782 8016.67
## 5 (Pd108)Di BC108 9043.87 -0.999997 9043.87
## 6 (Pd110)Di BC110 8204.46 -0.999937 8204.46
## 0 keywords are stored in the 'description' slot
The debarcoding scheme should be a binary table with sample IDs as row and numeric barcode masses as column names:
## 102 104 105 106 108 110
## A1 1 1 1 0 0 0
## A2 1 1 0 1 0 0
## A3 1 1 0 0 1 0
## A4 1 1 0 0 0 1
## A5 1 0 1 1 0 0
## B1 1 0 1 0 1 0
Provided with a SingleCellExperiment
and a compatible
barcoding scheme (barcode masses must occur as parameters in the
supplied SCE), assignPrelim
will add the following data to
the input SCE: - assay slot "scaled"
containing normalized
expression values where each population is scaled to the 95%-quantile of
events assigend to the respective population. - colData
columns "bc_id"
and "delta"
containing barcode
IDs and separations between lowest positive and highest negative
intensity (on the normalized scale) - rowData
column
is_bc
specifying, for each channel, whether it has been
specified as a barcode channel
## Debarcoding data...
## o ordering
## o classifying events
## Normalizing...
## Computing deltas...
## class: SingleCellExperiment
## dim: 6 20000
## metadata(3): experiment_info chs_by_fcs bc_key
## assays(3): counts exprs scaled
## rownames(6): BC102 BC104 ... BC108 BC110
## rowData names(5): channel_name marker_name marker_class use_channel
## is_bc
## colnames: NULL
## colData names(3): sample_id bc_id delta
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## [1] "BC102" "BC104" "BC105" "BC106" "BC108" "BC110"
##
## A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5 D1
## 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
## D2 D3 D4 D5
## 1000 1000 1000 1000
estCutoffs
: Estimation of separation cutoffsAs opposed to a single global cutoff, estCutoffs
will
estimate a sample-specific cutoff to deal with barcode population cell
yields that decline in an asynchronous fashion. Thus, the choice of
thresholds for the distance between negative and positive barcode
populations can be i) automated and ii) independent for
each barcode. Nevertheless, reviewing the yield plots (see below),
checking and possibly refining separation cutoffs is advisable.
For the estimation of cutoff parameters we consider yields upon
debarcoding as a function of the applied cutoffs. Commonly, this
function will be characterized by an initial weak decline, where
doublets are excluded, and subsequent rapid decline in yields to zero.
Inbetween, low numbers of counts with intermediate barcode separation
give rise to a plateau. To facilitate robust estimation, we fit a linear
and a three-parameter log-logistic function (Finney 1971) to the yields function with the
LL.3
function of the drc R package
(Ritz et al. 2015) (Figure
@ref(fig:estCutoffs)). As an adequate cutoff estimate, we target a point
that marks the end of the plateau regime and on-set of yield decline to
appropriately balance confidence in barcode assignment and cell
yield.
The goodness of the linear fit relative to the log-logistic fit is weighed as follow: $$w = \frac{\text{RSS}_{log-logistic}}{\text{RSS}_{log-logistic}+\text{RSS}_{linear}}$$
The cutoffs for both functions are defined as:
$$c_{linear} = -\frac{\beta_0}{2\beta_1}$$ $$c_{log-logistic}=\underset{x}{\arg\min}\:\frac{\vert\:f'(x)\:\vert}{f(x)} > 0.1$$
The final cutoff estimate c is defined as the weighted mean between these estimates:
c = (1 − w) ⋅ clog − logistic + w ⋅ clinear
# estimate separation cutoffs
sce <- estCutoffs(sce)
# view separation cutoff estimates
metadata(sce)$sep_cutoffs
## A1 A2 A3 A4 A5 B1 B2 B3
## 0.3811379 0.3262184 0.3225851 0.2879897 0.3423528 0.3583485 0.3113566 0.3242730
## B4 B5 C1 C2 C3 C4 C5 D1
## 0.3200084 0.2955329 0.3909571 0.3462298 0.3288156 0.2532980 0.2397771 0.2469145
## D2 D3 D4 D5
## 0.3221655 0.3319978 0.2804174 0.3311446
plotYields
: Selecting barcode separation cutoffsFor each barcode, plotYields
will show the distribution
of barcode separations and yields upon debarcoding as a function of
separation cutoffs. If available, the currently used separation cutoff
as well as its resulting yield within the population is indicated in the
plot’s main title.
Option which = 0
will render a summary plot of all
barcodes. All yield functions should behave as described above: decline,
stagnation, decline. Convergence to 0 yield at low cutoffs is a strong
indicator that staining in this channel did not work, and excluding the
channel entirely is sensible in this case. It is thus recommended to
always view the all-barcodes yield plot to eliminate
uninformative populations, since small populations may cause
difficulties when computing spill estimates.
applyCutoffs
: Applying deconvolution parametersOnce preliminary assignments have been made,
applyCutoffs
will apply the deconvolution parameters:
Outliers are filtered by a Mahalanobis distance threshold, which takes
into account each population’s covariance, and doublets are removed by
excluding events from a population if the separation between their
positive and negative signals fall below a separation cutoff. Current
thresholds are held in the sep_cutoffs
and
mhl_cutoff
slots of the SCE’s metadata
. By
default, applyCutoffs
will try to access the
metadata
"sep_cutoffs"
slopt of the input SCE,
requiring having run estCutoffs
prior to this, or manually
specifying a vector or separation cutoffs. Alternatively, a numeric
vector of cutoff values or a single, global value may be supplied In
either case, it is highly recommended to thoroughly review the yields
plot (see above), as the choice of separation cutoffs will determine
debarcoding quality and cell yield.
# use global / population-specific separation cutoff(s)
sce2 <- applyCutoffs(sce)
sce3 <- applyCutoffs(sce, sep_cutoffs = 0.35)
# compare yields before and after applying
# global / population-specific cutoffs
c(specific = mean(sce2$bc_id != 0),
global = mean(sce3$bc_id != 0))
## specific global
## 0.68035 0.66970
plotEvents
: Normalized intensitiesNormalized intensities for a barcode can be viewed with
plotEvents
. Here, each event corresponds to the intensities
plotted on a vertical line at a given point along the x-axis. Option
which = 0
will display unassigned events, and the number of
events shown for a given sample may be varied via argument
n
. If which = "all"
, the function will render
an event plot for all IDs (including 0) with events assigned.
plotMahal
: All barcode biaxial plotFunction plotMahal
will plot all inter-barcode
interactions for the population specified with argument
which
. Events are colored by their Mahalanobis distance.
NOTE: For more than 7 barcodes (up to
128 samples) the function will render an error, as this visualization is
infeasible and hardly informative. Using the default Mahalanobis cutoff
value of 30 is recommended in such cases.
CATALYST performs compensation via a two-step approach comprising:
Retrieval of real signal. As in conventional flow cytometry, we can model spillover linearly, with the channel stained for as predictor, and spill-effected channels as response. Thus, the intensity observed in a given channel j are a linear combination of its real signal and contributions of other channels that spill into it. Let sij denote the proportion of channel j signal that is due to channel i, and wj the set of channels that spill into channel j. Then
Ij, observed = Ij, real + ∑i ∈ wjsij
In matrix notation, measurement intensities may be viewed as the convolution of real intensities and a spillover matrix with dimensions number of events times number of measurement parameters:
Iobserved = Ireal ⋅ SM
Therefore, we can estimate the real signal, Ireal , as:
Ireal = Iobserved ⋅ SM−1 = Iobserved ⋅ CM
where SM−1 is termed
compensation matrix (CM). This approach
is implemented in compCytof(..., method = "flow")
and makes
use of flowCore’s
compensate
function.
While mathematically exact, the solution to this equation will yield negative values, and does not account for the fact that real signal would be strictly non-negative counts. A computationally efficient way to adress this is the use of non-negative linear least squares (NNLS):
min { (Iobserved − SM ⋅ Ireal)T ⋅ (Iobserved − SM ⋅ Ireal) } s.t. Ireal ≥ 0
This approach will solve for Ireal
such that the least squares criterion is optimized under the constraint
of non-negativity. To arrive at such a solution we apply the
Lawson-Hanson algorithm (Lawson and Hanson 1974;
Lawson and Hanson 1995) for NNLS implemented in the nnls
R package (method="nnls"
).
Estimation of SM. Because any signal not in
a single stain experiment’s primary channel j results from channel crosstalk,
each spill entry sij can
be approximated by the slope of a linear regression with channel j signal as the response, and
channel i signals as the
predictors, where i ∈ wj.
computeSpillmat()
offers two alternative ways for spillover
estimation, summarized in Figure @ref(fig:methods).
The default
method approximates this slope with the
following single-cell derived estimate: Let i+ denote the set of
cells that are possitive in channel i, and sijc
be the channel i to j spill computed for a cell c that has been assigned to this
population. We approximate sijc
as the ratio between the signal in unstained spillover receiving and
stained spillover emitting channel, Ij and Ii,
respectively. The expected background in these channels, mj−
and mi−,
is computed as the median signal of events that are i) negative in the
channels for which spill is estimated (i and j); ii) not assigned to
potentionally interacting channels; and, iii) not unassigned, and
subtracted from all measurements:
$$s_{ij}^c = \frac{I_j - m_j^{i-}}{I_i - m_i^{i-}}$$
Each entry sij in SM is then computed as the median spillover across all cells c ∈ i+:
sij = med(sijc | c ∈ i+)
In a population-based fashion, as done in conventional flow
cytometry, method = "classic"
calculates sij as
the slope of a line through the medians (or trimmed means) of stained
and unstained populations, mj+
and mi+,
respectively. Background signal is computed as above and substracted,
according to:
$$s_{ij} = \frac{m_j^+-m_j^-}{m_i^+-m_i^-}$$
On the basis of their additive nature, spill values are estimated
independently for every pair of interacting channels.
interactions = "default"
thereby exclusively takes into
account interactions that are sensible from a chemical and physical
point of view:
See Table @ref(tab:isotopes) for the list of mass channels considered
to potentionally contain isotopic contaminatons, along with a heatmap
representation of all interactions considered by the
default
method in Figure @ref(fig:interactions).
Metal | Isotope masses |
---|---|
La | 138, 139 |
Pr | 141 |
Nd | 142, 143, 144, 145, 146, 148, 150 |
Sm | 144, 147, 148, 149, 150, 152, 154 |
Eu | 151, 153 |
Gd | 152, 154, 155, 156, 157, 158, 160 |
Dy | 156, 158, 160, 161, 162, 163, 164 |
Er | 162, 164, 166, 167, 168, 170 |
Tb | 159 |
Ho | 165 |
Yb | 168, 170, 171, 172, 173, 174, 176 |
Tm | 169 |
Lu | 175, 176 |
Alternatively, interactions = "all"
will compute a spill
estimate for all n ⋅ (n − 1) possible
interactions, where n denotes
the number of measurement parameters. Estimates falling below the
threshold specified by th
will be set to zero. Lastly, note
that diagonal entries sii = 1
for all i ∈ 1, ..., n, so that
spill is relative to the total signal measured in a given channel.
computeSpillmat
: Estimation of the spillover
matrixGiven a SCE of single-stained beads (or cells) and a numeric vector
specifying the masses stained for, computeSpillmat
estimates the spillover matrix (SM) as described above; the estimated SM
will be stored in the SCE’s metadata
under
"spillover_matrix"
.
Spill values are affected my the method
chosen for their
estimation, that is "median"
or "mean"
, and,
in the latter case, the specified trim
percentage. The
process of adjusting these options and reviewing the compensated data
may iterative until compensation is satisfactory.
# get single-stained control samples
data(ss_exp)
# specify mass channels stained for & debarcode
bc_ms <- c(139, 141:156, 158:176)
sce <- prepData(ss_exp)
sce <- assignPrelim(sce, bc_ms, verbose = FALSE)
sce <- applyCutoffs(estCutoffs(sce))
# compute & extract spillover matrix
sce <- computeSpillmat(sce)
sm <- metadata(sce)$spillover_matrix
# do some sanity checks
chs <- channels(sce)
ss_chs <- chs[rowData(sce)$is_bc]
all(diag(sm[ss_chs, ss_chs]) == 1)
## [1] TRUE
## [1] TRUE
plotSpillmat
: Spillover matrix heatmapplotSpillmat
provides a visualization of estimated spill
percentages as a heatmap. Channels without a single-antibody stained
control are annotated in grey, and colours are ramped to the highest
spillover value present. Option annotate = TRUE
(the
default) will display spill values inside each bin, and the total amount
of spill caused and received by each channel on the top and to the
right, respectively.
plotSpillmat
will try and access the SM stored in the
input SCE’s "spillover_matrix"
metadata
slot,
requiring having run computeSpillmat
or manually specifying
a matrix of appropriate format.
## Warning: The `guide` argument in `scale_*()` cannot be `FALSE`. This was deprecated in
## ggplot2 3.3.4.
## ℹ Please use "none" instead.
## ℹ The deprecated feature was likely used in the CATALYST package.
## Please report the issue at <https://github.com/HelenaLC/CATALYST/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
compCytof
: Compensation of mass cytometry dataAssuming a linear spillover, compCytof
compensates mass
cytometry based experiments using a provided spillover matrix. If the
spillover matrix (SM) does not contain the same set of columns as the
input experiment, it will be adapted according to the following
rules:
To omit the need to respecify the cofactor
(s) for
transformation, transform = TRUE
will auto-transform the
compensated data. compCytof
will thereby try to reuse the
cofactor
(s) stored under
int_metadata(sce)$cofactor
from the previously applied
transformation; otherwise, the cofactor
argument should be
specified.
If overwrite = TRUE
(the default),
compCytof
will overwrite the specified counts
assay
(and exprs
, when
transform = TRUE
) with the compensated data. Otherwise,
compensated count (and expression) data will be stored in separate
assays compcounts/exprs
, respectively.
# construct SCE of multiplexed cells
data(mp_cells)
sce <- prepData(mp_cells)
# compensate using NNLS-method; keep uncompensated data
sce <- compCytof(sce, sm, method = "nnls", overwrite = FALSE)
# visualize data before & after compensation
chs <- c("Er167Di", "Er168Di")
as <- c("exprs", "compexprs")
ps <- lapply(as, function(a)
plotScatter(sce, chs, assay = a))
plot_grid(plotlist = ps, nrow = 1)
## Warning: The dot-dot notation (`..ncount..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(ncount)` instead.
## ℹ The deprecated feature was likely used in the CATALYST package.
## Please report the issue at <https://github.com/HelenaLC/CATALYST/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
plotScatter
provides a flexible way of visualizing
expression data as biscatters, and supports automated facetting (should
more than 2 channels be visualized). Cells may be colored by density
(default color_by = NULL
) or other (non-)continous
variables. When coloring by density, plotScatter
will use
geom_hex
to bin cells into the number of specified
bins
; otherwise cells will be plotted as points. The
following code chunks shall illustrate these different
functionalities:
While the SingleCellExperiment
class provides many
advantages in terms of compactness, interactability and robustness, it
can be desirous to write out intermediate files at each preprocessing
stage, or to use other packages currently build around
flowCore
infrastructure (flowFrame
and
flowSet
classes), or classes derived thereof (e.g., flowWorkspace’s
GatingSet
). This section demonstrates how to safely convert
between these data structures.
Conversion from SCE to flowFrame
s/flowSet
,
which in turn can be written to FCS files using flowCore’s
write.FCS
function, is not straightforward. It is not
recommended to directly write FCS via
write.FCS(flowFrame(t(assay(sce))))
, as this can lead to
invalid FCS files or the data being shown on an inappropriate scale in
e.g. Cytobank. Instead, CATALYST
provides the
sce2fcs
function to facilitate correct back-conversion.
sce2fcs
allows specification of a variable to split the
SCE by (argument split_by
), e.g., to split the data by
sample after debarcoding; whether to keep or drop any cell metadata
(argument keep_cd
) and dimension reductions (argument
keep_dr
) available within the object; and which assay data
to use (argument assay
)1:
# run debarcoding
sce <- prepData(sample_ff)
sce <- assignPrelim(sce, sample_key)
sce <- applyCutoffs(estCutoffs(sce))
# exclude unassigned events
sce <- sce[, sce$bc_id != 0]
# convert to 'flowSet' with one frame per sample
(fs <- sce2fcs(sce, split_by = "bc_id"))
## A flowSet with 20 experiments.
##
## column names(6): (Pd102)Di (Pd104)Di ... (Pd108)Di (Pd110)Di
# split check: number of cells per barcode ID
# equals number of cells in each 'flowFrame'
all(c(fsApply(fs, nrow)) == table(sce$bc_id))
## [1] TRUE
Having converted out SCE to a flowSet
, we can write out
each of its flowFrame
s to an FCS file with a meaningful
filename that retains the sample of origin:
Besides writing out FCS files, conversion to
flowFrame
s/flowSet
also enables leveraging the
existing infrastructure for these classes such as ggcyto for
visualization and openCyto
for gating:
# load required packages
library(ggcyto)
library(openCyto)
library(flowWorkspace)
# construct 'GatingSet'
sce <- prepData(raw_data)
ff <- sce2fcs(sce, assay = "exprs")
gs <- GatingSet(flowSet(ff))
# specify DNA channels
dna_chs <- c("Ir191Di", "Ir193Di")
# apply elliptical gate
gs_add_gating_method(
gs, alias = "cells",
pop = "+", parent = "root",
dims = paste(dna_chs, collapse = ","),
gating_method = "flowClust.2d",
gating_args = "K=1,q=0.9")
# plot scatter of DNA channels including elliptical gate
ggcyto(gs,
aes_string(dna_chs[1], dna_chs[2])) +
geom_hex(bins = 128) +
geom_gate(data = "cells") +
facet_null() + ggtitle(NULL) +
theme(aspect.ratio = 1,
panel.grid.minor = element_blank())
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] openCyto_2.19.0 ggcyto_1.35.0
## [3] flowWorkspace_4.19.0 ncdfFlow_2.53.0
## [5] BH_1.84.0-0 scater_1.35.0
## [7] ggplot2_3.5.1 scuttle_1.17.0
## [9] diffcyt_1.27.0 flowCore_2.19.0
## [11] cowplot_1.1.3 CATALYST_1.31.2
## [13] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0
## [15] Biobase_2.67.0 GenomicRanges_1.59.1
## [17] GenomeInfoDb_1.43.2 IRanges_2.41.1
## [19] S4Vectors_0.45.2 BiocGenerics_0.53.3
## [21] generics_0.1.3 MatrixGenerics_1.19.0
## [23] matrixStats_1.4.1 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] splines_4.4.2 tibble_3.2.1
## [3] polyclip_1.10-7 graph_1.85.0
## [5] XML_3.99-0.17 lifecycle_1.0.4
## [7] rstatix_0.7.2 edgeR_4.5.0
## [9] doParallel_1.0.17 lattice_0.22-6
## [11] MASS_7.3-61 backports_1.5.0
## [13] magrittr_2.0.3 limma_3.63.2
## [15] sass_0.4.9 rmarkdown_2.29
## [17] jquerylib_0.1.4 yaml_2.3.10
## [19] plotrix_3.8-4 buildtools_1.0.0
## [21] minqa_1.2.8 RColorBrewer_1.1-3
## [23] ConsensusClusterPlus_1.71.0 multcomp_1.4-26
## [25] abind_1.4-8 zlibbioc_1.52.0
## [27] Rtsne_0.17 purrr_1.0.2
## [29] TH.data_1.1-2 tweenr_2.0.3
## [31] sandwich_3.1-1 circlize_0.4.16
## [33] GenomeInfoDbData_1.2.13 ggrepel_0.9.6
## [35] irlba_2.3.5.1 maketools_1.3.1
## [37] codetools_0.2-20 DelayedArray_0.33.2
## [39] ggforce_0.4.2 tidyselect_1.2.1
## [41] shape_1.4.6.1 UCSC.utils_1.3.0
## [43] farver_2.1.2 lme4_1.1-35.5
## [45] ScaledMatrix_1.15.0 viridis_0.6.5
## [47] jsonlite_1.8.9 GetoptLong_1.0.5
## [49] BiocNeighbors_2.1.1 Formula_1.2-5
## [51] ggridges_0.5.6 survival_3.7-0
## [53] iterators_1.0.14 foreach_1.5.2
## [55] tools_4.4.2 ggnewscale_0.5.0
## [57] Rcpp_1.0.13-1 glue_1.8.0
## [59] gridExtra_2.3 SparseArray_1.7.2
## [61] xfun_0.49 dplyr_1.1.4
## [63] withr_3.0.2 BiocManager_1.30.25
## [65] fastmap_1.2.0 boot_1.3-31
## [67] fansi_1.0.6 digest_0.6.37
## [69] rsvd_1.0.5 R6_2.5.1
## [71] colorspace_2.1-1 Cairo_1.6-2
## [73] gtools_3.9.5 utf8_1.2.4
## [75] tidyr_1.3.1 hexbin_1.28.5
## [77] data.table_1.16.2 FNN_1.1.4.1
## [79] httr_1.4.7 S4Arrays_1.7.1
## [81] uwot_0.2.2 pkgconfig_2.0.3
## [83] gtable_0.3.6 ComplexHeatmap_2.23.0
## [85] RProtoBufLib_2.19.0 XVector_0.47.0
## [87] sys_3.4.3 htmltools_0.5.8.1
## [89] carData_3.0-5 RBGL_1.83.0
## [91] clue_0.3-66 scales_1.3.0
## [93] png_0.1-8 colorRamps_2.3.4
## [95] knitr_1.49 reshape2_1.4.4
## [97] rjson_0.2.23 nlme_3.1-166
## [99] nloptr_2.1.1 cachem_1.1.0
## [101] zoo_1.8-12 GlobalOptions_0.1.2
## [103] stringr_1.5.1 parallel_4.4.2
## [105] vipor_0.4.7 pillar_1.9.0
## [107] grid_4.4.2 vctrs_0.6.5
## [109] ggpubr_0.6.0 car_3.1-3
## [111] BiocSingular_1.23.0 cytolib_2.19.0
## [113] beachmat_2.23.2 cluster_2.1.6
## [115] beeswarm_0.4.0 Rgraphviz_2.51.0
## [117] evaluate_1.0.1 mvtnorm_1.3-2
## [119] cli_3.6.3 locfit_1.5-9.10
## [121] compiler_4.4.2 rlang_1.1.4
## [123] crayon_1.5.3 ggsignif_0.6.4
## [125] labeling_0.4.3 FlowSOM_2.15.0
## [127] plyr_1.8.9 ggbeeswarm_0.7.2
## [129] stringi_1.8.4 viridisLite_0.4.2
## [131] BiocParallel_1.41.0 nnls_1.6
## [133] munsell_0.5.1 Matrix_1.7-1
## [135] statmod_1.5.0 drc_3.0-1
## [137] igraph_2.1.1 broom_1.0.7
## [139] bslib_0.8.0 flowClust_3.45.0
Only count-like data should be written to FCS files and is guaranteed to show with appropriate scale in Cytobank!↩︎