Title: | Cytometry dATa anALYSis Tools |
---|---|
Description: | CATALYST provides tools for preprocessing of and differential discovery in cytometry data such as FACS, CyTOF, and IMC. Preprocessing includes i) normalization using bead standards, ii) single-cell deconvolution, and iii) bead-based compensation. For differential discovery, the package provides a number of convenient functions for data processing (e.g., clustering, dimension reduction), as well as a suite of visualizations for exploratory data analysis and exploration of results from differential abundance (DA) and state (DS) analysis in order to identify differences in composition and expression profiles at the subpopulation-level, respectively. |
Authors: | Helena L. Crowell [aut, cre] , Vito R.T. Zanotelli [aut] , Stéphane Chevrier [aut, dtc] , Mark D. Robinson [aut, fnd] , Bernd Bodenmiller [fnd] |
Maintainer: | Helena L. Crowell |
License: | GPL (>=2) |
Version: | 1.31.2 |
Built: | 2024-12-29 07:57:57 UTC |
Source: | https://github.com/bioc/CATALYST |
This helper function adapts the columns of a provided spillover matrix such that it is compatible with data having the column names provided.
adaptSpillmat( x, out_chs, isotope_list = CATALYST::isotope_list, verbose = TRUE )
x |
a previously calculated spillover matrix. |
out_chs |
the column names that the prepared output spillover matrix should have. Numeric names as well as names of the form MetalMass(Di), e.g. Ir191, Ir191Di or Ir191(Di), will be interpreted as masses with associated metals. |
isotope_list |
named list. Used to validate the input spillover matrix.
Names should be metals; list elements numeric vectors of their isotopes.
See |
verbose |
logical. Should warnings about possibly inaccurate spillover estimates be printed to the console? |
The rules by which the spillover matrix is adapted are explained in compCytof.
An adapted spillover matrix with column and row names according to out_chs.
Helena L Crowell & Vito RT Zanotelli
library(CATALYST)
library(SingleCellExperiment)
# estimate spillover matrix from
# single-stained control samples
data(ss_exp)
sce <- prepData(ss_exp)
bc_ms <- c(139, 141:156, 158:176)
sce <- assignPrelim(sce, bc_ms, verbose = FALSE)
sce <- applyCutoffs(estCutoffs(sce))
sce <- computeSpillmat(sce)
# adapt the spillover matrix to all channels in the data
sm1 <- metadata(sce)$spillover_matrix
sm2 <- adaptSpillmat(sm1, rownames(sce), verbose = FALSE)
all(dim(sm2) == ncol(sm1))
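Two of the adaptation rules are simple enough to sketch in base R. Spill columns that are absent from the data are removed, and channels that are absent from the spillover matrix are added so that they neither receive nor emit spill. The sketch below is not CATALYST's implementation. The helper adapt_spillmat_sketch and the toy matrix are hypothetical, and it omits both mass-based matching and the warnings that adaptSpillmat issues.

```r
# Hypothetical helper (not CATALYST's adaptSpillmat): only the
# "drop" and "pad with no spill" rules are implemented.
adapt_spillmat_sketch <- function(sm, out_chs) {
  # start from an identity matrix: added channels neither emit nor receive spill
  out <- diag(length(out_chs))
  dimnames(out) <- list(out_chs, out_chs)
  # copy spill values for channels present in both sm and out_chs;
  # rows/columns of sm that are not in out_chs are dropped implicitly
  rows <- intersect(rownames(sm), out_chs)
  cols <- intersect(colnames(sm), out_chs)
  out[rows, cols] <- sm[rows, cols, drop = FALSE]
  out
}

chs <- c("Dy162Di", "Dy163Di", "Dy164Di")
sm <- matrix(c(1, 0.02, 0.01,
               0, 1,    0.03,
               0, 0,    1),
             nrow = 3, byrow = TRUE, dimnames = list(chs, chs))
# Dy164Di is dropped, Ir191Di is added without any spill
adapt_spillmat_sketch(sm, c("Dy162Di", "Dy163Di", "Ir191Di"))
```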
Applies separation and Mahalanobis distance cutoffs.
applyCutoffs(x, assay = "exprs", mhl_cutoff = 30, sep_cutoffs = NULL)
x |
|
assay |
character string specifying which assay data to use.
Should be one of |
mhl_cutoff |
numeric Mahalanobis distance threshold above which events
should be unassigned; ignored if |
sep_cutoffs |
non-negative numeric of length one or of same length
as the number of rows in the |
the input SingleCellExperiment x is returned with updated colData columns "bc_id" and "mhl_dist", and an additional int_metadata slot "mhl_cutoff" containing the applied Mahalanobis distance cutoff.
Helena L Crowell
Zunder, E.R. et al. (2015). Palladium-based mass tag cell barcoding with a doublet-filtering scheme and single-cell deconvolution algorithm. Nature Protocols 10, 316-333.
library(CATALYST)
library(SingleCellExperiment)
# construct SCE
data(sample_ff, sample_key)
sce <- prepData(sample_ff)
# assign preliminary barcode IDs
# & estimate separation cutoffs
sce <- assignPrelim(sce, sample_key)
sce <- estCutoffs(sce)
# use estimated population-specific
# vs. global separation cutoff(s)
sce1 <- applyCutoffs(sce)
sce2 <- applyCutoffs(sce, sep_cutoffs = 0.35)
# compare yields after applying cutoff(s)
c(specific = mean(sce1$bc_id != 0), global = mean(sce2$bc_id != 0))
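The two filters that applyCutoffs applies can be sketched in base R. This is a hypothetical helper, not CATALYST's code, and it rests on several assumptions:

- It works on an events x barcode-channel matrix.
- It thresholds the squared distance returned by stats::mahalanobis within each remaining barcode population.
- It marks unassigned events with 0, as bc_id does.

```r
# Hypothetical helper (not CATALYST's applyCutoffs)
apply_cutoffs_sketch <- function(es, bc_id, delta, sep_cutoffs, mhl_cutoff = 30) {
  # es: events x barcode channels; bc_id: preliminary IDs (0 = unassigned)
  # sep_cutoffs: one global value, or one value per barcode ID (named)
  cut <- rep(sep_cutoffs, length.out = length(bc_id))
  if (length(sep_cutoffs) > 1) cut <- unname(sep_cutoffs[as.character(bc_id)])
  ids <- bc_id
  ids[which(delta < cut)] <- 0  # separation filter
  mhl <- rep(NA_real_, nrow(es))
  for (id in setdiff(unique(ids), 0)) {
    i <- which(ids == id)
    if (length(i) <= ncol(es)) next  # too few events to estimate a covariance
    x <- es[i, , drop = FALSE]
    # squared Mahalanobis distance to the population's center
    mhl[i] <- stats::mahalanobis(x, colMeans(x), stats::cov(x))
  }
  ids[which(mhl > mhl_cutoff)] <- 0  # distance filter
  list(bc_id = ids, mhl_dist = mhl)
}

set.seed(1)
es <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
            matrix(rnorm(200, mean = 5), ncol = 2))
bc_id <- rep(c(1, 2), each = 100)
delta <- runif(200)
res <- apply_cutoffs_sketch(es, bc_id, delta, sep_cutoffs = c("1" = 0.2, "2" = 0.5))
table(res$bc_id)
```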
Assigns a preliminary barcode ID to each event.
assignPrelim(x, bc_key, assay = "exprs", verbose = TRUE)
x |
|
bc_key |
the debarcoding scheme. A binary matrix with sample names as row names and numeric masses as column names OR a vector of numeric masses corresponding to barcode channels. When the latter is supplied, 'assignPrelim' will create a scheme of the appropriate format internally. |
assay |
character string specifying which assay to use. |
verbose |
logical. Should extra information on progress be reported? |
a SingleCellExperiment structured as follows:
assays
counts - raw counts
exprs - arcsinh-transformed counts
scaled - population-wise scaled expression using 95%-quantiles as boundaries
colData
bc_id - numeric vector of barcode assignments
delta - separation between positive and negative barcode populations
metadata
bc_key - the input debarcoding scheme
Helena L Crowell
Zunder, E.R. et al. (2015). Palladium-based mass tag cell barcoding with a doublet-filtering scheme and single-cell deconvolution algorithm. Nature Protocols 10, 316-333.
library(CATALYST)
data(sample_ff, sample_key)
sce <- prepData(sample_ff)
sce <- assignPrelim(sce, sample_key)
table(sce$bc_id)
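Consider a doublet-filtering scheme in which every barcode has the same number k of positive channels (e.g., 6-choose-3). Preliminary assignment can then be sketched in three steps:

1. Rank each event's barcode intensities.
2. Call the top k channels positive.
3. Record the gap between the k-th and (k+1)-th intensity as the separation delta.

The helper below is hypothetical and simplified. In particular, it scales each channel globally by its 95% quantile rather than population-wise.

```r
# Hypothetical helper (not CATALYST's assignPrelim)
assign_prelim_sketch <- function(es, bc_key) {
  # es: events x barcode channels (same column order as bc_key)
  # bc_key: binary samples x channels matrix with k ones per row
  bc_key <- as.matrix(bc_key)
  k <- sum(bc_key[1, ])
  # divide each channel by its 95% quantile so channels are comparable
  q <- apply(es, 2, stats::quantile, probs = 0.95)
  sc <- sweep(es, 2, q, "/")
  codes <- apply(bc_key, 1, paste, collapse = "")
  ids <- character(nrow(sc))
  delta <- numeric(nrow(sc))
  for (i in seq_len(nrow(sc))) {
    o <- order(sc[i, ], decreasing = TRUE)
    # gap between the weakest positive and the strongest negative channel
    delta[i] <- sc[i, o[k]] - sc[i, o[k + 1]]
    pattern <- integer(ncol(sc))
    pattern[o[seq_len(k)]] <- 1L
    m <- match(paste(pattern, collapse = ""), codes)
    ids[i] <- if (is.na(m)) "0" else rownames(bc_key)[m]  # "0" = no valid barcode
  }
  list(bc_id = ids, delta = delta)
}

bc_key <- matrix(c(1, 1, 0, 0,
                   0, 0, 1, 1), nrow = 2, byrow = TRUE,
                 dimnames = list(c("A", "B"), c("102", "104", "105", "106")))
es <- rbind(c(10, 9, 1, 0.5),   # positive in 102 & 104
            c(1, 0.5, 9, 10),   # positive in 105 & 106
            c(10, 1, 9, 0.5))   # 102 & 105: not a valid barcode
res <- assign_prelim_sketch(es, bc_key)
res$bc_id
```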
Computes centered log-ratios (CLR) on cluster/sample proportions across samples/clusters, and visualizes them in a lower-dimensional space, highlighting differences in composition between samples/clusters.
clrDR( x, dr = c("PCA", "MDS", "UMAP", "TSNE", "DiffusionMap"), by = c("sample_id", "cluster_id"), k = "meta20", dims = c(1, 2), base = 2, arrows = TRUE, point_col = switch(by, sample_id = "condition", "cluster_id"), arrow_col = switch(by, sample_id = "cluster_id", "condition"), arrow_len = 0.5, arrow_opa = 0.5, label_by = NULL, size_by = TRUE, point_pal = NULL, arrow_pal = NULL )
x |
|
dr |
character string specifying which dimension reduction to use. |
by |
character string specifying across which IDs to compute CLRs
|
k |
character string specifying which clustering to use;
valid values are |
dims |
two numeric scalars indicating which dimensions to plot. |
base |
integer scalar specifying the logarithm base to use. |
arrows |
logical specifying whether to include arrows for PC loadings. |
point_col , arrow_col
|
character string specifying a non-numeric
cell metadata column to color points and PC loading arrows by;
valid values are |
arrow_len |
non-zero single numeric specifying the length of loading vectors relative to the largest xy-coordinate in the embedded space; NULL for no re-sizing (see details). |
arrow_opa |
single numeric in [0,1] specifying the opacity (alpha) of PC loading arrows when they are grouped; 0 will hide individual arrows. |
label_by |
character string specifying a non-numeric sample metadata
variable to label points by; valid values are |
size_by |
logical specifying whether to scale point sizes by the number
of cells in a given sample/cluster (for |
point_pal , arrow_pal
|
character string of colors to use
for points and PC loading arrows. Arguments default to
|
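Let $s \in \{1, \ldots, S\}$ denote one of $S$ samples, $k \in \{1, \ldots, K\}$ one of $K$ clusters, and $p(s,k)$ the proportion of cells from sample $s$ in cluster $k$. The centered log-ratio (CLR) is defined as

$$\mathrm{clr}(s) = \left[\log_b p(s,k) - \frac{1}{K}\sum_{k'=1}^{K}\log_b p(s,k')\right]_{k = 1, \ldots, K},$$

where $b$ is the logarithm base (argument base). The definition for clusters is analogous, replacing $s$ by $k$ and $K$ by $S$. Thus, each sample/cluster gives a vector of length $K$/$S$ and mean 0. The CLRs computed across all instances can be represented as a matrix with dimensions $S \times K$ (or $K \times S$ for clusters) that we embed into a lower-dimensional space.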
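The CLR computation can be sketched in base R. The sketch assumes a samples x clusters matrix of cell counts with no zero entries, since zeros would yield -Inf and how clrDR handles them is not covered here. clr_sketch is a hypothetical helper, and prcomp stands in for the default dr = "PCA".

```r
# Hypothetical helper: CLRs of each row's proportions (rows = samples)
clr_sketch <- function(counts, base = 2) {
  p <- counts / rowSums(counts)   # p(s, k): proportions within each sample
  lp <- log(p, base = base)       # zero counts would give -Inf; not handled here
  lp - rowMeans(lp)               # center each row so that it has mean 0
}

counts <- matrix(c(50, 30, 20,
                   10, 60, 30,
                   40, 40, 20,
                   25, 25, 50), nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("s", 1:4), paste0("k", 1:3)))
clr <- clr_sketch(counts)         # S x K matrix (by = "sample_id")
clr_k <- clr_sketch(t(counts))    # K x S matrix, roles of samples/clusters swapped
pca <- prcomp(clr)
# percentage of variance explained per PC
round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
```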
In principle, clrDR allows any dimension reduction to be applied to the CLRs. The default method (dr = "PCA") includes the percentage of variance explained by each principal component (PC) in the axis labels. Note that distances between points in the lower-dimensional space are meaningful only for linear DR methods (PCA and MDS); results obtained from other methods should be interpreted with caution. Thus, the output plot's aspect ratio should be kept as is for PCA and MDS, whereas non-linear DR methods can use aspect.ratio = 1, rendering a square plot.
For dr = "PCA", PC loadings are represented as arrows that may be interpreted as follows. An angle of 0° (180°) between vectors indicates a strong positive (negative) relation between them. Vectors that are orthogonal to one another (90°) are roughly independent.
When a vector points towards a given quadrant, the variability in proportions for the points within this quadrant is largely driven by the corresponding variable. Here, only the relative orientation of vectors to one another and to the PC axes is meaningful. The sign of loadings (i.e., whether an arrow points left or right) can be flipped when re-computing PCs.
When arrow_len is specified, PC loading vectors are re-scaled to improve their visibility. A value of 1 stretches vectors such that the largest loading touches the outermost point. Importantly, absolute arrow lengths are not interpretable, but their relative lengths are.
a ggplot object.
Helena L Crowell
library(CATALYST)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)
# CLR on sample proportions across clusters
# (1st vs. 3rd PC; include sample labels)
clrDR(sce, by = "sample_id", k = "meta12", dims = c(1, 3), label_by = "sample_id")
# CLR on cluster proportions across samples
# (use custom colors for both points & loadings;
# one point color per cluster of the default k = "meta20")
clrDR(sce, by = "cluster_id",
  point_pal = hcl.colors(20, "Spectral"),
  arrow_pal = c("royalblue", "orange"))
FlowSOM clustering & ConsensusClusterPlus metaclustering
cluster first groups cells into xdim x ydim clusters using FlowSOM. It then performs metaclustering with ConsensusClusterPlus into 2 through maxK clusters.
cluster( x, features = "type", xdim = 10, ydim = 10, maxK = 20, verbose = TRUE, seed = 1 )
x |
|
features |
a character vector specifying
which features to use for clustering; valid values are
|
xdim , ydim
|
numeric specifying the grid size of the
self-organizing map; passed to |
maxK |
numeric specifying the maximum number of
clusters to evaluate in the metaclustering; passed to
|
verbose |
logical. Should information on progress be reported? |
seed |
numeric. Sets the random seed for reproducible results
in |
The delta area represents the amount of extra cluster stability gained when clustering into k groups as compared to k-1 groups. High cluster stability can be expected when clustering into the number of groups that best fits the data. The "natural" number of clusters present in the data should thus correspond to the value of k at which stability no longer increases considerably (plateau onset).
a SingleCellExperiment with the following newly added data:
colData
cluster_id: each cell's cluster ID as inferred by FlowSOM. One of 1, ..., xdim x ydim.
rowData
marker_class: added when previously unspecified; "type" when an antigen has been used for clustering, otherwise "state".
used_for_clustering: logical indicating whether an antigen has been used for clustering.
metadata
SOM_codes: a table with dimensions K x (# cell type markers), where K = xdim x ydim. Contains the SOM codes.
cluster_codes: a table with dimensions K x (maxK + 1). Contains the cluster codes for all metaclusterings.
delta_area: a ggplot object (see details).
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
library(CATALYST)
# construct SCE
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
# run clustering
(sce <- cluster(sce))
# view all available clusterings
names(cluster_codes(sce))
# access specific clustering resolution
table(cluster_ids(sce, "meta8"))
# view delta area plot
delta_area(sce)
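The two-stage idea can be illustrated with base-R stand-ins. Cells are first grouped into many fine-grained clusters, and the cluster centers are then metaclustered into 2 through maxK groups. This sketch swaps in kmeans for FlowSOM's self-organizing map and average-linkage hclust for ConsensusClusterPlus, so its results will differ from those of cluster(). The helper name is hypothetical, and no delta area is computed.

```r
# Stand-in sketch (hypothetical helper, not CATALYST's cluster()):
# kmeans replaces FlowSOM, hclust replaces ConsensusClusterPlus.
two_stage_cluster_sketch <- function(es, xdim = 10, ydim = 10, maxK = 20, seed = 1) {
  set.seed(seed)
  # stage 1: xdim * ydim fine-grained clusters whose centers act as "codes"
  km <- stats::kmeans(es, centers = xdim * ydim, iter.max = 50)
  # stage 2: hierarchical clustering of the codes, cut into 2, ..., maxK groups
  hc <- stats::hclust(stats::dist(km$centers), method = "average")
  codes <- sapply(2:maxK, function(k) stats::cutree(hc, k = k))
  colnames(codes) <- paste0("meta", 2:maxK)
  # codes: one row per fine-grained cluster, one column per metaclustering
  list(cluster_id = km$cluster, codes = codes)
}

set.seed(42)
es <- matrix(rnorm(3000), ncol = 3)  # 1000 cells x 3 markers
res <- two_stage_cluster_sketch(es, xdim = 5, ydim = 5, maxK = 8)
# each cell inherits the metacluster of its fine-grained cluster
meta8 <- res$codes[res$cluster_id, "meta8"]
table(meta8)
```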
Compensates a mass spectrometry-based experiment using a provided spillover matrix, assuming linear spillover.
compCytof( x, sm = NULL, method = c("nnls", "flow"), assay = "counts", overwrite = TRUE, transform = TRUE, cofactor = NULL, isotope_list = CATALYST::isotope_list )
x |
a |
sm |
a spillover matrix. |
method |
|
assay |
character string specifying which assay data to use;
should be one of |
overwrite |
logical; should the specified |
transform |
logical; should compensated counts be
arcsinh-transformed with the specified |
cofactor |
numeric cofactor(s) to use for optional
arcsinh-transformation when |
isotope_list |
named list. Used to validate the input spillover matrix.
Names should be metals; list elements numeric vectors of their isotopes.
See |
If the spillover matrix (SM) does not contain the same set of columns as the input experiment, it will be adapted according to the following rules:
columns present in the SM but not in the input data will be removed from it
non-metal columns present in the input but not in the SM will be added such that they neither receive nor cause spill
metal columns that have the same mass as a channel present in the SM will receive (but not emit) spillover according to that channel
if an added channel could potentially receive spillover (because it is at M+/-1 or M+16 of another measured channel, or measures the same metal as one), a warning will be issued, as spillover interactions may have been missed, which may lead to faulty compensation
Compensates the input flowFrame or, if x is a character string, all FCS files in the specified location.
Helena L Crowell & Vito RT Zanotelli
library(CATALYST)
library(SingleCellExperiment)
# deconvolute single-stained control samples
data(ss_exp)
sce <- prepData(ss_exp)
bc_ms <- c(139, 141:156, 158:176)
sce <- assignPrelim(sce, bc_ms)
sce <- applyCutoffs(estCutoffs(sce))
# estimate spillover matrix
sce <- computeSpillmat(sce)
# compensate & store compensated data in separate assays
sce <- compCytof(sce, overwrite = FALSE)
assayNames(sce)
# biaxial scatter before vs. after compensation
chs <- c("Dy162Di", "Dy163Di")
m <- match(chs, channels(sce))
i <- rownames(sce)[m][1]
j <- rownames(sce)[m][2]
par(mfrow = c(1, 2))
for (a in c("exprs", "compexprs")) {
  es <- assay(sce, a)
  plot(es[i, ], es[j, ], cex = 0.2, pch = 19, main = a, xlab = i, ylab = j)
}
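Under the linear spillover assumption, the observed intensities equal the true intensities multiplied by the spillover matrix. Compensation can therefore be sketched as multiplication by the matrix inverse. This hypothetical helper assumes an events x channels matrix and a spillover matrix with emitting channels as rows and receiving channels as columns. It does not reproduce the "nnls" method, and the cofactor of 5 in the transformation is only an example value.

```r
# Hypothetical helper: linear compensation by matrix inversion.
# Model: observed = true %*% sm, hence true = observed %*% solve(sm).
compensate_sketch <- function(counts, sm) {
  # counts: events x channels; sm: rows = emitting, columns = receiving channels
  sm <- sm[colnames(counts), colnames(counts)]  # align channel order
  counts %*% solve(sm)
}

chs <- c("Dy162Di", "Dy163Di")
sm <- matrix(c(1, 0.05,
               0, 1), nrow = 2, byrow = TRUE, dimnames = list(chs, chs))
true <- cbind(Dy162Di = c(100, 0, 50), Dy163Di = c(0, 80, 20))
observed <- true %*% sm  # Dy162Di spills 5% of its signal into Dy163Di
comp <- compensate_sketch(observed, sm)
# optional arcsinh transformation, here with a cofactor of 5
exprs <- asinh(comp / 5)
```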
Computes a spillover matrix from previously identified single-positive populations.
computeSpillmat( x, assay = "counts", interactions = c("default", "all"), method = c("default", "classic"), trim = 0.5, th = 1e-05 )
x |
|
assay |
character string specifying which assay to use; should be one
of |
interactions |
|
method |
|
trim |
numeric. Specifies the trim value used for estimation of spill values.
Note that |
th |
single non-negative numeric. Specifies the threshold value below which spill estimates will be set to 0. |
The default method estimates the spillover as the median ratio between the unstained spillover-receiving channel and the stained spillover-emitting channel in the corresponding single-stained populations.
method = "classic" computes the slope of a line through the medians (or trimmed means) of stained and unstained populations. Background is accounted for by subtracting the medians (or trimmed means) computed from events that are i) negative in the respective channels, ii) not assigned to interacting channels, and iii) not unassigned.
interactions = "default" considers only expected interactions, that is, M+/-1 (detection sensitivity), M+16 (oxide formation), and channels measuring metals that are potentially contaminated by isotopic impurities (see reference below and isotope_list).
interactions = "all" estimates spill for all n x n - n interactions, where n denotes the number of single-color controls (= nrow(bc_key(re))).
Returns a square compensation matrix with dimensions and dimension names matching those of the input flowFrame. Spillover is assumed to be linear, and, on the basis of their additive nature, spillover values are computed independently for each interacting pair of channels.
Helena L Crowell
Coursey, J.S., Schwab, D.J., Tsai, J.J., Dragoset, R.A. (2015). Atomic weights and isotopic compositions, (available at http://physics.nist.gov/Comp).
library(CATALYST)
library(SingleCellExperiment)
# construct SCE from single-stained control samples
data(ss_exp)
sce <- prepData(ss_exp)
# specify mass channels stained for
bc_ms <- c(139, 141:156, 158:176)
# debarcode single-positive populations
sce <- assignPrelim(sce, bc_ms)
sce <- estCutoffs(sce)
sce <- applyCutoffs(sce)
# estimate & extract spillover matrix
sce <- computeSpillmat(sce)
head(metadata(sce)$spillover_matrix)
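The default median-ratio estimate can be sketched in base R. The helper below is hypothetical and makes these assumptions:

- Input is an events x channels count matrix.
- Each event carries a label naming the channel it is single-stained in.
- Spill is estimated for all channel pairs, which is closer to interactions = "all".
- The trimming and background handling described above are omitted.

```r
# Hypothetical helper (not CATALYST's computeSpillmat)
spill_median_ratio_sketch <- function(counts, bc_id, th = 1e-5) {
  chs <- colnames(counts)
  sm <- diag(length(chs))
  dimnames(sm) <- list(chs, chs)
  for (i in intersect(unique(bc_id), chs)) {
    # events single-stained in (i.e., emitting from) channel i
    x <- counts[which(bc_id == i), , drop = FALSE]
    for (j in setdiff(chs, i)) {
      # spill i -> j: median ratio of receiving to emitting intensities
      sm[i, j] <- stats::median(x[, j] / x[, i])
    }
  }
  sm[sm < th] <- 0  # set negligible estimates to 0
  sm
}

set.seed(1)
n <- 200
a <- rlnorm(n, log(500), 0.2)  # events stained in Dy162Di
b <- rlnorm(n, log(800), 0.2)  # events stained in Dy163Di
counts <- rbind(cbind(Dy162Di = a, Dy163Di = 0.03 * a),
                cbind(Dy162Di = 0.01 * b, Dy163Di = b))
bc_id <- rep(c("Dy162Di", "Dy163Di"), each = n)
sm <- spill_median_ratio_sketch(counts, bc_id)
sm
```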
Concatenation & Normalization
raw_data
a flowSet with 3 experiments, each containing 2'500 raw measurements with a variation of signal over time. Samples were mixed with DVS beads captured by mass channels 140, 151, 153, 165 and 175.
Debarcoding
sample_ff
a flowFrame following a 6-choose-3 barcoding scheme where mass channels 102, 104, 105, 106, 108, and 110 were used for labeling, such that each of the 20 individual barcodes is positive for exactly 3 out of the 6 barcode channels.
sample_key
a data.frame of dimension 20 x 6 with sample names as row names and barcode masses as column names. Contains a binary code of length 6 for each sample in sample_ff, e.g. 111000, as its unique identifier.
Compensation
ss_exp
a flowFrame with 20'000 events. Contains 36 single-antibody stained controls where beads were stained with antibodies captured by mass channels 139, 141 through 156, and 158 through 176, respectively, and pooled together.
mp_cells
a flowFrame with 5000 spill-affected multiplexed cells and 39 measurement parameters.
isotope_list
a named list of isotopic compositions for all elements within 75 through 209 u, corresponding to the CyTOF mass range at the time of writing.
Differential Analysis
PBMC_fs
a flowSet with PBMC samples from 6 patients. For each sample, the expression of 10 cell surface and 14 signaling markers was measured before (REF) and upon BCR/FcR-XL stimulation (BCRXL) with B cell receptor/Fc receptor crosslinking for 30 min, resulting in a total of 12 samples.
PBMC_panel
a 2-column data.frame that contains each marker's column name in the FCS file and its targeted protein marker.
PBMC_md
a data.frame where each row corresponds to a sample, with columns describing the experimental design.
merging_table
a 20 x 2 table with "old_cluster" IDs and "new_cluster" labels to exemplify manual cluster merging and cluster annotation.
see descriptions above.
Helena L Crowell
Bodenmiller, B., Zunder, E.R., Finck, R., et al. (2012). Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators. Nature Biotechnology 30(9): 858-67.
Coursey, J.S., Schwab, D.J., Tsai, J.J., Dragoset, R.A. (2015). Atomic weights and isotopic compositions, (available at http://physics.nist.gov/Comp).
library(CATALYST)
### example data for concatenation & normalization:
# raw measurement data
data(raw_data)
### example data for debarcoding:
# 20 barcoded samples
data(sample_ff)
# 6-choose-3 barcoding scheme
data(sample_key)
### example data for compensation:
# single-stained control samples
data(ss_exp)
# multiplexed cells
data(mp_cells)
### example data for differential analysis:
# REF vs. BCRXL samples
data(PBMC_fs)
# antigen panel & experimental design
data(PBMC_panel, PBMC_md)
# exemplary manual merging table
data(merging_table)
For each sample, estimates a cutoff parameter for the distance between positive and negative barcode populations.
estCutoffs(x)
x |
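For the estimation of cutoff parameters, we considered yields upon debarcoding as a function of the applied cutoffs. Commonly, this function is characterized by an initial weak decline, where doublets are excluded, and a subsequent rapid decline in yields to zero. In between, low numbers of counts with intermediate barcode separation give rise to a plateau. As an adequate cutoff estimate, we target the point that approximately marks the end of the plateau regime and the onset of yield decline. To facilitate robust cutoff estimation, we fit a linear and a three-parameter log-logistic function to the yields function:

$$f_{\mathrm{lin}}(x) = \beta_0 + \beta_1 x, \qquad L(x) = \frac{d}{1 + \exp\{b(\log(x) - \log(e))\}}$$

The goodness of the linear fit relative to the log-logistic fit is weighted by:

$$w = \frac{\mathrm{RSS}_{\mathrm{log\text{-}logistic}}}{\mathrm{RSS}_{\mathrm{log\text{-}logistic}} + \mathrm{RSS}_{\mathrm{linear}}}$$

and the cutoffs for both functions are defined as:

$$c_{\mathrm{linear}} = -\frac{\beta_0}{2\beta_1}, \qquad c_{\mathrm{log\text{-}logistic}} = \underset{x}{\arg\min}\left\{\left|\frac{\mathrm{d}L(x)}{\mathrm{d}x}\right| > 0.1\right\}$$

The final cutoff estimate is defined as the weighted mean of these estimates:

$$c = (1 - w)\cdot c_{\mathrm{log\text{-}logistic}} + w\cdot c_{\mathrm{linear}}$$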
the input SingleCellExperiment is returned with an additional metadata slot sep_cutoffs.
Helena L Crowell
Finney, D.J. (1971). Probit Analysis. Journal of Pharmaceutical Sciences 60, 1432.
library(CATALYST)
library(SingleCellExperiment)
# construct SCE
data(sample_ff, sample_key)
sce <- prepData(sample_ff)
# assign preliminary barcode IDs
# & estimate separation cutoffs
sce <- assignPrelim(sce, sample_key)
sce <- estCutoffs(sce)
# access separation cutoff estimates
(seps <- metadata(sce)$sep_cutoffs)
# compute population yields
cs <- split(seq_len(ncol(sce)), sce$bc_id)
sapply(names(cs), function(id) {
  sub <- sce[, cs[[id]]]
  mean(sub$delta > seps[id])
})
# view yield plots including current cutoff
plotYields(sce, which = "A1")
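Parts of the cutoff estimation procedure can be sketched in base R:

- computing one population's yield curve,
- fitting its linear part with lm, and
- combining two cutoff estimates into a weighted mean.

The helpers are hypothetical. The log-logistic fit is not shown, and the sketch assumes that the weight w applies to the linear-fit estimate.

```r
# Hypothetical helpers (not CATALYST's estCutoffs)
# yields of one barcode population as a function of the separation cutoff
yield_curve_sketch <- function(delta, cutoffs) {
  vapply(cutoffs, function(ct) mean(delta > ct), numeric(1))
}
# weighted mean of both estimates; w weights the linear-fit cutoff
combine_cutoffs_sketch <- function(c_loglogistic, c_linear, w) {
  (1 - w) * c_loglogistic + w * c_linear
}

set.seed(1)
# toy separations: a few low-separation events plus a well-separated majority
delta <- c(runif(50, 0, 0.2), runif(450, 0.4, 0.8))
cs <- seq(0, 1, by = 0.05)
yields <- yield_curve_sketch(delta, cs)
# linear fit of the yield curve; its RSS would enter the weight w
fit <- lm(yields ~ cs)
rss_linear <- sum(residuals(fit)^2)
# combining two made-up cutoff estimates with w = 0.25
combine_cutoffs_sketch(c_loglogistic = 0.4, c_linear = 0.2, w = 0.25)
```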
Extracts clusters from a SingleCellExperiment. Populations are either returned as a flowSet or written to FCS files, depending on argument as.
extractClusters( x, k, clusters = NULL, as = c("flowSet", "fcs"), out_dir = ".", verbose = TRUE )
x |
|
k |
numeric or character string.
Specifies the clustering to extract populations from.
Must be one of |
clusters |
a character vector.
Specifies which clusters to extract.
|
as |
|
out_dir |
a character string. Specifies where FCS files should be written to. Defaults to the working directory. |
verbose |
logical. Should information on progress be reported? |
a flowSet or character vector of the output file names.
Mark D Robinson & Helena L Crowell
library(CATALYST)
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md, merging_table)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)
# merge clusters
sce <- mergeClusters(sce, k = "meta20", table = merging_table, id = "merging_1")
extractClusters(sce, k = "merging_1", clusters = c("NK cells", "surface-"))
Filters cells/features from a SingleCellExperiment using conditional statements à la dplyr.
filterSCE(x, ..., k = NULL)
x |
|
... |
conditional statements separated by commas. Only rows/columns where the conditions evaluate to TRUE are kept. |
k |
numeric or character string. Specifies the clustering to extract
populations from. Must be one of |
a SingleCellExperiment.
Helena L Crowell
library(CATALYST)
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md, merging_table)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)
# one condition only, remove a single sample
filterSCE(sce, condition == "Ref", sample_id != "Ref1")
# keep only a subset of clusters
filterSCE(sce, cluster_id %in% c(7, 8, 18), k = "meta20")
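The semantics of the comma-separated conditions can be mimicked on a plain data.frame: each condition is evaluated against the data, and only entries for which all conditions are TRUE are kept. filter_rows_sketch is a hypothetical helper for illustration only. filterSCE itself operates on the SingleCellExperiment's cell and feature metadata.

```r
# Hypothetical helper: AND-combine unevaluated conditions on a data.frame
filter_rows_sketch <- function(df, ...) {
  env <- parent.frame()                       # caller's environment for free variables
  conds <- eval(substitute(alist(...)))       # capture conditions unevaluated
  keep <- Reduce(`&`, lapply(conds, function(e) eval(e, envir = df, enclos = env)))
  df[!is.na(keep) & keep, , drop = FALSE]     # NA counts as FALSE
}

md <- data.frame(sample_id = c("Ref1", "Ref2", "BCRXL1", "BCRXL2"),
                 condition = c("Ref", "Ref", "BCRXL", "BCRXL"),
                 stringsAsFactors = FALSE)
# one condition only, remove a single sample
filter_rows_sketch(md, condition == "Ref", sample_id != "Ref1")
```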
Helper function to parse information from the parameters slot of a flowFrame/flowSet.
guessPanel(x, sep = "_")
x |
a |
sep |
character string specifying how channel descriptions
should be parsed. E.g., if |
a data.frame with the following columns:
name: the parameter name as extracted from the input flowFrame,
desc: the parameter description as extracted from the input flowFrame,
antigen: the targeted protein markers, and
use_channel: logical. If TRUE, the channel is expected to contain a marker and will be kept.
Mark D Robinson & Helena L Crowell
library(CATALYST)
# exemplary data with Time, DNA, BC channels, etc.
data(raw_data)
guessPanel(raw_data[[1]])
mergeClusters provides a simple wrapper to store a manual merging inside the input SingleCellExperiment.
mergeClusters(x, k, table, id, overwrite = FALSE)
x |
|
k |
character string specifying the clustering to merge;
valid values are |
table |
merging table with 2 columns containing the cluster IDs to merge in the 1st, and the cluster IDs to newly assign in the 2nd column. |
id |
character string used as a label for the merging. |
overwrite |
logical specifying whether to force overwriting
should a clustering with name |
In the following code snippets, x is a SingleCellExperiment object.
merging codes are accessible through cluster_codes(x)$id
all functions that ask for specification of a clustering (e.g., plotAbundances, plotMultiHeatmap) take the merging ID as a valid input argument.
a SingleCellExperiment with newly added cluster codes stored in cluster_codes(.)$id.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
library(CATALYST)
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md, merging_table)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)
# merge clusters
sce <- mergeClusters(sce, k = "meta20", id = "merging", table = merging_table)
# tabulate manual merging
table(cluster_ids(sce, k = "merging"))
# visualize median type-marker expression
plotExprHeatmap(sce, features = "type",
  by = "cluster_id", k = "merging", bars = TRUE)
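At its core, a manual merging is a lookup from old to new cluster IDs via the 2-column table. The hypothetical helper below applies such a table to a vector of per-cell cluster IDs. mergeClusters itself stores the result as new cluster codes rather than relabeling cells directly.

```r
# Hypothetical helper: relabel cluster IDs via a 2-column merging table
merge_clusters_sketch <- function(cluster_ids, table) {
  old <- as.character(table[[1]])  # 1st column: cluster IDs to merge
  new <- as.character(table[[2]])  # 2nd column: cluster IDs to newly assign
  factor(new[match(as.character(cluster_ids), old)])
}

merging <- data.frame(old_cluster = 1:4,
                      new_cluster = c("B cells", "B cells", "NK cells", "monocytes"),
                      stringsAsFactors = FALSE)
ids <- c(1, 2, 3, 4, 2, 3)
merged <- merge_clusters_sketch(ids, merging)
table(merged)
```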
An implementation of Finck et al.'s normalization of mass cytometry data using bead standards with automated bead gating.
normCytof( x, beads = c("dvs", "beta"), dna = c(191, 193), k = 500, trim = 5, remove_beads = TRUE, norm_to = NULL, assays = c("counts", "exprs"), overwrite = TRUE, transform = TRUE, cofactor = NULL, plot = TRUE, verbose = TRUE )
x |
|
beads |
|
dna |
numeric vector of masses corresponding to DNA channels (only one is required; output scatter plot (see Value section) will be generated using the first matching channel). |
k |
integer width of the median window used for bead smoothing (affects visualizations only!). |
trim |
a single non-negative numeric.
A median+/- |
remove_beads |
logical. If TRUE, bead events will be removed from
the input |
norm_to |
a |
assays |
length 2 character vector specifying
which assay data to use; both should be in |
overwrite |
logical; should the specified |
transform |
logical; should normalized counts be
arcsinh-transformed with the specified |
cofactor |
numeric cofactor(s) to use for optional
arcsinh-transformation when |
plot |
logical; should bead vs. DNA scatters and smoothed bead intensities before vs. after normalization be included in the output? |
verbose |
logical; should extra information on progress be reported? |
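The optional arcsinh transformation controlled by transform and cofactor is commonly computed as asinh(counts / cofactor). The base-R sketch below assumes that form. It uses a per-channel cofactor, and 5 is a common choice for mass cytometry. The helper arcsinh_transform() is hypothetical and not part of CATALYST.

```r
# Hypothetical helper (not part of CATALYST): arcsinh-transform a
# channels x events count matrix using one cofactor per channel (row).
arcsinh_transform <- function(counts, cofactor = 5) {
  # recycle a single cofactor across all channels
  cofactor <- rep_len(cofactor, nrow(counts))
  # dividing a matrix by a vector of length nrow() divides row-wise
  asinh(counts / cofactor)
}

counts <- matrix(c(0, 5, 50, 0, 10, 100), nrow = 2,
  dimnames = list(c("Ce140", "Ir191"), NULL))
exprs <- arcsinh_transform(counts, cofactor = c(5, 10))
exprs
```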
A list of the following SingleCellExperiments...
data: the input SCE, filtered when remove_beads = TRUE; otherwise, colData columns is_bead and remove indicate whether an event has been identified as a bead or doublet. If overwrite = FALSE, assays normcounts/exprs are added; otherwise, the specified counts/exprs assays are overwritten.
beads, removed: SCEs containing the subsets of events identified as beads and that were removed, respectively. The latter includes bead-cell and cell-cell doublets.
...and ggplot objects:
scatter: scatter plot of DNA vs. bead intensities with indication of the applied gates
lines: running-median smoothed bead intensities before and after normalization
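The running-median smoothing of bead intensities (window width k) can be sketched with stats::runmed(). That function requires an odd window, so the sketch bumps even widths up by one. Turning smoothed intensities into a per-event correction factor is done here with a least-squares slope through the origin of baseline vs. smoothed bead intensities. This is a simplified stand-in assumed for illustration, and Finck et al. and CATALYST may compute the factor differently. Both helpers are hypothetical.

```r
# Simplified sketch, not CATALYST's implementation.
# beads: events x bead-channels matrix (events ordered by acquisition time)
smooth_beads <- function(beads, k = 5) {
  # stats::runmed() requires an odd window width
  if (k %% 2 == 0) k <- k + 1
  apply(beads, 2, stats::runmed, k = k)
}

# Hypothetical: slope of a least-squares fit through the origin of
# baseline ~ smoothed; multiplying raw intensities by it moves them
# toward the baseline bead intensities.
norm_factor <- function(smoothed_row, baseline) {
  sum(smoothed_row * baseline) / sum(smoothed_row^2)
}

beads <- cbind(Ce140 = c(10, 200, 12, 11, 13),
               Eu151 = c(20, 22, 400, 21, 23))
smoothed <- smooth_beads(beads, k = 3)   # single spikes are removed
baseline <- c(Ce140 = 24, Eu151 = 44)
f <- apply(smoothed, 1, norm_factor, baseline = baseline)
```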
Helena L Crowell
Finck, R. et al. (2013). Normalization of mass cytometry data with bead standards. Cytometry A 83A, 483-494.
data(raw_data)
sce <- prepData(raw_data)

# apply normalization & write normalized data to separate assays
res <- normCytof(sce, beads = "dvs", k = 80, overwrite = FALSE)

ncol(res$beads)   # no. of bead events
ncol(res$removed) # no. of events removed

res$scatter # plot DNA vs. bead intensities including applied gates
res$lines   # plot smoothed bead intensities before vs. after normalization

# filtered SCE now additionally includes
# normalized count & expression data
assayNames(res$data)
Pseudobulk-level Multi-Dimensional Scaling (MDS) plot computed on median marker expressions in each sample.
pbMDS( x, by = c("sample_id", "cluster_id", "both"), k = "meta20", dims = c(1, 2), features = NULL, assay = "exprs", fun = c("median", "mean", "sum"), color_by = switch(by, sample_id = "condition", "cluster_id"), label_by = if (by == "sample_id") "sample_id" else NULL, shape_by = NULL, size_by = is.null(shape_by), pal = if (color_by == "cluster_id") .cluster_cols else NULL )
x |
|
by |
character string specifying whether to aggregate
by |
k |
character string specifying which clustering to use when
|
dims |
two numeric scalars indicating which dimensions to plot. |
features |
character string specifying which features to include
for computation of reduced dimensions; valid values are
|
assay |
character string specifying which assay data to use;
valid values are |
fun |
character string specifying which summary statistic to use. |
color_by |
character string specifying a
non-numeric cell metadata column to color by;
valid values are |
label_by |
character string specifying a
non-numeric cell metadata column to label by;
valid values are |
shape_by |
character string specifying a
non-numeric cell metadata column to shape by;
valid values are |
size_by |
logical specifying whether points should be sized by the number of cells that went into aggregation; i.e., the size of a given sample, cluster or cluster-sample instance. |
pal |
character vector of colors to use;
NULL for default |
a ggplot object.
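The pseudobulk idea can be sketched in base R: aggregate each feature by sample (median by default), compute Euclidean distances between the resulting sample profiles, and embed them with classical MDS via stats::cmdscale(). This is an illustration under those assumptions. CATALYST's internal MDS computation may differ. pseudobulk_mds() is a hypothetical helper, and it also returns the per-sample cell counts that size_by refers to.

```r
# Hypothetical helper: pseudobulk aggregation + classical MDS.
# expr: features x cells matrix; sample_id: one label per cell
pseudobulk_mds <- function(expr, sample_id, fun = median, dims = 2) {
  ids <- factor(sample_id)
  # aggregate each feature by sample -> features x samples matrix
  pb <- t(apply(expr, 1, function(v) tapply(v, ids, fun)))
  # Euclidean distances between samples (columns of pb)
  d <- dist(t(pb))
  coords <- cmdscale(d, k = dims)
  # cells per sample, e.g. for sizing points
  data.frame(coords, n_cells = as.vector(table(ids)),
             row.names = levels(ids))
}

set.seed(42)
expr <- matrix(rnorm(4 * 80), nrow = 4,
  dimnames = list(paste0("m", 1:4), NULL))
sid <- rep(paste0("s", 1:4), each = 20)
res <- pseudobulk_mds(expr, sid)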
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)

# sample-level pseudobulks
# including state-markers only
pbMDS(sce, by = "sample_id", features = "state")

# cluster-level pseudobulks
# including type-features only
pbMDS(sce, by = "cluster_id", features = "type")

# pseudobulks by cluster-sample
# including all features
pbMDS(sce, by = "both", k = "meta12",
  shape_by = "condition", size_by = TRUE)
Plots the relative population abundances of the specified clustering.
plotAbundances( x, k = "meta20", by = c("sample_id", "cluster_id"), group_by = "condition", shape_by = NULL, col_clust = TRUE, distance = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), linkage = c("average", "ward.D", "single", "complete", "mcquitty", "median", "centroid", "ward.D2"), k_pal = .cluster_cols )
x |
|
k |
character string specifying which clustering to use;
valid values are |
by |
a character string specifying whether to plot frequencies by samples or clusters. |
group_by |
character string specifying a non-numeric
cell metadata column to group by (determines the color coding);
valid values are |
shape_by |
character string specifying a non-numeric
cell metadata column to shape by; valid values are
|
col_clust |
for |
distance |
character string specifying the distance metric
to use for sample clustering; passed to |
linkage |
character string specifying the agglomeration method
to use for sample clustering; passed to |
k_pal |
character string specifying the cluster
color palette; ignored when |
a ggplot object.
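Relative population abundances are the proportions of each sample's cells that fall into each cluster. The distance and linkage options listed above match the method names accepted by stats::dist() and stats::hclust(). The sketch below assumes samples are clustered on these proportions. The helpers cluster_freqs() and cluster_samples() are hypothetical, not CATALYST functions.

```r
# Hypothetical helpers: cluster proportions per sample, then
# hierarchical clustering of samples with dist() + hclust().
cluster_freqs <- function(cluster_id, sample_id) {
  tab <- table(cluster_id, sample_id)
  prop.table(tab, margin = 2)          # each column (sample) sums to 1
}

cluster_samples <- function(freqs, distance = "euclidean",
                            linkage = "average") {
  d <- dist(t(unclass(freqs)), method = distance)   # sample x sample
  hc <- hclust(d, method = linkage)
  colnames(freqs)[hc$order]            # samples in dendrogram order
}

cluster_id <- c("A", "A", "B", "A", "B", "B", "C", "C", "C", "A")
sample_id  <- c("s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3", "s3", "s3")
freqs <- cluster_freqs(cluster_id, sample_id)
ord <- cluster_samples(freqs)
```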
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)

# plot relative population abundances
# by sample & cluster, respectively
plotAbundances(sce, k = "meta12")
plotAbundances(sce, k = "meta8", by = "cluster_id")

# use custom cluster color palette
plotAbundances(sce, k = "meta10",
  k_pal = c("lightgrey", "cornflowerblue", "navy"))
Plots smoothed densities of marker intensities by cluster.
plotClusterExprs(x, k = "meta20", features = "type")
x |
|
k |
character string specifying which clustering to use;
valid values are |
features |
a character vector specifying
which antigens to include; valid values are
|
a ggplot object.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)
plotClusterExprs(sce, k = "meta8")
Plots tSNE and PCA representations of the SOM codes inferred by FlowSOM. Point sizes are scaled to the total number of events assigned to each cluster, and points are colored according to their cluster ID upon ConsensusClusterPlus metaclustering into k clusters.
plotCodes(x, k = "meta20", k_pal = .cluster_cols)
x |
|
k |
character string. Specifies the clustering to use for color coding. |
k_pal |
character string specifying the cluster color palette;
If less than |
a ggplot object.
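The PCA half of this plot can be sketched with stats::prcomp() on a nodes x features matrix of SOM codes, with point sizes given by the number of events mapped to each node. The tSNE half would need an external package such as Rtsne and is omitted. The inputs and codes_pca() are hypothetical stand-ins, not FlowSOM or CATALYST objects.

```r
# Hypothetical helper: PCA of SOM codes plus per-node event counts
# (usable as point sizes).
# codes: nodes x features matrix; som_id: SOM node index (1..nrow) per event
codes_pca <- function(codes, som_id) {
  pcs <- prcomp(codes)$x[, 1:2, drop = FALSE]
  n <- tabulate(som_id, nbins = nrow(codes))  # 0 for empty nodes
  data.frame(PC1 = pcs[, 1], PC2 = pcs[, 2], n_events = n)
}

set.seed(3)
codes <- matrix(rnorm(10 * 4), nrow = 10)
som_id <- sample(1:10, 500, replace = TRUE)
df <- codes_pca(codes, som_id)
```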
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)
plotCodes(sce, k = "meta14")

# use custom cluster color palette
plotCodes(sce, k = "meta12",
  k_pal = c("lightgrey", "cornflowerblue", "navy"))
Barplot of the number of cells measured for each sample.
plotCounts(x, group_by = "condition", color_by = group_by, prop = FALSE)
x |
|
group_by |
character string specifying a non-numeric
cell metadata column to group by (determines x-axis ticks);
valid values are |
color_by |
character string specifying a non-numeric
cell metadata column to color by (determines grouping of bars);
valid values are |
prop |
logical specifying whether to plot relative abundances
(frequencies) for each group rather than total cell counts;
bars will be stacked when |
a ggplot object.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)

# plot number of cells per sample, colored by condition
plotCounts(sce, group_by = "sample_id", color_by = "condition")

# same as above, but order by patient
plotCounts(sce, group_by = "patient_id", color_by = "condition")

# total number of cells per patient
plotCounts(sce, group_by = "patient_id", color_by = NULL)

# plot proportion of cells from each patient by condition
plotCounts(sce, prop = TRUE,
  group_by = "condition", color_by = "patient_id")
Heatmaps summarizing differential abundance & differential state testing results.
plotDiffHeatmap( x, y, k = NULL, top_n = 20, fdr = 0.05, lfc = 1, all = FALSE, sort_by = c("padj", "lfc", "none"), y_cols = list(padj = "p_adj", lfc = "logFC", target = "marker_id"), assay = "exprs", fun = c("median", "mean", "sum"), normalize = TRUE, col_anno = TRUE, row_anno = TRUE, hm_pal = NULL, fdr_pal = c("lightgrey", "lightgreen"), lfc_pal = c("blue3", "white", "red3") )
x |
|
y |
a |
k |
character string specifying
the clustering in |
top_n |
numeric. Number of top clusters (if |
fdr |
numeric threshold on adjusted p-values below which results should be retained and considered to be significant. |
lfc |
numeric threshold on logFCs above which to retain results. |
all |
logical specifying whether all |
sort_by |
character string specifying the |
y_cols |
named list specifying columns in |
assay |
character string specifying which assay
data to use; valid values are |
fun |
character string specifying the function to use
as summary statistic for aggregation of |
normalize |
logical specifying whether Z-score normalized values
should be plotted. If |
col_anno |
logical specifying whether to include column annotations
for all non-numeric cell metadata variables; or a character vector
in |
row_anno |
logical specifying whether to include a row annotation indicating whether cluster (DA) or cluster-marker combinations (DS) are significant, labeled with adjusted p-values, as well as logFCs. |
hm_pal |
character vector of colors
to interpolate for the heatmap. Defaults to |
fdr_pal , lfc_pal
|
character vector of colors to use for row annotations
|
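Selecting which results to display can be sketched as filtering on fdr and lfc, ordering according to sort_by, and keeping the first top_n rows. The sketch assumes result columns p_adj and logFC (the y_cols defaults). It also assumes, without confirmation from the text, that the lfc threshold applies to absolute logFCs. top_results() is a hypothetical helper, not the package's internal code.

```r
# Hypothetical helper: filter and rank a DA/DS result table.
top_results <- function(res, top_n = 20, fdr = 0.05, lfc = 1,
                        sort_by = c("padj", "lfc", "none")) {
  sort_by <- match.arg(sort_by)
  # assumed: significant AND absolute logFC above threshold
  keep <- !is.na(res$p_adj) & res$p_adj < fdr & abs(res$logFC) > lfc
  res <- res[keep, , drop = FALSE]
  o <- switch(sort_by,
    padj = order(res$p_adj),
    lfc  = order(abs(res$logFC), decreasing = TRUE),
    none = seq_len(nrow(res)))
  head(res[o, , drop = FALSE], top_n)
}

res <- data.frame(cluster_id = c("1", "2", "3", "4"),
                  logFC = c(2, -3, 0.5, 1.5),
                  p_adj = c(0.01, 0.001, 0.001, 0.2),
                  stringsAsFactors = FALSE)
top <- top_results(res, top_n = 5)
```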
a Heatmap-class object.
Lukas M Weber & Helena L Crowell
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, verbose = FALSE)

## differential analysis
library(diffcyt)

# create design & contrast matrix
design <- createDesignMatrix(ei(sce), cols_design=2:3)
contrast <- createContrast(c(0, 1, 0, 0, 0))

# test for
# - differential abundance (DA) of clusters
# - differential states (DS) within clusters
da <- diffcyt(sce, design = design, contrast = contrast,
  analysis_type = "DA", method_DA = "diffcyt-DA-edgeR",
  clustering_to_use = "meta20", verbose = FALSE)
ds <- diffcyt(sce, design = design, contrast = contrast,
  analysis_type = "DS", method_DS = "diffcyt-DS-limma",
  clustering_to_use = "meta20", verbose = FALSE)

# extract result tables
da <- rowData(da$res)
ds <- rowData(ds$res)

# display test results for
# - top DA clusters
# - top DS cluster-marker combinations
plotDiffHeatmap(sce, da)
plotDiffHeatmap(sce, ds)

# visualize results for subset of clusters
sub <- filterSCE(sce, cluster_id %in% seq_len(5), k = "meta20")
plotDiffHeatmap(sub, da, all = TRUE, sort_by = "none")

# visualize results for selected feature
# & include only selected annotation
plotDiffHeatmap(sce["pp38", ], ds, col_anno = "condition", all = TRUE)
Dimension reduction plot colored by expression, cluster, sample or group ID.
plotDR( x, dr = NULL, color_by = "condition", facet_by = NULL, ncol = NULL, assay = "exprs", scale = TRUE, q = 0.01, dims = c(1, 2), k_pal = .cluster_cols, a_pal = hcl.colors(10, "Viridis") )
x |
|
dr |
character string specifying which dimension reduction to use.
Should be one of |
color_by |
character string specifying the color coding;
valid values are |
facet_by |
character string specifying a
non-numeric cell metadata column to facet by;
valid values are |
ncol |
integer scalar specifying number of facet columns; ignored unless coloring by multiple features without facetting or coloring by a single feature with facetting. |
assay |
character string specifying which assay data to use
when coloring by marker(s); valid values are |
scale |
logical specifying whether |
q |
single numeric in [0,0.5) determining the
quantiles to trim when |
dims |
length 2 numeric specifying which dimensions to plot. |
k_pal |
character string specifying the cluster color palette;
ignored when |
a_pal |
character string specifying the |
a ggplot object.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)

# run clustering & dimension reduction
sce <- cluster(sce)
sce <- runDR(sce, dr = "UMAP", cells = 100)

# color by single marker, split by sample
plotDR(sce, color_by = "CD7", facet_by = "sample_id", ncol = 4)

# color by a set of markers using custom color palette
cdx <- grep("CD", rownames(sce), value = TRUE)
plotDR(sce, color_by = cdx, ncol = 4,
  a_pal = rev(hcl.colors(10, "Spectral")))

# color by scaled expression for
# set of markers, split by condition
plotDR(sce, scale = TRUE, facet_by = "condition",
  color_by = sample(rownames(sce), 4))

# color by 8 metaclusters using custom
# cluster color palette, split by sample
p <- plotDR(sce, color_by = "meta8", facet_by = "sample_id",
  k_pal = c("lightgrey", "cornflowerblue", "navy"))
p$facet$params$ncol <- 4; p
Plots normalized barcode intensities for a given barcode.
plotEvents( x, which = "all", assay = "scaled", n = 1000, out_path = NULL, out_name = "event_plot" )
x |
|
which |
|
assay |
character string specifying which
assay data slot to use. One of |
n |
single numeric specifying the number of events to plot. |
out_path |
character string. If specified,
event plots for all barcodes specified via |
out_name |
character string specifying
the output's file name when |
Plots intensities normalized by population for each barcode specified by which: each event corresponds to the intensities plotted on a vertical line at a given point along the x-axis. Events are scaled to the 95% quantile of the population they have been assigned to. Barcodes with fewer than 50 event assignments will be skipped; it is strongly recommended to remove such populations or reconsider their separation cutoffs.
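The scaling rule described above can be sketched as follows. Events are grouped by their assigned population, populations with fewer than 50 events are skipped, and each remaining population is divided by its 95% quantile. The sketch assumes the quantile is taken per barcode channel. scale_by_population() is a hypothetical helper, not the package's code.

```r
# Hypothetical helper: per-population scaling of barcode intensities.
# bc: events x barcode-channels matrix; id: population assignment per event
scale_by_population <- function(bc, id, min_events = 50, prob = 0.95) {
  out <- list()
  for (p in unique(as.character(id))) {
    sub <- bc[id == p, , drop = FALSE]
    if (nrow(sub) < min_events) next   # too few events: skip barcode
    # assumed: one 95% quantile per channel within this population
    q <- apply(sub, 2, quantile, probs = prob, names = FALSE)
    out[[p]] <- sweep(sub, 2, q, "/")
  }
  out
}

set.seed(1)
bc <- matrix(runif(120 * 2, 0, 100), ncol = 2)
id <- c(rep("A", 100), rep("B", 20))
sc <- scale_by_population(bc, id)   # "B" (20 events) is skipped
```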
a list of ggplot objects.
Helena L Crowell
Zunder, E.R. et al. (2015). Palladium-based mass tag cell barcoding with a doublet-filtering scheme and single-cell deconvolution algorithm. Nature Protocols 10, 316-333.
data(sample_ff, sample_key)
sce <- prepData(sample_ff, by_time = FALSE)
sce <- assignPrelim(sce, sample_key)
plotEvents(sce, which = "D1")
Heatmap of marker expressions aggregated by sample, cluster, or both; with options to include annotation of cell metadata factors, clustering(s), as well as relative and absolute cell counts.
plotExprHeatmap( x, features = NULL, by = c("sample_id", "cluster_id", "both"), k = "meta20", m = NULL, assay = "exprs", fun = c("median", "mean", "sum"), scale = c("first", "last", "never"), q = 0.01, row_anno = TRUE, col_anno = TRUE, row_clust = TRUE, col_clust = TRUE, row_dend = TRUE, col_dend = TRUE, bars = FALSE, perc = FALSE, bin_anno = FALSE, hm_pal = rev(brewer.pal(11, "RdYlBu")), k_pal = .cluster_cols, m_pal = k_pal, distance = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), linkage = c("average", "ward.D", "single", "complete", "mcquitty", "median", "centroid", "ward.D2") )
x |
|
features |
character string specifying which features to include;
valid values are |
by |
character string specifying whether to aggregate by sample, cluster, or both. |
k |
character string specifying which
clustering to use when |
m |
character string specifying a metaclustering to include as an
annotation when |
assay |
character string specifying which assay
data to use; valid values are |
fun |
character string specifying the function to use as summary statistic. |
scale |
character string specifying the scaling strategy:
If |
q |
single numeric in [0,0.5) determining the
quantiles to trim when |
row_anno , col_anno
|
logical specifying whether to include row/column
annotations (see details); when one axis corresponds to samples
( |
row_clust , col_clust
|
logical specifying whether rows/columns should be hierarchically clustered and re-ordered accordingly. |
row_dend , col_dend
|
logical specifying whether to include the row/column dendrograms. |
bars |
logical specifying whether to include a barplot of cell counts per cluster as a right-hand side row annotation. |
perc |
logical specifying whether to display
percentage labels next to bars when |
bin_anno |
logical specifying whether to display values inside bins. |
hm_pal |
character vector of colors to interpolate for the heatmap. |
k_pal , m_pal
|
character vector of colors to interpolate
for cluster annotations when |
distance |
character string specifying the distance metric
to use for both row and column hierarchical clustering;
passed to |
linkage |
character string specifying the agglomeration method
to use for both row and column hierarchical clustering;
passed to |
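The scale and q arguments can be illustrated in base R. The sketch assumes that scaling maps each feature to [0, 1] after clamping values outside its q and 1 - q quantiles. It also assumes that "first" scales cell-level values before aggregation, while "last" scales the aggregated values. scale01() and aggregate_scaled() are hypothetical helpers, not CATALYST functions.

```r
# Hypothetical helper: min-max scale to [0, 1] after trimming q-quantiles.
scale01 <- function(x, q = 0.01) {
  rng <- quantile(x, probs = c(q, 1 - q), names = FALSE)
  x <- pmin(pmax(x, rng[1]), rng[2])   # clamp outliers to quantile range
  (x - rng[1]) / (rng[2] - rng[1])
}

# expr: features x cells matrix; ids: group (e.g. cluster) per cell
aggregate_scaled <- function(expr, ids, scale = c("first", "last"), q = 0.01) {
  scale <- match.arg(scale)
  scale_rows <- function(m) t(apply(m, 1, scale01, q = q))
  if (scale == "first") expr <- scale_rows(expr)  # scale cells, then aggregate
  med <- t(apply(expr, 1, function(v) tapply(v, ids, median)))
  if (scale == "last") med <- scale_rows(med)     # aggregate, then scale
  med
}

expr <- rbind(m1 = c(0, 1, 2, 3, 10, 20), m2 = c(5, 5, 6, 6, 7, 8))
ids <- c("a", "a", "b", "b", "c", "c")
last <- aggregate_scaled(expr, ids, scale = "last", q = 0)
first <- aggregate_scaled(expr, ids, scale = "first", q = 0)
```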
By default (row/col_anno = TRUE), for axes corresponding to samples (y-axis for by = "sample_id" and x-axis for by = "both"), annotations will be drawn for all non-numeric cell metadata variables. Alternatively, annotations can be restricted to a subset of variables by specifying row/col_anno as a character vector in names(colData(x)) (see examples). For axes corresponding to clusters (y-axis for by = "cluster_id" and "both"), annotations will be drawn for the specified clustering(s) (arguments k and m).
a Heatmap-class object.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
plotFreqHeatmap, plotMultiHeatmap
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)

# median scaled & trimmed expression by cluster
plotExprHeatmap(sce, by = "cluster_id", k = "meta8",
  scale = "first", q = 0.05, bars = FALSE)

# scale each marker between 0 and 1
# after aggregation (without trimming)
plotExprHeatmap(sce, scale = "last", q = 0,
  bars = TRUE, perc = TRUE,
  hm_pal = hcl.colors(10, "YlGnBu", rev = TRUE))

# raw (un-scaled) median expression by cluster-sample
plotExprHeatmap(sce, features = "pp38",
  by = "both", k = "meta10",
  scale = "never", row_anno = FALSE, bars = FALSE)

# include only subset of samples
sub <- filterSCE(sce, patient_id != "Patient", sample_id != "Ref3")

# includes specific annotations &
# split into CDx & all other markers
is_cd <- grepl("CD", rownames(sce))
plotExprHeatmap(sub, rownames(sce)[is_cd],
  row_anno = "condition", bars = FALSE)
plotExprHeatmap(sub, rownames(sce)[!is_cd],
  row_anno = "patient_id", bars = FALSE)
Plots smoothed densities of marker intensities, with a density curve for each sample ID, and curves colored by a cell metadata variable of interest.
plotExprs(x, features = NULL, color_by = "condition", assay = "exprs")
x |
|
features |
character vector specifying
which features to include; valid values are
|
color_by |
character string specifying
a non-numeric cell metadata column by which
to color density curves for each sample;
valid values are |
assay |
character string specifying which assay data
to use; valid values are |
a ggplot object.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
plotExprs(sce)
Heatmap of relative cluster abundances (frequencies) by sample.
plotFreqHeatmap( x, k = "meta20", m = NULL, normalize = TRUE, row_anno = TRUE, col_anno = TRUE, row_clust = TRUE, col_clust = TRUE, row_dend = TRUE, col_dend = TRUE, bars = TRUE, perc = FALSE, hm_pal = rev(brewer.pal(11, "RdBu")), k_pal = .cluster_cols, m_pal = k_pal )
x |
|
k |
character string specifying the clustering to use;
valid values are |
m |
character string specifying a metaclustering
to include as an annotation when |
normalize |
logical specifying whether to Z-score normalize. |
row_anno , col_anno
|
logical specifying whether to
include row/column annotations for clusters/samples;
for |
row_clust , col_clust
|
logical specifying whether rows/columns (clusters/samples) should be hierarchically clustered and re-ordered accordingly. |
row_dend , col_dend
|
logical specifying whether to include row/column dendrograms. |
bars |
logical specifying whether to include a barplot of cell counts per cluster as a right-hand side row annotation. |
perc |
logical specifying whether to display
percentage labels next to bars when |
hm_pal |
character vector of colors to interpolate for the heatmap. |
k_pal , m_pal
|
character vector of colors
to use for cluster and merging row annotations.
If less than |
a Heatmap-class object.
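The normalize option can be sketched as computing cluster frequencies per sample and then Z-scoring each cluster (row) across samples. That per-row direction is an assumption. base::scale() standardizes columns, hence the transposes. freq_zscores() is a hypothetical helper, not the package's code.

```r
# Hypothetical helper: cluster frequencies per sample, Z-scored per
# cluster across samples (rows end up with mean 0 and sd 1).
freq_zscores <- function(cluster_id, sample_id) {
  freqs <- unclass(prop.table(table(cluster_id, sample_id), margin = 2))
  t(scale(t(freqs)))
}

cluster_id <- c("A", "A", "B", "A", "B", "B", "A", "B", "B", "B")
sample_id  <- c(rep("s1", 3), rep("s2", 3), rep("s3", 4))
z <- freq_zscores(cluster_id, sample_id)
```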
Helena L Crowell
plotAbundances, plotExprHeatmap, plotMultiHeatmap
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)

# complete
plotFreqHeatmap(sce, k = "meta12", m = "meta8")

# minimal
plotFreqHeatmap(sce, k = "meta10",
  normalize = FALSE, bars = FALSE,
  row_anno = FALSE, col_anno = FALSE,
  row_clust = FALSE, col_clust = FALSE)

# customize colors & annotations
plotFreqHeatmap(sce, k = "meta7", m = "meta4",
  col_anno = "condition",
  hm_pal = c("navy", "grey95", "gold"),
  k_pal = hcl.colors(7, "Set 2"),
  m_pal = hcl.colors(4, "Dark 3"))
Histogram of counts and plot of yields as a function of separation cutoffs.
plotMahal(x, which, assay = "exprs", n = 1000)
x |
|
which |
character string. Specifies which barcode to plot. |
assay |
character string specifying which assay to use. |
n |
numeric. Number of cells to subsample; use NULL to include all. |
Plots all inter-barcode interactions for the population specified by argument which. Events are colored by their Mahalanobis distance.
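A Mahalanobis distance of each event to the center of its population can be computed with stats::mahalanobis(). Note that it returns squared distances, hence the sqrt(). The sketch assumes the center and covariance are estimated from the population's own barcode intensities. mahal_dist() is a hypothetical helper and may not match how CATALYST computes the distances it plots.

```r
# Hypothetical helper: Mahalanobis distance of each event (row) to the
# population mean, using the population's covariance matrix.
mahal_dist <- function(bc) {
  sqrt(mahalanobis(bc, center = colMeans(bc), cov = cov(bc)))
}

set.seed(7)
bc <- cbind(x = rnorm(200), y = rnorm(200))
d <- mahal_dist(bc)

# an artificial outlier ends up with the largest distance
bc2 <- rbind(bc, c(10, 10))
d2 <- mahal_dist(bc2)
which.max(d2)
```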
Helena L Crowell
Zunder, E.R. et al. (2015). Palladium-based mass tag cell barcoding with a doublet-filtering scheme and single-cell deconvolution algorithm. Nature Protocols 10, 316-333.
data(sample_ff, sample_key)
sce <- prepData(sample_ff, by_time = FALSE)
sce <- assignPrelim(sce, sample_key)
sce <- estCutoffs(sce)
sce <- applyCutoffs(sce)
plotMahal(sce, which = "B3")
Combines expression and frequency heatmaps from plotExprHeatmap and plotFreqHeatmap, respectively, into a HeatmapList.
plotMultiHeatmap( x, hm1 = "type", hm2 = "abundances", k = "meta20", m = NULL, assay = "exprs", fun = c("median", "mean", "sum"), scale = c("first", ifelse(hm2 == "state", "first", "last")), q = c(0.01, ifelse(hm2 == "state", 0.01, 0)), normalize = TRUE, row_anno = TRUE, col_anno = TRUE, row_clust = TRUE, col_clust = c(TRUE, hm2 == "state"), row_dend = TRUE, col_dend = c(TRUE, hm2 == "state"), bars = FALSE, perc = FALSE, hm1_pal = rev(brewer.pal(11, "RdYlBu")), hm2_pal = if (isTRUE(hm2 == "abundances")) rev(brewer.pal(11, "PuOr")) else hm1_pal, k_pal = .cluster_cols, m_pal = k_pal, distance = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), linkage = c("average", "ward.D", "single", "complete", "mcquitty", "median", "centroid", "ward.D2") )
x |
|
hm1 |
character string specifying
which features to include in the 1st heatmap;
valid values are |
hm2 |
character string. Specifies the right-hand side heatmap. One of:
|
k |
character string specifying which clustering to use;
valid values are |
m |
character string specifying a metaclustering
to include as an annotation when |
assay |
character string specifying which assay
data to use; valid values are |
fun |
character string specifying the function to use as summary statistic. |
scale |
character string specifying the scaling strategy;
for expression heatmaps (see |
q |
single numeric in [0,1) determining the
quantiles to trim when |
normalize |
logical specifying whether to Z-score normalize
cluster frequencies across samples; see |
row_anno , col_anno
|
logical specifying whether to include
row/column annotations for cell metadata variables and clustering(s);
see |
row_clust , col_clust
|
logical specifying whether rows/columns should be hierarchically clustered and re-ordered accordingly. |
row_dend , col_dend
|
logical specifying whether to include the row/column dendrograms. |
bars |
logical specifying whether to include a barplot of cell counts per cluster as a right-hand side row annotation. |
perc |
logical specifying whether to display
percentage labels next to bars when |
hm1_pal , hm2_pal
|
character vector of colors to interpolate for each heatmap. |
k_pal , m_pal
|
character vector of colors
to use for cluster and merging row annotations.
If less than |
distance |
character string specifying the distance metric
to use in |
linkage |
character string specifying the agglomeration method
to use in |
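The scale and q arguments control how aggregated values are trimmed and rescaled. A common scheme, used in the CyTOF workflow by Nowicka et al., maps each marker to [0, 1] between its lower and upper q-quantiles and clips values outside that range. The sketch below assumes this scheme; scale_trim() is a hypothetical helper, not a CATALYST function.

```r
# Hypothetical helper: rescale each column of a matrix to [0, 1] using its
# q and 1 - q quantiles as bounds, clipping values outside that range.
scale_trim <- function(x, q = 0.01) {
  apply(x, 2, function(v) {
    lo <- quantile(v, q, names = FALSE)
    hi <- quantile(v, 1 - q, names = FALSE)
    # guard against constant columns (zero range)
    if (hi == lo) return(rep(0, length(v)))
    s <- (v - lo) / (hi - lo)
    pmin(pmax(s, 0), 1)
  })
}

# toy expression matrix: 200 cells x 3 markers (simulated values)
set.seed(42)
es <- matrix(rexp(600), ncol = 3,
             dimnames = list(NULL, c("CD3", "CD4", "CD20")))
es01 <- scale_trim(es, q = 0.01)
range(es01)
```

Trimming at quantiles rather than at the minimum and maximum keeps a handful of extreme events from compressing the color scale for everything else.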
In its 1st panel, plotMultiHeatmap will display (scaled) type-marker expressions aggregated by cluster (across all samples). Depending on argument hm2, the 2nd panel will contain one of:
hm2 = "abundances"
relative cluster abundances by cluster & sample
hm2 = "state"
aggregated (scaled) state-marker expressions by cluster (across all samples; analogous to panel 1)
hm2 %in% rownames(x)
aggregated (scaled) marker expressions by cluster & sample
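For hm2 = "abundances", the values are relative cluster frequencies per sample, optionally Z-score normalized across samples (argument normalize). A base-R sketch of both steps on simulated cluster and sample labels follows; it illustrates the computation under these assumptions and is not CATALYST's implementation.

```r
# Simulated cluster assignments for 300 cells from 3 samples
# (cluster and sample labels are made up for illustration).
set.seed(7)
cluster_id <- factor(sample(paste0("c", 1:4), 300, replace = TRUE))
sample_id  <- factor(sample(c("s1", "s2", "s3"), 300, replace = TRUE))

# cell counts by cluster (rows) and sample (columns)
ns <- table(cluster_id, sample_id)

# relative abundances: each sample's column sums to 1
fq <- prop.table(ns, margin = 2)

# Z-score each cluster's frequencies across samples;
# scale() works on columns, hence the transpositions
z <- t(scale(t(unclass(fq))))
round(z, 2)
```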
a HeatmapList-class object.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
plotAbundances, plotExprHeatmap, plotFreqHeatmap
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)

# state-markers + cluster frequencies
plotMultiHeatmap(sce, hm1 = "state", hm2 = "abundances",
  bars = TRUE, perc = TRUE)

# type-markers + marker of interest
plotMultiHeatmap(sce, hm2 = "pp38", k = "meta12", m = "meta8")

# both, type- & state-markers
plotMultiHeatmap(sce, hm2 = "state")

# plot markers of interest side-by-side
# without left-hand side heatmap
plotMultiHeatmap(sce, k = "meta10", hm1 = NULL,
  hm2 = c("pS6", "pNFkB", "pBtk"), row_anno = FALSE,
  hm2_pal = c("white", "black"))
Plots non-redundancy scores (NRS) by feature in decreasing order of average NRS across samples.
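The sketch below computes a per-sample NRS following the definition in the CyTOF workflow by Nowicka et al.: absolute PCA loadings of each marker, weighted by the variance of the first ncomp components and summed. Whether plotNRS uses exactly this definition (including ncomp = 3) is an assumption; the helper nrs() and the simulated data are hypothetical.

```r
# Hypothetical per-sample NRS: absolute loadings on the first ncomp PCs,
# weighted by each component's variance, summed per marker.
nrs <- function(x, ncomp = 3) {
  pr <- prcomp(x, center = TRUE, scale. = FALSE)
  w  <- pr$sdev[seq_len(ncomp)]^2                     # PC variances
  l  <- abs(pr$rotation[, seq_len(ncomp), drop = FALSE])
  drop(l %*% w)                                       # one score per marker
}

# simulated data: 3 samples, 100 cells x 5 markers each;
# column j has standard deviation j
set.seed(3)
markers <- paste0("m", 1:5)
samples <- lapply(1:3, function(i)
  matrix(rnorm(500, sd = rep(1:5, each = 100)), ncol = 5,
         dimnames = list(NULL, markers)))

# NRS per sample (rows = markers, columns = samples)
scores <- sapply(samples, nrs)

# markers in decreasing order of average NRS across samples
sort(rowMeans(scores), decreasing = TRUE)
```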
plotNRS(x, features = NULL, color_by = "condition", assay = "exprs")
x |
|
features |
a character vector specifying
which antigens to include; valid values are
|
color_by |
character string specifying the color coding;
valid values are |
assay |
character string specifying which assay data
to use; valid values are |
a ggplot object.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
plotNRS(sce, features = NULL)    # default: all markers
plotNRS(sce, features = "type")  # type-markers only
Boxplot of aggregated marker data by sample or cluster, optionally colored and faceted by non-numeric cell metadata variables of interest.
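The aggregation behind these boxplots can be sketched in base R: cell-level values are summarized by sample and antigen (here with the median, as with fun = "median"), and sample-level metadata are then attached for coloring and grouping. The data and labels are simulated and hypothetical; plotPbExprs itself works on the SCE directly.

```r
# Simulated long-format data: 3 samples x 2 antigens, 50 cells each
# (sample, antigen, and condition labels are hypothetical).
set.seed(11)
df <- expand.grid(cell = 1:50, sample_id = c("s1", "s2", "s3"),
                  antigen = c("pS6", "pp38"))
df$expr <- rnorm(nrow(df), mean = as.integer(df$sample_id))

# median expression by sample & antigen (one row per pseudobulk)
pb <- aggregate(expr ~ sample_id + antigen, data = df, FUN = median)

# attach a sample-level condition for coloring/grouping
cond <- c(s1 = "Ref", s2 = "Ref", s3 = "BCRXL")
pb$condition <- unname(cond[as.character(pb$sample_id)])
pb
```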
plotPbExprs( x, k = "meta20", features = "state", assay = "exprs", fun = c("median", "mean", "sum"), facet_by = c("antigen", "cluster_id"), color_by = "condition", group_by = color_by, shape_by = NULL, size_by = FALSE, geom = c("both", "points", "boxes"), jitter = TRUE, ncol = NULL )
x |
a |
k |
character string specifying which clustering to use;
valid values are |
features |
character vector specifying
which features to include; valid values are
|
assay |
character string specifying which assay data
to use; valid values are |
fun |
character string specifying the summary statistic to use. |
facet_by |
|
color_by , group_by , shape_by
|
character string specifying a non-numeric cell metadata variable
to color, group and shape by, respectively; valid values are
|
size_by |
logical specifying whether to scale point sizes by
the number of cells in a given sample or cluster-sample instance;
ignored when |
geom |
character string specifying whether to include only points, boxplots or both. |
jitter |
logical specifying whether to use |
ncol |
integer scalar specifying number of facet columns. |
a ggplot object.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
# construct SCE
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, verbose = FALSE)

# plot median expressions by sample & condition
# ...split by marker
plotPbExprs(sce, shape_by = "patient_id",
  features = sample(rownames(sce), 6))
# ...split by cluster
plotPbExprs(sce, facet_by = "cluster_id", k = "meta6")

# plot median type-marker expressions by sample & cluster
plotPbExprs(sce, features = "type", k = "meta6",
  facet_by = "antigen", group_by = "cluster_id",
  color_by = "sample_id", size_by = TRUE,
  geom = "points", jitter = FALSE, ncol = 5)

# plot median state-marker expressions
# by sample & cluster, split by condition
plotPbExprs(sce, k = "meta6", facet_by = "antigen",
  group_by = "cluster_id", color_by = "condition", ncol = 7)
Bivariate scatter plots including visualization of (group-specific) gates, their boundaries and percentage of selected cells.
plotScatter( x, chs, color_by = NULL, facet_by = NULL, bins = 100, assay = "exprs", label = c("target", "channel", "both"), zeros = FALSE, k_pal = .cluster_cols )
x |
|
chs |
character vector specifying which channels to plot.
Valid values are antigens: |
color_by |
character string specifying
a cell metadata column to color by; valid values are
|
facet_by |
character string specifying a non-numeric
cell metadata column to facet by; valid values are
|
bins |
numeric of length 1 giving the number of bins
for |
assay |
character string specifying which assay data to use.
Should be one of |
label |
character string specifying whether axis labels should include antigen targets, channel names, or a concatenation of both. |
zeros |
logical specifying whether to include 0 values. |
k_pal |
character string specifying the cluster color palette;
ignored when |
a ggplot object.
Helena L Crowell
# construct SCE
data(raw_data)
sce <- prepData(raw_data)

# plot DNA channels
dna_chs <- c("DNA1", "DNA2")
plotScatter(sce, dna_chs, label = "both")

# plot 4 random channels, colored by sample
plotScatter(sce, chs = sample(rownames(sce), 4), color_by = "sample_id")

# preliminary debarcoding, keeping 3 random barcodes
data(sample_ff, sample_key)
sce <- prepData(sample_ff)
ids <- sample(rownames(sample_key), 3)
sce <- assignPrelim(sce, sample_key[ids, ])
sce <- sce[, sce$bc_id %in% ids]

# plot 5 random channels, colored by barcode ID and by separation
chs <- sample(rownames(sce), 5)
plotScatter(sce, chs, color_by = "bc_id")
plotScatter(sce, chs, color_by = "delta")
Generates a heatmap of the spillover matrix annotated with estimated spill percentages.
plotSpillmat( x, sm = NULL, anno = TRUE, isotope_list = CATALYST::isotope_list, hm_pal = c("white", "lightcoral", "red2", "darkred"), anno_col = "black" )
x |
|
sm |
spillover matrix to visualize. If NULL, |
anno |
logical. If TRUE (default), spill percentages are shown inside bins and rows are annotated with the total amount of spill received. |
isotope_list |
named list. Used to validate the input spillover matrix.
Names should be metals; list elements numeric vectors of their isotopes.
See |
hm_pal |
character vector of colors to interpolate. |
anno_col |
character string specifying the color to use for bin annotations. |
a ggplot2 object showing estimated spill percentages as a heatmap with colors ramped to the highest spillover value present.
Helena L Crowell
# get single-stained control samples & construct SCE
data(ss_exp)
sce <- prepData(ss_exp)

# debarcode single-positive populations
bc_ms <- c(139, 141:156, 158:176)
sce <- assignPrelim(sce, bc_ms, verbose = FALSE)
sce <- applyCutoffs(estCutoffs(sce))

# estimate & visualize spillover matrix
sce <- computeSpillmat(sce)
plotSpillmat(sce)
Plots the distribution of barcode separations and yields upon debarcoding as a function of separation cutoffs. If available, currently used separation cutoffs as well as their resulting yields will be indicated in the plot.
plotYields(x, which = 0, out_path = NULL, out_name = "yield_plot")
x |
|
which |
0, numeric or character. Specifies which barcode(s) to plot.
Valid values are IDs that occur as row names of |
out_path |
character string. If specified,
yields plots for all barcodes specified via |
out_name |
character string specifying
the output's file name when |
The overall yield that will be achieved upon application of the specified set of separation cutoffs is indicated in the summary plot. Respective separation thresholds and their resulting yields are included in each barcode's plot. The separation cutoff value should be chosen such that it appropriately balances confidence in barcode assignment and cell yield.
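To make the yield-versus-cutoff relationship concrete, the sketch below computes, for each barcode, the fraction of its events whose barcode separation is at least a given cutoff. It assumes events are kept solely on that criterion, ignoring any further filtering (such as by Mahalanobis distance); the separations and barcode IDs are simulated, and yield_curve() is a hypothetical helper.

```r
# Simulated barcode separations for events of 2 barcodes
# (barcode IDs and separation distributions are made up).
set.seed(5)
bc_id <- rep(c("A1", "B3"), each = 1000)
delta <- c(rbeta(1000, 5, 2), rbeta(1000, 3, 3))

# Hypothetical helper: yield = fraction of events with separation >= cutoff
yield_curve <- function(d, cutoffs) {
  vapply(cutoffs, function(co) mean(d >= co), numeric(1))
}

cutoffs <- seq(0, 1, by = 0.05)
# one column per barcode, one row per cutoff
yields <- sapply(split(delta, bc_id), yield_curve, cutoffs = cutoffs)

# yields at the cutoff closest to 0.3
yields[which.min(abs(cutoffs - 0.3)), ]
```

Raising the cutoff can only lower the yield, which is the trade-off between assignment confidence and cell yield described above.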
a list of ggplot objects.
Helena L Crowell
Zunder, E.R. et al. (2015). Palladium-based mass tag cell barcoding with a doublet-filtering scheme and single-cell deconvolution algorithm. Nature Protocols 10, 316-333.
# construct SCE & apply arcsinh-transformation
data(sample_ff, sample_key)
sce <- prepData(sample_ff)

# deconvolute samples & estimate separation cutoffs
sce <- assignPrelim(sce, sample_key)
sce <- estCutoffs(sce)

# all barcodes summary plot
plotYields(sce, which = 0)

# plot for specific sample
plotYields(sce, which = "C1")
Data preparation
prepData( x, panel = NULL, md = NULL, features = NULL, transform = TRUE, cofactor = 5, panel_cols = list(channel = "fcs_colname", antigen = "antigen", class = "marker_class"), md_cols = list(file = "file_name", id = "sample_id", factors = c("condition", "patient_id")), by_time = TRUE, FACS = FALSE, fix_chs = c("common", "all"), ... )
x |
a |
panel |
a data.frame containing, for each channel,
its column name in the input data, targeted protein marker,
and (optionally) class ("type", "state", or "none").
If 'panel' is unspecified, it will be constructed
from the first input sample via |
md |
a table with columns describing the experiment. An exemplary metadata table could look as follows:
If 'md' is unspecified, the |
features |
a logical vector, numeric vector of column indices, or character vector of channel names. Specifies which columns to keep from the input data. Defaults to the channels listed in the input panel. |
transform |
logical. Specifies whether an arcsinh-transformation with the given cofactor should be performed, in which case expression values (transformed counts) will be stored in assay(x, "exprs"). |
cofactor |
numeric cofactor(s) to use for optional
arcsinh-transformation when |
panel_cols |
a named list specifying
the |
md_cols |
a named list specifying the column names of |
by_time |
logical; should samples be ordered by acquisition time?
Ignored if |
FACS |
logical; is this FACS / flow cytometry data?
By default, |
fix_chs |
specifies the strategy to use in case of panel discrepancies.
|
... |
additional arguments passed to |
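The arcsinh-transformation controlled by transform and cofactor computes asinh(x / cofactor) for each value. The sketch below applies a single cofactor and, alternatively, channel-specific cofactors given as a named vector (as in the cf example below); the channel names and counts are made up for illustration.

```r
# Simulated raw counts: 4 channels x 5 cells (rows = channels, as in an SCE);
# channel names are hypothetical.
set.seed(9)
counts <- matrix(rpois(20, lambda = 50), nrow = 4,
                 dimnames = list(c("ch1", "ch2", "ch3", "ch4"), NULL))

# single cofactor for all channels (prepData's default is cofactor = 5)
es <- asinh(counts / 5)

# channel-specific cofactors, matched to rows by name; dividing a matrix
# by a vector of length nrow() recycles it down each column (one per row)
cf <- c(ch1 = 5, ch2 = 5, ch3 = 150, ch4 = 150)
es_cf <- asinh(counts / cf[rownames(counts)])
```

Matching by name rather than by position avoids silently pairing a cofactor with the wrong channel when the vector is in a different order.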
By default, non-mass channels (e.g., time, event lengths) will be removed from the output SCE's assay data and instead stored in the object's internal cell metadata (int_colData) to ensure these data are not subject to transformations or other computations applied to the assay data.
For more than 1 sample, prepData will concatenate cells into a single SingleCellExperiment object. Note that cells will hereby be ordered by "Time", regardless of whether by_time = TRUE or FALSE. Instead, by_time determines the sample (not cell!) order; i.e., whether samples should be kept in their original order, or should be re-ordered according to their acquisition time stored in keyword(flowSet, "$BTIM").
When a metadata table is specified (i.e. !is.null(md)), argument by_time will be ignored and sample ordering is instead determined by md[[md_cols$file]].
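The sample-ordering rules for prepData can be summarized in a small hypothetical helper, order_samples(), which is not part of CATALYST. It assumes that "$BTIM" values are "hh:mm:ss" strings from the same day and that, when a metadata table is given, its file column lists the files in the desired order.

```r
# Hypothetical helper: metadata order wins; otherwise order by acquisition
# time if by_time = TRUE; otherwise keep the input order.
order_samples <- function(files, btim = NULL, md_files = NULL, by_time = TRUE) {
  if (!is.null(md_files)) {
    return(md_files[md_files %in% files])   # order given by metadata table
  }
  if (by_time && !is.null(btim)) {
    # convert "hh:mm:ss" (optionally with fractional seconds) to seconds
    secs <- vapply(strsplit(btim, ":"), function(p)
      sum(as.numeric(p[1:3]) * c(3600, 60, 1)), numeric(1))
    return(files[order(secs)])
  }
  files                                      # keep original order
}

files <- c("s1.fcs", "s2.fcs", "s3.fcs")
btim  <- c("14:05:10", "09:30:00", "11:12:45")
order_samples(files, btim)
order_samples(files, btim, by_time = FALSE)
order_samples(files, btim, md_files = c("s3.fcs", "s1.fcs", "s2.fcs"))
```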
Helena L Crowell
data(PBMC_fs, PBMC_panel, PBMC_md)
prepData(PBMC_fs, PBMC_panel, PBMC_md)

# channel-specific transformation
cf <- sample(seq_len(10)[-1], nrow(PBMC_panel), TRUE)
names(cf) <- PBMC_panel$fcs_colname
sce <- prepData(PBMC_fs, cofactor = cf)
int_metadata(sce)$cofactor

# input has different name for "condition"
md <- PBMC_md
m <- match("condition", names(md))
colnames(md)[m] <- "treatment"

# add additional factor variable batch ID
md$batch_id <- sample(c("A", "B"), nrow(md), TRUE)

# specify 'md_cols' that differ from defaults
factors <- list(factors = c("treatment", "batch_id"))
ei(prepData(PBMC_fs, PBMC_panel, md, md_cols = factors))

# without panel & metadata tables
data(raw_data)
sce <- prepData(raw_data)

# 'flowFrame' identifiers are used as sample IDs
levels(sce$sample_id)

# panel was guessed with 'guessPanel';
# non-mass channels are set to marker class "none"
rowData(sce)
Wrapper around dimension reduction methods available through scater, with optional subsampling of cells per sample.
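The cells argument caps the number of cells used per sample. A minimal base-R sketch of such per-sample subsampling follows; subsample_cells() is hypothetical, and how runDR treats cells that are not selected is not covered here.

```r
# Hypothetical helper: select at most n cells per sample and return
# their column indices into the cell-level data.
subsample_cells <- function(sample_id, n) {
  idx <- split(seq_along(sample_id), sample_id)
  unlist(lapply(idx, function(i) {
    # index via sample.int() to avoid sample()'s length-1 pitfall
    if (length(i) <= n) i else i[sample.int(length(i), n)]
  }), use.names = FALSE)
}

set.seed(2)
sample_id <- rep(c("s1", "s2", "s3"), times = c(50, 300, 120))
keep <- subsample_cells(sample_id, n = 100)
table(sample_id[keep])
```

Samples smaller than the cap (here s1) contribute all their cells, so the result is "at most n" rather than exactly n per sample.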
runDR( x, dr = c("UMAP", "TSNE", "PCA", "MDS", "DiffusionMap"), cells = NULL, features = "type", assay = "exprs", ... )
x |
|
dr |
character string specifying which dimension reduction to use. |
cells |
single numeric specifying the maximal number of cells per sample to use for dimension reduction; NULL for all cells. |
features |
a character vector specifying which
antigens to use for dimension reduction; valid values are
|
assay |
character string specifying which assay data to use
for dimension reduction; valid values are |
... |
optional arguments for dimension reduction; passed to
|
a SingleCellExperiment with the dimension reduction added to its reducedDims slot.
Helena L Crowell
Nowicka M, Krieg C, Crowell HL, Weber LM et al. CyTOF workflow: Differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research 2017, 6:748 (doi: 10.12688/f1000research.11622.1)
# construct SCE
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)

# run UMAP on <= 100 cells per sample
sce <- runDR(sce, features = type_markers(sce), cells = 100)
SingleCellExperiment accessors
Various wrappers to conveniently access slots in a SingleCellExperiment created with prepData that are used frequently during differential analysis.
## S4 method for signature 'SingleCellExperiment'
ei(x)

## S4 method for signature 'SingleCellExperiment'
n_cells(x)

## S4 method for signature 'SingleCellExperiment'
channels(x)

## S4 method for signature 'SingleCellExperiment'
marker_classes(x)

## S4 method for signature 'SingleCellExperiment'
type_markers(x)

## S4 method for signature 'SingleCellExperiment'
state_markers(x)

## S4 method for signature 'SingleCellExperiment'
sample_ids(x)

## S4 method for signature 'SingleCellExperiment,missing'
cluster_ids(x, k = NULL)

## S4 method for signature 'SingleCellExperiment,character'
cluster_ids(x, k = NULL)

## S4 method for signature 'SingleCellExperiment'
cluster_codes(x)

## S4 method for signature 'SingleCellExperiment'
delta_area(x)
x |
|
k |
character string specifying the clustering to extract.
Valid values are |
ei
extracts the experimental design table.
n_cells
extracts the number of events measured per sample.
channels
extracts the original FCS file's channel names.
marker_classes
extracts marker class assignments ("type", "state", "none").
type_markers
extracts the antigens used for clustering.
state_markers
extracts antigens that were not used for clustering.
sample_ids
extracts the sample IDs as specified in the metadata table.
cluster_ids
extracts the numeric vector of cluster IDs as inferred by FlowSOM.
cluster_codes
extracts a data.frame containing cluster codes for the FlowSOM clustering, the ConsensusClusterPlus metaclustering, and all mergings done through mergeClusters.
delta_area
extracts the delta area plot stored in the SCE's metadata by the cluster function.
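Mapping cells from one clustering resolution to another amounts to a lookup in the cluster code table, similar to the old_ids/new_ids example below. The sketch uses a toy code table whose column names only mimic the "metaK" naming; coarse_ids() is a hypothetical helper, and it is an assumption that cluster_ids(x, k) works along these lines.

```r
# Toy cluster codes: 6 fine clusters mapped to 2 coarser resolutions
# (table layout and column names are hypothetical).
codes <- data.frame(
  som6  = factor(1:6),
  meta3 = factor(c(1, 1, 2, 2, 3, 3)),
  meta2 = factor(c(1, 1, 1, 2, 2, 2))
)

# per-cell assignments at the finest resolution
fine_ids <- factor(c(3, 1, 6, 6, 2, 5), levels = 1:6)

# Hypothetical helper: find each cell's fine cluster in the first column,
# then read off the corresponding code at resolution k
coarse_ids <- function(ids, codes, k) {
  codes[[k]][match(ids, codes[[1]])]
}

coarse_ids(fine_ids, codes, "meta3")
```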
Helena L Crowell
# construct SCE & run clustering
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce)

# view experimental design table
ei(sce)

# quick-access sample & cluster assignments
plot(table(sample_ids(sce)))
plot(table(cluster_ids(sce)))

# access specific clustering resolution
table(cluster_ids(sce, k = "meta8"))

# access marker information
channels(sce)
marker_classes(sce)
type_markers(sce)
state_markers(sce)

# get cluster ID correspondence between 2 clusterings
old_ids <- seq_len(20)
m <- match(old_ids, cluster_codes(sce)$`meta20`)
new_ids <- cluster_codes(sce)$`meta12`[m]
data.frame(old_ids, new_ids)

# view delta area plot (relative change in area
# under CDF curve vs. the number of clusters 'k')
delta_area(sce)
flowFrame/Set
If split_by = NULL, the input SCE is converted to a flowFrame. Otherwise, it is split into a flowSet by the specified colData column. Any cell metadata (colData) and dimension reductions available in the SCE may be dropped or propagated to the output.
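The splitting step can be sketched in base R: cell indices are grouped by a per-cell column, and each subset is transposed into an events-by-channels matrix, the orientation returned by flowCore's exprs(). Building the actual flowFrame/flowSet objects is omitted, and the data and barcode IDs are simulated.

```r
# Simulated expression matrix in SCE orientation (rows = channels,
# columns = cells) and a per-cell grouping column (hypothetical IDs).
set.seed(4)
es <- matrix(rnorm(3 * 10), nrow = 3,
             dimnames = list(c("ch1", "ch2", "ch3"), NULL))
bc_id <- c("A1", "A2", "A1", "B1", "A2", "A1", "B1", "B1", "A1", "A2")

# group cell indices by barcode, then transpose each subset so that
# rows = events and columns = channels
parts <- lapply(split(seq_len(ncol(es)), bc_id),
                function(i) t(es[, i, drop = FALSE]))

# number of events per group
sapply(parts, nrow)
```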
sce2fcs(x, split_by = NULL, keep_cd = FALSE, keep_dr = FALSE, assay = "counts")
x |
|
split_by |
NULL or a character string
specifying a |
keep_cd , keep_dr
|
logicals specifying whether cell metadata
(stored in |
assay |
a character string specifying
which assay data to use; valid values are |
a flowFrame if split_by = NULL; otherwise a flowSet.
Helena L Crowell
# PREPROCESSING
data(sample_ff, sample_key)
sce <- prepData(sample_ff, by_time = FALSE)
sce <- assignPrelim(sce, sample_key, verbose = FALSE)

# split SCE by barcode population
fs <- sce2fcs(sce, split_by = "bc_id")

# do some spot checks
library(flowCore)
library(SingleCellExperiment)
length(fs) == nrow(sample_key)
all(fsApply(fs, nrow)[, 1] == table(sce$bc_id))
# sce2fcs() exports the "counts" assay by default
identical(t(exprs(fs[[1]])), assay(sce, "counts")[, sce$bc_id == "A1"])

# DIFFERENTIAL ANALYSIS
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, verbose = FALSE)

# split by 20 metacluster populations
sce$meta20 <- cluster_ids(sce, "meta20")
fs <- sce2fcs(sce, split_by = "meta20", assay = "exprs")
all(fsApply(fs, nrow)[, 1] == table(sce$meta20))