Package 'decontX' reference manual

Title:	Decontamination of single cell genomics data
Description:	This package contains implementation of DecontX (Yang et al. 2020), a decontamination algorithm for single-cell RNA-seq, and DecontPro (Yin et al. 2023), a decontamination algorithm for single cell protein expression data. DecontX is a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. DecontPro is a Bayesian method that estimates the level of contamination from ambient and background sources in CITE-seq ADT dataset and decontaminate the dataset.
Authors:	Yuan Yin [aut] , Masanao Yajima [aut] , Joshua Campbell [aut, cre]
Maintainer:	Joshua Campbell <[email protected]>
License:	MIT + file LICENSE
Version:	1.5.1
Built:	2025-03-19 06:11:28 UTC
Source:	https://github.com/bioc/decontX

The 'decontX' package.

Description

DecontX is a decontamination algorithm for single-cell RNA-seq data. DecontPro is a decontamination and background removal algorithm for single cell protein expression data such as CITE-seq or Total-seq.

References

Stan Development Team (2022). RStan: the R interface to Stan. R package version 2.21.7. https://mc-stan.org

Call Stan variational bayes for inference

Description

Call Stan variational bayes for inference

Usage

.call_stan_vb(data, initial_condition)
.call_stan_vb(data, initial_condition)

Arguments

`data`	A list of input data for Stan.
`initial_condition`	Initial values for Stan params.

Value

Stan output

Process Stan output.

Description

Process Stan output.

Usage

.process_stan_vb_out(stan_vb_output, dat)
.process_stan_vb_out(stan_vb_output, dat)

Arguments

`stan_vb_output`	Stan variational bayes output
`dat`	List of data input to stan vb

Value

Decomposed counts based on Stan estimate.

Decontaminate using decontPro

Description

Decontaminate using decontPro

Usage

decontPro(filtered_counts, cell_type, ...)

## S4 method for signature 'SingleCellExperiment'
decontPro(
  filtered_counts,
  cell_type,
  delta_sd = 2e-05,
  background_sd = 2e-06,
  ambient_counts = NULL,
  ...
)

## S4 method for signature 'Seurat'
decontPro(
  filtered_counts,
  cell_type,
  delta_sd = 2e-05,
  background_sd = 2e-06,
  ambient_counts = NULL,
  ...
)

## S4 method for signature 'ANY'
decontPro(
  filtered_counts,
  cell_type,
  delta_sd = 2e-05,
  background_sd = 2e-06,
  ambient_counts = NULL,
  ...
)
decontPro(filtered_counts, cell_type, ...)

## S4 method for signature 'SingleCellExperiment'
decontPro(
  filtered_counts,
  cell_type,
  delta_sd = 2e-05,
  background_sd = 2e-06,
  ambient_counts = NULL,
  ...
)

## S4 method for signature 'Seurat'
decontPro(
  filtered_counts,
  cell_type,
  delta_sd = 2e-05,
  background_sd = 2e-06,
  ambient_counts = NULL,
  ...
)

## S4 method for signature 'ANY'
decontPro(
  filtered_counts,
  cell_type,
  delta_sd = 2e-05,
  background_sd = 2e-06,
  ambient_counts = NULL,
  ...
)

Arguments

`filtered_counts`	Count matrix NxM (feature x droplet) with only filtered droplets after cell calling. If this is a SingleCellExperiment or a Seurat object, expect `counts` in the assay slot.
`cell_type`	1xM 1-based integer vector indicating cell type of each droplet.
`...`	Additional arguments for generics.
`delta_sd`	Prior variance for ambient contamination level. Default to 2e-5.
`background_sd`	Prior variance for background contamination level. Default to 2e-6.
`ambient_counts`	Count matrix NxM (feature x droplet) with only ambient droplets. Similar to `filtered_counts` param, if it is a wrapper object, expect `counts` in the assay slot. Default to NULL.

Value

A list containing decontaminated counts, and estimated parameters.

Examples

# Simulated count matrix with 100 features x 10 droplets
counts <- matrix(sample(1:10,
                        1000,
                        replace = TRUE),
                 ncol = 10)

# Cell type indicator
k <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4)

# Simulated ambient count matrix (optional input)
ambient_counts <- matrix(sample(1:2,
                        1000,
                        replace = TRUE),
                 ncol = 10)

# Decontamination
out <- decontPro(counts, k, 1e-2, 1e-2, ambient_counts)

# Decontaminated counts
decontaminated_counts <- out$decontaminated_counts
# Simulated count matrix with 100 features x 10 droplets
counts <- matrix(sample(1:10,
                        1000,
                        replace = TRUE),
                 ncol = 10)

# Cell type indicator
k <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4)

# Simulated ambient count matrix (optional input)
ambient_counts <- matrix(sample(1:2,
                        1000,
                        replace = TRUE),
                 ncol = 10)

# Decontamination
out <- decontPro(counts, k, 1e-2, 1e-2, ambient_counts)

# Decontaminated counts
decontaminated_counts <- out$decontaminated_counts

Contamination estimation with decontX

Description

Identifies contamination from factors such as ambient RNA in single cell genomic datasets.

Usage

decontX(x, ...)

## S4 method for signature 'SingleCellExperiment'
decontX(
  x,
  assayName = "counts",
  z = NULL,
  batch = NULL,
  background = NULL,
  bgAssayName = NULL,
  bgBatch = NULL,
  maxIter = 500,
  delta = c(10, 10),
  estimateDelta = TRUE,
  convergence = 0.001,
  iterLogLik = 10,
  varGenes = 5000,
  dbscanEps = 1,
  seed = 12345,
  logfile = NULL,
  verbose = TRUE
)

## S4 method for signature 'ANY'
decontX(
  x,
  z = NULL,
  batch = NULL,
  background = NULL,
  bgBatch = NULL,
  maxIter = 500,
  delta = c(10, 10),
  estimateDelta = TRUE,
  convergence = 0.001,
  iterLogLik = 10,
  varGenes = 5000,
  dbscanEps = 1,
  seed = 12345,
  logfile = NULL,
  verbose = TRUE
)
decontX(x, ...)

## S4 method for signature 'SingleCellExperiment'
decontX(
  x,
  assayName = "counts",
  z = NULL,
  batch = NULL,
  background = NULL,
  bgAssayName = NULL,
  bgBatch = NULL,
  maxIter = 500,
  delta = c(10, 10),
  estimateDelta = TRUE,
  convergence = 0.001,
  iterLogLik = 10,
  varGenes = 5000,
  dbscanEps = 1,
  seed = 12345,
  logfile = NULL,
  verbose = TRUE
)

## S4 method for signature 'ANY'
decontX(
  x,
  z = NULL,
  batch = NULL,
  background = NULL,
  bgBatch = NULL,
  maxIter = 500,
  delta = c(10, 10),
  estimateDelta = TRUE,
  convergence = 0.001,
  iterLogLik = 10,
  varGenes = 5000,
  dbscanEps = 1,
  seed = 12345,
  logfile = NULL,
  verbose = TRUE
)

Arguments

`x`	A numeric matrix of counts or a SingleCellExperiment with the matrix located in the assay slot under `assayName`. Cells in each batch will be subsetted and converted to a sparse matrix of class `dgCMatrix` from package Matrix before analysis. This object should only contain filtered cells after cell calling. Empty cell barcodes (low expression droplets before cell calling) are not needed to run DecontX.
`...`	For the generic, further arguments to pass to each method.
`assayName`	Character. Name of the assay to use if `x` is a SingleCellExperiment.
`z`	Numeric or character vector. Cell cluster labels. If NULL, PCA will be used to reduce the dimensionality of the dataset initially, 'umap' from the 'uwot' package will be used to further reduce the dataset to 2 dimenions and the 'dbscan' function from the 'dbscan' package will be used to identify clusters of broad cell types. Default NULL.
`batch`	Numeric or character vector. Batch labels for cells. If batch labels are supplied, DecontX is run on cells from each batch separately. Cells run in different channels or assays should be considered different batches. Default NULL.
`background`	A numeric matrix of counts or a SingleCellExperiment with the matrix located in the assay slot under `assayName`. It should have the same data format as `x` except it contains the empty droplets instead of cells. When supplied, empirical distribution of transcripts from these empty droplets will be used as the contamination distribution. Default NULL.
`bgAssayName`	Character. Name of the assay to use if `background` is a SingleCellExperiment. Default to same as `assayName`.
`bgBatch`	Numeric or character vector. Batch labels for `background`. Its unique values should be the same as those in `batch`, such that each batch of cells have their corresponding batch of empty droplets as background, pointed by this parameter. Default to NULL.
`maxIter`	Integer. Maximum iterations of the EM algorithm. Default 500.
`delta`	Numeric Vector of length 2. Concentration parameters for the Dirichlet prior for the contamination in each cell. The first element is the prior for the native counts while the second element is the prior for the contamination counts. These essentially act as pseudocounts for the native and contamination in each cell. If `estimateDelta = TRUE`, this is only used to produce a random sample of proportions for an initial value of contamination in each cell. Then `fit_dirichlet` is used to update `delta` in each iteration. If `estimateDelta = FALSE`, then `delta` is fixed with these values for the entire inference procedure. Fixing `delta` and setting a high number in the second element will force `decontX` to be more aggressive and estimate higher levels of contamination at the expense of potentially removing native expression. Default `c(10, 10)`.
`estimateDelta`	Boolean. Whether to update `delta` at each iteration.
`convergence`	Numeric. The EM algorithm will be stopped if the maximum difference in the contamination estimates between the previous and current iterations is less than this. Default 0.001.
`iterLogLik`	Integer. Calculate log likelihood every `iterLogLik` iteration. Default 10.
`varGenes`	Integer. The number of variable genes to use in dimensionality reduction before clustering. Variability is calcualted using `modelGeneVar` function from the 'scran' package. Used only when z is not provided. Default 5000.
`dbscanEps`	Numeric. The clustering resolution parameter used in 'dbscan' to estimate broad cell clusters. Used only when z is not provided. Default 1.
`seed`	Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
`logfile`	Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL.
`verbose`	Logical. Whether to print log messages. Default TRUE.

Value

If x is a matrix-like object, a list will be returned with the following items:

decontXcounts:: The decontaminated matrix. Values obtained from the variational inference procedure may be non-integer. However, integer counts can be obtained by rounding, e.g. round(decontXcounts).
contamination:: Percentage of contamination in each cell.
estimates:: List of estimated parameters for each batch. If z was not supplied, then the UMAP coordinates used to generated cell cluster labels will also be stored here.
z:: Cell population/cluster labels used for analysis.
runParams:: List of arguments used in the function call.

If x is a SingleCellExperiment, then the decontaminated counts will be stored as an assay and can be accessed with decontXcounts(x). The contamination values and cluster labels will be stored in colData(x). estimates and runParams will be stored in metadata(x)$decontX. The UMAPs used to generated cell cluster labels will be stored in reducedDims slot in x.

Author(s)

Shiyi Yang, Yuan Yin, Joshua Campbell

Examples

# Generate matrix with contamination
s <- simulateContamination(seed = 12345)

library(SingleCellExperiment)
library(celda)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)

# Plot contamination on UMAP
plotDecontXContamination(sce)

# Plot decontX cluster labels
umap <- reducedDim(sce)
celda::plotDimReduceCluster(x = sce$decontX_clusters,
    dim1 = umap[, 1], dim2 = umap[, 2], )

# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")

# Plot percentage of marker genes detected
# in each cell cluster after contamination
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = "decontXcounts")

# Plot percentage of marker genes detected in each cell
# comparing original and decontaminated counts side-by-side
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = c("counts", "decontXcounts"))

# Plot raw counts of indiviual markers genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))
# Generate matrix with contamination
s <- simulateContamination(seed = 12345)

library(SingleCellExperiment)
library(celda)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)

# Plot contamination on UMAP
plotDecontXContamination(sce)

# Plot decontX cluster labels
umap <- reducedDim(sce)
celda::plotDimReduceCluster(x = sce$decontX_clusters,
    dim1 = umap[, 1], dim2 = umap[, 2], )

# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")

# Plot percentage of marker genes detected
# in each cell cluster after contamination
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = "decontXcounts")

# Plot percentage of marker genes detected in each cell
# comparing original and decontaminated counts side-by-side
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = c("counts", "decontXcounts"))

# Plot raw counts of indiviual markers genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))

Get or set decontaminated counts matrix

Description

Gets or sets the decontaminated counts matrix from a a SingleCellExperiment object.

Usage

decontXcounts(object, ...)

decontXcounts(object, ...) <- value

## S4 method for signature 'SingleCellExperiment'
decontXcounts(object, ...)

## S4 replacement method for signature 'SingleCellExperiment'
decontXcounts(object, ...) <- value
decontXcounts(object, ...)

decontXcounts(object, ...) <- value

## S4 method for signature 'SingleCellExperiment'
decontXcounts(object, ...)

## S4 replacement method for signature 'SingleCellExperiment'
decontXcounts(object, ...) <- value

Arguments

`object`	A SingleCellExperiment object.
`...`	For the generic, further arguments to pass to each method.
`value`	A matrix to save as an assay called `decontXcounts`

Value

If getting, the assay from object with the name decontXcounts will be returned. If setting, a SingleCellExperiment object will be returned with decontXcounts listed in the assay slot.

Fast normalization for numeric matrix

Description

Fast normalization for numeric matrix

Usage

fastNormProp(R_counts, R_alpha)
fastNormProp(R_counts, R_alpha)

Arguments

`R_counts`	An integer matrix
`R_alpha`	A double value to be added to the matrix as a pseudocount

Value

A numeric matrix where the columns have been normalized to proportions

Fast normalization for numeric matrix

Description

Fast normalization for numeric matrix

Usage

fastNormPropLog(R_counts, R_alpha)
fastNormPropLog(R_counts, R_alpha)

Arguments

`R_counts`	An integer matrix
`R_alpha`	A double value to be added to the matrix as a pseudocount

Value

A numeric matrix where the columns have been normalized to proportions

Fast normalization for numeric matrix

Description

Fast normalization for numeric matrix

Usage

fastNormPropSqrt(R_counts, R_alpha)
fastNormPropSqrt(R_counts, R_alpha)

Arguments

`R_counts`	An integer matrix
`R_alpha`	A double value to be added to the matrix as a pseudocount

Value

A numeric matrix where the columns have been normalized to proportions

get row and column indices of none zero elements in the matrix

Description

get row and column indices of none zero elements in the matrix

Usage

nonzero(R_counts)
nonzero(R_counts)

Arguments

R_counts

A matrix

Value

An integer matrix where each row is a row, column indices pair

Boxplot of features grouped by cell type

Description

Boxplot of features grouped by cell type

Usage

plotBoxByCluster(
  counts,
  decontaminated_counts,
  cell_type,
  features,
  file = NULL
)
plotBoxByCluster(
  counts,
  decontaminated_counts,
  cell_type,
  features,
  file = NULL
)

Arguments

`counts`	original count matrix of nADT x nDroplet.
`decontaminated_counts`	decontaminated count matrix.
`cell_type`	1xnDroplet vector of cell_type.
`features`	names of ADT to plot
`file`	file name to save plot into a pdf. If omit, return `ggplot` object.

Value

Return a pdf file named file or a ggplot object.

Examples

# Simulate a dataset with 3 cells and 2 ADTs
counts <- matrix(c(60, 72, 52, 49, 89, 112),
                 nrow = 2,
                 dimnames = list(c('CD3', 'CD4'),
                                 c('CTGTTTACACCGCTAG',
                                   'CTCTACGGTGTGGCTC',
                                   'AGCAGCCAGGCTCATT')))

decontaminated_counts <- matrix(c(58, 36, 26, 45, 88, 110),
                                nrow = 2,
                                dimnames = list(c('CD3', 'CD4'),
                                                c('CTGTTTACACCGCTAG',
                                                  'CTCTACGGTGTGGCTC',
                                                  'AGCAGCCAGGCTCATT')))

plotBoxByCluster(counts,
                 decontaminated_counts,
                 c(1, 2, 1),
                 c('CD3', 'CD4'))
# Simulate a dataset with 3 cells and 2 ADTs
counts <- matrix(c(60, 72, 52, 49, 89, 112),
                 nrow = 2,
                 dimnames = list(c('CD3', 'CD4'),
                                 c('CTGTTTACACCGCTAG',
                                   'CTCTACGGTGTGGCTC',
                                   'AGCAGCCAGGCTCATT')))

decontaminated_counts <- matrix(c(58, 36, 26, 45, 88, 110),
                                nrow = 2,
                                dimnames = list(c('CD3', 'CD4'),
                                                c('CTGTTTACACCGCTAG',
                                                  'CTCTACGGTGTGGCTC',
                                                  'AGCAGCCAGGCTCATT')))

plotBoxByCluster(counts,
                 decontaminated_counts,
                 c(1, 2, 1),
                 c('CD3', 'CD4'))

Plots contamination on UMAP coordinates

Description

A scatter plot of the UMAP dimensions generated by DecontX with cells colored by the estimated percentation of contamation.

Usage

plotDecontXContamination(
  x,
  batch = NULL,
  colorScale = c("blue", "green", "yellow", "orange", "red"),
  size = 1
)
plotDecontXContamination(
  x,
  batch = NULL,
  colorScale = c("blue", "green", "yellow", "orange", "red"),
  size = 1
)

Arguments

`x`	Either a SingleCellExperiment with `decontX` results stored in `metadata(x)$decontX` or the result from running decontX on a count matrix.
`batch`	Character. Batch of cells to plot. If `NULL`, then the first batch in the list will be selected. Default `NULL`.
`colorScale`	Character vector. Contains the color spectrum to be passed to `scale_colour_gradientn` from package 'ggplot2'. Default c("blue","green","yellow","orange","red").
`size`	Numeric. Size of points in the scatterplot. Default 1.

Value

Returns a ggplot object.

Author(s)

Shiyi Yang, Joshua Campbell

Examples

# Generate matrix with contamination
s <- simulateContamination(seed = 12345)

library(SingleCellExperiment)
library(celda)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)

# Plot contamination on UMAP
plotDecontXContamination(sce)

# Plot decontX cluster labels
umap <- reducedDim(sce)
celda::plotDimReduceCluster(x = sce$decontX_clusters,
    dim1 = umap[, 1], dim2 = umap[, 2], )

# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")

# Plot percentage of marker genes detected
# in each cell cluster after contamination
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = "decontXcounts")

# Plot percentage of marker genes detected in each cell
# comparing original and decontaminated counts side-by-side
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = c("counts", "decontXcounts"))

# Plot raw counts of indiviual markers genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))
# Generate matrix with contamination
s <- simulateContamination(seed = 12345)

library(SingleCellExperiment)
library(celda)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)

# Plot contamination on UMAP
plotDecontXContamination(sce)

# Plot decontX cluster labels
umap <- reducedDim(sce)
celda::plotDimReduceCluster(x = sce$decontX_clusters,
    dim1 = umap[, 1], dim2 = umap[, 2], )

# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")

# Plot percentage of marker genes detected
# in each cell cluster after contamination
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = "decontXcounts")

# Plot percentage of marker genes detected in each cell
# comparing original and decontaminated counts side-by-side
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = c("counts", "decontXcounts"))

# Plot raw counts of indiviual markers genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))

Plots expression of marker genes before and after decontamination

Description

Generates a violin plot that shows the counts of marker genes in cells across specific clusters or cell types. Can be used to view the expression of marker genes in different cell types before and after decontamination with decontX.

Usage

plotDecontXMarkerExpression(
  x,
  markers,
  groupClusters = NULL,
  assayName = c("counts", "decontXcounts"),
  z = NULL,
  exactMatch = TRUE,
  by = "rownames",
  log1p = FALSE,
  ncol = NULL,
  plotDots = FALSE,
  dotSize = 0.1
)
plotDecontXMarkerExpression(
  x,
  markers,
  groupClusters = NULL,
  assayName = c("counts", "decontXcounts"),
  z = NULL,
  exactMatch = TRUE,
  by = "rownames",
  log1p = FALSE,
  ncol = NULL,
  plotDots = FALSE,
  dotSize = 0.1
)

Arguments

`x`	Either a SingleCellExperiment or a matrix-like object of counts.
`markers`	Character Vector or List. A character vector or list of character vectors with the names of the marker genes of interest.
`groupClusters`	List. A named list that allows cell clusters labels coded in `z` to be regrouped and renamed on the fly. For example, `list(Tcells=c(1, 2), Bcells=7)` would recode clusters 1 and 2 to "Tcells" and cluster 7 to "Bcells". Note that if this is used, clusters in `z` not found in `groupClusters` will be excluded. Default `NULL`.
`assayName`	Character vector. Name(s) of the assay(s) to plot if `x` is a SingleCellExperiment. If more than one assay is listed, then side-by-side violin plots will be generated. Default `c("counts", "decontXcounts")`.
`z`	Character, Integer, or Vector. Indicates the cluster labels for each cell. If `x` is a SingleCellExperiment and `z = NULL`, then the cluster labels from `decontX` will be retreived from the `colData` of `x` (i.e. `colData(x)$decontX_clusters`). If `z` is a single character or integer, then that column will be retrived from `colData` of `x`. (i.e. `colData(x)[,z]`). If `x` is a counts matrix, then `z` will need to be a vector the same length as the number of columns in `x` that indicate the cluster to which each cell belongs. Default `NULL`.
`exactMatch`	Boolean. Whether to only identify exact matches for the markers or to identify partial matches using `grep`. See `retrieveFeatureIndex` for more details. Default `TRUE`.
`by`	Character. Where to search for the markers if `x` is a SingleCellExperiment. See `retrieveFeatureIndex` for more details. If `x` is a matrix, then this must be set to `"rownames"`. Default `"rownames"`.
`log1p`	Boolean. Whether to apply the function `log1p` to the data before plotting. This function will add a pseudocount of 1 and then log transform the expression values. Default `FALSE`.
`ncol`	Integer. Number of columns to make in the plot. Default `NULL`.
`plotDots`	Boolean. If `TRUE`, the expression of features will be plotted as points in addition to the violin curve. Default `FALSE`.
`dotSize`	Numeric. Size of points if `plotDots = TRUE`. Default `0.1`.

Value

Returns a ggplot object.

Author(s)

Shiyi Yang, Joshua Campbell

Examples

# Generate matrix with contamination
s <- simulateContamination(seed = 12345)

library(SingleCellExperiment)
library(celda)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)

# Plot contamination on UMAP
plotDecontXContamination(sce)

# Plot decontX cluster labels
umap <- reducedDim(sce)
celda::plotDimReduceCluster(x = sce$decontX_clusters,
    dim1 = umap[, 1], dim2 = umap[, 2], )

# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")

# Plot percentage of marker genes detected
# in each cell cluster after contamination
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = "decontXcounts")

# Plot percentage of marker genes detected in each cell
# comparing original and decontaminated counts side-by-side
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = c("counts", "decontXcounts"))

# Plot raw counts of indiviual markers genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))
# Generate matrix with contamination
s <- simulateContamination(seed = 12345)

library(SingleCellExperiment)
library(celda)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)

# Plot contamination on UMAP
plotDecontXContamination(sce)

# Plot decontX cluster labels
umap <- reducedDim(sce)
celda::plotDimReduceCluster(x = sce$decontX_clusters,
    dim1 = umap[, 1], dim2 = umap[, 2], )

# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")

# Plot percentage of marker genes detected
# in each cell cluster after contamination
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = "decontXcounts")

# Plot percentage of marker genes detected in each cell
# comparing original and decontaminated counts side-by-side
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = c("counts", "decontXcounts"))

# Plot raw counts of indiviual markers genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))

Plots percentage of cells cell types expressing markers

Description

Generates a barplot that shows the percentage of cells within clusters or cell types that have detectable levels of given marker genes. Can be used to view the expression of marker genes in different cell types before and after decontamination with decontX.

Usage

plotDecontXMarkerPercentage(
  x,
  markers,
  groupClusters = NULL,
  assayName = c("counts", "decontXcounts"),
  z = NULL,
  threshold = 1,
  exactMatch = TRUE,
  by = "rownames",
  ncol = round(sqrt(length(markers))),
  labelBars = TRUE,
  labelSize = 3
)
plotDecontXMarkerPercentage(
  x,
  markers,
  groupClusters = NULL,
  assayName = c("counts", "decontXcounts"),
  z = NULL,
  threshold = 1,
  exactMatch = TRUE,
  by = "rownames",
  ncol = round(sqrt(length(markers))),
  labelBars = TRUE,
  labelSize = 3
)

Arguments

`x`	Either a SingleCellExperiment or a matrix-like object of counts.
`markers`	List. A named list indicating the marker genes for each cell type of interest. Multiple markers can be supplied for each cell type. For example, `list(Tcell_Markers=c("CD3E", "CD3D"), Bcell_Markers=c("CD79A", "CD79B", "MS4A1")` would specify markers for human T-cells and B-cells. A cell will be considered "positive" for a cell type if it has a count greater than `threshold` for at least one of the marker genes in the list.
`groupClusters`	List. A named list that allows cell clusters labels coded in `z` to be regrouped and renamed on the fly. For example, `list(Tcells=c(1, 2), Bcells=7)` would recode clusters 1 and 2 to "Tcells" and cluster 7 to "Bcells". Note that if this is used, clusters in `z` not found in `groupClusters` will be excluded from the barplot. Default `NULL`.
`assayName`	Character vector. Name(s) of the assay(s) to plot if `x` is a SingleCellExperiment. If more than one assay is listed, then side-by-side barplots will be generated. Default `c("counts", "decontXcounts")`.
`z`	Character, Integer, or Vector. Indicates the cluster labels for each cell. If `x` is a SingleCellExperiment and `z = NULL`, then the cluster labels from `decontX` will be retived from the `colData` of `x` (i.e. `colData(x)$decontX_clusters`). If `z` is a single character or integer, then that column will be retrived from `colData` of `x`. (i.e. `colData(x)[,z]`). If `x` is a counts matrix, then `z` will need to be a vector the same length as the number of columns in `x` that indicate the cluster to which each cell belongs. Default `NULL`.
`threshold`	Numeric. Markers greater than or equal to this value will be considered detected in a cell. Default 1.
`exactMatch`	Boolean. Whether to only identify exact matches for the markers or to identify partial matches using `grep`. See `retrieveFeatureIndex` for more details. Default `TRUE`.
`by`	Character. Where to search for the markers if `x` is a SingleCellExperiment. See `retrieveFeatureIndex` for more details. If `x` is a matrix, then this must be set to `"rownames"`.Default `"rownames"`.
`ncol`	Integer. Number of columns to make in the plot. Default `round(sqrt(length(markers))`.
`labelBars`	Boolean. Whether to display percentages above each bar Default `TRUE`.
`labelSize`	Numeric. Size of the percentage labels in the barplot. Default 3.

Value

Returns a ggplot object.

Author(s)

Shiyi Yang, Joshua Campbell

Examples

# Generate matrix with contamination
s <- simulateContamination(seed = 12345)

library(SingleCellExperiment)
library(celda)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)

# Plot contamination on UMAP
plotDecontXContamination(sce)

# Plot decontX cluster labels
umap <- reducedDim(sce)
celda::plotDimReduceCluster(x = sce$decontX_clusters,
    dim1 = umap[, 1], dim2 = umap[, 2], )

# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")

# Plot percentage of marker genes detected
# in each cell cluster after contamination
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = "decontXcounts")

# Plot percentage of marker genes detected in each cell
# comparing original and decontaminated counts side-by-side
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = c("counts", "decontXcounts"))

# Plot raw counts of indiviual markers genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))
# Generate matrix with contamination
s <- simulateContamination(seed = 12345)

library(SingleCellExperiment)
library(celda)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)

# Plot contamination on UMAP
plotDecontXContamination(sce)

# Plot decontX cluster labels
umap <- reducedDim(sce)
celda::plotDimReduceCluster(x = sce$decontX_clusters,
    dim1 = umap[, 1], dim2 = umap[, 2], )

# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")

# Plot percentage of marker genes detected
# in each cell cluster after contamination
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = "decontXcounts")

# Plot percentage of marker genes detected in each cell
# comparing original and decontaminated counts side-by-side
plotDecontXMarkerPercentage(sce, markers = s$markers,
                            assayName = c("counts", "decontXcounts"))

# Plot raw counts of indiviual markers genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))

Density of each ADT, raw counts overlapped with decontaminated counts

Description

Density of each ADT, raw counts overlapped with decontaminated counts

Usage

plotDensity(counts, decontaminated_counts, features, file = NULL)
plotDensity(counts, decontaminated_counts, features, file = NULL)

Arguments

`counts`	original count matrix of nADT x nDroplet.
`decontaminated_counts`	decontaminated count matrix.
`features`	names of ADT to plot
`file`	file name to save plot into a pdf. If omit, return `ggplot` object.

Value

Return a pdf file named file or a ggplot object.

Examples

# Simulate a dataset with 3 cells and 2 ADTs
counts <- matrix(c(60, 72, 52, 49, 89, 112),
                 nrow = 2,
                 dimnames = list(c('CD3', 'CD4'),
                                 c('CTGTTTACACCGCTAG',
                                   'CTCTACGGTGTGGCTC',
                                   'AGCAGCCAGGCTCATT')))

decontaminated_counts <- matrix(c(58, 36, 26, 45, 88, 110),
                                nrow = 2,
                                dimnames = list(c('CD3', 'CD4'),
                                                c('CTGTTTACACCGCTAG',
                                                  'CTCTACGGTGTGGCTC',
                                                  'AGCAGCCAGGCTCATT')))

plotDensity(counts,
            decontaminated_counts,
            c('CD3', 'CD4'))
# Simulate a dataset with 3 cells and 2 ADTs
counts <- matrix(c(60, 72, 52, 49, 89, 112),
                 nrow = 2,
                 dimnames = list(c('CD3', 'CD4'),
                                 c('CTGTTTACACCGCTAG',
                                   'CTCTACGGTGTGGCTC',
                                   'AGCAGCCAGGCTCATT')))

decontaminated_counts <- matrix(c(58, 36, 26, 45, 88, 110),
                                nrow = 2,
                                dimnames = list(c('CD3', 'CD4'),
                                                c('CTGTTTACACCGCTAG',
                                                  'CTCTACGGTGTGGCTC',
                                                  'AGCAGCCAGGCTCATT')))

plotDensity(counts,
            decontaminated_counts,
            c('CD3', 'CD4'))

Retrieve row index for a set of features

Description

This will return indices of features among the rownames or rowData of a data.frame, matrix, or a SummarizedExperiment object including a SingleCellExperiment. Partial matching (i.e. grepping) can be used by setting exactMatch = FALSE.

Usage

retrieveFeatureIndex(
  features,
  x,
  by = "rownames",
  exactMatch = TRUE,
  removeNA = FALSE
)
retrieveFeatureIndex(
  features,
  x,
  by = "rownames",
  exactMatch = TRUE,
  removeNA = FALSE
)

Arguments

`features`	Character vector of feature names to find in the rows of `x`.
`x`	A data.frame, matrix, or SingleCellExperiment object to search.
`by`	Character. Where to search for features in `x`. If set to `"rownames"` then the features will be searched for among `rownames(x)`. If `x` inherits from class SummarizedExperiment, then `by` can be one of the fields in the row annotation data.frame (i.e. one of `colnames(rowData(x))`).
`exactMatch`	Boolean. Whether to only identify exact matches or to identify partial matches using `grep`.
`removeNA`	Boolean. If set to `FALSE`, features not found in `x` will be given `NA` and the returned vector will be the same length as `features`. If set to `TRUE`, then the `NA` values will be removed from the returned vector. Default `FALSE`.

Value

A vector of row indices for the matching features in x.

Author(s)

Yusuke Koga, Joshua Campbell

Examples

counts <- matrix(sample(1:10, 20*10, replace = TRUE),
                 nrow = 20, ncol = 10,
                 dimnames = list(paste0("Gene_",1:20),
                                 paste0("Cell_", 1:10)))
retrieveFeatureIndex(c("Gene_1", "Gene_5"), counts)
retrieveFeatureIndex(c("Gene_1", "Gene_5"), counts, exactMatch = FALSE)
counts <- matrix(sample(1:10, 20*10, replace = TRUE),
                 nrow = 20, ncol = 10,
                 dimnames = list(paste0("Gene_",1:20),
                                 paste0("Cell_", 1:10)))
retrieveFeatureIndex(c("Gene_1", "Gene_5"), counts)
retrieveFeatureIndex(c("Gene_1", "Gene_5"), counts, exactMatch = FALSE)

Simulate contaminated count matrix

Description

This function generates a list containing two count matrices – one for real expression, the other one for contamination, as well as other parameters used in the simulation which can be useful for running decontamination.

Usage

simulateContamination(
  C = 300,
  G = 100,
  K = 3,
  NRange = c(500, 1000),
  beta = 0.1,
  delta = c(1, 10),
  numMarkers = 3,
  seed = 12345
)
simulateContamination(
  C = 300,
  G = 100,
  K = 3,
  NRange = c(500, 1000),
  beta = 0.1,
  delta = c(1, 10),
  numMarkers = 3,
  seed = 12345
)

Arguments

`C`	Integer. Number of cells to be simulated. Default `300`.
`G`	Integer. Number of genes to be simulated. Default `100`.
`K`	Integer. Number of cell populations to be simulated. Default `3`.
`NRange`	Integer vector. A vector of length 2 that specifies the lower and upper bounds of the number of counts generated for each cell. Default `c(500, 1000)`.
`beta`	Numeric. Concentration parameter for Phi. Default `0.1`.
`delta`	Numeric or Numeric vector. Concentration parameter for Theta. If input as a single numeric value, symmetric values for beta distribution are specified; if input as a vector of lenght 2, the two values will be the shape1 and shape2 paramters of the beta distribution respectively. Default `c(1, 5)`.
`numMarkers`	Integer. Number of markers for each cell population. Default `3`.
`seed`	Integer. Passed to `with_seed`. For reproducibility, a default value of 12345 is used. If NULL, no calls to `with_seed` are made.

Value

A list containing the nativeMatirx (real expression), observedMatrix (real expression + contamination), as well as other parameters used in the simulation.

Author(s)

Shiyi Yang, Yuan Yin, Joshua Campbell

Examples

contaminationSim <- simulateContamination(K = 3, delta = c(1, 10))
contaminationSim <- simulateContamination(K = 3, delta = c(1, 10))

Package 'decontX'

Help Index

The 'decontX' package.

Description

References

Call Stan variational bayes for inference

Description

Usage

Arguments

Value

Process Stan output.

Description

Usage

Arguments

Value

Decontaminate using decontPro

Description

Usage

Arguments

Value

Examples

Contamination estimation with decontX

Description

Usage

Arguments

Value

Author(s)

Examples

Get or set decontaminated counts matrix

Description

Usage

Arguments

Value

See Also

Fast normalization for numeric matrix

Description

Usage

Arguments

Value

Fast normalization for numeric matrix

Description

Usage

Arguments

Value

Fast normalization for numeric matrix

Description

Usage

Arguments

Value

get row and column indices of none zero elements in the matrix

Description

Usage

Arguments

Value

Boxplot of features grouped by cell type

Description

Usage

Arguments

Value

Examples

Plots contamination on UMAP coordinates

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Plots expression of marker genes before and after decontamination

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Plots percentage of cells cell types expressing markers

Description

Usage

Arguments