Title: | Decontamination of single cell genomics data |
---|---|
Description: | This package contains implementation of DecontX (Yang et al. 2020), a decontamination algorithm for single-cell RNA-seq, and DecontPro (Yin et al. 2023), a decontamination algorithm for single cell protein expression data. DecontX is a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. DecontPro is a Bayesian method that estimates the level of contamination from ambient and background sources in CITE-seq ADT dataset and decontaminate the dataset. |
Authors: | Yuan Yin [aut, cre] , Masanao Yajima [aut] , Joshua Campbell [aut] |
Maintainer: | Yuan Yin <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.5.0 |
Built: | 2024-10-30 05:48:10 UTC |
Source: | https://github.com/bioc/decontX |
A DESCRIPTION OF THE PACKAGE
Stan Development Team (2022). RStan: the R interface to Stan. R package version 2.21.7. https://mc-stan.org
Call Stan variational bayes for inference
.call_stan_vb(data, initial_condition)
.call_stan_vb(data, initial_condition)
data |
A list of input data for Stan. |
initial_condition |
Initial values for Stan params. |
Stan output
Process Stan output.
.process_stan_vb_out(stan_vb_output, dat)
.process_stan_vb_out(stan_vb_output, dat)
stan_vb_output |
Stan variational bayes output |
dat |
List of data input to stan vb |
Decomposed counts based on Stan estimate.
Decontaminate using decontPro
decontPro(object, cell_type, ...) ## S4 method for signature 'SingleCellExperiment' decontPro(object, cell_type, delta_sd = 2e-05, background_sd = 2e-06, ...) ## S4 method for signature 'Seurat' decontPro(object, cell_type, delta_sd = 2e-05, background_sd = 2e-06, ...) ## S4 method for signature 'ANY' decontPro(object, cell_type, delta_sd = 2e-05, background_sd = 2e-06, ...)
decontPro(object, cell_type, ...) ## S4 method for signature 'SingleCellExperiment' decontPro(object, cell_type, delta_sd = 2e-05, background_sd = 2e-06, ...) ## S4 method for signature 'Seurat' decontPro(object, cell_type, delta_sd = 2e-05, background_sd = 2e-06, ...) ## S4 method for signature 'ANY' decontPro(object, cell_type, delta_sd = 2e-05, background_sd = 2e-06, ...)
object |
Data matrix NxM (feature x droplet). |
cell_type |
1xM vector of cell type. 1-based. |
... |
Additional arguments for generics. |
delta_sd |
Prior variance for ambient contamination level. Default to 2e-5. |
background_sd |
Prior variance for background contamination level. Default to 2e-6. |
A list containing decontaminated counts, and estimated parameters.
# Simulated count matrix counts <- matrix(sample(1:10, 1000, replace = TRUE), ncol = 10) # Cell type indicator k <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4) # Decontamination out <- decontPro(counts, k, 1e-2, 1e-2) # Decontaminated counts decontaminated_counts <- out$decontaminated_counts
# Simulated count matrix counts <- matrix(sample(1:10, 1000, replace = TRUE), ncol = 10) # Cell type indicator k <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4) # Decontamination out <- decontPro(counts, k, 1e-2, 1e-2) # Decontaminated counts decontaminated_counts <- out$decontaminated_counts
Identifies contamination from factors such as ambient RNA in single cell genomic datasets.
decontX(x, ...) ## S4 method for signature 'SingleCellExperiment' decontX( x, assayName = "counts", z = NULL, batch = NULL, background = NULL, bgAssayName = NULL, bgBatch = NULL, maxIter = 500, delta = c(10, 10), estimateDelta = TRUE, convergence = 0.001, iterLogLik = 10, varGenes = 5000, dbscanEps = 1, seed = 12345, logfile = NULL, verbose = TRUE ) ## S4 method for signature 'ANY' decontX( x, z = NULL, batch = NULL, background = NULL, bgBatch = NULL, maxIter = 500, delta = c(10, 10), estimateDelta = TRUE, convergence = 0.001, iterLogLik = 10, varGenes = 5000, dbscanEps = 1, seed = 12345, logfile = NULL, verbose = TRUE )
decontX(x, ...) ## S4 method for signature 'SingleCellExperiment' decontX( x, assayName = "counts", z = NULL, batch = NULL, background = NULL, bgAssayName = NULL, bgBatch = NULL, maxIter = 500, delta = c(10, 10), estimateDelta = TRUE, convergence = 0.001, iterLogLik = 10, varGenes = 5000, dbscanEps = 1, seed = 12345, logfile = NULL, verbose = TRUE ) ## S4 method for signature 'ANY' decontX( x, z = NULL, batch = NULL, background = NULL, bgBatch = NULL, maxIter = 500, delta = c(10, 10), estimateDelta = TRUE, convergence = 0.001, iterLogLik = 10, varGenes = 5000, dbscanEps = 1, seed = 12345, logfile = NULL, verbose = TRUE )
x |
A numeric matrix of counts or a SingleCellExperiment
with the matrix located in the assay slot under |
... |
For the generic, further arguments to pass to each method. |
assayName |
Character. Name of the assay to use if |
z |
Numeric or character vector. Cell cluster labels. If NULL, PCA will be used to reduce the dimensionality of the dataset initially, 'umap' from the 'uwot' package will be used to further reduce the dataset to 2 dimenions and the 'dbscan' function from the 'dbscan' package will be used to identify clusters of broad cell types. Default NULL. |
batch |
Numeric or character vector. Batch labels for cells. If batch labels are supplied, DecontX is run on cells from each batch separately. Cells run in different channels or assays should be considered different batches. Default NULL. |
background |
A numeric matrix of counts or a
SingleCellExperiment with the matrix located in the assay
slot under |
bgAssayName |
Character. Name of the assay to use if |
bgBatch |
Numeric or character vector. Batch labels for
|
maxIter |
Integer. Maximum iterations of the EM algorithm. Default 500. |
delta |
Numeric Vector of length 2. Concentration parameters for
the Dirichlet prior for the contamination in each cell. The first element
is the prior for the native counts while the second element is the prior for
the contamination counts. These essentially act as pseudocounts for the
native and contamination in each cell. If |
estimateDelta |
Boolean. Whether to update |
convergence |
Numeric. The EM algorithm will be stopped if the maximum difference in the contamination estimates between the previous and current iterations is less than this. Default 0.001. |
iterLogLik |
Integer. Calculate log likelihood every |
varGenes |
Integer. The number of variable genes to use in
dimensionality reduction before clustering. Variability is calcualted using
|
dbscanEps |
Numeric. The clustering resolution parameter used in 'dbscan' to estimate broad cell clusters. Used only when z is not provided. Default 1. |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
logfile |
Character. Messages will be redirected to a file named
|
verbose |
Logical. Whether to print log messages. Default TRUE. |
If x
is a matrix-like object, a list will be returned
with the following items:
decontXcounts
:The decontaminated matrix. Values obtained
from the variational inference procedure may be non-integer. However,
integer counts can be obtained by rounding,
e.g. round(decontXcounts)
.
contamination
:Percentage of contamination in each cell.
estimates
:List of estimated parameters for each batch. If z was not supplied, then the UMAP coordinates used to generated cell cluster labels will also be stored here.
z
:Cell population/cluster labels used for analysis.
runParams
:List of arguments used in the function call.
If x
is a SingleCellExperiment, then the decontaminated
counts will be stored as an assay and can be accessed with
decontXcounts(x)
. The contamination values and cluster labels
will be stored in colData(x)
. estimates
and runParams
will be stored in metadata(x)$decontX
. The UMAPs used to generated
cell cluster labels will be stored in
reducedDims
slot in x
.
Shiyi Yang, Yuan Yin, Joshua Campbell
# Generate matrix with contamination s <- simulateContamination(seed = 12345) library(SingleCellExperiment) library(celda) sce <- SingleCellExperiment(list(counts = s$observedCounts)) sce <- decontX(sce) # Plot contamination on UMAP plotDecontXContamination(sce) # Plot decontX cluster labels umap <- reducedDim(sce) celda::plotDimReduceCluster(x = sce$decontX_clusters, dim1 = umap[, 1], dim2 = umap[, 2], ) # Plot percentage of marker genes detected # in each cell cluster before decontamination s$markers plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts") # Plot percentage of marker genes detected # in each cell cluster after contamination plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "decontXcounts") # Plot percentage of marker genes detected in each cell # comparing original and decontaminated counts side-by-side plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = c("counts", "decontXcounts")) # Plot raw counts of indiviual markers genes before # and after decontamination plotDecontXMarkerExpression(sce, unlist(s$markers))
# Generate matrix with contamination s <- simulateContamination(seed = 12345) library(SingleCellExperiment) library(celda) sce <- SingleCellExperiment(list(counts = s$observedCounts)) sce <- decontX(sce) # Plot contamination on UMAP plotDecontXContamination(sce) # Plot decontX cluster labels umap <- reducedDim(sce) celda::plotDimReduceCluster(x = sce$decontX_clusters, dim1 = umap[, 1], dim2 = umap[, 2], ) # Plot percentage of marker genes detected # in each cell cluster before decontamination s$markers plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts") # Plot percentage of marker genes detected # in each cell cluster after contamination plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "decontXcounts") # Plot percentage of marker genes detected in each cell # comparing original and decontaminated counts side-by-side plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = c("counts", "decontXcounts")) # Plot raw counts of indiviual markers genes before # and after decontamination plotDecontXMarkerExpression(sce, unlist(s$markers))
Gets or sets the decontaminated counts matrix from a a SingleCellExperiment object.
decontXcounts(object, ...) decontXcounts(object, ...) <- value ## S4 method for signature 'SingleCellExperiment' decontXcounts(object, ...) ## S4 replacement method for signature 'SingleCellExperiment' decontXcounts(object, ...) <- value
decontXcounts(object, ...) decontXcounts(object, ...) <- value ## S4 method for signature 'SingleCellExperiment' decontXcounts(object, ...) ## S4 replacement method for signature 'SingleCellExperiment' decontXcounts(object, ...) <- value
object |
A SingleCellExperiment object. |
... |
For the generic, further arguments to pass to each method. |
value |
A matrix to save as an assay called |
If getting, the assay from object
with the name
decontXcounts
will be returned. If setting, a
SingleCellExperiment object will be returned with
decontXcounts
listed in the assay
slot.
Fast normalization for numeric matrix
fastNormProp(R_counts, R_alpha)
fastNormProp(R_counts, R_alpha)
R_counts |
An integer matrix |
R_alpha |
A double value to be added to the matrix as a pseudocount |
A numeric matrix where the columns have been normalized to proportions
Fast normalization for numeric matrix
fastNormPropLog(R_counts, R_alpha)
fastNormPropLog(R_counts, R_alpha)
R_counts |
An integer matrix |
R_alpha |
A double value to be added to the matrix as a pseudocount |
A numeric matrix where the columns have been normalized to proportions
Fast normalization for numeric matrix
fastNormPropSqrt(R_counts, R_alpha)
fastNormPropSqrt(R_counts, R_alpha)
R_counts |
An integer matrix |
R_alpha |
A double value to be added to the matrix as a pseudocount |
A numeric matrix where the columns have been normalized to proportions
get row and column indices of none zero elements in the matrix
nonzero(R_counts)
nonzero(R_counts)
R_counts |
A matrix |
An integer matrix where each row is a row, column indices pair
Boxplot of features grouped by cell type
plotBoxByCluster( counts, decontaminated_counts, cell_type, features, file = NULL )
plotBoxByCluster( counts, decontaminated_counts, cell_type, features, file = NULL )
counts |
original count matrix of nADT x nDroplet. |
decontaminated_counts |
decontaminated count matrix. |
cell_type |
1xnDroplet vector of cell_type. |
features |
names of ADT to plot |
file |
file name to save plot into a pdf. If omit, return |
Return a pdf file named file
or a ggplot
object.
# Simulate a dataset with 3 cells and 2 ADTs counts <- matrix(c(60, 72, 52, 49, 89, 112), nrow = 2, dimnames = list(c('CD3', 'CD4'), c('CTGTTTACACCGCTAG', 'CTCTACGGTGTGGCTC', 'AGCAGCCAGGCTCATT'))) decontaminated_counts <- matrix(c(58, 36, 26, 45, 88, 110), nrow = 2, dimnames = list(c('CD3', 'CD4'), c('CTGTTTACACCGCTAG', 'CTCTACGGTGTGGCTC', 'AGCAGCCAGGCTCATT'))) plotBoxByCluster(counts, decontaminated_counts, c(1, 2, 1), c('CD3', 'CD4'))
# Simulate a dataset with 3 cells and 2 ADTs counts <- matrix(c(60, 72, 52, 49, 89, 112), nrow = 2, dimnames = list(c('CD3', 'CD4'), c('CTGTTTACACCGCTAG', 'CTCTACGGTGTGGCTC', 'AGCAGCCAGGCTCATT'))) decontaminated_counts <- matrix(c(58, 36, 26, 45, 88, 110), nrow = 2, dimnames = list(c('CD3', 'CD4'), c('CTGTTTACACCGCTAG', 'CTCTACGGTGTGGCTC', 'AGCAGCCAGGCTCATT'))) plotBoxByCluster(counts, decontaminated_counts, c(1, 2, 1), c('CD3', 'CD4'))
A scatter plot of the UMAP dimensions generated by DecontX with cells colored by the estimated percentation of contamation.
plotDecontXContamination( x, batch = NULL, colorScale = c("blue", "green", "yellow", "orange", "red"), size = 1 )
plotDecontXContamination( x, batch = NULL, colorScale = c("blue", "green", "yellow", "orange", "red"), size = 1 )
x |
Either a SingleCellExperiment with |
batch |
Character. Batch of cells to plot. If |
colorScale |
Character vector. Contains the color spectrum to be passed
to |
size |
Numeric. Size of points in the scatterplot. Default 1. |
Returns a ggplot
object.
Shiyi Yang, Joshua Campbell
See decontX
for a full example of how to estimate
and plot contamination.
# Generate matrix with contamination s <- simulateContamination(seed = 12345) library(SingleCellExperiment) library(celda) sce <- SingleCellExperiment(list(counts = s$observedCounts)) sce <- decontX(sce) # Plot contamination on UMAP plotDecontXContamination(sce) # Plot decontX cluster labels umap <- reducedDim(sce) celda::plotDimReduceCluster(x = sce$decontX_clusters, dim1 = umap[, 1], dim2 = umap[, 2], ) # Plot percentage of marker genes detected # in each cell cluster before decontamination s$markers plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts") # Plot percentage of marker genes detected # in each cell cluster after contamination plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "decontXcounts") # Plot percentage of marker genes detected in each cell # comparing original and decontaminated counts side-by-side plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = c("counts", "decontXcounts")) # Plot raw counts of indiviual markers genes before # and after decontamination plotDecontXMarkerExpression(sce, unlist(s$markers))
# Generate matrix with contamination s <- simulateContamination(seed = 12345) library(SingleCellExperiment) library(celda) sce <- SingleCellExperiment(list(counts = s$observedCounts)) sce <- decontX(sce) # Plot contamination on UMAP plotDecontXContamination(sce) # Plot decontX cluster labels umap <- reducedDim(sce) celda::plotDimReduceCluster(x = sce$decontX_clusters, dim1 = umap[, 1], dim2 = umap[, 2], ) # Plot percentage of marker genes detected # in each cell cluster before decontamination s$markers plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts") # Plot percentage of marker genes detected # in each cell cluster after contamination plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "decontXcounts") # Plot percentage of marker genes detected in each cell # comparing original and decontaminated counts side-by-side plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = c("counts", "decontXcounts")) # Plot raw counts of indiviual markers genes before # and after decontamination plotDecontXMarkerExpression(sce, unlist(s$markers))
Generates a violin plot that shows the counts of marker
genes in cells across specific clusters or cell types. Can be used to view
the expression of marker genes in different cell types before and after
decontamination with decontX
.
plotDecontXMarkerExpression( x, markers, groupClusters = NULL, assayName = c("counts", "decontXcounts"), z = NULL, exactMatch = TRUE, by = "rownames", log1p = FALSE, ncol = NULL, plotDots = FALSE, dotSize = 0.1 )
plotDecontXMarkerExpression( x, markers, groupClusters = NULL, assayName = c("counts", "decontXcounts"), z = NULL, exactMatch = TRUE, by = "rownames", log1p = FALSE, ncol = NULL, plotDots = FALSE, dotSize = 0.1 )
x |
Either a SingleCellExperiment or a matrix-like object of counts. |
markers |
Character Vector or List. A character vector or list of character vectors with the names of the marker genes of interest. |
groupClusters |
List. A named list that allows
cell clusters labels coded in
|
assayName |
Character vector. Name(s) of the assay(s) to
plot if |
z |
Character, Integer, or Vector.
Indicates the cluster labels for each cell.
If |
exactMatch |
Boolean. Whether to only identify exact matches
for the markers or to identify partial matches using |
by |
Character. Where to search for the markers if |
log1p |
Boolean. Whether to apply the function |
ncol |
Integer. Number of columns to make in the plot.
Default |
plotDots |
Boolean. If |
dotSize |
Numeric. Size of points if |
Returns a ggplot
object.
Shiyi Yang, Joshua Campbell
See decontX
for a full example of how to estimate
and plot contamination.
# Generate matrix with contamination s <- simulateContamination(seed = 12345) library(SingleCellExperiment) library(celda) sce <- SingleCellExperiment(list(counts = s$observedCounts)) sce <- decontX(sce) # Plot contamination on UMAP plotDecontXContamination(sce) # Plot decontX cluster labels umap <- reducedDim(sce) celda::plotDimReduceCluster(x = sce$decontX_clusters, dim1 = umap[, 1], dim2 = umap[, 2], ) # Plot percentage of marker genes detected # in each cell cluster before decontamination s$markers plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts") # Plot percentage of marker genes detected # in each cell cluster after contamination plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "decontXcounts") # Plot percentage of marker genes detected in each cell # comparing original and decontaminated counts side-by-side plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = c("counts", "decontXcounts")) # Plot raw counts of indiviual markers genes before # and after decontamination plotDecontXMarkerExpression(sce, unlist(s$markers))
# Generate matrix with contamination s <- simulateContamination(seed = 12345) library(SingleCellExperiment) library(celda) sce <- SingleCellExperiment(list(counts = s$observedCounts)) sce <- decontX(sce) # Plot contamination on UMAP plotDecontXContamination(sce) # Plot decontX cluster labels umap <- reducedDim(sce) celda::plotDimReduceCluster(x = sce$decontX_clusters, dim1 = umap[, 1], dim2 = umap[, 2], ) # Plot percentage of marker genes detected # in each cell cluster before decontamination s$markers plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts") # Plot percentage of marker genes detected # in each cell cluster after contamination plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "decontXcounts") # Plot percentage of marker genes detected in each cell # comparing original and decontaminated counts side-by-side plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = c("counts", "decontXcounts")) # Plot raw counts of indiviual markers genes before # and after decontamination plotDecontXMarkerExpression(sce, unlist(s$markers))
Generates a barplot that shows the percentage of
cells within clusters or cell types that have detectable levels
of given marker genes. Can be used to view the expression of
marker genes in different cell types before and after
decontamination with decontX
.
plotDecontXMarkerPercentage( x, markers, groupClusters = NULL, assayName = c("counts", "decontXcounts"), z = NULL, threshold = 1, exactMatch = TRUE, by = "rownames", ncol = round(sqrt(length(markers))), labelBars = TRUE, labelSize = 3 )
plotDecontXMarkerPercentage( x, markers, groupClusters = NULL, assayName = c("counts", "decontXcounts"), z = NULL, threshold = 1, exactMatch = TRUE, by = "rownames", ncol = round(sqrt(length(markers))), labelBars = TRUE, labelSize = 3 )
x |
Either a SingleCellExperiment or a matrix-like object of counts. |
markers |
List. A named list indicating the marker genes
for each cell type of
interest. Multiple markers can be supplied for each cell type. For example,
|
groupClusters |
List. A named list that allows
cell clusters labels coded in
|
assayName |
Character vector. Name(s) of the assay(s) to
plot if |
z |
Character, Integer, or Vector. Indicates the cluster labels
for each cell.
If |
threshold |
Numeric. Markers greater than or equal to this value will be considered detected in a cell. Default 1. |
exactMatch |
Boolean. Whether to only identify exact matches
for the markers or to identify partial matches using |
by |
Character. Where to search for the markers if |
ncol |
Integer. Number of columns to make in the plot.
Default |
labelBars |
Boolean. Whether to display percentages above each bar
Default |
labelSize |
Numeric. Size of the percentage labels in the barplot. Default 3. |
Returns a ggplot
object.
Shiyi Yang, Joshua Campbell
See decontX
for a full example of how to estimate
and plot contamination.
# Generate matrix with contamination s <- simulateContamination(seed = 12345) library(SingleCellExperiment) library(celda) sce <- SingleCellExperiment(list(counts = s$observedCounts)) sce <- decontX(sce) # Plot contamination on UMAP plotDecontXContamination(sce) # Plot decontX cluster labels umap <- reducedDim(sce) celda::plotDimReduceCluster(x = sce$decontX_clusters, dim1 = umap[, 1], dim2 = umap[, 2], ) # Plot percentage of marker genes detected # in each cell cluster before decontamination s$markers plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts") # Plot percentage of marker genes detected # in each cell cluster after contamination plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "decontXcounts") # Plot percentage of marker genes detected in each cell # comparing original and decontaminated counts side-by-side plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = c("counts", "decontXcounts")) # Plot raw counts of indiviual markers genes before # and after decontamination plotDecontXMarkerExpression(sce, unlist(s$markers))
# Generate matrix with contamination s <- simulateContamination(seed = 12345) library(SingleCellExperiment) library(celda) sce <- SingleCellExperiment(list(counts = s$observedCounts)) sce <- decontX(sce) # Plot contamination on UMAP plotDecontXContamination(sce) # Plot decontX cluster labels umap <- reducedDim(sce) celda::plotDimReduceCluster(x = sce$decontX_clusters, dim1 = umap[, 1], dim2 = umap[, 2], ) # Plot percentage of marker genes detected # in each cell cluster before decontamination s$markers plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts") # Plot percentage of marker genes detected # in each cell cluster after contamination plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "decontXcounts") # Plot percentage of marker genes detected in each cell # comparing original and decontaminated counts side-by-side plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = c("counts", "decontXcounts")) # Plot raw counts of indiviual markers genes before # and after decontamination plotDecontXMarkerExpression(sce, unlist(s$markers))
Density of each ADT, raw counts overlapped with decontaminated counts
plotDensity(counts, decontaminated_counts, features, file = NULL)
plotDensity(counts, decontaminated_counts, features, file = NULL)
counts |
original count matrix of nADT x nDroplet. |
decontaminated_counts |
decontaminated count matrix. |
features |
names of ADT to plot |
file |
file name to save plot into a pdf. If omit, return |
Return a pdf file named file
or a ggplot
object.
# Simulate a dataset with 3 cells and 2 ADTs counts <- matrix(c(60, 72, 52, 49, 89, 112), nrow = 2, dimnames = list(c('CD3', 'CD4'), c('CTGTTTACACCGCTAG', 'CTCTACGGTGTGGCTC', 'AGCAGCCAGGCTCATT'))) decontaminated_counts <- matrix(c(58, 36, 26, 45, 88, 110), nrow = 2, dimnames = list(c('CD3', 'CD4'), c('CTGTTTACACCGCTAG', 'CTCTACGGTGTGGCTC', 'AGCAGCCAGGCTCATT'))) plotDensity(counts, decontaminated_counts, c('CD3', 'CD4'))
# Simulate a dataset with 3 cells and 2 ADTs counts <- matrix(c(60, 72, 52, 49, 89, 112), nrow = 2, dimnames = list(c('CD3', 'CD4'), c('CTGTTTACACCGCTAG', 'CTCTACGGTGTGGCTC', 'AGCAGCCAGGCTCATT'))) decontaminated_counts <- matrix(c(58, 36, 26, 45, 88, 110), nrow = 2, dimnames = list(c('CD3', 'CD4'), c('CTGTTTACACCGCTAG', 'CTCTACGGTGTGGCTC', 'AGCAGCCAGGCTCATT'))) plotDensity(counts, decontaminated_counts, c('CD3', 'CD4'))
This will return indices of features among the rownames
or rowData of a data.frame, matrix, or a SummarizedExperiment
object including a SingleCellExperiment.
Partial matching (i.e. grepping) can be used by setting
exactMatch = FALSE
.
retrieveFeatureIndex( features, x, by = "rownames", exactMatch = TRUE, removeNA = FALSE )
retrieveFeatureIndex( features, x, by = "rownames", exactMatch = TRUE, removeNA = FALSE )
features |
Character vector of feature names to find in the rows of
|
x |
A data.frame, matrix, or SingleCellExperiment object to search. |
by |
Character. Where to search for features in |
exactMatch |
Boolean. Whether to only identify exact matches
or to identify partial matches using |
removeNA |
Boolean. If set to |
A vector of row indices for the matching features in x
.
Yusuke Koga, Joshua Campbell
'retrieveFeatureInfo' from package 'scater'
and link{regex}
for how to use regular expressions when
exactMatch = FALSE
.
counts <- matrix(sample(1:10, 20*10, replace = TRUE), nrow = 20, ncol = 10, dimnames = list(paste0("Gene_",1:20), paste0("Cell_", 1:10))) retrieveFeatureIndex(c("Gene_1", "Gene_5"), counts) retrieveFeatureIndex(c("Gene_1", "Gene_5"), counts, exactMatch = FALSE)
counts <- matrix(sample(1:10, 20*10, replace = TRUE), nrow = 20, ncol = 10, dimnames = list(paste0("Gene_",1:20), paste0("Cell_", 1:10))) retrieveFeatureIndex(c("Gene_1", "Gene_5"), counts) retrieveFeatureIndex(c("Gene_1", "Gene_5"), counts, exactMatch = FALSE)
This function generates a list containing two count matrices – one for real expression, the other one for contamination, as well as other parameters used in the simulation which can be useful for running decontamination.
simulateContamination( C = 300, G = 100, K = 3, NRange = c(500, 1000), beta = 0.1, delta = c(1, 10), numMarkers = 3, seed = 12345 )
simulateContamination( C = 300, G = 100, K = 3, NRange = c(500, 1000), beta = 0.1, delta = c(1, 10), numMarkers = 3, seed = 12345 )
C |
Integer. Number of cells to be simulated. Default |
G |
Integer. Number of genes to be simulated. Default |
K |
Integer. Number of cell populations to be simulated.
Default |
NRange |
Integer vector. A vector of length 2 that specifies the lower
and upper bounds of the number of counts generated for each cell. Default
|
beta |
Numeric. Concentration parameter for Phi. Default |
delta |
Numeric or Numeric vector. Concentration parameter for Theta.
If input as a single numeric value, symmetric values for beta
distribution are specified; if input as a vector of lenght 2, the two
values will be the shape1 and shape2 paramters of the beta distribution
respectively. Default |
numMarkers |
Integer. Number of markers for each cell population.
Default |
seed |
Integer. Passed to |
A list containing the nativeMatirx
(real expression),
observedMatrix
(real expression + contamination), as well as other
parameters used in the simulation.
Shiyi Yang, Yuan Yin, Joshua Campbell
contaminationSim <- simulateContamination(K = 3, delta = c(1, 10))
contaminationSim <- simulateContamination(K = 3, delta = c(1, 10))