Title: | CEllular Latent Dirichlet Allocation |
---|---|
Description: | Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included. |
Authors: | Joshua Campbell [aut, cre], Shiyi Yang [aut], Zhe Wang [aut], Sean Corbett [aut], Yusuke Koga [aut] |
Maintainer: | Joshua Campbell <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.23.0 |
Built: | 2024-10-30 04:41:56 UTC |
Source: | https://github.com/bioc/celda |
Returns a single celdaList representing the combination of two provided celdaList objects.
appendCeldaList(list1, list2)
list1 |
A celdaList object |
list2 |
A celdaList object to be joined with list1 |
A celdaList object. This object contains all resList entries and runParam records from both lists.
data(celdaCGGridSearchRes)
appendedList <- appendCeldaList(celdaCGGridSearchRes, celdaCGGridSearchRes)
Available models
availableModels
An object of class character of length 3.
Retrieves the final log-likelihood from all iterations of Gibbs sampling used to generate a celdaModel.
bestLogLikelihood(x, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
bestLogLikelihood(x, altExpName = "featureSubset")
## S4 method for signature 'celdaModel'
bestLogLikelihood(x)
x |
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG, or a celda model object. |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
Numeric. The log-likelihood at the final step of Gibbs sampling used to generate the model.
data(sceCeldaCG)
bestLogLikelihood(sceCeldaCG)
data(celdaCGMod)
bestLogLikelihood(celdaCGMod)
List of available Celda models with corresponding descriptions.
celda()
None
celda()
Clusters the columns of a count matrix containing single-cell data into K subpopulations. The useAssay assay slot in the altExpName altExp slot will be used if it exists. Otherwise, the useAssay assay slot in x will be used if x is a SingleCellExperiment object.
celda_C(x, useAssay = "counts", altExpName = "featureSubset", sampleLabel = NULL, K, alpha = 1, beta = 1, algorithm = c("EM", "Gibbs"), stopIter = 10, maxIter = 200, splitOnIter = 10, splitOnLast = TRUE, seed = 12345, nchains = 3, zInitialize = c("split", "random", "predefined"), countChecksum = NULL, zInit = NULL, logfile = NULL, verbose = TRUE)
## S4 method for signature 'SingleCellExperiment'
celda_C(x, useAssay = "counts", altExpName = "featureSubset", sampleLabel = NULL, K, alpha = 1, beta = 1, algorithm = c("EM", "Gibbs"), stopIter = 10, maxIter = 200, splitOnIter = 10, splitOnLast = TRUE, seed = 12345, nchains = 3, zInitialize = c("split", "random", "predefined"), countChecksum = NULL, zInit = NULL, logfile = NULL, verbose = TRUE)
## S4 method for signature 'ANY'
celda_C(x, useAssay = "counts", altExpName = "featureSubset", sampleLabel = NULL, K, alpha = 1, beta = 1, algorithm = c("EM", "Gibbs"), stopIter = 10, maxIter = 200, splitOnIter = 10, splitOnLast = TRUE, seed = 12345, nchains = 3, zInitialize = c("split", "random", "predefined"), countChecksum = NULL, zInit = NULL, logfile = NULL, verbose = TRUE)
x |
A SingleCellExperiment with the matrix located in the assay slot under useAssay. |
useAssay |
A string specifying the name of the assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
sampleLabel |
Vector or factor. Denotes the sample label for each cell (column) in the count matrix. |
K |
Integer. Number of cell populations. |
alpha |
Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1. |
beta |
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature in each cell population. Default 1. |
algorithm |
String. Algorithm to use for clustering cell subpopulations. One of 'EM' or 'Gibbs'. The EM algorithm is faster, especially for larger numbers of cells. However, more chains may be required to ensure a good solution is found. If 'EM' is selected, then 'stopIter' will be automatically set to 1. Default 'EM'. |
stopIter |
Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10. |
maxIter |
Integer. Maximum number of iterations of Gibbs sampling or EM to perform. Default 200. |
splitOnIter |
Integer. On every 'splitOnIter' iteration, a heuristic will be applied to determine if a cell population should be reassigned and another cell population should be split into two clusters. To disable splitting, set to -1. Default 10. |
splitOnLast |
Logical. After 'stopIter' iterations have been performed without improvement, a heuristic will be applied to determine if a cell population should be reassigned and another cell population should be split into two clusters. If a split occurs, then 'stopIter' will be reset. Default TRUE. |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
nchains |
Integer. Number of random cluster initializations. Default 3. |
zInitialize |
Character. One of 'random', 'split', or 'predefined'. With 'random', cells are randomly assigned to populations. With 'split', cells will be split into sqrt(K) populations and then each population will be subsequently split into another sqrt(K) populations. With 'predefined', values in 'zInit' will be used to initialize 'z'. Default 'split'. |
countChecksum |
Character. An MD5 checksum for the 'counts' matrix. Default NULL. |
zInit |
Integer vector. Sets initial starting values of z. 'zInit' is only used when 'zInitialize = "predefined"'. Default NULL. |
logfile |
Character. Messages will be redirected to a file named 'logfile'. If NULL, messages will be printed to stdout. Default NULL. |
verbose |
Logical. Whether to print log messages. Default TRUE. |
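The difference between the two `algorithm` options can be pictured at the level of a single cell. The sketch below assumes each cell already has an unnormalized log-probability for each of the K populations: an EM-style step takes the most probable population, while a Gibbs-style step samples one in proportion to its probability. The helpers `assignEM` and `assignGibbs` and the toy values are hypothetical illustrations, not celda internals.

```r
# Hypothetical helpers contrasting the two assignment styles for one cell.
# logProb: unnormalized log-probabilities of the cell under each of K populations.

# EM-style (hard) assignment: pick the most probable population.
assignEM <- function(logProb) {
  which.max(logProb)
}

# Gibbs-style assignment: sample a population in proportion to its probability.
assignGibbs <- function(logProb) {
  # Subtract the maximum before exponentiating to avoid numeric underflow.
  p <- exp(logProb - max(logProb))
  sample.int(length(p), size = 1, prob = p / sum(p))
}

logProb <- c(-1050.2, -1047.9, -1060.4)  # toy values for K = 3
assignEM(logProb)     # always 2
assignGibbs(logProb)  # 2 about 91% of the time, otherwise usually 1
```

Because the EM-style step is deterministic given the current state, it can settle into a local optimum; this is one plausible reason why more chains (`nchains`) may be needed with 'EM'.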
A SingleCellExperiment object. Function parameter settings are stored in the metadata "celda_parameters" slot. Columns celda_sample_label and celda_cell_cluster in colData contain sample labels and celda cell population clusters.
celda_G for feature clustering and celda_CG for simultaneous clustering of features and cells. celdaGridSearch can be used to run multiple values of K and multiple chains in parallel.
data(celdaCSim)
sce <- celda_C(celdaCSim$counts, K = celdaCSim$K, sampleLabel = celdaCSim$sampleLabel, nchains = 1)
Clusters the rows and columns of a count matrix containing single-cell data into L modules and K subpopulations, respectively. The useAssay assay slot in the altExpName altExp slot will be used if it exists. Otherwise, the useAssay assay slot in x will be used if x is a SingleCellExperiment object.
celda_CG(x, useAssay = "counts", altExpName = "featureSubset", sampleLabel = NULL, K, L, alpha = 1, beta = 1, delta = 1, gamma = 1, algorithm = c("EM", "Gibbs"), stopIter = 10, maxIter = 200, splitOnIter = 10, splitOnLast = TRUE, seed = 12345, nchains = 3, zInitialize = c("split", "random", "predefined"), yInitialize = c("split", "random", "predefined"), countChecksum = NULL, zInit = NULL, yInit = NULL, logfile = NULL, verbose = TRUE)
## S4 method for signature 'SingleCellExperiment'
celda_CG(x, useAssay = "counts", altExpName = "featureSubset", sampleLabel = NULL, K, L, alpha = 1, beta = 1, delta = 1, gamma = 1, algorithm = c("EM", "Gibbs"), stopIter = 10, maxIter = 200, splitOnIter = 10, splitOnLast = TRUE, seed = 12345, nchains = 3, zInitialize = c("split", "random", "predefined"), yInitialize = c("split", "random", "predefined"), countChecksum = NULL, zInit = NULL, yInit = NULL, logfile = NULL, verbose = TRUE)
## S4 method for signature 'ANY'
celda_CG(x, useAssay = "counts", altExpName = "featureSubset", sampleLabel = NULL, K, L, alpha = 1, beta = 1, delta = 1, gamma = 1, algorithm = c("EM", "Gibbs"), stopIter = 10, maxIter = 200, splitOnIter = 10, splitOnLast = TRUE, seed = 12345, nchains = 3, zInitialize = c("split", "random", "predefined"), yInitialize = c("split", "random", "predefined"), countChecksum = NULL, zInit = NULL, yInit = NULL, logfile = NULL, verbose = TRUE)
x |
A SingleCellExperiment with the matrix located in the assay slot under useAssay. |
useAssay |
A string specifying the name of the assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
sampleLabel |
Vector or factor. Denotes the sample label for each cell (column) in the count matrix. |
K |
Integer. Number of cell populations. |
L |
Integer. Number of feature modules. |
alpha |
Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1. |
beta |
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell population. Default 1. |
delta |
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1. |
gamma |
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1. |
algorithm |
String. Algorithm to use for clustering cell subpopulations. One of 'EM' or 'Gibbs'. The EM algorithm for cell clustering is faster, especially for larger numbers of cells. However, more chains may be required to ensure a good solution is found. Default 'EM'. |
stopIter |
Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10. |
maxIter |
Integer. Maximum number of iterations of Gibbs sampling to perform. Default 200. |
splitOnIter |
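Integer. On every 'splitOnIter' iteration, a heuristic will be applied to determine if a cell population or feature module should be reassigned and another cell population or feature module should be split into two clusters. To disable splitting, set to -1. Default 10. |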
splitOnLast |
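Logical. After 'stopIter' iterations have been performed without improvement, a heuristic will be applied to determine if a cell population or feature module should be reassigned and another cell population or feature module should be split into two clusters. If a split occurs, then 'stopIter' will be reset. Default TRUE. |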
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
nchains |
Integer. Number of random cluster initializations. Default 3. |
zInitialize |
Character. One of 'random', 'split', or 'predefined'. With 'random', cells are randomly assigned to populations. With 'split', cells will be split into sqrt(K) populations and then each population will be subsequently split into another sqrt(K) populations. With 'predefined', values in 'zInit' will be used to initialize 'z'. Default 'split'. |
yInitialize |
Character. One of 'random', 'split', or 'predefined'. With 'random', features are randomly assigned to modules. With 'split', features will be split into sqrt(L) modules and then each module will be subsequently split into another sqrt(L) modules. With 'predefined', values in 'yInit' will be used to initialize 'y'. Default 'split'. |
countChecksum |
Character. An MD5 checksum for the counts matrix. Default NULL. |
zInit |
Integer vector. Sets initial starting values of z. 'zInit' is only used when 'zInitialize = "predefined"'. Default NULL. |
yInit |
Integer vector. Sets initial starting values of y. 'yInit' is only used when 'yInitialize = "predefined"'. Default NULL. |
logfile |
Character. Messages will be redirected to a file named 'logfile'. If NULL, messages will be printed to stdout. Default NULL. |
verbose |
Logical. Whether to print log messages. Default TRUE. |
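The concentration parameters above act as pseudocounts. As a minimal sketch, assuming the usual posterior mean of a multinomial under a symmetric Dirichlet prior, the share of one sample's cells assigned to each of K populations (the role `alpha` plays for Theta) would be smoothed as below. The helper `smoothedProportions` and the counts are illustrative only and are not part of celda.

```r
# Hypothetical helper: posterior-mean proportions under a symmetric Dirichlet
# prior with concentration `alpha`, i.e. counts smoothed by a pseudocount.
smoothedProportions <- function(counts, alpha = 1) {
  (counts + alpha) / (sum(counts) + length(counts) * alpha)
}

# Number of cells from one sample assigned to each of K = 4 populations.
cellsPerPopulation <- c(40, 10, 0, 50)

smoothedProportions(cellsPerPopulation, alpha = 1)
# equals c(41, 11, 1, 51) / 104: the empty population still gets a small share
```

Larger pseudocounts pull the proportions toward uniform; the same reading applies to `beta`, `delta`, and `gamma` for the distributions they govern.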
A SingleCellExperiment object. Function parameter settings are stored in the metadata "celda_parameters" in the altExp slot. In the altExp slot, columns celda_sample_label and celda_cell_cluster in colData contain sample labels and celda cell population clusters. Column celda_feature_module in rowData contains feature modules.
celda_G for feature clustering and celda_C for clustering cells. celdaGridSearch can be used to run multiple values of K/L and multiple chains in parallel.
data(celdaCGSim)
sce <- celda_CG(celdaCGSim$counts, K = celdaCGSim$K, L = celdaCGSim$L, sampleLabel = celdaCGSim$sampleLabel, nchains = 1)
Clusters the rows of a count matrix containing single-cell data into L modules. The useAssay assay slot in the altExpName altExp slot will be used if it exists. Otherwise, the useAssay assay slot in x will be used if x is a SingleCellExperiment object.
celda_G(x, useAssay = "counts", altExpName = "featureSubset", L, beta = 1, delta = 1, gamma = 1, stopIter = 10, maxIter = 200, splitOnIter = 10, splitOnLast = TRUE, seed = 12345, nchains = 3, yInitialize = c("split", "random", "predefined"), countChecksum = NULL, yInit = NULL, logfile = NULL, verbose = TRUE)
## S4 method for signature 'SingleCellExperiment'
celda_G(x, useAssay = "counts", altExpName = "featureSubset", L, beta = 1, delta = 1, gamma = 1, stopIter = 10, maxIter = 200, splitOnIter = 10, splitOnLast = TRUE, seed = 12345, nchains = 3, yInitialize = c("split", "random", "predefined"), countChecksum = NULL, yInit = NULL, logfile = NULL, verbose = TRUE)
## S4 method for signature 'ANY'
celda_G(x, useAssay = "counts", altExpName = "featureSubset", L, beta = 1, delta = 1, gamma = 1, stopIter = 10, maxIter = 200, splitOnIter = 10, splitOnLast = TRUE, seed = 12345, nchains = 3, yInitialize = c("split", "random", "predefined"), countChecksum = NULL, yInit = NULL, logfile = NULL, verbose = TRUE)
x |
A SingleCellExperiment with the matrix located in the assay slot under useAssay. |
useAssay |
A string specifying the name of the assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
L |
Integer. Number of feature modules. |
beta |
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell. Default 1. |
delta |
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1. |
gamma |
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1. |
stopIter |
Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10. |
maxIter |
Integer. Maximum number of iterations of Gibbs sampling to perform. Default 200. |
splitOnIter |
Integer. On every 'splitOnIter' iteration, a heuristic will be applied to determine if a feature module should be reassigned and another feature module should be split into two clusters. To disable splitting, set to -1. Default 10. |
splitOnLast |
Logical. After 'stopIter' iterations have been performed without improvement, a heuristic will be applied to determine if a feature module should be reassigned and another feature module should be split into two clusters. If a split occurs, then 'stopIter' will be reset. Default TRUE. |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
nchains |
Integer. Number of random cluster initializations. Default 3. |
yInitialize |
Character. One of 'random', 'split', or 'predefined'. With 'random', features are randomly assigned to modules. With 'split', features will be split into sqrt(L) modules and then each module will be subsequently split into another sqrt(L) modules. With 'predefined', values in 'yInit' will be used to initialize 'y'. Default 'split'. |
countChecksum |
Character. An MD5 checksum for the 'counts' matrix. Default NULL. |
yInit |
Integer vector. Sets initial starting values of y. 'yInit' can only be used when 'yInitialize = "predefined"'. Default NULL. |
logfile |
Character. Messages will be redirected to a file named 'logfile'. If NULL, messages will be printed to stdout. Default NULL. |
verbose |
Logical. Whether to print log messages. Default TRUE. |
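The interplay of `stopIter` and `maxIter` can be sketched as a generic loop: inference stops once the log likelihood has failed to improve for `stopIter` consecutive iterations, or after `maxIter` iterations in total. The `step` function and the toy likelihood trace are hypothetical stand-ins, and the split heuristics controlled by `splitOnIter` and `splitOnLast` are deliberately left out of this sketch.

```r
# Hypothetical driver illustrating the stopping rule; `step(iter)` stands in for
# one round of Gibbs sampling (or EM) and returns the new log likelihood.
runUntilConverged <- function(step, stopIter = 10, maxIter = 200) {
  bestLL <- -Inf
  sinceImprovement <- 0
  iter <- 0
  while (iter < maxIter && sinceImprovement < stopIter) {
    iter <- iter + 1
    ll <- step(iter)
    if (ll > bestLL) {
      # Improvement: remember it and reset the patience counter.
      bestLL <- ll
      sinceImprovement <- 0
    } else {
      sinceImprovement <- sinceImprovement + 1
    }
  }
  list(iterations = iter, bestLogLik = bestLL)
}

# Toy trace: the log likelihood improves for 5 iterations, then plateaus.
trace <- c(-500, -450, -420, -410, -405, rep(-405, 100))
res <- runUntilConverged(function(i) trace[i], stopIter = 10, maxIter = 200)
res$iterations  # 15: 5 improving iterations plus 10 without improvement
```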
A SingleCellExperiment object. Function parameter settings are stored in the metadata "celda_parameters" slot. Column celda_feature_module in rowData contains feature modules.
celda_C for cell clustering and celda_CG for simultaneous clustering of features and cells. celdaGridSearch can be used to run multiple values of L and multiple chains in parallel.
data(celdaGSim)
sce <- celda_G(celdaGSim$counts, L = celdaGSim$L, nchains = 1)
Example results of old celdaGridSearch on celdaCGSim
celdaCGGridSearchRes
An object as returned from old celdaGridSearch()
celda_CG model object generated from celdaCGSim using the old celda_CG function.
celdaCGMod
A celda_CG object
A deprecated example simulated count matrix from the celda_CG model.
celdaCGSim
A list of counts and properties as returned from old simulateCells().
Return or set the cell cluster labels determined by celda_C or celda_CG models.
celdaClusters(x, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
celdaClusters(x, altExpName = "featureSubset")
## S4 method for signature 'celdaModel'
celdaClusters(x)
celdaClusters(x, altExpName = "featureSubset") <- value
## S4 replacement method for signature 'SingleCellExperiment'
celdaClusters(x, altExpName = "featureSubset") <- value
x |
Can be one of: a SingleCellExperiment object returned by celda_C or celda_CG, or a celda model object. |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
value |
Character vector of cell cluster labels for replacements. Works only if x is a SingleCellExperiment object. |
One of:
Character vector if x is a SingleCellExperiment object. Contains cell cluster labels for each cell in x.
List if x is a celda model object. Contains cell cluster labels (for celda_C and celda_CG models) and/or feature module labels (for celda_G and celda_CG models).
data(sceCeldaCG)
celdaClusters(sceCeldaCG)
data(celdaCGMod)
celdaClusters(celdaCGMod)
Old celda_C results generated from celdaCSim
celdaCMod
A celda_C object
An old example simulated count matrix from the celda_C model.
celdaCSim
A list of counts and properties as returned from old simulateCells().
Old celda_G results generated from celdaGSim
celdaGMod
A celda_G object
Run Celda with different combinations of parameters and multiple chains in parallel. The variable availableModels contains the potential models that can be utilized. Different parameters to be tested should be stored in a list and passed to the argument paramsTest. Fixed parameters to be used in all models, such as sampleLabel, can be passed as a list to the argument paramsFixed. When verbose = TRUE, output from each chain will be sent to a log file but not be displayed in stdout.
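To see how many models a grid search implies, the parameter list used in this function's example can be expanded into every combination with base R's `expand.grid`. This is a sketch of the combinatorics only: it assumes each combination is run with `nchains` chains and does not call celda.

```r
# Parameters to test, as they would be passed to `paramsTest`.
paramsTest <- list(K = seq(4, 6), L = seq(9, 11))

# One row per combination of K and L.
combos <- expand.grid(paramsTest)
nrow(combos)  # 9

# With nchains chains per combination, the total number of runs is:
nchains <- 3
nrow(combos) * nchains  # 27
```

The run count grows multiplicatively with each tested parameter, which is why `cores` and `bestOnly` matter for larger grids.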
celdaGridSearch(x, useAssay = "counts", altExpName = "featureSubset", model, paramsTest, paramsFixed = NULL, maxIter = 200, nchains = 3, cores = 1, bestOnly = TRUE, seed = 12345, perplexity = TRUE, verbose = TRUE, logfilePrefix = "Celda")
## S4 method for signature 'SingleCellExperiment'
celdaGridSearch(x, useAssay = "counts", altExpName = "featureSubset", model, paramsTest, paramsFixed = NULL, maxIter = 200, nchains = 3, cores = 1, bestOnly = TRUE, seed = 12345, perplexity = TRUE, verbose = TRUE, logfilePrefix = "Celda")
## S4 method for signature 'matrix'
celdaGridSearch(x, useAssay = "counts", altExpName = "featureSubset", model, paramsTest, paramsFixed = NULL, maxIter = 200, nchains = 3, cores = 1, bestOnly = TRUE, seed = 12345, perplexity = TRUE, verbose = TRUE, logfilePrefix = "Celda")
x |
A numeric matrix of counts or a SingleCellExperiment with the matrix located in the assay slot under useAssay. |
useAssay |
A string specifying the name of the assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
model |
Celda model. Options available in availableModels. |
paramsTest |
List. A list denoting the combinations of parameters to
run in a celda model. For example,
|
paramsFixed |
List. A list denoting additional parameters to use in each celda model. Default NULL. |
maxIter |
Integer. Maximum number of iterations of sampling to perform. Default 200. |
nchains |
Integer. Number of random cluster initializations. Default 3. |
cores |
Integer. The number of cores to use for parallel estimation of chains. Default 1. |
bestOnly |
Logical. Whether to return only the chain with the highest log likelihood per combination of parameters or return all chains. Default TRUE. |
seed |
Integer. Passed to with_seed. For reproducibility,
a default value of 12345 is used. Seed values
|
perplexity |
Logical. Whether to calculate perplexity for each model. If FALSE, then perplexity can be calculated later with resamplePerplexity. Default TRUE. |
verbose |
Logical. Whether to print log messages during celda chain execution. Default TRUE. |
logfilePrefix |
Character. Prefix for log files from worker threads and main process. Default "Celda". |
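When `bestOnly = TRUE`, only the chain with the highest log likelihood is kept for each parameter combination. The sketch below shows that selection on a hypothetical table of chain results; the column names and values are invented for illustration and do not reflect how celda stores its results.

```r
# Hypothetical summary of grid-search runs: two combinations, three chains each.
runs <- data.frame(
  K = c(4, 4, 4, 5, 5, 5),
  L = c(9, 9, 9, 9, 9, 9),
  chain = c(1, 2, 3, 1, 2, 3),
  logLik = c(-1520, -1498, -1505, -1490, -1512, -1487)
)

# Group by parameter combination and keep the chain with the highest log likelihood.
groups <- split(runs, interaction(runs$K, runs$L, drop = TRUE))
best <- do.call(rbind, lapply(groups, function(g) g[which.max(g$logLik), ]))
best[, c("K", "L", "chain")]
# K = 4 keeps chain 2; K = 5 keeps chain 3
```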
A SingleCellExperiment object. Function parameter settings and celda model results are stored in the metadata "celda_grid_search" slot.
celda_G for feature clustering, celda_C for clustering of cells, and celda_CG for simultaneous clustering of features and cells. subsetCeldaList can subset the celdaList object. selectBestModel can get the best model for each combination of parameters.
## Not run: 
data(celdaCGSim)
## Run various combinations of parameters with 'celdaGridSearch'
celdaCGGridSearchRes <- celdaGridSearch(celdaCGSim$counts,
  model = "celda_CG",
  paramsTest = list(K = seq(4, 6), L = seq(9, 11)),
  paramsFixed = list(sampleLabel = celdaCGSim$sampleLabel),
  bestOnly = TRUE,
  nchains = 1,
  cores = 1)
## End(Not run)
An old example simulated count matrix from the celda_G model.
celdaGSim
A list of counts and properties as returned from old simulateCells()
Render a stylable heatmap of count data based on celda clustering results.
celdaHeatmap(sce, useAssay = "counts", altExpName = "featureSubset", featureIx = NULL, nfeatures = 25, ...)
## S4 method for signature 'SingleCellExperiment'
celdaHeatmap(sce, useAssay = "counts", altExpName = "featureSubset", featureIx = NULL, nfeatures = 25, ...)
sce |
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG. |
useAssay |
A string specifying which assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
featureIx |
Integer vector. Select features for display in heatmap. If
NULL, no subsetting will be performed. Default NULL. Only used for
|
nfeatures |
Integer. Maximum number of features to select for each
gene module. Default 25. Only used for |
... |
Additional parameters passed to plotHeatmap. |
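The `nfeatures` cap can be pictured as keeping, within each gene module, only the highest-ranked features. The sketch below ranks features by total counts, which is an assumption made purely for illustration; celda's own ranking criterion is not described here, and the helper `topFeaturesPerModule` is hypothetical.

```r
# Hypothetical helper: keep up to `nfeatures` features per module, ranked by total counts.
topFeaturesPerModule <- function(counts, modules, nfeatures = 25) {
  totals <- rowSums(counts)
  keep <- unlist(lapply(split(seq_len(nrow(counts)), modules), function(ix) {
    # Order this module's features by total counts, then take the first nfeatures.
    ix[order(totals[ix], decreasing = TRUE)][seq_len(min(nfeatures, length(ix)))]
  }), use.names = FALSE)
  sort(keep)
}

set.seed(1)
counts <- matrix(rpois(60, lambda = 5), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), NULL))
modules <- c(1, 1, 1, 2, 2, 2)
selected <- topFeaturesPerModule(counts, modules, nfeatures = 2)
rownames(counts)[selected]
```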
List. A list containing dendrogram information and the heatmap grob.
'celdaTsne()' for generating 2-dimensional tSNE coordinates
data(sceCeldaCG)
celdaHeatmap(sceCeldaCG)
Return the celda model for sce returned by celda_C, celda_G, or celda_CG.
celdaModel(sce, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
celdaModel(sce, altExpName = "featureSubset")
sce |
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG. |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
Character. The celda model. Can be one of "celda_C", "celda_G", or "celda_CG".
data(sceCeldaCG)
celdaModel(sceCeldaCG)
Return or set the feature module cluster labels determined by celda_G or celda_CG models.
celdaModules(sce, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
celdaModules(sce, altExpName = "featureSubset")
celdaModules(sce, altExpName = "featureSubset") <- value
## S4 replacement method for signature 'SingleCellExperiment'
celdaModules(sce, altExpName = "featureSubset") <- value
sce |
A SingleCellExperiment object returned by
celda_G or celda_CG, with the matrix
located in the |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
value |
Character vector of feature module labels for replacements.
Works only if |
Character vector. Contains feature module labels for each feature in sce.
data(sceCeldaCG)
celdaModules(sceCeldaCG)
Returns perplexity for each model in a celdaList as calculated by 'perplexity()'.
celdaPerplexity(celdaList)
celdaList |
An object of class celdaList. |
List. Contains one celdaModel object for each of the parameters specified in the 'runParams()' of the provided celda list.
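Perplexity is conventionally the exponential of the negative average log-likelihood per observed count, so lower values indicate a better fit. A minimal sketch of that standard definition follows; the inputs are toy numbers, the helper name is hypothetical, and celda's perplexity() may differ in detail (for example, in which counts it evaluates).

```r
# Standard perplexity: exp(-logLik / N), where N is the total number of counts.
perplexityFromLogLik <- function(logLik, totalCounts) {
  exp(-logLik / totalCounts)
}

# Toy example: 10,000 total counts and a log likelihood of -46,000.
perplexityFromLogLik(-46000, 10000)  # exp(4.6), roughly 99.5
```

As a sanity check, a model that spreads probability uniformly over V features has perplexity V.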
data(celdaCGGridSearchRes)
celdaCGGridModelPerplexities <- celdaPerplexity(celdaCGGridSearchRes)
Returns perplexity for each model in a celdaList as calculated by 'perplexity()'.
## S4 method for signature 'celdaList'
celdaPerplexity(celdaList)
celdaList |
An object of class celdaList. |
List. Contains one celdaModel object for each of the parameters specified in the 'runParams()' of the provided celda list.
data(celdaCGGridSearchRes)
celdaCGGridModelPerplexities <- celdaPerplexity(celdaCGGridSearchRes)
Renders probability and relative expression heatmaps to visualize the relationship between features and cell populations (or cell populations and samples).
celdaProbabilityMap( sce, useAssay = "counts", altExpName = "featureSubset", level = c("cellPopulation", "sample"), ncols = 100, col2 = circlize::colorRamp2(c(-2, 0, 2), c("#1E90FF", "#FFFFFF", "#CD2626")), title1 = "Absolute probability", title2 = "Relative expression", showColumnNames = TRUE, showRowNames = TRUE, rowNamesgp = grid::gpar(fontsize = 8), colNamesgp = grid::gpar(fontsize = 12), clusterRows = FALSE, clusterColumns = FALSE, showHeatmapLegend = TRUE, heatmapLegendParam = list(title = NULL, legend_height = grid::unit(6, "cm")), ... ) ## S4 method for signature 'SingleCellExperiment' celdaProbabilityMap( sce, useAssay = "counts", altExpName = "featureSubset", level = c("cellPopulation", "sample"), ncols = 100, col2 = circlize::colorRamp2(c(-2, 0, 2), c("#1E90FF", "#FFFFFF", "#CD2626")), title1 = "Absolute probability", title2 = "Relative expression", showColumnNames = TRUE, showRowNames = TRUE, rowNamesgp = grid::gpar(fontsize = 8), colNamesgp = grid::gpar(fontsize = 12), clusterRows = FALSE, clusterColumns = FALSE, showHeatmapLegend = TRUE, heatmapLegendParam = list(title = NULL, legend_height = grid::unit(6, "cm")), ... )
sce |
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG. |
useAssay |
A string specifying which assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
level |
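Character. One of "cellPopulation" or "sample".
"cellPopulation" will display the absolute probabilities and relative
normalized expression of each module in each cell population.
"sample" will display the absolute probabilities and relative
normalized abundance of each cell population in each sample.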
|
ncols |
The number of colors (>1) in the color palette of the absolute probability heatmap. |
col2 |
Passed to |
title1 |
Passed to |
title2 |
Passed to |
showColumnNames |
Passed to |
showRowNames |
Passed to |
rowNamesgp |
Passed to |
colNamesgp |
Passed to |
clusterRows |
Passed to |
clusterColumns |
Passed to |
showHeatmapLegend |
Passed to |
heatmapLegendParam |
Passed to |
... |
Additional parameters passed to Heatmap. |
A HeatmapList object containing 2 Heatmap-class objects
celda_C for clustering cells. celda_CG for clustering features and cells
data(sceCeldaCG)
celdaProbabilityMap(sceCeldaCG)
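The "Relative expression" panel uses a diverging color scale centered at 0 (the default col2 spans -2 to 2), which suggests standardized values. The minimal sketch below assumes that relative expression is obtained by z-scoring each module's absolute probabilities across cell populations. The matrix is made up, and this is not necessarily celda's exact computation.

```r
# Made-up absolute probabilities: rows = feature modules, columns = cell
# populations; each column sums to 1
absProb <- matrix(c(0.60, 0.10, 0.05,
                    0.30, 0.20, 0.50,
                    0.10, 0.70, 0.45),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("L", 1:3), paste0("K", 1:3)))

# Assumed relative expression: center and scale each row (module) so it has
# mean 0 and standard deviation 1 across populations.
# A row with zero variance would produce NaN.
relExpr <- t(scale(t(absProb)))
round(relExpr, 2)
```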
Convert an old celda model object (celda_C, celda_G, or celda_CG object) to a SingleCellExperiment object containing celda model information in the metadata slot. The counts matrix is stored in the "counts" assay slot in assays.
celdatosce( celdaModel, counts, useAssay = "counts", altExpName = "featureSubset" ) ## S4 method for signature 'celda_C' celdatosce( celdaModel, counts, useAssay = "counts", altExpName = "featureSubset" ) ## S4 method for signature 'celda_G' celdatosce( celdaModel, counts, useAssay = "counts", altExpName = "featureSubset" ) ## S4 method for signature 'celda_CG' celdatosce( celdaModel, counts, useAssay = "counts", altExpName = "featureSubset" ) ## S4 method for signature 'celdaList' celdatosce( celdaModel, counts, useAssay = "counts", altExpName = "featureSubset" )
celdaModel |
A |
counts |
A numeric matrix of counts used to generate
|
useAssay |
A string specifying the name of the assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
A SingleCellExperiment object. Function parameter settings are stored in the metadata "celda_parameters" slot. Columns celda_sample_label and celda_cell_cluster in colData contain sample labels and celda cell population clusters. Column celda_feature_module in rowData contains feature modules.
data(celdaCMod, celdaCSim)
sce <- celdatosce(celdaCMod, celdaCSim$counts)
data(celdaGMod, celdaGSim)
sce <- celdatosce(celdaGMod, celdaGSim$counts)
data(celdaCGMod, celdaCGSim)
sce <- celdatosce(celdaCGMod, celdaCGSim$counts)
data(celdaCGGridSearchRes, celdaCGSim)
sce <- celdatosce(celdaCGGridSearchRes, celdaCGSim$counts)
Embeds cells in two dimensions using Rtsne based on a celda model. For celda_C sce objects, PCA on the normalized counts is used to reduce the number of features before applying t-SNE. For celda_CG and celda_G sce objects, t-SNE is run on module probabilities to reduce the number of features instead of using PCA. Module probabilities are square-root transformed before applying t-SNE.
celdaTsne( sce, useAssay = "counts", altExpName = "featureSubset", maxCells = NULL, minClusterSize = 100, initialDims = 20, modules = NULL, perplexity = 20, maxIter = 2500, normalize = "proportion", scaleFactor = NULL, transformationFun = sqrt, seed = 12345 ) ## S4 method for signature 'SingleCellExperiment' celdaTsne( sce, useAssay = "counts", altExpName = "featureSubset", maxCells = NULL, minClusterSize = 100, initialDims = 20, modules = NULL, perplexity = 20, maxIter = 2500, normalize = "proportion", scaleFactor = NULL, transformationFun = sqrt, seed = 12345 )
sce |
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG. |
useAssay |
A string specifying which assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
maxCells |
Integer. Maximum number of cells to plot. Cells will be
randomly subsampled if |
minClusterSize |
Integer. Do not subsample cell clusters below this threshold. Default 100. |
initialDims |
Integer. PCA will be used to reduce the dimensionality of the dataset. The top 'initialDims' principal components will be used for tSNE. Default 20. |
modules |
Integer vector. Determines which feature modules to use for
tSNE. If |
perplexity |
Numeric. Perplexity parameter for tSNE. Default 20. |
maxIter |
Integer. Maximum number of iterations in tSNE generation. Default 2500. |
normalize |
Character. Passed to normalizeCounts in normalization step. Divides counts by the library sizes for each cell. One of 'proportion', 'cpm', 'median', or 'mean'. 'proportion' uses the total counts for each cell as the library size. 'cpm' divides the library size of each cell by one million to produce counts per million. 'median' divides the library size of each cell by the median library size across all cells. 'mean' divides the library size of each cell by the mean library size across all cells. |
scaleFactor |
Numeric. Sets the scale factor for cell-level
normalization. Each cell is multiplied by this scale factor after its
library size has been adjusted in |
transformationFun |
Function. Applies a transformation such as 'sqrt',
'log', 'log2', 'log10', or 'log1p'. If |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
sce with t-SNE coordinates (columns "celda_tSNE1" and "celda_tSNE2") added to reducedDim(sce, "celda_tSNE").
data(sceCeldaCG)
tsneRes <- celdaTsne(sceCeldaCG)
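The normalize, scaleFactor, and transformationFun arguments describe a three-step preprocessing: divide each cell by an adjusted library size, optionally multiply by a scale factor, then optionally apply a transformation. The base-R sketch below follows those argument descriptions. `normalizeSketch()` is hypothetical, and celda's own normalizeCounts may differ in details such as pseudocounts and input checks.

```r
# Hypothetical re-implementation of the described normalization options.
normalizeSketch <- function(counts,
                            normalize = c("proportion", "cpm", "median", "mean"),
                            scaleFactor = NULL,
                            transformationFun = NULL) {
  normalize <- match.arg(normalize)
  libSize <- colSums(counts)
  # Adjust the per-cell library size according to the chosen method
  libSize <- switch(normalize,
    proportion = libSize,               # total counts per cell
    cpm = libSize / 1e6,                # yields counts per million
    median = libSize / median(libSize), # relative to the median cell
    mean = libSize / mean(libSize))     # relative to the mean cell
  norm <- sweep(counts, 2, libSize, "/")
  if (!is.null(scaleFactor)) norm <- norm * scaleFactor
  if (!is.null(transformationFun)) norm <- transformationFun(norm)
  norm
}

counts <- matrix(c(10, 30, 60,
                   5, 5, 10,
                   20, 20, 40), nrow = 3)  # library sizes 100, 20, 80
prop <- normalizeSketch(counts, "proportion")
cpm <- normalizeSketch(counts, "cpm")
med <- normalizeSketch(counts, "median")   # cell 3 has the median library size
sqrtProp <- normalizeSketch(counts, "proportion", transformationFun = sqrt)
```

With "proportion", every column sums to 1, so "cpm" is equivalent to "proportion" with scaleFactor = 1e6.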
Embeds cells in two dimensions using umap based on a celda model. For celda_C sce objects, PCA on the normalized counts is used to reduce the number of features before applying UMAP. For celda_CG sce objects, UMAP is run on module probabilities to reduce the number of features instead of using PCA. Module probabilities are square-root transformed before applying UMAP.
celdaUmap( sce, useAssay = "counts", altExpName = "featureSubset", maxCells = NULL, minClusterSize = 100, modules = NULL, seed = 12345, nNeighbors = 30, minDist = 0.75, spread = 1, pca = TRUE, initialDims = 50, normalize = "proportion", scaleFactor = NULL, transformationFun = sqrt, cores = 1, ... ) ## S4 method for signature 'SingleCellExperiment' celdaUmap( sce, useAssay = "counts", altExpName = "featureSubset", maxCells = NULL, minClusterSize = 100, modules = NULL, seed = 12345, nNeighbors = 30, minDist = 0.75, spread = 1, pca = TRUE, initialDims = 50, normalize = "proportion", scaleFactor = NULL, transformationFun = sqrt, cores = 1, ... )
sce |
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG. |
useAssay |
A string specifying which assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
maxCells |
Integer. Maximum number of cells to plot. Cells will be
randomly subsampled if |
minClusterSize |
Integer. Do not subsample cell clusters below this threshold. Default 100. |
modules |
Integer vector. Determines which feature modules to use for UMAP. If NULL, all modules will be used. Default NULL. |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
nNeighbors |
The size of local neighborhood used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. Default 30. See umap for more information. |
minDist |
The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. Default 0.75. See umap for more information. |
spread |
The effective scale of embedded points. In combination with
|
pca |
Logical. Whether to perform
dimensionality reduction with PCA before UMAP. Only works for celda_C
|
initialDims |
Integer. Number of dimensions from PCA to use as
input in UMAP. Default 50. Only works for celda_C |
normalize |
Character. Passed to normalizeCounts in normalization step. Divides counts by the library sizes for each cell. One of 'proportion', 'cpm', 'median', or 'mean'. 'proportion' uses the total counts for each cell as the library size. 'cpm' divides the library size of each cell by one million to produce counts per million. 'median' divides the library size of each cell by the median library size across all cells. 'mean' divides the library size of each cell by the mean library size across all cells. |
scaleFactor |
Numeric. Sets the scale factor for cell-level
normalization. Each cell is multiplied by this scale factor after its
library size has been adjusted in |
transformationFun |
Function. Applies a transformation such as 'sqrt',
'log', 'log2', 'log10', or 'log1p'. If |
cores |
Number of threads to use. Default 1. |
... |
Additional parameters to pass to umap. |
sce with UMAP coordinates (columns "celda_UMAP1" and "celda_UMAP2") added to reducedDim(sce, "celda_UMAP").
data(sceCeldaCG)
umapRes <- celdaUmap(sceCeldaCG)
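For celda_C objects with pca = TRUE, PCA on the normalized counts reduces the number of features before UMAP, and initialDims components are kept. The base-R sketch below shows that reduction step on simulated counts. It assumes square-root-transformed proportions as input, mirroring the default transformationFun = sqrt, and centers without scaling. Both choices are illustrative. The resulting cells-by-components matrix is what a UMAP implementation would then embed.

```r
set.seed(12345)
# Simulated counts: 200 features x 60 cells
counts <- matrix(rpois(200 * 60, lambda = 5), nrow = 200)

# Normalize each cell to proportions, then square-root transform
norm <- sqrt(sweep(counts, 2, colSums(counts), "/"))

# PCA with cells as observations; keep the top `initialDims` components
initialDims <- 10
pca <- prcomp(t(norm), center = TRUE, scale. = FALSE, rank. = initialDims)
embeddingInput <- pca$x  # 60 cells x 10 principal components
```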
Calculate the conditional probability of each cell belonging to each subpopulation given all other cell cluster assignments and/or each feature belonging to each module given all other feature cluster assignments in a celda model.
clusterProbability( sce, useAssay = "counts", altExpName = "featureSubset", log = FALSE ) ## S4 method for signature 'SingleCellExperiment' clusterProbability( sce, useAssay = "counts", altExpName = "featureSubset", log = FALSE )
sce |
A SingleCellExperiment object returned by
celda_C, celda_G, or celda_CG, with the matrix
located in the |
useAssay |
A string specifying which assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
log |
Logical. If |
A list containing a matrix for the conditional cell subpopulation cluster and/or feature module probabilities.
'celda_C()' for clustering cells
data(sceCeldaCG)
clusterProb <- clusterProbability(sceCeldaCG, log = TRUE)
data(sceCeldaC)
clusterProb <- clusterProbability(sceCeldaC)
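With log = TRUE, the probabilities are returned on the log scale. If those log-scale values are unnormalized (an assumption here, because the full description of log is not shown), they can be converted to per-cell probabilities with the log-sum-exp trick. This trick avoids underflow when the log values are very negative. The helper `normalizeLogProbs()` and the input matrix are hypothetical.

```r
# Convert a cells-by-clusters matrix of log-scale values into probabilities
# that sum to 1 in each row (log-sum-exp normalization).
normalizeLogProbs <- function(logProb) {
  rowMax <- apply(logProb, 1, max)
  # Subtracting each row's maximum keeps its largest term at exp(0) = 1
  shifted <- exp(logProb - rowMax)
  shifted / rowSums(shifted)
}

# Hypothetical log-scale values: rows = cells, columns = clusters
logProb <- matrix(c(-1000, -1001, -1002,
                    -5, -5, -5), nrow = 2, byrow = TRUE)
exp(logProb[1, ])                 # naive exponentiation underflows to 0 0 0
probs <- normalizeLogProbs(logProb)
rowSums(probs)                    # 1 1
```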
Checks if the counts matrix is the same one used to generate the celda model object by comparing dimensions and MD5 checksum.
compareCountMatrix(counts, celdaMod, errorOnMismatch = TRUE) ## S4 method for signature 'ANY,celdaModel' compareCountMatrix(counts, celdaMod, errorOnMismatch = TRUE) ## S4 method for signature 'ANY,celdaList' compareCountMatrix(counts, celdaMod, errorOnMismatch = TRUE)
counts |
Integer, numeric, or sparse matrix. Rows represent features and columns represent cells. |
celdaMod |
A |
errorOnMismatch |
Logical. Whether to throw an error in the event of a mismatch. Default TRUE. |
Returns TRUE if the provided count matrix matches the one used to generate the celda object. If the matrices do not match, an error is thrown when errorOnMismatch = TRUE; otherwise FALSE is returned.
data(celdaCGSim, celdaCGMod)
compareCountMatrix(celdaCGSim$counts, celdaCGMod, errorOnMismatch = FALSE)
data(celdaCGSim, celdaCGGridSearchRes)
compareCountMatrix(celdaCGSim$counts, celdaCGGridSearchRes, errorOnMismatch = FALSE)
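compareCountMatrix checks both the dimensions and an MD5 checksum of the counts. The base-R sketch below illustrates the same idea by hashing a serialized copy of the matrix with tools::md5sum. The helpers `md5OfObject()` and `sameCounts()` are hypothetical. Their hash will not match the checksum stored by celda, which may be computed differently.

```r
# MD5 of an R object, computed by serializing it to a temporary file
md5OfObject <- function(x) {
  f <- tempfile(fileext = ".rds")
  on.exit(unlink(f))
  saveRDS(x, f, compress = FALSE)
  unname(tools::md5sum(f))
}

# TRUE only if both the dimensions and the checksum match the reference
sameCounts <- function(counts, refDim, refChecksum) {
  identical(dim(counts), refDim) &&
    identical(md5OfObject(counts), refChecksum)
}

counts <- matrix(1:6, nrow = 2)
refDim <- dim(counts)
refChecksum <- md5OfObject(counts)  # 32-character hex string

modified <- counts
modified[1, 1] <- 99L
sameCounts(counts, refDim, refChecksum)    # TRUE
sameCounts(modified, refDim, refChecksum)  # FALSE: same dims, different hash
```

Checking dimensions first is cheap and catches most mismatches before any hashing is needed.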
A toy contamination dataset generated by simulateContamination.
contaminationSim
A list
Returns the MD5 hash of the count matrix used to generate the celdaList.
countChecksum(celdaList)
celdaList |
An object of class celdaList. |
A character string of length 32 containing the MD5 digest of the count matrix.
data(celdaCGGridSearchRes)
countChecksum <- countChecksum(celdaCGGridSearchRes)
Returns the MD5 hash of the count matrix used to generate the celdaList.
## S4 method for signature 'celdaList' countChecksum(celdaList)
celdaList |
An object of class celdaList. |
A character string of length 32 containing the MD5 digest of the count matrix.
data(celdaCGGridSearchRes)
countChecksum <- countChecksum(celdaCGGridSearchRes)
Identifies contamination from factors such as ambient RNA in single cell genomic datasets.
decontX(x, ...) ## S4 method for signature 'SingleCellExperiment' decontX( x, assayName = "counts", z = NULL, batch = NULL, background = NULL, bgAssayName = NULL, bgBatch = NULL, maxIter = 500, delta = c(10, 10), estimateDelta = TRUE, convergence = 0.001, iterLogLik = 10, varGenes = 5000, dbscanEps = 1, seed = 12345, logfile = NULL, verbose = TRUE ) ## S4 method for signature 'ANY' decontX( x, z = NULL, batch = NULL, background = NULL, bgBatch = NULL, maxIter = 500, delta = c(10, 10), estimateDelta = TRUE, convergence = 0.001, iterLogLik = 10, varGenes = 5000, dbscanEps = 1, seed = 12345, logfile = NULL, verbose = TRUE )
x |
A numeric matrix of counts or a SingleCellExperiment
with the matrix located in the assay slot under |
... |
For the generic, further arguments to pass to each method. |
assayName |
Character. Name of the assay to use if |
z |
Numeric or character vector. Cell cluster labels. If NULL, PCA will be used to reduce the dimensionality of the dataset initially, 'umap' from the 'uwot' package will be used to further reduce the dataset to 2 dimensions, and the 'dbscan' function from the 'dbscan' package will be used to identify clusters of broad cell types. Default NULL. |
batch |
Numeric or character vector. Batch labels for cells. If batch labels are supplied, DecontX is run on cells from each batch separately. Cells run in different channels or assays should be considered different batches. Default NULL. |
background |
A numeric matrix of counts or a
SingleCellExperiment with the matrix located in the assay
slot under |
bgAssayName |
Character. Name of the assay to use if |
bgBatch |
Numeric or character vector. Batch labels for
|
maxIter |
Integer. Maximum iterations of the EM algorithm. Default 500. |
delta |
Numeric vector of length 2. Concentration parameters for
the Dirichlet prior for the contamination in each cell. The first element
is the prior for the native counts while the second element is the prior for
the contamination counts. These essentially act as pseudocounts for the
native and contamination in each cell. If |
estimateDelta |
Boolean. Whether to update |
convergence |
Numeric. The EM algorithm will be stopped if the maximum difference in the contamination estimates between the previous and current iterations is less than this. Default 0.001. |
iterLogLik |
Integer. Calculate log likelihood every |
varGenes |
Integer. The number of variable genes to use in
dimensionality reduction before clustering. Variability is calculated using
|
dbscanEps |
Numeric. The clustering resolution parameter used in 'dbscan' to estimate broad cell clusters. Used only when z is not provided. Default 1. |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
logfile |
Character. Messages will be redirected to a file named 'logfile'. If NULL, messages will be printed to stdout. Default NULL. |
verbose |
Logical. Whether to print log messages. Default TRUE. |
If x is a matrix-like object, a list will be returned with the following items:
decontXcounts: The decontaminated matrix. Values obtained from the variational inference procedure may be non-integer. However, integer counts can be obtained by rounding, e.g. round(decontXcounts).
contamination: Percentage of contamination in each cell.
estimates: List of estimated parameters for each batch. If z was not supplied, then the UMAP coordinates used to generate cell cluster labels will also be stored here.
z: Cell population/cluster labels used for analysis.
runParams: List of arguments used in the function call.
If x is a SingleCellExperiment, then the decontaminated counts will be stored as an assay and can be accessed with decontXcounts(x). The contamination values and cluster labels will be stored in colData(x). estimates and runParams will be stored in metadata(x)$decontX. The UMAPs used to generate cell cluster labels will be stored in the reducedDims slot in x.
Shiyi Yang, Yuan Yin, Joshua Campbell
# Generate matrix with contamination
s <- simulateContamination(seed = 12345)
library(SingleCellExperiment)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)
# Plot contamination on UMAP
plotDecontXContamination(sce)
# Plot decontX cluster labels
umap <- reducedDim(sce)
plotDimReduceCluster(x = sce$decontX_clusters, dim1 = umap[, 1], dim2 = umap[, 2])
# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")
# Plot percentage of marker genes detected
# in each cell cluster after decontamination
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "decontXcounts")
# Plot percentage of marker genes detected in each cell cluster,
# comparing original and decontaminated counts side by side
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = c("counts", "decontXcounts"))
# Plot raw counts of individual marker genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))
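The delta argument is described as pseudocounts for the native and contaminating transcripts in each cell. The simplified illustration below assumes that a cell's native proportion follows a Beta(delta[1], delta[2]) prior. Under that assumption, the conjugate posterior mean of its contamination fraction adds delta[2] to the contaminating counts and sum(delta) to the total. This is not decontX's variational inference procedure, which estimates the native/contamination split rather than observing it. The helper name is hypothetical.

```r
# Posterior mean contamination fraction for one cell under a Beta prior on
# the native proportion; delta[1] and delta[2] act as pseudocounts for
# native and contaminating counts, respectively.
posteriorContamination <- function(nativeCounts, contamCounts,
                                   delta = c(10, 10)) {
  (contamCounts + delta[2]) / (nativeCounts + contamCounts + sum(delta))
}

posteriorContamination(0, 0)                       # prior mean: 10 / 20 = 0.5
posteriorContamination(900, 100)                   # 110 / 1020, about 0.108
posteriorContamination(900, 100, delta = c(1, 1))  # weaker prior: 101 / 1002
```

Larger delta values pull each cell's estimate more strongly toward the prior mean delta[2] / sum(delta), and this effect is most pronounced in cells with few counts.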
Gets or sets the decontaminated counts matrix from a SingleCellExperiment object.
decontXcounts(object, ...) decontXcounts(object, ...) <- value ## S4 method for signature 'SingleCellExperiment' decontXcounts(object, ...) ## S4 replacement method for signature 'SingleCellExperiment' decontXcounts(object, ...) <- value
object |
A SingleCellExperiment object. |
... |
For the generic, further arguments to pass to each method. |
value |
A matrix to save as an assay called |
If getting, the assay from object with the name decontXcounts will be returned. If setting, a SingleCellExperiment object will be returned with decontXcounts listed in the assay slot.
Generate a palette of 'n' distinct colors.
distinctColors( n, hues = c("red", "cyan", "orange", "blue", "yellow", "purple", "green", "magenta"), saturationRange = c(0.7, 1), valueRange = c(0.7, 1) )
n |
Integer. Number of colors to generate. |
hues |
Character vector. Colors available from 'colors()'. These will be used as the base colors for the clustering scheme in HSV. Different saturations and values will be generated for each hue. Default c("red", "cyan", "orange", "blue", "yellow", "purple", "green", "magenta"). |
saturationRange |
Numeric vector. A vector of length 2 denoting the saturation for HSV. Values must be in [0,1]. Default: c(0.7, 1). |
valueRange |
Numeric vector. A vector of length 2 denoting the range of values for HSV. Values must be in [0,1]. Default: c(0.7, 1). |
A vector of distinct colors that have been converted to HEX from HSV.
colorPal <- distinctColors(6) # can be used in plotting functions
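The hues, saturationRange, and valueRange arguments suggest a palette built in HSV space, where each base hue is reused at another saturation/value level once all hues have been used. The base-R sketch below shows one way to do this with grDevices. `hsvPaletteSketch()` is hypothetical, and its ordering and spacing need not match distinctColors.

```r
# Hypothetical HSV palette: cycle through the base hues, stepping down to a
# lower saturation/value level each time the hues are exhausted.
hsvPaletteSketch <- function(n,
                             hues = c("red", "cyan", "orange", "blue",
                                      "yellow", "purple", "green", "magenta"),
                             saturationRange = c(0.7, 1),
                             valueRange = c(0.7, 1)) {
  # Hue component (0-1) of each named base color
  baseHue <- grDevices::rgb2hsv(grDevices::col2rgb(hues))["h", ]
  nLevels <- ceiling(n / length(hues))
  # Levels run from the most saturated/brightest to the least
  sat <- seq(saturationRange[2], saturationRange[1], length.out = nLevels)
  val <- seq(valueRange[2], valueRange[1], length.out = nLevels)
  h <- rep(baseHue, times = nLevels)
  s <- rep(sat, each = length(hues))
  v <- rep(val, each = length(hues))
  grDevices::hsv(h, s, v)[seq_len(n)]
}

pal6 <- hsvPaletteSketch(6)    # one level: each hue at full saturation/value
pal16 <- hsvPaletteSketch(16)  # two levels of the 8 base hues
```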
Fast matrix multiplication for double x int
eigenMatMultInt(A, B)
A |
a double matrix |
B |
an integer matrix |
An integer matrix representing the product of A and B
Fast matrix multiplication for double x double
eigenMatMultNumeric(A, B)
A |
a double matrix |
B |
a double matrix |
A numeric matrix representing the product of A and B
Generates factorized matrices showing the contribution of each feature in each cell population or each cell population in each sample.
factorizeMatrix( x, celdaMod, useAssay = "counts", altExpName = "featureSubset", type = c("counts", "proportion", "posterior") ) ## S4 method for signature 'SingleCellExperiment,ANY' factorizeMatrix( x, useAssay = "counts", altExpName = "featureSubset", type = c("counts", "proportion", "posterior") ) ## S4 method for signature 'ANY,celda_CG' factorizeMatrix(x, celdaMod, type = c("counts", "proportion", "posterior")) ## S4 method for signature 'ANY,celda_C' factorizeMatrix(x, celdaMod, type = c("counts", "proportion", "posterior")) ## S4 method for signature 'ANY,celda_G' factorizeMatrix(x, celdaMod, type = c("counts", "proportion", "posterior"))
x |
Can be one of
|
celdaMod |
Celda model object. Only works if |
useAssay |
A string specifying which assay
slot to use if |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
type |
Character vector. A vector containing one or more of "counts",
"proportion", or "posterior". "counts" returns the raw number of counts for
each factorized matrix. "proportion" returns the normalized probabilities
for each factorized matrix, which are calculated by dividing the raw counts
in each factorized matrix by the total counts in each column. "posterior"
returns the posterior estimates which include the addition of the Dirichlet
concentration parameter (essentially as a pseudocount). Default
|
For celda_CG models, a list with elements for "counts", "proportions", or "posterior" probabilities. Each element will be a list containing factorized matrices for "module", "cellPopulation", and "sample". Additionally, the contribution of each module in each individual cell will be included in the "cell" element of the "counts" and "proportions" elements.
For celda_C model, a list with elements for "counts", "proportions", or "posterior" probabilities. Each element will be a list containing factorized matrices for "module" and "sample".
For celda_G model, a list with elements for "counts", "proportions", or "posterior" probabilities. Each element will be a list containing factorized matrices for "module" and "cell".
data(sceCeldaCG)
factorizedMatrices <- factorizeMatrix(sceCeldaCG, type = "posterior")
data(celdaCGSim, celdaCGMod)
factorizedMatrices <- factorizeMatrix(celdaCGSim$counts, celdaCGMod, type = "posterior")
data(celdaCSim, celdaCMod)
factorizedMatrices <- factorizeMatrix(celdaCSim$counts, celdaCMod, type = "posterior")
data(celdaGSim, celdaGMod)
factorizedMatrices <- factorizeMatrix(celdaGSim$counts, celdaGMod, type = "posterior")
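The three type options differ only in how the underlying count matrices are scaled. The base-R sketch below applies each option to a made-up module-by-population count matrix. "proportion" divides each column by its total, while "posterior" first adds a Dirichlet concentration parameter as a pseudocount. The value alpha = 1 is an arbitrary illustration. Which concentration parameter celda adds depends on the model and on the factorized matrix.

```r
# Made-up "counts" factor: rows = feature modules, columns = cell populations
counts <- matrix(c(8, 2, 0,
                   2, 6, 4), nrow = 3,
                 dimnames = list(paste0("L", 1:3), paste0("K", 1:2)))

# "proportion": divide the counts in each column by the column total
proportion <- sweep(counts, 2, colSums(counts), "/")

# "posterior": add a concentration parameter (pseudocount) before normalizing
alpha <- 1
posterior <- sweep(counts + alpha, 2, colSums(counts + alpha), "/")

proportion["L3", "K1"]  # 0: module L3 never observed in population K1
posterior["L3", "K1"]   # 1 / 13: the pseudocount keeps it above zero
```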
Fast normalization for numeric matrix
fastNormProp(R_counts, R_alpha)
R_counts |
An integer matrix |
R_alpha |
A double value to be added to the matrix as a pseudocount |
A numeric matrix where the columns have been normalized to proportions
Fast normalization for numeric matrix
fastNormPropLog(R_counts, R_alpha)
R_counts |
An integer matrix |
R_alpha |
A double value to be added to the matrix as a pseudocount |
A numeric matrix where the columns have been normalized to proportions
Fast normalization for numeric matrix
fastNormPropSqrt(R_counts, R_alpha)
R_counts |
An integer matrix |
R_alpha |
A double value to be added to the matrix as a pseudocount |
A numeric matrix where the columns have been normalized to proportions
This function will output the corresponding feature module for a specified vector of genes from a celda_CG or celda_G celdaModel. features must match the rownames of sce.
featureModuleLookup( sce, features, altExpName = "featureSubset", exactMatch = TRUE, by = "rownames" ) ## S4 method for signature 'SingleCellExperiment' featureModuleLookup( sce, features, altExpName = "featureSubset", exactMatch = TRUE, by = "rownames" )
sce |
A SingleCellExperiment object returned by
celda_G, or celda_CG, with the matrix
located in the |
features |
Character vector. Identify feature modules for the specified
feature names. |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
exactMatch |
Logical. Whether to look for exactMatch of the gene name
within counts matrix. Default |
by |
Character. Where to search for |
Numeric vector containing the module numbers for each feature. If a feature was not found, then an NA value will be returned in that position. If no features were found, then an error will be given.
data(sceCeldaCG)
module <- featureModuleLookup(sce = sceCeldaCG, features = c("Gene_1", "Gene_XXX"))
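Base R's match() shows the NA behavior for missing features, because it returns NA for names that are absent. The sketch below uses a hypothetical named vector of module assignments. The grep() call shows why exact lookups (exactMatch = TRUE in the usage) avoid ambiguity: the pattern "Gene_1" also hits "Gene_10" and "Gene_11". How celda resolves partial matches is not shown here, so grep() is only an assumption.

```r
# Hypothetical feature-to-module assignments, named by feature
featureModules <- c(Gene_1 = 3L, Gene_2 = 1L, Gene_10 = 3L, Gene_11 = 2L)

# Exact lookup: match() gives NA for features that are not present
lookupModules <- function(features, featureModules) {
  unname(featureModules[match(features, names(featureModules))])
}

lookupModules(c("Gene_1", "Gene_XXX"), featureModules)  # 3 NA

# A partial (regular-expression) match on "Gene_1" is ambiguous
grep("Gene_1", names(featureModules), value = TRUE)
# "Gene_1" "Gene_10" "Gene_11"
```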
Creates a table that contains the list of features in each feature module.
featureModuleTable( sce, useAssay = "counts", altExpName = "featureSubset", displayName = NULL, outputFile = NULL )
sce |
A SingleCellExperiment object returned by
celda_G or celda_CG, with the matrix
located in the |
useAssay |
A string specifying which assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
displayName |
Character. The column name of
|
outputFile |
File name for feature module table. If NULL, file will not be created. Default NULL. |
Matrix. Contains a list of features per each column (feature module)
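A table with one column per feature module can be built in base R by splitting feature names by their module label and padding shorter columns. The function moduleTableSketch, the "" padding, and the "L" column prefix are assumptions made for illustration. The sketch keeps features in input order, which may differ from the ordering featureModuleTable uses.

```r
# Hypothetical sketch: one column per feature module, padded with "".
moduleTableSketch <- function(featureNames, moduleLabels) {
  byModule <- split(featureNames, moduleLabels)  # module label -> its features
  maxLen <- max(lengths(byModule))               # length of the longest module
  padded <- lapply(byModule, function(f) c(f, rep("", maxLen - length(f))))
  tab <- do.call(cbind, padded)                  # bind modules as columns
  colnames(tab) <- paste0("L", names(byModule))
  tab
}

tab <- moduleTableSketch(c("Gene_1", "Gene_2", "Gene_3"), c(2, 1, 2))
print(tab)
```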
data(sceCeldaCG)
featureModuleTable(sceCeldaCG)
Identifies and returns significantly enriched terms for each gene module in a Celda object or a SingleCellExperiment object. Performs gene set enrichment analysis for Celda-identified modules using Enrichr.
geneSetEnrich(x, celdaModel, useAssay = "counts", altExpName = "featureSubset", databases, fdr = 0.05)
## S4 method for signature 'SingleCellExperiment'
geneSetEnrich(x, useAssay = "counts", altExpName = "featureSubset", databases, fdr = 0.05)
## S4 method for signature 'matrix'
geneSetEnrich(x, celdaModel, databases, fdr = 0.05)
x |
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under |
celdaModel |
Celda object of class |
useAssay |
A string specifying which assay
slot to use if |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
databases |
Character vector. Names of reference databases. Available databases can be viewed by listEnrichrDbs. |
fdr |
False discovery rate (FDR). Numeric. Cutoff value for the adjusted p-value; terms with an FDR below this value are considered significantly enriched. |
List of length 'L' where each member contains the significantly enriched terms for the corresponding module.
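The fdr cutoff works like a standard adjusted p-value filter. The base-R sketch below applies the Benjamini-Hochberg adjustment from stats::p.adjust and keeps terms whose adjusted value falls below the cutoff. The helper filterByFdr and the example terms and p-values are made up. This section does not state whether geneSetEnrich uses the adjusted p-values reported by Enrichr or recomputes them.

```r
# Hypothetical FDR filter using the Benjamini-Hochberg adjustment.
filterByFdr <- function(terms, pvals, fdr = 0.05) {
  adj <- p.adjust(pvals, method = "BH")  # BH-adjusted p-values
  terms[adj < fdr]                       # keep terms below the FDR cutoff
}

terms <- c("termA", "termB", "termC")
pvals <- c(0.001, 0.04, 0.5)
filterByFdr(terms, pvals)
```

Here termB has a raw p-value of 0.04 but an adjusted value of 0.06, so it passes only at a looser cutoff such as fdr = 0.1.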
Ahmed Youssef, Zhe Wang
library(M3DExampleData)
counts <- M3DExampleData::Mmus_example_list$data
# subset 500 genes for fast clustering
counts <- counts[seq(1501, 2000), ]
# cluster genes into 10 modules for quick demo
sce <- celda_G(x = as.matrix(counts), L = 10, verbose = FALSE)
gse <- geneSetEnrich(sce, databases = c("GO_Biological_Process_2018", "GO_Molecular_Function_2018"))
Calculate the log-likelihood for cell population and feature module cluster assignments on the count matrix, per celda model.
logLikelihood(x, celdaMod, useAssay = "counts", altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment,ANY'
logLikelihood(x, useAssay = "counts", altExpName = "featureSubset")
## S4 method for signature 'matrix,celda_C'
logLikelihood(x, celdaMod)
## S4 method for signature 'matrix,celda_G'
logLikelihood(x, celdaMod)
## S4 method for signature 'matrix,celda_CG'
logLikelihood(x, celdaMod)
x |
A SingleCellExperiment object returned by
celda_C, celda_G, or celda_CG, with the matrix
located in the |
celdaMod |
celda model object. Ignored if |
useAssay |
A string specifying which assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
The log-likelihood of the cluster assignment for the provided SingleCellExperiment.
'celda_C()' for clustering cells
data(sceCeldaC, sceCeldaCG)
loglikC <- logLikelihood(sceCeldaC)
loglikCG <- logLikelihood(sceCeldaCG)
Retrieves the complete log-likelihood from all iterations of Gibbs sampling used to generate a celda model.
logLikelihoodHistory(x, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
logLikelihoodHistory(x, altExpName = "featureSubset")
## S4 method for signature 'celdaModel'
logLikelihoodHistory(x)
x |
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG, or a celda model object. |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
Numeric. The log-likelihood at each step of Gibbs sampling used to generate the model.
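A returned history can be inspected with ordinary vector operations. The hypothetical helper hasConverged below checks whether the relative change over the last few iterations stays under a tolerance. Both the window and tolerance rule are assumptions for illustration only. Celda's own stopping rule is not described in this section.

```r
# Hypothetical convergence diagnostic for a log-likelihood history vector.
hasConverged <- function(ll, window = 5, tol = 1e-4) {
  if (length(ll) <= window) return(FALSE)         # too few iterations to judge
  recent <- tail(ll, window + 1)                  # last window + 1 values
  relChange <- abs(diff(recent)) / abs(head(recent, -1))
  all(relChange < tol)                            # every recent step is tiny
}

ll <- c(-1000, -900, -850, rep(-849.99, 6))
hasConverged(ll)
```

With a celda result, the input would be the output of logLikelihoodHistory(), for example logLikelihoodHistory(celdaCGMod).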
data(sceCeldaCG)
logLikelihoodHistory(sceCeldaCG)
data(celdaCGMod)
logLikelihoodHistory(celdaCGMod)
Retrieves the row, column, and sample names used to generate a celdaModel.
matrixNames(celdaMod)
## S4 method for signature 'celdaModel'
matrixNames(celdaMod)
celdaMod |
celdaModel. Options available in 'celda::availableModels'. |
List. Contains row, column, and sample character vectors corresponding to the values provided when the celdaModel was generated.
data(celdaCGMod)
matrixNames(celdaCGMod)
Renders a heatmap for the selected featureModule. Cells are ordered from those with the lowest probability of the module on the left to the highest probability on the right. Features are ordered from those with the highest probability in the module on the top to the lowest probability on the bottom.
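The cell ordering described above can be sketched in base R. Sort cells by their module probability in ascending order, then keep the lowest and highest topCells cells. The helper selectTopCells and the example probabilities are hypothetical. Taking cells from both ends is inferred from the topCells description, and the exact selection inside moduleHeatmap may differ.

```r
# Hypothetical sketch of choosing and ordering cells for one module.
selectTopCells <- function(moduleProb, topCells) {
  ord <- order(moduleProb)                     # lowest -> highest probability
  if (2 * topCells >= length(ord)) return(ord) # few cells: keep them all
  c(head(ord, topCells), tail(ord, topCells))  # low block, then high block
}

p <- c(0.9, 0.1, 0.5, 0.3, 0.7)
selectTopCells(p, 1)
```

The result is a vector of column indices, left to right, so lower-probability cells appear first as the description states.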
moduleHeatmap(x, useAssay = "counts", altExpName = "featureSubset", modules = NULL, featureModule = NULL, col = circlize::colorRamp2(c(-2, 0, 2), c("#1E90FF", "#FFFFFF", "#CD2626")), topCells = 100, topFeatures = NULL, normalizedCounts = NA, normalize = "proportion", transformationFun = sqrt, scaleRow = scale, showFeatureNames = TRUE, displayName = NULL, trim = c(-2, 2), rowFontSize = NULL, showHeatmapLegend = FALSE, showTopAnnotationLegend = FALSE, showTopAnnotationName = FALSE, topAnnotationHeight = 5, showModuleLabel = TRUE, moduleLabel = "auto", moduleLabelSize = NULL, byrow = TRUE, top = NA, unit = "mm", ncol = NULL, useRaster = TRUE, returnAsList = FALSE, ...)
## S4 method for signature 'SingleCellExperiment'
moduleHeatmap(x, useAssay = "counts", altExpName = "featureSubset", modules = NULL, featureModule = NULL, col = circlize::colorRamp2(c(-2, 0, 2), c("#1E90FF", "#FFFFFF", "#CD2626")), topCells = 100, topFeatures = NULL, normalizedCounts = NA, normalize = "proportion", transformationFun = sqrt, scaleRow = scale, showFeatureNames = TRUE, displayName = NULL, trim = c(-2, 2), rowFontSize = NULL, showHeatmapLegend = FALSE, showTopAnnotationLegend = FALSE, showTopAnnotationName = FALSE, topAnnotationHeight = 5, showModuleLabel = TRUE, moduleLabel = "auto", moduleLabelSize = NULL, byrow = TRUE, top = NA, unit = "mm", ncol = NULL, useRaster = TRUE, returnAsList = FALSE, ...)
x |
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under |
useAssay |
A string specifying which assay
slot to use if |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
modules |
Integer vector. The featureModule(s) to display.
Multiple modules can be included in a vector. Default |
featureModule |
Same as |
col |
Passed to Heatmap. Set color boundaries and colors. |
topCells |
Integer. Number of cells with the highest and lowest
probabilities for each module to include in the heatmap. For example, if
|
topFeatures |
Integer. Plot 'topFeatures' features with the highest
probabilities in the module heatmap for each featureModule. If |
normalizedCounts |
Numeric matrix. Rows represent features and columns
represent cells. If you have a normalized matrix result from
normalizeCounts, you can pass through the result here to
skip the normalization step in this function. Make sure the colnames and
rownames match the object in x. This matrix should
correspond to one generated from this count matrix
|
normalize |
Character. Passed to normalizeCounts if
|
transformationFun |
Function. Passed to normalizeCounts if
|
scaleRow |
Function. Which function to use to scale each individual row. Set to NULL to disable. Occurs after normalization and transformation (by default, the square-root transformation given by transformationFun). For example, scale will Z-score transform each row. Default scale. |
showFeatureNames |
Logical. Whether feature names should be displayed. Default TRUE. |
displayName |
Character. The column name of
|
trim |
Numeric vector. Vector of length two that specifies the lower
and upper bounds for plotting the data. This threshold is applied
after row scaling. Set to NULL to disable. Default |
rowFontSize |
Numeric. Font size for feature names. If |
showHeatmapLegend |
Passed to Heatmap. Show legend for expression levels. |
showTopAnnotationLegend |
Passed to HeatmapAnnotation. Show legend for cell annotation. |
showTopAnnotationName |
Passed to HeatmapAnnotation. Show heatmap top annotation name. |
topAnnotationHeight |
Passed to HeatmapAnnotation. Column annotation height. |
showModuleLabel |
Show left side module labels. |
moduleLabel |
The left side row titles for module heatmap. Must be
a vector of the same length as |
moduleLabelSize |
Passed to gpar. The size of text (in points). |
byrow |
Passed to matrix. Logical. If |
top |
Passed to marrangeGrob. The title for each page. |
unit |
Passed to unit. Single character object defining the unit of all dimensions defined. |
ncol |
Integer. Number of columns of module heatmaps. If |
useRaster |
Boolean. Rasterizing renders the heatmap as a single image object,
reducing the memory used by the plot and the size of the output file. If |
returnAsList |
Boolean. If |
... |
Additional parameters passed to Heatmap. |
A list object if plotting more than one module heatmap. Otherwise, a HeatmapList object is returned.
data(sceCeldaCG)
moduleHeatmap(sceCeldaCG, displayName = "rownames")
Gets the row and column indices of nonzero elements in the matrix.
nonzero(R_counts)
R_counts |
A matrix |
An integer matrix where each row is a (row, column) index pair.
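Base R produces the same kind of (row, column) pair matrix with which(..., arr.ind = TRUE). The snippet below uses it as an illustration, not as a description of how the compiled nonzero routine works. The ordering of pairs and the index base used by nonzero may differ from this base-R result.

```r
# Base-R equivalent idea: one row per nonzero entry, columns "row" and "col".
m <- matrix(c(0, 2, 0, 5, 0, 1), nrow = 2)
idx <- which(m != 0, arr.ind = TRUE)  # 1-based indices in column-major order
print(idx)
```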
Performs normalization, transformation, and/or scaling of a counts matrix
normalizeCounts( counts, normalize = c("proportion", "cpm", "median", "mean"), scaleFactor = NULL, transformationFun = NULL, scaleFun = NULL, pseudocountNormalize = 0, pseudocountTransform = 0 )
counts |
Integer, Numeric or Sparse matrix. Rows represent features and columns represent cells. |
normalize |
Character. Divides counts by the library sizes for each cell. One of 'proportion', 'cpm', 'median', or 'mean'. 'proportion' uses the total counts for each cell as the library size. 'cpm' divides the library size of each cell by one million to produce counts per million. 'median' divides the library size of each cell by the median library size across all cells. 'mean' divides the library size of each cell by the mean library size across all cells. |
scaleFactor |
Numeric. Sets the scale factor for cell-level
normalization. Each cell is multiplied by this scale factor after its
library size has been adjusted in |
transformationFun |
Function. Applies a transformation such as sqrt, log, log2, log10, or log1p. If NULL, no transformation will be applied. Occurs after normalization. Default NULL. |
scaleFun |
Function. Scales the rows of the normalized and transformed count matrix. For example, 'scale' can be used to z-score normalize the rows. Default NULL. |
pseudocountNormalize |
Numeric. Add a pseudocount to counts before normalization. Default 0. |
pseudocountTransform |
Numeric. Add a pseudocount to normalized counts before applying the transformation function. Adding a pseudocount can be useful before applying a log transformation. Default 0. |
Numeric Matrix. A normalized matrix.
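The parameter descriptions above imply a pipeline with this order:

1. Add pseudocountNormalize.
2. Divide each cell by a size factor chosen by normalize.
3. Multiply by scaleFactor.
4. Add pseudocountTransform and apply transformationFun.
5. Apply scaleFun to each row.

The base-R sketch below follows that reading. The function name normalizeCountsSketch is hypothetical, and it is not the package's implementation. Sparse-matrix handling and input checks are omitted.

```r
# Hypothetical base-R reading of the normalizeCounts parameter descriptions.
normalizeCountsSketch <- function(counts,
                                  normalize = c("proportion", "cpm", "median", "mean"),
                                  scaleFactor = NULL, transformationFun = NULL,
                                  scaleFun = NULL, pseudocountNormalize = 0,
                                  pseudocountTransform = 0) {
  normalize <- match.arg(normalize)
  counts <- counts + pseudocountNormalize        # pseudocount before normalizing
  libSize <- colSums(counts)                     # total counts per cell
  sizeFactor <- switch(normalize,
    proportion = libSize,                        # columns sum to 1
    cpm = libSize / 1e6,                         # counts per million
    median = libSize / stats::median(libSize),   # relative to the median cell
    mean = libSize / mean(libSize))              # relative to the mean cell
  norm <- sweep(counts, 2, sizeFactor, "/")
  if (!is.null(scaleFactor)) norm <- norm * scaleFactor
  if (!is.null(transformationFun)) {
    norm <- transformationFun(norm + pseudocountTransform)
  }
  if (!is.null(scaleFun)) norm <- t(apply(norm, 1, scaleFun))  # per-row scaling
  norm
}

cnt <- matrix(c(1, 3, 2, 2), nrow = 2)
normalizeCountsSketch(cnt, "proportion")
```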
data(celdaCGSim)
normalizedCounts <- normalizeCounts(celdaCGSim$counts, "proportion", pseudocountNormalize = 1)
Retrieves the K/L, model priors (e.g. alpha, beta), and count matrix checksum parameters provided during the creation of the provided celdaModel.
params(celdaMod)
## S4 method for signature 'celdaModel'
params(celdaMod)
celdaMod |
celdaModel. Options available in
|
List. Contains the model-specific parameters for the provided celda model object depending on its class.
data(celdaCGMod)
params(celdaCGMod)
Perplexity is a statistical measure of how well a probability model can predict new data. Lower perplexity indicates a better model.
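In its generic form, perplexity is the exponential of the negative average log-likelihood per observed count. The hypothetical helper perplexitySketch computes this from a count matrix and a matching matrix of per-cell feature probabilities. It is meant to show why a model that assigns a uniform probability over k outcomes has perplexity k, and why a better-fitting model scores lower. Celda computes perplexity from its own fitted model parameters, so its values will not match this sketch.

```r
# Generic perplexity: exp(-(sum of count * log(prob)) / total count).
perplexitySketch <- function(counts, prob) {
  # counts and prob share dimensions (features x cells); each column of prob
  # is a probability distribution over features. A zero probability paired
  # with a zero count gives NaN here (0 * log(0)).
  ll <- sum(counts * log(prob))
  exp(-ll / sum(counts))
}

cnt <- matrix(c(1, 2, 3, 4), ncol = 1)
perplexitySketch(cnt, matrix(0.25, nrow = 4, ncol = 1))          # uniform model
perplexitySketch(cnt, matrix(c(0.1, 0.2, 0.3, 0.4), ncol = 1))   # closer fit
```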
perplexity(x, celdaMod, useAssay = "counts", altExpName = "featureSubset", newCounts = NULL)
## S4 method for signature 'SingleCellExperiment,ANY'
perplexity(x, useAssay = "counts", altExpName = "featureSubset", newCounts = NULL)
## S4 method for signature 'ANY,celda_CG'
perplexity(x, celdaMod, newCounts = NULL)
## S4 method for signature 'ANY,celda_C'
perplexity(x, celdaMod, newCounts = NULL)
## S4 method for signature 'ANY,celda_G'
perplexity(x, celdaMod, newCounts = NULL)
x |
Can be one of
|
celdaMod |
Celda model object. Only works if |
useAssay |
A string specifying which assay
slot to use if |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
newCounts |
A new counts matrix used to calculate perplexity. If NULL,
perplexity will be calculated for the matrix in |
Numeric. The perplexity for the provided x (and celdaModel).
data(sceCeldaCG)
perplexity <- perplexity(sceCeldaCG)
data(celdaCGSim, celdaCGMod)
perplexity <- perplexity(celdaCGSim$counts, celdaCGMod)
data(celdaCSim, celdaCMod)
perplexity <- perplexity(celdaCSim$counts, celdaCMod)
data(celdaGSim, celdaGMod)
perplexity <- perplexity(celdaGSim$counts, celdaGMod)
Outputs a violin plot for feature expression data.
plotCeldaViolin(x, celdaMod, features, displayName = NULL, useAssay = "counts", altExpName = "featureSubset", exactMatch = TRUE, plotDots = TRUE, dotSize = 0.1)
## S4 method for signature 'SingleCellExperiment'
plotCeldaViolin(x, features, displayName = NULL, useAssay = "counts", altExpName = "featureSubset", exactMatch = TRUE, plotDots = TRUE, dotSize = 0.1)
## S4 method for signature 'ANY'
plotCeldaViolin(x, celdaMod, features, exactMatch = TRUE, plotDots = TRUE, dotSize = 0.1)
x |
Numeric matrix or a SingleCellExperiment object
with the matrix located in the assay slot under |
celdaMod |
Celda object of class "celda_G" or "celda_CG". Used only if
|
features |
Character vector. Uses these genes for plotting. |
displayName |
Character. The column name of
|
useAssay |
A string specifying which assay
slot to use if |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
exactMatch |
Logical. Whether an exact match or a partial match using
|
plotDots |
Boolean. If |
dotSize |
Numeric. Size of points if |
Violin plot for each feature, grouped by celda cluster
data(sceCeldaCG)
plotCeldaViolin(x = sceCeldaCG, features = "Gene_1")
data(celdaCGSim, celdaCGMod)
plotCeldaViolin(x = celdaCGSim$counts, celdaMod = celdaCGMod, features = "Gene_1")
A scatter plot of the UMAP dimensions generated by DecontX, with cells colored by the estimated percentage of contamination.
plotDecontXContamination( x, batch = NULL, colorScale = c("blue", "green", "yellow", "orange", "red"), size = 1 )
x |
Either a SingleCellExperiment with |
batch |
Character. Batch of cells to plot. If |
colorScale |
Character vector. Contains the color spectrum to be passed
to |
size |
Numeric. Size of points in the scatterplot. Default 1. |
Returns a ggplot object.
Shiyi Yang, Joshua Campbell
See decontX for a full example of how to estimate and plot contamination.
Generates a violin plot that shows the counts of marker genes in cells across specific clusters or cell types. Can be used to view the expression of marker genes in different cell types before and after decontamination with decontX.
plotDecontXMarkerExpression( x, markers, groupClusters = NULL, assayName = c("counts", "decontXcounts"), z = NULL, exactMatch = TRUE, by = "rownames", log1p = FALSE, ncol = NULL, plotDots = FALSE, dotSize = 0.1 )
x |
Either a SingleCellExperiment or a matrix-like object of counts. |
markers |
Character Vector or List. A character vector or list of character vectors with the names of the marker genes of interest. |
groupClusters |
List. A named list that allows
cell clusters labels coded in
|
assayName |
Character vector. Name(s) of the assay(s) to
plot if |
z |
Character, Integer, or Vector.
Indicates the cluster labels for each cell.
If |
exactMatch |
Boolean. Whether to only identify exact matches
for the markers or to identify partial matches using |
by |
Character. Where to search for the markers if |
log1p |
Boolean. Whether to apply the function |
ncol |
Integer. Number of columns to make in the plot.
Default |
plotDots |
Boolean. If |
dotSize |
Numeric. Size of points if |
Returns a ggplot object.
Shiyi Yang, Joshua Campbell
See decontX for a full example of how to estimate and plot contamination.
Generates a barplot that shows the percentage of cells within clusters or cell types that have detectable levels of given marker genes. Can be used to view the expression of marker genes in different cell types before and after decontamination with decontX.
plotDecontXMarkerPercentage( x, markers, groupClusters = NULL, assayName = c("counts", "decontXcounts"), z = NULL, threshold = 1, exactMatch = TRUE, by = "rownames", ncol = round(sqrt(length(markers))), labelBars = TRUE, labelSize = 3 )
x |
Either a SingleCellExperiment or a matrix-like object of counts. |
markers |
List. A named list indicating the marker genes
for each cell type of
interest. Multiple markers can be supplied for each cell type. For example,
|
groupClusters |
List. A named list that allows
cell clusters labels coded in
|
assayName |
Character vector. Name(s) of the assay(s) to
plot if |
z |
Character, Integer, or Vector. Indicates the cluster labels
for each cell.
If |
threshold |
Numeric. A marker is considered detected in a cell if its count is greater than or equal to this value. Default 1. |
exactMatch |
Boolean. Whether to only identify exact matches
for the markers or to identify partial matches using |
by |
Character. Where to search for the markers if |
ncol |
Integer. Number of columns to make in the plot.
Default |
labelBars |
Boolean. Whether to display percentages above each bar.
Default |
labelSize |
Numeric. Size of the percentage labels in the barplot. Default 3. |
Returns a ggplot object.
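The quantity plotted by plotDecontXMarkerPercentage can be sketched for a single marker in base R. For each cluster, it is the share of cells whose count is at or above threshold. The helper markerPercentage and the toy counts are hypothetical. This section does not specify how multiple markers for one cell type are combined, so the sketch does not attempt it.

```r
# Hypothetical single-marker version of the per-cluster detection percentage.
markerPercentage <- function(counts, marker, z, threshold = 1) {
  detected <- counts[marker, ] >= threshold  # TRUE where the marker is detected
  100 * tapply(detected, z, mean)            # percent of detected cells per cluster
}

cnt <- matrix(c(0, 1, 2, 0, 0, 1), nrow = 1, dimnames = list("MarkerA", NULL))
z <- c(1, 1, 1, 2, 2, 2)                     # made-up cluster labels
markerPercentage(cnt, "MarkerA", z)
```

Comparing this percentage between the "counts" and "decontXcounts" assays is the before/after view the function is described as providing.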
Shiyi Yang, Joshua Campbell
See decontX for a full example of how to estimate and plot contamination.
Creates a scatterplot for each row of a normalized gene expression matrix, where the x and y axes are taken from a data dimension reduction tool. Cells are colored by the "celda_cell_cluster" column in colData(altExp(x, altExpName)) if x is a SingleCellExperiment object, or by x itself if x is an integer vector of cell cluster labels.
plotDimReduceCluster(x, reducedDimName, altExpName = "featureSubset", dim1 = NULL, dim2 = NULL, size = 0.5, xlab = NULL, ylab = NULL, specificClusters = NULL, labelClusters = FALSE, groupBy = NULL, labelSize = 3.5)
## S4 method for signature 'SingleCellExperiment'
plotDimReduceCluster(x, reducedDimName, altExpName = "featureSubset", dim1 = 1, dim2 = 2, size = 0.5, xlab = NULL, ylab = NULL, specificClusters = NULL, labelClusters = FALSE, groupBy = NULL, labelSize = 3.5)
## S4 method for signature 'vector'
plotDimReduceCluster(x, dim1, dim2, size = 0.5, xlab = "Dimension_1", ylab = "Dimension_2", specificClusters = NULL, labelClusters = FALSE, groupBy = NULL, labelSize = 3.5)
x |
Integer vector of cell cluster labels or a
SingleCellExperiment object
containing cluster labels for each cell in |
reducedDimName |
The name of the dimension reduction slot in
|
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
dim1 |
Integer or numeric vector. If |
dim2 |
Integer or numeric vector. If |
size |
Numeric. Sets size of point on plot. Default |
xlab |
Character vector. Label for the x-axis. Default |
ylab |
Character vector. Label for the y-axis. Default |
specificClusters |
Numeric vector.
Only color cells in the specified clusters.
All other cells will be grey.
If NULL, all clusters will be colored. Default |
labelClusters |
Logical. Whether the cluster labels are plotted. Default FALSE. |
groupBy |
Character vector. Contains sample labels for each cell. If NULL, all samples will be plotted together. Default NULL. |
labelSize |
Numeric. Sets size of label if labelClusters is TRUE. Default 3.5. |
The plot as a ggplot object
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceCluster(x = sce, reducedDimName = "celda_tSNE", specificClusters = c(1, 2, 3))
library(SingleCellExperiment)
data(sceCeldaCG, celdaCGMod)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceCluster(x = celdaClusters(celdaCGMod)$z, dim1 = reducedDim(altExp(sce), "celda_tSNE")[, 1], dim2 = reducedDim(altExp(sce), "celda_tSNE")[, 2], specificClusters = c(1, 2, 3))
Creates a scatterplot for each row of a normalized gene expression matrix, where the x and y axes are taken from a data dimension reduction tool. Cells are colored by expression of the specified feature.
plotDimReduceFeature(x, features, reducedDimName = NULL, displayName = NULL, dim1 = NULL, dim2 = NULL, headers = NULL, useAssay = "counts", altExpName = "featureSubset", normalize = FALSE, zscore = TRUE, exactMatch = TRUE, trim = c(-2, 2), limits = c(-2, 2), size = 0.5, xlab = NULL, ylab = NULL, colorLow = "blue4", colorMid = "grey90", colorHigh = "firebrick1", midpoint = 0, ncol = NULL, decreasing = FALSE)
## S4 method for signature 'SingleCellExperiment'
plotDimReduceFeature(x, features, reducedDimName = NULL, displayName = NULL, dim1 = 1, dim2 = 2, headers = NULL, useAssay = "counts", altExpName = "featureSubset", normalize = FALSE, zscore = TRUE, exactMatch = TRUE, trim = c(-2, 2), limits = c(-2, 2), size = 0.5, xlab = NULL, ylab = NULL, colorLow = "blue4", colorMid = "grey90", colorHigh = "firebrick1", midpoint = 0, ncol = NULL, decreasing = FALSE)
## S4 method for signature 'ANY'
plotDimReduceFeature(x, features, dim1, dim2, headers = NULL, normalize = FALSE, zscore = TRUE, exactMatch = TRUE, trim = c(-2, 2), limits = c(-2, 2), size = 0.5, xlab = "Dimension_1", ylab = "Dimension_2", colorLow = "blue4", colorMid = "grey90", colorHigh = "firebrick1", midpoint = 0, ncol = NULL, decreasing = FALSE)
x |
Numeric matrix or a SingleCellExperiment object
with the matrix located in the assay slot under |
features |
Character vector. Features in the rownames of counts to plot. |
reducedDimName |
The name of the dimension reduction slot in
|
displayName |
Character. The column name of
|
dim1 |
Integer or numeric vector. If |
dim2 |
Integer or numeric vector. If |
headers |
Character vector. If |
useAssay |
A string specifying which assay
slot to use if |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
normalize |
Logical. Whether to normalize the columns of 'counts'.
Default |
zscore |
Logical. Whether to scale each feature to have a mean 0
and standard deviation of 1. Default |
exactMatch |
Logical. Whether an exact match or a partial match using
|
trim |
Numeric vector. Vector of length two that specifies the lower
and upper bounds for the data. This threshold is applied after row scaling.
Set to NULL to disable. Default |
limits |
Passed to scale_colour_gradient2. The range of color scale. |
size |
Numeric. Sets size of point on plot. Default 0.5. |
xlab |
Character vector. Label for the x-axis. If |
ylab |
Character vector. Label for the y-axis. If |
colorLow |
Character. A color available from 'colors()'. The color will be used to signify the lowest values on the scale. |
colorMid |
Character. A color available from 'colors()'. The color will be used to signify the midpoint on the scale. |
colorHigh |
Character. A color available from 'colors()'. The color will be used to signify the highest values on the scale. |
midpoint |
Numeric. The value indicating the midpoint of the
diverging color scheme. If |
ncol |
Integer. Passed to facet_wrap. Specify the number of columns for facet wrap. |
decreasing |
Logical. Specifies the order of plotting the points.
If |
The plot as a ggplot object
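The zscore and trim parameters describe a two-step transform: scale each feature to mean 0 and standard deviation 1, then clip values to the trim bounds. The base-R sketch below follows that description using scale() on rows. The helper name zscoreTrim is hypothetical. Whether plotDimReduceFeature uses exactly this code internally is an assumption.

```r
# Hypothetical row-wise z-score followed by clipping to the trim bounds.
zscoreTrim <- function(expr, trim = c(-2, 2)) {
  z <- t(scale(t(expr)))  # each row (feature): mean 0, standard deviation 1
  if (!is.null(trim)) {
    z[z < trim[1]] <- trim[1]  # clip values below the lower bound
    z[z > trim[2]] <- trim[2]  # clip values above the upper bound
  }
  z
}

expr <- matrix(c(rep(0, 9), 10), nrow = 1)  # one feature, ten cells
zscoreTrim(expr)
```

The single high cell has a z-score of about 2.85, so it is clipped to 2. This keeps one outlier from stretching the color scale set by limits.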
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceFeature(x = sce, reducedDimName = "celda_tSNE", normalize = TRUE, features = c("Gene_98", "Gene_99"), exactMatch = TRUE)
library(SingleCellExperiment)
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceFeature(x = counts(sce), dim1 = reducedDim(altExp(sce), "celda_tSNE")[, 1], dim2 = reducedDim(altExp(sce), "celda_tSNE")[, 2], normalize = TRUE, features = c("Gene_98", "Gene_99"), exactMatch = TRUE)
Creates a scatterplot given two dimensions from the output of a dimension reduction tool (e.g. tSNE).
plotDimReduceGrid(x, reducedDimName, dim1 = NULL, dim2 = NULL,
  useAssay = "counts", altExpName = "featureSubset", size = 1,
  xlab = "Dimension_1", ylab = "Dimension_2", limits = c(-2, 2),
  colorLow = "blue4", colorMid = "grey90", colorHigh = "firebrick1",
  midpoint = 0, varLabel = NULL, ncol = NULL, headers = NULL,
  decreasing = FALSE)
## S4 method for signature 'SingleCellExperiment'
plotDimReduceGrid(x, reducedDimName, dim1 = NULL, dim2 = NULL,
  useAssay = "counts", altExpName = "featureSubset", size = 1,
  xlab = "Dimension_1", ylab = "Dimension_2", limits = c(-2, 2),
  colorLow = "blue4", colorMid = "grey90", colorHigh = "firebrick1",
  midpoint = 0, varLabel = NULL, ncol = NULL, headers = NULL,
  decreasing = FALSE)
## S4 method for signature 'ANY'
plotDimReduceGrid(x, dim1, dim2, size = 1, xlab = "Dimension_1",
  ylab = "Dimension_2", limits = c(-2, 2), colorLow = "blue4",
  colorMid = "grey90", colorHigh = "firebrick1", midpoint = 0,
  varLabel = NULL, ncol = NULL, headers = NULL, decreasing = FALSE)
x |
Numeric matrix or a SingleCellExperiment object
with the matrix located in the assay slot under |
reducedDimName |
The name of the dimension reduction slot in
|
dim1 |
Numeric vector. First dimension from data dimension reduction output. |
dim2 |
Numeric vector. Second dimension from data dimension reduction output. |
useAssay |
A string specifying which assay
slot to use if |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
size |
Numeric. Sets size of point on plot. Default 1. |
xlab |
Character vector. Label for the x-axis. Default 'Dimension_1'. |
ylab |
Character vector. Label for the y-axis. Default 'Dimension_2'. |
limits |
Passed to scale_colour_gradient2. The range of color scale. |
colorLow |
Character. A color available from 'colors()'. The color will be used to signify the lowest values on the scale. Default "blue4". |
colorMid |
Character. A color available from 'colors()'. The color will be used to signify the midpoint on the scale. Default "grey90". |
colorHigh |
Character. A color available from 'colors()'. The color will be used to signify the highest values on the scale. Default "firebrick1". |
midpoint |
Numeric. The value indicating the midpoint of the
diverging color scheme. If |
varLabel |
Character vector. Title for the color legend. |
ncol |
Integer. Passed to facet_wrap. Specify the number of columns for facet wrap. |
headers |
Character vector. If 'NULL', the corresponding rownames are used as labels. Otherwise, these headers are used to label the genes. |
decreasing |
Logical. Specifies the order of plotting the points.
If |
The plot as a ggplot object
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceGrid(x = sce, reducedDimName = "celda_tSNE",
  xlab = "Dimension1", ylab = "Dimension2", varLabel = "tSNE")
library(SingleCellExperiment)
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceGrid(x = counts(sce),
  dim1 = reducedDim(altExp(sce), "celda_tSNE")[, 1],
  dim2 = reducedDim(altExp(sce), "celda_tSNE")[, 2],
  xlab = "Dimension1", ylab = "Dimension2", varLabel = "tSNE")
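colorLow, colorMid, colorHigh, midpoint and limits together define a diverging color scale. To show roughly how such a scale assigns colors, the sketch below (defaults copied from the plotDimReduceGrid usage above) clamps values to 'limits' and interpolates linearly in RGB between the low/mid and mid/high colors. divergingColors() is a hypothetical helper; ggplot2's scale_colour_gradient2 interpolates in a different color space and by default censors values outside the limits rather than clamping them, so exact colors will differ.

```r
# Hypothetical helper (not part of celda): map numeric values to a diverging
# low/mid/high color scale. Values are clamped to 'limits' first; linear RGB
# interpolation is an assumption of this sketch.
divergingColors <- function(values, low = "blue4", mid = "grey90",
                            high = "firebrick1", midpoint = 0,
                            limits = c(-2, 2), n = 100) {
  stopifnot(limits[1] < midpoint, midpoint < limits[2], !anyNA(values))
  v <- pmin(pmax(values, limits[1]), limits[2])
  # n + 1 evenly spaced colors on each side of the midpoint
  lowRamp <- grDevices::colorRampPalette(c(low, mid))(n + 1)
  highRamp <- grDevices::colorRampPalette(c(mid, high))(n + 1)
  out <- character(length(v))
  below <- v <= midpoint
  # Fraction of the distance from limits[1] up to the midpoint
  fLow <- (v[below] - limits[1]) / (midpoint - limits[1])
  out[below] <- lowRamp[round(fLow * n) + 1]
  # Fraction of the distance from the midpoint up to limits[2]
  fHigh <- (v[!below] - midpoint) / (limits[2] - midpoint)
  out[!below] <- highRamp[round(fHigh * n) + 1]
  out
}

# One value below, at, and above the midpoint
divergingColors(c(-1, 0, 1.5))
```

A value equal to 'midpoint' receives colorMid, and anything at or beyond a limit receives colorLow or colorHigh.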
Creates a scatterplot for each row of a normalized gene expression matrix, where the x and y axes come from a data dimension reduction tool. The cells are colored by the module probability.
plotDimReduceModule(x, reducedDimName, useAssay = "counts",
  altExpName = "featureSubset", celdaMod, modules = NULL, dim1 = NULL,
  dim2 = NULL, size = 0.5, xlab = NULL, ylab = NULL, rescale = TRUE,
  limits = c(0, 1), colorLow = "grey90", colorHigh = "firebrick1",
  ncol = NULL, decreasing = FALSE)
## S4 method for signature 'SingleCellExperiment'
plotDimReduceModule(x, reducedDimName, useAssay = "counts",
  altExpName = "featureSubset", modules = NULL, dim1 = 1, dim2 = 2,
  size = 0.5, xlab = NULL, ylab = NULL, rescale = TRUE, limits = c(0, 1),
  colorLow = "grey90", colorHigh = "firebrick1", ncol = NULL,
  decreasing = FALSE)
## S4 method for signature 'ANY'
plotDimReduceModule(x, celdaMod, modules = NULL, dim1, dim2, size = 0.5,
  xlab = "Dimension_1", ylab = "Dimension_2", rescale = TRUE,
  limits = c(0, 1), colorLow = "grey90", colorHigh = "firebrick1",
  ncol = NULL, decreasing = FALSE)
x |
Numeric matrix or a SingleCellExperiment object
with the matrix located in the assay slot under |
reducedDimName |
The name of the dimension reduction slot in
|
useAssay |
A string specifying which assay
slot to use if |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
celdaMod |
Celda object of class "celda_G" or "celda_CG". Used only if
|
modules |
Character vector. Module(s) from celda model to be plotted. e.g. c("1", "2"). |
dim1 |
Integer or numeric vector. If |
dim2 |
Integer or numeric vector. If |
size |
Numeric. Sets size of point on plot. Default 0.5. |
xlab |
Character vector. Label for the x-axis. Default "Dimension_1". |
ylab |
Character vector. Label for the y-axis. Default "Dimension_2". |
rescale |
Logical. Whether rows of the matrix should be rescaled to [0, 1]. Default TRUE. |
limits |
Passed to scale_colour_gradient. The range of color scale. |
colorLow |
Character. A color available from 'colors()'. The color will be used to signify the lowest values on the scale. |
colorHigh |
Character. A color available from 'colors()'. The color will be used to signify the highest values on the scale. |
ncol |
Integer. Passed to facet_wrap. Specify the number of columns for facet wrap. |
decreasing |
Logical. Specifies the order of plotting the points.
If |
The plot as a ggplot object
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceModule(x = sce, reducedDimName = "celda_tSNE",
  modules = c("1", "2"))
library(SingleCellExperiment)
data(sceCeldaCG, celdaCGMod)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceModule(x = counts(sce),
  dim1 = reducedDim(altExp(sce), "celda_tSNE")[, 1],
  dim2 = reducedDim(altExp(sce), "celda_tSNE")[, 2],
  celdaMod = celdaCGMod, modules = c("1", "2"))
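With rescale = TRUE, each row (one module) is rescaled to [0, 1] before plotting, which lines up with the default limits = c(0, 1). A minimal sketch of per-row min-max rescaling, assuming that is what the rescaling amounts to; rescaleRows() is a hypothetical helper and its handling of constant rows is an assumption:

```r
# Min-max rescale each row of a matrix to [0, 1], the documented effect of
# rescale = TRUE. Constant rows are mapped to 0 to avoid dividing by zero;
# that choice is an assumption of this sketch.
rescaleRows <- function(m) {
  rowMin <- apply(m, 1, min)
  rowMax <- apply(m, 1, max)
  span <- rowMax - rowMin
  span[span == 0] <- 1
  # A vector of length nrow(m) recycles down each column, so row i
  # is shifted by rowMin[i] and divided by span[i]
  (m - rowMin) / span
}

# Two modules (rows) scored across four cells (columns)
probs <- rbind(c(0.1, 0.3, 0.5, 0.2), c(0.05, 0.05, 0.4, 0.2))
rescaleRows(probs)
```

Rescaling per row means each panel uses its full color range, at the cost of hiding differences in absolute probability between modules.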
Visualize perplexity of every model in a celdaList, by unique K/L combinations.
plotGridSearchPerplexity(x, altExpName = "featureSubset", sep = 5, alpha = 0.5)
## S4 method for signature 'SingleCellExperiment'
plotGridSearchPerplexity(x, altExpName = "featureSubset", sep = 5, alpha = 0.5)
## S4 method for signature 'celdaList'
plotGridSearchPerplexity(x, sep = 5, alpha = 0.5)
x |
Can be one of
|
altExpName |
The name for the altExp slot
to use. Default "featureSubset". Only works if |
sep |
Numeric. Breaks in the x axis of the resulting plot. |
alpha |
Numeric. Passed to geom_jitter. Opacity of the points. Values of alpha range from 0 to 1, with lower values corresponding to more transparent colors. |
A ggplot plot object showing perplexity as a function of clustering parameters.
data(sceCeldaCGGridSearch)
sce <- resamplePerplexity(sceCeldaCGGridSearch)
plotGridSearchPerplexity(sce)
data(celdaCGSim, celdaCGGridSearchRes)
## Run various combinations of parameters with 'celdaGridSearch'
celdaCGGridSearchRes <- resamplePerplexity(
  celdaCGSim$counts,
  celdaCGGridSearchRes)
plotGridSearchPerplexity(celdaCGGridSearchRes)
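Lower perplexity means a model assigns higher probability to the observed counts. For orientation, the sketch below computes perplexity under its standard definition, exp(-log-likelihood / total counts), for counts modeled as per-cell multinomials. It is a generic illustration with a hypothetical helper, perplexitySketch(); celda derives perplexity from each fitted model's own likelihood, and its exact formulation is not described in this section.

```r
# Standard perplexity for counts under per-cell multinomial probabilities:
#   perplexity = exp(-logLik / N), with N the total number of counts.
# 'probs' has the same shape as 'counts' and each column sums to 1;
# all probabilities are assumed to be > 0.
perplexitySketch <- function(counts, probs) {
  stopifnot(identical(dim(counts), dim(probs)), all(probs > 0))
  # The multinomial coefficient is omitted: it does not depend on 'probs'
  logLik <- sum(counts * log(probs))
  exp(-logLik / sum(counts))
}

# A uniform model over 4 features has perplexity 4, whatever the counts
counts <- matrix(c(3, 0, 1, 2, 5, 1, 0, 2), nrow = 4)
perplexitySketch(counts, matrix(0.25, nrow = 4, ncol = 2))
```

The uniform case gives a useful reference point: perplexity can be read as the effective number of equally likely features the model is choosing between per count.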
Renders a heatmap based on a matrix of counts where rows are features and columns are cells.
plotHeatmap( counts, z = NULL, y = NULL, scaleRow = scale, trim = c(-2, 2), featureIx = NULL, cellIx = NULL, clusterFeature = TRUE, clusterCell = TRUE, colorScheme = c("divergent", "sequential"), colorSchemeSymmetric = TRUE, colorSchemeCenter = 0, col = NULL, annotationCell = NULL, annotationFeature = NULL, annotationColor = NULL, breaks = NULL, legend = TRUE, annotationLegend = TRUE, annotationNamesFeature = TRUE, annotationNamesCell = TRUE, showNamesFeature = FALSE, showNamesCell = FALSE, rowGroupOrder = NULL, colGroupOrder = NULL, hclustMethod = "ward.D2", treeheightFeature = ifelse(clusterFeature, 50, 0), treeheightCell = ifelse(clusterCell, 50, 0), silent = FALSE, ... )
counts |
Numeric or sparse matrix. Normalized counts matrix where rows represent features and columns represent cells. |
z |
Numeric vector. Denotes cell population labels. |
y |
Numeric vector. Denotes feature module labels. |
scaleRow |
Function. A function to scale each individual row. Set to NULL to disable. Occurs after normalization and log transformation. Default is 'scale', which Z-score transforms each row. |
trim |
Numeric vector. Vector of length two that specifies the lower and upper bounds for the data. This threshold is applied after row scaling. Set to NULL to disable. Default c(-2,2). |
featureIx |
Integer vector. Select features for display in heatmap. If NULL, no subsetting will be performed. Default NULL. |
cellIx |
Integer vector. Select cells for display in heatmap. If NULL, no subsetting will be performed. Default NULL. |
clusterFeature |
Logical. Determines whether rows should be clustered. Default TRUE. |
clusterCell |
Logical. Determines whether columns should be clustered. Default TRUE. |
colorScheme |
Character. One of "divergent" or "sequential". A "divergent" scheme is best for highlighting relative data (denoted by 'colorSchemeCenter') such as gene expression data that has been normalized and centered. A "sequential" scheme is best for highlighting data that are ordered low to high such as raw counts or probabilities. Default "divergent". |
colorSchemeSymmetric |
Logical. When the colorScheme is "divergent"
and the data contains both positive and negative numbers, TRUE indicates
that the color scheme should be symmetric from
|
colorSchemeCenter |
Numeric. Indicates the center of a "divergent" colorScheme. Default 0. |
col |
Character vector. Colors for the heatmap; each bin in 'breaks' is assigned a color from this vector. Default NULL. |
annotationCell |
Data frame. Additional annotations for each cell will be shown in the column color bars. The format of the data frame should be one row for each cell and one column for each annotation. Numeric variables will be displayed as continuous color bars and factors will be displayed as discrete color bars. Default NULL. |
annotationFeature |
A data frame for the feature annotations (rows). |
annotationColor |
List. Contains color scheme for all annotations. See '?pheatmap' for more details. |
breaks |
Numeric vector. A sequence of numbers that covers the range of values in the normalized 'counts'. Values in the normalized 'matrix' are assigned to each bin in 'breaks'. Each break is assigned to a unique color from 'col'. If NULL, then breaks are calculated automatically. Default NULL. |
legend |
Logical. Determines whether legend should be drawn. Default TRUE. |
annotationLegend |
Logical. Whether legend for all annotations should be drawn. Default TRUE. |
annotationNamesFeature |
Logical. Whether the names of the feature annotations should be shown. Default TRUE. |
annotationNamesCell |
Logical. Whether the names of the cell annotations should be shown. Default TRUE. |
showNamesFeature |
Logical. Specifies if feature names should be shown. Default FALSE. |
showNamesCell |
Logical. Specifies if cell names should be shown. Default FALSE. |
rowGroupOrder |
Vector. Specifies the order of feature clusters when
semisupervised clustering is performed on the |
colGroupOrder |
Vector. Specifies the order of cell clusters when
semisupervised clustering is performed on the |
hclustMethod |
Character. Specifies the method to use for the 'hclust' function. See '?hclust' for possible values. Default "ward.D2". |
treeheightFeature |
Numeric. Width of the feature dendrogram. Set to 0 to disable plotting of this dendrogram. Default: if clusterFeature == TRUE, then treeheightFeature = 50, else treeheightFeature = 0. |
treeheightCell |
Numeric. Height of the cell dendrogram. Set to 0 to disable plotting of this dendrogram. Default: if clusterCell == TRUE, then treeheightCell = 50, else treeheightCell = 0. |
silent |
Logical. If TRUE, the heatmap is computed but not drawn. Default FALSE. |
... |
Other arguments to be passed to underlying pheatmap function. |
A list containing dendrogram information and the heatmap grob.
data(celdaCGSim, celdaCGMod)
plotHeatmap(celdaCGSim$counts,
  z = celdaClusters(celdaCGMod)$z, y = celdaClusters(celdaCGMod)$y
)
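With the defaults scaleRow = scale and trim = c(-2, 2), each row is Z-score transformed and the result is clipped to [-2, 2] before colors are assigned, so a single extreme cell cannot stretch the color range for the whole row. A base-R sketch of those two steps follows; scaleAndTrim() is a hypothetical helper, not celda's code.

```r
# Z-score each row (the effect of the default scaleRow = scale), then clip
# the result to the 'trim' bounds; trim = NULL skips the clipping step.
scaleAndTrim <- function(m, trim = c(-2, 2)) {
  # scale() standardizes columns, so transpose in and out to scale rows.
  # Rows with zero variance would become NaN and are not handled here.
  z <- t(scale(t(m)))
  if (!is.null(trim)) {
    z <- pmin(pmax(z, trim[1]), trim[2])
  }
  z
}

set.seed(1)
m <- matrix(rpois(60, lambda = 5), nrow = 3)
m[1, 1] <- 100  # an outlier that ends up clipped at the upper bound
range(scaleAndTrim(m))
```

Trimming after scaling keeps the bounds in standard-deviation units, so the same trim values are meaningful for every feature.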
Visualize perplexity differences of every model in a celdaList, by unique K/L combinations.
plotRPC(x, altExpName = "featureSubset", sep = 5, alpha = 0.5)
## S4 method for signature 'SingleCellExperiment'
plotRPC(x, altExpName = "featureSubset", sep = 5, alpha = 0.5)
## S4 method for signature 'celdaList'
plotRPC(x, sep = 5, alpha = 0.5)
x |
Can be one of
|
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
sep |
Numeric. Breaks in the x axis of the resulting plot. |
alpha |
Numeric. Passed to geom_jitter. Opacity of the points. Values of alpha range from 0 to 1, with lower values corresponding to more transparent colors. |
A ggplot plot object showing perplexity differences as a function of clustering parameters.
data(sceCeldaCGGridSearch)
sce <- resamplePerplexity(sceCeldaCGGridSearch)
plotRPC(sce)
data(celdaCGSim, celdaCGGridSearchRes)
## Run various combinations of parameters with 'celdaGridSearch'
celdaCGGridSearchRes <- resamplePerplexity(
  celdaCGSim$counts,
  celdaCGGridSearchRes)
plotRPC(celdaCGGridSearchRes)
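plotRPC shows how perplexity changes from one K/L setting to the next. One simple way to express such a change, sketched here with a hypothetical helper, is the decrease in perplexity per unit increase in K between consecutive models. The exact quantity celda plots is not specified in this section, so treat this only as an illustration. A common heuristic is to look for the K at which further increases stop producing large decreases.

```r
# Hypothetical sketch: decrease in perplexity per unit increase in K between
# consecutive models. 'perp' holds one perplexity value (e.g. a mean over
# resamples) per entry of 'K'; inputs need not be sorted.
perplexityChangeRate <- function(K, perp) {
  stopifnot(length(K) == length(perp), !anyDuplicated(K))
  o <- order(K)
  K <- K[o]
  perp <- perp[o]
  data.frame(
    K = K[-1],                      # the larger K of each consecutive pair
    rate = -diff(perp) / diff(K)    # positive when perplexity drops
  )
}

perplexityChangeRate(K = c(5, 3, 4, 6), perp = c(80, 100, 85, 79))
```

Dividing by diff(K) keeps the rate comparable when the grid of K values is unevenly spaced.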
Recode feature module clusters using a mapping in the 'from' and 'to' arguments.
recodeClusterY(sce, from, to, altExpName = "featureSubset")
sce |
SingleCellExperiment object returned from
celda_G or celda_CG. Must contain column
|
from |
Numeric vector. Unique values in the range of
|
to |
Numeric vector. Unique values in the range of
|
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
SingleCellExperiment object with recoded feature module labels.
data(sceCeldaCG)
sceReorderedY <- recodeClusterY(sceCeldaCG, c(1, 3), c(3, 1))
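The from/to mapping relabels every occurrence of from[i] as to[i]; the example above, with from = c(1, 3) and to = c(3, 1), swaps modules 1 and 3. The sketch below illustrates that relabeling on a plain label vector. recodeLabels() is hypothetical and not celda's implementation.

```r
# Hypothetical helper (not part of celda): every label equal to from[i]
# becomes to[i]; labels not listed in 'from' are left unchanged.
recodeLabels <- function(labels, from, to) {
  stopifnot(length(from) == length(to), !anyDuplicated(from))
  idx <- match(labels, from)   # position of each label in 'from', or NA
  hit <- !is.na(idx)
  labels[hit] <- to[idx[hit]]  # all replacements are applied at once
  labels
}

# Swap labels 1 and 3; label 2 is untouched
recodeLabels(c(1, 2, 3, 1, 3), from = c(1, 3), to = c(3, 1))
```

Looking up all positions with match() before assigning is what makes a swap work; replacing 1 with 3 and then 3 with 1 in sequence would merge the two groups instead.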
Recode cell subpopulation clusters using a mapping in the 'from' and 'to' arguments.
recodeClusterZ(sce, from, to, altExpName = "featureSubset")
sce |
SingleCellExperiment object returned from
celda_C or celda_CG. Must contain column
|
from |
Numeric vector. Unique values in the range of
|
to |
Numeric vector. Unique values in the range of
|
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
SingleCellExperiment object with recoded cell cluster labels.
data(sceCeldaCG)
sceReorderedZ <- recodeClusterZ(sceCeldaCG, c(1, 3), c(3, 1))
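recodeClusterZ only needs the mapping; how it is chosen is up to the user. As a hypothetical example, the helper below derives from/to vectors that renumber cell clusters by decreasing size, assuming the current labels are the integers 1 to K. sizeOrderMapping() is not part of celda, and the commented call assumes 'sce' holds a celda_C or celda_CG result whose cell labels are the vector z.

```r
# Hypothetical helper (not part of celda): build 'from'/'to' vectors that
# renumber clusters by decreasing size, so the largest cluster becomes 1.
# Ties are broken by the original (ascending) cluster ID.
sizeOrderMapping <- function(z) {
  sizes <- table(z)
  ids <- as.integer(names(sizes))
  from <- ids[order(-as.vector(sizes), ids)]
  list(from = from, to = seq_along(from))
}

z <- c(1, 2, 2, 3, 3, 3)
mapping <- sizeOrderMapping(z)
# Assuming 'sce' carries these cell labels:
# sce <- recodeClusterZ(sce, from = mapping$from, to = mapping$to)
```

Because 'from' lists every existing cluster and 'to' is 1 to K, the result is a permutation of the labels rather than a merge.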
Uses the celda_C model to cluster cells into populations for a range of possible K's. The cell population labels of the previous "K-1" model are used as the initial values in the current model with K cell populations. The best split of an existing cell population is found to create the K-th cluster. This procedure is much faster than randomly initializing each model with a different K. If module labels for each feature are given in 'yInit', the celda_CG model will be used to split cell populations based on those modules instead of individual features. Module labels will also be updated during sampling and thus may end up slightly different than 'yInit'.
recursiveSplitCell(x, useAssay = "counts", altExpName = "featureSubset",
  sampleLabel = NULL, initialK = 5, maxK = 25, tempL = NULL, yInit = NULL,
  alpha = 1, beta = 1, delta = 1, gamma = 1, minCell = 3, reorder = TRUE,
  seed = 12345, perplexity = TRUE, doResampling = FALSE, numResample = 5,
  logfile = NULL, verbose = TRUE)
## S4 method for signature 'SingleCellExperiment'
recursiveSplitCell(x, useAssay = "counts", altExpName = "featureSubset",
  sampleLabel = NULL, initialK = 5, maxK = 25, tempL = NULL, yInit = NULL,
  alpha = 1, beta = 1, delta = 1, gamma = 1, minCell = 3, reorder = TRUE,
  seed = 12345, perplexity = TRUE, doResampling = FALSE, numResample = 5,
  logfile = NULL, verbose = TRUE)
## S4 method for signature 'matrix'
recursiveSplitCell(x, useAssay = "counts", altExpName = "featureSubset",
  sampleLabel = NULL, initialK = 5, maxK = 25, tempL = NULL, yInit = NULL,
  alpha = 1, beta = 1, delta = 1, gamma = 1, minCell = 3, reorder = TRUE,
  seed = 12345, perplexity = TRUE, doResampling = FALSE, numResample = 5,
  logfile = NULL, verbose = TRUE)
x |
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under |
useAssay |
A string specifying the name of the assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
sampleLabel |
Vector or factor. Denotes the sample label for each cell (column) in the count matrix. |
initialK |
Integer. Initial number of cell populations to try.
Default 5. |
maxK |
Integer. Maximum number of cell populations to try.
Default 25. |
tempL |
Integer. Number of temporary modules to identify and use in cell
splitting. Only used if |
yInit |
Integer vector. Module labels for features. Cells will be
clustered using the celda_CG model based on the modules specified in
|
alpha |
Numeric. Concentration parameter for Theta. Adds a pseudocount
to each cell population in each sample. Default 1. |
beta |
Numeric. Concentration parameter for Phi. Adds a pseudocount to
each feature in each cell (if |
delta |
Numeric. Concentration parameter for Psi. Adds a pseudocount
to each feature in each module. Only used if |
gamma |
Numeric. Concentration parameter for Eta. Adds a pseudocount
to the number of features in each module. Only used if |
minCell |
Integer. Only attempt to split cell populations with at least this many cells. |
reorder |
Logical. Whether to reorder cell populations using hierarchical clustering after each model has been created. If FALSE, cell population numbers will correspond to the split which created the cell populations (i.e. 'K15' was created at split 15, 'K16' was created at split 16, etc.). Default TRUE. |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
perplexity |
Logical. Whether to calculate perplexity for each model. If FALSE, then perplexity can be calculated later with resamplePerplexity. Default TRUE. |
doResampling |
Boolean. If |
numResample |
Integer. The number of times to resample the counts matrix
for evaluating perplexity if |
logfile |
Character. Messages will be redirected to a file named "logfile". If NULL, messages will be printed to stdout. Default NULL. |
verbose |
Logical. Whether to print log messages. Default TRUE. |
A SingleCellExperiment object. Function parameter settings and celda model results are stored in the metadata "celda_grid_search" slot. The models in the list will be of class celda_C if yInit = NULL or celda_CG if yInit is set.
recursiveSplitModule for recursive splitting of feature modules.
data(sceCeldaCG)
## Create models that range from K = 3 to K = 7 by recursively splitting
## cell populations into two to produce celda_C cell clustering models
sce <- recursiveSplitCell(sceCeldaCG, initialK = 3, maxK = 7)
## Alternatively, first identify feature modules using
## recursiveSplitModule
moduleSplit <- recursiveSplitModule(sceCeldaCG, initialL = 3, maxL = 15)
plotGridSearchPerplexity(moduleSplit)
moduleSplitSelect <- subsetCeldaList(moduleSplit, list(L = 10))
## Then use module labels for initialization in recursiveSplitCell to
## produce celda_CG bi-clustering models
cellSplit <- recursiveSplitCell(sceCeldaCG,
  initialK = 3, maxK = 7, yInit = celdaModules(moduleSplitSelect))
plotGridSearchPerplexity(cellSplit)
sce <- subsetCeldaList(cellSplit, list(K = 5, L = 10))
data(celdaCGSim, celdaCSim)
## Create models that range from K = 3 to K = 7 by recursively splitting
## cell populations into two to produce celda_C cell clustering models
sce <- recursiveSplitCell(celdaCSim$counts, initialK = 3, maxK = 7)
## Alternatively, first identify feature modules using
## recursiveSplitModule
moduleSplit <- recursiveSplitModule(celdaCGSim$counts,
  initialL = 3, maxL = 15)
plotGridSearchPerplexity(moduleSplit)
moduleSplitSelect <- subsetCeldaList(moduleSplit, list(L = 10))
## Then use module labels for initialization in recursiveSplitCell to
## produce celda_CG bi-clustering models
cellSplit <- recursiveSplitCell(celdaCGSim$counts,
  initialK = 3, maxK = 7, yInit = celdaModules(moduleSplitSelect))
plotGridSearchPerplexity(cellSplit)
sce <- subsetCeldaList(cellSplit, list(K = 5, L = 10))
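The greedy procedure described above (start from initialK populations, try splitting each population with at least minCell cells, keep the best split, repeat up to maxK) can be sketched in base R. This toy version swaps in k-means and the within-cluster sum of squares for celda's Gibbs sampling and likelihood, so it is not celda's algorithm. It expects cells as rows (transpose a features x cells count matrix first), and wss() and greedySplit() are hypothetical names. As with reorder = FALSE, the cluster created at step k is labelled k.

```r
# Toy sketch (not celda's algorithm): greedy recursive splitting in which
# k-means and the within-cluster sum of squares (WSS) stand in for celda's
# Gibbs sampling and model likelihood. 'x' has one row per cell.
wss <- function(x, z) {
  groups <- split(seq_len(nrow(x)), z)
  sum(vapply(groups, function(i) {
    xi <- x[i, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi))^2)
  }, numeric(1)))
}

greedySplit <- function(x, initialK = 2, maxK = 5, minCell = 3, seed = 12345) {
  stopifnot(maxK >= initialK)
  set.seed(seed)  # the documented 'seed' argument is passed to with_seed instead
  z <- stats::kmeans(x, centers = initialK, nstart = 5)$cluster
  history <- list()
  history[[as.character(initialK)]] <- z
  for (k in seq_len(maxK - initialK) + initialK) {
    best <- NULL
    bestScore <- Inf
    # Try splitting each existing cluster in two; keep the best split
    for (cl in unique(z)) {
      idx <- which(z == cl)
      if (length(idx) < minCell) next
      sub <- x[idx, , drop = FALSE]
      if (nrow(unique(sub)) < 2) next  # identical points cannot be split
      half <- stats::kmeans(sub, centers = 2, nstart = 5)$cluster
      candidate <- z
      candidate[idx[half == 2]] <- k  # the new cluster is labelled k
      score <- wss(x, candidate)
      if (score < bestScore) {
        bestScore <- score
        best <- candidate
      }
    }
    if (is.null(best)) break  # no cluster was large enough to split
    z <- best
    history[[as.character(k)]] <- z
  }
  history
}

set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2),
           matrix(rnorm(40, mean = 15), ncol = 2))
res <- greedySplit(x, initialK = 2, maxK = 4)
sapply(res, function(z) length(unique(z)))
```

Each step reuses the previous labels and only refits one cluster at a time, which is why this style of search is cheaper than fitting every K from scratch.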
Uses the celda_G model to cluster features into modules for a range of possible L's. The module labels of the previous "L-1" model are used as the initial values in the current model with L modules. The best split of an existing module is found to create the L-th module. This procedure is much faster than randomly initializing each model with a different L.
recursiveSplitModule(x, useAssay = "counts", altExpName = "featureSubset",
  initialL = 10, maxL = 100, tempK = 100, zInit = NULL, sampleLabel = NULL,
  alpha = 1, beta = 1, delta = 1, gamma = 1, minFeature = 3, reorder = TRUE,
  seed = 12345, perplexity = TRUE, doResampling = FALSE, numResample = 5,
  verbose = TRUE, logfile = NULL)
## S4 method for signature 'SingleCellExperiment'
recursiveSplitModule(x, useAssay = "counts", altExpName = "featureSubset",
  initialL = 10, maxL = 100, tempK = 100, zInit = NULL, sampleLabel = NULL,
  alpha = 1, beta = 1, delta = 1, gamma = 1, minFeature = 3, reorder = TRUE,
  seed = 12345, perplexity = TRUE, doResampling = FALSE, numResample = 5,
  verbose = TRUE, logfile = NULL)
## S4 method for signature 'matrix'
recursiveSplitModule(x, useAssay = "counts", altExpName = "featureSubset",
  initialL = 10, maxL = 100, tempK = 100, zInit = NULL, sampleLabel = NULL,
  alpha = 1, beta = 1, delta = 1, gamma = 1, minFeature = 3, reorder = TRUE,
  seed = 12345, perplexity = TRUE, doResampling = FALSE, numResample = 5,
  verbose = TRUE, logfile = NULL)
x |
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under |
useAssay |
A string specifying which assay
slot to use if |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
initialL |
Integer. Initial number of modules. Default 10. |
maxL |
Integer. Maximum number of modules. Default 100. |
tempK |
Integer. Number of temporary cell populations to identify and
use in module splitting. Only used if |
zInit |
Integer vector. Collapse cells to cell populations based on
labels in |
sampleLabel |
Vector or factor. Denotes the sample label for each cell
(column) in the count matrix. Default NULL. |
alpha |
Numeric. Concentration parameter for Theta. Adds a pseudocount
to each cell population in each sample. Only used if |
beta |
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell. Default 1. |
delta |
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1. |
gamma |
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1. |
minFeature |
Integer. Only attempt to split modules with at least this many features. |
reorder |
Logical. Whether to reorder modules using hierarchical clustering after each model has been created. If FALSE, module numbers will correspond to the split which created the module (i.e. 'L15' was created at split 15, 'L16' was created at split 16, etc.). Default TRUE. |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
perplexity |
Logical. Whether to calculate perplexity for each model.
If FALSE, then perplexity can be calculated later with
resamplePerplexity. Default TRUE. |
doResampling |
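Boolean. If TRUE, the counts matrix is resampled numResample times and perplexity is calculated for each resampled matrix. Default FALSE. |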
numResample |
Integer. The number of times to resample the counts matrix
for evaluating perplexity if doResampling is set to TRUE. Default 5. |
verbose |
Logical. Whether to print log messages. Default TRUE. |
logfile |
Character. Messages will be redirected to a file named "logfile". If NULL, messages will be printed to stdout. Default NULL. |
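Module splitting can work on cell populations rather than individual cells (see tempK and zInit above). The following is a minimal base R sketch of that collapsing step, assuming it amounts to summing each feature's counts over the cells that share a population label. The toy matrix and the labels z are hypothetical, and this is not celda's internal code.

```r
# Hypothetical toy count matrix: 4 features (rows) x 6 cells (columns)
counts <- matrix(c(1, 0, 2, 3,
                   0, 1, 1, 2,
                   5, 4, 0, 0,
                   6, 3, 1, 0,
                   2, 2, 2, 2,
                   0, 0, 9, 1),
                 nrow = 4,
                 dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:6)))

# Hypothetical population label for each cell (e.g. from zInit or tempK)
z <- c(1, 1, 2, 2, 3, 3)

# rowsum() sums rows by group, so transpose to sum cells, then transpose back:
# the result keeps the feature rows and has one column per population
collapsed <- t(rowsum(t(counts), group = z))
collapsed
```

Because counts are only summed, the total number of counts is unchanged; module splitting then sees 3 columns instead of 6.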
A SingleCellExperiment object. Function parameter settings and celda model results are stored in the metadata "celda_grid_search" slot. The models in the list will be of class celda_G if zInit = NULL or celda_CG if zInit is set.
recursiveSplitCell for recursive splitting of cell populations.
data(sceCeldaCG)
## Create models that range from L = 3 to L = 20 by recursively splitting modules
## into two
moduleSplit <- recursiveSplitModule(sceCeldaCG, initialL = 3, maxL = 20)
## Example results with perplexity
plotGridSearchPerplexity(moduleSplit)
## Select model for downstream analysis
celdaMod <- subsetCeldaList(moduleSplit, list(L = 10))
data(celdaCGSim)
## Create models that range from L = 3 to L = 20 by recursively splitting modules
## into two
moduleSplit <- recursiveSplitModule(celdaCGSim$counts, initialL = 3, maxL = 20)
## Example results with perplexity
plotGridSearchPerplexity(moduleSplit)
## Select model for downstream analysis
celdaMod <- subsetCeldaList(moduleSplit, list(L = 10))
Apply hierarchical clustering to reorder the cell populations and/or feature modules and group similar ones together based on the cosine distance of the factorized matrix from factorizeMatrix.
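The reordering idea can be sketched in base R. This sketch assumes the factorized matrix holds one column per cell population (for example, a module-by-population proportion matrix) and that the new order simply follows the leaf order of hclust on 1 minus cosine similarity. The objects prop and z are hypothetical, and the sketch is not reorderCelda's implementation.

```r
# Hypothetical factorized matrix: 3 feature modules (rows) x 4 cell populations (columns)
prop <- matrix(c(0.70, 0.20, 0.10,
                 0.10, 0.10, 0.80,
                 0.65, 0.25, 0.10,
                 0.15, 0.05, 0.80),
               nrow = 3,
               dimnames = list(paste0("L", 1:3), paste0("K", 1:4)))

# Cosine similarity between columns: pairwise dot products divided by the norms
norms <- sqrt(colSums(prop^2))
cosSim <- crossprod(prop) / outer(norms, norms)
cosDist <- as.dist(1 - cosSim)

# Complete linkage, the documented default for the 'method' argument
hc <- hclust(cosDist, method = "complete")
newOrder <- hc$order          # populations in dendrogram leaf order
colnames(prop)[newOrder]

# Relabel hypothetical cell labels so that old population newOrder[i] becomes i
z <- c(1, 2, 3, 4, 1, 3)
zReordered <- match(z, newOrder)
zReordered
```

Populations K1 and K3 (and likewise K2 and K4) have nearly parallel profiles, so they end up next to each other in the new order.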
reorderCelda( x, celdaMod, useAssay = "counts", altExpName = "featureSubset", method = "complete" )
## S4 method for signature 'SingleCellExperiment,ANY'
reorderCelda( x, useAssay = "counts", altExpName = "featureSubset", method = "complete" )
## S4 method for signature 'matrix,celda_CG'
reorderCelda(x, celdaMod, method = "complete")
## S4 method for signature 'matrix,celda_C'
reorderCelda(x, celdaMod, method = "complete")
## S4 method for signature 'matrix,celda_G'
reorderCelda(x, celdaMod, method = "complete")
x |
Can be one of
a SingleCellExperiment object or a numeric count matrix. |
celdaMod |
Celda model object. Only works if x is a matrix. |
useAssay |
A string specifying which assay
slot to use if x is a SingleCellExperiment object. Default "counts". |
altExpName |
The name for the altExp slot. Default "featureSubset". |
method |
Passed to hclust. The agglomeration method to be used. Default "complete". |
A SingleCellExperiment object (or Celda model object) with updated cell cluster and/or feature module labels.
data(sceCeldaCG)
reordersce <- reorderCelda(sceCeldaCG)
data(celdaCGSim, celdaCGMod)
reorderCeldaCG <- reorderCelda(celdaCGSim$counts, celdaCGMod)
data(celdaCSim, celdaCMod)
reorderCeldaC <- reorderCelda(celdaCSim$counts, celdaCMod)
data(celdaGSim, celdaGMod)
reorderCeldaG <- reorderCelda(celdaGSim$counts, celdaGMod)
reportCeldaCGRun will run recursiveSplitModule and recursiveSplitCell to find the number of modules (L) and the number of cell populations (K). A final celda_CG model will be selected from recursiveSplitCell. After a celda_CG model has been fit, reportCeldaCGPlotResults can be used to create an HTML report for visualization and exploration of the celda_CG model results. Some of the plotting and feature selection functions require the installation of the Bioconductor package singleCellTK.
reportCeldaCGRun( sce, L, K, sampleLabel = NULL, altExpName = "featureSubset", useAssay = "counts", initialL = 10, maxL = 150, initialK = 5, maxK = 50, minCell = 3, minCount = 3, maxFeatures = 5000, output_file = "CeldaCG_RunReport", output_sce_prefix = "celda_cg", output_dir = ".", pdf = FALSE, showSession = TRUE )
reportCeldaCGPlotResults( sce, reducedDimName, features = NULL, displayName = NULL, altExpName = "featureSubset", useAssay = "counts", cellAnnot = NULL, cellAnnotLabel = NULL, exactMatch = TRUE, moduleFilePrefix = "module_features", output_file = "CeldaCG_ResultReport", output_dir = ".", pdf = FALSE, showSetup = TRUE, showSession = TRUE )
sce |
A SingleCellExperiment with the matrix located in
the assay slot under useAssay. |
L |
Integer. Final number of feature modules. See |
K |
Integer. Final number of cell populations. See |
sampleLabel |
Vector or factor. Denotes the sample label for each cell (column) in the count matrix. |
altExpName |
The name for the altExp slot to use. Default
"featureSubset". |
useAssay |
A string specifying which assay slot to use. Default
"counts". |
initialL |
Integer. Minimum number of modules to try. See
recursiveSplitModule for more information. Default 10. |
maxL |
Integer. Maximum number of modules to try. See
recursiveSplitModule for more information. Default 150. |
initialK |
Integer. Initial number of cell populations to try. |
maxK |
Integer. Maximum number of cell populations to try. |
minCell |
Integer. Minimum number of cells required for feature
selection. See selectFeatures for more information. Default
3. |
minCount |
Integer. Minimum number of counts required for feature
selection. See selectFeatures for more information. Default
3. |
maxFeatures |
Integer. Maximum number of features to include. If the
number of features after filtering for |
output_file |
Character. Prefix of the html file. Default
"CeldaCG_RunReport". |
output_sce_prefix |
Character. The |
output_dir |
Character. Path to save the html file. Default ".". |
pdf |
Boolean. Whether to create PDF versions of each plot in addition
to PNGs. Default FALSE. |
showSession |
Boolean. Whether to show the session information at the
end. Default TRUE. |
reducedDimName |
Character. Name of the reduced dimensional object to be
used in 2-D scatter plots throughout the report. Default |
features |
Character vector. Expression of these features will be
displayed on a reduced dimensional plot defined by |
displayName |
Character. The name to use for display in scatter plots
and heatmaps. If |
cellAnnot |
Character vector. The cell-level annotations to display on
the reduced dimensional plot. These variables should be present in the
column data of the |
cellAnnotLabel |
Character vector. Additional cell-level annotations
to display on the reduced dimensional plot. Variables will be treated
as categorical and labels for each group will be placed on the plot.
These variables should be present in the column data of the |
exactMatch |
Boolean. Whether to only identify exact matches or to
identify partial matches using grep. Default TRUE. |
moduleFilePrefix |
Character. The features in each module will be
written to a csv file starting with this name. If |
showSetup |
Boolean. Whether to show the setup code at the beginning.
Default TRUE. |
.html file
data(sceCeldaCG)
## Not run:
library(SingleCellExperiment)
sceCeldaCG$sum <- colSums(counts(sceCeldaCG))
rowData(sceCeldaCG)$rownames <- rownames(sceCeldaCG)
sceCeldaCG <- reportCeldaCGRun(sceCeldaCG, initialL = 5, maxL = 20, initialK = 5, maxK = 20, L = 10, K = 5)
reportCeldaCGPlotResults(sce = sceCeldaCG, reducedDimName = "celda_UMAP", features = c("Gene_1", "Gene_100"), displayName = "rownames", cellAnnot = "sum")
## End(Not run)
Calculates the perplexity of each model's cluster assignments given the provided count matrix, as well as resamplings of that count matrix, providing a distribution of perplexities and a better sense of the quality of a given K/L choice.
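A minimal base R sketch of the two ingredients follows. It assumes the standard definition of perplexity, exp(-logLik / N) with N the total number of counts, and assumes each resampling draws every cell from a multinomial with that cell's own total and feature proportions. The names counts, probs, resampleCells() and perplexity() are hypothetical, and the per-cell probabilities stand in for a fitted celda model.

```r
set.seed(12345)

# Hypothetical count matrix: 5 features (rows) x 4 cells (columns)
counts <- matrix(rpois(20, lambda = 10), nrow = 5)

# Stand-in for a fitted model: per-cell feature probabilities, with a
# pseudocount of 1 so that no probability is exactly zero
probs <- sweep(counts + 1, 2, colSums(counts + 1), "/")

# Resample each cell from a multinomial with that cell's own total count
# and feature proportions; column totals are therefore preserved
resampleCells <- function(counts) {
  apply(counts, 2, function(cell) {
    rmultinom(1, size = sum(cell), prob = cell / sum(cell))
  })
}

# Perplexity: exp of the negative multinomial log-likelihood per count
# (the constant multinomial coefficient is omitted)
perplexity <- function(counts, probs) {
  logLik <- sum(counts * log(probs))
  exp(-logLik / sum(counts))
}

observed <- perplexity(counts, probs)
numResample <- 5
resampled <- replicate(numResample, perplexity(resampleCells(counts), probs))
resampled   # a distribution of perplexities rather than a single value
```

Lower perplexity means the model assigns higher probability to the observed counts; the spread of the resampled values gives a sense of how stable that comparison is across K/L choices.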
resamplePerplexity( x, celdaList, useAssay = "counts", altExpName = "featureSubset", doResampling = FALSE, numResample = 5, seed = 12345 )
## S4 method for signature 'SingleCellExperiment'
resamplePerplexity( x, useAssay = "counts", altExpName = "featureSubset", doResampling = FALSE, numResample = 5, seed = 12345 )
## S4 method for signature 'ANY'
resamplePerplexity( x, celdaList, doResampling = FALSE, numResample = 5, seed = 12345 )
x |
A numeric matrix of counts or a
SingleCellExperiment returned from celdaGridSearch
with the matrix located in the assay slot under useAssay. |
celdaList |
Object of class 'celdaList'. Used only if x is not a SingleCellExperiment object. |
useAssay |
A string specifying which assay
slot to use if x is a SingleCellExperiment object. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
doResampling |
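Boolean. If TRUE, the counts matrix is resampled numResample times and perplexity is calculated for each resampled matrix. Default FALSE. |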
numResample |
Integer. The number of times to resample the counts matrix
for evaluating perplexity if doResampling is set to TRUE. Default 5. |
seed |
Integer. Passed to with_seed. For reproducibility,
a default value of 12345 is used. If NULL, no calls to with_seed are made. |
A SingleCellExperiment object or celdaList object with a perplexity property, detailing the perplexity of all K/L combinations that appeared in the celdaList's models.
data(sceCeldaCGGridSearch)
sce <- resamplePerplexity(sceCeldaCGGridSearch)
plotGridSearchPerplexity(sce)
data(celdaCGSim, celdaCGGridSearchRes)
celdaCGGridSearchRes <- resamplePerplexity( celdaCGSim$counts, celdaCGGridSearchRes )
plotGridSearchPerplexity(celdaCGGridSearchRes)
SCE or celdaList object
Returns all celda models generated during a celdaGridSearch run.
resList(x, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
resList(x, altExpName = "featureSubset")
## S4 method for signature 'celdaList'
resList(x)
x |
An object of class SingleCellExperiment or
celdaList. |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
List. Contains one celdaModel object for each of the parameters specified in runParams(x).
data(sceCeldaCGGridSearch)
celdaCGGridModels <- resList(sceCeldaCGGridSearch)
data(celdaCGGridSearchRes)
celdaCGGridModels <- resList(celdaCGGridSearchRes)
This will return indices of features among the rownames or rowData of a data.frame, matrix, or SummarizedExperiment object, including a SingleCellExperiment. Partial matching (i.e. grepping) can be used by setting exactMatch = FALSE.
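 
The difference between exact and partial matching can be illustrated with base R's match() and grep(). This is an illustration with hypothetical feature names, not the function's source. Note that partial matching on "Gene_1" also picks up "Gene_10" through "Gene_12", which is worth keeping in mind before setting exactMatch = FALSE.

```r
# Hypothetical feature names in the style of the simulated celda data
featureNames <- paste0("Gene_", 1:12)

# Exact matching: one index per query, NA when the feature is absent
exactIdx <- match(c("Gene_1", "Gene_5", "Gene_99"), featureNames)
exactIdx                          # 1 5 NA

# Dropping unmatched queries
exactIdx[!is.na(exactIdx)]

# Partial matching with grep(): "Gene_1" also hits Gene_10, Gene_11 and Gene_12
partialIdx <- grep("Gene_1", featureNames)
featureNames[partialIdx]
```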
retrieveFeatureIndex( features, x, by = "rownames", exactMatch = TRUE, removeNA = FALSE )
features |
Character vector of feature names to find in the rows of
x. |
x |
A data.frame, matrix, or SingleCellExperiment object to search. |
by |
Character. Where to search for features in x. Default "rownames". |
exactMatch |
Boolean. Whether to only identify exact matches
or to identify partial matches using grep. Default TRUE. |
removeNA |
Boolean. If set to |
A vector of row indices for the matching features in x.
Yusuke Koga, Joshua Campbell
'retrieveFeatureInfo' from package 'scater' and regex for how to use regular expressions when exactMatch = FALSE.
data(celdaCGSim)
retrieveFeatureIndex(c("Gene_1", "Gene_5"), celdaCGSim$counts)
retrieveFeatureIndex(c("Gene_1", "Gene_5"), celdaCGSim$counts, exactMatch = FALSE)
SingleCellExperiment or celdaList object
Returns details on the clustering parameters and model priors from the celdaList object when it was created.
runParams(x, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
runParams(x, altExpName = "featureSubset")
## S4 method for signature 'celdaList'
runParams(x)
x |
An object of class SingleCellExperiment or class
celdaList. |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
Data Frame. Contains details on the various K/L parameters, chain parameters, seed, and final log-likelihoods derived for each model in the provided celdaList.
data(sceCeldaCGGridSearch)
runParams(sceCeldaCGGridSearch)
data(celdaCGGridSearchRes)
runParams(celdaCGGridSearchRes)
A matrix of simulated gene counts.
sampleCells
A matrix of simulated gene counts with 10 rows (genes) and 10 columns (cells).
A toy count matrix for use with celda.
Generated by Josh Campbell.
http://github.com/campbio/celda
Return or set the sample labels for the cells in x.
sampleLabel(x, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
sampleLabel(x, altExpName = "featureSubset")
sampleLabel(x, altExpName = "featureSubset") <- value
## S4 replacement method for signature 'SingleCellExperiment'
sampleLabel(x, altExpName = "featureSubset") <- value
## S4 method for signature 'celdaModel'
sampleLabel(x)
x |
Can be one of
a SingleCellExperiment object or a celda model object. |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
value |
Character vector of sample labels for replacements. Works
only if x is a SingleCellExperiment object. |
Character vector. Contains the sample labels provided at model creation, or those automatically generated by celda.
data(sceCeldaCG)
sampleLabel(sceCeldaCG)
data(celdaCGMod)
sampleLabel(celdaCGMod)
A SingleCellExperiment object containing the results of running selectFeatures and celda_C on celdaCSim.
sceCeldaC
A SingleCellExperiment object
data(celdaCSim)
sceCeldaC <- selectFeatures(celdaCSim$counts)
sceCeldaC <- celda_C(sceCeldaC, K = celdaCSim$K, sampleLabel = celdaCSim$sampleLabel, nchains = 1)
A SingleCellExperiment object containing the results of running selectFeatures and celda_CG on celdaCGSim.
sceCeldaCG
A SingleCellExperiment object
data(celdaCGSim)
sceCeldaCG <- selectFeatures(celdaCGSim$counts)
sceCeldaCG <- celda_CG(sceCeldaCG, K = celdaCGSim$K, L = celdaCGSim$L, sampleLabel = celdaCGSim$sampleLabel, nchains = 1)
A SingleCellExperiment object containing the results of running selectFeatures and celdaGridSearch on celdaCGSim.
sceCeldaCGGridSearch
A SingleCellExperiment object
data(celdaCGSim)
sce <- selectFeatures(celdaCGSim$counts)
sceCeldaCGGridSearch <- celdaGridSearch(sce, model = "celda_CG", paramsTest = list(K = seq(4, 6), L = seq(9, 11)), paramsFixed = list(sampleLabel = celdaCGSim$sampleLabel), bestOnly = TRUE, nchains = 1, cores = 1, verbose = FALSE)
A SingleCellExperiment object containing the results of running selectFeatures and celda_G on celdaGSim.
sceCeldaG
A SingleCellExperiment object
data(celdaGSim)
sceCeldaG <- selectFeatures(celdaGSim$counts)
sceCeldaG <- celda_G(sceCeldaG, L = celdaGSim$L, nchains = 1)
Select the chain with the best log-likelihood for each combination of tested parameters from a SCE object generated by celdaGridSearch or from a celdaList object.
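Conceptually, the selection is a group-wise maximum. The sketch below assumes a run table with one row per chain and hypothetical columns chain, K, L and logLik (the actual column names returned by runParams may differ). The K and L values mirror the paramsTest grid used to build sceCeldaCGGridSearch, and the log-likelihoods are made up.

```r
set.seed(12345)

# Hypothetical run table: 3 chains for every K/L combination
runs <- expand.grid(chain = 1:3, K = 4:6, L = 9:11)
runs$logLik <- rnorm(nrow(runs), mean = -1e5, sd = 50)  # made-up values

# Split by K/L combination and keep the chain with the highest log-likelihood
groups <- split(runs, list(runs$K, runs$L), drop = TRUE)
best <- do.call(rbind, lapply(groups, function(g) g[which.max(g$logLik), ]))
best[, c("K", "L", "chain", "logLik")]
```

With 9 K/L combinations and 3 chains each, 27 runs reduce to 9 best models.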
selectBestModel(x, asList = FALSE, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
selectBestModel(x, asList = FALSE, altExpName = "featureSubset")
## S4 method for signature 'celdaList'
selectBestModel(x, asList = FALSE)
x |
Can be one of
a SingleCellExperiment object generated by celdaGridSearch or a celdaList object. |
asList |
|
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
One of:
If x is a SingleCellExperiment: a new SingleCellExperiment object containing one model with the best log-likelihood for each set of parameters in metadata(x). If there is only one set of parameters, a new SingleCellExperiment object with the matching model stored in the metadata "celda_parameters" slot will be returned. Otherwise, a new SingleCellExperiment object with the subset models stored in the metadata "celda_grid_search" slot will be returned.
If x is a celdaList: a new celdaList object containing one model with the best log-likelihood for each set of parameters. If only one set of parameters is in the celdaList, the best model will be returned directly instead of a celdaList object.
celdaGridSearch subsetCeldaList
data(sceCeldaCGGridSearch)
## Returns same result as running celdaGridSearch with "bestOnly = TRUE"
sce <- selectBestModel(sceCeldaCGGridSearch)
data(celdaCGGridSearchRes)
## Returns same result as running celdaGridSearch with "bestOnly = TRUE"
cgsBest <- selectBestModel(celdaCGGridSearchRes)
A simple heuristic feature selection procedure. Select features with at least minCount counts in at least minCell cells. A SingleCellExperiment object with subset features will be stored in the altExp slot with name altExpName. The name of the assay slot in altExp will be the same as useAssay.
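The filtering rule reduces to a single comparison per feature. Here is a minimal base R sketch on a hypothetical matrix, assuming "at least" means >= for both thresholds as the text states. It is not the package's code and does not create the altExp slot.

```r
# Hypothetical count matrix: 4 features (rows) x 5 cells (columns)
counts <- matrix(c(3, 0, 5, 1,
                   4, 0, 3, 1,
                   3, 2, 0, 1,
                   0, 9, 4, 1,
                   1, 0, 3, 2),
                 nrow = 4,
                 dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:5)))

minCount <- 3
minCell <- 3

# Keep a feature if at least minCell cells have >= minCount counts for it
keep <- rowSums(counts >= minCount) >= minCell
featureSubset <- counts[keep, , drop = FALSE]
rownames(featureSubset)
```

Gene_2 is dropped even though one cell has 9 counts, because only a single cell passes minCount; Gene_4 is dropped because it is detected everywhere but never reaches minCount.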
selectFeatures( x, minCount = 3, minCell = 3, useAssay = "counts", altExpName = "featureSubset" )
## S4 method for signature 'SingleCellExperiment'
selectFeatures( x, minCount = 3, minCell = 3, useAssay = "counts", altExpName = "featureSubset" )
## S4 method for signature 'matrix'
selectFeatures( x, minCount = 3, minCell = 3, useAssay = "counts", altExpName = "featureSubset" )
x |
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under useAssay. |
minCount |
Minimum number of counts required for feature selection. |
minCell |
Minimum number of cells required for feature selection. |
useAssay |
A string specifying the name of the assay slot to use. Default "counts". |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
A SingleCellExperiment object with an altExpName altExp slot. Function parameter settings are stored in the metadata "select_features" slot.
data(sceCeldaCG)
sce <- selectFeatures(sceCeldaCG)
data(celdaCGSim)
sce <- selectFeatures(celdaCGSim$counts)
A function to draw clustered heatmaps where one has better control over some graphical parameters such as cell size, etc.
The function also allows aggregating the rows using kmeans clustering. This is advisable if the number of rows is so large (roughly more than 1000) that R can no longer handle their hierarchical clustering. Instead of showing all the rows separately, one can cluster the rows in advance and show only the cluster centers. The number of clusters can be tuned with parameter kmeansK.
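A minimal sketch of the row aggregation using stats::kmeans on a hypothetical 2000-row matrix. The kmeans settings used here (a single start, iter.max = 100) are an assumption and may differ from what semiPheatmap passes internally.

```r
set.seed(12345)

# Hypothetical matrix with 2000 rows in two well-separated groups
mat <- rbind(matrix(rnorm(1000 * 10, mean = 0), ncol = 10),
             matrix(rnorm(1000 * 10, mean = 3), ncol = 10))

# Aggregate the rows into kmeansK clusters and keep only the cluster centers
kmeansK <- 2
km <- kmeans(mat, centers = kmeansK, iter.max = 100)
centers <- km$centers         # kmeansK x ncol(mat): the rows that would be drawn
table(km$cluster)             # how many original rows each center summarizes
```

The heatmap then needs to cluster only kmeansK rows instead of 2000.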
semiPheatmap( mat, color = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100), kmeansK = NA, breaks = NA, borderColor = "grey60", cellWidth = NA, cellHeight = NA, scale = "none", clusterRows = TRUE, clusterCols = TRUE, clusteringDistanceRows = "euclidean", clusteringDistanceCols = "euclidean", clusteringMethod = "complete", clusteringCallback = .identity2, cutreeRows = NA, cutreeCols = NA, treeHeightRow = ifelse(clusterRows, 50, 0), treeHeightCol = ifelse(clusterCols, 50, 0), legend = TRUE, legendBreaks = NA, legendLabels = NA, annotationRow = NA, annotationCol = NA, annotation = NA, annotationColors = NA, annotationLegend = TRUE, annotationNamesRow = TRUE, annotationNamesCol = TRUE, dropLevels = TRUE, showRownames = TRUE, showColnames = TRUE, main = NA, fontSize = 10, fontSizeRow = fontSize, fontSizeCol = fontSize, displayNumbers = FALSE, numberFormat = "%.2f", numberColor = "grey30", fontSizeNumber = 0.8 * fontSize, gapsRow = NULL, gapsCol = NULL, labelsRow = NULL, labelsCol = NULL, fileName = NA, width = NA, height = NA, silent = FALSE, rowLabel, colLabel, rowGroupOrder = NULL, colGroupOrder = NULL, ... )
mat |
numeric matrix of the values to be plotted. |
color |
vector of colors used in heatmap. |
kmeansK |
the number of kmeans clusters to make, if we want to aggregate the rows before drawing the heatmap. If NA then the rows are not aggregated. |
breaks |
Numeric vector. A sequence of numbers that covers the range of values in mat. Values in mat are assigned to each bin in breaks, and each bin is assigned a unique color from color. If NA, then breaks are calculated automatically. Default NA. |
borderColor |
color of cell borders on heatmap, use NA if no border should be drawn. |
cellWidth |
individual cell width in points. If left as NA, then the values depend on the size of plotting window. |
cellHeight |
individual cell height in points. If left as NA, then the values depend on the size of plotting window. |
scale |
character indicating if the values should be centered and
scaled in either the row direction or the column direction, or none.
Corresponding values are "row", "column" and "none". |
clusterRows |
boolean values determining if rows should be clustered or
an hclust object. |
clusterCols |
boolean values determining if columns should be clustered
or an hclust object. |
clusteringDistanceRows |
distance measure used in clustering rows.
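Possible values are "correlation" for Pearson correlation and all the distances supported by dist, such as "euclidean". |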
clusteringDistanceCols |
distance measure used in clustering columns. Possible values are the same as for clusteringDistanceRows. |
clusteringMethod |
clustering method used. Accepts the same values as
hclust. |
clusteringCallback |
callback function to modify the clustering. Is
called with two parameters: original |
cutreeRows |
number of clusters the rows are divided into, based on the hierarchical clustering (using cutree), if rows are not clustered, the argument is ignored |
cutreeCols |
similar to cutreeRows, but for columns. |
treeHeightRow |
the height of a tree for rows, if these are clustered. Default value 50 points. |
treeHeightCol |
the height of a tree for columns, if these are clustered. Default value 50 points. |
legend |
logical to determine if legend should be drawn or not. |
legendBreaks |
vector of breakpoints for the legend. |
legendLabels |
vector of labels for the legendBreaks. |
annotationRow |
data frame that specifies the annotations shown on the left side of the heatmap. Each row defines the features for a specific row. The rows in the data and in the annotation are matched using corresponding row names. Note that the color scheme takes into account whether a variable is continuous or discrete. |
annotationCol |
similar to annotationRow, but for columns. |
annotation |
deprecated parameter that currently sets the annotationCol if it is missing. |
annotationColors |
list for specifying annotationRow and annotationCol track colors manually. It is possible to define the colors for only some of the features. Check examples for details. |
annotationLegend |
boolean value showing if the legend for annotation tracks should be drawn. |
annotationNamesRow |
boolean value showing if the names for row annotation tracks should be drawn. |
annotationNamesCol |
boolean value showing if the names for column annotation tracks should be drawn. |
dropLevels |
logical to determine if unused levels are also shown in the legend. |
showRownames |
boolean specifying if row names are shown. |
showColnames |
boolean specifying if column names are shown. |
main |
the title of the plot |
fontSize |
base fontsize for the plot |
fontSizeRow |
fontsize for rownames (Default: fontSize) |
fontSizeCol |
fontsize for colnames (Default: fontSize) |
displayNumbers |
logical determining if the numeric values are also printed to the cells. If this is a matrix (with same dimensions as original matrix), the contents of the matrix are shown instead of original values. |
numberFormat |
format strings (C printf style) of the numbers shown in
cells. For example, "%.2f" shows 2 decimal places and "%.1e" shows exponential notation (see sprintf). |
numberColor |
color of the text |
fontSizeNumber |
fontsize of the numbers displayed in cells |
gapsRow |
vector of row indices showing where to put gaps into the
heatmap. Used only if the rows are not clustered. See |
gapsCol |
similar to gapsRow, but for columns. |
labelsRow |
custom labels for rows that are used instead of rownames. |
labelsCol |
similar to labelsRow, but for columns. |
fileName |
file path where to save the picture. The file type is decided by the extension in the path. Currently the following formats are supported: png, pdf, tiff, bmp, jpeg. Even if the plot does not fit into the plotting window, the file size is calculated so that the plot would fit there, unless specified otherwise. |
width |
manual option for determining the output file width in inches. |
height |
manual option for determining the output file height in inches. |
silent |
do not draw the plot (useful when using the gtable output) |
rowLabel |
row cluster labels for semi-clustering |
colLabel |
column cluster labels for semi-clustering |
rowGroupOrder |
Vector. Specifies the order of feature clusters when
semisupervised clustering is performed on the |
colGroupOrder |
Vector. Specifies the order of cell clusters when
semisupervised clustering is performed on the |
... |
graphical parameters for the text used in plot. Parameters
passed to |
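The scale, breaks, and cutreeRows arguments correspond to common base R operations. The sketch below is an assumption about their behavior, not semiPheatmap's code: row scaling is taken to be the usual per-row z-score, each interval between consecutive breaks is taken to receive one color (so breaks has one more element than the colors), and row groups come from cutree() on a complete-linkage hclust tree. All objects are hypothetical.

```r
# Hypothetical 3 x 4 matrix
mat <- matrix(c(1, 2, 3, 4,
                10, 20, 30, 40,
                5, 5, 6, 6),
              nrow = 3, byrow = TRUE)

# scale = "row": center and scale each row; scale() works on columns,
# so transpose, scale, and transpose back
rowScaled <- t(scale(t(mat)))

# breaks: 5 break points define 4 intervals, one color per interval
breaks <- seq(-2, 2, length.out = 5)
colors <- c("navy", "lightblue", "pink", "firebrick")
bins <- findInterval(rowScaled, breaks, all.inside = TRUE)
cellColors <- matrix(colors[bins], nrow = nrow(mat))
cellColors

# cutreeRows = 2: cut the complete-linkage row dendrogram into 2 groups
rowGroups <- cutree(hclust(dist(mat), method = "complete"), k = 2)
rowGroups
```

After row scaling, rows 1 and 2 get identical colors even though their raw values differ tenfold, which is the usual reason to scale by row.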
Invisibly a list of components:
treeRow: the clustering of rows as an hclust object
treeCol: the clustering of columns as an hclust object
kmeans: the kmeans clustering of rows if parameter kmeansK was specified
Raivo Kolde
# Create test matrix
test = matrix(rnorm(200), 20, 10)
test[seq(10), seq(1, 10, 2)] = test[seq(10), seq(1, 10, 2)] + 3
test[seq(11, 20), seq(2, 10, 2)] = test[seq(11, 20), seq(2, 10, 2)] + 2
test[seq(15, 20), seq(2, 10, 2)] = test[seq(15, 20), seq(2, 10, 2)] + 4
colnames(test) = paste("Test", seq(10), sep = "")
rownames(test) = paste("Gene", seq(20), sep = "")
# Draw heatmaps
pheatmap(test)
pheatmap(test, kmeansK = 2)
pheatmap(test, scale = "row", clusteringDistanceRows = "correlation")
pheatmap(test, color = colorRampPalette(c("navy", "white", "firebrick3"))(50))
pheatmap(test, clusterRows = FALSE)
pheatmap(test, legend = FALSE)
# Show text within cells
pheatmap(test, displayNumbers = TRUE)
pheatmap(test, displayNumbers = TRUE, numberFormat = "%.1e")
pheatmap(test, displayNumbers = matrix(ifelse(test > 5, "*", ""), nrow(test)))
pheatmap(test, clusterRows = FALSE, legendBreaks = seq(-1, 4), legendLabels = c("0", "1e-4", "1e-3", "1e-2", "1e-1", "1"))
# Fix cell sizes and save to file with correct size pheatmap(test, cellWidth = 15, cellHeight = 12, main = "Example heatmap") pheatmap(test, cellWidth = 15, cellHeight = 12, fontSize = 8, fileName = "test.pdf")
# Generate annotations for rows and columns annotationCol = data.frame(CellType = factor(rep(c("CT1", "CT2"), 5)), Time = seq(5)) rownames(annotationCol) = paste("Test", seq(10), sep = "")
annotationRow = data.frame(GeneClass = factor(rep(c("Path1", "Path2", "Path3"), c(10, 4, 6)))) rownames(annotationRow) = paste("Gene", seq(20), sep = "")
# Display row and color annotations pheatmap(test, annotationCol = annotationCol) pheatmap(test, annotationCol = annotationCol, annotationLegend = FALSE) pheatmap(test, annotationCol = annotationCol, annotationRow = annotationRow)
# Specify colors ann_colors = list(Time = c("white", "firebrick"), CellType = c(CT1 = "#1B9E77", CT2 = "#D95F02"), GeneClass = c(Path1 = "#7570B3", Path2 = "#E7298A", Path3 = "#66A61E"))
pheatmap(test, annotationCol = annotationCol, annotationColors = ann_colors, main = "Title") pheatmap(test, annotationCol = annotationCol, annotationRow = annotationRow, annotationColors = ann_colors) pheatmap(test, annotationCol = annotationCol, annotationColors = ann_colors[2])
# Gaps in heatmaps pheatmap(test, annotationCol = annotationCol, clusterRows = FALSE, gapsRow = c(10, 14)) pheatmap(test, annotationCol = annotationCol, clusterRows = FALSE, gapsRow = c(10, 14), cutreeCol = 2)
# Show custom strings as row/col names labelsRow = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "Il10", "Il15", "Il1b")
pheatmap(test, annotationCol = annotationCol, labelsRow = labelsRow)
# Specifying clustering from distance matrix drows = stats::dist(test, method = "minkowski") dcols = stats::dist(t(test), method = "minkowski") pheatmap(test, clusteringDistanceRows = drows, clusteringDistanceCols = dcols)
# Modify ordering of the clusters using clustering callback option callback = function(hc, mat) sv = svd(t(mat))$v[, 1] dend = reorder(as.dendrogram(hc), wts = sv) as.hclust(dend)
pheatmap(test, clusteringCallback = callback)
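The callback in the examples takes an hclust object plus the data matrix and must return an hclust object. Assuming that contract, it can be checked in base R without drawing a heatmap: the sketch below clusters the rows itself with hclust (standing in for what the heatmap function does internally), applies the same callback, and confirms the result is still a valid ordering of all rows.

```r
# Test matrix with row names so dendrogram leaves carry labels
set.seed(1)
test <- matrix(rnorm(200), 20, 10)
rownames(test) <- paste0("Gene", seq_len(20))

# Row clustering computed directly (a stand-in for the heatmap's internal step)
hc <- hclust(dist(test))

# Same callback as in the examples: weight each row by its loading on the
# first right singular vector of t(mat), reorder branches, convert back
callback <- function(hc, mat) {
  sv <- svd(t(mat))$v[, 1]   # one weight per row of mat
  dend <- reorder(as.dendrogram(hc), wts = sv)
  as.hclust(dend)
}

hc2 <- callback(hc, test)
```

Only the left-to-right order of branches changes; the tree topology is the one hclust produced.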
This function generates a SingleCellExperiment containing a simulated counts matrix in the "counts" assay slot, as well as various parameters used in the simulation, which can be useful for running celda and are stored in the metadata slot. The user must provide the desired model (one of celda_C, celda_G, celda_CG) as well as any desired tuning parameters for that model's simulation function, as detailed below.
simulateCells( model = c("celda_CG", "celda_C", "celda_G"), S = 5, CRange = c(50, 100), NRange = c(500, 1000), C = 100, G = 100, K = 5, L = 10, alpha = 1, beta = 1, gamma = 5, delta = 1, seed = 12345 )
model |
Character. The model to simulate from. Options are available in availableModels: "celda_CG", "celda_C", or "celda_G". |
S |
Integer. Number of samples to simulate. Default 5. Only used if model is one of "celda_CG" or "celda_C". |
CRange |
Integer vector. A vector of length 2 that specifies the lower and upper bounds of the number of cells to be generated in each sample. Default c(50, 100). Only used if model is one of "celda_CG" or "celda_C". |
NRange |
Integer vector. A vector of length 2 that specifies the lower and upper bounds of the number of counts generated for each cell. Default c(500, 1000). |
C |
Integer. Number of cells to simulate. Default 100. Only used if model is "celda_G". |
G |
Integer. The total number of features to be simulated. Default 100. |
K |
Integer. Number of cell populations. Default 5. Only used if model is one of "celda_CG" or "celda_C". |
L |
Integer. Number of feature modules. Default 10. Only used if model is one of "celda_CG" or "celda_G". |
alpha |
Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1. Only used if model is one of "celda_CG" or "celda_C". |
beta |
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell population. Default 1. |
gamma |
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 5. Only used if model is one of "celda_CG" or "celda_G". |
delta |
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1. Only used if model is one of "celda_CG" or "celda_G". |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
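To make the role of the concentration parameters and the range arguments concrete, here is a minimal base-R sketch. It is not celda's simulation code: rdirichletSketch is a hypothetical helper using the standard construction of a Dirichlet draw from normalized Gamma draws, set.seed stands in for the seed argument, and drawing cell and count totals uniformly within CRange and NRange is an assumption.

```r
# Hypothetical helper (not part of celda): draw n vectors from a
# Dirichlet(alphaVec) distribution by normalizing independent Gamma draws.
rdirichletSketch <- function(n, alphaVec) {
  k <- length(alphaVec)
  # byrow = TRUE makes column j use shape alphaVec[j] after recycling
  g <- matrix(rgamma(n * k, shape = alphaVec), nrow = n, ncol = k, byrow = TRUE)
  g / rowSums(g)  # each row now sums to 1
}

set.seed(12345)  # base-R stand-in for the seed argument
S <- 5; K <- 5; alpha <- 1
CRange <- c(50, 100); NRange <- c(500, 1000)

# Cell population proportions per sample: an S x K matrix, one draw per row
theta <- rdirichletSketch(S, rep(alpha, K))

# Assumption: totals are drawn uniformly within the given bounds
cellsPerSample <- sample(CRange[1]:CRange[2], S, replace = TRUE)
countsPerCell <- sample(NRange[1]:NRange[2], sum(cellsPerSample), replace = TRUE)
```

For a symmetric Dirichlet, a larger alpha pushes each sample's proportions toward an even split across the K populations, while values below 1 concentrate most of the mass on a few populations.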
A SingleCellExperiment object with the simulated count matrix stored in the "counts" assay slot. Function parameter settings are stored in the metadata slot. For "celda_CG" and "celda_C" models, columns celda_sample_label and celda_cell_cluster in colData contain the simulated sample labels and cell population clusters. For "celda_CG" and "celda_G" models, column celda_feature_module in rowData contains the simulated gene modules.
sce <- simulateCells()
This function generates a list containing two count matrices, one for the native (real) expression and one for the observed expression (real expression plus contamination), as well as other parameters used in the simulation, which can be useful for running decontamination.
simulateContamination( C = 300, G = 100, K = 3, NRange = c(500, 1000), beta = 0.1, delta = c(1, 10), numMarkers = 3, seed = 12345 )
C |
Integer. Number of cells to be simulated. Default 300. |
G |
Integer. Number of genes to be simulated. Default 100. |
K |
Integer. Number of cell populations to be simulated. Default 3. |
NRange |
Integer vector. A vector of length 2 that specifies the lower and upper bounds of the number of counts generated for each cell. Default c(500, 1000). |
beta |
Numeric. Concentration parameter for Phi. Default 0.1. |
delta |
Numeric or numeric vector. Concentration parameter for Theta. If a single numeric value is given, a symmetric Beta distribution is used, with both shape parameters set to that value; if a vector of length 2 is given, the two values are used as the shape1 and shape2 parameters of the Beta distribution, respectively. Default c(1, 10). |
numMarkers |
Integer. Number of markers for each cell population. Default 3. |
seed |
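Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |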
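The delta argument maps onto Beta shape parameters as described above. The sketch below shows only that mapping, using base R's rbeta; betaShapes is a hypothetical helper, not a celda function, and the sketch makes no claim about how the Beta draws are used further in the simulation. For Beta(a, b) the mean is a / (a + b), so the default c(1, 10) has mean 1/11.

```r
# Hypothetical helper: convert delta into Beta shape parameters.
# A single value gives a symmetric Beta(delta, delta); a length-2 vector
# gives Beta(shape1 = delta[1], shape2 = delta[2]).
betaShapes <- function(delta) {
  if (length(delta) == 1) {
    c(shape1 = delta, shape2 = delta)
  } else if (length(delta) == 2) {
    c(shape1 = delta[1], shape2 = delta[2])
  } else {
    stop("delta must have length 1 or 2")
  }
}

set.seed(12345)
shapes <- betaShapes(c(1, 10))
# One draw per simulated cell (C = 300 by default)
draws <- rbeta(300, shape1 = shapes[["shape1"]], shape2 = shapes[["shape2"]])
theoreticalMean <- shapes[["shape1"]] / sum(shapes)  # 1 / 11
```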
A list containing the nativeMatrix (real expression), the observedMatrix (real expression + contamination), as well as other parameters used in the simulation.
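Because the observed matrix is the native matrix plus contamination, the fraction of each cell's observed counts that is contamination follows directly from column sums. The sketch below uses small hypothetical Poisson matrices as stand-ins; the local variable names mirror the components described above, but the values are not output from simulateContamination.

```r
set.seed(12345)
C <- 4; G <- 6
# Hypothetical stand-ins: genes in rows, cells in columns
nativeMatrix <- matrix(rpois(G * C, lambda = 20), nrow = G, ncol = C)
contaminationMatrix <- matrix(rpois(G * C, lambda = 3), nrow = G, ncol = C)

# Observed counts = native expression + contamination
observedMatrix <- nativeMatrix + contaminationMatrix

# Per-cell fraction of observed counts that come from contamination
contaminationFraction <- colSums(contaminationMatrix) / colSums(observedMatrix)
```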
Shiyi Yang, Yuan Yin, Joshua Campbell
contaminationSim <- simulateContamination(K = 3, delta = c(1, 10))
Manually select a celda feature module to split into 2 or more modules. Useful for splitting up modules that show divergent expression of features in multiple cell clusters.
splitModule(x, module, useAssay = "counts", altExpName = "featureSubset", n = 2, seed = 12345)
## S4 method for signature 'SingleCellExperiment'
splitModule(x, module, useAssay = "counts", altExpName = "featureSubset", n = 2, seed = 12345)
x |
A SingleCellExperiment object with the matrix located in the assay slot under useAssay. |
module |
Integer. The module to be split. |
useAssay |
A string specifying which assay
slot to use for |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
n |
Integer. The number of new modules that module should be split into. Default 2. |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
An updated SingleCellExperiment object with new feature modules stored in column celda_feature_module in rowData(x).
data(sceCeldaCG)
# Split module 5 into 2 new modules.
sce <- splitModule(sceCeldaCG, module = 5)
celdaGridSearch
Select a subset of models from a SingleCellExperiment object generated by celdaGridSearch that match the criteria in the argument params.
subsetCeldaList(x, params, altExpName = "featureSubset")
## S4 method for signature 'SingleCellExperiment'
subsetCeldaList(x, params, altExpName = "featureSubset")
## S4 method for signature 'celdaList'
subsetCeldaList(x, params)
x |
Can be one of
|
params |
List. List of parameters used to subset the matching celda
models in list |
altExpName |
The name for the altExp slot to use. Default "featureSubset". |
One of the following, depending on the class of x:
A new SingleCellExperiment object (when x is a SingleCellExperiment) containing all models matching the provided criteria in params. If only one celda model result in the "celda_grid_search" slot in metadata(x) matches the given criteria, a new SingleCellExperiment object with the matching model stored in the metadata "celda_parameters" slot will be returned. Otherwise, a new SingleCellExperiment object with the subset models stored in the metadata "celda_grid_search" slot will be returned.
A new celdaList object containing all models matching the provided criteria in params. If only one item in the celdaList matches the given criteria, the matching model will be returned directly instead of a celdaList object.
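Matching against params amounts to keeping only the runs whose parameter values satisfy every named criterion. The base-R sketch below illustrates that filtering on a hypothetical table of run parameters; the table layout and the helper subsetRuns are assumptions for illustration, not celda's internal runParams structure or implementation.

```r
# Hypothetical grid-search run table: one row per fitted model
runParams <- data.frame(
  index = 1:8,
  chain = rep(1:2, times = 4),
  K = rep(c(5, 6), each = 4),
  L = rep(rep(c(10, 15), each = 2), times = 2)
)

# Keep rows whose values match every criterion in params
subsetRuns <- function(runParams, params) {
  keep <- rep(TRUE, nrow(runParams))
  for (nm in names(params)) {
    keep <- keep & runParams[[nm]] %in% params[[nm]]
  }
  runParams[keep, , drop = FALSE]
}

matched <- subsetRuns(runParams, list(K = 5, L = 10))          # two chains match
single <- subsetRuns(runParams, list(K = 6, L = 15, chain = 2)) # one run matches
```

A single remaining row corresponds to the case described above in which the one matching model is returned directly rather than wrapped in a list.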
celdaGridSearch can run Celda with multiple parameters and chains in parallel. selectBestModel can get the best model for each combination of parameters.
data(sceCeldaCGGridSearch)
sceK5L10 <- subsetCeldaList(sceCeldaCGGridSearch, params = list(K = 5, L = 10))
data(celdaCGGridSearchRes)
resK5L10 <- subsetCeldaList(celdaCGGridSearchRes, params = list(K = 5, L = 10))
topRank() can quickly identify the top 'n' rows for each column of a matrix. For example, this can be useful for identifying the top 'n' features per cell.
topRank(matrix, n = 25, margin = 2, threshold = 0, decreasing = TRUE)
matrix |
Numeric matrix. |
n |
Integer. Maximum number of items above 'threshold' returned for each ranked row or column. |
margin |
Integer. Dimension of 'matrix' to rank, with 1 for rows, 2 for columns. Default 2. |
threshold |
Numeric. Only return ranked rows or columns in the matrix that are above this threshold. If NULL, then no threshold will be applied. Default 0. |
decreasing |
Logical. Specifies if the rank should be decreasing. Default TRUE. |
List. The 'index' variable provides the top 'n' row (feature) indices contributing the most to each column (cell). The 'names' variable provides the rownames corresponding to these indexes.
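The ranking described above can be sketched in base R: for each column (or row), order the entries, drop those not above threshold, and keep at most n. topRankSketch is a hypothetical re-implementation written from the argument descriptions, not celda's topRank source, so tie handling and edge cases may differ.

```r
# Hypothetical sketch of topRank's behavior as described in its arguments
topRankSketch <- function(mat, n = 25, margin = 2, threshold = 0, decreasing = TRUE) {
  # Extract one column (margin = 2) or one row (margin = 1) at a time
  getVec <- if (margin == 2) function(k) mat[, k] else function(k) mat[k, ]
  labels <- if (margin == 2) rownames(mat) else colnames(mat)
  index <- lapply(seq_len(dim(mat)[margin]), function(k) {
    v <- getVec(k)
    ord <- order(v, decreasing = decreasing)
    if (!is.null(threshold)) {
      ord <- ord[v[ord] > threshold]  # keep only entries above threshold
    }
    head(ord, n)
  })
  list(index = index, names = lapply(index, function(i) labels[i]))
}

m <- matrix(c(5, 0, 3, 1, 4, 0), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("x", "y")))
res <- topRankSketch(m, n = 2)
res$names[[1]]  # "a" "c"
```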
data(sampleCells)
topRanksPerCell <- topRank(sampleCells, n = 5)
topFeatureNamesForCell <- topRanksPerCell$names[1]