Title: | Cepo for the identification of differentially stable genes |
---|---|
Description: | Defining the identity of a cell is fundamental to understand the heterogeneity of cells to various environmental signals and perturbations. We present Cepo, a new method to explore cell identities from single-cell RNA-sequencing data using differential stability as a new metric to define cell identity genes. Cepo computes cell-type specific gene statistics pertaining to differential stable gene expression. |
Authors: | Hani Jieun Kim [aut, cre] , Kevin Wang [aut] |
Maintainer: | Hani Jieun Kim <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.13.0 |
Built: | 2024-10-30 04:38:12 UTC |
Source: | https://github.com/bioc/Cepo |
A single-cell RNA-seq dataset adapted from sc_mixology
data(cellbench)
data(cellbench)
An object of SingleCellExperiment class with 895 cells and 2001 genes.
https://github.com/LuyiTian/sc_mixology
ExprsMat accepts various matrix objects, including DelayedArray and HDF5Array for out-of-memory computations. See vignette.
Cepo( exprsMat, cellTypes, minCells = 20, minCelltype = 3, exprsPct = 0.05, prefilter_sd = NULL, prefilter_pzero = NULL, logfc = NULL, computePvalue = NULL, computeFastPvalue = TRUE, variability = "CV", method = "weightedMean", weight = c(0.5, 0.5), workers = 1L, block = NULL, ... )
Cepo( exprsMat, cellTypes, minCells = 20, minCelltype = 3, exprsPct = 0.05, prefilter_sd = NULL, prefilter_pzero = NULL, logfc = NULL, computePvalue = NULL, computeFastPvalue = TRUE, variability = "CV", method = "weightedMean", weight = c(0.5, 0.5), workers = 1L, block = NULL, ... )
exprsMat |
Expression matrix where columns denote cells and rows denote genes |
cellTypes |
Vector of cell type labels |
minCells |
Integer indicating the minimum number of cells required within a cell type |
minCelltype |
Integer indicating the minimum number of cell types required in each batch |
exprsPct |
Percentage of lowly expressed genes to remove. Default to NULL to not remove any genes. |
prefilter_sd |
Numeric value indicating threshold relating to standard deviation of genes. Used with prefilter_zeros. |
prefilter_pzero |
Numeric value indicating threshold relating to the percentage of zero expression of genes. Used with prefilter_sd. |
logfc |
Numeric value indicating the threshold of log fold-change to use to filter genes. |
computePvalue |
Whether to compute p-values using bootstrap test. Default to NULL to not make computations. Set this to an integer to set the number of bootstraps needed (recommend to be at least 100). |
computeFastPvalue |
Logical vector indicating whether to perform a faster version of p-value calculation. Set to TRUE by default. |
variability |
A character indicating the stability measure (CV, IQR, MAD, SD). Default is set to CV. |
method |
Character indicating the method for integration the two stability measures. By default this is set to 'weightedMean' with equal weights. |
weight |
Vector of two values indicating the weights for each stability measure. By default this value is c(0.5, 0.5). |
workers |
Number of cores to use. Default to 1, which invokes
|
block |
Vector of batch labels |
... |
Additional arguments passed to |
Returns a list of key genes.
library(SingleCellExperiment) data('cellbench', package = 'Cepo') cellbench cepoOutput <- Cepo(logcounts(cellbench), cellbench$celltype) cepoOutput
library(SingleCellExperiment) data('cellbench', package = 'Cepo') cellbench cepoOutput <- Cepo(logcounts(cellbench), cellbench$celltype) cepoOutput
Plot densities
plotDensities( x, cepoOutput, nGenes = 2, assay = "logcounts", celltypeColumn, celltype = NULL, genes = NULL, plotType = c("histogram", "density"), color = NULL )
plotDensities( x, cepoOutput, nGenes = 2, assay = "logcounts", celltypeColumn, celltype = NULL, genes = NULL, plotType = c("histogram", "density"), color = NULL )
x |
a |
cepoOutput |
an output from Cepo or doLimma/doVoom/doTtest/doWilcoxon functions |
nGenes |
number of top genes from each celltype to plot. Default to 2. |
assay |
a character ('logcounts' by default), indicating the name of the assays(x) element which stores the expression data (i.e., assays(x)$name_assays_expression). We strongly encourage using normalized data, such as counts per million (CPM) or log-CPM. |
celltypeColumn |
a character, indicating the name of the name of the cell type column in the colData(x). |
celltype |
a character, indicating the name of the cell type to plot. Default is NULL which selects all celltypes in the cepoOutput. |
genes |
a character vector, indicating the name of the genes to plot. Default to NULL, so that 2 top genes from each celltype will be plotted. |
plotType |
Either 'histogram' or 'density' |
color |
a named color vector. The names should correspond to the |
A ggplot
object
with cell-type specific densities for a gene.
A ggplot
object.
library(SingleCellExperiment) data('cellbench', package = 'Cepo') cellbench cepoOutput <- Cepo(logcounts(cellbench), cellbench$celltype) plotDensities( x = cellbench, cepoOutput = cepoOutput, assay = 'logcounts', plotType = 'histogram', celltypeColumn = 'celltype' ) plotDensities( x = cellbench, cepoOutput = cepoOutput, genes = c('PLTP', 'CPT1C', 'MEG3', 'SYCE1', 'MICOS10P3', 'HOXB7'), assay = 'logcounts', plotType = 'histogram', celltypeColumn = 'celltype' )
library(SingleCellExperiment) data('cellbench', package = 'Cepo') cellbench cepoOutput <- Cepo(logcounts(cellbench), cellbench$celltype) plotDensities( x = cellbench, cepoOutput = cepoOutput, assay = 'logcounts', plotType = 'histogram', celltypeColumn = 'celltype' ) plotDensities( x = cellbench, cepoOutput = cepoOutput, genes = c('PLTP', 'CPT1C', 'MEG3', 'SYCE1', 'MICOS10P3', 'HOXB7'), assay = 'logcounts', plotType = 'histogram', celltypeColumn = 'celltype' )
A subsampled single-cell RNA-seq dataset
data(sce_pancreas)
data(sce_pancreas)
An object of SingleCellExperiment class with 528 cells and 1358 genes.
Setting parallel params based on operating platform
setCepoBPPARAM(workers = 1L, ...)
setCepoBPPARAM(workers = 1L, ...)
workers |
Number of cores to use. Default to 1, which invokes
|
... |
Additional arguments passed to |
Parameters for parallel computing depending on OS
# system.time(BiocParallel::bplapply(1:3, FUN = function(i){Sys.sleep(i)}, # BPPARAM = setCepoBPPARAM(workers = 1))) # system.time(BiocParallel::bplapply(1:3, FUN = function(i){Sys.sleep(i)}, # BPPARAM = setCepoBPPARAM(workers = 3)))
# system.time(BiocParallel::bplapply(1:3, FUN = function(i){Sys.sleep(i)}, # BPPARAM = setCepoBPPARAM(workers = 1))) # system.time(BiocParallel::bplapply(1:3, FUN = function(i){Sys.sleep(i)}, # BPPARAM = setCepoBPPARAM(workers = 3)))
Extract the top genes from the Cepo output
topGenes(object, n = 5, returnValues = FALSE)
topGenes(object, n = 5, returnValues = FALSE)
object |
Output from the Cepo function |
n |
Number of top genes to extract |
returnValues |
Whether to return the numeric value associated with the top selected genes |
Returns a list of key genes.
set.seed(1234) n <- 50 ## genes, rows p <- 100 ## cells, cols exprsMat <- matrix(rpois(n * p, lambda = 5), nrow = n) rownames(exprsMat) <- paste0('gene', 1:n) colnames(exprsMat) <- paste0('cell', 1:p) cellTypes <- sample(letters[1:3], size = p, replace = TRUE) cepo_output <- Cepo(exprsMat = exprsMat, cellTypes = cellTypes) cepo_output topGenes(cepo_output, n = 2) topGenes(cepo_output, n = 2, returnValues = TRUE)
set.seed(1234) n <- 50 ## genes, rows p <- 100 ## cells, cols exprsMat <- matrix(rpois(n * p, lambda = 5), nrow = n) rownames(exprsMat) <- paste0('gene', 1:n) colnames(exprsMat) <- paste0('cell', 1:p) cellTypes <- sample(letters[1:3], size = p, replace = TRUE) cepo_output <- Cepo(exprsMat = exprsMat, cellTypes = cellTypes) cepo_output topGenes(cepo_output, n = 2) topGenes(cepo_output, n = 2, returnValues = TRUE)