Package 'scde' reference manual

Title:	Single Cell Differential Expression
Description:	The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify and characterize putative cell subpopulations based on transcriptional signatures. The overall approach to the differential expression analysis is detailed in the following publication: "Bayesian approach to single-cell differential expression analysis" (Kharchenko PV, Silberstein L, Scadden DT, Nature Methods, doi: 10.1038/nmeth.2967). The overall approach to subpopulation identification and characterization is detailed in the following pre-print: "Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis" (Fan J, Salathia N, Liu R, Kaeser G, Yung Y, Herman J, Kaper F, Fan JB, Zhang K, Chun J, and Kharchenko PV, Nature Methods, doi:10.1038/nmeth.3734).
Authors:	Peter Kharchenko [aut, cre], Jean Fan [aut], Evan Biederstedt [aut]
Maintainer:	Evan Biederstedt <[email protected]>
License:	GPL-2
Version:	2.35.0
Built:	2025-02-28 03:49:14 UTC
Source:	https://github.com/bioc/scde

Determine principal components of a matrix using per-observation/per-variable weights

Description

Implements a weighted PCA

Usage

bwpca(mat, matw = NULL, npcs = 2, nstarts = 1, smooth = 0,
  em.tol = 1e-06, em.maxiter = 25, seed = 1, center = TRUE,
  n.shuffles = 0)
bwpca(mat, matw = NULL, npcs = 2, nstarts = 1, smooth = 0,
  em.tol = 1e-06, em.maxiter = 25, seed = 1, center = TRUE,
  n.shuffles = 0)

Arguments

`mat`	matrix of variables (columns) and observations (rows)
`matw`	corresponding weights
`npcs`	number of principal components to extract
`nstarts`	number of random starts to use
`smooth`	smoothing span
`em.tol`	desired EM algorithm tolerance
`em.maxiter`	maximum number of EM iterations
`seed`	random seed
`center`	whether mat should be centered (weighted centering)
`n.shuffles`	optional number of per-observation randomizations that should be performed in addition to the main calculations to determine the lambda1 (PC1 eigenvalue) magnitude under such randomizations (returned in $randvar)

Value

a list containing eigenvector matrix ($rotation), projections ($scores), variance (weighted) explained by each component ($var), total (weighted) variance of the dataset ($totalvar)

Examples

set.seed(0)
mat <- matrix( c(rnorm(5*10,mean=0,sd=1), rnorm(5*10,mean=5,sd=1)), 10, 10)  # random matrix
base.pca <- bwpca(mat)  # non-weighted pca, equal weights set automatically
matw <- matrix( c(rnorm(5*10,mean=0,sd=1), rnorm(5*10,mean=5,sd=1)), 10, 10)  # random weight matrix
matw <- abs(matw)/max(matw)
base.pca.weighted <- bwpca(mat, matw)  # weighted pca

set.seed(0)
mat <- matrix( c(rnorm(5*10,mean=0,sd=1), rnorm(5*10,mean=5,sd=1)), 10, 10)  # random matrix
base.pca <- bwpca(mat)  # non-weighted pca, equal weights set automatically
matw <- matrix( c(rnorm(5*10,mean=0,sd=1), rnorm(5*10,mean=5,sd=1)), 10, 10)  # random weight matrix
matw <- abs(matw)/max(matw)
base.pca.weighted <- bwpca(mat, matw)  # weighted pca

Filter counts matrix

Description

Filter counts matrix based on gene and cell requirements

Usage

clean.counts(counts, min.lib.size = 1800, min.reads = 10,
  min.detected = 5)
clean.counts(counts, min.lib.size = 1800, min.reads = 10,
  min.detected = 5)

Arguments

`counts`	read count matrix. The rows correspond to genes, columns correspond to individual cells
`min.lib.size`	Minimum number of genes detected in a cell. Cells with fewer genes will be removed (default: 1.8e3)
`min.reads`	Minimum number of reads per gene. Genes with fewer reads will be removed (default: 10)
`min.detected`	Minimum number of cells a gene must be seen in. Genes not seen in a sufficient number of cells will be removed (default: 5)

Value

a filtered read count matrix

Examples

data(pollen)
dim(pollen)
cd <- clean.counts(pollen)
dim(cd)

data(pollen)
dim(pollen)
cd <- clean.counts(pollen)
dim(cd)

Filter GOs list

Description

Filter GOs list and append GO names when appropriate

Usage

clean.gos(go.env, min.size = 5, max.size = 5000, annot = FALSE)
clean.gos(go.env, min.size = 5, max.size = 5000, annot = FALSE)

Arguments

`go.env`	GO or gene set list
`min.size`	Minimum size for number of genes in a gene set (default: 5)
`max.size`	Maximum size for number of genes in a gene set (default: 5000)
`annot`	Whether to append GO annotations for easier interpretation (default: FALSE)

Value

a filtered GO list

Examples


# 10 sample GOs
library(org.Hs.eg.db)
go.env <- mget(ls(org.Hs.egGO2ALLEGS)[1:10], org.Hs.egGO2ALLEGS)
# Filter this list and append names for easier interpretation
go.env <- clean.gos(go.env)


# 10 sample GOs
library(org.Hs.eg.db)
go.env <- mget(ls(org.Hs.egGO2ALLEGS)[1:10], org.Hs.egGO2ALLEGS)
# Filter this list and append names for easier interpretation
go.env <- clean.gos(go.env)

Sample data

Description

A subset of Saiful et al. 2011 dataset containing first 20 ES and 20 MEF cells.

References

http://www.ncbi.nlm.nih.gov/pubmed/21543516

Sample error model

Description

SCDE error model generated from the Pollen et al. 2014 dataset.

References

www.ncbi.nlm.nih.gov/pubmed/25086649

Build error models for heterogeneous cell populations, based on K-nearest neighbor cells.

Description

Builds cell-specific error models assuming that there are multiple subpopulations present among the measured cells. The models for each cell are based on average expression estimates obtained from K closest cells within a given group (if groups = NULL, then within the entire set of measured cells). The method implements fitting of both the original log-fit models (when linear.fit = FALSE), or newer linear-fit models (linear.fit = TRUE, default) with locally fit overdispersion coefficient (local.theta.fit = TRUE, default).

Usage

knn.error.models(counts, groups = NULL, k = round(ncol(counts)/2),
  min.nonfailed = 5, min.count.threshold = 1, save.model.plots = TRUE,
  max.model.plots = 50, n.cores = parallel::detectCores(),
  min.size.entries = 2000, min.fpm = 0, cor.method = "pearson",
  verbose = 0, fpm.estimate.trim = 0.25, linear.fit = TRUE,
  local.theta.fit = linear.fit, theta.fit.range = c(0.01, 100),
  alpha.weight.power = 1/2)
knn.error.models(counts, groups = NULL, k = round(ncol(counts)/2),
  min.nonfailed = 5, min.count.threshold = 1, save.model.plots = TRUE,
  max.model.plots = 50, n.cores = parallel::detectCores(),
  min.size.entries = 2000, min.fpm = 0, cor.method = "pearson",
  verbose = 0, fpm.estimate.trim = 0.25, linear.fit = TRUE,
  local.theta.fit = linear.fit, theta.fit.range = c(0.01, 100),
  alpha.weight.power = 1/2)

Arguments

`counts`	count matrix (integer matrix, rows- genes, columns- cells)
`groups`	optional groups partitioning known subpopulations
`k`	number of nearest neighbor cells to use during fitting. If k is set sufficiently high, all of the cells within a given group will be used.
`min.nonfailed`	minimum number of non-failed measurements (within the k nearest neighbor cells) required for a gene to be taken into account during error fitting procedure
`min.count.threshold`	minimum number of reads required for a measurement to be considered non-failed
`save.model.plots`	whether model plots should be saved (file names are (group).models.pdf, or cell.models.pdf if no group was supplied)
`max.model.plots`	maximum number of models to save plots for (saves time when there are too many cells)
`n.cores`	number of cores to use through the calculations
`min.size.entries`	minimum number of genes to use for model fitting
`min.fpm`	optional parameter to restrict model fitting to genes with group-average expression magnitude above a given value
`cor.method`	correlation measure to be used in determining k nearest cells
`verbose`	level of verbosity
`fpm.estimate.trim`	trim fraction to be used in estimating group-average gene expression magnitude for model fitting (0.5 would be median, 0 would turn off trimming)
`linear.fit`	whether newer linear model fit with zero intercept should be used (T), or the log-fit model published originally (F)
`local.theta.fit`	whether local theta fitting should be used (only available for the linear fit models)
`theta.fit.range`	allowed range of the theta values
`alpha.weight.power`	1/theta weight power used in fitting theta dependency on the expression magnitude

Value

a data frame with parameters of the fit error models (rows- cells, columns- fitted parameters)

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)

Make the PAGODA app

Description

Create an interactive user interface to explore output of PAGODA.

Usage

make.pagoda.app(tamr, tam, varinfo, env, pwpca, clpca = NULL,
  col.cols = NULL, cell.clustering = NULL, row.clustering = NULL,
  title = "pathway clustering", zlim = c(-1, 1) * quantile(tamr$xv, p =
  0.95))
make.pagoda.app(tamr, tam, varinfo, env, pwpca, clpca = NULL,
  col.cols = NULL, cell.clustering = NULL, row.clustering = NULL,
  title = "pathway clustering", zlim = c(-1, 1) * quantile(tamr$xv, p =
  0.95))

Arguments

`tamr`	Combined pathways that show similar expression patterns. Output of `pagoda.reduce.redundancy`
`tam`	Combined pathways that are driven by the same gene sets. Output of `pagoda.reduce.loading.redundancy`
`varinfo`	Variance information. Output of `pagoda.varnorm`
`env`	Gene sets as an environment variable.
`pwpca`	Weighted PC magnitudes for each gene set provided in the `env`. Output of `pagoda.pathway.wPCA`
`clpca`	Weighted PC magnitudes for de novo gene sets identified by clustering on expression. Output of `pagoda.gene.clusters`
`col.cols`	Matrix of column colors. Useful for visualizing cell annotations such as batch labels. Default NULL.
`cell.clustering`	Dendrogram of cell clustering. Output of `pagoda.cluster.cells` . Default NULL.
`row.clustering`	Dendrogram of combined pathways clustering. Default NULL.
`title`	Title text to be used in the browser label for the app. Default, set as 'pathway clustering'
`zlim`	Range of the normalized gene expression levels, inputted as a list: c(lower_bound, upper_bound). Values outside this range will be Winsorized. Useful for increasing the contrast of the heatmap visualizations. Default, set to the 5th and 95th percentiles.

Value

PAGODA app

Sample error model

Description

SCDE error model generated from a subset of Saiful et al. 2011 dataset containing first 20 ES and 20 MEF cells.

References

http://www.ncbi.nlm.nih.gov/pubmed/21543516

Determine optimal cell clustering based on the genes driving the significant aspects

Description

Determines cell clustering (hclust result) based on a weighted correlation of genes underlying the top aspects of transcriptional heterogeneity. Branch orientation is optimized if 'cba' package is installed.

Usage

pagoda.cluster.cells(tam, varinfo, method = "ward.D",
  include.aspects = FALSE, verbose = 0, return.details = FALSE)
pagoda.cluster.cells(tam, varinfo, method = "ward.D",
  include.aspects = FALSE, verbose = 0, return.details = FALSE)

Arguments

`tam`	result of pagoda.top.aspects() call
`varinfo`	result of pagoda.varnorm() call
`method`	clustering method ('ward.D' by default)
`include.aspects`	whether the aspect patterns themselves should be included alongside with the individual genes in calculating cell distance
`verbose`	0 or 1 depending on level of desired verbosity
`return.details`	Boolean of whether to return just the hclust result or a list containing the hclust result plus the distance matrix and gene values

Value

hclust result

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only
hc <- pagoda.cluster.cells(tam, varinfo)
plot(hc)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only
hc <- pagoda.cluster.cells(tam, varinfo)
plot(hc)

Estimate effective number of cells based on lambda1 of random gene sets

Description

Examines the dependency between the amount of variance explained by the first principal component of a gene set and the number of genes in a gene set to determine the effective number of cells for the Tracy-Widom distribution

Usage

pagoda.effective.cells(pwpca, start = NULL)
pagoda.effective.cells(pwpca, start = NULL)

Arguments

`pwpca`	result of the pagoda.pathway.wPCA() call with n.randomizations > 1
`start`	optional starting value for the optimization (if the NLS breaks, trying high starting values usually fixed the local gradient problem)

Value

effective number of cells

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
pagoda.effective.cells(pwpca)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
pagoda.effective.cells(pwpca)

Determine de-novo gene clusters and associated overdispersion info

Description

Determine de-novo gene clusters, their weighted PCA lambda1 values, and random matrix expectation.

Usage

pagoda.gene.clusters(varinfo, trim = 3.1/ncol(varinfo$mat),
  n.clusters = 150, n.samples = 60, cor.method = "p",
  n.internal.shuffles = 0, n.starts = 10, n.cores = detectCores(),
  verbose = 0, plot = FALSE, show.random = FALSE, n.components = 1,
  method = "ward.D", secondary.correlation = FALSE,
  n.cells = ncol(varinfo$mat), old.results = NULL)
pagoda.gene.clusters(varinfo, trim = 3.1/ncol(varinfo$mat),
  n.clusters = 150, n.samples = 60, cor.method = "p",
  n.internal.shuffles = 0, n.starts = 10, n.cores = detectCores(),
  verbose = 0, plot = FALSE, show.random = FALSE, n.components = 1,
  method = "ward.D", secondary.correlation = FALSE,
  n.cells = ncol(varinfo$mat), old.results = NULL)

Arguments

`varinfo`	varinfo adjusted variance info from pagoda.varinfo() (or pagoda.subtract.aspect())
`trim`	additional Winsorization trim value to be used in determining clusters (to remove clusters that group outliers occurring in a given cell). Use higher values (5-15) if the resulting clusters group outlier patterns
`n.clusters`	number of clusters to be determined (recommended range is 100-200)
`n.samples`	number of randomly generated matrix samples to test the background distribution of lambda1 on
`cor.method`	correlation method ("pearson", "spearman") to be used as a distance measure for clustering
`n.internal.shuffles`	number of internal shuffles to perform (only if interested in set coherence, which is quite high for clusters by definition, disabled by default; set to 10-30 shuffles to estimate)
`n.starts`	number of wPCA EM algorithm starts at each iteration
`n.cores`	number of cores to use
`verbose`	verbosity level
`plot`	whether a plot showing distribution of random lambda1 values should be shown (along with the extreme value distribution fit)
`show.random`	whether the empirical random gene set values should be shown in addition to the Tracy-Widom analytical approximation
`n.components`	number of PC to calculate (can be increased if the number of clusters is small and some contain strong secondary patterns - rarely the case)
`method`	clustering method to be used in determining gene clusters
`secondary.correlation`	whether clustering should be performed on the correlation of the correlation matrix instead
`n.cells`	number of cells to use for the randomly generated cluster lambda1 model
`old.results`	optionally, pass old results just to plot the model without recalculating the stats

Value

a list containing the following fields:

clusters a list of genes in each cluster values
xf extreme value distribution fit for the standardized lambda1 of a randomly generated pattern
tci index of a top cluster in each random iteration
cl.goc weighted PCA info for each real gene cluster
varm standardized lambda1 values for each randomly generated matrix cluster
clvlm a linear model describing dependency of the cluster lambda1 on a Tracy-Widom lambda1 expectation

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
clpca <- pagoda.gene.clusters(varinfo, trim=7.1/ncol(varinfo$mat), n.clusters=150, n.cores=10, plot=FALSE)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
clpca <- pagoda.gene.clusters(varinfo, trim=7.1/ncol(varinfo$mat), n.clusters=150, n.cores=10, plot=FALSE)

Run weighted PCA analysis on pre-annotated gene sets

Description

For each valid gene set (having appropriate number of genes) in the provided environment (setenv), the method will run weighted PCA analysis, along with analogous analyses of random gene sets of the same size, or shuffled expression magnitudes for the same gene set.

Usage

pagoda.pathway.wPCA(varinfo, setenv, n.components = 2,
  n.cores = detectCores(), min.pathway.size = 10, max.pathway.size = 1000,
  n.randomizations = 10, n.internal.shuffles = 0, n.starts = 10,
  center = TRUE, batch.center = TRUE, proper.gene.names = NULL,
  verbose = 0)
pagoda.pathway.wPCA(varinfo, setenv, n.components = 2,
  n.cores = detectCores(), min.pathway.size = 10, max.pathway.size = 1000,
  n.randomizations = 10, n.internal.shuffles = 0, n.starts = 10,
  center = TRUE, batch.center = TRUE, proper.gene.names = NULL,
  verbose = 0)

Arguments

`varinfo`	adjusted variance info from pagoda.varinfo() (or pagoda.subtract.aspect())
`setenv`	environment listing gene sets (contains variables with names corresponding to gene set name, and values being vectors of gene names within each gene set)
`n.components`	number of principal components to determine for each gene set
`n.cores`	number of cores to use
`min.pathway.size`	minimum number of observed genes that should be contained in a valid gene set
`max.pathway.size`	maximum number of observed genes in a valid gene set
`n.randomizations`	number of random gene sets (of the same size) to be evaluated in parallel with each gene set (can be kept at 5 or 10, but should be increased to 50-100 if the significance of pathway overdispersion will be determined relative to random gene set models)
`n.internal.shuffles`	number of internal (independent row shuffles) randomizations of expression data that should be evaluated for each gene set (needed only if one is interested in gene set coherence P values, disabled by default; set to 10-30 to estimate)
`n.starts`	number of random starts for the EM method in each evaluation
`center`	whether the expression matrix should be recentered
`batch.center`	whether batch-specific centering should be used
`proper.gene.names`	alternative vector of gene names (replacing rownames(varinfo$mat)) to be used in cases when the provided setenv uses different gene names
`verbose`	verbosity level

Value

a list of weighted PCA info for each valid gene set

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# create go environment
library(org.Hs.eg.db)
# translate gene names to ids
ids <- unlist(lapply(mget(rownames(cd), org.Hs.egALIAS2EG, ifnotfound = NA), function(x) x[1]))
rids <- names(ids); names(rids) <- ids
go.env <- lapply(mget(ls(org.Hs.egGO2ALLEGS), org.Hs.egGO2ALLEGS), function(x) as.character(na.omit(rids[x])))
# clean GOs
go.env <- clean.gos(go.env)
# convert to an environment
go.env <- list2env(go.env)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# create go environment
library(org.Hs.eg.db)
# translate gene names to ids
ids <- unlist(lapply(mget(rownames(cd), org.Hs.egALIAS2EG, ifnotfound = NA), function(x) x[1]))
rids <- names(ids); names(rids) <- ids
go.env <- lapply(mget(ls(org.Hs.egGO2ALLEGS), org.Hs.egGO2ALLEGS), function(x) as.character(na.omit(rids[x])))
# clean GOs
go.env <- clean.gos(go.env)
# convert to an environment
go.env <- list2env(go.env)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)

Collapse aspects driven by the same combinations of genes

Description

Examines PC loading vectors underlying the identified aspects and clusters aspects based on a product of loading and score correlation (raised to corr.power). Clusters of aspects driven by the same genes are determined based on the distance.threshold and collapsed.

Usage

pagoda.reduce.loading.redundancy(tam, pwpca, clpca = NULL, plot = FALSE,
  cluster.method = "complete", distance.threshold = 0.01, corr.power = 4,
  n.cores = detectCores(), abs = TRUE, ...)
pagoda.reduce.loading.redundancy(tam, pwpca, clpca = NULL, plot = FALSE,
  cluster.method = "complete", distance.threshold = 0.01, corr.power = 4,
  n.cores = detectCores(), abs = TRUE, ...)

Arguments

`tam`	output of pagoda.top.aspects()
`pwpca`	output of pagoda.pathway.wPCA()
`clpca`	output of pagoda.gene.clusters() (optional)
`plot`	whether to plot the resulting clustering
`cluster.method`	one of the standard clustering methods to be used (fastcluster::hclust is used if available or stats::hclust)
`distance.threshold`	similarity threshold for grouping interdependent aspects
`corr.power`	power to which the product of loading and score correlation is raised
`n.cores`	number of cores to use during processing
`abs`	Boolean of whether to use absolute correlation
`...`	additional arguments are passed to the pagoda.view.aspects() method during plotting

Value

a list structure analogous to that returned by pagoda.top.aspects(), but with addition of a $cnam element containing a list of aspects summarized by each row of the new (reduced) $xv and $xvw

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only
tamr <- pagoda.reduce.loading.redundancy(tam, pwpca)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only
tamr <- pagoda.reduce.loading.redundancy(tam, pwpca)

Collapse aspects driven by similar patterns (i.e. separate the same sets of cells)

Description

Examines PC loading vectors underlying the identified aspects and clusters aspects based on score correlation. Clusters of aspects driven by the same patterns are determined based on the distance.threshold.

Usage

pagoda.reduce.redundancy(tamr, distance.threshold = 0.2,
  cluster.method = "complete", distance = NULL,
  weighted.correlation = TRUE, plot = FALSE, top = Inf, trim = 0,
  abs = FALSE, ...)
pagoda.reduce.redundancy(tamr, distance.threshold = 0.2,
  cluster.method = "complete", distance = NULL,
  weighted.correlation = TRUE, plot = FALSE, top = Inf, trim = 0,
  abs = FALSE, ...)

Arguments

`tamr`	output of pagoda.reduce.loading.redundancy()
`distance.threshold`	similarity threshold for grouping interdependent aspects
`cluster.method`	one of the standard clustering methods to be used (fastcluster::hclust is used if available or stats::hclust)
`distance`	distance matrix
`weighted.correlation`	Boolean of whether to use a weighted correlation in determining the similarity of patterns
`plot`	Boolean of whether to show plot
`top`	Restrict output to the top n aspects of heterogeneity
`trim`	Winsorization trim to use prior to determining the top aspects
`abs`	Boolean of whether to use absolute correlation
`...`	additional arguments are passed to the pagoda.view.aspects() method during plotting

Value

a list structure analogous to that returned by pagoda.top.aspects(), but with addition of a $cnam element containing a list of aspects summarized by each row of the new (reduced) $xv and $xvw

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only
tamr <- pagoda.reduce.loading.redundancy(tam, pwpca)
tamr2 <- pagoda.reduce.redundancy(tamr, distance.threshold = 0.9, plot = TRUE, labRow = NA, labCol = NA, box = TRUE, margins = c(0.5, 0.5), trim = 0)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only
tamr <- pagoda.reduce.loading.redundancy(tam, pwpca)
tamr2 <- pagoda.reduce.redundancy(tamr, distance.threshold = 0.9, plot = TRUE, labRow = NA, labCol = NA, box = TRUE, margins = c(0.5, 0.5), trim = 0)

View pathway or gene weighted PCA

Description

Takes in a list of pathways (or a list of genes), runs weighted PCA, optionally showing the result.

Usage

pagoda.show.pathways(pathways, varinfo, goenv = NULL, n.genes = 20,
  two.sided = FALSE, n.pc = rep(1, length(pathways)), colcols = NULL,
  zlim = NULL, showRowLabels = FALSE, cexCol = 1, cexRow = 1,
  nstarts = 10, cell.clustering = NULL, show.cell.dendrogram = TRUE,
  plot = TRUE, box = TRUE, trim = 0, return.details = FALSE, ...)
pagoda.show.pathways(pathways, varinfo, goenv = NULL, n.genes = 20,
  two.sided = FALSE, n.pc = rep(1, length(pathways)), colcols = NULL,
  zlim = NULL, showRowLabels = FALSE, cexCol = 1, cexRow = 1,
  nstarts = 10, cell.clustering = NULL, show.cell.dendrogram = TRUE,
  plot = TRUE, box = TRUE, trim = 0, return.details = FALSE, ...)

Arguments

`pathways`	character vector of pathway or gene names
`varinfo`	output of pagoda.varnorm()
`goenv`	environment mapping pathways to genes
`n.genes`	number of genes to show
`two.sided`	whether the set of shown genes should be split among highest and lowest loading (T) or if genes with highest absolute loading (F) should be shown
`n.pc`	optional integer vector giving the number of principal component to show for each listed pathway
`colcols`	optional column color matrix
`zlim`	optional z color limit
`showRowLabels`	controls whether row labels are shown in the plot
`cexCol`	column label size (cex)
`cexRow`	row label size (cex)
`nstarts`	number of random starts for the wPCA
`cell.clustering`	cell clustering
`show.cell.dendrogram`	whether cell dendrogram should be shown
`plot`	whether the plot should be shown
`box`	whether to draw a box around the plotted matrix
`trim`	optional Winsorization trim that should be applied
`return.details`	whether the function should return the matrix as well as full PCA info instead of just PC1 vector
`...`	additional arguments are passed to the `c.view.pathways`

Value

cell scores along the first principal component of shown genes (returned as invisible)

Control for a particular aspect of expression heterogeneity in a given population

Description

Similar to subtracting n-th principal component, the current procedure determines (weighted) projection of the expression matrix onto a specified aspect (some pattern across cells, for instance sequencing depth, or PC corresponding to an undesired process such as ribosomal pathway variation) and subtracts it from the data so that it is controlled for in the subsequent weighted PCA analysis.

Usage

pagoda.subtract.aspect(varinfo, aspect, center = TRUE)
pagoda.subtract.aspect(varinfo, aspect, center = TRUE)

Arguments

`varinfo`	normalized variance info (from pagoda.varnorm())
`aspect`	a vector giving a cell-to-cell variation pattern that should be controlled for (length should be corresponding to ncol(varinfo$mat))
`center`	whether the matrix should be re-centered following pattern subtraction

Value

a modified varinfo object with adjusted expression matrix (varinfo$mat)

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# create go environment
library(org.Hs.eg.db)
# translate gene names to ids
ids <- unlist(lapply(mget(rownames(cd), org.Hs.egALIAS2EG, ifnotfound = NA), function(x) x[1]))
rids <- names(ids); names(rids) <- ids
go.env <- lapply(mget(ls(org.Hs.egGO2ALLEGS), org.Hs.egGO2ALLEGS), function(x) as.character(na.omit(rids[x])))
# clean GOs
go.env <- clean.gos(go.env)
# convert to an environment
go.env <- list2env(go.env)
# subtract the pattern
cc.pattern <- pagoda.show.pathways(ls(go.env)[1:2], varinfo, go.env, show.cell.dendrogram = TRUE, showRowLabels = TRUE)  # Look at pattern from 2 GO annotations
varinfo.cc <- pagoda.subtract.aspect(varinfo, cc.pattern)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# create go environment
library(org.Hs.eg.db)
# translate gene names to ids
ids <- unlist(lapply(mget(rownames(cd), org.Hs.egALIAS2EG, ifnotfound = NA), function(x) x[1]))
rids <- names(ids); names(rids) <- ids
go.env <- lapply(mget(ls(org.Hs.egGO2ALLEGS), org.Hs.egGO2ALLEGS), function(x) as.character(na.omit(rids[x])))
# clean GOs
go.env <- clean.gos(go.env)
# convert to an environment
go.env <- list2env(go.env)
# subtract the pattern
cc.pattern <- pagoda.show.pathways(ls(go.env)[1:2], varinfo, go.env, show.cell.dendrogram = TRUE, showRowLabels = TRUE)  # Look at pattern from 2 GO annotations
varinfo.cc <- pagoda.subtract.aspect(varinfo, cc.pattern)

Score statistical significance of gene set and cluster overdispersion

Description

Evaluates statistical significance of the gene set and cluster lambda1 values, returning either a text table of Z scores, etc, a structure containing normalized values of significant aspects, or a set of genes underlying the significant aspects.

Usage

pagoda.top.aspects(pwpca, clpca = NULL, n.cells = NULL,
  z.score = qnorm(0.05/2, lower.tail = FALSE), return.table = FALSE,
  return.genes = FALSE, plot = FALSE, adjust.scores = TRUE,
  score.alpha = 0.05, use.oe.scale = FALSE, effective.cells.start = NULL)
pagoda.top.aspects(pwpca, clpca = NULL, n.cells = NULL,
  z.score = qnorm(0.05/2, lower.tail = FALSE), return.table = FALSE,
  return.genes = FALSE, plot = FALSE, adjust.scores = TRUE,
  score.alpha = 0.05, use.oe.scale = FALSE, effective.cells.start = NULL)

Arguments

`pwpca`	output of pagoda.pathway.wPCA()
`clpca`	output of pagoda.gene.clusters() (optional)
`n.cells`	effective number of cells (if not provided, will be determined using pagoda.effective.cells())
`z.score`	Z score to be used as a cutoff for statistically significant patterns (defaults to 0.05 P-value
`return.table`	whether a text table showing
`return.genes`	whether a set of genes driving significant aspects should be returned
`plot`	whether to plot the cv/n vs. dataset size scatter showing significance models
`adjust.scores`	whether the normalization of the aspect patterns should be based on the adjusted Z scores - qnorm(0.05/2, lower.tail = FALSE)
`score.alpha`	significance level of the confidence interval for determining upper/lower bounds
`use.oe.scale`	whether the variance of the returned aspect patterns should be normalized using observed/expected value instead of the default chi-squared derived variance corresponding to overdispersion Z score
`effective.cells.start`	starting value for the pagoda.effective.cells() call

Value

if return.table = FALSE and return.genes = FALSE (default) returns a list structure containing the following items:

xv a matrix of normalized aspect patterns (rows- significant aspects, columns- cells
xvw corresponding weight matrix
gw set of genes driving the significant aspects
df text table with the significance testing results

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only

Normalize gene expression variance relative to transcriptome-wide expectations

Description

Normalizes gene expression magnitudes to ensure that the variance follows chi-squared statistics with respect to its ratio to the transcriptome-wide expectation as determined by local regression on expression magnitude (and optionally gene length). Corrects for batch effects.

Usage

pagoda.varnorm(models, counts, batch = NULL, trim = 0, prior = NULL,
  fit.genes = NULL, plot = TRUE, minimize.underdispersion = FALSE,
  n.cores = detectCores(), n.randomizations = 100, weight.k = 0.9,
  verbose = 0, weight.df.power = 1, smooth.df = -1, max.adj.var = 10,
  theta.range = c(0.01, 100), gene.length = NULL)
pagoda.varnorm(models, counts, batch = NULL, trim = 0, prior = NULL,
  fit.genes = NULL, plot = TRUE, minimize.underdispersion = FALSE,
  n.cores = detectCores(), n.randomizations = 100, weight.k = 0.9,
  verbose = 0, weight.df.power = 1, smooth.df = -1, max.adj.var = 10,
  theta.range = c(0.01, 100), gene.length = NULL)

Arguments

`models`	model matrix (select a subset of rows to normalize variance within a subset of cells)
`counts`	read count matrix
`batch`	measurement batch (optional)
`trim`	trim value for Winsorization (optional, can be set to 1-3 to reduce the impact of outliers, can be as large as 5 or 10 for datasets with several thousand cells)
`prior`	expression magnitude prior
`fit.genes`	a vector of gene names which should be used to establish the variance fit (default is NULL: use all genes). This can be used to specify, for instance, a set spike-in control transcripts such as ERCC.
`plot`	whether to plot the results
`minimize.underdispersion`	whether underdispersion should be minimized (can increase sensitivity in datasets with high complexity of population, however cannot be effectively used in datasets where multiple batches are present)
`n.cores`	number of cores to use
`n.randomizations`	number of bootstrap sampling rounds to use in estimating average expression magnitude for each gene within the given set of cells
`weight.k`	k value to use in the final weight matrix
`verbose`	verbosity level
`weight.df.power`	power factor to use in determining effective number of degrees of freedom (can be increased for datasets exhibiting particularly high levels of noise at low expression magnitudes)
`smooth.df`	degrees of freedom to be used in calculating smoothed local regression between coefficient of variation and expression magnitude (and gene length, if provided). Leave at -1 for automated guess.
`max.adj.var`	maximum value allowed for the estimated adjusted variance (capping of adjusted variance is recommended when scoring pathway overdispersion relative to randomly sampled gene sets)
`theta.range`	valid theta range (should be the same as was set in knn.error.models() call
`gene.length`	optional vector of gene lengths (corresponding to the rows of counts matrix)

Value

a list containing the following fields:

mat adjusted expression magnitude values
matw weight matrix corresponding to the expression matrix
arv a vector giving adjusted variance values for each gene
avmodes a vector estimated average expression magnitudes for each gene
modes a list of batch-specific average expression magnitudes for each gene
prior estimated (or supplied) expression magnitude prior
edf estimated effective degrees of freedom
fit.genes fit.genes parameter

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)

View PAGODA output

Description

Create static image of PAGODA output visualizing cell hierarchy and top aspects of transcriptional heterogeneity

Usage

pagoda.view.aspects(tamr, row.clustering = hclust(dist(tamr$xv)), top = Inf,
  ...)
pagoda.view.aspects(tamr, row.clustering = hclust(dist(tamr$xv)), top = Inf,
  ...)

Arguments

`tamr`	Combined pathways that show similar expression patterns. Output of `pagoda.reduce.redundancy`
`row.clustering`	Dendrogram of combined pathways clustering
`top`	Restrict output to the top n aspects of heterogeneity
`...`	additional arguments are passed to the `view.aspects` method during plotting

Value

PAGODA heatmap

Examples

data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only
pagoda.view.aspects(tam)


data(pollen)
cd <- clean.counts(pollen)

knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components=1, n.cores=10, n.internal.shuffles=50)
tam <- pagoda.top.aspects(pwpca, return.table = TRUE, plot=FALSE, z.score=1.96)  # top aspects based on GO only
pagoda.view.aspects(tam)

wrapper around different mclapply mechanisms

Description

Abstracts out mclapply implementation, and defaults to lapply when only one core is requested (helps with debugging)

Usage

papply(..., n.cores = n)
papply(..., n.cores = n)

Arguments

`...`	parameters to pass to lapply, mclapply, bplapply, etc.
`n.cores`	number of cores. If 1 core is requested, will default to lapply

Sample data

Description

Single cell data from Pollen et al. 2014 dataset.

References

www.ncbi.nlm.nih.gov/pubmed/25086649

Single-cell Differential Expression (with Pathway And Gene set Overdispersion Analysis)

Description

The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify and characterize putative cell subpopulations based on transcriptional signatures. See vignette("diffexp") for a brief tutorial on differential expression analysis. See vignette("pagoda") for a brief tutorial on pathway and gene set overdispersion analysis to identify and characterize cell subpopulations. More extensive tutorials are available at http://pklab.med.harvard.edu/scde/index.html. (test)

Author(s)

Peter Kharchenko [email protected]

Jean Fan [email protected]

View differential expression results in a browser

Description

Launches a browser app that shows the differential expression results, allowing to sort, filter, etc. The arguments generally correspond to the scde.expression.difference() call, except that the results of that call are also passed here. Requires Rook and rjson packages to be installed.

Usage

scde.browse.diffexp(results, models, counts, prior, groups = NULL,
  batch = NULL, geneLookupURL = NULL, server = NULL, name = "scde",
  port = NULL)
scde.browse.diffexp(results, models, counts, prior, groups = NULL,
  batch = NULL, geneLookupURL = NULL, server = NULL, name = "scde",
  port = NULL)

Arguments

`results`	result object returned by `scde.expression.difference()`. Note to browse group posterior levels, use `return.posteriors = TRUE` in the `scde.expression.difference()` call.
`models`	model matrix
`counts`	count matrix
`prior`	prior
`groups`	group information
`batch`	batch information
`geneLookupURL`	The URL that will be used to construct links to view more information on gene names. By default (if can't guess the organism) the links will forward to ENSEMBL site search, using `geneLookupURL = "http://useast.ensembl.org/Multi/Search/Results?q = {0}"`. The "0" in the end will be substituted with the gene name. For instance, to link to GeneCards, use `"http://www.genecards.org/cgi-bin/carddisp.pl?gene = {0}"`.
`server`	optional previously returned instance of the server, if want to reuse it.
`name`	app name (needs to be altered only if adding more than one app to the server using `server` parameter)
`port`	Interactive browser port

Value

server instance, on which $stop() function can be called to kill the process.

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)

o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# make sure groups corresponds to the models (o.ifm)
groups <- factor(gsub("(MEF|ESC).*", "\\1", rownames(o.ifm)), levels = c("ESC", "MEF"))
names(groups) <- row.names(o.ifm)
ediff <- scde.expression.difference(o.ifm, cd, o.prior, groups = groups, n.randomizations = 100, n.cores = 10, verbose = 1)
scde.browse.diffexp(ediff, o.ifm, cd, o.prior, groups = groups, geneLookupURL="http://www.informatics.jax.org/searchtool/Search.do?query={0}")  # creates browser


data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)

o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# make sure groups corresponds to the models (o.ifm)
groups <- factor(gsub("(MEF|ESC).*", "\\1", rownames(o.ifm)), levels = c("ESC", "MEF"))
names(groups) <- row.names(o.ifm)
ediff <- scde.expression.difference(o.ifm, cd, o.prior, groups = groups, n.randomizations = 100, n.cores = 10, verbose = 1)
scde.browse.diffexp(ediff, o.ifm, cd, o.prior, groups = groups, geneLookupURL="http://www.informatics.jax.org/searchtool/Search.do?query={0}")  # creates browser

Internal model data

Description

Numerically-derived correction for NB->chi squared approximation stored as an local regression model

Fit single-cell error/regression models

Description

Fit error models given a set of single-cell data (counts) and an optional grouping factor (groups). The cells (within each group) are first cross-compared to determine a subset of genes showing consistent expression. The set of genes is then used to fit a mixture model (Poisson-NB mixture, with expression-dependent concomitant).

Usage

scde.error.models(counts, groups = NULL, min.nonfailed = 3,
  threshold.segmentation = TRUE, min.count.threshold = 4,
  zero.count.threshold = min.count.threshold, zero.lambda = 0.1,
  save.crossfit.plots = FALSE, save.model.plots = TRUE, n.cores = 12,
  min.size.entries = 2000, max.pairs = 5000, min.pairs.per.cell = 10,
  verbose = 0, linear.fit = TRUE, local.theta.fit = linear.fit,
  theta.fit.range = c(0.01, 100))
scde.error.models(counts, groups = NULL, min.nonfailed = 3,
  threshold.segmentation = TRUE, min.count.threshold = 4,
  zero.count.threshold = min.count.threshold, zero.lambda = 0.1,
  save.crossfit.plots = FALSE, save.model.plots = TRUE, n.cores = 12,
  min.size.entries = 2000, max.pairs = 5000, min.pairs.per.cell = 10,
  verbose = 0, linear.fit = TRUE, local.theta.fit = linear.fit,
  theta.fit.range = c(0.01, 100))

Arguments

`counts`	read count matrix. The rows correspond to genes (should be named), columns correspond to individual cells. The matrix should contain integer counts
`groups`	an optional factor describing grouping of different cells. If provided, the cross-fits and the expected expression magnitudes will be determined separately within each group. The factor should have the same length as ncol(counts).
`min.nonfailed`	minimal number of non-failed observations required for a gene to be used in the final model fitting
`threshold.segmentation`	use a fast threshold-based segmentation during cross-fit (default: TRUE)
`min.count.threshold`	the number of reads to use to guess which genes may have "failed" to be detected in a given measurement during cross-cell comparison (default: 4)
`zero.count.threshold`	threshold to guess the initial value (failed/non-failed) during error model fitting procedure (defaults to the min.count.threshold value)
`zero.lambda`	the rate of the Poisson (failure) component (default: 0.1)
`save.crossfit.plots`	whether png files showing cross-fit segmentations should be written out (default: FALSE)
`save.model.plots`	whether pdf files showing model fits should be written out (default = TRUE)
`n.cores`	number of cores to use
`min.size.entries`	minimum number of genes to use when determining expected expression magnitude during model fitting
`max.pairs`	maximum number of cross-fit comparisons that should be performed per group (default: 5000)
`min.pairs.per.cell`	minimum number of pairs that each cell should be cross-compared with
`verbose`	1 for increased output
`linear.fit`	Boolean of whether to use a linear fit in the regression (default: TRUE).
`local.theta.fit`	Boolean of whether to fit the overdispersion parameter theta, ie. the negative binomial size parameter, based on local regression (default: set to be equal to the linear.fit parameter)
`theta.fit.range`	Range of valid values for the overdispersion parameter theta, ie. the negative binomial size parameter (default: c(1e-2, 1e2))

Details

Note: the default implementation has been changed to use linear-scale fit with expression-dependent NB size (overdispersion) fit. This represents an interative improvement on the originally published model. Use linear.fit=F to revert back to the original fitting procedure.

Value

a model matrix, with rows corresponding to different cells, and columns representing different parameters of the determined models

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)

o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)


data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)

o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)

Test for expression differences between two sets of cells

Description

Use the individual cell error models to test for differential expression between two groups of cells.

Usage

scde.expression.difference(models, counts, prior, groups = NULL,
  batch = NULL, n.randomizations = 150, n.cores = 10,
  batch.models = models, return.posteriors = FALSE, verbose = 0)
scde.expression.difference(models, counts, prior, groups = NULL,
  batch = NULL, n.randomizations = 150, n.cores = 10,
  batch.models = models, return.posteriors = FALSE, verbose = 0)

Arguments

`models`	models determined by `scde.error.models`
`counts`	read count matrix
`prior`	gene expression prior as determined by `scde.expression.prior`
`groups`	a factor determining the two groups of cells being compared. The factor entries should correspond to the rows of the model matrix. The factor should have two levels. NAs are allowed (cells will be omitted from comparison).
`batch`	a factor (corresponding to rows of the model matrix) specifying batch assignment of each cell, to perform batch correction
`n.randomizations`	number of bootstrap randomizations to be performed
`n.cores`	number of cores to utilize
`batch.models`	(optional) separate models for the batch data (if generated using batch-specific group argument). Normally the same models are used.
`return.posteriors`	whether joint posterior matrices should be returned
`verbose`	integer verbose level (1 for verbose)

Value

default

a data frame with the following fields:

lb, mle, ub lower bound, maximum likelihood estimate, and upper bound of the 95

ce conservative estimate of expression-fold change (equals to the min(abs(c(lb, ub))), or 0 if the CI crosses the 0

Z uncorrected Z-score of expression difference

cZ expression difference Z-score corrected for multiple hypothesis testing using Holm procedure If batch correction has been performed (batch has been supplied), analogous data frames are returned in slots $batch.adjusted for batch-corrected results, and $batch.effect for the differences explained by batch effects alone.

return.posteriors = TRUE

A list is returned, with the default results data frame given in the $results slot. difference.posterior returns a matrix of estimated expression difference posteriors (rows - genes, columns correspond to different magnitudes of fold-change - log2 values are given in the column names) joint.posteriors a list of two joint posterior matrices (rows - genes, columns correspond to the expression levels, given by prior$x grid)

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)

o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# make sure groups corresponds to the models (o.ifm)
groups <- factor(gsub("(MEF|ESC).*", "\\1", rownames(o.ifm)), levels = c("ESC", "MEF"))
names(groups) <- row.names(o.ifm)
ediff <- scde.expression.difference(o.ifm, cd, o.prior, groups = groups, n.randomizations = 100, n.cores = n.cores, verbose = 1)


data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)

o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# make sure groups corresponds to the models (o.ifm)
groups <- factor(gsub("(MEF|ESC).*", "\\1", rownames(o.ifm)), levels = c("ESC", "MEF"))
names(groups) <- row.names(o.ifm)
ediff <- scde.expression.difference(o.ifm, cd, o.prior, groups = groups, n.randomizations = 100, n.cores = n.cores, verbose = 1)

Return scaled expression magnitude estimates

Description

Return point estimates of expression magnitudes of each gene across a set of cells, based on the regression slopes determined during the model fitting procedure.

Usage

scde.expression.magnitude(models, counts)
scde.expression.magnitude(models, counts)

Arguments

`models`	models determined by `scde.error.models`
`counts`	count matrix

Value

a matrix of expression magnitudes on a log scale (rows - genes, columns - cells)

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
# get expression magnitude estimates
lfpm <- scde.expression.magnitude(o.ifm, cd)

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
# get expression magnitude estimates
lfpm <- scde.expression.magnitude(o.ifm, cd)

Estimate prior distribution for gene expression magnitudes

Description

Use existing count data to determine a prior distribution of genes in the dataset

Usage

scde.expression.prior(models, counts, length.out = 400, show.plot = FALSE,
  pseudo.count = 1, bw = 0.1, max.quantile = 1 - 0.001,
  max.value = NULL)
scde.expression.prior(models, counts, length.out = 400, show.plot = FALSE,
  pseudo.count = 1, bw = 0.1, max.quantile = 1 - 0.001,
  max.value = NULL)

Arguments

`models`	models determined by `scde.error.models`
`counts`	count matrix
`length.out`	number of points (resolution) of the expression magnitude grid (default: 400). Note: larger numbers will linearly increase memory/CPU demands.
`show.plot`	show the estimate posterior
`pseudo.count`	pseudo-count value to use (default 1)
`bw`	smoothing bandwidth to use in estimating the prior (default: 0.1)
`max.quantile`	determine the maximum expression magnitude based on a quantile (default : 0.999)
`max.value`	alternatively, specify the exact maximum expression magnitude value

Value

a structure describing expression magnitude grid ($x, on log10 scale) and prior ($y)

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)

Calculate drop-out probabilities given a set of counts or expression magnitudes

Description

Returns estimated drop-out probability for each cell (row of models matrix), given either an expression magnitude

Usage

scde.failure.probability(models, magnitudes = NULL, counts = NULL)
scde.failure.probability(models, magnitudes = NULL, counts = NULL)

Arguments

`models`	models determined by `scde.error.models`
`magnitudes`	a vector (`length(counts) == nrows(models)`) or a matrix (columns correspond to cells) of expression magnitudes, given on a log scale
`counts`	a vector (`length(counts) == nrows(models)`) or a matrix (columns correspond to cells) of read counts from which the expression magnitude should be estimated

Value

a vector or a matrix of drop-out probabilities

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# calculate probability of observing a drop out at a given set of magnitudes in different cells
mags <- c(1.0, 1.5, 2.0)
p <- scde.failure.probability(o.ifm, magnitudes = mags)
# calculate probability of observing the dropout at a magnitude corresponding to the
# number of reads actually observed in each cell
self.p <- scde.failure.probability(o.ifm, counts = cd)

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# calculate probability of observing a drop out at a given set of magnitudes in different cells
mags <- c(1.0, 1.5, 2.0)
p <- scde.failure.probability(o.ifm, magnitudes = mags)
# calculate probability of observing the dropout at a magnitude corresponding to the
# number of reads actually observed in each cell
self.p <- scde.failure.probability(o.ifm, counts = cd)

Fit scde models relative to provided set of expression magnitudes

Description

If group-average expression magnitudes are available (e.g. from bulk measurement), this method can be used to fit individual cell error models relative to that reference

Usage

scde.fit.models.to.reference(counts, reference, n.cores = 10,
  zero.count.threshold = 1, nrep = 1, save.plots = FALSE,
  plot.filename = "reference.model.fits.pdf", verbose = 0, min.fpm = 1)
scde.fit.models.to.reference(counts, reference, n.cores = 10,
  zero.count.threshold = 1, nrep = 1, save.plots = FALSE,
  plot.filename = "reference.model.fits.pdf", verbose = 0, min.fpm = 1)

Arguments

`counts`	count matrix
`reference`	a vector of expression magnitudes (read counts) corresponding to the rows of the count matrix
`n.cores`	number of cores to use
`zero.count.threshold`	read count to use as an initial guess for the zero threshold
`nrep`	number independent of mixture fit iterations to try (default = 1)
`save.plots`	whether to write out a pdf file showing the model fits
`plot.filename`	model fit pdf filename
`verbose`	verbose level
`min.fpm`	minimum reference fpm of genes that will be used to fit the models (defaults to 1). Note: fpm is calculated from the reference count vector as reference/sum(reference)*1e6

Value

matrix of scde models

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)

o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# calculate joint posteriors across all cells
jp <- scde.posteriors(models = o.ifm, cd, o.prior, n.cores = 10, return.individual.posterior.modes = TRUE, n.randomizations = 100)
# use expected expression magnitude for each gene
av.mag <- as.numeric(jp$jp %*% as.numeric(colnames(jp$jp)))
# translate into counts
av.mag.counts <- as.integer(round(av.mag))
# now, fit alternative models using av.mag as a reference (normally this would correspond to bulk RNA expression magnitude)
ref.models <- scde.fit.models.to.reference(cd, av.mag.counts, n.cores = 1)


data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)

o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# calculate joint posteriors across all cells
jp <- scde.posteriors(models = o.ifm, cd, o.prior, n.cores = 10, return.individual.posterior.modes = TRUE, n.randomizations = 100)
# use expected expression magnitude for each gene
av.mag <- as.numeric(jp$jp %*% as.numeric(colnames(jp$jp)))
# translate into counts
av.mag.counts <- as.integer(round(av.mag))
# now, fit alternative models using av.mag as a reference (normally this would correspond to bulk RNA expression magnitude)
ref.models <- scde.fit.models.to.reference(cd, av.mag.counts, n.cores = 1)

Calculate joint expression magnitude posteriors across a set of cells

Description

Calculates expression magnitude posteriors for the individual cells, and then uses bootstrap resampling to calculate a joint expression posterior for all the specified cells. Alternatively during batch-effect correction procedure, the joint posterior can be calculated for a random composition of cells of different groups (see batch and composition parameters).

Usage

scde.posteriors(models, counts, prior, n.randomizations = 100, batch = NULL,
  composition = NULL, return.individual.posteriors = FALSE,
  return.individual.posterior.modes = FALSE, ensemble.posterior = FALSE,
  n.cores = 20)
scde.posteriors(models, counts, prior, n.randomizations = 100, batch = NULL,
  composition = NULL, return.individual.posteriors = FALSE,
  return.individual.posterior.modes = FALSE, ensemble.posterior = FALSE,
  n.cores = 20)

Arguments

`models`	models models determined by `scde.error.models`
`counts`	read count matrix
`prior`	gene expression prior as determined by `scde.expression.prior`
`n.randomizations`	number of bootstrap iterations to perform
`batch`	a factor describing which batch group each cell (i.e. each row of `models` matrix) belongs to
`composition`	a vector describing the batch composition of a group to be sampled
`return.individual.posteriors`	whether expression posteriors of each cell should be returned
`return.individual.posterior.modes`	whether modes of expression posteriors of each cell should be returned
`ensemble.posterior`	Boolean of whether to calculate the ensemble posterior (sum of individual posteriors) instead of a joint (product) posterior. (default: FALSE)
`n.cores`	number of cores to utilize

Value

default

a posterior probability matrix, with rows corresponding to genes, and columns to expression levels (as defined by prior$x)

return.individual.posterior.modes

a list is returned, with the $jp slot giving the joint posterior matrix, as described above. The $modes slot gives a matrix of individual expression posterior mode values on log scale (rows - genes, columns -cells)

return.individual.posteriors

a list is returned, with the $post slot giving a list of individual posterior matrices, in a form analogous to the joint posterior matrix, but reported on log scale

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# calculate joint posteriors
jp <- scde.posteriors(o.ifm, cd, o.prior, n.cores = 1)

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
# calculate joint posteriors
jp <- scde.posteriors(o.ifm, cd, o.prior, n.cores = 1)

Test differential expression and plot posteriors for a particular gene

Description

The function performs differential expression test and optionally plots posteriors for a specified gene.

Usage

scde.test.gene.expression.difference(gene, models, counts, prior,
  groups = NULL, batch = NULL, batch.models = models,
  n.randomizations = 1000, show.plots = TRUE, return.details = FALSE,
  verbose = FALSE, ratio.range = NULL, show.individual.posteriors = TRUE,
  n.cores = 1)
scde.test.gene.expression.difference(gene, models, counts, prior,
  groups = NULL, batch = NULL, batch.models = models,
  n.randomizations = 1000, show.plots = TRUE, return.details = FALSE,
  verbose = FALSE, ratio.range = NULL, show.individual.posteriors = TRUE,
  n.cores = 1)

Arguments

`gene`	name of the gene to be tested
`models`	models
`counts`	read count matrix (must contain the row corresponding to the specified gene)
`prior`	expression magnitude prior
`groups`	a two-level factor specifying between which cells (rows of the models matrix) the comparison should be made
`batch`	optional multi-level factor assigning the cells (rows of the model matrix) to different batches that should be controlled for (e.g. two or more biological replicates). The expression difference estimate will then take into account the likely difference between the two groups that is explained solely by their difference in batch composition. Not all batch configuration may be corrected this way.
`batch.models`	optional set of models for batch comparison (typically the same as models, but can be more extensive, or recalculated within each batch)
`n.randomizations`	number of bootstrap/sampling iterations that should be performed
`show.plots`	whether the plots should be shown
`return.details`	whether the posterior should be returned
`verbose`	set to T for some status output
`ratio.range`	optionally specifies the range of the log2 expression ratio plot
`show.individual.posteriors`	whether the individual cell expression posteriors should be plotted
`n.cores`	number of cores to use (default = 1)

Value

by default returns MLE of log2 expression difference, 95

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
scde.test.gene.expression.difference("Tdh", models = o.ifm, counts = cd, prior = o.prior)

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
data(o.ifm)  # Load precomputed model. Use ?scde.error.models to see how o.ifm was generated
o.prior <- scde.expression.prior(models = o.ifm, counts = cd, length.out = 400, show.plot = FALSE)
scde.test.gene.expression.difference("Tdh", models = o.ifm, counts = cd, prior = o.prior)

View PAGODA application

Description

Installs a given pagoda app (or any other rook app) into a server, optionally making a call to show it in the browser.

Usage

show.app(app, name, browse = TRUE, port = NULL, ip = "127.0.0.1",
  server = NULL)
show.app(app, name, browse = TRUE, port = NULL, ip = "127.0.0.1",
  server = NULL)

Arguments

`app`	pagoda app (output of make.pagoda.app()) or another rook app
`name`	URL path name for this app
`browse`	whether a call should be made for browser to show the app
`port`	optional port on which the server should be initiated
`ip`	IP on which the server should listen (typically localhost)
`server`	an (optional) Rook server instance (defaults to ___scde.server)

Value

Rook server instance

Examples


app <- make.pagoda.app(tamr2, tam, varinfo, go.env, pwpca, clpca, col.cols=col.cols, cell.clustering=hc, title="NPCs")
# show app in the browser (port 1468)
show.app(app, "pollen", browse = TRUE, port=1468)


app <- make.pagoda.app(tamr2, tam, varinfo, go.env, pwpca, clpca, col.cols=col.cols, cell.clustering=hc, title="NPCs")
# show app in the browser (port 1468)
show.app(app, "pollen", browse = TRUE, port=1468)

View heatmap

Description

Internal function to visualize aspects of transcriptional heterogeneity as a heatmap. Used by pagoda.view.aspects.

Usage

view.aspects(mat, row.clustering = NA, cell.clustering = NA, zlim = c(-1,
  1) * quantile(mat, p = 0.95), row.cols = NULL, col.cols = NULL,
  cols = colorRampPalette(c("darkgreen", "white", "darkorange"), space =
  "Lab")(1024), show.row.var.colors = TRUE, top = Inf, ...)
view.aspects(mat, row.clustering = NA, cell.clustering = NA, zlim = c(-1,
  1) * quantile(mat, p = 0.95), row.cols = NULL, col.cols = NULL,
  cols = colorRampPalette(c("darkgreen", "white", "darkorange"), space =
  "Lab")(1024), show.row.var.colors = TRUE, top = Inf, ...)

Arguments

`mat`	Numeric matrix
`row.clustering`	Row dendrogram
`cell.clustering`	Column dendrogram
`zlim`	Range of the normalized gene expression levels, inputted as a list: c(lower_bound, upper_bound). Values outside this range will be Winsorized. Useful for increasing the contrast of the heatmap visualizations. Default, set to the 5th and 95th percentiles.
`row.cols`	Matrix of row colors.
`col.cols`	Matrix of column colors. Useful for visualizing cell annotations such as batch labels.
`cols`	Heatmap colors
`show.row.var.colors`	Boolean of whether to show row variance as a color track
`top`	Restrict output to the top n aspects of heterogeneity
`...`	additional arguments for heatmap plotting

Value

A heatmap

A Reference Class to represent the PAGODA application

Description

This ROOK application class enables communication with the client-side ExtJS framework and Inchlib HTML5 canvas libraries to create the graphical user interface for PAGODA Refer to the code in make.pagoda.app for usage example

Fields

results: Output of the pathway clustering and redundancy reduction
genes: List of genes to display in the Detailed clustering panel
mat: Matrix of posterior mode count estimates
matw: Matrix of weights associated with each estimate in mat
goenv: Gene set list as an environment
renv: Global environment
name: Name of the application page; for display as the page title
trim: Trim quantity used for Winsorization for visualization
batch: Any batch or other known confounders to be included in the visualization as a column color track

Winsorize matrix

Description

Sets the ncol(mat)*trim top outliers in each row to the next lowest value same for the lowest outliers

Usage

winsorize.matrix(mat, trim)
winsorize.matrix(mat, trim)

Arguments

`mat`	matrix
`trim`	fraction of outliers (on each side) that should be Winsorized, or (if the value is >= 1) the number of outliers to be trimmed on each side

Value

Winsorized matrix

Examples

set.seed(0)
mat <- matrix( c(rnorm(5*10,mean=0,sd=1), rnorm(5*10,mean=5,sd=1)), 10, 10)  # random matrix
mat[1,1] <- 1000  # make outlier
range(mat)  # look at range of values
win.mat <- winsorize.matrix(mat, 0.1)
range(win.mat)  # note outliers removed

set.seed(0)
mat <- matrix( c(rnorm(5*10,mean=0,sd=1), rnorm(5*10,mean=5,sd=1)), 10, 10)  # random matrix
mat[1,1] <- 1000  # make outlier
range(mat)  # look at range of values
win.mat <- winsorize.matrix(mat, 0.1)
range(win.mat)  # note outliers removed

Package 'scde'

Help Index

Determine principal components of a matrix using per-observation/per-variable weights

Description

Usage

Arguments

Value

Examples

Filter counts matrix

Description

Usage

Arguments

Value

Examples

Filter GOs list

Description

Usage

Arguments

Value

Examples

Sample data

Description

References

Sample error model

Description

References

Build error models for heterogeneous cell populations, based on K-nearest neighbor cells.

Description

Usage

Arguments

Value

Examples

Make the PAGODA app

Description

Usage

Arguments

Value

Sample error model

Description

References

Determine optimal cell clustering based on the genes driving the significant aspects

Description

Usage

Arguments

Value

Examples

Estimate effective number of cells based on lambda1 of random gene sets

Description

Usage

Arguments

Value

Examples

Determine de-novo gene clusters and associated overdispersion info

Description

Usage

Arguments

Value

Examples

Run weighted PCA analysis on pre-annotated gene sets

Description

Usage

Arguments

Value

Examples

Collapse aspects driven by the same combinations of genes

Description

Usage

Arguments

Value

Examples

Collapse aspects driven by similar patterns (i.e. separate the same sets of cells)

Description

Usage

Arguments

Value

Examples

View pathway or gene weighted PCA

Description

Usage

Arguments