Title: | Cancer Clone Finder |
---|---|
Description: | A collection of tools for cancer genomic data clustering analyses, including those for single cell RNA-seq. Cell clustering and feature gene selection analysis employ Bayesian (and maximum likelihood) non-negative matrix factorization (NMF) algorithm. Input data set consists of RNA count matrix, gene, and cell bar code annotations. Analysis outputs are factor matrices for multiple ranks and marginal likelihood values for each rank. The package includes utilities for downstream analyses, including meta-gene identification, visualization, and construction of rank-based trees for clusters. |
Authors: | Jun Woo [aut, cre], Jinhua Wang [aut] |
Maintainer: | Jun Woo <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.27.0 |
Built: | 2024-12-14 04:23:57 UTC |
Source: | https://github.com/bioc/ccfindR |
Subsetting scNMFSet object
## S4 method for signature 'scNMFSet,ANY,ANY,ANY' x[i, j]
## S4 method for signature 'scNMFSet,ANY,ANY,ANY' x[i, j]
x |
Object to be subsetted |
i |
row index |
j |
column index |
Subsetted object
Computes GSEA enrichment score of marker sets in meta gene list
assignCelltype(obj, rank, gset, gene_names = NULL, p = 0, remove.na = FALSE, p.value = FALSE, nperm = 1000, progress.bar = TRUE, grp.prefix = c("IG"))
assignCelltype(obj, rank, gset, gene_names = NULL, p = 0, remove.na = FALSE, p.value = FALSE, nperm = 1000, progress.bar = TRUE, grp.prefix = c("IG"))
obj |
Object of class |
rank |
Rank to examine |
gset |
List of gene sets to be used as markers |
gene_names |
Names of genes to be used for meta-gene identification |
p |
Enrichment score exponent. |
remove.na |
Remove gene sets with no overlap |
p.value |
Estimatte p values using permutation |
nperm |
No. of permutation replicates |
progress.bar |
Display progress bar for p value computation |
grp.prefix |
Gene name prefix to search for with wildcard matches in query |
If obj
is of clas scNMFSet
, it computes meta gene list using
meta_gene.cv
. Otherwise, obj
is expected to be
a data frame of the same structure as the output of meta_gene.cv
;
the number of rows same as the total number of metagenes per cluster,
three columns per each cluster (gene name, meta-gene score, and coefficient of variation).
The argument gset
is a list of gene sets to be checked for enrichment in
each cluster meta gene list. The enrichment score is computed using
the GSEA algorithm (Subramanian et al. 2005).
Matrix of enrichment score statistics with cell types in rows and clusters in columns
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005). “Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles.” Proc. Natl. Acad. Sci., USA, 102(43), 15545–15550. doi:10.1073/pnas.0506580102.
dir <- system.file('extdata',package='ccfindR') pbmc <- read_10x(dir) pbmc <- vb_factorize(pbmc, ranks=5) meta <- meta_gene.cv(pbmc,rank=5, gene_names=rowData(pbmc)[,2]) markers <- list('B cell'=c('CD74','IG','HLA'), 'CD8+ T'=c('CD8A','CD8B','GZMK','CCR7','LTB'), 'CD4+ T'=c('CD3D','CD3E','IL7R','LEF1'), 'NK'=c('GNLY','NKG7','GZMA','GZMH'), 'Macrophage'=c('S100A8','S100A9','CD14','LYZ','CFD')) gsea <- assignCelltype(meta, rank=5, gset=markers, grp.prefix=c('IG','HLA')) gsea
dir <- system.file('extdata',package='ccfindR') pbmc <- read_10x(dir) pbmc <- vb_factorize(pbmc, ranks=5) meta <- meta_gene.cv(pbmc,rank=5, gene_names=rowData(pbmc)[,2]) markers <- list('B cell'=c('CD74','IG','HLA'), 'CD8+ T'=c('CD8A','CD8B','GZMK','CCR7','LTB'), 'CD4+ T'=c('CD3D','CD3E','IL7R','LEF1'), 'NK'=c('GNLY','NKG7','GZMA','GZMH'), 'Macrophage'=c('S100A8','S100A9','CD14','LYZ','CFD')) gsea <- assignCelltype(meta, rank=5, gset=markers, grp.prefix=c('IG','HLA')) gsea
Retrieve or set the basis matrices W
from factorization
in an object
basis(object)
basis(object)
object |
Object of class |
After factorization, basis matrices corresponding to each rank value are
stored as elements of a list, which is in slot basis
of object of
class scNMFSet
.
basis(object)
will return the list of matrices.
basis(object) <- value
can be used to modify it.
Either NULL
or a list of same length as
ranks(object)
, whose
elements are basis matrices derived from factorization
under each rank value.
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s,ranks=seq(2,4)) basis(s)[[1]]
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s,ranks=seq(2,4)) basis(s)[[1]]
Basis matrix accessor
## S4 method for signature 'scNMFSet' basis(object)
## S4 method for signature 'scNMFSet' basis(object)
object |
Object containing basis matrix |
List of basis matrices
Access and modify basis matrices
basis(object) <- value
basis(object) <- value
object |
Object of class |
value |
Basis matrix to be substituted |
Input object with updated basis matrices
set.seed(1) s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) basis(s)[[1]] <- apply(basis(s)[[1]],seq(1,2),round,digits=3) basis(s)
set.seed(1) s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) basis(s)[[1]] <- apply(basis(s)[[1]],seq(1,2),round,digits=3) basis(s)
Access and modify basis matrices
## S4 replacement method for signature 'scNMFSet' basis(object) <- value
## S4 replacement method for signature 'scNMFSet' basis(object) <- value
object |
Object of class |
value |
Basis matrix to be substituted |
Input object with updated basis matrices
set.seed(1) s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) basis(s)[[1]] <- apply(basis(s)[[1]],c(1,2),round,digits=3) basis(s)
set.seed(1) s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) basis(s)[[1]] <- apply(basis(s)[[1]],c(1,2),round,digits=3) basis(s)
Build tree connecting clusters at different ranks
build_tree(object, rmax)
build_tree(object, rmax)
object |
Object of class |
rmax |
Maximum rank at which tree branching stops |
List containing the tree structure
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) tree <- build_tree(s,rmax=5) tree
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) tree <- build_tree(s,rmax=5) tree
This package contains tools and utilities for cell-type discovery using single-cell transcriptomic data while evaluating significance of the depth of clustering (Woo et al. 2019).
Woo J, Winterhoff BJ, Starr TK, Aliferis C, Wang J (2019). “De novo prediction of cell-type complexity in single-cell RNA-seq and tumor microenvironments.” Life Sci. Alliance, 2, e201900443. http://dx.doi.org/10.26508/lsa.201900443.
Retrieve a coefficient matrix H
derived from factorization by rank value
and generate heatmap of its elements.
cell_map(object, rank, main = "Cells", ...)
cell_map(object, rank, main = "Cells", ...)
object |
Object of class |
rank |
Rank value for which the cell map is to be displayed.
The object must contain the corresponding slot: one element of
|
main |
Title of plot. |
... |
NULL
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(100) s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(100)) s <- vb_factorize(s,ranks=seq(2,5)) plot(s) cell_map(s, rank=3)
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(100) s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(100)) s <- vb_factorize(s,ranks=seq(2,5)) plot(s) cell_map(s, rank=3)
Use factorization results in an object to assign cells into clusters.
cluster_id(object, rank = 2)
cluster_id(object, rank = 2)
object |
Object of class |
rank |
Rank value whose factor matrices are to be used for assignment. |
Vector of length equal to the number of cells containing cluster ID numbers of each cell.
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(count=x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) cid <- cluster_id(s, rank=5) table(cid)
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(count=x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) cid <- cluster_id(s, rank=5) table(cid)
Retrieve or set the coefficient matrices from factorization in an object
coeff(object)
coeff(object)
object |
Object of class |
After factorization, coefficient matrices H
corresponding
to each rank value are
stored as elements of a list, which is in slot coeff
of object of
class scNMFSet
.
coeff(object)
will return the list of matrices.
coeff(object) <- value
can be used to modify it.
Either NULL
or a list of same length as
ranks(object)
, whose
elements are coefficient matrices derived from factorization
under each rank value.
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s,ranks=seq(2,4)) coeff(s)[[1]]
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s,ranks=seq(2,4)) coeff(s)[[1]]
Coefficient matrix accessor
## S4 method for signature 'scNMFSet' coeff(object)
## S4 method for signature 'scNMFSet' coeff(object)
object |
Object containing coefficient matrix |
List of coefficient matrices
Access and modify coefficient matrices
coeff(object) <- value
coeff(object) <- value
object |
Object of class |
value |
Coefficient matrix to be substituted |
Input object with updated coefficient matrices
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) coeff(s)[[1]] <- apply(coeff(s)[[1]],c(1,2),round,digits=2) coeff(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) coeff(s)[[1]] <- apply(coeff(s)[[1]],c(1,2),round,digits=2) coeff(s)
Can be used to access and modify coefficient matrices
## S4 replacement method for signature 'scNMFSet' coeff(object) <- value
## S4 replacement method for signature 'scNMFSet' coeff(object) <- value
object |
Object of class |
value |
Coefficient matrix to be substituted |
Input object with updated coefficient matrices
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) coeff(s)[[1]] <- apply(coeff(s)[[1]],c(1,2),round,digits=2) coeff(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) coeff(s)[[1]] <- apply(coeff(s)[[1]],c(1,2),round,digits=2) coeff(s)
Sample annotation accessor
## S4 method for signature 'scNMFSet' colData(x)
## S4 method for signature 'scNMFSet' colData(x)
x |
Object containing sample annotation |
Column annotation DataFrame
library(S4Vectors) x <- matrix(rpois(n=12,lambda=3),4,3) rownames(x) <- seq_len(4) colnames(x) <- c('a','b','c') s <- scNMFSet(count=x,rowData=seq_len(4),colData=c('a','b','c')) cols <- DataFrame(tissue=c('tissue1','tissue1','tissue2')) rownames(cols) <- c('a','b','c') colData(s) <- cols s
library(S4Vectors) x <- matrix(rpois(n=12,lambda=3),4,3) rownames(x) <- seq_len(4) colnames(x) <- c('a','b','c') s <- scNMFSet(count=x,rowData=seq_len(4),colData=c('a','b','c')) cols <- DataFrame(tissue=c('tissue1','tissue1','tissue2')) rownames(cols) <- c('a','b','c') colData(s) <- cols s
Cell annotation assignment
## S4 replacement method for signature 'scNMFSet,ANY' colData(x) <- value
## S4 replacement method for signature 'scNMFSet,ANY' colData(x) <- value
x |
Object containing cell annotation |
value |
DataFrame to be substituted |
Updated column annotation
library(S4Vectors) x <- matrix(rpois(n=12,lambda=3),4,3) rownames(x) <- seq_len(4) colnames(x) <- c('a','b','c') s <- scNMFSet(count=x,rowData=seq_len(4),colData=c('a','b','c')) cols <- DataFrame(tissue=c('tissue1','tissue1','tissue2')) rownames(cols) <- c('a','b','c') colData(s) <- cols s
library(S4Vectors) x <- matrix(rpois(n=12,lambda=3),4,3) rownames(x) <- seq_len(4) colnames(x) <- c('a','b','c') s <- scNMFSet(count=x,rowData=seq_len(4),colData=c('a','b','c')) cols <- DataFrame(tissue=c('tissue1','tissue1','tissue2')) rownames(cols) <- c('a','b','c') colData(s) <- cols s
Accessor for count matrix
## S4 method for signature 'scNMFSet' counts(object)
## S4 method for signature 'scNMFSet' counts(object)
object |
Object containing count matrix |
Count matrix
s <- scNMFSet(count = matrix(rpois(n=12,lambda=3),3,4)) counts(s)
s <- scNMFSet(count = matrix(rpois(n=12,lambda=3),3,4)) counts(s)
Count matrix can be modified
## S4 replacement method for signature 'scNMFSet' counts(object) <- value
## S4 replacement method for signature 'scNMFSet' counts(object) <- value
object |
Object containing count |
value |
Matrix-like object for replacement |
Object with updated count
mat <- matrix(rpois(n=12,lambda=3),3,4) s <- scNMFSet(count = mat) counts(s) <- mat^2 counts(s)
mat <- matrix(rpois(n=12,lambda=3),3,4) s <- scNMFSet(count = mat) counts(s) <- mat^2 counts(s)
Basis SD matrix accessor
dbasis(object)
dbasis(object)
object |
Object containing dbasis matrix |
List of dbasis matrices
Basis SD matrix accessor
## S4 method for signature 'scNMFSet' dbasis(object)
## S4 method for signature 'scNMFSet' dbasis(object)
object |
Object containing basis standard deviation (SD) matrix |
List of dbasis matrices
Basis SD matrix assignment
dbasis(object) <- value
dbasis(object) <- value
object |
Object containing dbasis matrix |
value |
List for assignment |
Updated object
Access and modify dbasis matrices
## S4 replacement method for signature 'scNMFSet' dbasis(object) <- value
## S4 replacement method for signature 'scNMFSet' dbasis(object) <- value
object |
Object of class |
value |
Basis SD matrix to be substituted |
Modified object
Coeff SD matrix accessor
dcoeff(object)
dcoeff(object)
object |
Object containing dcoeff matrix |
List of dcoeff matrices
Coeffcient SD matrix accessor
## S4 method for signature 'scNMFSet' dcoeff(object)
## S4 method for signature 'scNMFSet' dcoeff(object)
object |
Object containing coeffient standard deviation (SD) matrix |
List of dcoeff matrices
Coeff SD matrix assignment
dcoeff(object) <- value
dcoeff(object) <- value
object |
Object containing dcoeff matrix |
value |
List for assignment |
Updated object
Access and modify dcoeff matrices
## S4 replacement method for signature 'scNMFSet' dcoeff(object) <- value
## S4 replacement method for signature 'scNMFSet' dcoeff(object) <- value
object |
Object of class |
value |
Coeff SD matrix to be substituted |
Updated object
Performs single or multiple rank NMF factorization of count matrix using maximum likelihood
factorize(object, ranks = 2, nrun = 20, randomize = FALSE, nsmpl = 1, verbose = 2, progress.bar = TRUE, Itmax = 10000, ncnn.step = 40, criterion = "likelihood", linkage = "average", Tol = 1e-05, store.connectivity = FALSE)
factorize(object, ranks = 2, nrun = 20, randomize = FALSE, nsmpl = 1, verbose = 2, progress.bar = TRUE, Itmax = 10000, ncnn.step = 40, criterion = "likelihood", linkage = "average", Tol = 1e-05, store.connectivity = FALSE)
object |
|
ranks |
Rank for factorization; can be a vector of multiple values. |
nrun |
No. of runs with different initial guess. |
randomize |
Boolean; if |
nsmpl |
No. of randomized samples to average over. |
verbose |
The verbosity level: 3, each iteration output printed; 2, each run output printed; 1, each randomized sample output printed; 0, silent. |
progress.bar |
Display progress bar when |
Itmax |
Maximum no. of iteration. |
ncnn.step |
Minimum no. of steps with no change in connectivity matrix to achieve convergence. |
criterion |
If |
linkage |
Method to be sent to |
Tol |
Tolerance for checking convergence with
|
store.connectivity |
Returns a list also containing connectivity data. |
The main input is the scNMFSet
object with count matrix.
This function performs non-negative factorization and fills in the empty
slots basis
, coeff
, and ranks
.
When run with multiple values of ranks
,
factorization is repeated for each rank and the slot measure
contains quality measures of the ranks. The quality measure
likelihood
is negative the KL distance of the fit to the
target. With nrun > 1
, the likelihood is the maximum
among all runs.
The quality measure dispersion
is the scalar
measure of how far the connectivity matrix is from 0, 1. With
increasing nrun
, dispersion
decreases from 1.
nrun
should be chosen such that dispersion
does not
change appreciably.
With randomization, count
matrix of object
is shuffled.
nsmpl
can be used to average over multiple permutations. This
averaging applies to each quality measure under a given rank.
Object of class scNMFSet
with factorization slots filled.
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30)) s <- scNMFSet(count=x) s <- factorize(s,ranks=seq(2,8),nrun=5) plot(s)
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30)) s <- scNMFSet(count=x) s <- factorize(s,ranks=seq(2,8),nrun=5) plot(s)
Generate heatmap of features derived from factorization of count data.
feature_map(object, basis.matrix = NULL, rank, markers = NULL, subtract.mean = TRUE, log = TRUE, max.per.cluster = 10, feature.names = NULL, perm = NULL, main = "Feature map", cscale = NULL, cex.cluster = 1, cex.feature = 0.5, mar = NULL, ...)
feature_map(object, basis.matrix = NULL, rank, markers = NULL, subtract.mean = TRUE, log = TRUE, max.per.cluster = 10, feature.names = NULL, perm = NULL, main = "Feature map", cscale = NULL, cex.cluster = 1, cex.feature = 0.5, mar = NULL, ...)
object |
Object of class |
basis.matrix |
Basis matrix can be supplied instead of |
rank |
Rank value for which the gene map is to be displayed.
The object must contain the corresponding slot (one element of
|
markers |
Vector of gene names containing markers to be included
in addition to the metagenes. All entries of |
subtract.mean |
Process each rows of basis matrix |
log |
If |
max.per.cluster |
Maximum number of metagenes per cluster. |
feature.names |
Names to be used in the plot for features. |
perm |
Permutation of cluster IDs. |
main |
Main title. |
cscale |
Colors for heatmap. |
cex.cluster |
Cluster ID label size. |
cex.feature |
Feature ID label size. |
mar |
Margins for |
... |
This function uses image()
and is more flexible than
gene_map
.
If object
contains multiple ranks, only the requested
rank's basis matrix W will be displayed. As in gene_map
, the features
displayed in rows are selected by "max" scheme
NULL
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(100) s <- scNMFSet(count=x,rowData=seq_len(10), colData=seq_len(100)) s <- vb_factorize(s,ranks=seq(2,5)) plot(s) feature_map(s, rank=3)
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(100) s <- scNMFSet(count=x,rowData=seq_len(10), colData=seq_len(100)) s <- vb_factorize(s,ranks=seq(2,5)) plot(s) feature_map(s, rank=3)
Remove low quality cell entries from object
filter_cells(object, umi.min = 0, umi.max = Inf, plot = TRUE, remove.zeros = TRUE)
filter_cells(object, umi.min = 0, umi.max = Inf, plot = TRUE, remove.zeros = TRUE)
object |
|
umi.min |
Minimum UMI count for cell filtering |
umi.max |
Maximum UMI count for cell filtering |
plot |
If |
remove.zeros |
Remove rows/columns containing zeros only |
Takes as input scNMFSet
object and plots histogram of UMI
counts for each
cell. Optionally, cells are filtered using minimum and maximum UMI counts.
The resulting object is returned after removing empty rows and columns,
if any.
scNMFSet
object with cells filtered.
set.seed(1) s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30)) s <- filter_cells(s,umi.min=10^2.0,umi.max=10^2.1)
set.seed(1) s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30)) s <- filter_cells(s,umi.min=10^2.0,umi.max=10^2.1)
Select genes with high relative variance in count data for further analysis
filter_genes(object, markers = NULL, vmr.min = 0, min.cells.expressed = 0, max.cells.expressed = Inf, rescue.genes = FALSE, progress.bar = TRUE, save.memory = FALSE, plot = TRUE, log = "xy", cex = 0.5)
filter_genes(object, markers = NULL, vmr.min = 0, min.cells.expressed = 0, max.cells.expressed = Inf, rescue.genes = FALSE, progress.bar = TRUE, save.memory = FALSE, plot = TRUE, log = "xy", cex = 0.5)
object |
|
markers |
A vector containing marker genes to be selected.
All rows in
|
vmr.min |
Minimum variance-to-mean ratio for gene filtering. |
min.cells.expressed |
Minimum no. of cells expressed for gene filtering. |
max.cells.expressed |
Maximum no. of cells expressed for gene filtering. |
rescue.genes |
Selected additional genes whose (non-zero) count distributions have at least one mode. |
progress.bar |
Display progress of mode-gene scan or VMR
calculation with |
save.memory |
For a very large number of cells, calculate VMR
row by row while avoiding calls to |
plot |
Plot the distribution of no. of cells expressed vs. VMR. |
log |
Axis in log-scale, |
cex |
Symbol size for each gene in the plot. |
Takes as input scNMFSet
object and scatterplot no. of cells
expressed versus VMR (variance-to-mean ratio)
for each gene. Optionally, genes are filtered using minimum VMR
together with a range of no. of cells expressed.
Object of class scNMFSet
.
set.seed(1) s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30)) s <- filter_genes(s,vmr.min=1.0,min.cells.expressed=28, rescue.genes=FALSE)
set.seed(1) s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30)) s <- filter_genes(s,vmr.min=1.0,min.cells.expressed=28, rescue.genes=FALSE)
Generate heatmap of metagenes derived from factorization of count data.
gene_map(object, rank, markers = NULL, subtract.mean = TRUE, log = TRUE, max.per.cluster = 10, Colv = NA, gene.names = NULL, main = "Genes", col = NULL, ...)
gene_map(object, rank, markers = NULL, subtract.mean = TRUE, log = TRUE, max.per.cluster = 10, Colv = NA, gene.names = NULL, main = "Genes", col = NULL, ...)
object |
Object of class |
rank |
Rank value for which the gene map is to be displayed.
The object must contain the corresponding slot (one element of
|
markers |
Vector of gene names containing markers to be included
in addition to the metagenes. All entries of |
subtract.mean |
Process each rows of basis matrix |
log |
If |
max.per.cluster |
Maximum number of metagenes per cluster. |
Colv |
|
gene.names |
Names to be used in the plot for genes. |
main |
Title of plot. |
col |
Colors for the cluster panels on the left and top. |
... |
Wrapper for heatmap
to display metagenes and
associated basis matrix element magnitudes. Factorization results inside
an object specified by its rank value will be retrieved, and metagene
sets identified from clusters.
If object
contains multiple ranks, only the requested
rank's basis matrix W will be displayed. The genes displayed in rows
are selected by "max" scheme
[Carmona-Saez, BMC Bioinformatics (2006),
https://doi.org/10.1186/1471-2105-7-54]:
for each cluster (k in 1:ncol
),
rows of W are sorted by decreasing order
of W[,k]
. Marker genes for k
are those among the top
nmarker
for which W[,k]
is maximum within each row.
NULL
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(100) s <- scNMFSet(count=x,rowData=seq_len(10), colData=seq_len(100)) s <- vb_factorize(s,ranks=seq(2,5)) plot(s) gene_map(s, rank=3)
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(100) s <- scNMFSet(count=x,rowData=seq_len(10), colData=seq_len(100)) s <- vb_factorize(s,ranks=seq(2,5)) plot(s) gene_map(s, rank=3)
Retrieve or set factorization measures in an object
measure(object)
measure(object)
object |
Object of class |
Factorization under multiple rank values lead to measures stored in
a data frame inside a slot measure
. In maximum likelihood using
factorize
, this set of quality measures include
dispersion and cophenetic
coeeficients for each rank. In Bayesian factorization using
vb_factorize
,
log evidence for each rank is stored. measure(object)
will return the data
frame. measure(object) <- value
can be used to modify it.
Either NULL
or a data frame containing measures.
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s,ranks=seq(2,4)) measure(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s,ranks=seq(2,4)) measure(s)
Rank measure accessor
## S4 method for signature 'scNMFSet' measure(object)
## S4 method for signature 'scNMFSet' measure(object)
object |
Object containing measure |
Data frame of measure
Can be used to access and modify factorization measure
measure(object) <- value
measure(object) <- value
object |
Object of class |
value |
Measure to be substituted |
Input object with updated measure
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) measure(s)[,-1] <- apply(measure(s)[,-1], c(1,2), round,digits=3) measure(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) measure(s)[,-1] <- apply(measure(s)[,-1], c(1,2), round,digits=3) measure(s)
Can be used to access and modify factorization measure
## S4 replacement method for signature 'scNMFSet' measure(object) <- value
## S4 replacement method for signature 'scNMFSet' measure(object) <- value
object |
Object of class |
value |
Measure to be substituted |
Input object with updated measure
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) measure(s)[,-1] <- apply(measure(s)[,-1], c(1,2), round,digits=3) measure(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=3) measure(s)[,-1] <- apply(measure(s)[,-1], c(1,2), round,digits=3) measure(s)
Generates meta gene table with coefficient of variation
meta_gene.cv(object = NULL, rank, basis.matrix = NULL, dbasis = NULL, max.per.cluster = 100, gene_names = NULL, subtract.mean = TRUE, log = TRUE, cv.max = Inf)
meta_gene.cv(object = NULL, rank, basis.matrix = NULL, dbasis = NULL, max.per.cluster = 100, gene_names = NULL, subtract.mean = TRUE, log = TRUE, cv.max = Inf)
object |
Main object containing factorization outcome |
rank |
Rank for which meta gene is to be found |
basis.matrix |
Basis matrix to work with. Only necessary when
|
dbasis |
Variance of basis matrix. Only necessary when
|
max.per.cluster |
Maximum meta genes per cluster. |
gene_names |
Name of genes. If |
subtract.mean |
Standardize magnitudes of basis elements by subtracting mean |
log |
Use geometric mean. |
cv.max |
Upper bound for CV in selecting meta genes. |
Data frame with meta genes and their CV in each column.
set.seed(1) x <- simulate_whx(nrow=50, ncol=100, rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s, ranks=seq(2,8), nrun=5) plot(s) meta_gene.cv(s, rank=5)
set.seed(1) x <- simulate_whx(nrow=50, ncol=100, rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s, ranks=seq(2,8), nrun=5) plot(s) meta_gene.cv(s, rank=5)
Retrieve a basis matrix from an object and find metagenes.
meta_genes(object, rank, basis.matrix = NULL, max.per.cluster = 10, gene_names = NULL, subtract.mean = TRUE, log = TRUE)
meta_genes(object, rank, basis.matrix = NULL, max.per.cluster = 10, gene_names = NULL, subtract.mean = TRUE, log = TRUE)
object |
Object of class |
rank |
Rank value for which metagenes are to be found. |
basis.matrix |
Instead of an object containing basis matrices, the matrix itself can be provided. |
max.per.cluster |
Maximum number of metagenes per cluster. |
gene_names |
Names of genes to replace row names of basis matrix. |
subtract.mean |
Standardize the matrix elements with means within each row. |
log |
Use geometric mean and division instead of arithmetic
mean and subtraction with |
List of vectors each containing metagene names of clusters.
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(100) s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(100)) s <- vb_factorize(s,ranks=seq(2,5)) meta_genes(s, rank=4)
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(100) s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(100)) s <- vb_factorize(s,ranks=seq(2,5)) meta_genes(s, rank=4)
Generate Newick format tree string from tree list object
newick(tree, parent = "1.1", string = "")
newick(tree, parent = "1.1", string = "")
tree |
Tree list object from |
parent |
Parent ID |
string |
Newick string of parent tree |
String of newick tree
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) tree <- build_tree(s,rmax=5) nw <- newick(tree=tree) nw
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) tree <- build_tree(s,rmax=5) nw <- newick(tree=tree) nw
Rescale count matrix entries such that all cells have the same library size.
normalize_count(object)
normalize_count(object)
object |
|
For analysis purposes, it is sometimes useful to rescale integer count data into floats such that all cells have the same median counts. This function will calculate the median of all UMI counts of cells (total number of RNAs derived from each cell). All count data are then rescaled such that cells have uniform UMI count equal to the median.
scNMFSet
object with normalized count data.
library(Matrix) set.seed(1) s <- scNMFSet(count=matrix(rpois(n=1200,lambda=3),40,30)) colMeans(counts(s)) s <- normalize_count(s) colMeans(counts(s))
library(Matrix) set.seed(1) s <- scNMFSet(count=matrix(rpois(n=1200,lambda=3),40,30)) colMeans(counts(s)) s <- normalize_count(s) colMeans(counts(s))
Takes as main argument scNMFSet
object containing factorized output
and estimate the optimal rank.
optimal_rank(object, df = 10, BF.threshold = 3, type = NULL, m = NULL)
optimal_rank(object, df = 10, BF.threshold = 3, type = NULL, m = NULL)
object |
|
df |
Degrees of freedom for split fit. Upper bound is the total number of data points (number of rank values scanned). |
BF.threshold |
Bayes factor threshold for statistical threshold. |
type |
|
m |
Number of features (e.g., genes) in the count matrix. Only necessary when
|
The input object is used along with Bayes factor threshold to determine the
heterogeneity type (1 or 2) and the optimal rank.
If evidence(rank 1)/evidence(rank2) > BF.treshold
, rank 1 is favorable than rank 2.
List containing type
and ropt
(optimal rank).
set.seed(1) x <- simulate_whx(nrow=50, ncol=100, rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s, ranks=seq(2,8), nrun=5) plot(s) optimal_rank(s)
set.seed(1) x <- simulate_whx(nrow=50, ncol=100, rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s, ranks=seq(2,8), nrun=5) plot(s) optimal_rank(s)
Gene variance to mean ratio and the number of expressing cells are plotted.
plot_genes(object, vmr = NULL, ncexpr = NULL, selected_genes = NULL, variable_genes = NULL, mode_genes = NULL, marker_genes = NULL, save.memory = FALSE, progress.bar = TRUE, log = "xy", cex = 0.5)
plot_genes(object, vmr = NULL, ncexpr = NULL, selected_genes = NULL, variable_genes = NULL, mode_genes = NULL, marker_genes = NULL, save.memory = FALSE, progress.bar = TRUE, log = "xy", cex = 0.5)
object |
Object containing count data |
vmr |
Variance to mean ratio (VMR) |
ncexpr |
Number of cells expressing each gene |
selected_genes |
Logical vector specifing genes selected |
variable_genes |
Logical vector specifing genes with high VMR |
mode_genes |
Logical vector specifying genes with nonzero modes |
marker_genes |
Logical vector specifying marker genes |
save.memory |
If |
progress.bar |
Display progress bar for VMR calculation. Not used when gene lists are supplied. |
log |
Axis in log-scale, |
cex |
Symbol size for genes (supplied to |
This function can be called separately or is also called within
filter_genes
by default.
In the latter case, parameters other
than object
will have been already filled. If called separately
with NULL
gene lists, VMR is recalculated but gene selection
is not done.
NULL
set.seed(1) s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30)) plot_genes(s)
set.seed(1) s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30)) plot_genes(s)
Visualize the output of build_tree
as a dendrogram.
plot_tree(tree, direction = "rightwards", cex = 0.7, ...)
plot_tree(tree, direction = "rightwards", cex = 0.7, ...)
tree |
List containing tree structure. Output from
|
direction |
|
cex |
Font size of edge/tip labels |
... |
Other parameters to |
Uses plot.phylo
to visualize cluster tree.
NULL
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) tree <- build_tree(s,rmax=5) plot_tree(tree)
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) tree <- build_tree(s,rmax=5) plot_tree(tree)
Retrieve or set the rank values in an object
ranks(object)
ranks(object)
object |
Object of class |
Ranks for which factorization has been performed are stored
in slot ranks
of scNMFSet
object.
ranks(object)
will return the rank vector.
ranks(object) <- value
can be used to modify it.
Either NULL
or vector.
s <- scNMFSet(matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s,ranks=seq(2,4)) ranks(s)
s <- scNMFSet(matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s,ranks=seq(2,4)) ranks(s)
Rank accessor
## S4 method for signature 'scNMFSet' ranks(object)
## S4 method for signature 'scNMFSet' ranks(object)
object |
Object containing rank values |
Vector of rank values
Replace ranks
slot of scNMFSet
object
ranks(object) <- value
ranks(object) <- value
object |
Object of class |
value |
Rank values (vector) to be substituted |
Input object with updated ranks
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=seq(2,3)) ranks(s) <- c('two','three') ranks(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=seq(2,3)) ranks(s) <- c('two','three') ranks(s)
Replace ranks
slot of scNMFSet
object
## S4 replacement method for signature 'scNMFSet' ranks(object) <- value
## S4 replacement method for signature 'scNMFSet' ranks(object) <- value
object |
Object of class |
value |
Rank values (vector) to be substituted |
Input object with updated ranks
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=seq(2,3)) ranks(s) <- c('two','three') ranks(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) s <- vb_factorize(s, ranks=seq(2,3)) ranks(s) <- c('two','three') ranks(s)
Read count, gene, and barcode annotation data in 10x format
and create an object of class scNMFSet
.
read_10x(dir, count = "matrix.mtx", genes = "genes.tsv", barcodes = "barcodes.tsv", remove.zeros = TRUE)
read_10x(dir, count = "matrix.mtx", genes = "genes.tsv", barcodes = "barcodes.tsv", remove.zeros = TRUE)
dir |
Name of directory containing data files. |
count |
Name of count matrix file. |
genes |
Name of gene annotation file. |
barcodes |
Name of cell annotation file. |
remove.zeros |
If |
Files for count
, genes
, and barcodes
are
assumed to be present in dir
.
Count data are in sparse "Matrix Market" format
(https://math.nist.gov/MatrixMarket/formats.html).
Object of class scNMFSet
library(S4Vectors) s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) rowData(s) <- DataFrame(seq_len(4)) colData(s) <- DataFrame(seq_len(3)) write_10x(s,dir='.') s <- read_10x(dir='.') s
library(S4Vectors) s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3)) rowData(s) <- DataFrame(seq_len(4)) colData(s) <- DataFrame(seq_len(3)) write_10x(s,dir='.') s <- read_10x(dir='.') s
Remove rows or columns that are empty from an object
remove_zeros(object)
remove_zeros(object)
object |
Object containing data |
Object with empty rows/columns removed
set.seed(1) x <- matrix(rpois(n=100,lambda=0.1),10,10) s <- scNMFSet(count=x,remove.zeros=FALSE) s2 <- remove_zeros(s) s2
set.seed(1) x <- matrix(rpois(n=100,lambda=0.1),10,10) s <- scNMFSet(count=x,remove.zeros=FALSE) s2 <- remove_zeros(s) s2
Rename tips of trees with cell types
rename_tips(tree, rank, tip.labels)
rename_tips(tree, rank, tip.labels)
tree |
List containing tree |
rank |
Rank value of which tip names are to be replaced |
tip.labels |
Vector of new names for tips |
List containing tree with updated tip labels
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) tree <- build_tree(s,rmax=5) tree <- rename_tips(tree,rank=5,tip.labels=letters[seq_len(5)]) tree
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) tree <- build_tree(s,rmax=5) tree <- rename_tips(tree,rank=5,tip.labels=letters[seq_len(5)]) tree
Feature annotation accessor
## S4 method for signature 'scNMFSet' rowData(x)
## S4 method for signature 'scNMFSet' rowData(x)
x |
Object containing data |
DataFrame of feature annotation
x <- matrix(rpois(n=12,lambda=3),4,3) rownames(x) <- seq_len(4) colnames(x) <- seq_len(3) s <- scNMFSet(count=x,rowData=seq_len(4),colData=seq_len(3)) rowData(s)
x <- matrix(rpois(n=12,lambda=3),4,3) rownames(x) <- seq_len(4) colnames(x) <- seq_len(3) s <- scNMFSet(count=x,rowData=seq_len(4),colData=seq_len(3)) rowData(s)
Gene annotation assignment
## S4 replacement method for signature 'scNMFSet' rowData(x) <- value
## S4 replacement method for signature 'scNMFSet' rowData(x) <- value
x |
Object containing data |
value |
DataFrame of row annotation to be substituted |
Row annotation DataFrame
scNMFSet
objectObject derived from SingleCellExperiment
scNMFSet(count = NULL, ..., remove.zeros = TRUE)
scNMFSet(count = NULL, ..., remove.zeros = TRUE)
count |
Count matrix |
... |
Other parameters of |
remove.zeros |
Remove empty rows and columns |
Object of class scNMFSet
.
count <- matrix(rpois(n=12,lambda=2),4,3) s <- scNMFSet(count=count) s
count <- matrix(rpois(n=12,lambda=2),4,3) s <- scNMFSet(count=count) s
scNMFSet
for storing input data and resultsS4
class derived from SingleCellExperiment
that can store single-cell count matrix, gene and cell annotation
data frames, and factorization factors as well as quality measures
for rank determination.
## S4 method for signature 'scNMFSet,ANY' plot(x)
## S4 method for signature 'scNMFSet,ANY' plot(x)
x |
Object containing measure |
Object of class scNMFSet
NULL
plot
: Plot measures of an object.
For quality measures derived from maximum likelihood inference,
dispersion and cophenetic will be plotted separately.
For measure derived from Bayesian inference, log evidence as a function of rank values will be plotted.
assays
Named list for count matrix counts
.
rowData
DataFrame
for gene (feature)
names and annotations in columns.
colData
DataFrame
for cell IDs and other
annotations in columns (e.g., barcodes, cell types).
ranks
Vector for rank values for which factorization has been performed.
basis
List (of length equal to that of ranks
) of
basis matrices W from factorization;
dimension nrow
x rank
,
where nrow
is no. of rows in count
.
coeff
List (of length equal to that of ranks
) of
coefficient matrices H from factorization;
dimension rank
x ncol
,
where ncol
is no. of columns in count
.
measure
Data frame of factorization quality measures for
each rank (likelihood
and dispersion
).
Other slots inherited from SingleCellExperiment
class are
not explicitly used.
library(S4Vectors) # toy matrix ngenes <- 8 ncells <- 5 mat <- matrix(rpois(n=ngenes*ncells,lambda=3),ngenes,ncells) abc <- letters[seq_len(ngenes)] ABC <- LETTERS[seq_len(ncells)] genes <- DataFrame(gene_id=abc) cells <- DataFrame(cell_id=ABC) rownames(mat) <- rownames(genes) <- abc colnames(mat) <- rownames(cells) <- ABC # create scNMFSet object s <- scNMFSet(count=mat,rowData=genes,colData=cells) # alternative ways s2 <- scNMFSet(count=mat) s2 <- scNMFSet(assays=list(counts=mat)) # show dimensions dim(s) # show slots rowData(s) # modify slots colData(s) <- DataFrame(cell_id=seq_len(ncells), cell_type=c(rep('tissue1',2), rep('tissue2',ncells-2))) colData(s)
library(S4Vectors) # toy matrix ngenes <- 8 ncells <- 5 mat <- matrix(rpois(n=ngenes*ncells,lambda=3),ngenes,ncells) abc <- letters[seq_len(ngenes)] ABC <- LETTERS[seq_len(ncells)] genes <- DataFrame(gene_id=abc) cells <- DataFrame(cell_id=ABC) rownames(mat) <- rownames(genes) <- abc colnames(mat) <- rownames(cells) <- ABC # create scNMFSet object s <- scNMFSet(count=mat,rowData=genes,colData=cells) # alternative ways s2 <- scNMFSet(count=mat) s2 <- scNMFSet(assays=list(counts=mat)) # show dimensions dim(s) # show slots rowData(s) # modify slots colData(s) <- DataFrame(cell_id=seq_len(ncells), cell_type=c(rep('tissue1',2), rep('tissue2',ncells-2))) colData(s)
Display the class and dimension of an object
Object name itself on command line or (show(object))
will display
class and dimensionality
## S4 method for signature 'scNMFSet' show(object)
## S4 method for signature 'scNMFSet' show(object)
object |
Object of class |
NULL
s <- scNMFSet(matrix(rpois(n=12,lambda=3),4,3)) show(s)
s <- scNMFSet(matrix(rpois(n=12,lambda=3),4,3)) show(s)
Use one of two schemes to generate simulated data suitable for testing factorization.
simulate_data(nfeatures, nsamples, generate.factors = FALSE, nfactor = 10, alpha0 = 0.5, shuffle = TRUE)
simulate_data(nfeatures, nsamples, generate.factors = FALSE, nfactor = 10, alpha0 = 0.5, shuffle = TRUE)
nfeatures |
Number of features |
nsamples |
Vector of sample sizes in each cluster.
Rank |
generate.factors |
Generate factor matrices |
nfactor |
Total RNA count of multinomials for each cluster with
|
alpha0 |
Variance parameter of Dirichlet distribution from which
multinomial probabilities are sampled with
|
shuffle |
Randomly permute rows and columns of count matrix. |
In one scheme (generate.factors = TRUE
), simulated factor
matrices
W
and H
are used to build count data X = WH
.
In the second scheme, factor matrices are not used and X
is
sampled directly from r
(rank requested) sets of
multinomial distributions.
If generate.factors = TRUE
, list of components
w
(basis matrix, nfeatures x rank
),
h
(coefficient matrix, rank x ncells
, where
ncells
is equal to n
, the sum of nsamples
), and
x
, a matrix of Poisson deviates with mean W x H
.
If generate.factors = FALSE
, only the count matrix
x
is in the list.
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30)) s <- scNMFSet(x) s
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30)) s <- scNMFSet(x) s
Under Bayesian formulation, use prior distributions of factor matrices and generate simulated data
simulate_whx(nrow, ncol, rank, aw = 0.1, bw = 1, ah = 0.1, bh = 1)
simulate_whx(nrow, ncol, rank, aw = 0.1, bw = 1, ah = 0.1, bh = 1)
nrow |
Number of features (genes). |
ncol |
Number of cells (samples). |
rank |
Rank (ncol of W, nrow of H). |
aw |
Shape parameter of basis prior. |
bw |
Mean of basis prior. Scale parameter is equal to |
ah |
Shape parameter of coefficient prior. |
bh |
Mean of coefficient prior. Scale parameter is equal to
|
Basis W
and coefficient matrices H
are sampled from
gamma distributions (priors) with shape (aw,ah
) and mean
(bw,bh
) parameters. Count data X
are sampled from Poisson
distribution with mean values given by WH
.
List with elements w
, h
, and x
, each
containing basis, coefficient, and count matrices.
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(count=x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) plot(s)
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(count=x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) plot(s)
Perform variational Bayes NMF and store factor matrices in object
vb_factorize(object, ranks = 2, nrun = 1, verbose = 2, progress.bar = TRUE, initializer = "random", Itmax = 10000, hyper.update = rep(TRUE, 4), gamma.a = 1, gamma.b = 1, Tol = 1e-05, hyper.update.n0 = 10, hyper.update.dn = 1, connectivity = TRUE, fudge = NULL, ncores = 1, useC = TRUE, unif.stop = TRUE)
vb_factorize(object, ranks = 2, nrun = 1, verbose = 2, progress.bar = TRUE, initializer = "random", Itmax = 10000, hyper.update = rep(TRUE, 4), gamma.a = 1, gamma.b = 1, Tol = 1e-05, hyper.update.n0 = 10, hyper.update.dn = 1, connectivity = TRUE, fudge = NULL, ncores = 1, useC = TRUE, unif.stop = TRUE)
object |
|
ranks |
Rank for factorization; can be a vector of multiple values. |
nrun |
No. of runs with different initial guesses. |
verbose |
The verbosity level: 3, each iteration output printed; 2, each run output printed; 1, each randomized sample output printed; 0, silent. |
progress.bar |
Display progress bar with |
initializer |
If |
Itmax |
Maximum no. of iteration. |
hyper.update |
Vector of four logicals, each indcating whether
hyperparameters |
gamma.a |
Gamma distribution shape parameter. |
gamma.b |
Gamma distribution mean. These two parameters are used for
fixed hyperparameters with |
Tol |
Tolerance for terminating iteration. |
hyper.update.n0 |
Initial number of steps in which hyperparameters are fixed. |
hyper.update.dn |
Step intervals for hyperparameter updates. |
connectivity |
If |
fudge |
Small positive number used as lower bound for factor matrix
elements to avoid singularity. If |
ncores |
Number of processors (cores) to run. If |
useC |
Use C++ version of updates for speed. |
unif.stop |
Terminate if any of columns in basis matrix is uniform. |
The main input is the scNMFSet
object with count matrix.
This function performs non-negative factorization using Bayesian algorithm
and gamma priors. Slots basis
, coeff
, and ranks
are filled.
When run with multiple values of ranks
, factorization is
repeated for each rank and the slot measure
contains
log evidence and optimal hyperparameters for each rank.
With nrun > 1
, the solution
with the maximum log evidence is stored for a given rank.
Object of class scNMFSet
with factorization slots filled.
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) plot(s)
set.seed(1) x <- simulate_whx(nrow=50,ncol=100,rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s,ranks=seq(2,8),nrun=5) plot(s)
Use tSNE to generate two-dimensional map of coefficient matrix.
visualize_clusters(object, rank, verbose = FALSE, cex = 1, cex.names = 0.7, ...)
visualize_clusters(object, rank, verbose = FALSE, cex = 1, cex.names = 0.7, ...)
object |
|
rank |
Rank value to extract from |
verbose |
Print tSNE messages. |
cex |
Symbol size in tSNE plot |
cex.names |
Font size of labels in count barplot. |
... |
Other parameters to send to |
It retrieves a coefficient matrix H
from an object and use its elements
to assign each cell into clusters.
t-Distributed Stochastic Neighbor Embedding (t-SNE;
https://lvdmaaten.github.io/tsne/) is used to visualize the
clustering
in 2D. Also plotted is the distribution of cell counts for all clusters.
NULL
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(170) s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(170)) s <- vb_factorize(s,ranks=seq(2,5)) visualize_clusters(s,rank=5)
set.seed(1) x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30)) rownames(x) <- seq_len(10) colnames(x) <- seq_len(170) s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(170)) s <- vb_factorize(s,ranks=seq(2,5)) visualize_clusters(s,rank=5)
Use an object and write count and annotation files in 10x format.
write_10x(object, dir, count = "matrix.mtx", genes = "genes.tsv", barcodes = "barcodes.tsv", quote = FALSE)
write_10x(object, dir, count = "matrix.mtx", genes = "genes.tsv", barcodes = "barcodes.tsv", quote = FALSE)
object |
Object of class |
dir |
Directory where files are to be written. |
count |
File name for count matrix. |
genes |
File name for gene annotation. |
barcodes |
File name for cell annotation. |
quote |
Suppress quotation marks in output files. |
NULL
set.seed(1) x <- matrix(rpois(n=12,lambda=3),4,3) rownames(x) <- seq_len(4) colnames(x) <- seq_len(3) s <- scNMFSet(count=x,rowData=seq_len(4),colData=seq_len(3)) write_10x(s,dir='.')
set.seed(1) x <- matrix(rpois(n=12,lambda=3),4,3) rownames(x) <- seq_len(4) colnames(x) <- seq_len(3) s <- scNMFSet(count=x,rowData=seq_len(4),colData=seq_len(3)) write_10x(s,dir='.')
Write a csv file of meta gene lists from input list
write_meta(meta, file)
write_meta(meta, file)
meta |
List of meta genes output from |
file |
Output file name |
NULL
set.seed(1) x <- simulate_whx(nrow=50, ncol=100, rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s, ranks=seq(2,8), nrun=5) plot(s) m <- meta_genes(s, rank=5) write_meta(m, file='meta.csv')
set.seed(1) x <- simulate_whx(nrow=50, ncol=100, rank=5) s <- scNMFSet(x$x) s <- vb_factorize(s, ranks=seq(2,8), nrun=5) plot(s) m <- meta_genes(s, rank=5) write_meta(m, file='meta.csv')