Package 'ccfindR'

Title:	Cancer Clone Finder
Description:	A collection of tools for cancer genomic data clustering analyses, including those for single cell RNA-seq. Cell clustering and feature gene selection analysis employ Bayesian (and maximum likelihood) non-negative matrix factorization (NMF) algorithm. Input data set consists of RNA count matrix, gene, and cell bar code annotations. Analysis outputs are factor matrices for multiple ranks and marginal likelihood values for each rank. The package includes utilities for downstream analyses, including meta-gene identification, visualization, and construction of rank-based trees for clusters.
Authors:	Jun Woo [aut, cre], Jinhua Wang [aut]
Maintainer:	Jun Woo <[email protected]>
License:	GPL (>= 2)
Version:	1.27.0
Built:	2025-03-14 06:32:50 UTC
Source:	https://github.com/bioc/ccfindR

Help Index

Subsetting scNMFSet object
Cell type assignment via GSEA
Basis matrices in an Object
Basis matrix accessor
Generics for basis matrix assignment
Modify basis matrices
Build tree connecting clusters at different ranks
ccfindR: Cancer Clone FindeR
Plot heatmap of clustering coefficient matrix
Assign cells into clusters
Coefficient matrices in an Object
Coefficient matrix accessor
Generics for coefficient matrix assignment
Modify coefficient matrices
Sample annotation accessor
Cell annotation assignment
Accessor for count matrix
Assignment of count matrix
Basis SD matrix accessor
Basis SD matrix accessor
Basis SD matrix assignment
Modify dbasis matrices
Coeff SD matrix accessor
Coeffcient SD matrix accessor
Coeff SD matrix assignment
Modify dcoeff matrices
Maximum likelihood factorization
Plot heatmap of basis matrix
Filter cells with quality control criteria
Filter genes with quality control criteria
Plot heatmap of metagene matrix
Factorization measures in an Object
Rank measure accessor
Generics for factorization measure assignment
Modify factorization measure
Meta gene table with CV
Find metagenes from basis matrix
Generate Newick format tree string from tree list object
Normalize count data
Determine optimal rank
Plot gene variance distributions
Plot cluster tree
Rank values in an Object
Rank accessor
Generics for ranks assignment
Modify ranks
Read 10x data and generate scNMF object
Remove rows or columns that are empty from an object
Rename tips of trees with cell types
Feature annotation accessor
Gene annotation assignment
Create scNMFSet object
Class scNMFSet for storing input data and results
Display object
Generate simulated data for factorization
Simulate factor matrices and data using priors
Bayesian NMF inference of count matrix
Visualize clusters
Write 10x data files
Write meta genes to a file

Cell type assignment via GSEA

Description

Computes GSEA enrichment score of marker sets in meta gene list

Usage

assignCelltype(obj, rank, gset, gene_names = NULL, p = 0,
  remove.na = FALSE, p.value = FALSE, nperm = 1000,
  progress.bar = TRUE, grp.prefix = c("IG"))
assignCelltype(obj, rank, gset, gene_names = NULL, p = 0,
  remove.na = FALSE, p.value = FALSE, nperm = 1000,
  progress.bar = TRUE, grp.prefix = c("IG"))

Arguments

`obj`	Object of class `scNMFSet`.
`rank`	Rank to examine
`gset`	List of gene sets to be used as markers
`gene_names`	Names of genes to be used for meta-gene identification
`p`	Enrichment score exponent.
`remove.na`	Remove gene sets with no overlap
`p.value`	Estimatte p values using permutation
`nperm`	No. of permutation replicates
`progress.bar`	Display progress bar for p value computation
`grp.prefix`	Gene name prefix to search for with wildcard matches in query

Details

If obj is of clas scNMFSet, it computes meta gene list using meta_gene.cv. Otherwise, obj is expected to be a data frame of the same structure as the output of meta_gene.cv; the number of rows same as the total number of metagenes per cluster, three columns per each cluster (gene name, meta-gene score, and coefficient of variation). The argument gset is a list of gene sets to be checked for enrichment in each cluster meta gene list. The enrichment score is computed using the GSEA algorithm (Subramanian et al. 2005).

Value

Matrix of enrichment score statistics with cell types in rows and clusters in columns

References

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005). “Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles.” Proc. Natl. Acad. Sci., USA, 102(43), 15545–15550. doi:10.1073/pnas.0506580102.

Examples

dir <- system.file('extdata',package='ccfindR')
pbmc <- read_10x(dir)
pbmc <- vb_factorize(pbmc, ranks=5)
meta <- meta_gene.cv(pbmc,rank=5, gene_names=rowData(pbmc)[,2])
markers <- list('B cell'=c('CD74','IG','HLA'),
                'CD8+ T'=c('CD8A','CD8B','GZMK','CCR7','LTB'),
                'CD4+ T'=c('CD3D','CD3E','IL7R','LEF1'),
                'NK'=c('GNLY','NKG7','GZMA','GZMH'),
                'Macrophage'=c('S100A8','S100A9','CD14','LYZ','CFD'))
gsea <- assignCelltype(meta, rank=5, gset=markers, grp.prefix=c('IG','HLA'))
gsea
dir <- system.file('extdata',package='ccfindR')
pbmc <- read_10x(dir)
pbmc <- vb_factorize(pbmc, ranks=5)
meta <- meta_gene.cv(pbmc,rank=5, gene_names=rowData(pbmc)[,2])
markers <- list('B cell'=c('CD74','IG','HLA'),
                'CD8+ T'=c('CD8A','CD8B','GZMK','CCR7','LTB'),
                'CD4+ T'=c('CD3D','CD3E','IL7R','LEF1'),
                'NK'=c('GNLY','NKG7','GZMA','GZMH'),
                'Macrophage'=c('S100A8','S100A9','CD14','LYZ','CFD'))
gsea <- assignCelltype(meta, rank=5, gset=markers, grp.prefix=c('IG','HLA'))
gsea

Basis matrices in an Object

Description

Retrieve or set the basis matrices W from factorization in an object

Usage

basis(object)
basis(object)

Arguments

object

Object of class scNMFSet

Details

After factorization, basis matrices corresponding to each rank value are stored as elements of a list, which is in slot basis of object of class scNMFSet. basis(object) will return the list of matrices. basis(object) <- value can be used to modify it.

Value

Either NULL or a list of same length as ranks(object), whose elements are basis matrices derived from factorization under each rank value.

Examples

s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s,ranks=seq(2,4))
basis(s)[[1]]
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s,ranks=seq(2,4))
basis(s)[[1]]

Basis matrix accessor

Description

Basis matrix accessor

Usage

## S4 method for signature 'scNMFSet'
basis(object)
## S4 method for signature 'scNMFSet'
basis(object)

Arguments

object

Object containing basis matrix

Value

List of basis matrices

Generics for basis matrix assignment

Description

Access and modify basis matrices

Usage

basis(object) <- value
basis(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Basis matrix to be substituted

Value

Input object with updated basis matrices

Examples

set.seed(1)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
basis(s)[[1]] <- apply(basis(s)[[1]],seq(1,2),round,digits=3)
basis(s)
set.seed(1)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
basis(s)[[1]] <- apply(basis(s)[[1]],seq(1,2),round,digits=3)
basis(s)

Modify basis matrices

Description

Access and modify basis matrices

Usage

## S4 replacement method for signature 'scNMFSet'
basis(object) <- value
## S4 replacement method for signature 'scNMFSet'
basis(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Basis matrix to be substituted

Value

Input object with updated basis matrices

Examples

set.seed(1)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
basis(s)[[1]] <- apply(basis(s)[[1]],c(1,2),round,digits=3)
basis(s)
set.seed(1)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
basis(s)[[1]] <- apply(basis(s)[[1]],c(1,2),round,digits=3)
basis(s)

Build tree connecting clusters at different ranks

Description

Build tree connecting clusters at different ranks

Usage

build_tree(object, rmax)
build_tree(object, rmax)

Arguments

`object`	Object of class `scNMFSet`
`rmax`	Maximum rank at which tree branching stops

Value

List containing the tree structure

Examples

set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
tree <- build_tree(s,rmax=5)
tree
set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
tree <- build_tree(s,rmax=5)
tree

ccfindR: Cancer Clone FindeR

Description

This package contains tools and utilities for cell-type discovery using single-cell transcriptomic data while evaluating significance of the depth of clustering (Woo et al. 2019).

References

Woo J, Winterhoff BJ, Starr TK, Aliferis C, Wang J (2019). “De novo prediction of cell-type complexity in single-cell RNA-seq and tumor microenvironments.” Life Sci. Alliance, 2, e201900443. http://dx.doi.org/10.26508/lsa.201900443.

Plot heatmap of clustering coefficient matrix

Description

Retrieve a coefficient matrix H derived from factorization by rank value and generate heatmap of its elements.

Usage

cell_map(object, rank, main = "Cells", ...)
cell_map(object, rank, main = "Cells", ...)

Arguments

`object`	Object of class `scNMFSet`.
`rank`	Rank value for which the cell map is to be displayed. The object must contain the corresponding slot: one element of `coeff(object)[[k]]` for which `ranks(object)[[k]]==rank`.
`main`	Title of plot.
`...`	Other arguments to be passed to `heatmap`, `image`, and `plot`.

Value

NULL

Examples

set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(100)
s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(100))
s <- vb_factorize(s,ranks=seq(2,5))
plot(s)
cell_map(s, rank=3)
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(100)
s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(100))
s <- vb_factorize(s,ranks=seq(2,5))
plot(s)
cell_map(s, rank=3)

Assign cells into clusters

Description

Use factorization results in an object to assign cells into clusters.

Usage

cluster_id(object, rank = 2)
cluster_id(object, rank = 2)

Arguments

`object`	Object of class `scNMFSet`
`rank`	Rank value whose factor matrices are to be used for assignment.

Value

Vector of length equal to the number of cells containing cluster ID numbers of each cell.

Examples

set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(count=x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
cid <- cluster_id(s, rank=5)
table(cid)
set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(count=x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
cid <- cluster_id(s, rank=5)
table(cid)

Coefficient matrices in an Object

Description

Retrieve or set the coefficient matrices from factorization in an object

Usage

coeff(object)
coeff(object)

Arguments

object

Object of class scNMFSet.

Details

After factorization, coefficient matrices H corresponding to each rank value are stored as elements of a list, which is in slot coeff of object of class scNMFSet. coeff(object) will return the list of matrices. coeff(object) <- value can be used to modify it.

Value

Either NULL or a list of same length as ranks(object), whose elements are coefficient matrices derived from factorization under each rank value.

Examples

s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s,ranks=seq(2,4))
coeff(s)[[1]]
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s,ranks=seq(2,4))
coeff(s)[[1]]

Coefficient matrix accessor

Description

Coefficient matrix accessor

Usage

## S4 method for signature 'scNMFSet'
coeff(object)
## S4 method for signature 'scNMFSet'
coeff(object)

Arguments

object

Object containing coefficient matrix

Value

List of coefficient matrices

Generics for coefficient matrix assignment

Description

Access and modify coefficient matrices

Usage

coeff(object) <- value
coeff(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Coefficient matrix to be substituted

Value

Input object with updated coefficient matrices

Examples

s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
coeff(s)[[1]] <- apply(coeff(s)[[1]],c(1,2),round,digits=2)
coeff(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
coeff(s)[[1]] <- apply(coeff(s)[[1]],c(1,2),round,digits=2)
coeff(s)

Modify coefficient matrices

Description

Can be used to access and modify coefficient matrices

Usage

## S4 replacement method for signature 'scNMFSet'
coeff(object) <- value
## S4 replacement method for signature 'scNMFSet'
coeff(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Coefficient matrix to be substituted

Value

Input object with updated coefficient matrices

Examples

s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
coeff(s)[[1]] <- apply(coeff(s)[[1]],c(1,2),round,digits=2)
coeff(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
coeff(s)[[1]] <- apply(coeff(s)[[1]],c(1,2),round,digits=2)
coeff(s)

Sample annotation accessor

Description

Sample annotation accessor

Usage

## S4 method for signature 'scNMFSet'
colData(x)
## S4 method for signature 'scNMFSet'
colData(x)

Arguments

`x`	Object containing sample annotation

Value

Column annotation DataFrame

Examples

library(S4Vectors)
x <- matrix(rpois(n=12,lambda=3),4,3)
rownames(x) <- seq_len(4)
colnames(x) <- c('a','b','c')
s <- scNMFSet(count=x,rowData=seq_len(4),colData=c('a','b','c'))
cols <- DataFrame(tissue=c('tissue1','tissue1','tissue2'))
rownames(cols) <- c('a','b','c')
colData(s) <- cols
s
library(S4Vectors)
x <- matrix(rpois(n=12,lambda=3),4,3)
rownames(x) <- seq_len(4)
colnames(x) <- c('a','b','c')
s <- scNMFSet(count=x,rowData=seq_len(4),colData=c('a','b','c'))
cols <- DataFrame(tissue=c('tissue1','tissue1','tissue2'))
rownames(cols) <- c('a','b','c')
colData(s) <- cols
s

Cell annotation assignment

Description

Cell annotation assignment

Usage

## S4 replacement method for signature 'scNMFSet,ANY'
colData(x) <- value
## S4 replacement method for signature 'scNMFSet,ANY'
colData(x) <- value

Arguments

`x`	Object containing cell annotation
`value`	DataFrame to be substituted

Value

Updated column annotation

Examples

library(S4Vectors)
x <- matrix(rpois(n=12,lambda=3),4,3)
rownames(x) <- seq_len(4)
colnames(x) <- c('a','b','c')
s <- scNMFSet(count=x,rowData=seq_len(4),colData=c('a','b','c'))
cols <- DataFrame(tissue=c('tissue1','tissue1','tissue2'))
rownames(cols) <- c('a','b','c')
colData(s) <- cols
s
library(S4Vectors)
x <- matrix(rpois(n=12,lambda=3),4,3)
rownames(x) <- seq_len(4)
colnames(x) <- c('a','b','c')
s <- scNMFSet(count=x,rowData=seq_len(4),colData=c('a','b','c'))
cols <- DataFrame(tissue=c('tissue1','tissue1','tissue2'))
rownames(cols) <- c('a','b','c')
colData(s) <- cols
s

Accessor for count matrix

Description

Accessor for count matrix

Usage

## S4 method for signature 'scNMFSet'
counts(object)
## S4 method for signature 'scNMFSet'
counts(object)

Arguments

object

Object containing count matrix

Value

Count matrix

Examples

s <- scNMFSet(count = matrix(rpois(n=12,lambda=3),3,4))
counts(s)

s <- scNMFSet(count = matrix(rpois(n=12,lambda=3),3,4))
counts(s)

Assignment of count matrix

Description

Count matrix can be modified

Usage

## S4 replacement method for signature 'scNMFSet'
counts(object) <- value
## S4 replacement method for signature 'scNMFSet'
counts(object) <- value

Arguments

`object`	Object containing count
`value`	Matrix-like object for replacement

Value

Object with updated count

Examples

mat <- matrix(rpois(n=12,lambda=3),3,4)
s <- scNMFSet(count = mat)
counts(s) <- mat^2
counts(s)
mat <- matrix(rpois(n=12,lambda=3),3,4)
s <- scNMFSet(count = mat)
counts(s) <- mat^2
counts(s)

Basis SD matrix accessor

Description

Basis SD matrix accessor

Usage

dbasis(object)
dbasis(object)

Arguments

object

Object containing dbasis matrix

Value

List of dbasis matrices

Basis SD matrix accessor

Description

Basis SD matrix accessor

Usage

## S4 method for signature 'scNMFSet'
dbasis(object)
## S4 method for signature 'scNMFSet'
dbasis(object)

Arguments

object

Object containing basis standard deviation (SD) matrix

Value

List of dbasis matrices

Basis SD matrix assignment

Description

Basis SD matrix assignment

Usage

dbasis(object) <- value
dbasis(object) <- value

Arguments

`object`	Object containing dbasis matrix
`value`	List for assignment

Value

Updated object

Modify dbasis matrices

Description

Access and modify dbasis matrices

Usage

## S4 replacement method for signature 'scNMFSet'
dbasis(object) <- value
## S4 replacement method for signature 'scNMFSet'
dbasis(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Basis SD matrix to be substituted

Value

Modified object

Coeff SD matrix accessor

Description

Coeff SD matrix accessor

Usage

dcoeff(object)
dcoeff(object)

Arguments

object

Object containing dcoeff matrix

Value

List of dcoeff matrices

Coeffcient SD matrix accessor

Description

Coeffcient SD matrix accessor

Usage

## S4 method for signature 'scNMFSet'
dcoeff(object)
## S4 method for signature 'scNMFSet'
dcoeff(object)

Arguments

object

Object containing coeffient standard deviation (SD) matrix

Value

List of dcoeff matrices

Coeff SD matrix assignment

Description

Coeff SD matrix assignment

Usage

dcoeff(object) <- value
dcoeff(object) <- value

Arguments

`object`	Object containing dcoeff matrix
`value`	List for assignment

Value

Updated object

Modify dcoeff matrices

Description

Access and modify dcoeff matrices

Usage

## S4 replacement method for signature 'scNMFSet'
dcoeff(object) <- value
## S4 replacement method for signature 'scNMFSet'
dcoeff(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Coeff SD matrix to be substituted

Value

Updated object

Maximum likelihood factorization

Description

Performs single or multiple rank NMF factorization of count matrix using maximum likelihood

Usage

factorize(object, ranks = 2, nrun = 20, randomize = FALSE,
  nsmpl = 1, verbose = 2, progress.bar = TRUE, Itmax = 10000,
  ncnn.step = 40, criterion = "likelihood", linkage = "average",
  Tol = 1e-05, store.connectivity = FALSE)
factorize(object, ranks = 2, nrun = 20, randomize = FALSE,
  nsmpl = 1, verbose = 2, progress.bar = TRUE, Itmax = 10000,
  ncnn.step = 40, criterion = "likelihood", linkage = "average",
  Tol = 1e-05, store.connectivity = FALSE)

Arguments

`object`	`scNMFSet` object containing count matrix.
`ranks`	Rank for factorization; can be a vector of multiple values.
`nrun`	No. of runs with different initial guess.
`randomize`	Boolean; if `TRUE`, input matrix is randomized.
`nsmpl`	No. of randomized samples to average over.
`verbose`	The verbosity level: 3, each iteration output printed; 2, each run output printed; 1, each randomized sample output printed; 0, silent.
`progress.bar`	Display progress bar when `nrun > 1` and `verbose = 1`.
`Itmax`	Maximum no. of iteration.
`ncnn.step`	Minimum no. of steps with no change in connectivity matrix to achieve convergence.
`criterion`	If `'likelihood'`, iteration stops when fractional changes in likelihood is below tolerance `Tol`. If `criterion = 'connectivity'`, iteration stops when connectivity matrix does not change for at least `ncnn.step` steps.
`linkage`	Method to be sent to `hclust` in calculating cophenetic correlation.
`Tol`	Tolerance for checking convergence with `criterion = 'likelihood'`.
`store.connectivity`	Returns a list also containing connectivity data.

Details

The main input is the scNMFSet object with count matrix. This function performs non-negative factorization and fills in the empty slots basis, coeff, and ranks.

When run with multiple values of ranks, factorization is repeated for each rank and the slot measure contains quality measures of the ranks. The quality measure likelihood is negative the KL distance of the fit to the target. With nrun > 1, the likelihood is the maximum among all runs.

The quality measure dispersion is the scalar measure of how far the connectivity matrix is from 0, 1. With increasing nrun, dispersion decreases from 1. nrun should be chosen such that dispersion does not change appreciably. With randomization, count matrix of object is shuffled. nsmpl can be used to average over multiple permutations. This averaging applies to each quality measure under a given rank.

Value

Object of class scNMFSet with factorization slots filled.

Examples

set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30))
s <- scNMFSet(count=x)
s <- factorize(s,ranks=seq(2,8),nrun=5)
plot(s)
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30))
s <- scNMFSet(count=x)
s <- factorize(s,ranks=seq(2,8),nrun=5)
plot(s)

Plot heatmap of basis matrix

Description

Generate heatmap of features derived from factorization of count data.

Usage

feature_map(object, basis.matrix = NULL, rank, markers = NULL,
  subtract.mean = TRUE, log = TRUE, max.per.cluster = 10,
  feature.names = NULL, perm = NULL, main = "Feature map",
  cscale = NULL, cex.cluster = 1, cex.feature = 0.5, mar = NULL,
  ...)
feature_map(object, basis.matrix = NULL, rank, markers = NULL,
  subtract.mean = TRUE, log = TRUE, max.per.cluster = 10,
  feature.names = NULL, perm = NULL, main = "Feature map",
  cscale = NULL, cex.cluster = 1, cex.feature = 0.5, mar = NULL,
  ...)

Arguments

`object`	Object of class `scNMFSet`.
`basis.matrix`	Basis matrix can be supplied instead of `object`.
`rank`	Rank value for which the gene map is to be displayed. The object must contain the corresponding slot (one element of `basis(object)[[k]]` for which `ranks(object)[[k]]==rank`.
`markers`	Vector of gene names containing markers to be included in addition to the metagenes. All entries of `rowData(object)` matching them will be added to the metagene list.
`subtract.mean`	Process each rows of basis matrix `W` by standardization using the mean of elements within the row.
`log`	If `TRUE`, `subtract.mean` uses geometric mean and division. Otherwise, use arithmetic mean and subtraction.
`max.per.cluster`	Maximum number of metagenes per cluster.
`feature.names`	Names to be used in the plot for features.
`perm`	Permutation of cluster IDs.
`main`	Main title.
`cscale`	Colors for heatmap.
`cex.cluster`	Cluster ID label size.
`cex.feature`	Feature ID label size.
`mar`	Margins for `graphics::par`.
`...`	Other arguments to be passed to `image`, and `plot`.

Details

This function uses image() and is more flexible than gene_map.

If object contains multiple ranks, only the requested rank's basis matrix W will be displayed. As in gene_map, the features displayed in rows are selected by "max" scheme

Value

NULL

Examples

 
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
 
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(100)
s <- scNMFSet(count=x,rowData=seq_len(10), colData=seq_len(100))
s <- vb_factorize(s,ranks=seq(2,5))
plot(s)
feature_map(s, rank=3)
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
 
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(100)
s <- scNMFSet(count=x,rowData=seq_len(10), colData=seq_len(100))
s <- vb_factorize(s,ranks=seq(2,5))
plot(s)
feature_map(s, rank=3)

Filter cells with quality control criteria

Description

Remove low quality cell entries from object

Usage

filter_cells(object, umi.min = 0, umi.max = Inf, plot = TRUE,
  remove.zeros = TRUE)
filter_cells(object, umi.min = 0, umi.max = Inf, plot = TRUE,
  remove.zeros = TRUE)

Arguments

`object`	`scNMFSet` object
`umi.min`	Minimum UMI count for cell filtering
`umi.max`	Maximum UMI count for cell filtering
`plot`	If `TRUE`, the UMI count distribution of all cells will be displayed. Cells selected are colored red.
`remove.zeros`	Remove rows/columns containing zeros only

Details

Takes as input scNMFSet object and plots histogram of UMI counts for each cell. Optionally, cells are filtered using minimum and maximum UMI counts. The resulting object is returned after removing empty rows and columns, if any.

Value

scNMFSet object with cells filtered.

Examples

set.seed(1)
s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30))
s <- filter_cells(s,umi.min=10^2.0,umi.max=10^2.1)
set.seed(1)
s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30))
s <- filter_cells(s,umi.min=10^2.0,umi.max=10^2.1)

Filter genes with quality control criteria

Description

Select genes with high relative variance in count data for further analysis

Usage

filter_genes(object, markers = NULL, vmr.min = 0,
  min.cells.expressed = 0, max.cells.expressed = Inf,
  rescue.genes = FALSE, progress.bar = TRUE, save.memory = FALSE,
  plot = TRUE, log = "xy", cex = 0.5)
filter_genes(object, markers = NULL, vmr.min = 0,
  min.cells.expressed = 0, max.cells.expressed = Inf,
  rescue.genes = FALSE, progress.bar = TRUE, save.memory = FALSE,
  plot = TRUE, log = "xy", cex = 0.5)

Arguments

`object`	`scNMFSet` object.
`markers`	A vector containing marker genes to be selected. All rows in `rowData` that contain columns matching this set will be selected.
`vmr.min`	Minimum variance-to-mean ratio for gene filtering.
`min.cells.expressed`	Minimum no. of cells expressed for gene filtering.
`max.cells.expressed`	Maximum no. of cells expressed for gene filtering.
`rescue.genes`	Selected additional genes whose (non-zero) count distributions have at least one mode.
`progress.bar`	Display progress of mode-gene scan or VMR calculation with `save.memory = TRUE`.
`save.memory`	For a very large number of cells, calculate VMR row by row while avoiding calls to `as.matrix()`. Progress bar will be displayed unless `progress.bar=FALSE`.
`plot`	Plot the distribution of no. of cells expressed vs. VMR.
`log`	Axis in log-scale, `c('x','y','xy')`.
`cex`	Symbol size for each gene in the plot.

Details

Takes as input scNMFSet object and scatterplot no. of cells expressed versus VMR (variance-to-mean ratio) for each gene. Optionally, genes are filtered using minimum VMR together with a range of no. of cells expressed.

Value

Object of class scNMFSet.

Examples

set.seed(1)
s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30))
s <- filter_genes(s,vmr.min=1.0,min.cells.expressed=28,
        rescue.genes=FALSE)
set.seed(1)
s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30))
s <- filter_genes(s,vmr.min=1.0,min.cells.expressed=28,
        rescue.genes=FALSE)

Plot heatmap of metagene matrix

Description

Generate heatmap of metagenes derived from factorization of count data.

Usage

gene_map(object, rank, markers = NULL, subtract.mean = TRUE,
  log = TRUE, max.per.cluster = 10, Colv = NA, gene.names = NULL,
  main = "Genes", col = NULL, ...)
gene_map(object, rank, markers = NULL, subtract.mean = TRUE,
  log = TRUE, max.per.cluster = 10, Colv = NA, gene.names = NULL,
  main = "Genes", col = NULL, ...)

Arguments

`object`	Object of class `scNMFSet`.
`rank`	Rank value for which the gene map is to be displayed. The object must contain the corresponding slot (one element of `basis(object)[[k]]` for which `ranks(object)[[k]]==rank`.
`markers`	Vector of gene names containing markers to be included in addition to the metagenes. All entries of `rowData(object)` matching them will be added to the metagene list.
`subtract.mean`	Process each rows of basis matrix `W` by standardization using the mean of elements within the row.
`log`	If `TRUE`, `subtract.mean` uses geometric mean and division. Otherwise, use arithmetic mean and subtraction.
`max.per.cluster`	Maximum number of metagenes per cluster.
`Colv`	`NA` suppresses reordering and dendrogram of clusters along the column. See `heatmap`.
`gene.names`	Names to be used in the plot for genes.
`main`	Title of plot.
`col`	Colors for the cluster panels on the left and top.
`...`	Other arguments to be passed to `heatmap`, `image`, and `plot`.

Details

Wrapper for heatmap to display metagenes and associated basis matrix element magnitudes. Factorization results inside an object specified by its rank value will be retrieved, and metagene sets identified from clusters.

If object contains multiple ranks, only the requested rank's basis matrix W will be displayed. The genes displayed in rows are selected by "max" scheme [Carmona-Saez, BMC Bioinformatics (2006), https://doi.org/10.1186/1471-2105-7-54]: for each cluster (k in 1:ncol), rows of W are sorted by decreasing order of W[,k]. Marker genes for k are those among the top nmarker for which W[,k] is maximum within each row.

Value

NULL

Examples

 
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(100)
s <- scNMFSet(count=x,rowData=seq_len(10), colData=seq_len(100))
s <- vb_factorize(s,ranks=seq(2,5))
plot(s)
gene_map(s, rank=3)
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(100)
s <- scNMFSet(count=x,rowData=seq_len(10), colData=seq_len(100))
s <- vb_factorize(s,ranks=seq(2,5))
plot(s)
gene_map(s, rank=3)

Factorization measures in an Object

Description

Retrieve or set factorization measures in an object

Usage

measure(object)
measure(object)

Arguments

object

Object of class scNMFSet.

Details

Factorization under multiple rank values lead to measures stored in a data frame inside a slot measure. In maximum likelihood using factorize, this set of quality measures include dispersion and cophenetic coeeficients for each rank. In Bayesian factorization using vb_factorize, log evidence for each rank is stored. measure(object) will return the data frame. measure(object) <- value can be used to modify it.

Value

Either NULL or a data frame containing measures.

Examples

s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s,ranks=seq(2,4))
measure(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s,ranks=seq(2,4))
measure(s)

Rank measure accessor

Description

Rank measure accessor

Usage

## S4 method for signature 'scNMFSet'
measure(object)
## S4 method for signature 'scNMFSet'
measure(object)

Arguments

object

Object containing measure

Value

Data frame of measure

Generics for factorization measure assignment

Description

Can be used to access and modify factorization measure

Usage

measure(object) <- value
measure(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Measure to be substituted

Value

Input object with updated measure

Examples

s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
measure(s)[,-1] <- apply(measure(s)[,-1], c(1,2), round,digits=3)
measure(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
measure(s)[,-1] <- apply(measure(s)[,-1], c(1,2), round,digits=3)
measure(s)

Modify factorization measure

Description

Can be used to access and modify factorization measure

Usage

## S4 replacement method for signature 'scNMFSet'
measure(object) <- value
## S4 replacement method for signature 'scNMFSet'
measure(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Measure to be substituted

Value

Input object with updated measure

Examples

s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
measure(s)[,-1] <- apply(measure(s)[,-1], c(1,2), round,digits=3)
measure(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=3)
measure(s)[,-1] <- apply(measure(s)[,-1], c(1,2), round,digits=3)
measure(s)

Meta gene table with CV

Description

Generates meta gene table with coefficient of variation

Usage

meta_gene.cv(object = NULL, rank, basis.matrix = NULL, dbasis = NULL,
  max.per.cluster = 100, gene_names = NULL, subtract.mean = TRUE,
  log = TRUE, cv.max = Inf)
meta_gene.cv(object = NULL, rank, basis.matrix = NULL, dbasis = NULL,
  max.per.cluster = 100, gene_names = NULL, subtract.mean = TRUE,
  log = TRUE, cv.max = Inf)

Arguments

`object`	Main object containing factorization outcome
`rank`	Rank for which meta gene is to be found
`basis.matrix`	Basis matrix to work with. Only necessary when `object` is `NULL`.
`dbasis`	Variance of basis matrix. Only necessary when `object` is `NULL`.
`max.per.cluster`	Maximum meta genes per cluster.
`gene_names`	Name of genes. If `NULL`, will be taken from row names.
`subtract.mean`	Standardize magnitudes of basis elements by subtracting mean
`log`	Use geometric mean.
`cv.max`	Upper bound for CV in selecting meta genes.

Value

Data frame with meta genes and their CV in each column.

Examples

set.seed(1)
x <- simulate_whx(nrow=50, ncol=100, rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s, ranks=seq(2,8), nrun=5)
plot(s)
meta_gene.cv(s, rank=5)
set.seed(1)
x <- simulate_whx(nrow=50, ncol=100, rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s, ranks=seq(2,8), nrun=5)
plot(s)
meta_gene.cv(s, rank=5)

Find metagenes from basis matrix

Description

Retrieve a basis matrix from an object and find metagenes.

Usage

meta_genes(object, rank, basis.matrix = NULL, max.per.cluster = 10,
  gene_names = NULL, subtract.mean = TRUE, log = TRUE)
meta_genes(object, rank, basis.matrix = NULL, max.per.cluster = 10,
  gene_names = NULL, subtract.mean = TRUE, log = TRUE)

Arguments

`object`	Object of class `scNMFSet`.
`rank`	Rank value for which metagenes are to be found.
`basis.matrix`	Instead of an object containing basis matrices, the matrix itself can be provided.
`max.per.cluster`	Maximum number of metagenes per cluster.
`gene_names`	Names of genes to replace row names of basis matrix.
`subtract.mean`	Standardize the matrix elements with means within each row.
`log`	Use geometric mean and division instead of arithmetic mean and subtraction with `subtract.mean`.

Value

List of vectors each containing metagene names of clusters.

Examples

set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(100)
s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(100))
s <- vb_factorize(s,ranks=seq(2,5))
meta_genes(s, rank=4)
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(100)
s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(100))
s <- vb_factorize(s,ranks=seq(2,5))
meta_genes(s, rank=4)

Generate Newick format tree string from tree list object

Description

Generate Newick format tree string from tree list object

Usage

newick(tree, parent = "1.1", string = "")
newick(tree, parent = "1.1", string = "")

Arguments

`tree`	Tree list object from `build_tree`
`parent`	Parent ID
`string`	Newick string of parent tree

Value

String of newick tree

Examples

set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
tree <- build_tree(s,rmax=5)
nw <- newick(tree=tree)
nw
set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
tree <- build_tree(s,rmax=5)
nw <- newick(tree=tree)
nw

Normalize count data

Description

Rescale count matrix entries such that all cells have the same library size.

Usage

normalize_count(object)
normalize_count(object)

Arguments

object

scNMFSet object.

Details

For analysis purposes, it is sometimes useful to rescale integer count data into floats such that all cells have the same median counts. This function will calculate the median of all UMI counts of cells (total number of RNAs derived from each cell). All count data are then rescaled such that cells have uniform UMI count equal to the median.

Value

scNMFSet object with normalized count data.

Examples

library(Matrix)
set.seed(1)
s <- scNMFSet(count=matrix(rpois(n=1200,lambda=3),40,30))
colMeans(counts(s))
s <- normalize_count(s)
colMeans(counts(s))
library(Matrix)
set.seed(1)
s <- scNMFSet(count=matrix(rpois(n=1200,lambda=3),40,30))
colMeans(counts(s))
s <- normalize_count(s)
colMeans(counts(s))

Determine optimal rank

Description

Takes as main argument scNMFSet object containing factorized output and estimate the optimal rank.

Usage

optimal_rank(object, df = 10, BF.threshold = 3, type = NULL,
  m = NULL)
optimal_rank(object, df = 10, BF.threshold = 3, type = NULL,
  m = NULL)

Arguments

`object`	`scNMFSet` object containing factorization output, or data frame containing the rank-evidence profile.
`df`	Degrees of freedom for split fit. Upper bound is the total number of data points (number of rank values scanned).
`BF.threshold`	Bayes factor threshold for statistical threshold.
`type`	`c(1,2)`. Type 1 is where there is a clear maximum. Type 2 is where marginal likelihood reaches a maximal level and stays constant. If omitted, the type will be inferred from data.
`m`	Number of features (e.g., genes) in the count matrix. Only necessary when `object` is of type `data.frame`.

Details

The input object is used along with Bayes factor threshold to determine the heterogeneity type (1 or 2) and the optimal rank. If evidence(rank 1)/evidence(rank2) > BF.treshold, rank 1 is favorable than rank 2.

Value

List containing type and ropt (optimal rank).

Examples

set.seed(1)
x <- simulate_whx(nrow=50, ncol=100, rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s, ranks=seq(2,8), nrun=5)
plot(s)
optimal_rank(s)
set.seed(1)
x <- simulate_whx(nrow=50, ncol=100, rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s, ranks=seq(2,8), nrun=5)
plot(s)
optimal_rank(s)

Plot gene variance distributions

Description

Gene variance to mean ratio and the number of expressing cells are plotted.

Usage

plot_genes(object, vmr = NULL, ncexpr = NULL, selected_genes = NULL,
  variable_genes = NULL, mode_genes = NULL, marker_genes = NULL,
  save.memory = FALSE, progress.bar = TRUE, log = "xy", cex = 0.5)
plot_genes(object, vmr = NULL, ncexpr = NULL, selected_genes = NULL,
  variable_genes = NULL, mode_genes = NULL, marker_genes = NULL,
  save.memory = FALSE, progress.bar = TRUE, log = "xy", cex = 0.5)

Arguments

`object`	Object containing count data
`vmr`	Variance to mean ratio (VMR)
`ncexpr`	Number of cells expressing each gene
`selected_genes`	Logical vector specifing genes selected
`variable_genes`	Logical vector specifing genes with high VMR
`mode_genes`	Logical vector specifying genes with nonzero modes
`marker_genes`	Logical vector specifying marker genes
`save.memory`	If `TRUE`, calculate VMR using slower method to save memory. Not used when gene lists are supplied.
`progress.bar`	Display progress bar for VMR calculation. Not used when gene lists are supplied.
`log`	Axis in log-scale, `c('x','y','xy')`.
`cex`	Symbol size for genes (supplied to `plot()`).

Details

This function can be called separately or is also called within filter_genes by default. In the latter case, parameters other than object will have been already filled. If called separately with NULL gene lists, VMR is recalculated but gene selection is not done.

Value

NULL

Examples

set.seed(1)
s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30))
plot_genes(s)
set.seed(1)
s <- scNMFSet(matrix(stats::rpois(n=1200,lambda=3),40,30))
plot_genes(s)

Plot cluster tree

Description

Visualize the output of build_tree as a dendrogram.

Usage

plot_tree(tree, direction = "rightwards", cex = 0.7, ...)
plot_tree(tree, direction = "rightwards", cex = 0.7, ...)

Arguments

`tree`	List containing tree structure. Output from `build_tree`
`direction`	`c('rightwards','downwards')`; the direction of dendrogram
`cex`	Font size of edge/tip labels
`...`	Other parameters to `plot.phylo`

Details

Uses plot.phylo to visualize cluster tree.

Value

NULL

Examples

set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
tree <- build_tree(s,rmax=5)
plot_tree(tree)
set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
tree <- build_tree(s,rmax=5)
plot_tree(tree)

Rank values in an Object

Description

Retrieve or set the rank values in an object

Usage

ranks(object)
ranks(object)

Arguments

object

Object of class scNMFSet.

Details

Ranks for which factorization has been performed are stored in slot ranks of scNMFSet object. ranks(object) will return the rank vector. ranks(object) <- value can be used to modify it.

Value

Either NULL or vector.

Examples

s <- scNMFSet(matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s,ranks=seq(2,4))
ranks(s)
s <- scNMFSet(matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s,ranks=seq(2,4))
ranks(s)

Rank accessor

Description

Rank accessor

Usage

## S4 method for signature 'scNMFSet'
ranks(object)
## S4 method for signature 'scNMFSet'
ranks(object)

Arguments

object

Object containing rank values

Value

Vector of rank values

Generics for ranks assignment

Description

Replace ranks slot of scNMFSet object

Usage

ranks(object) <- value
ranks(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Rank values (vector) to be substituted

Value

Input object with updated ranks

Examples

s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=seq(2,3))
ranks(s) <- c('two','three')
ranks(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=seq(2,3))
ranks(s) <- c('two','three')
ranks(s)

Modify ranks

Description

Replace ranks slot of scNMFSet object

Usage

## S4 replacement method for signature 'scNMFSet'
ranks(object) <- value
## S4 replacement method for signature 'scNMFSet'
ranks(object) <- value

Arguments

`object`	Object of class `scNMFSet`
`value`	Rank values (vector) to be substituted

Value

Input object with updated ranks

Examples

s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=seq(2,3))
ranks(s) <- c('two','three')
ranks(s)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
s <- vb_factorize(s, ranks=seq(2,3))
ranks(s) <- c('two','three')
ranks(s)

Read 10x data and generate scNMF object

Description

Read count, gene, and barcode annotation data in 10x format and create an object of class scNMFSet.

Usage

read_10x(dir, count = "matrix.mtx", genes = "genes.tsv",
  barcodes = "barcodes.tsv", remove.zeros = TRUE)
read_10x(dir, count = "matrix.mtx", genes = "genes.tsv",
  barcodes = "barcodes.tsv", remove.zeros = TRUE)

Arguments

`dir`	Name of directory containing data files.
`count`	Name of count matrix file.
`genes`	Name of gene annotation file.
`barcodes`	Name of cell annotation file.
`remove.zeros`	If `TRUE`, empty rows/columns are removed.

Details

Files for count, genes, and barcodes are assumed to be present in dir. Count data are in sparse "Matrix Market" format (https://math.nist.gov/MatrixMarket/formats.html).

Value

Object of class scNMFSet

Examples

library(S4Vectors)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
rowData(s) <- DataFrame(seq_len(4))
colData(s) <- DataFrame(seq_len(3))
write_10x(s,dir='.')
s <- read_10x(dir='.')
s
library(S4Vectors)
s <- scNMFSet(count=matrix(rpois(n=12,lambda=3),4,3))
rowData(s) <- DataFrame(seq_len(4))
colData(s) <- DataFrame(seq_len(3))
write_10x(s,dir='.')
s <- read_10x(dir='.')
s

Remove rows or columns that are empty from an object

Description

Remove rows or columns that are empty from an object

Usage

remove_zeros(object)
remove_zeros(object)

Arguments

object

Object containing data

Value

Object with empty rows/columns removed

Examples

set.seed(1)
x <- matrix(rpois(n=100,lambda=0.1),10,10)
s <- scNMFSet(count=x,remove.zeros=FALSE)
s2 <- remove_zeros(s)
s2
set.seed(1)
x <- matrix(rpois(n=100,lambda=0.1),10,10)
s <- scNMFSet(count=x,remove.zeros=FALSE)
s2 <- remove_zeros(s)
s2

Rename tips of trees with cell types

Description

Rename tips of trees with cell types

Usage

rename_tips(tree, rank, tip.labels)
rename_tips(tree, rank, tip.labels)

Arguments

`tree`	List containing tree
`rank`	Rank value of which tip names are to be replaced
`tip.labels`	Vector of new names for tips

Value

List containing tree with updated tip labels

Examples

set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
tree <- build_tree(s,rmax=5)
tree <- rename_tips(tree,rank=5,tip.labels=letters[seq_len(5)])
tree 
set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
tree <- build_tree(s,rmax=5)
tree <- rename_tips(tree,rank=5,tip.labels=letters[seq_len(5)])
tree

Feature annotation accessor

Description

Feature annotation accessor

Usage

## S4 method for signature 'scNMFSet'
rowData(x)
## S4 method for signature 'scNMFSet'
rowData(x)

Arguments

`x`	Object containing data

Value

DataFrame of feature annotation

Examples

x <- matrix(rpois(n=12,lambda=3),4,3)
rownames(x) <- seq_len(4)
colnames(x) <- seq_len(3)
s <- scNMFSet(count=x,rowData=seq_len(4),colData=seq_len(3))
rowData(s) 
x <- matrix(rpois(n=12,lambda=3),4,3)
rownames(x) <- seq_len(4)
colnames(x) <- seq_len(3)
s <- scNMFSet(count=x,rowData=seq_len(4),colData=seq_len(3))
rowData(s)

Gene annotation assignment

Description

Gene annotation assignment

Usage

## S4 replacement method for signature 'scNMFSet'
rowData(x) <- value
## S4 replacement method for signature 'scNMFSet'
rowData(x) <- value

Arguments

`x`	Object containing data
`value`	DataFrame of row annotation to be substituted

Value

Row annotation DataFrame

Create `scNMFSet` object

Description

Object derived from SingleCellExperiment

Usage

scNMFSet(count = NULL, ..., remove.zeros = TRUE)
scNMFSet(count = NULL, ..., remove.zeros = TRUE)

Arguments

`count`	Count matrix
`...`	Other parameters of `SingleCellExperiment`
`remove.zeros`	Remove empty rows and columns

Value

Object of class scNMFSet.

Examples

count <- matrix(rpois(n=12,lambda=2),4,3)
s <- scNMFSet(count=count)
s
count <- matrix(rpois(n=12,lambda=2),4,3)
s <- scNMFSet(count=count)
s

Class `scNMFSet` for storing input data and results

Description

S4 class derived from SingleCellExperiment that can store single-cell count matrix, gene and cell annotation data frames, and factorization factors as well as quality measures for rank determination.

Usage

## S4 method for signature 'scNMFSet,ANY'
plot(x)
## S4 method for signature 'scNMFSet,ANY'
plot(x)

Arguments

`x`	Object containing measure

Value

Object of class scNMFSet

NULL

Methods (by generic)

plot: Plot measures of an object. For quality measures derived from maximum likelihood inference, dispersion and cophenetic will be plotted separately.

For measure derived from Bayesian inference, log evidence as a function of rank values will be plotted.

Slots

assays

Named list for count matrix counts.

rowData

DataFrame for gene (feature) names and annotations in columns.

colData

DataFrame for cell IDs and other annotations in columns (e.g., barcodes, cell types).

ranks

Vector for rank values for which factorization has been performed.

basis

List (of length equal to that of ranks) of basis matrices W from factorization; dimension nrow x rank, where nrow is no. of rows in count.

coeff

List (of length equal to that of ranks) of coefficient matrices H from factorization; dimension rank x ncol, where ncol is no. of columns in count.

measure

Data frame of factorization quality measures for each rank (likelihood and dispersion).

Other slots inherited from SingleCellExperiment class are not explicitly used.

Examples

library(S4Vectors)
# toy matrix
ngenes <- 8 
ncells <- 5
mat <- matrix(rpois(n=ngenes*ncells,lambda=3),ngenes,ncells)

abc <- letters[seq_len(ngenes)]
ABC <- LETTERS[seq_len(ncells)] 
genes <- DataFrame(gene_id=abc)
cells <- DataFrame(cell_id=ABC)
rownames(mat) <- rownames(genes) <- abc
colnames(mat) <- rownames(cells) <- ABC

# create scNMFSet object
s <- scNMFSet(count=mat,rowData=genes,colData=cells)
# alternative ways
s2 <- scNMFSet(count=mat)
s2 <- scNMFSet(assays=list(counts=mat))        

# show dimensions
dim(s)

# show slots
rowData(s)

# modify slots
colData(s) <- DataFrame(cell_id=seq_len(ncells),
              cell_type=c(rep('tissue1',2),
                          rep('tissue2',ncells-2)))
colData(s)
library(S4Vectors)
# toy matrix
ngenes <- 8 
ncells <- 5
mat <- matrix(rpois(n=ngenes*ncells,lambda=3),ngenes,ncells)

abc <- letters[seq_len(ngenes)]
ABC <- LETTERS[seq_len(ncells)] 
genes <- DataFrame(gene_id=abc)
cells <- DataFrame(cell_id=ABC)
rownames(mat) <- rownames(genes) <- abc
colnames(mat) <- rownames(cells) <- ABC

# create scNMFSet object
s <- scNMFSet(count=mat,rowData=genes,colData=cells)
# alternative ways
s2 <- scNMFSet(count=mat)
s2 <- scNMFSet(assays=list(counts=mat))        

# show dimensions
dim(s)

# show slots
rowData(s)

# modify slots
colData(s) <- DataFrame(cell_id=seq_len(ncells),
              cell_type=c(rep('tissue1',2),
                          rep('tissue2',ncells-2)))
colData(s)

Display object

Description

Display the class and dimension of an object

Object name itself on command line or (show(object)) will display class and dimensionality

Usage

## S4 method for signature 'scNMFSet'
show(object)
## S4 method for signature 'scNMFSet'
show(object)

Arguments

object

Object of class scNMFSet

Value

NULL

Examples

s <- scNMFSet(matrix(rpois(n=12,lambda=3),4,3))
show(s)
s <- scNMFSet(matrix(rpois(n=12,lambda=3),4,3))
show(s)

Generate simulated data for factorization

Description

Use one of two schemes to generate simulated data suitable for testing factorization.

Usage

simulate_data(nfeatures, nsamples, generate.factors = FALSE,
  nfactor = 10, alpha0 = 0.5, shuffle = TRUE)
simulate_data(nfeatures, nsamples, generate.factors = FALSE,
  nfactor = 10, alpha0 = 0.5, shuffle = TRUE)

Arguments

`nfeatures`	Number of features `m` (e.g., genes).
`nsamples`	Vector of sample sizes in each cluster. Rank `r` is equal to the length of this vector. Sum of elements is the total sample size `n`.
`generate.factors`	Generate factor matrices `W` and `H`, each with dimension `n x r` and `r x n`. If `FALSE`, factor matrices are not used and count data are generated directly from `r` multinomials for `m` genes.
`nfactor`	Total RNA count of multinomials for each cluster with `generate.factors = FALSE`. Small `nfactor` will yield sparse count matrix.
`alpha0`	Variance parameter of Dirichlet distribution from which multinomial probabilities are sampled with `generate.factors = FALSE`.
`shuffle`	Randomly permute rows and columns of count matrix.

Details

In one scheme (generate.factors = TRUE), simulated factor matrices W and H are used to build count data X = WH. In the second scheme, factor matrices are not used and X is sampled directly from r (rank requested) sets of multinomial distributions.

Value

If generate.factors = TRUE, list of components w (basis matrix, nfeatures x rank), h (coefficient matrix, rank x ncells, where ncells is equal to n, the sum of nsamples), and x, a matrix of Poisson deviates with mean W x H. If generate.factors = FALSE, only the count matrix x is in the list.

Examples

set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30))
s <- scNMFSet(x)
s
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30))
s <- scNMFSet(x)
s

Simulate factor matrices and data using priors

Description

Under Bayesian formulation, use prior distributions of factor matrices and generate simulated data

Usage

simulate_whx(nrow, ncol, rank, aw = 0.1, bw = 1, ah = 0.1, bh = 1)
simulate_whx(nrow, ncol, rank, aw = 0.1, bw = 1, ah = 0.1, bh = 1)

Arguments

`nrow`	Number of features (genes).
`ncol`	Number of cells (samples).
`rank`	Rank (ncol of W, nrow of H).
`aw`	Shape parameter of basis prior.
`bw`	Mean of basis prior. Scale parameter is equal to `aw/bw`.
`ah`	Shape parameter of coefficient prior.
`bh`	Mean of coefficient prior. Scale parameter is equal to `ah/bh`.

Details

Basis W and coefficient matrices H are sampled from gamma distributions (priors) with shape (aw,ah) and mean (bw,bh) parameters. Count data X are sampled from Poisson distribution with mean values given by WH.

Value

List with elements w, h, and x, each containing basis, coefficient, and count matrices.

Examples

set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(count=x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
plot(s)  
set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(count=x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
plot(s)

Bayesian NMF inference of count matrix

Description

Perform variational Bayes NMF and store factor matrices in object

Usage

vb_factorize(object, ranks = 2, nrun = 1, verbose = 2,
  progress.bar = TRUE, initializer = "random", Itmax = 10000,
  hyper.update = rep(TRUE, 4), gamma.a = 1, gamma.b = 1,
  Tol = 1e-05, hyper.update.n0 = 10, hyper.update.dn = 1,
  connectivity = TRUE, fudge = NULL, ncores = 1, useC = TRUE,
  unif.stop = TRUE)
vb_factorize(object, ranks = 2, nrun = 1, verbose = 2,
  progress.bar = TRUE, initializer = "random", Itmax = 10000,
  hyper.update = rep(TRUE, 4), gamma.a = 1, gamma.b = 1,
  Tol = 1e-05, hyper.update.n0 = 10, hyper.update.dn = 1,
  connectivity = TRUE, fudge = NULL, ncores = 1, useC = TRUE,
  unif.stop = TRUE)

Arguments

`object`	`scNMFSet` object containing count matrix.
`ranks`	Rank for factorization; can be a vector of multiple values.
`nrun`	No. of runs with different initial guesses.
`verbose`	The verbosity level: 3, each iteration output printed; 2, each run output printed; 1, each randomized sample output printed; 0, silent.
`progress.bar`	Display progress bar with `verbose = 1` for multiple runs.
`initializer`	If `'random'`, randomized initial conditions; `'svd2'` for singular value decomposed initial condition.
`Itmax`	Maximum no. of iteration.
`hyper.update`	Vector of four logicals, each indcating whether hyperparameters `c(aw, bw, ah, bh)` should be optimized.
`gamma.a`	Gamma distribution shape parameter.
`gamma.b`	Gamma distribution mean. These two parameters are used for fixed hyperparameters with `hyper.update` elements `FALSE`.
`Tol`	Tolerance for terminating iteration.
`hyper.update.n0`	Initial number of steps in which hyperparameters are fixed.
`hyper.update.dn`	Step intervals for hyperparameter updates.
`connectivity`	If `TRUE`, connectivity and dispersion will be calculated after each run. Can be turned off to save memory.
`fudge`	Small positive number used as lower bound for factor matrix elements to avoid singularity. If `fudge = NULL` (default), it will be replaced by `.Machine$double.eps`. Can be set to 0 to skip regularization.
`ncores`	Number of processors (cores) to run. If `ncores > 1`, parallelization is attempted.
`useC`	Use C++ version of updates for speed.
`unif.stop`	Terminate if any of columns in basis matrix is uniform.

Details

The main input is the scNMFSet object with count matrix. This function performs non-negative factorization using Bayesian algorithm and gamma priors. Slots basis, coeff, and ranks are filled.

When run with multiple values of ranks, factorization is repeated for each rank and the slot measure contains log evidence and optimal hyperparameters for each rank. With nrun > 1, the solution with the maximum log evidence is stored for a given rank.

Value

Object of class scNMFSet with factorization slots filled.

Examples

set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
plot(s)
set.seed(1)
x <- simulate_whx(nrow=50,ncol=100,rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s,ranks=seq(2,8),nrun=5)
plot(s)

Visualize clusters

Description

Use tSNE to generate two-dimensional map of coefficient matrix.

Usage

visualize_clusters(object, rank, verbose = FALSE, cex = 1,
  cex.names = 0.7, ...)
visualize_clusters(object, rank, verbose = FALSE, cex = 1,
  cex.names = 0.7, ...)

Arguments

`object`	`scNMF` object.
`rank`	Rank value to extract from `object`.
`verbose`	Print tSNE messages.
`cex`	Symbol size in tSNE plot
`cex.names`	Font size of labels in count barplot.
`...`	Other parameters to send to `Rtsne`.

Details

It retrieves a coefficient matrix H from an object and use its elements to assign each cell into clusters. t-Distributed Stochastic Neighbor Embedding (t-SNE; https://lvdmaaten.github.io/tsne/) is used to visualize the clustering in 2D. Also plotted is the distribution of cell counts for all clusters.

Value

NULL

Examples

set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(170)
s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(170))
s <- vb_factorize(s,ranks=seq(2,5))
visualize_clusters(s,rank=5)
set.seed(1)
x <- simulate_data(nfeatures=10,nsamples=c(20,20,60,40,30))
rownames(x) <- seq_len(10)
colnames(x) <- seq_len(170)
s <- scNMFSet(count=x,rowData=seq_len(10),colData=seq_len(170))
s <- vb_factorize(s,ranks=seq(2,5))
visualize_clusters(s,rank=5)

Write 10x data files

Description

Use an object and write count and annotation files in 10x format.

Usage

write_10x(object, dir, count = "matrix.mtx", genes = "genes.tsv",
  barcodes = "barcodes.tsv", quote = FALSE)
write_10x(object, dir, count = "matrix.mtx", genes = "genes.tsv",
  barcodes = "barcodes.tsv", quote = FALSE)

Arguments

`object`	Object of class `scNMFSet` containing count data
`dir`	Directory where files are to be written.
`count`	File name for count matrix.
`genes`	File name for gene annotation.
`barcodes`	File name for cell annotation.
`quote`	Suppress quotation marks in output files.

Value

NULL

Examples

set.seed(1)
x <- matrix(rpois(n=12,lambda=3),4,3)
rownames(x) <- seq_len(4)
colnames(x) <- seq_len(3)
s <- scNMFSet(count=x,rowData=seq_len(4),colData=seq_len(3))
write_10x(s,dir='.')
set.seed(1)
x <- matrix(rpois(n=12,lambda=3),4,3)
rownames(x) <- seq_len(4)
colnames(x) <- seq_len(3)
s <- scNMFSet(count=x,rowData=seq_len(4),colData=seq_len(3))
write_10x(s,dir='.')

Write meta genes to a file

Description

Write a csv file of meta gene lists from input list

Usage

write_meta(meta, file)
write_meta(meta, file)

Arguments

`meta`	List of meta genes output from `meta_genes`
`file`	Output file name

Value

NULL

Examples

set.seed(1)
x <- simulate_whx(nrow=50, ncol=100, rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s, ranks=seq(2,8), nrun=5)
plot(s)
m <- meta_genes(s, rank=5)
write_meta(m, file='meta.csv')
set.seed(1)
x <- simulate_whx(nrow=50, ncol=100, rank=5)
s <- scNMFSet(x$x)
s <- vb_factorize(s, ranks=seq(2,8), nrun=5)
plot(s)
m <- meta_genes(s, rank=5)
write_meta(m, file='meta.csv')

`x`	Object to be subsetted
`i`	row index
`j`	column index

Package 'ccfindR'

Help Index

Subsetting scNMFSet object

Description

Usage

Arguments

Value

Cell type assignment via GSEA

Description

Usage

Arguments

Details

Value

References

Examples

Basis matrices in an Object

Description

Usage

Arguments

Details

Value

Examples

Basis matrix accessor

Description

Usage

Arguments

Value

Generics for basis matrix assignment

Description

Usage

Arguments

Value

Examples

Modify basis matrices

Description

Usage

Arguments

Value

Examples

Build tree connecting clusters at different ranks

Description

Usage

Arguments

Value

Examples

ccfindR: Cancer Clone FindeR

Description

References

Plot heatmap of clustering coefficient matrix

Description

Usage

Arguments

Value

Examples

Assign cells into clusters

Description

Usage

Arguments

Value

Examples

Coefficient matrices in an Object

Description

Usage

Arguments

Details

Value

Examples

Coefficient matrix accessor

Description

Usage

Arguments

Value

Generics for coefficient matrix assignment

Description

Usage

Arguments

Value

Examples

Modify coefficient matrices

Description