Package 'scDDboost' reference manual

Title:	A compositional model to assess expression changes from single-cell rna-seq data
Description:	scDDboost is an R package to analyze changes in the distribution of single-cell expression data between two experimental conditions. Compared to other methods that assess differential expression, scDDboost benefits uniquely from information conveyed by the clustering of cells into cellular subtypes. Through a novel empirical Bayesian formulation it calculates gene-specific posterior probabilities that the marginal expression distribution is the same (or different) between the two conditions. The implementation in scDDboost treats gene-level expression data within each condition as a mixture of negative binomial distributions.
Authors:	Xiuyu Ma [cre, aut], Michael A. Newton [ctb]
Maintainer:	Xiuyu Ma <[email protected]>
License:	GPL (>= 2)
Version:	1.9.2
Built:	2025-01-29 08:47:08 UTC
Source:	https://github.com/bioc/scDDboost

A compositional model to assess expression changes from single-cell rna-seq data

Description

scDDboost is an R package to analyze changes in the distribution of single-cell expression data between two experimental conditions. Compared to other methods that assess differential expression, scDDboost benefits uniquely from information conveyed by the clustering of cells into cellular subtypes. Through a novel empirical Bayesian formulation it calculates gene-specific posterior probabilities that the marginal expression distribution is the same (or different) between the two conditions. The implementation in scDDboost treats gene-level expression data within each condition as a mixture of negative binomial distributions.

Details

The DESCRIPTION file:

Package:	scDDboost
Type:	Package
Title:	A compositional model to assess expression changes from single-cell rna-seq data
Version:	1.9.2
Date:	2018-10-31
Authors@R:	c(person("Xiuyu","Ma",email="[email protected]",role = c("cre", "aut")),person(given = "Michael A.",family = "Newton", email="[email protected]",role = "ctb"))
Description:	scDDboost is an R package to analyze changes in the distribution of single-cell expression data between two experimental conditions. Compared to other methods that assess differential expression, scDDboost benefits uniquely from information conveyed by the clustering of cells into cellular subtypes. Through a novel empirical Bayesian formulation it calculates gene-specific posterior probabilities that the marginal expression distribution is the same (or different) between the two conditions. The implementation in scDDboost treats gene-level expression data within each condition as a mixture of negative binomial distributions.
License:	GPL (>= 2)
Imports:	Rcpp (>= 0.12.11), RcppEigen (>= 0.3.2.9.0),EBSeq, BiocParallel, mclust, SingleCellExperiment, cluster, Oscope, SummarizedExperiment, stats, methods
biocViews:	SingleCell, Software, Clustering, Sequencing, GeneExpression, DifferentialExpression, Bayesian
Depends:	R (>= 4.2), ggplot2
LinkingTo:	Rcpp, RcppEigen, BH
Suggests:	knitr, rmarkdown, BiocStyle, testthat
SystemRequirements:	c++14
Roxygen:	list(wrap=FALSE)
RoxygenNote:	7.1.2
VignetteBuilder:	knitr
BugReports:	https://github.com/wiscstatman/scDDboost/issues
URL:	https://github.com/wiscstatman/scDDboost
Config/pak/sysreqs:	make libssl-dev
Repository:	https://bioc.r-universe.dev
RemoteUrl:	https://github.com/bioc/scDDboost
RemoteRef:	HEAD
RemoteSha:	c913cbeacbdb950abe10513e54439aa25a797b6e
Author:	Xiuyu Ma [cre, aut], Michael A. Newton [ctb]
Maintainer:	Xiuyu Ma <[email protected]>

Index of help topics:

EBS                     accelerated empirical bayesian
LL                      likelihood function for hyperparameters
                        estimation
calD                    calculate distance matrix
clusHelper              function to get intra and inter distance for
                        clusters
detK                    determine the number of clusters
extractInfo             extract count matrix from SingleCellExperiment
                        object
gCl                     gene_level cluster
gRef                    generate reference matrix
genRClus                generate random clusterings
getDD                   index of DD genes under FDR control
getSizeofDD             number of DD genes under FDR control
getZ1Z2                 function to get counts of cluster sizes at two
                        conditions
isRef                   check refinement relation between two clusters
lpt1t2                  log likelihood of z1,z2 given t1,t2
lpzgt                   log likelihood of aggregated multinomial counts
                        z given aggregated proportions t
mdd                     posterior of proportion change given mixture
                        double dirichlet prior
pat                     generating partition patterns
pdd                     calculate posterior probabilities of a gene to
                        be differential distributed
pddAggregate            function to aggregate intermediate results and
                        get prob of DD
rwMle                   MLE for random weighting parameter
scDDboost-package       A compositional model to assess expression
                        changes from single-cell rna-seq data
sim_dat                 scDDboost

Package used to score evidence of differential distribution in single-cell RNA-seq data

Author(s)

Xiuyu Ma [cre, aut], Michael A. Newton [ctb]

Maintainer: Xiuyu Ma <[email protected]>

References

https://projecteuclid.org/journals/annals-of-applied-statistics/volume-15/issue-2/A-compositional-model-to-assess-expression-changes-from-single-cell/10.1214/20-AOAS1423.short

Examples

data(sim_dat)
dat = extractInfo(sim_dat)
data_counts = dat$count_matrix
cd = dat$condition
bp <- BiocParallel::MulticoreParam(4)
D_c = calD(data_counts,bp)
pDD = pdd(data_counts,cd,bp,D_c)
data(sim_dat)
dat = extractInfo(sim_dat)
data_counts = dat$count_matrix
cd = dat$condition
bp <- BiocParallel::MulticoreParam(4)
D_c = calD(data_counts,bp)
pDD = pdd(data_counts,cd,bp,D_c)

calculate distance matrix

Description

calculate distance matrix

Usage

calD(data, bp)
calD(data, bp)

Arguments

`data`	transcripts
`bp`	bioc parallel parameter

Value

distance matrix

Examples

data(sim_dat)
dat <- extractInfo(sim_dat)
data_counts <- dat$count_matrix
bp <- BiocParallel::MulticoreParam(4)
D_c <- calD(data_counts,bp)
data(sim_dat)
dat <- extractInfo(sim_dat)
data_counts <- dat$count_matrix
bp <- BiocParallel::MulticoreParam(4)
D_c <- calD(data_counts,bp)

function to get intra and inter distance for clusters

Description

function to get intra and inter distance for clusters

Usage

clusHelper(D, i)
clusHelper(D, i)

Arguments

`D`	distance matrix
`i`	number of clusters

Value

vector of intra and inter distance

determine the number of clusters

Description

determine the number of clusters

Usage

detK(D, epi = 1)
detK(D, epi = 1)

Arguments

`D`	distance matrix
`epi`	threshold for cutting off

Value

number of clusters

Examples

data(sim_dat)
dat <- extractInfo(sim_dat)
data_counts <- dat$count_matrix
bp <- BiocParallel::MulticoreParam(4)
D_c <- calD(data_counts,bp)
detK(D_c)
data(sim_dat)
dat <- extractInfo(sim_dat)
data_counts <- dat$count_matrix
bp <- BiocParallel::MulticoreParam(4)
D_c <- calD(data_counts,bp)
detK(D_c)

accelerated empirical bayesian

Description

accelerated empirical bayesian

Usage

EBS(data, conditions, gclus, sf, iter = 10, hyper, PP, stp1, stp2)
EBS(data, conditions, gclus, sf, iter = 10, hyper, PP, stp1, stp2)

Arguments

`data`	single cell expression matrix, row as genes column as cells
`conditions`	partition of cells
`gclus`	partition of genes
`sf`	size factors
`iter`	maximum iteration step of EM
`hyper`	hyper parameters for beta distributions
`PP`	pattern of partitions
`stp1`	step size of hyperparameter alpha (shared by all units) in one step EM
`stp2`	step size of hyperparameter beta (unit specific) in one step EM

Value

posterior probability of mean expression pattern

extract count matrix from SingleCellExperiment object

Description

extract count matrix from SingleCellExperiment object

Usage

extractInfo(data)
extractInfo(data)

Arguments

data

SingleCellExperiment object

Value

list of count matrix and condition vector

Examples

data(sim_dat) 
dat <- extractInfo(sim_dat)
data(sim_dat) 
dat <- extractInfo(sim_dat)

gene_level cluster

Description

gene_level cluster

Usage

gCl(data, bp)
gCl(data, bp)

Arguments

`data`	transcripts
`bp`	bioc parallel parameter

Value

return a matrix whose row represent gene specific cluster

generate random clusterings

Description

generate random clusterings

Usage

genRClus(D, a, K)
genRClus(D, a, K)

Arguments

`D`	distance matrix of cells
`a`	paramter for weights
`K`	number of subtypes

Value

random generated clustering of cells

index of DD genes under FDR control

Description

index of DD genes under FDR control

Usage

getDD(pDD, FDR = 0.01)
getDD(pDD, FDR = 0.01)

Arguments

`pDD`	probability of genes being DD
`FDR`	fdr to be controlled

Value

index of positive genes

Examples

p_dd <- c(0.01,0.99,0.7,0.5)
getDD(p_dd)
p_dd <- c(0.01,0.99,0.7,0.5)
getDD(p_dd)

number of DD genes under FDR control

Description

number of DD genes under FDR control

Usage

getSizeofDD(pDD, FDR = 0.01)
getSizeofDD(pDD, FDR = 0.01)

Arguments

`pDD`	estimated probability of being DD
`FDR`	fdr to be controlled

Value

number of positive genes

Examples

p_dd <- c(0.1,0.99,1,0.05,0.05)
getSizeofDD(p_dd)
p_dd <- c(0.1,0.99,1,0.05,0.05)
getSizeofDD(p_dd)

function to get counts of cluster sizes at two conditions

Description

function to get counts of cluster sizes at two conditions

Usage

getZ1Z2(ccl, cd)
getZ1Z2(ccl, cd)

Arguments

`ccl`	clustering label
`cd`	condition label

Value

return list of counts

generate reference matrix

Description

generate reference matrix

Usage

gRef(Posp)
gRef(Posp)

Arguments

Posp

possible partition of data

Value

return a matrix indicate the refinement relation between different partitions.

check refinement relation between two clusters

Description

check refinement relation between two clusters

Usage

isRef(x, y)
isRef(x, y)

Arguments

`x`	a cluster
`y`	a cluster

Value

whether x refines y

likelihood function for hyperparameters estimation

Description

likelihood function for hyperparameters estimation

Usage

LL(param, x, d0)
LL(param, x, d0)

Arguments

`param`	parameters to be determined by MLE
`x`	distance matrix of cells
`d0`	rate parameter of prior of 1 / true distance

Value

return hyperparameteres a.

log likelihood of z1,z2 given t1,t2

Description

log likelihood of z1,z2 given t1,t2

Usage

lpt1t2(z1, z2, pp, alpha1, alpha2)
lpt1t2(z1, z2, pp, alpha1, alpha2)

Arguments

`z1`	counts of each group in condition 1
`z2`	counts of each group in condition 2
`pp`	a partition
`alpha1`	parameter of double dirichlet prior
`alpha2`	parameter of double dirichlet prior

Value

log likelihood of z1,z2 given t1,t2

log likelihood of aggregated multinomial counts z given aggregated proportions t

Description

log likelihood of aggregated multinomial counts z given aggregated proportions t

Usage

lpzgt(z, pp, alpha)
lpzgt(z, pp, alpha)

Arguments

`z`	counts of each group in one condition
`pp`	a partition
`alpha`	parameter of double dirichlet prior

Value

log likelihood of aggregated multinomial counts z given aggregated proportions t

posterior of proportion change given mixture double dirichlet prior

Description

posterior of proportion change given mixture double dirichlet prior

Usage

mdd(z1, z2, pat, alpha1, alpha2)
mdd(z1, z2, pat, alpha1, alpha2)

Arguments

`z1`	counts of each group in condition 1
`z2`	counts of each group in condition 2
`pat`	partition patterns
`alpha1`	parameter of double dirichlet prior
`alpha2`	parameter of double dirichlet prior

Value

posterior of proportion change

generating partition patterns

Description

generating partition patterns

Usage

pat(K)
pat(K)

Arguments

`K`	number of elements

Value

all possible partition of K elements

Examples

pat(3)
pat(3)

calculate posterior probabilities of a gene to be differential distributed

Description

calculate posterior probabilities of a gene to be differential distributed

Usage

pdd(
  data,
  cd,
  bp,
  D,
  random = TRUE,
  norm = TRUE,
  epi = 1,
  Upper = 1000,
  nrandom = 50,
  iter = 20,
  reltol = 0.001,
  stp1 = 1e-06,
  stp2 = 0.01,
  K = 0
)
pdd(
  data,
  cd,
  bp,
  D,
  random = TRUE,
  norm = TRUE,
  epi = 1,
  Upper = 1000,
  nrandom = 50,
  iter = 20,
  reltol = 0.001,
  stp1 = 1e-06,
  stp2 = 0.01,
  K = 0
)

Arguments

`data`	normalized preprocessed transcripts
`cd`	conditions label
`bp`	bioc parallel parameter
`D`	distance matrix of cells or cluster of cells or a given clustering
`random`	boolean indicator of whether randomzation has been been implemented on distance matrix
`norm`	boolean indicator of whether the input expression data is normalized
`epi`	tol for change of validity score in determining number of clusters
`Upper`	bound for hyper parameters optimization
`nrandom`	number of random generated distance matrix
`iter`	max number of iterations for EM
`reltol`	relative tolerance for optim on weighting paramters
`stp1`	step size of hyperparameter alpha (shared by all units) in one step EM
`stp2`	step size of hyperparameter beta (unit specific) in one step EM
`K`	number of subtypes, could be user specified or determined internally(set to 0)

Value

posterior probabilities of a gene to be differential distributed

Examples

data(sim_dat)
dat <- extractInfo(sim_dat)
data_counts <- dat$count_matrix
cd <- dat$condition
bp <- BiocParallel::MulticoreParam(4)
D_c <- calD(data_counts,bp)
pDD <- pdd(data_counts,cd,bp,D_c)
data(sim_dat)
dat <- extractInfo(sim_dat)
data_counts <- dat$count_matrix
cd <- dat$condition
bp <- BiocParallel::MulticoreParam(4)
D_c <- calD(data_counts,bp)
pDD <- pdd(data_counts,cd,bp,D_c)

function to aggregate intermediate results and get prob of DD

Description

function to aggregate intermediate results and get prob of DD

Usage

pddAggregate(z1, z2, Posp, DE, K, REF)
pddAggregate(z1, z2, Posp, DE, K, REF)

Arguments

`z1`	counts of cluster sizes in condition 1
`z2`	counts of cluster sizes in condition 2
`Posp`	partition of cells
`DE`	posterior probabilities of DE patterns
`K`	number of clusters
`REF`	reference matrix indicating relation of nested partitions

Value

return vector of prob of DD

MLE for random weighting parameter

Description

MLE for random weighting parameter

Usage

rwMle(D, reltol)
rwMle(D, reltol)

Arguments

`D`	distance matrix of cells
`reltol`	tolerance of convergence

Value

MLE of random weighting parameter

scDDboost

Description

simulated data for demonstration, data are mixture negative binomial distributed

Usage

data(sim_dat)
data(sim_dat)

Format

An object of class "list".

Examples

data(sim_dat)
data(sim_dat)

Package 'scDDboost'

Help Index

A compositional model to assess expression changes from single-cell rna-seq data

Description

Details

Author(s)

References

See Also

Examples

calculate distance matrix

Description

Usage

Arguments

Value

Examples

function to get intra and inter distance for clusters

Description

Usage

Arguments

Value

determine the number of clusters

Description

Usage

Arguments

Value

Examples

accelerated empirical bayesian

Description

Usage

Arguments

Value

extract count matrix from SingleCellExperiment object

Description

Usage

Arguments

Value

Examples

gene_level cluster

Description

Usage

Arguments

Value

generate random clusterings

Description

Usage

Arguments

Value

index of DD genes under FDR control

Description

Usage

Arguments

Value

Examples

number of DD genes under FDR control

Description

Usage

Arguments

Value

Examples

function to get counts of cluster sizes at two conditions

Description

Usage

Arguments

Value

generate reference matrix

Description

Usage

Arguments

Value

check refinement relation between two clusters

Description

Usage

Arguments

Value

likelihood function for hyperparameters estimation

Description

Usage

Arguments

Value

log likelihood of z1,z2 given t1,t2