Title: | A compositional model to assess expression changes from single-cell rna-seq data |
---|---|
Description: | scDDboost is an R package to analyze changes in the distribution of single-cell expression data between two experimental conditions. Compared to other methods that assess differential expression, scDDboost benefits uniquely from information conveyed by the clustering of cells into cellular subtypes. Through a novel empirical Bayesian formulation it calculates gene-specific posterior probabilities that the marginal expression distribution is the same (or different) between the two conditions. The implementation in scDDboost treats gene-level expression data within each condition as a mixture of negative binomial distributions. |
Authors: | Xiuyu Ma [cre, aut], Michael A. Newton [ctb] |
Maintainer: | Xiuyu Ma <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.9.0 |
Built: | 2024-11-30 04:20:04 UTC |
Source: | https://github.com/bioc/scDDboost |
scDDboost is an R package to analyze changes in the distribution of single-cell expression data between two experimental conditions. Compared to other methods that assess differential expression, scDDboost benefits uniquely from information conveyed by the clustering of cells into cellular subtypes. Through a novel empirical Bayesian formulation it calculates gene-specific posterior probabilities that the marginal expression distribution is the same (or different) between the two conditions. The implementation in scDDboost treats gene-level expression data within each condition as a mixture of negative binomial distributions.
The DESCRIPTION file:
Package: | scDDboost |
Type: | Package |
Title: | A compositional model to assess expression changes from single-cell rna-seq data |
Version: | 1.9.0 |
Date: | 2018-10-31 |
Authors@R: | c(person("Xiuyu","Ma",email="[email protected]",role = c("cre", "aut")),person(given = "Michael A.",family = "Newton", email="[email protected]",role = "ctb")) |
Description: | scDDboost is an R package to analyze changes in the distribution of single-cell expression data between two experimental conditions. Compared to other methods that assess differential expression, scDDboost benefits uniquely from information conveyed by the clustering of cells into cellular subtypes. Through a novel empirical Bayesian formulation it calculates gene-specific posterior probabilities that the marginal expression distribution is the same (or different) between the two conditions. The implementation in scDDboost treats gene-level expression data within each condition as a mixture of negative binomial distributions. |
License: | GPL (>= 2) |
Imports: | Rcpp (>= 0.12.11), RcppEigen (>= 0.3.2.9.0),EBSeq, BiocParallel, mclust, SingleCellExperiment, cluster, Oscope, SummarizedExperiment, stats, methods |
biocViews: | SingleCell, Software, Clustering, Sequencing, GeneExpression, DifferentialExpression, Bayesian |
Depends: | R (>= 4.2), ggplot2 |
LinkingTo: | Rcpp, RcppEigen, BH |
Suggests: | knitr, rmarkdown, BiocStyle, testthat |
SystemRequirements: | c++11 |
Roxygen: | list(wrap=FALSE) |
RoxygenNote: | 7.1.2 |
VignetteBuilder: | knitr |
BugReports: | https://github.com/wiscstatman/scDDboost/issues |
URL: | https://github.com/wiscstatman/scDDboost |
Config/pak/sysreqs: | make libssl-dev |
Repository: | https://bioc.r-universe.dev |
RemoteUrl: | https://github.com/bioc/scDDboost |
RemoteRef: | HEAD |
RemoteSha: | 73c0bbe7be318a517ac885ceaa9c58d0d1e4f6b8 |
Author: | Xiuyu Ma [cre, aut], Michael A. Newton [ctb] |
Maintainer: | Xiuyu Ma <[email protected]> |
Index of help topics:
EBS accelerated empirical bayesian LL likelihood function for hyperparameters estimation calD calculate distance matrix clusHelper function to get intra and inter distance for clusters detK determine the number of clusters extractInfo extract count matrix from SingleCellExperiment object gCl gene_level cluster gRef generate reference matrix genRClus generate random clusterings getDD index of DD genes under FDR control getSizeofDD number of DD genes under FDR control getZ1Z2 function to get counts of cluster sizes at two conditions isRef check refinement relation between two clusters lpt1t2 log likelihood of z1,z2 given t1,t2 lpzgt log likelihood of aggregated multinomial counts z given aggregated proportions t mdd posterior of proportion change given mixture double dirichlet prior pat generating partition patterns pdd calculate posterior probabilities of a gene to be differential distributed pddAggregate function to aggregate intermediate results and get prob of DD rwMle MLE for random weighting parameter scDDboost-package A compositional model to assess expression changes from single-cell rna-seq data sim_dat scDDboost
Package used to score evidence of differential distribution in single-cell RNA-seq data
Xiuyu Ma [cre, aut], Michael A. Newton [ctb]
Maintainer: Xiuyu Ma <[email protected]>
https://projecteuclid.org/journals/annals-of-applied-statistics/volume-15/issue-2/A-compositional-model-to-assess-expression-changes-from-single-cell/10.1214/20-AOAS1423.short
https://github.com/wiscstatman/scDDboost/blob/master/DESCRIPTION
data(sim_dat) dat = extractInfo(sim_dat) data_counts = dat$count_matrix cd = dat$condition bp <- BiocParallel::MulticoreParam(4) D_c = calD(data_counts,bp) pDD = pdd(data_counts,cd,bp,D_c)
data(sim_dat) dat = extractInfo(sim_dat) data_counts = dat$count_matrix cd = dat$condition bp <- BiocParallel::MulticoreParam(4) D_c = calD(data_counts,bp) pDD = pdd(data_counts,cd,bp,D_c)
calculate distance matrix
calD(data, bp)
calD(data, bp)
data |
transcripts |
bp |
bioc parallel parameter |
distance matrix
data(sim_dat) dat <- extractInfo(sim_dat) data_counts <- dat$count_matrix bp <- BiocParallel::MulticoreParam(4) D_c <- calD(data_counts,bp)
data(sim_dat) dat <- extractInfo(sim_dat) data_counts <- dat$count_matrix bp <- BiocParallel::MulticoreParam(4) D_c <- calD(data_counts,bp)
function to get intra and inter distance for clusters
clusHelper(D, i)
clusHelper(D, i)
D |
distance matrix |
i |
number of clusters |
vector of intra and inter distance
determine the number of clusters
detK(D, epi = 1)
detK(D, epi = 1)
D |
distance matrix |
epi |
threshold for cutting off |
number of clusters
data(sim_dat) dat <- extractInfo(sim_dat) data_counts <- dat$count_matrix bp <- BiocParallel::MulticoreParam(4) D_c <- calD(data_counts,bp) detK(D_c)
data(sim_dat) dat <- extractInfo(sim_dat) data_counts <- dat$count_matrix bp <- BiocParallel::MulticoreParam(4) D_c <- calD(data_counts,bp) detK(D_c)
accelerated empirical bayesian
EBS(data, conditions, gclus, sf, iter = 10, hyper, PP, stp1, stp2)
EBS(data, conditions, gclus, sf, iter = 10, hyper, PP, stp1, stp2)
data |
single cell expression matrix, row as genes column as cells |
conditions |
partition of cells |
gclus |
partition of genes |
sf |
size factors |
iter |
maximum iteration step of EM |
hyper |
hyper parameters for beta distributions |
PP |
pattern of partitions |
stp1 |
step size of hyperparameter alpha (shared by all units) in one step EM |
stp2 |
step size of hyperparameter beta (unit specific) in one step EM |
posterior probability of mean expression pattern
extract count matrix from SingleCellExperiment object
extractInfo(data)
extractInfo(data)
data |
SingleCellExperiment object |
list of count matrix and condition vector
data(sim_dat) dat <- extractInfo(sim_dat)
data(sim_dat) dat <- extractInfo(sim_dat)
gene_level cluster
gCl(data, bp)
gCl(data, bp)
data |
transcripts |
bp |
bioc parallel parameter |
return a matrix whose row represent gene specific cluster
generate random clusterings
genRClus(D, a, K)
genRClus(D, a, K)
D |
distance matrix of cells |
a |
paramter for weights |
K |
number of subtypes |
random generated clustering of cells
index of DD genes under FDR control
getDD(pDD, FDR = 0.01)
getDD(pDD, FDR = 0.01)
pDD |
probability of genes being DD |
FDR |
fdr to be controlled |
index of positive genes
p_dd <- c(0.01,0.99,0.7,0.5) getDD(p_dd)
p_dd <- c(0.01,0.99,0.7,0.5) getDD(p_dd)
number of DD genes under FDR control
getSizeofDD(pDD, FDR = 0.01)
getSizeofDD(pDD, FDR = 0.01)
pDD |
estimated probability of being DD |
FDR |
fdr to be controlled |
number of positive genes
p_dd <- c(0.1,0.99,1,0.05,0.05) getSizeofDD(p_dd)
p_dd <- c(0.1,0.99,1,0.05,0.05) getSizeofDD(p_dd)
function to get counts of cluster sizes at two conditions
getZ1Z2(ccl, cd)
getZ1Z2(ccl, cd)
ccl |
clustering label |
cd |
condition label |
return list of counts
generate reference matrix
gRef(Posp)
gRef(Posp)
Posp |
possible partition of data |
return a matrix indicate the refinement relation between different partitions.
check refinement relation between two clusters
isRef(x, y)
isRef(x, y)
x |
a cluster |
y |
a cluster |
whether x refines y
likelihood function for hyperparameters estimation
LL(param, x, d0)
LL(param, x, d0)
param |
parameters to be determined by MLE |
x |
distance matrix of cells |
d0 |
rate parameter of prior of 1 / true distance |
return hyperparameteres a.
log likelihood of z1,z2 given t1,t2
lpt1t2(z1, z2, pp, alpha1, alpha2)
lpt1t2(z1, z2, pp, alpha1, alpha2)
z1 |
counts of each group in condition 1 |
z2 |
counts of each group in condition 2 |
pp |
a partition |
alpha1 |
parameter of double dirichlet prior |
alpha2 |
parameter of double dirichlet prior |
log likelihood of z1,z2 given t1,t2
log likelihood of aggregated multinomial counts z given aggregated proportions t
lpzgt(z, pp, alpha)
lpzgt(z, pp, alpha)
z |
counts of each group in one condition |
pp |
a partition |
alpha |
parameter of double dirichlet prior |
log likelihood of aggregated multinomial counts z given aggregated proportions t
posterior of proportion change given mixture double dirichlet prior
mdd(z1, z2, pat, alpha1, alpha2)
mdd(z1, z2, pat, alpha1, alpha2)
z1 |
counts of each group in condition 1 |
z2 |
counts of each group in condition 2 |
pat |
partition patterns |
alpha1 |
parameter of double dirichlet prior |
alpha2 |
parameter of double dirichlet prior |
posterior of proportion change
generating partition patterns
pat(K)
pat(K)
K |
number of elements |
all possible partition of K elements
pat(3)
pat(3)
calculate posterior probabilities of a gene to be differential distributed
pdd( data, cd, bp, D, random = TRUE, norm = TRUE, epi = 1, Upper = 1000, nrandom = 50, iter = 20, reltol = 0.001, stp1 = 1e-06, stp2 = 0.01, K = 0 )
pdd( data, cd, bp, D, random = TRUE, norm = TRUE, epi = 1, Upper = 1000, nrandom = 50, iter = 20, reltol = 0.001, stp1 = 1e-06, stp2 = 0.01, K = 0 )
data |
normalized preprocessed transcripts |
cd |
conditions label |
bp |
bioc parallel parameter |
D |
distance matrix of cells or cluster of cells or a given clustering |
random |
boolean indicator of whether randomzation has been been implemented on distance matrix |
norm |
boolean indicator of whether the input expression data is normalized |
epi |
tol for change of validity score in determining number of clusters |
Upper |
bound for hyper parameters optimization |
nrandom |
number of random generated distance matrix |
iter |
max number of iterations for EM |
reltol |
relative tolerance for optim on weighting paramters |
stp1 |
step size of hyperparameter alpha (shared by all units) in one step EM |
stp2 |
step size of hyperparameter beta (unit specific) in one step EM |
K |
number of subtypes, could be user specified or determined internally(set to 0) |
posterior probabilities of a gene to be differential distributed
data(sim_dat) dat <- extractInfo(sim_dat) data_counts <- dat$count_matrix cd <- dat$condition bp <- BiocParallel::MulticoreParam(4) D_c <- calD(data_counts,bp) pDD <- pdd(data_counts,cd,bp,D_c)
data(sim_dat) dat <- extractInfo(sim_dat) data_counts <- dat$count_matrix cd <- dat$condition bp <- BiocParallel::MulticoreParam(4) D_c <- calD(data_counts,bp) pDD <- pdd(data_counts,cd,bp,D_c)
function to aggregate intermediate results and get prob of DD
pddAggregate(z1, z2, Posp, DE, K, REF)
pddAggregate(z1, z2, Posp, DE, K, REF)
z1 |
counts of cluster sizes in condition 1 |
z2 |
counts of cluster sizes in condition 2 |
Posp |
partition of cells |
DE |
posterior probabilities of DE patterns |
K |
number of clusters |
REF |
reference matrix indicating relation of nested partitions |
return vector of prob of DD
MLE for random weighting parameter
rwMle(D, reltol)
rwMle(D, reltol)
D |
distance matrix of cells |
reltol |
tolerance of convergence |
MLE of random weighting parameter
simulated data for demonstration, data are mixture negative binomial distributed
data(sim_dat)
data(sim_dat)
An object of class "list"
.
data(sim_dat)
data(sim_dat)