Title: | Batch Effect Correction with Unknow Subtypes for scRNA-seq data |
---|---|
Description: | BUSseq R package fits an interpretable Bayesian hierarchical model---the Batch Effects Correction with Unknown Subtypes for scRNA seq Data (BUSseq)---to correct batch effects in the presence of unknown cell types. BUSseq is able to simultaneously correct batch effects, clusters cell types, and takes care of the count data nature, the overdispersion, the dropout events, and the cell-specific sequencing depth of scRNA-seq data. After correcting the batch effects with BUSseq, the corrected value can be used for downstream analysis as if all cells were sequenced in a single batch. BUSseq can integrate read count matrices obtained from different scRNA-seq platforms and allow cell types to be measured in some but not all of the batches as long as the experimental design fulfills the conditions listed in our manuscript. |
Authors: | Fangda Song [aut, cre] , Ga Ming Chan [aut], Yingying Wei [aut] |
Maintainer: | Fangda Song <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.13.0 |
Built: | 2024-12-19 03:07:41 UTC |
Source: | https://github.com/bioc/BUSseq |
BUSseq R package fits an interpretable Bayesian hierarchical model—the Batch Effects Correction with Unknown Subtypes for scRNA seq Data (BUSseq)—to correct batch effects in the presence of unknown cell types. BUSseq is able to simultaneously correct batch effects, clusters cell types, and takes care of the count data nature, the overdispersion, the dropout events, and the cell-specific sequencing depth of scRNA-seq data. After correcting the batch effects with BUSseq, the corrected value can be used for downstream analysis as if all cells were sequenced in a single batch. BUSseq can integrate read count matrices obtained from different scRNA-seq platforms and allow cell types to be measured in some but not all of the batches as long as the experimental design fulfills the conditions listed in our manuscript.
Fangda Song [aut, cre] (<https://orcid.org/0000-0001-6007-3517>), Ga Ming Chan [aut], Yingying Wei [aut] (<https://orcid.org/0000-0003-3826-336X>)
Maintainer: Fangda Song <[email protected]>
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
####################################### # Apply BUSseq to the Simulation Data # ####################################### library(SingleCellExperiment) RawCountData <- assay(BUSseqfits_example, "counts") batch_ind <- colData(BUSseqfits_example) sce <- SingleCellExperiment(assays = list(counts = RawCountData), colData = DataFrame(Batch_ind = batch_ind)) BUSseqfits_res <- BUSseq_MCMC(ObservedData = sce, seed = 1234, n.cores = 2, n.celltypes = 4, n.iterations = 500) ################################################ # Extract Estimates from the BUSseqfits Object # ################################################ #return cell type indicators w.est <- celltypes(BUSseqfits_res) #return the intercept and odds ratio of the logistic regression #for dropout events gamma.est <- dropout_coefficient_values(BUSseqfits_res) #return the log-scale baseline expression values alpha.est <- baseline_expression_values(BUSseqfits_res) #return the cell-type effects beta.est <- celltype_effects(BUSseqfits_res) #return the mean expression levels mu.est <- celltype_mean_expression(BUSseqfits_res) #return the cell-specific global effects delta.est <- cell_effect_values(BUSseqfits_res) #return the location batch effects nu.est <- location_batch_effects(BUSseqfits_res) #return the overdispersion parameters phi.est <- overdispersions(BUSseqfits_res) #return the intrinsic gene indices D.est <- intrinsic_genes_BUSseq(BUSseqfits_res) #return the BIC value BIC <- BIC_BUSseq(BUSseqfits_res) #return the raw read count matrix CountData_raw <- raw_read_counts(BUSseqfits_res) #return the imputed read count matrix CountData_imputed <- imputed_read_counts(BUSseqfits_res) #return the corrected read count matrix CountData_corrected <- corrected_read_counts(BUSseqfits_res) ################# # Visualization # ################# #generate the heatmap of raw read count data heatmap_data_BUSseq(BUSseqfits_res, project_name="Heatmap_raw") #generate the heatmap of imputed read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Imputed", project_name="Heatmap_imputed") #generate the heatmap of corrected read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Corrected", project_name="Heatmap_corrected")
####################################### # Apply BUSseq to the Simulation Data # ####################################### library(SingleCellExperiment) RawCountData <- assay(BUSseqfits_example, "counts") batch_ind <- colData(BUSseqfits_example) sce <- SingleCellExperiment(assays = list(counts = RawCountData), colData = DataFrame(Batch_ind = batch_ind)) BUSseqfits_res <- BUSseq_MCMC(ObservedData = sce, seed = 1234, n.cores = 2, n.celltypes = 4, n.iterations = 500) ################################################ # Extract Estimates from the BUSseqfits Object # ################################################ #return cell type indicators w.est <- celltypes(BUSseqfits_res) #return the intercept and odds ratio of the logistic regression #for dropout events gamma.est <- dropout_coefficient_values(BUSseqfits_res) #return the log-scale baseline expression values alpha.est <- baseline_expression_values(BUSseqfits_res) #return the cell-type effects beta.est <- celltype_effects(BUSseqfits_res) #return the mean expression levels mu.est <- celltype_mean_expression(BUSseqfits_res) #return the cell-specific global effects delta.est <- cell_effect_values(BUSseqfits_res) #return the location batch effects nu.est <- location_batch_effects(BUSseqfits_res) #return the overdispersion parameters phi.est <- overdispersions(BUSseqfits_res) #return the intrinsic gene indices D.est <- intrinsic_genes_BUSseq(BUSseqfits_res) #return the BIC value BIC <- BIC_BUSseq(BUSseqfits_res) #return the raw read count matrix CountData_raw <- raw_read_counts(BUSseqfits_res) #return the imputed read count matrix CountData_imputed <- imputed_read_counts(BUSseqfits_res) #return the corrected read count matrix CountData_corrected <- corrected_read_counts(BUSseqfits_res) ################# # Visualization # ################# #generate the heatmap of raw read count data heatmap_data_BUSseq(BUSseqfits_res, project_name="Heatmap_raw") #generate the heatmap of imputed read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Imputed", project_name="Heatmap_imputed") #generate the heatmap of corrected read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Corrected", project_name="Heatmap_corrected")
BUSseq_MCMC
Function
The function gives the estimated log-scale baseline expression levels from the
output SingleCellExperiment
object.
baseline_expression_values(sce_BUSseqfit)
baseline_expression_values(sce_BUSseqfit)
sce_BUSseqfit |
An output |
alpha.est |
The estimated log-scale baseline expression levels, a G-dimensional vector whose g-th element is the estimated log-scale mean gene expression level of gene g in the first cell type. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example alpha.est <- baseline_expression_values(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example alpha.est <- baseline_expression_values(BUSseqfits_example)
BUSseq_MCMC
Function
The function gives the Bayesian Informtion Criterion (BIC) value in the output
SingleCellExperiment
object.
BIC_BUSseq(sce_BUSseqfit)
BIC_BUSseq(sce_BUSseqfit)
sce_BUSseqfit |
An output |
BIC_val |
The BIC value. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example Example_BIC <- BIC_BUSseq(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example Example_BIC <- BIC_BUSseq(BUSseqfits_example)
The function BUSseq_MCMC
runs a Markov chain Monte Carlo (MCMC)
algorithm to fit the Batch effects correction with Unknown Subtypes
for scRNA-seq data (BUSseq) model. BUSseq is an interpretable Bayesian
hierarchical model that closely follows the data-generating mechanism
of scRNA-seq experiments. BUSseq can simultaneously correct batch
effects, cluster cell types, impute missing data caused by dropout
events and detect differentially expressed genes without requiring a
preliminary normalization step. We adopt the MCMC algorithm to conduct
posterior inference for the BUSseq model. Here, we denote
the total number of cells as N, the batch number as B, the gene number as G,
and the cell-type number as K.
BUSseq_MCMC(ObservedData, n.celltypes, seed = round(runif(1,1,10000)), n.cores = 8, n.iterations = 2000, n.burnin = floor(n.iterations/2), n.unchanged = min(floor(n.iterations * 0.3), 500), n.output = n.iterations/10, working_dir = getwd(), Drop_ind = rep(TRUE, length(ObservedData)), fdr = 0.05, hyper_pi = 2, hyper_gamma0 = 3, hyper_gamma1 = c(0.001, 0.01), hyper_alpha = 5, tau1sq = 50, hyper_p = c(1, 3), hyper_tau0sq = c(2, 0.01), hyper_nu = 5, hyper_delta = 5, hyper_phi = c(1, 0.1))
BUSseq_MCMC(ObservedData, n.celltypes, seed = round(runif(1,1,10000)), n.cores = 8, n.iterations = 2000, n.burnin = floor(n.iterations/2), n.unchanged = min(floor(n.iterations * 0.3), 500), n.output = n.iterations/10, working_dir = getwd(), Drop_ind = rep(TRUE, length(ObservedData)), fdr = 0.05, hyper_pi = 2, hyper_gamma0 = 3, hyper_gamma1 = c(0.001, 0.01), hyper_alpha = 5, tau1sq = 50, hyper_p = c(1, 3), hyper_tau0sq = c(2, 0.01), hyper_nu = 5, hyper_delta = 5, hyper_phi = c(1, 0.1))
ObservedData |
|
n.celltypes |
|
seed |
|
n.cores |
|
n.iterations |
|
n.burnin |
|
n.unchanged |
|
n.output |
|
working_dir |
|
Drop_ind |
|
fdr |
The false discovery rate level we want to control in order to identify intrinsic genes. The default is 0.05. |
hyper_pi |
|
hyper_gamma0 |
|
hyper_gamma1 |
|
hyper_alpha |
|
tau1sq |
|
hyper_p |
|
hyper_tau0sq |
|
hyper_nu |
|
hyper_delta |
|
hyper_phi |
|
A SingleCellExperiment
object res
is outputed with an imputed
count data matrix assay(res, "imputed_data")
after imputing dropout
events. At the same time, the estimated cell type labels and relative size
factors of cells are added in the column-level metadata
int_colData(res)$BUSseq
, while the identified
intrinsic genes are stored in the row-level metadata
int_elementMetadata(res)$BUSseq
.
Morover, the dimension information, the posterior inference of parameters
and latent variables are stored as a list in metadata(res)$BUSseq
:
n.cell |
The total number of cells in all batches, a scalar. |
n.gene |
The number of genes, a scalar. |
n.batch |
The number of batches, a scalar. |
n.perbatch |
The number of cells in each batch, a B-dimensional vector. |
n.celltype |
The number of cell types specified by user, a scalar. |
n.iter |
The total number of iterations applied in the MCMC algorithm, a scalar. |
seed |
The seed for the MCMC algorithm. |
n.burnin |
The number of iterations as burnin, a scalar. |
gamma.est |
The estimated intercept and odds ratio of the logistic regression for dropout events, a B by 2 matrix. |
alpha.est |
The estimated log-scale baseline expression levels, a G-dimensional vector whose g-th element is the estimated log-scale mean gene expression level of gene g in the first cell type. |
beta.est |
The estimated cell-type effects, a G by K matrix, whose [g,k] element is the effects of cell type k on gene g compared with the first cell type. Note that the first column is zero as the first cell type is taken as the baseline cell type. |
nu.est |
The estimated location batch effects, a G by B matrix, where [g,b] element is the location batch effect on gene g in the batch b compared with the first batch. Note that the first column is zero as the first batch is taken as the reference batch without batch effects. |
delta.est |
The estimated cell-specific global effects, an N-dimensional vector. Note that the first element in each vector is zero as the first cell in each batch is taken as the reference cell. |
phi.est |
The estimated overdispersion parameters, a G by B matrix, where [g,b] element is the overdispersion parameter on gene g in batch b. |
pi.est |
The estimated cell-type proportions across batches, a B by K matrix, whose [b,k] element is the estimated proportion of cell type k in batch b. |
w.est |
The estimated cell-type indicators of each cell, a N-dimensional vector. |
p.est |
The estimated proportion of differentially expressed genes compared with the first cell type, a scalar. |
PPI.est |
The estimated posterior marginal probability of being differentially expressed genes compared with the first cell type. The return is G by K matrix. Noted that the first column consist of zeros as there is no differentially expressed gene compared with the cell type of itself. |
D.est |
The intrinsic gene indicators. The return is an N-dimensional vector. |
BIC |
The BIC value when K = |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
####################################### # Apply BUSseq to the Simulation Data # ####################################### library(SingleCellExperiment) RawCountData <- assay(BUSseqfits_example, "counts") batch_ind <- colData(BUSseqfits_example) sce <- SingleCellExperiment(assays = list(counts = RawCountData), colData = DataFrame(Batch_ind = batch_ind)) BUSseqfits_res <- BUSseq_MCMC(ObservedData = sce, seed = 1234, n.cores = 2, n.celltypes = 4, n.iterations = 500) ################################################ # Extract Estimates from the BUSseqfits Object # ################################################ #return cell type indicators w.est <- celltypes(BUSseqfits_res) #return the intercept and odds ratio of the logistic regression #for dropout events gamma.est <- dropout_coefficient_values(BUSseqfits_res) #return the log-scale baseline expression values alpha.est <- baseline_expression_values(BUSseqfits_res) #return the cell-type effects beta.est <- celltype_effects(BUSseqfits_res) #return the mean expression levels mu.est <- celltype_mean_expression(BUSseqfits_res) #return the cell-specific global effects delta.est <- cell_effect_values(BUSseqfits_res) #return the location batch effects nu.est <- location_batch_effects(BUSseqfits_res) #return the overdispersion parameters phi.est <- overdispersions(BUSseqfits_res) #return the intrinsic gene indices D.est <- intrinsic_genes_BUSseq(BUSseqfits_res) #return the BIC value BIC <- BIC_BUSseq(BUSseqfits_res) #return the raw read count matrix CountData_raw <- raw_read_counts(BUSseqfits_res) #return the imputed read count matrix CountData_imputed <- imputed_read_counts(BUSseqfits_res) #return the corrected read count matrix CountData_corrected <- corrected_read_counts(BUSseqfits_res) ################# # Visualization # ################# #generate the heatmap of raw read count data heatmap_data_BUSseq(BUSseqfits_res, project_name="Heatmap_raw") #generate the heatmap of imputed read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Imputed", project_name="Heatmap_imputed") #generate the heatmap of corrected read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Corrected", project_name="Heatmap_corrected")
####################################### # Apply BUSseq to the Simulation Data # ####################################### library(SingleCellExperiment) RawCountData <- assay(BUSseqfits_example, "counts") batch_ind <- colData(BUSseqfits_example) sce <- SingleCellExperiment(assays = list(counts = RawCountData), colData = DataFrame(Batch_ind = batch_ind)) BUSseqfits_res <- BUSseq_MCMC(ObservedData = sce, seed = 1234, n.cores = 2, n.celltypes = 4, n.iterations = 500) ################################################ # Extract Estimates from the BUSseqfits Object # ################################################ #return cell type indicators w.est <- celltypes(BUSseqfits_res) #return the intercept and odds ratio of the logistic regression #for dropout events gamma.est <- dropout_coefficient_values(BUSseqfits_res) #return the log-scale baseline expression values alpha.est <- baseline_expression_values(BUSseqfits_res) #return the cell-type effects beta.est <- celltype_effects(BUSseqfits_res) #return the mean expression levels mu.est <- celltype_mean_expression(BUSseqfits_res) #return the cell-specific global effects delta.est <- cell_effect_values(BUSseqfits_res) #return the location batch effects nu.est <- location_batch_effects(BUSseqfits_res) #return the overdispersion parameters phi.est <- overdispersions(BUSseqfits_res) #return the intrinsic gene indices D.est <- intrinsic_genes_BUSseq(BUSseqfits_res) #return the BIC value BIC <- BIC_BUSseq(BUSseqfits_res) #return the raw read count matrix CountData_raw <- raw_read_counts(BUSseqfits_res) #return the imputed read count matrix CountData_imputed <- imputed_read_counts(BUSseqfits_res) #return the corrected read count matrix CountData_corrected <- corrected_read_counts(BUSseqfits_res) ################# # Visualization # ################# #generate the heatmap of raw read count data heatmap_data_BUSseq(BUSseqfits_res, project_name="Heatmap_raw") #generate the heatmap of imputed read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Imputed", project_name="Heatmap_imputed") #generate the heatmap of corrected read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Corrected", project_name="Heatmap_corrected")
BUSseq_MCMC
This data set is a SingleCellExperiment
object generated by running
the BUSseq_MCMC
function on the simulatied data in the
"Example" of BUSseq-package
.
BUSseqfits_example
BUSseqfits_example
A SingleCellExperiment
object.
The simuated read count data consist of two 150-cell batches. In total, 300 genes are measured. Moreover, all cells come from four cell types.
The "Example" of BUSseq-package
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
library(SingleCellExperiment) ################################ # Set the Synthetic Parameters # ################################ rm(list=ls()) set.seed(1234) #The number of batches B<-2 #The number of cells per batch nb<-c(150,150) #The total number of cells N<-sum(nb) #The number of genes G<-300 #The number of cell types K<-4 # the project name proj <- "demo" #The first column of gamma.syn denotes the intercept of #the logistic regression for dropout events #The second column of gamma.syn denotes the odds ratios #of the logistic regression for dropout events gamma.syn<-matrix(0,B,2) gamma.syn[1,]<-c(-1,-0.5) gamma.syn[2,]<-c(-1,-0.5) #the log-scale baseline expression levels alpha.syn<-rep(0,G) alpha.syn[1:(G*0.2)]<-2 #the cell-type effects beta.syn<-matrix(0,G,K) #the first cell type is regarded as the reference cell type beta.syn[,1] <- 0 #the effects of the second cell type beta.syn[1:(G * 0.2),2] <- -2 beta.syn[(G * 0.20 + 1):(G * 0.4),2] <- 2 beta.syn[(G * 0.4 + 1):G,2] <- 0 #the effects of the third cell type beta.syn[1:(G * 0.2),3]<- -2 beta.syn[(G * 0.2 + 1):(G * 0.4),3] <- 0 beta.syn[(G * 0.4 + 1):(G * 0.6),3] <- 2 beta.syn[(G * 0.6 + 1):G,3] <- 0 #the effects of the forth cell type beta.syn[1:(G * 0.2),4]<- -2 beta.syn[(G * 0.2 + 1):(G * 0.6),4] <- 0 beta.syn[(G * 0.6 + 1):(G * 0.8),4] <- 2 beta.syn[(G * 0.8 + 1):G,4] <- 0 #the batch effects nu.syn<-matrix(NA,G,B) #the first batch is regarded as the reference batch nu.syn[,1] <- 0 #the effect of the second batch nu.syn[1:(G * 0.4),2]<-2 nu.syn[(G * 0.4 + 1):(G * 0.8),2]<-1 nu.syn[(G * 0.8 + 1):G,2]<-2 #the cell-specific size factors delta.syn <- rep(NA, N) #The frist cell in each bathc is regarded as the reference cell delta.syn[1:(nb[1] * 0.5)]<-0 delta.syn[(nb[1] * 0.5 + 1):(nb[1] * 0.9)]<-1 delta.syn[(nb[1] * 0.9 + 1):nb[1]]<-2 #the second batch delta.syn[1:(nb[2] * 0.5) + nb[1]]<-0 delta.syn[(nb[2] * 0.5 + 1):(nb[2] * 0.7) + nb[1]]<-2 delta.syn[(nb[2] * 0.7 + 1):nb[2] + nb[1]]<--1 #the batch-specific and gene-specific overdispersion parameters phi.syn<-matrix(10,G,B) #the cell-type proportions in each batch pi.syn <- matrix(NA,K,B) #the frist batch pi.syn[,1]<-c(0.3,0.3,0.2,0.2) #the second batch pi.syn[,2]<-c(0.2,0.24,0.2,0.36) ############################################## # Simulate Latent Varibles and Observed data # ############################################## #the cell-type indicators of each cell w<- rep(NA, N) #the first batch nw<-nb[1] * pi.syn[,1] w[1:nb[1]]<- rep(1:4,nw) #the second batch nw<-nb[2] * pi.syn[,2] w[1:nb[2] + nb[1]]<- rep(1:4,nw) #the indicators for dropout events z<- matrix(NA, G, N) #the underlying true expression levels x <- matrix(NA, G, N) #the observed expression levels y <- matrix(NA, G, N) #the logarithm of mean expreesion level of each gene in each cell log.mu <- matrix(NA, G, N) #generate the latent variable and observed data batch_ind <- rep(1:B, nb) for(i in 1:N){ b <- batch_ind[i] log.mu[,i] <- alpha.syn + beta.syn[,w[i]] log.mu[,i] <- log.mu[,i] + nu.syn[,b] log.mu[,i] <- log.mu[,i] + delta.syn[i] for(j in 1:G){ x[j,i]<-rnbinom(1,phi.syn[j,b], mu=exp(log.mu[j,i])) logit_pi <- gamma.syn[b,1] + gamma.syn[b,2] * x[j,i] z[j,i]<-rbinom(1,1,prob = 1/(1+exp(-logit_pi))) if(z[j,i]==0){ y[j,i]<- x[j,i] }else{ y[j,i]<- 0 } } } sce <- SingleCellExperiment(assays = list(counts = y), colData = DataFrame(Batch_ind = factor(batch_ind))) BUSseqfits_example <- BUSseq_MCMC(ObservedData = sce, seed = 1234, n.cores = 2, n.celltypes = 4, n.iterations = 500)
library(SingleCellExperiment) ################################ # Set the Synthetic Parameters # ################################ rm(list=ls()) set.seed(1234) #The number of batches B<-2 #The number of cells per batch nb<-c(150,150) #The total number of cells N<-sum(nb) #The number of genes G<-300 #The number of cell types K<-4 # the project name proj <- "demo" #The first column of gamma.syn denotes the intercept of #the logistic regression for dropout events #The second column of gamma.syn denotes the odds ratios #of the logistic regression for dropout events gamma.syn<-matrix(0,B,2) gamma.syn[1,]<-c(-1,-0.5) gamma.syn[2,]<-c(-1,-0.5) #the log-scale baseline expression levels alpha.syn<-rep(0,G) alpha.syn[1:(G*0.2)]<-2 #the cell-type effects beta.syn<-matrix(0,G,K) #the first cell type is regarded as the reference cell type beta.syn[,1] <- 0 #the effects of the second cell type beta.syn[1:(G * 0.2),2] <- -2 beta.syn[(G * 0.20 + 1):(G * 0.4),2] <- 2 beta.syn[(G * 0.4 + 1):G,2] <- 0 #the effects of the third cell type beta.syn[1:(G * 0.2),3]<- -2 beta.syn[(G * 0.2 + 1):(G * 0.4),3] <- 0 beta.syn[(G * 0.4 + 1):(G * 0.6),3] <- 2 beta.syn[(G * 0.6 + 1):G,3] <- 0 #the effects of the forth cell type beta.syn[1:(G * 0.2),4]<- -2 beta.syn[(G * 0.2 + 1):(G * 0.6),4] <- 0 beta.syn[(G * 0.6 + 1):(G * 0.8),4] <- 2 beta.syn[(G * 0.8 + 1):G,4] <- 0 #the batch effects nu.syn<-matrix(NA,G,B) #the first batch is regarded as the reference batch nu.syn[,1] <- 0 #the effect of the second batch nu.syn[1:(G * 0.4),2]<-2 nu.syn[(G * 0.4 + 1):(G * 0.8),2]<-1 nu.syn[(G * 0.8 + 1):G,2]<-2 #the cell-specific size factors delta.syn <- rep(NA, N) #The frist cell in each bathc is regarded as the reference cell delta.syn[1:(nb[1] * 0.5)]<-0 delta.syn[(nb[1] * 0.5 + 1):(nb[1] * 0.9)]<-1 delta.syn[(nb[1] * 0.9 + 1):nb[1]]<-2 #the second batch delta.syn[1:(nb[2] * 0.5) + nb[1]]<-0 delta.syn[(nb[2] * 0.5 + 1):(nb[2] * 0.7) + nb[1]]<-2 delta.syn[(nb[2] * 0.7 + 1):nb[2] + nb[1]]<--1 #the batch-specific and gene-specific overdispersion parameters phi.syn<-matrix(10,G,B) #the cell-type proportions in each batch pi.syn <- matrix(NA,K,B) #the frist batch pi.syn[,1]<-c(0.3,0.3,0.2,0.2) #the second batch pi.syn[,2]<-c(0.2,0.24,0.2,0.36) ############################################## # Simulate Latent Varibles and Observed data # ############################################## #the cell-type indicators of each cell w<- rep(NA, N) #the first batch nw<-nb[1] * pi.syn[,1] w[1:nb[1]]<- rep(1:4,nw) #the second batch nw<-nb[2] * pi.syn[,2] w[1:nb[2] + nb[1]]<- rep(1:4,nw) #the indicators for dropout events z<- matrix(NA, G, N) #the underlying true expression levels x <- matrix(NA, G, N) #the observed expression levels y <- matrix(NA, G, N) #the logarithm of mean expreesion level of each gene in each cell log.mu <- matrix(NA, G, N) #generate the latent variable and observed data batch_ind <- rep(1:B, nb) for(i in 1:N){ b <- batch_ind[i] log.mu[,i] <- alpha.syn + beta.syn[,w[i]] log.mu[,i] <- log.mu[,i] + nu.syn[,b] log.mu[,i] <- log.mu[,i] + delta.syn[i] for(j in 1:G){ x[j,i]<-rnbinom(1,phi.syn[j,b], mu=exp(log.mu[j,i])) logit_pi <- gamma.syn[b,1] + gamma.syn[b,2] * x[j,i] z[j,i]<-rbinom(1,1,prob = 1/(1+exp(-logit_pi))) if(z[j,i]==0){ y[j,i]<- x[j,i] }else{ y[j,i]<- 0 } } } sce <- SingleCellExperiment(assays = list(counts = y), colData = DataFrame(Batch_ind = factor(batch_ind))) BUSseqfits_example <- BUSseq_MCMC(ObservedData = sce, seed = 1234, n.cores = 2, n.celltypes = 4, n.iterations = 500)
BUSseq_MCMC
Function
The function gives the estimated cell-specific size effects in the output
SingleCellExperiment
object.
cell_effect_values(sce_BUSseqfit)
cell_effect_values(sce_BUSseqfit)
sce_BUSseqfit |
An output |
delta.est |
The estimated cell-specific global effects, an N-dimensional vector. Note that the first element in each vector is zero as the first cell in each batch is taken as the reference cell. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example delta.est <- cell_effect_values(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example delta.est <- cell_effect_values(BUSseqfits_example)
BUSseq_MCMC
Function
The function gives the estimated cell-type effects in the output
SingleCellExperiment
object.
celltype_effects(sce_BUSseqfit)
celltype_effects(sce_BUSseqfit)
sce_BUSseqfit |
An output |
beta.est |
The estimated cell-type effects, a G by K matrix, whose [g,k] element is the effects of cell type k on gene g compared with the first cell type. Note that the first column is zero as the first cell type is taken as the baseline cell type. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example beta.est <- celltype_effects(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example beta.est <- celltype_effects(BUSseqfits_example)
BUSseq_MCMC
Function
The function gives the estimated cell-type-specific mean expression
levels in the output SingleCellExperiment
object.
celltype_mean_expression(sce_BUSseqfit)
celltype_mean_expression(sce_BUSseqfit)
sce_BUSseqfit |
An output |
mu.est |
The estimated cell-type-specific mean expression levels, a G by K matrix, whose [g,k] element is the mean expression levels of cell type k on gene g. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example mu.est <- celltype_mean_expression(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example mu.est <- celltype_mean_expression(BUSseqfits_example)
BUSseq_MCMC
Function
The function gives the cell-type indicators of the output
SingleCellExperiment
object.
celltypes(sce_BUSseqfit)
celltypes(sce_BUSseqfit)
sce_BUSseqfit |
An output |
w.est |
The estimated cell-type indicators, an N-dimensional vector, where N is the total number of cells. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example celltypes_est <- celltypes(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example celltypes_est <- celltypes(BUSseqfits_example)
BUSseq_MCMC
The function generates a version of count data, for which the batch effects are removed and the biological variabilities are retained. We develop a quantile match approach based on the idea of inverse transform sampling. The users can perform downstream analysis on the corrected read count matrix, such as clustering, differentially expressed gene identification and so on, as if all the data were measured in a single batch.
corrected_read_counts(sce_BUSseqfit)
corrected_read_counts(sce_BUSseqfit)
sce_BUSseqfit |
An output |
corrected_data |
The corrected read count matrix, which is added to
the assays of |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output library(SingleCellExperiment) BUSseqfits_example BUSseqfits_corrected <- corrected_read_counts(BUSseqfits_example) corrected_data <- assay(BUSseqfits_corrected, "corrected_data")
# "BUSseqfits_example" is an example output library(SingleCellExperiment) BUSseqfits_example BUSseqfits_corrected <- corrected_read_counts(BUSseqfits_example) corrected_data <- assay(BUSseqfits_corrected, "corrected_data")
BUSseq_MCMC
Function
The function gives the intercept and odds ratio of the logistic regression
for dropout events in the output SingleCellExperiment
object.
dropout_coefficient_values(sce_BUSseqfit)
dropout_coefficient_values(sce_BUSseqfit)
sce_BUSseqfit |
An output |
gamma.est |
The estimated intercept and log ratio of the logistic regression for dropout events, a 2-diminsional vector. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example gamma.est <- dropout_coefficient_values(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example gamma.est <- dropout_coefficient_values(BUSseqfits_example)
BUSseq_MCMC
Function
Plot the heatmap of the log-scale read count data across multiple batches, and then save the resulting images in the user's directory as "png" format.
heatmap_data_BUSseq(sce_BUSseqfit, data_type = c("Raw","Imputed","Corrected"), gene_set = NULL, project_name= paste0("BUSseq_heatmap_",data_type), image_dir = NULL, color_key_seq = NULL, image_width = 1440, image_height = 1080)
heatmap_data_BUSseq(sce_BUSseqfit, data_type = c("Raw","Imputed","Corrected"), gene_set = NULL, project_name= paste0("BUSseq_heatmap_",data_type), image_dir = NULL, color_key_seq = NULL, image_width = 1440, image_height = 1080)
sce_BUSseqfit |
An output |
data_type |
A string to determine which count data matrix is used to draw the heatmap, "Raw" for the raw count data, "Imputed" for the imputed data, and "Corrected" for the corrected data. |
gene_set |
A vector of gene indices indicating the gene set of interest to
display in the heatmap. The default is all genes.
We also recommend displaying the intrinsic genes obtained from
|
project_name |
A string to name the "png" image. By default, the figure is named as "BUSseq_heatmap_Raw_log1p_data.png." |
image_dir |
A directory to store the gnereated heatmap. The default is to create a folder called "image" in the current directory and save there. |
color_key_seq |
A numeric vector indicating the splitting points for binning log-scale read counts into colors. The default is to space the color key points equally between the minimum and maximum of the log-scale read count data. |
image_width |
The width in pixels of the graphical device to plot. The default is 1440 px. |
image_height |
The height in pixels of the graphical device to plot. The default is 1080 px. |
To cope with the zeros in the count data, we take the transformation
log(1+x) on all count data, which corresponds to the R function
log1p()
instead of log()
.
Visualize the gene expression data matrix, where each row represents a gene, and each column represents a sample.
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
library(SingleCellExperiment) # Plot the imputed read count data of the first 100 genes heatmap_data_BUSseq(BUSseqfits_example, data_type = "Imputed", gene_set = 1:100)
library(SingleCellExperiment) # Plot the imputed read count data of the first 100 genes heatmap_data_BUSseq(BUSseqfits_example, data_type = "Imputed", gene_set = 1:100)
BUSseq_MCMC
Function
The function gives the imputed read counts in the output
SingleCellExperiment
object.
imputed_read_counts(sce_BUSseqfit)
imputed_read_counts(sce_BUSseqfit)
sce_BUSseqfit |
An output |
CountData_imputed |
The imputed read counts, a |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output library(SingleCellExperiment) BUSseqfits_example Example_CountData_imputed <- imputed_read_counts(BUSseqfits_example)
# "BUSseqfits_example" is an example output library(SingleCellExperiment) BUSseqfits_example Example_CountData_imputed <- imputed_read_counts(BUSseqfits_example)
BUSseq_MCMC
Function
The function gives the estimated intrinsic gene indicators in the output
SingleCellExperiment
object.
intrinsic_genes_BUSseq(sce_BUSseqfit)
intrinsic_genes_BUSseq(sce_BUSseqfit)
sce_BUSseqfit |
An output |
intrinsic_genes |
A vector indicating whether a gene is intrinsic or not. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example intri_genes <- intrinsic_genes_BUSseq(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example intri_genes <- intrinsic_genes_BUSseq(BUSseqfits_example)
BUSseq_MCMC
Function
The function gives the estimated location batch effects in the output
SingleCellExperiment
object.
location_batch_effects(sce_BUSseqfit)
location_batch_effects(sce_BUSseqfit)
sce_BUSseqfit |
An output |
nu.est |
The estimated location batch effects, a G by B matrix, where [g,b] element is the location batch effect on gene g in the batch b compared with the first batch. Note that the first column is zero as the first batch is taken as the reference batch without batch effects. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example nu.est <- location_batch_effects(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example nu.est <- location_batch_effects(BUSseqfits_example)
BUSseq_MCMC
Function
The function gives the estimated overdispersion parameters in the output
SingleCellExperiment
object.
overdispersions(sce_BUSseqfit)
overdispersions(sce_BUSseqfit)
sce_BUSseqfit |
An output |
phi.est |
The estimated overdispersion parameters, a G by B matrix, where [g,b] element is the overdispersion parameter of gene g in batch b. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output BUSseqfits_example phi.est <- overdispersions(BUSseqfits_example)
# "BUSseqfits_example" is an example output BUSseqfits_example phi.est <- overdispersions(BUSseqfits_example)
BUSseq_MCMC
Function
The function gives the original read count matrix.
This function is actually equivalent to
assay(sce_BUSseqfit, "counts")
.
raw_read_counts(sce_BUSseqfit)
raw_read_counts(sce_BUSseqfit)
sce_BUSseqfit |
An output |
CountData_raw |
The raw read counts, a matrix with each row for a gene and each column for a cell. |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
# "BUSseqfits_example" is an example output library(SingleCellExperiment) BUSseqfits_example Example_CountData_raw <- raw_read_counts(BUSseqfits_example)
# "BUSseqfits_example" is an example output library(SingleCellExperiment) BUSseqfits_example Example_CountData_raw <- raw_read_counts(BUSseqfits_example)