Package 'scMerge' reference manual

Title:	scMerge: Merging multiple batches of scRNA-seq data
Description:	Like all gene expression data, single-cell data suffers from batch effects and other unwanted variations that makes accurate biological interpretations difficult. The scMerge method leverages factor analysis, stably expressed genes (SEGs) and (pseudo-) replicates to remove unwanted variations and merge multiple single-cell data. This package contains all the necessary functions in the scMerge pipeline, including the identification of SEGs, replication-identification methods, and merging of single-cell data.
Authors:	Yingxin Lin [aut, cre], Kevin Wang [aut], Sydney Bioinformatics and Biometrics Group [fnd]
Maintainer:	Yingxin Lin <[email protected]>
License:	GPL-3
Version:	1.23.0
Built:	2025-03-09 05:30:46 UTC
Source:	https://github.com/bioc/scMerge

Subsetted mouse ESC 'SingleCellExperiment' object

Description

A dataset containing 300 cells and 2026 genes from two batches of mouse ESC data

Usage

data(example_sce, package = 'scMerge')
data(example_sce, package = 'scMerge')

Format

A 'SingleCellExperiment' object

Source

https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2600/

References

Kolodziejczyk et al.

A fast version of the ruv::RUVIII algorithm

Description

Perform a fast version of the ruv::RUVIII algorithm for scRNA-Seq data noise estimation

Usage

fastRUVIII(
  Y,
  M,
  ctl,
  k = NULL,
  eta = NULL,
  svd_k = 50,
  include.intercept = TRUE,
  average = FALSE,
  BPPARAM = SerialParam(),
  BSPARAM = ExactParam(),
  fullalpha = NULL,
  return.info = FALSE,
  inputcheck = TRUE
)
fastRUVIII(
  Y,
  M,
  ctl,
  k = NULL,
  eta = NULL,
  svd_k = 50,
  include.intercept = TRUE,
  average = FALSE,
  BPPARAM = SerialParam(),
  BSPARAM = ExactParam(),
  fullalpha = NULL,
  return.info = FALSE,
  inputcheck = TRUE
)

Arguments

`Y`	The unnormalised scRNA-Seq data matrix. A m by n matrix, where m is the number of observations and n is the number of features.
`M`	The replicate mapping matrix. The mapping matrix has m rows (one for each observation), and each column represents a set of replicates. The (i, j)-th entry of the mapping matrix is 1 if the i-th observation is in replicate set j, and 0 otherwise. See ruv::RUVIII for more details.
`ctl`	An index vector to specify the negative controls. Either a logical vector of length n or a vector of integers.
`k`	The number of unwanted factors to remove. This is inherited from the ruvK argument from the scMerge::scMerge function.
`eta`	Gene-wise (as opposed to sample-wise) covariates. See ruv::RUVIII for details.
`svd_k`	If BSPARAM is set to `RandomParam` or `IrlbaParam` class from `BiocSingular` package, then `svd_k` will be used to used to reduce the computational cost of singular value decomposition. Default to 50.
`include.intercept`	When eta is specified (not NULL) but does not already include an intercept term, this will automatically include one. See ruv::RUVIII for details.
`average`	Average replicates after adjustment. See ruv::RUVIII for details.
`BPPARAM`	A `BiocParallelParam` class object from the `BiocParallel` package is used. Default is SerialParam().
`BSPARAM`	A `BiocSingularParam` class object from the `BiocSingular` package is used. Default is ExactParam().
`fullalpha`	Not used. Please ignore. See ruv::RUVIII for details.
`return.info`	Additional information relating to the computation of normalised matrix. We recommend setting this to true.
`inputcheck`	We recommend setting this to true.

Value

A normalised matrix of the same dimensions as the input matrix Y.

Author(s)

Yingxin Lin, John Ormerod, Kevin Wang

Examples

L = ruvSimulate(m = 200, n = 500, nc = 400, nCelltypes = 3, nBatch = 2, lambda = 0.1, sce = FALSE)
Y = L$Y; M = L$M; ctl = L$ctl
improved1 = scMerge::fastRUVIII(Y = Y, M = M, ctl = ctl, 
k = 20, BSPARAM = BiocSingular::ExactParam())
improved2 = scMerge::fastRUVIII(Y = Y, M = M, ctl = ctl, 
k = 20, BSPARAM = BiocSingular::RandomParam(), svd_k = 50)
old = ruv::RUVIII(Y = Y, M = M, ctl = ctl, k = 20)
all.equal(improved1, old)
all.equal(improved2, old)
L = ruvSimulate(m = 200, n = 500, nc = 400, nCelltypes = 3, nBatch = 2, lambda = 0.1, sce = FALSE)
Y = L$Y; M = L$M; ctl = L$ctl
improved1 = scMerge::fastRUVIII(Y = Y, M = M, ctl = ctl, 
k = 20, BSPARAM = BiocSingular::ExactParam())
improved2 = scMerge::fastRUVIII(Y = Y, M = M, ctl = ctl, 
k = 20, BSPARAM = BiocSingular::RandomParam(), svd_k = 50)
old = ruv::RUVIII(Y = Y, M = M, ctl = ctl, k = 20)
all.equal(improved1, old)
all.equal(improved2, old)

getAdjustedMat

Description

Get Adjusted Matrix with scMerge2 parameter estimated

Usage

getAdjustedMat(
  exprsMat,
  fullalpha,
  ctl = rownames(exprsMat),
  adjusted_means = NULL,
  ruvK = 20,
  return_subset_genes = NULL
)
getAdjustedMat(
  exprsMat,
  fullalpha,
  ctl = rownames(exprsMat),
  adjusted_means = NULL,
  ruvK = 20,
  return_subset_genes = NULL
)

Arguments

`exprsMat`	A gene (row) by cell (column) matrix to be adjusted.
`fullalpha`	A matrix indicates the estimated alpha returned by `scMerge2()`.
`ctl`	A character vector of negative control. It should have a non-empty intersection with the rows of exprsMat.
`adjusted_means`	A rowwise mean of the gene by cell matrix
`ruvK`	An integer indicates the number of unwanted variation factors that are removed, default is 20.
`return_subset_genes`	An optional character vector of indicates the subset of genes will be adjusted.

Value

Returns the adjusted matrix will be return.

Author(s)

Yingxin Lin

Examples

## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')
## Running an example data with minimal inputs
library(SingleCellExperiment)
scMerge2_res <- scMerge2(exprsMat = logcounts(example_sce),
batch = example_sce$batch,
ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
return_matrix = FALSE)
cosineNorm_mat <- batchelor::cosineNorm(logcounts(example_sce))
adjusted_means <- DelayedMatrixStats::rowMeans2(cosineNorm_mat)
newY <- getAdjustedMat(cosineNorm_mat, scMerge2_res$fullalpha,
              ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
              ruvK = 20,
              adjusted_means = adjusted_means)
assay(example_sce, "scMerge2") <- newY

example_sce = scater::runPCA(example_sce, exprs_values = 'scMerge2')                                       
scater::plotPCA(example_sce, colour_by = 'cellTypes', shape = 'batch')
## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')
## Running an example data with minimal inputs
library(SingleCellExperiment)
scMerge2_res <- scMerge2(exprsMat = logcounts(example_sce),
batch = example_sce$batch,
ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
return_matrix = FALSE)
cosineNorm_mat <- batchelor::cosineNorm(logcounts(example_sce))
adjusted_means <- DelayedMatrixStats::rowMeans2(cosineNorm_mat)
newY <- getAdjustedMat(cosineNorm_mat, scMerge2_res$fullalpha,
              ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
              ruvK = 20,
              adjusted_means = adjusted_means)
assay(example_sce, "scMerge2") <- newY

example_sce = scater::runPCA(example_sce, exprs_values = 'scMerge2')                                       
scater::plotPCA(example_sce, colour_by = 'cellTypes', shape = 'batch')

Simulate a simple matrix or SingleCellExperiment to test internals of scMerge

Description

This function is designed to generate Poisson-random-variable data matrix to test on the internal algorithms of scMerge. It does not represent real biological situations and it is not intended to be used by end-users.

Usage

ruvSimulate(
  m = 100,
  n = 5000,
  nc = floor(n/2),
  nCelltypes = 3,
  nBatch = 2,
  k = 20,
  lambda = 0.1,
  sce = FALSE
)
ruvSimulate(
  m = 100,
  n = 5000,
  nc = floor(n/2),
  nCelltypes = 3,
  nBatch = 2,
  k = 20,
  lambda = 0.1,
  sce = FALSE
)

Arguments

`m`	Number of observations
`n`	Number of features
`nc`	Number of negative controls
`nCelltypes`	Number of cell-types
`nBatch`	Number of batches
`k`	Number of unwanted factors in simulation
`lambda`	Rate parameter for random Poisson generation
`sce`	If `TRUE`, returns a SingleCellExperiment object

Value

If sce is FALSE, then the output is a list consists of

Y, expression matrix generated through Poisson random variables,
ctl, a logical vector indicating the control genes,
M, replicate mapping matrix,
cellTypes, a vector indicating simulated cell types
batch, a vector indicating simulated batches

if sce is TRUE, a SingleCellExperiment wrapper will be applied on all above simulated objects.

Examples

set.seed(1)
L = ruvSimulate(m = 200, n = 1000, nc = 200, 
nCelltypes = 3, nBatch = 2, lambda = 0.1, k = 10, sce = TRUE)
print(L)
example <- scMerge(sce_combine = L,
                      ctl = paste0('gene', 1:500),
                      cell_type = L$cellTypes,
                      ruvK = 10,
                      assay_name = 'scMerge')
L = scater::runPCA(L, exprs_values = "logcounts")                      
scater::plotPCA(L, colour_by = 'cellTypes', shape = 'batch')

example = scater::runPCA(example, exprs_values = 'scMerge')                                       
scater::plotPCA(example, colour_by = 'cellTypes', shape = 'batch')
set.seed(1)
L = ruvSimulate(m = 200, n = 1000, nc = 200, 
nCelltypes = 3, nBatch = 2, lambda = 0.1, k = 10, sce = TRUE)
print(L)
example <- scMerge(sce_combine = L,
                      ctl = paste0('gene', 1:500),
                      cell_type = L$cellTypes,
                      ruvK = 10,
                      assay_name = 'scMerge')
L = scater::runPCA(L, exprs_values = "logcounts")                      
scater::plotPCA(L, colour_by = 'cellTypes', shape = 'batch')

example = scater::runPCA(example, exprs_values = 'scMerge')                                       
scater::plotPCA(example, colour_by = 'cellTypes', shape = 'batch')

Combind several `SingleCellExperiment` objects from different batches/experiments

Description

Combind several SingleCellExperiment objects from different batches/experiments.

Usage

sce_cbind(
  sce_list,
  method = "intersect",
  cut_off_batch = 0.01,
  cut_off_overall = 0.01,
  exprs = c("counts", "logcounts"),
  colData_names = NULL,
  batch_names = NULL
)
sce_cbind(
  sce_list,
  method = "intersect",
  cut_off_batch = 0.01,
  cut_off_overall = 0.01,
  exprs = c("counts", "logcounts"),
  colData_names = NULL,
  batch_names = NULL
)

Arguments

`sce_list`	A list contains the `SingleCellExperiment` Object from each batch
`method`	A string indicates the method of combining the gene expression matrix, either `union` or `intersect`. Default to `intersect`. `union` only supports matrix class.
`cut_off_batch`	A numeric vector indicating the cut-off for the proportion of a gene is expressed within each batch
`cut_off_overall`	A numeric vector indicating the cut-off for the proportion of a gene is expressed overall data
`exprs`	A string vector indicating the expression matrices to be combined. The first assay named will be used to determine the proportion of zeores.
`colData_names`	A string vector indicating the `colData` that are combined
`batch_names`	A string vector indicating the batch names for the output sce object

Value

A SingleCellExperiment object with the list of SCE objects combined.

Author(s)

Yingxin Lin

Examples

data('example_sce', package = 'scMerge')
batch_names<-unique(example_sce$batch)
sce_list<-list(example_sce[,example_sce$batch=='batch2'],
               example_sce[,example_sce$batch=='batch3'])
sce_combine<-sce_cbind(sce_list,batch_names=batch_names)
data('example_sce', package = 'scMerge')
batch_names<-unique(example_sce$batch)
sce_list<-list(example_sce[,example_sce$batch=='batch2'],
               example_sce[,example_sce$batch=='batch3'])
sce_combine<-sce_cbind(sce_list,batch_names=batch_names)

Perform the scMerge algorithm

Description

Merge single-cell RNA-seq data from different batches and experiments leveraging (pseudo)-replicates and control genes.

Usage

scMerge(
  sce_combine,
  ctl = NULL,
  kmeansK = NULL,
  exprs = "logcounts",
  hvg_exprs = "counts",
  batch_name = "batch",
  marker = NULL,
  marker_list = NULL,
  ruvK = 20,
  replicate_prop = 1,
  cell_type = NULL,
  cell_type_match = FALSE,
  cell_type_inc = NULL,
  BSPARAM = ExactParam(),
  svd_k = 50,
  dist = "cor",
  WV = NULL,
  WV_marker = NULL,
  BPPARAM = SerialParam(),
  return_all_RUV = FALSE,
  assay_name = NULL,
  plot_igraph = TRUE,
  verbose = FALSE
)
scMerge(
  sce_combine,
  ctl = NULL,
  kmeansK = NULL,
  exprs = "logcounts",
  hvg_exprs = "counts",
  batch_name = "batch",
  marker = NULL,
  marker_list = NULL,
  ruvK = 20,
  replicate_prop = 1,
  cell_type = NULL,
  cell_type_match = FALSE,
  cell_type_inc = NULL,
  BSPARAM = ExactParam(),
  svd_k = 50,
  dist = "cor",
  WV = NULL,
  WV_marker = NULL,
  BPPARAM = SerialParam(),
  return_all_RUV = FALSE,
  assay_name = NULL,
  plot_igraph = TRUE,
  verbose = FALSE
)

Arguments

`sce_combine`	A `SingleCellExperiment` object contains the batch-combined matrix with batch info in colData.
`ctl`	A character vector of negative control. It should have a non-empty intersection with the rows of sce_combine.
`kmeansK`	A vector indicates the kmeans's K for each batch. The length of kmeansK needs to be the same as the number of batch.
`exprs`	A string indicating the name of the assay requiring batch correction in sce_combine, default is logcounts.
`hvg_exprs`	A string indicating the assay that to be used for highly variable genes identification in sce_combine, default is counts.
`batch_name`	A character indicating the name of the batch column, default to "batch"
`marker`	An optional vector of markers, to be used in calculation of mutual nearest cluster. If no markers input, highly variable genes will be used instead.
`marker_list`	An optional list of markers for each batch, which will be used in calculation of mutual nearest cluster.
`ruvK`	An optional integer/vector indicating the number of unwanted variation factors that are removed, default is 20.
`replicate_prop`	A number indicating the ratio of cells that are included in pseudo-replicates, ranges from 0 to 1. Default to 1.
`cell_type`	An optional vector indicating the cell type information for each cell in the batch-combined matrix. If it is `NULL`, pseudo-replicate procedure will be run to identify cell type.
`cell_type_match`	An optional logical input for whether to find mutual nearest cluster using cell type information.
`cell_type_inc`	An optional vector indicating the indices of the cells that will be used to supervise the pseudo-replicate procedure.
`BSPARAM`	A `BiocSingularParam` class object from the `BiocSingular` package is used. Default is ExactParam().
`svd_k`	If BSPARAM is set to `RandomParam` or `IrlbaParam` class from `BiocSingular` package, then `svd_k` will be used to used to reduce the computational cost of singular value decomposition. Default to 50.
`dist`	The distance metrics that are used in the calculation of the mutual nearest cluster, default is Pearson correlation.
`WV`	A optional vector indicating the wanted variation factor other than cell type info, such as cell stages.
`WV_marker`	An optional vector indicating the markers of the wanted variation.
`BPPARAM`	A `BiocParallelParam` class object from the `BiocParallel` package is used. Default is SerialParam().
`return_all_RUV`	If `FALSE`, then only returns a `SingleCellExperiment` object with original data and one normalised matrix. Otherwise, the `SingleCellExperiment` object will contain the original data and one normalised matrix for `each` ruvK value. In this latter case, assay_name must have the same length as ruvK.
`assay_name`	The assay name(s) for the adjusted expression matrix(matrices). If `return_all_RUV = TRUE` assay_name must have the same length as ruvK.
`plot_igraph`	If `TRUE`, then during the un/semi-supervised scMerge, igraph plot will be displayed
`verbose`	If `TRUE`, then all intermediate steps will be shown. Default to `FALSE`.

Value

Returns a SingleCellExperiment object with following components:

assays: the original assays and also the normalised matrix
metadata: containing the ruvK vector, ruvK_optimal based on F-score, and the replicate matrix

Author(s)

Yingxin Lin, Kevin Wang

Examples

## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')
## Running an example data with minimal inputs
sce_mESC <- scMerge(sce_combine = example_sce,
ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
kmeansK = c(3, 3),
assay_name = 'scMerge')

sce_mESC = scater::runPCA(sce_mESC, exprs_values = "logcounts")                      
scater::plotPCA(sce_mESC, colour_by = 'cellTypes', shape = 'batch')

sce_mESC = scater::runPCA(sce_mESC, exprs_values = 'scMerge')                                       
scater::plotPCA(sce_mESC, colour_by = 'cellTypes', shape = 'batch')
## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')
## Running an example data with minimal inputs
sce_mESC <- scMerge(sce_combine = example_sce,
ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
kmeansK = c(3, 3),
assay_name = 'scMerge')

sce_mESC = scater::runPCA(sce_mESC, exprs_values = "logcounts")                      
scater::plotPCA(sce_mESC, colour_by = 'cellTypes', shape = 'batch')

sce_mESC = scater::runPCA(sce_mESC, exprs_values = 'scMerge')                                       
scater::plotPCA(sce_mESC, colour_by = 'cellTypes', shape = 'batch')

Perform the scMerge2 algorithm

Description

Merge single-cell data from different batches and experiments leveraging (pseudo)-replicates, control genes and pseudo-bulk.

Usage

scMerge2(
  exprsMat,
  batch,
  cellTypes = NULL,
  condition = NULL,
  ctl = rownames(exprsMat),
  chosen.hvg = NULL,
  ruvK = 20,
  use_bpparam = BiocParallel::SerialParam(),
  use_bsparam = BiocSingular::RandomParam(),
  use_bnparam = BiocNeighbors::AnnoyParam(),
  pseudoBulk_fn = "create_pseudoBulk",
  k_pseudoBulk = 30,
  k_celltype = 10,
  exprsMat_counts = NULL,
  cosineNorm = TRUE,
  return_subset = FALSE,
  return_subset_genes = NULL,
  return_matrix = TRUE,
  byChunk = TRUE,
  chunkSize = 50000,
  verbose = TRUE,
  seed = 1
)
scMerge2(
  exprsMat,
  batch,
  cellTypes = NULL,
  condition = NULL,
  ctl = rownames(exprsMat),
  chosen.hvg = NULL,
  ruvK = 20,
  use_bpparam = BiocParallel::SerialParam(),
  use_bsparam = BiocSingular::RandomParam(),
  use_bnparam = BiocNeighbors::AnnoyParam(),
  pseudoBulk_fn = "create_pseudoBulk",
  k_pseudoBulk = 30,
  k_celltype = 10,
  exprsMat_counts = NULL,
  cosineNorm = TRUE,
  return_subset = FALSE,
  return_subset_genes = NULL,
  return_matrix = TRUE,
  byChunk = TRUE,
  chunkSize = 50000,
  verbose = TRUE,
  seed = 1
)

Arguments

`exprsMat`	A gene (row) by cell (column) log-transformed matrix to be adjusted.
`batch`	A vector indicating the batch information for each cell in the batch-combined matrix.
`cellTypes`	An optional vector indicating the cell type information for each cell in the batch-combined matrix. If it is `NULL`, pseudo-replicate procedure will be run to identify cell type.
`condition`	An optional vector indicating the condition information for each cell in the batch-combined matrix.
`ctl`	A character vector of negative control. It should have a non-empty intersection with the rows of exprsMat
`chosen.hvg`	An optional character vector of highly variable genes chosen.
`ruvK`	An integer indicates the number of unwanted variation factors that are removed, default is 20.
`use_bpparam`	A `BiocParallelParam` class object from the `BiocParallel` package is used. Default is SerialParam().
`use_bsparam`	A `BiocSingularParam` class object from the `BiocSingular` package is used. Default is RandomParam().
`use_bnparam`	A `BiocNeighborsParam` class object from the `BiocNeighbors` package is used. Default is AnnoyParam().
`pseudoBulk_fn`	A character indicates the way of pseudobulk constructed.
`k_pseudoBulk`	An integer indicates the number of pseudobulk constructed within each cell grouping. Default is 30.
`k_celltype`	An integer indicates the number of nearest neighbours used in `buildSNNGraph` when grouping cells within each batch. Default is 10.
`exprsMat_counts`	A gene (row) by cell (column) counts matrix to be adjusted.
`cosineNorm`	A logical vector indicates whether cosine normalisation is performed on input data.
`return_subset`	If `TRUE`, adjusted matrix of only a subset of genes (hvg or indicates in `return_subset_genes`) will be return.
`return_subset_genes`	An optional character vector of indicates the subset of genes will be adjusted.
`return_matrix`	A logical value indicates whether the adjusted matrix is calculated and returned.
`byChunk`	A logical value indicates whether it calculates the adjusted matrix by chunk
`chunkSize`	A numeric indicates the size of the chunk. If `FALSE`, then only the estimated parameters will be returned.
`verbose`	If `TRUE`, then all intermediate steps will be shown. Default to `FALSE`.
`seed`	A numeric input indicates the seed used.

Value

Returns a list object with following components:

newY: if return_matrix is TRUE, the adjusted matrix will be return.
fullalpha: Alpha estimated from the fastRUVIII model.
M: Replicate matrix.

Author(s)

Yingxin Lin

Examples

## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')
## Running an example data with minimal inputs
library(SingleCellExperiment)
exprsMat <- scMerge2(exprsMat = logcounts(example_sce),
batch = example_sce$batch,
ctl = segList_ensemblGeneID$mouse$mouse_scSEG)
assay(example_sce, "scMerge2") <- exprsMat$newY


example_sce = scater::runPCA(example_sce, exprs_values = 'scMerge2')                                       
scater::plotPCA(example_sce, colour_by = 'cellTypes', shape = 'batch')
## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')
## Running an example data with minimal inputs
library(SingleCellExperiment)
exprsMat <- scMerge2(exprsMat = logcounts(example_sce),
batch = example_sce$batch,
ctl = segList_ensemblGeneID$mouse$mouse_scSEG)
assay(example_sce, "scMerge2") <- exprsMat$newY


example_sce = scater::runPCA(example_sce, exprs_values = 'scMerge2')                                       
scater::plotPCA(example_sce, colour_by = 'cellTypes', shape = 'batch')

Perform the scMerge2 (hierarchical) algorithm

Description

Merge single-cell data hierarchically from different batches and experiments leveraging (pseudo)-replicates, control genes and pseudo-bulk.

Usage

scMerge2h(
  exprsMat,
  batch_list = list(),
  h_idx_list = list(),
  cellTypes = NULL,
  condition = NULL,
  ctl = rownames(exprsMat),
  chosen.hvg = NULL,
  ruvK_list = 20,
  use_bpparam = BiocParallel::SerialParam(),
  use_bsparam = BiocSingular::RandomParam(),
  use_bnparam = BiocNeighbors::AnnoyParam(),
  pseudoBulk_fn = "create_pseudoBulk",
  k_pseudoBulk = 30,
  k_celltype = 10,
  exprsMat_counts = NULL,
  cosineNorm = TRUE,
  return_subset = FALSE,
  return_subset_genes = NULL,
  return_matrix = TRUE,
  verbose = TRUE,
  seed = 1
)
scMerge2h(
  exprsMat,
  batch_list = list(),
  h_idx_list = list(),
  cellTypes = NULL,
  condition = NULL,
  ctl = rownames(exprsMat),
  chosen.hvg = NULL,
  ruvK_list = 20,
  use_bpparam = BiocParallel::SerialParam(),
  use_bsparam = BiocSingular::RandomParam(),
  use_bnparam = BiocNeighbors::AnnoyParam(),
  pseudoBulk_fn = "create_pseudoBulk",
  k_pseudoBulk = 30,
  k_celltype = 10,
  exprsMat_counts = NULL,
  cosineNorm = TRUE,
  return_subset = FALSE,
  return_subset_genes = NULL,
  return_matrix = TRUE,
  verbose = TRUE,
  seed = 1
)

Arguments

`exprsMat`	A gene (row) by cell (column) log-transformed matrix to be adjusted.
`batch_list`	A list indicating the batch information for each cell in the batch-combined matrix.
`h_idx_list`	A list indicating the indeces information in the hierarchical merging.
`cellTypes`	An optional vector indicating the cell type information for each cell in the batch-combined matrix. If it is `NULL`, pseudo-replicate procedure will be run to identify cell type.
`condition`	An optional vector indicating the condition information for each cell in the batch-combined matrix.
`ctl`	A character vector of negative control. It should have a non-empty intersection with the rows of exprsMat
`chosen.hvg`	An optional character vector of highly variable genes chosen.
`ruvK_list`	An integer indicates the number of unwanted variation factors that are removed, default is 20.
`use_bpparam`	A `BiocParallelParam` class object from the `BiocParallel` package is used. Default is SerialParam().
`use_bsparam`	A `BiocSingularParam` class object from the `BiocSingular` package is used. Default is RandomParam().
`use_bnparam`	A `BiocNeighborsParam` class object from the `BiocNeighbors` package is used. Default is AnnoyParam().
`pseudoBulk_fn`	A character indicates the way of pseudobulk constructed.
`k_pseudoBulk`	An integer indicates the number of pseudobulk constructed within each cell grouping. Default is 30.
`k_celltype`	An integer indicates the number of nearest neighbours used in `buildSNNGraph` when grouping cells within each batch. Default is 10.
`exprsMat_counts`	A gene (row) by cell (column) counts matrix to be adjusted.
`cosineNorm`	A logical vector indicates whether cosine normalisation is performed on input data.
`return_subset`	If `TRUE`, adjusted matrix of only a subset of genes (hvg or indicates in `return_subset_genes`) will be return.
`return_subset_genes`	An optional character vector of indicates the subset of genes will be adjusted.
`return_matrix`	A logical vector indicates whether the adjusted matrix is calculated and returned. If `FALSE`, then only the estimated parameters will be returned.
`verbose`	If `TRUE`, then all intermediate steps will be shown. Default to `FALSE`.
`seed`	A numeric input indicates the seed used.

Author(s)

Yingxin Lin

Examples


## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')


# Create a fake sample information
example_sce$sample <- rep(c(1:4), each = 50)

# Construct a hierarchical index list
h_idx_list <- list(level1 = split(1:200, example_sce$batch),
                   level2 = list(1:200))

# Construct a batch information list
batch_list <- list(level1 = split(example_sce$sample, example_sce$batch),
                   level2 = list(example_sce$batch))
library(SingleCellExperiment)
exprsMat <- scMerge2h(exprsMat = logcounts(example_sce),
batch_list = batch_list,
h_idx_list = h_idx_list,
ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
ruvK_list = c(2, 5))
assay(example_sce, "scMerge2") <- exprsMat[[length(h_idx_list)]]
example_sce = scater::runPCA(example_sce, exprs_values = 'scMerge2')                                       
scater::plotPCA(example_sce, colour_by = 'cellTypes', shape = 'batch')
## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')


# Create a fake sample information
example_sce$sample <- rep(c(1:4), each = 50)

# Construct a hierarchical index list
h_idx_list <- list(level1 = split(1:200, example_sce$batch),
                   level2 = list(1:200))

# Construct a batch information list
batch_list <- list(level1 = split(example_sce$sample, example_sce$batch),
                   level2 = list(example_sce$batch))
library(SingleCellExperiment)
exprsMat <- scMerge2h(exprsMat = logcounts(example_sce),
batch_list = batch_list,
h_idx_list = h_idx_list,
ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
ruvK_list = c(2, 5))
assay(example_sce, "scMerge2") <- exprsMat[[length(h_idx_list)]]
example_sce = scater::runPCA(example_sce, exprs_values = 'scMerge2')                                       
scater::plotPCA(example_sce, colour_by = 'cellTypes', shape = 'batch')

Create replicate matrix for scMerge algorithm

Description

Create replicate matrix for scMerge algorithm using un-/semi-/supervised approaches.

Usage

scReplicate(
  sce_combine,
  batch = NULL,
  kmeansK = NULL,
  exprs = "logcounts",
  hvg_exprs = "counts",
  marker = NULL,
  marker_list = NULL,
  replicate_prop = 1,
  cell_type = NULL,
  cell_type_match = FALSE,
  cell_type_inc = NULL,
  dist = "cor",
  WV = NULL,
  WV_marker = NULL,
  BPPARAM = SerialParam(),
  return_all = FALSE,
  BSPARAM = ExactParam(),
  plot_igraph = TRUE,
  verbose = FALSE
)
scReplicate(
  sce_combine,
  batch = NULL,
  kmeansK = NULL,
  exprs = "logcounts",
  hvg_exprs = "counts",
  marker = NULL,
  marker_list = NULL,
  replicate_prop = 1,
  cell_type = NULL,
  cell_type_match = FALSE,
  cell_type_inc = NULL,
  dist = "cor",
  WV = NULL,
  WV_marker = NULL,
  BPPARAM = SerialParam(),
  return_all = FALSE,
  BSPARAM = ExactParam(),
  plot_igraph = TRUE,
  verbose = FALSE
)

Arguments

`sce_combine`	A `SingleCellExperiment` object contains the batch-combined matrix with batch info in colData
`batch`	A vector indicates the batch information for each cell in the batch-combined matrix.
`kmeansK`	A vector indicates the kmeans's K for each batch, length of kmeansK needs to be the same as the number of batch.
`exprs`	A string indicates the assay that are used for batch correction, default is logcounts
`hvg_exprs`	A string indicates the assay that are used for highly variable genes identification, default is counts
`marker`	A vector of markers, which will be used in calculation of mutual nearest cluster. If no markers input, highly variable genes will be used instead
`marker_list`	A list of markers for each batch, which will be used in calculation of mutual nearest cluster.
`replicate_prop`	A number indicating the ratio of cells that are included in pseudo-replicates, ranges from 0 to 1. Default to 1.
`cell_type`	A vector indicates the cell type information for each cell in the batch-combined matrix. If it is `NULL`, pseudo-replicate procedure will be run to identify cell type.
`cell_type_match`	Whether find mutual nearest cluster using cell type information
`cell_type_inc`	A vector indicates the indices of the cells that will be used to supervise the pseudo-replicate procedure
`dist`	The distance metrics that are used in the calculation of the mutual nearest cluster, default is Pearson correlation.
`WV`	A vector indicates the wanted variation factor other than cell type info, such as cell stages.
`WV_marker`	A vector indicates the markers of the wanted variation.
`BPPARAM`	A `BiocParallelParam` class object from the `BiocParallel` package is used. Default is SerialParam().
`return_all`	If `FALSE`, only return the replicate matrix.
`BSPARAM`	A `BiocSingularParam` class object from the `BiocSingular` package is used. Default is ExactParam().
`plot_igraph`	If `TRUE`, then during the un/semi-supervised scMErge, igraph plot will be displayed
`verbose`	If `TRUE`, then all intermediate steps will be shown. Default to `FALSE`.

Value

If return_all is FALSE, return a replicate matrix. If return_sce is TRUE, return the followings

`repMat`	replicate matrix
`mnc`	mutual nearest cluster
`replicate vector`	replicate vector
`HVG`	highly variable genes used in scReplicate

A cell-replicates mapping matrix. Each row correspond to a cell from the input expression matrix, and each column correspond to a cell-cluster/cell-type. An element of the mapping matrix is 1 if the scReplicate algorithm determines that this cell should belong to that cell cluster and 0 otherwise.

Author(s)

Yingxin Lin, Kevin Wang

Examples

## Loading example data
set.seed(1)
data('example_sce', package = 'scMerge')
scRep_result = scReplicate(
  sce_combine = example_sce,
  batch = example_sce$batch,
  kmeansK = c(3,3))

## Loading example data
set.seed(1)
data('example_sce', package = 'scMerge')
scRep_result = scReplicate(
  sce_combine = example_sce,
  batch = example_sce$batch,
  kmeansK = c(3,3))

RUVg function for single cell (under development)

Description

Modified based on RUV2 from package ruv and RUVg from package RUVSeq function (see these function's documentations for full documentations and usage)

Usage

scRUVg(
  Y,
  ctl,
  k,
  Z = 1,
  eta = NULL,
  include.intercept = TRUE,
  fullW = NULL,
  svdyc = NULL
)
scRUVg(
  Y,
  ctl,
  k,
  Z = 1,
  eta = NULL,
  include.intercept = TRUE,
  fullW = NULL,
  svdyc = NULL
)

Arguments

`Y`	The data. A m by n matrix, where m is the number of observations and n is the number of features.
`ctl`	index vector to specify the negative controls.
`k`	The number of unwanted factors to use.
`Z`	Any additional covariates to include in the model.
`eta`	Gene-wise (as opposed to sample-wise) covariates.
`include.intercept`	Applies to both Z and eta. When Z or eta (or both) is specified (not NULL) but does not already include an intercept term, this will automatically include one. If only one of Z or eta should include an intercept, this variable should be set to FALSE, and the intercept term should be included manually where desired.
`fullW`	Can be included to speed up execution. Is returned by previous calls of scRUVg
`svdyc`	Can be included to speed up execution. For internal use; please use fullW instead.

Value

A list consists of:

A matrix newY, the normalised matrix,
A matrix W, the unwanted variation matrix, and ;
A matrix alpha, this corresponding coefficient matrix for W.

Author(s)

Yingxin Lin, Kevin Wang

Examples

L = scMerge::ruvSimulate(m = 80, n = 1000, nc = 50, nCelltypes = 10)
Y = L$Y; ctl = L$ctl
ruvgRes = scMerge::scRUVg(Y = Y, ctl = ctl, k = 20)
L = scMerge::ruvSimulate(m = 80, n = 1000, nc = 50, nCelltypes = 10)
Y = L$Y; ctl = L$ctl
ruvgRes = scMerge::scRUVg(Y = Y, ctl = ctl, k = 20)

scRUVIII: RUVIII algorithm optimised for single cell data

Description

A function to perform location/scale adjustment to data as the input of RUVIII which also provides the option to select optimal RUVk according to the silhouette coefficient

Usage

scRUVIII(
  Y = Y,
  M = M,
  ctl = ctl,
  fullalpha = NULL,
  k = k,
  cell_type = NULL,
  batch = NULL,
  return_all_RUV = TRUE,
  BPPARAM = SerialParam(),
  BSPARAM = ExactParam(),
  svd_k = 50
)
scRUVIII(
  Y = Y,
  M = M,
  ctl = ctl,
  fullalpha = NULL,
  k = k,
  cell_type = NULL,
  batch = NULL,
  return_all_RUV = TRUE,
  BPPARAM = SerialParam(),
  BSPARAM = ExactParam(),
  svd_k = 50
)

Arguments

`Y`	The unnormalised SC data. A m by n matrix, where m is the number of observations and n is the number of features.
`M`	The replicate mapping matrix. The mapping matrix has m rows (one for each observation), and each column represents a set of replicates. The (i, j)-th entry of the mapping matrix is 1 if the i-th observation is in replicate set j, and 0 otherwise. See ruv::RUVIII for more details.
`ctl`	An index vector to specify the negative controls. Either a logical vector of length n or a vector of integers.
`fullalpha`	Not used. Please ignore.
`k`	The number of unwanted factors to remove. This is inherited from the ruvK argument from the scMerge::scMerge function.
`cell_type`	An optional vector indicating the cell type information for each cell in the batch-combined matrix. If it is `NULL`, pseudo-replicate procedure will be run to identify cell type.
`batch`	Batch information inherited from the scMerge::scMerge function.
`return_all_RUV`	Whether to return extra information on the RUV function, inherited from the scMerge::scMerge function
`BPPARAM`	A `BiocParallelParam` class object from the `BiocParallel` package is used. Default is SerialParam().
`BSPARAM`	A `BiocSingularParam` class object from the `BiocSingular` package is used. Default is ExactParam().
`svd_k`	If BSPARAM is set to `RandomParam` or `IrlbaParam` class from `BiocSingular` package, then `svd_k` will be used to used to reduce the computational cost of singular value decomposition. Default to 50.

Value

A list consists of:

RUV-normalised matrices: If k has multiple values, then the RUV-normalised matrices using all the supplied k values will be returned.
optimal_ruvK: The optimal RUV k value as determined by silhouette coefficient.

Author(s)

Yingxin Lin, Kevin Wang

Examples

L = ruvSimulate(m = 200, n = 1000, nc = 100, nCelltypes = 3, nBatch = 2, lambda = 0.1, sce = FALSE)
Y = t(log2(L$Y + 1L)); M = L$M; ctl = L$ctl; batch = L$batch;
res = scRUVIII(Y = Y, M = M, ctl = ctl, k = c(5, 10, 15, 20), batch = batch)
L = ruvSimulate(m = 200, n = 1000, nc = 100, nCelltypes = 3, nBatch = 2, lambda = 0.1, sce = FALSE)
Y = t(log2(L$Y + 1L)); M = L$M; ctl = L$ctl; batch = L$batch;
res = scRUVIII(Y = Y, M = M, ctl = ctl, k = c(5, 10, 15, 20), batch = batch)

Single Cell Stably Express Gene Index

Description

This function computes the single-cell Stably Expressed Gene (scSEG) index from Lin. et al. (2019) for a given single-cell count data matrix. Each gene in the data is fitted with a gamma-normal mixture model and the final SEG index is computed as an average of key parameters that measure the expression stability of a gene.

We recommend using either the pre-computed genes (see "See Also" below) or the top SEG genes from an user's own data as the control genes in the scMerge function (see the ctl argument in the scMerge function).

Usage

scSEGIndex(
  exprs_mat,
  cell_type = NULL,
  BPPARAM = SerialParam(progressbar = TRUE),
  return_all = FALSE
)
scSEGIndex(
  exprs_mat,
  cell_type = NULL,
  BPPARAM = SerialParam(progressbar = TRUE),
  return_all = FALSE
)

Arguments

`exprs_mat`	A log-transformed single-cell data, assumed to have no batch effect and covered a wide range of cell types. A n by m matrix, where n is the number of genes and m is the number of cells.
`cell_type`	A vector indicating the cell type information for each cell in the gene expression matrix. If it is `NULL`, the function calculates the scSEG index without using F-statistics.
`BPPARAM`	A `BiocParallelParam` class object from the `BiocParallel` package is used. Default is SerialParam(progressbar = TRUE).
`return_all`	Default to FALSE. If set to TRUE, then all genes will be returned. Otherwise, only genes with less than 80 percent zeroes will be returned.

Value

Returns a data frame. Each row is a gene and each column is a statistic relating to the stability of expression of each gene. The main statistic is the segIdx column, which is the SEG index.

Author(s)

Shila Ghazanfar, Yingxin Lin, Pengyi Yang

References

Evaluating stably expressed genes in single cells (2019). doi:10.1093/gigascience/giz106.

Examples

## Loading example data
data('example_sce', package = 'scMerge')
## subsetting genes to illustrate usage.
exprs_mat = SummarizedExperiment::assay(example_sce, 'logcounts')[1:110, 1:20]
set.seed(1)
result1 = scSEGIndex(exprs_mat = exprs_mat)
## If parallelisation is needed:
param = BiocParallel::MulticoreParam(workers = 2, progressbar = TRUE)
result2 = scSEGIndex(exprs_mat = exprs_mat, BPPARAM = param)
## Closing the parallelisation
BiocParallel::register(BPPARAM = BiocParallel::SerialParam())
## Loading example data
data('example_sce', package = 'scMerge')
## subsetting genes to illustrate usage.
exprs_mat = SummarizedExperiment::assay(example_sce, 'logcounts')[1:110, 1:20]
set.seed(1)
result1 = scSEGIndex(exprs_mat = exprs_mat)
## If parallelisation is needed:
param = BiocParallel::MulticoreParam(workers = 2, progressbar = TRUE)
result2 = scSEGIndex(exprs_mat = exprs_mat, BPPARAM = param)
## Closing the parallelisation
BiocParallel::register(BPPARAM = BiocParallel::SerialParam())

Stably expressed gene list in official gene symbols for both human and mouse

Description

A list includes the stably expressed genes for both human and mouse

Usage

data(segList, package = 'scMerge')
data(segList, package = 'scMerge')

Format

An object of class list of length 2.

Stably expressed gene list in EnsemblGeneID for both human and mouse

Description

A list includes the stably expressed genes for both human and mouse

Usage

data(segList_ensemblGeneID, package = 'scMerge')
data(segList_ensemblGeneID, package = 'scMerge')

Format

An object of class list of length 2.

Package 'scMerge'

Help Index

Subsetted mouse ESC 'SingleCellExperiment' object

Description

Usage

Format

Source

References

A fast version of the ruv::RUVIII algorithm

Description

Usage

Arguments

Value

Author(s)

Examples

getAdjustedMat

Description

Usage

Arguments

Value

Author(s)

Examples

Simulate a simple matrix or SingleCellExperiment to test internals of scMerge

Description

Usage

Arguments

Value

Examples

Combind several SingleCellExperiment objects from different batches/experiments

Description

Usage

Arguments

Value

Author(s)

Examples

Perform the scMerge algorithm

Description

Usage

Arguments

Value

Author(s)

Examples

Perform the scMerge2 algorithm

Description

Usage

Arguments

Value

Author(s)

Examples

Perform the scMerge2 (hierarchical) algorithm

Description

Usage

Arguments

Author(s)

Examples

Create replicate matrix for scMerge algorithm

Description

Usage

Arguments

Value

Author(s)

Examples

RUVg function for single cell (under development)

Description

Usage

Arguments

Value

Author(s)

Examples

scRUVIII: RUVIII algorithm optimised for single cell data

Description

Usage

Arguments

Value

Author(s)

Examples

Single Cell Stably Express Gene Index

Description

Usage

Arguments

Combind several `SingleCellExperiment` objects from different batches/experiments