Package 'demuxSNP' reference manual

Title:	scRNAseq demultiplexing using cell hashing and SNPs
Description:	This package assists in demultiplexing scRNAseq data using both cell hashing and SNP data. The SNP profile of each group is learned using high-confidence assignments from the cell hashing data. Cells which cannot be assigned with high confidence from the cell hashing data are assigned to their most similar group based on their SNPs. We also provide helper functions to optimise SNP selection, create training data and merge SNP data into the SingleCellExperiment framework.
Authors:	Michael Lynch [aut, cre], Aedin Culhane [aut]
Maintainer:	Michael Lynch <[email protected]>
License:	GPL-3
Version:	1.5.0
Built:	2025-03-19 03:51:08 UTC
Source:	https://github.com/bioc/demuxSNP

Add SNPs to SingleCellExperiment object

Description

Add SNPs to SingleCellExperiment object

Usage

add_snps(sce, mat, thresh = 0.8)

Arguments

`sce`	object of class SingleCellExperiment
`mat`	object of class matrix, output from VarTrix in 'consensus' mode (default)
`thresh`	threshold for the presence of a SNP; defaults to 0.8

Value

Updated SingleCellExperiment object with the SNPs stored in the altExp slot

Examples

data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce,
    mat = vartrix_consensus_snps,
    thresh = 0.8)

Return a character vector of top n most frequent genes from a SingleCellExperiment object.

Description

Returns a character vector of the top n most frequently expressed genes from the counts of the SingleCellExperiment object. Expression is based on having a count > 0 in a given cell.

Usage

common_genes(sce, n = 100)

Arguments

`sce`	a SingleCellExperiment object
`n`	number of genes to be returned. Defaults to n=100.

Value

character vector of n most frequently expressed genes.

Examples

data(multiplexed_scrnaseq_sce)
# common_genes() returns a character vector, so store it under its own name
top_genes <- common_genes(multiplexed_scrnaseq_sce)
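
The selection rule described above (rank genes by the number of cells in which their count is > 0) can be illustrated on a toy matrix in base R. This is a minimal sketch under stated assumptions, not the package's implementation; `toy_counts` and `top_expressed` are hypothetical names introduced only for this illustration.

```r
# Toy genes-by-cells count matrix (hypothetical data, for illustration only)
toy_counts <- matrix(
    c(0, 3, 1, 0,
      2, 2, 5, 1,
      0, 0, 0, 4),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Hypothetical helper: names of the n genes detected (count > 0) in the most cells
top_expressed <- function(counts, n = 100) {
    n_cells_detected <- rowSums(counts > 0)
    ranked <- names(sort(n_cells_detected, decreasing = TRUE))
    head(ranked, n)
}

top_expressed(toy_counts, n = 2)
```

Here geneB is detected in all four cells and geneA in two, so those two genes are returned in that order.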

Sample vcf file

Description

VCF file containing SNPs from a subset of the 1000 Genomes common variants, hg38 genome build.

Usage

data(commonvariants_1kgenomes_subset)

Format

An object of class CollapsedVCF with 2609 rows and 0 columns.

Value

`commonvariants_1kgenomes_subset`

An object of class CollapsedVCF

Source

https://cellsnp-lite.readthedocs.io/en/latest/snp_list.html

Run demuxmix to determine high-confidence calls

Description

Run demuxmix to determine high-confidence calls

Usage

high_conf_calls(sce, assay = "HTO", pacpt = 0.95)

Arguments

`sce`	Object of class SingleCellExperiment with HTO (or similar) altExp assay
`assay`	Name of altExp for cell hashing counts to be retrieved from
`pacpt`	acceptance probability for demuxmix model

Value

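Updated SingleCellExperiment object with logical vectors indicating the training data and the data to be classified (all cells), together with the assigned labels for all cells. These correspond to the `sce$train`, `sce$predict` and `sce$labels` defaults used by the reassign functions.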

Examples

data(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)

SingleCellExperiment object containing multiplexed RNA and HTO data from six biological samples

Description

Example SingleCellExperiment object containing multiplexed scRNAseq data from six donors, used throughout and built upon in the demuxSNP workflow.

Usage

data(multiplexed_scrnaseq_sce)

Format

An object of class SingleCellExperiment with 259 rows and 2000 columns.

Value

`multiplexed_scrnaseq_sce`

An object of class SingleCellExperiment

Reassign cells using knn

Description

k-nearest neighbour classification of cells. Training data is intended to be the labels of cells confidently called using cell-hashing-based methods, together with their corresponding SNPs. Prediction data can be the remaining cells, but can also include the training data. Doublets are simulated by randomly combining 'd' SNP profiles from each grouping combination.

Usage

reassign(
  sce,
  k = 10,
  d = 10,
  train_cells = sce$train,
  predict_cells = sce$predict
)

Arguments

`sce`	object of class SingleCellExperiment
`k`	number of neighbours used in knn, defaults to 10
`d`	number of doublets per group combination to simulate, defaults to 10
`train_cells`	logical vector specifying which cells to use to train classifier
`predict_cells`	logical vector specifying which cells to classify

Value

A SingleCellExperiment with updated group assignments called 'knn'

Examples

data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce,
    mat = vartrix_consensus_snps,
    thresh = 0.8)
multiplexed_scrnaseq_sce <- reassign(sce = multiplexed_scrnaseq_sce, k = 10)
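
The doublet simulation step described above (combining 'd' SNP profiles from each pair of groups) can be sketched in base R. This is only an illustration: the binary SNP encoding, the use of an element-wise maximum to merge two profiles, and the names `snps`, `labels` and `simulate_doublets` are all assumptions made for this example, not the package's implementation.

```r
set.seed(1)

# Toy SNPs-by-cells binary matrix: 1 = alternate allele observed, 0 = not observed
# (hypothetical encoding and data, for illustration only)
snps <- matrix(rbinom(6 * 8, 1, 0.5), nrow = 6,
               dimnames = list(paste0("snp", 1:6), paste0("cell", 1:8)))
labels <- rep(c("Hashtag1", "Hashtag2", "Hashtag3"), times = c(3, 3, 2))

# Hypothetical helper: for every pair of groups, simulate d doublets by merging
# one randomly drawn profile from each group (element-wise maximum, an assumption)
simulate_doublets <- function(snps, labels, d = 10) {
    pairs <- combn(unique(labels), 2)
    sims <- lapply(seq_len(ncol(pairs)), function(i) {
        idx_a <- which(labels == pairs[1, i])
        idx_b <- which(labels == pairs[2, i])
        a <- idx_a[sample.int(length(idx_a), d, replace = TRUE)]
        b <- idx_b[sample.int(length(idx_b), d, replace = TRUE)]
        pmax(snps[, a, drop = FALSE], snps[, b, drop = FALSE])
    })
    do.call(cbind, sims)
}

doublets <- simulate_doublets(snps, labels, d = 10)
dim(doublets)
```

With three groups there are three group pairs, so d = 10 yields 30 simulated doublet profiles that could be added to the training data under a doublet label.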

Reassign cells using balanced knn with Jaccard distance

Description

Reassign cells using balanced knn with Jaccard distance

Usage

reassign_balanced(
  sce,
  k = 20,
  d_prop = 0.5,
  train_cells = sce$train,
  predict_cells = sce$predict,
  nmin = 50,
  n = NULL
)

Arguments

`sce`	object of class SingleCellExperiment
`k`	number of neighbours used in knn, defaults to 20
`d_prop`	determines the number of doublets simulated, d, as a proportion of n (specified or calculated)
`train_cells`	logical vector specifying which cells to use to train classifier
`predict_cells`	logical vector specifying which cells to classify
`nmin`	minimum number of cells per class (where available)
`n`	number of cells per group; if NULL (the default), it is calculated from the data

Value

A SingleCellExperiment with updated group assignments called 'knn_balanced'

Examples

data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce,
    mat = vartrix_consensus_snps,
    thresh = 0.8)
multiplexed_scrnaseq_sce <- reassign_balanced(sce = multiplexed_scrnaseq_sce, k = 10, d_prop = 0.5)
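
The relationship between `n` and `d_prop` can be sketched in base R. This is an illustration only: it assumes `n` is supplied by the user, draws up to `n` cells per class without replacement, and rounds `d_prop * n` to get the number of doublets. How the package derives `n` from the data when it is NULL is not shown here, and `balanced_indices` is a hypothetical helper.

```r
set.seed(42)

# Hypothetical training labels with unbalanced class sizes
labels <- rep(c("Hashtag1", "Hashtag2", "Hashtag3"), times = c(200, 60, 25))

# Hypothetical helper: draw up to n cell indices per class without replacement
balanced_indices <- function(labels, n) {
    per_class <- lapply(split(seq_along(labels), labels), function(idx) {
        idx[sample.int(length(idx), min(n, length(idx)))]
    })
    unlist(per_class, use.names = FALSE)
}

n <- 50
d_prop <- 0.5
d <- round(d_prop * n)   # number of doublets to simulate, as a proportion of n

idx <- balanced_indices(labels, n)
table(labels[idx])
d
```

In this toy case the two larger classes are each down-sampled to 50 cells, the smallest class keeps its 25 available cells, and d is 25.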

Reassign cells based on SNPs

Description

Reassign cells based on SNPs

Usage

reassign_centroid(
  sce,
  train_cells = sce$train,
  predict_cells = sce$predict,
  labels = sce$labels,
  min_cells = 30,
  key = "Hashtag"
)

Arguments

`sce`	SingleCellExperiment object
`train_cells`	logical, cells to be used for training
`predict_cells`	logical, cells to be used for prediction
`labels`	provisional cell labels
`min_cells`	minimum coverage (number of cells with read at SNP location) for SNP to be used for classification.
`key`	unique key in naming of singlet groups used with grep to remove doublet/negative/uncertain labels

Value

character vector containing reassignments

Examples

data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce,
    mat = vartrix_consensus_snps,
    thresh = 0.8)
multiplexed_scrnaseq_sce <- reassign_centroid(multiplexed_scrnaseq_sce)
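
The roles of `key` and `min_cells` can be illustrated in base R. This sketch assumes that non-singlet labels ("negative", "multiplet", "uncertain") do not contain the key, that a missing read is encoded as NA, and that a SNP is kept when its coverage is at least `min_cells`. The toy data and the small `min_cells` value are hypothetical and chosen only to keep the example short.

```r
# Hypothetical provisional labels from cell hashing
labels <- c("Hashtag1", "Hashtag2", "negative", "multiplet", "uncertain", "Hashtag3")

# Use the key with grep to keep only the singlet group names
key <- "Hashtag"
singlet_groups <- unique(grep(key, labels, value = TRUE))

# Toy SNPs-by-cells matrix: NA = no read at the SNP location (assumed encoding)
snps <- matrix(
    c( 1, NA,  0,  1, NA,  0,
      NA, NA, NA,  1, NA, NA,
       0,  1,  1,  0,  1, NA),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("snp", 1:3), paste0("cell", 1:6))
)

# Coverage = number of cells with a read at each SNP; keep well-covered SNPs
min_cells <- 3   # toy value; the documented default is 30
coverage <- rowSums(!is.na(snps))
keep <- coverage >= min_cells

singlet_groups
keep
```

Only snp1 (4 cells) and snp3 (5 cells) pass the toy coverage filter; snp2 is observed in a single cell and is dropped.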

Reassign cells using knn with Jaccard distance

Description

Reassign cells using knn with Jaccard distance

Usage

reassign_jaccard(
  sce,
  k = 10,
  d = 10,
  train_cells = sce$train,
  predict_cells = sce$predict
)

Arguments

`sce`	object of class SingleCellExperiment
`k`	number of neighbours used in knn, defaults to 10
`d`	number of doublets per group combination to simulate, defaults to 10
`train_cells`	logical vector specifying which cells to use to train classifier
`predict_cells`	logical vector specifying which cells to classify

Value

A SingleCellExperiment with updated group assignments called 'knn_jaccard'

Examples

data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce,
    mat = vartrix_consensus_snps,
    thresh = 0.8)
multiplexed_scrnaseq_sce <- reassign_jaccard(sce = multiplexed_scrnaseq_sce, k = 10)
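
For readers unfamiliar with the distance used here, the following base R sketch shows k-nearest-neighbour voting with Jaccard distance on binary SNP profiles. It is a generic illustration under the assumption of a 0/1 encoding, not the package's implementation; `jaccard_dist`, `knn_jaccard_one` and the toy data are hypothetical.

```r
# Jaccard distance between two binary vectors: 1 - |A and B| / |A or B|
jaccard_dist <- function(x, y) {
    union_size <- sum(x | y)
    if (union_size == 0) return(0)
    1 - sum(x & y) / union_size
}

# Hypothetical helper: majority vote among the k nearest training profiles
knn_jaccard_one <- function(query, train, train_labels, k = 3) {
    dists <- apply(train, 2, jaccard_dist, y = query)
    nearest <- order(dists)[seq_len(k)]
    votes <- table(train_labels[nearest])
    names(votes)[which.max(votes)]
}

# Toy SNPs-by-cells binary training matrix (hypothetical data)
train <- cbind(
    c(1, 1, 0, 0), c(1, 1, 1, 0), c(1, 0, 0, 0),   # group "A"
    c(0, 0, 1, 1), c(0, 1, 1, 1), c(0, 0, 0, 1)    # group "B"
)
train_labels <- rep(c("A", "B"), each = 3)

knn_jaccard_one(c(1, 1, 0, 0), train, train_labels, k = 3)
```

The query profile shares most of its observed SNPs with the group "A" cells, so its three nearest neighbours all vote for "A".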

Subset common variants vcf file to only SNPs seen in 'top_genes'

Description

Subset common variants vcf file to only SNPs seen in 'top_genes'

Usage

subset_vcf(vcf, top_genes, ensdb)

Arguments

`vcf`	object of class CollapsedVCF
`top_genes`	output from 'common_genes' function, alternatively character vector containing custom gene names.
`ensdb`	object of class EnsDb corresponding to the organism and genome build of the data

Value

object of class CollapsedVCF containing subset of SNPs from supplied vcf seen in commonly expressed genes

Examples

data(multiplexed_scrnaseq_sce, commonvariants_1kgenomes_subset)
top_genes <- common_genes(multiplexed_scrnaseq_sce)
ensdb <- EnsDb.Hsapiens.v86::EnsDb.Hsapiens.v86
small_vcf <- subset_vcf(commonvariants_1kgenomes_subset, top_genes, ensdb)

Sample VarTrix output

Description

A sample output from VarTrix corresponding to the example SingleCellExperiment object, for a subset of SNPs located in well-observed genes.

Usage

data(vartrix_consensus_snps)

Format

An object of class matrix (inherits from array) with 2542 rows and 2000 columns.

Value

`vartrix_consensus_snps`

An object of class matrix
