Title: | scRNAseq demultiplexing using cell hashing and SNPs |
---|---|
Description: | This package assists in demultiplexing scRNAseq data using both cell hashing and SNP data. The SNP profile of each group is learned using high-confidence assignments from the cell hashing data. Cells which cannot be assigned with high confidence from the cell hashing data are assigned to their most similar group based on their SNPs. We also provide helper functions to optimise SNP selection, create training data and merge SNP data into the SingleCellExperiment framework. |
Authors: | Michael Lynch [aut, cre], Aedin Culhane [aut] |
Maintainer: | Michael Lynch <[email protected]> |
License: | GPL-3 |
Version: | 1.5.0 |
Built: | 2024-11-19 04:00:41 UTC |
Source: | https://github.com/bioc/demuxSNP |
Add SNPs to SingleCellExperiment object
add_snps(sce, mat, thresh = 0.8)
sce | object of class SingleCellExperiment |
mat | object of class matrix, output from VarTrix in 'consensus' mode (default) |
thresh | threshold presence of SNP, defaults to 0.8 |
Updated SingleCellExperiment object with snps in altExp slot
data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce, mat = vartrix_consensus_snps, thresh = 0.8)
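The role of `thresh` can be illustrated on a toy matrix in base R. This is a sketch of one possible reading only: it assumes that a SNP is kept when the fraction of cells with no call is below `thresh`, which the documentation above does not state explicitly. The 0/1/2/3 coding is the one VarTrix uses in consensus mode; `toy_mat`, `frac_missing` and `kept` are hypothetical names.

```r
# Toy VarTrix consensus matrix: rows are SNPs, columns are cells.
# Consensus coding: 0 = no call, 1 = ref/ref, 2 = alt/alt, 3 = ref/alt.
toy_mat <- matrix(
    c(0, 0, 0, 0, 0,   # snp1: no call in every cell
      1, 2, 0, 3, 1,   # snp2: no call in 20% of cells
      0, 0, 2, 1, 0),  # snp3: no call in 60% of cells
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("snp", 1:3), paste0("cell", 1:5))
)

thresh <- 0.8

# Assumed rule (not confirmed by the documentation): keep SNPs whose
# fraction of no-call cells is below thresh.
frac_missing <- rowSums(toy_mat == 0) / ncol(toy_mat)
kept <- toy_mat[frac_missing < thresh, , drop = FALSE]
kept_snps <- rownames(kept)
```

Under this reading, a lower `thresh` keeps fewer, better-covered SNPs.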
Returns a character vector of the top n most frequently expressed genes from the counts of the SingleCellExperiment object. Expression is based on having a count > 0 in a given cell.
common_genes(sce, n = 100)
sce | a SingleCellExperiment object |
n | number of genes to be returned. Defaults to n=100. |
character vector of n most frequently expressed genes.
data(multiplexed_scrnaseq_sce)
top_genes <- common_genes(multiplexed_scrnaseq_sce)
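The ranking rule described for `common_genes` (a gene counts as expressed in a cell when its count is greater than 0) can be sketched in base R on a toy count matrix. This illustrates the stated rule and is not the package's implementation; `toy_counts` and `top_expressed` are hypothetical names.

```r
# Toy count matrix: rows are genes, columns are cells (hypothetical data).
toy_counts <- matrix(
    c(0, 2, 1, 5,   # geneA: expressed in 3 cells
      0, 0, 0, 1,   # geneB: expressed in 1 cell
      3, 1, 4, 2),  # geneC: expressed in 4 cells
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Hypothetical helper: rank genes by the number of cells with count > 0
# and return the names of the top n.
top_expressed <- function(counts, n = 100) {
    n_cells_expressed <- rowSums(counts > 0)
    ranked <- names(sort(n_cells_expressed, decreasing = TRUE))
    head(ranked, n)
}

top_two <- top_expressed(toy_counts, n = 2)
```

Note that the ranking depends on how many cells express a gene, not on the total count, so a gene with a few very high counts ranks below a gene seen at low levels in many cells.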
VCF file containing SNPs from a subset of the 1k Genomes common variants HG38 genome build.
data(commonvariants_1kgenomes_subset)
An object of class CollapsedVCF with 2609 rows and 0 columns.
https://cellsnp-lite.readthedocs.io/en/latest/snp_list.html
Run demuxmix to determine high-confidence calls
high_conf_calls(sce, assay = "HTO", pacpt = 0.95)
sce | Object of class SingleCellExperiment with HTO (or similar) altExp assay |
assay | Name of altExp for cell hashing counts to be retrieved from |
pacpt | acceptance probability for demuxmix model |
Updated SingleCellExperiment object with logical vector indicating training data, data to be classified (all cells) and assigned labels for all cells.
data(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)
Example SingleCellExperiment object containing multiplexed scRNAseq data from six donors, used throughout and built upon in the demuxSNP workflow.
data(multiplexed_scrnaseq_sce)
An object of class SingleCellExperiment with 259 rows and 2000 columns.
k-nearest neighbour classification of cells. Training data is intended to be labels of cells confidently called using cell hashing based methods and their corresponding SNPs. Prediction data can be remaining cells but can also include the training data. Doublets are simulated by randomly combining 'd' SNP profiles from each grouping combination.
reassign(sce, k = 10, d = 10, train_cells = sce$train, predict_cells = sce$predict)
sce | object of class SingleCellExperiment |
k | number of neighbours used in knn, defaults to 10 |
d | number of doublets per group combination to simulate, defaults to 10 |
train_cells | logical vector specifying which cells to use to train classifier |
predict_cells | logical vector specifying which cells to classify |
A SingleCellExperiment with updated group assignments called 'knn'
data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce, mat = vartrix_consensus_snps, thresh = 0.8)
multiplexed_scrnaseq_sce <- reassign(sce = multiplexed_scrnaseq_sce, k = 10)
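The idea behind `reassign` (simulate doublets from pairs of groups, then classify by k-nearest neighbours) can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: combining two profiles with an element-wise maximum and using Euclidean distance are choices made for this sketch, and `simulate_doublets` and `knn_predict` are hypothetical helpers.

```r
set.seed(1)

# Toy SNP profiles (hypothetical): rows are SNPs, columns are cells.
# 1 = alternate allele seen, 0 = not seen.
profile_a <- rep(c(1, 0), each = 10)  # donor A carries SNPs 1-10
profile_b <- rep(c(0, 1), each = 10)  # donor B carries SNPs 11-20
train <- cbind(replicate(5, profile_a), replicate(5, profile_b))
labels <- rep(c("A", "B"), each = 5)

# Simulate d doublets for every pair of groups by combining one randomly
# chosen cell from each group (element-wise maximum is an assumption).
simulate_doublets <- function(train, labels, d) {
    pairs <- combn(unique(labels), 2)
    doublets <- lapply(seq_len(ncol(pairs)), function(i) {
        g1 <- which(labels == pairs[1, i])
        g2 <- which(labels == pairs[2, i])
        sapply(seq_len(d), function(j) {
            pmax(train[, g1[sample.int(length(g1), 1)]],
                 train[, g2[sample.int(length(g2), 1)]])
        })
    })
    do.call(cbind, doublets)
}

doublets <- simulate_doublets(train, labels, d = 3)
train_all <- cbind(train, doublets)
labels_all <- c(labels, rep("Doublet", ncol(doublets)))

# Majority vote among the k nearest training cells (Euclidean distance).
knn_predict <- function(query, train, labels, k) {
    dists <- sqrt(colSums((train - query)^2))
    nearest <- labels[order(dists)[seq_len(k)]]
    names(which.max(table(nearest)))
}

pred_singlet <- knn_predict(profile_a, train_all, labels_all, k = 3)
pred_doublet <- knn_predict(pmax(profile_a, profile_b), train_all, labels_all, k = 3)
```

Adding simulated doublets to the training set gives the classifier a "Doublet" class to vote for, so mixed profiles are not forced into one of the singlet groups.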
k-nearest neighbour classification of cells. Training data is intended to be labels of cells confidently called using cell hashing based methods and their corresponding SNPs. Prediction data can be remaining cells but can also include the training data. Doublets are simulated by randomly combining 'd' SNP profiles from each grouping combination.
reassign_balanced(sce, k = 20, d_prop = 0.5, train_cells = sce$train, predict_cells = sce$predict, nmin = 50, n = NULL)
sce | object of class SingleCellExperiment |
k | number of neighbours used in knn, defaults to 20 |
d_prop | determines the number of doublets simulated, d, as a proportion of n (specified or calculated) |
train_cells | logical vector specifying which cells to use to train classifier |
predict_cells | logical vector specifying which cells to classify |
nmin | minimum n per class (where available) |
n | number of cells per group (otherwise will be calculated from data) |
A SingleCellExperiment with updated group assignments called 'knn_balanced'
data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce, mat = vartrix_consensus_snps, thresh = 0.8)
multiplexed_scrnaseq_sce <- reassign_balanced(sce = multiplexed_scrnaseq_sce, k = 10, d_prop = 0.5)
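The relationship between `n` and `d_prop` can be sketched in base R. This is an illustration under assumptions, not the package's implementation: `n` is given explicitly, groups smaller than `n` are sampled with replacement, and whether d applies per pair of groups or in total is not stated in the documentation above. All variable names are hypothetical.

```r
set.seed(42)

# Hypothetical high-confidence labels with unequal group sizes.
labels <- c(rep("Hashtag1", 200), rep("Hashtag2", 40), rep("Hashtag3", 90))

n <- 50                   # cells per group, given explicitly here
d_prop <- 0.5
d <- round(d_prop * n)    # number of doublets to simulate, as a proportion of n

# Draw n cells per group so every class is equally represented.
# Sampling with replacement for groups smaller than n is an assumption.
balanced_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
    idx[sample.int(length(idx), n, replace = length(idx) < n)]
}))
balanced_counts <- table(labels[balanced_idx])
```

Balancing matters for k-nearest neighbour voting because a much larger group would otherwise contribute more candidate neighbours and pull ambiguous cells towards it.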
Reassign cells based on SNPs
reassign_centroid(sce, train_cells = sce$train, predict_cells = sce$predict, labels = sce$labels, min_cells = 30, key = "Hashtag")
sce | SingleCellExperiment object |
train_cells | logical, cells to be used for training |
predict_cells | logical, cells to be used for prediction |
labels | provisional cell labels |
min_cells | minimum coverage (number of cells with a read at the SNP location) for a SNP to be used for classification |
key | unique key in the naming of singlet groups, used with grep to remove doublet/negative/uncertain labels |
character vector containing reassignments
data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce, mat = vartrix_consensus_snps, thresh = 0.8)
multiplexed_scrnaseq_sce <- reassign_centroid(multiplexed_scrnaseq_sce)
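How `key` selects singlet groups, and what a centroid-based assignment could look like, can be sketched in base R. This is a rough illustration, not the package's implementation: the function name suggests a nearest-centroid approach, but using the mean SNP profile as the centroid and Euclidean distance are assumptions, and doublet handling is left out. All data and names below are hypothetical.

```r
# Toy data (hypothetical): rows are SNPs, columns are cells.
snps <- cbind(
    c(1, 1, 0, 0), c(1, 1, 0, 0),   # Hashtag1 cells
    c(0, 0, 1, 1), c(0, 0, 1, 1),   # Hashtag2 cells
    c(1, 1, 1, 1),                  # provisionally labelled Doublet
    c(1, 1, 0, 1)                   # provisionally labelled uncertain
)
labels <- c("Hashtag1", "Hashtag1", "Hashtag2", "Hashtag2",
            "Doublet", "uncertain")

# Keep only singlet groups, i.e. labels that match the key.
key <- "Hashtag"
singlet <- grepl(key, labels)
groups <- sort(unique(labels[singlet]))

# One centroid per singlet group: the mean SNP profile of its cells.
centroids <- sapply(groups, function(g) {
    rowMeans(snps[, labels == g, drop = FALSE])
})

# Assign every cell to the group with the nearest centroid
# (Euclidean distance is an assumption made for this sketch).
nearest_group <- apply(snps, 2, function(cell) {
    groups[which.min(colSums((centroids - cell)^2))]
})
```

Here the cell labelled "uncertain" differs from the Hashtag1 centroid at one SNP and from the Hashtag2 centroid at three, so it is assigned to Hashtag1.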
k-nearest neighbour classification of cells. Training data is intended to be labels of cells confidently called using cell hashing based methods and their corresponding SNPs. Prediction data can be remaining cells but can also include the training data. Doublets are simulated by randomly combining 'd' SNP profiles from each grouping combination.
reassign_jaccard(sce, k = 10, d = 10, train_cells = sce$train, predict_cells = sce$predict)
sce | object of class SingleCellExperiment |
k | number of neighbours used in knn, defaults to 10 |
d | number of doublets per group combination to simulate, defaults to 10 |
train_cells | logical vector specifying which cells to use to train classifier |
predict_cells | logical vector specifying which cells to classify |
A SingleCellExperiment with updated group assignments called 'knn_jaccard'
data(multiplexed_scrnaseq_sce, vartrix_consensus_snps)
multiplexed_scrnaseq_sce <- high_conf_calls(multiplexed_scrnaseq_sce)
multiplexed_scrnaseq_sce <- add_snps(sce = multiplexed_scrnaseq_sce, mat = vartrix_consensus_snps, thresh = 0.8)
multiplexed_scrnaseq_sce <- reassign_jaccard(sce = multiplexed_scrnaseq_sce, k = 10)
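The function name suggests that neighbours are ranked by Jaccard similarity rather than by a geometric distance. A minimal base R sketch of that measure for two binary SNP profiles follows. How the package encodes profiles before comparing them is not stated above, so treating any value greater than 0 as "present" is an assumption, and `jaccard_similarity` is a hypothetical helper.

```r
# Jaccard similarity between two binary SNP profiles:
# SNPs present in both cells, divided by SNPs present in either cell.
jaccard_similarity <- function(x, y) {
    x <- x > 0
    y <- y > 0
    union_size <- sum(x | y)
    if (union_size == 0) {
        return(NA_real_)  # undefined when neither cell has any SNP present
    }
    sum(x & y) / union_size
}

cell_1 <- c(1, 1, 0, 0, 1)
cell_2 <- c(1, 0, 0, 1, 1)

# SNPs 1 and 5 are shared; SNPs 1, 2, 4 and 5 are present in at least one cell.
sim <- jaccard_similarity(cell_1, cell_2)
```

Because positions that are absent in both cells do not enter the calculation, this measure is less affected by the many unobserved SNPs typical of sparse single-cell data.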
Subset common variants vcf file to only SNPs seen in 'top_genes'
subset_vcf(vcf, top_genes, ensdb)
vcf | object of class CollapsedVCF |
top_genes | output from 'common_genes' function, alternatively character vector containing custom gene names. |
ensdb | object of class EnsDb corresponding to organism, genome of data |
object of class CollapsedVCF containing subset of SNPs from supplied vcf seen in commonly expressed genes
data(multiplexed_scrnaseq_sce, commonvariants_1kgenomes_subset)
top_genes <- common_genes(multiplexed_scrnaseq_sce)
ensdb <- EnsDb.Hsapiens.v86::EnsDb.Hsapiens.v86
small_vcf <- subset_vcf(commonvariants_1kgenomes_subset, top_genes, ensdb)
A sample output from VarTrix corresponding to the sce SingleCellExperiment object, for a subset of SNPs located in well-observed genes.
data(vartrix_consensus_snps)
An object of class matrix (inherits from array) with 2542 rows and 2000 columns.