Title: | scReClassify: post hoc cell type classification of single-cell RNA-seq data |
---|---|
Description: | A post hoc cell type classification tool that fine-tunes cell type annotations generated by any cell type classification procedure, using the semi-supervised AdaSampling technique. The current version of scReClassify supports Support Vector Machine and Random Forest as base classifiers. |
Authors: | Pengyi Yang [aut] , Taiyun Kim [aut, cre] |
Maintainer: | Taiyun Kim <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 1.13.0 |
Built: | 2024-12-18 04:11:23 UTC |
Source: | https://github.com/bioc/scReClassify |
This function calculates the accuracy of the predicted labels against the true labels.
bAccuracy(cls.truth, final)
cls.truth |
A character vector of true class labels. |
final |
A vector of final classified label prediction from
|
An accuracy value.
Pengyi Yang, Taiyun Kim
data("gse87795_subset_sce")
mat.expr <- gse87795_subset_sce
cellTypes <- gse87795_subset_sce$cellTypes

# Get dimension reduced matrix. We are using `logNorm` assay from `mat.expr`.
mat.pc <- matPCs(mat.expr, assay = "logNorm")

# Here we are using Support Vector Machine as a base classifier.
result <- multiAdaSampling(mat.pc, cellTypes, classifier = "svm", percent = 1, L = 10)
final <- result$final

# Balanced accuracy
bacc <- bAccuracy(cellTypes, final)
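The example above labels the result as a balanced accuracy. As a rough illustration of that metric, the sketch below computes the mean per-class recall in base R. The helper name `balanced_accuracy` and the toy vectors are hypothetical, and this is an assumption about what `bAccuracy` measures rather than a copy of the package's implementation.

```r
# Hypothetical helper for illustration; not the package's bAccuracy().
# Balanced accuracy is taken here to be the mean of per-class recall.
balanced_accuracy <- function(truth, predicted) {
  truth <- as.character(truth)
  predicted <- as.character(predicted)
  stopifnot(length(truth) == length(predicted))
  classes <- unique(truth)
  # Recall for class k: fraction of cells truly in k that were predicted as k.
  recall <- vapply(classes,
                   function(k) mean(predicted[truth == k] == k),
                   numeric(1))
  mean(recall)
}

# Toy labels (made up): class "a" has 4 cells, class "b" has 2.
truth <- c("a", "a", "a", "a", "b", "b")
pred  <- c("a", "a", "a", "b", "b", "a")
bacc  <- balanced_accuracy(truth, pred)
```

Unlike plain accuracy, this weights every cell type equally, so a rare cell type that is poorly classified lowers the score as much as a common one.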
A SingleCellExperiment object containing a subset expression matrix of GSE87795. The data contains log2 transformed FPKM expression.
GSE87795 is a mouse fetal liver development dataset. This subset contains 1000 genes, 367 cells and 6 cell types.
The original GSE87795 data and the study details can be found at this link
gse87795_subset_sce
An object of class SingleCellExperiment
with 1000 rows and 367 columns.
Performs PCA on a given matrix and returns a dimension reduced matrix which captures at least 80% (default) of overall variability.
matPCs(data, assay = NULL, percentVar = 0.8)
data |
An expression matrix or a SingleCellExperiment object. |
assay |
An assay to select if |
percentVar |
The proportion of variance to capture (for example, 0.8 for 80%). This threshold is used to select the number of principal components. |
This function performs PCA to reduce the dimension of the gene expression matrix. The number of PCs retained is limited to between 10 and 20.
Dimensionally reduced matrix.
Pengyi Yang, Taiyun Kim
data("gse87795_subset_sce")
mat.expr <- gse87795_subset_sce
mat.pc <- matPCs(mat.expr, assay = "logNorm")

# To capture at least 70% of overall variability in the dataset:
mat.dim.reduct.70 <- matPCs(mat.expr, assay = "logNorm", percentVar = 0.7)
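To make the variance threshold concrete, here is a minimal base R sketch of choosing the number of principal components by cumulative variance and then clamping it to the 10 to 20 range described above. The helper `select_pcs`, its argument names, and the assumption that rows are cells are all hypothetical; `matPCs` itself may orient, centre or scale the data differently.

```r
# Hypothetical helper for illustration; not the package's matPCs().
# Assumes rows are cells (samples) and columns are genes (features).
select_pcs <- function(x, percent_var = 0.8, min_pcs = 10, max_pcs = 20) {
  pca <- prcomp(x, center = TRUE, scale. = FALSE)
  # Cumulative proportion of variance explained by the leading PCs.
  cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  # Smallest number of PCs reaching the requested threshold.
  n_needed <- which(cum_var >= percent_var)[1]
  # Clamp to [min_pcs, max_pcs], never exceeding the PCs available.
  n_keep <- min(max(n_needed, min_pcs), max_pcs, ncol(pca$x))
  pca$x[, seq_len(n_keep), drop = FALSE]
}

# Toy data (random, made up): 50 cells by 30 genes.
set.seed(1)
toy <- matrix(rnorm(50 * 30), nrow = 50, ncol = 30)
toy_pcs <- select_pcs(toy)
dim(toy_pcs)
```

Clamping trades a strict variance guarantee for a predictable input size to the downstream classifier, so the stated threshold may not be met when more than 20 PCs would be needed.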
Performs multiple adaptive sampling to train a classifier model.
multiAdaSampling( data, label, reducedDimName = NULL, classifier = "svm", percent = 1, L = 10, prob = FALSE, balance = TRUE, iter = 3 )
data |
A dimension reduced matrix from |
label |
A named vector of label information for each sample.
The names should match the sample names of |
reducedDimName |
A name of the |
classifier |
Base classifier model, either "SVM" ( |
percent |
Percentage of samples to select at each iteration. |
L |
Number of ensembles. Defaults to 10. |
prob |
Logical flag indicating whether to return each sample's probabilities for each class. |
balance |
Logical flag indicating whether the cell types are balanced.
If |
iter |
The number of AdaSampling iterations to perform. |
A final prediction, probabilities for each cell type and the model are returned as a list.
Pengyi Yang, Taiyun Kim
library(SingleCellExperiment)

# Loading the data
data("gse87795_subset_sce")
mat.expr <- gse87795_subset_sce
cellTypes <- gse87795_subset_sce$cellTypes

# Get dimension reduced matrix. We are using `logNorm` assay from `mat.expr`.
reducedDim(mat.expr, "matPCs") <- matPCs(mat.expr, assay = "logNorm")

# Here we are using Support Vector Machine as a base classifier.
result <- multiAdaSampling(mat.expr, cellTypes, reducedDimName = "matPCs",
                           classifier = "svm", percent = 1, L = 10)
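For intuition about the `iter` argument, the following base R sketch shows a simplified adaptive sampling loop: cells are resampled within each labelled class with probability proportional to the current model's confidence in their label, and the model is refit on the resampled set. Everything here is an assumption for illustration. The functions `centroid_probs` and `ada_sample_sketch` are hypothetical, a nearest-centroid classifier stands in for the SVM or Random Forest base classifiers, and the ensemble step controlled by `L` is omitted, so this is not the package's algorithm.

```r
# Stand-in classifier (assumption): nearest centroid, with a softmax over
# negative squared distances used as class probabilities.
centroid_probs <- function(train_x, train_y, new_x) {
  classes <- sort(unique(train_y))
  d <- sapply(classes, function(k) {
    centre <- colMeans(train_x[train_y == k, , drop = FALSE])
    rowSums(sweep(new_x, 2, centre)^2)
  })
  p <- exp(-(d - apply(d, 1, min)))  # subtract row minimum for stability
  p / rowSums(p)
}

# Hypothetical sketch of an adaptive sampling loop; not multiAdaSampling().
ada_sample_sketch <- function(x, label, iter = 3, seed = 1) {
  set.seed(seed)
  classes <- sort(unique(label))
  # Start with equal confidence in every given label.
  probs <- matrix(1, nrow(x), length(classes), dimnames = list(NULL, classes))
  for (i in seq_len(iter)) {
    # Resample within each class, favouring cells the model agrees with.
    idx <- unlist(lapply(classes, function(k) {
      members <- which(label == k)
      pick <- sample.int(length(members), length(members), replace = TRUE,
                         prob = probs[members, k] + 1e-8)
      members[pick]
    }))
    probs <- centroid_probs(x[idx, , drop = FALSE], label[idx], x)
  }
  classes[max.col(probs, ties.method = "first")]
}

# Toy data (made up): two separated clusters with three flipped labels.
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), 20, 2),
           matrix(rnorm(40, mean = 6), 20, 2))
truth <- rep(c("A", "B"), each = 20)
noisy <- truth
noisy[c(1, 2, 21)] <- c("B", "B", "A")
final <- ada_sample_sketch(x, noisy)
```

Because mislabelled cells receive low probability for their given label, they are drawn less often in later iterations and have less influence on the refit model.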
A post hoc cell type classification tool that fine-tunes cell type annotations generated by any cell type classification procedure, using the semi-supervised AdaSampling technique.
The current version of scReClassify supports Support Vector Machine and Random Forest as base classifiers.
Maintainer:
Taiyun Kim (ORCID:0000-0002-5028-836X)
Email: [email protected]
Authors:
Pengyi Yang (ORCID: 0000-0003-1098-3138)
Useful links:
Vignette available at: https://sydneybiox.github.io/scdney/