Title: | ILoReg: a tool for high-resolution cell population identification from scRNA-Seq data |
---|---|
Description: | ILoReg is a tool for identification of cell populations from scRNA-seq data. In particular, ILoReg is useful for finding cell populations with subtle transcriptomic differences. The method utilizes a self-supervised learning method, called Iteratitive Clustering Projection (ICP), to find cluster probabilities, which are used in noise reduction prior to PCA and the subsequent hierarchical clustering and t-SNE steps. Additionally, functions for differential expression analysis to find gene markers for the populations and gene expression visualization are provided. |
Authors: | Johannes Smolander [cre, aut], Sini Junttila [aut], Mikko S Venäläinen [aut], Laura L Elo [aut] |
Maintainer: | Johannes Smolander <[email protected]> |
License: | GPL-3 |
Version: | 1.17.0 |
Built: | 2024-11-19 03:58:20 UTC |
Source: | https://github.com/bioc/ILoReg |
The AnnotationScatterPlot enables visualizing arbitrary class labels over the nonliner dimensionality reduction, e.g. t-SNE or UMAP.
AnnotationScatterPlot.SingleCellExperiment( object, annotation, return.plot, dim.reduction.type, point.size, show.legend ) ## S4 method for signature 'SingleCellExperiment' AnnotationScatterPlot( object, annotation = NULL, return.plot = FALSE, dim.reduction.type = "", point.size = 0.7, show.legend = FALSE )
AnnotationScatterPlot.SingleCellExperiment( object, annotation, return.plot, dim.reduction.type, point.size, show.legend ) ## S4 method for signature 'SingleCellExperiment' AnnotationScatterPlot( object, annotation = NULL, return.plot = FALSE, dim.reduction.type = "", point.size = 0.7, show.legend = FALSE )
object |
of |
annotation |
a character vector, factor or numeric for the class labels. |
return.plot |
return.plot whether to return the ggplot2 object or
just draw it. Default is |
dim.reduction.type |
"tsne" or "umap". Default is |
point.size |
point size. Default is |
show.legend |
a logical denoting whether to show the legend on the right
side of the plot. Default is |
ggplot2 object if return.plot=TRUE
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- RunTSNE(sce) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) ## Change the names to the first five alphabets and Visualize the annotation. custom_annotation <- plyr::mapvalues(metadata(sce)$iloreg$clustering.manual, c(1,2,3,4,5), LETTERS[1:5]) AnnotationScatterPlot(sce, annotation = custom_annotation, return.plot = FALSE, dim.reduction.type = "tsne", show.legend = FALSE)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- RunTSNE(sce) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) ## Change the names to the first five alphabets and Visualize the annotation. custom_annotation <- plyr::mapvalues(metadata(sce)$iloreg$clustering.manual, c(1,2,3,4,5), LETTERS[1:5]) AnnotationScatterPlot(sce, annotation = custom_annotation, return.plot = FALSE, dim.reduction.type = "tsne", show.legend = FALSE)
The function estimates the optimal number of clusters K from the dendrogram of the hierarhical clustering using the silhouette method.
CalcSilhInfo.SingleCellExperiment(object, K.start, K.end) ## S4 method for signature 'SingleCellExperiment' CalcSilhInfo(object, K.start = 2, K.end = 50)
CalcSilhInfo.SingleCellExperiment(object, K.start, K.end) ## S4 method for signature 'SingleCellExperiment' CalcSilhInfo(object, K.start = 2, K.end = 50)
object |
of |
K.start |
a numeric for the smallest
K value to be tested. Default is |
K.end |
a numeric for the largest
K value to be tested. Default is |
object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- CalcSilhInfo(sce)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- CalcSilhInfo(sce)
ClusteringScatterPlot function enables visualizing the clustering over nonliner dimensionality reduction (t-SNE or UMAP).
ClusteringScatterPlot.SingleCellExperiment( object, clustering.type, return.plot, dim.reduction.type, point.size, title, show.legend ) ## S4 method for signature 'SingleCellExperiment' ClusteringScatterPlot( object, clustering.type = "manual", return.plot = FALSE, dim.reduction.type = "", point.size = 0.7, title = "", show.legend = TRUE )
ClusteringScatterPlot.SingleCellExperiment( object, clustering.type, return.plot, dim.reduction.type, point.size, title, show.legend ) ## S4 method for signature 'SingleCellExperiment' ClusteringScatterPlot( object, clustering.type = "manual", return.plot = FALSE, dim.reduction.type = "", point.size = 0.7, title = "", show.legend = TRUE )
object |
of |
clustering.type |
"manual" or "optimal". "manual" refers to the clustering formed using the "SelectKClusters" function and "optimal" to the clustering formed using the "CalcSilhInfo" function. Default is "manual". |
return.plot |
a logical denoting whether to return the ggplot2 object.
Default is |
dim.reduction.type |
"tsne" or "umap". Default is "tsne". |
point.size |
point size. Default is Default is |
title |
text to write above the plot |
show.legend |
whether to show the legend on the right side of the plot.
Default is |
ggplot2 object if return.plot=TRUE
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) sce <- RunTSNE(sce) ClusteringScatterPlot(sce,"manual",dim.reduction.type="tsne") sce <- RunUMAP(sce) ClusteringScatterPlot(sce,"manual",dim.reduction.type="umap")
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) sce <- RunTSNE(sce) ClusteringScatterPlot(sce,"manual",dim.reduction.type="tsne") sce <- RunUMAP(sce) ClusteringScatterPlot(sce,"manual",dim.reduction.type="umap")
The function implements a script down- and oversamples data to include n cells.
DownOverSampling(x, n = 50)
DownOverSampling(x, n = 50)
x |
A character or numeric vector of data to down-and oversample. |
n |
How many cells to include per cluster. |
a list containing the output of the LiblineaR prediction
FindAllGeneMarkers enables identifying gene markers for all clusters at once. This is done by differential expresission analysis where cells from one cluster are compared against the cells from the rest of the clusters. Gene and cell filters can be applied to accelerate the analysis, but this might lead to missing weak signals.
FindAllGeneMarkers.SingleCellExperiment( object, clustering.type, test, log2fc.threshold, min.pct, min.diff.pct, min.cells.group, max.cells.per.cluster, pseudocount.use, return.thresh, only.pos ) ## S4 method for signature 'SingleCellExperiment' FindAllGeneMarkers( object, clustering.type = "manual", test = "wilcox", log2fc.threshold = 0.25, min.pct = 0.1, min.diff.pct = NULL, min.cells.group = 3, max.cells.per.cluster = NULL, pseudocount.use = 1, return.thresh = 0.01, only.pos = FALSE )
FindAllGeneMarkers.SingleCellExperiment( object, clustering.type, test, log2fc.threshold, min.pct, min.diff.pct, min.cells.group, max.cells.per.cluster, pseudocount.use, return.thresh, only.pos ) ## S4 method for signature 'SingleCellExperiment' FindAllGeneMarkers( object, clustering.type = "manual", test = "wilcox", log2fc.threshold = 0.25, min.pct = 0.1, min.diff.pct = NULL, min.cells.group = 3, max.cells.per.cluster = NULL, pseudocount.use = 1, return.thresh = 0.01, only.pos = FALSE )
object |
of |
clustering.type |
"manual" or "optimal". "manual" refers to the clustering formed using the "SelectKClusters" function and "optimal" to the clustering formed using the "CalcSilhInfo" function. Default is "manual". |
test |
Which test to use. Only "wilcoxon" (the Wilcoxon rank-sum test, AKA Mann-Whitney U test) is supported at the moment. |
log2fc.threshold |
Filters out genes that have log2 fold-change of the
averaged gene expression values (with the pseudo-count value added to the
averaged values before division if pseudocount.use > 0) below this threshold.
Default is |
min.pct |
Filters out genes that have dropout rate (fraction of cells
expressing a gene) below this threshold in both comparison groups
Default is |
min.diff.pct |
Filters out genes that do not have this minimum
difference in the dropout rates (fraction of cells expressing a gene)
between the two comparison groups. Default is |
min.cells.group |
The minimum number of cells in the two comparison
groups to perform the DE analysis. If the number of cells is below the
threshold, then the DE analysis of this cluster is skipped.
Default is |
max.cells.per.cluster |
The maximun number of cells per cluster if
downsampling is performed to speed up the DE analysis.
Default is |
pseudocount.use |
A positive integer, which is added to
the average gene expression values before calculating the fold-change,
assuring that no divisions by zero occur. Default is |
return.thresh |
If only.pos=TRUE, then return only genes that have the
adjusted p-value (adjusted by the Bonferroni method) below or equal to this
threshold. Default is |
only.pos |
Whether to return only genes that have an adjusted p-value
(adjusted by the Bonferroni method) below or equal to the threshold.
Default is |
a data frame of the results if positive results were found, else NULL
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) gene_markers <- FindAllGeneMarkers(sce)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) gene_markers <- FindAllGeneMarkers(sce)
FindGeneMarkers enables identifying gene markers for one cluster or two arbitrary combinations of clusters, e.g. 1_2 vs. 3_4_5. Gene and cell filters can be applied to accelerate the analysis, but this might lead to missing weak signals.
FindGeneMarkers.SingleCellExperiment( object, clusters.1, clusters.2, clustering.type, test, logfc.threshold, min.pct, min.diff.pct, min.cells.group, max.cells.per.cluster, pseudocount.use, return.thresh, only.pos ) ## S4 method for signature 'SingleCellExperiment' FindGeneMarkers( object, clusters.1 = NULL, clusters.2 = NULL, clustering.type = "", test = "wilcox", logfc.threshold = 0.25, min.pct = 0.1, min.diff.pct = NULL, min.cells.group = 3, max.cells.per.cluster = NULL, pseudocount.use = 1, return.thresh = 0.01, only.pos = FALSE )
FindGeneMarkers.SingleCellExperiment( object, clusters.1, clusters.2, clustering.type, test, logfc.threshold, min.pct, min.diff.pct, min.cells.group, max.cells.per.cluster, pseudocount.use, return.thresh, only.pos ) ## S4 method for signature 'SingleCellExperiment' FindGeneMarkers( object, clusters.1 = NULL, clusters.2 = NULL, clustering.type = "", test = "wilcox", logfc.threshold = 0.25, min.pct = 0.1, min.diff.pct = NULL, min.cells.group = 3, max.cells.per.cluster = NULL, pseudocount.use = 1, return.thresh = 0.01, only.pos = FALSE )
object |
of |
clusters.1 |
a character or numeric vector denoting which clusters to use in the first group (named group.1 in the results) |
clusters.2 |
a character or numeric vector denoting which clusters to use in the second group (named group.2 in the results) |
clustering.type |
"manual" or "optimal". "manual" refers to the clustering formed using the "SelectKClusters" function and "optimal" to the clustering formed using the "CalcSilhInfo" function. Default is "manual". |
test |
Which test to use. Only "wilcoxon" (the Wilcoxon rank-sum test, AKA Mann-Whitney U test) is supported at the moment. |
logfc.threshold |
Filters out genes that have log2 fold-change of the
averaged gene expression values (with the pseudo-count value added to the
averaged values before division if pseudocount.use > 0) below this threshold.
Default is |
min.pct |
Filters out genes that have dropout rate (fraction of cells
expressing a gene) below this threshold in both comparison groups
Default is |
min.diff.pct |
Filters out genes that do not have this minimum
difference in the dropout rates (fraction of cells expressing a gene)
between the two comparison groups. Default is |
min.cells.group |
The minimum number of cells in the two comparison
groups to perform the DE analysis. If the number of cells is below the
threshold, then the DE analysis is not performed.
Default is |
max.cells.per.cluster |
The maximun number of cells per cluster
if downsampling is performed to speed up the DE analysis.
Default is |
pseudocount.use |
A positive integer, which is added
to the average gene expression values before calculating the fold-change.
This makes sure that no divisions by zero occur. Default is |
return.thresh |
If only.pos=TRUE, then return only genes that
have the adjusted p-value (adjusted by the Bonferroni method) below or
equal to this threshold. Default is |
only.pos |
Whether to return only genes that have an adjusted
p-value (adjusted by the Bonferroni method) below or equal to the
threshold. Default is |
a data frame of the results if positive results were found, else NULL
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) gene_markes_1 <- FindGeneMarkers(sce,clusters.1=1) gene_markes_1_vs_2 <- FindGeneMarkers(sce,clusters.1=1,clusters.2=2)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) gene_markes_1 <- FindGeneMarkers(sce,clusters.1=1) gene_markes_1_vs_2 <- FindGeneMarkers(sce,clusters.1=1,clusters.2=2)
The GeneHeatmap function enables drawing a heatmap of the gene markers identified by FindAllGeneMarkers, where the cell are grouped by the clustering.
GeneHeatmap.SingleCellExperiment(object, clustering.type, gene.markers) ## S4 method for signature 'SingleCellExperiment' GeneHeatmap(object, clustering.type = "manual", gene.markers = NULL)
GeneHeatmap.SingleCellExperiment(object, clustering.type, gene.markers) ## S4 method for signature 'SingleCellExperiment' GeneHeatmap(object, clustering.type = "manual", gene.markers = NULL)
object |
of |
clustering.type |
"manual" or "optimal". "manual" refers to the clustering formed using the "SelectKClusters" function and "optimal" to the clustering using the "CalcSilhInfo" function. Default is "manual". |
gene.markers |
a data frame of the gene markers generated by FindAllGeneMarkers function. To accelerate the drawing, filtering the dataframe by selecting e.g. top 10 genes is recommended. |
nothing
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,r=1,k=5) # Use L=200 sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) gene_markers <- FindAllGeneMarkers(sce,log2fc.threshold = 0.5,min.pct = 0.5) top10_log2FC <- SelectTopGenes(gene_markers,top.N=10, criterion.type="log2FC",inverse=FALSE) GeneHeatmap(sce,clustering.type = "manual", gene.markers = top10_log2FC)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,r=1,k=5) # Use L=200 sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) gene_markers <- FindAllGeneMarkers(sce,log2fc.threshold = 0.5,min.pct = 0.5) top10_log2FC <- SelectTopGenes(gene_markers,top.N=10, criterion.type="log2FC",inverse=FALSE) GeneHeatmap(sce,clustering.type = "manual", gene.markers = top10_log2FC)
GeneScatterPlot enables visualizing gene expression of a gene over nonlinear dimensionality reduction with t-SNE or UMAP.
GeneScatterPlot.SingleCellExperiment( object, genes, return.plot, dim.reduction.type, point.size, title, plot.expressing.cells.last, nrow, ncol ) ## S4 method for signature 'SingleCellExperiment' GeneScatterPlot( object, genes = "", return.plot = FALSE, dim.reduction.type = "tsne", point.size = 0.7, title = "", plot.expressing.cells.last = FALSE, nrow = NULL, ncol = NULL )
GeneScatterPlot.SingleCellExperiment( object, genes, return.plot, dim.reduction.type, point.size, title, plot.expressing.cells.last, nrow, ncol ) ## S4 method for signature 'SingleCellExperiment' GeneScatterPlot( object, genes = "", return.plot = FALSE, dim.reduction.type = "tsne", point.size = 0.7, title = "", plot.expressing.cells.last = FALSE, nrow = NULL, ncol = NULL )
object |
of |
genes |
a character vector of the genes to be visualized |
return.plot |
whether to return the ggplot2 object or just
draw it (default |
dim.reduction.type |
"tsne" or "umap" (default "tsne") |
point.size |
point size (default 0.7) |
title |
text to write above the plot |
plot.expressing.cells.last |
whether to plot the expressing genes last to make the points more visible |
nrow |
a positive integer that specifies the number of rows in
the plot grid. Default is |
ncol |
a positive integer that specifies the number of columns
in the plot grid. Default is |
ggplot2 object if return.plot=TRUE
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- RunTSNE(sce) GeneScatterPlot(sce,"CD14",dim.reduction.type="tsne") sce <- RunUMAP(sce) GeneScatterPlot(sce,"CD14",dim.reduction.type="umap")
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- RunTSNE(sce) GeneScatterPlot(sce,"CD14",dim.reduction.type="tsne") sce <- RunUMAP(sce) GeneScatterPlot(sce,"CD14",dim.reduction.type="umap")
Perform Hierarchical clustering using the Ward's method.
HierarchicalClustering.SingleCellExperiment(object) ## S4 method for signature 'SingleCellExperiment' HierarchicalClustering(object)
HierarchicalClustering.SingleCellExperiment(object) ## S4 method for signature 'SingleCellExperiment' HierarchicalClustering(object)
object |
of |
object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce)
The function implements a script that downsamples data a dataset, trains a logistic regression classifier model and then projects its clustering onto itself using a trained L1-regularized logistic regression model.
LogisticRegression( training.sparse.matrix = NULL, training.ident = NULL, C = 0.3, reg.type = "L1", test.sparse.matrix = NULL, d = 0.3 )
LogisticRegression( training.sparse.matrix = NULL, training.ident = NULL, C = 0.3, reg.type = "L1", test.sparse.matrix = NULL, d = 0.3 )
training.sparse.matrix |
A sparse matrix (dgCMatrix) containing training
sample's gene expression data with genes in rows and cells in columns.
Default is |
training.ident |
A named factor containing sample's cluster labels for
each cell in training.sparse.matrix. Default is |
C |
Cost of constraints violation in L1-regularized logistic
regression (C). Default is |
reg.type |
"L1" for LASSO and "L2" for Ridge. Default is "L1". |
test.sparse.matrix |
A sparse matrix (dgCMatrix) containing test
sample's gene expression data with genes in rows and cells in columns.
Default is |
d |
A numeric smaller than |
a list containing the output of the LiblineaR prediction
MergeClusters function enables merging clusters and naming the newly formed cluster.
MergeClusters.SingleCellExperiment(object, clusters.to.merge, new.name) ## S4 method for signature 'SingleCellExperiment' MergeClusters(object, clusters.to.merge = "", new.name = "")
MergeClusters.SingleCellExperiment(object, clusters.to.merge, new.name) ## S4 method for signature 'SingleCellExperiment' MergeClusters(object, clusters.to.merge = "", new.name = "")
object |
of |
clusters.to.merge |
a character or numeric vector for the names of the clusters to merge |
new.name |
a character for the new name of the merged cluster. If left empty, the new cluster name is formed by separating the cluster names by "_". |
object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) sce <- MergeClusters(sce,clusters.to.merge=c(1,2),new.name="merged1")
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) sce <- MergeClusters(sce,clusters.to.merge=c(1,2),new.name="merged1")
The preprocessing was done using Cell Ranger v2.2.0 and the GRCh38.p12 human reference genome. The Normalization was done using the LogNormalize method of Seurat v3 R package. The sampling was done using the sample() function without replacement and set.seed(1) as initialization.
data(pbmc3k_500)
data(pbmc3k_500)
pbmc3k_500, dgCMatrix object
https://support.10xgenomics.com/single-cell-gene-expression
data(pbmc3k_500)
data(pbmc3k_500)
Draw an elbow plot of the standard deviations of the principal components to deduce an appropriate value for p.
PCAElbowPlot.SingleCellExperiment(object, return.plot) ## S4 method for signature 'SingleCellExperiment' PCAElbowPlot(object, return.plot = FALSE)
PCAElbowPlot.SingleCellExperiment(object, return.plot) ## S4 method for signature 'SingleCellExperiment' PCAElbowPlot(object, return.plot = FALSE)
object |
object of class 'iloreg' |
return.plot |
logical indicating if the ggplot2 object should be returned (default FALSE) |
ggplot2 object if return.plot=TRUE
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) PCAElbowPlot(sce)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) PCAElbowPlot(sce)
SingleCellExperiment
object for ILoReg
analysisThis function prepares the SingleCellExperiment
object for
ILoReg
analysis. The only required input is an object of class
SingleCellExperiment
with at least data in the logcounts
slot.
PrepareILoReg.SingleCellExperiment(object) ## S4 method for signature 'SingleCellExperiment' PrepareILoReg(object)
PrepareILoReg.SingleCellExperiment(object) ## S4 method for signature 'SingleCellExperiment' PrepareILoReg(object)
object |
an object of |
an object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce)
RenameAllClusters function enables renaming all cluster at once.
RenameAllClusters.SingleCellExperiment(object, new.cluster.names) ## S4 method for signature 'SingleCellExperiment' RenameAllClusters(object, new.cluster.names = "")
RenameAllClusters.SingleCellExperiment(object, new.cluster.names) ## S4 method for signature 'SingleCellExperiment' RenameAllClusters(object, new.cluster.names = "")
object |
of |
new.cluster.names |
object of class 'iloreg' |
object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) sce <- RenameAllClusters(sce,new.cluster.names=LETTERS[seq_len(5)])
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) sce <- RenameAllClusters(sce,new.cluster.names=LETTERS[seq_len(5)])
RenameCluster function enables renaming a cluster in 'clustering.manual' slot.
RenameCluster.SingleCellExperiment(object, old.cluster.name, new.cluster.name) ## S4 method for signature 'SingleCellExperiment' RenameCluster(object, old.cluster.name = "", new.cluster.name = "")
RenameCluster.SingleCellExperiment(object, old.cluster.name, new.cluster.name) ## S4 method for signature 'SingleCellExperiment' RenameCluster(object, old.cluster.name = "", new.cluster.name = "")
object |
of |
old.cluster.name |
a character variable denoting the old name of the cluster |
new.cluster.name |
a character variable the new name of the cluster |
object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) sce <- RenameCluster(sce,1,"cluster1")
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) sce <- RenameCluster(sce,1,"cluster1")
The function implements Iterative Clustering Projection (ICP): a supervised learning -based clustering, which maximizes clustering similarity between the clustering and its projection by logistic regression.
RunICP( normalized.data = NULL, k = 15, d = 0.3, r = 5, C = 5, reg.type = "L1", max.iter = 200 )
RunICP( normalized.data = NULL, k = 15, d = 0.3, r = 5, C = 5, reg.type = "L1", max.iter = 200 )
normalized.data |
A sparse matrix (dgCMatrix) containing
normalized gene expression data with genes in rows and cells in columns.
Default is |
k |
A positive integer greater or equal to 2, denoting the number of
clusters in ICP. Default is |
d |
A numeric that defines how many cells per cluster should be
down- and oversampled (d in ceiling(N/k*d)), when stratified.downsampling=FALSE,
or what fraction should be downsampled in the stratified approach
,stratified.downsampling=TRUE. Default is |
r |
A positive integer that denotes the number of reiterations
performed until the algorithm stops. Default is |
C |
Cost of constraints violation ( |
reg.type |
"L1" for LASSO and "L2" for Ridge. Default is "L1". |
max.iter |
A positive integer that denotes the maximum number of
iterations performed until the algorithm ends. Default is |
A list that includes the probability matrix and the clustering similarity measures: ARI, NMI, etc.
This functions runs in parallel L
ICP runs, which is the computational
bottleneck of ILoReg. With ~ 3,000 cells this step should be completed
in ~ 2 h and ~ 1 h with 3 and 12 logical processors (threads), respectively.
RunParallelICP.SingleCellExperiment( object, k, d, L, r, C, reg.type, max.iter, threads ) ## S4 method for signature 'SingleCellExperiment' RunParallelICP( object, k = 15, d = 0.3, L = 200, r = 5, C = 0.3, reg.type = "L1", max.iter = 200, threads = 0 )
RunParallelICP.SingleCellExperiment( object, k, d, L, r, C, reg.type, max.iter, threads ) ## S4 method for signature 'SingleCellExperiment' RunParallelICP( object, k = 15, d = 0.3, L = 200, r = 5, C = 0.3, reg.type = "L1", max.iter = 200, threads = 0 )
object |
An object of |
k |
A positive integer greater or equal to |
d |
A numeric greater than |
L |
A positive integer greater than |
r |
A positive integer that denotes the number of reiterations
performed until the ICP algorithm stops.
Increasing recommended with a significantly larger sample size
(tens of thousands of cells). Default is |
C |
A positive real number denoting the cost of constraints violation in
the L1-regularized logistic regression model from the LIBLINEAR library.
Decreasing leads to more stringent feature selection, i.e. less genes are
selected that are used to build the projection classifier. Decreasing to a
very low value (~ |
reg.type |
"L1" or "L2". L2-regularization was not investigated in the manuscript, but it leads to a more conventional outcome (less subpopulations). Default is "L1". |
max.iter |
A positive integer that denotes
the maximum number of iterations performed until ICP stops. This parameter
is only useful in situations where ICP converges extremely slowly, preventing
the algorithm to run too long. In most cases, reaching
the number of reiterations ( |
threads |
A positive integer that specifies how many logical processors
(threads) to use in parallel computation.
Set |
an object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,r=1,k=5)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,r=1,k=5)
Perform the PCA transformation of the joint probability matrix, which reduces the dimensionality from k*L to p
RunPCA.SingleCellExperiment(object, p, scale, threshold) ## S4 method for signature 'SingleCellExperiment' RunPCA(object, p = 50, scale = FALSE, threshold = 0)
RunPCA.SingleCellExperiment(object, p, scale, threshold) ## S4 method for signature 'SingleCellExperiment' RunPCA(object, p = 50, scale = FALSE, threshold = 0)
object |
object of |
p |
a positive integer denoting the number of principal
components to calculate and select. Default is |
scale |
a logical specifying whether the probabilities should be
standardized to unit-variance before running PCA. Default is |
threshold |
a thresfold for filtering out ICP runs before PCA with
the lower terminal projection accuracy below the threshold.
Default is |
object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5)
Run nonlinear dimensionality reduction using t-SNE with the PCA-transformed consensus matrix as input.
RunTSNE.SingleCellExperiment(object, perplexity) ## S4 method for signature 'SingleCellExperiment' RunTSNE(object, perplexity = 30)
RunTSNE.SingleCellExperiment(object, perplexity) ## S4 method for signature 'SingleCellExperiment' RunTSNE(object, perplexity = 30)
object |
of |
perplexity |
perplexity of t-SNE |
object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- RunTSNE(sce)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- RunTSNE(sce)
Run nonlinear dimensionality reduction using UMAP with the PCA-transformed consensus matrix as input.
RunUMAP.SingleCellExperiment(object) ## S4 method for signature 'SingleCellExperiment' RunUMAP(object)
RunUMAP.SingleCellExperiment(object) ## S4 method for signature 'SingleCellExperiment' RunUMAP(object)
object |
of |
object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- RunUMAP(sce)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- RunUMAP(sce)
Selects K clusters from the dendrogram.
SelectKClusters.SingleCellExperiment(object, K) ## S4 method for signature 'SingleCellExperiment' SelectKClusters(object, K = NULL)
SelectKClusters.SingleCellExperiment(object, K) ## S4 method for signature 'SingleCellExperiment' SelectKClusters(object, K = NULL)
object |
of |
K |
a positive integer denoting how many clusters to select |
object of SingleCellExperiment
class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5)
The SelectTopGenes function enables selecting top or bottom N genes based on a criterion (e.g. log2FC or adj.p.value).
SelectTopGenes( gene.markers = NULL, top.N = 10, criterion.type = "log2FC", inverse = FALSE )
SelectTopGenes( gene.markers = NULL, top.N = 10, criterion.type = "log2FC", inverse = FALSE )
gene.markers |
A data frame of the gene markers found by FindAllGeneMarkers function. |
top.N |
How many top or bottom genes to select. Default is |
criterion.type |
Which criterion to use for selecting the genes. Default is "log2FC". |
inverse |
Whether to select bottom instead of top N genes.
Default is |
an object of 'data.frame' class
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) gene_markers <- FindAllGeneMarkers(sce) ## Select top 10 markers based on log2 fold-change top10_log2FC <- SelectTopGenes(gene_markers, top.N = 10, criterion.type = "log2FC", inverse = FALSE)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) gene_markers <- FindAllGeneMarkers(sce) ## Select top 10 markers based on log2 fold-change top10_log2FC <- SelectTopGenes(gene_markers, top.N = 10, criterion.type = "log2FC", inverse = FALSE)
Draw the silhouette curve: the average silhouette value across the cells for a range of different K values.
SilhouetteCurve.SingleCellExperiment(object, return.plot) ## S4 method for signature 'SingleCellExperiment' SilhouetteCurve(object, return.plot = FALSE)
SilhouetteCurve.SingleCellExperiment(object, return.plot) ## S4 method for signature 'SingleCellExperiment' SilhouetteCurve(object, return.plot = FALSE)
object |
of |
return.plot |
a logical denoting whether the ggplot2 object
should be returned. Default is |
ggplot2 object if return.plot=TRUE
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- CalcSilhInfo(sce) SilhouetteCurve(sce)
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- CalcSilhInfo(sce) SilhouetteCurve(sce)
The VlnPlot function enables visualizing expression levels of a gene, or multiple genes, across clusters using Violin plots.
VlnPlot.SingleCellExperiment( object, clustering.type, genes, return.plot, rotate.x.axis.labels ) ## S4 method for signature 'SingleCellExperiment' VlnPlot( object, clustering.type = "manual", genes = NULL, return.plot = FALSE, rotate.x.axis.labels = FALSE )
VlnPlot.SingleCellExperiment( object, clustering.type, genes, return.plot, rotate.x.axis.labels ) ## S4 method for signature 'SingleCellExperiment' VlnPlot( object, clustering.type = "manual", genes = NULL, return.plot = FALSE, rotate.x.axis.labels = FALSE )
object |
of |
clustering.type |
"manual" or "optimal". "manual" refers to the clustering formed using the "SelectKClusters" function and "optimal" to the clustering formed using the "CalcSilhInfo" function. Default is "manual". |
genes |
a character vector denoting the gene names that are visualized |
return.plot |
return.plot whether to return the ggplot2 object |
rotate.x.axis.labels |
a logical denoting whether the x-axis
labels should be rotated 90 degrees.
or just draw it. Default is |
ggplot2 object if return.plot=TRUE
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) VlnPlot(sce,genes=c("CD3D","CD79A","CST3"))
library(SingleCellExperiment) sce <- SingleCellExperiment(assays = list(logcounts = pbmc3k_500)) sce <- PrepareILoReg(sce) ## These settings are just to accelerate the example, use the defaults. sce <- RunParallelICP(sce,L=2,threads=1,C=0.1,k=5,r=1) sce <- RunPCA(sce,p=5) sce <- HierarchicalClustering(sce) sce <- SelectKClusters(sce,K=5) VlnPlot(sce,genes=c("CD3D","CD79A","CST3"))