Package 'FEAST' reference manual

Title:	FEAture SelcTion (FEAST) for Single-cell clustering
Description:	Cell clustering is one of the most important and commonly performed tasks in single-cell RNA sequencing (scRNA-seq) data analysis. An important step in cell clustering is selecting a subset of genes (referred to as “features”) whose expression patterns are then used for downstream clustering. A good feature set should include the genes that distinguish different cell types, and its quality can significantly affect clustering accuracy. FEAST is an R library for selecting the most representative features before performing the core scRNA-seq clustering. It can be used as a plug-in for established clustering algorithms such as SC3, TSCAN, SHARP, SIMLR, and Seurat. The core of the FEAST algorithm consists of three steps: (1) consensus clustering; (2) gene-level significance inference; (3) validation of an optimized feature set.
Authors:	Kenong Su [aut, cre], Hao Wu [aut]
Maintainer:	Kenong Su <[email protected]>
License:	GPL-2
Version:	1.15.0
Built:	2025-03-06 05:53:00 UTC
Source:	https://github.com/bioc/FEAST

Align the cell types from the prediction with the truth.

Description

Align the cell types from the prediction with the truth.

Usage

align_CellType(tt0)

Arguments

`tt0`	an N*N table.

Value

the matched (re-ordered) table

Examples

vec1 = rep(1:4, each=100)
vec2 = sample(vec1)
tb = table(vec1, vec2)
#tb_arg = align_CellType(tb)

Calculate the gene-level F score and corresponding significance level.

Description

Calculate the gene-level F score and corresponding significance level.

Usage

cal_F2(Y, classes)

Arguments

`Y`	A gene expression matrix
`classes`	The initial cluster labels; NA values are allowed. These can come directly from the `Consensus` function.

Value

The score vector

Examples

data(Yan)
cal_F2(Y, classes = trueclass)
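
To illustrate what a gene-level F score measures, the following base-R sketch computes a one-way ANOVA F statistic and its p-value for each gene (row) across cluster labels. It is a minimal sketch under stated assumptions, not the implementation of `cal_F2`: the helper name `f_score_sketch`, the toy matrix, and the choice to drop cells with NA labels are all hypothetical.

```r
# Hypothetical helper (not part of FEAST): one-way ANOVA F statistic per gene.
f_score_sketch <- function(Y, classes) {
  keep <- !is.na(classes)               # assumption: cells with NA labels are ignored
  Y <- Y[, keep, drop = FALSE]
  classes <- factor(classes[keep])
  n <- ncol(Y)                          # number of labelled cells
  k <- nlevels(classes)                 # number of clusters
  grand <- rowMeans(Y)                  # overall mean per gene
  ssb <- numeric(nrow(Y))               # between-cluster sum of squares
  ssw <- numeric(nrow(Y))               # within-cluster sum of squares
  for (lv in levels(classes)) {
    sub <- Y[, classes == lv, drop = FALSE]
    mu <- rowMeans(sub)                 # cluster mean per gene
    ssb <- ssb + ncol(sub) * (mu - grand)^2
    ssw <- ssw + rowSums((sub - mu)^2)
  }
  f <- (ssb / (k - 1)) / (ssw / (n - k))
  p <- pf(f, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
  data.frame(F_score = f, p_value = p)
}

set.seed(1)
toy <- matrix(rnorm(5 * 12), nrow = 5)  # toy data: 5 genes x 12 cells
labels <- rep(c("a", "b", "c"), each = 4)
toy[1, labels == "a"] <- toy[1, labels == "a"] + 5   # gene 1 separates cluster "a"
res <- f_score_sketch(toy, labels)
```

Genes whose expression differs strongly between clusters receive a large F score and a small p-value, which is the property a feature ranking can exploit.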

Calculate the gene-level fisher score.

Description

Calculate the gene-level fisher score.

Usage

cal_Fisher2(Y, classes)

Arguments

`Y`	A gene expression matrix
`classes`	The initial cluster labels; NA values are allowed. These can come directly from the `Consensus` function.

Value

The score vector. The score follows the paper at https://arxiv.org/pdf/1202.3725.pdf and is computed with a vector-based calculation.
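
As a rough illustration, the commonly used Fisher score of a gene is the between-cluster scatter of its cluster means divided by the pooled within-cluster variance. The base-R sketch below follows that textbook definition. It is not the code of `cal_Fisher2`: the helper name `fisher_score_sketch`, the toy data, the use of the population variance, and the handling of NA labels are assumptions made for the example.

```r
# Hypothetical helper (not part of FEAST): Fisher score per gene (row of Y).
fisher_score_sketch <- function(Y, classes) {
  keep <- !is.na(classes)               # assumption: cells with NA labels are ignored
  Y <- Y[, keep, drop = FALSE]
  classes <- factor(classes[keep])
  grand <- rowMeans(Y)                  # overall mean per gene
  num <- numeric(nrow(Y))               # between-cluster scatter
  den <- numeric(nrow(Y))               # pooled within-cluster variance
  for (lv in levels(classes)) {
    sub <- Y[, classes == lv, drop = FALSE]
    n_k <- ncol(sub)
    mu_k <- rowMeans(sub)
    var_k <- rowMeans((sub - mu_k)^2)   # assumption: population variance (divide by n_k)
    num <- num + n_k * (mu_k - grand)^2
    den <- den + n_k * var_k
  }
  num / den
}

Y_toy <- rbind(
  gene1 = c(1, 2, 3, 7, 8, 9),          # well separated between the two clusters
  gene2 = c(1, 9, 2, 8, 3, 7)           # poorly separated
)
cls <- c(1, 1, 1, 2, 2, 2)
scores <- fisher_score_sketch(Y_toy, cls)
```

In this toy example `gene1` scores far higher than `gene2`, because its cluster means are far apart relative to its within-cluster spread.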

Calculate three metrics; these methods are implemented in C code. flag = 1: Rand index; flag = 2: Fowlkes and Mallows's index; flag = 3: Jaccard index.

Description

Calculate three metrics; these methods are implemented in C code. flag = 1: Rand index; flag = 2: Fowlkes and Mallows's index; flag = 3: Jaccard index.

Usage

cal_metrics(cl1, cl2, randMethod = c("Rand", "FM", "Jaccard"))

Arguments

`cl1`	a vector
`cl2`	a vector
`randMethod`	a string chosen from "Rand", "FM", or "Jaccard"

Value

a numeric vector including three values
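
For reference, all three indices can be derived from pair counts in the contingency table of the two labelings. The base-R sketch below uses the standard pair-counting definitions. It is not the package's C implementation, and the helper name `pair_metrics_sketch` and the toy labels are hypothetical.

```r
# Hypothetical helper (not part of FEAST): pair-counting agreement indices.
pair_metrics_sketch <- function(cl1, cl2) {
  stopifnot(length(cl1) == length(cl2))
  tab <- table(cl1, cl2)
  n_pairs <- choose(length(cl1), 2)              # all pairs of cells
  both <- sum(choose(tab, 2))                    # pairs together in both labelings
  only1 <- sum(choose(rowSums(tab), 2)) - both   # together in cl1 only
  only2 <- sum(choose(colSums(tab), 2)) - both   # together in cl2 only
  neither <- n_pairs - both - only1 - only2      # separated in both labelings
  c(
    Rand = (both + neither) / n_pairs,
    FM = both / sqrt((both + only1) * (both + only2)),
    Jaccard = both / (both + only1 + only2)
  )
}

cl1 <- c(1, 1, 1, 2, 2, 2)
cl2 <- c(1, 1, 2, 2, 2, 2)
m <- pair_metrics_sketch(cl1, cl2)
```

Identical labelings give 1 for all three indices. The FM and Jaccard indices are undefined (NaN) in this sketch when no pair of cells shares a cluster in either labeling.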

Calculate the mean squared error (MSE) of the clustering centers with the predicted Y.

Description

Calculate the mean squared error (MSE) of the clustering centers with the predicted Y.

Usage

cal_MSE(Ynorm, cluster, return_mses = FALSE)

Arguments

`Ynorm`	A normalized gene expression matrix. If it is not normalized, the function normalizes it.
`cluster`	The clustering outcomes, i.e., the cluster labels.
`return_mses`	Logical (TRUE or FALSE) indicating whether to return the MSE.

Value

The MSE of the clustering centers with the predicted Y.

Examples

data(Yan)
Ynorm = Norm_Y(Y)
cluster = trueclass
MSE_res = cal_MSE(Ynorm, cluster)
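
One plausible reading of this quantity is sketched below in base R: each cell is predicted by the per-gene mean of its cluster (the cluster center), and the MSE is the average squared residual. This is an assumption made for illustration, and `cal_MSE` may compute the prediction differently. The helper name `mse_sketch` and the toy matrix are hypothetical.

```r
# Hypothetical helper (not part of FEAST): MSE of cells around their cluster centers.
# Assumes every cell has a non-NA cluster label.
mse_sketch <- function(Ynorm, cluster) {
  cluster <- factor(cluster)
  fitted <- Ynorm
  for (lv in levels(cluster)) {
    idx <- which(cluster == lv)
    center <- rowMeans(Ynorm[, idx, drop = FALSE])  # cluster center per gene
    fitted[, idx] <- center                         # each cell is predicted by its center
  }
  mean((Ynorm - fitted)^2)
}

Y_small <- rbind(
  c(1, 3, 10, 12),
  c(0, 0, 4, 8)
)
mse_val <- mse_sketch(Y_small, cluster = c(1, 1, 2, 2))
```

A lower value means the cluster labels explain more of the expression variation, which is why such a measure can be used to compare candidate feature sets.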

Consensus Clustering

Description

Consensus Clustering

Usage

Consensus(Y, num_pcs = 10, top_pctg = 0.33, k = 2, thred = 0.9, nProc = 1)

Arguments

`Y`	An expression matrix. The raw count matrix is recommended, but users can also input a normalized matrix directly.
`num_pcs`	The number of top PCs that will be investigated through consensus clustering.
`top_pctg`	Top percentage of features used for dimension reduction.
`k`	The number of input clusters (best guess).
`thred`	For the final GMM clustering, the probability of a cell belonging to a certain cluster.
`nProc`	Number of cores for the BiocParallel environment.

Value

the clustering labels and the featured genes.

Examples

data(Yan)
set.seed(123)
rixs = sample(nrow(Y), 500)
cixs = sample(ncol(Y), 40)
Y = Y[rixs, cixs]
con = Consensus(Y, k=5)

Calculate a series of evaluation statistics.

Description

Calculate a series of evaluation statistics.

Usage

eval_Cluster(vec1, vec2)

Arguments

`vec1`	a vector.
`vec2`	a vector. vec1 and vec2 must have the same length.

Value

a vector of evaluation metrics

Examples

vec2 = vec1 = rep(1:4, each = 100)
vec2[1:10] = 4
acc = eval_Cluster(vec1, vec2)

FEAST main function

Description

FEAST main function

Usage

FEAST(
  Y,
  k = 2,
  num_pcs = 10,
  dim_reduce = c("irlba", "svd", "pca"),
  split = FALSE,
  batch_size = 1000,
  nProc = 1
)

Arguments

`Y`	An expression matrix: either a raw count matrix or a normalized matrix.
`k`	The number of input clusters (best guess).
`num_pcs`	The number of top PCs that will be investigated through consensus clustering.
`dim_reduce`	Dimension reduction method, chosen from "pca", "svd", or "irlba".
`split`	Logical. If TRUE, subsampling is used to calculate the gene-level significance.
`batch_size`	When split is TRUE, the batch size used for splitting the cells.
`nProc`	Number of cores for the BiocParallel environment.

Value

the rankings of the gene-significance.

Examples

data(Yan)
k = length(unique(trueclass))
set.seed(123)
rixs = sample(nrow(Y), 500)
cixs = sample(ncol(Y), 40)
Y = Y[rixs, cixs]
ixs = FEAST(Y, k=k)

FEAST main function (fast version)

Description

FEAST main function (fast version)

Usage

FEAST_fast(Y, k = 2, num_pcs = 10, split = FALSE, batch_size = 1000, nProc = 1)

Arguments

`Y`	An expression matrix: either a raw count matrix or a normalized matrix.
`k`	The number of input clusters (best guess).
`num_pcs`	The number of top PCs that will be investigated through consensus clustering.
`split`	Logical. If TRUE, subsampling is used to calculate the gene-level significance.
`batch_size`	When split is TRUE, the batch size used for splitting the cells.
`nProc`	Number of cores for the BiocParallel environment.

Value

the rankings of the gene-significance.

Examples

data(Yan)
k = length(unique(trueclass))
res = FEAST_fast(Y, k=k)

Normalize the count expression matrix by the size factor and take the log transformation.

Description

Normalize the count expression matrix by the size factor and take the log transformation.

Usage

Norm_Y(Y)

Arguments

`Y`	a count expression matrix

Value

a normalized matrix

Examples

data(Yan)
Ynorm = Norm_Y(Y)
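
The idea of size-factor normalization followed by a log transformation can be sketched in base R as follows. This is an illustration only. The size-factor definition (library size divided by the median library size), the log base, and the pseudocount of 1 are assumptions and may differ from what `Norm_Y` does. The helper name `norm_sketch` is hypothetical.

```r
# Hypothetical helper (not part of FEAST): size-factor normalization plus log transform.
norm_sketch <- function(Y) {
  lib <- colSums(Y)                          # total counts per cell
  size_factor <- lib / median(lib)           # assumption: library size relative to the median
  scaled <- sweep(Y, 2, size_factor, "/")    # divide each column (cell) by its size factor
  log2(scaled + 1)                           # assumption: log2 with a pseudocount of 1
}

# Toy counts: three cells with the same composition but different sequencing depth.
counts <- matrix(c(1, 3, 2, 6, 4, 12), nrow = 2)
normed <- norm_sketch(counts)
```

After normalization the three toy cells have identical profiles, since they differ only in sequencing depth.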

Standard way to preprocess the count matrix. It is the QC step for the genes.

Description

Standard way to preprocess the count matrix. It is the QC step for the genes.

Usage

process_Y(Y, thre = 2)

Arguments

`Y`	Gene expression data (raw count matrix).
`thre`	The minimum number of cells that must express a gene for it to be kept (default = 2).

Value

A processed gene expression matrix. It is not log transformed

Examples

data(Yan)
YY = process_Y(Y, thre=2)
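
The gene-level QC described by the `thre` argument can be pictured with a short base-R sketch: count the cells in which each gene has a non-zero count and keep the genes that reach the threshold. This is a simplified illustration, and `process_Y` may apply further steps. The helper name `filter_genes_sketch`, the toy counts, and the use of `>=` rather than `>` are assumptions.

```r
# Hypothetical helper (not part of FEAST): keep genes expressed in enough cells.
filter_genes_sketch <- function(Y, thre = 2) {
  expressed_in <- rowSums(Y > 0)             # number of cells with a non-zero count
  Y[expressed_in >= thre, , drop = FALSE]    # assumption: ">=" rather than ">"
}

counts <- rbind(
  geneA = c(0, 0, 0, 5),   # expressed in 1 cell: removed
  geneB = c(1, 0, 2, 0),   # expressed in 2 cells: kept
  geneC = c(3, 4, 1, 2)    # expressed in 4 cells: kept
)
filtered <- filter_genes_sketch(counts, thre = 2)
```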

Calculate the purity between two vectors.

Description

Calculate the purity between two vectors.

Usage

Purity(x, y)

Arguments

`x`	a vector.
`y`	a vector. x and y must have the same length.

Value

the purity score
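
For orientation, purity is commonly defined as the fraction of items that fall in the majority class of their cluster. The base-R sketch below implements that definition. Treating `x` as the cluster labels and `y` as the reference classes is an assumption, as is the helper name `purity_sketch`. The package's `Purity` may order its arguments differently.

```r
# Hypothetical helper (not part of FEAST): clustering purity.
purity_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  tab <- table(x, y)                       # rows: clusters (x), columns: classes (y)
  sum(apply(tab, 1, max)) / length(x)      # majority-class count per cluster, summed
}

x <- c(1, 1, 1, 2, 2, 2)
y <- c("a", "a", "b", "b", "b", "b")
p <- purity_sketch(x, y)
```

In this toy example five of the six items belong to the majority class of their cluster, so the purity is 5/6.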

SC3 Clustering

Description

SC3 Clustering

Usage

SC3_Clust(Y, k = NULL, input_markers = NULL)

Arguments

`Y`	An expression matrix. The raw count matrix is recommended.
`k`	The number of clusters. If it is not provided, k is estimated by the default method in SC3.
`input_markers`	A character vector of the featured genes. If it is not provided, SC3 selects the features itself.

Value

the clustering labels and the featured genes.

Using clustering results based on feature selection to perform model selection.

Description

Using clustering results based on feature selection to perform model selection.

Usage

Select_Model_short_SC3(Y, cluster, tops = c(500, 1000, 2000))

Arguments

`Y`	A gene expression matrix
`cluster`	The initial cluster labels; NA values are allowed. These can come directly from the `Consensus` function.
`tops`	A numeric vector containing a list of numbers corresponding to top genes; e.g., tops = c(500, 1000, 2000).

Value

mse and the SC3 clustering result.

Examples

data(Yan)
k = length(unique(trueclass))
Y = process_Y(Y, thre = 2) # preprocess the data
set.seed(123)
rixs = sample(nrow(Y), 500)
cixs = sample(ncol(Y), 40)
Y = Y[rixs, cixs]
con_res = Consensus(Y, k=k)
# not run
# mod_res = Select_Model_short_SC3(Y, cluster = con_res$cluster, tops = c(100, 200))

Using clustering results (from TSCAN) based on feature selection to perform model selection.

Description

Using clustering results (from TSCAN) based on feature selection to perform model selection.

Usage

Select_Model_short_TSCAN(
  Y,
  cluster,
  minexpr_percent = 0.5,
  cvcutoff = 1,
  tops = c(500, 1000, 2000)
)

Arguments

`Y`	A gene expression matrix
`cluster`	The initial cluster labels; NA values are allowed. These can come directly from the `Consensus` function.
`minexpr_percent`	The threshold used for processing data in TSCAN. The default is recommended.
`cvcutoff`	The threshold used for processing data in TSCAN. The default is recommended.
`tops`	A numeric vector containing a list of numbers corresponding to top genes; e.g., tops = c(500, 1000, 2000).

Value

mse and the TSCAN clustering result.

Examples

data(Yan)
k = length(unique(trueclass))
Y = process_Y(Y, thre = 2) # preprocess the data
set.seed(123)
rixs = sample(nrow(Y), 500)
cixs = sample(ncol(Y), 40)
Y = Y[rixs, cixs]
con_res = Consensus(Y, k=k)
# not run
# mod_res = Select_Model_short_TSCAN(Y, cluster = con_res$cluster, tops = c(100, 200))

An example single cell dataset for the cell label information (Yan)

Description

The true cell type labels for Yan dataset. It includes 8 different cell types.

Usage

data("Yan")

Format

A character vector containing the cell type labels.

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36552

References

Yan, Liying, et al. "Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells." Nature structural & molecular biology 20.9 (2013): 1131.

Examples

data("Yan")
table(trueclass)

TSCAN Clustering

Description

TSCAN Clustering

Usage

TSCAN_Clust(Y, k, minexpr_percent = 0.5, cvcutoff = 1, input_markers = NULL)

Arguments

`Y`	An expression matrix. The raw count matrix is recommended.
`k`	The number of clusters. If it is not provided, k is estimated by the default method in SC3.
`minexpr_percent`	Minimum expression threshold (default = 0.5).
`cvcutoff`	The CV cutoff used to filter the genes (default = 1).
`input_markers`	A character vector of the featured genes. If they are not provided, SC3 will take care of this.

Value

the clustering labels and the featured genes.

Examples

 data(Yan)
 k = length(unique(trueclass))
 # TSCAN_res = TSCAN_Clust(Y, k=k)

Convert a vector to a binary matrix.

Description

Convert a vector to a binary matrix.

Usage

vector2matrix(vec)

Arguments

`vec`	a vector.

Value

An n-by-n binary matrix indicating the adjacency.
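
A common way to build such a matrix from a label vector is to mark every pair of positions that share the same value. The base-R sketch below shows this reading. It is not the package's code: the helper name `adjacency_sketch` is hypothetical, and setting the diagonal to 1 is an assumption.

```r
# Hypothetical helper (not part of FEAST): co-membership (adjacency) matrix of a label vector.
adjacency_sketch <- function(vec) {
  # Entry [i, j] is 1 when vec[i] and vec[j] are equal, else 0.
  # Assumption: the diagonal is 1, since every element equals itself.
  outer(vec, vec, "==") * 1L
}

adj <- adjacency_sketch(c("a", "b", "a"))
```

Here elements 1 and 3 share the label "a", so `adj[1, 3]` and `adj[3, 1]` are 1 while all entries linking them to element 2 are 0.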

Using clustering results based on feature selection to perform model selection.

Description

Using clustering results based on feature selection to perform model selection.

Usage

Visual_Rslt(model_cv_res, trueclass)

Arguments

`model_cv_res`	model selection result from `Select_Model_short_SC3`.
`trueclass`	The real class labels

Value

A list containing an MSE data frame, a clustering accuracy data frame, and a ggplot object.

Examples

data(Yan)
k = length(unique(trueclass))
Y = process_Y(Y, thre = 2) # preprocess the data
set.seed(123)
rixs = sample(nrow(Y), 500)
cixs = sample(ncol(Y), 40)
Y = Y[rixs, ]
con_res = Consensus(Y, k=k)
# Not run
# mod_res = Select_Model_short_SC3(Y, cluster = con_res$cluster, tops = c(100, 200))
library(ggpubr)
# Visual_Rslt(model_cv_res = mod_res, trueclass = trueclass)

An example single cell count expression matrix (Yan)

Description

Y is a count expression matrix of class "matrix". The data include 124 cells from human preimplantation embryos and embryonic stem cells, and 19304 genes remain after removing genes with extremely high dropout rates.

Usage

data("Yan")

Format

An object of class "matrix" containing the expression counts.

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36552

References

Yan, Liying, et al. "Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells." Nature structural & molecular biology 20.9 (2013): 1131.

Examples

data("Yan")
Y[1:10, 1:4]
