Package 'CHETAH' reference manual

Title:	Fast and accurate scRNA-seq cell type identification
Description:	CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate, selective and fast scRNA-seq classifier. Classification is guided by a reference dataset, preferably also a scRNA-seq dataset. By hierarchical clustering of the reference data, CHETAH creates a classification tree that enables a step-wise, top-to-bottom classification. Using a novel stopping rule, CHETAH classifies the input cells to the cell types of the reference and to "intermediate types": more general classifications that ended in an intermediate node of the tree.
Authors:	Jurrian de Kanter [aut, cre], Philip Lijnzaad [aut]
Maintainer:	Jurrian de Kanter
License:	file LICENSE
Version:	1.23.0
Built:	2025-03-29 06:34:32 UTC
Source:	https://github.com/bioc/CHETAH

Identification of cell types aided by hierarchical clustering

Description

CHETAH classifies an input dataset by comparing it to a reference dataset in a stepwise, top-to-bottom fashion. See 'Details' for a full explanation. NOTE: We recommend using the default parameters.

Usage

CHETAHclassifier(
  input,
  ref_cells = NULL,
  ref_profiles = NULL,
  ref_ct = "celltypes",
  input_c = NA,
  ref_c = NA,
  thresh = 0.1,
  gs_method = c("fc", "wilcox"),
  cor_method = c("spearman", "kendall", "pearson", "cosine"),
  clust_method = c("average", "single", "complete", "ward.D2", "ward.D", "mcquitty",
    "median", "centroid"),
  clust_dist = bioDist::spearman.dist,
  n_genes = 200,
  pc_thresh = 0.2,
  p_thresh = 0.05,
  fc_thresh = 1.5,
  subsample = FALSE,
  fix_ngenes = TRUE,
  plot.tree = FALSE,
  only_pos = FALSE,
  print_steps = FALSE
)

Arguments

`input`	required: an input SingleCellExperiment. (see: Bioconductor, and the vignette `browseVignettes("CHETAH")`)
`ref_cells`	required: a reference SingleCellExperiment, with the cell types in the "celltypes" colData (or otherwise defined in `ref_ct`).
`ref_profiles`	optional: in case of bulk RNA-seq or microarrays, an expression matrix with one (average) reference expression profile per cell type in the columns. (`ref_cells` must be left empty)
`ref_ct`	the colData of `ref_cells` where the cell types are stored.
`input_c`	the name of the assay of the input to use. `NA` (default) will use the first one.
`ref_c`	same as `input_c`, but for the reference.
`thresh`	the initial confidence threshold, which can be changed after running with `Classify`
`gs_method`	method for gene selection, applied in every node of the tree. "fc" (default): quick method; either a fixed number (`n_genes`) of genes with the highest fold-change is selected (when `fix_ngenes = TRUE`, the default), or all genes with a fold-change higher than `fc_thresh` are selected (when `fix_ngenes = FALSE`). "wilcox": genes are selected based on fold-change (`fc_thresh`), percentage of expression (`pc_thresh`) and p-values (`p_thresh`); p-values are calculated with the Wilcoxon test.
`cor_method`	the correlation measure: one of: "spearman" (default), "kendall", "pearson", "cosine"
`clust_method`	the method used for clustering the reference profiles. One of the methods from `hclust`
`clust_dist`	a distance measure, default: `spearman.dist`
`n_genes`	The number of genes used in every step. Only used if `fix_ngenes = TRUE`
`pc_thresh`	when gs_method = "wilcox": only genes are selected that are expressed by more than a `pc_thresh` fraction of a reference group of cells
`p_thresh`	when gs_method = "wilcox": only genes are selected that have a p-value < `p_thresh`
`fc_thresh`	when gs_method = "wilcox", or gs_method = "fc" AND fix_ngenes = FALSE: only genes are selected that have a log2 fold-change > `fc_thresh` between two reference groups. If this mode is selected, the reference must be in log2 space.
`subsample`	to prevent reference types with many cells from dominating the gene selection, subsample types with more than `subsample` cells
`fix_ngenes`	when gs_method = "fc": use a fixed number of genes for all correlations. When gs_method = "wilcox": use a maximum number of genes per step. When `fix_ngenes = FALSE` and `gs_method = "fc"`, `fc_thresh` defines the fold-change cut-off for gene selection.
`plot.tree`	Plot the classification tree.
`only_pos`	not recommended: only use genes for a reference type that are higher expressed in that type, than the others in that node.
`print_steps`	whether the number of genes (positive and negative) per step per reference cell type should be printed
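
The two "fc" modes of `gs_method` can be illustrated with a minimal base-R sketch. This is not CHETAH's internal code: the helper `select_genes_fc` and the toy matrices are hypothetical, and ranking by the absolute difference of mean log2 expression between two groups is an assumption made for illustration.

```r
# Hypothetical helper, not part of CHETAH: select genes by log2 fold-change
# between two groups of reference cells (matrices: genes x cells, log2 space).
select_genes_fc <- function(left, right, n_genes = 200, fc_thresh = 1.5,
                            fix_ngenes = TRUE) {
  stopifnot(identical(rownames(left), rownames(right)))
  # In log2 space, the fold-change is the difference of the group means
  log2fc <- rowMeans(left) - rowMeans(right)
  if (fix_ngenes) {
    # Fixed number of genes with the largest absolute fold-change
    ranked <- names(sort(abs(log2fc), decreasing = TRUE))
    head(ranked, n_genes)
  } else {
    # All genes that exceed the fold-change cut-off
    names(log2fc)[abs(log2fc) > fc_thresh]
  }
}

set.seed(1)
genes <- paste0("gene", 1:50)
left  <- matrix(rnorm(50 * 10), nrow = 50, dimnames = list(genes, NULL))
right <- matrix(rnorm(50 * 12), nrow = 50, dimnames = list(genes, NULL))
right[1:5, ] <- right[1:5, ] + 4   # make the first five genes differ strongly

# Mode 1: fixed number of genes (fix_ngenes = TRUE)
top5 <- select_genes_fc(left, right, n_genes = 5)

# Mode 2: fold-change cut-off (fix_ngenes = FALSE)
by_thresh <- select_genes_fc(left, right, fix_ngenes = FALSE, fc_thresh = 1.5)
```

With a fixed number, every node uses equally many genes; with a cut-off, the number of genes varies per node and depends on the data being in log2 space.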

Details

CHETAH hierarchically clusters the reference data to produce a classification tree (ct). In each node of the ct, CHETAH assigns each input cell to one of the two branches, based on gene selections, correlations, and the calculation of profile and confidence scores. The assignment is only performed if its confidence score is higher than the confidence threshold. If this is not the case, classification of the cell stops in the current node. Some input cells will reach the leaf nodes of the ct (the pre-defined cell types); these classifications are called final types. For other cells, assignment stops in a node; these classifications are called intermediate types.
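
How a classification tree can arise from reference profiles may be sketched with base R alone. The profiles and type names below are made up, and `1 - cor(..., method = "spearman")` is an assumed stand-in for the default `clust_dist`; `method = "average"` mirrors the default `clust_method`.

```r
set.seed(42)
# Toy reference profiles (genes x cell types): two pairs of related types
base_a <- rnorm(100)
base_b <- rnorm(100)
profiles <- cbind(
  T_cell_1 = base_a + rnorm(100, sd = 0.2),
  T_cell_2 = base_a + rnorm(100, sd = 0.2),
  Tumor_1  = base_b + rnorm(100, sd = 0.2),
  Tumor_2  = base_b + rnorm(100, sd = 0.2)
)

# Distance between profiles: 1 - Spearman correlation (assumed stand-in)
d <- as.dist(1 - cor(profiles, method = "spearman"))

# Average-linkage hierarchical clustering gives the tree
tree <- hclust(d, method = "average")

# Cutting at the top node shows the first split an input cell would face
top_split <- cutree(tree, k = 2)
```

In this toy case the top node separates the two T-cell profiles from the two tumor profiles; each branch would then be split again further down the tree.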

Value

A SingleCellExperiment with added:
- input$celltype_CHETAH: a named character vector that can directly be used in any other workflow/method.
- "hidden" 'int_colData' and 'int_metadata', not meant for direct interaction, but which can all be viewed and interacted with using 'PlotCHETAH' and 'CHETAHshiny'.

A list containing the following objects is added to input$int_metadata$CHETAH:

classification a named vector: the classified types with the corresponding names of the input cells
tree the hclust object of the classification tree
nodetypes A list with the cell types under each node
nodecoor the coordinates of the nodes of the classification tree
genes A list per node, containing a list per reference type with the genes used for the profile scores of that type
parameters The parameters used

A nested DataFrame is added to input$int_colData$CHETAH. It holds 3 top-level DataFrames:

prof_scores A list with the profile scores
conf_scores A list with the confidence scores
correlations A list with the correlations of the input cells to the reference profiles
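
A short sketch of how these results might be inspected on the bundled `input_mel` object. It assumes that the slots written above as input$int_metadata$CHETAH and input$int_colData$CHETAH are reached through the SingleCellExperiment accessors `int_metadata()` and `int_colData()`; treat that as an assumption rather than a documented interface.

```r
library(SingleCellExperiment)
library(CHETAH)

data('input_mel')

# The classification as a character vector, one entry per cell
types <- input_mel$celltype_CHETAH
table(types)

# The "hidden" list with the tree, node types, genes and parameters
# (assumed accessor: int_metadata())
chetah_meta <- int_metadata(input_mel)$CHETAH
names(chetah_meta)
tree <- chetah_meta$tree

# The nested DataFrame with profile scores, confidence scores and correlations
# (assumed accessor: int_colData())
chetah_scores <- int_colData(input_mel)$CHETAH
names(chetah_scores)
```

For routine use, `PlotCHETAH` and `CHETAHshiny` remain the intended way to explore these objects.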

Examples

data('input_mel')
data('headneck_ref')
## Melanoma data from Tirosh et al. (2016) Science
input_mel
## Head-Neck data from Puram et al. (2017) Cell
headneck_ref
input_mel <- CHETAHclassifier(input = input_mel, ref_cells = headneck_ref)

Launch a web page to interactively go through the classification

Description

Launch a web page to interactively go through the classification

Usage

CHETAHshiny(input, redD = NA, input_c = NA)

Arguments

`input`	a SingleCellExperiment on which `CHETAHclassifier` has been run
`redD`	the name of the reducedDim of the input to use for plotting
`input_c`	the name of the assay of the input to use. `NA` (default) will use the first one.

Value

Opens a web page in your default browser

(Re)classify after running `CHETAHclassifier` using a confidence threshold
NOTE: In case of bulk reference profiles: only the correlations will be used, as the data does not allow for profile or confidence scores to be calculated.

Description

(Re)classify after running CHETAHclassifier using a confidence threshold
NOTE: In case of bulk reference profiles: only the correlations will be used, as the data does not allow for profile or confidence scores to be calculated.

Usage

Classify(input, thresh = 0.1, return_clas = FALSE)

Arguments

`input`	a SingleCellExperiment on which `CHETAHclassifier` has been run
`thresh`	a confidence threshold between 0 and 2. Selecting 0 will classify all cells, whereas 2 will result in (almost) no cells being classified. Recommended: between 0.1 (fairly confident) and 1 (very confident)
`return_clas`	Instead of returning the SingleCellExperiment, only return the classification vector
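
The effect of `thresh` can be illustrated with a toy walk down a small tree. Everything below is hypothetical (the tree layout, the field names `conf` and `best`, and the helper `classify_cell`); it only mirrors the rule that a cell moves to the next branch when its confidence score is higher than the threshold, and otherwise stops in the current node.

```r
# Hypothetical toy tree for a single cell, not CHETAH's data structure.
# `conf` is the confidence of the preferred branch, `best` is its index.
toy_tree <- list(
  label = "Node1", conf = 1.4, best = 1,
  children = list(
    list(label = "Node2", conf = 0.3, best = 2,
         children = list(list(label = "CD4 T cell"),
                         list(label = "CD8 T cell"))),
    list(label = "Tumor")
  )
)

classify_cell <- function(node, thresh = 0.1) {
  # A leaf is a final type
  if (is.null(node$children)) return(node$label)
  # Confidence not higher than the threshold: stop here (intermediate type)
  if (node$conf <= thresh) return(node$label)
  classify_cell(node$children[[node$best]], thresh)
}

# Classify the same cell with increasingly strict thresholds
labels <- vapply(c(0, 0.1, 1, 2),
                 function(th) classify_cell(toy_tree, thresh = th),
                 character(1))
```

With thresholds 0 and 0.1 the toy cell reaches the leaf "CD8 T cell"; with 1 it stops in "Node2"; with 2 it stops in "Node1".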

Value

The SingleCellExperiment with the new classification or, if 'return_clas = TRUE', a character vector of the cell types, named with the names of the cells.

Examples

data('input_mel')
data('headneck_ref')
## Classify all cells
input_mel <- Classify(input_mel, 0)

## Classify only cells with a very high confidence
input_mel <- Classify(input_mel, 1)

## Back to the default
input_mel <- Classify(input_mel)

## Return only the classification vector
celltypes <- Classify(input_mel, 1, return_clas = TRUE)

Use a reference dataset to classify itself. A good reference should have almost no mixture between reference cells.

Description

Use a reference dataset to classify itself. A good reference should have almost no mixture between reference cells.

Usage

ClassifyReference(
  ref_cells,
  ref_ct = "celltypes",
  ref_c = "counts",
  return = FALSE,
  ...
)

Arguments

`ref_cells`	the reference, similar to `CHETAHclassifier`'s ref_cells
`ref_ct`	the colData of `ref_cells` where the cell types are stored.
`ref_c`	same as `input_c`, but for the reference.
`return`	return the matrix that was used to produce the plot
`...`	Other variables to pass to `CHETAHclassifier`

Value

A square plot. The rows are the original cell types, the columns the classification labels. The colors and sizes of the squares indicate which fraction of the cells of the row type is classified to the type of the column. On the left of the plot, the percentage of cells that is classified to an intermediate type is plotted. A good reference would classify nearly 100
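
The kind of matrix behind such a plot can be sketched in base R. The labels below are made up, and the assumption that the entries are row-wise fractions is an illustration only; the matrix returned with `return = TRUE` may be organized differently.

```r
# Made-up labels: original reference types and the labels assigned to the same cells
original   <- c("B cell", "B cell", "B cell", "T cell", "T cell", "T cell", "T cell")
classified <- c("B cell", "B cell", "Node1",  "T cell", "T cell", "T cell", "B cell")

# Rows: original types; columns: classification labels; entries: row-wise fractions
counts    <- table(original, classified)
fractions <- prop.table(counts, margin = 1)

# Fraction of each original type that was assigned back to itself
self <- vapply(rownames(fractions), function(ct) fractions[ct, ct], numeric(1))
```

In this toy case two thirds of the B cells and three quarters of the T cells are assigned back to their own type; one B cell ends in the intermediate type "Node1".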

Examples

data('headneck_ref')
ClassifyReference(ref_cells = headneck_ref)

Correlate all reference profiles to each other using differentially expressed genes.

Description

Correlate all reference profiles to each other using differentially expressed genes.

Usage

CorrelateReference(
  ref_cells = NULL,
  ref_profiles = NULL,
  ref_ct = "celltypes",
  ref_c = NA,
  return = FALSE,
  n_genes = 200,
  fix_ngenes = TRUE,
  print_steps = FALSE,
  only_pos = FALSE
)

Arguments

`ref_cells`	the reference, similar to `CHETAHclassifier`'s ref_cells
`ref_profiles`	similar to `CHETAHclassifier`'s ref_profiles
`ref_ct`	the colData of `ref_cells` where the cell types are stored.
`ref_c`	the assay of `ref_cells` to use
`return`	return the matrix that was used to produce the plot
`n_genes`	as in `CHETAHclassifier`
`fix_ngenes`	as in `CHETAHclassifier`
`print_steps`	as in `CHETAHclassifier`
`only_pos`	as in `CHETAHclassifier`

Value

A square plot. The values show how much two reference profiles correlate, when using the genes with the highest fold-change.

Examples

data('headneck_ref')
CorrelateReference(ref_cells = headneck_ref)

A SingleCellExperiment with cell types in the "celltypes" colData. A subset of the Head-Neck data from Puram et al. (2017) Cell.

Description

A SingleCellExperiment with cell types in the "celltypes" colData. A subset of the Head-Neck data from Puram et al. (2017) Cell.

Usage

data('headneck_ref')

Format

A list of expression matrices. Each object is named as the cell type of the cells in that matrix. Each matrix has the cell (names) in the columns and the genes in the rows.

Source

for the original data: GEO

References

Puram et al. (2017) Cell 171:1611-1624

A SingleCellExperiment on which CHETAHclassifier is run using the `headneck_ref`. It holds a subset of the Melanoma data from Tirosh et al. (2016), Science.

Description

A SingleCellExperiment on which CHETAHclassifier is run using the headneck_ref. It holds a subset of the Melanoma data from Tirosh et al. (2016), Science.

Usage

data('input_mel')

Format

This is a SingleCellExperiment

Source

for the original data: GEO

References

Tirosh et al. (2016) Science 352(6282):189-196

Plot the CHETAH classification on a 2D visualization like t-SNE + the corresponding classification tree, colored with the same colors

Description

Plot the CHETAH classification on a 2D visualization like t-SNE + the corresponding classification tree, colored with the same colors

Usage

PlotCHETAH(
  input,
  redD = NA,
  interm = FALSE,
  return = FALSE,
  tree = TRUE,
  pt.size = 1,
  return_col = FALSE,
  col = NULL
)

Arguments

`input`	a SingleCellExperiment on which `CHETAHclassifier` has been run
`redD`	the name of the reducedDim of the input to use for plotting
`interm`	color the intermediate instead of the final types
`return`	return the plot instead of printing it
`tree`	plot the tree, along with the classification
`pt.size`	the point-size of the classification plot
`return_col`	whether the colors that are used for the classification plot should be returned
`col`	custom colors for the cell types. the colors should be named with the corresponding cell types

Value

a ggplot object

Examples

data('input_mel')
## Standard plot (final types colored)
PlotCHETAH(input = input_mel)

## Intermediate types colored
PlotCHETAH(input = input_mel, interm = TRUE)

## Plot only the t-SNE plot
PlotCHETAH(input = input_mel, tree = FALSE)

Plots the CHETAH classification tree with nodes numbered

Description

Plots the CHETAH classification tree with nodes numbered

Usage

PlotTree(
  input,
  col = NULL,
  col_nodes = NULL,
  return = FALSE,
  no_bgc = FALSE,
  plot_limits = c(-0.4, 0.1),
  labelsize = 6
)

Arguments

`input`	a SingleCellExperiment on which `CHETAHclassifier` has been run
`col`	a vector of colors, with the names of the reference cell types
`col_nodes`	a vector of colors, ordered for node 1 till the last node
`return`	instead of printing, return the ggplot object
`no_bgc`	remove the background color from the node numbers
`plot_limits`	define the plot limits. Decreasing the first value further is useful when the labels are cut off the plot (default = c(-0.4, 0.1)).
`labelsize`	the size of the intermediate and leaf node labels (default = 6)

Value

A ggplot object of the classification tree

Examples

data('input_mel')
PlotTree(input = input_mel)

Plots a variable on a t-SNE

Description

Plots a variable on a t-SNE

Usage

PlotTSNE(
  toplot,
  input,
  redD = NA,
  col = NULL,
  return = FALSE,
  limits = NULL,
  pt.size = 1,
  shiny = NULL,
  y_limits = NULL,
  x_limits = NULL,
  legend_label = ""
)

Arguments

`toplot`	the variable that should be plotted. Either a character vector or a factor, or a (continuous) numeric. If toplot is not named with the rownames of `redD`, it is assumed that the order of the two is the same.
`input`	a SingleCellExperiment on which `CHETAHclassifier` has been run
`redD`	the name of the reducedDim of the input to use for plotting
`col`	a vector of colors. If `toplot` is numeric, this will become a continuous scale. If `toplot` is a character vector, the colors should be named with the unique values (/levels) of `toplot`
`return`	instead of printing, return the ggplot object
`limits`	the limits of the continuous variable to plot. When not provided the minimal and maximal value will be used
`pt.size`	the point-size
`shiny`	Needed for the shiny application: should always be NULL
`y_limits`	the y-axis limits
`x_limits`	the x-axis limits, if NULL
`legend_label`	the label of the legend
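
The naming rule for `toplot` can be illustrated with base R alone. The coordinates and cell names below are made up; the sketch only shows that indexing a named vector by the row names of the coordinates puts the values in matching order, which is why an unnamed `toplot` has to be in the same order as the cells already.

```r
# Toy 2-D coordinates standing in for a reducedDim (rows = cells)
coords <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2,
                 dimnames = list(c("cellA", "cellB", "cellC"),
                                 c("tSNE_1", "tSNE_2")))

# A named variable, given in a different order than the coordinates
expr <- c(cellC = 3.2, cellA = 0.0, cellB = 1.5)

# Reordering by name makes the values line up with the rows of `coords`
expr_aligned <- expr[rownames(coords)]
```

An unnamed vector such as `c(3.2, 0.0, 1.5)` carries no such information, so its first value would be plotted for the first cell regardless of which cell it was measured in.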

Value

A ggplot object

Examples

data('input_mel')
CD8 <- assay(input_mel)['CD8A', ]
PlotTSNE(toplot = CD8, input = input_mel)

In the CHETAH classification, replace the name of a Node and all the names of the final and intermediate types under that Node.

Description

In the CHETAH classification, replace the name of a Node and all the names of the final and intermediate types under that Node.

Usage

RenameBelowNode(
  input,
  whichnode,
  replacement,
  nodes_exclude = NULL,
  types_exclude = NULL,
  node_only = FALSE,
  return_clas = FALSE
)

Arguments

`input`	a SingleCellExperiment on which `CHETAHclassifier` has been run
`whichnode`	the number of the Node
`replacement`	a character vector that replaces the names under the selected Node
`nodes_exclude`	optional: numbers of the Nodes under the selected Node that should NOT be replaced
`types_exclude`	optional: the names of the types that should NOT be replaced
`node_only`	only rename the Node itself, without affecting the types under that Node
`return_clas`	Instead of returning the SingleCellExperiment, only return the classification vector

Value

The SingleCellExperiment with the new classification or if 'return_clas = TRUE' the classification vector.

Examples

## In the example data replace all T-cell subtypes by "T cell"
data('input_mel')
input_mel <- RenameBelowNode(input = input_mel, whichnode = 7, replacement = "T cell")
