Title: | Cytometry Cluster Hierarchy and Cellular-to-phenotype Associations |
---|---|
Description: | treekoR is a novel framework that aims to utilise the hierarchical nature of single cell cytometry data to find robust and interpretable associations between cell subsets and patient clinical end points. These associations are aimed to recapitulate the nested proportions prevalent in workflows inovlving manual gating, which are often overlooked in workflows using automatic clustering to identify cell populations. We developed treekoR to: Derive a hierarchical tree structure of cell clusters; quantify a cell types as a proportion relative to all cells in a sample (%total), and, as the proportion relative to a parent population (%parent); perform significance testing using the calculated proportions; and provide an interactive html visualisation to help highlight key results. |
Authors: | Adam Chan [aut, cre], Ellis Patrick [ctb] |
Maintainer: | Adam Chan <[email protected]> |
License: | GPL-3 |
Version: | 1.15.0 |
Built: | 2024-11-19 05:44:28 UTC |
Source: | https://github.com/bioc/treekoR |
a function to add the frequency bars for each cluster
addFreqBars( p, clusters, offset = 0.75, bar_length = 3, bar_width = 0.4, freq_labels = FALSE )
addFreqBars( p, clusters, offset = 0.75, bar_length = 3, bar_width = 0.4, freq_labels = FALSE )
p |
a phylogenetic tree plot created from the ggtree() function |
clusters |
a vector representing the cell type or cluster of each cell (can be character or numeric) |
offset |
distance between the heatmap and frequency bars |
bar_length |
length of bar with max frequency |
bar_width |
width of each frequency bar |
freq_labels |
boolean indicated whether or not to show frequency bar labels |
an interactive ggplot graph object with frequency bars of clusters alongside heatmap of cluster median expression
a function to add a heatmap of cluster medians alongisde the phylogenetic tree
addHeatMap( p, cluster_medians, offset = 0.5, width = 1, expand_y_lim = 20, low = "#313695", mid = "ivory", high = "#A50026", colnames_angle = 90, metric_name = "Column z-score" )
addHeatMap( p, cluster_medians, offset = 0.5, width = 1, expand_y_lim = 20, low = "#313695", mid = "ivory", high = "#A50026", colnames_angle = 90, metric_name = "Column z-score" )
p |
a phylogenetic tree plot created from the ggtree() function |
cluster_medians |
a dataframe with the cluster medians. The rownumbers of the clusters median data frame should correspond to the nodes in the phylo tree. The column names should also correspond to the labels you want to use |
offset |
the distance between the tree plot and heatmap |
width |
width of each tile in the heatmap |
expand_y_lim |
white space below heatmap |
low |
colour used for low values on heatmap |
mid |
colour used for medium values on heatmap |
high |
colour used for large values on heatmap |
colnames_angle |
angle for x-axis label |
metric_name |
legend title |
an interactive ggplot graph object with heatmap of median cluster expressions plotted alongside hierarchical tree
a function to create a skeleton tree diagram to display significance testing results on each node
addTree(p, offset = 0.3, font_size = 2.5, hjust = 0)
addTree(p, offset = 0.3, font_size = 2.5, hjust = 0)
p |
a phylogenetic tree plot created from the ggtree() function |
offset |
distance between leaf nodes on the tree and their labels |
font_size |
font size of leaf labels |
hjust |
horizontal justification as defined in ggplot2 |
a ggtree graph object with the hierarchical tree of clusters and corresponding labels
Adding statistical test results onto the tree by using colourful nodes and branches Takes a ggtree object with test results for each node and returns a ggtree graph object
colourTree( tree, point_size = 1.5, high = "#00c434", low = "purple", mid = "ivory2" )
colourTree( tree, point_size = 1.5, high = "#00c434", low = "purple", mid = "ivory2" )
tree |
a tree plot created from the ggtree() function with p$data containing test statisic and p- |
point_size |
size of nodes in the tree |
high |
colour for large values |
low |
colour for low values |
mid |
colour for middle values |
an interactive ggplot graph object, plotting the hierarchical tree of clusters with nodes and branches coloured by the significance testing results.
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") tested_tree <- testTree(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes) colourTree(tested_tree)
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") tested_tree <- testTree(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes) colourTree(tested_tree)
Data from a an experiment investigating T cell compositions between COVID-19 patients and healthy control. This data has been transformed using a arcsinh transform using a co-factor of 5 and randomly subsetted
data(COVIDSampleData)
data(COVIDSampleData)
An object of class "SingeCellExperiment"
De Biasi et al. (2020) Nat Commun 11, 3434 (Nature)
data(COVIDSampleData)
data(COVIDSampleData)
findChildren
findChildren(tree)
findChildren(tree)
tree |
a ggtree object |
a ggtree object with the data containing a column with the clusters contained in each node
getCellGMeans helper function
geometricMean(x, na.rm = TRUE)
geometricMean(x, na.rm = TRUE)
x |
vector containing numeric values |
na.rm |
whether or not to ignore NA values |
geomtric mean of vector x
getCellGMeans
getCellGMeans(phylo, exprs, clusters, samples, classes)
getCellGMeans(phylo, exprs, clusters, samples, classes)
phylo |
a phylogram with tip.labels corresponding to cell types/cluster contained in 'clusters' vector |
exprs |
a dataframe containing single cell expression data |
clusters |
a vector representing the cell type or cluster of each cell (can be character or numeric). If numeric, cluster names need to be consecutive starting from 1. |
samples |
a vector identifying the patient each cell belongs to |
classes |
a vector containing the patient outcome/class each cell belongs to |
a dataframe containing proportions calculated for each sample
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") means_df <- getCellGMeans(clust_tree$clust_tree, exprs=exprs, clusters=clusters, samples=samples, classes=classes)
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") means_df <- getCellGMeans(clust_tree$clust_tree, exprs=exprs, clusters=clusters, samples=samples, classes=classes)
getCellProp
getCellProp(phylo, clusters, samples, classes, excl_top_node_parent = TRUE)
getCellProp(phylo, clusters, samples, classes, excl_top_node_parent = TRUE)
phylo |
a phylogram with tip.labels corresponding to cell types/cluster contained in 'clusters' vector |
clusters |
a vector representing the cell type or cluster of each cell (can be character or numeric). If numeric, cluster names need to be consecutive starting from 1. |
samples |
a vector identifying the patient each cell belongs to |
classes |
a vector containing the patient outcome/class each cell belongs to |
excl_top_node_parent |
a boolean indicating whether the for cell types with the highest node as their parent |
a dataframe containing proportions calculated for each sample
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") prop_df <- getCellProp(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes)
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") prop_df <- getCellProp(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes)
getClusterTree This function takes a CATALYST sce with clusters and creates a hierarchical tree
getClusterTree( exprs, clusters, hierarchy_method = "hopach", hopach_kmax = 5, hopach_K = 10, scale_exprs = TRUE )
getClusterTree( exprs, clusters, hierarchy_method = "hopach", hopach_kmax = 5, hopach_K = 10, scale_exprs = TRUE )
exprs |
a dataframe containing single cell expression data |
clusters |
a vector representing the cell type or cluster of each cell (can be character or numeric). If numeric, cluster names need to be consecutive starting from 1. |
hierarchy_method |
a string indicating the hierarchical tree construction method to be used |
hopach_kmax |
integer between 1 and 9 specifying the maximum number of children at each node in the tree |
hopach_K |
positive integer specifying the maximum number of levels in the tree. Must be 15 or less, due to computational limitations (overflow) |
scale_exprs |
boolean indicating whether to scale median cluster expression data before constructing hierarchical tree |
a list containing the cluster median frequencies and a phylogram of the hierarchical tree
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach")
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach")
getCellProp helper function
getParentProp(vars1, vars2, n_cells)
getParentProp(vars1, vars2, n_cells)
vars1 |
name of cell type, matching to column in n_cells |
vars2 |
name of parent cell type, matching to column in n_cells |
n_cells |
matrix of counts of each cell type per sample |
a vector containing the proportions of cell type vars1 as a percent of parent vars2 per sample
getCellProp helper function
getTotalProp(vars1, n_cells, n_cells_pat)
getTotalProp(vars1, n_cells, n_cells_pat)
vars1 |
name of cell type, matching to column in n_cells |
n_cells |
matrix of counts of each cell type per sample |
n_cells_pat |
vector containing number of cells per sample |
a vector containing the proportions of cell type vars1 as a percent of total per sample
getTreeResults
getTreeResults(testedTree, sort_by = "parent")
getTreeResults(testedTree, sort_by = "parent")
testedTree |
a ggtree object outputed from testTree() |
sort_by |
whether to sort by p-values testing via proportions to parent or p-values testing via absolute proportions. Values can can be c(NA, "parent", "all") |
a dataframe with hierarchical tree nodes, corresponding clusters and corresponding significance testing results
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") tested_tree <- testTree(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes, pos_class_name=NULL) res_df <- getTreeResults(tested_tree) head(res_df, 10)
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") tested_tree <- testTree(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes, pos_class_name=NULL) res_df <- getTreeResults(tested_tree) head(res_df, 10)
hopachToPhylo
hopachToPhylo(res)
hopachToPhylo(res)
res |
an object returned from the runHOPACH() function |
a phylogram converted from the outputted list from the runHOPACH function
library(SingleCellExperiment) library(data.table) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_med_dt <- as.data.table(exprs) clust_med_dt[, cluster_id := clusters] res <- clust_med_dt[, lapply(.SD, median, na.rm=TRUE), by=cluster_id] res2 <- res[,.SD, .SDcols = !c('cluster_id')] hopach_res <- runHOPACH(as.data.frame(scale(res2))) phylo <- hopachToPhylo(hopach_res)
library(SingleCellExperiment) library(data.table) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_med_dt <- as.data.table(exprs) clust_med_dt[, cluster_id := clusters] res <- clust_med_dt[, lapply(.SD, median, na.rm=TRUE), by=cluster_id] res2 <- res[,.SD, .SDcols = !c('cluster_id')] hopach_res <- runHOPACH(as.data.frame(scale(res2))) phylo <- hopachToPhylo(hopach_res)
This function takes a hierarchical tree which has been tested for proportion to all and proportion to parent cluster
plotInteractiveHeatmap( testedTree, clust_med_df, clusters, svg_width = 13, svg_height = 9, tr_offset = 0.3, tr_font_size = 2, tr_point_size = 1.5, tr_col_high = "#00c434", tr_col_low = "purple", tr_col_mid = "ivory2", hm_offset = 1, hm_tile_width = 1, hm_expand_y_lim = 20, hm_col_high = "#cc2010", hm_col_mid = "#fff8de", hm_col_low = "#66a6cc", fb_offset = 0.75, fb_bar_length = 3, fb_bar_width = 0.4, fb_freq_labels = FALSE )
plotInteractiveHeatmap( testedTree, clust_med_df, clusters, svg_width = 13, svg_height = 9, tr_offset = 0.3, tr_font_size = 2, tr_point_size = 1.5, tr_col_high = "#00c434", tr_col_low = "purple", tr_col_mid = "ivory2", hm_offset = 1, hm_tile_width = 1, hm_expand_y_lim = 20, hm_col_high = "#cc2010", hm_col_mid = "#fff8de", hm_col_low = "#66a6cc", fb_offset = 0.75, fb_bar_length = 3, fb_bar_width = 0.4, fb_freq_labels = FALSE )
testedTree |
a ggtree object that has been run through the testTree |
clust_med_df |
a dataframe with the cluster medians. The rownumbers of the clusters median data frame should correspond to the nodes in the phylo tree. The column names should also correspond to the labels you want to use |
clusters |
a vector representing the cell type or cluster of each cell (can be character or numeric) |
svg_width |
width of svg canvas |
svg_height |
height of svf canvas |
tr_offset |
distance between leaf nodes on the tree and their labels |
tr_font_size |
font size of leaf labels |
tr_point_size |
size of each node in the tree |
tr_col_high |
colour used for high test statistics, coloured on the nodes and branches of the tree |
tr_col_low |
colour used for low test statistics, coloured on the nodes and branches of the tree |
tr_col_mid |
colour used for medium test statistics, coloured on the nodes and branches of the tree |
hm_offset |
distance between the tree and the heatmap |
hm_tile_width |
width of each tile in the heatmap |
hm_expand_y_lim |
white space below heatmap |
hm_col_high |
colour used for large values on heatmap |
hm_col_mid |
colour used for medium values on heatmap |
hm_col_low |
colour used for low values on heatmap |
fb_offset |
distance between the heatmap and frequency bars |
fb_bar_length |
length of bar with max frequency |
fb_bar_width |
width of each frequency bar |
fb_freq_labels |
boolean indicated whether or not to show frequency bar labels |
an interactive ggplot object containing the hierarchical tree of clusters coloured by significance testing results, with corresponding heatmap and a scatterplot comparing significance whne testing using proportions to parent vs. absolute proportions
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") tested_tree <- testTree(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes) plotInteractiveHeatmap(tested_tree, clust_med_df = clust_tree$median_freq, clusters=clusters)
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") tested_tree <- testTree(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes) plotInteractiveHeatmap(tested_tree, clust_med_df = clust_tree$median_freq, clusters=clusters)
plotSigScatter
plotSigScatter(testedTree, scatter_tooltip, max_val)
plotSigScatter(testedTree, scatter_tooltip, max_val)
testedTree |
an output from the function testTree() |
scatter_tooltip |
vector containing tooltips for interactive plot |
max_val |
maximum value to set as x/y axis limits |
a ggplot object, containing test statistics from testing proportions relative to parent against the test statistics from testing absolute proportions.
This function runs edgeR using the treekoR inputs across all nodes of the hierarchical tree of clusters, adapted from the diffcyt workflow
runEdgeRTests(td, clusters, samples, classes, pos_class_name)
runEdgeRTests(td, clusters, samples, classes, pos_class_name)
td |
a dataframe of data from ggtree object |
clusters |
a vector representing the cell type or cluster of each cell (can be character or numeric). If numeric, cluster names need to be consecutive starting from 1. |
samples |
a vector identifying the patient each cell belongs to |
classes |
a vector containing the patient outcome/class each cell belongs to |
pos_class_name |
a character indicating which class should be treated as positive |
a dataframe with pvalues, test statistic (signed -log10(p)), and FDR
This function runs GLMM using the treekoR inputs across all nodes of the hierarchical tree of clusters, adapted from the diffcyt workflow. (Unable to get direction of test statistic currently)
runGLMMTests(td, clusters, samples, classes, pos_class_name, neg_class_name)
runGLMMTests(td, clusters, samples, classes, pos_class_name, neg_class_name)
td |
a dataframe of data from ggtree object |
clusters |
a vector representing the cell type or cluster of each cell (can be character or numeric). If numeric, cluster names need to be consecutive starting from 1. |
samples |
a vector identifying the patient each cell belongs to |
classes |
a vector containing the patient outcome/class each cell belongs to |
pos_class_name |
a character indicating which class should be treated as positive |
neg_class_name |
a character indicating which class should be treated as negative |
a dataframe with pvalues and test statistics
runHOPACH
runHOPACH(data, K = 10, kmax = 5, dissimilarity_metric = "cor")
runHOPACH(data, K = 10, kmax = 5, dissimilarity_metric = "cor")
data |
dataframe containing the median expression of the clusters/cell types |
K |
positive integer specifying the maximum number of levels in the tree. Must be 15 or less, due to computational limitations (overflow) |
kmax |
integer between 1 and 9 specifying the maximum number of children at each node in the tree |
dissimilarity_metric |
metric used to calculate dissimilarities between clusters/cell types |
a list containing the groups each cluster belongs to at each level of the hopach tree
library(SingleCellExperiment) library(data.table) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_med_dt <- as.data.table(exprs) clust_med_dt[, cluster_id := clusters] res <- clust_med_dt[, lapply(.SD, median, na.rm=TRUE), by=cluster_id] res2 <- res[,.SD, .SDcols = !c('cluster_id')] hopach_res <- runHOPACH(as.data.frame(scale(res2)))
library(SingleCellExperiment) library(data.table) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_med_dt <- as.data.table(exprs) clust_med_dt[, cluster_id := clusters] res <- clust_med_dt[, lapply(.SD, median, na.rm=TRUE), by=cluster_id] res2 <- res[,.SD, .SDcols = !c('cluster_id')] hopach_res <- runHOPACH(as.data.frame(scale(res2)))
This function takes a hierarchical tree of the cluster medians of a cytometry dataset, and then uses this structure to perform t-tests between conditions of patients testing for difference using the proportion of cluster relative to sample's n and proportion of cluster relative to sample's n of hierarchical parent cluster. Takes a ggtree object and returns a ggtree object with testing results appended in the data
testTree( phylo, clusters, samples, classes, sig_test = "ttest", p_adjust = NULL, pos_class_name = NULL )
testTree( phylo, clusters, samples, classes, sig_test = "ttest", p_adjust = NULL, pos_class_name = NULL )
phylo |
a ggtree object |
clusters |
a vector representing the cell type or cluster of each cell (can be character or numeric). If numeric, cluster names need to be consecutive starting from 1. |
samples |
a vector identifying the patient each cell belongs to |
classes |
a vector containing the patient outcome/class each cell belongs to |
sig_test |
a character, either "ttest" or "wilcox" indicating the significance test to be used |
p_adjust |
a character, indicating whether p-value adjustment should be performed. Valid values are in stats::p.adjust.methods |
pos_class_name |
a character indicating which class is positive |
a ggtree object with significance testing results in embedded data
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") tested_tree <- testTree(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes, sig_test="ttest", pos_class_name=NULL)
library(SingleCellExperiment) data(COVIDSampleData) sce <- DeBiasi_COVID_CD8_samp exprs <- t(assay(sce, "exprs")) clusters <- colData(sce)$cluster_id classes <- colData(sce)$condition samples <- colData(sce)$sample_id clust_tree <- getClusterTree(exprs, clusters, hierarchy_method="hopach") tested_tree <- testTree(clust_tree$clust_tree, clusters=clusters, samples=samples, classes=classes, sig_test="ttest", pos_class_name=NULL)