Title: | An algorithm to find optimal signal levels in a tree |
---|---|
Description: | The arrangement of hypotheses in a hierarchical structure appears in many research fields and often indicates different resolutions at which data can be viewed. This raises the question of which resolution level the signal should best be interpreted on. treeclimbR provides a flexible method to select optimal resolution levels (potentially different levels in different parts of the tree), rather than cutting the tree at an arbitrary level. treeclimbR uses a tuning parameter to generate candidate resolutions and from these selects the optimal one. |
Authors: | Ruizhu Huang [aut] , Charlotte Soneson [aut, cre] |
Maintainer: | Charlotte Soneson <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.3.0 |
Built: | 2024-10-31 06:30:58 UTC |
Source: | https://github.com/bioc/treeclimbR |
Aggregate observed values based on a column (sample) tree, e.g. for differential state analysis. The returned object will contain one abundance matrix for each node in the tree, with columns corresponding to sample IDs and rows corresponding to the same features as the rows of the input object. The nodes correspond to either the original sample clusters, or larger metaclusters encompassing several of the original clusters (defined by the provided column tree).
aggDS( TSE, assay = "counts", sample_id = "sample_id", group_id = "group_id", cluster_id = "cluster_id", FUN = sum, message = FALSE )
aggDS( TSE, assay = "counts", sample_id = "sample_id", group_id = "group_id", cluster_id = "cluster_id", FUN = sum, message = FALSE )
TSE |
A |
assay |
The name or index of the assay from |
sample_id |
A character scalar indicating the column of
|
group_id |
A character scalar indicating the column of
|
cluster_id |
A character scalar indicating the column of
|
FUN |
The aggregation function. |
message |
A logical scalar, indicating whether progress messages should be printed to the console. |
A SummarizedExperiment
object. Each assay represents the
aggregated values for one node in the tree.
Ruizhu Huang, Charlotte Soneson
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ape) library(ggtree) }) set.seed(1L) tr <- rtree(3, tip.label = LETTERS[seq_len(3)]) ggtree(tr) + geom_text(aes(label = node), hjust = -1, vjust = 1) + geom_text(aes(label = label), hjust = -1, vjust = -1) cc <- matrix(rpois(60, 10), nrow = 6) rownames(cc) <- paste0("gene", seq_len(6)) colnames(cc) <- paste0("cell", seq_len(10)) cd <- data.frame(sid = rep(seq_len(2), each = 5), gid = rep(letters[seq_len(2)], each = 5), cid = sample(LETTERS[seq_len(3)], size = 10, replace = TRUE), stringsAsFactors = FALSE) tse <- TreeSummarizedExperiment(assays = list(counts = cc), colTree = tr, colNodeLab = cd$cid, colData = cd) out <- aggDS(TSE = tse, assay = "counts", sample_id = "sid", group_id = "gid", cluster_id = "cid") ## Aggregated counts for the node 5 SummarizedExperiment::assay(out, "alias_5") ## This is equal to the sum of the counts from nodes 1 and 2 SummarizedExperiment::assay(out, "alias_1") SummarizedExperiment::assay(out, "alias_2")
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ape) library(ggtree) }) set.seed(1L) tr <- rtree(3, tip.label = LETTERS[seq_len(3)]) ggtree(tr) + geom_text(aes(label = node), hjust = -1, vjust = 1) + geom_text(aes(label = label), hjust = -1, vjust = -1) cc <- matrix(rpois(60, 10), nrow = 6) rownames(cc) <- paste0("gene", seq_len(6)) colnames(cc) <- paste0("cell", seq_len(10)) cd <- data.frame(sid = rep(seq_len(2), each = 5), gid = rep(letters[seq_len(2)], each = 5), cid = sample(LETTERS[seq_len(3)], size = 10, replace = TRUE), stringsAsFactors = FALSE) tse <- TreeSummarizedExperiment(assays = list(counts = cc), colTree = tr, colNodeLab = cd$cid, colData = cd) out <- aggDS(TSE = tse, assay = "counts", sample_id = "sid", group_id = "gid", cluster_id = "cid") ## Aggregated counts for the node 5 SummarizedExperiment::assay(out, "alias_5") ## This is equal to the sum of the counts from nodes 1 and 2 SummarizedExperiment::assay(out, "alias_1") SummarizedExperiment::assay(out, "alias_2")
A collection of functions from the diffcyt
package
have been generalized to work with data provided in a tree structure. The
tree represents increasingly coarse clustering of the cells, and the leaves
are the clusters from an initial, high-resolution clustering generated by
diffcyt
. Note that diffcyt
represents data using
SummarizedExperiment
objects with cells in rows and features in
columns.
buildTree(d_se, dist_method = "euclidean", hclust_method = "average") calcMediansByTreeMarker(d_se, tree) calcTreeCounts(d_se, tree) calcTreeMedians(d_se, tree, message = FALSE)
buildTree(d_se, dist_method = "euclidean", hclust_method = "average") calcMediansByTreeMarker(d_se, tree) calcTreeCounts(d_se, tree) calcTreeMedians(d_se, tree, message = FALSE)
d_se |
A |
dist_method |
The distance measure to be used. This must be one of
"euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski".
Any unambiguous substring can be given. Please refer to |
hclust_method |
The agglomeration method to be used. This should be (an
unambiguous abbreviation of) one of "ward.D", "ward.D2", "single",
"complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC)
or "centroid" (= UPGMC). Please refer to |
tree |
A |
message |
A logical scalar indicating whether progress messages should be printed. |
The data object is assumed to contain a factor marker_class
in the
column meta-data (see prepareData
), which indicates
the protein marker class for each column of data ("type"
,
"state"
, or "none"
).
Variables id_type_markers
and id_state_markers
are saved in
the metadata
slot of the output object. These can be used to
identify the 'cell type' and 'cell state' markers in the list of
assays
in the output TreeSummarizedExperiment
object,
which is useful in later steps of the 'diffcyt' pipeline.
buildTree
applies hierarchical clustering to build a tree starting
from the high-resolution clustering created by
generateClusters
. The function calculates the median
abundance for each (ID type) marker and cluster, and uses this data to
further aggregate the initial clusters using hierarchical clustering.
calcTreeCounts
calculates the number of cells per cluster-sample
combination (referred to as cluster cell 'counts', 'abundances', or
'frequencies'. This is a tree version of calcCounts
.
calcMediansByTreeMarker
calculates the median value for each
cluster-marker combination. A cluster is represented by a node on the tree.
This is a tree version of calcMediansByClusterMarker
.
calcTreeMedians
calculates the median expression for each
cluster-sample-marker combination. This is a tree version of
calcMedians
.
For buildTree
, a phylo
object representing the
hierarchical clustering of the initial high-resolution clusters.
For calcTreeCounts
, a TreeSummarizedExperiment
object,
with clusters (nodes of the tree) in rows, samples in columns and
abundances (counts) in an assay
.
For calcMediansByTreeMarker
, a TreeSummarizedExperiment
object with clusters (nodes of the tree) in rows and markers in columns. The
marker expression values are in the assay
.
For calcTreeMedians
, a TreeSummarizedExperiment
object
with median marker expression for each cluster (each node of the tree) and
each sample for each cluster (node of the tree). Each node is represented as
a separate assay, with the number of columns equal to the number of samples.
The metadata
slot contains variables id_type_markers
and
id_state_markers
.
Ruizhu Huang
## For a complete workflow example demonstrating each step in the 'diffcyt' ## pipeline, please see the diffcyt vignette. suppressPackageStartupMessages({ library(diffcyt) library(TreeSummarizedExperiment) }) ## Helper function to create random data (one sample) d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) { d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor colnames(d) <- paste0("marker", sprintf("%02d", seq_len(ncol))) d } ## Create random data (without differential signal) set.seed(123) d_input <- list( sample1 = d_random(), sample2 = d_random(), sample3 = d_random(), sample4 = d_random() ) experiment_info <- data.frame( sample_id = factor(paste0("sample", seq_len(4))), group_id = factor(c("group1", "group1", "group2", "group2")) ) marker_info <- data.frame( channel_name = paste0("channel", sprintf("%03d", seq_len(20))), marker_name = paste0("marker", sprintf("%02d", seq_len(20))), marker_class = factor(c(rep("type", 10), rep("state", 10)), levels = c("type", "state", "none")) ) # Prepare data d_se <- diffcyt::prepareData(d_input, experiment_info, marker_info) # Transform data d_se <- diffcyt::transformData(d_se) # Generate clusters d_se <- diffcyt::generateClusters(d_se) # Build a tree tr <- buildTree(d_se) ## Calculate abundances for nodes in each sample d_counts_tree <- calcTreeCounts(d_se = d_se, tree = tr) ## Calculate medians (by cluster and marker) d_medians_by_cluster_marker <- calcMediansByTreeMarker(d_se = d_se, tree = tr) ## Calculate medians (by cluster, sample and marker) d_medians_tree <- calcTreeMedians(d_se = d_se, tree = tr)
## For a complete workflow example demonstrating each step in the 'diffcyt' ## pipeline, please see the diffcyt vignette. suppressPackageStartupMessages({ library(diffcyt) library(TreeSummarizedExperiment) }) ## Helper function to create random data (one sample) d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) { d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor colnames(d) <- paste0("marker", sprintf("%02d", seq_len(ncol))) d } ## Create random data (without differential signal) set.seed(123) d_input <- list( sample1 = d_random(), sample2 = d_random(), sample3 = d_random(), sample4 = d_random() ) experiment_info <- data.frame( sample_id = factor(paste0("sample", seq_len(4))), group_id = factor(c("group1", "group1", "group2", "group2")) ) marker_info <- data.frame( channel_name = paste0("channel", sprintf("%03d", seq_len(20))), marker_name = paste0("marker", sprintf("%02d", seq_len(20))), marker_class = factor(c(rep("type", 10), rep("state", 10)), levels = c("type", "state", "none")) ) # Prepare data d_se <- diffcyt::prepareData(d_input, experiment_info, marker_info) # Transform data d_se <- diffcyt::transformData(d_se) # Generate clusters d_se <- diffcyt::generateClusters(d_se) # Build a tree tr <- buildTree(d_se) ## Calculate abundances for nodes in each sample d_counts_tree <- calcTreeCounts(d_se = d_se, tree = tr) ## Calculate medians (by cluster and marker) d_medians_by_cluster_marker <- calcMediansByTreeMarker(d_se = d_se, tree = tr) ## Calculate medians (by cluster, sample and marker) d_medians_tree <- calcTreeMedians(d_se = d_se, tree = tr)
edgerWrp
is a wrapper using functions from the edgeR
package (Robinson et al. 2010, Bioinformatics; McCarthy et al. 2012,
Nucleic Acids Research) to fit models and perform a moderated test for
each entity.
edgerWrp( count, lib_size = NULL, option = c("glm", "glmQL"), design, contrast = NULL, normalize = TRUE, normalize_method = "TMM", ... )
edgerWrp( count, lib_size = NULL, option = c("glm", "glmQL"), design, contrast = NULL, normalize = TRUE, normalize_method = "TMM", ... )
count |
A matrix with features (e.g., genes or microbes) in rows and samples in columns. |
lib_size |
A numeric vector with library sizes for each sample. If
|
option |
Either |
design |
A numeric design matrix, e.g. created by
|
contrast |
A numeric vector specifying one contrast of
the linear model coefficients to be tested. Its length
must equal the number of columns of |
normalize |
A logical scalar, specifying whether normalization factors
should be calculated (using |
normalize_method |
Normalization method to be used. Please refer to
|
... |
More arguments to pass to |
The function performs the following steps:
Create a DGEList
object. If lib_size
is
given, set the library sizes to these values, otherwise use the column sums
of the count matrix.
If normalize
is TRUE
, estimate normalization factors
using calcNormFactors
.
Estimate dispersions with estimateDisp
.
Depending on the value of option
, apply either the LRT or
QLF edgeR workflows (i.e., either glmFit
+
glmLRT
or glmQLFit
+
glmQLFTest
), testing for the specified contrast.
The output of glmQLFTest
or
glmLRT
depending on the specified option
.
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Read example data x <- readRDS(system.file("extdata/da_sim_100_30_18de.rds", package = "treeclimbR")) ## Run differential abundance analysis out <- edgerWrp(count = assay(x), option = "glm", design = model.matrix(~ group, data = colData(x)), contrast = c(0, 1)) ## The output is an edgeR DGELRT object class(out)
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Read example data x <- readRDS(system.file("extdata/da_sim_100_30_18de.rds", package = "treeclimbR")) ## Run differential abundance analysis out <- edgerWrp(count = assay(x), option = "glm", design = model.matrix(~ group, data = colData(x)), contrast = c(0, 1)) ## The output is an edgeR DGELRT object class(out)
Evaluate all candidate levels proposed by getCand
and select
the one with best performance. For more details about how the scoring is
done, see Huang et al (2021): https://doi.org/10.1186/s13059-021-02368-1.
evalCand( tree, type = c("single", "multiple"), levels, score_data = NULL, node_column, p_column, sign_column, feature_column = NULL, method = "BH", limit_rej = 0.05, use_pseudo_leaf = FALSE, message = FALSE )
evalCand( tree, type = c("single", "multiple"), levels, score_data = NULL, node_column, p_column, sign_column, feature_column = NULL, method = "BH", limit_rej = 0.05, use_pseudo_leaf = FALSE, message = FALSE )
tree |
A |
type |
A character scalar indicating whether the evaluation is for a
DA-type workflow (set |
levels |
A list of candidate levels that are returned by
|
score_data |
A |
node_column |
The name of the column that contains the node information. |
p_column |
The name of the column that contains p-values of nodes. |
sign_column |
The name of the column that contains the direction of the (estimated) change. |
feature_column |
The name of the column that contains information about the feature ID. |
method |
method The multiple testing correction method. Please refer to
the argument |
limit_rej |
The desired false discovery rate threshold. |
use_pseudo_leaf |
A logical scalar. If |
message |
A logical scalar, indicating whether progress messages should be printed. |
A list with the following components:
candidate_best
The best candidate level
output
Node-level information for best candidate level
candidate_list
A list of candidates
level_info
Summary information of all candidates
FDR
The specified FDR level
method
The method to perform multiple test correction.
column_info
A list with the specified node, p-value, sign and feature column names
More details about the columns in level_info
:
t The thresholds.
r The upper limit of t to control FDR on the leaf level.
is_valid Whether the threshold is in the range to control leaf FDR.
limit_rej
The specified FDR.
level_name
The name of the candidate level.
rej_leaf
The number of rejections on the leaf level.
rej_pseudo_leaf
The number of rejected pseudo-leaf nodes.
rej_node
The number of rejections on the tested candidate
level (leaves or internal nodes).
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) ## Generate example tree and assign p-values and signs to each node data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 13, fill = "blue", alpha = 0.5) + geom_hilight(node = 18, fill = "orange", alpha = 0.5) set.seed(1) pv <- runif(19, 0, 1) pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001) fc <- sample(c(-1, 1), 19, replace = TRUE) fc[c(seq_len(3), 13, 14)] <- 1 fc[c(4, 5, 18)] <- -1 df <- data.frame(node = seq_len(19), pvalue = pv, logFoldChange = fc) ## Propose candidates ll <- getCand(tree = tinyTree, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange") ## Evaluate candidates cc <- evalCand(tree = tinyTree, levels = ll$candidate_list, score_data = ll$score_data, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange", limit_rej = 0.05) ## Best candidate cc$candidate_best ## Details for best candidate cc$output
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) ## Generate example tree and assign p-values and signs to each node data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 13, fill = "blue", alpha = 0.5) + geom_hilight(node = 18, fill = "orange", alpha = 0.5) set.seed(1) pv <- runif(19, 0, 1) pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001) fc <- sample(c(-1, 1), 19, replace = TRUE) fc[c(seq_len(3), 13, 14)] <- 1 fc[c(4, 5, 18)] <- -1 df <- data.frame(node = seq_len(19), pvalue = pv, logFoldChange = fc) ## Propose candidates ll <- getCand(tree = tinyTree, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange") ## Evaluate candidates cc <- evalCand(tree = tinyTree, levels = ll$candidate_list, score_data = ll$score_data, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange", limit_rej = 0.05) ## Best candidate cc$candidate_best ## Details for best candidate cc$output
Calculate the false discovery rate on a tree structure, at either leaf or node level.
fdr(tree, truth, found, only.leaf = TRUE)
fdr(tree, truth, found, only.leaf = TRUE)
tree |
A |
truth |
True signal nodes (e.g., nodes that are truly differentially
abundant between experimental conditions). Note: when the
FDR is requested at the leaf level ( |
found |
Detected signal nodes (e.g., nodes that have been found to be
differentially abundant via a statistical testing procedure).
Note: when the FDR is requested at the leaf level
( |
only.leaf |
A logical scalar. If |
The estimated false discovery rate.
Ruizhu Huang
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data(tinyTree) ## Two branches are truly differential ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 16, fill = "orange", alpha = 0.3) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) ## FDR at the leaf level if nodes 14 and 15 are called differential (1/8) fdr(tree = tinyTree, truth = c(16, 13), found = c(15, 14), only.leaf = TRUE) ## FDR at the node level if nodes 14 and 15 are called differential (2/14) fdr(tree = tinyTree, truth = c(16, 13), found = c(15, 14), only.leaf = FALSE)
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data(tinyTree) ## Two branches are truly differential ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 16, fill = "orange", alpha = 0.3) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) ## FDR at the leaf level if nodes 14 and 15 are called differential (1/8) fdr(tree = tinyTree, truth = c(16, 13), found = c(15, 14), only.leaf = TRUE) ## FDR at the node level if nodes 14 and 15 are called differential (2/14) fdr(tree = tinyTree, truth = c(16, 13), found = c(15, 14), only.leaf = FALSE)
Find the direct children of an internal node in a tree.
findChild(tree, node, use.alias = FALSE)
findChild(tree, node, use.alias = FALSE)
tree |
A |
node |
Either the node number or node label of an internal node of
|
use.alias |
A logical scalar. If |
A vector of nodes. The numeric value is the node number, and the
vector name is the corresponding node label. If a node has no label, it
would have NA as name when use.alias = FALSE
, and have the alias
of the node label as name when use.alias = TRUE
.
Ruizhu Huang, Charlotte Soneson
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node), color = "darkblue", hjust = -0.5, vjust = 0.7) + geom_text2(aes(label = label), color = "darkorange", hjust = -0.1, vjust = -0.7) ## Specify node numbers findChild(tree = tinyTree, node = c(17, 12)) ## Name return values using aliases findChild(tree = tinyTree, node = c(17, 12), use.alias = TRUE) ## Specify node labels findChild(tree = tinyTree, node = c("Node_17", "Node_12")) ## Tips have no children findChild(tree = tinyTree, node = "t4")
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node), color = "darkblue", hjust = -0.5, vjust = 0.7) + geom_text2(aes(label = label), color = "darkorange", hjust = -0.1, vjust = -0.7) ## Specify node numbers findChild(tree = tinyTree, node = c(17, 12)) ## Name return values using aliases findChild(tree = tinyTree, node = c(17, 12), use.alias = TRUE) ## Specify node labels findChild(tree = tinyTree, node = c("Node_17", "Node_12")) ## Tips have no children findChild(tree = tinyTree, node = "t4")
Find all branches whose leaves do not overlap with those of the specified branches.
findExcl(tree, node, use.alias = FALSE)
findExcl(tree, node, use.alias = FALSE)
tree |
A |
node |
A numeric or character vector specifying the nodes. |
use.alias |
A logical scalar. If |
A vector of node numbers
Ruizhu Huang
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 17, fill = "blue", alpha = 0.3) + geom_hilight(node = 13, fill = "orange", alpha = 0.3) ## Find branches whose leaves do not overlap with the two colored branches. ## The returned branches are represented at the highest tree level ## possible without including any of the forbidden branches. findExcl(tree = tinyTree, node = c(17, 13))
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 17, fill = "blue", alpha = 0.3) + geom_hilight(node = 13, fill = "orange", alpha = 0.3) ## Find branches whose leaves do not overlap with the two colored branches. ## The returned branches are represented at the highest tree level ## possible without including any of the forbidden branches. findExcl(tree = tinyTree, node = c(17, 13))
Generate candidates for different thresholds (t). A candidate consists of a disjoint collection of leaves and internal branches, that collectively cover all leaves in the tree, and represents a specific aggregation pattern along the tree.
getCand( tree, t = NULL, score_data, node_column, p_column, sign_column, threshold = 0.05, pct_na = 0.5, message = FALSE )
getCand( tree, t = NULL, score_data, node_column, p_column, sign_column, threshold = 0.05, pct_na = 0.5, message = FALSE )
tree |
A |
t |
A vector of threshold values used to search for candidates,
in the range [0, 1]. The default ( |
score_data |
A |
node_column |
The name of the column of |
p_column |
The name of the column of |
sign_column |
The name of the column of |
threshold |
Numeric scalar; any internal node where the value of the p-value column is above this value will not be returned. The default is 0.05. The aim of this threshold is to avoid arbitrarily picking up internal nodes without true signal. |
pct_na |
Numeric scalar. In order for an internal node to be eligible
for selection, more than |
message |
A logical scalar, indicating whether progress messages should be printed to the console. |
A list with two elements: candidate_list
and
score_data
. condidate_list
is a list of candidates obtained
for the different thresholds. score_data
is a data.frame
that includes columns from the input score_data
and additional
columns with q-scores for different thresholds.
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) + geom_hilight(node = 18, fill = "orange", alpha = 0.3) ## Simulate p-values and directions of change for nodes ## (Nodes 1, 2, 3, 4, 5, 13, 14, 18 have a true signal) set.seed(1) pv <- runif(19, 0, 1) pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001) fc <- sample(c(-1, 1), 19, replace = TRUE) fc[c(seq_len(3), 13, 14)] <- 1 fc[c(4, 5, 18)] <- -1 df <- data.frame(node = seq_len(19), pvalue = pv, logFoldChange = fc) ll <- getCand(tree = tinyTree, score_data = df, t = c(0.01, 0.05, 0.1, 0.25, 0.75), node_column = "node", p_column = "pvalue", sign_column = "logFoldChange") ## Candidates ll$candidate_list ## Score table ll$score_data
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) + geom_hilight(node = 18, fill = "orange", alpha = 0.3) ## Simulate p-values and directions of change for nodes ## (Nodes 1, 2, 3, 4, 5, 13, 14, 18 have a true signal) set.seed(1) pv <- runif(19, 0, 1) pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001) fc <- sample(c(-1, 1), 19, replace = TRUE) fc[c(seq_len(3), 13, 14)] <- 1 fc[c(4, 5, 18)] <- -1 df <- data.frame(node = seq_len(19), pvalue = pv, logFoldChange = fc) ll <- getCand(tree = tinyTree, score_data = df, t = c(0.01, 0.05, 0.1, 0.25, 0.75), node_column = "node", p_column = "pvalue", sign_column = "logFoldChange") ## Candidates ll$candidate_list ## Score table ll$score_data
Extract different elements of the data shown in a figure generated with
TreeHeatmap
, such as the heatmap itself, or the row and column names.
getData( tree_hm, type = c("heatmap", "row_name", "column_name", "title", "column_anno", "column_order", "column_split") )
getData( tree_hm, type = c("heatmap", "row_name", "column_name", "title", "column_anno", "column_order", "column_split") )
tree_hm |
The output of |
type |
A character scalar indicating the type of information to
extract. Should be one of |
A data.frame
(if type
is not "column_order"), or a
vector of column names (if type
is "column_order").
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) library(ggplot2) library(scales) library(dplyr) library(ggnewscale) }) ## Load example data (tiny tree with corresponding count matrix) tse <- readRDS(system.file("extdata", "tinytree_counts.rds", package = "treeclimbR")) ## Aggregate counts for each of the highlighted subtrees tseagg <- aggTSE( tse, rowLevel = c(13, 18, setdiff(showNode(tinyTree, only.leaf = TRUE), unlist(findDescendant(tinyTree, node = c(13, 18), only.leaf = TRUE))))) ct <- SummarizedExperiment::assay(tseagg, "counts") col_split <- ifelse(colnames(ct) %in% paste0("S", seq_len(5)), "A", "B") names(col_split) <- colnames(ct) ## Prepare the tree figure tree_fig <- ggtree(tinyTree, branch.length = "none", layout = "rectangular") + geom_hilight(node = 18, fill = "orange", alpha = 0.3) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) fig <- TreeHeatmap( tree = tinyTree, tree_fig = tree_fig, hm_data = ct, cluster_column = TRUE, column_split = col_split, column_anno = col_split, column_anno_gap = 0.6, column_anno_color = c(A = "red", B = "blue"), show_colnames = TRUE, colnames_position = "bottom", colnames_angle = 90, colnames_size = 2, colnames_offset_y = -0.2, show_title = TRUE, title_offset_y = 1.5, title_color = "blue" ) fig ## Extract data for heatmap df_hm <- getData(tree_hm = fig, type = "heatmap") ## Generate data to add a column annotation ct <- df_hm |> dplyr::select(x, width, variable) |> dplyr::distinct() set.seed(1) ann <- matrix(sample(LETTERS[seq_len(2)], size = 3 * ncol(df_hm), replace = TRUE), nrow = 3) rownames(ann) <- paste0("g", seq_len(3)) colnames(ann) <- ct$variable ann <- data.frame(ann) |> mutate(y = min(df_hm$y) - seq_len(nrow(ann)), label = rownames(ann)) df_ann <- tidyr::pivot_longer( ann, names_to = "variable", values_to = "value", cols = -c("y", "label")) |> left_join(ct) ## Add new column annotation fig + new_scale_fill() + geom_tile(data = df_ann, aes(x, y-0.5, width = width, fill = value)) + scale_fill_viridis_d() + geom_text(data = df_ann, aes(x = min(x) - 1, y = y - 0.5, label = label))
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) library(ggplot2) library(scales) library(dplyr) library(ggnewscale) }) ## Load example data (tiny tree with corresponding count matrix) tse <- readRDS(system.file("extdata", "tinytree_counts.rds", package = "treeclimbR")) ## Aggregate counts for each of the highlighted subtrees tseagg <- aggTSE( tse, rowLevel = c(13, 18, setdiff(showNode(tinyTree, only.leaf = TRUE), unlist(findDescendant(tinyTree, node = c(13, 18), only.leaf = TRUE))))) ct <- SummarizedExperiment::assay(tseagg, "counts") col_split <- ifelse(colnames(ct) %in% paste0("S", seq_len(5)), "A", "B") names(col_split) <- colnames(ct) ## Prepare the tree figure tree_fig <- ggtree(tinyTree, branch.length = "none", layout = "rectangular") + geom_hilight(node = 18, fill = "orange", alpha = 0.3) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) fig <- TreeHeatmap( tree = tinyTree, tree_fig = tree_fig, hm_data = ct, cluster_column = TRUE, column_split = col_split, column_anno = col_split, column_anno_gap = 0.6, column_anno_color = c(A = "red", B = "blue"), show_colnames = TRUE, colnames_position = "bottom", colnames_angle = 90, colnames_size = 2, colnames_offset_y = -0.2, show_title = TRUE, title_offset_y = 1.5, title_color = "blue" ) fig ## Extract data for heatmap df_hm <- getData(tree_hm = fig, type = "heatmap") ## Generate data to add a column annotation ct <- df_hm |> dplyr::select(x, width, variable) |> dplyr::distinct() set.seed(1) ann <- matrix(sample(LETTERS[seq_len(2)], size = 3 * ncol(df_hm), replace = TRUE), nrow = 3) rownames(ann) <- paste0("g", seq_len(3)) colnames(ann) <- ct$variable ann <- data.frame(ann) |> mutate(y = min(df_hm$y) - seq_len(nrow(ann)), label = rownames(ann)) df_ann <- tidyr::pivot_longer( ann, names_to = "variable", values_to = "value", cols = -c("y", "label")) |> left_join(ct) ## Add new column annotation fig + new_scale_fill() + geom_tile(data = df_ann, aes(x, y-0.5, width = width, fill = value)) + scale_fill_viridis_d() + geom_text(data = df_ann, aes(x = min(x) - 1, y = y - 0.5, label = label))
Search for the target level of the tree via a specified score. The score value needs to be provided for each node of the tree.
getLevel( tree, score_data, drop, score_column, node_column, get_max, parent_first = TRUE, message = FALSE )
getLevel( tree, score_data, drop, score_column, node_column, get_max, parent_first = TRUE, message = FALSE )
tree |
A |
score_data |
A |
drop |
A logical expression indicating elements or rows to keep.
Missing values are taken as |
score_column |
The name of the column of |
node_column |
The name of the column of |
get_max |
A logical scalar. If |
parent_first |
A logical scalar. If |
message |
A logical scalar indicating whether progress messages should be printed. |
A data.frame
similar to score_data
but with an
additional column named keep
indicating which nodes to retain.
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node), color = "darkblue", hjust = -0.5, vjust = 0.7) + geom_text2(aes(label = label), color = "darkorange", hjust = -0.1, vjust = -0.7) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) + geom_hilight(node = 16, fill = "orange", alpha = 0.3) ## Generate score for each node pv <- rep(0.1, 19) pv[c(16, 13, 17)] <- c(0.01, 0.05, 0.005) out <- data.frame(node = 1:19, pvalue = pv) ## Search nodes final <- getLevel(tree = tinyTree, score_data = out, drop = pvalue > 0.05, score_column = "pvalue", node_column = "node", get_max = FALSE, parent_first = TRUE, message = FALSE) ## Nodes to keep final$node[final$keep]
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node), color = "darkblue", hjust = -0.5, vjust = 0.7) + geom_text2(aes(label = label), color = "darkorange", hjust = -0.1, vjust = -0.7) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) + geom_hilight(node = 16, fill = "orange", alpha = 0.3) ## Generate score for each node pv <- rep(0.1, 19) pv[c(16, 13, 17)] <- c(0.01, 0.05, 0.005) out <- data.frame(node = 1:19, pvalue = pv) ## Search nodes final <- getLevel(tree = tinyTree, score_data = out, drop = pvalue > 0.05, score_column = "pvalue", node_column = "node", get_max = FALSE, parent_first = TRUE, message = FALSE) ## Nodes to keep final$node[final$keep]
Extract information about candidates.
infoCand(object)
infoCand(object)
object |
An output object from evalCand. |
A data.frame
with information about candidates.
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) ## Simulate some data data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) + geom_hilight(node = 18, fill = "orange", alpha = 0.3) set.seed(1) pv <- runif(19, 0, 1) pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001) fc <- sample(c(-1, 1), 19, replace = TRUE) fc[c(seq_len(3), 13, 14)] <- 1 fc[c(4, 5, 18)] <- -1 df <- data.frame(node = seq_len(19), pvalue = pv, logFoldChange = fc) ## Get candidates ll <- getCand(tree = tinyTree, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange") ## Evaluate candidates cc <- evalCand(tree = tinyTree, levels = ll$candidate_list, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange", limit_rej = 0.05) ## Get summary info about candidates out <- infoCand(object = cc) out
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) ## Simulate some data data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) + geom_hilight(node = 18, fill = "orange", alpha = 0.3) set.seed(1) pv <- runif(19, 0, 1) pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001) fc <- sample(c(-1, 1), 19, replace = TRUE) fc[c(seq_len(3), 13, 14)] <- 1 fc[c(4, 5, 18)] <- -1 df <- data.frame(node = seq_len(19), pvalue = pv, logFoldChange = fc) ## Get candidates ll <- getCand(tree = tinyTree, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange") ## Evaluate candidates cc <- evalCand(tree = tinyTree, levels = ll$candidate_list, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange", limit_rej = 0.05) ## Get summary info about candidates out <- infoCand(object = cc) out
Perform an elementwise check of whether two vectors of nodes are "connected" in specific ways in a tree. A pair of nodes are considered to be connected if they are part of the same path from a leaf to the root of the tree. They are considered directly connected if one node is the parent of the other, and indirectly connected otherwise.
isConnect(tree, node_a, node_b, connect = "any")
isConnect(tree, node_a, node_b, connect = "any")
tree |
A |
node_a , node_b
|
The two vectors of nodes (either node numbers or node labels) to check for connections. The vectors should have the same length (if not, the shorter one will be recycled), as the check for connectivity is done elementwise. |
connect |
One of "any", "direct", "indirect", the type of connections to search for. |
A logical vector of the same length as node_a
and
node_b
, where each element indicates whether the corresponding
elements of node_a
and node_b
are connected in the
specified way.
Ruizhu Huang, Charlotte Soneson
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) node_a <- c(4, 18, 19, 2) node_b <- c(4, 5, 6, 3) isConnect(tree = tinyTree, node_a = node_a, node_b = node_b, connect = "any")
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) node_a <- c(4, 18, 19, 2) node_b <- c(4, 5, 6, 3) isConnect(tree = tinyTree, node_a = node_a, node_b = node_b, connect = "any")
Calculate median value of each marker in each cluster.
medianByClusterMarker( SE, assay = 1, marker_in_column = TRUE, column_cluster = "cluster_id", use_marker = NULL )
medianByClusterMarker( SE, assay = 1, marker_in_column = TRUE, column_cluster = "cluster_id", use_marker = NULL )
SE |
A |
assay |
A numeric index or assay name indicating with assay of |
marker_in_column |
A logical scalar, indicating whether markers (genes,
features) are in the columns of |
column_cluster |
The name of the column of |
use_marker |
A logical or numeric vector such that
|
A SummarizedExperiment
object
containing the median value of each marker in each cluster.
Ruizhu Huang, Charlotte Soneson
suppressPackageStartupMessages({ library(SummarizedExperiment) }) ## Simulate data with 100 cells and 10 markers (5 type, 5 state markers) set.seed(1) count <- matrix(rpois(n = 1000, lambda = 10), nrow = 100) colnames(count) <- paste0("mk", 1:10) rowD <- data.frame("cluster" = sample(seq_len(6), 100, replace = TRUE)) colD <- data.frame(type_marker = rep(c(FALSE, TRUE), each = 5)) ## SE with markers in columns d_se <- SummarizedExperiment(assays = list(counts = count), rowData = rowD, colData = colD) medianByClusterMarker(SE = d_se, marker_in_column = TRUE, column_cluster = "cluster", use_marker = colData(d_se)$type_marker) ## SE with markers in rows d_se <- SummarizedExperiment(assays = list(counts = t(count)), rowData = colD, colData = rowD) medianByClusterMarker(SE = d_se, marker_in_column = FALSE, column_cluster = "cluster", use_marker = rowData(d_se)$type_marker)
suppressPackageStartupMessages({ library(SummarizedExperiment) }) ## Simulate data with 100 cells and 10 markers (5 type, 5 state markers) set.seed(1) count <- matrix(rpois(n = 1000, lambda = 10), nrow = 100) colnames(count) <- paste0("mk", 1:10) rowD <- data.frame("cluster" = sample(seq_len(6), 100, replace = TRUE)) colD <- data.frame(type_marker = rep(c(FALSE, TRUE), each = 5)) ## SE with markers in columns d_se <- SummarizedExperiment(assays = list(counts = count), rowData = rowD, colData = colD) medianByClusterMarker(SE = d_se, marker_in_column = TRUE, column_cluster = "cluster", use_marker = colData(d_se)$type_marker) ## SE with markers in rows d_se <- SummarizedExperiment(assays = list(counts = t(count)), rowData = colD, colData = rowD) medianByClusterMarker(SE = d_se, marker_in_column = FALSE, column_cluster = "cluster", use_marker = rowData(d_se)$type_marker)
Extract a table with the top-ranked nodes from a DA/DS analysis output
(generated by runDA
or runDS
).
nodeResult( object, n = 10, type = c("DA", "DS"), adjust_method = "BH", sort_by = "PValue", p_value = 1 )
nodeResult( object, n = 10, type = c("DA", "DS"), adjust_method = "BH", sort_by = "PValue", p_value = 1 )
object |
|
n |
An integer indicating the maximum number of entities to return. |
type |
Either "DA" (for object from |
adjust_method |
A character string specifying the method used to adjust
p-values for multiple testing. See |
sort_by |
A character string specifying the sorting method. This will
be passed to |
p_value |
A numeric cutoff value for adjusted p-values. This will
be passed to |
A data frame with results for all nodes passing the imposed
thresholds. The columns logFC, logCPM,
PValue, FDR, F (or LR) are from (the
output table of) topTags
. The node column
stores the node number for each entity. Note: FDR is corrected
over all features and nodes when the specified type = "DS"
.
Ruizhu Huang, Charlotte Soneson
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) lse <- readRDS(system.file("extdata", "da_sim_100_30_18de.rds", package = "treeclimbR")) tse <- aggTSE(lse, rowLevel = showNode(tree = rowTree(lse), only.leaf = FALSE)) dd <- model.matrix( ~ group, data = colData(tse)) out <- runDA(TSE = tse, feature_on_row = TRUE, assay = "counts", option = "glmQL", design = dd, contrast = NULL, normalize = TRUE) ## Top 10 nodes with DA nodeResult(out, n = 10)
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) lse <- readRDS(system.file("extdata", "da_sim_100_30_18de.rds", package = "treeclimbR")) tse <- aggTSE(lse, rowLevel = showNode(tree = rowTree(lse), only.leaf = FALSE)) dd <- model.matrix( ~ group, data = colData(tse)) out <- runDA(TSE = tse, feature_on_row = TRUE, assay = "counts", option = "glmQL", design = dd, contrast = NULL, normalize = TRUE) ## Top 10 nodes with DA nodeResult(out, n = 10)
parEstimate
is a wrapper of the function
dirmult
with default settings for init
,
initscalar
, epsilon
, trace
and mode
. It allows
the input obj
to be either a matrix
or a
TreeSummarizedExperiment
and outputs the estimated values of
pi
and theta
.
parEstimate(obj, assay = NULL)
parEstimate(obj, assay = NULL)
obj |
A matrix or |
assay |
If |
A list including the estimates of pi
(a vector with one
element per row in obj
) and theta
(a scalar).
Ruizhu Huang, Charlotte Soneson
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) set.seed(1L) y <- matrix(rnbinom(200, size = 1, mu = 10), nrow = 10) colnames(y) <- paste("S", seq_len(20), sep = "") rownames(y) <- tinyTree$tip.label toy_tse <- TreeSummarizedExperiment(rowTree = tinyTree, assays = list(y)) res <- parEstimate(obj = toy_tse) metadata(res)$assays.par
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) set.seed(1L) y <- matrix(rnbinom(200, size = 1, mu = 10), nrow = 10) colnames(y) <- paste("S", seq_len(20), sep = "") rownames(y) <- tinyTree$tip.label toy_tse <- TreeSummarizedExperiment(rowTree = tinyTree, assays = list(y)) res <- parEstimate(obj = toy_tse) metadata(res)$assays.par
Test for differential abundance of entities using functions from
the edgeR
package. This adapts edgerWrp
to
accept input as a
TreeSummarizedExperiment
(TSE) object instead of a matrix
. Features could be
represented in either rows or columns. By default, features are in the rows.
Then, samples are in columns and the sample information is in colData
.
The tree that stores the hierarchical information about features is in
rowTree
. Each row of the assays
can be mapped to a node of
the tree. Data on rows that are mapped to internal nodes is generated from
data on leaf nodes. Normalization for samples is automatically performed by
edgeR
and the library size is calculated using features that
are mapped to leaf nodes.
runDA( TSE, feature_on_row = TRUE, assay = NULL, option = c("glm", "glmQL"), design = NULL, contrast = NULL, filter_min_count = 10, filter_min_total_count = 15, filter_large_n = 10, filter_min_prop = 0.7, normalize = TRUE, normalize_method = "TMM", group_column = "group", design_terms = "group", ... )
runDA( TSE, feature_on_row = TRUE, assay = NULL, option = c("glm", "glmQL"), design = NULL, contrast = NULL, filter_min_count = 10, filter_min_total_count = 15, filter_large_n = 10, filter_min_prop = 0.7, normalize = TRUE, normalize_method = "TMM", group_column = "group", design_terms = "group", ... )
TSE |
A |
feature_on_row |
A logical scalar. If |
assay |
A numeric index or assay name to specify which assay from
|
option |
Either |
design |
A numeric design matrix. If |
contrast |
A numeric vector specifying one contrast of
the linear model coefficients to be tested equal to zero. Its length
must equal to the number of columns of design. If |
filter_min_count |
A numeric value, passed to min.count of
|
filter_min_total_count |
A numeric value, passed to
min.total.count of |
filter_large_n |
A numeric value, passed to large.n of
|
filter_min_prop |
A numeric value, passed to min.prop of
|
normalize |
A logical scalar indicating whether to estimate
normalization factors (using |
normalize_method |
Normalization method to be used. See
|
group_column |
The name of the column in the sample annotation providing group labels for samples (currently not used). |
design_terms |
The names of columns from the sample annotation that will be used to generate the design matrix. This is ignored if design is provided. |
... |
More arguments to pass to |
The experimental design is specified by a design matrix and provided
via the argument design
. More details about the calculation of
normalization factor could be found from
calcNormFactors
.
A list with entries edgeR_results, tree, and nodes_drop.
The output of glmQLFTest
or
glmLRT
depending on the specified
option
.
The hierarchical structure of entities that was stored in the
input TSE
.
A vector storing the alias node labels of entities that are filtered before analysis due to low counts.
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Load example data set lse <- readRDS(system.file("extdata", "da_sim_100_30_18de.rds", package = "treeclimbR")) ## Aggregate counts on internal nodes nodes <- showNode(tree = tinyTree, only.leaf = FALSE) tse <- aggTSE(x = lse, rowLevel = nodes) dd <- model.matrix(~ group, data = colData(tse)) out <- runDA(TSE = tse, feature_on_row = TRUE, assay = 1, option = "glmQL", design = dd, contrast = NULL, normalize = TRUE, filter_min_count = 2) names(out) out$nodes_drop edgeR::topTags(out$edgeR_results, sort.by = "PValue")
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Load example data set lse <- readRDS(system.file("extdata", "da_sim_100_30_18de.rds", package = "treeclimbR")) ## Aggregate counts on internal nodes nodes <- showNode(tree = tinyTree, only.leaf = FALSE) tse <- aggTSE(x = lse, rowLevel = nodes) dd <- model.matrix(~ group, data = colData(tse)) out <- runDA(TSE = tse, feature_on_row = TRUE, assay = 1, option = "glmQL", design = dd, contrast = NULL, normalize = TRUE, filter_min_count = 2) names(out) out$nodes_drop edgeR::topTags(out$edgeR_results, sort.by = "PValue")
Test for differential state of entities using functions from the
edgeR
package. This adapts edgerWrp
to accept
input as a
SummarizedExperiment
(SE) object
instead of matrix
. Each assay
should correspond to data
for one node of the tree. Samples are in columns and
features are in rows. The sample information is
in colData
. The tree that stores the hierarchical relation between
the assays
is provided via the argument tree
.
runDS( SE, tree, option = c("glm", "glmQL"), design = NULL, contrast = NULL, filter_min_count = 1, filter_min_total_count = 15, filter_large_n = 10, filter_min_prop = 1, min_cells = 10, normalize = TRUE, normalize_method = "TMM", group_column = "group_id", design_terms = "group_id", message = TRUE, ... )
runDS( SE, tree, option = c("glm", "glmQL"), design = NULL, contrast = NULL, filter_min_count = 1, filter_min_total_count = 15, filter_large_n = 10, filter_min_prop = 1, min_cells = 10, normalize = TRUE, normalize_method = "TMM", group_column = "group_id", design_terms = "group_id", message = TRUE, ... )
SE |
A |
tree |
A |
option |
Either |
design |
A numeric design matrix. If |
contrast |
A numeric vector specifying one contrast of
the linear model coefficients to be tested equal to zero. Its length
must equal to the number of columns of design. If |
filter_min_count |
A numeric value, passed to min.count of
|
filter_min_total_count |
A numeric value, passed to
min.total.count of |
filter_large_n |
A numeric value, passed to large.n of
|
filter_min_prop |
A numeric value, passed to min.prop of
|
min_cells |
A numeric scalar specifying the minimal number of cells
in a node required to include a node in the analysis. The information
about the number of cells per node and sample should be available in
|
normalize |
A logical scalar indicating whether to estimate
normalization factors (using |
normalize_method |
Normalization method to be used. See
|
group_column |
The name of the column in the sample annotation providing group labels for samples. This annotation is used for filtering. |
design_terms |
The names of columns from the sample annotation that will be used to generate the design matrix. This is ignored if design is provided. |
message |
A logical scalar, indicating whether progress messages should be printed. |
... |
More arguments to pass to |
A list with entries edgeR_results, tree, and nodes_drop.
A list. Each of the elements contains the output of
glmQLFTest
or
glmLRT
for one node, depending on the specified
option
.
The hierarchical structure of entities that was stored in the
input SE
.
A vector storing the alias node labels of entities that are filtered before analysis due to low counts.
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Load example data ds_tse <- readRDS(system.file("extdata", "ds_sim_20_500_8de.rds", package = "treeclimbR")) ds_se <- aggDS(TSE = ds_tse, assay = "counts", sample_id = "sample_id", group_id = "group", cluster_id = "cluster_id", FUN = sum) ## Information about the number of cells is provided in the metadata S4Vectors::metadata(ds_se)$n_cells ds_res <- runDS(SE = ds_se, tree = colTree(ds_tse), option = "glmQL", group_column = "group", contrast = c(0, 1), filter_min_count = 0, filter_min_total_count = 1, design = model.matrix(~ group, data = colData(ds_se)), filter_min_prop = 0, min_cells = 5, message = FALSE) ## Top differential features (across nodes) nodeResult(ds_res, type = "DS")
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Load example data ds_tse <- readRDS(system.file("extdata", "ds_sim_20_500_8de.rds", package = "treeclimbR")) ds_se <- aggDS(TSE = ds_tse, assay = "counts", sample_id = "sample_id", group_id = "group", cluster_id = "cluster_id", FUN = sum) ## Information about the number of cells is provided in the metadata S4Vectors::metadata(ds_se)$n_cells ds_res <- runDS(SE = ds_se, tree = colTree(ds_tse), option = "glmQL", group_column = "group", contrast = c(0, 1), filter_min_count = 0, filter_min_total_count = 1, design = model.matrix(~ group, data = colData(ds_se)), filter_min_prop = 0, min_cells = 5, message = FALSE) ## Top differential features (across nodes) nodeResult(ds_res, type = "DS")
Select branches in a tree meeting the specified criteria in terms of number of leaves and the count proportion. Note that only internal branch nodes are considered - no individual leaves will be returned.
selNode( pr = NULL, obj = NULL, assay = 1, data = NULL, tree = NULL, minTip = 0, maxTip = Inf, minPr = 0, maxPr = 1, skip = NULL, all = FALSE )
selNode( pr = NULL, obj = NULL, assay = 1, data = NULL, tree = NULL, minTip = 0, maxTip = Inf, minPr = 0, maxPr = 1, skip = NULL, all = FALSE )
pr |
A named numeric vector to provide proportions of entities. If
this is provided, |
obj |
A |
assay |
The index or name of the assay of |
data |
Either a count table with entities in rows and samples in
columns, or a list with |
tree |
A |
minTip |
the minimum number of leaves in the selected branch. |
maxTip |
The maximum number of leaves in the selected branch. |
minPr |
The minimum count proportion of the selected branch in a sample. A value between 0 and 1. |
maxPr |
The maximum count proportion of the selected branch in a sample. A value between 0 and 1. |
skip |
A character vector of node labels. These nodes can not be descendants or the ancestors of the selected branch. |
all |
A logical scalar. If |
A data.frame
with node information for the selected
internal node(s).
Ruizhu Huang, Charlotte Soneson
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Generate example data set.seed(1) data(tinyTree) toyTable <- matrix(rnbinom(40, size = 1, mu = 10), nrow = 10) colnames(toyTable) <- paste(rep(LETTERS[seq_len(2)], each = 2), rep(seq_len(2), 2), sep = "_") rownames(toyTable) <- tinyTree$tip.label ## Estimate entity proportions from count matrix under a Dirichlet ## Multinomial framework, and use this as the input for selNode dat <- parEstimate(obj = toyTable) selNode(tree = tinyTree, data = dat, all = TRUE) selNode(tree = tinyTree, data = dat, minTip = 4, maxTip = 9, minPr = 0, maxPr = 0.8, all = TRUE) ## Alternatively, directly provide the proportions vector selNode(tree = tinyTree, pr = dat$pi, all = TRUE) ## Return only branch with lowest proportion among valid ones selNode(tree = tinyTree, pr = dat$pi, all = FALSE) ## Start instead from a TreeSummarizedExperiment object lse <- TreeSummarizedExperiment(rowTree = tinyTree, assays = list(counts = toyTable)) selNode(obj = lse, assay = "counts", all = TRUE) ## Don't allow node 1 to be included selNode(obj = lse, assay = "counts", skip = 1, all = TRUE)
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Generate example data set.seed(1) data(tinyTree) toyTable <- matrix(rnbinom(40, size = 1, mu = 10), nrow = 10) colnames(toyTable) <- paste(rep(LETTERS[seq_len(2)], each = 2), rep(seq_len(2), 2), sep = "_") rownames(toyTable) <- tinyTree$tip.label ## Estimate entity proportions from count matrix under a Dirichlet ## Multinomial framework, and use this as the input for selNode dat <- parEstimate(obj = toyTable) selNode(tree = tinyTree, data = dat, all = TRUE) selNode(tree = tinyTree, data = dat, minTip = 4, maxTip = 9, minPr = 0, maxPr = 0.8, all = TRUE) ## Alternatively, directly provide the proportions vector selNode(tree = tinyTree, pr = dat$pi, all = TRUE) ## Return only branch with lowest proportion among valid ones selNode(tree = tinyTree, pr = dat$pi, all = FALSE) ## Start instead from a TreeSummarizedExperiment object lse <- TreeSummarizedExperiment(rowTree = tinyTree, assays = list(counts = toyTable)) selNode(obj = lse, assay = "counts", all = TRUE) ## Don't allow node 1 to be included selNode(obj = lse, assay = "counts", skip = 1, all = TRUE)
Simulate a data set with different abundance patterns for entities under different conditions. These entities have their corresponding nodes on a tree.
simData( tree = NULL, data = NULL, obj = NULL, assay = NULL, scenario = "BS", from.A = NULL, from.B = NULL, minTip.A = 0, maxTip.A = Inf, minTip.B = 0, maxTip.B = Inf, minPr.A = 0, maxPr.A = 1, ratio = 4, adjB = NULL, pct = 0.6, nSam = c(50, 50), mu = 10000, size = NULL, n = 1, FUN = sum, message = FALSE )
simData( tree = NULL, data = NULL, obj = NULL, assay = NULL, scenario = "BS", from.A = NULL, from.B = NULL, minTip.A = 0, maxTip.A = Inf, minTip.B = 0, maxTip.B = Inf, minPr.A = 0, maxPr.A = 1, ratio = 4, adjB = NULL, pct = 0.6, nSam = c(50, 50), mu = 10000, size = NULL, n = 1, FUN = sum, message = FALSE )
tree |
A |
data |
A count matrix with entities corresponding to tree leaves
in the rows and samples in the columns. Only used when |
obj |
A |
assay |
If |
scenario |
The simulation scenario, either “BS”, “US”, or “SS” (see Details). |
from.A , from.B
|
The branch node labels of branches A and B for which the
signal will be swapped. By default, both are |
minTip.A |
The minimum number of leaves allowed in branch A. |
maxTip.A |
The maximum number of leaves allowed in branch A. |
minTip.B |
The minimum number of leaves allowed in branch B. |
maxTip.B |
The maximum number of leaves allowed in branch B. |
minPr.A |
A numeric value in [0, 1]. The minimum abundance proportion of leaves in branch A. |
maxPr.A |
A numeric value in [0, 1]. The maximum abundance proportion of leaves in branch A. |
ratio |
A numeric value. The proportion ratio of branch B to branch A.
This value is used to select branches(see Details). If there are
no branches having exactly this ratio, the pair with the value closest to
|
adjB |
A numeric value in [0, 1] (only for |
pct |
The percentage of leaves in branch B that have differential abundance under different conditions (only for scenario “SS”). |
nSam |
A numeric vector of length 2, indicating the sample size for each of the two simulated conditions. |
mu , size
|
The parameters of the Negative Binomial distribution. (see mu
and size in |
n |
A numeric value to specify how many count tables would be generated with the same settings. The default is 1, i.e., one count table would be obtained at the end. If greater than 1, the output is a list of matrices. |
FUN |
A function to calculate the aggregated count at each internal
node based on its descendant leaves (e.g., |
message |
A logical scalar, indicating whether progress messages should be printed to the console. |
Simulate a count table for entities which are corresponding to the
nodes of a tree. The entities are in rows and the samples from different
groups or conditions are in columns. The library size of each sample is
sampled from a Negative Binomial distribution with mean and size
specified by the arguments mu
and size
. The counts of
entities, that are mapped to the leaf nodes, in a sample are assumed
to follow a Dirichlet-Multinomial distribution. The parameters for
the Dirichlet-Multinomial distribution are estimated from a real data set
specified by data
via the function dirmult
(see
dirmult
). To generate different abundance patterns
under different conditions, we provide three different scenarios,
“BS”, “US”, and “SS” (specified via
scenario
).
BS: two branches are selected to swap their proportions, and leaves on the same branch have the same fold change.
US: two branches are selected to swap their proportions. Leaves in the same branch have different fold changes but same direction (either increase or decrease).
SS: two branches are selected. One branch has its proportion swapped with the proportion of some leaves from the other branch.
a TreeSummarizedExperiment object.
assays A list of count matrices, with entities in rows and samples in columns. Each row can be mapped to a node of the tree.
rowData Annotation data for the rows.
colData Annotation data for the columns.
rowTree The tree structure of entities.
rowLinks The link between rows and nodes on the tree.
metadata More details about the simulation.
FC the fold change of entities corresponding to the tree leaves.
Branch the information about two selected branches.
A The branch node label (or number) of branch A.
B The branch node label (or number) of branch B.
ratio The count proportion ratio of branch B to branch A.
A_tips The number of leaves on branch A.
B_tips The number of leaves on branch B.
A_prop The count proportion of branch A.
B_prop The count proportion of branch B.
Ruizhu Huang, Charlotte Soneson
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Generate data to use as the starting point (this would usually be a ## real data set) set.seed(1L) y <- matrix(rnbinom(120, size = 1, mu = 10), nrow = 10) colnames(y) <- paste("S", seq_len(12), sep = "") rownames(y) <- tinyTree$tip.label toy_lse <- TreeSummarizedExperiment(rowTree = tinyTree, assays = list(counts = y)) simData(obj = toy_lse, ratio = 2, scenario = "BS", pct = 0.5)
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) }) ## Generate data to use as the starting point (this would usually be a ## real data set) set.seed(1L) y <- matrix(rnbinom(120, size = 1, mu = 10), nrow = 10) colnames(y) <- paste("S", seq_len(12), sep = "") rownames(y) <- tinyTree$tip.label toy_lse <- TreeSummarizedExperiment(rowTree = tinyTree, assays = list(counts = y)) simData(obj = toy_lse, ratio = 2, scenario = "BS", pct = 0.5)
Generate a table of top-ranked nodes from the optimal resolution candidate of entities on a tree.
topNodes( object, n = 10, sort_by = NULL, sort_decreasing = FALSE, sort_by_absolute = FALSE, p_value = 1 )
topNodes( object, n = 10, sort_by = NULL, sort_decreasing = FALSE, sort_by_absolute = FALSE, p_value = 1 )
object |
An output object from evalCand. |
n |
An integer, the maximum number of entities to return. |
sort_by |
A character string specifying the column of
|
sort_decreasing |
A logical value indicating whether to sort by
decreasing value of the |
sort_by_absolute |
A logical value indicating whether to take the
absolute value of the |
p_value |
A numeric cutoff value for adjusted p-values. Only entities with adjusted p-values equal or lower than specified are returned. |
A data.frame
with test results. The node
column stores the node number for each entity.
Ruizhu Huang, Charlotte Soneson
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) + geom_hilight(node = 18, fill = "orange", alpha = 0.3) set.seed(1) pv <- runif(19, 0, 1) pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001) fc <- sample(c(-1, 1), 19, replace = TRUE) fc[c(seq_len(3), 13, 14)] <- 1 fc[c(4, 5, 18)] <- -1 df <- data.frame(node = seq_len(19), pvalue = pv, logFoldChange = fc) ll <- getCand(tree = tinyTree, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange") cc <- evalCand(tree = tinyTree, levels = ll$candidate_list, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange", limit_rej = 0.05) ## Unsorted result table topNodes(cc) ## Sort by p-value in increasing order topNodes(cc, sort_by = "pvalue")
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) }) data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) + geom_hilight(node = 18, fill = "orange", alpha = 0.3) set.seed(1) pv <- runif(19, 0, 1) pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001) fc <- sample(c(-1, 1), 19, replace = TRUE) fc[c(seq_len(3), 13, 14)] <- 1 fc[c(4, 5, 18)] <- -1 df <- data.frame(node = seq_len(19), pvalue = pv, logFoldChange = fc) ll <- getCand(tree = tinyTree, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange") cc <- evalCand(tree = tinyTree, levels = ll$candidate_list, score_data = df, node_column = "node", p_column = "pvalue", sign_column = "logFoldChange", limit_rej = 0.05) ## Unsorted result table topNodes(cc) ## Sort by p-value in increasing order topNodes(cc, sort_by = "pvalue")
Calculate the true positive rate on a tree structure, at either leaf or node level.
tpr(tree, truth, found, only.leaf = TRUE)
tpr(tree, truth, found, only.leaf = TRUE)
tree |
A |
truth |
True signal nodes (e.g., nodes that are truly differentially
abundant between experimental conditions). Note: when the
TPR is requested at the leaf level ( |
found |
Detected signal nodes (e.g., nodes that have been found to be
differentially abundant via a statistical testing procedure).
Note: when the TPR is requested at the leaf level
( |
only.leaf |
A logical scalar. If |
The estimated true positive rate.
Ruizhu Huang
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data("tinyTree") ## Two branches are truly differential ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 16, fill = "orange", alpha = 0.3) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) ## TPR at the leaf level if nodes 14 and 15 are called differential (7/8) tpr(tree = tinyTree, truth = c(16, 13), found = c(15, 14), only.leaf = TRUE) ## TPR at the node level if nodes 14 and 15 are called differential (12/14) tpr(tree = tinyTree, truth = c(16, 13), found = c(15, 14), only.leaf = FALSE)
suppressPackageStartupMessages({ library(ggtree) library(TreeSummarizedExperiment) }) data("tinyTree") ## Two branches are truly differential ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) + geom_hilight(node = 16, fill = "orange", alpha = 0.3) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) ## TPR at the leaf level if nodes 14 and 15 are called differential (7/8) tpr(tree = tinyTree, truth = c(16, 13), found = c(15, 14), only.leaf = TRUE) ## TPR at the node level if nodes 14 and 15 are called differential (12/14) tpr(tree = tinyTree, truth = c(16, 13), found = c(15, 14), only.leaf = FALSE)
Generate a heatmap corresponding to an arbitrary aggregation level of a tree.
TreeHeatmap( tree, tree_fig, hm_data, tree_hm_gap = 0, rel_width = 1, cell_line_color = NA, cell_line_size = 0, column_order = NULL, column_split = NULL, column_split_gap = 0.2, column_split_label = NULL, split_label_fontface = "bold", split_label_color = "black", split_label_size = 3, split_label_angle = 0, split_label_offset_x = 0, split_label_offset_y = 2, split_label_hjust = 0.5, split_label_vjust = 0, column_anno = NULL, column_anno_size = 1, column_anno_color = NULL, column_anno_gap = 0.1, legend_title_hm = "Expression", legend_title_column_anno = "group", show_colnames = FALSE, colnames_position = "top", colnames_angle = 0, colnames_offset_x = 0, colnames_offset_y = 0, colnames_size = 4, colnames_hjust = 0.5, show_rownames = FALSE, rownames_position = "right", rownames_angle = 0, rownames_offset_x = 0, rownames_offset_y = 0, rownames_size = 4, rownames_hjust = 0.5, rownames_label = NULL, show_title = FALSE, title_hm = "First heatmap", title_fontface = "bold", title_color = "black", title_size = 3, title_angle = 0, title_offset_x = 0, title_offset_y = 2, title_hjust = 0.5, cluster_column = FALSE, dist_method = "euclidean", hclust_method = "ave", show_row_tree = TRUE )
TreeHeatmap( tree, tree_fig, hm_data, tree_hm_gap = 0, rel_width = 1, cell_line_color = NA, cell_line_size = 0, column_order = NULL, column_split = NULL, column_split_gap = 0.2, column_split_label = NULL, split_label_fontface = "bold", split_label_color = "black", split_label_size = 3, split_label_angle = 0, split_label_offset_x = 0, split_label_offset_y = 2, split_label_hjust = 0.5, split_label_vjust = 0, column_anno = NULL, column_anno_size = 1, column_anno_color = NULL, column_anno_gap = 0.1, legend_title_hm = "Expression", legend_title_column_anno = "group", show_colnames = FALSE, colnames_position = "top", colnames_angle = 0, colnames_offset_x = 0, colnames_offset_y = 0, colnames_size = 4, colnames_hjust = 0.5, show_rownames = FALSE, rownames_position = "right", rownames_angle = 0, rownames_offset_x = 0, rownames_offset_y = 0, rownames_size = 4, rownames_hjust = 0.5, rownames_label = NULL, show_title = FALSE, title_hm = "First heatmap", title_fontface = "bold", title_color = "black", title_size = 3, title_angle = 0, title_offset_x = 0, title_offset_y = 2, title_hjust = 0.5, cluster_column = FALSE, dist_method = "euclidean", hclust_method = "ave", show_row_tree = TRUE )
tree |
A |
tree_fig |
A |
hm_data |
A |
tree_hm_gap |
A numeric scalar specifying the gap between the tree and the heatmap. |
rel_width |
A numeric scalar specifying the width of heatmap relative to
the width of the tree. For example, if |
cell_line_color |
A color for the lines separating cells in the heatmap. |
cell_line_size |
A numeric scalar specifying the line width for lines separating cells in the heatmap. |
column_order |
A character vector specifying the display order of the
columns in the heatmap. Should correspond to the column names of
|
column_split |
A named character vector that provides the grouping
information used to split the columns in the heatmap. The names should
correspond to the column names of |
column_split_gap |
A numeric scalar specifying the gap between the groups of split columns in the heatmap. |
column_split_label |
A named character vector to label the column split.
The names should correspond to the values in |
split_label_fontface |
The fontface of the labels of the column split. |
split_label_color |
The color of the the labels of the column split. |
split_label_size |
The size of the the labels of the column split. |
split_label_angle |
The angle of the the labels of the column split. |
split_label_offset_x |
A numeric value to shift the labels of the column split along the x-axis. |
split_label_offset_y |
A numeric value to shift the labels of the column split along the y-axis. |
split_label_hjust |
The horizontal justification for the labels of the column split: e.g. 0 (left aligned); 0.5 (centered); 1 (right aligned). |
split_label_vjust |
Similar to |
column_anno |
A named vector to specify labels that are used to annotate the columns of heatmap. |
column_anno_size |
A numeric value to specify the size of the annotation bar. |
column_anno_color |
A named vector to specify colors that are used to annotate the columns of the heatmap. |
column_anno_gap |
A numeric value to specify the gap between the column annotation bar and the heatmap. |
legend_title_hm |
The legend title of the heatmap. |
legend_title_column_anno |
The legend title of the column annotation. |
show_colnames |
A logical value to specify whether column names should be displayed. |
colnames_position |
The position of column names, either "top" or "bottom". |
colnames_angle |
A numeric scalar specifying the angle of column names. |
colnames_offset_x |
A numeric value to shift column names on the x-axis. |
colnames_offset_y |
A numeric value to shift column names on the y-axis. |
colnames_size |
A numeric value to specify the size of column names. |
colnames_hjust |
The horizontal justification for column names: e.g. 0 (left aligned); 0.5 (centered); 1 (right aligned). |
show_rownames |
A logical value to specify whether row names should be displayed. |
rownames_position |
The position of the row names, either "right" or "left". |
rownames_angle |
A numeric value specifying the angle of row names. |
rownames_offset_x |
A numeric value to shift row names on the x-axis. |
rownames_offset_y |
A numeric value to shift row names on the y-axis. |
rownames_size |
A numeric value to specify the size of row names. |
rownames_hjust |
The horizontal justification for row names: e.g. 0 (left aligned); 0.5 (centered); 1 (right aligned). |
rownames_label |
A named vector to annotate the rows of the heatmap instead of the row names of hm_data. |
show_title |
A logical value to specify whether the title should be displayed. |
title_hm |
The title of the heatmap. |
title_fontface |
The fontface of the title. |
title_color |
The color of the title. |
title_size |
The size of the title. |
title_angle |
The angle of the title. |
title_offset_x |
A numeric value to shift the title along the x-axis. |
title_offset_y |
A numeric value to shift the title along the y-axis. |
title_hjust |
The horizontal justification for the title: e.g. 0 (left aligned); 0.5 (centered); 1 (right aligned). |
cluster_column |
A logical scalar, specifying whether columns of the heatmap should be clustered by similarity. This is ignored when column_order is given. |
dist_method |
See method in |
hclust_method |
See method in |
show_row_tree |
A logical scalar (default |
A ggtree
object.
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) library(ggplot2) library(scales) }) ## Load example data (tiny tree with corresponding count matrix) tse <- readRDS(system.file("extdata", "tinytree_counts.rds", package = "treeclimbR")) ## Prepare the tree figure tree_fig <- ggtree(rowTree(tse), branch.length = "none", layout = "rectangular") + geom_hilight(node = 18, fill = "orange", alpha = 0.3) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) tree_fig ## Simple heatmap with tree TreeHeatmap(tree = rowTree(tse), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tse, "counts")) ## Aggregate counts for each of the highlighted subtrees tseagg <- aggTSE( tse, rowLevel = c(13, 18, setdiff(showNode(tinyTree, only.leaf = TRUE), unlist(findDescendant(tinyTree, node = c(13, 18), only.leaf = TRUE))))) ## Visualize aggregated heatmap with tree TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts")) ## Cluster columns TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE) ## Split columns col_split <- ifelse(colnames(tseagg) %in% paste0("S", seq_len(5)), "A", "B") names(col_split) <- colnames(tseagg) TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split) ## Annotate columns col_anno <- col_split TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1) ## Change annotation colors TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1, column_anno_color = c(A = "red", B = "blue")) ## Add column names TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1, column_anno_color = c(A = "red", B = "blue"), show_colnames = TRUE, colnames_position = "bottom", colnames_angle = 90, colnames_size = 2, colnames_offset_y = -0.4) ## Add title TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1, column_anno_color = c(A = "red", B = "blue"), show_colnames = TRUE, colnames_position = "bottom", colnames_angle = 90, colnames_size = 2, colnames_offset_y = -0.4, show_title = TRUE, title_offset_y = 2, title_color = "blue") ## Change colors TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1, column_anno_color = c(A = "red", B = "blue"), show_colnames = TRUE, colnames_position = "bottom", colnames_angle = 90, colnames_size = 2, colnames_offset_y = -0.4, show_title = TRUE, title_offset_y = 2, title_color = "blue") + scale_fill_gradientn( colours = c("blue", "yellow", "red"), values = scales::rescale(c(5, 8, 10)), guide = "colorbar", limits = c(5, 10))
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) library(ggplot2) library(scales) }) ## Load example data (tiny tree with corresponding count matrix) tse <- readRDS(system.file("extdata", "tinytree_counts.rds", package = "treeclimbR")) ## Prepare the tree figure tree_fig <- ggtree(rowTree(tse), branch.length = "none", layout = "rectangular") + geom_hilight(node = 18, fill = "orange", alpha = 0.3) + geom_hilight(node = 13, fill = "blue", alpha = 0.3) tree_fig ## Simple heatmap with tree TreeHeatmap(tree = rowTree(tse), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tse, "counts")) ## Aggregate counts for each of the highlighted subtrees tseagg <- aggTSE( tse, rowLevel = c(13, 18, setdiff(showNode(tinyTree, only.leaf = TRUE), unlist(findDescendant(tinyTree, node = c(13, 18), only.leaf = TRUE))))) ## Visualize aggregated heatmap with tree TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts")) ## Cluster columns TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE) ## Split columns col_split <- ifelse(colnames(tseagg) %in% paste0("S", seq_len(5)), "A", "B") names(col_split) <- colnames(tseagg) TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split) ## Annotate columns col_anno <- col_split TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1) ## Change annotation colors TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1, column_anno_color = c(A = "red", B = "blue")) ## Add column names TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1, column_anno_color = c(A = "red", B = "blue"), show_colnames = TRUE, colnames_position = "bottom", colnames_angle = 90, colnames_size = 2, colnames_offset_y = -0.4) ## Add title TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1, column_anno_color = c(A = "red", B = "blue"), show_colnames = TRUE, colnames_position = "bottom", colnames_angle = 90, colnames_size = 2, colnames_offset_y = -0.4, show_title = TRUE, title_offset_y = 2, title_color = "blue") ## Change colors TreeHeatmap(tree = rowTree(tseagg), tree_fig = tree_fig, hm_data = SummarizedExperiment::assay(tseagg, "counts"), cluster_column = TRUE, column_split = col_split, column_anno = col_anno, column_anno_gap = 1, column_anno_color = c(A = "red", B = "blue"), show_colnames = TRUE, colnames_position = "bottom", colnames_angle = 90, colnames_size = 2, colnames_offset_y = -0.4, show_title = TRUE, title_offset_y = 2, title_color = "blue") + scale_fill_gradientn( colours = c("blue", "yellow", "red"), values = scales::rescale(c(5, 8, 10)), guide = "colorbar", limits = c(5, 10))
treeScore
takes the tree structure into account when calculating the
score for an internal node. If an internal node A has two children B and C,
(A->B, A->C)
, the new score at node A would be calculated as the
weighted mean of the scores in the whole family (A, B and C). The weights are
based on the number of descendant leaves. For example, if the node B has 2
descendant leaves, and C has 3 descendant leaves, then A would have 5. The
calculation would be .
The generation starts from the leaves and the new generated scores are used
to update those in higher level of the tree until the root is reached.
treeScore(tree, score_data, node_column, score_column, new_score)
treeScore(tree, score_data, node_column, score_column, new_score)
tree |
A |
score_data |
A data frame that includes at least two columns. One column stores the number of the node, and the other stores the original score of the corresponding node. |
node_column |
The name of the column of |
score_column |
The name of the column of |
new_score |
The name of the column that stores the generated score. |
A data.frame
similar to score_data
, but with an
extra column (named new_score
) containing the weighted scores.
Ruizhu Huang
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) library(dplyr) }) ## tree data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) ## score exScore <- data.frame(nodeNum = seq_len(19), score = (seq_len(19))/10) ## Calculate new score based on the tree newScore <- treeScore(tree = tinyTree, score_data = exScore, node_column = "nodeNum", score_column = "score", new_score = "wScore") ## Visualize the result ## The original scores are in black texts and the new ones in blue df <- newScore |> rename(node = nodeNum) |> mutate(score = round(score, 3), wScore = round(wScore, 3)) ggtree(tinyTree) %<+% df + geom_text2(aes(label = score), hjust = -0.05) + geom_text2(aes(label = wScore, hjust = -1.2), color = "blue")
suppressPackageStartupMessages({ library(TreeSummarizedExperiment) library(ggtree) library(dplyr) }) ## tree data(tinyTree) ggtree(tinyTree, branch.length = "none") + geom_text2(aes(label = node)) ## score exScore <- data.frame(nodeNum = seq_len(19), score = (seq_len(19))/10) ## Calculate new score based on the tree newScore <- treeScore(tree = tinyTree, score_data = exScore, node_column = "nodeNum", score_column = "score", new_score = "wScore") ## Visualize the result ## The original scores are in black texts and the new ones in blue df <- newScore |> rename(node = nodeNum) |> mutate(score = round(score, 3), wScore = round(wScore, 3)) ggtree(tinyTree) %<+% df + geom_text2(aes(label = score), hjust = -0.05) + geom_text2(aes(label = wScore, hjust = -1.2), color = "blue")