A dendrogram is a diagram of a binary tree that represents hierarchical relations between the tree elements. A dendrogram contains nodes, branches (edges), a root, and leaves (Figure 1A). The root is the node from which all branches and other nodes descend, indicating the direction to the leaves, i.e., the terminal nodes.
Most of the space in a dendrogram layout is used to arrange branches and inner nodes, leaving limited space for the leaves. For large dendrograms, the leaf labels are often squeezed to fit into small slots. Therefore, a dendrogram may not provide the best layout when the information to be displayed should highlight the leaves.
The TreeAndLeaf package aims to improve the visualization of the dendrogram leaves by combining tree and force-directed layout algorithms, shifting the focus of analysis to the leaves (Figure 1B). The package’s workflow is summarized in Figure 1C.
Figure 1. TreeAndLeaf workflow summary. (A,B) The dendrogram in A is used to construct the graph in B. (C) The main input for the TreeAndLeaf package is a dendrogram, which the package transforms into a graph representation. The final graph layout is achieved by a two-step process, starting with an unrooted tree diagram, which is subsequently relaxed by a force-directed algorithm applied to the terminal nodes of the tree. The final tree-and-leaf layout varies depending on the initial state, which can be obtained by other tree layout algorithms (see section 3 for examples using ggtree layouts to set up the initial state).
This section provides a basic example using the R built-in USArrests dataset. The USArrests dataset is a data frame available in the user’s workspace. To learn more about this data frame, query ?USArrests in the R console. We will build a dendrogram from the USArrests dataset, then transform the dendrogram into a tree-and-leaf diagram, and visualize the result in the RedeR application.
#-- Libraries required in this section:
#-- TreeAndLeaf(>=1.4.2), RedeR(>=1.40.4), Bioconductor >= 3.13 (R >= 4.0)
# BiocManager::install(c("TreeAndLeaf","RedeR"))
# install.packages(c("igraph","RColorBrewer"))
#-- Load packages
library("TreeAndLeaf")
library("RedeR")
library("igraph")
library("RColorBrewer")
To build a dendrogram from the USArrests dataset, we need a distance matrix. We will use the default “euclidean” distance method of the dist() function, and then the “average” method of the hclust() function to create the dendrogram.
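The two steps described above can be sketched with R core functions only. The object names d and hc are chosen here for illustration; calling plot(hc) afterwards draws the resulting dendrogram.

```r
#-- Check data format
dim(USArrests)
head(USArrests)
#-- Euclidean distances between states (the default method of 'dist')
d <- dist(USArrests)
#-- Average-linkage hierarchical clustering
hc <- hclust(d, method = "average")
#-- The resulting 'hclust' object keeps the state names as labels
class(hc)
head(hc$labels)
```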
The treeAndLeaf() function transforms the hclust object into an igraph object, including some basic graph attributes to display in the RedeR application.
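The conversion itself is a single call. The sketch below rebuilds the dendrogram so that it runs on its own, and stores the result in tal, the object name used in the remainder of this section; the variable name hc is an assumption made for this example.

```r
#--- Load packages (a no-op if they are already attached)
library("TreeAndLeaf")
library("igraph")
#--- Rebuild the dendrogram so that this block runs on its own
hc <- hclust(dist(USArrests), method = "average")
#--- Convert the 'hclust' object into a 'tree-and-leaf' object
tal <- treeAndLeaf(hc)
#--- Inspect the result: an igraph object with leaves flagged by 'isLeaf'
class(tal)
sum(V(tal)$isLeaf)
```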
The att.mapv() function can be used to add external annotations to an igraph object, for example, mapping new variables to the graph vertices. We will add all USArrests variables to the tal object. To map one object to another it is essential to use the same mapping IDs, set by the refcol parameter, which points to a column in the input annotation dataset. In this example, refcol = 0 indicates that the USArrests rownames will be used as mapping IDs. To check the IDs in the igraph vertices, type V(tal)$name in the R console.
#--- Map attributes to the tree-and-leaf
#Note: 'refcol = 0' indicates that 'dat' rownames will be used as mapping IDs
tal <- att.mapv(g = tal, dat = USArrests, refcol = 0)
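Since the mapping relies on matching IDs, a quick sanity check on the annotation IDs can help when adapting this example to other datasets. The base R sketch below compares the USArrests rownames with the dendrogram labels; it assumes the leaf names of the tree-and-leaf are inherited from those labels, and the names hc and ids are chosen for this example only.

```r
#--- Rebuild the dendrogram so that this block runs on its own
hc <- hclust(dist(USArrests), method = "average")
#--- Candidate mapping IDs: the rownames of the annotation data frame
ids <- rownames(USArrests)
#--- Every ID should be found among the dendrogram labels
all(ids %in% hc$labels)
#--- IDs should be unique, otherwise the mapping is ambiguous
anyDuplicated(ids) == 0
```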
Now we use the att.setv() wrapper function to set attributes in the tree-and-leaf diagram. To see all attributes available for display in the RedeR application, type ?addGraph in the R console. The graph attributes can also be customized following igraph syntax rules.
#--- Set graph attributes using the 'att.setv' wrapper function
pal <- brewer.pal(9, "Reds")
tal <- att.setv(g = tal, from = "Murder", to = "nodeColor",
                cols = pal, nquant = 5)
tal <- att.setv(g = tal, from = "UrbanPop", to = "nodeSize",
                xlim = c(10, 50, 5), nquant = 5)
#--- Set graph attributes using 'att.addv' and 'att.adde' functions
tal <- att.addv(tal, "nodeFontSize", value = 15, index = V(tal)$isLeaf)
tal <- att.adde(tal, "edgeWidth", value = 3)
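To give an intuition for what the cols and nquant arguments achieve, the base R sketch below bins the Murder variable into five quantile intervals and assigns one color per bin. This is a conceptual illustration only, assuming that nquant maps the variable to quantiles; it is not the implementation used by att.setv(), and the two hexadecimal colors are arbitrary endpoints of a light-to-dark red ramp.

```r
#--- Conceptual illustration only (not the 'att.setv' implementation)
x <- USArrests$Murder
nquant <- 5
#--- Break points at evenly spaced quantiles
brks <- quantile(x, probs = seq(0, 1, length.out = nquant + 1))
#--- Assign each value to a quantile bin (integer codes 1..nquant)
bins <- cut(x, breaks = brks, include.lowest = TRUE, labels = FALSE)
#--- One color per bin, interpolated from light to dark red
cols <- colorRampPalette(c("#FFF5F0", "#67000D"))(nquant)
node_cols <- cols[bins]
#--- Number of states falling in each bin
table(bins)
```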
The next steps will call the RedeR application, and then display the tree-and-leaf diagram in an interactive R/Java interface. The initial layout will show an unrooted tree diagram, which will be subsequently relaxed by a force-directed algorithm applied to the terminal nodes of the tree.
#--- Call RedeR application
rdp <- RedPort()
calld(rdp)
resetd(rdp)
#--- Send the tree-and-leaf to the interactive R/Java interface
addGraph(obj = rdp, g = tal, gzoom=75)
#--- Call 'relax' to fine-tune the leaf nodes
relax(rdp, p1=25, p2=200, p3=5, p5=5, ps=TRUE)
At this point, the user can interact with the layout process to achieve the desired layout; we suggest fine-tuning the force-directed algorithm parameters, either through the R/Java interface or by calling the relax() function from the command line. Note that the unrooted tree diagram represents the initial state; the relaxing process should then run until a satisfactory graph layout is achieved. The final layout varies depending on the initial state, which can also be adjusted by providing more or less room for the spatial configuration (e.g. via the gzoom parameter).
#--- Add legends
addLegend.color(obj = rdp, tal, title = "Murder Rate",
                position = "topright")
addLegend.size(obj = rdp, tal, title = "Urban Population Size",
               position = "bottomright")
The tree diagram represents the initial state of a tree-and-leaf, which is then relaxed by a force-directed algorithm applied to the terminal nodes. Therefore, the final tree-and-leaf layout varies depending on the initial state. The TreeAndLeaf package also accepts ggtree layouts to set up the initial state. For example, next we show a tree diagram generated by the ggtree package, and then apply the tree-and-leaf transformation.
#-- Libraries required in this section:
#-- TreeAndLeaf(>=1.4.2), RedeR(>=1.40.4), Bioconductor >= 3.13 (R >= 4.0)
# BiocManager::install(c("TreeAndLeaf","RedeR","ggtree"))
# install.packages(c("igraph","ape", "dendextend", "dplyr",
#                    "ggplot2", "RColorBrewer"))
#-- Load packages
library("TreeAndLeaf")
library("RedeR")
library("igraph")
library("ape")
library("ggtree")
library("dendextend")
library("dplyr")
library("ggplot2")
library("RColorBrewer")
#--- Generate a random phylo tree
phylo_tree <- rcoal(300)
#--- Set groups and node sizes
group <- size <- dendextend::cutree(phylo_tree, 10)
group[] <- LETTERS[1:10][group]
size[] <- sample(size)
group.df <- data.frame(label=names(group), group=group, size=size)
phylo_tree <- dplyr::full_join(phylo_tree, group.df, by='label')
#--- Generate a ggtree with 'daylight' layout
pal <- brewer.pal(10, "Set3")
ggt <- ggtree(phylo_tree, layout = 'daylight', branch.length='none')
#--- Plot the ggtree
ggt + geom_tippoint(aes(color=group, size=size)) +
  scale_color_manual(values=pal) + scale_y_reverse()
#-- Convert the 'ggtree' object into a 'tree-and-leaf' object
tal <- treeAndLeaf(ggt)
#--- Map attributes to the tree-and-leaf
#Note: 'refcol = 1' indicates that 'dat' col 1 will be used as mapping IDs
tal <- att.mapv(g = tal, dat = group.df, refcol = 1)
#--- Set graph attributes using the 'att.setv' wrapper function
tal <- att.setv(g = tal, from = "group", to = "nodeColor",
                cols = pal)
tal <- att.setv(g = tal, from = "size", to = "nodeSize",
                xlim = c(10, 50, 5))
#--- Set graph attributes using 'att.addv' and 'att.adde' functions
tal <- att.addv(tal, "nodeFontSize", value = 1)
tal <- att.addv(tal, "nodeLineWidth", value = 0)
tal <- att.addv(tal, "nodeColor", value = "black", index=!V(tal)$isLeaf)
tal <- att.adde(tal, "edgeWidth", value = 3)
tal <- att.adde(tal, "edgeColor", value = "black")
#--- Call RedeR application
rdp <- RedPort()
calld(rdp)
resetd(rdp)
#--- Send the tree-and-leaf to the interactive R/Java interface
addGraph(obj = rdp, g = tal, gzoom=50)
#--- Select inner nodes, preventing them from relaxing
selectNodes(rdp, V(tal)$name[!V(tal)$isLeaf], anchor=TRUE)
#--- Call 'relax' to fine-tune the leaf nodes
relax(rdp, p1=25, p2=100, p3=5, p5=1, p8=5, ps=TRUE)
#--- Add legends
addLegend.color(obj = rdp, tal, title = "Group",
                position = "topright", vertical=TRUE)
addLegend.size(obj = rdp, tal, title = "Size",
               position = "topleft",
               vertical=TRUE, dxtitle=10)
This section follows the same steps described in the Quick Start, but uses a larger dendrogram derived from the R built-in quakes dataset. The quakes dataset is a data frame available in the user’s workspace. To learn more about this data frame, query ?quakes in the R console. We will build a dendrogram from the quakes dataset, then transform the dendrogram into a tree-and-leaf diagram, and visualize the result in the RedeR application.
#-- Libraries required in this section:
#-- TreeAndLeaf(>=1.4.2), RedeR(>=1.40.4), Bioconductor >= 3.13 (R >= 4.0)
# BiocManager::install(c("TreeAndLeaf","RedeR"))
# install.packages(c("igraph", "RColorBrewer"))
#-- Load packages
library(TreeAndLeaf)
library(RedeR)
library(igraph)
library(RColorBrewer)
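#-- Build a dendrogram using R core functions
hc <- hclust(dist(quakes), "average")
#-- Convert the 'hclust' object into a 'tree-and-leaf' object
tal <- treeAndLeaf(hc)
#--- Map attributes to the tree-and-leaf
#Note: 'refcol = 0' indicates that 'dat' rownames will be used as mapping IDs
tal <- att.mapv(tal, quakes, refcol = 0)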
#--- Set graph attributes using the 'att.setv' wrapper function
pal <- brewer.pal(9, "Greens")
tal <- att.setv(g = tal, from = "mag", to = "nodeColor",
                cols = pal, nquant = 10)
tal <- att.setv(g = tal, from = "depth", to = "nodeSize",
                xlim = c(40, 120, 20), nquant = 5)
#--- Set graph attributes using 'att.addv' and 'att.adde' functions
tal <- att.addv(tal, "nodeFontSize", value = 1)
tal <- att.adde(tal, "edgeWidth", value = 10)
The next steps will call the RedeR application, and then display the tree-and-leaf diagram in an interactive R/Java interface. The initial layout will show an unrooted tree diagram, which will be subsequently relaxed by a force-directed algorithm applied to the terminal nodes of the tree.
#--- Call RedeR application
rdp <- RedPort()
calld(rdp)
resetd(rdp)
#--- Send the tree-and-leaf to the interactive R/Java interface
addGraph(obj = rdp, g = tal, gzoom=10)
#--- Call 'relax' to fine-tune the leaf nodes
relax(rdp, p1=25, p2=200, p3=10, p4=100, p5=10, ps=TRUE)
#--- Add legends
addLegend.color(obj = rdp, tal, title = "Richter Magnitude",
                position = "bottomright")
addLegend.size(obj = rdp, tal, title = "Depth (km)")
This section generates a tree-and-leaf diagram from a pre-computed phylo tree object. We will use a phylogenetic tree listing 121 eukaryotes, available from the geneplast package.
#-- Libraries required in this section:
#-- TreeAndLeaf(>=1.4.2), RedeR(>=1.40.4), Bioconductor >= 3.13 (R >= 4.0)
# BiocManager::install(c("TreeAndLeaf","RedeR","geneplast"))
# install.packages(c("igraph","ape", "RColorBrewer"))
#-- Load packages
library(TreeAndLeaf)
library(RedeR)
library(igraph)
library(ape)
library(geneplast)
library(RColorBrewer)
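#--- Load data: 'spdata' ships with TreeAndLeaf; 'phyloTree' is assumed
#--- to be provided by the 'gpdata.gs' dataset of the geneplast package
data("spdata")
data("gpdata.gs")
#--- Drop organisms not listed in the 'spdata' annotation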
phyloTree$tip.label <- as.character(phyloTree$tip.label)
tokeep <- phyloTree$tip.label %in% spdata$tax_id
pruned.phylo <- drop.tip(phyloTree, phyloTree$tip.label[!tokeep])
#-- Convert the phylogenetic tree into a 'tree-and-leaf' object
tal <- treeAndLeaf(pruned.phylo)
#--- Map attributes to the tree-and-leaf
#Note: 'refcol = 1' indicates that 'dat' col 1 will be used as mapping IDs
tal <- att.mapv(g = tal, dat = spdata, refcol = 1)
#--- Set graph attributes using the 'att.setv' wrapper function
pal <- brewer.pal(9, "Purples")
tal <- att.setv(g = tal, from = "genome_size_Mb",
                to = "nodeSize", xlim = c(120, 250, 1), nquant = 5)
tal <- att.setv(g = tal, from = "proteins",
                to = "nodeColor", nquant = 5,
                cols = pal, na.col = "black")
#--- Add graph attributes using 'att.adde' and 'att.addv' functions
tal <- att.addv(tal, "nodeFontSize", value = 10)
tal <- att.adde(tal, "edgeWidth", value = 20)
# Set species names to 'nodeAlias' attribute
tal <- att.setv(tal, from = "sp_name", to = "nodeAlias")
# Select a few names to highlight in the graph
tal <- att.addv(tal, "nodeFontSize", value = 100,
                filter = list('name' = sample(pruned.phylo$tip.label, 30)))
tal <- att.addv(tal, "nodeFontSize", value = 100,
                filter = list('name' = "9606")) #Homo sapiens
# Call RedeR
rdp <- RedPort()
calld(rdp)
resetd(rdp)
#--- Send the tree-and-leaf to the interactive R/Java interface
addGraph(obj = rdp, g = tal, gzoom=10)
#--- Call 'relax' to fine-tune the leaf nodes
relax(rdp, ps=TRUE)
#--- Add legends
addLegend.color(rdp, tal, title = "Proteome Size (n)")
addLegend.size(rdp, tal, title = "Genome Size (Mb)")
The TreeAndLeaf package is designed to lay out binary trees, but it can also lay out other graph configurations. To exemplify this case, we will use a larger phylogenetic tree available from the geneplast package, in which some inner nodes have more than two children (i.e., non-binary nodes).
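To make the notion of a non-binary node concrete, the short sketch below uses the ape package on a hypothetical toy tree (not the geneplast tree) in which one inner node has three children. It checks whether the tree is binary and counts the children of each inner node from the edge matrix.

```r
library(ape)
#--- Toy tree (hypothetical): the inner node joining A, B and C has 3 children
toy <- read.tree(text = "((A,B,C),(D,E));")
#--- Check whether every inner node has exactly two children
is.binary(toy)
#--- Count children per inner node; column 1 of 'edge' holds the parent node
n_children <- table(toy$edge[, 1])
n_children
```

The same check can be applied to any phylo object before converting it into a tree-and-leaf.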
#-- Libraries required in this section:
#-- TreeAndLeaf(>=1.4.2), RedeR(>=1.40.4), Bioconductor >= 3.13 (R >= 4.0)
# BiocManager::install(c("TreeAndLeaf","RedeR","geneplast"))
# install.packages(c("igraph","ape", "RColorBrewer"))
#-- Load packages
library(TreeAndLeaf)
library(RedeR)
library(igraph)
library(ape)
library(geneplast)
library(RColorBrewer)
#--- Note: 'tal' and 'pruned.phylo' must first be rebuilt from the larger,
#--- non-binary tree (prune with 'drop.tip', then call 'treeAndLeaf')
#--- Map attributes to the tree-and-leaf using "%>%" operator
tal <- tal %>%
  att.mapv(dat = spdata, refcol = 1) %>%
  att.setv(from = "genome_size_Mb", to = "nodeSize",
           xlim = c(120, 250, 1), nquant = 5) %>%
  att.setv(from = "proteins", to = "nodeColor", nquant = 5,
           cols = brewer.pal(9, "Blues"), na.col = "black") %>%
  att.setv(from = "sp_name", to = "nodeAlias") %>%
  att.adde(to = "edgeWidth", value = 20) %>%
  att.addv(to = "nodeFontSize", value = 10) %>%
  att.addv(to = "nodeFontSize", value = 100,
           filter = list("name" = sample(pruned.phylo$tip.label, 30))) %>%
  att.addv(to = "nodeFontSize", value = 100,
           filter = list("name" = "9606"))
# Call RedeR
rdp <- RedPort()
calld(rdp)
resetd(rdp)
#--- Send the tree-and-leaf to the interactive R/Java interface
addGraph(obj = rdp, g = tal, gzoom=5)
#--- Call 'relax' to fine-tune the leaf nodes
relax(rdp, ps=TRUE)
#--- Add legends
addLegend.color(rdp, tal, title = "Proteome Size (n)")
addLegend.size(rdp, tal, title = "Genome size (Mb)")
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] geneplast_1.33.0 ape_5.8 RColorBrewer_1.1-3 igraph_2.1.1
#> [5] RedeR_3.2.0 TreeAndLeaf_1.19.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] jsonlite_1.8.9 compiler_4.4.1 BiocManager_1.30.25
#> [4] highr_0.11 Rcpp_1.0.13 parallel_4.4.1
#> [7] jquerylib_0.1.4 scales_1.3.0 yaml_2.3.10
#> [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
#> [13] knitr_1.48 snow_0.4-4 maketools_1.3.1
#> [16] munsell_0.5.1 bslib_0.8.0 rlang_1.1.4
#> [19] cachem_1.1.0 xfun_0.48 sass_0.4.9
#> [22] sys_3.4.3 cli_3.6.3 magrittr_2.0.3
#> [25] digest_0.6.37 grid_4.4.1 lifecycle_1.0.4
#> [28] nlme_3.1-166 data.table_1.16.2 evaluate_1.0.1
#> [31] glue_1.8.0 buildtools_1.0.0 colorspace_2.1-1
#> [34] rmarkdown_2.28 tools_4.4.1 pkgconfig_2.0.3
#> [37] htmltools_0.5.8.1