Combine TreeSummarizedExperiment objects

Combine multiple TreeSummarizedExperiment objects

Multiple TreeSummarizedExperiment (TSE) objects can be combined using rbind or cbind. Here, we create a toy TSE object using makeTSE() (see ?makeTSE). Because the tree in the row/column tree slot is generated randomly using ape::rtree(), set.seed() is used to make the results reproducible.

library(TreeSummarizedExperiment)

set.seed(1)
# TSE: without the column tree
(tse_a <- makeTSE(include.colTree = FALSE))
## class: TreeSummarizedExperiment 
## dim: 10 4 
## metadata(0):
## assays(1): ''
## rownames(10): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (10 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
# combine two TSEs by row
(tse_aa <- rbind(tse_a, tse_a))
## class: TreeSummarizedExperiment 
## dim: 20 4 
## metadata(0):
## assays(1): ''
## rownames(20): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (20 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL

The resulting tse_aa has 20 rows, twice as many as tse_a. The row tree in tse_aa is the same as that in tse_a.

identical(rowTree(tse_aa), rowTree(tse_a))
## [1] TRUE

If we rbind two TSEs (e.g., tse_a and tse_b) that have different row trees, the obtained TSE (e.g., tse_ab) will have two row trees.

set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)

# different row trees
identical(rowTree(tse_a), rowTree(tse_b))
## [1] FALSE
# 2 phylo tree(s) in rowTree
(tse_ab <- rbind(tse_a, tse_b))
## class: TreeSummarizedExperiment 
## dim: 20 4 
## metadata(0):
## assays(1): ''
## rownames(20): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (20 rows)
## rowTree: 2 phylo tree(s) (20 leaves)
## colLinks: NULL
## colTree: NULL

In the row link data, the whichTree column indicates which tree each row is mapped to. tse_aa has only one tree, named phylo, whereas tse_ab has two trees (phylo and phylo.1).

rowLinks(tse_aa)
## LinkDataFrame with 20 rows and 5 columns
##              nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##          <character>   <character> <integer> <logical> <character>
## entity1      entity1       alias_1         1      TRUE       phylo
## entity2      entity2       alias_2         2      TRUE       phylo
## entity3      entity3       alias_3         3      TRUE       phylo
## entity4      entity4       alias_4         4      TRUE       phylo
## entity5      entity5       alias_5         5      TRUE       phylo
## ...              ...           ...       ...       ...         ...
## entity6      entity6       alias_6         6      TRUE       phylo
## entity7      entity7       alias_7         7      TRUE       phylo
## entity8      entity8       alias_8         8      TRUE       phylo
## entity9      entity9       alias_9         9      TRUE       phylo
## entity10    entity10      alias_10        10      TRUE       phylo
rowLinks(tse_ab)
## LinkDataFrame with 20 rows and 5 columns
##              nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##          <character>   <character> <integer> <logical> <character>
## entity1      entity1       alias_1         1      TRUE       phylo
## entity2      entity2       alias_2         2      TRUE       phylo
## entity3      entity3       alias_3         3      TRUE       phylo
## entity4      entity4       alias_4         4      TRUE       phylo
## entity5      entity5       alias_5         5      TRUE       phylo
## ...              ...           ...       ...       ...         ...
## entity6      entity6       alias_6         6      TRUE     phylo.1
## entity7      entity7       alias_7         7      TRUE     phylo.1
## entity8      entity8       alias_8         8      TRUE     phylo.1
## entity9      entity9       alias_9         9      TRUE     phylo.1
## entity10    entity10      alias_10        10      TRUE     phylo.1

The names of the trees can be accessed using rowTreeNames(). If the input TSEs use the same name for their trees, rbind automatically creates valid and unique tree names using make.names. tse_a and tse_b both name their row tree phylo, so in tse_ab the row tree that originates from tse_b is named phylo.1 instead.

rowTreeNames(tse_aa)
## [1] "phylo"
rowTreeNames(tse_ab)
## [1] "phylo"   "phylo.1"
# The original tree names in the input TSEs
rowTreeNames(tse_a)
## [1] "phylo"
rowTreeNames(tse_b)
## [1] "phylo"
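
The renaming described above can be reproduced with base R alone. The sketch below only illustrates how make.names() with unique = TRUE resolves a name clash; it makes no claim about how rbind calls it internally, and the variable names are chosen here for illustration.

```r
# Two input trees that share the same name
tree_names <- c("phylo", "phylo")

# make.names() with unique = TRUE appends a numeric suffix to duplicates
unique_names <- make.names(tree_names, unique = TRUE)
unique_names
## [1] "phylo"   "phylo.1"
```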

Once the tree names are changed, the whichTree column in rowLinks() is updated accordingly.

rowTreeNames(tse_ab) <- paste0("tree", 1:2)
rowLinks(tse_ab)
## LinkDataFrame with 20 rows and 5 columns
##              nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##          <character>   <character> <integer> <logical> <character>
## entity1      entity1       alias_1         1      TRUE       tree1
## entity2      entity2       alias_2         2      TRUE       tree1
## entity3      entity3       alias_3         3      TRUE       tree1
## entity4      entity4       alias_4         4      TRUE       tree1
## entity5      entity5       alias_5         5      TRUE       tree1
## ...              ...           ...       ...       ...         ...
## entity6      entity6       alias_6         6      TRUE       tree2
## entity7      entity7       alias_7         7      TRUE       tree2
## entity8      entity8       alias_8         8      TRUE       tree2
## entity9      entity9       alias_9         9      TRUE       tree2
## entity10    entity10      alias_10        10      TRUE       tree2
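
Because the printed link data is truncated, a quick tabulation of the whichTree column can confirm how many rows are linked to each tree. This is a minimal, self-contained sketch that rebuilds the objects used above; rows_per_tree is a name introduced here.

```r
library(TreeSummarizedExperiment)

# Rebuild the two toy TSEs with different random row trees
set.seed(1)
tse_a <- makeTSE(include.colTree = FALSE)
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)

# Combine by row and rename the two row trees
tse_ab <- rbind(tse_a, tse_b)
rowTreeNames(tse_ab) <- paste0("tree", 1:2)

# Count the rows linked to each tree
rows_per_tree <- table(rowLinks(tse_ab)$whichTree)
rows_per_tree
```

Each toy TSE contributes 10 rows, so both trees should be linked to 10 rows.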

To run cbind, the TSEs should agree in the row dimension. If the TSEs differ only in the row tree, the row tree and the row link data are dropped.

cbind(tse_a, tse_a)
## class: TreeSummarizedExperiment 
## dim: 10 8 
## metadata(0):
## assays(1): ''
## rownames(10): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(8): sample1 sample2 ... sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (10 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
cbind(tse_a, tse_b)
## Warning in cbind(...): rowTree & rowLinks differ in the provided TSEs.
##  rowTree & rowLinks are dropped after 'cbind'
## class: TreeSummarizedExperiment 
## dim: 10 8 
## metadata(0):
## assays(1): ''
## rownames(10): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(8): sample1 sample2 ... sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: NULL
## rowTree: NULL
## colLinks: NULL
## colTree: NULL
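
To avoid losing the row tree unexpectedly, one option is to compare the row trees before calling cbind. The following is a sketch under the behaviour described above (identical trees are kept, differing trees are dropped with a warning); same_tree, kept and dropped are names introduced here.

```r
library(TreeSummarizedExperiment)

set.seed(1)
tse_a <- makeTSE(include.colTree = FALSE)
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)

# Check in advance whether the row trees agree
same_tree <- identical(rowTree(tse_a), rowTree(tse_b))
same_tree

# Identical row trees are kept by cbind
kept <- cbind(tse_a, tse_a)

# Differing row trees are dropped; the warning is silenced here on purpose
dropped <- suppressWarnings(cbind(tse_a, tse_b))

is.null(rowTree(kept))
is.null(rowTree(dropped))
```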

Subset a TSE object

We obtain a subset of tse_ab by extracting rows 11:15. These rows are all mapped to the same tree, tree2 (named phylo.1 before the renaming above), so the rowTree slot of sse has only one tree.

(sse <- tse_ab[11:15, ])
## class: TreeSummarizedExperiment 
## dim: 5 4 
## metadata(0):
## assays(1): ''
## rownames(5): entity1 entity2 entity3 entity4 entity5
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (5 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
rowLinks(sse)
## LinkDataFrame with 5 rows and 5 columns
##             nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##         <character>   <character> <integer> <logical> <character>
## entity1     entity1       alias_1         1      TRUE       tree2
## entity2     entity2       alias_2         2      TRUE       tree2
## entity3     entity3       alias_3         3      TRUE       tree2
## entity4     entity4       alias_4         4      TRUE       tree2
## entity5     entity5       alias_5         5      TRUE       tree2

[ works not only as a getter but also as a setter, so it can be used to replace a subset of sse.

set.seed(3)
tse_c <- makeTSE(include.colTree = FALSE)
rowTreeNames(tse_c) <- "new_tree"

# the first two rows are from tse_c, and are mapped to 'new_tree'
sse[1:2, ] <- tse_c[5:6, ]
rowLinks(sse)
## LinkDataFrame with 5 rows and 5 columns
##             nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##         <character>   <character> <integer> <logical> <character>
## entity5     entity5       alias_5         5      TRUE    new_tree
## entity6     entity6       alias_6         6      TRUE    new_tree
## entity3     entity3       alias_3         3      TRUE       tree2
## entity4     entity4       alias_4         4      TRUE       tree2
## entity5     entity5       alias_5         5      TRUE       tree2
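
After such a replacement the object holds rows linked to two different trees. The sketch below checks this on a smaller, self-contained example; it starts from tse_a rather than tse_ab, so the original tree keeps its default name phylo, and part is a name introduced here.

```r
library(TreeSummarizedExperiment)

set.seed(1)
tse_a <- makeTSE(include.colTree = FALSE)
set.seed(3)
tse_c <- makeTSE(include.colTree = FALSE)
rowTreeNames(tse_c) <- "new_tree"

# Take five rows, then overwrite the first two with rows from tse_c
part <- tse_a[1:5, ]
part[1:2, ] <- tse_c[5:6, ]

# Both trees should now be present, and the links should say which rows use which
rowTreeNames(part)
rowLinks(part)$whichTree
```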

A TSE object can also be subset by nodes and/or trees using subsetByNode().

# by tree
sse_a <- subsetByNode(x = sse, whichRowTree = "new_tree")
rowLinks(sse_a)
## LinkDataFrame with 2 rows and 5 columns
##             nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##         <character>   <character> <integer> <logical> <character>
## entity5     entity5       alias_5         5      TRUE    new_tree
## entity6     entity6       alias_6         6      TRUE    new_tree
# by node
sse_b <- subsetByNode(x = sse, rowNode = 5)
rowLinks(sse_b)
## LinkDataFrame with 2 rows and 5 columns
##             nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##         <character>   <character> <integer> <logical> <character>
## entity5     entity5       alias_5         5      TRUE    new_tree
## entity5     entity5       alias_5         5      TRUE       tree2
# by tree and node
sse_c <- subsetByNode(x = sse, rowNode = 5, whichRowTree = "tree2")
rowLinks(sse_c)
## LinkDataFrame with 1 row and 5 columns
##             nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##         <character>   <character> <integer> <logical> <character>
## entity5     entity5       alias_5         5      TRUE       tree2
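
Subsetting by tree should select the same rows as filtering on the whichTree column of the row links. The sketch below compares the two approaches on a freshly built tse_ab (with the default tree names phylo and phylo.1); by_tree and by_index are names introduced here.

```r
library(TreeSummarizedExperiment)

set.seed(1)
tse_a <- makeTSE(include.colTree = FALSE)
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)
tse_ab <- rbind(tse_a, tse_b)

# Subset with subsetByNode(), selecting by tree name
by_tree <- subsetByNode(x = tse_ab, whichRowTree = "phylo.1")

# Subset with `[`, using the whichTree column of the row links
by_index <- tse_ab[rowLinks(tse_ab)$whichTree == "phylo.1", ]

dim(by_tree)
dim(by_index)
```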

Change specific trees of TSE

Using colTree, we can add a column tree to sse, which previously had none.

colTree(sse)
## NULL
library(ape)
set.seed(1)
col_tree <- rtree(ncol(sse))

# To use `colTree` as a setter, the input tree should have node labels that
# match the column names of the TSE.
col_tree$tip.label <- colnames(sse)

colTree(sse) <- col_tree
colTree(sse)
## 
## Phylogenetic tree with 4 tips and 3 internal nodes.
## 
## Tip labels:
##   sample1, sample2, sample3, sample4
## 
## Rooted; includes branch lengths.
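
Since the setter relies on labels matching, it can help to check the labels explicitly before assigning the tree. This is a minimal sketch on a fresh toy TSE; the check itself (labels_ok) is an illustration added here, not part of the package.

```r
library(TreeSummarizedExperiment)
library(ape)

set.seed(1)
tse <- makeTSE(include.colTree = FALSE)

# A random tree with one tip per column, relabelled with the column names
set.seed(1)
col_tree <- rtree(ncol(tse))
col_tree$tip.label <- colnames(tse)

# Every column name should appear among the tip or internal node labels
labels_ok <- all(colnames(tse) %in% c(col_tree$tip.label, col_tree$node.label))

if (labels_ok) {
    colTree(tse) <- col_tree
}

# The column links are created along with the tree
colLinks(tse)
```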

sse has two row trees. We can replace one of them with a new tree by specifying the whichTree argument of rowTree.

# the original row links
rowLinks(sse)
## LinkDataFrame with 5 rows and 5 columns
##             nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##         <character>   <character> <integer> <logical> <character>
## entity5     entity5       alias_5         5      TRUE    new_tree
## entity6     entity6       alias_6         6      TRUE    new_tree
## entity3     entity3       alias_3         3      TRUE       tree2
## entity4     entity4       alias_4         4      TRUE       tree2
## entity5     entity5       alias_5         5      TRUE       tree2
# the new row tree
set.seed(1)
row_tree <- rtree(4)
# the tree has 4 tips, so 4 tip labels are needed
row_tree$tip.label <- paste0("entity", 5:8)

# replace the tree named 'new_tree'
nse <- sse
rowTree(nse, whichTree = "new_tree") <- row_tree
rowLinks(nse)
## LinkDataFrame with 5 rows and 5 columns
##             nodeLab nodeLab_alias   nodeNum    isLeaf   whichTree
##         <character>   <character> <integer> <logical> <character>
## entity5     entity5       alias_1         1      TRUE    new_tree
## entity6     entity6       alias_2         2      TRUE    new_tree
## entity3     entity3       alias_3         3      TRUE       tree2
## entity4     entity4       alias_4         4      TRUE       tree2
## entity5     entity5       alias_5         5      TRUE       tree2

In the row links, the first two rows now have new values in nodeNum and nodeLab_alias. The name in whichTree is unchanged, but the underlying tree has been updated.

# FALSE is expected
identical(rowTree(sse, whichTree = "new_tree"),
          rowTree(nse, whichTree = "new_tree"))
## [1] FALSE
# TRUE is expected
identical(rowTree(nse, whichTree = "new_tree"),
          row_tree)
## [1] TRUE

If the nodes of the input tree and the rows of the TSE are named differently, users can match rows to nodes via changeTree by providing rowNodeLab.
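
A minimal sketch of this use of changeTree is given below. The taxonK labels and the mapping from entityK to taxonK are hypothetical and chosen only for illustration; new_tree, node_lab and tse_new are likewise names introduced here.

```r
library(TreeSummarizedExperiment)
library(ape)

set.seed(1)
tse <- makeTSE(include.colTree = FALSE)   # rows are named entity1 ... entity10

# A new tree whose tip labels do not match the row names (hypothetical labels)
set.seed(2)
new_tree <- rtree(10)
new_tree$tip.label <- paste0("taxon", 1:10)

# Hypothetical mapping: row "entityK" corresponds to tip "taxonK"
node_lab <- sub("entity", "taxon", rownames(tse))

# Replace the row tree and tell changeTree which node each row maps to
tse_new <- changeTree(x = tse, rowTree = new_tree, rowNodeLab = node_lab)
rowLinks(tse_new)
```

The row names of the TSE stay as they are; only the link data records the node label that each row is mapped to.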

Session Info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggplot2_3.5.1                   ggtree_3.15.0                  
##  [3] ape_5.8                         TreeSummarizedExperiment_2.15.0
##  [5] Biostrings_2.75.0               XVector_0.46.0                 
##  [7] SingleCellExperiment_1.28.0     SummarizedExperiment_1.36.0    
##  [9] Biobase_2.67.0                  GenomicRanges_1.59.0           
## [11] GenomeInfoDb_1.43.0             IRanges_2.41.0                 
## [13] S4Vectors_0.44.0                BiocGenerics_0.53.1            
## [15] generics_0.1.3                  MatrixGenerics_1.19.0          
## [17] matrixStats_1.4.1               BiocStyle_2.35.0               
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1        dplyr_1.1.4             farver_2.1.2           
##  [4] fastmap_1.2.0           lazyeval_0.2.2          digest_0.6.37          
##  [7] lifecycle_1.0.4         tidytree_0.4.6          magrittr_2.0.3         
## [10] compiler_4.4.1          rlang_1.1.4             sass_0.4.9             
## [13] tools_4.4.1             utf8_1.2.4              yaml_2.3.10            
## [16] knitr_1.48              labeling_0.4.3          S4Arrays_1.6.0         
## [19] DelayedArray_0.33.1     aplot_0.2.3             abind_1.4-8            
## [22] BiocParallel_1.41.0     withr_3.0.2             purrr_1.0.2            
## [25] sys_3.4.3               grid_4.4.1              fansi_1.0.6            
## [28] colorspace_2.1-1        scales_1.3.0            cli_3.6.3              
## [31] rmarkdown_2.28          crayon_1.5.3            treeio_1.30.0          
## [34] httr_1.4.7              cachem_1.1.0            zlibbioc_1.52.0        
## [37] parallel_4.4.1          ggplotify_0.1.2         BiocManager_1.30.25    
## [40] vctrs_0.6.5             yulab.utils_0.1.7       Matrix_1.7-1           
## [43] jsonlite_1.8.9          gridGraphics_0.5-1      patchwork_1.3.0        
## [46] maketools_1.3.1         jquerylib_0.1.4         tidyr_1.3.1            
## [49] glue_1.8.0              codetools_0.2-20        gtable_0.3.6           
## [52] UCSC.utils_1.2.0        munsell_0.5.1           tibble_3.2.1           
## [55] pillar_1.9.0            htmltools_0.5.8.1       GenomeInfoDbData_1.2.13
## [58] R6_2.5.1                evaluate_1.0.1          lattice_0.22-6         
## [61] highr_0.11              ggfun_0.1.7             bslib_0.8.0            
## [64] Rcpp_1.0.13             SparseArray_1.6.0       nlme_3.1-166           
## [67] xfun_0.48               fs_1.6.5                buildtools_1.0.0       
## [70] pkgconfig_2.0.3
