Interactively explore and visualize Single Cell RNA seq data

Introduction

scTreeViz is a package for interactive visualization and exploration of Single Cell RNA sequencing data. scTreeViz provides methods for exploring hierarchical features (eg. clusters in single cell at different resolutions or taxonomic hierarchy in single cell datasets), while supporting other useful data visualization charts like heatmaps for expression and scatter plots for dimensionality reductions like UMAP or TSNE.

Loading required packages

library(scTreeViz)
library(Seurat)
library(SC3)
library(scran)
library(scater)
library(clustree)
library(igraph)
library(scRNAseq)

Preparing Datasets

The first step in using the scTreeViz package is to wrap datasets into TreeViz objects. The TreeViz class extends SummarizedExperiment and provides various methods to interactively perform various operations on the underlying hierarchy and count or expression matrices. In this section, we show various ways to generate a TreeViz object either from existing Single Cell packages (SingleCellExperiment or Seurat) or from a raw count matrix and cluster hierarchy.

From SingleCellExperiment

A number of Single cell datasets are available as SingleCellExperiment objects through the scRNAseq package, for this usecase, we use LunSpikeInData dataset. In addition, we calculate the dimensionality reductions; UMAP, TSNE and PCA from the functions provided in scater package.

# load dataset
sce<- ZeiselBrainData()
# Normalization
sce <- logNormCounts(sce)
# calculate umap and tsne
sce <- runUMAP(sce)
sce<- runTSNE(sce)
sce<- runPCA(sce)

We provide createFromSCE function to create a TreeViz object from SingleCellExperiment object. Here, the workflow works in two ways:

  1. If no cluster information is available in the colData of the SingleCellExperiment object, we create clusters at different resolutions using the WalkTrap algorithm by calling an internal function generate_walktrap_hierarchy and use this cluster information for visualization.
treeViz <- createFromSCE(sce, reduced_dim = c("UMAP","PCA","TSNE"))
#>  [1] "1.cluster1"   "2.cluster2"   "3.cluster3"   "4.cluster4"   "5.cluster5"  
#>  [6] "6.cluster6"   "7.cluster7"   "8.cluster8"   "9.cluster9"   "10.cluster10"
#> [11] "11.cluster11" "12.cluster12" "13.cluster13" "14.cluster14" "samples"
plot(treeViz)

  1. If cluster information is provided in the colData of the object, then the user should set the flag parameter check_coldata to TRUE and provide prefix for the columns where cluster information is stored.
# Forming clusters
set.seed(1000)
for (i in  seq(10)) {
  clust.kmeans <- kmeans(reducedDim(sce, "TSNE"), centers = i)
  sce[[paste0("clust", i)]] <- factor(clust.kmeans$cluster)
}

treeViz<- createFromSCE(sce, check_coldata = TRUE, col_regex = "clust")
plot(treeViz)

Note: In both cases the user needs to provide the name of dimensionality reductions present in the object as a parameter.

From Seurat

We use the dataset pbmc_small available through Seurat to create a TreeViz object.

data(pbmc_small)
pbmc <- pbmc_small

We then preprocess the data and find clusters at different resolutions.

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- NormalizeData(pbmc)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, vars.to.regress = "percent.mt")
pbmc <- FindVariableFeatures(object = pbmc)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0), print.output = 0, save.SNN = TRUE)
pbmc

The measurements for dimensionality reduction methods we want to visualize are also added to the object via native functions in Seurat. Since PCA is already added, we calculate TSNE and UMAP

# pbmc<- RunTSNE(pbmc)
pbmc<- RunUMAP(pbmc, dims=1:3)
Reductions(pbmc)

We use the createFromSeurat function to create a TreeViz object from Seurat object. In addition the object, we pass the name of dimensionality reductions present in the object as a paramter in vector format to indicate these measurements should be added to treeviz for visualization. If the mentioned reduced dimension is not present it would simply be ignored.

treeViz<- createFromSeurat(pbmc, check_metadata = TRUE, reduced_dim = c("umap","pca","tsne"))
#> [1] "6.cluster6"   "10.cluster10" "11.cluster11" "samples"     
#> [1] "umap" "pca"  "tsne"
plot(treeViz)

Create TreeViz from count matrix and Cluster hierarchy

n=64
# create a hierarchy
df<- data.frame(cluster0=rep(1,n))
for(i in seq(1,5)){
  df[[paste0("cluster",i)]]<- rep(seq(1:(2**i)),each=ceiling(n/(2**i)),len=n)
}

# generate a count matrix
counts <- matrix(rpois(6400, lambda = 10), ncol=n, nrow=100)
colnames(counts)<- seq(1:64)
# create a `TreeViz` object
treeViz <- createTreeViz(df, counts)
plot(treeViz)

Start the TreeViz App (using hosted app)

Start the App from the treeViz object we created. This adds a facetZoom to navigate the cluster hierarchy, a heatmap of the top n most variable genes from the dataset, where ‘n’ is selected by the user and one scatter plot for each of the reduced dimensions.

app <- startTreeviz(treeViz, top_genes = 500)
Cell 416B dataset
Cell 416B dataset
pbmc dataset
pbmc dataset

Users can also use the interface to explore the same dataset using different visualizations available through Epiviz.

Visualize gene expression across clusters

Users can also add Gene Box plots using either the frontend application, or from R session. In the following example, we visualize the 5th, 50th and 500th most variable gene as Box plots visualizing expression of a gene across clusters

Adding Gene Box Plots via UI

Users need to select Add Visualization -> Gene Box PLot option from menu and then select the desired gene using the search pane in the appeared dialogue box

Selecting a gene to visualize
Selecting a gene to visualize

Adding Gene Box Plots via R Session

Users can also select the gene from R session by using the plotGene command followed by Gene name.

app$plotGene(gene="AIF1")

Stop App

After exploring the dataset, this command the websocket connection.

app$stop_app()

Session

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] scRNAseq_2.20.0             igraph_2.1.2               
#>  [3] clustree_0.5.1              ggraph_2.2.1               
#>  [5] scater_1.35.0               ggplot2_3.5.1              
#>  [7] scran_1.35.0                scuttle_1.17.0             
#>  [9] SingleCellExperiment_1.29.1 SC3_1.35.0                 
#> [11] Seurat_5.1.0                SeuratObject_5.0.2         
#> [13] sp_2.1-4                    scTreeViz_1.13.0           
#> [15] SummarizedExperiment_1.37.0 Biobase_2.67.0             
#> [17] GenomicRanges_1.59.1        GenomeInfoDb_1.43.2        
#> [19] IRanges_2.41.2              S4Vectors_0.45.2           
#> [21] BiocGenerics_0.53.3         generics_0.1.3             
#> [23] MatrixGenerics_1.19.0       matrixStats_1.4.1          
#> [25] epivizr_2.37.0              BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] progress_1.2.3           epivizrData_1.35.0       goftest_1.2-3           
#>   [4] HDF5Array_1.35.2         Biostrings_2.75.3        vctrs_0.6.5             
#>   [7] spatstat.random_3.3-2    digest_0.6.37            png_0.1-8               
#>  [10] proxy_0.4-27             pcaPP_2.0-5              gypsum_1.3.0            
#>  [13] ggrepel_0.9.6            deldir_2.0-4             parallelly_1.41.0       
#>  [16] alabaster.sce_1.7.0      MASS_7.3-61              reshape2_1.4.4          
#>  [19] httpuv_1.6.15            foreach_1.5.2            withr_3.0.2             
#>  [22] xfun_0.49                survival_3.8-3           doRNG_1.8.6             
#>  [25] memoise_2.0.1            ggbeeswarm_0.7.2         zoo_1.8-12              
#>  [28] pbapply_1.7-2            DEoptimR_1.1-3-1         sys_3.4.3               
#>  [31] prettyunits_1.2.0        KEGGREST_1.47.0          promises_1.3.2          
#>  [34] httr_1.4.7               restfulr_0.0.15          globals_0.16.3          
#>  [37] fitdistrplus_1.2-1       rhdf5filters_1.19.0      rhdf5_2.51.1            
#>  [40] UCSC.utils_1.3.0         miniUI_0.1.1.1           curl_6.0.1              
#>  [43] zlibbioc_1.52.0          ScaledMatrix_1.15.0      polyclip_1.10-7         
#>  [46] GenomeInfoDbData_1.2.13  ExperimentHub_2.15.0     SparseArray_1.7.2       
#>  [49] RBGL_1.83.0              xtable_1.8-4             stringr_1.5.1           
#>  [52] doParallel_1.0.17        evaluate_1.0.1           S4Arrays_1.7.1          
#>  [55] BiocFileCache_2.15.0     hms_1.1.3                irlba_2.3.5.1           
#>  [58] colorspace_2.1-1         filelock_1.0.3           ROCR_1.0-11             
#>  [61] reticulate_1.40.0        spatstat.data_3.1-4      magrittr_2.0.3          
#>  [64] lmtest_0.9-40            later_1.4.1              buildtools_1.0.0        
#>  [67] viridis_0.6.5            lattice_0.22-6           spatstat.geom_3.3-4     
#>  [70] future.apply_1.11.3      robustbase_0.99-4-1      scattermore_1.2         
#>  [73] XML_3.99-0.17            cowplot_1.1.3            RcppAnnoy_0.0.22        
#>  [76] maketools_1.3.1          class_7.3-22             pillar_1.10.0           
#>  [79] nlme_3.1-166             iterators_1.0.14         compiler_4.4.2          
#>  [82] beachmat_2.23.5          RSpectra_0.16-2          stringi_1.8.4           
#>  [85] tensor_1.5               GenomicAlignments_1.43.0 plyr_1.8.9              
#>  [88] crayon_1.5.3             abind_1.4-8              BiocIO_1.17.1           
#>  [91] locfit_1.5-9.10          graphlayouts_1.2.1       bit_4.5.0.1             
#>  [94] dplyr_1.1.4              codetools_0.2-20         BiocSingular_1.23.0     
#>  [97] bslib_0.8.0              alabaster.ranges_1.7.0   e1071_1.7-16            
#> [100] plotly_4.10.4            mime_0.12                splines_4.4.2           
#> [103] Rcpp_1.0.13-1            fastDummies_1.7.4        dbplyr_2.5.0            
#> [106] knitr_1.49               blob_1.2.4               BiocVersion_3.21.1      
#> [109] AnnotationFilter_1.31.0  WriteXLS_6.7.0           checkmate_2.3.2         
#> [112] listenv_0.9.1            tibble_3.2.1             Matrix_1.7-1            
#> [115] statmod_1.5.0            tweenr_2.0.3             pkgconfig_2.0.3         
#> [118] pheatmap_1.0.12          tools_4.4.2              cachem_1.1.0            
#> [121] RSQLite_2.3.9            viridisLite_0.4.2        DBI_1.2.3               
#> [124] fastmap_1.2.0            rmarkdown_2.29           scales_1.3.0            
#> [127] grid_4.4.2               ica_1.0-3                epivizrServer_1.35.0    
#> [130] Rsamtools_2.23.1         AnnotationHub_3.15.0     sass_0.4.9              
#> [133] FNN_1.1.4.1              patchwork_1.3.0          BiocManager_1.30.25     
#> [136] dotCall64_1.2            graph_1.85.0             RANN_2.6.2              
#> [139] alabaster.schemas_1.7.0  farver_2.1.2             tidygraph_1.3.1         
#> [142] yaml_2.3.10              rtracklayer_1.67.0       cli_3.6.3               
#> [145] purrr_1.0.2              txdbmaker_1.3.1          leiden_0.4.3.1          
#> [148] lifecycle_1.0.4          uwot_0.2.2               mvtnorm_1.3-2           
#> [151] backports_1.5.0          bluster_1.17.0           BiocParallel_1.41.0     
#> [154] gtable_0.3.6             rjson_0.2.23             ggridges_0.5.6          
#> [157] progressr_0.15.1         parallel_4.4.2           limma_3.63.2            
#> [160] jsonlite_1.8.9           edgeR_4.5.1              RcppHNSW_0.6.0          
#> [163] bitops_1.0-9             bit64_4.5.2              Rtsne_0.17              
#> [166] alabaster.matrix_1.7.4   spatstat.utils_3.1-1     BiocNeighbors_2.1.2     
#> [169] alabaster.se_1.7.0       jquerylib_0.1.4          metapod_1.15.0          
#> [172] dqrng_0.4.1              spatstat.univar_3.1-1    rrcov_1.7-6             
#> [175] lazyeval_0.2.2           alabaster.base_1.7.2     shiny_1.10.0            
#> [178] htmltools_0.5.8.1        sctransform_0.4.1        rappdirs_0.3.3          
#> [181] ensembldb_2.31.0         glue_1.8.0               spam_2.11-0             
#> [184] httr2_1.0.7              XVector_0.47.0           RCurl_1.98-1.16         
#> [187] gridExtra_2.3            R6_2.5.1                 tidyr_1.3.1             
#> [190] labeling_0.4.3           GenomicFeatures_1.59.1   cluster_2.1.8           
#> [193] rngtools_1.5.2           Rhdf5lib_1.29.0          DelayedArray_0.33.3     
#> [196] tidyselect_1.2.1         vipor_0.4.7              ProtGenerics_1.39.1     
#> [199] ggforce_0.4.2            xml2_1.3.6               AnnotationDbi_1.69.0    
#> [202] future_1.34.0            rsvd_1.0.5               munsell_0.5.1           
#> [205] KernSmooth_2.23-24       data.table_1.16.4        htmlwidgets_1.6.4       
#> [208] RColorBrewer_1.1-3       biomaRt_2.63.0           rlang_1.1.4             
#> [211] spatstat.sparse_3.1-0    spatstat.explore_3.3-3   beeswarm_0.4.0          
#> [214] OrganismDbi_1.49.0