Probability plot usage

Introduction

This document shows a typical analysis using the groupProbPlot function. The intention with this function is to display differences between groups, tissues, stimulations or similar, at single-cell resolution. The idea is that a cell from a cell type that is specific to one of the two investigated groups will be surrounded exclusively by Euclidean nearest neighbors from the same group. This is the basis for the analysis: in the standard case, each cell is given a number between -1 and 1 that reflects the fraction of its 100 closest neighbors, in the Euclidean space spanned by all the input markers, that come from one group (-1) or the other (1). The scale is adjusted so that the midpoint corresponds to a perfect mix, with 50% of the neighbors from each group. For an introduction to the package and a description of the example data, see the general DepecheR package vignette.
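To make the neighbor-based score concrete, here is a minimal base-R sketch of the idea on simulated data. It is not the DepecheR implementation: the toy data, the variable names and the linear rescaling from a neighbor fraction to the -1 to 1 range are all assumptions made for illustration, and the exact scaling used by groupProbPlot may differ.

```r
# Toy illustration only; this is NOT how DepecheR computes its scores.
set.seed(1)
n <- 200
# Hypothetical data: two well-separated groups in a 3-marker space
markers <- rbind(matrix(rnorm(n * 3, mean = 0), ncol = 3),
                 matrix(rnorm(n * 3, mean = 10), ncol = 3))
group <- rep(c("Group_1", "Group_2"), each = n)
k <- 100  # number of neighbors, matching the standard case described above

# Full Euclidean distance matrix (only feasible for small data sets)
d <- as.matrix(dist(markers))
diag(d) <- Inf  # a cell should not count as its own neighbor

neighborScore <- vapply(seq_len(nrow(d)), function(i) {
  nn <- order(d[i, ])[seq_len(k)]            # indices of the k closest cells
  fracGroup2 <- mean(group[nn] == "Group_2") # fraction from the second group
  2 * fracGroup2 - 1                         # assumed linear map: 0 -> -1, 0.5 -> 0, 1 -> 1
}, numeric(1))

# With groups this far apart, every cell only has same-group neighbors
table(group, neighborScore)
```

With real data the groups overlap, so many cells would get intermediate values rather than the extremes.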

Installation

This is how to install the package, if that has not already been done:

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DepecheR")

Preparation of example data

For visualization purposes, some 2-dimensional representation of the data is necessary. This could just be two of the variables used to construct the probability vector, but it is more informative to include data on all variables, aided by e.g. tSNE or UMAP. In this case, we will display the data with tSNE.

library(DepecheR)
data("testData")
data("testDataSNE")
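If no precomputed tSNE or UMAP embedding is at hand, any two-column representation with one row per cell can play the same role. The sketch below uses PCA from base R as a simple stand-in, which is a different technique from tSNE and is chosen here only because it needs no extra packages. The data and the names toyMarkers and xY are hypothetical, and it is an assumption that the plotting function accepts any two-column matrix in place of testDataSNE$Y.

```r
# Hypothetical stand-in data: 500 cells measured on 8 markers
set.seed(42)
toyMarkers <- matrix(rnorm(500 * 8), ncol = 8,
                     dimnames = list(NULL, paste0("marker", 1:8)))

# PCA as a base-R alternative to tSNE/UMAP for obtaining 2D coordinates
pca <- prcomp(toyMarkers, center = TRUE, scale. = TRUE)

# Keep the first two principal components: one row per cell, two columns
xY <- pca$x[, 1:2]
dim(xY)
```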

Group probability plotting

This function differs from the other group differentiation functions in the DepecheR package in that no clustering output from the depeche function or any other clustering algorithm is needed as input. Instead, all the input data from which the Euclidean nearest neighbors should be identified needs to be supplied, together with a group identity vector and the 2D data used for display. Optionally, the resulting group probability vector can be returned, which is done in this example.

dataTrans <-
  testData[, c("SYK", "CD16", "CD57", "EAT.2", "CD8", "NKG2C", "CD2", "CD56")]

testData$groupProb <- groupProbPlot(xYData = testDataSNE$Y, 
                                    groupVector = testData$label, 
                                    groupName1 = "Group_1",
                                    groupName2 = "Group_2",
                                    dataTrans = dataTrans)
## [1] "Done with k-means"
## [1] "Now the first bit is done, and the iterative part takes off"
## [1] "Clusters 1 to 7 smoothed in 2.9159369468689 . Now, 13 clusters are left."
## [1] "Clusters 8 to 14 smoothed in 0.925199031829834 . Now, 6 clusters are left."
## [1] "Clusters 15 to 20 smoothed in 0.905373096466064 . Now, 0 clusters are left."

When this function is run, the output is a high-resolution plot saved to disk. A low-resolution variant of the result (made small because of Bioconductor size constraints) is shown here. In this case, the groups are so well separated that almost all cells show a 100% probability of belonging to one group or the other. This is unusual with real data, where the white fields are generally larger.

Group probability plot
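The returned probability vector can also be summarized numerically, as a complement to the plot. The following sketch bins a short, made-up vector of scores into three categories. The values, the variable names and the cut-offs at -0.5 and 0.5 are arbitrary assumptions for illustration; in a real analysis the vector stored in testData$groupProb would be used instead.

```r
# Hypothetical scores standing in for the vector returned by groupProbPlot
groupProb <- c(-1, -0.9, -0.2, 0, 0.1, 0.95, 1, 1)

# Arbitrary illustrative cut-offs: below -0.5, between, above 0.5
bins <- cut(groupProb,
            breaks = c(-1, -0.5, 0.5, 1),
            labels = c("mostly group 1", "mixed", "mostly group 2"),
            include.lowest = TRUE)

# Number of cells in each category
binCounts <- table(bins)
binCounts
```

A large "mixed" category would correspond to large white fields in the plot.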

Session information

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] DepecheR_1.23.0  knitr_1.48       BiocStyle_2.35.0
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6        ellipse_0.5.0       xfun_0.48          
##  [4] bslib_0.8.0         ggplot2_3.5.1       gmodels_2.19.1     
##  [7] caTools_1.18.3      ggrepel_0.9.6       collapse_2.0.16    
## [10] lattice_0.22-6      vctrs_0.6.5         tools_4.4.1        
## [13] doSNOW_1.0.20       bitops_1.0-9        generics_0.1.3     
## [16] parallel_4.4.1      tibble_3.2.1        fansi_1.0.6        
## [19] DEoptimR_1.1-3      rARPACK_0.11-0      pkgconfig_2.0.3    
## [22] Matrix_1.7-1        KernSmooth_2.23-24  RColorBrewer_1.1-3 
## [25] mixOmics_6.29.3     lifecycle_1.0.4     stringr_1.5.1      
## [28] compiler_4.4.1      FNN_1.1.4.1         gplots_3.2.0       
## [31] munsell_0.5.1       codetools_0.2-20    snow_0.4-4         
## [34] htmltools_0.5.8.1   sys_3.4.3           buildtools_1.0.0   
## [37] sass_0.4.9          yaml_2.3.10         beanplot_1.3.1     
## [40] gmp_0.7-5           tidyr_1.3.1         pillar_1.9.0       
## [43] jquerylib_0.1.4     MASS_7.3-61         BiocParallel_1.39.0
## [46] gdata_3.0.1         cachem_1.1.0        viridis_0.6.5      
## [49] iterators_1.0.14    foreach_1.5.2       robustbase_0.99-4-1
## [52] RSpectra_0.16-2     gtools_3.9.5        tidyselect_1.2.1   
## [55] digest_0.6.37       stringi_1.8.4       purrr_1.0.2        
## [58] reshape2_1.4.4      dplyr_1.1.4         maketools_1.3.1    
## [61] fastmap_1.2.0       grid_4.4.1          colorspace_2.1-1   
## [64] cli_3.6.3           magrittr_2.0.3      utf8_1.2.4         
## [67] corpcor_1.6.10      scales_1.3.0        rmarkdown_2.28     
## [70] matrixStats_1.4.1   igraph_2.1.1        gridExtra_2.3      
## [73] moments_0.14.1      evaluate_1.0.1      viridisLite_0.4.2  
## [76] rlang_1.1.4         Rcpp_1.0.13         glue_1.8.0         
## [79] BiocManager_1.30.25 jsonlite_1.8.9      plyr_1.8.9         
## [82] R6_2.5.1            ClusterR_1.3.3