snifter provides an R wrapper for the openTSNE implementation of fast interpolation-based t-SNE (FIt-SNE). It is built on basilisk and reticulate. This vignette gives a brief overview of typical use on scRNAseq data; it is not a comprehensive guide to the options available in the package.
To gain a full understanding of those options, we strongly recommend reviewing both the snifter documentation and the openTSNE documentation.
We will illustrate the use of snifter by generating some toy data. First, we’ll load the needed libraries, and set a random seed to ensure the simulated data are reproducible (note: it is good practice to ensure that a t-SNE embedding is robust by running the algorithm multiple times).
library("snifter")
library("ggplot2")
theme_set(theme_bw())
set.seed(42)
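# Number of observations per group and number of features.
n_obs <- 500
n_feats <- 200
# Each group has its own vector of per-feature means.
means_1 <- rnorm(n_feats)
means_2 <- rnorm(n_feats)
# Simulate unit-variance Gaussian values around each group's means
# (features in rows, observations in columns).
counts_a <- replicate(n_obs, rnorm(n_feats, means_1))
counts_b <- replicate(n_obs, rnorm(n_feats, means_2))
# Combine the groups and transpose so that observations are in rows.
counts <- t(cbind(counts_a, counts_b))
label <- rep(c("A", "B"), each = n_obs)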
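As a quick sanity check, the shape of the simulated data can be confirmed before any embedding is computed. The sketch below repeats the simulation so that it runs on its own; it uses only base R and assumes nothing beyond the code above.

```r
# Re-create the toy data exactly as above.
set.seed(42)
n_obs <- 500
n_feats <- 200
means_1 <- rnorm(n_feats)
means_2 <- rnorm(n_feats)
counts_a <- replicate(n_obs, rnorm(n_feats, means_1))
counts_b <- replicate(n_obs, rnorm(n_feats, means_2))
counts <- t(cbind(counts_a, counts_b))
label <- rep(c("A", "B"), each = n_obs)

# Observations are in rows: 2 * n_obs rows and n_feats columns.
dim(counts)
# Each group contributes n_obs observations.
table(label)
```

Here `dim(counts)` should report 1000 rows and 200 columns, and `table(label)` should show 500 observations in each of the two groups.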
The main functionality of the package lies in the fitsne function, which returns a matrix of t-SNE co-ordinates. In this case, we pass in the simulated matrix, with one observation per row, and colour points by their simulated group label.
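A minimal sketch of this step on the toy data might look as follows. It repeats the simulation so that it runs on its own, and it assumes, as in the training example later in this vignette, that fitsne accepts a matrix with observations in rows together with a random_state argument. The object name fit is illustrative only.

```r
library("snifter")
library("ggplot2")

# Re-create the toy data so that the example is self-contained.
set.seed(42)
n_obs <- 500
n_feats <- 200
means_1 <- rnorm(n_feats)
means_2 <- rnorm(n_feats)
counts_a <- replicate(n_obs, rnorm(n_feats, means_1))
counts_b <- replicate(n_obs, rnorm(n_feats, means_2))
counts <- t(cbind(counts_a, counts_b))
label <- rep(c("A", "B"), each = n_obs)

# Compute the embedding; `fit` holds one row of co-ordinates per observation.
fit <- fitsne(counts, random_state = 42L)

# Plot the two t-SNE dimensions, coloured by the simulated group label.
ggplot() +
    geom_point(aes(fit[, 1], fit[, 2], colour = label)) +
    scale_colour_discrete(name = "Cluster") +
    labs(x = "t-SNE 1", y = "t-SNE 2")
```

Changing random_state and re-running is one simple way to check that the overall structure of the embedding is stable across runs.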
The openTSNE package, and by extension snifter, also allows new data to be embedded into an existing t-SNE embedding. Here, we will split the data into “training” and “test” sets. Following this, we generate a t-SNE embedding using the training data, and project the test data into this embedding.
test_ind <- sample(nrow(counts), nrow(counts) / 2)
train_ind <- setdiff(seq_len(nrow(counts)), test_ind)
train_mat <- counts[train_ind, ]
test_mat <- counts[test_ind, ]
train_label <- label[train_ind]
test_label <- label[test_ind]
embedding <- fitsne(train_mat, random_state = 42L)
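Projection is only informative if the test observations are genuinely unseen, so it may be worth confirming that the split is a true partition of the rows. The following base R sketch uses a stand-in matrix with the same shape as counts (an assumption made purely to keep the example self-contained) and applies the same splitting code.

```r
set.seed(42)
# Stand-in for `counts`: 1000 observations in rows, 200 features in columns.
counts <- matrix(rnorm(1000 * 200), nrow = 1000)

# Same splitting logic as above.
test_ind <- sample(nrow(counts), nrow(counts) / 2)
train_ind <- setdiff(seq_len(nrow(counts)), test_ind)
train_mat <- counts[train_ind, ]
test_mat <- counts[test_ind, ]

# No observation appears in both sets.
length(intersect(train_ind, test_ind))
# Together the two sets cover every row exactly once.
identical(sort(c(train_ind, test_ind)), seq_len(nrow(counts)))
```

The first check should give 0 and the second TRUE. Both matrices keep all 200 feature columns, so the training and test data are laid out identically when passed to project.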
Once we have generated the embedding, we can project the unseen test data into it.
new_coords <- project(embedding, new = test_mat, old = train_mat)
ggplot() +
    geom_point(
        aes(embedding[, 1], embedding[, 2],
            colour = train_label,
            shape = "Train"
        )
    ) +
    geom_point(
        aes(new_coords[, 1], new_coords[, 2],
            colour = test_label,
            shape = "Test"
        )
    ) +
    scale_colour_discrete(name = "Cluster") +
    scale_shape_discrete(name = NULL) +
    labs(x = "t-SNE 1", y = "t-SNE 2")
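Beyond inspecting the plot, one rough way to summarise a projection is to ask whether each projected point lands nearest to a training point with the same label. The sketch below is not part of snifter: nearest_label is a hypothetical helper written in base R, and the toy co-ordinates merely stand in for embedding and new_coords so that the example runs on its own.

```r
# Hypothetical helper (not part of snifter): for each projected point, return
# the label of its nearest training point in the two-dimensional embedding.
nearest_label <- function(train_xy, new_xy, train_label) {
    train_xy <- as.matrix(train_xy)
    new_xy <- as.matrix(new_xy)
    vapply(seq_len(nrow(new_xy)), function(i) {
        # Squared Euclidean distance from point i to every training point.
        d2 <- (train_xy[, 1] - new_xy[i, 1])^2 +
            (train_xy[, 2] - new_xy[i, 2])^2
        train_label[which.min(d2)]
    }, character(1))
}

# Toy co-ordinates standing in for `embedding` and `new_coords`.
toy_train <- rbind(c(-5, 0), c(-4, 1), c(5, 0), c(4, -1))
toy_train_label <- c("A", "A", "B", "B")
toy_new <- rbind(c(-4.5, 0.2), c(4.8, -0.3))
toy_new_label <- c("A", "B")

pred <- nearest_label(toy_train, toy_new, toy_train_label)
# Proportion of projected points whose nearest training point shares their label.
mean(pred == toy_new_label)
```

With the objects from this vignette, the analogous call would be `mean(nearest_label(embedding, new_coords, train_label) == test_label)`. This is only a coarse summary, since distances in a t-SNE embedding should be interpreted with caution.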
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_3.5.1 snifter_1.17.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.7-1 gtable_0.3.6 jsonlite_1.8.9
#> [4] compiler_4.4.2 BiocManager_1.30.25 filelock_1.0.3
#> [7] Rcpp_1.0.13-1 parallel_4.4.2 assertthat_0.2.1
#> [10] jquerylib_0.1.4 scales_1.3.0 png_0.1-8
#> [13] yaml_2.3.10 fastmap_1.2.0 reticulate_1.40.0
#> [16] lattice_0.22-6 R6_2.5.1 labeling_0.4.3
#> [19] knitr_1.49 tibble_3.2.1 maketools_1.3.1
#> [22] munsell_0.5.1 pillar_1.9.0 bslib_0.8.0
#> [25] rlang_1.1.4 utf8_1.2.4 cachem_1.1.0
#> [28] dir.expiry_1.15.0 xfun_0.49 sass_0.4.9
#> [31] sys_3.4.3 cli_3.6.3 withr_3.0.2
#> [34] magrittr_2.0.3 digest_0.6.37 grid_4.4.2
#> [37] basilisk_1.19.0 lifecycle_1.0.4 vctrs_0.6.5
#> [40] evaluate_1.0.1 glue_1.8.0 farver_2.1.2
#> [43] buildtools_1.0.0 fansi_1.0.6 colorspace_2.1-1
#> [46] rmarkdown_2.29 pkgconfig_2.0.3 basilisk.utils_1.19.0
#> [49] tools_4.4.2 htmltools_0.5.8.1