Cell-Type-specific Spatially Variable gene detection, or CTSV, is an R package for identifying cell-type-specific spatially variable genes in bulk sptial transcriptomics data. In this Vignette, we will introduce a standard workflow of CTSV. By utilizing single-cell RNA sequencing data as reference, we can first use existing deconvolution methods to obtain cell type proportions for each spot. Subsequently, we take the cell type proportions, location coordinates and spatial expression data as input to CTSV.
CTSV
CTSV is an
R
package available in the Bioconductor repository. It requires
installing the R
open source statistical programming
language, which can be accessed on any operating system from CRAN. Next, you can install
CTSV
by using the following commands in your R
session:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("CTSV", version = "devel")
## Check that you have a valid Bioconductor installation
BiocManager::valid()
If there are any issues with the installation procedure or package features, the best place would be to commit an issue at the GitHub repository.
In order to run RCTD, the first step is to get cell-type proportions. There are some deconvolution methods such as RCTD, SPOTlight, SpatialDWLS and CARD. We provide an example data including the observed raw count bulk ST data, the location coordinate matrix, the cell-type proportion matrix and the true SV gene patterns.
suppressPackageStartupMessages(library(CTSV))
#> Warning: multiple methods tables found for 'intersect'
#> Warning: multiple methods tables found for 'union'
#> Warning: multiple methods tables found for 'intersect'
#> Warning: multiple methods tables found for 'setdiff'
suppressPackageStartupMessages(library(SpatialExperiment))
data("CTSVexample_data", package="CTSV")
spe <- CTSVexample_data[[1]]
W <- CTSVexample_data[[2]]
gamma_true <- CTSVexample_data[[3]]
Y <- assay(spe)
# dimension of bulk ST data
dim(Y)
#> [1] 20 100
# dimension of cell-type proportion matrix:
dim(W)
#> [1] 100 2
# SV genes in each cell type:
colnames(Y)[which(gamma_true[,1] == 1)]
#> [1] "spot11" "spot12" "spot13"
colnames(Y)[which(gamma_true[,2] == 1)]
#> [1] "spot14" "spot15" "spot16"
# Number of SV genes at the aggregated level:
sum(rowSums(gamma_true)>0)
#> [1] 6
We are now ready to run CTSV on the bulk ST data using
ctsv
function.
spe
is a SpatialExperiment class object.W
is the cell-type-specific matrix with n × K dimensions, where
K is the number of cell
types.num_core:
for parallel processing, the number of cores
used. If set to 1, parallel processing is not used. The system will
additionally be checked for number of available cores. Note, that we
recommend setting num_core
to at least 4
or
8
to improve efficiency.BPPARAM:
Optional additional argument for
parallelization. The default is NULL, in which case
num_core
will be used.The results of CTSV are located in a list.
pval
, combined p-values, a G × 2K matrix.qval
stores adjusted q-values of the combined p-values,
it is a G × 2K
matrix.# View on q-value matrix
head(result$qval)
#> [,1] [,2] [,3] [,4]
#> gene1 0.19608202 0.05511312 0.4798779 0.1473743
#> gene2 0.48137997 0.13405250 0.5496446 0.1111611
#> gene3 0.51617787 0.44324074 0.5396361 0.4848865
#> gene4 0.32941845 0.46443032 0.3294184 0.5622569
#> gene5 0.06301154 0.41376637 0.5202114 0.1050798
#> gene6 0.13405250 0.52238343 0.3294184 0.3078813
Then we want to extra SV genes with an FDR level at 0.05 using
svGene
function. We use the q-value matrix
qval
returned by the CTSV
function and a
threshold of 0.05 as input. The output of the svGene
is a
list containing two elements, the first of which is a G × 2K 0-1 matrix
indicating SV genes in each cell type and axis, denoted as
SV
. The second element is a list with names of SV genes in
each cell type, denoted as SVGene
.
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] SpatialExperiment_1.17.0 SingleCellExperiment_1.29.1
#> [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
#> [5] GenomicRanges_1.59.0 GenomeInfoDb_1.43.0
#> [7] IRanges_2.41.1 S4Vectors_0.45.2
#> [9] BiocGenerics_0.53.3 generics_0.1.3
#> [11] MatrixGenerics_1.19.0 matrixStats_1.4.1
#> [13] CTSV_1.9.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 rjson_0.2.23 xfun_0.49
#> [4] bslib_0.8.0 ggplot2_3.5.1 lattice_0.22-6
#> [7] vctrs_0.6.5 tools_4.4.2 parallel_4.4.2
#> [10] tibble_3.2.1 fansi_1.0.6 pkgconfig_2.0.3
#> [13] Matrix_1.7-1 lifecycle_1.0.4 GenomeInfoDbData_1.2.13
#> [16] compiler_4.4.2 stringr_1.5.1 munsell_0.5.1
#> [19] codetools_0.2-20 htmltools_0.5.8.1 sys_3.4.3
#> [22] buildtools_1.0.0 sass_0.4.9 yaml_2.3.10
#> [25] pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
#> [28] MASS_7.3-61 BiocParallel_1.41.0 DelayedArray_0.33.2
#> [31] cachem_1.1.0 magick_2.8.5 abind_1.4-8
#> [34] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.4
#> [37] dplyr_1.1.4 reshape2_1.4.4 maketools_1.3.1
#> [40] splines_4.4.2 fastmap_1.2.0 grid_4.4.2
#> [43] SparseArray_1.7.2 colorspace_2.1-1 cli_3.6.3
#> [46] magrittr_2.0.3 S4Arrays_1.7.1 utf8_1.2.4
#> [49] scales_1.3.0 UCSC.utils_1.3.0 rmarkdown_2.29
#> [52] XVector_0.47.0 httr_1.4.7 qvalue_2.39.0
#> [55] evaluate_1.0.1 knitr_1.49 pscl_1.5.9
#> [58] rlang_1.1.4 Rcpp_1.0.13-1 glue_1.8.0
#> [61] BiocManager_1.30.25 jsonlite_1.8.9 R6_2.5.1
#> [64] plyr_1.8.9 zlibbioc_1.52.0