Example_Analysis

Introduction

We developed a statistical method for spatially informed cell type deconvolution for spatial transcriptomics. Briefly,CARD is a reference-based deconvolution method that estimates cell type composition in spatial transcriptomics based on cell type specific expression information obtained from a reference scRNA-seq data. A key feature of CARD is its ability to accommodate spatial correlation in the cell type composition across tissue locations, enabling accurate and spatially informed cell type deconvolution as well as refined spatial map construction. CARD relies on an efficient optimization algorithm for constrained maximum likelihood estimation and is scalable to spatial transcriptomics with tens of thousands of spatial locations and tens of thousands of genes.

Installation

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("CARDspa")

This tutorial is the example analysis with CARDspa on the human pancreatic ductal adenocarcinomas data from Moncada et al, 2020. Please note that we are using edited data, see the tutorial for an example using the complete data.

Required input data

CARD requires two types of input data: - spatial transcriptomics count data, along with spatial location information.
- single cell RNAseq (scRNA-seq) count data, along with meta information indicating the cell type information and the sample (subject) information for each cell.

The example data for running the tutorial is included in the package. Here are the details about the required data input illustrated by the example datasets.

1. spatial transcriptomics data, e.g.,

library(CARDspa)
library(RcppML)
library(NMF)
#> Loading required package: registry
#> Registered S3 methods overwritten by 'registry':
#>   method               from 
#>   print.registry_field proxy
#>   print.registry_entry proxy
#> Loading required package: rngtools
#> Loading required package: cluster
#> NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 2/2
#>   To enable shared memory capabilities, try: install.extras('
#> NMF
#> ')
#> 
#> Attaching package: 'NMF'
#> The following object is masked from 'package:generics':
#> 
#>     fit
#> The following object is masked from 'package:RcppML':
#> 
#>     nmf
library(RcppArmadillo)
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'matrixStats'
#> The following objects are masked from 'package:Biobase':
#> 
#>     anyMissing, rowMedians
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> The following object is masked from 'package:Biobase':
#> 
#>     rowMedians
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:NMF':
#> 
#>     nrun
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
library(SpatialExperiment)
library(ggplot2)
#### load the example spatial transcriptomics count data,
data(spatial_count)
spatial_count[1:4, 1:4]
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>             10x10 10x13 10x14 10x15
#> ALDH1L1.AS2     .     .     .     .
#> EIF4A1P11       .     .     .     .
#> CYTH2           .     .     .     .
#> AL121612.1      7     .     1     .

The spatial transcriptomics count data must be in the format of matrix or sparseMatrix, while each row represents a gene and each column represents a spatial location. The column names of the spatial data can be in the “XcoordxYcoord” (i.e., 10x10) format, but you can also maintain your original spot names, for example, barcode names.

#### load the example spatial location data,
data(spatial_location)
spatial_location[1:4, ]
#>        x  y
#> 10x10 10 10
#> 10x13 10 13
#> 10x14 10 14
#> 10x15 10 15

The spatial location data must be in the format of data frame while each row represents a spatial location, the first column represents the x coordinate and the second column represents the y coordinate. The rownames of the spatial location data frame should match exactly with the column names of the spatial_count.

2. single cell RNAseq ((scRNA-seq)) data, e.g.,

data(sc_count)
sc_count[1:4, 1:4]
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>         Cell1 Cell2 Cell3 Cell4
#> ZNF641      .     .     .     .
#> XAB2        .     .     1     .
#> C1orf87     .     .     .     .
#> USP30       .     .     .     .

The scRNA-seq count data must be in the format of matrix or sparseMatrix, while each row represents a gene and each column represents a cell.

data(sc_meta)
sc_meta[1:4, ]
#>       cellID                             cellType sampleInfo
#> Cell1  Cell1                         Acinar_cells    sample1
#> Cell2  Cell2          Ductal_terminal_ductal_like    sample1
#> Cell3  Cell3          Ductal_terminal_ductal_like    sample1
#> Cell4  Cell4 Ductal_CRISP3_high-centroacinar_like    sample1

The scRNAseq meta data must be in the format of data frame while each row represents a cell. The rownames of the scRNAseq meta data should match exactly with the column names of the scRNAseq count data. The sc_meta data must contain the column indicating the cell type assignment for each cell (e.g., “cellType” column in the example sc_meta data). Sample/subject information should be provided, if there is only one sample, we can add a column by sc_meta$sampleInfo = "sample1".

We suggest the users to check their single cell RNASeq data carefully before running CARD. We suggest the users to input the single cell RNAseq data with each cell type containing at least 2 cells. i.e.  print(table(sc_meta$cellType,useNA = “ifany”))

Cell Type Deconvolution

1. Deconvolution using CARD

We can use CARD_deconvolution to deconvolute the spatial transcriptomics data. The essential inputs are: - sc_count: Matrix or sparse matrix of raw scRNA-seq count data, each row represents a gene and each column represents a cell. This sc_count data serves as a reference for the cell type deconvolution for spatial transcriptomics data. - sc_meta: Data frame, with each row representing the cell type and/or sample information of a specific cell. The row names of this data frame should match exactly with the column names of the sc_count data. The sc_meta data must contain the column indicating the cell type assignment for each cell (e.g., “cellType” column in the example sc_meta data). - spatial_count: Matrix or sparse matrix of raw spatial resolved transcriptomics count data, each row represents a gene and each column represents a spatial location. This is the spatial transcriptomics data that we are interested to deconvolute. - spatial_location: Data frame, with two columns representing the x and y coordinates of the spatial location. The rownames of this data frame should match eaxctly with the columns of the spatial_count. - ct.varname: Caracter, the name of the column in sc_meta that specifies the cell type assignment. - ct.select: Vector of cell type names that you are interested in to deconvolute, default as NULL. If NULL, then use all cell types provided by single cell dataset. - sample.varname: Character,the name of the column in sc_meta that specifies the sample/subject information. If NULL, we just use the whole data as one sample/subject. - minCountGene: Numeric, include spatial locations where at least this number of counts detected. Default is 100. - minCountSpot: Numeric, include genes where at least this number of spatial locations that have non-zero expression. Default is 5.

This function first create a CARD object and then do deconvolution. Finally return a SpatialExperiment object. The results are stored in CARD_obj$Proportion_CARD

CARD is computationally fast and memory efficient. CARD relies on an efficient optimization algorithm for constrained maximum likelihood estimation and is scalable to spatial transcriptomics with tens of thousands of spatial locations and tens of thousands of genes. For the example dataset with the sample size of 428 locations, it takes within a minute to finish the deconvolution.

set.seed(seed = 20200107)
CARD_obj <- CARD_deconvolution(
    sc_count = sc_count,
    sc_meta = sc_meta,
    spatial_count = spatial_count,
    spatial_location = spatial_location,
    ct_varname = "cellType",
    ct_select = unique(sc_meta$cellType),
    sample_varname = "sampleInfo",
    mincountgene = 100,
    mincountspot = 5 )
#> ## QC on scRNASeq dataset! ...
#> ## QC on spatially-resolved dataset! ...
#> ## create reference matrix from scRNASeq...
#> ## Select Informative Genes! ...
#> ## Deconvolution Starts! ...
#> ## Deconvolution Finish! ...
## QC on scRNASeq dataset! ...
## QC on spatially-resolved dataset! ..
## create reference matrix from scRNASeq...
## Select Informative Genes! ...
## Deconvolution Starts! ...
## Deconvolution Finish! ...

And CARDspa also supports using SingleCellExperiment object and SingleCellExperiment object, you can run the following code:

## create sce object
sce <- SingleCellExperiment(assay = list(counts = sc_count),
                            colData = sc_meta)
## create spe object
spe <- SpatialExperiment(assay = list(counts = spatial_count),
                        spatialCoords  = as.matrix(spatial_location)
                        )
celltypes <- unique(sc_meta$cellType)
    
set.seed(seed = 20200107)
CARD_obj <- CARD_deconvolution(
    spe = spe,
    sce = sce,
    sc_count = NULL,
    sc_meta = NULL,
    spatial_count = NULL,
    spatial_location = NULL,
    ct_varname = "cellType",
    ct_select = celltypes,
    sample_varname = "sampleInfo",
    mincountgene = 100,
    mincountspot = 5
)
#> ## QC on scRNASeq dataset! ...
#> ## QC on spatially-resolved dataset! ...
#> ## create reference matrix from scRNASeq...
#> ## Select Informative Genes! ...
#> ## Deconvolution Starts! ...
#> ## Deconvolution Finish! ...

The spatial data are stored in assays(CARD_obj)$spatial_countMat and spatialCoords(CARD_obj) while the scRNA-seq data is stored in CARD_obj@metadata$sc_eset in the format of SingleCellExperiment. The results are stored in CARD_obj$Proportion_CARD.

print(CARD_obj$Proportion_CARD[1:2, ])
#>       Acinar_cells Ductal_terminal_ductal_like
#> 10x10   0.02822940                 0.002398456
#> 10x13   0.01838797                 0.037258897
#>       Ductal_CRISP3_high-centroacinar_like Cancer_clone_A Ductal_MHC_Class_II
#> 10x10                           0.01616564   0.0002066411           0.1180560
#> 10x13                           0.50388060   0.0958701424           0.1474945
#>       Cancer_clone_B      mDCs_A Ductal_APOL1_high-hypoxic  Tuft_cells
#> 10x10     0.01831038 0.034041601               0.005807915 0.023857592
#> 10x13     0.04352440 0.005606759               0.009695791 0.002570037
#>            mDCs_B        pDCs Endocrine_cells Endothelial_cells Macrophages_A
#> 10x10 0.204028802 0.001366423    0.0006522617        0.08394567  1.249407e-06
#> 10x13 0.001295846 0.001051143    0.0039317752        0.02744028  7.680739e-09
#>         Mast_cells Macrophages_B T_cells_&_NK_cells    Monocytes       RBCs
#> 10x10 2.390952e-07  3.830345e-05       4.859000e-02 4.281992e-07 0.00186420
#> 10x13 2.075633e-02  2.655632e-09       9.415854e-05 5.860814e-05 0.00359144
#>       Fibroblasts
#> 10x10  0.41243879
#> 10x13  0.07749133

2. Visualize the proportion for each cell type

First, we jointly visualize the cell type proportion matrix through scatterpie plot.Note that here because the number of spots is relatively small, so jointly visualize the cell type proportion matrix in the scatterpie plot format is duable. We do not recommend users to visualize this plot when the number of spots is > 500. Instead, we recommend users to visualize the proportion directly, i.e., using the function CARD_visualize_prop(). Details of using this function see the next example.

## set the colors. Here, I just use the colors in the manuscript, if the color
## is not provided, the function will use default color in the package.
colors <- c(
    "#FFD92F", "#4DAF4A", "#FCCDE5", "#D9D9D9", "#377EB8", "#7FC97F",
    "#BEAED4", "#FDC086", "#FFFF99", "#386CB0", "#F0027F", "#BF5B17",
    "#666666", "#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E",
    "#E6AB02", "#A6761D"
)
p1 <- CARD_visualize_pie(
    proportion = CARD_obj$Proportion_CARD,
    spatial_location = spatialCoords(CARD_obj),
    colors = colors,
    radius = 0.52
) ### You can choose radius = NULL or your own radius number
print(p1)

Then, we can select some interested cell types to visualize separately.

## select the cell type that we are interested
ct.visualize <- c(
    "Acinar_cells", "Cancer_clone_A", "Cancer_clone_B",
    "Ductal_terminal_ductal_like",
    "Ductal_CRISP3_high-centroacinar_like",
    "Ductal_MHC_Class_II",
    "Ductal_APOL1_high-hypoxic",
    "Fibroblasts"
)
## visualize the spatial distribution of the cell type proportion
p2 <- CARD_visualize_prop(
    proportion = CARD_obj$Proportion_CARD,
    spatial_location = spatialCoords(CARD_obj),
    ### selected cell types to visualize
    ct_visualize = ct.visualize,
    ### if not provide, we will use the default colors
    colors = c("lightblue", "lightyellow", "red"),
    ### number of columns in the figure panel
    NumCols = 4,
    ### point size in ggplot2 scatterplot
    pointSize = 3.0
)
print(p2)

3. Visualize the proportion for two cell types

We added a new visualization function to visualize the distribution of two cell types on the same post.

## visualize the spatial distribution of two cell types on the same plot
p3 <- CARD_visualize_prop_2CT(
    ### Cell type proportion estimated by CARD
    proportion = CARD_obj$Proportion_CARD,
    ### spatial location information
    spatial_location = spatialCoords(CARD_obj),
    ### two cell types you want to visualize
    ct2_visualize = c("Cancer_clone_A", "Cancer_clone_B"),
    ### two color scales
    colors = list(
        c("lightblue", "lightyellow", "red"),
        c("lightblue", "lightyellow", "black")
    )
)
print(p3)

5. Visualize the cell type proportion correlation

# if not provide, we will use the default colors
p4 <- CARD_visualize_Cor(CARD_obj$Proportion_CARD, colors = NULL)
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
print(p4)

Refined spatial map

A unique feature of CARD is its ability to model the spatial correlation in cell type composition across tissue locations, thus enabling spatially informed cell type deconvolution. Modeling spatial correlation allows us to not only accurately infer the cell type composition on each spatial location, but also impute cell type compositions and gene expression levels on unmeasured tissue locations, facilitating the construction of a refined spatial tissue map with a resolution much higher than that measured in the original study. Specifically, CARD constructed a refined spatial map through the function CARD_imputation. The essential inputs are:

  • CARD_object: SpatialExperiment Object created by CARD_deconvolution with estimated cell type compositions on the original spatial resolved transcriptomics data.
  • NumGrids: Initial number of newly grided spatial locations. The final number of newly grided spatial locations will be lower than this value since the newly grided locations outside the shape of the tissue will be filtered.
  • ineibor: Numeric, number of neighbors used in the imputation on newly grided spatial locations, default is 10.

Briefly, CARD first outlined the shape of the tissue by applying a two-dimensional concave hull algorithm on the existing locations, then perform imputation on the newly grided spatial locations. We recommend to check the exisiting spatial locations to see if there are outliers that are seriously affect the detection of the shape.

1. Imputation on the newly grided spatial locations

CARD_obj <- CARD_imputation(
    CARD_obj, 
    num_grids = 2000, 
    ineibor = 10, 
    exclude = NULL)
#> ## The rownames of locations are matched ...
#> ## Make grids on new spatial locations ...
## The rownames of locations are matched ...
## Make grids on new spatial locations ...

The results are store in CARD_obj$refined_prop and assays(CARD_obj)$refined_expression

## Visualize the newly grided spatial locations to see if the shape is correctly
## detected. If not, the user can provide the row names of the excluded spatial
## location data into the CARD_imputation function
location_imputation <- cbind.data.frame(
    x = as.numeric(sapply(
        strsplit(rownames(CARD_obj$refined_prop), split = "x"), "[", 1
    )),
    y = as.numeric(sapply(
        strsplit(rownames(CARD_obj$refined_prop), split = "x"), "[", 2
    ))
)
rownames(location_imputation) <- rownames(CARD_obj$refined_prop)
library(ggplot2)
p5 <- ggplot(
    location_imputation,
    aes(x = x, y = y)
) +
    geom_point(shape = 22, color = "#7dc7f5") +
    theme(
        plot.margin = margin(0.1, 0.1, 0.1, 0.1, "cm"),
        legend.position = "bottom",
        panel.background = element_blank(),
        plot.background = element_blank(),
        panel.border = element_rect(colour = "grey89", fill = NA, 
                                    linewidth = 0.5)
    )
print(p5)

2. Visualize the cell type proportion at an enhanced resolution

Now we can use the same CARD_visualize_prop function to visualize the cell type proportion at the enhanced resolution. But this time, the input of the function should be the imputed cell typr propotion and corresponding newly grided spatial locations.

p6 <- CARD_visualize_prop(
    proportion = CARD_obj$refined_prop,
    spatial_location = location_imputation,
    ct_visualize = ct.visualize,
    colors = c("lightblue", "lightyellow", "red"),
    NumCols = 4
)
print(p6)

3. Visualize the marker gene expression at an enhanced resolution

After we obtained cell type proportion at the enhanced resolution by CARD, we can predict the spatial gene expression at the enhanced resolution. The following code is to visualize the marker gene expression at an enhanced resolution.

p7 <- CARD_visualize_gene(
    spatial_expression = assays(CARD_obj)$refined_expression,
    spatial_location = location_imputation,
    gene_visualize = c("A4GNT", "AAMDC", "CD248"),
    colors = NULL,
    NumCols = 6
)
print(p7)

Now, compare with the original resolution, CARD facilitates the construction of a refined spatial tissue map with a resolution much higher than that measured in the original study.

p8 <- CARD_visualize_gene(
    spatial_expression = metadata(CARD_obj)$spatial_countMat,
    spatial_location = metadata(CARD_obj)$spatial_location,
    gene_visualize = c("A4GNT", "AAMDC", "CD248"),
    colors = NULL,
    NumCols = 6
)
print(p8)

Extension of CARD in a reference-free version: CARDfree

We extended CARD to enable reference-free cell type deconvolution and eliminate the need for the single-cell reference data. We refer to this extension as the reference-free version of CARD, or simply as CARDfree. Different from CARD, CARDfree no longer requires an external single-cell reference dataset and only needs users to input a list of gene names for previously known cell type markers We use the same exmple dataset to illustrate the use of CARDfree. In addition to the exmple dataset, CARDfree also requires the input of marker gene list, which is in a list format with each element of the list being the cell type specific gene markers. The example marker list for runing the tutorial is included.

Similar to CARD, we will first need to create a CARDfree object with the spatial transcriptomics dataset and the marker gene list

1. Deconvolution using CARDfree

We can use CARD_refFree to do reference-free deconvolution. This function frist creat a CARDfree object and then do deconvolution. Briefly, the essential inputs are the same as the function CARD_deconvolution, except that this function does not require the single cell count and meta information matrix. Instead, it requires a markerList.

## deconvolution using CARDfree
data(markerList)
set.seed(seed = 20200107)
CARDfree_obj <- CARD_refFree(
    markerlist = markerList,
    spatial_count = spatial_count,
    spatial_location = spatial_location,
    mincountgene = 100,
    mincountspot = 5
)
#> ## Number of unique marker genes: 1711 for 20 cell types ...
#> ## Deconvolution Finish! ...

Similarly, you can use SpatialExperiment object.

data(markerList)
set.seed(seed = 20200107)
CARDfree_obj <- CARD_refFree(
    markerlist = markerList,
    spatial_count = NULL,
    spatial_location = NULL,
    spe = spe,
    mincountgene = 100,
    mincountspot = 5
)
#> ## Number of unique marker genes: 1711 for 20 cell types ...
#> ## Deconvolution Finish! ...

The spatial data are stored in assays(CARDfree_obj)$spatial_countMat and spatialCoords(CARDfree_obj) while the marker list is stored in CARDfree_obj@metadata$markerList in the format of list. The results are stored in CARDfree_obj$Proportion_CARD.

## One limitation of reference-free version of CARD is that the cell
## types inferred
## from CARDfree do not come with a cell type label. It might be difficult to
## interpret the results.
print(CARDfree_obj$Proportion_CARD[1:2, ])
#>              CT1          CT2          CT3          CT4          CT5
#> 10x10 0.56801726 6.769247e-02 1.588566e-12 4.774388e-02 4.516061e-03
#> 10x13 0.01074093 6.617604e-42 1.145420e-01 2.500291e-21 2.693620e-27
#>                CT6        CT7          CT8          CT9         CT10
#> 10x10 7.621232e-30 0.09191463 0.0008195946 1.756793e-02 1.014314e-02
#> 10x13 1.316703e-01 0.21314265 0.1151354998 1.816705e-06 9.278407e-45
#>               CT11        CT12         CT13         CT14         CT15
#> 10x10 8.808949e-02 0.004413839 1.169808e-14 1.195882e-14 3.446914e-03
#> 10x13 7.412540e-12 0.163930305 5.222797e-61 7.274610e-09 1.180371e-41
#>               CT16       CT17       CT18          CT19       CT20
#> 10x10 0.0058732826 0.01014773 0.05644397  8.872218e-24 0.02316981
#> 10x13 0.0001647055 0.02645184 0.01083576 2.048134e-109 0.21338415

3. Visualization of the results of CARDfree

Note that here because the number of spots is relatively small, so jointly visualize the cell type proportion matrix in the scatterpie plot format is duable. We do not recommend users to visualize this plot when the number of spots is > 500. Instead, we recommend users to visualize the proportion directly, i.e., using the function CARD_visualize_prop().

colors <- c(
    "#FFD92F", "#4DAF4A", "#FCCDE5", "#D9D9D9", "#377EB8", "#7FC97F", "#BEAED4",
    "#FDC086", "#FFFF99", "#386CB0", "#F0027F", "#BF5B17", "#666666",
    "#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E", "#E6AB02",
    "#A6761D"
)
### In order to maximumply match with the original results of CARD, we order the
### colors to generally match with the results infered by CARD
current_data <- CARDfree_obj$Proportion_CARD
new_order <- current_data[, c(
    8, 10, 14, 2, 1, 6, 12, 18, 7, 13, 20, 19, 16,
    17, 11, 15, 4, 9, 3, 5
)]
CARDfree_obj$Proportion_CARD <- new_order
colnames(CARDfree_obj$Proportion_CARD) <- paste0("CT", 1:20)
p9 <- CARD_visualize_pie(CARDfree_obj$Proportion_CARD,
    spatialCoords(CARDfree_obj),
    colors = colors
)
print(p9)

Extension of CARD for single cell resolution mapping

We also extended CARD to facilitate the construction of single-cell resolution spatial transcriptomics from non-single-cell resolution spatial transcriptomics. Details of the algorithm see the main text. Briefly, we infer the single cell resolution gene expression for each measured spatial location from the non-single cell resolution spatial transcriptomics data based on reference scRNaseq data we used for deconvolution. The procedure is implemented in the function CARD_SCMapping. The essential inputs are: - CARD_object: CARD object create by the createCARDObject function. This one should be the one after we finish the deconvolution procedure - shapeSpot: a character indicating whether the sampled spatial coordinates for single cells locating in a Square-like region or a Circle-like region. The center of this region is the measured spatial location in the non-single cell resolution spatial transcriptomics data. The default is “Square”, and the other option is “Circle” - numCell: a numeric value indicating the number of cores used to accelerate the procedure.

#### Note that here the shapeSpot is the user defined variable which
#### indicates the capturing area of single cells. Details see above.
set.seed(seed = 20210107)
scMapping <- CARD_scmapping(CARD_obj, shapeSpot = "Square", numcell = 20, 
                            ncore = 2)
print(scMapping)
#> class: SingleCellExperiment 
#> dim: 5814 8460 
#> metadata(0):
#> assays(1): counts
#> rownames(5814): ZNF641 XAB2 ... APOBEC3G PCYT1B
#> rowData names(1): rownames(count_ct)
#> colnames(8460): Cell1531:10x10:9.97518192091957x9.83765210071579
#>   Cell973:10x10:10.1896061601583x9.94081321195699 ...
#>   Cell738:9x32:9.26695604040287x32.4533654800616
#>   Cell1002:9x32:9.40997453313321x32.3579135271721
#> colData names(7): x y ... CT Cell
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
### spatial location info and expression count of the single cell resolution
### data
MapCellCords <- as.data.frame(colData(scMapping))
count_SC <- assays(scMapping)$counts

The results are stored in a SingleCellExperiment object with mapped single cell resolution counts stored in the assays slot and the information of the spatial location for each single cell as well as their relashionship to the original measured spatial location is stored in the colData slot.

Next, we visualize the cell type for each single cell with their spatial location information

df <- MapCellCords
colors <- c(
    "#8DD3C7", "#CFECBB", "#F4F4B9", "#CFCCCF", "#D1A7B9", "#E9D3DE", "#F4867C",
    "#C0979F", "#D5CFD6", "#86B1CD", "#CEB28B", "#EDBC63", "#C59CC5",
    "#C09CBF", "#C2D567", "#C9DAC3", "#E1EBA0",
    "#FFED6F", "#CDD796", "#F8CDDE"
)
p10 <- ggplot(df, aes(x = x, y = y, colour = CT)) +
    geom_point(size = 3.0) +
    scale_colour_manual(values = colors) +
    # facet_wrap(~Method,ncol = 2,nrow = 3) +
    theme(
        plot.margin = margin(0.1, 0.1, 0.1, 0.1, "cm"),
        panel.background = element_rect(colour = "white", fill = "white"),
        plot.background = element_rect(colour = "white", fill = "white"),
        legend.position = "bottom",
        panel.border = element_rect(
            colour = "grey89", 
            fill = NA, 
            linewidth = 0.5),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        legend.title = element_text(size = 13, face = "bold"),
        legend.text = element_text(size = 12),
        legend.key = element_rect(colour = "transparent", fill = "white"),
        legend.key.size = unit(0.45, "cm"),
        strip.text = element_text(size = 15, face = "bold")
    ) +
    guides(color = guide_legend(title = "Cell Type"))
print(p10)

Session information

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ggplot2_3.5.1               SpatialExperiment_1.17.0   
#>  [3] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0
#>  [5] GenomicRanges_1.59.1        GenomeInfoDb_1.43.4        
#>  [7] IRanges_2.41.3              S4Vectors_0.45.4           
#>  [9] MatrixGenerics_1.19.1       matrixStats_1.5.0          
#> [11] RcppArmadillo_14.2.3-1      NMF_0.28                   
#> [13] Biobase_2.67.0              BiocGenerics_0.53.6        
#> [15] generics_0.1.3              cluster_2.1.8              
#> [17] rngtools_1.5.2              registry_0.5-1             
#> [19] RcppML_0.3.7                CARDspa_0.99.5             
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.2.3               deldir_2.0-4            rlang_1.1.5            
#>   [4] magrittr_2.0.3          gridBase_0.4-7          spatstat.geom_3.3-5    
#>   [7] e1071_1.7-16            compiler_4.4.2          vctrs_0.6.5            
#>  [10] maps_3.4.2.1            reshape2_1.4.4          quantreg_6.00          
#>  [13] stringr_1.5.1           pkgconfig_2.0.3         crayon_1.5.3           
#>  [16] fastmap_1.2.0           magick_2.8.5            XVector_0.47.2         
#>  [19] mcmc_0.9-8              labeling_0.4.3          rmarkdown_2.29         
#>  [22] UCSC.utils_1.3.1        MatrixModels_0.5-3      purrr_1.0.4            
#>  [25] xfun_0.51               cachem_1.1.0            jsonlite_1.9.0         
#>  [28] DelayedArray_0.33.6     spatstat.utils_3.1-2    BiocParallel_1.41.2    
#>  [31] tweenr_2.0.3            wrMisc_1.15.2           parallel_4.4.2         
#>  [34] R6_2.6.1                spatstat.data_3.1-4     bslib_0.9.0            
#>  [37] stringi_1.8.4           RColorBrewer_1.1-3      spatstat.univar_3.1-1  
#>  [40] jquerylib_0.1.4         iterators_1.0.14        Rcpp_1.0.14            
#>  [43] knitr_1.49              fields_16.3             Matrix_1.7-2           
#>  [46] nnls_1.6                splines_4.4.2           tidyselect_1.2.1       
#>  [49] abind_1.4-8             yaml_2.3.10             doParallel_1.0.17      
#>  [52] spatstat.random_3.3-2   codetools_0.2-20        curl_6.2.1             
#>  [55] lattice_0.22-6          tibble_3.2.1            plyr_1.8.9             
#>  [58] withr_3.0.2             coda_0.19-4.1           evaluate_1.0.3         
#>  [61] survival_3.8-3          sf_1.0-19               units_0.8-5            
#>  [64] proxy_0.4-27            scatterpie_0.2.4        polyclip_1.10-7        
#>  [67] BiocManager_1.30.25     pillar_1.10.1           KernSmooth_2.23-26     
#>  [70] foreach_1.5.2           ggfun_0.1.8             sp_2.2-0               
#>  [73] munsell_0.5.1           scales_1.3.0            gtools_3.9.5           
#>  [76] class_7.3-23            glue_1.8.0              maketools_1.3.2        
#>  [79] tools_4.4.2             sys_3.4.3               SparseM_1.84-2         
#>  [82] RANN_2.6.2              buildtools_1.0.0        fs_1.6.5               
#>  [85] dotCall64_1.2           grid_4.4.2              tidyr_1.3.1            
#>  [88] MCMCpack_1.7-1          colorspace_2.1-1        GenomeInfoDbData_1.2.13
#>  [91] ggforce_0.4.2           cli_3.6.4               spam_2.11-1            
#>  [94] S4Arrays_1.7.3          viridisLite_0.4.2       dplyr_1.1.4            
#>  [97] concaveman_1.1.0        V8_6.0.1                gtable_0.3.6           
#> [100] ggcorrplot_0.1.4.1      yulab.utils_0.2.0       sass_0.4.9             
#> [103] digest_0.6.37           classInt_0.4-11         SparseArray_1.7.6      
#> [106] rjson_0.2.23            farver_2.1.2            htmltools_0.5.8.1      
#> [109] lifecycle_1.0.4         httr_1.4.7              MASS_7.3-64

References

Ma, Y., Zhou, X. Spatially informed cell-type deconvolution for spatial transcriptomics. Nat Biotechnol 40, 1349–1359 (2022). https://doi.org/10.1038/s41587-022-01273-7

Moncada, R., Barkley, D., Wagner, F. et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat Biotechnol 38, 333–342 (2020). https://doi.org/10.1038/s41587-019-0392-8