This package allows the user to decide to load data from single-cell level spatial transcriptomics technologies, such as Xenium, CosMx, MERSCOPE, STARmapPLUS, or seqFISH as either SpatialExperiment (SPE) or SingleCellExperiment (SCE) object.

The only difference between the two object types are where to store the spatial coordinates. For the current version of SpatialExperiment, the spatialCoords(spe) are stored in a separate slot other than colData(spe). On the other hand, SingleCellExperiment stores the spatialCoords() inside of colData(spe).

After reading in the data, we need to look at the landscape of other downstream analysis tools. For example, library(BayesSpace) is a clustering tool developed for spatial transcriptomics data, but it only takes a SCE object and looks for the spatial coordinates in the colData(sce). Other spatial visualization packages, such as library(ggspavis) in its newest version, is compatible with both SPE and SCE objects.

Therefore, to avoid the pain of object conversion, we give the flexibility to let the user decide what object type to return.

Setup

suppressMessages({
  library(SpatialExperimentIO)
  library(SpatialExperiment)
})

Quick Start

For Xenium data:

### DO NOT RUN. Example code.
xepath <- "/path/to/folder"
# Xenium as SPE, or SCE by setting `returnType = "SCE"`.
xe_spe <- readXeniumSXE(dirName = xepath)

For CosMx data:

### DO NOT RUN. Example code.
cospath <- "/path/to/folder"
# CosMx as SPE, or SCE by setting `returnType = "SCE"`.
cos_spe <- readCosmxSXE(dirName = cospath)

For MERSCOPE data:

### DO NOT RUN. Example code.
merpath <- "/path/to/folder"
# MERSCOPE as SPE, or SCE by setting `returnType = "SCE"`.
mer_spe <- readMerscopeSXE(dirName = merpath)

For STARmap PLUS data:

### DO NOT RUN. Example code.
starpath <- "/path/to/folder"
# STARmapPLUS as SPE, or SCE by setting `returnType = "SCE"`.
star_spe <- readStarmapplusSXE(dirName = starpath)

For seqFISH data:

### DO NOT RUN. Example code.
seqfpath <- "/path/to/folder"
# seqFISH as SPE, or SCE by setting `returnType = "SCE"`.
seqf_spe <- readSeqfishSXE(dirName = merpath)

That is pretty much all you need. To learn more details, please read the below sections for each technology.

Xenium

Xenium is an imaging-based in-situ sequencing technology developed by 10x Genomics. Compared to the full transcriptome coverage sequencing-based technology Visium, Xenium allows for transcript-level resolution count detection but with less genes. The transcripts are segmented into single cells and SpatialExperimentIO returns the cell-level SPE or SCE object. To read more about Xenium technology workflow, please refer to the Xenium technology overview. For more publicly available Xenium data, please refer to Xenium data download.

The object constructor assumes the downloaded unzipped Xenium Output Bundle contains the mandatory file of cells.parquet and either a folder /cell_feature_matrix or a .h5 file cell_feature_matrix.h5.

xepath <- system.file(
  file.path("extdata", "Xenium_small"),
  package = "SpatialExperimentIO")

list.files(xepath)

## [1] "cell_boundaries.parquet"    "cell_feature_matrix.h5"    
## [3] "cells.csv.gz"               "cells.parquet"             
## [5] "experiment.xenium"          "nucleus_boundaries.parquet"
## [7] "transcripts.parquet"

Read Xenium as a `SpatialExperiment` object

We display the default specification of each variable in readXeniumSXE(). To read in Xenium as a SpatialExperiment object with count matrix, column data, and all other common raw download files, you would only need to provide a valid directory name.

# read the count matrix .h5 file - automatically DropletUtils::read10xCounts(type = "HDF5")
xe_spe <- readXeniumSXE(dirName = xepath,
                        returnType = "SPE",
                        countMatPattern = "cell_feature_matrix.h5",
                        metaDataPattern = "cells.parquet",
                        coordNames = c("x_centroid", "y_centroid"),
                        addExperimentXenium = TRUE, # set to TRUE to load experiment.xenium
                        altExps = c("NegControlProbe", "UnassignedCodeword",
                                    "NegControlCodeword", "antisense", "BLANK"),
                        addParquetPaths = TRUE #, # set TRUE to load all .parquet below
                        # ... # takes arguments as below
                        ) 
xe_spe <- addParquetPathsXenium(sxe = xe_spe,
                                dirName = xepath,
                                addTx = TRUE,
                                txMetaNames = "transcripts",
                                txPattern = "transcripts.parquet",
                                addCellBound = TRUE,
                                cellBoundMetaNames = "cell_boundaries",
                                cellBoundPattern = "cell_boundaries.parquet",
                                addNucBound = TRUE,
                                NucBoundMetaNames = "nucleus_boundaries",
                                NucBoundPattern = "nucleus_boundaries.parquet")

Note that experiment.xenium is by default added to metaata(), and you can disable it by setting addExperimentXenium = FALSE.

xe_spe <- readXeniumSXE(dirName = xepath)
xe_spe

## class: SpatialExperiment 
## dim: 4 6 
## metadata(4): experiment.xenium transcripts cell_boundaries
##   nucleus_boundaries
## assays(1): counts
## rownames(4): ABCC11 ACTA2 ACTG2 ADAM9
## rowData names(3): ID Symbol Type
## colnames(6): 1 2 ... 5 6
## colData names(8): cell_id transcript_counts ... nucleus_area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x_centroid y_centroid
## imgData names(0):

Additionally, Xenium gives different control genes in their gene panel (check with altExpNames(xe_spe) to see). In this mini example, we obtain a Xenium dataset with 4 genes all with type of "Gene Expression" in their rowData(xe_spe) and 6 cells just for illustration. In a real Xenium data download, possible altExp(xe_spe) for Xenium can have the default categories, depending on the experiment setup. For example, below we manually specified gene names contains “TestGene” will be put into altExp(). We add any other .parquet files (cell_boundaries.parquet, nucleus_boundaries.parquet), except transcripts.parquet, to the metadata(). You can control what path to add with additional boolean arguments - addTx, addCellBound, addNucBound. See the documentation for the helper function addParquetPathsXenium().

xe_spe <- readXeniumSXE(xepath,
                        altExps = c("TestGene"),
                        addParquetPaths = TRUE,
                        addTx = FALSE)
xe_spe

## class: SpatialExperiment 
## dim: 4 6 
## metadata(3): experiment.xenium cell_boundaries nucleus_boundaries
## assays(1): counts
## rownames(4): ABCC11 ACTA2 ACTG2 ADAM9
## rowData names(3): ID Symbol Type
## colnames(6): 1 2 ... 5 6
## colData names(8): cell_id transcript_counts ... nucleus_area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x_centroid y_centroid
## imgData names(0):

If you do not have cell_feature_matrix.h5 but the folder /cell_feature_matrix instead, it should contain the following files.

list.files(file.path(xepath, "cell_feature_matrix"))
# "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"

For this example, we only provide cell_feature_matrix.h5 for demonstration. However, alternatively you can read in Xenium by specifying countMatPattern as the folder "cell_feature_matrix". You should also subset to "Gene Expression" gene type like previously.

# or read the count matrix folder - automatically DropletUtils::read10xCounts(type = "sparse")
xe_spe <- readXeniumSXE(dirName = xepath,
                        countMatPattern = "cell_feature_matrix")

Read Xenium as a `SingleCellExperiment` object

Instead, if you are interested in storing the spatialCoords() columns in colData and read Xenium in as a SingleCellExperiment, you need to change readXeniumSXE(returnType = ) to "SCE". It is also required to subset to "Gene Expression" gene type. We end up with an SCE object with 248 genes.

xe_sce <- readXeniumSXE(dirName = xepath, returnType = "SCE")
xe_sce

## class: SingleCellExperiment 
## dim: 4 6 
## metadata(4): experiment.xenium transcripts cell_boundaries
##   nucleus_boundaries
## assays(1): counts
## rownames(4): ABCC11 ACTA2 ACTG2 ADAM9
## rowData names(3): ID Symbol Type
## colnames(6): 1 2 ... 5 6
## colData names(9): cell_id x_centroid ... cell_area nucleus_area
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

This is a mock Xenium datasets with 4 genes by 6 cells. Some Xenium data set can have a dimension of 313 genes and around 110,000 cells in the Xenium human breast cancer data.

For more visualization tools for spatial transcriptomics downstream data analysis, including helpers for QC, marker gene expression level and clustering results on reduced dimensions or its spatial distribution, please refer to BiocManager::install("ggspavis").

CosMx

CosMx is an imaging-based in-situ sequencing technology by Nanostring. To read more about the CosMx technology workflow, please refer to the technology overview. For more publicly available data sets, please refer to the CosMx data download website.

The object constructor assumes the data download folder contains two mandatory files with exprMat_file.csv and metadata_file.csv in the names.

cospath <- system.file(
  file.path("extdata", "CosMx_small"),
  package = "SpatialExperimentIO")

list.files(cospath)

## [1] "lung_p9s1_exprMat_file.csv"       "lung_p9s1_fov_positions_file.csv"
## [3] "lung_p9s1_metadata_file.csv"      "lung_p9s1_tx_file.csv"           
## [5] "lung_p9s1_tx_file.parquet"

Read CosMx as a `SpatialExperiment` object

We commented out the default specification of each variable in readCosmxSXE(). To read in CosMx as a SpatialExperiment object, you would only need to provide a valid directory name.

cos_spe <- readCosmxSXE(dirName = cospath,
                        returnType = "SPE",
                        countMatPattern = "exprMat_file.csv",
                        metaDataPattern = "metadata_file.csv",
                        coordNames = c("CenterX_global_px", "CenterY_global_px"),
                        addFovPos = TRUE, # set to TRUE to add fov position columns to colData()
                        fovPosPattern = "fov_positions_file.csv",
                        altExps = c("NegPrb", "Negative", "SystemControl", "FalseCode"),
                        addParquetPaths = TRUE # , # set TRUE to add all .parquet below
                        # ... # takes arguments as below
                        ) 
cos_spe <- addParquetPathsCosmx(sxe = cos_spe,
                                dirName = cospath,
                                addTx = TRUE,
                                txMetaNames = "transcripts",
                                txPattern = "tx_file.csv",
                                addPolygon = TRUE,
                                polygonMetaNames = "polygons",
                                polygonPattern = "polygons.csv")

With this example dataset, we obtained a CosMx SPE object with 8 genes and 9 cells. Here is a demonstration of adding all recommended files as path to parquet in the metadata(), except there is no polygon file in this example data.

cos_spe <- readCosmxSXE(cospath,
                        addPolygon = FALSE)
cos_spe

## class: SpatialExperiment 
## dim: 8 9 
## metadata(2): transcripts fov_positions
## assays(1): counts
## rownames(8): AATK ABL1 ... ACKR3 ACKR4
## rowData names(0):
## colnames(9): 1 2 ... 8 9
## colData names(6): fov cell_ID ... CenterY_local_px sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(0):

Read CosMx as a `SingleCellExperiment` object

Alternatively, you can also read CosMx in as a SCE.

cos_sce <- readCosmxSXE(dirName = cospath, addPolygon = FALSE, returnType = "SCE")
cos_sce

## class: SingleCellExperiment 
## dim: 8 9 
## metadata(2): transcripts fov_positions
## assays(1): counts
## rownames(8): AATK ABL1 ... ACKR3 ACKR4
## rowData names(0):
## colnames(9): 1 2 ... 8 9
## colData names(7): fov cell_ID ... CenterX_global_px CenterY_global_px
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

In reality, a CosMx data set can have a dimension of 980 genes and around 100,000 cells for the human lung cancer data.

MERSCOPE

MERSCOPE integrated MERFISH spatial transcriptomics technology with high resolution spatial imaging, fluidics, image processing, and is a product by Vizgen. To understand more about the MERFISH technology behind MERSCOPE, please refer to the MERFISH Technology Overview. For more publicly available MERSCOPE data, please see MERSCOPE data download page.

The object constructor assumes the data download folder contains two mandatory files with cell_by_gene.csv and cell_metadata.csv in the names.

merpath <- system.file(
  file.path("extdata", "MERSCOPE_small"),
  package = "SpatialExperimentIO")

list.files(merpath)

## [1] "ovarian_p2s2_cell_by_gene.csv"  "ovarian_p2s2_cell_metadata.csv"

Read MERSCOPE as a `SpatialExperiment` object

We commented out the default specification of each variable in readMerscopeSXE(). To read in MERSCOPE as a SpatialExperiment object, you would only need to provide a valid directory name. With this example dataset, we obtained a MERSCOPE SPE object with 9 genes and 8 cells.

# mer_spe <- readMerscopeSXE(dirName = merpath, 
#                            returnType = "SPE",
#                            countMatPattern = "cell_by_gene.csv", 
#                            metaDataPattern = "cell_metadata.csv", 
#                            coordNames = c("center_x", "center_y"))

mer_spe <- readMerscopeSXE(dirName = merpath)
mer_spe

## class: SpatialExperiment 
## dim: 9 8 
## metadata(0):
## assays(1): counts
## rownames(9): PDK4 CCL26 ... ICAM3 TBX21
## rowData names(0):
## colnames(8): 15590 15591 ... 15596 88870
## colData names(3): fov volume sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : center_x center_y
## imgData names(0):

Read MERSCOPE as a `SingleCellExperiment` object

Alternatively, you can also read MERSCOPE in as a SCE.

mer_sce <- readMerscopeSXE(dirName = merpath, returnType = "SCE")
mer_sce

## class: SingleCellExperiment 
## dim: 9 8 
## metadata(0):
## assays(1): counts
## rownames(9): PDK4 CCL26 ... ICAM3 TBX21
## rowData names(0):
## colnames(8): 15590 15591 ... 15596 88870
## colData names(4): fov volume center_x center_y
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

In reality, a MERSCOPE data set can have a dimension of 550 genes and around 250,000 cells for the human ovarian cancer data.

STARmap PLUS

STARmap PLUS is an imaging-based in-situ sequencing technology that has been introduced by Zeng et al.. The object constructor assumes the data download folder contains two mandatory files with raw_expression_pd.csv and spatial.csv in the names.

starpath <- system.file(
  file.path("extdata", "STARmapPLUS_small"),
  package = "SpatialExperimentIO")

list.files(starpath)

## [1] "mock_spatial.csv"          "mockraw_expression_pd.csv"

Read STARmap PLUS as a `SpatialExperiment` object

We comment out the default parameters for your reference. In this example dataset, we provide a sample with 8 genes and 9 cells just for illustration.

# readStarmapplusSXE <- function(dirName = dirName, 
#                                returnType = "SPE",
#                                countMatPattern = "raw_expression_pd.csv", 
#                                metaDataPattern = "spatial.csv", 
#                                coordNames = c("X", "Y", "Z"))

star_spe <- readStarmapplusSXE(dirName = starpath)
star_spe

## class: SpatialExperiment 
## dim: 8 9 
## metadata(0):
## assays(1): counts
## rownames(8): A2M ABCC9 ... ADAMTS15 ADARB2
## rowData names(0):
## colnames(9): well05_0 well05_1 ... well05_7 well05_8
## colData names(7): NAME Main_molecular_cell_type ...
##   Molecular_spatial_cell_type sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(3) : X Y Z
## imgData names(0):

Read STARmap PLUS as a `SingleCellExperiment` object

Alternatively, you can also return a SingleCellExperiment object.

star_sce <- readStarmapplusSXE(dirName = starpath, returnType = "SCE")
star_sce

## class: SingleCellExperiment 
## dim: 8 9 
## metadata(0):
## assays(1): counts
## rownames(8): A2M ABCC9 ... ADAMTS15 ADARB2
## rowData names(0):
## colnames(9): well05_0 well05_1 ... well05_7 well05_8
## colData names(9): NAME X ... Sub_molecular_tissue_region
##   Molecular_spatial_cell_type
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

STARmap PLUS has a gene panel of around 1000 with up to millions of cells depending on the size of the tissue. There are 20 sample on mouse brain with tissue region annotated published by Shi et al.. Their data is avaiable to downloaded from Zenodo.

seqFISH

seqFISH is a product by Spatial Genomics, generated by the GenePS device. Like Xenium, it is claimed to work on intact cells and tissues. To understand more about the seqFISH technology, please refer to their website seqFISH Technology Overview. For more publicly available seqFISH data, please see seqFISH data download page.

The object constructor assumes the data download folder contains two mandatory files with CellxGene.csv and CellCoordinates.csv in the names.

seqfpath <- system.file(
  file.path("extdata", "seqFISH_small"),
  package = "SpatialExperimentIO")

list.files(seqfpath)

## [1] "TestSetSmall_CellCoordinates.csv" "TestSetSmall_CellxGene.csv"

Read seqFISH as a `SpatialExperiment` object

We commented out the default specification of each variable in readSeqfishSXE(). To read in seqFISH as a SpatialExperiment object, you would only need to provide a valid directory name. With this example dataset, we obtained a seqFISH SPE object with 9 genes and 14 cells.

# seqf_spe <- readSeqfishSXE(dirName = seqfpath, 
#                            returnType = "SPE",
#                            countMatPattern = "cell_by_gene.csv", 
#                            metaDataPattern = "cell_metadata.csv", 
#                            coordNames = c("center_x", "center_y"))

seqf_spe <- readSeqfishSXE(dirName = seqfpath)
seqf_spe

## class: SpatialExperiment 
## dim: 9 14 
## metadata(0):
## assays(1): counts
## rownames(9): ITGB1 NR4A1 ... ETS1 STAT3
## rowData names(0):
## colnames(14): 1 2 ... 13 14
## colData names(2): area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : center_x center_y
## imgData names(0):

Read seqFISH as a `SingleCellExperiment` object

Alternatively, you can also read seqFISH in as a SCE.

seqf_sce <- readSeqfishSXE(dirName = seqfpath, returnType = "SCE")
seqf_sce

## class: SingleCellExperiment 
## dim: 9 14 
## metadata(0):
## assays(1): counts
## rownames(9): ITGB1 NR4A1 ... ETS1 STAT3
## rowData names(0):
## colnames(14): 1 2 ... 13 14
## colData names(3): area center_x center_y
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

So far, publicly available seqFISH data sets have 220 or 1092 genes in a panel for mouse kidney, for about 669,842 cells. There are other mouse indications, such as brain, liver, and intestine. There might be human colorectal and gastric cancer samples in the upcoming future.

Session Info

sessionInfo()

## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SpatialExperiment_1.17.0    SingleCellExperiment_1.29.1
##  [3] SummarizedExperiment_1.37.0 Biobase_2.67.0             
##  [5] GenomicRanges_1.59.1        GenomeInfoDb_1.43.4        
##  [7] IRanges_2.41.3              S4Vectors_0.45.4           
##  [9] BiocGenerics_0.53.6         generics_0.1.3             
## [11] MatrixGenerics_1.19.1       matrixStats_1.5.0          
## [13] SpatialExperimentIO_0.99.8  BiocStyle_2.35.0           
## 
## loaded via a namespace (and not attached):
##  [1] rjson_0.2.23              xfun_0.50                
##  [3] bslib_0.9.0               rhdf5_2.51.2             
##  [5] lattice_0.22-6            vctrs_0.6.5              
##  [7] rhdf5filters_1.19.1       tools_4.4.2              
##  [9] parallel_4.4.2            R.oo_1.27.0              
## [11] Matrix_1.7-2              data.table_1.16.4        
## [13] sparseMatrixStats_1.19.0  dqrng_0.4.1              
## [15] assertthat_0.2.1          lifecycle_1.0.4          
## [17] GenomeInfoDbData_1.2.13   compiler_4.4.2           
## [19] statmod_1.5.0             codetools_0.2-20         
## [21] htmltools_0.5.8.1         sys_3.4.3                
## [23] buildtools_1.0.0          sass_0.4.9               
## [25] yaml_2.3.10               crayon_1.5.3             
## [27] jquerylib_0.1.4           R.utils_2.12.3           
## [29] BiocParallel_1.41.0       DelayedArray_0.33.5      
## [31] cachem_1.1.0              limma_3.63.3             
## [33] magick_2.8.5              abind_1.4-8              
## [35] tidyselect_1.2.1          locfit_1.5-9.11          
## [37] digest_0.6.37             purrr_1.0.4              
## [39] arrow_18.1.0.1            maketools_1.3.2          
## [41] fastmap_1.2.0             grid_4.4.2               
## [43] cli_3.6.3                 SparseArray_1.7.5        
## [45] magrittr_2.0.3            S4Arrays_1.7.3           
## [47] h5mread_0.99.4            edgeR_4.5.2              
## [49] DelayedMatrixStats_1.29.1 UCSC.utils_1.3.1         
## [51] bit64_4.6.0-1             rmarkdown_2.29           
## [53] XVector_0.47.2            httr_1.4.7               
## [55] DropletUtils_1.27.2       bit_4.5.0.1              
## [57] R.methodsS3_1.8.2         beachmat_2.23.6          
## [59] HDF5Array_1.35.12         evaluate_1.0.3           
## [61] knitr_1.49                rlang_1.1.5              
## [63] Rcpp_1.0.14               glue_1.8.0               
## [65] scuttle_1.17.0            formatR_1.14             
## [67] BiocManager_1.30.25       jsonlite_1.8.9           
## [69] R6_2.6.0                  Rhdf5lib_1.29.0

SpatialExperimentIO - Data Reader Package Overview

Setup

Quick Start

Xenium

Read Xenium as a SpatialExperiment object

Read Xenium as a SingleCellExperiment object

CosMx

Read CosMx as a SpatialExperiment object

Read CosMx as a SingleCellExperiment object