This package allows the user to decide to load data from single-cell
level spatial transcriptomics technologies, such as Xenium, CosMx,
MERSCOPE, STARmapPLUS, or seqFISH as either
SpatialExperiment (SPE)
or
SingleCellExperiment (SCE)
object.
The only difference between the two object types are where to store
the spatial coordinates. For the current version of
SpatialExperiment
, the spatialCoords(spe)
are
stored in a separate slot other than colData(spe)
. On the
other hand, SingleCellExperiment
stores the
spatialCoords()
inside of colData(spe)
.
After reading in the data, we need to look at the landscape of other
downstream analysis tools. For example, library(BayesSpace)
is a clustering tool developed for spatial transcriptomics data, but it
only takes a SCE
object and looks for the spatial
coordinates in the colData(sce)
. Other spatial
visualization packages, such as library(ggspavis)
in its
newest version, is compatible with both SPE
and
SCE
objects.
Therefore, to avoid the pain of object conversion, we give the flexibility to let the user decide what object type to return.
### DO NOT RUN. Example code.
xepath <- "/path/to/folder"
# Xenium as SPE, or SCE by setting `returnType = "SCE"`.
xe_spe <- readXeniumSXE(dirName = xepath)
### DO NOT RUN. Example code.
cospath <- "/path/to/folder"
# CosMx as SPE, or SCE by setting `returnType = "SCE"`.
cos_spe <- readCosmxSXE(dirName = cospath)
### DO NOT RUN. Example code.
merpath <- "/path/to/folder"
# MERSCOPE as SPE, or SCE by setting `returnType = "SCE"`.
mer_spe <- readMerscopeSXE(dirName = merpath)
### DO NOT RUN. Example code.
starpath <- "/path/to/folder"
# STARmapPLUS as SPE, or SCE by setting `returnType = "SCE"`.
star_spe <- readStarmapplusSXE(dirName = starpath)
### DO NOT RUN. Example code.
seqfpath <- "/path/to/folder"
# seqFISH as SPE, or SCE by setting `returnType = "SCE"`.
seqf_spe <- readSeqfishSXE(dirName = merpath)
That is pretty much all you need. To learn more details, please read the below sections for each technology.
Xenium is an imaging-based in-situ sequencing technology developed by
10x Genomics. Compared to the full transcriptome coverage
sequencing-based technology Visium, Xenium allows for transcript-level
resolution count detection but with less genes. The transcripts are
segmented into single cells and SpatialExperimentIO
returns
the cell-level SPE
or SCE
object. To read more
about Xenium technology workflow, please refer to the Xenium
technology overview. For more publicly available Xenium data, please
refer to Xenium
data download.
The object constructor assumes the downloaded unzipped Xenium Output
Bundle contains the mandatory file of cells.parquet
and
either a folder /cell_feature_matrix
or a .h5 file
cell_feature_matrix.h5
.
xepath <- system.file(
file.path("extdata", "Xenium_small"),
package = "SpatialExperimentIO")
list.files(xepath)
## [1] "cell_boundaries.parquet" "cell_feature_matrix.h5"
## [3] "cells.csv.gz" "cells.parquet"
## [5] "experiment.xenium" "nucleus_boundaries.parquet"
## [7] "transcripts.parquet"
SpatialExperiment
objectWe display the default specification of each variable in
readXeniumSXE()
. To read in Xenium as a
SpatialExperiment
object with count matrix, column data,
and all other common raw download files, you would only need to provide
a valid directory name.
# read the count matrix .h5 file - automatically DropletUtils::read10xCounts(type = "HDF5")
xe_spe <- readXeniumSXE(dirName = xepath,
returnType = "SPE",
countMatPattern = "cell_feature_matrix.h5",
metaDataPattern = "cells.parquet",
coordNames = c("x_centroid", "y_centroid"),
addExperimentXenium = TRUE, # set to TRUE to load experiment.xenium
altExps = c("NegControlProbe", "UnassignedCodeword",
"NegControlCodeword", "antisense", "BLANK"),
addParquetPaths = TRUE #, # set TRUE to load all .parquet below
# ... # takes arguments as below
)
xe_spe <- addParquetPathsXenium(sxe = xe_spe,
dirName = xepath,
addTx = TRUE,
txMetaNames = "transcripts",
txPattern = "transcripts.parquet",
addCellBound = TRUE,
cellBoundMetaNames = "cell_boundaries",
cellBoundPattern = "cell_boundaries.parquet",
addNucBound = TRUE,
NucBoundMetaNames = "nucleus_boundaries",
NucBoundPattern = "nucleus_boundaries.parquet")
Note that experiment.xenium
is by default added to
metaata()
, and you can disable it by setting
addExperimentXenium = FALSE
.
## class: SpatialExperiment
## dim: 4 6
## metadata(4): experiment.xenium transcripts cell_boundaries
## nucleus_boundaries
## assays(1): counts
## rownames(4): ABCC11 ACTA2 ACTG2 ADAM9
## rowData names(3): ID Symbol Type
## colnames(6): 1 2 ... 5 6
## colData names(8): cell_id transcript_counts ... nucleus_area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x_centroid y_centroid
## imgData names(0):
Additionally, Xenium gives different control genes in their gene
panel (check with altExpNames(xe_spe)
to see). In this mini
example, we obtain a Xenium dataset with 4 genes all with type of
"Gene Expression"
in their rowData(xe_spe)
and
6 cells just for illustration. In a real Xenium data download, possible
altExp(xe_spe)
for Xenium can have the default categories,
depending on the experiment setup. For example, below we manually
specified gene names contains “TestGene” will be put into
altExp()
. We add any other .parquet files
(cell_boundaries.parquet
,
nucleus_boundaries.parquet
), except
transcripts.parquet
, to the metadata()
. You
can control what path to add with additional boolean arguments -
addTx
, addCellBound
, addNucBound
.
See the documentation for the helper function
addParquetPathsXenium()
.
xe_spe <- readXeniumSXE(xepath,
altExps = c("TestGene"),
addParquetPaths = TRUE,
addTx = FALSE)
xe_spe
## class: SpatialExperiment
## dim: 4 6
## metadata(3): experiment.xenium cell_boundaries nucleus_boundaries
## assays(1): counts
## rownames(4): ABCC11 ACTA2 ACTG2 ADAM9
## rowData names(3): ID Symbol Type
## colnames(6): 1 2 ... 5 6
## colData names(8): cell_id transcript_counts ... nucleus_area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x_centroid y_centroid
## imgData names(0):
If you do not have cell_feature_matrix.h5
but the folder
/cell_feature_matrix
instead, it should contain the
following files.
list.files(file.path(xepath, "cell_feature_matrix"))
# "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"
For this example, we only provide cell_feature_matrix.h5
for demonstration. However, alternatively you can read in Xenium by
specifying countMatPattern
as the folder
"cell_feature_matrix"
. You should also subset to
"Gene Expression"
gene type like previously.
SingleCellExperiment
objectInstead, if you are interested in storing the
spatialCoords()
columns in colData
and read
Xenium in as a SingleCellExperiment
, you need to change
readXeniumSXE(returnType = )
to "SCE"
. It is
also required to subset to "Gene Expression"
gene type. We
end up with an SCE
object with 248 genes.
## class: SingleCellExperiment
## dim: 4 6
## metadata(4): experiment.xenium transcripts cell_boundaries
## nucleus_boundaries
## assays(1): counts
## rownames(4): ABCC11 ACTA2 ACTG2 ADAM9
## rowData names(3): ID Symbol Type
## colnames(6): 1 2 ... 5 6
## colData names(9): cell_id x_centroid ... cell_area nucleus_area
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
This is a mock Xenium datasets with 4 genes by 6 cells. Some Xenium data set can have a dimension of 313 genes and around 110,000 cells in the Xenium human breast cancer data.
For more visualization tools for spatial transcriptomics downstream
data analysis, including helpers for QC, marker gene expression level
and clustering results on reduced dimensions or its spatial
distribution, please refer to
BiocManager::install("ggspavis")
.
CosMx is an imaging-based in-situ sequencing technology by Nanostring. To read more about the CosMx technology workflow, please refer to the technology overview. For more publicly available data sets, please refer to the CosMx data download website.
The object constructor assumes the data download folder contains two
mandatory files with exprMat_file.csv
and
metadata_file.csv
in the names.
cospath <- system.file(
file.path("extdata", "CosMx_small"),
package = "SpatialExperimentIO")
list.files(cospath)
## [1] "lung_p9s1_exprMat_file.csv" "lung_p9s1_fov_positions_file.csv"
## [3] "lung_p9s1_metadata_file.csv" "lung_p9s1_tx_file.csv"
## [5] "lung_p9s1_tx_file.parquet"
SpatialExperiment
objectWe commented out the default specification of each variable in
readCosmxSXE()
. To read in CosMx as a
SpatialExperiment
object, you would only need to provide a
valid directory name.
cos_spe <- readCosmxSXE(dirName = cospath,
returnType = "SPE",
countMatPattern = "exprMat_file.csv",
metaDataPattern = "metadata_file.csv",
coordNames = c("CenterX_global_px", "CenterY_global_px"),
addFovPos = TRUE, # set to TRUE to add fov position columns to colData()
fovPosPattern = "fov_positions_file.csv",
altExps = c("NegPrb", "Negative", "SystemControl", "FalseCode"),
addParquetPaths = TRUE # , # set TRUE to add all .parquet below
# ... # takes arguments as below
)
cos_spe <- addParquetPathsCosmx(sxe = cos_spe,
dirName = cospath,
addTx = TRUE,
txMetaNames = "transcripts",
txPattern = "tx_file.csv",
addPolygon = TRUE,
polygonMetaNames = "polygons",
polygonPattern = "polygons.csv")
With this example dataset, we obtained a CosMx SPE
object with 8 genes and 9 cells. Here is a demonstration of adding all
recommended files as path to parquet in the metadata()
,
except there is no polygon file in this example data.
## class: SpatialExperiment
## dim: 8 9
## metadata(2): transcripts fov_positions
## assays(1): counts
## rownames(8): AATK ABL1 ... ACKR3 ACKR4
## rowData names(0):
## colnames(9): 1 2 ... 8 9
## colData names(6): fov cell_ID ... CenterY_local_px sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(0):
SingleCellExperiment
objectAlternatively, you can also read CosMx in as a SCE
.
## class: SingleCellExperiment
## dim: 8 9
## metadata(2): transcripts fov_positions
## assays(1): counts
## rownames(8): AATK ABL1 ... ACKR3 ACKR4
## rowData names(0):
## colnames(9): 1 2 ... 8 9
## colData names(7): fov cell_ID ... CenterX_global_px CenterY_global_px
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
In reality, a CosMx data set can have a dimension of 980 genes and around 100,000 cells for the human lung cancer data.
MERSCOPE integrated MERFISH spatial transcriptomics technology with high resolution spatial imaging, fluidics, image processing, and is a product by Vizgen. To understand more about the MERFISH technology behind MERSCOPE, please refer to the MERFISH Technology Overview. For more publicly available MERSCOPE data, please see MERSCOPE data download page.
The object constructor assumes the data download folder contains two
mandatory files with cell_by_gene.csv
and
cell_metadata.csv
in the names.
merpath <- system.file(
file.path("extdata", "MERSCOPE_small"),
package = "SpatialExperimentIO")
list.files(merpath)
## [1] "ovarian_p2s2_cell_by_gene.csv" "ovarian_p2s2_cell_metadata.csv"
SpatialExperiment
objectWe commented out the default specification of each variable in
readMerscopeSXE()
. To read in MERSCOPE as a
SpatialExperiment
object, you would only need to provide a
valid directory name. With this example dataset, we obtained a MERSCOPE
SPE
object with 9 genes and 8 cells.
# mer_spe <- readMerscopeSXE(dirName = merpath,
# returnType = "SPE",
# countMatPattern = "cell_by_gene.csv",
# metaDataPattern = "cell_metadata.csv",
# coordNames = c("center_x", "center_y"))
mer_spe <- readMerscopeSXE(dirName = merpath)
mer_spe
## class: SpatialExperiment
## dim: 9 8
## metadata(0):
## assays(1): counts
## rownames(9): PDK4 CCL26 ... ICAM3 TBX21
## rowData names(0):
## colnames(8): 15590 15591 ... 15596 88870
## colData names(3): fov volume sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : center_x center_y
## imgData names(0):
SingleCellExperiment
objectAlternatively, you can also read MERSCOPE in as a
SCE
.
## class: SingleCellExperiment
## dim: 9 8
## metadata(0):
## assays(1): counts
## rownames(9): PDK4 CCL26 ... ICAM3 TBX21
## rowData names(0):
## colnames(8): 15590 15591 ... 15596 88870
## colData names(4): fov volume center_x center_y
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
In reality, a MERSCOPE data set can have a dimension of 550 genes and around 250,000 cells for the human ovarian cancer data.
STARmap PLUS is an imaging-based in-situ sequencing technology that
has been introduced by Zeng et al.. The
object constructor assumes the data download folder contains two
mandatory files with raw_expression_pd.csv
and
spatial.csv
in the names.
starpath <- system.file(
file.path("extdata", "STARmapPLUS_small"),
package = "SpatialExperimentIO")
list.files(starpath)
## [1] "mock_spatial.csv" "mockraw_expression_pd.csv"
SpatialExperiment
objectWe comment out the default parameters for your reference. In this example dataset, we provide a sample with 8 genes and 9 cells just for illustration.
# readStarmapplusSXE <- function(dirName = dirName,
# returnType = "SPE",
# countMatPattern = "raw_expression_pd.csv",
# metaDataPattern = "spatial.csv",
# coordNames = c("X", "Y", "Z"))
star_spe <- readStarmapplusSXE(dirName = starpath)
star_spe
## class: SpatialExperiment
## dim: 8 9
## metadata(0):
## assays(1): counts
## rownames(8): A2M ABCC9 ... ADAMTS15 ADARB2
## rowData names(0):
## colnames(9): well05_0 well05_1 ... well05_7 well05_8
## colData names(7): NAME Main_molecular_cell_type ...
## Molecular_spatial_cell_type sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(3) : X Y Z
## imgData names(0):
SingleCellExperiment
objectAlternatively, you can also return a
SingleCellExperiment
object.
## class: SingleCellExperiment
## dim: 8 9
## metadata(0):
## assays(1): counts
## rownames(8): A2M ABCC9 ... ADAMTS15 ADARB2
## rowData names(0):
## colnames(9): well05_0 well05_1 ... well05_7 well05_8
## colData names(9): NAME X ... Sub_molecular_tissue_region
## Molecular_spatial_cell_type
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
STARmap PLUS has a gene panel of around 1000 with up to millions of cells depending on the size of the tissue. There are 20 sample on mouse brain with tissue region annotated published by Shi et al.. Their data is avaiable to downloaded from Zenodo.
seqFISH is a product by Spatial Genomics, generated by the GenePS device. Like Xenium, it is claimed to work on intact cells and tissues. To understand more about the seqFISH technology, please refer to their website seqFISH Technology Overview. For more publicly available seqFISH data, please see seqFISH data download page.
The object constructor assumes the data download folder contains two
mandatory files with CellxGene.csv
and
CellCoordinates.csv
in the names.
seqfpath <- system.file(
file.path("extdata", "seqFISH_small"),
package = "SpatialExperimentIO")
list.files(seqfpath)
## [1] "TestSetSmall_CellCoordinates.csv" "TestSetSmall_CellxGene.csv"
SpatialExperiment
objectWe commented out the default specification of each variable in
readSeqfishSXE()
. To read in seqFISH as a
SpatialExperiment
object, you would only need to provide a
valid directory name. With this example dataset, we obtained a seqFISH
SPE
object with 9 genes and 14 cells.
# seqf_spe <- readSeqfishSXE(dirName = seqfpath,
# returnType = "SPE",
# countMatPattern = "cell_by_gene.csv",
# metaDataPattern = "cell_metadata.csv",
# coordNames = c("center_x", "center_y"))
seqf_spe <- readSeqfishSXE(dirName = seqfpath)
seqf_spe
## class: SpatialExperiment
## dim: 9 14
## metadata(0):
## assays(1): counts
## rownames(9): ITGB1 NR4A1 ... ETS1 STAT3
## rowData names(0):
## colnames(14): 1 2 ... 13 14
## colData names(2): area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : center_x center_y
## imgData names(0):
SingleCellExperiment
objectAlternatively, you can also read seqFISH in as a
SCE
.
## class: SingleCellExperiment
## dim: 9 14
## metadata(0):
## assays(1): counts
## rownames(9): ITGB1 NR4A1 ... ETS1 STAT3
## rowData names(0):
## colnames(14): 1 2 ... 13 14
## colData names(3): area center_x center_y
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
So far, publicly available seqFISH data sets have 220 or 1092 genes in a panel for mouse kidney, for about 669,842 cells. There are other mouse indications, such as brain, liver, and intestine. There might be human colorectal and gastric cancer samples in the upcoming future.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SpatialExperiment_1.17.0 SingleCellExperiment_1.29.1
## [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [5] GenomicRanges_1.59.1 GenomeInfoDb_1.43.4
## [7] IRanges_2.41.3 S4Vectors_0.45.4
## [9] BiocGenerics_0.53.6 generics_0.1.3
## [11] MatrixGenerics_1.19.1 matrixStats_1.5.0
## [13] SpatialExperimentIO_0.99.8 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] rjson_0.2.23 xfun_0.50
## [3] bslib_0.9.0 rhdf5_2.51.2
## [5] lattice_0.22-6 vctrs_0.6.5
## [7] rhdf5filters_1.19.1 tools_4.4.2
## [9] parallel_4.4.2 R.oo_1.27.0
## [11] Matrix_1.7-2 data.table_1.16.4
## [13] sparseMatrixStats_1.19.0 dqrng_0.4.1
## [15] assertthat_0.2.1 lifecycle_1.0.4
## [17] GenomeInfoDbData_1.2.13 compiler_4.4.2
## [19] statmod_1.5.0 codetools_0.2-20
## [21] htmltools_0.5.8.1 sys_3.4.3
## [23] buildtools_1.0.0 sass_0.4.9
## [25] yaml_2.3.10 crayon_1.5.3
## [27] jquerylib_0.1.4 R.utils_2.12.3
## [29] BiocParallel_1.41.0 DelayedArray_0.33.5
## [31] cachem_1.1.0 limma_3.63.3
## [33] magick_2.8.5 abind_1.4-8
## [35] tidyselect_1.2.1 locfit_1.5-9.11
## [37] digest_0.6.37 purrr_1.0.4
## [39] arrow_18.1.0.1 maketools_1.3.2
## [41] fastmap_1.2.0 grid_4.4.2
## [43] cli_3.6.3 SparseArray_1.7.5
## [45] magrittr_2.0.3 S4Arrays_1.7.3
## [47] h5mread_0.99.4 edgeR_4.5.2
## [49] DelayedMatrixStats_1.29.1 UCSC.utils_1.3.1
## [51] bit64_4.6.0-1 rmarkdown_2.29
## [53] XVector_0.47.2 httr_1.4.7
## [55] DropletUtils_1.27.2 bit_4.5.0.1
## [57] R.methodsS3_1.8.2 beachmat_2.23.6
## [59] HDF5Array_1.35.12 evaluate_1.0.3
## [61] knitr_1.49 rlang_1.1.5
## [63] Rcpp_1.0.14 glue_1.8.0
## [65] scuttle_1.17.0 formatR_1.14
## [67] BiocManager_1.30.25 jsonlite_1.8.9
## [69] R6_2.6.0 Rhdf5lib_1.29.0