Title: | `SPOTlight`: Spatial Transcriptomics Deconvolution |
---|---|
Description: | `SPOTlight`provides a method to deconvolute spatial transcriptomics spots using a seeded NMF approach along with visualization tools to assess the results. Spatially resolved gene expression profiles are key to understand tissue organization and function. However, novel spatial transcriptomics (ST) profiling techniques lack single-cell resolution and require a combination with single-cell RNA sequencing (scRNA-seq) information to deconvolute the spatially indexed datasets. Leveraging the strengths of both data types, we developed SPOTlight, a computational tool that enables the integration of ST with scRNA-seq data to infer the location of cell types and states within a complex tissue. SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes and non-negative least squares (NNLS) to subsequently deconvolute ST capture locations (spots). |
Authors: | Marc Elosua-Bayes [aut, cre], Helena L. Crowell [aut] |
Maintainer: | Marc Elosua-Bayes <[email protected]> |
License: | GPL-3 |
Version: | 1.11.0 |
Built: | 2024-10-31 05:39:32 UTC |
Source: | https://github.com/bioc/SPOTlight |
mockSC/mockSP()
are designed to generate synthetic single-cell and
spatial mixture data. These data are not meant to represent biologically
meaningful use-cases, but are solely intended for use in examples, for
unit-testing, and to demonstrate SPOTlight
's general functionality.
Finally, .get_mgs()
implements a statistically naive way to select
markers from single-cell data; again, please don't use it in real life.
mockSC(ng = 200, nc = 50, nt = 3) mockSP(x, ns = 100) getMGS(x, n_top = 10)
mockSC(ng = 200, nc = 50, nt = 3) mockSP(x, ns = 100) getMGS(x, n_top = 10)
ng , nc , nt , ns
|
integer scalar specifying the number of genes, cells, types (groups) and spots to simulate. |
x |
Single cell experiment object |
n_top |
integer specifying the number of marker genes to extract for each cluster. |
mockSC
returns a SingleCellExperiment
with rows = genes, columns = single cells, and cell metadata
(colData
) column type
containing group identifiers.
mockSP
returns a SingleCellExperiment
with rows = genes, columns = single cells, and cell metadata
(colData
) column type
containing group identifiers.
getMGS
returns a data.frame
with nt*n_top
rows and 3 columns: gene and type (group) identifier, as well as the
gene's weight = the proportion of counts accounted for by that type.
sce <- mockSC() spe <- mockSP(sce) mgs <- getMGS(sce)
sce <- mockSC() spe <- mockSP(sce) mgs <- getMGS(sce)
This function takes in a matrix with the predicted proportions for each spot and returns a correlation matrix between cell types.
plotCorrelationMatrix( x, cor.method = c("pearson", "kendall", "spearman"), insig = c("blank", "pch"), colors = c("#6D9EC1", "white", "#E46726"), hc.order = TRUE, p.mat = TRUE, ... )
plotCorrelationMatrix( x, cor.method = c("pearson", "kendall", "spearman"), insig = c("blank", "pch"), colors = c("#6D9EC1", "white", "#E46726"), hc.order = TRUE, p.mat = TRUE, ... )
x |
numeric matrix with rows = samples and columns = cell types Must have at least two rows and two columns. |
cor.method |
Method to use for correlation: c("pearson", "kendall", "spearman"). By default pearson. |
insig |
character, specialized insignificant correlation coefficients, "pch", "blank" (default). If "blank", wipe away the corresponding glyphs; if "pch", add characters (see pch for details) on corresponding glyphs. |
colors |
character vector with three colors indicating the lower, mid, and high color. By default c("#6D9EC1", "white", "#E46726"). |
hc.order |
logical value. If TRUE, correlation matrix will be hc.ordered using hclust function. |
p.mat |
logical value. If TRUE (default), correlation significance will be used. If FALSE arguments sig.level, insig, pch, pch.col, pch.cex are invalid. |
... |
additional graphical parameters passed to |
ggplot
object
Marc Elosua Bayes & Helena L Crowell
set.seed(321) x <- replicate(m <- 25, runif(10, 0, 1)) rownames(x) <- paste0("spot", seq_len(nrow(x))) colnames(x) <- paste0("type", seq_len(ncol(x))) # The most basic example plotCorrelationMatrix(x = x) # Showing the non-significant correlatinos plotCorrelationMatrix(x = x, insig = "pch") # A more elaborated plotCorrelationMatrix( x = x, hc.order = FALSE, type = "lower", outline.col = "lightgrey", method = "circle", colors = c("#64ccc9", "#b860bd", "#e3345d"))
set.seed(321) x <- replicate(m <- 25, runif(10, 0, 1)) rownames(x) <- paste0("spot", seq_len(nrow(x))) colnames(x) <- paste0("type", seq_len(ncol(x))) # The most basic example plotCorrelationMatrix(x = x) # Showing the non-significant correlatinos plotCorrelationMatrix(x = x, insig = "pch") # A more elaborated plotCorrelationMatrix( x = x, hc.order = FALSE, type = "lower", outline.col = "lightgrey", method = "circle", colors = c("#64ccc9", "#b860bd", "#e3345d"))
This function takes in an image-related object - path to JP(E)G/PNG file, raster object, RGBarray. It returns a ggplot object with the selected image.
x |
A variety of objects can be passed: character string corresponding to an image file path, valid file types are JPG, JPEG and PNG. It can also take as input objects of class raster and RGB arrays. It can also take a SpatialExperiment from which the image will be extracted. |
slice |
Character string indicating which image slice to use when SpatialExperiment objects are passed. By default uses the first slice available. |
ggplot
object
Marc Elosua Bayes & Helena L Crowell
# Filename path <- file.path( system.file(package = "SPOTlight"), "extdata/SPOTlight.png") plotImage(x = path) # array png_img <- png::readPNG(path) plotImage(png_img) # SpatialExperiment
# Filename path <- file.path( system.file(package = "SPOTlight"), "extdata/SPOTlight.png") plotImage(x = path) # array png_img <- png::readPNG(path) plotImage(png_img) # SpatialExperiment
This function takes in a matrix with the predicted proportions
for each spot and returns a heatmap which = plotHeatmap
or a network
graph which = plotNetwork
to show which cells are interacting
spatially.
plotInteractions( x, which = c("heatmap", "network"), metric = c("prop", "jaccard"), min_prop = 0, ... )
plotInteractions( x, which = c("heatmap", "network"), metric = c("prop", "jaccard"), min_prop = 0, ... )
x |
numeric matrix with rows = samples and columns = groups. Must have at least one row and column, and at least two columns. |
which |
character string specifying the type of visualization: one of "heatmap" or "network". |
metric |
character string specifying which metric to show: one of "prop" or "jaccard". |
min_prop |
scalar specifying the value above which
a group is considered to be contributing to a given sample.
An interaction between groups i and j is counted for sample s
only when both x[s, i] and x[s, j] fall above |
... |
additional graphical parameters passed
to |
base R plot
Marc Elosua Bayes & Helena L Crowell
library(ggplot2) mat <- replicate(8, rnorm(100, runif(1, -1, 1))) # Basic example plotInteractions(mat) ### heatmap ### # This returns a ggplot object that can be modified as such plotInteractions(mat, which = "heatmap") + scale_fill_gradient(low = "#f2e552", high = "#850000") + labs(title = "Interaction heatmap", fill = "proportion") ### Network ### # specify node names nms <- letters[seq_len(ncol(mat))] plotInteractions(mat, which = "network", vertex.label = nms) # or set column names instead colnames(mat) <- nms plotInteractions(mat, which = "network") # pass additional graphical parameters for aesthetics plotInteractions(mat, which = "network", edge.color = "cyan", vertex.color = "pink", vertex.label.font = 2, vertex.label.color = "maroon")
library(ggplot2) mat <- replicate(8, rnorm(100, runif(1, -1, 1))) # Basic example plotInteractions(mat) ### heatmap ### # This returns a ggplot object that can be modified as such plotInteractions(mat, which = "heatmap") + scale_fill_gradient(low = "#f2e552", high = "#850000") + labs(title = "Interaction heatmap", fill = "proportion") ### Network ### # specify node names nms <- letters[seq_len(ncol(mat))] plotInteractions(mat, which = "network", vertex.label = nms) # or set column names instead colnames(mat) <- nms plotInteractions(mat, which = "network") # pass additional graphical parameters for aesthetics plotInteractions(mat, which = "network", edge.color = "cyan", vertex.color = "pink", vertex.label.font = 2, vertex.label.color = "maroon")
This function takes in the coordinates of the spots and the proportions of the cell types within each spot. It returns a plot where each spot is a piechart showing proportions of the cell type composition.
plotSpatialScatterpie( x, y, cell_types = colnames(y), img = FALSE, slice = NULL, scatterpie_alpha = 1, pie_scale = 0.4, degrees = NULL, axis = NULL, ... )
plotSpatialScatterpie( x, y, cell_types = colnames(y), img = FALSE, slice = NULL, scatterpie_alpha = 1, pie_scale = 0.4, degrees = NULL, axis = NULL, ... )
x |
Object containing the spots coordinates, it can be an object of class SpatialExperiment, dataframe or matrix. For the latter two rownames should have the spot barcodes to match x. If a matrix it has to of dimensions nrow(y) x 2 where the columns are the x and y coordinates in that order. |
y |
Matrix or dataframe containing the deconvoluted spots. rownames need to be the spot barcodes to match to x. |
cell_types |
Vector of cell type names to plot. By default uses the column names of y. |
img |
Logical TRUE or FALSE indicating whether to plot the image or not.
Objects of classes accepted by |
slice |
Character string indicating which slice to plot if img is TRUE. By default uses the first image. |
scatterpie_alpha |
Numeric scalar to set the alpha of the pie charts. By default 1. |
pie_scale |
Numeric scalar to set the size of the pie charts. By default 0.4. |
degrees |
From SpatialExperiment rotateImg. For clockwise (degrees > 0) and counter-clockwise (degrees < 0) rotation. By default NULL. |
axis |
From SpatialExperiment mirrorImg. When a SpatialExperiment object is passed as the image return the mirror image. For horizontal (axis = "h") and vertical (axis = "v") mirroring. By default NULL. |
... |
additional parameters to geom_scatterpie |
ggplot
object
Marc Elosua Bayes & Helena L Crowell
set.seed(321) # Coordinates x <- replicate(2, rnorm(100)) rownames(x) <- paste0("spot", seq_len(nrow(x))) colnames(x) <- c("imagecol", "imagerow") # Proportions y <- replicate(m <- 5, runif(nrow(x), 0, 1)) y <- prop.table(y, 1) rownames(y) <- paste0("spot", seq_len(nrow(y))) colnames(y) <- paste0("type", seq_len(ncol(y))) (plt <- plotSpatialScatterpie(x = x, y = y))
set.seed(321) # Coordinates x <- replicate(2, rnorm(100)) rownames(x) <- paste0("spot", seq_len(nrow(x))) colnames(x) <- c("imagecol", "imagerow") # Proportions y <- replicate(m <- 5, runif(nrow(x), 0, 1)) y <- prop.table(y, 1) rownames(y) <- paste0("spot", seq_len(nrow(y))) colnames(y) <- paste0("type", seq_len(ncol(y))) (plt <- plotSpatialScatterpie(x = x, y = y))
This function takes in the fitted NMF model and returns the
topic profiles learned for each cell facet = FALSE
or cell type
facet = TRUE
. Ideal training will return all the cell from the same
cell type to share a unique topic profile.
plotTopicProfiles(x, y, facet = FALSE, min_prop = 0.01, ncol = NULL)
plotTopicProfiles(x, y, facet = FALSE, min_prop = 0.01, ncol = NULL)
x |
|
y |
vector of group labels. Should be of length |
facet |
logical indicating whether to stratify by group.
If |
min_prop |
scalar in [0,1]. When |
ncol |
integer scalar specifying the number of facet columns. |
ggplot
object
Marc Elosua Bayes & Helena L Crowell
library(ggplot2) x <- mockSC() y <- mockSP(x) z <- getMGS(x) res <- SPOTlight(x, y, groups = x$type, mgs = z, group_id = "type", verbose = FALSE) plotTopicProfiles(res[[3]], x$type, facet = TRUE) plotTopicProfiles(res[[3]], x$type, facet = FALSE)
library(ggplot2) x <- mockSC() y <- mockSP(x) z <- getMGS(x) res <- SPOTlight(x, y, groups = x$type, mgs = z, group_id = "type", verbose = FALSE) plotTopicProfiles(res[[3]], x$type, facet = TRUE) plotTopicProfiles(res[[3]], x$type, facet = FALSE)
This function takes in the mixture data, the trained model & the topic profiles and returns the proportion of each cell type within each mixture
runDeconvolution( x, mod, ref, scale = TRUE, min_prop = 0.01, verbose = TRUE, slot = "counts" )
runDeconvolution( x, mod, ref, scale = TRUE, min_prop = 0.01, verbose = TRUE, slot = "counts" )
x |
mixture dataset. Can be a numeric matrix,
|
mod |
object of class NMFfit as obtained from trainNMF. |
ref |
Object of class matrix containing the topic profiles for each cell type as obtained from trainNMF. |
scale |
logical specifying whether to scale single-cell counts to unit variance. This gives the user the option to normalize the data beforehand as you see fit (CPM, FPKM, ...) when passing a matrix or specifying the slot from where to extract the count data. |
min_prop |
scalar in [0,1] setting the minimum contribution
expected from a cell type in |
verbose |
logical. Should information on progress be reported? |
slot |
If the object is of class |
base a list where the first element is an NMFfit
object and
the second is a matrix containing the topic profiles learnt.
Marc Elosua Bayes & Helena L Crowell
set.seed(321) # mock up some single-cell, mixture & marker data sce <- mockSC(ng = 200, nc = 10, nt = 3) spe <- mockSP(sce) mgs <- getMGS(sce) res <- trainNMF( x = sce, y = spe, groups = sce$type, mgs = mgs, weight_id = "weight", group_id = "type", gene_id = "gene") # Run deconvolution decon <- runDeconvolution( x = spe, mod = res[["mod"]], ref = res[["topic"]])
set.seed(321) # mock up some single-cell, mixture & marker data sce <- mockSC(ng = 200, nc = 10, nt = 3) spe <- mockSP(sce) mgs <- getMGS(sce) res <- trainNMF( x = sce, y = spe, groups = sce$type, mgs = mgs, weight_id = "weight", group_id = "type", gene_id = "gene") # Run deconvolution decon <- runDeconvolution( x = spe, mod = res[["mod"]], ref = res[["topic"]])
This is the backbone function which takes in single cell expression data to deconvolute spatial transcriptomics spots.
SPOTlight( x, y, groups = NULL, mgs, n_top = NULL, gene_id = "gene", group_id = "cluster", weight_id = "weight", hvg = NULL, scale = TRUE, model = c("ns", "std"), min_prop = 0.01, verbose = TRUE, slot_sc = "counts", slot_sp = "counts", ... )
SPOTlight( x, y, groups = NULL, mgs, n_top = NULL, gene_id = "gene", group_id = "cluster", weight_id = "weight", hvg = NULL, scale = TRUE, model = c("ns", "std"), min_prop = 0.01, verbose = TRUE, slot_sc = "counts", slot_sp = "counts", ... )
x , y
|
single-cell and mixture dataset, respectively. Can be a
numeric matrix or a |
groups |
vector of group labels for cells in |
mgs |
|
n_top |
integer scalar specifying the number of markers to select per group. By default NULL uses all the marker genes to initialize the model. |
gene_id , group_id , weight_id
|
character specifying the column
in |
hvg |
character vector containing hvg to include in the model. By default NULL. |
scale |
logical specifying whether to scale single-cell counts to unit variance. This gives the user the option to normalize the data beforehand as you see fit (CPM, FPKM, ...) when passing a matrix or specifying the slot from where to extract the count data. |
model |
character string indicating which model to use when running NMF. Either "ns" (default) or "std". |
min_prop |
scalar in [0,1] setting the minimum contribution
expected from a cell type in |
verbose |
logical. Should information on progress be reported? |
slot_sc , slot_sp
|
If the object is of class |
... |
additional parameters. |
SPOTlight uses a Non-Negative Matrix Factorization approach to learn which genes are important for each cell type. In order to drive the factorization and give more importance to cell type marker genes we previously compute them and use them to initialize the basis matrix. This initialized matrices will then be used to carry out the factorization with the single cell expression data. Once the model has learn the topic profiles for each cell type we use non-negative least squares (NNLS) to obtain the topic contributions to each spot. Lastly, NNLS is again used to obtain the proportion of each cell type for each spot by finding the fitting the single-cell topic profiles to the spots topic contributions.
a numeric matrix with rows corresponding to samples and columns to groups
Marc Elosua-Bayes & Helena L. Crowell
library(scater) library(scran) # Use Mock data # Refer to the vignette for a full workflow sce <- mockSC(ng = 200, nc = 10, nt = 3) spe <- mockSP(sce) mgs <- getMGS(sce) res <- SPOTlight( x = counts(sce), y = counts(spe), groups = as.character(sce$type), mgs = mgs, hvg = NULL, weight_id = "weight", group_id = "type", gene_id = "gene")
library(scater) library(scran) # Use Mock data # Refer to the vignette for a full workflow sce <- mockSC(ng = 200, nc = 10, nt = 3) spe <- mockSP(sce) mgs <- getMGS(sce) res <- SPOTlight( x = counts(sce), y = counts(spe), groups = as.character(sce$type), mgs = mgs, hvg = NULL, weight_id = "weight", group_id = "type", gene_id = "gene")
This is the training function used by SPOTLight. This function takes in single cell expression data, trains the model and learns topic profiles for each cell type
trainNMF( x, y, groups = NULL, mgs, n_top = NULL, gene_id = "gene", group_id = "cluster", weight_id = "weight", hvg = NULL, model = c("ns", "std"), scale = TRUE, verbose = TRUE, slot_sc = "counts", slot_sp = "counts", ... )
trainNMF( x, y, groups = NULL, mgs, n_top = NULL, gene_id = "gene", group_id = "cluster", weight_id = "weight", hvg = NULL, model = c("ns", "std"), scale = TRUE, verbose = TRUE, slot_sc = "counts", slot_sp = "counts", ... )
x , y
|
single-cell and mixture dataset, respectively. Can be a
numeric matrix or |
groups |
character vector of group labels for cells in |
mgs |
|
n_top |
integer scalar specifying the number of markers to select per group. By default NULL uses all the marker genes to initialize the model. |
gene_id , group_id , weight_id
|
character specifying the column
in |
hvg |
character vector containing hvg to include in the model. By default NULL. |
model |
character string indicating which model to use when running NMF. Either "ns" (default) or "std". |
scale |
logical specifying whether to scale single-cell counts to unit variance. This gives the user the option to normalize the data beforehand as you see fit (CPM, FPKM, ...) when passing a matrix or specifying the slot from where to extract the count data. |
verbose |
logical. Should information on progress be reported? |
slot_sc , slot_sp
|
If the object is of class |
... |
additional parameters. |
base a list where the first element is an NMFfit
object and
the second is a matrix containing the topic profiles learnt.
Marc Elosua Bayes & Helena L Crowell
set.seed(321) # mock up some single-cell, mixture & marker data sce <- mockSC(ng = 200, nc = 10, nt = 3) spe <- mockSP(sce) mgs <- getMGS(sce) res <- trainNMF( x = sce, y = spe, groups = sce$type, mgs = mgs, weight_id = "weight", group_id = "type", gene_id = "gene") # Get NMF model res[["mod"]] # Get topic profiles res[["topic"]]
set.seed(321) # mock up some single-cell, mixture & marker data sce <- mockSC(ng = 200, nc = 10, nt = 3) spe <- mockSP(sce) mgs <- getMGS(sce) res <- trainNMF( x = sce, y = spe, groups = sce$type, mgs = mgs, weight_id = "weight", group_id = "type", gene_id = "gene") # Get NMF model res[["mod"]] # Get topic profiles res[["topic"]]