Title: | DESpace: a framework to discover spatially variable genes |
---|---|
Description: | Intuitive framework for identifying spatially variable genes (SVGs) via edgeR, a popular method for performing differential expression analyses. Based on pre-annotated spatial clusters as summarized spatial information, DESpace models gene expression with a negative binomial (NB) model, via edgeR, using spatial clusters as covariates. SVGs are then identified by testing the significance of the spatial clusters. The method is flexible and robust, and is faster than most SV methods. Furthermore, to the best of our knowledge, it is the only SV approach that allows: (i) performing an SV test on each individual spatial cluster, hence identifying the key regions of the tissue affected by spatial variability; (ii) jointly fitting multiple samples, targeting genes with consistent spatial patterns across replicates. |
Authors: | Peiying Cai [aut, cre], Simone Tiberi [aut] |
Maintainer: | Peiying Cai <[email protected]> |
License: | GPL-3 |
Version: | 1.7.0 |
Built: | 2024-10-30 05:29:47 UTC |
Source: | https://github.com/bioc/DESpace |
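To make the modelling idea concrete, the sketch below runs a negative binomial likelihood-ratio test for one simulated gene, with spatial clusters as covariates. It is not DESpace's code: it swaps edgeR for `MASS::glm.nb()` and `MASS::negative.binomial()` (MASS ships with R), and the simulated counts, the log library-size offset, and reusing the full-model dispersion for the null model are assumptions made for illustration.

```r
library(MASS)  # glm.nb() and negative.binomial()

set.seed(1)
# Simulated data (assumption): 300 spots in 3 spatial clusters,
# with a gene that is about 3 times more abundant in cluster "B".
n_spots <- 300
cluster <- factor(rep(c("A", "B", "C"), each = n_spots / 3))
lib_size <- round(runif(n_spots, 2000, 6000))
mu <- lib_size * 1e-3 * ifelse(cluster == "B", 3, 1)
counts <- rnbinom(n_spots, mu = mu, size = 5)

# Full model with spatial clusters as covariates; estimates the NB dispersion (theta).
theta <- glm.nb(counts ~ cluster + offset(log(lib_size)))$theta

# Refit full and null (intercept-only) models with the same fixed theta,
# so the two models differ only in the cluster effects.
nb_family <- negative.binomial(theta)
full_fit <- glm(counts ~ cluster + offset(log(lib_size)), family = nb_family)
null_fit <- glm(counts ~ 1 + offset(log(lib_size)), family = nb_family)

# Likelihood-ratio test of the cluster effects (df = number of clusters - 1).
LR <- as.numeric(2 * (logLik(full_fit) - logLik(null_fit)))
df_test <- nlevels(cluster) - 1
p_value <- pchisq(LR, df = df_test, lower.tail = FALSE)
c(LR = LR, df = df_test, PValue = p_value)
```

edgeR additionally shares information across genes when estimating dispersions, which a single-gene sketch like this cannot show.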
'DESpace_test' identifies spatially variable genes (SVGs) from spatially-resolved transcriptomics data, provided spatial clusters are available.
DESpace_test( spe, spatial_cluster, sample_col = NULL, replicates = FALSE, min_counts = 20, min_non_zero_spots = 10, filter_gene = TRUE, verbose = FALSE )
Argument | Description |
---|---|
spe | SpatialExperiment or SingleCellExperiment. |
spatial_cluster | Column name of spatial clusters in colData(spe). |
sample_col | Column name of sample ids in colData(spe). |
replicates | A logical, indicating whether biological replicates are provided (TRUE) or not (FALSE). If biological replicates are provided, DESpace_test performs a joint multi-sample test. |
min_counts | Minimum number of counts per sample (across all spots) for a gene to be analyzed. |
min_non_zero_spots | Minimum number of non-zero spots per sample for a gene to be analyzed. |
filter_gene | A logical. If TRUE, genes are filtered according to min_counts and min_non_zero_spots. |
verbose | A logical. If TRUE, the full edgeR statistics (the glmFit and glmLRT objects) are also returned. |
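The two filtering thresholds can be read as a row-wise rule on a genes-by-spots count matrix. The helper below is hypothetical (it is not part of DESpace) and only illustrates the rule for a single sample; DESpace's own filtering code, and how it combines thresholds across samples, may differ.

```r
# Hypothetical helper (not a DESpace function): keep genes with at least
# `min_counts` total counts across spots AND at least `min_non_zero_spots`
# spots with a non-zero count.
filter_genes_sketch <- function(counts, min_counts = 20, min_non_zero_spots = 10) {
  total_counts <- rowSums(counts)
  non_zero_spots <- rowSums(counts > 0)
  keep <- total_counts >= min_counts & non_zero_spots >= min_non_zero_spots
  counts[keep, , drop = FALSE]
}

# Toy genes x spots matrix (made-up values): 3 genes, 15 spots.
counts <- rbind(
  gene1 = rep(2, 15),             # 30 counts, 15 non-zero spots: kept
  gene2 = c(25, rep(0, 14)),      # 25 counts,  1 non-zero spot:  dropped
  gene3 = c(rep(1, 12), 0, 0, 0)  # 12 counts, 12 non-zero spots: dropped
)
kept <- filter_genes_sketch(counts)
rownames(kept)
# → "gene1"
```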
- If 'sample_col' is not specified and 'replicates == FALSE', DESpace_test assumes that the data come from a single sample and performs SV testing on it.
- If 'sample_col' is provided and 'replicates == FALSE', DESpace_test tests each sample individually and returns a list of results, one per sample.
- If 'sample_col' is provided and 'replicates == TRUE', DESpace_test performs a joint multi-sample test.
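One way to picture the joint multi-sample test is as a comparison of two nested designs, in which sample acts as a nuisance covariate and only the spatial-cluster coefficients are tested. The sketch below only builds such design matrices with base R's `model.matrix()`; the sample and cluster labels are made up, and the assumption that DESpace uses exactly this additive sample + cluster design is ours, for illustration.

```r
# Hypothetical multi-sample layout: 3 samples x 50 spots, 4 spatial clusters.
sample_id <- factor(rep(c("s1", "s2", "s3"), each = 50))
cluster <- factor(rep(c("L1", "L2", "L3", "WM"), length.out = 150))

# Full design: sample effects + spatial-cluster effects.
design_full <- model.matrix(~ sample_id + cluster)
# Reduced design: sample effects only.
design_reduced <- model.matrix(~ sample_id)

# Coefficients that a likelihood-ratio test would drop (the cluster effects).
tested <- setdiff(colnames(design_full), colnames(design_reduced))
tested
# → "clusterL2" "clusterL3" "clusterWM"
```

Keeping sample in both designs means that differences in overall abundance between replicates are not mistaken for spatial variability.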
A list of results:
- "gene_results": a data.frame containing the main edgeR test results;
- "estimated_y": a DGEList object containing the estimated common dispersion, which can later be used to speed up calculations when testing individual clusters;
- "glmFit" (only if verbose = TRUE): a DGEGLM object containing the full statistics from "edgeR::glmFit";
- "glmLRT" (only if verbose = TRUE): a DGELRT object containing the full statistics from "edgeR::glmLRT".
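With verbose = TRUE, the extra glmFit and glmLRT elements are ordinary edgeR objects. A short usage sketch, assuming DESpace and edgeR are installed; passing the DGELRT object to `edgeR::topTags()` is standard edgeR usage rather than something documented by DESpace.

```r
library(DESpace)
library(edgeR)  # topTags()

data("LIBD_subset", package = "DESpace")
set.seed(123)
res_verbose <- DESpace_test(spe = LIBD_subset,
                            spatial_cluster = "layer_guess_reordered",
                            verbose = TRUE)

# With verbose = TRUE the list also holds the edgeR fit and LRT objects.
names(res_verbose)
class(res_verbose$glmLRT)

# The DGELRT object works with standard edgeR helpers, e.g. the three genes
# with the strongest evidence of spatial variability:
topTags(res_verbose$glmLRT, n = 3)
```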
See also: top_results, individual_test, FeaturePlot.
library(DESpace)
# load the input data:
data("LIBD_subset", package = "DESpace")
LIBD_subset
# Fit the model via the DESpace_test function.
set.seed(123)
results_DESpace <- DESpace_test(spe = LIBD_subset,
                                spatial_cluster = "layer_guess_reordered",
                                verbose = FALSE)
# DESpace_test returns a list of 2 objects:
# "gene_results": a data.frame containing the main edgeR test results;
# "estimated_y": a DGEList object containing the estimated common dispersion,
# which can later be used to speed up calculations when testing individual clusters.
# We visualize the differential results:
head(results_DESpace$gene_results, 3)
Plot spatial gene expression.
This function is a modified version of the featurePlot function from the BayesSpace R package. In comparison to the original BayesSpace function, this function allows plotting multiple genes simultaneously and drawing an outline around a specified cluster.
FeaturePlot( spe, feature, coordinates = NULL, assay.type = "logcounts", Annotated_cluster = FALSE, diverging = FALSE, low = NULL, high = NULL, mid = NULL, color = NULL, platform = "Visium", spatial_cluster = NULL, cluster = NULL, legend_cluster = FALSE, label = FALSE, ncol = 3, title = FALSE, linewidth = 0.4, legend_exprs = FALSE, title_size = 10 )
Argument | Description |
---|---|
spe | SpatialExperiment or SingleCellExperiment. If |
feature | Feature vector used to color each spot. May be the name of a gene/row in an assay of spe. |
coordinates | Column names of spatial coordinates of spots stored in colData(spe). |
assay.type | String indicating which assay in spe the expression values are taken from (default: "logcounts"). |
Annotated_cluster | A logical (TRUE or FALSE), indicating whether to plot the annotated spatial clusters next to the expression plots. |
diverging | A logical. If TRUE, use a diverging color gradient in FeaturePlot. |
low, mid, high | Optional hex codes for the low, mid, and high values of the color gradient used for continuous spot values. |
color | Optional hex code to set the color of borders around spots. Set to |
platform | Spatial sequencing platform. If "Visium", the hex spot layout is used; otherwise, square spots are plotted. |
spatial_cluster | Column name of spatial clusters in colData(spe). |
cluster | Names of the spatial clusters used for drawing a boundary around the group of points that belong to the specified cluster. It can be NULL, "all"/"ALL", or a vector of cluster names. |
legend_cluster | A logical (TRUE or FALSE), indicating whether to plot the legend for the outlined clusters (TRUE) or not (FALSE). Only used when 'spatial_cluster' and 'cluster' are specified. |
label | A logical (TRUE or FALSE). If TRUE, add a label and an arrow pointing to a group. |
ncol | The number of columns of the plot grid. By default, 1 if the length of feature equals 1, and 3 otherwise. |
title | A logical (TRUE or FALSE). If TRUE, the title of each subplot is the gene name. |
linewidth | The width of the boundary line around the cluster (default: 0.4). |
legend_exprs | A logical (TRUE or FALSE), indicating whether to plot the legend for the expression level (TRUE) or not (FALSE). |
title_size | Text size of the plot titles. |
Returns a ggplot object.
See also: DESpace_test, individual_test, top_results.
library(DESpace)
# load the input data:
data("LIBD_subset", package = "DESpace")
# load pre-computed results (obtained via `DESpace_test`)
data("results_DESpace_test", package = "DESpace")
# Visualize the gene expression of the top three genes
feature <- results_DESpace_test$gene_results$gene_id[seq_len(3)]
FeaturePlot(LIBD_subset, feature,
            coordinates = c("array_row", "array_col"),
            ncol = 3, title = TRUE)
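A second usage sketch combines the outline options documented above: it plots the top gene, draws a boundary around one cluster and shows the annotated clusters alongside. It assumes DESpace is installed and that "WM" is a level of layer_guess_reordered (as in the individual_test example); the particular combination of arguments is illustrative.

```r
library(DESpace)
data("LIBD_subset", package = "DESpace")
data("results_DESpace_test", package = "DESpace")

# Top-ranked gene from the pre-computed gene-level results.
top_gene <- results_DESpace_test$gene_results$gene_id[1]

# Outline the "WM" cluster and plot the annotated clusters next to the expression.
p <- FeaturePlot(LIBD_subset, top_gene,
                 coordinates = c("array_row", "array_col"),
                 spatial_cluster = "layer_guess_reordered",
                 cluster = "WM",
                 Annotated_cluster = TRUE,
                 legend_cluster = TRUE,
                 title = TRUE)
p
```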
DESpace can also be used to reveal the specific areas of the tissue affected by SVGs, i.e., the spatial clusters in which a gene is particularly over- or under-abundant compared to the average signal. This function identifies SVGs for each individual cluster.
individual_test( spe, spatial_cluster, sample_col = "sample_id", edgeR_y = NULL, min_counts = 20, min_non_zero_spots = 10, filter_gene = TRUE, replicates = FALSE, BPPARAM = NULL )
Argument | Description |
---|---|
spe | SpatialExperiment or SingleCellExperiment. |
spatial_cluster | Column name of spatial clusters in colData(spe). |
sample_col | Column name of sample ids in colData(spe). |
edgeR_y | Pre-estimated dispersion (e.g., the 'estimated_y' element returned by DESpace_test); if NULL, the dispersion is computed. |
min_counts | Minimum number of counts per sample (across all spots) for a gene to be analyzed. |
min_non_zero_spots | Minimum number of non-zero spots per sample for a gene to be analyzed. |
filter_gene | A logical. If TRUE, genes are filtered according to min_counts and min_non_zero_spots. |
replicates | A logical, indicating whether to perform a single-sample (FALSE) or multi-sample (TRUE) test. |
BPPARAM | An optional parameter passed internally to bplapply. We suggest using as many cores as the number of spatial clusters. If unspecified, the script does not run in parallel. Note that parallelization is mainly beneficial when dispersion estimates are not provided beforehand. Moreover, parallelizing the script increases the memory requirement; if memory is an issue, leave 'BPPARAM' unspecified and, hence, avoid parallelization. |
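A usage sketch of the parallel setting, assuming DESpace and BiocParallel are installed. It leaves edgeR_y as NULL (the case in which parallelization is described above as most useful) and requests one worker per spatial cluster; dropping NA cluster labels when counting clusters is our assumption. `MulticoreParam()` is not supported on Windows, where `SnowParam()` is a common alternative.

```r
library(DESpace)
library(BiocParallel)  # MulticoreParam()

data("LIBD_subset", package = "DESpace")

# One worker per spatial cluster (NA labels ignored when counting).
n_clusters <- length(unique(na.omit(LIBD_subset$layer_guess_reordered)))

# Dispersion is re-estimated for each cluster-level test (edgeR_y = NULL).
set.seed(123)
res_parallel <- individual_test(LIBD_subset,
                                spatial_cluster = "layer_guess_reordered",
                                BPPARAM = MulticoreParam(workers = n_clusters))
length(res_parallel)
```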
For every spatial cluster tested, edgeR would normally re-compute the dispersion estimates based on the specific design of the test. However, this calculation accounts for the majority of the overall computing time. Therefore, to speed up calculations, we propose to use the dispersion estimates that were previously computed for the gene-level tests. This introduces a minor approximation which, in our benchmarks, does not lead to decreased accuracy.
If you want to use the pre-computed gene-level dispersion estimates, set edgeR_y to the 'estimated_y' element returned by DESpace_test.
Alternatively, if you want to re-compute the dispersion estimates (a significantly slower, but marginally more accurate option), leave edgeR_y as NULL (the default).
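The dispersion-reuse idea can be illustrated outside edgeR. For one simulated gene, the sketch below estimates a negative binomial dispersion once from the gene-level model (all clusters as covariates) and then reuses it in a likelihood-ratio test of each cluster against the rest of the tissue. It uses MASS instead of edgeR, and the one-vs-rest coding of each cluster, the simulated data and the helper `cluster_test()` are assumptions made for illustration; DESpace's own cluster-level design may differ.

```r
library(MASS)  # glm.nb() and negative.binomial()

set.seed(3)
# Simulated single gene (assumption): 4 clusters, about 4x higher abundance in "WM".
cluster <- factor(rep(c("L1", "L2", "L3", "WM"), each = 60))
lib_size <- round(runif(length(cluster), 2000, 6000))
mu <- lib_size * 1e-3 * ifelse(cluster == "WM", 4, 1)
counts <- rnbinom(length(cluster), mu = mu, size = 5)
offs <- log(lib_size)

# Gene-level fit: theta is estimated once, with all clusters as covariates.
theta <- glm.nb(counts ~ cluster + offset(offs))$theta

# Hypothetical helper: test cluster k against the rest of the tissue,
# reusing the gene-level theta instead of re-estimating it.
cluster_test <- function(k) {
  in_k <- factor(cluster == k, levels = c(FALSE, TRUE))
  fam <- negative.binomial(theta)
  full_fit <- glm(counts ~ in_k + offset(offs), family = fam)
  null_fit <- glm(counts ~ 1 + offset(offs), family = fam)
  LR <- as.numeric(2 * (logLik(full_fit) - logLik(null_fit)))
  data.frame(Cluster = k,
             logFC = unname(coef(full_fit)["in_kTRUE"]) / log(2),  # natural log -> log2
             LR = LR,
             PValue = pchisq(LR, df = 1, lower.tail = FALSE))
}

res <- do.call(rbind, lapply(levels(cluster), cluster_test))
res[order(res$PValue), ]
```

Because theta is computed only once, each additional cluster costs two fixed-dispersion GLM fits rather than a new dispersion estimate.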
A list of results, with one element per spatial cluster. Specifically, each element of the list is a "gene_results" data.frame containing the main edgeR test results for that cluster.
See also: top_results, DESpace_test, FeaturePlot.
library(DESpace)
# load the input data:
data("LIBD_subset", package = "DESpace")
LIBD_subset
# load pre-computed results (obtained via `DESpace_test`)
data("results_DESpace_test", package = "DESpace")
# DESpace_test returns a list of 2 objects:
# "gene_results": a data.frame containing the main edgeR test results;
# "estimated_y": a DGEList object containing the estimated common dispersion,
# which can later be used to speed up calculations when testing individual clusters.
# We visualize the differential results:
head(results_DESpace_test$gene_results, 3)
# Individual cluster test: identify SVGs for each individual cluster.
# To run in parallel, set 'BPPARAM'; we suggest using as many cores as the number of spatial clusters.
# Note that parallelizing the script will increase the memory requirement;
# if memory is an issue, leave 'BPPARAM' unspecified and, hence, avoid parallelization.
set.seed(123)
results_individual_test <- individual_test(LIBD_subset,
                                           edgeR_y = results_DESpace_test$estimated_y,
                                           spatial_cluster = "layer_guess_reordered")
# We visualize results for the cluster 'WM'
results_WM <- results_individual_test[[7]]
head(results_WM, 3)
Subset from the human DLPFC 10x Genomics Visium dataset of the spatialLIBD package.
LIBD_subset | contains a SpatialExperiment object. |
Peiying Cai [email protected], Simone Tiberi [email protected]
# Connect to ExperimentHub
# ehub <- ExperimentHub::ExperimentHub()
# Download the example spe data
# spe_all <- spatialLIBD::fetch_data(type = "spe", eh = ehub)
# Select one sample only:
# LIBD_subset <- spe_all[, colData(spe_all)$sample_id == '151673']
# Select a small set of random genes for faster runtime
# set.seed(123)
# sel_genes <- sample(dim(LIBD_subset)[1], 500)
# LIBD_subset <- LIBD_subset[sel_genes, ]
# keep_col <- c("array_row", "array_col", "layer_guess_reordered")
# library(SpatialExperiment)
# LIBD_subset <- SpatialExperiment(assays = list(counts = assay(LIBD_subset),
#                                                logcounts = logcounts(LIBD_subset)),
#                                  colData = colData(LIBD_subset)[keep_col])
# save(LIBD_subset, file = "./DESpace/data/LIBD_subset.RData")
Results from the DESpace_test function.
results_DESpace_test | contains a list with the results of DESpace_test. |
Peiying Cai [email protected], Simone Tiberi [email protected]
# load the input data:
# data("LIBD_subset", package = "DESpace")
# LIBD_subset
#
# Fit the model via the `DESpace_test` function.
# Parameter `spe` specifies the input `SpatialExperiment` or `SingleCellExperiment` object,
# while `spatial_cluster` defines the column name of `colData(spe)` containing spatial clusters.
# To obtain all statistics, set `verbose` to `TRUE`; DESpace_test then returns a list of 4 objects:
# "gene_results": a data.frame containing the main edgeR test results;
# "estimated_y": a DGEList object containing the estimated common dispersion,
# which can later be used to speed up calculations when testing individual clusters;
# "glmFit": a DGEGLM object containing the full statistics from "edgeR::glmFit";
# "glmLRT": a DGELRT object containing the full statistics from "edgeR::glmLRT".
# With `verbose = FALSE` (as below), only "gene_results" and "estimated_y" are returned.
#
# set.seed(123)
# results_DESpace_test <- DESpace_test(spe = LIBD_subset,
#                                      spatial_cluster = "layer_guess_reordered",
#                                      verbose = FALSE)
#
# save(results_DESpace_test, file = "./DESpace/data/results_DESpace_test.RData")
Results from the individual_test function.
results_individual_test | contains a list with the results of individual_test. |
Peiying Cai [email protected], Simone Tiberi [email protected]
# load the input data:
# data("LIBD_subset", package = "DESpace")
# LIBD_subset
# load pre-computed results (obtained via `DESpace_test`)
# data("results_DESpace_test", package = "DESpace")
# results_DESpace_test
# Function `individual_test()` can be used to identify SVGs for each individual cluster.
# Parameter `spatial_cluster` indicates the column name of `colData(spe)`
# containing spatial clusters.
# set.seed(123)
# results_individual_test <- individual_test(LIBD_subset,
#                                            edgeR_y = results_DESpace_test$estimated_y,
#                                            spatial_cluster = "layer_guess_reordered")
# save(results_individual_test, file = "./DESpace/data/results_individual_test.RData")
Filter significant results.
top_results returns the significant results obtained via DESpace_test and individual_test. It can also be used to merge gene- and cluster-level results into a single object.
top_results( gene_results = NULL, cluster_results, cluster = NULL, select = "both", high_low = NULL )
Argument | Description |
---|---|
gene_results | Results returned from DESpace_test (its 'gene_results' element). |
cluster_results | Results returned from individual_test. |
cluster | A character indicating the cluster(s) whose results are returned. By default (NULL), results from all clusters are returned. |
select | A character indicating which results should be returned ("FDR", "logFC", or "both"). Only used if "cluster_results" are provided. By default ("both"), both FDR and logFC are returned. |
high_low | A character indicating whether to filter results or not. Only used if "cluster_results" are provided and one cluster is specified in the "cluster" parameter. By default (NULL), all results are returned in a single data.frame. If "high" or "HIGH", only SVGs with a higher average abundance in "cluster" than in the rest of the tissue (i.e., logFC > 0) are returned. If "low" or "LOW", only SVGs with a lower average abundance in "cluster" than in the rest of the tissue (i.e., logFC < 0) are returned. If "both" or "BOTH", both "high" and "low" results are returned, in two separate data.frames. |
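The high_low rule amounts to splitting one cluster's results by the sign of logFC. The helper below is hypothetical and uses made-up values; it mirrors the behaviour described above (results ordered by FDR, as for the cluster-specific output), not DESpace's actual code. The list names high_genes and low_genes are taken from the top_results example.

```r
# Made-up cluster-level results for one cluster (illustration only).
res_WM <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4"),
  logFC   = c(1.8, -0.9, 0.4, -2.1),
  FDR     = c(0.001, 0.02, 0.30, 0.004),
  stringsAsFactors = FALSE
)

# Hypothetical helper: order by FDR, then keep logFC > 0 ("high"),
# logFC < 0 ("low"), or both subsets in a list ("both"); case-insensitive.
split_high_low <- function(res, high_low) {
  res <- res[order(res$FDR), , drop = FALSE]
  high <- res[res$logFC > 0, , drop = FALSE]
  low  <- res[res$logFC < 0, , drop = FALSE]
  switch(toupper(high_low),
         HIGH = high,
         LOW  = low,
         BOTH = list(high_genes = high, low_genes = low),
         stop("high_low must be 'high', 'low' or 'both'"))
}

out <- split_high_low(res_WM, "both")
out$high_genes$gene_id
# → "g1" "g3"
out$low_genes$gene_id
# → "g4" "g2"
```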
A data.frame object or a list of data.frame with results.
- When only “cluster_results” is provided, results are reported as a data.frame with columns for gene names (gene_id), spatial clusters affected by SV (Cluster), cluster-specific likelihood ratio test statistics (LR), cluster-specific average (across spots) log2 counts per million (logCPM), cluster-specific log2-fold changes (logFC), cluster-specific raw p-values (PValue), and Benjamini-Hochberg adjusted p-values (FDR) for each spatial cluster.
- When “gene_results” and “cluster_results” are given, results are reported as a data.frame that merges gene- and cluster-level results.
- If “cluster” is specified, the function returns a subset data.frame for the given cluster, which contains the cluster name, gene name, LR, logCPM, logFC, PValue and FDR, ordered by FDR for the specified cluster.
- If “high_low” is set, the function returns a list of data.frame that contains subsets of results for genes with higher and/or lower expression in the given cluster compared to the rest of the tissue.
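The FDR column is a Benjamini-Hochberg adjustment of the raw p-values. A minimal base R sketch with made-up p-values (not DESpace output), showing the adjustment with `p.adjust()` and the ordering by FDR used for the cluster-specific results:

```r
# Made-up raw p-values for five genes (illustration only).
res <- data.frame(gene_id = paste0("g", 1:5),
                  PValue  = c(0.001, 0.04, 0.02, 0.20, 0.008),
                  stringsAsFactors = FALSE)

# Benjamini-Hochberg: sort p-values, scale the i-th smallest by n / i,
# then take running minima from the largest down (done by p.adjust).
res$FDR <- p.adjust(res$PValue, method = "BH")

# Order by FDR, as in the cluster-specific output.
res <- res[order(res$FDR), ]
res$gene_id
# → "g1" "g5" "g3" "g2" "g4"
```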
See also: DESpace_test, individual_test, FeaturePlot.
library(DESpace)
# load pre-computed results (obtained via `DESpace_test`)
data("results_DESpace_test", package = "DESpace")
# DESpace_test returns a list of 2 objects:
# "gene_results": a data.frame containing the main edgeR test results;
# "estimated_y": a DGEList object containing the estimated common dispersion,
# which can later be used to speed up calculations when testing individual clusters.
# We visualize the differential results:
head(results_DESpace_test$gene_results, 3)
# load pre-computed results (obtained via `individual_test`)
data("results_individual_test", package = "DESpace")
# Function `individual_test()` can be used to identify SVGs for each individual cluster.
# `individual_test()` returns a list containing the results of individual clusters.
# For each cluster, results are reported as a data.frame,
# where columns contain gene names (`genes`), likelihood ratio (`LR`),
# log2-fold changes (`logFC`) and adjusted p-value (`FDR`).
#
# Combine gene- and cluster-level results
merge_res <- top_results(results_DESpace_test$gene_results, results_individual_test)
head(merge_res, 3)
# 'select = "FDR"' can be used to visualize adjusted p-values for each spatial cluster.
merge_res <- top_results(results_DESpace_test$gene_results, results_individual_test,
                         select = "FDR")
head(merge_res, 3)
# Specify the cluster of interest and check the top genes detected by individual_test.
results_WM_both <- top_results(cluster_results = results_individual_test,
                               cluster = "WM", high_low = "both")
head(results_WM_both$high_genes, 3)
head(results_WM_both$low_genes, 3)