Title: | Scalable identification of spatially variable genes in spatially-resolved transcriptomics data |
---|---|
Description: | Method for scalable identification of spatially variable genes (SVGs) in spatially-resolved transcriptomics data. The method is based on nearest-neighbor Gaussian processes and uses the BRISC algorithm for model fitting and parameter estimation. Allows identification and ranking of SVGs with flexible length scales across a tissue slide or within spatial domains defined by covariates. Scales linearly with the number of spatial locations and can be applied to datasets containing thousands or more spatial locations. |
Authors: | Lukas M. Weber [aut, cre] , Stephanie C. Hicks [aut] |
Maintainer: | Lukas M. Weber <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.11.0 |
Built: | 2024-10-31 06:17:24 UTC |
Source: | https://github.com/bioc/nnSVG |
Preprocessing function to filter low-expressed genes and/or mitochondrial genes for 'nnSVG'.
filter_genes( spe, filter_genes_ncounts = 3, filter_genes_pcspots = 0.5, filter_mito = TRUE )
filter_genes( spe, filter_genes_ncounts = 3, filter_genes_pcspots = 0.5, filter_mito = TRUE )
spe |
|
filter_genes_ncounts |
|
filter_genes_pcspots |
|
filter_mito |
|
Preprocessing function to filter low-expressed genes and/or mitochondrial genes for 'nnSVG'.
This function can be used to filter out low-expressed genes and/or mitochondrial genes before additional preprocessing (calculating logcounts or deviance residuals) and running 'nnSVG'.
We use this function in the examples and vignettes in the 'nnSVG' package, and provide default filtering parameter values that are appropriate for 10x Genomics Visium data.
The use of this function is optional. Users can also perform filtering and
preprocessing separately, and run nnSVG
on a preprocessed
SpatialExperiment
object.
Returns SpatialExperiment
with filtered genes (rows) removed.
library(SpatialExperiment) library(STexampleData) # load example dataset from STexampleData package spe <- Visium_humanDLPFC() # preprocessing steps # keep only spots over tissue spe <- spe[, colData(spe)$in_tissue == 1] dim(spe) # filter low-expressed and mitochondrial genes spe <- filter_genes(spe) dim(spe)
library(SpatialExperiment) library(STexampleData) # load example dataset from STexampleData package spe <- Visium_humanDLPFC() # preprocessing steps # keep only spots over tissue spe <- spe[, colData(spe)$in_tissue == 1] dim(spe) # filter low-expressed and mitochondrial genes spe <- filter_genes(spe) dim(spe)
Function to run 'nnSVG' method to identify spatially variable genes (SVGs) in spatially-resolved transcriptomics data.
nnSVG( input, spatial_coords = NULL, X = NULL, assay_name = "logcounts", n_neighbors = 10, order = "AMMD", n_threads = 1, BPPARAM = NULL, verbose = FALSE )
nnSVG( input, spatial_coords = NULL, X = NULL, assay_name = "logcounts", n_neighbors = 10, order = "AMMD", n_threads = 1, BPPARAM = NULL, verbose = FALSE )
input |
|
spatial_coords |
|
X |
|
assay_name |
|
n_neighbors |
|
order |
|
n_threads |
|
BPPARAM |
|
verbose |
|
Function to run 'nnSVG' method to identify spatially variable genes (SVGs) in spatially-resolved transcriptomics data.
The 'nnSVG' method is based on nearest-neighbor Gaussian processes (Datta et al. 2016) and uses the BRISC algorithm (Saha and Datta 2018) for model fitting and parameter estimation. The method scales linearly with the number of spatial locations, and can be applied to datasets containing thousands or more spatial locations. For more details on the method, see our paper.
This function runs 'nnSVG' for a full dataset. The function fits a separate model for each gene, using parallelization with BiocParallel for faster runtime. The parameter estimates from BRISC (sigma.sq, tau.sq, phi) for each gene are stored in 'Theta' in the BRISC output.
Note that the method and this function are designed for a single tissue section. For an example of how to run nnSVG in a dataset consisting of multiple tissue sections, see the tutorial in the nnSVG package vignette.
'nnSVG' performs inference on the spatial variance parameter estimates (sigma.sq) using a likelihood ratio (LR) test against a simpler linear model without spatial terms (i.e. without tau.sq or phi). The estimated LR statistics can then be used to rank SVGs. P-values are calculated from the LR statistics using the asymptotic chi-squared distribution with 2 degrees of freedom, and multiple testing adjusted p-values are calculated using the Benjamini-Hochberg method. We also calculate an effect size, defined as the proportion of spatial variance, 'prop_sv = sigma.sq / (sigma.sq + tau.sq)'.
The function assumes the input is provided either as a
SpatialExperiment
object or a numeric
matrix of values. If the
input is a SpatialExperiment
object, it is assumed to contain an
assay
slot containing either log-transformed normalized counts (also
known as logcounts, e.g. from the scran
package) or deviance residuals
(e.g. from the scry
package), which have been preprocessed, quality
controlled, and filtered to remove low-quality spatial locations. If the
input is a numeric
matrix of values, these values are assumed to
already be normalized and transformed (e.g. logcounts).
If the input was provided as a SpatialExperiment
object, the
output values are returned as additional columns in the rowData
slot
of the input object. If the input was provided as a numeric
matrix
of values, the output is returned as a numeric
matrix. The output
values include spatial variance parameter estimates, likelihood ratio (LR)
statistics, effect sizes (proportion of spatial variance), p-values, and
multiple testing adjusted p-values.
library(SpatialExperiment) library(STexampleData) library(scran) ### Example 1 ### for more details see extended example in vignette # load example dataset from STexampleData package spe1 <- Visium_humanDLPFC() # preprocessing steps # keep only spots over tissue spe1 <- spe1[, colData(spe1)$in_tissue == 1] # skip spot-level quality control (already performed in this dataset) # filter low-expressed and mitochondrial genes spe1 <- filter_genes(spe1) # calculate logcounts using library size factors spe1 <- computeLibraryFactors(spe1) spe1 <- logNormCounts(spe1) # select small number of genes for fast runtime in this example set.seed(123) ix <- c( which(rowData(spe1)$gene_name %in% c("PCP4", "NPY")), sample(seq_len(nrow(spe1)), 2) ) spe1 <- spe1[ix, ] # run nnSVG set.seed(123) spe1 <- nnSVG(spe1) # show results rowData(spe1) ### Example 2: With covariates ### for more details see extended example in vignette # load example dataset from STexampleData package spe2 <- SlideSeqV2_mouseHPC() # preprocessing steps # remove spots with NA cell type labels spe2 <- spe2[, !is.na(colData(spe2)$celltype)] # skip spot-level quality control (already performed in this dataset) # filter low-expressed and mitochondrial genes spe2 <- filter_genes( spe2, filter_genes_ncounts = 1, filter_genes_pcspots = 1, filter_mito = TRUE ) # calculate logcounts using library size normalization spe2 <- computeLibraryFactors(spe2) spe2 <- logNormCounts(spe2) # select small number of genes for fast runtime in this example set.seed(123) ix <- c( which(rowData(spe2)$gene_name %in% c("Cpne9", "Rgs14")), sample(seq_len(nrow(spe2)), 2) ) spe2 <- spe2[ix, ] # create model matrix for cell type labels X <- model.matrix(~ colData(spe2)$celltype) # run nnSVG with covariates set.seed(123) spe2 <- nnSVG(spe2, X = X) # show results rowData(spe2)
library(SpatialExperiment) library(STexampleData) library(scran) ### Example 1 ### for more details see extended example in vignette # load example dataset from STexampleData package spe1 <- Visium_humanDLPFC() # preprocessing steps # keep only spots over tissue spe1 <- spe1[, colData(spe1)$in_tissue == 1] # skip spot-level quality control (already performed in this dataset) # filter low-expressed and mitochondrial genes spe1 <- filter_genes(spe1) # calculate logcounts using library size factors spe1 <- computeLibraryFactors(spe1) spe1 <- logNormCounts(spe1) # select small number of genes for fast runtime in this example set.seed(123) ix <- c( which(rowData(spe1)$gene_name %in% c("PCP4", "NPY")), sample(seq_len(nrow(spe1)), 2) ) spe1 <- spe1[ix, ] # run nnSVG set.seed(123) spe1 <- nnSVG(spe1) # show results rowData(spe1) ### Example 2: With covariates ### for more details see extended example in vignette # load example dataset from STexampleData package spe2 <- SlideSeqV2_mouseHPC() # preprocessing steps # remove spots with NA cell type labels spe2 <- spe2[, !is.na(colData(spe2)$celltype)] # skip spot-level quality control (already performed in this dataset) # filter low-expressed and mitochondrial genes spe2 <- filter_genes( spe2, filter_genes_ncounts = 1, filter_genes_pcspots = 1, filter_mito = TRUE ) # calculate logcounts using library size normalization spe2 <- computeLibraryFactors(spe2) spe2 <- logNormCounts(spe2) # select small number of genes for fast runtime in this example set.seed(123) ix <- c( which(rowData(spe2)$gene_name %in% c("Cpne9", "Rgs14")), sample(seq_len(nrow(spe2)), 2) ) spe2 <- spe2[ix, ] # create model matrix for cell type labels X <- model.matrix(~ colData(spe2)$celltype) # run nnSVG with covariates set.seed(123) spe2 <- nnSVG(spe2, X = X) # show results rowData(spe2)