Title: | Finding spatially relevant marker genes in image based spatial transcriptomics data |
---|---|
Description: | This package contains the function to find marker genes for image-based spatial transcriptomics data. There are functions to create spatial vectors from the cell and transcript coordiantes, which are passed as inputs to find marker genes. Marker genes are detected for every cluster by two approaches. The first approach is by permtuation testing, which is implmented in parallel for finding marker genes for one sample study. The other approach is to build a linear model for every gene. This approach can account for multiple samples and backgound noise. |
Authors: | Melody Jin [aut, cre] |
Maintainer: | Melody Jin <[email protected]> |
License: | GPL-3 |
Version: | 0.99.6 |
Built: | 2025-03-12 13:35:43 UTC |
Source: | https://github.com/bioc/jazzPanda |
jazzPanda pacakge provides hybrid approaches to prioritize marker genes
that uses the spatial coordinates of gene detections and cells making up
clusters.
We propose a binning approach get_vectors
that summarises the number of genes and cells
within a cluster as spatial vectors. We have developed two approaches to
detect and prioritize marker genes.
The first approach compute_permp
is based on correlation
coefficients between genes and cluster spatial vectors, where significance
of the marker genes are assessed through permutation.
The second approach lasso_markers
is based on lasso regularisation and linear modeling of our defined spatial
vectors. This second approach is more flexible and can account for multiple
samples and background noise.
Melody Jin [email protected]
helper function to check the input of binning
.check_binning(bin_param, bin_type, w_x, w_y)
.check_binning(bin_param, bin_type, w_x, w_y)
bin_param |
A numeric vector indicating the size of the bin. If the
|
bin_type |
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon". |
w_x |
A numeric vector of length two specifying the x coordinate limits of enclosing box. |
w_y |
A numeric vector of length two specifying the y coordinate limits of enclosing box. |
the length of total bins
helper function to check the inputs passed to marker detection function
.check_valid_input( gene_mt, cluster_mt, sample_names, n_fold = 10, background = NULL )
.check_valid_input( gene_mt, cluster_mt, sample_names, n_fold = 10, background = NULL )
gene_mt |
A matrix contains the transcript count in each grid.
Each row refers to a grid, and each column refers to a gene.
The column names must be specified and refer to the genes. This can be the
output from the function |
cluster_mt |
A matrix contains the number of cells in a specific
cluster in each grid. Each row refers to a grid, and each column
refers to a cluster. The column names must be specified and refer to the
clusters. Please do not assign integers as column names.
This can be the output from the function |
sample_names |
A vector specifying the names for the samples. |
n_fold |
Optional. A positive number giving the number of folds used
for cross validation. This parameter will pass to
|
background |
Optional. A matrix providing the
background information. Each row refers to a grid, and each column refers to
one category of background information. Number of rows must equal to the
number of rows in |
a list of two matrices with the following components
n_clusters |
Number of clusters |
cluster_names |
a vector of strings giving the name of the clusters |
helper function to check the names of gene/cluster/sample
.check_valid_names(x, x_name)
.check_valid_names(x, x_name)
x |
A character vector to check naming |
x_name |
A name specifying the type of x, for message purpose only |
A character vector of same length with valid names
Compute observation statistic for permutation framework
.compute_observation( x, cluster_info, correlation_method, n_cores, test_genes, bin_type, bin_param, w_x, w_y, use_cm )
.compute_observation( x, cluster_info, correlation_method, n_cores, test_genes, bin_type, bin_param, w_x, w_y, use_cm )
x |
a named list (of transcript detection coordinates) or SingleCellExperiment or SpatialExperiment or SpatialFeatureExperiment object. If a named list is provided, every list element is a dataframe containing the transcript detection coordinates and column names must include "feature_name" (gene name), "x" (x coordinate), "y" (y coordinate). The list names must match samples in cluster_info. |
cluster_info |
A dataframe/matrix containing the centroid coordinates,
cluster label and sample for each cell.The column names must include
"x" (x coordinate), "y" (y coordinate),
"cluster" (cluster label) and "sample" (sample). It is strongly recommended
to use syntactically valid names for columns clusters and samples.
If invalid names are detected, the function |
correlation_method |
A parameter pass to |
n_cores |
A positive number specifying number of cores used for parallelizing permutation testing. Default is one core (sequential processing). |
test_genes |
A vector of strings giving the name of the genes you want to test correlation for. |
bin_type |
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon". |
bin_param |
A numeric vector indicating the size of the bin. If the
|
w_x |
a numeric vector of length two specifying the x coordinate limits of enclosing box. |
w_y |
a numeric vector of length two specifying the y coordinate limits of enclosing box. |
use_cm |
A boolean value that specifies whether to create spatial vectors for genes using the count matrix and cell coordinates instead of the transcript coordinates when both types of information are available. The default setting is FALSE. |
A named list with the following components
obs.stat |
A matrix contains the observation statistic for every gene and every cluster. Each row refers to a gene, and each column refers to a cluster |
gene_mt |
contains the transcript count in each grid. Each row refers to a grid, and each column refers to a gene. |
Compute permutation statistics for permutation framework
.compute_permutation( cluster_info, perm.size = 1000, correlation_method = "pearson", bin_type, bin_param, n_cores = 1, w_x, w_y, gene_mt, cluster_names )
.compute_permutation( cluster_info, perm.size = 1000, correlation_method = "pearson", bin_type, bin_param, n_cores = 1, w_x, w_y, gene_mt, cluster_names )
cluster_info |
A dataframe/matrix containing the centroid coordinates and cluster label for each cell.The column names should include "x" (x coordinate), "y" (y coordinate), and "cluster" (cluster label). |
perm.size |
A positive number specifying permutation times |
correlation_method |
A parameter pass to |
bin_type |
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon". |
bin_param |
A numeric vector indicating the size of the bin. If the
|
n_cores |
A positive number specifying number of cores used for parallelizing permutation testing. Default is one core (sequential processing). |
w_x |
a numeric vector of length two specifying the x coordinate limits of enclosing box. |
w_y |
a numeric vector of length two specifying the y coordinate limits of enclosing box. |
gene_mt |
A matrix contains the transcript count in each grid. Each row refers to a grid, and each column refers to a gene. |
cluster_names |
A list of strings giving the name and order of the clusters |
A matrix with permutation statistics
This function takes an object of class SingleCellExperiment,
SpatialExperiment or SpatialFeatureExperimentreturns and returns a list
object that is expected for the get_vector
functions.
.convert_data(x, sample_names, test_genes)
.convert_data(x, sample_names, test_genes)
x |
a SingleCellExperiment or SpatialExperiment or SpatialFeatureExperiment object |
sample_names |
a vector of strings giving the sample names |
test_genes |
A vector of strings giving the name of the genes you want to create gene vector. |
outputs a list object with the following components
trans_lst |
A list of named dataframes. Each dataframe refers to one sample and shows the transcript detection coordinates for each gene. The name matches the input sample_names |
cm_lst |
A list of named dataframes containing the count matrix for each sample. The name matches the input sample_names |
This function creates a structured output object named 'cor_mg_result' for storing the permutation results. The object contains three matrices:
.create_cor_mg_result(obs.stat, perm.pval, perm.pval.adj)
.create_cor_mg_result(obs.stat, perm.pval, perm.pval.adj)
obs.stat |
A matrix containing the correlation coefficients for each pair of genes and cluster vectors. |
perm.pval |
A matrix containing the raw permutation p-value for each pair of genes and cluster. |
perm.pval.adj |
A matrix containing the adjusted permutation p-value for each pair of genes and cluster. |
An S3 object of class 'cor_mg_result' which includes three matrices.
This function creates a structured output object named 'glm_mg_result' for storing the marker gene results. The object contains two data frames: top results and full results.
.create_lm_mg_result(top_result, full_result)
.create_lm_mg_result(top_result, full_result)
top_result |
A data frame containing top results. |
full_result |
A data frame containing full results. |
An S3 object of class 'glm_mg_result' which includes both results data frames.
Create spatial vectors for clusters
.get_cluster_vectors( cluster_info, bin_length, bin_type, bin_param, w_x, w_y, sample_names )
.get_cluster_vectors( cluster_info, bin_length, bin_type, bin_param, w_x, w_y, sample_names )
cluster_info |
A dataframe/matrix containing the centroid coordinates, cluster label and sample for each cell.The column names must include "x" (x coordinate), "y" (y coordinate), "cluster" (cluster label) and "sample" (sample). |
bin_length |
A positive integer giving the length of total bins |
bin_type |
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon". |
bin_param |
A numeric vector indicating the size of the bin. If the
|
w_x |
A numeric vector of length two specifying the x coordinate limits of enclosing box. |
w_y |
A numeric vector of length two specifying the y coordinate limits of enclosing box. |
sample_names |
a vector of strings giving the sample names |
a matrix contains the cell count in each grid. Each row refers to a grid, and each column refers to a cluster.
This function will build gene vectors with count matrix and cell locations
.get_gene_vectors_cm( cluster_info, cm_lst, bin_type, bin_param, test_genes, w_x, w_y )
.get_gene_vectors_cm( cluster_info, cm_lst, bin_type, bin_param, test_genes, w_x, w_y )
cluster_info |
A dataframe/matrix containing the centroid coordinates, cluster label and sample for each cell.The column names must include "x" (x coordinate), "y" (y coordinate), "cluster" (cluster label) and "sample" (sample). |
cm_lst |
A list of named matrices containing the count matrix for
each sample The name must match the sample column in |
bin_type |
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon". |
bin_param |
A numeric vector indicating the size of the bin. If the
|
test_genes |
A vector of strings giving the name of the genes you
want to test. This will be used as column names for one of the result matrix
|
w_x |
A numeric vector of length two specifying the x coordinate limits of enclosing box. |
w_y |
A numeric vector of length two specifying the y coordinate limits of enclosing box. |
a matrix contains the transcript count in each grid. Each row refers to a grid, and each column refers to a gene.
This function will build gene vectors based on the transcript coordinates of every gene
.get_gene_vectors_tr( trans_lst, test_genes, bin_type, bin_param, bin_length, w_x, w_y )
.get_gene_vectors_tr( trans_lst, test_genes, bin_type, bin_param, bin_length, w_x, w_y )
trans_lst |
If specified, it is a list of named dataframes. Each dataframe refers to one sample and shows the transcript detection coordinates for each gene. Optional parameter. |
test_genes |
A vector of strings giving the name of the genes you
want to test. This will be used as column names for one of the result matrix
|
bin_type |
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon". |
bin_param |
A numeric vector indicating the size of the bin. If the
|
bin_length |
A positive integer giving the length of total bins |
w_x |
A numeric vector of length two specifying the x coordinate limits of enclosing box. |
w_y |
A numeric vector of length two specifying the y coordinate limits of enclosing box. |
a matrix contains the transcript count in each grid. Each row refers to a grid, and each column refers to a gene.
help function to get lasso coefficient for every cluster for a given model
.get_lasso_coef( i_gene, gene_mt, vec_cluster, cluster_names, n_fold = 10, n_samples, sample_names )
.get_lasso_coef( i_gene, gene_mt, vec_cluster, cluster_names, n_fold = 10, n_samples, sample_names )
i_gene |
Name of the current gene |
gene_mt |
A matrix contains the transcript count in each grid.
Each row refers to a grid, and each column refers to a gene. The column
names must be specified and refer to the genes. This can be the
output from the function |
vec_cluster |
A matrix of the spatial vectors for clusters. |
cluster_names |
A vector of strings giving the name of clusters |
n_fold |
Optional. A positive number giving the number of folds used
for cross validation. This parameter will pass to
|
n_samples |
A positive number giving the number samples |
sample_names |
A vector specifying the names for the sample |
a list of two matrices with the following components
coef_df |
A matrix giving the lasso coefficient of each cluster |
lambda.1se |
the lambda.1se value of best fitted model |
This function will run permutation framework to compute a p-value for the correlation between the vectorised genes and clusters each cluster for one sample.
compute_permp( x, cluster_info, perm.size, bin_type, bin_param, test_genes, correlation_method = "pearson", n_cores = 1, correction_method = "BH", w_x, w_y, use_cm = FALSE )
compute_permp( x, cluster_info, perm.size, bin_type, bin_param, test_genes, correlation_method = "pearson", n_cores = 1, correction_method = "BH", w_x, w_y, use_cm = FALSE )
x |
a SingleCellExperiment or SpatialExperiment or SpatialFeatureExperiment object |
cluster_info |
A dataframe/matrix containing the centroid coordinates and cluster label for each cell.The column names should include "x" (x coordinate), "y" (y coordinate), and "cluster" (cluster label). |
perm.size |
A positive number specifying permutation times |
bin_type |
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon". |
bin_param |
A numeric vector indicating the size of the bin. If the
|
test_genes |
A vector of strings giving the name of the genes you
want to test correlation for.
|
correlation_method |
A parameter pass to |
n_cores |
A positive number specifying number of cores used for parallelizing permutation testing. Default is one core (sequential processing). |
correction_method |
A character string pass to |
w_x |
a numeric vector of length two specifying the x coordinate limits of enclosing box. |
w_y |
a numeric vector of length two specifying the y coordinate limits of enclosing box. |
use_cm |
A boolean value that specifies whether to create spatial vectors for genes using the count matrix and cell coordinates instead of the transcript coordinates when both types of information are available. The default setting is FALSE. |
To get a permutation p-value for the correlation between a gene
and a cluster, this function will permute the cluster label for
each cell randomly, and calculate correlation between the genes and
permuted clusters. This process will be repeated for perm.size
times, and permutation p-value is calculated as the probability of
permuted correlations larger than the observation correlation.
An object of class 'cor_mg_result'. To access specific components of the returned object:
Use get_cor
to retrieve the matrix of observed
correlation coefficients.
Use get_perm_p
to access the matrix of
raw permutation p-values.
Use get_perm_adjp
to obtain the matrix of adjusted
permutation p-values.
library(SpatialExperiment) library(BumpyMatrix) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample="sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell = rep(paste("cell",1:20, sep=""), times=2) mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe_sample1 <- SpatialExperiment( assays = list(molecules = mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) set.seed(100) corr_res <- compute_permp(x=spe_sample1, cluster_info=clusters, perm.size=10, bin_type="square", bin_param=c(2,2), test_genes=unique(trans_info$feature_name), correlation_method = "pearson", n_cores=1, correction_method="BH", w_x=w_x , w_y=w_y) # raw permutation p-value perm_p <- get_perm_p(corr_res) # adjusted permutation p-value adjusted_perm_p <- get_perm_adjp(corr_res) # observed correlation obs_corr <- get_cor(corr_res)
library(SpatialExperiment) library(BumpyMatrix) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample="sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell = rep(paste("cell",1:20, sep=""), times=2) mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe_sample1 <- SpatialExperiment( assays = list(molecules = mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) set.seed(100) corr_res <- compute_permp(x=spe_sample1, cluster_info=clusters, perm.size=10, bin_type="square", bin_param=c(2,2), test_genes=unique(trans_info$feature_name), correlation_method = "pearson", n_cores=1, correction_method="BH", w_x=w_x , w_y=w_y) # raw permutation p-value perm_p <- get_perm_p(corr_res) # adjusted permutation p-value adjusted_perm_p <- get_perm_adjp(corr_res) # observed correlation obs_corr <- get_cor(corr_res)
Convert the coordinates of set of genes into vectors.
create_genesets( x, name_lst, cluster_info, sample_names, bin_type, bin_param, w_x, w_y, use_cm = FALSE, n_cores = 1 )
create_genesets( x, name_lst, cluster_info, sample_names, bin_type, bin_param, w_x, w_y, use_cm = FALSE, n_cores = 1 )
x |
a named list (of transcript detection coordinates) or SingleCellExperiment or SpatialExperiment or SpatialFeatureExperiment object. If a named list is provided, every list element is a dataframe containing the transcript detection coordinates and column names must include "feature_name" (nagative control name), "x" (x coordinate) and "y" (y coordinate). The list names must match samples in cluster_info. |
name_lst |
A named list of strings giving the name of features that are treated as background. |
cluster_info |
A dataframe/matrix containing the centroid coordinates, cluster and sample label for each cell.The column names must include "x" (x coordinate), "y" (y coordinate), "cluster" (cluster label) and "sample" (sample). |
sample_names |
a vector of strings giving the sample names |
bin_type |
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon". |
bin_param |
A numeric vector indicating the size of the bin. If the
|
w_x |
A numeric vector of length two specifying the x coordinate limits of enclosing box. |
w_y |
A numeric vector of length two specifying the y coordinate limits of enclosing box. |
use_cm |
A boolean value that specifies whether to create spatial vectors for genes using the count matrix and cell coordinates instead of the transcript coordinates when both types of information are available. The default setting is FALSE. |
n_cores |
A positive number specifying number of cores used for parallelizing permutation testing. Default is one core (sequential processing). |
a list of two matrices with the following components
gene_mt |
contains the transcript count in each grid. Each row refers to a grid, and each column refers to a gene. |
cluster_mt |
contains the number of cells in a specific cluster in each grid. Each row refers to a grid, and each column refers to a cluster. |
The row order of gene_mt
matches the row order of cluster_mt
.
A matrix contains the sum count in each grid.
Each row refers to a grid, each column refers to a set in name_lst
.
The column name will match the names in name_lst
.
library(SpatialExperiment) set.seed(15) trans = as.data.frame(rbind(cbind(x = runif(10, min=1, max=10), y = runif(10, min=1, max=10), feature_name="A"), cbind(x = runif(5, min=10, max=24), y = runif(5, min=1, max=10), feature_name="B"), cbind(x = runif(10, min=10, max=24), y = runif(10, min=10, max=24), feature_name="C"))) trans$x = as.numeric(trans$x) trans$y = as.numeric(trans$y) trans$cell = sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans)) # create SpatialExperiment object trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans[, c("x", "y")], row = trans$feature_name, col = trans$cell ) rep1_spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) geneset_res <- create_genesets(x=rep1_spe, sample=c("sample1"), name_lst=list(dummy_A=c("A","C"), dummy_B=c("A","B","C")), bin_type="square", bin_param=c(2,2), w_x=c(0,25), w_y=c(0,25), cluster_info=NULL)
library(SpatialExperiment) set.seed(15) trans = as.data.frame(rbind(cbind(x = runif(10, min=1, max=10), y = runif(10, min=1, max=10), feature_name="A"), cbind(x = runif(5, min=10, max=24), y = runif(5, min=1, max=10), feature_name="B"), cbind(x = runif(10, min=10, max=24), y = runif(10, min=10, max=24), feature_name="C"))) trans$x = as.numeric(trans$x) trans$y = as.numeric(trans$y) trans$cell = sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans)) # create SpatialExperiment object trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans[, c("x", "y")], row = trans$feature_name, col = trans$cell ) rep1_spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) geneset_res <- create_genesets(x=rep1_spe, sample=c("sample1"), name_lst=list(dummy_A=c("A","C"), dummy_B=c("A","B","C")), bin_type="square", bin_param=c(2,2), w_x=c(0,25), w_y=c(0,25), cluster_info=NULL)
Accessor function to retrieve the observed correlation from an 'cor_mg_result' object.
get_cor(obj)
get_cor(obj)
obj |
An 'cor_mg_result' object. |
A matrix contains the observation statistic for every gene and every cluster. Each row refers to a gene, and each column refers to a cluster
library(SpatialExperiment) library(BumpyMatrix) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample="sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell = rep(paste("cell",1:20, sep=""), times=2) mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe_sample1 <- SpatialExperiment( assays = list(molecules = mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) set.seed(100) corr_res <- compute_permp(x=spe_sample1, cluster_info=clusters, perm.size=10, bin_type="square", bin_param=c(2,2), test_genes=unique(trans_info$feature_name), correlation_method = "pearson", n_cores=1, correction_method="BH", w_x=w_x , w_y=w_y) # observed correlation obs_corr <- get_cor(corr_res)
library(SpatialExperiment) library(BumpyMatrix) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample="sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell = rep(paste("cell",1:20, sep=""), times=2) mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe_sample1 <- SpatialExperiment( assays = list(molecules = mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) set.seed(100) corr_res <- compute_permp(x=spe_sample1, cluster_info=clusters, perm.size=10, bin_type="square", bin_param=c(2,2), test_genes=unique(trans_info$feature_name), correlation_method = "pearson", n_cores=1, correction_method="BH", w_x=w_x , w_y=w_y) # observed correlation obs_corr <- get_cor(corr_res)
Accessor function to retrieve the 'full_result' dataframe from an 'glm_mg_result' object.
get_full_mg(obj, coef_cutoff = 0)
get_full_mg(obj, coef_cutoff = 0)
obj |
An 'glm_mg_result' object. |
coef_cutoff |
A positive number giving the coefficient cutoff value. Genes whose cluster showing a coefficient value smaller than the cutoff will be removed. Default is 0. |
A data frame with detailed information for each gene and the most relevant cluster label.
gene
Gene name
cluster
The name of the significant cluster after
glm_coef
The coefficient of the selected cluster
in the generalised linear model.
pearson
Pearson correlation between the gene vector and the
selected cluster vector.
max_gg_corr
A number showing the maximum pearson correlation
for this gene vector and all other gene vectors in the input gene_mt
max_gc_corr
A number showing the maximum pearson correlation
for this gene vector and every cluster vectors in the input
cluster_mt
library(SpatialExperiment) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample<-"sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=100, mean=20,sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell<-sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans_info)) trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) vecs_lst <- get_vectors(x=spe,sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(20,20), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y) lasso_res <- lasso_markers(gene_mt=vecs_lst$gene_mt, cluster_mt = vecs_lst$cluster_mt, sample_names=c("sample1"), keep_positive=TRUE, background=NULL) # the full result full_result <- get_full_mg(lasso_res, coef_cutoff=0.05)
library(SpatialExperiment) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample<-"sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=100, mean=20,sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell<-sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans_info)) trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) vecs_lst <- get_vectors(x=spe,sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(20,20), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y) lasso_res <- lasso_markers(gene_mt=vecs_lst$gene_mt, cluster_mt = vecs_lst$cluster_mt, sample_names=c("sample1"), keep_positive=TRUE, background=NULL) # the full result full_result <- get_full_mg(lasso_res, coef_cutoff=0.05)
Accessor function to retrieve the permutation adjusted p-value from an 'cor_mg_result' object.
get_perm_adjp(obj)
get_perm_adjp(obj)
obj |
An 'cor_mg_result' object. |
A matrix contains the adjusted permutation p-value. Each row refers to a gene, and each column refers to a cluster.
library(SpatialExperiment) library(BumpyMatrix) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample="sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell = rep(paste("cell",1:20, sep=""), times=2) mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe_sample1 <- SpatialExperiment( assays = list(molecules = mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) set.seed(100) corr_res <- compute_permp(x=spe_sample1, cluster_info=clusters, perm.size=10, bin_type="square", bin_param=c(2,2), test_genes=unique(trans_info$feature_name), correlation_method = "pearson", n_cores=1, correction_method="BH", w_x=w_x , w_y=w_y) # adjusted permutation p-value adjusted_perm_p <- get_perm_adjp(corr_res)
library(SpatialExperiment) library(BumpyMatrix) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample="sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell = rep(paste("cell",1:20, sep=""), times=2) mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe_sample1 <- SpatialExperiment( assays = list(molecules = mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) set.seed(100) corr_res <- compute_permp(x=spe_sample1, cluster_info=clusters, perm.size=10, bin_type="square", bin_param=c(2,2), test_genes=unique(trans_info$feature_name), correlation_method = "pearson", n_cores=1, correction_method="BH", w_x=w_x , w_y=w_y) # adjusted permutation p-value adjusted_perm_p <- get_perm_adjp(corr_res)
Accessor function to retrieve the raw permutation p-value from an 'cor_mg_result' object.
get_perm_p(obj)
get_perm_p(obj)
obj |
An 'cor_mg_result' object. |
A matrix contains the raw permutation p-value. Each row refers to a gene, and each column refers to a cluster.
library(SpatialExperiment) library(BumpyMatrix) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample<-"sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell = rep(paste("cell",1:20, sep=""), times=2) mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe_sample1 <- SpatialExperiment( assays = list(molecules = mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) set.seed(100) corr_res <- compute_permp(x=spe_sample1, cluster_info=clusters, perm.size=10, bin_type="square", bin_param=c(2,2), test_genes=unique(trans_info$feature_name), correlation_method = "pearson", n_cores=1, correction_method="BH", w_x=w_x , w_y=w_y) # raw permutation p-value perm_p <- get_perm_p(corr_res)
library(SpatialExperiment) library(BumpyMatrix) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample<-"sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell = rep(paste("cell",1:20, sep=""), times=2) mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe_sample1 <- SpatialExperiment( assays = list(molecules = mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) set.seed(100) corr_res <- compute_permp(x=spe_sample1, cluster_info=clusters, perm.size=10, bin_type="square", bin_param=c(2,2), test_genes=unique(trans_info$feature_name), correlation_method = "pearson", n_cores=1, correction_method="BH", w_x=w_x , w_y=w_y) # raw permutation p-value perm_p <- get_perm_p(corr_res)
Accessor function to retrieve the 'top_result' dataframe from an 'glm_mg_result' object.
get_top_mg(obj, coef_cutoff = 0.05)
get_top_mg(obj, coef_cutoff = 0.05)
obj |
An 'glm_mg_result' object. |
coef_cutoff |
A positive number giving the coefficient cutoff value. Genes whose top cluster showing a coefficient value smaller than the cutoff will be marked as non-marker genes ("NoSig"). Default is 0.05. |
A data frame with detailed information for each gene and the most relevant cluster label.
gene
Gene name
top_cluster
The name of the most relevant cluster
after thresholding the coefficients.
glm_coef
The coefficient of the selected cluster in the
generalised linear model.
pearson
Pearson correlation between the gene vector and the
selected cluster vector.
max_gg_corr
A number showing the maximum pearson correlation
for this gene vector and all other gene vectors in the input gene_mt
max_gc_corr
A number showing the maximum pearson correlation
for this gene vector and every cluster vectors in the input
cluster_mt
library(SpatialExperiment) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample<-"sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=100, mean=20,sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell<-sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans_info)) trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) vecs_lst <- get_vectors(x=spe,sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(20,20), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y) lasso_res <- lasso_markers(gene_mt=vecs_lst$gene_mt, cluster_mt = vecs_lst$cluster_mt, sample_names=c("sample1"), keep_positive=TRUE, background=NULL) # the top result top_result<- get_top_mg(lasso_res, coef_cutoff=0.05)
library(SpatialExperiment) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample<-"sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=100, mean=20,sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell<-sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans_info)) trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) vecs_lst <- get_vectors(x=spe,sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(20,20), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y) lasso_res <- lasso_markers(gene_mt=vecs_lst$gene_mt, cluster_mt = vecs_lst$cluster_mt, sample_names=c("sample1"), keep_positive=TRUE, background=NULL) # the top result top_result<- get_top_mg(lasso_res, coef_cutoff=0.05)
This function will convert the coordinates into a numeric vector for genes and clusters.
get_vectors( x, cluster_info, sample_names, bin_type, bin_param, test_genes, w_x, w_y, use_cm = FALSE, n_cores = 1 )
get_vectors( x, cluster_info, sample_names, bin_type, bin_param, test_genes, w_x, w_y, use_cm = FALSE, n_cores = 1 )
x |
a named list (of transcript detection coordinates) or SingleCellExperiment or SpatialExperiment or SpatialFeatureExperiment object. If a named list is provided, every list element is a dataframe containing the transcript detection coordinates and column names must include "feature_name" (gene name), "x" (x coordinate), "y" (y coordinate). The list names must match samples in cluster_info. |
cluster_info |
A dataframe/matrix containing the centroid coordinates,
cluster label and sample for each cell.The column names must include
"x" (x coordinate), "y" (y coordinate),
"cluster" (cluster label) and "sample" (sample). It is strongly recommended
to use syntactically valid names for columns clusters and samples.
If invalid names are detected, the function |
sample_names |
a vector of strings giving the sample names.
It is strongly recommended to use syntactically valid names for columns
clusters and samples. If invalid names are detected, the function
|
bin_type |
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon". |
bin_param |
A numeric vector indicating the size of the bin. If the
|
test_genes |
A vector of strings giving the name of the genes you want
to create gene vector. This will be used as column names for one of the
result matrix |
w_x |
A numeric vector of length two specifying the x coordinate limits of enclosing box. |
w_y |
A numeric vector of length two specifying the y coordinate limits of enclosing box. |
use_cm |
A boolean value that specifies whether to create spatial vectors for genes using the count matrix and cell coordinates instead of the transcript coordinates when both types of information are available. The default setting is FALSE. |
n_cores |
A positive number specifying number of cores used for parallelizing permutation testing. Default is one core (sequential processing). |
This function can be used to generate input for lasso_markers
by specifying all the parameters.
Suppose the input data contains genes,
clusters, and
samples, we
want to use
square bin to convert the coordinates
of genes and clusters into 1d vectors.
If , the returned list will contain one matrix for gene vectors
(
gene_mt
) of dimension and one matrix for
cluster vectors (
cluster_mt
) of dimension .
If , gene and cluster vectors are constructed for each sample
separately and concat together. There will be additional k columns on the
returned
cluster_mt
, which is the one-hot encoding of the
sample information.
Moreover, this function can vectorise genes and clusters separately based
on the input. If x
is NULL, this function will
return vectorised clusters based on cluster_info
.
If cluster_info
is NULL, this function will return vectorised genes
based on x
.
a list of two matrices with the following components
gene_mt |
contains the transcript count in each grid. Each row refers to a grid, and each column refers to a gene. |
cluster_mt |
contains the number of cells in a specific cluster in each grid. Each row refers to a grid, and each column refers to a cluster. |
The row order of gene_mt
matches the row order of cluster_mt
.
library(SpatialExperiment) set.seed(100) # simulate coordinates for clusters df_clA = data.frame(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), cluster="A") df_clB = data.frame(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), cluster="B") clusters = rbind(df_clA, df_clB) clusters$sample="sample1" # simulate coordinates for genes trans_info = data.frame(rbind(cbind(x = rnorm(n=10, mean=20,sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x=as.numeric(trans_info$x) trans_info$y=as.numeric(trans_info$y) trans_info$cell = sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans_info)) w_x = c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y = c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) # use named list as input vecs_lst = get_vectors(x= list("sample1" = trans_info), sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(5,5), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y) # use SpatialExperiment object as input trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) vecs_lst_spe = get_vectors(x=spe,sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(5,5), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y)
library(SpatialExperiment) set.seed(100) # simulate coordinates for clusters df_clA = data.frame(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), cluster="A") df_clB = data.frame(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), cluster="B") clusters = rbind(df_clA, df_clB) clusters$sample="sample1" # simulate coordinates for genes trans_info = data.frame(rbind(cbind(x = rnorm(n=10, mean=20,sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=10, mean=20, sd=5), y = rnorm(n=10, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=10, mean=100, sd=5), y = rnorm(n=10, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x=as.numeric(trans_info$x) trans_info$y=as.numeric(trans_info$y) trans_info$cell = sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans_info)) w_x = c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y = c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) # use named list as input vecs_lst = get_vectors(x= list("sample1" = trans_info), sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(5,5), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y) # use SpatialExperiment object as input trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) vecs_lst_spe = get_vectors(x=spe,sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(5,5), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y)
This function will find the most spatially relevant cluster label for each gene.
lasso_markers( gene_mt, cluster_mt, sample_names, keep_positive = TRUE, background = NULL, n_fold = 10 )
lasso_markers( gene_mt, cluster_mt, sample_names, keep_positive = TRUE, background = NULL, n_fold = 10 )
gene_mt |
A matrix contains the transcript count in each grid.
Each row refers to a grid, and each column refers to a gene.
The column names must be specified and refer to the genes. This can be the
output from the function |
cluster_mt |
A matrix contains the number of cells in a specific
cluster in each grid. Each row refers to a grid, and each column refers
to a cluster. The column names must be specified and refer to the clusters.
Please do not assign integers as column names.
This can be the output from the function |
sample_names |
A vector specifying the names for the samples. |
keep_positive |
A logical flag indicating whether to return positively correlated clusters or not. |
background |
Optional. A matrix providing the
background information. Each row refers to a grid, and each column refers to
one category of background information. Number of rows must equal to the
number of rows in |
n_fold |
Optional. A positive number giving the number of folds used
for cross validation. This parameter will pass to
|
This function will take the converted gene and cluster vectors from function
get_vectors
, and return the most relevant cluster label for
each gene. If there are multiple samples in the dataset, this function
will find shared markers across different samples by including additional
sample vectors in the input cluster_mt
.
This function treats all input cluster vectors as features, and create a penalized linear model for one gene vector with lasso regularization. Clusters with non-zero coefficient will be selected, and these clusters will be used to formulate a generalised linear model for this gene vector.
If the input keep_positive
is TRUE, the clusters
with positive coefficient and significant p-value will be saved in the
output matrix lasso_full_result
. The cluster with a
positive coefficient and the minimum p-value will be regarded as the most
relevant cluster to this gene and be saved in the output matrix
lasso_result
.
If the input keep_positive
is FALSE, the clusters with
negative coefficient and significant p-value will be saved in the
output matrix lasso_full_result
. The cluster with a negative
coefficient and the minimum p-value will be regarded as the most relevant
cluster to this gene and be saved in the output matrix lasso_result
.
If there is no clusters with significant p-value, the a string "NoSig" will be returned for this gene.
The parameter background
can be used to capture unwanted noise
pattern in the dataset. For example, we can include negative control
genes as a background cluster in the model. If the most relevant cluster
selected by one gene matches the background "clusters",
we will return "NoSig" for this gene.
An object of class 'glm_mg_result' To access specific components of the returned object:
Use get_top_mg
to retrieve the top result data frame
Use get_full_mg
to retrieve full result data frame
library(SpatialExperiment) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample<-"sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=100, mean=20,sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell<-sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans_info)) trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) vecs_lst <- get_vectors(x=spe,sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(20,20), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y) lasso_res <- lasso_markers(gene_mt=vecs_lst$gene_mt, cluster_mt = vecs_lst$cluster_mt, sample_names=c("sample1"), keep_positive=TRUE, background=NULL) # the top result top_result <- get_top_mg(lasso_res, coef_cutoff=0.05) # the full result full_result <- get_full_mg(lasso_res, coef_cutoff=0.05)
library(SpatialExperiment) set.seed(100) # simulate coordinates for clusters df_clA <- data.frame(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), cluster="A") df_clB <- data.frame(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), cluster="B") clusters <- rbind(df_clA, df_clB) clusters$sample<-"sample1" # simulate coordinates for genes trans_info <- data.frame(rbind(cbind(x = rnorm(n=100, mean=20,sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A1"), cbind(x = rnorm(n=100, mean=20, sd=5), y = rnorm(n=100, mean=20, sd=5), feature_name="gene_A2"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B1"), cbind(x = rnorm(n=100, mean=100, sd=5), y = rnorm(n=100, mean=100, sd=5), feature_name="gene_B2"))) trans_info$x<-as.numeric(trans_info$x) trans_info$y<-as.numeric(trans_info$y) trans_info$cell<-sample(c("cell1","cell2","cell2"),replace=TRUE, size=nrow(trans_info)) trans_mol <- BumpyMatrix::splitAsBumpyMatrix( trans_info[, c("x", "y")], row = trans_info$feature_name, col = trans_info$cell ) spe<- SpatialExperiment( assays = list(molecules = trans_mol),sample_id ="sample1" ) w_x <- c(min(floor(min(trans_info$x)), floor(min(clusters$x))), max(ceiling(max(trans_info$x)), ceiling(max(clusters$x)))) w_y <- c(min(floor(min(trans_info$y)), floor(min(clusters$y))), max(ceiling(max(trans_info$y)), ceiling(max(clusters$y)))) vecs_lst <- get_vectors(x=spe,sample_names=c("sample1"), cluster_info = clusters, bin_type = "square", bin_param = c(20,20), test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"), w_x = w_x, w_y=w_y) lasso_res <- lasso_markers(gene_mt=vecs_lst$gene_mt, cluster_mt = vecs_lst$cluster_mt, sample_names=c("sample1"), keep_positive=TRUE, background=NULL) # the top result top_result <- get_top_mg(lasso_res, coef_cutoff=0.05) # the full result full_result <- get_full_mg(lasso_res, coef_cutoff=0.05)
A data frame file containing the coordinates and cluster label for each cell of the selected subset of rep1.
data(rep1_clusters)
data(rep1_clusters)
A data frame with 1705 rows and 6 variables:
the provided cell type annotation
cluster label
x coordinates
y coordiantes
cell id
sample id
data frame
<https://cf.10xgenomics.com/samples/xenium/1.0.1/ Xenium_FFPE_Human_Breast_Cancer_Rep1/ Xenium_FFPE_Human_Breast_Cancer_Rep1_outs.zip>
A SpatialExperiment object containing the coordinates for every negative control detection for rep1_sub
data(rep1_neg)
data(rep1_neg)
A SpatialExperiment object. The molecules assay slot is a BumpyDataFrameMatrix obejct. Can retrieve DataFrame version by calling 'BumpyMatrix::unsplitAsDataFrame(molecules(rep1_neg))'. The molecules slot contains:
x coordinates
y coordiantes
negative control probe name
negative control category
SpatialExperiment
<https://cf.10xgenomics.com/samples/xenium/1.0.1/ Xenium_FFPE_Human_Breast_Cancer_Rep1/Xenium_FFPE_Human_Breast_ Cancer_Rep1_outs.zip>
A SpatialExperiment object containing the coordinates for every transcript
data(rep1_sub)
data(rep1_sub)
A SpatialExperiment object with 20 genes and 1713 cells. The molecules assay slot is a BumpyDataFrameMatrix obejct. Can retrieve DataFrame version by calling 'BumpyMatrix::unsplitAsDataFrame(molecules(rep2_sub))'. The molecules assay contains 79576 rows and 3 variables:
x coordinates
y coordiantes
transcript name
SpatialExperiment
<https://cf.10xgenomics.com/samples/xenium/1.0.1/ Xenium_FFPE_Human_Breast_Cancer_Rep1/ Xenium_FFPE_Human_Breast_Cancer_Rep1_outs.zip>
A csv file containing the coordinates and cluster label for each cell of the selected subset of rep2.
data(rep2_clusters)
data(rep2_clusters)
A data frame with 1815 rows and and 6 variables:
the provided cell type annotation
cluster label
x coordinates
y coordiantes
cell id
sample id
data frame
<https://cf.10xgenomics.com/samples/xenium/1.0.1/ Xenium_FFPE_Human_Breast_Cancer_Rep2/ Xenium_FFPE_Human_Breast_Cancer_Rep2_outs.zip>
A data frame containing the coordinates for every negative control detection for rep2
data(rep2_neg)
data(rep2_neg)
A SpatialExperiment object. The molecules assay slot is a BumpyDataFrameMatrix obejct. Can retrieve DataFrame version by calling 'BumpyMatrix::unsplitAsDataFrame(molecules(rep2_neg))'. The molecules slot contains:
x coordinates
y coordiantes
negative control probe name
.
negative control category
SpatialExperiment
<https://cf.10xgenomics.com/samples/xenium/1.0.1/ Xenium_FFPE_Human_Breast_Cancer_Rep2/ Xenium_FFPE_Human_Breast_Cancer_Rep2_outs.zip>
A SpatialExperiment object containing the coordinates for every negative control detection for rep2_sub
data(rep2_sub)
data(rep2_sub)
A SpatialExperiment object with 20 genes and 1829 cells. The molecules assay slot is a BumpyDataFrameMatrix obejct. Can retrieve DataFrame version by calling 'BumpyMatrix::unsplitAsDataFrame(molecules(rep2_sub))'. The molecules slot contains:
x coordinates
y coordiantes
transcript name
SpatialExperiment
<https://cf.10xgenomics.com/samples/xenium/1.0.1/ Xenium_FFPE_Human_Breast_Cancer_Rep2/ Xenium_FFPE_Human_Breast_Cancer_Rep2_outs.zip>