Title: | scFeatures: Multi-view representations of single-cell and spatial data for disease outcome prediction |
---|---|
Description: | scFeatures constructs multi-view representations of single-cell and spatial data. scFeatures is a tool that generates multi-view representations of single-cell and spatial data through the construction of a total of 17 feature types. These features can then be used for a variety of analyses using other software in Biocondutor. |
Authors: | Yue Cao [aut, cre], Yingxin Lin [aut], Ellis Patrick [aut], Pengyi Yang [aut], Jean Yee Hwa Yang [aut] |
Maintainer: | Yue Cao <[email protected]> |
License: | GPL-3 |
Version: | 1.7.0 |
Built: | 2024-10-31 04:52:59 UTC |
Source: | https://github.com/bioc/scFeatures |
This is a subsampled version of the melanoma patients dataset as used in our manuscript. The original dataset is available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120575.
data("example_scrnaseq")
data("example_scrnaseq")
example_scrnaseq
A Seurat object with 3523 genes and 550 cells. Some of the key metadata columns are:
cell type of the cell
patient ID of the cell
whether the patient is a responder or non-responder
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120575
This function takes a list object containing spatial transcriptomics matrix as input
and estimates the relative number of cells per spot in the data.
The number of cells is estimated as the library size scaled to
the range from 1 to 100.
This value stored in the number_cells
attribute.
get_num_cell_per_spot(alldata)
get_num_cell_per_spot(alldata)
alldata |
A list object containing spatial transcriptomics |
a vector with the relative number of cells in each spot.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq@assays$RNA@data data <- list(data = data) number_of_cells <- get_num_cell_per_spot(data)
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq@assays$RNA@data data <- list(data = data) number_of_cells <- get_num_cell_per_spot(data)
This function removes mitochondria and ribosomal genes and genes highly correlated with these genes, as mitochondria and ribosomal genes are typically not interesting to look at.
remove_mito_ribo(alldata)
remove_mito_ribo(alldata)
alldata |
A list object containing expression data |
The list object with the mitochrondrial and ribosomal genes and other highly correlated genes removed
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq data <- list(data = data@assays$RNA@data) data <- remove_mito_ribo(data)
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq data <- list(data = data@assays$RNA@data) data <- remove_mito_ribo(data)
This function takes the feature matrix generated by scFeatures as input and creates an HTML report containing the results of the association study. The report is saved to the specified output folder.
run_association_study_report(scfeatures_result, output_folder)
run_association_study_report(scfeatures_result, output_folder)
scfeatures_result |
a named list storing the scFeatures feature output.
Note that the names of the list should be one or multiple of the following:
|
output_folder |
the path to the folder where the HTML report will be saved |
an HTML file, saved to the directory defined in the output_folder
argument
## Not run: output_folder <- tempdir() utils::data("scfeatures_result" , package = "scFeatures") run_association_study_report(scfeatures_result, output_folder ) ## End(Not run)
## Not run: output_folder <- tempdir() utils::data("scfeatures_result" , package = "scFeatures") run_association_study_report(scfeatures_result, output_folder ) ## End(Not run)
This function calculates the ligand receptor interaction score using SingleCellSignalR. The output features are in the form of celltype a -> celltype b – ligand 1 -> receptor 2 , which indicates the interaction between ligand 1 in celltype a and receptor 2 in celltype b.
It supports scRNA-seq.
run_CCI(data, type = "scrna", ncores = 1)
run_CCI(data, type = "scrna", ncores = 1)
data |
A list object containing |
type |
input data type, either scrna, spatial_p, or spatial_t |
ncores |
number of cores |
a matrix of samples x features The features are in the form of ligand 1 receptor 2 celltype a, ligand 1 receptor 2 celltype b ... etc, with the numbers representing cell-cell interaction probability.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:1000, 1:100] celltype <- data$celltype sample <- data$sample data <- as.matrix(data@assays$RNA@data) alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_CCI <- run_CCI(alldata, type = "scrna" , ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:1000, 1:100] celltype <- data$celltype sample <- data$sample data <- as.matrix(data@assays$RNA@data) alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_CCI <- run_CCI(alldata, type = "scrna" , ncores = 1 )
This function calculates the pairwise distance between cell types for a sample by using the coordinates and cell types of the cells. We find the nearest neighbours of each cell and the cell types of these neighbours. These are considered as spatial interaction pairs. The cell type composition of the spatial interaction pairs are used as features. The function supports spatial proteomics and spatial transcriptomics.
run_celltype_interaction(data, type = "spatial_p", ncores = 1)
run_celltype_interaction(data, type = "spatial_p", ncores = 1)
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of protein 1 vs protein 2, protein 1 vs protein 3 ... etc, with the numbers representing the proportion of each interaction pairs in a give sample.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype data <- data@assays$RNA@data sample <- sample( c("patient1", "patient2", "patient3"), ncol(data) , replace= TRUE ) x <- sample(1:100, ncol(data) , replace = TRUE) y <- sample(1:100, ncol(data) , replace = TRUE) spatialCoords <- list( x , y) alldata <- scFeatures:::formatData(data = data, sample = sample, celltype = celltype, spatialCoords = spatialCoords ) feature_celltype_interaction <- run_celltype_interaction( alldata, type = "spatial_p", ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype data <- data@assays$RNA@data sample <- sample( c("patient1", "patient2", "patient3"), ncol(data) , replace= TRUE ) x <- sample(1:100, ncol(data) , replace = TRUE) y <- sample(1:100, ncol(data) , replace = TRUE) spatialCoords <- list( x , y) alldata <- scFeatures:::formatData(data = data, sample = sample, celltype = celltype, spatialCoords = spatialCoords ) feature_celltype_interaction <- run_celltype_interaction( alldata, type = "spatial_p", ncores = 1 )
This function computes the correlation of gene expression across samples. The user can specify the genes of interest, or let the function use the top variable genes by default. The function supports scRNA-seq, spatial proteomics, and spatial transcriptomics.
run_gene_cor( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
run_gene_cor( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
genes |
Default to NULL, in which case the top variable genes will be used. If provided by user, need to be in the format of a list containing the genes of interest, eg, genes <- c(GZMA", "GZMK", "CCR7", "RPL38" ) |
num_top_gene |
Number of top variable genes to use when genes is not provided. Defaults to 5. |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of gene 1, gene 2 ... etc, with the numbers representing the proportion that the gene is expressed across all cells.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:100, 1:200] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_cor <- run_gene_cor( alldata, type = "scrna", num_top_gene = 5, ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:100, 1:200] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_cor <- run_gene_cor( alldata, type = "scrna", num_top_gene = 5, ncores = 1 )
This function computes the correlation of expression of a set of genes for each cell type in the input data. The input data can be of three types: 'scrna', 'spatial_p' or 'spatial_t'. If the genes parameter is not provided by the user, the top variable genes will be selected based on the num_top_gene parameter (defaults to 100).
run_gene_cor_celltype( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
run_gene_cor_celltype( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
genes |
Optional dataframe with 2 columns: 'marker' and 'celltype'. The 'marker' column should contain the genes of interest (e.g. 'S100A11', 'CCL4'), and the 'celltype' column should contain the celltype that the gene expression is to be computed from (e.g. 'CD8', 'B cells'). If not provided, the top variable genes will be used based on the num_top_gene parameter. |
num_top_gene |
Number of top genes to use when genes is not provided. Defaults to 5. |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features. The features are in the form of gene 1 vs gene 2 cell type a , gene 1 vs gene 3 cell type b ... etc, with the numbers representing the correlation of the two given genes in the given cell type.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_cor_celltype <- run_gene_cor_celltype( alldata, type = "scrna", num_top_gene = 5, ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_cor_celltype <- run_gene_cor_celltype( alldata, type = "scrna", num_top_gene = 5, ncores = 1 )
This function computes the mean expression of genes across samples. The user can specify the genes of interest, or let the function use the top variable genes by default. The function supports scRNA-seq, spatial proteomics, and spatial transcriptomics.
run_gene_mean( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
run_gene_mean( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
genes |
Default to NULL, in which case the top variable genes will be used. If provided by user, need to be in the format of a list containing the genes of interest, eg, genes <- c(GZMA", "GZMK", "CCR7", "RPL38" ) |
num_top_gene |
Number of top variable genes to use when genes is not provided. Defaults to 1500. |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of gene 1, gene 2 ... etc, with the numbers representing averaged gene expression across all cells.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_mean <- run_gene_mean( alldata, type = "scrna", num_top_gene = 150, ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_mean <- run_gene_mean( alldata, type = "scrna", num_top_gene = 150, ncores = 1 )
This function computes the mean expression of a set of genes for each cell type in the input data. The input data can be of three types: 'scrna', 'spatial_p' or 'spatial_t'. If the genes parameter is not p rovided by the user, the top variable genes will be selected based on the num_top_gene parameter (defaults to 100).
run_gene_mean_celltype( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
run_gene_mean_celltype( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
genes |
Optional dataframe with 2 columns: 'marker' and 'celltype'. The 'marker' column should contain the genes of interest (e.g. 'S100A11', 'CCL4'), and the 'celltype' column should contain the celltype that the gene expression is to be computed from (e.g. 'CD8', 'B cells'). If not provided, the top variable genes will be used based on the num_top_gene parameter. |
num_top_gene |
Number of top genes to use when genes is not provided. Defaults to 100. |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features. The features are in the form of gene 1 celltype a, gene 2 celltype b ... etc, with the number representing average gene expression of the given gene across the cells of the the given celltype.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:200, 1:200] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_mean_celltype <- run_gene_mean_celltype( alldata, type = "scrna", num_top_gene = 100, ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:200, 1:200] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_mean_celltype <- run_gene_mean_celltype( alldata, type = "scrna", num_top_gene = 100, ncores = 1 )
This function computes the proportion of gene expression across samples. The user can specify the genes of interest, or let the function use the top variable genes by default. The function supports scRNA-seq, spatial proteomics, and spatial transcriptomics.
run_gene_prop( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
run_gene_prop( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
genes |
Default to NULL, in which case the top variable genes will be used. If provided by user, need to be in the format of a list containing the genes of interest, eg, genes <- c(GZMA", "GZMK", "CCR7", "RPL38" ) |
num_top_gene |
Number of top variable genes to use when genes is not provided. Defaults to 1500. |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of gene 1 vs gene 2, gene 1 vs gene 3 ... etc, with the numbers representing correlation of gene expressions.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_prop <- run_gene_prop(alldata, type = "scrna", num_top_gene = 10, ncores = 1)
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_prop <- run_gene_prop(alldata, type = "scrna", num_top_gene = 10, ncores = 1)
This function computes the proportion of expression of a set of genes for each cell type in the input data. The input data can be of three types: 'scrna', 'spatial_p' or 'spatial_t'. If the genes parameter is not provided by the user, the top variable genes will be selected based on the num_top_gene parameter (defaults to 100).
run_gene_prop_celltype( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
run_gene_prop_celltype( data, type = "scrna", genes = NULL, num_top_gene = NULL, ncores = 1 )
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
genes |
Optional dataframe with 2 columns: 'marker' and 'celltype'. The 'marker' column should contain the genes of interest (e.g. 'S100A11', 'CCL4'), and the 'celltype' column should contain the celltype that the gene expression is to be computed from (e.g. 'CD8', 'B cells'). If not provided, the top variable genes will be used based on the num_top_gene parameter. |
num_top_gene |
Number of top genes to use when genes is not provided. Defaults to 100. |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features. The features are in the form of gene 1 celltype a, gene 2 celltype b ... etc, with the number representing proportion of gene expression of the given gene across the cells of the the given celltype.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_prop_celltype <- run_gene_prop_celltype( alldata, type = "scrna", num_top_gene = 100, ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_gene_prop_celltype <- run_gene_prop_celltype( alldata, type = "scrna", num_top_gene = 100, ncores = 1 )
This function calculates L-statistics to measure spatial autocorrelation. L value greater than zero indicates spatial attraction of the pair of proteins whereas L value less than zero indicates spatial repulsion. The function supports spatial proteomics and spatial transcriptomics.
run_L_function(data, type = "spatial_p", ncores = 1)
run_L_function(data, type = "spatial_p", ncores = 1)
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of protein 1 vs protein 2, protein 1 vs protein 3 ... etc, with the numbers representing the L values.
utils::data("example_scrnaseq" , package = "scFeatures") celltype <- example_scrnaseq$celltype data <- example_scrnaseq@assays$RNA@data sample <- sample( c("patient1", "patient2", "patient3"), ncol(data) , replace= TRUE ) x <- sample(1:100, ncol(data) , replace = TRUE) y <- sample(1:100, ncol(data) , replace = TRUE) spatialCoords <- list( x , y) alldata <- scFeatures:::formatData(data = data, sample = sample, celltype = celltype, spatialCoords = spatialCoords ) feature_L_function <- run_L_function(alldata, type = "spatial_p", ncores = 1)
utils::data("example_scrnaseq" , package = "scFeatures") celltype <- example_scrnaseq$celltype data <- example_scrnaseq@assays$RNA@data sample <- sample( c("patient1", "patient2", "patient3"), ncol(data) , replace= TRUE ) x <- sample(1:100, ncol(data) , replace = TRUE) y <- sample(1:100, ncol(data) , replace = TRUE) spatialCoords <- list( x , y) alldata <- scFeatures:::formatData(data = data, sample = sample, celltype = celltype, spatialCoords = spatialCoords ) feature_L_function <- run_L_function(alldata, type = "spatial_p", ncores = 1)
This function calculates Moran's I to measure spatial autocorrelation, which an indicattion of how strongly the feature(ie, genes/proteins) expression values in a sample cluster or disperse. A value closer to 1 indicates clustering of similar values and a value closer to -1 indicates clustering of dissimilar values. A value of 0 indicates no particular clustering structure, ie, the values are spatially distributed randomly. The function supports spatial proteomics and spatial transcriptomics.
run_Morans_I(data, type = "spatial_p", ncores = 1)
run_Morans_I(data, type = "spatial_p", ncores = 1)
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of protein 1, protein 2 ... etc, with the numbers representing Moran's value.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype data <- data@assays$RNA@data sample <- sample( c("patient1", "patient2", "patient3"), ncol(data) , replace= TRUE ) x <- sample(1:100, ncol(data) , replace = TRUE) y <- sample(1:100, ncol(data) , replace = TRUE) spatialCoords <- list( x , y) alldata <- scFeatures:::formatData(data = data, sample = sample, celltype = celltype, spatialCoords = spatialCoords ) feature_Morans_I <- run_Morans_I(alldata, type = "spatial_p", ncores = 1)
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype data <- data@assays$RNA@data sample <- sample( c("patient1", "patient2", "patient3"), ncol(data) , replace= TRUE ) x <- sample(1:100, ncol(data) , replace = TRUE) y <- sample(1:100, ncol(data) , replace = TRUE) spatialCoords <- list( x , y) alldata <- scFeatures:::formatData(data = data, sample = sample, celltype = celltype, spatialCoords = spatialCoords ) feature_Morans_I <- run_Morans_I(alldata, type = "spatial_p", ncores = 1)
This function calculates the nearest neighbour correlation for each feature (eg, proteins) in each sample. This is calculated by taking the correlation between each cell and its nearest neighbours cell for a particular feature. This function supports spatial proteomics, and spatial transcriptomics.
run_nn_correlation(data, type = "spatial_p", num_top_gene = NULL, ncores = 1)
run_nn_correlation(data, type = "spatial_p", num_top_gene = NULL, ncores = 1)
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
num_top_gene |
Number of top variable genes to use when genes is not provided. Defaults to 1500. |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of protein 1, protein 2 ... etc, with the numbers representing Pearson's correlation.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype data <- data@assays$RNA@data sample <- sample( c("patient1", "patient2", "patient3"), ncol(data) , replace= TRUE ) x <- sample(1:100, ncol(data) , replace = TRUE) y <- sample(1:100, ncol(data) , replace = TRUE) spatialCoords <- list( x , y) alldata <- scFeatures:::formatData(data = data, sample = sample, celltype = celltype, spatialCoords = spatialCoords ) feature_nn_correlation <- run_nn_correlation( alldata, type = "spatial_p", ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype data <- data@assays$RNA@data sample <- sample( c("patient1", "patient2", "patient3"), ncol(data) , replace= TRUE ) x <- sample(1:100, ncol(data) , replace = TRUE) y <- sample(1:100, ncol(data) , replace = TRUE) spatialCoords <- list( x , y) alldata <- scFeatures:::formatData(data = data, sample = sample, celltype = celltype, spatialCoords = spatialCoords ) feature_nn_correlation <- run_nn_correlation( alldata, type = "spatial_p", ncores = 1 )
This function calculates pathway scores for a given input dataset and gene set using gene set enrichment analysis (GSVA). It supports scRNA-seq, spatial proteomics and spatial transcriptomics. It currently supports two pathway analysis methods: ssgsea and aucell. By default, it uses the 50 hallmark gene sets from msigdb. Alternatively, users can provide their own gene sets of interest in a list format.
run_pathway_gsva( data, method = "ssgsea", geneset = NULL, species = "Homo sapiens", type = "scrna", subsample = TRUE, ncores = 1 )
run_pathway_gsva( data, method = "ssgsea", geneset = NULL, species = "Homo sapiens", type = "scrna", subsample = TRUE, ncores = 1 )
data |
A list object containing |
method |
Type of pathway analysis method, currently support |
geneset |
By default (when the |
species |
Whether the species is "Homo sapiens" or "Mus musculus". Default is "Homo sapiens". |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
subsample |
Whether to subsample, either TRUE or FALSE. For larger datasets (eg, over 30,000 cells), the subsample function can be used to increase speed. |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of pathway 1 celltype a, pathway 2 celltype b ... etc, with the number representing the gene set enrichment score of a given pathway in cells from a given celltype.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_pathway_gsva <- run_pathway_gsva( alldata, geneset = NULL, species = "Homo sapiens", type = "scrna", subsample = FALSE, ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_pathway_gsva <- run_pathway_gsva( alldata, geneset = NULL, species = "Homo sapiens", type = "scrna", subsample = FALSE, ncores = 1 )
This function calculates pathway scores for a given dataset and gene set using gene expression levels. It supports scRNA-seq, spatial transcriptomics and spatial proteomics and spatial transcriptomics). By default, it uses the 50 hallmark gene sets from msigdb. Alternatively, users can provide their own gene sets of interest in a list format.
run_pathway_mean( data, geneset = NULL, species = "Homo sapiens", type = "scrna", ncores = 1 )
run_pathway_mean( data, geneset = NULL, species = "Homo sapiens", type = "scrna", ncores = 1 )
data |
A list object containing |
geneset |
By default (when the |
species |
Whether the species is "Homo sapiens" or "Mus musculus". Default is "Homo sapiens". |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of pathway 1 celltype a, pathway 2 celltype b ... etc, with the number representing the averaged expression of a given pathway in cells from a given celltype.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:500, 1:200] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_pathway_mean <- run_pathway_mean( alldata , geneset = NULL, species = "Homo sapiens", type = "scrna", ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:500, 1:200] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_pathway_mean <- run_pathway_mean( alldata , geneset = NULL, species = "Homo sapiens", type = "scrna", ncores = 1 )
This function calculates pathway scores for a given input dataset and gene set using the proportion of gene expression levels. It supports scRNA-seq, spatial transcriptomics and spatial proteomics and spatial transcriptomics). By default, it uses the 50 hallmark gene sets from msigdb. Alternatively, users can provide their own gene sets of interest in a list format.
run_pathway_prop( data, geneset = NULL, species = "Homo sapiens", type = "scrna", ncores = 1 )
run_pathway_prop( data, geneset = NULL, species = "Homo sapiens", type = "scrna", ncores = 1 )
data |
A list object containing |
geneset |
By default (when the |
species |
Whether the species is "Homo sapiens" or "Mus musculus". Default is "Homo sapiens". |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of pathway 1 celltype a, pathway 2 celltype b ... etc, with the number representing the proportion of expression of a given pathway in cells from a given celltype.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:100, 1:100] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_pathway_prop <- run_pathway_prop( alldata, geneset = NULL, species = "Homo sapiens", type = "scrna", ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:100, 1:100] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_pathway_prop <- run_pathway_prop( alldata, geneset = NULL, species = "Homo sapiens", type = "scrna", ncores = 1 )
This function calculates the proportions of cells belonging to each cell type,
and applies a logit transformation to the proportions.
The input data must contain sample
and celltype
metadata column.
The function supports scRNA-seq and spatial proteomics.
The function returns a dataframe with samples as rows and cell types as columns.
run_proportion_logit(data, type = "scrna", ncores = 1)
run_proportion_logit(data, type = "scrna", ncores = 1)
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features The features are in the form of celltype a, celltype b, with the number representing proportions.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_proportion_logit <- run_proportion_logit( alldata, type = "scrna", ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_proportion_logit <- run_proportion_logit( alldata, type = "scrna", ncores = 1 )
This function calculates pairwise cell type proportion ratio for each sample.
and applies a logit transformation to the proportions.
The input data must contain sample
and celltype
metadata column.
The function supports scRNA-seq and spatial proteomics.
The function returns a dataframe with samples as rows and cell types as columns.
run_proportion_ratio(data, type = "scrna", ncores = 1)
run_proportion_ratio(data, type = "scrna", ncores = 1)
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features. The features are in the form of celltype a vs celltype b, celltype a vs celltype c, with the number representing the ratio between the two cell types.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_proportion_ratio <- run_proportion_ratio( alldata, type = "scrna", ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_proportion_ratio <- run_proportion_ratio( alldata, type = "scrna", ncores = 1 )
This function calculates the proportions of cells belonging to each cell type.
The input data must contain sample
and celltype
metadata column.
The function supports scRNA-seq and spatial proteomics.
The function returns a dataframe with samples as rows and cell types as columns.
run_proportion_raw(data, type = "scrna", ncores = 1)
run_proportion_raw(data, type = "scrna", ncores = 1)
data |
A list object containing |
type |
The type of dataset, either "scrna", "spatial_t", or "spatial_p". |
ncores |
Number of cores for parallel processing. |
a dataframe of samples x features. The features are in the form of celltype a, celltype b, with the number representing proportions.
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_proportion_raw <- run_proportion_raw( alldata, type = "scrna", ncores = 1 )
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq[1:50, 1:20] celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data alldata <- scFeatures:::formatData(data = data, celltype = celltype, sample = sample ) feature_proportion_raw <- run_proportion_raw( alldata, type = "scrna", ncores = 1 )
The scFeatures function generates a variety of features from a Seurat object containing single cell RNA-sequencing data. By default, all feature types will be generated and returned in a single list containing multiple data frames.
scFeatures( data = NULL, sample = NULL, celltype = NULL, spatialCoords = NULL, spotProbability = NULL, feature_types = NULL, type = "scrna", ncores = 1, species = "Homo sapiens", celltype_genes = NULL, aggregated_genes = NULL, geneset = NULL )
scFeatures( data = NULL, sample = NULL, celltype = NULL, spatialCoords = NULL, spotProbability = NULL, feature_types = NULL, type = "scrna", ncores = 1, species = "Homo sapiens", celltype_genes = NULL, aggregated_genes = NULL, geneset = NULL )
data |
input data, a matrix of genes by cells |
sample |
a vector of sample information |
celltype |
a vector of cell type information |
spatialCoords |
a list of two vectors containing the x and y coordinates of each cell |
spotProbability |
a matrix of spot probability, each row represents a celltype and each column represents a spot |
feature_types |
vector containing the name of the feature types to generate, options are "proportion_raw", "proportion_logit" , "proportion_ratio", "gene_mean_celltype", "gene_prop_celltype", "gene_cor_celltype", "pathway_gsva" , "pathway_mean", "pathway_prop", "CCI", "gene_mean_aggregated", "gene_prop_aggregated", 'gene_cor_aggregated', "L_stats" , "celltype_interaction" , "morans_I", "nn_correlation". If no value is provided, all the above feature types will be generated. |
type |
input data type, either "scrna" (stands for single-cell RNA-sequencing data), "spatial_p" (stands for spatial proteomics data), or "spatial_t" (stands for single cell spatial data ) |
ncores |
number of cores , default to 1 |
species |
either "Homo sapiens" or "Mus musculus". Defaults to "Homo sapiens" if no value provided |
celltype_genes |
the genes of interest for celltype specific gene expression feature category If no value is provided, the top variable genes will be used |
aggregated_genes |
the genes of interest for overall aggregated gene expression feature category If no value is provided, the top variable genes will be used |
geneset |
the geneset of interest for celltype specific pathway feature category If no value is provided, the 50 hallmark pathways will be used |
a list of dataframes containing the generated feature matrix in the form of sample x features
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data scfeatures_result <- scFeatures(data, celltype = celltype, sample = sample, type = "scrna", feature_types = "proportion_raw")
utils::data("example_scrnaseq" , package = "scFeatures") data <- example_scrnaseq celltype <- data$celltype sample <- data$sample data <- data@assays$RNA@data scfeatures_result <- scFeatures(data, celltype = celltype, sample = sample, type = "scrna", feature_types = "proportion_raw")
This is an example output of the scFeatures() function for example_scrnaseq.
data("scfeatures_result")
data("scfeatures_result")
scfeatures_result
A list with two dataframes. In each dataframe the columns are each patient and the rows are the feature values. The first dataframe contains the feature type "proportion_raw". The second dataframe contains the feature type "proportion_logit".