Title: | Classifier for Single-cell RNA-seq Using Cell Clusters |
---|---|
Description: | Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment. |
Authors: | Rui Fu [cre, aut], Kent Riemondy [aut], Austin Gillen [ctb], Chengzhe Tian [ctb], Jay Hesselberth [ctb], Yue Hao [ctb], Michelle Daya [ctb], Sidhant Puntambekar [ctb], RNA Bioscience Initiative [fnd, cph] |
Maintainer: | Rui Fu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.19.0 |
Built: | 2024-10-30 05:31:30 UTC |
Source: | https://github.com/bioc/clustifyr |
Given a reference matrix and a list of genes, take the union of all genes in vector and genes in reference matrix and insert zero counts for all remaining genes.
append_genes(gene_vector, ref_matrix)
append_genes(gene_vector, ref_matrix)
gene_vector |
char vector with gene names |
ref_matrix |
Reference matrix containing cell types vs. gene expression values |
Reference matrix with union of all genes
mat <- append_genes( gene_vector = human_genes_10x, ref_matrix = cbmc_ref )
mat <- append_genes( gene_vector = human_genes_10x, ref_matrix = cbmc_ref )
Find rank bias
assess_rank_bias( avg_mat, ref_mat, query_genes = NULL, res, organism, plot_name = NULL, rds_name = NULL, expand_unassigned = FALSE )
assess_rank_bias( avg_mat, ref_mat, query_genes = NULL, res, organism, plot_name = NULL, rds_name = NULL, expand_unassigned = FALSE )
avg_mat |
average expression matrix |
ref_mat |
reference expression matrix |
query_genes |
original vector of genes used to clustify |
res |
dataframe of idents, such as output of cor_to_call |
organism |
for GO term analysis, organism name: human - 'hsapiens', mouse - 'mmusculus' |
plot_name |
name for saved pdf, if NULL then no file is written (default) |
rds_name |
name for saved rds of rank_diff, if NULL then no file is written (default) |
expand_unassigned |
test all ref clusters for unassigned results |
pdf of ggplot object
## Not run: avg <- average_clusters( pbmc_matrix_small, pbmc_meta$seurat_clusters ) res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "seurat_clusters" ) top_call <- cor_to_call( res, metadata = pbmc_meta, cluster_col = "seurat_clusters", collapse_to_cluster = FALSE, threshold = 0.8 ) res_rank <- assess_rank_bias( avg, cbmc_ref, res = top_call ) ## End(Not run)
## Not run: avg <- average_clusters( pbmc_matrix_small, pbmc_meta$seurat_clusters ) res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "seurat_clusters" ) top_call <- cor_to_call( res, metadata = pbmc_meta, cluster_col = "seurat_clusters", collapse_to_cluster = FALSE, threshold = 0.8 ) res_rank <- assess_rank_bias( avg, cbmc_ref, res = top_call ) ## End(Not run)
manually change idents as needed
assign_ident( metadata, cluster_col = "cluster", ident_col = "type", clusters, idents )
assign_ident( metadata, cluster_col = "cluster", ident_col = "type", clusters, idents )
metadata |
column of ident |
cluster_col |
column in metadata containing cluster info |
ident_col |
column in metadata containing identity assignment |
clusters |
names of clusters to change, string or vector of strings |
idents |
new idents to assign, must be length of 1 or same as clusters |
new dataframe of metadata
Average expression values per cluster
average_clusters( mat, metadata, cluster_col = "cluster", if_log = TRUE, cell_col = NULL, low_threshold = 0, method = "mean", output_log = TRUE, subclusterpower = 0, cut_n = NULL )
average_clusters( mat, metadata, cluster_col = "cluster", if_log = TRUE, cell_col = NULL, low_threshold = 0, method = "mean", output_log = TRUE, subclusterpower = 0, cut_n = NULL )
mat |
expression matrix |
metadata |
data.frame or vector containing cluster assignments per cell. Order must match column order in supplied matrix. If a data.frame provide the cluster_col parameters. |
cluster_col |
column in metadata with cluster number |
if_log |
input data is natural log, averaging will be done on unlogged data |
cell_col |
if provided, will reorder matrix first |
low_threshold |
option to remove clusters with too few cells |
method |
whether to take mean (default), median, 10% truncated mean, or trimean, max, min |
output_log |
whether to report log results |
subclusterpower |
whether to get multiple averages per original cluster |
cut_n |
set on a limit of genes as expressed, lower ranked genes are set to 0, considered unexpressed |
average expression matrix, with genes for row names, and clusters for column names
mat <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) mat[1:3, 1:3]
mat <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) mat[1:3, 1:3]
Binarize scRNAseq data
binarize_expr(mat, n = 1000, cut = 0)
binarize_expr(mat, n = 1000, cut = 0)
mat |
single-cell expression matrix |
n |
number of top expressing genes to keep |
cut |
cut off to set to 0 |
matrix of 1s and 0s
pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified" ) mat <- binarize_expr(pbmc_avg) mat[1:3, 1:3]
pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified" ) mat <- binarize_expr(pbmc_avg) mat[1:3, 1:3]
Function to combine records into single atlas
build_atlas(matrix_fns = NULL, genes_fn, matrix_objs = NULL, output_fn = NULL)
build_atlas(matrix_fns = NULL, genes_fn, matrix_objs = NULL, output_fn = NULL)
matrix_fns |
character vector of paths to study matrices stored as .rds files. If a named character vector, then the name will be added as a suffix to the cell type name in the final matrix. If it is not named, then the filename will be used (without .rds) |
genes_fn |
text file with a single column containing genes and the ordering desired in the output matrix |
matrix_objs |
Checks to see whether .rds files will be read or R objects in a local environment. A list of environmental objects can be passed to matrx_objs, and that names will be used, otherwise defaults to numbers |
output_fn |
output filename for .rds file. If NULL the matrix will be returned instead of saving |
Combined matrix with all genes given
pbmc_ref_matrix <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = TRUE # whether the expression matrix is already log transformed ) references_to_combine <- list(pbmc_ref_matrix, cbmc_ref) atlas <- build_atlas(NULL, human_genes_10x, references_to_combine, NULL)
pbmc_ref_matrix <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = TRUE # whether the expression matrix is already log transformed ) references_to_combine <- list(pbmc_ref_matrix, cbmc_ref) atlas <- build_atlas(NULL, human_genes_10x, references_to_combine, NULL)
Distance calculations for spatial coord
calc_distance( coord, metadata, cluster_col = "cluster", collapse_to_cluster = FALSE )
calc_distance( coord, metadata, cluster_col = "cluster", collapse_to_cluster = FALSE )
coord |
dataframe or matrix of spatial coordinates, cell barcode as rownames |
metadata |
data.frame or vector containing cluster assignments per cell. Order must match column order in supplied matrix. If a data.frame provide the cluster_col parameters. |
cluster_col |
column in metadata with cluster number |
collapse_to_cluster |
instead of reporting min distance to cluster per cell, summarize to cluster level |
min distance matrix
cbs <- paste0("cb_", 1:100) spatial_coords <- data.frame( row.names = cbs, X = runif(100), Y = runif(100) ) group_ids <- sample(c("A", "B"), 100, replace = TRUE) dist_res <- calc_distance( spatial_coords, group_ids )
cbs <- paste0("cb_", 1:100) spatial_coords <- data.frame( row.names = cbs, X = runif(100), Y = runif(100) ) group_ids <- sample(c("A", "B"), 100, replace = TRUE) dist_res <- calc_distance( spatial_coords, group_ids )
compute similarity
calc_similarity(query_mat, ref_mat, compute_method, rm0 = FALSE, ...)
calc_similarity(query_mat, ref_mat, compute_method, rm0 = FALSE, ...)
query_mat |
query data matrix |
ref_mat |
reference data matrix |
compute_method |
method(s) for computing similarity scores |
rm0 |
consider 0 as missing data, recommended for per_cell |
... |
additional parameters |
matrix of numeric values
Convert expression matrix to GSEA pathway scores (would take a similar place in workflow before average_clusters/binarize)
calculate_pathway_gsea( mat, pathway_list, n_perm = 1000, scale = TRUE, no_warnings = TRUE )
calculate_pathway_gsea( mat, pathway_list, n_perm = 1000, scale = TRUE, no_warnings = TRUE )
mat |
expression matrix |
pathway_list |
a list of vectors, each named for a specific pathway, or dataframe |
n_perm |
Number of permutation for fgsea function. Defaults to 1000. |
scale |
convert expr_mat into zscores prior to running GSEA?, default = FALSE |
no_warnings |
suppress warnings from gsea ties |
matrix of GSEA NES values, cell types as row names, pathways as column names
gl <- list( "n" = c("PPBP", "LYZ", "S100A9"), "a" = c("IGLL5", "GNLY", "FTL") ) pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified" ) calculate_pathway_gsea( mat = pbmc_avg, pathway_list = gl )
gl <- list( "n" = c("PPBP", "LYZ", "S100A9"), "a" = c("IGLL5", "GNLY", "FTL") ) pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified" ) calculate_pathway_gsea( mat = pbmc_avg, pathway_list = gl )
get concensus calls for a list of cor calls
call_consensus(list_of_res)
call_consensus(list_of_res)
list_of_res |
list of call dataframes from cor_to_call_rank |
dataframe of cluster, new ident, and mean rank
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref ) res2 <- cor_to_call_rank(res, threshold = "auto") res3 <- cor_to_call_rank(res) call_consensus(list(res2, res3))
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref ) res2 <- cor_to_call_rank(res, threshold = "auto") res3 <- cor_to_call_rank(res) call_consensus(list(res2, res3))
Insert called ident results into metadata
call_to_metadata( res, metadata, cluster_col, per_cell = FALSE, rename_prefix = NULL )
call_to_metadata( res, metadata, cluster_col, per_cell = FALSE, rename_prefix = NULL )
res |
dataframe of idents, such as output of cor_to_call |
metadata |
input metadata with tsne or umap coordinates and cluster ids |
cluster_col |
metadata column, can be cluster or cellid |
per_cell |
whether the res dataframe is listed per cell |
rename_prefix |
prefix to add to type and r column names |
new metadata with added columns
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref ) res2 <- cor_to_call(res, cluster_col = "classified") call_to_metadata( res = res2, metadata = pbmc_meta, cluster_col = "classified", rename_prefix = "assigned" )
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref ) res2 <- cor_to_call(res, cluster_col = "classified") call_to_metadata( res = res2, metadata = pbmc_meta, cluster_col = "classified", rename_prefix = "assigned" )
reference marker matrix from seurat citeseq CBMC tutorial
cbmc_m
cbmc_m
An object of class data.frame
with 3 rows and 13 columns.
Other data:
cbmc_ref
,
downrefs
,
human_genes_10x
,
mouse_genes_10x
,
pbmc_markers
,
pbmc_markers_M3Drop
,
pbmc_matrix_small
,
pbmc_meta
,
pbmc_vargenes
reference matrix from seurat citeseq CBMC tutorial
cbmc_ref
cbmc_ref
An object of class matrix
(inherits from array
) with 2000 rows and 13 columns.
Other data:
cbmc_m
,
downrefs
,
human_genes_10x
,
mouse_genes_10x
,
pbmc_markers
,
pbmc_markers_M3Drop
,
pbmc_matrix_small
,
pbmc_meta
,
pbmc_vargenes
Given a count matrix, determine if the matrix has been either log-normalized, normalized, or contains raw counts
check_raw_counts(counts_matrix, max_log_value = 50)
check_raw_counts(counts_matrix, max_log_value = 50)
counts_matrix |
Count matrix containing scRNA-seq read data |
max_log_value |
Static value to determine if a matrix is normalized |
String either raw counts, log-normalized or normalized
check_raw_counts(pbmc_matrix_small)
check_raw_counts(pbmc_matrix_small)
Compare scRNA-seq data to reference data.
clustify(input, ...) ## Default S3 method: clustify( input, ref_mat, metadata = NULL, cluster_col = NULL, query_genes = NULL, n_genes = 1000, per_cell = FALSE, n_perm = 0, compute_method = "spearman", pseudobulk_method = "mean", verbose = TRUE, lookuptable = NULL, rm0 = FALSE, obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, rename_prefix = NULL, threshold = "auto", low_threshold_cell = 0, exclude_genes = c(), if_log = TRUE, organism = "hsapiens", plot_name = NULL, rds_name = NULL, expand_unassigned = FALSE, ... ) ## S3 method for class 'Seurat' clustify( input, ref_mat, cluster_col = NULL, query_genes = NULL, n_genes = 1000, per_cell = FALSE, n_perm = 0, compute_method = "spearman", pseudobulk_method = "mean", use_var_genes = TRUE, dr = "umap", obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, threshold = "auto", verbose = TRUE, rm0 = FALSE, rename_prefix = NULL, exclude_genes = c(), metadata = NULL, organism = "hsapiens", plot_name = NULL, rds_name = NULL, expand_unassigned = FALSE, ... ) ## S3 method for class 'SingleCellExperiment' clustify( input, ref_mat, cluster_col = NULL, query_genes = NULL, per_cell = FALSE, n_perm = 0, compute_method = "spearman", pseudobulk_method = "mean", use_var_genes = TRUE, dr = "umap", obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, threshold = "auto", verbose = TRUE, rm0 = FALSE, rename_prefix = NULL, exclude_genes = c(), metadata = NULL, organism = "hsapiens", plot_name = NULL, rds_name = NULL, expand_unassigned = FALSE, ... )
clustify(input, ...) ## Default S3 method: clustify( input, ref_mat, metadata = NULL, cluster_col = NULL, query_genes = NULL, n_genes = 1000, per_cell = FALSE, n_perm = 0, compute_method = "spearman", pseudobulk_method = "mean", verbose = TRUE, lookuptable = NULL, rm0 = FALSE, obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, rename_prefix = NULL, threshold = "auto", low_threshold_cell = 0, exclude_genes = c(), if_log = TRUE, organism = "hsapiens", plot_name = NULL, rds_name = NULL, expand_unassigned = FALSE, ... ) ## S3 method for class 'Seurat' clustify( input, ref_mat, cluster_col = NULL, query_genes = NULL, n_genes = 1000, per_cell = FALSE, n_perm = 0, compute_method = "spearman", pseudobulk_method = "mean", use_var_genes = TRUE, dr = "umap", obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, threshold = "auto", verbose = TRUE, rm0 = FALSE, rename_prefix = NULL, exclude_genes = c(), metadata = NULL, organism = "hsapiens", plot_name = NULL, rds_name = NULL, expand_unassigned = FALSE, ... ) ## S3 method for class 'SingleCellExperiment' clustify( input, ref_mat, cluster_col = NULL, query_genes = NULL, per_cell = FALSE, n_perm = 0, compute_method = "spearman", pseudobulk_method = "mean", use_var_genes = TRUE, dr = "umap", obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, threshold = "auto", verbose = TRUE, rm0 = FALSE, rename_prefix = NULL, exclude_genes = c(), metadata = NULL, organism = "hsapiens", plot_name = NULL, rds_name = NULL, expand_unassigned = FALSE, ... )
input |
single-cell expression matrix or Seurat object |
... |
additional arguments to pass to compute_method function |
ref_mat |
reference expression matrix |
metadata |
cell cluster assignments,
supplied as a vector or data.frame.
If data.frame is supplied then |
cluster_col |
column in metadata that contains cluster ids per cell. Will default to first column of metadata if not supplied. Not required if running correlation per cell. |
query_genes |
A vector of genes of interest to compare. If NULL, then common genes between the expr_mat and ref_mat will be used for comparision. |
n_genes |
number of genes limit for Seurat variable genes, by default 1000, set to 0 to use all variable genes (generally not recommended) |
per_cell |
if true run per cell, otherwise per cluster. |
n_perm |
number of permutations, set to 0 by default |
compute_method |
method(s) for computing similarity scores |
pseudobulk_method |
method used for summarizing clusters, options are mean (default), median, truncate (10% truncated mean), or trimean, max, min |
verbose |
whether to report certain variables chosen and steps |
lookuptable |
if not supplied, will look in built-in table for object parsing |
rm0 |
consider 0 as missing data, recommended for per_cell |
obj_out |
whether to output object instead of cor matrix |
seurat_out |
output cor matrix or called seurat object (deprecated, use obj_out instead) |
vec_out |
only output a result vector in the same order as metadata |
rename_prefix |
prefix to add to type and r column names |
threshold |
identity calling minimum correlation score threshold, only used when obj_out = TRUE |
low_threshold_cell |
option to remove clusters with too few cells |
exclude_genes |
a vector of gene names to throw out of query |
if_log |
input data is natural log, averaging will be done on unlogged data |
organism |
for GO term analysis, organism name: human - 'hsapiens', mouse - 'mmusculus' |
plot_name |
name for saved pdf, if NULL then no file is written (default) |
rds_name |
name for saved rds of rank_diff, if NULL then no file is written (default) |
expand_unassigned |
test all ref clusters for unassigned results |
use_var_genes |
if providing a seurat object, use the variable genes (stored in [email protected]) as the query_genes. |
dr |
stored dimension reduction |
single cell object with identity assigned in metadata, or matrix of correlation values, clusters from input as row names, cell types from ref_mat as column names
# Annotate a matrix and metadata clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "RNA_snn_res.0.5", verbose = TRUE ) # Annotate using a different method clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "RNA_snn_res.0.5", compute_method = "cosine" ) # Annotate a SingleCellExperiment object sce <- sce_pbmc() clustify( sce, cbmc_ref, cluster_col = "clusters", obj_out = TRUE, per_cell = FALSE, dr = "umap" ) # Annotate a Seurat object so <- so_pbmc() clustify( so, cbmc_ref, cluster_col = "seurat_clusters", obj_out = TRUE, per_cell = FALSE, dr = "umap" ) # Annotate (and return) a Seurat object per-cell clustify( input = so, ref_mat = cbmc_ref, cluster_col = "seurat_clusters", obj_out = TRUE, per_cell = TRUE, dr = "umap" )
# Annotate a matrix and metadata clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "RNA_snn_res.0.5", verbose = TRUE ) # Annotate using a different method clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "RNA_snn_res.0.5", compute_method = "cosine" ) # Annotate a SingleCellExperiment object sce <- sce_pbmc() clustify( sce, cbmc_ref, cluster_col = "clusters", obj_out = TRUE, per_cell = FALSE, dr = "umap" ) # Annotate a Seurat object so <- so_pbmc() clustify( so, cbmc_ref, cluster_col = "seurat_clusters", obj_out = TRUE, per_cell = FALSE, dr = "umap" ) # Annotate (and return) a Seurat object per-cell clustify( input = so, ref_mat = cbmc_ref, cluster_col = "seurat_clusters", obj_out = TRUE, per_cell = TRUE, dr = "umap" )
Main function to compare scRNA-seq data to gene lists.
clustify_lists(input, ...) ## Default S3 method: clustify_lists( input, marker, marker_inmatrix = TRUE, metadata = NULL, cluster_col = NULL, if_log = TRUE, per_cell = FALSE, topn = 800, cut = 0, genome_n = 30000, metric = "hyper", output_high = TRUE, lookuptable = NULL, obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, rename_prefix = NULL, threshold = 0, low_threshold_cell = 0, verbose = TRUE, input_markers = FALSE, details_out = FALSE, ... ) ## S3 method for class 'Seurat' clustify_lists( input, metadata = NULL, cluster_col = NULL, if_log = TRUE, per_cell = FALSE, topn = 800, cut = 0, marker, marker_inmatrix = TRUE, genome_n = 30000, metric = "hyper", output_high = TRUE, dr = "umap", obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, threshold = 0, rename_prefix = NULL, verbose = TRUE, details_out = FALSE, ... ) ## S3 method for class 'SingleCellExperiment' clustify_lists( input, metadata = NULL, cluster_col = NULL, if_log = TRUE, per_cell = FALSE, topn = 800, cut = 0, marker, marker_inmatrix = TRUE, genome_n = 30000, metric = "hyper", output_high = TRUE, dr = "umap", obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, threshold = 0, rename_prefix = NULL, verbose = TRUE, details_out = FALSE, ... )
clustify_lists(input, ...) ## Default S3 method: clustify_lists( input, marker, marker_inmatrix = TRUE, metadata = NULL, cluster_col = NULL, if_log = TRUE, per_cell = FALSE, topn = 800, cut = 0, genome_n = 30000, metric = "hyper", output_high = TRUE, lookuptable = NULL, obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, rename_prefix = NULL, threshold = 0, low_threshold_cell = 0, verbose = TRUE, input_markers = FALSE, details_out = FALSE, ... ) ## S3 method for class 'Seurat' clustify_lists( input, metadata = NULL, cluster_col = NULL, if_log = TRUE, per_cell = FALSE, topn = 800, cut = 0, marker, marker_inmatrix = TRUE, genome_n = 30000, metric = "hyper", output_high = TRUE, dr = "umap", obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, threshold = 0, rename_prefix = NULL, verbose = TRUE, details_out = FALSE, ... ) ## S3 method for class 'SingleCellExperiment' clustify_lists( input, metadata = NULL, cluster_col = NULL, if_log = TRUE, per_cell = FALSE, topn = 800, cut = 0, marker, marker_inmatrix = TRUE, genome_n = 30000, metric = "hyper", output_high = TRUE, dr = "umap", obj_out = TRUE, seurat_out = obj_out, vec_out = FALSE, threshold = 0, rename_prefix = NULL, verbose = TRUE, details_out = FALSE, ... )
input |
single-cell expression matrix, Seurat object, or SingleCellExperiment |
... |
passed to matrixize_markers |
marker |
matrix or dataframe of candidate genes for each cluster |
marker_inmatrix |
whether markers genes are already in preprocessed matrix form |
metadata |
cell cluster assignments,
supplied as a vector or data.frame.
If data.frame is supplied then |
cluster_col |
column in metadata with cluster number |
if_log |
input data is natural log, averaging will be done on unlogged data |
per_cell |
compare per cell or per cluster |
topn |
number of top expressing genes to keep from input matrix |
cut |
expression cut off from input matrix |
genome_n |
number of genes in the genome |
metric |
adjusted p-value for hypergeometric test, or jaccard index |
output_high |
if true (by default to fit with rest of package), -log10 transform p-value |
lookuptable |
if not supplied, will look in built-in table for object parsing |
obj_out |
whether to output object instead of cor matrix |
seurat_out |
output cor matrix or called seurat object (deprecated, use obj_out instead) |
vec_out |
only output a result vector in the same order as metadata |
rename_prefix |
prefix to add to type and r column names |
threshold |
identity calling minimum correlation score threshold, only used when obj_out = T |
low_threshold_cell |
option to remove clusters with too few cells |
verbose |
whether to report certain variables chosen and steps |
input_markers |
whether input is marker data.frame of 0 and 1s (output of pos_neg_marker), and uses alternate enrichment mode |
details_out |
whether to also output shared gene list from jaccard |
dr |
stored dimension reduction |
matrix of numeric values, clusters from input as row names, cell types from marker_mat as column names
# Annotate a matrix and metadata # Annotate using a different method clustify_lists( input = pbmc_matrix_small, marker = cbmc_m, metadata = pbmc_meta, cluster_col = "classified", verbose = TRUE, metric = "jaccard" )
# Annotate a matrix and metadata # Annotate using a different method clustify_lists( input = pbmc_matrix_small, marker = cbmc_m, metadata = pbmc_meta, cluster_col = "classified", verbose = TRUE, metric = "jaccard" )
Combined function to compare scRNA-seq data to bulk RNA-seq data and marker list
clustify_nudge(input, ...) ## Default S3 method: clustify_nudge( input, ref_mat, marker, metadata = NULL, cluster_col = NULL, query_genes = NULL, compute_method = "spearman", weight = 1, threshold = -Inf, dr = "umap", norm = "diff", call = TRUE, marker_inmatrix = TRUE, mode = "rank", obj_out = FALSE, seurat_out = obj_out, rename_prefix = NULL, lookuptable = NULL, ... ) ## S3 method for class 'Seurat' clustify_nudge( input, ref_mat, marker, cluster_col = NULL, query_genes = NULL, compute_method = "spearman", weight = 1, obj_out = TRUE, seurat_out = obj_out, threshold = -Inf, dr = "umap", norm = "diff", marker_inmatrix = TRUE, mode = "rank", rename_prefix = NULL, ... )
clustify_nudge(input, ...) ## Default S3 method: clustify_nudge( input, ref_mat, marker, metadata = NULL, cluster_col = NULL, query_genes = NULL, compute_method = "spearman", weight = 1, threshold = -Inf, dr = "umap", norm = "diff", call = TRUE, marker_inmatrix = TRUE, mode = "rank", obj_out = FALSE, seurat_out = obj_out, rename_prefix = NULL, lookuptable = NULL, ... ) ## S3 method for class 'Seurat' clustify_nudge( input, ref_mat, marker, cluster_col = NULL, query_genes = NULL, compute_method = "spearman", weight = 1, obj_out = TRUE, seurat_out = obj_out, threshold = -Inf, dr = "umap", norm = "diff", marker_inmatrix = TRUE, mode = "rank", rename_prefix = NULL, ... )
input |
express matrix or object |
... |
passed to matrixize_markers |
ref_mat |
reference expression matrix |
marker |
matrix of markers |
metadata |
cell cluster assignments, supplied as a vector
or data.frame. If
data.frame is supplied then |
cluster_col |
column in metadata that contains cluster ids per cell. Will default to first column of metadata if not supplied. Not required if running correlation per cell. |
query_genes |
A vector of genes of interest to compare. If NULL, then common genes between the expr_mat and ref_mat will be used for comparision. |
compute_method |
method(s) for computing similarity scores |
weight |
relative weight for the gene list scores, when added to correlation score |
threshold |
identity calling minimum score threshold, only used when obj_out = T |
dr |
stored dimension reduction |
norm |
whether and how the results are normalized |
call |
make call or just return score matrix |
marker_inmatrix |
whether markers genes are already in preprocessed matrix form |
mode |
use marker expression pct or ranked cor score for nudging |
obj_out |
whether to output object instead of cor matrix |
seurat_out |
output cor matrix or called seurat object (deprecated, use obj_out) |
rename_prefix |
prefix to add to type and r column names |
lookuptable |
if not supplied, will look in built-in table for object parsing |
single cell object, or matrix of numeric values, clusters from input as row names, cell types from ref_mat as column names
# Seurat so <- so_pbmc() clustify_nudge( input = so, ref_mat = cbmc_ref, marker = cbmc_m, cluster_col = "seurat_clusters", threshold = 0.8, obj_out = FALSE, mode = "pct", dr = "umap" ) # Matrix clustify_nudge( input = pbmc_matrix_small, ref_mat = cbmc_ref, metadata = pbmc_meta, marker = as.matrix(cbmc_m), query_genes = pbmc_vargenes, cluster_col = "classified", threshold = 0.8, call = FALSE, marker_inmatrix = FALSE, mode = "pct" )
# Seurat so <- so_pbmc() clustify_nudge( input = so, ref_mat = cbmc_ref, marker = cbmc_m, cluster_col = "seurat_clusters", threshold = 0.8, obj_out = FALSE, mode = "pct", dr = "umap" ) # Matrix clustify_nudge( input = pbmc_matrix_small, ref_mat = cbmc_ref, metadata = pbmc_meta, marker = as.matrix(cbmc_m), query_genes = pbmc_vargenes, cluster_col = "classified", threshold = 0.8, call = FALSE, marker_inmatrix = FALSE, mode = "pct" )
Correlation functions available in clustifyr
clustifyr_methods
clustifyr_methods
An object of class character
of length 5.
clustifyr_methods
clustifyr_methods
From per-cell calls, take highest freq call in each cluster
collapse_to_cluster(res, metadata, cluster_col, threshold = 0)
collapse_to_cluster(res, metadata, cluster_col, threshold = 0)
res |
dataframe of idents, such as output of cor_to_call |
metadata |
input metadata with tsne or umap coordinates and cluster ids |
cluster_col |
metadata column for cluster |
threshold |
minimum correlation coefficent cutoff for calling clusters |
new metadata with added columns
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref, per_cell = TRUE ) res2 <- cor_to_call(res) collapse_to_cluster( res2, metadata = pbmc_meta, cluster_col = "classified", threshold = 0 )
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref, per_cell = TRUE ) res2 <- cor_to_call(res) collapse_to_cluster( res2, metadata = pbmc_meta, cluster_col = "classified", threshold = 0 )
Calculate adjusted p-values for hypergeometric test of gene lists or jaccard index
compare_lists( bin_mat, marker_mat, n = 30000, metric = "hyper", output_high = TRUE, details_out = FALSE )
compare_lists( bin_mat, marker_mat, n = 30000, metric = "hyper", output_high = TRUE, details_out = FALSE )
bin_mat |
binarized single-cell expression matrix, feed in by_cluster mat, if desired |
marker_mat |
matrix or dataframe of candidate genes for each cluster |
n |
number of genes in the genome |
metric |
adjusted p-value for hypergeometric test, or jaccard index |
output_high |
if true (by default to fit with rest of package), -log10 transform p-value |
details_out |
whether to also output shared gene list from jaccard |
matrix of numeric values, clusters from expr_mat as row names, cell types from marker_mat as column names
pbmc_mm <- matrixize_markers(pbmc_markers) pbmc_avg <- average_clusters( pbmc_matrix_small, pbmc_meta, cluster_col = "classified" ) pbmc_avgb <- binarize_expr(pbmc_avg) compare_lists( pbmc_avgb, pbmc_mm, metric = "spearman" )
pbmc_mm <- matrixize_markers(pbmc_markers) pbmc_avg <- average_clusters( pbmc_matrix_small, pbmc_meta, cluster_col = "classified" ) pbmc_avgb <- binarize_expr(pbmc_avg) compare_lists( pbmc_avgb, pbmc_mm, metric = "spearman" )
get best calls for each cluster
cor_to_call( cor_mat, metadata = NULL, cluster_col = "cluster", collapse_to_cluster = FALSE, threshold = 0, rename_prefix = NULL, carry_r = FALSE )
cor_to_call( cor_mat, metadata = NULL, cluster_col = "cluster", collapse_to_cluster = FALSE, threshold = 0, rename_prefix = NULL, carry_r = FALSE )
cor_mat |
input similarity matrix |
metadata |
input metadata with tsne or umap coordinates and cluster ids |
cluster_col |
metadata column, can be cluster or cellid |
collapse_to_cluster |
if a column name is provided, takes the most frequent call of entire cluster to color in plot |
threshold |
minimum correlation coefficent cutoff for calling clusters |
rename_prefix |
prefix to add to type and r column names |
carry_r |
whether to include threshold in unassigned names |
dataframe of cluster, new ident, and r info
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref ) cor_to_call(res)
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref ) cor_to_call(res)
get ranked calls for each cluster
cor_to_call_rank( cor_mat, metadata = NULL, cluster_col = "cluster", collapse_to_cluster = FALSE, threshold = 0, rename_prefix = NULL, top_n = NULL )
cor_to_call_rank( cor_mat, metadata = NULL, cluster_col = "cluster", collapse_to_cluster = FALSE, threshold = 0, rename_prefix = NULL, top_n = NULL )
cor_mat |
input similarity matrix |
metadata |
input metadata with tsne or umap coordinates and cluster ids |
cluster_col |
metadata column, can be cluster or cellid |
collapse_to_cluster |
if a column name is provided, takes the most frequent call of entire cluster to color in plot |
threshold |
minimum correlation coefficent cutoff for calling clusters |
rename_prefix |
prefix to add to type and r column names |
top_n |
the number of ranks to keep, the rest will be set to 100 |
dataframe of cluster, new ident, and r info
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref ) cor_to_call_rank(res, threshold = "auto")
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", ref_mat = cbmc_ref ) cor_to_call_rank(res, threshold = "auto")
get top calls for each cluster
cor_to_call_topn( cor_mat, metadata = NULL, col = "cluster", collapse_to_cluster = FALSE, threshold = 0, topn = 2 )
cor_to_call_topn( cor_mat, metadata = NULL, col = "cluster", collapse_to_cluster = FALSE, threshold = 0, topn = 2 )
cor_mat |
input similarity matrix |
metadata |
input metadata with tsne or umap coordinates and cluster ids |
col |
metadata column, can be cluster or cellid |
collapse_to_cluster |
if a column name is provided, takes the most frequent call of entire cluster to color in plot |
threshold |
minimum correlation coefficent cutoff for calling clusters |
topn |
number of calls for each cluster |
dataframe of cluster, new potential ident, and r info
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "classified" ) cor_to_call_topn( cor_mat = res, metadata = pbmc_meta, col = "classified", collapse_to_cluster = FALSE, threshold = 0.5 )
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "classified" ) cor_to_call_topn( cor_mat = res, metadata = pbmc_meta, col = "classified", collapse_to_cluster = FALSE, threshold = 0.5 )
Cosine distance
cosine(vec1, vec2)
cosine(vec1, vec2)
vec1 |
test vector |
vec2 |
reference vector |
numeric value of cosine distance between the vectors
table of references stored in clustifyrdata
downrefs
downrefs
An object of class tbl_df
(inherits from tbl
, data.frame
) with 9 rows and 6 columns.
various packages
Other data:
cbmc_m
,
cbmc_ref
,
human_genes_10x
,
mouse_genes_10x
,
pbmc_markers
,
pbmc_markers_M3Drop
,
pbmc_matrix_small
,
pbmc_meta
,
pbmc_vargenes
downsample matrix by cluster or completely random
downsample_matrix( mat, n = 1, keep_cluster_proportions = TRUE, metadata = NULL, cluster_col = "cluster" )
downsample_matrix( mat, n = 1, keep_cluster_proportions = TRUE, metadata = NULL, cluster_col = "cluster" )
mat |
expression matrix |
n |
number per cluster or fraction to keep |
keep_cluster_proportions |
whether to subsample |
metadata |
data.frame or vector containing cluster assignments per cell. Order must match column order in supplied matrix. If a data.frame provide the cluster_col parameters. |
cluster_col |
column in metadata with cluster number |
new smaller mat with less cell_id columns
set.seed(42) mat <- downsample_matrix( mat = pbmc_matrix_small, metadata = pbmc_meta$classified, n = 10, keep_cluster_proportions = TRUE ) mat[1:3, 1:3]
set.seed(42) mat <- downsample_matrix( mat = pbmc_matrix_small, metadata = pbmc_meta$classified, n = 10, keep_cluster_proportions = TRUE ) mat[1:3, 1:3]
Extract genes, i.e. "features", based on the top loadings of principal components formed from the bulk expression data set
feature_select_PCA( mat = NULL, pcs = NULL, n_pcs = 10, percentile = 0.99, if_log = TRUE )
feature_select_PCA( mat = NULL, pcs = NULL, n_pcs = 10, percentile = 0.99, if_log = TRUE )
mat |
Expression matrix. Rownames are genes, colnames are single cell cluster name, and values are average single cell expression (log transformed). |
pcs |
Precalculated pcs if available, will skip over processing on mat. |
n_pcs |
Number of PCs to selected gene loadings from. See the explore_PCA_corr.Rmd vignette for details. |
percentile |
Select the percentile of absolute values of PCA loadings to select genes from. E.g. 0.999 would select the top point 1 percent of genes with the largest loadings. |
if_log |
whether the data is already log transformed |
vector of genes
feature_select_PCA( cbmc_ref, if_log = FALSE )
feature_select_PCA( cbmc_ref, if_log = FALSE )
takes files with positive and negative markers, as described in garnett, and returns list of markers
file_marker_parse(filename)
file_marker_parse(filename)
filename |
txt file to load |
list of positive and negative gene markers
marker_file <- system.file( "extdata", "hsPBMC_markers.txt", package = "clustifyr" ) file_marker_parse(marker_file)
marker_file <- system.file( "extdata", "hsPBMC_markers.txt", package = "clustifyr" ) file_marker_parse(marker_file)
Find rank bias
find_rank_bias(avg_mat, ref_mat, query_genes = NULL)
find_rank_bias(avg_mat, ref_mat, query_genes = NULL)
avg_mat |
average expression matrix |
ref_mat |
reference expression matrix |
query_genes |
original vector of genes used to clustify |
list of matrix of rank diff values
avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) rankdiff <- find_rank_bias( avg, cbmc_ref, query_genes = pbmc_vargenes )
avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) rankdiff <- find_rank_bias( avg, cbmc_ref, query_genes = pbmc_vargenes )
pct of cells in each cluster that express genelist
gene_pct(matrix, genelist, clusters, returning = "mean")
gene_pct(matrix, genelist, clusters, returning = "mean")
matrix |
expression matrix |
genelist |
vector of marker genes for one identity |
clusters |
vector of cluster identities |
returning |
whether to return mean, min, or max of the gene pct in the gene list |
vector of numeric values
pct of cells in every cluster that express a series of genelists
gene_pct_markerm(matrix, marker_m, metadata, cluster_col = NULL, norm = NULL)
gene_pct_markerm(matrix, marker_m, metadata, cluster_col = NULL, norm = NULL)
matrix |
expression matrix |
marker_m |
matrixized markers |
metadata |
data.frame or vector containing cluster assignments per cell. Order must match column order in supplied matrix. If a data.frame provide the cluster_col parameters. |
cluster_col |
column in metadata with cluster number |
norm |
whether and how the results are normalized |
matrix of numeric values, clusters from mat as row names, cell types from marker_m as column names
gene_pct_markerm( matrix = pbmc_matrix_small, marker_m = cbmc_m, metadata = pbmc_meta, cluster_col = "classified" )
gene_pct_markerm( matrix = pbmc_matrix_small, marker_m = cbmc_m, metadata = pbmc_meta, cluster_col = "classified" )
Function to make best call from correlation matrix
get_best_match_matrix(cor_mat)
get_best_match_matrix(cor_mat)
cor_mat |
correlation matrix |
matrix of 1s and 0s
Function to make call and attach score
get_best_str(name, best_mat, cor_mat, carry_cor = TRUE)
get_best_str(name, best_mat, cor_mat, carry_cor = TRUE)
name |
name of row to query |
best_mat |
binarized call matrix |
cor_mat |
correlation matrix |
carry_cor |
whether the correlation score gets reported |
string with ident call and possibly cor value
return entries found in all supplied vectors. If the vector supplied is NULL or NA, then it will be excluded from the comparison.
get_common_elements(...)
get_common_elements(...)
... |
vectors |
vector of shared elements
Compute similarity of matrices
get_similarity( expr_mat, ref_mat, cluster_ids, compute_method, pseudobulk_method = "mean", per_cell = FALSE, rm0 = FALSE, if_log = TRUE, low_threshold = 0, ... )
get_similarity( expr_mat, ref_mat, cluster_ids, compute_method, pseudobulk_method = "mean", per_cell = FALSE, rm0 = FALSE, if_log = TRUE, low_threshold = 0, ... )
expr_mat |
single-cell expression matrix |
ref_mat |
reference expression matrix |
cluster_ids |
vector of cluster ids for each cell |
compute_method |
method(s) for computing similarity scores |
pseudobulk_method |
method used for summarizing clusters, options are mean (default), median, truncate (10% truncated mean), or trimean, max, min |
per_cell |
run per cell? |
rm0 |
consider 0 as missing data, recommended for per_cell |
if_log |
input data is natural log, averaging will be done on unlogged data |
low_threshold |
option to remove clusters with too few cells |
... |
additional parameters not used yet |
matrix of numeric values, clusters from expr_mat as row names, cell types from ref_mat as column names
Build reference atlases from external UCSC cellbrowsers
get_ucsc_reference(cb_url, cluster_col, ...)
get_ucsc_reference(cb_url, cluster_col, ...)
cb_url |
URL of cellbrowser dataset (e.g. http://cells.ucsc.edu/?ds=cortex-dev). Note that the URL must contain the ds=dataset-name suffix. |
cluster_col |
annotation field for summarizing gene expression (e.g. clustering, cell-type name, samples, etc.) |
... |
additional args passed to average_clusters |
reference matrix
## Not run: # many datasets hosted by UCSC have UMI counts in the expression matrix # set if_log = FALSE if the expression matrix has not been natural log transformed get_ucsc_reference( cb_url = "https://cells.ucsc.edu/?ds=evocell+mus-musculus+marrow", cluster_col = "Clusters", if_log = FALSE ) get_ucsc_reference( cb_url = "http://cells.ucsc.edu/?ds=muscle-cell-atlas", cluster_col = "cell_annotation", if_log = FALSE ) ## End(Not run)
## Not run: # many datasets hosted by UCSC have UMI counts in the expression matrix # set if_log = FALSE if the expression matrix has not been natural log transformed get_ucsc_reference( cb_url = "https://cells.ucsc.edu/?ds=evocell+mus-musculus+marrow", cluster_col = "Clusters", if_log = FALSE ) get_ucsc_reference( cb_url = "http://cells.ucsc.edu/?ds=muscle-cell-atlas", cluster_col = "cell_annotation", if_log = FALSE ) ## End(Not run)
Generate a unique column id for a dataframe
get_unique_column(df, id = NULL)
get_unique_column(df, id = NULL)
df |
dataframe with column names |
id |
desired id if unique |
character
Variable gene list is required for clustify
main function.
This function parses variables genes from a matrix input.
get_vargenes(marker_mat)
get_vargenes(marker_mat)
marker_mat |
matrix or dataframe of candidate genes for each cluster |
vector of marker gene names
get_vargenes(cbmc_m)
get_vargenes(cbmc_m)
convert gmt format of pathways to list of vectors
gmt_to_list( path, cutoff = 0, sep = "\thttp://www.broadinstitute.org/gsea/msigdb/cards/.*?\t" )
gmt_to_list( path, cutoff = 0, sep = "\thttp://www.broadinstitute.org/gsea/msigdb/cards/.*?\t" )
path |
gmt file path |
cutoff |
remove pathways with less genes than this cutoff |
sep |
sep used in file to split path and genes |
list of genes in each pathway
gmt_file <- system.file( "extdata", "c2.cp.reactome.v6.2.symbols.gmt.gz", package = "clustifyr" ) gene.lists <- gmt_to_list(path = gmt_file) length(gene.lists)
gmt_file <- system.file( "extdata", "c2.cp.reactome.v6.2.symbols.gmt.gz", package = "clustifyr" ) gene.lists <- gmt_to_list(path = gmt_file) length(gene.lists)
Vector of human genes for 10x cellranger pipeline
human_genes_10x
human_genes_10x
An object of class character
of length 33514.
https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest
Other data:
cbmc_m
,
cbmc_ref
,
downrefs
,
mouse_genes_10x
,
pbmc_markers
,
pbmc_markers_M3Drop
,
pbmc_matrix_small
,
pbmc_meta
,
pbmc_vargenes
more flexible metadata update of single cell objects
insert_meta_object( input, new_meta, type = class(input), meta_loc = NULL, lookuptable = NULL )
insert_meta_object( input, new_meta, type = class(input), meta_loc = NULL, lookuptable = NULL )
input |
input object |
new_meta |
new metadata table to insert back into object |
type |
look up predefined slots/loc |
meta_loc |
metadata location |
lookuptable |
if not supplied, will look in built-in table for object parsing |
new object with new metadata inserted
so <- so_pbmc() insert_meta_object(so, seurat_meta(so, dr = "umap"))
so <- so_pbmc() insert_meta_object(so, seurat_meta(so, dr = "umap"))
Use package entropy to compute Kullback-Leibler divergence. The function first converts each vector's reads to pseudo-number of transcripts by normalizing the total reads to total_reads. The normalized read for each gene is then rounded to serve as the pseudo-number of transcripts. Function entropy::KL.shrink is called to compute the KL-divergence between the two vectors, and the maximal allowed divergence is set to max_KL. Finally, a linear transform is performed to convert the KL divergence, which is between 0 and max_KL, to a similarity score between -1 and 1.
kl_divergence(vec1, vec2, if_log = FALSE, total_reads = 1000, max_KL = 1)
kl_divergence(vec1, vec2, if_log = FALSE, total_reads = 1000, max_KL = 1)
vec1 |
Test vector |
vec2 |
Reference vector |
if_log |
Whether the vectors are log-transformed. If so, the raw count should be computed before computing KL-divergence. |
total_reads |
Pseudo-library size |
max_KL |
Maximal allowed value of KL-divergence. |
numeric value, with additional attributes, of kl divergence between the vectors
make combination ref matrix to assess intermixing
make_comb_ref(ref_mat, if_log = TRUE, sep = "_and_")
make_comb_ref(ref_mat, if_log = TRUE, sep = "_and_")
ref_mat |
reference expression matrix |
if_log |
whether input data is natural |
sep |
separator for name combinations |
expression matrix
ref <- make_comb_ref( cbmc_ref, sep = "_+_" ) ref[1:3, 1:3]
ref <- make_comb_ref( cbmc_ref, sep = "_+_" ) ref[1:3, 1:3]
decide for one gene whether it is a marker for a certain cell type
marker_select(row1, cols, cut = 1, compto = 1)
marker_select(row1, cols, cut = 1, compto = 1)
row1 |
a numeric vector of expression values (row) |
cols |
a vector of cell types (column) |
cut |
an expression minimum cutoff |
compto |
compare max expression to the value of next 1 or more |
vector of cluster name and ratio value
pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) marker_select( row1 = pbmc_avg["PPBP", ], cols = names(pbmc_avg["PPBP", ]) )
pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) marker_select( row1 = pbmc_avg["PPBP", ], cols = names(pbmc_avg["PPBP", ]) )
Convert candidate genes list into matrix
matrixize_markers( marker_df, ranked = FALSE, n = NULL, step_weight = 1, background_weight = 0, unique = FALSE, metadata = NULL, cluster_col = "classified", remove_rp = FALSE )
matrixize_markers( marker_df, ranked = FALSE, n = NULL, step_weight = 1, background_weight = 0, unique = FALSE, metadata = NULL, cluster_col = "classified", remove_rp = FALSE )
marker_df |
dataframe of candidate genes, must contain "gene" and "cluster" columns, or a matrix of gene names to convert to ranked |
ranked |
unranked gene list feeds into hyperp, the ranked gene list feeds into regular corr_coef |
n |
number of genes to use |
step_weight |
ranked genes are tranformed into pseudo expression by descending weight |
background_weight |
ranked genes are tranformed into pseudo expression with added weight |
unique |
whether to use only unique markers to 1 cluster |
metadata |
vector or dataframe of cluster names, should have column named cluster |
cluster_col |
column for cluster names to replace original cluster, if metadata is dataframe |
remove_rp |
do not include rps, rpl, rp1-9 in markers |
matrix of unranked gene marker names, or matrix of ranked expression
matrixize_markers(pbmc_markers)
matrixize_markers(pbmc_markers)
Vector of mouse genes for 10x cellranger pipeline
mouse_genes_10x
mouse_genes_10x
An object of class character
of length 31017.
https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest
Other data:
cbmc_m
,
cbmc_ref
,
downrefs
,
human_genes_10x
,
pbmc_markers
,
pbmc_markers_M3Drop
,
pbmc_matrix_small
,
pbmc_meta
,
pbmc_vargenes
black and white palette for plotting continous variables
not_pretty_palette
not_pretty_palette
An object of class character
of length 9.
vector of colors
Function to access object data
object_data(object, ...) ## S3 method for class 'Seurat' object_data(object, slot, n_genes = 1000, ...) ## S3 method for class 'SingleCellExperiment' object_data(object, slot, ...)
object_data(object, ...) ## S3 method for class 'Seurat' object_data(object, slot, n_genes = 1000, ...) ## S3 method for class 'SingleCellExperiment' object_data(object, slot, ...)
object |
object after tsne or umap projections and clustering |
... |
additional arguments |
slot |
data to access |
n_genes |
number of genes limit for Seurat variable genes, by default 1000, set to 0 to use all variable genes (generally not recommended) |
expression matrix, with genes as row names, and cell types as column names
so <- so_pbmc() mat <- object_data( object = so, slot = "data" ) mat[1:3, 1:3] sce <- sce_pbmc() mat <- object_data( object = sce, slot = "data" ) mat[1:3, 1:3]
so <- so_pbmc() mat <- object_data( object = so, slot = "data" ) mat[1:3, 1:3] sce <- sce_pbmc() mat <- object_data( object = sce, slot = "data" ) mat[1:3, 1:3]
lookup table for single cell object structures
object_loc_lookup()
object_loc_lookup()
A list populated with standardized functions to access relevant data structures in multiple single cell data formats.
Function to convert labelled object to avg expression matrix
object_ref(input, ...) ## Default S3 method: object_ref( input, cluster_col = NULL, var_genes_only = FALSE, assay_name = NULL, method = "mean", lookuptable = NULL, if_log = TRUE, ... ) ## S3 method for class 'Seurat' object_ref( input, cluster_col = NULL, var_genes_only = FALSE, assay_name = NULL, method = "mean", lookuptable = NULL, if_log = TRUE, ... ) ## S3 method for class 'SingleCellExperiment' object_ref( input, cluster_col = NULL, var_genes_only = FALSE, assay_name = NULL, method = "mean", lookuptable = NULL, if_log = TRUE, ... )
object_ref(input, ...) ## Default S3 method: object_ref( input, cluster_col = NULL, var_genes_only = FALSE, assay_name = NULL, method = "mean", lookuptable = NULL, if_log = TRUE, ... ) ## S3 method for class 'Seurat' object_ref( input, cluster_col = NULL, var_genes_only = FALSE, assay_name = NULL, method = "mean", lookuptable = NULL, if_log = TRUE, ... ) ## S3 method for class 'SingleCellExperiment' object_ref( input, cluster_col = NULL, var_genes_only = FALSE, assay_name = NULL, method = "mean", lookuptable = NULL, if_log = TRUE, ... )
input |
object after tsne or umap projections and clustering |
... |
additional arguments |
cluster_col |
column name where classified cluster names are stored in seurat meta data, cannot be "rn" |
var_genes_only |
whether to keep only var.genes in the final matrix output, could also look up genes used for PCA |
assay_name |
any additional assay data, such as ADT, to include. If more than 1, pass a vector of names |
method |
whether to take mean (default) or median |
lookuptable |
if not supplied, will look in built-in table for object parsing |
if_log |
input data is natural log, averaging will be done on unlogged data |
reference expression matrix, with genes as row names, and cell types as column names
so <- so_pbmc() object_ref( so, cluster_col = "seurat_clusters" )
so <- so_pbmc() object_ref( so, cluster_col = "seurat_clusters" )
Overcluster by kmeans per cluster
overcluster(mat, cluster_id, power = 0.15)
overcluster(mat, cluster_id, power = 0.15)
mat |
expression matrix |
cluster_id |
list of ids per cluster |
power |
decides the number of clusters for kmeans |
new cluster_id list of more clusters
res <- overcluster( mat = pbmc_matrix_small, cluster_id = split(colnames(pbmc_matrix_small), pbmc_meta$classified) ) length(res)
res <- overcluster( mat = pbmc_matrix_small, cluster_id = split(colnames(pbmc_matrix_small), pbmc_meta$classified) ) length(res)
compare clustering parameters and classification outcomes
overcluster_test( expr, metadata, ref_mat, cluster_col, x_col = "UMAP_1", y_col = "UMAP_2", n = 5, ngenes = NULL, query_genes = NULL, threshold = 0, do_label = TRUE, do_legend = FALSE, newclustering = NULL, combine = TRUE )
overcluster_test( expr, metadata, ref_mat, cluster_col, x_col = "UMAP_1", y_col = "UMAP_2", n = 5, ngenes = NULL, query_genes = NULL, threshold = 0, do_label = TRUE, do_legend = FALSE, newclustering = NULL, combine = TRUE )
expr |
expression matrix |
metadata |
metadata including cluster info and dimension reduction plotting |
ref_mat |
reference matrix |
cluster_col |
column of clustering from metadata |
x_col |
column of metadata for x axis plotting |
y_col |
column of metadata for y axis plotting |
n |
expand n-fold for over/under clustering |
ngenes |
number of genes to use for feature selection, use all genes if NULL |
query_genes |
vector, otherwise genes with be recalculated |
threshold |
type calling threshold |
do_label |
whether to label each cluster at median center |
do_legend |
whether to draw legend |
newclustering |
use kmeans if NULL on dr or col name for second column of clustering |
combine |
if TRUE return a single plot with combined panels, if FALSE return list of plots (default: TRUE) |
faceted ggplot object
set.seed(42) overcluster_test( expr = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, cluster_col = "classified", x_col = "UMAP_1", y_col = "UMAP_2" )
set.seed(42) overcluster_test( expr = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, cluster_col = "classified", x_col = "UMAP_1", y_col = "UMAP_2" )
more flexible parsing of single cell objects
parse_loc_object( input, type = class(input), expr_loc = NULL, meta_loc = NULL, var_loc = NULL, cluster_col = NULL, lookuptable = NULL )
parse_loc_object( input, type = class(input), expr_loc = NULL, meta_loc = NULL, var_loc = NULL, cluster_col = NULL, lookuptable = NULL )
input |
input object |
type |
look up predefined slots/loc |
expr_loc |
function that extracts expression matrix |
meta_loc |
function that extracts metadata |
var_loc |
function that extracts variable genes |
cluster_col |
column of clustering from metadata |
lookuptable |
if not supplied, will use object_loc_lookup() for parsing. |
list of expression, metadata, vargenes, cluster_col info from object
so <- so_pbmc() obj <- parse_loc_object(so) length(obj)
so <- so_pbmc() obj <- parse_loc_object(so) length(obj)
Dataframe of markers from Seurat FindAllMarkers function
pbmc_markers
pbmc_markers
An object of class data.frame
with 2304 rows and 7 columns.
[pbmc_matrix]
processed by Seurat
Other data:
cbmc_m
,
cbmc_ref
,
downrefs
,
human_genes_10x
,
mouse_genes_10x
,
pbmc_markers_M3Drop
,
pbmc_matrix_small
,
pbmc_meta
,
pbmc_vargenes
Selected features of 3k pbmcs from Seurat3 tutorial
pbmc_markers_M3Drop
pbmc_markers_M3Drop
A data frame with 3 variables:
[pbmc_matrix]
processed by [M3Drop]
Other data:
cbmc_m
,
cbmc_ref
,
downrefs
,
human_genes_10x
,
mouse_genes_10x
,
pbmc_markers
,
pbmc_matrix_small
,
pbmc_meta
,
pbmc_vargenes
Count matrix of 3k pbmcs from Seurat3 tutorial, with only var.features
pbmc_matrix_small
pbmc_matrix_small
A sparseMatrix with genes as rows and cells as columns.
https://satijalab.org/seurat/v3.0/pbmc3k_tutorial.html
Other data:
cbmc_m
,
cbmc_ref
,
downrefs
,
human_genes_10x
,
mouse_genes_10x
,
pbmc_markers
,
pbmc_markers_M3Drop
,
pbmc_meta
,
pbmc_vargenes
Metadata, including umap, of 3k pbmcs from Seurat3 tutorial
pbmc_meta
pbmc_meta
An object of class data.frame
with 2638 rows and 9 columns.
[pbmc_matrix]
processed by Seurat
Other data:
cbmc_m
,
cbmc_ref
,
downrefs
,
human_genes_10x
,
mouse_genes_10x
,
pbmc_markers
,
pbmc_markers_M3Drop
,
pbmc_matrix_small
,
pbmc_vargenes
Top 2000 variable genes from 3k pbmcs from Seurat3 tutorial
pbmc_vargenes
pbmc_vargenes
An object of class character
of length 2000.
[pbmc_matrix]
processed by Seurat
Other data:
cbmc_m
,
cbmc_ref
,
downrefs
,
human_genes_10x
,
mouse_genes_10x
,
pbmc_markers
,
pbmc_markers_M3Drop
,
pbmc_matrix_small
,
pbmc_meta
Percentage detected per cluster
percent_clusters(mat, metadata, cluster_col = "cluster", cut_num = 0.5)
percent_clusters(mat, metadata, cluster_col = "cluster", cut_num = 0.5)
mat |
expression matrix |
metadata |
data.frame with cells |
cluster_col |
column in metadata with cluster number |
cut_num |
binary cutoff for detection |
matrix of numeric values, with genes for row names, and clusters for column names
Permute cluster labels to calculate empirical p-value
permute_similarity( expr_mat, ref_mat, cluster_ids, n_perm, per_cell = FALSE, compute_method, pseudobulk_method = "mean", rm0 = FALSE, ... )
permute_similarity( expr_mat, ref_mat, cluster_ids, n_perm, per_cell = FALSE, compute_method, pseudobulk_method = "mean", rm0 = FALSE, ... )
expr_mat |
single-cell expression matrix |
ref_mat |
reference expression matrix |
cluster_ids |
clustering info of single-cell data assume that genes have ALREADY BEEN filtered |
n_perm |
number of permutations |
per_cell |
run per cell? |
compute_method |
method(s) for computing similarity scores |
pseudobulk_method |
method used for summarizing clusters, options are mean (default), median, truncate (10% truncated mean), or trimean, max, min |
rm0 |
consider 0 as missing data, recommended for per_cell |
... |
additional parameters |
matrix of numeric values
Plot best calls for each cluster on a tSNE or umap
plot_best_call( cor_mat, metadata, cluster_col = "cluster", collapse_to_cluster = FALSE, threshold = 0, x = "UMAP_1", y = "UMAP_2", plot_r = FALSE, per_cell = FALSE, ... )
plot_best_call( cor_mat, metadata, cluster_col = "cluster", collapse_to_cluster = FALSE, threshold = 0, x = "UMAP_1", y = "UMAP_2", plot_r = FALSE, per_cell = FALSE, ... )
cor_mat |
input similarity matrix |
metadata |
input metadata with tsne or umap coordinates and cluster ids |
cluster_col |
metadata column, can be cluster or cellid |
collapse_to_cluster |
if a column name is provided, takes the most frequent call of entire cluster to color in plot |
threshold |
minimum correlation coefficent cutoff for calling clusters |
x |
x variable |
y |
y variable |
plot_r |
whether to include second plot of cor eff for best call |
per_cell |
whether the cor_mat was generate per cell or per cluster |
... |
passed to plot_dims |
ggplot object, cells projected by dr, colored by cell type classification
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "classified" ) plot_best_call( cor_mat = res, metadata = pbmc_meta, cluster_col = "classified" )
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "classified" ) plot_best_call( cor_mat = res, metadata = pbmc_meta, cluster_col = "classified" )
Plot called clusters on a tSNE or umap, for each reference cluster given
plot_call(cor_mat, metadata, data_to_plot = colnames(cor_mat), ...)
plot_call(cor_mat, metadata, data_to_plot = colnames(cor_mat), ...)
cor_mat |
input similarity matrix |
metadata |
input metadata with tsne or umap coordinates and cluster ids |
data_to_plot |
colname of data to plot, defaults to all |
... |
passed to plot_dims |
list of ggplot object, cells projected by dr, colored by cell type classification
Plot similarity measures on a tSNE or umap
plot_cor( cor_mat, metadata, data_to_plot = colnames(cor_mat), cluster_col = NULL, x = "UMAP_1", y = "UMAP_2", scale_legends = FALSE, ... )
plot_cor( cor_mat, metadata, data_to_plot = colnames(cor_mat), cluster_col = NULL, x = "UMAP_1", y = "UMAP_2", scale_legends = FALSE, ... )
cor_mat |
input similarity matrix |
metadata |
input metadata with per cell tsne or umap coordinates and cluster ids |
data_to_plot |
colname of data to plot, defaults to all |
cluster_col |
colname of clustering data in metadata, defaults to rownames of the metadata if not supplied. |
x |
metadata column name with 1st axis dimension. defaults to "UMAP_1". |
y |
metadata column name with 2nd axis dimension. defaults to "UMAP_2". |
scale_legends |
if TRUE scale all legends to maximum values in entire correlation matrix. if FALSE scale legends to maximum for each plot. A two-element numeric vector can also be passed to supply custom values i.e. c(0, 1) |
... |
passed to plot_dims |
list of ggplot objects, cells projected by dr, colored by cor values
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "classified" ) plot_cor( cor_mat = res, metadata = pbmc_meta, data_to_plot = colnames(res)[1:2], cluster_col = "classified", x = "UMAP_1", y = "UMAP_2" )
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "classified" ) plot_cor( cor_mat = res, metadata = pbmc_meta, data_to_plot = colnames(res)[1:2], cluster_col = "classified", x = "UMAP_1", y = "UMAP_2" )
Plot similarity measures on heatmap
plot_cor_heatmap( cor_mat, metadata = NULL, cluster_col = NULL, col = not_pretty_palette, legend_title = NULL, ... )
plot_cor_heatmap( cor_mat, metadata = NULL, cluster_col = NULL, col = not_pretty_palette, legend_title = NULL, ... )
cor_mat |
input similarity matrix |
metadata |
input metadata with per cell tsne or umap cooordinates and cluster ids |
cluster_col |
colname of clustering data in metadata, defaults to rownames of the metadata if not supplied. |
col |
color ramp to use |
legend_title |
legend title to pass to Heatmap |
... |
passed to Heatmap |
complexheatmap object
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "classified", per_cell = FALSE ) plot_cor_heatmap(res)
res <- clustify( input = pbmc_matrix_small, metadata = pbmc_meta, ref_mat = cbmc_ref, query_genes = pbmc_vargenes, cluster_col = "classified", per_cell = FALSE ) plot_cor_heatmap(res)
Plot a tSNE or umap colored by feature.
plot_dims( data, x = "UMAP_1", y = "UMAP_2", feature = NULL, legend_name = "", c_cols = pretty_palette2, d_cols = NULL, pt_size = 0.25, alpha_col = NULL, group_col = NULL, scale_limits = NULL, do_label = FALSE, do_legend = TRUE, do_repel = TRUE )
plot_dims( data, x = "UMAP_1", y = "UMAP_2", feature = NULL, legend_name = "", c_cols = pretty_palette2, d_cols = NULL, pt_size = 0.25, alpha_col = NULL, group_col = NULL, scale_limits = NULL, do_label = FALSE, do_legend = TRUE, do_repel = TRUE )
data |
input data |
x |
x variable |
y |
y variable |
feature |
feature to color by |
legend_name |
legend name to display, defaults to no name |
c_cols |
character vector of colors to build color gradient
for continuous values, defaults to |
d_cols |
character vector of colors for discrete values. defaults to RColorBrewer paired palette |
pt_size |
point size |
alpha_col |
whether to refer to data column for alpha values |
group_col |
group by another column instead of feature, useful for labels |
scale_limits |
defaults to min = 0, max = max(data$x), otherwise a two-element numeric vector indicating min and max to plot |
do_label |
whether to label each cluster at median center |
do_legend |
whether to draw legend |
do_repel |
whether to use ggrepel on labels |
ggplot object, cells projected by dr, colored by feature
plot_dims( pbmc_meta, feature = "classified" )
plot_dims( pbmc_meta, feature = "classified" )
Plot gene expression on to tSNE or umap
plot_gene(expr_mat, metadata, genes, cell_col = NULL, ...)
plot_gene(expr_mat, metadata, genes, cell_col = NULL, ...)
expr_mat |
input single cell matrix |
metadata |
data.frame with tSNE or umap coordinates |
genes |
gene(s) to color tSNE or umap |
cell_col |
column name in metadata containing cell ids, defaults to rownames if not supplied |
... |
additional arguments passed to |
list of ggplot object, cells projected by dr, colored by gene expression
genes <- c( "RP11-314N13.3", "ARF4" ) plot_gene( expr_mat = pbmc_matrix_small, metadata = tibble::rownames_to_column(pbmc_meta, "rn"), genes = genes, cell_col = "rn" )
genes <- c( "RP11-314N13.3", "ARF4" ) plot_gene( expr_mat = pbmc_matrix_small, metadata = tibble::rownames_to_column(pbmc_meta, "rn"), genes = genes, cell_col = "rn" )
plot GSEA pathway scores as heatmap, returns a list containing results and plot.
plot_pathway_gsea( mat, pathway_list, n_perm = 1000, scale = TRUE, topn = 5, returning = "both" )
plot_pathway_gsea( mat, pathway_list, n_perm = 1000, scale = TRUE, topn = 5, returning = "both" )
mat |
expression matrix |
pathway_list |
a list of vectors, each named for a specific pathway, or dataframe |
n_perm |
Number of permutation for fgsea function. Defaults to 1000. |
scale |
convert expr_mat into zscores prior to running GSEA?, default = TRUE |
topn |
number of top pathways to plot |
returning |
to return "both" list and plot, or either one |
list of matrix and plot, or just plot, matrix of GSEA NES values, cell types as row names, pathways as column names
gl <- list( "n" = c("PPBP", "LYZ", "S100A9"), "a" = c("IGLL5", "GNLY", "FTL") ) pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified" ) plot_pathway_gsea( pbmc_avg, gl, 5 )
gl <- list( "n" = c("PPBP", "LYZ", "S100A9"), "a" = c("IGLL5", "GNLY", "FTL") ) pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified" ) plot_pathway_gsea( pbmc_avg, gl, 5 )
Query rank bias results
plot_rank_bias(bias_df, organism = "hsapiens")
plot_rank_bias(bias_df, organism = "hsapiens")
bias_df |
data.frame of rank diff matrix between cluster and reference cell types |
organism |
for GO term analysis, organism name: human - 'hsapiens', mouse - 'mmusculus' |
ggplot object of distribution and annotated GO terms
## Not run: avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) rankdiff <- find_rank_bias( avg, cbmc_ref, query_genes = pbmc_vargenes ) qres <- query_rank_bias( rankdiff, "CD14+ Mono", "CD14+ Mono" ) g <- plot_rank_bias( qres ) ## End(Not run)
## Not run: avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) rankdiff <- find_rank_bias( avg, cbmc_ref, query_genes = pbmc_vargenes ) qres <- query_rank_bias( rankdiff, "CD14+ Mono", "CD14+ Mono" ) g <- plot_rank_bias( qres ) ## End(Not run)
generate pos and negative marker expression matrix from a list/dataframe of positive markers
pos_neg_marker(mat)
pos_neg_marker(mat)
mat |
matrix or dataframe of markers |
matrix of gene expression
m1 <- pos_neg_marker(cbmc_m)
m1 <- pos_neg_marker(cbmc_m)
adapt clustify to tweak score for pos and neg markers
pos_neg_select( input, ref_mat, metadata, cluster_col = "cluster", cutoff_n = 0, cutoff_score = 0.5 )
pos_neg_select( input, ref_mat, metadata, cluster_col = "cluster", cutoff_n = 0, cutoff_score = 0.5 )
input |
single-cell expression matrix |
ref_mat |
reference expression matrix with positive and negative markers(set expression at 0) |
metadata |
cell cluster assignments,
supplied as a vector or data.frame. If
data.frame is supplied then |
cluster_col |
column in metadata that contains cluster ids per cell. Will default to first column of metadata if not supplied. Not required if running correlation per cell. |
cutoff_n |
expression cutoff where genes ranked below n are considered non-expressing |
cutoff_score |
positive score lower than this cutoff will be considered as 0 to not influence scores |
matrix of numeric values, clusters from input as row names, cell types from ref_mat as column names
pn_ref <- data.frame( "Myeloid" = c(1, 0.01, 0), row.names = c("CD74", "clustifyr0", "CD79A") ) pos_neg_select( input = pbmc_matrix_small, ref_mat = pn_ref, metadata = pbmc_meta, cluster_col = "classified", cutoff_score = 0.8 )
pn_ref <- data.frame( "Myeloid" = c(1, 0.01, 0), row.names = c("CD74", "clustifyr0", "CD79A") ) pos_neg_select( input = pbmc_matrix_small, ref_mat = pn_ref, metadata = pbmc_meta, cluster_col = "classified", cutoff_score = 0.8 )
Color palette for plotting continous variables
pretty_palette
pretty_palette
An object of class character
of length 6.
vector of colors
Expanded color palette ramp for plotting discrete variables
pretty_palette_ramp_d(n)
pretty_palette_ramp_d(n)
n |
number of colors to use |
color ramp
Color palette for plotting continous variables, starting at gray
pretty_palette2
pretty_palette2
An object of class character
of length 9.
vector of colors
Query rank bias results
query_rank_bias(bias_list, id_mat, id_ref)
query_rank_bias(bias_list, id_mat, id_ref)
bias_list |
list of rank diff matrix between cluster and reference cell types |
id_mat |
name of cluster from average cluster matrix |
id_ref |
name of cell type in reference matrix |
data.frame rank diff values
avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) rankdiff <- find_rank_bias( avg, cbmc_ref, query_genes = pbmc_vargenes ) qres <- query_rank_bias( rankdiff, "CD14+ Mono", "CD14+ Mono" )
avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified", if_log = FALSE ) rankdiff <- find_rank_bias( avg, cbmc_ref, query_genes = pbmc_vargenes ) qres <- query_rank_bias( rankdiff, "CD14+ Mono", "CD14+ Mono" )
feature select from reference matrix
ref_feature_select(mat, n = 3000, mode = "var", rm.lowvar = TRUE)
ref_feature_select(mat, n = 3000, mode = "var", rm.lowvar = TRUE)
mat |
reference matrix |
n |
number of genes to return |
mode |
the method of selecting features |
rm.lowvar |
whether to remove lower variation genes first |
vector of genes
pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified" ) ref_feature_select( mat = pbmc_avg[1:100, ], n = 5 )
pbmc_avg <- average_clusters( mat = pbmc_matrix_small, metadata = pbmc_meta, cluster_col = "classified" ) ref_feature_select( mat = pbmc_avg[1:100, ], n = 5 )
marker selection from reference matrix
ref_marker_select(mat, cut = 0.5, arrange = TRUE, compto = 1)
ref_marker_select(mat, cut = 0.5, arrange = TRUE, compto = 1)
mat |
reference matrix |
cut |
an expression minimum cutoff |
arrange |
whether to arrange (lower means better) |
compto |
compare max expression to the value of next 1 or more |
dataframe, with gene, cluster, ratio columns
ref_marker_select( cbmc_ref, cut = 2 )
ref_marker_select( cbmc_ref, cut = 2 )
generate negative markers from a list of exclusive positive markers
reverse_marker_matrix(mat)
reverse_marker_matrix(mat)
mat |
matrix or dataframe of markers |
matrix of gene names
reverse_marker_matrix(cbmc_m)
reverse_marker_matrix(cbmc_m)
Launch Shiny app version of clustifyr, may need to run install_clustifyr_app() at first time to install packages
run_clustifyr_app()
run_clustifyr_app()
instance of shiny app
## Not run: run_clustifyr_app() ## End(Not run)
## Not run: run_clustifyr_app() ## End(Not run)
Use fgsea algorithm to compute normalized enrichment scores and pvalues for gene set ovelap
run_gsea( expr_mat, query_genes, cluster_ids = NULL, n_perm = 1000, per_cell = FALSE, scale = FALSE, no_warnings = TRUE )
run_gsea( expr_mat, query_genes, cluster_ids = NULL, n_perm = 1000, per_cell = FALSE, scale = FALSE, no_warnings = TRUE )
expr_mat |
single-cell expression matrix or Seurat object |
query_genes |
A vector or named list of vectors of genesets of interest to compare via GSEA. If supplying a named list, then the gene set names will appear in the output. |
cluster_ids |
vector of cell cluster assignments, supplied as a
vector with order that
matches columns in |
n_perm |
Number of permutation for fgsea function. Defaults to 1000. |
per_cell |
if true run per cell, otherwise per cluster. |
scale |
convert expr_mat into zscores prior to running GSEA?, default = FALSE |
no_warnings |
suppress warnings from gsea ties |
dataframe of gsea scores (pval, NES), with clusters as rownames
An example SingleCellExperiment object
sce_pbmc()
sce_pbmc()
a SingleCellExperiment object populated with data from the pbmc_matrix_small scRNA-seq dataset, additionally annotated with cluster assignments.
Function to convert labelled seurat object to fully prepared metadata
seurat_meta(seurat_object, ...) ## S3 method for class 'Seurat' seurat_meta(seurat_object, dr = "umap", ...)
seurat_meta(seurat_object, ...) ## S3 method for class 'Seurat' seurat_meta(seurat_object, dr = "umap", ...)
seurat_object |
seurat_object after tsne or umap projections and clustering |
... |
additional arguments |
dr |
dimension reduction method |
dataframe of metadata, including dimension reduction plotting info
so <- so_pbmc() m <- seurat_meta(so)
so <- so_pbmc() m <- seurat_meta(so)
Function to convert labelled seurat object to avg expression matrix
seurat_ref(seurat_object, ...) ## S3 method for class 'Seurat' seurat_ref( seurat_object, cluster_col = "classified", var_genes_only = FALSE, assay_name = NULL, method = "mean", subclusterpower = 0, if_log = TRUE, ... )
seurat_ref(seurat_object, ...) ## S3 method for class 'Seurat' seurat_ref( seurat_object, cluster_col = "classified", var_genes_only = FALSE, assay_name = NULL, method = "mean", subclusterpower = 0, if_log = TRUE, ... )
seurat_object |
seurat_object after tsne or umap projections and clustering |
... |
additional arguments |
cluster_col |
column name where classified cluster names are stored in seurat meta data, cannot be "rn" |
var_genes_only |
whether to keep only var_genes in the final matrix output, could also look up genes used for PCA |
assay_name |
any additional assay data, such as ADT, to include. If more than 1, pass a vector of names |
method |
whether to take mean (default) or median |
subclusterpower |
whether to get multiple averages per original cluster |
if_log |
input data is natural log, averaging will be done on unlogged data |
reference expression matrix, with genes as row names, and cell types as column names
so <- so_pbmc() ref <- seurat_ref(so, cluster_col = "seurat_clusters")
so <- so_pbmc() ref <- seurat_ref(so, cluster_col = "seurat_clusters")
An example Seurat object
so_pbmc()
so_pbmc()
a Seurat object populated with data from the pbmc_matrix_small scRNA-seq dataset, additionally annotated with cluster assignments.
Compute the similarity score between two vectors using a customized scoring function Two vectors may be from either scRNA-seq or bulk RNA-seq data. The lengths of vec1 and vec2 must match, and must be arranged in the same order of genes. Both vectors should be provided to this function after pre-processing, feature selection and dimension reduction.
vector_similarity(vec1, vec2, compute_method, ...)
vector_similarity(vec1, vec2, compute_method, ...)
vec1 |
test vector |
vec2 |
reference vector |
compute_method |
method to run i.e. corr_coef |
... |
arguments to pass to compute_method function |
numeric value of desired correlation or distance measurement
Function to write metadata to object
write_meta(object, ...) ## S3 method for class 'Seurat' write_meta(object, meta, ...) ## S3 method for class 'SingleCellExperiment' write_meta(object, meta, ...)
write_meta(object, ...) ## S3 method for class 'Seurat' write_meta(object, meta, ...) ## S3 method for class 'SingleCellExperiment' write_meta(object, meta, ...)
object |
object after tsne or umap projections and clustering |
... |
additional arguments |
meta |
new metadata dataframe |
object with newly inserted metadata columns
so <- so_pbmc() obj <- write_meta( object = so, meta = seurat_meta(so) ) sce <- sce_pbmc() obj <- write_meta( object = sce, meta = object_data(sce, "meta.data") )
so <- so_pbmc() obj <- write_meta( object = so, meta = seurat_meta(so) ) sce <- sce_pbmc() obj <- write_meta( object = sce, meta = object_data(sce, "meta.data") )