Title: | Annotate cell types for scRNA-seq data |
---|---|
Description: | We developed EasyCellType, which automatically checks input marker lists obtained from existing software such as Seurat against cell marker databases. Two quantification approaches to annotate cell types are provided: gene set enrichment analysis (GSEA) and a modified version of Fisher's exact test. The function presents annotation recommendations graphically: bar plots showing candidate cell types for each cluster, and a dot plot summarizing the top 5 significant annotations for each cluster. |
Authors: | Ruoxing Li [aut, cre, ctb], Ziyi Li [ctb] |
Maintainer: | Ruoxing Li <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.9.0 |
Built: | 2024-11-19 03:45:35 UTC |
Source: | https://github.com/bioc/EasyCellType |
A list containing 2 elements: Human tissues and Mouse tissues.
data(cellmarker_tissue)
A list with 2 elements:
Human tissue
Mouse tissue
A list containing 2 elements: Human tissues and Mouse tissues.
data(clustermole_tissue)
A list with 2 elements:
Human tissue
Mouse tissue
Summarize markers contributing to the cell type annotation.
coremarkers(test, data, species)
test |
Test used to annotate cell types: "GSEA" or "fisher" |
data |
Annotation results. |
species |
"Human" or "Mouse" |
A data frame containing the genes that contributed to the cell type annotation.
## core_markers <- coremarkers("GSEA", data)
This function is used to run the annotation analysis using either GSEA or a modified Fisher's exact test. We expect users to input a data frame containing expressed markers, cluster information and the differential score (log fold change). The gene lists in that data frame should be sorted by their differential score.
easyct(
  data,
  db = "cellmarker",
  genetype = "Entrezid",
  species = "Human",
  tissue = NULL,
  p_cut = 0.5,
  test = "GSEA",
  scoretype = "std"
)
data |
A data frame containing the markers, clusters, and expression scores. Marker genes should be sorted within each cluster. The columns should be in the order gene, cluster, expression-level score. An example data set can be loaded using 'data(gene_pbmc)'. |
db |
Name of the reference database: "cellmarker", "clustermole" or "panglaodb". |
genetype |
Indicate the gene type in the input data frame: "Entrezid" or "symbol". |
species |
"Human" or "Mouse". The default is "Human". |
tissue |
Tissue types to use in the analysis; more than one tissue can be given. The available tissues can be viewed using 'data(cellmarker_tissue)', 'data(clustermole_tissue)' and 'data(panglao_tissue)'. |
p_cut |
Cutoff of the P value for GSEA. |
test |
"GSEA" or "fisher"; "GSEA" is used by default. |
scoretype |
Argument used for GSEA. Default value is "std". If all scores are positive, then scoretype should be "pos". |
A list containing the test results for each cluster.
data(gene_pbmc)
result <- easyct(gene_pbmc, db="cellmarker", species="Human",
                 tissue=c("Blood", "Peripheral blood", "Blood vessel",
                          "Umbilical cord blood", "Venous blood"),
                 p_cut=0.3, test="GSEA", scoretype="pos")
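The input layout described for the `data` argument can be sketched in base R. This is a minimal illustration on made-up values, not the package's own preprocessing: the gene IDs and scores are toy numbers, the object names `markers`, `input_d` and `scoretype` are hypothetical, and sorting by decreasing score is an assumption, since the text only states that genes should be sorted within each cluster.

```r
# Toy marker table; IDs and scores are made-up illustration values.
markers <- data.frame(
  gene    = c("100", "200", "300", "400", "500", "600"),
  cluster = c(0, 0, 0, 1, 1, 1),
  score   = c(0.8, 2.1, 1.3, 0.4, 1.9, 1.1),
  stringsAsFactors = FALSE
)

# Sort by cluster, then by decreasing score within each cluster
# (the decreasing direction is an assumption).
input_d <- markers[order(markers$cluster, -markers$score), ]
rownames(input_d) <- NULL

# Follow the scoretype guidance: "pos" when every score is positive,
# otherwise the default "std".
scoretype <- if (all(input_d$score > 0)) "pos" else "std"
```

The resulting `input_d` keeps the gene, cluster, score column order, and `scoretype` holds the value that would then be passed to easyct().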
A data frame containing marker genes, clusters, and the average log2 fold changes. The original data set is from 10X Genomics; we followed the standard workflow provided by the Seurat package to process the data and then formatted the output to obtain this data frame.
data(gene_pbmc)
A data frame with 727 rows and 3 variables:
Entrez IDs of the marker genes
Cluster
Average log2 fold changes obtained from the processing procedure
https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
This function converts gene symbols to Entrez IDs. It is used within the easyct function.
mapsymbol(d, species)
d |
A data frame whose first column contains gene symbols. |
species |
"Human" or "Mouse". |
A data frame containing gene symbols and the corresponding Entrez IDs.
A list containing 2 elements: Human tissues and Mouse tissues.
data(panglao_tissue)
A list with 2 elements:
Human tissue
Mouse tissue
Count matrix of peripheral blood mononuclear cells (PBMC). The original data set is from 10X Genomics.
data(pbmc_data)
A large dgCMatrix: 32378 * 2700
Row index of the non-zero values
A vector referring to the column index of the non-zero values
Dimension of the matrix
A list of length 2 containing the row names and column names of the matrix
Vector containing all the non-zero values
https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
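To make the five components listed above concrete, here is a small sketch using the Matrix package. It assumes they correspond to the standard dgCMatrix slots `i`, `p`, `Dim`, `Dimnames` and `x`; the 3 x 3 matrix and its gene and cell names are toy values and are unrelated to pbmc_data.

```r
library(Matrix)

# Toy 3 x 3 sparse count matrix with three non-zero entries.
m <- sparseMatrix(
  i = c(1, 3, 2),            # 1-based row positions of the non-zero values
  j = c(1, 1, 3),            # 1-based column positions
  x = c(5, 2, 7),            # the non-zero values
  dims = c(3, 3),
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

class(m)     # "dgCMatrix"
m@i          # 0-based row indices of the non-zero values: 0 2 1
m@p          # column pointers into m@i and m@x: 0 2 2 3
m@Dim        # dimension of the matrix: 3 3
m@Dimnames   # list of row names and column names
m@x          # non-zero values: 5 2 7
```

Note that the column information is stored as cumulative pointers rather than as one column index per value: the difference between consecutive entries of `p` gives the number of non-zero values in each column.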
This function generates a set of bar plots presenting up to 10 candidate cell types for each cluster.
plot_bar(test = "GSEA", data, cluster = NULL)
test |
"GSEA" or "fisher" |
data |
Annotation results |
cluster |
Cluster(s) to plot; if not specified, plots are generated for each cluster. |
Bar plots showing up to 10 candidate cell types for each cluster.
data(gene_pbmc)
result <- easyct(gene_pbmc, db="cellmarker", species="Human",
                 tissue=c("Blood", "Peripheral blood", "Blood vessel",
                          "Umbilical cord blood", "Venous blood"),
                 p_cut=0.3, test="GSEA", scoretype="pos")
plot_bar("GSEA", result)
This function generates a dot plot presenting the top 5 candidate cell types for each cluster.
plot_dot(test = "GSEA", data)
test |
Test used to annotate cell types: "GSEA" or "fisher" |
data |
Annotation results |
A dot plot showing the top 5 significant cell types for each cluster.
data(gene_pbmc)
result <- easyct(gene_pbmc, db="cellmarker", species="Human",
                 tissue=c("Blood", "Peripheral blood", "Blood vessel",
                          "Umbilical cord blood", "Venous blood"),
                 p_cut=0.3, test="GSEA", scoretype="pos")
plot_dot("GSEA", result)
This function processes the annotation test results. The processed data are used to generate the plots.
process_results(test, data)
test |
Test used to annotate cell types: "GSEA" or "fisher" |
data |
Annotation results. |
A data frame used to generate plots.
This function prints a summary table of the annotation results for a specific cluster.
summarycelltype(test, results, cluster)
test |
"GSEA" or "fisher". |
results |
Annotation results. |
cluster |
Cluster of interest. |
A summary table of the annotation results. "core_enrichment" contains the markers contributing to the annotation.
data(gene_pbmc)
result <- easyct(gene_pbmc, db="cellmarker", species="Human",
                 tissue=c("Blood", "Peripheral blood", "Blood vessel",
                          "Umbilical cord blood", "Venous blood"),
                 p_cut=0.3, test="GSEA", scoretype="pos")
summarycelltype(test="GSEA", results=result, cluster=0)
This function conducts the modified Fisher's exact test.
test_fisher(testgenes, ref, cols)
testgenes |
A data frame containing query genes and the expression scores. |
ref |
The reference database. |
cols |
Column names of the input data frame |
A data frame containing the results of Fisher's exact test.
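For orientation, the idea behind an enrichment test of query genes against a reference marker set can be sketched with the standard, unmodified Fisher's exact test from base R. The modification applied by test_fisher is not described here, so this sketch does not reproduce it. The gene universe, query genes and reference markers below are hypothetical toy values.

```r
# Hypothetical universe of 100 genes, 10 query genes for a cluster,
# and 8 reference markers for one candidate cell type.
universe    <- paste0("g", 1:100)
query       <- paste0("g", 1:10)
ref_markers <- paste0("g", c(5:10, 50, 51))

# 2 x 2 contingency table: membership in the query vs. the reference set.
tab <- table(
  in_query = universe %in% query,
  in_ref   = universe %in% ref_markers
)

# One-sided test: are reference markers over-represented among query genes?
res <- fisher.test(tab, alternative = "greater")
res$p.value
```

Here 6 of the 10 query genes are reference markers, against 8 markers in a universe of 100, so the one-sided p-value is small and the candidate cell type would rank as a strong match under this simplified test.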