The EasyCellType package was designed to examine an input marker list
against marker gene databases and to present annotation recommendations
graphically. The package draws on three publicly available marker gene
databases and provides two approaches to the annotation analysis: gene
set enrichment analysis (GSEA) and a modified Fisher’s exact test. The
package has been submitted to Bioconductor to make it easily accessible
to researchers.
This vignette shows a simple workflow illustrating how the EasyCellType package works. The data set used throughout the example is freely available from 10X Genomics.
The package can be installed using BiocManager with the following commands:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("EasyCellType")
Alternatively, the package can be installed using devtools and launched by
After installation, the package can be loaded with library(EasyCellType).
We use the Peripheral Blood Mononuclear Cells (PBMC) data freely
available from 10X Genomics. The data can be downloaded from https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz.
After downloading, the data can be read using the Read10X function.
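As a rough sketch of the import step, assuming the archive was extracted into the working directory and unpacks to `filtered_gene_bc_matrices/hg19/` (the layout used in the Seurat PBMC tutorial; `data_dir` is a placeholder to adjust to your own path):

```r
# Assumed location of the extracted 10X files; adjust to your own path.
data_dir <- file.path("filtered_gene_bc_matrices", "hg19")

if (requireNamespace("Seurat", quietly = TRUE) && dir.exists(data_dir)) {
  # Read10X() returns a sparse genes-by-cells count matrix
  pbmc_data <- Seurat::Read10X(data.dir = data_dir)
  print(dim(pbmc_data))
} else {
  message("Seurat or the data directory is missing; skipping the import.")
}
```

The guard only keeps the snippet from failing when Seurat or the files are absent; in an interactive session the `Read10X()` call alone is enough.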
We have included the data in our package, which can be loaded with data(pbmc_data).
We followed the standard workflow provided by the Seurat package
(Hao et al. 2021) to process the PBMC data set. Detailed technical
explanations can be found at https://satijalab.org/seurat/articles/pbmc3k_tutorial.html.
library(Seurat)
# Initialize the Seurat object
pbmc <- CreateSeuratObject(counts = pbmc_data, project = "pbmc3k", min.cells = 3, min.features = 200)
# QC and select samples
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
# Normalize the data
pbmc <- NormalizeData(pbmc)
# Identify highly variable features
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
# Scale the data
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
# Perform linear dimensional reduction
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
# Cluster the cells
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
# Find differentially expressed features
markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
We now have the differentially expressed markers for each cluster. Next, we convert the gene symbols to Entrez IDs.
library(org.Hs.eg.db)
library(AnnotationDbi)
markers$entrezid <- mapIds(org.Hs.eg.db,
                           keys = markers$gene,   # column containing gene symbols
                           column = "ENTREZID",
                           keytype = "SYMBOL",
                           multiVals = "first")
markers <- na.omit(markers)
If the data were measured in mouse, replace the package org.Hs.eg.db
with org.Mm.eg.db and run the same analysis.
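A minimal sketch of the mouse variant, assuming org.Mm.eg.db is installed; the three gene symbols are example input chosen for illustration and are not part of the PBMC data:

```r
# Example input: three mouse gene symbols (illustrative only)
mouse_symbols <- c("Cd3e", "Cd19", "Ms4a1")

if (requireNamespace("org.Mm.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  # Same mapIds() call as for human, with the mouse annotation package
  mouse_entrez <- AnnotationDbi::mapIds(org.Mm.eg.db::org.Mm.eg.db,
                                        keys = mouse_symbols,
                                        column = "ENTREZID",
                                        keytype = "SYMBOL",
                                        multiVals = "first")
  print(mouse_entrez)
} else {
  message("org.Mm.eg.db is not installed; skipping the mapping.")
}
```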
The input for the EasyCellType package should be a data frame
containing Entrez IDs, clusters and expression scores, with the columns
in that order. Within each cluster, the genes should be sorted by
expression score.
library(dplyr)
markers_sort <- data.frame(gene = markers$entrezid, cluster = markers$cluster,
                           score = markers$avg_log2FC) %>%
  group_by(cluster) %>%
  # rank genes within each cluster by score, breaking ties at random
  mutate(rank = rank(score, ties.method = "random")) %>%
  arrange(desc(rank))
input.d <- as.data.frame(markers_sort[, 1:3])
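To make the expected layout concrete, here is a small base-R sketch that builds a toy input of the same shape. The Entrez IDs, clusters and scores are made up, and sorting in decreasing order of score is an assumption taken from the `arrange(desc(rank))` call in the workflow code:

```r
# Toy marker table; IDs and scores are invented for illustration.
toy <- data.frame(
  gene    = c("101", "102", "103", "201", "202"),
  cluster = c(0, 0, 0, 1, 1),
  score   = c(0.8, 2.1, 1.3, 0.5, 1.9),
  stringsAsFactors = FALSE
)

# Sort by cluster, then by decreasing score within each cluster
toy_sorted <- toy[order(toy$cluster, -toy$score), ]
rownames(toy_sorted) <- NULL

# Check the layout: three columns in the order gene, cluster, score
stopifnot(identical(names(toy_sorted), c("gene", "cluster", "score")))
print(toy_sorted)
```

A quick check like the `stopifnot()` line can be run on your own input data frame before passing it to the package.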
We have included the processed data in the package. It can be loaded with
Now we can call the easyct function to run the annotation analysis.
annot.GSEA <- easyct(input.d, db="cellmarker", species="Human",
tissue=c("Blood", "Peripheral blood", "Blood vessel",
"Umbilical cord blood", "Venous blood"), p_cut=0.3,
test="GSEA")
We used the GSEA approach for the annotation. In our package, we use
the GSEA function in the clusterProfiler package (Wu et al. 2021) to
conduct the enrichment analysis. You can replace ‘GSEA’ with ‘fisher’ if
you would like to use the Fisher exact test for the annotation. The
candidate tissues can be seen using data(cellmarker_tissue),
data(clustermole_tissue) and data(panglao_tissue).
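To give a feel for what a Fisher-type test measures in this setting, the sketch below runs a plain one-sided Fisher's exact test on the overlap between one cluster's markers and one cell type's reference markers. This is the textbook test from base R, not the modified version implemented in EasyCellType, and the gene IDs and universe size are made up:

```r
# Hypothetical gene universe and marker sets (made-up IDs)
universe       <- as.character(1:1000)
cluster_genes  <- as.character(1:40)              # markers found for one cluster
celltype_genes <- as.character(c(1:15, 501:520))  # reference markers for one cell type

in_cluster  <- universe %in% cluster_genes
in_celltype <- universe %in% celltype_genes

# 2 x 2 contingency table: cluster membership vs. cell type membership
tab <- table(
  cluster  = factor(in_cluster,  levels = c(TRUE, FALSE)),
  celltype = factor(in_celltype, levels = c(TRUE, FALSE))
)

# One-sided test: is the overlap larger than expected by chance?
res <- fisher.test(tab, alternative = "greater")

overlap <- sum(in_cluster & in_celltype)
print(tab)
print(res$p.value)
```

A small p-value indicates that the cluster's markers overlap the reference set more than chance would predict, which is the kind of evidence an annotation test of this type relies on.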
The dot plot showing the overall annotation results can be created by
A bar plot can be created by
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 org.Hs.eg.db_3.20.0 AnnotationDbi_1.69.0
#> [4] IRanges_2.41.1 S4Vectors_0.45.2 Biobase_2.67.0
#> [7] BiocGenerics_0.53.3 generics_0.1.3 Seurat_5.1.0
#> [10] SeuratObject_5.0.2 sp_2.1-4 EasyCellType_1.5.4
#> [13] devtools_2.4.5 usethis_3.0.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] RcppAnnoy_0.0.22 splines_4.4.2 later_1.3.2
#> [4] ggplotify_0.1.2 tibble_3.2.1 R.oo_1.27.0
#> [7] polyclip_1.10-7 fastDummies_1.7.4 lifecycle_1.0.4
#> [10] globals_0.16.3 processx_3.8.4 lattice_0.22-6
#> [13] MASS_7.3-61 magrittr_2.0.3 plotly_4.10.4
#> [16] sass_0.4.9 rmarkdown_2.29 jquerylib_0.1.4
#> [19] yaml_2.3.10 remotes_2.5.0 httpuv_1.6.15
#> [22] ggtangle_0.0.4 sctransform_0.4.1 spam_2.11-0
#> [25] spatstat.sparse_3.1-0 sessioninfo_1.2.2 pkgbuild_1.4.5
#> [28] reticulate_1.40.0 pbapply_1.7-2 cowplot_1.1.3
#> [31] DBI_1.2.3 buildtools_1.0.0 RColorBrewer_1.1-3
#> [34] abind_1.4-8 pkgload_1.4.0 zlibbioc_1.52.0
#> [37] Rtsne_0.17 purrr_1.0.2 R.utils_2.12.3
#> [40] yulab.utils_0.1.8 GenomeInfoDbData_1.2.13 enrichplot_1.27.1
#> [43] ggrepel_0.9.6 irlba_2.3.5.1 spatstat.utils_3.1-1
#> [46] listenv_0.9.1 tidytree_0.4.6 maketools_1.3.1
#> [49] goftest_1.2-3 RSpectra_0.16-2 spatstat.random_3.3-2
#> [52] fitdistrplus_1.2-1 parallelly_1.39.0 leiden_0.4.3.1
#> [55] codetools_0.2-20 DOSE_4.1.0 tidyselect_1.2.1
#> [58] aplot_0.2.3 UCSC.utils_1.3.0 farver_2.1.2
#> [61] spatstat.explore_3.3-3 matrixStats_1.4.1 jsonlite_1.8.9
#> [64] ellipsis_0.3.2 progressr_0.15.0 ggridges_0.5.6
#> [67] survival_3.7-0 tools_4.4.2 treeio_1.31.0
#> [70] ica_1.0-3 Rcpp_1.0.13-1 glue_1.8.0
#> [73] gridExtra_2.3 xfun_0.49 qvalue_2.39.0
#> [76] GenomeInfoDb_1.43.1 withr_3.0.2 BiocManager_1.30.25
#> [79] fastmap_1.2.0 fansi_1.0.6 callr_3.7.6
#> [82] digest_0.6.37 R6_2.5.1 mime_0.12
#> [85] gridGraphics_0.5-1 colorspace_2.1-1 scattermore_1.2
#> [88] GO.db_3.20.0 tensor_1.5 spatstat.data_3.1-4
#> [91] RSQLite_2.3.8 R.methodsS3_1.8.2 utf8_1.2.4
#> [94] tidyr_1.3.1 data.table_1.16.2 httr_1.4.7
#> [97] htmlwidgets_1.6.4 org.Mm.eg.db_3.20.0 uwot_0.2.2
#> [100] pkgconfig_2.0.3 gtable_0.3.6 blob_1.2.4
#> [103] lmtest_0.9-40 XVector_0.47.0 sys_3.4.3
#> [106] clusterProfiler_4.15.0 htmltools_0.5.8.1 profvis_0.4.0
#> [109] dotCall64_1.2 fgsea_1.33.0 scales_1.3.0
#> [112] png_0.1-8 spatstat.univar_3.1-1 ggfun_0.1.7
#> [115] knitr_1.49 reshape2_1.4.4 nlme_3.1-166
#> [118] curl_6.0.1 zoo_1.8-12 cachem_1.1.0
#> [121] stringr_1.5.1 KernSmooth_2.23-24 parallel_4.4.2
#> [124] miniUI_0.1.1.1 desc_1.4.3 pillar_1.9.0
#> [127] grid_4.4.2 vctrs_0.6.5 RANN_2.6.2
#> [130] urlchecker_1.0.1 promises_1.3.0 xtable_1.8-4
#> [133] cluster_2.1.6 evaluate_1.0.1 cli_3.6.3
#> [136] compiler_4.4.2 rlang_1.1.4 crayon_1.5.3
#> [139] future.apply_1.11.3 ps_1.8.1 plyr_1.8.9
#> [142] forcats_1.0.0 fs_1.6.5 stringi_1.8.4
#> [145] deldir_2.0-4 viridisLite_0.4.2 BiocParallel_1.41.0
#> [148] munsell_0.5.1 Biostrings_2.75.1 lazyeval_0.2.2
#> [151] spatstat.geom_3.3-4 GOSemSim_2.33.0 Matrix_1.7-1
#> [154] RcppHNSW_0.6.0 patchwork_1.3.0 bit64_4.5.2
#> [157] future_1.34.0 ggplot2_3.5.1 KEGGREST_1.47.0
#> [160] shiny_1.9.1 ROCR_1.0-11 igraph_2.1.1
#> [163] memoise_2.0.1 bslib_0.8.0 ggtree_3.15.0
#> [166] fastmatch_1.1-4 bit_4.5.0 ape_5.8
#> [169] gson_0.1.0