dominoSignal is a tool for analysis of intra- and intercellular signaling in single cell RNA sequencing (scRNAseq) data based on transcription factor (TF) activation. Here, we show a basic example pipeline for generating communication networks as a domino object. This vignette will demonstrate how to use dominoSignal on the 10X Genomics Peripheral Blood Mononuclear Cells (PBMC) data set of 2,700 cells. The data can be downloaded here.
# BiocFileCache helps with managing files across sessions
bfc <- BiocFileCache::BiocFileCache(ask = FALSE)
data_url <- "https://zenodo.org/records/10951634/files"
# download the reduced pbmc files preprocessed PBMC 3K scRNAseq data set as a
# Seurat object
pbmc_url <- paste0(data_url, "/pbmc3k_sce.rds")
pbmc <- BiocFileCache::bfcrpath(bfc, pbmc_url)
# download scenic results
scenic_auc_url <- paste0(data_url, "/auc_pbmc_3k.csv")
scenic_auc <- BiocFileCache::bfcrpath(bfc, scenic_auc_url)
scenic_regulon_url <- paste0(data_url, "/regulons_pbmc_3k.csv")
scenic_regulon <- BiocFileCache::bfcrpath(bfc, scenic_regulon_url)
# download CellPhoneDB files
cellphone_url <- "https://github.com/ventolab/cellphonedb-data/archive/refs/tags/v4.0.0.tar.gz"
cellphone_tar <- BiocFileCache::bfcrpath(bfc, cellphone_url)
cellphone_dir <- paste0(tempdir(), "/cellphone")
untar(tarfile = cellphone_tar, exdir = cellphone_dir)
cellphone_data <- paste0(cellphone_dir, "/cellphonedb-data-4.0.0/data")
# directory for created inputs to pySCENIC and dominoSignal
input_dir <- paste0(tempdir(), "/inputs")
if (!dir.exists(input_dir)) {
dir.create(input_dir)
}
Analysis of cell-cell communication with dominoSignal often follows initial processing and annotation of scRNAseq data. We used the OSCA workflow to prepare the data set for analysis with dominoSignal. The complete processing script is available in the data-raw directory of the dominoSignal package. The processed data can be downloaded from Zenodo.
Installation of dominoSignal from Bioconductor can be achieved using
the {BiocManager}
package.
The TF activities are a required input for dominoSignal, and the regulons learned by SCENIC are an important but optional input needed to annotate TF-target interactions and to prune TF-receptor linkages where the receptor is a target of the TF. This prevents the distinction of receptor expression driving TF activity or the TF inducing the receptor’s expression.
regulons <- read.csv(scenic_regulon)
auc <- read.table(scenic_auc, header = TRUE, row.names = 1, stringsAsFactors = FALSE,
sep = ",")
The initial regulons data frame read into R from the ctx function has
two rows of column names that need to be replaced with one succinct
description. dominoSignal has changed the input format for TF regulons
to be a list storing vectors of target genes in each regulon, where the
names of the list are the TF genes. This facilitates the use of
alternative methods for TF activity quantification. We provide a helper
function, create_regulon_list_scenic()
for easy retrieval
of TF regulons from the output of the pySCENIC ctx
function.
regulons <- regulons[-1:-2, ]
colnames(regulons) <- c("TF", "MotifID", "AUC", "NES", "MotifSimilarityQvalue", "OrthologousIdentity",
"Annotation", "Context", "TargetGenes", "RankAtMax")
regulon_list <- create_regulon_list_scenic(regulons = regulons)
Users should be aware that the AUC matrix from SCENIC is loaded in a cell x TF
orientation and should be transposed to TF x cell orientation. pySCENIC
also appends “(+)” to TF names that are converted to “…” upon loading
into R. These characters can be included without affecting the results
of dominoSignal analysis but can be confusing when querying TF features
in the data. We recommend comprehensive removal of the “…” characters
using the gsub()
function.
dominoSignal has been updated to read ligand - receptor data bases in a uniform data frame format referred to as a receptor-ligand map (rl_map) to enable the use of alternative or updated reference databases in addition to the particular version of CellPhoneDB’s database in older versions of Domino. Each row corresponds to a ligand - receptor interaction. Genes participating in an interaction are referred to as “partner A” and “partner B” without requiring for fixed ordering of whether A or B is the ligand and vice versa. The minimum required columns of this data frame are:
Additional annotation columns can be provided such as name_A and name_B for ligands or receptors whose name in the interaction database does not match the names of their encoding genes. This formatting also allows for the consideration of ligand and receptor complexes comprised of a heteromeric combination of multiple proteins that must be co-expressed to function. In these cases, the “name_*” column shows the name of the protein complex, and the “gene_*” column shows the names of the genes encoding the components of the complex separated by commas “,”. When plotting results from the build domino object, the names of the interacting ligands and receptors will be used based on combined expression of the complex components.
To facilitate the use of this formatting with the CellPhoneDB
database, we include a helper function,
create_rl_map_cellphonedb()
, that automatically parses
files from the CellPhoneDB database to arrive at the rl_map format.
complexes <- read.csv(paste0(cellphone_data, "/complex_input.csv"), stringsAsFactors = FALSE)
genes <- read.csv(paste0(cellphone_data, "/gene_input.csv"), stringsAsFactors = FALSE)
interactions <- read.csv(paste0(cellphone_data, "/interaction_input.csv"), stringsAsFactors = FALSE)
proteins <- read.csv(paste0(cellphone_data, "/protein_input.csv"), stringsAsFactors = FALSE)
rl_map <- create_rl_map_cellphonedb(genes = genes, proteins = proteins, interactions = interactions,
complexes = complexes, database_name = "CellPhoneDB_v4.0" # database version used
)
knitr::kable(head(rl_map))
int_pair | name_A | uniprot_A | gene_A | type_A | name_B | uniprot_B | gene_B | type_B | annotation_strategy | source | database_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | RAreceptor_RXRG & atRetinoicAcid_byALDH1A3 | RAreceptor_RXRG | P48443,P29373 | RXRG,CRABP2 | R | atRetinoicAcid_byALDH1A3 | P47895 | ALDH1A3 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
5 | RAreceptor_RXRG & atRetinoicAcid_byALDH1A2 | RAreceptor_RXRG | P48443,P29373 | RXRG,CRABP2 | R | atRetinoicAcid_byALDH1A2 | O94788 | ALDH1A2 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
6 | RAreceptor_RXRG & atRetinoicAcid_byALDH1A1 | RAreceptor_RXRG | P48443,P29373 | RXRG,CRABP2 | R | atRetinoicAcid_byALDH1A1 | P00352 | ALDH1A1 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
7 | RAreceptor_RXRB & atRetinoicAcid_byALDH1A3 | RAreceptor_RXRB | P28702,P29373 | RXRB,CRABP2 | R | atRetinoicAcid_byALDH1A3 | P47895 | ALDH1A3 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
8 | RAreceptor_RXRB & atRetinoicAcid_byALDH1A2 | RAreceptor_RXRB | P28702,P29373 | RXRB,CRABP2 | R | atRetinoicAcid_byALDH1A2 | O94788 | ALDH1A2 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
9 | RAreceptor_RXRB & atRetinoicAcid_byALDH1A1 | RAreceptor_RXRB | P28702,P29373 | RXRB,CRABP2 | R | atRetinoicAcid_byALDH1A1 | P00352 | ALDH1A1 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
The change to use rl_map formatting also enables users to manually
append interactions of interest that are not included in the interaction
database if need be. This can be attained by formatting the desired
interactions as a data frame with the same column headers as the rl_map
and using the rbind()
function.
# Integrin complexes are not annotated as receptors in CellPhoneDB_v4.0
# collagen-integrin interactions between cells may be missed unless tables from
# the CellPhoneDB reference are edited or the interactions are manually added
col_int_df <- data.frame(int_pair = "a11b1 complex & COLA1_HUMAN", name_A = "a11b1 complex",
uniprot_A = "P05556,Q9UKX5", gene_A = "ITB1,ITA11", type_A = "R", name_B = "COLA1_HUMAN",
uniprot_B = "P02452,P08123", gene_B = "COL1A1,COL1A2", type_B = "L", annotation_strategy = "manual",
source = "manual", database_name = "manual")
rl_map_append <- rbind(col_int_df, rl_map)
knitr::kable(head(rl_map_append))
int_pair | name_A | uniprot_A | gene_A | type_A | name_B | uniprot_B | gene_B | type_B | annotation_strategy | source | database_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | a11b1 complex & COLA1_HUMAN | a11b1 complex | P05556,Q9UKX5 | ITB1,ITA11 | R | COLA1_HUMAN | P02452,P08123 | COL1A1,COL1A2 | L | manual | manual | manual |
4 | RAreceptor_RXRG & atRetinoicAcid_byALDH1A3 | RAreceptor_RXRG | P48443,P29373 | RXRG,CRABP2 | R | atRetinoicAcid_byALDH1A3 | P47895 | ALDH1A3 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
5 | RAreceptor_RXRG & atRetinoicAcid_byALDH1A2 | RAreceptor_RXRG | P48443,P29373 | RXRG,CRABP2 | R | atRetinoicAcid_byALDH1A2 | O94788 | ALDH1A2 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
6 | RAreceptor_RXRG & atRetinoicAcid_byALDH1A1 | RAreceptor_RXRG | P48443,P29373 | RXRG,CRABP2 | R | atRetinoicAcid_byALDH1A1 | P00352 | ALDH1A1 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
7 | RAreceptor_RXRB & atRetinoicAcid_byALDH1A3 | RAreceptor_RXRB | P28702,P29373 | RXRB,CRABP2 | R | atRetinoicAcid_byALDH1A3 | P47895 | ALDH1A3 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
8 | RAreceptor_RXRB & atRetinoicAcid_byALDH1A2 | RAreceptor_RXRB | P28702,P29373 | RXRB,CRABP2 | R | atRetinoicAcid_byALDH1A2 | O94788 | ALDH1A2 | L | curated | uniprot;reactome | CellPhoneDB_v4.0 |
dominoSignal analysis takes place in two steps.
create_domino()
initializes the domino result object
assesses differential TF activity across cell clusters by Wilcoxon rank-sum test
establishes TF-receptor linkages based on Spearman correlation of TF activities with receptor expression across the queried data set.
build_domino()
sets the parameters for which TFs and receptors are called as active within a cell cluster
aggregates the scaled expression of ligands capable of interacting with the active receptors for assessment of ligand type and cellular source triggering activation of a receptor.
dominoSignal infers active receipt of signals via receptors based on
the correlation of the receptor’s expression with TF activity across the
data set and differential activity of the TF within a cell cluster.
Correlations are conducted using scaled expression values rather than
raw counts or normalized counts. For assessment of receptor activity on
a per cell type basis, a named vector of cell cluster assignments, where
the names are cell barcodes matching the expression matrix, are
provided. Assessing signaling based on other categorical groupings of
cells can be achieved by passing these groupings as “clusters” to
build_domino()
in place of cell types. dominoSignal accepts
a matrix of counts, a matrix of scaled counts, and a
named vector of cell cluster labels as factors. Shown
below is how to extract these elements from a SingleCellExperiment
object. Note that since the data is scaled by gene, genes with no
expression in any cell need to be removed.
counts = assay(pbmc, "counts")
logcounts = assay(pbmc, "logcounts")
logcounts = logcounts[rowSums(logcounts) > 0, ]
z_scores = t(scale(t(logcounts)))
clusters = factor(pbmc$cell_type)
names(clusters) = colnames(pbmc)
Note: Ligand and receptor expression can only be assessed for genes included in the z_scores matrix. Many scRNAseq analysis pipelines recommend storing only genes with high variance in scaled expression slots for these data objects, thereby missing many genes encoding ligands and receptors. Ensure that all your genes of interest are included in the rows of your z_scores matrix. Scaled expression was calculated for all genes in this PBMC data set after removal of genes expressed in less than three cells.
At this point, the create_domino()
function can be used
to make the object. Parameters of note include “use_clusters” which is
required to assess signaling between cell types rather than linkage of
TFs and receptors broadly across the data set. “use_complexes” decides
if receptors that function in heteromeric complexes will be considered
in testing linkages between TFs and receptors. If TRUE, a receptor
complex is only linked to a TF if a majority of the component genes meet
the Spearman correlation threshold. “remove_rec_dropout” decides whether
receptor values of zero will be considered in correlation calculations;
this measure is intended to reduce the effect of dropout on receptors
with low expression.
To run create_domino()
with matrix and vector
inputs:
pbmc_dom <- create_domino(rl_map = rl_map, features = auc_in, counts = counts, z_scores = z_scores,
clusters = clusters, tf_targets = regulon_list, use_clusters = TRUE, use_complexes = TRUE,
remove_rec_dropout = FALSE)
# rl_map: receptor - ligand map data frame features: TF activation scores (AUC
# matrix) counts: counts matrix z_scores: scaled expression data clusters:
# named vector of cell cluster assignments tf_targets: list of TFs and their
# regulons use_clusters: assess receptor activation and ligand expression on a
# per-cluster basis use_complexes: include receptors and genes that function as
# a complex in results remove_rec_dropout: whether to remove zeroes from
# correlation calculations
For information on what is stored in a domino object and how to access it, please see our vignette on the structure of domino objects.
build_domino()
finalizes the construction of the domino
object by setting parameters for identifying TFs with differential
activation between clusters, receptor linkage with TFs based on
magnitude of positive correlation, and the minimum percentage of cells
within a cluster that have expression of a receptor for the receptor to
be called as active.
There are also options for thresholds of the number of TFs that may be called active in a cluster and the number of receptors that may be linked to any one TF. For thresholds of n TFs and m receptors, the bottom n TFs by lowest p-values from the Wilcoxon rank sum test and the top m receptors by Spearman correlation coefficient are chosen.
pbmc_dom <- build_domino(dom = pbmc_dom, min_tf_pval = 0.001, max_tf_per_clust = 25,
max_rec_per_tf = 25, rec_tf_cor_threshold = 0.25, min_rec_percentage = 0.1)
# min_tf_pval: Threshold for p-value of DE for TFs rec_tf_cor_threshold:
# Minimum correlation between receptor and TF min_rec_percentage: Minimum
# percent of cells that must express receptor
Both thresholds for the number of receptors and TFs can be sent to infinity (Inf) to collect all receptors and TFs that meet statistical significance thresholds.
Multiple functions are available to visualize the intracellular networks between receptors and TFs and the ligand - receptor mediated intercellular networks between cell types.
Enrichment of TF activities by cell types can be visualized by
feat_heatmap()
which plots data set-wide TF activity scores
as a heatmap.
The cumulative degree of signaling between clusters is assessed as
the sum of the scaled expression of ligands targeting active receptors
on another cluster. This can be visualized in graph format using the
signaling_network()
function. Nodes represent each cell
cluster and edges scale with the magnitude of signaling between the
clusters. The color of the edge corresponds to the sender cluster for
that signal.
Signaling networks can also be drawn with the edges only rendering the signals directed towards a given cell type or signals from one cell type directed to others. To see those options in use, as well as other plotting functions and their options, please see our plotting vignette.
Beyond the aggregated degree of signaling between cell types, the
degrees of signaling through specific ligand - receptor interactions can
be assessed. gene_network()
provides a graph of linkages
between active TFs in a cluster, the linked receptors in that cluster,
and the possible ligands of these active receptors.
New to dominoSignal, gene_network()
can be used between
two clusters to determine if any of the possible ligands for a given
receptor are expressed by a putative outgoing signaling cluster.
gene_network(pbmc_dom, clust = "dendritic_cell", OutgoingSignalingClust = "CD14_monocyte",
layout = "grid")
A comprehensive assessment of ligand expression targeting active
receptors on a given cluster can be assessed with
incoming_signaling_heatmap()
.
incoming_signaling_heatmap(pbmc_dom, rec_clust = "dendritic_cell", max_thresh = 2.5,
use_raster = FALSE)
Another form of comprehensive ligand expression assessment is available for individual active receptors in the form of circos plots new to dominoSignal. The outer arcs correspond to clusters in the domino object with inner arcs representing each possible ligand of the plotted receptor. Arcs are drawn between ligands on a cell type and the receptor if the ligand is expressed above the specified threshold. Arc widths correspond to the mean express of the ligand by the cluster with the widest arc width scaling to the maximum expression of the ligand within the data.
Since dominoSignal is a package still being developed, there are new functions and features that will be implemented in future versions. In the meantime, we have put together further information on plotting and the domino object structure if you would like to explore more of the package’s functionality. Additionally, if you find any bugs, have further questions, or want to share an idea, please let us know here.
Date last built and session information:
Sys.Date()
#> [1] "2024-10-30"
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] grid stats4 stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] knitr_1.48 ComplexHeatmap_2.21.1
#> [3] circlize_0.4.16 plyr_1.8.9
#> [5] SingleCellExperiment_1.27.2 SummarizedExperiment_1.35.5
#> [7] Biobase_2.67.0 GenomicRanges_1.57.2
#> [9] GenomeInfoDb_1.41.2 IRanges_2.39.2
#> [11] S4Vectors_0.43.2 BiocGenerics_0.53.0
#> [13] MatrixGenerics_1.17.1 matrixStats_1.4.1
#> [15] dominoSignal_1.1.0 rmarkdown_2.28
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.3 httr2_1.0.5 formatR_1.14
#> [4] biomaRt_2.63.0 rlang_1.1.4 magrittr_2.0.3
#> [7] clue_0.3-65 GetoptLong_1.0.5 compiler_4.4.1
#> [10] RSQLite_2.3.7 png_0.1-8 vctrs_0.6.5
#> [13] stringr_1.5.1 pkgconfig_2.0.3 shape_1.4.6.1
#> [16] crayon_1.5.3 fastmap_1.2.0 backports_1.5.0
#> [19] dbplyr_2.5.0 XVector_0.45.0 utf8_1.2.4
#> [22] UCSC.utils_1.1.0 purrr_1.0.2 bit_4.5.0
#> [25] xfun_0.48 zlibbioc_1.51.2 cachem_1.1.0
#> [28] jsonlite_1.8.9 progress_1.2.3 blob_1.2.4
#> [31] highr_0.11 DelayedArray_0.31.14 broom_1.0.7
#> [34] parallel_4.4.1 prettyunits_1.2.0 cluster_2.1.6
#> [37] R6_2.5.1 bslib_0.8.0 stringi_1.8.4
#> [40] RColorBrewer_1.1-3 car_3.1-3 jquerylib_0.1.4
#> [43] Rcpp_1.0.13 iterators_1.0.14 igraph_2.1.1
#> [46] Matrix_1.7-1 tidyselect_1.2.1 abind_1.4-8
#> [49] yaml_2.3.10 doParallel_1.0.17 codetools_0.2-20
#> [52] curl_5.2.3 lattice_0.22-6 tibble_3.2.1
#> [55] withr_3.0.2 KEGGREST_1.45.1 evaluate_1.0.1
#> [58] BiocFileCache_2.15.0 xml2_1.3.6 Biostrings_2.75.0
#> [61] pillar_1.9.0 ggpubr_0.6.0 filelock_1.0.3
#> [64] carData_3.0-5 foreach_1.5.2 generics_0.1.3
#> [67] hms_1.1.3 ggplot2_3.5.1 munsell_0.5.1
#> [70] scales_1.3.0 glue_1.8.0 maketools_1.3.1
#> [73] tools_4.4.1 sys_3.4.3 ggsignif_0.6.4
#> [76] buildtools_1.0.0 tidyr_1.3.1 AnnotationDbi_1.69.0
#> [79] colorspace_2.1-1 GenomeInfoDbData_1.2.13 Formula_1.2-5
#> [82] cli_3.6.3 rappdirs_0.3.3 fansi_1.0.6
#> [85] S4Arrays_1.5.11 dplyr_1.1.4 gtable_0.3.6
#> [88] rstatix_0.7.2 sass_0.4.9 digest_0.6.37
#> [91] SparseArray_1.5.45 rjson_0.2.23 memoise_2.0.1
#> [94] htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7
#> [97] GlobalOptions_0.1.2 bit64_4.5.2