The detection of tumor-host interaction is a major challange in cancer understanding. More recent advances in single cell next generation sequencing have provided new insights in unveiling tumor microenvironment (TME) cells comunications. The scTHI package provides some useful functions to identify and visualize the cell types making up the TME, and a rank-based approach to discover interesting ligand-receptor interactions establishing among malignant and non tumor cells
The scTHI package requires a matrix of count data (or normalized counts) from single cell RNA-seq experiments, where rows are genes presented with Hugo Symbols and columns are cells. For a practical demonstration, we will use the scRNA-seq data available on the Broad Institute Single-Cell Portal [1]. The dataset provides 3,321 scRNA-seq profiles from six primary Glioma (H3K27M-glioma), and includes both malignant cells and several tumor microenvironment cell types, as immune cells and oligodendrocytes. Our goal is to identify the significant ligand-receptor interactions that are established between cancer and immune cells present in the tumor microenvironment. For this reason, after downloading H3K27M-glioma data we preprocessed them, selecting only one sample of interest (i.e. BCH836), removing not-expresse genes, transforming data in log TPM , and applying quantile normalization. Processed are included in the package and can be loaded as follows:
H3K27.meta includes the annotation of each cell, so as to identify the tumor cells and the immune cells in BCH836 patient:
##
## Filter Immune cell Malignant Oligodendrocyte
## 2 53 438 34
We will use the cells annotated as Immune and a subset of Malignant.
The function scTHI.score
can be used to detect
significant ligand-receptor interactions occurring between cancer and
immune cells. The function uses a dataset of 1044 ligand-receptor pairs
to identify those which are significantly enriched between pairs
clusters of your dataset. A score is computed for each interaction pair.
This score does not simply reflect the average expression of the two
partners in the interaction pair, it is rather based on the percentage
of cells that express the interaction pair at the top of the ordered
gene list. It considers a gene expressed in a cell, only if it is in the
top 10% of the ranked gene list, as set in topRank
argument.
Interactions are not symmetric, so partnerA expression is considered on
the first cluster (in the example the Malignant cells), and partnerB
expression is considered on the second cluster (i.e. Immune cells). A
p-value for each pair is computed, based on a null model from random
permutations of data, and and only significant interaction pairs are
reported.
H3K27.result <-
scTHI_score(
expMat = H3K27,
cellCusterA = Malignant,
cellCusterB = Immune,
cellCusterAName = "Malignant",
cellCusterBName = "Immune",
topRank = 10,
PValue = TRUE,
pvalueCutoff = 0.05,
nPermu = 10,
ncore = 1
)
## Warning: Not all interaction genes are in expMat
## Computing score for 2333 interaction pairs...
## [1] "Computed ranked values for partner A"
## [1] "Computed ranked values for partner B"
## Removing low score interactions...
## Computing permutation....
## Interaction pairs detected:
## [1] "RPS19_C5AR1" "THY1_ITGAX:ITGB2" "B2M_HLA-F" "PTN_PLXNB2"
## [5] "B2M_LILRB1" "PSAP_LRP1" "SERPINE2_LRP1" "KLRC2_HLA-E"
## [9] "APP_CD74" "VCAN_ITGB1" "CALR_LRP1" "HLA-A_APLP2"
## [13] "THY1_ITGB2:ITGAM" "VCAN_TLR2" "HLA-A_LILRB1" "CHAD_ITGB1"
## [17] "APP_LRP1" "APP_NCSTN"
For which partner A and partner B are expressed in at least 50% of
the cells of the respective clusters, as set in
filterCutoff
argument. P-value computation is optional and
is estimated by permuting the mates of interaction pairs. Result also
includes two columns with the average expression values of each partner
in the respective clusters.
## interationPair interactionType partnerA partnerB
## RPS19_C5AR1 RPS19_C5AR1 simple RPS19 C5AR1
## THY1_ITGAX:ITGB2 THY1_ITGAX:ITGB2 complex THY1 ITGAX:ITGB2
## B2M_HLA-F B2M_HLA-F simple B2M HLA-F
## PTN_PLXNB2 PTN_PLXNB2 simple PTN PLXNB2
## B2M_LILRB1 B2M_LILRB1 simple B2M LILRB1
## PSAP_LRP1 PSAP_LRP1 simple PSAP LRP1
## rnkPartnerA_Malignant expValueA_Malignant rnkPartnerB_Immune
## RPS19_C5AR1 0.99 9.63 0.89
## THY1_ITGAX:ITGB2 0.81 7.12 0.92
## B2M_HLA-F 0.98 9.63 0.66
## PTN_PLXNB2 0.98 9.57 0.62
## B2M_LILRB1 0.98 9.63 0.60
## PSAP_LRP1 0.98 9.09 0.60
## expValueB_Immune SCORE pValue FDR
## RPS19_C5AR1 7.82 0.940 0 0
## THY1_ITGAX:ITGB2 8.08 # 9.18 0.865 0 0
## B2M_HLA-F 6.34 0.820 0 0
## PTN_PLXNB2 6.23 0.800 0 0
## B2M_LILRB1 5.91 0.790 0 0
## PSAP_LRP1 6.3 0.790 0 0
The function scTHI.plotResult
with argument
plotType
as “score” displays the score of each significant
pair through barplots. The number of asterisks indicates significance.
Otherwise, setting the argument plotType
as “pair”, for
each pair are shown bar plots displaying the percentage of cells in each
cluster expressing partner A and partner B, respectively.
Another useful way to explore results is the visualization of the entire data set through the t-SNE plot. scTHI allows the user to compute the non-linear dimensionality reduction of data using the scTHI.runTsne function based on the Rtsne R package [2].
## Loading required namespace: Rtsne
Users can decide to display on the tsne the two input clusters, i.e. Maglignant and Immune cells, or to observe the intensity of the expression of an intresting interaction pair in the whole dataset. The scTHI.plotCluster function differently labels cells on t-SNE if they belong to ClusterA or ClusterB.
On the other hand the scTHI.plotPairs function colors the cells on the t-SNE based on the expression levels of a user-defined interaction pair. For example, let’s try plotting the expression of the interaction pair consisting of THY1, which should be expressed by the cancer cells and the ITGAX-ITGB2 complex, instead expressed by the immune cells.
scTHI_plotPairs(
scTHIresult = H3K27.result,
cexPoint = 0.8,
interactionToplot = "THY1_ITGAX:ITGB2"
)
The THY1 gene appears uniformly expressed in tumor cells, and in according to results table, it was expressed in 861% of malignant cells at the top 10% of the ranked gene list of each cell. On the other hand the ITGAX-ITGB2 complex is highly specific of the Immune cluster, where it is expressed by 92% of the cells at the top 10% of the ranked gene list of each cell.
Another important problem in study tumor-host interaction is the
classification of all cell types that make up the tumor
microenvironment. Generally, tools for single cell data analysis are
based on the use of marker genes to classify different cell types
identified by clustering approaches. This strategy is not always the
most accurate, our approach uses a different method for the
classification of TME based on gene signature enrichment analysis. The
TME.classification
implements the Mann-Whitney-Wilcoxon
Gene Set Test (MWW-GST) algorithm [3] and tests for each cell the
enrichment of a collection of signatures of different cell types, with
greater interest for the Immune System and the Central Nervous System
cells. Thus, each cell is associated with a specific phenotype based on
the more significant enriched gene set. Let’s try to classify the TME
cells of the BCH836 patient, excluding malignant cells from the
dataset.
Tme.cells <-
rownames(H3K27.meta)[which(H3K27.meta$Type != "Malignant")]
H3K27.tme <- H3K27[, Tme.cells]
Note that TME.classification
function can take a long
time if the number of cells to test is large.
The H3K27M dataset annotation table suggests that non-tumor cells from BCH836 patient are mainly composed by Oligodendrocyte and Immune cells, respectively. The TME.classification function identifies a similar number of Oligodendrocytes and Immune cells, distinguishing the latter mainly in Microglia, Monocytes and Macrophages and fewer other Immune cells.
##
## Filter Immune cell Oligodendrocyte
## Adipocytes 0 0 1
## Bcells 0 1 0
## Eosinophils 0 1 0
## Macrophages 0 4 0
## MacrophagesM1 0 5 0
## Microglia 0 19 0
## Monocytes 0 20 0
## NaturalKiller 0 1 0
## Neutrophils 0 1 0
## Oligodendrocytes 2 0 32
## Tregs 0 1 0
## nc 0 0 1
Finally, using the previously calculated t-SNE or a set of user-defined coordinates, you can view the new classification of the TME.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] scTHI.data_1.18.0 scTHI_1.19.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] cli_3.6.3 knitr_1.49 rlang_1.1.4
## [4] xfun_0.49 jsonlite_1.8.9 Rtsne_0.17
## [7] buildtools_1.0.0 htmltools_0.5.8.1 maketools_1.3.1
## [10] sys_3.4.3 sass_0.4.9 rmarkdown_2.29
## [13] evaluate_1.0.1 jquerylib_0.1.4 fastmap_1.2.0
## [16] yaml_2.3.10 lifecycle_1.0.4 BiocManager_1.30.25
## [19] compiler_4.4.2 codetools_0.2-20 Rcpp_1.0.13-1
## [22] BiocParallel_1.41.0 digest_0.6.37 R6_2.5.1
## [25] parallel_4.4.2 bslib_0.8.0 tools_4.4.2
## [28] cachem_1.1.0
Filbin, M. G., Tirosh, I., Hovestadt, V., Shaw, M. L., Escalante, L. E., Mathewson, N. D., … & Haberler, C. (2018). Developmental and oncogenic programs in H3K27M gliomas dissected by single-cell RNA-seq. Science, 360(6386), 331-335.
Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning research, 9(Nov), 2579-2605.
Frattini, V., Pagnotta, S. M., Fan, J. J., Russo, M. V., Lee, S. B., Garofano, L., … & Frederick, V. (2018). A metabolic function of FGFR3-TACC3 gene fusions in cancer. Nature, 553(7687), 222.