BulkSignalR :
Inference of ligand-receptor interactions from bulk data or spatial transcriptomics

library(BulkSignalR)
library(igraph)
library(dplyr)
library(STexampleData)

Introduction

What is it for?

BulkSignalR is a tool that enables the inference of L-R interactions from bulk expression data, i.e., from transcriptomics (RNA-seq or microarrays) or expression proteomics.

BulkSignalR also applies to spatial transcriptomics such as 10x Genomics VISIUM (TM), and a set of functions dedicated to spatial data have been added to better support this particular use of the package.

Starting point

There are a variety of potential data sources prior to using BulkSignalR (proteomics, sequencing, etc.) that result in the generation of a matrix of numbers representing the expression levels of genes or proteins. This matrix may be normalized already or not. In every case, the latter matrix is the starting point of using BulkSignalR.

It is mandatory that genes/proteins are represented as rows of the expression matrix and the samples as columns. HUGO gene symbols (even for proteins) must be used to ensure matching LRdb, Reactome, and GOBP contents.

You can also directly work with an object from the SummarizedExperiment or SpatialExperiment Bioconductor classes.

How does it work?

As represented in the figure below, only a few steps are required in order to perform a BulkSignalR analysis.

Three S4 objects will be sequentially constructed:
* BSR-DataModel, denoted bsrdm
* BSR-Inference, denoted bsrinf
* BSR-Signature, denoted bsrsig


BSRDataModel integrates the expression data matrix and the parameters of the statistical model learned on this matrix.

BSRInference provides a list triples (ligand, receptor, pathway downstream the receptor) with their statistical significance. If a receptor occurs in multiple pathways, the corresponding number of triples will be described in the BSR-Inference object. Genes (or proteins) targeted by the pathway are also described in this object.

BSRSignature contains gene signatures associated with the triples (ligand, receptor, downstream pathway) stored in a BSRInference object. Those signatures are comprised of the ligand and the receptor obviously, but also all the pathway target genes that were used by the statistical model. Gene signatures are meant to report the L-R interaction as a global phenomenon integrating its downstream effect. Indeed, signatures can be scored with a dedicated function allowing the user to set an arbitrary weight on the downtream component. Those scores represent the activity of the L-R interactions across the samples, they are returned as a matrix.

Because of the occurrence of certain receptors in multiple pathways, but also because some ligands may bind several receptors, or vice versa, BSR-Inference objects may contain redundant data depending on how the user want to look at them. We therefore provide a range of reduction operators meant to obtain reduced BSR-Inference objects (see below).


Furthermore, we provide several handy functions to explore the data through different plots (heatmaps, alluvial plots, chord diagrams or networks).

BulkSignalR package functions have many parameters that can be changed by the user to fit specific needs (see Reference Manual for details).

Parallel mode settings

Users can reduce compute time by using several processors in parallel.

library(doParallel)
n.proc <- 1
cl <- makeCluster(n.proc)
registerDoParallel(cl)

# To add at the end of your script
# stopCluster(cl)

Notes : For operating systems that can fork such as all the UNIX-like systems, it might be preferable to use the library doMC that is faster (less overhead). This is transparent to BulkSignalR.

In case you need to reproduce results exactly and since statistical model parameter learning involves the generation of randomized expression matrices, you might want to use set.seed(). In a parallel mode, iseed that is a parameter of clusterSetRNGStream must be used.

First Example

Loading the data

Here, we load salivary duct carcinoma transcriptomes integrated as sdc in BulkSignalR.

data(sdc)
head(sdc)
##          SDC16 SDC15 SDC14 SDC13 SDC12 SDC11 SDC10 SDC9 SDC8 SDC7 SDC6 SDC5
## TSPAN6    1141  2491  1649  3963  2697  3848  3115  495 1777 1997 1682 1380
## TNMD         3     0     2    48     0     0    10    0  195    0    4    0
## DPM1      1149  2105  1579  2597  2569  2562  1690  964  980 3323 3409  743
## SCYL3     1814   768  3513  1402  1007  1243  2501 3080  939 1190 1489 1097
## C1orf112   438  1016   507  1149   598  1017   464  531  388 1088 1719  625
## FGR        381   922   307   256   249   902   304  355  410 1054  265  121
##          SDC4 SDC3 SDC2 SDC1 SDC17 SDC19 SDC20  N20 SDC21 SDC22  N22 SDC23
## TSPAN6   1392 2490 1841 1883  1915  4093  3127 8035  4738  2994 9782  3866
## TNMD        0    0    0    1     5     0    18   21     6     2   27     0
## DPM1     1147 1705 1972 3048  2405  2943  2710 1688  2654  2772 2018  1610
## SCYL3     507 1545 2450 2006   910  1842  2947 1058  2636  4199 1173  2980
## C1orf112  318 1338 3065  723   509  1320   767  179   718  1498  147   629
## FGR       408  813  787 1152  1049  4205   722  149   584   701  164   842
##          SDC24 SDC25
## TSPAN6    3014  1765
## TNMD        14     3
## DPM1      2875  2357
## SCYL3     2079   903
## C1orf112  1008   521
## FGR        676   654

Building a BSRDataModel object

BSRDataModel creates the first object with information relative to your bulk expression data.

bsrdm <- BSRDataModel(counts = sdc)
bsrdm
## Expression values are log2-transformed: FALSE
## Normalization method: UQ
## Organism: hsapiens
## Statistical model parameters:
## List of 1
##  $ spatial.smooth: logi FALSE
## Expression data:
##                SDC16     SDC15       SDC14     SDC13     SDC12     SDC11
## TSPAN6   1285.996070 2245.8788 1651.184018 3319.4416 2573.9214 3565.2017
## TNMD        3.381234    0.0000    2.002649   40.2052    0.0000    0.0000
## DPM1     1295.012694 1897.8623 1581.091307 2175.2687 2451.7627 2373.7128
## SCYL3    2044.519606  692.4267 3517.652793 1174.3268  961.0452 1151.6491
## C1orf112  493.660192  916.0228  507.671496  962.4119  570.7100  942.2584
## FGR       429.416742  831.2727  307.406606  214.4277  237.6368  835.7100
##                SDC10      SDC9
## TSPAN6   2996.458012  562.3750
## TNMD        9.619448    0.0000
## DPM1     1625.686690 1095.2111
## SCYL3    2405.823913 3499.2222
## C1orf112  446.342381  603.2750
## FGR       292.431215  403.3194

learnParameters updates a BSR-DataModel object with the parameters necessary for BulkSignalR statistical model.

bsrdm <- learnParameters(bsrdm,quick=TRUE)
bsrdm
## Expression values are log2-transformed: FALSE
## Normalization method: UQ
## Organism: hsapiens
## Statistical model parameters:
## List of 11
##  $ spatial.smooth: logi FALSE
##  $ n.rand.LR     : int 5
##  $ n.rand.RT     : int 2
##  $ with.complex  : logi TRUE
##  $ max.pw.size   : num 400
##  $ min.pw.size   : num 5
##  $ min.positive  : num 4
##  $ quick         : logi TRUE
##  $ min.corr.LR   : num -1
##  $ LR.0          :List of 2
##   ..$ n    : int 15445
##   ..$ model:List of 7
##   .. ..$ mu     : num 0.0143
##   .. ..$ sigma  : num 0.264
##   .. ..$ factor : num 1
##   .. ..$ start  : num 6.26e-05
##   .. ..$ distrib: chr "censored_normal"
##   .. ..$ D      : num 0.488
##   .. ..$ Chi2   : num 9.51e-05
##  $ RT.0          :List of 2
##   ..$ n    : int 15445
##   ..$ model:List of 7
##   .. ..$ mu     : num 0.0143
##   .. ..$ sigma  : num 0.264
##   .. ..$ factor : num 1
##   .. ..$ start  : num 6.26e-05
##   .. ..$ distrib: chr "censored_normal"
##   .. ..$ D      : num 0.488
##   .. ..$ Chi2   : num 9.51e-05
## Expression data:
##                SDC16     SDC15       SDC14     SDC13     SDC12     SDC11
## TSPAN6   1285.996070 2245.8788 1651.184018 3319.4416 2573.9214 3565.2017
## TNMD        3.381234    0.0000    2.002649   40.2052    0.0000    0.0000
## DPM1     1295.012694 1897.8623 1581.091307 2175.2687 2451.7627 2373.7128
## SCYL3    2044.519606  692.4267 3517.652793 1174.3268  961.0452 1151.6491
## C1orf112  493.660192  916.0228  507.671496  962.4119  570.7100  942.2584
## FGR       429.416742  831.2727  307.406606  214.4277  237.6368  835.7100
##                SDC10      SDC9
## TSPAN6   2996.458012  562.3750
## TNMD        9.619448    0.0000
## DPM1     1625.686690 1095.2111
## SCYL3    2405.823913 3499.2222
## C1orf112  446.342381  603.2750
## FGR       292.431215  403.3194

Building a BSRInference object

From the previous object bsrdm, you can generate inferences by calling its method BSRInference. The resulting BSR-Inference object, bsrinf, contains all the inferred L-R interactions with their associated pathways and corrected p-values.

From here, you can already access LR interactions using: LRinter(bsrinf).

# We use a subset of the reference to speed up
# inference in the context of the vignette.
subset <- c("REACTOME_BASIGIN_INTERACTIONS",
"REACTOME_SYNDECAN_INTERACTIONS",
"REACTOME_ECM_PROTEOGLYCANS",
"REACTOME_CELL_JUNCTION_ORGANIZATION")

reactSubset <- BulkSignalR:::.SignalR$BulkSignalR_Reactome[
BulkSignalR:::.SignalR$BulkSignalR_Reactome$`Reactome name` %in% subset,]

resetPathways(dataframe = reactSubset,
resourceName = "Reactome")

bsrinf <- BSRInference(bsrdm,
    min.cor = 0.3,
    reference="REACTOME")

LRinter.dataframe <- LRinter(bsrinf)

head(LRinter.dataframe[
    order(LRinter.dataframe$qval),
    c("L", "R", "LR.corr", "pw.name", "qval")])
##           L     R   LR.corr                    pw.name         qval
## 8191  EDIL3 ITGB5 0.7299145 REACTOME_ECM_PROTEOGLYCANS 3.864444e-07
## 22111 TGFB3 ITGB5 0.7360684 REACTOME_ECM_PROTEOGLYCANS 3.864444e-07
## 18341  PLAU ITGB5 0.6389744 REACTOME_ECM_PROTEOGLYCANS 7.001329e-07
## 16001 LTBP3 ITGB5 0.5712821 REACTOME_ECM_PROTEOGLYCANS 1.022390e-06
## 21941 TGFB1 ITGB5 0.3996581 REACTOME_ECM_PROTEOGLYCANS 3.386708e-06
## 15991 LTBP1 ITGB5 0.3736752 REACTOME_ECM_PROTEOGLYCANS 3.389314e-06

You can finally filter out non-significant L-R interactions and order them by best Q-value before saving them to a file for instance.

write.table(LRinter.dataframe[order(LRinter.dataframe$qval), ],
    "./sdc_LR.tsv",
    row.names = FALSE,
    sep = "\t",
    quote = FALSE
)

Reduction strategies

The output of BSRInference is exhaustive and can thus contain redundancy due to intrinsic redundancy in the reference databases (Reactome, KEGG, GOBP) and multilateral interactions in LRdb. To address this phenomenon, we propose several strategies.

Reducing a BSRInference object to pathway

With reduceToPathway, all the L-R interactions with the receptor included in a certain pathway are aggregated to only report each downstream pathway once. For a given pathway, the reported P-values and target genes are those of best (minimum P-value) L-R interaction that was part of the aggregation. Nothing is recomputed, we simply merge.

bsrinf.redP <- reduceToPathway(bsrinf)

Reducing a BSRInference object to best pathway

With reduceToBestPathway, a BSR-Inference object is reduced to only report one pathway per L-R interaction. The pathway with the smallest P-value is selected. A same pathways might occur multiple times with with different L-R interactions.

bsrinf.redBP <- reduceToBestPathway(bsrinf)

Reducing to ligands or receptors

As already mentioned, several ligands might bind a single receptor (or several shared receptors) and the converse is true as well. Two reduction operators enable users to either aggregate all the ligands of a same receptor or all the receptors bound by a same ligand:

bsrinf.L <- reduceToLigand(bsrinf)
bsrinf.R <- reduceToReceptor(bsrinf)

Combined reductions

Combinations are possible.

For instance, users can apply reduceToPathway and reduceToBestPathway reductions sequentially to maximize the reduction effect. In case the exact same sets of aggregated ligands and receptors obtained with reduceToPathway was associated with several pathways, the pathway with the best P-value would be kept by reduceToBestPathway.

bsrinf.redP <- reduceToPathway(bsrinf)
bsrinf.redPBP <- reduceToBestPathway(bsrinf.redP)

Building a BSRSignature object

Gene signatures for a given, potentially reduced BSR-Inference object are generated by BSRSignature, which returns a BSRSignature object.

To follow the activity of L-R interactions across the samples of the dataset, scoreLRGeneSignatures computes a score for each gene signature. Then, heatmaps can be generated to represent differences, e.g., using the utility function a simpleHeatmap.

Hereafter, we show different workflows of reductions combined with gene signature scoring and display.

Scoring by ligand-receptor

bsrsig.redBP <- BSRSignature(bsrinf.redBP, qval.thres = 0.001)

scoresLR <- scoreLRGeneSignatures(bsrdm, bsrsig.redBP,
    name.by.pathway = FALSE
)

simpleHeatmap(scoresLR[1:20, ],
    hcl.palette = "Cividis",
    pointsize=8,width=4,height=3)

Scoring by pathway

bsrsig.redPBP <- BSRSignature(bsrinf.redPBP, qval.thres = 0.01)

scoresPathway <- scoreLRGeneSignatures(bsrdm, bsrsig.redPBP,
    name.by.pathway = TRUE
)

simpleHeatmap(scoresPathway,
    hcl.palette = "Blue-Red 2",
    pointsize=8,width=6,height=2)

Other visualization utilities

Heatmap of ligand-receptor-target genes expression

After computing gene signatures score, one may wish to look at the expression of the genes involved in that signature. For instance, we can display three heatmaps corresponding to the scaled (z-scores) expression of ligands (pink), receptors (green), and target genes (blue).

On the top of each heatmap, the whole signature score from scoreLRGeneSignatures is reported for reference.

pathway1 <- pathways(bsrsig.redPBP)[1]
signatureHeatmaps(
    pathway = pathway1,
    bsrdm = bsrdm,
    bsrsig = bsrsig.redPBP,
    h.width = 6,
    h.height = 8,
    fontsize = 4,
    show_column_names = TRUE
)

AlluvialPlot

alluvial.plot is a function that enable users to represent the different interactions between ligands, receptors, and pathways stored in a BSRInference object.

Obviously, it is possible to filter by ligand, receptor, or pathway. This is achieved by specifying a key word on the chosen category. A filter on L-R interaction Q-values can be applied in addition.

alluvialPlot(bsrinf,
    keywords = c("LAMC1"),
    type = "L",
    qval.thres = 0.01
)

BubblePlot

bubblePlotPathwaysLR is a handy way to visualize the strengths of several L-R interactions in relation with their receptor downstream pathways.

A vector of pathways of interest can be provided to limit the complexity of the plot.

pathways <- LRinter(bsrinf)[1,c("pw.name")]
bubblePlotPathwaysLR(bsrinf,
    pathways = pathways,
    qval.thres = 0.001,
    color = "red",
    pointsize = 8
)
## 17 LR interactions detected.

Chordiagram

chord.diagram.LR is a function that enable users to feature the different L-R interactions involved in a specific pathway.

L-R correlations strengths are drawn in a yellow color-scale.
Ligands are in grey, whereas receptors are in green.
You can also highlight in red one specific interaction by passing values of a L-R pair as follows ligand="FYN", receptor="SPN".

chordDiagramLR(bsrinf,
    pw.id.filter = "R-HSA-210991",
    limit = 20,
    ligand="FYN", 
    receptor="SPN"
)

Network Analysis

Since BulkSignalR relies on intracellular networks to estimate the statistical significance of (ligand, receptor, pathway triples), links from receptors to target genes are naturally accessible. Different functions enable users to exploit this graphical data for plotting or further data analysis.

Furthermore, networks can be exported in text files and graphML objects to be further explored with Cytoscape (www.cytoscape.org), yEd (www.yworks.com), or similar software tools.

# Generate a ligand-receptor network and export it in .graphML
# for Cytoscape or similar tools
gLR <- getLRNetwork(bsrinf.redBP, qval.thres = 1e-3)

# save to file
# write.graph(gLR,file="SDC-LR-network.graphml",format="graphml")

# As an alternative to Cytoscape, you can play with igraph package functions.
plot(gLR,
    edge.arrow.size = 0.1,
    vertex.label.color = "black",
    vertex.label.family = "Helvetica",
    vertex.label.cex = 0.1
)

# You can apply other functions.


# Community detection
u.gLR <- as_undirected(gLR) # most algorithms work for undirected graphs only
comm <- cluster_edge_betweenness(u.gLR)
# plot(comm,u.gLR,
#     vertex.label.color="black",
#     vertex.label.family="Helvetica",
#     vertex.label.cex=0.1)

# Cohesive blocks
cb <- cohesive_blocks(u.gLR)
plot(cb, u.gLR,
    vertex.label.color = "black",
    vertex.label.family = "Helvetica",
    vertex.label.cex = 0.1,
    edge.color = "black"
)

# For the next steps, we just share the code below but graph generation function
# are commented to lighten the vignette.

# Generate a ligand-receptor network complemented with intra-cellular,
# receptor downstream pathways [computations are a bit longer here]
#
# You can save to a file for cystoscape or plot with igraph.

gLRintra <- getLRIntracellNetwork(bsrinf.redBP, qval.thres = 1e-3)

lay <- layout_with_kk(gLRintra)
# plot(gLRintra,
#     layout=lay,
#     edge.arrow.size=0.1,
#     vertex.label.color="black",
#     vertex.label.family="Helvetica",
#     vertex.label.cex=0.1)

# Reduce complexity by focusing on strongly targeted pathways
pairs <- LRinter(bsrinf.redBP)
top <- unique(pairs[pairs$pval <  1e-3, c("pw.id", "pw.name")])
top
gLRintra.res <- getLRIntracellNetwork(bsrinf.redBP,
    qval.thres = 0.01,
    restrict.pw = top$pw.id
)
lay <- layout_with_fr(gLRintra.res)

# plot(gLRintra.res,
#     layout=lay,
#     edge.arrow.size=0.1,
#     vertex.label.color="black",
#     vertex.label.family="Helvetica",
#     vertex.label.cex=0.4)

Non human data


In order to process data from nonhuman organisms, users only need to specify a few additional parameters and all the other steps of the analysis remain unchanged.

By default, BulksignalR works with Homo sapiens. We implemented a strategy using ortholog genes (mapped by the orthogene BioConductor package) in BulkSignalR directly.

The function findOrthoGenes creates a correspondence table between human and another organism. convertToHuman then converts an initial expression matrix to a Homo sapiens equivalent.

When calling BSRDataModel, the user only needs to pass this transformed matrix, the actual nonhuman organism, and the correspondence table. Then, L-R interaction inference is performed as for human data. Finally, users can switch back to gene names relative to the original organism via resetToInitialOrganism. The rest of the workflow is executed as usual for computing gene signatures and visualizing.


data(bodyMap.mouse)

ortholog.dict <- findOrthoGenes(
    from_organism = "mmusculus",
    from_values = rownames(bodyMap.mouse)
)
## Dictionary Size: 14806 genes
## -> 650 : Ligands
## -> 662 : Receptors
matrix.expression.human <- convertToHuman(
    counts = bodyMap.mouse,
    dictionary = ortholog.dict
)

bsrdm <- BSRDataModel(
    counts = matrix.expression.human,
    species = "mmusculus",
    conversion.dict = ortholog.dict
)

bsrdm <- learnParameters(bsrdm,quick=TRUE)

bsrinf <- BSRInference(bsrdm,reference="REACTOME")

bsrinf <- resetToInitialOrganism(bsrinf, conversion.dict = ortholog.dict)

# For example, if you want to explore L-R interactions
# you can proceed as shown above for a human dataset.

# bsrinf.redBP <- reduceToBestPathway(bsrinf)
# bsrsig.redBP <- BSRSignature(bsrinf.redBP, qval.thres=0.001)
# scoresLR <- scoreLRGeneSignatures(bsrdm,bsrsig.redBP,name.by.pathway=FALSE)
# simpleHeatmap(scoresLR[1:20,],column.names=TRUE,
# width=9, height=5, pointsize=8)

Spatial Transcriptomics


BulkSignalR workflow can be applied to spatial transcriptomics (ST) to find significant L-R interactions occurring in a tissue. Additional functions have been introduced to facilitate the visualization and analysis of the results in a spatial context. The only necessary change is to adapt the training of the statistical model to shallower data featuring dropouts and reduced dynamic range. This is achieved by imposing a minimum correlation at -1 in the training and requiring at least two target genes in a pathway instead of 4. Also, thresholds on L-R interaction Q-values should be released slightly such as 1% instead of 0.1%.

A basic spatial function is spatialPlot that enables visualizing L-R interaction gene signature scores at their spatial coordinates with a potential reference plot (raw tissue image or user-defined areas) on the side.

When the papaer was published, we provided scripts BulkSignalR github companion, to retrieve ST raw data, from tabulated files, that was the starting point to execute our workflow.

We can also directly work with an object from the SpatialExperiment Bioconductor class.

Also you can check out the VisiumIO package that allows users to readily import Visium data from the 10X Space Ranger pipeline. VisiumIO package provides a convenient function to easily retrieve SpatialExperiment object.


# load data =================================================

# We re-initialise the environment variable of Reactome
# whith the whole set. (because we previously made a subset before)
reactSubset <- getResource(resourceName = "Reactome",
cache = TRUE)

resetPathways(dataframe = reactSubset,
resourceName = "Reactome")

# Few steps of pre-process to subset a spatialExperiment object
# from STexampleData package ==================================

spe <- Visium_humanDLPFC()
set.seed(123)

speSubset <- spe[, colData(spe)$ground_truth%in%c("Layer1","Layer2")]

idx <- sample(ncol(speSubset), 10)
speSubset <- speSubset[, idx]

my.image.as.raster <- SpatialExperiment::imgRaster(speSubset, 
    sample_id = imgData(spe)$sample_id[1], image_id = "lowres")

colData(speSubset)$idSpatial <- paste(colData(speSubset)[[4]],
                colData(speSubset)[[5]],sep = "x")


annotation <- colData(speSubset)
# prepare data =================================================

bsrdm <- BSRDataModel(speSubset,
    min.count = 1,
    prop = 0.01,
    method = "TC",
    symbol.col = 2,
    x.col = 4,
    y.col = 5, 
    barcodeID.col = 1)

bsrdm <- learnParameters(bsrdm,
    quick = TRUE,
    min.positive = 2,
    verbose = TRUE)
## Learning ligand-receptor correlation null distribution...
## Automatic null model choice:
##   Censored normal D=0.591578947368421, Chi2=0.021516041456554
##   Censored mixture D=0.596491228070175, Chi2=0.013410844499461
##   Gaussian kernel empirical D=0.578245614035088, Chi2=0.00860405205605798
##   ==> select censored mixture of 2 normals
## Censored mixed normal parameters (alpha, mean1, sd1, mean2, sd2): 0.51682782270782, -0.14660031357865, 0.171431097606999, 0.427853668453017, 0.685724390427995
## Quick learning, receptor-target correlation null distribution assumed to be equal to ligand-receptor...
## Learning of statistical model parameters completed
bsrinf <- BSRInference(bsrdm, min.cor = -1,reference="REACTOME")

# spatial analysis ============================================

bsrinf.red <- reduceToBestPathway(bsrinf)
pairs.red <- LRinter(bsrinf.red)

thres <- 0.01
min.corr <- 0.01
pairs.red <- pairs.red[pairs.red$qval < thres & pairs.red$LR.corr > min.corr,]

head(pairs.red[
    order(pairs.red$qval),
    c("L", "R", "LR.corr", "pw.name", "qval")])
##           L       R   LR.corr
## 8       ADM   VIPR1 1.0000000
## 73      CRH   VIPR1 1.0000000
## 1902  RIMS1 SLC17A7 0.2910126
## 575   CALM3   PDE1A 0.5741634
## 505   CALM2   PDE1A 0.4697701
## 26529 CALM1   ADCY1 0.3719495
##                                                     pw.name         qval
## 8              REACTOME_CLASS_B_2_SECRETIN_FAMILY_RECEPTORS 0.000000e+00
## 73             REACTOME_CLASS_B_2_SECRETIN_FAMILY_RECEPTORS 0.000000e+00
## 1902         REACTOME_TRANSMISSION_ACROSS_CHEMICAL_SYNAPSES 9.368089e-06
## 575   REACTOME_INTRACELLULAR_SIGNALING_BY_SECOND_MESSENGERS 2.096166e-04
## 505   REACTOME_INTRACELLULAR_SIGNALING_BY_SECOND_MESSENGERS 2.463890e-04
## 26529        REACTOME_TRANSMISSION_ACROSS_CHEMICAL_SYNAPSES 3.303097e-04
s.red  <- BSRSignature(bsrinf.red, qval.thres=thres)
scores.red <- scoreLRGeneSignatures(bsrdm,s.red)

head(scores.red)
##                          10x32       3x47        4x50       17x111       5x59
## {ADAM17} / {ERBB4} -0.29101761 -0.3707323 -0.36858294 -0.199355637  0.6054123
## {ADM} / {VIPR1}    -0.39580588 -0.3958059  0.07490482  1.542978119  0.3645659
## {APP} / {GPC1}     -0.25955787 -0.2773347 -0.80159128 -0.078906391  0.6176765
## {BDNF} / {NTRK2}    0.08153983 -0.5633095 -0.53968589  0.006150271  0.4081665
## {CALM1} / {PDE1A}  -0.53598153  0.1885086 -0.15395244 -0.497691512 -0.2940636
## {CALM1} / {PDE1B}   0.91933134  0.1512018 -0.10990203 -0.412462025 -0.2808510
##                           0x20       8x100       8x108      14x30       11x39
## {ADAM17} / {ERBB4} -0.34515568 -0.12101923 -0.22763140  0.4681621  0.84992034
## {ADM} / {VIPR1}    -0.39580588 -0.35986278  0.04021555 -0.3958059 -0.07957812
## {APP} / {GPC1}     -0.06718939  0.76786935 -0.29802760  0.9264921 -0.52943075
## {BDNF} / {NTRK2}   -0.54450658  0.07966383  0.56138865 -0.4437255  0.95431847
## {CALM1} / {PDE1A}  -0.28820297  0.23963850 -0.23401675  0.6169534  0.95880833
## {CALM1} / {PDE1B}  -0.33599303 -0.02749348 -0.29449527  0.4731956 -0.08253183

From here, one can start to explore the data through different plots.

Note : As we work on a data subset to increase the vignette generation, only a few point are displayed on the slide.

# Visualization ============================================

# plot one specific interaction

# we have to follow the syntax with {} 
# to be compatible with reduction operations
inter <- "{SLIT2} / {GPC1}"

# with raw tissue reference
spatialPlot(scores.red[inter, ], annotation, inter,
    ref.plot = TRUE, ref.plot.only = FALSE,
    image.raster = NULL, dot.size = 1,
    label.col = "ground_truth"
)

# or with synthetic image reference
spatialPlot(scores.red[inter, ], annotation, inter,
    ref.plot = TRUE, ref.plot.only = FALSE,
    image.raster = my.image.as.raster, dot.size = 1,
    label.col = "ground_truth"

)

You can dissect one interaction to visualise both ligand and receptor expression of the interaction.

separatedLRPlot(scores.red, "SLIT2", "GPC1", 
    ncounts(bsrdm), 
    annotation,
    label.col = "ground_truth")

# generate visual index on disk in pdf file
spatialIndexPlot(scores.red, annotation,  
    label.col = "ground_truth",
    out.file="spatialIndexPlot")


Finally, we provide function to assess statistical associations of L-R gene signature scores with the different user-defined areas of a sample. Based on these associations, a visualization tool can represent the latter in the form of a heatmap.


# statistical association with tissue areas based on correlations
# For display purpose, we only use a subset here
assoc.bsr.corr <- spatialAssociation(scores.red[c(1:17), ],
annotation, label.col = "ground_truth",test = "Spearman")

head(assoc.bsr.corr)
##                   interaction      Layer1      Layer2
## ADAM17 / ERBB4 ADAM17 / ERBB4 -0.87038828  0.87038828
## ADM / VIPR1       ADM / VIPR1 -0.35921060  0.35921060
## APP / GPC1         APP / GPC1 -0.52223297  0.52223297
## BDNF / NTRK2     BDNF / NTRK2 -0.38297084  0.38297084
## CALM1 / PDE1A   CALM1 / PDE1A -0.31333978  0.31333978
## CALM1 / PDE1B   CALM1 / PDE1B  0.03481553 -0.03481553
spatialAssociationPlot(assoc.bsr.corr)

We also provide 2D-projections (see spatialDiversityPlot function) to assess diversity among L-R interaction spatial distributions over an entire dataset. Other function as generateSpatialPlots can generate on disk multiple individual spatial plots.


Note that we also describe more diverse use cases in the BulkSignalR github companion.


See the reference manual for all the details.

Acknowledgements

We thank Guillaume Tosato for his help with the figures and Gauthier Gadouas for testing the software on different platforms.


Thank you for reading this guide and for using BulkSignalR.

Session Information

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] doParallel_1.0.17           iterators_1.0.14           
##  [3] foreach_1.5.2               STexampleData_1.14.0       
##  [5] SpatialExperiment_1.17.0    SingleCellExperiment_1.29.1
##  [7] SummarizedExperiment_1.37.0 Biobase_2.67.0             
##  [9] GenomicRanges_1.59.1        GenomeInfoDb_1.43.2        
## [11] IRanges_2.41.2              S4Vectors_0.45.2           
## [13] MatrixGenerics_1.19.1       matrixStats_1.5.0          
## [15] ExperimentHub_2.15.0        AnnotationHub_3.15.0       
## [17] BiocFileCache_2.15.0        dbplyr_2.5.0               
## [19] BiocGenerics_0.53.3         generics_0.1.3             
## [21] dplyr_1.1.4                 igraph_2.1.3               
## [23] BulkSignalR_0.99.22         rmarkdown_2.29             
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3        sys_3.4.3                
##   [3] jsonlite_1.8.9            shape_1.4.6.1            
##   [5] magrittr_2.0.3            magick_2.8.5             
##   [7] farver_2.1.2              GlobalOptions_0.1.2      
##   [9] fs_1.6.5                  vctrs_0.6.5              
##  [11] multtest_2.63.0           memoise_2.0.1            
##  [13] RCurl_1.98-1.16           ggtree_3.15.0            
##  [15] rstatix_0.7.2             htmltools_0.5.8.1        
##  [17] S4Arrays_1.7.1            curl_6.1.0               
##  [19] broom_1.0.7               SparseArray_1.7.2        
##  [21] Formula_1.2-5             gridGraphics_0.5-1       
##  [23] sass_0.4.9                bslib_0.8.0              
##  [25] htmlwidgets_1.6.4         plotly_4.10.4            
##  [27] cachem_1.1.0              buildtools_1.0.0         
##  [29] mime_0.12                 lifecycle_1.0.4          
##  [31] pkgconfig_2.0.3           Matrix_1.7-1             
##  [33] R6_2.5.1                  fastmap_1.2.0            
##  [35] GenomeInfoDbData_1.2.13   clue_0.3-66              
##  [37] digest_0.6.37             aplot_0.2.4              
##  [39] colorspace_2.1-1          AnnotationDbi_1.69.0     
##  [41] patchwork_1.3.0           grr_0.9.5                
##  [43] RSQLite_2.3.9             ggpubr_0.6.0             
##  [45] labeling_0.4.3            filelock_1.0.3           
##  [47] httr_1.4.7                abind_1.4-8              
##  [49] compiler_4.4.2            withr_3.0.2              
##  [51] bit64_4.5.2               backports_1.5.0          
##  [53] orthogene_1.13.0          carData_3.0-5            
##  [55] DBI_1.2.3                 homologene_1.4.68.19.3.27
##  [57] ggsignif_0.6.4            MASS_7.3-64              
##  [59] rappdirs_0.3.3            DelayedArray_0.33.3      
##  [61] rjson_0.2.23              tools_4.4.2              
##  [63] ape_5.8-1                 glue_1.8.0               
##  [65] stabledist_0.7-2          nlme_3.1-166             
##  [67] grid_4.4.2                Rtsne_0.17               
##  [69] cluster_2.1.8             gtable_0.3.6             
##  [71] tidyr_1.3.1               data.table_1.16.4        
##  [73] car_3.1-3                 XVector_0.47.2           
##  [75] BiocVersion_3.21.1        ggrepel_0.9.6            
##  [77] RANN_2.6.2                pillar_1.10.1            
##  [79] yulab.utils_0.1.9         babelgene_22.9           
##  [81] circlize_0.4.16           splines_4.4.2            
##  [83] treeio_1.31.0             lattice_0.22-6           
##  [85] survival_3.8-3            bit_4.5.0.1              
##  [87] tidyselect_1.2.1          ComplexHeatmap_2.23.0    
##  [89] Biostrings_2.75.3         maketools_1.3.1          
##  [91] knitr_1.49                gridExtra_2.3            
##  [93] xfun_0.50                 UCSC.utils_1.3.0         
##  [95] lazyeval_0.2.2            ggfun_0.1.8              
##  [97] yaml_2.3.10               evaluate_1.0.3           
##  [99] codetools_0.2-20          tibble_3.2.1             
## [101] BiocManager_1.30.25       ggplotify_0.1.2          
## [103] cli_3.6.3                 munsell_0.5.1            
## [105] jquerylib_0.1.4           Rcpp_1.0.13-1            
## [107] gprofiler2_0.2.3          png_0.1-8                
## [109] ggplot2_3.5.1             blob_1.2.4               
## [111] ggalluvial_0.12.5         bitops_1.0-9             
## [113] glmnet_4.1-8              viridisLite_0.4.2        
## [115] tidytree_0.4.6            scales_1.3.0             
## [117] purrr_1.0.2               crayon_1.5.3             
## [119] GetoptLong_1.0.5          rlang_1.1.4              
## [121] KEGGREST_1.47.0