--- title: "BulkSignalR :
Inference of ligand-receptor interactions from bulk data or spatial transcriptomics" author: - name: Jean-Philippe Villemin affiliation: - Institut de Recherche en Cancérologie de Montpellier, Inserm, Montpellier, France email: jean-philippe.villemin@inserm.fr - name: Jacques Colinge affiliation: - Institut de Recherche en Cancérologie de Montpellier, Inserm, Montpellier, France email: jacques.colinge@inserm.fr date: "`r format(Sys.Date(), '%m/%d/%Y')`" abstract: >

BulkSignalR is used to infer ligand-receptor (L-R) interactions from bulk expression (transcriptomics/proteomics) data, or spatial transcriptomics. Potential L-R interactions are taken from the LR*db* database, which is included in our other package SingleCellSignalR, available from Bioconductor. Inferences rely on a statistical model linking potential L-R interactions with biological pathways from Reactome or biological processes from GO. A number of visualization and data summary functions are proposed to help navigating the predicted interactions.

BulkSignalR package version: `r packageVersion("BulkSignalR")` output: rmarkdown::html_vignette: self_contained: true toc: true toc_depth: 4 highlight: pygments fig_height: 3 fig_width: 3 fig_caption: no code_folding: show package: BulkSignalR vignette: > %\VignetteIndexEntry{BulkSignalR-Main} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::knit_hooks$set(optipng = knitr::hook_optipng) ``` ```{r load-libs, message = FALSE, warning = FALSE, results = FALSE} library(BulkSignalR) library(igraph) library(dplyr) library(STexampleData) ``` # Introduction ## What is it for? `BulkSignalR` is a tool that enables the inference of L-R interactions from bulk expression data, *i.e.*, from transcriptomics (RNA-seq or microarrays) or expression proteomics. `BulkSignalR` also applies to spatial transcriptomics such as 10x Genomics VISIUM (TM), and a set of functions dedicated to spatial data have been added to better support this particular use of the package. ## Starting point There are a variety of potential data sources prior to using `BulkSignalR` (proteomics, sequencing, etc.) that result in the generation of a matrix of numbers representing the expression levels of genes or proteins. This matrix may be normalized already or not. In every case, the latter matrix is the starting point of using `BulkSignalR`. It is mandatory that genes/proteins are represented as rows of the expression matrix and the samples as columns. HUGO gene symbols (even for proteins) must be used to ensure matching LR*db*, Reactome, and GOBP contents. You can also directly work with an object from the `SummarizedExperiment` or `SpatialExperiment` Bioconductor classes. ## How does it work? As represented in the figure below, only a few steps are required in order to perform a `BulkSignalR` analysis. Three S4 objects will be sequentially constructed: * **BSR-DataModel**, denoted `bsrdm` * **BSR-Inference**, denoted `bsrinf` * **BSR-Signature**, denoted `bsrsig` \ **BSRDataModel** integrates the expression data matrix and the parameters of the statistical model learned on this matrix. **BSRInference** provides a list triples (ligand, receptor, pathway downstream the receptor) with their statistical significance. If a receptor occurs in multiple pathways, the corresponding number of triples will be described in the BSR-Inference object. Genes (or proteins) targeted by the pathway are also described in this object. **BSRSignature** contains gene signatures associated with the triples (ligand, receptor, downstream pathway) stored in a `BSRInference` object. Those signatures are comprised of the ligand and the receptor obviously, but also all the pathway target genes that were used by the statistical model. Gene signatures are meant to report the L-R interaction as a global phenomenon integrating its downstream effect. Indeed, signatures can be scored with a dedicated function allowing the user to set an arbitrary weight on the downtream component. Those scores represent the activity of the L-R interactions across the samples, they are returned as a matrix. Because of the occurrence of certain receptors in multiple pathways, but also because some ligands may bind several receptors, or *vice versa*, BSR-Inference objects may contain redundant data depending on how the user want to look at them. We therefore provide a range of reduction operators meant to obtain reduced BSR-Inference objects (see below). \ Furthermore, we provide several handy functions to explore the data through different plots (heatmaps, alluvial plots, chord diagrams or networks). `BulkSignalR` package functions have many parameters that can be changed by the user to fit specific needs (see Reference Manual for details). ## Parallel mode settings Users can reduce compute time by using several processors in parallel. ```{r parallel, message=FALSE, warnings=FALSE} library(doParallel) n.proc <- 1 cl <- makeCluster(n.proc) registerDoParallel(cl) # To add at the end of your script # stopCluster(cl) ``` **Notes : ** For operating systems that can fork such as all the UNIX-like systems, it might be preferable to use the library `doMC` that is faster (less overhead). This is transparent to `BulkSignalR`. In case you need to reproduce results exactly and since statistical model parameter learning involves the generation of randomized expression matrices, you might want to use `set.seed()`. In a parallel mode, `iseed` that is a parameter of `clusterSetRNGStream` must be used. # First Example ## Loading the data Here, we load salivary duct carcinoma transcriptomes integrated as `sdc` in `BulkSignalR`. ```{r loading,eval=TRUE} data(sdc) head(sdc) ``` ## Building a BSRDataModel object `BSRDataModel` creates the first object with information relative to your bulk expression data. ```{r BSRDataModel, eval=TRUE,cache=FALSE} bsrdm <- BSRDataModel(counts = sdc) bsrdm ``` `learnParameters` updates a BSR-DataModel object with the parameters necessary for `BulkSignalR` statistical model. ```{r learnParameters,eval=TRUE,warning=FALSE,fig.dim = c(7,3)} bsrdm <- learnParameters(bsrdm,quick=TRUE) bsrdm ``` ## Building a BSRInference object From the previous object `bsrdm`, you can generate inferences by calling its method `BSRInference`. The resulting BSR-Inference object, `bsrinf`, contains all the inferred L-R interactions with their associated pathways and corrected p-values. From here, you can already access LR interactions using: `LRinter(bsrinf)`. ```{r BSRInference,eval=TRUE} # We use a subset of the reference to speed up # inference in the context of the vignette. subset <- c("REACTOME_BASIGIN_INTERACTIONS", "REACTOME_SYNDECAN_INTERACTIONS", "REACTOME_ECM_PROTEOGLYCANS", "REACTOME_CELL_JUNCTION_ORGANIZATION") reactSubset <- BulkSignalR:::.SignalR$BulkSignalR_Reactome[ BulkSignalR:::.SignalR$BulkSignalR_Reactome$`Reactome name` %in% subset,] resetPathways(dataframe = reactSubset, resourceName = "Reactome") bsrinf <- BSRInference(bsrdm, min.cor = 0.3, reference="REACTOME") LRinter.dataframe <- LRinter(bsrinf) head(LRinter.dataframe[ order(LRinter.dataframe$qval), c("L", "R", "LR.corr", "pw.name", "qval")]) ``` You can finally filter out non-significant L-R interactions and order them by best Q-value before saving them to a file for instance. ```{r BSRInferenceTfile, eval=FALSE} write.table(LRinter.dataframe[order(LRinter.dataframe$qval), ], "./sdc_LR.tsv", row.names = FALSE, sep = "\t", quote = FALSE ) ``` ## Reduction strategies The output of `BSRInference` is exhaustive and can thus contain redundancy due to intrinsic redundancy in the reference databases (Reactome, KEGG, GOBP) and multilateral interactions in LR*db*. To address this phenomenon, we propose several strategies. ### Reducing a BSRInference object to pathway With `reduceToPathway`, all the L-R interactions with the receptor included in a certain pathway are aggregated to only report each downstream pathway once. For a given pathway, the reported P-values and target genes are those of best (minimum P-value) L-R interaction that was part of the aggregation. Nothing is recomputed, we simply merge. ```{r ReduceToPathway, eval=TRUE} bsrinf.redP <- reduceToPathway(bsrinf) ``` ### Reducing a BSRInference object to best pathway With ` reduceToBestPathway`, a BSR-Inference object is reduced to only report one pathway per L-R interaction. The pathway with the smallest P-value is selected. A same pathways might occur multiple times with with different L-R interactions. ```{r ReduceToBestPathway, eval=TRUE} bsrinf.redBP <- reduceToBestPathway(bsrinf) ``` ### Reducing to ligands or receptors As already mentioned, several ligands might bind a single receptor (or several shared receptors) and the converse is true as well. Two reduction operators enable users to either aggregate all the ligands of a same receptor or all the receptors bound by a same ligand: ```{r ReduceToLigand,eval=TRUE} bsrinf.L <- reduceToLigand(bsrinf) bsrinf.R <- reduceToReceptor(bsrinf) ``` ### Combined reductions Combinations are possible. For instance, users can apply `reduceToPathway` and `reduceToBestPathway` reductions sequentially to maximize the reduction effect. In case the exact same sets of aggregated ligands and receptors obtained with `reduceToPathway` was associated with several pathways, the pathway with the best P-value would be kept by `reduceToBestPathway`. ```{r doubleReduction, eval=TRUE} bsrinf.redP <- reduceToPathway(bsrinf) bsrinf.redPBP <- reduceToBestPathway(bsrinf.redP) ``` ## Building a BSRSignature object Gene signatures for a given, potentially reduced BSR-Inference object are generated by `BSRSignature`, which returns a BSRSignature object. To follow the activity of L-R interactions across the samples of the dataset, `scoreLRGeneSignatures` computes a score for each gene signature. Then, heatmaps can be generated to represent differences, *e.g.*, using the utility function a `simpleHeatmap`. Hereafter, we show different workflows of reductions combined with gene signature scoring and display. ### Scoring by ligand-receptor ```{r scoringLR,eval=TRUE,fig.dim = c(5,4)} bsrsig.redBP <- BSRSignature(bsrinf.redBP, qval.thres = 0.001) scoresLR <- scoreLRGeneSignatures(bsrdm, bsrsig.redBP, name.by.pathway = FALSE ) simpleHeatmap(scoresLR[1:20, ], hcl.palette = "Cividis", pointsize=8,width=4,height=3) ``` ### Scoring by pathway ```{r scoringPathway,eval=TRUE,fig.dim = c(7,3)} bsrsig.redPBP <- BSRSignature(bsrinf.redPBP, qval.thres = 0.01) scoresPathway <- scoreLRGeneSignatures(bsrdm, bsrsig.redPBP, name.by.pathway = TRUE ) simpleHeatmap(scoresPathway, hcl.palette = "Blue-Red 2", pointsize=8,width=6,height=2) ``` ## Other visualization utilities ### Heatmap of ligand-receptor-target genes expression After computing gene signatures score, one may wish to look at the expression of the genes involved in that signature. For instance, we can display three heatmaps corresponding to the scaled (z-scores) expression of ligands (pink), receptors (green), and target genes (blue). On the top of each heatmap, the whole signature score from `scoreLRGeneSignatures` is reported for reference. ```{r heatmapMulti,eval=TRUE,fig.dim = c(7,14)} pathway1 <- pathways(bsrsig.redPBP)[1] signatureHeatmaps( pathway = pathway1, bsrdm = bsrdm, bsrsig = bsrsig.redPBP, h.width = 6, h.height = 8, fontsize = 4, show_column_names = TRUE ) ``` ### AlluvialPlot `alluvial.plot` is a function that enable users to represent the different interactions between ligands, receptors, and pathways stored in a BSRInference object. Obviously, it is possible to filter by ligand, receptor, or pathway. This is achieved by specifying a key word on the chosen category. A filter on L-R interaction Q-values can be applied in addition. ```{r AlluvialPlot,eval=TRUE,fig.dim = c(8,3)} alluvialPlot(bsrinf, keywords = c("LAMC1"), type = "L", qval.thres = 0.01 ) ``` ### BubblePlot `bubblePlotPathwaysLR` is a handy way to visualize the strengths of several L-R interactions in relation with their receptor downstream pathways. A vector of pathways of interest can be provided to limit the complexity of the plot. ```{r BubblePlot,eval=TRUE,fig.dim = c(8,4)} pathways <- LRinter(bsrinf)[1,c("pw.name")] bubblePlotPathwaysLR(bsrinf, pathways = pathways, qval.thres = 0.001, color = "red", pointsize = 8 ) ``` ### Chordiagram `chord.diagram.LR` is a function that enable users to feature the different L-R interactions involved in a specific pathway. L-R correlations strengths are drawn in a yellow color-scale. Ligands are in grey, whereas receptors are in green. You can also highlight in red one specific interaction by passing values of a L-R pair as follows `ligand="FYN", receptor="SPN"`. ```{r Chordiagram,eval=TRUE,fig.dim = c(6,4.5)} chordDiagramLR(bsrinf, pw.id.filter = "R-HSA-210991", limit = 20, ligand="FYN", receptor="SPN" ) ``` # Network Analysis Since `BulkSignalR` relies on intracellular networks to estimate the statistical significance of (ligand, receptor, pathway triples), links from receptors to target genes are naturally accessible. Different functions enable users to exploit this graphical data for plotting or further data analysis. Furthermore, networks can be exported in text files and graphML objects to be further explored with Cytoscape (www.cytoscape.org), yEd (www.yworks.com), or similar software tools. ```{r network1, eval=TRUE} # Generate a ligand-receptor network and export it in .graphML # for Cytoscape or similar tools gLR <- getLRNetwork(bsrinf.redBP, qval.thres = 1e-3) # save to file # write.graph(gLR,file="SDC-LR-network.graphml",format="graphml") # As an alternative to Cytoscape, you can play with igraph package functions. plot(gLR, edge.arrow.size = 0.1, vertex.label.color = "black", vertex.label.family = "Helvetica", vertex.label.cex = 0.1 ) ``` ```{r network2, eval=TRUE} # You can apply other functions. # Community detection u.gLR <- as_undirected(gLR) # most algorithms work for undirected graphs only comm <- cluster_edge_betweenness(u.gLR) # plot(comm,u.gLR, # vertex.label.color="black", # vertex.label.family="Helvetica", # vertex.label.cex=0.1) # Cohesive blocks cb <- cohesive_blocks(u.gLR) plot(cb, u.gLR, vertex.label.color = "black", vertex.label.family = "Helvetica", vertex.label.cex = 0.1, edge.color = "black" ) ``` ```{r network3, eval=FALSE,warning=FALSE} # For the next steps, we just share the code below but graph generation function # are commented to lighten the vignette. # Generate a ligand-receptor network complemented with intra-cellular, # receptor downstream pathways [computations are a bit longer here] # # You can save to a file for cystoscape or plot with igraph. gLRintra <- getLRIntracellNetwork(bsrinf.redBP, qval.thres = 1e-3) lay <- layout_with_kk(gLRintra) # plot(gLRintra, # layout=lay, # edge.arrow.size=0.1, # vertex.label.color="black", # vertex.label.family="Helvetica", # vertex.label.cex=0.1) # Reduce complexity by focusing on strongly targeted pathways pairs <- LRinter(bsrinf.redBP) top <- unique(pairs[pairs$pval < 1e-3, c("pw.id", "pw.name")]) top gLRintra.res <- getLRIntracellNetwork(bsrinf.redBP, qval.thres = 0.01, restrict.pw = top$pw.id ) lay <- layout_with_fr(gLRintra.res) # plot(gLRintra.res, # layout=lay, # edge.arrow.size=0.1, # vertex.label.color="black", # vertex.label.family="Helvetica", # vertex.label.cex=0.4) ``` # Non human data \ In order to process data from nonhuman organisms, users only need to specify a few additional parameters and all the other steps of the analysis remain unchanged. By default, `BulksignalR` works with *Homo sapiens*. We implemented a strategy using ortholog genes (mapped by the orthogene BioConductor package) in BulkSignalR directly. The function `findOrthoGenes` creates a correspondence table between human and another organism. `convertToHuman` then converts an initial expression matrix to a *Homo sapiens* equivalent. When calling `BSRDataModel`, the user only needs to pass this transformed matrix, the actual nonhuman organism, and the correspondence table. Then, L-R interaction inference is performed as for human data. Finally, users can switch back to gene names relative to the original organism via `resetToInitialOrganism`. The rest of the workflow is executed as usual for computing gene signatures and visualizing. \ ```{r mouse,eval=TRUE,warning=FALSE} data(bodyMap.mouse) ortholog.dict <- findOrthoGenes( from_organism = "mmusculus", from_values = rownames(bodyMap.mouse) ) matrix.expression.human <- convertToHuman( counts = bodyMap.mouse, dictionary = ortholog.dict ) bsrdm <- BSRDataModel( counts = matrix.expression.human, species = "mmusculus", conversion.dict = ortholog.dict ) bsrdm <- learnParameters(bsrdm,quick=TRUE) bsrinf <- BSRInference(bsrdm,reference="REACTOME") bsrinf <- resetToInitialOrganism(bsrinf, conversion.dict = ortholog.dict) # For example, if you want to explore L-R interactions # you can proceed as shown above for a human dataset. # bsrinf.redBP <- reduceToBestPathway(bsrinf) # bsrsig.redBP <- BSRSignature(bsrinf.redBP, qval.thres=0.001) # scoresLR <- scoreLRGeneSignatures(bsrdm,bsrsig.redBP,name.by.pathway=FALSE) # simpleHeatmap(scoresLR[1:20,],column.names=TRUE, # width=9, height=5, pointsize=8) ``` # Spatial Transcriptomics \ `BulkSignalR` workflow can be applied to spatial transcriptomics (ST) to find significant L-R interactions occurring in a tissue. Additional functions have been introduced to facilitate the visualization and analysis of the results in a spatial context. The only necessary change is to adapt the training of the statistical model to shallower data featuring dropouts and reduced dynamic range. This is achieved by imposing a minimum correlation at -1 in the training and requiring at least two target genes in a pathway instead of 4. Also, thresholds on L-R interaction Q-values should be released slightly such as 1% instead of 0.1%. A basic spatial function is `spatialPlot` that enables visualizing L-R interaction gene signature scores at their spatial coordinates with a potential reference plot (raw tissue image or user-defined areas) on the side. When the papaer was published, we provided scripts [BulkSignalR github companion]( https://github.com/jcolinge/BulkSignalR_companion), to retrieve ST raw data, from tabulated files, that was the starting point to execute our workflow. We can also directly work with an object from the `SpatialExperiment` Bioconductor class. Also you can check out the `VisiumIO` package that allows users to readily import Visium data from the 10X Space Ranger pipeline. `VisiumIO` package provides a convenient function to easily retrieve SpatialExperiment object. \ ```{r spatial1,eval=TRUE,message=FALSE} # load data ================================================= # We re-initialise the environment variable of Reactome # whith the whole set. (because we previously made a subset before) reactSubset <- getResource(resourceName = "Reactome", cache = TRUE) resetPathways(dataframe = reactSubset, resourceName = "Reactome") # Few steps of pre-process to subset a spatialExperiment object # from STexampleData package ================================== spe <- Visium_humanDLPFC() set.seed(123) speSubset <- spe[, colData(spe)$ground_truth%in%c("Layer1","Layer2")] idx <- sample(ncol(speSubset), 10) speSubset <- speSubset[, idx] my.image.as.raster <- SpatialExperiment::imgRaster(speSubset, sample_id = imgData(spe)$sample_id[1], image_id = "lowres") colData(speSubset)$idSpatial <- paste(colData(speSubset)[[4]], colData(speSubset)[[5]],sep = "x") annotation <- colData(speSubset) ``` ```{r spatial2,eval=TRUE,warning=FALSE} # prepare data ================================================= bsrdm <- BSRDataModel(speSubset, min.count = 1, prop = 0.01, method = "TC", symbol.col = 2, x.col = 4, y.col = 5, barcodeID.col = 1) bsrdm <- learnParameters(bsrdm, quick = TRUE, min.positive = 2, verbose = TRUE) bsrinf <- BSRInference(bsrdm, min.cor = -1,reference="REACTOME") # spatial analysis ============================================ bsrinf.red <- reduceToBestPathway(bsrinf) pairs.red <- LRinter(bsrinf.red) thres <- 0.01 min.corr <- 0.01 pairs.red <- pairs.red[pairs.red$qval < thres & pairs.red$LR.corr > min.corr,] head(pairs.red[ order(pairs.red$qval), c("L", "R", "LR.corr", "pw.name", "qval")]) s.red <- BSRSignature(bsrinf.red, qval.thres=thres) scores.red <- scoreLRGeneSignatures(bsrdm,s.red) head(scores.red) ``` From here, one can start to explore the data through different plots. **Note :** As we work on a data subset to increase the vignette generation, only a few point are displayed on the slide. ```{r spatialPlot3,eval=TRUE,fig.dim = c(6,4.5)} # Visualization ============================================ # plot one specific interaction # we have to follow the syntax with {} # to be compatible with reduction operations inter <- "{SLIT2} / {GPC1}" # with raw tissue reference spatialPlot(scores.red[inter, ], annotation, inter, ref.plot = TRUE, ref.plot.only = FALSE, image.raster = NULL, dot.size = 1, label.col = "ground_truth" ) # or with synthetic image reference spatialPlot(scores.red[inter, ], annotation, inter, ref.plot = TRUE, ref.plot.only = FALSE, image.raster = my.image.as.raster, dot.size = 1, label.col = "ground_truth" ) ``` You can dissect one interaction to visualise both ligand and receptor expression of the interaction. ```{r spatialPlot4,eval=TRUE,fig.dim = c(6,4.5)} separatedLRPlot(scores.red, "SLIT2", "GPC1", ncounts(bsrdm), annotation, label.col = "ground_truth") ``` ```{r spatialPlot5,eval=TRUE} # generate visual index on disk in pdf file spatialIndexPlot(scores.red, annotation, label.col = "ground_truth", out.file="spatialIndexPlot") ``` \ Finally, we provide function to assess statistical associations of L-R gene signature scores with the different user-defined areas of a sample. Based on these associations, a visualization tool can represent the latter in the form of a heatmap. \ ```{r spatialPlot6,eval=TRUE,fig.dim = c(6,4.5)} # statistical association with tissue areas based on correlations # For display purpose, we only use a subset here assoc.bsr.corr <- spatialAssociation(scores.red[c(1:17), ], annotation, label.col = "ground_truth",test = "Spearman") head(assoc.bsr.corr) spatialAssociationPlot(assoc.bsr.corr) ``` We also provide 2D-projections (see `spatialDiversityPlot` function) to assess diversity among L-R interaction spatial distributions over an entire dataset. Other function as `generateSpatialPlots` can generate on disk multiple individual spatial plots. \ Note that we also describe more diverse use cases in the [BulkSignalR github companion]( https://github.com/jcolinge/BulkSignalR_companion). \ See the reference manual for all the details. ```{r inferCells,eval=FALSE,include=FALSE} ## Additional functions #This is not part of the main workflow for analyzing LR interactions but #we offer convenient functions for inferring cell types. However #we recommend using other softwares specifically dedicated to this purpose. # Inferring cell types data(sdc, package = "BulkSignalR") bsrdm <- BSRDataModel(counts = sdc) bsrdm <- learnParameters(bsrdm) bsrinf <- BSRInference(bsrdm) # Common TME cell type signatures data(immune.signatures, package = "BulkSignalR") unique(immune.signatures$signature) immune.signatures <- immune.signatures[immune.signatures$signature %in% c( "B cells", "Dentritic cells", "Macrophages", "NK cells", "T cells", "T regulatory cells" ), ] data("tme.signatures", package = "BulkSignalR") signatures <- rbind(immune.signatures, tme.signatures[tme.signatures$signature %in% c("Endothelial cells", "Fibroblasts"), ]) tme.scores <- scoreSignatures(bsrdm, signatures) # assign cell types to interactions lr2ct <- assignCellTypesToInteractions(bsrdm, bsrinf, tme.scores) head(lr2ct) # cellular network computation and plot g.table <- cellularNetworkTable(lr2ct) gCN <- cellularNetwork(g.table) plot(gCN, edge.width = 5 * E(gCN)$score) gSummary <- summarizedCellularNetwork(g.table) plot(gSummary, edge.width = 1 + 30 * E(gSummary)$score) # relationship with partial EMT--- # Should be tested HNSCC data instead of SDC!! # find the ligands data(p.EMT, package = "BulkSignalR") gs <- p.EMT$gene triggers <- relateToGeneSet(bsrinf, gs) triggers <- triggers[triggers$n.genes > 1, ] # at least 2 target genes in the gs ligands.in.gs <- intersect(triggers$L, gs) triggers <- triggers[!(triggers$L %in% ligands.in.gs), ] ligands <- unique(triggers$L) # link to cell types cf <- cellTypeFrequency(triggers, lr2ct, min.n.genes = 2) missing <- setdiff(rownames(tme.scores), names(cf$s)) cf$s[missing] <- 0 cf$t[missing] <- 0 op <- par(mar = c(2, 10, 2, 2)) barplot(cf$s, col = "lightgray", horiz = T, las = 2) par(op) # random selections based on random gene sets qval.thres <- 1e-3 inter <- LRinter(bsrinf) tg <- tgGenes(bsrinf) tcor <- tgCorr(bsrinf) good <- inter$qval <= qval.thres inter <- inter[good, ] tg <- tg[good] tcor <- tcor[good] all.targets <- unique(unlist(tg)) r.cf <- list() for (k in 1:100) { # should 1000 or more r.gs <- sample(all.targets, length(intersect(gs, all.targets))) r.triggers <- relateToGeneSet(bsrinf, r.gs, qval.thres = qval.thres) r.triggers <- r.triggers[r.triggers$n.genes > 1, ] r.ligands.in.gs <- intersect(r.triggers$L, r.gs) r.triggers <- r.triggers[!(r.triggers$L %in% r.ligands.in.gs), ] r <- cellTypeFrequency(r.triggers, lr2ct, min.n.genes = 2) missing <- setdiff(rownames(tme.scores), names(r$s)) r$s[missing] <- 0 r$t[missing] <- 0 o <- order(names(r$t)) r$s <- r$s[o] r$t <- r$t[o] r.cf <- c(r.cf, list(r)) } r.m.s <- foreach(i = seq_len(length(r.cf)), .combine = rbind) %do% { r.cf[[i]]$s } # plot results op <- par(mar = c(2, 10, 2, 2)) boxplot(r.m.s, col = "lightgray", horizontal = T, las = 2) pts <- data.frame(x = as.numeric(cf$s[colnames(r.m.s)]), cty = colnames(r.m.s)) stripchart(x ~ cty, data = pts, add = TRUE, pch = 19, col = "red") par(op) for (cty in rownames(tme.scores)) { cat(cty, ": P=", sum(r.m.s[, cty] >= cf$s[cty]) / nrow(r.m.s), "\n", sep = "") } ``` # Acknowledgements We thank Guillaume Tosato for his help with the figures and Gauthier Gadouas for testing the software on different platforms. \ Thank you for reading this guide and for using `BulkSignalR`. # Session Information ```{r session-info} sessionInfo() ```