Data visualization with epiregulon.extra

Introduction

epiregulon.extra is a companion package to epiregulon and provides functions to visualize transcription factor activity and network. It also provides statistical tests to identify transcription factors with differential activity and network topology. This tutorial continues from the reprogram-seq example included in epiregulon. We will use the gene regulatory networks constructed by epiregulon and continue with visualization and network analysis.

For full documentation of the epiregulon package, please refer to the epiregulon book.

Installation

 if (!require("BiocManager", quietly = TRUE))
     install.packages("BiocManager")

 BiocManager::install("epiregulon.extra")

Data preparation

To continue with the visualization functions, we will first need the gene expression matrix. We can download the data from scMultiome.

# load the MAE object
library(scMultiome)
## Loading required package: AnnotationHub
## Loading required package: BiocGenerics
## Loading required package: generics
## 
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
## 
##     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
##     setequal, union
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
##     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
##     unsplit, which.max, which.min
## Loading required package: BiocFileCache
## Loading required package: dbplyr
## Loading required package: ExperimentHub
## Loading required package: MultiAssayExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
## The following object is masked from 'package:ExperimentHub':
## 
##     cache
## The following object is masked from 'package:AnnotationHub':
## 
##     cache
## Loading required package: SingleCellExperiment
mae <- scMultiome::reprogramSeq()
## see ?scMultiome and browseVignettes('scMultiome') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
# expression matrix
GeneExpressionMatrix <- mae[["GeneExpressionMatrix"]]
rownames(GeneExpressionMatrix) <- rowData(GeneExpressionMatrix)$name

reducedDim(GeneExpressionMatrix, "UMAP_Combined") <- 
  reducedDim(mae[["TileMatrix500"]], "UMAP_Combined")

# arrange hash_assignment
GeneExpressionMatrix$hash_assignment <- factor(as.character(
  GeneExpressionMatrix$hash_assignment),
  levels = c("HTO10_GATA6_UTR", "HTO2_GATA6_v2", "HTO8_NKX2.1_UTR", "HTO3_NKX2.1_v2",
    "HTO1_FOXA2_v2", "HTO4_mFOXA1_v2", "HTO6_hFOXA1_UTR", "HTO5_NeonG_v2"
  )
)

Calculate TF activity

Next we load the regulon object previously calculated by epiregulon. Here we just use the pruned version of regulon object in which only relevant columns are kept. Using epiregulon, we can calculate activity of transcription factors included in the regulon object.

library(epiregulon)
library(epiregulon.extra)

data(regulon)

score.combine <- calculateActivity(
  expMatrix = GeneExpressionMatrix,
  regulon = regulon,
  mode = "weight",
  method = "weightedMean",
  exp_assay = "normalizedCounts",
  normalize = FALSE
)
## calculating TF activity from regulon using weightedmean
## Warning in calculateActivity(expMatrix = GeneExpressionMatrix, regulon =
## regulon, : The weight column contains multiple subcolumns but no cluster
## information was provided. Using first column to compute activity...
## aggregating regulons...
## creating weight matrix...
## calculating activity scores...
## normalize by the number of targets...

Perform differential activity

We perform a differential analysis of transcription factor activity across groups of cells. This function is a wrapper around findMarkers from scran.

library(epiregulon.extra)
markers <- findDifferentialActivity(
  activity_matrix = score.combine,
  clusters = GeneExpressionMatrix$hash_assignment,
  pval.type = "some",
  direction = "up",
  test.type = "t",
  logvalue = FALSE
)

We can specify the top transcription factors of interest

markers.sig <- getSigGenes(markers, topgenes = 5)
## Using a logFC cutoff of 1 for class HTO10_GATA6_UTR for direction equal to any
## Using a logFC cutoff of 1 for class HTO2_GATA6_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO8_NKX2.1_UTR for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO3_NKX2.1_v2 for direction equal to any
## Using a logFC cutoff of 0.3 for class HTO1_FOXA2_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO4_mFOXA1_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO6_hFOXA1_UTR for direction equal to any
## Using a logFC cutoff of 0.3 for class HTO5_NeonG_v2 for direction equal to any

Visualize the results

We visualize the known differential transcription factors by bubble plot

plotBubble(
  activity_matrix = score.combine,
  tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2"),
  clusters = GeneExpressionMatrix$hash_assignment
)

Then visualize the most differential transcription factors by clusters

plotBubble(
  activity_matrix = score.combine,
  tf = markers.sig$tf,
  clusters = GeneExpressionMatrix$hash_assignment
)

Visualize the known differential transcription factors by violin plot.

plotActivityViolin(
  activity_matrix = score.combine,
  tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
  clusters = GeneExpressionMatrix$hash_assignment
)

Visualize the known differential transcription factors by UMAP

options(ggrastr.default.dpi=100)

ActivityPlot <- plotActivityDim(
  sce = GeneExpressionMatrix,
  activity_matrix = score.combine,
  tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
  dimtype = "UMAP_Combined",
  label = "Clusters",
  point_size = 0.3,
  ncol = 3,
  rasterise = TRUE
)


ActivityPlot

In contrast, the gene expression of the transcription factors is very sparse

options(ggrastr.default.dpi=100)

ActivityPlot <- plotActivityDim(
  sce = GeneExpressionMatrix,
  activity_matrix = counts(GeneExpressionMatrix),
  tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
  dimtype = "UMAP_Combined",
  label = "Clusters",
  point_size = 0.3,
  ncol = 3,
  limit = c(0, 2),
  colors = c("grey", "blue"),
  legend.label = "GEX",
  rasterise = TRUE
)
ActivityPlot

Visualize the activity of the selected transcription factors by heat map. Due to the maximum size limit for this vignette, the output is not shown here.

plotHeatmapRegulon(
  sce = GeneExpressionMatrix,
  tfs = c("GATA6", "NKX2-1"),
  regulon = regulon,
  regulon_cutoff = 0,
  downsample = 1000,
  cell_attributes = "hash_assignment",
  col_gap = "hash_assignment",
  exprs_values = "normalizedCounts",
  name = "regulon heatmap",
  column_title_rot = 45,
  raster_quality=4
)


plotHeatmapActivity(
  activity = score.combine,
  sce = GeneExpressionMatrix,
  tfs = c("GATA6", "NKX2-1", "FOXA1", "FOXA2"),
  downsample = 5000,
  cell_attributes = "hash_assignment",
  col_gap = "hash_assignment",
  name = "Activity",
  column_title_rot = 45,
  raster_quality=3
)

Geneset enrichment

Sometimes we are interested to know what pathways are enriched in the regulon of a particular TF. We can perform geneset enrichment using the enricher function from clusterProfiler.

# retrieve genesets
H <- EnrichmentBrowser::getGenesets(
  org = "hsa",
  db = "msigdb",
  cat = "H",
  gene.id.type = "SYMBOL"
)
C2 <- EnrichmentBrowser::getGenesets(
  org = "hsa",
  db = "msigdb",
  cat = "C2",
  gene.id.type = "SYMBOL"
)
C6 <- EnrichmentBrowser::getGenesets(
  org = "hsa",
  db = "msigdb",
  cat = "C6",
  gene.id.type = "SYMBOL"
)

# combine genesets and convert genesets to be compatible with enricher
gs <- c(H, C2, C6)
gs.list <- do.call(rbind, lapply(names(gs), function(x) {
  data.frame(gs = x, genes = gs[[x]])
}))

enrichresults <- regulonEnrich(
  TF = c("GATA6", "NKX2-1"),
  regulon = regulon,
  weight = "weight",
  weight_cutoff = 0.1,
  genesets = gs.list
)
## GATA6
## 
## NKX2-1
# plot results
enrichPlot(results = enrichresults)

Network analysis

We can visualize the genesets as a network

plotGseaNetwork(
  tf = names(enrichresults),
  enrichresults = enrichresults,
  p.adj_cutoff = 0.1,
  ntop_pathways = 10
)

Differential networks

We are interested in understanding the differential networks between two conditions and determining which transcription factors account for the differences in the topology of networks. The pruned regulons with cluster-specific test statistics computed by pruneRegulon can be used to generate cluster-specific networks based on user-defined cutoffs and to visualize differential networks for transcription factors of interest. In this dataset, the GATA6 gene was only expressed in cluster 1 (C1) and NKX2-1 was only expressed in cluster 3 (C3). If we visualize the target genes of GATA6, we can see that C1 has many more target genes of GATA6 compared to C5, a cluster that does not express GATA6. Similarly, NKX2-1 target genes are confined to C3 which is the only cluster that exogenously expresses NKX2-1.

plotDiffNetwork(regulon,
  cutoff = 0.1,
  tf = c("GATA6"),
  weight = "weight",
  clusters = c("C1", "C5"),
  layout = "stress"
)
## Building graph using weight as edge weights

plotDiffNetwork(regulon,
  cutoff = 0.1,
  tf = c("NKX2-1"),
  weight = "weight",
  clusters = c("C3", "C5"),
  layout = "stress"
)
## Building graph using weight as edge weights

We can also visualize how transcription factors relate to other transcription factors in C1 cluster.

selected <- which(regulon$weight[, "C1"] > 0 &
  regulon$tf %in% c("GATA6", "FOXA1", "AR"))

C1_network <- buildGraph(regulon[selected, ],
  weights = "weight",
  cluster = "C1"
  )
## Building graph using weight as edge weights
plotEpiregulonNetwork(C1_network,
  layout = "sugiyama",
  tfs_to_highlight = c("GATA6", "FOXA1", "AR")) + 
  ggplot2::ggtitle("C1")

To systematically examine the differential network topology between two clusters, we perform an edge subtraction between two graphs, using weights computed by pruneRegulon. We then calculate the degree centrality of the weighted differential graphs and if desired, normalize the differential centrality against the total number of edges. The default normalization function is sqrt as it preserves both the difference in the number of edges (but scaled by sqrt) and the differences in the weights. If the user only wants to examine the differences in the averaged weights, the FUN argument can be changed to identity. Finally, we rank the transcription factors by (normalized) differential centrality. For demonstration purpose, the regulon list is truncated but the full list would contain all the transcription factors.

# rank by differential centrality
C1_network <- buildGraph(regulon, weights = "weight", cluster = "C1")
## Building graph using weight as edge weights
C5_network <- buildGraph(regulon, weights = "weight", cluster = "C5")
## Building graph using weight as edge weights
diff_graph <- buildDiffGraph(C1_network, C5_network, abs_diff = FALSE)
diff_graph <- addCentrality(diff_graph)
diff_graph <- normalizeCentrality(diff_graph)
rank_table <- rankTfs(diff_graph)

library(ggplot2)
ggplot(rank_table, aes(x = rank, y = centrality)) +
  geom_point() +
  ggrepel::geom_text_repel(data = rank_table, aes(label = tf)) +
  theme_classic()

Session Info

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggplot2_3.5.1               org.Hs.eg.db_3.20.0        
##  [3] AnnotationDbi_1.69.0        msigdbr_7.5.1              
##  [5] epiregulon.extra_1.3.0      epiregulon_1.3.1           
##  [7] scMultiome_1.6.0            SingleCellExperiment_1.29.1
##  [9] MultiAssayExperiment_1.33.1 SummarizedExperiment_1.37.0
## [11] Biobase_2.67.0              GenomicRanges_1.59.1       
## [13] GenomeInfoDb_1.43.2         IRanges_2.41.2             
## [15] S4Vectors_0.45.2            MatrixGenerics_1.19.0      
## [17] matrixStats_1.4.1           ExperimentHub_2.15.0       
## [19] AnnotationHub_3.15.0        BiocFileCache_2.15.0       
## [21] dbplyr_2.5.0                BiocGenerics_0.53.3        
## [23] generics_0.1.3              BiocStyle_2.35.0           
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.4.2            ggplotify_0.1.2          bitops_1.0-9            
##   [4] filelock_1.0.3           tibble_3.2.1             R.oo_1.27.0             
##   [7] polyclip_1.10-7          graph_1.85.0             XML_3.99-0.17           
##  [10] lifecycle_1.0.4          edgeR_4.5.1              lattice_0.22-6          
##  [13] MASS_7.3-61              backports_1.5.0          magrittr_2.0.3          
##  [16] limma_3.63.2             sass_0.4.9               rmarkdown_2.29          
##  [19] jquerylib_0.1.4          yaml_2.3.10              ggtangle_0.0.6          
##  [22] metapod_1.15.0           RColorBrewer_1.1-3       cowplot_1.1.3           
##  [25] DBI_1.2.3                buildtools_1.0.0         abind_1.4-8             
##  [28] zlibbioc_1.52.0          purrr_1.0.2              R.utils_2.12.3          
##  [31] ggraph_2.2.1             RCurl_1.98-1.16          yulab.utils_0.1.8       
##  [34] tweenr_2.0.3             rappdirs_0.3.3           GenomeInfoDbData_1.2.13 
##  [37] enrichplot_1.27.3        ggrepel_0.9.6            irlba_2.3.5.1           
##  [40] tidytree_0.4.6           maketools_1.3.1          annotate_1.85.0         
##  [43] dqrng_0.4.1              codetools_0.2-20         DelayedArray_0.33.3     
##  [46] DOSE_4.1.0               scuttle_1.17.0           ggforce_0.4.2           
##  [49] tidyselect_1.2.1         aplot_0.2.4              UCSC.utils_1.3.0        
##  [52] farver_2.1.2             ScaledMatrix_1.15.0      viridis_0.6.5           
##  [55] jsonlite_1.8.9           BiocNeighbors_2.1.2      tidygraph_1.3.1         
##  [58] scater_1.35.0            tools_4.4.2              treeio_1.31.0           
##  [61] Rcpp_1.0.13-1            glue_1.8.0               gridExtra_2.3           
##  [64] SparseArray_1.7.2        xfun_0.49                qvalue_2.39.0           
##  [67] dplyr_1.1.4              HDF5Array_1.35.2         withr_3.0.2             
##  [70] BiocManager_1.30.25      fastmap_1.2.0            rhdf5filters_1.19.0     
##  [73] bluster_1.17.0           digest_0.6.37            rsvd_1.0.5              
##  [76] gridGraphics_0.5-1       R6_2.5.1                 mime_0.12               
##  [79] colorspace_2.1-1         GO.db_3.20.0             Cairo_1.6-2             
##  [82] RSQLite_2.3.9            R.methodsS3_1.8.2        tidyr_1.3.1             
##  [85] data.table_1.16.4        graphlayouts_1.2.1       httr_1.4.7              
##  [88] S4Arrays_1.7.1           pkgconfig_2.0.3          gtable_0.3.6            
##  [91] blob_1.2.4               XVector_0.47.0           sys_3.4.3               
##  [94] clusterProfiler_4.15.1   htmltools_0.5.8.1        fgsea_1.33.0            
##  [97] GSEABase_1.69.0          scales_1.3.0             png_0.1-8               
## [100] ggfun_0.1.8              scran_1.35.0             knitr_1.49              
## [103] reshape2_1.4.4           nlme_3.1-166             checkmate_2.3.2         
## [106] curl_6.0.1               cachem_1.1.0             rhdf5_2.51.1            
## [109] stringr_1.5.1            BiocVersion_3.21.1       parallel_4.4.2          
## [112] vipor_0.4.7              ggrastr_1.0.2            pillar_1.10.0           
## [115] grid_4.4.2               vctrs_0.6.5              BiocSingular_1.23.0     
## [118] beachmat_2.23.5          xtable_1.8-4             cluster_2.1.8           
## [121] beeswarm_0.4.0           Rgraphviz_2.51.0         evaluate_1.0.1          
## [124] KEGGgraph_1.67.0         cli_3.6.3                locfit_1.5-9.10         
## [127] compiler_4.4.2           rlang_1.1.4              crayon_1.5.3            
## [130] labeling_0.4.3           plyr_1.8.9               fs_1.6.5                
## [133] ggbeeswarm_0.7.2         stringi_1.8.4            viridisLite_0.4.2       
## [136] BiocParallel_1.41.0      babelgene_22.9           munsell_0.5.1           
## [139] Biostrings_2.75.3        lazyeval_0.2.2           GOSemSim_2.33.0         
## [142] Matrix_1.7-1             patchwork_1.3.0          bit64_4.5.2             
## [145] Rhdf5lib_1.29.0          KEGGREST_1.47.0          statmod_1.5.0           
## [148] igraph_2.1.2             memoise_2.0.1            bslib_0.8.0             
## [151] ggtree_3.15.0            fastmatch_1.1-4          bit_4.5.0.1             
## [154] EnrichmentBrowser_2.37.0 gson_0.1.0               ape_5.8-1