epiregulon.extra is a companion package to epiregulon and provides functions to visualize transcription factor activity and network. It also provides statistical tests to identify transcription factors with differential activity and network topology. This tutorial continues from the reprogram-seq example included in epiregulon. We will use the gene regulatory networks constructed by epiregulon and continue with visualization and network analysis.
For full documentation of the epiregulon
package, please
refer to the epiregulon
book.
To continue with the visualization functions, we will first need the
gene expression matrix. We can download the data from scMultiome.
## Loading required package: AnnotationHub
## Loading required package: BiocGenerics
## Loading required package: generics
##
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
##
## as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
## setequal, union
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
## mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
## unsplit, which.max, which.min
## Loading required package: BiocFileCache
## Loading required package: dbplyr
## Loading required package: ExperimentHub
## Loading required package: MultiAssayExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
## The following object is masked from 'package:ExperimentHub':
##
## cache
## The following object is masked from 'package:AnnotationHub':
##
## cache
## Loading required package: SingleCellExperiment
## see ?scMultiome and browseVignettes('scMultiome') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
# expression matrix
GeneExpressionMatrix <- mae[["GeneExpressionMatrix"]]
rownames(GeneExpressionMatrix) <- rowData(GeneExpressionMatrix)$name
reducedDim(GeneExpressionMatrix, "UMAP_Combined") <-
reducedDim(mae[["TileMatrix500"]], "UMAP_Combined")
# arrange hash_assignment
GeneExpressionMatrix$hash_assignment <- factor(as.character(
GeneExpressionMatrix$hash_assignment),
levels = c("HTO10_GATA6_UTR", "HTO2_GATA6_v2", "HTO8_NKX2.1_UTR", "HTO3_NKX2.1_v2",
"HTO1_FOXA2_v2", "HTO4_mFOXA1_v2", "HTO6_hFOXA1_UTR", "HTO5_NeonG_v2"
)
)
Next we load the regulon object previously calculated by epiregulon. Here we just use the pruned version of regulon object in which only relevant columns are kept. Using epiregulon, we can calculate activity of transcription factors included in the regulon object.
library(epiregulon)
library(epiregulon.extra)
data(regulon)
score.combine <- calculateActivity(
expMatrix = GeneExpressionMatrix,
regulon = regulon,
mode = "weight",
method = "weightedMean",
exp_assay = "normalizedCounts",
normalize = FALSE
)
## calculating TF activity from regulon using weightedmean
## Warning in calculateActivity(expMatrix = GeneExpressionMatrix, regulon =
## regulon, : The weight column contains multiple subcolumns but no cluster
## information was provided. Using first column to compute activity...
## aggregating regulons...
## creating weight matrix...
## calculating activity scores...
## normalize by the number of targets...
We perform a differential analysis of transcription factor activity
across groups of cells. This function is a wrapper around
findMarkers
from scran.
library(epiregulon.extra)
markers <- findDifferentialActivity(
activity_matrix = score.combine,
clusters = GeneExpressionMatrix$hash_assignment,
pval.type = "some",
direction = "up",
test.type = "t",
logvalue = FALSE
)
We can specify the top transcription factors of interest
## Using a logFC cutoff of 1 for class HTO10_GATA6_UTR for direction equal to any
## Using a logFC cutoff of 1 for class HTO2_GATA6_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO8_NKX2.1_UTR for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO3_NKX2.1_v2 for direction equal to any
## Using a logFC cutoff of 0.3 for class HTO1_FOXA2_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO4_mFOXA1_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO6_hFOXA1_UTR for direction equal to any
## Using a logFC cutoff of 0.3 for class HTO5_NeonG_v2 for direction equal to any
We visualize the known differential transcription factors by bubble plot
plotBubble(
activity_matrix = score.combine,
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2"),
clusters = GeneExpressionMatrix$hash_assignment
)
Then visualize the most differential transcription factors by clusters
plotBubble(
activity_matrix = score.combine,
tf = markers.sig$tf,
clusters = GeneExpressionMatrix$hash_assignment
)
Visualize the known differential transcription factors by violin plot.
plotActivityViolin(
activity_matrix = score.combine,
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
clusters = GeneExpressionMatrix$hash_assignment
)
Visualize the known differential transcription factors by UMAP
options(ggrastr.default.dpi=100)
ActivityPlot <- plotActivityDim(
sce = GeneExpressionMatrix,
activity_matrix = score.combine,
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
dimtype = "UMAP_Combined",
label = "Clusters",
point_size = 0.3,
ncol = 3,
rasterise = TRUE
)
ActivityPlot
In contrast, the gene expression of the transcription factors is very sparse
options(ggrastr.default.dpi=100)
ActivityPlot <- plotActivityDim(
sce = GeneExpressionMatrix,
activity_matrix = counts(GeneExpressionMatrix),
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
dimtype = "UMAP_Combined",
label = "Clusters",
point_size = 0.3,
ncol = 3,
limit = c(0, 2),
colors = c("grey", "blue"),
legend.label = "GEX",
rasterise = TRUE
)
ActivityPlot
Visualize the activity of the selected transcription factors by heat map. Due to the maximum size limit for this vignette, the output is not shown here.
plotHeatmapRegulon(
sce = GeneExpressionMatrix,
tfs = c("GATA6", "NKX2-1"),
regulon = regulon,
regulon_cutoff = 0,
downsample = 1000,
cell_attributes = "hash_assignment",
col_gap = "hash_assignment",
exprs_values = "normalizedCounts",
name = "regulon heatmap",
column_title_rot = 45,
raster_quality=4
)
plotHeatmapActivity(
activity = score.combine,
sce = GeneExpressionMatrix,
tfs = c("GATA6", "NKX2-1", "FOXA1", "FOXA2"),
downsample = 5000,
cell_attributes = "hash_assignment",
col_gap = "hash_assignment",
name = "Activity",
column_title_rot = 45,
raster_quality=3
)
Sometimes we are interested to know what pathways are enriched in the regulon of a particular TF. We can perform geneset enrichment using the enricher function from clusterProfiler.
# retrieve genesets
H <- EnrichmentBrowser::getGenesets(
org = "hsa",
db = "msigdb",
cat = "H",
gene.id.type = "SYMBOL"
)
C2 <- EnrichmentBrowser::getGenesets(
org = "hsa",
db = "msigdb",
cat = "C2",
gene.id.type = "SYMBOL"
)
C6 <- EnrichmentBrowser::getGenesets(
org = "hsa",
db = "msigdb",
cat = "C6",
gene.id.type = "SYMBOL"
)
# combine genesets and convert genesets to be compatible with enricher
gs <- c(H, C2, C6)
gs.list <- do.call(rbind, lapply(names(gs), function(x) {
data.frame(gs = x, genes = gs[[x]])
}))
enrichresults <- regulonEnrich(
TF = c("GATA6", "NKX2-1"),
regulon = regulon,
weight = "weight",
weight_cutoff = 0.1,
genesets = gs.list
)
## GATA6
##
## NKX2-1
We can visualize the genesets as a network
We are interested in understanding the differential networks between
two conditions and determining which transcription factors account for
the differences in the topology of networks. The pruned regulons with
cluster-specific test statistics computed by pruneRegulon
can be used to generate cluster-specific networks based on user-defined
cutoffs and to visualize differential networks for transcription factors
of interest. In this dataset, the GATA6 gene was only expressed in
cluster 1 (C1) and NKX2-1 was only expressed in cluster 3 (C3). If we
visualize the target genes of GATA6, we can see that C1 has many more
target genes of GATA6 compared to C5, a cluster that does not express
GATA6. Similarly, NKX2-1 target genes are confined to C3 which is the
only cluster that exogenously expresses NKX2-1.
plotDiffNetwork(regulon,
cutoff = 0.1,
tf = c("GATA6"),
weight = "weight",
clusters = c("C1", "C5"),
layout = "stress"
)
## Building graph using weight as edge weights
plotDiffNetwork(regulon,
cutoff = 0.1,
tf = c("NKX2-1"),
weight = "weight",
clusters = c("C3", "C5"),
layout = "stress"
)
## Building graph using weight as edge weights
We can also visualize how transcription factors relate to other transcription factors in C1 cluster.
selected <- which(regulon$weight[, "C1"] > 0 &
regulon$tf %in% c("GATA6", "FOXA1", "AR"))
C1_network <- buildGraph(regulon[selected, ],
weights = "weight",
cluster = "C1"
)
## Building graph using weight as edge weights
plotEpiregulonNetwork(C1_network,
layout = "sugiyama",
tfs_to_highlight = c("GATA6", "FOXA1", "AR")) +
ggplot2::ggtitle("C1")
To systematically examine the differential network topology between two
clusters, we perform an edge subtraction between two graphs, using
weights computed by pruneRegulon
. We then calculate the
degree centrality of the weighted differential graphs and if desired,
normalize the differential centrality against the total number of edges.
The default normalization function is sqrt
as it preserves
both the difference in the number of edges (but scaled by sqrt) and the
differences in the weights. If the user only wants to examine the
differences in the averaged weights, the FUN
argument can
be changed to identity
. Finally, we rank the transcription
factors by (normalized) differential centrality. For demonstration
purpose, the regulon list is truncated but the full list would contain
all the transcription factors.
# rank by differential centrality
C1_network <- buildGraph(regulon, weights = "weight", cluster = "C1")
## Building graph using weight as edge weights
## Building graph using weight as edge weights
diff_graph <- buildDiffGraph(C1_network, C5_network, abs_diff = FALSE)
diff_graph <- addCentrality(diff_graph)
diff_graph <- normalizeCentrality(diff_graph)
rank_table <- rankTfs(diff_graph)
library(ggplot2)
ggplot(rank_table, aes(x = rank, y = centrality)) +
geom_point() +
ggrepel::geom_text_repel(data = rank_table, aes(label = tf)) +
theme_classic()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_3.5.1 org.Hs.eg.db_3.20.0
## [3] AnnotationDbi_1.69.0 msigdbr_7.5.1
## [5] epiregulon.extra_1.3.0 epiregulon_1.3.1
## [7] scMultiome_1.6.0 SingleCellExperiment_1.29.1
## [9] MultiAssayExperiment_1.33.1 SummarizedExperiment_1.37.0
## [11] Biobase_2.67.0 GenomicRanges_1.59.1
## [13] GenomeInfoDb_1.43.2 IRanges_2.41.2
## [15] S4Vectors_0.45.2 MatrixGenerics_1.19.0
## [17] matrixStats_1.4.1 ExperimentHub_2.15.0
## [19] AnnotationHub_3.15.0 BiocFileCache_2.15.0
## [21] dbplyr_2.5.0 BiocGenerics_0.53.3
## [23] generics_0.1.3 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] splines_4.4.2 ggplotify_0.1.2 bitops_1.0-9
## [4] filelock_1.0.3 tibble_3.2.1 R.oo_1.27.0
## [7] polyclip_1.10-7 graph_1.85.0 XML_3.99-0.17
## [10] lifecycle_1.0.4 edgeR_4.5.1 lattice_0.22-6
## [13] MASS_7.3-61 backports_1.5.0 magrittr_2.0.3
## [16] limma_3.63.2 sass_0.4.9 rmarkdown_2.29
## [19] jquerylib_0.1.4 yaml_2.3.10 ggtangle_0.0.6
## [22] metapod_1.15.0 RColorBrewer_1.1-3 cowplot_1.1.3
## [25] DBI_1.2.3 buildtools_1.0.0 abind_1.4-8
## [28] zlibbioc_1.52.0 purrr_1.0.2 R.utils_2.12.3
## [31] ggraph_2.2.1 RCurl_1.98-1.16 yulab.utils_0.1.8
## [34] tweenr_2.0.3 rappdirs_0.3.3 GenomeInfoDbData_1.2.13
## [37] enrichplot_1.27.3 ggrepel_0.9.6 irlba_2.3.5.1
## [40] tidytree_0.4.6 maketools_1.3.1 annotate_1.85.0
## [43] dqrng_0.4.1 codetools_0.2-20 DelayedArray_0.33.3
## [46] DOSE_4.1.0 scuttle_1.17.0 ggforce_0.4.2
## [49] tidyselect_1.2.1 aplot_0.2.4 UCSC.utils_1.3.0
## [52] farver_2.1.2 ScaledMatrix_1.15.0 viridis_0.6.5
## [55] jsonlite_1.8.9 BiocNeighbors_2.1.2 tidygraph_1.3.1
## [58] scater_1.35.0 tools_4.4.2 treeio_1.31.0
## [61] Rcpp_1.0.13-1 glue_1.8.0 gridExtra_2.3
## [64] SparseArray_1.7.2 xfun_0.49 qvalue_2.39.0
## [67] dplyr_1.1.4 HDF5Array_1.35.2 withr_3.0.2
## [70] BiocManager_1.30.25 fastmap_1.2.0 rhdf5filters_1.19.0
## [73] bluster_1.17.0 digest_0.6.37 rsvd_1.0.5
## [76] gridGraphics_0.5-1 R6_2.5.1 mime_0.12
## [79] colorspace_2.1-1 GO.db_3.20.0 Cairo_1.6-2
## [82] RSQLite_2.3.9 R.methodsS3_1.8.2 tidyr_1.3.1
## [85] data.table_1.16.4 graphlayouts_1.2.1 httr_1.4.7
## [88] S4Arrays_1.7.1 pkgconfig_2.0.3 gtable_0.3.6
## [91] blob_1.2.4 XVector_0.47.0 sys_3.4.3
## [94] clusterProfiler_4.15.1 htmltools_0.5.8.1 fgsea_1.33.0
## [97] GSEABase_1.69.0 scales_1.3.0 png_0.1-8
## [100] ggfun_0.1.8 scran_1.35.0 knitr_1.49
## [103] reshape2_1.4.4 nlme_3.1-166 checkmate_2.3.2
## [106] curl_6.0.1 cachem_1.1.0 rhdf5_2.51.1
## [109] stringr_1.5.1 BiocVersion_3.21.1 parallel_4.4.2
## [112] vipor_0.4.7 ggrastr_1.0.2 pillar_1.10.0
## [115] grid_4.4.2 vctrs_0.6.5 BiocSingular_1.23.0
## [118] beachmat_2.23.5 xtable_1.8-4 cluster_2.1.8
## [121] beeswarm_0.4.0 Rgraphviz_2.51.0 evaluate_1.0.1
## [124] KEGGgraph_1.67.0 cli_3.6.3 locfit_1.5-9.10
## [127] compiler_4.4.2 rlang_1.1.4 crayon_1.5.3
## [130] labeling_0.4.3 plyr_1.8.9 fs_1.6.5
## [133] ggbeeswarm_0.7.2 stringi_1.8.4 viridisLite_0.4.2
## [136] BiocParallel_1.41.0 babelgene_22.9 munsell_0.5.1
## [139] Biostrings_2.75.3 lazyeval_0.2.2 GOSemSim_2.33.0
## [142] Matrix_1.7-1 patchwork_1.3.0 bit64_4.5.2
## [145] Rhdf5lib_1.29.0 KEGGREST_1.47.0 statmod_1.5.0
## [148] igraph_2.1.2 memoise_2.0.1 bslib_0.8.0
## [151] ggtree_3.15.0 fastmatch_1.1-4 bit_4.5.0.1
## [154] EnrichmentBrowser_2.37.0 gson_0.1.0 ape_5.8-1