When comparing samples, a common task is identifying overlapping loops
among two or more sets of genomic interactions. Traditionally, this is
done with visualizations such as vennDiagram or UpSet plots. However,
the totals displayed in these plots often do not match the original
counts for each individual list. The reason for this discrepancy is that
a single overlap may encompass multiple interactions from one or more
samples. This issue has been discussed extensively in the context of
overlap callers for ChIP-Seq peaks.
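To make the discrepancy concrete, here is a toy illustration in base R. The coordinates are made up, and the overlap rule (two loops overlap when both of their anchors share at least one position) is a simplification chosen for this sketch. It is not necessarily the rule hicVennDiagram applies internally.

```r
# Toy loops from two samples on a single chromosome (made-up coordinates).
# Each row is one loop: anchor 1 = [start1, end1], anchor 2 = [start2, end2].
sampleA <- data.frame(start1 = c(100, 150, 5000), end1 = c(200, 250, 5100),
                      start2 = c(1000, 1050, 9000), end2 = c(1100, 1150, 9100))
sampleB <- data.frame(start1 = c(120, 7000), end1 = c(220, 7100),
                      start2 = c(1020, 8000), end2 = c(1120, 8100))

# Closed intervals [s1, e1] and [s2, e2] share at least one position.
intervals_overlap <- function(s1, e1, s2, e2) s1 <= e2 && s2 <= e1

# Simplified rule: loop i of x overlaps loop j of y when both anchors overlap.
loops_overlap <- function(x, i, y, j) {
  intervals_overlap(x$start1[i], x$end1[i], y$start1[j], y$end1[j]) &&
    intervals_overlap(x$start2[i], x$end2[i], y$start2[j], y$end2[j])
}

# Pairwise hit matrix: rows are loops of sampleA, columns are loops of sampleB.
hits <- outer(seq_len(nrow(sampleA)), seq_len(nrow(sampleB)),
              Vectorize(function(i, j) loops_overlap(sampleA, i, sampleB, j)))

shared_from_A <- sum(rowSums(hits) > 0)  # loops of A with a partner in B: 2
shared_from_B <- sum(colSums(hits) > 0)  # loops of B with a partner in A: 1
only_A <- nrow(sampleA) - shared_from_A  # 1
only_B <- nrow(sampleB) - shared_from_B  # 1
```

One loop of sampleB overlaps two loops of sampleA, so the shared part is 2 from A's point of view but 1 from B's. A diagram that shows a single number in the intersection cannot reproduce both original totals (3 and 2).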
The hicVennDiagram package aims to provide an easy-to-use tool for calculating overlapping interactions and visualizing them properly. It generates plots specifically designed to avoid the misleading totals produced by the simple counting method.
Here is an example using hicVennDiagram with 3 files in BEDPE format.
First, install hicVennDiagram and the other packages required to run the examples.
## install the packages if needed:
## BiocManager::install(c("hicVennDiagram", "ChIPpeakAnno", "ggplot2"))
library(hicVennDiagram)
library(ggplot2)
# list the BEDPE files
file_folder <- system.file("extdata",
                           package = "hicVennDiagram",
                           mustWork = TRUE)
file_list <- dir(file_folder, pattern = "\\.bedpe$", full.names = TRUE)
names(file_list) <- sub("\\.bedpe$", "", basename(file_list))
basename(file_list)
## [1] "group1.bedpe" "group2.bedpe" "group3.bedpe"
venn <- vennCount(file_list)
## upset plot
## temp fix for https://github.com/krassowski/complex-upset/issues/195
upset_themes_fix <- lapply(ComplexUpset::upset_themes, function(.ele){
  lapply(.ele, function(.e){
    do.call(theme, .e[names(.e) %in% names(formals(theme))])
  })
})
upsetPlot(venn,
          themes = upset_themes_fix)
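The temporary fix rebuilds every ComplexUpset theme by passing to theme() only those entries whose names are formal arguments of theme(). The base-R sketch below shows the same filtering idiom on a hypothetical toy_theme() function, so it runs without ggplot2 or ComplexUpset. It illustrates the idiom only and does not reproduce the upstream issue itself.

```r
# Hypothetical stand-in for a theme constructor that accepts only two arguments.
toy_theme <- function(text = NULL, title = NULL) {
  list(text = text, title = title)
}

# A stored list of settings that includes an argument toy_theme() does not accept.
settings <- list(text = "bold", title = "large", legacy_arg = TRUE)

# Passing everything fails with an "unused argument" error; capture its message.
direct <- tryCatch(do.call(toy_theme, settings),
                   error = function(e) conditionMessage(e))

# The filtering idiom: keep only entries named after formal arguments, then call.
accepted <- settings[names(settings) %in% names(formals(toy_theme))]
rebuilt <- do.call(toy_theme, accepted)
```

Filtering by names(formals()) silently drops anything the target function cannot accept, which is convenient for a stopgap but also means a misspelled setting disappears without warning.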
vennCount
The vennCount function uses InteractionSet::findOverlaps to calculate
the overlaps and then summarizes the results for each category. Users
may want to try different combinations of the maxgap and minoverlap
parameters when calculating the overlapping loops.
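As a rough guide to what the two parameters control, the sketch below re-implements the idea in base R for a single pair of loops. It assumes the convention used by IRanges-style overlaps (the gap between adjacent ranges is 0, and overlapping ranges count as a gap of -1). anchors_match() and loops_match() are hypothetical helpers written for illustration. The actual behavior of vennCount and InteractionSet::findOverlaps (for example, whether anchors are also compared in swapped order) may differ.

```r
# Simplified, hypothetical re-implementation of maxgap/minoverlap semantics
# for closed integer intervals [s1, e1] and [s2, e2] (not InteractionSet code).
anchors_match <- function(s1, e1, s2, e2, maxgap = -1L, minoverlap = 0L) {
  overlap <- max(0L, min(e1, e2) - max(s1, s2) + 1L)  # number of shared positions
  # gap: 0 for adjacent intervals, -1 by convention when they overlap
  gap <- if (overlap > 0L) -1L else max(s1, s2) - min(e1, e2) - 1L
  gap <= maxgap && overlap >= minoverlap
}

# A loop is c(start1, end1, start2, end2); this sketch only compares
# anchor 1 with anchor 1 and anchor 2 with anchor 2.
loops_match <- function(x, y, maxgap = -1L, minoverlap = 0L) {
  anchors_match(x[1], x[2], y[1], y[2], maxgap, minoverlap) &&
    anchors_match(x[3], x[4], y[3], y[4], maxgap, minoverlap)
}

loop_a <- c(100L, 200L, 1000L, 1100L)
loop_b <- c(250L, 350L, 1050L, 1150L)  # anchor 1 is 49 bp away from loop_a's
loop_c <- c(150L, 250L, 1050L, 1150L)  # both anchors share 51 bp with loop_a

loops_match(loop_a, loop_b)                    # FALSE: anchor 1 does not overlap
loops_match(loop_a, loop_b, maxgap = 50L)      # TRUE: a 49 bp gap is tolerated
loops_match(loop_a, loop_c, minoverlap = 50L)  # TRUE: 51 bp >= 50 bp
loops_match(loop_a, loop_c, minoverlap = 60L)  # FALSE: 51 bp < 60 bp
```

Raising maxgap merges nearby but non-overlapping loops, while raising minoverlap demands more shared sequence; both change how many loops end up in each category.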
ChIPpeakAnno
library(ChIPpeakAnno)
bed <- system.file("extdata", "MACS_output.bed", package = "ChIPpeakAnno")
gr1 <- toGRanges(bed, format = "BED", header = FALSE)
gff <- system.file("extdata", "GFF_peaks.gff", package = "ChIPpeakAnno")
gr2 <- toGRanges(gff, format = "GFF", header = FALSE, skip = 3)
ol <- findOverlapsOfPeaks(gr1, gr2)
## convert the overlap counts from findOverlapsOfPeaks into a vennTable
overlappingPeaksToVennTable <- function(.ele){
  .venn <- .ele$venn_cnt
  k <- which(colnames(.venn) == "Counts")
  rownames(.venn) <- apply(.venn[, seq.int(k - 1)], 1, paste, collapse = "")
  colnames(.venn) <- sub("count.", "", colnames(.venn), fixed = TRUE)
  vennTable(combinations = .venn[, seq.int(k - 1)],
            counts = .venn[, k],
            vennCounts = .venn[, seq.int(ncol(.venn))[-seq.int(k)]])
}
venn <- overlappingPeaksToVennTable(ol)
vennPlot(venn)
## or you can simply try vennPlot(vennCount(c(bed, gff)))
upsetPlot(venn, themes = upset_themes_fix)
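To see what the conversion helper does to the venn_cnt matrix, the base-R sketch below applies the same reshaping steps to a small made-up matrix. The column layout (membership columns, then "Counts", then "count.<set>" columns) is inferred from the helper's own code rather than from ChIPpeakAnno documentation, and the numbers are invented for illustration.

```r
# Made-up stand-in for ol$venn_cnt; layout inferred from the helper above.
venn_cnt <- cbind(gr1 = c(0, 0, 1, 1),
                  gr2 = c(0, 1, 0, 1),
                  Counts = c(0, 120, 80, 40),
                  count.gr1 = c(0, 0, 80, 45),
                  count.gr2 = c(0, 120, 0, 40))

k <- which(colnames(venn_cnt) == "Counts")

# Membership columns are collapsed into keys such as "01" (gr2 only).
keys <- apply(venn_cnt[, seq.int(k - 1), drop = FALSE], 1, paste, collapse = "")

# Columns after "Counts" hold per-set counts; strip the "count." prefix.
per_set <- venn_cnt[, -seq.int(k), drop = FALSE]
colnames(per_set) <- sub("count.", "", colnames(per_set), fixed = TRUE)
rownames(per_set) <- keys
```

The per-set columns let each set report its own count for the same combination; in this made-up "11" row, gr1 contributes 45 and gr2 contributes 40, which a single intersection number would hide.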
## change the font size of labels and numbers
updated_theme <- ComplexUpset::upset_modify_themes(
  ## get help by vignette('Examples_R', package = 'ComplexUpset')
  list('intersections_matrix' =
         ggplot2::theme(
           ## font size of label: gr1/gr2
           axis.text.y = ggplot2::element_text(size = 24),
           ## font size of label `group`
           axis.title.x = ggplot2::element_text(size = 24)),
       'overall_sizes' =
         ggplot2::theme(
           ## font size of x-axis 0-200
           axis.text = ggplot2::element_text(size = 12),
           ## font size of x-label `Set size`
           axis.title = ggplot2::element_text(size = 18)),
       'Intersection size' =
         ggplot2::theme(
           ## font size of y-axis 0-150
           axis.text = ggplot2::element_text(size = 20),
           ## font size of y-label `Intersection size`
           axis.title = ggplot2::element_text(size = 16)
         ),
       'default' = ggplot2::theme_minimal())
)
updated_theme <- lapply(updated_theme, function(.ele){
  lapply(.ele, function(.e){
    do.call(theme, .e[names(.e) %in% names(formals(theme))])
  })
})
upsetPlot(venn,
          label_all = list(na.rm = TRUE, color = 'gray30', alpha = .7,
                           label.padding = unit(0.1, "lines"),
                           size = 8 ## control the font size of the individual numbers
          ),
          base_annotations = list('Intersection size' =
                                    ComplexUpset::intersection_size(
                                      ## font size of counts in the bar-plot
                                      text = list(size = 6)
                                    )),
          themes = updated_theme
)
Session Info
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] ChIPpeakAnno_3.41.0 ggplot2_3.5.1 GenomicRanges_1.57.2
[4] GenomeInfoDb_1.41.2 IRanges_2.39.2 S4Vectors_0.43.2
[7] BiocGenerics_0.53.0 hicVennDiagram_1.5.0

loaded via a namespace (and not attached):
[1] eulerr_7.0.2 sys_3.4.3
[3] jsonlite_1.8.9 magrittr_2.0.3
[5] GenomicFeatures_1.57.1 farver_2.1.2
[7] rmarkdown_2.28 BiocIO_1.17.0
[9] zlibbioc_1.51.2 vctrs_0.6.5
[11] multtest_2.61.0 memoise_2.0.1
[13] Rsamtools_2.21.2 RCurl_1.98-1.16
[15] htmltools_0.5.8.1 S4Arrays_1.5.11
[17] progress_1.2.3 lambda.r_1.2.4
[19] curl_5.2.3 ComplexUpset_1.3.3
[21] SparseArray_1.5.45 sass_0.4.9
[23] bslib_0.8.0 htmlwidgets_1.6.4
[25] plyr_1.8.9 httr2_1.0.5
[27] futile.options_1.0.1 cachem_1.1.0
[29] buildtools_1.0.0 GenomicAlignments_1.41.0
[31] lifecycle_1.0.4 pkgconfig_2.0.3
[33] Matrix_1.7-1 R6_2.5.1
[35] fastmap_1.2.0 GenomeInfoDbData_1.2.13
[37] MatrixGenerics_1.17.1 digest_0.6.37
[39] colorspace_2.1-1 patchwork_1.3.0
[41] AnnotationDbi_1.69.0 regioneR_1.37.0
[43] RSQLite_2.3.7 filelock_1.0.3
[45] labeling_0.4.3 fansi_1.0.6
[47] httr_1.4.7 polyclip_1.10-7
[49] abind_1.4-8 compiler_4.4.1
[51] bit64_4.5.2 withr_3.0.2
[53] BiocParallel_1.41.0 DBI_1.2.3
[55] highr_0.11 biomaRt_2.63.0
[57] MASS_7.3-61 rappdirs_0.3.3
[59] DelayedArray_0.33.1 rjson_0.2.23
[61] tools_4.4.1 glue_1.8.0
[63] VennDiagram_1.7.3 restfulr_0.0.15
[65] InteractionSet_1.33.0 grid_4.4.1
[67] polylabelr_0.2.0 reshape2_1.4.4
[69] generics_0.1.3 BSgenome_1.75.0
[71] gtable_0.3.6 tidyr_1.3.1
[73] ensembldb_2.29.1 data.table_1.16.2
[75] hms_1.1.3 xml2_1.3.6
[77] utf8_1.2.4 XVector_0.45.0
[79] pillar_1.9.0 stringr_1.5.1
[81] splines_4.4.1 dplyr_1.1.4
[83] BiocFileCache_2.15.0 lattice_0.22-6
[85] survival_3.7-0 rtracklayer_1.65.0
[87] bit_4.5.0 universalmotif_1.23.8
[89] tidyselect_1.2.1 RBGL_1.81.0
[91] maketools_1.3.1 Biostrings_2.75.0
[93] knitr_1.48 ProtGenerics_1.37.1
[95] SummarizedExperiment_1.35.5 svglite_2.1.3
[97] futile.logger_1.4.3 xfun_0.48
[99] Biobase_2.67.0 matrixStats_1.4.1
[101] stringi_1.8.4 UCSC.utils_1.1.0
[103] lazyeval_0.2.2 yaml_2.3.10
[105] evaluate_1.0.1 codetools_0.2-20
[107] tibble_3.2.1 BiocManager_1.30.25
[109] graph_1.83.0 cli_3.6.3
[111] systemfonts_1.1.0 munsell_0.5.1
[113] jquerylib_0.1.4 Rcpp_1.0.13
[115] dbplyr_2.5.0 png_0.1-8
[117] XML_3.99-0.17 parallel_4.4.1
[119] blob_1.2.4 prettyunits_1.2.0
[121] AnnotationFilter_1.31.0 bitops_1.0-9
[123] pwalign_1.1.0 scales_1.3.0
[125] purrr_1.0.2 crayon_1.5.3
[127] BiocStyle_2.35.0 rlang_1.1.4
[129] KEGGREST_1.45.1 formatR_1.14