High-throughput chromosome conformation capture (Hi-C) has become the gold standard for studying 3D genome organization. While Hi-C generates rich 2D contact maps, a major goal of many genomic studies is to reconstruct the underlying 3D chromatin structures from this data.
HiSpaR is an R package designed to infer these 3D structures using hierarchical Bayesian modeling and Markov Chain Monte Carlo (MCMC) sampling. Unlike traditional constraint-based methods, HiSpaR utilizes a hierarchical framework to efficiently handle large-scale datasets while providing robust uncertainty quantification for the inferred coordinates.
HiSpaR integrates seamlessly with the HiCExperiment ecosystem. Users can easily import Hi-C data from standard formats (such as .mcool and .hic) into HiCExperiment objects and pass them directly to HiSpaR, streamlining the workflow from raw data to 3D visualization.
# Install HiSpaR from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("HiSpaR")
# HiSpaR depends on several packages for Hi-C data handling
# These are automatically installed with HiSpaR:
# - HiCExperiment: For representing Hi-C data
# - Matrix: For sparse matrix handling
# Additionally, install example data package and HiContacts for data manipulation
BiocManager::install("HiContactsData")
BiocManager::install("HiContacts")
# Optional: Install visualization and analysis packages
install.packages("plotly") # For interactive 3D visualizationHiSpaR requires the following Bioconductor and CRAN packages:
library(HiSpaR)
#> Consider using the `HiContacts` package to perform advanced genomic operations
#> on `HiCExperiment` objects.
#>
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
library(HiContacts)
#> Loading required package: HiCExperiment
#> Registered S3 methods overwritten by 'readr':
#> method from
#> as.data.frame.spec_tbl_df vroom
#> as_tibble.spec_tbl_df vroom
#> format.col_spec vroom
#> print.col_spec vroom
#> print.collector vroom
#> print.date_names vroom
#> print.locale vroom
#> str.col_spec vroom
library(HiContactsData)
#> Loading required package: ExperimentHub
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:HiCExperiment':
#>
#> as.data.frame
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following object is masked from 'package:utils':
#>
#> data
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#> order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#> rbind, Reduce, rownames, sapply, saveRDS, scale, sequence, table,
#> tapply, transform, unique, unsplit, which.max, which.min
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
library(plotly)
#> Loading required package: ggplot2
#>
#> Attaching package: 'ggplot2'
#> The following object is masked from 'package:HiCExperiment':
#>
#> resolution
#>
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#>
#> last_plot
#> The following object is masked from 'package:HiCExperiment':
#>
#> export
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following object is masked from 'package:graphics':
#>
#> layout
# Load example data from HiCExperiment
cool_file <- CoolFile(HiContactsData::HiContactsData('yeast_wt', format = 'cool'))
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
hic <- import(cool_file, focus = "II:10000-100000")
# visualize the contact matrix
plotMatrix(hic)# Load example contact matrix (from HiSpaR's built-in data)
data(su1_contact_mat)
cat("Loaded contact matrix dimensions:", dim(su1_contact_mat), "\n")
#> Loaded contact matrix dimensions: 649 649su1_contact_mat is a 649×649 numeric
contact matrix for the 10.4–46.7 Mb region of human chromosome 21 at 50
kb resolution, derived from Su et al. (2020)
(zenodo.org/records/3928890). Each entry (i, j) contains the number of
Hi-C contacts between locus i and locus j. This is HiSpaR’s built-in
example dataset.
HiCExperiment class (https://github.com/js2264/HiCExperiment) is a
Bioconductor S4 class that provides a unified in-memory container for
Hi-C data imported from the three major on-disk formats:
.cool/.mcool, .hic, and HiC-Pro
matrices. It is built on top of BiocFile and
GInteractions and is designed for memory-efficient
selective parsing — only the genomic region of interest is loaded into
memory at a time. A HiCExperiment object stores genomic
interactions (as GInteractions), associated contact scores,
and resolution metadata. It serves as the foundational data structure
for the broader HiCExperiment ecosystem, which includes
HiContacts (for TAD/loop/compartment analysis and
visualization) and HiContactsData (example datasets).
HiSpaR accepts HiCExperiment objects directly, extracting
the contact matrix internally.
HiSpaR accepts two types of input:
HiCExperiment::import() (recommended for Bioconductor
integration)Both formats are automatically handled by
hispa_analyze(), which extracts and processes the contact
matrix internally.
HiSpaR provides the main function hispa_analyze() to
perform the complete 3D structure inference pipeline.
The key parameters are: mcmc_iterations (total MCMC
iterations, default 6000), use_cluster_init (use
hierarchical clustering for initialization, recommended for large
datasets), filter_quantile (remove loci below this quantile
of contact counts, default -1 meaning no filtering; use 0.1 to remove
the bottom 10%), and mcmc_burn_in (iterations to discard as
burn-in, default 0). When run with verbose = TRUE (the
default), hispa_analyze() prints a filtering report, phase
completion checkpoints, MCMC settings and progress, per-parameter
acceptance rates, and a summary of files written to
output_dir. It returns a named list with
$positions (n×3 matrix of inferred 3D coordinates),
$beta0, and $beta1.
# Run analysis on the example HiCExperiment object
result_yeast <- hispa_analyze(
hic_experiment = hic,
output_dir = tempdir(),
mcmc_iterations = 1000,
use_cluster_init = TRUE,
mcmc_burn_in = 10,
filter_quantile = 0.1, # Filter out loci in the bottom 10% of contact counts
verbose = TRUE
)
#> Filtering loci by contact count:
#> Original number of loci: 91
#> Threshold (0.1 quantile): 311
#> Loci removed: 10
#> Final number of loci: 81
#> Consider using the `HiContacts` package to perform advanced genomic operations
#> on `HiCExperiment` objects.
#>
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
#> HiSpa - Hi-C Spatial Analysis
#> ==============================
#> Contact matrix dimensions: 81 x 81
#> Output directory: /tmp/Rtmp25qfmf
#>
#> [Pre-processing] Completed
#> [Structure Initialization] Completed
#> [Global Assembly] Completed
#>
#> === Main MCMC Algorithm ===
#> Running MCMC with 1000 iterations...
#> Burn-in: 10
#> Initial SD: 0.1
#> SD floor: 0.0001
#> SD ceiling: 0.3
#>
#> --- Starting MCMC Sampling ---
#> Iter 1000/1000. LL: 150446. Best LL: 150446
#> --- MCMC Finished ---
#> Final max log-likelihood found: 150446
#> Beta0 acceptance rate: 22.8%
#> Beta1 acceptance rate: 21%
#> Center pos acceptance rate: 19.2%
#> Locus pos acceptance rate: 17.9642%
#>
#> === Analysis Complete ===
#> All results saved:
#> - final_positions.txt
#> - final_parameters.txt
#> - initial_positions.txt
#> - initial_distances.txt
#> - initialization/ (per-cluster and backbone sub-runs)
# Check the structure of results
cat("Result structure:\n")
#> Result structure:
str(result_yeast)
#> List of 3
#> $ positions: num [1:81, 1:3] -5.8 -5.8 -5.82 -5.8 -5.82 ...
#> ..- attr(*, "filtered_locus_indices")= int [1:81] 2 3 4 5 6 7 8 9 10 11 ...
#> $ beta0 : num 2.32
#> $ beta1 : num -0.214
cat("Final positions (first 5 loci):\n")
#> Final positions (first 5 loci):
print(head(result_yeast$positions, 5))
#> [,1] [,2] [,3]
#> [1,] -5.804303 -1.400376 6.290445
#> [2,] -5.802735 -1.400246 6.291190
#> [3,] -5.819375 -1.421945 6.282167
#> [4,] -5.803130 -1.399313 6.291804
#> [5,] -5.817255 -1.601328 6.131225
cat("Estimated parameters: beta0 =", result_yeast$beta0, ", beta1 =", result_yeast$beta1, "\n")
#> Estimated parameters: beta0 = 2.324505 , beta1 = -0.2140643
# Run analysis on the example contact matrix
result_su <- hispa_analyze(
hic_experiment = su1_contact_mat,
output_dir = tempdir(),
mcmc_iterations = 2000,
use_cluster_init = TRUE,
mcmc_burn_in = 10,
filter_quantile = 0.1, # Filter out loci in the bottom 10% of contact counts
verbose = TRUE
)
#> Filtering loci by contact count:
#> Original number of loci: 649
#> Threshold (0.1 quantile): 13595.8
#> Loci removed: 65
#> Final number of loci: 584
#> Consider using the `HiContacts` package to perform advanced genomic operations
#> on `HiCExperiment` objects.
#>
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
#> HiSpa - Hi-C Spatial Analysis
#> ==============================
#> Contact matrix dimensions: 584 x 584
#> Output directory: /tmp/Rtmp25qfmf
#>
#> [Pre-processing] Completed
#> [Structure Initialization] Completed
#> [Global Assembly] Completed
#>
#> === Main MCMC Algorithm ===
#> Running MCMC with 2000 iterations...
#> Burn-in: 10
#> Initial SD: 0.1
#> SD floor: 0.0001
#> SD ceiling: 0.3
#>
#> --- Starting MCMC Sampling ---
#> Iter 1000/2000. LL: 4.18908e+07. Best LL: 4.18908e+07
#> Iter 2000/2000. LL: 4.19484e+07. Best LL: 4.19484e+07
#> --- MCMC Finished ---
#> Final max log-likelihood found: 4.19484e+07
#> Beta0 acceptance rate: 23%
#> Beta1 acceptance rate: 20.05%
#> Center pos acceptance rate: 3.85625%
#> Locus pos acceptance rate: 2.74195%
#>
#> === Analysis Complete ===
#> All results saved:
#> - final_positions.txt
#> - final_parameters.txt
#> - initial_positions.txt
#> - initial_distances.txt
#> - initialization/ (per-cluster and backbone sub-runs)
# Check the structure of results
cat("Result structure:\n")
#> Result structure:
str(result_su)
#> List of 3
#> $ positions: num [1:584, 1:3] 2.84 2.83 2.83 2.83 2.82 ...
#> ..- attr(*, "filtered_locus_indices")= int [1:584] 7 10 11 12 13 14 15 16 17 19 ...
#> $ beta0 : num 3.1
#> $ beta1 : num -0.411
cat("Final positions (first 5 loci):\n")
#> Final positions (first 5 loci):
print(head(result_su$positions, 5))
#> [,1] [,2] [,3]
#> [1,] 2.841019 1.623848 -1.511908
#> [2,] 2.833924 1.629489 -1.506037
#> [3,] 2.833981 1.629377 -1.506003
#> [4,] 2.834014 1.629340 -1.505997
#> [5,] 2.821998 1.637273 -1.499680
cat("Estimated parameters: beta0 =", result_su$beta0, ", beta1 =", result_su$beta1, "\n")
#> Estimated parameters: beta0 = 3.099472 , beta1 = -0.4110913Verbose output during the run includes:
[Pre-processing] Completed,
[Structure Initialization] Completed,
[Global Assembly] Completed), followed by MCMC settings
(iterations, burn-in, SD bounds) and a final summary
(Final max log-likelihood found: ...).output_dir: always
final_positions.txt, final_parameters.txt,
initial_positions.txt, initial_distances.txt;
additionally initialization/ (per-cluster and backbone
sub-runs) when use_cluster_init = TRUE.Return value — a list of 3 elements:
$positions: numeric matrix of shape n×3
(e.g. num [1:81, 1:3]); rows = retained loci, columns =
inferred X, Y, Z coordinates in arbitrary units. Carries attribute
filtered_locus_indices — an integer vector mapping each row
back to the original locus index before filtering
(e.g. int [1:81] 2 3 4 5 6 …).$beta0: scalar; intercept of the log-distance model
(e.g. 3.54). Higher values indicate higher baseline contact
probability.$beta1: scalar; slope of the log-distance model
(e.g. -1.42). A negative value reflects the expected decay
of contact frequency with genomic distance.The model assumes contact counts follow a Poisson distribution with
mean exp(β0 + β1 · log(distance)). β0 and β1 are jointly
inferred with the 3D positions during MCMC.
Example output from str(result_yeast):
List of 3
$ positions: num [1:81, 1:3] 0.142 -0.031 0.218 ...
..- attr(*, "filtered_locus_indices")= int [1:81] 2 3 4 5 6 7 8 9 10 11 ...
$ beta0 : num 3.54
$ beta1 : num -1.42First 5 rows of result_yeast$positions:
We recommend using the plotly package for interactive 3D
visualization of the inferred chromatin structures:
coords_yeast <- result_yeast$positions
# Create interactive 3D scatter plot
plot_ly(
x = coords_yeast[,1],
y = coords_yeast[,2],
z = coords_yeast[,3],
type = 'scatter3d',
mode = 'markers+lines',
marker = list(size = 5, color = ~seq_len(nrow(coords_yeast)), showscale = TRUE),
) %>%
layout(
title = "Inferred 3D Chromatin Structure",
scene = list(
xaxis = list(title = "X"),
yaxis = list(title = "Y"),
zaxis = list(title = "Z")
)
)# Extract coordinates from results
coords_su <- result_su$positions
# Create interactive 3D scatter plot
plot_ly(
x = coords_su[,1],
y = coords_su[,2],
z = coords_su[,3],
type = 'scatter3d',
mode = 'markers+lines',
marker = list(size = 5, color = ~seq_len(nrow(coords_su)), showscale = TRUE),
) %>%
layout(
title = "Inferred 3D Chromatin Structure",
scene = list(
xaxis = list(title = "X"),
yaxis = list(title = "Y"),
zaxis = list(title = "Z")
)
)All results are automatically saved to the specified output directory:
final_positions.txt: Final inferred 3D coordinates (n ×
3 matrix)initial_positions.txt: Initial positions before
MCMCsessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] plotly_4.12.0 ggplot2_4.0.3 HiContactsData_1.15.0
#> [4] ExperimentHub_3.3.0 AnnotationHub_4.3.0 BiocFileCache_3.3.0
#> [7] dbplyr_2.5.2 BiocGenerics_0.59.6 generics_0.1.4
#> [10] HiContacts_1.15.0 HiCExperiment_1.13.0 HiSpaR_1.1.0
#> [13] BiocStyle_2.41.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.3.0 httr2_1.2.2
#> [3] rlang_1.2.0 magrittr_2.0.5
#> [5] otel_0.2.0 matrixStats_1.5.0
#> [7] compiler_4.6.0 RSQLite_3.53.1
#> [9] callr_3.7.6 png_0.1-9
#> [11] vctrs_0.7.3 stringr_1.6.0
#> [13] pkgconfig_2.0.3 crayon_1.5.3
#> [15] fastmap_1.2.0 XVector_0.53.0
#> [17] labeling_0.4.3 rmarkdown_2.31
#> [19] tzdb_0.5.0 ps_1.9.3
#> [21] ggbeeswarm_0.7.3 strawr_0.0.92
#> [23] purrr_1.2.2 bit_4.6.0
#> [25] xfun_0.57 cachem_1.1.0
#> [27] jsonlite_2.0.0 blob_1.3.0
#> [29] rhdf5filters_1.25.0 DelayedArray_0.39.3
#> [31] Rhdf5lib_2.1.0 BiocParallel_1.47.0
#> [33] parallel_4.6.0 R6_2.6.1
#> [35] bslib_0.11.0 stringi_1.8.7
#> [37] RColorBrewer_1.1-3 GenomicRanges_1.65.0
#> [39] jquerylib_0.1.4 Rcpp_1.1.1-1.1
#> [41] Seqinfo_1.3.0 SummarizedExperiment_1.43.0
#> [43] knitr_1.51 readr_2.2.0
#> [45] IRanges_2.47.2 BiocBaseUtils_1.15.1
#> [47] Matrix_1.7-5 tidyselect_1.2.1
#> [49] abind_1.4-8 yaml_2.3.12
#> [51] codetools_0.2-20 processx_3.9.0
#> [53] curl_7.1.0 lattice_0.22-9
#> [55] tibble_3.3.1 InteractionSet_1.41.0
#> [57] Biobase_2.73.1 withr_3.0.2
#> [59] KEGGREST_1.53.0 S7_0.2.2
#> [61] evaluate_1.0.5 ggrastr_1.0.2
#> [63] Biostrings_2.81.2 pillar_1.11.1
#> [65] BiocManager_1.30.27 filelock_1.0.3
#> [67] MatrixGenerics_1.25.0 stats4_4.6.0
#> [69] vroom_1.7.1 BiocVersion_3.24.0
#> [71] S4Vectors_0.51.3 hms_1.1.4
#> [73] scales_1.4.0 glue_1.8.1
#> [75] lazyeval_0.2.3 maketools_1.3.2
#> [77] tools_4.6.0 BiocIO_1.23.3
#> [79] data.table_1.18.4 sys_3.4.3
#> [81] RSpectra_0.16-2 buildtools_1.0.0
#> [83] Cairo_1.7-0 rhdf5_2.57.1
#> [85] grid_4.6.0 tidyr_1.3.2
#> [87] crosstalk_1.2.2 AnnotationDbi_1.75.0
#> [89] beeswarm_0.4.0 vipor_0.4.7
#> [91] cli_3.6.6 rappdirs_0.3.4
#> [93] viridisLite_0.4.3 S4Arrays_1.13.0
#> [95] dplyr_1.2.1 gtable_0.3.6
#> [97] sass_0.4.10 digest_0.6.39
#> [99] SparseArray_1.13.2 htmlwidgets_1.6.4
#> [101] farver_2.1.2 memoise_2.0.1
#> [103] htmltools_0.5.9 lifecycle_1.0.5
#> [105] httr_1.4.8 bit64_4.8.2