Getting Started with HiSpaR

Introduction

High-throughput chromosome conformation capture (Hi-C) has become the gold standard for studying 3D genome organization. While Hi-C generates rich 2D contact maps, a major goal of many genomic studies is to reconstruct the underlying 3D chromatin structures from this data.

HiSpaR is an R package designed to infer these 3D structures using hierarchical Bayesian modeling and Markov Chain Monte Carlo (MCMC) sampling. Unlike traditional constraint-based methods, HiSpaR utilizes a hierarchical framework to efficiently handle large-scale datasets while providing robust uncertainty quantification for the inferred coordinates.

HiSpaR integrates seamlessly with the HiCExperiment ecosystem. Users can easily import Hi-C data from standard formats (such as .mcool and .hic) into HiCExperiment objects and pass them directly to HiSpaR, streamlining the workflow from raw data to 3D visualization.

Installation

# Install HiSpaR from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HiSpaR")

# HiSpaR depends on several packages for Hi-C data handling
# These are automatically installed with HiSpaR:
# - HiCExperiment: For representing Hi-C data
# - Matrix: For sparse matrix handling

# Additionally, install example data package and HiContacts for data manipulation
BiocManager::install("HiContactsData")
BiocManager::install("HiContacts")
# Optional: Install visualization and analysis packages
install.packages("plotly")  # For interactive 3D visualization

Dependencies

HiSpaR requires the following Bioconductor and CRAN packages:

Core: HiCExperiment, Matrix
Suggested: plotly (for interactive 3D visualization), callr (for safe subprocess execution), HiContacts (for data manipulation), HiContactsData (for example datasets)

Basic Usage

Loading Required Packages

library(HiSpaR)
#> Consider using the `HiContacts` package to perform advanced genomic operations 
#> on `HiCExperiment` objects.
#> 
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
library(HiContacts)
#> Loading required package: HiCExperiment
#> Registered S3 methods overwritten by 'readr':
#>   method                    from 
#>   as.data.frame.spec_tbl_df vroom
#>   as_tibble.spec_tbl_df     vroom
#>   format.col_spec           vroom
#>   print.col_spec            vroom
#>   print.collector           vroom
#>   print.date_names          vroom
#>   print.locale              vroom
#>   str.col_spec              vroom
library(HiContactsData)
#> Loading required package: ExperimentHub
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:HiCExperiment':
#> 
#>     as.data.frame
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following object is masked from 'package:utils':
#> 
#>     data
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, saveRDS, scale, sequence, table,
#>     tapply, transform, unique, unsplit, which.max, which.min
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
library(plotly)
#> Loading required package: ggplot2
#> 
#> Attaching package: 'ggplot2'
#> The following object is masked from 'package:HiCExperiment':
#> 
#>     resolution
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:HiCExperiment':
#> 
#>     export
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout

# Load example data from HiCExperiment
cool_file <- CoolFile(HiContactsData::HiContactsData('yeast_wt', format = 'cool'))
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
hic <- import(cool_file, focus = "II:10000-100000")
# visualize the contact matrix
plotMatrix(hic)

# Load example contact matrix (from HiSpaR's built-in data)
data(su1_contact_mat)
cat("Loaded contact matrix dimensions:", dim(su1_contact_mat), "\n")
#> Loaded contact matrix dimensions: 649 649

su1_contact_mat is a 649×649 numeric contact matrix for the 10.4–46.7 Mb region of human chromosome 21 at 50 kb resolution, derived from Su et al. (2020) (zenodo.org/records/3928890). Each entry (i, j) contains the number of Hi-C contacts between locus i and locus j. This is HiSpaR’s built-in example dataset.

HiCExperiment class (https://github.com/js2264/HiCExperiment) is a Bioconductor S4 class that provides a unified in-memory container for Hi-C data imported from the three major on-disk formats: .cool/.mcool, .hic, and HiC-Pro matrices. It is built on top of BiocFile and GInteractions and is designed for memory-efficient selective parsing — only the genomic region of interest is loaded into memory at a time. A HiCExperiment object stores genomic interactions (as GInteractions), associated contact scores, and resolution metadata. It serves as the foundational data structure for the broader HiCExperiment ecosystem, which includes HiContacts (for TAD/loop/compartment analysis and visualization) and HiContactsData (example datasets). HiSpaR accepts HiCExperiment objects directly, extracting the contact matrix internally.

Input Data Formats

HiSpaR accepts two types of input:

HiCExperiment objects: Data from HiCExperiment::import() (recommended for Bioconductor integration)
Contact matrices: Numeric matrices where rows and columns represent genomic loci and entries represent contact counts

Both formats are automatically handled by hispa_analyze(), which extracts and processes the contact matrix internally.

Quick Example

HiSpaR provides the main function hispa_analyze() to perform the complete 3D structure inference pipeline.

The key parameters are: mcmc_iterations (total MCMC iterations, default 6000), use_cluster_init (use hierarchical clustering for initialization, recommended for large datasets), filter_quantile (remove loci below this quantile of contact counts, default -1 meaning no filtering; use 0.1 to remove the bottom 10%), and mcmc_burn_in (iterations to discard as burn-in, default 0). When run with verbose = TRUE (the default), hispa_analyze() prints a filtering report, phase completion checkpoints, MCMC settings and progress, per-parameter acceptance rates, and a summary of files written to output_dir. It returns a named list with $positions (n×3 matrix of inferred 3D coordinates), $beta0, and $beta1.

# Run analysis on the example HiCExperiment object
result_yeast <- hispa_analyze(
  hic_experiment = hic,
  output_dir = tempdir(),
  mcmc_iterations = 1000,
  use_cluster_init = TRUE,
  mcmc_burn_in = 10,
  filter_quantile = 0.1,  # Filter out loci in the bottom 10% of contact counts
  verbose = TRUE
)
#> Filtering loci by contact count:
#>   Original number of loci: 91
#>   Threshold (0.1 quantile): 311
#>   Loci removed: 10
#>   Final number of loci: 81
#> Consider using the `HiContacts` package to perform advanced genomic operations 
#> on `HiCExperiment` objects.
#> 
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
#> HiSpa - Hi-C Spatial Analysis
#> ==============================
#> Contact matrix dimensions: 81 x 81
#> Output directory: /tmp/Rtmp4WphO0
#> 
#> [Pre-processing] Completed
#> [Structure Initialization] Completed
#> [Global Assembly] Completed
#> 
#> === Main MCMC Algorithm ===
#> Running MCMC with 1000 iterations...
#> Burn-in: 10
#> Initial SD: 0.1
#> SD floor: 0.0001
#> SD ceiling: 0.3
#> 
#> --- Starting MCMC Sampling ---
#> Iter 1000/1000. LL: 149665. Best LL: 149665
#> --- MCMC Finished ---
#> Final max log-likelihood found: 149665
#> Beta0 acceptance rate: 23.6%
#> Beta1 acceptance rate: 21.4%
#> Center pos acceptance rate: 18.3778%
#> Locus pos acceptance rate: 18.3519%
#> 
#> === Analysis Complete ===
#> All results saved: 
#>   - final_positions.txt
#>   - final_parameters.txt
#>   - initial_positions.txt
#>   - initial_distances.txt
#>   - initialization/ (per-cluster and backbone sub-runs)
# Check the structure of results
cat("Result structure:\n")
#> Result structure:
str(result_yeast)
#> List of 3
#>  $ positions: num [1:81, 1:3] -0.85 -0.847 -0.842 -0.855 -0.934 ...
#>   ..- attr(*, "filtered_locus_indices")= int [1:81] 2 3 4 5 6 7 8 9 10 11 ...
#>  $ beta0    : num 2.33
#>  $ beta1    : num -0.217
cat("Final positions (first 5 loci):\n")
#> Final positions (first 5 loci):
print(head(result_yeast$positions, 5))
#>            [,1]      [,2]      [,3]
#> [1,] -0.8503977 -7.007825 -5.515049
#> [2,] -0.8474802 -7.008591 -5.515197
#> [3,] -0.8423437 -6.999841 -5.528462
#> [4,] -0.8547293 -7.007771 -5.511029
#> [5,] -0.9339859 -7.117382 -5.509558
cat("Estimated parameters: beta0 =", result_yeast$beta0, ", beta1 =", result_yeast$beta1, "\n")
#> Estimated parameters: beta0 = 2.334164 , beta1 = -0.2169684

# Run analysis on the example contact matrix
result_su <- hispa_analyze(
  hic_experiment = su1_contact_mat,
  output_dir = tempdir(),
  mcmc_iterations = 2000,
  use_cluster_init = TRUE,
  mcmc_burn_in = 10,
  filter_quantile = 0.1,  # Filter out loci in the bottom 10% of contact counts
  verbose = TRUE
)
#> Filtering loci by contact count:
#>   Original number of loci: 649
#>   Threshold (0.1 quantile): 13595.8
#>   Loci removed: 65
#>   Final number of loci: 584
#> Consider using the `HiContacts` package to perform advanced genomic operations 
#> on `HiCExperiment` objects.
#> 
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
#> HiSpa - Hi-C Spatial Analysis
#> ==============================
#> Contact matrix dimensions: 584 x 584
#> Output directory: /tmp/Rtmp4WphO0
#> 
#> [Pre-processing] Completed
#> [Structure Initialization] Completed
#> [Global Assembly] Completed
#> 
#> === Main MCMC Algorithm ===
#> Running MCMC with 2000 iterations...
#> Burn-in: 10
#> Initial SD: 0.1
#> SD floor: 0.0001
#> SD ceiling: 0.3
#> 
#> --- Starting MCMC Sampling ---
#> Iter 1000/2000. LL: 4.12132e+07. Best LL: 4.12132e+07
#> Iter 2000/2000. LL: 4.129e+07. Best LL: 4.129e+07
#> --- MCMC Finished ---
#> Final max log-likelihood found: 4.129e+07
#> Beta0 acceptance rate: 24.2%
#> Beta1 acceptance rate: 19.35%
#> Center pos acceptance rate: 6.55208%
#> Locus pos acceptance rate: 2.50325%
#> 
#> === Analysis Complete ===
#> All results saved: 
#>   - final_positions.txt
#>   - final_parameters.txt
#>   - initial_positions.txt
#>   - initial_distances.txt
#>   - initialization/ (per-cluster and backbone sub-runs)

# Check the structure of results
cat("Result structure:\n")
#> Result structure:
str(result_su)
#> List of 3
#>  $ positions: num [1:584, 1:3] -0.486 -0.499 -0.5 -0.5 -0.503 ...
#>   ..- attr(*, "filtered_locus_indices")= int [1:584] 7 10 11 12 13 14 15 16 17 19 ...
#>  $ beta0    : num 3.13
#>  $ beta1    : num -0.386
cat("Final positions (first 5 loci):\n")
#> Final positions (first 5 loci):
print(head(result_su$positions, 5))
#>            [,1]      [,2]      [,3]
#> [1,] -0.4859627 -2.330567 -1.910367
#> [2,] -0.4994832 -2.336229 -1.894071
#> [3,] -0.4995121 -2.336212 -1.893990
#> [4,] -0.4995284 -2.336206 -1.893973
#> [5,] -0.5028451 -2.334168 -1.891900
cat("Estimated parameters: beta0 =", result_su$beta0, ", beta1 =", result_su$beta1, "\n")
#> Estimated parameters: beta0 = 3.128801 , beta1 = -0.3855171

Understanding the Output

Verbose output during the run includes:

Filtering report: original locus count, quantile threshold, number removed, final count (e.g. 649 → 584 after removing 65 loci in the bottom 10%).
MCMC progress: phase checkpoints ([Pre-processing] Completed, [Structure Initialization] Completed, [Global Assembly] Completed), followed by MCMC settings (iterations, burn-in, SD bounds) and a final summary (Final max log-likelihood found: ...).
Acceptance rates for β0, β1, center positions, and locus positions should ideally be 20–30% and generally remain below 50%; excessively high acceptance rates can indicate weak information in the data.
Summary of files saved to output_dir: always final_positions.txt, final_parameters.txt, initial_positions.txt, initial_distances.txt; additionally initialization/ (per-cluster and backbone sub-runs) when use_cluster_init = TRUE.

Return value — a list of 3 elements:

$positions: numeric matrix of shape n×3 (e.g. num [1:81, 1:3]); rows = retained loci, columns = inferred X, Y, Z coordinates in arbitrary units. Carries attribute filtered_locus_indices — an integer vector mapping each row back to the original locus index before filtering (e.g. int [1:81] 2 3 4 5 6 …).
$beta0: scalar; intercept of the log-distance model (e.g. 3.54). Higher values indicate higher baseline contact probability.
$beta1: scalar; slope of the log-distance model (e.g. -1.42). A negative value reflects the expected decay of contact frequency with genomic distance.

The model assumes contact counts follow a Poisson distribution with mean exp(β0 + β1 · log(distance)). β0 and β1 are jointly inferred with the 3D positions during MCMC.

Example output from str(result_yeast):

List of 3
 $ positions: num [1:81, 1:3] 0.142 -0.031 0.218 ...
  ..- attr(*, "filtered_locus_indices")= int [1:81] 2 3 4 5 6 7 8 9 10 11 ...
 $ beta0    : num 3.54
 $ beta1    : num -1.42

First 5 rows of result_yeast$positions:

            [,1]       [,2]       [,3]
[1,]  0.14215432  0.0823471 -0.2341082
[2,] -0.03124501  0.1952384  0.1023847
[3,]  0.21823471 -0.1423501  0.0812345
[4,] -0.11234501  0.2134512 -0.1523401
[5,]  0.08123451 -0.0923471  0.2134512

Visualization

We recommend using the plotly package for interactive 3D visualization of the inferred chromatin structures:

coords_yeast <- result_yeast$positions
# Create interactive 3D scatter plot
plot_ly(
  x = coords_yeast[,1],
  y = coords_yeast[,2],
  z = coords_yeast[,3],
  type = 'scatter3d',
  mode = 'markers+lines',
  marker = list(size = 5, color = ~seq_len(nrow(coords_yeast)), showscale = TRUE),
) %>%
  layout(
    title = "Inferred 3D Chromatin Structure",
    scene = list(
      xaxis = list(title = "X"),
      yaxis = list(title = "Y"),
      zaxis = list(title = "Z")
    )
  )

# Extract coordinates from results
coords_su <- result_su$positions

# Create interactive 3D scatter plot
plot_ly(
  x = coords_su[,1],
  y = coords_su[,2],
  z = coords_su[,3],
  type = 'scatter3d',
  mode = 'markers+lines',
  marker = list(size = 5, color = ~seq_len(nrow(coords_su)), showscale = TRUE),
) %>%
  layout(
    title = "Inferred 3D Chromatin Structure",
    scene = list(
      xaxis = list(title = "X"),
      yaxis = list(title = "Y"),
      zaxis = list(title = "Z")
    )
  )

Output Files

All results are automatically saved to the specified output directory:

final_positions.txt: Final inferred 3D coordinates (n × 3 matrix)
initial_positions.txt: Initial positions before MCMC

Session Information

sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 26.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] plotly_4.12.0         ggplot2_4.0.3         HiContactsData_1.15.0
#>  [4] ExperimentHub_3.3.1   AnnotationHub_4.3.2   BiocFileCache_3.3.0  
#>  [7] dbplyr_2.6.0          BiocGenerics_0.59.8   generics_0.1.4       
#> [10] HiContacts_1.15.0     HiCExperiment_1.13.0  HiSpaR_1.1.0         
#> [13] BiocStyle_2.41.0     
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.3.0                   httr2_1.2.3                
#>   [3] rlang_1.2.0                 magrittr_2.0.5             
#>   [5] otel_0.2.0                  matrixStats_1.5.0          
#>   [7] compiler_4.6.1              RSQLite_3.53.3             
#>   [9] callr_3.8.0                 png_0.1-9                  
#>  [11] vctrs_0.7.3                 stringr_1.6.0              
#>  [13] pkgconfig_2.0.3             crayon_1.5.3               
#>  [15] fastmap_1.2.0               XVector_0.53.0             
#>  [17] labeling_0.4.3              rmarkdown_2.31             
#>  [19] tzdb_0.5.0                  ps_1.9.3                   
#>  [21] ggbeeswarm_0.7.3            strawr_0.0.92              
#>  [23] purrr_1.2.2                 bit_4.6.0                  
#>  [25] xfun_0.59                   cachem_1.1.0               
#>  [27] jsonlite_2.0.0              blob_1.3.0                 
#>  [29] rhdf5filters_1.25.0         DelayedArray_0.39.3        
#>  [31] Rhdf5lib_2.1.0              BiocParallel_1.47.0        
#>  [33] parallel_4.6.1              R6_2.6.1                   
#>  [35] bslib_0.11.0                stringi_1.8.7              
#>  [37] RColorBrewer_1.1-3          GenomicRanges_1.65.0       
#>  [39] jquerylib_0.1.4             Rcpp_1.1.1-1.1             
#>  [41] Seqinfo_1.3.0               SummarizedExperiment_1.43.0
#>  [43] knitr_1.51                  readr_2.2.0                
#>  [45] IRanges_2.47.2              BiocBaseUtils_1.15.1       
#>  [47] Matrix_1.7-5                tidyselect_1.2.1           
#>  [49] abind_1.4-8                 yaml_2.3.12                
#>  [51] codetools_0.2-20            processx_3.9.0             
#>  [53] curl_7.1.0                  lattice_0.22-9             
#>  [55] tibble_3.3.1                withr_3.0.3                
#>  [57] InteractionSet_1.41.0       Biobase_2.73.1             
#>  [59] KEGGREST_1.53.4             S7_0.2.2                   
#>  [61] evaluate_1.0.5              ggrastr_1.0.2              
#>  [63] Biostrings_2.81.3           pillar_1.11.1              
#>  [65] BiocManager_1.30.27         filelock_1.0.3             
#>  [67] MatrixGenerics_1.25.0       stats4_4.6.1               
#>  [69] vroom_1.7.1                 BiocVersion_3.24.0         
#>  [71] S4Vectors_0.51.5            hms_1.1.4                  
#>  [73] scales_1.4.0                glue_1.8.1                 
#>  [75] lazyeval_0.2.3              maketools_1.3.2            
#>  [77] tools_4.6.1                 BiocIO_1.23.3              
#>  [79] data.table_1.18.4           sys_3.4.3                  
#>  [81] RSpectra_0.16-2             buildtools_1.0.0           
#>  [83] Cairo_1.7-0                 rhdf5_2.57.1               
#>  [85] grid_4.6.1                  tidyr_1.3.2                
#>  [87] crosstalk_1.2.2             AnnotationDbi_1.75.0       
#>  [89] beeswarm_0.4.0              vipor_0.4.7                
#>  [91] cli_3.6.6                   rappdirs_0.3.4             
#>  [93] viridisLite_0.4.3           S4Arrays_1.13.0            
#>  [95] dplyr_1.2.1                 gtable_0.3.6               
#>  [97] sass_0.4.10                 digest_0.6.39              
#>  [99] SparseArray_1.13.2          htmlwidgets_1.6.4          
#> [101] farver_2.1.2                memoise_2.0.1              
#> [103] htmltools_0.5.9             lifecycle_1.0.5            
#> [105] httr_1.4.8                  bit64_4.8.2