Package 'HiSpaR' reference manual

Title:	Hierarchical Inference of Spatial Positions from Hi-C Data
Description:	Provides R bindings for HiSpa, a hierarchical Bayesian model for inferring three-dimensional chromatin structures from Hi-C contact matrices using Markov Chain Monte Carlo (MCMC) sampling. The package implements a cluster-based hierarchical approach that efficiently handles large-scale Hi-C datasets. It uses Rcpp and RcppArmadillo for efficient C++ integration with the original HiSpa C++ implementation, enabling fast computation of chromatin structure inference through parallel MCMC sampling.
Authors:	Yingcheng Luo [aut, cre]
Maintainer:	Yingcheng Luo <[email protected]>
License:	MIT + file LICENSE
Version:	1.1.0
Built:	2026-07-04 16:47:33 UTC
Source:	https://github.com/bioc/HiSpaR

Run HiSpa MCMC Analysis

Description

Performs hierarchical Bayesian inference of 3D chromatin structure from Hi-C contact matrix using MCMC sampling. This function follows the exact workflow from the HiSpa C++ implementation.

Usage

hispa_analyze(
  hic_experiment,
  output_dir,
  mcmc_iterations = 6000L,
  num_clusters = 0L,
  mcmc_burn_in = 0L,
  mcmc_initial_sd = 0.1,
  mcmc_sd_floor = 1e-04,
  mcmc_sd_ceil = 0.3,
  use_cluster_init = FALSE,
  cluster_init_iterations = 1000L,
  cluster_initial_sd = 0.1,
  save_samples = FALSE,
  sample_interval = 50L,
  verbose = TRUE,
  filter_quantile = -1,
  safe_mode = TRUE
)
hispa_analyze(
  hic_experiment,
  output_dir,
  mcmc_iterations = 6000L,
  num_clusters = 0L,
  mcmc_burn_in = 0L,
  mcmc_initial_sd = 0.1,
  mcmc_sd_floor = 1e-04,
  mcmc_sd_ceil = 0.3,
  use_cluster_init = FALSE,
  cluster_init_iterations = 1000L,
  cluster_initial_sd = 0.1,
  save_samples = FALSE,
  sample_interval = 50L,
  verbose = TRUE,
  filter_quantile = -1,
  safe_mode = TRUE
)

Arguments

hic_experiment

A Bioconductor 'HiCExperiment' object or a numeric matrix containing Hi-C contact data. If a 'HiCExperiment' object, the function extracts the contact matrix using 'gi2cm()' from the interactions and converts it to a numeric matrix for analysis. If a matrix, it is used directly.

output_dir

Character string specifying the output directory path.

mcmc_iterations

Integer, number of MCMC iterations (default: 6000).

num_clusters

Integer, number of clusters for hierarchical analysis. If 0 (default), automatically determined as sqrt(n).

mcmc_burn_in

Integer, number of burn-in iterations to discard (default: 0).

mcmc_initial_sd

Numeric, initial standard deviation for MCMC proposals (default: 0.1).

mcmc_sd_floor

Numeric, minimum allowed standard deviation (default: 0.0001).

mcmc_sd_ceil

Numeric, maximum allowed standard deviation (default: 0.3).

use_cluster_init

Logical, use cluster-based initialization instead of random initialization (default: FALSE).

cluster_init_iterations

Integer, number of iterations for cluster initialization MCMC (default: 1000).

cluster_initial_sd

Numeric, initial standard deviation for cluster initialization MCMC (default: 0.1).

save_samples

Logical, whether to save MCMC trace samples (default: FALSE).

sample_interval

Integer, save samples every n iterations (default: 50).

verbose

Logical, enable verbose output (default: TRUE).

filter_quantile

Numeric, quantile threshold for filtering loci by contact counts (default: -1, no filtering). If >= 0, loci with row sums below this quantile are removed. For example, 0.1 removes loci in the bottom 10% of contact counts.

safe_mode

Logical, whether to run analysis in a safe subprocess using callr (default: TRUE). If TRUE and callr is available, analysis runs in an isolated subprocess to prevent R crashes from affecting the parent session. Set to FALSE to run inline.

Details

This function implements the complete HiSpa workflow:

Filter loci by contact count (optional)
Load contact matrix from file
Assign loci to clusters (k-means)
Build cluster relationships
Initialize structure (random or cluster-based)
Assemble global structure
Run main MCMC algorithm
Save results to output directory

Locus Filtering: By default, no filtering is applied (filter_quantile = -1). If filter_quantile >= 0, loci with contact counts below the specified quantile are removed before analysis. For example, filter_quantile = 0.1 removes loci in the bottom 10% of contact counts. This improves computational efficiency and focuses analysis on loci with sufficient data.

All results are automatically saved to the output directory:

final_positions.txt - Final inferred 3D coordinates (n x 3 matrix)
initial_positions.txt - Initial positions before MCMC (n x 3 matrix)

Read results using standard R functions: final_pos <- read.table("output_dir/final_positions.txt")

Value

A list containing:

positions - A numeric matrix of dimensions n x 3 containing the final inferred 3D coordinates of loci. Rows correspond to loci, columns are (x, y, z).
beta0 - The final intercept parameter of the log-distance relationship.
beta1 - The final slope parameter of the log-distance relationship.

If filtering was applied, the positions matrix has an attribute 'filtered_locus_indices' containing the original indices of the retained loci (before filtering). Additionally, all analysis results are saved as text files in the output directory.

Examples

# Load example contact matrix
data(su1_contact_mat)

# Check dimensions
dim(su1_contact_mat)


# Example 1: Run analysis with matrix input
su_res <- hispa_analyze(
  hic_experiment = su1_contact_mat,
  output_dir = tempdir(),
  mcmc_iterations = 100,
  mcmc_burn_in = 10,
  use_cluster_init = FALSE,
  verbose = TRUE
)

# check results
dim(su_res$positions)  # Should be n x 3


# Load example contact matrix
data(su1_contact_mat)

# Check dimensions
dim(su1_contact_mat)


# Example 1: Run analysis with matrix input
su_res <- hispa_analyze(
  hic_experiment = su1_contact_mat,
  output_dir = tempdir(),
  mcmc_iterations = 100,
  mcmc_burn_in = 10,
  use_cluster_init = FALSE,
  verbose = TRUE
)

# check results
dim(su_res$positions)  # Should be n x 3

Example Hi-C Contact Matrix

Description

Hi-C contact matrix of Human Chromosome 21, collected from https://zenodo.org/records/3928890. This dataset is provided as an example for testing and demonstrating the HiSpaR package functionality.

Usage

su1_contact_mat
su1_contact_mat

Format

A symmetric numeric matrix with 649 rows and 649 columns, where entry (i,j) represents the normalized contact frequency between genomic loci i and j. The matrix is:

Dimensions: 649 x 649
Type: Numeric matrix
Symmetry: Symmetric (contact_matrix[i,j] = contact_matrix[j,i])
Diagonal: Zero or near-zero (self-contacts)
Range: Non-negative contact frequencies

Details

This contact matrix represents chromatin interaction frequencies derived from Hi-C experiments on Human Chromosome 21. Higher values indicate more frequent spatial proximity between genomic loci in the 3D nuclear space.

The data can be used directly with hispa_analyze

Source

Human Hi-C data, chromosome 21, collected from https://zenodo.org/records/3928890

Examples

# Load the example data
data(su1_contact_mat)

# Check dimensions
dim(su1_contact_mat)

# Check if matrix is symmetric
isSymmetric(su1_contact_mat)

# Summary statistics
summary(as.vector(su1_contact_mat))


# Visualize contact matrix
image(su1_contact_mat, 
      main = "Hi-C Contact Matrix (SU1)",
      xlab = "Genomic Locus", 
      ylab = "Genomic Locus")


# Load the example data
data(su1_contact_mat)

# Check dimensions
dim(su1_contact_mat)

# Check if matrix is symmetric
isSymmetric(su1_contact_mat)

# Summary statistics
summary(as.vector(su1_contact_mat))


# Visualize contact matrix
image(su1_contact_mat, 
      main = "Hi-C Contact Matrix (SU1)",
      xlab = "Genomic Locus", 
      ylab = "Genomic Locus")

Package 'HiSpaR'

Help Index

Run HiSpa MCMC Analysis

Description

Usage

Arguments

Details

Value

Examples

Example Hi-C Contact Matrix

Description

Usage

Format

Details

Source

See Also

Examples