| Title: | Hierarchical Inference of Spatial Positions from Hi-C Data |
|---|---|
| Description: | Provides R bindings for HiSpa, a hierarchical Bayesian model for inferring three-dimensional chromatin structures from Hi-C contact matrices using Markov Chain Monte Carlo (MCMC) sampling. The package implements a cluster-based hierarchical approach that efficiently handles large-scale Hi-C datasets. It uses Rcpp and RcppArmadillo for efficient C++ integration with the original HiSpa C++ implementation, enabling fast computation of chromatin structure inference through parallel MCMC sampling. |
| Authors: | Yingcheng Luo [aut, cre] |
| Maintainer: | Yingcheng Luo <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-30 09:36:41 UTC |
| Source: | https://github.com/bioc/HiSpaR |
Performs hierarchical Bayesian inference of 3D chromatin structure from Hi-C contact matrix using MCMC sampling. This function follows the exact workflow from the HiSpa C++ implementation.
hispa_analyze( hic_experiment, output_dir, mcmc_iterations = 6000L, num_clusters = 0L, mcmc_burn_in = 0L, mcmc_initial_sd = 0.1, mcmc_sd_floor = 1e-04, mcmc_sd_ceil = 0.3, use_cluster_init = FALSE, cluster_init_iterations = 1000L, cluster_initial_sd = 0.1, save_samples = FALSE, sample_interval = 50L, verbose = TRUE, filter_quantile = -1, safe_mode = TRUE )hispa_analyze( hic_experiment, output_dir, mcmc_iterations = 6000L, num_clusters = 0L, mcmc_burn_in = 0L, mcmc_initial_sd = 0.1, mcmc_sd_floor = 1e-04, mcmc_sd_ceil = 0.3, use_cluster_init = FALSE, cluster_init_iterations = 1000L, cluster_initial_sd = 0.1, save_samples = FALSE, sample_interval = 50L, verbose = TRUE, filter_quantile = -1, safe_mode = TRUE )
hic_experiment |
A Bioconductor 'HiCExperiment' object or a numeric matrix containing Hi-C contact data. If a 'HiCExperiment' object, the function extracts the contact matrix using 'gi2cm()' from the interactions and converts it to a numeric matrix for analysis. If a matrix, it is used directly. |
output_dir |
Character string specifying the output directory path. |
mcmc_iterations |
Integer, number of MCMC iterations (default: 6000). |
num_clusters |
Integer, number of clusters for hierarchical analysis. If 0 (default), automatically determined as sqrt(n). |
mcmc_burn_in |
Integer, number of burn-in iterations to discard (default: 0). |
mcmc_initial_sd |
Numeric, initial standard deviation for MCMC proposals (default: 0.1). |
mcmc_sd_floor |
Numeric, minimum allowed standard deviation (default: 0.0001). |
mcmc_sd_ceil |
Numeric, maximum allowed standard deviation (default: 0.3). |
use_cluster_init |
Logical, use cluster-based initialization instead of random initialization (default: FALSE). |
cluster_init_iterations |
Integer, number of iterations for cluster initialization MCMC (default: 1000). |
cluster_initial_sd |
Numeric, initial standard deviation for cluster initialization MCMC (default: 0.1). |
save_samples |
Logical, whether to save MCMC trace samples (default: FALSE). |
sample_interval |
Integer, save samples every n iterations (default: 50). |
verbose |
Logical, enable verbose output (default: TRUE). |
filter_quantile |
Numeric, quantile threshold for filtering loci by contact counts (default: -1, no filtering). If >= 0, loci with row sums below this quantile are removed. For example, 0.1 removes loci in the bottom 10% of contact counts. |
safe_mode |
Logical, whether to run analysis in a safe subprocess using callr (default: TRUE). If TRUE and callr is available, analysis runs in an isolated subprocess to prevent R crashes from affecting the parent session. Set to FALSE to run inline. |
This function implements the complete HiSpa workflow:
Filter loci by contact count (optional)
Load contact matrix from file
Assign loci to clusters (k-means)
Build cluster relationships
Initialize structure (random or cluster-based)
Assemble global structure
Run main MCMC algorithm
Save results to output directory
Locus Filtering: By default, no filtering is applied (filter_quantile = -1). If filter_quantile >= 0, loci with contact counts below the specified quantile are removed before analysis. For example, filter_quantile = 0.1 removes loci in the bottom 10% of contact counts. This improves computational efficiency and focuses analysis on loci with sufficient data.
All results are automatically saved to the output directory:
final_positions.txt - Final inferred 3D coordinates (n x 3 matrix)
initial_positions.txt - Initial positions before MCMC (n x 3 matrix)
Read results using standard R functions:
final_pos <- read.table("output_dir/final_positions.txt")
A list containing:
positions - A numeric matrix of dimensions n x 3 containing the final inferred 3D coordinates of loci. Rows correspond to loci, columns are (x, y, z).
beta0 - The final intercept parameter of the log-distance relationship.
beta1 - The final slope parameter of the log-distance relationship.
If filtering was applied, the positions matrix has an attribute 'filtered_locus_indices' containing the original indices of the retained loci (before filtering). Additionally, all analysis results are saved as text files in the output directory.
# Load example contact matrix data(su1_contact_mat) # Check dimensions dim(su1_contact_mat) # Example 1: Run analysis with matrix input su_res <- hispa_analyze( hic_experiment = su1_contact_mat, output_dir = tempdir(), mcmc_iterations = 100, mcmc_burn_in = 10, use_cluster_init = FALSE, verbose = TRUE ) # check results dim(su_res$positions) # Should be n x 3# Load example contact matrix data(su1_contact_mat) # Check dimensions dim(su1_contact_mat) # Example 1: Run analysis with matrix input su_res <- hispa_analyze( hic_experiment = su1_contact_mat, output_dir = tempdir(), mcmc_iterations = 100, mcmc_burn_in = 10, use_cluster_init = FALSE, verbose = TRUE ) # check results dim(su_res$positions) # Should be n x 3
Hi-C contact matrix of Human Chromosome 21, collected from https://zenodo.org/records/3928890. This dataset is provided as an example for testing and demonstrating the HiSpaR package functionality.
su1_contact_matsu1_contact_mat
A symmetric numeric matrix with 649 rows and 649 columns, where entry (i,j) represents the normalized contact frequency between genomic loci i and j. The matrix is:
649 x 649
Numeric matrix
Symmetric (contact_matrix[i,j] = contact_matrix[j,i])
Zero or near-zero (self-contacts)
Non-negative contact frequencies
This contact matrix represents chromatin interaction frequencies derived from Hi-C experiments on Human Chromosome 21. Higher values indicate more frequent spatial proximity between genomic loci in the 3D nuclear space.
The data can be used directly with hispa_analyze
Human Hi-C data, chromosome 21, collected from https://zenodo.org/records/3928890
hispa_analyze for running the analysis
# Load the example data data(su1_contact_mat) # Check dimensions dim(su1_contact_mat) # Check if matrix is symmetric isSymmetric(su1_contact_mat) # Summary statistics summary(as.vector(su1_contact_mat)) # Visualize contact matrix image(su1_contact_mat, main = "Hi-C Contact Matrix (SU1)", xlab = "Genomic Locus", ylab = "Genomic Locus")# Load the example data data(su1_contact_mat) # Check dimensions dim(su1_contact_mat) # Check if matrix is symmetric isSymmetric(su1_contact_mat) # Summary statistics summary(as.vector(su1_contact_mat)) # Visualize contact matrix image(su1_contact_mat, main = "Hi-C Contact Matrix (SU1)", xlab = "Genomic Locus", ylab = "Genomic Locus")