---
title: "Bayesian Analysis of Hi-C Interactions with HiCPotts"
author: "Itunu Godwin Osuntoki, Nicolae Radu Zabet"
date: "`r Sys.Date()`"
output:   
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Bayesian Analysis of Hi-C Interactions with HiCPotts}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

High-throughput chromosome conformation capture (Hi-C) technologies have revolutionized our understanding of 3D genome organization by mapping interactions between genomic loci. However, Hi-C data are inherently noisy and affected by experimental biases such as GC content, transposable elements, and DNA accessibility, which complicate the identification of biologically significant interactions.

The HiCPotts package provides a comprehensive framework for Bayesian analysis of Hi-C interaction data using a Hidden Markov Random Field (HMRF) model. Hi-C is a high-throughput sequencing technique that captures chromatin interactions across the genome, revealing spatial organization. This package models these interactions using a mixture of distributions (Poisson, Negative Binomial, Zero-Inflated Poisson, or Zero-Inflated Negative Binomial) while accounting for covariates, genomic distance, GC content, accessibility, and transposable element (TE) counts. The HMRF framework incorporates spatial dependencies via a Potts model, and the package employs Markov Chain Monte Carlo (MCMC) methods for parameter estimation.

An HMRF models spatial dependencies in a lattice (e.g., a matrix of Hi-C interactions) by assigning each site to one of several latent states (mixture components). The Potts model governs spatial interactions, encouraging neighboring sites to share the same state, controlled by an interaction parameter ($\gamma$).

The package uses MCMC to estimate model parameters (regression coefficients ($\beta$), zero-inflation parameter ($\theta$), dispersion parameter for Binomial distributions, and ($\gamma$)) by sampling from their posterior distributions. Covariates like genomic distance and GC content adjust for biases in interaction counts. Each Hi-C interaction is modeled as belonging to one of three mixture components following a specified distribution.

The package performance was optimized through C++ implementations using Rcpp and RcppArmadillo. The package also supports parallel processing and flexible distribution choices, making it suitable for large-scale genomic analyses.

# Motivation

Most existing computational methods fail to adequately model the spatial dependencies and overdispersion in Hi-C contact matrices, limiting their ability to distinguish true signals from other components such as noise. The HiCPotts package addresses these challenges by providing a novel Bayesian framework to detect enriched interactions while accounting for experimental biases in Hi-C data. Its integration into Bioconductor is motivated by its robust statistical approach, computational efficiency, and ability to provide insights into bias sources, making it a valuable tool for researchers studying chromatin architecture in diverse biological contexts.

HiCPotts package also extends the existing knowledge of classifying interacting loci into two components to three components (true signal, false signal, and noise) and its focus on bias correction (DNA accessibility, transposable elements) enhance its utility for integrative genomic studies. 

# Comparison with Existing Packages
There are several Bioconductor packages that addresses Hi-C data analysis, each with distinct functionalities and scopes. We highlight the uniqueness of the HiCPotts package with other existing packages:

a.  diffHic: The diffHic package focuses on detecting differential interactions between biological conditions using the edgeR framework for statistical modeling. diffHic provides methods for read pair alignment, binning, filtering, and normalization of biases (e.g., trended or CNV-driven). While diffHic excels at differential analysis, it does not explicitly model spatial dependencies in Hi-C data or account for overdispersion as HiCPotts does. HiCPotts is better suited for identifying enriched interactions within a Hi-C experiment and exploring bias sources, whereas diffHic is ideal for comparative studies across conditions.


b.  HiCcompare and multiHiCcompare: HiCcompare offers joint normalization and difference detection for multiple Hi-C datasets, operating on sparse chromatin interaction matrices. multiHiCcompare extends this to handle multiple groups and replicates using cyclic loess normalization and a general linear model (GLM) based on edgeR. Both packages emphasize comparative analysis and normalization but do not focus on detecting enriched interactions within a dataset or modeling spatial dependencies. HiCPotts’s Bayesian approach and bias correction make it complementary, as it prioritizes significant interaction detection and bias insight over differential analysis.

c.  HiCDCPlus: HiCDCPlus enables significant interaction calling and differential analysis for Hi-C and HiChIP data using a negative binomial generalized linear model. It includes tools for topologically associating domain (TAD) and A/B compartment calling, integrating with visualization tools. Like HiCPotts, it calls significant interactions, but HiCPotts’s HMRF-based model and ABC approach provide superior handling of spatial dependencies and computational tractability for chromosome-wide analysis. HiCDCPlus requires GC content information (computable internally), while HiCPotts additionally corrects for transposable elements and DNA accessibility biases.

d.  scHiCcompare: Designed for single-cell Hi-C data, scHiCcompare supports imputation, normalization, and differential interaction analysis across single-cell datasets. Its focus on single-cell data makes it distinct from HiCPotts, which targets bulk Hi-C data. HiCPotts’s ability to model overdispersion and spatial dependencies is not replicated in scHiCcompare, which prioritizes single-cell-specific challenges.


Other Tools: Packages like HiCdat and HiCExperiment provide preprocessing, visualization, or data manipulation for Hi-C data but lack the statistical rigor of HiCPotts for interaction detection. HiCdat offers a graphical interface for preprocessing and integrative analysis with other omics data, while HiCExperiment provides data structures for 3C-related experiments. Neither focuses on enriched interaction detection or bias correction like HiCPotts.

# Features of HiCPotts

HiCPotts allows researchers to identify significant intra-chromosomal interactions in Hi-C data while correcting for experimental biases. Its key features include:

1.  Use of Zero-Inflated distributions to handle overdispersion.

2.  An HMRF-based Bayesian framework with the Potts model for spatial dependency.

3.  ABC for computationally efficient handling of the Potts model’s normalizing constant.

4.  Bias correction for GC content, transposable elements, and DNA accessibility.

5.  Unlike diffHic, HiCcompare, and multiHiCcompare, which focus on differential analysis, or HiCDCPlus, which balances interaction calling and differential analysis, HiCPotts prioritizes enriched interaction detection. Its bias correction and spatial modeling make it a powerful complement to existing tools, enhancing Bioconductor’s suite for 3D genome analysis. HiCPotts can be used for initial interaction detection, followed by diffHic or HiCcompare for differential studies, creating a comprehensive Hi-C analysis workflow.


# Installation

HiCPotts depends on several CRAN and Bioconductor packages.

Install them as follows:

Install CRAN dependencies


```{r setup, include=FALSE}
library(BiocStyle)
knitr::opts_chunk$set(eval = TRUE, echo = TRUE, collapse = TRUE)
```

```{r install-cran, echo=TRUE, eval=FALSE}
install.packages(c("Rcpp", "RcppArmadillo", "parallel"))
```

```{r install-bioc, echo=TRUE, eval=FALSE}
# Install Bioconductor dependencies
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install(c("rhdf5", "strawr", "rtracklayer", "GenomicRanges", "BSgenome", "Biostrings"))
```

```{r load-package, echo=TRUE, eval=TRUE}
# Loading the package:
# install.packages(HiCPotts)
library(HiCPotts)
```

# Workflow Overview

The HiCPotts workflow involves four main steps:

1.  Data Loading: Use `get_data()` to read Hi-C contact matrices (`.hic`, `.cool`, `.h5`) and annotate bins with GC content, accessibility, and TE counts.

2.  Data Processing: Use `process_data()` to convert the data into (N \times N) matrices of interactions and covariates.

3.  MCMC Simulation: Use `run_chain_betas()` to run MCMC chains, estimating parameters and latent state assignments.

4.  Probability Computation: Use `compute_HMRFHiC_probabilities()` to calculate posterior probabilities of component assignments.

We’ll demonstrate this workflow using synthetic Hi-C data for simplicity.

## Step 1: Loading Hi-C Data

The `get_data()` function reads Hi-C contact matrices and annotates bins with covariates. For this example, we simulate a small dataset instead of using real Hi-C files, which require specific file formats and genome annotations.

Simulate a 10x10 Hi-C dataset

```{r simulate-data, include=FALSE, eval=TRUE}
set.seed(4921)

# Lattice size: an N x N grid of genomic bins.
# Each cell of the grid represents an interaction between bin i and bin j.
N <- 10

# Define the start coordinate of each bin (10 bins, 2 kb wide → 20 kb region)
bins <- seq(1, 20000, by = 2000)

# Create every pairwise combination of bin i vs bin j
data <- expand.grid(start = bins, start.j. = bins)

# Bin end coordinates (each bin is 2 kb wide)
data$end.i. <- data$start + 1999

data$end <- data$start.j. + 1999

# Chromosome label (Drosophila chr2L here)
data$chrom <- "chr2L"

# ── Simulate the four bias covariates ────────────────────────────────────────
data$GC <- runif(nrow(data), 0.3, 0.7) # Simulated GC content, GC fraction (biological range 30–70%)

data$ACC <- runif(nrow(data), 0, 1) # Simulated accessibility, DNase-I accessibility score (0–1)

data$TES <- rpois(nrow(data), 2) # Simulated TE counts, Count of overlapping transposable elements

# Poisson(λ = 5) approximates sparse Hi-C counts at this resolution
data$interactions <- rpois(nrow(data), lambda = 5) # Simulated interaction counts

# Coerce to a plain data.frame (expand.grid returns a class with extra attrs)
data <- as.data.frame(data)

# Inspect the first few rows – note the required column names:
#   start, start.j., end.i., end, chrom, GC, ACC, TES, interactions
head(data)
```

In practice, the `get_data()` function is used to load real Hi-C data from .cool, .mcool, or .hic files, similar to the HiCExperiment package. Both convert these files into structured outputs, but they differ in their results. HiCExperiment generates a HiCExperiment object that includes a contact matrix, genomic regions, metadata (e.g., resolution, chromosome), and pairwise interactions. In contrast, `get_data()` produces a data frame from the same file types, incorporating optional calculations (if available) to identify sources of experimental biases, such as GC content or DNA accessibility. Also the function support loading profiles as bigwig or bedgraph and will be imported as GRanges.

```{r get_data, include=FALSE, eval=FALSE}
# Example (not run)
# For actual Hi-C / Micro-C files use get_data() with these arguments:
#
#   file_path      – path to a .cool, .mcool, .hic, or .h5 matrix
#   chr            – chromosome to extract (e.g. "chr2L", "chr4")
#   start, end     – genomic window (bp, 1-based inclusive)
#   resolution     – target bin size in bp; native bins are merged/sliced
#   genome_package – BSgenome package used for GC computation
#   acc_wig        – optional bedGraph/wig of DNase-I (or ATAC) accessibility
#   chain_file     – optional liftOver chain if acc_wig is in another assembly
#   te_granges     – optional BED/GTF (or GRanges) of transposable elements

data <- prepare_data(
  file_path = "path/to/hic_file.h5",
  chr = "chr2L",
  start = 1,
  end = 20000,
  resolution = 2000,
  genome_package = "BSgenome.Dmelanogaster.UCSC.dm6",
  acc_wig = "path/to/dnase.wig",
  chain_file = "path/to/chain.file",
  te_granges = "path/to/te.gtf"
)
```

## Step 2: Processing Data

The `process_data()` function converts the data frame generated through the `prepare_data()` function into a list of (N $\times$ N) matrices for interactions (`y`) and covariates (`x_vars`), optionally scaling interaction counts. Also, if a HiCExperiment object is already available and the sources of biases are known and organized in a separate data frame, these can be combined into a single data frame using base R functions and the `prepare_data()` function to convert into a list for analysis.

```{r process_data, include=FALSE, eval=TRUE}
# ── Reshape the tidy interaction table into matrices for MCMC ────────────────
# process_data() returns a list with two elements:
#   $y      – an N x N matrix of observed interaction counts
#   $x_vars – a list of N x N matrices, one per covariate
#             (distance, GC, TES, ACC)
#
# Arguments:
#   data             – data.frame with the 9 required columns (see above)
#   N                – lattice dimension (here 10, matching 10 bins)
#   scale_max        – upper bound used when rescaling counts (prevents
#                      very large values from dominating the likelihood)
#   standardization_y – TRUE rescales counts to [0, scale_max]; FALSE keeps raw

processed <- process_data(data, N = N, scale_max = 500, standardization_y = TRUE)

# Pull the two pieces out of the returned list
x_vars <- processed[["x_vars"]] # Single matrix for this example
y <- processed[["y"]]

# Inspect x_vars: should be a list of distance, GC, TES, ACC matrices
str(x_vars) # Lists of distance, GC, TEs, ACC matrices
```

This produces matrices for distance, GC, TEs, ACC, and interaction counts, ready for MCMC.

## Step 3: Running MCMC Simulations

The `run_chain_betas()` function runs MCMC simulations to estimate parameters (($\beta$), ($\gamma$), ($\theta$ (for Zero-Inflated distributions)), size(for Negative binomial distributions)) and latent state assignments ((z)). In the example below, we use the Zero-Inflated Negative Binomial (ZINB) distribution to model the simulated sparse Hi-C data.

```{r setup_MCMC, include=FALSE, eval=TRUE}
# ── Define MCMC hyper-parameters and starting values ─────────────────────────

# Potts spatial-coupling parameter (γ). Values closer to 1 encourage
# neighbouring bins on the lattice to share the same latent state.
gamma_prior <- 0.3

# Total number of MCMC iterations. Burn-in defaults to iterations / 2.
# Use a small value here for vignette speed; 5 000+ is typical for real data.
iterations <- 20

# Initial value for the zero-inflation parameter (θ).
# Required when dist = "ZIP" or "ZINB"; ignored otherwise.
theta_start <- c(0.5)

# size_start (commented out) – only needed for "NB" or "ZINB":
# one initial over-dispersion value per mixture component.
# size_start <- c(1, 1, 1)

# Choice of count distribution for interactions. Options:
#   "Poisson", "NB", "ZIP", "ZINB"
# ZIP is a reasonable default for sparse Hi-C with many structural zeros.
dist <- "ZIP"
```

```{r run_MCMC, include=FALSE, eval=TRUE}
# ── Run the Metropolis-Hastings MCMC sampler ─────────────────────────────────
# run_chain_betas() samples from the joint posterior of the regression
# coefficients for all three mixture components simultaneously.

results <- run_chain_betas(
  N = N,
  gamma_prior = gamma_prior,
  iterations = iterations,
  x_vars = x_vars,
  y = y,
  theta_start = theta_start,
  use_data_priors = TRUE,
  # user_fixed_priors = FALSE,
  # epsilon = NULL,
  distance_metric = "manhattan",
  dist = dist,
  # size_start = size_start,
  mc_cores = 1
)
```

```{r MCMC_results, include=FALSE, eval=TRUE}
# ── Extract the individual components of the MCMC output ─────────────────────
# 'results' is a named list. Each element corresponds to a different set
# of posterior samples.

# Regression coefficient chains – one matrix per mixture component
# (rows = iterations, columns = covariates incl. intercept)

chains <- results[["chains"]]

# Posterior samples of the Potts spatial-coupling parameter γ
gamma <- results[["gamma"]]

# Posterior samples of the zero-inflation parameter θ (ZIP / ZINB only)
theta <- results[["theta"]]

# Over-dispersion size parameter – only present for NB / ZINB
# size <- results[["size]]
```

The output includes chains for regression parameters (`chains`), the Potts interaction parameter (`gamma`), zero-inflation parameter (`theta`), and dispersion parameters (`size`).

## Step 4: Computing Posterior Probabilities

The `compute_HMRFHiC_probabilities()` function calculates posterior probabilities for each interaction belonging to one of three mixture components, using the MCMC chains.

```{r compute_result, include=FALSE, eval=TRUE}
# ── Compute posterior HMRF component-assignment probabilities ────────────────
# For every bin-pair, compute P(component = k) for k = 1, 2, 3 using the
# posterior-mean regression parameters (first iterations/2 are discarded
# as burn-in internally).

probs <- compute_HMRFHiC_probabilities(
  data = data,
  chain_betas = chains,
  iterations = iterations,
  dist = "ZIP"
)

probs$prob1 # This produces matrices of probabilities for the first component, which translate to assigning interactions to biological states (e.g., active vs. inactive chromatin).
```


# Worked Examples: Hi-C and Micro-C Data

The synthetic example above is useful for illustrating the API but is not
biologically meaningful. This section walks through two fully realistic
analyses — one for **Hi-C** and one for **Micro-C** — using the exact same
four-step workflow. The Hi-C data files are shipped with the package under
`inst/extdata`; the Micro-C paths are illustrative and should be replaced
with your own files.

## Example 1: Hi-C Data

We analyse a 100 kb-resolution Hi-C contact map of *D. melanogaster* chr4
in BG3 cells, integrating DNase-I accessibility and transposable-element
annotations for bias correction.

### Step 1 — Locate input files

```{r hic_paths, echo=TRUE, eval=TRUE}
# Hi-C contact matrix (.cool, 100 kb, BG3 cells, chr4)
hic_file   <- system.file("extdata",
                          "BG3_WT_merged_hic_matrix_chr4_100Kb.cool",
                          package = "HiCPotts")

# DNase-I accessibility bedGraph (dm3 genome coordinates)
dnase_file <- system.file("extdata",
                          "DNaseI_BG3_gr_chr4.bedGraph",
                          package = "HiCPotts")

# LiftOver chain — maps dm3 accessibility up to dm6 assembly
chain_file <- system.file("extdata",
                          "dm3ToDm6_chr4_only.chain",
                          package = "HiCPotts")

# Transposable-element annotations (GTF, dm6 chr4)
te_file    <- system.file("extdata",
                          "dm6_TEs_chr4.gtf",
                          package = "HiCPotts")

stopifnot(file.exists(hic_file),
          file.exists(dnase_file),
          file.exists(chain_file),
          file.exists(te_file))
```

### Step 2 — Load Hi-C data and compute covariates

```{r hic_get_data, echo=TRUE, eval=TRUE}
# get_data() does four things in one call:
#   1. Reads the .cool file and rebins to the requested resolution
#   2. Calculates GC content from the BSgenome package
#   3. Lifts over the DNase-I signal (dm3 -> dm6) and aggregates per bin
#   4. Counts transposable elements overlapping each bin pair
hic_data <- get_data(
  file_path      = hic_file,
  chr            = "chr4",
  start          = 1,
  end            = 400000,
  resolution     = 200000,
  genome_package = "BSgenome.Dmelanogaster.UCSC.dm6",
  acc_wig        = dnase_file,
  chain_file     = chain_file,
  te_granges     = te_file
)

head(hic_data)
```

### Step 3 — Process data into matrices

```{r hic_process, echo=TRUE, eval=FALSE}
# (end - start) / resolution, rounded up for the edge bin.
N_hic <- 14

processed_hic <- process_data(
  hic_data,
  N                 = N_hic,
  scale_max         = 500,
  standardization_y = TRUE
)

x_vars_hic <- processed_hic[["x_vars"]]
y_hic      <- processed_hic[["y"]]
```

### Step 4 — Run MCMC with a realistic sampler configuration

```{r hic_mcmc, echo=TRUE, eval=FALSE}
# ZINB is recommended for Hi-C because of structural zeros and
# over-dispersion typical of ligation-based count data.
set.seed(2025)

hic_results <- run_chain_betas(
  N                = N_hic,
  gamma_prior      = 0.3,
  iterations       = 5000,
  x_vars           = x_vars_hic,
  y                = y_hic,
  theta_start      = 0.5,
  size_start       = c(1, 1, 1),
  use_data_priors  = TRUE,
  distance_metric  = "manhattan",
  dist             = "ZINB",
  mc_cores         = min(4L, parallel::detectCores() - 1L)
)

# run_chain_betas() returns a list with one element per input dataset.
# Unpack the first (and here only) chain set:
hic_chains <- hic_results[[1]][["chains"]]
hic_gamma  <- hic_results[[1]][["gamma"]]
hic_theta  <- hic_results[[1]][["theta"]]
hic_size   <- hic_results[[1]][["size"]]
```

### Step 5 — Compute posterior component probabilities

```{r hic_probs, echo=TRUE, eval=FALSE}
# Pass the full results list, not just the chains.
hic_probs <- compute_HMRFHiC_probabilities(
  data        = hic_data,
  chain_betas = hic_results,
  iterations  = 5000,
  dist        = "ZINB"
)

# hic_probs is a data.frame with prob1, prob2, prob3 columns that sum to 1
# per row. prob1 is the noise/baseline component; prob2 and prob3 are the
# elevated-signal regimes.
head(hic_probs[, c("start", "end", "interactions", "prob1", "prob2", "prob3")])

# Bin-pairs confidently classified as elevated signal (prob2 > 0.9). This value can be increase or reduce based on user's justification.
genuine_signal <- hic_probs[hic_probs$prob2 > 0.9,
                            c("start", "end", "interactions", "prob2")]
head(genuine_signal)
```

## Example 2: Micro-C Data

Micro-C uses micrococcal nuclease (MNase) instead of a restriction enzyme,
yielding nucleosome-resolution contact maps (~150 bp). The HiCPotts workflow
is **identical** to Hi-C; practical differences:

* **Input file**: typically a higher-resolution `.cool` or `.mcool`.
* **Resolution**: 1 kb – 5 kb rather than 10 – 100 kb.
* **Structural zeros**: even more abundant at fine resolution — ZINB strongly
  recommended.
* **Iterations**: larger lattices need longer chains to converge.

### Step 1 — Locate input files

```{r microc_paths, echo=TRUE, eval=FALSE}
# Replace these with the actual paths on your machine.
microc_file <- "path/to/MicroC_BG3_chr4_1kb.cool"
dnase_file  <- "path/to/DNaseI_BG3_gr_chr4.bedGraph"
chain_file  <- "path/to/dm3ToDm6_chr4_only.chain"
te_file     <- "path/to/dm6_TEs_chr4.gtf"
```

### Step 2 — Load Micro-C data

```{r microc_get_data, echo=TRUE, eval=FALSE}
# Aggregate from the native 1 kb to 5 kb to keep the lattice manageable.
microc_data <- get_data(
  file_path      = microc_file,
  chr            = "chr4",
  start          = 1,
  end            = 1350000,
  resolution     = 5000,
  genome_package = "BSgenome.Dmelanogaster.UCSC.dm6",
  acc_wig        = dnase_file,
  chain_file     = chain_file,
  te_granges     = te_file
)

head(microc_data)
```

### Step 3 — Process into matrices

```{r microc_process, echo=TRUE, eval=FALSE}
# 1,350,000 / 5,000 = 270 bins -> a 270 x 270 lattice.
N_microc <- 270

processed_microc <- process_data(
  microc_data,
  N                 = N_microc,
  scale_max         = 500,
  standardization_y = TRUE
)

x_vars_microc <- processed_microc[["x_vars"]]
y_microc      <- processed_microc[["y"]]
```

### Step 4 — Run MCMC

```{r microc_mcmc, echo=TRUE, eval=FALSE}
set.seed(2025)

microc_results <- run_chain_betas(
  N                = N_microc,
  gamma_prior      = 0.3,
  iterations       = 10000,
  x_vars           = x_vars_microc,
  y                = y_microc,
  theta_start      = 0.5,
  size_start       = c(1, 1, 1),
  use_data_priors  = TRUE,
  distance_metric  = "manhattan",
  dist             = "ZINB",
  mc_cores         = min(4L, parallel::detectCores() - 1L)
)
```

### Step 5 — Posterior probabilities and classification

```{r microc_probs, echo=TRUE, eval=FALSE}
microc_probs <- compute_HMRFHiC_probabilities(
  data        = microc_data,
  chain_betas = microc_results,
  iterations  = 10000,
  dist        = "ZINB"
)

# Hard component assignment: the column with the highest posterior probability.
probs_mat    <- as.matrix(microc_probs[, c("prob1", "prob2", "prob3")])
component    <- max.col(probs_mat, ties.method = "first")
table(component)
```

## Practical Notes on Hi-C vs. Micro-C

| Aspect                   | Hi-C           | Micro-C        |
|--------------------------|----------------|----------------|
| Typical resolution       | 10 – 100 kb    | 1 – 5 kb       |
| Lattice size `N`         | 10 – 1000      | 100 – 10000    |
| Recommended `dist`       | ZINB or NB     | ZINB           |
| Recommended `iterations` | 2000 – 5000    | 5000 – 20000   |
| `mc_cores`               | 2 – 4          | 4 – 8          |

The same `get_data() → process_data() → run_chain_betas() → compute_HMRFHiC_probabilities()`
pipeline applies to both; only the argument values and compute budget change.


# Advanced Usage

## Custom Priors

To use user-specified priors instead of data-driven priors:
```{r User_prior, include=FALSE, eval=TRUE}
user_priors <- list(
  list(mean = c(0, 0, 0, 0, 0), sd = c(1, 1, 1, 1, 1)),
  list(mean = c(0, 0, 0, 0, 0), sd = c(1, 1, 1, 1, 1)),
  list(mean = c(0, 0, 0, 0, 0), sd = c(1, 1, 1, 1, 1))
)

# Run MCMC
results <- run_chain_betas(
  N = N,
  gamma_prior = gamma_prior,
  iterations = iterations,
  x_vars = x_vars,
  y = y,
  theta_start = theta_start,
  # use_data_priors = TRUE,
  user_fixed_priors = user_priors,
  # epsilon = NULL,
  distance_metric = "manhattan",
  dist = dist,
  # size_start = size_start,
  mc_cores = 1
)
```

## Parallel Processing

For multiple Hi-C matrices, use mc_cores to parallelize:

```{r multiple_matrices, include=FALSE, eval=FALSE}
y_list <- list(y, y) # Example with two matrices

results <- run_chain_betas(
  N = N,
  gamma_prior = gamma_prior,
  iterations = iterations,
  x_vars = x_vars,
  y = y_list,
  theta_start = theta_start,
  size_start = size_start,
  use_data_priors = TRUE,
  user_fixed_priors = NULL,
  epsilon = NULL,
  distance_metric = "manhattan",
  dist = dist,
  mc_cores = 2
)
```

## Other Distributions

The package supports Poisson, NB, ZIP, and ZINB distributions. 
For example, to use Poisson:

```{r Additional_distributions, include=FALSE, eval=TRUE}
# Run MCMC
results <- run_chain_betas(
  N = N,
  gamma_prior = gamma_prior,
  iterations = iterations,
  x_vars = x_vars,
  y = y,
  theta_start = theta_start,
  # use_data_priors = TRUE,
  user_fixed_priors = user_priors,
  # epsilon = NULL,
  distance_metric = "manhattan",
  dist = "Poisson",
  # size_start = size_start,
  mc_cores = 1
)
```

# Technical Notes

Performance: The following functions, `pz_123()`, `run_metropolis_MCMC_betas()`, and `Neighbours_combined()`, are implemented in C++ using Rcpp and RcppArmadillo for efficiency, crucial for large Hi-C matrices.

Utility Functions: The following functions, `proposaldensity_combined()`, `likelihood_gamma()`, `gamma_prior_value()`, `posterior_combined()`, `prior_combined()`, `likelihood_combined()`, `proposalfunction()`, and `size_prior()`, support the MCMC process and are typically not called directly by users.

# Conclusion

The HiCPotts package offers a powerful tool for Bayesian analysis of Hi-C data, integrating spatial dependencies, information from sources of bias associated with Hi-C data, and flexible mixture models. This page covered the core workflow, but the package’s functions can be customized for specific research needs, such as different genomic regions or distribution assumptions.

Users trying to bring in extra covariates; beyond genomic distance, GC content, ACC-score and TEs, can do so by classifying each new factor according to how it relates to those four (for example, as a distance-like term, a sequence-composition term, or an interaction‐score term) 
and then inputting it into the same model framework.
Also, the package currently supports only intrachromosomal analyses. In the future, we plan to extend to interchromosomal contacts.


For further details, consult the package documentation (`?HiCPotts`) or contact the package maintainers. We hope HiCPotts facilitates your genomic research!


```{r, echo = FALSE}
sessionInfo()
```