PostChicago is a toolbox for integrating, annotating, visualising and
quantifying Capture-(Hi)-C (from now on Capture-C) data derived from the
output of the CHiCAGO pipeline (Chicago
package [1]).
The dataset used in this vignette and supplied with the package is from
a retinoic-acid (RA) mediated embryonic stem cell (ESC) differentiation
(Mahara et al.,
2024 [2], Capture-C Set1).
PostChicago relies on pre-processed files. This can be done using
tools for Hi-C processing up to the point of making filtered-paired
BAM files, such as HiCUP
[3], followed by creating Chicago input (or .chinput) files
using the CHiCAGO tools script bam2chicago.sh.
Before you use PostChicago, you need the following default
directories:
.chinput) files generated from paired BAM
files by CHiCAGO (bam2chicago.sh). Only required in
combination with runChicagoForPostChicago()chicagoData objects
are required for the PostChicago pipelineThis is the typical directory structure:
extdata <- system.file("extdata", package = "PostChicago")
dir(extdata)
#> [1] "dataPath" "designDir" "intervals" "postchicago" "results"
path <- file.path(tempdir(), "my_package_output")
dir.create(path, showWarnings = FALSE)
#Before we run PostChicago, we define the default directories first:
dataPath <- file.path(extdata, "dataPath")
designDir <- file.path(extdata, "designDir")
intDir <- file.path(extdata, "intervals")
outputDir <- file.path(path, "postchicago")
dir.create(outputDir)
chicagoOutputDir <- file.path(extdata, "results")
## checking which .chinput files we have stored:
dir(dataPath)
#> [1] "RA_0h_rep1.chinput" "RA_0h_rep2.chinput" "RA_72h_rep1.chinput"
#> [4] "RA_72h_rep2.chinput"
## Check if you have already stored the chicagoData tables:
dir(chicagoOutputDir)
#> [1] "cd_RA_0h_rep1.rds" "cd_RA_0h_rep2.rds"
#> [3] "cd_RA_0h.rds" "cd_RA_72h_rep1.rds"
#> [5] "cd_RA_72h_rep2.rds" "cd_RA_72h.rds"
#> [7] "fnreadsPerBait" "gDistrTotIntsPerBait"
#> [9] "hSigIntsPerBait" "iDistReadsPerSigIntPerBait"PostChicago contains the function
runChicagoForPostChicago() to call on CHiCAGO so that the
data is created and named to seamlessly fit into our pipeline. However,
chicagoData objects can also be created and saved using the
standard CHiCAGO pipeline.
weightsPath <- file.path(
system.file("extdata", package = "Chicago"),
"weights"
)
weightSettings <- file.path(weightsPath, "mESC-2reps.settings")
weightSettings <- read.delim(weightSettings, header = FALSE)
mySettings <- Chicago::defaultSettings()
mySettings[grep("weight", names(mySettings))] <- weightSettings[, 2]
mySettings$minFragLen <- 100df. This table should contain at least two
columns: filename contains the file names,
type contains the sample name. All lines with the same
sample name will be treated as replicates.df:fls <- list.files(dataPath)
fls
#> [1] "RA_0h_rep1.chinput" "RA_0h_rep2.chinput" "RA_72h_rep1.chinput"
#> [4] "RA_72h_rep2.chinput"
df <- data.frame(
filename = fls,
RA = stringr::str_remove(fls, "_rep.*"),
rep = stringr::str_remove(
stringr::str_remove(fls, "^.*h_"), "\\.chinput$"
)
)
df$type <- df$RA
head(df)
#> filename RA rep type
#> 1 RA_0h_rep1.chinput RA_0h rep1 RA_0h
#> 2 RA_0h_rep2.chinput RA_0h rep2 RA_0h
#> 3 RA_72h_rep1.chinput RA_72h rep1 RA_72h
#> 4 RA_72h_rep2.chinput RA_72h rep2 RA_72hNow you can run Chicago!
Note that for PostChicago we require more than the standard output of
the Chicago pipeline. Instead, we save the entire
chicagoData objects, containing reads and scores for all
pairwise interactions, similar to bedgraph files. We will
require this information for plotting and quantification of interaction
strengths for all samples. In contrast, standard Chicago output
(.ibed files) only saves this information for significant
interactions.
The output will be saved in the chicagoOutputDir. The
conventional naming of the chicagoData objects is
cd_<mysamplename>. They can later be loaded manually
using load(<myChicagoDataObject>) and accessed via
cd.
runChicagoForPostChicago(
df = df, mySettings = mySettings, outputDir = chicagoOutputDir,
DataPath = dataPath, DesignDir = designDir
)Running it now on individual replicates. This is recommended for the downstream analysis.
df$type <- as.character(paste(df$RA, df$rep, sep = "_"))
head(df)
#> filename RA rep type
#> 1 RA_0h_rep1.chinput RA_0h rep1 RA_0h_rep1
#> 2 RA_0h_rep2.chinput RA_0h rep2 RA_0h_rep2
#> 3 RA_72h_rep1.chinput RA_72h rep1 RA_72h_rep1
#> 4 RA_72h_rep2.chinput RA_72h rep2 RA_72h_rep2
runChicagoForPostChicago(
df = df, mySettings = mySettings,
outputDir = chicagoOutputDir,
DataPath = dataPath, DesignDir = designDir
)Regardless of whether the chicagoData objects have been
created via runChicagoForPostChicago() or via CHiCAGO (or
you are using this infrastructure for entirely different data), they
will now be loaded into a list.
For this, PostChicago requires two additional objects:
baited_genes is a GenomicRanges object
that annotates view points and is created from the .baitmap
(one of the CHiCAGO design files stored in the designDir),
which allows to easily iterate through the data and add meaningful names
to the plots. This object has two mandatory columns: 1.)
re_id containing the restriction fragment IDs that
correspond to the view points (baits) and .rmap; 2.)
genename: name of the baitrmapgr is a GenomicRanges object
annotating all restriction fragments and is created from the
.rmap (one of CHiCAGO design files stored in the
designDir). The only mandatory column is id,
containing the ID of the restriction fragment.To generate baited_genes and rmapgr objects
from design files, we can use the helper functions
baitmap2baited_genes() and rmap2rmapgr():
baited_genes <- baitmap2baited_genes(designDir, save = FALSE)
rmapgr <- rmap2rmapgr(designDir, save = FALSE)Now we are ready to read in the chicagoData tables
created by runChicagoForPostChicago() into a list using the
function loadCdList(). These tables contain replicate
normalized reads in the column N, replicate read counts
under N.1, N.2,... and Chicago scores in
score. We provide precomputed chicagoData objects as
examples with the package.
dir(chicagoOutputDir)
#> [1] "cd_RA_0h_rep1.rds" "cd_RA_0h_rep2.rds"
#> [3] "cd_RA_0h.rds" "cd_RA_72h_rep1.rds"
#> [5] "cd_RA_72h_rep2.rds" "cd_RA_72h.rds"
#> [7] "fnreadsPerBait" "gDistrTotIntsPerBait"
#> [9] "hSigIntsPerBait" "iDistReadsPerSigIntPerBait"
cdlist <- loadCdList(
resultsDir = chicagoOutputDir,
baited_genes = baited_genes
)
names(cdlist)
#> [1] "RA_0h_rep1" "RA_0h_rep2" "RA_0h" "RA_72h_rep1" "RA_72h_rep2"
#> [6] "RA_72h"For downstream analysis, we extract the list L
containing all integrated data and LL containing data from
individual replicates:
pooledSamples <- cdlist[-grep("rep", names(cdlist))]
replicateSamples <- cdlist[grep("rep", names(cdlist))]
rm(cdlist)These steps can also easily be done manually without these helper
functions, as long as the input files are organised as such.
To assess the quality of the data and individual replicates,
PostChicago provides the function plotSigIntsStats(). In
addition to the summary statistics, such as read counts per sample, this
function can plot various distributions across baits using the argument
plotDistribution. We will omit this step here and set
plotDistribution=FALSE, since our test dataset only
contains one gene. plotSigIntsStats() is important to
determine inconsistencies that cannot be explained by biology, for
instance a sample with a much larger total read number, but a much
smaller number of significant interactions. We also find that samples
with a high background have a shift in read counts within significant
interactions towards a higher read count.
plotSigIntsStats(pooledSamples, plotDistribution = FALSE,
plotExamples = TRUE, col = palette.colors(3)[2:3])Basic statistics of Capture-C data.
The main function for data visualisation is
plotInteractions().
This function uses the chicagoData objects saved by
runChicagoForPostchicago() and stored in L
(samples) or LL (replicates) to generate normalized line
plots over the genomic location around a view point (bait).
plotInteractions() iterates through the list of
chicagoData tables in L or LL and
plots a running mean read coverage across k adjacent
fragments surrounding the view point as specified by id.
The reads are normalized to the total number of reads mapped to baits
per 100,000 reads and further to the total number of baits in each
capture experiment, which enables a comparison of interactions between
different captures. In the plots, the bait position is indicated by the
grey triangle. In addition to k, we can specify the
distance plotted around a bait using the zoom argument,
with default being +/- 200kb. By default, the plot title contains the
baitID, zoom and k, however we
can add a more descriptive name, such as the genename of the view point,
using the argument name.
The default function creates line plots with bait +/- 200kb:
col <- palette.colors(3)[2:3]
name <- "Hoxb3"
id <- baited_genes[baited_genes$genename == name]$re_id
k <- 15
ylim <- c(0, 200)
plotInteractions(pooledSamples, id, k,
ylim = ylim, show.legend = TRUE, name = name,
rmapgr = rmapgr, col = col
)Capture-C signal +/- 200kb around the Hoxb3 TSS.
The plot above shows a clear difference between the samples with
RA_72h strongly losing the interaction on the left. To assess
reproducibility, we can now make the plots for individual replicates,
stored in the object LL. For this, we can keep the sample
colours, but redefine the line types using the lty option.
We can also change the plotted area via zoom. For a larger
zoom, k should be adjusted to allow for smooth
display of the data:
colreps <- rep(col, each = 2)
lty <- rep(1:2, 2)
zoom <- 500000
k <- 41
plotInteractions(replicateSamples, id, k,
zoom = zoom, ylim = ylim,
show.legend = TRUE, name = name,
rmapgr = rmapgr, col = colreps, lty = lty
)Replicate Capture-C +/- 500kb around the Hoxb3 TSS.
In addition to changing the zoom, which plots the data
symmetrically around the bait, we can specify the exact genomic region
using xlim, which overrides any zoom settings.
To make sure default zoom (+/-0.2Mb) information is omitted from the
plot title, we set zoom to NA. Additionally,
the legend can be plotted outside of the plot using the
show.legend.outside option:
par(mfrow = c(1, 2))
xlim <- c(96150000, 96350000)
zoom <- as.numeric(NA)
k <- 31
plotInteractions(pooledSamples, id,
zoom = zoom, k = k, ylim = ylim,
show.legend = FALSE, name = name,
rmapgr = rmapgr, col = col,
show.legend.outside = TRUE, xlim = xlim
)Capture-C signal within a set interval.
Annotation with intervals, such as ChIP-Seq peaks, is generally done
from a GenomicRangesList object. This can be done manually.
For ease of use, PostChicago offers the function loadGrl()
that automatically constructs the list from either .rds or
.bed files that are supplied in the intDir.
Let’s load our intervals into a GenomicRangesList object
grl:
Intervals in the grl can then be added to the line plots
using the intervals argument:
colintervals <- palette.colors(5)[4:5]
zoom <- 1000000
k <- 51
par(mfrow = c(1, 2))
plotInteractions(pooledSamples, id, k,
zoom = zoom, ylim = ylim, show.legend = FALSE,
show.legend.intervals = FALSE, name = name,
intervals = grl, colintervals = colintervals,
rmapgr = rmapgr, col = col, show.legend.outside = TRUE
)Capture-C signal and ChIP-Seq intervals around Hoxb3. H3K27ac/H3K27me3 peaks are shown in blue/green
Lineplots represent static snap shots. For a more flexible data
browsing, PostChicago provides the function
makeBedGraphs(). Per default, this function creates two
types of files for an id (= bait): i)
bedgraphs of normalized interactions for each sample using
the same normalization as for the Lineplots; ii) a bed file
containing the bait. Unless specified otherwise, using the argument
outputDir, all files are stored in the default directory
called bedgraphs.
To integrate data from multiple samples, we can create an
interactions table containing all information about any significant
interaction in one experiment using the makeIntsTable()
function. We first extract all significant interactions across samples,
annotate them with raw and pooled weighted average read counts. The
reads are then normalized (downsampled) based on the total number of
reads mapped to baits across samples (library size). While providing
replicate data is optional, for completeness of the analysis we
recommend to include data from both pooled samples and replicates,
i.e. both L and LL. The function automatically
creates pairwise interaction plots and saves them. Each interaction
receives a unique intID, which is
baitID;otherEndID. Since the creation of this table is
quite resource-intensive, it is processed in chunks using
ngroups. For efficient computing time, each group should
contain ~50-150 baits. In this experiment, we are processing just one
bait, therefore we will process them as ngroups = 1.
Once saved, the interaction table can be called like any other table in
the .txt format.
ints <- makeIntsTable(pooledSamples, baited_genes,
repscores = TRUE, LL = replicateSamples,
ngroups = 1, outfolder = outputDir
)Next, we annotate our interactions table with the intervals in
grl using the annotateInts() function. The
default requires intervals to have a direct overlap with the otherEnds.
This can be changed to being within x bp using the maxgp
argument, that works similarly to maxgap in
findOverlaps().
As quality control of our data, we can use
makeQCplots(), which plots sample correlations of raw and
downsampled read counts and Chicago scores. Two types of heatmaps are
made, displaying either overall spearman and pearson correlations or
clustering individual interactions.
QC heatmaps
Significant interactions within the interactions table can be
supplied to the plotInteractions() function using the
argument d and visualized as vertical bars:
name <- "Hoxb3"
id <- baited_genes[baited_genes$genename == name]$re_id
k <- 31
zoom <- 500000
ylim <- c(0, 100)
par(mfrow = c(1, 2))
plotInteractions(pooledSamples, id, k, zoom,
ylim = ylim, show.legend = FALSE,
show.legend.intervals = FALSE, name = name,
intervals = grl, colintervals = colintervals,
rmapgr = rmapgr, col = col,
show.legend.outside = TRUE, d = ints
)Visualization of significant interactions within lineplots. Significant interactions around Hoxb3 are shaded.
The function can visualize up to 8 different interaction types using
different colors, provided they are annotated in the column called
clusters_refined. Below an example of how to annotate the
interactions table and how to plot these interactions.
## select interactions detected only at 0h, these are 'lost'
lost <- ints[ints$RA_0h_score >= 5 & ints$RA_72h_score < 3, ]$intID
## select interactions detected only at 72h, these are 'gained'
gained <- ints[ints$RA_0h_score < 3 & ints$RA_72h_score >= 5, ]$intID
## annotate the interaction table with interaction types:
ints$clusters_refined <- "both"
ints[ints$intID %in% lost, ]$clusters_refined <- "RA_0h"
ints[ints$intID %in% gained, ]$clusters_refined <- "RA_72h"
par(mfrow = c(1, 2))
plotInteractions(pooledSamples, id, k, zoom,
ylim = ylim, show.legend = FALSE,
show.legend.intervals = FALSE,
name = name, intervals = grl,
colintervals = colintervals,
rmapgr = rmapgr, col = col,
show.legend.outside = TRUE, d = ints
)Significant interactions colored by interaction type.
So far, we have only compared interactions at the restriction fragment level. However, we are often more interested in interactions with specific intervals, such as ChIP-Seq peaks or gene promoters, which can overlap more than one restriction fragment. To achieve this, we first need to convert our gene-to-fragment interactions into gene-to-peak interactions.
The first step is to create oneGeneOnePeak tables using
the function makeOneGeneOnePeak(). Interactions with large
intervals, such as ChIP-Seq peaks, can have multiple fragment-level
interactions. In contrast, interval interactions within
oneGeneOnePeak tables assign only one interaction per
specific interval per gene. These interaction tables are automatically
saved in the directory defined in the argument folder.
quantfolder <- paste0(outputDir, "/oneGeneOnePeak")
ogoplist <- makeOneGeneOnePeak(
grl = grl, rmapgr = rmapgr,
ints = ints, folder = quantfolder
)
##look at the oneGeneOnePeak table:
ogoplist[[1]][1:2,]
#> intID baitID
#> 3958676;3958519 3958676;3958519,3958520,3958521,3958522 3958676
#> 3958676;3958526 3958676;3958526,3958527,3958528,3958529 3958676
#> otherEndID seqnames_bait start_bait
#> 3958676;3958519 3958519,3958520,3958521,3958522 chr11 96343553
#> 3958676;3958526 3958526,3958527,3958528,3958529 chr11 96343553
#> end_bait seqnames_otherEnd start_otherEnd end_otherEnd
#> 3958676;3958519 96345280 chr11 96283263 96285534
#> 3958676;3958526 96345280 chr11 96286496 96289482
#> interval_id_K27ac_peaks seqnames_K27ac_peaks start_K27ac_peaks
#> 3958676;3958519 2740 chr11 96284201
#> 3958676;3958526 2741 chr11 96286770
#> end_K27ac_peaks summit_re_id
#> 3958676;3958519 96285181 3958521
#> 3958676;3958526 96287453 3958527
##The tables are also saved in the quantfolder:
list.files(path = quantfolder, pattern = "oneGeneOnePeak")
#> [1] "oneGeneOnePeak_K27ac_peaks_interacting.txt"
#> [2] "oneGeneOnePeak_K27me3_peaks_interacting.txt"Examples:
We can visualize oneGeneOnePeak interactions with
plotInteractions() by supplying oneGeneOnePeak
tables instead of the original interactions table via the argument
d:
fls_ogop <- list.files(path = quantfolder, pattern = "oneGeneOnePeak")
## Load oneGeneOnePeak file for H3K27me3 peaks:
ogop_K27me3 <- read.delim(paste(quantfolder,
fls_ogop[grep("K27me3", fls_ogop)],sep = "/"),
stringsAsFactors = FALSE)
ogop_K27me3$clusters_refined <- "K27me3_interaction"
## Load oneGeneOnePeak file for H3K27ac peaks:
ogop_K27ac <- read.delim(
paste(quantfolder, fls_ogop[grep("K27ac", fls_ogop)],sep = "/"),
stringsAsFactors = FALSE
)
ogop_K27ac$clusters_refined <- "K27ac_interaction"
par(mfrow = c(1, 2))
## K27ac-overlapping interactions
plotInteractions(pooledSamples, id, k, zoom = zoom,
ylim = ylim, show.legend = TRUE,
name = paste("K27ac interactions \n", sep = ","),
intervals = grl, colintervals = colintervals,
rmapgr = rmapgr, col = col, d = ogop_K27ac,
colints = colintervals[1]
)
## K27me3-overlapping interactions
plotInteractions(pooledSamples, id, k, zoom = zoom,
ylim = ylim, show.legend = TRUE,
name = paste("K27me3 interactions \n", sep = ","),
intervals = grl, rmapgr = rmapgr, col = col,
colintervals = colintervals, d = ogop_K27me3,
colints = colintervals[2]
)Visualization of oneGeneOnePeak interactions. Interactions with H3K27me3 and H3K27ac are shaded.
Now that we have our oneGeneOnePeak files, we will quantify reads and
scores under Hoxb3 promoter-interacting H3K27ac peaks. To do this, we
will first create interaction matrices centered at restriction fragment
overlapping the center of interacting ranges using the function
getMatrix(). The distance from center in restriction
fragments is given via the argument zoom. The default
function creates Lm, a list of matrices based on the
supplied list of chicagoData objects. This list contains
matrices with ranges overlaps across all samples and distance-matched
control matrices. The matrices are flipped in a way that the view point
is always to the right of the interaction to standardize orientation
visually, since Capture-C signal tends to increase towards the view
point. Lm is saved within the resfolder and
can be loaded into an object called Lm.
As an example, we will now create such matrices for H3K27ac peaks.
name <- names(grl)[grep("K27ac", names(grl))]
name
#> [1] "K27ac_peaks"
## Zoom defines the +/- distance (in restriction fragments) around the intervals
## for which the matrices will be created
zoom <- 100
## Folder in which the matrices will be stored:
resfolder <- paste0(outputDir, "/matrices/")
if (!file.exists(resfolder)) {
dir.create(resfolder)
}
## un-normalized matrix with reads
m_reads <- getMatrix(pooledSamples,
zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "reads",
norm = FALSE, name = name, rmapgr = rmapgr
)
## matrix normalized to the center of interactions
## in sample 1 (as defined per normsam) with reads
m_reads_norm <- getMatrix(pooledSamples,
zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "reads",
norm = TRUE, name = name, rmapgr = rmapgr, normsam = 1
)
## matrix normalized to the center of interactions
## in sample 2 (as defined per normsam) with reads
m_reads_norm2 <- getMatrix(pooledSamples,
zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "reads",
norm = TRUE, name = name, rmapgr = rmapgr, normsam = 2
)
## un-normalized matrix with scores
m_scores <- getMatrix(pooledSamples,
zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "scores",
norm = FALSE, name = name, rmapgr = rmapgr
)
## matrix normalized to the center of interactions
## in sample 1 (as defined per normsam) with scores
m_scores_norm <- getMatrix(pooledSamples,
zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "scores",
norm = TRUE, name = name, rmapgr = rmapgr, normsam = 1
)
## matrix normalized to the center of interactions
## in sample 2 (as defined per normsam) with scores
m_scores_norm2 <- getMatrix(pooledSamples,
zoom = zoom, d = ogop_K27ac, resfolder = resfolder, type = "scores",
norm = TRUE, name = name, rmapgr = rmapgr, normsam = 2
)Now we can use the lists of matrices to plot aggregate profiles of
different interaction types using the plotAggregatePeaks()
function. We are plotting aggregate read counts over all interactions.
Then we will plot normalized read counts to one of the samples, which
makes individual interactions more comparable.
## Loading precomputed matrices:
name <- names(grl)[grep("K27ac", names(grl))]
name
#> [1] "K27ac_peaks"
## Plots:
colaggr <- rep(col, 2)
lty <- rep(c(1, 3), each = 2)
par(mfrow = c(2, 3))
plotAggregatePeaks(m_reads,
plotwhat = "median", k = 3, xlim = c(-50, 50),
mainprefix = "K27ac\n", ylab = "reads",
col = colaggr, lty = lty
)
plotAggregatePeaks(m_reads_norm,
plotwhat = "median", k = 3, xlim = c(-50, 50),
mainprefix = "K27ac norm to 0h\n", ylab = "reads_norm",
col = colaggr, lty = lty
)
plotAggregatePeaks(m_reads_norm2,
plotwhat = "median", k = 3, xlim = c(-50, 50),
mainprefix = "K27ac norm to 72h\n", ylab = "reads_norm",
col = colaggr, lty = lty
)
plotAggregatePeaks(m_scores,
plotwhat = "median", k = 3, xlim = c(-50, 50),
mainprefix = "K27ac scores\n", ylab = "scores",
col = colaggr, lty = lty
)
plotAggregatePeaks(m_scores_norm,
plotwhat = "median", k = 3, xlim = c(-50, 50),
mainprefix = "K27ac scores norm to 0h\n",
ylab = "scores_norm",
col = colaggr, lty = lty
)
plotAggregatePeaks(m_scores_norm2,
plotwhat = "median", k = 3, xlim = c(-50, 50),
mainprefix = "K27ac scores norm to 72h\n",
ylab = "scores_norm",
col = colaggr, lty = lty
)Aggregate Capture-C profiles. Dotted lines: Distance-matched background.
The above plots show that Hoxb3 interactions with H3K27ac increase
upon differentiation, with subtle differences depending on the sample to
which the data are normalized. For sanity check, we recommend to test
out normalization to different samples. Also, to avoid inflation of fold
changes, a general recommendation is to use the positive control sample
as reference.
Let’s compare Hoxb3 interactions with H3K27ac peaks with its H3K27me3
peak interactions:
name <- names(grl)[grep("K27me3", names(grl))]
name
#> [1] "K27me3_peaks"
## matrix normalized to the center of interactions
## in sample 2 (as defined per normsam) with reads
m_reads_norm2_K27me3 <- getMatrix(pooledSamples,
zoom = zoom, d = ogop_K27me3, resfolder = resfolder,
type = "reads", norm = TRUE,
name = name, rmapgr = rmapgr, normsam = 2
)
# Plots:
colaggr <- rep(col, 2)
lty <- rep(c(1, 3), each = 2)
par(mfrow = c(1, 2))
plotAggregatePeaks(m_reads_norm2,
plotwhat = "median", k = 3,
xlim = c(-50, 50),
mainprefix = "K27ac norm to 72h\n",
ylab = "reads_norm", col = colaggr, lty = lty
)
plotAggregatePeaks(m_reads_norm2_K27me3,
plotwhat = "median", k = 3,
xlim = c(-50, 50),
mainprefix = "K27me3 norm to 72h\n",
ylab = "reads_norm", col = colaggr, lty = lty
)Aggregate Capture-C profiles for different intervals.
This comparison shows that while the Hoxb3 promoter gains
interactions with H3K27ac peaks, it loses interactions with H3K27me3
peaks, consistent with the line plots.
getMatrix() can use two types of read normalization:
downsampling (default) and scaling. In both cases reads are normalized
to the read coverage, but in case of scaling, the reads are scaled by
the total number of promoters in the Capture experiment upon
downsampling to a default library size of 100,000. This is meant to
enable comparison between different capture experiments and works in the
same way as read normalization in plotInteractions(). On
the other hand, in case of downsampling, the reads are simply
downsampled to the smallest library in L.
Below is a comparison of the two read normalization methods:
## matrix with scaling reads:
m_reads_scaling <- getMatrix(pooledSamples, zoom, ogop_K27ac,
resfolder = resfolder, type = "reads", norm = FALSE,
name = name, rmapgr = rmapgr, readnorm = "scaling"
)
## plots:
par(mfrow = c(1, 2))
plotAggregatePeaks(m_reads,
plotwhat = "median", k = 3,
xlim = c(-50, 50),
mainprefix = "K27ac downsampled\n",
ylab = "reads", col = colaggr, lty = lty
)
plotAggregatePeaks(m_reads_scaling,
plotwhat = "median", k = 3,
xlim = c(-50, 50),
mainprefix = "K27ac scaled\n",
ylab = "reads", col = colaggr, lty = lty
)
Aggregate Capture-C profiles for different normalizations. Downsampling
normalizes read counts within one experiment, whereas scaling
also normalizes to the number of capture view points.
fls_ogop <- list.files(path = quantfolder, pattern = "oneGeneOnePeak")
## Load oneGeneOnePeak file for H3K27me3 peaks:
ogop_K27me3 <- read.delim(paste(quantfolder, fls_ogop[grep("K27me3", fls_ogop)],
sep = "/"), stringsAsFactors = FALSE)
ogop_K27me3$clusters_refined <- "K27me3 interaction"
## Quantify
annotatedTable <- quantifyWithinPeaks(
data = ogop_K27me3,
cdlist = pooledSamples,
cdlist.reps = replicateSamples,
rmapgr = rmapgr,
minsize = 100,
maxsize = 40000,
file = paste0(outputDir, '/aggregated5_annotated.txt')
)
##Look at the oneGeneOnePeak table before...
names(ogop_K27me3)
#> [1] "intID" "baitID"
#> [3] "otherEndID" "seqnames_bait"
#> [5] "start_bait" "end_bait"
#> [7] "seqnames_otherEnd" "start_otherEnd"
#> [9] "end_otherEnd" "interval_id_K27me3_peaks"
#> [11] "seqnames_K27me3_peaks" "start_K27me3_peaks"
#> [13] "end_K27me3_peaks" "summit_re_id"
#> [15] "clusters_refined"
##and after the annotation
names(annotatedTable)
#> [1] "intID" "baitID"
#> [3] "otherEndID" "seqnames_bait"
#> [5] "start_bait" "end_bait"
#> [7] "seqnames_otherEnd" "start_otherEnd"
#> [9] "end_otherEnd" "interval_id_K27me3_peaks"
#> [11] "seqnames_K27me3_peaks" "start_K27me3_peaks"
#> [13] "end_K27me3_peaks" "summit_re_id"
#> [15] "clusters_refined" "RA_0h_N"
#> [17] "RA_72h_N" "RA_0h_score"
#> [19] "RA_72h_score" "RA_0h_rep1_N"
#> [21] "RA_0h_rep2_N" "RA_72h_rep1_N"
#> [23] "RA_72h_rep2_N" "RA_0h_rep1_score"
#> [25] "RA_0h_rep2_score" "RA_72h_rep1_score"
#> [27] "RA_72h_rep2_score" "RA_0h_N_downsampled"
#> [29] "RA_72h_N_downsampled" "RA_0h_rep1_N_downsampled"
#> [31] "RA_0h_rep2_N_downsampled" "RA_72h_rep1_N_downsampled"
#> [33] "RA_72h_rep2_N_downsampled" "totalValidFrags"
#> [35] "RA_0h_N_downsampled_mean" "RA_72h_N_downsampled_mean"
#> [37] "RA_0h_rep1_N_downsampled_mean" "RA_0h_rep2_N_downsampled_mean"
#> [39] "RA_72h_rep1_N_downsampled_mean" "RA_72h_rep2_N_downsampled_mean"
#> [41] "RA_0h_score_mean" "RA_72h_score_mean"
#> [43] "RA_0h_rep1_score_mean" "RA_0h_rep2_score_mean"
#> [45] "RA_72h_rep1_score_mean" "RA_72h_rep2_score_mean"Let’s now plot our quantification as boxplots using the function
boxplotsCapC() for pooled…
Boxplots for quantitative comparison of Capture-C signal. Two types of
read or score quantification is provided that give slightly different
results:
(i) Total signal within a oneGeneOnePeak interaction and
(ii) Total signal normalized to the number of restriction fragments
within which the reads
were quantified (right).
…and for the replicate data:
##plot replicate data:
boxplotsCapC(annotatedTable, col=rep(col,each=2),
reps=TRUE, show_notch = FALSE
)Boxplots for replicate comparison.
Visualizing interactions, for instance in line plots, quickly reveals
that one strong interaction peak can actually be represented by multiple
disjointed interactions. These interactions can be merged using the
aggregatePeaks() function of PostChicago. This function
takes a table of interactions and merges all otherEnds interacting with
one bait that are within a distance of dis restriction
fragments of each other.
If we would like to define sample-specific interaction
peaks, the supplied table should contain only these
sample-specific interactions.
Below is a comparison of fragment-level interactions with interactions
aggregated over 5 restriction sites:
## Define names of significant interactions shown in the plot
ints$clusters_refined <- "interaction"
## Distance in restriction fragments over which to aggregate:
dis <- 5
## Let's aggregate all interactions within 5 fragments distance of each other:
aggregated5 <- aggregatePeaks(ints, dis, names(pooledSamples),
fileprefix = "ints", outdir = outputDir
)
#> dis 5 , aggregating peaks...
## Let's aggregate all interactions within 5 fragments distance of each other
## that are significant at 0h:
aggregated5_0h <- aggregatePeaks(ints[ints$RA_0h_score >= 5, ], dis,
names(pooledSamples),
fileprefix = "ints_0h", outdir = outputDir
)
#> dis 5 , aggregating peaks...
## Let's aggregate all interactions within 5 fragments distance of each other
## that are significant at 72h:
aggregated5_72h <- aggregatePeaks(ints[ints$RA_72h_score >= 5, ], dis,
names(pooledSamples),
fileprefix = "ints_72h",
outdir = outputDir
)
#> dis 5 , aggregating peaks...
nrow(ints)
#> [1] 82
nrow(aggregated5)
#> [1] 35
nrow(aggregated5_0h)
#> [1] 16
nrow(aggregated5_72h)
#> [1] 23
## Example plots:
name <- "Hoxb3"
id <- baited_genes[baited_genes$genename == name]$re_id
k <- 11
ylim <- c(0, 200)
xlim <- c(96150000, 96350000)
zoom <- as.numeric(NA)
colints <- 'yellow'
par(mfrow = c(2, 2))
plotInteractions(pooledSamples, id, k, zoom,
ylim = ylim, show.legend = TRUE,
name = paste("Fragment-level ints \n", name, sep = ""),
rmapgr = rmapgr, col = col, d = ints,
xlim = xlim, colints = colints
)
plotInteractions(pooledSamples, id, k, zoom,
ylim = ylim, show.legend = TRUE,
name = paste("Aggregated ints, dis = ", dis, " frags \n",
name, sep = ""),
rmapgr = rmapgr, col = col, d = aggregated5,
xlim = xlim, colints = colints
)
plotInteractions(pooledSamples, id, k, zoom,
ylim = ylim, show.legend = TRUE,
name = paste("Aggregated RA_0h ints, dis = ", dis, " frags \n",
name, sep = ""),
rmapgr = rmapgr, col = col, d = aggregated5_0h,
xlim = xlim, colints = colints
)
plotInteractions(pooledSamples, id, k, zoom,
ylim = ylim, show.legend = TRUE,
name = paste("Aggregated RA_72h ints, dis = ", dis, " frags \n",
name, sep = ""),
rmapgr = rmapgr, col = col, d = aggregated5_72h,
xlim = xlim, colints = colints
)
Lineplots with different types of interactions. Interactions are shaded,
either for fragment-to-fragment interactions (top left),
aggregated within 5 restriction fragments over all (top right),
RA_0h-specific (bottom-left)
and RA_72h-specific (bottom right) significant interactions.
This visualization nicely confirms that, as suspected from the
interaction profiles, the majority of interactions in this region are
specific for RA_0h.
Apart from fragment-level aggregation, PostChicago provides the function
aggregatePeaks_regions(), which aggregates significant
interactions based on the distance in bp. Below an example on how peak
aggregation works over a 10kb distance.
## Distance in bp over which to aggregate:
disbp <- 10000
## Let's aggregate all interactions within 5 fragments distance of each other:
aggregated10kb <- aggregatePeaks_regions(ints, disbp, names(L),
fileprefix = "ints",
outdir = outputDir
)
#> distance of 10000 , aggregating peaks...
nrow(aggregated10kb)
#> [1] 11
## plot:
par(mfrow = c(1, 2))
plotInteractions(pooledSamples, id, k, zoom,
ylim = ylim, show.legend = TRUE,
name = paste("Aggregated ints, dis = ", dis,
" frags \n", name, sep = ""),
rmapgr = rmapgr, col = col, d = aggregated5,
xlim = xlim, colints = colints
)
plotInteractions(pooledSamples, id, k, zoom,
ylim = ylim, show.legend = TRUE,
name = paste("Aggregated ints, dis = ", disbp / 1000,
"kb \n", name, sep = ""),
rmapgr = rmapgr, col = col, d = aggregated10kb,
xlim = xlim, colints = colints
)Aggregation by distance between significant interactions. Interactions are either aggregated over 5 resitriction fragments (left) or over 10kb
These examples reveal that while an aggregation over 10kb nicely
highlights the two interaction peaks at the lost interaction on the
left, the interactions more proximal to the view point, however, lose
their granularity. As with peak calling for ChIP-Seq, we recommend
testing out different aggregation types and distances for each new
Capture-C experiment.
Interaction peaks can now be used to create new grl objects for
oneGeneOnePeak-level analysis analogous to the analysis for epigenetic
ranges.
Interaction peaks summarize multiple significant interactions within
a peak. Accurate quantification of all signal under the interaction
peaks includes quantification of reads and scores within restriction
fragments that are covered, but not necessarily contain significant
interactions and are therefore not part of our original interaction
table.
Such quantification can be done using the oneGeneOnePeak-level analysis
which requires creating a matrix following signal quantification from
the matrix, as discussed above. Alternatively, direct quantification is
provided by the function quantifyWithinPeaks(). This
function quantifies the total sum of raw and downsampled reads and
scores within the whole interaction peak and further normalizes these
total counts for the total number of valid restriction fragments within
the interaction peak. The function can also be used with
oneGeneOnePeak tables.
We specify the data which to annotate, the ChicagoData lists used for
the annotation. In addition, the function normalizes for the total
number of valid restriction fragments, which are specified during the
preparation of ChicagoData objects by the parameters
minFragLen and maxFragLen. If such valid
fragments were specified during the preparation of of
ChicagoData objects, they should be also specified for
quantifyWithinPeaks(). Unvalid restriction fragments are
then omitted from the normalization. Here, we use
minsize = 100 and maxsize = 40000 to specify
that we used minFragLen = 100 and the default
maxFragLen for the generation of ChicagoData
objects.
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] PostChicago_0.99.2 BiocStyle_2.41.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.3.0 bitops_1.0-9
#> [3] gridExtra_2.3 httr2_1.2.2
#> [5] rlang_1.2.0 magrittr_2.0.5
#> [7] otel_0.2.0 matrixStats_1.5.0
#> [9] compiler_4.6.0 RSQLite_3.53.2
#> [11] systemfonts_1.3.2 vctrs_0.7.3
#> [13] stringr_1.6.0 pkgconfig_2.0.3
#> [15] crayon_1.5.3 fastmap_1.2.0
#> [17] backports_1.5.1 dbplyr_2.6.0
#> [19] XVector_0.53.0 labeling_0.4.3
#> [21] Rsamtools_2.29.0 rmarkdown_2.31
#> [23] ragg_1.5.2 purrr_1.2.2
#> [25] bit_4.6.0 xfun_0.59
#> [27] cachem_1.1.0 cigarillo_1.3.0
#> [29] jsonlite_2.0.0 blob_1.3.0
#> [31] DelayedArray_0.39.3 BiocParallel_1.47.0
#> [33] parallel_4.6.0 cluster_2.1.8.2
#> [35] R6_2.6.1 bslib_0.11.0
#> [37] stringi_1.8.7 RColorBrewer_1.1-3
#> [39] rtracklayer_1.73.0 rpart_4.1.27
#> [41] GenomicRanges_1.65.0 jquerylib_0.1.4
#> [43] Seqinfo_1.3.0 SummarizedExperiment_1.43.0
#> [45] knitr_1.51 base64enc_0.1-6
#> [47] IRanges_2.47.2 BiocBaseUtils_1.15.1
#> [49] Matrix_1.7-5 nnet_7.3-20
#> [51] tidyselect_1.2.1 rstudioapi_0.19.0
#> [53] abind_1.4-8 yaml_2.3.12
#> [55] codetools_0.2-20 curl_7.1.0
#> [57] lattice_0.22-9 tibble_3.3.1
#> [59] withr_3.0.3 Biobase_2.73.1
#> [61] S7_0.2.2 evaluate_1.0.5
#> [63] foreign_0.8-91 BiocFileCache_3.3.0
#> [65] Biostrings_2.81.3 pillar_1.11.1
#> [67] BiocManager_1.30.27 filelock_1.0.3
#> [69] MatrixGenerics_1.25.0 Chicago_1.41.0
#> [71] checkmate_2.3.4 stats4_4.6.0
#> [73] generics_0.1.4 RCurl_1.98-1.19
#> [75] S4Vectors_0.51.3 ggplot2_4.0.3
#> [77] scales_1.4.0 glue_1.8.1
#> [79] pheatmap_1.0.13 Hmisc_5.2-6
#> [81] maketools_1.3.2 tools_4.6.0
#> [83] BiocIO_1.23.3 sys_3.4.3
#> [85] data.table_1.18.4 GenomicAlignments_1.49.0
#> [87] buildtools_1.0.0 XML_3.99-0.23
#> [89] grid_4.6.0 Delaporte_8.4.3
#> [91] tidyr_1.3.2 colorspace_2.1-2
#> [93] patchwork_1.3.2 htmlTable_2.5.0
#> [95] restfulr_0.0.17 Formula_1.2-5
#> [97] cli_3.6.6 rappdirs_0.3.4
#> [99] textshaping_1.0.5 S4Arrays_1.13.0
#> [101] dplyr_1.2.1 gtable_0.3.6
#> [103] sass_0.4.10 digest_0.6.39
#> [105] BiocGenerics_0.59.7 SparseArray_1.13.2
#> [107] rjson_0.2.23 htmlwidgets_1.6.4
#> [109] farver_2.1.2 memoise_2.0.1
#> [111] htmltools_0.5.9 lifecycle_1.0.5
#> [113] httr_1.4.8 MASS_7.3-65
#> [115] bit64_4.8.2Cairns, J., Freire-Pritchett, P., Wingett, S.W., Varnai, C., Dimond, A., Plagnol, V., Zerbino, D., Schoenfelder, S., Javierre, B.M., Osborne, C., et al. (2016). CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol 17, 127.Link
Mahara, S., Prussing, S., Smialkovska, V., Krall, S., Holliman, S., Blum, B., Dachtler, V., Borgers, H., Sollier, E., Plass, C., and Feldmann, A. (2024). Transient promoter interactions modulate developmental gene activation. Mol Cell 84, 4486-4502 e4487. Link
Wingett, S., Ewels, P., Furlan-Magaril, M., Nagano, T., Schoenfelder, S., Fraser, P., and Andrews, S. (2015). HiCUP: pipeline for mapping and processing Hi-C data. F1000Res 4, 1310. Link