| Title: | Mariner: Explore the Hi-Cs |
|---|---|
| Description: | Tools for manipulating paired ranges and working with Hi-C data in R. Functionality includes manipulating/merging paired regions, generating paired ranges, extracting/aggregating interactions from `.hic` files, and visualizing the results. Designed for compatibility with plotgardener for visualization. |
| Authors: | Eric Davis [aut, cre] (ORCID: <https://orcid.org/0000-0003-4051-3217>), Sarah Parker [aut] (ORCID: <https://orcid.org/0000-0002-2700-3979>) |
| Maintainer: | Eric Davis <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.13.0 |
| Built: | 2026-05-30 09:40:45 UTC |
| Source: | https://github.com/bioc/mariner |
'mariner' is an R/Bioconductor package for exploring Hi-C data. It enables users to flexibly manipulate, extract, and aggregate chromatin interaction data quickly and efficiently.
Key Features:
Manipulating Paired Ranges - Convert, bin, and shift paired genomic ranges.
Clustering & Merging Interactions - Group nearby interactions and select one as representative.
Extracting & Aggregating Interactions - Pull Hi-C pixels or matrices, then aggregate by files or interactions.
Calculating Loop Enrichment - Determine loop enrichment relative to local background, with selection functions to flexibly select foreground and background.
For more details on the features of 'mariner', read the vignette: 'browseVignettes(package="mariner")'
Maintainer: Eric Davis [email protected] (ORCID)
Authors:
Sarah Parker [email protected] (ORCID)
Aggregation of count matrices is done in blocks to avoid large memory usage. Use 'nBlocks' to control how many blocks the data is split into; one block is read into memory at a time. The size of each block is 'length(interactions(x))/nBlocks'.
    aggHicMatrices(
      x,
      by = NULL,
      FUN = sum,
      nBlocks = 5,
      verbose = TRUE,
      BPPARAM = bpparam(),
      compressionLevel = 0
    )

    ## S4 method for signature 'InteractionArray'
    aggHicMatrices(
      x,
      by = NULL,
      FUN = sum,
      nBlocks = 5,
      verbose = TRUE,
      BPPARAM = bpparam(),
      compressionLevel = 0
    )
x |
InteractionArray object. |
by |
String (length one character vector) describing whether to aggregate by interactions, files, or neither (i.e. NULL as default). |
FUN |
Function to use for aggregating. |
nBlocks |
Number of blocks for block-processing arrays. Default is 5. Increase this for large datasets. To read and process all data at once, set this value to 1. |
verbose |
Boolean (TRUE or FALSE) describing whether to report block-processing progress. |
BPPARAM |
Parallelization params (passed to 'BiocParallel::bplapply()'). Default is the result of 'BiocParallel::bpparam()'. Parallel processing is not available when 'by=interactions'. |
compressionLevel |
Number (length one numeric vector) between 0 (Default) and 9 indicating the compression level used on HDF5 file. |
Since interactions are typically the largest dimension in an InteractionArray, using 'by=interactions' creates an HDF5-backed array to store these large arrays. Currently, parallel processing for HDF5-backed arrays is not supported, regardless of the value of 'BPPARAM'.
Both 'by=NULL' and 'by=files' support parallel processing.
An aggregated 'DelayedArray' object. If 'by=interactions' or 'by=files', a 3-dimensional 'DelayedArray' is returned. If 'by=NULL' (default), a 2-dimensional 'DelayedMatrix' is returned.
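The block-wise idea and the resulting dimensions can be sketched in base R. This is only an illustration on a plain in-memory array, not mariner's implementation: the toy array `arr`, the way indices are split into blocks, and the use of `apply()` are all assumptions made for the example.

```r
## Toy stand-in for count matrices (assumed layout):
## 11 x 11 pixels, 10 interactions, 2 Hi-C files.
set.seed(1)
arr <- array(rpois(11 * 11 * 10 * 2, lambda = 5), dim = c(11, 11, 10, 2))

## Split the interaction indices into nBlocks roughly equal chunks.
nBlocks <- 5
idx <- seq_len(dim(arr)[3])
blocks <- split(idx, cut(idx, breaks = nBlocks, labels = FALSE))

## Analogue of by = NULL: sum over interactions and files,
## touching only one block of interactions at a time.
agg_all <- Reduce(`+`, lapply(blocks, function(i) {
    apply(arr[, , i, , drop = FALSE], c(1, 2), sum)
}))

## Analogue of by = "files": keep the file dimension.
agg_files <- Reduce(`+`, lapply(blocks, function(i) {
    apply(arr[, , i, , drop = FALSE], c(1, 2, 4), sum)
}))

dim(agg_all)    # 11 11
dim(agg_files)  # 11 11 2
```

Because each block contributes a partial sum, only one block needs to be held in memory at a time; this works directly for functions such as `sum` that can be combined across blocks.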
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Read in loops as GInteractions object
    loops <-
        WT_5kbLoops.txt() |>
        setNames("WT") |>
        read.table(header=TRUE) |>
        as_ginteractions(keep.extra.columns=FALSE)

    ## Removes the "chr" prefix for compatibility
    ## with the preprocessed hic files
    GenomeInfoDb::seqlevelsStyle(loops) <- 'ENSEMBL'

    ## Expand pixel ranges with a 5 pixel buffer on either side
    loops <-
        assignToBins(loops, binSize=100e3) |>
        pixelsToMatrices(buffer=5)

    ## Extract 10, 11x11 count matrices from 2 hic files
    iarr <-
        loops[1:10] |>
        pullHicMatrices(binSize=100e3, files=hicFiles)

    ## Aggregate all, by files, or by interactions
    aggHicMatrices(x=iarr)
    aggHicMatrices(x=iarr, by="files")
    aggHicMatrices(x=iarr, by="interactions")
Aggregate the metadata columns of merged pairs
    aggMetadata(x, columns, funs)

    ## S4 method for signature
    ## 'MergedGInteractions,character,character_OR_function_OR_list'
    aggMetadata(x, columns, funs)
x |
MergedGInteractions object. |
columns |
Character vector of columns to aggregate. |
funs |
Character vector of function names, a function, or a list of functions to apply to 'columns'. |
'x' with aggregated metadata columns
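Conceptually, aggregating metadata means applying one or more functions to a column within each cluster of merged pairs. The base R sketch below illustrates that idea only; the `meta` table, its `cluster` column, and the function names `mean` and `sd0` are hypothetical and do not reflect how mariner stores clusters internally.

```r
## Hypothetical per-pair metadata with a cluster label for each pair.
meta <- data.frame(
    cluster    = c(1, 1, 2, 3, 3, 3),
    APScoreAvg = c(2, 4, 5, 1, 2, 6)
)

## Functions to apply per cluster: a built-in and a custom one
## that returns 0 when the standard deviation is undefined.
funs <- list(
    mean = mean,
    sd0  = function(x) ifelse(is.na(sd(x)), 0, sd(x))
)

## One row per cluster, one column per function.
agg <- sapply(funs, function(f) tapply(meta$APScoreAvg, meta$cluster, f))
agg
```

A cluster with a single member (cluster 2 here) has no defined standard deviation, which is why the custom function falls back to 0.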
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    bedpeFiles <- c(
        marinerData::FS_5kbLoops.txt(),
        marinerData::WT_5kbLoops.txt()
    )
    names(bedpeFiles) <- c("FS", "WT")

    ## Read in bedpeFiles as a list of GInteractions
    ## Use only first 1000 rows for fast example
    giList <-
        lapply(bedpeFiles, read.table, header=TRUE, nrows=1000) |>
        lapply(as_ginteractions) |>
        setNames(gsub("^.*extdata/(.{2}).*$", "\\1", bedpeFiles))

    ## Add names describing the source and loop
    giList <- lapply(seq_along(giList), \(i) {
        x <- giList[[i]]
        x$name <- paste0(names(giList)[i], "_loop_", length(x))
        return(x)
    })

    ## Cluster & merge pairs
    x <- mergePairs(x = giList, radius = 5e03)

    ## List loop names
    aggMetadata(x, columns = "name", fun = "list")

    ## Aggregate values
    aggMetadata(x, columns = c("APScoreAvg"), fun = "mean")
    aggMetadata(x, columns = c("APScoreAvg", "avg"), fun = "mean")
    aggMetadata(x, columns = c("APScoreAvg"), fun = c("mean", "median"))

    ## Custom functions
    aggMetadata(x, columns = c("APScoreAvg"), fun = \(x) {
        ifelse(is.na(sd(x)), 0, sd(x))
    })
'as_ginteractions' takes a paired-interaction (i.e. BEDPE) formatted data-frame-like object and converts it to a GInteractions object. For convenience, 'makeGInteractionsFromDataFrame' can be used as an alias.
    as_ginteractions(
      df,
      keep.extra.columns = TRUE,
      starts.in.df.are.0based = FALSE,
      ...
    )

    makeGInteractionsFromDataFrame(
      df,
      keep.extra.columns = TRUE,
      starts.in.df.are.0based = FALSE,
      ...
    )

    ## S4 method for signature
    ## 'DF_OR_df_OR_dt,logical_OR_missing,logical_OR_missing'
    makeGInteractionsFromDataFrame(df, keep.extra.columns, starts.in.df.are.0based)

    ## S4 method for signature
    ## 'DF_OR_df_OR_dt,logical_OR_missing,logical_OR_missing'
    as_ginteractions(df, keep.extra.columns, starts.in.df.are.0based)
df |
A data.table, data.frame, or DataFrame object. Assumes that the first 6 columns are in the format chr1, start1, end1 and chr2, start2, end2, representing each pair of interactions. |
keep.extra.columns |
TRUE (the default) or FALSE. If TRUE, the columns in df that are not used to form the genomic ranges of the returned GInteractions object are returned as metadata columns on the object. Otherwise, they are ignored. If df has a width column, then it's always ignored. |
starts.in.df.are.0based |
TRUE or FALSE (the default). If TRUE, then the start positions of the genomic ranges in df are considered to be 0-based and are converted to 1-based in the returned GInteractions object. This feature is intended to make it more convenient to handle input that contains data obtained from resources using the "0-based start" convention. A notorious example of such a resource is the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). |
... |
Additional arguments. |
GInteractions object
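The effect of 'starts.in.df.are.0based = TRUE' on coordinates can be illustrated with a small base R sketch. The helper `to_one_based()` and the column names `start1`/`start2` are hypothetical, chosen only for this example; the sketch shows the arithmetic (add 1 to each start, leave ends unchanged), not the package's code.

```r
## BEDPE-like table with 0-based starts (hypothetical values).
df <- data.frame(
    chr1 = "chr1", start1 = 9999,  end1 = 20000,
    chr2 = "chr1", start2 = 29999, end2 = 40000
)

## Convert both start columns to 1-based; ends are left unchanged.
to_one_based <- function(df, start_cols = c("start1", "start2")) {
    df[start_cols] <- lapply(df[start_cols], function(s) s + 1)
    df
}

to_one_based(df)
```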
    ## data.frame
    df <- data.frame(chr1 = "chr1", x1 = 10000, x2 = 20000,
                     chr2 = "chr1", y1 = 30000, y2 = 40000)
    makeGInteractionsFromDataFrame(df)

    ## data.frame
    df <- data.frame(chr1 = "chr1", x1 = 10000, x2 = 20000,
                     chr2 = "chr1", y1 = 30000, y2 = 40000)
    as_ginteractions(df)

    ## data.table
    library(data.table)
    df <- data.table::data.table(chr1 = "chr1", x1 = 10000, x2 = 20000,
                                 chr2 = "chr1", y1 = 30000, y2 = 40000)
    as_ginteractions(df)

    ## DataFrame
    library(S4Vectors)
    df <- DataFrame(chr1 = "chr1", x1 = 10000, x2 = 20000,
                    chr2 = "chr1", y1 = 30000, y2 = 40000)
    as_ginteractions(df)

    ## Alias
    df <- data.frame(chr1 = "chr1", x1 = 10000, x2 = 20000,
                     chr2 = "chr1", y1 = 30000, y2 = 40000,
                     pval = 0.05, dist = 10000)
    makeGInteractionsFromDataFrame(df)

    ## Additional metadata
    df <- data.frame(chr1 = "chr1", x1 = 10000, x2 = 20000,
                     chr2 = "chr1", y1 = 30000, y2 = 40000,
                     pval = 0.05, dist = 10000)
    as_ginteractions(df)

    ## Remove additional metadata
    as_ginteractions(df, keep.extra.columns = FALSE)

    ## Add 1 to starts (for 0-based programs)
    as_ginteractions(df, starts.in.df.are.0based = TRUE)
Paired range objects (like 'GInteractions' or BEDPE-formatted 'data.frame'-like objects) can be binned separately for each set of ranges.
    assignToBins(x, binSize, pos1 = "center", pos2 = "center", ...)

    ## S4 method for signature
    ## 'DF_OR_df_OR_dt,
    ##   numeric,
    ##   character_OR_numeric_OR_missing,
    ##   character_OR_numeric_OR_missing'
    assignToBins(x, binSize, pos1, pos2)

    ## S4 method for signature
    ## 'GInteractions,
    ##   numeric,
    ##   character_OR_numeric_OR_missing,
    ##   character_OR_numeric_OR_missing'
    assignToBins(x, binSize, pos1, pos2)
x |
'GInteractions' or 'data.frame'-like object with paired interactions. |
binSize |
Integer (numeric) vector describing the new size of each pair of ranges. Accepts up to 2 values for adjusting each pair. |
pos1, pos2 |
Position within anchors to resize the bin. Can be a character or integer vector of length 1 or 'length(x)' designating the position for each element in 'x'. Character options are "start", "end" and "center". Integers are referenced from the start position for '+' and '*' strands and from the end position for the '-' strand. |
... |
Additional arguments. |
GInteractions-like object binned to 'binSize' by 'pos1' and 'pos2'.
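The arithmetic behind assigning a position to a fixed-size bin can be sketched in base R. The helper `snap_to_bin()` is hypothetical, and the floor-based, 0-anchored convention it uses is an assumption; the exact start/end coordinates mariner reports may follow a different convention.

```r
## Hypothetical helper: snap a position to the bin that contains it,
## assuming bins start at multiples of binSize.
snap_to_bin <- function(pos, binSize) {
    start <- floor(pos / binSize) * binSize
    data.frame(start = start, end = start + binSize)
}

## Anchor start positions from the example interaction, 20-kb bins.
snap_to_bin(pos = 10000, binSize = 20000)  # start 0,     end 20000
snap_to_bin(pos = 30000, binSize = 20000)  # start 20000, end 40000

## A different bin size for each anchor (binSize accepts up to 2 values).
Map(snap_to_bin, pos = c(10000, 30000), binSize = c(5000, 20000))
```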
    ## Construct interactions as data.frame
    df1 <- data.frame(chr1 = "chr1", x1 = 10000, x2 = 20000,
                      chr2 = "chr1", y1 = 30000, y2 = 40000)

    ## Assign each range to 20-kb bins from the start positions
    assignToBins(x = df1,
                 binSize = 20000,
                 pos1 = 'start',
                 pos2 = 'start')

    ## Construct GInteractions
    library(InteractionSet)
    gi1 <-
        data.frame(chr1 = "chr1", x1 = 10000, x2 = 20000,
                   chr2 = "chr1", y1 = 30000, y2 = 40000) |>
        as_ginteractions()

    ## Assign each range to 20-kb bins from the start positions
    assignToBins(x = gi1,
                 binSize = 20000,
                 pos1 = 'start',
                 pos2 = 'start')
Flexibly bin ranges
    binRanges(x, binSize, pos = "center")

    ## S4 method for signature 'GRanges,numeric,character_OR_numeric_OR_missing'
    binRanges(x, binSize, pos = "center")
x |
'GRanges' object |
binSize |
Integer (numeric) describing the new size of each range. |
pos |
Position within range to resize the bin. Can be a character or integer vector of length 1 or 'length(x)' designating the position for each element in 'x'. Character options are "start", "end" and "center". Integers are referenced from the start position for '+' and '*' strands and from the end position for the '-' strand. |
'GRanges' object that has been shifted by 'pos' and assigned to bins of 'binSize'.
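How an integer 'pos' is referenced depending on strand can be sketched as follows. The helper `resolve_pos()` is hypothetical; it follows the description above for integer positions (offset from the start on '+' and '*' strands, from the end on '-'), and treating the character options as plain coordinates is an assumption of this sketch.

```r
## Hypothetical helper: find the coordinate within each range
## that would be used as the reference point for binning.
resolve_pos <- function(start, end, strand, pos = "center") {
    if (is.character(pos)) {
        switch(pos,
            start  = start,
            end    = end,
            center = (start + end) / 2,
            stop("pos must be 'start', 'end', or 'center'")
        )
    } else {
        ## Integer offsets: from the start for '+'/'*', from the end for '-'.
        ifelse(strand == "-", end - pos, start + pos)
    }
}

## Ranges that extend 2000 bp upstream and 200 bp downstream of a
## TSS at 5000 ('+' and '*') or 6000 ('-').
start  <- c(3000, 5801, 3000)
end    <- c(5199, 8000, 5199)
strand <- c("+", "-", "*")

## An offset of 2000 recovers the TSS on every strand.
resolve_pos(start, end, strand, pos = 2000)  # 5000 6000 5000
```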
    library(GenomicRanges)

    ## Create example GRanges
    gr1 <- GRanges(seqnames = "chr1",
                   ranges = IRanges::IRanges(start = rep(5000,3),
                                             end = rep(6000,3)),
                   strand = c('+', '-', '*'))

    gr2 <- gr1 |> promoters(upstream = 2000, downstream = 200)

    ## Binning the results
    binRanges(x = gr1, binSize = 1000, pos = 'start')
    binRanges(x = gr1, binSize = 1000, pos = 'end')
    binRanges(x = gr1, binSize = 1000, pos = 'center')

    ## Bin after shifting back to TSS
    binRanges(x = gr2, binSize = 1000, pos = 2000)
Pulls Hi-C pixels and calculates the enrichment of the selected foreground ('fg') over the selected background ('bg').
    calcLoopEnrichment(
      x,
      files,
      fg = selectCenterPixel(mhDist = 1, buffer = defaultBuffer()),
      bg = selectTopLeft(n = 4, buffer = defaultBuffer()) +
        selectBottomRight(n = 4, buffer = defaultBuffer()),
      FUN = function(fg, bg) median(fg + 1)/median(bg + 1),
      nBlocks = 5,
      verbose = TRUE,
      BPPARAM = bpparam(),
      ...
    )

    ## S4 method for signature 'GInteractions,character'
    calcLoopEnrichment(
      x,
      files,
      fg = selectCenterPixel(mhDist = 1, buffer = defaultBuffer()),
      bg = selectTopLeft(n = 4, buffer = defaultBuffer()) +
        selectBottomRight(n = 4, buffer = defaultBuffer()),
      FUN = function(fg, bg) median(fg + 1)/median(bg + 1),
      nBlocks = 5,
      verbose = TRUE,
      BPPARAM = bpparam(),
      ...
    )

    ## S4 method for signature 'InteractionArray,missing'
    calcLoopEnrichment(
      x,
      files,
      fg = selectCenterPixel(mhDist = 1, buffer = defaultBuffer()),
      bg = selectTopLeft(n = 4, buffer = defaultBuffer()) +
        selectBottomRight(n = 4, buffer = defaultBuffer()),
      FUN = function(fg, bg) median(fg + 1)/median(bg + 1),
      nBlocks = 5,
      verbose = TRUE,
      BPPARAM = bpparam(),
      ...
    )
x |
GInteractions object or an InteractionArray object. |
files |
Character file paths to '.hic' files. Required only if GInteractions object is supplied for x. |
fg |
MatrixSelection object of matrix indices for the foreground. |
bg |
MatrixSelection object of matrix indices for the background. |
FUN |
Function with at least two parameters (i.e., 'fg', 'bg') defining how enrichment should be calculated. Must produce a single value (numeric of length one). The first and second parameters must represent fg and bg, respectively. |
nBlocks |
Number of blocks for block-processing arrays. Default is 5. Increase this for large datasets. To read and process all data at once, set this value to 1. |
verbose |
Boolean (TRUE or FALSE) describing whether to report block-processing progress. |
BPPARAM |
Parallelization params (passed to 'BiocParallel::bplapply()'). Default is the result of 'BiocParallel::bpparam()'. Parallel processing is not available when 'by=interactions'. |
... |
Additional arguments passed to 'pullHicMatrices()'. See '?pullHicMatrices'. |
A DelayedMatrix of enrichment scores where rows are interactions (i.e. loops) and columns are Hi-C files.
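The default 'FUN' shown in the usage, `median(fg + 1)/median(bg + 1)`, can be tried on a plain matrix. The sketch below is an assumption-laden illustration: the toy matrix `m` and the hand-built foreground and background selections (`fg_idx`, `bg_mask`) are hypothetical stand-ins for what MatrixSelection objects would pick on an 11 x 11 count matrix.

```r
## Toy 11 x 11 count matrix with a strong center pixel.
set.seed(42)
m <- matrix(rpois(121, lambda = 4), nrow = 11)
m[6, 6] <- 50

## Hypothetical foreground: center pixel plus its four direct neighbors.
fg_idx <- rbind(c(6, 6), c(5, 6), c(7, 6), c(6, 5), c(6, 7))

## Hypothetical background: 4 x 4 top-left and bottom-right corners.
bg_mask <- matrix(FALSE, nrow = 11, ncol = 11)
bg_mask[1:4, 1:4] <- TRUE
bg_mask[8:11, 8:11] <- TRUE

## Same form as the default FUN: the +1 pseudocount avoids division by zero.
enrich <- function(fg, bg) median(fg + 1) / median(bg + 1)

enrich(fg = m[fg_idx], bg = m[bg_mask])
```

Any replacement for 'FUN' needs the same shape: it takes the foreground values first, the background values second, and returns a single number.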
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Read in loops as GInteractions object
    loops <-
        WT_5kbLoops.txt() |>
        setNames("WT") |>
        read.table(header=TRUE) |>
        as_ginteractions(keep.extra.columns=FALSE)

    ## Removes the "chr" prefix for compatibility
    ## with the preprocessed hic files
    GenomeInfoDb::seqlevelsStyle(loops) <- 'ENSEMBL'

    ## Expand binSize of loops
    loops <- assignToBins(x=loops, binSize=100e3)

    ## Calculate loop enrichment
    calcLoopEnrichment(x=loops[1:10],
                       files=hicFiles)

    ## Customize different foreground/background
    ## with selection functions
    buffer <- 10 # choose pixel radius around center
    fg <- selectCenterPixel(mhDist=seq(0,4), buffer=buffer)
    bg <- selectCorners(n=6, buffer=buffer) +
        selectOuter(n=2, buffer=buffer)

    ## Calculate loop enrichment
    calcLoopEnrichment(x=loops[1:10],
                       files=hicFiles,
                       fg=fg,
                       bg=bg)

    ## Extract count matrices first
    mats <- assignToBins(loops[1:10],100e3) |>
        pixelsToMatrices(buffer=10) |>
        pullHicMatrices(
            files=hicFiles,
            binSize=100e3)

    ## Calculate loop enrichment from count matrices
    calcLoopEnrichment(x = mats)
A GInteractions object containing pixels of interest is resized to the 'from' resolution (if it is not already), then count matrices are extracted for each interaction and Hi-C file using the new 'to' resolution. Count matrices are aggregated by interactions with the supplied 'aggFUN', and a new pixel is selected with the supplied 'selectFUN'. For large datasets, increase 'nBlocks' to allow for smaller blocks of data to be processed in memory.
    changePixelRes(
      x,
      files,
      from,
      to,
      aggFUN = sum,
      selectFUN = "which.max",
      nBlocks = 5,
      verbose = TRUE,
      norm = "KR",
      half = "upper",
      ...
    )

    ## S4 method for signature 'GInteractions,character'
    changePixelRes(
      x,
      files,
      from,
      to,
      aggFUN = sum,
      selectFUN = "which.max",
      nBlocks = 5,
      verbose = TRUE,
      norm = "KR",
      half = "upper",
      ...
    )
x |
GInteractions object. |
files |
Character file paths to '.hic' files. |
from |
Number (length one numeric vector) describing the resolution of 'x'. Data will be binned to this value if it is not already binned. |
to |
Number (length one numeric vector) describing the new resolution for the pixels. |
aggFUN |
Function to use for aggregating across Hi-C files. Must be passable to 'which.max' or 'which.min'. Default is "sum". |
selectFUN |
Function to use for selecting among aggregated interactions. Must be one of "which.max" or "which.min". |
nBlocks |
Number of blocks for block-processing arrays. Default is 5. Increase this for large datasets. To read and process all data at once, set this value to 1. |
verbose |
Boolean (TRUE or FALSE) describing whether to report block-processing progress. Default is TRUE. |
norm |
String (length one character vector) describing the Hi-C normalization to apply. Use 'strawr::readHicNormTypes()' to see accepted values for each file in 'files'. |
half |
String (character vector of length one) indicating whether to keep values for the upper triangular ('half="upper"') where 'start1 < start2', lower triangular ('half="lower"') where 'start1 > start2', or both ('half="both"'). The default for this function, per the usage above, is 'half="upper"'. When 'half="upper"' all lower triangular values are 'NA'. When 'half="lower"' all upper triangular values are 'NA'. When 'half="both"' there are no 'NA' values. For interchromosomal interactions there is no inherent directionality between chromosomes, so data is returned regardless of specified order. |
... |
Additional arguments passed to 'pullHicMatrices()'. See '?pullHicMatrices'. |
A GInteractions object with the updated pixel interactions, along with a column with the aggregated max/min value for that pixel.
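The aggregate-then-select step can be sketched in base R for a single coarse pixel. This is an illustration under assumptions, not the package's code: the toy array `sub` stands in for the 5 x 5 grid of 500-kb sub-pixels inside one 2.5-Mb pixel, across 2 Hi-C files, and the planted high-count pixel is made up for the example.

```r
## Toy counts: 5 x 5 sub-pixels (2.5 Mb / 500 kb) for 2 Hi-C files.
set.seed(7)
sub <- array(rpois(5 * 5 * 2, lambda = 3), dim = c(5, 5, 2))
sub[2, 4, ] <- c(40, 35)  # plant one clearly enriched sub-pixel

## aggFUN analogue: aggregate across files (here, sum).
agg <- apply(sub, c(1, 2), sum)

## selectFUN analogue: pick the sub-pixel with the largest aggregate.
best <- arrayInd(which.max(agg), dim(agg))
best       # row 2, column 4
agg[best]  # 75
```

The selected row and column offsets would then be translated back into genomic coordinates at the 'to' resolution, and the aggregated value corresponds to the extra column described in the return value.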
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Read in loops as GInteractions object
    loops <-
        WT_5kbLoops.txt() |>
        setNames("WT") |>
        read.table(header=TRUE) |>
        as_ginteractions(keep.extra.columns=FALSE)

    ## Removes the "chr" prefix for compatibility
    ## with the preprocessed hic files
    GenomeInfoDb::seqlevelsStyle(loops) <- 'ENSEMBL'

    ## Rebin loops to 2.5e6 resolution
    loops <- assignToBins(x=loops, binSize=2.5e06)

    ## Change pixel resolution from 2.5e6 to 500e3
    changePixelRes(x=loops[1:5],
                   files=hicFiles,
                   from=2.5e6,
                   to=500e3)
Returns the clustered pairs associated with each range in the 'MergedGInteractions' object. Order always follows the indices of the 'MergedGInteractions' object.
    clusters(x, ...)

    ## S4 method for signature 'MergedGInteractions'
    clusters(x)
x |
MergedGInteractions object. |
... |
Additional arguments. |
A list of data.tables corresponding to each pair in 'x'.
    ## Load required packages
    library(data.table, include.only="fread")

    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Reference BEDPE files (loops called with SIP)
    bedpeFiles <- c(
        marinerData::FS_5kbLoops.txt(),
        marinerData::WT_5kbLoops.txt()
    )
    names(bedpeFiles) <- c("FS", "WT")

    ## Read in bedpeFiles as a list of GInteractions
    ## Use only first 1000 rows for fast example
    giList <-
        lapply(bedpeFiles, fread, nrows = 1000) |>
        lapply(as_ginteractions)

    ## Cluster & merge pairs
    x <- mergePairs(x = giList,
                    radius = 10e03,
                    column = "APScoreAvg")

    ## Access pair clusters
    clusters(x[1:3])
    clusters(x[3:1])
    clusters(x[c(3, 1, 2)])
    clusters(x) |> length()
Access count matrices from InteractionArray or InteractionMatrix
Replace method for counts
    ## S4 method for signature 'InteractionArray'
    counts(object, showDimnames = FALSE)

    ## S4 method for signature 'InteractionMatrix'
    counts(object)

    ## S4 replacement method for signature 'InteractionMatrix'
    counts(object) <- value
object |
InteractionArray or InteractionMatrix object. |
showDimnames |
Logical vector of length one indicating whether to show the dimension names of count matrices (default FALSE). Only applicable for InteractionArray objects. |
value |
Value for replacement |
For InteractionArray, a 4-dimensional DelayedArray of Hi-C submatrices is returned with the following dimensions: rows of count matrix, columns of count matrix, Interactions in 'object', Hi-C 'files'.
For InteractionMatrix, a 2-dimensional DelayedArray is returned with rows representing interactions in 'object' and columns for each Hi-C file in 'files'.
For InteractionMatrix, the replacement method replaces the counts assay with the matrix-like object supplied in 'value'.
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ######################################
    ## Accessing Hi-C count submatrices ##
    ######################################

    ## Create example interactions
    x <- read.table(text="
            9 14435000 14490000 9 14740000 14795000
            9 89540000 89595000 9 89785000 89840000
            9 23700000 23755000 9 23760000 23815000")
    x <- as_ginteractions(x)

    ## Extract 3, 11x11 count matrices from 2 hic files
    iarr <- pullHicMatrices(x, hicFiles, 5e03)

    ## Access count matrices
    counts(iarr)
    counts(iarr, FALSE)

    #################################
    ## Accessing Hi-C count matrix ##
    #################################

    ## Create example interactions
    x <- read.table(text="
            9 14000000 14500000 9 14500000 15000000
            9 89500000 90000000 9 89500000 90000000
            9 23500000 24000000 9 23500000 24000000")
    x <- as_ginteractions(x)

    ## Extract 3 pixels from 2 hic files
    imat <- pullHicPixels(x, hicFiles, 500e03)

    ## Access count matrix
    counts(imat)

    #################################
    ## Replacing Hi-C count matrix ##
    #################################

    ## Realize as in-memory matrix
    counts(imat) <- as.matrix(counts(imat))
    counts(imat)
    imat
Return default buffer. If an InteractionArray is supplied, the dimensions of its count matrices are used to set the buffer.
    defaultBuffer(x)
| Argument | Description |
|---|---|
| x | InteractionArray |
Returns 5 (the default), the buffer of the provided InteractionArray, or an error if the count matrices of the InteractionArray are not odd and square (no buffer).
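The relationship between a buffer and the matrix dimensions can be sketched in base R. This is not mariner's implementation: `bufferFromDims` is a hypothetical helper, and it assumes that a buffer of `b` pixels on every side of a center pixel corresponds to a `(2b + 1) x (2b + 1)` count matrix.

```r
## Hypothetical helper (not part of mariner): recover the buffer
## from the row/column dimensions of a count matrix, assuming
## dim = buffer * 2 + 1 along each axis.
bufferFromDims <- function(dims) {
    stopifnot(length(dims) == 2L)
    isSquare <- dims[1] == dims[2]
    isOdd <- dims[1] %% 2 == 1
    if (!isSquare || !isOdd) {
        stop("Matrix must be odd and square to have a buffer.")
    }
    (dims[1] - 1) / 2
}

bufferFromDims(c(11L, 11L))  # 11 = 5 * 2 + 1, so the buffer is 5
```

An even or non-square input such as `c(10L, 10L)` raises an error in this sketch, mirroring the "no buffer" case described above.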
Overlap methods for InteractionJaggedArray
    ## S4 method for signature 'InteractionJaggedArray,InteractionJaggedArray'
    findOverlaps(
      query, subject, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      select = c("all", "first", "last", "arbitrary"),
      ignore.strand = TRUE, ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,Vector'
    findOverlaps(
      query, subject, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      select = c("all", "first", "last", "arbitrary"),
      ignore.strand = TRUE, ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,missing'
    findOverlaps(
      query, subject, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      select = c("all", "first", "last", "arbitrary"),
      ignore.strand = TRUE, ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,InteractionJaggedArray'
    countOverlaps(
      query, subject, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      select = c("all", "first", "last", "arbitrary"),
      ignore.strand = TRUE, ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,Vector'
    countOverlaps(
      query, subject, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      select = c("all", "first", "last", "arbitrary"),
      ignore.strand = TRUE, ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,missing'
    countOverlaps(
      query, subject, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      select = c("all", "first", "last", "arbitrary"),
      ignore.strand = TRUE, ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,InteractionJaggedArray'
    overlapsAny(
      query, subject, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,Vector'
    overlapsAny(
      query, subject, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,missing'
    overlapsAny(
      query, subject, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,InteractionJaggedArray'
    subsetByOverlaps(
      x, ranges, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      invert = FALSE, ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,Vector'
    subsetByOverlaps(
      x, ranges, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      invert = FALSE, ..., use.region = "both"
    )

    ## S4 method for signature 'InteractionJaggedArray,missing'
    subsetByOverlaps(
      x, ranges, maxgap = -1L, minoverlap = 0L,
      type = c("any", "start", "end", "within", "equal"),
      invert = FALSE, ..., use.region = "both"
    )
| Argument | Description |
|---|---|
| query, subject, x, ranges | An InteractionJaggedArray, Vector, GInteractions or InteractionSet object, depending on the specified method. At least one of these must be an InteractionJaggedArray object. 'subject' can be missing if 'query' is an InteractionJaggedArray object. |
| maxgap, minoverlap, type, select | See ?'findOverlaps' in the GenomicRanges package. |
| ignore.strand | See ?'findOverlaps' in the InteractionSet package for more information. |
| ... | See ?'findOverlaps' in the InteractionSet package for more information. |
| use.region | See ?'findOverlaps' in the InteractionSet package for more information. |
| invert | Boolean (TRUE/FALSE) indicating whether to invert the selection. Default is FALSE. |
'findOverlaps' returns a Hits object. 'countOverlaps' returns an integer vector of overlaps for each interaction in 'query'. 'overlapsAny' returns a logical vector of overlaps for each interaction in 'query'. 'subsetByOverlaps' returns overlapping interactions as an InteractionJaggedArray.
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Create test interactions
    gi <- read.table(text="
                1 51000000 51300000 1 51000000 51500000
                2 52000000 52300000 3 52000000 52500000
                1 150000000 150500000 1 150000000 150300000
                2 52000000 52300000 2 52000000 52800000") |>
        as_ginteractions()

    ## InteractionJaggedArray object
    iarr <- pullHicMatrices(gi, hicFiles, 100e03, half="both")

    ## Shift first two ranges out of range
    gi2 <- c(assignToBins(gi[1:2], binSize=100e3, pos1=-200e3), gi[3:4])

    ## Find overlaps
    findOverlaps(iarr, gi2)
    countOverlaps(iarr, gi2)
    countOverlaps(iarr, gi2, maxgap=100e3)
    overlapsAny(iarr, gi2)
    subsetByOverlaps(iarr, gi2)
    subsetByOverlaps(iarr, gi2, invert=TRUE)
Read in array data in blocks, apply function, and write back to an HDF5 file.
    hdf5BlockApply(x, FUN, sink, grid, sink_grid, verbose = TRUE)

    ## S4 method for signature 'DelayedArray'
    hdf5BlockApply(x, FUN, sink, grid, sink_grid, verbose = TRUE)
| Argument | Description |
|---|---|
| x | DelayedArray object. |
| FUN | Function that takes one argument 'block' and processes it. |
| sink | HDF5RealizationSink object. |
| grid | ArrayGrid over array 'x'. |
| sink_grid | ArrayGrid over 'sink'. |
| verbose | Logical indicating whether block processing progress should be displayed. |
Implements an HDF5-backed option for block processing on DelayedArray objects.
An HDF5Array object.
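The read-apply-write pattern behind block processing can be sketched in base R. This is only an illustration under stated assumptions: it uses an in-memory array as a stand-in for the HDF5 sink, splits the third dimension into blocks by hand, and does not call `hdf5BlockApply()` or any DelayedArray function.

```r
## Base-R sketch of block processing (assumption: this mirrors the
## read-apply-write pattern only; no HDF5 file is involved).
dims <- c(3L, 3L, 10L, 2L)
a <- array(seq_len(prod(dims)), dim = dims)

## In-memory stand-in for the sink: one 3 x 3 matrix per interaction
sink <- array(NA_integer_, dim = dims[1:3])

## Split the 3rd dimension (interactions) into blocks of up to 4
blocks <- split(seq_len(dims[3]), ceiling(seq_len(dims[3]) / 4))

## Sum over the 4th dimension (files) within each block
FUN <- function(block) apply(block, c(1, 2, 3), sum)

for (idx in blocks) {
    block <- a[, , idx, , drop = FALSE]   # read one block
    sink[, , idx] <- FUN(block)           # apply and write back
}

## Compare with processing the whole array at once
isTRUE(all.equal(sink, apply(a, c(1, 2, 3), sum)))
```

Only one block is held in `block` at a time, which is the point of the approach: peak memory depends on the block size rather than on the full array.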
    ###################################################
    ## This function is intended for advanced users.
    ## To learn more about using DelayedArray
    ## or HDF5-backed objects, see ?DelayedArray or
    ## ?HDF5Array
    ###################################################

    library(DelayedArray)
    library(HDF5Array)
    library(rhdf5)

    ## Create example array that is longer in the
    ## 3rd dimension (representing interactions)
    dims <- c(11L, 11L, 100L, 2L)
    a <- array(data=seq(1, prod(dims)), dim=dims)
    a <- DelayedArray(a)

    ## Define spacings, breaking up the longest dim
    ## Here we are processing in blocks of 10
    spacings <- dim(a)
    spacings[3] <- ceiling(spacings[3]/10)

    ## Define storage dimensions (all except those
    ## over which the function is being applied)
    storageDims <- dims[c(1,2,3)]

    ## Define chunk dimensions for writing to HDF5
    chunkDims <- storageDims
    chunkDims[3] <- spacings[3]

    ## Create grid for applying the data (grid)
    ## and grid for writing to the sink (sink_grid)
    grid <- RegularArrayGrid(dims, spacings)
    sink_grid <- RegularArrayGrid(storageDims, chunkDims)

    ## Create HDF5 file for writing
    h5 <- tempfile(fileext = ".h5")
    h5createFile(h5)

    ## Define compression for HDF5
    compressionLevel <- 0

    ## Create HDF5-backed realization sink
    sink <- HDF5RealizationSink(filepath=h5,
                                name="counts",
                                type="integer",
                                dim=storageDims,
                                chunkdim=chunkDims,
                                level=compressionLevel)

    ## Wrap function that operates on each block
    ## this can be anything, here it is sum
    FUN <- \(block) apply(block, c(1,2,3), sum)

    ## Read, apply, and write to HDF5
    ans <- hdf5BlockApply(x=a,
                          FUN=FUN,
                          sink=sink,
                          grid=grid,
                          sink_grid=sink_grid,
                          verbose=TRUE)
    ans
The 'InteractionArray' class extends 'InteractionSet' to provide an interface for accessing submatrices pulled from Hi-C data.
    InteractionArray(assays, interactions, ...)

    ## S4 method for signature 'ANY,GInteractions'
    InteractionArray(assays, interactions, ...)

    ## S4 method for signature 'missing,missing'
    InteractionArray(assays, interactions, ...)

    ## S4 method for signature 'InteractionArray'
    show(object)

    ## S4 method for signature 'InteractionArray'
    rbind(..., deparse.level = 1)

    ## S4 method for signature 'InteractionArray'
    cbind(..., deparse.level = 1)
| Argument | Description |
|---|---|
| assays, interactions | See |
| ... | InteractionArray objects to be combined column-wise. All objects must be the same class. |
| object | InteractionArray object. |
| deparse.level | An integer scalar; see '?base::cbind' for a description of this argument. |
This class is constructed with the 'pullHicMatrices()' function when all paired ranges have equal dimensions.
An InteractionArray (see description)
[InteractionSet::InteractionSet]
    InteractionArray()
The 'InteractionJaggedArray' class creates a container for storing interaction data alongside irregular arrays. This allows the storage of matrices with different dimensions on-disk using HDF5.
Subset an InteractionJaggedArray by its interactions ([i,]) or its Hi-C files ([,j]).
    ## S4 method for signature 'InteractionJaggedArray'
    show(object)

    ## S4 method for signature 'InteractionJaggedArray'
    dim(x)

    ## S4 method for signature 'InteractionJaggedArray'
    interactions(x)

    ## S4 method for signature 'InteractionJaggedArray'
    metadata(x)

    ## S4 method for signature 'InteractionJaggedArray'
    colData(x)

    ## S4 method for signature 'InteractionJaggedArray'
    counts(object)

    ## S4 method for signature 'InteractionJaggedArray'
    path(object)

    ## S4 method for signature 'InteractionJaggedArray'
    length(x)

    ## S4 method for signature 'InteractionJaggedArray,ANY,ANY,ANY'
    x[i, j]
| Argument | Description |
|---|---|
| object | InteractionJaggedArray object. |
| x | An InteractionJaggedArray object. |
| i | Numeric vector indicating the indices of interactions to extract. |
| j | Numeric vector indicating the indices of files to extract. |
The object returned will be an InteractionJaggedArray if the submatrices have different dimensions. However, the returned object will automatically be coerced into an InteractionArray if possible (i.e., when the row and column dimensions of all submatrices are the same).
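The coercion rule described above can be sketched with plain lists and arrays. This is a base-R illustration and not mariner's code: `simplifyIfRegular` is a hypothetical helper, with a list of matrices standing in for the jagged container and a 3D array standing in for the regular one.

```r
## Hypothetical helper (not mariner's code): simplify a list of
## matrices to a 3D array when all share the same dimensions;
## otherwise keep the jagged list.
simplifyIfRegular <- function(mats) {
    dimsList <- lapply(mats, dim)
    sameDims <- length(unique(dimsList)) == 1L
    if (!sameDims) return(mats)
    array(unlist(mats), dim = c(dimsList[[1]], length(mats)))
}

jagged <- list(matrix(0, 3, 5), matrix(0, 3, 3), matrix(0, 5, 8))

## The three matrices differ in size, so the list is returned unchanged
class(simplifyIfRegular(jagged))

## A subset whose matrices have equal dimensions becomes an array
dim(simplifyIfRegular(jagged[c(2, 2)]))
```

The same idea applies when subsetting: a selection that happens to contain only equally sized submatrices no longer needs the jagged representation.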
'InteractionJaggedArray()' creates an InteractionJaggedArray object.
'dim()' returns a list of the dimensions of the interactions, files, and count matrices.
'interactions()' returns the interactions.
'metadata()' returns the metadata.
'colData()' returns the column data.
'counts()' returns the JaggedArray object containing count matrix information.
'path()' returns a character vector with the path to the HDF5 file with the JaggedArray data.
'length()' returns an integer with the number of interactions in an InteractionJaggedArray object.
Subsetting returns an InteractionJaggedArray or InteractionArray object (see Details).
| Slot | Description |
|---|---|
| interactions | A GInteractions object. |
| colData | Column data describing Hi-C files. |
| counts | A JaggedArray object with data. |
| metadata | List of metadata describing the object. |
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Create test interactions
    gi <- read.table(text="
                1 51000000 51300000 1 51000000 51500000
                2 52000000 52300000 3 52000000 52500000
                1 150000000 150500000 1 150000000 150300000
                2 52000000 52300000 2 52000000 52800000") |>
        as_ginteractions()

    ## InteractionJaggedArray object
    iarr <- pullHicMatrices(gi, hicFiles, 100e03, half="both")
    iarr

    ## Show dimensions
    dim(iarr)

    ## Access interactions
    interactions(iarr)

    ## Access metadata
    metadata(iarr)

    ## Access colData
    colData(iarr)

    ## Access count matrices
    counts(iarr)

    ## Access path to HDF5 data
    path(iarr)

    ## length
    length(iarr)

    ## Subsetting
    iarr[1:3,1]
The 'InteractionMatrix' class extends the 'InteractionSet' to provide an interface for accessing the count matrix pulled from Hi-C data.
    InteractionMatrix(assays, interactions, ...)

    ## S4 method for signature 'ANY,GInteractions'
    InteractionMatrix(assays, interactions, ...)

    ## S4 method for signature 'missing,missing'
    InteractionMatrix(assays, interactions, ...)

    ## S4 method for signature 'InteractionMatrix'
    show(object)

    ## S4 method for signature 'InteractionMatrix'
    rbind(..., deparse.level = 1)

    ## S4 method for signature 'InteractionMatrix'
    cbind(..., deparse.level = 1)
| Argument | Description |
|---|---|
| assays, interactions | See |
| ... | InteractionMatrix objects to be combined column-wise. All objects must be the same class. |
| object | InteractionMatrix object. |
| deparse.level | An integer scalar; see '?base::cbind' for a description of this argument. |
This class is constructed with the 'pullHicPixels()' function when all paired ranges define a single pixel.
An InteractionMatrix (see description)
[InteractionSet::InteractionSet]
    InteractionMatrix()
The 'JaggedArray' class creates a container for storing irregular or jagged array data. This allows the storage of matrices with different dimensions on-disk using HDF5.
Subset a JaggedArray by its interactions ([i,]) or its Hi-C files ([,j]).
'as.list' reads the on-disk data and returns it as an in-memory list of matrices.
    ## S4 method for signature 'JaggedArray'
    show(object)

    ## S4 method for signature 'JaggedArray,ANY,ANY,ANY'
    x[i, j, ..., drop = TRUE]

    ## S4 method for signature 'JaggedArray'
    as.list(x)

    ## S4 method for signature 'JaggedArray'
    path(object)

    ## S4 method for signature 'JaggedArray'
    dim(x)
| Argument | Description |
|---|---|
| object | JaggedArray object. |
| x | JaggedArray object. |
| i | Numeric vector indicating the indices of interactions to extract. |
| j | Numeric vector indicating the indices of files to extract. |
| ... | Additional indices for subsetting multidimensional arrays. |
| drop | Not accepted for JaggedArray objects. |
NOTE: This class is designed specifically for holding a 4-dimensional JaggedArray <n x m x i x j> where n x m are rows and cols of count matrices, i is interactions, and j is Hi-C files.
The object returned will be a JaggedArray if the submatrices contain different dimensions. However, the returned object will automatically be coerced into a DelayedArray if possible (i.e. the dimensions of the rows and columns are the same.)
The JaggedArray data is still stored on-disk in an HDF5 file until it is coerced into a DelayedArray or realized as a list of matrices.
'JaggedArray()' creates a JaggedArray object.
Subsetting returns a JaggedArray or DelayedArray object (see Details).
'as.list()' returns a list of matrices.
'path()' returns a character vector with the path to the HDF5 file with the JaggedArray data.
'dim()' returns a list of dimensions of the JaggedArray of rows, cols, interactions and files.
| Slot | Description |
|---|---|
| h5File | Path to the file used for creating and storing data as an HDF5 file. |
| dim | Dimensions describing the number of matrices contained. dim[1] is the number of interactions, dim[2] is the number of files. |
| subList | A list of length 2 where the first position refers to interactions and the second refers to files. This list records subsetting operations, which are applied later when accessing data stored in the HDF5 file. |
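The deferred subsetting recorded in 'subList' can be sketched in base R. Everything here is hypothetical and for illustration only: `stored` is an in-memory stand-in for the HDF5 file, and `subsetLazy` and `realize` are made-up helpers, not JaggedArray methods.

```r
## Hypothetical stand-in for on-disk data: 4 interactions x 2 files,
## stored as a list-matrix of count matrices with varying dimensions.
stored <- matrix(
    list(matrix(1, 3, 5), matrix(2, 3, 3), matrix(3, 5, 8), matrix(4, 3, 8),
         matrix(5, 3, 5), matrix(6, 3, 3), matrix(7, 5, 8), matrix(8, 3, 8)),
    nrow = 4, ncol = 2
)

## The "object" only records which interactions/files are selected
ja <- list(dim = dim(stored), subList = list(seq_len(4), seq_len(2)))

## Subsetting updates the recorded indices without touching the data
subsetLazy <- function(ja, i = seq_along(ja$subList[[1]]),
                       j = seq_along(ja$subList[[2]])) {
    ja$subList <- list(ja$subList[[1]][i], ja$subList[[2]][j])
    ja$dim <- lengths(ja$subList)
    ja
}

## Data are only read when realized
realize <- function(ja) {
    stored[ja$subList[[1]], ja$subList[[2]], drop = FALSE]
}

## Two chained subsets resolve to interaction 3, file 2
sub <- subsetLazy(subsetLazy(ja, i = 2:4), i = 2, j = 2)
sub$subList
dim(realize(sub)[[1, 1]])
```

Because each subset is expressed relative to the previous selection, chained subsetting composes into a single pair of index vectors that is applied once, when the data are finally read.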
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Create test interactions
    gi <- read.table(text="
                1 51000000 51300000 1 51000000 51500000
                2 52000000 52300000 3 52000000 52500000
                1 150000000 150500000 1 150000000 150300000
                2 52000000 52300000 2 52000000 52800000") |>
        as_ginteractions()

    ## InteractionJaggedArray object
    iarr <- pullHicMatrices(gi, hicFiles, 100e03, half="both")
    arr <- counts(iarr)
    arr

    ## Subsetting
    arr[,,1,] # DelayedArray
    arr[,,,1] # JaggedArray

    ## Realize as list
    as.list(arr)

    ## Find the data path
    path(arr)

    ## Get the dimensions
    dim(arr)
Creating random GRanges & GInteractions
    makeRandomGRanges(seqinfo, n = 100, ...)

    makeRandomGInteractions(seqinfo, n = 100, interchromosomal = TRUE, ...)

    ## S4 method for signature 'Seqinfo'
    makeRandomGRanges(seqinfo, n, .rows = NULL)

    ## S4 method for signature 'Seqinfo'
    makeRandomGInteractions(seqinfo, n, interchromosomal)
| Argument | Description |
|---|---|
| seqinfo | A Seqinfo object containing the chromosome names, lengths, and genome build. |
| n | Integer describing the number of random sequences to generate. |
| ... | Additional arguments. |
| interchromosomal | Boolean (TRUE/FALSE) indicating whether interchromosomal interactions should be allowed. Default is TRUE. |
| .rows | (internal use only) vector of row positions to sample from seqinfo. |
A GRanges or GInteractions object with ranges selected randomly with replacement on the provided seqinfo.
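Sampling ranges with replacement can be sketched in base R. This is a simplified illustration, not how `makeRandomGRanges()` works internally: the chromosome sizes, the fixed `width`, and the uniform choice of chromosome and start position are all assumptions made for the example.

```r
## Hypothetical chromosome sizes standing in for a Seqinfo object
seqlengths <- c(chr1 = 200e6, chr2 = 150e6)

set.seed(123)
n <- 100
width <- 5e3  # assumed fixed range width for this sketch

## Sample chromosomes with replacement, then a start position that
## keeps each range within the bounds of its chromosome
chrom <- sample(names(seqlengths), n, replace = TRUE)
start <- floor(runif(n, min = 1, max = seqlengths[chrom] - width + 1))
ranges <- data.frame(seqnames = chrom,
                     start = start,
                     end = start + width - 1)

head(ranges)
```

Setting the seed before sampling, as in the package examples, makes the random ranges reproducible.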
    ## Define Seqinfo containing chromosome info
    if (require(TxDb.Hsapiens.UCSC.hg38.knownGene)) {
        txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
        si <- seqinfo(txdb)
        si <- GenomeInfoDb::keepStandardChromosomes(si)
    } else {
        si <- Seqinfo(
            seqnames=c("chr1", "chr2"),
            seqlengths=rep(200e6, 2),
            genome="hg38"
        )
    }

    ## Make some GRanges
    set.seed(123)
    makeRandomGRanges(si, 100)

    ## Make some GInteractions
    set.seed(123)
    makeRandomGInteractions(si, n=100)

    ## Make some GInteractions only on same chromosome
    set.seed(123)
    makeRandomGInteractions(si, n=100, interchromosomal=FALSE)

    ## Use specific binSizes
    n <- 100
    binOptions <- seq(5e3, 200e3, by=5e3)
    si <- Seqinfo(seqnames="chr1", seqlengths=200e6, genome="hg38")
    set.seed(123)
    bins <- sample(binOptions, n, replace=TRUE)
    makeRandomGInteractions(si, n) |>
        resize(bins) |>
        trim()
An object containing the selected indices of a matrix.
A MatrixSelection object (see description)
| Slot | Description |
|---|---|
| x | Vector of selected indices from a matrix of 'dim = buffer*2+1'. |
| buffer | Integer indicating the buffer size, or number of pixels around a matrix. |
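How selected indices relate to the buffer can be sketched in base R. The sketch assumes that the indices are column-major linear indices into a square matrix with `buffer * 2 + 1` rows and columns, which is how base R indexes matrices; it does not call any mariner selection function.

```r
## Base-R sketch (assumption: selected indices are column-major
## linear indices into a square matrix of dim = buffer * 2 + 1).
buffer <- 5
n <- buffer * 2 + 1
m <- matrix(seq_len(n * n), nrow = n, ncol = n)

## The center pixel sits at row buffer + 1, column buffer + 1
center <- m[buffer + 1, buffer + 1]
center

## Convert the linear index back to row/column positions
arrayInd(center, dim(m))
```

For a buffer of 5 the matrix is 11 x 11, and the center pixel has linear index 61 (row 6, column 6).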
    selectCenterPixel(0, 5)
The 'MergedGInteractions' class extends the 'GInteractions' to contain additional information about the pairs being merged.
The 'MergedGInteractions' class uses a delegate object during initialization to assign its 'GInteractions' slots. In addition to containing information from all pairs, it also behaves as a 'GInteractions' object. 'mergePairs()' builds this object.
A MergedGInteractions object (see description)
| Slot | Description |
|---|---|
| delegate | A 'GInteractions' object used to initialize 'GInteractions'-specific slots. This is the mergedPairs set of interactions. |
| ids | An integer vector of ids linking indices in the 'delegate' slot to all pairs ('allPairs' slot). These indices are parallel to 'delegate'. |
| allPairs | A 'data.table' containing all input pairs combined. Also contains all metadata for each pair and 1) the source of the file, 2) an id, 3) which chromosome pair it belongs to (i.e. 'grp'), and 4) the assigned cluster from 'dbscan' (i.e. 'clst'). |
| selectionMethod | Character describing which method was used to select the final pair from the cluster of merged pairs. |
[InteractionSet::GInteractions]
    ## Load required packages
    library(data.table, include.only="fread")

    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Reference BEDPE files (loops called with SIP)
    bedpeFiles <- c(
        marinerData::FS_5kbLoops.txt(),
        marinerData::WT_5kbLoops.txt()
    )
    names(bedpeFiles) <- c("FS", "WT")

    ## Read in bedpeFiles as a list of GInteractions
    ## Use only first 1000 rows for fast example
    giList <-
        lapply(bedpeFiles, fread, nrows=1000) |>
        lapply(as_ginteractions)

    ## Cluster & merge pairs
    x <- mergePairs(x = giList,
                    radius = 10e03,
                    column = "APScoreAvg")

    class(x)
Sets of paired range objects (i.e., 'GInteractions') are first clustered by genomic distance with 'dbscan', then a representative interaction is selected for each cluster.
    mergePairs(
      x,
      radius,
      method = "manhattan",
      column = NULL,
      selectMax = TRUE,
      pos = "center"
    )

    ## S4 method for signature 'list_OR_SimpleList_OR_GInteractions,numeric'
    mergePairs(
      x,
      radius,
      method = "manhattan",
      column = NULL,
      selectMax = TRUE,
      pos = "center"
    )
| Argument | Description |
|---|---|
| x | List of 'GInteractions' or 'data.frame'-like objects. |
| radius | Numeric describing the distance in base pairs used to define a cluster of pairs. |
| method | Character describing the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given. Default is "manhattan". |
| column | Character denoting the column to be used to select among clustered interactions. |
| selectMax | Logical. TRUE (default) uses 'which.max()' to select the interaction pair. FALSE uses 'which.min()'. Only applicable when 'column' is specified. |
| pos | Positions used for clustering pairs. Must be one of "start", "end" or "center". Default is "center". |
Interactions are clustered into groups using the provided base pair 'radius', and distance 'method' with 'dbscan()'. Representative interactions are selected for each group by one of two methods. If 'column' and 'selectMax' arguments are provided, the representative interaction with the maximum (or minimum) value in 'column' is returned for each cluster. If these parameters are missing, new ranges for each pair are returned by calculating the median of modes for each cluster.
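The column-based selection step can be sketched in base R. This covers only the choice of a representative per cluster, with cluster labels given up front; it does not run 'dbscan', and `selectRepresentative`, `pairs`, `clst`, and `score` are hypothetical names used for illustration rather than mariner internals.

```r
## Hypothetical clustered pairs: 'clst' stands in for cluster
## labels such as those assigned by dbscan; 'score' stands in for
## the metadata column named by the 'column' argument.
pairs <- data.frame(
    id    = 1:6,
    clst  = c(1, 1, 1, 2, 2, 3),
    score = c(0.2, 0.9, 0.5, 0.7, 0.1, 0.4)
)

## Pick one representative row per cluster with which.max()
## (selectMax = TRUE) or which.min() (selectMax = FALSE)
selectRepresentative <- function(df, column, selectMax = TRUE) {
    pick <- if (selectMax) which.max else which.min
    rows <- lapply(split(df, df$clst), function(g) g[pick(g[[column]]), ])
    do.call(rbind, rows)
}

selectRepresentative(pairs, "score")$id                     # ids 2, 4, 6
selectRepresentative(pairs, "score", selectMax = FALSE)$id  # ids 1, 5, 6
```

A cluster with a single member, such as cluster 3 here, is returned as-is under either setting.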
Returns a 'MergedGInteractions' object.
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    bedpeFiles <- c(
        marinerData::FS_5kbLoops.txt(),
        marinerData::WT_5kbLoops.txt()
    )
    names(bedpeFiles) <- c("FS", "WT")

    ## Read in bedpeFiles as a list of GInteractions
    ## Use only first 1000 rows for fast example
    giList <-
        lapply(bedpeFiles, read.table, header=TRUE, nrows=1000) |>
        lapply(as_ginteractions)

    ## Cluster & merge pairs
    x <- mergePairs(x = giList,
                    radius = 10e03,
                    column = "APScoreAvg")
    x
Returns the file path describing where the on-disk HDF5 data associated with the InteractionMatrix object is stored.
This method circumvents the 'assays<-' and 'path<-' methods for updating the HDF5 path because they are not accessible when the file path is broken.
    ## S4 method for signature 'InteractionMatrix'
    path(object)

    ## S4 replacement method for signature 'InteractionMatrix'
    path(object) <- value
object |
InteractionMatrix object |
value |
String (length-one character vector) to use for path replacement. |
If the file no longer exists, the path is returned along with a warning.
This allows the file path to be updated even if the original linked data no longer exists.
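The "return the path but warn if the file is gone" behaviour can be illustrated with a small base R helper. The function 'check_path()' and the file name are hypothetical and exist only for this sketch; they are not part of the package.

```r
## Hypothetical helper: return the path, warning if the file is missing
check_path <- function(filePath) {
    if (!file.exists(filePath)) {
        warning("HDF5 file not found: ", filePath, call. = FALSE)
    }
    filePath
}

## A path that does not exist on disk
stalePath <- file.path(tempdir(), "does_not_exist.h5")

## Capture the warning while still receiving the path
warned <- FALSE
result <- withCallingHandlers(
    check_path(stalePath),
    warning = function(w) {
        warned <<- TRUE
        invokeRestart("muffleWarning")
    }
)
result
```

Because the path is still returned, a caller can inspect it and decide where the file was moved before assigning a new location.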
The path to the HDF5 file associated with the InteractionMatrix object.
Updates path to HDF5 file for the InteractionMatrix object.
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    #################################
    ## Accessing path to HDF5 data ##
    #################################

    ## Create example interactions
    x <- read.table(text="
            9 14000000 14500000 9 14500000 15000000
            9 89500000 90000000 9 89500000 90000000
            9 23500000 24000000 9 23500000 24000000")
    x <- as_ginteractions(x)

    ## Extract 3 pixels from 2 hic files
    imat <- pullHicPixels(x, hicFiles, 500e03)

    ## Access path
    path(imat)

    #################################
    ## Updating path to HDF5 data ##
    #################################

    ## Create example interactions
    x <- read.table(text="
            9 14000000 14500000 9 14500000 15000000
            9 89500000 90000000 9 89500000 90000000
            9 23500000 24000000 9 23500000 24000000")
    x <- as_ginteractions(x)

    ## Extract 3 pixels from 2 hic files
    h5File <- tempfile(fileext=".h5")
    imat <- pullHicPixels(x, hicFiles, 500e03, h5File=h5File)

    ## Move file to new location
    newFile <- tempfile(fileext="_new.h5")
    file.rename(from=h5File, to=newFile)

    ## Update path
    path(imat) <- newFile
    path(imat)
pileupBoundaries expands input loci to the specified 'width', extracts then aggregates them into a single matrix. This can be used to aggregate windows of interactions centered on a set of loci.
    pileupBoundaries(
      x,
      files,
      binSize,
      width = 5e+05,
      normalize = TRUE,
      FUN = sum,
      nBlocks = 50,
      verbose = TRUE,
      BPPARAM = bpparam(),
      blockSize = 1e+06,
      ...
    )

    ## S4 method for signature 'GRanges_OR_GInteractions,character,numeric'
    pileupBoundaries(
      x,
      files,
      binSize,
      width = 5e+05,
      normalize = TRUE,
      FUN = sum,
      nBlocks = 50,
      verbose = TRUE,
      BPPARAM = bpparam(),
      blockSize = 1e+06,
      ...
    )
x |
GRanges or GInteractions object containing the loci to be aggregated. GInteractions will be split into unique anchors. |
files |
Character file paths to '.hic' files. |
binSize |
Integer (numeric) describing the resolution (range widths) of the paired data. Note that small values for this argument may lead to R session crashes. |
width |
Number of base pairs to expand the loci of interest in 'x'. |
normalize |
Boolean, whether to normalize the aggregated values to the number of interactions. |
FUN |
Function to use for aggregating. |
nBlocks |
Number of blocks for block-processing arrays. Default is 50. Increase this for large datasets. To read and process all data at once, set this value to 1. |
verbose |
Boolean (TRUE or FALSE) describing whether to report block-processing progress. |
BPPARAM |
Parallelization params (passed to 'BiocParallel::bplapply()'). Default is the result of 'BiocParallel::bpparam()'. Parallel processing is not available when 'by=interactions'. |
blockSize |
Number (length one numeric vector) describing the size in base-pairs to pull from each '.hic' file. Default is 1e6. For large '.hic' files 'blockSize' can be reduced to conserve the amount of data read in at a time. Larger 'blockSize' values speed up performance, but use more memory. |
... |
Additional arguments passed to 'pullHicMatrices()'. |
It may be necessary to adjust the 'zrange' in 'plotMatrix()' since the Hi-C diagonal will dominate the scale.
Using small 'binSize' values with large ranges may pull very large sections of a Hi-C map and crash your R session. If this happens, try increasing the 'binSize' and 'nBlocks' parameters while lowering the 'blockSize' parameter.
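A rough back-of-envelope calculation shows why resolution matters here. The sketch below assumes a dense square matrix of 8-byte doubles and a hypothetical 50 Mb region; how the package actually stores data in memory may differ, so treat the numbers as an order-of-magnitude guide only.

```r
## Number of cells in a dense square matrix covering 'regionWidth'
## base pairs at a given 'binSize' (illustrative helper)
dense_cells <- function(regionWidth, binSize) {
    nBins <- regionWidth / binSize
    nBins^2
}

regionWidth <- 50e6  # hypothetical 50 Mb region

cells5kb  <- dense_cells(regionWidth, 5e3)
cells50kb <- dense_cells(regionWidth, 50e3)

## Approximate size in megabytes, assuming 8 bytes per value
mb5kb  <- cells5kb * 8 / 1e6
mb50kb <- cells50kb * 8 / 1e6
c(mb5kb = mb5kb, mb50kb = mb50kb)
```

Under these assumptions a tenfold increase in 'binSize' reduces the cell count a hundredfold, which is why coarser bins are the first thing to try.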
A DelayedArray of aggregated counts.
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFile <- marinerData::LEUK_HEK_PJA27_inter_30.hic()
    names(hicFile) <- "FS"

    ## Loops
    loops <-
        marinerData::FS_5kbLoops.txt() |>
        read.table(header=TRUE, nrows=100) |>
        as_ginteractions() |>
        GenomeInfoDb::`seqlevelsStyle<-`(value='ENSEMBL')

    ## Warn about small binSize
    pileupBoundaries(x=loops, files=hicFile, binSize=50e3)
pileupDomains expands then extracts regions/domains from Hi-C files, regularizes them so they are the same size, then aggregates them into a single matrix. This can be used to perform aggregate TAD analysis.
    pileupDomains(
      x,
      files,
      binSize,
      buffer = 0.5,
      ndim = c(100, 100),
      scale = TRUE,
      normalize = TRUE,
      FUN = sum,
      nBlocks = 50,
      verbose = TRUE,
      BPPARAM = bpparam(),
      blockSize = 1e+06,
      ...
    )

    ## S4 method for signature 'GRanges_OR_GInteractions,character,numeric'
    pileupDomains(
      x,
      files,
      binSize,
      buffer = 0.5,
      ndim = c(100, 100),
      scale = TRUE,
      normalize = TRUE,
      FUN = sum,
      nBlocks = 50,
      verbose = TRUE,
      BPPARAM = bpparam(),
      blockSize = 1e+06,
      ...
    )
x |
GRanges or GInteractions object containing the TADs or Loops to be aggregated. |
files |
Character file paths to '.hic' files. |
binSize |
Integer (numeric) describing the resolution (range widths) of the paired data. Note that small values for this argument may lead to R session crashes. |
buffer |
Fraction (length-one numeric vector) of the pair distance by which to expand around the resulting range. |
ndim |
Numeric vector of length two describing the new dimensions of the output matrices. |
scale |
Boolean (TRUE/FALSE) indicating whether the values in the new matrices should be scaled to the total signal in each matrix. |
normalize |
Boolean, whether to normalize the aggregated values to the number of interactions. |
FUN |
Function to use for aggregating. |
nBlocks |
Number of blocks for block-processing arrays. Default is 50. Increase this for large datasets. To read and process all data at once, set this value to 1. |
verbose |
Boolean (TRUE or FALSE) describing whether to report block-processing progress. |
BPPARAM |
Parallelization params (passed to 'BiocParallel::bplapply()'). Default is the result of 'BiocParallel::bpparam()'. Parallel processing is not available when 'by=interactions'. |
blockSize |
Number (length one numeric vector) describing the size in base-pairs to pull from each '.hic' file. Default is 1e6. For large '.hic' files 'blockSize' can be reduced to conserve the amount of data read in at a time. Larger 'blockSize' values speed up performance, but use more memory. |
... |
Additional arguments passed to 'pullHicMatrices()'. |
It may be necessary to adjust the 'zrange' in 'plotMatrix()' since the Hi-C diagonal will dominate the scale.
If interactions are passed to the function, only intrachromosomal ranges are maintained.
Using small 'binSize' values with large ranges may pull very large sections of a Hi-C map and crash your R session. If this happens, try increasing the 'binSize' and 'nBlocks' parameters while lowering the 'blockSize' parameter.
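The "regularize, scale, then aggregate" idea behind a domain pileup can be sketched in base R. Everything here is an assumption made for illustration: the helper 'regularize()' is hypothetical, nearest-neighbour resampling stands in for whatever interpolation the package uses, and the input matrices are random.

```r
## Hypothetical helper: resample a matrix to 'ndim' by nearest-neighbour
## index selection, optionally scaling to the matrix's total signal
regularize <- function(m, ndim = c(100, 100), scale = TRUE) {
    ri <- round(seq(1, nrow(m), length.out = ndim[1]))
    ci <- round(seq(1, ncol(m), length.out = ndim[2]))
    out <- m[ri, ci, drop = FALSE]
    if (scale) out <- out / sum(out)
    out
}

## Two made-up domains of different sizes
set.seed(42)
domains <- list(
    matrix(runif(20 * 20), 20, 20),
    matrix(runif(35 * 35), 35, 35)
)

## Bring both to the same dimensions, then aggregate with sum and
## normalize to the number of domains
reg <- lapply(domains, regularize, ndim = c(10, 10))
pile <- Reduce(`+`, reg) / length(reg)
dim(pile)
```

Scaling each matrix to its total signal before summing keeps one large or deeply sequenced domain from dominating the aggregate.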
A DelayedArray of aggregated counts.
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFile <- marinerData::LEUK_HEK_PJA27_inter_30.hic()
    names(hicFile) <- "FS"

    ## Loops
    loops <-
        marinerData::FS_5kbLoops.txt() |>
        read.table(header=TRUE, nrows=100) |>
        as_ginteractions() |>
        GenomeInfoDb::`seqlevelsStyle<-`(value='ENSEMBL')

    ## Warn about small binSize
    pileupDomains(x=loops, files=hicFile, binSize=50e3, buffer=0.25)
pileupPixels optionally removes short interactions that intersect the diagonal, then extracts and aggregates square regions around each pixel from Hi-C files. This is also known as aggregate peak analysis (APA).
    pileupPixels(
      x,
      files,
      binSize,
      buffer = 5,
      removeShort = TRUE,
      minPairDist = 0,
      normalize = TRUE,
      FUN = sum,
      nBlocks = 5,
      verbose = TRUE,
      BPPARAM = bpparam(),
      ...
    )

    ## S4 method for signature 'GInteractions,character,numeric'
    pileupPixels(
      x,
      files,
      binSize,
      buffer = 5,
      removeShort = TRUE,
      minPairDist = 0,
      normalize = TRUE,
      FUN = sum,
      nBlocks = 5,
      verbose = TRUE,
      BPPARAM = bpparam(),
      ...
    )
x |
GInteractions object containing interactions to extract from Hi-C files. These should be pixels of a single 'binSize' in width. |
files |
Character file paths to '.hic' files. |
binSize |
Integer (numeric) describing the resolution (range widths) of the paired data. |
buffer |
Integer indicating the buffer size, i.e. the number of pixels to include around each central pixel. |
removeShort |
Boolean, whether to remove short pairs (Default) or not. |
minPairDist |
Pairs with a distance less than or equal to this value will be filtered out. |
normalize |
Boolean, whether to normalize the aggregated values to the number of interactions (after filtering out short pairs - if applicable). |
FUN |
Function to use for aggregating. |
nBlocks |
Number of blocks for block-processing arrays. Default is 5. Increase this for large datasets. To read and process all data at once, set this value to 1. |
verbose |
Boolean (TRUE or FALSE) describing whether to report block-processing progress. |
BPPARAM |
Parallelization params (passed to 'BiocParallel::bplapply()'). Default is the result of 'BiocParallel::bpparam()'. Parallel processing is not available when 'by=interactions'. |
... |
Additional arguments passed to 'pullHicMatrices()'. |
Note that pair distance filtering is done after expanding interactions to matrices.
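The filtering, block-wise aggregation and normalization steps of an APA can be sketched with plain matrices. This is a base R illustration with simulated counts and made-up pair distances; it mirrors the 'minPairDist', 'nBlocks', 'FUN = sum' and 'normalize' arguments conceptually and does not read any Hi-C file.

```r
set.seed(1)
buffer <- 5
n <- buffer * 2 + 1  # 11 x 11 matrices around each pixel

## Simulated count matrices, one per interaction
mats <- replicate(20, matrix(rpois(n * n, lambda = 3), n, n),
                  simplify = FALSE)

## Made-up pair distances: 10 kb, 20 kb, ..., 200 kb
pairDist <- seq(10e3, 200e3, length.out = 20)

## Drop pairs with distance less than or equal to minPairDist
minPairDist <- 50e3
mats <- mats[pairDist > minPairDist]

## Aggregate in blocks so only part of the data is summed at a time
nBlocks <- 5
blocks <- split(seq_along(mats),
                cut(seq_along(mats), nBlocks, labels = FALSE))
partial <- lapply(blocks, function(i) Reduce(`+`, mats[i]))
agg <- Reduce(`+`, partial)

## Normalize to the number of interactions left after filtering
apa <- agg / length(mats)
dim(apa)
```

Summing per block and then summing the partial results gives the same matrix as summing everything at once, which is what makes block processing safe for additive functions such as 'sum'.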
A DelayedMatrix of aggregated counts.
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFile <- marinerData::LEUK_HEK_PJA30_inter_30.hic()
    names(hicFile) <- "WT"

    ## Loops
    loops <-
        marinerData::WT_5kbLoops.txt() |>
        setNames("WT") |>
        read.table(header=TRUE, nrows=1000) |>
        as_ginteractions(keep.extra.columns=FALSE) |>
        assignToBins(binSize=5e3)

    ## Removes the "chr" prefix for compatibility
    ## with the preprocessed hic files
    GenomeInfoDb::seqlevelsStyle(loops) <- 'ENSEMBL'

    ## APA
    mat <- pileupPixels(
        x=loops,
        files=hicFile,
        binSize=5e3,
        minPairDist=50e3,
        normalize=FALSE
    )
Pixels are defined as paired ranges whose widths equal their 'binSize'. This function takes a GInteractions object fitting this description and expands the ranges so that there is a 'buffer' of pixels around each range.
    pixelsToMatrices(x, buffer)

    ## S4 method for signature 'GInteractions,numeric'
    pixelsToMatrices(x, buffer)
x |
GInteractions object. |
buffer |
Number (length one numeric vector) of pixels around the pixels in 'x'. |
For example, a buffer of 3 would return a GInteractions object with 3 pixels surrounding the original pixel ranges.
After calling 'pullHicMatrices()' on the result, each returned matrix has buffer*2+1 rows and buffer*2+1 columns.
Note, this function does not handle out-of-bound ranges.
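The expansion arithmetic can be sketched for a single anchor in base R. The helper 'expand_pixel()' is hypothetical, and it assumes the bin size is taken as 'end - start'; the package's own coordinate handling may differ.

```r
## Hypothetical helper: grow one anchor by 'buffer' bins on each side,
## taking the bin size as end - start (an assumption of this sketch)
expand_pixel <- function(start, end, buffer) {
    binSize <- end - start
    c(start = start - buffer * binSize,
      end   = end + buffer * binSize)
}

## A 100 bp pixel anchor expanded with a buffer of 3
region <- expand_pixel(start = 500, end = 600, buffer = 3)
region

## Number of bins spanned by the expanded anchor
nBins <- unname((region["end"] - region["start"]) / 100)
nBins

## A larger buffer pushes the start below zero, which is the kind of
## out-of-bound range that is not corrected
expand_pixel(start = 500, end = 600, buffer = 6)
```

With a buffer of 3 the anchor spans 7 bins, matching the buffer*2+1 rule.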
'x' with updated ranges.
    ## Define example 100bp pixel
    library(InteractionSet)
    pixel <- GInteractions(
        anchor1=GRanges("chr1:500-600"),
        anchor2=GRanges("chr1:2000-2100")
    )

    ## Expand pixel to matrix with
    ## 3 pixels surrounding the center
    ## pixel
    region <- pixelsToMatrices(x=pixel, buffer=3)
    region
Adjust loop enrichment to remove the distance-dependent effect.
Show diagnostic plot of loop enrichment before and after distance adjustment.
    plotEnrichment(scores, interactions, k = 25, nknots = 10, plot = TRUE)

    adjustEnrichment(x, interactions, k = 25, nknots = 10)

    ## S4 method for signature 'DelayedMatrix_OR_matrix,GInteractions'
    adjustEnrichment(x, interactions, k = 25, nknots = 10)

    ## S4 method for signature 'numeric,GInteractions'
    plotEnrichment(scores, interactions, k = 25, nknots = 10, plot = TRUE)
scores |
Numeric vector of enrichment scores. |
interactions |
A GInteractions Object containing the interactions used to calculate enrichment scores. |
k |
Number of observations for rolling window. |
nknots |
Integer or function giving the number of knots to use; see '?smooth.spline' for more info. |
plot |
Boolean (default=TRUE) indicating whether to show the diagnostic plot. |
x |
A DelayedMatrix or matrix with enrichment scores. |
A DelayedMatrix of enrichment scores where rows are loops and columns are Hi-C files.
A plot (and associated data) for visualizing loop enrichment before and after distance adjustment.
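One way to picture a distance adjustment built from a rolling window of 'k' observations and a spline with 'nknots' knots is sketched below. This is an illustration on simulated data, not the package's procedure: the trend model, the use of 'runmed()', and the re-centering on the median are all assumptions of the sketch.

```r
set.seed(123)
n <- 500

## Simulated loop sizes (bp) and scores that decay with distance
distance <- 10^runif(n, 4, 6)
scores <- 3 - 0.5 * log10(distance) + rnorm(n, sd = 0.1)

k <- 25       # rolling-window size
nknots <- 10  # knots for the smoothing spline

## Rolling median of scores ordered by distance
ord <- order(distance)
rolled <- stats::runmed(scores[ord], k = k)

## Smooth the rolling median as a function of log10 distance
fit <- stats::smooth.spline(log10(distance[ord]), rolled,
                            nknots = nknots)

## Subtract the fitted trend and re-center on the overall median
trend <- predict(fit, log10(distance))$y
adjusted <- scores - trend + median(scores)

## Correlation with distance before and after adjustment
c(before = cor(scores, log10(distance)),
  after  = cor(adjusted, log10(distance)))
```

In this simulation the adjusted scores show far less dependence on distance than the raw scores, which is the effect the diagnostic plot is meant to reveal.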
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Read in loops as GInteractions object
    loops <-
        marinerData::WT_5kbLoops.txt() |>
        setNames("WT") |>
        read.table(header=TRUE, nrows=1000) |>
        as_ginteractions(keep.extra.columns=FALSE)

    ## Removes the "chr" prefix for compatibility
    ## with the preprocessed hic files
    GenomeInfoDb::seqlevelsStyle(loops) <- 'ENSEMBL'

    ## Calculate loop enrichment
    enrich <- calcLoopEnrichment(
        x=assignToBins(loops, 100e03),
        files=hicFiles
    )

    adjustEnrichment(enrich, loops)

    plotEnrichment(enrich[,1], loops)
Plots a single matrix or an aggregate matrix, such as the result of an aggregate peak analysis.
    plotMatrix(
      data,
      params = NULL,
      x = NULL,
      y = NULL,
      width = NULL,
      height = NULL,
      just = c("left", "top"),
      default.units = "inches",
      draw = TRUE,
      palette = colorRampPalette(RColorBrewer::brewer.pal(9, "YlGnBu")),
      zrange = NULL,
      na.color = "grey"
    )

    ## S4 method for signature 'DelayedMatrix_OR_matrix'
    plotMatrix(
      data,
      params = NULL,
      x = NULL,
      y = NULL,
      width = NULL,
      height = NULL,
      just = c("left", "top"),
      default.units = "inches",
      draw = TRUE,
      palette = colorRampPalette(RColorBrewer::brewer.pal(9, "YlGnBu")),
      zrange = NULL,
      na.color = "grey"
    )
data |
'DelayedMatrix', 'matrix', list of matrices, or 3 column 'data.frame' of APA results. |
params |
Optional 'pgParams' object containing relevant function parameters. |
x |
Numeric or unit object specifying the x-location of plot. |
y |
Numeric or unit object specifying the y-location of plot. |
width |
Numeric or unit object specifying the width of plot. |
height |
Numeric or unit object specifying the height of plot. |
just |
String or numeric vector specifying the justification of the viewport relative to its (x, y) location. |
default.units |
String indicating the default units to use if 'x', 'y', 'width', or 'height' are only given as numeric vectors. |
draw |
Logical value indicating whether graphics output should be produced. |
palette |
'colorRampPalette' function to use for mapping values to colors. |
zrange |
Vector of length 2 giving the minimum and maximum values of the color scale. |
na.color |
String indicating the color to use for mapping NA values. |
Function will draw a color-mapped matrix and return an S3 object of class 'MatrixPlot'.
    library(plotgardener)
    library(RColorBrewer)

    ## Create divergent matrix ####
    m <- matrix(data=rnorm(n=21*21, mean=0, sd=2), nrow=21, ncol=21)

    ## Define parameters
    p <- pgParams(width=3, height=3, default.units="inches")

    ## Create page
    pageCreate(params=p)

    ## Plot apa
    plot <- plotMatrix(data=m,
                       x=p$width/2,
                       y=p$height/2,
                       width=p$width*0.5,
                       height = p$width*0.5,
                       just=c("center", "center"),
                       palette=colorRampPalette(c("blue", "white", "red")),
                       zrange=NULL)

    ## Annotate legend
    annoHeatmapLegend(plot=plot,
                      x=2.3,
                      y=0.75,
                      width=0.1,
                      height=0.75)

    ## Create sequential matrix
    m <- matrix(data=sample(0:100, 21*21, replace=TRUE), nrow=21, ncol=21)

    ## Define parameters
    p <- pgParams(width=3, height=3, default.units="inches")

    ## Create page
    pageCreate(params=p)

    ## Plot apa
    plot <- plotMatrix(data=m,
                       x=p$width/2,
                       y=p$height/2,
                       width=p$width*0.5,
                       height=p$width*0.5,
                       just=c("center", "center"),
                       palette=colorRampPalette(c("white", "dark red")),
                       zrange = NULL)

    ## Annotate legend
    annoHeatmapLegend(plot=plot,
                      x=2.3,
                      y=0.75,
                      width=0.1,
                      height=0.75)
The dimensions of each pulled submatrix are defined by dividing the widths of the anchors in 'x' by the 'binSize'. When the anchor widths are the same for every interaction, an InteractionArray is returned. However, if the anchor widths differ in 'x', an InteractionJaggedArray is returned instead.
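The relationship between anchor widths, 'binSize' and submatrix dimensions can be checked by hand. The sketch below uses plain start and end coordinates in a data frame (chromosomes omitted) and treats 'end - start' as the anchor width, which is an assumption about coordinate conventions rather than a statement of how the package computes widths.

```r
## Anchor coordinates for four interactions (illustrative values)
anchors <- data.frame(
    start1 = c(51000000, 52000000, 150000000, 52000000),
    end1   = c(51300000, 52300000, 150500000, 52300000),
    start2 = c(51000000, 52000000, 150000000, 52000000),
    end2   = c(51500000, 52500000, 150300000, 52800000)
)
binSize <- 100e3

## Rows come from anchor1, columns from anchor2
nRows <- (anchors$end1 - anchors$start1) / binSize
nCols <- (anchors$end2 - anchors$start2) / binSize
cbind(nRows, nCols)

## Equal dimensions everywhere would allow a regular array;
## differing dimensions call for a jagged structure
isRegular <- length(unique(nRows)) == 1 && length(unique(nCols)) == 1
isRegular
```

Because these interactions yield 3x5, 3x5, 5x3 and 3x8 submatrices, they cannot be stacked into a single regular array.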
    pullHicMatrices(
      x,
      files,
      binSize,
      ...,
      h5File = tempfile(fileext = ".h5"),
      half = "both",
      norm = "NONE",
      matrix = "observed",
      blockSize = 248956422,
      onDisk = TRUE,
      compressionLevel = 0,
      chunkSize = 1
    )

    ## S4 method for signature 'GInteractions,character,numeric'
    pullHicMatrices(
      x,
      files,
      binSize,
      h5File,
      half,
      norm,
      matrix,
      blockSize,
      onDisk,
      compressionLevel,
      chunkSize
    )
x |
GInteractions object containing interactions to extract from Hi-C files. |
files |
Character file paths to '.hic' files. |
binSize |
Integer (numeric) describing the resolution (range widths) of the paired data. |
... |
Additional arguments. |
h5File |
Character file path to save '.h5' file. |
half |
String (character vector of length one) indicating whether to keep values for the upper triangular ('half="upper"') where 'start1 < start2', lower triangular ('half="lower"') where 'start1 > start2', or both ('half="both"', default). When 'half="upper"' all lower triangular values are 'NA'. When 'half="lower"' all upper triangular values are 'NA'. When 'half="both"' there are no 'NA' values. For interchromosomal interactions there is no inherent directionality between chromosomes, so data is returned regardless of specified order. |
norm |
String (length one character vector) describing the Hi-C normalization to apply. Use 'strawr::readHicNormTypes()' to see accepted values for each file in 'files'. |
matrix |
String (length one character vector) Type of matrix to extract. Must be one of "observed", "oe", or "expected". "observed" is observed counts, "oe" is observed/expected counts, "expected" is expected counts. |
blockSize |
Number (length one numeric vector) describing the size in base-pairs to pull from each '.hic' file. Default is 248956422 (the length of the longest chromosome in the human hg38 genome). For large '.hic' files 'blockSize' can be reduced to conserve the amount of data read in at a time. Larger 'blockSize' values speed up performance, but use more memory. |
onDisk |
Boolean (length one logical vector that is not NA) indicating whether extracted data should be stored on disk in an HDF5 file. Default is TRUE. |
compressionLevel |
Number (length one numeric vector) between 0 (Default) and 9 indicating the compression level used on HDF5 file. |
chunkSize |
Number (length one numeric vector) indicating how many values of 'x' to chunk for each write to HDF5 stored data. This has downstream implications for accessing subsets later. For small 'compressionLevel' values use smaller 'chunkSize' values and for large 'compressionLevel' values use large (i.e. 'length(x)') values to improve performance. |
InteractionSet object with a 4-dimensional array of Hi-C submatrices, rownames, and colnames. The array is stored with the following dimensions: interactions in 'x', Hi-C 'files', rows of submatrix, columns of submatrix. The rows of each returned submatrix correspond to anchor1 of 'x' and the columns correspond to anchor2 of 'x'.
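To make the stated storage order concrete, the following base R sketch builds an ordinary 4-dimensional array with the same layout (interactions, files, rows, columns) and pulls out one submatrix. It uses a plain in-memory array filled with zeros and hypothetical sizes; it does not create an InteractionSet or touch HDF5.

```r
## Plain array with the stated dimension order:
## interactions x files x rows x columns
nInteractions <- 100
fileNames <- c("FS", "WT")
arr <- array(
    0,
    dim = c(nInteractions, length(fileNames), 11, 11),
    dimnames = list(NULL, fileNames, NULL, NULL)
)

## Submatrix for the first interaction in the "WT" file
sub <- arr[1, "WT", , ]
dim(sub)

## All submatrices for one file: interactions x rows x columns
perFile <- arr[, "FS", , ]
dim(perFile)
```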
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Read in loop pixels as GInteractions object
    pixels <-
        marinerData::WT_5kbLoops.txt() |>
        setNames("WT") |>
        read.table(header=TRUE) |>
        as_ginteractions(keep.extra.columns=FALSE) |>
        assignToBins(binSize=100e3)

    ## Removes the "chr" prefix for compatibility
    ## with the preprocessed hic files
    GenomeInfoDb::seqlevelsStyle(pixels) <- 'ENSEMBL'

    ## Expand pixels to regions for pulling
    ## Hi-C submatrices
    regions <- pixelsToMatrices(x=pixels, buffer=5)

    ## Extract 11x11 count matrices from the
    ## first 100 regions and 2 Hi-C files
    iarr <- pullHicMatrices(x=regions[1:100],
                            files=hicFiles,
                            binSize=100e3)
    iarr

    ## Access count matrices
    counts(iarr)

    ## Display the start bin of each
    ## interaction in the count
    ## matrices
    counts(iarr, showDimnames=TRUE)

    ## InteractionJaggedArray example
    gi <- read.table(text="
                1 51000000 51300000 1 51000000 51500000
                2 52000000 52300000 3 52000000 52500000
                1 150000000 150500000 1 150000000 150300000
                2 52000000 52300000 2 52000000 52800000") |>
        as_ginteractions()

    iarr <- pullHicMatrices(gi, hicFiles, 100e03, half="both")
    iarr

    counts(iarr)
Pull contact frequencies from '.hic' files.
    pullHicPixels(
      x,
      files,
      binSize,
      ...,
      h5File = tempfile(fileext = ".h5"),
      half = "both",
      norm = "NONE",
      matrix = "observed",
      blockSize = 248956422,
      onDisk = TRUE,
      compressionLevel = 0,
      chunkSize = 1
    )

    ## S4 method for signature 'GInteractions,character,numeric'
    pullHicPixels(
      x,
      files,
      binSize,
      h5File,
      half,
      norm,
      matrix,
      blockSize,
      onDisk,
      compressionLevel,
      chunkSize
    )
| Argument | Description |
|---|---|
| x | GInteractions object containing interactions to extract from Hi-C files. |
| files | Character file paths to '.hic' files. |
| binSize | Integer (numeric) describing the resolution (range widths) of the paired data. |
| ... | Additional arguments. |
| h5File | Character file path to save '.h5' file. |
| half | String (character vector of length one) indicating whether to keep values for the upper triangular ('half="upper"') where 'start1 < start2', lower triangular ('half="lower"') where 'start1 > start2', or both ('half="both"', default). When 'half="upper"' all lower triangular values are 'NA'. When 'half="lower"' all upper triangular values are 'NA'. When 'half="both"' there are no 'NA' values. For interchromosomal interactions there is no inherent directionality between chromosomes, so data is returned regardless of the specified order. |
| norm | String (length one character vector) describing the Hi-C normalization to apply. Use 'strawr::readHicNormTypes()' to see accepted values for each file in 'files'. |
| matrix | String (length one character vector) describing the type of matrix to extract. Must be one of "observed", "oe", or "expected". "observed" is observed counts, "oe" is observed/expected counts, "expected" is expected counts. |
| blockSize | Number (length one numeric vector) describing the size in base-pairs to pull from each '.hic' file. Default is 248956422 (the length of the longest chromosome in the human hg38 genome). For large '.hic' files 'blockSize' can be reduced to limit the amount of data read in at a time. Larger 'blockSize' values speed up performance, but use more memory. |
| onDisk | Boolean (length one logical vector that is not NA) indicating whether extracted data should be stored on disk in an HDF5 file. Default is TRUE. |
| compressionLevel | Number (length one numeric vector) between 0 (default) and 9 indicating the compression level used on the HDF5 file. |
| chunkSize | Number (length one numeric vector) indicating how many values of 'x' to chunk for each write to HDF5 stored data. This has downstream implications for accessing subsets later. For small 'compressionLevel' values use smaller 'chunkSize' values and for large 'compressionLevel' values use large (e.g. 'length(x)') values to improve performance. |
InteractionSet object with a 2-dimensional array of Hi-C interactions (rows) and Hi-C samples (columns).
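To make the 'half' argument concrete, here is a minimal base-R sketch of the masking rule described for intrachromosomal pixels. It does not call mariner: the data frame `px` and the helper `maskHalf()` are hypothetical names used only for illustration, and pixels on the diagonal ('start1 == start2') are left out of the toy data because the argument description does not say how they are treated.

```r
## Hypothetical toy pixels (not read from a .hic file)
px <- data.frame(
    start1 = c(100, 300, 200),
    start2 = c(300, 100, 500),
    count  = c(5, 7, 2)
)

## Illustrative helper (an assumption, not part of mariner):
## counts outside the requested half are replaced with NA
maskHalf <- function(px, half = c("both", "upper", "lower")) {
    half <- match.arg(half)
    keep <- switch(half,
        both  = rep(TRUE, nrow(px)),
        upper = px$start1 < px$start2,
        lower = px$start1 > px$start2
    )
    ifelse(keep, px$count, NA)
}

maskHalf(px, "both")   # no NA values
maskHalf(px, "upper")  # pixel 2 (start1 > start2) becomes NA
maskHalf(px, "lower")  # pixels 1 and 3 (start1 < start2) become NA
```

The sketch only mirrors the documented NA pattern; in mariner the values themselves come from the '.hic' files.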
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Read in loop pixels as GInteractions object
    pixels <-
        WT_5kbLoops.txt() |>
        setNames("WT") |>
        read.table(header=TRUE) |>
        as_ginteractions(keep.extra.columns=FALSE) |>
        assignToBins(binSize=100e3)

    ## Removes the "chr" prefix for compatibility
    ## with the preprocessed hic files
    GenomeInfoDb::seqlevelsStyle(pixels) <- 'ENSEMBL'

    ## Extract the first 100 pixels
    imat <- pullHicPixels(x=pixels[1:100],
                          files=hicFiles,
                          binSize=100e3)
    imat

    ## Access count matrix
    counts(imat)
InteractionJaggedArray objects and their count matrices (JaggedArray objects) contain variable dimension matrices. The 'regularize' function resizes these matrices to the new dimensions supplied in 'ndim'. The result is a DelayedArray object (for JaggedArray) or an InteractionArray object (for InteractionJaggedArray).
    regularize(
      x,
      ndim = c(10, 10),
      h5File = tempfile(fileext = ".h5"),
      scale = TRUE,
      nBlocks = 5,
      verbose = TRUE,
      chunkSize = 1,
      compressionLevel = 0,
      ...
    )

    ## S4 method for signature 'JaggedArray'
    regularize(
      x, ndim, h5File, scale, nBlocks,
      verbose, chunkSize, compressionLevel
    )

    ## S4 method for signature 'InteractionJaggedArray'
    regularize(
      x, ndim, h5File, scale, nBlocks,
      verbose, chunkSize, compressionLevel
    )
| Argument | Description |
|---|---|
| x | A JaggedArray or InteractionJaggedArray object. |
| ndim | Numeric vector of length two describing the new dimensions of the output matrices. |
| h5File | Character file path to save '.h5' file. |
| scale | Boolean (TRUE/FALSE) indicating whether the values in the new matrices should be scaled to the total signal in each matrix. |
| nBlocks | Number of blocks for block-processing JaggedArrays. Default is 5. Increase this for large datasets. To read and process all data at once, set this value to 1. |
| verbose | Boolean (TRUE or FALSE) describing whether to report block-processing progress. |
| chunkSize | Number (length one numeric vector) indicating how many values of 'x' to chunk for each write to HDF5 stored data. This has downstream implications for accessing subsets later. For small 'compressionLevel' values use smaller 'chunkSize' values and for large 'compressionLevel' values use large (e.g. 'length(x)') values to improve performance. |
| compressionLevel | Number (length one numeric vector) between 0 (default) and 9 indicating the compression level used on the HDF5 file. |
| ... | Additional arguments. |
Note that after regularizing, the relationship between the interactions, 'binSize', and the count matrices is no longer consistent in the resulting InteractionArray object, and the row/column names are not available.
If 'x' is a JaggedArray then 'regularize' returns an HDF5-backed 4-dimensional DelayedArray object where the first and second dimensions are the rows and columns of the count matrices ('ndim'), the third dimension is the number of interactions and the fourth dimension is the number of files. If 'x' is an InteractionJaggedArray then an InteractionArray object is returned, for which 'counts()' returns the object described above.
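The idea of bringing variable-dimension matrices to a common 'ndim' can be sketched in base R. This is not mariner's implementation: nearest-neighbour sampling is swapped in purely for illustration, 'scale' is interpreted here as dividing each resized matrix by its total (an assumption), and `resizeNearest()` and `jagged` are hypothetical names.

```r
## Hypothetical helper (not mariner's method): nearest-neighbour resize
resizeNearest <- function(m, ndim = c(10, 10), scale = TRUE) {
    ## Pick evenly spaced source rows/columns for the target dimensions
    rows <- round(seq(1, nrow(m), length.out = ndim[1]))
    cols <- round(seq(1, ncol(m), length.out = ndim[2]))
    out <- m[rows, cols, drop = FALSE]
    ## Assumed meaning of 'scale': each output matrix sums to 1
    if (scale) out <- out / sum(out)
    out
}

## Two toy matrices with different dimensions (3 x 4 and 6 x 7)
jagged <- list(matrix(1:12, nrow = 3), matrix(1:42, nrow = 6))

## After resizing, every matrix shares the same dimensions
regular <- lapply(jagged, resizeNearest, ndim = c(5, 5))
lapply(regular, dim)
```

Once all matrices share 'ndim', they can be stacked into a regular array, which is what makes downstream aggregation possible.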
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        LEUK_HEK_PJA27_inter_30.hic(),
        LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Create test interactions
    gi <- read.table(text="
        1 51000000 51300000 1 51000000 51500000
        2 52000000 52300000 3 52000000 52500000
        1 150000000 150500000 1 150000000 150300000
        2 52000000 52300000 2 52000000 52800000") |>
        as_ginteractions()
    gi <- c(gi,gi) # make more interactions

    ## InteractionJaggedArray object
    ija <- pullHicMatrices(gi, hicFiles, 100e03, half="both")

    ## Regularize InteractionJaggedArray
    ia <- regularize(ija, ndim=c(5,5), nBlocks=1)
    aggHicMatrices(ia, nBlocks=1)

    ## Regularize JaggedArray
    ja <- counts(ija)
    regularize(ja, ndim=c(5,5), nBlocks=1)
Removes short interactions with some padding from the diagonal. If you are resizing the regions with a function like 'pixelsToMatrices()', make sure this function is run afterwards.
    removeShortPairs(x, padding = 0)

    ## S4 method for signature 'GInteractions'
    removeShortPairs(x, padding = 0)
| Argument | Description |
|---|---|
| x | A GInteractions object. |
| padding | Minimum distance away from the diagonal. |
Note that this only applies to intrachromosomal pairs, as pair distance is meaningless for interchromosomal pairs. Therefore, all interchromosomal pairs are kept.
A GInteractions object with the short pairs removed.
    ## Example GInteractions object
    gi <- as_ginteractions(read.table(
        text="
        seqnames1 start1 end1 seqnames2 start2 end2 keep
        chr1 300 400 chr1 300 400 'no'
        chr1 100 200 chr1 300 400 'yes'
        chr1 300 400 chr1 100 200 'yes'
        chr1 300 400 chr2 300 400 'yes'
        chr1 250 350 chr1 300 400 'only_with_padding_50'
        chr1 300 400 chr1 250 350 'only_with_padding_50'
        ",
        header=TRUE
    ))

    ## Remove pairs that would cross the diagonal
    removeShortPairs(gi)

    ## Add 50bp of padding
    removeShortPairs(gi, padding=50)
Get selectionMethod from MergedGInteractions object
    selectionMethod(x, ...)

    ## S4 method for signature 'MergedGInteractions'
    selectionMethod(x, ...)
| Argument | Description |
|---|---|
| x | MergedGInteractions object. |
| ... | Additional arguments. |
A character vector describing which selection method was used for merging.
    ## Load required packages
    library(data.table, include.only="fread")

    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Reference BEDPE files (loops called with SIP)
    bedpeFiles <- c(
        marinerData::FS_5kbLoops.txt(),
        marinerData::WT_5kbLoops.txt()
    )
    names(bedpeFiles) <- c("FS", "WT")

    ## Read in bedpeFiles as a list of GInteractions
    ## Use only first 1000 rows for fast example
    giList <-
        lapply(bedpeFiles, fread, nrows=1000) |>
        lapply(as_ginteractions)

    ## Cluster & merge pairs
    x <- mergePairs(x = giList,
                    radius = 10e03,
                    column = "APScoreAvg")

    selectionMethod(x)
Get the pixel representing the strongest or weakest interaction in an InteractionArray
    selectPixel(
      x,
      aggFUN = sum,
      selectFUN = "which.max",
      nBlocks = 5,
      verbose = TRUE
    )

    ## S4 method for signature 'InteractionArray'
    selectPixel(
      x,
      aggFUN = sum,
      selectFUN = "which.max",
      nBlocks = 5,
      verbose = TRUE
    )
| Argument | Description |
|---|---|
| x | InteractionArray object. |
| aggFUN | Function to use for aggregating across Hi-C files. Its result must be passable to 'which.max' or 'which.min'. Default is 'sum'. |
| selectFUN | Function to use for selecting among aggregated interactions. Must be one of "which.max" (default) or "which.min". |
| nBlocks | Number of blocks for block-processing arrays. Default is 5. Increase this for large datasets. To read and process all data at once, set this value to 1. |
| verbose | Boolean (TRUE or FALSE) describing whether to report block-processing progress. Default is TRUE. |
A GInteractions object with the updated pixel interactions, along with a column with the aggregated max/min value for that pixel.
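The aggregate-then-select logic can be sketched in base R for a single interaction. This is only an illustration under stated assumptions, not mariner's code: `fs`, `wt` and `arr` are made-up 3 x 3 count matrices for two Hi-C files, aggregated with 'sum' and selected with 'which.max' (the defaults for 'aggFUN' and 'selectFUN').

```r
## Hypothetical 3 x 3 count matrices for one interaction in two Hi-C files
fs <- matrix(c(1, 2, 3,
               4, 9, 6,
               7, 8, 5), nrow = 3, byrow = TRUE)
wt <- matrix(c(2, 1, 1,
               1, 3, 1,
               1, 9, 1), nrow = 3, byrow = TRUE)
arr <- array(c(fs, wt), dim = c(3, 3, 2))

## Aggregate each pixel across files (the role of 'aggFUN')
agg <- apply(arr, c(1, 2), sum)

## Select the strongest aggregated pixel (the role of 'selectFUN')
idx <- which.max(agg)

## Convert the linear index to (row, column) and report the value
pos <- arrayInd(idx, dim(agg))
pos        # row 3, column 2
agg[pos]   # 17
```

Swapping `which.max` for `which.min` would pick the weakest aggregated pixel instead.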
    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Read .hic file paths
    hicFiles <- c(
        marinerData::LEUK_HEK_PJA27_inter_30.hic(),
        marinerData::LEUK_HEK_PJA30_inter_30.hic()
    )
    names(hicFiles) <- c("FS", "WT")

    ## Read in loops as GInteractions object
    loops <-
        WT_5kbLoops.txt() |>
        setNames("WT") |>
        read.table(header=TRUE) |>
        as_ginteractions(keep.extra.columns=FALSE)

    ## Removes the "chr" prefix for compatibility
    ## with the preprocessed hic files
    GenomeInfoDb::seqlevelsStyle(loops) <- 'ENSEMBL'

    ## Rebin loops to 2.5e6 resolution
    loops <- assignToBins(x=loops, binSize=2.5e06)

    ## Pull 5x5 matrices
    iarr <- pullHicMatrices(x=loops[1:5],
                            files=hicFiles,
                            binSize=500e3,
                            norm="KR",
                            half='upper')

    ## Select pixel
    selectPixel(iarr)
Note that 'buffer' must be the same across the selection functions for them to work appropriately.
For 'selectCoordinates', 'rowInd' and 'colInd' are paired such that the selected position in the matrix is 'c(rowInd[1:i], colInd[1:j])' for 'i' rows and 'j' columns.
    selectRadius(x, buffer, invert = FALSE)
    selectCenterPixel(mhDist, buffer, invert = FALSE)
    selectSubmatrix(m, invert = FALSE)
    selectCoordinates(rowInd, colInd, buffer, invert = FALSE)
    selectBlock(rowInd, colInd, buffer, invert = FALSE)
    selectTopLeft(n, buffer, inset = 0, invert = FALSE)
    selectTopRight(n, buffer, inset = 0, invert = FALSE)
    selectBottomRight(n, buffer, inset = 0, invert = FALSE)
    selectBottomLeft(n, buffer, inset = 0, invert = FALSE)
    selectCorners(n, buffer, inset = 0, invert = FALSE)
    selectRows(rows, buffer, invert = FALSE)
    selectCols(cols, buffer, invert = FALSE)
    selectInner(n, buffer, invert = FALSE)
    selectOuter(n, buffer, invert = FALSE)

    ## S4 method for signature 'MatrixSelection'
    show(object)

    ## S4 method for signature 'numeric'
    selectRadius(x, buffer, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectCenterPixel(mhDist, buffer, invert = FALSE)

    ## S4 method for signature 'matrix'
    selectSubmatrix(m, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectCoordinates(rowInd, colInd, buffer, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectBlock(rowInd, colInd, buffer, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectTopLeft(n, buffer, inset = 0, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectTopRight(n, buffer, inset = 0, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectBottomRight(n, buffer, inset = 0, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectBottomLeft(n, buffer, inset = 0, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectCorners(n, buffer, inset = 0, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectRows(rows, buffer, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectCols(cols, buffer, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectInner(n, buffer, invert = FALSE)

    ## S4 method for signature 'numeric'
    selectOuter(n, buffer, invert = FALSE)
| Argument | Description |
|---|---|
| x | Integer vector of Manhattan distances to select. |
| buffer | Integer describing the number of pixels surrounding the central pixel. |
| invert | Boolean indicating whether to invert the selection. |
| mhDist | Integer vector of Manhattan distances to select along with the center pixel. |
| m | Matrix with 1's indicating selected positions and 0's indicating unselected positions. |
| rowInd | Integer describing the row indices. |
| colInd | Integer describing the column indices. |
| n | Integer describing the number of outer pixels to select. Must be of length one. |
| inset | Integer describing the number of pixels to inset the selection from the outer edge of the matrix. Default of 0 uses no inset. |
| rows | Integer describing which rows to select. |
| cols | Integer describing which columns to select. |
| object | A MatrixSelection object. |
A text-based visualization of the selected matrix indices.
Numeric vector of matrix indices (byRow).
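How 'buffer' and row-wise ("byRow") indices relate can be sketched in base R. The helper `radiusIndices()` below is hypothetical and is not mariner's implementation; it assumes that a 'buffer' of b implies a square matrix with 2b + 1 rows and columns, and that indices count cells row by row.

```r
## Hypothetical helper (an assumption, not part of mariner):
## row-wise indices of cells at the given Manhattan distances
## from the central pixel
radiusIndices <- function(dists, buffer) {
    n <- 2 * buffer + 1                  # assumed matrix size: n x n
    center <- buffer + 1                 # row/column of the central pixel
    rr <- rep(seq_len(n), each = n)      # row of each cell in row-wise order
    cc <- rep(seq_len(n), times = n)     # column of each cell in row-wise order
    mh <- abs(rr - center) + abs(cc - center)
    which(mh %in% dists)
}

## Central pixel of an 11 x 11 matrix (buffer = 5)
radiusIndices(0, buffer = 5)    # 61

## The four direct neighbours of the center in a 3 x 3 matrix
radiusIndices(1, buffer = 1)    # 2 4 6 8

## Visualize a selection by filling a matrix row by row
sel <- radiusIndices(c(0, 1), buffer = 2)
matrix(ifelse(seq_len(25) %in% sel, "0", "-"), nrow = 5, byrow = TRUE)
```

Because the index depends on the matrix width, two selections built with different 'buffer' values refer to different cells even when the index numbers coincide.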
    res <- selectCenterPixel(0, 3)
    show(res)

    selectRadius(x=c(2,3,4), buffer=5, invert=FALSE)
    selectCenterPixel(0, 5)
    selectSubmatrix(m = matrix(rep(c(1,0,1), 3), nrow=3, ncol=3))
    selectCoordinates(rowInd=1:3, colInd=1:3, buffer=5)
    selectBlock(rowInd=1:3, colInd=1:3, buffer=5)
    selectTopLeft(n=3, buffer=5, inset=1, invert=FALSE)
    selectTopRight(n=3, buffer=5, inset=1, invert=FALSE)
    selectBottomRight(n=3, buffer=5, inset=1, invert=FALSE)
    selectBottomLeft(n=3, buffer=5, inset=1, invert=FALSE)
    selectCorners(n=3, buffer=5, inset=1, invert=FALSE)
    selectRows(rows=1:3, buffer=5, invert=FALSE)
    selectCols(cols=1:3, buffer=5, invert=FALSE)
    selectInner(n=1, buffer=5, invert=FALSE)
    selectOuter(n=1, buffer=5, invert=FALSE)
Access each portion of a GInteractions-like object
    seqnames1(x, ...)
    seqnames2(x, ...)
    start1(x, ...)
    end1(x, ...)
    start2(x, ...)
    end2(x, ...)

    ## S4 method for signature 'GInteractions_OR_InteractionSet'
    seqnames1(x)

    ## S4 method for signature 'GInteractions_OR_InteractionSet'
    seqnames2(x)

    ## S4 method for signature 'GInteractions_OR_InteractionSet'
    start1(x)

    ## S4 method for signature 'GInteractions_OR_InteractionSet'
    end1(x)

    ## S4 method for signature 'GInteractions_OR_InteractionSet'
    start2(x)

    ## S4 method for signature 'GInteractions_OR_InteractionSet'
    end2(x)
| Argument | Description |
|---|---|
| x | GInteractions object. |
| ... | Additional arguments. |
A vector of values corresponding to the requested component of a GInteractions-like object. For 'seqnames1' and 'seqnames2' the Rle is coerced to a character vector.
    library(InteractionSet)

    ## Create example reference interactions objects
    gi <- read.table(text="
        chr1 10 20 chr1 50 60
        chr2 30 40 chr2 60 70
        chr1 50 60 chr3 10 20") |>
        as_ginteractions()

    iset <- InteractionSet(assays=matrix(nrow=3),
                           interactions=gi)

    ## Access vectors of values
    seqnames1(gi)
    start1(gi)
    end1(gi)
    seqnames2(gi)
    start2(gi)
    end2(gi)

    ## Also works for InteractionSet-like objects
    seqnames1(iset)
    start1(iset)
    end1(iset)
    seqnames2(iset)
    start2(iset)
    end2(iset)
Returns the subset of MergedGInteractions that belong to each input source object (see these with 'sources(x)'). If the source pairs all come from the same object, their corresponding merged pair is returned. However, if at least one source pair comes from a different object, then that merged pair is not returned.
    sets(x, include, exclude)

    ## S4 method for signature 'MergedGInteractions,missing,missing'
    sets(x)

    ## S4 method for signature 'MergedGInteractions,character_OR_missing,missing'
    sets(x, include)

    ## S4 method for signature 'MergedGInteractions,missing,character_OR_missing'
    sets(x, exclude)

    ## S4 method for signature
    ## 'MergedGInteractions,character_OR_missing,character_OR_missing'
    sets(x, include, exclude)
| Argument | Description |
|---|---|
| x | MergedGInteractions object. |
| include | (Optional) A character vector of sources in which a pair must be present. For a list of available sources use 'sources(x)'. |
| exclude | (Optional) A character vector of sources in which a pair must be absent. For a list of available sources use 'sources(x)'. |
Optional 'include' and 'exclude' parameters modulate the behavior of 'sets' to return different subsets of originating pairs. For example, 'include' requires that the returned pairs be present in specific sources, while 'exclude' requires that returned pairs be absent from specific sources. Sources not listed in either 'include' or 'exclude' are ignored (they may or may not be present in the returned 'MergedGInteractions' object). 'include' and 'exclude' can be used independently or in combination to return every possible set. If any of the same sources are used in both 'include' and 'exclude', the function will return a 0-length MergedGInteractions object.
A list of subsetted 'MergedGInteractions' objects or a 'MergedGInteractions' object (if 'include' and/or 'exclude' are used).
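The 'include'/'exclude' rules can be sketched in base R with a logical membership matrix. This is an illustration under stated assumptions, not mariner's implementation: `membership` (one row per merged pair, one column per source) and `selectSets()` are hypothetical names, and the helper returns row indices rather than a MergedGInteractions object.

```r
## Hypothetical membership of three merged pairs in two sources:
## pair 1 is FS-only, pair 2 is shared, pair 3 is WT-only
membership <- matrix(
    c(TRUE,  FALSE,
      TRUE,  TRUE,
      FALSE, TRUE),
    ncol = 2, byrow = TRUE,
    dimnames = list(NULL, c("FS", "WT"))
)

## Illustrative helper (an assumption, not part of mariner)
selectSets <- function(membership, include = character(), exclude = character()) {
    ## Present in every 'include' source
    inAll <- rowSums(!membership[, include, drop = FALSE]) == 0
    ## Absent from every 'exclude' source
    inNone <- rowSums(membership[, exclude, drop = FALSE]) == 0
    which(inAll & inNone)
}

selectSets(membership, include = "FS")                  # pairs 1 and 2
selectSets(membership, include = "FS", exclude = "WT")  # pair 1 only
selectSets(membership, include = "FS", exclude = "FS")  # no pairs
```

Sources left out of both arguments place no constraint on the result, which is why `include = "FS"` alone still returns the shared pair.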
    ## Load required packages
    library(GenomicRanges)
    library(InteractionSet)

    ## Define example anchor regions
    gr1 <-
        GRanges(seqnames = "chr1",
                ranges = IRanges(start = c(30,40,40,70,80),
                                 end = c(40,50,50,80,90)))
    gr2 <-
        GRanges(seqnames = "chr1",
                ranges = IRanges(start = c(30,30,50,10,30),
                                 end = c(40,40,60,20,40)))

    ## Form GInteractions and split into two files
    giList <- split(x = GInteractions(gr1, gr2),
                    f = c(rep(1,3), rep(2,2)))

    ## Merge pairs
    x <- mergePairs(x = giList, radius = 20)

    sets(x)
Flexibly shifting GRanges according to strand
    shiftRanges(x, pos)

    ## S4 method for signature 'GRanges,character_OR_numeric'
    shiftRanges(x, pos)
| Argument | Description |
|---|---|
| x | GRanges object. |
| pos | Position within anchors to resize the bin. Can be a character or integer vector of length 1 or 'length(x)' designating the position for each element in 'x'. Character options are "start", "end" and "center". Integers are referenced from the start position for '+' and '*' strands and from the end position for the '-' strand. |
GRanges object with a single position range that has been shifted appropriately.
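The strand rule for integer 'pos' values can be sketched in base R. This is an illustration, not mariner's implementation: `shiftPosition()` is a hypothetical helper working on plain vectors, and the arithmetic (start + pos for '+'/'*', end - pos for '-') is inferred from the argument description.

```r
## Hypothetical helper (an assumption, not part of mariner):
## integer positions count from the start on '+'/'*' strands
## and from the end on the '-' strand
shiftPosition <- function(start, end, strand, pos) {
    ifelse(strand == "-", end - pos, start + pos)
}

## Promoter-like ranges extending 2000 bp upstream of a TSS at
## 5000 ('+' strand) and at 6000 ('-' strand)
shiftPosition(
    start  = c(3000, 5801),
    end    = c(5199, 8000),
    strand = c("+", "-"),
    pos    = 2000
)
## 5000 6000
```

Counting from opposite ends on opposite strands is what lets a single 'pos' value recover the TSS for both ranges.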
    library(GenomicRanges)

    ## Create example GRanges
    gr1 <- GRanges(seqnames = "chr1",
                   ranges = IRanges::IRanges(start = rep(5000,3),
                                             end = rep(6000,3)),
                   strand = c('+', '-', '*'))

    gr2 <- gr1 |> promoters(upstream = 2000, downstream = 200)

    ## Shifting anchors by keyword
    shiftRanges(gr1, 'start')
    shiftRanges(gr1, 'end')
    shiftRanges(gr1, 'center')

    ## Shifting anchors by position
    shiftRanges(gr1, 100)
    shiftRanges(gr1, c(100, 200, 300))

    ## Shifting back to TSS
    shiftRanges(gr2, 2000)
Snap GRanges or GInteractions to nearest bins
Snap paired-objects to nearest bins
    snapToBins(x, binSize)

    ## S4 method for signature 'GRanges,numeric'
    snapToBins(x, binSize)

    ## S4 method for signature 'GInteractions,numeric'
    snapToBins(x, binSize)
| Argument | Description |
|---|---|
| x | 'GRanges' or 'GInteractions' object. |
| binSize | Integer (numeric) describing the new size of each range. |
GRanges object snapped to the nearest 'binSize'.
Input object snapped to the nearest 'binSize'.
    library(GenomicRanges)

    ## Example GRanges object
    x <- GRanges(seqnames = c("chr1"),
                 ranges = IRanges(start = c(1, 1, 25, 19, 21),
                                  end = c(15, 11, 31, 31, 39)))

    snapToBins(x, binSize = 5)
    snapToBins(x, binSize = 10)
    snapToBins(x, binSize = 20)

    library(InteractionSet)

    ## Sample GInteractions object
    x <- GInteractions(anchor1 = c(GRanges("chr1:1-15"),
                                   GRanges("chr1:1-11")),
                       anchor2 = c(GRanges("chr1:25-31"),
                                   GRanges("chr1:19-31")))

    snapToBins(x, binSize = 5)
    snapToBins(x, binSize = 10)
    snapToBins(x, binSize = 20)
Access the names or source files of a 'MergedGInteractions' object.
    sources(x)

    ## S4 method for signature 'MergedGInteractions'
    sources(x)
| Argument | Description |
|---|---|
| x | MergedGInteractions object. |
A character vector of names or source files of a 'MergedGInteractions' object.
    ## Load required packages
    library(data.table, include.only="fread")

    ## Load marinerData
    if (!require("marinerData", quietly = TRUE))
        BiocManager::install("marinerData")

    ## Reference BEDPE files (loops called with SIP)
    loopFiles <- c(
        marinerData::FS_5kbLoops.txt(),
        marinerData::WT_5kbLoops.txt()
    )
    names(loopFiles) <- c("FS", "WT")

    ## Read in loopFiles as a list of GInteractions
    ## Use only first 1000 rows for fast example
    giList <-
        lapply(loopFiles, fread, nrows=1000) |>
        lapply(as_ginteractions)

    ## Cluster & merge pairs
    x <- mergePairs(x = giList,
                    radius = 10e03)

    sources(x)