| Title: | Deterministic Multithreaded Genomic Interval Operations |
|---|---|
| Description: | High-performance interval overlap and join operations for 'IRanges' and 'GenomicRanges'. The package provides deterministic multithreaded overlap computation, reusable subject indexes for repeated queries, and join helpers that keep range metadata in a consistent output grammar. |
| Authors: | Chirag Parsania [aut, cre] (github: cparsania) |
| Maintainer: | Chirag Parsania <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.1.0 |
| Built: | 2026-06-03 11:27:35 UTC |
| Source: | https://github.com/bioc/fastRanges |
Return query rows that do not overlap any subject ranges.
fast_anti_overlap_join( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_" )fast_anti_overlap_join( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_" )
query |
An |
subject |
An |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
query_prefix |
Prefix applied to query columns. |
This is similar to a SQL ANTI JOIN.
It keeps only query rows with zero overlap hits.
overlap_count is always 0 in the returned table and is included to keep
the result grammar parallel to fast_semi_overlap_join().
A data.frame containing query rows with zero overlaps.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_anti_overlap_join(q, s)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_anti_overlap_join(q, s)
Build a sorted subject index that can be reused across repeated overlap queries.
fast_build_index(subject)fast_build_index(subject)
subject |
An |
The index stores a sorted representation of subject that is optimized for
repeated overlap queries.
Build the index once, then pass it as subject to
fast_find_overlaps(), fast_count_overlaps(), or other overlap-summary
functions.
Use the raw subject object, not the index, when you need subject metadata
in the output table, for example with overlap joins.
A fast_ranges_index object.
s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) idx <- fast_build_index(s) idxs <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) idx <- fast_build_index(s) idx
Assign each range to an overlap-connected component.
fast_cluster_overlaps( x, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, return = c("vector", "data.frame") )fast_cluster_overlaps( x, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, return = c("vector", "data.frame") )
x |
An |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
return |
One of |
Two ranges are assigned to the same cluster when they are connected by a chain of overlaps under the provided overlap settings.
Two ranges can end up in the same cluster even if they do not overlap directly, as long as they are linked by intermediate overlaps.
Example: if range 1 overlaps range 2, and range 2 overlaps range 3, then all three belong to one cluster.
If return = "vector", an integer vector of cluster IDs with one
element per range in x. If return = "data.frame", a data.frame with
range_id, cluster_id, and cluster_size.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
x <- IRanges::IRanges(start = c(1, 3, 10, 11, 20), end = c(5, 8, 12, 14, 22)) fast_cluster_overlaps(x)x <- IRanges::IRanges(start = c(1, 3, 10, 11, 20), end = c(5, 8, 12, 14, 22)) fast_cluster_overlaps(x)
Count subject overlaps per query range.
fast_count_overlaps( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )fast_count_overlaps( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )
query |
An |
subject |
An |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
deterministic does not change returned counts for this summary output.
Returns one integer per query range.
A value of 0 means that query had no matching subjects.
A value of 5 means that query matched five subject ranges under the chosen
overlap rule.
Integer vector with one element per query range.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_count_overlaps(q, s, threads = 1)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_count_overlaps(q, s, threads = 1)
Count overlaps per query and per subject metadata group.
fast_count_overlaps_by_group( query, subject, group_col, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, include_na_group = FALSE )fast_count_overlaps_by_group( query, subject, group_col, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, include_na_group = FALSE )
query |
An |
subject |
An |
group_col |
Subject metadata column name used for grouping.
This must be present in |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
include_na_group |
Logical scalar. If |
deterministic does not change returned group counts for this summary
output.
This function answers the question: "for each query, how many overlaps came from each subject group?"
Rows correspond to query ranges.
Columns correspond to distinct values in mcols(subject)[[group_col]].
Integer matrix with one row per query and one column per group.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(1, 10), width = 5)) s <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(2, 9), width = 5)) S4Vectors::mcols(s)$grp <- c("A", "B") fast_count_overlaps_by_group(q, s, group_col = "grp")q <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(1, 10), width = 5)) s <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(2, 9), width = 5)) S4Vectors::mcols(s)$grp <- c("A", "B") fast_count_overlaps_by_group(q, s, group_col = "grp")
Compute per-position coverage for input ranges.
fast_coverage( x, shift = 0L, width = NULL, weight = 1L, method = c("auto", "sort", "hash"), threads = fast_default_threads() )fast_coverage( x, shift = 0L, width = NULL, weight = 1L, method = c("auto", "sort", "hash"), threads = fast_default_threads() )
x |
An |
shift |
Passed to |
width |
Passed to |
weight |
Passed to |
method |
Coverage method. |
threads |
Integer scalar thread count. Reserved for API consistency. |
Coverage answers the question: "how many ranges cover each position?"
For IRanges, the result is one Rle.
For GRanges, the result is an RleList, usually one coverage track per
sequence level.
Rle (for IRanges) or RleList (for GRanges).
x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_coverage(x)x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_coverage(x)
Returns the default thread count used by fastRanges overlap routines.
fast_default_threads()fast_default_threads()
The default is controlled by getOption("fastRanges.threads") and falls
back to 1L.
This helper is mainly useful when you want package-wide thread control
without passing threads = to every call.
Example:
options(fastRanges.threads = 8L) sets the default thread count for later
calls that do not specify threads explicitly.
Integer scalar thread count.
fast_default_threads() old_threads <- getOption("fastRanges.threads") options(fastRanges.threads = 3L) fast_default_threads() options(fastRanges.threads = old_threads)fast_default_threads() old_threads <- getOption("fastRanges.threads") options(fastRanges.threads = 3L) fast_default_threads() options(fastRanges.threads = old_threads)
Return non-overlapping segments induced by input ranges.
fast_disjoin(x, ignore_strand = FALSE, with_revmap = FALSE)fast_disjoin(x, ignore_strand = FALSE, with_revmap = FALSE)
x |
An |
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
with_revmap |
Logical scalar. If |
fast_disjoin() cuts the covered span of x into the smallest
non-overlapping pieces.
An object of the same range class as x.
x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_disjoin(x)x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_disjoin(x)
Compute nearest-neighbor mapping from query ranges to subject ranges.
fast_distance_to_nearest( query, subject, ignore_strand = FALSE, threads = fast_default_threads() )fast_distance_to_nearest( query, subject, ignore_strand = FALSE, threads = fast_default_threads() )
query |
An |
subject |
An |
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
threads |
Integer scalar thread count. Included for API consistency. |
This function currently returns the same object shape as fast_nearest().
It is provided so users can choose the more explicit name when they care
about the distance column.
A S4Vectors::DataFrame with query_id, subject_id, and
distance columns.
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_distance_to_nearest(q, s)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_distance_to_nearest(q, s)
Compute overlap pairs between query and subject using a multithreaded
C++ backend. The result is a Hits object compatible with Bioconductor
workflows.
fast_find_overlaps( query, subject, select = c("all", "first", "last", "arbitrary"), max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )fast_find_overlaps( query, subject, select = c("all", "first", "last", "arbitrary"), max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )
query |
An |
subject |
An |
select |
Character scalar controlling whether all hits are returned or a single subject match is selected per query. Use Use |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
This is the core matching function in fastRanges.
Think of it as answering the question: "for each query range, which subject ranges satisfy my overlap rule?"
The return value is a Hits object. The important pieces are:
queryHits(h) gives the row numbers from query.
subjectHits(h) gives the matching row numbers from subject.
If you need only counts or a yes/no answer, prefer fast_count_overlaps()
or fast_overlaps_any(), because they return simpler summaries.
Compatibility notes:
fastRanges aims to stay close to Bioconductor overlap semantics for
supported inputs, and its outputs are routinely validated against
IRanges::findOverlaps() / GenomicRanges::findOverlaps().
Currently supported core input types are IRanges and GRanges.
Empty-range semantics are delegated to Bioconductor-compatible reference behavior.
Circular genomic sequences are not currently supported and will raise an explicit error.
GRangesList inputs are not currently supported and will raise an explicit
error.
Performance notes:
For one-off overlap calls, use the raw subject.
For repeated-query or throughput-oriented workloads, build a reusable index
once with fast_build_index(subject) and pass that index as subject.
For maximum multithreaded throughput, consider deterministic = FALSE when
output order is not important.
If select = "all", a S4Vectors::Hits object.
Otherwise, an integer vector of length length(query) containing one
selected subject index per query and NA when no subject matched.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) h <- fast_find_overlaps(q, s, threads = 1) length(h)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) h <- fast_find_overlaps(q, s, threads = 1) length(h)
Create an iterator that computes overlaps in query chunks.
fast_find_overlaps_iter( query, subject, chunk_size = 50000L, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )fast_find_overlaps_iter( query, subject, chunk_size = 50000L, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )
query |
An |
subject |
An |
chunk_size |
Integer scalar number of query ranges to process in one iterator step. Smaller values use less memory and give more frequent progress points. Larger values usually improve throughput but each call to
|
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
Use the iterator API when query is large and you do not want to
materialize all overlap hits in one call.
Typical workflow:
create the iterator with fast_find_overlaps_iter()
inspect progress with fast_iter_has_next()
pull one chunk with fast_iter_next()
or collect all remaining chunks with fast_iter_collect()
A fast_ranges_iter iterator object.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_has_next(it)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_has_next(it)
Return the index of the first subject range that is strictly before each query range.
fast_follow( query, subject, ignore_strand = FALSE, threads = fast_default_threads() )fast_follow( query, subject, ignore_strand = FALSE, threads = fast_default_threads() )
query |
An |
subject |
An |
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
threads |
Integer scalar thread count. Included for API consistency. |
For each query range, return the index of the first subject range that comes strictly before the query.
Integer vector of subject indices, with NA for unmatched queries.
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_follow(q, s)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_follow(q, s)
Compute uncovered regions between ranges.
fast_gaps(x, start = NULL, end = NULL, ignore_strand = FALSE)fast_gaps(x, start = NULL, end = NULL, ignore_strand = FALSE)
x |
An |
start, end
|
Optional integer bounds for non-genomic ranges.
These are most useful for |
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
fast_gaps() returns the regions not covered by x inside the requested
bounds.
An object of the same range class as x.
x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_gaps(x, start = 1L, end = 15L)x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_gaps(x, start = 1L, end = 15L)
Summarize index size and partition structure.
fast_index_stats(index, detailed = FALSE)fast_index_stats(index, detailed = FALSE)
index |
A |
detailed |
Logical scalar. If |
Use this function to inspect how the subject was partitioned internally and to get a rough sense of memory footprint.
subject_n is the number of indexed ranges.
partition_n is the number of internal partitions, usually driven by
sequence structure.
index_size_mb is the in-memory object size, not the serialized file size.
By default, a one-row S4Vectors::DataFrame with summary fields.
If detailed = TRUE, returns a list with summary and partitions.
s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) idx <- fast_build_index(s) fast_index_stats(idx)s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) idx <- fast_build_index(s) fast_index_stats(idx)
Convenience wrapper for fast_overlap_join(..., join = "inner").
fast_inner_overlap_join( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_", subject_prefix = "subject_" )fast_inner_overlap_join( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_", subject_prefix = "subject_" )
query |
An |
subject |
An |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
query_prefix |
Prefix added to columns derived from |
subject_prefix |
Prefix added to columns derived from |
A data.frame overlap join result.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_inner_overlap_join(q, s)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_inner_overlap_join(q, s)
Materialize all overlap chunks from an iterator into a single Hits object.
fast_iter_collect(iter)fast_iter_collect(iter)
iter |
A |
This collects only the chunks that have not yet been consumed.
If you want all hits from the beginning after some chunks were already read,
call fast_iter_reset(iter) first and then fast_iter_collect(iter).
A S4Vectors::Hits object.
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_collect(it)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_collect(it)
Check if an Iterator Has Remaining Chunks
fast_iter_has_next(iter)fast_iter_has_next(iter)
iter |
A |
Logical scalar indicating whether more query chunks remain.
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_has_next(it)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_has_next(it)
Compute overlaps for the next query chunk.
fast_iter_next(iter)fast_iter_next(iter)
iter |
A |
Each call advances the iterator state.
The returned Hits object uses global query indices, not chunk-local
indices, so you can combine chunk outputs safely.
A S4Vectors::Hits object for the next chunk, with global query
indices.
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_next(it)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_next(it)
Reset iterator position to the first query chunk.
fast_iter_reset(iter)fast_iter_reset(iter)
iter |
A |
After reset, the next call to fast_iter_next() starts again from the first
query chunk.
Invisibly returns iter.
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_reset(it)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) it <- fast_find_overlaps_iter(q, s, chunk_size = 2L) fast_iter_reset(it)
Convenience wrapper for fast_overlap_join(..., join = "left").
fast_left_overlap_join( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_", subject_prefix = "subject_" )fast_left_overlap_join( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_", subject_prefix = "subject_" )
query |
An |
subject |
An |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
query_prefix |
Prefix added to columns derived from |
subject_prefix |
Prefix added to columns derived from |
A data.frame overlap join result.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_left_overlap_join(q, s)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_left_overlap_join(q, s)
Load a fast_ranges_index object saved with fast_save_index().
fast_load_index(path)fast_load_index(path)
path |
File path to a serialized index. |
This function validates that the file contains the fields required by
fastRanges before returning the object.
A fast_ranges_index object.
s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) idx <- fast_build_index(s) f <- tempfile(fileext = ".rds") fast_save_index(idx, f) idx2 <- fast_load_index(f) unlink(f) print(idx2)s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) idx <- fast_build_index(s) f <- tempfile(fileext = ".rds") fast_save_index(idx, f) idx2 <- fast_load_index(f) unlink(f) print(idx2)
Compute nearest-neighbor mapping from query ranges to subject ranges.
fast_nearest( query, subject, ignore_strand = FALSE, threads = fast_default_threads() )fast_nearest( query, subject, ignore_strand = FALSE, threads = fast_default_threads() )
query |
An |
subject |
An |
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
threads |
Integer scalar thread count. Included for API consistency. |
These functions answer nearest-neighbor questions rather than overlap questions.
fast_nearest() and fast_distance_to_nearest() return one row per matched
query.
query_id is the row index in query.
subject_id is the row index of the nearest subject.
distance is 0 when the query overlaps the subject and positive when the
nearest subject is separated by a gap.
A S4Vectors::DataFrame with query_id, subject_id, and
distance columns.
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_nearest(q, s)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_nearest(q, s)
Aggregate a numeric subject metadata column across overlaps for each query.
fast_overlap_aggregate( query, subject, value_col = NULL, fun = c("count", "sum", "mean", "min", "max"), max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, na_rm = TRUE )fast_overlap_aggregate( query, subject, value_col = NULL, fun = c("count", "sum", "mean", "min", "max"), max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, na_rm = TRUE )
query |
An |
subject |
An |
value_col |
Subject metadata column name containing numeric values.
This is required unless |
fun |
Aggregation function: one of |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
na_rm |
Logical scalar. If |
deterministic does not change returned aggregate values for this summary
output.
This function summarizes subject metadata over the overlap hits of each query.
Examples:
use fun = "count" to count overlaps
use fun = "sum" to sum a signal column over matched subjects
use fun = "mean" to compute average matched score per query
Numeric vector with one value per query.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(1, 10), width = 5)) s <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(2, 9), width = 5)) S4Vectors::mcols(s)$score <- c(2, 5) fast_overlap_aggregate(q, s, value_col = "score", fun = "sum")q <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(1, 10), width = 5)) s <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(2, 9), width = 5)) S4Vectors::mcols(s)$score <- c(2, 5) fast_overlap_aggregate(q, s, value_col = "score", fun = "sum")
Build a metadata-preserving overlap join with a consistent, tidy tabular output grammar.
fast_overlap_join( query, subject, join = c("inner", "left"), max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_", subject_prefix = "subject_" )fast_overlap_join( query, subject, join = c("inner", "left"), max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_", subject_prefix = "subject_" )
query |
An |
subject |
An |
join |
Join mode.
|
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
query_prefix |
Prefix added to columns derived from |
subject_prefix |
Prefix added to columns derived from |
The join family turns overlap hits into beginner-friendly tabular output.
Output always starts with query_id and subject_id.
query_id is the 1-based row index from query.
subject_id is the 1-based row index from subject, or NA for unmatched
queries in a left join.
The remaining columns are prefixed copies of the original range columns and metadata columns.
A data.frame with overlap ids and prefixed query/subject columns.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_overlap_join(q, s, join = "inner", threads = 1)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_overlap_join(q, s, join = "inner", threads = 1)
Return TRUE for queries that overlap at least one subject range.
fast_overlaps_any( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )fast_overlaps_any( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )
query |
An |
subject |
An |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
deterministic does not change returned logical values for this summary
output.
Returns one logical value per query range.
TRUE means at least one subject range matched.
FALSE means no subject range matched.
Logical vector with one element per query range.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_overlaps_any(q, s, threads = 1)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_overlaps_any(q, s, threads = 1)
Return the index of the first subject range that is strictly after each query range.
fast_precede( query, subject, ignore_strand = FALSE, threads = fast_default_threads() )fast_precede( query, subject, ignore_strand = FALSE, threads = fast_default_threads() )
query |
An |
subject |
An |
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
threads |
Integer scalar thread count. Included for API consistency. |
For each query range, return the index of the first subject range that comes strictly after the query.
Integer vector of subject indices, with NA for unmatched queries.
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_precede(q, s)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_precede(q, s)
Compute range-wise intersection.
fast_range_intersect(x, y, ignore_strand = FALSE)fast_range_intersect(x, y, ignore_strand = FALSE)
x, y
|
|
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
fast_range_intersect() keeps only the coordinate span shared by x and
y.
An object of the same range class as x and y.
x <- IRanges::IRanges(start = c(1, 10), end = c(5, 12)) y <- IRanges::IRanges(start = c(3, 20), end = c(8, 21)) fast_range_intersect(x, y)x <- IRanges::IRanges(start = c(1, 10), end = c(5, 12)) y <- IRanges::IRanges(start = c(3, 20), end = c(8, 21)) fast_range_intersect(x, y)
Compute ranges in x that are not covered by y.
fast_range_setdiff(x, y, ignore_strand = FALSE)fast_range_setdiff(x, y, ignore_strand = FALSE)
x, y
|
|
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
fast_range_setdiff() subtracts the covered span of y from x.
An object of the same range class as x.
x <- IRanges::IRanges(start = c(1, 10), end = c(5, 12)) y <- IRanges::IRanges(start = c(3, 20), end = c(8, 21)) fast_range_setdiff(x, y)x <- IRanges::IRanges(start = c(1, 10), end = c(5, 12)) y <- IRanges::IRanges(start = c(3, 20), end = c(8, 21)) fast_range_setdiff(x, y)
Compute range-wise union.
fast_range_union(x, y, ignore_strand = FALSE)fast_range_union(x, y, ignore_strand = FALSE)
x, y
|
|
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
fast_range_union() returns the combined interval coverage of x and y.
An object of the same range class as x and y.
x <- IRanges::IRanges(start = c(1, 10), end = c(5, 12)) y <- IRanges::IRanges(start = c(3, 20), end = c(8, 21)) fast_range_union(x, y)x <- IRanges::IRanges(start = c(1, 10), end = c(5, 12)) y <- IRanges::IRanges(start = c(3, 20), end = c(8, 21)) fast_range_union(x, y)
fast_ranges_example is a small reproducible dataset shipped with the
package. It is intended for documentation examples, teaching, unit tests,
and quick smoke checks of overlap workflows.
A named list with two components:
A GRanges object with 6 genomic intervals and metadata
columns query_id and score.
A GRanges object with 7 genomic intervals and metadata
columns gene_id and biotype.
The object is a named list with two GRanges components:
query: six example genomic intervals that behave like user-supplied
peaks, windows, or regions of interest.
subject: seven example genomic intervals that behave like annotation
features such as genes, promoters, enhancers, or other reference ranges.
Metadata columns are included so examples can demonstrate joins, grouped summaries, and overlap aggregation:
query contains query_id and score
subject contains gene_id and biotype
The same records are also distributed as plain BED files in
inst/extdata/query_peaks.bed and inst/extdata/subject_genes.bed. Use the
in-memory dataset when you want a ready-to-run example in R, and use the BED
files when you want to demonstrate file import or command-line workflows.
This dataset is intentionally small and synthetic. It is designed for examples and tests, not as a biological reference resource.
Generated from data-raw/make_example_data.R, which also writes
matching BED files to inst/extdata/.
data("fast_ranges_example", package = "fastRanges") query <- fast_ranges_example$query subject <- fast_ranges_example$subject query subject fast_find_overlaps(query, subject, threads = 1)data("fast_ranges_example", package = "fastRanges") query <- fast_ranges_example$query subject <- fast_ranges_example$subject query subject fast_find_overlaps(query, subject, threads = 1)
Merge overlapping or adjacent ranges.
fast_reduce(x, ignore_strand = FALSE, min_gap_width = 1L, with_revmap = FALSE)fast_reduce(x, ignore_strand = FALSE, min_gap_width = 1L, with_revmap = FALSE)
x |
An |
ignore_strand |
Logical scalar. Ignored for non-genomic ranges. |
min_gap_width |
Integer scalar controlling when nearby ranges should be merged.
Larger values require larger gaps before two ranges are kept separate. |
with_revmap |
Logical scalar. If |
fast_reduce() simplifies a range set by merging runs of overlapping or
near-adjacent intervals.
An object of the same range class as x.
x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_reduce(x)x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_reduce(x)
Save a fast_ranges_index object to disk for reuse across sessions.
fast_save_index(index, path, compress = TRUE)fast_save_index(index, path, compress = TRUE)
index |
A |
path |
Output file path for the serialized index. |
compress |
Logical scalar. If |
Saving an index is useful when index construction is expensive and the same subject set will be queried again in a later R session.
Invisibly returns path.
s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) idx <- fast_build_index(s) f <- tempfile(fileext = ".rds") fast_save_index(idx, f) unlink(f)s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) idx <- fast_build_index(s) f <- tempfile(fileext = ".rds") fast_save_index(idx, f) unlink(f)
Find overlaps within a single range object.
fast_self_overlaps( x, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, drop_self = TRUE, drop_redundant = TRUE )fast_self_overlaps( x, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, drop_self = TRUE, drop_redundant = TRUE )
x |
An |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
drop_self |
Logical scalar. If |
drop_redundant |
Logical scalar. If |
fast_self_overlaps() compares a range object to itself.
This is useful when you want to know which ranges in one set overlap each other rather than overlap another object.
A S4Vectors::Hits object.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
x <- IRanges::IRanges(start = c(1, 3, 10), end = c(5, 8, 12)) fast_self_overlaps(x)x <- IRanges::IRanges(start = c(1, 3, 10), end = c(5, 8, 12)) fast_self_overlaps(x)
Return query rows that overlap at least one subject range.
fast_semi_overlap_join( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_" )fast_semi_overlap_join( query, subject, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE, query_prefix = "query_" )
query |
An |
subject |
An |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
query_prefix |
Prefix applied to query columns. |
This is similar to a SQL SEMI JOIN.
It keeps only query rows that have at least one overlap hit.
overlap_count tells you how many subject ranges matched each retained
query row.
A data.frame containing matching query rows and overlap counts.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_semi_overlap_join(q, s)q <- IRanges::IRanges(start = c(1, 10, 20), width = c(5, 4, 3)) s <- IRanges::IRanges(start = c(3, 9, 18), width = c(4, 6, 5)) fast_semi_overlap_join(q, s)
Aggregate coverage into fixed-width tiles.
fast_tile_coverage( x, tile_width, step_width = tile_width, shift = 0L, width = NULL, weight = 1L, method = c("auto", "sort", "hash"), threads = fast_default_threads() )fast_tile_coverage( x, tile_width, step_width = tile_width, shift = 0L, width = NULL, weight = 1L, method = c("auto", "sort", "hash"), threads = fast_default_threads() )
x |
An |
tile_width |
Integer scalar tile width. |
step_width |
Integer scalar step width. Use a value smaller than
|
shift |
Passed to |
width |
Passed to |
weight |
Passed to |
method |
Coverage method. |
threads |
Integer scalar thread count. Reserved for API consistency. |
fast_tile_coverage() converts base-resolution coverage into a simpler table
of fixed-width summaries.
Each row in the result is one tile.
coverage_sum is the sum of coverage values across that tile.
For GRanges, the output also includes seqnames.
A data.frame with tile coordinates and coverage_sum.
x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_tile_coverage(x, tile_width = 5L)x <- IRanges::IRanges(start = c(1, 4, 10), end = c(5, 8, 12)) fast_tile_coverage(x, tile_width = 5L)
Count overlaps in sliding windows across the coordinate span of query.
fast_window_count_overlaps( query, subject, window_width, step_width = window_width, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )fast_window_count_overlaps( query, subject, window_width, step_width = window_width, max_gap = -1L, min_overlap = 0L, type = c("any", "start", "end", "within", "equal"), ignore_strand = FALSE, threads = fast_default_threads(), deterministic = TRUE )
query |
An |
subject |
An |
window_width |
Integer scalar window width. |
step_width |
Integer scalar window step.
Use a value smaller than |
max_gap |
Integer scalar controlling how far apart two ranges may be and still count as a hit. Use Use Use positive values when you want "nearby" ranges to count as matches even if they do not overlap directly. Units are bases. The meaning is intentionally aligned with
|
min_overlap |
Integer scalar minimum overlap width, in bases.
Larger values require wider shared overlap and therefore return fewer hits. This argument matters only when the selected |
type |
Character scalar describing what "match" means.
|
ignore_strand |
Logical scalar controlling strand handling for genomic ranges. For
For |
threads |
Integer scalar number of worker threads to use. Use Use larger values on multicore machines when throughput matters. For repeated-query workloads, combine a prebuilt index from
|
deterministic |
Logical scalar controlling output order.
|
Windows are generated across the span of query, then overlap counts are
measured between those windows and subject.
The result is a table rather than a Hits object because the main goal is
summary, not individual hit inspection.
A data.frame containing window coordinates and overlap counts.
query is the range set you ask about. subject is the range set you
compare it against.
Core interval semantics (ASCII schematic):
type = "any"
query : |------|
subject: |------|
type = "within"
subject: |--------------|
query : |------|
type = "start"
query : |------|
subject: |------------|
type = "end"
query : |------|
subject: |------------|
type = "equal"
query : |------|
subject: |------|
gap / min_overlap controls
query : |------| |------| : subject
< gap >
The middle distance is the gap. A hit is allowed when this distance is
<= max_gap (for max_gap >= 0), and overlap width is >= min_overlap.
Beginner-friendly interpretation:
type = "any" asks "do these ranges touch or overlap closely enough to
count?"
type = "start" and type = "end" are useful when interval boundaries are
biologically meaningful, for example transcription start or end sites.
type = "within" asks whether each query lies inside a subject interval.
type = "equal" asks whether query and subject describe the same interval,
optionally with endpoint tolerance when max_gap >= 0.
This argument grammar is intentionally aligned with Bioconductor overlap
APIs (IRanges / GenomicRanges).
q <- IRanges::IRanges(start = c(1, 12), end = c(10, 20)) s <- IRanges::IRanges(start = c(2, 5, 15), end = c(3, 6, 16)) fast_window_count_overlaps(q, s, window_width = 5L, step_width = 5L)q <- IRanges::IRanges(start = c(1, 12), end = c(10, 20)) s <- IRanges::IRanges(start = c(2, 5, 15), end = c(3, 6, 16)) fast_window_count_overlaps(q, s, window_width = 5L, step_width = 5L)